I need to read a file, extract specific data from it, and
send the results to an index in C#.
Thank you |
Request for Question Clarification by
mathtalk-ga
on
08 Jan 2004 15:51 PST
Hi, amy123456-ga:
Would you clarify the requirements "for specific data" and "send the
results to an index", please?
In Unix there is a utility program "split" which divides an input file
into one or more output files of up to some size (measured in lines,
bytes, etc.). Another program is named "csplit" and makes a
subdivision of a file based on "context" defined by some command line
arguments, e.g. regular expressions that may match lines within the
input file.
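For comparison, here is a rough C# analogue of the csplit idea: start a new chunk whenever a line matches a regular expression. It is only a sketch; the class and method names are hypothetical, and it keeps the chunks in memory rather than writing separate output files the way csplit does.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: a rough analogue of csplit that starts a new chunk
// whenever a line matches the given pattern.
static class CsplitSketch
{
    public static List<List<string>> SplitOnPattern(IEnumerable<string> lines, string pattern)
    {
        var chunks = new List<List<string>>();
        var current = new List<string>();
        var regex = new Regex(pattern);
        foreach (string line in lines)
        {
            // A matching line closes the current chunk and begins the next one.
            if (regex.IsMatch(line) && current.Count > 0)
            {
                chunks.Add(current);
                current = new List<string>();
            }
            current.Add(line);
        }
        // Keep whatever is left after the last match.
        if (current.Count > 0)
            chunks.Add(current);
        return chunks;
    }
}
```

Splitting on "Order Session", for instance, would give one chunk per printed order if a file ever held several.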
Does either of those functions resemble what you are after?
Obviously the chances of helping you to write a program in C# (if
that's your goal) will be greatly enhanced by working from some clear
requirements.
regards, mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
08 Jan 2004 17:14 PST
specific data: The file contains patient information. I need to parse
or split the following from the file (location, room #, patient name &
medical rec #).
index: After I extract the data from the file I need it to go to an
index file. I will need this file for manipulation later.
Thank you
|
Request for Question Clarification by
mathtalk-ga
on
08 Jan 2004 17:36 PST
So, would it be accurate to say that you are trying to extract some data items:
location, room #, patient name & medical rec #
from an input file and write them (in the form of a fixed width
record) to an output file called "index"?
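To make "fixed width record" concrete, here is a small sketch that pads (or truncates) each field to a fixed column width so every line of the index has the same layout. The column widths and names are hypothetical; the real layout would come from the requirements. The medical record number could be added as one more column in the same way.

```csharp
using System;

static class IndexRecordSketch
{
    // Hypothetical column widths; the real layout would come from the requirements.
    const int LocationWidth = 6;
    const int RoomWidth = 4;
    const int NameWidth = 30;

    // Builds one fixed-width line: each field is left-justified, padded with
    // spaces, and truncated if it is longer than its column.
    public static string MakeRecord(string location, string room, string name)
    {
        return Fit(location, LocationWidth) + Fit(room, RoomWidth) + Fit(name, NameWidth);
    }

    static string Fit(string value, int width)
    {
        if (value == null) value = "";
        return value.Length > width ? value.Substring(0, width) : value.PadRight(width);
    }
}
```

Each record could then be appended to the index file with something like File.AppendAllText(indexPath, record + Environment.NewLine), where indexPath is whatever file name you choose.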
--mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
08 Jan 2004 18:36 PST
extract some data items from an input file: Yes.
Write them to a listbox as an index within the application.
Thank you for the help.
I am willing to pay for your help, more than what has been stated. I
know $10.00 is nothing!!!! Thank you
|
Request for Question Clarification by
mathtalk-ga
on
08 Jan 2004 18:47 PST
Are you adding code to an existing application? Or are you creating a
.Net based GUI application from scratch?
We could drill down on the details of doing what you originally asked
about (extracting data from a file), but I think it will be helpful to
picture the context surrounding this function in a little detail.
regards,
mathtalk-ga
|
Request for Question Clarification by
mathtalk-ga
on
09 Jan 2004 07:51 PST
Hi, amy123456-ga:
I've asked the Google Answers editors to remove your last
clarification because it contains personal information. Unfortunately
as a Google Answers Researcher I am not able to contact you outside of
the Web site here. However the information you provided about the
application is very helpful, so I have copied it here:
(Amy123456-ga wrote:)
I am building an application in C#.
The process: A physician enters an order in another application,
which sends a print job to RPM (Remote Print Manager); RPM sends this
print job to a file as RTF.
Now I need to retrieve the file and read the information for (Patient name,
Room #, Loc, Med rec #). It needs to go to an index within the
application because I have to retrieve the room number, which I use
in an array. When it finds the room #, it will send a
command to the COM port (I have built a box with 28 rooms that
represent the unit rooms) to light up that room, indicating to the
staff that this room has an active order in the system.
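The room-number-to-array step might look roughly like this. It is a hypothetical sketch: it only models the 28 rooms of the indicator box as an array of flags and validates the extracted room number. The COM port command is deliberately left out, because the box's protocol is not described here.

```csharp
using System;

// Hypothetical model of the 28-room indicator box described above.
class RoomBoard
{
    public const int RoomCount = 28;

    // Index 0 is unused so that room N maps to slot N.
    readonly bool[] active = new bool[RoomCount + 1];

    // Marks a room as having an active order; returns false if the text
    // is not a room number in the range 1..28.
    public bool MarkActive(string roomText)
    {
        int room;
        if (!int.TryParse(roomText, out room) || room < 1 || room > RoomCount)
            return false;
        active[room] = true;
        // The real application would send its command to the COM port here;
        // the box's protocol is not known, so nothing is sent in this sketch.
        return true;
    }

    public bool IsActive(int room)
    {
        return room >= 1 && room <= RoomCount && active[room];
    }
}
```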
(end of Amy123456-ga's clarification)
Now the RTF format is the next thing we need to tackle. This is
basically a text file, and it should be possible to locate the
information you need by using regular expressions to match the
particular data items. To test this idea, let's begin with some
information about the files. You may want to try generating a few of
these for our testing/design requirements. Let me know how big these
files are and we can take a look at them. One way to do this is to
open the file in an editor such as DevStudio (the Visual Studio code
editor), but you may have another favorite such as TextPad that you
prefer.
You will see that the beginning of an RTF file has a lot of
complicated looking configuration data: colors, fonts, etc. Lines in
an RTF file tend to be long, but not ridiculously long because the RTF
specs limit it. Skipping past the early section, you will begin to
see the paragraphs of the printed content. The data items you are
looking for should be easily recognizable by a human (you), and we'll
probably need to cook up a regular expression to match that reliably.
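As a first guess at such a regular expression, the sketch below assumes the patient line looks like "Last, First LOC RR-BB ..." (room, hyphen, bed). That layout is an assumption based on the single plain-text sample posted in this thread; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

static class PatientLineSketch
{
    // Assumed layout (from the one sample): "Last, First LOC RR-BB ..."
    static readonly Regex PatientLine = new Regex(
        @"^\s*(?<last>[^,]+),\s*(?<first>\S+)\s+(?<loc>\S+)\s+(?<room>\d+)-(?<bed>\d+)");

    // Returns false (and nulls) when the line does not have the assumed shape.
    public static bool TryParse(string line, out string last, out string first,
                                out string location, out string room)
    {
        last = first = location = room = null;
        Match m = PatientLine.Match(line);
        if (!m.Success) return false;
        last = m.Groups["last"].Value.Trim();
        first = m.Groups["first"].Value;
        location = m.Groups["loc"].Value;
        room = m.Groups["room"].Value;
        return true;
    }
}
```

Named groups keep the extraction readable; a multi-word first name would break the `first` group, which is one reason to test against more real files.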
regards, mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
09 Jan 2004 10:59 PST
I am sending you a copy as it is in RTF.
CCMC
Order Session
Groschel, Hans MS6 11-01 25y F tending: GAGNON, PAUL H
07/08/197 0111111/
0000022222222
BREASTMILK -C Requested: 01/08/2004 ASAP Active
Soy Free, 1.5 gms Enfamil powder/100cc bm 22 kcals/oz, 4 gms Prosobee
powder/100cc bm 26 kcals/oz, 4 gms Lactofree powder/100cc bm 26 kcals/oz,
24 kcals/oz = 3 gm EnfaCare powder (1-1/4 tsp)/100cc bm, 24 kcals/oz = 3 gm
Nutramigen per 100cc bm, 24 kca...
|
Request for Question Clarification by
mathtalk-ga
on
09 Jan 2004 14:38 PST
Hi, Amy:
This is actually a plain text representation, rather than RTF, but
it will do as a starting point. I'll explain the difference in a
Comment below, but first please point out to me the fields in this
text which are of interest.
I imagine that the first name shown is the patient's name, and that
the following information contains the room number, location, and
medical number that you will need to extract. But please write a
short Clarification which picks them out for me.
regards, mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
10 Jan 2004 20:07 PST
Patient name. Groschel, Hans
Room # 11
Loc- MS6
Medical Rec # 0111111/ 0000022222222
Thank you
|
Clarification of Question by
amy123456-ga
on
11 Jan 2004 09:16 PST
Thank you. I cannot wait to try it.
|
Clarification of Question by
amy123456-ga
on
12 Jan 2004 13:16 PST
Hi. We decided that we will not use the medical rec #.
That should make it slightly better.
Thank you
|
Request for Question Clarification by
mathtalk-ga
on
12 Jan 2004 13:43 PST
Hi, Amy:
Yes, that will simplify things. Now I'm going to create some sample
RTF using Microsoft Word, but this will probably be more complicated
in some ways than the RTF file being saved (created?) by your Remote
Print Manager.
So how about having a look for yourself, using Notepad or another
"plain" text editor, to see what the RTF codes look like. (I don't
want you to be shocked, so be sitting down when you try this!).
regards, mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
13 Jan 2004 11:48 PST
RPM is actually sending plain text.
Thank you
|
Clarification of Question by
amy123456-ga
on
14 Jan 2004 12:05 PST
You cannot modify or comment on this question right now. It is
currently being answered.
Why is this on my screen? I cannot do or see any action. Please advise.
thank you
|
Request for Question Clarification by
mathtalk-ga
on
14 Jan 2004 20:56 PST
Hi, Amy:
That was me, "locking" the question. Typically while a Researcher is
working on an Answer, the Question cannot be updated except for the
Customer to post a Clarification (as you did). I don't think anybody
else is apt to jump in, so if you like I can leave the Question
unlocked while I continue to work on it.
Let me know if you need further explanation. The locking is most
useful for fairly quick Questions, to avoid two Researchers working on
an Answer without being aware of what the other is doing.
regards, mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
15 Jan 2004 03:20 PST
Hi. Thank you.
PS - Do you have an idea when you might have a solution? I am under
some pressure at work.
PS - This project has many questions, and I hope you can continue to
work with me as I post new questions!
Thank you very much
|
Request for Question Clarification by
mathtalk-ga
on
15 Jan 2004 05:18 PST
Hi, Amy:
I'll post a working "console" program tonight that reads the text file
and prints out these items:
Patient last name
Patient first name
Room number
Location
You can try the program on some "real" data and we can then tweak the
code as we need to fix any problems.
My idea is to search for the first non-blank line (actually, a line
containing a comma) after the first line which contains "Order
Session". That line should then be the one that contains all the
required information to extract.
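The search idea above can be sketched over a sequence of lines like this (hypothetical names; the file reading is left out so the logic can be tried on any list of strings):

```csharp
using System.Collections.Generic;

static class OrderSessionSearch
{
    // Returns the first line containing a comma that appears after the first
    // line containing "Order Session", or null if either step fails.
    public static string FindPatientLine(IEnumerable<string> lines)
    {
        bool headerSeen = false;
        foreach (string line in lines)
        {
            if (!headerSeen)
            {
                // Skip everything up to and including the header line.
                headerSeen = line.Contains("Order Session");
                continue;
            }
            if (line.Contains(","))
                return line;
        }
        return null;
    }
}
```

Returning null (rather than throwing) gives the caller one place to report "header not found" or "no patient line found".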
Will this work for you?
regards, mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
15 Jan 2004 07:04 PST
Yes
Thank you
|
Request for Question Clarification by
mathtalk-ga
on
16 Jan 2004 05:55 PST
Hi, amy:
I've posted my draft C# code below as a Comment. I guess we need to
discuss which version of the .Net Framework Class Library you will be
using. This is pretty basic stuff, so I hoped it would work the same
on both 1.0 and 1.1, but it looks like they may have left out support
for regular expressions in the early copy of 1.1 that I installed.
I created an empty C# project called textextract and added the
(existing) file shown in the Comment as textextract.cs. Give it a
shot and let me know how it works for you. I tried to give some
thought to how the code might work if some of the text items are
missing, and this is the sort of thing that one spends a lot of time
testing.
regards, mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
16 Jan 2004 06:52 PST
Thank you. I have loaded it in Visual Studio C#.
I received some errors:
C:\JavaPrac\Read File web\Class1.cs(46): 'System.IO.StreamReader' does not contain a definition for 'Readline'
C:\JavaPrac\Read File web\Class1.cs(61): 'System.IO.StreamReader' does not contain a definition for 'Readline'
C:\JavaPrac\Read File web\Class1.cs(50): Cannot implicitly convert type 'int' to 'bool'
C:\JavaPrac\Read File web\Class1.cs(78): Method 'string.Trim(params char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(69): Method 'string.Trim(params char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(84): Method 'string.Trim(params char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(87): Method 'string.Trim(params char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(93): Method 'string.Trim(params char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(96): Method 'string.Trim(params char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(50): The left-hand side of an assignment must be a variable, property or indexer
C:\JavaPrac\Read File web\Class1.cs(38): The type or namespace name 'sr' could not be found (are you missing a using directive or an assembly reference?)
|
Request for Question Clarification by
mathtalk-ga
on
16 Jan 2004 07:10 PST
Hi, amy:
Some of these are simple syntax errors (such as Readline instead of
ReadLine, my bad). But which .Net Framework Class Library are you
using, 1.0 or 1.1?
--mathtalk
|
Clarification of Question by
amy123456-ga
on
16 Jan 2004 07:28 PST
FCL-1.1
|
Clarification of Question by
amy123456-ga
on
16 Jan 2004 07:49 PST
I am getting an unhandled exception of type 'System.IndexOutOfRangeException'
in Read File web.exe.
class TxtXtrct
{
public static void Main(string[] args)
{
if(args==null)
{
Console.WriteLine("c:\\lightsr\test22.txt.");
}
else
{
TxtXtrct.ListItems(args[0]);
}
return;
}
|
Clarification of Question by
amy123456-ga
on
16 Jan 2004 07:53 PST
Can you check whether the file location I used is correct? Thank you
file loc: C:\lightsr\order.txt
using System;
using System.IO;
using System.Text;
using System.Data;
using System.Text.RegularExpressions;
namespace textextract
{
/// <summary>
/// Main Class TxtXtrct for text extract console app.
/// </summary>
class TxtXtrct
{
public static void Main(string[] args)
{
if(args==null)
{
Console.WriteLine("c:\\lightsr\test22.txt.");
}
else
{
TxtXtrct.ListItems(args[0]);
}
return;
}
/// <summary>
/// Method for extracting text items from file
/// </summary>
/// <param name="filename">root/path/filename.ext</param>
static void ListItems(string filename)
{
if (!File.Exists("c:\\lightsr\\order.rtf"))
{
Console.WriteLine(
"Error: Specified file {0} does not exist.",
filename);
//sr.Close();
return;
}
StreamReader sr = File.OpenText("c:\\lightsr\\order.rtf");
while(sr.Peek() > -1)
{
if (Regex.IsMatch(sr.ReadLine(), "Order Session"))
break;
}
if (sr.Peek() == -1)//test
{
Console.WriteLine(
"Error: Order Session header was not found.");
sr.Close();
return;
}
while (sr.Peek() > -1)
{
string extractLine = sr.ReadLine();
if (Regex.IsMatch(extractLine, ","))
{
// found the line with the items of interest
// the last name is the trimmed part in front of the comma
string lastname = extractLine.Substring(
0, extractLine.IndexOf(",")).Trim();
Console.WriteLine("Last name: " + lastname);
// the first name is trimmed next part up to triple space
string extract1 = extractLine.Substring(
extractLine.IndexOf(",") + 1);
string firstname = extract1.Substring(
0, extract1.IndexOf(" ")).Trim();
Console.WriteLine("First name: " + firstname);
// the location is next after first name
string extract2 = extract1.Substring(
extract1.IndexOf(firstname + " ")).Trim(); // test
string location = extract2.Substring(
extract2.IndexOf(" ")).Trim();
Console.WriteLine("Location: " + location);
// the last item room number is next, up to a dash
string extract3 = extract2.Substring(
extract2.IndexOf(location + " ")).Trim();
string roomnumber = extract3.Substring(
extract3.IndexOf("-")).Trim();
Console.WriteLine("Room number: " + roomnumber);
break;
}
}
sr.Close();
return;
}
}
}
|
Request for Question Clarification by
mathtalk-ga
on
16 Jan 2004 09:32 PST
Hi, amy:
Thanks, I will focus on using FCL 1.1 as well. The index out of range
is no doubt a bug in my use of IndexOf. I will step through the code
and get that cleaned up, then provide the necessary fixes.
--mathtalk
|
Clarification of Question by
amy123456-ga
on
16 Jan 2004 09:47 PST
Thank you
|
Request for Question Clarification by
mathtalk-ga
on
16 Jan 2004 11:21 PST
Hi, if you want to hard code the filename, then do it this way:
class TxtXtrct
{
public static void Main(string[] args)
{
TxtXtrct.ListItems("c:\\lightsr\\test22.txt"); // each backslash must be doubled ("\t" would be a tab)
return;
}
I think the error you were seeing comes from calling the program
with no command line arguments. My code to detect this needs some
fixing, but the above will get you past that point.
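One way to detect the missing argument is to check args.Length rather than args == null: when a C# program is started with no command line arguments, Main receives an empty array, not null, so the null test never fires and args[0] throws IndexOutOfRangeException. The sketch below falls back to a default file; the default path is an assumption taken from the file location mentioned earlier in this thread, and the names are hypothetical.

```csharp
using System;

static class CommandLineSketch
{
    // Hypothetical default, taken from the file location mentioned in this thread.
    const string DefaultFile = @"C:\lightsr\order.txt";

    // Use the first command line argument if one was given, otherwise the default.
    // Main gets a zero-length array (not null) when no arguments are supplied,
    // so Length is the check that prevents the IndexOutOfRangeException.
    public static string ResolveFileName(string[] args)
    {
        return (args != null && args.Length > 0) ? args[0] : DefaultFile;
    }
}
```

Main could then call TxtXtrct.ListItems(CommandLineSketch.ResolveFileName(args)).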
regards, mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
17 Jan 2004 21:28 PST
Hi. It is looking good. I just have a few questions.
for the location i get the following return- MS6 11-01 25y F
tending: GAGNON, PAUL H
for roomnumber i get nothing.
Thank you very much.
// the location is next after first name
string extract2 = extract1.Substring(
extract1.IndexOf(firstname + " ")).Trim(); // test
string location = extract2.Substring(
extract2.IndexOf(" ")).Trim();
Console.WriteLine("Location: " + location);
// the last item room number is next, up to a dash
string extract3 = extract2.Substring(
extract2.IndexOf(location + "-")).Trim();
string roomnumber = extract3.Substring(
extract3.IndexOf(" ")).Trim(); // -
Console.WriteLine("Room number: " + roomnumber);
|
Clarification of Question by
amy123456-ga
on
18 Jan 2004 08:16 PST
Hi I continue to receive this error
An unhandled exception of type 'System.ArgumentOutOfRangeException'
occurred in mscorlib.dll
Additional information: StartIndex cannot be less than zero.
The program '[2520] Parse file out1.exe' has exited with code 0 (0x0).
|
Request for Question Clarification by
mathtalk-ga
on
18 Jan 2004 08:34 PST
The location and room number problems and the invalid index exception
are related, and I've fixed these bugs in my current version of the
code. I can post the debugged code as an Answer, but perhaps we
should discuss an idea that might make the extraction more robust.
Is it possible to give a concise description of the valid locations?
I'm thinking that if we always have a location (which must conform to
a very tight specification) then by focusing in on that item first, we
will be able to extract the name (firstname, lastname) in a more
reliable fashion. There are many variations in names, including the
possibility of missing (say) a first name (or having multiple first
names).
Also, what is the format of the room number? When I was first coding
the extraction I thought the room number would be just a two digit
value. But as I reflect on your original example, I'm wondering if
the room number (which you showed as simply 11) being given as 11-01
might mean "11th floor, room 1" or something of the sort. Anyway,
for now I've just extracted the digits in front of the hyphen to be
consistent with your earlier clarification. I'm producing a string
value, but this can easily be converted to an integer value if that
would be better for your purposes.
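Converting the room string to an integer could be done along these lines (a sketch with hypothetical names): take the digits in front of the hyphen and parse them, returning -1 instead of throwing when there is nothing usable.

```csharp
using System;

static class RoomNumberSketch
{
    // Converts a "room-bed" token such as "11-01" to the room as an int (11).
    // A token without a hyphen is parsed as a whole; -1 means "no valid room".
    public static int ParseRoom(string token)
    {
        int dash = token.IndexOf('-');
        string roomPart = dash >= 0 ? token.Substring(0, dash) : token;
        int room;
        return int.TryParse(roomPart.Trim(), out room) ? room : -1;
    }
}
```

Using int.TryParse avoids a FormatException when the text in front of the hyphen is not a number.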
regards, mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
18 Jan 2004 11:40 PST
Hi.
First of all, thank you.
MS6 11-1 means Medical Surgical, 6th floor, room 11, bed 1.
Do not worry about the bed #, because we only have one patient per room.
|
Request for Question Clarification by
mathtalk-ga
on
18 Jan 2004 13:25 PST
Okay, that takes care of the room number. Would it be safe to assume
that the location will always be two letters + a numeric part (as in
"MS6")?
--mathtalk-ga
|
Clarification of Question by
amy123456-ga
on
18 Jan 2004 13:42 PST
The configuration for the hospital is:
Floors - MS6, MS7, MS8 - Room # 1-28
Units - PICU, NICU, MS8I - Room # 1-26
PS. When you post the answer for this part, let me know when I can
post another question, all for the same project. Thank you
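With that configuration, the "location first" idea could be sketched like this: match one of the known locations followed by a room-bed token, check the room against the ranges given above, and treat whatever precedes the location as the patient name. The class and method names are hypothetical, and the sketch assumes the location always appears before the attending physician's name on the line.

```csharp
using System;
using System.Text.RegularExpressions;

static class LocationFirstSketch
{
    // Locations from the hospital configuration above, followed by "room-bed".
    static readonly Regex LocationAndRoom = new Regex(
        @"\b(?<loc>MS8I|MS[678]|PICU|NICU)\s+(?<room>\d+)-\d+");

    public static bool TryExtract(string line, out string name, out string location, out int room)
    {
        name = location = null;
        room = 0;
        Match m = LocationAndRoom.Match(line);
        if (!m.Success) return false;

        location = m.Groups["loc"].Value;
        room = int.Parse(m.Groups["room"].Value);

        // Floors MS6-MS8 have rooms 1-28; units PICU, NICU and MS8I have rooms 1-26.
        int maxRoom = location.StartsWith("MS") && location != "MS8I" ? 28 : 26;
        if (room < 1 || room > maxRoom) return false;

        // Whatever precedes the location is taken as the patient name, however
        // many words it has (e.g. "Last, First Middle" or just "Last").
        name = line.Substring(0, m.Index).Trim();
        return true;
    }
}
```

Anchoring on the tightly specified location sidesteps most of the name variations (missing or multiple first names) discussed above.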
|
Hi, amy123456-ga:
Here is a revised version of the code which properly extracts the data
items from the sample file.
Some comments on how it works follow the code.
/* begin textextract.cs */
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
namespace textextract
{
/// <summary>
/// Main Class TxtXtrct for text extract console app.
/// </summary>
class TxtXtrct
{
public static void Main(string[] args)
{
if (args.Length == 0)
{
Console.WriteLine("You must specify a filename.");
}
else
{
TxtXtrct.ListItems(args[0]);
}
return;
}
/// <summary>
/// Method for extracting text items from file
/// </summary>
/// <param name="filename">root/path/filename.ext</param>
static void ListItems(string filename)
{
if (!File.Exists(filename))
{
Console.WriteLine(
"Error: Specified file {0} does not exist.",
filename);
return;
}
StreamReader sr = File.OpenText(filename);
while(sr.Peek() > -1)
{
if (Regex.IsMatch(sr.ReadLine(), "Order Session"))
break;
}
if (sr.Peek() == -1)
{
Console.WriteLine(
"Error: Order Session header was not found.");
sr.Close();
return;
}
while (sr.Peek() > -1)
{
string extractLine = sr.ReadLine();
if (Regex.IsMatch(extractLine, ","))
{
// found the line with the items of interest
// the last name is the trimmed part in front of the comma
string lastname = extractLine.Substring(
0, extractLine.IndexOf(",")).Trim();
Console.WriteLine("Last name: " + lastname);
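// the first name is the next part, up to the following space
// (TrimStart drops the blank after the comma; without it IndexOf(" ")
// returns 0 and the first name comes out empty)
string extract1 = extractLine.Substring(
extractLine.IndexOf(",") + 1).TrimStart();
string firstname = extract1.Substring(
0, extract1.IndexOf(" ")).Trim();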
Console.WriteLine("First name: " + firstname);
// the location is next after first name
string extract2 = extract1.Substring(
extract1.IndexOf(firstname + " ")
+ firstname.Length ).Trim();
string location = extract2.Substring(
0, extract2.IndexOf(" ")).Trim();
Console.WriteLine("Location: " + location);
// the last item room number is next, up to a dash
string extract3 = extract2.Substring(
extract2.IndexOf(location + " ")
+ location.Length ).Trim();
string roomnumber = extract3.Substring(
0, extract3.IndexOf("-")).Trim();
Console.WriteLine("Room number: " + roomnumber);
break;
}
}
sr.Close();
return;
}
}
}
/* end of textextract.cs */
This program is intended to illustrate some useful techniques for
extracting specific data items from a text file. First, of course, we
test for the existence of a file (given its name) and open that file
with a "StreamReader" object.
Then we read lines from the StreamReader and manipulate the resulting
strings in various ways. The "regular expression" class Regex allows
us to match a variety of patterns within strings, but the use made
here is minimal. We look first for a line that contains "Order
Session", and subsequently for a line that has a comma.
Once we have that line, we use the Substring and IndexOf methods of
the string class to extract the data items we need.
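A slightly more defensive way to do the file part is sketched below (hypothetical names). The using block closes the StreamReader even if an exception is thrown, and ReadLine() returning null marks the end of the file, so Peek() is not needed.

```csharp
using System.Collections.Generic;
using System.IO;

static class FileLinesSketch
{
    // Reads every line of a text file; returns an empty list if the file is missing.
    public static List<string> ReadLines(string filename)
    {
        var lines = new List<string>();
        if (!File.Exists(filename))
            return lines;
        // using guarantees sr.Dispose() (which closes the file) on every exit path.
        using (StreamReader sr = File.OpenText(filename))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }
}
```

The lines can then be searched for the "Order Session" header and the comma line without worrying about closing the file on each early return.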
This code:
lastname = extractLine.Substring(0, extractLine.IndexOf(",")).Trim();
illustrates the ideas. We find where the comma is in the string
extractLine with the expression:
extractLine.IndexOf(",")
The characters in a string are identified by zero-based indexing.
Thus when we ask for the substring beginning at position 0 (the first
character) having length extractLine.IndexOf(","), this gives
everything in front of (but not including) the comma character. The
final Trim() method removes any whitespace from both ends of the
result.
In the next step we use Substring with a single argument:
extract1 = extractLine.Substring(extractLine.IndexOf(",") + 1);
which starts at one position beyond the comma and takes the rest of
the string. From here we would continue to extract the first name and
other items until we are finished.
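The two Substring forms described above can be tried in isolation with a small sketch like this (hypothetical names). It also checks for a missing comma, since Substring(0, -1) would throw ArgumentOutOfRangeException.

```csharp
using System;

static class SubstringWalkthrough
{
    // Returns { lastname, rest } for a line such as "Groschel, Hans MS6 11-01",
    // using the same Substring/IndexOf steps as the answer's code.
    public static string[] SplitAtComma(string extractLine)
    {
        int comma = extractLine.IndexOf(",");   // zero-based position of the comma, or -1
        if (comma < 0)
            return null;                        // no comma: nothing to split
        string lastname = extractLine.Substring(0, comma).Trim();  // characters 0..comma-1
        string extract1 = extractLine.Substring(comma + 1);        // everything after the comma
        return new string[] { lastname, extract1 };
    }
}
```

For "Groschel, Hans MS6 11-01" the comma is at index 8, so the last name is "Groschel" and the rest is " Hans MS6 11-01", still carrying the blank that follows the comma.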
If you have any questions about the details of the code, please post a
Request for Clarification.
The structure of the code is linear, in the sense of proceeding forward
through the file in one pass. While this seems adequate for the
immediate task, parsing text often requires a more complicated
arrangement. As your project progresses you might need to revisit
this code and write more elaborate search logic. Especially if the
code proves unreliable, we might need to wrap the search in a
try/catch exception handler. Further testing will hopefully help you
decide if this is the case.
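A try/catch wrapper of the kind mentioned above might look like this sketch (hypothetical names). It turns a line that lacks the expected spaces into a null result instead of an unhandled ArgumentOutOfRangeException; checking each IndexOf result for -1 before calling Substring is the cheaper alternative.

```csharp
using System;

static class SafeExtractSketch
{
    // Returns the location (the word after the first name), or null when the
    // line does not have the expected "Last, First LOC ..." shape.
    public static string TryGetLocation(string extractLine)
    {
        try
        {
            string afterComma = extractLine.Substring(extractLine.IndexOf(",") + 1).TrimStart();
            string afterFirst = afterComma.Substring(afterComma.IndexOf(" ")).TrimStart();
            return afterFirst.Substring(0, afterFirst.IndexOf(" "));
        }
        catch (ArgumentOutOfRangeException)
        {
            // A missing space makes IndexOf return -1, which Substring rejects.
            return null;
        }
    }
}
```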
regards, mathtalk-ga |
Request for Answer Clarification by
amy123456-ga
on
19 Jan 2004 09:44 PST
Hi. The code is not running at all; it starts, blinks, and terminates.
|
Request for Answer Clarification by
amy123456-ga
on
19 Jan 2004 09:53 PST
I can get this far: ***** Console.WriteLine("order.rt"); ******
namespace textextract
{
/// <summary>
/// Main Class TxtXtrct for text extract console app.
/// </summary>
class TxtXtrct
{
public static void Main(string[] args)
{
if (args.Length == 0)
{
***** Console.WriteLine("order.rt");*****
|
Clarification of Answer by
mathtalk-ga
on
19 Jan 2004 11:59 PST
Hi, amy:
Let me be a bit more specific about how to run the code as a standalone program.
I started with a new "empty project" in C#, then added the file
textextract.cs to the project as an existing file. It was then
necessary (to support the System.Text.RegularExpressions namespace,
though none of the others) to add a Reference to the Project for
System.dll (right click on the project and do Add Reference...;
System.dll can be found in alphabetic order under the .Net tab of the
dialog).
From there the program can be executed by putting the path to the file
you want to extract text from on the command line, e.g.
textextract C:\testdata.txt
The program checks the length of the command line arguments (an array
of string) and if this is greater than 0, it tests for the existence
of the file being named by args[0], the first command line argument.
There is a setting under the Debug options to supply the command line
arguments in the debugging mode. Is that how you are trying to run
the program?
If you think it would be expeditious, I can zip up my entire project and
post it where you can download it.
regards, mathtalk-ga
|