Q: C# split-- read from a file and split the file for specific data ( Answered 5 out of 5 stars,   9 Comments )
Question  
Subject: C# split-- read from a file and split the file for specific data
Category: Computers > Programming
Asked by: amy123456-ga
List Price: $75.00
Posted: 08 Jan 2004 14:53 PST
Expires: 07 Feb 2004 14:53 PST
Question ID: 294531
I need to read from a file and split the file for specific data and
send the results to an index in C#

Thank you

Request for Question Clarification by mathtalk-ga on 08 Jan 2004 15:51 PST
Hi, amy123456-ga:

Would you clarify the requirements "for specific data" and "send the
results to an index", please?

In Unix there is a utility program "split" which divides an input file
into one or more output files of up to some size (measured in lines,
bytes, etc.).  Another program is named "csplit" and makes a
subdivision of a file based on "context" defined by some command line
arguments, e.g. regular expressions that may match lines within the
input file.

Do either of those functions resemble what you are after?

Obviously the chances of helping you to write a program in C# (if
that's your goal) will be greatly enhanced by working from some clear
requirements.

regards, mathtalk-ga

Clarification of Question by amy123456-ga on 08 Jan 2004 17:14 PST
specific data:  The file contains patient information. I need to parse
or split the following from the file (location, room #, patient name &
medical rec #).

index:  After I remove the data from the file I need it to go to an
        index file. I will need this file for manipulation later.

Thank you

Request for Question Clarification by mathtalk-ga on 08 Jan 2004 17:36 PST
So, would it be accurate to say that you are trying to extract some data items:

location, room #, patient name & medical rec #

from an input file and write them (in the form of a fixed width
record) to an output file called "index"?

--mathtalk-ga

Clarification of Question by amy123456-ga on 08 Jan 2004 18:36 PST
extract some data items from an input file : Yes - 

write to a listbox as an index within the application.

Thank you for the help.

I am willing to pay for your help, more than what has been stated. I
know $10.00 is nothing!   Thank you

Request for Question Clarification by mathtalk-ga on 08 Jan 2004 18:47 PST
Are you adding code to an existing application?  Or are you creating a
.Net based GUI application from scratch?

We could drill down on the details of doing what you originally asked
about (extracting data from a file), but I think it will be helpful to
picture the context surrounding this function in a little detail.

regards,
mathtalk-ga

Request for Question Clarification by mathtalk-ga on 09 Jan 2004 07:51 PST
Hi, amy123456-ga:

I've asked the Google Answers editors to remove your last
clarification because it contains personal information.  Unfortunately
as a Google Answers Researcher I am not able to contact you outside of
the Web site here.  However the information you provided about the
application is very helpful, so I have copied it here:

(Amy123456-ga wrote:)

I am building an application in C#.  

The process: A physician enters an order from another application,
it sends a print job to RPM (remote print manager), RPM sends this
print job to a file as RTF.

Now I need to retrieve the file and read the information for (Patient name,
Room #, Loc, Med rec #). It needs to go to an index within the
application because I have to retrieve the room number, because I use
the room # in an array.  When it finds the room #, it will send a
command to the com port (I have built a box that has 28 rooms, which
represents the unit rooms) to light up the room #, indicating to the
staff that this room has an active order in the system.

(end of Amy123456-ga's clarification)

Now the RTF format is the next thing we need to tackle.  This is
basically a text file, and it should be possible to locate the
information you need by using regular expressions to match the
particular data items.  To test this idea, let's begin with some
information about the files.  You may want to try generating a few of
these for our testing/design requirements.  Let me know how big these
files are and we can take a look at them.  One way to do this is to
open the file in an editor such as DevStudio (the Visual Studio code
editor), but you may have another favorite such as TextPad that you
prefer.

You will see that the beginning of an RTF file has a lot of
complicated looking configuration data: colors, fonts, etc.  Lines in
an RTF file tend to be long, but not ridiculously long because the RTF
specs limit it.  Skipping past the early section, you will begin to
see the paragraphs of the printed content.  The data items you are
looking for should be easily recognizable by a human (you), and we'll
probably need to cook up a regular expression to match that reliably.

regards, mathtalk-ga

Clarification of Question by amy123456-ga on 09 Jan 2004 10:59 PST
I am sending you a copy as it is in RTF. 



                                       CCMC
                                  Order Session


     Groschel, Hans       MS6 11-01         25y   F tending: GAGNON, PAUL H
                                             07/08/197                0111111/ 
                                                                  0000022222222
        BREASTMILK -C                Requested:  01/08/2004 ASAP    A
                                                                     ctive

     Soy Free, 1.5 gms Enfamil powder/100cc bm 22 kcals/oz, 4 gms Prosobee 
     powder/100cc bm 26 kcals/oz, 4 gms Lactofree powder/100cc bm 26 kcals/oz, 
     24 kcals/oz = 3 gm EnfaCare powder (1-1/4 tsp)/100cc bm, 24 kcals/oz = 3 gm 
     Nutramigen per 100cc bm, 24 kca...

Request for Question Clarification by mathtalk-ga on 09 Jan 2004 14:38 PST
Hi, Amy:

This is actually a plain text representation, rather than the RTF, but
it will do as a starting point.  I'll explain the difference in a
Comment below, but first please point out to me the fields in this
text which are of interest.

I imagine that the first name shown is the patient's name, and that
the following information contains the room number, location, and
medical number that you will need to extract.  But please write a
short Clarification which picks them out for me.

regards, mathtalk-ga

Clarification of Question by amy123456-ga on 10 Jan 2004 20:07 PST
Patient name.      Groschel, Hans       
Room #     11
Loc-      MS6 
Medical Rec #   0111111/ 0000022222222


Thank you

Clarification of Question by amy123456-ga on 11 Jan 2004 09:16 PST
Thank you. i can not wait to try it.

Clarification of Question by amy123456-ga on 12 Jan 2004 13:16 PST
Hi. We decided that we will not use the medical rec.

That should make it slightly better.

Thank you

Request for Question Clarification by mathtalk-ga on 12 Jan 2004 13:43 PST
Hi, Amy:

Yes, that will simplify things.  Now I'm going to create some sample
RTF using Microsoft Word, but this will probably be more complicated
in some ways than the RTF file being saved (created?) by your Remote
Print Manager.

So how about having a look for yourself, using Notepad or another
"plain" text editor, to see what the RTF codes look like.  (I don't
want you to be shocked, so be sitting down when you try this!).

regards, mathtalk-ga

Clarification of Question by amy123456-ga on 13 Jan 2004 11:48 PST
RPM is actually sending plain text.

Thank you

Clarification of Question by amy123456-ga on 14 Jan 2004 12:05 PST
You cannot modify or comment on this question right now. It is
currently being answered.

Why is this on my screen? I cannot do or see any action. Please advise.

thank you

Request for Question Clarification by mathtalk-ga on 14 Jan 2004 20:56 PST
Hi, Amy:

That was me, "locking" the question.  Typically while a Researcher is
working on an Answer, the Question cannot be updated except for the
Customer to post a Clarification (as you did).  I don't think anybody
else is apt to jump in, so if you like I can leave the Question
unlocked while I continue to work on it.

Let me know if you need further explanation.  The locking is most
useful for fairly quick Questions, to avoid two Researchers working on
an Answer without being aware of what the other is doing.

regards, mathtalk-ga

Clarification of Question by amy123456-ga on 15 Jan 2004 03:20 PST
Hi. thank you.

PS - Do you have an idea when you might have a solution? I am under
some pressure at work.

PS- This project has many questions which i hope you can continue to
work with me as i post new questions!

Thank you very much

Request for Question Clarification by mathtalk-ga on 15 Jan 2004 05:18 PST
Hi, Amy:

I'll post a working "console" program tonight that reads the text file
and prints out these items:

Patient last name
Patient first name
Room number
Location

You can try the program on some "real" data and we can then tweak the
code as we need to fix any problems.

My idea is to search for the first non-blank line (actually, a line
containing a comma) after the first line which contains "Order
Session".  That line should then be the one that contains all the
required information to extract.

Will this work for you?

regards, mathtalk-ga
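
The search described above (the first line containing a comma that
follows the "Order Session" header) can be sketched roughly as below.
This is only an illustration under stated assumptions: the class and
method names are hypothetical, and the in-memory sample stands in for
a real RPM file.

```csharp
using System;
using System.IO;

static class OrderLineFinder
{
    // Returns the first line containing a comma that appears after the
    // first line containing "Order Session", or null if there is none.
    public static string FindPatientLine(TextReader reader)
    {
        string line;
        bool headerSeen = false;

        while ((line = reader.ReadLine()) != null)
        {
            if (!headerSeen)
            {
                // Still looking for the header; ignore everything else.
                headerSeen = line.IndexOf(
                    "Order Session", StringComparison.Ordinal) >= 0;
                continue;
            }

            if (line.IndexOf(',') >= 0)
                return line;
        }

        return null;
    }

    static void Main()
    {
        // Hypothetical sample text, not a real order file.
        string sample =
            "   CCMC\n" +
            "   Order Session\n" +
            "\n" +
            "   Groschel, Hans       MS6 11-01\n";

        Console.WriteLine(FindPatientLine(new StringReader(sample)));
    }
}
```

Returning null when either the header or the comma line is missing
lets the caller decide how to report the problem.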

Clarification of Question by amy123456-ga on 15 Jan 2004 07:04 PST
Yes  

Thank you

Request for Question Clarification by mathtalk-ga on 16 Jan 2004 05:55 PST
Hi, amy:

I've posted my draft C# code below as a Comment.  I guess we need to
discuss which version of the .Net Framework Class Library you will be
using.  This is pretty basic stuff, so I hoped it would work the same
on both 1.0 and 1.1, but it looks like they may have left out support
for regular expressions in the early copy of 1.1 that I installed.

I created an empty C# project called textextract and added the
(existing) file shown in the Comment as textextract.cs.  Give it a
shot and let me know how it works for you.  I tried to give some
thought to how the code might work if some of the text items are
missing, and this is the sort of thing that one spends a lot of time
testing.

regards, mathtalk-ga

Clarification of Question by amy123456-ga on 16 Jan 2004 06:52 PST
Thank you. I have loaded in visual studio C#.

I received some errors:

C:\JavaPrac\Read File web\Class1.cs(46): 'System.IO.StreamReader' does
not contain a definition for 'Readline'
C:\JavaPrac\Read File web\Class1.cs(61): 'System.IO.StreamReader' does
not contain a definition for 'Readline'
C:\JavaPrac\Read File web\Class1.cs(50): Cannot implicitly convert
type 'int' to 'bool'
C:\JavaPrac\Read File web\Class1.cs(78): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(69): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(84): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(87): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(93): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(96): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(50): The left-hand side of an
assignment must be a variable, property or indexer
C:\JavaPrac\Read File web\Class1.cs(38): The type or namespace name
'sr' could not be found (are you missing a using directive or an
assembly reference?)

Request for Question Clarification by mathtalk-ga on 16 Jan 2004 07:10 PST
Hi, amy:

Some of these are simple syntax errors (such as Readline instead of
ReadLine, my bad).  But which .Net Framework Class Library are you
using, 1.0 or 1.1?

--mathtalk

Clarification of Question by amy123456-ga on 16 Jan 2004 07:28 PST
FCL-1.1

Clarification of Question by amy123456-ga on 16 Jan 2004 07:49 PST
I am getting: an unhandled exception of type 'System.IndexOutOfRangeException'
occurred in Read File web.exe.

class TxtXtrct
	{
		public static void Main(string[] args)
		{
			if(args==null)
			{
				Console.WriteLine("c:\\lightsr\test22.txt.");
			}
			else
			{
				TxtXtrct.ListItems(args[0]);
			}
			return;
		}

Clarification of Question by amy123456-ga on 16 Jan 2004 07:53 PST
Can you see if the file loc that i placed is correct. Thank you


file loc: C:\lightsr\order.txt

using System;
using System.IO;
using System.Text;
using System.Data;
using System.Text.RegularExpressions;




namespace textextract
{
	/// <summary>
	/// Main Class TxtXtrct for text extract console app.
	/// </summary>
	class TxtXtrct
	{
		public static void Main(string[] args)
		{
			if(args==null)
			{
				Console.WriteLine("c:\\lightsr\test22.txt.");
			}
			else
			{
				TxtXtrct.ListItems(args[0]);
			}
			return;
		}

		/// <summary>
		/// Method for extracting text items from file
		/// </summary>
		/// <param name="filename">root/path/filename.ext</param>
		static void ListItems(string filename)
		{
			if (!File.Exists("c:\\lightsr\\order.rtf"))
			{
				Console.WriteLine(
					"Error: Specified file {0} does not exist.",
					filename);

				//sr.Close();
				return;
			}

			StreamReader sr = File.OpenText("c:\\lightsr\\order.rtf");
        
			while(sr.Peek() > -1)
			{
				if (Regex.IsMatch(sr.ReadLine(), "Order Session"))
					break;
			}
        
			if (sr.Peek() == -1)//test
			{
				Console.WriteLine(
					"Error: Order Session header was not found.");

				sr.Close();
				return;
			}
        
			while (sr.Peek() > -1)
			{
				string extractLine = sr.ReadLine();
        
				if (Regex.IsMatch(extractLine, ","))
				{
					// found the line with the items of interest
          
					// the last name is the trimmed part in front of the comma
					string lastname = extractLine.Substring(
						0, extractLine.IndexOf(",")).Trim();
               
					Console.WriteLine("Last name: " + lastname);
          
					// the first name is trimmed next part up to triple space
					string extract1 = extractLine.Substring(
						extractLine.IndexOf(",") + 1);
          
					string firstname = extract1.Substring(
						0, extract1.IndexOf("   ")).Trim();
               
					Console.WriteLine("First name: " + firstname);
          
					// the location is next after first name
					string extract2 = extract1.Substring(
						extract1.IndexOf(firstname + "   ")).Trim(); // test
          
					string location = extract2.Substring(
						extract2.IndexOf(" ")).Trim();
          
					Console.WriteLine("Location: " + location);
          
					// the last item room number is next, up to a dash
					string extract3 = extract2.Substring(
						extract2.IndexOf(location + " ")).Trim();
          
					string roomnumber = extract3.Substring(
						extract3.IndexOf("-")).Trim();
          
					Console.WriteLine("Room number: " + roomnumber);
               
					break;
				}
			}
        
			sr.Close();
			return;
		}

	}
  
}

Request for Question Clarification by mathtalk-ga on 16 Jan 2004 09:32 PST
Hi, amy:

Thanks, I will focus on using FCL 1.1 as well.  The index out of range
is no doubt a bug in my use of IndexOf.  I will step through the code
and get that cleaned up, then provide the necessary fixes.

--mathtalk

Clarification of Question by amy123456-ga on 16 Jan 2004 09:47 PST
Thank you

Request for Question Clarification by mathtalk-ga on 16 Jan 2004 11:21 PST
Hi, if you want to hard code the filename, then do it this way:

class TxtXtrct
    {
	public static void Main(string[] args)
	{
	    TxtXtrct.ListItems("c:\\lightsr\\test22.txt");
	    return;
	}

I think the error you were seeing comes about from calling the program
with no command line arguments.  My code to detect this needs some
fixing, but the above will get you past that point.

regards, mathtalk-ga
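
One detail to watch when hard-coding a Windows path: in a regular C#
string literal a backslash starts an escape sequence, so a single
backslash followed by "t" becomes a tab character rather than part of
the path.  A small sketch of the two safe spellings and the pitfall
(the class name is hypothetical; the path is the one used in this
thread):

```csharp
using System;

static class PathLiterals
{
    // Regular literal: each backslash in the path is doubled.
    public const string Escaped = "c:\\lightsr\\test22.txt";

    // Verbatim literal (leading @): backslashes are taken as written.
    public const string Verbatim = @"c:\lightsr\test22.txt";

    // Pitfall: a single backslash before 't' is read as a tab character.
    public const string WithTab = "c:\\lightsr\test22.txt";

    static void Main()
    {
        Console.WriteLine(Escaped == Verbatim);         // True
        Console.WriteLine(WithTab.IndexOf('\t') >= 0);  // True
        Console.WriteLine(Escaped == WithTab);          // False
    }
}
```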

Clarification of Question by amy123456-ga on 17 Jan 2004 21:28 PST
Hi. it is looking good. I just have a few questions?

for the location i get the following return-  MS6 11-01      25y   F
tending: GAGNON, PAUL H

for roomnumber i get nothing.  

Thank you very much.


// the location is next after first name
					string extract2 = extract1.Substring(
						extract1.IndexOf(firstname + " ")).Trim(); // test
          
					string location = extract2.Substring(
					    extract2.IndexOf("   ")).Trim();
          
					Console.WriteLine("Location: " + location);
          
					// the last item room number is next, up to a dash
					string extract3 = extract2.Substring(
						extract2.IndexOf(location + "-")).Trim();
          
					string roomnumber = extract3.Substring(
						extract3.IndexOf(" ")).Trim(); // -
          
					Console.WriteLine("Room number: " + roomnumber);

Clarification of Question by amy123456-ga on 18 Jan 2004 08:16 PST
Hi I continue to receive this error

An unhandled exception of type 'System.ArgumentOutOfRangeException'
occurred in mscorlib.dll

Additional information: StartIndex cannot be less than zero.

The program '[2520] Parse file out1.exe' has exited with code 0 (0x0).

Request for Question Clarification by mathtalk-ga on 18 Jan 2004 08:34 PST
The location and room number problems and the invalid index exception
are related, and I've fixed these bugs in my current version of the
code.  I can post the debugged code as an Answer, but perhaps we
should discuss an idea that might make the extraction more robust.

Is it possible to give a concise description of the valid locations? 
I'm thinking that if we always have a location (which must conform to
a very tight specification) then by focusing in on that item first, we
will be able to extract the name (firstname, lastname) in a more
reliable fashion.  There are many variations in names, including the
possibility of missing (say) a first name (or having multiple first
names).

Also, what is the format of the room number?  When I was first coding
the extraction I thought the room number would be just a two digit
value.  But as I reflect on your original example, I'm wondering if
the room number (which you showed as simply 11) being given as 11-01
might mean "11th floor, room 1" or something of the sort.  Anyway,
for now I've just extracted the digits in front of the hyphen to be
consistent with your earlier clarification.  I'm producing a string
value, but this can easily be converted to an integer value if that
would be better for your purposes.

regards, mathtalk-ga

Clarification of Question by amy123456-ga on 18 Jan 2004 11:40 PST
Hi. 
First off, thank you.

MS6 11-1 means MEDICAL SURGICAL 6TH floor, room 11, bed 1.

Do not worry about the bed # BECAUSE WE ONLY HAVE ONE PATIENT PER ROOM.

Request for Question Clarification by mathtalk-ga on 18 Jan 2004 13:25 PST
Okay, that takes care of the room number.  Would it be safe to assume
that the location will always be two letters + a numeric part (as in
"MS6")?

--mathtalk-ga

Clarification of Question by amy123456-ga on 18 Jan 2004 13:42 PST
The configuration for the hospital is:

Floors -- MS6, MS7, MS8 - Room # 1-28

Units - PICU,  NICU,  MS8I  - Room # 1-26

PS. When you post the answer for this part, let me know when i can
post another question. all for the same project. Thank you
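
With the location codes listed above, the idea of anchoring the
extraction on the location can be sketched with a single regular
expression.  This is a hedged illustration, not the code posted in the
Answer: the class name, the group names, and the assumed layout (last
name, comma, first name, location, then room and bed joined by a
hyphen) are assumptions drawn from the one sample line in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

static class LocationAnchoredParser
{
    // Location codes come from the list above; the rest of the layout
    // is assumed from the single sample line.
    static readonly Regex LinePattern = new Regex(
        @"^\s*(?<last>[^,]+),\s*(?<first>.*?)\s+" +
        @"(?<loc>MS8I|MS[678]|PICU|NICU)\s+" +
        @"(?<room>\d{1,2})-\d+");

    // Returns { last, first, location, room }, or null when the line
    // does not fit the assumed layout.
    public static string[] Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success)
            return null;

        return new string[]
        {
            m.Groups["last"].Value.Trim(),
            m.Groups["first"].Value.Trim(),
            m.Groups["loc"].Value,
            m.Groups["room"].Value
        };
    }

    static void Main()
    {
        string[] items = Parse(
            "     Groschel, Hans       MS6 11-01         25y   F");

        Console.WriteLine(items == null
            ? "no match"
            : string.Join("/", items));
    }
}
```

Because the location must be one of a few fixed codes, the first name
is simply whatever sits between the comma and that code, which also
covers a missing first name or several first names.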
Answer  
Subject: Re: C# split-- read from a file and split the file for specific data
Answered By: mathtalk-ga on 19 Jan 2004 08:35 PST
Rated:5 out of 5 stars
 
Hi, amy123456-ga:

Here is a revised version of the code which properly extracts the data
items from the sample file.

Some comments on how it works follow the code.

/*  begin textextract.cs  */

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace textextract
{
  /// <summary>
  /// Main Class TxtXtrct for text extract console app.
  /// </summary>
  class TxtXtrct
  {
    public static void Main(string[] args)
    {
      if (args.Length == 0)
      {
        Console.WriteLine("You must specify a filename.");
      }
      else
      {
        TxtXtrct.ListItems(args[0]);
      }
      return;
    }

    /// <summary>
    /// Method for extracting text items from file
    /// </summary>
    /// <param name="filename">root/path/filename.ext</param>
    static void ListItems(string filename)
    {
      if (!File.Exists(filename))
      {
        Console.WriteLine(
             "Error: Specified file {0} does not exist.",
             filename);

        return;
      }

      StreamReader sr = File.OpenText(filename);
        
      while(sr.Peek() > -1)
      {
        if (Regex.IsMatch(sr.ReadLine(), "Order Session"))
          break;
      }
        
      if (sr.Peek() == -1)
      {
        Console.WriteLine(
             "Error: Order Session header was not found.");

        sr.Close();
        return;
      }
        
      while (sr.Peek() > -1)
      {
        string extractLine = sr.ReadLine();
        
        if (Regex.IsMatch(extractLine, ","))
        {
          // found the line with the items of interest
          
          // the last name is the trimmed part in front of the comma
          string lastname = extractLine.Substring(
               0, extractLine.IndexOf(",")).Trim();
               
          Console.WriteLine("Last name: " + lastname);
          
          // the first name is trimmed next part up to triple space
          string extract1 = extractLine.Substring(
               extractLine.IndexOf(",") + 1);
          
          string firstname = extract1.Substring(
               0, extract1.IndexOf("   ")).Trim();
               
          Console.WriteLine("First name: " + firstname);
          
          // the location is next after first name
          string extract2 = extract1.Substring(
               extract1.IndexOf(firstname + " ")
               + firstname.Length).Trim();
          
          string location = extract2.Substring(
               0, extract2.IndexOf(" ")).Trim();
          
          Console.WriteLine("Location: " + location);
          
          // the last item room number is next, up to a dash
          string extract3 = extract2.Substring(
               extract2.IndexOf(location + " ")
               + location.Length).Trim();
          
          string roomnumber = extract3.Substring(
               0, extract3.IndexOf("-")).Trim();
          
          Console.WriteLine("Room number: " + roomnumber);
               
          break;
        }
      }
        
      sr.Close();
      return;
    }

  }
  
}

/*   end of textextract.cs   */

This program is intended to illustrate some useful techniques for
extracting specific data items from a text file.  First, of course, we
test for the existence of a file (given its name) and open that file
with a "StreamReader" object.

Then we read lines from the StreamReader and manipulate the resulting
strings in various ways.  The "regular expression" class Regex allows
us to match a variety of patterns within strings, but the use made
here is minimal.  We look first for a line that contains "Order
Session", and subsequently for a line that has a comma.

Once we have that line, we use the Substring and IndexOf methods of
the string class to extract the data items we need.

This code:

lastname = extractLine.Substring(0, extractLine.IndexOf(",")).Trim();

illustrates the ideas.  We find where the comma is in the string
extractLine with the expression:

extractLine.IndexOf(",")

The characters in a string are identified by zero-based indexing. 
Thus when we ask for the substring beginning at position 0 (the first
character) having length extractLine.IndexOf(","), this gives
everything in front of (but not including) the comma character.  The
final Trim() method removes any whitespace from both ends of the
result.

In the next step we use Substring with a single argument:

extract1 = extractLine.Substring(extractLine.IndexOf(",") + 1);

which starts at one position beyond the comma and takes the rest of
the string.  From here we would continue to extract the first name and
other items until we are finished.
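
The two forms of Substring described above can be tried in isolation
with a short sketch like this one.  The class and method names are
hypothetical, and the input is a simplified "Last, First" string
rather than a full line from the file.

```csharp
using System;

static class SubstringDemo
{
    // Splits "Last, First" at the first comma.
    // Returns null when the text contains no comma.
    public static string[] SplitName(string text)
    {
        int comma = text.IndexOf(',');
        if (comma < 0)
            return null;

        // Two-argument form: start at 0, take 'comma' characters.
        string last = text.Substring(0, comma).Trim();

        // One-argument form: everything after the comma.
        string first = text.Substring(comma + 1).Trim();

        return new string[] { last, first };
    }

    static void Main()
    {
        string[] parts = SplitName("   Groschel, Hans   ");
        Console.WriteLine("Last name: " + parts[0]);   // Last name: Groschel
        Console.WriteLine("First name: " + parts[1]);  // First name: Hans
    }
}
```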

If you have any questions about the details of the code, please post a
Request for Clarification.

The structure of the code is linear, in the sense of proceeding forward
through the file in one pass.  While this seems adequate for the
immediate task, parsing text often requires more complicated
arrangements.  As your project progresses you might need to revisit
this code and build more elaborate search logic.  Especially if the
code proves unreliable, we might need to wrap the search in a
try/catch exception handler.  Further testing will hopefully help you
decide if this is the case.
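
As a rough sketch of what that hardening might look like: IndexOf
returns -1 when the text is not found, and passing a negative value to
Substring throws ArgumentOutOfRangeException.  The helper names below
are hypothetical; one checks the index before calling Substring, the
other catches the exception.

```csharp
using System;

static class SafeExtract
{
    // Returns the trimmed text before the first occurrence of
    // delimiter, or null when the delimiter is absent.
    public static string Before(string text, string delimiter)
    {
        int pos = text.IndexOf(delimiter, StringComparison.Ordinal);
        return pos < 0 ? null : text.Substring(0, pos).Trim();
    }

    // Alternative: let Substring throw and report failure instead.
    public static bool TryRoomNumber(string text, out string room)
    {
        try
        {
            room = text.Substring(0, text.IndexOf('-')).Trim();
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            room = null;
            return false;
        }
    }

    static void Main()
    {
        string room;
        Console.WriteLine(TryRoomNumber("11-01   25y", out room));   // True
        Console.WriteLine(TryRoomNumber("no dash here", out room));  // False
        Console.WriteLine(Before("MS6 11-01", " "));                 // MS6
    }
}
```

Checking the index first is usually cheaper and easier to follow than
catching the exception; the try/catch form is shown only because it is
the approach mentioned above.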

regards, mathtalk-ga

Request for Answer Clarification by amy123456-ga on 19 Jan 2004 09:44 PST
Hi . The code is not running at all, it starts, blinks and terminates.

Request for Answer Clarification by amy123456-ga on 19 Jan 2004 09:53 PST
I can go as far as here. ***** Console.WriteLine("order.rt");     ******

namespace textextract
{
	/// <summary>
	/// Main Class TxtXtrct for text extract console app.
	/// </summary>
	class TxtXtrct
	{
		public static void Main(string[] args)
		{
			if (args.Length == 0)
			{
			*****	Console.WriteLine("order.rt");*****

Clarification of Answer by mathtalk-ga on 19 Jan 2004 11:59 PST
Hi, amy:

Let me be a bit more specific about how to run the code as a standalone program.

I started with a new "empty project" in C#, then added the file
textextract.cs to the project as an existing file.  It was then
necessary (to support the System.Text.RegularExpressions namespace,
though none of the others) to add a Reference to the Project for
System.dll (right click on the project and do Add Reference...; 
System.dll can be found in alphabetic order under the .Net tab of the
dialog).

From there the program can be executed by putting the path to the file
you want to extract text from on the command line, e.g.

textextract  C:\testdata.txt

The program checks the length of the command line arguments (an array
of string) and if this is greater than 0, it tests for the existence
of the file being named by args[0], the first command line argument.

There is a setting under the Debug options to supply the command line
arguments in the debugging mode.  Is that how you are trying to run
the program?

If you want, I can zip up my entire project and post it where you can
download it, if you think that would be expeditious.

regards, mathtalk-ga
amy123456-ga rated this answer:5 out of 5 stars
Excellent to work with.

Comments  
Subject: Re: C# split-- read from a file and split the file for specific data
From: mathtalk-ga on 11 Jan 2004 00:21 PST
 
RTF (Rich Text Format) is a specification developed by Microsoft to
facilitate conversions between Word and other word processing document
types.

The plain text shown in your sample would appear in an RTF file with
"markup" characteristics as well as the text "content".

As a strategy for picking out the specific items you need, beginning
with the patient's name, I would suggest first scanning through to the
phrase "Order Session" under the assumption that this would appear in
every such file.

The next (non-whitespace) text after that should be the patient's
name, judging by your example.  The location "MS6" seems to be the
next piece of text, followed by the room number "11" in the same
example.

The medical record number may be the most challenging data to match. 
Your example makes it look as if the text of that item is actually
split across two lines (both at the right hand margins of the text).

If you would take a look at your RTF file in a (simple) text editor,
such as Notepad if you have nothing better, then you will be able to
see these additional "formatting" characters used by RTF.

Next we'll look at some C# code for opening a file and doing the
search for such strings.

regards, mathtalk-ga
Subject: Re: C# split-- read from a file and split the file for specific data
From: amy123456-ga on 12 Jan 2004 14:44 PST
 
The problem is that RPM sends the file in this format automatically.
In any other format the file does not look anything like the
original.

PS. What I sent you was actually a true file from RPM.

Thank you
Subject: Re: C# split-- read from a file and split the file for specific data
From: mathtalk-ga on 12 Jan 2004 14:48 PST
 
Well, perhaps the file which is sent by RPM is a plain text file
rather than an RTF file.  I will post some samples of RTF so you can
be sure what I'm talking about.

-- mathtalk-ga
Subject: Re: C# split-- read from a file and split the file for specific data
From: mathtalk-ga on 16 Jan 2004 05:51 PST
 
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

namespace textextract
{
  /// <summary>
  /// Main Class TxtXtrct for text extract console app.
  /// </summary>
  class TxtXtrct
  {
    public static void Main(string[] args)
    {
      if(args==null)
      {
        Console.WriteLine("You must specify a filename.");
      }
      else
      {
        TxtXtrct.ListItems(args[0]);
      }
      return;
    }

    /// <summary>
    /// Method for extracting text items from file
    /// </summary>
    /// <param name="filename">root/path/filename.ext</param>
    static void ListItems(string filename)
    {
      if (!File.Exists(filename))
      {
        Console.WriteLine(
             "Error: Specified file {0} does not exist.",
             filename);

        sr.Close();
        return;
      }

      StreamReader sr = File.OpenText(filename);
        
      while(sr.Peek() > -1)
      {
        if (Regex.IsMatch(sr.Readline(), "Order Session"))
          break;
      }
        
      if (sr.Peek() = -1)
      {
        Console.WriteLine(
             "Error: Order Session header was not found.");

        sr.Close();
        return;
      }
        
      while (sr.Peek() > -1)
      {
        string extractLine = sr.Readline();
        
        if (Regex.IsMatch(extractLine, ","))
        {
          // found the line with the items of interest
          
          // the last name is the trimmed part in front of the comma
          string lastname = extractLine.Substring(
               0, extractLine.IndexOf(",")).Trim;
               
          Console.WriteLine("Last name: " + lastname);
          
          // the first name is trimmed next part up to triple space
          string extract1 = extractLine.Substring(
               extractLine.IndexOf(",") + 1);
          
          string firstname = extract1.Substring(
               0, extract1.IndexOf("   ")).Trim;
               
          Console.WriteLine("First name: " + firstname);
          
          // the location is next after first name
          string extract2 = extract1.Substring(
               extract1.IndexOf(firstname + "   ")).Trim;
          
          string location = extract2.Substring(
               extract2.IndexOf(" ")).Trim;
          
          Console.WriteLine("Location: " + location);
          
          // the last item room number is next, up to a dash
          string extract3 = extract2.Substring(
               extract2.IndexOf(location + " ")).Trim;
          
          string roomnumber = extract3.Substring(
               extract3.IndexOf("-")).Trim;
          
          Console.WriteLine("Room number: " + roomnumber);
               
          break;
        }
      }
        
      sr.Close();
      return;
    }

  }
  
}
Subject: Re: C# split-- read from a file and split the file for specific data
From: amy123456-ga on 16 Jan 2004 06:49 PST
 
Thank you. I loaded it and I received several errors.

C:\JavaPrac\Read File web\Class1.cs(46): 'System.IO.StreamReader' does
not contain a definition for 'Readline'
C:\JavaPrac\Read File web\Class1.cs(61): 'System.IO.StreamReader' does
not contain a definition for 'Readline'
C:\JavaPrac\Read File web\Class1.cs(50): Cannot implicitly convert
type 'int' to 'bool'
C:\JavaPrac\Read File web\Class1.cs(78): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(69): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(84): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(87): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(93): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(96): Method 'string.Trim(params
char[])' referenced without parentheses
C:\JavaPrac\Read File web\Class1.cs(50): The left-hand side of an
assignment must be a variable, property or indexer
C:\JavaPrac\Read File web\Class1.cs(38): The type or namespace name
'sr' could not be found (are you missing a using directive or an
assembly reference?)
Subject: Re: C# split-- read from a file and split the file for specific data
From: mathtalk-ga on 16 Jan 2004 07:20 PST
 
You may actually be a little further along than I am.

Here are some fixes to my code:

1) Readline should be ReadLine (two occurrences)

2) if (sr.Peek() = -1) should use "==" for the equality comparison

3) The Trim method needs to be called with parentheses, Trim()
   (several places)

4) Remove the first sr.Close(); I haven't even declared it yet!
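
For anyone following along, the four fixes above can be seen together in a small stand-alone sketch. This is not the posted program: the class name FixDemo, the method FirstLastName and the sample line are made up for illustration, and a StringReader stands in for the StreamReader so that no input file is needed.

```csharp
using System;
using System.IO;

static class FixDemo
{
    // Hypothetical helper: returns the trimmed text in front of the first
    // comma on the first line that contains one, or "" if no line does.
    public static string FirstLastName(TextReader sr)
    {
        while (true)
        {
            if (sr.Peek() == -1) break;          // fix 2: "==" compares, "=" assigns
            string line = sr.ReadLine();         // fix 1: ReadLine, capital L
            int comma = line.IndexOf(",");
            if (comma >= 0)
                return line.Substring(0, comma).Trim();   // fix 3: Trim() is a method call
        }
        return "";
    }

    static void Main()
    {
        // Assumed sample data, not taken from the real patient file.
        StringReader sr = new StringReader("WARD LIST\nDOE, JANE   ICU 204-1\n");
        Console.WriteLine("Last name: " + FirstLastName(sr));
        sr.Close();                              // fix 4: close only after sr is declared
    }
}
```

With the sample line shown, this should print "Last name: DOE".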

regards, mathtalk-ga
Subject: Re: C# split-- read from a file and split the file for specific data
From: amy123456-ga on 19 Jan 2004 13:00 PST
 
I would like to post another question. How do I do that and have you
continue working on the project? I know that I need to ask a question
and post a $ amount for it.

Thank you very much
Subject: Re: C# split-- read from a file and split the file for specific data
From: amy123456-ga on 19 Jan 2004 13:06 PST
 
The following is worth $30.00. It is a simple one.

I need the extraction that we just did on the orders to go to a
file so I can archive it for further use. I need it to be saved as

F_Name/L_Name/Loc/Room #/

like a log file, all the events in one file.

Thank you very much. Let me know and I will post the question.
Subject: Re: C# split-- read from a file and split the file for specific data
From: mathtalk-ga on 19 Jan 2004 21:25 PST
 
First, thanks Amy for the kind words and encouragement.  What you
propose is quite generous.  I don't believe that I'm the only
Researcher here who would be able to provide a completely satisfactory
Answer to your new Question.  However if you do wish in the future to
direct a Question to a particular Researcher, the Editors have
suggested that adding "for suchperson-ga" to the Subject title would
be acceptable.  I tend to track the Science > Math and Computers >
Programming categories pretty closely and would be happy to work with
you on such problems.

For the interested reader who may be trying to reconstruct the project
as I did it, here are some more detailed steps relating to the VS .Net
IDE:

1. Launch MS Visual Studio .Net 2003.  Click New Project.

2. On the New Project dialog, select Visual C# Projects on the upper
left "tree control" and select Empty Project in the pane to the right.
 Let's call the project/solution "TxtXtrct" and take your defaults as
far as the directory/folder choices go.  Click OK.  [Console
application would also be a fine choice here; I just wanted to
know/control every little detail that was going into the Customer's
project.]

3. You will now see the Solution Explorer pane at the upper right of
the main IDE view, for the TxtXtrct project (in the TxtXtrct
solution).  Right click on "References" (the only item in the tree
under TxtXtrct initially), and choose Add Reference... from the menu
(not Add Web Reference...).

4. You are now presented with a tabbed Add Reference dialog box, with
.Net being the leftmost tab (and probably the default view).  Normally
the default sort on the available .Net references is ascending on the
Component Name (first field), and we are looking to add System.dll to
our project.  It may be a little hard to spot, mixed in with many
other components that are labelled System.something or other, but it
will be there.  Mine says version 1.0.5000.0.  Click on the target and
hit the Select button, or simply double click, and the System.dll will
appear on a line in the lower pane.  Click OK.  You should now see
that "System" appears below References in the Solution Explorer.

5. Finally add the C# source code file textextract.cs to the project,
either as an existing file or (if you prefer) as a new file which you
can then overwrite with my posted code.  [Of course use the final code
from my Answer rather than the bug-riddled draft code from my earlier
Comment.]

6. You should save and build the project at this point.  Let me know
of any errors that occur during the build.  The IDE keeps track of
them in a "thing to do" grid at the lower right.

7. To pass in command line arguments to the program when you launch
the debugger, right click on the project and go to Properties (or use
the Project/Properties menu item).  A Property Pages dialog opens up
(which you can also access by clicking a Property Pages icon/button on
the Project Properties "explorer" pane, probably at lower right), and
now you see two items in the left hand pane, Common Properties and
Configuration Properties.  Click on Configuration Properties, to
select that second one, and underneath you'll see a few subheadings. 
One of them is Debugging, and when it is selected, you'll see three
topics in the right hand pane.  The last of these is Start Options, in
which we find item Command Line Argument listed first.  Put the full
path and filename of your test data here and click OK.
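
As a rough illustration of what happens to that setting, here is a minimal sketch of a console program picking up its first command line argument. The class name ArgsDemo and the helper InputPathFrom are hypothetical and are not part of textextract.cs; the sketch only echoes the first line of the file it is given.

```csharp
using System;
using System.IO;

static class ArgsDemo
{
    // Hypothetical helper: the first argument, or null if none was supplied.
    public static string InputPathFrom(string[] args)
    {
        return (args != null && args.Length > 0) ? args[0] : null;
    }

    static int Main(string[] args)
    {
        string path = InputPathFrom(args);
        if (path == null)
        {
            Console.Error.WriteLine("usage: TxtXtrct <input file>");
            return 1;
        }
        if (!File.Exists(path))
        {
            Console.Error.WriteLine("File not found: " + path);
            return 1;
        }

        // Open the file named on the command line and show its first line.
        using (StreamReader sr = new StreamReader(path))
        {
            Console.WriteLine("First line: " + sr.ReadLine());
        }
        return 0;
    }
}
```

One thing to watch: arguments are split on spaces, so a path that contains spaces (such as a folder named "Read File web") should be entered inside double quotes, or the program will only see the part before the first space.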

regards, mathtalk-ga
