Q: Perl script to mark lines of tab-delimited file according to content ( Answered 5 out of 5 stars,   3 Comments )
Question  
Subject: Perl script to mark lines of tab-delimited file according to content
Category: Computers > Programming
Asked by: gerry1234-ga
List Price: $50.00
Posted: 22 May 2006 14:18 PDT
Expires: 21 Jun 2006 14:18 PDT
Question ID: 731427
I have a series of tab-delimited text files.  Each file consists of
records on separate lines.  Every record contains text for each of 50
columns (the columns, or fields, are separated by tabs; there are no
other tabs in any file).  I want to mark each record that contains at
least two directional indicators in the 10th column.  There are 16
directional indicators: n, nne, ne, ene, e, ese, se, sse, s, ssw, sw,
wsw, w, wnw, nw, nnw.  These could be present in any combination of
upper or lower case letters.  I want to count as valid indicators only
those that are preceded and followed by a single space, comma,
semicolon, or period (in any combination; 3 examples in brackets: <
n.>,<.eSE >,<;w.>).  I want to mark only those records having two of
these valid indicators in the 10th column, ignoring any matches in
other columns.  For output, I want a file created that lists all
records from the input file.  Each record should have identical
content and structure to the input file (tab-delimited), with the
addition that all matching records have appended one tab and the word
"matched".  Therefore, the input and output files will be identical
for all columns, except that the output file will have one additional
column containing "matched" for all records containing two valid
directional indicators in the 10th field.  I have tried variations of
the following:

#!/usr/bin/perl -w
while (<>) {
  if (/NEED_A_WORKING_REGEX_HERE/gi) { 
    print "$`$&$'\tmarked\n";
  } else {
    print "$`$&$'\n";
  }
}

In first attempts with the above, I was using $& to allow manipulation
of the printout of a matching string segment.  Now, I will be
satisfied to get (e.g., by redirection: script.pl in.txt > out.txt) an
output file as described in the preceding paragraph.

I would like the answer to work with Perl v.5.8 and higher.
Answer  
Subject: Re: Perl script to mark lines of tab-delimited file according to content
Answered By: palitoy-ga on 23 May 2006 02:27 PDT
Rated:5 out of 5 stars
 
Hello gerry1234-ga,

Thank you for your question.

After studying your question I believe I have come up with a solution
that works, and I have included it below.  Should it not be correct,
please ask for clarification and include an example file of the input
data so that I can work directly on it.

As you indicated in your question, the most important section of the
script is the regular expression so I will try to explain my approach
to it.

This is the regular expression I came up with from your description:

([\s,;.](n|nne|ne|ene|e|ese|se|sse|s|ssw|sw|wsw|w|wnw|nw|nnw)[\s,;.].*){2,}

Within the square brackets are the characters that can go before/after
the directional indicator (whitespace, comma, semi-colon and period;
note that \s matches any whitespace character, not only a single
space).  Between the two sets of square brackets are the possible
directional indicators; within a regular expression the alternatives
are separated by a | character and enclosed within a set of ()
brackets to group them.  After the second set of square brackets, .*
greedily matches any remaining characters, so that other text may sit
between one indicator and the next.  The final part of the regular
expression is the {2,} section; this requires the () group immediately
preceding it to match at least twice.  The i modifier used in the
script below makes the match case-insensitive.
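
As a rough check of the pattern described above, it can be tried against a few field values before running it on a real file. This is only a sketch: the sample strings and the helper name has_two_indicators are made up for illustration and do not come from the asker's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sixteen indicators and the full pattern, compiled once.
# /i makes both case-insensitive.
my $dir     = qr/n|nne|ne|ene|e|ese|se|sse|s|ssw|sw|wsw|w|wnw|nw|nnw/i;
my $pattern = qr/([\s,;.]($dir)[\s,;.].*){2,}/i;

# Hypothetical helper: true when the field holds at least two indicators
# that the pattern can match one after the other.
sub has_two_indicators {
    my ($field) = @_;
    return $field =~ $pattern ? 1 : 0;
}

# Made-up values standing in for the 10th column.
for my $field ('wind from n. then ;w.', 'north of town', 'heading .eSE only') {
    printf "%-25s %s\n", "<$field>",
        has_two_indicators($field) ? 'matched' : 'no match';
}
```

Only the first sample should be reported as matched. Because each repetition consumes its own leading and trailing separator, two indicators that share a single separator (such as < n e >) are not counted as two by this pattern.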


#!/usr/bin/perl
use strict;
use warnings;

# open the input file, stopping with a message if it cannot be read
open(my $txtfile, '<', 'TEST.TXT') or die "Cannot open TEST.TXT: $!";

while (my $origLine = <$txtfile>) {
  # split $origLine into parts by the tab in the line
  my @splitLine = split(/\t/, $origLine);
  # check whether the 10th section of $origLine matches the pattern
  # (the defined test skips lines with fewer than 10 columns)
  if (defined $splitLine[9] && $splitLine[9] =~
      m/([\s,;.](n|nne|ne|ene|e|ese|se|sse|s|ssw|sw|wsw|w|wnw|nw|nnw)[\s,;.].*){2,}/i) {
    chomp($origLine);
    print $origLine."\tmatched\n";
  }
  # if it does not match the pattern
  else { print $origLine; }
}
close($txtfile);
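
The question asked for a script usable by redirection (script.pl in.txt > out.txt) and counted an indicator as valid whenever it has a separator on each side. One possible variation, offered as a sketch and not as part of the tested answer, counts indicators with lookbehind and lookahead so that neighbouring indicators may share a separator. The helper names count_indicators and mark_line and the two sample records are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Separators allowed on either side of an indicator.
my $sep = qr/[ ,;.]/;
# Lookarounds test the neighbouring characters without consuming them,
# so " n e " counts as two indicators even though they share one space.
my $indicator = qr/(?<=$sep)(?:nne|nnw|ne|nw|n|ene|ese|e|sse|ssw|se|sw|s|wsw|wnw|w)(?=$sep)/i;

# Count the valid indicators in one field (hypothetical helper).
sub count_indicators {
    my ($field) = @_;
    my $count = () = $field =~ /$indicator/g;
    return $count;
}

# Return the record unchanged, or with "\tmatched" appended when the
# 10th field holds at least two indicators (hypothetical helper).
sub mark_line {
    my ($line) = @_;
    chomp(my $record = $line);
    my @fields = split /\t/, $record, -1;    # -1 keeps trailing empty fields
    my $hit = defined $fields[9] && count_indicators($fields[9]) >= 2;
    return $record . ($hit ? "\tmatched" : '') . "\n";
}

# Made-up sample: nine filler columns, the 10th column, one more column.
my $sample = join("\t", ('x') x 9, 'wind n e today', 'tail') . "\n"
           . join("\t", ('x') x 9, 'only .eSE here', 'tail') . "\n";
open my $in, '<', \$sample or die "Cannot open sample: $!";
print mark_line($_) while <$in>;
close $in;
```

Reading from <> instead of the in-memory sample would let the same loop take its input from a file named on the command line, with the output redirected to a new file. Note that this sketch appends the extra column after any following columns, as the question describes, and treats "at least two" as the threshold.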
gerry1234-ga rated this answer:5 out of 5 stars
palitoy-ga:  Your answer appears to work just as stated.  The
presentation of your answer was very clear, which will benefit me as
well.  Sorry for the delay in acknowledging your good work.

Comments  
Subject: Re: Perl script to mark lines of tab-delimited file according to content
From: kharn-ga on 22 May 2006 16:15 PDT
 
Can you provide a sample file with a few lines for testing?
Subject: Re: Perl script to mark lines of tab-delimited file according to content
From: kharn-ga on 22 May 2006 16:59 PDT
 
I came up with this quick and dirty script. I don't know the extent of
your input, but try it out and see if it works for you.

#!/usr/bin/perl

while($origLine = <>) {
	@myLine = split(/\t/, $origLine);
	if($myLine[9] =~ m/([ ,;.](n|nne|ne|ene|e|ese|se|sse|s|ssw|sw|wsw|w|wnw|nw|nnw)[ ,;.].*){2}/gi) {
		chomp($origLine);
		print $origLine . "\tmatched" . "\n";
	} else {
		print $origLine;
	}
}


Eric
Subject: Re: Perl script to mark lines of tab-delimited file according to content
From: gerry1234-ga on 11 Jun 2006 19:51 PDT
 
kharn-ga:  Thank you for the effort with your comment.  Unfortunately,
it did not identify any "positive" lines correctly.  Perhaps the
answer by palitoy-ga helps highlight the problem?
