Subject:
Perl script to mark lines of tab-delimited file according to content
Category: Computers > Programming
Asked by: gerry1234-ga
List Price: $50.00
Posted: 22 May 2006 14:18 PDT
Expires: 21 Jun 2006 14:18 PDT
Question ID: 731427
I have a series of tab-delimited text files. Each file consists of records on separate lines. Every record contains text for each of 50 columns (the columns, or fields, are separated by tabs; there are no other tabs in any file). I want to mark each record that contains at least two directional indicators in the 10th column.

There are 16 directional indicators: n, nne, ne, ene, e, ese, se, sse, s, ssw, sw, wsw, w, wnw, nw, nnw. These could be present in any combination of upper- or lower-case letters. I want to count as valid indicators only those that are preceded and followed by a single space, comma, semicolon, or period (in any combination; three examples in brackets: < n.>, <.eSE >, <;w.>). I want to mark only those records having two of these valid indicators in the 10th column, ignoring any matches in other columns.

For output, I want a file that lists all records from the input file. Each record should have content and structure identical to the input file (tab-delimited), except that every matching record has one tab and the word "matched" appended. The input and output files will therefore be identical for all columns, except that the output file will have one additional column containing "matched" for every record with two valid directional indicators in the 10th field.

I have tried variations of the following:

    #!/usr/bin/perl -w
    while (<>) {
        if (/NEED_A_WORKING_REGEX_HERE/gi) {
            print "$`$&$'\tmarked\n";
        }
        else {
            print "$`$&$'\n";
        }
    }

In my first attempts with the above, I was using $& to allow manipulation of the printout of a matching string segment. Now I will be satisfied to get (e.g., by redirection: script.pl in.txt > out.txt) an output file as described in the preceding paragraph. I would like the answer to work with Perl v5.8 and higher.
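To pin down what counts as a "valid" indicator, here is a minimal Perl sketch of the rule as stated above. It assumes that a single delimiter may serve both as the trailing delimiter of one indicator and the leading delimiter of the next (the question does not say either way). The sub name count_indicators is hypothetical; it relies on the zero-width lookbehind (?<=...) and lookahead (?=...) assertions, which Perl 5.8 supports, so the delimiters are checked but not consumed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the directional indicators in one field that are
# immediately preceded and followed by a space, comma, semicolon or period.
# (?<=...) and (?=...) are zero-width: they check the delimiters without
# consuming them, so one delimiter may sit between two indicators (assumed).
sub count_indicators {
    my ($field) = @_;
    my $count = 0;
    $count++
        while $field =~ /(?<=[ ,;.])(?:n|nne|ne|ene|e|ese|se|sse|s|ssw|sw|wsw|w|wnw|nw|nnw)(?=[ ,;.])/gi;
    return $count;
}

# The three bracketed examples from the question, plus two made-up fields.
for my $sample (' n.', '.eSE ', ';w.', ' n., eSE;', 'north by east') {
    printf "<%s> %d\n", $sample, count_indicators($sample);
}
```

Run as-is, it should report a count of 1 for each of the three bracketed examples, 2 for < n., eSE;>, and 0 for <north by east>, where no indicator is bounded by a delimiter on both sides.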
Subject:
Re: Perl script to mark lines of tab-delimited file according to content
Answered By: palitoy-ga on 23 May 2006 02:27 PDT
Hello gerry1234-ga,

Thank you for your question. After studying it, I believe I have come up with a working solution, which I have included below. If it is not correct, please ask for clarification and include an example input file so that I can work directly with your data.

As you indicated in your question, the most important part of the script is the regular expression, so I will try to explain my approach to it. This is the regular expression I came up with from your description:

    ([\s,;.](n|nne|ne|ene|e|ese|se|sse|s|ssw|sw|wsw|w|wnw|nw|nnw)[\s,;.].*){2,}

The square brackets hold the characters that can go before or after a directional indicator: \s (any whitespace character, which includes the space), comma, semicolon and period. Between the two sets of square brackets are the possible directional indicators; in a regular expression the alternatives are separated by | characters and grouped in a set of () brackets, so that the choice applies only between the two delimiter classes. After the second set of square brackets, a greedy .* matches anything that follows, so the next indicator may appear anywhere later in the field. The final part of the regular expression, {2,}, means that the group in the () brackets immediately preceding it must match at least twice. In the script, the i modifier on the match makes it case-insensitive, so indicators such as eSE or NNW are accepted.

    #!/usr/bin/perl
    use strict;
    use warnings;

    open(my $txtfile, '<', 'TEST.TXT') or die "Cannot open TEST.TXT: $!";
    while (my $origLine = <$txtfile>) {
        # split $origLine into parts by the tab in the line
        my @splitLine = split(/\t/, $origLine);
        # check whether the 10th section of $origLine matches the pattern
        if (defined $splitLine[9]
            && $splitLine[9] =~ m/([\s,;.](n|nne|ne|ene|e|ese|se|sse|s|ssw|sw|wsw|w|wnw|nw|nnw)[\s,;.].*){2,}/i) {
            chomp($origLine);
            print $origLine . "\tmatched\n";
        }
        # if it does not match the pattern, print the line unchanged
        else {
            print $origLine;
        }
    }
    close($txtfile);
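One property of the pattern above worth knowing: each repetition of the group consumes its trailing delimiter, so two indicators that share a single delimiter, as in " n,e ", do not produce two matches, while " n, e " (with a space after the comma) does. The question does not settle whether a shared delimiter should count. The sketch below is an alternative, not the answerer's code, and assumes it should: it counts indicators with lookbehind/lookahead so delimiters are not consumed, reads the files named on the command line as in the question's script.pl in.txt > out.txt, and returns non-matching lines unchanged. The names has_two_indicators and mark_line are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $field contains at least two indicators, each
# immediately preceded and followed by a space, comma, semicolon or period.
# The lookbehind/lookahead are zero-width, so a single delimiter (the comma in
# " n,e ") can close one indicator and open the next; this is an assumed reading.
sub has_two_indicators {
    my ($field) = @_;
    my $count = 0;
    while ($field =~ /(?<=[ ,;.])(?:n|nne|ne|ene|e|ese|se|sse|s|ssw|sw|wsw|w|wnw|nw|nnw)(?=[ ,;.])/gi) {
        return 1 if ++$count >= 2;
    }
    return 0;
}

# Hypothetical helper: return the output line for one input line, with a tab
# and "matched" appended when the 10th field (index 9) qualifies.
sub mark_line {
    my ($line) = @_;
    my $had_newline = chomp(my $copy = $line);
    my @fields = split /\t/, $copy;
    return $line unless defined $fields[9] && has_two_indicators($fields[9]);
    return $copy . "\tmatched" . ($had_newline ? "\n" : "");
}

# Read the files named on the command line (script.pl in.txt > out.txt).
# With no file names the loop is skipped rather than waiting on STDIN.
if (@ARGV) {
    while (my $line = <>) {
        print mark_line($line);
    }
}
```

Assuming the script is saved as mark.pl, run it as perl mark.pl in.txt > out.txt. A final line without a trailing newline is written back without one, so unmatched records stay identical to the input.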
gerry1234-ga
rated this answer:
palitoy-ga: Your answer appears to work just as stated. The presentation of your answer was very clear, which will benefit me as well. Sorry for the delay in acknowledging your good work.
Subject:
Re: Perl script to mark lines of tab-delimited file according to content
From: kharn-ga on 22 May 2006 16:15 PDT
Can you provide a sample file with a few lines for testing?
Subject:
Re: Perl script to mark lines of tab-delimited file according to content
From: kharn-ga on 22 May 2006 16:59 PDT
I came up with this quick-and-dirty solution. I don't know the extent of your input, but try it out and see if it works for you.

    #!/usr/bin/perl
    while ($origLine = <>) {
        @myLine = split(/\t/, $origLine);
        if ($myLine[9] =~ m/([ ,;.](n|nne|ne|ene|e|ese|se|sse|s|ssw|sw|wsw|w|wnw|nw|nnw)[ ,;.].*){2}/gi) {
            chomp($origLine);
            print $origLine . "\tmatched" . "\n";
        }
        else {
            print $origLine;
        }
    }

Eric
Subject:
Re: Perl script to mark lines of tab-delimited file according to content
From: gerry1234-ga on 11 Jun 2006 19:51 PDT
kharn-ga: Thank you for the effort with your comment. Unfortunately, it did not identify any "positive" lines correctly. Perhaps the answer by palitoy-ga helps highlight the problem?