Google Answers: Perl script required


Q: Perl script required (Answered, 5 out of 5 stars, 0 Comments)

Question

Subject: Perl script required
Category: Computers > Programming
Asked by: mickr-ga
List Price: $20.00

Posted: 19 Jul 2004 02:16 PDT
Expires: 18 Aug 2004 02:16 PDT
Question ID: 376032

Hi,

I would like a Perl script that removes duplicate entities from a file.

The file will look like the following:

<many lines of text>
 Startpoint: a/b/c
  <one line of text>
 Endpoint: d/e/f/g
  <many lines of text>
 slack <plus some more text on the same line>
  <some blank lines>
 Startpoint: e/r/k
  <one line of text>
 Endpoint: d/e/f/g
  <many lines of text>
 slack <plus some more text on the same line>
  <some blank lines>
 Startpoint: a/b/c
  <one line of text>
 Endpoint: d/e/f/g
  <many lines of text>
 slack <plus some more text on the same line>
  <some blank lines>

I would like all the lines between Startpoint and slack to be printed
only for the first occurrence of each unique combination of Startpoint
and Endpoint. For example,

let's say we had:

a) Startpoint: a/b/c Endpoint: d/e/f/g - not seen before, so print it
up to and including slack, followed by a newline (\n)

b) Startpoint: e/r/k Endpoint: d/e/f/g - Startpoint not seen before, so print it

c) Startpoint: e/r/k Endpoint: d/e/f/g/z - Endpoint not seen before, so print it

d) Startpoint: a/b/c Endpoint: d/e/f/g/z - this Startpoint/Endpoint combination
not seen before, so print it (both have occurred, but not together)

e) Startpoint: e/r/k Endpoint: d/e/f/g/z - this Startpoint/Endpoint combination
was already seen (in c), so do not print it.

Please make the code user-friendly and easy to understand, with comments.
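Rules a) to e) boil down to remembering the Startpoint/Endpoint pair as a single key, rather than either value on its own. A minimal sketch of that check, using a Perl hash as the "seen" set; the is_new_pair helper is a hypothetical name, not part of the original answer:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 the first time a given (Startpoint, Endpoint)
# combination is seen, and 0 for every later repeat of that same combination.
# $seen is a hash reference that must be shared across calls.
sub is_new_pair {
    my ($seen, $start, $end) = @_;
    # Join with a separator that will not appear in a path, so the two
    # values can never run together into an ambiguous key.
    my $key = join("\0", $start, $end);
    return 0 if exists $seen->{$key};   # pair already printed once
    $seen->{$key} = 1;                  # remember it for next time
    return 1;
}
```

Fed cases a) to e) in order, it returns 1, 1, 1, 1, 0: only e) is suppressed, because c) already used the same pair. A hash lookup takes roughly constant time no matter how many pairs have been seen.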

Answer

Subject: Re: Perl script required
Answered By: palitoy-ga on 19 Jul 2004 05:10 PDT
Rated: 5 out of 5 stars

Hello Mickr,

If I understand you correctly, the following script should give you the output you require. I have tried to make the script as readable and friendly as possible by not using too many shortcuts and by commenting it throughout. At the beginning, the script reads in a text file called "filename.txt"; this should be altered to suit your needs, depending on where your input file is held. If you have any questions or queries regarding the script, please ask for clarification and I will do my best to help.

###BEGIN###
#!/usr/bin/perl

# open and read the file containing the data into an array called "lines"
# (stop with an error message if the file cannot be opened)
open (TXTFILE, "filename.txt") or die "Cannot open filename.txt: $!";
@lines = <TXTFILE>;
close(TXTFILE);

# variable to hold the information to print
$print_this = "";
# variable to check whether we should be printing
$printing = 0;
# an array holding starting/ending points to check we do not duplicate things
@points = "";
# a variable to say whether we should output the information or not
$should_output = 0;
# variables to hold the starting and ending points
$start_point = "";
$end_point = "";

# loop through the text file and process the data
foreach $newline (@lines) {
    # if the line is a starting point then...
    if ($newline =~ m/^Startpoint/) {
        # remember what is found in the text file until $printing is changed
        $printing = 1;
        # the start point is this line
        $start_point = $newline;
    }
    # if the line is an ending point then...
    $end_point = $newline if ($newline =~ m/^Endpoint/);

    # if we have seen the start and end point then process them
    if ( ( $start_point ne "" ) && ( $end_point ne "" ) ) {
        # join the start and end points
        $start_point = $start_point . " " . $end_point;
        # if we have not seen this start/end point before...
        if ( grep(/$start_point/, @points) == 0 ) {
            # remember it by adding it to our points array
            push @points, $start_point;
            # indicate that the information should be printed out
            $should_output = 1;
        }
    }

    # if the line begins with "slack" and the info should be printed...
    if ( ($newline =~ m/^slack/) && ($should_output == 1) ) {
        # print it out!
        print $print_this . $newline . "\n";
    };
    # if the line begins with "slack" then reset the variables
    if ( $newline =~ m/^slack/ ) {
        $printing = 0;
        $print_this = "";
        $should_output = 0;
        $start_point = "";
        $end_point = "";
    };
    # if we are inside a block (slack has not been reached yet)
    if ( $printing == 1 ) {
        # add the new line to the data that could be printed out
        $print_this .= $newline;
    };
}; # end loop through text file

# end the script
exit(0);
### END ###
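A plausible cause of the slowness reported in the next clarification request: once both $start_point and $end_point are set, the line $start_point = $start_point . " " . $end_point; runs again on every following line until slack. The key keeps growing, is typically pushed onto @points again on each line, and every grep(/$start_point/, @points) scans the whole array with an ever-longer regular expression, so the work grows roughly with the square of the number of lines. Because the match is a regex rather than an equality test, a repeated pair can also slip through if its block is longer than the first one, and characters such as . [ ( in a path would act as regex metacharacters. Below is a minimal sketch of the same filter built on a hash lookup instead. It assumes the layout shown in the question (a Startpoint: line, an Endpoint: line and a closing slack line, each possibly indented) and keys on the first word after each keyword, so any extra text on those lines is ignored; filter_blocks is a hypothetical name:

```perl
use strict;
use warnings;

# Sketch: keep each Startpoint..slack block only the first time its
# (Startpoint, Endpoint) combination appears. Takes a reference to an
# array of lines (each still ending in "\n") and returns the text to print.
sub filter_blocks {
    my ($lines) = @_;
    my %seen;           # "start\0end" => 1 for every pair already printed
    my $output   = "";  # everything we decide to print
    my $block    = "";  # lines collected for the current block
    my $in_block = 0;   # true between a Startpoint line and its slack line
    my ($start, $end);  # path values from the current block

    foreach my $line (@$lines) {
        # \s* allows any amount of leading whitespace, including none
        if ($line =~ /^\s*Startpoint:\s*(\S+)/) {
            $in_block = 1;
            $block    = "";
            ($start, $end) = ($1, undef);
        }
        next unless $in_block;    # ignore text outside a block
        $block .= $line;
        if ($line =~ /^\s*Endpoint:\s*(\S+)/) {
            $end = $1;
        }
        if ($line =~ /^\s*slack/) {
            # the key is built once per block and never grows
            my $key = join("\0", $start, defined $end ? $end : "");
            if (!exists $seen{$key}) {
                $seen{$key} = 1;
                # the block up to and including slack, then a blank line
                $output .= $block . "\n";
            }
            $in_block = 0;
        }
    }
    return $output;
}
```

To run it on a whole file, something like my @lines = <STDIN>; print filter_blocks(\@lines); would read the report from standard input; how the file is supplied is an assumption here.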
Request for Answer Clarification by mickr-ga on 19 Jul 2004 07:11 PDT

Hi palitoy,

It works with some minor modifications:
- Startpoint etc. are not at the beginning of the line; there is a space before them, so no ^ is necessary.
- There are empty lines, so I have changed the startpoint ne "" etc. to startpoint ne "dummys".

It appears to work, but it is very, very slow. I have 127 entities and it has only processed 19 in 13 minutes. The input file is 3.6 MB with 25k lines. Any idea why it is so slow? Is it possible for me to send you the input file to try it?

Regards,
Mick
Clarification of Answer by palitoy-ga on 19 Jul 2004 09:16 PDT

Hi Mick,

Thanks for the clarification. I will deal with your points one at a time, if I may...

1) If the line always begins with [SPACE]Startpoint, then I would suggest these changes:

    if ($newline =~ m/^Startpoint/) {
to:
    if ($newline =~ m/^\sStartpoint/) {

This would look for a line starting with a space followed by "Startpoint". Is this standard throughout the file, or does the number of spaces change? Is there ever anything else before the "Startpoint"?

Similarly:
    $end_point = $newline if ($newline =~ m/^Endpoint/);
to:
    $end_point = $newline if ($newline =~ m/^\sEndpoint/);

And:
    if ( ($newline =~ m/^slack/) && ($should_output == 1) ) {
to:
    if ( ($newline =~ m/^\sslack/) && ($should_output == 1) ) {

And:
    if ( $newline =~ m/^slack/ ) {
to:
    if ( $newline =~ m/^\sslack/ ) {

2) Re: changing startpoint ne "dummys" - I am unclear why you have done this... is it because there are empty lines between each example? The if ( ( $start_point ne "" ) && ( $end_point ne "" ) ) { section is used to determine whether the script has found the start and end points of a section; this happens when these two variables are no longer blank. Changing the startpoint test to not equal "dummys" would mean that the condition no longer works as intended. If you changed this because of the empty lines between each example, the proper solution would be to change this:
    print $print_this . $newline . "\n";
to this:
    print $print_this . $newline ;

3) The speed of the script depends on a number of factors, but I would not have thought it would take as long as you are describing. I am guessing that the problem is caused by the changes described in part 2) here.

4) Unfortunately we are not allowed to give out any personal details that would let people contact us. Any such information would be removed as soon as the Google Answers editors saw it. Some people who ask questions put their email addresses in the questions or clarifications, but this is frowned upon by the Google Answers editors...
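The question of whether the number of leading spaces varies can be side-stepped: \s matches exactly one whitespace character, whereas \s* matches zero or more, so ^\s*Startpoint matches however far the keyword is indented (spaces or tabs). A small sketch; line_kind is a hypothetical helper name, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: classify a line by its leading keyword, ignoring any
# amount of leading whitespace. Returns "start", "end", "slack", or "" for
# any other line.
sub line_kind {
    my ($line) = @_;
    return "start" if $line =~ /^\s*Startpoint/;
    return "end"   if $line =~ /^\s*Endpoint/;
    return "slack" if $line =~ /^\s*slack/;
    return "";
}
```

With the single \s form suggested above, a line indented by two spaces would not match at all, so ^\s* is the more forgiving choice if the indentation is not uniform.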
Clarification of Answer by palitoy-ga on 20 Jul 2004 01:34 PDT

Glad I could sort it out for you. Thanks for the 5-star rating and tip - they are both appreciated.

Request for Answer Clarification by mickr-ga on 23 Jul 2004 05:07 PDT

Hi Palitoy,

Thanks for all your help previously. I am stuck on Perl modules now; I have posted a new question if you are interested.

Thanks,
Mick

Clarification of Answer by palitoy-ga on 23 Jul 2004 05:35 PDT

Hello Mick,

I am just looking at that question now for you... What operating system are you running, and which version of Perl?

mickr-ga rated this answer: 5 out of 5 stars

and gave an additional tip of: $10.00

Thanks, it worked; my debug didn't!

Comments

There are no comments at this time.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.
