Google Answers: Perl script


Q: Perl script (Answered, 5 out of 5 stars, 0 Comments)

Question

Subject: Perl script
Category: Computers > Programming
Asked by: mickr-ga
List Price: $20.00

Posted: 22 Jun 2004 10:05 PDT
Expires: 22 Jul 2004 10:05 PDT
Question ID: 364584

Hi,

I want a perl script to parse a file

columb1 columb2   columb3 columb4 columb5 columb6 
pin     here/1/2  -0.456  14.56   name    name1
pin     here/3/2  -0.966  1.56    joe     john
pin     here/1/2  -1.256  12.56   name    name1
pin     here/3/2  -8.23   2.67    fred    bill
I want the output to have one line for each distinct columb 2 value,
i.e. here/1/2 and here/3/2 in this case.

The output line should have the lowest number in
columb 3 and in columb 4 for each columb 2.

i.e. 

here/1/2 columb 3 = -0.456 or -1.256 so -1.256 is chosen
here/1/2 columb 4 = 14.56 or 12.56 so 12.56 is chosen

here/3/2 columb 3 = -0.966 or -8.23 so -8.23 is chosen
here/3/2 columb 4 = 1.56 or 2.67 so 1.56 is chosen

hence the output file would be

here/1/2  -1.256   12.56
here/3/2  8.23     1.56

Please make the script user friendly for a complete novice.

Thanks,

Mick

Answer

Subject: Re: Perl script
Answered By: palitoy-ga on 22 Jun 2004 10:57 PDT
Rated: 5 out of 5 stars

Hello Mick,

I have made one assumption in solving this problem which I was unable to confirm from your data set: that the columns are separated by a tab. That is, they go columb1{tab}columb2{tab} etc. If this is not the case, please ask for clarification and I can sort this out for you.

I have tried to write the script in the way that is most easily read. There are probably ways it could be written to make it smaller, but you requested that it be as user-friendly as possible. Perl is notorious for being able to be made unreadable!

Here is the script:

# START
#!/usr/bin/perl

# make it web-friendly so that it can be run on the internet
print "Content-type: text/html\n\n";

# open file and read it into an array to process
# note you may need to alter the position and name of the test.txt file
open (TXTFILE, "test.txt");
my @lines = <TXTFILE>;
close(TXTFILE);

# sort the array alphabetically - this is done so that each columb2 is
# in alphabetical order
@lines = sort(@lines);

# set up a counter to count the number of loops
my $counter = 0;
# create two hashes to hold the information about the highest and lowest
# numbers
my (%mincolumbs, %maxcolumbs);
# create an array that holds all the different columb2 names
my @namesofcolumbs;
# create a variable that holds the last columb2 name found
my $columnname = "";

# loop through the lines in the file
foreach $newline (@lines) {
    # if it is the first line then ignore it as this is just column headers
    if ( $counter != 0 ) {
        # start processing the line into a new array
        # this splits the line at every tab character and assumes that
        # the columns are separated by a tab character
        my @splitline = split("\t",$newline);
        # this is where the data is held that we need to process
        # $splitline[1] = columb2
        # $splitline[2] = columb3
        # $splitline[3] = columb4

        # if this number is smaller than the stored value we have
        # (or we have no value stored) then store it
        if ( $splitline[2] < $mincolumbs[$splitline[1]] || $mincolumbs{$splitline[1]} == '' ) {
            $mincolumbs{$splitline[1]} = $splitline[2];
        }
        # if this number is smaller than the stored value we have
        # (or we have no value stored) then store it
        if ( $splitline[3] < $maxcolumbs[$splitline[1]] || $maxcolumbs{$splitline[1]} == '' ) {
            $maxcolumbs{$splitline[1]} = $splitline[3];
        }
        # if we have a new columb2 name then store it in
        # the namesofcolumbs array
        if ( $columnname eq "" || $columnname ne $splitline[1] ) {
            push @namesofcolumbs, $splitline[1];
            $columnname = $splitline[1];
        }
    }
    # add one to the counter
    $counter++;
}

# outputting the data
# for each name in the namesofcolumbs array get the highest and lowest values
# stored and output it to a text file named output.txt
open (TXTFILE, ">output.txt");
foreach $printout ( @namesofcolumbs ) {
    print TXTFILE $printout . "\t" . $mincolumbs{$printout} . "\t" . $maxcolumbs{$printout} . "\n";
}
close(TXTFILE);
# END

The script reads in a file called test.txt, which should be placed in the directory holding the script. The script also writes a file called output.txt to this directory.

If you have any questions or require any additional help, please ask for clarification and I will do all I can to help.
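The script above calls open without checking whether it succeeded, so a missing test.txt would quietly produce an empty output.txt. A minimal sketch of a checked read, assuming the same file name test.txt; the helper name read_lines is hypothetical and is not part of the posted script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the posted script): read every line of a
# file into an array, stopping with a clear message if the file cannot
# be opened.
sub read_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}

# Only attempt the read when the input file is actually present.
if (-e 'test.txt') {
    my @lines = read_lines('test.txt');
    print scalar(@lines), " lines read from test.txt\n";
}
```

The three-argument form of open with a lexical filehandle ($fh) is the commonly recommended style; the bareword TXTFILE handle used in the script also works.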
Clarification of Answer by palitoy-ga on 22 Jun 2004 11:36 PDT

Hello Mick,

I must apologise, there is a small typo in the script I posted. You need to change this line:

if ( $splitline[2] < $mincolumbs[$splitline[1]] || $mincolumbs{$splitline[1]} eq '' ) {

TO:

if ( $splitline[2] < $mincolumbs{$splitline[1]} || $mincolumbs{$splitline[1]} eq '' ) {

(NOTE THE { instead of [ and } instead of ] ).

Similarly change:

if ( $splitline[3] < $maxcolumbs[$splitline[1]] || $maxcolumbs{$splitline[1]} eq '' ) {

TO:

if ( $splitline[3] < $maxcolumbs{$splitline[1]} || $maxcolumbs{$splitline[1]} eq '' ) {

Sorry for the small typo once again. If there is anything else please let me know by asking for clarification.
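The correction above matters because the two bracket styles read different variables in Perl: $mincolumbs{...} looks a key up in the hash %mincolumbs, while $mincolumbs[...] indexes a separate array named @mincolumbs, which the script never fills. A small sketch of the per-key minimum using hash lookups throughout. The helper name update_min is hypothetical, and the exists() check is a suggested alternative to comparing with '', not the test the posted script uses:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the smallest value seen so far for each key.
# exists() is used for the "nothing stored yet" check instead of
# comparing with ''; this is an alternative, not the posted script's test.
sub update_min {
    my ($min_for, $key, $value) = @_;
    if ( !exists $min_for->{$key} || $value < $min_for->{$key} ) {
        $min_for->{$key} = $value;
    }
    return;
}

my %mincolumbs;
update_min(\%mincolumbs, 'here/1/2', -0.456);
update_min(\%mincolumbs, 'here/1/2', -1.256);
update_min(\%mincolumbs, 'here/3/2', -0.966);

# Curly braces look the value up in the hash by its key.
print "here/1/2 => $mincolumbs{'here/1/2'}\n";   # prints here/1/2 => -1.256
```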
Request for Answer Clarification by mickr-ga on 22 Jun 2004 13:34 PDT

Hi,

Can't try this till tomorrow UK time but it looks great and just what I asked for.

However, I forgot to say I would like the output file to be sorted on columb 3 (smallest first), so as -1.256 is less than 8.23 the order is still

here/1/2 -1.256 12.56
here/3/2 8.23 1.56

$10 bonus for the extra work. If it can't be done easily with the built-in perl sort command I would be just as happy with an example of calling the unix sort command within perl to do it.

PS rather than tabs the input file is separated by multiple spaces, so I think I can just use

my @splitline = split(" ",$newline);

Is that correct?

Thanks,

Mick
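On the split question above: in Perl, split with a single-space string as its pattern is a documented special case that splits on any run of whitespace and ignores leading whitespace. A minimal sketch using one of the sample rows from the question:

```perl
use strict;
use warnings;

# One data row from the question, with columns separated by several spaces.
my $newline = "pin     here/3/2  -8.23   2.67    fred    bill\n";

# split with the string ' ' splits on runs of whitespace, skips leading
# whitespace, and leaves no trailing empty field for the newline.
my @splitline = split(' ', $newline);

print scalar(@splitline), " fields\n";     # prints 6 fields
print "columb2 = $splitline[1]\n";         # prints columb2 = here/3/2
print "columb3 = $splitline[2]\n";         # prints columb3 = -8.23
```

So no separate step to squeeze repeated spaces is required before splitting this way.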
Request for Answer Clarification by mickr-ga on 23 Jun 2004 00:57 PDT

Hi,

Tried it this morning and it worked great - Thanks! I am still on for the $10 bonus for sorting the output.

Thanks,

Mick

Clarification of Answer by palitoy-ga on 23 Jun 2004 01:26 PDT

Hi Mick,

I will work on those corrections for you and post the solution nearer lunchtime (I am in the UK also).
Clarification of Answer by palitoy-ga on 23 Jun 2004 03:02 PDT

Hello Mick,

Here is the solution you require. If you have any further questions on this please ask for further clarification.

# START
#!/usr/bin/perl

# make it web-friendly
print "Content-type: text/html\n\n";

# open file and read it into an array to process
# note you may need to alter the position and name of the test.txt file
open (TXTFILE, "test.txt");
my @lines = <TXTFILE>;
close(TXTFILE);

# sort the array alphabetically - this is done so that each columb2 is
# in alphabetical order
@lines = sort(@lines);

# set up a counter to count the number of loops
my $counter = 0;
# create two hashes to hold the information about the highest and lowest numbers
my (%mincolumbs, %maxcolumbs);
# create an array that holds all the different columb2 names
my @namesofcolumbs;
# create a variable that holds the last columb2 name found
my $columnname = "";

# loop through the lines in the file
foreach $newline (@lines) {
    # if it is the first line then ignore it as this is just column headers
    if ( $counter != 0 ) {
        # start processing the line into a new array
        # remove any multiple spaces
        $newline =~ s/\s{1,}/ /g ;
        # this splits the line at every space and assumes that
        # the columns are separated by spaces
        my @splitline = split(" ",$newline);
        # this is where the data is held that we need to process
        # $splitline[1] = columb2
        # $splitline[2] = columb3
        # $splitline[3] = columb4

        # if this number is smaller than the stored value we have
        # (or we have no value stored) then store it
        if ( $splitline[2] < $mincolumbs{$splitline[1]} || $mincolumbs{$splitline[1]} eq '' ) {
            $mincolumbs{$splitline[1]} = $splitline[2];
        }
        # if this number is smaller than the stored value we have
        # (or we have no value stored) then store it
        if ( $splitline[3] < $maxcolumbs{$splitline[1]} || $maxcolumbs{$splitline[1]} eq '' ) {
            $maxcolumbs{$splitline[1]} = $splitline[3];
        }
        # if we have a new columb2 name then store it in
        # the namesofcolumbs array
        if ( $columnname eq "" || $columnname ne $splitline[1] ) {
            push @namesofcolumbs, $splitline[1];
            $columnname = $splitline[1];
        }
    }
    # add one to the counter
    $counter++;
}

# sort the arrays so they are in the correct order and output the data
# to a file
@keys = sort { $mincolumbs{$a} cmp $mincolumbs{$b} } ( keys %mincolumbs );
open (TXTFILE, ">output.txt");
foreach $key ( @keys ) {
    print TXTFILE $key . " " . $mincolumbs{$key} . " " . $maxcolumbs{$key} . "\n";
}
close(TXTFILE);

# close the program
exit(0);
#END
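The steps in the script above (skip the header, split on whitespace, keep the lowest columb3 and columb4 for each columb2, sort on columb3) can also be sketched more compactly. This is an illustrative variant under stated assumptions, not the script as posted: it takes the lines as a list instead of reading test.txt, drops the header by position before anything is sorted, sorts numerically with <=>, and the name lowest_per_key is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: given the lines of the input file (header first),
# return one "columb2 columb3 columb4" string per distinct columb2,
# holding the lowest columb3 and lowest columb4 seen, ordered by the
# stored columb3 value, smallest first.
sub lowest_per_key {
    my @lines = @_;
    shift @lines;                      # drop the header row
    my (%min3, %min4);
    for my $line (@lines) {
        my @f = split(' ', $line);     # split on runs of whitespace
        next if @f < 4;                # ignore blank or short lines
        my ($key, $c3, $c4) = @f[1, 2, 3];
        $min3{$key} = $c3 if !exists $min3{$key} || $c3 < $min3{$key};
        $min4{$key} = $c4 if !exists $min4{$key} || $c4 < $min4{$key};
    }
    my @out;
    # Numeric sort on the stored columb3 value.
    for my $key ( sort { $min3{$a} <=> $min3{$b} } keys %min3 ) {
        push @out, "$key $min3{$key} $min4{$key}";
    }
    return @out;
}

# Sample data taken from the question.
my @sample = (
    "columb1 columb2   columb3 columb4 columb5 columb6\n",
    "pin     here/1/2  -0.456  14.56   name    name1\n",
    "pin     here/3/2  -0.966  1.56    joe     john\n",
    "pin     here/1/2  -1.256  12.56   name    name1\n",
    "pin     here/3/2  -8.23   2.67    fred    bill\n",
);

print "$_\n" for lowest_per_key(@sample);
```

With the sample rows as printed in the question, where the here/3/2 value is -8.23, a numeric ascending sort lists here/3/2 before here/1/2.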
Request for Answer Clarification by mickr-ga on 23 Jun 2004 05:01 PDT

Hi,

Thanks very much. I will add the $10 as a tip when I rate the question.

The sort almost worked, but by default it gives the larger number first so I got

here/3/2 8.23 1.56
here/1/2 -1.256 12.56

instead of

here/1/2 -1.256 12.56
here/3/2 8.23 1.56

I couldn't find a sort -r switch in perl so I just did

reverse (sort { $mincolumbs{$a} cmp $mincolumbs{$b} } ( keys %mincolumbs ) ) ;

Is that OK, or is there a better way?

Thanks,

Mick
Clarification of Answer by palitoy-ga on 23 Jun 2004 05:18 PDT

Thanks for the 5-star rating and tip, they are much appreciated. If you need any further help please ask.

The reverse() solution you came up with is the method I would have used also, as it is the easiest and most common-sense one when reading the script through. I always try to write my scripts so that they are as readable as possible, as this makes them much easier for anyone to edit later. I am glad you appreciate this!

Thanks again!
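One point worth noting on the sort discussed above: cmp compares values as strings, while <=> compares them as numbers, so the two can order the same list differently. A short sketch with made-up values (not taken from Mick's file) showing both, plus a descending numeric sort written by swapping $a and $b rather than calling reverse:

```perl
use strict;
use warnings;

# Made-up values for illustration only.
my @values = (10, 9, -2.5, 100);

my @as_strings = sort { $a cmp $b } @values;   # string order
my @as_numbers = sort { $a <=> $b } @values;   # numeric, smallest first
my @descending = sort { $b <=> $a } @values;   # numeric, largest first

print "cmp:        @as_strings\n";    # prints cmp:        -2.5 10 100 9
print "<=>:        @as_numbers\n";    # prints <=>:        -2.5 9 10 100
print "descending: @descending\n";    # prints descending: 100 10 9 -2.5
```

For a column of numbers such as columb3, the numeric form is usually the one wanted; applied to the script it would read sort { $mincolumbs{$a} <=> $mincolumbs{$b} } ( keys %mincolumbs ).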
Request for Answer Clarification by mickr-ga on 19 Jul 2004 02:29 PDT

Hi,

I would like to get another perl script please. I have posted the question if you would like to do it.

Thanks,

Mick

Clarification of Answer by palitoy-ga on 19 Jul 2004 05:14 PDT

Hi Mick,

I have just completed your other script question. Hopefully you will find it works to your needs; if not, just ask for clarification on that question and I will work it through with you again.

mickr-ga rated this answer: 5 out of 5 stars

and gave an additional tip of: $10.00

Excellent script and support - very readable as I could read and understand
it, I have even made a few changes ;-)
Thanks very much. Plan to use you again if I have to do perl again.

Comments

There are no comments at this time.
