Q: Perl script (Answered, 5 out of 5 stars, 0 Comments)
Question  
Subject: Perl script
Category: Computers > Programming
Asked by: mickr-ga
List Price: $20.00
Posted: 22 Jun 2004 10:05 PDT
Expires: 22 Jul 2004 10:05 PDT
Question ID: 364584
Hi,

I want a perl script to parse a file

columb1 columb2   columb3 columb4 columb5 columb6 
pin     here/1/2  -0.456  14.56   name    name1
pin     here/3/2  -0.966  1.56    joe     john
pin     here/1/2  -1.256  12.56   name    name1
pin     here/3/2  -8.23   2.67    fred    bill
I want the output to have one line for each distinct value of columb 2,
i.e. here/1/2 and here/3/2 in this case.

The line output should have the lowest number for 
columb 3 and columb 4 for each columb 2.

i.e. 

here/1/2 columb 3 = -0.456 or -1.256 so -1.256 is chosen
here/1/2 columb 4 = 14.56 or 12.56 so 12.56 is chosen

here/3/2 columb 3 = -0.966 or -8.23 so -8.23 is chosen
here/3/2 columb 4 = 1.56 or 2.67 so 1.56 is chosen

hence the output file would be

here/1/2  -1.256   12.56
here/3/2  8.23     1.56

Please make the script user friendly for a complete novice.

Thanks,

Mick
Answer  
Subject: Re: Perl script
Answered By: palitoy-ga on 22 Jun 2004 10:57 PDT
Rated: 5 out of 5 stars
 
Hello Mick

I have made one assumption in solving this problem, because I could
not tell it from your data set: that the columns are separated by a
tab.  That is, they go columb1{tab}columb2{tab} etc.  If this is not
the case please ask for clarification and I can sort this out for you.
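
As a minimal sketch of what the separator assumption means in practice,
the snippet below splits one tab-separated line and one space-padded
line.  The two sample lines are made up from the data in the question;
only the standard split function is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines: one tab-separated, one padded with spaces.
my $tab_line   = "pin\there/1/2\t-0.456\t14.56\tname\tname1\n";
my $space_line = "pin     here/1/2  -0.456  14.56   name    name1\n";

# split /\t/ breaks the line at each single tab character.
my @by_tab = split /\t/, $tab_line;

# split ' ' (a string holding one space) is a special case in Perl: it
# skips leading whitespace and splits on any run of spaces, tabs or
# newlines.
my @by_space = split ' ', $space_line;

print "tab:   $by_tab[1] $by_tab[2] $by_tab[3]\n";
print "space: $by_space[1] $by_space[2] $by_space[3]\n";
```

Because split ' ' treats tabs as whitespace too, it should cope with
either kind of input file, whereas split /\t/ only works when the
columns really are separated by single tabs.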

I have tried to write the script so that it is as easy to read as
possible.  There are probably ways to make it shorter, but you asked
for it to be as user-friendly as possible.  Perl is notorious for how
easily it can be made unreadable!

Here is the script:

# START
#!/usr/bin/perl

# make it web-friendly so that it can be run on the internet
print "Content-type: text/html\n\n";

# open file and read it into an array to process
# note you may need to alter the position and name of the test.txt file
open (TXTFILE, "test.txt") or die "Cannot open test.txt: $!";
my @lines = <TXTFILE>;
close(TXTFILE);

# sort the array alphabetically - this is done so that each columb2 is
# in alphabetical order
@lines = sort(@lines);

# set up a counter to count the number of loops
my $counter = 0;

# create two hashes to hold the lowest columb3 and lowest columb4 numbers
# (despite its name, %maxcolumbs also holds a lowest value)
my (%mincolumbs, %maxcolumbs);

# create an array that holds all the different columb2 names
my @namesofcolumbs;

# create a variable that holds the last columb2 name found
my $columnname = "";

# loop through the lines in the file
foreach $newline (@lines) {
    # if it is the first line then ignore it as this is just column headers
    if ( $counter != 0 ) {
          # start processing the line into a new array
          # this splits the line at every tab character and assumes that
          # the columns are separated by a tab character
          my @splitline = split("\t",$newline);

          # this is where the data is held that we need to process
          # $splitline[1] = columb2
          # $splitline[2] = columb3
          # $splitline[3] = columb4
          # if this columb3 number is smaller than the stored value we have
          # (or we have no value stored) then store it
          if ( $splitline[2] < $mincolumbs[$splitline[1]] ||
$mincolumbs{$splitline[1]} == '' ) {
                 $mincolumbs{$splitline[1]} = $splitline[2];
          }
          # if this columb4 number is smaller than the stored value we have
          # (or we have no value stored) then store it
          if ( $splitline[3] < $maxcolumbs[$splitline[1]] ||
$maxcolumbs{$splitline[1]} == '' ) {
                 $maxcolumbs{$splitline[1]} = $splitline[3];
          }
          # if we have a new columb2 name then store it in
          # the namesofcolumbs array
          if ( $columnname eq "" || $columnname ne $splitline[1] ) {
                push @namesofcolumbs, $splitline[1];
                $columnname = $splitline[1];
          }
    }
# add one to the counter
$counter++;
}

# outputting the data
# for each name in the namesofcolumbs array get the lowest columb3 and
# columb4 values stored and output them to a text file named output.txt
open (TXTFILE, ">output.txt") or die "Cannot open output.txt: $!";
foreach $printout ( @namesofcolumbs ) {
         print TXTFILE $printout . "\t" . $mincolumbs{$printout} .
"\t" . $maxcolumbs{$printout} . "\n";
}
close(TXTFILE);
# END

The script reads in a file called test.txt and this should be placed
in the directory holding the script.  The script outputs a file called
output.txt in this directory also.

If you have any questions or require any additional help please ask
for clarification and I will do all I can to help.
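
The core idea of the script, keeping the lowest value seen so far for
each columb2 name in a hash, can also be sketched on its own.  This is
only an illustration under a few assumptions: the rows are held in
memory rather than read from test.txt, the header line is already
removed, and lowest_per_name is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the lowest columb3 and columb4 value seen for each columb2 name.
# lowest_per_name is a hypothetical helper, not part of the script above.
sub lowest_per_name {
    my @rows = @_;
    my (%low3, %low4);
    foreach my $row (@rows) {
        # split ' ' splits on any run of whitespace (spaces or tabs)
        my (undef, $name, $col3, $col4) = split ' ', $row;
        # store the value if none is stored yet or if it is smaller
        $low3{$name} = $col3 if !exists $low3{$name} || $col3 < $low3{$name};
        $low4{$name} = $col4 if !exists $low4{$name} || $col4 < $low4{$name};
    }
    return (\%low3, \%low4);
}

# The data rows from the question, without the header line.
my @rows = (
    "pin     here/1/2  -0.456  14.56   name    name1",
    "pin     here/3/2  -0.966  1.56    joe     john",
    "pin     here/1/2  -1.256  12.56   name    name1",
    "pin     here/3/2  -8.23   2.67    fred    bill",
);

my ($low3, $low4) = lowest_per_name(@rows);
foreach my $name (sort keys %$low3) {
    print "$name\t$low3->{$name}\t$low4->{$name}\n";
}
```

Testing with exists rather than comparing against an empty string
means a stored value of 0 is still treated as a real value.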

Clarification of Answer by palitoy-ga on 22 Jun 2004 11:36 PDT
Hello Mick

I must apologise, there is a small typo in the script I posted.

You need to change this line:

if ( $splitline[2] < $mincolumbs[$splitline[1]] ||
$mincolumbs{$splitline[1]} eq '' ) {

TO:

if ( $splitline[2] < $mincolumbs{$splitline[1]} ||
$mincolumbs{$splitline[1]} eq '' ) {

(NOTE THE { instead of [ and } instead of ] ).

Similarly change:

if ( $splitline[3] < $maxcolumbs[$splitline[1]] ||
$maxcolumbs{$splitline[1]} eq '' ) {

TO:

if ( $splitline[3] < $maxcolumbs{$splitline[1]} ||
$maxcolumbs{$splitline[1]} eq '' ) {

Sorry for the small typo once again.  If there is anything else please
let me know by asking for clarification.
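
For anyone wondering why the brackets matter: in Perl,
$mincolumbs[...] reads from an array called @mincolumbs, while
$mincolumbs{...} reads from the hash %mincolumbs.  The sketch below
assumes "use strict" is switched on (the posted script does not use
it), in which case Perl refuses to compile the bracket version because
no such array was declared.  compile_error is a made-up helper for
this demonstration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile and run a snippet at run time and return the error message,
# if any.  compile_error is a hypothetical helper for this demonstration.
sub compile_error {
    my ($code) = @_;
    eval $code;
    return $@;
}

# Square brackets index the array @mincolumbs, which was never declared.
my $with_brackets =
    compile_error('my %mincolumbs; my $x = $mincolumbs[0]; 1;');

# Curly braces look up a key in the hash %mincolumbs, which is declared.
my $with_braces =
    compile_error('my %mincolumbs; my $x = $mincolumbs{"here/1/2"}; 1;');

print "brackets: ", ($with_brackets ? "rejected" : "accepted"), "\n";
print "braces:   ", ($with_braces   ? "rejected" : "accepted"), "\n";
```

Adding "use strict;" and "use warnings;" near the top of a script is a
common way to have Perl point out this kind of slip, although it also
requires every variable to be declared with my.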

Request for Answer Clarification by mickr-ga on 22 Jun 2004 13:34 PDT
Hi,

I can't try this till tomorrow UK time, but it looks great and is just
what I asked for. However, I forgot to say I would like the output file
to be sorted on columb 3 (smallest first), so as -1.256 is less than
8.23 the order is still

here/1/2  -1.256   12.56
here/3/2  8.23     1.56

$10 bonus for the extra work. If it can't be done easily with the 
built in perl sort command I would be just as happy with an example
of calling the unix sort command within perl to do it. 

PS: rather than tabs, the input file is separated by multiple spaces,
so I think I can just use
my @splitline = split(" ",$newline);

Is that correct?

Thanks,

Mick

Request for Answer Clarification by mickr-ga on 23 Jun 2004 00:57 PDT
Hi,

Tried it this morning and it worked great - thanks!
I am still on for the $10 bonus for sorting
the output.

Thanks,

Mick

Clarification of Answer by palitoy-ga on 23 Jun 2004 01:26 PDT
Hi Mick

I will work on those corrections for you and post the solution nearer
lunchtime (I am in the UK also).

Clarification of Answer by palitoy-ga on 23 Jun 2004 03:02 PDT
Hello Mick

Here is the solution you require.  If you have any further questions
on this please ask for further clarification.

# START
#!/usr/bin/perl

# make it web-friendly
print "Content-type: text/html\n\n";

# open file and read it into an array to process
# note you may need to alter the position and name of the test.txt file
open (TXTFILE, "test.txt") or die "Cannot open test.txt: $!";
my @lines = <TXTFILE>;
close(TXTFILE);

# sort the array alphabetically - this is done so that each columb2 is
# in alphabetical order
@lines = sort(@lines);

# set up a counter to count the number of loops
my $counter = 0;

# create two hashes to hold the lowest columb3 and lowest columb4 numbers
# (despite its name, %maxcolumbs also holds a lowest value)
my (%mincolumbs, %maxcolumbs);

# create an array that holds all the different columb2 names
my @namesofcolumbs;

# create a variable that holds the last columb2 name found
my $columnname = "";

# loop through the lines in the file
foreach $newline (@lines) {
    # if it is the first line then ignore it as this is just column headers
    if ( $counter != 0 ) {
          # start processing the line into a new array
          # remove any multiple spaces
          $newline =~ s/\s{1,}/ /g ;

          # this splits the line at whitespace and assumes that
          # the columns are separated by one or more spaces
          my @splitline = split(" ",$newline);

          # this is where the data is held that we need to process
          # $splitline[1] = columb2
          # $splitline[2] = columb3
          # $splitline[3] = columb4
          # if this columb3 number is smaller than the stored value we have
          # (or we have no value stored) then store it
          if ( $splitline[2] < $mincolumbs{$splitline[1]} ||
$mincolumbs{$splitline[1]} eq '' ) {
                 $mincolumbs{$splitline[1]} = $splitline[2];
          }
          # if this columb4 number is smaller than the stored value we have
          # (or we have no value stored) then store it
          if ( $splitline[3] < $maxcolumbs{$splitline[1]} ||
$maxcolumbs{$splitline[1]} eq '' ) {
                 $maxcolumbs{$splitline[1]} = $splitline[3];
          }
          # if we have a new columb2 name then store it in
          # the namesofcolumbs array
          if ( $columnname eq "" || $columnname ne $splitline[1] ) {
                push @namesofcolumbs, $splitline[1];
                $columnname = $splitline[1];
          }
    }
# add one to the counter
$counter++;
}

# sort the arrays so they are in the correct order and output the data
# to a file
@keys = sort { $mincolumbs{$a} cmp $mincolumbs{$b} } ( keys %mincolumbs );
open (TXTFILE, ">output.txt") or die "Cannot open output.txt: $!";
foreach $key ( @keys ) {
   print TXTFILE $key . " " . $mincolumbs{$key} . " " . $maxcolumbs{$key} . "\n";
}
close(TXTFILE);

# close the program
exit(0);
#END

Request for Answer Clarification by mickr-ga on 23 Jun 2004 05:01 PDT
Hi,

Thanks very much. I will add the $10 as a tip when I rate the question.

The sort almost worked but by default it gives the larger number first
so I got

here/3/2  8.23     1.56
here/1/2  -1.256   12.56

instead of 

here/1/2  -1.256   12.56
here/3/2  8.23     1.56

I couldn't find a sort -r switch in perl so I just did
reverse (sort { $mincolumbs{$a} cmp $mincolumbs{$b} } ( keys %mincolumbs ) ) ;

Is that OK, or is there a better way?

Thanks,

Mick

Clarification of Answer by palitoy-ga on 23 Jun 2004 05:18 PDT
Thanks for the 5-star rating and tip, they are much appreciated.

If you need any further help please ask.  The reverse() solution you
came up with is the method I would have used also, as it is the
easiest and most obvious one when reading the script through.  I
always try to write my scripts so that they are as readable as
possible, since that makes them much easier for anyone to edit later.
I am glad you appreciate this!

Thanks again!
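
One point that may help readers comparing sort orders: cmp compares
values as text, while <=> compares them as numbers, and the two can
disagree.  The sketch below is an illustration only.  It uses 8.23 as
written in the example output above, and the here/5/2 entry is a
made-up third value added to show the difference.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lowest columb3 values keyed by columb2 name.  here/5/2 is hypothetical.
my %mincolumbs = (
    'here/1/2' => -1.256,
    'here/3/2' => 8.23,
    'here/5/2' => 10.5,
);

# cmp sorts as text, so "10.5" comes before "8.23" ("1" is before "8").
my @as_text = sort { $mincolumbs{$a} cmp $mincolumbs{$b} } keys %mincolumbs;

# <=> sorts as numbers, smallest first.
my @as_numbers = sort { $mincolumbs{$a} <=> $mincolumbs{$b} } keys %mincolumbs;

# Swapping $a and $b gives largest first without needing reverse().
my @largest_first = sort { $mincolumbs{$b} <=> $mincolumbs{$a} } keys %mincolumbs;

print "cmp:           @as_text\n";
print "<=>:           @as_numbers\n";
print "largest first: @largest_first\n";
```

With only a couple of values the text and numeric orders may happen to
agree, so it is worth trying a numeric sort against the real data
before relying on reverse().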

Request for Answer Clarification by mickr-ga on 19 Jul 2004 02:29 PDT
Hi,

I would like to get another perl script please. I have posted the
question if you would like to do it.

Thanks,

Mick

Clarification of Answer by palitoy-ga on 19 Jul 2004 05:14 PDT
Hi Mick

I have just completed your other script question.  Hopefully you will
find it works to your needs, if not just ask for clarification on that
question and I will work it through with you again.

mickr-ga rated this answer: 5 out of 5 stars and gave an additional tip of: $10.00
Excellent script and support - very readable, as I could read and
understand it. I have even made a few changes ;-)
Thanks very much. I plan to use you again if I have to do perl again.

Comments  
There are no comments at this time.
