 
Q: Perl script to modify text files (Answered 5 out of 5 stars, 0 Comments)
Question  
Subject: Perl script to modify text files
Category: Computers > Programming
Asked by: gerry1234-ga
List Price: $30.00
Posted: 22 Jun 2006 11:23 PDT
Expires: 22 Jul 2006 11:23 PDT
Question ID: 740263
I would like a Perl script suitable for modifying text files as
follows.  In case line-wrapping makes the file structure unclear: in
the input example, each line begins with "part" and ends with ";",
and there are no other occurrences of "part" or ";" in a line.  In the
output example, each line starts with "(" and ends with ";", the only
";" in the line.
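A minimal Perl sketch of that record structure follows, assuming each record sits on a single physical line as described. The helper name parse_record is hypothetical (not from the question or any library), and the sample line uses a shortened, made-up tree:

```perl
use strict;
use warnings;

# Hypothetical helper: split one input record of the form
#   part NAME = [&U] [&W 1/2] (tree);
# into its parts. The [&U] marker is optional; the tree is taken from the
# first "(" to the last ")" before the closing ";".
sub parse_record {
    my ($line) = @_;
    return unless $line =~ /^\s*part\s+(\S+)\s*=\s*(.*?)\s*(\(.*\))\s*;\s*$/;
    my ($name, $markers, $tree) = ($1, $2, $3);
    my ($weight) = $markers =~ /\[&W\s+([^\]]+)\]/;
    return {
        name     => $name,
        unrooted => ($markers =~ /\[&U\]/ ? 1 : 0),
        weight   => $weight,
        tree     => $tree,
    };
}

my $rec = parse_record("part B_1.1 = [&U] [&W 1/2] ((a:0.1,b:0.2):0.3,c:0);");
print "$rec->{name} | $rec->{weight} | $rec->{unrooted} | $rec->{tree}\n";
# prints: B_1.1 | 1/2 | 1 | ((a:0.1,b:0.2):0.3,c:0)
```

Lines that do not fit the pattern (for example a stray "[1];") come back as undef rather than being half-parsed.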

Example text file input:
part B_1.1 = [&U] [&W 1/2]
(item_3908:0.0073,(((((item_4436:0.0052,(((item_4447:0.0126,item_1670:0.0105):0.0040,(item_4449:0.0116,(((item_4450:0.0010,item_1597:0.0021):0.0064,((item_1509:0.0234,(item_1607:0.0111,((item_1636:0.0165,item_1639:5.783e-006):0.0075,item_1642:0.0043):0.0066):0.0221):0.0271,item_1665:0.0061):0.0013):0.0065,item_1610:0.0056):0.0839):0.0114):0.0131,((item_4448:0.0034,item_1620:0.0019):0.0320,(((((item_4814:0,item_42767:0,item_43257:0):0.0010,item_4815:0.0042):0.0005,item_573:0.0160):0.0154,item_1617:0.0095):0.0031,(item_87:1.908e-006,item_68426:0):0.0069):0.0146):0.0058):0.0099):0.0067,item_1350:0):0.0196,item_1600:0.0018):0.0049,item_4944:0,item_1612:0):0.0199,item_4934:0.0078):0.0008,item_1668:0.0033);
part B_1.2 = [&W 1/2]
(item_3908:0.0073,(((((item_4436:0.0052,(((item_4447:0.0126,item_1670:0.0105):0.0040,(item_4449:0.0116,(((item_4450:0.0010,item_1597:0.0021):0.0064,((item_1509:0.0234,(item_1607:0.0111,((item_1636:0.0165,item_1639:2.891e-006):0.0075,item_1642:0.0043):0.0066):0.0221):0.0271,item_1665:0.0061):0.0013):0.0065,item_1610:0.0056):0.0839):0.0114):0.0131,((item_4448:0.0034,item_1620:0.0019):0.0320,(((((item_4814:0,item_42767:0,item_43257:0):0.0010,item_4815:0.0042):0.0005,item_573:0.0160):0.0154,item_1617:0.0095):0.0031,(item_87:9.539e-007,item_68426:0):0.0069):0.0146):0.0058):0.0099):0.0067,item_1350:9.539e-007):0.0196,item_1600:0.0018):0.0049,(item_4944:0,item_1612:0):3.875e-006):0.0199,item_4934:0.0078):0.0008,item_1668:0.0033);
part B_2.1 = [&W 1/16]
(item_3908:0.0032,((((item_4436:0.0052,(((item_4447:0.0104,item_1670:0.0197):0.0032,(item_4449:0.0168,((((item_4450:0.0021,item_1597:0):0.0044,item_1610:0.0077):0.0013,item_1665:0.0008):0.0165,(item_1509:0.0143,(item_1607:0.0107,((item_1636:0.0134,item_1639:0):0.0077,item_1642:1.905e-006):0.0026):0.0359):0.0171):0.0763):0.0034):0.0137,((item_4448:0.0042,item_1620:0):0.0411,(((item_4814:0.0010,((item_4815:0.0031,item_43257:0):3.927e-006,(item_573:0.0107,item_42767:0):3.824e-006):0.0021):0.0171,(item_87:0,item_68426:0.0031):0.0114):0.0002,item_1617:0.0056):0.0177):0.0016):0.0134):0.0088,item_1350:3.835e-006):0.0151,item_1612:0):0.0011,item_4944:0,item_1600:0.0011):0.0223,item_4934:0.0011,item_1668:0.0021);
part B_2.2 = [&W 1/16]
(item_3908:0.0032,(((((item_4436:0.0052,(((item_4447:0.0104,item_1670:0.0197):0.0032,(item_4449:0.0168,((((item_4450:0.0021,item_1597:0):0.0044,item_1610:0.0077):0.0013,item_1665:0.0008):0.0165,(item_1509:0.0143,(item_1607:0.0107,((item_1636:0.0134,item_1639:0):0.0077,item_1642:9.525e-007):0.0026):0.0359):0.0171):0.0763):0.0034):0.0137,((item_4448:0.0042,item_1620:0):0.0411,(((item_4814:0.0010,((item_4815:0.0031,item_43257:0):1.964e-006,(item_573:0.0107,item_42767:0):1.912e-006):0.0021):0.0171,(item_87:0,item_68426:0.0031):0.0114):0.0002,item_1617:0.0056):0.0177):0.0016):0.0134):0.0088,item_1350:1.917e-006):0.0151,item_1612:1.975e-006):0.0011,item_1600:0.0011):1.978e-006,item_4944:0):0.0223,item_4934:0.0011,item_1668:0.0021);
part B_3.1 = [&U] [&W 1/9]
(item_3908:0.0086,(((((item_4436:0.0060,(((((item_4814:0.0008,item_42767:0,item_43257:0):0.0001,item_4815:0.0076):0.0032,item_573:0.0191):0.0104,item_1617:0.0138):0.0016,(item_87:0,item_68426:0.0033):0.0053):0.0194):0.0081,((item_4447:0.0116,item_1670:0.0206):0.0013,(item_4449:0.0073,(((item_4450:0.0021,item_1597:0.0010):0.0045,(item_1610:0.0088,item_1665:0.0021):0.0009):0.0066,(item_1509:0.0256,(item_1607:0.0060,((item_1636:0.0142,item_1639:0):0.0071,item_1642:0.0005):0.0054):0.0224):0.0193):0.0859):0.0083):0.0124):0.0031,item_1350:0.0029):0.0089,(item_4448:0.0044,item_1620:0.0020):0.0236):0.0112,(item_4944:0,item_1600:0.0011,item_1612:0):0.0050):0.0047,item_4934:0.0054,item_1668:0.0021);
part B_3.2 = [&U] [&W 1/9]
(item_3908:0.0086,(((((item_4436:0.0062,((((((item_4814:0.0009,item_42767:0):2.847e-005,item_4815:0.0076):7.631e-006,item_43257:0):0.0033,item_573:0.0191):0.0104,item_1617:0.0138):0.0015,(item_87:0,item_68426:0.0033):0.0054):0.0194):0.0080,((item_4447:0.0116,item_1670:0.0206):0.0013,(item_4449:0.0073,(((item_4450:0.0021,item_1597:0.0010):0.0045,(item_1610:0.0088,item_1665:0.0021):0.0008):0.0066,(item_1509:0.0256,(item_1607:0.0059,((item_1636:0.0142,item_1639:0):0.0070,item_1642:0.0005):0.0054):0.0224):0.0193):0.0859):0.0083):0.0124):0.0031,item_1350:0.0029):0.0089,(item_4448:0.0044,item_1620:0.0020):0.0236):0.0112,(item_4944:0,item_1600:0.0011,item_1612:0):0.0050):0.0047,item_4934:0.0054,item_1668:0.0021);
part B_4.1 = [&W 1] (item_3908,item_1668,(item_4934,((item_1600,(item_4944,(item_1612,(item_1350,(item_4436,(((item_1665,(item_1610,(item_4450,item_1597))),(item_1509,(item_1607,(item_1642,(item_1636,item_1639))))),((item_1617,(item_573,(item_4814,item_42767,item_43257,item_4815))),(item_87,item_68426)))))))),(item_1670,(item_4449,(item_4447,(item_4448,item_1620)))))))
[1];

Example text file output:
(item_3908:0.0073,(((((item_4436:0.0052,(((item_4447:0.0126,item_1670:0.0105):0.0040,(item_4449:0.0116,(((item_4450:0.0010,item_1597:0.0021):0.0064,((item_1509:0.0234,(item_1607:0.0111,((item_1636:0.0165,item_1639:5.783e-006):0.0075,item_1642:0.0043):0.0066):0.0221):0.0271,item_1665:0.0061):0.0013):0.0065,item_1610:0.0056):0.0839):0.0114):0.0131,((item_4448:0.0034,item_1620:0.0019):0.0320,(((((item_4814:0,item_42767:0,item_43257:0):0.0010,item_4815:0.0042):0.0005,item_573:0.0160):0.0154,item_1617:0.0095):0.0031,(item_87:1.908e-006,item_68426:0):0.0069):0.0146):0.0058):0.0099):0.0067,item_1350:0):0.0196,item_1600:0.0018):0.0049,item_4944:0,item_1612:0):0.0199,item_4934:0.0078):0.0008,item_1668:0.0033)
[0.5];
(item_3908:0.0073,(((((item_4436:0.0052,(((item_4447:0.0126,item_1670:0.0105):0.0040,(item_4449:0.0116,(((item_4450:0.0010,item_1597:0.0021):0.0064,((item_1509:0.0234,(item_1607:0.0111,((item_1636:0.0165,item_1639:2.891e-006):0.0075,item_1642:0.0043):0.0066):0.0221):0.0271,item_1665:0.0061):0.0013):0.0065,item_1610:0.0056):0.0839):0.0114):0.0131,((item_4448:0.0034,item_1620:0.0019):0.0320,(((((item_4814:0,item_42767:0,item_43257:0):0.0010,item_4815:0.0042):0.0005,item_573:0.0160):0.0154,item_1617:0.0095):0.0031,(item_87:9.539e-007,item_68426:0):0.0069):0.0146):0.0058):0.0099):0.0067,item_1350:9.539e-007):0.0196,item_1600:0.0018):0.0049,(item_4944:0,item_1612:0):3.875e-006):0.0199,item_4934:0.0078):0.0008,item_1668:0.0033)
[0.5];
(item_3908:0.0032,((((item_4436:0.0052,(((item_4447:0.0104,item_1670:0.0197):0.0032,(item_4449:0.0168,((((item_4450:0.0021,item_1597:0):0.0044,item_1610:0.0077):0.0013,item_1665:0.0008):0.0165,(item_1509:0.0143,(item_1607:0.0107,((item_1636:0.0134,item_1639:0):0.0077,item_1642:1.905e-006):0.0026):0.0359):0.0171):0.0763):0.0034):0.0137,((item_4448:0.0042,item_1620:0):0.0411,(((item_4814:0.0010,((item_4815:0.0031,item_43257:0):3.927e-006,(item_573:0.0107,item_42767:0):3.824e-006):0.0021):0.0171,(item_87:0,item_68426:0.0031):0.0114):0.0002,item_1617:0.0056):0.0177):0.0016):0.0134):0.0088,item_1350:3.835e-006):0.0151,item_1612:0):0.0011,item_4944:0,item_1600:0.0011):0.0223,item_4934:0.0011,item_1668:0.0021)
[0.0625];
(item_3908:0.0032,(((((item_4436:0.0052,(((item_4447:0.0104,item_1670:0.0197):0.0032,(item_4449:0.0168,((((item_4450:0.0021,item_1597:0):0.0044,item_1610:0.0077):0.0013,item_1665:0.0008):0.0165,(item_1509:0.0143,(item_1607:0.0107,((item_1636:0.0134,item_1639:0):0.0077,item_1642:9.525e-007):0.0026):0.0359):0.0171):0.0763):0.0034):0.0137,((item_4448:0.0042,item_1620:0):0.0411,(((item_4814:0.0010,((item_4815:0.0031,item_43257:0):1.964e-006,(item_573:0.0107,item_42767:0):1.912e-006):0.0021):0.0171,(item_87:0,item_68426:0.0031):0.0114):0.0002,item_1617:0.0056):0.0177):0.0016):0.0134):0.0088,item_1350:1.917e-006):0.0151,item_1612:1.975e-006):0.0011,item_1600:0.0011):1.978e-006,item_4944:0):0.0223,item_4934:0.0011,item_1668:0.0021)
[0.0625];
(item_3908:0.0086,(((((item_4436:0.0060,(((((item_4814:0.0008,item_42767:0,item_43257:0):0.0001,item_4815:0.0076):0.0032,item_573:0.0191):0.0104,item_1617:0.0138):0.0016,(item_87:0,item_68426:0.0033):0.0053):0.0194):0.0081,((item_4447:0.0116,item_1670:0.0206):0.0013,(item_4449:0.0073,(((item_4450:0.0021,item_1597:0.0010):0.0045,(item_1610:0.0088,item_1665:0.0021):0.0009):0.0066,(item_1509:0.0256,(item_1607:0.0060,((item_1636:0.0142,item_1639:0):0.0071,item_1642:0.0005):0.0054):0.0224):0.0193):0.0859):0.0083):0.0124):0.0031,item_1350:0.0029):0.0089,(item_4448:0.0044,item_1620:0.0020):0.0236):0.0112,(item_4944:0,item_1600:0.0011,item_1612:0):0.0050):0.0047,item_4934:0.0054,item_1668:0.0021)
[0.111111];
(item_3908:0.0086,(((((item_4436:0.0062,((((((item_4814:0.0009,item_42767:0):2.847e-005,item_4815:0.0076):7.631e-006,item_43257:0):0.0033,item_573:0.0191):0.0104,item_1617:0.0138):0.0015,(item_87:0,item_68426:0.0033):0.0054):0.0194):0.0080,((item_4447:0.0116,item_1670:0.0206):0.0013,(item_4449:0.0073,(((item_4450:0.0021,item_1597:0.0010):0.0045,(item_1610:0.0088,item_1665:0.0021):0.0008):0.0066,(item_1509:0.0256,(item_1607:0.0059,((item_1636:0.0142,item_1639:0):0.0070,item_1642:0.0005):0.0054):0.0224):0.0193):0.0859):0.0083):0.0124):0.0031,item_1350:0.0029):0.0089,(item_4448:0.0044,item_1620:0.0020):0.0236):0.0112,(item_4944:0,item_1600:0.0011,item_1612:0):0.0050):0.0047,item_4934:0.0054,item_1668:0.0021)
[0.111111];
(item_3908,item_1668,(item_4934,((item_1600,(item_4944,(item_1612,(item_1350,(item_4436,(((item_1665,(item_1610,(item_4450,item_1597))),(item_1509,(item_1607,(item_1642,(item_1636,item_1639))))),((item_1617,(item_573,(item_4814,item_42767,item_43257,item_4815))),(item_87,item_68426)))))))),(item_1670,(item_4449,(item_4447,(item_4448,item_1620)))))))
[1];

Notes:
All horizontal white space in lines will be space characters.
Not all lines in the input list have the "[&U]" marker.
All lines in the input list will have a "[&W #]" marker.  The # is
either 1 or the reciprocal of an integer (written 1/n, e.g. 1/2).
All item sets within lines are enclosed in (), with many nested
subsets of ().
Within each line, the contents from the first "(" to the last ")" do
not change from input to output.
The line order must match between the input and output files.
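The [&W #] rule in these notes can be checked with a short helper. This is only a sketch: weight_value is a hypothetical name, and it assumes the # is written exactly as "1" or "1/n" (as in the examples), treating anything else as an error rather than guessing:

```perl
use strict;
use warnings;

# Hypothetical helper: pull the "#" out of a line's [&W #] marker, check that
# it is "1" or "1/n" for a positive integer n, and return its numeric value.
# Anything else is treated as an error rather than guessed at.
sub weight_value {
    my ($line) = @_;
    my ($weight) = $line =~ /\[&W\s+([^\]]*)\]/
        or die "No [&W #] marker in: $line\n";
    return 1 if $weight =~ /^\s*1\s*$/;
    return 1 / $1 if $weight =~ m{^\s*1\s*/\s*([1-9][0-9]*)\s*$};
    die "Unexpected weight '$weight'\n";
}

print weight_value("part B_2.1 = [&W 1/16] (a,b);"), "\n";    # prints 0.0625
print weight_value("part B_4.1 = [&W 1] (a,b);"), "\n";       # prints 1
```

Dying on an unexpected value such as 2/3 makes a malformed input line obvious instead of silently producing a wrong weight.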

The only changes are:
[1] strip away the beginning of the line, before the first "("
[2] between last ")" and line-ending ";", insert 
	a blank space, followed by
	a decimal representation of the rational number from the [&W #]
marker, enclosed in [].
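The two changes can be sketched for a single line as follows. transform_line is a hypothetical helper name; it relies on a greedy regular expression to span the first "(" to the last ")" and returns undef for lines that have no tree or no [&W #] marker:

```perl
use strict;
use warnings;

# Hypothetical helper applying the two changes to one input line:
#   [1] drop everything before the first "("
#   [2] put " [decimal]" between the last ")" and the closing ";"
# Lines without a bracketed tree or a [&W #] marker give undef.
sub transform_line {
    my ($line) = @_;
    my ($tree)   = $line =~ /(\(.*\))/ or return;            # greedy: first "(" to last ")"
    my ($weight) = $line =~ /\[&W\s+([^\]]+)\]/ or return;
    my ($top, $bottom) = split m{/}, $weight;                # "1/9" -> (1, 9); "1" -> (1)
    $bottom = 1 unless defined $bottom;
    my $decimal = sprintf("%.6f", $top / $bottom);           # at most 6 decimal places
    $decimal =~ s/0+$//;                                     # 0.500000 -> 0.5, 1.000000 -> 1.
    $decimal =~ s/\.$//;                                     # 1. -> 1
    return $tree . " [" . $decimal . "];";
}

print transform_line("part B_3.1 = [&U] [&W 1/9] ((a:0.0086,b:0):0.0047,c:0.0021);"), "\n";
# prints: ((a:0.0086,b:0):0.0047,c:0.0021) [0.111111];
```

Because the tree is copied verbatim from the first "(" to the last ")", its contents cannot change between input and output, which is what the notes require.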

A maximum of 6 decimal places will be sufficient for the decimal representation.
Script should be Perl 5.8 compatible.
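The example output shows [0.5] and [1] rather than [0.500000] and [1.000000], so a plain "%.6f" does not match it exactly on its own. One Perl 5.8-compatible way to get at most 6 decimal places without the padding, sketched here with a hypothetical format_weight helper, is to format with "%.6f" and then trim trailing zeros. The demo also prints "%.6g" for comparison, because "%.6g" counts significant digits rather than decimal places:

```perl
use strict;
use warnings;

# Hypothetical helper: format a weight with at most 6 decimal places, but
# without the padding zeros "%.6f" adds, so 1/2 gives "0.5" and 1 gives "1".
sub format_weight {
    my ($value) = @_;
    my $text = sprintf("%.6f", $value);   # e.g. 0.500000
    $text =~ s/0+$//;                     # 0.5 (or "1." for whole numbers)
    $text =~ s/\.$//;                     # 1
    return $text;
}

# Columns: weight, %.6f, %.6g, trimmed %.6f
for my $n (2, 16, 9, 3000) {
    my $w = 1 / $n;
    print join("  ", "1/$n", sprintf("%.6f", $w), sprintf("%.6g", $w), format_weight($w)), "\n";
}
# 1/2  0.500000  0.5  0.5
# 1/16  0.062500  0.0625  0.0625
# 1/9  0.111111  0.111111  0.111111
# 1/3000  0.000333  0.000333333  0.000333
```

For 1/3000, "%.6g" produces nine decimal places, while the trimmed "%.6f" keeps to six. With the trimmed form, a weight smaller than 0.0000005 would print as 0.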

Please let me know if additional information is required.
Answer  
Subject: Re: Perl script to modify text files
Answered By: palitoy-ga on 22 Jun 2006 12:14 PDT
Rated: 5 out of 5 stars
 
Hello gerry1234-ga,

Thank you for your question.

Here is my proposed script for you to try.  It was difficult to work
out from the line-wrapping whether "part B_1.1 = [&U] [&W 1/2]" was
meant to be on the same line as "(item_3908:0.0073,(((((i...".  In my
script I have assumed every line starts with "part" and ends with ";"
with no line breaks in between.
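If the header and the tree do turn out to be on separate physical lines, one option (a sketch, not part of the script given in this answer) is to have Perl split the file on ";" instead of on newlines by localizing the input record separator $/. read_records is a hypothetical helper name. The sketch assumes line breaks only ever fall where white space would otherwise be, because it folds each run of white space into a single space:

```perl
use strict;
use warnings;

# Hypothetical helper: read ";"-terminated records from a filehandle, so a
# "part ..." header and its tree are joined even if split across lines.
sub read_records {
    my ($fh) = @_;
    local $/ = ";";                      # a record now ends at each ";"
    my @records;
    while (my $record = <$fh>) {
        $record =~ s/\s+/ /g;            # fold line breaks into single spaces
        $record =~ s/^\s+|\s+$//g;       # trim leading/trailing white space
        push @records, $record if $record =~ /\S/;
    }
    return @records;
}

# Demo on an in-memory file (PerlIO, available since Perl 5.8).
my $text = "part B_1.1 = [&U] [&W 1/2]\n((a:0.1,b:0.2):0.3,c:0);\npart B_4.1 = [&W 1] (a,b,(c,d));\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
print "$_\n" for read_records($fh);
close($fh);
# part B_1.1 = [&U] [&W 1/2] ((a:0.1,b:0.2):0.3,c:0);
# part B_4.1 = [&W 1] (a,b,(c,d));
```

Because $/ is localized inside the helper, ordinary line-by-line reading elsewhere in the script is unaffected.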

I have also tried to make the script as readable as possible and
have added comments describing what happens at each stage; I know
Perl can get very unreadable at times!

You will need to alter two things at the beginning of the script: the
name of the file your data comes from and the name of the file to save
the results to.
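As an alternative to editing the two names, the script could read them from the command line. A minimal sketch follows. choose_filenames and the script name convert.pl are hypothetical, and the hard-coded names are kept as defaults:

```perl
use strict;
use warnings;

# Hypothetical helper: take the input and output filenames from the command
# line, falling back to the names hard-coded in the script, e.g.
#   perl convert.pl mydata.txt converted.txt
# Uses "defined ? :" rather than "//" (which needs Perl 5.10) to stay 5.8 friendly.
sub choose_filenames {
    my @args = @_;
    die "Usage: perl convert.pl [input_file [output_file]]\n" if @args > 2;
    my $input_name  = defined $args[0] ? $args[0] : "testfile.txt";
    my $output_name = defined $args[1] ? $args[1] : "output.txt";
    return ($input_name, $output_name);
}

my ($input_name, $output_name) = choose_filenames(@ARGV);
print "reading $input_name, writing $output_name\n";
```

Running the script with no arguments then behaves exactly as it does with the hard-coded names.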

I think I have everything you require covered but if not please do not
hesitate to ask for clarification and state what works for you and
what doesn't.

The script:

#!/usr/bin/perl
use strict;
use warnings;

# filenames to alter before running
my $input_name  = "testfile.txt";
my $output_name = "output.txt";

# read in the textfile (stop with a message if it cannot be opened)
open(my $in, '<', $input_name) or die "Cannot open $input_name: $!";
my @lines = <$in>;
close($in);

# assign a variable for the output
my $output = "";

# loop through each line in the text file
foreach my $line (@lines) {
  # let's get the stuff in brackets: a greedy match runs from the first
  # "(" to the last ")" on the line; skip lines that have no brackets
  next unless $line =~ /(\(.*\))/;
  my $brackets = $1;
  # now, let's get the fraction after the &W (everything up to the "]");
  # skip lines that have no [&W #] marker
  next unless $line =~ /\[&W\s+([^\]]+)\]/;
  my $fraction = $1;
  # split the fraction into its top and bottom parts at the "/"
  my ($top, $bottom) = split(/\//, $fraction);
  # a whole number such as "1" has no bottom part, so divide by 1
  $bottom = 1 unless defined $bottom && $bottom != 0;
  # let's calculate the fraction as a decimal with at most 6 decimal places
  my $decimal = sprintf("%.6f", $top / $bottom);
  # drop trailing zeros and a bare trailing "." so that 0.500000 becomes
  # 0.5 and 1.000000 becomes 1, as in the example output
  $decimal =~ s/0+$//;
  $decimal =~ s/\.$//;
  # output the information found, including the closing "]"
  $output .= $brackets . " [" . $decimal . "];\n";
}

# save the cleaned up information under exactly the name given above
open(my $out, '>', $output_name) or die "Cannot open $output_name: $!";
print $out $output;
close($out);

gerry1234-ga rated this answer: 5 out of 5 stars
palitoy-ga:  Another high-quality answer.

Comments  
There are no comments at this time.
