Hello Mickr
If I understand you correctly the following script should give you the
output you require. I have tried to make the script as readable and
friendly as possible by not using too many shortcuts and commenting it
throughout.
At the beginning, the script reads in a text file called
"filename.txt". Change this name (and path) to match wherever your
input file is held.
If you have any questions or queries regarding the script please ask
for clarification and I will do my best to help.
###BEGIN###
#!/usr/bin/perl
# open and read the file containing the data into an array called "lines"
open (TXTFILE, "filename.txt") or die "Cannot open filename.txt: $!";
@lines = <TXTFILE>;
close(TXTFILE);
# variable to hold the information to print
$print_this = "";
# variable to check whether we should be printing
$printing = 0;
# an array holding starting/ending points to check we do not duplicate things
@points = ();
# a variable to say whether we should output the information or not
$should_output = 0;
# variables to hold the starting and ending points
$start_point = "";
$end_point = "";
# loop through the text file and process the data
foreach $newline (@lines) {
    # if the line is a starting point then...
    if ($newline =~ m/^Startpoint/) {
        # remember what is found in the text file until $printing is changed
        $printing = 1;
        # the start point is
        $start_point = $newline;
    }
    # if the line is an ending point then...
    $end_point = $newline if ($newline =~ m/^Endpoint/);
    # if we have seen the start and end point then process them
    if ( ( $start_point ne "" ) && ( $end_point ne "" ) ) {
        # join the start and end points into a key, leaving both unchanged
        $point_key = $start_point . " " . $end_point;
        # if we have not seen this exact start/end pair before...
        if ( scalar( grep { $_ eq $point_key } @points ) == 0 ) {
            # remember it by adding it to our points array
            push @points, $point_key;
            # indicate that the information should be printed out
            $should_output = 1;
        }
    }
    # if the line begins with "slack" and the info should be printed...
    if ( ($newline =~ m/^slack/) && ($should_output == 1) ) {
        # print it out!
        print $print_this . $newline . "\n";
    };
    # if the line begins with "slack" then reset the variables
    if ( $newline =~ m/^slack/ ) {
        $printing = 0;
        $print_this = "";
        $should_output = 0;
        $start_point = "";
        $end_point = "";
    };
    # if we are inside a section (start point seen, "slack" not yet reached)
    if ( $printing == 1 ) {
        # add the new line to the data that could be printed out
        $print_this .= $newline;
    };
}; # end loop through text file
# end the script
exit(0);
###END###
|
Request for Answer Clarification by
mickr-ga
on
19 Jul 2004 07:11 PDT
Hi palitoy,
It works with some minor modifications. Startpoint etc. are not at
the beginning of the line (there is a space before them), so no ^ is
necessary. There are also empty lines, so I have changed the
startpoint ne "" etc. to startpoint ne "dummys".
It appears to work but it is very very slow. I have 127 entities and
it has only processed 19 in 13 minutes? The input file is 3.6M with
25k lines.
Any idea why it is so slow? Is it possible for me to send you the
input file to try it?
Regards,
Mick
|
Clarification of Answer by
palitoy-ga
on
19 Jul 2004 09:16 PDT
Hi Mick
Thanks for the clarification. I will deal with your points one at a
time if I may...
1) If the startpoint always begins with a [SPACE]Startpoint then I
would suggest these changes:
if ($newline =~ m/^Startpoint/) {
to:
if ($newline =~ m/^\sStartpoint/) {
This would look for a line starting with a single whitespace character
(\s matches a space or a tab) followed by "Startpoint". Is this
standard throughout the file or does the number of spaces change? Is
there ever anything else before the "Startpoint"?
Similarly:
$end_point = $newline if ($newline =~ m/^Endpoint/);
to:
$end_point = $newline if ($newline =~ m/^\sEndpoint/);
And:
if ( ($newline =~ m/^slack/) && ($should_output == 1) ) {
to:
if ( ($newline =~ m/^\sslack/) && ($should_output == 1) ) {
And:
if ( $newline =~ m/^slack/ ) {
to:
if ( $newline =~ m/^\sslack/ ) {
2) Re: changing startpoint ne "dummys" - I am unclear why you have
done this... is it because there are empty lines between each example?
The [if ( ( $start_point ne "" ) && ( $end_point ne "" ) ) {] section
is used to determine whether the script has found the start and end
points of a section; this happens when these two variables are no
longer blank. Comparing them against "dummys" instead would mean the
test no longer does this: unless the variables are also set to
"dummys" wherever the script initialises and resets them, the test
would be true on every line.
If you changed this because of the empty lines between each example
the proper solution would be to change this:
print $print_this . $newline . "\n";
to this:
print $print_this . $newline ;
3) The speed of the script is dependent on a number of factors, but I
would not have thought it would take as long as you are describing. I
am guessing that the problem is caused by the changes made in part 2)
here.
4) Unfortunately we are not allowed to give out any personal details
for people to be able to contact us. Any information given out would
be removed as soon as the Google Answers Editors saw the information.
Some people who ask the questions put their email addresses in the
questions or clarifications but this is frowned upon by the Google
Answers editors...
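To pull points 1) to 3) together, here is a minimal sketch of how the
same idea could be written so that it tolerates any amount of leading
whitespace, ignores empty lines, and looks up previously seen
start/end pairs in a hash rather than scanning an array. It is only a
sketch under assumptions: the helper name unique_sections and the
sample lines are invented for illustration, and skipping empty lines
entirely may or may not be what you want in your output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns one string
# per unique Startpoint/Endpoint pair, each running from the Startpoint
# line to the slack line.
sub unique_sections {
    my @lines = @_;
    my %seen;    # start/end pairs that have already been kept
    my @out;     # finished sections, in input order
    my ($start, $end, $buffer) = ("", "", "");
    foreach my $line (@lines) {
        # assumption: empty lines carry no information and can be skipped
        next if $line =~ m/^\s*$/;
        # \s* allows zero or more spaces or tabs before the keyword
        if ($line =~ m/^\s*Startpoint/) {
            ($start, $end, $buffer) = ($line, "", "");
        }
        # ignore anything that appears before the first Startpoint
        next if $start eq "";
        $end = $line if $line =~ m/^\s*Endpoint/;
        $buffer .= $line;
        if ($line =~ m/^\s*slack/) {
            # a hash lookup takes about the same time however many pairs are stored
            if ($end ne "" && !$seen{"$start $end"}++) {
                push @out, $buffer;
            }
            ($start, $end, $buffer) = ("", "", "");
        }
    }
    return @out;
}

# Invented sample data, for illustration only.
my @sample = (
    " Startpoint: a\n", " Endpoint: b\n", "\n", " slack 1\n",
    " Startpoint: a\n", " Endpoint: b\n", " slack 2\n",
    " Startpoint: c\n", " Endpoint: d\n", " slack 3\n",
);
print unique_sections(@sample);
```

With the invented sample above, the second section repeats the first
start/end pair and is dropped, so only the sections ending in
"slack 1" and "slack 3" are printed. Note that two pairs only count as
the same if the Startpoint and Endpoint lines are identical, including
their leading whitespace.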
|
Clarification of Answer by
palitoy-ga
on
20 Jul 2004 01:34 PDT
Glad I could sort it out for you. Thanks for the 5-star rating and
tip - they are both appreciated.
|
Request for Answer Clarification by
mickr-ga
on
23 Jul 2004 05:07 PDT
Hi Palitoy,
Thanks for all your help previously. I am stuck on Perl modules now,
new question posted if you are interested.
Thanks,
Mick
|
Clarification of Answer by
palitoy-ga
on
23 Jul 2004 05:35 PDT
Hello Mick
I am just looking at that question now for you... what operating
system are you running and which version of perl?
|