Q: Recursive string matching utility ( Answered 4 out of 5 stars,   0 Comments )
Question  
Subject: Recursive string matching utility
Category: Computers > Programming
Asked by: cereb-ga
List Price: $30.00
Posted: 29 Sep 2004 01:38 PDT
Expires: 29 Oct 2004 01:38 PDT
Question ID: 407818
I have multiple text files on the same topic.  Each file differs
slightly from the others, mainly through sentences added here and
there and paragraphs re-arranged.  I need to create a single
document for each such topic, and existing text-comparing utilities
don't work well for what I have in mind.  This is how I envision doing that:
First I will select one file as the standard (file1), and combine all
other text files into another file (file2).

Then I will convert each (period-delimited) sentence of file1 into a
record of a database.  For each record in that database I will then
look for a match in file2.  If a match is found in file2, that
sentence can either be deleted (preferred) or have its font or color
changed, whichever is easiest to implement.  After each record (of
file1) has been used to find a match in file2, file2 will be saved in
its modified form.

I can then look at all the modifications I made in file2 and decide
what I will use to finalize the topic of file1.

 
For example, if the reference file contains 200 sentence-records, and
the other file only 120, I still need to run all 200 records to look
for a match.  If the other file has 300 records, any or all of the 200
reference sentences may find a match, but at least 100 sentences will
remain un-matched, and these represent the changes I made before.
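
The counting in this example can be sketched in Perl with a hash lookup. This is only an illustration under stated assumptions, not the requested utility: the sample strings are made up, the helper name sentences is hypothetical, and a sentence counts as matched only if it is identical after whitespace is collapsed.

```perl
use strict;
use warnings;

# hypothetical helper: split text into trimmed, non-empty sentences
sub sentences {
  my ($text) = @_;
  $text =~ s/\s+/ /g;
  my @parts;
  foreach my $part ( split /\./, $text ) {
    $part =~ s/^\s+|\s+$//g;
    push @parts, $part if $part ne "";
  }
  return @parts;
}

# made-up sample data standing in for file1 and file2
my $reference = "Cats purr. Dogs bark. Birds sing.";
my $combined  = "Dogs bark. Fish swim. Cats purr. Cows moo.";

# index the reference sentences for exact, case-sensitive lookup
my %in_reference = map { $_ => 1 } sentences($reference);

# keep only the sentences of file2 that file1 does not contain
my @unmatched = grep { !$in_reference{$_} } sentences($combined);

print scalar(@unmatched), " unmatched: @unmatched\n";
```

Every reference sentence is still checked, whichever file is longer, and whatever remains in @unmatched is the material to review by hand.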

Clarification of Question by cereb-ga on 29 Sep 2004 02:22 PDT
I can run the utility on the hosting server of my web site, where
MySQL, PHP and Perl are available.  I can also run it on my desktop
machine, which runs Windows 98SE.  A Windows utility would make the job
a bit easier for me, but the other option is available.
Answer  
Subject: Re: Recursive string matching utility
Answered By: palitoy-ga on 29 Sep 2004 03:21 PDT
Rated:4 out of 5 stars
 
Hello cereb

I have written a small Perl script for you to perform this task.  It
splits file1 into sentences and then removes each of them from file2.
This leaves file2 with only the sentences that do not appear in file1
(and this is output as file3).

I always try to make my scripts as readable as possible for others (I
could make it a lot more unreadable if you wish, as Perl is good at
that!).  Within the script you need to alter the three filenames to
the required values (these are near the top of the script).

I have made the assumption that differences in whitespace (such as
multiple spaces) are not important but differences in CasiNg are.
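
A minimal sketch of what that assumption means in practice, using made-up sample strings: runs of whitespace are collapsed before comparing, so spacing differences vanish while case differences remain. The helper name normalize is hypothetical and is not part of the script below.

```perl
use strict;
use warnings;

# hypothetical helper: collapse every run of whitespace to one space
sub normalize {
  my ($text) = @_;
  $text =~ s/\s+/ /g;
  return $text;
}

my $s1 = normalize("The  quick\nbrown fox.");
my $s2 = normalize("The quick brown   fox.");
my $s3 = normalize("the quick brown fox.");

# spacing differences disappear after normalizing
print "s1 eq s2: ", ($s1 eq $s2 ? "yes" : "no"), "\n";
# case differences still count
print "s1 eq s3: ", ($s1 eq $s3 ? "yes" : "no"), "\n";
# lc() on both sides would make the comparison ignore case
print "lc(s1) eq lc(s3): ", (lc($s1) eq lc($s3) ? "yes" : "no"), "\n";
```

If case should be ignored as well, lower-casing both sides as in the last line is one way to do it.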

If you require any further help or need any modifications please ask
for clarification and I will try to respond as swiftly as possible.

###
#!/usr/bin/perl

# comment the next line out if run from a command line
print "Content-type: text/html\n\n";

# some declarations
use strict;
use warnings;

# these are the filenames of the files and should be edited.
# $file3 is the file that is output after editing.
my $file1 = "file1.txt";
my $file2 = "file2.txt";
my $file3 = "output.txt";

# read in the first file into a variable called $string
open(my $in1, "<", $file1) or die "Couldn't open $file1: $!";
my $string = join("", <$in1>);
close $in1;

# collapse runs of whitespace in $string into single spaces
$string =~ s/\s+/ /g;

# split the $string into sentences separated by . into an array
my @lines = split /\./, $string;

# read in the second file into a string
open(my $in2, "<", $file2) or die "Couldn't open $file2: $!";
$string = join("", <$in2>);
close $in2;

# collapse runs of whitespace in $string into single spaces
$string =~ s/\s+/ /g;

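# loop through the sentences in file1 (now stored in the array @lines)
foreach my $sentence ( @lines ) {
  # trim the sentence and skip empty ones, which would otherwise
  # delete every period in file2
  $sentence =~ s/^\s+|\s+$//g;
  next if $sentence eq "";
  # delete the sentence from file2 if it exists; \Q...\E quotes any
  # regex metacharacters, and the match *is* case sensitive
  # (add the i modifier to ignore case)
  $string =~ s/\Q$sentence\E\.//g;
}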

# collapse the whitespace left behind in $string, which now holds
# only the sentences that differ
$string =~ s/\s+/ /g;

# split the string into sentences so that when it is
# printed it has one sentence per line
my @output = split /\./, $string;

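# output the differences
open (OUT, ">", $file3) or die "Couldn't open $file3: $!";
foreach my $line ( @output ) {
  # remove leading and trailing whitespace
  $line =~ s/^\s+|\s+$//g;
  # skip fragments left empty after the deletions
  next if $line eq "";
  # print to file
  print OUT "$line.\n";
  # print to screen
  print "$line.\n";
}
close(OUT);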

# exit the program
exit(0);
###
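
One thing to watch in the deletion step: a sentence interpolated into a regular expression is treated as a pattern unless it is wrapped in \Q...\E, so characters such as + ( ) ? change what is matched, and an unbalanced parenthesis stops the script with a regex compile error. The sketch below uses made-up sample text to show the difference. It illustrates the quoting only and is not a replacement for the script.

```perl
use strict;
use warnings;

# made-up sample text; the first sentence contains regex metacharacters
my $text   = "Is it 2+2 (four)? Yes. Another sentence.";
my $tricky = "Is it 2+2 (four)? Yes";

# unquoted: + ( ) ? are treated as regex syntax, so nothing matches
(my $unquoted = $text) =~ s/$tricky\.//g;

# quoted with \Q...\E: every character is matched literally
(my $quoted = $text) =~ s/\Q$tricky\E\.//g;

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

Only the quoted substitution removes the sentence; the unquoted one silently leaves the text unchanged.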

Request for Answer Clarification by cereb-ga on 29 Sep 2004 08:25 PDT
Thank you, palitoy.  I will consider the question as answered although
I have yet to test it, trusting that I can come back to resolve any
glitches I may encounter.  Since the script will sit on a hosted
account that I access remotely from my Windows desktop, I have some
homework to do to figure out how to make it work.

Clarification of Answer by palitoy-ga on 29 Sep 2004 09:19 PDT
You can use this solution on your desktop by installing ActivePerl,
which is free and available for Windows.
cereb-ga rated this answer:4 out of 5 stars
Thank you, that was quick.  A desktop solution would have been best,
but you were responsive and I trust it will work.

Comments  
There are no comments at this time.
