Q: string matching with numerical score in perl ( No Answer,   5 Comments )
Question  
Subject: string matching with numerical score in perl
Category: Computers
Asked by: luong-ga
List Price: $10.00
Posted: 09 Dec 2002 18:28 PST
Expires: 08 Jan 2003 18:28 PST
Question ID: 122141
What I am looking for is a perl function to be used in a search
engine which would use a flat text file of about 10000 records.
Given two multi-word strings (a "query" and a "target"), I want to
get a score number which is proportional to the quality of the match
between query and target. If all the words of the query are in the
target (regardless of order), the score would be 100%. If only parts of
the query word(s) match parts of the target string, a lower score
should be returned. An example of what I want to be able to match is
"waterfall" and "yosemite falls". I am looking for a link to an
existing perl function, as I am sure this problem has been addressed
before several times.
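
For reference, the 100% case described above (every query word present in the target, in any order) can be sketched in a few lines. This is only an illustration: the name word_score and the choice to split on non-word characters and ignore case are assumptions, not an existing library function.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of query words that occur as whole
# words in the target, ignoring case and word order.
sub word_score {
    my ($query, $target) = @_;

    # Index the target's words for constant-time lookup.
    my %in_target = map { ($_ => 1) } grep { length } split /\W+/, lc $target;

    my @words = grep { length } split /\W+/, lc $query;
    return 0 unless @words;

    # grep in scalar context counts the query words found in the target.
    my $hits = grep { $in_target{$_} } @words;
    return 100 * $hits / scalar @words;
}

printf "%.0f\n", word_score("falls yosemite", "Yosemite Falls");
printf "%.0f\n", word_score("waterfall", "yosemite falls");
```

Whole-word matching alone scores "waterfall" against "yosemite falls" as 0, which is why the partial-match requirement needs something finer than word lookup.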
Answer  
There is no answer at this time.

Comments  
Subject: Re: string matching with numerical score in perl
From: samwise-ga on 10 Dec 2002 14:12 PST
 
I don't have a link for you, but I'd attack this by generating a list
of substrings of each query word, and then trying to find those in the
target.  The score is then the percentage of substrings that match.
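
A minimal sketch of the substring idea above, under some assumptions that are not in the comment: the name substring_score is hypothetical, matching ignores case, and only substrings of at least 3 characters are counted (words shorter than that are counted whole).

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of the substrings of the query words
# that occur somewhere in the target. The default minimum substring
# length of 3 is an assumption chosen to skip trivial matches.
sub substring_score {
    my ($query, $target, $min_len) = @_;
    $min_len = 3 unless defined $min_len;

    my $haystack = lc $target;
    my ($total, $found) = (0, 0);

    for my $word (grep { length } split /\W+/, lc $query) {
        my $n   = length $word;
        # Words shorter than the minimum are counted as one whole substring.
        my $min = $n < $min_len ? $n : $min_len;

        for my $start (0 .. $n - $min) {
            for my $len ($min .. $n - $start) {
                $total++;
                $found++ if index($haystack, substr($word, $start, $len)) >= 0;
            }
        }
    }
    return $total ? 100 * $found / $total : 0;
}

printf "%.1f\n", substring_score("waterfall", "yosemite falls");
printf "%.1f\n", substring_score("falls yosemite", "Yosemite Falls");
```

When every query word appears in the target, all of its substrings appear too, so the score is 100 regardless of word order. The number of substrings grows with the square of the word length, which should still be cheap for short queries against records of about ten words.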
Subject: Re: string matching with numerical score in perl
From: marietta-ga on 17 Dec 2002 13:28 PST
 
I can see a way to do it by extracting all matching substrings from a
'dictionary' of words you are interested in (e.g. you would probably want
to exclude all very short words and add some of your own; plurals would
also need handling). However, what is the longest time you can allow for
scoring your hypothetical 10000 record file, and what is the length of
that file?
Subject: Re: string matching with numerical score in perl
From: luong-ga on 17 Dec 2002 14:49 PST
 
I wouldn't want the search to take an unusually long time, 
say more than a couple of seconds. Each record consists of
a short sentence, of an average of less than 10 words. 
When I posted the query, I thought it was a fairly common thing, but
apparently it is not.
Subject: Re: string matching with numerical score in perl
From: triniman-ga on 18 Dec 2002 17:21 PST
 
I would suggest posting to comp.lang.perl.misc.

You'll get a bunch of comments for free.
Subject: Re: string matching with numerical score in perl
From: scoobysnacks-ga on 21 Dec 2002 05:15 PST
 
Take a look at xapian, it looks close to what you're asking for.  You
can grab a perl front end for it from CPAN ...
http://www.xapian.org/
http://search.cpan.org/search?query=xapian
