Q: Off the shelf application to compare 2 text lists (No Answer, 5 Comments)
Question  
Subject: Off the shelf application to compare 2 text lists
Category: Computers
Asked by: vicsing-ga
List Price: $20.00
Posted: 30 Aug 2003 09:55 PDT
Expires: 29 Sep 2003 09:55 PDT
Question ID: 250512
I am involved in a data migration project at my company and have to
combine/link client records from 5-6 different product systems to one
master system.

As you can imagine there has historically been no uniformity in
"naming convention" of clients across the 5-6 product systems i.e., a
client could be spelt "ABC Limited", or "ABC Ltd." or "ABC" or "The
ABC" across the various systems.
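
A minimal sketch of how such name variants could be reduced to one comparison key before any matching is attempted. The SUFFIXES set and the normalize function are hypothetical names introduced only for this illustration, and the suffix list is an assumption that would need extending for real client data.

```python
import re

# Hypothetical list of legal-form suffixes; extend it for your own data.
SUFFIXES = {"limited", "ltd", "inc", "incorporated", "corp", "corporation", "co", "plc", "llc"}

def normalize(name):
    """Reduce a client name to a rough canonical key for comparison."""
    # Lowercase, then turn every run of punctuation/whitespace into one space.
    words = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    # Drop a leading article, but never empty the name entirely.
    if len(words) > 1 and words[0] == "the":
        words = words[1:]
    # Drop trailing legal-form suffixes, again keeping at least one word.
    while len(words) > 1 and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)

variants = ["ABC Limited", "ABC Ltd.", "ABC", "The ABC"]
keys = {normalize(v) for v in variants}  # all four variants share one key
```

Normalizing first means exact comparison already catches the easy cases, so any fuzzy scoring only has to deal with genuine misspellings.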

To overcome this we created a "clean" master list of client names and
now want to compare the 5-6 "dirty" lists to this clean list.

We found some bulletin boards where users have solved similar
problems using SQL operators such as "LIKE" or some PHP text
functions. However, rather than reinvent the wheel, we were wondering
if there are any open-source or commercial applications out there that
do this?
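
For reference, a rough sketch of what the SQL "LIKE" approach looks like, using Python's built-in sqlite3 module with an in-memory database. The table name dirty_clients and the sample rows are made up for this illustration. Note that in SQLite, LIKE is case-insensitive for ASCII letters by default.

```python
import sqlite3

# In-memory database with a hypothetical table of "dirty" client names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dirty_clients (name TEXT)")
conn.executemany(
    "INSERT INTO dirty_clients (name) VALUES (?)",
    [("ABC Limited",), ("ABC Ltd.",), ("The ABC",), ("Abcor Holdings",), ("XYZ Corp",)],
)

# Substring match: every name containing "abc" in any letter case.
rows = conn.execute(
    "SELECT name FROM dirty_clients WHERE name LIKE ?",
    ("%ABC%",),
).fetchall()
matches = [row[0] for row in rows]
conn.close()
```

The pattern finds the three real variants but also picks up "Abcor Holdings", and it gives no score to rank the hits by, which is why a similarity measure with a threshold is the more natural fit for the problem.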

Here is what the dream application would do:
The user would input the "clean list" and the "dirty list(s)". The
application would cycle through each entry in each of the dirty
lists, match it against the clean list and compute a score (i.e., the
higher the score the closer the match). Then for all matches above a
certain "threshold" score (say 90%) it would automatically
rename/overwrite the dirty record with the clean one. For the others,
it would present the user a dialog box with the closest matches from
the clean list in descending order of score (e.g., "The closest
matches are: ABC 82%, Aaa 75%, A^&%$ 65%"). The user would
then just click one of these matches and the application would
overwrite the dirty record with the clean record that the user just
chose. And so on and so forth. The application should ideally also be
customizable (i.e., ability to fine-tune the algorithm that computes
the score, or choose the "threshold" score over which the record is
automatically overwritten, etc.)

Anything out there?
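
The core of the workflow described above (score, threshold, ranked shortlist) can be sketched in a few lines with Python's standard difflib module. This is only an illustration under assumptions, not a finished product: the function names score and match, the default threshold of 0.90, and the sample lists are all hypothetical, and SequenceMatcher is just one of several possible similarity measures.

```python
from difflib import SequenceMatcher

def score(a, b):
    """Similarity between two names in the range 0..1, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(dirty, clean_list, threshold=0.90, top_n=3):
    """Return (auto_match, candidates) for one dirty name.

    auto_match is the best clean name if its score reaches the threshold,
    otherwise None. candidates holds the top_n (name, score) pairs in
    descending order of score, for showing to the user.
    """
    ranked = sorted(
        ((clean, score(dirty, clean)) for clean in clean_list),
        key=lambda pair: pair[1],
        reverse=True,
    )
    candidates = ranked[:top_n]
    auto = candidates[0][0] if candidates and candidates[0][1] >= threshold else None
    return auto, candidates

clean = ["ABC Limited", "Acme Widgets", "Zenith Bank"]

# A near-miss spelling clears the threshold and can be overwritten automatically.
auto, candidates = match("ABC Limitd", clean)

# A looser variant falls below it, so the ranked candidates go to the user instead.
auto2, candidates2 = match("The ABC", clean)
```

Both the threshold and the scoring function are plain parameters here, which covers the "customizable" wish; swapping score for another measure does not change the rest of the loop.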

Request for Question Clarification by answerguru-ga on 30 Aug 2003 10:17 PDT
Hi there,

I've written several standardization scripts and programs over the
years in various languages - I haven't come across a "general"
standardization program based on a user input file of clean data. If
other researchers cannot find anything for you, I would be interested
in helping you write a script/program to solve this problem. For now
though, we'll see if any other researchers can hunt anything down.

answerguru-ga

Clarification of Question by vicsing-ga on 31 Aug 2003 05:42 PDT
Sigh...I was hoping to avoid writing code. Was hoping someone had
solved this same problem before and I could use a finished product.

I guess I'll wait a few days and see if anyone responds.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Off the shelf application to compare 2 text lists
From: answerforce-ga on 02 Sep 2003 09:15 PDT
 
I found this program on the internet: examdiff.exe (for Windows). It
does some or most of the things you've requested, and it
has saved me countless hours.
You can find it at this site:
http://www.prestosoft.com/ps.asp?page=edp_examdiff

I hope it serves you as it has served me.  Good Luck

Raymond
Subject: Re: Off the shelf application to compare 2 text lists
From: jgraves-ga on 03 Sep 2003 06:27 PDT
 
Look for MaxDup on http://www.anchorcomputersoftware.com/.

Pricing is not listed on the website.

I wanted to become an official researcher before I posted this answer
(to get the $$$) but they are not accepting new applicants.  But I
thought it was more important to answer you than play their political
games.  I'm not sure why someone needs to be 'official' to answer the
question.

Jay
Subject: Re: Off the shelf application to compare 2 text lists
From: yosarian-ga on 04 Sep 2003 03:23 PDT
 
Hi vicsing-ga.
If I understand you correctly, you are looking for a record linkage
program. 
Here's one such program:
http://www.linkagewiz.com/
Here is a list of several others:
http://datamining.anu.edu.au/projects/linkage.html#record_linkage_software
I have no experience with any of the above programs, but they look like a step
in the right direction. (They may be overkill, as you mention
only one field to be matched.)
My search words in Google were:
"record linkage" software
Good luck,
yosarian-ga
P.S. jgraves-ga,
as you said, nobody has to be 'official' to
answer the question. However, as someone
who also wishes someday to become an official
GA researcher, here's my 2 cents:
What Google Answers sells is not only the answers,
but the answers with a certain reputation.
Google Answers staff test their researchers for correctness, promptness,
and communication skills; later they check them according
to user ratings. While anybody with good intentions
may answer a question, I think non-researchers have less
of a 'quality guarantee'. Is this a political game?
I do not think so.
Subject: OT
From: jgraves-ga on 04 Sep 2003 09:21 PDT
 
Hi yosarian-ga;

Thanks for your comments.  I'm going to look into your links (it's a
business need for one of the companies I work with).

As far as the political portion of my answer, I guess I am questioning
why Google Answers exists in such a limited capacity.  Are the
official researchers so omnipotent that they can answer any question?
Apparently not, because all three of the suggestions have come from
non-official users.  Do any of the suggestions so far help the user?
I don't know because he hasn't replied, but what is the guarantee that
the 'official researcher' comes up with an acceptable answer?

There seem to be an awful lot of unanswered questions on the site
(I've only looked thru the computer section though.)  It seems to me
that opening it up a little more would help all sides.

I see your point but I hope you see mine too.

Jay
Subject: Re: Off the shelf application to compare 2 text lists
From: yosarian-ga on 07 Sep 2003 09:26 PDT
 
Hi jgraves-ga,
Having read your response, I guess I agree with your reasoning:
There is no guarantee the official researcher's answer is adequate.
I have sampled an (unrepresentative) number of unanswered questions.
Most of them have had comments or clarifications that made a complete
answer redundant, but the case is really not open and shut.
I believe if you keep answering these questions for free, the GA
establishment will eventually take you into account, and make you
official :-) (that was the story told by pinkfreud-ga, one of my
favourite researchers.)
Good luck,
yosarian-ga
