Q: For "GoogleExpert-ga" ONLY: "Need an exhaustive list of nicknames" - continued (Answered, 5 out of 5 stars, 0 Comments)
Question  
Subject: For "GoogleExpert-ga" ONLY: "Need an exhaustive list of nicknames" - continued
Category: Reference, Education and News > General Reference
Asked by: reblazer-ga
List Price: $25.00
Posted: 08 Oct 2003 14:26 PDT
Expires: 07 Nov 2003 13:26 PST
Question ID: 264338
This question is a continuation of expired question id #253527, found
at http://answers.google.com/answers/main?cmd=threadview&id=253527

-------------

GoogleExpert,
Hey, where did you go???  :-)

I'm confused: the reason you gave for not answering is that "The
excessive nickname groups: 1 Jim 1 James 2 Jim 2 James 2 Jimmy has
been giving me some trouble."

In the previous post, you asked me:
   I converted the nicknames to the format you requested.
However, based on the nicknames from this file:
http://search.cpan.org/src/BRIANL/Lingua-EN-Nickname-1.13/nicknames.txt
"Ab" is short for several names (Abel, Abiel, Abigail, Abijah, Abner,
Absalom), and in my converted file, "Ab" is assigned many group
numbers. Should "Ab" and other names be assigned only one number or
several numbers?

and I replied:
   As far as the example of "Ab" goes, yes, it should be assigned
several group numbers, i.e. to the Abel-group, the Abiel-group, the
Abigail-group, etc., exactly as you have done.
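A minimal sketch of the layout agreed on here, assuming each root name gets its own group number and a nickname such as "Ab" is simply listed once in every group it belongs to. The sample data and the helper names (`build_groups`, `groups_for`) are hypothetical and for illustration only; they are not taken from the actual spreadsheet.

```python
# Hypothetical sample: root name -> nicknames (illustrative only).
ROOTS = {
    "Abel": ["Ab", "Abe"],
    "Abigail": ["Ab", "Abby", "Gail"],
    "Absalom": ["Ab"],
}

def build_groups(roots):
    """Assign one group number per root and return (group, name) rows."""
    rows = []
    for group, (root, nicknames) in enumerate(roots.items(), start=1):
        rows.append((group, root))
        for nick in nicknames:
            rows.append((group, nick))
    return rows

def groups_for(rows, name):
    """Return every group number a name appears in, sorted."""
    return sorted({g for g, n in rows if n == name})

rows = build_groups(ROOTS)
print(groups_for(rows, "Ab"))    # [1, 2, 3]
print(groups_for(rows, "Gail"))  # [2]
```

Because "Ab" is repeated per group rather than forced into a single number, a lookup on "Ab" naturally returns all of its candidate root names.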

Isn't that the same issue?  If it is, it is not a problem at all!

It sounds like you have the answer all ready to go - in the exact form
I need (please review question id #253527 for all of the details)!  If
so, please answer the question!

Thank you very much!
Lazer
Answer  
Subject: Re: For "GoogleExpert-ga" ONLY: "Need an exhaustive list of nicknames" - continued
Answered By: googleexpert-ga on 10 Oct 2003 18:22 PDT
Rated:5 out of 5 stars
 
Hi Lazer,
I'm sorry for the confusion.  I was trying to prune the list of
nicknames so that lookup time would improve.
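One possible pruning rule for the "1 Jim 1 James 2 Jim 2 James 2 Jimmy" case is to drop any group whose names are all contained in another group. This is a hypothetical sketch of that rule, not the method actually used for the spreadsheet; `groups_from_rows` and `prune_subset_groups` are illustrative names.

```python
from collections import defaultdict

def groups_from_rows(rows):
    """Collect (group, name) rows into {group: set_of_names}."""
    groups = defaultdict(set)
    for group, name in rows:
        groups[group].add(name)
    return dict(groups)

def prune_subset_groups(groups):
    """Drop any group whose names are all contained in another group.

    Identical groups are collapsed to the lowest-numbered copy.
    """
    kept = {}
    for g, names in sorted(groups.items()):
        redundant = any(
            names < other or (names == other and h < g)
            for h, other in groups.items()
            if h != g
        )
        if not redundant:
            kept[g] = names
    return kept

rows = [(1, "Jim"), (1, "James"), (2, "Jim"), (2, "James"), (2, "Jimmy")]
pruned = prune_subset_groups(groups_from_rows(rows))
print(list(pruned))        # [2]
print(sorted(pruned[2]))   # ['James', 'Jim', 'Jimmy']
```

The pairwise comparison is quadratic in the number of groups, which should still be manageable for a list of a few thousand rows; a name like "Ab" that belongs to several non-overlapping groups is left untouched.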

Below is a link to an Excel file which contains nicknames (English and
Japanese).
http://www.geocities.com/tmp264338/allnicks.xls

Please examine the file carefully for any duplicates or inaccuracies
in the nicknames.
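A quick way to check for exact duplicates, assuming the sheet is first exported to CSV. The column names `Group`, `Name` and `Source` below are hypothetical and may not match the real file; the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export of the spreadsheet; the real column names may differ.
SAMPLE = """Group,Name,Source
1,Jim,E1
1,James,E1
1,Jim,E2
2,Winifred,E2
3,Winifred,E2
"""

def find_duplicates(reader):
    """Return (group, name) pairs that occur more than once, ignoring case."""
    counts = Counter(
        (row["Group"].strip(), row["Name"].strip().lower()) for row in reader
    )
    return sorted(pair for pair, n in counts.items() if n > 1)

dups = find_duplicates(csv.DictReader(io.StringIO(SAMPLE)))
print(dups)  # [('1', 'jim')]
```

A name appearing in many different groups (like "Winifred" here) is not flagged, since that is allowed by design; only a repeated name within the same group is.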

Thank you.
-googleexpert

Request for Answer Clarification by reblazer-ga on 13 Oct 2003 07:38 PDT
GoogleExpert,

The link doesn't work.

Lazer

Clarification of Answer by googleexpert-ga on 13 Oct 2003 07:58 PDT
I'm sorry I didn't see this before.  It seems like Geocities requires
Referres to come from Geocities in order to access files.

The following link should work:
http://www.geocities.com/tmp264338/

Clarification of Answer by googleexpert-ga on 13 Oct 2003 07:59 PDT
Sorry, it should be "Referrers" NOT "Referres"

Request for Answer Clarification by reblazer-ga on 13 Oct 2003 10:00 PDT
GoogleExpert, would you please clarify your answer:
 
1) Does source "E1" come straight from
     http://search.cpan.org/src/BRIANL/Lingua-EN-Nickname-1.13/nicknames.txt ?
2) Does source "J" mean the combination of
       http://business.baylor.edu/Phil_VanAuken//JapaneseMaleNames.html
     and
       http://hsb.baylor.edu/html/vanauken/JapaneseFemaleNames.html ?
     Is it (source "J") also from
       http://www.20000-names.com/female_japanese_names.htm
     and
       http://www.20000-names.com/male_japanese_names.htm
     too?
3) Is what you wrote previously, namely that "The Japanese Name Lists
that I posted are not accurate nicknames. They are just shortened
names" still the case, or did you find actual Japanese nicknames
somewhere?
4) What happened to the Chinese nicknames from
       http://www.csupomona.edu/~faculty_computing/lab/Pronunciations/Pronunciation/mandarin.html
     and
       http://www.csupomona.edu/~faculty_computing/lab/Pronunciations/Pronunciation/cantonese.html
     or from
       http://www.20000-names.com/female_chinese_names.htm
     and
       http://www.20000-names.com/male_chinese_names.htm
     Is there a problem with including them, too?
5) Were you ever able to find a secondary source ("E2")?
 
With the last few questions above, I am not asking you to do more
research; I just mean that if you already have that data available to
add to the Excel file, it would be great!
 
I am very appreciative of your hard work and the nice Excel format!
 
Lazer

Clarification of Answer by googleexpert-ga on 13 Oct 2003 16:41 PDT
Hi Lazer,
Here are my answers to your questions:
1.)  Yes, rows with "E1" come from:
http://search.cpan.org/src/BRIANL/Lingua-EN-Nickname-1.13/nicknames.txt
2.)  Rows with "J" come from:
http://hsb.baylor.edu/html/vanauken/JapaneseFemaleNames.html
http://business.baylor.edu/Phil_VanAuken//JapaneseMaleNames.html
3.)  The Japanese nicknames I provided are still just shortened names
and, unfortunately, less accurate.
4.)  There is a problem with the Chinese nicknames; I need to find
more information about them.
5.)  I found and added a secondary source.  Unfortunately, you will
find names like "Winifred" appearing many times in many groups.

The updated file is at the same link I gave before.

I should add that my search strategy for finding nickname lists was to
search Google for pairs of related names, like:
Jasper Casper
Bob Robert

Please let me know if you need any more clarification.
Thank you.
-googleexpert

Request for Answer Clarification by reblazer-ga on 20 Oct 2003 14:11 PDT
GoogleExpert,

First of all, I want to sincerely thank you for your very hard work.

Upon carefully examining the data, however, there are some
peculiarities.

For example:
    Aaron Ron RonnE Erin Ronald
    Samuel Sam SammE Samantha Samson
 
I understand how Aaron becomes Ron.  But am I understanding your table
correctly to mean that Aaron is also associated with Ronald?  That
doesn't seem to be correct.

Similarly, I understand that Sam could become Samson, Samantha, Sammy,
or Samuel.  But does the table indicate that Samuel (male) and
Samantha (female) are associated?  That also doesn't seem right.

There are many examples of names that seem to use some sort of "free
association," kind of like BAD becomes MAD becomes FAD, and yet not
every FAD is BAD. You get the point.  :-)
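The chaining described here is what happens when groups are merged transitively: if any two names that share a nickname are put in the same group, Aaron and Ronald meet through "Ron" and Samuel and Samantha meet through "Sam". The sketch below reproduces the effect with a simple union-find; it is a hypothetical illustration of the problem, not a claim about how the spreadsheet was actually built.

```python
def make_union_find():
    """Return find/union closures over a dict-based disjoint-set forest."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    return find, union

# Direct root -> nickname links, taken from the examples in this post.
LINKS = [
    ("Aaron", "Ron"), ("Ronald", "Ron"),
    ("Samuel", "Sam"), ("Samantha", "Sam"), ("Samson", "Sam"),
]

find, union = make_union_find()
for root, nick in LINKS:
    union(root, nick)

print(find("Aaron") == find("Ronald"))     # True: chained through "Ron"
print(find("Samuel") == find("Samantha"))  # True: chained through "Sam"
print(find("Aaron") == find("Samuel"))     # False
```

Keeping only the direct root/nickname pairs, without taking this closure, keeps Aaron and Ronald apart.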

The bottom line is that in order to use this data we need to manually
review and regroup all 4,715 rows.  Unfortunately, it seems to be
unusable in its current state.

Any ideas?
Lazer

Clarification of Answer by googleexpert-ga on 22 Oct 2003 09:49 PDT
Hi Lazer,
I am sorry that the list I provided is not usable.
If you want a refund or want to repost your question, you can visit
this page:
http://answers.google.com/answers/main?cmd=refundrequest

In the meantime, I will do my best to find more nickname sources and
redo the list by November 7.

By the way, I found some research papers on name matching that might
help with your application:

Adaptive Name Matching in Information Integration
http://www.cs.cmu.edu/~wcohen/postscript/intelligent-systems-2003.pdf

Using a Pronunciation Dictionary and Phonetic Rules for Name Matching Applications:
http://www.cs.rmit.edu.au/~jz/sci/p2.pdf

Request for Answer Clarification by reblazer-ga on 22 Oct 2003 11:30 PDT
GoogleExpert,

I am most impressed by your diligent efforts and will certainly *not*
request a refund!  :-)  I would be thrilled if you were really
able to redo the list by then; that would deserve a tip!

BTW, very interesting research papers you found!  

Thank you very much!
Lazer

Request for Answer Clarification by reblazer-ga on 23 Oct 2003 08:40 PDT
GoogleExpert,

I emailed the author of
http://search.cpan.org/src/BRIANL/Lingua-EN-Nickname-1.13/nicknames.txt
and he was kind enough to reply.  He wrote the following:

"
     ...there are actually three tab-separated columns in
nicknames.txt, the last couple containing space-separated data.
Column one contains the "root" name, the second column contains
various potential nicknames, and the third contains "related" names
(determined with almost arbitrary subjectivity). The third column
allows more (indirect) matches to a name, at a lesser degree of
certainty, for greater flexibility (e.g. maybe someone heard the name
wrong when entering it in a database).

In this case, I figured Aaron and Ronald share a syllable/nickname
(Ron), making them a candidate for an indirect match.

... "
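Based only on the author's description above (three tab-separated columns: root name, space-separated nicknames, space-separated "related" names), a reader for nicknames.txt might look like the sketch below. Skipping blank lines and `#` comments, and tolerating a missing third column, are assumptions rather than documented behavior; `Entry` and `parse_nicknames` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    root: str
    nicknames: list = field(default_factory=list)
    related: list = field(default_factory=list)  # indirect matches

def parse_nicknames(lines):
    """Parse 'root<TAB>nicknames<TAB>related' lines into Entry objects.

    Columns 2 and 3 are space-separated lists; either may be missing.
    Skipping blank lines and '#' comments is an assumption.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        cols = line.split("\t")
        cols += [""] * (3 - len(cols))  # pad missing trailing columns
        entries.append(Entry(cols[0].strip(), cols[1].split(), cols[2].split()))
    return entries

# Illustrative lines in the described format (not copied from the real file).
sample = ["Aaron\tRon Ronnie\tRonald\n", "\n", "Abel\tAb Abe\n"]
for e in parse_nicknames(sample):
    print(e.root, e.nicknames, e.related)
```

The real file can be passed in as an open file object, since iterating over it yields lines just like the sample list.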

Perhaps in addition to finding a more accurate source (let's call it
"E3"), you could do the following:  add an additional column to the
Excel file (let's call it "Indirect match").  All rows would have a
null value for that column except for the names from his third column
(the "indirect matches") from
http://search.cpan.org/src/BRIANL/Lingua-EN-Nickname-1.13/nicknames.txt
- that way the data would be there if we decide we want it, but we'd
also have the option of filtering it out...
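The proposed "Indirect match" column could be produced by flattening each line into (group, name, indirect) rows, flagging only the names from the third column. This is a sketch under stated assumptions: the flag value "Y" and the helper names `to_rows` and `direct_only` are hypothetical, and every other row gets an empty (null) value as suggested above.

```python
import csv
import io

def to_rows(lines, group_start=1):
    """Flatten 'root<TAB>nicknames<TAB>related' lines into spreadsheet rows.

    Each row is (group, name, indirect): indirect is "Y" for names from the
    third (related) column and "" (null) for everything else.
    """
    rows = []
    group = group_start
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if not cols[0].strip():
            continue  # skip blank lines
        cols += [""] * (3 - len(cols))
        rows.append((group, cols[0].strip(), ""))
        rows.extend((group, n, "") for n in cols[1].split())
        rows.extend((group, n, "Y") for n in cols[2].split())
        group += 1
    return rows

def direct_only(rows):
    """Filter out the indirect matches."""
    return [r for r in rows if not r[2]]

rows = to_rows(["Aaron\tRon Ronnie\tRonald\n"])

# Write CSV that Excel can open, with the extra column.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Group", "Name", "Indirect match"])
writer.writerows(rows)
print(out.getvalue())
print(direct_only(rows))  # [(1, 'Aaron', ''), (1, 'Ron', ''), (1, 'Ronnie', '')]
```

Keeping the indirect names in the same sheet, rather than dropping them, leaves the choice to filter them in or out to whoever uses the data.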

Thanks again,
Lazer

Request for Answer Clarification by reblazer-ga on 29 Oct 2003 08:33 PST
Any updates, GoogleExpert?

Thanks again,
Lazer

Clarification of Answer by googleexpert-ga on 30 Oct 2003 09:35 PST
Lazer,
Thanks for waiting.
I'm still working on it.  I'll keep you posted on any significant changes.

-googleexpert

Clarification of Answer by googleexpert-ga on 31 Oct 2003 09:36 PST
Lazer,
I added an "Indirect Match" Column to the file:
http://www.geocities.com/tmp264338/allnicks.xls

Unfortunately, I could not find new sources of nicknames.
Please let me know if I can be of further assistance.
Thanks.
-googleexpert
reblazer-ga rated this answer:5 out of 5 stars
Truly amazing efforts - great results!

Comments  
There are no comments at this time.
