I need three comprehensive lists: USA last names, USA county names, and
USA street names, including their proper punctuation and capitalization.
The need arises from working with mailing databases that are delivered
in all caps (presumably to dodge just this kind of problem) and which I
would like to get back into punctuated, mixed-case form. The question
concerns, for instance, names like MacArthur, which gets two caps, and
Macon, which gets just one, as well as DIMAGGIO / DIAMOND, LASALLE /
LANSING, LECOUTURE / LEAVENWORTH, and so forth.
To start with, I am asking for a comprehensive list of _where_ these
lists are available for purchase, both separately and within software
that does the name-and-address massage, and how much each vendor charges
for their wares.
Request for Question Clarification by scriptor-ga on 12 Nov 2002 13:08 PST
Dear evan0,
Since it is not even clear how many streets exist in the USA, it is
quite possible that a complete list of street names does not exist at
all. I also doubt that there is any database or list of all American
last names.
Regards,
Scriptor
Request for Question Clarification by webadept-ga on 12 Nov 2002 14:45 PST
Hi,
Presumably, since you already have the list, you could use a program
that alters the text from all upper case to the mixed case you want.
This is a relatively simple program if you have Perl installed, or can
install it on your computer, but it could be done with just about any
other language and format as well. Would you be interested in a solution
such as this?
webadept-ga
Clarification of Question by evan0-ga on 12 Nov 2002 16:25 PST
Clarification for scriptor: Such databases do indeed exist; they are the
basis of direct mail marketing and postal delivery, and of services that
do just this kind of conversion. I'd prefer to have the translation
utility available in house if possible, though, rather than paying, say,
$30/K (or about $300 for my current list of about 10K contacts) each
time I need the service, or paying $2K to buy a program (not to mention
subscription fees) to do the process.
Clarification for webadept: It's not a programming issue. The issue is,
for instance, the (let's say) 8,000 unique last names and place names
that, in upper case, begin with DI, LA, LE, MAC, etc., each of which has
to have been listed in proper mixed case at least once to establish a
mapping such as:
my %xlat = (
    LASALLE => 'LaSalle',
    LANSING => 'Lansing',
    # ...
);
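For illustration, here is a minimal Perl sketch (the sample names are
purely illustrative) of applying such a table once it exists; note that
the plain ucfirst(lc(...)) fallback is exactly what goes wrong for a
name like MACARTHUR:

#!/usr/bin/perl
use strict;
use warnings;

# Known mappings: only names that have been seen in proper mixed case
# at least once can be translated with confidence.
my %xlat = (
    LASALLE => 'LaSalle',
    LANSING => 'Lansing',
);

while (my $name = <DATA>) {
    chomp $name;
    # Use the known mapping if we have one; otherwise fall back to a
    # plain "capitalize the first letter" guess, which is where names
    # like MACARTHUR go wrong (it yields "Macarthur").
    my $fixed = exists $xlat{$name} ? $xlat{$name} : ucfirst(lc($name));
    print "$fixed\n";
}

__DATA__
LASALLE
LANSING
MACARTHUR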
Best,
Evan
Request for Question Clarification by webadept-ga on 12 Nov 2002 18:41 PST
QUOTE: "I'd prefer to have the translation utility available in house if
possible, though, rather than paying, say, $30/K (or about $300 for my
current list of about 10K contacts) each time I need the service, or
paying $2K to buy a program (not to mention subscription fees) to do the
process."
Okay, I'll try this again. What I was saying is: the process for doing
this is not that difficult. It's simply a matter of changing the
lowercase and uppercase formats of your database to match "common
standards". For instance, if the name is MACDOUGAL, the common change
for this is MacDougal.
://www.google.com/search?num=100&hl=en&lr=&ie=ISO-8859-1&safe=off&q=MCDOUGAL&btnG=Google+Search
The programming needed for a utility that converts a set of data into
this form is very simple, and if you have Perl I could make it for you.
If you do not, you could look at installing it, and I could give you
instructions. If you don't want to use Perl for this, it could still be
done by hiring a programmer to write the simple utility it would take to
convert all of your data into the proper format, for about $100-200.
Then you would not have to worry about subscription fees, huge
conversion costs, or finding another source for your mailing lists.
The program would take care of all name changes by matching them to
common standards found on the web, and recording those standards once it
has found them. Really simple stuff. If you are interested, then cool;
if not, I'm sure one of the other researchers will be able to help you
find a new data source.
A breakdown of how this program would work:
A) Read the name.
B) Check the known list for a standard.
C) If not found, search the web for that name; look for the name with
lower-case letters and an upper-case first letter. Find 10 of these,
take the most common form, and record it into the "known list for
standards".
D) Alter the name in the data table to the standard.
E) Move on to the next name.
Street names, county names, and country names could all be done in the
same fashion. After a few runs, the web would no longer be searched,
because you would have a pretty comprehensive list in the known-
standards file.
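For what it's worth, a minimal Perl sketch of the A)-E) loop above (the
search URL, LWP::Simple, and the tallying heuristic are illustrative
assumptions, not the finished utility being offered):

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

my %known;   # step B: the "known list for standards"

sub standard_form {
    my ($uc_name) = @_;
    return $known{$uc_name} if exists $known{$uc_name};      # step B

    # Step C: fetch one results page and tally every mixed-case
    # spelling of the name found in it (approximates "find 10 of
    # these, take the most common form").
    my $html = get("http://www.google.com/search?q=$uc_name");
    my %tally;
    if (defined $html) {
        while ($html =~ /\b(\Q$uc_name\E)\b/gi) {
            my $form = $1;
            next if $form eq uc $form or $form eq lc $form;
            $tally{$form}++;
        }
    }
    my ($best) = sort { $tally{$b} <=> $tally{$a} } keys %tally;
    $best = ucfirst(lc($uc_name)) unless defined $best;       # fallback
    return $known{$uc_name} = $best;                          # step C: record
}

# Steps A, D and E: read each name, rewrite it, move on.
while (my $name = <STDIN>) {
    chomp $name;
    print standard_form($name), "\n";
}

In practice the web lookup would need a real User-Agent string and some
throttling, but the shape of the loop is the point.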
webadept-ga
Clarification of Question by evan0-ga on 12 Nov 2002 20:59 PST
Hello Webadept,
OK, I get it now, and damn, it's painful to say that.
Perl is fine; just no write-only code, please, and include comments. No
UI is needed, but some useful command-line args would be nice, and/or
we'll edit the .pl as needed.
(0) Need a knob for the minimum number of URLs to examine; probably just
take the first match from each(?).
(1) Need a 'knob' for near-ties. The nominal "standard" needs to
out-poll the second-best form by a ratio of at least, say,
$1stTo2ndMinRatio. For instance, try LECOUTURE and you find
approximately equal counts of LeCouture and Lecouture. But this can be
resolved, for instance, with (2)...
(2) Need selectable language filtering. I would guess there are standard
techniques for language determination; otherwise maybe use a list of
telltales (for English, 'and', 'the', etc.), and might as well try to
separate written dialects too (center/centre, etc.). Then LeCouture is
the preferred spelling on English-speaking pages.
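A rough sketch of how those three knobs might look in the Perl script
(the names and thresholds are illustrative; note that $1stTo2ndMinRatio
is not a legal Perl identifier, so it is renamed here):

#!/usr/bin/perl
use strict;
use warnings;

# (0) minimum number of result URLs to examine per name
my $MinUrls = 10;
# (1) the winner must out-poll the runner-up by at least this ratio
my $FirstToSecondMinRatio = 2.0;
# (2) crude English filter: a page must contain a few telltale words
my @Telltales = qw(and the of with from);

# A page counts toward the tally only if it looks like English.
sub looks_english {
    my ($html) = @_;
    my $hits = grep { $html =~ /\b$_\b/i } @Telltales;
    return $hits >= 3;
}

# Given tallies of the mixed-case spellings collected from pages that
# passed looks_english(), accept the winner only if it clearly
# out-polls the runner-up; otherwise return undef (a near-tie, e.g.
# LeCouture vs. Lecouture) for manual review.
sub pick_standard {
    my (%tally) = @_;
    my @ranked = sort { $tally{$b} <=> $tally{$a} } keys %tally;
    return undef unless @ranked;
    return $ranked[0] if @ranked == 1;
    my $ratio = $tally{ $ranked[0] } / $tally{ $ranked[1] };
    return $ratio >= $FirstToSecondMinRatio ? $ranked[0] : undef;
}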
So: how about you write whatever is $125 worth of Perl along these
lines, and I take that much out for a spin? Let me know if that's OK and
I'll post (clarify) a sample list of a thousand or so DE, DI, MAC, LA,
LE names of present interest. Also advise how I specify that I want the
answer from you in particular -- is there a way to do that?
Best,
Evan
Request for Question Clarification by webadept-ga on 13 Nov 2002 00:09 PST
I think most of your concerns will be covered by the script; I've done
things like this before. Since you are most concerned with US addresses
and streets, it becomes pretty easy to check with the US Post Office for
spelling and then run it against another site. You'll see what I'm
talking about. If you have some experience with Perl, you'll see that it
is really the best tool for this sort of thing: easy to edit and adjust.
The offer is fine, and you can put my name on the question as the
preferred researcher. It's not really frowned upon, but not encouraged
either; I don't think they would have a problem with it if you refer to
this question as the reason.
Posting the list to the question is fine as well, but if you have a
website you can put it on and link to, that would be better. I have a
service that I can post files to, with hyperlinks, so you can just
download the completed test data, and when you feel that it is working
for you I'll post the code.
The code is completely yours, and editable. Perl is a scripting
language; there are methods of compiling it to an executable format, but
that's not something I do. If you are on Windows, please let me know,
and I'll point you to where to get ActiveState Perl and how to get it
installed. The computer you do this on will obviously need a connection
to the Internet; DSL or cable is good, and a modem will work, but the
program will run pretty slowly over one.
Look forward to your question and getting started on this. Thanks,
webadept-ga
Clarification of Question by evan0-ga on 14 Nov 2002 09:56 PST
webadept-ga,
Question ID 107703 has been posted for $105. Please post an "answer" to
this current question (Question ID 106245) for the other $20, and to tie
off the loose end.
Best,
Evan