Q: Script to categorize companies (No Answer, 0 Comments)
Question  
Subject: Script to categorize companies
Category: Computers > Programming
Asked by: drewiu-ga
List Price: $50.00
Posted: 13 Jun 2006 19:12 PDT
Expires: 15 Jun 2006 09:12 PDT
Question ID: 737949
Hello Experts,

   I am a summer intern and one of my projects is to analyze the
customer base of my company.  I've got a huge list of clients including
names, emails, company names, and addresses.  This list is fairly
accurate, but not perfect.  For example, Microsoft could be listed as
"Microsoft", "Microsoft, Inc." or "Microsoft Corporation".  Also, some
customers have used a free e-mail account (Yahoo, Gmail...) instead of
the company domain.
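
The name-variant problem described above could be reduced by normalizing company names before comparing them. The following is only a minimal sketch in Python; the suffix list and the function name `normalize_company` are assumptions made for illustration, and a real list would need more entries.

```python
import re

# Assumed, non-exhaustive set of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "co", "company", "llc", "ltd"}

def normalize_company(name: str) -> str:
    """Lowercase a company name, drop punctuation, strip trailing legal suffixes."""
    # Replace anything that is not a word character or whitespace with a space.
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Remove suffixes from the end, but never remove the last remaining word.
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)

variants = ["Microsoft", "Microsoft, Inc.", "Microsoft Corporation"]
normalized = {normalize_company(v) for v in variants}
```

With this sketch, all three Microsoft variants from the example collapse to a single key, so they can be looked up once rather than three times.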

I've been asked to individually look up the industry and employee
count for about 6,000 of the most recent customers.  However, I'd
really like to impress them by categorizing as many as I can, and
I've got over 35,000 records!

For each record, I'd like to place it in one of these categories:
- Education
- Government
- Military
- Non-Profit
- Business with 1-100 Employees
- Business with 101-500 Employees
- Business with 501-1000 Employees
- Business with 1001-10000 Employees
- Business with 10000+ Employees
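
Once an employee count is known, mapping it to one of the business categories above is a simple threshold lookup. Here is a hedged sketch; the function name `size_category` is hypothetical, and because the last two categories overlap at exactly 10,000, this sketch assumes a count of 10,000 belongs to the 1001-10000 bucket.

```python
# Upper bound (inclusive) and label for each business-size bucket in the list.
SIZE_BUCKETS = [
    (100, "Business with 1-100 Employees"),
    (500, "Business with 101-500 Employees"),
    (1000, "Business with 501-1000 Employees"),
    (10000, "Business with 1001-10000 Employees"),  # assumption: 10000 lands here
]

def size_category(employee_count: int) -> str:
    """Return the business-size category for a positive employee count."""
    if employee_count < 1:
        raise ValueError("employee count must be at least 1")
    for upper_bound, label in SIZE_BUCKETS:
        if employee_count <= upper_bound:
            return label
    return "Business with 10000+ Employees"
```

The Education, Government, Military and Non-Profit categories are not size-based, so they would have to be decided before this function is called.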

This categorization will be used to better understand our customer
base.  We will NOT be using this data for any type of spam and it will
not be resold (this is an ethical project).  Because I respect the
privacy of the customers, I cannot provide the raw data.  Please
assume it's a long CSV file like this:

"Gates,Bill","Microsoft, inc"," One Microsoft Way","Redmond","WA"," 98052","jsmith@microsoft.com"
"Summers,Lawrence ","Harvard"," Massachusetts Hall"," Cambridge","MA"," 98052","lsummers@harvard.edu"
"Gates,Bill","Microsoft, inc"," One Microsoft Way","Redmond","WA"," 98052","jsmith@microsoft.com"
"Summers,Lawrence ","Harvard"," Massachusetts Hall"," Cambridge","MA"," 98052","lsummers@harvard.edu"
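
Records shaped like the sample above contain commas inside quoted fields, so they are safer to read with a real CSV parser than by splitting on commas. A minimal sketch using Python's standard `csv` module follows; the field names in `FIELDS` are assumptions inferred from the sample, not a confirmed schema.

```python
import csv
import io

# Assumed column order, inferred from the sample records.
FIELDS = ["name", "company", "street", "city", "state", "zip", "email"]

def read_records(handle):
    """Yield one dict per well-formed row, with surrounding whitespace stripped."""
    for row in csv.reader(handle):
        if len(row) != len(FIELDS):
            continue  # skip rows that do not match the assumed layout
        yield dict(zip(FIELDS, (value.strip() for value in row)))

# Stand-in for the real file, which would be opened with open(path, newline="").
sample = io.StringIO(
    '"Gates,Bill","Microsoft, inc"," One Microsoft Way","Redmond","WA",'
    '" 98052","jsmith@microsoft.com"\n'
)
records = list(read_records(sample))
```

Stripping whitespace here handles the stray leading spaces visible in the sample addresses and zip codes.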


So here is the challenge...  Find an elegant way to automate the
process of categorizing these records.  Write some sort of script that
can go through each record and query an online source, such as Hoovers,
Google Finance, or whatever other source you can find.  Find a match
and return an employee count and a categorization from the list above.

This script should be able to handle it when a company is unlisted, or
the name is slightly off.  Google Finance can find the correct
company most of the time, even if the name is a variant.
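
For names that are slightly off, one possible local approach is approximate string matching against a list of names already resolved. The sketch below uses `difflib.get_close_matches` from the Python standard library; the `KNOWN_COMPANIES` list and the 0.8 cutoff are assumptions for illustration, and this does not reproduce how Google Finance or Hoovers match names.

```python
import difflib

# Hypothetical list of already-resolved company names (lowercase).
KNOWN_COMPANIES = ["microsoft", "harvard university", "yahoo"]

def best_match(name, known=KNOWN_COMPANIES, cutoff=0.8):
    """Return the closest known name, or None if nothing is similar enough."""
    candidates = difflib.get_close_matches(
        name.lower().strip(), known, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None

typo_result = best_match("Microsft")          # misspelled but close
unknown_result = best_match("Acme Widgets")   # not in the known list
```

Returning `None` for an unlisted company lets the script flag the record for manual lookup instead of guessing.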

Obviously this categorization will not be perfect, but try to keep the
margin of error as low as possible.  Simple safeguards should be in
place, such as not categorizing every person with an @yahoo.com email
address as a Yahoo employee.  Feel free to ask any questions.
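
The @yahoo.com safeguard could be handled by ignoring the email domain whenever it belongs to a free mail provider. This is a minimal sketch; the `FREE_MAIL_DOMAINS` set is an assumed, incomplete list and `company_domain` is a hypothetical helper name.

```python
# Assumed, non-exhaustive set of free e-mail providers.
FREE_MAIL_DOMAINS = {"yahoo.com", "gmail.com", "hotmail.com", "aol.com"}

def company_domain(email: str):
    """Return the email's domain, or None if it is missing or a free provider."""
    _, at_sign, domain = email.strip().lower().rpartition("@")
    if not at_sign or not domain or domain in FREE_MAIL_DOMAINS:
        return None  # no usable company information in this address
    return domain
```

When this returns `None`, the script would fall back to the company name field rather than the email address.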

Request for Question Clarification by pafalafa-ga on 14 Jun 2006 13:23 PDT
drewiu-ga,

I'm not a programmer, so I can't help you with a script, though I
doubt anyone will be able to meet your needs with only $50 worth of
effort.

But for starters, why not just count the extensions in your database? 
That is, the number of gov, mil, edu, org and com endings that show
up.

If nothing else, this should give a very quick overview of the
breakouts by these broad sectors, and make the task of parsing
individual segments (probably the com's) much more manageable.

Just a suggestion...enjoy your internship experience.  

pafalafa-ga
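
The extension count suggested above takes only a few lines to try. Here is a rough Python sketch, assuming the email addresses have already been pulled out of the file; `count_extensions` is a hypothetical name.

```python
from collections import Counter

def count_extensions(emails):
    """Count the final domain label (com, edu, gov, ...) across email addresses."""
    counts = Counter()
    for email in emails:
        domain = email.strip().lower().rpartition("@")[2]
        if "." in domain:
            counts[domain.rsplit(".", 1)[1]] += 1
    return counts

counts = count_extensions([
    "jsmith@microsoft.com",
    "lsummers@harvard.edu",
    "someone@yahoo.com",
])
```

Note that the "com" total still includes free mail addresses such as yahoo.com, so it is an overview of the data rather than a count of business customers.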
Answer  
There is no answer at this time.

Comments  
There are no comments at this time.
