Subject: MS Word Document Data Mining
Category: Computers > Software
Asked by: steveb-ga
List Price: $10.00
Posted: 19 Jun 2002 10:19 PDT
Expires: 11 Jul 2002 09:48 PDT
Question ID: 29245
I have an interesting problem: I am hoping to automate the extraction of name, contact, and address information from hundreds of Word documents into a database or spreadsheet. All the information is within the first 15-20 lines of each document, below the letterhead and the letter header information (from, date, etc.). The documents all vary slightly and do not use a template or standard layout. Is there existing software, a script, or some other means of pulling this data? Any comments are appreciated!
Thanks!
Steve
There is no answer at this time.
Subject: Re: MS Word Document Data Mining
From: chilledout-ga on 19 Jun 2002 11:54 PDT
Mike, Both Word and Excel support a scripting language called VBA, which is a form of Visual Basic. You could automate this using VBA, which is not too difficult for someone with programming or scripting experience. The main problem you will encounter is inconsistencies between different documents. For example, how would the script know the difference between a name and an address? Without seeing your exact documents I can't really determine how difficult it would be. Hope that helps!
Joshua
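To make the name-versus-address problem concrete, here is a minimal sketch of a line classifier. It is written in Python rather than VBA purely for illustration. The `classify_line` function and the `STREET_WORDS` list are hypothetical, and the rules are rough assumptions that would need tuning against the real letters.

```python
import re

# Hypothetical keyword list; real letters will need a longer, locale-specific one.
STREET_WORDS = {"street", "st", "avenue", "ave", "road", "rd",
                "suite", "blvd", "drive", "dr", "box"}

def classify_line(line):
    """Guess whether a line looks like a name, an address, or neither."""
    stripped = line.strip()
    if not stripped:
        return "unknown"
    words = re.findall(r"[A-Za-z]+", stripped.lower())
    has_digit = any(ch.isdigit() for ch in stripped)
    # Assumption: address lines contain a number plus a street-type word.
    if has_digit and any(w in STREET_WORDS for w in words):
        return "address"
    # Assumption: names are two to four capitalized words with no digits.
    tokens = stripped.replace(".", "").split()
    if not has_digit and 2 <= len(tokens) <= 4 and all(t[0].isupper() for t in tokens):
        return "name"
    return "unknown"
```

A company line such as "Acme Corp" would also be labelled a name by these rules, which is exactly the kind of inconsistency described above.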
Subject: Re: MS Word Document Data Mining
From: steveb-ga on 19 Jun 2002 12:28 PDT
Thanks, Joshua. That is my backup plan. I am hoping for existing software, as programming such a script will take many hours.
Thanks!
Steve
Subject: Re: MS Word Document Data Mining
From: anand_suhana-ga on 20 Jun 2002 05:26 PDT
Mail me two sample documents and mention the fields. I have done this sort of thing before and will see if I can work out a very quick fix.
ANAND
artyfact_in@hotmail.com
Subject: Re: MS Word Document Data Mining
From: anand_suhana-ga on 20 Jun 2002 05:32 PDT
Also try a programme called 80:20. I do not know whether it will work, but it is worth a try.
ANAND
Subject: Re: MS Word Document Data Mining
From: ddent-ga on 20 Jun 2002 13:54 PDT
One method would be to use a utility such as catdoc (http://www.ice.ru/~vitus/catdoc/) on Linux to extract the text from the files, and then to pipe the output through 'head' (a common utility included with most Linux distributions, which gives you the first n lines of a file). From that point, you can either add the information to the database manually, or use a pattern matching tool such as AAC (http://www.patrice.ch/en/computer/programs/aac/aac.html) to identify phone numbers and addresses more or less automatically. If you are willing to spend some money on a commercial product (where the vendor may be able to help get you going), http://www.vedit.com/office-tasks.htm may be of use. Hope this helps!
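As a rough illustration of the extract, truncate, and pattern-match approach described above, here is a minimal Python sketch. It assumes the text has already been extracted (for example with something like `catdoc letter.doc | head -n 20`). The function name `extract_contact_hints` and both regular expressions are hypothetical, and the patterns assume US-style phone numbers and ZIP codes, so they would need adapting to the actual letters.

```python
import re

# Assumed formats: (555) 123-4567, 555-123-4567, 555.123.4567
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")
# Assumed format: 5-digit US ZIP code with optional +4 extension
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

def extract_contact_hints(text, max_lines=20):
    """Scan only the first max_lines lines of extracted text for contact patterns."""
    head = "\n".join(text.splitlines()[:max_lines])
    return {
        "phones": PHONE_RE.findall(head),
        "zips": ZIP_RE.findall(head),
    }

sample = (
    "Acme Corp\n"
    "123 Main Street\n"
    "Springfield, IL 62704\n"
    "Tel: (555) 123-4567\n"
)
print(extract_contact_hints(sample))
```

The results could then be written out as one spreadsheet row per document. Names and street lines are harder to match by pattern alone and would probably still need a manual check.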
Subject: Re: MS Word Document Data Mining
From: shivakumar-ga on 23 Jun 2002 00:49 PDT
Hi Steve, There is software that does exactly what you require: it identifies all the contact information you need. It is called Address Grabber. To learn more about the product, see the website: http://www.egrabber.com/addressgrabberdeluxe/addressgrabber_s.htm. They also provide a 15-day free trial: http://www.egrabber.com/addressgrabberdeluxe/trial.htm
Rgds,
sivakumar.
Subject: Re: MS Word Document Data Mining
From: heinrich-ga on 26 Jun 2002 21:18 PDT
Email me a sample document; perhaps I can do it for you.
kochhw@intekom.co.za