Q: Extract data from many pdf files or MS Word docs to single MSExcel or equiv file (No Answer, 2 Comments)
Question  
Subject: Extract data from many pdf files or MS Word docs to single MSExcel or equiv file
Category: Computers
Asked by: khunjoe-ga
List Price: $20.00
Posted: 12 Sep 2004 22:57 PDT
Expires: 18 Sep 2004 19:02 PDT
Question ID: 400388
I have about 1000 PDF files (that I can already convert to MS Word if
necessary).  I would like to take selected information from each PDF
(information found in specified locations, such as ID number,
variable1, variable2, etc.) and insert those values into a single Excel
file or equivalent, ultimately to run analyses in SAS.  In other
words, I would like to condense specified data from around 1000 PDF
files into a single Excel file with a given structure.

A small challenge is that some of the PDF files have incomplete data.
For example, one PDF file with all the data may list it as follows:
 
Site A            IDnumber 9001 
L2    121.1
L3    135.2
L4    119.9
L5    145.7
 
 
Another PDF file, with a missing value for L4, would read as follows:
 
Site B            IDnumber 9007
L2    187.1
L3    191.0
L5    209.1
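
Once each file has been saved as plain text (for example with Word's "Save As" plain-text option, or a PDF-to-text converter), a record like the two above can be pulled apart with regular expressions. The Python sketch below is one possible approach, not the method the asker eventually used. It assumes the "Site X  IDnumber N" header and the "Lk  value" lines appear exactly as in these examples; the function name `parse_record` is hypothetical.

```python
import re

# Assumed layout (taken from the examples, not confirmed for all files):
# header "Site A            IDnumber 9001", then lines like "L2    121.1".
HEADER_RE = re.compile(r"Site\s+(\S+)\s+IDnumber\s+(\d+)")
VALUE_RE = re.compile(r"^\s*(L\d+)\s+(-?\d+(?:\.\d+)?)\s*$", re.MULTILINE)


def parse_record(text):
    """Return a dict with 'site', 'idno' and whichever L-values are present."""
    header = HEADER_RE.search(text)
    if header is None:
        raise ValueError("no 'Site ... IDnumber ...' header found")
    record = {"site": header.group(1), "idno": header.group(2)}
    # Values are kept as strings so "191.0" is not reformatted.
    # A missing L4 line simply produces no "L4" key.
    for label, value in VALUE_RE.findall(text):
        record[label] = value
    return record
```

Because a missing line just means a missing key, the L4 gap in the Site B example needs no special handling at this stage.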
 
 
An Excel (or equivalent) file generated by extracting data from the
above pdfs must read roughly as follows:
 
site  idno  L2     L3     L4     L5
A     9001  121.1  135.2  119.9  145.7
B     9007  187.1  191.0  .      209.1
 
 
(that is, a period "." under L4 for ID number 9007, which is how SAS marks a missing numeric value)
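
To get the layout shown above, with a period wherever a value is absent, parsed records can be written out as a CSV file, which Excel opens directly and SAS can import. This sketch assumes the columns are exactly site, idno and L2 through L5, as in the example; `write_table` is a hypothetical helper name, and each record is assumed to be a dict keyed by column name.

```python
import csv
import io

# Column order copied from the desired output; L2..L5 is an assumption
# about the real files and may need extending.
COLUMNS = ["site", "idno", "L2", "L3", "L4", "L5"]
MISSING = "."  # SAS's notation for a missing numeric value


def write_table(records, out):
    """Write a header row, then one row per record, filling gaps with '.'."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for record in records:
        writer.writerow([record.get(col, MISSING) for col in COLUMNS])


records = [
    {"site": "A", "idno": "9001", "L2": "121.1", "L3": "135.2", "L4": "119.9", "L5": "145.7"},
    {"site": "B", "idno": "9007", "L2": "187.1", "L3": "191.0", "L5": "209.1"},
]
buf = io.StringIO()
write_table(records, buf)
print(buf.getvalue())
```

Writing to a real file instead of `io.StringIO` only requires `open("data.csv", "w", newline="")` as the `out` argument.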

My question is the following:  How can I automate this data entry
cheaply?  I need to enter the data to begin my thesis work, and there is
no budget allowance for data entry (which could take some time).
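
For the full batch of roughly 1000 files, the same idea can be run over a whole folder in one pass. The self-contained sketch below assumes (hypothetically) that every PDF has first been exported to a `.txt` file in one folder, one record per file, laid out like the examples; `build_table` is an illustrative name, not an existing tool. Files whose header doesn't match are returned rather than silently dropped, so they can be entered by hand.

```python
import csv
import re
from pathlib import Path

# Assumptions: one record per .txt file, laid out like the examples above.
HEADER_RE = re.compile(r"Site\s+(\S+)\s+IDnumber\s+(\d+)")
VALUE_RE = re.compile(r"^\s*(L\d+)\s+(-?\d+(?:\.\d+)?)\s*$", re.MULTILINE)
COLUMNS = ["site", "idno", "L2", "L3", "L4", "L5"]


def build_table(folder, out_csv):
    """Parse every *.txt file in folder into one CSV; return names skipped."""
    skipped = []
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(COLUMNS)
        for path in sorted(Path(folder).glob("*.txt")):
            text = path.read_text()
            header = HEADER_RE.search(text)
            if header is None:
                # Unexpected layout: report it for manual entry.
                skipped.append(path.name)
                continue
            values = dict(VALUE_RE.findall(text))
            row = [header.group(1), header.group(2)]
            # Any L-column absent from the file becomes "." (SAS missing).
            row += [values.get(col, ".") for col in COLUMNS[2:]]
            writer.writerow(row)
    return skipped
```

A call such as `build_table("exports", "thesis_data.csv")` (both names hypothetical) would produce one CSV for Excel or SAS plus a list of files needing a manual look.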

Clarification of Question by khunjoe-ga on 15 Sep 2004 20:41 PDT
I don't know how to provide a shortcut to a file.  If you have an idea
about how I could do so, I will gladly provide one.  Otherwise, I could
email one or a few files (each PDF is less than 200 KB).

Clarification of Question by khunjoe-ga on 18 Sep 2004 19:01 PDT
Some additional snooping on my part found someone at vernon@vernon.ch who
very expertly answered this question with a customized plug-in.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Extract data from many pdf files or MS Word docs to single MSExcel or equiv file
From: rbrookes-ga on 15 Sep 2004 15:10 PDT
 
Can you provide a shortcut to one of the PDF files to develop from?
Subject: Re: Extract data from many pdf files or MS Word docs to single MSExcel or equiv file
From: khunjoe-ga on 17 Sep 2004 02:10 PDT
 
I posted a reply earlier but it has not shown up yet.  I would be
happy to provide a shortcut to a few of the PDF files, but I don't
know how.  I have emailed googleanswers with some files to see if they
will post them to you.
I am open to any ideas you have about other ways to do this.

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.