I have about 1000 PDF files (which I can already convert to MS Word if
necessary). I would like to take selected information from each PDF
(values found in specified locations, such as ID number, variable1,
variable2, etc.) and insert those values into a single Excel file or
equivalent, ultimately to run analyses in SAS. In other words, I would
like to condense specified data from around 1000 PDF files into a
single Excel file with a given structure.
A small challenge is that some of the PDF files have incomplete data.
For example, a PDF file with all the data may list it as follows:
Site A IDnumber 9001
L2 121.1
L3 135.2
L4 119.9
L5 145.7
Another PDF file, with a missing value for L4, would read as follows:
Site B IDnumber 9007
L2 187.1
L3 191.0
L5 209.1
An Excel (or equivalent) file generated by extracting data from the
above PDFs should read roughly as follows:
site  idno  L2     L3     L4     L5
A     9001  121.1  135.2  119.9  145.7
B     9007  187.1  191.0  .      209.1
(That is a period "." under L4 for ID number 9007, standing in for the
missing value.)
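To make the target layout concrete, here is a minimal Python sketch of the mapping from the text of one file to one row. It assumes the text has already been pulled out of each PDF by some separate tool, and that every file follows the "Site X IDnumber N" / "Lk value" pattern shown in the two examples. The names parse_record and rows_to_csv are made up for illustration, and CSV is used as the "or equivalent" of Excel.

```python
import csv
import io
import re

# Measurement labels taken from the two example files above.
FIELDS = ["L2", "L3", "L4", "L5"]


def parse_record(text):
    """Turn the extracted text of one file into a row; missing values stay '.'."""
    row = {"site": ".", "idno": "."}
    row.update({name: "." for name in FIELDS})

    # Header line, e.g. "Site A IDnumber 9001" (assumed layout).
    header = re.search(r"Site[ \t]+(\S+)[ \t]+IDnumber[ \t]+(\d+)", text)
    if header:
        row["site"], row["idno"] = header.group(1), header.group(2)

    # Measurement lines, e.g. "L4 119.9"; labels not in FIELDS are ignored.
    pattern = r"^[ \t]*(L\d+)[ \t]+(-?\d+(?:\.\d+)?)[ \t]*$"
    for name, value in re.findall(pattern, text, flags=re.MULTILINE):
        if name in row:
            row[name] = value
    return row


def rows_to_csv(rows):
    """Write the rows as CSV text with a fixed column order."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["site", "idno"] + FIELDS,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample_a = "Site A IDnumber 9001\nL2 121.1\nL3 135.2\nL4 119.9\nL5 145.7\n"
sample_b = "Site B IDnumber 9007\nL2 187.1\nL3 191.0\nL5 209.1\n"
print(rows_to_csv([parse_record(sample_a), parse_record(sample_b)]))
```

Starting every row with "." in each column and overwriting only the values that are found is what keeps the missing L4 as a period. Getting the text out of the PDFs in the first place is not covered by this sketch.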
My question is the following: How can I automate this data entry at
little or no cost? I need to enter the data to begin my thesis work,
and there is no budget allowance for data entry (which could take some
time).