Google Answers Logo
View Question
 
Q: Load word Table into MSSQL ( Answered,   9 Comments )
Question  
Subject: Load word Table into MSSQL
Category: Computers > Programming
Asked by: twims-ga
List Price: $20.00
Posted: 09 Jul 2003 06:37 PDT
Expires: 08 Aug 2003 06:37 PDT
Question ID: 226928
I have some word document file , and in this document I have several
tables that
I have to load into MSSQL , Is there any way to load or I have to Use
any third party Software , Answer will be highly appreciated

Request for Question Clarification by mathtalk-ga on 09 Jul 2003 09:41 PDT
Hi, twims-ga:

How familiar are you with Excel and MS Access?  Would you be
interested in a recipe for moving the data from Word to MSSQL that
employs one or both of these Microsoft Office programs?

Will you be creating new database tables in MSSQL to hold the Word
information, or do you need to insert records into an existing table?

regards, mathtalk-ga

Clarification of Question by twims-ga on 09 Jul 2003 10:48 PDT
Thanks ,Mathtalk-ga
1)I'm well with both of the microsoft Programs,
2)and we can create a new database tables in MSSQL to hold the Word
information

Thanks for your earliest attn.

Clarification of Question by twims-ga on 14 Jul 2003 07:26 PDT
Hi mathtalk-ga,
I was looking for your answer 
Thanks
Answer  
Subject: Re: Load word Table into MSSQL
Answered By: mathtalk-ga on 14 Jul 2003 10:04 PDT
 
Hi, twims-ga:

As I'm sure you have seen, the process of moving information from a
Word document into a SQL Server database table must deal with the
unrestricted types of data that might be present in a Word "table". 
My resume, for example, uses a Word table to provide formatting
control, but it would be tough to "frame" that information as a nice
relational database table.

Therefore in addition to the mechanics of moving the data, I'd like to
make some suggestions about "cleaning" the data as it moves.

Typically a Word table will have few columns and many rows.  So long
as the row and column orientation of the table is to be preserved with
respect to the SQL Server table, this works somewhat to our advantage
in that there should be correspondingly few "datatypes" represented by
those columns that may require conversion.

To begin let me sketch out the recipe:

1) Start with a Word document containing a table and select/copy its
contents.

2) Paste the information into an Excel spreadsheet and save the
workbook.

3) Open Access and import the Excel spreadsheet into a new Access
table.

4) After doing any final data conversions, export the Access table to
SQL Server.

Now the details!

As our example I've chosen a data set mentioned in one of my recent
answers, the relative frequencies of amino acids in vertebrate
proteins:

[Q221855: A problem on probabilities]
http://answers.google.com/answers/main?cmd=threadview&id=221855

[Amino Acid Frequency] 
http://www.tiem.utk.edu/~harrell/webmodules/aminoacid.htm 

[Amino Acid Symbols]
http://www.medicine.mcgill.ca/nephros/aasymbol.html

I reformatted this relatively short table (there are only twenty amino
acids) in a Word table of three columns:

names of amino acid (Heading: Amino Acids)
three letter abbreviations (Heading: Symbol)
percentages of occurance (Heading: Observed Frequency in Vertebrates)

This presents us with a fairly simple variety of datatypes to work
with, a variable length character string (name), a fixed length
character string (symbol), and a number expressed as a percentage.

I then selected the contents of the Word table, copied these with
Ctrl-C, opened Excel and pasted (into the default new spreadsheet)
using Ctrl-V. Incidentally, here I'm using Office XP (Word 2002, Excel
2002, Access 2002) but the recipe followed is essentially the same as
what I've used on previous versions (going back to Windows for
Workgroups).

It is not necessary to pre-select a group of cells in Excel of the
exact size needed (3 by 21).  Rather it is enough to have the focus in
Excel on the upper left cell, and the pasting of the Word table's data
then fills out the required number of rows and columns.  I manually
tweaked the width of the columns (and the depth of the first row), but
this is was for aesthetics and is not mandatory.

Now Excel has recognized the percentages in the final column as
signifying a number, and the values are treated accordingly by Excel
(although it formats them to two decimal places including a trailing
zero).

I saved the Excel spreadsheet as Word2SQL.xls and closed it.  The
Excel spreadsheet can now be used as a data source by Access.

I opened Access and created a new database Word2SQL.mdb.  Using the
menu entry File > Get External Data > Import, I used the resulting
dialog box, set for Files of type: Microsoft Excel (*.xls), to choose
the previously save Word2SQL.xls.

Clicking the Import button leads us to an Import Spreadsheet Wizard
that displays either the spreadsheets or named ranges within the
targetted workbook.  In the immediate case our table is in the first
spreadsheet, which happens to be the default selection in the dialog. 
Therefore in the first step of the Wizard we simply click the Next
button.

The second step of the Wizard has an optional checkbox titled "First
Row Contains Column Headings" which (in our case) will work to our
advantage.  Using this option allows the Heading information to be
treated as column names later.  Again the checkbox is set by default
in our case, so the second step of the Wizard also requires us merely
to click the Next button.

The third step asks whether we want the data to be saved in a new
table, or in an existing one.  We accept the default, which is to put
it into a new table, and click Next.

The fourth step asks us to confirm the names of column headings and
the data types being used in each case.  It also gives the opportunity
to define indexes on columns.  Note that the final column (Observed
Frequencies in Vertebrates) is being converted as "double" numeric
values.  For the sake of simplicity I just accepted all the defaults
(although on the next screen we will implicitly create an index on
"Symbol") and clicked Next.

On the fifth step we finally do something, namely pick Choose my own
primary key: Symbol.  Clicking Next takes us to the final step, where
I replace the default table name (Sheet1 after the corresponding Excel
worksheet) with AminoAcid.  Click Finish.

The conversion from Excel to Access produced a table AminoAcid with
three columns and twenty rows, with two column headings that contain
(as outlined above) embedded spaces.

Some older versions of SQL Server did not allow column names to have
embedded spaces, but this can be accomodated by SQL Server 7 and
above.  If necessary (or desirable) one can rename the columns in
Access or perform data conversions before we take the next step of
exporting to SQL Server.

Where Excel may be a good place to do ad hoc adjustments (data
cleanup), Access is probably the best place to do systematic
transformations of data.  Access allows us to do a lot in the way of
conversion simply by changing the data type of a column.  If strong
machinery is needed, we can write a bit of code in Access VB for
user-defined functions, and these can be employed with "Create table"
and other sorts of Queries in Access.  In short, Excel is a good place
for initial touch-ups to data, but Access is probably better for
handling any systematic problems.  An exception to this scheme might
be for numeric values that require some advanced mathematical
treatment (e.g. converting percentages to standard deviations).

To Export an Access table to SQL Server, use the File > Export... menu
item (while the desired Access table is selected) and pick Save as
Type: ODBC database (the last entry on my dropdown list).  As it
happens, I'd previously created a "File Data Source" DSN called
TESTSQL.dsn that targets a database called "behave" on my local SQL
Server 2000 instance.  I just picked that one and clicked OK.  If you
don't have a suitable ODBC data source already to go, then use the New
button on the dialog and a wizard will walk you through the steps of
creating and verifying such a data source, then bring you back to this
point so you can use it.

Finally I used Query Analyzer to check over the results inside SQL
Server.  One minor and unavoidable glitch is that SQL Server stores
the numeric values in full (binary) precision, so that if you look at
them in Query Analyzer's result window, you are apt to see things
like:

Alanine      Ala	7.3999999999999996E-2

where we originally had 7.4 %  in the Word document table.  These
999's are a result of the underlying binary arithmetic and do not
indicate an error in the conversion process.  If it were desirable to
avoid this, a different datatype than "double" (namely decimal) could
be used in Access (with Scale equal to three, specifying the number of
positions to the right of the decimal place).

After reviewing my suggestions and seeing how they may apply in your
case, please use the Request for Answer Clarification feature to ask
about any concerns that you may have.  In the meantime I will zip up
the various MS Office example files used above and find a place on the
Internet where you can download them at your convenience.  I'll post
the link for this in a clarification below.

regards, mathtalk-ga

Request for Answer Clarification by twims-ga on 14 Jul 2003 11:16 PDT
Thanks Mathtalk-ga,
Actually In each of my word document file has around 10 to 15 tables,
and I don't want to do manually cut/paste into excel , i need a
program which identify the table in document ,  i.e word to XML and
then MSSQL

Clarification of Answer by mathtalk-ga on 14 Jul 2003 12:15 PDT
Hi, twims-ga:

Sorry for the misunderstanding.  If you want a fully automated
solution, I will need to know what versions of Word and MSSQL you work
with.

For the sake of clarity, are you saying that Word --> XML --> MSSQL is
the required path of conversion?

thanks in advance,
mathtalk-ga

Request for Answer Clarification by twims-ga on 14 Jul 2003 13:04 PDT
Yes you are right actually , I choose programming section of google
answer,
for automated solution of my question , 
1) I have MSSQL 2000
2) I have MSOffice 2000

and again I want to clear that 
1)I don't want to do manual cut/paste (Because word file qty is in
thousand(s))
2)Please suggest me a method to Export Table from Word  to MSSQL
3)I'm just suggesting like word-->XML-->--MSSQL
4)I want to write a program which identify table from word and then
export in to MSSQL

Clarification of Answer by mathtalk-ga on 14 Jul 2003 14:14 PDT
Hi, twims-ga:

Please see my comments below, which present some design issues for
your consideration.

regards, mathtalk-ga

Request for Answer Clarification by twims-ga on 16 Jul 2003 10:05 PDT
I was awaiting your well oraganised Answer

thanks

Clarification of Answer by mathtalk-ga on 16 Jul 2003 19:24 PDT
Hi, twims-ga:

If you prefer, you can respond to my last comment, asking about the
preferred method for updating the database (and about your programming
background), here rather than under the Comments.

I will then have all the ingredients necessary to "organize" the
recipe you want to write the program.

regards, mathtalk-ga

Request for Answer Clarification by twims-ga on 17 Jul 2003 06:12 PDT
Hi ,Mathtalk-ga,
Please give me the receipe for word to text format (csv format) 
then I will append it to my Database or create new table in my database ,
please give me oragnised receipe from scratch(Word Document) to text (csv format)
Thanks

Clarification of Answer by mathtalk-ga on 21 Jul 2003 09:34 PDT
Hi, twims-ga:

The Platform
============

The Visual Basic program you want could be written in a variety of
development environments.  Any one of the Office VBA platforms, an ASP
page, Windows Scripting Host (WSH), or even compiled VB6/VB.Net would
be feasible.

However I've decided to recommend an Access VBA program.  This engine
one you'd previously agreed to use, and it presents the possibility of
using the Access tables to "version" aspects of the conversion.

What I'm thinking is that besides the "conversion" routines, you may
want to create some routines that "query" for Word files in selected
directories and populate an Access table with the filenames and last
modified dates which are found there.  Add a field to hold the
datetime when the file is converted (from Word tables to .csv files),
and then you can track the progress of your conversion processing.  If
a Word file gets modified after it is converted, you'd be able detect
that and perhaps flag the file in your database for reconversion.

Add References
==============

Let's begin with adding References to an Access VBA "project".  I have
Access 2002 installed, so I'll be using that as a point of reference. 
Directions would be similar for other versions.

A standard way to get into the Visual Basic editor in Access is from
under the Tools > Macro menu, where you will also find the shortcut
Alt + F11.  However if you are starting from a relatively empty Access
database, you will want to begin by adding a module, which Access 2002
allows you to do from the Insert > Module menu (or by using the New
Module button within the Objects, view Modules "tab").

Even before you create a module, References can be added in the Visual
Basic editor using the Tools > References... dialog.  You'll see a
list of potential libraries to use with check boxes beside them. In
this context "References" means the COM interfaces that are exposed by
given object libraries.  Two will be particularly important for this
project:

Microsoft Scripting Runtime  (scrrun.dll)
Microsoft Word 10.0 Object Library  (msword.olb)

Note that your version of the Word Object Library may have a different
number.  Version 10.0 denotes the Office XP release.

The Scripting Runtime is useful for using the Windows file system,
e.g. files and directories, through the FileSystemObject and related
interfaces.  The Word Object Library, of course, allows us to
manipulate the Word application and documents.

Click the checkboxes beside both of these and close the References
dialog.  You may want to open the dialog again just to confirm the two
references are added by seeing that they've moved up to the beginning
of the list with the default references and any others that you've
added to your project.

Code Main Routine
=================

The top level routine, from the perspective of the immediate
conversion procedure, should probably take a single filename of a Word
document and take responsibility for parsing all of its tables.  In
this approach you have the flexibility to process different Word
documents in any order, which seems to me the right granularity.  One
can always wrap such a subroutine within a loop over a list of
documents, e.g. from the table of filenames suggested at top.

In this approach one would have a subroutine declaration that looks
something like this:

Public Sub Word2CSV(strWordDoc As String)

End Sub

Note that you will need to insert a module before you have a place to
insert such a code procedure.

Here is some pseudo-code for what the routine should accomplish:

0. Validate the input, e.g. that the string ends in ".doc".

1. Check that the file exists; exit (return) if it does not.

2. Open the file as a Word document, and check that this is
successful.

3. Check the count of tables in the document.  If there are none, exit
(return).

4. Assuming there are Word tables in the document, loop over them &
call ParseTable on each one.

The purpose of subroutine ParseTable will be to create a .csv file for
the given Word table.  It is usually a good idea to find ways to
compartmentalize the program's functionality.  Here the responsibilty
for handling the conversion of a single table is assigned to the
ParseTable subroutine, and the outer routine Word2CSV only needs to
take responsibility for passing on the information that ParseTable
will need to perform its job.

As I see it there are two arguments that ParseTable will need to have
passed to it by the calling routine, Word2CSV.  These are:

A) a reference to the Word table, and 

B) the name of the .csv file to create for this table.

You probably have some ideas about the naming conventions that would
be useful, but from my limited knowledge of the intended application,
I'd think to modify the name of the existing Word document to get
names for various comma-separated-value files which will result. 
Constructing these modified filenames then becomes a responsibility of
the outer routine Word2CSV.

What I specifically imagine is that besides changing the filename
extension from .doc to .csv, we would append a two-digit number to the
end of the proper filename.  Since you indicated the Word documents
contain no more than 15 tables apiece, the extra two digits in the
filename, corresponding to the position of a table within the Word
document, will allow a unique filename for each table.  If it is
possible for your Word documents to have more than 99 tables, then
this numbering scheme can be replaced by one that uses three- or
four-digit extensions.

With this in mind here are some quite detailed suggestions about tasks
0. through 4. itemized above.

0. You may have some additional validations on the filename string, or
you may feel comfortable omitting any validation because of your
confidence in the mechanism(s) by which Word2CSV is going to be
called.  In any case a check that the last four characters of a string
are equal to ".doc" can be coded like this:

Dim bDocStr As Boolean

' Third argument vbTextCompare disregards case differences, .doc vs.
.DOC
If StrComp(".doc", Right(strWordDoc, 4), vbTextCompare) = 0 Then
    bDocStr = True
Else
    bDocStr = False
End If

If Not (bDocStr) Then Return

1. To check if a fully-qualified filename corresponds to an existing
file, you will probably want to use the FileSystemObject from the
Scripting Runtime library.  Here's a code snippet to illustrate the
technique:

Dim fs As New Scripting.FileSystemObject
Dim bDocExist As Boolean

bDocExist = fs.FileExists(strWordDoc)

For a different illustration of using the Scripting Runtime's
FileSystemObject within VBA projects, see here:

[EDC Excel & VBA Tips - Using Microsoft Scripting Runtime]
http://www.erlandsendata.no/english/vba/fileaccess/scripting.htm

or consult the Access VBA help index entry under "file; size;" called
Size Property (FileSystemObject object).

Someplace later in your code, before exiting, you would want to
release the FileSystemObject object like this:

Set fs = Nothing

If you have a bit of familiarity with COM interfaces, this is the VBA
way to decrement the reference count on a COM object, allowing the
Windows operating system to reclaim the allocated memory once the
reference count drops to zero.

2. Once we know that the filename is valid and corresponds to an
existing file, we should try to open it as a Word document.  This will
require an instance of the Word application to be running.

Some care is needed to avoid problems with either creating too many
instances of the Word application, or creating an instance of Word
through "automation" that winds up being "shared" as a manually
started instance of Word (launched by a desktop user).  In the later
case you can have a scenario in with the interactive user closes the
shared instance of Word, causing subsequent errors for the
"automation" user (VBA code) that relies on a Word application
instance.

This last sort of thing is actually a bug, as Microsoft acknowledges:

[188546 BUG - Starting Word Manually Uses Same Instance as Automation]
http://support.microsoft.com/?kbid=188546

I suggest studying that article, and in particular their proposed
workaround, which involves creating a temporary instance of Word and
closing it after the "private" copy of Word has been created.  Note
that in the code shown there, the wordApp variable is declared
"globally", so that it potentially can be created once and reused for
numerous documents.

For the sake of illustration I will simply write:

Dim wordApp As Word.Application

If TypeName(wordApp) <> "Application" Then
    Set wordApp = CreateObject("Word.Application")
End If

This code fragment illustrates checking whether the variable wordApp
is already initialized, and if not using CreateObject to invoke a new
instance.

Once we have the Word application in hand, opening the document is
easy:

Dim wordDoc As Word.Document

Set wordDoc = wordApp.Documents.Open(strWordDoc)

After the instance of the Word application referenced by wordApp is no
longer needed, a couple of steps are needed to dispose of it properly:

If TypeName(wordApp) = "Application" Then
    wordApp.Quit  SaveChanges:=wdDoNotSaveChanges
    Set wordApp = Nothing
End If

NB:  The parameter wdDoNotSaveChanges is intended to coax the Word
instance into closing without protest, but it is still possible for
this quitting operation to take a bit of time.  When working in a
VBScript (client-side) environment, I have found it prudent to put a
bit of delay between the Quit command and setting the object variable
to Nothing.  I don't think this will be the case in your program, but
it's something to keep in mind.

3. The Word document objects have, as previously discussed, a
collection type member called Tables, which contains object references
for every Table object in the document in a kind of array.  To see how
many there are, we can do this:

Dim nTables As Integer

nTables = wordDoc.Tables.Count

If nTables = 0 Then Return

4. Also as discussed before, a loop over the tables in the document
can take advantage of the "For Each" syntax supported by VBA
collection objects, ie. the wordDoc.Tables collection in this case. 
We might code something like this:

Dim wordTbl As Word.Table

nTables = 0
For Each wordTbl In wordDoc.Tables
    nTables = nTables + 1
    Call ParseTable(wordTbl, Left(strWordDoc, Len(strWordDoc) - 4) _
      & Format(nTables, "00") & ".csv")
Next wordTbl

where the subroutine ParseTable, to be discussed next, has been
declared this way:

Public Sub ParseTable(wordTbl As Word.Table, strCSVfile As String)

End Sub

Note that the integer variable nTables is being "reused" here to count
through the tables as we loop, providing the information about the
"index" of wordTbl in the collection wordDoc.Tables needed to
construct a unique filename for the .csv output that ParseTable will
require.

Code Inner Subroutine
=====================

With the framework of the program out of the way, you can now focus on
coding the core functionality.  A number of the ingredients have
already been discussed.  Manipulation of tables, and in particular how
to extract the string content of the cells row by row and column by
column, is amply illustrated in these pages:

[MS Word 2002 VBA - Working with Tables]  
http://msdn.microsoft.com/library/en-us/vbawd10/html/wohowWorkingWithTables.asp
 
[Exporting Word 2000/2002 Documents to XML]  
http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm  
 
The output to CSV format may require a bit of discussion.  If we are
treating all cell values as text, it would perhaps be a good idea to
output them as quoted text.  To use a row from one of your examples as
an illustration, we might like to have output lines that look like:

"empname "," rate 8 Hrs "," rate 10 Hrs"
"Mike    ","$25         ","$40         "

rather than simply:

empname , rate 8 Hrs , rate 10 Hrs
Mike    ,$25         ,$40         

The decision may depend on what your downstreaming process for these
files will be.  Also, think about the possibility that some table
cells may already contain "double quote" characters.  If so, some
logic will need to be written to handle them (assuming you want the
quoted strings to load properly into the database).

Another design decision concerns the leading and trailing spaces that
cell values may have (as is the case above).  If you want to remove
these, then one option is to rely on the Visual Basic functions LTrim,
RTrim, or Trim (the latter removes both leading and trailing spaces).

Just as the Word document has a collection of tables called Tables,
the Word table object has a collection of rows called Rows.  Since the
CSV output needs to be organized by rows, it seems to be logical to
structure ParseTable as an outer loop over the items in wordTbl.Rows
and an inner loop over the cells in a particular row.

Here then is a list of tasks to be performed by ParseTable:

1. Use a FileSystemObject to create a TextStream object, using the
CreateTextFile method on the filename.

2. Loop (outer) over the individual rows in wordTbl.

3. Loop (inner) over the individual cells in each row, appending
successive field contents to a string.

4. After the inner loop exits, write the string constructed to the
TextStream output file as a single line.

5. After the outer loop exits, close the TextStream output file; set
objects to Nothing before returning.

Something of the same structure can be seen in Kevin McDowell's
program above, which exports Word to XML.  His ParseTable routine is
coded in the file XMLConverter.cls, which you will see if you download
and unzip his demo files.  A slight modification of that code for your
purposes might look like this:

Public Sub ParseTable(wordTbl As Word.Table, strCSVfile As String)

Dim fs As New Scripting.FileSystemObject
Dim ts As Scripting.TextStream

Set ts = fs.CreateTextFile(strCSVfile)

Dim strTemp As String
Dim wordRow As Word.Row
Dim wordCell As Word.Cell
Dim nRow As Integer

    ' Outer loop on each row.
    For nRow = 1 To wordTbl.Rows.Count
        Set wordRow = wordTbl.Rows(nRow)
        
        strTemp = ""
        
        ' Inner loop through each cell in current row.
        For Each wordCell In wordRow.Cells
            
            ' Call cell parser... make it insert commas?
            strTemp = strTemp & ParseCell(wordCell)
        Next
        ts.WriteLine strTemp
    Next

ts.Close
Set ts = Nothing
Set fo = Nothing
Set fs = Nothing

End Sub

In appealing to yet another subroutine ParseCell (as Kevin did in his
code), I avoid some of the picky details about where to place the
commas and (possibly) quotation marks.  In this sort of application
there's a tricky bit of logic needed one way or the other to suppress
the final comma.  As written here the introduction of commas would be
a responsibility of ParseCell, so perhaps the smartest approach is to
make ParseCell aware (through the Parent of wordCell) when the last
cell in a row is being processed.  Alternatively the last comma can
simply be "deleted" by shortening strTemp by one character just before
it is written to the output.

regards, mathtalk-ga
Comments  
Subject: Re: Load word Table into MSSQL
From: mathtalk-ga on 14 Jul 2003 12:30 PDT
 
Hi, twims-ga:

This is in regard to your request for clarification, "Actually In each
of my word document file has around 10 to 15 tables, and I don't want
to do manually cut/paste into excel , I need a program which identify
the table in document,  i.e word to XML and then MSSQL."

Although the detailed description I've given above may give the
appearance of a lengthy process, it really is a case of taking much
longer to explain than to do.

Assuming a minimal amount of data fiddling is required, as in my
example, and that the ODBC data source already exists, the Word to
Excel to Access to MSSQL path takes less than two minutes per table
(and will go even faster with practice and with a batch processing
approach to 10 or 15 tables per document).

My guess is that the first document can be converted in under an hour,
and that subsequent documents can be converted in substantially less
time, maybe 15 minutes each.

Automating the process makes sense if this is an ongoing
responsibility, but it would be an investment to do the necessary
custom programming for the first few documents.

On the other hand, if a substantial amount of data cleanup is
required, then there is probably little hope of automating the process
completely.  Perhaps a sample of the kind of data you want to export
from Word to MSSSQL would help in illuminating this concern.

regards, mathtalk-ga
Subject: Re: Load word Table into MSSQL
From: mathtalk-ga on 14 Jul 2003 14:11 PDT
 
Hi, twims-ga:

Okay, let's sketch out a plan for programming this, starting with the
Word documents (of which you have thousands).

One possibility is to use Office Automation, as it applies to the Word
object model, to save each Word 2000 document as XML.  Then we could
apply an XSLT transformation to filter out only the tables, possibly
expressing the content directly as a SQL script with suitable CREATE
TABLE and INSERT statements.

Another possibility is to increase the use of VBA/Office Automation. 
For example, one could carry out a manipulation of the Word object
model from within any of the Office VBA environments.  Using Word 2000
itself might have some advantages, but Access 2000 would have some
advantages on the database side.  One could even build a compiled VB6
application that does the Word manipulation.

So let me offer you the choice of these two initial approaches, the
first which makes a minimal use of the Word object model in order to
move the "action" to XML along the line you suggested, or one which
makes a greater use of the Word object model, e.g. to identify tables
with Word objects rather than by XSLT patterns (essentially the XPATH
standard).

Some design decisions need to be made around the general structure of
the tables to be created:  naming conventions for the tables and
columns, and assumptions about the datatypes to be imported.

Please post an example of the sort of table to be imported and make
some comments about how different or similiar all the tables can be
expected to be.

thanks,
mathtalk
Subject: Re: Load word Table into MSSQL
From: twims-ga on 14 Jul 2003 19:49 PDT
 
Thanks 
Table (1) Like:
COL1 | COL2 | COL3.1|COL3.2|
     |      | COL3.1|COL3.2|
-----|------|-------|------|
aaa  |bbb   |ccc    |ddd   |
-----|------|-------|------|
a1a1a|b1b1b1|c1c1c1 |d1d1d1|
-----|------|-------|------|

Table (2) :

-----|------
COL1 | COL2 | COL3.1|COL3.2|COL4.1|COL5.1|
     |      | COL3.1|COL3.2|COL4.1|COL5.1|
-----|------|-------|------|------|------|
aaa  |bbb   |ccc    |ddd   |eee   |fff   |
-----|------|-------|------|------|------|
a1a1a|b1b1b1|c1c1c1 |d1d1d1|e1e1e1|f1f1f1|
-----|------|-------|------|------|------|
Subject: Re: Load word Table into MSSQL
From: twims-ga on 15 Jul 2003 06:10 PDT
 
Hi, mathtalk-ga,
Please  go with the other possibility with the use of word object
model  (VBA/Office Automation)

Thanks
Subject: Re: Load word Table into MSSQL
From: mathtalk-ga on 15 Jul 2003 08:59 PDT
 
Hi, twims-ga:

The Word 2002 object model includes a "Tables" collection, defined for
each Document object.  Please take a look at these code samples to get
a feel for the properties and syntax in VBA:

[MS Word 2002 VBA - Working with Tables]
msdn.microsoft.com/library/en-us/vbawd10/html/wohowWorkingWithTables.asp

Essentially we could set up a "loop" over the ActiveDocument.Tables
collection:

Dim tbl As Table

For Each tbl In ActiveDocument.Tables
    ParseTable(tbl)
Next

where ParseTable is a "to be written" routine that actually exports
the table.

Some helpful experience with parsing Word tables can be gleaned from
this project:

[Exporting Word 2000/2002 Documents to XML]
msdn.microsoft.com/library/techart/odc_expwordtoxml.htm

which improves somewhat upon the native capability of Word to export
to an XML format.  See in particular the comments on detecting nested
tables:

"This solution provides a starting point to build an XML parser for
Word documents. In addition to the XML functionality, it discusses how
to build custom objects to handle sequential instances of all styles
and graphics and how to loop through tables and lists."

I'm unable to tell, from the "sample" tables you've shown above, what
sort of assumptions are appropriate about the datatypes for particular
columns.  It suggests, from the labeling of columns, that there may be
issues with normalization (repeating groups) or maybe with nested
tables, but that the first row of cells should be treated as column
headings.

Perhaps there would be some value to an "export" in which all cell
values were treated as text, although I fear this simply throws the
difficulty of datatype identification onto the SQL Server platform. 
Is that how you prefer to proceed?

regards, mathtalk-ga
Subject: Re: Load word Table into MSSQL
From: mathtalk-ga on 15 Jul 2003 09:02 PDT
 
The links above were not "clickable" because I omitted the "http" prefix:
 
[MS Word 2002 VBA - Working with Tables] 
http://msdn.microsoft.com/library/en-us/vbawd10/html/wohowWorkingWithTables.asp

[Exporting Word 2000/2002 Documents to XML] 
http://msdn.microsoft.com/library/techart/odc_expwordtoxml.htm 

Sorry for any inconvenience.

-- mathtalk-ga
Subject: Re: Load word Table into MSSQL
From: twims-ga on 15 Jul 2003 14:06 PDT
 
Hi,
Let me clear the Exact table Structure
-----------------------------------
empname | rate 8 Hrs | rate 10 Hrs|
--------|------------|------------|
Mike    |$25         |$40         |
--------|------------|------------|
Allen   |$40         |$60         |
--------|------------|------------|
Waugh   |$50         |$65         |
--------|------------|------------|
Don't worry about datatype issue, Just please let me know the exact
method to upload
Thanks
Subject: Re: Load word Table into MSSQL
From: twims-ga on 16 Jul 2003 10:05 PDT
 
I was awaiting your well oraganised Answer 
 
thanks
Subject: Re: Load word Table into MSSQL
From: mathtalk-ga on 16 Jul 2003 11:26 PDT
 
Hi, twims-ga:

Once the database table design is decided (for a particular Word
table), there are a range of approaches to upload records (extracted
row by row from the Word table) into the SQL Server table.

Given that we are working with the VBA/Word automation framework, the
choices for an output method range from:

1) emitting a text file containing the relevant SQL commands (perhaps
including the table creation logic), to:

2) performing the record insertions within the VBA program, e.g. by
using ADO with the native OLEDB provider for SQL Server.

I have not formed much of an impression as to your background as a
programmer, so my only thoughts as to what preference you might have
in this spectrum of choices concern the issue of data cleanup.

It is my experience with attempting to load databases from text files,
even those especially created for the purpose, that there will be a
certain percentage of records that fail to insert because of improper
field values.

Under these circumstances it is helpful, in my opinion, to take the
first approach even though this may seem less automated.  Perhaps the
best explanation for my view is that a SQL script produced by 1) can
be placed under version control, and thereby the history of data
cleanups can be tracked with considerable reliability.

On the other hand you have not shown much interest in the issue of
data cleanup, and it may be that approach 2), together with an
exception handling mechanism that traps SQL errors on inserts that
fail, will prove to be more suitable for your purposes.

It would be helpful if you could, in addition to indicating a
preference between these two general approaches, explain how your own
programming experience relates to the preferred method.

regards, mathtalk-ga

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy