date: Thu, 10 Apr 2008 21:39:16 -0700,
group: microsoft.public.word.vba.beginners
Stepping through a document
I have dozens of Word documents that are composed mostly of tables, and a
line that labels the table.
There may be many tables in one document. Some documents have hundreds of
rows in many tables.
e.g:
Table name or descriptor
column one data column2 data
next Col1 data corresponding col2 data
Next table name
col1 Data Column two data
Another Column1 It's column 2 data
Generally, there is one line of description for each table, followed by the
table.
These Word documents were created over a space of several years, and were
manually maintained, so that "generally" is key. I'm sure that sprinkled
throughout these files are notations, keyed-entries, explanatory texts, etc.
The spacing between tables and headers is not standardized.
The style is not standardized.
What I need is to export all these docs to either Excel or Access, and
create a three-column table, with the table header as the first or third
column - not important.
I can't wrap my head around an algorithm for this. I can't do a For Each...
Next loop, I don't think.
Do I step through each paragraph, and assess if it is a row of a table? If
so, how do I do that?
How do I handle a situation if there is one row or one table that has three
cells in it?
If I am stepping through paragraphs, and determine that it is a row in a
table, how do I then step through the cells in that row?
I don't know how to approach this, and any help will be GREATLY appreciated!
date: Thu, 10 Apr 2008 21:39:16 -0700
author: Margaret Bartley
Re: Stepping through a document
Hi Margaret,
Just to start you thinking, one way I might approach this is to go
through the Tables collection in Word,
e.g. (the rough code below)
Dim myTable As Table
For Each myTable In ActiveDocument.Tables
    ' Do something like get the first and last row of the table
    ' and list it in a report so that you can check that it is
    ' picking up the info correctly.
    ' Use myTable.Columns.Count to output the number of columns in the
    ' table.
Next myTable
Having a handle on each table, you could use its Range property to look
at the paragraph before the table and/or after it, again to check and
list that you have the correct table.
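The review report suggested above might look something like the following. This is only a sketch, meant to be run from the Word VBA editor against the active document; the procedure name ListTablesForReview is made up for illustration, and it assumes the label is the single paragraph directly before each table.

```vba
' Sketch only: prints one line per table to the Immediate window (Ctrl+G)
' so the labels and table sizes can be checked by eye.
Sub ListTablesForReview()
    Dim tbl As Table
    Dim labelRng As Range
    Dim n As Long

    For Each tbl In ActiveDocument.Tables
        n = n + 1

        ' Extend a copy of the table's range back by one paragraph, then
        ' trim it to that paragraph's text without its paragraph mark.
        ' (Assumption: the label is the paragraph immediately before the table.)
        Set labelRng = tbl.Range.Duplicate
        labelRng.MoveStart wdParagraph, -1
        labelRng.End = labelRng.Paragraphs(1).Range.End - 1

        ' Tables whose rows do not all have the same number of cells are
        ' flagged rather than measured.
        If tbl.Uniform Then
            Debug.Print n & vbTab & labelRng.Text & vbTab & _
                        tbl.Rows.Count & " rows x " & tbl.Columns.Count & " cols"
        Else
            Debug.Print n & vbTab & labelRng.Text & vbTab & _
                        tbl.Range.Cells.Count & " cells (not uniform)"
        End If
    Next tbl
End Sub
```

If a blank paragraph sits between the label and the table, the label column will come out empty for that table, which is itself a useful thing to spot before importing anything.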
Hope this helps to get you thinking.
Cheers!
TonyS.
perhaps
date: Sat, 19 Apr 2008 17:44:00 -0700 (PDT)
author: Tony Strazzeri
Re: Stepping through a document
Margaret,
If you create the following folders
C:\WordData
C:\WordData\Processed
and you move all of the documents from which you want to extract the data
into the folder C:\WordData
Then if you create a database named WordData in the C:\ folder and in it you
create a table named tblWordData containing the following fields
Descriptor
Column1
Column2
and you create a Form in the database that contains a command button and if
you have the following code in that command button Click event, it will
import all of the data from all of the tables in all of the documents in the
C:\WordData folder into the table tblWordData
' Needs references (Tools > References) to the Microsoft Word object library
' and to the DAO / Access database engine object library.
Dim dbsWordData As DAO.Database
Dim rstWordData As DAO.Recordset
Dim wordApp As Object
Dim vDescriptor As String
Dim vColumn1 As String
Dim vColumn2 As String
Dim FldrPath As String
Dim RecordDoc As String
Dim Source As Object
Dim i As Long, j As Long, k As Long
Dim tblrng As Word.Range
Dim tblname As Word.Range
Dim datarng As Word.Range

Set dbsWordData = OpenDatabase("c:\WordData.mdb")
Set rstWordData = dbsWordData.OpenRecordset("tblWordData", dbOpenDynaset)

' Use a running copy of Word if there is one; otherwise start one.
On Error GoTo CreateWordApp
Set wordApp = GetObject(, "Word.Application")
wordApp.Visible = False
' From here on, any statement that fails is skipped.
On Error Resume Next

FldrPath = "C:\WordData\"

RecordDoc = Dir$(FldrPath & "*.doc")
k = 0
While RecordDoc <> ""
    Set Source = wordApp.Documents.Open(FldrPath & RecordDoc)
    With Source
        For i = 1 To .Tables.Count
            ' The descriptor is the paragraph just before the table,
            ' without its paragraph mark.
            Set tblrng = .Tables(i).Range
            Set tblname = tblrng.Duplicate
            tblname.MoveStart wdParagraph, -1
            tblname.End = tblname.Paragraphs(1).Range.End - 1
            vDescriptor = tblname.Text
            ' First row: stored together with the descriptor.
            ' End - 1 drops the end-of-cell marker from each cell.
            Set datarng = .Tables(i).Cell(1, 1).Range
            datarng.End = datarng.End - 1
            vColumn1 = datarng.Text
            Set datarng = .Tables(i).Cell(1, 2).Range
            datarng.End = datarng.End - 1
            vColumn2 = datarng.Text
            With rstWordData
                .AddNew
                !Descriptor = vDescriptor
                !Column1 = vColumn1
                !Column2 = vColumn2
                .Update
                ' Remaining rows: Descriptor is left empty.
                For j = 2 To Source.Tables(i).Rows.Count
                    Set datarng = Source.Tables(i).Cell(j, 1).Range
                    datarng.End = datarng.End - 1
                    vColumn1 = datarng.Text
                    Set datarng = Source.Tables(i).Cell(j, 2).Range
                    datarng.End = datarng.End - 1
                    vColumn2 = datarng.Text
                    .AddNew
                    !Column1 = vColumn1
                    !Column2 = vColumn2
                    .Update
                Next j
            End With
        Next i
    End With
    k = k + 1
    ' Move the document: save a copy under Processed, delete the original.
    Source.SaveAs FldrPath & "Processed\" & Source.Name
    Kill FldrPath & RecordDoc
    Source.Close wdDoNotSaveChanges
    RecordDoc = Dir
Wend
MsgBox "Data Imported from " & k & " documents."
wordApp.Quit

Set wordApp = Nothing
Set Source = Nothing
Set rstWordData = Nothing
Set dbsWordData = Nothing
Exit Sub    ' do not fall through into the error handler below

CreateWordApp:
Set wordApp = CreateObject("Word.Application")
Resume Next
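The repeated `datarng.End = datarng.End - 1` lines in the code above are there to drop the end-of-cell marker that Word appends to each cell's text. If you would rather work on the string than on the range, a small helper along these lines may be easier to reuse. It is a sketch: the name CleanCellText is made up, and it assumes the marker appears in the Text property as Chr(13) followed by Chr(7).

```vba
' Sketch: tidy the raw Text of a table cell.
' Assumption: cell text ends with Chr(13) & Chr(7) (the end-of-cell marker).
Function CleanCellText(ByVal raw As String) As String
    Dim marker As String
    marker = Chr$(13) & Chr$(7)

    ' Drop the trailing end-of-cell marker if it is present.
    If Right$(raw, 2) = marker Then
        raw = Left$(raw, Len(raw) - 2)
    End If

    ' A cell can hold several paragraphs; join them with spaces so the
    ' value fits in a single database field, then trim the ends.
    CleanCellText = Trim$(Replace(raw, Chr$(13), " "))
End Function
```

With that in place, a cell could be read as `vColumn1 = CleanCellText(.Tables(i).Cell(1, 1).Range.Text)`.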
After each document is processed, it is moved into the folder
C:\WordData\Processed and when all of the documents have been processed, a
message box will be displayed with the message "Data imported from #
documents."
Testing the above with two documents, each containing two tables with
Document#Table# in a paragraph before each table, it imported the data into
the tblWordData as follows:
Descriptor          Column1     Column2
Document1Table 1    D1T1R1C1    D1T1R1C2
                    D1T1R2C1    D1T1R2C2
                    D1T1R3C1    D1T1R3C2
Document1Table 2    D1T2R1C1    D1T2R1C2
                    D1T2R2C1    D1T2R2C2
                    D1T2R3C1    D1T2R3C2
Document2Table 1    D2T1R1C1    D2T1R1C2
                    D2T1R2C1    D2T1R2C2
                    D2T1R3C1    D2T1R3C2
Document2Table 2    D2T2R1C1    D2T2R1C2
                    D2T2R2C1    D2T2R2C2
                    D2T2R3C1    D2T2R3C2
Whether or not it will work for you will depend upon the meaning of your
word "generally".
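Before running an import, it may be worth finding out how far the documents stray from "one label, then a two-column table". The two procedures below are a rough sketch for doing that from the Word VBA editor; their names are made up for illustration. The first flags tables that are not plain two-column tables and walks every cell by its row and column index, which also covers the odd row with three cells. The second lists text that sits outside any table, such as stray notations.

```vba
' Sketch: flag irregular tables and print every cell with its position.
Sub DumpCellsByIndex()
    Dim tbl As Table
    Dim c As Cell
    Dim t As Long
    Dim txt As String

    For Each tbl In ActiveDocument.Tables
        t = t + 1
        If Not tbl.Uniform Then
            Debug.Print "Table " & t & ": rows differ in cell count"
        ElseIf tbl.Columns.Count <> 2 Then
            Debug.Print "Table " & t & ": " & tbl.Columns.Count & " columns"
        End If

        ' Range.Cells visits each cell in turn, however many a row has.
        For Each c In tbl.Range.Cells
            txt = c.Range.Text
            txt = Left$(txt, Len(txt) - 2)   ' drop the end-of-cell marker
            Debug.Print t, c.RowIndex, c.ColumnIndex, txt
        Next c
    Next tbl
End Sub

' Sketch: print every non-empty paragraph that is not inside a table.
Sub ListTextOutsideTables()
    Dim p As Paragraph

    For Each p In ActiveDocument.Paragraphs
        If Not p.Range.Information(wdWithInTable) Then
            ' Length 1 means the paragraph holds only its paragraph mark.
            If Len(p.Range.Text) > 1 Then Debug.Print p.Range.Text
        End If
    Next p
End Sub
```

Stepping through every paragraph can be slow on long documents, so this is better suited to a one-off check than to the import itself.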
--
Hope this helps.
Please reply to the newsgroup unless you wish to avail yourself of my
services on a paid consulting basis.
Doug Robbins - Word MVP
date: Sun, 20 Apr 2008 17:46:47 +1000
author: Doug Robbins - Word MVP
Re: Stepping through a document
Thank you! That looks like exactly what I want.
date: Tue, 22 Apr 2008 05:03:25 -0700
author: Margaret Bartley