How to read the contents of a table in an MS Word file using Python?
Solution 1:
Jumping in rather late, but thought I'd put this out anyway: now (2015), you can use the pretty neat python-docx library: https://python-docx.readthedocs.org/en/latest/. It reads .docx files (not the older .doc format). To print the text of every cell in every table:
from docx import Document

wordDoc = Document('<path to docx file>')
# Walk every table, row by row and cell by cell
for table in wordDoc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
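If you need the values rather than printed output, the same loop can collect each table into a list of rows. A minimal sketch, assuming python-docx is installed; `table_to_rows` and `read_tables` are hypothetical helper names, not part of the library:

```python
def table_to_rows(table):
    """Return one table as a list of rows, each a list of cell strings."""
    # Works on any object exposing .rows -> .cells -> .text, as python-docx tables do
    return [[cell.text.strip() for cell in row.cells] for row in table.rows]


def read_tables(path):
    """Return every table in the .docx file at `path` as a list of rows."""
    # python-docx is imported here so table_to_rows itself has no dependencies
    from docx import Document
    return [table_to_rows(table) for table in Document(path).tables]
```

`read_tables('<path to docx file>')[0]` would then be the first table in the document. Be aware that merged cells can show up as repeated values, since python-docx reports a merged cell at each grid position it spans.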
Solution 2:
Here is what works for me in Python 2.7:
import win32com.client as win32
word = win32.Dispatch("Word.Application")
word.Visible = False
# Pass the full path to the file; Open returns the Document object
doc = word.Documents.Open("MyDocument")
To see how many tables your document has:
doc.Tables.Count
Then you can select the table you want by its index. Note that, unlike Python, COM indexing starts at 1:
table = doc.Tables(1)
To select a cell:
table.Cell(Row=1, Column=1)
To get its content:
table.Cell(Row=1, Column=1).Range.Text
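One thing to watch for: the text that Range.Text returns for a table cell ends with Word's end-of-cell marker (a carriage return followed by the BEL character, '\r\x07'), so it usually needs cleaning before you compare or print it. A small sketch; `clean_cell_text` and `cell_text` are hypothetical helper names, and `table` is the COM table object selected above:

```python
def clean_cell_text(raw):
    """Strip the trailing end-of-cell marker from a cell's Range.Text."""
    return raw.rstrip("\r\x07")


def cell_text(table, row, column):
    """Return the cleaned text of the cell at the 1-based (row, column) position."""
    return clean_cell_text(table.Cell(Row=row, Column=column).Range.Text)
```

For example, `cell_text(table, 1, 1)` gives the first cell's text without the trailing marker. Line breaks inside the cell are left in place.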
Hope that this helps.
EDIT:
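An example of a function that returns a column's index based on its heading:
def Column_index(header_text):
    # COM columns are 1-indexed, so scan 1..Count inclusive
    for i in range(1, table.Columns.Count + 1):
        # Range.Text ends with Word's end-of-cell marker ('\r\x07'); strip it before comparing
        if table.Cell(Row=1, Column=i).Range.Text.rstrip('\r\x07') == header_text:
            return i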
Then you can access the cell you want like this, for example:
table.Cell(Row=1, Column=Column_index("The Column Header")).Range.Text
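If you look up several columns, scanning the header row on every lookup costs one COM call per cell each time. One possible variation is to read the header row once into a dictionary. This is only a sketch, assuming the headers are in row 1 and that row has no merged cells; `header_index_map` is a hypothetical name:

```python
def header_index_map(table):
    """Map each header text in row 1 to its 1-based column index."""
    mapping = {}
    for i in range(1, table.Columns.Count + 1):
        # Strip Word's end-of-cell marker before using the text as a key
        header = table.Cell(Row=1, Column=i).Range.Text.rstrip("\r\x07")
        # Keep the first occurrence if a header text is repeated
        mapping.setdefault(header, i)
    return mapping
```

With `columns = header_index_map(table)`, a cell in the second row would then be `table.Cell(Row=2, Column=columns["The Column Header"]).Range.Text`.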