"View Source"-equivalent for Word documents?

Sometimes Word documents seem to more or less break, usually when the layout has become quite complex and the document has changed hands and/or versions a couple of times. Symptoms include nothing happening when you press Backspace or Enter at a spot in the document where it really should work, or formatting that seems to apply and reset itself more or less randomly. I think we've all been there.

Often it can be very hard to know exactly what is wrong, since what happens under the hood in Word is quite opaque. You could have a document that looks empty while its underlying state (formatting and so on) is actually quite complex.

In these cases it would be useful to peek at the source code behind what is shown on the page, the way View Source works in a browser, and ideally to edit that source directly, as you would with LaTeX. Is there a View Source-type command or utility for Microsoft Word documents?

My guess is that there is no such command, or I would have heard about it. If that is the case, does anyone have any good approach when it comes to getting a grip on annoying "hidden formatting" in a Word document?

I suspect there might be some differences in the .doc and .docx formats; I am interested in both cases.


If formatting is what you are primarily interested in, Word has a good feature called Reveal Formatting for inspecting all types of formatting applied to text and objects. In Word 2007 and 2010 the shortcut for this panel is Shift + F1.

Otherwise, if you want an even deeper understanding of the document format, you can look at the XML inside DOCX files.

  1. Find your DOCX document on disk.
  2. Change the extension of the document from .docx to .zip.
  3. Double click on the file and open it in the default archive manager.
  4. Navigate to the "word" folder in the zip program and open document.xml. This is the markup behind the bulk of the document content, although the other files are used too, e.g. for styles or font information.
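
To get an overview of what is inside the package before opening anything, the parts can also be listed from a script. This is only a minimal sketch using Python's standard zipfile module; the function name list_docx_parts and the file name report.docx are made up for illustration.

```python
import zipfile


def list_docx_parts(docx_path):
    """Return (part name, uncompressed size in bytes) for every file in the package."""
    # A .docx file is an ordinary zip archive, so it opens without renaming.
    with zipfile.ZipFile(docx_path) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]


# Example call (report.docx is a hypothetical file name):
# for name, size in list_docx_parts("report.docx"):
#     print(f"{size:>10}  {name}")
```

In a typical document the list should include word/document.xml along with parts such as word/styles.xml and word/fontTable.xml.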

You will definitely need a decent XML editor just to view the data, and even then it is quite complex and, for a large document, very long.
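
If no XML editor is at hand, a few lines of Python can pull the main part out of the archive and re-indent it. The following is a rough sketch using only the standard library; read_docx_part and report.docx are hypothetical names, and a large document will still produce a lot of output.

```python
import zipfile
import xml.dom.minidom


def read_docx_part(docx_path, part_name="word/document.xml"):
    """Return one XML part of a .docx package as indented text."""
    # Read the part straight from the zip archive; no renaming needed.
    with zipfile.ZipFile(docx_path) as package:
        raw_xml = package.read(part_name)
    # Word stores the XML on very few lines, so re-indent it for reading.
    return xml.dom.minidom.parseString(raw_xml).toprettyxml(indent="  ")


# Example call (report.docx is a hypothetical file name):
# print(read_docx_part("report.docx"))
```

Passing a different part_name, for example "word/styles.xml", shows the style definitions instead of the body text.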

When it comes to DOC, there is no easy way to "view the source": it is a binary file made up of separate streams, so its contents cannot simply be opened and read.


I guess the .doc format is pretty hard, so I can't help you here. However, .docx is actually a zip file with all the details stored in XML files. Thus, rename the file to .zip and take a look at the source!


When it comes to a binary format like *.doc, things are trickier. You can use LibreOffice's mso-dumper. Just clone the repository to your local machine and run

python doc-dump.py \path\to\file.doc >output.xml

Now everything in the binary file will be converted to XML, following the structures described in the Word (.doc) Binary File Format specification.

There's also WordFileDump, which is simpler but not as powerful as mso-dumper.

Unfortunately, those tools are only for analyzing the structure, and there is no tool to reassemble the XML output back into a *.doc file, so once you've found the root cause you'll have to use Word to fix it. It is therefore easier to convert to *.docx, examine the *.docx file, and then convert back to *.doc if necessary.

Or you can save the file as RTF, which is a "human-readable" text format, instead of Office XML. Alternatively, save the Word file as HTML.