Can I grab the content from a .pages file, without using Pages.app?

I’m interested in automatically extracting the content of .pages documents, preferably via a programming language on a web server.

There are quite a few references around the web that say Pages documents are actually zipped archives containing various files (e.g. http://www.tuaw.com/2009/11/02/iwork-secret-life-as-zip-file-revealed-includes-pdf-preview/), but a Pages document sent to me by a friend doesn’t seem to be unzippable by Mac OS X’s Archive Utility, or The Unarchiver, no matter what I change the extension to.

Is there a way to get the content from recent Pages files?


If you're trying to roll your own solution, the actual .pages file is a package. If you right-click it in Finder, you can choose Show Package Contents. The resulting folder contains all the graphic files plus a file called index.xml.gz. Decompress that file and you get an XML file containing all the text in the Pages document.
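For the server-side case, here is a minimal Python sketch of that approach. It assumes the .pages document is a package directory on disk with index.xml.gz at its top level, as described above. The function name extract_pages_text is just an illustration, and since I'm not relying on any of Apple's tag names, it simply gathers every text node in the XML, so expect some style or metadata strings mixed in with the body text.

```python
import gzip
import os
import xml.etree.ElementTree as ET


def extract_pages_text(package_path):
    """Return the text nodes found in a .pages package's index.xml.gz.

    Assumes package_path is a directory (the .pages package) that
    contains index.xml.gz at its top level.
    """
    index_path = os.path.join(package_path, "index.xml.gz")

    # gzip.open gives a file-like object that ElementTree can parse directly.
    with gzip.open(index_path, "rb") as handle:
        root = ET.parse(handle).getroot()

    # Walk every text node in document order; no Apple-specific
    # element names are assumed here.
    pieces = (text.strip() for text in root.itertext())
    return "\n".join(piece for piece in pieces if piece)
```

Calling extract_pages_text("Report.pages") (a made-up file name) returns one string with a line per text node. If you only want the body text, you would need to inspect the XML and filter on the elements that hold it.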


Apple provides a hosted solution called iWork.com and has snazzy videos showing how it works. I believe it is still in beta despite having been available for many months, and Apple has announced that "As of July 31, 2012, you will no longer be able to access your documents on the iWork.com site or view them on the web."

It's not clear whether this will roll into iCloud once the iOS and OS X versions of the apps reach parity in file format. At the moment, the capabilities and file formats are not close enough to work together well. Also keep in mind that Mountain Lion is slated for release this summer, and the current versions of Pages, Numbers and Keynote are all still marketed as iWork '09, so they could perhaps be revamped to provide this interface using iCloud.

Lion Server does this for you as part of its wiki functionality. Many document types, including Pages, are rendered into a mostly HTML5/CSS3 web page, so viewers of the wiki and its embedded or uploaded documents on mobile or other client platforms do not need a Pages viewer app and can just use a modern web browser.


I expect you were looking for more of an open-source solution, where the inner workings of the tools are better documented, but for some people this is a widely available and fairly low-cost solution to the problem you describe.