The big difference is readability: the files can be read by humans. With Office 2007, Microsoft adopted a new XML-based file format, which it called Office Open XML. A simpler XML-based format, the OpenDocument Format (ODF), already existed and was used by OpenOffice and IBM, but Microsoft chose to create its own. At the same time, Microsoft used Zip compression to bundle together all of the XML files that make up a Word document. Want to see for yourself? Rename any .docx file to .zip and extract it. You will find one XML file that holds the document metadata (docProps/core.xml), another that holds the body text (word/document.xml), and so on. Notice that you can open any of these XML files and examine the content; they are plain text rather than an encoded binary format. The same is true for an .xlsx file.
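The rename-and-unzip experiment can also be done in a few lines of code. The sketch below uses Python's standard zipfile module, which ignores the file extension, so no renaming is needed. The helper names list_docx_parts and read_part are made up for illustration, and the parts are assumed to be UTF-8 encoded (the usual case for files saved by Office).

```python
import sys
import zipfile


def list_docx_parts(source):
    """Return the names of every part (file) packed inside an Office Open XML package.

    `source` may be a path or a binary file-like object. zipfile ignores the
    extension, so a .docx or .xlsx can be opened without renaming it to .zip.
    """
    with zipfile.ZipFile(source) as package:
        return package.namelist()


def read_part(source, name="word/document.xml"):
    """Return one part of the package as readable XML text (UTF-8 assumed)."""
    with zipfile.ZipFile(source) as package:
        return package.read(name).decode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python inspect_docx.py report.docx
    path = sys.argv[1]
    for part_name in list_docx_parts(path):
        print(part_name)
    # Show only the first 500 characters of the body XML.
    print(read_part(path)[:500])
```

Run against a real document, this prints the part names followed by the start of the body XML. Calling read_part with name="xl/workbook.xml" does the same for an .xlsx file.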

What does this mean for us? Straight away, it means three things:
1. As we have just seen, we can open a .docx file and read its contents.
2. Any software vendor can therefore open and display the contents, which is why OpenOffice (from version 3.0 onward) and Apple’s Pages have no problem opening Word documents.
3. We can write our own XML files that represent Word documents and Excel spreadsheets. There are open source tools that help with exactly this (see the links below), and we can create these files without an Office licence, although to open them in Word itself you would still need one.
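To make point 3 concrete, here is a minimal sketch of writing a .docx from scratch with nothing but Python's standard library. It assumes the smallest package Word will accept: a content-types part, a package relationship pointing at the main document, and word/document.xml with one plain paragraph per string. Real documents usually also carry styles, settings, and document-properties parts, which are left out here. The names write_docx and build_document_xml are hypothetical.

```python
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Tells consumers which content type each part in the package holds.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# Package-level relationship that points at the main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def build_document_xml(paragraphs):
    """Build word/document.xml with one plain paragraph (w:p) per string."""
    body = "".join(
        # escape() turns &, < and > into entities so the XML stays well-formed.
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )


def write_docx(target, paragraphs):
    """Zip the three parts together; `target` is a path or binary file object."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", build_document_xml(paragraphs))
```

Calling write_docx("hello.docx", ["Hello from Python"]) (file name illustrative) produces the file without Office installed; opening it in Word is where the licence comes in.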

We can also use other tools, such as LINQ (Language-Integrated Query) in .NET, to query and modify the XML inside existing documents.
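LINQ to XML is a .NET feature. To keep to one language here, the sketch below shows the same idea with Python's standard xml.etree.ElementTree, where a list comprehension plays roughly the role of a LINQ query. It is read-only, assumes the body text lives in word/document.xml, and the function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace; ElementTree addresses tags as "{namespace}local".
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_texts(source):
    """Return the plain text of every paragraph (w:p) in word/document.xml.

    Word often splits one paragraph's text across several runs (w:r), so the
    w:t fragments inside each paragraph are joined back together.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return ["".join(t.text or "" for t in p.iter(f"{W}t")) for p in root.iter(f"{W}p")]


def paragraphs_containing(source, phrase):
    """A LINQ-style 'where' filter: keep only paragraphs that mention `phrase`."""
    return [text for text in paragraph_texts(source) if phrase in text]
```

Modifying a document follows the same pattern in reverse: change the tree, then re-zip every part. Note that ElementTree renames namespace prefixes when it writes XML back out, so for real Word files a prefix-preserving library such as lxml is the safer choice.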

See https://en.wikipedia.org/wiki/Apache_OpenOffice and https://en.wikipedia.org/wiki/Office_Open_XML

See also Eric White’s PowerTools for Open XML, announced at http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2015/08/10/announcing-the-release-of-powertools-for-open-xml-on-github.aspx

 
