Easing the transfer of documents
Using XML as the basis of a document format means that there is a whole range of tools and software that can already parse it, making it unnecessary for the end user to obtain a particular program in order to read a document.
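As a rough illustration of that point, the sketch below reads a small XML document using nothing but the parser that ships with Python's standard library. The memo document, its tag names and the read_memo helper are invented for this example and are not taken from any real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are invented for illustration only.
SAMPLE = """<memo>
  <to>Accounts</to>
  <from>Purchasing</from>
  <body>Invoice 17 has been approved.</body>
</memo>"""


def read_memo(xml_text):
    """Return the root element's direct children as a dict of tag name -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


memo = read_memo(SAMPLE)
print(memo["from"], "wrote to", memo["to"])
```

No program specific to the memo format is needed here: any general-purpose XML parser could pull out the same fields.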
Even the doyen of the proprietary file format, Microsoft, is keen to embrace XML. Office 2003 lets Microsoft Word and its stablemates save and read documents in XML-based formats. Microsoft has also published the schemas it uses for these formats (such as 'SpreadsheetML' and 'WordprocessingML'), making it possible for other programs to understand, not just read, files saved in these formats and to save files in those formats as well.
Previously, developers had to reverse engineer Office's many document formats in order to read them, usually imperfectly.
"Perhaps the most important factor relating to standard XML file formats is that of human-readable tags and standard processing techniques," says Gary Edwards, OpenOffice.org's representative on the OASIS OpenOffice XML Format Technical Committee. "With a proprietary file format, users had to either get special permission from the application vendor, or reverse engineer the binary format, in order to work with the files in ways that met their specific needs. With a standard XML file format, users can mine, re-use and re-purpose information any way they can think of. Plus, the standardisation of the file formats and related XML transformation technologies means that powerful machines can be constructed to service advanced content management and collaboration needs without having to beg the application vendor for permission or future enhancements."
The many identities of XML
Unlike HTML, XML has no pre-defined tags and no pre-defined rules for how tags are ordered or nested. Consequently, any organisation that uses XML has to decide which set of tags and structural rules ('schema') it will use. If it never intends the document's structure to be understandable to anyone outside the organisation, it can choose a completely arbitrary schema. But if it is to exchange documents with another organisation, that organisation will need to understand the schema underlying the document.
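To make the idea of an agreed schema a little more concrete, here is a minimal sketch that checks whether a document uses only the tags two partners have agreed on. It is a deliberate simplification: real schema languages such as DTD or W3C XML Schema also describe ordering, data types and attributes, and the AGREED_TAGS vocabulary and the unexpected_tags helper are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreed vocabulary: each allowed tag maps to the child tags it may contain.
AGREED_TAGS = {
    "order": {"customer", "item"},
    "customer": set(),
    "item": set(),
}


def unexpected_tags(xml_text, allowed=AGREED_TAGS):
    """Return a sorted list of (parent, child) pairs that the vocabulary does not permit.

    An unknown root element is reported with an empty string as its parent.
    """
    root = ET.fromstring(xml_text)
    problems = set()
    if root.tag not in allowed:
        problems.add(("", root.tag))
    # Walk every element and compare its children with what the vocabulary allows.
    for parent in root.iter():
        permitted = allowed.get(parent.tag, set())
        for child in parent:
            if child.tag not in permitted:
                problems.add((parent.tag, child.tag))
    return sorted(problems)


ours = "<order><customer>Acme</customer><item>Widget</item></order>"
theirs = "<order><client>Acme</client><item>Widget</item></order>"
print(unexpected_tags(ours))
print(unexpected_tags(theirs))
```

Both documents are well-formed XML and any parser will read them, but only the first follows the agreed vocabulary. The second uses a tag the receiving organisation has no way of interpreting.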
Rather than reinvent the wheel, many organisations are using schemas appropriate for their industries, such as ebXML (electronic business XML), LegalXML and ACORD XML for Life Insurance. There are currently many thousands of pre-defined schemas, so picking one appropriate to the organisation, its partners and its purpose can be hard. But by using a pre-defined schema and XML, organisations can have the benefit of industry experience in a pre-packaged form, and a document exchange format that almost any system can read without the need for a developer to create a parsing system especially for that format.