The rise and rise of XML

Towards the end of 1997, 12 experts in the standard generalised mark-up language (SGML) set up a World Wide Web Consortium (W3C) working group to define a subset of SGML that would make it easier to publish content on the web. A few months later, in February 1998, the W3C published a 25-page document called XML 1.0. The rest, as they say, is history.

Ten years later, and long after the narrow purpose of the extensible mark-up language has been largely forgotten, it is no exaggeration to say that XML has played a pivotal role in shaping the Internet era of IT and, as such, helped to create the modern world of real-time, interactive electronic commerce.

Today, says Tim Jennings, research director at industry analyst Butler Group, XML “is now part of the fabric of business. It is the default mechanism for representing data in a host of areas across the IT landscape and, if there is any requirement for interoperability between systems then XML is likely to be the data interchange medium.”

Of course, this wasn’t the plan. In the beginning, says Dave Hollander, co-founder of the original W3C XML Working Group, and now CTO of semantic web software developer Contivo, all that he and his XML Working Group colleagues had intended to do was fix a problem with SGML.

At the time, he remembers, SGML had been in action for about 10 years as a platform-independent means of describing content electronically but “SGML had its problems. It was difficult to learn, it was difficult to use, [and] its acceptance was limited to documents professionals; it was [also] very difficult to apply to the new medium known as the web.”

However, “the web was becoming ubiquitous and we wanted to use it to publish our SGML-encoded information,” he adds. The answer that Hollander and his colleagues came up with was simple and effective: cut out the pieces of SGML that made that language complex and tricky to use, and replace them with concepts – such as ‘angle-bracketed tags’ – that had helped to make HTML (the language of the web) so popular and simple.

This best-of-both-worlds approach to fixing SGML unwittingly created a language that is more than the sum of its parts. XML has a simpler, stricter syntax than either of its predecessors, which means that any well-formed XML content can be parsed by any XML reader. Like SGML, it supports user-definable tags, which means it can be used as an extensible meta-language that can be modified to meet application-specific demands. And, as with HTML, XML definitions can actually be read by people. This means that an XML file format tends to be self-documenting – it does what it says on the label.
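To make those three properties concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree` parser. The invoice vocabulary (`invoice`, `customer`, `line`, `quantity`, `unitPrice`) and the helper names are invented purely for illustration and do not come from any real standard; the point is only that a generic XML reader can handle tags it has never seen before, and that the markup is legible without separate documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, application-specific vocabulary made up for this example.
# The tag and attribute names describe the data they carry.
INVOICE_XML = """
<invoice number="1001" currency="GBP">
  <customer>Example Ltd</customer>
  <line quantity="2" unitPrice="9.50">Widget</line>
  <line quantity="1" unitPrice="20.00">Gadget</line>
</invoice>
"""


def invoice_total(xml_text):
    """Sum quantity * unitPrice over every <line> child of the root element."""
    root = ET.fromstring(xml_text)
    return sum(
        int(line.get("quantity")) * float(line.get("unitPrice"))
        for line in root.findall("line")
    )


def is_well_formed(xml_text):
    """Return True if a generic XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(invoice_total(INVOICE_XML))
    print(is_well_formed("<a><b></a>"))  # mismatched tags are rejected
```

The parser knows nothing about invoices; it relies only on XML's well-formedness rules, which is why the same reader would work unchanged on any other tag set. Checking that a document follows a particular vocabulary is a separate step (a DTD or schema) that this sketch leaves out.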

Between them, these three core strengths of XML – simplicity, readability and, especially, extensibility – rapidly brought it to the attention of a host of interested parties outside the SGML community.

At first this rush to append XML to the front of different content types didn’t see XML’s use straying very far from its web publishing origins. Rather, application vendors began to use XML-based vocabularies to widen the accessibility of their previously proprietary file formats, and some more adventurous end-user organisations, notably companies in financial services, latched on to XML’s potential to ease their internal application exchange issues.

However, within three years of its publication as a simple standard for web content format, XML had started to be adopted for far more ambitious purposes, beginning with the W3C’s own desire to extend the reach of the web beyond PC browsers.

This culminated in May 2001 with the W3C’s publication of XHTML – a version of HTML reformulated as an application of XML – a move that was swiftly followed by NTT DoCoMo (then the world’s leading mobile Internet service provider), which published a similarly XML-driven makeover of the compact HTML standard for handsets – cHTML.

The use of XML as a general-purpose, device- and network-independent format standard was a major turning point for XML. It had now graduated to being a de facto data (as opposed to merely content) exchange standard, and it was starting to establish a pivotal role in application integration initiatives such as SOAP – the Simple Object Access Protocol that is a key element of web services standards.

In the years since then, the proliferation of XML-based technologies has continued unabated, and their impact on commercial IT has been profound. Today, says Butler’s Jennings, XML has already reinvented content creation and management, revolutionised information-level systems integration and still has the potential to do much more. But is the industry, as other analysts believe, close to discovering the limits to XML’s versatility and performance?

Although XML’s proponents concede that the flexible and relatively unstructured way that XML defines data can make it prohibitively resource-hungry in high-volume data environments, Jennings believes that this problem will not persist.

“Some of the processing issues have already been overcome by chip-level technologies, and by the development of XML processing appliances,” says Jennings, but the biggest breakthroughs are likely to come with the imminent arrival of hybrid SQL/XML databases, such as IBM’s DB2 ‘Viper’.

Thanks to Viper and supporting information consolidation technologies such as JustSystems’ xfy, “old barriers to extracting XML from relational databases are being removed. For the first time XML is going to have real parity with SQL data in a transaction environment,” says Jennings.

Pete Swabey

Pete was Editor of Information Age and head of technology research for Vitesse Media plc from 2005 to 2013, before moving on to be Senior Editor and then Editorial Director at The Economist Intelligence...
