An embarrassment of riches

It is ironic how often technological standards designed to ease integration and unite data sources grow to become new sources of complexity and data disintegration.

XML, for example, is a language designed to ease the passage of data across systems. It allows organisations to choose their own, over-arching data definitions, meaning that different applications can nonetheless refer to the same data objects.

So successful has this idea been that common data vocabularies are now built that serve not just single organisations as they attempt to integrate their own systems, but entire industries. These vertical XML schemas allow service partners to integrate applications using standard definitions of data objects unique to their line of trade.

“XML has no semantic information in it: it is the alphabet,” says Gartner analyst Rita Knox. “You create models in order to bring in the vocabulary.”

These specialised XML vocabularies include XBRL, for business reporting; HR-XML, for the human resources industry; LegalXML and LexML, both for the legal profession; and FpML, which defines financial derivatives for the banking industry.

One of the most widely used XML schemas is Universal Business Language (UBL). Ratified by the standards body OASIS, UBL offers universal definitions of business documents such as invoices and purchase orders in XML form. “Using UBL is a way of doing the procurement chain without paper,” explains Benjamin Walshaw, head of technical services for EMEA at JustSystems, the XML application framework company.

Some schemas are extremely specific, for example Standards for Technology in Automotive Retail, or STAR. Using the Open Applications Group Integration Standard, a consortium of car retailers designed a specific XML schema to define the various data objects required in car transactions, from wheel specifications to whether a car was dented on a test drive.

Even more flavours of XML come in the form of technical schemas. These are not designed with any vertical industry in mind, but instruct machines on how to interact with other machines. Examples include VoiceXML, which allows organisations to design speech recognition applications; and RSS, the web information syndication standard.

Adopting these standard XML vocabularies can help organisations when integrating applications. “I used to get a lot more clients saying ‘we’re thinking of developing our own XML data model’. I would say ‘Don’t do that. There are so many already out there, why give yourself the extra work?’,” says Knox. Now, she adds, companies rarely attempt to build their own schemas, given the variety of independently-tended alternatives that are available.

Introducing complexity

Because vertically or technically specified XML schemas are designed to be comprehensive, they include far more definitions than any one business – except one of the world’s largest corporations – is ever likely to use. This means that most organisations use a small sub-section of the vocabulary defined in any given schema.

Organisations using UBL or XBRL, for example, will only use the definitions that are salient in the countries in which they operate. Similarly, a small company is likely to need a simpler definition of an invoice than is possible with UBL.

And while using a small subset of an XML schema makes data management simpler in the short term, it introduces complexity when it comes to integrating applications, whether within an organisation or with a partner business. Because the all-encompassing vocabularies are so broad, there are many different ways to slice and dice them, so one medium-sized business or internal department could be using an entirely different set of definitions from another, even if both are using the same schema.
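The subset problem can be made concrete with a minimal sketch. It assumes a sender and a receiver that have each adopted a different slice of one shared schema; the element names and the `unsupported_elements` helper are hypothetical and far simpler than anything in a real vocabulary such as UBL.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; a real schema such as UBL defines far more.
SENDER_DOC = """
<Invoice>
  <ID>INV-001</ID>
  <IssueDate>2020-01-01</IssueDate>
  <TaxCategory>standard</TaxCategory>
  <DeliveryTerms>FOB</DeliveryTerms>
</Invoice>
"""

# The subset of the same schema that the receiving application understands.
RECEIVER_SUBSET = {"Invoice", "ID", "IssueDate", "PaymentDueDate"}


def unsupported_elements(xml_text, supported):
    """Return the sorted element names in xml_text that fall outside `supported`."""
    root = ET.fromstring(xml_text)
    used = {element.tag for element in root.iter()}
    return sorted(used - supported)


print(unsupported_elements(SENDER_DOC, RECEIVER_SUBSET))
# prints ['DeliveryTerms', 'TaxCategory']
```

Both parties here are nominally "using the same schema", yet two of the sender's elements mean nothing to the receiver.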

This complexity multiplies when businesses attempt to combine more than one XML schema. For example, should a business process tie together an invoice and a paycheck, the relevant XML definitions may draw on both UBL and HR-XML.

Although all these schemas are based on the same underlying language, XML, their specialisation means that they cannot easily be conjoined. Even universal objects, such as an address or a name, may be differently defined and labelled in order to best suit the context in which they are to be used.

The most immediately apparent way of solving this problem is to hand-code applications to understand the pertinent XML definitions from various schemas. This approach is appropriate when the current state of data and application integration is expected to remain unchanged.

However, this technique constrains some of the flexibility that XML-based business processes have the potential to unleash. If a business is constrained by the cost of hiring a specialised Java developer every time it wants to make a slight change to a business process, then the cost and flexibility benefits of automation will not be forthcoming.

Instead, businesses can employ tools that establish a permanent record of how data should be translated between schema vocabularies and that reside on a server, to be called by applications as needed.
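As a rough illustration of such a record, the sketch below keeps the translation rules as data rather than burying them in application logic. It is a plain-Python stand-in for a dedicated transformation tool, and the `ADDRESS_MAP` table, the `translate` function and all element names are assumptions made for the example, not the real HR-XML or UBL definitions.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from an HR-style address to an invoicing-style address.
# Element names are illustrative, not the real HR-XML or UBL definitions.
ADDRESS_MAP = {
    "AddressLine": "StreetName",
    "Municipality": "CityName",
    "PostalCode": "PostalZone",
}


def translate(source_xml, mapping, target_root):
    """Build a new element named target_root, renaming children per mapping."""
    source = ET.fromstring(source_xml)
    target = ET.Element(target_root)
    for child in source:
        new_name = mapping.get(child.tag)
        if new_name is None:
            continue  # no rule recorded for this element, so it is dropped
        ET.SubElement(target, new_name).text = child.text
    return ET.tostring(target, encoding="unicode")


hr_address = (
    "<PostalAddress>"
    "<AddressLine>1 High Street</AddressLine>"
    "<Municipality>Leeds</Municipality>"
    "<PostalCode>LS1 1AA</PostalCode>"
    "</PostalAddress>"
)
print(translate(hr_address, ADDRESS_MAP, "Address"))
```

Changing how an address is translated then means editing one entry in the table, not rewriting the applications that call it.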

Extensible Stylesheet Language Transformations (XSLT) is an XML-based language for converting documents between XML vocabularies. An XSLT processor takes an XML document and a stylesheet that describes how data objects should be translated, and produces a new document in the required vocabulary.

The advantage of using XSLT, says JustSystems’ Walshaw, is that it lightens the development load, and therefore the cost, of using multiple strains of XML. While constantly readjusting application code requires a highly specialised programmer, it is far simpler to use XSLT with an interface such as JustSystems’ xfy product, which uses an extension of XSLT known as XVCD.

And once particular transformations are developed, such as passing an address from HR-XML to a UBL-based system, they can be published in a library of transformations and reused at will. Walshaw goes as far as to suggest that non-technical staff can be trained to build dynamic documents that draw from multiple XML schemas using a drag-and-drop interface.
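The idea of a reusable library can be sketched as a simple registry keyed by source and target vocabulary. This is only an illustration of the principle in plain Python, not how xfy or any XSLT tooling is built; the vocabulary labels `"hr"` and `"invoice"`, the element names, and the `register` and `convert` helpers are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Registry of reusable transformations, keyed by (source, target) vocabulary.
# The vocabulary labels and element names are hypothetical.
TRANSFORMS = {}


def register(source_vocab, target_vocab):
    """Decorator that publishes a transformation under a vocabulary pair."""
    def decorator(func):
        TRANSFORMS[(source_vocab, target_vocab)] = func
        return func
    return decorator


@register("hr", "invoice")
def hr_name_to_invoice(element):
    """Combine separate given and family names into one party name."""
    given = element.findtext("GivenName", default="")
    family = element.findtext("FamilyName", default="")
    party = ET.Element("PartyName")
    party.text = f"{given} {family}".strip()
    return party


def convert(xml_text, source_vocab, target_vocab):
    """Look up a published transformation and apply it to xml_text."""
    transform = TRANSFORMS[(source_vocab, target_vocab)]
    result = transform(ET.fromstring(xml_text))
    return ET.tostring(result, encoding="unicode")


person = (
    "<PersonName>"
    "<GivenName>Ada</GivenName>"
    "<FamilyName>Lovelace</FamilyName>"
    "</PersonName>"
)
print(convert(person, "hr", "invoice"))
```

Once a transformation is registered, any application can request it by vocabulary pair without knowing how it is implemented.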

The proliferation of XML variations will continue – so useful are the preset vocabularies they contain. But that jeopardises the interoperability and therefore the utility of information stored in XML form.

Indeed, the ease of integration that XSLT allows will be vital if XML is to fulfil its potential as the basis for the next significant development of the Internet – an Internet of data, or semantic web.


Pete Swabey

Pete was Editor of Information Age and head of technology research for Vitesse Media plc from 2005 to 2013, before moving on to be Senior Editor and then Editorial Director at The Economist Intelligence...
