It is a problem that relational database software vendors have been wrestling with since the dawn of B2B commerce: how to provide facilities for managing and querying XML data without sacrificing performance and without compromising the structure of the original document. It has also left the vendors in an awkward position.
On the one hand, the major vendors – IBM, Oracle and Microsoft – are touting the virtues of XML as the language of e-commerce, data interchange and web services. On the other, they have told customers there is no need for pure XML databases, arguing that extending their relational database systems is the best way to handle XML data.
But what they shy away from admitting is that the relational model of data and the structure of XML are seemingly irreconcilable. XML documents are hierarchical, with elements nested inside other elements to any depth, whereas relational databases are built on flat, two-dimensional tables and were designed almost exclusively for handling alphanumeric data. Put simply, the relational model is too rigid to store XML-tagged documents in all their richness, and merely adding support for an XML data type has not proved to be the answer.
Users can store XML documents as a 'BLOB' (binary large object), in much the same way as they might store an image file. But as Robert Perry, an analyst with the Yankee Group, points out, that means that "individual document elements cannot be queried and retrieved, greatly limiting the flexibility and effectiveness of the application."
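A minimal sketch of what Perry describes, using Python's built-in sqlite3 and xml.etree.ElementTree modules purely for illustration (none of the products in this article is involved; the table and the document are hypothetical). Because the document sits in a single BLOB column, SQL has no way to address its elements, so the application must fetch and parse every document to find one value:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical example document; any well-formed XML would do.
ORDER_XML = b"""<order id="42">
  <customer>Acme Ltd</customer>
  <line sku="A-1" qty="3"/>
  <line sku="B-7" qty="1"/>
</order>"""

conn = sqlite3.connect(":memory:")
# The whole document goes into one BLOB column: to the database it is
# just bytes, with no elements or attributes it can index or filter on.
conn.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, body BLOB)")
conn.execute("INSERT INTO documents (body) VALUES (?)", (sqlite3.Binary(ORDER_XML),))

def customer_names(conn):
    """Return the <customer> value of every stored document.
    SQL cannot reach inside the BLOB, so each document is fetched
    whole and parsed in the application."""
    names = []
    for (body,) in conn.execute("SELECT body FROM documents"):
        root = ET.fromstring(body)  # full parse just to read one element
        names.append(root.findtext("customer"))
    return names

print(customer_names(conn))  # ['Acme Ltd']
```

With many documents, that loop parses every one of them to answer a single question.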
The database market leaders all acknowledge the problem. In fact, as XML has become central to many customers' B2B strategies, they have come under pressure to address the situation before their web services rhetoric is undermined by an inability to fully support XML natively within their databases. The response by Oracle has been XML DB, part of Oracle 9i release 2, due in June 2002. For IBM, the solution is Xperanto, an XML add-on to its Universal Database (DB2), which should ship later this year.
The approach taken by such XML-friendly relational products is to split XML documents into their constituent elements and store these in relational tables, a technique often called 'shredding'. There is a major drawback to that, though, says Perry. "Since each element is treated separately, the relational database can retrieve any element requested by the application, but must rebuild the hierarchy each time the data is retrieved."
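The following sketch splits a document into its elements (shredding) using a generic, hypothetical 'nodes' table, one row per element linked to its parent by parent_id, again in Python with sqlite3. It is a deliberate simplification, not how Oracle or IBM actually map documents: attributes and mixed content are discarded. Note that rebuild() has to query the database for every element to restore the hierarchy, which is the cost Perry describes:

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(conn, xml_text):
    """Store each element of a document as one row in a generic 'nodes' table.
    Simplification: attributes and tail text are discarded."""
    conn.execute("""CREATE TABLE IF NOT EXISTS nodes (
        node_id INTEGER PRIMARY KEY, parent_id INTEGER,
        position INTEGER, tag TEXT, text TEXT)""")

    def store(elem, parent_id, position):
        cur = conn.execute(
            "INSERT INTO nodes (parent_id, position, tag, text) VALUES (?, ?, ?, ?)",
            (parent_id, position, elem.tag, (elem.text or "").strip()))
        node_id = cur.lastrowid
        for i, child in enumerate(elem):
            store(child, node_id, i)  # each child row keeps a link to its parent
        return node_id

    return store(ET.fromstring(xml_text), None, 0)

def rebuild(conn, node_id):
    """Reassemble the hierarchy from parent links: one query for the node
    and one for its children, repeated for every element in the document."""
    tag, text = conn.execute(
        "SELECT tag, text FROM nodes WHERE node_id = ?", (node_id,)).fetchone()
    elem = ET.Element(tag)
    elem.text = text or None
    child_ids = conn.execute(
        "SELECT node_id FROM nodes WHERE parent_id = ? ORDER BY position",
        (node_id,)).fetchall()
    for (child_id,) in child_ids:
        elem.append(rebuild(conn, child_id))
    return elem

conn = sqlite3.connect(":memory:")
root_id = shred(conn, "<order><customer>Acme Ltd</customer><item>Widget</item></order>")
print(ET.tostring(rebuild(conn, root_id), encoding="unicode"))
# <order><customer>Acme Ltd</customer><item>Widget</item></order>
```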
First, there is the overhead of designing the database and of mapping the various elements of the XML documents to the columns of relational tables. That mapping is needed so that users can pull an XML document out and reassemble it in its original form. "When you use XML, you usually have to think about structure in some way – the structure of the document or the schema or the document type definition. So even if the data never goes near the database, you still have to think about the structure," says Nigel Hutchinson, chief scientist at Software AG.
Second, storing and retrieving data in this way hurts performance. Retrieving just one XML document from a relational database, for example, means searching multiple tables and performing a series of join operations. This imposes a significant performance overhead, says Rob Hailstone, European director of software infrastructure research at IDC.
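With a schema-specific mapping, one logical document is typically spread across several tables. In this hypothetical order schema (Python and sqlite3, for illustration only), reassembling a single order already needs a three-way join, and the nesting still has to be rebuilt in application code:

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical mapping: one <order> document is spread over three tables.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_lines (order_id INTEGER, line_no INTEGER, sku TEXT, qty INTEGER);
INSERT INTO customers VALUES (1, 'Acme Ltd');
INSERT INTO orders VALUES (42, 1);
INSERT INTO order_lines VALUES (42, 1, 'A-1', 3), (42, 2, 'B-7', 1);
""")

def order_as_xml(conn, order_id):
    """Rebuild one order document; returns None if the order does not exist."""
    # Fetching a single logical document needs a three-way join...
    rows = conn.execute("""
        SELECT c.name, l.sku, l.qty
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id
        JOIN order_lines l ON l.order_id = o.order_id
        WHERE o.order_id = ?
        ORDER BY l.line_no""", (order_id,)).fetchall()
    if not rows:
        return None
    # ...and the hierarchy is then reconstructed in the application.
    order = ET.Element("order", id=str(order_id))
    ET.SubElement(order, "customer").text = rows[0][0]
    for _, sku, qty in rows:
        ET.SubElement(order, "line", sku=sku, qty=str(qty))
    return ET.tostring(order, encoding="unicode")

print(order_as_xml(conn, 42))
```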
IBM, for one, recognises this. "The relational database does not support native XML, [so] you are paying in terms of performance, in terms of scalability, because you have to map that XML data into a relational schema, which can be quite inefficient," says Nelson Mattos, a distinguished engineer and director of information integration initiatives at IBM.
Vendors in the pure-play XML database camp do not face these problems. For example, the main component of Software AG's Tamino XML database product – the market-leading XML database, according to analyst IDC – is its XML engine, which automatically indexes XML tags to provide access to the database's contents, taking the place of the keys used in relational systems.
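To show the general idea of indexing by tag rather than by keys designed up front, here is a toy inverted index in Python that maps each element path to the documents and values where it occurs. It is an illustration of the technique under simplifying assumptions (text values only, hypothetical documents), not a description of how Tamino's engine works internally:

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def build_tag_index(docs):
    """Map each element path (e.g. 'order/customer') to (doc_id, text) pairs.
    The index is derived from the documents' own tags, so no key columns
    have to be designed in advance."""
    index = defaultdict(list)

    def walk(elem, path, doc_id):
        path = f"{path}/{elem.tag}" if path else elem.tag
        index[path].append((doc_id, (elem.text or "").strip()))
        for child in elem:
            walk(child, path, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return index

docs = {
    1: "<order><customer>Acme Ltd</customer></order>",
    2: "<order><customer>Bloggs plc</customer></order>",
}
index = build_tag_index(docs)
print(index["order/customer"])  # [(1, 'Acme Ltd'), (2, 'Bloggs plc')]
```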
Alongside that there is the data map, which stores the schema definitions used to manage the storage and retrieval of XML documents; the XML parser and object processor, which checks that requests to create new XML objects conform to the formatting rules; and XQuery, the database access method, akin to SQL in the relational world.
XQuery is an SQL-like query language and an emerging World Wide Web Consortium (W3C) standard. It builds on XPath, the W3C language for locating and processing items in XML documents. XQuery enables users to query fragments of XML documents, so that finer-grained data can be interrogated or extracted from the database.
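XQuery itself is not available in Python's standard library, but ElementTree supports a limited subset of XPath, which is enough to show how a path expression picks out a fragment of a document rather than the whole thing. The catalogue document is hypothetical:

```python
import xml.etree.ElementTree as ET

catalogue = ET.fromstring("""
<catalogue>
  <book lang="en"><title>XML Basics</title><price>20</price></book>
  <book lang="de"><title>XML Grundlagen</title><price>25</price></book>
</catalogue>""")

# Path expression: the <title> of every <book> whose lang attribute is "en".
titles = [t.text for t in catalogue.findall("./book[@lang='en']/title")]
print(titles)  # ['XML Basics']

# Extract a whole fragment (one <book> subtree) rather than a single value.
fragment = catalogue.find("./book[@lang='de']")
print(ET.tostring(fragment, encoding="unicode"))
```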
Perhaps the most intriguing function offered by Tamino – a feature that other database vendors plan to introduce soon – is X-Node, which provides the ability to tap data from non-XML data sources, so that Tamino can be used as the hub of a federated or virtual database.
Yet XML databases are not without their shortcomings. Because the technology is immature, vendors are still ironing out bugs and still building in many of the capabilities that relational database users have taken for granted for years, says Scott Fulton of Avon and Somerset Police, a Tamino user (see In practice: Avon and Somerset Police).
And even the most ardent of XML database enthusiasts do not believe that the primacy of relational database technology is under serious threat. Instead, they point to initiatives from the database giants to absorb key technologies, enabling them to offer fuller XML support within the relational model.
During 2002, IBM and Oracle will release new software that can mimic the native XML support that pure XML database vendors such as Software AG offer, even if their approach is less elegant.
"I think there's some inherent reasons why you wouldn't want to use specialised databases even if they do provide better performance," says Ken Jacobs, Oracle vice president of database product strategy. There are few applications in the real world that are exclusively XML. "I can't imagine any company that will have only XML data and no structured data," he says. In addition, XML database technology is simply too immature at the moment, he argues.
Oracle has steadily refined the support it offers for handling XML documents. It first introduced support for XML with the release of Oracle 8i in late 1999. "It was a relatively loose integration, featuring content transformation performed externally to the database itself," says Victor Votsch, an analyst with the Patricia Seybold Group. But Oracle 9i release 2, the next version of the database, features Oracle XML DB, a capability that has been well received by XML developers.
"This new XML support in Oracle looks to me like the first real alternative in the RDBMS [relational database management system] space for natively storing XML data," says Kimbro Staken of XMLdatabases.org. Key features include better schema support, enabling developers to separate the data definition from the physical storage of the data in Oracle for the first time. This should simplify initial database set-up.
Oracle is also improving its support for XPath, which adds a further layer of abstraction by separating the details of physical storage from the application logic.
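The idea of referencing path expressions from within SQL, so that queries name elements rather than storage details, can be imitated in SQLite by registering a Python function. The function name xpath_text and its behaviour are hypothetical and do not reflect Oracle XML DB's actual SQL functions; unlike a native engine, this version also re-parses each document on every call:

```python
import sqlite3
import xml.etree.ElementTree as ET

def xpath_text(xml_text, path):
    """Hypothetical SQL function: text of the first element matching an
    ElementTree path expression, or NULL when nothing matches."""
    node = ET.fromstring(xml_text).find(path)
    return None if node is None else node.text

conn = sqlite3.connect(":memory:")
conn.create_function("xpath_text", 2, xpath_text)
conn.execute("CREATE TABLE purchase_orders (po_id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany("INSERT INTO purchase_orders (doc) VALUES (?)", [
    ("<po><customer>Acme Ltd</customer><total>120</total></po>",),
    ("<po><customer>Bloggs plc</customer><total>75</total></po>",),
])

# A relational column and path expressions used side by side in one statement.
rows = conn.execute("""
    SELECT po_id, xpath_text(doc, 'customer')
    FROM purchase_orders
    WHERE CAST(xpath_text(doc, 'total') AS INTEGER) > 100""").fetchall()
print(rows)  # [(1, 'Acme Ltd')]
```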
"Oracle XML DB really provides an implementation inside the Oracle engine of an XML data type, the ability to reference XML path expressions in SQL statements, basically allowing you to use SQL and XML interchangeably or as one language to access both SQL and XML data," says Jacobs.
IBM's Xperanto initiative, announced in March but not due for incorporation into DB2 until late 2002, is similarly bold. It goes a step further than Oracle by supporting XQuery and by allowing DB2 to serve as the basis of a federated database, in which a number of different data sources – including databases – can be queried as if they were one.
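SQLite's ATTACH DATABASE statement offers a very small-scale taste of federation: tables held in two separate databases can be joined in one query. A genuine federated system spans heterogeneous sources, often across a network, so treat this Python sketch (with hypothetical orders and crm data) only as an illustration of the single-query idea:

```python
import sqlite3

# Two separate in-memory databases stand in for two independent data sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (order_id INTEGER, customer_id INTEGER, total INTEGER)")
conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?, ?)", [(42, 1, 120), (43, 2, 75)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme Ltd"), (2, "Bloggs plc")])

# One query spans both sources as if they were a single database.
rows = conn.execute("""
    SELECT c.name, o.total
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id""").fetchall()
print(rows)  # [('Acme Ltd', 120), ('Bloggs plc', 75)]
```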
However, XML database vendors – predictably – remain unconvinced. "The strength of the IBM system, the only strength they have, is reading the database, getting XML out of it. That's the easy bit. But when it comes to the storing of live XML data, they still have the problem that they need to either break it up or store it as a block," says Software AG's Hutchinson.
But the last database war, in which object databases tried to usurp the role of relational systems, showed just how resilient the relational model could be and how reluctant most users were to change. "While applications will be developed entirely on XML databases, relational databases will continue to be the primary data store for applications," concludes Yankee Group's Perry.