Database pretenders

It may not be everyone’s idea of a fun night out, but the capacity crowd that descended on Silicon Valley’s Computer History Museum on the evening of 3 February 2003 certainly thought so. Eager, attentive, even excited, they had come from all round the US to hear the pioneers of the relational database tell how they built the foundations of a $7 billion industry.

Their modestly billed discussion point: ‘How the database has changed the world.’ “Every time you withdraw cash from an ATM, make airline reservations or charge something on your credit card, database systems are working behind the scenes,” observed Ken Jacobs, employee number 18 at Oracle and now vice president of product strategy. All that was missing was a guest appearance by his boss of 22 years, company co-founder Larry Ellison – hardly a thoroughbred technologist himself, but an individual whose marketing prowess brought in the dollars that helped turn relational from a barely usable technology into the de facto standard.

The event’s sense of conquering ubiquity was echoed by the other renowned, if now veteran, stars of IT: Chris Date, heir to the relational model developed by IBM scientist Ted Codd; Michael Stonebraker, creator of the Ingres database; Bob Epstein, co-founder of Sybase; Roger Sippl, the man behind Informix; and Pat Selinger, one of the team that developed IBM’s DB2 database.

It is perhaps difficult to appreciate the depth of the foundations they and their peers laid. Relational database management systems (RDBMSs) made it possible to separate data definitions from application code – something that sounds trivial today, but which freed programmers from endless hours of pointless coding. More significantly, though, relational databases provided the platform for the development of packaged applications and for the whole business intelligence industry.
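To make that separation concrete, here is a minimal sketch using Python’s built-in sqlite3 module as a modern stand-in for any relational engine; it is not drawn from any of the products discussed here, and the customers table, its columns and the customers_in function are hypothetical. The application asks only for what it wants, by column name; when the data definition changes (here, a new column and a new index), the query keeps working untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers (name, city) VALUES (?, ?)",
                 [("Acme Ltd", "London"), ("Bolt plc", "Leeds")])


def customers_in(conn: sqlite3.Connection, city: str) -> list:
    # The application says WHAT it wants by column name; the DBMS decides
    # HOW to find it (table scan, index, storage layout).
    rows = conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", (city,))
    return [name for (name,) in rows]


before = customers_in(conn, "London")

# Change the data definition: add a column and an index.
conn.execute("ALTER TABLE customers ADD COLUMN credit_limit INTEGER")
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")

after = customers_in(conn, "London")
print(before, after)  # ['Acme Ltd'] ['Acme Ltd']
```

Under the navigational models that preceded relational, where programs followed record pointers and access paths themselves, a comparable change could mean revisiting the programs that walked those paths.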

Next revolution

In a matter of years in the late 1980s, in what became known as ‘the database wars’, relational overthrew the established order and all the models of data management that went before it: flat files, hierarchical databases, Codasyl databases. It even saw off an early 1990s challenge by object databases.

RDBMSs may have dominated database thinking for 30 years, but it is inconceivable that they are the last word in database technology, or that the model is so strong that it can be augmented indefinitely to meet the demands of evolving businesses. So what comes next? And are there any equivalents of Oracle, Sybase, Ingres and DB2 in the labs now that may spark the next database war?

Before anyone can answer that, they need to examine why there is pressure for change.

A good starting point is the changing substance and nature of data. Since the first data was stored, technologists have been constantly astounded by the growth of data requirements and the volumes available for database systems to manage. Organisations that used to measure their capacity in megabytes now talk of hundreds of terabytes, even exabytes of data.

The drivers for that are clear: the retention of the daily flood of email; the holding of vast amounts of target customer data; and the digital storage of customer calls to contact centres.

The growth in the size and sophistication of the data that RDBMSs have to deal with has been exacerbated by the pressure on organisations to address corporate governance issues, which often means holding on to data for many years – and being able to get to it fast, reliably and with audit trails showing how it got there, who has seen it since and how it has been changed.

Oracle 10g

Oracle CEO Larry Ellison’s pitch that 10g is the most revolutionary advance in computing since the IBM System/360 mainframe of 1964 is typical Ellison hyperbole. But his claim that it will reduce the cost of operating major databases will certainly find willing listeners.

Oracle 10g is designed to enable organisations to construct networks of pooled, low-cost server and storage devices that can be automatically ‘provisioned’ to meet the changing workload demands of different applications and databases.

At the heart of the database is Oracle’s two-year-old Real Application Clusters (RAC) technology, which enables organisations to install and run a single database across multiple servers.

Organisations can use this either to increase the scalability of their databases or to cut costs, because databases can be hosted on clusters of inexpensive servers rather than on big, expensive high-end machines.

According to Andy Mendelsohn, senior VP for database development at Oracle, many new buyers are looking to use it to spread processing across low-cost, two- and four-way Intel-based blade servers running Linux – a message that dominates the marketing of 10g.

Ellison says that about half of the company’s 200,000 customers will switch to a grid computing architecture in the near future because of the combination of cost savings and performance, scalability and reliability benefits.

Although 10g will not be widely available until the end of 2003, early adopters report some impressive results.

Ellison cites the example of entertainment software provider Electronic Arts, which has moved the highly popular online version of its ‘The Sims’ game from a centralised server to a grid of ‘Lintel’ – Linux on Intel – machines.

“They are getting 30,000 SQL calls per second,” he says. “Don’t try that on your mainframe at home, kids.”

Email archives, customer data and call recordings do not exactly play to relational’s traditional strength: ‘slicing and dicing’ numbers and characters stored in interrelated tables and serving up the answers to queries fired at them. Is relational doing an equally good job with email, XML files, image files or sound files? The answer depends on who you talk to.

For some, the industry should already be in a post-relational era. “The complexity of the data being stored today, not to mention the demand for data access, has created a situation where traditional relational products struggle,” says Peter Harris, a senior sales engineer at ‘post-relational’ database supplier InterSystems.

Data warehousing and data mining provide prime examples. When the relational database discipline was in its infancy in the 1980s, both the amount of data stored and the sophistication of queries were modest by today’s standards.

Today, the information that business analysts want to extract from their data warehouses is far more multi-dimensional. “Twenty years ago, when SQL [structured query language], relational’s means of querying databases, was developed, data was much simpler in its structure, and the types of data organisations stored were limited by the structure of relational,” says Harris.

Relational databases may now include multiple extensions that allow them to handle multimedia, spatial and other exotic types of data, and they can return answers in seconds rather than minutes. But limits still exist.

Security is one key issue. Database developers need to be increasingly security-aware, as so-called ‘SQL injection’ attacks on databases have become prevalent.

For example, Yuval Ben Itzhak, chief technology officer of security software vendor Kavado, points to simple attacks on websites that check a visitor’s login name and password against entries in a database before granting access. All an attacker needs to do is enter a string of SQL code in place of the password to guarantee a positive response, he says.
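The login attack Ben Itzhak describes can be sketched in a few lines. This is a minimal illustration, not Kavado’s analysis: it assumes Python’s built-in sqlite3 module, a hypothetical users table holding a plain-text password, and hypothetical login_vulnerable and login_safe functions. The vulnerable version pastes the password into the SQL text, so the input ' OR '1'='1 turns the WHERE clause into a condition that is always true; the safe version passes values through placeholders, so the same input is compared as an ordinary string.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    # Plain-text password for illustration only; real systems store salted hashes.
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def login_vulnerable(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # UNSAFE: user input is spliced directly into the SQL text.
    sql = ("SELECT COUNT(*) FROM users "
           f"WHERE name = '{name}' AND password = '{password}'")
    return conn.execute(sql).fetchone()[0] > 0


def login_safe(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # Placeholders send the values separately from the SQL text,
    # so the input is only ever compared as data.
    sql = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(sql, (name, password)).fetchone()[0] > 0


conn = make_db()
attack = "' OR '1'='1"
print(login_vulnerable(conn, "alice", attack))  # True: the WHERE clause is always true
print(login_safe(conn, "alice", attack))        # False: no such password
print(login_safe(conn, "alice", "s3cret"))      # True
```

Parameterised queries are the usual defence because the SQL text is fixed before any user input arrives; escaping input by hand is more error-prone.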

“The SQL language around which many databases still, unfortunately, revolve, is highly restrictive in terms of the architecture that you can develop up-front and it’s also highly restrictive in terms of future expansion of that structure,” says Harris.

New models

Given relational’s shortcomings, it might be expected that a new generation of Codds, Sippls, Epsteins and Stonebrakers – even Ellisons – is waiting in the wings, touting the next wave of database management systems and ready to sit on panels in 20 years’ time reminiscing about how their technology overthrew relational and changed the world. The short answer is that they do not exist: the more interesting answer is that there are companies providing database technologies that surpass relational in their ability to meet new and important – if sometimes niche – business requirements.

At one point, challengers were expected to come from the object database camp.

Object databases, first developed to support the growth of object-oriented programming and the storage of unstructured data such as images and digital recordings, for a few years threatened relational technology – and prompted executives such as Oracle’s Ellison to order their developers to build extensions to their relational products that could store and, later, analyse non-alphanumeric data types.

The add-ons to Oracle worked, but were irrelevant to most users, as was the disastrous attempt by Informix to move to an ‘object relational’ database with its acquisition of Mike Stonebraker’s Illustra – a ‘mark two’ Ingres that in the early 1990s naively sold itself as being able to handle multimedia queries, such as picking out red flowers from a database of bouquets.

But the demand to store unstructured data has not been as high as object database companies originally calculated, and most of them – names such as Servio, Versant, Poet and Objectivity – have long since stepped out of the database limelight.

That is not to say there is no demand for such products. It has been steady, if flat, for several years (see graph), but at less than 1% of the total database software market collectively, it is undeniably a niche sector.

Some vendors shied away from the object-relational split that emerged. InterSystems, for one, drew on its pedigree in healthcare and the now-legacy MUMPS database and operating system to build an SQL-enabled object database called Caché.

“Old-fashioned relational databases force data into a simplistic two-dimensional model. Caché stores data in multi-dimensional structures that enable high-speed performance and massive scalability – both essential for this application,” it claims.

For some users that combination, coupled with the company’s domain expertise, works well. Moorfields Eye Hospital in London, for example, chose InterSystems because of the high performance of the Caché product and its ability to handle multimedia documents.

That ability to handle multiple data types (alphanumeric characters, images, voice, spatial data, etc) has a growing value, and the emergence of important new data types often provides niche opportunities.

The obvious successor to object databases as a relational challenger is the XML database. XML databases are designed specifically around the structure of XML documents, which lets them deliver faster XML performance than relational products, with much less associated administration.

With an XML database, “you can get many benefits of using a DBMS without going through the agonies of data modelling, database design and performance tuning that relational DBMS-based applications traditionally suffer,” says Mike Champion, a senior research and development advisor for new technologies at Software AG, which produces the Tamino XML database.

However, analysts foresee a narrow future for the XML database. Mike Gilpin, a research director at Forrester Research, says the problem for XML database vendors is that, given the way XML is being deployed, there is simply little need for their products. “SQL is for data and XML is for data in motion,” says Gilpin. He explains: “If you look at where XML is having an impact in the market, it’s where data is moving from one place to another.”

Gilpin concludes that XML databases will therefore have only a niche role, as a means of holding XML data temporarily as it is shuttled from one application to another. The existing RDBMS vendors have already implemented XML data-type support in their products, reinforcing the notion that they are able to keep augmenting the relational underpinnings with enhancements that support new demands.

Building associates

Building on an ageing foundation is not everyone’s notion of how database technology should develop.

Lazy Software founder Simon Williams advocates the Associative Model of Data. This offers a more flexible architecture than relational models and can dramatically reduce development times, claims Williams.

Lazy’s product, Sentences, truly reflects the relationships between items of data, he says. This enables the creation of databases in which the relationships between different entities, such as customers and the products they buy, can be put together on the fly. They do not need to be predetermined in the database structure, as they would be in a relational database.

“Rather than store data with references to other data, the Associative Model involves storing data in objects of two types: entity types and association types,” says IDC analyst Carl Olofson. The database does not need to be pre-configured either, he adds, and users can discover data without prior knowledge of the schema.
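A toy sketch may help show how such a structure differs from fixed tables. It assumes, as a simplification, that every association is a (source, verb, target) link whose source can be an entity or another association; the Entity, Association and AssociativeStore names are hypothetical, and this is not how Lazy’s Sentences is implemented. Nothing here declares a schema up front: new kinds of relationship appear simply by linking items.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Entity:
    """Something with independent existence, e.g. a customer or a product."""
    name: str


@dataclass(frozen=True)
class Association:
    """A link whose source may be an entity or another association."""
    source: Item
    verb: str
    target: Item


Item = Union[Entity, Association]


class AssociativeStore:
    """Toy store: no predefined schema, just a growing list of links."""

    def __init__(self) -> None:
        self.associations: List[Association] = []

    def link(self, source: Item, verb: str, target: Item) -> Association:
        assoc = Association(source, verb, target)
        self.associations.append(assoc)
        return assoc

    def find(self, source: Optional[Item] = None, verb: Optional[str] = None,
             target: Optional[Item] = None) -> List[Association]:
        # None acts as a wildcard, so users can explore without knowing a schema.
        return [a for a in self.associations
                if (source is None or a.source == source)
                and (verb is None or a.verb == verb)
                and (target is None or a.target == target)]


store = AssociativeStore()
alice, widget = Entity("Alice"), Entity("Widget")
purchase = store.link(alice, "bought", widget)
# An association can itself be the source of another association.
store.link(purchase, "on", Entity("2003-06-01"))
store.link(alice, "lives in", Entity("London"))

for a in store.find(source=alice):
    print(a.verb, a.target.name)  # "bought Widget", then "lives in London"
```

In a relational design, recording the date of a purchase would normally mean deciding in advance on a table and column for it; here it is just one more link.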

But Fabian Pascal, an associate of relational database guru Chris Date, takes a distinctly hostile view of the Associative Model. He does not believe that it even represents a model of data, let alone an alternative to relational, which is underpinned by the robust mathematics of set theory.

“You’ve got to understand what a data model is and what the relational data model is. Unfortunately, you don’t,” Pascal told Williams in a typically blunt online exchange.

Both Pascal and Date are regarded in the industry as relational database zealots, critical even of the major vendors’ products for not being sufficiently relational.

And despite the associative model’s ease of use, users have not exactly flocked to it either – Lazy has yet to reach three figures in terms of customers, and those it does have are engaged in distinctly low-end projects.

Commercial challenges, however, may not be the main threat to the leading relational database vendors.

In the same way that Linux has taken on Unix and Windows, the established commercial database products will increasingly come up against open source databases, most notably MySQL, which is already popular as a web site back end. It also ships as standard with Novell’s NetWare network operating system.

With around four million installations and 30,000 downloads per day, there are more deployments of MySQL than there are of Sentences, Tamino and Caché put together. That, more than any revolutionary new model, is what will keep Ellison looking over his shoulder.


Ben Rossi

