Solid structures

In January 2007, the BBC’s website sounded the death knell for the floppy disk: a major computer retailer, PC World, was to stop selling the devices once stocks had run out. The lesson is simple: technologies have a limited shelf life.

Yet one veteran of the technology industry, the relational database management system (RDBMS), defies the pattern of long-established technologies, such as floppies or even computer punch cards, becoming information age relics.

From its conception, the relational model – the brainchild of British-born computer scientist Dr Edgar ‘Ted’ Codd – transformed database thinking, moving beyond the hierarchical approach that had predominated. A new structured query language (SQL) could select, update, insert and delete data across a series of inter-linked tables, instead of the structure of data being defined within each application.
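The four operations named above can be illustrated with a minimal sketch. It uses SQLite through Python's standard sqlite3 module purely for convenience; the customer and purchase tables, their columns and the function name are hypothetical and do not come from any system mentioned in this article.

```python
import sqlite3


def run_crud_demo():
    """Apply insert, update, delete and select to two linked tables."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("PRAGMA foreign_keys = ON")
    # The structure of the data lives in the database, not in the application.
    conn.executescript("""
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE purchase (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            amount      REAL NOT NULL
        );
    """)
    # INSERT: one customer with two purchases linked to it.
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme Ltd')")
    conn.executemany(
        "INSERT INTO purchase (customer_id, amount) VALUES (?, ?)",
        [(1, 120.0), (1, 80.0)],
    )
    # UPDATE: correct one amount.
    conn.execute("UPDATE purchase SET amount = 100.0 WHERE amount = 80.0")
    # DELETE: remove the other purchase.
    conn.execute("DELETE FROM purchase WHERE amount = 120.0")
    # SELECT: join the two tables through the shared key.
    rows = conn.execute("""
        SELECT c.name, p.amount
        FROM customer AS c
        JOIN purchase AS p ON p.customer_id = c.id
        ORDER BY p.amount
    """).fetchall()
    conn.close()
    return rows
```

Any application that can issue SQL could run the same statements against these tables, which is the practical meaning of keeping the data structure outside individual programs.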

Not only has Codd’s model survived well into 2007, but his seminal 1970 paper, A Relational Model of Data for Large Shared Data Banks, spawned the beginnings of today’s database giants, including Oracle, IBM’s DB2, Microsoft’s SQL Server and Sybase. Together with Teradata, these vendors account for over 90% of today’s database market.

“The RDBMS model has come to dominate the thinking of the database industry, with the two being almost synonymous,” says David Mitchell, software practice leader at analyst firm Ovum. “RDBMS is the dominant model for structured data, with simple file systems still being used for the majority of less structured data,” he adds.

Demand for relational databases dwarfs that for more specialised products, such as object databases. According to analyst group Gartner, RDBMS software revenue grew from $12.8 billion in 2004 to $13.8 billion in 2005, a growth rate of 8.3%. While demand for relational databases is expected to grow at a healthy rate of 7.2% until 2009, revenue from new software licences for object databases is set to keep falling at over 4% year-on-year.

As business leaders turn to information management initiatives, such as strategic data management, business intelligence and integration projects, to drive business growth, the demand for relational databases has increased. They provide an effective platform for online transaction processing (OLTP), giving systems such as business intelligence a fast and efficient way to store and query multidimensional cubes needed for analysis and reporting tasks.

The relational database ‘staging post’ will continue to be the favoured method of populating the data warehouse with production environment data until some resourceful vendor can crack the conundrum of delivering true real-time business intelligence systems directly into high-performance databases.

Different content

Despite the positive signs for the RDBMS model, there remains one very large problem: its core technology was never designed to handle today’s object-orientated structures – such as those used in Java programming, for example – nor was it intended to process XML or unstructured data such as videos, voice calls or images.

In fact, it is too simplistic to regard today’s major databases as simply relational – they support relational structures and a lot more, says Donald Feinberg, analyst at Gartner. “The one thing we have seen, and I expect to continue to see for many years to come, is that the [relational] model has been able to transform and has been flexible enough to handle everything you have needed to do so far,” he says.

The UK’s Land Registry is a case in point. During the 1990s the size of its database was growing at 20% each year. In 1998, it embarked on a project that entailed adding immensely to that growth, by storing scanned images of its records and paper documents (some dating back to 1780), so that its customers could access them online.

The images went into the database and not a content management system because “we had the skills and experience of working in the DB2 [database] environment,” says Steve Dean, physical data services centre manager at the Land Registry. “Our applications were geared around the mainframe as a data server and so it was natural for us to maintain that consistency.”

There are now over 130 million scanned documents – taking up 27 terabytes of space in the Land Registry’s database and vastly overshadowing the three terabytes of structured, relational data. But the Land Registry’s applications are not currently written to manage XML structures – the applications simply ‘chop’ the images into chunks and store them as a sequence of rows in the database tables. And when the image is viewed, the application retrieves it and ‘glues’ it back together again.
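The ‘chop’ and ‘glue’ approach can be sketched in a few lines. This is an illustration only, not the Land Registry’s DB2 implementation: it uses SQLite through Python's sqlite3 module, and the image_chunk table, the function names and the deliberately tiny chunk size are all assumptions made for the example.

```python
import sqlite3

CHUNK_SIZE = 4  # hypothetical and tiny, so the demo produces several rows


def make_image_db():
    """Create an in-memory database with one table of image chunks."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE image_chunk (
            image_id INTEGER NOT NULL,
            seq      INTEGER NOT NULL,  -- position of the chunk in the image
            chunk    BLOB    NOT NULL,
            PRIMARY KEY (image_id, seq)
        )
    """)
    return conn


def store_image(conn, image_id, data, chunk_size=CHUNK_SIZE):
    """'Chop' the image bytes into chunks and store one chunk per row."""
    rows = [
        (image_id, seq, data[offset:offset + chunk_size])
        for seq, offset in enumerate(range(0, len(data), chunk_size))
    ]
    conn.executemany(
        "INSERT INTO image_chunk (image_id, seq, chunk) VALUES (?, ?, ?)",
        rows,
    )


def load_image(conn, image_id):
    """'Glue' the chunks back together in sequence order."""
    cursor = conn.execute(
        "SELECT chunk FROM image_chunk WHERE image_id = ? ORDER BY seq",
        (image_id,),
    )
    return b"".join(row[0] for row in cursor)
```

The sequence number is what makes reassembly reliable: the database does not guarantee row order on its own, so the application has to ask for the chunks in order.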

However, many businesses would find forcing such unstructured data into the database too much like hard work, says Mike Fuller, director of UK marketing at database vendor InterSystems. The ‘shredding’ of XML documents into rows and tables introduces inefficiencies and high overheads into the lifecycle of the software running on the database.

“Converting objects to SQL, or SQL to XML, which developers want to do today in a modern web development means the cost of design and programming change. So you get project overruns increasing the cost of software, and a lag between what the user wants and what the developer can deliver because of this ongoing ‘treacle’ [between converting data types],” he adds.
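To make the ‘shredding’ overhead concrete, the sketch below breaks a small XML document into relational rows and then rebuilds it. It is a hedged illustration using Python's standard xml.etree.ElementTree and sqlite3 modules; the order document, the table names and the function names are invented for the example and do not represent any vendor's product.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document used only for this illustration.
ORDER_XML = '<order id="42"><item sku="A1" qty="2"/><item sku="B7" qty="1"/></order>'


def make_order_db():
    """Create the relational tables that the XML will be shredded into."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY);
        CREATE TABLE order_item (
            order_id INTEGER NOT NULL REFERENCES orders(id),
            sku      TEXT    NOT NULL,
            qty      INTEGER NOT NULL
        );
    """)
    return conn


def shred_order(conn, xml_text):
    """Parse the XML and store each element as a row."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
    conn.executemany(
        "INSERT INTO order_item (order_id, sku, qty) VALUES (?, ?, ?)",
        [(order_id, item.get("sku"), int(item.get("qty")))
         for item in root.findall("item")],
    )


def rebuild_order(conn, order_id):
    """Convert the rows back into an XML string for the application."""
    root = ET.Element("order", id=str(order_id))
    query = "SELECT sku, qty FROM order_item WHERE order_id = ? ORDER BY rowid"
    for sku, qty in conn.execute(query, (order_id,)):
        ET.SubElement(root, "item", sku=sku, qty=str(qty))
    return ET.tostring(root, encoding="unicode")
```

Even in this toy case the application carries two hand-written conversions, one in each direction, and both have to change whenever the document format does.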

IBM claims to have bridged the gap between relational and XML data types by storing both in one repository, using the same database manager. It does this by using open application programming interfaces (APIs) to access either XML or relational information, which can be pushed into applications through standard SQL statements without coding the logic into the application itself.

XML is a fundamentally different data structure than relational, but “customers have not really been integrating XML information efficiently into their applications,” says Alyse Passarelli, director of IBM’s information management unit. IBM’s newest versions of its DB2 database “allow you to store this hybrid model of both XML and relational information,” she says.

Other database vendors also offer XML functionality alongside relational data types. Oracle, for example, uses the notion of data duality, where SQL – the workhorse of the relational model – has been extended to support different data types such as XML. This means that SQL programmers can access both relational and XML data types using SQL, while XML has also been extended to allow SQL-type operations.

“Data formats such as XML present challenges for some relational databases.”

Alyse Passarelli, IBM

New job roles

As the relational model has evolved to include new data structures, so too has the role of the database administrator (DBA). Many of the DBA’s tasks, such as back-up, tuning and monitoring, are increasingly being automated, and scripts can also automate functions such as data loading and validation.

The Land Registry’s Dean explains that automating functions is essential if its DBAs are to manage the ever-increasing data volumes, and to do so under the pressures of 24/7 operation.

Having freed up some time, DBAs are increasingly focused on business, not just technical issues. “If you as a DBA prefer talking to a computer as opposed to talking to a person then you’ve got a problem,” says Gartner’s Feinberg. “Somebody has to set [the databases] up, but a DBA’s value and understanding of databases is not managing the physical database on the storage device, but in fact using their expertise to help create much more efficient and optimised databases,” he says.

As organisations continue to look at more initiatives to save money within the corporate data centre, factors such as database consolidation and standardisation will also feature as part of the DBA’s role.

But despite the changing environment, the importance of Ted Codd’s ideas persists: the relational model will continue to be an important component of the corporate architecture for many years to come.

The rise of embedded databases

Embedded databases have fast become an integral part of many applications, and they are found on a wide range of portable devices such as mobile phones, handheld scanners, networking equipment, telecoms devices and even car engine management systems.

Unlike the relational database management system that can support several applications simultaneously, an embedded database is built directly into one client application.

Embedded databases do not need to connect to a server across a network to access information, making them both portable and versatile. And unlike their relational counterparts, they do not require a heavily-coded interface, built between the application and database, to deal with multiple commands from multiple applications. This makes them capable of running at very high speeds, and eliminates the need for human administration.
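A minimal sketch shows what ‘built directly into one client application’ looks like in practice. It uses Python's standard dbm.dumb module, a simple key-value store that runs inside the calling process, as a stand-in for a commercial embedded database; the sensor name, file name and function names are hypothetical.

```python
import dbm.dumb
import os
import tempfile


def record_reading(path, sensor, value):
    """Write a reading straight to local files: no server, no network."""
    with dbm.dumb.open(path, "c") as db:  # "c" creates the store if needed
        db[sensor] = str(value)


def read_reading(path, sensor):
    """Read a value back from the same in-process store."""
    with dbm.dumb.open(path, "r") as db:
        return float(db[sensor].decode("utf-8"))  # values come back as bytes


def demo():
    """Store and fetch one reading in a temporary directory."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "readings")
        record_reading(path, "engine_temp", 92.5)
        return read_reading(path, "engine_temp")
```

The database here is just a library call inside the application, which is why there is nothing to connect to and nothing for an administrator to manage.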

Currently the market for embedded databases is rocketing: IT market watcher Ovum reports growth of 35% per year. And as Oracle’s 2006 purchase of Sleepycat Software, maker of the embedded database Berkeley DB, demonstrates, the large database vendors have been quick to spot an opportunity.

The licensing of embedded databases adds to the appeal. For an embedded application, licensing is usually very complex, says Ovum’s David Mitchell. The licensing models around an embedded database are more readily understood because licences can be created for the business application, instead of worrying about licences for the infrastructural layer, he says.

The new challengers

Open source has become an accepted part of software deployment within today’s enterprise. A trail has been blazed by the Linux operating system and the Apache web server. Is the database market now ripe for open source?

In just over a decade the popular MySQL open source database has joined the lexicon of open source products pushing to become viable alternatives to proprietary software. According to IT advisory group Forrester Research, the open source database market – including support and services – is currently worth $300 million. It expects that figure to rocket to $1 billion by 2008.

One reason for this growth, says Andy Astor, CEO of database vendor EnterpriseDB, is that open source has reduced the supply-side barriers to entry in many markets, enabling new suppliers to compete in existing, mature product categories. EnterpriseDB is an example of how young companies are using the open source model to quickly establish themselves. The three-year-old vendor builds proprietary features, such as Oracle migration tools, on top of the freely available PostgreSQL database, allowing EnterpriseDB to greatly undercut the prices of its better established competitors.

The lack of upfront purchase fees and the relative cheapness of the management tools needed for database tuning and maintenance add to the appeal.

However, open source databases are still used mainly within an experimental or simple production environment, says Donald Feinberg, analyst at Gartner. And he warns that they should only be used in non-mission critical projects until they mature further.

That maturation will likely take three to five years, he adds, during which time proprietary databases will continue to offer better support, scalability and a greater breadth of management tools.

Nevertheless, open source vendor MySQL, for example, is already the third most widely deployed database in non-mission critical production or development environments – beating IBM, Sybase and Teradata (see graph). And its installed base, currently estimated at 10 million users, includes many thought-leading organisations, such as Google, Yahoo and Wikipedia. Success at such companies could provide a launch pad for widespread enterprise adoption.

Further reading in Information Age

Relational database management – April 2005

 

Ian Cowley
