In order to get the best deal from its suppliers, it stands to reason that an organisation needs to know how many suppliers it uses, and on what terms. But when Brunswick, a $3.8 billion US-based manufacturer of recreational equipment from boats to billiards tables, attempted to calculate the size of its supplier base,
it found that its initial estimate of between 10,000 and 12,000 suppliers was wildly off the mark. In fact, the true figure was closer to 30,000.
The reason for the discrepancy was historical: Brunswick built its business through acquiring smaller companies, which it incorporated into its overall structure as new divisions. Each division had its own systems, its own suppliers – and its own supplier data. Only by consolidating this data using extraction, transformation and loading (ETL) software from data analytics specialist Informatica, and then analysing it using Informatica’s analytic applications, was Brunswick able to get a clearer picture of its procurement activities.
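The Brunswick discrepancy can be illustrated with a toy example (hypothetical data): each division keeps its own supplier list, so the same supplier may appear under slightly different names, and neither a per-division count nor a naive union of raw strings gives the true figure.

```python
def normalise(name: str) -> str:
    """Canonicalise a supplier name for comparison."""
    return " ".join(name.lower().replace(",", "").replace(".", "").split())

# Hypothetical per-division supplier lists.
divisions = {
    "boats":     ["Acme Fasteners Inc.", "Marine Paints Ltd"],
    "billiards": ["ACME Fasteners, Inc", "Felt & Cloth Co"],
    "fitness":   ["Steel Tubing GmbH", "Marine Paints Ltd."],
}

# Summing per-division counts double-counts shared suppliers; a raw union
# over-counts because of formatting differences. Normalising first gives
# the consolidated figure.
raw_total = sum(len(names) for names in divisions.values())
distinct = {normalise(n) for names in divisions.values() for n in names}

print(raw_total, len(distinct))  # 6 records resolve to 4 distinct suppliers
```

At Brunswick's scale the same effect ran the other way: consolidation surfaced suppliers that no single division's estimate had captured.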
Like Brunswick, most organisations find that achieving a ‘single version of the truth’ relies heavily on successful data migration. In fact, along with data cleansing and analysis, ETL is key to a successful business intelligence project. In the ETL process, data is first acquired from its input source, validated and transformed according to particular job specifications, and then loaded in a standard format to a new repository.
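The three stages described above can be sketched in a few lines. This is a deliberately minimal illustration with hypothetical supplier records, not a stand-in for a commercial ETL tool: acquire the data, validate and transform it to the job specification, then load it into a standard target.

```python
import csv
import io
import sqlite3

RAW = """supplier_id,name,spend
101,Acme Fasteners,1200
,Unknown Co,500
102,Marine Paints,900
"""

def extract(source: str) -> list[dict]:
    """Acquire rows from the input source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(source)))

def transform(rows: list[dict]) -> list[tuple]:
    """Validate and normalise: drop rows without an ID, coerce types."""
    out = []
    for r in rows:
        if not r["supplier_id"]:
            continue  # failed validation; a real job would log or route this
        out.append((int(r["supplier_id"]), r["name"].strip(), float(r["spend"])))
    return out

def load(rows: list[tuple]) -> sqlite3.Connection:
    """Load into the standard-format repository (here, an in-memory table)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE suppliers (id INTEGER, name TEXT, spend REAL)")
    con.executemany("INSERT INTO suppliers VALUES (?, ?, ?)", rows)
    return con

con = load(transform(extract(RAW)))
print(con.execute("SELECT COUNT(*), SUM(spend) FROM suppliers").fetchone())
# (2, 2100.0) — the invalid row was rejected during transformation
```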
However, while the basic objective of ETL – consolidating data – is straightforward enough, the process is complex and time-consuming, even with the help of tools that to some extent automate it. Before a byte can be shifted, technology decision-makers need to assess the attributes of data residing on disparate legacy systems, and decide which tools are best suited to the task. And even at that early stage, the real problems quickly become apparent. “It’s no good carrying out an ETL project if you are taking inaccurate data and just combining it in different ways. When you try to get a single view of anything it means combining data from multiple source systems, and that is only easy if the source data has standardised values,” points out Jay Huff, director of business development and marketing at Ascential. Since between 40% and 70% of business intelligence project budgets are typically set aside for data migration, mistakes are likely to prove costly.
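Huff's point about standardised values can be sketched as follows: two source systems encode the same attribute differently, so a naive union of their records produces apparent duplicates, while a reconciliation mapping (the codes here are hypothetical) resolves them before the merge.

```python
# Mapping of source-system country codes to one canonical form.
CANONICAL = {"GB": "GB", "UK": "GB", "U.K.": "GB", "US": "US", "USA": "US"}

# The same two suppliers, encoded differently by two source systems.
system_a = [("Acme", "UK"), ("Bolt Co", "USA")]
system_b = [("Acme", "GB"), ("Bolt Co", "US")]

def standardise(rows):
    """Rewrite each record's country code to its canonical value."""
    return {(name, CANONICAL[country]) for name, country in rows}

# Without standardisation the union would hold four 'different' records;
# with it, the two systems agree.
merged = standardise(system_a) | standardise(system_b)
print(sorted(merged))  # [('Acme', 'GB'), ('Bolt Co', 'US')]
```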
Many IT departments still handle data migration manually by coding their own migration routines. However, success on one project or type of data can lead organisations to assume they can apply the same routines and procedures to other corporate data resources. “[These companies] do a small data mart, create programs that seem to work quite well, so they then try to scale that up,” says Huff. “We have been into [clients] that have literally 300 or 400 extract/transform teams, and have data staging areas sitting on lots of different databases, and they end up with a little cottage industry within the business.”
“The ETL process can become a beguiling end in itself,” agrees Huw Ringer, business director at systems integration company Lateral. “One client of ours spent two years and millions of dollars purely figuring out how to get ‘end to end’
metadata into and out of their chosen ETL tool without ever actually transferring a single byte of real live data onto the target platform that the business could use.”
Five years ago if an organisation could support flat files, then it could – with the right in-house skills – probably handle ETL on its own, says Ringer. But as organisations increasingly integrate external data from customers, suppliers, and distributors with their own, the situation has become exponentially more complex. “Today a lot of business to business data is XML-based, and that is not standardised, so you need the ability to take it in whatever form it comes and translate it into something more standard.”
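Ringer's point about non-standardised XML can be made concrete. Below is a sketch (the element names and schemas are hypothetical) of translating two trading partners' order formats into one standard record before the load stage, using Python's standard-library XML parser.

```python
import xml.etree.ElementTree as ET

# The same order, as two partners might send it.
DOC_A = "<order><partNo>X-17</partNo><qty>4</qty></order>"
DOC_B = "<purchase><item code='X-17' quantity='4'/></purchase>"

def to_standard(xml_text: str) -> dict:
    """Translate a known partner schema into the standard internal record."""
    root = ET.fromstring(xml_text)
    if root.tag == "order":
        return {"part": root.findtext("partNo"), "qty": int(root.findtext("qty"))}
    if root.tag == "purchase":
        item = root.find("item")
        return {"part": item.get("code"), "qty": int(item.get("quantity"))}
    raise ValueError(f"unknown schema: {root.tag}")

# Both forms translate to the same standard record.
print(to_standard(DOC_A))  # {'part': 'X-17', 'qty': 4}
print(to_standard(DOC_B))  # {'part': 'X-17', 'qty': 4}
```

The awkward part in practice is not the parsing but maintaining a translation per partner schema, which is exactly the work the XML adapters discussed below aim to absorb.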
This is an issue that will take organisations some time to tackle. According to a recent report from market research company Giga Group, “XML adapters that enable the major ETL vendors to consume and produce XML out of the ETL engine have been shipping for more than a year, although the adoption is still sparse. This is because XML is such a discontinuous technology that end-user installations are still learning (and defining) the rules of the game”.
The ETL market has grown considerably over the last few years – from a core group of ETL specialists to a market that comprises vendors with a range of approaches and perspectives (see box, The ETL landscape). “If you look at all of the companies in the marketplace, you’ll see they have different heritages,” says Goldsbrough. As a result, customers find that a single ETL solution only goes part of the way to addressing their needs. “Every ETL solution out there is pretty good at extraction. The question you have to ask is how many sources can it extract from?” says Huff.
Kevin Magee, sales director at Information Builders, says that these concerns have prompted the company to take a different approach to ETL with its iWay software. In particular, it is challenging the established concept of physically moving disparate data into a single repository. “Many times this is the best solution. But a single view can also be achieved through combining middleware with the metadata layer,” he says. “When integrated with middleware, the metadata layer provides a single view of the data, wherever the data is actually stored.” Using this approach, Magee claims, iWay can access over 85 different database types on many different platforms.
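The approach Magee describes can be sketched as a federated query: the ‘single view’ is assembled on demand from sources that stay where they are, driven by a metadata layer that records which logical fields live in which system. The source systems and field mappings below are hypothetical stand-ins, not iWay's actual API.

```python
# Metadata layer: where each logical field lives and how to fetch it.
# The lambdas stand in for middleware calls to live source systems.
SOURCES = {
    "erp": {
        "fetch": lambda sid: {"name": "Acme Fasteners", "terms": "NET30"},
        "fields": ["name", "terms"],
    },
    "crm": {
        "fetch": lambda sid: {"contact": "j.smith@acme.example"},
        "fields": ["contact"],
    },
}

def single_view(supplier_id: str) -> dict:
    """Combine fields from all registered sources without moving the data."""
    view = {"supplier_id": supplier_id}
    for meta in SOURCES.values():
        record = meta["fetch"](supplier_id)
        view.update({field: record[field] for field in meta["fields"]})
    return view

print(single_view("S-101"))
```

The trade-off against physical consolidation is the usual one: no staging repository to build or refresh, but every query depends on the source systems being reachable and fast.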
Ascential is also focussing on offering a so-called ‘end-to-end’ product that addresses a wider range of ETL needs. In the past year, it has made a number of key company acquisitions to that end, including its recent acquisition of data quality and cleansing specialist Vality Technology. “We think we have all the pieces now,” says Jay Huff. “When I joined Ascential about eighteen months ago, I was surprised to see a data integration market as fragmented as it was. You have ETL vendors, profiling vendors, and [companies specialising in] data quality. It makes no sense to me, as the process demands all of those things.” And as the process becomes exponentially more complex, the bewildering choice faced by IT decision-makers is unlikely to become clearer.