Every single transaction. Every key the check-out worker punches. Every refund processed. Every opening of the till drawer and the time it remains open.
One US retailer is capturing all that data at every retail point, at every one of its supermarket branches, every retail day of the year, for all 15,000 of its till operators.
The purpose of sucking up such vast amounts of microscopic detail? The US retailer (which asked not to be identified for fear of alienating check-out staff) is searching for employee fraud by analysing behaviour, comparing the cash register log to punched-key data, and spotting anomalies that might indicate missing cash or goods.
The ‘shrinkage’ the supermarket chain is trying to eliminate may account for only around 1% of revenues, but in a sector where low single-digit margins are the norm, that can be the difference between profit and loss.
Data warehousing on such a scale – and at such a cost – may seem to be overkill in an era of IT budget constraints, but it is a symptom of the pressure organisations are under to support faster, more widely spread and more accurate decision making – factors that become more rather than less important in tougher economic times.
The sheer volume of data available for analysis is staggering – and, if not corralled and made accessible, is overwhelming. A recent poll by market research company BuzzBack of 158 senior executives at major companies
(those with revenues of more than $500 million) found that 59% perceived that the amount of data available to them for decision-making was doubling or tripling each year. Some described the sensation as “drowning” or “swimming” in data; others said they felt “frozen”, finding it hard to act because of either conflicting data or data that doesn’t reach them in time to help with key decisions.
“Businesses are undergoing a fundamental shift in the way they make decisions. In today’s environment, decision making occurs more frequently and at all levels of an organisation. It is no longer a semi-regular senior management activity,” say analysts at market watcher IDC.
“The trend towards the democratisation of information and broader decision-making responsibilities demands timely delivery of relevant information to each decision maker,” they add.
That is backed up by the results of the BuzzBack survey. It showed that 73% of respondents felt they were making more daily decisions than a year ago and 53% had less time to make those decisions.
The upshot: missed opportunities for the business – a feeling that was expressed by 49% of respondents.
But companies are not just looking to tease better insight from the vast amounts of detailed data they are warehousing; they are also making the resource available to a wider group of decision makers and ensuring that the data is as up to date as possible.
That need for fast and widely distributed decision making is not always well served by the traditional ‘snapshot’ approach to data warehousing. Currently, the dominant method of replenishing data warehouses and data marts is to use extraction, transformation and load tools to pull data from source systems periodically – at the end of the day, week or month.
Moving forward from that, so-called ‘active data warehouses’ draw live data from transaction systems using a more continuous approach such as a message bus, and therefore refresh the warehouse on a much more frequent basis.
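The difference between the two replenishment styles can be sketched roughly as follows. This is a simplified, illustrative Python model only – an in-memory queue stands in for the message bus, and all table, field and function names are invented for the example:

```python
import queue

# A toy 'warehouse' table, and a queue standing in for the message bus.
warehouse = []
message_bus = queue.Queue()

def batch_etl(source_rows):
    """Snapshot approach: extract, transform and load a whole batch of
    source records periodically - at the end of the day, week or month."""
    transformed = [{"sku": r["sku"], "qty": r["qty"]} for r in source_rows]
    warehouse.extend(transformed)

def continuous_refresh():
    """Active approach: drain whatever transactions have arrived on the
    bus, so the warehouse is refreshed on a much more frequent basis."""
    while not message_bus.empty():
        r = message_bus.get()
        warehouse.append({"sku": r["sku"], "qty": r["qty"]})

# Transactions trickle onto the bus as they happen at the tills...
message_bus.put({"sku": "A100", "qty": 2})
message_bus.put({"sku": "B200", "qty": 1})
continuous_refresh()   # ...and are applied within minutes, not days.
```

In practice the 'bus' would be messaging middleware and the transformation logic far richer, but the structural point is the same: the active warehouse consumes a stream rather than waiting for a scheduled bulk load.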
The aim is pretty transparent. With more up-to-date data, employees such as shipping clerks, customer service staff or call centre agents can run queries on customer, order or schedule information that is only minutes old. An airline gate attendant, for example, would be able to decide which passenger gets a seat on an overbooked flight by running a quick check on which has the best frequent flier profile. Or a call centre agent can try to push through a sale based on customer information gleaned moments before when the customer visited the company web site.
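The gate attendant's decision in that example boils down to a simple ranking over fresh customer data. As a hypothetical sketch (the records and field names here are invented; a real system would query the warehouse for them):

```python
# Hypothetical standby list pulled from a near-real-time warehouse query.
standby = [
    {"name": "Jones", "tier_miles": 12_000},
    {"name": "Patel", "tier_miles": 85_000},
    {"name": "Lee",   "tier_miles": 40_000},
]

def best_frequent_flier(passengers):
    """Pick the passenger with the strongest frequent-flier profile,
    approximated here by tier-qualifying miles."""
    return max(passengers, key=lambda p: p["tier_miles"])

winner = best_frequent_flier(standby)   # -> the 'Patel' record
```

The decision logic itself is trivial; the point of the active warehouse is that the data it runs against is only minutes old.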
Two of the pioneers of data warehousing – retail giant Wal-Mart and systems vendor Dell Computer – are turning their vast stores of data into ‘live’ decision-making engines.
Wal-Mart – which already refreshes its 300-terabyte (TB) warehouse with new and updated records every 10 minutes – is currently discussing with its warehouse technology supplier NCR Teradata how to cut that cycle down to two minutes.
As always with Wal-Mart, the cost justification comes from the refinement of the company’s supply chain and the fine-tuning of profitability analysis from regional levels right down to the aisle and product level. “They want to take
further cost out of the supply chain by getting the right goods to the right people at the right time,” says NCR CEO Mark Hurd.
Dell has similar goals, aiming to ensure that data in the warehouse is only at most an hour old (see box, In practice).
Both companies have built enterprise-wide data warehouses, essentially ‘one version of the truth’. But for most, analytical data is still distributed across different parts of the organisation – resulting in duplication and unreliability.
End of the mart
In the late 1990s and into this decade, providing staff with decision-making power often meant delivering a data engine that was fit for the task in hand but which ultimately stood apart from the other corporate data structures.
Such data marts – single-subject, decentralised databases – typically contain application-specific and aggregate or summary data, not detailed data. In many cases, different data marts also contain duplicate data – customer detail, name and address, income, credit history, and so on. But because they sit apart from each other and are often populated at separate times, they produce contradictory analyses for different groups and result in considerable duplication of effort and cost.
Moreover, data marts, though conceptually alluring, have become notoriously expensive to maintain. The cost of running each data mart is put at between $1 million and $2 million per year, say analysts at Meta Group. “It’s the long-term support cost of marts that really eats you alive – DBAs, systems admins, network cost, moving all the data to marts, all the data preparation, the mainframe chargebacks, the maintenance pricing on the software and hardware,” says Stephen Brobst, chief technology officer at NCR’s Teradata unit.
Though many large companies have made the decision to move to an enterprise data warehouse, the deployment of data marts does not seem to have stopped. While in 2002, 31% of respondents to the BuzzBack survey reported their organisations had anything between 11 and 100 data marts deployed, in 2003 that figure rose to 38%. One factor there may simply be greater awareness of their existence.
“Very often these marts are not something that IT has built, but something that is hidden under someone’s desk in marketing or risk management or something like that. But these things, even though they are in the dark, grow like mushrooms,” says Brobst.
In any case, existing data marts are jealously guarded by departments. The reason for that is clear.
Data marts are cheaper and quicker to deploy and are often funded out of a departmental budget. And because the department feels it owns the data, any suggestion that they should be closed down and absorbed into an enterprise data warehouse is nothing short of a political act.
“It is only in recent years when the budget pressure has been more intense, that the need for thriftiness in managing the budget has been able to overwhelm the politics,” says Brobst.
“Data mart consolidation is one of the top three IT projects that can result in cost savings,” says Brobst. And there is plenty of evidence that consolidating marts into a warehouse pays off – despite the often-substantial effort involved. Studying the data mart consolidation at US mobile telecoms company GST, researchers at the Kellogg School of Management concluded that the three-year switch to an enterprise data warehouse, from scores of data marts, produced a return on investment of 65% and saved GST $27 million in just one year. It is a similar story at Bank of America, which claims to have saved tens of millions of dollars by consolidating its scores of data marts over an 18-month period.
Those kinds of moves signal that, at most large organisations, data marts are being rolled up into enterprise data warehouses – a single data repository containing consistent data from and about the whole company, or at least all of a major division. The aim is to provide multiple business functions and departments with different views of the same data. The underlying detailed data, however, is stored only once.
“The drive is towards the elimination of duplicate data,” says Randy Mott, CIO of Dell and previously CIO of Wal-Mart.
Six out of ten large companies, judging from the BuzzBack sample, are currently investing in enterprise data warehousing technology, and another fifth say they will do so within the next two years.
Of course, running an enterprise data warehouse is hardly cost-free. For ongoing support of a large data warehouse, analysts talk of an average of $500,000 per year per ‘subject area’, and a typical data warehouse will have six subject areas. As high as that seems, it is still only equivalent to maintaining two to three data marts, says Teradata’s Brobst.
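The arithmetic behind that comparison is straightforward – a back-of-the-envelope check using only the figures quoted above:

```python
# Figures quoted in the article.
warehouse_cost_per_subject = 500_000                    # $ per subject area per year
subject_areas = 6                                       # typical warehouse
mart_cost_low, mart_cost_high = 1_000_000, 2_000_000    # $ per mart per year (Meta Group)

# Total annual support cost for the warehouse: $3m.
warehouse_total = warehouse_cost_per_subject * subject_areas

# How many data marts would cost the same to run?
marts_equiv_low = warehouse_total / mart_cost_high      # at $2m per mart: 1.5
marts_equiv_high = warehouse_total / mart_cost_low      # at $1m per mart: 3.0
```

So a six-subject-area warehouse costs roughly what running between one-and-a-half and three marts does – broadly the "two to three" Brobst cites – while replacing potentially dozens of them.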
At the same time, as organisations are consolidating data marts they are also often building operational data stores (ODSs) – replications of the transaction processing system that are used for tactical decision making and operational reporting. But the stress should be on tactical, say analysts.
“An ODS is like a data mart in sheep’s clothing,” suggests Brobst. “Three years from now, those organisations building ODSs will be consolidating these ODSs into the data warehouse.”
The industry’s aim is clear: to enhance the data warehouse capability with new service levels for data freshness and performance that can support both strategic and tactical decision making. At this stage, not all technologies are capable of supporting that active, enterprise data warehousing, but the direction being set by the trailblazers, and the business benefits they are seeing, will cause many to follow.