Polk position

During the past two and a half years, Michigan-based marketing information distributor R L Polk has been on what Gary Rosteck, its data quality manager, refers to as a “gruelling journey”.

Polk is one of the oldest and most-established providers of marketing information and services to the US vehicle industry – everything from the number of hybrid cars on the road to the revenues generated by car sales online – and it holds more than 1.5 quadrillion pieces of information in its repository. That resource is fed by no fewer than 250 different sources – insurance companies, manufacturers and creditors – each using its own data formats and schemas. To create its range of information products and services, Polk has had to rationalise, scrub and repackage these sources.

For nearly 30 years, batch-based IBM mainframe applications underpinned these processes. As a result, Polk’s IT environment became inflexible and extremely complex. Furthermore, it was heavily reliant on legacy skills. “We had a good retention rate, but we’d got to the point where it was, ‘Only George knows how to do this or that’,” says Rosteck. “It was just too dependent on people.”

Under project ‘ReFuel’, which boasted a $20 million budget, Polk set about re-engineering its processes for capturing, standardising, enhancing and storing data – with a goal of improving speed and efficiency by 50% and, critically, bringing data accuracy up to 100%.

“Data quality is everything. We win or lose multi-million dollar contracts on a 2% data quality margin,” says Rosteck. “So it wasn’t just about improving data accuracy, it was also about competitive advantage going forward.”

In order to come close to what Rosteck calls the ‘50-50-100’ goal, Polk had to both simplify and speed up its ageing IT estate. This meant moving off the mainframe – a formidable task that the company had unsuccessfully attempted twice before – and onto a grid computing system.

From the outset Polk applied a data governance framework in order to assess the end-to-end processes and ensure that the quality of the data would be checked at each stage. The team then mapped the pieces that the new system – dubbed the ‘Data Factory’ – would require. In the new environment, says Rosteck, flexibility would be absolutely key. “So we knew it had to be based on a service-oriented architecture (SOA) for agility, particularly as we were looking to future scale. This was not a one-time corrective fix. The project was about allowing the company to grow.”

For the sake of speed, Polk wanted to buy components where possible, with the IT team building the rest. To this end, Polk chose Dell servers running Linux, which were configured into separate grids. For its SOA backbone, the company selected Tibco’s BusinessWorks messaging bus, integrating a data quality solution – dfPower Studio from DataFlux – into that.

“Right out of the gate DataFlux supported the SOA, which meant we could plug in the new services very quickly and flexibly,” says Rosteck. “We had a lot of high-level complexity built into Polk’s systems, and we had to be able to replicate that very quickly and efficiently in the new environment.”

Using data migration software from Informatica, Polk is now able to translate all received data into XML.

The DataFlux system is then used to cleanse that data. Because dfPower Studio offers a user-configurable, GUI-based interface, end users can profile data on the fly. This capability has dramatically improved performance, with Polk now processing 10 million transactions per day.

“We can call up the data and look for trends, frequency distribution and so on. It used to take days for someone to programme an analysis on a data file – now we can do it in minutes,” says Rosteck. When errors are flagged up for investigation, they are passed into a common exceptions process that Polk has wrapped around the DataFlux system, he adds.
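For readers unfamiliar with the terms, the kind of profiling Rosteck describes can be illustrated with a minimal Python sketch. It is not Polk's or DataFlux's implementation: the record layout, the field names and the `profile_field` and `split_exceptions` helpers are all hypothetical, chosen only to show a frequency distribution being computed and out-of-range records being set aside for an exceptions process.

```python
from collections import Counter


def profile_field(records, field):
    """Return a frequency distribution of the values seen in one field."""
    return Counter(rec.get(field) for rec in records)


def split_exceptions(records, field, allowed):
    """Separate records whose field value is allowed from those that are not."""
    clean, exceptions = [], []
    for rec in records:
        (clean if rec.get(field) in allowed else exceptions).append(rec)
    return clean, exceptions


# Hypothetical sample data; real feeds would arrive from many sources.
records = [
    {"vin": "A1", "fuel": "petrol"},
    {"vin": "B2", "fuel": "hybrid"},
    {"vin": "C3", "fuel": "petrol"},
    {"vin": "D4", "fuel": "petrl"},  # misspelt value, should be flagged
]

distribution = profile_field(records, "fuel")
clean, exceptions = split_exceptions(
    records, "fuel", allowed={"petrol", "diesel", "hybrid"}
)

print(distribution.most_common())
print(exceptions)
```

In this sketch the misspelt value shows up as a rare entry in the distribution, and the record carrying it is routed to the exceptions list rather than being silently dropped.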

Armed with the new infrastructure and the re-engineered processes, Polk has successfully met its ‘50-50-100’ goal, enabling it to cut resource usage and transfer staff to different business units. In addition, migrating off the mainframe has provided the company with software as well as hardware savings.

“We now have a consolidated view of our informational assets – and you can’t put a price tag on that,” Rosteck adds. “We were able to take a ‘Big Bang’ approach because we had significant buy-in from management. Now we’ve made that investment, we have a solid foundation for the future.”

Henry Catchpole

Henry Catchpole runs Inform Direct, a provider of company records management software that simplifies the process of dealing with Companies House. The business was set up in 2013.
