The value of bringing structured and unstructured information together

It is half past four. Regional sales manager John Smith has just received a call from his sales director requesting a progress meeting early the next morning. He promptly accesses the company’s business intelligence (BI) system and, as if by magic, it collates transactional data from the customer relationship management (CRM) system to reveal that one member of his sales team may be about to miss their target.

In a traditional business intelligence system, that is where the process ends. While BI tools may provide a view of the business at a glance, they don’t necessarily provide enough context to really solve the problem.

By using a search tool through his customised portal to drill down into related unstructured data – email, instant messaging exchanges, voice-over-IP calls – Smith can find out a little more.

He soon sees that the sales representative who is underperforming has sent 83 emails to one particular customer. That might raise an alert, says Sid Probstein, vice president of the CTO office at Fast Search & Transfer, who provides this example.

“We could then use sentiment analysis to see that 60% of those 83 emails were negative in tone,” adds Probstein. “Because you have the structured and unstructured data together, you can get exactly the right answers on the fly.” In this case, the sales representative has spent too much time in an unresolved dispute with a single customer when he should have been selling elsewhere.
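The combination Probstein describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales records, the emails, the word list and the function names (`is_negative`, `flag_reps`) are all hypothetical, and the crude keyword check stands in for real sentiment analysis rather than reproducing any vendor's technology.

```python
from collections import Counter

# Hypothetical structured data, as a CRM system might hold it.
sales = [
    {"rep": "alice", "target": 100_000, "booked": 104_000},
    {"rep": "bob", "target": 100_000, "booked": 61_000},
]

# Hypothetical unstructured data: (rep, customer, email body).
emails = [
    ("bob", "acme", "We are still unhappy about the disputed invoice."),
    ("bob", "acme", "This delay is unacceptable and the problem remains."),
    ("bob", "acme", "Thanks, the revised quote looks good."),
    ("bob", "globex", "Happy to confirm the renewal."),
    ("alice", "initech", "Great meeting, thanks for the order."),
]

# Assumed word list: a naive stand-in for a real sentiment model.
NEGATIVE_WORDS = {"unhappy", "disputed", "unacceptable", "problem", "delay"}


def is_negative(text):
    """Return True if the text contains any word from the negative list."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return bool(words & NEGATIVE_WORDS)


def flag_reps(sales, emails):
    """For each rep below target, report their most-emailed customer
    and the share of those emails that look negative in tone."""
    report = []
    for row in sales:
        if row["booked"] >= row["target"]:
            continue  # on target: nothing to investigate
        rep_emails = [(c, body) for r, c, body in emails if r == row["rep"]]
        if not rep_emails:
            continue
        customer, count = Counter(c for c, _ in rep_emails).most_common(1)[0]
        negative = sum(is_negative(b) for c, b in rep_emails if c == customer)
        report.append({
            "rep": row["rep"],
            "customer": customer,
            "emails": count,
            "negative_share": negative / count,
        })
    return report


report = flag_reps(sales, emails)
```

The structured figures decide who to look at; the email text then suggests why. A production system would replace the word list with a trained sentiment model and the in-memory lists with queries against the CRM system and the mail archive.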

This is a positive example: a demonstration that not only is unstructured information available and accessible, but that the limiting technical divide between unstructured and structured information can be overcome by integrating different systems. In fact, more sophisticated enterprise search tools and content management systems now mean that unstructured information can be managed and interrogated spontaneously, just as if it were structured.

Beyond structure

Storing structured information is a given. In most cases it is already held in the tables of a relational or multi-dimensional database that can be easily queried. The challenge for businesses now is to decide which elements of that information should be analysed, which unstructured information should be captured alongside it, and how to store and present the whole mixed picture in a way that is useful. “People have suddenly woken up to the fact that there is a lot of information in email, instant messaging and desktop files,” says Royce Bell, CEO of Accenture’s Information Management Services practice, which advises on how companies can develop and implement an overall information management strategy.

Email is one area where businesses are struggling to gain control. UK supermarket chain Somerfield, for example, found that its 3,500 users were sending and receiving over 100,000 external business emails every month, in addition to internal emails. The problem was particularly worrying because Somerfield’s managers were using email to negotiate with suppliers, says Colin Clark, corporate cost audit manager at the company, and much of the context of those transactions was going unobserved. “Nine times out of ten, critical business information resides with the corporate email system,” he adds.

Somerfield’s solution was to install a searchable email archive that now holds in excess of 11 million emails. The financial department can now accurately track negotiations with suppliers and an estimated £3 million has been saved over two and a half years, says Clark.

Emails are also a major issue for regulators. “If you look at a number of court cases recently, [regulators] are starting to go through email in a big way and analysing them for patterns,” says Stephen Gallagher, senior director of Information Management Services at Accenture. “The analysis of unstructured information is becoming key.” Instant messages and voice calls, especially if they are ‘packetised’ and stored in a database, are also being treated in the same way.

In financial services, especially, failure to produce the right emails can be expensive, with laws such as the US’s Sarbanes-Oxley and the financial services industry’s Basel II and III forcing companies to store unstructured as well as structured data.

The US-headquartered investment bank Morgan Stanley provides a case in point. In 2002, Morgan Stanley and a number of other Wall Street companies, including Goldman Sachs, Deutsche Bank Securities, Salomon Smith Barney and US Bancorp Piper Jaffray, were each fined $1.65 million for failing to preserve email documents and keep them accessible. Three years later Morgan Stanley was again subject to a similar investigation by the US Securities and Exchange Commission for failing to keep emails that may have been relevant to a number of cases brought against the bank in recent years.

Telephone calls are now adding to this burden. Some businesses, especially in financial services, have long recorded calls; now converged, IP-based networks will make voice as searchable as structured information is today.

Digital ears

This brings new responsibilities and challenges. With technology from enterprise search company Autonomy, international mobile service provider Vodafone electronically ‘listens’ to incoming customer calls and, based on the contents of the calls, suggests on-screen responses to call centre operators, in real time.

The value of bringing structured and unstructured information together is no longer questioned. But there are still distinct opinions about the technical approach organisations can take.

Barry Litwin, CEO at content management vendor Hummingbird, for example, argues in favour of putting unstructured repositories alongside structured repositories, operating under one enterprise content management system. But more frequently, consultants are likely to recommend integration at the portal or user level.

Dr. Mike Lynch, CEO of Autonomy, on the other hand, believes that free text search technology is improving so rapidly that adding structure to unstructured information may be pointless: “The initial reaction of the IT world is ‘let’s structure it’. But before you go down that route, I would say ‘think very carefully’.”

But for now, the unified experience is more a matter of co-presentation than of pure integration of data. This will depend on what Bell of Accenture calls “the sources and application of information”. In other words: what does the business need?

Likewise, Jeff Raikes, president of Microsoft’s Business Division, says that by combining, for example, a business intelligence scorecarding program with an analysis of various blogs, he can identify patterns that any standalone BI tools might have missed. “[Such] ability to bring unstructured information together with structured information is critical; it is what has given me the insight to the new opportunities that I have here.”

Structured vs. unstructured information: the difference

The structured/unstructured divide runs through information management like the San Andreas Fault through Silicon Valley. It may be possible to bridge, but it is always there.

The alphanumeric data created by – or stored for use by – most business applications, from enterprise resource planning or ecommerce systems down to Excel spreadsheets, is typically referred to as ‘structured data’. It is usually stored in a series of rows and columns organised according to a set of rules to provide an efficient means of storage and retrieval, usually via the Structured Query Language (SQL). Some business intelligence tools use multidimensional tables for more complex analyses, but the same principles apply.
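To make the rows-and-columns model concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns and its figures are invented for illustration; the point is only that data organised this way can be summarised with a single SQL query.

```python
import sqlite3

# An in-memory relational database; the table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (rep TEXT, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "acme", 1200.0),
        ("alice", "globex", 500.0),
        ("bob", "acme", 400.0),
    ],
)

# Because every row follows the same rules, one query totals sales per rep.
totals = conn.execute(
    "SELECT rep, SUM(amount) FROM orders GROUP BY rep ORDER BY rep"
).fetchall()

conn.close()
```

No comparable one-line query exists for a folder of emails or call recordings, which is the gap the rest of this section describes.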

While such table structures are efficient for alphanumeric data, they fall down when trying to handle unstructured data such as images, email, text, sound and so on. This information can be stored in a variety of ways: in object databases or via the object extensions to relational databases – neither of which is directly searchable. Alternatively, it can be stored as a ‘big list of files’, which may be ordered using tags or descriptions (metadata), themselves organised into categories according to a taxonomy. This tagging enables the files to be recorded in a structured database. Text-based files can also be retrieved or indexed using specialist free text searching tools.
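The two routes described above, tagging files with metadata and indexing their text, can be sketched as follows. This is a simplified illustration, not a description of any commercial search product: the file names, tags and the helper functions `build_index`, `search` and `by_tag` are all hypothetical.

```python
import re
from collections import defaultdict

# A hypothetical 'big list of files', each carrying metadata tags and raw text.
documents = {
    "mail-001.txt": {
        "tags": {"type": "email", "category": "suppliers"},
        "text": "Revised price agreed with the supplier for next quarter.",
    },
    "call-017.txt": {
        "tags": {"type": "call transcript", "category": "customers"},
        "text": "Customer asked about the price of roaming.",
    },
    "memo-042.txt": {
        "tags": {"type": "memo", "category": "finance"},
        "text": "Quarterly budget review notes.",
    },
}


def tokenize(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index mapping each word to the files containing it."""
    index = defaultdict(set)
    for name, doc in documents.items():
        for word in tokenize(doc["text"]):
            index[word].add(name)
    return index


def search(index, query):
    """Free text search: return the files that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


def by_tag(documents, key, value):
    """Metadata lookup: return the files whose tag `key` equals `value`."""
    return sorted(
        name for name, doc in documents.items() if doc["tags"].get(key) == value
    )


index = build_index(documents)
price_hits = search(index, "price")
supplier_files = by_tag(documents, "category", "suppliers")
```

The tag lookup behaves like a query on a structured table, while the inverted index answers questions nobody anticipated when the files were tagged. Real free text tools add ranking, stemming and much else on top of this basic structure.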

Ben Rossi

Ben was Vitesse Media's editorial director, leading content creation and editorial strategy across all Vitesse products, including its market-leading B2B and consumer magazines, websites, research and...