Voyage of discovery

The amount of data stored by enterprises is generally held to double annually. That presents a formidable challenge to the average employee who has to use that data and navigate an increasingly vast and complex environment just to do their job.

A diverse enterprise search industry has evolved in response to this need, offering products ranging from commoditised full-text search through to sophisticated video and sentiment analysis technology that can derive the tone as well as location of a document.

The last few years in particular have seen a burgeoning number of projects in the sector; Gartner predicts the market for enterprise search technology will grow 15% this year, topping $1.2 billion by 2010, with the technology becoming pervasive by 2012.

This developing market is fuelling the spectacular growth of companies like UK-based Autonomy and was the trigger for Microsoft’s £524 million acquisition of FAST early in 2008. Meanwhile, fresh entrants such as the open source search firms Simplexo and Flax are competing with established players such as Exalead (which evolved from the remnants of AltaVista) and French firm Sinequa.

The fact that the enterprise search offering of Google – the company that dominates web search to such an extent that its very name has entered the vernacular as a verb “to search” – is an appliance-based sideline simply highlights the significant differences in searching the web and searching within internal enterprise environments.

Unlike the web, which is fairly consistently constructed of HTML pages that want to be found, unstructured data in an enterprise includes emails, documents, files and even third-party data strewn across a multitude of systems, some of which the enterprise may even be unaware it owns.

“Google has an incredible brand, but the reason they haven’t been as successful in enterprise search is because it is a very different field [to the web],” says Autonomy CEO Mike Lynch.

“Unlike the web, [enterprise] information is spread across multiple repositories and perhaps 300 to 400 different file types and formats. Search algorithms on the Net work by looking through the most popular pages – but the important legal document you’re searching for, for example, might be unpopular. If you miss it, you [could] go to jail.”

Drivers for search

As Lynch’s comments reveal, much of the current growth around enterprise search is driven by legal discovery and compliance. According to Sarah Burnett, an analyst with the Butler Group, implementing enterprise search technology “is a good way to avoid paying lawyers lots of money. I expect huge growth this year as a result of the financial meltdown,” she adds.

Martyn Christian, IBM’s vice president of enterprise content management (ECM), agrees. “Legal discovery is a business process,” he explains. “Lots of companies undergo litigation, and often make the decision to settle in the discovery phase. They [use search] to make sure they have [a] defensible case before going to court.”

Another key driver behind the rise of enterprise search is the current incursion of consumer technology into the workplace, brought on by growing dissatisfaction with the search capabilities of business systems.

A study by French search firm Sinequa found that only 18% of organisations have a dedicated enterprise search tool, while 41% of organisations rely on application-specific search tools. A further 35% use an “online filing system”, while 6% think “it is not my concern if employees don’t know how to search for information”. Out of all companies surveyed, 59% profess they rely on employees “saving documents in the right place.”

These approaches cause great frustration among users, says Dave Armstrong, head of products and marketing in EMEA for Google’s Enterprise division, especially when they go home and discover the tools they use to surf the web are more sophisticated and efficient than those they are forced to use at work.

“The vast majority of employees are bringing consumer technology into the workplace [as a result],” says Armstrong. “Why are they doing that? Ask any CIO how many employees use a third-party email account when their attachment is too large to send via corporate email.”

Varying quality

However, the enterprise search industry is, according to Autonomy’s Lynch, beset by “marketing and hype”, and the technologies available are far from equal. “Conventional search doesn’t understand that a Labrador is a dog; a keyword search doesn’t understand that Apple is a technology company,” he explains. “It’s a big step up to understanding meaning.”
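The gap Lynch describes can be illustrated with a toy example. The sketch below is not any vendor's method: the `BROADER_TERMS` table and both function names are hypothetical, and a hand-written lookup stands in for whatever technique a real product uses to relate terms. It simply shows why a literal keyword match for "dog" misses a document that only mentions a Labrador.

```python
# Hypothetical, hand-written table mapping a term to broader concepts.
# A real product would not rely on a tiny fixed table like this.
BROADER_TERMS = {
    "labrador": {"dog"},
    "apple": {"technology company"},
}


def keyword_match(query: str, text: str) -> bool:
    """Literal match: the query string must appear in the text."""
    return query.lower() in text.lower()


def concept_match(query: str, text: str) -> bool:
    """Literal match, or a match via a broader concept of a term in the text."""
    query_l = query.lower()
    text_l = text.lower()
    if query_l in text_l:
        return True
    return any(
        query_l in broader
        for term, broader in BROADER_TERMS.items()
        if term in text_l
    )


if __name__ == "__main__":
    document = "Our Labrador won best in show"
    print(keyword_match("dog", document))
    print(concept_match("dog", document))
```

Even this crude lookup changes which documents are found, which is one reason side-by-side trials on real content tend to be more revealing than feature lists.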

And there is a need to see that distinction in action. Paul Ellerbeck, IT director at online listings site Fish4, emphasises the importance of actually trying out different vendors’ search technologies, preferably in tandem, and comparing both the search results and performance before adoption.

“It’s very easy to be sucked into the belief that a search technology can do what you want it to do,” he says, describing Fish4’s own experience with FAST as “leaving a sour taste in our mouth”.

“I would advise anyone doing [an enterprise search implementation] to actually see a proof of concept, even if it costs money, otherwise the solution can cost a fortune in human time to implement only to discover it can’t meet your specific requirements,” Ellerbeck says.

Security is another important consideration. Even though any enterprise search solution worth its salt will preserve existing security controls across a file system, hopefully preventing junior employees from accessing the finance department’s salary roster, in practice businesses have found that the technology can render the concept of ‘personal workspace’ obsolete.

“Think of all the personal documents on people’s machines that get saved to personal workspaces,” says David Fitch, knowledge management director at law firm Simmons & Simmons, which has a Recommind implementation for legal searches. “With a much more sophisticated search layer you can find things like personal letters. You’re not doing anything you can’t do already, but search gives lightning-fast access to it all. It’s an issue that comes up in every search implementation, and it’s something you have to manage.”

Another check is how the search technology handles security – even the existence of certain documents must be kept hidden from some employees (for instance, ‘imminent_merger.doc’).

“The average employee is only allowed to see one in a thousand documents,” explains Lynch. “Some search technology will check a document by going to the repository, going back to check the rules, then back again to unlock the document. Going backwards and forwards 10,000 documents later, and the network grinds to a halt.”
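The cost Lynch describes can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not a description of how any named product works: `Document`, `Repository`, `can_read`, `build_index` and the two search functions are all hypothetical names. It contrasts checking permissions against the repository for every hit at query time with copying each document's access list into the index so that results can be filtered without going back to the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to see this document


class Repository:
    """Hypothetical document store that counts permission round trips."""

    def __init__(self, docs):
        self.docs = {d.doc_id: d for d in docs}
        self.round_trips = 0

    def can_read(self, doc_id, user_groups):
        # Each call stands in for one network round trip to the repository.
        self.round_trips += 1
        return bool(self.docs[doc_id].allowed_groups & user_groups)


def search_with_query_time_checks(repo, query, user_groups):
    """Find every textual hit, then ask the repository about each one."""
    hits = [d for d in repo.docs.values() if query in d.text.lower()]
    return [d.doc_id for d in hits if repo.can_read(d.doc_id, user_groups)]


def build_index(repo):
    """Copy text and access lists into the index once, at indexing time."""
    return [(d.doc_id, d.text.lower(), d.allowed_groups) for d in repo.docs.values()]


def search_with_indexed_permissions(index, query, user_groups):
    """Filter on the access list held in the index: no repository calls."""
    return [
        doc_id
        for doc_id, text, groups in index
        if query in text and groups & user_groups
    ]


def make_demo_repo(total=1000):
    """One document visible to staff among many restricted to the board."""
    docs = [
        Document(f"doc{i}", "Notes on the merger", frozenset({"board"}))
        for i in range(total - 1)
    ]
    docs.append(Document("public", "Merger FAQ for staff", frozenset({"staff"})))
    return Repository(docs)


if __name__ == "__main__":
    staff = frozenset({"staff"})
    repo = make_demo_repo()
    print(search_with_query_time_checks(repo, "merger", staff), repo.round_trips)
    print(search_with_indexed_permissions(build_index(repo), "merger", staff))
```

Both functions return the same single visible document, but the first makes one repository call per textual hit. The trade-off in the second approach is that permissions held in the index must be kept in step with the repository, otherwise a revoked right could linger in search results.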

Search’s future

Search technology is commonly perceived as a temporary fix for a wider failing, namely the sub-standard search functionality of most enterprise content management (ECM) systems.

Opinion is therefore divided among vendors, customers and analysts as to whether enterprise search is merely a feature that should be consolidated into mainstream ECM, or a stand-alone technology in its own right.

Microsoft’s acquisition of FAST, for example, lends weight to the former view. And IBM, which sells an array of search technologies under the OmniFind banner, does not see it as a distinct business area.

“Search is a business process,” explains ECM head Christian. “If you can’t find information then that’s not a search issue but a business process issue. Rather than use a search tool to overcome it, I would argue that steps need to be taken earlier – in classification and metadata. Then the search challenge is easier to solve.”

But Razmik Abnous, CTO for document management at information infrastructure technology giant EMC and a founding engineer of Documentum, describes the view “that if I build a taxonomy, the world will organise around my taxonomy” as old-world thinking. (EMC’s Documentum Platform uses Apache’s Lucene for full-text search, but also offers close integration with FAST for more sophisticated enterprise deployments.)

Armstrong from Google agrees: “Human nature does not naturally push people to structure their content. A system that is built to force them to do [so] will not work. People will always make unstructured data, and search will always have its place.”

That view is mirrored among those implementing the technology. For instance, Simmons & Simmons’ Fitch believes search has a role in extending the longevity of existing information systems, as well as boosting efficiency and aiding regulatory compliance.

“We have one 12-year-old database. We could have built a new one, but put a search layer across the top and it solves a lot of the issues. I don’t think ECM will negate the need for search,” he concludes.
