Sound thinking on speech recognition

Designers at car maker Honda have grand plans for the latest model in its Civic range: the cars come equipped with satellite navigation, audio and climate control systems that can all be controlled by voice commands. The sophistication of voice recognition technology is such that it has moved out of the realm of science fiction and into reality. And that promises to have a radical effect on business processes.

Progress in the accuracy of speech recognition applications has been slow since they first emerged in the 1950s. The reliability and speed of physical interfaces such as the keyboard and mouse make them the overwhelmingly preferred method for interacting with computers and systems.

This is changing, however, and today voice is a viable tool for controlling machines. The technology has matured to the point where historical complaints about inaccuracy are no longer as valid: speech recognition systems now employ artificial intelligence to refine their accuracy, and so actually improve with age.

Furthermore, there has been considerable consolidation in the industry, which has made a broad range of functionality available cheaply from a few vendors. Most significantly, speech recognition software maker Nuance Communications (formerly ScanSoft) has, since 2000, acquired a roll-call of speech technology vendors, including Lernout & Hauspie and SpeechWorks.

Open standards for speech technology have given companies a degree of control over the design of their own systems, and allowed those systems to be integrated easily with other applications (see box: VoiceXML).

Hearing voices

The impact of the voice revolution can be seen at hotel chain Travelodge. In the past two years, the company has been able to dramatically reduce the price of hotel rooms, thanks to its ability to cut overheads through the increasing use of online booking.

However, a considerable proportion of its customer base does not use Internet booking. Initially, that meant Travelodge was running two booking channels – web and telephone – that had different cost structures and offered different prices, with telephone customers often paying more.

“As many as 30% of calls to an automated menu system end up misdirected.”

Jason Humphries, head of EMEA professional services, Nuance Communications

That changed with the introduction of a speech recognition-based self-service phone line. Telephone customers can now speak to ‘Lisa’, a virtual persona designed to reflect Travelodge’s brand values, in order to book their rooms.

Crucially, the self-service system – built by speech recognition specialist Fluency – integrates directly with the same booking and payment systems as the website. Not only are users’ voice commands entered into those systems directly, but Lisa can also read out information extracted from them using text-to-speech software.

This all means Travelodge can now offer Internet rates to telephone customers. The voice service now accounts for roughly 7% of all transactions. “The system has made our call centre costs resemble our web self-service costs much more closely,” says Shona Fraser, director of revenue and reservations at Travelodge.

Talk to the machine

Market research firm the Yankee Group estimates that the market for speech recognition-based self-service technology is growing at between 7% and 10% annually, with that growth driven mainly by the US market.

Yankee Group analyst Art Schoeller explains the economics: “The average cost of a transaction on the web is less than a dollar. Voice-based self-service transaction costs are slightly more because of the development cost. Compare that with the average email transaction cost of $4, or the call centre transaction cost of $5.50.”

The web has changed the expectations both of senior management and customers; the former want low overheads on customer service and the latter low prices. By using a voice-based self-service line that integrates with existing online self-service applications, businesses can leverage prior investment to deliver both.

Of course, automated phone lines do not need to employ sophisticated speech recognition technology: touch-tone systems are still popular, and sales are still growing.

“Speech recognition is not for everybody,” concedes Tony Corlett, of voice and data communications consultancy Azzuri. A lot of the benefits of intelligent routing in a call centre can still be achieved with a simple touch-tone menu system, he says.

"Organisations are looking to get more use from their web-based content and to use it in a speech self-service world."

But Jason Humphries, head of EMEA professional services at speech technology vendor Nuance, believes that systems which invite the caller to speak freely and then react, rather than constraining them to limited menu options, reduce the chance of callers requesting to speak to a human operator – an option which pushes up transaction costs.

“Unless you have a very limited number of services that you are offering on a touch-tone menu, it is very difficult to make sure the prompts and the options are right,” says Humphries. “As many as 30% of calls to an automated menu system end up misdirected.”

Unsatisfactory menu systems can lead customers to opt to talk to operators, putting an unpredictable strain on call-centre resources. Speech recognition systems that can faithfully capture what a caller wants to do greatly improve the chance that their self-service call will be successful, or will be routed to an appropriate call-centre agent. “The technology can be expensive,” says Corlett, “but the price pales into insignificance compared to the cost of over- or understaffing your call centre.”

The multimodal web

Emerging applications of speech recognition besides self-service are many and varied: automated reception, remote identification and email-to-phone messaging are a few among them. But one particular area is widely predicted to boost the importance of speech recognition technology to businesses: mobile Internet applications.

Web-surfing functionality on mobile phones is advancing rapidly, but the size of the devices makes navigation more fiddly than it is on desktop machines. Speech recognition is an obvious alternative interface for directing mobile web applications; people are already used to talking into their mobile phones.

The idea of speech-controlled web apps, which according to Gartner analyst Steve Cramoysan has been implemented by a handful of bleeding-edge technology companies, received a significant endorsement in April 2006 when the US patent office revealed that search-engine giant Google had taken out a patent for voice-enabled search. “If Google rolls this out on mobiles, people will begin to demand it from all the companies they deal with,” says Cramoysan.

Quite how speech recognition technology will work with mobile phones remains uncertain: some models propose that speech recognition software should be housed on remote servers; others place it on the phone itself.

But the proposition illustrates the ‘multimodal’ approach to distributing online information. The more prolific IP-based communications technologies become, the greater the opportunity for businesses to apply web-based customer service and information systems to different media.

“Today, many organisations are looking to get more use from their web-based content, and to turn that material and knowledge into grammars and dialogues to use in the speech self-service world,” says Yankee Group’s Schoeller.

Schoeller encourages businesses to consider automated phone lines as an extension of their website; instead of a graphical user interface (GUI), they have a voice user interface (VUI).

With web and voice services operating as two different faces of a unified system, each medium contributes to the business’s knowledge base, leaving both customer-facing agents and customers themselves better informed and, hopefully, more satisfied.



VoiceXML

“Historically, there were both technical and cultural gaps between the way voice-based systems have evolved and that of the Internet and Web, leaving the information available only to voice systems or the web,” said web pioneer Tim Berners-Lee, in his capacity as director of the web standards body W3C, at the launch of VoiceXML 2.0 in 2004. “We’re now able to integrate and benefit from the strengths of both groups.”

The third generation of VoiceXML, which will launch in late 2006, is a mark-up language based on the web standard XML. It grants voice applications native access to data used by web-based applications, a significant step in the unification of the web and voice channels.

“Normally, there would be intermediate servers between voice and web applications,” explains Gartner analyst Steve Cramoysan. “That meant that the two could get out of sync.”

Not only does the VXML standard enable two communication channels to share data, it also grants IT departments greater control over voice applications. As the standard greatly resembles XML, it is not a giant leap for an XML developer to design or modify applications for use in automated call-centres.
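To give a flavour of that resemblance, a minimal VoiceXML 2.0 dialogue for prompting a caller might look something like the sketch below. The prompt wording, grammar file name and submission URL are purely illustrative, but the elements used – form, field, prompt, grammar, filled and submit – are all defined in the VoiceXML 2.0 specification.

    <?xml version="1.0" encoding="UTF-8"?>
    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
      <!-- A single form collects one piece of information from the caller -->
      <form id="booking">
        <field name="nights">
          <!-- The prompt is spoken to the caller by the text-to-speech engine -->
          <prompt>How many nights would you like to stay?</prompt>
          <!-- The caller's reply is matched against a speech recognition grammar -->
          <grammar src="nights.grxml" type="application/srgs+xml"/>
          <filled>
            <!-- Once the field is filled, confirm it and pass it to a web application -->
            <prompt>Booking <value expr="nights"/> nights.</prompt>
            <submit next="http://www.example.com/book" namelist="nights"/>
          </filled>
        </field>
      </form>
    </vxml>

The submit element hands the recognised value to an ordinary web server, which is how a voice dialogue can share the same back-end systems as a website.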

“Even if you still have a dedicated voice programmer, they can now talk to web-developers in the same language,” says Cramoysan.

That flexibility will be a boon in the long run but may confuse some companies in the short term, he adds. “Operating voice applications is not going to become any simpler because of VXML, but they will be more flexible.”

And not all the functionality behind a speech recognition self-service system can be controlled by VXML: Call Control XML (CCXML) and Speech Recognition Grammar Specification (SRGS) govern call handling and speech recognition rules respectively. 
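A grammar of the kind referenced in the sketch above is itself a small XML document. As an illustration only – the rule name and the permitted phrases are hypothetical – an SRGS grammar restricting the caller’s answer to a handful of numbers might read:

    <?xml version="1.0" encoding="UTF-8"?>
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
             version="1.0" mode="voice" xml:lang="en-GB" root="nights">
      <!-- The root rule lists the utterances the recogniser will accept -->
      <rule id="nights" scope="public">
        <one-of>
          <item>one</item>
          <item>two</item>
          <item>three</item>
          <item>four</item>
        </one-of>
      </rule>
    </grammar>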

Nevertheless, VXML has been widely adopted as the way to do voice applications. “Of the voice self-service requests-for-proposal I see now,” says Yankee Group analyst Art Schoeller, “90% are for VXML systems.”

Further reading

Happy talk – Call centre automation and the customer – December 2005
