The talking web

They are the two defining innovations of the information age, yet thus far mobile telephony and the Internet have remained almost entirely separate. However, the growing sophistication of voice recognition technology may bring about this oft-heralded union.

The mobile Internet has suffered a prolonged gestation. The experience of using Wireless Application Protocol (WAP)-based platforms has quashed many users’ appetite; GPRS and 3G have made it easier for users to access online content, but much of it remains within the service providers’ ‘walled gardens’ and the practicalities of viewing content on mobile devices can be off-putting.

But that is now beginning to change: a recent study by Internet market research organisation comScore found that 29% of Internet users in Western Europe had accessed content via their mobile phones in 2006.

This growth can be explained by changes being made by the content providers, who have become attuned to repurposing material for mobile devices. “Usability is the key enabler for the mobile Internet,” argues Peggy-Ann Salz, mobile web analyst at Informa. “To get more users, it has to be usable by the mass market.”

And for the speech recognition makers, this represents an irresistible opportunity: their technology could become the key enabler for a multi-billion dollar industry.

Speech recognition tools

“Speech recognition is the missing element in the mobile explosion,” enthuses Vlad Sejnoha, chief scientist at speech recognition maker Nuance. Currently, users find mobile interfaces “difficult to use”, but a voice-enabled search engine would give them a welcoming gateway into the mobile web, he adds.

Sejnoha argues that people are already comfortable talking into their mobile phones, so a voice-based user interface would be more readily accepted. To back that claim he points to research which showed that after watching a video of the company’s speech technology working on a mobile, 80% of subjects reported that they would ‘significantly’ increase their mobile web-surfing if they had the technology on their phones.

There are a couple of problems with this result, however. First, the baseline rate of mobile web use is tiny to begin with, so any increase in mobile web use could be described as ‘significant’. Secondly, subjects were shown a video of a Nuance employee using the technology rather than being allowed to use it for themselves.

Nuance has just begun to offer speech-recognition tools to mobile users commercially, most significantly through the US mobile operator Sprint. But Sejnoha admits that there remain a number of issues that threaten the success of the paradigm.

One factor hindering development of speech recognition on mobiles is the sheer number of form factors and operating systems that the software has to work on. Another is that the technology needs to be more sophisticated, allowing users to choose their own phrases rather than being limited to a pre-defined set of commands as they are today.

Sejnoha is unperturbed: “We are making very quick progress on all of those factors. Mobile dictation, for example, has improved greatly in the last year.”

One company that will play a pivotal role in the success or failure of the mobile Internet is handset giant Nokia. It first introduced some low-level speech recognition functions, such as using voice tags for contacts, over a decade ago. Also, according to comScore, Nokia’s phones are the preferred devices for mobile web users. But Nokia remains agnostic about the technology.

Juho Iso Sipila of Nokia’s technology platform division believes that there are still too many unanswered questions to predict whether speech recognition will encourage mobile web use. “What is being searched [in a mobile web search]? What do you do with the result? Where does the target data reside? How is the data accessed? Where is the recogniser running? Who will control the servers?” he asks. “The technology for the speech-enabled mobile web is there in parts, but I can’t see any concrete time when it will happen.”

However, this assumes that the mobile Internet will simply aim to replicate the desktop experience, where navigation is based around a browser, links and URLs. Other models for the mobile web are being developed in which automated audio information services are accessed via speech. After all, it is the information contained on websites that is in demand.

An Internet of phones

One service lighting the way is that provided by Bharti Airtel, India’s largest mobile operator. Like much of the Indian economy, Bharti has enjoyed a staggering explosion of growth in the last few years. It now boasts 37 million subscribers, 22% of the sizeable Indian market. But heavy competition in the sector means that average revenue per user – based on call charges alone – is low.

For that reason, the company has focussed on value-added services, which now contribute 10% of its revenue. These include ‘voice portals’, information services based around the so-called ABC of Indian culture (astrology, Bollywood and cricket). These are accessed via a speech recognition interface, for which users pay an additional charge.

The voice portals receive 8 million calls per year, which may be a small proportion of the total number of calls that Bharti handles, but the business case has proved strong enough for nearly all of its competitors to have copied the offering.

The service resembles an automated call-centre more than it does a website, but this is just as plausible a model for the mobile web as a speech-enabled browser. Two telephone services launched in the US suggest that, potentially, it is this kind of service that has legs.

In April 2007, search giant Google launched a speech recognition-enabled, automated telephone directory service called Google 411. A week later, TellMe, a speech-enabled telephone services provider acquired this year by Microsoft, launched a mobile web search service. The services are accessed by a telephone number and are essentially automated versions of directory enquiries, but the information they retrieve is the same as would be delivered via a Google or Microsoft Live web search.

These two offerings, from two of the world’s largest technology companies, represent a significant endorsement of the prospect of a speech-enabled mobile web. “Google and Microsoft’s involvement is the catalyst that the mobile web needs,” says Informa’s Salz.

However, it remains to be seen whether such automated directory services will prove sufficiently alluring; mobile phones have already taken on enticing new capabilities, becoming cameras, MP3 players, and personal computers.

And in this rich-media environment, there persists a chicken-and-egg problem for the mobile web. Consumers will not begin to surf the mobile Internet until there is sufficient content, but providers will only make that content available when there is a proven revenue stream.

Salz believes that while speech recognition might encourage consumer uptake of mobile web services, there are many other factors at play. “[Speech recognition] will help users to access content on the mobile web, so it definitely should be part of the solution. But I don’t think it’s the killer app.”

Box-out: ABN Amro boosts security with biometric voice recognition.

Pete Swabey