
Talk time

Speech recognition is no longer a technology of the future. Numerous call centre suppliers now offer integrated application development services that enable the use of voice automation and speech recognition technology. Some of these solutions can be installed very rapidly and can provide companies with significant bottom-line savings.

Today, many smart companies, including telcos, financial institutions and certain government agencies, are acknowledging the benefits that speech recognition technology has to offer.

Although speech recognition applications can provide significant ROI through high levels of automation and increased operational efficiency, the question still haunts many people: how can a computer understand what we say? 

Speech recognition technology basically works like the human ear and brain: the ear takes in the sounds in the form of vibrations, and the brain decodes the signals and determines meaning. 

A caller speaks, and the speech recognition engine captures the utterance, digitises it and converts it into a spectral representation. This representation describes how the caller’s spoken words break down into individual frequency components, a process very similar to how the human ear functions.
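As a rough illustration of the "spectral representation" step, the sketch below digitises a pure 1,000 Hz tone (standing in for a caller's utterance) and breaks one short frame into frequency components with a naive discrete Fourier transform. The sampling rate, frame size and function names are assumptions made for this example; a production speech engine uses far more elaborate signal processing.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed telephone-quality sampling rate, in Hz
FRAME_SIZE = 64     # assumed number of samples in one analysis frame


def digitise_tone(freq_hz, n_samples=FRAME_SIZE, rate=SAMPLE_RATE):
    """Stand in for a digitised utterance: samples of a pure tone."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]


def magnitude_spectrum(frame):
    """Naive discrete Fourier transform: one magnitude per frequency bin."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):  # bins above n/2 mirror the lower ones
        total = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(total))
    return spectrum


def dominant_frequency(frame, rate=SAMPLE_RATE):
    """Return the centre frequency (Hz) of the strongest bin in the frame."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * rate / len(frame)


frame = digitise_tone(1000)
print(dominant_frequency(frame))  # prints 1000.0
```

Real speech contains many frequencies at once and changes from moment to moment, so an engine would compute a spectrum like this for a rapid succession of overlapping frames rather than for a single one.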

The next step is to translate this spectral representation into ‘phonemes’, the basic sounds of a language, like the “b” in bike or the “th” in father.

The challenge for speech applications is to recognise words accurately despite variations in accent, voice quality, background noise and style of speech. They do so by employing very complex statistical models which, like the brain, are trained on thousands of hours of speech.

Finally, the application searches through the possible sounds and utterances and arrives at the most likely meaning of a caller’s spoken response. Alongside that result it produces a “confidence score”, a measure of how sure the system is about what was said.

A confidence score is used by the application to determine how to move forward. A low confidence score may lead to a confirmation prompt such as, “I think you said William Bloggs. Is that correct?”
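The way an application might act on a confidence score can be sketched as follows. This is a minimal illustration, not any vendor's logic: the two thresholds, the prompt wording other than the William Bloggs confirmation, and the function names are all hypothetical, and the list of candidate results with scores is assumed to come from the recognition engine.

```python
ACCEPT_THRESHOLD = 0.80   # hypothetical: at or above this, accept without asking
CONFIRM_THRESHOLD = 0.45  # hypothetical: at or above this, ask the caller to confirm


def best_hypothesis(hypotheses):
    """Pick the (text, confidence) pair with the highest confidence."""
    return max(hypotheses, key=lambda h: h[1])


def next_prompt(hypotheses):
    """Decide what the application says next, based on the best score."""
    text, confidence = best_hypothesis(hypotheses)
    if confidence >= ACCEPT_THRESHOLD:
        return f"Thank you. Looking up {text}."
    if confidence >= CONFIRM_THRESHOLD:
        return f"I think you said {text}. Is that correct?"
    # Too unsure even to guess: ask the caller to repeat.
    return "Sorry, I didn't catch that. Please say the name again."


# Assumed engine output: candidate names with confidence scores.
candidates = [("William Bloggs", 0.62), ("Gillian Boggs", 0.21)]
print(next_prompt(candidates))
# prints: I think you said William Bloggs. Is that correct?
```

Where the thresholds sit is a design decision: set them too high and callers are asked to confirm everything, set them too low and the application acts on misheard responses.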

For a speech recognition system to be effective, there needs to be a high-quality speech engine as a starting point. Several years ago, the key question call centre managers asked about speech recognition was "does speech software really work?"

At that time, the answer was a qualified "yes." Industry players and early adopters knew that while the technology could perform basic recognition tasks well, further advances were required for speech recognition to be highly effective and to spread into the mainstream customer service environment.

Those advances have been made and continue to be made. Today, the latest speech recognition software from the major vendors works very well in call centres and other operating environments. The most recent releases have enhancements that can meaningfully improve the caller experience. Prospective buyers or renters of the technology who haven't investigated these products should do so.

However, just as the brass or percussion sections alone don't make a complete orchestra, speech recognition software by itself does not make a good call centre application. There are other important components that must come together successfully to achieve great performance. These include dialogue design, host interface connectivity, computer telephony integration, application personalisation and robust, comprehensive testing of your application at various stages.

Of course, in the call centre environment generally, some of the difficulties with speech recognition revolve around tone and manner.

Intonation and sentence stress can play an important role in the interpretation of an utterance. As a simple example, utterances that might be transcribed as "go!", "go?" and "go." are easily told apart by a person, but determining which intonation corresponds to which punctuation is difficult for a computer.

Most speech recognition systems are unable to provide any information about an utterance beyond which words were pronounced, so the application using the recogniser cannot make use of stress and intonation. Researchers are currently investigating emotion recognition, which may have practical applications. For example, if a system detects anger or frustration, it can try asking different questions or forward the caller to a live operator.
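If emotion recognition does become practical, the routing idea described above could look something like the sketch below. Everything here is hypothetical: the emotion labels, the action names and the rule of rephrasing once before transferring are assumptions chosen for illustration, and the emotion label is assumed to be supplied by some separate detector.

```python
NEGATIVE_EMOTIONS = {"anger", "frustration"}  # hypothetical detector labels


def next_action(emotion, negative_turns_so_far):
    """Choose the next dialogue step from the caller's detected emotion.

    emotion: label from an assumed emotion detector for the latest utterance.
    negative_turns_so_far: earlier turns in this call already flagged negative.
    """
    if emotion not in NEGATIVE_EMOTIONS:
        return "continue_dialogue"
    if negative_turns_so_far == 0:
        # First sign of trouble: try asking the question a different way.
        return "rephrase_question"
    # The caller is still upset after a rephrase: hand over to a person.
    return "transfer_to_operator"


print(next_action("neutral", 0))      # prints continue_dialogue
print(next_action("frustration", 0))  # prints rephrase_question
print(next_action("anger", 1))        # prints transfer_to_operator
```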

 
