Under the Hood
November 17, 1996
A Mini-Glossary of Speech Terms
Speech Recognition Sidebar
TechOnline
- Alphanumeric Recognition
- The ability to recognize spoken numbers zero through nine and the letters of the alphabet.
- Automated Attendant
- A device that routes calls to the proper extension. It replaces most functions of a telephone receptionist and may accept voice input (ASR) or touch-tone commands.
- Automatic Speech Recognition (ASR)
- The process of converting spoken words to computer-intelligible information. Also known as speech-to-text or speech recognition.
- Channel
- A transmission facility with defined frequency response, gain, and bandwidth; the basic unit rented from the telephone company. Also called a "line" or a "circuit."
- DSP
- Digital signal processor. A specialized computing chip optimized for operating on waveforms in real time.
- DTMF
- Dual-tone multi-frequency, the official designation for "touch tone" dialing: the signaling behind the familiar 12-key telephone keypad, used to encode digits over analog telephone lines. Data input via DTMF is a common alternative to ASR.
- IVR
- Interactive voice response. A computer responds to queries either with digitized recorded voice snippets or by synthesizing voice responses using text-to-speech. The IVR computer accepts user input via touch-tones or via speech recognition. IVR is the technique a train or bus company uses to provide schedule information over the telephone.
- Natural Language Processing
- The science of trying to determine "meaning" from a text string. Natural language processing is not speech-specific; the process could equally receive input from typed words or from characters that have been scanned and recognized (OCR).
- Speech Compression
- The process of digitizing speech and compressing the data for subsequent storage or transmission in digital form. Speech compression conserves RAM and disk resources as well as transmission bandwidth. It is not, however, directly related to speech synthesis or speech recognition. Speech compression algorithms take advantage of models of the human voice system to provide higher quality at lower bit rates than would be possible using other types of data compression algorithms.
- Speaker Recognition
- Also called speaker identification: determining which individual, out of many candidates, is speaking. In text-dependent speaker identification, the speaker is required to utter a certain phrase or password so that both the speaker and the text can be identified or verified.
- Speaker Verification
- The process of determining whether a speaker is the individual he or she claims to be. Speaker verification is useful in security applications.
- Template
- A digitized pattern of sound, typically a word, that forms one entry in a vocabulary.
- Utterance
- Speech made by the user.
- Vocabulary
- A collection of one or more templates.
- Voice Cut Through
- A feature that allows speech recognition to run while a voice prompt is playing.
- Word Spotting
- The identification of predefined words or phrases in a stream of speech utterances. These words or phrases are called keywords. Typically the rest of the speech is ignored. For instance, if an interactive schedule program prompted the user for a destination city, and the user responded, "I would like to visit London," then the task of the word spotting program is to isolate "London" from the remainder of the digitized input.
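
The word spotting entry above can be illustrated with a minimal Python sketch. It makes a large simplifying assumption: the utterance has already been converted to text, whereas a real word spotter works on the digitized audio itself. The keyword list and the function name `spot_keywords` are hypothetical, chosen only for this illustration.

```python
import re

# Hypothetical keyword list for a schedule application (not from the glossary).
KEYWORDS = {"london", "paris", "new york"}


def spot_keywords(utterance, keywords=KEYWORDS):
    """Return the keywords found in an utterance, in order of appearance.

    Assumption: the utterance is already text. A real word spotter
    searches the audio stream, not a transcript.
    """
    text = utterance.lower()
    hits = []
    for kw in keywords:
        # Word boundaries keep "paris" from matching inside "comparison".
        match = re.search(r"\b" + re.escape(kw) + r"\b", text)
        if match:
            hits.append((match.start(), kw))
    # Sort by position so results follow the order in which they were spoken.
    return [kw for _, kw in sorted(hits)]


print(spot_keywords("I would like to visit London."))  # ['london']
```

Everything other than the keyword ("I would like to visit") is simply ignored, which is the behavior the definition describes.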