摘要:
A speech controlled dialing circuit identifies input utterances which may be a command word (mode select), repertory word (dialing name or number), or nonrecognized ("Other"). Responsive to the identification of each occurring input utterance, a set of predetermined templates are selected to identify the next occuring utterance. A programmed microprocessor system is described to implement the main controller function.
摘要:
Speaker independent recognition of small vocabularies, spoken over the long distance telephone network, is achieved using two types of models, one type for defined vocabulary words (e.g., collect, calling-card, person, third-number and operator), and one type for extraneous input which ranges from non-speech sounds to groups of non-vocabulary words (e.g. `I want to make a collect call please`). For this type of key word spotting, modifications are made to a connected word speech recognition algorithm based on state-transitional (hidden Markov) models which allow it to recognize words from a pre-defined vocabulary list spoken in an unconstrained fashion. Statistical models of both the actual vocabulary words and the extraneous speech and background noises are created. A syntax-driven connected word recognition system is then used to find the best sequence of extraneous input and vocabulary word models for matching the actual input speech.
摘要:
An arrangement for endpoint detection improves speech recognition accuracy and lowers rejection rates by developing an ordered list of endpoint candidates. A triple thresholding technique defines energy signal pulses. The energy pulses are combined according to predetermined criteria to form the endpoint candidates.
摘要:
A system for generating speech pattern templates for use with either speech recognition or speech synthesis. Reference demisyllable templates are first generated from a reference first speaker using both manual and automatic analysis. The analysis for a second speaker is simplified and automated by comparing with the first speaker's templates. The second speaker speaks the same words at a rate time-warped to match the first speakers rate and template. We define a demisyllable as each of the two halves of a syllable, assuming a syllable starts and ends with a noisy consonant, and the syllable is split at its vowel center, thereby simplifying concatenation and comparison. Key features of the invention include generating a set of signals representative of the time alignment between the first and second speaker's templates, and the time-of-occurence boundaries of each syllable in a word.
摘要:
A speech controlled dialing circuit identifies input utterances which may be a command word (mode select), repertory word (dialing name or number), or non-recognized ("Other"). Responsive to the identification of each occurring input utterance, a set of predetermined templates are selected to identify the next occuring utterance. A programmed microprocessor system is described to implement the main controller function.
摘要:
Apparatus and method for recording data in a speech recognition system and recognizing spoken data corresponding to the recorded data. The apparatus and method responds to entered data by generating a string of phonetic transcriptions from the entered data. The data and generated phonetic transcription string associated therewith is recorded in a vocabulary lexicon of the speech recognition system. The apparatus and method responds to receipt of spoken data by constructing a model of subwords characteristic of the spoken data and compares the constructed subword model with ones of the recorded lexicon vocabulary recorded phonetic transcription strings to recognize the spoken data as the data identified by and associated with a phonetic transcription string matching the constructed subword string.
摘要:
An arrangement for endpoint detection improves speech recognition accuracy where the input signal includes nonstationary noise. Energy pulses are found by looking for local energy level peaks, then analyzing surrounding energy levels to determine pulse boundaries. Energy pulses are combined according to predetermined criteria to form longer pulses corresponding to words or phrases in the input signal.
摘要:
An input word is recognized as one of a set of reference words. A set of word distance signals representative of the correspondence of the input word to the reference words is generated. A set of weighted word distance signals is also generated. Responsive to the word distance signals and the weighted word distance signals, the reference word that most closely corresponds to the input word is selected.
摘要:
An arrangement for endpoint detection improves speech recognition accuracy and lowers rejection rates by developing an ordered list of endpoint candidates. A triple thresholding technique defines energy signal pulses. The energy pulses are combined according to predetermined criteria to form the endpoint candidates.