Abstract:
A call routing and supervising system includes an input that receives customer speech from a remote location and a voice characteristics extractor that extracts voice characteristics from the customer speech, such as language/dialect/accent, age group, gender, and eigendimension coordinates. A customer service representative selector selects one or more customer service representatives based on their profiles with respect to customers whose voice characteristics are similar to the extracted voice characteristics. In other aspects, a call monitor automatically analyzes the dialogue between the customer and the customer service representative, for example by detecting interruptions, tracking dialogue turns, and recognizing key phrases that indicate frustration, politeness, and/or resolution characteristics of the dialogue. The call monitor records the performance of the customer service representative with respect to customers having those voice characteristics. Based on the analysis results, the system can also automatically reroute calls and/or instruct call center personnel in real time.
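The selector and call monitor described above can be illustrated with a minimal sketch. It assumes each representative's profile is a list of (customer eigendimension coordinates, outcome score) pairs, that "similar" customers are weighted with a Gaussian kernel on Euclidean distance, and that the monitor works on an already transcribed, time-stamped list of turns. `RepProfile`, `analyze_call`, `select_reps`, the kernel width `sigma`, and the key-phrase lists are all hypothetical illustrations, not the patented method.

```python
import math
from dataclasses import dataclass, field

# Hypothetical key-phrase lists; a real monitor would match recognizer output.
KEY_PHRASES = {
    "frustration": ("this is ridiculous", "speak to a manager"),
    "politeness": ("thank you", "please"),
    "resolution": ("that fixed it", "problem solved"),
}

def analyze_call(turns):
    """turns: list of (speaker, start_sec, end_sec, text), ordered by start time."""
    # Interruption: speaker changes and the new turn starts before the previous one ends.
    interruptions = sum(1 for prev, cur in zip(turns, turns[1:])
                        if cur[0] != prev[0] and cur[1] < prev[2])
    text = " ".join(t[3].lower() for t in turns)
    hits = {name: sum(text.count(p) for p in phrases)
            for name, phrases in KEY_PHRASES.items()}
    return {"turns": len(turns), "interruptions": interruptions, **hits}

@dataclass
class RepProfile:
    name: str
    # One (customer coordinates, outcome score in [0, 1]) entry per handled call.
    history: list = field(default_factory=list)

    def record(self, coords, outcome):
        self.history.append((tuple(coords), outcome))

def similarity(a, b, sigma=1.0):
    # Gaussian kernel on Euclidean distance in the eigendimension space.
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))

def expected_performance(rep, coords, sigma=1.0, prior=0.5):
    # Similarity-weighted mean of past outcomes; falls back to a prior with no history.
    num = den = 0.0
    for past, outcome in rep.history:
        w = similarity(past, coords, sigma)
        num += w * outcome
        den += w
    return num / den if den > 0 else prior

def select_reps(reps, coords, k=1, sigma=1.0):
    # Rank representatives by expected performance for this caller's voice profile.
    return sorted(reps, key=lambda r: expected_performance(r, coords, sigma),
                  reverse=True)[:k]
```

In use, the monitor's per-call analysis would be condensed into an outcome score and passed to `record`, so the profiles improve as calls are handled; how that score is derived is left open here.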
Abstract:
An automated hotel attendant is provided for coordinating room-to-room calling over a telephone switching system that supports multiple telephone extensions. A hotel registration system receives and stores the spelled names of hotel guests and assigns each guest an associated telephone extension. A lexicon training system is connected to the hotel registration system and generates pronunciations for each spelled name by converting the characters that spell the name into word-phoneme data. This word-phoneme data is in turn stored in a lexicon that is used by a speech recognition system. In particular, a phoneticizer in conjunction with a Hidden Markov Model (HMM) based model trainer serves as the basis for the lexicon training system, such that one or more HMMs associated with each guest name are stored in the lexicon. An automated attendant is coupled to the speech recognition system and converts a spoken name of a hotel guest, entered from one of the telephone extensions, into a predefined hotel guest name that can be used to retrieve the assigned telephone extension from the hotel registration system. In response to the spoken name, the automated attendant then causes the telephone switching system to call the requested telephone extension.
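The registration, lexicon, and lookup flow can be sketched as follows. This is a simplification under stated assumptions: the phoneticizer and HMM trainer are replaced by a hypothetical `phoneticizer` callable, and recognition is approximated by choosing the lexicon entry with the smallest Levenshtein distance to an already decoded phoneme sequence. `HotelAttendant`, `register`, `route`, and the starting extension 101 are illustrative names and values only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

def edit_distance(a, b):
    # Levenshtein distance between two sequences (here, phoneme lists).
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

@dataclass
class HotelAttendant:
    # Hypothetical stand-in for the phoneticizer + HMM trainer: spelled name -> phonemes.
    phoneticizer: Callable[[str], List[str]]
    extensions: Dict[str, int] = field(default_factory=dict)    # registration system
    lexicon: Dict[str, List[str]] = field(default_factory=dict)  # word-phoneme data
    next_extension: int = 101

    def register(self, name: str) -> int:
        # Assign an extension and run the lexicon-training step for the new guest.
        ext = self.next_extension
        self.next_extension += 1
        self.extensions[name] = ext
        self.lexicon[name] = self.phoneticizer(name)
        return ext

    def route(self, heard: List[str]):
        # Stand-in for recognition: nearest lexicon entry to the decoded phonemes,
        # returned with the extension the switching system should ring.
        name = min(self.lexicon, key=lambda n: edit_distance(self.lexicon[n], heard))
        return name, self.extensions[name]
```

With a toy phoneticizer mapping "Smith" to S M IH TH and "Schmidt" to SH M IH T, a decoded sequence SH M IH D T routes to Schmidt, because it is one insertion away from that entry and three edits away from Smith.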
Abstract:
A mixed decision tree includes a network of yes-no questions about adjacent letters in a spelled word sequence and also about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision tree provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as lexicography applications.
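A minimal sketch of scoring and ranking pronunciations with mixed trees follows. It assumes one tree per letter, a pronunciation aligned one phoneme per letter, internal nodes that ask whether the letter or phoneme at a relative offset equals a given symbol (with "#" as a hypothetical word-boundary marker), and a pronunciation score equal to the product of the leaf probabilities. The `Node` layout and the toy probabilities are illustrative, not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Node:
    # Internal node: "is the letter/phoneme at `offset` from here equal to `value`?"
    kind: str = ""          # "letter" or "phoneme"
    offset: int = 0
    value: str = ""
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    # Leaf node: probability of each phoneme for the current letter.
    probs: Optional[Dict[str, float]] = None

def leaf_for(node, letters, phones, i):
    # Walk the yes-no questions until a leaf is reached.
    while node.probs is None:
        j = i + node.offset
        seq = letters if node.kind == "letter" else phones
        symbol = seq[j] if 0 <= j < len(seq) else "#"  # "#" marks a word boundary
        node = node.yes if symbol == node.value else node.no
    return node.probs

def score(trees, word, phones):
    # Product of per-letter leaf probabilities for one candidate pronunciation.
    p = 1.0
    for i, letter in enumerate(word):
        p *= leaf_for(trees[letter], word, phones, i).get(phones[i], 0.0)
    return p

def rank(trees, word, candidates):
    # Most probable pronunciation first.
    return sorted(candidates, key=lambda ph: score(trees, word, ph), reverse=True)
```

The phoneme questions are what make the tree "mixed": in a toy tree for "a" that asks whether the previous phoneme is k, the candidate k ae t for "cat" can score higher than s ae t even when the letter context alone is ambiguous.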
Abstract:
A computer-implemented method and apparatus are provided for processing a spoken request from a user. A speech recognizer converts the spoken request into a digital format. A frame data structure associates semantic components of the digitized spoken request with predetermined slots. The slots are indicative of data that are used to achieve a predetermined goal. A speech understanding module, connected to the speech recognizer and to the frame data structure, determines the semantic components of the spoken request, and the slots are populated based upon the determined semantic components. A dialog manager connected to the speech understanding module may determine at least one slot that remains unpopulated and, in a preferred embodiment, may provide confirmation of the populated slots. A computer-generated request is then formulated so that the user can provide data related to the unpopulated slot. The method and apparatus are well suited (but not limited) to use in a hand-held speech translation device.
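The frame, slot, and dialog-manager interaction can be sketched as below, assuming the recognizer has already produced text. The speech understanding module is replaced by a hypothetical keyword lookup (`understand` with a `vocab` table), and the prompt and confirmation wording is illustrative. `Frame`, `missing`, and `next_turn` are names introduced here, not from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Frame:
    goal: str
    slots: Dict[str, Optional[str]]   # slot name -> value (None = unpopulated)
    prompts: Dict[str, str]           # slot name -> question asked when it is empty

    def missing(self) -> List[str]:
        return [name for name, value in self.slots.items() if value is None]

def understand(text, frame, vocab):
    # Hypothetical stand-in for the speech understanding module:
    # map known words to (slot, value) pairs and populate the frame.
    for word in text.lower().split():
        if word in vocab:
            slot, value = vocab[word]
            frame.slots[slot] = value

def next_turn(frame):
    # Dialog manager: confirm filled slots and request the first unpopulated one.
    missing = frame.missing()
    filled = ", ".join(f"{k}={v}" for k, v in frame.slots.items() if v is not None)
    if missing:
        prompt = frame.prompts[missing[0]]
        return f"You said {filled}. {prompt}" if filled else prompt
    return f"Confirming {frame.goal}: {filled}."
```

For a hypothetical `train_ticket` frame with `destination` and `date` slots, the utterance "I want to go to Paris" fills only `destination`, so the next turn confirms it and asks for the date.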
Abstract:
Decision trees are used to store a series of yes-no questions that can be used to convert spelled-word letter sequences into pronunciations. Letter-only trees, having internal nodes populated with questions about letters in the input sequence, generate one or more pronunciations based on probability data stored in the leaf nodes of the tree. The pronunciations may then be improved by processing them using mixed trees which are populated with questions about letters in the sequence and also questions about phonemes associated with those letters. The mixed tree screens out pronunciations that would not occur in natural speech, thereby greatly improving the results of the letter-to-pronunciation transformation.
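The two-pass flow (letter-only trees propose pronunciations, mixed trees rescore them) can be sketched as follows. Assumptions: each letter-only leaf is given directly as a phoneme distribution for its letter, "_" is a hypothetical silent-phoneme symbol, the n-best list is found by brute-force enumeration (adequate only for short words), and the mixed-tree pass is abstracted as a hypothetical `mixed_prob` callable. In the usage shown, that callable is a simple forbidden-bigram filter, a deliberately simplified substitute for real mixed trees.

```python
import heapq
import itertools
import math

def letter_only_nbest(dists, n=3):
    # dists[i]: phoneme -> probability from the letter-only tree leaf for letter i.
    # Enumerate every combination and keep the n most probable pronunciations.
    combos = itertools.product(*(d.items() for d in dists))
    scored = ((math.prod(p for _, p in c), [ph for ph, _ in c]) for c in combos)
    return heapq.nlargest(n, scored, key=lambda t: t[0])

def rescore(nbest, mixed_prob):
    # mixed_prob(phones): probability under letter + phoneme questions (hypothetical).
    # A zero screens out pronunciations that would not occur in natural speech.
    rescored = [(p * mixed_prob(phones), phones) for p, phones in nbest]
    return sorted(rescored, key=lambda t: t[0], reverse=True)
```

For example, with letter distributions {k: 0.6, ch: 0.4} and {hh: 0.5, _: 0.5}, a filter that rejects the bigrams (k, hh) and (ch, hh) leaves k _ ranked first and ch _ second, and drives the other two candidates to zero.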
Abstract:
A speaker authentication system includes a data fuser operable to fuse voiceprint match attempt results with additional information to assist in authenticating a speaker providing audio input. In other aspects, the system includes a data store of speaker voiceprints and a voiceprint matching module that receives an audio input and helps authenticate a speaker by attempting to match the audio input to at least one of the speaker voiceprints. The voiceprint matching module adjusts the confidence of voiceprint match attempt results by at least one of: (a) the number of utterance repetitions on which a matching speaker voiceprint has been trained; or (b) the time elapsed since a training occurrence associated with a matching speaker voiceprint.
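One way to realize the two confidence adjustments and the data fuser is sketched below. The specific formulas are assumptions chosen for illustration: trust grows linearly with training repetitions up to a hypothetical `saturation` count, decays exponentially with a hypothetical `half_life_days`, and fusion is a weighted average of scores in [0, 1]. The source does not specify any of these functions.

```python
def adjusted_confidence(raw_score, repetitions, days_since_training,
                        saturation=5, half_life_days=180.0):
    # (a) More training repetitions -> more trust, capped at `saturation` (assumed rule).
    rep_factor = min(repetitions, saturation) / saturation
    # (b) Older voiceprints -> less trust, exponential decay (assumed half-life).
    age_factor = 0.5 ** (days_since_training / half_life_days)
    return raw_score * rep_factor * age_factor

def fuse(voice_conf, extra_evidence, weights):
    # Hypothetical data fuser: weighted average of the voiceprint confidence and
    # other evidence scores in [0, 1], e.g. {"caller_id": 1.0}.
    total = weights["voice"] * voice_conf + sum(
        weights[k] * v for k, v in extra_evidence.items())
    norm = weights["voice"] + sum(weights[k] for k in extra_evidence)
    return total / norm
```

Under these assumptions, a raw match score of 0.9 from a voiceprint trained on five repetitions keeps its full value on the day of training and halves after 180 days, while the same score from a single-repetition voiceprint drops to 0.18.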
Abstract:
A set of models is developed to represent sound units, and these models are then scored against incorrect sound units (speech data belonging to units other than the one each model represents) to determine which models generate high likelihood scores. The models that generate high likelihood scores for incorrect sound units represent the units that are more likely to be confused. The resulting confusability data may then be used to generate more discriminative speech models and to prune the acoustic decision tree. The confusability data may also be used to develop confusability predictors for rejection during search and to develop continuous speech recognition models that are optimized to minimize confusability.
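The confusability measurement can be illustrated with a toy sketch that fits one 1-D Gaussian per sound unit and then, for each sample of a unit, checks which other models score it at least as high as the correct model does. Real systems use multidimensional acoustic features and HMMs; the 1-D features, the "at least as high as the correct model" criterion, and the names `train` and `confusability` are simplifying assumptions.

```python
import math
from collections import defaultdict

def gaussian_loglik(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def train(samples):
    # samples: unit -> list of 1-D feature values; fit one Gaussian per unit.
    models = {}
    for unit, xs in samples.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs) or 1e-6  # avoid zero variance
        models[unit] = (mean, var)
    return models

def confusability(models, samples):
    # conf[(model, true)] = fraction of `true`-unit samples that `model`
    # scores at least as high as the correct model does.
    conf = defaultdict(float)
    for true_unit, xs in samples.items():
        for x in xs:
            ref = gaussian_loglik(x, *models[true_unit])
            for unit, params in models.items():
                if unit != true_unit and gaussian_loglik(x, *params) >= ref:
                    conf[(unit, true_unit)] += 1 / len(xs)
    return dict(conf)
```

With toy features where "b" and "p" overlap and "s" is far away, only the pairs (p, b) and (b, p) appear in the result; such pairs are the candidates for discriminative retraining or for a rejection predictor.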
Abstract:
A system for recording audio/video (A/V) programs in response to a spoken request includes a database of program records representing A/V programs that are available for recording. The system also includes an A/V recording device for receiving a recording command and recording the selected A/V program. A speech recognizer receives the user's spoken request and translates it into a text stream having a plurality of words. A natural language processor receives the text stream and processes the words to resolve the semantic content of the spoken request. The natural language processor places the meaning of the words into a task frame having a plurality of key word slots. A dialogue system analyzes the task frame to determine whether a sufficient number of key word slots have been filled and prompts the user for additional information to fill empty slots. The dialogue system then searches the database of program records using the key words placed within the task frame, selects the A/V program, and generates the recording command for use by the A/V recording device.
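The task-frame check, database search, and command generation can be sketched as follows, assuming the natural language processor has already filled the slots. The three slots (`title`, `channel`, `start`), the sufficiency rule (a title alone, or a channel plus a start time), and the dictionary used as a recording command are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Program:
    title: str
    channel: str
    start: str   # e.g. "20:00"

@dataclass
class TaskFrame:
    title: Optional[str] = None
    channel: Optional[str] = None
    start: Optional[str] = None

    def empty_slots(self):
        return [k for k, v in vars(self).items() if v is None]

    def sufficient(self):
        # Hypothetical rule: a title, or channel plus start time, identifies a program.
        return self.title is not None or (self.channel is not None and self.start is not None)

def search(db: List[Program], frame: TaskFrame):
    # Keep programs that agree with every filled slot (case-insensitive).
    return [p for p in db
            if all(getattr(frame, k) is None
                   or getattr(frame, k).lower() == getattr(p, k).lower()
                   for k in ("title", "channel", "start"))]

def handle(db, frame):
    # Dialogue step: prompt for more slots, or emit a recording command.
    if not frame.sufficient():
        return ("prompt", f"Please tell me the {' or '.join(frame.empty_slots())}.")
    matches = search(db, frame)
    if len(matches) == 1:
        p = matches[0]
        return ("record", {"channel": p.channel, "start": p.start})
    return ("prompt", f"I found {len(matches)} matching programs; can you be more specific?")
```

A request that names only a channel is not sufficient under this rule, so the dialogue asks for a title or start time; a request naming a title that matches exactly one record produces a recording command for that program's channel and start time.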
Abstract:
A constraint-based speech recognition system for use with a form-filling application employed over a telephone system is disclosed. The system comprises an input signal, wherein the input signal includes both speech input and non-speech input of a type generated by a user via a manually operated device. The system further comprises a constraint module operable to access an information database containing information suitable for use with speech recognition, and to generate candidate information based on the non-speech input and the information database, wherein the candidate information corresponds to a portion of the information. The system further comprises a speech recognition module operable to recognize speech based on the speech input and the candidate information. In an exemplary embodiment, the manually operated device is a touch-tone telephone keypad, and the information database is a lexicon encoded according to classes defined by the keys of the keypad.
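A keypad-encoded lexicon and the resulting constraint can be sketched as below. The letter-to-digit mapping is the standard telephone keypad layout (2 = ABC through 9 = WXYZ); the index structure, the optional prefix matching, and the `speech_scores` dictionary standing in for recognizer output are hypothetical simplifications of the constraint and speech recognition modules.

```python
from collections import defaultdict

# Standard telephone keypad letter groups.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}

def keypad_code(word):
    # Encode a lexicon entry by the keys that would spell it.
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower() if ch in LETTER_TO_DIGIT)

def build_index(lexicon):
    # Group lexicon entries into classes defined by their key sequence.
    index = defaultdict(list)
    for word in lexicon:
        index[keypad_code(word)].append(word)
    return index

def candidates(index, digits, prefix=False):
    # Constraint module: entries consistent with the touch-tone (non-speech) input.
    if not prefix:
        return list(index.get(digits, []))
    return [w for code, words in index.items() if code.startswith(digits) for w in words]

def recognize(speech_scores, cands):
    # speech_scores: word -> acoustic score (hypothetical recognizer output);
    # only the candidates allowed by the keypad constraint are considered.
    return max(cands, key=lambda w: speech_scores.get(w, float("-inf"))) if cands else None
```

For instance, "Cole" and "Bold" share the key sequence 2653, so typing those digits narrows recognition to those two names, and a higher acoustic score for an unrelated entry such as "Smith" cannot win.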