Abstract:
A speaker authentication system includes a data fuser operable to fuse voiceprint match attempt results with additional information to assist in authenticating a speaker providing audio input. In other aspects, the system includes a data store of speaker voiceprints and a voiceprint matching module adapted to receive an audio input and operable to attempt to assist in authenticating a speaker by matching the audio input to at least one of the speaker voiceprints. The voiceprint matching module adjusts a confidence of voiceprint match attempt results by at least one of: (a) a number of utterance repetitions upon which a matching speaker voiceprint has been trained; or (b) a passage of time since a training occurrence associated with a matching speaker voiceprint.
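The confidence adjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method. The saturating repetition factor, the exponential age decay, the constants `rep_scale` and `half_life_days`, and the weighted-sum `fuse` function are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class Voiceprint:
    speaker_id: str
    repetitions: int            # utterance repetitions the voiceprint was trained on
    days_since_training: float  # time elapsed since the training occurrence


def adjusted_confidence(raw_score: float, vp: Voiceprint,
                        rep_scale: float = 3.0,
                        half_life_days: float = 180.0) -> float:
    """Scale a raw match score in [0, 1] by training depth and voiceprint age.

    rep_scale and half_life_days are illustrative tuning constants (assumed).
    """
    # Saturating growth: more repetitions push the factor toward 1.
    rep_factor = 1.0 - math.exp(-vp.repetitions / rep_scale)
    # Exponential decay: the factor halves every half_life_days.
    age_factor = 0.5 ** (vp.days_since_training / half_life_days)
    return raw_score * rep_factor * age_factor


def fuse(voice_confidence: float, other_evidence: float,
         weight: float = 0.7) -> float:
    """Weighted linear fusion of voiceprint confidence with another score in [0, 1]."""
    return weight * voice_confidence + (1.0 - weight) * other_evidence
```

With these constants, a voiceprint trained on three repetitions 180 days ago keeps about 32% of its raw score, since (1 - e^-1) * 0.5 is approximately 0.316.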
Abstract:
A system for operating one or more devices using speech input, including a receiver for receiving a speech input, a controller in communication with said receiver, software executing on said controller for converting the speech input into computer-readable data, software executing on said controller for generating a table of active commands, the table including a subset of all valid commands of the system, software executing on said controller for identifying at least one active command represented by the data, and software executing on said controller for transmitting the active command to at least one device operable by the active command.
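A minimal sketch of the active-command table, assuming a hypothetical vocabulary `VALID_COMMANDS` keyed by device and assuming that a command is active when its device is connected. Speech-to-text conversion and transmission to the device are omitted, and `build_active_table` and `identify_command` are illustrative names, not taken from the source.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical full vocabulary of valid commands, grouped by device.
VALID_COMMANDS: Dict[str, List[str]] = {
    "light": ["light on", "light off", "light brighter", "light dimmer"],
    "camera": ["camera zoom in", "camera zoom out", "camera record"],
    "table": ["table up", "table down"],
}


def build_active_table(connected_devices: List[str]) -> Dict[str, str]:
    """Map each active command phrase to the device it operates.

    Only commands for devices currently connected are included, so the
    table holds a subset of VALID_COMMANDS.
    """
    table: Dict[str, str] = {}
    for device in connected_devices:
        for phrase in VALID_COMMANDS.get(device, []):
            table[phrase] = device
    return table


def identify_command(recognized_text: str,
                     table: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (command, device) if the recognized text is an active command."""
    phrase = " ".join(recognized_text.lower().split())  # normalize case and spacing
    device = table.get(phrase)
    return (phrase, device) if device is not None else None
```

Restricting matching to the active table means a phrase such as "camera record" is rejected while no camera is connected, even though it is a valid command of the system.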
Abstract:
A set of models is developed to represent sound units, and these models are then scored against incorrect sound units (sound units other than the one each model represents) to determine which models generate high likelihood scores. The models that generate high likelihood scores for incorrect sound units represent those that are more likely to be confused. The resulting confusability data may then be used to generate more discriminative speech models and in subsequent pruning of the acoustic decision tree. The confusability data may also be used to develop confusability predictors for rejection during search, and to develop continuous speech recognition models that are optimized to minimize confusability.
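The confusability measurement can be illustrated with a toy example. This sketch substitutes one-dimensional Gaussians for the acoustic models and treats a "high likelihood score" as any score within a hypothetical `margin` of the correct model's score. Both are assumptions made for brevity, not details from the source.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Model = Tuple[float, float]  # (mean, variance) of a 1-D Gaussian


def gaussian_loglik(x: float, model: Model) -> float:
    mean, var = model
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def train_models(data: Dict[str, List[float]]) -> Dict[str, Model]:
    """Fit one Gaussian per sound unit from its own training samples."""
    models: Dict[str, Model] = {}
    for unit, xs in data.items():
        mean = sum(xs) / len(xs)
        # Small variance floor keeps the likelihood finite for tight clusters.
        var = sum((x - mean) ** 2 for x in xs) / len(xs) + 1e-3
        models[unit] = (mean, var)
    return models


def confusability(data: Dict[str, List[float]], models: Dict[str, Model],
                  margin: float = 1.0) -> Dict[Tuple[str, str], float]:
    """For each (true unit, other model) pair, return the fraction of the true
    unit's samples on which the other model scores within `margin` nats of,
    or above, the correct model."""
    hits: Dict[Tuple[str, str], int] = defaultdict(int)
    for true_unit, xs in data.items():
        for x in xs:
            own = gaussian_loglik(x, models[true_unit])
            for other, model in models.items():
                if other != true_unit and gaussian_loglik(x, model) >= own - margin:
                    hits[(true_unit, other)] += 1
    return {pair: n / len(data[pair[0]]) for pair, n in hits.items()}
```

Pairs with a high fraction are the ones the abstract describes as more likely to be confused.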
Abstract:
The system includes a database of program records representing A/V programs that are available for recording. The system also includes an A/V recording device for receiving a recording command and recording the A/V program. A speech recognizer receives a spoken request from a user and translates it into a text stream having a plurality of words. A natural language processor receives the text stream and processes the words to resolve the semantic content of the spoken request, placing the meaning of the words into a task frame having a plurality of key word slots. A dialogue system analyzes the task frame to determine whether a sufficient number of key word slots have been filled, and prompts the user for additional information to fill empty slots. The dialogue system searches the database of program records using the key words placed in the task frame to select the A/V program, and generates the recording command for use by the A/V recording device.
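A minimal sketch of the slot-filling dialogue, assuming a task frame represented as a dictionary whose keys are the hypothetical slot names `title`, `channel`, and `start`, and assuming that the frame is "sufficiently filled" once it identifies exactly one program record. Speech recognition and natural language processing are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ProgramRecord:
    title: str
    channel: str
    start: str  # e.g. "20:00"


# Hypothetical key word slots; frame keys are assumed to be ProgramRecord field names.
SLOTS = ("title", "channel", "start")


def search(db: List[ProgramRecord],
           frame: Dict[str, Optional[str]]) -> List[ProgramRecord]:
    """Return records consistent with every filled slot in the task frame."""
    filled = {slot: value for slot, value in frame.items() if value}
    return [r for r in db if all(getattr(r, s) == v for s, v in filled.items())]


def dialogue_step(db: List[ProgramRecord],
                  frame: Dict[str, Optional[str]]) -> Tuple[str, object]:
    """Either issue a recording command or prompt the user to fill a slot."""
    matches = search(db, frame)
    if len(matches) == 1:
        # Enough slots are filled to pick exactly one program.
        return "record", matches[0]
    if not matches:
        return "prompt", "I could not find that program. Please rephrase."
    missing = [s for s in SLOTS if not frame.get(s)]
    if missing:
        return "prompt", f"Please specify the {missing[0]}."
    return "prompt", "Several programs match. Please be more specific."
```

For example, with two programs titled "News" on different channels, the frame `{"title": "News"}` yields the prompt "Please specify the channel.", while adding `"channel": "5"` selects a single record and returns a recording command.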
Abstract:
A voice controlled medical system with improved speech recognition includes a first microphone array, a second microphone array, a controller in communication with the first and second microphone arrays, and a medical device operable by the controller. The controller includes a beam module that generates a first beamed signal using signals from the first microphone array and a second beamed signal using signals from the second microphone array. The controller also includes a comparison module that compares the first and second beamed signals and determines a correlation between the first and second beamed signals. The controller also includes a voice interpreting module that identifies commands within the first and second beamed signals if the correlation is above a correlation threshold. The controller also includes an instrument control module that executes the commands to operate said medical device.
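The beamforming and correlation gate can be sketched in pure Python. The sketch assumes a delay-and-sum beamformer with known integer steering delays, Pearson correlation as the correlation measure, and a threshold of 0.8. None of these specifics come from the source.

```python
import math
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Delay-and-sum beamformer: skip the first d samples of each channel so
    the target source lines up, then average across microphones."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
            for i in range(n)]


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient over the overlapping length."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def should_interpret(beam1: Sequence[float], beam2: Sequence[float],
                     threshold: float = 0.8) -> bool:
    """Pass the beams on to command recognition only if they agree."""
    return correlation(beam1, beam2) > threshold
```

Requiring agreement between two independently steered arrays is one way to reject sounds that reach only one array strongly, such as speech from someone other than the intended operator.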
Abstract:
A device control system including at least one device operable by the system, at least one processor, software executing on the at least one processor for receiving message data and determining a corresponding XML document type, software executing on the at least one processor for generating an XML document based on the XML document type, the XML document including the message data, software executing on the at least one processor for packetizing the XML document, and two or more communication components, each communication component including an XML parser for parsing the XML document and extracting the message data.
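A minimal sketch of the message path using Python's standard `xml.etree.ElementTree` module. The `DOC_TYPES` mapping, the 32-byte default packet size, and the function names are hypothetical; the source does not specify a schema or transport.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical mapping from message kind to XML document type (root element).
DOC_TYPES = {"set_level": "DeviceCommand", "status": "DeviceStatus"}


def build_xml(message: Dict[str, str]) -> bytes:
    """Wrap message data in an XML document of the type chosen for its kind.

    Message keys become child element names, so they must be valid XML names.
    """
    root = ET.Element(DOC_TYPES[message["kind"]])
    for key, value in message.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="utf-8")


def packetize(payload: bytes, size: int = 32) -> List[bytes]:
    """Split the serialized document into fixed-size packets."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


def parse_packets(packets: List[bytes]) -> Tuple[str, Dict[str, str]]:
    """Per communication component: reassemble, parse, and extract the data."""
    root = ET.fromstring(b"".join(packets))
    return root.tag, {child.tag: child.text or "" for child in root}
```

Each communication component would run `parse_packets` on the packets it receives and recover both the document type and the original message data.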
Abstract:
An audio, visual and device data capturing system including an audio recorder for recording audio data, at least one visual recorder for recording visual data, at least one device data recorder for receiving device data from at least one device in communication with the system, a speech recognition module for interpreting the audio data, a transcript module for generating transcript data from the interpreted audio data, a data capturing module for generating a data record including at least a portion of each of the audio data, the transcript data, the visual data and the device data, and at least one storage device for storing the data record.
Abstract:
A speech recognition and control system continuously operable by two or more users including a receiver for receiving a speech input, a processor in communication with the receiver, a database in communication with the processor, the database including a plurality of user profiles, profile management software executing on the processor for determining an active profile from the plurality of user profiles, and software executing on the processor for identifying at least one command from the speech input based on the active profile.
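A minimal sketch of profile management for several users. It assumes each profile carries a per-user phrase vocabulary and that users switch the active profile with a hypothetical spoken phrase such as "profile bob"; a real system might instead select the profile by speaker identification.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UserProfile:
    name: str
    # Hypothetical per-user mapping from spoken phrase to canonical command.
    vocabulary: Dict[str, str] = field(default_factory=dict)


class ProfileManager:
    SWITCH_PREFIX = "profile "  # hypothetical phrase that selects a user

    def __init__(self, profiles: Dict[str, UserProfile], default: str):
        self.profiles = profiles
        self.active = profiles[default]

    def handle(self, utterance: str) -> Optional[str]:
        """Switch profiles on request; otherwise resolve a command using
        the active profile's vocabulary only."""
        text = " ".join(utterance.lower().split())
        if text.startswith(self.SWITCH_PREFIX):
            name = text[len(self.SWITCH_PREFIX):]
            if name in self.profiles:
                self.active = self.profiles[name]
            return None
        return self.active.vocabulary.get(text)
```

Because recognition consults only the active profile, the same phrase can map to a command for one user and be ignored for another.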
Abstract:
A system for tuning the text-to-speech conversion process, having a text-to-speech engine that converts input text into a processed text form which includes speech features. A visual editing interface displays the processed text form on an output device using graphical indicators, allowing a user to edit the text and the graphical indicators to modify the speech features of the text input.
Abstract:
A computer-implemented method and apparatus are provided for processing a spoken request from a user. A speech recognizer converts the spoken request into a digital format. A frame data structure associates semantic components of the digitized spoken request with predetermined slots. The slots are indicative of data which are used to achieve a predetermined goal. A speech understanding module which is connected to the speech recognizer and to the frame data structure determines semantic components of the spoken request. The slots are populated based upon the determined semantic components. A dialog manager which is connected to the speech understanding module may determine at least one slot which is unpopulated based upon the determined semantic components and, in a preferred embodiment, may provide confirmation of the populated slots. A computer-generated request is formulated in order for the user to provide data related to the unpopulated slot. The method and apparatus are well suited (but not limited) to use in a hand-held speech translation device.