Abstract:
Systems, methods, and apparatus for using different interfaces to receive from different devices representations of at least one audio signal. In some embodiments, each representation may be generated using at least one microphone of the respective device during a meeting attended by a plurality of participants. In some further embodiments, a first representation may be received from a first device via a telephone network, while a second representation may be received from a second device via a data network. In yet some further embodiments, the first and second representations may be processed to obtain a processed representation of the at least one audio signal.
Abstract:
Systems, methods and apparatus for capturing at least one audio signal using a plurality of microphones that generate a plurality of representations of the at least one audio signal. In some embodiments, the plurality of microphones are disposed in a multiple-microphone setting so that the at least one audio signal is captured by at least two of the plurality of microphones. In some embodiments, at least one of the plurality of microphones is a microphone of a mobile device. The plurality of representations of the at least one audio signal may be processed to obtain a processed representation of the at least one audio signal.
Abstract:
Systems, methods, and apparatus for using at least one mobile device to receive a representation of at least one audio signal. In some embodiments, the at least one audio signal comprises speech of at least one of a plurality of first participants of a meeting, the plurality of first participants participating in the meeting from a first location, and the at least one audio signal may be audibly rendered to at least one second participant of the meeting at a second location different from the first location. In some embodiments, the at least one mobile device may further receive an indication of an identity of a leading speaker of the speech in the at least one audio signal, the leading speaker being identified from among the plurality of first participants, and may render the identity of the leading speaker to the at least one second participant.
Abstract:
Techniques for error correction using a history list comprising at least one misrecognition and correction information associated with each of the at least one misrecognitions indicating how a user corrected the associated misrecognition. The techniques include converting data input from a user to generate a text segment, determining whether at least a portion of the text segment appears in the history list as one of the at least one misrecognitions, if the at least a portion of the text segment appears in the history list as one of the at least one misrecognitions, obtaining the correction information associated with the at least one misrecognition, and correcting the at least a portion of the text segment based, at least in part, on the correction information.
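The history-list lookup described above can be sketched as follows. This is a minimal illustration, not the patented method: it assumes the history list is a plain dictionary keyed by lowercased single words, and the names `record_correction` and `correct_segment` are hypothetical. A real system would also match multi-word portions of the segment.

```python
def record_correction(history, misrecognition, correction):
    """Store how the user corrected a misrecognition (hypothetical helper)."""
    history[misrecognition.lower()] = correction


def correct_segment(text_segment, history):
    """Replace each word found in the history list with its stored correction.

    Assumption: matching is per word and case-insensitive; words that are
    not in the history list pass through unchanged.
    """
    corrected_words = []
    for word in text_segment.split():
        corrected_words.append(history.get(word.lower(), word))
    return " ".join(corrected_words)


history = {}
record_correction(history, "Jon", "John")
print(correct_segment("call Jon tomorrow", history))
```

Keeping the correction information next to the misrecognition means a repeated recognizer error is fixed without asking the user again.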
Abstract:
Some embodiments relate to a method of performing a search for content on the Internet, in which a user may speak a search query and speech recognition may be performed on the spoken query to generate a text search query to be provided to a plurality of search engines. This enables a user to speak the search query rather than having to type it, and also allows the user to provide the search query only once, rather than having to provide it separately to multiple different search engines.
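As a rough sketch of providing one recognized query to several search engines, the snippet below builds one request URL per engine from a single text query. The engine names and URL templates are placeholders (assumptions, not real services), and the speech recognition step is taken as already done.

```python
from urllib.parse import quote_plus

# Hypothetical engines; the URL templates are illustrative placeholders.
ENGINE_TEMPLATES = {
    "engine_a": "https://search-a.example/?q={query}",
    "engine_b": "https://search-b.example/find?text={query}",
}


def build_queries(recognized_text, templates=ENGINE_TEMPLATES):
    """Return one URL per search engine for the same recognized query."""
    encoded = quote_plus(recognized_text)  # spaces become '+'
    return {name: tpl.format(query=encoded) for name, tpl in templates.items()}


# The text would normally come from a speech recognizer.
for name, url in build_queries("speech recognition patents").items():
    print(name, url)
```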
Abstract:
In one aspect, a method for determining a validity of an identity asserted by a speaker using a voice print is provided. The method comprises acts of performing a first verification stage comprising comparing a first voice signal from the speaker uttering at least one first challenge utterance with at least a portion of the voice print, and performing a second verification stage if it is concluded in the first verification stage that the first voice signal was obtained from an utterance by the user. The second verification stage comprises adapting at least one parameter of the voice print based, at least in part, on the first voice signal to obtain an adapted voice print, and comparing a second voice signal from the speaker uttering at least one second challenge utterance with at least a portion of the adapted voice print.
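The two-stage flow above might look like the following sketch. It makes several simplifying assumptions that the abstract does not state: the voice print and each voice signal are reduced to small feature vectors, comparison is cosine similarity against a fixed threshold, and adaptation is linear interpolation toward the first signal's features. All function names and parameter values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def adapt(voice_print, features, weight=0.2):
    """Move the voice print toward the accepted first-stage features."""
    return [(1 - weight) * v + weight * f for v, f in zip(voice_print, features)]


def verify(voice_print, first_features, second_features, threshold=0.9):
    """Return True only if both verification stages accept the speaker."""
    # Stage 1: compare the first challenge utterance with the stored print.
    if cosine(voice_print, first_features) < threshold:
        return False
    # Stage 2: adapt the print using the first signal, then compare the
    # second challenge utterance with the adapted print.
    adapted = adapt(voice_print, first_features)
    return cosine(adapted, second_features) >= threshold
```

Adapting only after the first stage succeeds keeps a rejected signal from influencing the voice print.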
Abstract:
A modeless large vocabulary continuous speech recognition system is provided that represents an input utterance as a sequence of input vectors. The system includes a common library of acoustic model states for arrangement in sequences that form acoustic models. Each acoustic model is composed of a sequence of segment models, and each segment model is composed of a sequence of model states. An input processor compares each vector in a sequence of input vectors to a set of model states in the common library to produce a match score for each model state in the set, reflecting the likelihood that a state is represented by a vector. The system also includes a plurality of recognition modules and associated recognition grammars. The recognition modules operate in parallel and use the match scores with the acoustic models to determine at least one recognition result in each of the recognition modules. The recognition modules include a dictation module for producing at least one probable dictation recognition result, a select module for recognizing a portion of visually displayed text for processing with a command, and a command module for producing at least one probable command recognition result. An arbitrator uses an arbitration algorithm and a score-ordered queue of recognition results, together with their associated recognition modules, to compare the recognition results of the recognition modules to select at least one system recognition result.
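The arbitrator's score-ordered queue can be illustrated with a heap. The abstract does not specify the arbitration algorithm, so this sketch assumes the simplest rule: take the highest-scoring results regardless of which module produced them. The `Result` type and the `arbitrate` function are hypothetical names.

```python
import heapq
from typing import NamedTuple


class Result(NamedTuple):
    score: float   # higher means a better match
    module: str    # "dictation", "select", or "command"
    text: str


def arbitrate(results, n_best=1):
    """Return the n_best results in descending score order.

    heapq is a min-heap, so scores are negated; the index breaks ties
    in favor of the result that was queued first.
    """
    heap = [(-r.score, i, r) for i, r in enumerate(results)]
    heapq.heapify(heap)
    count = min(n_best, len(heap))
    return [heapq.heappop(heap)[2] for _ in range(count)]


candidates = [
    Result(0.6, "dictation", "delete that line"),
    Result(0.8, "command", "delete that line"),
    Result(0.3, "select", "select that line"),
]
best = arbitrate(candidates)[0]
print(best.module, best.text)
```

Because each queued result carries its module, the selected system result also tells the caller whether to insert text, select text, or run a command.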
Abstract:
In some embodiments, the recognition results produced by a speech processing system (which may include a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for the domain. In some embodiments, one or more of the recognition results may be evaluated to determine whether the result(s) include one or more words or phrases that, when included in a result, would change a meaning of the result in a manner that would be significant for the domain.
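One way to picture the evaluation above is to compare which domain-significant words appear in the top result and in each alternative. The sketch below assumes a small hand-written term list for an imagined clinical domain and treats any difference in those terms as a potentially significant error; both the term list and the function names are illustrative assumptions, and a real system would use a richer notion of meaning.

```python
# Hypothetical words whose presence or absence changes meaning in the domain.
SIGNIFICANT_TERMS = frozenset({"no", "not", "without", "left", "right"})


def significant_terms(result, terms=SIGNIFICANT_TERMS):
    """Return the significant terms that occur in a recognition result."""
    return {word for word in result.lower().split() if word in terms}


def flag_potential_errors(top_result, alternatives, terms=SIGNIFICANT_TERMS):
    """Return the alternatives whose significant terms differ from the top result."""
    top_terms = significant_terms(top_result, terms)
    return [alt for alt in alternatives if significant_terms(alt, terms) != top_terms]


flagged = flag_potential_errors(
    "patient has no fever",
    ["patient has a fever", "patient has no fevers"],
)
print(flagged)
```

Here only the alternative that drops "no" is flagged; the alternative that differs by a plural is treated as not significant for the domain.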