摘要:
A method for building a language model, a speech recognition method and an electronic apparatus are provided. The speech recognition method includes the following steps. Phonetic transcriptions of a speech signal are obtained from an acoustic model. Phonetic spellings matching the phonetic transcriptions are obtained according to the phonetic transcriptions and a syllable acoustic lexicon. According to the phonetic spellings, a plurality of text sequences and a plurality of text sequence probabilities are obtained from a language model. Each phonetic spelling is matched to a candidate sentence table; a word probability of each phonetic spelling matching a word in a sentence of the sentence table are obtained; and the word probabilities of the phonetic spellings are calculated so as to obtain the text sequence probabilities. The text sequence corresponding to a largest one of the sequence probabilities is selected as a recognition result of the speech signal.
摘要:
A method is provided in one example and includes receiving data propagating in a network environment, and identifying selected words within the data based on a whitelist. The whitelist includes a plurality of designated words to be tagged. The method further includes assigning a weight to the selected words based on at least one characteristic associated with the data, and associating the selected words to an individual. A resultant composite is generated for the selected words that are tagged. In more specific embodiments, the resultant composite is partitioned amongst a plurality of individuals associated with the data propagating in the network environment. A social graph can be generated that identifies a relationship between a selected individual and the plurality of individuals based on a plurality of words exchanged between the selected individual and the plurality of individuals.
摘要:
A natural language understanding system performs automatic unsupervised clustering of dialog data from a natural language dialog application. A log parser automatically extracts structured dialog data from application logs. A dialog generalizing module generalizes the extracted dialog data to generalization identifier vectors. A data clustering module automatically clusters the dialog data based on the generalization identifier vectors using an unsupervised density-based clustering algorithm without a predefined number of clusters and without a predefined distance threshold in an iterative approach based on a hierarchical ordering of the generalization.
摘要:
A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
摘要:
A speech recognition system uses, in one embodiment, an extended phonetic dictionary that is obtained by processing words in a user's set of databases, such as a user's contacts database, with a set of pronunciation guessers. The speech recognition system can use a conventional phonetic dictionary and the extended phonetic dictionary to recognize speech inputs that are user requests to use the contacts database, for example, to make a phone call, etc. The extended phonetic dictionary can be updated in response to changes in the contacts database, and the set of pronunciation guessers can include pronunciation guessers for a plurality of locales, each locale having its own pronunciation guesser.
摘要:
A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
摘要:
An apparatus is provided for classifying targets into a known-object group and an unknown-object group. The apparatus includes a speech/image data storage unit configured to store a spoken sound of a name of an object and an image of the object; a unit configured to calculate a speech confidence level of a speech for the name of the object with reference to a spoken sound of a name of a known object; a unit configured to calculate an image confidence level of an image of an object with respect to an image of a known object; and a unit configured to compare an evaluation value, which is obtained by combining the speech confidence level and image confidence level, with a threshold value, and classify a target object into an object group determined according to whether the spoken sound of the name and the image are known or unknown.
摘要:
A system is provided for training an acoustic model for use in speech recognition. In particular, such a system may be used to perform training based on a spoken audio stream and a non-literal transcript of the spoken audio stream. Such a system may identify text in the non-literal transcript which represents concepts having multiple spoken forms. The system may attempt to identify the actual spoken form in the audio stream which produced the corresponding text in the non-literal transcript, and thereby produce a revised transcript which more accurately represents the spoken audio stream. The revised, and more accurate, transcript may be used to train the acoustic model using discriminative training techniques, thereby producing a better acoustic model than that which would be produced using conventional techniques, which perform training based directly on the original non-literal transcript.
摘要:
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy, wherein the visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles by crawling documents whose vocabulary considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.
摘要:
A system of templates of words and terms use in medicine and surgery by physicians for optimizing the outcomes of speech recognition process of converting digital voice data produced by physicians into digital text data comprises words and terms use in medical and surgical specialties. The medical and surgical vocabulary templates comprise individual logical arrangements or orders of related words and terms use by physicians to communicate and record health and health-care information and data.