Abstract:
A method and a device for training an acoustic language model include: conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels; performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels; using the first word segmentation data containing word class labels to train a first language model containing word class labels; using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.
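The word class replacement and retraining steps can be illustrated with a minimal sketch. The class table, the `<CITY>` and `<DAY>` labels, and the unigram model are assumptions made for illustration only; the abstract does not specify how classes are defined or what form the language model takes.

```python
from collections import Counter

# Hypothetical class table; the abstract does not say how classes are defined.
CLASS_TABLE = {"Beijing": "<CITY>", "Shanghai": "<CITY>", "Monday": "<DAY>"}

def replace_classes(tokens, class_table=CLASS_TABLE):
    """Swap each word that belongs to a class for its class label."""
    return [class_table.get(tok, tok) for tok in tokens]

def train_unigram(segmented_sentences):
    """Relative-frequency unigram model, standing in for the language model."""
    counts = Counter(tok for sent in segmented_sentences for tok in sent)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

# Initial segmentation data containing no word class labels (toy input).
initial_segmentation = [["fly", "to", "Beijing"], ["fly", "to", "Shanghai"]]

# First segmentation data containing word class labels.
first_data = [replace_classes(sent) for sent in initial_segmentation]

# First language model containing word class labels.
first_model = train_unigram(first_data)
```

After replacement, both city names share the statistics of a single `<CITY>` token, which is what lets the class-labelled model generalize to class members that are rare in the corpus.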
Abstract:
Methods and computer systems for audio search on a social networking platform are disclosed. The method includes: while running a social networking application, receiving a first audio input from a user of the computer system, the first audio input including one or more search keywords; generating a first audio confusion network from the first audio input; determining whether the first audio confusion network matches at least one of one or more second audio confusion networks, wherein a respective second audio confusion network was generated from a corresponding second audio input associated with a chat session of which the user is a participant; and identifying a second audio input corresponding to the at least one second audio confusion network that matches the first audio confusion network, wherein the identified second audio input includes the one or more search keywords that are included in the first audio input.
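One way to picture the matching step is the sketch below. It assumes a confusion network is a list of slots, each mapping candidate words to posterior probabilities, and it treats two networks as matching when the query aligns with a contiguous window of the stored network such that every pair of aligned slots shares a candidate word. Both the representation and the matching rule are illustrative assumptions; the abstract does not define them.

```python
def networks_match(query, candidate):
    """Return True if `query` aligns with some contiguous window of `candidate`.

    Each network is a list of slots; each slot is a dict of word -> posterior.
    Two slots are compatible when they share at least one candidate word.
    """
    if not query:
        return False
    n = len(query)
    for start in range(len(candidate) - n + 1):
        if all(set(query[i]) & set(candidate[start + i]) for i in range(n)):
            return True
    return False

# Toy networks: the recognizer was unsure between "lunch" and "launch".
search_network = [{"lunch": 0.7, "launch": 0.3}, {"today": 1.0}]
chat_networks = {
    "msg-1": [{"lets": 0.9, "let": 0.1},
              {"launch": 0.6, "lunch": 0.4},
              {"today": 0.8, "to": 0.2}],
    "msg-2": [{"see": 1.0}, {"you": 1.0}],
}

# Identify the stored audio inputs whose networks match the search network.
hits = [msg for msg, net in chat_networks.items()
        if networks_match(search_network, net)]
```

Comparing alternatives rather than single best transcripts means a query can still find a message when the top hypothesis for either side is wrong.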
Abstract:
A method of processing information content based on a Chinese language model is performed at a computer, the method including: identifying a plurality of expressions in the information content that is queued to be processed, the information content having been extracted from a speech input through speech recognition; dividing the expressions into a plurality of characteristic units according to semantic features and predetermined characteristics associated with each characteristic unit, each characteristic unit including a subset of the expressions, and the predetermined characteristics at least including a respective integer number of expressions that are included in the characteristic unit; extracting, from the Chinese language model, a plurality of probabilities for punctuation marks associated with each characteristic unit; and in accordance with the probabilities, associating a respective punctuation mark with each characteristic unit included in the information content. The method further comprises adding punctuation marks based on a weight determined for each punctuation mark.
Abstract:
The present application discloses a method, an electronic system and a non-transitory computer readable storage medium for recognizing audio commands in an electronic device. The electronic device obtains audio data based on an audio signal provided by a user and extracts characteristic audio fingerprint features from the audio data. The electronic device further determines whether the corresponding audio signal is generated by an authorized user by comparing the characteristic audio fingerprint features with an audio fingerprint model for the authorized user and with a universal background model that represents user-independent audio fingerprint features, respectively. When the corresponding audio signal is generated by the authorized user of the electronic device, an audio command is extracted from the audio data, and an operation is performed according to the audio command.
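The comparison against the authorized-user model and the universal background model can be sketched as a log-likelihood ratio test. This is a minimal sketch under stated assumptions: each model is reduced to a single diagonal Gaussian given as a `(mean, variance)` pair, and the decision threshold of `0.0` is arbitrary. The abstract does not specify the model family, the scoring rule, or the threshold.

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def is_authorized(features, speaker_model, ubm, threshold=0.0):
    """Average per-frame log-likelihood ratio of speaker model versus UBM."""
    llr = sum(diag_gauss_loglik(f, *speaker_model) - diag_gauss_loglik(f, *ubm)
              for f in features) / len(features)
    return llr > threshold

# Toy models: (mean, variance) per feature dimension. Values are made up.
speaker_model = ([1.0, 1.0], [1.0, 1.0])
ubm = ([0.0, 0.0], [1.0, 1.0])

accepted = is_authorized([[1.0, 1.0], [0.9, 1.1]], speaker_model, ubm)
rejected = is_authorized([[0.0, 0.0]], speaker_model, ubm)
```

Scoring against the background model as well as the user model normalizes away effects that raise or lower likelihood for every speaker, such as channel noise.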
Abstract:
An electronic device with one or more processors and memory trains an acoustic model with an international phonetic alphabet (IPA) phoneme mapping collection and audio samples in different languages, where the acoustic model includes: a foreground model; and a background model. The device generates a phone decoder based on the trained acoustic model. The device collects keyword audio samples, decodes the keyword audio samples with the phone decoder to generate phoneme sequence candidates, and selects a keyword phoneme sequence from the phoneme sequence candidates. After obtaining the keyword phoneme sequence, the device detects one or more keywords in an input audio signal with the trained acoustic model, including: matching phonemic keyword portions of the input audio signal with phonemes in the keyword phoneme sequence with the foreground model; and filtering out phonemic non-keyword portions of the input audio signal with the background model.
Abstract:
Methods and computer systems for audio search on a social networking platform are disclosed. While running a social networking application, a computer system receives a first audio input from a user of the computer system and then generates a first audio confusion network from the first audio input. After comparing the first audio confusion network with one or more second audio confusion networks, each corresponding to a second audio input associated with one of a plurality of participants of a chat session of the social networking application, the computer system identifies at least one second audio input corresponding to the at least one second audio confusion network that matches the first audio confusion network and displays a portion of the chat session including a visual icon representing the identified second audio input on a display of the computer system.
Abstract:
Systems and methods are provided for adding punctuation. For example, one or more first feature units are identified in a voice file taken as a whole; the voice file is divided into multiple segments by detecting silences in the voice file; one or more second feature units are identified in the voice file; a first aggregate weight of first punctuation states of the voice file and a second aggregate weight of second punctuation states of the voice file are determined, using a language model established based on word separation and third semantic features; a weighted calculation is performed to generate a third aggregate weight based on a linear combination associated with the first aggregate weight and the second aggregate weight; and one or more final punctuation marks are added to the voice file based on at least information associated with the third aggregate weight.
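The weighted calculation can be illustrated with a short sketch. It assumes each aggregate weight is stored per candidate punctuation state and that the linear combination takes the form alpha * first + (1 - alpha) * second; the dictionary representation, the coefficient `alpha`, and the sample weights are hypothetical, since the abstract gives no coefficients.

```python
def combine_weights(first, second, alpha=0.5):
    """Third aggregate weight as a linear combination of the first two."""
    states = set(first) | set(second)
    return {s: alpha * first.get(s, 0.0) + (1 - alpha) * second.get(s, 0.0)
            for s in states}

def pick_punctuation(third):
    """Choose the punctuation state with the largest combined weight."""
    return max(third, key=third.get)

# Toy weights for one position: whole-file pass versus silence-segmented pass.
first_weight = {"none": 0.2, ",": 0.5, ".": 0.3}
second_weight = {"none": 0.1, ",": 0.3, ".": 0.6}

third_weight = combine_weights(first_weight, second_weight, alpha=0.5)
final_mark = pick_punctuation(third_weight)
```

Raising `alpha` shifts trust toward the whole-file analysis, while lowering it favors the evidence from silence-based segmentation.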
Abstract:
A method and an apparatus are provided for retrieving keywords. The apparatus configures at least two types of language models in a model file, where each type of language model includes a recognition model and a corresponding decoding model; extracts a speech feature from to-be-processed speech data; performs language matching on the extracted speech feature by using the recognition models in the model file one by one, and determines a recognition model based on a language matching rate; determines the decoding model corresponding to the determined recognition model; decodes the extracted speech feature by using the determined decoding model, and obtains a word recognition result after the decoding; and matches keywords in a keyword dictionary against the word recognition result, and outputs a matched keyword.
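The control flow of this retrieval procedure can be sketched as follows. The `LanguageModel` container, its field names, and the toy scoring and decoding functions are all hypothetical stand-ins; the abstract does not describe how the recognition or decoding models are implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple

@dataclass
class LanguageModel:
    name: str
    match_rate: Callable[[Sequence[float]], float]  # recognition model
    decode: Callable[[Sequence[float]], List[str]]  # corresponding decoding model

def retrieve_keywords(feature: Sequence[float],
                      model_file: List[LanguageModel],
                      keyword_dict: Set[str]) -> Tuple[str, List[str]]:
    # Try every recognition model and keep the best language matching rate.
    best = max(model_file, key=lambda m: m.match_rate(feature))
    # Decode with the decoding model paired with the chosen recognition model.
    words = best.decode(feature)
    # Output the decoded words that appear in the keyword dictionary.
    return best.name, [w for w in words if w in keyword_dict]

# Toy models: the "feature" is just a pair of made-up language scores.
english = LanguageModel("en", lambda f: f[0], lambda f: ["call", "mom", "now"])
mandarin = LanguageModel("zh", lambda f: f[1], lambda f: ["打", "电话"])

language, matched = retrieve_keywords([0.9, 0.1], [english, mandarin],
                                      {"call", "now"})
```

Selecting the language before decoding means only one decoding model runs per utterance, even when the model file holds several languages.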
Abstract:
A method includes: acquiring data samples; performing categorized sentence mining in the acquired data samples to obtain categorized training samples for multiple categories; building a text classifier based on the categorized training samples; classifying the data samples using the text classifier to obtain a class vocabulary and a corpus for each category; mining the corpus for each category according to the class vocabulary for the category to obtain a respective set of high-frequency language templates; training on the templates for each category to obtain a template-based language model for the category; training on the corpus for each category to obtain a class-based language model for the category; training on the class vocabulary for each category to obtain a lexicon-based language model for the category; and building a speech decoder according to an acoustic model, the class-based language model and the lexicon-based language model for any given field, and the data samples.