-
Publication No.: US11450312B2
Publication date: 2022-09-20
Application No.: US16900824
Application date: 2020-06-12
Inventor: Shilun Lin, Xilin Zhang, Wenhua Ma, Bo Liu, Xinhui Li, Li Lu, Xiucai Jiang
Abstract: A speech recognition method includes: obtaining speech information; and determining beginning and ending positions of a candidate speech segment in the speech information by using a weighted finite state transducer (WFST) network. The candidate speech segment is identified as corresponding to a preset keyword. The method also includes clipping the candidate speech segment from the speech information according to the beginning and ending positions of the candidate speech segment; detecting whether the candidate speech segment includes a preset keyword by using a machine learning model; and determining, upon determining that the candidate speech segment comprises the preset keyword, that the speech information comprises the preset keyword.
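The two-stage flow in this abstract (locate a candidate segment, clip it, then verify it with a second model) can be sketched roughly as follows. This is an illustration only: the WFST decoding is replaced by a hypothetical stand-in, `locate_candidate`, which simply thresholds per-frame keyword scores, and `verifier` is a hypothetical callable standing in for the machine learning model. The names, the frame length, and the threshold are assumptions, not details from the patent.

```python
from typing import Callable, Optional, Sequence, Tuple


def locate_candidate(frame_scores: Sequence[float],
                     threshold: float = 0.5) -> Optional[Tuple[int, int]]:
    """Stand-in for WFST decoding (assumption): return (begin, end) frame
    indices of the longest run of frames scoring >= threshold, end exclusive."""
    best = None
    start = None
    # A sentinel below any threshold closes a run that reaches the last frame.
    for i, score in enumerate(list(frame_scores) + [float("-inf")]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def contains_keyword(samples: Sequence[float],
                     frame_scores: Sequence[float],
                     verifier: Callable[[Sequence[float]], bool],
                     frame_len: int = 160,
                     threshold: float = 0.5) -> bool:
    """Stage 1: locate the candidate segment. Stage 2: clip and verify it."""
    span = locate_candidate(frame_scores, threshold)
    if span is None:
        return False
    begin, end = span
    # Clip the candidate segment from the full signal by frame position.
    segment = samples[begin * frame_len:end * frame_len]
    return bool(verifier(segment))
```

The point of the split is that the second model only ever sees the short clipped segment, so it can be more expensive than the first-pass locator.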
-
Publication No.: US20190102373A1
Publication date: 2019-04-04
Application No.: US16133440
Application date: 2018-09-17
Inventor: Lou Li, Qiang Cheng, Feng Rao, Li Lu, Xiang Zhang, Shuai Yue, Bo Chen, Duling Lu
IPC: G06F17/27
Abstract: A method is performed at a computer for automatically correcting typographical errors. The computer selects a target sentence, identifies a target word therein as having a typographical error, and identifies first and second sequences of words separated by the target word as context. After identifying, among a database of grammatically correct sentences, a set of sentences having the first and second sequences of words, each sentence including a replacement word, the computer selects a set of candidate grammatically correct sentences whose corresponding replacement words have similarities to the target word above a pre-set threshold. Finally, the computer chooses, among the set of candidate grammatically correct sentences, a fittest grammatically correct sentence according to a linguistic model and replaces the target word in the target sentence with the replacement word within the fittest grammatically correct sentence.
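A minimal sketch of the context-matching flow described in this abstract, under several assumptions: sentences are pre-tokenized lists of words, the context is `k` words on each side of the target, similarity is measured with `difflib.SequenceMatcher` (the abstract does not name a measure), and `lm_score` is a hypothetical callable standing in for the linguistic model.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def correct_typo(tokens: List[str], idx: int, corpus: List[List[str]],
                 lm_score: Callable[[List[str]], float],
                 k: int = 1, min_sim: float = 0.6) -> List[str]:
    """Replace tokens[idx] using corpus sentences that share its context."""
    left = tokens[max(0, idx - k):idx]      # first sequence of words
    right = tokens[idx + 1:idx + 1 + k]     # second sequence of words
    target = tokens[idx]
    candidates = []
    for sent in corpus:
        for j in range(len(left), len(sent) - len(right)):
            same_left = sent[j - len(left):j] == left
            same_right = sent[j + 1:j + 1 + len(right)] == right
            if not (same_left and same_right):
                continue
            # Keep replacement words that are similar enough to the typo.
            if SequenceMatcher(None, target, sent[j]).ratio() >= min_sim:
                candidates.append((sent, sent[j]))
    if not candidates:
        return tokens
    # Pick the candidate sentence the (hypothetical) language model rates best.
    _, replacement = max(candidates, key=lambda c: lm_score(c[0]))
    return tokens[:idx] + [replacement] + tokens[idx + 1:]
```

For instance, with the target sentence `I wnat to go` and a corpus containing `I want to sleep`, `I went to school`, and `I have to go`, both `want` and `went` pass the similarity filter while `have` does not; the language model score then decides between the two survivors.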
-
Publication No.: US09805715B2
Publication date: 2017-10-31
Application No.: US14106634
Application date: 2013-12-13
Inventor: Shuai Yue, Li Lu, Xiang Zhang, Dadong Xie, Haibo Liu, Bo Chen, Jian Liu
CPC classification number: G10L15/14, G10L15/063, G10L15/083, G10L15/32, G10L2015/088, G10L2015/223
Abstract: A method of recognizing speech commands includes generating a background acoustic model for a sound using a first sound sample, the background acoustic model characterized by a first precision metric. A foreground acoustic model is generated for the sound using a second sound sample, the foreground acoustic model characterized by a second precision metric. A third sound sample is received and decoded by assigning a weight to the third sound sample corresponding to a probability that the sound sample originated in a foreground using the foreground acoustic model and the background acoustic model. The method further includes determining if the weight meets predefined criteria for assigning the third sound sample to the foreground and, when the weight meets the predefined criteria, interpreting the third sound sample as a portion of a speech command. Otherwise, recognition of the third sound sample as a portion of a speech command is forgone.
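One way to read the weighting step is as a posterior probability computed from the two models. The sketch below assumes each acoustic model is a one-dimensional Gaussian over a single feature value, assumes equal priors by default, and uses a fixed minimum weight as the "predefined criteria"; none of these choices come from the patent, and all names are hypothetical.

```python
import math
from typing import Tuple

Gaussian = Tuple[float, float]  # (mean, variance), an assumed model form


def gaussian_loglik(x: float, mean: float, var: float) -> float:
    """Log-density of x under a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def foreground_weight(x: float, fg: Gaussian, bg: Gaussian,
                      prior_fg: float = 0.5) -> float:
    """Posterior probability that x came from the foreground model."""
    log_fg = gaussian_loglik(x, *fg) + math.log(prior_fg)
    log_bg = gaussian_loglik(x, *bg) + math.log(1.0 - prior_fg)
    m = max(log_fg, log_bg)  # subtract the max for numerical stability
    e_fg, e_bg = math.exp(log_fg - m), math.exp(log_bg - m)
    return e_fg / (e_fg + e_bg)


def is_command_portion(x: float, fg: Gaussian, bg: Gaussian,
                       min_weight: float = 0.5) -> bool:
    """Treat the sample as part of a speech command only if the weight
    meets the criterion; otherwise recognition is skipped."""
    return foreground_weight(x, fg, bg) >= min_weight
```

With a foreground model centered at 1.0 and a background model centered at 0.0 (equal variance), a sample at 0.5 is equidistant from both and receives a weight of 0.5; samples closer to 1.0 receive a higher weight.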
-
Publication No.: US09502038B2
Publication date: 2016-11-22
Application No.: US14105110
Application date: 2013-12-12
Inventor: Eryu Wang, Li Lu, Xiang Zhang, Haibo Liu, Lou Li, Feng Rao, Duling Lu, Shuai Yue, Bo Chen
Abstract: A method and device for voiceprint recognition include: establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data; obtaining a plurality of high-level voiceprint features by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, and the tuning producing a second-level DNN model specifying the plurality of high-level voiceprint features; based on the second-level DNN model, registering a respective high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and performing speaker verification for the user based on the respective high-level voiceprint feature sequence registered for the user.
-
Publication No.: US09396723B2
Publication date: 2016-07-19
Application No.: US14109845
Application date: 2013-12-17
IPC: G10L15/00, G10L15/06, G06F17/28, G10L15/183
CPC classification number: G10L15/063, G06F17/28, G10L15/183
Abstract: A method and a device for training an acoustic language model include: conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels; performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels; using the first word segmentation data containing word class labels to train a first language model containing word class labels; using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.
-
Publication No.: US20150095032A1
Publication date: 2015-04-02
Application No.: US14567969
Application date: 2014-12-11
Inventor: Lu Li, Li Lu, Jianxiong Ma, Linghui Kong, Feng Rao, Shuai Yue, Xiang Zhang, Haibo Liu, Eryu Wang, Bo Chen
IPC: G10L15/08
CPC classification number: G10L15/08, G10L15/083, G10L2015/088
Abstract: This application discloses a method of recognizing a keyword in speech that includes a sequence of audio frames, the sequence including a current frame and a subsequent frame. A candidate keyword is determined for the current frame using a decoding network that includes keywords and filler words of multiple languages, and used to determine a confidence score for the audio frame sequence. A word option is also determined for the subsequent frame based on the decoding network, and when the candidate keyword and the word option are associated with two distinct types of languages, the confidence score of the audio frame sequence is updated at least based on a penalty factor associated with the two distinct types of languages. The audio frame sequence is then determined to include both the candidate keyword and the word option by evaluating the updated confidence score according to a keyword determination criterion.
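The cross-language penalty in this record can be illustrated with a small sketch. It assumes the penalty is applied multiplicatively and that the keyword determination criterion is a simple threshold; the abstract specifies neither, and the function names and default values are hypothetical.

```python
def update_confidence(score: float, keyword_lang: str, next_word_lang: str,
                      penalty: float = 0.8) -> float:
    """Scale the confidence score when the candidate keyword and the next
    word option belong to different languages (multiplicative update is
    an assumption)."""
    if keyword_lang != next_word_lang:
        return score * penalty
    return score


def meets_criterion(score: float, threshold: float = 0.5) -> bool:
    """Assumed keyword determination criterion: a fixed score threshold."""
    return score >= threshold
```

Under these assumptions a sequence scoring 0.6 passes when both words are in the same language, but drops to 0.3 and fails with a penalty of 0.5 when the languages differ.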
-
Publication No.: US09818432B2
Publication date: 2017-11-14
Application No.: US15176047
Application date: 2016-06-07
Inventor: Lu Li, Jianxiong Ma, Li Lu
CPC classification number: G10L25/54, G06F17/30026, G10L15/14, G10L21/10, G10L2015/027, G10L2015/088
Abstract: Methods and computer systems for audio search on a social networking platform are disclosed. The method includes: while running a social networking application, receiving a first audio input from a user of the computer system, the first audio input including one or more search keywords; generating a first audio confusion network from the first audio input; determining whether the first audio confusion network matches at least one of one or more second audio confusion networks, wherein a respective second audio confusion network was generated from a corresponding second audio input associated with a chat session of which the user is a participant; and identifying a second audio input corresponding to the at least one second audio confusion network that matches the first audio confusion network, wherein the identified second audio input includes the one or more search keywords that are included in the first audio input.
-
Publication No.: US09811517B2
Publication date: 2017-11-07
Application No.: US14148579
Application date: 2014-01-06
Inventor: Haibo Liu, Eryu Wang, Xiang Zhang, Li Lu, Shuai Yue, Qiuge Liu, Bo Chen, Jian Liu, Lu Li
CPC classification number: G06F17/273, G06F17/2775, G06F17/2785, G06F17/289, G10L15/265
Abstract: A method of processing information content based on a Chinese language model is performed at a computer, the method including: identifying a plurality of expressions in the information content extracted from a speech input through speech recognition that is queued to be processed; dividing the expressions into a plurality of characteristic units according to semantic features and predetermined characteristics associated with each characteristic unit, each including a subset of the expressions and the predetermined characteristics at least including a respective integer number of expressions that are included in the characteristic unit; extracting, from the Chinese language model, a plurality of probabilities for punctuation marks associated with each characteristic unit; and in accordance with the probabilities, associating a respective punctuation mark with each characteristic unit included in the information content. The method further comprises adding punctuation marks based on a weight determined for each punctuation mark.
-
Publication No.: US09754581B2
Publication date: 2017-09-05
Application No.: US13903593
Application date: 2013-05-28
Inventor: Li Lu, Feng Rao, Song Liu, Zongyao Tang, Xiang Zhang, Shuai Yue, Bo Chen
CPC classification number: G10L15/08, G06Q10/1097, G10L15/26, G10L2015/088
Abstract: The present invention, pertaining to the field of speech recognition, discloses a reminder setting method and apparatus. The method includes: acquiring speech signals; acquiring time information in the speech signals by using keyword recognition, and determining a reminder time for reminder setting according to the time information; acquiring a text sequence corresponding to the speech signals by using continuous speech recognition, and determining reminder content for reminder setting according to the time information and the text sequence; and setting a reminder according to the reminder time and the reminder content. According to the present invention, acquiring time information in speech signals by using keyword recognition ensures correct extraction of the time information, so that correct time information is still acquired to set a reminder even when the recognized text sequence is incorrect due to poor precision of whole-text recognition.
-
Publication No.: US09697821B2
Publication date: 2017-07-04
Application No.: US14108223
Application date: 2013-12-16
Inventor: Feng Rao, Li Lu, Bo Chen, Shuai Yue, Xiang Zhang, Eryu Wang, Dadong Xie, Lou Li, Duling Lu
IPC: G10L15/06, G10L15/183, G10L15/197, G10L15/26
CPC classification number: G10L15/063, G10L15/183, G10L15/197, G10L15/26
Abstract: An automatic speech recognition method includes, at a computer having one or more processors and memory for storing one or more programs to be executed by the processors: obtaining a plurality of speech corpus categories through classifying and calculating raw speech corpus; obtaining a plurality of classified language models that respectively correspond to the plurality of speech corpus categories through a language model training applied on each speech corpus category; obtaining an interpolation language model through implementing a weighted interpolation on each classified language model and merging the interpolated plurality of classified language models; constructing a decoding resource in accordance with an acoustic model and the interpolation language model; and decoding input speech using the decoding resource, and outputting a character string with a highest probability as a recognition result of the input speech.
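The weighted interpolation step can be illustrated with unigram models, where the merged probability of a word is the weighted sum of its per-category probabilities. This is a sketch under stated assumptions: the patent does not say the classified models are unigram models, and the maximum-likelihood training, function names, and weights here are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def train_unigram(corpus: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model for one corpus category."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def interpolate(models: List[Dict[str, float]],
                weights: List[float]) -> Dict[str, float]:
    """Merge per-category models: P(w) = sum_i weight_i * P_i(w).
    Weights are assumed to be non-negative and to sum to 1."""
    merged: Dict[str, float] = {}
    for model, weight in zip(models, weights):
        for word, prob in model.items():
            merged[word] = merged.get(word, 0.0) + weight * prob
    return merged
```

For example, interpolating a model trained on `["a", "a", "b"]` with one trained on `["b", "c"]` using weights 0.6 and 0.4 gives probabilities of about 0.4 for `a`, 0.4 for `b`, and 0.2 for `c`, which still sum to 1.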
-