Patent search ap:("TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED") AND inv:"Eryu Wang" Page 1

1.

Granted invention patent (status: in force)
Systems and methods for audio command recognition with speaker authentication

Publication number: US10013985B2

Publication date: 2018-07-03

Application number: US14958606

Application date: 2015-12-03

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventors: Shuai Yue, Xiang Zhang, Li Lu, Feng Rao, Eryu Wang, Haibo Liu, Bo Chen, Jian Liu, Lu Li

IPC: G10L17/24, G10L17/02, G10L17/04, G06F3/16, G10L17/16, G10L17/26, G10L15/22

CPC classification number: G10L17/24, G06F3/167, G10L15/22, G10L17/02, G10L17/16, G10L17/26, G10L2015/223

Abstract: The present application discloses a method, an electronic system and a non-transitory computer readable storage medium for recognizing audio commands in an electronic device. The electronic device obtains audio data based on an audio signal provided by a user and extracts characteristic audio fingerprint features from the audio data. The electronic device further determines whether the corresponding audio signal is generated by an authorized user by comparing the characteristic audio fingerprint features with an audio fingerprint model for the authorized user and with a universal background model that represents user-independent audio fingerprint features, respectively. When the corresponding audio signal is generated by the authorized user of the electronic device, an audio command is extracted from the audio data, and an operation is performed according to the audio command.
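The comparison against a speaker model and a universal background model (UBM) is commonly realized as a log-likelihood ratio test. The sketch below assumes both models are small diagonal-covariance Gaussian mixtures over per-frame feature vectors, and that a frame sequence counts as coming from the authorized user when the average ratio exceeds a threshold. The class and function names, the model parameters, and the threshold of 0 are illustrative assumptions, not details taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DiagGaussian:
    """One mixture component with a diagonal covariance matrix."""
    weight: float
    mean: List[float]
    var: List[float]

    def log_pdf(self, x: Sequence[float]) -> float:
        # log N(x; mean, diag(var)), summed over independent dimensions
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
                   for xi, mu, v in zip(x, self.mean, self.var))


def gmm_log_likelihood(frame: Sequence[float], gmm: List[DiagGaussian]) -> float:
    # log sum_k w_k N(frame; mu_k, var_k), using log-sum-exp for stability
    logs = [math.log(c.weight) + c.log_pdf(frame) for c in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(lp - top) for lp in logs))


def is_authorized(frames, speaker_gmm, ubm, threshold=0.0) -> Tuple[bool, float]:
    # Average per-frame log-likelihood ratio: speaker model vs. UBM
    llr = sum(gmm_log_likelihood(f, speaker_gmm) - gmm_log_likelihood(f, ubm)
              for f in frames) / len(frames)
    return llr > threshold, llr


# Hypothetical 2-D models and features, for illustration only
speaker = [DiagGaussian(1.0, [1.0, 1.0], [0.5, 0.5])]
ubm = [DiagGaussian(0.5, [0.0, 0.0], [1.0, 1.0]),
       DiagGaussian(0.5, [3.0, 3.0], [1.0, 1.0])]
accepted, score = is_authorized([[1.1, 0.9], [0.8, 1.2]], speaker, ubm)
```

Normalizing by the UBM score makes the decision depend on how much better the speaker model explains the audio than a generic voice model does, rather than on the raw likelihood, which varies with channel and content.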

2.

Granted invention patent (status: in force)
Method and device for parallel processing in model training

Publication number: US09508347B2

Publication date: 2016-11-29

Application number: US14108237

Application date: 2013-12-16

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventors: Eryu Wang, Li Lu, Xiang Zhang, Haibo Liu, Feng Rao, Lou Li, Shuai Yue, Bo Chen

IPC: G10L15/16, G10L15/34, G10L15/06, G06N3/02

CPC classification number: G10L15/34, G06N3/02, G10L15/063, G10L15/16

Abstract: A method and a device for training a DNN model includes: at a device including one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
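The training loop in this abstract consists of disjoint data subsets, one SGD worker per subset, a merge step, and a convergence check. It can be sketched on a toy linear model instead of a DNN. The sketch assumes the merge is a plain average of sub-model parameters and that convergence means the merged parameters change by less than a tolerance; the abstract specifies neither rule. Python threads only illustrate the structure here, and real speedups would need processes or GPUs.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sgd_on_subset(weights, subset, lr=0.05, epochs=5):
    """Plain SGD on squared error for a toy linear model y ~ w0 + w1 * x."""
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in subset:
            err = (w0 + w1 * x) - y   # prediction error
            w0 -= lr * err            # d(0.5 * err^2) / d(w0)
            w1 -= lr * err * x        # d(0.5 * err^2) / d(w1)
    return (w0, w1)


def parallel_train(data, n_workers=4, max_iters=50, tol=1e-6):
    model = (0.0, 0.0)  # initial model
    # Round-robin split gives n_workers disjoint subsets of the corpus
    subsets = [data[i::n_workers] for i in range(n_workers)]
    for _ in range(max_iters):
        # Each worker starts from the same model and trains on its own subset
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            sub_models = list(pool.map(sgd_on_subset, [model] * n_workers, subsets))
        # Merge step: average the sub-model parameters (assumed merge rule)
        merged = tuple(sum(ws) / len(sub_models) for ws in zip(*sub_models))
        converged = max(abs(a - b) for a, b in zip(merged, model)) < tol
        model = merged  # intermediate model becomes the next initial model
        if converged:
            break
    return model


random.seed(0)
data = [(x, 2.0 * x + 1.0) for x in (random.random() for _ in range(200))]
w0, w1 = parallel_train(data)
```

Because every worker restarts from the merged model at each iteration, the sub-models never drift far apart, which is what makes simple averaging a reasonable merge rule in this sketch.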

3.

Granted invention patent (status: in force)
Keyword detection with international phonetic alphabet by foreground model and background model

Publication number: US09466289B2

Publication date: 2016-10-11

Application number: US14103775

Application date: 2013-12-11

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventors: Li Lu, Xiang Zhang, Shuai Yue, Feng Rao, Eryu Wang, Lu Li

IPC: G10L15/06, G10L15/08

CPC classification number: G10L15/063, G10L15/08, G10L2015/088

Abstract: An electronic device with one or more processors and memory trains an acoustic model with an international phonetic alphabet (IPA) phoneme mapping collection and audio samples in different languages, where the acoustic model includes: a foreground model; and a background model. The device generates a phone decoder based on the trained acoustic model. The device collects keyword audio samples, decodes the keyword audio samples with the phone decoder to generate phoneme sequence candidates, and selects a keyword phoneme sequence from the phoneme sequence candidates. After obtaining the keyword phoneme sequence, the device detects one or more keywords in an input audio signal with the trained acoustic model, including: matching phonemic keyword portions of the input audio signal with phonemes in the keyword phoneme sequence with the foreground model; and filtering out phonemic non-keyword portions of the input audio signal with the background model.
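The detection step can be sketched as follows, assuming a phone decoder has already produced a sequence of IPA phones and that each phone has been scored under a foreground (keyword) model and a background (non-keyword) model. A window whose phones match the keyword phoneme sequence is reported only if the foreground model beats the background model on average. Everything else is treated as non-keyword audio. The majority-vote rule in select_keyword_sequence is a hypothetical stand-in for the candidate selection the abstract mentions, and all scores are made up.

```python
from collections import Counter
from typing import List, NamedTuple


class Phone(NamedTuple):
    label: str        # IPA phoneme emitted by the phone decoder
    fg_score: float   # log-likelihood under the foreground (keyword) model
    bg_score: float   # log-likelihood under the background (non-keyword) model


def select_keyword_sequence(candidates: List[List[str]]) -> List[str]:
    # Hypothetical rule: keep the candidate decoded most often across samples
    return list(Counter(tuple(c) for c in candidates).most_common(1)[0][0])


def detect_keyword(phones: List[Phone], keyword: List[str],
                   threshold: float = 0.0) -> List[int]:
    hits = []
    n = len(keyword)
    for start in range(len(phones) - n + 1):
        window = phones[start:start + n]
        if [p.label for p in window] != keyword:
            continue  # label mismatch: treated as non-keyword audio
        # Average foreground-vs-background log-likelihood ratio over the window
        llr = sum(p.fg_score - p.bg_score for p in window) / n
        if llr > threshold:
            hits.append(start)
    return hits


keyword = select_keyword_sequence([["n", "i", "h", "a", "o"],
                                   ["n", "i", "h", "a", "o"],
                                   ["l", "i", "h", "a", "o"]])
stream = ([Phone("w", -4.0, -2.0), Phone("o", -4.0, -2.0)]
          + [Phone(p, -1.0, -3.0) for p in keyword])
```

Working on a shared IPA phone inventory is what lets one decoder and one pair of models serve keywords from different languages.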

4.

Granted invention patent (status: in force)
Keyword detection for speech recognition

Publication number: US09230541B2

Publication date: 2016-01-05

Application number: US14567969

Application date: 2014-12-11

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventors: Lu Li, Li Lu, Jianxiong Ma, Linghui Kong, Feng Rao, Shuai Yue, Xiang Zhang, Haibo Liu, Eryu Wang, Bo Chen

IPC: G10L15/08

CPC classification number: G10L15/08, G10L15/083, G10L2015/088

Abstract: This application discloses a method implemented of recognizing a keyword in a speech that includes a sequence of audio frames further including a current frame and a subsequent frame. A candidate keyword is determined for the current frame using a decoding network that includes keywords and filler words of multiple languages, and used to determine a confidence score for the audio frame sequence. A word option is also determined for the subsequent frame based on the decoding network, and when the candidate keyword and the word option are associated with two distinct types of languages, the confidence score of the audio frame sequence is updated at least based on a penalty factor associated with the two distinct types of languages. The audio frame sequence is then determined to include both the candidate keyword and the word option by evaluating the updated confidence score according to a keyword determination criterion.
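The penalty-factor update can be illustrated in the log domain. This sketch assumes each hypothesized word carries a language tag and a log score, that the sequence confidence is the sum of those scores, and that log(penalty_factor) is added at every switch between languages. The keyword determination criterion is taken to be a simple threshold. The factor 0.5, the threshold, and the example words are hypothetical.

```python
import math


def sequence_confidence(words, penalty_factor=0.5):
    """words: (word, language, log_score) tuples in frame order.
    Sums the log scores and adds log(penalty_factor) at each language switch."""
    score = 0.0
    prev_lang = None
    for _word, lang, log_score in words:
        score += log_score
        if prev_lang is not None and lang != prev_lang:
            score += math.log(penalty_factor)  # hypothetical cross-language penalty
        prev_lang = lang
    return score


def contains_keyword(words, threshold, penalty_factor=0.5):
    # Keyword determination criterion assumed to be a simple threshold
    return sequence_confidence(words, penalty_factor) >= threshold


# Candidate keyword (current frame) followed by a word option (subsequent frame)
mono = [("dakai", "zh", -1.0), ("weixin", "zh", -1.0)]
mixed = [("dakai", "zh", -1.0), ("WeChat", "en", -1.0)]
```

With equal acoustic scores, the mixed-language hypothesis ends up below the monolingual one, so a threshold between the two (for example -2.5) accepts only the latter. This is the intended effect of penalizing implausible language switches.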

5.

Granted invention patent (status: in force)
Method and device for voiceprint recognition

Publication number: US09502038B2

Publication date: 2016-11-22

Application number: US14105110

Application date: 2013-12-12

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventors: Eryu Wang, Li Lu, Xiang Zhang, Haibo Liu, Lou Li, Feng Rao, Duling Lu, Shuai Yue, Bo Chen

IPC: G10L21/00, G10L17/18

CPC classification number: G10L17/18, G10L17/02, G10L17/04, G10L17/08

Abstract: A method and device for voiceprint recognition, include: establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data; obtaining a plurality of high-level voiceprint features by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, and the tuning producing a second-level DNN model specifying the plurality of high-level voiceprint features; based on the second-level DNN model, registering a respective high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and performing speaker verification for the user based on the respective high-level voiceprint feature sequence registered for the user.
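Registration and verification against a stored voiceprint can be sketched as follows. The sketch assumes the second-level DNN has already turned each utterance into a sequence of high-level feature vectors, which are stubbed here with literal lists. As a simplification, each sequence is averaged into one vector and compared by cosine similarity against an assumed threshold of 0.8. The patent's actual comparison of feature sequences may differ, and VoiceprintStore is a hypothetical name.

```python
import math


def mean_vector(seq):
    # Average a sequence of equal-length feature vectors, dimension by dimension
    n = len(seq)
    return [sum(col) / n for col in zip(*seq)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


class VoiceprintStore:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.enrolled = {}

    def register(self, user, features):
        # features: per-frame outputs of the (stubbed) second-level DNN
        self.enrolled[user] = mean_vector(features)

    def verify(self, user, features):
        ref = self.enrolled.get(user)
        if ref is None:
            return False  # unknown user: reject
        return cosine(ref, mean_vector(features)) >= self.threshold


store = VoiceprintStore()
store.register("alice", [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
```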

6.

Invention application (status: under examination, published)
DATA PARALLEL PROCESSING METHOD AND APPARATUS BASED ON MULTIPLE GRAPHIC PROCESSING UNITS

Publication number: US20160321777A1

Publication date: 2016-11-03

Application number: US15210278

Application date: 2016-07-14

Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED

Inventors: Xing Jin, Yi Li, Yongqiang Zou, Zhimao Guo, Eryu Wang, Wei Xue, Bo Chen, Yong Li, Chunjian Bao, Lei Xiao

IPC: G06T1/20, G06F9/52, G06F9/50, G06T1/60

CPC classification number: G06T1/20, G06F9/5016, G06F9/5027, G06F9/52, G06N3/0454, G06N3/063, G06N3/084, G06T1/60

Abstract: A parallel data processing method based on multiple graphic processing units (GPUs) is provided, including: creating, in a central processing unit (CPU), a plurality of worker threads for controlling a plurality of worker groups respectively, the worker groups including one or more GPUs; binding each worker thread to a corresponding GPU; loading a plurality of batches of training data from a nonvolatile memory to GPU video memories in the plurality of worker groups; and controlling the plurality of GPUs to perform data processing in parallel through the worker threads. The method can enhance efficiency of multi-GPU parallel data processing. In addition, a parallel data processing apparatus is further provided.
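The thread-per-worker-group structure can be outlined with Python's threading and queue modules. This is only a structural sketch under stated assumptions. Device IDs are plain integers. "Binding" a thread to a GPU is represented by passing the group's first device ID to process_batch; in CUDA code this would typically be a cudaSetDevice call made inside the thread. Batches are assumed to be in memory already rather than loaded from nonvolatile storage. The function names are hypothetical.

```python
import queue
import threading


def run_worker_groups(batches, worker_groups, process_batch):
    """Start one CPU thread per worker group; each thread pulls batches from a
    shared queue and processes them on its group's (first) device."""
    work = queue.Queue()
    for b in batches:
        work.put(b)
    results = {}
    lock = threading.Lock()

    def worker(group_id, devices):
        # A real implementation would bind this thread to devices[0] here.
        while True:
            try:
                batch = work.get_nowait()
            except queue.Empty:
                return  # no batches left for this worker
            out = process_batch(devices[0], batch)
            with lock:
                results.setdefault(group_id, []).append(out)

    threads = [threading.Thread(target=worker, args=(gid, devs))
               for gid, devs in enumerate(worker_groups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Two hypothetical single-GPU worker groups; "processing" is just a sum here
results = run_worker_groups([[1, 2], [3, 4], [5, 6], [7, 8]], [[0], [1]],
                            lambda device, batch: sum(batch))
```

Which group handles which batch depends on thread scheduling, so only the combined set of results is deterministic.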

7.

Granted invention patent (status: in force)
Method and device for acoustic language model training

Publication number: US09396723B2

Publication date: 2016-07-19

Application number: US14109845

Application date: 2013-12-17

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventors: Duling Lu, Lu Li, Feng Rao, Bo Chen, Li Lu, Xiang Zhang, Eryu Wang, Shuai Yue

IPC: G10L15/00, G10L15/06, G06F17/28, G10L15/183

CPC classification number: G10L15/063, G06F17/28, G10L15/183

Abstract: A method and a device for training an acoustic language model, include: conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels; performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels; using the first word segmentation data containing word class labels to train a first language model containing word class labels; using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.
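The word class replacement step and the training of a class-labelled language model can be sketched with bigram counts. The sketch assumes the corpus is already word-segmented and uses a small hypothetical class lexicon (WORD_CLASSES). It omits two steps from the abstract: re-segmenting the corpus with the first class-labelled model, and checking the predetermined criteria.

```python
from collections import Counter

# Hypothetical word-class lexicon, for illustration only
WORD_CLASSES = {"beijing": "<CITY>", "shanghai": "<CITY>", "monday": "<DAY>"}


def replace_word_classes(segmented_sentence, classes=WORD_CLASSES):
    """Replace each word that belongs to a known class with its class label."""
    return [classes.get(w, w) for w in segmented_sentence]


def train_bigram_counts(sentences):
    """Count bigrams, including sentence-boundary tokens, for a simple n-gram LM."""
    counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


corpus = [["fly", "to", "beijing"], ["fly", "to", "shanghai"]]
labeled = [replace_word_classes(s) for s in corpus]
counts = train_bigram_counts(labeled)
```

Pooling "beijing" and "shanghai" under one label lets the statistics for "to <CITY>" be shared across all members of the class, including city names that are rare or absent in the training corpus.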

8.

Invention application (status: in force)
Keyword Detection For Speech Recognition

Publication number: US20150095032A1

Publication date: 2015-04-02

Application number: US14567969

Application date: 2014-12-11

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventors: Lu Li, Li Lu, Jianxiong Ma, Linghui Kong, Feng Rao, Shuai Yue, Xiang Zhang, Haibo Liu, Eryu Wang, Bo Chen

IPC: G10L15/08

CPC classification number: G10L15/08, G10L15/083, G10L2015/088

Abstract: This application discloses a method implemented of recognizing a keyword in a speech that includes a sequence of audio frames further including a current frame and a subsequent frame. A candidate keyword is determined for the current frame using a decoding network that includes keywords and filler words of multiple languages, and used to determine a confidence score for the audio frame sequence. A word option is also determined for the subsequent frame based on the decoding network, and when the candidate keyword and the word option are associated with two distinct types of languages, the confidence score of the audio frame sequence is updated at least based on a penalty factor associated with the two distinct types of languages. The audio frame sequence is then determined to include both the candidate keyword and the word option by evaluating the updated confidence score according to a keyword determination criterion.

9.

Granted invention patent (status: in force)
Method and device for voiceprint recognition

Publication number: US09940935B2

Publication date: 2018-04-10

Application number: US15240696

Application date: 2016-08-18

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventors: Eryu Wang, Li Lu, Xiang Zhang, Haibo Liu, Lou Li, Feng Rao, Duling Lu, Shuai Yue, Bo Chen

IPC: G10L15/16, G10L17/18, G10L17/02, G10L17/04, G10L17/08

CPC classification number: G10L17/18, G10L17/02, G10L17/04, G10L17/08

Abstract: A method is performed at a device having one or more processors and memory. The device establishes a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data. The device establishes a second-level DNN model by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, wherein the second-level DNN model specifies a plurality of high-level voiceprint features. Using the second-level DNN model, registers a first high-level voiceprint feature sequence for a user based on a registration speech sample received from the user. The device performs speaker verification for the user based on the first high-level voiceprint feature sequence registered for the user.

10.

Granted invention patent (status: in force)
Systems and methods for adding punctuations by detecting silences in a voice using plurality of aggregate weights which obey a linear relationship

Publication number: US09779728B2

Publication date: 2017-10-03

Application number: US14160808

Application date: 2014-01-22

Applicant: Tencent Technology (Shenzhen) Company Limited

Inventors: Haibo Liu, Eryu Wang, Xiang Zhang, Shuai Yue, Lu Li, Li Lu, Jian Liu, Bo Chen

IPC: G10L15/00, G10L15/04, G10L15/18, G10L15/26, G10L15/187, G06F17/27, G10L15/183

CPC classification number: G10L15/1815, G06F17/27, G06F17/2725, G10L15/04, G10L15/183, G10L15/187, G10L15/26, G10L15/265

Abstract: Systems and methods are provided for adding punctuations. For example, one or more first feature units are identified in a voice file taken as a whole; the voice file is divided into multiple segments by detecting silences in the voice file; one or more second feature units are identified in the voice file; a first aggregate weight of first punctuation states of the voice file and a second aggregate weight of second punctuation states of the voice file are determined, using a language model established based on word separation and third semantic features; a weighted calculation is performed to generate a third aggregate weight based on a linear combination associated with the first aggregate weight and the second aggregate weight; and one or more final punctuations are added to the voice file based on at least information associated with the third aggregate weight.
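The linear combination in the last step can be sketched per candidate punctuation state. The sketch assumes the first and second aggregate weights are already available as log-domain scores keyed by punctuation state. It also assumes the final punctuation is the state with the highest combined weight. The coefficients a and b and the example scores are hypothetical.

```python
def combine_weights(first, second, a=0.5, b=0.5):
    """Third aggregate weight per punctuation state: a * first + b * second."""
    return {state: a * first[state] + b * second[state] for state in first}


def best_punctuation(first, second, a=0.5, b=0.5):
    # Pick the punctuation state with the largest combined weight
    third = combine_weights(first, second, a, b)
    return max(third, key=third.get)


# Hypothetical weights from whole-file features (first) and segment features (second)
first = {"none": -2.0, "comma": -1.0, "period": -1.5}
second = {"none": -1.0, "comma": -2.5, "period": -0.5}
```

With equal coefficients the segment-level evidence shifts the decision to "period", while a = 1, b = 0 falls back to the whole-file choice "comma". Tuning a and b trades off the two views of the voice file.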