Patent search ap:("Beijing Baidu Netcom Science AND Technology Co. Page Ltd.") AND inv:"Zhijie Chen"

1.

Granted invention patent
Method and apparatus for speech recognition, and storage medium (status: active)

Publication number: US11756529B2

Publication date: 2023-09-12

Application number: US17123253

Filing date: 2020-12-16

Applicant: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.

Inventor: Liao Zhang, Xiaoyin Fu, Zhengxiang Jiang, Mingxin Liang, Junyao Shao, Qi Zhang, Zhijie Chen, Qiguang Zang

IPC: G10L15/02, G06F40/12, G06F40/30, G06F7/78, G06F17/16, G10L15/26

CPC classification number: G10L15/02, G06F7/78, G06F17/16, G06F40/12, G06F40/30, G10L15/26, G10L2015/025

Abstract: Proposed are a method and apparatus for speech recognition, and a storage medium. The specific solution includes: obtaining audio data to be recognized; decoding the audio data to obtain a first syllable of a to-be-converted word, in which the first syllable is a combination of at least one phoneme corresponding to the to-be-converted word; obtaining a sentence to which the to-be-converted word belongs and a converted word in the sentence, and obtaining a second syllable of the converted word; encoding the first syllable and the second syllable to generate first encoding information of the first syllable; and decoding the first encoding information to obtain a text corresponding to the to-be-converted word.
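Illustrative sketch (not taken from the patent): the abstract describes converting a decoded syllable to text by encoding it together with the syllables of words already converted in the same sentence. The toy below only illustrates why that context helps. The lookup table, the `encode` and `decode` names, and the pinyin examples are all hypothetical; the patent's encoder and decoder are presumably learned models, which this sketch replaces with a dictionary lookup.

```python
# Hypothetical toy table: (previous syllable, current syllable) -> character.
# A real system would use a trained model; this table is an assumption.
CONTEXT_TABLE = {
    ("ni", "hao"): "好",   # as in "你好"
    ("xin", "hao"): "号",  # as in "信号"
}


def encode(first_syllable, second_syllables):
    """Combine the syllable to convert with the syllables of already-converted words."""
    return (tuple(second_syllables), first_syllable)


def decode(encoding):
    """Map the combined encoding to text; returns None if the pair is unknown."""
    context, syllable = encoding
    # This toy uses only the most recent context syllable.
    previous = context[-1] if context else None
    return CONTEXT_TABLE.get((previous, syllable))


text_after_ni = decode(encode("hao", ["ni"]))
text_after_xin = decode(encode("hao", ["xin"]))
```

The same syllable "hao" resolves to different characters depending on the preceding syllable, which is the ambiguity the context encoding is meant to resolve.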

2.

Granted invention patent
Method and apparatus for training speech spectrum generation model, and electronic device (status: active)

Publication number: US11488578B2

Publication date: 2022-11-01

Application number: US17205121

Filing date: 2021-03-18

Applicant: Beijing Baidu Netcom Science and Technology Co., Ltd.

Inventor: Zhijie Chen, Tao Sun, Lei Jia

IPC: G10L13/00, G10L13/047, G10L13/10, G10L25/18, G10L25/30

Abstract: The present application discloses a method and an apparatus for training a speech spectrum generation model, as well as an electronic device, and relates to the technical field of speech synthesis and deep learning. A specific implementation is as follows: inputting a first text sequence into the speech spectrum generation model to generate an analog spectrum sequence corresponding to the first text sequence, and obtain a first loss value of the analog spectrum sequence according to a preset loss function; inputting the analog spectrum sequence corresponding to the first text sequence into an adversarial loss function model, which is a generative adversarial network model, to obtain a second loss value of the analog spectrum sequence; and training the speech spectrum generation model based on the first loss value and the second loss value.
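Illustrative sketch (not taken from the patent): the abstract describes training on two loss values, one from a preset loss function and one from an adversarial (GAN) model. The abstract does not say which preset loss is used or how the two values are combined, so the mean squared error, the least-squares adversarial form, the weighted sum, and the `adv_weight` parameter below are all assumptions chosen for illustration.

```python
def first_loss(generated, reference):
    """Assumed preset loss: mean squared error over all spectrum bins."""
    total = 0.0
    count = 0
    for gen_frame, ref_frame in zip(generated, reference):
        for g, r in zip(gen_frame, ref_frame):
            total += (g - r) ** 2
            count += 1
    return total / count


def second_loss(discriminator, generated):
    """Assumed adversarial loss: squared distance of the score from 1.0 ("real")."""
    score = discriminator(generated)
    return (score - 1.0) ** 2


def combined_loss(generated, reference, discriminator, adv_weight=0.1):
    """Assumed combination: weighted sum of the two loss values."""
    return first_loss(generated, reference) + adv_weight * second_loss(
        discriminator, generated
    )


# Toy data: one frame with two bins, and a stand-in discriminator that
# always returns 0.5 (hypothetical; a real one would be a trained network).
generated = [[1.0, 2.0]]
reference = [[1.0, 4.0]]
loss = combined_loss(generated, reference, lambda spectrum: 0.5)
```

In an actual training loop the combined value would be backpropagated through the generation model; the sketch only shows how the two loss values might be formed and merged.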
