Patent search ap:("Beijing Baidu Netcom Science Technology Co., Ltd.") AND inv:"Ce ZHANG"

1.

Published invention application
AUDIO SIGNAL PROCESSING METHOD, TRAINING METHOD, APPARATUS AND STORAGE MEDIUM
Legal status: Under examination (published)

Publication No.: US20230197096A1

Publication date: 2023-06-22

Application No.: US17812784

Filing date: 2022-07-15

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Wenkai ZHANG, Ce ZHANG, Zheng LI, Lei JIA

IPC: G10L21/0224, G10L15/22, G10L25/30, G10L15/06

CPC classification number: G10L21/0224, G10L15/22, G10L15/063, G10L25/30, G10L2015/223, G10L2021/02082

Abstract: Provided are an audio signal processing method, a training method, an apparatus and a storage medium, relating to the field of data processing, in particular to the field of voice. The audio signal processing method includes: eliminating at least part of a linear echo signal from a mixed voice signal, to obtain an intermediate processing signal, where the mixed voice signal is obtained by mixing a target voice signal with an echo signal, and the echo signal is generated in an environment where the target voice signal is located and includes the linear echo signal and a nonlinear echo signal; and removing the nonlinear echo signal and a residual part of the linear echo signal from the intermediate processing signal, by using a target full convolution neural network model, to obtain an approximate target voice signal, the target full convolution neural network model including at least two convolution layers.
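
Illustrative sketch (not taken from the patent): the two stages named in the abstract can be pictured roughly as below. The abstract does not say how the linear echo is eliminated, so the NLMS adaptive filter, the far-end `reference` signal, the synthetic test mix, and all function names are assumptions made for illustration. The second stage is only a shape-level stand-in for a trained model: two 1-D convolution layers with placeholder kernels, so it does not actually suppress nonlinear echo.

```python
import math
import random


def nlms_linear_echo_removal(mixed, reference, taps=8, mu=0.5, eps=1e-8):
    """Stage 1 (assumed NLMS): subtract an adaptive linear echo estimate."""
    weights = [0.0] * taps
    intermediate = []
    for n in range(len(mixed)):
        # Most recent `taps` reference samples, newest first, zero-padded.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(w * xi for w, xi in zip(weights, x))
        error = mixed[n] - echo_estimate
        norm = sum(xi * xi for xi in x) + eps
        weights = [w + mu * error * xi / norm for w, xi in zip(weights, x)]
        intermediate.append(error)
    return intermediate


def conv1d_same(signal, kernel):
    """Zero-padded 1-D convolution layer whose output matches the input length."""
    half = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = n + k - half
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def two_layer_fcn(signal, kernel1, kernel2):
    """Stage 2 stand-in: two convolution layers with a tanh in between.

    The kernels are placeholders; a real model would use trained weights.
    """
    hidden = [math.tanh(v) for v in conv1d_same(signal, kernel1)]
    return conv1d_same(hidden, kernel2)


def make_synthetic_mix(n, seed=0):
    """Hypothetical test data: target voice plus linear and nonlinear echo."""
    rng = random.Random(seed)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    target = [0.1 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
    echo = []
    for i in range(n):
        prev = reference[i - 1] if i > 0 else 0.0
        linear = 0.6 * reference[i] + 0.3 * prev
        nonlinear = 0.05 * math.tanh(3.0 * reference[i])
        echo.append(linear + nonlinear)
    mixed = [t + e for t, e in zip(target, echo)]
    return target, reference, mixed, echo
```

Calling `nlms_linear_echo_removal(mixed, reference)` on the synthetic mix yields the intermediate processing signal, which would then be passed to the convolutional stage. In this sketch the linear stage does the bulk of the work and the convolutional stage only shows where a learned model would sit.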

2.

Invention application
METHOD OF TRAINING DEEP LEARNING MODEL, AND METHOD OF SYNTHESIZING SPEECH
Legal status: Granted

Publication No.: US20250157457A1

Publication date: 2025-05-15

Application No.: US19023572

Filing date: 2025-01-16

Applicant: Beijing Baidu Netcom Science Technology Co., Ltd.

Inventor: Bin HUANG, Tao SUN, Ce ZHANG, Yongguo KANG, Xiaoyin FU, Lei JIA

IPC: G10L13/027

Abstract: A method of training a deep learning model and a method of synthesizing speech are provided, which relate to the field of artificial intelligence technology, in particular to the fields of large models, large language models, generative models, deep learning, and speech processing technologies. The method of training a deep learning model includes: determining a reference speech feature of a sample speech, the reference speech feature being associated with a prosodic feature of the sample speech; retrieving a speech library using a sample text corresponding to the sample speech, so as to obtain a pronunciation expression feature of the sample text; inputting the pronunciation expression feature into the deep learning model to obtain an output speech feature; determining a loss of the deep learning model according to the reference speech feature and the output speech feature; and adjusting a parameter of the deep learning model according to the loss.
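
Illustrative sketch (not taken from the patent): the training steps listed in the abstract follow the usual retrieve, forward, loss, update pattern, which might look roughly like the toy loop below. Everything concrete here is an assumption for illustration: the speech library is a plain dictionary of per-token vectors, retrieval is a simple average, the "deep learning model" is a single linear layer, and the loss is mean squared error. The abstract specifies none of these choices.

```python
def retrieve_pronunciation_feature(speech_library, sample_text):
    """Toy retrieval: average the stored vectors of the tokens in the text."""
    dim = len(next(iter(speech_library.values())))
    vectors = [speech_library[t] for t in sample_text.split() if t in speech_library]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def forward(weights, feature):
    """Stand-in model: one linear layer mapping the input feature to an output feature."""
    return [sum(w * f for w, f in zip(row, feature)) for row in weights]


def mse_loss(output, reference):
    """Loss between the output speech feature and the reference speech feature."""
    return sum((o - r) ** 2 for o, r in zip(output, reference)) / len(reference)


def train_step(weights, feature, reference, lr=0.5):
    """One gradient-descent update of the model parameters from the MSE loss."""
    output = forward(weights, feature)
    loss = mse_loss(output, reference)
    n = len(reference)
    new_weights = [
        [w - lr * 2.0 * (o - r) / n * f for w, f in zip(row, feature)]
        for row, o, r in zip(weights, output, reference)
    ]
    return new_weights, loss


def run_training(speech_library, sample_text, reference_feature, steps=50):
    """Retrieve once, then repeat forward pass, loss, and parameter update."""
    feature = retrieve_pronunciation_feature(speech_library, sample_text)
    weights = [[0.0] * len(feature) for _ in reference_feature]
    losses = []
    for _ in range(steps):
        weights, loss = train_step(weights, feature, reference_feature)
        losses.append(loss)
    return weights, losses
```

For example, `run_training({"hello": [1.0, 0.0], "world": [0.0, 1.0]}, "hello world", [0.2, -0.4])` returns the fitted weights together with the per-step losses, which should shrink as the output feature approaches the reference feature.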
