-
Publication No.: US20230197096A1
Publication Date: 2023-06-22
Application No.: US17812784
Filing Date: 2022-07-15
Inventor: Wenkai ZHANG, Ce ZHANG, Zheng LI, Lei JIA
IPC: G10L21/0224, G10L15/22, G10L25/30, G10L15/06
CPC classification number: G10L21/0224, G10L15/22, G10L15/063, G10L25/30, G10L2015/223, G10L2021/02082
Abstract: Provided are an audio signal processing method, a training method, an apparatus, and a storage medium, relating to the field of data processing and, in particular, to the field of voice. The audio signal processing method includes: eliminating at least part of a linear echo signal from a mixed voice signal to obtain an intermediate processing signal, where the mixed voice signal is obtained by mixing a target voice signal with an echo signal, and the echo signal is generated in the environment where the target voice signal is located and includes the linear echo signal and a nonlinear echo signal; and removing the nonlinear echo signal and a residual part of the linear echo signal from the intermediate processing signal by using a target full convolution neural network model to obtain an approximate target voice signal, the target full convolution neural network model including at least two convolution layers.
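The two-stage flow described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the linear echo is eliminated or how the network is built, so everything below is an assumption: an NLMS adaptive filter stands in for the linear stage, and two untrained 1-D convolution layers stand in for the full convolution neural network model. The names `nlms_echo_cancel`, `fcn_forward`, and all parameter values are hypothetical.

```python
import numpy as np

def nlms_echo_cancel(mixed, reference, taps=32, mu=0.5, eps=1e-8):
    """Stage 1 (assumed NLMS, not from the patent): subtract an adaptive
    estimate of the linear echo from the mixed signal."""
    w = np.zeros(taps)                                   # adaptive filter weights
    padded = np.concatenate([np.zeros(taps - 1), reference])
    out = np.empty(len(mixed))
    for n in range(len(mixed)):
        x = padded[n:n + taps][::-1]                     # newest reference sample first
        e = mixed[n] - w @ x                             # mixed minus estimated echo
        w += mu * e * x / (x @ x + eps)                  # normalized LMS update
        out[n] = e
    return out                                           # intermediate processing signal

def fcn_forward(x, kernel1, kernel2):
    """Stage 2 stand-in: two convolution layers and no dense layers.
    The kernels are placeholders; a real model would learn them."""
    hidden = np.tanh(np.convolve(x, kernel1, mode="same"))
    return np.convolve(hidden, kernel2, mode="same")

# Synthetic demo: a far-end reference played through a linear echo path.
rng = np.random.default_rng(0)
n_samples = 4000
reference = rng.standard_normal(n_samples)
linear_echo = np.convolve(reference, [0.6, 0.3, -0.2, 0.1])[:n_samples]
target = 0.1 * np.sin(2 * np.pi * 0.01 * np.arange(n_samples))
mixed = target + linear_echo

intermediate = nlms_echo_cancel(mixed, reference)
smoothing = np.array([0.25, 0.5, 0.25])                  # hypothetical untrained kernel
approx_target = fcn_forward(intermediate, smoothing, smoothing)
```

In this sketch only the linear stage does useful work. The second stage merely shows where a trained model with at least two convolution layers would take over to handle the nonlinear echo and whatever linear echo remains.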
-
Publication No.: US20250157457A1
Publication Date: 2025-05-15
Application No.: US19023572
Filing Date: 2025-01-16
Inventor: Bin HUANG, Tao SUN, Ce ZHANG, Yongguo KANG, Xiaoyin FU, Lei JIA
IPC: G10L13/027
Abstract: A method of training a deep learning model and a method of synthesizing speech are provided, which relate to the field of artificial intelligence technology, in particular to the fields of large model, large language model, generative model, deep learning, and speech processing technologies. The method of training a deep learning model includes: determining a reference speech feature of a sample speech, the reference speech feature being associated with a prosodic feature of the sample speech; retrieving a speech library using a sample text corresponding to the sample speech, so as to obtain a pronunciation expression feature of the sample text; inputting the pronunciation expression feature into the deep learning model to obtain an output speech feature; determining a loss of the deep learning model according to the reference speech feature and the output speech feature; and adjusting a parameter of the deep learning model according to the loss.
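The training steps listed in this abstract can be sketched as a single update loop. The abstract gives no model architecture, loss function, or library format, so the following is a toy under stated assumptions: the speech library is a plain dictionary keyed by text, the model is a single linear map, the loss is mean squared error, and the update is plain gradient descent. The names `retrieve_pronunciation_feature` and `train_step`, and all numeric values, are hypothetical.

```python
import numpy as np

def retrieve_pronunciation_feature(speech_library, sample_text):
    """Look up the pronunciation expression feature for a text.
    A dict lookup is an assumption; the abstract only says 'retrieving'."""
    return speech_library[sample_text]

def train_step(weights, pron_feature, ref_feature, lr=0.1):
    """One update: forward pass, MSE loss, gradient step (all assumed)."""
    output_feature = weights @ pron_feature              # model output speech feature
    diff = output_feature - ref_feature
    loss = float(np.mean(diff ** 2))                     # loss from reference vs output
    grad = 2.0 * np.outer(diff, pron_feature) / diff.size
    return weights - lr * grad, loss                     # adjusted parameter, loss

# Toy data: one sample text with made-up feature vectors.
speech_library = {"hello": np.array([1.0, 0.5, -0.5, 2.0])}
ref_feature = np.array([0.2, -0.1, 0.4])                 # stands in for the prosody-related reference feature
weights = np.zeros((3, 4))

losses = []
for _ in range(20):
    pron_feature = retrieve_pronunciation_feature(speech_library, "hello")
    weights, loss = train_step(weights, pron_feature, ref_feature)
    losses.append(loss)
```

The loop shows only the data flow: retrieve, forward pass, loss, parameter update. In a real system the reference speech feature would be extracted from the sample speech rather than fixed by hand.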
-