-
Publication number: US11488578B2
Publication date: 2022-11-01
Application number: US17205121
Application date: 2021-03-18
Inventor: Zhijie Chen, Tao Sun, Lei Jia
IPC: G10L13/00, G10L13/047, G10L13/10, G10L25/18, G10L25/30
Abstract: The present application discloses a method and an apparatus for training a speech spectrum generation model, as well as an electronic device, and relates to the technical fields of speech synthesis and deep learning. A specific implementation is as follows: inputting a first text sequence into the speech spectrum generation model to generate an analog spectrum sequence corresponding to the first text sequence, and obtaining a first loss value of the analog spectrum sequence according to a preset loss function; inputting the analog spectrum sequence corresponding to the first text sequence into an adversarial loss function model, which is a generative adversarial network model, to obtain a second loss value of the analog spectrum sequence; and training the speech spectrum generation model based on the first loss value and the second loss value.
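The two-loss training signal described in this abstract can be illustrated with a minimal sketch. The abstract does not state what the preset loss function is, how the adversarial loss is computed, or how the two values are combined, so everything below is an assumption made for illustration: mean squared error for the first loss, a least-squares penalty on a discriminator score for the second, and a weighted sum (`adv_weight`) to combine them. All function names are hypothetical.

```python
# Hypothetical sketch; names and loss choices are assumptions, not taken from the patent.

def first_loss(generated, reference):
    """Assumed preset loss: mean squared error over all spectrum bins."""
    total, count = 0.0, 0
    for gen_frame, ref_frame in zip(generated, reference):
        for g, r in zip(gen_frame, ref_frame):
            total += (g - r) ** 2
            count += 1
    return total / count


def second_loss(discriminator, generated):
    """Assumed adversarial loss: shrinks as the discriminator score approaches 1 (real)."""
    score = discriminator(generated)
    return (1.0 - score) ** 2


def combined_loss(generated, reference, discriminator, adv_weight=0.5):
    """Assumed combination: weighted sum of the first and second loss values."""
    return first_loss(generated, reference) + adv_weight * second_loss(discriminator, generated)


def toy_discriminator(spectrum):
    """Stand-in for a trained discriminator; always returns a score of 0.5."""
    return 0.5


generated = [[1.0, 2.0], [3.0, 4.0]]   # two frames, two bins each
reference = [[1.0, 2.0], [3.0, 2.0]]
loss = combined_loss(generated, reference, toy_discriminator)
```

In a real training loop the combined value would be backpropagated through the spectrum generation model; the sketch only shows how the two loss values could feed a single objective.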
-
Publication number: US12118989B2
Publication date: 2024-10-15
Application number: US17507437
Application date: 2021-10-21
Inventor: Xu Chen, Jinfeng Bai, Runqiang Han, Lei Jia
IPC: G10L15/20, G06N3/084, G10L15/06, G10L15/22, G10L21/0208, G10L21/0232, G10L21/038, G10L25/30
CPC classification number: G10L15/20, G06N3/084, G10L15/063, G10L15/22, G10L21/0232, G10L21/038, G10L25/30, G10L2021/02082
Abstract: The present disclosure provides a speech processing method and a method for generating a speech processing model, relating to the field of signal processing technologies. The speech processing method includes: obtaining M speech signals to be processed and N reference signals; performing sub-band decomposition on each of the M speech signals and each of the N reference signals to obtain frequency-band components in each speech signal and each reference signal; processing the frequency-band components in each speech signal and each reference signal by using an echo cancellation model, to obtain an ideal ratio mask corresponding to the N reference signals in each frequency band of each speech signal; and performing echo cancellation on each frequency-band component of each speech signal based on the ideal ratio mask corresponding to the N reference signals in each frequency band of each speech signal, to obtain M echo-cancelled speech signals.
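The final step of this abstract, applying a per-band ratio mask to each speech signal, can be sketched as follows. The sub-band decomposition and the echo cancellation model are not reproduced here; the masks are taken as given inputs. Applying the mask by element-wise multiplication is the usual way a ratio mask is used, but the abstract does not spell this out, so treat it as an assumption. The function name and data layout are hypothetical.

```python
# Hypothetical sketch; the data layout and the multiplication step are assumptions.

def apply_ratio_masks(band_components, masks):
    """Scale each frequency-band component by its mask value.

    band_components[m][k]: component of speech signal m in frequency band k.
    masks[m][k]: mask value in [0, 1] for the same signal and band.
    """
    if len(band_components) != len(masks):
        raise ValueError("need one mask list per speech signal")
    cleaned = []
    for components, signal_masks in zip(band_components, masks):
        if len(components) != len(signal_masks):
            raise ValueError("need one mask value per frequency band")
        cleaned.append([c * w for c, w in zip(components, signal_masks)])
    return cleaned


# M = 2 speech signals, 3 frequency bands each.
bands = [[2.0, 4.0, 8.0], [1.0, 1.0, 1.0]]
masks = [[0.5, 0.25, 1.0], [0.0, 1.0, 0.5]]
echo_cancelled = apply_ratio_masks(bands, masks)
```

A mask value near 0 suppresses a band dominated by echo, while a value near 1 leaves the band essentially untouched.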
-
Publication number: US11823662B2
Publication date: 2023-11-21
Application number: US17158726
Application date: 2021-01-26
Inventor: Cong Gao, Saisai Zou, Jinfeng Bai, Lei Jia
CPC classification number: G10L15/08, G10L15/22, G10L15/02, G10L15/14, G10L2015/088
Abstract: The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.
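The control flow in this abstract (detect a wake-up word, then play a prompt tone and/or execute the speech instruction) can be sketched as a small decision function. The abstract does not say which wake-up word results lead to which action, so the rule below is purely an assumption for illustration: execute the instruction when one follows the wake-up word, otherwise play a prompt tone. The function name and the action tuples are hypothetical.

```python
# Hypothetical sketch; the decision rule is an assumption, not the patented logic.

def decide_actions(wake_word_detected, instruction_text):
    """Return a list of (action, payload) tuples for one audio signal."""
    actions = []
    if not wake_word_detected:
        return actions  # no wake-up word: stay idle
    if instruction_text:
        # Assumed rule: an instruction in the same audio signal is executed directly.
        actions.append(("execute", instruction_text))
    else:
        # Assumed rule: a bare wake-up word is acknowledged with a prompt tone.
        actions.append(("prompt_tone", None))
    return actions


idle = decide_actions(False, "play music")
one_shot = decide_actions(True, "play music")
wake_only = decide_actions(True, "")
```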
-
Publication number: US11735168B2
Publication date: 2023-08-22
Application number: US17209681
Application date: 2021-03-23
Inventor: Xin Li, Bin Huang, Ce Zhang, Jinfeng Bai, Lei Jia
CPC classification number: G10L15/16, G06N3/08, G10L15/063, G10L15/197, G10L15/22, G10L15/32, G10L25/18, G10L15/20, G10L2015/0631
Abstract: A method and an apparatus for recognizing a voice are provided. The method may include: inputting a target voice into a pre-trained voice recognition model to obtain an initial text output by at least one recognition network in the voice recognition model, the recognition network including a plurality of preset types of processing layers, and at least one type of processing layer of the recognition network being obtained by training based on a voice sample in a preset direction interval; and determining a voice recognition result of the target voice, based on the initial text.
-
Publication number: US11615784B2
Publication date: 2023-03-28
Application number: US17118869
Application date: 2020-12-11
Inventor: Cong Gao, Saisai Zou, Jinfeng Bai, Lei Jia
Abstract: The present disclosure discloses a control method and a control apparatus for speech interaction. The detailed implementation solution of the control method for the speech interaction includes: collecting an audio signal; detecting a wake-up word in the audio signal to obtain a wake-up word result; and playing a prompt tone and/or executing a speech instruction in the audio signal based on the wake-up word result.
-
Publication number: US20210210113A1
Publication date: 2021-07-08
Application number: US17208387
Application date: 2021-03-22
Inventor: Xin Li, Bin Huang, Ce Zhang, Jinfeng Bai, Lei Jia
Abstract: The present disclosure provides a method and apparatus for detecting a voice, and relates to the fields of voice processing and deep learning technology. The method may include: acquiring a target voice; and inputting the target voice into a pre-trained deep neural network to obtain whether the target voice has a sub-voice in each of a plurality of preset direction intervals, the deep neural network being used to predict whether the voice has a sub-voice in each of the plurality of direction intervals.
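The per-interval prediction in this abstract amounts to a multi-label decision: one yes/no answer for each preset direction interval. A minimal sketch of that output stage follows. The deep neural network itself is not reproduced; the sketch assumes it emits one raw score (logit) per interval, that the intervals split 360 degrees evenly, and that a sigmoid with a 0.5 threshold turns each score into a decision. None of these details are stated in the abstract, and the function names are hypothetical.

```python
import math

# Hypothetical sketch; interval layout, sigmoid, and threshold are assumptions.

def make_direction_intervals(num_intervals):
    """Split 360 degrees into equal (start, end) intervals."""
    width = 360.0 / num_intervals
    return [(i * width, (i + 1) * width) for i in range(num_intervals)]


def detect_sub_voices(logits, threshold=0.5):
    """Independently decide, per interval, whether a sub-voice is present."""
    return [1.0 / (1.0 + math.exp(-z)) >= threshold for z in logits]


intervals = make_direction_intervals(4)
# Stand-in network output: one logit per direction interval.
presence = detect_sub_voices([2.0, -2.0, 0.0, -5.0])
```

Because each interval is decided independently, several intervals can be flagged at once, which matches the case of multiple speakers in different directions.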