-
1.
Publication number: US11217229B2
Publication date: 2022-01-04
Application number: US16921537
Application date: 2020-07-06
Inventor: Yi Gao , Ji Meng Zheng , Meng Yu , Min Luo
Abstract: A speech recognition method, an apparatus, a computer device, and an electronic device for recognizing speech are provided. The method includes receiving an audio signal obtained by a microphone array; performing beamforming processing on the audio signal in a plurality of target directions to obtain a plurality of beam signals; performing speech recognition on each of the plurality of beam signals to obtain a plurality of speech recognition results corresponding to the plurality of beam signals; and determining a speech recognition result of the audio signal based on the plurality of speech recognition results of the plurality of beam signals.
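A minimal sketch of the multi-beam flow described in the abstract above: beamform the array signal toward several target directions, recognize each beam, and keep the result with the best score. The delay-and-sum beamformer, the confidence-based selection, and the `recognize_with_confidence` stub are illustrative assumptions, not the patent's specific algorithms.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s


def delay_and_sum(signals, mic_positions, direction_deg, fs):
    """Steer the array toward `direction_deg` (azimuth, degrees) by
    delay-compensating each channel in the frequency domain and summing.
    signals: (n_mics, n_samples); mic_positions: (n_mics, 2) xy in meters."""
    theta = np.deg2rad(direction_deg)
    steering = np.array([np.cos(theta), np.sin(theta)])
    delays = mic_positions @ steering / SPEED_OF_SOUND      # per-mic delay in seconds
    delays -= delays.min()                                  # keep delays non-negative
    n_mics, n_samples = signals.shape
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft((spectra * phase).mean(axis=0), n=n_samples)


def recognize_with_confidence(beam_signal):
    """Hypothetical recognizer stub: returns (transcript, confidence)."""
    return "", float(np.mean(beam_signal ** 2))             # placeholder scoring only


def recognize_from_array(signals, mic_positions, target_directions_deg, fs=16000):
    results = []
    for direction in target_directions_deg:                 # one beam per target direction
        beam = delay_and_sum(signals, mic_positions, direction, fs)
        results.append(recognize_with_confidence(beam))
    # Final result of the audio signal: the per-beam result with the highest confidence.
    return max(results, key=lambda r: r[1])[0]
```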
-
2.
Publication number: US20200051549A1
Publication date: 2020-02-13
Application number: US16655548
Application date: 2019-10-17
Inventor: Lianwu Chen , Meng Yu , Min Luo , Dan Su
Abstract: Embodiments of the present invention provide a speech signal processing model training method, an electronic device, and a storage medium. The embodiments of the present invention determine a target training loss function based on a training loss function of each of one or more speech signal processing tasks; input a task input feature of each speech signal processing task into a starting multi-task neural network; and update model parameters of a shared layer and of each of one or more task layers of the starting multi-task neural network corresponding to the one or more speech signal processing tasks, with minimization of the target training loss function as the training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model.
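A minimal sketch of the training setup described above: a shared layer feeds one output head ("task layer") per speech signal processing task, and the target training loss is formed from the per-task losses. The unweighted sum, the MSE task losses, the layer sizes, and the optimizer usage are illustrative assumptions; the patent does not fix these choices. Requires PyTorch.

```python
import torch
import torch.nn as nn


class MultiTaskSpeechModel(nn.Module):
    def __init__(self, feat_dim, hidden_dim, task_out_dims):
        super().__init__()
        # Shared layer followed by one task layer per speech signal processing task.
        self.shared = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
        self.task_layers = nn.ModuleList(
            [nn.Linear(hidden_dim, out_dim) for out_dim in task_out_dims]
        )

    def forward(self, task_inputs):
        # One input batch per task; each passes through the shared layer, then its own head.
        return [layer(self.shared(x)) for layer, x in zip(self.task_layers, task_inputs)]


def train_step(model, optimizer, task_inputs, task_targets):
    optimizer.zero_grad()
    outputs = model(task_inputs)
    # Target training loss: combination of the per-task training losses (sum here).
    task_losses = [nn.functional.mse_loss(out, tgt)
                   for out, tgt in zip(outputs, task_targets)]
    target_loss = torch.stack(task_losses).sum()
    target_loss.backward()      # gradients flow into shared and task-layer parameters jointly
    optimizer.step()
    return target_loss.item()
```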
-
3.
Publication number: US12057135B2
Publication date: 2024-08-06
Application number: US17227123
Application date: 2021-04-09
IPC: G10L21/0216 , G10L21/02 , G10L21/0208 , G10L21/0232 , G10L25/78 , G10L25/84
CPC classification number: G10L21/0216 , G10L21/0208 , G10L21/0232 , G10L25/78 , G10L25/84 , G10L21/02
Abstract: This application discloses a speech noise reduction method performed by a computing device. The method includes: obtaining a noisy speech signal, the noisy speech signal including a pure speech signal and a noise signal; estimating a posteriori signal-to-noise ratio and a priori signal-to-noise ratio of the noisy speech signal; determining a speech/noise likelihood ratio in the Bark domain based on the estimated posteriori signal-to-noise ratio and the estimated priori signal-to-noise ratio; estimating a priori speech existence probability based on the determined speech/noise likelihood ratio; determining a gain based on the estimated posteriori signal-to-noise ratio, the estimated priori signal-to-noise ratio, and the estimated priori speech existence probability, the gain being a frequency-domain transfer function used for converting the noisy speech signal into an estimate of the pure speech signal; and extracting the estimate of the pure speech signal from the noisy speech signal based on the gain.
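A single-frame sketch of the gain chain in the abstract above: estimate the posteriori and priori SNR, form a speech/noise likelihood ratio aggregated into Bark-like bands, convert it into a speech presence probability, and combine everything into a spectral gain applied to the noisy spectrum. The decision-directed priori-SNR estimate, the Gaussian likelihood ratio, the band-edge input, and the Wiener-style gain are standard textbook choices used for illustration; the patent's exact estimators may differ.

```python
import numpy as np


def denoise_frame(noisy_spec, noise_psd, prev_clean_psd, band_edges, alpha=0.98):
    """noisy_spec: complex STFT frame (n_bins,); noise_psd, prev_clean_psd: (n_bins,).
    band_edges: bin indices splitting the spectrum into Bark-like bands."""
    noisy_psd = np.abs(noisy_spec) ** 2

    # A posteriori SNR and decision-directed a priori SNR.
    post_snr = noisy_psd / np.maximum(noise_psd, 1e-12)
    prio_snr = (alpha * prev_clean_psd / np.maximum(noise_psd, 1e-12)
                + (1.0 - alpha) * np.maximum(post_snr - 1.0, 0.0))

    # Per-bin Gaussian speech/noise likelihood ratio (clipped to avoid overflow),
    # then averaged within each Bark-like band.
    v = np.minimum(post_snr * prio_snr / (1.0 + prio_snr), 50.0)
    lr_bin = np.exp(v) / (1.0 + prio_snr)
    lr_band = np.array([lr_bin[lo:hi].mean()
                        for lo, hi in zip(band_edges[:-1], band_edges[1:])])

    # A priori speech presence probability per band, broadcast back to the bins.
    presence = np.zeros_like(post_snr)
    for (lo, hi), lr in zip(zip(band_edges[:-1], band_edges[1:]), lr_band):
        presence[lo:hi] = lr / (1.0 + lr)

    # Wiener-style gain weighted by the speech presence probability; applying it
    # to the noisy spectrum yields the estimate of the pure speech spectrum.
    gain = presence * prio_snr / (1.0 + prio_snr)
    clean_spec = gain * noisy_spec
    return clean_spec, np.abs(clean_spec) ** 2   # estimate and its PSD (for the next frame)
```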
-
4.
Publication number: US11856376B2
Publication date: 2023-12-26
Application number: US17319024
Application date: 2021-05-12
Inventor: Jimeng Zheng , Yi Gao , Xuan Ji , Weiwei Li , Meng Yu , Kai Xia , Jun Feng , Zhu Chen , Hongyang Chen , Wenbin Yang , Yu Wang , Yong Liu
IPC: H04R3/00
CPC classification number: H04R3/005
Abstract: This application discloses a sound acquisition component array, including two first sound acquisition components, two second sound acquisition components, and two third sound acquisition components. The two second sound acquisition components are located at a first side of a line connecting the two first sound acquisition components, and the two third sound acquisition components are located at a second side of the connecting line that is opposite to the first side; the two second sound acquisition components are symmetrical about a perpendicular bisector of the connecting line, and the two third sound acquisition components are symmetrical about the same perpendicular bisector; and the distance between the two first sound acquisition components, the distance between the two second sound acquisition components, and the distance between the two third sound acquisition components differ from one another along the direction defined by the connecting line.
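A small geometric sketch of the six-component layout described above: the first pair lies on a reference line, the second and third pairs sit on opposite sides of that line, each pair is symmetric about the perpendicular bisector, and the three pair spacings differ. The concrete spacings (d1, d2, d3) and side offsets are illustrative values only.

```python
import numpy as np


def sound_component_array(d1=0.10, d2=0.06, d3=0.03, offset2=0.02, offset3=0.02):
    """Return (6, 2) xy coordinates in meters. The connecting line of the first
    pair is the x-axis; its perpendicular bisector is the y-axis."""
    first  = np.array([[-d1 / 2, 0.0],      [d1 / 2, 0.0]])        # on the connecting line
    second = np.array([[-d2 / 2, +offset2], [d2 / 2, +offset2]])   # first side of the line
    third  = np.array([[-d3 / 2, -offset3], [d3 / 2, -offset3]])   # opposite side
    return np.vstack([first, second, third])


coords = sound_component_array()
# The three pair spacings along the connecting-line direction are pairwise distinct.
spacings = [abs(coords[i + 1, 0] - coords[i, 0]) for i in (0, 2, 4)]
assert len(set(np.round(spacings, 6))) == 3
```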
-
5.
Publication number: US11450337B2
Publication date: 2022-09-20
Application number: US17023829
Application date: 2020-09-17
Inventor: Lianwu Chen , Meng Yu , Yanmin Qian , Dan Su , Dong Yu
IPC: G10L21/0272 , G06N3/04 , G06N3/08 , G10L25/30 , G10L25/51
Abstract: A multi-person speech separation method is provided for a terminal. The method includes extracting a hybrid speech feature from a hybrid speech signal requiring separation, N human voices being mixed in the hybrid speech signal, N being a positive integer greater than or equal to 2; extracting a masking coefficient of the hybrid speech feature by using a generative adversarial network (GAN) model, to obtain a masking matrix corresponding to the N human voices, wherein the GAN model comprises a generative network model and an adversarial network model; and performing a speech separation on the masking matrix corresponding to the N human voices and the hybrid speech signal by using the GAN model, and outputting N separated speech signals corresponding to the N human voices.
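A minimal sketch of the mask-and-separate step from the abstract above: a generator network produces one masking matrix per speaker, and each mask is applied to the mixture spectrogram to recover that speaker's waveform. The STFT parameters and the `generator_masks` stub (a trivial uniform split) stand in for the trained GAN generative network, which is an assumption here rather than the patent's model.

```python
import numpy as np
from scipy.signal import stft, istft


def generator_masks(mixture_mag, n_speakers):
    """Placeholder for the trained GAN generator: returns N masks that sum to
    one per time-frequency bin (here, a trivial uniform split)."""
    return np.full((n_speakers,) + mixture_mag.shape, 1.0 / n_speakers)


def separate(mixture, n_speakers=2, fs=16000, nperseg=512):
    _, _, spec = stft(mixture, fs=fs, nperseg=nperseg)   # complex (F, T) spectrogram
    masks = generator_masks(np.abs(spec), n_speakers)    # (N, F, T) masking matrix
    separated = []
    for mask in masks:
        _, sig = istft(mask * spec, fs=fs, nperseg=nperseg)
        separated.append(sig)                            # one waveform per separated voice
    return separated
```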
-
6.
Publication number: US20210375294A1
Publication date: 2021-12-02
Application number: US17401125
Application date: 2021-08-12
Inventor: Rongzhi Gu , Shixiong Zhang , Lianwu Chen , Yong Xu , Meng Yu , Dan Su , Dong Yu
IPC: G10L19/008 , G10L25/30 , G10L25/03
Abstract: This application relates to a method of extracting an inter-channel feature from a multi-channel multi-sound source mixed audio signal, performed at a computing device. The method includes: transforming one channel component of a multi-channel multi-sound source mixed audio signal into a single-channel multi-sound source mixed audio representation in a feature space; performing a two-dimensional dilated convolution on the multi-channel multi-sound source mixed audio signal to extract inter-channel features; performing a feature fusion on the single-channel multi-sound source mixed audio representation and the inter-channel features; estimating respective weights of sound sources in the single-channel multi-sound source mixed audio representation based on a fused multi-channel multi-sound source mixed audio feature; obtaining respective representations of the plurality of sound sources according to the single-channel multi-sound source mixed audio representation and the respective weights; and transforming the respective representations of the sound sources into respective audio signals of the plurality of sound sources.
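A minimal sketch of one inter-channel feature extraction step from the abstract above: a two-dimensional dilated convolution applied across the (channel, time) axes of the multi-channel waveform, implemented by zero-inflating the kernel and running an ordinary 2-D convolution. The random kernel stands in for a learned filter, and the kernel size and dilation rates are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d


def dilate_kernel(kernel, dilation):
    """Insert (dilation - 1) zeros between kernel taps along each axis."""
    kh, kw = kernel.shape
    dh, dw = dilation
    dilated = np.zeros(((kh - 1) * dh + 1, (kw - 1) * dw + 1))
    dilated[::dh, ::dw] = kernel
    return dilated


def inter_channel_features(multichannel_audio, kernel, dilation=(1, 4)):
    """multichannel_audio: (n_channels, n_samples). Returns a 2-D feature map
    that mixes information across channels and a dilated time context."""
    return convolve2d(multichannel_audio, dilate_kernel(kernel, dilation), mode="same")


# Example: 4-channel signal, 3x3 kernel (random here, learned in practice), time dilation 4.
rng = np.random.default_rng(0)
features = inter_channel_features(rng.standard_normal((4, 16000)),
                                  rng.standard_normal((3, 3)))
```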
-
7.
Publication number: US11908483B2
Publication date: 2024-02-20
Application number: US17401125
Application date: 2021-08-12
Inventor: Rongzhi Gu , Shixiong Zhang , Lianwu Chen , Yong Xu , Meng Yu , Dan Su , Dong Yu
IPC: G10L19/008 , G10L25/03 , G10L25/30 , H04S3/02 , H04S5/00
CPC classification number: G10L19/008 , G10L25/03 , G10L25/30 , H04S3/02 , H04S5/00
Abstract: This application relates to a method of extracting an inter-channel feature from a multi-channel multi-sound source mixed audio signal, performed at a computing device. The method includes: transforming one channel component of a multi-channel multi-sound source mixed audio signal into a single-channel multi-sound source mixed audio representation in a feature space; performing a two-dimensional dilated convolution on the multi-channel multi-sound source mixed audio signal to extract inter-channel features; performing a feature fusion on the single-channel multi-sound source mixed audio representation and the inter-channel features; estimating respective weights of sound sources in the single-channel multi-sound source mixed audio representation based on a fused multi-channel multi-sound source mixed audio feature; obtaining respective representations of the plurality of sound sources according to the single-channel multi-sound source mixed audio representation and the respective weights; and transforming the respective representations of the sound sources into respective audio signals of the plurality of sound sources.
-
8.
Publication number: US11158304B2
Publication date: 2021-10-26
Application number: US16655548
Application date: 2019-10-17
Inventor: Lianwu Chen , Meng Yu , Min Luo , Dan Su
Abstract: Embodiments of the present invention provide a speech signal processing model training method, an electronic device, and a storage medium. The embodiments of the present invention determine a target training loss function based on a training loss function of each of one or more speech signal processing tasks; input a task input feature of each speech signal processing task into a starting multi-task neural network; and update model parameters of a shared layer and of each of one or more task layers of the starting multi-task neural network corresponding to the one or more speech signal processing tasks, with minimization of the target training loss function as the training objective, until the starting multi-task neural network converges, to obtain a speech signal processing model.
-
9.
Publication number: US20210266664A1
Publication date: 2021-08-26
Application number: US17319024
Application date: 2021-05-12
Inventor: Jimeng Zheng , Yi Gao , Xuan Ji , Weiwei Li , Meng Yu , Kai Xia , Jun Feng , Zhu Chen , Hongyang Chen , Wenbin Yang , Yu Wang , Yong Liu
IPC: H04R3/00
Abstract: This application discloses a sound acquisition component array, including two first sound acquisition components, two second sound acquisition components, and two third sound acquisition components. The two second sound acquisition components are located at a first side of a line connecting the two first sound acquisition components, and the two third sound acquisition components are located at a second side of the connecting line that is opposite to the first side; the two second sound acquisition components are symmetrical about a perpendicular bisector of the connecting line, and the two third sound acquisition components are symmetrical about the same perpendicular bisector; and the distance between the two first sound acquisition components, the distance between the two second sound acquisition components, and the distance between the two third sound acquisition components differ from one another along the direction defined by the connecting line.
-
10.
Publication number: US12051441B2
Publication date: 2024-07-30
Application number: US17944067
Application date: 2022-09-13
Inventor: Jimeng Zheng , Lianwu Chen , Weiwei Li , Zhiyi Duan , Meng Yu , Dan Su , Kaiyu Jiang
CPC classification number: G10L25/84 , G06T7/20 , G10L17/02 , G10L17/22 , G10L21/028 , G10L25/21 , G06T2207/30201
Abstract: This application discloses a multi-sound-area-based speech detection method, a related apparatus, and a storage medium, applied in the field of artificial intelligence. The method includes: obtaining sound area information corresponding to N sound areas in which multiple users are speaking simultaneously; generating a control signal corresponding to each target detection sound area according to user information corresponding to that target detection sound area; processing multi-user speech input signals by using the control signals, to obtain a speech output signal corresponding to each target detection sound area; generating a speech detection result for each target detection sound area according to the speech output signal corresponding to that target detection sound area; and selecting, from among the multiple users, a main speaker based on the user information, the speech output signals, and the speech detection results of the multiple users in the N sound areas.
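A small sketch of the final selection step described above: given one speech output signal and one detection result per sound area, keep the areas where speech was detected and pick the main-speaker area. Using mean output energy as the selection criterion is an illustrative assumption; the patent bases the choice on the user information, the output signals, and the detection results without a single fixed metric being stated in the abstract.

```python
import numpy as np


def select_main_speaker(area_outputs, area_detections, area_user_info):
    """area_outputs: list of 1-D arrays (one speech output signal per sound area);
    area_detections: list of bools (speech detected in that area);
    area_user_info: list of per-area user metadata."""
    candidates = [
        (idx, float(np.mean(np.square(sig))))          # illustrative energy score
        for idx, (sig, detected) in enumerate(zip(area_outputs, area_detections))
        if detected
    ]
    if not candidates:
        return None, None                              # no area contains speech
    main_idx = max(candidates, key=lambda c: c[1])[0]
    return main_idx, area_user_info[main_idx]          # main-speaker area and its user info
```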