Patent search ap:("Beijing Youzhuju Network Technology Co., Ltd.") AND inv:"Linhao DONG"

1.

Invention publication
METHOD, APPARATUS, ELECTRONIC DEVICE, AND MEDIUM FOR SPEECH PROCESSING

Status: Under examination (published)

Publication (announcement) No.: US20240046921A1

Publication (announcement) date: 2024-02-08

Application No.: US18365765

Application date: 2023-08-04

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Linhao DONG, Zhenlin Liang, Zhiyun Fan, Yi Liu, Zejun Ma

IPC: G10L15/18, G10L15/183, G10L17/06

CPC classification number: G10L15/1815, G10L15/183, G10L17/06

Abstract: Embodiments of the present disclosure provide a method, apparatus, electronic device, and medium for speech processing. The method comprises generating a token-level semantic feature of target speech data based on a frame-level acoustic feature of the target speech data. The method further comprises generating a token-level voiceprint feature of the target speech data based on the frame-level acoustic feature. The method further comprises determining a token in the target speech data where speaker change occurs based on the token-level semantic feature and the token-level voiceprint feature. According to embodiments of the present disclosure, speaker change in speech data is detected at the token level in conjunction with the speaker's acoustic features and speech contents, and speaker-based speech recognition results are output directly without post-processing, simplifying the speech recognition process.

2.

Invention publication
METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM FOR SPEAKER CHANGE POINT DETECTION

Status: Under examination (published)

Publication (announcement) No.: US20240331706A1

Publication (announcement) date: 2024-10-03

Application No.: US18741427

Application date: 2024-06-12

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Linhao DONG, Zhiyun FAN, Zejun MA

IPC: G10L17/04

CPC classification number: G10L17/04

Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors of the target voice data; integrating and firing the speaker characterization vectors of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations in the target voice data; and determining the speaker change points according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data. This method can improve the accuracy of speaker change point detection in interactive target voice data.

3.

Invention publication
VOICE RECOGNITION METHOD AND APPARATUS, MEDIUM, AND ELECTRONIC DEVICE

Status: Under examination (published)

Publication (announcement) No.: US20240221729A1

Publication (announcement) date: 2024-07-04

Application No.: US18288531

Application date: 2022-05-07

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Linhao DONG, Zejun MA

IPC: G10L15/16, G10L15/02

CPC classification number: G10L15/16, G10L15/02

Abstract: The present disclosure provides a voice recognition method and apparatus, a medium, and an electronic device. The method includes: encoding received voice data to obtain an acoustic vector sequence corresponding to the voice data; obtaining, according to the acoustic vector sequence and a first prediction model, an information amount sequence corresponding to the voice data and a first probability sequence corresponding to the voice data; obtaining a second probability sequence according to the acoustic vector sequence and a second prediction model; determining a target probability sequence according to the first probability sequence and the second probability sequence; and determining a target text corresponding to the voice data according to the target probability sequence.

4.

Invention publication
SPEECH PROCESSING METHOD AND APPARATUS, AND ELECTRONIC DEVICE

Status: Under examination (published)

Publication (announcement) No.: US20230402031A1

Publication (announcement) date: 2023-12-14

Application No.: US18249031

Application date: 2022-04-06

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Linhao DONG, Meng CAI, Zejun MA

IPC: G10L15/02, G10L15/06, G10L15/22, G10L15/16

CPC classification number: G10L15/02, G10L15/063, G10L15/22, G10L15/16

Abstract: A speech processing method is provided. The method includes: receiving a speech block to be identified as a current speech block, where the speech block includes a past frame, a current frame and a future frame; performing a speech identification process based on the current speech block, where the speech identification process includes: performing speech identification based on the current speech block to obtain a speech identification result of the current frame and a speech identification result of the future frame; determining whether a previous speech block for the current speech block exists; in a case that the previous speech block for the current speech block exists, updating a target identification result based on the speech identification result of the current frame of the current speech block; and outputting the speech identification result of the future frame of the current speech block.

5.

Invention publication
METHOD AND DEVICE OF GENERATING ACOUSTIC FEATURES, SPEECH MODEL TRAINING, AND SPEECH RECOGNITION

Status: Under examination (published)

Publication (announcement) No.: US20240169988A1

Publication (announcement) date: 2024-05-23

Application No.: US18427538

Application date: 2024-01-30

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Linhao DONG, Zejun MA

IPC: G10L15/22, G10L15/06

CPC classification number: G10L15/22, G10L15/063

Abstract: The present disclosure discloses a method and device of generating acoustic features, speech model training, and speech recognition. By acquiring the acoustic information vector of the current speech frame and the information weight of the current speech frame, and according to the accumulated information weight corresponding to the previous speech frame, the retention rate corresponding to the current speech frame, and the information weight of the current speech frame, the accumulated information weight corresponding to the current speech frame can be obtained. The retention rate is the difference between 1 and a leakage rate.
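The accumulation step in this abstract can be sketched in Python. The abstract names the inputs (previous accumulated weight, retention rate, current information weight) but does not give the formula, so the update `accumulated = retention_rate * previous + weight` is an assumption, as are the function name `accumulate_information_weights` and the use of one fixed `leakage_rate` for every frame.

```python
def accumulate_information_weights(weights, leakage_rate=0.1):
    """Return the accumulated information weight after each speech frame.

    Assumed update (not stated in the abstract):
        accumulated_t = retention_rate * accumulated_(t-1) + weight_t
    where retention_rate = 1 - leakage_rate, as the abstract defines it.
    """
    if not 0.0 <= leakage_rate <= 1.0:
        raise ValueError("leakage_rate must lie in [0, 1]")
    retention_rate = 1.0 - leakage_rate
    accumulated = 0.0  # nothing has been accumulated before the first frame
    history = []
    for weight in weights:
        # Part of the earlier accumulation leaks away; the current frame adds its weight.
        accumulated = retention_rate * accumulated + weight
        history.append(accumulated)
    return history


if __name__ == "__main__":
    # Hypothetical per-frame information weights.
    print(accumulate_information_weights([0.5, 0.5, 0.5], leakage_rate=0.5))
```

With `leakage_rate=0.0` the sketch reduces to a plain running sum; larger leakage rates make older frames contribute less to the accumulated weight.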

6.

Invention publication
METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM FOR SPEAKER CHANGE POINT DETECTION

Status: Under examination (published)

Publication (announcement) No.: US20240135933A1

Publication (announcement) date: 2024-04-25

Application No.: US18394143

Application date: 2023-12-22

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Linhao DONG, Zhiyun FAN, Zejun MA

IPC: G10L17/04

CPC classification number: G10L17/04

Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors at a voice frame level of the target voice data; integrating and firing the speaker characterization vectors at the voice frame level of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations bounded by speaker change points in the target voice data; and determining a timestamp corresponding to the speaker change points according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data.
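The integrate-and-fire step can be illustrated with a minimal Python sketch of CIF as it is commonly described: per-frame weights are accumulated until they reach a threshold, at which point the weighted sum of frame vectors is emitted ("fired") and the leftover weight starts the next accumulation. This is not the procedure claimed in the application, whose details the abstract does not give. The function name `cif_fire`, the threshold of 1.0, and the requirement that each weight lies in [0, 1] are assumptions.

```python
def cif_fire(frame_vectors, frame_weights, threshold=1.0):
    """Integrate frame-level vectors and fire when accumulated weight hits threshold.

    Returns a list of (frame_index, fired_vector) pairs, where frame_index is
    the frame at which the firing happened. Assumes 0 <= weight <= threshold,
    so at most one firing can occur per frame.
    """
    if not frame_vectors:
        return []
    dim = len(frame_vectors[0])
    acc_weight = 0.0
    acc_vector = [0.0] * dim
    fired = []
    for index, (vector, weight) in enumerate(zip(frame_vectors, frame_weights)):
        if acc_weight + weight < threshold:
            # Keep integrating: add the weighted frame to the running sum.
            acc_weight += weight
            acc_vector = [a + weight * x for a, x in zip(acc_vector, vector)]
        else:
            # Fire: use only the share of this frame needed to reach the threshold.
            used = threshold - acc_weight
            fired.append((index, [a + used * x for a, x in zip(acc_vector, vector)]))
            # The remaining share of this frame starts the next accumulation.
            rest = weight - used
            acc_weight = rest
            acc_vector = [rest * x for x in vector]
    return fired


if __name__ == "__main__":
    # Hypothetical one-dimensional frame vectors and weights.
    for index, vector in cif_fire([[1.0], [1.0], [2.0], [2.0]], [0.5, 0.5, 0.5, 0.5]):
        print(index, vector)
```

Under a hypothetical 10 ms frame shift, a firing at frame index `i` would map to a timestamp of `i * 10` ms, which is one way the timestamp of a change point could be read off the firing positions.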

7.

Invention publication
MODEL TRAINING METHOD, SPEECH RECOGNITION METHOD, DEVICE, MEDIUM, AND APPARATUS

Status: Under examination (published)

Publication (announcement) No.: US20240127795A1

Publication (announcement) date: 2024-04-18

Application No.: US18276769

Application date: 2022-05-07

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Linhao DONG, Zejun MA

IPC: G10L15/06, G10L15/065

CPC classification number: G10L15/063, G10L15/065, G10L2015/0635, G10L19/04

Abstract: A model training method, a speech recognition method and apparatus, a medium, and a device are provided. The speech recognition model includes an encoder, a CIF prediction sub-model, and a CTC prediction sub-model. The model training method includes: encoding training speech data based on the encoder to obtain an acoustic vector sequence corresponding to the training speech data; obtaining an information amount sequence corresponding to the training speech data based on the acoustic vector sequence and the CIF prediction sub-model; obtaining a target probability sequence based on the acoustic vector sequence and the CTC prediction sub-model; determining a target loss of the speech recognition model based on the information amount sequence and the target probability sequence; and updating, in response to an updating condition being satisfied, a model parameter of the speech recognition model based on the target loss.