Patent search ap:("Beijing Youzhuju Network Technology Co., Ltd.") AND inv:"Zhiyun FAN"

1.

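Invention publication
METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM FOR SPEAKER CHANGE POINT DETECTION (Under examination - Published)

Publication (announcement) number: US20240331706A1

Publication (announcement) date: 2024-10-03

Application number: US18741427

Application date: 2024-06-12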

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Linhao DONG, Zhiyun FAN, Zejun MA

IPC: G10L17/04

CPC classification number: G10L17/04

Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; and extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors of the target voice data; integrating and firing the speaker characterization vectors of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations in the target voice data; and determining the speaker change points, according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data. This method can effectively improve the accuracy of the detection result of a speaker change point in target voice data with a type of interaction.

2.

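Invention publication
METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM FOR SPEAKER CHANGE POINT DETECTION (Under examination - Published)

Publication (announcement) number: US20240135933A1

Publication (announcement) date: 2024-04-25

Application number: US18394143

Application date: 2023-12-22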

Applicant: Beijing Youzhuju Network Technology Co., Ltd.

Inventor: Linhao DONG, Zhiyun FAN, Zejun MA

IPC: G10L17/04

CPC classification number: G10L17/04

Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; and extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors at a voice frame level of the target voice data; integrating and firing the speaker characterization vectors at the voice frame level of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations bounded by speaker change points in the target voice data; and determining a timestamp corresponding to the speaker change points, according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data.
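
The integrate-and-fire step named in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function name `cif_fire`, the firing threshold of 1.0, the 10 ms frame shift, and the assumption that each frame already carries a weight in [0, 1] are all hypothetical choices made for illustration (in a real system the weights and frame vectors would come from a trained encoder). The sketch accumulates frame weights, emits one weighted-sum vector each time the accumulated weight reaches the threshold, and reports the firing frame as a candidate change point.

```python
from typing import List, Sequence, Tuple


def cif_fire(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 1.0,       # assumed firing threshold
    frame_shift_s: float = 0.01,  # assumed 10 ms frame shift
) -> Tuple[List[List[float]], List[int], List[float]]:
    """Generic continuous integrate-and-fire over frame-level vectors.

    Returns the fired (integrated) vectors, the frame index at which each
    fire happened, and that index converted to seconds.
    Assumes every weight lies in [0, threshold], so a frame fires at most once.
    """
    if len(frames) != len(weights):
        raise ValueError("frames and weights must have the same length")

    dim = len(frames[0]) if frames else 0
    acc_weight = 0.0           # weight integrated since the last fire
    acc_vector = [0.0] * dim   # weighted sum of frames since the last fire
    fired: List[List[float]] = []
    fire_frames: List[int] = []

    for t, (h, a) in enumerate(zip(frames, weights)):
        if acc_weight + a < threshold:
            # Not enough weight yet: keep integrating.
            acc_weight += a
            acc_vector = [v + a * x for v, x in zip(acc_vector, h)]
            continue
        # Threshold reached: use just enough of this frame's weight to fire.
        used = threshold - acc_weight
        fired.append([v + used * x for v, x in zip(acc_vector, h)])
        fire_frames.append(t)
        # The leftover weight of this frame starts the next integration.
        rest = a - used
        acc_weight = rest
        acc_vector = [rest * x for x in h]

    fire_times = [t * frame_shift_s for t in fire_frames]
    return fired, fire_frames, fire_times
```

As a usage example under the same assumptions, eight two-dimensional frames with a constant weight of 0.25, where the first four frames are `[1.0, 0.0]` and the last four are `[0.0, 1.0]`, fire at frame indices 3 and 7, and each fired vector equals the frame vector of its segment.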
