-
Publication No.: US20240331706A1
Publication date: 2024-10-03
Application No.: US18741427
Filing date: 2024-06-12
Inventor: Linhao DONG, Zhiyun FAN, Zejun MA
IPC: G10L17/04
CPC classification number: G10L17/04
Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; extracting, from the target voice data, an acoustic feature characterizing acoustic information of the target voice data; encoding the acoustic feature to obtain speaker characterization vectors of the target voice data; integrating and firing the speaker characterization vectors of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations in the target voice data; and determining the speaker change points according to the sequence of speaker characterizations bounded by the speaker change points in the target voice data. The method can effectively improve the accuracy of speaker change point detection in interaction-type target voice data.
-
Publication No.: US20240135933A1
Publication date: 2024-04-25
Application No.: US18394143
Filing date: 2023-12-22
Inventor: Linhao DONG, Zhiyun FAN, Zejun MA
IPC: G10L17/04
CPC classification number: G10L17/04
Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; extracting, from the target voice data, an acoustic feature characterizing acoustic information of the target voice data; encoding the acoustic feature to obtain speaker characterization vectors at a voice frame level of the target voice data; integrating and firing the speaker characterization vectors at the voice frame level of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations bounded by speaker change points in the target voice data; and determining a timestamp corresponding to the speaker change points according to the sequence of speaker characterizations bounded by the speaker change points in the target voice data.
-
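The integrate-and-fire step named in the abstracts above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation claimed in either application: the function name `cif_fire`, the per-frame weights supplied as input, the firing threshold of 1.0, and the 10 ms frame shift are all hypothetical choices made here. The abstracts do not say how the weights are computed, so the sketch simply takes them as given.

```python
from typing import List, Sequence, Tuple


def cif_fire(
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float = 1.0,      # assumed firing threshold
    frame_shift_s: float = 0.01, # assumed 10 ms frame shift
) -> Tuple[List[List[float]], List[float]]:
    """Integrate frame-level vectors and fire when the weight sum reaches the threshold.

    Assumes every weight lies in [0, threshold], so a frame triggers at most one firing.
    Returns the fired (integrated) vectors and the timestamp of each firing in seconds.
    """
    if len(frames) != len(weights):
        raise ValueError("frames and weights must have the same length")

    dim = len(frames[0]) if frames else 0
    acc_weight = 0.0          # weight integrated since the last firing
    acc_vec = [0.0] * dim     # weighted sum of vectors since the last firing
    fired: List[List[float]] = []
    timestamps: List[float] = []

    for t, (vec, w) in enumerate(zip(frames, weights)):
        if acc_weight + w < threshold:
            # Keep integrating: no boundary located yet.
            acc_weight += w
            acc_vec = [a + w * x for a, x in zip(acc_vec, vec)]
        else:
            # Fire: use only the part of the weight needed to reach the threshold.
            used = threshold - acc_weight
            fired.append([a + used * x for a, x in zip(acc_vec, vec)])
            timestamps.append(t * frame_shift_s)
            # The remaining weight starts the next integration segment.
            rest = w - used
            acc_weight = rest
            acc_vec = [rest * x for x in vec]

    return fired, timestamps
```

Under these assumptions, each fired vector stands for one segment of the sequence, and the frame index at which it fires gives a candidate change point timestamp once multiplied by the frame shift.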