METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM FOR SPEAKER CHANGE POINT DETECTION

    Publication number: US20240331706A1

    Publication date: 2024-10-03

    Application number: US18741427

    Filing date: 2024-06-12

    CPC classification number: G10L17/04

    Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors of the target voice data; integrating and firing the speaker characterization vectors of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations in the target voice data; and determining the speaker change points according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data. This method can effectively improve the accuracy of speaker change point detection in interactive-type target voice data.

    AUDIO CAPTION ALIGNMENT METHOD AND APPARATUS, MEDIUM, AND ELECTRONIC DEVICE

    Publication number: US20240379116A1

    Publication date: 2024-11-14

    Application number: US18662675

    Filing date: 2024-05-13

    Abstract: The disclosure relates to an audio caption alignment method and apparatus, a medium, and an electronic device. The method includes: obtaining a target audio and a target caption text of the target audio; obtaining a plurality of first target audios by slicing the target audio according to a slicing duration in a case that a duration of the target audio is greater than a first preset duration; determining first audio feature information of each of the first target audios; obtaining target audio feature information of the target audio by concatenating all of the first audio feature information in a case that the duration of the target audio is less than or equal to a second preset duration, where the second preset duration is greater than the first preset duration; and generating caption information corresponding to the target audio according to the target caption text and the target audio feature information.
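The slice-then-concatenate step in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the toy feature (mean absolute amplitude per slice), and the handling of audio outside the two preset durations are all assumptions made for illustration, and durations are measured in samples for simplicity.

```python
from typing import List


def slice_audio(samples: List[float], slice_len: int) -> List[List[float]]:
    """Cut the audio into consecutive slices of at most slice_len samples."""
    return [samples[i:i + slice_len] for i in range(0, len(samples), slice_len)]


def toy_features(chunk: List[float]) -> List[float]:
    """Hypothetical stand-in for a real feature extractor: one value per chunk."""
    return [sum(abs(x) for x in chunk) / len(chunk)]


def target_features(samples: List[float], first_preset: int,
                    second_preset: int, slice_len: int) -> List[float]:
    """Slice long audio, extract per-slice features, and concatenate them."""
    n = len(samples)
    if n <= first_preset:
        # Assumption: short audio is processed in one piece.
        return toy_features(samples)
    if n > second_preset:
        # The abstract does not say what happens here, so this sketch refuses.
        raise ValueError("audio longer than the second preset duration")
    feats: List[float] = []
    for chunk in slice_audio(samples, slice_len):
        feats.extend(toy_features(chunk))  # concatenate in slice order
    return feats


example = target_features([1, -1, 2, -2, 3, -3], first_preset=4,
                          second_preset=10, slice_len=2)
```

In this sketch the concatenated features keep the original slice order, so positions in the feature sequence still map back to positions in the audio.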

    VOICE RECOGNITION METHOD AND APPARATUS, MEDIUM, AND ELECTRONIC DEVICE

    Publication number: US20240221729A1

    Publication date: 2024-07-04

    Application number: US18288531

    Filing date: 2022-05-07

    CPC classification number: G10L15/16 G10L15/02

    Abstract: The present disclosure provides a voice recognition method and apparatus, a medium, and an electronic device. The method includes: encoding received voice data to obtain an acoustic vector sequence corresponding to the voice data; obtaining, according to the acoustic vector sequence and a first prediction model, an information amount sequence corresponding to the voice data and a first probability sequence corresponding to the voice data; obtaining a second probability sequence according to the acoustic vector sequence and a second prediction model; determining a target probability sequence according to the first probability sequence and the second probability sequence; and determining a target text corresponding to the voice data according to the target probability sequence.

    SPEECH PROCESSING METHOD AND APPARATUS, AND ELECTRONIC DEVICE

    Publication number: US20230402031A1

    Publication date: 2023-12-14

    Application number: US18249031

    Filing date: 2022-04-06

    CPC classification number: G10L15/02 G10L15/063 G10L15/22 G10L15/16

    Abstract: A speech processing method is provided. The method includes: receiving a speech block to be identified as a current speech block, where the speech block includes a past frame, a current frame and a future frame; performing a speech identification process based on the current speech block, where the speech identification process includes: performing speech identification based on the current speech block to obtain a speech identification result of the current frame and a speech identification result of the future frame; determining whether a previous speech block for the current speech block exists; in a case that the previous speech block for the current speech block exists, updating a target identification result based on the speech identification result of the current frame of the current speech block; and outputting the speech identification result of the future frame of the current speech block.
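The block layout described in this abstract (past, current, and future frames) can be sketched as follows. This only shows one plausible way to cut a frame sequence into such blocks; the function name `make_blocks` and the parameter names are hypothetical, and the recognition and result-update steps are not modeled.

```python
from typing import Dict, List


def make_blocks(frames: List[int], past: int, current: int,
                future: int) -> List[Dict[str, List[int]]]:
    """Split frames into blocks, each with past context, current frames,
    and look-ahead (future) frames. Blocks advance by `current` frames."""
    blocks = []
    for start in range(0, len(frames), current):
        blocks.append({
            "past": frames[max(0, start - past):start],
            "current": frames[start:start + current],
            "future": frames[start + current:start + current + future],
        })
    return blocks


blocks = make_blocks(list(range(6)), past=1, current=2, future=1)
```

With this layout, the future frames of one block become the current frames of the next block, which is consistent with the abstract's step of updating the target result from the current-frame result once a previous block exists.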

    INTENTION RECOGNITION METHOD AND APPARATUS, READABLE MEDIUM, AND ELECTRONIC DEVICE

    Publication number: US20240185046A1

    Publication date: 2024-06-06

    Application number: US18444050

    Filing date: 2024-02-16

    CPC classification number: G06N3/063

    Abstract: The present application relates to an intention recognition method and apparatus, a readable medium, and an electronic device. The method includes: by means of a preset intention recognition quantization model, performing a quantization operation on a dot product of a query vector and a key vector which correspond to each character in a target text, so as to obtain a fixed-point type target vector of a first bit width; according to the fixed-point type target vector, determining, by means of a target mapping relationship, a floating-point type attention weight of a second bit width corresponding to each character; and according to the floating-point type attention weight, determining a target intention corresponding to the target text, the first bit width being smaller than the second bit width.
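The idea of turning query-key dot products into low-bit fixed-point values and then mapping them to floating-point attention weights can be sketched as below. This is an illustration under stated assumptions, not the claimed method: the scale factor, the 8-bit width, the lookup table of exponentials, and all function names are hypothetical choices.

```python
import math
from typing import Dict, List


def quantize(x: float, scale: float, bits: int = 8) -> int:
    """Round x / scale to a signed fixed-point integer and clamp to the bit range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return max(qmin, min(qmax, round(x / scale)))


def build_exp_table(scale: float, bits: int = 8) -> Dict[int, float]:
    """Assumed mapping: precompute exp() for every representable integer."""
    return {q: math.exp(q * scale)
            for q in range(-(2 ** (bits - 1)), 2 ** (bits - 1))}


def attention_weights(scores: List[float], scale: float,
                      bits: int = 8) -> List[float]:
    """Quantize dot-product scores, look up exponentials, and normalize."""
    table = build_exp_table(scale, bits)
    q_scores = [quantize(s, scale, bits) for s in scores]
    exps = [table[q] for q in q_scores]  # table lookup replaces exp() per score
    total = sum(exps)
    return [e / total for e in exps]


weights = attention_weights([0.0, 0.0, 1.0], scale=0.1)
```

Because the quantized score can take only a small number of values, the table has a fixed, small size, which is the usual motivation for replacing a floating-point exponential with a lookup.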

    METHOD, APPARATUS, DEVICE, AND STORAGE MEDIUM FOR SPEAKER CHANGE POINT DETECTION

    Publication number: US20240135933A1

    Publication date: 2024-04-25

    Application number: US18394143

    Filing date: 2023-12-22

    CPC classification number: G10L17/04

    Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors at a voice frame level of the target voice data; integrating and firing the speaker characterization vectors at the voice frame level of the target voice data based on a continuous integrate-and-fire (CIF) mechanism, to obtain a sequence of speaker characterizations bounded by speaker change points in the target voice data; and determining a timestamp corresponding to the speaker change points according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data.
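The integrate-and-fire step can be illustrated with a minimal sketch of a generic continuous integrate-and-fire loop. It is not taken from the patent: the per-frame weights are assumed to be given and to lie in [0, 1], the firing threshold of 1.0 and the 10 ms frame shift are assumptions, and the function name `cif_segments` is hypothetical.

```python
from typing import List, Tuple


def cif_segments(frames: List[List[float]], weights: List[float],
                 threshold: float = 1.0) -> Tuple[List[List[float]], List[int]]:
    """Accumulate weighted frame vectors; fire one integrated vector each time
    the accumulated weight reaches the threshold. Returns the fired vectors and
    the frame index at which each one fired."""
    dim = len(frames[0]) if frames else 0
    fired: List[List[float]] = []
    boundaries: List[int] = []
    acc_w = 0.0
    acc_v = [0.0] * dim
    for t, (h, a) in enumerate(zip(frames, weights)):
        if acc_w + a < threshold:
            # Keep integrating: no boundary at this frame.
            acc_w += a
            acc_v = [v + a * x for v, x in zip(acc_v, h)]
        else:
            # Use just enough of this frame's weight to reach the threshold.
            need = threshold - acc_w
            fired.append([v + need * x for v, x in zip(acc_v, h)])
            boundaries.append(t)
            # The remainder of the weight starts the next accumulation.
            rest = a - need
            acc_w = rest
            acc_v = [rest * x for x in h]
    # Assumption: a trailing partial accumulation is dropped.
    return fired, boundaries


vectors, frames_fired = cif_segments([[1.0], [1.0], [2.0], [2.0]],
                                     [0.5, 0.5, 0.5, 0.5])
FRAME_SHIFT_MS = 10  # assumed frame shift
timestamps_ms = [t * FRAME_SHIFT_MS for t in frames_fired]
```

In this sketch each firing position is a frame index, so a timestamp follows directly from multiplying by the frame shift.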

    MODEL TRAINING METHOD, SPEECH RECOGNITION METHOD, DEVICE, MEDIUM, AND APPARATUS

    Publication number: US20240127795A1

    Publication date: 2024-04-18

    Application number: US18276769

    Filing date: 2022-05-07

    CPC classification number: G10L15/063 G10L15/065 G10L2015/0635 G10L19/04

    Abstract: A model training method, a speech recognition method and apparatus, a medium, and a device are provided. The speech recognition model includes an encoder, a CIF prediction sub-model, and a CTC prediction sub-model. The model training method includes: encoding training speech data based on the encoder to obtain an acoustic vector sequence corresponding to the training speech data; obtaining an information amount sequence corresponding to the training speech data based on the acoustic vector sequence and the CIF prediction sub-model; obtaining a target probability sequence based on the acoustic vector sequence and the CTC prediction sub-model; determining a target loss of the speech recognition model based on the information amount sequence and the target probability sequence; and updating, in response to an updating condition being satisfied, a model parameter of the speech recognition model based on the target loss.
