MODEL TRAINING METHOD AND APPARATUS, ELECTRONIC DEVICE AND COMPUTER READABLE MEDIUM

    Publication No.: US20250022456A1

    Publication Date: 2025-01-16

    Application No.: US18897849

    Filing Date: 2024-09-26

    Inventor: Qinglin MENG

    Abstract: The present disclosure provides a model training method, including: performing feature extraction from a speech sample to obtain a speech feature; inputting the speech feature into an encoding network of a to-be-trained model for encoding processing; decoding an intermediate encoding feature to obtain an additional loss; obtaining an encoding loss based on an encoding feature and an encoding label; obtaining a total encoding loss based on the additional loss, the encoding loss, and a preset first loss weight; inputting the encoding feature into a decoding network for decoding processing to obtain a total decoding loss; obtaining a total model loss based on the total encoding loss, the total decoding loss, and a preset second loss weight; updating parameters in the model based on the total model loss, and continuing to train the to-be-trained model according to the updated parameters until the total model loss converges, obtaining a trained model.
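The abstract names two preset loss weights but does not give the formulas that combine the losses. The following is a minimal sketch, assuming each weight forms a convex combination of its two terms; the function name, argument names, and default weight values are hypothetical and do not come from the patent.

```python
def combine_losses(additional_loss, encoding_loss, total_decoding_loss,
                   first_weight=0.3, second_weight=0.5):
    """Combine the losses named in the abstract (weighting scheme is assumed).

    first_weight  balances the additional loss (from the intermediate encoding
                  feature) against the encoding loss.
    second_weight balances the total encoding loss against the total decoding loss.
    """
    # Assumption: convex combination; the abstract only says the total is
    # "based on" the two losses and a preset first loss weight.
    total_encoding_loss = (first_weight * additional_loss
                           + (1.0 - first_weight) * encoding_loss)

    # Assumption: same convex form for the second loss weight.
    total_model_loss = (second_weight * total_encoding_loss
                        + (1.0 - second_weight) * total_decoding_loss)
    return total_encoding_loss, total_model_loss


if __name__ == "__main__":
    enc, total = combine_losses(2.0, 1.0, 3.0, first_weight=0.5, second_weight=0.5)
    print(enc, total)
```

In a training loop, the returned total model loss would be the scalar that drives the parameter update, repeated until that value converges.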

    Keyword detection for audio content

    Publication No.: US12190867B2

    Publication Date: 2025-01-07

    Application No.: US17804603

    Filing Date: 2022-05-31

    Inventor: Zvi Figov

    Abstract: Examples of the present disclosure describe improved systems and methods for detecting keywords in audio content. In one example implementation, audio content is segmented into one or more audio segments. One or more text segments is generated, each text segment corresponding to each of the audio segments. For each text segment, one or more phrase candidate values is generated using a textual analysis, and one or more sentence embedding values is generated using a sentence embedding analysis. Next, an average sentence embedding value is calculated using the one or more sentence embedding values. Each of the one or more phrase candidate values is compared to the average sentence embedding value. Each phrase candidate value having a comparison value above a threshold value is labeled as representing a keyword.
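The comparison step in this abstract can be sketched as follows. This is an illustration under stated assumptions: phrase candidate values are taken to be embedding vectors, the comparison is taken to be cosine similarity, and the threshold of 0.7 is arbitrary. None of these choices is specified in the abstract, and all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_keywords(candidates, sentence_embeddings, threshold=0.7):
    """Return the candidate phrases whose similarity to the average
    sentence embedding exceeds the threshold.

    candidates: dict mapping phrase -> embedding vector (assumed form).
    """
    average = mean_vector(sentence_embeddings)
    return [phrase for phrase, vector in candidates.items()
            if cosine(vector, average) > threshold]


if __name__ == "__main__":
    sentences = [[1.0, 0.0], [1.0, 0.0]]
    phrases = {"speech": [1.0, 0.1], "weather": [0.0, 1.0]}
    print(label_keywords(phrases, sentences))
```

In practice the vectors would come from a sentence embedding model applied to the transcribed text segments; the toy two-dimensional vectors here only exercise the thresholding logic.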

    HEADSET CONTROL METHOD, HEADSET, APPARATUS, AND STORAGE MEDIUM

    Publication No.: US20240422468A1

    Publication Date: 2024-12-19

    Application No.: US18815959

    Filing Date: 2024-08-27

    Abstract: Embodiments of the present disclosure provide a headset control method, a headset, a headset control apparatus, and a related storage medium. The method includes: collecting environment information, and determining key sound detection sensitivity based on the environment information; performing key sound detection in the environment information based on the key sound detection sensitivity; and if a key sound exists in the environment information, adjusting the headset to a hear through mode, or playing the key sound. The headset collects the environment information, determines the key sound detection sensitivity based on the environment information, and performs key sound detection in the environment information based on the key sound detection sensitivity. If the key sound exists, the headset is adjusted to the hear through mode, or the key sound is played. In this solution, the key sound detection sensitivity corresponding to the environment information is determined based on the environment information, to perform key sound detection.

    Automatic detection of neurocognitive impairment based on a speech sample

    Publication No.: US12161481B2

    Publication Date: 2024-12-10

    Application No.: US17415418

    Filing Date: 2019-12-16

    Abstract: The invention is a method for automatic detection of neurocognitive impairment, comprising: generating, in a segmentation and labelling step (11), a labelled segment series (26) from a speech sample (22) using a speech recognition unit (24); and generating from the labelled segment series (26), in an acoustic parameter calculation step (12), acoustic parameters (30) characterizing the speech sample (22). The method is characterised by determining, in a probability analysis step (14), in a particular temporal division of the speech sample (22), respective probability values (38) corresponding to silent pauses, filled pauses and any types of pauses for respective temporal intervals thereof; calculating, in an additional parameter calculating step (15), a histogram by generating an additional histogram data set (42) from the determined probability values (38) by dividing a probability domain into subdomains and aggregating durations of the temporal intervals corresponding to the probability values falling into the respective subdomains; and generating, in an evaluation step (13), decision information (34) by feeding the acoustic parameters (30) and the additional histogram data set (42) into an evaluation unit (32), the evaluation unit (32) using a machine learning algorithm. The invention is furthermore a data processing system, a computer program product and a computer-readable storage medium for carrying out the method.
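The histogram step described in this abstract can be sketched as below: the probability domain is divided into subdomains, and the durations of the temporal intervals are summed per subdomain. This is an illustration only; the choice of equal-width subdomains over [0, 1], the default of 10 bins, and the function name are assumptions not stated in the abstract.

```python
def duration_histogram(probabilities, durations, num_bins=10):
    """Sum interval durations per probability subdomain.

    probabilities: one pause probability in [0, 1] per temporal interval.
    durations:     the duration of each interval (e.g. in seconds).
    Assumption: subdomains are num_bins equal-width bins over [0, 1].
    """
    if len(probabilities) != len(durations):
        raise ValueError("probabilities and durations must have equal length")

    histogram = [0.0] * num_bins
    for probability, duration in zip(probabilities, durations):
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability outside [0, 1]")
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(probability * num_bins), num_bins - 1)
        histogram[index] += duration
    return histogram


if __name__ == "__main__":
    print(duration_histogram([0.05, 0.95, 1.0, 0.5],
                             [0.25, 0.5, 0.25, 1.0],
                             num_bins=4))
```

One such histogram could be computed per pause type (silent, filled, any) and the resulting vectors passed to the evaluation unit alongside the acoustic parameters.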

    Stable Output Streaming Speech Translation System

    Publication No.: US20240395240A1

    Publication Date: 2024-11-28

    Application No.: US18200889

    Filing Date: 2023-05-23

    Abstract: A computer-implemented method includes receiving speech data representative of speech in a first language. The speech data is divided into chunks of speech data, each chunk comprising multiple temporally consecutive frames of acoustic information. Each temporally consecutive chunk of data is processed using beam search on each frame to identify candidate language tokens representing a second language different from the first language. A best candidate language token(s) is selected for each chunk as processed. The selected best candidate language token or tokens for each chunk of data is committed as a prefix for a next temporally consecutive chunk of data.
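The chunk-by-chunk commit described in this abstract can be sketched as follows. This is a simplified illustration, not the patented system: it assumes exactly one output token per frame, and `score_fn` is a hypothetical stand-in for a translation model that returns log-probabilities for candidate tokens given the tokens so far and the current frame.

```python
def decode_stream(chunks, score_fn, beam_width=2):
    """Beam search inside each chunk; commit the best hypothesis as the
    fixed prefix for the next chunk.

    chunks:   list of chunks, each a list of frames.
    score_fn: hypothetical callable (tokens_so_far, frame) -> {token: log_prob}.
    """
    committed = []  # tokens already emitted; never revised afterwards
    for chunk in chunks:
        # Every hypothesis in this chunk starts from the committed prefix.
        beams = [(0.0, list(committed))]
        for frame in chunk:
            expanded = []
            for score, tokens in beams:
                for token, log_prob in score_fn(tokens, frame).items():
                    expanded.append((score + log_prob, tokens + [token]))
            if expanded:
                # Keep only the highest-scoring hypotheses.
                expanded.sort(key=lambda beam: beam[0], reverse=True)
                beams = expanded[:beam_width]
        # Commit the best hypothesis at the end of the chunk.
        committed = max(beams, key=lambda beam: beam[0])[1]
    return committed


if __name__ == "__main__":
    # Toy model: each frame already carries its token log-probabilities.
    toy_score_fn = lambda tokens, frame: frame
    toy_chunks = [
        [{"a": -0.1, "b": -2.0}, {"c": -0.2, "d": -1.5}],
        [{"e": -0.3, "f": -0.9}],
    ]
    print(decode_stream(toy_chunks, toy_score_fn))
```

Because the committed prefix is never rewritten, output shown to a listener or reader stays stable while later chunks are still being decoded, which is the property the title refers to.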
