On-device speech synthesis of textual segments for training of on-device speech recognition model

    Publication No.: US11978432B2

    Publication date: 2024-05-07

    Application No.: US18204324

    Filing date: 2023-05-31

    Applicant: GOOGLE LLC

    IPC classification: G10L13/047 G10L15/06

    Abstract: Processor(s) of a client device can: identify a textual segment stored locally at the client device; process the textual segment, using a speech synthesis model stored locally at the client device, to generate synthesized speech audio data that includes synthesized speech of the identified textual segment; process the synthesized speech, using an on-device speech recognition model that is stored locally at the client device, to generate predicted output; and generate a gradient based on comparing the predicted output to ground truth output that corresponds to the textual segment. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.
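The loop described in this abstract (synthesize speech from local text, recognize it, compare against the text, derive a gradient) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the patent: `synthesize` is a hypothetical stand-in that emits one-hot "audio frames", the recognizer is a single linear layer with a per-frame softmax, and the comparison is cross-entropy loss.

```python
import math

VOCAB = "abcdefghijklmnopqrstuvwxyz "

def synthesize(text):
    """Hypothetical stand-in for a speech synthesis model:
    each character becomes a one-hot 'audio frame'."""
    frames = []
    for ch in text:
        frame = [0.0] * len(VOCAB)
        frame[VOCAB.index(ch)] = 1.0
        frames.append(frame)
    return frames

def recognize(weights, frames):
    """Toy recognizer: linear layer plus softmax over characters, per frame."""
    outputs = []
    for frame in frames:
        logits = [sum(w * x for w, x in zip(row, frame)) for row in weights]
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        outputs.append([e / total for e in exps])
    return outputs

def loss(predicted, text):
    """Mean cross-entropy between predicted output and the ground-truth text."""
    return -sum(math.log(p[VOCAB.index(ch)])
                for p, ch in zip(predicted, text)) / len(text)

def gradient(frames, predicted, text):
    """Gradient of the mean cross-entropy with respect to the weights."""
    n = len(VOCAB)
    grad = [[0.0] * n for _ in range(n)]
    for frame, probs, ch in zip(frames, predicted, text):
        target = VOCAB.index(ch)
        for i in range(n):
            err = probs[i] - (1.0 if i == target else 0.0)
            for j in range(n):
                grad[i][j] += err * frame[j] / len(text)
    return grad

def train_step(weights, text, lr=1.0):
    """One local update: text -> synthetic audio -> prediction -> gradient."""
    frames = synthesize(text)
    predicted = recognize(weights, frames)
    grad = gradient(frames, predicted, text)
    updated = [[w - lr * g for w, g in zip(w_row, g_row)]
               for w_row, g_row in zip(weights, grad)]
    return updated, grad, loss(predicted, text)

weights = [[0.0] * len(VOCAB) for _ in VOCAB]
weights, grad, first_loss = train_step(weights, "call mom")
_, _, second_loss = train_step(weights, "call mom")
print(first_loss > second_loss)
```

In this sketch the returned `grad` is the value that would either be applied locally, as `train_step` does, or sent to a remote system for aggregation into global weights; the aggregation step itself is not shown.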

    LEARNING APPARATUS, ESTIMATION APPARATUS, METHODS AND PROGRAMS FOR THE SAME

    Publication No.: US20240144912A1

    Publication date: 2024-05-02

    Application No.: US18280159

    Filing date: 2021-03-10

    IPC classification: G10L15/04 G10L15/02 G10L15/06

    Abstract: An estimation apparatus includes an estimation unit that estimates a future incident occurrence quantitative value in a region on the basis of at least two or more inputted psychological-state/sensibility expressing words emitted in a predetermined region and the input order of the two or more psychological-state/sensibility expressing words, using an estimation model for estimating an incident occurrence quantitative value that is a quantitative value of an occurrence of a predetermined event in the region after a certain time, with an input being at least a time series of two or more psychological-state/sensibility expressing words emitted in the predetermined region before the certain time.

    Voice communication analysis system

    Publication No.: US11967307B2

    Publication date: 2024-04-23

    Application No.: US17174845

    Filing date: 2021-02-12

    Inventor: Suraj Shinde

    Abstract: Techniques are disclosed for applying a trained machine learning model to incoming voice communications to determine whether the voice communications are genuine or not genuine. The trained machine learning model may identify vocal attributes within the target call and use the identified attributes, and the training, to determine whether the target call is genuine or not genuine. An applied trained machine learning model may include multiple different types of trained machine learning models, where each of the different types of machine learning models is trained and/or configured for a different function within the analysis.

    DOMAIN ADAPTIVE SPEECH RECOGNITION USING ARTIFICIAL INTELLIGENCE

    Publication No.: US20240127801A1

    Publication date: 2024-04-18

    Application No.: US17965226

    Filing date: 2022-10-13

    Abstract: Methods, systems, and computer program products for domain adaptive speech recognition using artificial intelligence are provided herein. A computer-implemented method includes generating a set of language data candidates, each language data candidate comprising one or more graphemes, by processing a sequence of phonemes related to input speech data using an artificial intelligence-based data conversion model; determining, for a target pair of phonemes and graphemes, a subset of graphemes from the set of language data candidates; generating a first speech recognition output by processing the subset of graphemes using at least one biasing language model and an artificial intelligence-based speech recognition model; generating a second speech recognition output by replacing at least a portion of the subset of graphemes in the first speech recognition output with at least one of the graphemes from the target pair; and performing automated actions based on the second speech recognition output.

    LEARNING APPARATUS, ESTIMATION APPARATUS, METHODS AND PROGRAMS FOR THE SAME

    Publication No.: US20240127796A1

    Publication date: 2024-04-18

    Application No.: US18277552

    Filing date: 2021-02-18

    IPC classification: G10L15/06 G10L15/16

    Abstract: The present invention estimates the intention of an utterance more accurately than the related art. A learning device learns an estimation model on the basis of learning data including an acoustic signal for learning and a label indicating whether or not the acoustic signal has been uttered to a predetermined target. The learning device includes: a feature synchronization unit configured to obtain a post-synchronization feature by synchronizing an acoustic feature obtained from the acoustic signal for learning with a text feature corresponding to the acoustic signal; an utterance intention estimation unit configured to estimate whether or not the acoustic signal has been uttered to the predetermined target by using the post-synchronization feature; and a parameter update unit configured to update a parameter of the estimation model on the basis of the label included in the learning data and an estimation result by the utterance intention estimation unit.

    Phrase extraction for ASR models
    Granted invention patent

    Publication No.: US11955134B2

    Publication date: 2024-04-09

    Application No.: US17643848

    Filing date: 2021-12-13

    Applicant: Google LLC

    Abstract: A method of phrase extraction for ASR models includes obtaining audio data characterizing an utterance and a corresponding ground-truth transcription of the utterance and modifying the audio data to obfuscate a particular phrase recited in the utterance. The method also includes processing, using a trained ASR model, the modified audio data to generate a predicted transcription of the utterance, and determining whether the predicted transcription includes the particular phrase by comparing the predicted transcription of the utterance to the ground-truth transcription of the utterance. When the predicted transcription includes the particular phrase, the method includes generating an output indicating that the trained ASR model leaked the particular phrase from a training data set used to train the ASR model.
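The leak check described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: audio is modeled as a list of word-level segments, obfuscation replaces the phrase's segments with a hypothetical `<noise>` token, and the two "ASR models" are stubs invented to show both outcomes.

```python
MASK = "<noise>"

def obfuscate(segments, phrase):
    """Replace the segments that carry `phrase` with a mask token."""
    words = phrase.split()
    out = list(segments)
    for i in range(len(out) - len(words) + 1):
        if out[i:i + len(words)] == words:
            out[i:i + len(words)] = [MASK] * len(words)
    return out

def contains(transcription, phrase):
    """Whole-word containment check."""
    return f" {phrase} " in f" {transcription} "

def leaked(asr_model, segments, ground_truth, phrase):
    """True if the model reproduces a phrase that was removed from its input."""
    predicted = asr_model(obfuscate(segments, phrase))
    return contains(ground_truth, phrase) and contains(predicted, phrase)

def honest_model(segments):
    """Stub model: transcribes only what it can actually hear."""
    return " ".join(s for s in segments if s != MASK)

def memorizing_model(segments):
    """Stub model: falls back on a memorized training sentence when masked."""
    if MASK in segments:
        return "my pin is four two seven one"
    return " ".join(segments)

truth = "my pin is four two seven one"
audio = truth.split()
secret = "four two seven one"
print(leaked(honest_model, audio, truth, secret))      # False
print(leaked(memorizing_model, audio, truth, secret))  # True
```

The point of the comparison is that the phrase is absent from the modified input, so a model that still outputs it must have drawn it from somewhere other than the audio, here a memorized training sentence.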

    Electronic device and control method therefor

    Publication No.: US11948567B2

    Publication date: 2024-04-02

    Application No.: US17418314

    Filing date: 2019-10-04

    Abstract: The present disclosure provides an electronic device and a control method therefor. The electronic device of the present disclosure comprises: a voice reception unit; and a processor for, when a first user voice and a second user voice are received through the voice reception unit, determining whether the second user voice corresponds to a candidate of utterance subsequent to the first user voice on the basis of a result obtained by dividing a plurality of attributes of the second user voice according to a predefined attribute, and controlling the electronic device to perform an operation corresponding to the second user voice on the basis of the intent of the second user voice obtained through a result of the determination.