Emitting word timings with end-to-end models

    Publication No.: US12027154B2

    Publication Date: 2024-07-02

    Application No.: US18167050

    Filing Date: 2023-02-09

    Applicant: Google LLC

    IPC Classification: G10L25/30 G10L15/06 G10L25/78

    Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word, and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
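
The per-word preparation described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the `<wb>` placeholder symbol, the fixed-length `toy_word_pieces` tokenizer, and the `(start, end)` timing tuples are hypothetical and do not come from the patent, and the sketch stops short of how the constraints are applied to the attention head of the second pass decoder.

```python
from typing import Callable, Dict, List, Tuple

PLACEHOLDER = "<wb>"  # hypothetical placeholder symbol, not taken from the patent


def toy_word_pieces(word: str, size: int = 3) -> List[str]:
    """Hypothetical tokenizer: split a word into chunks of `size` characters."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def build_constraints(
    words: List[str],
    timings: List[Tuple[float, float]],
    tokenize: Callable[[str], List[str]] = toy_word_pieces,
) -> Tuple[List[str], List[Dict[str, object]]]:
    """Insert a placeholder before each word and pair the beginning and
    ending word pieces with the ground truth start and end times."""
    if len(words) != len(timings):
        raise ValueError("each word needs one (start, end) ground truth timing")
    targets: List[str] = []
    constraints: List[Dict[str, object]] = []
    for word, (start, end) in zip(words, timings):
        pieces = tokenize(word)
        if not pieces:
            raise ValueError("empty words are not supported")
        targets.append(PLACEHOLDER)      # placeholder goes before the word
        begin_index = len(targets)       # position of the beginning word piece
        targets.extend(pieces)
        end_index = len(targets) - 1     # position of the ending word piece
        constraints.append({
            "word": word,
            "begin_index": begin_index,
            "begin_time": start,         # first constrained alignment
            "end_index": end_index,
            "end_time": end,             # second constrained alignment
        })
    return targets, constraints
```

For a single-piece word, `begin_index` and `end_index` coincide, so the same word piece carries both constraints in this sketch.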

    Method to improve digital agent conversations

    Publication No.: US12020689B2

    Publication Date: 2024-06-25

    Application No.: US17718357

    Filing Date: 2022-04-12

    Abstract: A computer-implemented method for virtual agent conversation training is disclosed. The computer-implemented method includes determining a current state of a first stage of a conversation between a pair of virtual agents. The computer-implemented method further includes determining a pivot distance between the current state of the first stage of the conversation and a subsequent, second stage of the conversation. The computer-implemented method further includes, responsive to determining that the pivot distance between the current state of the first stage of the conversation and the subsequent, second stage of the conversation is below a predetermined threshold, determining an angle of dislocation with respect to the pivot distance. The computer-implemented method further includes terminating the conversation based, at least in part, on determining that the angle of dislocation is above a predetermined threshold.

    Large-scale language model data selection for rare-word speech recognition

    Publication No.: US12014725B2

    Publication Date: 2024-06-18

    Application No.: US17643861

    Filing Date: 2021-12-13

    Applicant: Google LLC

    Abstract: A method of training a language model for rare-word speech recognition includes obtaining a set of training text samples, and obtaining a set of training utterances used for training a speech recognition model. Each training utterance in the set of training utterances includes audio data corresponding to an utterance and a corresponding transcription of the utterance. The method also includes applying rare word filtering on the set of training text samples to identify a subset of rare-word training text samples that include words that do not appear in the transcriptions from the set of training utterances or appear in the transcriptions from the set of training utterances less than a threshold number of times. The method further includes training the external language model on the transcriptions from the set of training utterances and the identified subset of rare-word training text samples.
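
The rare word filtering step in this abstract can be illustrated with a short Python sketch. It assumes lowercased, whitespace-separated words and a single integer `threshold`; those choices, and the function names, are hypothetical simplifications rather than details from the patent.

```python
from collections import Counter
from typing import List


def rare_word_filter(
    text_samples: List[str],
    transcriptions: List[str],
    threshold: int,
) -> List[str]:
    """Keep text samples containing at least one word that occurs fewer than
    `threshold` times in the transcriptions (unseen words count as zero)."""
    counts = Counter(
        word for line in transcriptions for word in line.lower().split()
    )
    return [
        sample
        for sample in text_samples
        if any(counts[word] < threshold for word in sample.lower().split())
    ]


def build_lm_corpus(
    text_samples: List[str],
    transcriptions: List[str],
    threshold: int,
) -> List[str]:
    """Language model training text: all transcriptions plus the rare-word subset."""
    return list(transcriptions) + rare_word_filter(
        text_samples, transcriptions, threshold
    )
```

Because a `Counter` returns zero for missing keys, one comparison covers both cases named in the abstract: words that never appear in the transcriptions and words that appear fewer than the threshold number of times.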

    Modular Training for Flexible Attention Based End-to-End ASR

    Publication No.: US20240185839A1

    Publication Date: 2024-06-06

    Application No.: US18526148

    Filing Date: 2023-12-01

    Applicant: Google LLC

    IPC Classification: G10L15/06

    CPC Classification: G10L15/063 G10L2015/0635

    Abstract: A method for training a modular neural network model includes, during an initial training stage, training only a backbone model to provide a first model configuration of the modular neural network model. The first model configuration includes only the trained backbone model. The method also includes adding an intrinsic sub-model to the trained backbone model. During a fine-tuning training stage, the method includes freezing parameters of the trained backbone model and fine-tuning parameters of the intrinsic sub-model added to the trained backbone model while the parameters of the trained backbone model are frozen. This provides a second model configuration that includes the backbone model trained during the initial training stage and the intrinsic sub-model having the parameters fine-tuned during the fine-tuning training stage.
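
The two-stage schedule in this abstract (train the backbone alone, then freeze it and fine-tune only an added sub-model) can be sketched with a deliberately tiny toy. Everything concrete here is an assumption for illustration: the "backbone" is a single weight `w`, the added "sub-model" is a single bias `b`, and training is plain gradient descent on squared error. None of this reflects the actual architecture in the patent.

```python
from typing import Dict, List, Tuple

Data = List[Tuple[float, float]]


def predict(params: Dict[str, float], x: float) -> float:
    """Backbone term w*x plus the sub-model term b, if it has been added."""
    return params["w"] * x + params.get("b", 0.0)


def train(params: Dict[str, float], trainable: List[str], data: Data,
          lr: float = 0.05, steps: int = 200) -> None:
    """Gradient descent on mean squared error, updating only `trainable`
    parameters; every other parameter stays frozen."""
    for _ in range(steps):
        grads = {name: 0.0 for name in trainable}
        for x, y in data:
            err = predict(params, x) - y
            if "w" in grads:
                grads["w"] += 2.0 * err * x
            if "b" in grads:
                grads["b"] += 2.0 * err
        for name in grads:
            params[name] -= lr * grads[name] / len(data)


# Initial stage: only the backbone exists and is trained (first configuration).
params = {"w": 0.0}
train(params, ["w"], [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
backbone_w = params["w"]

# Fine-tuning stage: add the sub-model, freeze the backbone, train only "b"
# (second configuration).
params["b"] = 0.0
train(params, ["b"], [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)])
```

After the second stage, `params["w"]` still equals `backbone_w`, so dropping `b` recovers the first configuration unchanged.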

    SYSTEM AND METHOD FOR ACTIVE LEARNING BASED MULTILINGUAL SEMANTIC PARSER

    Publication No.: US20240185838A1

    Publication Date: 2024-06-06

    Application No.: US18318225

    Filing Date: 2023-05-16

    Applicant: Openstream Inc.

    IPC Classification: G10L15/06 G10L15/18

    Abstract: Described is a system and method for training a multilingual semantic parser. A method includes receiving, by a multilingual semantic parser, a multilingual training dataset, wherein the multilingual training dataset includes pairs of utterances and meaning representations from at least one high-resource language and at least one low-resource language, and wherein the multilingual training dataset is initially a machine-translated dataset; training the multilingual semantic parser by translating the utterances in the multilingual training dataset to a target language; and iteratively performing: selecting, by an acquisition functions estimator, a subset of the multilingual training dataset for human translation; updating the multilingual training dataset with the human-translated subset; and retraining the multilingual semantic parser with the updated multilingual training dataset.
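
The iterative select, update, and retrain loop in this abstract can be outlined in Python as below. This is a schematic sketch only: the dictionary fields (`source`, `utterance`, `human`), the `acquisition_score`, `human_translate`, and `retrain` callables, and the per-round `budget` are all hypothetical stand-ins, since the abstract does not specify how the acquisition functions estimator scores examples.

```python
from typing import Callable, Dict, List

Example = Dict[str, object]


def active_learning_loop(
    dataset: List[Example],
    acquisition_score: Callable[[Example], float],
    human_translate: Callable[[str], str],
    retrain: Callable[[List[Example]], None],
    budget: int,
    rounds: int,
) -> List[Example]:
    """Each round, send the highest-scoring machine-translated examples for
    human translation, update the dataset in place, and retrain the parser."""
    for _ in range(rounds):
        candidates = [ex for ex in dataset if not ex["human"]]
        if not candidates:
            break  # everything has already been human-translated
        candidates.sort(key=acquisition_score, reverse=True)
        for ex in candidates[:budget]:
            ex["utterance"] = human_translate(str(ex["source"]))
            ex["human"] = True
        retrain(dataset)
    return dataset
```

A caller would supply its own scoring function, for example one based on parser uncertainty; that choice is outside what the abstract states.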