Monte Carlo Self-Training for Speech Recognition

Publication Number: US20240177706A1

Publication Date: 2024-05-30

Application Number: US18515212

Filing Date: 2023-11-20

Applicant: Google LLC

Abstract: A method for training a sequence transduction model includes receiving a sequence of unlabeled input features extracted from unlabeled input samples. Using a teacher branch of an unsupervised subnetwork, the method includes processing the sequence of input features to predict probability distributions over possible teacher branch output labels, sampling one or more sequences of teacher branch output labels, and determining a sequence of pseudo output labels based on the one or more sequences of teacher branch output labels. Using a student branch that includes a student encoder of the unsupervised subnetwork, the method includes processing the sequence of input features to predict probability distributions over possible student branch output labels, determining a negative log likelihood term based on the predicted probability distributions over possible student branch output labels and the sequence of pseudo output labels, and updating parameters of the student encoder.
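The teacher/student flow described in the abstract can be sketched in minimal form. This is an illustrative reconstruction, not the patented implementation: the shapes, the use of random logits as stand-ins for encoder outputs, and the per-frame majority vote used to reduce the sampled sequences to pseudo labels are all assumptions for demonstration.

```python
import math
import random

random.seed(0)

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

T, V, K = 6, 5, 8  # frames, label vocabulary size, Monte Carlo samples (illustrative)

# Teacher branch: per-frame probability distributions over possible output
# labels (random stand-ins here for real encoder predictions).
teacher = [softmax([random.gauss(0, 1) for _ in range(V)]) for _ in range(T)]

# Monte Carlo step: sample K label sequences from the teacher distributions.
samples = [[random.choices(range(V), weights=teacher[t])[0] for t in range(T)]
           for _ in range(K)]

# Pseudo output labels: per-frame majority vote over the sampled sequences
# (one simple way to reduce the K samples to a single sequence).
pseudo = []
for t in range(T):
    counts = [0] * V
    for s in samples:
        counts[s[t]] += 1
    pseudo.append(counts.index(max(counts)))

# Student branch: its own per-frame distributions over possible labels.
student = [softmax([random.gauss(0, 1) for _ in range(V)]) for _ in range(T)]

# Negative log likelihood of the pseudo labels under the student; in
# training, gradients of this term would update the student encoder.
nll = -sum(math.log(student[t][pseudo[t]]) for t in range(T)) / T
print(len(pseudo), nll > 0)
```

In a real system, the teacher and student distributions would come from neural encoders over speech features, and the NLL term would be minimized by gradient descent rather than merely computed.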

    SYSTEM AND METHOD FOR TRANSLATION OF STREAMING ENCRYPTED CONTENT

Publication Number: US20240161734A1

Publication Date: 2024-05-16

Application Number: US18282112

Filing Date: 2022-03-31

IPC Class: G10L15/06

CPC Class: G10L15/063

Abstract: Method and servers for generating a speech model that generates signals representative of utterances in a first language based on signals representative of utterances in a second language are disclosed. The method comprises transmitting a first speech model and a second speech model to a first device of a first user and a second device of a second user, respectively. The first device is communicatively coupled with the second device by an encrypted communication link. A third speech model is acquired from the second device based on local training of the second speech model on the second device. A training set comprises first and second decrypted signals representative of an utterance of the first user in the first language and of a translated utterance of the first user in the second language, respectively. The speech model is generated locally by the server by combining the second and third speech models.
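The final combination step described above can be illustrated with a toy sketch. The abstract does not say how the second and third models are combined, so the federated-averaging-style parameter average below, the flat-list model representation, and the `weight` parameter are all assumptions for illustration.

```python
# Hypothetical: represent a speech model as a flat list of parameters, and
# "combining" two models as an elementwise weighted average of their
# parameters (in the spirit of federated averaging).
def combine(model_a, model_b, weight=0.5):
    return [weight * a + (1 - weight) * b for a, b in zip(model_a, model_b)]

second_model = [0.2, -0.5, 1.0]  # the model the server sent to the second device
third_model = [0.4, -0.1, 0.8]   # the model returned after local training there

# Server-side combination yields the updated speech model.
merged = combine(second_model, third_model)
print(merged)
```

The averaging keeps raw training signals on the devices: only model parameters travel back to the server, which is consistent with the encrypted, privacy-preserving setup the abstract describes.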