SYSTEMS AND METHODS FOR MUTUAL INFORMATION BASED SELF-SUPERVISED LEARNING

    Publication No.: US20220067534A1

    Publication Date: 2022-03-03

    Application No.: US17006570

    Filing Date: 2020-08-28

    Abstract: Embodiments described herein combine masked reconstruction and predictive coding. Specifically, unlike contrastive learning, the mutual information between past states and future states is estimated directly. Context information can also be captured directly via shifted masked reconstruction: unlike standard masked reconstruction, the target reconstructed observations are shifted slightly towards the future to incorporate more predictability. The estimated mutual information and the shifted masked reconstruction loss can then be combined as the loss function to update the neural model.
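The combination described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how the mutual information is estimated or how the two terms are weighted, so the Gaussian estimator for scalar states, the mean-squared reconstruction error, the subtraction of the mutual information term, and all names (`gaussian_mi_1d`, `shifted_masked_loss`, `combined_loss`, `shift`, `weight`) are assumptions made for illustration.

```python
import math


def gaussian_mi_1d(past, future):
    """Mutual information (nats) between two scalar series.

    Assumption: the pair is jointly Gaussian, so I = -0.5 * ln(1 - rho^2),
    where rho is the Pearson correlation.
    """
    n = len(past)
    mean_p = sum(past) / n
    mean_f = sum(future) / n
    cov = sum((p - mean_p) * (f - mean_f) for p, f in zip(past, future))
    var_p = sum((p - mean_p) ** 2 for p in past)
    var_f = sum((f - mean_f) ** 2 for f in future)
    if var_p == 0.0 or var_f == 0.0:
        return 0.0  # a constant series carries no information
    rho2 = min(cov * cov / (var_p * var_f), 1.0 - 1e-12)  # keep the log finite
    return -0.5 * math.log(1.0 - rho2)


def shifted_masked_loss(recon, observed, masked_positions, shift=1):
    """Mean squared error between the reconstruction at masked step t
    and the observation at step t + shift (the target is shifted forward)."""
    errors = [
        (recon[t] - observed[t + shift]) ** 2
        for t in masked_positions
        if t + shift < len(observed)
    ]
    return sum(errors) / len(errors) if errors else 0.0


def combined_loss(latent, recon, observed, masked_positions, shift=1, weight=1.0):
    """Hypothetical combination: reconstruction loss minus weighted MI,
    so that minimizing the loss increases the estimated MI."""
    mi = gaussian_mi_1d(latent[:-1], latent[1:])  # past state vs. next state
    rec = shifted_masked_loss(recon, observed, masked_positions, shift)
    return rec - weight * mi
```

With `shift=0` the reconstruction term reduces to standard masked reconstruction, which makes the effect of the forward shift easy to compare in isolation.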

    Systems and methods for mutual information based self-supervised learning

    Publication No.: US12198060B2

    Publication Date: 2025-01-14

    Application No.: US17006570

    Filing Date: 2020-08-28

    Abstract: Embodiments described herein combine masked reconstruction and predictive coding. Specifically, unlike contrastive learning, the mutual information between past states and future states is estimated directly. Context information can also be captured directly via shifted masked reconstruction: unlike standard masked reconstruction, the target reconstructed observations are shifted slightly towards the future to incorporate more predictability. The estimated mutual information and the shifted masked reconstruction loss can then be combined as the loss function to update the neural model.

    Phone-based sub-word units for end-to-end speech recognition

    Publication No.: US11328731B2

    Publication Date: 2022-05-10

    Application No.: US16903964

    Filing Date: 2020-06-17

    Abstract: Systems and methods for identifying a text word from a spoken utterance are provided. An ensemble BPE system that includes a phone BPE system and a character BPE system receives a spoken utterance. Both BPE systems include a multi-level language model (LM) and an acoustic model. The phone BPE system identifies first words from the spoken utterance and determines a first score for each first word. The first words are converted into character sequences. The character BPE model converts the character sequences into second words and determines a second score for each second word. For each word from the first words that matches a word in the second words, the first and second scores are combined. The text word is the word with the highest score.
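The final score-combination step in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the abstract does not specify how the two scores are combined or what happens to unmatched words, so the weighted sum, the `weight` parameter, the choice to consider matched words only, and the function name `pick_text_word` are all hypothetical. The acoustic and language models that produce the scores are outside the scope of the sketch.

```python
def pick_text_word(phone_scores, char_scores, weight=0.5):
    """Select the text word from two hypothesis sets.

    phone_scores: dict mapping each first word to its first score
    char_scores:  dict mapping each second word to its second score
    Scores are assumed to be "higher is better" (for example, log-probabilities).
    Returns the matched word with the highest combined score, or None
    when the two hypothesis sets share no word.
    """
    combined = {}
    for word, first_score in phone_scores.items():
        if word in char_scores:  # the first word matches a second word
            second_score = char_scores[word]
            # Assumption: a weighted sum; the abstract only says "combined".
            combined[word] = weight * first_score + (1.0 - weight) * second_score
    if not combined:
        return None
    return max(combined, key=combined.get)
```

For example, `pick_text_word({"cat": -1.0, "cap": -1.5}, {"cat": -2.0, "cap": -0.5, "cut": -0.1})` combines the scores for "cat" and "cap" only, because "cut" has no match among the first words.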

    Phone-Based Sub-Word Units for End-to-End Speech Recognition

    Publication No.: US20210319796A1

    Publication Date: 2021-10-14

    Application No.: US16903964

    Filing Date: 2020-06-17

    Abstract: Systems and methods for identifying a text word from a spoken utterance are provided. An ensemble BPE system that includes a phone BPE system and a character BPE system receives a spoken utterance. Both BPE systems include a multi-level language model (LM) and an acoustic model. The phone BPE system identifies first words from the spoken utterance and determines a first score for each first word. The first words are converted into character sequences. The character BPE model converts the character sequences into second words and determines a second score for each second word. For each word from the first words that matches a word in the second words, the first and second scores are combined. The text word is the word with the highest score.
