Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

    Publication (Announcement) No.: US12087306B1

    Publication (Announcement) Date: 2024-09-10

    Application No.: US17535005

    Application Date: 2021-11-24

    CPC classification number: G10L15/28 G10L15/16

    Abstract: In one embodiment, a method includes receiving a user's utterance comprising a word in a custom vocabulary list of the user, generating a previous token to represent a previous audio portion of the utterance, and generating a current token to represent a current audio portion of the utterance by generating a bias embedding by using the previous token to query a trie of wordpieces representing the custom vocabulary list, generating first probabilities of respective first candidate tokens likely uttered in the current audio portion based on the bias embedding and the current audio portion, generating second probabilities of respective second candidate tokens likely uttered after the previous token based on the previous token and the bias embedding, and generating the current token to represent the current audio portion of the utterance based on the first probabilities of the first candidate tokens and the second probabilities of the second candidate tokens.
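The abstract names two mechanisms without specifying how either works. The first is a trie of wordpieces that is queried with the previous token to obtain a bias embedding. The second is a step that combines two sets of candidate-token probabilities into the current token. The Python below is a minimal illustrative sketch, not the patented method, and it rests on several assumptions. Each custom-vocabulary entry is supplied already split into wordpieces. The bias embedding is taken as the plain mean of the embeddings of the wordpieces the trie allows next; a real system would more likely use a learned attention over those candidates. The two distributions are combined by log-linear interpolation with a hypothetical `weight`, in the style of shallow fusion. All names (`WordpieceTrie`, `continuations`, `bias_embedding`, `fuse`, `weight`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


class TrieNode:
    """One trie node; children are keyed by wordpiece string."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}


class WordpieceTrie:
    """Trie over a user's custom vocabulary (hypothetical structure).

    Assumption: each vocabulary entry arrives pre-tokenized into wordpieces.
    """

    def __init__(self, vocabulary: Sequence[Sequence[str]]) -> None:
        self.root = TrieNode()
        for pieces in vocabulary:
            node = self.root
            for piece in pieces:
                node = node.children.setdefault(piece, TrieNode())

    def continuations(self, prefix: Sequence[str]) -> List[str]:
        """Wordpieces that may follow `prefix` inside some custom word.

        An empty prefix yields the first wordpieces of all custom words;
        a prefix that falls off the trie yields an empty list.
        """
        node = self.root
        for piece in prefix:
            child = node.children.get(piece)
            if child is None:
                return []
            node = child
        return sorted(node.children)


def bias_embedding(trie: WordpieceTrie, prefix: Sequence[str],
                   embed: Dict[str, List[float]], dim: int) -> List[float]:
    """Assumed bias embedding: mean of the allowed continuations' embeddings.

    Returns a zero vector when the trie offers no continuation, so decoding
    falls back to unbiased behavior.
    """
    cands = trie.continuations(prefix)
    if not cands:
        return [0.0] * dim
    return [sum(embed[c][i] for c in cands) / len(cands) for i in range(dim)]


def fuse(first: Dict[str, float], second: Dict[str, float],
         weight: float = 0.5) -> str:
    """Pick the token maximizing log p1 + weight * log p2.

    This is a log-linear (shallow-fusion-style) combination chosen for
    illustration; tokens absent from `second` get a small floor probability.
    """
    floor = 1e-9

    def score(tok: str) -> float:
        return (math.log(max(first[tok], floor))
                + weight * math.log(max(second.get(tok, floor), floor)))

    return max(first, key=score)
```

As a usage example, a trie built from `[["_ka", "ra", "n"], ["_ka", "ti"]]` returns `["ra", "ti"]` when queried with the prefix `["_ka"]`. The bias embedding therefore summarizes only the wordpieces that would continue a custom word. Raising `weight` shifts the final choice toward the second distribution, and setting it to 0 reduces the rule to picking the most likely token from the first distribution alone.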
