JOINT END-TO-END SPOKEN LANGUAGE UNDERSTANDING AND AUTOMATIC SPEECH RECOGNITION

    Publication (Announcement) No.: US20250078824A1

    Publication (Announcement) Date: 2025-03-06

    Application No.: US18814275

    Filing Date: 2024-08-23

    Abstract: A method includes receiving an utterance from an audio input device. The method also includes determining a context associated with the utterance. The method also includes providing the utterance as an input to a joint model for automatic speech recognition (ASR) and spoken language understanding (SLU), wherein the joint model operates in a single mode to perform both ASR and SLU or a dual mode to perform one of ASR or SLU depending on the context. The method also includes using an output of the joint model to perform an action requested in the utterance. The joint model is trained by training a shared encoder and a shared decoder using a text-to-text task and, after training the shared encoder and the shared decoder, training a speech encoder and the shared encoder using a speech self-supervised learning (SSL) task and a text-to-text task with a masked prediction loss.
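The inference flow in the abstract (determine a context, pick a mode, run the joint model, use its output) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the application: the context keys `needs_transcript` and `needs_intent`, the `select_mode` policy, and the `stub_joint_model` placeholder, which returns fixed values in place of a trained model. The sketch only shows how a context could gate single-mode versus dual-mode operation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Mode(Enum):
    SINGLE = "single"  # one pass produces both ASR and SLU outputs
    DUAL = "dual"      # one pass produces only ASR or only SLU


class Task(Enum):
    ASR = "asr"
    SLU = "slu"


@dataclass
class JointOutput:
    transcript: Optional[str] = None   # ASR result, if requested
    semantics: Optional[dict] = None   # SLU result, if requested


def select_mode(context: dict) -> Tuple[Mode, Optional[Task]]:
    """Hypothetical policy mapping a context to a mode (assumed keys)."""
    wants_text = bool(context.get("needs_transcript"))
    wants_intent = bool(context.get("needs_intent"))
    if wants_text and wants_intent:
        return Mode.SINGLE, None
    if wants_intent:
        return Mode.DUAL, Task.SLU
    return Mode.DUAL, Task.ASR


def stub_joint_model(utterance: bytes, mode: Mode, task: Optional[Task]) -> JointOutput:
    """Placeholder for the trained joint model; returns fixed values."""
    transcript = "turn on the lights"
    semantics = {"intent": "lights_on"}
    if mode is Mode.SINGLE:
        return JointOutput(transcript=transcript, semantics=semantics)
    if task is Task.ASR:
        return JointOutput(transcript=transcript)
    return JointOutput(semantics=semantics)


def handle_utterance(
    utterance: bytes,
    context: dict,
    model: Callable[[bytes, Mode, Optional[Task]], JointOutput] = stub_joint_model,
) -> JointOutput:
    """Choose a mode from the context, then run the joint model once."""
    mode, task = select_mode(context)
    return model(utterance, mode, task)
```

A caller would pass the returned `JointOutput` to whatever component performs the requested action; that step is left out because the abstract does not say how the action is carried out.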
