DIALOGUE SYSTEM AND CONTROL METHOD THEREOF
    Published application

    Publication number: US20230317080A1

    Publication date: 2023-10-05

    Application number: US18081201

    Filing date: 2022-12-14

    Inventor: In Jik Lee

    Abstract: A dialogue system includes a voice recognition module that executes voice recognition, a storage device that stores the result of the executed voice recognition, and a controller. When an external event occurs while a user's utterance is being input, the controller determines the priorities of the external event and of the voice recognition. If the controller concludes that the external event has the higher priority, it pauses the voice recognition and stores, in the storage device, the recognition result for the part of the utterance that was input before the pause.
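
    The pause-and-store behaviour described in this abstract can be sketched as a small controller. This is a minimal illustration under stated assumptions, not the claimed implementation: the numeric priority scale, the constant VOICE_RECOGNITION_PRIORITY, the class names, and the resume() helper are all hypothetical, and a Python list stands in for the storage device.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical priority scale: a larger number means a higher priority.
VOICE_RECOGNITION_PRIORITY = 5


@dataclass
class ExternalEvent:
    name: str
    priority: int


@dataclass
class DialogueController:
    # A plain list stands in for the storage device (assumption).
    storage: List[str] = field(default_factory=list)
    partial_result: str = ""
    paused: bool = False

    def on_recognized_words(self, words: str) -> None:
        # Accumulate recognition output only while recognition is running.
        if not self.paused:
            self.partial_result = f"{self.partial_result} {words}".strip()

    def on_external_event(self, event: ExternalEvent) -> bool:
        """Return True if voice recognition was paused in favour of the event."""
        if event.priority > VOICE_RECOGNITION_PRIORITY:
            self.paused = True
            # Persist what was recognized before the pause so it is not lost.
            self.storage.append(self.partial_result)
            return True
        return False

    def resume(self) -> Optional[str]:
        # Hypothetical helper: restart recognition and hand back the stored result.
        self.paused = False
        return self.storage.pop() if self.storage else None
```

    For example, after on_recognized_words("navigate to"), an incoming-call event with priority 8 pauses recognition and leaves "navigate to" in storage, while an event with priority 1 is ignored.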

    GUIDANCE QUERY FOR CACHE SYSTEM
    Published application

    Publication number: US20230223027A1

    Publication date: 2023-07-13

    Application number: US18184783

    Filing date: 2023-03-16

    CPC classification number: G06F16/634 G06F16/636 G10L15/285 G10L15/32

    Abstract: A device may be configured to determine whether an audio file is a first type of audio file that is capable of being processed to recognize the voice query based on a characteristic of the audio file itself or a second type of audio file that may require speech recognition processing in order to recognize the voice query associated with the audio file. In determining whether the audio file is a first type of audio file or a second type of audio file, a query filter associated with the device may be configured to access one or more guidance queries. Using the one or more guidance queries, the device may classify the audio file as a first type of audio file or a second type of audio file based on receiving only a portion of the audio file, thereby improving the speed at which the audio file can be processed.
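
    The early classification step can be sketched as a lookup keyed on a characteristic of the first part of the audio. This is a hedged sketch, not the patented method: it assumes the "characteristic of the audio file itself" is a SHA-256 hash of the leading bytes, and the names PREFIX_BYTES, QueryFilter, and AudioType are hypothetical.

```python
import hashlib
from enum import Enum
from typing import Dict

# Hypothetical: how many leading bytes the query filter inspects.
PREFIX_BYTES = 4096


class AudioType(Enum):
    FIRST = 1   # recognizable from a characteristic of the audio itself
    SECOND = 2  # needs full speech recognition processing


def fingerprint(prefix: bytes) -> str:
    # Assumed stand-in for the audio characteristic: a hash of the leading bytes.
    return hashlib.sha256(prefix).hexdigest()


class QueryFilter:
    def __init__(self) -> None:
        # Guidance queries: fingerprint of known audio -> recognized query text.
        self.guidance: Dict[str, str] = {}

    def add_guidance_query(self, audio: bytes, text: str) -> None:
        self.guidance[fingerprint(audio[:PREFIX_BYTES])] = text

    def classify(self, partial_audio: bytes) -> AudioType:
        # Only the first PREFIX_BYTES are hashed, so classification can run
        # before the rest of the file has arrived.
        key = fingerprint(partial_audio[:PREFIX_BYTES])
        return AudioType.FIRST if key in self.guidance else AudioType.SECOND
```

    A cache hit (FIRST) lets the device answer from the stored query text; a miss (SECOND) falls back to ordinary speech recognition.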

    FUSED ACOUSTIC AND TEXT ENCODING FOR MULTIMODAL BILINGUAL PRETRAINING AND SPEECH TRANSLATION

    Publication number: US20230169281A1

    Publication date: 2023-06-01

    Application number: US17533687

    Filing date: 2021-11-23

    Applicant: Baidu USA, LLC

    CPC classification number: G06F40/58 G10L15/28 G10L15/063

    Abstract: Representation learning for text and speech has improved many language-related tasks. However, existing methods only learn from one input modality, while a unified representation for both speech and text is needed for tasks such as end-to-end speech translation. Consequently, these methods cannot exploit various large-scale text and speech data and their performance is limited by the scarcity of parallel speech translation data. To address these problems, embodiments of a fused acoustic and text masked language model (FAT-MLM) are disclosed. FAT-MLM embodiments jointly learn a unified representation for both acoustic and text input from various types of corpora including parallel data for speech recognition and machine translation, and pure speech and text data. Within this cross-modal representation learning framework, an end-to-end model is further presented for fused acoustic and text speech translation. Experiments show that by fine-tuning from FAT-MLM, the speech translation model embodiments substantially improve translation quality.
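
    The abstract does not specify FAT-MLM's masking scheme, model, or losses. The sketch below shows only the generic idea behind a fused masked language model: one sequence holds both acoustic frames and text tokens, and some positions in each modality are masked and kept as reconstruction targets. The per-item masking probability, zeroed frames, and MASK_TOKEN_ID are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple, Union

MASK_TOKEN_ID = -1  # hypothetical placeholder id for a masked text token

Frame = List[float]
Item = Tuple[str, Union[Frame, int]]


def build_fused_masked_input(
    frames: Sequence[Frame],
    tokens: Sequence[int],
    mask_prob: float,
    rng: random.Random,
) -> Tuple[List[Item], List[Tuple[int, Union[Frame, int]]]]:
    """Concatenate speech frames and text tokens into one sequence and mask both.

    Returns the fused, partly masked sequence and a list of
    (position, original_value) reconstruction targets.
    """
    fused: List[Item] = []
    targets: List[Tuple[int, Union[Frame, int]]] = []
    for frame in frames:
        if rng.random() < mask_prob:
            targets.append((len(fused), list(frame)))
            fused.append(("speech", [0.0] * len(frame)))  # zero out a masked frame
        else:
            fused.append(("speech", list(frame)))
    for tok in tokens:
        if rng.random() < mask_prob:
            targets.append((len(fused), tok))
            fused.append(("text", MASK_TOKEN_ID))
        else:
            fused.append(("text", tok))
    return fused, targets
```

    A model trained to predict the targets from this fused sequence must use context from both modalities, which is the intuition behind a unified speech and text representation.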

    Guidance query for cache system
    Granted patent

    Publication number: US11609947B2

    Publication date: 2023-03-21

    Application number: US16659262

    Filing date: 2019-10-21

    Abstract: A device may be configured to determine whether an audio file is a first type of audio file that is capable of being processed to recognize the voice query based on a characteristic of the audio file itself or a second type of audio file that may require speech recognition processing in order to recognize the voice query associated with the audio file. In determining whether the audio file is a first type of audio file or a second type of audio file, a query filter associated with the device may be configured to access one or more guidance queries. Using the one or more guidance queries, the device may classify the audio file as a first type of audio file or a second type of audio file based on receiving only a portion of the audio file, thereby improving the speed at which the audio file can be processed.

    Hotword detection on multiple devices

    Publication number: US11557299B2

    Publication date: 2023-01-17

    Application number: US17137157

    Filing date: 2020-12-29

    Applicant: Google LLC

    Inventor: Matthew Sharifi

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword detection on multiple devices are disclosed. In one aspect, a method includes the actions of receiving, by a first computing device, audio data that corresponds to an utterance. The actions further include determining a first value corresponding to a likelihood that the utterance includes a hotword. The actions further include receiving a second value corresponding to a likelihood that the utterance includes the hotword, the second value being determined by a second computing device. The actions further include comparing the first value and the second value and, based on that comparison, initiating speech recognition processing on the audio data.
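
    The comparison step can be sketched as a decision function run on the first device once it has received the second device's value. This is an illustrative sketch only: the threshold HOTWORD_THRESHOLD, the "higher likelihood wins" rule, and the tie-break on device id are assumptions, and how the two devices exchange their values is not shown.

```python
from dataclasses import dataclass

# Hypothetical minimum likelihood for treating the hotword as present at all.
HOTWORD_THRESHOLD = 0.5


@dataclass
class HotwordScore:
    device_id: str
    likelihood: float  # value in [0, 1] that the utterance includes the hotword


def should_start_recognition(local: HotwordScore, remote: HotwordScore) -> bool:
    """Decide whether this (first) device should run speech recognition."""
    if local.likelihood < HOTWORD_THRESHOLD:
        return False
    if local.likelihood != remote.likelihood:
        # The device that heard the hotword more clearly responds.
        return local.likelihood > remote.likelihood
    # Assumed tie-break so that exactly one device responds.
    return local.device_id < remote.device_id
```

    With a phone scoring 0.9 and a speaker scoring 0.7, only the phone starts speech recognition, which avoids both devices answering the same utterance.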
