Concurrent multi-path processing of audio signals for automatic speech recognition systems

    Publication No.: US12080274B2

    Publication Date: 2024-09-03

    Application No.: US17433868

    Filing Date: 2019-02-28

    Abstract: A system and method for concurrent multi-path processing of audio signals for automatic speech recognition is presented. Audio information defining a set of audio signals may be obtained (502). The audio signals may convey mixed audio content produced by multiple audio sources. A set of source-specific audio signals may be determined by demixing the mixed audio content produced by the multiple audio sources. Determining the set of source-specific audio signals may comprise providing the set of audio signals to both a first signal processing path and a second signal processing path (504). The first signal processing path may determine a value of a demixing parameter for demixing the mixed audio content (506). The second signal processing path may apply the value of the demixing parameter to the individual audio signals of the set of audio signals (508) to generate the individual source-specific audio signals (510).
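    The two-path structure in the abstract can be sketched in a few lines: one path estimates a demixing parameter from the mixed signals, while the other path applies the current parameter value to the same signals. The sketch below is a toy illustration only, not the patented method; the whitening-based demixing matrix stands in for whatever demixing parameter the claims actually cover, and the function names are hypothetical.

    ```python
    import numpy as np

    def estimate_demixing_matrix(block: np.ndarray) -> np.ndarray:
        """First path: estimate a demixing parameter from a block of
        mixed multi-channel audio. Here a whitening matrix serves as a
        simple stand-in for the patent's demixing parameter."""
        # block: (channels, samples)
        cov = block @ block.T / block.shape[1]
        eigvals, eigvecs = np.linalg.eigh(cov)
        # Whitening decorrelates the channels (a simplified demixing proxy).
        return eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-12)) @ eigvecs.T

    def apply_demixing(frames: np.ndarray, w: np.ndarray) -> np.ndarray:
        """Second path: apply the current demixing parameter value to the
        audio signals to produce source-specific signals."""
        return w @ frames

    # Toy mixture: two channels observing two sources through a mixing matrix.
    rng = np.random.default_rng(0)
    sources = rng.standard_normal((2, 1600))
    mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
    mixed = mixing @ sources                     # mixed audio (2 channels)

    w = estimate_demixing_matrix(mixed)          # first path (could run concurrently)
    demixed = apply_demixing(mixed, w)           # second path uses latest parameter
    print(demixed.shape)                         # (2, 1600)
    ```

    In a streaming system the two paths would run concurrently: the estimation path updates `w` on buffered blocks while the application path applies the most recent `w` to each incoming frame, which is the latency advantage the multi-path arrangement suggests.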

    Systems and methods for processing speech dialogues

    Publication No.: US11862143B2

    Publication Date: 2024-01-02

    Application No.: US16996961

    Filing Date: 2020-08-19

    Inventors: Haiyang Xu; Kun Han

    Abstract: The present disclosure is related to systems and methods for processing speech dialogue. The method includes obtaining target speech dialogue data. The method includes obtaining a text vector representation sequence, a phonetic symbol vector representation sequence, and a role vector representation sequence by performing a vector transformation on the target speech dialogue data based on a text embedding model, a phonetic symbol embedding model, and a role embedding model, respectively. The method includes determining a representation vector corresponding to the target speech dialogue data by inputting the text vector representation sequence, the phonetic symbol vector representation sequence, and the role vector representation sequence into a trained speech dialogue coding model. The method includes determining a summary of the target speech dialogue data by inputting the representation vector into a classification model.
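    The pipeline in the abstract (three embedding models, a coding model, a classification model) can be sketched as below. This is a minimal stand-in, not the patented models: the random embedding tables, the mean-pooling "coding model", and the linear classifier head are all hypothetical placeholders showing only how the three vector sequences flow into one representation vector and then a classification decision.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical embedding tables standing in for the patent's text,
    # phonetic-symbol, and role embedding models (id -> vector lookups).
    DIM = 8
    text_emb = rng.standard_normal((100, DIM))   # text token ids
    phone_emb = rng.standard_normal((50, DIM))   # phonetic symbol ids
    role_emb = rng.standard_normal((4, DIM))     # speaker role ids

    def encode(tokens, phones, roles):
        """Toy 'speech dialogue coding model': concatenate the three
        embedding sequences position-wise, then mean-pool into a single
        representation vector for the dialogue."""
        seq = np.concatenate(
            [text_emb[tokens], phone_emb[phones], role_emb[roles]], axis=1
        )                                        # (seq_len, 3 * DIM)
        return seq.mean(axis=0)                  # representation vector

    def classify(rep, weights, bias):
        """Toy classification head over the representation vector."""
        return int(np.argmax(weights @ rep + bias))

    tokens = np.array([3, 17, 42])               # aligned per-position ids
    phones = np.array([5, 9, 1])
    roles = np.array([0, 0, 1])                  # e.g., two speaker roles

    rep = encode(tokens, phones, roles)
    weights = rng.standard_normal((2, 3 * DIM))
    bias = np.zeros(2)
    label = classify(rep, weights, bias)
    print(rep.shape, label)
    ```

    In the actual method the coding model is trained (the abstract says a "trained speech dialogue coding model"), and the classifier presumably scores dialogue segments for inclusion in the summary; the sketch keeps only the data flow.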