HIERARCHICAL AUDIO-VISUAL FEATURE FUSING METHOD FOR AUDIO-VISUAL QUESTION ANSWERING AND PRODUCT

    Publication number: US20240046628A1

    Publication date: 2024-02-08

    Application number: US18144274

    Application date: 2023-05-08

    CPC classification number: G06V10/806

    Abstract: A hierarchical audio-visual feature fusing method for audio-visual question answering, and a related product, in the field of audio-visual question answering. In a hierarchical feature fusing process, the audio embedding of an input video clip is fused with a baseline model as well as with the video embedding and the question embedding at an early stage, a middle stage and a late stage respectively, yielding a first, a second and a third answer probability distribution. The three answer probability distributions are added based on preset weights and then averaged, and this hierarchical integration generates the final answer.
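The integration step described in the abstract (adding the three answer probability distributions by preset weights, then averaging) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the function name `integrate_answers`, the default weights, the example distributions, and the choice of the highest-scoring candidate as the final answer are all assumptions made for illustration.

```python
def integrate_answers(p_early, p_middle, p_late, weights=(1.0, 1.0, 1.0)):
    """Combine three answer distributions by weighted sum, then average.

    p_early, p_middle, p_late: per-answer probabilities from the early,
    middle and late fusion stages (equal-length sequences of floats).
    weights: preset per-stage weights (illustrative values, an assumption).
    Returns (index of the best answer, combined scores).
    """
    dists = (p_early, p_middle, p_late)
    n_answers = len(p_early)
    if any(len(d) != n_answers for d in dists):
        raise ValueError("all distributions must cover the same answers")
    if len(weights) != len(dists):
        raise ValueError("one weight per fusion stage is required")

    # Weighted sum over the three stages, then divide by the stage count.
    combined = [
        sum(w * d[i] for w, d in zip(weights, dists)) / len(dists)
        for i in range(n_answers)
    ]

    # Assumption: the final answer is the candidate with the highest score.
    best = max(range(n_answers), key=combined.__getitem__)
    return best, combined


if __name__ == "__main__":
    # Made-up distributions over three candidate answers.
    early = [0.6, 0.3, 0.1]
    middle = [0.2, 0.5, 0.3]
    late = [0.1, 0.7, 0.2]
    print(integrate_answers(early, middle, late))
    print(integrate_answers(early, middle, late, weights=(5.0, 1.0, 1.0)))
```

With equal weights the combined score is the plain mean of the three stages. Raising one stage's weight lets that stage dominate the final choice, which is the effect preset weights would have under these assumptions.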
