1.
Publication No.: US20240046628A1
Publication Date: 2024-02-08
Application No.: US18144274
Filing Date: 2023-05-08
Applicant: Tsinghua University
Inventor: Wenwu ZHU, Xin WANG, Pinci YANG
IPC: G06V10/80
CPC classification number: G06V10/806
Abstract: A hierarchical audio-visual feature fusing method for audio-visual question answering, and a related product, in the field of audio-visual question answering. The audio embedding of an input video clip is fused with a baseline model, and with the video embedding and the question embedding, at an early stage, a middle stage, and a late stage of a hierarchical feature fusing process, respectively, to obtain a first, a second, and a third answer probability distribution. The three answer probability distributions are added based on preset weights and then averaged for hierarchical integration to generate a final answer.
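The integration step described in the abstract (adding the three answer probability distributions based on preset weights, then averaging) can be illustrated with a minimal sketch. This is not the patent's implementation: the function names `integrate_answers` and `pick_answer`, the candidate answers, the probability values, and the weights are all hypothetical, and taking the highest-scoring candidate as the final answer is an assumption the abstract does not spell out.

```python
from typing import List, Sequence


def integrate_answers(
    distributions: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Add the per-stage distributions using the given weights, then
    divide by the number of stages (the averaging step)."""
    if not distributions:
        raise ValueError("at least one distribution is required")
    if len(distributions) != len(weights):
        raise ValueError("need exactly one weight per distribution")
    num_answers = len(distributions[0])
    if any(len(d) != num_answers for d in distributions):
        raise ValueError("all distributions must cover the same answers")
    num_stages = len(distributions)
    return [
        sum(w * d[k] for w, d in zip(weights, distributions)) / num_stages
        for k in range(num_answers)
    ]


def pick_answer(scores: Sequence[float], candidates: Sequence[str]) -> str:
    """Assumed decision rule: return the candidate with the highest score."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return candidates[best]


# Hypothetical values for illustration only.
candidates = ["one", "two", "three"]
early = [0.6, 0.3, 0.1]    # first answer probability distribution
middle = [0.2, 0.5, 0.3]   # second answer probability distribution
late = [0.1, 0.7, 0.2]     # third answer probability distribution
preset_weights = [1.0, 1.0, 1.0]

scores = integrate_answers([early, middle, late], preset_weights)
print(pick_answer(scores, candidates))
```

Dividing by the number of stages does not change which candidate scores highest. When the weights do not sum to the number of stages, the result is no longer a normalized probability distribution, which is harmless if only the top-scoring answer is needed.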