END-TO-END MULTITASK VIDEO RETRIEVAL WITH CROSS-ATTENTION

    Publication (Announcement) No.: US20250139971A1

    Publication (Announcement) Date: 2025-05-01

    Application No.: US18410363

    Application Date: 2024-01-11

    Abstract: A method includes obtaining a video and a relational space-time query and identifying at least one type of the relational space-time query. The at least one identified type of the relational space-time query represents at least one of: an activity type, an object type, or a time type. The method also includes learning correlations among activities, objects, and time in the video using one or more cross-attention models. The method further includes obtaining one or more predictions generated using one or more outputs of the one or more cross-attention models based on the at least one identified type of the relational space-time query. In addition, the method includes generating a response to the relational space-time query based on the one or more predictions.
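The abstract does not disclose model details, so the following is only a rough, non-authoritative sketch of the pipeline it describes: identify the query type(s), relate objects and time through cross-attention, run only the prediction heads that match the identified types, and compose a response. Everything here is an assumption and not taken from the patent: the keyword-based type detector, the toy 2-D feature vectors, the prototype-matching activity head, and the names `identify_query_types`, `cross_attention`, `predict`, and `generate_response` are all hypothetical. A real system would learn the features, the type classifier, and the heads end to end.

```python
import math
import re
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]

# Hypothetical keyword lists standing in for a learned query-type classifier.
TYPE_KEYWORDS: Dict[str, Set[str]] = {
    "activity": {"doing", "happen", "happens", "action"},
    "object": {"what", "which", "object", "holding"},
    "time": {"when", "before", "after", "during"},
}


def identify_query_types(query: str) -> Set[str]:
    """Return the subset of {activity, object, time} that the query asks about."""
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    return {qtype for qtype, words in TYPE_KEYWORDS.items() if tokens & words}


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Sequence[Vector], keys: Sequence[Vector],
                    values: Sequence[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product cross-attention: each query vector attends over keys/values
    taken from another stream. Returns (attended outputs, attention weights)."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))])
        weights.append(w)
    return outputs, weights


def predict(query_types, query_vec, frames, objects, object_labels, activity_protos):
    """Run only the (hypothetical) prediction heads that match the identified query types."""
    # Frames attend over objects, so each time step carries object evidence.
    fused_frames, _ = cross_attention(frames, objects, objects)
    # The query attends over the fused frames (time, activity) and over the objects.
    (attended,), (time_w,) = cross_attention([query_vec], fused_frames, fused_frames)
    _, (obj_w,) = cross_attention([query_vec], objects, objects)
    preds = {}
    if "activity" in query_types:
        # Hypothetical head: the activity prototype with the largest dot product wins.
        preds["activity"] = max(activity_protos, key=lambda lbl: dot(attended, activity_protos[lbl]))
    if "object" in query_types:
        preds["object"] = object_labels[argmax(obj_w)]
    if "time" in query_types:
        preds["time"] = argmax(time_w)  # index of the most-attended frame
    return preds


def generate_response(preds: Dict[str, object]) -> str:
    parts = []
    if "activity" in preds:
        parts.append(f"activity: {preds['activity']}")
    if "object" in preds:
        parts.append(f"object: {preds['object']}")
    if "time" in preds:
        parts.append(f"frame: {preds['time']}")
    return "; ".join(parts)


if __name__ == "__main__":
    types = identify_query_types("What is the person doing with the phone, and when?")
    preds = predict(
        types,
        query_vec=[0.0, 3.0],
        frames=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        objects=[[1.0, 0.0], [0.0, 1.0]],
        object_labels=["cup", "phone"],
        activity_protos={"talking": [0.0, 1.0], "drinking": [1.0, 0.0]},
    )
    print(generate_response(preds))  # prints "activity: talking; object: phone; frame: 1"
```

In this toy setup the query attends over object features to pick the referenced object, and over object-fused frame features to pick both a frame index and an activity label. Only the heads matching the detected query types contribute to the response, which mirrors the abstract's "predictions ... based on the at least one identified type" step.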
