MULTI-MODAL UNDERSTANDING OF EMOTIONS IN VIDEO CONTENT

    Publication No.: US20240169711A1

    Publication Date: 2024-05-23

    Application No.: US18057643

    Application Date: 2022-11-21

    CPC classification number: G06V10/80 G06V40/168 G06V40/174

    Abstract: A method includes obtaining a video sequence having multiple video frames and audio data. The method also includes extracting video features associated with at least one face in the video frames and audio features associated with the audio data. The method further includes processing the video features and the audio features using a trained machine learning model. The trained machine learning model performs a multi-tiered fusion of the video features and different subsets of the audio features in order to identify at least one emotion expressed by at least one person in the video sequence. The multi-tiered fusion of the video features and the audio features may include (i) a first fusion of the video features and a first subset of the audio features and (ii) a second fusion of processed features and a second subset of the audio features, where the processed features are based on the first fusion.
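The two-tier fusion described in this abstract can be illustrated with a minimal Python sketch. It is not the patented model. The following are all hypothetical choices made only for illustration: concatenation as the fusion operator, a tanh squash standing in for the learned processing between the tiers, the linear scoring weights, and the emotion label set (EMOTIONS). The sketch shows only the data flow: video features are fused with a first audio subset, the result is processed, and the processed features are then fused with a second audio subset.

```python
import math
from typing import List, Sequence

# Hypothetical emotion label set; the abstract does not enumerate labels.
EMOTIONS = ["neutral", "happy", "sad", "angry"]


def concat_fuse(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Fuse two feature vectors by concatenation (one simple fusion choice)."""
    return list(a) + list(b)


def process(features: Sequence[float]) -> List[float]:
    """Stand-in for a learned layer: squash each value with tanh."""
    return [math.tanh(x) for x in features]


def multi_tier_fusion(video: Sequence[float],
                      audio_subset_1: Sequence[float],
                      audio_subset_2: Sequence[float]) -> List[float]:
    first = concat_fuse(video, audio_subset_1)        # tier 1: video + first audio subset
    processed = process(first)                        # processed features based on tier 1
    return concat_fuse(processed, audio_subset_2)     # tier 2: processed + second audio subset


def classify(fused: Sequence[float], weights: Sequence[Sequence[float]]) -> str:
    """Score each label with a hypothetical linear layer and return the best one."""
    scores = [sum(w * x for w, x in zip(row, fused)) for row in weights]
    best = max(range(len(scores)), key=scores.__getitem__)
    return EMOTIONS[best]


video = [0.5, -0.2]
fused = multi_tier_fusion(video, [0.1], [0.9, 0.3])
weights = [
    [0, 0, 0, 0, 0],   # neutral
    [0, 0, 0, 1, 0],   # happy: reacts to the first value of audio subset 2
    [0, 0, 0, 0, 1],   # sad
    [-1, 0, 0, 0, 0],  # angry
]
label = classify(fused, weights)
```

In a trained system, both the fusion operator and the processing step would be learned layers. The sketch fixes them only so that the two tiers are easy to trace.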

    SYSTEM AND METHOD FOR ENHANCING MACHINE LEARNING MODEL FOR AUDIO/VIDEO UNDERSTANDING USING GATED MULTI-LEVEL ATTENTION AND TEMPORAL ADVERSARIAL TRAINING

    Publication No.: US20220300740A1

    Publication Date: 2022-09-22

    Application No.: US17387889

    Application Date: 2021-07-28

    Abstract: A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.
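The global/local combination described in this abstract can be illustrated with a minimal Python sketch. It is not the patented model, and every concrete choice here is an assumption. The global representation is the mean over all frame features. Local representations are means over fixed-size windows (the window size is hypothetical). Attention uses the global vector as a dot-product query over the local vectors. The two are combined with a fixed scalar gate, loosely inspired by the word "gated" in the title, though the patent's actual gating is not described here. The classifier weights and labels are also hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_representation(frames: Sequence[Sequence[float]]) -> Vector:
    """Summarize the whole clip (assumed: mean over all frames)."""
    return mean(frames)


def local_representations(frames: Sequence[Sequence[float]], window: int) -> List[Vector]:
    """Summarize different portions of the clip (assumed: fixed-size windows)."""
    return [mean(frames[i:i + window]) for i in range(0, len(frames), window)]


def attend(query: Sequence[float], keys: Sequence[Sequence[float]]) -> Vector:
    """Dot-product attention: weight each local vector by its similarity to the query."""
    weights = softmax([dot(query, k) for k in keys])
    return [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(len(query))]


def output_representation(frames, window: int = 2, gate: float = 0.5) -> Vector:
    g = global_representation(frames)
    local = attend(g, local_representations(frames, window))
    # Hypothetical fixed gate mixing the global and attended local representations.
    return [gate * gi + (1 - gate) * li for gi, li in zip(g, local)]


def classify(rep: Sequence[float], weights: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    scores = [dot(row, rep) for row in weights]
    return labels[max(range(len(scores)), key=scores.__getitem__)]


frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
rep = output_representation(frames, window=2, gate=0.5)
label = classify(rep, [[1.0, 0.0], [0.0, 2.0]], ["speech", "music"])  # hypothetical labels
```

In this sketch, the gate is a constant. A learned model would more plausibly compute it from the representations themselves.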

    SYSTEM AND METHOD FOR ENHANCING MACHINE LEARNING MODEL FOR AUDIO/VIDEO UNDERSTANDING USING GATED MULTI-LEVEL ATTENTION AND TEMPORAL ADVERSARIAL TRAINING

    Publication No.: US11989939B2

    Publication Date: 2024-05-21

    Application No.: US17387889

    Application Date: 2021-07-28

    CPC classification number: G06V20/41 G06F18/214

    Abstract: A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.
