Invention Publication
- Patent Title: MULTI-MODAL UNDERSTANDING OF EMOTIONS IN VIDEO CONTENT
- Application No.: US18057643
- Application Date: 2022-11-21
- Publication No.: US20240169711A1
- Publication Date: 2024-05-23
- Inventor: Divya Choudhary, Palash Goyal
- Applicant: Samsung Electronics Co., Ltd.
- Applicant Address: KR Suwon-si, Gyeonggi-do
- Assignee: Samsung Electronics Co., Ltd.
- Current Assignee: Samsung Electronics Co., Ltd.
- Current Assignee Address: KR Suwon-si, Gyeonggi-do
- Main IPC: G06V10/80
- IPC: G06V10/80; G06V40/16

Abstract:
A method includes obtaining a video sequence having multiple video frames and audio data. The method also includes extracting video features associated with at least one face in the video frames and audio features associated with the audio data. The method further includes processing the video features and the audio features using a trained machine learning model. The trained machine learning model performs a multi-tiered fusion of the video features and different subsets of the audio features in order to identify at least one emotion expressed by at least one person in the video sequence. The multi-tiered fusion of the video features and the audio features may include (i) a first fusion of the video features and a first subset of the audio features and (ii) a second fusion of processed features and a second subset of the audio features, where the processed features are based on the first fusion.
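
The two-stage fusion described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the publication: fusion is modeled as plain concatenation, the processing between tiers is a single linear layer with ReLU, and the weights, feature sizes, and emotion labels are hypothetical placeholders that a trained model would supply instead.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def fuse(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Fuse two feature vectors by concatenation (assumed fusion operator)."""
    return list(a) + list(b)


def linear(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> Vector:
    """Apply one linear layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> Vector:
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in x]


def softmax(z: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def multi_tier_fusion(
    video: Sequence[float],
    audio_first: Sequence[float],
    audio_second: Sequence[float],
    w1: Matrix, b1: Sequence[float],
    w2: Matrix, b2: Sequence[float],
    labels: Sequence[str],
) -> Tuple[str, Vector]:
    # Tier 1: fuse video features with the first subset of audio features.
    first = fuse(video, audio_first)
    # Processed features are based on the first fusion.
    processed = relu(linear(first, w1, b1))
    # Tier 2: fuse processed features with the second subset of audio features.
    second = fuse(processed, audio_second)
    # Score each candidate emotion and pick the most probable one.
    probs = softmax(linear(second, w2, b2))
    return labels[probs.index(max(probs))], probs


# Hypothetical toy inputs; real features would come from face and audio extractors.
video = [1.0, 0.0]
audio_first = [0.5]
audio_second = [1.0]
w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # placeholder weights, not trained
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]  # placeholder weights, not trained
b2 = [0.0, 0.0]
labels = ["happy", "sad"]  # hypothetical emotion classes

label, probs = multi_tier_fusion(
    video, audio_first, audio_second, w1, b1, w2, b2, labels
)
print(label, probs)
```

The point of the sketch is the ordering: the second subset of audio features enters only after the first fusion has been processed, so it is combined with a joint video and audio representation rather than with the raw video features.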