-
Publication No.: US20240169711A1
Publication Date: 2024-05-23
Application No.: US18057643
Filing Date: 2022-11-21
Applicant: Samsung Electronics Co., Ltd.
Inventor: Divya Choudhary, Palash Goyal
CPC classification number: G06V10/80, G06V40/168, G06V40/174
Abstract: A method includes obtaining a video sequence having multiple video frames and audio data. The method also includes extracting video features associated with at least one face in the video frames and audio features associated with the audio data. The method further includes processing the video features and the audio features using a trained machine learning model. The trained machine learning model performs a multi-tiered fusion of the video features and different subsets of the audio features in order to identify at least one emotion expressed by at least one person in the video sequence. The multi-tiered fusion of the video features and the audio features may include (i) a first fusion of the video features and a first subset of the audio features and (ii) a second fusion of processed features and a second subset of the audio features, where the processed features are based on the first fusion.
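The two-tier fusion described in this abstract can be illustrated with a minimal sketch. The abstract does not state the fusion operator, the layer types, or which audio features belong to each subset, so everything here is an assumption: fusion is plain concatenation, each tier is a single dense layer, the weights are random and untrained, and the names `multi_tier_fusion`, `EMOTIONS`, `audio_a`, and `audio_b` are hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Random (untrained) weights and zero bias, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def multi_tier_fusion(video, audio_a, audio_b, tier1, tier2):
    # Tier 1: fuse video features with the first audio subset (assumed: concatenation).
    processed = relu(linear(video + audio_a, *tier1))
    # Tier 2: fuse the processed features with the second audio subset.
    return softmax(linear(processed + audio_b, *tier2))


EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set

rng = random.Random(0)
video = [0.2, -0.1, 0.7]   # stand-in face features
audio_a = [0.5, 0.3]       # stand-in first audio subset
audio_b = [-0.4, 0.9]      # stand-in second audio subset

hidden = 4
tier1 = init_layer(rng, hidden, len(video) + len(audio_a))
tier2 = init_layer(rng, len(EMOTIONS), hidden + len(audio_b))

probs = multi_tier_fusion(video, audio_a, audio_b, tier1, tier2)
predicted = EMOTIONS[probs.index(max(probs))]
```

The point of the sketch is the data flow: the second audio subset enters only after the first fusion has been processed, so the second tier sees features that already combine video and audio.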
-
Publication No.: US20220300740A1
Publication Date: 2022-09-22
Application No.: US17387889
Filing Date: 2021-07-28
Applicant: Samsung Electronics Co., Ltd.
Inventor: Saurabh Sahu, Palash Goyal
Abstract: A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.
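As a rough illustration of combining a global and a local representation, the sketch below pools per-frame features with dot-product attention. The abstract does not specify how either representation is computed or how they are combined, so the attention pooling, the fixed-size windows, the concatenation step, and the names `attention_pool`, `global_local_representation`, `query`, and `window` are all assumptions.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Weighted average of frames, weighted by dot-product score against query."""
    scores = [sum(q * f for q, f in zip(query, frame)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


def global_local_representation(frames, query, window):
    dim = len(frames[0])
    # Global: attend over the whole sequence at once.
    global_rep = attention_pool(frames, query)
    # Local: attend within each window, then average the window summaries.
    chunks = [frames[i:i + window] for i in range(0, len(frames), window)]
    summaries = [attention_pool(chunk, query) for chunk in chunks]
    local_rep = [sum(s[d] for s in summaries) / len(summaries) for d in range(dim)]
    # Combine (assumed: concatenation) into the output representation.
    return global_rep + local_rep


frames = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]  # stand-in per-frame features
query = [1.0, -1.0]                                        # stand-in learned query vector
output_rep = global_local_representation(frames, query, window=2)
```

A classifier head would then operate on `output_rep`; that step is omitted because the abstract gives no detail about it.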
-
Publication No.: US11989939B2
Publication Date: 2024-05-21
Application No.: US17387889
Filing Date: 2021-07-28
Applicant: Samsung Electronics Co., Ltd.
Inventor: Saurabh Sahu, Palash Goyal
IPC: G06V20/40, G06F18/214
CPC classification number: G06V20/41, G06F18/214
Abstract: A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.
-
Publication No.: US20220245424A1
Publication Date: 2022-08-04
Application No.: US17368683
Filing Date: 2021-07-06
Applicant: Samsung Electronics Co., Ltd.
Inventor: Palash Goyal, Saurabh Sahu, Shalini Ghosh, Hyun Chul Lee
Abstract: A method includes accessing video data that includes at least two different modalities. The method also includes using a convolutional neural network layer to incorporate temporal coherence into a machine learning model architecture configured to process the video data. The method further includes learning dependency among the at least two different modalities in an attention space of the machine learning model architecture. In addition, the method includes predicting one or more correlations among the at least two different modalities.
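The two ideas in this abstract, a convolution over time followed by attention across modalities, can be sketched as below. This is not the claimed architecture: the smoothing kernel, the zero padding, the use of scaled dot-product attention, and the names `temporal_conv` and `cross_modal_attention` are assumptions chosen for illustration, and the feature values are made up.

```python
import math


def temporal_conv(seq, kernel):
    """1D convolution along time (zero-padded, odd-length kernel), applied per feature."""
    half = len(kernel) // 2
    steps, dim = len(seq), len(seq[0])
    out = []
    for t in range(steps):
        row = [0.0] * dim
        for k, w in enumerate(kernel):
            src = t + k - half
            if 0 <= src < steps:
                for d in range(dim):
                    row[d] += w * seq[src][d]
        out.append(row)
    return out


def cross_modal_attention(queries, keys):
    """Scaled dot-product attention weights.

    Row i describes how time step i of one modality attends to each
    time step of the other modality.
    """
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


kernel = [0.25, 0.5, 0.25]                                  # assumed smoothing kernel
audio = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]    # stand-in audio features
video = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]                # stand-in video features

# Temporal coherence first, then dependency between the modalities.
correlations = cross_modal_attention(
    temporal_conv(audio, kernel), temporal_conv(video, kernel)
)
```

Here `correlations` is a 4-by-3 matrix of attention weights, read as a soft alignment between audio steps and video steps. In a trained model the kernel and the query/key projections would be learned rather than fixed.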