Patent search ap:("Samsung Electronics Co. Ltd.") AND inv:"Palash Goyal"

1.

Invention publication
MULTI-MODAL UNDERSTANDING OF EMOTIONS IN VIDEO CONTENT (status: under examination, published)

Publication number: US20240169711A1

Publication date: 2024-05-23

Application number: US18057643

Filing date: 2022-11-21

Applicant: Samsung Electronics Co., Ltd.

Inventor: Divya Choudhary, Palash Goyal

IPC: G06V10/80, G06V40/16

CPC classification number: G06V10/80, G06V40/168, G06V40/174

Abstract: A method includes obtaining a video sequence having multiple video frames and audio data. The method also includes extracting video features associated with at least one face in the video frames and audio features associated with the audio data. The method further includes processing the video features and the audio features using a trained machine learning model. The trained machine learning model performs a multi-tiered fusion of the video features and different subsets of the audio features in order to identify at least one emotion expressed by at least one person in the video sequence. The multi-tiered fusion of the video features and the audio features may include (i) a first fusion of the video features and a first subset of the audio features and (ii) a second fusion of processed features and a second subset of the audio features, where the processed features are based on the first fusion.
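
The two-stage fusion described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not state layer types, feature sizes, or how the audio features are split, so the concatenation-based fusion, the toy dense layers with random weights, the emotion labels, and all function names below are assumptions made for illustration only.

```python
import math
import random


def linear(x, weights, bias):
    """Toy dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def random_layer(rng, n_in, n_out):
    """Hypothetical untrained layer; a real model would learn these weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def two_tier_fusion(video, audio_a, audio_b, layer1, layer2):
    # Tier 1: fuse video features with the first subset of audio features
    # (fusion by concatenation is an assumption).
    fused1 = video + audio_a
    processed = [math.tanh(v) for v in linear(fused1, *layer1)]
    # Tier 2: fuse the processed features with the second audio subset.
    fused2 = processed + audio_b
    return softmax(linear(fused2, *layer2))


EMOTIONS = ["happy", "sad", "angry"]  # hypothetical label set
rng = random.Random(0)
video = [0.2, -0.1, 0.4, 0.3]   # made-up face features
audio_a = [0.5, 0.1]            # made-up first audio subset
audio_b = [-0.3, 0.8, 0.0]      # made-up second audio subset
layer1 = random_layer(rng, len(video) + len(audio_a), 5)
layer2 = random_layer(rng, 5 + len(audio_b), len(EMOTIONS))

probs = two_tier_fusion(video, audio_a, audio_b, layer1, layer2)
predicted = EMOTIONS[probs.index(max(probs))]
print(predicted, probs)
```

The point of the sketch is the ordering: the second audio subset only enters after the first fusion has been processed, which is what distinguishes a tiered fusion from a single concatenation of all features.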

2.

Invention application
SYSTEM AND METHOD FOR ENHANCING MACHINE LEARNING MODEL FOR AUDIO/VIDEO UNDERSTANDING USING GATED MULTI-LEVEL ATTENTION AND TEMPORAL ADVERSARIAL TRAINING (status: in force)

Publication number: US20220300740A1

Publication date: 2022-09-22

Application number: US17387889

Filing date: 2021-07-28

Applicant: Samsung Electronics Co., Ltd.

Inventor: Saurabh Sahu, Palash Goyal

IPC: G06K9/00, G06K9/62

Abstract: A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.
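
As a rough illustration of combining a global and a local representation, the sketch below pools per-frame features with a simple dot-product attention. The abstract does not specify the attention form, the segment length, or the combination rule, so the fixed query vector, the fixed scalar gate, and every function name here are assumptions rather than the method claimed in the patent.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def attention_pool(vectors, query):
    """Weighted average of vectors, weighted by softmax(dot(vector, query))."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def combine_global_local(frames, query, segment_len, gate):
    # Global representation: attend over every frame at once.
    global_rep = attention_pool(frames, query)
    # Local representation: attend within each portion, then over the portions.
    segments = [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]
    segment_reps = [attention_pool(seg, query) for seg in segments]
    local_rep = attention_pool(segment_reps, query)
    # Gated combination; a trained model would learn the gate (assumption).
    return [gate * g + (1.0 - gate) * l for g, l in zip(global_rep, local_rep)]


# Made-up per-frame features (6 frames, 3 dimensions) and a made-up query.
frames = [
    [0.1, 0.9, 0.0],
    [0.2, 0.8, 0.1],
    [0.7, 0.1, 0.3],
    [0.6, 0.2, 0.4],
    [0.0, 0.5, 0.9],
    [0.1, 0.4, 0.8],
]
query = [1.0, 0.0, 0.5]

output_rep = combine_global_local(frames, query, segment_len=2, gate=0.5)
print(output_rep)
```

A classifier would then operate on `output_rep`; that step is omitted because the abstract says only that classification is based on the output representation.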

3.

Invention grant
System and method for enhancing machine learning model for audio/video understanding using gated multi-level attention and temporal adversarial training (status: in force)

Publication number: US11989939B2

Publication date: 2024-05-21

Application number: US17387889

Filing date: 2021-07-28

Applicant: Samsung Electronics Co., Ltd.

Inventor: Saurabh Sahu, Palash Goyal

IPC: G06V20/40, G06F18/214

CPC classification number: G06V20/41, G06F18/214

Abstract: A method includes obtaining, using at least one processor, audio/video content. The method also includes processing, using the at least one processor, the audio/video content with a trained attention-based machine learning model to classify the audio/video content. Processing the audio/video content includes, using the trained attention-based machine learning model, generating a global representation of the audio/video content based on the audio/video content, generating a local representation of the audio/video content based on different portions of the audio/video content, and combining the global representation of the audio/video content and the local representation of the audio/video content to generate an output representation of the audio/video content. The audio/video content is classified based on the output representation.

4.

Invention application
MICROGENRE-BASED HYPER-PERSONALIZATION WITH MULTI-MODAL MACHINE LEARNING (status: in force)

Publication number: US20220245424A1

Publication date: 2022-08-04

Application number: US17368683

Filing date: 2021-07-06

Applicant: Samsung Electronics Co., Ltd.

Inventor: Palash Goyal, Saurabh Sahu, Shalini Ghosh, Hyun Chul Lee

IPC: G06N3/04, G06N3/08

Abstract: A method includes accessing video data that includes at least two different modalities. The method also includes using a convolutional neural network layer to incorporate temporal coherence into a machine learning model architecture configured to process the video data. The method further includes learning dependency among the at least two different modalities in an attention space of the machine learning model architecture. In addition, the method includes predicting one or more correlations among the at least two different modalities.
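
The two ideas in this abstract, a convolution along time and attention across modalities, can be sketched as follows. This is a toy illustration under stated assumptions: the smoothing kernel, the edge padding, the scaled dot-product attention, the choice of video as queries and audio as keys, and all names are hypothetical, and the abstract does not say how the patented model implements either step.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def temporal_conv(seq, kernel):
    """1D convolution along the time axis, applied per feature dimension.

    Edges are padded by repeating the first/last time step (an assumption),
    so the output has the same number of time steps as the input.
    """
    half = len(kernel) // 2
    steps, dim = len(seq), len(seq[0])
    out = []
    for t in range(steps):
        row = [0.0] * dim
        for k, w in enumerate(kernel):
            src = min(max(t + k - half, 0), steps - 1)
            for d in range(dim):
                row[d] += w * seq[src][d]
        out.append(row)
    return out


def cross_modal_attention(queries, keys):
    """Row i holds how strongly query step i attends to each key step."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


# Made-up sequences: 4 video steps and 3 audio steps, 2 features each.
video = [[0.1, 0.3], [0.9, 0.2], [0.8, 0.1], [0.2, 0.4]]
audio = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
kernel = [0.25, 0.5, 0.25]  # hypothetical smoothing kernel

video_smooth = temporal_conv(video, kernel)
audio_smooth = temporal_conv(audio, kernel)
attention = cross_modal_attention(video_smooth, audio_smooth)
for row in attention:
    print(row)
```

Each row of `attention` can be read as a soft alignment between one video step and the audio steps, which is one simple way to expose correlations between modalities.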
