-
Publication No.: US11610610B1
Publication Date: 2023-03-21
Application No.: US17643805
Application Date: 2021-12-10
Applicant: Amazon Technologies, Inc.
Inventor: Avijit Vajpayee, Hooman Mahyar, Vimal Bhat, Abhinav Jain, Zhikang Zhang
Abstract: Systems and methods are provided for detecting and correcting synchronization errors in multimedia content comprising a video stream and a non-original audio stream. Techniques for directly detecting synchronization of video and audio streams may be inadequate to detect synchronization errors for non-original audio streams, particularly where such non-original audio streams contain audio not reflective of events within the video stream, such as speaking dialog in a different language than the speakers of the video stream. To overcome this problem, the present disclosure enables synchronization of a non-original audio stream to another audio stream, such as an original audio stream, that is synchronized to the video stream. By comparison of signatures, the non-original and other audio streams are aligned to determine an offset that can be used to synchronize the non-original audio stream to the video stream.
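Illustrative sketch (not taken from the patent): the abstract does not say what kind of signature is compared or how the alignment is searched. The Python below assumes, purely for illustration, that a signature is a per-frame RMS energy envelope and that the offset is the lag with the highest normalized correlation. The names energy_signature, estimate_offset, frame_size, and max_lag are hypothetical.

```python
import math


def energy_signature(samples, frame_size):
    """Per-frame RMS energy; a simple stand-in for a real audio signature."""
    usable = len(samples) - len(samples) % frame_size
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_size]) / frame_size)
        for i in range(0, usable, frame_size)
    ]


def _correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_offset(reference_sig, other_sig, max_lag):
    """Return the lag (in frames) at which other_sig best matches reference_sig.

    A positive result means other_sig is delayed relative to reference_sig.
    """
    # Assumption: ignore lags where fewer than half the frames overlap.
    min_overlap = max(2, len(reference_sig) // 2)

    def score(lag):
        pairs = [
            (reference_sig[i], other_sig[i + lag])
            for i in range(len(reference_sig))
            if 0 <= i + lag < len(other_sig)
        ]
        if len(pairs) < min_overlap:
            return float("-inf")
        a, b = zip(*pairs)
        return _correlation(a, b)

    return max(range(-max_lag, max_lag + 1), key=score)
```

Multiplying the returned lag by the frame duration gives a time shift that could be applied to the non-original stream. An energy envelope is likely too crude for dubbed dialog in practice, since only the shared music and effects would match; it is used here only to keep the sketch self-contained.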
-
Publication No.: US11581020B1
Publication Date: 2023-02-14
Application No.: US17217221
Application Date: 2021-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Sunil Sharadchandra Hadap, Vimal Bhat, Abhinav Jain
IPC: G11B27/036, G06V40/16
Abstract: Techniques are disclosed for performing video synthesis of audiovisual content. In an example, a computing system may determine first facial parameters of a face of a particular person from a first frame in a video shot, whereby the video shot shows the particular person speaking a message. The system may determine second facial parameters based on an audio file that corresponds to the message being spoken in a different way from the video shot. The system may generate third facial parameters by merging the first and the second facial parameters. The system may identify a region of the face that is associated with a difference between the first and second facial parameters, render the region of the face based on a neural texture of the video shot, and then output a new frame showing the face of the particular person speaking the message in the different way.
-
Publication No.: US20240242413A1
Publication Date: 2024-07-18
Application No.: US18432623
Application Date: 2024-02-05
Applicant: Amazon Technologies, Inc.
Inventor: Avijit Vajpayee, Vimal Bhat, Arjun Cholkar, Louis Kirk Barker, Abhinav Jain
IPC: G06T13/40, G06F40/20, G06N3/08, G06T17/00, G06V20/40, G06V40/16, G06V40/20, G09B21/00, G10L25/63, H04N5/272
CPC classification number: G06T13/40, G06F40/20, G06N3/08, G06T17/00, G06V20/46, G06V40/174, G06V40/28, G09B21/009, G10L25/63, H04N5/272
Abstract: Systems, methods, and computer-readable media are disclosed for automated generation and presentation of sign language avatars for video content. Example methods may include determining, by one or more computer processors coupled to memory, a first segment of video content, the first segment including a first set of frames, first audio content, and first subtitle data, where the first subtitle data comprises a first word and a second word. Methods may include determining, using a first machine learning model, a first sign gesture associated with the first word, determining first motion data associated with the first sign gesture, and determining first facial expression data. Methods may include generating an avatar configured to perform the first sign gesture using the first motion data, where a facial expression of the avatar while performing the first sign gesture is based on the first facial expression data.
-
Publication No.: US11935170B1
Publication Date: 2024-03-19
Application No.: US17530070
Application Date: 2021-11-18
Applicant: Amazon Technologies, Inc.
Inventor: Abhinav Jain, Avijit Vajpayee, Vimal Bhat, Arjun Cholkar, Louis Kirk Barker
IPC: G06T13/40, G06F40/20, G06N3/08, G06T17/00, G06V20/40, G06V40/16, G06V40/20, G09B21/00, G10L25/63, H04N5/272
CPC classification number: G06T13/40, G06F40/20, G06N3/08, G06T17/00, G06V20/46, G06V40/174, G06V40/28, G09B21/009, G10L25/63, H04N5/272
Abstract: Systems, methods, and computer-readable media are disclosed for automated generation and presentation of sign language avatars for video content. Example methods may include determining, by one or more computer processors coupled to memory, a first segment of video content, the first segment including a first set of frames, first audio content, and first subtitle data, where the first subtitle data comprises a first word and a second word. Methods may include determining, using a first machine learning model, a first sign gesture associated with the first word, determining first motion data associated with the first sign gesture, and determining first facial expression data. Methods may include generating an avatar configured to perform the first sign gesture using the first motion data, where a facial expression of the avatar while performing the first sign gesture is based on the first facial expression data.
-
Publication No.: US11582519B1
Publication Date: 2023-02-14
Application No.: US17215475
Application Date: 2021-03-29
Applicant: Amazon Technologies, Inc.
Inventor: Vimal Bhat, Sunil Sharadchandra Hadap, Abhinav Jain
IPC: H04N21/466, G06V20/40, G06V40/10, G06V40/16
Abstract: Techniques are disclosed for performing video synthesis of audiovisual content. In an example, a computing system may determine first parameters of a face and body of a source person from a first frame in a video shot. The system also determines second parameters of a face and body of a target person. The system determines that the target person is a replacement for the source person in the first frame. The system generates third parameters of the target person based on merging the first parameters with the second parameters. The system then performs deferred neural rendering of the target person based on a neural texture that corresponds to a texture space of the video shot. The system then outputs a second frame that shows the target person as the replacement for the source person.
-
Publication No.: US11659217B1
Publication Date: 2023-05-23
Application No.: US17301212
Application Date: 2021-03-29
Applicant: Amazon Technologies, Inc.
Inventor: Hooman Mahyar, Avijit Vajpayee, Abhinav Jain, Arjun Cholkar, Vimal Bhat
IPC: H04N21/242, H04N21/234, H04N21/233
CPC classification number: H04N21/242, H04N21/233, H04N21/234
Abstract: Techniques are described for detecting desynchronization between an audio component and a video component of a media presentation. Feature sets may be determined for portions of the audio component and portions of the video component, which may then be used to generate correlations between portions of the audio component and portions of the video component. Synchronization may then be assessed based on the correlations.
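Illustrative sketch (not taken from the patent): the abstract does not name the features or the decision rule. The Python below assumes, for illustration only, one scalar feature per frame for each component (for example audio loudness and a video motion measure), a Pearson correlation per fixed-size portion, and a mean-correlation threshold. The names portion_correlations, is_synchronized, window, and threshold are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def portion_correlations(audio_feats, video_feats, window):
    """Correlate matching portions (non-overlapping windows) of the two feature tracks."""
    n = min(len(audio_feats), len(video_feats))
    return [
        pearson(audio_feats[i:i + window], video_feats[i:i + window])
        for i in range(0, n - window + 1, window)
    ]


def is_synchronized(audio_feats, video_feats, window=25, threshold=0.5):
    """Assumed rule: treat the media as in sync if the mean portion correlation is high."""
    correlations = portion_correlations(audio_feats, video_feats, window)
    if not correlations:
        raise ValueError("feature tracks are shorter than one window")
    return sum(correlations) / len(correlations) >= threshold
```

The window size and threshold are arbitrary placeholders; a real system would tune them, and would likely use learned multi-dimensional feature sets rather than a single scalar per frame.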