-
Publication No.: US12067779B1
Publication Date: 2024-08-20
Application No.: US17668014
Application Date: 2022-02-09
Applicant: Amazon Technologies, Inc.
Inventor: Shixing Chen, Xiang Hao, Xiaohan Nie, Muhammad Raffay Hamid
IPC: G06V20/40, G06V10/774
CPC classification number: G06V20/48, G06V10/774, G06V20/46
Abstract: A plurality of similar video pairs may be determined based on one or more similarity information types. Each video pair of the plurality of similar video pairs may include a first respective video and a second respective video. For each video pair, one or more similar scene pairs may be determined. Each of the one or more similar scene pairs may include a respective first scene from the first respective video and a second respective scene from the second respective video. An encoder may be trained using a contrastive learning model that contrasts a plurality of similar scene pairs with a plurality of random scenes. The plurality of similar scene pairs may include the one or more scene pairs for each video pair. One or more scene features of one or more other scenes of one or more other videos may be determined using the encoder.
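The abstract does not name the contrastive loss. One common way to contrast a similar scene pair against random scenes is an InfoNCE-style loss. The sketch below assumes scene embeddings have already been produced by the encoder. The function names and the `temperature` value are illustrative choices, not the patented method.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """L2-normalize a vector so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against the zero vector
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,  # illustrative value (assumption)
) -> float:
    """InfoNCE loss for one similar scene pair (anchor, positive) contrasted
    against embeddings of random scenes (negatives). Lower is better."""
    a = normalize(anchor)
    logits = [dot(a, normalize(positive)) / temperature]
    logits += [dot(a, normalize(n)) / temperature for n in negatives]
    # Numerically stable -log softmax of the positive logit (index 0).
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    close = info_nce_loss([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
    far = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
    print(close < far)  # True: a positive closer to the anchor yields a lower loss
```

Minimizing this loss over many scene pairs pulls matched scenes together in embedding space and pushes random scenes apart. An encoder trained this way can then embed scenes from other videos.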
-
Publication No.: US12046002B1
Publication Date: 2024-07-23
Application No.: US17684197
Application Date: 2022-03-01
Applicant: Amazon Technologies, Inc.
Inventor: Xiaohan Nie, Michael Thomas Pecchia, Leo Chan, Ahmed Aly Saad Ahmed, Muhammad Raffay Hamid, Sheng Liu
Abstract: Systems, devices, and methods are provided for depth-guided structure from motion. A system may obtain a plurality of image frames from a digital content item that corresponds to a scene and determine, based at least in part on a correspondence search, a set of 2-D keypoints for the plurality of image frames. A depth estimator may be used to determine a plurality of dense depth maps for the plurality of image frames. The set of 2-D keypoints and the plurality of dense depth maps may be used to determine a corresponding set of depth priors. Initialization and/or depth-regularized optimization may be performed using the keypoints and depth priors.
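One plausible way to turn keypoints and dense depth maps into depth priors is to sample each frame's depth map at its keypoint locations. This sketch assumes depth maps are row-major grids indexed `depth[row][col]` and keypoints are `(x, y)` pixel coordinates. The bilinear sampling and the relative-error residual used for regularization are illustrative assumptions, not the patent's formulation.

```python
from typing import List, Sequence, Tuple

DepthMap = List[List[float]]  # depth[row][col], row-major


def bilinear_depth(depth: DepthMap, x: float, y: float) -> float:
    """Bilinearly sample a dense depth map at sub-pixel location (x, y)."""
    h, w = len(depth), len(depth[0])
    # Clamp to the valid pixel range so the 2x2 neighbourhood stays in bounds.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = depth[y0][x0] * (1 - fx) + depth[y0][x1] * fx
    bottom = depth[y1][x0] * (1 - fx) + depth[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def depth_priors(keypoints: Sequence[Tuple[float, float]], depth: DepthMap) -> List[float]:
    """One depth prior per 2-D keypoint, read from the frame's dense depth map."""
    return [bilinear_depth(depth, x, y) for x, y in keypoints]


def depth_residual(z_estimated: float, z_prior: float, weight: float = 1.0) -> float:
    """Weighted relative depth error (hypothetical). A term like this could
    regularize triangulated point depths during optimization; assumes z_prior > 0."""
    return weight * (z_estimated - z_prior) / z_prior
```

In practice, monocular depth estimates are often correct only up to scale. A real system would therefore likely need a scale alignment step before using these values as priors. That step is not shown here.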
-
Publication No.: US11625928B1
Publication Date: 2023-04-11
Application No.: US17009311
Application Date: 2020-09-01
Applicant: Amazon Technologies, Inc.
Inventor: Tamojit Chatterjee, Mayank Sharma, Muhammad Raffay Hamid, Sandeep Joshi
Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic subtitle drift detection and correction. A method may include determining subtitles and/or captions from media content (e.g., videos), the subtitles and/or captions corresponding to dialog in the media content. The subtitles may be broken up into segments which may be analyzed to determine a likelihood of drift (e.g., a likelihood that the subtitles are out of synchronization with the dialog in the media content) for each segment. For segments with a high likelihood of drift, the subtitles may be incrementally adjusted to determine an adjustment that eliminates and/or reduces the amount of drift and the drift in the segment may be corrected based on the drift amount detected. A linear regression model and/or human blocks determined by human operators may be used to otherwise optimize drift correction.
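The incremental-adjustment step can be pictured as a search over candidate time shifts. This sketch assumes speech intervals are already available, for example from a voice activity detector, and scores each shift by how much the shifted subtitle intervals overlap speech. The drift-likelihood model is not shown, and the `max_shift` range and `step` size are illustrative assumptions.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def total_overlap(subs: Sequence[Interval], speech: Sequence[Interval], shift: float) -> float:
    """Total time during which subtitles, shifted by `shift` seconds, coincide with speech."""
    return sum(overlap((s + shift, e + shift), sp) for s, e in subs for sp in speech)


def best_shift(
    subs: Sequence[Interval],
    speech: Sequence[Interval],
    max_shift: float = 2.0,  # illustrative search range (assumption)
    step: float = 0.1,       # illustrative increment (assumption)
) -> float:
    """Try shifts in [-max_shift, max_shift] in increments of `step` and return
    the one that maximizes overlap; ties go to the smallest absolute shift."""
    n = int(round(max_shift / step))
    candidates = [k * step for k in range(-n, n + 1)]
    return max(candidates, key=lambda d: (total_overlap(subs, speech, d), -abs(d)))
```

For example, subtitles at (1.0, 2.0) and (5.0, 6.0) with speech at (1.5, 2.5) and (5.5, 6.5) yield a shift of 0.5 seconds. Applying the shift per segment, rather than globally, allows for drift that changes over the course of the content.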
-
Publication No.: US11829413B1
Publication Date: 2023-11-28
Application No.: US17030103
Application Date: 2020-09-23
Applicant: Amazon Technologies, Inc.
Inventor: Xiang Hao, Jingxiang Chen, Vernon Germano, Muhammad Raffay Hamid, Lakshay Sharma
IPC: G06F16/783, G06N20/00, G06F16/75
CPC classification number: G06F16/7847, G06F16/75, G06N20/00
Abstract: Techniques for temporal localization of mature content in long-form videos using only video-level labels are described. According to some embodiments, a computer-implemented method includes: receiving a request to train a machine learning model on a training video file comprising at least one mature content label; training the machine learning model to generate a feature vector for each of a plurality of video frames of the training video file, generate a plurality of frame-level mature content classification scores of the training video file from the feature vectors of the training video file, and generate a video-level mature content classification score of the training video file from the plurality of frame-level mature content classification scores for the training video file based at least in part on the at least one mature content label of the training video file; receiving a request for an input video file; generating, by the machine learning model in response to the request, a feature vector for each of a plurality of video frames of the input video file, a plurality of frame-level mature content classification scores of the input video file from the feature vectors of the input video file, and a video-level mature content classification score of the input video file from the plurality of frame-level mature content classification scores for the input video file; and transmitting the plurality of frame-level mature content classification scores of the input video file or the video-level mature content classification score of the input video file to a client application or to a storage location.
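The abstract does not state how frame-level scores are pooled into a video-level score. Top-k mean pooling is one common multiple-instance-learning choice when only video-level labels exist. In this sketch, the pooling rule, `k`, the cross-entropy loss, and the `threshold` used to turn frame scores into segments are all assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def video_level_score(frame_scores: Sequence[float], k: int = 3) -> float:
    """Pool frame-level scores into one video-level score (mean of the top-k)."""
    if not frame_scores:
        raise ValueError("need at least one frame score")
    top = sorted(frame_scores, reverse=True)[:k]
    return sum(top) / len(top)


def binary_cross_entropy(p: float, label: int, eps: float = 1e-7) -> float:
    """Loss between the pooled score and the video-level label (0 or 1)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def flagged_segments(frame_scores: Sequence[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Inclusive (first_frame, last_frame) runs whose scores reach `threshold`."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(frame_scores):
        if s >= threshold and start is None:
            start = i  # a flagged run begins
        elif s < threshold and start is not None:
            segments.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(frame_scores) - 1))
    return segments
```

Because the loss is computed only on the pooled score, training needs no per-frame annotation. Localization then comes from reading the frame scores back out, as `flagged_segments` does.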
-
Publication No.: US11748988B1
Publication Date: 2023-09-05
Application No.: US17236688
Application Date: 2021-04-21
Applicant: Amazon Technologies, Inc.
Inventor: Shixing Chen, Xiaohan Nie, David Jiatian Fan, Dongqing Zhang, Vimal Bhat, Muhammad Raffay Hamid
IPC: G06V20/40, G06N20/00, G06N5/04, G06F16/73, G06F16/78, G11B27/34, H04N5/14, G11B27/036, G06V10/75, G06F18/22, G06F18/214
CPC classification number: G06V20/46, G06F16/73, G06F16/78, G06F18/214, G06F18/22, G06N5/04, G06N20/00, G06V10/751, G06V20/49, G11B27/036, G11B27/34, H04N5/147
Abstract: Techniques for automatic scene change detection in a video are described. As one example, a computer-implemented method includes extracting features of a query shot and its neighboring shots of a first set of shots without labels with a query model, determining a key shot of the neighboring shots which is most similar to the query shot based at least in part on the features of the query shot and its neighboring shots, extracting features of the key shot with a key model, training the query model into a trained query model based at least in part on a comparison of the features of the query shot and the features of the key shot, extracting features of a second set of shots with labels with the trained query model, and training a temporal model into a trained temporal model based at least in part on the features extracted from the second set of shots and the labels of the second set of shots.
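The key-shot step can be sketched as a nearest-neighbour search within a window of shots around the query. This sketch assumes the query model's features are plain vectors and that cosine similarity is the similarity measure. The `window` size and the helper names are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def key_shot_index(shot_features: Sequence[Sequence[float]], query_idx: int, window: int = 2) -> int:
    """Index of the neighbouring shot (within `window` shots on either side)
    whose features are most similar to the query shot."""
    lo = max(0, query_idx - window)
    hi = min(len(shot_features), query_idx + window + 1)
    candidates = [i for i in range(lo, hi) if i != query_idx]
    if not candidates:
        raise ValueError("query shot has no neighbours")
    query = shot_features[query_idx]
    return max(candidates, key=lambda i: cosine(query, shot_features[i]))
```

The selected index identifies the shot whose features the key model extracts. The query model is then trained by comparing the query and key features, so no scene labels are needed in this stage.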
-
Publication No.: US11468578B2
Publication Date: 2022-10-11
Application No.: US16948348
Application Date: 2020-09-14
Applicant: Amazon Technologies, Inc.
Inventor: Xiaohan Nie, Muhammad Raffay Hamid
IPC: G06T7/33, G06T7/73, G06V20/40, G06V10/75, G06K9/62, H04N5/272, H04N21/2187, H04N21/234
Abstract: Methods and systems are described for registering a sports field to a video. Video of a live event may feature participants at a venue. A template of the venue, including virtual markings that represent real markings on the venue, may be obtained. A homographic transformation between an image plane and a ground plane may be determined by matching virtual markings to corresponding real markings captured in at least one frame of the video. The determined homographic transformation may be used in the automated analysis of sports statistics and in improving inserted annotations and visualizations.
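The abstract does not say how the transformation is computed from matched markings. As a baseline, the standard four-point direct linear transform (DLT) recovers a homography from four image-to-template correspondences, such as detected line intersections matched to their known positions on the field template. This textbook technique is swapped in for illustration and is not the patented registration method. It assumes exactly four correspondences, no three of them collinear, and fixes h33 = 1.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = List[List[float]]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. three collinear points)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def estimate_homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix3:
    """Four-point DLT: find H (h33 fixed to 1) mapping each image point in `src`
    to its ground-plane template point in `dst`."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    A: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, rearranged to be linear in h.
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = solve_linear(A, b)
    return [[h[0], h[1], h[2]], [h[3], h[4], h[5]], [h[6], h[7], 1.0]]


def apply_homography(H: Matrix3, x: float, y: float) -> Point:
    """Map a point through H using homogeneous coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return u / w, v / w
```

With more than four matches, or with noisy detections, a least-squares or robust (e.g., RANSAC) fit would normally replace the exact 8x8 solve.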
-
公开(公告)号:US11341185B1
公开(公告)日:2022-05-24
申请号:US16386992
申请日:2019-04-17
Applicant: Amazon Technologies, Inc.
Inventor: Muhammad Raffay Hamid
IPC: G06F16/71 , G06F16/783 , G10L25/57 , G06V20/40 , G06K9/62
Abstract: Techniques for content-based indexing of videos at web-scale are described. As one example, a computer-implemented method includes receiving a video file, splitting the video file into video frames and audio for the video frames, determining audial features for the audio, clustering each of a plurality of subsets of the audial features into a respective audio centroid for a shared set of bases, determining a first adjacency matrix of distances between the respective audio centroids, determining visual features for the video frames, clustering each of a plurality of subsets of the visual features into a respective video centroid, and determining a second adjacency matrix of distances between the respective video centroids.
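The abstract leaves the clustering method and distance metric open. This sketch assumes each subset is a fixed-length run of consecutive feature vectors and reduces each subset to its mean vector as its centroid. The adjacency matrix holds Euclidean distances. `window_centroids` and `adjacency_matrix` are hypothetical helpers, and the "shared set of bases" step is not modeled. The same two functions apply to the audial features and to the visual features.

```python
import math
from typing import List, Sequence

Vector = List[float]


def window_centroids(features: Sequence[Sequence[float]], window: int) -> List[Vector]:
    """Summarize each consecutive block of `window` feature vectors by its mean."""
    if window < 1:
        raise ValueError("window must be positive")
    centroids: List[Vector] = []
    for i in range(0, len(features), window):
        chunk = features[i:i + window]
        dim = len(chunk[0])
        centroids.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return centroids


def adjacency_matrix(centroids: Sequence[Sequence[float]]) -> List[Vector]:
    """Symmetric matrix of Euclidean distances between every pair of centroids."""
    return [[math.dist(a, b) for b in centroids] for a in centroids]
```

The result is one distance matrix for audio and one for video, each independent of the raw feature count. Such a compact summary is the kind of representation that can be compared across videos when indexing at scale.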
-
Publication No.: US20220084222A1
Publication Date: 2022-03-17
Application No.: US16948348
Application Date: 2020-09-14
Applicant: Amazon Technologies, Inc.
Inventor: Xiaohan Nie, Muhammad Raffay Hamid
Abstract: Methods and systems are described for registering a sports field to a video. Video of a live event may feature participants at a venue. A template of the venue, including virtual markings that represent real markings on the venue, may be obtained. A homographic transformation between an image plane and a ground plane may be determined by matching virtual markings to corresponding real markings captured in at least one frame of the video. The determined homographic transformation may be used in the automated analysis of sports statistics and in improving inserted annotations and visualizations.
-
公开(公告)号:US11205445B1
公开(公告)日:2021-12-21
申请号:US16436351
申请日:2019-06-10
Applicant: Amazon Technologies, Inc.
Inventor: Mayank Sharma , Sandeep Joshi , Muhammad Raffay Hamid
Abstract: Systems, methods, and computer-readable media are disclosed for language-agnostic automated voice activity detection. Example methods may include determining an audio file associated with video content and generating a plurality of audio segments using the audio file, the plurality of audio segments including a first segment and a second segment, where the first segment and the second segment are consecutive segments. Example methods may include determining, using a Gated Recurrent Unit neural network, that the first segment includes first voice activity, determining, using the Gated Recurrent Unit neural network, that the second segment includes second voice activity, and determining that voice activity is present between a first timestamp associated with the first segment and a second timestamp associated with the second segment.
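The final step of the abstract concludes that voice is present between the timestamps of two consecutive voiced segments. This amounts to merging adjacent positive segments. The sketch assumes the GRU network (not shown) outputs a voice probability per segment, that segments are sorted and contiguous, and that 0.5 is the decision threshold. These are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, float]  # (start_sec, end_sec, voice_probability)


def voice_intervals(
    segments: Sequence[Segment],
    threshold: float = 0.5,  # illustrative decision threshold (assumption)
    tol: float = 1e-9,
) -> List[Tuple[float, float]]:
    """Merge consecutive voice-active segments into (start_sec, end_sec) intervals."""
    intervals: List[Tuple[float, float]] = []
    for start, end, prob in segments:
        if prob < threshold:
            continue  # no voice activity detected in this segment
        if intervals and abs(intervals[-1][1] - start) <= tol:
            # This segment begins where the previous voiced interval ends: extend it.
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals
```

Because only the per-segment decision depends on the audio model, the merging logic itself is independent of the spoken language.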
-
公开(公告)号:US12211222B2
公开(公告)日:2025-01-28
申请号:US18481179
申请日:2023-10-04
Applicant: Amazon Technologies, Inc.
Inventor: Xiaohan Nie , Muhammad Raffay Hamid
IPC: G06T7/33 , G06F18/21 , G06F18/214 , G06F18/40 , G06T7/73 , G06V10/75 , G06V20/40 , H04N5/272 , H04N21/2187 , H04N21/234
Abstract: Methods and systems are described for registering a sports field to a video. Video of a live event may feature participants at a venue. A template of the venue, including virtual markings that represent real markings on the venue, may be obtained. A homographic transformation between an image plane and a ground plane may be determined by matching virtual markings to corresponding real markings captured in at least one frame of the video. The determined homographic transformation may be used in the automated analysis of sports statistics and in improving inserted annotations and visualizations.