-
Publication No.: US11995892B2
Publication date: 2024-05-28
Application No.: US17804188
Filing date: 2022-05-26
Applicant: Microsoft Technology Licensing, LLC
Inventor: Mattan Serry , Zvi Figov , Yonit Hoffman , Maayan Yedidia
IPC: G06V20/40 , G06V20/62 , G11B27/036
CPC classification number: G06V20/48 , G06V20/46 , G06V20/635 , G11B27/036
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include textless versions of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the determined corresponding texted shots.
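As a rough illustration of the duration-matching step named in the abstract, the following Python sketch finds pairs of shot sequences whose per-shot durations agree within a tolerance. The function name `find_matching_sequences`, the parameters `min_len` and `tol`, and the sample durations are assumptions made for illustration and are not taken from the patent. The later comparison of representative frames is not shown.

```python
def find_matching_sequences(durations, min_len=2, tol=1):
    """Return (start_a, start_b, length) for runs of shots with similar durations.

    durations: per-shot durations (for example, in frames), in playback order.
    min_len:   shortest run worth reporting (hypothetical parameter).
    tol:       largest allowed duration difference per shot (hypothetical).
    """
    n = len(durations)
    matches = []
    for a in range(n):
        for b in range(a + 1, n):
            # Skip pairs that only continue a run starting one shot earlier.
            if a > 0 and abs(durations[a - 1] - durations[b - 1]) <= tol:
                continue
            length = 0
            # Extend the run while durations agree and the two runs do not overlap.
            while (b + length < n and a + length < b
                   and abs(durations[a + length] - durations[b + length]) <= tol):
                length += 1
            if length >= min_len:
                matches.append((a, b, length))
    return matches


# Made-up example: shots 1-3 could be texted, shots 6-8 their textless versions.
shot_durations = [40, 12, 30, 55, 18, 90, 12, 31, 55, 70]
candidates = find_matching_sequences(shot_durations)
```

Each returned triple is only a candidate pairing; under the approach in the abstract, image content in representative frames would still need to be compared before the sequences are treated as a match.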
-
Publication No.: US12190867B2
Publication date: 2025-01-07
Application No.: US17804603
Filing date: 2022-05-31
Applicant: Microsoft Technology Licensing, LLC
Inventor: Zvi Figov
Abstract: Examples of the present disclosure describe improved systems and methods for detecting keywords in audio content. In one example implementation, audio content is segmented into one or more audio segments. One or more text segments are generated, each text segment corresponding to one of the audio segments. For each text segment, one or more phrase candidate values are generated using a textual analysis, and one or more sentence embedding values are generated using a sentence embedding analysis. Next, an average sentence embedding value is calculated using the one or more sentence embedding values. Each of the one or more phrase candidate values is compared to the average sentence embedding value. Each phrase candidate value having a comparison value above a threshold value is labeled as representing a keyword.
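The comparison step in this abstract can be sketched in Python as follows. The sketch assumes that phrase candidates and sentences have already been embedded as numeric vectors and that the comparison is a cosine similarity; the abstract does not name the comparison measure, so that choice, the function names, and the toy vectors are all assumptions.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_keywords(phrase_embeddings, sentence_embeddings, threshold=0.5):
    """Return phrases whose similarity to the average sentence embedding exceeds threshold.

    phrase_embeddings:   dict mapping phrase text to its vector (assumed input).
    sentence_embeddings: list of sentence vectors for one text segment.
    threshold:           hypothetical cut-off, not a value from the patent.
    """
    average = mean_vector(sentence_embeddings)
    return [phrase for phrase, vector in phrase_embeddings.items()
            if cosine(vector, average) > threshold]


# Toy two-dimensional vectors, invented for illustration only.
sentences = [[1.0, 0.0], [0.8, 0.2]]
phrases = {"video indexing": [1.0, 0.1], "thank you": [0.0, 1.0]}
keywords = label_keywords(phrases, sentences)
```

In practice the vectors would come from a sentence embedding model, and the threshold would need tuning for that model.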
-
Publication No.: US12026200B2
Publication date: 2024-07-02
Application No.: US17865117
Filing date: 2022-07-14
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yonit Hoffman , Tom Hirshberg , Maayan Yedidia , Zvi Figov
IPC: G06F16/78 , G06F3/04847 , G06F16/738 , G06F16/783 , G06V20/40
CPC classification number: G06F16/739 , G06F3/04847 , G06F16/784 , G06V20/46
Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
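To make the accumulate-then-rank idea concrete, here is a minimal Python sketch. It assumes that detection and tracking have already produced one `(frame_index, person_id, box_area_fraction)` tuple per person per frame, and it scores prominence with a hand-picked formula (share of frames in which the person appears, multiplied by mean box size). The patent relies on machine-trained logic and person-specific feature information, so this formula, the feature choice, and all names are assumptions.

```python
from collections import defaultdict


def rank_prominence(detections, total_frames):
    """Rank person ids from most to least prominent.

    detections:   iterable of (frame_index, person_id, box_area_fraction),
                  assumed to hold at most one entry per person per frame.
    total_frames: number of frames in the video.
    """
    # Accumulate per-person features across the whole video.
    areas = defaultdict(list)
    for _frame, person, area in detections:
        areas[person].append(area)

    scores = {}
    for person, person_areas in areas.items():
        coverage = len(person_areas) / total_frames       # how often the person appears
        mean_area = sum(person_areas) / len(person_areas)  # how large the person appears
        scores[person] = coverage * mean_area              # hypothetical scoring rule
    return sorted(scores, key=scores.get, reverse=True)


# Invented detections for a four-frame clip.
sample = [(0, "A", 0.30), (1, "A", 0.25), (2, "A", 0.35),
          (0, "B", 0.05), (3, "C", 0.40)]
ranking = rank_prominence(sample, total_frames=4)
```

An application system could then use such a ranking to decide, for example, which people to feature in a summary view.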
-
Publication No.: US11755643B2
Publication date: 2023-09-12
Application No.: US16921248
Filing date: 2020-07-06
Applicant: Microsoft Technology Licensing, LLC
IPC: G06F16/00 , G06F16/71 , G06F16/783 , G06V10/25 , G06V20/40
CPC classification number: G06F16/71 , G06F16/7837 , G06V10/25 , G06V20/41 , G06V20/48
Abstract: A video indexing system identifies groups of frames within a video frame sequence captured by a static camera during a same scene. Context metadata is generated for each frame in each group based on an analysis of fewer than all frames in the group. The frames are indexed in a database in association with the generated context metadata.
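The indexing idea can be sketched as follows, assuming that frames are small lists of grayscale pixel values and that a static-camera group can be approximated by comparing each frame with the first frame of the current group. That grouping test, the `max_diff` threshold, the `analyze` callback, and the use of a dictionary in place of a database are illustrative assumptions, not details from the patent.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(p - q) for p, q in zip(frame_a, frame_b)) / len(frame_a)


def group_static_frames(frames, max_diff=5.0):
    """Group consecutive frame indices that stay close to the group's first frame."""
    groups = []
    for i, frame in enumerate(frames):
        if groups and mean_abs_diff(frames[groups[-1][0]], frame) <= max_diff:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def index_frames(frames, analyze, max_diff=5.0):
    """Map every frame index to context metadata computed once per group."""
    index = {}
    for group in group_static_frames(frames, max_diff):
        metadata = analyze(frames[group[0]])  # analyze only one frame of the group
        for i in group:
            index[i] = metadata               # share the result with the whole group
    return index


# Invented four-pixel frames: two dark frames followed by two bright frames.
frames = [[10, 10, 10, 10], [11, 10, 9, 10],
          [200, 200, 200, 200], [201, 199, 200, 200]]
calls = []


def describe(frame):
    calls.append(1)  # count how many frames are actually analyzed
    return {"brightness": "dark" if sum(frame) / len(frame) < 128 else "bright"}


frame_index = index_frames(frames, describe)
```

Here four frames are indexed with only two calls to the analysis function, which is the saving the abstract describes.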
-
Publication No.: US11386609B2
Publication date: 2022-07-12
Application No.: US17138207
Filing date: 2020-12-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Mattan Serry , Zvi Figov , Irit Ofer
Abstract: An approach using 3D algorithms to solve 2D head localization problems is disclosed. A system can extrapolate aspects of one part of an object, e.g., extract characteristics of a person's head, using a 2D input image of another part of the object, e.g., a 2D image of the person's face. The system then selects an appropriate 3D model by the use of facial features detected in an image of a person's face. Using the selected 3D model and the 3D rotation angles provided by a face detector, the system rotates the model and then projects the model to a 2D shape. The system then scales and translates, i.e., transforms, the 2D shape to match the 2D face bounding box. Then, using the transformed 2D shape, the system extracts a bounding box for the extracted portion of an object, e.g., the head of the person depicted in the 2D input image.
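The rotate, project, transform, and extract steps can be sketched in Python under several stated assumptions: the 3D model is a plain list of points split into face points and whole-head points, rotation angles are yaw, pitch, and roll in radians applied in that order, projection is orthographic (the z coordinate is dropped), and boxes are `(min_x, min_y, max_x, max_y)`. None of these conventions, names, or sample coordinates come from the patent.

```python
import math


def rotate(point, yaw, pitch, roll):
    """Rotate a 3D point by yaw (y axis), then pitch (x axis), then roll (z axis)."""
    x, y, z = point
    cy, sy = math.cos(yaw), math.sin(yaw)
    x, z = cy * x + sy * z, -sy * x + cy * z
    cp, sp = math.cos(pitch), math.sin(pitch)
    y, z = cp * y - sp * z, sp * y + cp * z
    cr, sr = math.cos(roll), math.sin(roll)
    x, y = cr * x - sr * y, sr * x + cr * y
    return x, y, z


def bbox(points_2d):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of 2D points."""
    xs = [p[0] for p in points_2d]
    ys = [p[1] for p in points_2d]
    return min(xs), min(ys), max(xs), max(ys)


def head_box(model_face, model_head, angles, face_box):
    """Estimate a 2D head box from a 3D model and a detected 2D face box."""
    # Rotate the model, then project orthographically by dropping z.
    face_2d = [rotate(p, *angles)[:2] for p in model_face]
    head_2d = [rotate(p, *angles)[:2] for p in model_head]
    # Scale and translate so the projected face box lands on the detected one.
    fx0, fy0, fx1, fy1 = bbox(face_2d)
    tx0, ty0, tx1, ty1 = face_box
    sx = (tx1 - tx0) / (fx1 - fx0)
    sy = (ty1 - ty0) / (fy1 - fy0)
    moved = [((x - fx0) * sx + tx0, (y - fy0) * sy + ty0) for x, y in head_2d]
    return bbox(moved)


# Invented toy model: a square face patch plus wider points for the rest of the head.
face_points = [(-1, -1, 1), (1, -1, 1), (1, 1, 1), (-1, 1, 1)]
head_points = face_points + [(-1.5, -2, 0), (1.5, -2, 0), (-1.5, 1.5, -1), (1.5, 1.5, -1)]
frontal = head_box(face_points, head_points, (0.0, 0.0, 0.0), (100, 100, 200, 200))
```

With zero rotation the head box is simply the model's head extent rescaled around the detected face box; non-zero angles change the projected shape before the same scale-and-translate step is applied.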