TRANSLATING TEXTS FOR VIDEOS BASED ON VIDEO CONTEXT

    Publication Number: US20230102217A1

    Publication Date: 2023-03-30

    Application Number: US18049185

    Application Date: 2022-10-24

    Applicant: Adobe Inc.

    Abstract: The present disclosure describes systems, non-transitory computer-readable media, and methods that can generate contextual identifiers indicating context for frames of a video and utilize those contextual identifiers to generate translations of text corresponding to such video frames. By analyzing a digital video file, the disclosed systems can identify video frames corresponding to a scene and a term sequence corresponding to a subset of the video frames. Based on image features of the video frames corresponding to the scene, the disclosed systems can utilize a contextual neural network to generate a contextual identifier (e.g., a contextual tag) indicating context for the video frames. Based on the contextual identifier, the disclosed systems can subsequently apply a translation neural network to generate a translation of the term sequence from a source language to a target language. In some cases, the translation neural network also generates affinity scores for the translation.
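
As an illustration of the pipeline this abstract describes, the sketch below stands in plain dictionaries for the two neural networks: a scene-level contextual tag selects among candidate translations of an ambiguous term, and each candidate carries an affinity score. All names and data here are hypothetical, not from the patent.

```python
# Hypothetical sketch: dictionaries stand in for the contextual and
# translation neural networks described in the abstract.

# (term, contextual tag) -> (target-language translation, affinity score)
CONTEXT_SENSES = {
    ("bat", "sports"): ("bate", 0.92),          # baseball scene
    ("bat", "wildlife"): ("murciélago", 0.88),  # nature scene
}

def translate_term(term, contextual_tag):
    """Pick a translation for `term` conditioned on the scene's contextual
    tag, returning the translation and its affinity score."""
    return CONTEXT_SENSES.get((term, contextual_tag), (None, 0.0))
```

Conditioning the lookup on the scene tag is what lets the same on-screen word translate differently in a sports scene and a nature documentary.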

    Generating and providing composition effect tutorials for creating and editing digital content

    Publication Number: US10685470B2

    Publication Date: 2020-06-16

    Application Number: US16128904

    Application Date: 2018-09-12

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable media for generating and providing composition effect tutorials for creating and editing digital content based on a metadata composite structure. For example, the disclosed systems can generate and/or access a metadata composite structure that includes nodes corresponding to composition effects applied to a digital content item, where a given node can include location information indicating where a composition effect is applied relative to the digital content item. The disclosed systems can further generate a tutorial to guide a user to implement a selected composition effect by identifying composition effects of nodes that correspond to a location selected within a composition interface and presenting instructions for a particular composition effect.
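
One way to picture the metadata composite structure is as a collection of effect nodes, each recording where its effect was applied; a tutorial is then built from the nodes whose region contains the location the user selects. This is a hypothetical sketch of such a structure, not Adobe's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class EffectNode:
    """One node of the metadata composite structure (hypothetical layout)."""
    effect: str            # e.g. "gaussian_blur"
    region: tuple          # (x, y, width, height) within the content item
    params: dict = field(default_factory=dict)

def effects_at(nodes, point):
    """Return nodes whose region contains the selected location."""
    x, y = point
    return [n for n in nodes
            if n.region[0] <= x < n.region[0] + n.region[2]
            and n.region[1] <= y < n.region[1] + n.region[3]]

def tutorial_steps(node):
    """Turn one node into human-readable tutorial instructions."""
    return ([f"Apply {node.effect} to region {node.region}"]
            + [f"Set {k} = {v}" for k, v in node.params.items()])
```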

    Audio-based media edit point selection

    Publication Number: US11875781B2

    Publication Date: 2024-01-16

    Application Number: US17008427

    Application Date: 2020-08-31

    Applicant: Adobe Inc.

    CPC classification number: G10L15/04 G06F40/253 G06F40/30

    Abstract: A media edit point selection process can include a media editing software application programmatically converting speech to text and storing a timestamp-to-text map. The map correlates text corresponding to speech extracted from an audio track for the media clip to timestamps for the media clip. The timestamps correspond to words and some gaps in the speech from the audio track. The probability of identified gaps corresponding to a grammatical pause by the speaker is determined using the timestamp-to-text map and a semantic model. Potential edit points corresponding to grammatical pauses in the speech are stored for display or for additional use by the media editing software application. Text can optionally be displayed to a user during media editing.
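
The gap-scoring step can be sketched as follows. A real implementation would apply a semantic model over the timestamp-to-text map; here a punctuation heuristic stands in for it, and all thresholds are invented for illustration.

```python
def find_edit_points(word_timings, min_gap=0.3):
    """word_timings: list of (word, start_sec, end_sec) tuples, i.e. a
    simplified timestamp-to-text map. Returns candidate edit points
    (timestamps) where a gap likely corresponds to a grammatical pause."""
    points = []
    for (w1, _, e1), (_, s2, _) in zip(word_timings, word_timings[1:]):
        gap = s2 - e1
        if gap < min_gap:
            continue  # too short to be a deliberate pause
        # Crude stand-in for the semantic model: sentence-final punctuation
        # on the preceding word raises the pause probability.
        prob = min(1.0, gap) * (0.9 if w1.endswith((".", "?", "!", ",")) else 0.4)
        if prob >= 0.5:
            points.append(e1 + gap / 2)  # cut in the middle of the gap
    return points
```

An editor could then snap cuts to the returned timestamps rather than scrubbing for silence by hand.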

    Trace layer for replicating a source region of a digital image

    Publication Number: US11573689B1

    Publication Date: 2023-02-07

    Application Number: US17507105

    Application Date: 2021-10-21

    Applicant: Adobe Inc.

    Abstract: Techniques are described for using a trace layer to replicate a source region of a digital image. In an implementation, a user leverages a content editing system to select a source region of a source image to be replicated and a target region of a target image to which portions of the source region are to be replicated. A trace layer is generated that is a visual representation of portions of the source region, and the trace layer is positioned on the target region of the target image. Further, the trace layer is generated based on a visibility factor such that the trace layer is at least partially transparent. The trace layer receives user interaction to select portions of the trace layer, and visibility of the selected portions is modified to replicate corresponding portions of the source region to the target region.
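
The visibility factor amounts to alpha-compositing the trace layer over the target region, with user-selected portions shown at full opacity. A toy sketch over grids of RGB tuples (an assumed data layout, not the product's internals):

```python
def composite_pixel(src, dst, alpha):
    """Blend one source pixel over a target pixel at opacity `alpha`."""
    return tuple(round(alpha * s + (1 - alpha) * d) for s, d in zip(src, dst))

def apply_trace_layer(source, target, visibility, revealed):
    """source/target: 2-D grids of RGB tuples. `visibility` is the base
    transparency of the trace layer; `revealed` is a set of (row, col)
    positions the user selected, which are replicated at full opacity."""
    out = []
    for r, row in enumerate(target):
        out_row = []
        for c, dst in enumerate(row):
            alpha = 1.0 if (r, c) in revealed else visibility
            out_row.append(composite_pixel(source[r][c], dst, alpha))
        out.append(out_row)
    return out
```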

    IMAGE PROCESSING TECHNIQUES TO QUICKLY FIND A DESIRED OBJECT AMONG OTHER OBJECTS FROM A CAPTURED VIDEO SCENE

    Publication Number: US20220261579A1

    Publication Date: 2022-08-18

    Application Number: US17177761

    Application Date: 2021-02-17

    Applicant: Adobe Inc.

    Abstract: Techniques are provided for identifying objects (such as products within a physical store) within a captured video scene and indicating which object in the captured scene matches a desired object requested by a user. The matching object is then displayed in an accentuated manner to the user in real time (via augmented reality). Object identification is carried out via a multimodal methodology. Objects within the captured video scene are identified using a neural network trained to identify different types of objects. The identified objects can then be compared against a database of pre-stored images of the desired product to determine if a close match is found. Additionally, text on the identified objects is analyzed and compared to the text of the desired object. Based on either or both identification methods, the desired object is indicated to the user on their display, via an augmented reality graphic.
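
The "either or both" fusion of the two identification methods can be sketched as below. Here `difflib` fuzzy matching stands in for the OCR text comparison, and the visual-similarity scores are assumed to come from the image-database lookup; all names and thresholds are hypothetical.

```python
import difflib

def text_match(detected_text, target_text):
    """Fuzzy similarity between text read off an object and the desired
    product's text (stand-in for the OCR comparison step)."""
    return difflib.SequenceMatcher(None, detected_text.lower(),
                                   target_text.lower()).ratio()

def find_desired_object(detections, target_text, image_scores, threshold=0.6):
    """detections: list of (object_id, detected_text); image_scores: dict of
    object_id -> visual-similarity score from the image-database comparison.
    Either modality alone can confirm a match, mirroring the abstract's
    'either or both' identification."""
    matches = []
    for obj_id, text in detections:
        t = text_match(text, target_text)
        v = image_scores.get(obj_id, 0.0)
        if t >= threshold or v >= threshold:
            matches.append((obj_id, max(t, v)))
    return sorted(matches, key=lambda m: -m[1])
```

The top-ranked object id would then drive the augmented-reality highlight.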

    TRACKING IMAGE MODIFICATIONS FOR A DIGITAL IMAGE

    Publication Number: US20230360271A1

    Publication Date: 2023-11-09

    Application Number: US17661881

    Application Date: 2022-05-03

    Applicant: Adobe Inc.

    Abstract: The present disclosure relates to systems, non-transitory computer-readable media, and methods for detecting changes to a point of interest between a selected version and a previous version of a digital image and providing a summary of the changes to the point of interest. For example, the disclosed system provides for display a selected version of a digital image and detects a point of interest within the selected version of the digital image. The disclosed system determines image modifications to the point of interest (e.g., tracks changes to the point of interest) to generate a summary of the image modifications. Moreover, the summary can indicate further information concerning image modifications applied to the selected point of interest, such as timestamp, editor, or author information.
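
Generating the modification summary can be pictured as filtering an edit history down to the edits that touched the selected point of interest. A minimal sketch with an assumed edit-record shape (the patent does not specify one):

```python
def summarize_changes(edits, point):
    """edits: chronological list of dicts with 'op', 'region' (x, y, w, h),
    'editor', and 'timestamp' keys (hypothetical schema). Returns summary
    lines for edits whose region contains the selected point of interest."""
    x, y = point
    hits = [e for e in edits
            if e["region"][0] <= x < e["region"][0] + e["region"][2]
            and e["region"][1] <= y < e["region"][1] + e["region"][3]]
    return [f'{e["timestamp"]}: {e["editor"]} applied {e["op"]}' for e in hits]
```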

    Translating texts for videos based on video context

    Publication Number: US11481563B2

    Publication Date: 2022-10-25

    Application Number: US16678378

    Application Date: 2019-11-08

    Applicant: Adobe Inc.

    Abstract: The present disclosure describes systems, non-transitory computer-readable media, and methods that can generate contextual identifiers indicating context for frames of a video and utilize those contextual identifiers to generate translations of text corresponding to such video frames. By analyzing a digital video file, the disclosed systems can identify video frames corresponding to a scene and a term sequence corresponding to a subset of the video frames. Based on image features of the video frames corresponding to the scene, the disclosed systems can utilize a contextual neural network to generate a contextual identifier (e.g., a contextual tag) indicating context for the video frames. Based on the contextual identifier, the disclosed systems can subsequently apply a translation neural network to generate a translation of the term sequence from a source language to a target language. In some cases, the translation neural network also generates affinity scores for the translation.

    Extracting session information from video content to facilitate seeking

    Publication Number: US10701434B1

    Publication Date: 2020-06-30

    Application Number: US16253120

    Application Date: 2019-01-21

    Applicant: Adobe Inc.

    Abstract: A seek content extraction system analyzes frames of video content and identifies locations in the frames where session information is displayed. This session information refers to information that is displayed as part of video content and that describes, for a particular location in the video content, what is currently happening in the video content at that particular location. This session information is extracted from each of multiple frames, and for a given frame the extracted session information is associated with the frame. While the user is seeking forward or backward through the video content, a thumbnail of the frame at a given location in the video content is displayed along with the extracted session information associated with the frame.
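
Once session information has been extracted from sampled frames, showing it during seeking reduces to looking up the entry at or before the seek position. A minimal sketch, assuming the index is a sorted list of (timestamp, info) pairs:

```python
import bisect

def session_info_at(index, seek_time):
    """index: sorted list of (timestamp_sec, session_info) pairs built by
    extracting on-screen session text from sampled frames. Returns the
    session info for the frame at or before `seek_time`, or None if the
    seek position precedes the first indexed frame."""
    times = [t for t, _ in index]
    i = bisect.bisect_right(times, seek_time) - 1
    return index[i][1] if i >= 0 else None
```

The returned string would be rendered next to the seek thumbnail, e.g. a game score and quarter for sports video.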

    Salient video frame establishment

    Publication Number: US10460196B2

    Publication Date: 2019-10-29

    Application Number: US15232533

    Application Date: 2016-08-09

    Applicant: Adobe Inc.

    Abstract: Salient video frame establishment is described. In one or more example embodiments, salient frames of a video are established based on multiple photos. An image processing module is capable of analyzing both video frames and photos, both of which may include entities, such as faces or objects. Frames of a video are decoded and analyzed in terms of attributes of the video. Attributes include, for example, scene boundaries, facial expressions, brightness levels, and focus levels. From the video frames, the image processing module determines candidate frames based on the attributes. The image processing module analyzes multiple photos to ascertain multiple relevant entities based on the presence of entities in the multiple photos. Relevancy of an entity can depend, for instance, on a number of occurrences. The image processing module establishes multiple salient frames from the candidate frames based on the multiple relevant entities. Salient frames can be displayed.
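
The relevance-and-scoring idea can be sketched as counting how often each entity appears across the photos, then ranking candidate frames by the summed relevance of the entities they contain. A hypothetical simplification of the described process:

```python
from collections import Counter

def salient_frames(candidate_frames, photos, top_k=2):
    """candidate_frames: list of (frame_id, set_of_entities_in_frame);
    photos: list of entity sets, one per photo. An entity's relevance is
    its number of occurrences across the photos; a frame's score is the
    summed relevance of the entities it contains."""
    relevance = Counter()
    for entities in photos:
        relevance.update(set(entities))  # count each entity once per photo
    scored = [(sum(relevance[e] for e in ents), fid)
              for fid, ents in candidate_frames]
    scored.sort(reverse=True)
    return [fid for _, fid in scored[:top_k]]
```

A frame showing both a frequently photographed person and a frequently photographed pet outranks frames showing either alone.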

    Translating texts for videos based on video context

    Publication Number: US12299408B2

    Publication Date: 2025-05-13

    Application Number: US18049185

    Application Date: 2022-10-24

    Applicant: Adobe Inc.

    Abstract: The present disclosure describes systems, non-transitory computer-readable media, and methods that can generate contextual identifiers indicating context for frames of a video and utilize those contextual identifiers to generate translations of text corresponding to such video frames. By analyzing a digital video file, the disclosed systems can identify video frames corresponding to a scene and a term sequence corresponding to a subset of the video frames. Based on image features of the video frames corresponding to the scene, the disclosed systems can utilize a contextual neural network to generate a contextual identifier (e.g., a contextual tag) indicating context for the video frames. Based on the contextual identifier, the disclosed systems can subsequently apply a translation neural network to generate a translation of the term sequence from a source language to a target language. In some cases, the translation neural network also generates affinity scores for the translation.
