CAPTIONING USING GENERATIVE ARTIFICIAL INTELLIGENCE

    Publication Number: US20250139161A1

    Publication Date: 2025-05-01

    Application Number: US18431134

    Application Date: 2024-02-02

    Applicant: ADOBE INC.

    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for cutting down a user's longer input video into an edited video comprising the most important video segments and applying corresponding video effects. Some embodiments of the present invention are directed to adding captioning video effects to the trimmed video (e.g., applying face-aware and non-face-aware captioning to emphasize extracted video segment headings, important sentences, quotes, words of interest, extracted lists, etc.). For example, a prompt is provided to a generative language model to identify portions of a transcript (e.g., extracted scene summaries, important sentences, lists of items discussed in the video, etc.) to apply as captions to corresponding video segments, depending on the type of caption (e.g., an extracted heading may be captioned at the start of a corresponding video segment, while important sentences and/or extracted list items may be captioned when they are spoken).
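
    A minimal sketch of the captioning flow the abstract describes, assuming a generic text-completion callable (`complete`) standing in for the generative language model and a transcript with per-segment timestamps; the prompt wording, JSON response schema, and caption-placement rules are illustrative assumptions, not the claimed method:

```python
import json
from typing import Callable, Dict, List

def extract_captions(segments: List[Dict], complete: Callable[[str], str]) -> List[Dict]:
    """Prompt a generative language model to pick caption-worthy portions
    of a timestamped transcript, then schedule them as video captions.

    segments: [{"start": float, "end": float, "text": str}, ...]
    complete: any text-completion function; a stand-in for the model.
    """
    transcript = "\n".join(
        f"[{s['start']:.1f}-{s['end']:.1f}] {s['text']}" for s in segments
    )
    prompt = (
        "From the timestamped transcript below, return JSON with keys "
        "'headings', 'important_sentences', and 'list_items'; each entry "
        "needs 'text', 'start', and 'end' copied from the transcript.\n\n"
        + transcript
    )
    picked = json.loads(complete(prompt))

    captions = []
    # A heading is captioned at the start of its video segment; important
    # sentences and list items are captioned while they are spoken.
    for h in picked.get("headings", []):
        captions.append({"text": h["text"], "at": h["start"], "style": "heading"})
    for s in picked.get("important_sentences", []) + picked.get("list_items", []):
        captions.append({"text": s["text"], "at": s["start"], "style": "emphasis"})
    return sorted(captions, key=lambda c: c["at"])
```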

    AUTOMATIC RECOGNITION OF VISUAL AND AUDIO-VISUAL CUES

    Publication Number: US20230169795A1

    Publication Date: 2023-06-01

    Application Number: US17539652

    Application Date: 2021-12-01

    Applicant: ADOBE INC.

    Abstract: A method for detecting a cue (e.g., a visual cue, or a visual cue and an audible cue occurring together) in an input video includes: presenting a user interface to record an example video of a user performing an act including the cue; determining a part of the example video where the cue occurs; applying a feature of the part to a neural network to generate a positive embedding; dividing the input video into a plurality of chunks and applying a feature of each chunk to the neural network to generate a plurality of negative embeddings; applying a feature of a given one of the chunks to the neural network to output a query embedding; and determining whether the cue occurs in the input video from the query embedding, the positive embedding, and the negative embeddings.
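
    The final decision step can be sketched as an embedding comparison; the cosine similarity and margin rule below are assumed decision criteria, since the abstract does not fix how the query, positive, and negative embeddings are combined:

```python
import numpy as np

def cue_occurs(query, positive, negatives, margin=0.1):
    """Decide whether a chunk contains the cue by comparing its query
    embedding to one positive (example-video) embedding and many negative
    (other-chunk) embeddings. The margin test is an assumption; the
    abstract does not specify a particular decision rule.
    """
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    pos_score = cos(query, positive)          # similarity to the example cue
    neg_score = max(cos(query, n) for n in negatives)  # best non-cue match
    return pos_score - neg_score > margin
```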

    GENERATING GESTURE REENACTMENT VIDEO FROM VIDEO MOTION GRAPHS USING MACHINE LEARNING

    Publication Number: US20240161335A1

    Publication Date: 2024-05-16

    Application Number: US18055310

    Application Date: 2022-11-14

    Applicant: Adobe Inc.

    CPC classification number: G06T7/73 G06F16/685 G06F40/242 G06T7/207

    Abstract: Embodiments are disclosed for generating a gesture reenactment video sequence corresponding to a target audio sequence using a trained network based on a video motion graph generated from a reference speech video. In particular, in one or more embodiments, the disclosed systems and methods comprise receiving a first input including a reference speech video and generating a video motion graph representing the reference speech video, where each node is associated with a frame of the reference video sequence and reference audio features of the reference audio sequence. The disclosed systems and methods further comprise receiving a second input including a target audio sequence, generating target audio features, identifying a node path through the video motion graph based on the target audio features and the reference audio features, and generating an output media sequence based on the identified node path through the video motion graph paired with the target audio sequence.
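
    The node-path search can be pictured as a Viterbi-style dynamic program over the motion graph, matching each target audio step to a node's reference audio features while penalizing costly transitions; this optimization scheme is an illustrative assumption, not the specific method claimed:

```python
import numpy as np

def find_node_path(node_audio_feats, edges, target_audio_feats, w_trans=1.0):
    """Search for a node path whose reference audio features best match
    the target audio features, step by step.

    node_audio_feats: (N, D) array, one reference audio feature per node
    edges: dict mapping node -> list of (next_node, transition_cost)
    target_audio_feats: (T, D) array, one target feature per output frame
    """
    N, T = len(node_audio_feats), len(target_audio_feats)
    # match_cost[t, n]: how badly node n's audio matches target step t
    match_cost = np.linalg.norm(
        target_audio_feats[:, None, :] - node_audio_feats[None, :, :], axis=-1
    )
    best = match_cost[0].copy()            # cheapest path cost ending at each node
    back = np.full((T, N), -1, dtype=int)  # backpointers for path recovery
    for t in range(1, T):
        new_best = np.full(N, np.inf)
        for u in range(N):
            for v, c in edges.get(u, []):
                cand = best[u] + w_trans * c + match_cost[t, v]
                if cand < new_best[v]:
                    new_best[v] = cand
                    back[t, v] = u
        best = new_best
    # Trace the cheapest path backwards from the best final node.
    path = [int(np.argmin(best))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```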

    ZOOM AND SCROLL BAR FOR A VIDEO TIMELINE

    Publication Number: US20230043769A1

    Publication Date: 2023-02-09

    Application Number: US17969536

    Application Date: 2022-10-19

    Applicant: Adobe Inc.

    Abstract: Embodiments are directed to techniques for interacting with a hierarchical video segmentation using a video timeline. In some embodiments, the finest level of a hierarchical segmentation identifies the smallest interaction unit of a video: semantically defined video segments of unequal duration called clip atoms. Higher levels cluster the clip atoms into coarser sets of video segments. A presented video timeline is segmented based on one of the levels, and one or more segments are selected through interactions with the video timeline. For example, a click or tap on a video segment, or a drag operation along the timeline, snaps selection boundaries to corresponding segment boundaries defined by the level. Navigating to a different level of the hierarchy transforms the selection into coarser or finer video segments defined by that level. Any operation can be performed on selected video segments, including playback, trimming, or editing.
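
    The snapping behavior can be sketched with a sorted list of segment boundary times per hierarchy level; the expand-outward snapping rule and the `snap_selection` helper below are assumptions for illustration:

```python
from bisect import bisect_left, bisect_right

def snap_selection(drag_start, drag_end, level_boundaries):
    """Snap a dragged time range on the timeline to the segment boundaries
    of the active hierarchy level. level_boundaries is a sorted list of
    boundary times (seconds) for that level.
    """
    lo, hi = sorted((drag_start, drag_end))
    # Expand the selection outward to the nearest enclosing boundaries.
    start = level_boundaries[max(bisect_right(level_boundaries, lo) - 1, 0)]
    end_idx = min(bisect_left(level_boundaries, hi), len(level_boundaries) - 1)
    return start, level_boundaries[end_idx]

# Switching hierarchy levels re-snaps the same dragged range to the new
# (coarser or finer) boundaries:
atoms  = [0.0, 1.2, 2.7, 3.1, 4.8, 6.0]   # finest level: clip atoms
scenes = [0.0, 2.7, 6.0]                   # coarser clustering of atoms
print(snap_selection(1.5, 3.0, atoms))     # (1.2, 3.1)
print(snap_selection(1.5, 3.0, scenes))    # (0.0, 6.0)
```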
