VIDEO GENERATION USING FRAME-WISE TOKEN EMBEDDINGS

    Publication No.: US20250119624A1

    Publication date: 2025-04-10

    Application No.: US18894443

    Filing date: 2024-09-24

    Applicant: ADOBE INC.

    Abstract: A method, apparatus, non-transitory computer-readable medium, and system for generating synthetic videos include obtaining an input prompt describing a video scene. Embodiments then generate, based on the input prompt, a plurality of frame-wise token embeddings corresponding, respectively, to a sequence of video frames. Subsequently, embodiments generate, using a video generation model, a synthesized video depicting the video scene. The synthesized video includes a plurality of images corresponding to the sequence of video frames.

    Face-aware speaker diarization for transcripts and text-based video editing

    Publication No.: US12125501B2

    Publication date: 2024-10-22

    Application No.: US17967399

    Filing date: 2022-10-17

    Applicant: Adobe Inc.

    CPC classification number: G11B27/031 G06V20/41

    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for face-aware speaker diarization. In an example embodiment, an audio-only speaker diarization technique is applied to generate an audio-only speaker diarization of a video, an audio-visual speaker diarization technique is applied to generate a face-aware speaker diarization of the video, and the audio-only speaker diarization is refined using the face-aware speaker diarization to generate a hybrid speaker diarization that links detected faces to detected voices. In some embodiments, to accommodate videos with small faces that appear pixelated, a cropped image of any given face is extracted from each frame of the video, and the size of the cropped image is used to select a corresponding active speaker detection model to predict an active speaker score for the face in the cropped image.
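The size-based model selection mentioned at the end of this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `FaceCrop` type, the 64-pixel threshold, and the model names `asd_small_faces` and `asd_large_faces` are all hypothetical, since the abstract states only that the size of the cropped image is used to select a corresponding active speaker detection model.

```python
from dataclasses import dataclass

# Hypothetical threshold (in pixels); the abstract does not give a value.
SMALL_FACE_MAX_SIDE = 64


@dataclass
class FaceCrop:
    """A cropped face image taken from one video frame (illustrative type)."""
    face_id: str
    width: int
    height: int


def select_model_name(crop: FaceCrop, small_max_side: int = SMALL_FACE_MAX_SIDE) -> str:
    """Return the name of the active speaker detection model for this crop.

    Crops whose longer side is at most `small_max_side` go to a model
    assumed to be trained on small, pixelated faces; all other crops go
    to a model assumed to be trained on larger faces.
    """
    if max(crop.width, crop.height) <= small_max_side:
        return "asd_small_faces"  # hypothetical model name
    return "asd_large_faces"      # hypothetical model name


if __name__ == "__main__":
    for crop in (FaceCrop("face-1", 40, 48), FaceCrop("face-2", 200, 180)):
        print(crop.face_id, "->", select_model_name(crop))
```

In this sketch the selected model would then be run on the crop to predict the active speaker score for that face; a two-way split is used only for brevity, and the abstract does not say how many size classes or models are involved.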
