FACE-AWARE SPEAKER DIARIZATION FOR TRANSCRIPTS AND TEXT-BASED VIDEO EDITING

    Publication number: US20240127857A1

    Publication date: 2024-04-18

    Application number: US17967399

    Filing date: 2022-10-17

    Applicant: Adobe Inc.

    CPC classification number: G11B27/031 G06V20/41

    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for face-aware speaker diarization. In an example embodiment, an audio-only speaker diarization technique is applied to generate an audio-only speaker diarization of a video, an audio-visual speaker diarization technique is applied to generate a face-aware speaker diarization of the video, and the audio-only speaker diarization is refined using the face-aware speaker diarization to generate a hybrid speaker diarization that links detected faces to detected voices. In some embodiments, to accommodate videos with small faces that appear pixelated, a cropped image of any given face is extracted from each frame of the video, and the size of the cropped image is used to select a corresponding active speaker detection model to predict an active speaker score for the face in the cropped image.
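
As a rough illustration of the size-based model selection described in the abstract, the Python sketch below routes each cropped face to one of two active speaker detection models depending on the size of the crop. The 64-pixel threshold, the FaceCrop type, the use of the longer side as the size measure, and the two stub scoring functions are hypothetical placeholders that do not come from the patent text; actual embodiments may use different size criteria or more than two models.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical threshold (pixels): crops whose longer side is below this
# are treated as "small" faces. The abstract does not specify a value.
SMALL_FACE_MAX_SIDE = 64


@dataclass(frozen=True)
class FaceCrop:
    """Cropped image of one face from one video frame (pixel data omitted)."""
    width: int
    height: int


# An active speaker detection model is modeled here as a function that
# maps a face crop to an active speaker score.
ScoreFn = Callable[[FaceCrop], float]


def select_model(crop: FaceCrop, small_model: ScoreFn, large_model: ScoreFn,
                 threshold: int = SMALL_FACE_MAX_SIDE) -> ScoreFn:
    """Pick the model that corresponds to the size of the cropped image."""
    if max(crop.width, crop.height) < threshold:
        return small_model
    return large_model


def score_faces(crops: Sequence[FaceCrop], small_model: ScoreFn,
                large_model: ScoreFn,
                threshold: int = SMALL_FACE_MAX_SIDE) -> List[float]:
    """Predict an active speaker score for each crop with the selected model."""
    return [select_model(c, small_model, large_model, threshold)(c)
            for c in crops]


# Stub models standing in for trained networks (assumptions for illustration).
def small_face_stub(crop: FaceCrop) -> float:
    return 0.1


def large_face_stub(crop: FaceCrop) -> float:
    return 0.9


if __name__ == "__main__":
    crops = [FaceCrop(32, 40), FaceCrop(128, 128)]
    print(score_faces(crops, small_face_stub, large_face_stub))
```

In a real pipeline the stubs would be replaced by models trained on low-resolution and higher-resolution face crops respectively, and the resulting per-frame scores would feed the face-aware diarization step.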
