Patent search ap:("Google LLC") AND inv:"Tali Dekel" Page 1

1.

发明授权
Depth determination for images captured with a moving camera and representing moving features 有权

公开(公告)号：US11978225B2

公开(公告)日：2024-05-07

申请号：US18135678

申请日：2023-04-17

Applicant: Google LLC

Inventor： Tali Dekel , Forrester Cole , Ce Liu , William Freeman , Richard Tucker , Noah Snavely , Zhengqi Li

IPC: G06T7/579 , G06T7/246 , G06T7/73

CPC classification number: G06T7/579 , G06T7/246 , G06T7/73 , G06T2207/10016 , G06T2207/10028 , G06T2207/20081 , G06T2207/30244

Abstract: A method includes obtaining a reference image and a target image each representing an environment containing moving features and static features. The method also includes determining an object mask configured to mask out the moving features and preserves the static features in the target image. The method additionally includes determining, based on motion parallax between the reference image and the target image, a static depth image representing depth values of the static features in the target image. The method further includes generating, by way of a machine learning model, a dynamic depth image representing depth values of both the static features and the moving features in the target image. The model is trained to generate the dynamic depth image by determining depth values of at least the moving features based on the target image, the object mask, and the static depth image.

2.

发明公开
Text-Based Real Image Editing with Diffusion Models 审中-公开

公开(公告)号：US20240355017A1

公开(公告)日：2024-10-24

申请号：US18302508

申请日：2023-04-18

Applicant: Google LLC

Inventor： Shiran Elyahu Zada , Bahjat Kawar , Oran Lang , Omer Tov , Huiwen Chang , Tali Dekel , Inbar Mosseri

IPC: G06T11/60 , G06T3/40

CPC classification number: G06T11/60 , G06T3/4053

Abstract: Methods and systems for editing an image are disclosed herein. The method includes receiving an input image and a target text, the target text indicating a desired edit for the input image and obtaining, by the computing system, a target text embedding based on the target text. The method also includes obtaining, by the computing system, an optimized text embedding based on the target text embedding and the input image and fine-tuning, by the computing system, a diffusion model based on the optimized text embedding. The method can further include interpolating, by the computing system, the target text embedding and the optimized text embedding to obtain an interpolated embedding and generating, by the computing system, an edited image including the desired edit using the diffusion model based on the input image and the interpolated embedding.

3.

发明申请
Depth Determination for Images Captured with a Moving Camera and Representing Moving Features 有权

公开(公告)号：US20210090279A1

公开(公告)日：2021-03-25

申请号：US16578215

申请日：2019-09-20

Applicant: Google LLC

Inventor： Tali Dekel , Forrester Cole , Ce Liu , William Freeman , Richard Tucker , Noah Snavely , Zhengqi Li

IPC: G06T7/579 , G06T7/246 , G06T7/73

Abstract: A method includes obtaining a reference image and a target image each representing an environment containing moving features and static features. The method also includes determining an object mask configured to mask out the moving features and preserves the static features in the target image. The method additionally includes determining, based on motion parallax between the reference image and the target image, a static depth image representing depth values of the static features in the target image. The method further includes generating, by way of a machine learning model, a dynamic depth image representing depth values of both the static features and the moving features in the target image. The model is trained to generate the dynamic depth image by determining depth values of at least the moving features based on the target image, the object mask, and the static depth image.

4.

发明授权
Re-timing objects in video via layered neural rendering 有权

公开(公告)号：US12243145B2

公开(公告)日：2025-03-04

申请号：US17927101

申请日：2020-05-22

Applicant: Google LLC

Inventor： Forrester H. Cole , Erika Lu , Tali Dekel , William T. Freeman , David Henry Salesin , Michael Rubinstein

IPC: G06T13/80 , G06V10/44 , G06V10/82 , G06V20/40 , G11B27/00 , G11B27/031

Abstract: A computer-implemented method for decomposing videos into multiple layers (212, 213) that can be re-combined with modified relative timings includes obtaining video data including a plurality of image frames (201) depicting one or more objects. For each of the plurality of frames, the computer-implemented method includes generating one or more object maps descriptive of a respective location of at least one object of the one or more objects within the image frame. For each of the plurality of frames, the computer-implemented method includes inputting the image frame and the one or more object maps into a machine-learned layer Tenderer model. (220) For each of the plurality of frames, the computer-implemented method includes receiving, as output from the machine-learned layer Tenderer model, a background layer illustrative of a background of the video data and one or more object layers respectively associated with one of the one or more object maps. The object layers include image data illustrative of the at least one object and one or more trace effects at least partially attributable to the at least one object such that the one or more object layers and the background layer can be re-combined with modified relative timings.

5.

发明授权
Audio-visual speech separation 有权

公开(公告)号：US11894014B2

公开(公告)日：2024-02-06

申请号：US17951002

申请日：2022-09-22

Applicant: Google LLC

Inventor： Inbar Mosseri , Michael Rubinstein , Ariel Ephrat , William Freeman , Oran Lang , Kevin William Wilson , Tali Dekel , Avinatan Hassidim

IPC: G10L25/57 , G10L15/16 , G10L21/10 , G10L21/18 , G06V20/40 , G06V40/16 , G10L15/25 , G06F18/214 , G10L17/18

CPC classification number: G10L25/57 , G06F18/214 , G06V20/41 , G06V40/161 , G10L15/16 , G10L15/25 , G10L17/18 , G10L21/10 , G10L21/18

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.

6.

发明申请
AUDIO-VISUAL SPEECH SEPARATION 有权

公开(公告)号：US20230122905A1

公开(公告)日：2023-04-20

申请号：US17951002

申请日：2022-09-22

Applicant: Google LLC

Inventor： Inbar Mosseri , Michael Rubinstein , Ariel Ephrat , William Freeman , Oran Lang , Kevin William Wilson , Tali Dekel , Avinatan Hassidim

IPC: G10L21/10 , G10L15/16 , G10L21/18 , G06V20/40 , G06V40/16 , G10L15/25 , G06F18/214

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.

7.

发明公开
Re-Timing Objects in Video Via Layered Neural Rendering 审中-公开

公开(公告)号：US20230206955A1

公开(公告)日：2023-06-29

申请号：US17927101

申请日：2020-05-22

Applicant: Google LLC

Inventor： Forrester H. Cole , Erika Lu , Tali Dekel , William T. Freeman , David Henry Salesin , Michael Rubinstein

IPC: G11B27/00 , G06V10/82 , G06V20/40 , G11B27/031

CPC classification number: G11B27/005 , G06V10/82 , G06V20/46 , G11B27/031

Abstract: A computer-implemented method for decomposing videos into multiple layers (212, 213) that can be re-combined with modified relative timings includes obtaining video data including a plurality of image frames (201) depicting one or more objects. For each of the plurality of frames, the computer-implemented method includes generating one or more object maps descriptive of a respective location of at least one object of the one or more objects within the image frame. For each of the plurality of frames, the computer-implemented method includes inputting the image frame and the one or more object maps into a machine-learned layer Tenderer model. (220) For each of the plurality of frames, the computer-implemented method includes receiving, as output from the machine-learned layer Tenderer model, a background layer illustrative of a background of the video data and one or more object layers respectively associated with one of the one or more object maps. The object layers include image data illustrative of the at least one object and one or more trace effects at least partially attributable to the at least one object such that the one or more object layers and the background layer can be re-combined with modified relative timings.

8.

发明授权
Audio-visual speech separation 有权

公开(公告)号：US11456005B2

公开(公告)日：2022-09-27

申请号：US16761707

申请日：2018-11-21

Applicant: GOOGLE LLC

Inventor： Inbar Mosseri , Michael Rubinstein , Ariel Ephrat , William Freeman , Oran Lang , Kevin William Wilson , Tali Dekel , Avinatan Hassidim

IPC: G10L21/10 , G06K9/62 , G10L15/16 , G10L21/18 , G06V20/40 , G06V40/16

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.

9.

发明申请
AUDIO-VISUAL SPEECH SEPARATION 审中-公开

公开(公告)号：US20200335121A1

公开(公告)日：2020-10-22

申请号：US16761707

申请日：2018-11-21

Applicant: GOOGLE LLC

Inventor： Inbar Mosseri , Michael Rubinstein , Ariel Ephrat , William Freeman , Oran Lang , Kevin William Wilson , Tali Dekel , Avinatan Hassidim

IPC: G10L21/10 , G10L21/18 , G10L15/16 , G06K9/00 , G06K9/62

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.

10.

发明公开
Systems and Methods for Identifying and Extracting Object-Related Effects in Videos 审中-公开

公开(公告)号：US20240249523A1

公开(公告)日：2024-07-25

申请号：US18560609

申请日：2022-05-11

Applicant: Google LLC

Inventor： Forrester H. Cole , Andrew Zisserman , Tali Dekel , William Tafel Freeman , Erika Lu , Michael Rubinstein

IPC: G06V20/40 , G06T7/194 , G06T7/246 , G06T7/73 , G06V10/26 , G06V10/776 , G06V10/82

CPC classification number: G06V20/46 , G06T7/194 , G06T7/246 , G06T7/73 , G06V10/26 , G06V10/776 , G06V10/82 , G06T2207/10016 , G06T2207/10024 , G06T2207/20081 , G06T2207/20084

Abstract: The present disclosure provides systems and methods for identifying and extracting object-related effects in videos. Given an ordinary video and a rough segmentation mask overtime of one or more subjects of interest, example systems proposed herein can estimate an omnimatte for each subject—an alpha matte and color image that includes the subject along with all its related time-varying scene elements. Example implementations of the proposed models can be trained only on the input video in a self-supervised manner, without any manual labels, and are generic. For example, the models can produce omnimattes automatically for arbitrary objects and a variety of effects.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification