Few-shot synthesis of talking heads

    Publication No.: US12026833B2

    Publication date: 2024-07-02

    Application No.: US17310678

    Filing date: 2020-10-28

    Applicant: Google LLC

    IPC classification: G06T17/20 G06T7/40 G06T15/04

    Abstract: Systems and methods are described for utilizing an image processing system with at least one processing device to perform operations including receiving a plurality of input images of a user and generating a three-dimensional mesh proxy based on a first set of features extracted from the plurality of input images and a second set of features extracted from the plurality of input images. The method may further include generating a neural texture based on the three-dimensional mesh proxy and the plurality of input images, generating a representation of the user including at least the neural texture, and sampling at least one portion of the neural texture from the three-dimensional mesh proxy. In response to providing the at least one sampled portion to a neural renderer, the method may include receiving, from the neural renderer, a synthesized image of the user that was not previously captured by the image processing system.
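The sampling step in the abstract can be illustrated with a small sketch. The abstract does not say how the neural texture is sampled, so the code below assumes a common approach: the mesh proxy supplies UV coordinates, and a multi-channel feature texture is read at those coordinates with bilinear interpolation. The names `sample_bilinear`, `sample_texture_at_uvs`, `neural_texture`, and `mesh_uvs` are hypothetical, and the texture values are made up for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed layout: texture[row][column][channel]; not specified by the patent abstract.
Texture = List[List[List[float]]]


def sample_bilinear(texture: Texture, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a feature vector at UV coordinates in [0, 1]."""
    height, width = len(texture), len(texture[0])
    # Clamp UV to [0, 1] and map to continuous texel coordinates.
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    features = []
    for c in range(len(texture[0][0])):
        # Interpolate along x on the two neighboring rows, then along y.
        top = texture[y0][x0][c] * (1 - fx) + texture[y0][x1][c] * fx
        bottom = texture[y1][x0][c] * (1 - fx) + texture[y1][x1][c] * fx
        features.append(top * (1 - fy) + bottom * fy)
    return features


def sample_texture_at_uvs(
    texture: Texture, uvs: Sequence[Tuple[float, float]]
) -> List[List[float]]:
    """Sample the texture at each UV coordinate taken from the mesh proxy."""
    return [sample_bilinear(texture, u, v) for u, v in uvs]


# Hypothetical 2x2 texture with two feature channels per texel.
neural_texture = [
    [[0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 0.0]],
]
# Hypothetical UV coordinates of three points on the mesh proxy.
mesh_uvs = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
features = sample_texture_at_uvs(neural_texture, mesh_uvs)
```

In a full system the sampled features would be passed to the neural renderer. That stage is omitted here because the abstract gives no detail about it.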

Methods, systems, and media for relighting images using predicted deep reflectance fields

    Publication No.: US20200372284A1

    Publication date: 2020-11-26

    Application No.: US16616235

    Filing date: 2019-10-16

    Applicant: Google LLC

    Abstract: Methods, systems, and media for relighting images using predicted deep reflectance fields are provided. In some embodiments, the method comprises: identifying a group of training samples, wherein each training sample includes (i) a group of one-light-at-a-time (OLAT) images that have each been captured when one light of a plurality of lights arranged on a lighting structure has been activated, (ii) a group of spherical color gradient images that have each been captured when the plurality of lights arranged on the lighting structure have been activated to each emit a particular color, and (iii) a lighting direction, wherein each image in the group of OLAT images and each of the spherical color gradient images are an image of a subject, and wherein the lighting direction indicates a relative orientation of a light to the subject; training a convolutional neural network using the group of training samples, wherein training the convolutional neural network comprises: for each training iteration in a series of training iterations and for each training sample in the group of training samples: generating an output predicted image, wherein the output predicted image is a representation of the subject associated with the training sample with lighting from the lighting direction associated with the training sample; identifying a ground-truth OLAT image included in the group of OLAT images for the training sample that corresponds to the lighting direction for the training sample; calculating a loss that indicates a perceptual difference between the output predicted image and the identified ground-truth OLAT image; and updating parameters of the convolutional neural network based on the calculated loss; identifying a test sample that includes a second group of spherical color gradient images and a second lighting direction; and generating a relit image of the subject included in each of the second group of spherical color gradient images with lighting from the second lighting direction using the trained convolutional neural network.
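The control flow of the training procedure in the abstract can be sketched as a toy loop. This is not the patented network. As labeled assumptions, a single learned gain per light stands in for the convolutional neural network, mean squared error stands in for the perceptual loss, an integer light index stands in for the lighting direction, and images are flattened lists of floats. All names (`TrainingSample`, `predict`, `train`) are hypothetical. Only the loop structure mirrors the abstract: for each iteration and each sample, predict an image, select the matching ground-truth OLAT image, compute a loss, and update the parameters.

```python
from dataclasses import dataclass
from typing import Dict, List

Image = List[float]  # flattened grayscale image; a simplification for illustration


@dataclass
class TrainingSample:
    olat_images: Dict[int, Image]  # light index -> image captured with only that light on
    gradient_images: List[Image]   # spherical color gradient images of the same subject
    light_index: int               # hypothetical stand-in for the lighting direction


def mean_image(images: List[Image]) -> Image:
    """Average several images pixel by pixel."""
    return [sum(pixels) / len(images) for pixels in zip(*images)]


def predict(gains: List[float], gradient_images: List[Image], light_index: int) -> Image:
    """Toy stand-in for the CNN: scale the mean gradient image by a per-light gain."""
    return [gains[light_index] * p for p in mean_image(gradient_images)]


def train(samples: List[TrainingSample], num_lights: int,
          iterations: int = 200, learning_rate: float = 0.5) -> List[float]:
    gains = [0.0] * num_lights
    for _ in range(iterations):
        for sample in samples:
            base = mean_image(sample.gradient_images)
            # Generate the output predicted image for this sample's lighting direction.
            predicted = predict(gains, sample.gradient_images, sample.light_index)
            # Identify the ground-truth OLAT image for the same lighting direction.
            target = sample.olat_images[sample.light_index]
            # Gradient of the mean squared error (stand-in for a perceptual loss)
            # with respect to the gain of this light.
            grad = sum(2.0 * (p - t) * b
                       for p, t, b in zip(predicted, target, base)) / len(target)
            # Update the parameters based on the calculated loss.
            gains[sample.light_index] -= learning_rate * grad
    return gains


# Made-up data: two lights, two-pixel images.
gradients = [[1.0, 0.5], [1.0, 0.5]]
olat = {0: [0.2, 0.1], 1: [0.8, 0.4]}
samples = [TrainingSample(olat, gradients, 0), TrainingSample(olat, gradients, 1)]
gains = train(samples, num_lights=2)

# Test sample: a second group of gradient images and a second lighting direction.
relit = predict(gains, [[2.0, 1.0], [2.0, 1.0]], light_index=1)
```

With this made-up data the learned gains approach 0.2 and 0.8, the ratios used to construct the OLAT images, so the relit test image is the test subject's mean gradient image scaled by the gain for light 1.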

Few-shot synthesis of talking heads

    Publication No.: US20220130111A1

    Publication date: 2022-04-28

    Application No.: US17310678

    Filing date: 2020-10-28

    Applicant: Google LLC

    IPC classification: G06T17/20 G06T15/04 G06T7/40

    Abstract: Systems and methods are described for utilizing an image processing system with at least one processing device to perform operations including receiving a plurality of input images of a user and generating a three-dimensional mesh proxy based on a first set of features extracted from the plurality of input images and a second set of features extracted from the plurality of input images. The method may further include generating a neural texture based on the three-dimensional mesh proxy and the plurality of input images, generating a representation of the user including at least the neural texture, and sampling at least one portion of the neural texture from the three-dimensional mesh proxy. In response to providing the at least one sampled portion to a neural renderer, the method may include receiving, from the neural renderer, a synthesized image of the user that was not previously captured by the image processing system.

Methods, systems, and media for relighting images using predicted deep reflectance fields

    Publication No.: US10997457B2

    Publication date: 2021-05-04

    Application No.: US16616235

    Filing date: 2019-10-16

    Applicant: Google LLC

    Abstract: Methods, systems, and media for relighting images using predicted deep reflectance fields are provided. In some embodiments, the method comprises: identifying a group of training samples, wherein each training sample includes (i) a group of one-light-at-a-time (OLAT) images that have each been captured when one light of a plurality of lights arranged on a lighting structure has been activated, (ii) a group of spherical color gradient images that have each been captured when the plurality of lights arranged on the lighting structure have been activated to each emit a particular color, and (iii) a lighting direction, wherein each image in the group of OLAT images and each of the spherical color gradient images are an image of a subject, and wherein the lighting direction indicates a relative orientation of a light to the subject; training a convolutional neural network using the group of training samples, wherein training the convolutional neural network comprises: for each training iteration in a series of training iterations and for each training sample in the group of training samples: generating an output predicted image, wherein the output predicted image is a representation of the subject associated with the training sample with lighting from the lighting direction associated with the training sample; identifying a ground-truth OLAT image included in the group of OLAT images for the training sample that corresponds to the lighting direction for the training sample; calculating a loss that indicates a perceptual difference between the output predicted image and the identified ground-truth OLAT image; and updating parameters of the convolutional neural network based on the calculated loss; identifying a test sample that includes a second group of spherical color gradient images and a second lighting direction; and generating a relit image of the subject included in each of the second group of spherical color gradient images with lighting from the second lighting direction using the trained convolutional neural network.