-
Publication No.: US11562518B2
Publication date: 2023-01-24
Application No.: US17340671
Filing date: 2021-06-07
Applicant: Google LLC
Inventor: Tianhao Zhang, Weilong Yang, Honglak Lee, Hung-Yu Tseng, Irfan Aziz Essa, Lu Jiang
Abstract: A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
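The data flow named in this abstract (encode, attend to the instruction, edit the feature, decode) can be illustrated with a toy sketch. This is not the patented network: every function below is a hypothetical stand-in, the "image" is a small grid of floats, and the keyword matching on "left", "right", "brighten" and "darken" is an assumption made only so the example runs without a trained model.

```python
from typing import List, Tuple

Grid = List[List[float]]  # toy single-channel image or feature map


def encode_image(image: Grid) -> Grid:
    """Hypothetical image encoder: here the feature is just a copy of the pixels."""
    return [row[:] for row in image]


def instruction_attention(instruction: str, height: int, width: int) -> Tuple[Grid, float]:
    """Hypothetical instruction attention network.

    Returns a spatial feature (a mask saying where to edit) and a
    modification feature (a scalar saying what change to apply).
    """
    words = instruction.lower().split()
    if "left" in words:
        cols = set(range(0, width // 2))
    elif "right" in words:
        cols = set(range(width // 2, width))
    else:
        cols = set(range(width))
    spatial = [[1.0 if c in cols else 0.0 for c in range(width)] for _ in range(height)]
    if "brighten" in words:
        modification = 0.25
    elif "darken" in words:
        modification = -0.25
    else:
        modification = 0.0
    return spatial, modification


def edit_feature(feature: Grid, spatial: Grid, modification: float) -> Grid:
    """Combine the image feature with the spatial and modification features."""
    return [
        [f + s * modification for f, s in zip(f_row, s_row)]
        for f_row, s_row in zip(feature, spatial)
    ]


def decode_image(feature: Grid) -> Grid:
    """Hypothetical image decoder: clamp the feature back into pixel range [0, 1]."""
    return [[min(1.0, max(0.0, v)) for v in row] for row in feature]


def edit_image(image: Grid, instruction: str) -> Grid:
    """Run the four steps in the order the abstract lists them."""
    feature = encode_image(image)
    spatial, modification = instruction_attention(instruction, len(image), len(image[0]))
    edited = edit_feature(feature, spatial, modification)
    return decode_image(edited)
```

Calling `edit_image(image, "brighten the left half")` changes only the left columns of the grid, which mirrors the abstract's split between where the edit applies (the spatial feature) and what the edit does (the modification feature).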
-
Publication No.: US20210383584A1
Publication date: 2021-12-09
Application No.: US17340671
Filing date: 2021-06-07
Applicant: Google LLC
Inventor: Tianhao Zhang, Weilong Yang, Honglak Lee, Hung-Yu Tseng, Irfan Aziz Essa, Lu Jiang
Abstract: A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
-
Publication No.: US11900517B2
Publication date: 2024-02-13
Application No.: US18085487
Filing date: 2022-12-20
Applicant: Google LLC
Inventor: Tianhao Zhang, Weilong Yang, Honglak Lee, Hung-Yu Tseng, Irfan Aziz Essa, Lu Jiang
Abstract: A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
-
Publication No.: US20230177754A1
Publication date: 2023-06-08
Application No.: US18085487
Filing date: 2022-12-20
Applicant: Google LLC
Inventor: Tianhao Zhang, Weilong Yang, Honglak Lee, Hung-Yu Tseng, Irfan Aziz Essa, Lu Jiang
Abstract: A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
-
Publication No.: US20240212246A1
Publication date: 2024-06-27
Application No.: US18400629
Filing date: 2023-12-29
Applicant: Google LLC
Inventor: Tianhao Zhang, Weilong Yang, Honglak Lee, Hung-Yu Tseng, Irfan Aziz Essa, Lu Jiang
Abstract: A method for generating an output image from an input image and an input text instruction that specifies a location and a modification of an edit applied to the input image using a neural network is described. The neural network includes an image encoder, an image decoder, and an instruction attention network. The method includes receiving the input image and the input text instruction; extracting, from the input image, an input image feature that represents features of the input image using the image encoder; generating a spatial feature and a modification feature from the input text instruction using the instruction attention network; generating an edited image feature from the input image feature, the spatial feature and the modification feature; and generating the output image from the edited image feature using the image decoder.
-
Publication No.: US20250166135A1
Publication date: 2025-05-22
Application No.: US18951203
Filing date: 2024-11-18
Applicant: Google LLC
Inventor: Yu-Chuan Su, Hsin-Ping Huang, Ming-Hsuan Yang, Deqing Sun, Lu Jiang, Yukun Zhu, Xuhui Jia
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controllable video generation. One of the methods includes receiving a text prompt that specifies an object; receiving a control input that comprises an image that depicts a particular instance of the object; generating a video that comprises a respective video frame at each of a plurality of time steps in the video and that depicts the particular instance of the object. Generating the video includes, at each of the plurality of time steps: obtaining a text prompt embedding; obtaining a control input embedding; and generating the respective video frame at the time step using a video generation neural network while the video generation neural network is conditioned on the text prompt embedding and on the control input embedding.
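The per-time-step conditioning described in this abstract can be sketched as a simple loop. This is an illustrative toy under stated assumptions, not the claimed system: `embed_text`, `embed_control_image` and `generate_frame` are hypothetical stand-ins for a text encoder, an image encoder and the video generation neural network, and a "frame" is just a short vector of floats.

```python
from typing import List

Vector = List[float]
Image = List[List[float]]


def embed_text(prompt: str, dim: int = 4) -> Vector:
    """Hypothetical text encoder: a deterministic toy embedding of the prompt."""
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def embed_control_image(image: Image, dim: int = 4) -> Vector:
    """Hypothetical image encoder for the control input (the reference image)."""
    vec = [0.0] * dim
    flat = [v for row in image for v in row]
    for i, v in enumerate(flat):
        vec[i % dim] += v
    return vec


def generate_frame(t: int, text_emb: Vector, control_emb: Vector) -> Vector:
    """Stand-in for the video generation network, conditioned on both embeddings."""
    return [float(t) + a + b for a, b in zip(text_emb, control_emb)]


def generate_video(prompt: str, control_image: Image, num_steps: int) -> List[Vector]:
    """Produce one frame per time step, conditioning each on both embeddings."""
    frames = []
    for t in range(num_steps):
        text_emb = embed_text(prompt)                      # obtain text prompt embedding
        control_emb = embed_control_image(control_image)   # obtain control input embedding
        frames.append(generate_frame(t, text_emb, control_emb))
    return frames
```

In this sketch both embeddings are obtained inside the loop, following the wording of the abstract; a practical system might compute them once and reuse them, which the abstract does not rule out.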
-