Publication No.: US20240096001A1
Publication Date: 2024-03-21
Application No.: US18013983
Filing Date: 2022-11-15
Applicant: Google LLC
Inventor: Seyed Mohammad Mehdi Sajjadi, Henning Meyer, Etienne François Régis Pot, Urs Michael Bergmann, Klaus Greff, Noha Radwan, Suhani Deepak-Ranu Vora, Mario Lučić, Daniel Christopher Duckworth, Thomas Allen Funkhouser, Andrea Tagliasacchi
Abstract: Provided are machine learning models that generate geometry-free neural scene representations through efficient object-centric novel-view synthesis. In particular, one example aspect of the present disclosure provides a novel framework in which an encoder model (e.g., an encoder transformer network) processes one or more RGB images (with or without pose) to produce a fully latent scene representation that can be passed to a decoder model (e.g., a decoder transformer network). Given one or more target poses, the decoder model can synthesize images in a single forward pass. In some example implementations, because transformers are used rather than convolutional or MLP networks, the encoder can learn an attention model that extracts enough 3D information about a scene from a small set of images to render novel views with correct projections, parallax, occlusions, and even semantics, without explicit geometry.
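The encoder/decoder flow in the abstract can be illustrated with a minimal NumPy sketch. It is not the applicant's implementation. It assumes a single attention layer per side, random untrained weights, no positional encodings, and a hypothetical pinhole camera (OpenGL-style, looking along -z) that turns a target pose into per-pixel rays. Every name here (`init_params`, `patchify`, `encode`, `pose_to_rays`, `decode`, the width `D`) is hypothetical. The encoder maps context-image patches to a set of latent tokens. The decoder turns each target ray into a query, cross-attends to those latents, and predicts a color, so all pixels of a novel view come out of one batched pass without any explicit geometry. The object-centric aspect of the disclosure is not modeled.

```python
import numpy as np

D = 32  # hypothetical token width


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention: q (n, d), k (m, d), v (m, d) -> (n, d)
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def init_params(patch_dim, d=D, seed=0):
    # Random, untrained weights; a real model would learn these end to end.
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "patch_embed": rng.normal(0, s, (patch_dim, d)),
        "enc_qkv": rng.normal(0, s, (3, d, d)),
        "ray_embed": rng.normal(0, s, (6, d)),
        "dec_q": rng.normal(0, s, (d, d)),
        "dec_kv": rng.normal(0, s, (2, d, d)),
        "to_rgb": rng.normal(0, s, (d, 3)),
    }


def patchify(images, p):
    # (n_views, H, W, 3) -> (n_views * (H // p) * (W // p), p * p * 3)
    n, h, w, c = images.shape
    x = images.reshape(n, h // p, p, w // p, p, c).transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(-1, p * p * c)


def encode(images, params, p=4):
    # Encoder: one self-attention layer with a residual connection over all
    # patches from all context views, giving a set-latent scene representation.
    tokens = patchify(images, p) @ params["patch_embed"]
    wq, wk, wv = params["enc_qkv"]
    return tokens + attention(tokens @ wq, tokens @ wk, tokens @ wv)


def pose_to_rays(rotation, translation, h, w, focal):
    # Hypothetical pinhole camera: pixel grid -> camera-space directions -> world space.
    j, i = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    dirs_cam = np.stack(
        [(i - w / 2) / focal, -(j - h / 2) / focal, -np.ones_like(i, dtype=float)],
        axis=-1,
    )
    dirs = dirs_cam.reshape(-1, 3) @ rotation.T  # rotation is camera-to-world
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    origins = np.broadcast_to(translation, dirs.shape)
    return origins, dirs


def decode(latents, ray_origins, ray_dirs, params):
    # Decoder: each target ray becomes a query that cross-attends to the latents;
    # all rays are processed together, i.e. a single forward pass per view.
    rays = np.concatenate([ray_origins, ray_dirs], axis=-1)  # (num_rays, 6)
    q = (rays @ params["ray_embed"]) @ params["dec_q"]
    wk, wv = params["dec_kv"]
    h = attention(q, latents @ wk, latents @ wv)
    return 1.0 / (1.0 + np.exp(-(h @ params["to_rgb"])))  # sigmoid -> RGB in (0, 1)


# Usage: two 16x16 context views, render an 8x8 novel view from an identity pose.
images = np.random.default_rng(1).random((2, 16, 16, 3))
params = init_params(patch_dim=4 * 4 * 3)
latents = encode(images, params)
origins, dirs = pose_to_rays(np.eye(3), np.zeros(3), 8, 8, focal=8.0)
novel_view = decode(latents, origins, dirs, params).reshape(8, 8, 3)
```

Because the decoder only ever sees the latent set and a ray, rendering a different target pose means calling `pose_to_rays` and `decode` again. The encoder does not need to be rerun.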