-
Publication No.: US20250131984A1
Publication Date: 2025-04-24
Application No.: US18687059
Filing Date: 2022-08-29
Applicant: Google LLC
Inventor: Andrew Walker Carroll, Gunjan Baid, Pi-Chuan Chang, Daniel Elwood Cook, Maria Nattestad, Taedong Yun, Cory Yuen Fu McLean, MD Kishwar Shafin, Jean-Philippe Vert, Quentin Didier Olivier Berthet, Felipe Llinares López, Ashish Teku Vaswani
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence error correction using neural networks.
-
Publication No.: US20240404238A1
Publication Date: 2024-12-05
Application No.: US18698997
Filing Date: 2022-10-05
Applicant: Google LLC
Inventor: Jiahui Yu, Vijay Vasudevan, Alexander Yeong-Shiuh Ku, Yonghui Wu, Jason Michael Baldridge, Yuanzhong Xu, Jing Yu Koh, Thang Minh Luong, Gunjan Baid, Zirui Wang, Han Zhang, Xin Li
IPC: G06V10/28, G06F40/284, G06V10/764, G06V10/766, G06V10/82
Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pre-training a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.
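The abstract above refers to encoding an image as discrete tokens drawn from a codebook and predicting those tokens in raster order. The following is only a minimal illustrative sketch of that general idea, not the claimed method: it assumes a plain nearest-neighbor lookup under squared Euclidean distance, and the names `squared_distance`, `quantize`, `rasterize_tokens`, the toy `codebook`, and the toy `patch_grid` are all hypothetical. The listing does not specify the encoder, the distance measure, or any of the codebook improvements.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `vector`.

    Assumption: nearest-neighbor lookup under squared Euclidean distance.
    """
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )


def rasterize_tokens(
    patch_grid: Sequence[Sequence[Vector]],
    codebook: Sequence[Vector],
) -> List[int]:
    """Quantize every patch embedding and flatten the grid row by row
    (raster order), giving the token sequence an autoregressive model
    would be trained to predict."""
    return [quantize(vec, codebook) for row in patch_grid for vec in row]


# Hypothetical toy data: a 3-entry codebook and a 2x2 grid of patch embeddings.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
patch_grid = [
    [[0.1, -0.1], [0.9, 0.2]],
    [[0.2, 0.8], [1.1, -0.1]],
]

tokens = rasterize_tokens(patch_grid, codebook)
print(tokens)  # [0, 1, 2, 1]
```

In this toy setup, an autoregressive model would be trained to predict each entry of `tokens` from the entries before it.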
-
Publication No.: US20240112088A1
Publication Date: 2024-04-04
Application No.: US18520083
Filing Date: 2023-11-27
Applicant: Google LLC
Inventor: Jiahui Yu, Xin Li, Han Zhang, Vijay Vasudevan, Alexander Yeong-Shiuh Ku, Jason Michael Baldridge, Yuanzhong Xu, Jing Yu Koh, Thang Minh Luong, Gunjan Baid, Zirui Wang, Yonghui Wu
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pretraining a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.
-