-
Publication No.: US20230131228A1
Publication date: 2023-04-27
Application No.: US17922531
Filing date: 2020-05-19
Applicant: Google LLC
Inventor: Yilin Wang , Balineedu Adsumilli , Feng Yang
Abstract: A method includes training a first model to measure the banding artefacts, training a second model to deband the image, and generating a debanded image for the image using the second model. Training the first model can include selecting a first set of first training images, generating a banding edge map for a first training image, where the map includes weights that emphasize banding edges and de-emphasize true edges in the first training image, and using the map and a luminance plane of the first training image as input to the first model. Training the second model can include selecting a second set of second training images, generating a debanded training image for a second training image, generating a banding score for the debanded training image using the first model, and using the banding score in a loss function used in training the second model.
-
Publication No.: US20230091374A1
Publication date: 2023-03-23
Application No.: US17802060
Filing date: 2020-02-24
Applicant: Google LLC
Inventor: Qifei Wang , Alexander Kuznetsov , Alec Michael Go , Grace Chu , Eunyoung Kim , Feng Yang , Andrew Gerald Howard , Jeffrey M. Gilbert
IPC: G06V30/413 , G06V10/22
Abstract: The present disclosure is directed to object and/or character recognition for use in applications such as computer vision. Advantages of the present disclosure include lightweight functionality that can be used on devices such as smart phones. Aspects of the present disclosure include a sequential architecture where a lightweight machine-learned model can receive an image, detect whether an object is present in one or more regions of the image, and generate an output based on the detection. This output can be applied as a filter to remove image data that can be neglected for more memory intensive machine-learned models applied downstream.
-
Publication No.: US20220335560A1
Publication date: 2022-10-20
Application No.: US17764445
Filing date: 2019-05-12
Applicant: Google LLC
Inventor: Innfarn Yoo , Feng Yang , Xiyang Luo
Abstract: A computer-implemented method that provides watermark-based image reconstruction to compensate for lossy encoding schemes. The method can generate a difference image describing the data loss associated with encoding an image using a lossy encoding scheme. The difference image can be encoded as a message and embedded in the encoded image using a watermark and later extracted from the encoded image. The difference image can be added to the encoded image to reconstruct the original image. As an example, an input image encoded using a lossy JPEG compression scheme can be embedded with the lost data and later reconstructed, using the embedded data, to a fidelity level that is identical or substantially similar to the original.
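The round trip described in this abstract can be illustrated with a minimal sketch. It assumes a flat list of 8-bit pixel values and uses coarse quantization as a hypothetical stand-in for a lossy codec such as JPEG. All function names are illustrative, and the watermark embedding and extraction steps are not modeled.

```python
def lossy_encode(pixels, step=16):
    """Hypothetical stand-in for a lossy codec: coarse quantization."""
    return [min(255, round(p / step) * step) for p in pixels]


def difference_image(original, encoded):
    """Per-pixel data lost by the encoding (original minus encoded)."""
    return [o - e for o, e in zip(original, encoded)]


def reconstruct(encoded, diff):
    """Add the recovered difference back onto the encoded image."""
    return [e + d for e, d in zip(encoded, diff)]


original = [12, 200, 37, 255, 0, 129]
encoded = lossy_encode(original)
# In the described method this difference would travel inside the watermark.
diff = difference_image(original, encoded)
restored = reconstruct(encoded, diff)
```

Because the difference is stored exactly in this sketch, `restored` equals `original`. A real watermark has limited capacity, which is presumably why the abstract allows for a result that is only "substantially similar".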
-
Publication No.: US20200186836A1
Publication date: 2020-06-11
Application No.: US16210900
Filing date: 2018-12-05
Applicant: Google LLC
Inventor: Peyman Milanfar , Feng Yang , Sungjoon Choi
IPC: H04N19/625 , H04N19/124 , H04N19/184
Abstract: Methods are provided for sharpening or otherwise modifying compressed images without decompressing and re-encoding the images. An overall image quality is determined based on the source of the compressed image, the quantization table of the compressed image, or some other factor(s), and a set of scaling factors corresponding to the image quality is selected. The selected scaling factors are then applied to corresponding quantization factors of the image's quantization table or other parameters of the compressed image that describe the image contents of the compressed image. The scaling factors of a given set of scaling factors can be determined by a machine learning process that involves training the scaling factors based on training images determined by decompressing and then sharpening or otherwise modifying a source set of compressed images. These methods can provide improvements with respect to encoded image size and computational cost of the image modification method.
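A JPEG decoder multiplies each stored coefficient by the matching quantization table entry, so scaling the table changes the decoded frequency content without touching the coefficient data. The sketch below shows that step only. The frequency-dependent gains are a hypothetical placeholder for the learned scaling factors mentioned in the abstract, and the function name is illustrative.

```python
def scale_quant_table(qtable, scales):
    """Multiply each quantization factor by its scaling factor.

    Results are rounded and clamped to 1..255, the range of 8-bit
    JPEG quantization table entries.
    """
    return [
        [max(1, min(255, round(q * s))) for q, s in zip(q_row, s_row)]
        for q_row, s_row in zip(qtable, scales)
    ]


# Hypothetical inputs: a flat table and gains that grow with frequency index.
qtable = [[16] * 8 for _ in range(8)]
scales = [[1.0 + 0.05 * (u + v) for v in range(8)] for u in range(8)]
scaled = scale_quant_table(qtable, scales)
```

Here the DC entry is left unchanged while higher-frequency entries grow, which boosts high-frequency detail on decode. In the described method, a different set of factors would be selected per image quality level.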
-
Publication No.: US20250069382A1
Publication date: 2025-02-27
Application No.: US18726881
Filing date: 2023-01-05
Applicant: Google LLC
Inventor: Yinxiao Li , Zhengzhong Tu , Hossein Talebi , Han Zhang , Feng Yang , Peyman Milanfar
IPC: G06V10/82 , G06V10/764 , G06V10/77
Abstract: Provided are machine learning systems and models featuring resolution-flexible multi-axis attention blocks. In particular, the present disclosure provides example multi-axis MLP based architectures (example implementations of which can be generally referred to as MAXIM) that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. In some implementations, MAXIM can use a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, some example implementations of MAXIM can contain two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature mutual conditioning.
-
Publication No.: US12238322B2
Publication date: 2025-02-25
Application No.: US18008789
Filing date: 2022-01-11
Applicant: GOOGLE LLC
Inventor: Xiyang Luo , Feng Yang , Elnaz Barshan Tashnizi , Dake He , Ryan Matthew Haggarty , Michael Gene Goebel
IPC: H04N19/467 , G06T1/00
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for jointly training an encoder that generates a watermark and a decoder that decodes a data item encoded within the watermark. The training comprises obtaining a plurality of training images and data items. For each training image, a first watermark is generated using an encoder and a subsequent second watermark is generated by tiling two or more first watermarks. The training image is watermarked using the second watermark to generate a first error value and distortions are added to the watermarked image. A distortion detector predicts the distortions based on which the distorted image is modified. The modified image is decoded by the decoder to generate a predicted data item and a second error value. The training parameters of the encoder and decoder are adjusted based on the first and the second error value.
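The tiling step in this abstract can be sketched as follows, assuming the first watermark is a small 2D array of values. The function name and the repetition counts are illustrative, and the encoder, distortion detector, and decoder are not modeled.

```python
def tile_watermark(tile, reps_y, reps_x):
    """Build a larger watermark by repeating a small one in both directions."""
    return [row * reps_x for _ in range(reps_y) for row in tile]


first_watermark = [[1, 2],
                   [3, 4]]
second_watermark = tile_watermark(first_watermark, reps_y=2, reps_x=3)
```

Repeating the pattern means any sufficiently large crop of the watermarked image still contains at least one full copy of the first watermark.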
-
Publication No.: US12206914B2
Publication date: 2025-01-21
Application No.: US18021636
Filing date: 2022-06-08
Applicant: Google LLC
Inventor: Yilin Wang , Balineedu Adsumilli , Junjie Ke , Hossein Talebi , Joong Yim , Neil Birkbeck , Peyman Milanfar , Feng Yang
IPC: H04N21/266 , G06N3/045 , H04N17/02 , H04N19/154 , H04N21/234 , H04N21/434 , H04N21/44 , H04N21/466
Abstract: Methods, systems, and media for determining perceptual quality indicators of video content items are provided. In some embodiments, the method comprises: receiving a video content item; extracting a plurality of frames from the video content item; determining, using a first subnetwork of a deep neural network, a content quality indicator for each frame of the plurality of frames of the video content item; determining, using a second subnetwork of the deep neural network, a video distortion indicator for each frame of the plurality of frames of the video content item; determining, using a third subnetwork of the deep neural network, a compression sensitivity indicator for each frame of the plurality of frames of the video content item; generating a quality level for each frame of the plurality of frames of the video content item that concatenates the content quality indicator, the video distortion indicator, and the compression sensitivity indicator for that frame of the video content item; generating an overall quality level for the video content item by aggregating the quality level of each frame of the plurality of frames; and causing a video recommendation to be presented based on the overall quality level of the video content item.
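The concatenate-then-aggregate steps can be sketched as below. The three indicator vectors are hard-coded stand-ins for the subnetwork outputs, and the element-wise mean is an assumed aggregation, since the abstract does not say how frames are combined. Function names are illustrative.

```python
def frame_quality(content, distortion, compression):
    """Concatenate the three per-frame indicator vectors into one vector."""
    return list(content) + list(distortion) + list(compression)


def overall_quality(frame_levels):
    """Aggregate per-frame vectors by element-wise mean (an assumed choice)."""
    n = len(frame_levels)
    return [sum(values) / n for values in zip(*frame_levels)]


# Hypothetical indicator values for two frames.
frames = [
    frame_quality([0.5], [0.25], [1.0]),
    frame_quality([1.5], [0.75], [0.0]),
]
overall = overall_quality(frames)
```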
-
Publication No.: US20230362399A1
Publication date: 2023-11-09
Application No.: US18008789
Filing date: 2022-01-11
Applicant: GOOGLE LLC
Inventor: Xiyang Luo , Feng Yang , Elnaz Barshan Tashnizi , Dake He , Ryan Matthew Haggarty , Michael Gene Goebel
IPC: H04N19/467 , G06T1/00
CPC classification number: H04N19/467 , G06T1/0021
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for jointly training an encoder that generates a watermark and a decoder that decodes a data item encoded within the watermark. The training comprises obtaining a plurality of training images and data items. For each training image, a first watermark is generated using an encoder and a subsequent second watermark is generated by tiling two or more first watermarks. The training image is watermarked using the second watermark to generate a first error value and distortions are added to the watermarked image. A distortion detector predicts the distortions based on which the distorted image is modified. The modified image is decoded by the decoder to generate a predicted data item and a second error value. The training parameters of the encoder and decoder are adjusted based on the first and the second error value.
-
Publication No.: US20230267307A1
Publication date: 2023-08-24
Application No.: US18014314
Filing date: 2020-07-23
Applicant: Google LLC
Inventor: Qifei Wang , Junjie Ke , Grace Chu , Gabriel Mintzer Bender , Luciano Sbaiz , Feng Yang , Andrew Gerald Howard , Alec Michael Go , Jeffrey M. Gilbert , Peyman Milanfar , Joshua William Charles Greaves
Abstract: Systems and methods of the present disclosure are directed to a method for generating a machine-learned multitask model configured to perform tasks. The method can include obtaining a machine-learned multitask search model comprising candidate nodes. The method can include obtaining tasks and machine-learned task controller models associated with the tasks. As an example, for a task, the method can include using the task controller model to route a subset of the candidate nodes in a machine-learned task submodel for the corresponding task. The method can include inputting task input data to the task submodel to obtain a task output. The method can include generating, using the task output, a feedback value based on an objective function. The method can include adjusting parameters of the task controller model based on the feedback value.
-
Publication No.: US20230222623A1
Publication date: 2023-07-13
Application No.: US17787699
Filing date: 2021-07-01
Applicant: Google LLC
Inventor: Junjie Ke , Feng Yang , Qifei Wang , Yilin Wang , Peyman Milanfar
CPC classification number: G06T3/0012 , G06T3/40 , G06T7/0002 , G06T2207/30168 , G06T2207/20081 , G06T2207/20016
Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids fixed input-size constraints on images and predicts quality effectively on a native resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multiscale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.
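One way to hash patch positions from different scales onto a shared grid is proportional integer scaling, sketched below. The formula, the grid size of 10, and the function name are assumptions for illustration; the abstract does not specify the exact mapping.

```python
def hash_to_grid(i, j, rows, cols, grid_size):
    """Map patch (i, j) in a rows x cols patch layout to a fixed grid cell.

    Proportional scaling with floor division is an assumed choice.
    """
    return (i * grid_size // rows, j * grid_size // cols)


# The same relative location at two scales lands in the same grid cell.
coarse_cell = hash_to_grid(1, 1, rows=2, cols=2, grid_size=10)
fine_cell = hash_to_grid(2, 2, rows=4, cols=4, grid_size=10)
```

Each grid cell would then index a learnable spatial embedding, so patches from any resolution share the same position vocabulary, while the separate scale embedding tells the scales apart.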