Debanding Using A Novel Banding Metric

    Publication No.: US20230131228A1

    Publication date: 2023-04-27

    Application No.: US17922531

    Filing date: 2020-05-19

    Applicant: Google LLC

    Abstract: A method includes training a first model to measure banding artefacts in an image, training a second model to deband the image, and generating a debanded image for the image using the second model. Training the first model can include selecting a first set of first training images, generating a banding edge map for a first training image, where the map includes weights that emphasize banding edges and de-emphasize true edges in the first training image, and using the map and a luminance plane of the first training image as input to the first model. Training the second model can include selecting a second set of second training images, generating a debanded training image for a second training image, generating a banding score for the debanded training image using the first model, and using the banding score in a loss function used in training the second model.
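The abstract says the banding score from the first model enters the loss used to train the second model, without giving the form of that loss. A minimal sketch of one plausible arrangement follows. It assumes a mean-squared-error fidelity term, a hypothetical `weight`, and a toy stand-in metric (`toy_banding_score`) in place of the trained first model; none of these come from the patent.

```python
def toy_banding_score(row):
    """Toy stand-in for the trained banding metric (assumption, not the patent's model).

    Returns the fraction of neighbouring pixel pairs that differ by a small
    step of 1 or 2 levels, which is how banding tends to appear in a flat gradient.
    """
    steps = [abs(a - b) for a, b in zip(row, row[1:])]
    return sum(1 for s in steps if 1 <= s <= 2) / len(steps)


def mse(a, b):
    """Mean squared error between two equally long pixel rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def debanding_loss(debanded, reference, banding_metric, weight=0.5):
    """Fidelity term plus a weighted banding penalty (the combination is an assumption)."""
    return mse(debanded, reference) + weight * banding_metric(debanded)


banded = [10, 10, 10, 12, 12, 12]            # one visible step in a flat region
smooth = [10, 10.4, 10.8, 11.2, 11.6, 12]    # the same ramp without a step

loss_banded = debanding_loss(banded, banded, toy_banding_score)
loss_smooth = debanding_loss(smooth, smooth, toy_banding_score)
```

With identical fidelity terms, the banded row is penalised and the smooth row is not, which is the pressure the second model would be trained under.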

    Watermark-Based Image Reconstruction

    Publication No.: US20220335560A1

    Publication date: 2022-10-20

    Application No.: US17764445

    Filing date: 2019-05-12

    Applicant: Google LLC

    Abstract: A computer-implemented method that provides watermark-based image reconstruction to compensate for lossy encoding schemes. The method can generate a difference image describing the data loss associated with encoding an image using a lossy encoding scheme. The difference image can be encoded as a message and embedded in the encoded image using a watermark and later extracted from the encoded image. The difference image can be added to the encoded image to reconstruct the original image. As an example, an input image encoded using a lossy JPEG compression scheme can be embedded with the lost data and later reconstructed, using the embedded data, to a fidelity level that is identical or substantially similar to the original.
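The difference-image arithmetic described above can be sketched in a few lines. The example assumes a simple quantizer (`lossy_roundtrip`) as a stand-in for a real lossy codec such as JPEG, and it leaves out the watermark embedding and extraction steps entirely; it only shows why adding the difference image back recovers the original.

```python
def lossy_roundtrip(pixels, step=16):
    """Stand-in for encode + decode with a lossy codec (assumption):
    each pixel is quantized down to a multiple of `step`."""
    return [[(p // step) * step for p in row] for row in pixels]


def difference_image(original, decoded):
    """Per-pixel data lost by the lossy encoding."""
    return [[o - d for o, d in zip(ro, rd)] for ro, rd in zip(original, decoded)]


def reconstruct(decoded, diff):
    """Add the recovered difference image back onto the decoded image."""
    return [[d + e for d, e in zip(rd, re)] for rd, re in zip(decoded, diff)]


original = [[12, 200, 37], [90, 255, 3]]
decoded = lossy_roundtrip(original)
diff = difference_image(original, decoded)   # this is what would be embedded as a watermark
restored = reconstruct(decoded, diff)
```

In a real system the difference image would itself have to survive the watermark channel, so the reconstruction may be only "substantially similar" rather than identical, as the abstract notes.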

    Adaptive DCT Sharpener
    Type: Patent application

    Publication No.: US20200186836A1

    Publication date: 2020-06-11

    Application No.: US16210900

    Filing date: 2018-12-05

    Applicant: Google LLC

    Abstract: Methods are provided for sharpening or otherwise modifying compressed images without decompressing and re-encoding the images. An overall image quality is determined based on the source of the compressed image, the quantization table of the compressed image, or some other factor(s), and a set of scaling factors corresponding to the image quality is selected. The selected scaling factors are then applied to corresponding quantization factors of the image's quantization table or other parameters of the compressed image that describe the image contents of the compressed image. The scaling factors of a given set of scaling factors can be determined by a machine learning process that involves training the scaling factors based on training images determined by decompressing and then sharpening or otherwise modifying a source set of compressed images. These methods can provide improvements with respect to encoded image size and computational cost of the image modification method.
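Because a decoder multiplies each stored coefficient by its quantization-table entry, scaling the table changes the decoded frequency content without touching the coefficient data. The sketch below illustrates that idea on a four-entry excerpt of a table. The scaling-factor values, the quality labels in `SCALE_SETS`, and the clamp to 1..255 (the range of 8-bit JPEG table entries) are illustrative assumptions; in the patent the factors are learned.

```python
# Hypothetical scaling-factor sets keyed by a coarse quality label.
# The first entry (the DC term) is left at 1.0 so overall brightness is unchanged.
SCALE_SETS = {
    "low": [1.0, 1.1, 1.2, 1.5],
    "high": [1.0, 1.0, 1.1, 1.2],
}


def scale_quant_table(qtable, scales):
    """Scale each quantization factor, keeping it a valid 8-bit table entry."""
    return [max(1, min(255, round(q * s))) for q, s in zip(qtable, scales)]


def dequantize(coeffs, qtable):
    """What a decoder does: multiply stored coefficients by the table."""
    return [c * q for c, q in zip(coeffs, qtable)]


qtable = [16, 11, 10, 16]   # excerpt of a quantization table
coeffs = [5, -2, 1, 3]      # stored quantized coefficients, never modified

sharpened_table = scale_quant_table(qtable, SCALE_SETS["low"])
before = dequantize(coeffs, qtable)
after = dequantize(coeffs, sharpened_table)
```

Only the table is rewritten, so the entropy-coded coefficient stream stays as it was, which is where the savings in computation and re-encoding loss would come from.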

    Machine Learning Models Featuring Resolution-Flexible Multi-Axis Attention Blocks

    Publication No.: US20250069382A1

    Publication date: 2025-02-27

    Application No.: US18726881

    Filing date: 2023-01-05

    Applicant: Google LLC

    Abstract: Provided are machine learning systems and models featuring resolution-flexible multi-axis attention blocks. In particular, the present disclosure provides example multi-axis MLP-based architectures (example implementations of which can be generally referred to as MAXIM) that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. In some implementations, MAXIM can use a UNet-shaped hierarchical structure and support long-range interactions enabled by spatially-gated MLPs. Specifically, some example implementations of MAXIM can contain two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature mutual conditioning.
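The abstract does not say how local and global cues are separated. One common way to obtain a local axis and a global axis from the same feature map is to group positions once by contiguous blocks and once by a strided grid. The sketch below shows only that grouping, as an assumption about what "multi-axis" could mean here; the function names are hypothetical and no gating or MLP is implemented.

```python
def block_partition(height, width, block):
    """Group positions into contiguous block x block windows (local neighbours)."""
    groups = {}
    for y in range(height):
        for x in range(width):
            groups.setdefault((y // block, x // block), []).append((y, x))
    return list(groups.values())


def grid_partition(height, width, grid):
    """Group positions into grid x grid sets strided across the whole map (global reach)."""
    stride_y, stride_x = height // grid, width // grid
    groups = {}
    for y in range(height):
        for x in range(width):
            groups.setdefault((y % stride_y, x % stride_x), []).append((y, x))
    return list(groups.values())


local_groups = block_partition(4, 4, block=2)
global_groups = grid_partition(4, 4, grid=2)
```

Each group has a fixed size regardless of the input resolution, which is one way mixing within groups could stay cheap as images grow.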

    End-to-end watermarking system
    Type: Granted patent

    Publication No.: US12238322B2

    Publication date: 2025-02-25

    Application No.: US18008789

    Filing date: 2022-01-11

    Applicant: GOOGLE LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for jointly training an encoder that generates a watermark and a decoder that decodes a data item encoded within the watermark. The training comprises obtaining a plurality of training images and data items. For each training image, a first watermark is generated using an encoder and a subsequent second watermark is generated by tiling two or more first watermarks. The training image is watermarked using the second watermark to generate a first error value and distortions are added to the watermarked image. A distortion detector predicts the distortions, and the distorted image is modified based on the predicted distortions. The modified image is decoded by the decoder to generate a predicted data item and a second error value. The training parameters of the encoder and decoder are adjusted based on the first and the second error value.
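The tiling step and the first error value can be illustrated with plain lists. This sketch assumes the watermark is applied by simple addition and that the first error value is the mean squared difference between the original and watermarked image; the abstract specifies neither, and the encoder, distortion detector, and decoder are not modelled.

```python
def tile(watermark, reps_y, reps_x):
    """Build the second watermark by repeating the first one reps_y x reps_x times."""
    return [row * reps_x for _ in range(reps_y) for row in watermark]


def apply_watermark(image, watermark):
    """Additive overlay (assumption): the watermark must match the image size."""
    return [[p + w for p, w in zip(ri, rw)] for ri, rw in zip(image, watermark)]


def image_error(original, watermarked):
    """Mean squared difference, used here as the 'first error value' (assumption)."""
    total = sum((o - w) ** 2
                for ro, rw in zip(original, watermarked)
                for o, w in zip(ro, rw))
    return total / (len(original) * len(original[0]))


first_watermark = [[1, -1], [0, 2]]            # stand-in for the encoder output
second_watermark = tile(first_watermark, 2, 2)

image = [[100] * 4 for _ in range(4)]
watermarked = apply_watermark(image, second_watermark)
first_error = image_error(image, watermarked)
```

Tiling means a crop of the watermarked image can still contain at least one full copy of the first watermark, which is a plausible reason for the repetition.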

    Methods, systems, and media for determining perceptual quality indicators of video content items

    Publication No.: US12206914B2

    Publication date: 2025-01-21

    Application No.: US18021636

    Filing date: 2022-06-08

    Applicant: Google LLC

    Abstract: Methods, systems, and media for determining perceptual quality indicators of video content items are provided. In some embodiments, the method comprises: receiving a video content item; extracting a plurality of frames from the video content item; determining, using a first subnetwork of a deep neural network, a content quality indicator for each frame of the plurality of frames of the video content item; determining, using a second subnetwork of the deep neural network, a video distortion indicator for each frame of the plurality of frames of the video content item; determining, using a third subnetwork of the deep neural network, a compression sensitivity indicator for each frame of the plurality of frames of the video content item; generating a quality level for each frame of the plurality of frames of the video content item that concatenates the content quality indicator, the video distortion indicator, and the compression sensitivity indicator for that frame of the video content item; generating an overall quality level for the video content item by aggregating the quality level of each frame of the plurality of frames; and causing a video recommendation to be presented based on the overall quality level of the video content item.
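The concatenate-then-aggregate flow in this abstract can be sketched as follows. The three subnetworks are replaced by hard-coded indicator vectors, and the aggregation is taken to be a per-dimension mean over frames; the abstract only says "aggregating", so the mean is an assumption.

```python
def frame_quality(content, distortion, compression):
    """Concatenate the three per-frame indicators into one quality vector."""
    return list(content) + list(distortion) + list(compression)


def overall_quality(frame_levels):
    """Aggregate per-frame vectors by averaging each dimension (assumed aggregation)."""
    count = len(frame_levels)
    return [sum(column) / count for column in zip(*frame_levels)]


# Stand-in subnetwork outputs for two frames (made-up values for illustration).
frames = [
    frame_quality(content=[0.75], distortion=[0.25], compression=[0.5]),
    frame_quality(content=[0.25], distortion=[0.75], compression=[0.5]),
]
video_level = overall_quality(frames)
```

A downstream recommender would then consume `video_level`; how it does so is outside what the abstract describes.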

    END-TO-END WATERMARKING SYSTEM
    Type: Published application

    Publication No.: US20230362399A1

    Publication date: 2023-11-09

    Application No.: US18008789

    Filing date: 2022-01-11

    Applicant: GOOGLE LLC

    CPC classification number: H04N19/467 G06T1/0021

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for jointly training an encoder that generates a watermark and a decoder that decodes a data item encoded within the watermark. The training comprises obtaining a plurality of training images and data items. For each training image, a first watermark is generated using an encoder and a subsequent second watermark is generated by tiling two or more first watermarks. The training image is watermarked using the second watermark to generate a first error value and distortions are added to the watermarked image. A distortion detector predicts the distortions, and the distorted image is modified based on the predicted distortions. The modified image is decoded by the decoder to generate a predicted data item and a second error value. The training parameters of the encoder and decoder are adjusted based on the first and the second error value.

    MULTI-SCALE TRANSFORMER FOR IMAGE ANALYSIS
    Type: Published application

    Publication No.: US20230222623A1

    Publication date: 2023-07-13

    Application No.: US17787699

    Filing date: 2021-07-01

    Applicant: Google LLC

    Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids fixed input-size constraints on images and allows quality to be predicted effectively on a native-resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multiscale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.
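Hashing patch positions from every scale onto one fixed grid can be sketched with integer arithmetic. The proportional mapping used below and the grid size of 10 are assumptions chosen for illustration; the abstract states only that locations at each scale are hashed to the same grid.

```python
def hash_to_grid(row, col, rows, cols, grid=10):
    """Map patch (row, col) in a rows x cols patch layout to a cell of a grid x grid table.

    Proportional mapping (assumption): positions keep their relative location
    in the image, whatever the number of patches at that scale.
    """
    return (row * grid // rows, col * grid // cols)


def embedding_index(cell, grid=10):
    """Flatten a grid cell into an index for a table of learnable spatial embeddings."""
    return cell[0] * grid + cell[1]


# The same relative image location at two scales of a hypothetical image:
fine_cell = hash_to_grid(2, 4, rows=4, cols=6)     # fine scale: 4 x 6 patches
coarse_cell = hash_to_grid(1, 2, rows=2, cols=3)   # coarse scale: 2 x 3 patches
```

Both patches land in the same cell and would share a spatial embedding, which is why a separate scale embedding is needed to tell them apart.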
