Multi-scale Transformer for Image Analysis

    公开(公告)号:US20250124537A1

    公开(公告)日:2025-04-17

    申请号:US18999336

    申请日:2024-12-23

    Applicant: Google LLC

    Abstract: The technology employs a patch-based multi-scale Transformer (300) that is usable with various imaging applications. This avoids constraints on image fixed input size and predicts the quality effectively on a native resolution image. A native resolution image (304) is transformed into a multi-scale representation (302), enabling the Transformer's self-attention mechanism to capture information on both fine-grained detailed patches and coarse-grained global patches. Spatial embedding (316) is employed to map patch positions to a fixed grid, in which patch locations at each scale are hashed to the same grid. A separate scale embedding (318) is employed to distinguish patches coming from different scales in the multiscale representation. Self-attention (508) is performed to create a final image representation. In some instances, prior to performing self-attention, the system may prepend a learnable classification token (322) to the set of input tokens.

    METHODS, SYSTEMS, AND MEDIA FOR DETERMINING PERCEPTUAL QUALITY INDICATORS OF VIDEO CONTENT ITEMS

    公开(公告)号:US20230319327A1

    公开(公告)日:2023-10-05

    申请号:US18021636

    申请日:2022-06-08

    Applicant: Google LLC

    CPC classification number: H04N21/23418 H04N19/154 H04N21/4668

    Abstract: Methods, systems, and media for determining perceptual quality indicators of video content items are provided. In some embodiments, the method comprises: receiving a video content item; extracting a plurality of frames from the video content item; determining, using a first subnetwork of a deep neural network, a content quality indicator for each frame of the plurality of frames of the video content item; determining, using a second subnetwork of the deep neural network, a video distortion indicator for each frame of the plurality of frames of the video content item; determining, using a third subnetwork of the deep neural network, a compression sensitivity indicator for each frame of the plurality of frames of the video content item; generating a quality level for each frame of the plurality of frames of the video content item that concatenates the content quality indicator, the video distortion indicator, and the compression sensitivity indicator for that frame of the video content item; generating an overall quality level for video content item by aggregating the quality level of each frame of the plurality of frames; and causing a video recommendation to be presented based on the overall quality level of the video content item.

    EVALUATING VISUAL QUALITY OF DIGITAL CONTENT

    公开(公告)号:US20220301141A1

    公开(公告)日:2022-09-22

    申请号:US17612372

    申请日:2020-08-06

    Applicant: GOOGLE LLC

    Abstract: Systems, devices, methods, and computer readable medium for evaluating visual quality of digital content are disclosed. Methods can include training machine learning models on images. A request is received to evaluate quality of an image included in a current version of a digital component generated by the computing device. The machine learning models are deployed on the image to generate a score for each quality characteristic of the image. A weight is assigned to each score to generate weighted scores. The weighted scores are combined to generate a combined score for the image. The combined score is compared to one or more thresholds to generate a quality of the image.

Patent Agency Ranking