Efficiently augmenting images with related content

    Publication No.: US11231832B2

    Publication date: 2022-01-25

    Application No.: US16069071

    Filing date: 2017-09-13

    Applicant: Google LLC

    Abstract: The subject matter of this specification generally relates to providing content related to text depicted in images. In one aspect, a system includes a data processing apparatus configured to extract text from an image. The extracted text is partitioned into multiple blocks. The multiple blocks are presented as respective first user-selectable targets on a user interface at a first zoom level. A user selection of a first block of the multiple blocks is detected. In response to detecting the user selection of the first block, portions of the extracted text in the first block are presented as respective second user-selectable targets on the user interface at a second zoom level greater than the first zoom level. In response to detecting a user selection of a portion of the extracted text within the first block, an action is initiated based on content of the user-selected text.
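
The two-level selection flow in this abstract can be illustrated with a short Python sketch. It is not taken from the patent: the `Word` and `Target` types, the vertical-gap heuristic in `partition_into_blocks`, the fit-to-viewport rule in `second_zoom_level`, and the phone-number check in `choose_action` are all assumptions made for illustration. The sketch also assumes the text has already been extracted (for example by OCR) into words with bounding boxes, measured in screen pixels at a first zoom level of 1.0.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One extracted word with its bounding box (hypothetical type)."""
    text: str
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Target:
    """A user-selectable region on the user interface (hypothetical type)."""
    label: str
    box: tuple  # (x, y, width, height)


def bounding_box(words):
    """Smallest box that encloses every word in the list."""
    x0 = min(wd.x for wd in words)
    y0 = min(wd.y for wd in words)
    x1 = max(wd.x + wd.w for wd in words)
    y1 = max(wd.y + wd.h for wd in words)
    return (x0, y0, x1 - x0, y1 - y0)


def partition_into_blocks(words, max_gap=5.0):
    """Assumed heuristic: start a new block when the vertical gap to the
    previously placed word exceeds max_gap."""
    blocks = []
    for word in sorted(words, key=lambda wd: (wd.y, wd.x)):
        if blocks:
            last = blocks[-1][-1]
            if word.y - (last.y + last.h) <= max_gap:
                blocks[-1].append(word)
                continue
        blocks.append([word])
    return blocks


def block_targets(blocks):
    """First-level targets: one per block, shown at the first zoom level."""
    return [Target(" ".join(wd.text for wd in b), bounding_box(b)) for b in blocks]


def portion_targets(block):
    """Second-level targets: one per word inside the selected block."""
    return [Target(wd.text, (wd.x, wd.y, wd.w, wd.h)) for wd in block]


def second_zoom_level(box, viewport_w, viewport_h):
    """Assumed rule: zoom so the selected block just fits the viewport."""
    _, _, w, h = box
    return min(viewport_w / w, viewport_h / h)


def choose_action(selected_text):
    """Assumed mapping from selected text to an action name."""
    text = selected_text.strip()
    if re.fullmatch(r"\+?[\d\s().-]{7,}", text):
        return ("call", text)
    return ("search", text)
```

In this sketch, selecting a block target would switch the view to `second_zoom_level(...)` and replace the block targets with the output of `portion_targets(...)`. Selecting one of those would then pass its label to `choose_action(...)`.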

    EFFICIENTLY AUGMENTING IMAGES WITH RELATED CONTENT

    Publication No.: US20210208741A1

    Publication date: 2021-07-08

    Application No.: US16069071

    Filing date: 2017-09-13

    Applicant: Google LLC

    Abstract: The subject matter of this specification generally relates to providing content related to text depicted in images. In one aspect, a system includes a data processing apparatus configured to extract text from an image. The extracted text is partitioned into multiple blocks. The multiple blocks are presented as respective first user-selectable targets on a user interface at a first zoom level. A user selection of a first block of the multiple blocks is detected. In response to detecting the user selection of the first block, portions of the extracted text in the first block are presented as respective second user-selectable targets on the user interface at a second zoom level greater than the first zoom level. In response to detecting a user selection of a portion of the extracted text within the first block, an action is initiated based on content of the user-selected text.

    Instance Level Scene Recognition with a Vision Language Model

    Publication No.: US20250140006A1

    Publication date: 2025-05-01

    Application No.: US18620136

    Filing date: 2024-03-28

    Applicant: Google LLC

    Abstract: Systems and methods for image understanding can include one or more object recognition systems and one or more vision language models to generate an augmented language output that can be both scene-aware and object-aware. The systems and methods can process an input image with an object recognition model to generate an object recognition output descriptive of identification details for an object depicted in the input image. The systems and methods can include processing the input image with a vision language model to generate a language output descriptive of a predicted scene description. The object recognition output can then be utilized to augment the language output to generate an augmented language output that includes the scene understanding of the language output with the specificity of the object recognition output.
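
The abstract does not say how the object recognition output is merged into the language output, so the Python sketch below shows only one possible reading. The `Recognition` type, the `min_score` threshold, and the substitution rule are assumptions for illustration; both model outputs are taken as given. The rule replaces the first occurrence of a generic term in the scene description with the recognized instance name, and appends the name when the term does not appear.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Recognition:
    """Hypothetical object recognition output for one depicted object."""
    generic_term: str   # coarse class the scene description is likely to use
    instance_name: str  # specific identification from the recognition model
    score: float        # confidence in [0, 1]


def augment_caption(caption, recognitions, min_score=0.5):
    """Combine a scene description with instance-level identifications.

    Assumed rule: for each confident recognition, replace the first
    whole-word occurrence of the generic term with the instance name;
    if the term is absent, append the name after the caption.
    """
    augmented = caption
    for rec in recognitions:
        if rec.score < min_score:
            continue  # ignore low-confidence identifications
        pattern = re.compile(
            r"\b" + re.escape(rec.generic_term) + r"\b", re.IGNORECASE
        )
        # A callable replacement avoids backslash interpretation in the name.
        augmented, count = pattern.subn(
            lambda _match, name=rec.instance_name: name, augmented, count=1
        )
        if count == 0:
            augmented += f" Recognized: {rec.instance_name}."
    return augmented
```

For example, with the scene description "A dog running on a beach." and a confident recognition of the dog as a golden retriever, this sketch yields "A golden retriever running on a beach."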

    Efficiently Augmenting Images with Related Content

    Publication No.: US20250013351A1

    Publication date: 2025-01-09

    Application No.: US18887662

    Filing date: 2024-09-17

    Applicant: Google LLC

    Abstract: The subject matter of this specification generally relates to providing content related to text depicted in images. In one aspect, a system includes a data processing apparatus configured to extract text from an image. The extracted text is partitioned into multiple blocks. The multiple blocks are presented as respective first user-selectable targets on a user interface at a first zoom level. A user selection of a first block of the multiple blocks is detected. In response to detecting the user selection of the first block, portions of the extracted text in the first block are presented as respective second user-selectable targets on the user interface at a second zoom level greater than the first zoom level. In response to detecting a user selection of a portion of the extracted text within the first block, an action is initiated based on content of the user-selected text.

    Visual Citations for Information Provided in Response to Multimodal Queries

    Publication No.: US20240378237A1

    Publication date: 2024-11-14

    Application No.: US18314663

    Filing date: 2023-05-09

    Applicant: Google LLC

    Abstract: Result images are retrieved based on a similarity to a query image. A set of textual inputs is processed with a machine-learned language model to obtain a language output comprising textual content, wherein the set of textual inputs comprises textual content from source documents that include the result images, and a prompt associated with the query image. The language output and the result images are provided to a user computing device. Information is received descriptive of an indication by a user that a first result image is visually dissimilar to the query image. Textual content associated with the source document that includes the first result image from the set of textual inputs is removed. The set of textual inputs is processed with the machine-learned language model to obtain a refined language output. The refined language output is provided to the user computing device.
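
The refinement loop in this abstract can be sketched in Python as follows. This is a sketch under stated assumptions, not the patented implementation: `Result`, `build_inputs`, and `refine` are hypothetical names, image retrieval is assumed to have happened already, and `language_model` stands for any callable that maps a list of text inputs to a text output.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Result:
    """A retrieved result image and the text of its source document."""
    image_id: str
    source_text: str


def build_inputs(prompt: str, results: Sequence[Result]) -> List[str]:
    """Set of textual inputs: the prompt plus each source document's text."""
    return [prompt] + [r.source_text for r in results]


def refine(
    prompt: str,
    results: Sequence[Result],
    flagged_image_id: str,
    language_model: Callable[[List[str]], str],
) -> Tuple[List[Result], str]:
    """Drop the text tied to the image the user flagged as visually
    dissimilar, then run the model again on the remaining inputs."""
    kept = [r for r in results if r.image_id != flagged_image_id]
    return kept, language_model(build_inputs(prompt, kept))


def echo_model(inputs: List[str]) -> str:
    """Stand-in for a machine-learned language model; it only joins
    its inputs so the data flow is visible."""
    return " | ".join(inputs)
```

Calling `refine(...)` with the identifier of the flagged image returns the remaining results together with the refined language output, which would then be sent to the user computing device.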

    Instance level scene recognition with a vision language model

    Publication No.: US11978271B1

    Publication date: 2024-05-07

    Application No.: US18496402

    Filing date: 2023-10-27

    Applicant: Google LLC

    CPC classification number: G06V20/70 G06V10/764 G06V20/41

    Abstract: Systems and methods for image understanding can include one or more object recognition systems and one or more vision language models to generate an augmented language output that can be both scene-aware and object-aware. The systems and methods can process an input image with an object recognition model to generate an object recognition output descriptive of identification details for an object depicted in the input image. The systems and methods can include processing the input image with a vision language model to generate a language output descriptive of a predicted scene description. The object recognition output can then be utilized to augment the language output to generate an augmented language output that includes the scene understanding of the language output with the specificity of the object recognition output.

    Efficiently augmenting images with related content

    Publication No.: US11747960B2

    Publication date: 2023-09-05

    Application No.: US17563695

    Filing date: 2021-12-28

    Applicant: Google LLC

    Abstract: The subject matter of this specification generally relates to providing content related to text depicted in images. In one aspect, a system includes a data processing apparatus configured to extract text from an image. The extracted text is partitioned into multiple blocks. The multiple blocks are presented as respective first user-selectable targets on a user interface at a first zoom level. A user selection of a first block of the multiple blocks is detected. In response to detecting the user selection of the first block, portions of the extracted text in the first block are presented as respective second user-selectable targets on the user interface at a second zoom level greater than the first zoom level. In response to detecting a user selection of a portion of the extracted text within the first block, an action is initiated based on content of the user-selected text.
