-
Publication No.: US20240143700A1
Publication date: 2024-05-02
Application No.: US18409411
Filing date: 2024-01-10
Applicant: Google LLC
Inventor: Ariel Fuxman, Aleksei Timofeev, Zhen Li, Chun-Ta Lu, Manan Shah, Chen Sun, Krishnamurthy Viswanathan, Chao Jia
IPC: G06F18/24, G06F18/214, G06F18/2413
CPC classification number: G06F18/24, G06F18/214, G06F18/24147
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.
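The pipeline in this abstract can be illustrated with a minimal sketch. Everything below is hypothetical: `generate_phrases`, `embed_text`, and `embed_image` are toy stand-ins for the textual generator, textual embedding, and image embedding models; fusing the two embeddings by concatenation and using a nearest-centroid classifier are assumptions made for illustration, not details taken from the filing.

```python
from collections import defaultdict

# Hypothetical stand-ins for the three models named in the abstract.
def generate_phrases(image):
    # "Textual generator model": returns phrases describing the image.
    # Toy rule: label the image "bright" or "dark" from its mean pixel value.
    mean = sum(image) / len(image)
    return ["bright scene"] if mean >= 0.5 else ["dark scene"]

def embed_text(phrases):
    # "Textual embedding model": toy 2-d bag-of-words over a fixed vocabulary.
    vocab = ["bright", "dark"]
    words = " ".join(phrases).split()
    return [float(words.count(w)) for w in vocab]

def embed_image(image):
    # "Image embedding model": toy 2-d embedding (mean and max pixel value).
    return [sum(image) / len(image), max(image)]

def features(image):
    # Fuse both embeddings; concatenation is an assumption of this sketch.
    return embed_text(generate_phrases(image)) + embed_image(image)

def train(images, labels):
    # Nearest-centroid classifier over the fused features (illustrative only).
    sums, counts = {}, defaultdict(int)
    for image, label in zip(images, labels):
        f = features(image)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[label] += 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def classify(centroids, image):
    # The image alone is the input; predicted text is derived from it.
    f = features(image)
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(f, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

# Toy "images" are flat lists of pixel intensities in [0, 1].
images = [[0.9, 0.8, 0.7], [0.95, 0.9, 0.85], [0.1, 0.2, 0.05], [0.0, 0.1, 0.15]]
labels = ["day", "day", "night", "night"]
centroids = train(images, labels)
print(classify(centroids, [0.8, 0.9, 0.6]))
```

Note that `classify` takes only the image: the predicted text is regenerated from the image at inference time, which matches the abstract's statement that the classifier labels an image based on the image as input.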
-
Publication No.: US20240370487A1
Publication date: 2024-11-07
Application No.: US18253859
Filing date: 2022-11-04
Applicant: Google LLC
Inventor: Severin Heiniger, Balint Miklos, Yun-Hsuan Sung, Zhen Li, Yinfei Yang, Chao Jia
IPC: G06F16/538, G06F16/55, G06N3/084
Abstract: Systems and methods of the present disclosure are directed to a computer-implemented method for machine-learned multimodal search refinement. The method includes obtaining a query image embedding for a query image and a textual query refinement associated with the query image. The method includes processing the query image embedding and the textual query refinement with a machine-learned query refinement model to obtain a refined query image embedding that incorporates the textual query refinement. The method includes evaluating a loss function that evaluates a distance between the refined query image embedding and an embedding for a ground truth image within an image embedding space. The method includes modifying value(s) of parameter(s) of the machine-learned query refinement model based on the loss function.
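The training loop this abstract describes can be sketched as follows. The sketch rests on several assumptions that do not come from the filing: the textual refinement is represented as a vector `t`, the refinement model is a single linear map `W` added to the query embedding, the distance is squared Euclidean, and the parameters are updated by plain gradient descent.

```python
def refine(W, q, t):
    # Hypothetical refinement model: refined[i] = q[i] + sum_j W[i][j] * t[j].
    return [qi + sum(wij * tj for wij, tj in zip(row, t)) for qi, row in zip(q, W)]

def loss(refined, target):
    # Squared Euclidean distance in the image embedding space (an assumption).
    return sum((r - g) ** 2 for r, g in zip(refined, target))

def train_step(W, q, t, target, lr=0.1):
    # One gradient-descent update of W; returns the loss before the update.
    refined = refine(W, q, t)
    for i, row in enumerate(W):
        err = 2.0 * (refined[i] - target[i])  # d(loss)/d(refined[i])
        for j in range(len(row)):
            row[j] -= lr * err * t[j]         # d(refined[i])/d(W[i][j]) = t[j]
    return loss(refined, target)

q = [1.0, 0.0]        # query image embedding (toy values)
t = [0.0, 1.0]        # embedding of the textual refinement (toy values)
target = [0.5, 0.5]   # embedding of the ground truth image (toy values)
W = [[0.0, 0.0], [0.0, 0.0]]

losses = [train_step(W, q, t, target) for _ in range(50)]
print(losses[0], losses[-1])
```

In this toy setup the refined embedding starts out equal to the query embedding and moves toward the ground truth embedding as `W` is updated, so the recorded loss shrinks with each step.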
-
Publication No.: US11907337B2
Publication date: 2024-02-20
Application No.: US17046313
Filing date: 2019-11-18
Applicant: Google LLC
Inventor: Ariel Fuxman, Aleksei Timofeev, Zhen Li, Chun-Ta Lu, Manan Shah, Chen Sun, Krishnamurthy Viswanathan, Chao Jia
IPC: G06K9/62, G06K9/46, G06F18/24, G06F18/214, G06F18/2413
CPC classification number: G06F18/24, G06F18/214, G06F18/24147
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.
-
Publication No.: US20210264203A1
Publication date: 2021-08-26
Application No.: US17046313
Filing date: 2019-11-18
Applicant: Google LLC
Inventor: Ariel Fuxman, Aleksei Timofeev, Zhen Li, Chun-Ta Lu, Manan Shah, Chen Sun, Krishnamurthy Viswanathan, Chao Jia
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for realizing a multimodal image classifier. In an aspect, a method includes, for each image of a plurality of images: processing the image by a textual generator model to obtain a set of phrases that are descriptive of the content of the image, wherein each phrase is one or more terms, processing the set of phrases by a textual embedding model to obtain an embedding of predicted text for the image, and processing the image using an image embedding model to obtain an embedding of image pixels of the image. Then a multimodal image classifier is trained on the embeddings of predicted text for the images and the embeddings of image pixels for the images to produce, as output, labels of an output taxonomy to classify an image based on the image as input.