-
Publication No.: US20230222285A1
Publication Date: 2023-07-13
Application No.: US17928984
Application Date: 2020-12-22
Applicant: Google LLC
Inventor: Mingyang Zhang, Cheng Li, Tao Chen, Spurthi Amba Hombaiah, Michael Bendersky, Marc Alexander Najork, Te-Lin Wu
IPC: G06F40/166, G06F40/284, G06V30/413, G06F40/109
CPC classification number: G06F40/166, G06F40/284, G06V30/413, G06F40/109
Abstract: Systems and methods for document processing that can process and understand the layout, text size, text style, and multimedia of a document can generate more accurate and informed document representations. The layout of a document paired with text size and style can indicate what portions of a document are possibly more important, and the understanding of that importance can help with understanding of the document. Systems and methods utilizing a hierarchical framework that processes the block-level and the document-level of a document can capitalize on these indicators to generate a better document representation.
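The abstract describes a two-level (block, then document) framework in which text size and style signal which parts of a document matter more. The sketch below illustrates that idea only. The hashed bag-of-words block encoder, the `salience` heuristic and the softmax pooling are all hypothetical stand-ins chosen for illustration, not the learned models the patent covers.

```python
import math
import zlib
from dataclasses import dataclass

DIM = 16  # hypothetical vector size


@dataclass
class Block:
    text: str
    font_size: float  # in points
    bold: bool


def block_vector(block: Block) -> list[float]:
    """Block level: hashed bag-of-words, a stand-in for a learned block encoder."""
    vec = [0.0] * DIM
    for token in block.text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    return vec


def salience(block: Block) -> float:
    """Hypothetical heuristic: larger and bold text scores as more important."""
    return block.font_size / 12.0 + (1.0 if block.bold else 0.0)


def document_vector(blocks: list[Block]) -> tuple[list[float], list[float]]:
    """Document level: softmax-weighted sum of block vectors, plus the weights."""
    if not blocks:
        return [0.0] * DIM, []
    vectors = [block_vector(b) for b in blocks]
    scores = [salience(b) for b in blocks]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    doc = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(DIM)]
    return doc, weights


blocks = [
    Block("Quarterly results", font_size=24, bold=True),
    Block("Revenue grew slightly in the third quarter", font_size=11, bold=False),
]
doc_vec, weights = document_vector(blocks)
print([round(w, 3) for w in weights])
```

The large bold heading receives most of the pooling weight, so it dominates the document vector. A learned model would replace both the encoder and the heuristic.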
-
Publication No.: US20230401382A1
Publication Date: 2023-12-14
Application No.: US18249275
Application Date: 2021-10-19
Applicant: Google LLC
Inventor: Spurthi Amba Hombaiah, Mingyang Zhang, Michael Bendersky, Tao Chen, Marc Alexander Najork
IPC: G06F40/242, G06F40/40, G06F40/30, G06F40/284
CPC classification number: G06F40/242, G06F40/40, G06F40/30, G06F40/284
Abstract: Provided are systems and methods for incremental training of machine learning models to adapt to changes in an underlying data distribution. One example setting in which the techniques described herein may be beneficial is for incrementally training natural language models to enable the models to have or adapt to a dynamically changing vocabulary. Incremental training is provided as a feasible and inexpensive way of adapting machine learning models to evolving vocabulary without having to retrain them from scratch.
-
Publication No.: US11238058B2
Publication Date: 2022-02-01
Application No.: US17086564
Application Date: 2020-11-02
Applicant: Google LLC
Inventor: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang
IPC: G06F16/248, G06F16/2455, G06F16/951, G06F16/38
Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.
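The association step in the abstract can be sketched as follows. Two things are assumed for illustration: the entity facts are a toy dictionary (entity to {label term: value}), and the set of entities already linked to the candidate label term is given as input. How that link is discovered lies outside this sketch. Function and variable names are hypothetical.

```python
# Hypothetical structured data: entity -> {label term: value}.
ENTITY_FACTS = {
    "Mount Everest": {"height": "8,849 m"},
    "K2": {"height": "8,611 m"},
    "Lake Baikal": {"depth": "1,642 m"},
}


def associate_candidate(label_term, candidate_entities, entity_facts):
    """Map each entity linked to both terms to (label term, value).

    candidate_entities: entities already associated with the candidate label term.
    """
    # Entities associated with the template's label term.
    label_entities = {e for e, facts in entity_facts.items() if label_term in facts}
    # Of those, keep the entities also associated with the candidate label term.
    shared = label_entities & set(candidate_entities)
    return {e: (label_term, entity_facts[e][label_term]) for e in sorted(shared)}


# Candidate trigger term "how tall", hypothetically linked to these entities.
triggers = {
    "how tall": associate_candidate(
        "height", {"Mount Everest", "K2", "Lake Baikal"}, ENTITY_FACTS
    )
}
print(triggers["how tall"])
```

"Lake Baikal" is dropped because it has no "height" fact, so the new trigger term only maps to entities for which the card's label and value actually exist.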
-
Publication No.: US12210837B2
Publication Date: 2025-01-28
Application No.: US18321424
Application Date: 2023-05-22
Applicant: Google LLC
Inventor: Liu Yang, Marc Najork, Michael Bendersky, Mingyang Zhang, Cheng Li
IPC: G06F40/30, G06F40/205, G06N3/045, G06N3/08
Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.
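The four steps in the abstract (obtain two documents, parse each into textual blocks, encode, compare) can be sketched as below. The patent uses a machine-learned semantic document encoding model. This sketch assumes blank-line-separated paragraphs as blocks, a hashed bag-of-words block encoder, mean pooling and cosine similarity, all hypothetical substitutes.

```python
import math
import zlib

DIM = 64  # hypothetical encoding size


def parse_blocks(document: str) -> list[str]:
    """Split a document into textual blocks (assumed: blank-line-separated paragraphs)."""
    return [b.strip() for b in document.split("\n\n") if b.strip()]


def encode_block(block: str) -> list[float]:
    """Hashed bag-of-words, a stand-in for a learned block encoder."""
    vec = [0.0] * DIM
    for tok in block.lower().split():
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    return vec


def encode_document(document: str) -> list[float]:
    """Mean-pool the block encodings into one document encoding."""
    blocks = [encode_block(b) for b in parse_blocks(document)]
    if not blocks:
        return [0.0] * DIM
    return [sum(col) / len(blocks) for col in zip(*blocks)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity(doc_a: str, doc_b: str) -> float:
    """Similarity metric computed from the two document encodings."""
    return cosine(encode_document(doc_a), encode_document(doc_b))


a = "Transformers encode text.\n\nAttention weighs tokens."
b = "Attention weighs tokens.\n\nTransformers encode text."
c = "Recipes for sourdough bread."
print(similarity(a, b), similarity(a, c))
```

Because mean pooling ignores block order, `a` and `b` score as identical. A learned hierarchical encoder could instead weigh blocks by position or content.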
-
Publication No.: US20230297783A1
Publication Date: 2023-09-21
Application No.: US18321424
Application Date: 2023-05-22
Applicant: Google LLC
Inventor: Liu Yang, Marc Najork, Michael Bendersky, Mingyang Zhang, Cheng Li
IPC: G06F40/30, G06N3/08, G06F40/205, G06N3/045
CPC classification number: G06F40/30, G06N3/08, G06F40/205, G06N3/045
Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.
-
Publication No.: US20210049165A1
Publication Date: 2021-02-18
Application No.: US17086564
Application Date: 2020-11-02
Applicant: Google LLC
Inventor: Marc Alexander Najork, Sujith Ravi, Michael Bendersky, Peter Shao-sen Young, Timothy Youngjin Sohn, Mingyang Zhang, Thomas Nelson, Xuanhui Wang
IPC: G06F16/248, G06F16/2455, G06F16/951, G06F16/38
Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, to facilitate identification of additional trigger-terms for a structured information card. In one aspect, the method includes actions of accessing data associated with a template for presenting structured information, wherein the accessed data references (i) a label term and (ii) a value. Other actions may include obtaining a candidate label term, identifying one or more entities that are associated with the label term, identifying one or more of the entities that are associated with the candidate label term, and for each particular entity of the one or more entities that are associated with the candidate label term, associating, with the candidate label term, (i) a label term that is associated with the particular entity, and (ii) the value associated with the label term.
-
Publication No.: US20230177004A1
Publication Date: 2023-06-08
Application No.: US17544705
Application Date: 2021-12-07
Applicant: GOOGLE LLC
Inventor: Weize Kong, Mingyang Zhang, Michael Bendersky, Marc Alexander Najork, Mike Colagrosso, Brandon Vargo, Remy Burger
CPC classification number: G06F16/122, G06F16/18
Abstract: Techniques are described herein for enabling more computationally efficient organization of files within a cloud storage system. A method includes: receiving information identifying a document and a set of folders; for each folder in the set of folders, using a trained model to predict a similarity measure between the folder and the document; for each folder in the set of folders, determining a score for the folder based on the predicted similarity measure for the folder; selecting a candidate folder from the set of folders using the scores of the folders within the set of folders; and providing, on a user interface, a selectable option to associate the document with the candidate folder.
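The claimed loop (predict a similarity per folder, score each folder, pick a candidate, offer it in the UI) can be sketched as follows. This assumes each folder is summarized by a text profile, for example the titles of files it already holds. Token-set Jaccard overlap stands in for the trained model, and the `min_score` cut-off is a hypothetical addition so that no option is shown when nothing fits.

```python
from typing import Callable, Optional


def jaccard(a: str, b: str) -> float:
    """Token-set overlap, a stand-in for the trained similarity model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def suggest_folder(
    document: str,
    folders: dict[str, str],
    predict_similarity: Callable[[str, str], float] = jaccard,
    min_score: float = 0.2,
) -> Optional[str]:
    """Score every folder and return the best one, or None if none clears min_score."""
    # Here the score is simply the predicted similarity; other signals could be blended in.
    scores = {name: predict_similarity(profile, document) for name, profile in folders.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


folders = {
    "Finance": "budget forecast invoices",
    "Travel": "flight hotel itinerary",
}
choice = suggest_folder("q3 budget forecast", folders)
if choice is not None:
    print(f"[Move to {choice}]")  # the selectable option shown to the user
```

Passing the similarity function as a parameter keeps the selection logic independent of whichever model produces the predictions.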
-
Publication No.: US20240403564A1
Publication Date: 2024-12-05
Application No.: US18325934
Application Date: 2023-05-30
Applicant: Google LLC
Inventor: Michael Bendersky, Mingyang Zhang
Abstract: A method for providing personalized responses to textual prompts using a large scale, privacy preserving, large language model (LLM) includes receiving a textual prompt from a user specifying a task for an LLM to perform, and obtaining a set of user features associated with the user. The method also includes determining, using the set of user features associated with the user, a user prompt embedding for the user, and processing, using the LLM, the textual prompt conditioned on the user prompt embedding for the user to generate a personalized response to the textual prompt. The method further includes providing the personalized response to the textual prompt for output from a user device associated with the user.
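One common way to condition a language model on a per-user embedding is to prepend "soft prompt" vectors to the prompt's token embeddings. The abstract does not say that this is the mechanism used, so treat the sketch below as an assumption. It shows only how the conditioned input sequence could be built. The random projection, the toy token embeddings and the sizes `D_MODEL` and `N_SOFT_TOKENS` are hypothetical, and no actual LLM is run.

```python
import random
import zlib

D_MODEL = 8        # hypothetical embedding width of the LLM
N_SOFT_TOKENS = 2  # hypothetical number of user "soft prompt" vectors


def make_projection(n_features: int, seed: int = 0) -> list[list[float]]:
    """Random stand-in for a learned feature-to-embedding projection."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(n_features)]
            for _ in range(N_SOFT_TOKENS * D_MODEL)]


def user_prompt_embedding(features: list[float], projection: list[list[float]]) -> list[list[float]]:
    """Project user features into N_SOFT_TOKENS vectors of width D_MODEL."""
    flat = [sum(w * f for w, f in zip(row, features)) for row in projection]
    return [flat[i * D_MODEL:(i + 1) * D_MODEL] for i in range(N_SOFT_TOKENS)]


def embed_tokens(prompt: str) -> list[list[float]]:
    """Toy deterministic token embeddings (stand-in for the LLM's embedding table)."""
    out = []
    for tok in prompt.lower().split():
        rng = random.Random(zlib.crc32(tok.encode("utf-8")))
        out.append([rng.uniform(-1.0, 1.0) for _ in range(D_MODEL)])
    return out


def conditioned_input(prompt: str, features: list[float], projection: list[list[float]]) -> list[list[float]]:
    """Prepend the user's soft-prompt vectors to the prompt's token embeddings."""
    return user_prompt_embedding(features, projection) + embed_tokens(prompt)


projection = make_projection(n_features=3)
seq_a = conditioned_input("summarize my week", [1.0, 0.0, 3.0], projection)
seq_b = conditioned_input("summarize my week", [0.0, 1.0, 0.0], projection)
print(len(seq_a))  # 5: two soft-prompt vectors followed by three token vectors
```

Two users sending the same prompt share identical token vectors but differ in the prepended user vectors, which is what lets one model produce personalized responses.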
-
Publication No.: US12072839B2
Publication Date: 2024-08-27
Application No.: US17544705
Application Date: 2021-12-07
Applicant: GOOGLE LLC
Inventor: Weize Kong, Mingyang Zhang, Michael Bendersky, Marc Alexander Najork, Mike Colagrosso, Brandon Vargo, Remy Burger
CPC classification number: G06F16/122, G06F16/18
Abstract: Techniques are described herein for enabling more computationally efficient organization of files within a cloud storage system. A method includes: receiving information identifying a document and a set of folders; for each folder in the set of folders, using a trained model to predict a similarity measure between the folder and the document; for each folder in the set of folders, determining a score for the folder based on the predicted similarity measure for the folder; selecting a candidate folder from the set of folders using the scores of the folders within the set of folders; and providing, on a user interface, a selectable option to associate the document with the candidate folder.
-
Publication No.: US11694034B2
Publication Date: 2023-07-04
Application No.: US17078569
Application Date: 2020-10-23
Applicant: Google LLC
Inventor: Liu Yang, Marc Najork, Michael Bendersky, Mingyang Zhang, Cheng Li
IPC: G06F40/30, G06N3/08, G06F40/205, G06N3/045
CPC classification number: G06F40/30, G06F40/205, G06N3/045, G06N3/08
Abstract: Systems and methods of the present disclosure are directed to a method for predicting semantic similarity between documents. The method can include obtaining a first document and a second document. The method can include parsing the first document into a plurality of first textual blocks and the second document into a plurality of second textual blocks. The method can include processing each of the plurality of first textual blocks and the second textual blocks with a machine-learned semantic document encoding model to obtain a first document encoding and a second document encoding. The method can include determining a similarity metric descriptive of a semantic similarity between the first document and the second document based on the first document encoding and the second document encoding.
-