-
Publication No.: US11948159B2
Publication date: 2024-04-02
Application No.: US16843334
Application date: 2020-04-08
Applicant: Google LLC
Inventor: Amir H. Hormati , Lisa Yin , Umar Ali Syed , Mingge Deng
IPC: G06F16/332 , G06F16/22 , G06F16/2453 , G06F17/16 , G06F18/214 , G06N5/04 , G06Q30/0201
CPC classification number: G06Q30/0201 , G06F16/221 , G06F16/24535 , G06F17/16 , G06F18/214 , G06N5/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for scalable matrix factorization. A method includes obtaining a Structured Query Language (SQL) query to create a matrix factorization model based on a set of training data, generating SQL sub-queries that don't include non-scalable functions, obtaining the set of training data, and generating a matrix factorization model based on the set of training data and the SQL sub-queries that don't include non-scalable functions.
-
Publication No.: US11544596B2
Publication date: 2023-01-03
Application No.: US16843371
Application date: 2020-04-08
Applicant: Google LLC
Inventor: Mingge Deng , Amir H. Hormati , Xi Cheng
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that creates a machine learning model with k-means clustering. In some implementations, an instruction to create a model is obtained. A data set including geographic data and non-geographic data is received. The data set includes multiple data entries. Geographic centroids are determined from the geographic data. The data set is analyzed to obtain statistics of the data set. Transformed data is generated from the data set, the statistics, and the geographic centroids. A model is generated with the transformed data, the model indicating multiple data groupings.
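The steps in this abstract (geographic centroids, data set statistics, transformed data, clustering) can be sketched roughly as follows. This is only an illustrative reading, not the patented method: the helper names (`geo_centroid`, `standardize`, `transform`, `kmeans`), the planar averaging of vertices, the z-score scaling, and the deterministic initialization are all assumptions made for the example.

```python
from statistics import fmean, pstdev

def geo_centroid(points):
    """Arithmetic mean of (lat, lon) vertices (planar approximation, assumed)."""
    lats, lons = zip(*points)
    return (fmean(lats), fmean(lons))

def standardize(column):
    """Scale a column with its own statistics (mean and population std dev)."""
    mean, std = fmean(column), pstdev(column)
    return [(v - mean) / std if std else 0.0 for v in column]

def transform(entries):
    """entries: list of (geo_points, numeric_value) -> list of feature vectors."""
    centroids = [geo_centroid(geo) for geo, _ in entries]
    columns = [
        [c[0] for c in centroids],        # centroid latitude
        [c[1] for c in centroids],        # centroid longitude
        [value for _, value in entries],  # non-geographic feature
    ]
    scaled = [standardize(col) for col in columns]
    return [list(row) for row in zip(*scaled)]

def kmeans(vectors, k, iterations=10):
    """Plain Lloyd's algorithm; the first k vectors serve as initial centers."""
    centers = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest center (squared Euclidean distance).
        assignment = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centers[j])))
            for v in vectors
        ]
        # Move each center to the mean of its members; keep it if it has none.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centers[j] = [fmean(dim) for dim in zip(*members)]
    return assignment

# Hypothetical data: four square regions, each with one numeric value.
entries = [
    ([(0, 0), (0, 2), (2, 2), (2, 0)], 1.0),
    ([(1, 1), (1, 3), (3, 3), (3, 1)], 2.0),
    ([(50, 50), (50, 52), (52, 52), (52, 50)], 100.0),
    ([(51, 51), (51, 53), (53, 53), (53, 51)], 101.0),
]
labels = kmeans(transform(entries), k=2)
print(labels)
```

The returned list holds one cluster index per data entry, which corresponds loosely to the "multiple data groupings" the abstract mentions.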
-
Publication No.: US20220405623A1
Publication date: 2022-12-22
Application No.: US17354392
Application date: 2021-06-22
Applicant: Google LLC
Inventor: Xi Cheng , Lisa Yin , Jiashang Liu , Amir H. Hormati , Mingge Deng , Christopher Avery Meyers
IPC: G06N5/04 , G06K9/62 , G06F16/245 , G06N20/00
Abstract: The disclosure is directed to a query-driven machine learning platform for generating feature attributions and other data for interpreting the relationship between inputs and outputs of a machine learning model. The platform can receive query statements for selecting data, training a machine learning model, and generating model explanation data for the model. The platform can distribute processing for generating the model explanation data to scale in response to requests to process selected data, including multiple records with a variety of different feature values. The interface between a user device and the machine learning platform can streamline deployment of different model explainability approaches across a variety of different machine learning models.
-
Publication No.: US20200320072A1
Publication date: 2020-10-08
Application No.: US16843334
Application date: 2020-04-08
Applicant: Google LLC
Inventor: Amir H. Hormati , Lisa Yin , Umar Ali Syed , Mingge Deng
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for scalable matrix factorization. A method includes obtaining a Structured Query Language (SQL) query to create a matrix factorization model based on a set of training data, generating SQL sub-queries that don't include non-scalable functions, obtaining the set of training data, and generating a matrix factorization model based on the set of training data and the SQL sub-queries that don't include non-scalable functions.
-
Publication No.: US20250124026A1
Publication date: 2025-04-17
Application No.: US18484661
Application date: 2023-10-11
Applicant: Google LLC
Inventor: Xi Cheng , Wen Zhang , Jiashang Liu , Mingge Deng , Amir Hormati , Omid Fatemieh
IPC: G06F16/2452 , G06F16/242 , G06F40/40
Abstract: A method includes receiving a text embedding generation query from a user requesting generation of a text embedding for one or more data elements stored at a data warehouse. In response, the method includes selecting, using the text embedding generation query, a text embedding model from a plurality of different text embedding models. The method includes generating, using the selected text embedding model, the text embedding for the one or more data elements and storing the text embeddings at the data warehouse. The method includes receiving a machine learning model training query from the user device requesting training of a machine learning model using the text embeddings. In response to receiving the machine learning model training query, the method includes training the machine learning model using the text embeddings. The method includes providing, to the user device, a notification indicating that training of the machine learning model is complete.
-
Publication No.: US11928017B2
Publication date: 2024-03-12
Application No.: US17664409
Application date: 2022-05-21
Applicant: Google LLC
Inventor: Zichuan Ye , Jiashang Liu , Forest Elliott , Amir Hormati , Xi Cheng , Mingge Deng
IPC: G06F11/07
CPC classification number: G06F11/0793 , G06F11/0709 , G06F11/079
Abstract: A method includes receiving a point data anomaly detection query from a user. The query requests the data processing hardware to determine a quantity of anomalous point data values in a set of point data values. The method includes training a model using the set of point data values. For at least one respective point data value in the set of point data values, the method includes determining, using the trained model, a variance value for the respective point data value and determining that the variance value satisfies a threshold value. Based on the variance value satisfying the threshold value, the method includes determining that the respective point data value is an anomalous point data value. The method includes reporting the determined anomalous point data value to the user.
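The train, score, threshold, and report flow described in this abstract can be illustrated with a toy example. The abstract does not say what kind of model is trained or how the variance value is computed, so the following is a minimal sketch under stated assumptions: the "model" is just the mean and population variance of the data, the variance value is the squared deviation from the mean scaled by that variance, and the function names and the threshold of 4.0 are hypothetical.

```python
from statistics import fmean, pvariance

def train_model(values):
    """Toy 'model' (assumed): the mean and population variance of the data."""
    return {"mean": fmean(values), "variance": pvariance(values)}

def variance_value(model, x):
    """Squared deviation of x from the model mean, scaled by the model variance."""
    if model["variance"] == 0:
        return 0.0
    return (x - model["mean"]) ** 2 / model["variance"]

def detect_anomalies(values, threshold=4.0):
    """Return the values whose variance value meets or exceeds the threshold."""
    model = train_model(values)
    return [x for x in values if variance_value(model, x) >= threshold]

# Hypothetical point data with one obvious outlier.
points = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 50.0]
anomalies = detect_anomalies(points)
print(len(anomalies), anomalies)  # prints: 1 [50.0]
```

Reporting both the count and the values mirrors the abstract, which asks for the quantity of anomalous points and reports the anomalous values to the user.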
-
Publication No.: US20240045845A1
Publication date: 2024-02-08
Application No.: US17817987
Application date: 2022-08-06
Applicant: Google LLC
Inventor: Thibaud Baptiste Hottelier , Yuri Volobuev , Mingge Deng , Justin Levandoski , Gaurav Saxena , Deepak Choudhary Nettem , Anoop Kochummen Johnson
IPC: G06F16/22 , G06F16/338
CPC classification number: G06F16/221 , G06F16/338 , G06F16/2282
Abstract: A method for unstructured data analytics in data warehouses includes receiving an unstructured data query from a user, the unstructured data query requesting the data processing hardware determine one or more unstructured data files stored at a data repository that match query parameters. The method includes determining, using an object table, a set of unstructured data files stored at the data repository that matches the query parameters. The object table includes a plurality of rows, each row of the plurality of rows associated with a respective unstructured data file stored at the data repository, and a plurality of columns, each column of the plurality of columns comprising metadata associated with the respective unstructured data file of each row of the plurality of rows. The method includes returning, to the user, a structured data table including the determined set of unstructured data files.
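The object table idea in this abstract (one row per unstructured file, one column per piece of metadata, queried to return a structured result) can be pictured with a small in-memory sketch. Everything here is hypothetical and chosen for illustration only: the `ObjectRow` fields, the sample URIs, and the `query_object_table` parameters are not taken from the patent or from any real product API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class ObjectRow:
    """One row per unstructured file; each field is a metadata column (assumed)."""
    uri: str
    content_type: str
    size_bytes: int

# Hypothetical object table describing files held in a data repository.
OBJECT_TABLE = [
    ObjectRow("repo/images/cat.png", "image/png", 2048),
    ObjectRow("repo/images/dog.png", "image/png", 512),
    ObjectRow("repo/audio/call.wav", "audio/wav", 90000),
]

def query_object_table(
    table: List[ObjectRow],
    content_type: Optional[str] = None,
    min_size_bytes: int = 0,
) -> List[ObjectRow]:
    """Return the rows whose metadata matches the query parameters."""
    return [
        row for row in table
        if (content_type is None or row.content_type == content_type)
        and row.size_bytes >= min_size_bytes
    ]

matches = query_object_table(OBJECT_TABLE, content_type="image/png", min_size_bytes=1024)
for row in matches:
    print(row.uri, row.content_type, row.size_bytes)
```

The query only touches metadata, so the files themselves never need to be read to decide which ones match.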
-
Publication No.: US20230094005A1
Publication date: 2023-03-30
Application No.: US18062271
Application date: 2022-12-06
Applicant: Google LLC
Inventor: Mingge Deng , Amir H. Hormati , Xi Cheng
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that creates a machine learning model with k-means clustering. In some implementations, an instruction to create a model is obtained. A data set including geographic data and non-geographic data is received. The data set includes multiple data entries. Geographic centroids are determined from the geographic data. The data set is analyzed to obtain statistics of the data set. Transformed data is generated from the data set, the statistics, and the geographic centroids. A model is generated with the transformed data, the model indicating multiple data groupings.
-
Publication No.: US20220382622A1
Publication date: 2022-12-01
Application No.: US17664409
Application date: 2022-05-21
Applicant: Google LLC
Inventor: Zichuan Ye , Jiashang Liu , Forest Elliott , Amir Hormati , Xi Cheng , Mingge Deng
Abstract: A method includes receiving a point data anomaly detection query from a user. The query requests the data processing hardware to determine a quantity of anomalous point data values in a set of point data values. The method includes training a model using the set of point data values. For at least one respective point data value in the set of point data values, the method includes determining, using the trained model, a variance value for the respective point data value and determining that the variance value satisfies a threshold value. Based on the variance value satisfying the threshold value, the method includes determining that the respective point data value is an anomalous point data value. The method includes reporting the determined anomalous point data value to the user.
-
Publication No.: US20200320413A1
Publication date: 2020-10-08
Application No.: US16843371
Application date: 2020-04-08
Applicant: Google LLC
Inventor: Mingge Deng , Amir H. Hormati , Xi Cheng
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that creates a machine learning model with k-means clustering. In some implementations, an instruction to create a model is obtained. A data set including geographic data and non-geographic data is received. The data set includes multiple data entries. Geographic centroids are determined from the geographic data. The data set is analyzed to obtain statistics of the data set. Transformed data is generated from the data set, the statistics, and the geographic centroids. A model is generated with the transformed data, the model indicating multiple data groupings.