Large Language Models in Cloud Database Platforms

    Publication No.: US20250036670A1

    Publication date: 2025-01-30

    Application No.: US18736618

    Filing date: 2024-06-07

    Applicant: Google LLC

    Abstract: Aspects of the disclosure are directed to integrating one or more large language models (LLMs) into a cloud database platform, such as a data warehouse. Users of the cloud database platform can provide queries to instruct one or more LLMs to perform generative natural language processing tasks by manipulating or generating text directly in the cloud database platform with a table-valued function. Users can provide input to register or generate one or more LLMs of the cloud database platform for performing the natural language processing tasks. Integrating LLMs into the cloud database platform can improve processing capabilities of the LLMs and save computing resources, as specialized LLMs or application-specific APIs may no longer be necessary.
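
    The table-valued-function idea in this abstract can be illustrated with a minimal sketch. Everything named here (`generate_text`, `REGISTRY`, `echo_model`, the `prompt` and `generated_text` columns) is a hypothetical stand-in, not the platform's actual interface: a function takes a table of rows plus a registered model and returns a new table with one generated column.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, str]


def generate_text(model: Callable[[str], str],
                  rows: Iterable[Row],
                  prompt_column: str = "prompt") -> Iterator[Row]:
    """Table-valued function: takes a table, yields a table with one added column."""
    for row in rows:
        out = dict(row)  # copy so the input table is not mutated
        out["generated_text"] = model(row[prompt_column])
        yield out


def echo_model(prompt: str) -> str:
    """Stand-in for a registered LLM; a real model would generate text here."""
    return f"[summary of: {prompt}]"


# Hypothetical registry keyed by name, mimicking "registering" a model.
REGISTRY: Dict[str, Callable[[str], str]] = {"echo": echo_model}

reviews = [
    {"id": "1", "prompt": "Great battery life"},
    {"id": "2", "prompt": "Screen cracked in a week"},
]
result = list(generate_text(REGISTRY["echo"], reviews))
```

    Because the function maps a table to a table, its output could in principle be filtered or joined like any other relation, which is the property the abstract relies on.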

    Creating a machine learning model with k-means clustering

    Publication No.: US11842291B2

    Publication date: 2023-12-12

    Application No.: US18062271

    Filing date: 2022-12-06

    Applicant: Google LLC

    CPC classification number: G06N5/04 G06F7/14 G06F16/29 G06N20/00

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, that create a machine learning model with k-means clustering. In some implementations, an instruction to create a model is obtained. A data set including geographic data and non-geographic data is received. The data set includes multiple data entries. Geographic centroids are determined from the geographic data. The data set is analyzed to obtain statistics of the data set. Transformed data is generated from the data set, the statistics, and the geographic centroids. A model is generated with the transformed data, the model indicating multiple data groupings.
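
    One possible reading of the pipeline in this abstract is sketched below. The abstract does not say how the transformed data is built, so the choices here are assumptions: geographic centroids come from a first k-means pass over (lat, lon), the statistics are the mean and standard deviation of a hypothetical `spend` column, and each transformed row holds the standardized spend plus the distance to every geographic centroid. Distances are plain Euclidean distances in degrees, which is a simplification.

```python
import math
from statistics import mean, pstdev


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [mean(dim) for dim in zip(*members)]
    return centroids, labels


# Hypothetical data entries: (latitude, longitude, spend).
rows = [
    (40.7, -74.0, 10.0), (34.0, -118.2, 90.0), (40.8, -73.9, 12.0),
    (34.1, -118.3, 95.0), (40.6, -74.1, 11.0), (34.2, -118.1, 88.0),
]

# Step 1: geographic centroids from the geographic columns only.
geo = [(lat, lon) for lat, lon, _ in rows]
geo_centroids, _ = kmeans(geo, k=2)

# Step 2: statistics of the non-geographic column.
spend = [s for _, _, s in rows]
mu, sigma = mean(spend), pstdev(spend)

# Step 3: transformed data built from the data set, statistics, and centroids.
transformed = [
    [(s - mu) / sigma] + [math.dist((lat, lon), c) for c in geo_centroids]
    for lat, lon, s in rows
]

# Step 4: the final model groups the transformed rows.
model_centroids, groups = kmeans(transformed, k=2)
```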

    Machine Learning Regression Analysis

    Publication No.: US20230094479A1

    Publication date: 2023-03-30

    Application No.: US17449660

    Filing date: 2021-09-30

    Applicant: Google LLC

    Abstract: A method includes receiving a model analysis request from a user. The model analysis request requests the data processing hardware to provide one or more statistics of a model trained on a dataset. The method also includes obtaining the trained model. The trained model includes a plurality of weights. Each weight is assigned to a feature of the trained model. The method also includes determining, using the dataset and the plurality of weights, the one or more statistics of the trained model based on a linear regression of the trained model. The method includes reporting the one or more statistics of the trained model to the user.
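
    As a rough illustration of deriving statistics from a dataset and already-trained weights, the sketch below handles the single-feature case. The function name and the choice of statistics (MSE, R-squared, slope standard error, t-statistic) are assumptions; the abstract does not list which statistics are reported. The standard-error formula assumes the weights are the least-squares fit.

```python
import math
from statistics import mean


def regression_statistics(xs, ys, intercept, slope):
    """Statistics for a one-feature linear model given its trained weights."""
    n = len(xs)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(r * r for r in residuals)            # sum of squared errors
    y_bar = mean(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)        # total sum of squares
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    resid_var = sse / (n - 2)                      # two fitted parameters
    se_slope = math.sqrt(resid_var / sxx)
    return {
        "mse": sse / n,
        "r_squared": 1.0 - sse / sst,
        "slope_std_error": se_slope,
        "slope_t_stat": slope / se_slope,
    }


# Weights below are the least-squares fit for this small dataset.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 10.0]
stats = regression_statistics(xs, ys, intercept=0.5, slope=2.3)
```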

    Machine Learning Super Large-Scale Time-series Forecasting

    Publication No.: US20230274180A1

    Publication date: 2023-08-31

    Application No.: US17652863

    Filing date: 2022-02-28

    Applicant: Google LLC

    CPC classification number: G06N20/00 G06F16/248

    Abstract: A method for forecasting time-series data, when executed by data processing hardware, causes the data processing hardware to perform operations including receiving a time series forecasting query from a user requesting a time series forecast forecasting future data based on a set of current time-series data. The operations include obtaining, from the set of current time-series data, a set of training data. The operations include training, using a first portion of the set of training data, a first sub-model of a forecasting model and training, using a second portion of the set of training data, a second sub-model of the forecasting model. The second portion is different than the first portion. The operations include forecasting, using the forecasting model, the future data based on the set of current time-series data and returning, to the user, the forecasted future data for the time series forecast.
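
    The two-sub-model structure can be sketched as follows. The abstract does not say what the sub-models are or how the training data is split, so this is only one assumed arrangement: a linear trend sub-model fit on the older half of the history, and a seasonal-offset sub-model fit on the residuals of the newer half. The synthetic series and the period of 4 are illustrative.

```python
from statistics import mean


def fit_trend(ts, ys):
    """First sub-model: least-squares line through (t, y)."""
    t_bar, y_bar = mean(ts), mean(ys)
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
             / sum((t - t_bar) ** 2 for t in ts))
    return y_bar - slope * t_bar, slope


def fit_seasonal(ts, residuals, period):
    """Second sub-model: average residual for each phase of the season."""
    return [mean([r for t, r in zip(ts, residuals) if t % period == phase])
            for phase in range(period)]


# Synthetic history: trend 2*t plus a repeating seasonal pattern.
period = 4
season = [1.0, -1.0, -1.0, 1.0]
history = [2.0 * t + season[t % period] for t in range(16)]

first_portion = list(range(0, 8))     # older half trains the trend
second_portion = list(range(8, 16))   # newer half trains the seasonality

intercept, slope = fit_trend(first_portion, [history[t] for t in first_portion])
residuals = [history[t] - (intercept + slope * t) for t in second_portion]
offsets = fit_seasonal(second_portion, residuals, period)

# The combined forecasting model adds both sub-models' contributions.
forecast = [intercept + slope * t + offsets[t % period] for t in range(16, 20)]
```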

    Anomaly Detection with Local Outlier Factor
    Type: Invention publication

    Publication No.: US20230153311A1

    Publication date: 2023-05-18

    Application No.: US18053738

    Filing date: 2022-11-08

    Applicant: Google LLC

    CPC classification number: G06F16/2462 G06F16/215 G06F16/256

    Abstract: A method for anomaly detection includes receiving an anomaly detection query from a user. The anomaly detection query requests data processing hardware determine one or more anomalies in a dataset including a plurality of examples. Each example in the plurality of examples is associated with one or more features. The method includes training a model using the dataset. The trained model is configured to use a local outlier factor (LOF) algorithm. For each respective example of the plurality of examples in the dataset, the method includes determining, using the trained model, a respective local deviation score based on the one or more features. The method includes determining that the respective local deviation score satisfies a deviation score threshold and, based on the local deviation score satisfying the threshold, determining that the respective example is anomalous. The method includes reporting the respective anomalous example to the user.
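
    For reference, the textbook local outlier factor computation can be sketched in a few lines. This is the standard LOF definition, not necessarily the patented implementation; the value of `k`, the threshold of 1.5, and the sample points are assumptions, and ties among neighbours and duplicate points are not handled.

```python
import math


def lof_scores(points, k):
    """Local outlier factor for each point, using its k nearest neighbours."""
    n = len(points)
    neighbors, k_dist = [], []
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        neighbors.append(ranked[:k])
        k_dist.append(math.dist(points[i], points[ranked[k - 1]]))

    def local_reach_density(i):
        # Reachability distance is never smaller than the neighbour's k-distance.
        reach = [max(k_dist[o], math.dist(points[i], points[o]))
                 for o in neighbors[i]]
        return len(reach) / sum(reach)

    density = [local_reach_density(i) for i in range(n)]
    # LOF: mean neighbour density divided by the point's own density.
    return [sum(density[o] for o in neighbors[i]) / (k * density[i])
            for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)

THRESHOLD = 1.5  # assumed deviation score threshold
anomalies = [p for p, s in zip(points, scores) if s > THRESHOLD]
```

    Scores near 1 indicate a point about as dense as its neighbours; scores well above 1 indicate a point in a sparser region than its neighbours.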

    Holiday Modeling in Forecasting
    Type: Invention application

    Publication No.: US20250013937A1

    Publication date: 2025-01-09

    Application No.: US18739429

    Filing date: 2024-06-11

    Applicant: Google LLC

    Abstract: Aspects of the disclosure are directed to methods, systems, and computer-readable media for in-database holiday effect modeling for time series forecasting. The modeling can be accurate, explainable, customizable, and scalable. Machine learning models can receive a first dataset for time series data and a second dataset for configurable holiday data. The models can detect and model effects of each configurable holiday on one or more forecasts, effectively accumulating effects of overlapping holidays, to manage different levels of holiday modeling. Holiday data can be customizable, including an ability to modify existing holidays and/or add new holidays, through one or more interfaces that can display default holiday information, combined holiday information based on both default and customizable holidays, effects of each holiday on forecasts, and accumulated effects of multiple holidays on forecasts.
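
    The combination of default and custom holidays, and the accumulation of overlapping effects, might look like the sketch below. The holiday names, dates, and effect sizes are invented for illustration, and the effects are given as fixed numbers here, whereas the abstract describes them as learned by the model.

```python
from datetime import date

# Default holiday table: name -> affected days and additive effect (assumed values).
DEFAULT_HOLIDAYS = {
    "Christmas": {"days": {date(2024, 12, 24), date(2024, 12, 25),
                           date(2024, 12, 26)}, "effect": 40.0},
    "NewYear": {"days": {date(2025, 1, 1)}, "effect": 10.0},
}

# Custom table: adds a new holiday and modifies an existing one.
CUSTOM_HOLIDAYS = {
    "WinterSale": {"days": {date(2024, 12, 26), date(2024, 12, 27),
                            date(2024, 12, 28)}, "effect": 15.0},
    "NewYear": {"days": {date(2025, 1, 1)}, "effect": 20.0},
}


def combine(default, custom):
    """Custom entries override defaults with the same name and add new ones."""
    return {**default, **custom}


def holiday_effects(day, holidays):
    """Per-holiday effect on one day, which keeps the adjustment explainable."""
    return {name: h["effect"] for name, h in holidays.items() if day in h["days"]}


def adjusted_forecast(baseline, day, holidays):
    """Overlapping holidays accumulate: their effects are summed."""
    return baseline + sum(holiday_effects(day, holidays).values())


combined = combine(DEFAULT_HOLIDAYS, CUSTOM_HOLIDAYS)
overlap_day = adjusted_forecast(100.0, date(2024, 12, 26), combined)
```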

    Point Anomaly Detection
    Type: Invention publication

    Publication No.: US20240193035A1

    Publication date: 2024-06-13

    Application No.: US18438717

    Filing date: 2024-02-12

    Applicant: Google LLC

    CPC classification number: G06F11/0793 G06F11/0709 G06F11/079

    Abstract: A method includes receiving a point data anomaly detection query from a user. The query requests the data processing hardware to determine a quantity of anomalous point data values in a set of point data values. The method includes training a model using the set of point data values. For at least one respective point data value in the set of point data values, the method includes determining, using the trained model, a variance value for the respective point data value and determining that the variance value satisfies a threshold value. Based on the variance value satisfying the threshold value, the method includes determining that the respective point data value includes an anomalous point data value. The method includes reporting the determined anomalous point data value to the user.
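
    A minimal sketch of thresholding a per-point variance value follows. The abstract does not define the variance value or the model, so this assumes the "model" is just the sample mean and population variance, and the variance value is the squared deviation from the mean divided by that variance (a squared z-score). The threshold of 4.0, roughly two standard deviations, is also an assumption.

```python
from statistics import mean, pvariance


def fit(values):
    """'Train' a trivial model: the mean and population variance of the data."""
    return {"mean": mean(values), "variance": pvariance(values)}


def variance_value(model, value):
    """Squared deviation from the mean, in units of the data's variance."""
    return (value - model["mean"]) ** 2 / model["variance"]


def detect_anomalies(values, threshold=4.0):
    model = fit(values)
    return [v for v in values if variance_value(model, v) > threshold]


readings = [10, 11, 9, 10, 12, 10, 50]
anomalous = detect_anomalies(readings)
```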

    Principal Component Analysis
    Type: Invention application

    Publication No.: US20230045139A1

    Publication date: 2023-02-09

    Application No.: US17816288

    Filing date: 2022-07-29

    Applicant: Google LLC

    Abstract: A method for principal component analysis includes receiving a principal component analysis (PCA) request from a user requesting data processing hardware to perform PCA on a dataset, the dataset including a plurality of input features. The method further includes training a PCA model on the plurality of input features of the dataset. The method includes determining, using the trained PCA model, one or more principal components of the dataset. The method also includes generating, based on the plurality of input features and the one or more principal components, one or more embedded features of the dataset. The method includes returning the one or more embedded features to the user.
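
    The steps in this abstract (train on input features, find principal components, produce embedded features) can be sketched with power iteration on the covariance matrix. This is one common way to find the leading component, not necessarily the method claimed; the function name, start vector, and data are illustrative, and only the first component is computed.

```python
import math
from statistics import mean


def top_component(rows, iters=100):
    """Return the leading principal component and the 1-D embedded features."""
    d = len(rows[0])
    means = [mean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the centered input features.
    cov = [[sum(r[i] * r[j] for r in centered) / (len(rows) - 1)
            for j in range(d)] for i in range(d)]
    # Power iteration; assumes the start vector is not orthogonal to the answer.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Embedded feature: projection of each centered row onto the component.
    embedded = [sum(c * x for c, x in zip(v, row)) for row in centered]
    return v, embedded


rows = [(1, 1), (2, 2), (3, 3), (4, 4)]
component, embedded = top_component(rows)
```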

    Machine Learning Hyperparameter Tuning

    Publication No.: US20220366318A1

    Publication date: 2022-11-17

    Application No.: US17663430

    Filing date: 2022-05-15

    Applicant: Google LLC

    Abstract: A method, when executed by data processing hardware, causes the data processing hardware to perform operations including receiving, from a user device, a hyperparameter optimization request requesting optimization of one or more hyperparameters of a machine learning model. The operations include obtaining training data for training the machine learning model and determining a set of hyperparameter permutations of the one or more hyperparameters. For each respective hyperparameter permutation in the set of hyperparameter permutations, the operations include training a unique machine learning model using the training data and the respective hyperparameter permutation and determining a performance of the trained model. The operations include selecting, based on the performance of each of the trained unique machine learning models of the user device, one of the trained unique machine learning models. The operations include generating one or more predictions using the selected one of the trained unique machine learning models.
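
    The loop described in this abstract (enumerate hyperparameter permutations, train one model per permutation, score each, select the best, predict) is sketched below as an exhaustive grid search. The model (one-feature ridge regression), the grid, the validation split, and the mean-squared-error metric are all assumptions made for illustration.

```python
from itertools import product
from statistics import mean


def train(xs, ys, l2, fit_intercept):
    """One-feature ridge regression in closed form; returns (intercept, slope)."""
    x0 = mean(xs) if fit_intercept else 0.0
    y0 = mean(ys) if fit_intercept else 0.0
    slope = (sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
             / (sum((x - x0) ** 2 for x in xs) + l2))
    return y0 - slope * x0, slope


def predict(model, x):
    intercept, slope = model
    return intercept + slope * x


def mse(model, xs, ys):
    return mean([(predict(model, x) - y) ** 2 for x, y in zip(xs, ys)])


train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
valid_x, valid_y = [5.0, 6.0], [11.0, 13.0]

# Every combination of the hyperparameter values is one permutation.
grid = {"l2": [0.0, 1.0, 10.0], "fit_intercept": [True, False]}
permutations = [dict(zip(grid, combo)) for combo in product(*grid.values())]

results = []
for params in permutations:
    model = train(train_x, train_y, **params)   # one model per permutation
    results.append((mse(model, valid_x, valid_y), params, model))

# Select the model with the lowest validation error and use it to predict.
best_score, best_params, best_model = min(results, key=lambda r: r[0])
prediction = predict(best_model, 7.0)
```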
