IDENTIFYING OPTIMAL WEIGHTS TO IMPROVE PREDICTION ACCURACY IN MACHINE LEARNING TECHNIQUES

    Publication No.: US20210150407A1

    Publication Date: 2021-05-20

    Application No.: US16684396

    Filing Date: 2019-11-14

    IPC Classification: G06N20/00 G06N5/02

    Abstract: A computer-implemented method, system and computer program product for improving prediction accuracy in machine learning techniques. A teacher model is constructed, where the teacher model generates a weight for each data case. The current student model is then trained using training data and the weights generated by the teacher model. After training the current student model, the current student model generates state features, which are used by the teacher model to generate new weights. A candidate student model is then trained using training data and these new weights. A reward is generated by comparing the current student model with the candidate student model using training and testing data, which is used to update the teacher model if a stopping rule has not been satisfied. Upon a stopping rule being satisfied, the weights generated by the teacher model are deemed to be the “optimal” weights which are returned to the user.

    Efficient execution of a decision tree

    Publication No.: US12093838B2

    Publication Date: 2024-09-17

    Application No.: US17027688

    Filing Date: 2020-09-21

    Abstract: Embodiments of the present disclosure relate to a method, system, and computer program product for efficient execution of a decision tree. According to the method, respective target values of a plurality of attributes of a target entity are obtained. Representations of a plurality of leaf nodes of a decision tree are obtained. Each of the representations indicates respective statistic values of a plurality of attributes of historical entities and a statistic prediction result determined from historical prediction results output at a respective one of the plurality of leaf nodes for the historical entities. Distance measures between the target entity and the plurality of leaf nodes are determined based on the target values and the statistic values indicated by the representations of the plurality of leaf nodes. A target prediction result for the target entity is determined based on the distance measures and the statistic prediction results of the historical entities.
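
The leaf-distance idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say which statistics or which distance measure are used, so the per-attribute means, the Euclidean distance, and the nearest-leaf rule below are all assumptions, and the names `LeafRepresentation` and `predict_from_leaves` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LeafRepresentation:
    # Assumed statistic values: per-attribute means of the historical
    # entities that reached this leaf.
    attribute_means: Sequence[float]
    # Assumed statistic prediction result: mean of the historical
    # predictions output at this leaf.
    mean_prediction: float


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_from_leaves(target_values: Sequence[float],
                        leaves: Sequence[LeafRepresentation]) -> float:
    """Return the statistic prediction of the leaf closest to the target.

    Nearest-leaf selection is an assumption; a distance-weighted average
    over several leaves would be another reading of the abstract.
    """
    distances = [euclidean(target_values, leaf.attribute_means) for leaf in leaves]
    nearest = min(range(len(leaves)), key=distances.__getitem__)
    return leaves[nearest].mean_prediction
```

Under these assumptions, a prediction needs only one distance computation per leaf instead of a traversal of the tree's split conditions.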

    Identifying optimal weights to improve prediction accuracy in machine learning techniques

    Publication No.: US11443235B2

    Publication Date: 2022-09-13

    Application No.: US16684396

    Filing Date: 2019-11-14

    Abstract: A computer-implemented method, system and computer program product for improving prediction accuracy in machine learning techniques. A teacher model is constructed, where the teacher model generates a weight for each data case. The current student model is then trained using training data and the weights generated by the teacher model. After training the current student model, the current student model generates state features, which are used by the teacher model to generate new weights. A candidate student model is then trained using training data and these new weights. A reward is generated by comparing the current student model with the candidate student model using training and testing data, which is used to update the teacher model if a stopping rule has not been satisfied. Upon a stopping rule being satisfied, the weights generated by the teacher model are deemed to be the “optimal” weights which are returned to the user.

    DATA PARTITIONING WITH NEURAL NETWORK

    Publication No.: US20220156572A1

    Publication Date: 2022-05-19

    Application No.: US16950017

    Filing Date: 2020-11-17

    IPC Classification: G06N3/08 G06F16/27

    Abstract: A computer-implemented method, system and computer program product for processing a data set are provided. In this method, an original data set including a plurality of data records is obtained. Each data record in the original data set has values of a first number of features. A representative data set having a plurality of representative data records is determined. Each representative data record has values of a second number of representatives. The second number of representatives are obtained by training an autoencoder neural network with values of the first number of features as inputs, and the second number is smaller than the first number. The plurality of representative data records is segmented into two or more clusters based on the values of the second number of representatives. The representative data records in the two or more clusters are partitioned to form a predefined number of representative data subsets.
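
The final step of this abstract, partitioning clustered records into a predefined number of subsets, can be sketched as follows. The abstract does not state how records are assigned to subsets, so the round-robin dealing below is an assumption chosen so that each subset receives a similar share of every cluster. The autoencoder and clustering steps are not shown; `cluster_labels` is a hypothetical input holding one cluster label per representative data record.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def partition_by_cluster(cluster_labels: Sequence[Hashable],
                         num_subsets: int) -> List[List[int]]:
    """Deal record indices into num_subsets subsets, cluster by cluster.

    Round-robin assignment is an assumption, not the patented method.
    """
    # Group record indices by their cluster label.
    by_cluster = defaultdict(list)
    for record_index, label in enumerate(cluster_labels):
        by_cluster[label].append(record_index)

    # Walk the clusters in a fixed order and deal their records out in turn,
    # so consecutive records of one cluster land in different subsets.
    subsets: List[List[int]] = [[] for _ in range(num_subsets)]
    slot = 0
    for label in sorted(by_cluster):
        for record_index in by_cluster[label]:
            subsets[slot % num_subsets].append(record_index)
            slot += 1
    return subsets
```

With four records in one cluster, two in another, and two subsets, each subset ends up with two records from the first cluster and one from the second.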

    EFFICIENT EXECUTION OF A DECISION TREE

    Publication No.: US20220092437A1

    Publication Date: 2022-03-24

    Application No.: US17027688

    Filing Date: 2020-09-21

    Abstract: Embodiments of the present disclosure relate to a method, system, and computer program product for efficient execution of a decision tree. According to the method, respective target values of a plurality of attributes of a target entity are obtained. Representations of a plurality of leaf nodes of a decision tree are obtained. Each of the representations indicates respective statistic values of a plurality of attributes of historical entities and a statistic prediction result determined from historical prediction results output at a respective one of the plurality of leaf nodes for the historical entities. Distance measures between the target entity and the plurality of leaf nodes are determined based on the target values and the statistic values indicated by the representations of the plurality of leaf nodes. A target prediction result for the target entity is determined based on the distance measures and the statistic prediction results of the historical entities.

    Feature Generation for Training Data Sets Based on Unlabeled Data

    Publication No.: US20230073137A1

    Publication Date: 2023-03-09

    Application No.: US17447258

    Filing Date: 2021-09-09

    IPC Classification: G06N20/00 G06K9/62

    Abstract: A computer implemented method for machine learning model training. A number of processor units creates a cluster model comprising labeled samples and unlabeled samples. The number of processor units identifies cluster information for the labeled samples from the cluster model. The number of processor units adds a set of new features to a set of original features for the labeled samples using the cluster information to form an extended set of features for the labeled samples, wherein the labeled samples with the set of original features and the set of new features form a training data set for training a machine learning model.
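
A minimal sketch of the feature-extension step described in this abstract follows. The abstract names neither the clustering algorithm nor the cluster information used, so the tiny k-means routine and the two added features (cluster id and distance to the assigned centroid) are assumptions, and `kmeans` and `extend_features` are hypothetical helper names.

```python
import math
from typing import List, Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Sequence[float]], k: int, iterations: int = 20) -> List[List[float]]:
    """Tiny k-means; centroids start at the first k points so runs are deterministic."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups: List[list] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def extend_features(labeled: List[Sequence[float]],
                    unlabeled: List[Sequence[float]],
                    k: int) -> List[list]:
    """Cluster labeled and unlabeled samples together, then append
    [cluster_id, distance_to_centroid] to each labeled sample's features."""
    centroids = kmeans(list(labeled) + list(unlabeled), k)
    extended = []
    for sample in labeled:
        distances = [dist(sample, c) for c in centroids]
        cluster_id = distances.index(min(distances))
        extended.append(list(sample) + [cluster_id, distances[cluster_id]])
    return extended
```

The unlabeled samples influence only the centroids; the returned rows contain the labeled samples alone, each with its original features followed by the new ones.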

    FRAUD SUSPECTS DETECTION AND VISUALIZATION

    Publication No.: US20230083118A1

    Publication Date: 2023-03-16

    Application No.: US17476401

    Filing Date: 2021-09-15

    IPC Classification: G06Q20/40 G06K9/62 G06F17/18

    Abstract: An approach is provided in which the approach generates anomaly score variables using multiple unsupervised models based on a set of data records. The approach normalizes the anomaly score variables into multiple normalized variables, and constructs at least one interaction based on a first one of the normalized variables and a second one of the normalized variables. The first normalized variable corresponds to a first one of the anomaly score variables and the second normalized variable corresponds to a second one of the anomaly score variables. The approach detects a set of anomalies based on the at least one interaction and transmits the set of anomalies to a user.
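
The normalize-then-interact step in this abstract can be sketched as below. The abstract specifies neither the normalization nor the form of the interaction, so min-max scaling, a product interaction, and a fixed threshold are assumptions. The two score lists stand in for the outputs of two unsupervised models, which are not implemented here.

```python
from typing import List, Sequence


def min_max(scores: Sequence[float]) -> List[float]:
    """Scale scores to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def detect_anomalies(scores_a: Sequence[float],
                     scores_b: Sequence[float],
                     threshold: float = 0.5) -> List[int]:
    """Return indices of records whose interaction term reaches the threshold.

    The interaction is assumed to be the product of the two normalized
    scores, so a record is flagged only when both models rate it highly.
    """
    norm_a, norm_b = min_max(scores_a), min_max(scores_b)
    interaction = [a * b for a, b in zip(norm_a, norm_b)]
    return [i for i, value in enumerate(interaction) if value >= threshold]
```

Normalizing first matters because the raw scores of different models can sit on very different scales; without it, one model would dominate the product.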

    Uplift modeling
    Granted patent

    Publication No.: US11562400B1

    Publication Date: 2023-01-24

    Application No.: US17483328

    Filing Date: 2021-09-23

    IPC Classification: G06Q30/02 G06K9/62 G06N20/00

    Abstract: A method includes training a plurality of different types of machine learning models using a training dataset to produce a set of trained machine learning models and determining a lift of each trained machine learning model in the set of trained machine learning models using a validation dataset. The method also includes selecting a trained machine learning model from the set of trained machine learning models that has a highest lift of the set of trained machine learning models and predicting a likelihood that a person will perform an action by applying the selected trained machine learning model to data about the person.
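
The select-by-highest-lift step can be illustrated with a short sketch. The abstract does not define how lift is computed, so the definition below (response rate among the top-scored fraction of the validation set divided by the overall response rate) is an assumption. Training is not shown: `models` is a hypothetical mapping from a model name to an already trained scoring function.

```python
from typing import Callable, Dict, Sequence, Tuple


def lift_at(scores: Sequence[float], outcomes: Sequence[int],
            fraction: float = 0.5) -> float:
    """Assumed lift: response rate in the top-scored fraction / overall rate."""
    base_rate = sum(outcomes) / len(outcomes)
    if base_rate == 0:
        raise ValueError("lift is undefined when no validation record responded")
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:top_n]) / top_n
    return top_rate / base_rate


def select_best(models: Dict[str, Callable[[dict], float]],
                validation_rows: Sequence[dict],
                outcomes: Sequence[int]) -> Tuple[str, Dict[str, float]]:
    """Score the validation set with every model and return the name of the
    model with the highest lift, together with all computed lifts."""
    lifts = {
        name: lift_at([model(row) for row in validation_rows], outcomes)
        for name, model in models.items()
    }
    best = max(lifts, key=lifts.get)
    return best, lifts
```

The selected model would then be applied to data about a new person to estimate the likelihood of the action, which corresponds to the last step of the abstract.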

    IDENTIFYING OPTIMAL WEIGHTS TO IMPROVE PREDICTION ACCURACY IN MACHINE LEARNING TECHNIQUES

    Publication No.: US20220292401A1

    Publication Date: 2022-09-15

    Application No.: US17827495

    Filing Date: 2022-05-27

    IPC Classification: G06N20/00 G06N5/02

    Abstract: A computer-implemented method, system and computer program product for improving prediction accuracy in machine learning techniques. A teacher model is constructed, where the teacher model generates a weight for each data case. The current student model is then trained using training data and the weights generated by the teacher model. After training the current student model, the current student model generates state features, which are used by the teacher model to generate new weights. A candidate student model is then trained using training data and these new weights. A reward is generated by comparing the current student model with the candidate student model using training and testing data, which is used to update the teacher model if a stopping rule has not been satisfied. Upon a stopping rule being satisfied, the weights generated by the teacher model are deemed to be the “optimal” weights which are returned to the user.