-
1.
Publication number: US20210150407A1
Publication date: 2021-05-20
Application number: US16684396
Filing date: 2019-11-14
Inventors: Jing Xu, Si Er Han, Steven George Barbee, Xue Ying Zhang, Ji Hui Yang
Abstract: A computer-implemented method, system and computer program product for improving prediction accuracy in machine learning techniques. A teacher model is constructed, where the teacher model generates a weight for each data case. The current student model is then trained using training data and the weights generated by the teacher model. After training the current student model, the current student model generates state features, which are used by the teacher model to generate new weights. A candidate student model is then trained using training data and these new weights. A reward is generated by comparing the current student model with the candidate student model using training and testing data, which is used to update the teacher model if a stopping rule has not been satisfied. Upon a stopping rule being satisfied, the weights generated by the teacher model are deemed to be the “optimal” weights which are returned to the user.
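The loop described in this abstract can be illustrated with a small sketch. It is not the patented method: the student here is a weighted least-squares line, the state features are the student's absolute residuals, the teacher is a one-parameter function `exp(-theta * residual)`, the reward uses held-out data only, and the teacher update is a simple accept-if-better random search rather than whatever update the patent specifies. All function names and the fixed round budget used as the stopping rule are assumptions made for illustration.

```python
import math
import random

def fit_weighted_line(xs, ys, ws):
    """Student model (assumed): weighted least-squares line y = a*x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mean_abs_error(model, xs, ys):
    a, b = model
    return sum(abs(y - (a * x + b)) for x, y in zip(xs, ys)) / len(xs)

def teacher_weights(theta, state):
    """Teacher model (assumed form): one weight per data case from its state feature."""
    return [math.exp(-theta * s) for s in state]

def search_weights(train, test, rounds=50, seed=0):
    """Return (weights, student) after a fixed number of teacher updates."""
    rng = random.Random(seed)
    (xs, ys), (tx, ty) = train, test
    theta = 0.0                                        # teacher parameter
    weights = teacher_weights(theta, [0.0] * len(xs))  # starts as all ones
    current = fit_weighted_line(xs, ys, weights)
    for _ in range(rounds):                            # stopping rule: fixed budget
        # State features from the current student: absolute residual per case.
        a, b = current
        state = [abs(y - (a * x + b)) for x, y in zip(xs, ys)]
        # The teacher proposes new weights from a perturbed parameter.
        cand_theta = max(0.0, theta + rng.uniform(-0.5, 0.5))
        cand_weights = teacher_weights(cand_theta, state)
        candidate = fit_weighted_line(xs, ys, cand_weights)
        # Reward: error reduction of the candidate student on held-out data.
        reward = mean_abs_error(current, tx, ty) - mean_abs_error(candidate, tx, ty)
        if reward > 0:                                 # update the teacher only if it helped
            theta, weights, current = cand_theta, cand_weights, candidate
    return weights, current
```

Because a proposal is accepted only when the reward is positive, the held-out error of the returned student can never exceed that of the unweighted starting model in this sketch.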
-
Publication number: US20210142213A1
Publication date: 2021-05-13
Application number: US16681920
Filing date: 2019-11-13
Inventors: Si Er Han, Steven George Barbee, Jing Xu, Ji Hui Yang, Xue Ying Zhang
IPC classification: G06N20/00, G06N5/04, G06F16/28, G06F16/2457
Abstract: Evaluating data partition quality is provided. A historical data set is partitioned into a specified number of partitions. A quality of each partition in the specified number of partitions is evaluated by measuring a distribution similarity between variables from each data subset in a respective partition and the historical data set. A highest-quality partition in the specified number of partitions is recommended to build a supervised machine learning model based on the highest-quality partition having a highest variable distribution similarity measure with the historical data set.
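A minimal sketch of the partition-scoring idea follows. The abstract does not say which similarity measure is used, so this example assumes one minus the two-sample Kolmogorov-Smirnov statistic, averaged over every variable and every subset, and assumes each candidate partition is a random train/holdout split. The function names and the `train_frac` parameter are hypothetical.

```python
import random

def ks_distance(sample, reference):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample), sorted(reference)
    i = j = 0
    gap = 0.0
    for v in sorted(set(a) | set(b)):
        while i < len(a) and a[i] <= v:
            i += 1
        while j < len(b) and b[j] <= v:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap

def partition_quality(rows, subsets):
    """Mean of (1 - KS distance) over every subset and every variable (column)."""
    n_vars = len(rows[0])
    scores = []
    for subset in subsets:
        for k in range(n_vars):
            full_col = [r[k] for r in rows]
            sub_col = [r[k] for r in subset]
            scores.append(1.0 - ks_distance(sub_col, full_col))
    return sum(scores) / len(scores)

def recommend_partition(rows, n_partitions, train_frac=0.7, seed=0):
    """Generate n_partitions random splits and return (quality, subsets) of the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_partitions):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        subsets = (shuffled[:cut], shuffled[cut:])
        quality = partition_quality(rows, subsets)
        if best is None or quality > best[0]:
            best = (quality, subsets)
    return best
```

A split that simply cuts ordered data in two scores poorly under this measure, because each subset's distribution of the ordering variable differs sharply from the full data set.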
-
Publication number: US12093838B2
Publication date: 2024-09-17
Application number: US17027688
Filing date: 2020-09-21
Inventors: Jing Xu, Si Er Han, Xue Ying Zhang, Steven George Barbee, Ji Hui Yang
IPC classification: G06N5/01, G06F17/18, G06F18/22, G06F18/2413, G06N7/01
CPC classification: G06N5/01, G06F17/18, G06F18/22, G06F18/2413, G06N7/01
Abstract: Embodiments of the present disclosure relate to a method, system, and computer program product for efficient execution of a decision tree. According to the method, respective target values of a plurality of attributes of a target entity are obtained. Representations of a plurality of leaf nodes of a decision tree are obtained. Each of the representations indicates respective statistic values of a plurality of attributes of historical entities and a statistic prediction result determined from historical prediction results output at a respective one of the plurality of leaf nodes for the historical entities. Distance measures between the target entity and the plurality of leaf nodes are determined based on the target values and the statistic values indicated by the representations of the plurality of leaf nodes. A target prediction result for the target entity is determined based on the distance measures and the statistic prediction results of the historical entities.
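One way to read this abstract is as a distance-weighted lookup over leaf summaries instead of a root-to-leaf traversal. The sketch below assumes that each leaf is summarized by the per-attribute means of its historical entities and by their mean prediction, that the distance is Euclidean, and that the leaf predictions are combined by inverse-distance weighting. None of these choices is stated in the abstract, and the dictionary keys `means` and `prediction` are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def predict_from_leaves(target, leaves, eps=1e-9):
    """Inverse-distance-weighted average of the leaves' statistic predictions.

    target: attribute values of the target entity.
    leaves: list of dicts with "means" (statistic values per attribute)
            and "prediction" (statistic prediction result for that leaf).
    eps:    small constant that avoids division by zero on an exact match.
    """
    weights = [1.0 / (euclidean(target, leaf["means"]) + eps) for leaf in leaves]
    total = sum(weights)
    return sum(w * leaf["prediction"] for w, leaf in zip(weights, leaves)) / total
```

A target that sits exactly on one leaf's summary receives essentially that leaf's prediction, while a target equidistant from two leaves receives the average of their predictions.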
-
4.
Publication number: US11443235B2
Publication date: 2022-09-13
Application number: US16684396
Filing date: 2019-11-14
Inventors: Jing Xu, Si Er Han, Steven George Barbee, Xue Ying Zhang, Ji Hui Yang
Abstract: A computer-implemented method, system and computer program product for improving prediction accuracy in machine learning techniques. A teacher model is constructed, where the teacher model generates a weight for each data case. The current student model is then trained using training data and the weights generated by the teacher model. After training the current student model, the current student model generates state features, which are used by the teacher model to generate new weights. A candidate student model is then trained using training data and these new weights. A reward is generated by comparing the current student model with the candidate student model using training and testing data, which is used to update the teacher model if a stopping rule has not been satisfied. Upon a stopping rule being satisfied, the weights generated by the teacher model are deemed to be the “optimal” weights which are returned to the user.
-
Publication number: US20220156572A1
Publication date: 2022-05-19
Application number: US16950017
Filing date: 2020-11-17
Inventors: Si Er Han, Jing Xu, Xue Ying Zhang, Ji Hui Yang, Steven George Barbee
Abstract: A computer-implemented method, system and computer program product for processing a data set is provided. In this method, an original data set including a plurality of data records is obtained. Each data record in the original data set has values of a first number of features. A representative data set having the plurality of representative data records is determined. Each representative data record has values of a second number of representatives. The second number of representatives are obtained by training an autoencoder neural network with values of the first number of features as inputs, and the second number is smaller than the first number. The plurality of representative data records is segmented into two or more clusters based on the values of the second number of representatives. The representative data records in the two or more clusters are partitioned to form a predefined number of representative data subsets.
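The final partitioning step can be sketched on its own, assuming the autoencoder encoding and the clustering have already produced one cluster id per record. The abstract does not say how records within the clusters are distributed among the subsets. This example assumes a round-robin assignment that keeps each cluster's share roughly equal across subsets. The function name is hypothetical.

```python
from collections import defaultdict

def partition_by_cluster(cluster_ids, n_subsets):
    """Split record indices into n_subsets, spreading each cluster evenly.

    cluster_ids: cluster id of each record (assumed to be computed beforehand
                 from the reduced "representative" values).
    Returns a list of n_subsets lists of record indices.
    """
    by_cluster = defaultdict(list)
    for idx, cluster in enumerate(cluster_ids):
        by_cluster[cluster].append(idx)
    subsets = [[] for _ in range(n_subsets)]
    slot = 0
    for cluster in sorted(by_cluster):
        for idx in by_cluster[cluster]:
            # Deal records out like cards so no subset over-represents a cluster.
            subsets[slot % n_subsets].append(idx)
            slot += 1
    return subsets
```

With six records in one cluster and three in another, a call with `n_subsets=3` gives every subset two records from the first cluster and one from the second.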
-
Publication number: US20220092437A1
Publication date: 2022-03-24
Application number: US17027688
Filing date: 2020-09-21
Inventors: Jing Xu, Si Er Han, Xue Ying Zhang, Steven George Barbee, Ji Hui Yang
Abstract: Embodiments of the present disclosure relate to a method, system, and computer program product for efficient execution of a decision tree. According to the method, respective target values of a plurality of attributes of a target entity are obtained. Representations of a plurality of leaf nodes of a decision tree are obtained. Each of the representations indicates respective statistic values of a plurality of attributes of historical entities and a statistic prediction result determined from historical prediction results output at a respective one of the plurality of leaf nodes for the historical entities. Distance measures between the target entity and the plurality of leaf nodes are determined based on the target values and the statistic values indicated by the representations of the plurality of leaf nodes. A target prediction result for the target entity is determined based on the distance measures and the statistic prediction results of the historical entities.
-
Publication number: US20230073137A1
Publication date: 2023-03-09
Application number: US17447258
Filing date: 2021-09-09
Inventors: Jing Xu, Si Er Han, Xue Ying Zhang, Steven George Barbee, Ji Hui Yang
Abstract: A computer-implemented method for machine learning model training. A number of processor units creates a cluster model comprising labeled samples and unlabeled samples. The number of processor units identifies cluster information for the labeled samples from the cluster model. The number of processor units adds a set of new features to a set of original features for the labeled samples using the cluster information to form an extended set of features for the labeled samples, wherein the labeled samples with the set of original features and the set of new features form a training data set for training a machine learning model.
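The feature-extension step can be illustrated as follows. The abstract does not name a clustering algorithm or say which cluster information becomes a feature. This sketch assumes a plain k-means fitted on labeled and unlabeled samples together, and assumes the new features are the cluster index and the distance to the assigned centroid. All function names are hypothetical.

```python
import math
import random

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    dists = [distance(point, c) for c in centroids]
    return dists.index(min(dists))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def add_cluster_features(labeled, unlabeled, k=2):
    """Extend each labeled sample with [cluster index, distance to its centroid]."""
    centroids = kmeans(labeled + unlabeled, k)
    extended = []
    for p in labeled:
        c = nearest(p, centroids)
        extended.append(list(p) + [float(c), distance(p, centroids[c])])
    return extended
```

The unlabeled samples influence only where the centroids fall. The returned rows contain the labeled samples alone, each with its original features followed by the two new ones.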
-
Publication number: US20230083118A1
Publication date: 2023-03-16
Application number: US17476401
Filing date: 2021-09-15
Inventors: Steven George Barbee, Si Er Han, Jing Xu, Ji Hui Yang, Xue Ying Zhang
Abstract: An approach is provided in which the approach generates anomaly score variables using multiple unsupervised models based on a set of data records. The approach normalizes the anomaly score variables into multiple normalized variables, and constructs at least one interaction based on a first one of the normalized variables and a second one of the normalized variables. The first normalized variable corresponds to a first one of the anomaly score variables and the second normalized variable corresponds to a second one of the anomaly score variables. The approach detects a set of anomalies based on the at least one interaction and transmits the set of anomalies to a user.
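The normalize-then-interact step can be sketched briefly, assuming the two anomaly score variables have already been produced by two unsupervised models. The abstract does not specify the normalization, the form of the interaction, or the detection rule. This example assumes min-max scaling, a product interaction, and a fixed threshold, and its function names are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def detect_anomalies(score_a, score_b, threshold=0.5):
    """Return indices of records whose interaction term reaches the threshold.

    score_a, score_b: anomaly scores for the same records from two models
                      (assumed inputs; higher means more anomalous).
    """
    norm_a, norm_b = min_max(score_a), min_max(score_b)
    interaction = [a * b for a, b in zip(norm_a, norm_b)]
    return [i for i, v in enumerate(interaction) if v >= threshold]
```

With a product interaction, a record is flagged only when both models score it highly, which suppresses cases that a single model rates as extreme.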
-
Publication number: US11562400B1
Publication date: 2023-01-24
Application number: US17483328
Filing date: 2021-09-23
Inventors: Jing Xu, Si Er Han, Xue Ying Zhang, Steven George Barbee, Ji Hui Yang
Abstract: A method includes training a plurality of different types of machine learning models using a training dataset to produce a set of trained machine learning models and determining a lift of each trained machine learning model in the set of trained machine learning models using a validation dataset. The method also includes selecting a trained machine learning model from the set of trained machine learning models that has a highest lift of the set of trained machine learning models and predicting a likelihood that a person will perform an action by applying the selected trained machine learning model to data about the person.
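Selecting a model by lift can be sketched as below, assuming the candidate models are already trained and have scored the validation set. Lift is computed here in the usual way, as the response rate among the top-scored fraction divided by the overall response rate. The abstract does not fix the fraction, so `top_frac` and the function names are assumptions.

```python
def lift_at(scores, labels, top_frac=0.1):
    """Lift of the top-scored fraction: response rate there / overall response rate.

    Assumes labels are 0/1 and contain at least one positive.
    """
    n_top = max(1, int(len(scores) * top_frac))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

def select_by_lift(model_scores, labels, top_frac=0.1):
    """Return (best model name, {name: lift}) given validation scores per model."""
    lifts = {name: lift_at(s, labels, top_frac) for name, s in model_scores.items()}
    best = max(lifts, key=lifts.get)
    return best, lifts
```

The selected model would then be applied to data about a new person to estimate the likelihood of the action. That step depends on the model type and is left out of the sketch.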
-
10.
Publication number: US20220292401A1
Publication date: 2022-09-15
Application number: US17827495
Filing date: 2022-05-27
Inventors: Jing Xu, Si Er Han, Steven George Barbee, Xue Ying Zhang, Ji Hui Yang
Abstract: A computer-implemented method, system and computer program product for improving prediction accuracy in machine learning techniques. A teacher model is constructed, where the teacher model generates a weight for each data case. The current student model is then trained using training data and the weights generated by the teacher model. After training the current student model, the current student model generates state features, which are used by the teacher model to generate new weights. A candidate student model is then trained using training data and these new weights. A reward is generated by comparing the current student model with the candidate student model using training and testing data, which is used to update the teacher model if a stopping rule has not been satisfied. Upon a stopping rule being satisfied, the weights generated by the teacher model are deemed to be the “optimal” weights which are returned to the user.