Model training methods, apparatuses, and systems

    Publication No.: US11176469B2

    Publication date: 2021-11-16

    Application No.: US17244811

    Application date: 2021-04-29

    Abstract: A first training participant performs an iterative process until a predetermined condition is satisfied, where the iterative process includes: obtaining, using secret sharing matrix addition and based on the current sub-model of each training participant and a corresponding feature sample subset of each training participant, a current prediction value of the regression model for a feature sample set, where the corresponding feature sample subset of each training participant is obtained by performing vertical segmentation on the feature sample set; determining a prediction difference between the current prediction value and a label corresponding to the current prediction value; sending the prediction difference to each second training participant; and updating a current sub-model of the first training participant based on the current sub-model of the first training participant and a product of a corresponding feature sample subset of the first training participant and the prediction difference.
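
    The iteration described in this abstract can be sketched as follows. This is a minimal illustration, not the patented protocol: it assumes a linear regression model, additive secret shares drawn as plain floats (a real scheme would normally work over a finite ring), a fixed iteration count as the stopping condition, and a learning rate `lr` that the abstract does not mention. All function names are hypothetical.

```python
import random

def matvec(X, w):
    """Multiply a row-major matrix X by a vector w."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]

def split_into_shares(vec, n_parties, rng):
    """Split vec into n_parties additive shares whose element-wise sum is vec."""
    shares = [[rng.uniform(-1.0, 1.0) for _ in vec] for _ in range(n_parties - 1)]
    last = [v - sum(s[i] for s in shares) for i, v in enumerate(vec)]
    return shares + [last]

def shared_prediction(feature_subsets, sub_models, rng):
    """Sum the per-participant partial predictions X_i @ w_i via additive shares."""
    n = len(feature_subsets)
    n_samples = len(feature_subsets[0])
    # Each participant splits its local partial prediction into n shares.
    shares = [split_into_shares(matvec(X, w), n, rng)
              for X, w in zip(feature_subsets, sub_models)]
    # Participant j only sees the j-th share from every participant and sums them.
    partial_sums = [[sum(shares[i][j][k] for i in range(n)) for k in range(n_samples)]
                    for j in range(n)]
    # The first participant adds the partial sums to recover the prediction.
    return [sum(ps[k] for ps in partial_sums) for k in range(n_samples)]

def train(feature_subsets, labels, lr=0.5, n_iters=300, seed=0):
    rng = random.Random(seed)
    sub_models = [[0.0] * len(X[0]) for X in feature_subsets]
    for _ in range(n_iters):  # assumed stand-in for the "predetermined condition"
        pred = shared_prediction(feature_subsets, sub_models, rng)
        diff = [p - y for p, y in zip(pred, labels)]  # computed by the label holder
        # The difference is sent to every participant; each updates only its own
        # sub-model from the product of its feature subset and the difference.
        for X, w in zip(feature_subsets, sub_models):
            for j in range(len(w)):
                grad = sum(row[j] * d for row, d in zip(X, diff)) / len(labels)
                w[j] -= lr * grad
    return sub_models

# Vertically segmented toy data: labels = 2 * first feature + 3 * second feature.
X_first = [[1.0], [0.0], [1.0], [2.0]]
X_second = [[0.0], [1.0], [1.0], [1.0]]
labels = [2.0, 3.0, 5.0, 7.0]
w_first, w_second = train([X_first, X_second], labels)
```

    On this toy data the two sub-models approach 2 and 3, while neither participant has to reveal its feature column or its unmasked partial prediction.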

    Recommendation system construction method and apparatus

    Publication No.: US10902332B2

    Publication date: 2021-01-26

    Application No.: US16725589

    Application date: 2019-12-23

    Abstract: A client device determines a local user gradient value based on a current user preference vector and a local item gradient value based on a current item feature vector. The client device updates a user preference vector by using the local user gradient value and updates an item feature vector by using the local item gradient value. The client device determines a neighboring client device based on a predetermined adjacency relationship. The client device sends the local item gradient value to the neighboring client device and receives a neighboring item gradient value sent by the neighboring client device. The client device updates the item feature vector by using the neighboring item gradient value. In response to determining that a predetermined iteration stop condition is satisfied, the client device outputs the user preference vector and the item feature vector.
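
    A rough sketch of this decentralized update loop is given below. It assumes a plain matrix factorization model with squared error, a fixed number of rounds as the stop condition, and a hand-written adjacency map; the `Client` class, the learning rate, and the toy ratings are all hypothetical and only illustrate the message flow.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class Client:
    def __init__(self, ratings, n_items, dim, rng):
        self.ratings = ratings  # {item_id: rating}; never leaves the device
        self.user_vec = [rng.uniform(0.0, 0.1) for _ in range(dim)]
        self.item_vecs = [[rng.uniform(0.0, 0.1) for _ in range(dim)]
                          for _ in range(n_items)]

    def loss(self):
        return sum((dot(self.user_vec, self.item_vecs[i]) - r) ** 2
                   for i, r in self.ratings.items())

    def local_gradients(self):
        """Return the local user gradient and the per-item local item gradients."""
        user_grad = [0.0] * len(self.user_vec)
        item_grads = {}
        for i, r in self.ratings.items():
            err = dot(self.user_vec, self.item_vecs[i]) - r
            user_grad = [g + err * v for g, v in zip(user_grad, self.item_vecs[i])]
            item_grads[i] = [err * u for u in self.user_vec]
        return user_grad, item_grads

    def apply_user_gradient(self, grad, lr):
        self.user_vec = [u - lr * g for u, g in zip(self.user_vec, grad)]

    def apply_item_gradients(self, item_grads, lr):
        for i, grad in item_grads.items():
            self.item_vecs[i] = [v - lr * g for v, g in zip(self.item_vecs[i], grad)]

def train(clients, neighbors, lr=0.1, n_rounds=500):
    for _ in range(n_rounds):  # assumed stop condition: fixed round count
        grads = [c.local_gradients() for c in clients]
        # Each client first applies its own gradients.
        for c, (user_grad, item_grads) in zip(clients, grads):
            c.apply_user_gradient(user_grad, lr)
            c.apply_item_gradients(item_grads, lr)
        # Only item gradients are exchanged, and only between neighbors.
        for sender, (_, item_grads) in enumerate(grads):
            for receiver in neighbors[sender]:
                clients[receiver].apply_item_gradients(item_grads, lr)

rng = random.Random(0)
ratings = [{0: 1.0, 1: 0.2}, {0: 0.8, 2: 0.4}, {1: 0.4, 2: 0.6}]
clients = [Client(r, n_items=3, dim=2, rng=rng) for r in ratings]
neighbors = {0: [1], 1: [0, 2], 2: [1]}  # hypothetical adjacency relationship
loss_before = sum(c.loss() for c in clients)
train(clients, neighbors)
loss_after = sum(c.loss() for c in clients)
```

    Ratings and user preference vectors stay on each device in this sketch; the only values that cross the network are item gradients.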

    Method and apparatus for clustering data stream

    Publication No.: US11226993B2

    Publication date: 2022-01-18

    Application No.: US16684831

    Application date: 2019-11-15

    Abstract: Provided is a method for clustering a data stream. The method comprises: acquiring a plurality of resulting models of a plurality of preceding data partitions prior to a current data partition in a data stream, wherein data partitions in the data stream have a temporal relationship, and wherein each of the plurality of resulting models is generated according to a clustering result of a corresponding preceding data partition, and each of the plurality of resulting models comprises one or more representative parameters in different categories; determining a starting model of the current data partition according to the plurality of resulting models, wherein the starting model comprises one or more representative parameters in different categories determined based on representative parameters of the same category in the plurality of resulting models; and clustering data records in the current data partition by using the starting model.
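
    One way to picture the starting-model idea is the sketch below. It assumes one-dimensional records, k-means as the clustering step, centroids as the "representative parameters", and a simple per-category average over a sliding window of preceding models; the abstract does not fix any of these choices, and the function names are hypothetical.

```python
def starting_model(resulting_models):
    """Average the centroid of each category across the preceding models."""
    n_categories = len(resulting_models[0])
    return [sum(model[c] for model in resulting_models) / len(resulting_models)
            for c in range(n_categories)]

def kmeans_1d(points, centroids, n_iters=10):
    """Run k-means on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(n_iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            clusters[nearest].append(p)
        # Keep the old centroid when a category receives no records.
        centroids = [sum(members) / len(members) if members else old
                     for members, old in zip(clusters, centroids)]
    return centroids

def cluster_stream(partitions, first_init, window=2):
    """Cluster time-ordered partitions, seeding each from earlier resulting models."""
    models = []
    for partition in partitions:
        init = starting_model(models[-window:]) if models else first_init
        models.append(kmeans_1d(partition, init))
    return models

partitions = [
    [1.0, 1.2, 9.0, 9.2],
    [1.4, 1.6, 9.4, 9.6],
    [1.8, 2.0, 9.8, 10.0],
]
models = cluster_stream(partitions, first_init=[0.0, 10.0])
```

    Because category `c` in each starting model is derived from category `c` of the preceding models, cluster identities stay aligned across partitions even as the data drifts.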

    GBDT model feature interpretation method and apparatus

    Publication No.: US11205129B2

    Publication date: 2021-12-21

    Application No.: US16889695

    Application date: 2020-06-01

    Abstract: Implementations of the present specification disclose methods, devices, and apparatuses for determining a feature interpretation of a predicted label value of a user generated by a GBDT model. In one aspect, the method includes separately obtaining, from each of a predetermined quantity of decision trees ranked among top decision trees, a leaf node and a score of the leaf node; determining a respective prediction path of each leaf node; obtaining, for each parent node on each prediction path, a split feature and a score of the parent node; determining, for each child node on each prediction path, a feature corresponding to the child node and a local increment of the feature on the child node; obtaining a collection of features respectively corresponding to the child nodes; and obtaining a respective measure of relevance between the feature corresponding to the at least one child node and the predicted label value.
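
    The path-based attribution described here can be sketched roughly as below. The sketch assumes that a parent node's score is the mean of its two children's scores and that a child's local increment (child score minus parent score) is credited to the parent's split feature; both are assumptions made for illustration, as is the nested-dict tree layout and every function name.

```python
def node_score(node):
    """Leaf score, or (assumed) the mean of the two child scores for a parent."""
    if "score" in node:
        return node["score"]
    return (node_score(node["left"]) + node_score(node["right"])) / 2

def path_contributions(tree, sample):
    """Walk the prediction path and collect per-feature local increments."""
    contributions = {}
    node = tree
    while "score" not in node:
        feature = node["feature"]
        child = node["left"] if sample[feature] <= node["threshold"] else node["right"]
        increment = node_score(child) - node_score(node)
        contributions[feature] = contributions.get(feature, 0.0) + increment
        node = child
    return node["score"], contributions

def explain(trees, sample, top_n):
    """Sum local increments per feature over the first top_n trees."""
    totals = {}
    for tree in trees[:top_n]:
        _, contributions = path_contributions(tree, sample)
        for feature, increment in contributions.items():
            totals[feature] = totals.get(feature, 0.0) + increment
    return totals

tree_a = {
    "feature": "age", "threshold": 30,
    "left": {"score": 0.2},
    "right": {
        "feature": "income", "threshold": 50,
        "left": {"score": 0.4},
        "right": {"score": 0.8},
    },
}
tree_b = {
    "feature": "income", "threshold": 100,
    "left": {"score": -0.1},
    "right": {"score": 0.3},
}
user = {"age": 40, "income": 60}
relevance = explain([tree_a, tree_b], user, top_n=2)
```

    For this user, `age` ends up with a positive total while the two `income` increments cancel out, which is the kind of per-feature relevance measure the abstract refers to.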

    Model training method and apparatus based on data sharing

    Publication No.: US11106804B2

    Publication date: 2021-08-31

    Application No.: US16720931

    Application date: 2019-12-19

    Abstract: Techniques for data sharing between a data miner and a data provider are provided. A set of public parameters is downloaded from the data miner. The public parameters are data miner parameters associated with a feature set of training sample data. A set of private parameters in the data provider can be replaced with the set of public parameters. The private parameters are data provider parameters associated with the feature set of training sample data. The private parameters are updated to provide a set of update results. The private parameters are updated based on a model parameter update algorithm associated with the data provider. The update results are uploaded to the data miner.
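
    The download, replace, update, upload cycle might look like the sketch below. It assumes a linear model, one gradient step as the provider's "model parameter update algorithm", parameter deltas as the "update results", and a data miner that averages the uploaded deltas; the abstract does not specify how the miner uses the uploads, so that part in particular is an assumption. Class and method names are hypothetical.

```python
class DataMiner:
    def __init__(self, n_features):
        self.public_params = [0.0] * n_features

    def download(self):
        return list(self.public_params)

    def apply_uploads(self, update_results):
        # Assumption: the miner averages the uploaded parameter deltas.
        n = len(update_results)
        for j in range(len(self.public_params)):
            self.public_params[j] += sum(u[j] for u in update_results) / n

class DataProvider:
    def __init__(self, samples, lr=0.5):
        self.samples = samples  # list of (features, label); stays with the provider
        self.private_params = None
        self.lr = lr

    def local_update(self, public_params):
        # Replace the private parameters with the downloaded public parameters.
        self.private_params = list(public_params)
        grad = [0.0] * len(public_params)
        for x, y in self.samples:
            err = sum(w * xj for w, xj in zip(self.private_params, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(self.samples)
        # One gradient step on local data (assumed update algorithm).
        self.private_params = [w - self.lr * g
                               for w, g in zip(self.private_params, grad)]
        # Only the parameter delta is uploaded, never the raw samples.
        return [new - old for new, old in zip(self.private_params, public_params)]

miner = DataMiner(n_features=2)
providers = [
    DataProvider([([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0)]),
    DataProvider([([1.0, 1.0], 5.0), ([2.0, 1.0], 7.0)]),
]
for _ in range(300):
    uploads = [p.local_update(miner.download()) for p in providers]
    miner.apply_uploads(uploads)
```

    Both providers' samples are consistent with the weights `[2, 3]`, and the public parameters drift toward those values although the miner never sees a single sample.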

    Model training method and apparatus based on gradient boosting decision tree

    Publication No.: US11157818B2

    Publication date: 2021-10-26

    Application No.: US17158451

    Application date: 2021-01-26

    Abstract: Disclosed are a model training method and apparatus based on a gradient boosting decision tree (GBDT). A GBDT algorithm flow is divided into two stages. In the first stage, labeled samples are obtained from a data domain of a service scenario similar to a target service scenario to sequentially train several decision trees, and the training residual generated after the first stage is determined; in the second stage, labeled samples are obtained from a data domain of the target service scenario, and several decision trees continue to be trained based on the training residual. Finally, the model applied to the target service scenario is obtained by integrating the decision trees trained in the first stage with the decision trees trained in the second stage.
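
    The two-stage flow can be illustrated with the sketch below. It assumes squared-error regression, depth-one trees (stumps) on a single feature, a shrinkage factor `lr`, and it reads "based on the training residual" as fitting each second-stage tree to what the first-stage trees still get wrong on the target samples. None of these details come from the abstract, and all names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def fit_stump(xs, residuals):
    """Fit a one-split regression tree (threshold, left value, right value)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical: fall back to a constant prediction
        m = mean(residuals)
        return (xs[0], m, m)
    return best[1:]

def predict(trees, x, lr):
    return sum(lr * (lm if x <= t else rm) for t, lm, rm in trees)

def boost(trees, xs, ys, n_new, lr):
    """Append n_new trees, each fitted to the residual of the current ensemble."""
    for _ in range(n_new):
        residuals = [y - predict(trees, x, lr) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return trees

def mse(trees, xs, ys, lr):
    return mean([(y - predict(trees, x, lr)) ** 2 for x, y in zip(xs, ys)])

lr = 0.5
source_xs, source_ys = [1, 2, 3, 4, 5, 6], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
target_xs, target_ys = [1, 2, 3, 4, 5, 6], [1.0, 1.0, 1.0, 6.0, 6.0, 6.0]

# Stage one: trees trained on the similar (source) scenario.
trees = boost([], source_xs, source_ys, n_new=5, lr=lr)
target_mse_stage_one = mse(trees, target_xs, target_ys, lr)

# Stage two: continue boosting on the target scenario's residual.
trees = boost(trees, target_xs, target_ys, n_new=5, lr=lr)
target_mse_final = mse(trees, target_xs, target_ys, lr)
```

    The final ensemble is simply the first-stage trees followed by the second-stage trees, so the target scenario reuses what was learned from the similar scenario and only has to correct the remaining error.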
