Publication No.: US20180331897A1
Publication Date: 2018-11-15
Application No.: US16044757
Application Date: 2018-07-25
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Youhua Zhang, Dandan Tu
Abstract: A method and a device for training a model in a distributed system are disclosed, so as to reduce the load on a master node (101) during model training. The method includes: receiving, by a parameter server (1022) in a first slave node (102), a training result sent by a parameter client (1021) in at least one slave node (102) in the distributed system, where the first slave node (102) is any slave node (102) in the distributed system, and the parameter client (1021) in each slave node (102) obtains its training result by executing a training task corresponding to a sub-model stored on the parameter server (1022) in that slave node (102); and updating, by the parameter server (1022) in the first slave node (102), based on the received training result, the sub-model stored on the parameter server (1022) in the first slave node (102).
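
The receive-and-update step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not specify the form of a training result or the update rule, so the sketch assumes that a training result is a per-parameter gradient and that the parameter server applies an averaged gradient step. The names ParameterServer, TrainingResult, receive, update and learning_rate are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: a training result maps a parameter name to a gradient value.
TrainingResult = Dict[str, float]


@dataclass
class ParameterServer:
    """Hypothetical parameter server (1022) holding one sub-model in a slave node (102)."""

    sub_model: Dict[str, float]
    learning_rate: float = 0.5  # assumed hyperparameter, not taken from the source
    pending: List[TrainingResult] = field(default_factory=list)

    def receive(self, result: TrainingResult) -> None:
        # Collect a training result sent by a parameter client (1021).
        self.pending.append(result)

    def update(self) -> None:
        # Apply the received results to the locally stored sub-model.
        # Assumed rule: step against the mean gradient of each parameter.
        for name in self.sub_model:
            grads = [r[name] for r in self.pending if name in r]
            if grads:
                mean_grad = sum(grads) / len(grads)
                self.sub_model[name] -= self.learning_rate * mean_grad
        self.pending.clear()


# Usage: two parameter clients report gradients for parameter "w".
server = ParameterServer(sub_model={"w": 1.0})
server.receive({"w": 2.0})
server.receive({"w": 4.0})
server.update()
print(server.sub_model)
```

In this sketch the sub-model is updated inside the slave node itself, so the master node does not have to aggregate training results, which is the load reduction the abstract refers to.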