Distributed model training
    Invention Grant

    Publication (Announcement) Number: US11853391B1

    Publication (Announcement) Date: 2023-12-26

    Application Number: US16139607

    Application Date: 2018-09-24

    CPC classification number: G06F18/2148 G06N20/00

    Abstract: Exemplary embodiments provide distributed parallel training of a machine learning model. Multiple processors may be used to train the model in order to reduce training time. To keep the trained model data consistent, the processors exchange data after a number of training cycles. To improve communication efficiency, exemplary embodiments use two levels of synchronization: data is synchronized among the processors within a set after a first predetermined number of training cycles, and data is synchronized between one or more processors of each set after a second predetermined number of training cycles. During the first synchronization, within a set of processors, compressed model gradient data generated by the training cycles may be communicated. During the second synchronization, between sets of processors, trained models or full model gradient data generated by the training cycles may be communicated.
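    The two-level synchronization scheme described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the group sizes, the cycle counts INTRA_SYNC_EVERY and INTER_SYNC_EVERY, the top-k gradient compression, and the names Worker, compress_topk, and train are all hypothetical choices made for demonstration.

    import numpy as np

    # Assumed cycle counts; the patent only says "a predetermined number".
    INTRA_SYNC_EVERY = 4    # cycles between intra-set syncs
    INTER_SYNC_EVERY = 16   # cycles between inter-set syncs

    def compress_topk(grad, k):
        """Keep only the k largest-magnitude entries (one possible compression)."""
        out = np.zeros_like(grad)
        idx = np.argsort(np.abs(grad))[-k:]
        out[idx] = grad[idx]
        return out

    class Worker:
        """Stand-in for one processor training a local model copy."""
        def __init__(self, dim, rng):
            self.params = np.zeros(dim)
            self.rng = rng

        def train_cycle(self):
            """Stand-in for a real training step: returns a gradient."""
            return self.rng.normal(size=self.params.shape)

    def train(groups, cycles, lr=0.01, k=8):
        for cycle in range(1, cycles + 1):
            grads = [[w.train_cycle() for w in g] for g in groups]
            if cycle % INTRA_SYNC_EVERY == 0:
                # First level: communicate *compressed* gradients within
                # each set of processors and apply their average.
                for g, gg in zip(groups, grads):
                    avg = np.mean([compress_topk(x, k) for x in gg], axis=0)
                    for w in g:
                        w.params -= lr * avg
            if cycle % INTER_SYNC_EVERY == 0:
                # Second level: one representative per set exchanges the
                # full model state, which is then broadcast within each set.
                mean = np.mean([g[0].params for g in groups], axis=0)
                for g in groups:
                    for w in g:
                        w.params = mean.copy()

    rng = np.random.default_rng(0)
    groups = [[Worker(32, rng) for _ in range(4)] for _ in range(2)]
    train(groups, cycles=32)

    The point of the hierarchy is bandwidth: the frequent intra-set exchanges carry only compressed gradients, while the full model data travels only during the rarer inter-set exchanges.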
