Publication number: US11853391B1
Publication date: 2023-12-26
Application number: US16139607
Filing date: 2018-09-24
Applicant: Amazon Technologies, Inc.
Inventor: Pranav Prashant Ladkat, Oleg Rybakov, Nikko Strom, Sri Venkata Surya Siva Rama Krishna Garimella, Sree Hari Krishnan Parthasarathi
IPC: G06F18/214 , G06N20/00
CPC classification number: G06F18/2148 , G06N20/00
Abstract: Exemplary embodiments provide distributed parallel training of a machine learning model. Multiple processors may be used to train the model in parallel to reduce training time. To synchronize trained model data, the processors communicate after a number of training cycles. To improve communication efficiency, exemplary embodiments perform synchronization at two levels: data is synchronized among the processors within a set after a first predetermined number of training cycles, and between one or more processors of each set after a second predetermined number of training cycles. During the first synchronization, among the processors of a set, compressed model gradient data generated by the training cycles may be communicated. During the second synchronization, between sets of processors, trained models or full model gradient data may be communicated.
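The two-level scheme in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, top-k sparsification is assumed as the compression method (the abstract does not specify one), and plain in-memory averaging stands in for inter-processor communication.

```python
import numpy as np

def compress(grad, k):
    """Top-k sparsification: keep only the k largest-magnitude entries.
    (An assumed compression scheme; the source does not name one.)"""
    idx = np.argsort(np.abs(grad))[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse

def intra_set_sync(grads, k):
    """First-level sync among the processors of one set:
    average *compressed* gradients to cut communication volume."""
    return np.mean([compress(g, k) for g in grads], axis=0)

def inter_set_sync(set_grads):
    """Second-level sync between sets of processors:
    average *full* (uncompressed) gradient data."""
    return np.mean(set_grads, axis=0)

# Two sets of two workers each; every worker holds a local gradient.
set_a = [np.array([1.0, -3.0, 0.5, 2.0]), np.array([0.2, -2.0, 0.1, 4.0])]
set_b = [np.array([2.0, -1.0, 0.3, 1.0]), np.array([1.0, -4.0, 0.2, 3.0])]

# First synchronization: compressed gradients within each set.
a_avg = intra_set_sync(set_a, k=2)
b_avg = intra_set_sync(set_b, k=2)

# Second synchronization: full gradient data between the sets.
global_grad = inter_set_sync([a_avg, b_avg])
```

In a real deployment each averaging step would be an all-reduce over a network; the key point preserved here is that the frequent intra-set exchanges send sparse data while the rarer inter-set exchanges send full data.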