-
Publication Number: US20240004828A1
Publication Date: 2024-01-04
Application Number: US18251123
Filing Date: 2020-11-11
Inventors: Kenji Tanaka, Tsuyoshi Ito, Yuki Arikawa, Tsutomu Takeya, Kazuhiko Terada, Takeshi Sakamoto
IPC Classification: G06F15/173, G06F9/38
CPC Classification: G06F15/17306, G06F9/3867
Abstract: Each NIC performs an aggregation calculation on data output from each processor in forward order along a first pipeline connection, from a head NIC located at the head position, through an intermediate NIC located at an intermediate position, to a tail NIC located at the tail position. When the aggregation calculation in the tail NIC is completed, the NICs distribute the obtained aggregation result in the reverse order, from the tail NIC through the intermediate NIC to the head NIC, and each NIC outputs the aggregation result to the processor of its communication interface.
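As an illustration only (not the patent's implementation), the aggregate-forward, distribute-backward flow described in this abstract can be sketched in Python; the function and variable names are assumptions:

```python
# Minimal sketch of the pipelined aggregate-then-distribute pattern: data is
# summed head -> intermediate -> tail, then the finished aggregation result
# travels tail -> intermediate -> head and is handed to every processor.
from typing import List

def pipeline_aggregate_distribute(processor_outputs: List[List[float]]) -> List[List[float]]:
    # Forward pass: each NIC in the first pipeline connection adds its
    # processor's data to the running aggregation.
    partial = [0.0] * len(processor_outputs[0])
    for data in processor_outputs:               # head, intermediates, tail
        partial = [p + d for p, d in zip(partial, data)]
    # Backward pass: once the tail NIC completes the calculation, the result
    # is distributed in reverse order, one copy per NIC/processor.
    return [list(partial) for _ in processor_outputs]

if __name__ == "__main__":
    outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]    # three processors
    print(pipeline_aggregate_distribute(outputs))      # each gets [9.0, 12.0]
```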
-
Publication Number: US11823063B2
Publication Date: 2023-11-21
Application Number: US16973707
Filing Date: 2019-05-21
Inventors: Tsuyoshi Ito, Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Takeshi Sakamoto
Abstract: Individual distributed processing nodes packetize distributed data for each weight of a neural network to be learned, in order of weight number, transmit the distributed data to an aggregation processing node, acquire, in order, the aggregation data transmitted from that node, and update the weights of the neural network. The aggregation processing node acquires the transmitted distributed data, packetizes aggregation data in which the distributed data of all the distributed processing nodes is aggregated for each weight, and transmits the aggregation data to the individual nodes. The individual nodes monitor an unreceived data amount, the difference between the amount of distributed data transmitted and the amount of aggregation data acquired, and when the unreceived data amount becomes equal to or larger than a threshold Ma, stop transmission of the distributed data until the unreceived data amount becomes equal to or smaller than a threshold Mb (Mb < Ma).
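A hedged sketch of the Ma/Mb flow control described above; the class and method names are illustrative, not from the patent:

```python
# Threshold-based backpressure: the unreceived amount is the difference
# between distributed data sent and aggregation data received back; sending
# pauses at Ma and resumes at Mb (Mb < Ma), giving hysteresis.
class FlowControlledSender:
    def __init__(self, ma: int, mb: int):
        assert mb < ma, "resume threshold must be below the stop threshold"
        self.ma, self.mb = ma, mb
        self.sent = 0        # amount of distributed data transmitted
        self.received = 0    # amount of aggregation data acquired
        self.paused = False

    def can_send(self) -> bool:
        unreceived = self.sent - self.received
        if self.paused and unreceived <= self.mb:
            self.paused = False       # resume transmission
        elif not self.paused and unreceived >= self.ma:
            self.paused = True        # stop transmission
        return not self.paused

    def send(self, amount: int = 1) -> None:
        if self.can_send():
            self.sent += amount

    def acknowledge(self, amount: int = 1) -> None:
        self.received += amount
```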
-
Publication Number: US20230273835A1
Publication Date: 2023-08-31
Application Number: US18006934
Filing Date: 2020-08-05
Inventors: Yuki Arikawa, Kenji Tanaka, Tsuyoshi Ito, Takeshi Sakamoto
IPC Classification: G06F9/50
CPC Classification: G06F9/5072, G06F9/505
Abstract: A computer system according to the present invention includes N (N is an integer of 2 or more) data output devices, a transmission control device, and an arithmetic device. The arithmetic device executes predetermined arithmetic processing on data collected from the N data output devices via a communication network connecting the data output devices and the arithmetic device to each other. The transmission control device controls the transmission timing of data output from the N data output devices according to the processing content of the predetermined arithmetic processing executed by the arithmetic device, and the N data output devices output the data on the basis of the transmission timing notified by the transmission control device.
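As a toy example (the scheduling rule is an assumption, not taken from the patent), a transmission control device could stagger device transmissions to match the arithmetic device's throughput:

```python
# Just-in-time slotting: device i starts transmitting at i * processing_time,
# so the arithmetic device finishes one item as the next one arrives.
def schedule_offsets(n_devices: int, processing_time: float) -> list[float]:
    return [i * processing_time for i in range(n_devices)]

print(schedule_offsets(4, 0.5))   # [0.0, 0.5, 1.0, 1.5]
```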
-
Publication Number: US20220321641A1
Publication Date: 2022-10-06
Application Number: US17627346
Filing Date: 2019-07-16
Inventors: Tsuyoshi Ito, Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Takeshi Sakamoto, Kenji Tanaka
IPC Classification: H04L67/10, H04L67/2866, G06N3/08
Abstract: A distributed deep learning system according to an embodiment includes M distributed processing nodes that perform deep learning of a neural network in a distributed manner, and N aggregation processing nodes that are connected to each of the M distributed processing nodes via a first communication line and a second communication line and that aggregate, via the first communication line, the distributed processing results obtained at the M distributed processing nodes. Accordingly, even when a plurality of users share the distributed deep learning system at the same time, efficient and stable distributed deep learning processing can be realized.
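One conceivable way to realize the sharing this abstract mentions (purely illustrative; the patent does not specify this mapping) is to pin each user's job to one of the N aggregation processing nodes:

```python
# Round-robin job placement plus element-wise aggregation of the partial
# results produced by the M distributed processing nodes.
def aggregate_job(job_id: int, partials: list[list[float]], n_aggregators: int):
    aggregator = job_id % n_aggregators    # which aggregation node serves the job
    total = [sum(vals) for vals in zip(*partials)]
    return aggregator, total

print(aggregate_job(5, [[1.0, 2.0], [3.0, 4.0]], n_aggregators=2))
# (1, [4.0, 6.0])
```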
-
Publication Number: US20220261620A1
Publication Date: 2022-08-18
Application Number: US17596070
Filing Date: 2019-06-03
Inventors: Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
Abstract: A first distributed processing node transmits distributed data for M groups, as intermediate consolidated data, from its M communication units to the next distributed processing node. Each subsequent distributed processing node generates, for each group, updated intermediate consolidated data from the received intermediate consolidated data and its own distributed data, and transmits the updated intermediate consolidated data from its M communication units to the next distributed processing node. The last distributed processing node transmits the received intermediate consolidated data onward as consolidated data, and each distributed processing node forwards the received consolidated data to the next node. Each of the distributed processing nodes updates the weights of a neural network based on the consolidated data.
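A minimal sketch of the group-wise ring consolidation described above, assuming node_data[node][group] holds each node's distributed data (the names are illustrative):

```python
def ring_consolidate(node_data: list[list[list[float]]]) -> list[list[float]]:
    """For each of the M groups, sum the data around the ring of nodes."""
    consolidated = []
    for group in range(len(node_data[0])):
        running = list(node_data[0][group])          # first node starts the ring
        for node in range(1, len(node_data)):        # each node adds its data
            running = [r + d for r, d in zip(running, node_data[node][group])]
        consolidated.append(running)                 # final sum is sent back around
    return consolidated
```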
-
Publication Number: US20210216855A1
Publication Date: 2021-07-15
Application Number: US17255209
Filing Date: 2019-05-27
Inventors: Junichi Kato, Kenji Kawai, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
Abstract: Provided is a distributed deep learning system that achieves speedup by processing learning in parallel at a large number of learning nodes connected to a communication network, and that performs faster cooperative processing among the learning nodes. The distributed deep learning system includes a plurality of computing interconnect devices 1 connected with each other through a ring communication network 3 in which communication is possible in one direction, and a plurality of learning nodes 2 connected to the respective computing interconnect devices 1 in a one-to-one relation; each computing interconnect device 1 executes communication packet transmission/reception processing with the learning nodes 2 and All-reduce processing simultaneously in parallel.
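To illustrate the overlap of packet transmission/reception with All-reduce (a sketch under an assumed packet size, not the device's actual design), gradients can be processed packet by packet so that arithmetic and forwarding interleave:

```python
# Each packet-sized chunk passes around the ring once, with every interconnect
# device adding its learning node's gradients before forwarding it on.
def ring_allreduce_packets(grads: list[list[float]], packet: int = 2) -> list[float]:
    out = []
    for start in range(0, len(grads[0]), packet):
        size = min(packet, len(grads[0]) - start)
        chunk = [0.0] * size
        for g in grads:                              # one hop per device
            chunk = [c + v for c, v in zip(chunk, g[start:start + size])]
        out.extend(chunk)                            # aggregated packet returned
    return out

print(ring_allreduce_packets([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]))  # [3.0, 3.0, 3.0]
```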
-
Publication Number: US20210209443A1
Publication Date: 2021-07-08
Application Number: US16973717
Filing Date: 2019-05-05
Inventors: Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
Abstract: A first distributed processing node sets, as intermediate aggregated data, the distributed data generated by itself and transmits this data to the distributed processing node having the next number designated in advance. Each intermediate distributed processing node, excluding the first and last distributed processing nodes, calculates, for each of the weights corresponding thereto, the sum of the received intermediate aggregated data and the distributed data generated by itself, generates updated intermediate aggregated data, and transmits this data to the distributed processing node having the next number designated in advance. The last distributed processing node calculates, for each of the weights corresponding thereto, the sum of the received intermediate aggregated data and the distributed data generated by itself, generates aggregated data, and transmits this data to the first and intermediate distributed processing nodes. The distributed processing nodes then update the weights of a neural network based on this data.
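The final weight update this abstract ends with can be sketched as follows (plain averaged-gradient SGD is an assumption; the patent does not fix an update rule):

```python
# Every node applies the same update from the broadcast aggregated data.
def update_weights(weights: list[float], aggregated: list[float],
                   lr: float = 0.01, n_nodes: int = 4) -> list[float]:
    return [w - lr * (g / n_nodes) for w, g in zip(weights, aggregated)]
```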
-
Publication Number: US20210034978A1
Publication Date: 2021-02-04
Application Number: US16967702
Filing Date: 2019-02-06
Inventors: Junichi Kato, Kenji Kawai, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
Abstract: Each of the learning nodes calculates gradients of a loss function from an output result obtained by inputting learning data to a learning-target neural network, converts the calculation result into a packet, and transmits the packet to a computing interconnect device. The computing interconnect device receives the packets transmitted from the learning nodes, acquires the gradient values stored in the packets, calculates the sum of the gradients, converts the calculation result into a packet, and transmits the packet to each of the learning nodes. Each of the learning nodes receives the packet transmitted from the computing interconnect device and updates the constituent parameters of its neural network based on the values stored in the packet.
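An illustrative model of this packet flow (the field names are assumptions, not the patent's packet format):

```python
from dataclasses import dataclass

@dataclass
class GradPacket:
    node_id: int          # which learning node sent the packet
    seq: int              # which slice of the gradient vector it carries
    grads: list

def interconnect_sum(packets: list[GradPacket]) -> GradPacket:
    # The computing interconnect device sums same-sequence packets from all
    # learning nodes and returns one packet carrying the gradient sum.
    assert len({p.seq for p in packets}) == 1
    summed = [sum(vals) for vals in zip(*(p.grads for p in packets))]
    return GradPacket(node_id=-1, seq=packets[0].seq, grads=summed)
```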
-
Publication Number: US12035295B2
Publication Date: 2024-07-09
Application Number: US17612270
Filing Date: 2019-05-31
Inventors: Yuki Arikawa, Takeshi Sakamoto
CPC Classification: H04W72/12, H04L5/0035
Abstract: A scheduling apparatus includes: a division control device configured to divide an entire communicable area into a plurality of areas; combination generation devices (12-1 to 12-N) configured to generate candidate patterns of combinations of transmission points and user terminals for each area; combination evaluation devices (13-1 to 13-N) configured to calculate evaluation values of the candidate patterns for each area; optimal combination holding devices (15-1 to 15-N) configured to hold the optimal combination pattern among the candidate patterns for each area; a calculation result sharing device configured to output the evaluation value of an optimal combination pattern to the combination evaluation devices (13-1 to 13-N) so that it is shared among the areas as shared information; and an overall transmission weight matrix calculation device configured to calculate a transmission weight matrix for the entire communicable area based on the result obtained by combining the optimal combination patterns of the areas.
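A toy version of the per-area search (the evaluation function is supplied by the caller; exhaustive permutation search is an illustrative stand-in for the patent's candidate generation):

```python
from itertools import permutations

def best_pattern(tx_points: list[str], terminals: list[str], evaluate):
    # Score every pairing of transmission points with distinct user terminals
    # and keep the pattern with the highest evaluation value.
    best, best_score = None, float("-inf")
    for chosen in permutations(terminals, len(tx_points)):
        candidate = list(zip(tx_points, chosen))
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

The per-area best scores would then be shared (the calculation result sharing device's role) before the overall transmission weight matrix is assembled.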
-
Publication Number: US20230004787A1
Publication Date: 2023-01-05
Application Number: US17779736
Filing Date: 2019-11-27
Inventors: Kenji Tanaka, Yuki Arikawa, Tsuyoshi Ito, Kazuhiko Terada, Takeshi Sakamoto
Abstract: A distributed deep learning system includes nodes (1-n, n = 1, ..., 4) and a network. Each node (1-n) includes GPUs (11-n-1 and 11-n-2) and an FPGA (12-n). The FPGA (12-n) includes a plurality of GPU reception buffers, a plurality of network transmission buffers that store data transferred from the GPU reception buffers, a plurality of network reception buffers that store aggregated data received from other nodes, and a plurality of GPU transmission buffers that store data transferred from the network reception buffers. The GPUs (11-n-1 and 11-n-2) DMA-transfer data to the FPGA (12-n), and the data stored in the GPU transmission buffers is DMA-transferred back to the GPUs (11-n-1 and 11-n-2).
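A minimal queue model of the four buffer stages named in this abstract (the queue types and method names are illustrative):

```python
from collections import deque

class FpgaBuffers:
    def __init__(self):
        self.gpu_rx = deque()   # data DMA-transferred in from the local GPUs
        self.net_tx = deque()   # staged for transmission to other nodes
        self.net_rx = deque()   # aggregated data received from other nodes
        self.gpu_tx = deque()   # staged for DMA transfer back to the GPUs

    def stage_for_network(self) -> None:
        while self.gpu_rx:                    # GPU RX buffer -> network TX buffer
            self.net_tx.append(self.gpu_rx.popleft())

    def stage_for_gpu(self) -> None:
        while self.net_rx:                    # network RX buffer -> GPU TX buffer
            self.gpu_tx.append(self.net_rx.popleft())
```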