-
Publication No.: US20190258924A1
Publication date: 2019-08-22
Application No.: US15898433
Filing date: 2018-02-17
IPC classification: G06N3/08, G06F15/173
Abstract: A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.
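The overlap described in this abstract (sending some updated parameters before the backward pass has finished) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the patent: the network is a chain of scalar weights, the loss is a squared error, and `send` is a hypothetical callback standing in for transmission to remote nodes. Merging of received remote parameters is omitted; `weights` is assumed to already reflect them.

```python
def train_step(weights, x, target, lr, send):
    """One toy training step on a chain of scalar layers: a[i+1] = w[i] * a[i].

    `send(layer_index, value)` is a hypothetical transmit callback. It is
    invoked as soon as each layer is updated, i.e. before the backward pass
    has reached the remaining (earlier) layers.
    """
    # Forward pass: keep every activation for use in the backward pass.
    acts = [x]
    for w in weights:
        acts.append(w * acts[-1])

    # Backward pass for loss = 0.5 * (output - target) ** 2.
    delta = acts[-1] - target
    events = []
    for i in reversed(range(len(weights))):
        grad = delta * acts[i]          # dLoss/dw[i]
        delta = delta * weights[i]      # propagate with the pre-update weight
        weights[i] -= lr * grad
        events.append(("updated", i))
        send(i, weights[i])             # transmit this layer immediately
        events.append(("sent", i))
    events.append(("backward_done", None))
    return events


sent = []
weights = [2.0, 3.0]
events = train_step(weights, x=1.0, target=5.0, lr=0.1,
                    send=lambda i, w: sent.append((i, w)))
```

In the returned event log, the last layer is sent before the first layer has even been updated, which is the ordering the abstract describes. A real system would make the send asynchronous so that communication and computation overlap in time.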
-
Publication No.: US12086447B2
Publication date: 2024-09-10
Application No.: US16719076
Filing date: 2019-12-18
IPC classification: G06F3/06, G06F12/0882
CPC classification: G06F3/0647, G06F3/0611, G06F3/0659, G06F3/0688, G06F12/0882, G06F2212/7201
Abstract: A processing system includes a first processor couplable to a first memory and a second memory. In response to a page migration trigger for a page in the first memory, the first processor is configured to, responsive to the page being a read-only page storing code for execution, initiate migration of the page to a code cache portion of a second memory associated with a second processor and shared by multiple processes executing at the second processor, and to configure each process of a set of processes executing at the second processor to access and execute the code from the code cache portion.
-
Publication No.: US20220100391A1
Publication date: 2022-03-31
Application No.: US17033170
Filing date: 2020-09-25
IPC classification: G06F3/06, G06F12/02, G06F12/0802
Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.
-
Publication No.: US20200034195A1
Publication date: 2020-01-30
Application No.: US16049216
Filing date: 2018-07-30
Abstract: Techniques for improved networking performance in systems where a graphics processing unit or other highly parallel non-central-processing-unit (referred to as an accelerated processing device or “APD” herein) has the ability to directly issue commands to a networking device such as a network interface controller (“NIC”) are disclosed. According to a first technique, the latency associated with loading certain metadata into NIC hardware memory is reduced or eliminated by pre-fetching network command queue metadata into hardware network command queue metadata slots of the NIC, thereby reducing the latency associated with fetching that metadata at a later time. A second technique involves reducing latency by prioritizing work on an APD when it is known that certain network traffic is soon to arrive over the network via a NIC.
-
Publication No.: US12086422B2
Publication date: 2024-09-10
Application No.: US18320819
Filing date: 2023-05-19
IPC classification: G06F3/06, G06F12/02, G06F12/0802
CPC classification: G06F3/0619, G06F3/0656, G06F3/067, G06F12/0223, G06F12/0802, G06F2212/152
Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.
-
Publication No.: US20240220336A1
Publication date: 2024-07-04
Application No.: US18147081
Filing date: 2022-12-28
IPC classification: G06F9/54, G06F9/50, G06F15/173
CPC classification: G06F9/54, G06F9/5044, G06F15/17356
Abstract: In accordance with described techniques for PE-centric all-to-all communication, a distributed computing system includes processing elements, such as graphics processing units, distributed in clusters. An all-to-all communication procedure is performed by the processing elements that are each configured to generate data packets in parallel for all-to-all data communication between the clusters. The all-to-all communication procedure includes a first stage of intra-cluster parallel data communication between respective processing elements of each of the clusters; a second stage of inter-cluster data exchange for all-to-all data communication between the clusters; and a third stage of intra-cluster data distribution to the respective processing elements of each of the clusters.
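The three stages named in this abstract can be simulated in a few lines. The sketch below is only one plausible routing scheme, not the patented procedure: the choice of "gateway" processing element (destination cluster index modulo the cluster size) and the function name `hierarchical_all_to_all` are assumptions made for illustration.

```python
from collections import defaultdict


def hierarchical_all_to_all(num_clusters, pes_per_cluster):
    """Simulate a three-stage all-to-all among PEs identified as (cluster, local_id)."""
    pes = [(c, p) for c in range(num_clusters) for p in range(pes_per_cluster)]
    # Every PE starts with one message (src, dst, payload) for every PE.
    outbox = {pe: [(pe, dst, f"{pe}->{dst}") for dst in pes] for pe in pes}

    # Stage 1 (intra-cluster): hand each message to the local gateway PE that
    # is responsible for the message's destination cluster (assumed mapping).
    staged = defaultdict(list)
    for (c, _), msgs in outbox.items():
        for msg in msgs:
            dst_cluster = msg[1][0]
            staged[(c, dst_cluster % pes_per_cluster)].append(msg)

    # Stage 2 (inter-cluster): each gateway forwards its bundle to a receiving
    # PE in the destination cluster. Messages whose destination is the source
    # cluster simply stay inside that cluster.
    landed = defaultdict(list)
    for (c, _), msgs in staged.items():
        for msg in msgs:
            dst_cluster = msg[1][0]
            landed[(dst_cluster, c % pes_per_cluster)].append(msg)

    # Stage 3 (intra-cluster): receivers distribute messages to their final PE.
    inbox = defaultdict(list)
    for msgs in landed.values():
        for msg in msgs:
            inbox[msg[1]].append(msg)
    return inbox


inbox = hierarchical_all_to_all(num_clusters=3, pes_per_cluster=2)
```

After the call, every processing element's inbox holds exactly one message from each processing element in the system, so the result matches a flat all-to-all while only stage 2 crosses cluster boundaries.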
-
Publication No.: US20230289070A1
Publication date: 2023-09-14
Application No.: US18320819
Filing date: 2023-05-19
IPC classification: G06F3/06, G06F12/02, G06F12/0802
CPC classification: G06F3/0619, G06F12/0223, G06F3/0656, G06F3/067, G06F12/0802, G06F2212/152
Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.
-
Publication No.: US11714559B2
Publication date: 2023-08-01
Application No.: US17033170
Filing date: 2020-09-25
IPC classification: G06F3/06, G06F12/02, G06F12/0802
CPC classification: G06F3/0619, G06F3/067, G06F3/0656, G06F12/0223, G06F12/0802, G06F2212/152
Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.
-
Publication No.: US11630994B2
Publication date: 2023-04-18
Application No.: US15898433
Filing date: 2018-02-17
IPC classification: G06N3/08, G06F15/173, G06N3/084, G06N3/063, G06N3/045
Abstract: A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.
-
Publication No.: US10936697B2
Publication date: 2021-03-02
Application No.: US16044145
Filing date: 2018-07-24
Abstract: A method includes storing a first portion of a sparse triangular matrix in a local memory and launching a kernel for executing a set of workgroups. The first portion includes a plurality of row blocks, and each workgroup in the set of workgroups is associated with one of the plurality of row blocks. The method also includes, for each workgroup in the set of workgroups, solving the row block. The row block is solved by, for each row segment of a first subset of row segments in the row block, calculating a partial sum for the row segment based on one or more matrix elements in the row segment, and writing the partial sum to a remote memory of a first remote processing unit prior to terminating the kernel.
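The partial-sum idea in this abstract can be sketched sequentially for a lower-triangular system split between two processing units. This is one possible reading of the abstract, not the patented kernel: the column split point `split`, the dict-per-row sparse format, and the list `remote_partial` (standing in for the remote unit's memory) are all assumptions introduced for illustration, and workgroup-level parallelism is not modeled.

```python
def solve_lower_two_units(rows, b, split):
    """Solve L x = b, where rows[i] is a dict {col: value} for row i of L.

    Unit 0 is assumed to hold columns [0, split); unit 1 holds the rest.
    """
    n = len(b)
    x = [0.0] * n
    remote_partial = [0.0] * n  # stand-in for unit 1's memory

    # Unit 0: forward substitution on the rows it can fully resolve.
    for i in range(split):
        acc = sum(v * x[j] for j, v in rows[i].items() if j < i)
        x[i] = (b[i] - acc) / rows[i][i]

    # Unit 0: for each later row, reduce the row segment that lies in its
    # columns to a partial sum and write it to the remote unit's memory.
    for i in range(split, n):
        remote_partial[i] = sum(v * x[j] for j, v in rows[i].items() if j < split)

    # Unit 1: finish the remaining rows using the received partial sums.
    for i in range(split, n):
        local = sum(v * x[j] for j, v in rows[i].items() if split <= j < i)
        x[i] = (b[i] - remote_partial[i] - local) / rows[i][i]
    return x


rows = [{0: 2.0}, {0: 1.0, 1: 1.0}, {1: 3.0, 2: 4.0}, {0: 1.0, 2: 2.0, 3: 2.0}]
x = solve_lower_two_units(rows, b=[2.0, 3.0, 18.0, 15.0], split=2)
```

Sending one partial sum per row segment, rather than the solved `x` values themselves, means the remote unit never needs the first unit's portion of the matrix.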