-
Publication Number: US12169782B2
Publication Date: 2024-12-17
Application Number: US16425403
Filing Date: 2019-05-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Shomit N. Das , Abhinav Vishnu
Abstract: A processor determines losses of samples within an input volume that is provided to a neural network during a first epoch, groups the samples into subsets based on their losses, and assigns the subsets to operands in the neural network that represent the samples at different precisions. Each subset is associated with a different precision. The processor then processes the subsets in the neural network at the different precisions during the first epoch. In some cases, the samples in the subsets are used in a forward pass and a backward pass through the neural network. A memory is configured to store information representing the samples in the subsets at the different precisions. In some cases, the processor stores information representing model parameters of the neural network in the memory at the different precisions of the subsets of the corresponding samples.
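The loss-based grouping step can be illustrated with a minimal sketch. The thresholds, precision labels, and function name below are hypothetical assumptions for illustration, not taken from the patent:

```python
def assign_precisions(losses, low=0.1, high=0.5):
    """Group sample indices into precision subsets by loss value.

    Hypothetical policy: low-loss samples (already well learned) are
    assigned a reduced precision, while high-loss samples keep full
    precision. Thresholds and precision names are illustrative only.
    """
    groups = {"fp16": [], "fp32": [], "fp64": []}
    for i, loss in enumerate(losses):
        if loss < low:
            groups["fp16"].append(i)
        elif loss < high:
            groups["fp32"].append(i)
        else:
            groups["fp64"].append(i)
    return groups

# Losses recorded for five samples during the first epoch.
groups = assign_precisions([0.05, 0.3, 0.7, 0.02, 0.9])
```

Each subset would then be processed through the network at its assigned precision, and model parameters stored at matching precision.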
-
Publication Number: US11775799B2
Publication Date: 2023-10-03
Application Number: US16194958
Filing Date: 2018-11-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Georgios Mappouras , Amin Farmahini-Farahani , Sudhanva Gurumurthi , Abhinav Vishnu , Gabriel H. Loh
CPC classification number: G06N3/04 , G06F9/44505 , G06F9/544 , G06N3/084
Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
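A greedy sketch of the run-time manager's placement decision, under the assumption that layer buffers are ranked by how often the forward pass touched them; the function, data shapes, and policy are illustrative, not from the patent:

```python
def plan_backward_placement(forward_usage, fast_capacity):
    """Decide, per layer buffer, whether it lives in the small fast
    (high-bandwidth) memory or the large slow (high-capacity) memory
    for the backward pass, based on usage monitored in the forward pass.

    forward_usage maps layer name -> (buffer_size, access_count).
    """
    # Prioritize the buffers the forward pass accessed most often.
    ranked = sorted(forward_usage.items(),
                    key=lambda kv: kv[1][1], reverse=True)
    placement, used = {}, 0
    for layer, (size, _count) in ranked:
        if used + size <= fast_capacity:
            placement[layer] = "fast"   # low-capacity, high-bandwidth
            used += size
        else:
            placement[layer] = "slow"   # high-capacity, low-bandwidth
    return placement

# Usage monitored during the forward pass: (size, access_count).
usage = {"conv1": (4, 10), "conv2": (8, 30), "fc": (2, 5)}
plan = plan_backward_placement(usage, fast_capacity=10)
```

The real manager also accounts for the reverse visiting order of the backward pass; this sketch keeps only the capacity/usage trade-off.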
-
Publication Number: US20230196430A1
Publication Date: 2023-06-22
Application Number: US17559636
Filing Date: 2021-12-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Sarunya Pumma , Abhinav Vishnu
IPC: G06Q30/06
CPC classification number: G06Q30/0625
Abstract: An electronic device includes multiple nodes. Each node's set of input index vectors is divided into multiple parts, and for each part the node generates compressed lookup data, to be used for processing instances of input data through a model, from a compressed set of the input index vectors in that part. Each node then communicates the compressed lookup data for a respective part to each other node.
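One plausible reading of the compression step is that a node deduplicates the lookup indices within a part of its index vectors before exchanging them. A minimal sketch, with the function name and remapping scheme assumed for illustration:

```python
def compress_indices(index_vectors):
    """Deduplicate the lookup indices appearing across a part's index
    vectors. Returns the sorted unique indices (the compressed set)
    plus each vector remapped to positions within that set, so lookup
    results need only be fetched once per unique index.
    """
    unique = sorted({i for vec in index_vectors for i in vec})
    pos = {idx: p for p, idx in enumerate(unique)}
    remapped = [[pos[i] for i in vec] for vec in index_vectors]
    return unique, remapped

# One part of a node's input index vectors (e.g. embedding-table rows).
unique, remapped = compress_indices([[5, 9, 5], [9, 2]])
```

Only `unique` and the compact `remapped` positions would then be communicated, rather than the full redundant index vectors.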
-
Publication Number: US11669473B2
Publication Date: 2023-06-06
Application Number: US17032195
Filing Date: 2020-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu , Joseph Lee Greathouse
IPC: G06F13/28
CPC classification number: G06F13/28 , G06F2213/28
Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units, and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel executes on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for the collective communication operation. As a result, the allreduce operation may be executed on the enhanced DMA engine in parallel with the compute units.
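The overlap described above can be sketched in miniature, with a thread standing in for the enhanced DMA engine and a simple function standing in for the compute-unit kernel; all names, and the elementwise-sum allreduce, are illustrative assumptions:

```python
import threading

def allreduce(chunks):
    """Sum corresponding elements of each node's gradient chunk and
    give every node the total -- the collective operation the enhanced
    DMA engine would carry out from the converted commands."""
    total = [sum(vals) for vals in zip(*chunks)]
    return [list(total) for _ in chunks]

def compute_kernel(data):
    """Stand-in for the first kernel running on the compute units."""
    return [x * x for x in data]

results = {}
grads = [[1, 2], [3, 4], [5, 6]]   # per-node gradient chunks

# The "DMA engine" performs the collective while the "compute units"
# execute the first kernel concurrently.
t = threading.Thread(target=lambda: results.update(reduced=allreduce(grads)))
t.start()
results["forward"] = compute_kernel([1, 2, 3])
t.join()
```

The point of the patent's design is exactly this concurrency: the collective need not wait for, or steal, compute-unit cycles.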
-
Publication Number: US20210049446A1
Publication Date: 2021-02-18
Application Number: US16538764
Filing Date: 2019-08-12
Applicant: Advanced Micro Devices, Inc.
Inventor: Sudhanva Gurumurthi , Abhinav Vishnu
Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data, and may itself use the sub-networks for that purpose.
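As a toy illustration of the sub-network generation step, one can enumerate the non-empty subsets of the color channels; here a "sub-network" is reduced to the set of channels it retains, which is an assumption made purely for brevity:

```python
from itertools import combinations

def make_subnetworks(channels=("R", "G", "B")):
    """Enumerate one sub-network per non-empty channel subset. Each
    corresponds to a copy of the full network with the internal
    elements for the missing channels removed; here a sub-network is
    modeled only by the channel set it keeps."""
    subs = []
    for r in range(1, len(channels) + 1):
        for subset in combinations(channels, r):
            subs.append(frozenset(subset))
    return subs

subs = make_subnetworks()
```

For three color channels this yields seven sub-networks, one per non-empty subset, each sized for its own channel set.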
-
Publication Number: US20190354833A1
Publication Date: 2019-11-21
Application Number: US16027454
Filing Date: 2018-07-05
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu
Abstract: Methods and systems for reducing communication frequency in neural networks (NNs) are described. The method includes running, in an initial epoch, mini-batches of samples from a training set through the NN and determining one or more errors from the ground truth, where the ground truth is the given label for each sample. The errors are recorded for each sample and are sorted in non-decreasing order. In the next epoch, mini-batches of samples are formed starting from the sample with the smallest error in the sorted list. The parameters of the NN are updated and the mini-batches are run. One or more mini-batches are communicated to the other processing elements if a previous update has made a significant impact on the NN, where significant impact is measured by determining whether the errors, or the accumulated errors since the last communication update, meet or exceed a significance threshold.
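The sorting and significance-gating steps can be sketched as follows; the function names and the accumulated-error policy are assumptions for illustration:

```python
def form_minibatches(errors, batch_size):
    """Sort sample indices by non-decreasing recorded error, then
    slice into mini-batches starting from the smallest-error sample."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

def should_communicate(accumulated_error, threshold):
    """Gate communication to other processing elements: send only when
    the error accumulated since the last communication meets or
    exceeds the significance threshold."""
    return accumulated_error >= threshold

# Per-sample errors recorded during the initial epoch.
errors = [0.8, 0.1, 0.5, 0.3]
batches = form_minibatches(errors, batch_size=2)
```

Skipping insignificant updates is what reduces the communication frequency across processing elements.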
-