-
Publication Number: US12169782B2
Publication Date: 2024-12-17
Application Number: US16425403
Filing Date: 2019-05-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Shomit N. Das , Abhinav Vishnu
Abstract: A processor determines losses of samples within an input volume that is provided to a neural network during a first epoch, groups the samples into subsets based on their losses, and assigns the subsets to operands in the neural network that represent the samples at different precisions. Each subset is associated with a different precision. The processor then processes the subsets in the neural network at the different precisions during the first epoch. In some cases, the samples in the subsets are used in a forward pass and a backward pass through the neural network. A memory is configured to store information representing the samples in the subsets at the different precisions. In some cases, the processor stores information representing model parameters of the neural network in the memory at the different precisions of the subsets of the corresponding samples.
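The loss-based grouping step can be illustrated with a minimal sketch. The thresholds, precision labels, and function name below are hypothetical assumptions for illustration, not taken from the patent:

```python
def assign_precisions(losses, low=0.1, high=0.5):
    """Group sample indices into precision subsets by loss value.

    Hypothetical policy: low-loss samples (already well learned) are
    assigned a reduced precision, while high-loss samples keep full
    precision. Thresholds and precision names are illustrative only.
    """
    groups = {"fp16": [], "fp32": [], "fp64": []}
    for i, loss in enumerate(losses):
        if loss < low:
            groups["fp16"].append(i)
        elif loss < high:
            groups["fp32"].append(i)
        else:
            groups["fp64"].append(i)
    return groups

# Losses recorded for five samples during the first epoch.
groups = assign_precisions([0.05, 0.3, 0.7, 0.02, 0.9])
```

Each subset would then be processed through the network at its assigned precision, and model parameters stored at matching precision.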
-
Publication Number: US11775799B2
Publication Date: 2023-10-03
Application Number: US16194958
Filing Date: 2018-11-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Georgios Mappouras , Amin Farmahini-Farahani , Sudhanva Gurumurthi , Abhinav Vishnu , Gabriel H. Loh
CPC classification number: G06N3/04 , G06F9/44505 , G06F9/544 , G06N3/084
Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
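A greedy sketch of the run-time manager's placement decision, under the assumption that layer buffers are ranked by how often the forward pass touched them; the function, data shapes, and policy are illustrative, not from the patent:

```python
def plan_backward_placement(forward_usage, fast_capacity):
    """Decide, per layer buffer, whether it lives in the small fast
    (high-bandwidth) memory or the large slow (high-capacity) memory
    for the backward pass, based on usage monitored in the forward pass.

    forward_usage maps layer name -> (buffer_size, access_count).
    """
    # Prioritize the buffers the forward pass accessed most often.
    ranked = sorted(forward_usage.items(),
                    key=lambda kv: kv[1][1], reverse=True)
    placement, used = {}, 0
    for layer, (size, _count) in ranked:
        if used + size <= fast_capacity:
            placement[layer] = "fast"   # low-capacity, high-bandwidth
            used += size
        else:
            placement[layer] = "slow"   # high-capacity, low-bandwidth
    return placement

# Usage monitored during the forward pass: (size, access_count).
usage = {"conv1": (4, 10), "conv2": (8, 30), "fc": (2, 5)}
plan = plan_backward_placement(usage, fast_capacity=10)
```

The real manager also accounts for the reverse visiting order of the backward pass; this sketch keeps only the capacity/usage trade-off.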
-
Publication Number: US20230196430A1
Publication Date: 2023-06-22
Application Number: US17559636
Filing Date: 2021-12-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Sarunya Pumma , Abhinav Vishnu
IPC: G06Q30/06
CPC classification number: G06Q30/0625
Abstract: An electronic device includes multiple nodes. Each node's set of input index vectors is divided into multiple parts, and for each part the node generates compressed lookup data, to be used for processing instances of input data through a model, from a compressed set of the input index vectors in that part. Each node then communicates the compressed lookup data for a respective part to each other node.
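One plausible reading of the compression step is that a node deduplicates the lookup indices within a part of its index vectors before exchanging them. A minimal sketch, with the function name and remapping scheme assumed for illustration:

```python
def compress_indices(index_vectors):
    """Deduplicate the lookup indices appearing across a part's index
    vectors. Returns the sorted unique indices (the compressed set)
    plus each vector remapped to positions within that set, so lookup
    results need only be fetched once per unique index.
    """
    unique = sorted({i for vec in index_vectors for i in vec})
    pos = {idx: p for p, idx in enumerate(unique)}
    remapped = [[pos[i] for i in vec] for vec in index_vectors]
    return unique, remapped

# One part of a node's input index vectors (e.g. embedding-table rows).
unique, remapped = compress_indices([[5, 9, 5], [9, 2]])
```

Only `unique` and the compact `remapped` positions would then be communicated, rather than the full redundant index vectors.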
-
Publication Number: US11669473B2
Publication Date: 2023-06-06
Application Number: US17032195
Filing Date: 2020-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu , Joseph Lee Greathouse
IPC: G06F13/28
CPC classification number: G06F13/28 , G06F2213/28
Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units, and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel executes on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for the collective communication operation. As a result, the allreduce operation may be executed on the enhanced DMA engine in parallel with the compute units.
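The overlap described above can be sketched in miniature, with a thread standing in for the enhanced DMA engine and a simple function standing in for the compute-unit kernel; all names, and the elementwise-sum allreduce, are illustrative assumptions:

```python
import threading

def allreduce(chunks):
    """Sum corresponding elements of each node's gradient chunk and
    give every node the total -- the collective operation the enhanced
    DMA engine would carry out from the converted commands."""
    total = [sum(vals) for vals in zip(*chunks)]
    return [list(total) for _ in chunks]

def compute_kernel(data):
    """Stand-in for the first kernel running on the compute units."""
    return [x * x for x in data]

results = {}
grads = [[1, 2], [3, 4], [5, 6]]   # per-node gradient chunks

# The "DMA engine" performs the collective while the "compute units"
# execute the first kernel concurrently.
t = threading.Thread(target=lambda: results.update(reduced=allreduce(grads)))
t.start()
results["forward"] = compute_kernel([1, 2, 3])
t.join()
```

The point of the patent's design is exactly this concurrency: the collective need not wait for, or steal, compute-unit cycles.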
-
Publication Number: US20210049446A1
Publication Date: 2021-02-18
Application Number: US16538764
Filing Date: 2019-08-12
Applicant: Advanced Micro Devices, Inc.
Inventor: Sudhanva Gurumurthi , Abhinav Vishnu
Abstract: A system comprising an electronic device that includes a processor is described. During operation, the processor acquires a full version of a neural network, the neural network including internal elements for processing instances of input image data having a set of color channels. The processor then generates, from the neural network, a set of sub-networks, each sub-network being a separate copy of the neural network with the internal elements for processing at least one of the color channels removed, so that each sub-network is configured for processing a different set of one or more color channels in instances of input image data. The processor next provides the sub-networks for processing instances of input image data, and may itself use the sub-networks for that purpose.
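As a toy illustration of the sub-network generation step, one can enumerate the non-empty subsets of the color channels; here a "sub-network" is reduced to the set of channels it retains, which is an assumption made purely for brevity:

```python
from itertools import combinations

def make_subnetworks(channels=("R", "G", "B")):
    """Enumerate one sub-network per non-empty channel subset. Each
    corresponds to a copy of the full network with the internal
    elements for the missing channels removed; here a sub-network is
    modeled only by the channel set it keeps."""
    subs = []
    for r in range(1, len(channels) + 1):
        for subset in combinations(channels, r):
            subs.append(frozenset(subset))
    return subs

subs = make_subnetworks()
```

For three color channels this yields seven sub-networks, one per non-empty subset, each sized for its own channel set.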
-
Publication Number: US20190354833A1
Publication Date: 2019-11-21
Application Number: US16027454
Filing Date: 2018-07-05
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinav Vishnu
Abstract: Methods and systems for reducing communication frequency in neural networks (NNs) are described. The method includes running, in an initial epoch, mini-batches of samples from a training set through the NN and determining one or more errors from the ground truth, where the ground truth is the given label for each sample. The errors are recorded for each sample and are sorted in non-decreasing order. In the next epoch, mini-batches of samples are formed starting from the sample with the smallest error in the sorted list. The parameters of the NN are updated and the mini-batches are run. One or more mini-batches are communicated to the other processing elements if a previous update has made a significant impact on the NN, where significant impact is measured by determining whether the errors, or the accumulated errors since the last communication update, meet or exceed a significance threshold.
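The sorting and significance-gating steps can be sketched as follows; the function names and the accumulated-error policy are assumptions for illustration:

```python
def form_minibatches(errors, batch_size):
    """Sort sample indices by non-decreasing recorded error, then
    slice into mini-batches starting from the smallest-error sample."""
    order = sorted(range(len(errors)), key=lambda i: errors[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

def should_communicate(accumulated_error, threshold):
    """Gate communication to other processing elements: send only when
    the error accumulated since the last communication meets or
    exceeds the significance threshold."""
    return accumulated_error >= threshold

# Per-sample errors recorded during the initial epoch.
errors = [0.8, 0.1, 0.5, 0.3]
batches = form_minibatches(errors, batch_size=2)
```

Skipping insignificant updates is what reduces the communication frequency across processing elements.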
-