Adaptive batch reuse on deep memories

    Publication Number: US12039450B2

    Publication Date: 2024-07-16

    Application Number: US16424115

    Application Date: 2019-05-28

    Inventor: Abhinav Vishnu

    Abstract: A method of adaptive batch reuse includes prefetching, from a CPU to a GPU, a first plurality of mini-batches comprising a subset of a training dataset. The GPU trains the neural network for the current epoch by reusing the first plurality of mini-batches, without discarding them, a number of times given by a reuse count value. The GPU also runs a validation set to obtain a validation error for the current epoch. If the validation error for the current epoch is less than that of the previous epoch, the reuse count value is incremented for the next epoch; if it is greater, the reuse count value is decremented for the next epoch.
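
    A minimal sketch of the adaptive reuse loop the abstract describes, assuming a generic training setup; prefetch_minibatches, train_step, and validate are illustrative placeholders, not names from the patent:

        import random

        def prefetch_minibatches(dataset, n, batch_size=32):
            # Stand-in for the CPU-to-GPU prefetch of n mini-batches.
            return [random.sample(dataset, batch_size) for _ in range(n)]

        def train_step(model, batch):
            pass  # placeholder for one GPU training step

        def validate(model, val_set):
            return random.random()  # placeholder validation error

        def train_with_adaptive_reuse(model, dataset, val_set, epochs, prefetch=4):
            reuse_count, prev_err = 1, float("inf")
            for _ in range(epochs):
                batches = prefetch_minibatches(dataset, prefetch)
                for _ in range(reuse_count):       # reuse without discard
                    for batch in batches:
                        train_step(model, batch)
                err = validate(model, val_set)
                if err < prev_err:
                    reuse_count += 1               # error fell: reuse more next epoch
                elif err > prev_err:
                    reuse_count = max(1, reuse_count - 1)  # error rose: back off
                prev_err = err

        train_with_adaptive_reuse(model=None, dataset=list(range(1000)),
                                  val_set=list(range(100)), epochs=3)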

    Dropout for accelerated deep learning in heterogeneous architectures

    Publication Number: US11620525B2

    Publication Date: 2023-04-04

    Application Number: US16141648

    Application Date: 2018-09-25

    Inventor: Abhinav Vishnu

    Abstract: A heterogeneous processing system includes at least one central processing unit (CPU) core and at least one graphics processing unit (GPU) core. The system computes an activation for each of a plurality of neurons in a first network layer of a neural network. It randomly drops a first subset of those neurons and keeps a second subset. The activations of the kept neurons are forwarded to the CPU core and coalesced to generate a set of coalesced activation sub-matrices.
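
    A toy illustration of the drop-and-coalesce step, assuming NumPy on the host; function and variable names here are invented for the sketch, not taken from the patent:

        import numpy as np

        def dropout_and_coalesce(activations, drop_prob=0.5, rng=None):
            # Randomly drop a first subset of neurons, keep the second subset,
            # and pack the kept activations into a dense sub-matrix.
            rng = rng or np.random.default_rng()
            keep_mask = rng.random(activations.shape[1]) >= drop_prob
            kept = np.flatnonzero(keep_mask)      # indices of kept neurons
            return activations[:, kept], kept     # coalesced sub-matrix

        layer_out = np.random.rand(64, 1024)      # batch x neurons, from the GPU
        sub_matrix, kept = dropout_and_coalesce(layer_out)

    Coalescing lets subsequent layers operate on a smaller dense matrix rather than a masked full-size one.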

    Proactive management of inter-GPU network links

    Publication Number: US11436060B2

    Publication Date: 2022-09-06

    Application Number: US16552065

    Application Date: 2019-08-27

    Abstract: Systems, apparatuses, and methods for proactively managing inter-processor network links are disclosed. A computing system includes at least a control unit and a plurality of processing units. Each processing unit includes a compute module and a configurable link interface. The control unit dynamically adjusts the clock frequency and link width of each processing unit's configurable link interface based on the data transfer size and computation time of each layer of a neural network, so as to reduce each layer's execution time. By tuning the clock frequency and link width on a per-layer basis, communication is closely overlapped with computation, allowing layers to complete more quickly.
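
    One way to read the per-layer adjustment is as picking the cheapest link configuration whose transfer time still hides under the layer's computation time; the configuration table and bandwidth model below are illustrative assumptions, not values from the patent:

        LINK_CONFIGS = [  # (clock_GHz, lanes), cheapest first
            (0.5, 4), (1.0, 4), (1.0, 8), (2.0, 8), (2.0, 16),
        ]

        def transfer_time(num_bytes, clock_ghz, lanes, bytes_per_lane_cycle=1):
            return num_bytes / (clock_ghz * 1e9 * lanes * bytes_per_lane_cycle)

        def pick_link_config(layer_bytes, layer_compute_secs):
            for clock, lanes in LINK_CONFIGS:
                if transfer_time(layer_bytes, clock, lanes) <= layer_compute_secs:
                    return clock, lanes  # cheapest config that still overlaps
            return LINK_CONFIGS[-1]      # otherwise use the fastest link

        # A 64 MiB transfer under a 10 ms layer selects (1.0 GHz, 8 lanes).
        print(pick_link_config(64 * 2**20, 0.010))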

    ALLREDUCE ENHANCED DIRECT MEMORY ACCESS FUNCTIONALITY

    Publication Number: US20210406209A1

    Publication Date: 2021-12-30

    Application Number: US17032195

    Application Date: 2020-09-25

    Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units, and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel executes on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for the collective communication operation. As a result, the allreduce operation runs on the enhanced DMA engine in parallel with the compute units.
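
    A rough sketch of the overlap, modeling the enhanced DMA engine as a host thread; the names and the threading model are assumptions for illustration only:

        import threading
        import numpy as np

        def compute_kernel(x):
            return np.tanh(x)                     # stand-in for the first kernel

        def dma_allreduce(shards, out):
            # Reduce step of allreduce; broadcasting the result back to
            # peers is omitted for brevity.
            out[:] = np.sum(shards, axis=0)

        acts = np.random.rand(1024)
        shards = np.random.rand(4, 1024)          # gradients from 4 devices
        reduced = np.empty(1024)

        dma = threading.Thread(target=dma_allreduce, args=(shards, reduced))
        dma.start()                               # collective runs "on the DMA engine"
        result = compute_kernel(acts)             # compute proceeds concurrently
        dma.join()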

    COMPUTER RESOURCE SCHEDULING USING GENERATIVE ADVERSARIAL NETWORKS

    Publication Number: US20200379814A1

    Publication Date: 2020-12-03

    Application Number: US16425878

    Application Date: 2019-05-29

    Abstract: Techniques for scheduling resources on a managed computer system are provided herein. A generative adversarial network generates predicted resource utilization. An orchestrator trains the generative adversarial network and provides the predicted resource utilization to a resource scheduler for use when the quality of the prediction is above a threshold. The quality is measured as the ability of the generator component of the generative adversarial network to “fool” the discriminator component into misclassifying the predicted resource utilization as real (i.e., of the type actually measured from the computer system).
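
    A minimal sketch of the orchestrator's gating logic, with toy generator and discriminator stand-ins; the “real” utilization statistics assumed here are invented for illustration:

        import numpy as np

        rng = np.random.default_rng(0)

        def generator(noise):
            return noise * 0.5 + 0.5              # fake utilization trace

        def discriminator(trace):
            # Toy check: call a trace "real" if its mean is near that of
            # measured utilization (assumed ~0.75 here).
            return abs(trace.mean() - 0.75) < 0.1

        def fooled_rate(trials=200):
            fakes = (generator(rng.random(8)) for _ in range(trials))
            return sum(discriminator(f) for f in fakes) / trials

        QUALITY_THRESHOLD = 0.5
        if fooled_rate() > QUALITY_THRESHOLD:
            # Quality is high enough: hand a prediction to the scheduler.
            print("scheduler input:", generator(rng.random(8)))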

    PROACTIVE MANAGEMENT OF INTER-GPU NETWORK LINKS

    Publication Number: US20210064444A1

    Publication Date: 2021-03-04

    Application Number: US16552065

    Application Date: 2019-08-27

    Abstract: Systems, apparatuses, and methods for proactively managing inter-processor network links are disclosed. A computing system includes at least a control unit and a plurality of processing units. Each processing unit includes a compute module and a configurable link interface. The control unit dynamically adjusts the clock frequency and link width of each processing unit's configurable link interface based on the data transfer size and computation time of each layer of a neural network, so as to reduce each layer's execution time. By tuning the clock frequency and link width on a per-layer basis, communication is closely overlapped with computation, allowing layers to complete more quickly.

    ADAPTIVE FILTER REPLACEMENT IN CONVOLUTIONAL NEURAL NETWORKS

    Publication Number: US20210012203A1

    Publication Date: 2021-01-14

    Application Number: US16508277

    Application Date: 2019-07-10

    Abstract: Systems, methods, and devices for increasing the inference speed of a trained convolutional neural network (CNN). A first computation speed of first filters having a first filter size in a layer of the CNN is determined, and a second computation speed of second filters having a second filter size in the layer is determined. The size of at least one of the first filters is changed to the second filter size if the second computation speed is faster than the first. In some implementations, the CNN is then retrained to generate a retrained CNN. If a key performance indicator loss of the retrained CNN exceeds a threshold, fewer of the first filters are changed to the second filter size.
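
    The speed comparison might look like the following timing harness; the direct convolution is a deliberately naive stand-in for a tuned GPU kernel, and all names are illustrative:

        import time
        import numpy as np

        def conv_time(image, k, trials=3):
            # Time a direct 2-D convolution with a k x k filter; keep the best run.
            kernel = np.ones((k, k))
            best = float("inf")
            for _ in range(trials):
                start = time.perf_counter()
                h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
                out = np.empty((h, w))
                for i in range(h):
                    for j in range(w):
                        out[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)
                best = min(best, time.perf_counter() - start)
            return best

        img = np.random.rand(64, 64)
        t3, t5 = conv_time(img, 3), conv_time(img, 5)
        chosen = 3 if t3 < t5 else 5  # replace filters with the faster size
        print(f"3x3: {t3:.4f}s  5x5: {t5:.4f}s  -> use {chosen}x{chosen}")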

    RUNTIME EXTENSION FOR NEURAL NETWORK TRAINING WITH HETEROGENEOUS MEMORY

    Publication Number: US20200042859A1

    Publication Date: 2020-02-06

    Application Number: US16194958

    Application Date: 2018-11-19

    Abstract: Systems, apparatuses, and methods for managing buffers in a neural network implementation with heterogeneous memory are disclosed. A system includes a neural network coupled to a first memory and a second memory. The first memory is a relatively low-capacity, high-bandwidth memory while the second memory is a relatively high-capacity, low-bandwidth memory. During a forward propagation pass of the neural network, a run-time manager monitors the usage of the buffers for the various layers of the neural network. During a backward propagation pass of the neural network, the run-time manager determines how to move the buffers between the first and second memories based on the monitored buffer usage during the forward propagation pass. As a result, the run-time manager is able to reduce memory access latency for the layers of the neural network during the backward propagation pass.
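
    A compact illustration of the placement decision, assuming the backward pass consumes buffers in reverse forward order; the capacity model and names are invented for the sketch:

        FAST_MEM_CAPACITY = 2  # buffers the fast, low-capacity memory can hold

        def plan_backward_placement(forward_order):
            # Buffers produced last are needed first during backpropagation,
            # so they get the fast memory; the rest spill to slow memory.
            backward_order = list(reversed(forward_order))
            fast = backward_order[:FAST_MEM_CAPACITY]
            slow = backward_order[FAST_MEM_CAPACITY:]
            return fast, slow

        layers = ["conv1", "conv2", "fc1", "fc2"]  # forward-pass order
        fast, slow = plan_backward_placement(layers)
        print("fast memory:", fast, "| slow memory:", slow)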
