NETWORK-RELATED PERFORMANCE FOR GPUS
    Invention Application

    Publication Number: US20200034195A1

    Publication Date: 2020-01-30

    Application Number: US16049216

    Filing Date: 2018-07-30

    IPC Classification: G06F9/48 G06F9/50 G06F9/54

    Abstract: Techniques for improved networking performance in systems where a graphics processing unit or other highly parallel non-central-processing-unit (referred to as an accelerated processing device or “APD” herein) has the ability to directly issue commands to a networking device such as a network interface controller (“NIC”) are disclosed. According to a first technique, the latency associated with loading certain metadata into NIC hardware memory is reduced or eliminated by pre-fetching network command queue metadata into hardware network command queue metadata slots of the NIC, thereby reducing the latency associated with fetching that metadata at a later time. A second technique involves reducing latency by prioritizing work on an APD when it is known that certain network traffic is soon to arrive over the network via a NIC.
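
    What follows is a minimal, host-side Python sketch of the first technique only, under loose assumptions: the Nic class, its method names, the slot-eviction policy, and the latency figures are hypothetical stand-ins for the NIC hardware described in the abstract, chosen purely to illustrate why pre-fetching queue metadata into a hardware slot removes a fetch stall from the command-issue path.

    # Hypothetical latency figures (microseconds); not taken from the patent.
    FETCH_LATENCY_US = 5.0   # assumed cost of pulling queue metadata from system memory
    HIT_LATENCY_US = 0.2     # assumed cost when the metadata is already resident in a slot

    class Nic:
        """Toy model of a NIC with a limited number of hardware metadata slots."""

        def __init__(self, num_hw_slots):
            self.num_hw_slots = num_hw_slots
            self.hw_slots = {}        # queue_id -> metadata resident in NIC hardware
            self.queue_metadata = {}  # queue_id -> metadata held in system memory

        def register_queue(self, queue_id, metadata):
            """Create a network command queue; its metadata initially lives in system memory."""
            self.queue_metadata[queue_id] = metadata

        def prefetch_queue_metadata(self, queue_id):
            """Pre-fetch a queue's metadata into a hardware slot ahead of any command issue."""
            if queue_id not in self.hw_slots:
                self._evict_if_full()
                self.hw_slots[queue_id] = self.queue_metadata[queue_id]

        def issue_command(self, queue_id, command):
            """Issue a network command (e.g. on behalf of an APD); returns the modeled latency."""
            if queue_id in self.hw_slots:
                return HIT_LATENCY_US  # metadata already resident: no fetch stall
            # Miss: the NIC must first fetch the metadata, paying the extra latency.
            self._evict_if_full()
            self.hw_slots[queue_id] = self.queue_metadata[queue_id]
            return FETCH_LATENCY_US + HIT_LATENCY_US

        def _evict_if_full(self):
            if len(self.hw_slots) >= self.num_hw_slots:
                self.hw_slots.pop(next(iter(self.hw_slots)))  # simple FIFO-style eviction

    # Pre-fetching off the critical path makes the later command issue a slot hit.
    nic = Nic(num_hw_slots=4)
    nic.register_queue("q0", {"doorbell": 0x1000, "ring_base": 0xBEEF0000})
    nic.prefetch_queue_metadata("q0")
    print(nic.issue_command("q0", "send"))  # 0.2: hit, no metadata fetch on the issue path
    nic.register_queue("q1", {"doorbell": 0x2000, "ring_base": 0xBEEF1000})
    print(nic.issue_command("q1", "send"))  # 5.2: miss, metadata fetched on demand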

    Processing Element-Centric All-to-All Communication

    Publication Number: US20240220336A1

    Publication Date: 2024-07-04

    Application Number: US18147081

    Filing Date: 2022-12-28

    IPC Classification: G06F9/54 G06F9/50 G06F15/173

    Abstract: In accordance with described techniques for PE-centric all-to-all communication, a distributed computing system includes processing elements, such as graphics processing units, distributed in clusters. An all-to-all communication procedure is performed by the processing elements that are each configured to generate data packets in parallel for all-to-all data communication between the clusters. The all-to-all communication procedure includes a first stage of intra-cluster parallel data communication between respective processing elements of each of the clusters; a second stage of inter-cluster data exchange for all-to-all data communication between the clusters; and a third stage of intra-cluster data distribution to the respective processing elements of each of the clusters.
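
    The sketch below is a small, host-side Python model of the three-stage pattern the abstract describes; it is not the claimed GPU implementation. The cluster sizes, the rule that the PE whose local rank equals the target cluster index aggregates traffic for that cluster, and the stage-2 peer mapping are assumptions chosen for illustration (they require at least as many PEs per cluster as there are clusters).

    NUM_CLUSTERS = 2
    PES_PER_CLUSTER = 2
    NUM_PES = NUM_CLUSTERS * PES_PER_CLUSTER
    assert PES_PER_CLUSTER >= NUM_CLUSTERS  # required by the illustrative aggregator mapping

    # pe_data[src][dst] is the chunk that global PE `src` must deliver to global PE `dst`.
    pe_data = [[f"chunk {src}->{dst}" for dst in range(NUM_PES)] for src in range(NUM_PES)]

    def cluster_of(pe):
        return pe // PES_PER_CLUSTER

    # Stage 1: intra-cluster parallel data communication.
    # Within each cluster, PEs exchange chunks so that one PE per remote cluster ends up
    # holding all of its cluster's traffic bound for that remote cluster.
    staged = {pe: [] for pe in range(NUM_PES)}
    for src in range(NUM_PES):
        for dst in range(NUM_PES):
            target_cluster = cluster_of(dst)
            aggregator = cluster_of(src) * PES_PER_CLUSTER + target_cluster
            staged[aggregator].append((src, dst, pe_data[src][dst]))

    # Stage 2: inter-cluster data exchange.
    # Each aggregator sends its buffer to the mirror PE in the cluster it aggregates for.
    exchanged = {pe: [] for pe in range(NUM_PES)}
    for pe, chunks in staged.items():
        serves_cluster = pe % PES_PER_CLUSTER
        peer = serves_cluster * PES_PER_CLUSTER + cluster_of(pe)
        exchanged[peer].extend(chunks)

    # Stage 3: intra-cluster data distribution.
    # Each receiving PE scatters the chunks to their final destination PEs in its cluster.
    final = {pe: [] for pe in range(NUM_PES)}
    for pe, chunks in exchanged.items():
        for src, dst, chunk in chunks:
            final[dst].append((src, chunk))

    # Every PE now holds exactly one chunk from every other PE (and from itself).
    for dst in range(NUM_PES):
        assert len(final[dst]) == NUM_PES
        print(dst, sorted(final[dst]))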