-
Publication No.: US20240193292A1
Publication Date: 2024-06-13
Application No.: US18212858
Application Date: 2023-06-22
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Jagadish B. Kotra , David Kaplan , Kishore Punniyamurthy , Alexander Toufic Freij
IPC: G06F21/62
CPC classification number: G06F21/6218 , G06F2221/2113 , G06F2221/2141
Abstract: A processing system receives graph object data and graph object metadata. The processing system stores the graph object metadata inline with the graph object data. The graph object metadata indicates access permissions for corresponding graph objects. Because the graph object metadata is stored inline with the graph object data, the graph object metadata is more easily retrieved and fewer system resources are consumed to determine access permissions of a requester as compared to a system where graph object metadata is stored separately from the graph object data.
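The mechanism described can be pictured with a minimal sketch, assuming a simple permission-bit scheme; all names here (`GraphStore`, `READ`, `WRITE`) are illustrative, not from the patent:

```python
# Hypothetical sketch: per-object access permissions stored inline with
# graph object data, so a permission check needs no separate metadata lookup.
READ, WRITE = 0x1, 0x2

class GraphStore:
    def __init__(self):
        # Each slot holds (metadata, payload) side by side: one fetch
        # retrieves both the object and its permission bits.
        self.slots = []

    def put(self, payload, permissions):
        self.slots.append((permissions, payload))
        return len(self.slots) - 1

    def get(self, index, requester_rights):
        permissions, payload = self.slots[index]  # single inline fetch
        if not (permissions & requester_rights):
            raise PermissionError("requester lacks access rights")
        return payload

store = GraphStore()
idx = store.put({"node": "A"}, READ)
```

Because permissions ride along with the data, the access check happens on the same fetch that retrieves the object, which is the resource saving the abstract describes.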
-
Publication No.: US11119665B2
Publication Date: 2021-09-14
Application No.: US16212388
Application Date: 2018-12-06
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Shomit N. Das , Kishore Punniyamurthy
IPC: G06F3/06
Abstract: A processing system scales power to memory and memory channels based on identifying causes of stalls of threads of a wavefront. If the cause is other than an outstanding memory request, the processing system throttles power to the memory to save power. If the stall is due to memory stalls for a subset of the memory channels servicing memory access requests for threads of a wavefront, the processing system adjusts power of the memory channels servicing memory access requests for the wavefront based on the subset. By boosting power to the subset of channels, the processing system enables the wavefront to complete processing more quickly, resulting in increased processing speed. Conversely, by throttling power to the remainder of channels, the processing system saves power without affecting processing speed.
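The per-channel decision can be sketched as follows; the stall-cause labels and the boost/throttle actions are illustrative assumptions, not AMD's implementation:

```python
# Illustrative decision logic: scale memory channel power based on why a
# wavefront's threads are stalled.
def scale_channel_power(stall_cause, stalled_channels, all_channels):
    """Return a {channel: action} map. 'boost' speeds the bottleneck
    channels; 'throttle' saves power on the rest."""
    if stall_cause != "memory":
        # Stall is not due to outstanding memory requests: throttle
        # every channel to save power.
        return {ch: "throttle" for ch in all_channels}
    # Memory stall on a subset of channels: boost that subset so the
    # wavefront finishes sooner, throttle the remainder.
    return {ch: ("boost" if ch in stalled_channels else "throttle")
            for ch in all_channels}

actions = scale_channel_power("memory", {0, 2}, range(4))
```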
-
Publication No.: US12197378B2
Publication Date: 2025-01-14
Application No.: US17804949
Application Date: 2022-06-01
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Jagadish B. Kotra , Kishore Punniyamurthy
Abstract: An apparatus configured for offloading system service tasks to a processing-in-memory (“PIM”) device includes an agent configured to: receive, from a host processor, a request to offload a memory task associated with a system service to the PIM device; determine at least one PIM command and at least one memory page associated with the host processor based upon the request; and issue the at least one PIM command to the PIM device for execution by the PIM device to perform the memory task upon the at least one memory page.
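The agent's request-to-command translation can be sketched in a few lines; the task name, request shape, and `PimDevice` interface are all invented for illustration:

```python
# Hypothetical sketch of the offload flow: an agent turns a host request
# for a system-service memory task (e.g. zeroing pages) into PIM
# commands issued against the memory pages involved.
def build_pim_commands(request):
    """Map an offload request to one PIM command per memory page."""
    task = request["task"]    # e.g. "zero_page"
    pages = request["pages"]  # pages associated with the host processor
    return [{"cmd": task, "page": p} for p in pages]

class PimDevice:
    def __init__(self):
        self.executed = []
    def issue(self, command):
        # The PIM device performs the memory task near the memory itself.
        self.executed.append(command)

agent_cmds = build_pim_commands({"task": "zero_page",
                                 "pages": [0x1000, 0x2000]})
device = PimDevice()
for c in agent_cmds:
    device.issue(c)
```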
-
Publication No.: US20220206946A1
Publication Date: 2022-06-30
Application No.: US17135657
Application Date: 2020-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Brandon K. Potter , Marko Scrbak , Sergey Blagodurov , Kishore Punniyamurthy , Nathaniel Morris
IPC: G06F12/0817
Abstract: Method and apparatus monitor eviction conflicts among cache directory entries in a cache directory and produce cache directory victim entry information for a memory manager. In some examples, the memory manager reduces future cache directory conflicts by changing a page level physical address assignment for a page of memory based on the produced cache directory victim entry information. In some examples, a scalable data fabric includes hardware control logic that performs the monitoring of the eviction conflicts among cache directory entries in the cache directory and produces the cache directory victim entry information.
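A minimal model of the monitoring loop, assuming a simple per-set conflict counter and threshold (both invented for illustration):

```python
# Illustrative sketch: count eviction conflicts per cache-directory set
# and report victim entry information so a memory manager can remap a
# conflicting page to a less contended set.
from collections import Counter

class DirectoryMonitor:
    def __init__(self, conflict_threshold=2):
        self.threshold = conflict_threshold
        self.conflicts = Counter()

    def record_eviction(self, set_index, victim_page):
        self.conflicts[set_index] += 1
        if self.conflicts[set_index] >= self.threshold:
            # Victim entry information for the memory manager, which may
            # assign the page a new physical address that maps to a
            # different directory set, reducing future conflicts.
            return {"set": set_index, "remap_page": victim_page}
        return None
```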
-
Publication No.: US20240220336A1
Publication Date: 2024-07-04
Application No.: US18147081
Application Date: 2022-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy , Khaled Hamidouche , Brandon K Potter , Rohit Shahaji Zambre
IPC: G06F9/54 , G06F9/50 , G06F15/173
CPC classification number: G06F9/54 , G06F9/5044 , G06F15/17356
Abstract: In accordance with described techniques for PE-centric all-to-all communication, a distributed computing system includes processing elements, such as graphics processing units, distributed in clusters. An all-to-all communication procedure is performed by the processing elements that are each configured to generate data packets in parallel for all-to-all data communication between the clusters. The all-to-all communication procedure includes a first stage of intra-cluster parallel data communication between respective processing elements of each of the clusters; a second stage of inter-cluster data exchange for all-to-all data communication between the clusters; and a third stage of intra-cluster data distribution to the respective processing elements of each of the clusters.
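The three stages can be modeled in a toy simulation, under the simplifying assumption that the number of PEs per cluster equals the number of clusters; in practice the stages run as parallel GPU transfers, while plain dicts stand in for the network here:

```python
# Toy model of the three-stage all-to-all: (1) intra-cluster gather,
# (2) inter-cluster exchange, (3) intra-cluster distribution.
def all_to_all(msgs, C):
    """msgs: {(src, dst): payload} with src/dst = (cluster, pe).
    Returns {dst: {src: payload}} after the three stages."""
    # Stage 1: intra-cluster gather. In cluster c, PE k collects every
    # local message whose destination cluster is k.
    staged = {(c, k): {} for c in range(C) for k in range(C)}
    for (src, dst), payload in msgs.items():
        staged[(src[0], dst[0])][(src, dst)] = payload
    # Stage 2: inter-cluster exchange. PE k of cluster c ships its whole
    # bundle to cluster k in one bulk transfer.
    arrived = {k: [] for k in range(C)}
    for (c, k), bundle in staged.items():
        arrived[k].append(bundle)
    # Stage 3: intra-cluster distribution to the final destination PEs.
    out = {}
    for k, bundles in arrived.items():
        for bundle in bundles:
            for (src, dst), payload in bundle.items():
                out.setdefault(dst, {})[src] = payload
    return out
```

Bundling per destination cluster is what makes the exchange PE-centric: each PE generates and ships one aggregate packet per remote cluster instead of one per remote PE.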
-
Publication No.: US11880312B2
Publication Date: 2024-01-23
Application No.: US17539189
Application Date: 2021-11-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy , SeyedMohammad SeyedzadehDelcheh , Sergey Blagodurov , Ganesh Dasika , Jagadish B Kotra
IPC: G06F12/00 , G06F12/126 , G06F12/0855
CPC classification number: G06F12/126 , G06F12/0859 , G06F2212/1024 , G06F2212/6042
Abstract: A method includes storing a function representing a set of data elements stored in a backing memory and, in response to a first memory read request for a first data element of the set of data elements, calculating a function result representing the first data element based on the function.
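A minimal illustration of the idea, assuming (purely for the example) that the represented data follows an affine sequence:

```python
# If a region of memory holds values that follow a known function, the
# system can store just the function and answer read requests by
# computing the result, never touching backing memory.
class FunctionBackedRegion:
    def __init__(self, base, stride, length):
        self.base, self.stride, self.length = base, stride, length

    def read(self, index):
        if not 0 <= index < self.length:
            raise IndexError("outside the represented region")
        # Compute the element instead of fetching it from memory.
        return self.base + self.stride * index

region = FunctionBackedRegion(base=100, stride=4, length=8)
```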
-
Publication No.: US11507522B2
Publication Date: 2022-11-22
Application No.: US16706421
Application Date: 2019-12-06
Applicant: Advanced Micro Devices, Inc.
Inventor: Sooraj Puthoor , Kishore Punniyamurthy , Onur Kayiran , Xianwei Zhang , Yasuko Eckert , Johnathan Alsop , Bradford Michael Beckmann
Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
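The vector indexing and promotion steps can be sketched directly; the vector contents, the lane-derived index, and additive promotion are illustrative assumptions:

```python
# Sketch of per-work-item priority assignment via a priority vector,
# with a promotion vector applied when a given event is detected.
def assign_priority(priority_vector, lane_info):
    # Lane-specific information generates the index into the vector.
    return priority_vector[lane_info % len(priority_vector)]

def promote(priority_vector, promotion_vector):
    # On a detected event, derive a second priority vector by applying
    # the promotion vector element-wise; subsequent wavefronts index
    # into this second vector instead.
    return [p + b for p, b in zip(priority_vector, promotion_vector)]

pv1 = [0, 1, 2, 3]
pv2 = promote(pv1, [2, 2, 0, 0])
```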
-
Publication No.: US20240311182A1
Publication Date: 2024-09-19
Application No.: US18185641
Application Date: 2023-03-17
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy , Sagnik Basu , Khaled Hamidouche , Brandon Keith Potter
IPC: G06F9/48
CPC classification number: G06F9/4881
Abstract: A device includes a communication scheduler to generate schedule trees for scheduling data communication among a plurality of nodes configured to perform a collective operation using data contributed from the plurality of nodes. The device includes data reduction logic to: identify one or more skewed nodes among the plurality of nodes, perform, according to a first set of schedule trees, a first operation to generate partial results based on data contributed from non-skewed nodes, and perform, according to a second set of schedule trees, a second operation to generate final results based on the partial results and data contributed from the one or more skewed nodes.
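The two-operation structure can be modeled with a toy sum reduction, where the two schedule-tree passes are reduced to the order of two summations:

```python
# Toy model of skew-aware reduction: combine contributions from on-time
# (non-skewed) nodes first into a partial result, then fold in the
# skewed (straggling) nodes' data to produce the final result.
def skew_aware_reduce(contributions, skewed):
    # First operation: partial results from non-skewed nodes only, so
    # their communication is not blocked waiting on stragglers.
    partial = sum(v for n, v in contributions.items() if n not in skewed)
    # Second operation: final results from the partial results plus the
    # skewed nodes' contributions once they arrive.
    final = partial + sum(v for n, v in contributions.items() if n in skewed)
    return partial, final
```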
-
Publication No.: US12050531B2
Publication Date: 2024-07-30
Application No.: US17952697
Application Date: 2022-09-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy , Jagadish B Kotra
IPC: G06F12/02
CPC classification number: G06F12/0292 , G06F2212/1024 , G06F2212/401
Abstract: In accordance with the described techniques for data compression and decompression for processing in memory, a page address is received by a processing in memory component that maps to a first location in memory where data of a page is maintained. The data of the page is compressed by the processing in memory component. Further, compressed data of the page is written by the processing in memory component to a compressed block device responsive to the compressed data satisfying one or more compressibility criteria. The compressed block device is a portion of the memory dedicated to storing data in a compressed form.
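The flow can be sketched with a stand-in compressor; the function names, the use of `zlib`, and the half-page compressibility criterion are all assumptions for illustration:

```python
# Illustrative flow: compress a page and write it to the compressed
# block device only if it satisfies a compressibility criterion.
import zlib

PAGE_SIZE = 4096

def maybe_compress_page(page_bytes, block_device, page_addr,
                        max_ratio=0.5):
    compressed = zlib.compress(page_bytes)
    # Compressibility criterion: the compressed page must be small
    # enough to be worth storing in the dedicated compressed region.
    if len(compressed) <= max_ratio * len(page_bytes):
        block_device[page_addr] = compressed
        return True
    return False

device = {}  # stands in for the compressed block device in memory
ok = maybe_compress_page(b"\x00" * PAGE_SIZE, device, page_addr=0x1000)
```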
-
Publication No.: US20240211399A1
Publication Date: 2024-06-27
Application No.: US18089480
Application Date: 2022-12-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy , Khaled Hamidouche , Brandon Keith Potter
IPC: G06F12/0813 , G06N20/00
CPC classification number: G06F12/0813 , G06N20/00
Abstract: A distributed cache network used for machine learning is provided which comprises a network fabric having file systems which store data and a plurality of processing devices, each comprising cache memory and a processor configured to execute a training of a machine learning model and selectively cache portions of the data based on a frequency with which the data is accessed by the processor. Each processing device stores metadata identifying portions of data which are cached in its cache memory and other portions of the data which are cached in other processing devices of the network. When requested data is not cached in another processing device, the requested data is accessed from a network file system via a client-to-server channel; when the requested data is cached in another processing device, the requested data is accessed from that processing device via a client-to-client channel.
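The lookup path can be sketched as follows; all names (`CacheNode`, the metadata map, the channel labels) are hypothetical:

```python
# Sketch of the lookup path: metadata tells each node where a data
# portion is cached; a hit on a peer uses the client-to-client channel,
# otherwise the client-to-server channel fetches from the file system.
class CacheNode:
    def __init__(self, node_id, peer_metadata, file_system):
        self.node_id = node_id
        self.local_cache = {}
        self.peer_metadata = peer_metadata  # data portion -> caching node
        self.file_system = file_system

    def read(self, key, peers):
        if key in self.local_cache:
            return self.local_cache[key], "local"
        owner = self.peer_metadata.get(key)
        if owner is not None and owner != self.node_id:
            # Client-to-client channel: fetch from the peer's cache.
            return peers[owner].local_cache[key], "client-to-client"
        # Client-to-server channel: fetch from the network file system.
        return self.file_system[key], "client-to-server"
```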
-