-
Publication number: US20240103745A1
Publication date: 2024-03-28
Application number: US17954784
Filing date: 2022-09-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Niti Madan, Johnathan Robert Alsop, Alexandru Dutu, Mahzabeen Islam, Yasuko Eckert, Nuwan S Jayasena
IPC: G06F3/06
CPC classification number: G06F3/064, G06F3/0659, G06F3/0604, G06F3/0679
Abstract: A memory controller coupled to a memory module receives both processing-in-memory (PIM) requests and memory requests from a host (e.g., a host processor). The memory controller issues PIM requests to one group of memory banks and concurrently issues memory requests to one or more other groups of memory banks. Accordingly, memory requests are performed on groups of memory banks that would otherwise be idle while PIM requests are performed on the one group of memory banks. Optionally, the memory controller coupled to the memory module also takes various actions when switching between operating in a PIM mode and a non-processing-in-memory mode to reduce or hide overhead when switching between the two modes.
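The concurrent issue policy described above can be illustrated with a minimal single-cycle scheduler sketch. It assumes one request per bank group per cycle and lets the oldest PIM request claim its bank group before memory requests fill the remaining idle groups. The names Request and schedule_cycle are hypothetical, and this is not the claimed controller design.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    kind: str        # "pim" or "mem" (hypothetical labels)
    bank_group: int  # target group of memory banks
    tag: str


def schedule_cycle(pim_queue: deque, mem_queue: deque) -> list:
    """Issue at most one request per bank group in a single cycle.

    The oldest PIM request claims its bank group first; queued memory
    requests are then issued to any other bank group that is still idle.
    Requests whose group is busy stay queued, in order, for the next cycle.
    """
    busy = set()
    issued = []
    if pim_queue:
        req = pim_queue.popleft()
        busy.add(req.bank_group)
        issued.append(req)
    deferred = deque()
    while mem_queue:
        req = mem_queue.popleft()
        if req.bank_group in busy:
            deferred.append(req)  # target group occupied this cycle
        else:
            busy.add(req.bank_group)
            issued.append(req)
    mem_queue.extend(deferred)
    return issued
```

With a PIM request on group 0 and memory requests for groups 0, 1, 2 and 1, this sketch issues the PIM request plus the first requests to groups 1 and 2 in the same cycle, leaving the other two queued.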
-
Publication number: US20240095180A1
Publication date: 2024-03-21
Application number: US18088170
Filing date: 2022-12-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Gabriel H. Loh, Michael Estlick, Jay Fleischman, Michael J. Schulte, Bradford Beckmann, Yasuko Eckert
IPC: G06F12/1009
CPC classification number: G06F12/1009, G06F2212/1008
Abstract: The disclosed computer-implemented method for interpolating register-based lookup tables can include identifying, within a set of registers, a lookup table that has been encoded for storage within the set of registers. The method can also include receiving a request to look up a value in the lookup table and responding to the request by interpolating, from the encoded lookup table stored in the set of registers, a representation of the requested value. Various other methods, systems, and computer-readable media are also disclosed.
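One way to picture an encoded, interpolated lookup table is the following sketch. It assumes the encoding keeps only every stride-th entry (the list standing in for the register set) and recovers intermediate values by linear interpolation. The encoding, the function names and the use of linear interpolation are all assumptions; the abstract does not say how the table is encoded.

```python
def encode_table(table, stride):
    """Keep every `stride`-th entry (hypothetical encoding that fits the
    table into fewer registers). The last entry must be kept as well."""
    if (len(table) - 1) % stride != 0:
        raise ValueError("len(table) - 1 must be a multiple of stride")
    return list(table[::stride])


def lookup(registers, stride, index):
    """Return the stored sample, or linearly interpolate between the two
    neighbouring samples when `index` falls between them."""
    base, offset = divmod(index, stride)
    if offset == 0:
        return registers[base]
    lo, hi = registers[base], registers[base + 1]
    return lo + (hi - lo) * offset / stride
```

For a linear table such as [0, 10, ..., 80] with stride 4, only three values are stored and every lookup is exact; for a nonlinear table the interpolated value is an approximation (index 2 of the squares 0..64 yields 8.0 rather than 4).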
-
Publication number: US20230401154A1
Publication date: 2023-12-14
Application number: US17835810
Filing date: 2022-06-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Mohamed Assem Abd ElMohsen Ibrahim, Onur Kayiran, Shaizeen Dilawarhusen Aga, Yasuko Eckert
IPC: G06F12/0862
CPC classification number: G06F12/0862, G06F2212/602
Abstract: A system and method for efficiently accessing sparse data for a workload are described. In various implementations, a computing system includes an integrated circuit and a memory for storing tasks of a workload that includes sparse accesses of data items stored in one or more tables. The integrated circuit receives a user query and generates a result based on multiple data items targeted by the user query. To reduce the latency of processing the workload even with sparse lookup operations performed on the one or more tables, a prefetch engine of the integrated circuit stores a subset of data items in prefetch data storage. The prefetch engine also determines which data items to store in the prefetch data storage based on one or more of a frequency of reuse, a distance or latency of access of a corresponding table of the one or more tables, or other factors.
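A prefetch-selection policy that weighs reuse frequency against table access latency could look like the sketch below. It assumes a hypothetical score of (reuse count x table latency) and a fixed-capacity prefetch store; the real prefetch engine may combine these factors differently.

```python
from collections import Counter


def select_prefetch_set(accesses, table_latency, capacity):
    """accesses: iterable of (table_id, item_id) lookups from past queries.
    table_latency: {table_id: access latency in cycles} (assumed known).
    Returns up to `capacity` items, highest score first, where the
    hypothetical score is reuse_count * latency_of_owning_table."""
    reuse = Counter(accesses)

    def score(key):
        table_id, _ = key
        return reuse[key] * table_latency[table_id]

    # Ties are broken by (table_id, item_id) so the result is deterministic.
    ranked = sorted(reuse, key=lambda k: (-score(k), k))
    return ranked[:capacity]
```

Under this score, an item reused twice from a slow, distant table (latency 40) outranks an item reused three times from a fast local table (latency 10).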
-
Publication number: US11768779B2
Publication date: 2023-09-26
Application number: US16716194
Filing date: 2019-12-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Jieming Yin, Yasuko Eckert, Subhash Sethumurugan
IPC: G06F12/12, G06F12/126
CPC classification number: G06F12/126, G06F2212/1021, G06F2212/1044, G06F2212/602
Abstract: Systems, apparatuses, and methods for cache management based on access type priority are disclosed. A system includes at least a processor and a cache. During a program execution phase, certain access types are more likely to cause demand hits in the cache than others. Demand hits are load and store hits to the cache. A run-time profiling mechanism is employed to find which access types are more likely to cause demand hits. Based on the profiling results, the cache lines that will likely be accessed in the future are retained based on their most recent access type. The goal is to increase demand hits and thereby improve system performance. An efficient cache replacement policy can potentially reduce redundant data movement, thereby improving system performance and reducing energy consumption.
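The profiling-plus-replacement idea can be sketched as follows, assuming the profiler counts, per access type, how many lines were last touched by that type and how many of those later saw a demand hit. The victim is the way whose most recent access type has the lowest observed demand-hit rate. The class and function names and the hit-rate formula are illustrative assumptions, not the patented policy.

```python
from collections import defaultdict


class AccessTypeProfiler:
    """Run-time profile of how often lines last touched by each access type
    later receive a demand hit (load or store hit)."""

    def __init__(self):
        self.tagged = defaultdict(int)  # lines whose most recent access had this type
        self.hits = defaultdict(int)    # later demand hits on such lines

    def record_access(self, access_type):
        self.tagged[access_type] += 1

    def record_demand_hit(self, previous_type):
        self.hits[previous_type] += 1

    def hit_rate(self, access_type):
        n = self.tagged[access_type]
        return self.hits[access_type] / n if n else 0.0


def choose_victim(cache_set, profiler):
    """cache_set: list of (tag, most_recent_access_type) for one set.
    Evict the way whose most recent access type is least likely to see a
    demand hit; ties go to the lowest way index."""
    return min(range(len(cache_set)),
               key=lambda way: profiler.hit_rate(cache_set[way][1]))
```

If lines last touched by loads are re-hit 75% of the time and prefetched lines only 25% of the time, the sketch evicts the prefetched line first.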
-
Publication number: US20230153218A1
Publication date: 2023-05-18
Application number: US17526218
Filing date: 2021-11-15
Applicant: Advanced Micro Devices, Inc.
Inventor: Shrikanth Ganapathy, Yasuko Eckert, Anthony Gutierrez, Karthik Ramu Sangaiah, Vedula Venkata Srikant Bharadwaj
CPC classification number: G06F11/3051, G06F15/80, G06F11/3024
Abstract: A processor includes a controller and a plurality of chiplets, each chiplet including a plurality of processor cores. The controller provides chiplet-level performance information for the chiplets that identifies a performance of each chiplet at each of a plurality of performance levels for specified sets of processor cores on that chiplet. The controller receives an identification of one or more selected chiplets from among the plurality of chiplets for which a specified number of processor cores are to be configured at a given performance level, the one or more selected chiplets having been selected based on the chiplet-level performance information and performance requirements. The controller configures the specified number of processor cores of the one or more selected chiplets at the given performance level. A task is then run on the specified number of processor cores of the one or more selected chiplets at the given performance level.
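The selection step (choosing chiplets from the reported chiplet-level performance information) can be sketched as below. It assumes the information arrives as a per-chiplet dictionary of free cores and a performance score per level, and that the selector greedily takes cores from the best-scoring eligible chiplets. The data layout and the greedy rule are assumptions; the abstract leaves the selection policy open.

```python
def select_chiplets(perf_info, level, cores_needed, min_perf):
    """perf_info: {chiplet_id: {"free_cores": int, "perf": {level: score}}}
    (hypothetical layout of the chiplet-level performance information).

    Returns a list of (chiplet_id, cores) pairs covering cores_needed, drawn
    from the best-performing eligible chiplets first.
    """
    eligible = [c for c, info in perf_info.items()
                if info["perf"].get(level, 0) >= min_perf]
    eligible.sort(key=lambda c: -perf_info[c]["perf"][level])
    chosen, remaining = [], cores_needed
    for chiplet in eligible:
        if remaining == 0:
            break
        take = min(remaining, perf_info[chiplet]["free_cores"])
        if take:
            chosen.append((chiplet, take))
            remaining -= take
    if remaining:
        raise ValueError("not enough eligible cores at this performance level")
    return chosen
```

The returned (chiplet, cores) pairs correspond to the "identification of one or more selected chiplets" that the controller would then configure at the given performance level.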
-
Publication number: US11586539B2
Publication date: 2023-02-21
Application number: US16713940
Filing date: 2019-12-13
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Weon Taek Na, Jagadish B. Kotra, Yasuko Eckert, Steven Raasch, Sergey Blagodurov
IPC: G06F12/08, G06F12/0811, G06F12/0871, G06F9/30, G06F12/0882, G06F12/1027, G06F12/0831
Abstract: A processing system selectively allocates space to store a group of one or more cache lines at a cache level of a cache hierarchy having a plurality of cache levels based on memory access patterns of a software application executing at the processing system. The processing system generates bit vectors indicating which cache levels are to allocate space to store groups of one or more cache lines based on the memory access patterns, which are derived from data granularity and movement information. Based on the bit vectors, the processing system provides hints to the cache hierarchy indicating the lowest cache level that can exploit the reuse potential for particular data.
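A per-group allocation bit vector can be sketched as follows. It assumes bit i stands for cache level i+1 (L1 is bit 0) and uses a hypothetical rule that a level allocates only if the data's reuse distance fits within that level's capacity. The hint is read here as the smallest-numbered level whose bit is set; that reading of "lowest" is itself an assumption.

```python
def allocation_bit_vector(reuse_distance_bytes, level_capacities):
    """Bit i set => allocate the group at cache level i + 1 (L1 is bit 0).
    Hypothetical rule: a level allocates only if the data's reuse distance
    fits within that level's capacity."""
    bits = 0
    for i, capacity in enumerate(level_capacities):
        if reuse_distance_bytes <= capacity:
            bits |= 1 << i
    return bits


def hinted_level(bits):
    """Return the smallest-numbered level (1-based) whose bit is set, or
    None when no level can exploit the reuse (bypass the hierarchy)."""
    if bits == 0:
        return None
    return (bits & -bits).bit_length()  # isolate the lowest set bit
```

With assumed capacities of 32 KiB, 512 KiB and 8 MiB, data reused every 100 KiB gets the vector 0b110 and a hint of level 2, so the sketch skips allocating it in L1.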
-
Publication number: US11226900B2
Publication date: 2022-01-18
Application number: US16776416
Filing date: 2020-01-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Weon Taek Na, Yasuko Eckert, Mark H. Oskin, Gabriel H. Loh, William Louie Walker, Michael Warren Boyer
IPC: G06F12/0815, G06F16/22, G06F12/0831
Abstract: An approach for tracking data stored in caches uses a Bloom filter to reduce the number of addresses that need to be tracked by a coherence directory. When a requested address is determined to not be currently tracked by either the coherence directory or the Bloom filter, tracking of the address is initiated in the Bloom filter, but not in the coherence directory. Initiating tracking of the address in the Bloom filter includes setting hash bits in the Bloom filter so that subsequent requests for the address will “hit” the Bloom filter. When a requested address is determined to be tracked by the coherence directory, the Bloom filter is not used to track the address.
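The tracking decision described above can be sketched with a small Bloom filter placed in front of a dictionary that stands in for the coherence directory. The hash construction (blake2b with a per-hash prefix), the filter size and the Tracker class are assumptions. The abstract does not say what happens after a Bloom filter hit (for example, promotion into the directory), so the sketch only reports it.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = 0  # bit array packed into a Python int

    def _positions(self, addr):
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(f"{i}:{addr}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, addr):
        for p in self._positions(addr):
            self.bits |= 1 << p  # set the hash bits for this address

    def might_contain(self, addr):
        return all((self.bits >> p) & 1 for p in self._positions(addr))


class Tracker:
    def __init__(self):
        self.directory = {}  # addr -> set of sharers (precise tracking)
        self.bloom = BloomFilter(num_bits=1024, num_hashes=3)

    def on_request(self, addr, requester):
        if addr in self.directory:
            # Tracked by the directory: the Bloom filter is not used.
            self.directory[addr].add(requester)
            return "directory"
        if self.bloom.might_contain(addr):
            return "bloom"  # later requests "hit" the Bloom filter
        # Untracked: start tracking in the Bloom filter only.
        self.bloom.add(addr)
        return "new"
```

A first request for an untracked address returns "new" and sets its hash bits; a repeat request then hits the filter without ever consuming a directory entry.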
-
Publication number: US20210182216A1
Publication date: 2021-06-17
Application number: US16716194
Filing date: 2019-12-16
Applicant: Advanced Micro Devices, Inc.
Inventor: Jieming Yin, Yasuko Eckert, Subhash Sethumurugan
IPC: G06F12/126
Abstract: Systems, apparatuses, and methods for cache management based on access type priority are disclosed. A system includes at least a processor and a cache. During a program execution phase, certain access types are more likely to cause demand hits in the cache than others. Demand hits are load and store hits to the cache. A run-time profiling mechanism is employed to find which access types are more likely to cause demand hits. Based on the profiling results, the cache lines that will likely be accessed in the future are retained based on their most recent access type. The goal is to increase demand hits and thereby improve system performance. An efficient cache replacement policy can potentially reduce redundant data movement, thereby improving system performance and reducing energy consumption.
-
Publication number: US20210173796A1
Publication date: 2021-06-10
Application number: US16706421
Filing date: 2019-12-06
Applicant: Advanced Micro Devices, Inc.
Inventor: Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop, Bradford Michael Beckmann
Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
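The priority-vector mechanism can be sketched as two small functions. It assumes the lane-specific index is simply the lane ID modulo the vector length, and that applying a promotion vector means element-wise addition saturating at a maximum priority. Both choices are illustrative assumptions rather than the disclosed method.

```python
def assign_priorities(priority_vector, lane_ids):
    """Give each work-item's memory request a priority by indexing into the
    priority vector with lane-specific information (here: lane_id mod len)."""
    n = len(priority_vector)
    return [priority_vector[lane % n] for lane in lane_ids]


def promote(priority_vector, promotion_vector, max_priority):
    """Build the second priority vector by applying a promotion vector,
    assumed here to be an element-wise add that saturates at max_priority."""
    if len(priority_vector) != len(promotion_vector):
        raise ValueError("vectors must have the same length")
    return [min(p + d, max_priority)
            for p, d in zip(priority_vector, promotion_vector)]
```

After an event, subsequent wavefronts would index into the promoted vector instead, so lanes mapped to promoted slots (slot 0 below) get their memory requests served earlier.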
-
Publication number: US10853904B2
Publication date: 2020-12-01
Application number: US15079543
Filing date: 2016-03-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Yasuko Eckert, Nuwan Jayasena
Abstract: A processor employs a hierarchical register file for a graphics processing unit (GPU). A top level of the hierarchical register file is stored at a local memory of the GPU (e.g., a memory on the same integrated circuit die as the GPU). Lower levels of the hierarchical register file are stored at a different, larger memory, such as a remote memory located on a different die than the GPU. A register file control module monitors the status of in-flight wavefronts at the GPU, and in particular whether each in-flight wavefront is active, predicted to become active, or inactive. The register file control module places execution data for active and predicted-active wavefronts in the top level of the hierarchical register file and places execution data for inactive wavefronts at lower levels of the hierarchical register file.
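The placement rule of the register file control module can be sketched as follows. It assumes a two-level hierarchy (level 0 on-die, level 1 in the larger remote memory) and a top-level capacity counted in whole wavefronts. When capacity runs short, active wavefronts are placed ahead of predicted-active ones. The capacity unit and this tie-breaking order are assumptions.

```python
from enum import Enum


class WaveState(Enum):
    ACTIVE = 0
    PREDICTED_ACTIVE = 1
    INACTIVE = 2


def place_register_data(wavefronts, top_level_capacity):
    """wavefronts: {wave_id: WaveState}. Returns {wave_id: level}, where
    level 0 is the on-die top level and level 1 the larger remote memory.
    Capacity is counted in whole wavefronts (hypothetical unit)."""
    placement, used = {}, 0
    # Active first, then predicted-active, then inactive; ties by wave id.
    for wave in sorted(wavefronts, key=lambda w: (wavefronts[w].value, w)):
        state = wavefronts[wave]
        if state is not WaveState.INACTIVE and used < top_level_capacity:
            placement[wave] = 0
            used += 1
        else:
            placement[wave] = 1  # inactive, or the top level is full
    return placement
```

Inactive wavefronts always land in the lower level. With room for only two wavefronts on die, a predicted-active wavefront is also pushed down until an active one leaves.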