-
Publication number: US12216590B2
Publication date: 2025-02-04
Application number: US18208059
Filing date: 2023-06-09
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Saurabh Sharma , Hashem Hashemi , Guennadi Riguer
IPC: G06F12/08 , G06F12/0811 , G06F12/126
Abstract: A cache controller of a processing system implementing a non-uniform memory architecture (NUMA) adjusts a cache replacement priority of local and non-local data stored at a cache based on a cache replacement policy. Local data is data that is accessed by the cache via a local memory channel and non-local data is data that is accessed by the cache via a non-local memory channel. The cache controller assigns priorities to local and non-local data stored at the cache based on a cache replacement policy and selects data for replacement at the cache based, at least in part, on the assigned priorities.
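Illustrative sketch (not taken from the patent text): the abstract says only that priorities are assigned to local and non-local data and that the victim is chosen at least in part from those priorities. The Python below assumes one possible policy, in which locality decides first and age (LRU) breaks ties. The names `Line`, `select_victim`, and `prefer_evict_local` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int
    is_local: bool  # True if the line was filled via the local memory channel
    age: int        # larger value = less recently used


def select_victim(lines, prefer_evict_local=True):
    """Pick the line to evict from one cache set.

    Assumed policy: the locality class named by `prefer_evict_local`
    is evicted first; within that class the oldest line is chosen.
    """
    def eviction_rank(line):
        in_evict_class = line.is_local if prefer_evict_local else not line.is_local
        # Tuples compare element by element, so locality dominates age.
        return (in_evict_class, line.age)

    return max(lines, key=eviction_rank)


cache_set = [
    Line(tag=0xA, is_local=True, age=1),
    Line(tag=0xB, is_local=False, age=5),
    Line(tag=0xC, is_local=True, age=3),
    Line(tag=0xD, is_local=False, age=2),
]
victim_when_local_is_cheap = select_victim(cache_set, prefer_evict_local=True)
victim_when_remote_is_cheap = select_victim(cache_set, prefer_evict_local=False)
```

Under these assumptions the oldest non-local line (tag 0xB) survives when local data is preferred for eviction, even though plain LRU would have evicted it. Flipping `prefer_evict_local` models the policy adjusting the priority.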
-
Publication number: US20240411706A1
Publication date: 2024-12-12
Application number: US18208059
Filing date: 2023-06-09
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Saurabh Sharma , Hashem Hashemi , Guennadi Riguer
IPC: G06F12/126 , G06F12/0811
Abstract: A cache controller of a processing system implementing a non-uniform memory architecture (NUMA) adjusts a cache replacement priority of local and non-local data stored at a cache based on a cache replacement policy. Local data is data that is accessed by the cache via a local memory channel and non-local data is data that is accessed by the cache via a non-local memory channel. The cache controller assigns priorities to local and non-local data stored at the cache based on a cache replacement policy and selects data for replacement at the cache based, at least in part, on the assigned priorities.
-
Publication number: US20230195626A1
Publication date: 2023-06-22
Application number: US17558008
Filing date: 2021-12-21
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Saurabh Sharma , Jeremy Lukacs , Hashem Hashemi , Gianpaolo Tommasi , Guennadi Riguer , Mark Fowler , Randy Ramsey
IPC: G06F12/0806 , G06F12/10
CPC classification number: G06F12/0806 , G06F12/10 , G06F2212/1016
Abstract: A processing system is configured to translate a first cache access pattern of a dispatch of work items to a cache access pattern that facilitates consumption of data stored at a cache of a parallel processing unit by a subsequent access before the data is evicted to a more remote level of the memory hierarchy. For consecutive cache accesses having read-after-read data locality, in some embodiments the processing system translates the first cache access pattern to a space-filling curve. In some embodiments, for consecutive accesses having read-after-write data locality, the processing system translates a first typewriter cache access pattern that proceeds in ascending order for a first access to a reverse typewriter cache access pattern that proceeds in descending order for a subsequent cache access. By translating the cache access pattern based on data locality, the processing system increases the hit rate of the cache.
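Illustrative sketch (not taken from the patent text): the abstract names a "space-filling curve" without saying which one, so the Python below assumes a Z-order (Morton) curve as one common choice. It also shows a reverse typewriter order as the descending counterpart of a row-major walk. All function names are hypothetical.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x and y to form a Z-order (Morton) index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def typewriter(width, height):
    """Row-major walk in ascending order: left to right, top to bottom."""
    return [(x, y) for y in range(height) for x in range(width)]


def z_order(width, height):
    """Visit the same work items along a Z-order curve (assumed curve type)."""
    return sorted(typewriter(width, height), key=lambda p: morton_index(p[0], p[1]))


def reverse_typewriter(width, height):
    """Row-major walk in descending order, so the most recently written
    items of a preceding ascending walk are read first."""
    return list(reversed(typewriter(width, height)))


first_eight = z_order(4, 4)[:8]
reversed_small = reverse_typewriter(2, 2)
```

In this sketch the Z-order walk keeps consecutive accesses inside small 2x2 blocks, which is the read-after-read case. The reversed walk starts where the ascending walk ended, which is the read-after-write case.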
-
Publication number: US12189534B2
Publication date: 2025-01-07
Application number: US17564474
Filing date: 2021-12-29
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Saurabh Sharma , Hashem Hashemi , Paavo Pessi , Mika Tuomi , Gianpaolo Tommasi , Jeremy Lukacs , Guennadi Riguer
IPC: G06F9/46 , G06F9/48 , G06F12/0855
Abstract: A processing system divides successive dispatches of work items into portions. The successive dispatches are separated from each other by barriers, each barrier indicating that the work items of the previous dispatch must complete execution before work items of a subsequent dispatch can begin execution. In some embodiments, the processing system interleaves execution of portions of a first dispatch with portions of subsequent dispatches that consume data produced by the first dispatch. The processing system thereby reduces the amount of data written to the local cache by a producer dispatch while preserving data locality for a subsequent consumer (or consumer/producer) dispatch and facilitating processing efficiency.
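Illustrative sketch (not taken from the patent text): the Python below models interleaving portions of a producer dispatch with portions of a consumer dispatch. It assumes, for simplicity, that consumer portion i depends only on producer portion i and that both dispatches split into the same number of portions. The names `split` and `interleave` are hypothetical.

```python
def split(items, portions):
    """Divide a dispatch's work items into roughly equal contiguous portions."""
    size = max(1, -(-len(items) // portions))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def interleave(producer_items, consumer_items, portions):
    """Build an execution order that alternates producer and consumer portions.

    Assumption: consumer portion i reads only data written by producer
    portion i, so the barrier can be honored per portion.
    """
    order = []
    for produced, consumed in zip(split(producer_items, portions),
                                  split(consumer_items, portions)):
        order.append(("produce", produced))
        order.append(("consume", consumed))
    return order


schedule = interleave([0, 1, 2, 3], [0, 1, 2, 3], portions=2)
```

Under these assumptions only one portion of producer output needs to be cache-resident at a time, rather than the output of the whole dispatch.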
-
Publication number: US20230195509A1
Publication date: 2023-06-22
Application number: US17557927
Filing date: 2021-12-21
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: Saurabh Sharma , Jeremy Lukacs , Hashem Hashemi , Gianpaolo Tommasi , Guennadi Riguer , Mark Fowler , Randy Ramsey
IPC: G06F9/48
CPC classification number: G06F9/4831
Abstract: A processing unit performs a dispatch walk of a set of thread groups based on a programmable access pattern. The access pattern is stored at a table that is programmed with the access pattern based upon a specified command. By using the command to program the table with different access patterns, the dispatch order of the set of thread groups is adapted to better suit the processing of different data sets, thereby reducing power consumption at the processing unit, and improving overall processing efficiency.
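Illustrative sketch (not taken from the patent text): the Python below models a dispatch walk driven by a table that a command reprograms. The class `DispatchWalker` and its methods are hypothetical, and the table is assumed to hold a permutation of thread-group indices.

```python
class DispatchWalker:
    """Walks a set of thread groups in the order stored in a table."""

    def __init__(self):
        self.table = []

    def program(self, pattern):
        """Stand-in for the command that loads an access pattern into the table."""
        self.table = list(pattern)

    def walk(self, thread_groups):
        """Return the thread groups in the order given by the table."""
        return [thread_groups[i] for i in self.table]


walker = DispatchWalker()
groups = ["tg0", "tg1", "tg2", "tg3"]

walker.program([0, 1, 2, 3])   # ascending order
ascending = walker.walk(groups)

walker.program([0, 2, 1, 3])   # a different pattern for a different data set
reordered = walker.walk(groups)
```

The same thread groups are dispatched in a different order after each `program` call, with no change to the walk logic itself.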
-
Publication number: US20230195639A1
Publication date: 2023-06-22
Application number: US17557475
Filing date: 2021-12-21
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Saurabh Sharma , Jeremy Lukacs , Hashem Hashemi , Gianpaolo Tommasi , Christopher J. Brennan
IPC: G06F12/0893
CPC classification number: G06F12/0893 , G06F2212/6042
Abstract: A processing system selectively allocates storage at a local cache of a parallel processing unit for cache lines of a repeating pattern of data that exceeds the storage capacity of the cache. The processing system identifies repeating patterns of data having cache lines that have a reuse distance that exceeds the storage capacity of the cache. A cache controller allocates storage for only a subset of cache lines of the repeating pattern of data at the cache and excludes the remainder of cache lines of the repeating pattern of data from the cache. By restricting the cache to store only a subset of cache lines of the repeating pattern of data, the cache controller increases the hit rate at the cache for the subset of cache lines.
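Illustrative sketch (not taken from the patent text): the Python simulation below shows why allocating only a subset of a repeating pattern can raise the hit rate. It assumes an LRU cache and a cyclic pattern of 8 lines replayed through a 4-line cache. The function `simulate` and the `allowed` parameter are hypothetical.

```python
from collections import OrderedDict


def simulate(pattern, capacity, passes, allowed=None):
    """Count hits for a repeating access pattern on an LRU cache.

    If `allowed` is given, only those lines may be allocated on a miss;
    all other lines bypass the cache.
    """
    cache = OrderedDict()
    hits = 0
    for _ in range(passes):
        for line in pattern:
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark as most recently used
            elif allowed is None or line in allowed:
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict least recently used
                cache[line] = True
    return hits


pattern = list(range(8))  # reuse distance of 8 lines exceeds the capacity of 4
baseline_hits = simulate(pattern, capacity=4, passes=3)
selective_hits = simulate(pattern, capacity=4, passes=3, allowed=set(pattern[:4]))
```

With plain LRU every line is evicted before it is reused, so the baseline scores no hits. Restricting allocation to four lines lets those four hit on every pass after the first.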
-
Publication number: US20240370965A1
Publication date: 2024-11-07
Application number: US18373004
Filing date: 2023-09-26
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Saurabh Sharma , Hashem Hashemi , Ian Richard Beaumont , Jeffrey C. Allan , Dana Schaa
IPC: G06T1/20
Abstract: A processing unit includes traversal recursion circuitry that performs, on behalf of a software shader, at least some of the requisite actions for traversing selected types of nodes of a raytracing acceleration structure. In response to identifying that a first node of the raytracing acceleration structure is of a first type, the processing unit provides an intersection result for the first node to the recursion circuitry. In response to the intersection result for the first node, the processing unit performs a traversal operation for the raytracing acceleration structure at the recursion circuitry.
-
Publication number: US12117939B2
Publication date: 2024-10-15
Application number: US17557475
Filing date: 2021-12-21
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Saurabh Sharma , Jeremy Lukacs , Hashem Hashemi , Gianpaolo Tommasi , Christopher J. Brennan
IPC: G06F12/00 , G06F12/0893
CPC classification number: G06F12/0893 , G06F2212/6042
Abstract: A processing system selectively allocates storage at a local cache of a parallel processing unit for cache lines of a repeating pattern of data that exceeds the storage capacity of the cache. The processing system identifies repeating patterns of data having cache lines that have a reuse distance that exceeds the storage capacity of the cache. A cache controller allocates storage for only a subset of cache lines of the repeating pattern of data at the cache and excludes the remainder of cache lines of the repeating pattern of data from the cache. By restricting the cache to store only a subset of cache lines of the repeating pattern of data, the cache controller increases the hit rate at the cache for the subset of cache lines.
-