-
公开(公告)号:US20190310864A1
公开(公告)日:2019-10-10
申请号:US15948795
申请日:2018-04-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Anthony T. Gutierrez , Sergey Blagodurov , Scott A. Moe , Xianwei Zhang , Jieming Yin , Matthew D. Sinclair
Abstract: An electronic device includes a controller functional block and a computational functional block. During operation, while the computational functional block executes a test portion of a workload at at least one precision level, the controller functional block monitors a behavior of the computational functional block. Based on the behavior of the computational functional block while executing the test portion of the workload at the at least one precision level, the controller functional block selects a given precision level from among a set of two or more precision levels at which the computational functional block is to execute a remaining portion of the workload. The controller functional block then configures the computational block to execute the remaining portion of the workload at the given precision level.
-
公开(公告)号:US11507522B2
公开(公告)日:2022-11-22
申请号:US16706421
申请日:2019-12-06
Applicant: Advanced Micro Devices, Inc.
Inventor: Sooraj Puthoor , Kishore Punniyamurthy , Onur Kayiran , Xianwei Zhang , Yasuko Eckert , Johnathan Alsop , Bradford Michael Beckmann
Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
-
公开(公告)号:US11740791B2
公开(公告)日:2023-08-29
申请号:US17497286
申请日:2021-10-08
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Seyed Mohammad Seyedzadehdelcheh , Xianwei Zhang , Bradford Beckmann , Shomit N. Das
IPC: G06F3/06 , G06F12/0875 , G06T1/20
CPC classification number: G06F3/0608 , G06F3/064 , G06F3/0659 , G06F3/0673 , G06F12/0875 , G06F2212/1044 , G06T1/20
Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.
-
公开(公告)号:US11150899B2
公开(公告)日:2021-10-19
申请号:US15948795
申请日:2018-04-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Anthony T. Gutierrez , Sergey Blagodurov , Scott A. Moe , Xianwei Zhang , Jieming Yin , Matthew D. Sinclair
Abstract: An electronic device includes a controller functional block and a computational functional block. During operation, while the computational functional block executes a test portion of a workload at at least one precision level, the controller functional block monitors a behavior of the computational functional block. Based on the behavior of the computational functional block while executing the test portion of the workload at the at least one precision level, the controller functional block selects a given precision level from among a set of two or more precision levels at which the computational functional block is to execute a remaining portion of the workload. The controller functional block then configures the computational block to execute the remaining portion of the workload at the given precision level.
-
公开(公告)号:US20210173796A1
公开(公告)日:2021-06-10
申请号:US16706421
申请日:2019-12-06
Applicant: Advanced Micro Devices, Inc.
Inventor: Sooraj Puthoor , Kishore Punniyamurthy , Onur Kayiran , Xianwei Zhang , Yasuko Eckert , Johnathan Alsop , Bradford Michael Beckmann
Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
-
公开(公告)号:US11487671B2
公开(公告)日:2022-11-01
申请号:US16446119
申请日:2019-06-19
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Xianwei Zhang , John Kalamatianos , Bradford Beckmann
IPC: G06F12/0891 , G06F12/0888 , G06F12/0895 , G06F9/54 , G06F9/38
Abstract: Wavefront loading in a processor is managed and includes monitoring a selected wavefront of a set of wavefronts. Reuse of memory access requests for the selected wavefront is counted. A cache hit rate in one or more caches of the processor is determined based on the counted reuse. Based on the cache hit rate, subsequent memory requests of other wavefronts of the set of wavefronts are modified by including a type of reuse of cache lines in requests to the caches. In the caches, storage of data in the caches is based on the type of reuse indicated by the subsequent memory access requests. Reused cache lines are protected by preventing cache line contents from being replaced by another cache line for a duration of processing the set of wavefronts. Caches are bypassed when streaming access requests are made.
-
公开(公告)号:US11144208B2
公开(公告)日:2021-10-12
申请号:US16724609
申请日:2019-12-23
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: SeyedMohammad Seyedzadehdelcheh , Xianwei Zhang , Bradford Beckmann , Shomit N. Das
IPC: G06K9/36 , G06F3/06 , G06F12/0875 , G06T1/20
Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.
-
-
-
-
-
-