-
Publication No.: US20210096877A1
Publication Date: 2021-04-01
Application No.: US16583969
Filing Date: 2019-09-26
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin HE , Michael MANTOR , Jiasheng CHEN
Abstract: An arithmetic logic unit (ALU) pipeline of a processing unit collapses execution bubbles in response to a stall at a stage of the ALU pipeline. An execution bubble occurs at the pipeline in response to an invalid instruction being placed in the pipeline for execution. The invalid instruction thus consumes an available “slot” in the pipeline and proceeds through the pipeline until a stall in a subsequent stage (that is, a stage after the stage executing the invalid instruction) is detected. In response to detecting the stall, the ALU continues to execute instructions that are behind the invalid instruction in the pipeline, thereby collapsing the execution bubble and conserving resources of the ALU.
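The mechanism lends itself to a small simulation. The sketch below, which assumes a four-stage pipeline, a None-as-bubble convention, and a single stalled-stage index (none of which come from the filing), shows how stages behind a bubble keep advancing during a stall so the bubble is overwritten:

```python
# Toy model of execution-bubble collapsing in a stalled ALU pipeline.
# Stage count, None-as-bubble convention, and the stall rule are
# illustrative assumptions, not the patented circuit.

def tick(pipeline, stalled=None):
    """Advance one cycle. pipeline[0] is the front (youngest) stage and
    None marks a bubble left by an invalid instruction; `stalled` is the
    index of a stage that cannot hand off its instruction this cycle."""
    n = len(pipeline)
    can_move = [False] * n
    can_move[n - 1] = stalled != n - 1            # last stage retires unless stalled
    for i in range(n - 2, -1, -1):
        downstream_free = pipeline[i + 1] is None or can_move[i + 1]
        can_move[i] = (i != stalled) and downstream_free
    new = list(pipeline)
    for i in range(n - 1, -1, -1):
        if can_move[i]:
            if i + 1 < n:
                new[i + 1] = pipeline[i]          # instruction advances into the
            new[i] = None                         # bubble ahead of it (collapse)
    return new

# Stage 3 is stalled, but the bubble in stage 2 lets i2 and i3 keep moving.
print(tick(["i3", "i2", None, "i1"], stalled=3))  # [None, 'i3', 'i2', 'i1']
```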
-
Publication No.: US20200379767A1
Publication Date: 2020-12-03
Application No.: US16426613
Filing Date: 2019-05-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Rex Eldon MCCRARY , Yi LUO , Harry J. WISE , Alexander Fuad ASHKAR , Michael MANTOR
IPC: G06F9/38 , G06F16/245 , G06T1/20 , G06T1/60
Abstract: A method of context bouncing includes receiving, at a command processor of a graphics processing unit (GPU), a conditional execute packet providing a hash identifier corresponding to an encapsulated state. The encapsulated state includes one or more context state packets following the conditional execute packet. A command packet following the encapsulated state is executed based at least in part on determining whether the hash identifier of the encapsulated state matches one of a plurality of hash identifiers of active context states currently stored at the GPU.
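As a rough illustration of the check the abstract describes, the sketch below keeps a set of hash identifiers for context states already resident on the GPU and only replays the encapsulated state packets when the incoming hash is not found; the class, method, and field names are assumptions of the example, not AMD's packet format:

```python
# Sketch of conditional-execute handling keyed on a context-state hash.
# Packet layout, method names, and the eviction policy are assumptions.

class CommandProcessor:
    def __init__(self, max_active=8):
        self.active_hashes = set()       # context states already stored at the GPU
        self.max_active = max_active

    def conditional_execute(self, hash_id, context_state_packets, command_packet):
        if hash_id not in self.active_hashes:
            # State is not resident: replay the encapsulated state packets.
            for pkt in context_state_packets:
                self.apply_context_state(pkt)
            if len(self.active_hashes) >= self.max_active:
                self.active_hashes.pop()          # toy eviction of one active state
            self.active_hashes.add(hash_id)
        # Either way, the command packet following the state is executed.
        return self.execute(command_packet)

    def apply_context_state(self, pkt):
        pass                                      # placeholder for register writes

    def execute(self, command_packet):
        return f"ran {command_packet} with active contexts {sorted(self.active_hashes)}"
```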
-
Publication No.: US20190278605A1
Publication Date: 2019-09-12
Application No.: US16425625
Filing Date: 2019-05-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Brian EMBERLING , Michael MANTOR
Abstract: A system includes a processor configured to operate in at least a first mode and a second mode. In the first mode the processor operates to execute an instruction for an entire wavefront before executing a next instruction for the entire wavefront. In the second mode the processor operates to execute a set of instructions for a portion of a wavefront before executing the set of instructions for another portion of the same wavefront. The system further includes a memory coupled to the processor. The memory is configured to store a shader program for execution by the processor, wherein the shader program includes at least one indication associated with one of the first mode or the second mode. The processor is further configured to implement one of the first mode or the second mode while executing the shader program, responsive to the at least one indication present in the shader program.
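A minimal sketch of the two modes, assuming plain Python callables as "instructions" and a 32-lane portion size (both assumptions of the example, not the claims):

```python
# Illustrative contrast between the two wavefront execution modes.

def run_mode_one(instructions, wavefront):
    """Mode 1: execute each instruction for the entire wavefront before
    moving on to the next instruction."""
    for instr in instructions:
        for lane in wavefront:
            instr(lane)

def run_mode_two(instructions, wavefront, portion=32):
    """Mode 2: execute the whole instruction set for one portion of the
    wavefront, then repeat for the next portion of the same wavefront."""
    for start in range(0, len(wavefront), portion):
        for instr in instructions:
            for lane in wavefront[start:start + portion]:
                instr(lane)

# Example: two trivial "instructions" over a 64-lane wavefront.
trace = []
run_mode_two([lambda l: trace.append(("a", l)), lambda l: trace.append(("b", l))],
             list(range(64)))
```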
-
Publication No.: US20140157287A1
Publication Date: 2014-06-05
Application No.: US13691066
Filing Date: 2012-11-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Lee W. HOWES , Benedict R. GASTER , Michael MANTOR
IPC: G06F9/46
CPC classification number: G06F9/461
Abstract: Methods, systems, and computer-readable storage media embodiments allow for low-overhead context switching of threads. In embodiments, applications, such as, but not limited to, iterative data-parallel applications, substantially reduce the overhead of context switching by adding user- or higher-level-program configurability of the state to be saved upon preemption of an executing thread. These methods, systems, and computer-readable storage media include aspects of running a group of threads on a processor, saving state information by respective threads in the group in response to a signal from a scheduler, and preempting running of the group after the saving of the state information.
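One way to picture the user-configurable save set is the sketch below, where the program names which pieces of its state matter and only those are written out when the scheduler signals preemption; the event-based signal and the dictionary state are assumptions of the example:

```python
# Sketch of low-overhead preemption with a program-chosen save set.
import threading

class ThreadGroup:
    def __init__(self):
        self.saved = {}
        self.preempt = threading.Event()          # the scheduler's signal

    def run(self, tid, work_items, state, save_keys):
        """`save_keys` is declared by the program: only that subset of
        `state` is saved when preemption is requested."""
        for item in work_items:
            if self.preempt.is_set():
                self.saved[tid] = {k: state[k] for k in save_keys}
                return "preempted"
            state["acc"] = state.get("acc", 0) + item    # stand-in for real work
        return "done"

group = ThreadGroup()
group.preempt.set()
print(group.run(0, [1, 2, 3], {"acc": 7, "scratch": [0] * 1024}, ["acc"]))
```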
-
Publication No.: US20240168719A1
Publication Date: 2024-05-23
Application No.: US18414164
Filing Date: 2024-01-16
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin HE , Brian EMBERLING , Mark LEATHER , Michael MANTOR
CPC classification number: G06F7/57 , G06F9/3867 , G06F17/16 , G06T1/20 , G06F15/8015
Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general-purpose register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
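A rough software analogue of that operand cache: operands are staged from banked VGPR storage under a per-cycle read limit, and both pipelines issue only once their operand sets are complete. Bank count, read bandwidth, and the dictionary representation are assumptions of the sketch:

```python
# Sketch of a VGPR-fed operand cache serving two ALU pipelines per cycle.

class OperandCache:
    def __init__(self, vgpr_banks, reads_per_cycle=4):
        self.banks = vgpr_banks                   # list of {register: value} banks
        self.reads_per_cycle = reads_per_cycle    # VGPR read bandwidth per cycle
        self.cache = {}

    def collect(self, regs):
        """Stage missing operands into the cache, bounded by VGPR bandwidth."""
        missing = [r for r in regs if r not in self.cache]
        for reg in missing[: self.reads_per_cycle]:
            self.cache[reg] = self.banks[reg % len(self.banks)][reg]
        return all(r in self.cache for r in regs)

    def issue(self, pipe0_regs, pipe1_regs):
        """Issue to both pipelines only when every operand is cached."""
        if self.collect(list(pipe0_regs) + list(pipe1_regs)):
            return ([self.cache[r] for r in pipe0_regs],
                    [self.cache[r] for r in pipe1_regs])
        return None                               # keep filling next cycle

banks = [{r: r * 10 for r in range(0, 8, 2)}, {r: r * 10 for r in range(1, 8, 2)}]
cache = OperandCache(banks)
print(cache.issue([0, 1], [2, 3]))                # ([0, 10], [20, 30])
```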
-
Publication No.: US20240143283A1
Publication Date: 2024-05-02
Application No.: US18219268
Filing Date: 2023-07-07
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin HE , Shubh SHAH , Michael MANTOR
Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.
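As an example of "execute at the indicated precision, then shrink the intermediate before the next stage," the sketch below multiplies at float32 and narrows to float16 before a later accumulate stage; the specific widths are assumptions of the example, not the claimed formats:

```python
# Sketch of a multi-stage ALU that narrows intermediate results.
import numpy as np

def stage1_multiply(a, b, precision=np.float32):
    """First stage: execute at the precision named by the instruction."""
    return np.asarray(a, dtype=precision) * np.asarray(b, dtype=precision)

def narrow(x, target=np.float16):
    """Reduce the intermediate to a smaller size before forwarding it."""
    return x.astype(target)

def stage2_accumulate(x, acc):
    """Later stage continues executing the instruction on the narrowed data."""
    return acc + x.astype(np.float32)

print(stage2_accumulate(narrow(stage1_multiply([1.5, 2.25], [4.0, 8.0])), 0.0))
```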
-
Publication No.: US20220237851A1
Publication Date: 2022-07-28
Application No.: US17706811
Filing Date: 2022-03-29
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Mark LEATHER , Michael MANTOR
Abstract: A graphics processing unit (GPU) or other apparatus includes a plurality of shader engines. The apparatus also includes a first front end (FE) circuit and one or more second FE circuits. The first FE circuit is configured to schedule geometry workloads for the plurality of shader engines in a first mode. The first FE circuit is configured to schedule geometry workloads for a first subset of the plurality of shader engines and the one or more second FE circuits are configured to schedule geometry workloads for a second subset of the plurality of shader engines in a second mode. In some cases, a partition switch is configured to selectively connect the first FE circuit or the one or more second FE circuits to the second subset of the plurality of shader engines depending on whether the apparatus is in the first mode or the second mode.
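Read as a routing problem, the two modes look roughly like the sketch below: one front end feeds every shader engine, or the partition switch hands the second subset to a second front end. The four-engine split and round-robin assignment are assumptions of the example:

```python
# Toy model of front-end (FE) circuits feeding shader engines in two modes.

def schedule(workloads, mode, engines=("SE0", "SE1", "SE2", "SE3")):
    if mode == 1:
        routing = {"FE0": list(engines)}                 # FE0 schedules all engines
    else:
        half = len(engines) // 2                         # partition switch connects
        routing = {"FE0": list(engines[:half]),          # FE1 to the second subset
                   "FE1": list(engines[half:])}
    fes = list(routing)
    assignments = []
    for i, work in enumerate(workloads):
        fe = fes[i % len(fes)]
        se = routing[fe][(i // len(fes)) % len(routing[fe])]
        assignments.append((work, fe, se))
    return assignments

print(schedule(["draw0", "draw1", "draw2", "draw3"], mode=2))
```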
-
Publication No.: US20220197973A1
Publication Date: 2022-06-23
Application No.: US17125457
Filing Date: 2020-12-17
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sateesh LAGUDU , Allen H. RUSH , Michael MANTOR
Abstract: A processing system includes a first set and a second set of general-purpose registers (GPRs) and memory access circuitry that fetches nonzero values of a sparse matrix into consecutive slots in the first set. The memory access circuitry also fetches values of an expanded matrix into consecutive slots in the second set of GPRs. The expanded matrix is formed based on values of a vector and locations of the nonzero values in the sparse matrix. The processing system also includes a set of multipliers that concurrently perform multiplication of the nonzero values in slots of the first set of GPRs with the values of the vector in corresponding slots of the second set. Reduced sum circuitry accumulates results from the set of multipliers for rows of the sparse matrix.
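The dataflow maps naturally onto a short worked example: nonzeros packed into one register set, matching vector values expanded into the other, elementwise multiplies, then a per-row reduction. The CSR-style inputs (values, column indices, row pointers) are an assumption of the sketch, not the claimed register layout:

```python
# Software analogue of the sparse-matrix/vector dataflow described above.

def sparse_matvec(values, col_idx, row_ptr, vector):
    gpr_a = list(values)                               # nonzeros in consecutive slots
    gpr_b = [vector[c] for c in col_idx]               # "expanded matrix" of vector values
    products = [a * b for a, b in zip(gpr_a, gpr_b)]   # the concurrent multipliers
    return [sum(products[row_ptr[r]:row_ptr[r + 1]])   # reduced sum per row
            for r in range(len(row_ptr) - 1)]

# 2x3 matrix [[5, 0, 2], [0, 3, 0]] times vector [1, 2, 3] -> [11, 6]
print(sparse_matvec([5, 2, 3], [0, 2, 1], [0, 2, 3], [1, 2, 3]))
```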
-
Publication No.: US20190318527A1
Publication Date: 2019-10-17
Application No.: US16452831
Filing Date: 2019-06-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Mangesh P. NIJASURE , Todd MARTIN , Michael MANTOR
Abstract: Improvements in the graphics processing pipeline that allow multiple pipelines to cooperate to render a single frame are disclosed. Two approaches are provided. In a first approach, world-space pipelines for the different graphics processing pipelines process all work for draw calls received from a central processing unit (CPU). In a second approach, the world-space pipelines divide up the work. Work that is divided is synchronized and redistributed at various points in the world-space pipeline. In either approach, the triangles output by the world-space pipelines are distributed to the screen-space pipelines based on the portions of the render surface overlapped by the triangles. Triangles are rendered by screen-space pipelines associated with the render surface portions overlapped by those triangles.
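The screen-space distribution step can be pictured as a tile-overlap test like the one below; the two-tile surface, bounding-box test, and tuple vertex format are assumptions of the example, not the disclosed hardware:

```python
# Sketch of distributing world-space triangles to screen-space pipelines
# based on which render-surface portions (tiles) each triangle overlaps.

def overlapped_tiles(tri, tiles):
    xs, ys = [v[0] for v in tri], [v[1] for v in tri]
    return [name for name, (x0, y0, x1, y1) in tiles.items()
            if min(xs) < x1 and max(xs) > x0 and min(ys) < y1 and max(ys) > y0]

def distribute(triangles, tiles):
    work = {name: [] for name in tiles}          # one screen-space pipeline per tile
    for tri in triangles:
        for name in overlapped_tiles(tri, tiles):
            work[name].append(tri)               # triangle goes to every pipeline
    return work                                  # whose surface portion it covers

tiles = {"ss0": (0, 0, 50, 50), "ss1": (50, 0, 100, 50)}
print(distribute([[(10, 10), (40, 10), (10, 40)],
                  [(45, 20), (60, 20), (55, 40)]], tiles))
```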
-
Publication No.: US20190155604A1
Publication Date: 2019-05-23
Application No.: US15818304
Filing Date: 2017-11-20
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Brian EMBERLING , Michael MANTOR
IPC: G06F9/30 , G06F12/0862 , G06F12/0811
Abstract: A processing unit includes a plurality of processing elements and one or more caches. A first thread executes a program that includes one or more prefetch instructions to prefetch information into a first cache. Prefetching is selectively enabled when executing the first thread on a first processing element, dependent upon whether one or more second threads previously executed the program on the first processing element. The first thread is then dispatched to execute the program on the first processing element. In some cases, a dispatcher receives the first thread for dispatching to the first processing element. The dispatcher modifies the prefetch instruction to disable prefetching into the first cache in response to the one or more second threads having previously executed the program on the first processing element.
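A minimal sketch of the dispatcher decision, assuming a (program, processing-element) history set and prefetch instructions identifiable by a string prefix (both assumptions of the example):

```python
# Sketch of selectively disabling prefetch for threads dispatched to a
# processing element that has already run the same program.

class Dispatcher:
    def __init__(self):
        self.seen = set()                        # (program_id, element_id) history

    def dispatch(self, instructions, program_id, element_id):
        key = (program_id, element_id)
        if key in self.seen:
            # Cache is likely warm from an earlier thread: drop the prefetches.
            instructions = [i for i in instructions if not i.startswith("prefetch")]
        else:
            self.seen.add(key)                   # first run keeps prefetching enabled
        return instructions

d = Dispatcher()
prog = ["prefetch r0", "load r0", "add r1"]
print(d.dispatch(prog, "shader0", "PE0"))        # prefetch kept (first run)
print(d.dispatch(prog, "shader0", "PE0"))        # prefetch dropped (warm cache)
```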