Abstract:
A computer-implemented method is provided for executing an instruction sequence containing a recursive function call by a plurality of threads within a thread group in a Single-Instruction-Multiple-Threads (SIMT) system. Each thread is provided with a function call counter (FCC), an active mask, an execution mask, and a per-thread program counter (PTPC). The threads in the thread group execute the instruction sequence according to a program counter (PC) indicating a target. Upon executing the recursive function call, for each thread, the active mask is set according to the PTPC and the target indicated by the PC, the FCC is determined when entering or returning from the recursive function call, and the execution mask is determined according to the FCC and the active mask. Whether an execution result of the recursive function call takes effect is determined according to the execution mask.
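The mask bookkeeping described above can be illustrated with a minimal sketch. The abstract does not state the exact update rules, so everything here is an assumption made for illustration: a thread is taken to be active when its PTPC equals the PC target, the FCC is taken to count call levels that a non-executing thread skips, and a result is taken to take effect only when the thread is active and its FCC is zero. All function names are hypothetical.

```python
def active_mask(ptpcs, pc):
    """Assumed rule: a thread is active when its PTPC equals the PC target."""
    return [ptpc == pc for ptpc in ptpcs]


def execution_mask(fccs, active):
    """Assumed rule: a result takes effect only for active threads with FCC == 0."""
    return [a and f == 0 for f, a in zip(fccs, active)]


def enter_call(fccs, executing):
    """Assumed rule: a thread that is not executing counts the call level it skips."""
    return [f if e else f + 1 for f, e in zip(fccs, executing)]


def return_from_call(fccs):
    """Assumed rule: a return unwinds one skipped level, never below zero."""
    return [max(f - 1, 0) for f in fccs]


# Four threads in one group; the thread at index 2 has diverged to another address.
ptpcs = [8, 8, 12, 8]
fccs = [0, 0, 0, 0]

active = active_mask(ptpcs, pc=8)
executing = execution_mask(fccs, active)
fccs_inside = enter_call(fccs, executing)    # diverged thread now has FCC 1
fccs_after = return_from_call(fccs_inside)   # all counters back at 0
```

Under these assumed rules, the diverged thread stays masked off for the whole nested call, even if its PTPC happens to match the PC inside the callee, because its FCC is nonzero until the matching return.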
Abstract:
A graphics processing circuit includes a buffer, a first vertex shader, and a second vertex shader. The first vertex shader generates at least coordinate values of a plurality of vertices to the buffer. The second vertex shader reads at least a portion of buffered coordinate values from the buffer, and reuses at least the portion of the buffered coordinate values to generate a value of at least one user-defined variable.
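The two-shader arrangement can be sketched in software as two passes that share a buffer. This is only an illustration under stated assumptions: the scale-and-offset transform, the choice of distance from the origin as the user-defined variable, and both function names are hypothetical and are not taken from the abstract.

```python
import math


def first_vertex_shader(vertices, scale, offset):
    """First pass: compute coordinate values and write them to the buffer."""
    buffer = []
    for x, y, z in vertices:
        # Hypothetical transform: uniform scale followed by a translation.
        buffer.append((x * scale + offset[0],
                       y * scale + offset[1],
                       z * scale + offset[2]))
    return buffer


def second_vertex_shader(buffer):
    """Second pass: reuse buffered coordinates instead of recomputing them."""
    # Hypothetical user-defined variable: distance of each vertex from the origin.
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in buffer]


vertices = [(1.0, 2.0, 2.0), (0.0, 3.0, 4.0)]
buffered = first_vertex_shader(vertices, scale=1.0, offset=(0.0, 0.0, 0.0))
user_values = second_vertex_shader(buffered)
```

The point of the split is that the second pass never repeats the coordinate computation; it only reads what the first pass already stored.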
Abstract:
A Single-Instruction-Multiple-Threads (SIMT) computing system includes multiple processors and a scheduler to schedule multiple threads to each of the processors. Each processor includes a scalar unit to provide a scalar lane for scalar execution and vector units to provide N parallel lanes for vector execution. At execution time, a processor detects that an instruction of N threads has been predicted by a compiler to have (N−M) inactive threads and the same source operands for the M active threads, where N>M≥1. Upon the detection, the instruction is sent to the scalar unit for scalar execution.
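The dispatch condition above can be written as a small predicate. This is a sketch under assumptions: in the abstract the condition is predicted by a compiler and then detected by the processor, whereas this example simply checks it directly on per-thread data. The function name and the data layout (one operand tuple per thread) are hypothetical.

```python
def qualifies_for_scalar(active, operands):
    """Return True when N > M >= 1 and all M active threads share source operands.

    active   -- list of N booleans, one per thread
    operands -- list of N tuples holding each thread's source operands
    """
    n = len(active)
    live = [ops for is_active, ops in zip(active, operands) if is_active]
    m = len(live)
    if not (n > m >= 1):
        # Either no thread is active, or no thread is inactive.
        return False
    # Every active thread must read exactly the same source operands.
    return all(ops == live[0] for ops in live)


# Two of four threads are active and both read the operands (3, 4).
active = [True, False, True, False]
operands = [(3, 4), (9, 9), (3, 4), (0, 0)]
use_scalar_lane = qualifies_for_scalar(active, operands)
```

When the predicate holds, a single scalar execution produces the value that every active lane would have computed, so the vector lanes are not needed for that instruction.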