Abstract:
The present disclosure relates to methods and apparatus for graphics processing. An example method generally includes receiving, at a graphics processing unit (GPU), a plurality of commands corresponding to a plurality of draws across a frame, each of the plurality of commands indicating a depth test direction with respect to a low-resolution depth (LRZ) buffer for the corresponding draw. The method generally includes maintaining, at the GPU, a LRZ status buffer to store a corresponding depth test direction for a first command in time of the plurality of commands processed by the GPU. The method generally includes disabling, at the GPU, use of the LRZ buffer for depth testing for any of the plurality of commands remaining unprocessed after processing a command of the plurality of commands having a different depth test direction than the corresponding depth test direction stored in the LRZ status buffer.
Abstract:
A graphics processing unit (GPU) may execute a shader program that may include instructions for prioritization and scheduling of waves processed in parallel. According to some aspects of the described techniques, instruction variants (e.g., set-lowest-priority, set-highest-priority, set-priority-to-N, etc.) may be executed by hardware during processing of a wave to control (e.g., modify) processing priority for that wave. As such, the described techniques for shader controlled wave scheduling priority may allow waves to be processed while avoiding interference with lagging waves, while avoiding taking resources from lagging waves, etc. In one example, when a set-lowest-priority instruction is executed by hardware during execution of a first loop of a first wave, the instruction may push the current wave's priority to be lowest on the list. Such may result in pending loops from other waves being processed prior to the processing returning to a second loop of the first wave.
Abstract:
A method for processing data in a graphics processing unit including receiving an indication that all threads of a warp in a graphics processing unit (GPU) are to execute a same branch in a first set of instructions, storing one or more predicate bits in a memory as a single set of predicate bits, wherein the single set of predicate bits applies to all of the threads in the warp, and executing a portion of the first set of instructions in accordance with the single set of predicate bits. Executing the first set of instructions may include executing the first set of instruction in accordance with the single set of predicate bits using a single instruction, multiple data (SIMD) processing core and/or executing the first set of instruction in accordance with the single set of predicate bits using a scalar processing unit.
Abstract:
Aspects of this disclosure relate to a process for rendering graphics that includes performing, with a hardware unit of a graphics processing unit (GPU) designated for vertex shading, a vertex shading operation to shade input vertices so as to output vertex shaded vertices, wherein the hardware unit adheres to an interface that receives a single vertex as an input and generates a single vertex as an output. The process also includes performing, with the hardware unit of the GPU designated for vertex shading, a hull shading operation to generate one or more control points based on one or more of the vertex shaded vertices, wherein the one or more hull shading operations operate on at least one of the one or more vertex shaded vertices to output the one or more control points.
Abstract:
Methods, systems, and devices for rendering are described. A device may divide a frame into a plurality of bins. The device may generate a command stream containing multiple repetitions of a fixed-stride draw table (FSDT), where each repetition of the FSDT includes a respective state vector for one or more hardware registers of a set of hardware registers. The device may identify, for each bin, a subset of the multiple repetitions of the FSDT in the command stream that include a live draw call. The device may execute, using the set of hardware registers, one or more rendering commands for each bin based at least in part on the corresponding subset of the multiple repetitions of the FSDT.
Abstract:
A graphics processing unit (GPU) may rasterize a primitive into a plurality of samples, wherein vertices of the primitive are associated with VRS parameters. The GPU may determine a VRS quality group that comprises one or more sub regions of the plurality of samples based at least in part on the VRS parameters. The GPU may fragment shade a VRS tile that represents the VRS quality group, wherein the VRS tile comprises fewer samples than the VRS quality group. The GPU may amplify the stored VRS tile into shaded fragments that correspond to the VRS quality group.
Abstract:
A method for processing data in a graphics processing unit including receiving a code block of instructions common to a plurality of groups of threads of a shader, executing the code block of instructions common to the plurality of groups of threads of the shader creating a result by a first group of threads of the plurality of groups of threads, storing the result of the code block of instructions common to the plurality of groups of threads of the shader in on-chip random access memory (RAM), the on-chip RAM accessible by each of the plurality of groups of threads, and upon a determination that storing the result of the code block of instructions common to the plurality of groups of threads of the shader has completed, returning the result of the code block of instructions common to the plurality of groups of threads of the shader from on-chip RAM.
Abstract:
This disclosure describes techniques for performing hierarchical z-culling in a graphics processing system. In some examples, the techniques for performing hierarchical z-culling may involve selectively merging partially-covered source tiles for a tile location into a fully-covered merged source tile based on whether conservative farthest z-values for the partially-covered source tiles are nearer than a culling z-value for the tile location, and using a conservative farthest z-value associated with the fully-covered merged source tile to update the culling z-value for the tile location. In further examples, the techniques for performing hierarchical z-culling may use a cache unit that is not associated with an underlying memory to store conservative farthest z-values and coverage masks for merged source tiles. The capacity of the cache unit may be smaller than the size of cache needed to store merged source tile data for all of the tile locations in a render target.
Abstract:
This disclosure describes techniques for performing hierarchical z-culling in a graphics processing system. In some examples, the techniques for performing hierarchical z-culling may involve selectively merging partially-covered source tiles for a tile location into a fully-covered merged source tile based on whether conservative farthest z-values for the partially-covered source tiles are nearer than a culling z-value for the tile location, and using a conservative farthest z-value associated with the fully-covered merged source tile to update the culling z-value for the tile location. In further examples, the techniques for performing hierarchical z-culling may use a cache unit that is not associated with an underlying memory to store conservative farthest z-values and coverage masks for merged source tiles. The capacity of the cache unit may be smaller than the size of cache needed to store merged source tile data for all of the tile locations in a render target.
Abstract:
This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for single pass anti-ringing clamping enabled image processing. A graphics processor may perform a filtering operation on a set of texture samples. The graphics processor may select, during a single sampling operation, a minimum value and a maximum value associated with the set of texture samples during the performance of the filtering operation on the set of texture samples. The graphics processor may adjust, during the single sampling operation, a value of a filtered texture sample associated with the set of texture samples based on the minimum value and the maximum value. The graphics processor may output an indication of the adjusted value of the filtered texture sample.