Abstract:
Techniques are described in which a device is configured to retrieve a metadata buffer for rendering a sub-frame of a set of sub-frames for a frame. A data block of a data buffer is configured to store image data for rendering the sub-frame. In response to determining, based on the metadata buffer for rendering the sub-frame, that the sub-frame includes a color pattern, fixed color value, or combination thereof, the device refrains from retrieving the image data from the data block of the data buffer and determines the image data for rendering the sub-frame based on the metadata buffer.
Abstract:
A graphics processing unit (GPU) may determine a workload of a fragment shader program that executes on the GPU. The GPU may compare the workload of the fragment shader program to a threshold. In response to determining that the workload of the fragment shader program is lower than a specified threshold, the fragment shader program may process one or more fragments without the GPU performing early depth testing of the one or more fragments before the processing by the fragment shader program. The GPU may perform, after processing by the fragment shader program, late depth testing of the one or more fragments to result in one or more non-occluded fragments. The GPU may write pixel values for the one or more non-occluded fragments into a frame buffer.
Abstract:
This disclosure presents techniques and structures for graphics processing. In one example, a method of graphics processing may include rendering, with a graphics processing unit (GPU), one or more portions of a frame using one or more graphics operations, and writing, with the GPU, color data directly to a color buffer in a system memory in accordance with the one or more graphics operations. The method may further include writing, with the GPU, depth data to a depth buffer in a graphics memory in accordance with the one or more graphics operations, and resolving, with the GPU, the depth buffer in the graphics memory to the system memory when the rendering of the one or more portions of the frame is complete.
Abstract:
This disclosure describes techniques for performing memory transfer operations with a graphics processing unit (GPU) based on a selectable memory transfer mode, and techniques for selecting a memory transfer mode for performing all or part of a memory transfer operation with a GPU. In some examples, the techniques of this disclosure may include selecting a memory transfer mode for performing at least part of a memory transfer operation, and performing, with a GPU, the memory transfer operation based on the selected memory transfer mode. The memory transfer mode may be selected from a set of at least two different memory transfer modes that includes an interleave memory transfer mode and a sequential memory transfer mode. The techniques of this disclosure may be used to improve the performance of GPU-assisted memory transfer operations.
Abstract:
This disclosure describes techniques for performing memory transfer operations with a graphics processing unit (GPU) based on a selectable memory transfer mode, and techniques for selecting a memory transfer mode for performing all or part of a memory transfer operation with a GPU. In some examples, the techniques of this disclosure may include selecting a memory transfer mode for performing at least part of a memory transfer operation, and performing, with a GPU, the memory transfer operation based on the selected memory transfer mode. The memory transfer mode may be selected from a set of at least two different memory transfer modes that includes an interleave memory transfer mode and a sequential memory transfer mode. The techniques of this disclosure may be used to improve the performance of GPU-assisted memory transfer operations.
Abstract:
A sliced graphics processing unit (GPU) architecture in processor-based devices is disclosed. In some aspects, a GPU based on a sliced GPU architecture includes multiple hardware slices. The GPU further includes a command processor (CP) circuit and an unslice primitive controller (PC_US). Upon receiving a graphics instruction from a central processing unit (CPU), the CP circuit determines a graphics workload, and transmits the graphics workload to the PC_US. The PC_US then partitions the graphics workload into multiple subbatches and distributes each subbatch to a PC_S of a hardware slice for processing.
Abstract:
The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may configure a portion of a GPU to include at least one depth processing block, the at least one depth processing block being associated with at least one depth buffer. The apparatus may also identify one or more depth passes of each of a plurality of graphics workloads, the plurality of graphics workloads being associated with a plurality of frames. Further, the apparatus may process each of the one or more depth passes in the portion of the GPU including the at least one depth processing block, each of the one or more depth passes being processed by the at least one depth processing block, the one or more depth passes being associated with the at least one depth buffer.
Abstract:
A cache memory may receive, from a client, a request for a long cache line of data. The cache memory may receive, from a memory, the requested long cache line of data. The cache memory may store the requested long cache line of data into a plurality of data stores across a plurality of memory banks as a plurality of short cache lines of data distributed across the plurality of data stores in the cache memory. The cache memory may also store a plurality of tags associated with the plurality of short cache lines of data into one of a plurality of tag stores in the plurality of memory banks.
Abstract:
A graphics processing unit (GPU) may perform a binning pass to determine primitive-tile intersections for a plurality of primitives and a plurality of tiles making up a graphical scene, including performing low-resolution z-culling of representations of the plurality of primitives based at least in part on a first set of culling z-values each having a first test size to determine a first set of visible primitives from the plurality of primitives. The GPU may further perform a rendering pass to render the plurality of tiles based at least in part on performing the low-resolution z-culling of representations of the first set of visible primitives based at least in part on a second set of culling z-values that represents a second test size to determine a second set of visible primitives from the first set of visible primitives, wherein the first test size is greater than the second test size.
Abstract:
Techniques are described for stochastic rasterization. A graphics processing unit (GPU) may discard samples of bounding polygons that together indicate movement of one or more primitives before a pixel shader process the samples. The GPU may leverage a stencil buffer and stencil test for discarding of such samples.