Abstract:
A graphics processing unit (GPU) may determine a workload of a fragment shader program that executes on the GPU. The GPU may compare the workload of the fragment shader program to a threshold. In response to determining that the workload of the fragment shader program is lower than a specified threshold, the fragment shader program may process one or more fragments without the GPU performing early depth testing of the one or more fragments before the processing by the fragment shader program. The GPU may perform, after processing by the fragment shader program, late depth testing of the one or more fragments to result in one or more non-occluded fragments. The GPU may write pixel values for the one or more non-occluded fragments into a frame buffer.
Abstract:
This disclosure describes techniques for performing hierarchical z-culling in a graphics processing system. In some examples, the techniques for performing hierarchical z-culling may involve selectively merging partially-covered source tiles for a tile location into a fully-covered merged source tile based on whether conservative farthest z-values for the partially-covered source tiles are nearer than a culling z-value for the tile location, and using a conservative farthest z-value associated with the fully-covered merged source tile to update the culling z-value for the tile location. In further examples, the techniques for performing hierarchical z-culling may use a cache unit that is not associated with an underlying memory to store conservative farthest z-values and coverage masks for merged source tiles. The capacity of the cache unit may be smaller than the size of cache needed to store merged source tile data for all of the tile locations in a render target.
Abstract:
This disclosure describes techniques for performing hierarchical z-culling in a graphics processing system. In some examples, the techniques for performing hierarchical z-culling may involve selectively merging partially-covered source tiles for a tile location into a fully-covered merged source tile based on whether conservative farthest z-values for the partially-covered source tiles are nearer than a culling z-value for the tile location, and using a conservative farthest z-value associated with the fully-covered merged source tile to update the culling z-value for the tile location. In further examples, the techniques for performing hierarchical z-culling may use a cache unit that is not associated with an underlying memory to store conservative farthest z-values and coverage masks for merged source tiles. The capacity of the cache unit may be smaller than the size of cache needed to store merged source tile data for all of the tile locations in a render target.
Abstract:
The present disclosure relates to methods and apparatus for graphics processing. Aspects of the present disclosure can determine a portion of a display area, where the portion of the display area is determined based on display content of the display area. Further, aspects of the present disclosure can communicate display information corresponding to the determined portion of the display area. Additionally, aspects of the present disclosure can update the display information corresponding to the determined portion of the display area. Aspects of the present disclosure can also communicate the updated display information corresponding to the determined portion of the display area. Aspects of the present disclosure can also render at least some display content of the display area corresponding to the determined portion of the display area. In some aspects, the updated display information can be based on the rendered display content of the display area.
Abstract:
A computing device may allocate a plurality of blocks in the memory, wherein each of the plurality of blocks is of a uniform fixed size in the memory. The computing device may further store a plurality of bandwidth-compressed graphics data into the respective plurality of blocks in the memory, wherein one or more of the plurality of bandwidth-compressed graphics data each has a size that is smaller than the fixed size. The computing device may further store data associated with the plurality of bandwidth-compressed graphics data into unused space of one or more of the plurality of blocks that contains the respective one or more of the plurality of bandwidth-compressed graphics data.
Abstract:
The described techniques provide for bin-based rendering where the scene geometry in a frame is subdivided into bins or tiles, and bins are resolved concurrently with the rendering of a next bin. For example, a graphics processing unit (GPU) may process an entire image and sort transactions (e.g., rasterized primitives, such as triangles) into bins. For the rendering of each transaction, a device may identify a memory address of a memory block (e.g., a unit or portion of internal GPU memory (GMEM)) the transaction will be written (i.e., rendered) to. The device may thus prepare the memory block for rendering (e.g., by performing a resolve operation, a clear operation, or an unresolve operation on the memory block), such that the memory block is prepared prior to rendering of the particular transaction. As such, transactions of a bin may be resolved concurrently with rendering of transactions of a next bin.
Abstract:
A graphics processing unit (GPU) may determine a workload of a fragment shader program that executes on the GPU. The GPU may compare the workload of the fragment shader program to a threshold. In response to determining that the workload of the fragment shader program is lower than a specified threshold, the fragment shader program may process one or more fragments without the GPU performing early depth testing of the one or more fragments before the processing by the fragment shader program. The GPU may perform, after processing by the fragment shader program, late depth testing of the one or more fragments to result in one or more non-occluded fragments. The GPU may write pixel values for the one or more non-occluded fragments into a frame buffer.
Abstract:
This disclosure describes techniques for performing memory transfer operations with a graphics processing unit (GPU) based on a selectable memory transfer mode, and techniques for selecting a memory transfer mode for performing all or part of a memory transfer operation with a GPU. In some examples, the techniques of this disclosure may include selecting a memory transfer mode for performing at least part of a memory transfer operation, and performing, with a GPU, the memory transfer operation based on the selected memory transfer mode. The memory transfer mode may be selected from a set of at least two different memory transfer modes that includes an interleave memory transfer mode and a sequential memory transfer mode. The techniques of this disclosure may be used to improve the performance of GPU-assisted memory transfer operations.
Abstract:
This disclosure describes techniques for performing memory transfer operations with a graphics processing unit (GPU) based on a selectable memory transfer mode, and techniques for selecting a memory transfer mode for performing all or part of a memory transfer operation with a GPU. In some examples, the techniques of this disclosure may include selecting a memory transfer mode for performing at least part of a memory transfer operation, and performing, with a GPU, the memory transfer operation based on the selected memory transfer mode. The memory transfer mode may be selected from a set of at least two different memory transfer modes that includes an interleave memory transfer mode and a sequential memory transfer mode. The techniques of this disclosure may be used to improve the performance of GPU-assisted memory transfer operations.