Abstract:
A method for hardware management of DMA transfer commands includes accessing, by a first DMA engine, a DMA transfer command and determining a first portion of a data transfer requested by the DMA transfer command. Transfer of the first portion of the data transfer by the first DMA engine is initiated based at least in part on the DMA transfer command. Similarly, transfer of a second portion of the data transfer by a second DMA engine is initiated based at least in part on the DMA transfer command. After the first portion and the second portion of the data transfer have been transferred, an indication is generated that signals completion of the data transfer requested by the DMA transfer command.
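The split-then-signal flow can be illustrated with a minimal Python sketch. It assumes a hypothetical `DmaTransferCommand` holding a source buffer, a destination buffer and a byte count, an even split of the transfer across engines, and software threads standing in for the DMA engines. The abstract does not say how the portions are chosen, so `split_portions` and `execute_command` are illustrative names and policies, not the patented hardware.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass
class DmaTransferCommand:
    # Hypothetical command layout: source buffer, destination buffer, byte count.
    src: bytearray
    dst: bytearray
    length: int


def split_portions(length: int, num_engines: int) -> list[tuple[int, int]]:
    """Split [0, length) into contiguous (offset, size) portions, one per engine.
    An even split is assumed; the real partitioning policy is not specified."""
    base, extra = divmod(length, num_engines)
    portions, offset = [], 0
    for i in range(num_engines):
        size = base + (1 if i < extra else 0)
        portions.append((offset, size))
        offset += size
    return portions


def execute_command(cmd: DmaTransferCommand, num_engines: int = 2) -> threading.Event:
    """Each hypothetical engine copies its own portion; the last engine to
    finish raises the single completion indication for the whole command."""
    complete = threading.Event()
    remaining = num_engines
    lock = threading.Lock()

    def engine(offset: int, size: int) -> None:
        nonlocal remaining
        cmd.dst[offset:offset + size] = cmd.src[offset:offset + size]
        with lock:
            remaining -= 1
            if remaining == 0:
                complete.set()  # signals completion of the entire transfer

    workers = [threading.Thread(target=engine, args=p)
               for p in split_portions(cmd.length, num_engines)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return complete
```

For example, `execute_command(DmaTransferCommand(bytearray(range(10)), bytearray(10), 10))` copies bytes 0 to 4 on one engine and 5 to 9 on the other. The shared counter means the completion indication fires only after every portion has landed, regardless of which engine finishes last.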
Abstract:
Devices and methods for rendering objects in a scene using ray tracing are provided. During a build time, they include: generating an accelerated hierarchy structure comprising data representing an approximate volume bounding a group of geometric shapes that represent the objects in the scene, together with data representing the geometric shapes; and generating additional data used to transform rays to be cast in the scene from a high-precision space to a low-precision space. During a render time occurring after the build time, they include: performing ray intersection tests for the rays in the scene using the additional data generated during the build time; and rendering the scene based on the ray intersection tests. Because the additional data is generated before render time, it can be used to perform the ray intersection tests more efficiently.
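The build-time/render-time split can be sketched in Python under several stated assumptions. The "additional data" is taken to be a per-node origin and inverse extent that map the bounding box onto the unit cube, and IEEE 754 binary16 (via `struct`'s `"e"` format) stands in for the low-precision space. `NodeTransform`, `build_transform` and `ray_hits_node` are hypothetical names. The rounding here is not conservative, so a production version would need to round outward or pad the box. `struct.pack` also raises `OverflowError` outside the binary16 range, so rays must start reasonably close to the box.

```python
from __future__ import annotations

import math
import struct
from dataclasses import dataclass


def to_half(x: float) -> float:
    """Round to IEEE 754 binary16, standing in for the low-precision space."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


@dataclass(frozen=True)
class NodeTransform:
    # Additional data computed once at build time for one bounding volume.
    origin: tuple[float, float, float]
    inv_extent: tuple[float, float, float]


def build_transform(box_min, box_max) -> NodeTransform:
    """Build time: precompute the mapping of [box_min, box_max] onto the unit cube."""
    if any(hi <= lo for lo, hi in zip(box_min, box_max)):
        raise ValueError("degenerate box")
    inv = tuple(1.0 / (hi - lo) for lo, hi in zip(box_min, box_max))
    return NodeTransform(tuple(box_min), inv)


def ray_hits_node(xf: NodeTransform, ray_o, ray_d, t_max: float = math.inf) -> bool:
    """Render time: transform the ray with the precomputed data, then run a
    slab test against the unit cube in low precision. Per-axis translation and
    scaling preserve the ray parameter t, so hits correspond one to one."""
    t_near, t_far = 0.0, t_max
    for a in range(3):
        o = to_half((ray_o[a] - xf.origin[a]) * xf.inv_extent[a])
        d = to_half(ray_d[a] * xf.inv_extent[a])
        if d == 0.0:
            if o < 0.0 or o > 1.0:
                return False  # parallel to this slab and outside it
            continue
        t0, t1 = -o / d, (1.0 - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True
```

Doing the division once per node at build time leaves only a subtract, a multiply and a narrow conversion per ray component at render time, which is the efficiency argument the abstract makes.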
Abstract:
A Streaming Wave Coalescer (SWC) circuit stores a first set of state values associated with a first subset of threads of a first wave in a bin based on each of the first subset of threads including a first set of instructions to be executed. A second set of state values associated with a second subset of threads of a second wave is stored in the bin based on each of the second subset of threads including the first set of instructions to be executed and based on the first wave and the second wave both being associated with a hard key. A third wave is formed from the threads of the first subset and the second subset and is emitted for execution. As a result of reorganizing the threads and reconstituting a different wave, thread divergence of waves sent for execution is reduced.
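The binning behaviour can be modelled in Python, assuming that a bin is keyed by the pair (hard key, next shader), where the next shader identifies the set of instructions a thread executes next, and that a bin is emitted as a new wave as soon as it holds a full wave's worth of threads. The abstract does not say when bins are flushed or how partially filled bins are handled. `ThreadState` and `StreamingWaveCoalescer` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadState:
    wave_id: int
    lane: int
    next_shader: str   # identifies the set of instructions the thread executes next
    state: tuple = ()  # hypothetical live state values carried with the thread


class StreamingWaveCoalescer:
    """Bins threads by (hard key, next shader) and emits a reconstituted wave
    whenever a bin holds wave_size threads."""

    def __init__(self, wave_size: int) -> None:
        self.wave_size = wave_size
        self.bins: dict[tuple[str, str], list[ThreadState]] = defaultdict(list)

    def submit(self, hard_key: str, threads: list[ThreadState]) -> list[list[ThreadState]]:
        emitted = []
        for t in threads:
            bin_ = self.bins[(hard_key, t.next_shader)]
            bin_.append(t)
            if len(bin_) == self.wave_size:
                emitted.append(list(bin_))  # every thread runs the same instructions
                bin_.clear()
        return emitted
```

With a wave size of 4, submitting a wave whose lanes want shaders `A, B, A, B` and then a wave wanting `A, A, B, B` under the same hard key emits two new waves, one all `A` and one all `B`. Neither emitted wave diverges, even though both inputs did.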
Abstract:
A technique for performing ray tracing operations is provided. For a ray being tested for intersection with geometry associated with a bounding volume hierarchy, the technique includes traversing to a pre-filtering node that includes information for filtering out triangles of a leaf node of the bounding volume hierarchy; evaluating a quantized ray that corresponds to the ray against quantized triangles of the pre-filtering node to filter out one or more triangles of the leaf node from consideration; and testing the triangles of the leaf node that are not filtered out while skipping the triangles that are filtered out.
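The filter-then-test structure can be illustrated in Python with a simpler, swapped-in filter. The abstract evaluates a quantized ray against quantized triangles; this sketch instead tests the full-precision ray against one conservatively quantized bounding box per triangle, where the box minimum is rounded down and the maximum rounded up on an assumed 8-bit grid spanning the node. It shows only the overall shape of pre-filtering, not the patented test. All function names are hypothetical, and floating-point rounding in the quantization arithmetic itself is ignored.

```python
from __future__ import annotations

import math


def quantize_box(lo, hi, node_lo, node_hi, bits: int = 8):
    """Snap a box onto a (2**bits - 1)-step grid over the node bounds,
    rounding min down and max up so the quantized box contains the original."""
    steps = (1 << bits) - 1
    qlo, qhi = [], []
    for a in range(3):
        scale = steps / (node_hi[a] - node_lo[a])
        qlo.append(max(0, math.floor((lo[a] - node_lo[a]) * scale)))
        qhi.append(min(steps, math.ceil((hi[a] - node_lo[a]) * scale)))
    return qlo, qhi


def dequantize(q, node_lo, node_hi, bits: int = 8):
    steps = (1 << bits) - 1
    return [node_lo[a] + q[a] * (node_hi[a] - node_lo[a]) / steps for a in range(3)]


def ray_box(o, d, lo, hi) -> bool:
    """Slab test for a ray starting at t = 0."""
    t_near, t_far = 0.0, math.inf
    for a in range(3):
        if d[a] == 0.0:
            if not lo[a] <= o[a] <= hi[a]:
                return False
            continue
        t0, t1 = (lo[a] - o[a]) / d[a], (hi[a] - o[a]) / d[a]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def build_prefilter(triangles, node_lo, node_hi):
    """Build time: one quantized box per triangle of the leaf node."""
    boxes = []
    for tri in triangles:
        lo = [min(v[a] for v in tri) for a in range(3)]
        hi = [max(v[a] for v in tri) for a in range(3)]
        boxes.append(quantize_box(lo, hi, node_lo, node_hi))
    return boxes


def surviving_triangles(o, d, boxes, node_lo, node_hi) -> list[int]:
    """Traversal time: indices of triangles the ray may hit. Only these go on
    to the full-precision ray/triangle test; the rest are skipped."""
    keep = []
    for i, (qlo, qhi) in enumerate(boxes):
        if ray_box(o, d, dequantize(qlo, node_lo, node_hi), dequantize(qhi, node_lo, node_hi)):
            keep.append(i)
    return keep
```

Because the quantized boxes only ever grow, a triangle the ray actually hits is never filtered out, apart from the ignored rounding noted above; the cost is some false positives that the full test then rejects.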
Abstract:
An apparatus and method for efficiently migrating the execution of threads between multiple parallel lanes of execution. In various implementations, a computing system includes multiple vector processing circuits of a compute circuit that executes multiple lanes of multiple waves. Each lane includes a key indicating a path of execution. When a lane of the multiple lanes of execution executes a stream wave coalescing (SWC) reorder instruction, a control circuit compares keys of waves that have previously executed the SWC reorder instruction. When the number of lanes with a matching key exceeds a threshold and after identifying at least this number of lanes to swap, the control circuit swaps continuation state information (live active state information) between lanes of an emitting wave that do not have a matching key and lanes of contributing waves that do have a matching key. The resulting (reordered) emitting wave executes more efficiently, which increases performance.
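The swap step can be modelled in Python, assuming each lane is a hypothetical `Lane` object whose `pc` field stands in for its continuation (live active) state, and that the target key is supplied by the caller. The abstract's two conditions are read as follows: the total number of matching lanes must exceed the threshold, and enough matching donor lanes must be found to fill every mismatched lane of the emitting wave. That reading is an interpretation, not a confirmed rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Lane:
    key: str   # path-of-execution key
    pc: int    # hypothetical stand-in for continuation (live active) state


def swc_reorder(emitting: list[Lane], contributors: list[list[Lane]],
                key: str, threshold: int) -> bool:
    """Swap continuation state so that every lane of the emitting wave carries `key`.

    Lane slots stay where they are; only the per-lane state objects move
    between waves. Returns True if a swap was performed."""
    matching = sum(lane.key == key for lane in emitting)
    donors = [(wave, i) for wave in contributors
              for i, lane in enumerate(wave) if lane.key == key]
    if matching + len(donors) <= threshold:
        return False  # not enough matching lanes to justify reordering
    misfits = [i for i, lane in enumerate(emitting) if lane.key != key]
    if len(donors) < len(misfits):
        return False  # cannot fill every mismatched lane of the emitting wave
    for i, (wave, j) in zip(misfits, donors):
        emitting[i], wave[j] = wave[j], emitting[i]
    return True
```

For an emitting wave keyed `A, B, A, C` and a contributing wave keyed `B, A, C, A`, a threshold of 3 triggers a swap that leaves the emitting wave all `A`, while the displaced `B` and `C` states move into the contributing wave.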
Abstract:
A technique for performing ray tracing operations is provided. The technique includes determining error bounds for a rotation operation for a ray; selecting a technique for determining whether the ray intersects a bounding box based on the error bounds; and determining whether the ray hits the bounding box based on the selected technique.
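The selection logic can be sketched in Python as follows. Each rotated ray component is treated as a length-3 dot product and bounded with the standard forward-error constant $\gamma_3 = 3u/(1-3u)$, where $u$ is the binary64 unit roundoff. A fixed `tolerance` then picks between a fast floating-point slab test and an exact test in rational arithmetic using `fractions.Fraction`. The fixed tolerance, the exact fallback and all function names are assumptions. A fuller version would compare the bound against how close the ray passes to the box edges, and the fast path's own slab arithmetic is not made robust here. The box is assumed to be given in the rotated (box-local) frame.

```python
from __future__ import annotations

import sys
from fractions import Fraction

U = sys.float_info.epsilon / 2  # unit roundoff of IEEE 754 binary64


def gamma(n: int) -> float:
    """Classic bound n*u / (1 - n*u) on accumulated relative rounding error."""
    return n * U / (1 - n * U)


def rotate(R, v):
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def rotation_error_bound(R, v) -> float:
    """|fl(r_i) - r_i| <= gamma(3) * sum_j |R_ij| |v_j| for each rotated component."""
    return max(gamma(3) * sum(abs(R[i][j]) * abs(v[j]) for j in range(3))
               for i in range(3))


def slab_hit(o, d, lo, hi) -> bool:
    """Slab test that works for both float and Fraction inputs (no infinities)."""
    t_near, t_far = 0, None
    for a in range(3):
        if d[a] == 0:
            if not lo[a] <= o[a] <= hi[a]:
                return False
            continue
        t0, t1 = (lo[a] - o[a]) / d[a], (hi[a] - o[a]) / d[a]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = t1 if t_far is None else min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def ray_hits_box(R, o, d, lo, hi, tolerance: float):
    """Select a test technique from the error bounds of rotating the ray by R."""
    err = max(rotation_error_bound(R, o), rotation_error_bound(R, d))
    if err <= tolerance:
        return "fast", slab_hit(rotate(R, o), rotate(R, d), lo, hi)

    def exact(v):
        return tuple(Fraction(x) for x in v)

    Rq = [exact(row) for row in R]
    return "exact", slab_hit(rotate(Rq, exact(o)), rotate(Rq, exact(d)), exact(lo), exact(hi))
```

`Fraction` converts each float exactly, so the fallback computes the rotation of the given inputs with no rounding at all, at a much higher cost; that cost is why the error bound is used to decide when it is needed.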
Abstract:
Apparatuses, computer readable mediums, and methods of building a k-dimensional tree (kd-tree) are disclosed. The method may include a first processor, for example a graphics processing unit (GPU), selecting a node to split in a depth-first manner. The method may include the GPU splitting a node, based on a split plane, into a left node and a right node. The GPU may assign the left (right) node to the GPU when the number of polygons associated with the left (right) node is above a threshold, and otherwise assign the left (right) node to a second processor, for example a central processing unit (CPU). The CPU may build the kd-tree in a depth-first manner. The GPU (CPU) may select a next node to split based on the last node assigned to the GPU (CPU) or by selecting a node that is currently in a local memory of the GPU (CPU).
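The GPU/CPU division of work can be mimicked in Python under the following assumptions. Polygons are reduced to centroid points, each split is a median split on an axis that cycles with depth, and two software stacks stand in for the GPU and the CPU. The GPU phase also runs before the CPU phase rather than concurrently. `KdNode`, `split_node` and `build_kdtree` are hypothetical names. Popping the most recently pushed node approximates "select a next node based on the last node assigned", since that node is the one most likely still in local memory.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class KdNode:
    polys: list            # polygon centroids as (x, y, z) tuples (simplification)
    depth: int
    axis: Optional[int] = None
    split: Optional[float] = None
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None
    built_by: str = ""     # "GPU", "CPU" or "leaf"


def split_node(node: KdNode) -> None:
    """Median split on an axis that cycles with depth (assumed split rule)."""
    axis = node.depth % 3
    ordered = sorted(node.polys, key=lambda p: p[axis])
    mid = len(ordered) // 2
    node.axis, node.split = axis, ordered[mid][axis]
    node.left = KdNode(ordered[:mid], node.depth + 1)
    node.right = KdNode(ordered[mid:], node.depth + 1)


def build_kdtree(polys, threshold: int, leaf_size: int = 1) -> KdNode:
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    stacks = {"GPU": [], "CPU": []}

    def assign(node: KdNode) -> None:
        if len(node.polys) <= leaf_size:
            node.built_by = "leaf"
        elif len(node.polys) > threshold:
            stacks["GPU"].append(node)   # large node: keep it on the GPU
        else:
            stacks["CPU"].append(node)   # small node: hand it off to the CPU

    root = KdNode(list(polys), 0)
    assign(root)
    # Sequential simplification: the GPU phase runs before the CPU phase,
    # whereas the described system would run them concurrently.
    for proc in ("GPU", "CPU"):
        stack = stacks[proc]
        while stack:
            node = stack.pop()           # LIFO: depth-first, most recent node first
            node.built_by = proc
            split_node(node)
            assign(node.right)
            assign(node.left)            # pushed last, so it is split next
    return root
```

With eight points and a threshold of 4, only the root is split on the "GPU"; its two four-point children fall at or below the threshold and are built depth-first on the "CPU".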