GPR OPTIMIZATION IN A GPU BASED ON A GPR RELEASE MECHANISM

    Publication number: US20230113415A1

    Publication date: 2023-04-13

    Application number: US18046901

    Filing date: 2022-10-14

    Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
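
The release step described in this abstract can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the patent text: each branch carries a hypothetical register requirement and a predicate over the shader constants, the initial allocation is the worst case over all branches, and the releasable count is whatever the remaining live branches cannot need. The names `Branch` and `releasable_gprs` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Branch:
    """Hypothetical model of one shader branch (not from the patent text)."""
    name: str
    gprs_needed: int                             # registers the branch would use if taken
    is_taken: Callable[[Dict[str, int]], bool]   # predicate over shader constants


def releasable_gprs(branches: List[Branch], constants: Dict[str, int], allocated: int) -> int:
    """Return how many of the `allocated` GPRs no live branch can need."""
    live = [b for b in branches if b.is_taken(constants)]
    peak = max((b.gprs_needed for b in live), default=0)
    return max(allocated - peak, 0)


branches = [
    Branch("simple_lighting", 12, lambda c: c["quality"] == 0),
    Branch("complex_lighting", 40, lambda c: c["quality"] == 1),
]
allocated = max(b.gprs_needed for b in branches)  # worst-case allocation: 40
freed = releasable_gprs(branches, {"quality": 0}, allocated)
print(freed)  # 28
```

With `quality` fixed at 0, the 40-register branch is unutilized, so 28 of the 40 allocated registers could be handed back for subsequent threads in this toy model.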

    METHODS AND APPARATUS FOR MAPPING SOURCE LOCATION FOR INPUT DATA TO A GRAPHICS PROCESSING UNIT

    Publication number: US20220036498A1

    Publication date: 2022-02-03

    Application number: US16984024

    Filing date: 2020-08-03

    Abstract: The present disclosure relates to methods and apparatus for mapping a source location of input data for processing by a graphics processing unit. The apparatus can configure a processing element of the graphics processing unit with a predefined rule for decoding a data source parameter for executing a task by the graphics processing unit. Moreover, the apparatus can store the parameter in local storage of the processing element and configure the processing element to decode the parameter according to the at least one predefined rule to determine a source location of the input data and at least one relationship between invocations of the task. The apparatus can also load, to the local storage of the processing element, the input data from a plurality of memory addresses of the source location determined by the parameter. At least one logic unit can then execute the task on the loaded input data.

    CONCURRENT BINNING AND RENDERING
    Invention application

    Publication number: US20200020067A1

    Publication date: 2020-01-16

    Application number: US16035372

    Filing date: 2018-07-13

    Abstract: A method, an apparatus, and a computer-readable medium may be configured to perform a binning pass for a first frame. The apparatus may be configured to perform a rendering pass for the first frame in parallel with the binning pass. The apparatus may be configured to enhance efficiency in performing a binning pass and a rendering pass for tile-based rendering, such that the binning pass and rendering pass are performed concurrently. The apparatus may be configured to perform the binning pass using a first hardware pipeline, and may be configured to perform the rendering pass using a second hardware pipeline.

    ACCELERATED BOUNDING VOLUME HIERARCHY (BVH) TRAVERSAL FOR RAY TRACING

    Publication number: US20250166287A1

    Publication date: 2025-05-22

    Application number: US19031051

    Filing date: 2025-01-17

    Abstract: Systems and techniques are provided for accelerated ray tracing. For instance, a process can include obtaining a hierarchical acceleration data structure that includes a plurality of primitives of a scene object and obtaining a respective information value associated with each primitive included in the plurality of primitives. A sort order can be determined for two or more nodes included in a same level of the hierarchical acceleration data structure at least in part by sorting the two or more nodes based on a respective sorting parameter value determined for each respective node of the two or more nodes. Each respective sorting parameter value can be determined based on at least one information value associated with one or more primitives included in a sub-tree of each respective node of the two or more nodes. The hierarchical acceleration data structure can be traversed using the sort order.
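
The sorted traversal described in this abstract can be sketched as follows. The abstract does not say how the sorting parameter is computed, so this example assumes it is the sum of the information values of all primitives in a node's sub-tree and that siblings are visited in descending order. `Node`, `sort_param`, and `traverse` are hypothetical names, and the information values are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical BVH node: leaves hold per-primitive information values."""
    name: str
    info_values: List[float] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def sort_param(node: Node) -> float:
    """Assumed sorting parameter: sum of information values in the node's sub-tree."""
    return sum(node.info_values) + sum(sort_param(c) for c in node.children)


def traverse(node: Node) -> List[str]:
    """Depth-first traversal visiting siblings in descending sort_param order."""
    order = [node.name]
    for child in sorted(node.children, key=sort_param, reverse=True):
        order.extend(traverse(child))
    return order


root = Node("root", children=[
    Node("A", children=[Node("A0", [0.1]), Node("A1", [0.2])]),
    Node("B", children=[Node("B0", [0.9]), Node("B1", [0.5])]),
])
visit_order = traverse(root)
print(visit_order)  # ['root', 'B', 'B0', 'B1', 'A', 'A1', 'A0']
```

A real traversal would also test rays against bounding volumes; the sketch isolates only the sibling ordering step.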

    RUNTIME MECHANISM TO OPTIMIZE SHADER EXECUTION FLOW

    Publication number: US20240046543A1

    Publication date: 2024-02-08

    Application number: US17817815

    Filing date: 2022-08-05

    CPC classification number: G06T15/005 G06T15/80

    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.
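
The two-iteration flow in this abstract can be illustrated with a small sketch. It assumes, purely for illustration, that a predication value is set when an operation produced a nonzero result in the first iteration, and that the second iteration skips operations whose predicate is unset. The operation names and helper functions are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Op = Tuple[str, Callable[[float], float]]


def first_iteration(ops: List[Op], x: float) -> Dict[str, bool]:
    """Run every op once and record a predication value per op.

    Assumption: an op whose result is zero is treated as skippable.
    """
    return {name: fn(x) != 0.0 for name, fn in ops}


def second_iteration(ops: List[Op], x: float, predicates: Dict[str, bool]) -> List[str]:
    """Execute only ops whose predicate is set; return the names that ran."""
    executed = []
    for name, fn in ops:
        if predicates[name]:
            fn(x)
            executed.append(name)
    return executed


ops: List[Op] = [
    ("diffuse", lambda x: x * 0.8),
    ("fog", lambda x: x * 0.0),        # contributes nothing for this workload
    ("specular", lambda x: x + 0.1),
]
preds = first_iteration(ops, 1.0)
ran = second_iteration(ops, 1.0, preds)
print(ran)  # ['diffuse', 'specular']
```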

    DYNAMIC WAVE PAIRING
    Invention publication

    Publication number: US20230267567A1

    Publication date: 2023-08-24

    Application number: US17652478

    Filing date: 2022-02-24

    CPC classification number: G06T1/20 G06F9/505

    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.

    FOVEATED BINNED RENDERING ASSOCIATED WITH SAMPLE SPACES

    Publication number: US20230092394A1

    Publication date: 2023-03-23

    Application number: US17478694

    Filing date: 2021-09-17

    Abstract: Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a plurality of primitives associated with one or more frames in a scene, a portion of the scene being associated with an upscaled sample space and/or a downscaled sample space. The apparatus may also perform a binning pass for the plurality of primitives, the binning pass being associated with an unscaled sample space, where the binning pass sorts each of the primitives into one or more bins associated with each of the one or more frames. Further, the apparatus may perform one of one or more rendering passes for each of the one or more bins. The apparatus may also rasterize each of the plurality of primitives based on at least one of the upscaled sample space or the downscaled sample space.

    OPTIMIZATION OF DEPTH AND SHADOW PASS RENDERING IN TILE BASED ARCHITECTURES

    Publication number: US20230017522A1

    Publication date: 2023-01-19

    Application number: US17373704

    Filing date: 2021-07-12

    Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may configure a portion of a GPU to include at least one depth processing block, the at least one depth processing block being associated with at least one depth buffer. The apparatus may also identify one or more depth passes of each of a plurality of graphics workloads, the plurality of graphics workloads being associated with a plurality of frames. Further, the apparatus may process each of the one or more depth passes in the portion of the GPU including the at least one depth processing block, each of the one or more depth passes being processed by the at least one depth processing block, the one or more depth passes being associated with the at least one depth buffer.

    METHODS AND APPARATUS FOR TENSOR OBJECT SUPPORT IN MACHINE LEARNING WORKLOADS

    Publication number: US20220253969A1

    Publication date: 2022-08-11

    Application number: US17173643

    Filing date: 2021-02-11

    Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may modify at least one texture memory object to support a data structure for one or more tensor objects. The apparatus may also determine one or more supported memory layouts for the one or more tensor objects based on the modified at least one texture memory object. Additionally, the apparatus may access data associated with the one or more tensor objects based on the one or more supported memory layouts, the data for each of the one or more tensor objects corresponding to at least one data instruction. The apparatus may also execute the at least one data instruction based on the accessed data associated with the one or more tensor objects.
