GPR OPTIMIZATION IN A GPU BASED ON A GPR RELEASE MECHANISM

    Publication No.: US20210358076A1

    Publication date: 2021-11-18

    Application No.: US16877367

    Filing date: 2020-05-18

    IPC classification: G06T1/60 G06T1/20 G06F9/30

    Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
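
    Illustrative sketch (not taken from the patent): the abstract describes finding branches that the shader's constants rule out, then working out how many previously allocated GPRs can be released. The Python below models only that arithmetic. The `Branch` record, the constant name `USE_COMPLEX`, the register counts, and the rule that branches are mutually exclusive (so peak demand is the base count plus the largest live branch) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Branch:
    name: str
    constant: str      # hypothetical: constant that guards this branch
    taken_when: bool   # the branch runs only when the constant has this value
    gprs_needed: int   # registers the branch needs while it executes


def releasable_gprs(allocated: int, base_gprs: int,
                    branches: List[Branch], constants: Dict[str, bool]) -> int:
    """Return how many of the `allocated` GPRs are never needed.

    Assumes branches are mutually exclusive, so the peak demand is the
    base register count plus the largest branch that can still execute.
    """
    live = [b for b in branches if constants[b.constant] == b.taken_when]
    peak = base_gprs + max((b.gprs_needed for b in live), default=0)
    return max(allocated - peak, 0)


branches = [
    Branch("simple", "USE_COMPLEX", False, 4),
    Branch("complex", "USE_COMPLEX", True, 24),
]
# Worst-case allocation covers the complex branch: 8 base + 24 = 32 GPRs.
releasable = releasable_gprs(32, 8, branches, {"USE_COMPLEX": False})
```

    With `USE_COMPLEX` set to false, only the simple branch can run, so under these assumptions 20 of the 32 registers could be released for later threads in the draw call.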

    RUN-TIME MECHANISM FOR OPTIMAL SHADER
    Invention publication

    Publication No.: US20230377240A1

    Publication date: 2023-11-23

    Application No.: US17664033

    Filing date: 2022-05-18

    IPC classification: G06T15/00 G06T1/60

    CPC classification: G06T15/005 G06T1/60

    Abstract: Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a set of draw call instructions corresponding to a graphics workload, where the set of draw call instructions is associated with at least one run-time parameter. The apparatus may also obtain a first shader program associated with storing data in a system memory and at least one second shader program associated with storing data in a constant memory. Further, the apparatus may execute the first shader program or the at least one second shader program based on whether the at least one run-time parameter is less than or equal to a size of the constant memory. The apparatus may also update or maintain a configuration of a shader processor or a streaming processor based on executing the first shader program or the at least one second shader program.
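
    Illustrative sketch (not taken from the patent): the selection step in the abstract compares a run-time parameter against the size of constant memory. The Python below shows one plausible reading, in which the constant-memory variant is chosen when the data fits. The function name, the variant labels, and the byte sizes are hypothetical; the abstract does not state which variant corresponds to which outcome.

```python
def select_shader(runtime_param_size: int, constant_memory_size: int) -> str:
    """Pick a shader variant for a draw call.

    Assumption: the run-time parameter is a data size in bytes, and the
    constant-memory variant is used when that data fits in constant memory.
    """
    if runtime_param_size <= constant_memory_size:
        return "constant_memory_shader"
    return "system_memory_shader"


small_draw = select_shader(256, 1024)    # fits in constant memory
large_draw = select_shader(4096, 1024)   # falls back to system memory
```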

    METHODS AND APPARATUS FOR WAVE SLOT RETIREMENT PROCEDURES

    Publication No.: US20220357983A1

    Publication date: 2022-11-10

    Application No.: US17315205

    Filing date: 2021-05-07

    IPC classification: G06F9/48 G06F12/0875 G06T1/20

    Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a plurality of workloads based on a workload order, each of the plurality of workloads being received in the workload order including at least a first workload and a second workload. The apparatus may also allocate one or more workloads of the plurality of workloads to one or more wave slots. Additionally, the apparatus may execute the one or more allocated workloads at the one or more wave slots, such that at least the first workload is executed at the first wave slot and the second workload is executed at the second wave slot. The apparatus may also allocate at least one other workload of the plurality of workloads to at least one previously-allocated wave slot of the one or more wave slots.

    RASTERIZATION OF COMPUTE WORKLOADS
    Invention publication

    Publication No.: US20230394738A1

    Publication date: 2023-12-07

    Application No.: US18035507

    Filing date: 2020-11-09

    IPC classification: G06T15/00

    CPC classification: G06T15/005

    Abstract: The present disclosure relates to methods and apparatus for graphics processing, e.g., a GPU. The apparatus may receive an image including a plurality of pixels associated with one or more workgroups and one or more pixel tiles, each of the workgroups and the pixel tiles including one or more pixels of the plurality of pixels. The apparatus may determine whether the one or more workgroups are misaligned with the one or more pixel tiles. The apparatus may determine a conversion order of the one or more workgroups when the one or more workgroups are misaligned with the one or more pixel tiles, the conversion order corresponding to a common multiple of one of the one or more workgroups and one of the one or more pixel tiles. The apparatus may convert each of the one or more workgroups based on the conversion order of the one or more workgroups.
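
    Illustrative sketch (not taken from the patent): the abstract ties the conversion order to a common multiple of a workgroup and a pixel tile. The Python below shows the underlying arithmetic only. It assumes, hypothetically, that sizes are (width, height) pairs in pixels, that a workgroup is misaligned when the tile size is not a whole multiple of it along some axis, and that the least common multiple per axis is the region of interest.

```python
from math import gcd
from typing import Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def is_misaligned(workgroup: Size, tile: Size) -> bool:
    """True if a tile cannot be covered by a whole number of workgroups."""
    return any(t % w != 0 for w, t in zip(workgroup, tile))


def common_region(workgroup: Size, tile: Size) -> Size:
    """Smallest region, per axis, that both workgroups and tiles divide evenly."""
    return tuple(w * t // gcd(w, t) for w, t in zip(workgroup, tile))


misaligned = is_misaligned((24, 8), (16, 16))
region = common_region((24, 8), (16, 16))
```

    For a 24x8 workgroup and a 16x16 tile, the region under these assumptions is 48x16 pixels: two workgroups wide and two high, or three tiles wide and one high.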

    GPR OPTIMIZATION IN A GPU BASED ON A GPR RELEASE MECHANISM

    Publication No.: US20230113415A1

    Publication date: 2023-04-13

    Application No.: US18046901

    Filing date: 2022-10-14

    IPC classification: G06T1/60 G06F9/30 G06T1/20

    Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.

    RUNTIME MECHANISM TO OPTIMIZE SHADER EXECUTION FLOW

    Publication No.: US20240046543A1

    Publication date: 2024-02-08

    Application No.: US17817815

    Filing date: 2022-08-05

    IPC classification: G06T15/00 G06T15/80

    CPC classification: G06T15/005 G06T15/80

    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.
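
    Illustrative sketch (not taken from the patent): the abstract describes two iterations, one that configures predication values from the graphics data and one that uses them to run or skip each shader operation. The Python below mimics that flow on the CPU. The operation table, the `fog` and `lighting` names, and the frame fields are hypothetical stand-ins, not the patent's data structures.

```python
from typing import Callable, Dict, List, Tuple

# Each operation pairs a test on the data with the work to perform.
Op = Tuple[Callable[[dict], bool], Callable[[dict], None]]


def configure_predicates(ops: Dict[str, Op], data: dict) -> Dict[str, bool]:
    """First iteration: record whether each operation is needed for this data."""
    return {name: bool(test(data)) for name, (test, _) in ops.items()}


def run_with_predicates(ops: Dict[str, Op], predicates: Dict[str, bool],
                        data: dict) -> List[str]:
    """Second iteration: execute only operations whose predicate is set."""
    executed = []
    for name, (_, action) in ops.items():
        if predicates.get(name, True):  # unknown operations run by default
            action(data)
            executed.append(name)
    return executed


log: List[str] = []
ops: Dict[str, Op] = {
    "fog": (lambda d: d["fog_density"] > 0.0, lambda d: log.append("fog")),
    "lighting": (lambda d: d["light_count"] > 0, lambda d: log.append("lighting")),
}
frame = {"fog_density": 0.0, "light_count": 2}
predicates = configure_predicates(ops, frame)
executed = run_with_predicates(ops, predicates, frame)
```

    In this example the fog operation is predicated off because the frame has zero fog density, so only the lighting operation runs in the second pass.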

    DYNAMIC WAVE PAIRING
    Invention publication

    Publication No.: US20230267567A1

    Publication date: 2023-08-24

    Application No.: US17652478

    Filing date: 2022-02-24

    IPC classification: G06T1/20 G06F9/50

    CPC classification: G06T1/20 G06F9/505

    Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.