-
公开(公告)号:US20230290034A1
公开(公告)日:2023-09-14
申请号:US18317825
申请日:2023-05-15
Applicant: QUALCOMM Incorporated
Inventor: Thomas Edwin FRISINGER , Richard HAMMERSTONE , Andrew Evan GRUBER , Gang ZHONG , Yun DU , Jonnala Gadda NAGENDRA KUMAR
CPC classification number: G06T15/005 , G06F9/30101 , G06F9/30123 , G06T1/20 , G06T1/60 , G06T15/80
Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for fast incremental shared constants. In aspects, a CPU may determine/update shared constant data for a first draw call of a plurality of draw calls. The shared constant data, which may correspond to at least one shader, may be updated based on a draw call update for the first draw call. The CPU may communicate the updated shared constant data for the first draw call to a GPU. The GPU may receive, in at least one register, the updated shared constant data from the CPU and configure the at least one register based on the updated shared constant data corresponding to the draw call update of the first draw call of the plurality of draw calls.
-
公开(公告)号:US20210358076A1
公开(公告)日:2021-11-18
申请号:US16877367
申请日:2020-05-18
Applicant: QUALCOMM Incorporated
Inventor: Andrew Evan GRUBER , Yun DU
Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
-
公开(公告)号:US20230377240A1
公开(公告)日:2023-11-23
申请号:US17664033
申请日:2022-05-18
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Eric DEMERS , Andrew Evan GRUBER , Chun YU , Chihong ZHANG , Baoguang YANG , Yuehai DU , Gang ZHONG , Avinash SEETHARAMAIAH , Jonnala Gadda NAGENDRA KUMAR
CPC classification number: G06T15/005 , G06T1/60
Abstract: Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a set of draw call instructions corresponding to a graphics workload, where the set of draw call instructions is associated with at least one run-time parameter. The apparatus may also obtain a first shader program associated with storing data in a system memory and at least one second shader program associated with storing data in a constant memory. Further, the apparatus may execute the first shader program or the at least one second shader program based on whether the at least one run-time parameter is less than or equal to a size of the constant memory. The apparatus may also update or maintain a configuration of a shader processor or a streaming processor based on executing the first shader program or the at least one second shader program.
-
公开(公告)号:US20230019763A1
公开(公告)日:2023-01-19
申请号:US17758219
申请日:2020-01-31
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Andrew Evan GRUBER , Chun YU , Chihong ZHANG , Thomas Edwin FRISINGER , Richard HAMMERSTONE , Zilin YING , Heng QI , Quanquan XU , Sheng GU
IPC: G06T1/60
Abstract: The present disclosure relates to methods and apparatus for graphics processing. For example, disclosed techniques facilitate improving bindless state processing at a graphics processor. Aspects of the present disclosure can receive, at a graphics processor, a shader program including a preamble section and a main instructions section. Aspects of the present disclosure can also execute, with a scalar processor dedicated to processing preamble sections, instructions of the preamble section to implement a bindless mechanism for loading constant data associated with the shader program. Additionally, aspects of the present disclosure can distribute the main instructions section and the constant data to a streaming processor for executing the shader program.
-
公开(公告)号:US20220357983A1
公开(公告)日:2022-11-10
申请号:US17315205
申请日:2021-05-07
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Andrew Evan GRUBER , Zilin YING , Gang ZHONG , Baoguang YANG , Yang YU , Yang XIA , Ravindra KUMAR , Chun YU , Eric DEMERS
IPC: G06F9/48 , G06F12/0875 , G06T1/20
Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a plurality of workloads based on a workload order, each of the plurality of workloads being received in the workload order including at least a first workload and a second workload. The apparatus may also allocate one or more workloads of the plurality of workloads to one or more wave slots. Additionally, the apparatus may execute the one or more allocated workloads at the one or more wave slots, such that at least the first workload is executed at the first wave slot and the second workload is executed at the second wave slot. The apparatus may also allocate at least one other workload of the plurality of workloads to at least one previously-allocated wave slot of the one or more wave slots.
-
公开(公告)号:US20240046543A1
公开(公告)日:2024-02-08
申请号:US17817815
申请日:2022-08-05
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Eric DEMERS , Andrew Evan GRUBER , Chun YU , Baoguang YANG , Chihong ZHANG , Yuehai DU , Avinash SEETHARAMAIAH , Jonnala Gadda NAGENDRA KUMAR , Gang ZHONG , Zilin YING , Fei WEI
CPC classification number: G06T15/005 , G06T15/80
Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.
-
公开(公告)号:US20230267567A1
公开(公告)日:2023-08-24
申请号:US17652478
申请日:2022-02-24
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Andrew Evan GRUBER , Zilin YING , Chunling HU , Baoguang YANG , Yang XIA , Gang ZHONG , Chun YU , Eric DEMERS
Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
-
公开(公告)号:US20220139021A1
公开(公告)日:2022-05-05
申请号:US17085272
申请日:2020-10-30
Applicant: QUALCOMM Incorporated
Inventor: Thomas Edwin FRISINGER , Richard HAMMERSTONE , Andrew Evan GRUBER , Gang ZHONG , Yun DU , Jonnala Gadda NAGENDRA KUMAR
Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for fast incremental shared constants. In aspects, a CPU may determine/update shared constant data for a first draw call of a plurality of draw calls. The shared constant data, which may correspond to at least one shader, may be updated based on a draw call update for the first draw call. The CPU may communicate the updated shared constant data for the first draw call to a GPU. The GPU may receive, in at least one register, the updated shared constant data from the CPU and configure the at least one register based on the updated shared constant data corresponding to the draw call update of the first draw call of the plurality of draw calls.
-
公开(公告)号:US20200311859A1
公开(公告)日:2020-10-01
申请号:US16368782
申请日:2019-03-28
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Nigel POOLE , Zilin YING , Ling Feng HUANG , Donghyun KIM , Chun YU , Tzun-Wei LEE , Xuefeng TANG , Shambhoo KHANDELWAL , Hongjiang SHANG , Elina KAMENETSKAYA , Zhu LIANG , Cary ROBINS
Abstract: The present disclosure relates to methods and apparatus for graphics processing. In some aspects, multiple processing units can be in a graphics processing pipeline of a GPU. The apparatus can also group the multiple processing units into one or more processing unit clusters. In some aspects, each of the one or more processing unit clusters can correspond to one or more context registers. Additionally, the apparatus can determine one or more context states of the one or more context registers in each of the one or more processing unit clusters. Also, the apparatus can implement one or more execution counters corresponding to at least one of the one or more processing unit clusters in the graphics processing pipeline, where each of the one or more execution counters includes an execution value.
-
公开(公告)号:US20240037183A1
公开(公告)日:2024-02-01
申请号:US18487918
申请日:2023-10-16
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Gang ZHONG , Fei WEI , Yibin ZHANG , Jing HAN , Hongjiang SHANG , Elina KAMENETSKAYA , Minjie HUANG , Alexei Vladimirovich BOURD , Chun YU , Andrew Evan GRUBER , Eric DEMERS
Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
-
-
-
-
-
-
-
-
-