-
Publication number: US20230113415A1
Publication date: 2023-04-13
Application number: US18046901
Filing date: 2022-10-14
Applicant: QUALCOMM Incorporated
Inventor: Andrew Evan GRUBER , Yun DU
Abstract: This disclosure provides systems, devices, apparatus and methods, including computer programs encoded on storage media, for GPR optimization in a GPU based on a GPR release mechanism. More specifically, a GPU may determine at least one unutilized branch within an executable shader based on constants defined for the executable shader. Based on the at least one unutilized branch, the GPU may further determine a number of GPRs that can be deallocated from previously allocated GPRs. The GPU may deallocate, for a subsequent thread within a draw call, the number of GPRs from the previously allocated GPRs during execution of the executable shader based on the determined number of GPRs to be deallocated.
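The GPR-release idea in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Branch` type, the `gprs_to_release` function, the `quality` constant, and the rule that the allocation is sized for the most demanding branch are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Branch:
    """One branch of a hypothetical shader and the GPRs it needs."""
    name: str
    gprs_needed: int
    # Predicate over the shader constants; True means the branch can execute.
    is_taken: Callable[[Dict[str, int]], bool]


def gprs_to_release(branches: List[Branch], constants: Dict[str, int]) -> int:
    """Return how many previously allocated GPRs could be released."""
    # Assumed worst-case allocation: sized for the most demanding branch.
    allocated = max(b.gprs_needed for b in branches)
    # Branches that the constants of this draw call can actually reach.
    utilized = [b for b in branches if b.is_taken(constants)]
    required = max((b.gprs_needed for b in utilized), default=0)
    return allocated - required


# Hypothetical shader with two branches selected by a "quality" constant.
branches = [
    Branch("simple_lighting", 8, lambda c: c["quality"] == 0),
    Branch("complex_lighting", 24, lambda c: c["quality"] == 1),
]
released = gprs_to_release(branches, {"quality": 0})
print(released)
```

With `quality` set to 0, the 24-GPR branch is unutilized, so subsequent threads of the draw call would only need the 8 GPRs of the remaining branch.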
-
Publication number: US20220327654A1
Publication date: 2022-10-13
Application number: US17229697
Filing date: 2021-04-13
Applicant: QUALCOMM Incorporated
Inventor: Vishwanath Shashikant NIKAM , Kalyan Kumar BHIRAVABHATLA , Suvam CHATTERJEE , Siva Satyanarayana KOLA , Abhishek LAL , Andrew Evan GRUBER
Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a plurality of indices for each of a plurality of primitives. The apparatus may also determine a size of each of a plurality of primitive batches, each of the plurality of primitive batches including at least one primitive of the plurality of primitives. Additionally, the apparatus may divide, based on the determined size of each of the plurality of primitive batches, the plurality of primitives into the plurality of primitive batches. The apparatus may also distribute each of the plurality of primitive batches to each of a plurality of geometry slices, each of the plurality of geometry slices including one or more primitives of the plurality of primitives.
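The batching and distribution steps described in this abstract might look roughly like the sketch below. The abstract does not say how batch sizes are chosen or how batches are assigned to geometry slices, so the caller-supplied sizes and the round-robin assignment are assumptions, as are the function names.

```python
from typing import List, Sequence


def split_into_batches(primitives: Sequence[int],
                       batch_sizes: Sequence[int]) -> List[List[int]]:
    """Divide primitives into consecutive batches of the given sizes."""
    if any(size < 1 for size in batch_sizes):
        raise ValueError("each batch must contain at least one primitive")
    if sum(batch_sizes) != len(primitives):
        raise ValueError("batch sizes must cover every primitive exactly once")
    batches = []
    start = 0
    for size in batch_sizes:
        batches.append(list(primitives[start:start + size]))
        start += size
    return batches


def distribute_round_robin(batches: List[List[int]],
                           num_slices: int) -> List[List[List[int]]]:
    """Assign batches to geometry slices in round-robin order (assumed policy)."""
    slices: List[List[List[int]]] = [[] for _ in range(num_slices)]
    for i, batch in enumerate(batches):
        slices[i % num_slices].append(batch)
    return slices


primitives = list(range(7))          # primitive IDs 0..6
batches = split_into_batches(primitives, [3, 2, 2])
slices = distribute_round_robin(batches, num_slices=2)
print(slices)
```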
-
Publication number: US20220036498A1
Publication date: 2022-02-03
Application number: US16984024
Filing date: 2020-08-03
Applicant: QUALCOMM Incorporated
Inventor: Liang LI , Elina KAMENETSKAYA , Andrew Evan GRUBER
Abstract: The present disclosure relates to methods and apparatus for mapping a source location of input data for processing by a graphics processing unit. The apparatus can configure a processing element of the graphics processing unit with at least one predefined rule for decoding a data source parameter for executing a task by the graphics processing unit. Moreover, the apparatus can store the parameter in local storage of the processing element and configure the processing element to decode the parameter according to the at least one predefined rule to determine a source location of the input data and at least one relationship between invocations of the task. The apparatus can also load, to the local storage of the processing element, the input data from a plurality of memory addresses of the source location determined by the parameter. At least one logic unit can then execute the task on the loaded input data.
-
Publication number: US20200020067A1
Publication date: 2020-01-16
Application number: US16035372
Filing date: 2018-07-13
Applicant: QUALCOMM Incorporated
Inventor: Jian LIANG , Tao WANG , Chun YU , Andrew Evan GRUBER , Donghyun KIM , Nigel POOLE , Tzun-Wei LEE , Shambhoo KHANDELWAL
Abstract: A method, an apparatus, and a computer-readable medium may be configured to perform a binning pass for a first frame. The apparatus may be configured to perform a rendering pass for the first frame in parallel with the binning pass. The apparatus may be configured to enhance efficiency in performing a binning pass and a rendering pass for tile-based rendering, such that the binning pass and rendering pass are performed concurrently. The apparatus may be configured to perform the binning pass using a first hardware pipeline, and may be configured to perform the rendering pass using a second hardware pipeline.
-
Publication number: US20250166287A1
Publication date: 2025-05-22
Application number: US19031051
Filing date: 2025-01-17
Applicant: QUALCOMM Incorporated
Inventor: Piyush GUPTA , Pavan Kumar AKKARAJU , Alexei Vladimirovich BOURD , Andrew Evan GRUBER
Abstract: Systems and techniques are provided for accelerated ray tracing. For instance, a process can include obtaining a hierarchical acceleration data structure that includes a plurality of primitives of a scene object and obtaining a respective information value associated with each primitive included in the plurality of primitives. A sort order can be determined for two or more nodes included in a same level of the hierarchical acceleration data structure at least in part by sorting the two or more nodes based on a respective sorting parameter value determined for each respective node of the two or more nodes. Each respective sorting parameter value can be determined based on at least one information value associated with one or more primitives included in a sub-tree of each respective node of the two or more nodes. The hierarchical acceleration data structure can be traversed using the sort order.
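The sibling-sorting step in this abstract can be sketched as follows. The `BVHNode` type is hypothetical, and using the sum of the information values in a node's sub-tree as its sorting parameter, with larger values visited first, is an assumption; the abstract only states that the parameter is based on at least one such value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BVHNode:
    name: str
    # Information values of primitives stored directly in this node.
    primitive_info: List[float] = field(default_factory=list)
    children: List["BVHNode"] = field(default_factory=list)


def sorting_parameter(node: BVHNode) -> float:
    """Assumed parameter: sum of information values in the node's sub-tree."""
    return sum(node.primitive_info) + sum(
        sorting_parameter(child) for child in node.children)


def sort_siblings(node: BVHNode) -> None:
    """Sort nodes on the same level under each parent, largest parameter first."""
    node.children.sort(key=sorting_parameter, reverse=True)
    for child in node.children:
        sort_siblings(child)


def traversal_order(node: BVHNode) -> List[str]:
    """Depth-first traversal that follows the stored sort order."""
    order = [node.name]
    for child in node.children:
        order.extend(traversal_order(child))
    return order


root = BVHNode("root", children=[
    BVHNode("a", primitive_info=[0.1, 0.2]),
    BVHNode("b", children=[
        BVHNode("b0", primitive_info=[0.5]),
        BVHNode("b1", primitive_info=[0.9]),
    ]),
])
sort_siblings(root)
order = traversal_order(root)
print(order)
```

Here node `b` is visited before `a` because its sub-tree carries the larger summed value, and `b1` is visited before `b0` for the same reason.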
-
Publication number: US20240046543A1
Publication date: 2024-02-08
Application number: US17817815
Filing date: 2022-08-05
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Eric DEMERS , Andrew Evan GRUBER , Chun YU , Baoguang YANG , Chihong ZHANG , Yuehai DU , Avinash SEETHARAMAIAH , Jonnala Gadda NAGENDRA KUMAR , Gang ZHONG , Zilin YING , Fei WEI
CPC classification number: G06T15/005 , G06T15/80
Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.
-
Publication number: US20230267567A1
Publication date: 2023-08-24
Application number: US17652478
Filing date: 2022-02-24
Applicant: QUALCOMM Incorporated
Inventor: Yun DU , Andrew Evan GRUBER , Zilin YING , Chunling HU , Baoguang YANG , Yang XIA , Gang ZHONG , Chun YU , Eric DEMERS
Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.
-
Publication number: US20230092394A1
Publication date: 2023-03-23
Application number: US17478694
Filing date: 2021-09-17
Applicant: QUALCOMM Incorporated
Inventor: Ashokanand NEELAMBARAN , Piyush GUPTA , Kalyan Kumar BHIRAVABHATLA , Tao WANG , Andrew Evan GRUBER
IPC: G06T1/20 , G06T11/40 , G06T3/40 , A63F13/525
Abstract: Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a plurality of primitives associated with one or more frames in a scene, a portion of the scene being associated with an upscaled sample space and/or a downscaled sample space. The apparatus may also perform a binning pass for the plurality of primitives, the binning pass being associated with an unscaled sample space, where the binning pass sorts each of the primitives into one or more bins associated with each of the one or more frames. Further, the apparatus may perform one of one or more rendering passes for each of the one or more bins. The apparatus may also rasterize each of the plurality of primitives based on at least one of the upscaled sample space or the downscaled sample space.
-
Publication number: US20230017522A1
Publication date: 2023-01-19
Application number: US17373704
Filing date: 2021-07-12
Applicant: QUALCOMM Incorporated
Inventor: Sreyas KURUMANGHAT , Kalyan Kumar BHIRAVABHATLA , Andrew Evan GRUBER , Tao WANG , Baoguang YANG , Pavan Kumar AKKARAJU
Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may configure a portion of a GPU to include at least one depth processing block, the at least one depth processing block being associated with at least one depth buffer. The apparatus may also identify one or more depth passes of each of a plurality of graphics workloads, the plurality of graphics workloads being associated with a plurality of frames. Further, the apparatus may process each of the one or more depth passes in the portion of the GPU including the at least one depth processing block, each of the one or more depth passes being processed by the at least one depth processing block, the one or more depth passes being associated with the at least one depth buffer.
-
Publication number: US20220253969A1
Publication date: 2022-08-11
Application number: US17173643
Filing date: 2021-02-11
Applicant: QUALCOMM Incorporated
Inventor: Elina KAMENETSKAYA , Liang LI , Andrew Evan GRUBER , Jeffrey LEGER , Balaji CALIDAS , Ruihao ZHANG
IPC: G06T1/60
Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may modify at least one texture memory object to support a data structure for one or more tensor objects. The apparatus may also determine one or more supported memory layouts for the one or more tensor objects based on the modified at least one texture memory object. Additionally, the apparatus may access data associated with the one or more tensor objects based on the one or more supported memory layouts, the data for each of the one or more tensor objects corresponding to at least one data instruction. The apparatus may also execute the at least one data instruction based on the accessed data associated with the one or more tensor objects.
-