-
Publication number: US12086644B2
Publication date: 2024-09-10
Application number: US17399711
Filing date: 2021-08-11
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
CPC classification number: G06F9/5044, G06F9/4881, G06F9/505, G06T1/20, G06T1/60
Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
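The mapping step described in this abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the patented design: the two rule names, the kick-size threshold, and the `SubUnit` structure are hypothetical, since the abstract does not enumerate the actual distribution rules.

```python
from dataclasses import dataclass

# Hypothetical rule names; the abstract does not list the real rules.
SINGLE_SUBUNIT = "single_subunit"
ALL_SUBUNITS = "all_subunits"


@dataclass
class SubUnit:
    """One graphics processor sub-unit with its free distributed hardware slots."""
    name: str
    free_slots: list


def choose_rule(kick_size, threshold=64):
    """Pick a distribution rule from the kick size (threshold is an assumed value)."""
    return SINGLE_SUBUNIT if kick_size < threshold else ALL_SUBUNITS


def map_logical_slot(logical_slot, rule, sub_units):
    """Map one logical slot to (sub-unit name, hardware slot) pairs under a rule."""
    if rule == SINGLE_SUBUNIT:
        # Small kick: occupy one slot on the sub-unit with the most free slots.
        targets = [max(sub_units, key=lambda u: len(u.free_slots))]
    else:
        # Large kick: occupy one slot on every sub-unit that has a free slot.
        targets = [u for u in sub_units if u.free_slots]
    if not targets or not all(u.free_slots for u in targets):
        raise RuntimeError("no free distributed hardware slot")
    return {logical_slot: [(u.name, u.free_slots.pop(0)) for u in targets]}


if __name__ == "__main__":
    units = [SubUnit("unit0", [0, 1]), SubUnit("unit1", [0, 1])]
    print(map_logical_slot(0, choose_rule(8), units))     # small kick
    print(map_logical_slot(1, choose_rule(1024), units))  # large kick
```

In this sketch a small kick stays on one sub-unit while a large kick is spread across all of them, which is one plausible reading of "different distribution rules for first and second sets of graphics work".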
-
Publication number: US20210248006A1
Publication date: 2021-08-12
Application number: US17240406
Filing date: 2021-04-26
Applicant: Apple Inc.
Inventor: Mark D. Earl, Dimitri Tan, Christopher L. Spencer, Jeffrey T. Brady, Ralph C. Taylor, Terence M. Potter
IPC: G06F9/50
Abstract: In various embodiments, a resource allocation management circuit may allocate a plurality of different types of hardware resources (e.g., different types of registers) to a plurality of threads. The different types of hardware resources may correspond to a plurality of hardware resource allocation circuits. The resource allocation management circuit may track allocation of the hardware resources to the threads using state identification values of the threads. In response to determining that fewer than a respective requested number of one or more types of the hardware resources are available, the resource allocation management circuit may identify one or more threads for deallocation. As a result, the hardware resource allocation system may allocate hardware resources to threads more efficiently (e.g., may deallocate hardware resources allocated to fewer threads), as compared to a hardware resource allocation system that does not track allocation of hardware resources to threads using state identification values.
-
Publication number: US10755383B2
Publication date: 2020-08-25
Application number: US16130265
Filing date: 2018-09-13
Applicant: Apple Inc.
Inventor: Justin A. Hensley, Karl D. Mann, Ralph C. Taylor, Randall R. Rauwendaal, Jonathan M. Redshaw
Abstract: Techniques are disclosed relating to rendering graphics objects. In some embodiments, a graphics unit is configured to transform graphics objects from a virtual space into a second space according to different transformation parameters for different portions of the second space. This may result in sampling different portions of the virtual space at different sample rates, which may reduce the number of samples required in various stages of the rendering process. In the disclosed techniques, transformation may occur prior to rasterization and shading, which may further reduce computation and power consumption in a graphics unit, improve image quality as displayed to a user, and/or reduce bandwidth usage or latency of video content on a network. In some embodiments, a transformed image may be viewed through a distortion-compensating lens or resampled prior to display.
-
Publication number: US20240045808A1
Publication date: 2024-02-08
Application number: US18490588
Filing date: 2023-10-19
Applicant: Apple Inc.
Inventor: Justin A. Hensley, Karl D. Mann, Yoong Chert Foo, Terence M. Potter, Frank W. Liljeros, Ralph C. Taylor
IPC: G06F12/1018, G06F12/084
CPC classification number: G06F12/1018, G06F12/084, G06F30/392
Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
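The private-to-virtual translation with on-demand mapping described in this abstract can be sketched in software as follows. This is an illustrative model under stated assumptions only: the 4 KiB page size, the `PrivateMemoryMapper` class, the per-requester keying, and the pool of free virtual pages are all hypothetical, and the later MMU step from virtual to physical address is not modelled.

```python
PAGE_SIZE = 4096  # assumed page size; the abstract does not specify one


class PrivateMemoryMapper:
    """Toy model: translate private addresses to virtual addresses, mapping pages lazily."""

    def __init__(self, free_virtual_pages):
        # Base addresses of virtual pages available for private memory (assumed pool).
        self.free_virtual_pages = list(free_virtual_pages)
        # Page table information: (requester id, private page number) -> virtual page base.
        self.page_table = {}

    def translate(self, requester_id, private_addr):
        page, offset = divmod(private_addr, PAGE_SIZE)
        key = (requester_id, page)
        if key not in self.page_table:
            # Page table information is not set up yet: map a page for this request.
            if not self.free_virtual_pages:
                raise MemoryError("no virtual pages left for private memory")
            self.page_table[key] = self.free_virtual_pages.pop(0)
        return self.page_table[key] + offset


if __name__ == "__main__":
    mapper = PrivateMemoryMapper([0x10000, 0x20000])
    print(hex(mapper.translate(7, 0x10)))    # first touch maps private page 0
    print(hex(mapper.translate(7, 0x1004)))  # first touch maps private page 1
    print(hex(mapper.translate(7, 0x20)))    # reuses the existing mapping for page 0
```

The dictionary lookup stands in for the cached page table information; a requester that never touches a private page never consumes a virtual page, which is the sense in which allocation is dynamic here.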
-
Publication number: US20230050061A1
Publication date: 2023-02-16
Application number: US17399711
Filing date: 2021-08-11
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
-
Publication number: US20210271606A1
Publication date: 2021-09-02
Application number: US16804128
Filing date: 2020-02-28
Applicant: Apple Inc.
Inventor: Justin A. Hensley, Karl D. Mann, Yoong Chert Foo, Terence M. Potter, Frank W. Liljeros, Ralph C. Taylor
IPC: G06F12/1018, G06F12/084
Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
-
Publication number: US10074210B1
Publication date: 2018-09-11
Application number: US15659188
Filing date: 2017-07-25
Applicant: Apple Inc.
Inventor: Christopher L. Spencer, Karl D. Mann, Ralph C. Taylor, Dinesh D. Kuwar
CPC classification number: G06T15/40, G06T11/40, G06T15/005, G06T15/08, G06T15/80, G06T17/20, G09G5/00, G09G5/363
Abstract: Techniques are disclosed relating to rendering graphics objects that require shader operations to determine visibility. In some embodiments, a graphics unit is configured to process feedback objects, which may require shading to determine whether they are visible relative to previously-processed objects, out of draw order. For example, in embodiments where a buffer is used to store fragment data for deferred rendering, the graphics unit may bypass the buffer and shade feedback objects ahead of earlier non-feedback objects whose fragment data is stored in the buffer. This may allow a determination of whether to remove occluded non-feedback fragment data from the buffer, which may reduce graphics overdraw. In disclosed two-pass techniques, data for feedback objects is first allowed to bypass the buffer for visibility shading, but is then stored in the buffer for a second pass to perform fragment shading to actually determine pixel attributes, which may further reduce overdraw.
-
Publication number: US12182926B1
Publication date: 2024-12-31
Application number: US18055111
Filing date: 2022-11-14
Applicant: Apple Inc.
Inventor: Jeffrey T. Brady, Jason D. Carroll, Michael A. Mang, Ralph C. Taylor
Abstract: Techniques are disclosed relating to using an initial version of an object shader to determine a child count and distribute geometry work based on the child count. In some embodiments, graphics shader circuitry is configured to execute shader programs including object shaders and mesh shaders. Vertex control circuitry is configured to, for a given object shader: launch an initial version of the given object shader to determine a number of meshlets to be generated by the given object shader (e.g., where the initial version of the given object shader does not commit side effects to architectural state of the apparatus) and select shader circuitry to execute a complete version of the given object shader based on the determined number of meshlets.
-
Publication number: US12169898B1
Publication date: 2024-12-17
Application number: US18054581
Filing date: 2022-11-11
Applicant: Apple Inc.
Inventor: Michael A. Mang, Jason D. Carroll, Jingfei Kong, Ralph C. Taylor
Abstract: Techniques are disclosed relating to object and mesh shaders executed by a graphics processor. In some embodiments, a device includes buffer circuitry and shader circuitry configured to execute graphics programs. Control circuitry may: generate object shader work and mesh shader work for the shader circuitry, receive output information generated by a mesh shader that indicates a number of vertices and primitives to be output by the mesh shader, allocate, based on the output information and after execution of at least a portion of the mesh shader, a region of the buffer circuitry for storage of the vertices to be output by the mesh shader, and store the vertices output by the mesh shader in the allocated region. Disclosed techniques may advantageously provide efficient use of limited buffer resources.
-
Publication number: US10223822B2
Publication date: 2019-03-05
Application number: US15388915
Filing date: 2016-12-22
Applicant: Apple Inc.
Inventor: Terence M. Potter, Ralph C. Taylor, Richard W. Schreyer, Aaftab A. Munshi, Justin A. Hensley
Abstract: Techniques are disclosed relating to performing mid-render auxiliary compute tasks for graphics processing. In some embodiments, auxiliary compute tasks are performed during a render pass, using at least a portion of a memory context of the render pass, without accessing a shared memory during the render pass. Relative to flushing render data to shared memory to perform compute tasks, this may reduce memory accesses and/or cache thrashing, which may in turn increase performance and/or reduce power consumption.