-
Publication No.: US20230077058A1
Publication Date: 2023-03-09
Application No.: US17468328
Filing Date: 2021-09-07
Applicant: Apple Inc.
Inventor: Benjamin Bowman, Fergus W. MacGarry, Kutty Banerjee, Pratik Chandresh Shah
Abstract: Disclosed techniques relate to distributing graphics work based on priority. In some embodiments, circuitry implements a plurality of tracking slots for sets of graphics work. A set of graphics processor sub-units may each implement multiple distributed hardware slots. Control circuitry may attempt to assign a first set of graphics work having a first priority to a graphics processor sub-unit that is currently executing graphics work having an equal or higher priority than the first priority, where the first set of graphics work is from a first tracking slot. The control circuitry may, in response to a failure of the attempt, generate a signal to graphics software that indicates the failure, wherein the signal indicates the first tracking slot. Disclosed techniques may reduce or avoid problems relating to higher priority work being scheduled behind lower priority work.
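The assignment attempt and failure signal described in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions, not the claimed circuitry: the names `SubUnit`, `AssignResult`, and `try_assign` are hypothetical, larger numbers are assumed to mean higher priority, and treating an idle sub-unit as eligible is an assumption the abstract does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubUnit:
    """Hypothetical model of one graphics processor sub-unit."""
    name: str
    free_hw_slots: int
    # Lowest priority among work currently executing; None means idle (assumption).
    running_priority: Optional[int] = None


@dataclass
class AssignResult:
    ok: bool
    sub_unit: Optional[str] = None
    # On failure, identifies the tracking slot so software can react.
    failed_tracking_slot: Optional[int] = None


def try_assign(tracking_slot: int, priority: int, sub_units: List[SubUnit]) -> AssignResult:
    """Place work only on a sub-unit running equal or higher priority work."""
    for su in sub_units:
        if su.free_hw_slots == 0:
            continue
        if su.running_priority is None or su.running_priority >= priority:
            su.free_hw_slots -= 1
            # The new work is now the lowest-priority work on this sub-unit.
            su.running_priority = priority
            return AssignResult(ok=True, sub_unit=su.name)
    # No eligible sub-unit: report the failure together with the tracking slot.
    return AssignResult(ok=False, failed_tracking_slot=tracking_slot)
```

In this sketch, work with priority 4 is refused by a sub-unit whose running work has priority 1, so it never queues behind lower priority work; the caller receives the tracking slot number instead.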
-
Publication No.: US12190164B2
Publication Date: 2025-01-07
Application No.: US17399808
Filing Date: 2021-08-11
Applicant: Apple Inc.
Inventor: Steven Fishwick, Fergus W. MacGarry, Jonathan M. Redshaw, David A. Gotwalt, Ali Rabbani Rankouhi, Benjamin Bowman
Abstract: Disclosed embodiments relate to controlling sets of graphics work (e.g., kicks) assigned to graphics processor circuitry. In some embodiments, tracking slot circuitry implements entries for multiple tracking slots. Slot manager circuitry may store, using an entry of the tracking slot circuitry, software-specified information for a set of graphics work, where the information includes: type of work, dependencies on other sets of graphics work, and location of data for the set of graphics work. The slot manager circuitry may prefetch, from the location and prior to allocating shader core resources for the set of graphics work, configuration register data for the set of graphics work. Control circuitry may program configuration registers for the set of graphics work using the prefetched data and initiate processing of the set of graphics work by the graphics processor circuitry according to the dependencies. Disclosed techniques may reduce kick-to-kick transition time, in some embodiments.
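A minimal software sketch of the tracking-slot flow in this abstract is shown below. It assumes a dictionary standing in for memory and hypothetical names (`TrackingSlot`, `SlotManager`, `programmed`); the real mechanism is hardware circuitry, so this only illustrates the ordering of prefetch, dependency check, and register programming.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class TrackingSlot:
    """Software-specified information for one set of graphics work (a kick)."""
    work_type: str
    depends_on: Set[int]
    data_location: int
    prefetched_config: Optional[dict] = None


class SlotManager:
    """Hypothetical model: prefetch config early, start work once dependencies finish."""

    def __init__(self, memory: Dict[int, dict]):
        self.memory = memory                 # stand-in for memory holding register data
        self.slots: Dict[int, TrackingSlot] = {}
        self.finished: Set[int] = set()
        self.programmed: List[Tuple[int, dict]] = []  # (slot id, config written)

    def submit(self, slot_id: int, work_type: str, depends_on: Set[int], data_location: int) -> None:
        slot = TrackingSlot(work_type, set(depends_on), data_location)
        # Prefetch happens at submission, before any shader core resources are allocated.
        slot.prefetched_config = self.memory[data_location]
        self.slots[slot_id] = slot

    def try_start(self, slot_id: int) -> bool:
        slot = self.slots[slot_id]
        if not slot.depends_on <= self.finished:
            return False                     # dependencies still outstanding
        # Program configuration registers from the prefetched copy, then start.
        self.programmed.append((slot_id, slot.prefetched_config))
        return True

    def finish(self, slot_id: int) -> None:
        self.finished.add(slot_id)
```

Because the configuration data is already on hand when the last dependency completes, starting the next kick in this model needs no further memory read, which is the intuition behind the reduced kick-to-kick transition time.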
-
Publication No.: US20230075531A1
Publication Date: 2023-03-09
Application No.: US17468312
Filing Date: 2021-09-07
Applicant: Apple Inc.
Inventor: Benjamin Bowman, Fergus W. MacGarry, Kutty Banerjee, Pratik Chandresh Shah
IPC: G06F9/48
Abstract: Disclosed techniques relate to circuitry configured to aggregate and report usage information in a distributed processor (e.g., a GPU). In some embodiments, graphics processor circuitry includes at least first and second portions that are respectively configured to execute sets of graphics work. First utilization circuitry may track execution time for sets of graphics work on the first portion of the graphics processor circuitry and second utilization circuitry may track execution time for sets of graphics work on the second portion of the graphics processor circuitry. Command queue circuitry may store multiple different command queues. Control circuitry may access the first and second utilization circuitry and aggregate utilization data on a per-command-queue basis, where for a given command queue, the aggregated utilization data indicates respective utilization of the first and second portions of the graphics processor circuitry. The control circuitry may provide the aggregated per-command-queue utilization data in software-accessible registers.
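The per-command-queue aggregation can be illustrated with a short Python sketch. The sample format `(portion, queue_id, cycles)` and the function name `aggregate_by_queue` are assumptions made for illustration; in the disclosure the data comes from utilization circuitry and is exposed through registers rather than a dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_by_queue(samples: Iterable[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Sum execution time per command queue, keeping each GPU portion separate.

    Each sample is (portion, queue_id, cycles), as a per-portion utilization
    tracker might report it (hypothetical format).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for portion, queue_id, cycles in samples:
        totals[queue_id][portion] += cycles
    # Convert to plain dicts, standing in for software-accessible registers.
    return {queue: dict(per_portion) for queue, per_portion in totals.items()}
```

For a given queue, the result lists utilization of each portion separately, so software can see both how much a queue used the GPU and where.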
-
Publication No.: US20200104180A1
Publication Date: 2020-04-02
Application No.: US16145573
Filing Date: 2018-09-28
Applicant: Apple Inc.
Inventor: Kutty Banerjee, Benjamin Bowman, Terence M. Potter, Tatsuya Iwamoto, Gokhan Avkarogullari
Abstract: In general, embodiments are disclosed for tracking and allocating graphics processor hardware resources. More particularly, a graphics hardware resource allocation system is able to generate a priority list for a plurality of data masters for a graphics processor based on a comparison between current utilizations for the data masters and target utilizations for the data masters. The graphics hardware resource allocation system designates, based on the priority list, a first data master with a higher priority to submit work to the graphics processor compared to a second data master. The graphics hardware resource allocation system determines a stall counter value for the data master and generates a notification to pause work for the second data master based on the stall counter value.
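One way to picture the priority list is the following Python sketch. It rests on several assumptions that the abstract does not spell out: data masters are ranked by how far they fall short of their target utilization, the data master names are made up, and the pause rule (pause lower-ranked masters once the top master's stall counter reaches a threshold) is a hypothetical interpretation.

```python
from typing import Dict, List


def priority_list(current: Dict[str, float], target: Dict[str, float]) -> List[str]:
    """Rank data masters by shortfall (target minus current), largest first."""
    return sorted(target, key=lambda dm: target[dm] - current.get(dm, 0.0), reverse=True)


def masters_to_pause(order: List[str], stall_counters: Dict[str, int], threshold: int) -> List[str]:
    """Hypothetical rule: if the top-ranked master has stalled long enough,
    ask every lower-ranked master to pause so the top one can make progress."""
    if order and stall_counters.get(order[0], 0) >= threshold:
        return order[1:]
    return []
```

A data master that is furthest below its target goes to the head of the list and gets to submit work first; the stall counter acts as the trigger for notifying the others to pause.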
-
Publication No.: US20240272940A1
Publication Date: 2024-08-15
Application No.: US18450978
Filing Date: 2023-08-16
Applicant: Apple Inc.
Inventor: Benjamin Bowman, Ali Rabbani Rankouhi, Jonathan M. Redshaw, Steven Fishwick
CPC classification number: G06F9/4881, G06F9/3887, G06F9/485
Abstract: Disclosed techniques relate to scheduling sets of graphics work with dependencies. In some embodiments, a first set of graphics work depends on a second set of graphics work. Control circuitry may, in response to a release signal that indicates the second set reaching a first processing point, initiate processing of the first set. Control circuitry may, in response to reaching a kick gate point, stall processing of the first set. Control circuitry may, in response to an end signal for the second set, resume processing of the first set.
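The release, kick gate, and end sequence reads naturally as a small state machine. The Python sketch below uses hypothetical names (`DependentKick`, `on_release`, `on_kick_gate`, `on_parent_end`) and assumes that if the second set has already ended when the gate is reached, the first set simply continues; the abstract does not cover that case.

```python
class DependentKick:
    """Hypothetical model of a first set of work that depends on a second set."""

    def __init__(self) -> None:
        self.state = "waiting"      # not yet started
        self.parent_ended = False   # has the second set signaled its end?

    def on_release(self) -> None:
        # The second set reached its first processing point: start early.
        if self.state == "waiting":
            self.state = "running"

    def on_kick_gate(self) -> None:
        # Work past the gate needs the second set's final results.
        if self.state == "running" and not self.parent_ended:
            self.state = "stalled"

    def on_parent_end(self) -> None:
        # End signal for the second set: resume if we were held at the gate.
        self.parent_ended = True
        if self.state == "stalled":
            self.state = "running"
```

The point of the gate is overlap: the dependent set does its early, independent work while the other set is still running and only waits where the dependency actually bites.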
-
Publication No.: US20230050061A1
Publication Date: 2023-02-16
Application No.: US17399711
Filing Date: 2021-08-11
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
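As a rough illustration of mapping logical slots to distributed hardware slots under different distribution rules, consider the Python sketch below. The two rule names, `"single"` and `"all"`, and the function `map_logical_slot` are hypothetical; the abstract only says that different rules may apply to different sets of work.

```python
from typing import Dict, List


def map_logical_slot(rule: str, free_slots: Dict[str, int]) -> List[str]:
    """Map one logical slot onto sub-units, consuming one hardware slot on each.

    free_slots maps sub-unit name to its number of free distributed hardware
    slots and is updated in place. Returns the chosen sub-units, or an empty
    list if the mapping cannot be satisfied right now.
    """
    if rule == "single":
        # Small kick: use only the sub-unit with the most free slots.
        candidates = [s for s, n in free_slots.items() if n > 0]
        if not candidates:
            return []
        targets = [max(candidates, key=lambda s: free_slots[s])]
    elif rule == "all":
        # Large kick: spread across every sub-unit, which needs a free slot on each.
        if any(n == 0 for n in free_slots.values()):
            return []
        targets = list(free_slots)
    else:
        raise ValueError(f"unknown distribution rule: {rule}")
    for sub_unit in targets:
        free_slots[sub_unit] -= 1
    return targets
```

Choosing the rule per kick lets small kicks avoid the overhead of fanning out, while large kicks still use every shader sub-unit.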
-
Publication No.: US11500692B2
Publication Date: 2022-11-15
Application No.: US17021720
Filing Date: 2020-09-15
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Benjamin Bowman
Abstract: Techniques are disclosed relating to dynamically adjusting buffering for distributing compute work in a graphics processor. In some embodiments, the graphics processor includes shader circuitry configured to process compute work from a compute kernel, multiple distributed workload parser circuits configured to send compute work to the shader circuitry, primary workload parser circuitry configured to send, via a communications fabric, compute work from the compute kernel to the distributed workload parser circuits, and buffer circuitry configured to buffer compute work received by one or more of the distributed workload parser circuits from the primary workload parser circuitry. In some embodiments, the graphics processor is configured to dynamically adjust a limit on the number of entries used in the buffer circuitry based on information indicating complexity of the compute kernel. This may advantageously maintain launch rates while reducing or avoiding workload imbalances, in some embodiments.
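The idea of adjusting the buffer entry limit from kernel complexity can be sketched as below. Everything specific here is an assumption for illustration: the normalized complexity score in [0, 1], the linear mapping, the default bounds, and the direction (complex kernels get a smaller limit so long-running work is not queued deep behind one parser, simple kernels get a larger one to sustain launch rates).

```python
def buffer_entry_limit(complexity: float, min_entries: int = 2, max_entries: int = 16) -> int:
    """Return the number of buffer entries a distributed workload parser may use.

    complexity is a hypothetical normalized score: 0.0 for trivial kernels,
    1.0 for the most expensive ones. Values outside [0, 1] are clamped.
    """
    c = min(max(complexity, 0.0), 1.0)
    # Linear interpolation from max_entries (simple) down to min_entries (complex).
    return max_entries - round(c * (max_entries - min_entries))
```

With the default bounds, a trivial kernel may use all 16 entries, while the most complex kernel is limited to 2.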
-
Publication No.: US20220083396A1
Publication Date: 2022-03-17
Application No.: US17021720
Filing Date: 2020-09-15
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Benjamin Bowman
Abstract: Techniques are disclosed relating to dynamically adjusting buffering for distributing compute work in a graphics processor. In some embodiments, the graphics processor includes shader circuitry configured to process compute work from a compute kernel, multiple distributed workload parser circuits configured to send compute work to the shader circuitry, primary workload parser circuitry configured to send, via a communications fabric, compute work from the compute kernel to the distributed workload parser circuits, and buffer circuitry configured to buffer compute work received by one or more of the distributed workload parser circuits from the primary workload parser circuitry. In some embodiments, the graphics processor is configured to dynamically adjust a limit on the number of entries used in the buffer circuitry based on information indicating complexity of the compute kernel. This may advantageously maintain launch rates while reducing or avoiding workload imbalances, in some embodiments.
-
Publication No.: US12086644B2
Publication Date: 2024-09-10
Application No.: US17399711
Filing Date: 2021-08-11
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
CPC classification number: G06F9/5044, G06F9/4881, G06F9/505, G06T1/20, G06T1/60
Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.
-
Publication No.: US12039368B2
Publication Date: 2024-07-16
Application No.: US17468328
Filing Date: 2021-09-07
Applicant: Apple Inc.
Inventor: Benjamin Bowman, Fergus W. MacGarry, Kutty Banerjee, Pratik Chandresh Shah
CPC classification number: G06F9/4881, G06F9/4831, G06F9/5011, G06F9/5038, G06F9/546, G06T1/20
Abstract: Disclosed techniques relate to distributing graphics work based on priority. In some embodiments, circuitry implements a plurality of tracking slots for sets of graphics work. A set of graphics processor sub-units may each implement multiple distributed hardware slots. Control circuitry may attempt to assign a first set of graphics work having a first priority to a graphics processor sub-unit that is currently executing graphics work having an equal or higher priority than the first priority, where the first set of graphics work is from a first tracking slot. The control circuitry may, in response to a failure of the attempt, generate a signal to graphics software that indicates the failure, wherein the signal indicates the first tracking slot. Disclosed techniques may reduce or avoid problems relating to higher priority work being scheduled behind lower priority work.