-
Publication No.: US10089077B1
Publication date: 2018-10-02
Application No.: US15402820
Filing date: 2017-01-10
Applicant: Apple Inc.
Inventor: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus performs an arithmetic operation using first circuitry, on type value inputs for different threads that are encoded to represent values to be operated on by the first circuitry. In some embodiments, second arithmetic circuitry is configured to perform an arithmetic operation on an output of the first circuitry and an input (e.g., address information such as a base and an offset) that is common to the different threads and has a greater number of bits than the output of the first circuitry. In various embodiments, disclosed techniques may allow decoding of encoded values for different threads (which may reduce memory requirements relative to non-encoded values) with a shorter critical path and lower power consumption, e.g., relative to sequential decoding.
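The two-stage arrangement in this abstract can be pictured with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the function name `operand_locations`, the 4-bit narrow width, and the reading of each type value as a small increment accumulated across threads are all hypothetical choices made for illustration. The narrow stage works on the per-thread encoded values; the wide stage adds the base and offset that all threads share.

```python
def operand_locations(base, offset, type_values, narrow_bits=4):
    """Model a narrow per-thread stage followed by a wide common add.

    Assumption (not from the source): each type value is a small
    increment, and the narrow stage accumulates increments across threads.
    """
    mask = (1 << narrow_bits) - 1
    common = base + offset          # wide input shared by every thread

    # "First circuitry": narrow arithmetic on the encoded per-thread values.
    narrow_results = []
    running = 0
    for t in type_values:
        running = (running + t) & mask
        narrow_results.append(running)

    # "Second circuitry": one wide add per thread with the common input.
    return [common + n for n in narrow_results]
```

Keeping the per-thread arithmetic narrow and applying the wide add once per thread is one way to read the abstract's claim of a shorter critical path than sequential decoding; real hardware would evaluate the narrow stage in parallel rather than in a loop.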
-
Publication No.: US10353711B2
Publication date: 2019-07-16
Application No.: US15257386
Filing date: 2016-09-06
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Brian K. Reynolds, Liang Xia, Terence M. Potter
Abstract: Techniques are disclosed relating to clause-based execution of program instructions, which may be single-instruction multiple-data (SIMD) computer instructions. In some embodiments, an apparatus includes execution circuitry configured to receive clauses of instructions and SIMD groups of input data to be operated on by the clauses. In some embodiments, the apparatus further includes one or more storage elements configured to store state information for clauses processed by the execution circuitry. In some embodiments, the apparatus further includes scheduling circuitry configured to send instructions of a first clause and corresponding input data for execution by the execution circuitry and indicate, prior to sending instructions and input data of a second clause to the execution circuitry for execution, whether the second clause and the first clause are assigned to operate on groups of input data corresponding to the same instruction stream. In some embodiments, the apparatus is configured to determine, based on the indication, whether to maintain as valid, for use by the second clause, stored state information for the first clause.
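The state-retention decision described in this abstract can be sketched as follows. The class `ExecutionState`, the function `issue_clause`, and the use of an integer stream identifier are hypothetical names chosen for illustration; the abstract does not say how the indication is encoded or what the stored state contains.

```python
class ExecutionState:
    """Per-execution-unit state kept between clauses (hypothetical model)."""

    def __init__(self):
        self.stream_id = None   # instruction stream of the last clause
        self.valid = False      # whether stored state may be reused
        self.stored = {}        # e.g. cached operands left by the last clause


def issue_clause(state, stream_id):
    """Signal whether the next clause shares a stream with the previous one.

    Returns True when the stored state is kept valid for the new clause,
    False when it is invalidated before the clause runs.
    """
    same_stream = state.valid and state.stream_id == stream_id
    if not same_stream:
        state.stored.clear()    # state from another stream must not leak
    state.stream_id = stream_id
    state.valid = True
    return same_stream
```

In this model the check happens before the second clause's instructions and data are sent, matching the ordering the abstract describes.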
-
Publication No.: US10324726B1
Publication date: 2019-06-18
Application No.: US15429982
Filing date: 2017-02-10
Applicant: Apple Inc.
Inventor: Michael A. Geary, Brian K. Reynolds, Terence M. Potter
IPC: G06F9/30, G06F9/38, G06F12/0897, G06F12/0875
Abstract: Techniques are disclosed relating to scheduling graphics instructions for execution on different types of execution units based on characteristics of decoded and cached graphics instructions. In some embodiments, a graphics unit includes multiple different types of execution units that are configured to execute different types of instructions (e.g., different units for datapath, sample, load/store, etc.). In some embodiments, the graphics unit stores decoded instructions in an instruction cache in at least one cache level, along with information specifying characteristics of the instructions. The characteristics may be stored at clause granularity and may indicate the type of instructions in each clause (e.g., corresponding to which type of execution unit is configured to execute the instructions). In some embodiments, scheduling circuitry is configured to access the information and select instructions from the instruction cache to send to ones of the plurality of execution units based on the stored information.
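A rough software analogue of this scheduling scheme is shown below. The cache entry layout (`clause_id`, `unit_type`) and the function `schedule` are assumptions made for the sketch; the abstract only states that per-clause type information is stored alongside decoded instructions and consulted by the scheduler.

```python
from collections import defaultdict


def schedule(cache_entries, available_units):
    """Route cached clauses to execution units using stored clause metadata.

    cache_entries: list of dicts with hypothetical keys "clause_id" and
    "unit_type" (e.g. "datapath", "sample", "load_store").
    available_units: set of unit types that can accept work this cycle.
    """
    dispatch = defaultdict(list)
    for entry in cache_entries:
        # The scheduler reads only the metadata, not the instructions.
        if entry["unit_type"] in available_units:
            dispatch[entry["unit_type"]].append(entry["clause_id"])
    return dict(dispatch)
```

Because the type is recorded once per clause when the decoded instructions are cached, the scheduler in this model never has to re-inspect individual instructions to pick a unit.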
-
Publication No.: US11360780B2
Publication date: 2022-06-14
Application No.: US16749618
Filing date: 2020-01-22
Applicant: Apple Inc.
Inventor: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Jeffrey T. Brady, Brian K. Reynolds, Jeffrey A. Lohman
IPC: G06F9/38
Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.
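The atomic step listed in this abstract (save the program counter and active-thread information, set all threads active, branch to the handler) can be modeled in a few lines. The class `SimdGroup`, the register names `saved_pc` and `saved_mask`, and the bitmask representation of thread activity are hypothetical; the abstract specifies only what is saved, not how.

```python
class SimdGroup:
    """Minimal model of a SIMD group's control state (hypothetical)."""

    def __init__(self, pc, active_mask, width):
        self.pc = pc
        self.active_mask = active_mask   # bit i set => thread i is active
        self.width = width
        self.ctx_regs = {}               # stands in for context switch registers


def context_switch(group, handler_pc):
    """Perform the three steps as one indivisible operation in this model."""
    # 1. Save the PC and the per-thread active information.
    group.ctx_regs["saved_pc"] = group.pc
    group.ctx_regs["saved_mask"] = group.active_mask
    # 2. Activate every thread so the handler can save all threads' context.
    group.active_mask = (1 << group.width) - 1
    # 3. Branch to the handler code.
    group.pc = handler_pc
```

Forcing all threads active before the handler runs is what lets the handler save context for threads that were inactive at the switch point, which is the case the abstract's last sentence highlights.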
-
Publication No.: US20210224072A1
Publication date: 2021-07-22
Application No.: US16749618
Filing date: 2020-01-22
Applicant: Apple Inc.
Inventor: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Jeffrey T. Brady, Brian K. Reynolds, Jeffrey A. Lohman
IPC: G06F9/38
Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.
-
Publication No.: US20190034166A1
Publication date: 2019-01-31
Application No.: US16146147
Filing date: 2018-09-28
Applicant: Apple Inc.
Inventor: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
IPC: G06F7/505
Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads. In some embodiments, the circuitry is configured to generate a result for the at least one of the multiple threads by selectively performing the arithmetic operation or using the input value that is common to the multiple threads based on the type value.
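The per-thread selection controlled by the type value can be illustrated with a short model. The enum `TypeValue`, the function `generate_results`, the fixed `step`, and the choice of the immediately preceding thread as the "other thread" are assumptions for the sketch; the abstract does not name the arithmetic operation or say which thread's result is consumed.

```python
from enum import Enum


class TypeValue(Enum):
    USE_COMMON = 0   # result is the input value common to all threads
    CHAIN = 1        # result is derived from another thread's result


def generate_results(common, type_values, step=1):
    """Produce one result per thread from a common input and type values.

    Assumption: CHAIN means "previous thread's result plus step"; the
    first thread has no predecessor and falls back to the common input.
    """
    results = []
    for i, tv in enumerate(type_values):
        if tv is TypeValue.USE_COMMON or i == 0:
            results.append(common)
        else:
            results.append(results[-1] + step)
    return results
```

For example, with a common input of 100, a step of 4, and type values `[USE_COMMON, CHAIN, CHAIN, USE_COMMON, CHAIN]`, this model yields `[100, 104, 108, 100, 104]`. The loop is sequential only for readability; the abstract describes circuitry that generates the results for multiple threads from a single instruction.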
-
Publication No.: US10387119B2
Publication date: 2019-08-20
Application No.: US16146147
Filing date: 2018-09-28
Applicant: Apple Inc.
Inventor: Liang-Kai Wang, Terence M. Potter, Brian K. Reynolds, Justin Friesenhahn
Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads. In some embodiments, the circuitry is configured to generate a result for the at least one of the multiple threads by selectively performing the arithmetic operation or using the input value that is common to the multiple threads based on the type value.
-
Publication No.: US20180067748A1
Publication date: 2018-03-08
Application No.: US15257386
Filing date: 2016-09-06
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Brian K. Reynolds, Liang Xia, Terence M. Potter
IPC: G06F9/38
CPC classification number: G06F9/3867, G06F9/3851, G06F9/3887
Abstract: Techniques are disclosed relating to clause-based execution of program instructions, which may be single-instruction multiple-data (SIMD) computer instructions. In some embodiments, an apparatus includes execution circuitry configured to receive clauses of instructions and SIMD groups of input data to be operated on by the clauses. In some embodiments, the apparatus further includes one or more storage elements configured to store state information for clauses processed by the execution circuitry. In some embodiments, the apparatus further includes scheduling circuitry configured to send instructions of a first clause and corresponding input data for execution by the execution circuitry and indicate, prior to sending instructions and input data of a second clause to the execution circuitry for execution, whether the second clause and the first clause are assigned to operate on groups of input data corresponding to the same instruction stream. In some embodiments, the apparatus is configured to determine, based on the indication, whether to maintain as valid, for use by the second clause, stored state information for the first clause.
-
Publication No.: US09633409B2
Publication date: 2017-04-25
Application No.: US13975520
Filing date: 2013-08-26
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Brian K. Reynolds, Michael A. Geary
CPC classification number: G06T1/20, G06F9/30072, G06F9/3017, G06F9/30181, G06F9/30185, G06F9/3867, G06F9/3877
Abstract: Techniques are disclosed relating to predication. In one embodiment, a graphics processing unit is disclosed that includes a first set of architecturally-defined registers configured to store predication information. The graphics processing unit further includes a second set of registers configured to mirror the first set of registers and an execution pipeline configured to discontinue execution of an instruction sequence based on predication information in the second set of registers. In one embodiment, the second set of registers includes one or more registers proximal to an output of the execution pipeline. In some embodiments, the execution pipeline writes back a predicate value determined for a predicate writer to the second set of registers. The first set of architecturally-defined registers is then updated with the predicate value written back to the second set of registers. In some embodiments, the execution pipeline discontinues execution of the instruction sequence without stalling.
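The mirrored predicate registers described in this abstract can be modeled as two arrays, one architectural and one close to the pipeline output. The names `PredicateFile`, `write_back`, `sync`, and `run_sequence`, along with the `(name, new_predicate)` instruction tuples, are hypothetical and chosen only for this sketch.

```python
class PredicateFile:
    """Architectural predicate registers plus a mirror near the pipeline."""

    def __init__(self, count):
        self.architectural = [True] * count
        self.mirror = [True] * count

    def write_back(self, index, value):
        # The pipeline writes the predicate to the mirror first.
        self.mirror[index] = value

    def sync(self):
        # The architectural set is then updated from the mirror.
        self.architectural = list(self.mirror)


def run_sequence(preds, index, instructions):
    """Execute (name, new_predicate) pairs until the mirror says stop."""
    executed = []
    for name, new_pred in instructions:
        if not preds.mirror[index]:
            break                        # discontinue using the mirror copy
        executed.append(name)
        if new_pred is not None:         # instruction is a predicate writer
            preds.write_back(index, new_pred)
    preds.sync()
    return executed
```

Because the pipeline consults the mirror rather than waiting for the architectural registers to be updated, the sequence in this model stops on the instruction right after the predicate writer, which corresponds to the "without stalling" behavior the abstract mentions.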
-
Publication No.: US20150054837A1
Publication date: 2015-02-26
Application No.: US13975520
Filing date: 2013-08-26
Applicant: Apple Inc.
Inventor: Andrew M. Havlir, Brian K. Reynolds, Michael A. Geary
IPC: G06T1/20
CPC classification number: G06T1/20, G06F9/30072, G06F9/3017, G06F9/30181, G06F9/30185, G06F9/3867, G06F9/3877
Abstract: Techniques are disclosed relating to predication. In one embodiment, a graphics processing unit is disclosed that includes a first set of architecturally-defined registers configured to store predication information. The graphics processing unit further includes a second set of registers configured to mirror the first set of registers and an execution pipeline configured to discontinue execution of an instruction sequence based on predication information in the second set of registers. In one embodiment, the second set of registers includes one or more registers proximal to an output of the execution pipeline. In some embodiments, the execution pipeline writes back a predicate value determined for a predicate writer to the second set of registers. The first set of architecturally-defined registers is then updated with the predicate value written back to the second set of registers. In some embodiments, the execution pipeline discontinues execution of the instruction sequence without stalling.