-
Publication No.: US11256518B2
Publication date: 2022-02-22
Application No.: US16597625
Application date: 2019-10-09
Applicant: Apple Inc.
Inventor: Liang-Kai Wang, Robert D. Kenney, Terence M. Potter, Vinod Reddy Nalamalapu, Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry. This may improve performance and reduce power consumption for matrix multiply and reduction operations, for example.
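The operand sharing described in this abstract can be illustrated with a small behavioral model. This is a minimal sketch, not Apple's circuitry, and it rests on several assumptions. Each list element stands for one pipeline's thread-specific register. In every SIMD floating-point add, each lane's first input operand is routed from a lane chosen by a selection function, which may be the lane itself, and the second operand is the lane's own value. A butterfly (XOR-partner) routing pattern then leaves the full sum in every lane after log2(N) steps. The names `simd_step` and `butterfly_sum` and the butterfly pattern itself are hypothetical choices made for illustration.

```python
from typing import Callable, List


def simd_step(regs: List[float], select: Callable[[int], int]) -> List[float]:
    """One lockstep SIMD floating-point add across all lanes (a model).

    Lane i's first operand is routed from lane select(i), which may be lane i
    itself; its second operand is its own value. All reads see the values
    from before the step, as within a single SIMD instruction."""
    return [regs[select(i)] + regs[i] for i in range(len(regs))]


def butterfly_sum(regs: List[float]) -> List[float]:
    """Sum-reduce across lanes so that every lane ends with the total."""
    n = len(regs)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a nonzero power of two")
    offset = 1
    while offset < n:
        # Hypothetical routing pattern: lane i takes its partner lane i XOR offset.
        regs = simd_step(regs, lambda i, o=offset: i ^ o)
        offset <<= 1
    return regs


print(butterfly_sum([1.0, 2.0, 3.0, 4.0]))  # [10.0, 10.0, 10.0, 10.0]
```

A dot product could be modeled the same way: multiply per lane first, then apply the routed reduction.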
-
Publication No.: US11210761B1
Publication date: 2021-12-28
Application No.: US17131052
Application date: 2020-12-22
Applicant: Apple Inc.
Inventor: Liang-Kai Wang, Vinod Reddy Nalamalapu
IPC: G06T1/20
Abstract: Techniques are disclosed relating to selecting a number of candidates based on priority. In some embodiments, position determination circuitry receives an input vector that orders a set of potential candidates from a highest-priority position within the input vector to a lowest-priority position. In some embodiments, the circuitry determines, starting from a first end of the input vector and based on non-overlapping groups of candidates, a particular position within the input vector at which a threshold number of available candidates are found. This may include generating respective count values within the groups of candidates, identifying a transition group in which the particular position is located based on accumulation of the respective count values, and identifying the particular position within the transition group. Output circuitry may generate, based on the particular position, an output vector that indicates the threshold number of available candidates from the input vector.
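The group-count, accumulate, and scan steps in this abstract can be modeled in software. This is a minimal behavioral sketch, not the patented circuit. It assumes the input vector is a list of booleans (True = available), with index 0 as the highest-priority end, and it uses a hypothetical default group size of 4. The function names `find_threshold_position` and `select_candidates` are illustrative. So is the choice to select every available candidate when fewer than the threshold exist.

```python
from typing import List


def find_threshold_position(avail: List[bool], k: int, group_size: int = 4) -> int:
    """Return the index at which the k-th available candidate is found,
    scanning from index 0 (assumed highest priority). Return -1 if fewer
    than k candidates are available."""
    if k <= 0 or group_size <= 0:
        raise ValueError("k and group_size must be positive")
    # Step 1: count available candidates in each non-overlapping group.
    counts = [sum(avail[i:i + group_size]) for i in range(0, len(avail), group_size)]
    # Step 2: accumulate the counts to find the transition group, i.e. the
    # first group in which the running total reaches k.
    prefix = 0
    for g, c in enumerate(counts):
        if prefix + c >= k:
            # Step 3: scan only inside the transition group for the exact position.
            needed = k - prefix
            start = g * group_size
            for pos in range(start, min(start + group_size, len(avail))):
                if avail[pos]:
                    needed -= 1
                    if needed == 0:
                        return pos
        prefix += c
    return -1


def select_candidates(avail: List[bool], k: int, group_size: int = 4) -> List[bool]:
    """Output vector that marks the first k available candidates."""
    pos = find_threshold_position(avail, k, group_size)
    if pos < 0:
        # Assumption: with fewer than k available, select every available one.
        return list(avail)
    return [bool(a) and i <= pos for i, a in enumerate(avail)]


avail = [False, True, True, False, False, False, True, True, True, False]
print(find_threshold_position(avail, 3))  # 6
print(select_candidates(avail, 3))
```

With grouping, the fine-grained scan touches only the transition group rather than the whole vector. In hardware the per-group counts could presumably be computed in parallel. The Python loop here is sequential and models only the result.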
-
Publication No.: US20210109761A1
Publication date: 2021-04-15
Application No.: US16597625
Application date: 2019-10-09
Applicant: Apple Inc.
Inventor: Liang-Kai Wang, Robert D. Kenney, Terence M. Potter, Vinod Reddy Nalamalapu, Sivayya V. Ayinala
Abstract: Techniques are disclosed relating to sharing operands among SIMD threads for a larger arithmetic operation. In some embodiments, a set of multiple hardware pipelines is configured to execute single-instruction multiple-data (SIMD) instructions for multiple threads in parallel, where ones of the hardware pipelines include execution circuitry configured to perform floating-point operations using one or more pipeline stages of the pipeline and first routing circuitry configured to select, from among thread-specific operands stored for the hardware pipeline and from one or more other pipelines in the set, a first input operand for an operation by the execution circuitry. In some embodiments, a device is configured to perform a mathematical operation on source input data structures stored across thread-specific storage for the set of hardware pipelines, by executing multiple SIMD floating-point operations using the execution circuitry and the first routing circuitry. This may improve performance and reduce power consumption for matrix multiply and reduction operations, for example.
-