-
公开(公告)号:US11372621B2
公开(公告)日:2022-06-28
申请号:US16893051
申请日:2020-06-04
Applicant: Apple Inc.
Inventor: Anthony Y. Tai , Liang-Kai Wang , Ian R. Ollmann , Anand Poovekurussi
IPC: G06F7/483 , G06F9/30 , G06F7/499 , G06F9/38 , G06F1/3206
Abstract: Techniques are disclosed relating to floating-point circuitry configured to perform a corner check instruction for a floating-point power operation. In some embodiments, the power operation is performed by executing multiple instructions, including one or more instructions specify to generate an initial power result of a first input raised to the power of a second input as 2(second input*log2(first input)). In some embodiments, the corner check instruction operates on the first and second inputs and outputs output a corrected power result based on detection of a corner condition for the first and second inputs. Corner check circuitry may share circuits with other datapaths. In various embodiments, the disclosed techniques may reduce code size and power consumption for the power operation.
-
公开(公告)号:US20210149679A1
公开(公告)日:2021-05-20
申请号:US16686060
申请日:2019-11-15
Applicant: Apple Inc.
Inventor: Christopher A. Burns , Liang-Kai Wang , Robert D. Kenney , Terence M. Potter
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
-
公开(公告)号:US20210055931A1
公开(公告)日:2021-02-25
申请号:US16548812
申请日:2019-08-22
Applicant: Apple Inc.
Inventor: Robert D. Kenney , Liang-Kai Wang , Terence M. Potter
Abstract: Techniques are disclosed relating to routing circuitry configured to perform permute operations for operands of threads in a single-instruction multiple-data group. In some embodiments, an apparatus includes hierarchical operand routing circuitry configured to route operands between a set of single-instruction multiple-data (SIMD) pipelines based on a permute instruction. In some embodiments, the routing circuitry includes a first level and a second level. The first level may include a set of multiple crossbar circuits each configured to receive operands from a respective subset of the pipelines and output one or more of the received operands on multiple output lines based on the permute instruction, where the crossbar circuits support full permutation within a respective subset. A second level may be configured to select an operand from a previous level for each of the pipelines, and may select from among only a portion of output operands from the previous level to provide an operand for a respective pipeline.
-
公开(公告)号:US10282169B2
公开(公告)日:2019-05-07
申请号:US15092401
申请日:2016-04-06
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Terence M. Potter , Andrew M. Havlir , Yu Sun , Nicolas X. Pena , Xiao-Long Wu , Christopher A. Burns
Abstract: Techniques are disclosed relating to floating-point operations with down-conversion. In some embodiments, a floating-point unit is configured to perform fused multiply-addition operations based on first and second different instruction types. In some embodiments, the first instruction type specifies result in the first floating-point format and the second instruction type specifies fused multiply addition of input operands in the first floating-point format to generate a result in a second, lower-precision floating-point format. For example, the first format may be a 32-bit format and the second format may be a 16-bit format. In some embodiments, the floating-point unit includes rounding circuitry, exponent circuitry, and/or increment circuitry configured to generate signals for the second instruction type in the same pipeline stage as for the first instruction type. In some embodiments, disclosed techniques may reduce the number of pipeline stages included in the floating-point circuitry.
-
公开(公告)号:US20190034166A1
公开(公告)日:2019-01-31
申请号:US16146147
申请日:2018-09-28
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Terence M. Potter , Brian K. Reynolds , Justin Friesenhahn
IPC: G06F7/505
Abstract: Techniques are disclosed relating to performing arithmetic operations to generate values for different related threads. In some embodiments, the threads are graphics threads and the values are operand locations. In some embodiments, an apparatus includes circuitry configured to generate results for multiple threads by performing a plurality of arithmetic operations indicated by an instruction. In some embodiments, the instruction specifies: an input value that is common to the multiple threads and, for at least one of the multiple threads, a type value that indicates whether to generate a result for the thread by performing an arithmetic operation based on a first input that is a result of an arithmetic operation from another thread of the multiple threads or to generate a result for the thread using the input value that is common to the multiple threads. In some embodiments, the circuitry is configured to generate a result for the at least one of the multiple threads by selectively performing the arithmetic operation or using the input value that is common to the multiple threads based on the type value.
-
公开(公告)号:US09846579B1
公开(公告)日:2017-12-19
申请号:US15180725
申请日:2016-06-13
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Terence M. Potter , Andrew M. Havlir
CPC classification number: G06F9/30021 , G06F9/3001 , G06F9/30083
Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
-
公开(公告)号:US12008377B2
公开(公告)日:2024-06-11
申请号:US18299452
申请日:2023-04-12
Applicant: Apple Inc.
Inventor: Christopher A. Burns , Liang-Kai Wang , Robert D. Kenney , Terence M. Potter
CPC classification number: G06F9/3887 , G06F9/30098 , G06T1/20 , G06T1/60
Abstract: Techniques are disclosed relating to operand routing among SIMD pipelines. In some embodiments, an apparatus includes a set of multiple hardware pipelines configured to execute a single-instruction multiple-data (SIMD) instruction for multiple threads in parallel, wherein the instruction specifies first and second architectural registers. In some embodiments, the pipelines include execution circuitry configured to perform operations using one or more pipeline stages of the pipeline. In some embodiments, the pipelines include routing circuitry configured to select, based on the instruction, a first input operand for the execution circuitry from among: a value from the first architectural register from thread-specific storage for another pipeline and a value from the second architectural register from thread-specific storage for a thread assigned to another pipeline. In some embodiments, the routing circuitry may support a shift and fill instruction that facilitates storage of an arbitrary portion of a graphics frame in one or more registers.
-
公开(公告)号:US20240061650A1
公开(公告)日:2024-02-22
申请号:US17820766
申请日:2022-08-18
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Ian R. Ollmann , Anthony Y. Tai
CPC classification number: G06F7/556 , G06F7/4873
Abstract: Techniques are disclosed relating to polynomial approximation of the base-2 logarithm. In some embodiments, floating-point circuitry is configured to perform an approximation of a base-2 logarithm operation and provide a fixed unit of least precision (ULP) error over a range of inputs. In some embodiments, the floating-point circuitry includes a set of parallel pipelines for polynomial approximation, where the output is chosen from a particular pipeline based on a determination of whether the input operand is in a first subset of a range of inputs. Disclosed techniques may advantageously provide fixed ULP error for an entire input operand range for the floating-point base-2 logarithmic function with minimal area and energy footprint, relative to traditional techniques.
-
公开(公告)号:US20240053960A1
公开(公告)日:2024-02-15
申请号:US18489640
申请日:2023-10-18
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Ian R. Ollmann , Anthony Y. Tai
CPC classification number: G06F7/4873 , G06F7/49984 , G06F7/4876
Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.
-
公开(公告)号:US11210761B1
公开(公告)日:2021-12-28
申请号:US17131052
申请日:2020-12-22
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Vinod Reddy Nalamalapu
IPC: G06T1/20
Abstract: Techniques are disclosed relating to selecting a number of candidates based on priority. In some embodiments, position determination circuitry receives an input vector that orders a set of potential candidates from a highest-priority position within the input vector to a lowest priority position. In some embodiments, it determines, starting from a first end of the input vector and based on non-overlapping groups of candidates, a particular position within the input vector at which a threshold number of available candidate are found. This may include to generate respective count values within the groups of candidates, identify a transition group in which the particular position is located based on accumulation of the respective count values, and identify the particular position within the transition group. Output circuitry may generate, based on the particular position, an output vector that indicates the threshold number of available candidates from the input vector.
-
-
-
-
-
-
-
-
-