-
Publication No.: US20220317971A1
Publication Date: 2022-10-06
Application No.: US17218041
Application Date: 2021-03-30
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Ian R. Ollmann , Anthony Y. Tai
Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.
-
Publication No.: US11836459B2
Publication Date: 2023-12-05
Application No.: US17218041
Application Date: 2021-03-30
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Ian R. Ollmann , Anthony Y. Tai
CPC classification number: G06F7/4873 , G06F7/4876 , G06F7/49984
Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.
-
Publication No.: US20140313214A1
Publication Date: 2014-10-23
Application No.: US14254801
Application Date: 2014-04-16
Applicant: Apple Inc.
Inventor: Aaftab A. Munshi , Ian R. Ollmann
IPC: G06T1/60
CPC classification number: G09G5/001 , G06F9/5044 , G06T1/60 , G09G2360/127
Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
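As a rough illustration of the subbuffer scheme in this abstract, the Python sketch below models compute units as objects that own their memory. All class and method names (`ComputeUnit`, `ParentBuffer`, `SubBuffer`, `write`, `sync_back`) are hypothetical. A subbuffer on the parent's compute unit writes straight into the parent's storage. A subbuffer on a different unit gets a private copy in that unit's memory, marks itself dirty on writes, and transfers its updates back to its region of the parent when synchronized.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class ComputeUnit:
    """Hypothetical compute unit (CPU or GPU) with its own memory space."""
    name: str
    memory: dict = field(default_factory=dict)  # allocation id -> bytearray

class ParentBuffer:
    def __init__(self, unit: ComputeUnit, data: bytes):
        self.unit = unit
        self.data = bytearray(data)

    def create_subbuffer(self, unit: ComputeUnit, offset: int, size: int) -> "SubBuffer":
        return SubBuffer(self, unit, offset, size)

class SubBuffer:
    def __init__(self, parent: ParentBuffer, unit: ComputeUnit, offset: int, size: int):
        self.parent, self.unit = parent, unit
        self.offset, self.size = offset, size
        self.dirty = False
        if unit is parent.unit:
            self.local = None  # same unit: operate directly on the parent's storage
        else:
            # Different unit: copy this region into that unit's memory.
            self.local = bytearray(parent.data[offset:offset + size])
            unit.memory[id(self)] = self.local

    def write(self, index: int, value: int) -> None:
        if not 0 <= index < self.size:
            raise IndexError("subbuffer index out of range")
        if self.local is None:
            self.parent.data[self.offset + index] = value
        else:
            self.local[index] = value
            self.dirty = True  # track updates made on the remote unit

    def sync_back(self) -> None:
        """Transfer tracked updates from the remote copy back to the subbuffer's region."""
        if self.local is not None and self.dirty:
            self.parent.data[self.offset:self.offset + self.size] = self.local
            self.dirty = False

# Usage: split an 8-byte buffer between a CPU and a GPU.
cpu, gpu = ComputeUnit("cpu"), ComputeUnit("gpu")
parent = ParentBuffer(cpu, bytes(8))
subs = [parent.create_subbuffer(cpu, 0, 4), parent.create_subbuffer(gpu, 4, 4)]
for sub in subs:
    for i in range(sub.size):
        sub.write(i, i + 1)  # stand-in for a per-unit kernel
for sub in subs:
    sub.sync_back()
print(list(parent.data))  # [1, 2, 3, 4, 1, 2, 3, 4]
```

Copying only when the units differ keeps same-device subbuffers zero-copy in this model. A real runtime would also have to order the synchronization against kernel completion.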
-
Publication No.: US20210382687A1
Publication Date: 2021-12-09
Application No.: US16893051
Application Date: 2020-06-04
Applicant: Apple Inc.
Inventor: Anthony Y. Tai , Liang-Kai Wang , Ian R. Ollmann , Anand Poovekurussi
IPC: G06F7/483 , G06F7/499 , G06F1/3206 , G06F9/30 , G06F9/38
Abstract: Techniques are disclosed relating to floating-point circuitry configured to perform a corner check instruction for a floating-point power operation. In some embodiments, the power operation is performed by executing multiple instructions, including one or more instructions that specify generating an initial power result of a first input raised to the power of a second input as 2^(second input*log2(first input)). In some embodiments, the corner check instruction operates on the first and second inputs and outputs a corrected power result based on detection of a corner condition for the first and second inputs. Corner check circuitry may share circuits with other datapaths. In various embodiments, the disclosed techniques may reduce code size and power consumption for the power operation.
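The power abstract splits x^y into an initial result 2^(y*log2(x)) and a separate corner check for special inputs. The following is a minimal Python sketch of that split, under assumptions not taken from the patent. Python's `**` and `math.log2` stand in for the hardware exp2 and log2 steps. The initial step works on |x|. The corner rules follow the C99 Annex F `pow` special cases. The function names are hypothetical.

```python
import math

def initial_power(x: float, y: float) -> float:
    """Initial result 2**(y * log2|x|); Python's ** stands in for an exp2 step."""
    ax = abs(x)
    if ax == 0.0 or math.isnan(ax):
        return math.nan  # left entirely to the corner check
    t = y * math.log2(ax)
    try:
        return 2.0 ** t
    except OverflowError:
        return math.inf

def _is_odd_integer(y: float) -> bool:
    return math.isfinite(y) and y == math.floor(y) and math.fmod(y, 2.0) != 0.0

def corner_check(x: float, y: float, initial: float) -> float:
    """Correct the initial result for special inputs (C99 Annex F pow rules)."""
    if y == 0.0 or x == 1.0:
        return 1.0                       # even when the other input is NaN
    if math.isnan(x) or math.isnan(y):
        return math.nan
    odd = _is_odd_integer(y)
    if x == 0.0:                         # +0 or -0 base
        if y < 0.0:
            return math.copysign(math.inf, x) if odd else math.inf
        return x if odd else 0.0
    if math.isinf(y):
        if x == -1.0:
            return 1.0
        return math.inf if (abs(x) > 1.0) == (y > 0.0) else 0.0
    if math.isinf(x):
        mag = math.inf if y > 0.0 else 0.0
        return -mag if (x < 0.0 and odd) else mag
    if x < 0.0:
        if y != math.floor(y):
            return math.nan              # negative base, non-integer exponent
        return -initial if odd else initial
    return initial                       # ordinary case: keep the initial result

def power(x: float, y: float) -> float:
    return corner_check(x, y, initial_power(x, y))
```

The ordinary path here is only as accurate as 2**(y*log2 x) evaluated in doubles, which loses precision as |y*log2 x| grows. The sketch only illustrates how the corner check can be kept separate from the initial computation.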
-
Publication No.: US09691346B2
Publication Date: 2017-06-27
Application No.: US14575261
Application Date: 2014-12-18
Applicant: Apple Inc.
Inventor: Aaftab A. Munshi , Ian R. Ollmann
CPC classification number: G09G5/001 , G06F9/5044 , G06T1/60 , G09G2360/127
Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
-
Publication No.: US08957906B2
Publication Date: 2015-02-17
Application No.: US14254801
Application Date: 2014-04-16
Applicant: Apple Inc.
Inventor: Aaftab A. Munshi , Ian R. Ollmann
IPC: G06F15/167 , G06T1/60
CPC classification number: G09G5/001 , G06F9/5044 , G06T1/60 , G09G2360/127
Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
-
Publication No.: US11372621B2
Publication Date: 2022-06-28
Application No.: US16893051
Application Date: 2020-06-04
Applicant: Apple Inc.
Inventor: Anthony Y. Tai , Liang-Kai Wang , Ian R. Ollmann , Anand Poovekurussi
IPC: G06F7/483 , G06F9/30 , G06F7/499 , G06F9/38 , G06F1/3206
Abstract: Techniques are disclosed relating to floating-point circuitry configured to perform a corner check instruction for a floating-point power operation. In some embodiments, the power operation is performed by executing multiple instructions, including one or more instructions that specify generating an initial power result of a first input raised to the power of a second input as 2^(second input*log2(first input)). In some embodiments, the corner check instruction operates on the first and second inputs and outputs a corrected power result based on detection of a corner condition for the first and second inputs. Corner check circuitry may share circuits with other datapaths. In various embodiments, the disclosed techniques may reduce code size and power consumption for the power operation.
-
Publication No.: US20240061650A1
Publication Date: 2024-02-22
Application No.: US17820766
Application Date: 2022-08-18
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Ian R. Ollmann , Anthony Y. Tai
CPC classification number: G06F7/556 , G06F7/4873
Abstract: Techniques are disclosed relating to polynomial approximation of the base-2 logarithm. In some embodiments, floating-point circuitry is configured to perform an approximation of a base-2 logarithm operation and provide a fixed unit of least precision (ULP) error over a range of inputs. In some embodiments, the floating-point circuitry includes a set of parallel pipelines for polynomial approximation, where the output is chosen from a particular pipeline based on a determination of whether the input operand is in a first subset of a range of inputs. Disclosed techniques may advantageously provide fixed ULP error for an entire input operand range for the floating-point base-2 logarithmic function with minimal area and energy footprint, relative to traditional techniques.
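The following Python sketch illustrates one way to evaluate two approximation pipelines and select one by input range. It rests on several assumptions, none of them from the patent:

- The "first subset" is taken to be [0.75, 1.5).
- Both pipelines compute ln(1 + t) with a truncated odd polynomial in s = t/(2 + t), which is the atanh series.
- `NEAR_ONE_LO`, `NEAR_ONE_HI`, and the pipeline function names are hypothetical.

The near-one pipeline avoids the cancellation that log2(m) + e suffers when x is just below 1 (m close to 2, e = -1). Python evaluates both pipelines before selecting, to mimic parallel hardware. The sketch does not claim the fixed ULP bound described in the abstract.

```python
import math

INV_LN2 = 1.0 / math.log(2.0)          # converts a natural log to base 2
NEAR_ONE_LO, NEAR_ONE_HI = 0.75, 1.5   # hypothetical "first subset" of inputs

def _ln1p_series(t: float, terms: int = 20) -> float:
    """ln(1 + t) = 2*atanh(s) with s = t/(2 + t), as an odd polynomial in s."""
    s = t / (2.0 + t)
    s2 = s * s
    acc = 0.0
    # Horner evaluation of 1 + s2/3 + s2**2/5 + ... (terms coefficients)
    for k in range(terms - 1, -1, -1):
        acc = acc * s2 + 1.0 / (2 * k + 1)
    return 2.0 * s * acc

def pipeline_near_one(x: float) -> float:
    # x - 1 is exact for 0.5 <= x <= 2 (Sterbenz lemma): no cancellation near 1.
    return _ln1p_series(x - 1.0) * INV_LN2

def pipeline_general(x: float) -> float:
    m, e = math.frexp(x)       # x = m * 2**e with 0.5 <= m < 1
    m, e = 2.0 * m, e - 1      # renormalize to 1 <= m < 2
    return _ln1p_series(m - 1.0) * INV_LN2 + e

def log2_approx(x: float) -> float:
    if not (x > 0.0 and math.isfinite(x)):
        raise ValueError("sketch handles finite positive inputs only")
    near = pipeline_near_one(x)      # both "pipelines" run unconditionally
    general = pipeline_general(x)
    return near if NEAR_ONE_LO <= x < NEAR_ONE_HI else general
```

Selecting by range keeps each pipeline's polynomial short over the inputs it actually serves. The general pipeline's result is still accurate in absolute terms near 1, but its relative error grows there.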
-
Publication No.: US20240053960A1
Publication Date: 2024-02-15
Application No.: US18489640
Application Date: 2023-10-18
Applicant: Apple Inc.
Inventor: Liang-Kai Wang , Ian R. Ollmann , Anthony Y. Tai
CPC classification number: G06F7/4873 , G06F7/49984 , G06F7/4876
Abstract: Techniques are disclosed relating to circuitry for floating-point division. In some embodiments, the circuitry is configured to generate a subnormal result for a division operation that divides a numerator by a denominator. The circuitry may include floating-point circuitry configured to perform a reciprocal operation to determine a normalized mantissa value for the reciprocal of a floating-point representation of the denominator. The circuitry may further include fixed-point circuitry configured to multiply a fixed-point representation of the normalized mantissa value for the reciprocal by a mantissa of the numerator to generate an initial value. Control circuitry may determine error data for the initial value and generate a final subnormal mantissa result for the division operation based on the error data and the initial value. Embodiments with multiple modes with different accuracy guarantees are disclosed.
-
Publication No.: US20150187322A1
Publication Date: 2015-07-02
Application No.: US14575261
Application Date: 2014-12-18
Applicant: Apple Inc.
Inventor: Aaftab A. Munshi , Ian R. Ollmann
CPC classification number: G09G5/001 , G06F9/5044 , G06T1/60 , G09G2360/127
Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
-