Computer processor for higher precision computations using a mixed-precision decomposition of operations

    公开(公告)号:US11544057B2

    公开(公告)日:2023-01-03

    申请号:US17069230

    申请日:2020-10-13

    申请人: INTEL CORPORATION

    摘要: Embodiments detailed herein relate to arithmetic operations of float-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands, values of which being in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.

    Systems and methods for combining low-mantissa units to achieve and exceed FP64 emulation of matrix multiplication

    公开(公告)号:US11669586B2

    公开(公告)日:2023-06-06

    申请号:US17680483

    申请日:2022-02-25

    申请人: Intel Corporation

    IPC分类号: G06F17/16 G06F9/455 G06F9/30

    摘要: The present disclosure relates to an apparatus that includes decoding circuitry that decodes a single instruction. The single instruction includes an identifier of a first source operand, an identifier of a second source operand, an identifier of a destination, and an opcode indicative of execution circuitry is to multiply from the identified first source operand and the identified second source operand and store a result in the identified destination. Additionally, the apparatus includes execution circuitry to execute the single decoded instruction to calculate a dot product by calculating a plurality of products using data elements of the identified first and second operands using values less precise than the identified first and second source operands, summing the calculated products, and storing the summed products in the destination.

    Computer processor for higher precision computations using a mixed-precision decomposition of operations

    公开(公告)号:US10853067B2

    公开(公告)日:2020-12-01

    申请号:US16144964

    申请日:2018-09-27

    申请人: INTEL CORPORATION

    摘要: Embodiments detailed herein relate to arithmetic operations of float-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands, values of which being in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.

    Computer processor for higher precision computations using a mixed-precision decomposition of operations

    公开(公告)号:US11126428B2

    公开(公告)日:2021-09-21

    申请号:US17125846

    申请日:2020-12-17

    申请人: INTEL CORPORATION

    摘要: Embodiments detailed herein relate to arithmetic operations of float-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands, values of which being in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution includes to: convert the values for each operand, each value being converted into a plurality of lower precision values, where an exponent is to be stored for each operand; perform arithmetic operations among lower precision values converted from values for the plurality of the operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.

    APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR LOADING A TILE OF A MATRIX OPERATIONS ACCELERATOR

    公开(公告)号:US20240220323A1

    公开(公告)日:2024-07-04

    申请号:US18149045

    申请日:2022-12-30

    申请人: Intel Corporation

    IPC分类号: G06F9/50 G06F5/01 G06F7/487

    摘要: Systems, methods, and apparatuses relating to floating-point support circuitry to implement floating-point operations on a two-dimensional grid of fixed-point processing elements are described. In one example, a hardware processor includes a two-dimensional grid of fixed-point processing elements; floating-point support circuitry coupled to the two-dimensional grid of fixed-point processing elements; storage for a first, a second, and a destination two-dimensional floating-point matrices coupled to the floating-point support circuitry; and controller circuitry to cause the two-dimensional grid of fixed-point processing elements and the floating-point support circuitry to: determine, by the floating-point support circuitry, an extreme exponent for each row of the first two-dimensional floating-point matrix and for each column of the second two-dimensional floating-point matrix, generate, by the floating-point support circuitry, a first fixed-point matrix from the first two-dimensional floating-point matrix and a second fixed-point matrix from the second two-dimensional floating-point matrix, generate, by the two-dimensional grid of fixed-point processing elements, corresponding fixed-point results by a multiplication of fixed-point elements of the first fixed-point matrix by corresponding fixed-point elements of the second fixed-point matrix, scale, by the floating-point support circuitry, the corresponding fixed-point results according to the extreme exponents to generate scaled fixed-point results, generate, by the floating-point support circuitry, a resultant floating-point matrix from the scaled fixed-point results, and store the resultant floating-point matrix into the destination two-dimensional floating-point matrix.

    SINGLE PRECISION SUPPORT FOR SYSTOLIC PIPELINE IN A GRAPHICS ENVIRONMENT

    公开(公告)号:US20240111825A1

    公开(公告)日:2024-04-04

    申请号:US17937229

    申请日:2022-09-30

    申请人: Intel Corporation

    IPC分类号: G06F17/16 G06F7/483

    CPC分类号: G06F17/16 G06F7/483

    摘要: An apparatus to facilitate single precision support for systolic pipeline in a graphics environment is disclosed. The apparatus includes a processor comprising systolic array hardware including a plurality of data processing units, wherein the systolic array hardware is to: receive data for performance of a matrix multiplication operation in a first precision format; convert an original value of the data into two split values with a second precision format having a lower precision than the first precision format; perform the matrix multiplication operation using the two split values in the second precision format, the matrix multiplication operation comprising a split-term operation that utilizes two passes through the systolic array hardware with feedback wiring and local reduction; and generate an emulated result for the matrix multiplication operation in the first precision format.

    EMULATION OF FLOATING POINT CALCULATION

    公开(公告)号:US20230086275A1

    公开(公告)日:2023-03-23

    申请号:US17482166

    申请日:2021-09-22

    申请人: Intel Corporation

    摘要: Emulating floating point calculation using lower precision format calculations is described. An example of a processor includes a floating point unit (FPU) to provide a native floating point operation in a first precision format; and systolic array hardware including multiple data processing units, wherein the processor is to receive data for performance of a matrix multiplication operation in the first precision format; enable an emulated floating point multiplication operation using one or more values with a second precision format, the second precision format having a lower precision than the first precision format, the emulated floating point multiplication including operation of the systolic array hardware; and generate an emulated result for the matrix multiplication operation.

    Systems and Methods for Combining Low-Mantissa Units to Achieve and Exceed FP64 Emulation of Matrix Multiplication

    公开(公告)号:US20220391470A1

    公开(公告)日:2022-12-08

    申请号:US17680483

    申请日:2022-02-25

    申请人: Intel Corporation

    IPC分类号: G06F17/16 G06F9/455 G06F9/30

    摘要: The present disclosure relates to an apparatus that includes decoding circuitry that decodes a single instruction. The single instruction includes an identifier of a first source operand, an identifier of a second source operand, an identifier of a destination, and an opcode indicative of execution circuitry is to multiply from the identified first source operand and the identified second source operand and store a result in the identified destination. Additionally, the apparatus includes execution circuitry to execute the single decoded instruction to calculate a dot product by calculating a plurality of products using data elements of the identified first and second operands using values less precise than the identified first and second source operands, summing the calculated products, and storing the summed products in the destination.