-
Publication No.: US11544057B2
Publication date: 2023-01-03
Application No.: US17069230
Application date: 2020-10-13
Applicant: INTEL CORPORATION
Inventors: Gregory Henry, Alexander Heinecke
Abstract: Embodiments detailed herein relate to arithmetic operations on floating-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands whose values are in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution is to: convert the values for each operand, each value being converted into a plurality of lower-precision values, where an exponent is to be stored for each operand; perform arithmetic operations among the lower-precision values converted from the values for the plurality of operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
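Note: the conversion of one floating-point value into several lower-precision values can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not taken from the abstract: the source format is taken to be float32, the lower-precision format bfloat16 (obtained by truncation), and the number of parts three. The helper names `to_f32`, `to_bf16`, `split3`, and `emulated_mul` are hypothetical, and the per-operand exponent storage mentioned in the abstract is not modeled.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bf16(x: float) -> float:
    """Truncate a float32 value to bfloat16 by keeping its top 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def split3(x: float) -> list:
    """Split a float32 value into three bfloat16 parts that sum back to it."""
    parts = []
    residual = to_f32(x)
    for _ in range(3):
        part = to_bf16(residual)
        parts.append(part)
        residual = to_f32(residual - part)  # what the part did not capture
    return parts

def emulated_mul(a: float, b: float) -> float:
    """Multiply two float32 values using only products of bfloat16 parts."""
    total = 0.0
    for pa in split3(a):
        for pb in split3(b):
            total += pa * pb  # each partial product has few significant bits
    return to_f32(total)      # convert the result back to float32
```

Because a float32 significand has 24 bits and each truncated bfloat16 part carries up to 8, three parts are enough to reconstruct the input in this sketch; a hardware implementation may choose a different number of parts or drop the smallest partial products.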
-
Publication No.: US11669586B2
Publication date: 2023-06-06
Application No.: US17680483
Application date: 2022-02-25
Applicant: Intel Corporation
Inventors: Gregory Henry, Alexander Heinecke
CPC classification: G06F17/16, G06F9/3001, G06F9/3016, G06F9/45508
Abstract: The present disclosure relates to an apparatus that includes decoding circuitry that decodes a single instruction. The single instruction includes an identifier of a first source operand, an identifier of a second source operand, an identifier of a destination, and an opcode indicating that execution circuitry is to multiply the identified first source operand and the identified second source operand and store a result in the identified destination. Additionally, the apparatus includes execution circuitry to execute the single decoded instruction to calculate a dot product by calculating a plurality of products using data elements of the identified first and second operands using values less precise than the identified first and second source operands, summing the calculated products, and storing the summed products in the destination.
-
Publication No.: US10853067B2
Publication date: 2020-12-01
Application No.: US16144964
Application date: 2018-09-27
Applicant: INTEL CORPORATION
Inventors: Gregory Henry, Alexander Heinecke
Abstract: Embodiments detailed herein relate to arithmetic operations on floating-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands whose values are in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution is to: convert the values for each operand, each value being converted into a plurality of lower-precision values, where an exponent is to be stored for each operand; perform arithmetic operations among the lower-precision values converted from the values for the plurality of operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
-
Publication No.: US20240111826A1
Publication date: 2024-04-04
Application No.: US17937252
Application date: 2022-09-30
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Kevin Hurd, Changwon Rhee, Jorge Parra, Fangwen Fu, Theo Drane, William Zorn, Peter Caday, Gregory Henry, Guei-Yuan Lueh, Farzad Chehrazi, Amit Karande, Turbo Majumder, Xinmin Tian, Milind Girkar, Hong Jiang
CPC classification: G06F17/16, G06F7/5443, G06T1/20
Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit is to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as a final result in the DP floating-point format.
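Note: the effect of accumulating products in a wider intermediate format and rounding only once at the end can be shown with a small sketch. This is an illustration under stated assumptions, not the circuitry described above: exact rational arithmetic (`fractions.Fraction`) stands in for the unspecified high precision intermediate format, and the function names `dot_naive` and `dot_wide` are hypothetical.

```python
from fractions import Fraction

def dot_naive(a, b, c=0.0):
    """Dot product plus addend c, rounding to double after every step."""
    acc = c
    for x, y in zip(a, b):
        acc += x * y
    return acc

def dot_wide(a, b, c=0.0):
    """Same computation, accumulated exactly and rounded to double once."""
    acc = Fraction(c)  # stand-in for a high precision intermediate format
    for x, y in zip(a, b):
        acc += Fraction(x) * Fraction(y)
    return float(acc)  # single down-conversion and rounding at the end
```

With `a = [1e16, 1.0, -1e16]` and `b = [1.0, 1.0, 1.0]`, `dot_naive` returns `0.0` because the `1.0` is lost when added to `1e16`, while `dot_wide` returns `1.0`.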
-
Publication No.: US11868770B2
Publication date: 2024-01-09
Application No.: US18091157
Application date: 2022-12-29
Applicant: INTEL CORPORATION
Inventors: Gregory Henry, Alexander Heinecke
CPC classification: G06F9/30014, G06F1/16, G06F7/483, G06F7/485, G06F7/4876, G06F7/5324, G06F7/5443, G06F9/30025, G06F9/30145
Abstract: Embodiments detailed herein relate to arithmetic operations on floating-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands whose values are in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution is to: convert the values for each operand, each value being converted into a plurality of lower-precision values, where an exponent is to be stored for each operand; perform arithmetic operations among the lower-precision values converted from the values for the plurality of operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
-
Publication No.: US11126428B2
Publication date: 2021-09-21
Application No.: US17125846
Application date: 2020-12-17
Applicant: INTEL CORPORATION
Inventors: Gregory Henry, Alexander Heinecke
Abstract: Embodiments detailed herein relate to arithmetic operations on floating-point values. An exemplary processor includes decoding circuitry to decode an instruction, where the instruction specifies locations of a plurality of operands whose values are in a floating-point format. The exemplary processor further includes execution circuitry to execute the decoded instruction, where the execution is to: convert the values for each operand, each value being converted into a plurality of lower-precision values, where an exponent is to be stored for each operand; perform arithmetic operations among the lower-precision values converted from the values for the plurality of operands; and generate a floating-point value by converting a resulting value from the arithmetic operations into the floating-point format and store the floating-point value.
-
Publication No.: US20240220323A1
Publication date: 2024-07-04
Application No.: US18149045
Application date: 2022-12-30
Applicant: Intel Corporation
CPC classification: G06F9/5027, G06F5/012, G06F7/4876
Abstract: Systems, methods, and apparatuses relating to floating-point support circuitry to implement floating-point operations on a two-dimensional grid of fixed-point processing elements are described. In one example, a hardware processor includes a two-dimensional grid of fixed-point processing elements; floating-point support circuitry coupled to the two-dimensional grid of fixed-point processing elements; storage for first, second, and destination two-dimensional floating-point matrices coupled to the floating-point support circuitry; and controller circuitry to cause the two-dimensional grid of fixed-point processing elements and the floating-point support circuitry to: determine, by the floating-point support circuitry, an extreme exponent for each row of the first two-dimensional floating-point matrix and for each column of the second two-dimensional floating-point matrix, generate, by the floating-point support circuitry, a first fixed-point matrix from the first two-dimensional floating-point matrix and a second fixed-point matrix from the second two-dimensional floating-point matrix, generate, by the two-dimensional grid of fixed-point processing elements, corresponding fixed-point results by a multiplication of fixed-point elements of the first fixed-point matrix by corresponding fixed-point elements of the second fixed-point matrix, scale, by the floating-point support circuitry, the corresponding fixed-point results according to the extreme exponents to generate scaled fixed-point results, generate, by the floating-point support circuitry, a resultant floating-point matrix from the scaled fixed-point results, and store the resultant floating-point matrix into the destination two-dimensional floating-point matrix.
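Note: the sequence of steps in this abstract (per-row and per-column extreme exponents, conversion to fixed point, integer multiplication, rescaling) can be sketched in software. This is a minimal illustration under assumptions that are not in the abstract: the extreme exponent is taken to be the maximum binary exponent, the fixed-point width `FRAC_BITS` is an arbitrary choice, and the function names are hypothetical.

```python
import math

FRAC_BITS = 20  # assumed number of fractional bits in the fixed-point format

def max_exponent(values):
    """Largest binary exponent among the nonzero values (0 if all are zero)."""
    return max((math.frexp(v)[1] for v in values if v != 0.0), default=0)

def to_fixed(values, exp):
    """Scale values by 2**(FRAC_BITS - exp) and round them to integers."""
    return [round(math.ldexp(v, FRAC_BITS - exp)) for v in values]

def block_fixed_matmul(a, b):
    """Multiply matrices a and b (lists of rows) using integer arithmetic."""
    cols = [list(col) for col in zip(*b)]
    row_exp = [max_exponent(row) for row in a]   # one exponent per row of a
    col_exp = [max_exponent(col) for col in cols]  # one per column of b
    a_fix = [to_fixed(row, e) for row, e in zip(a, row_exp)]
    b_fix = [to_fixed(col, e) for col, e in zip(cols, col_exp)]
    result = []
    for fa, ea in zip(a_fix, row_exp):
        out_row = []
        for fb, eb in zip(b_fix, col_exp):
            acc = sum(x * y for x, y in zip(fa, fb))  # integer multiply-adds
            # Undo both scalings to return to a floating-point value.
            out_row.append(math.ldexp(acc, ea + eb - 2 * FRAC_BITS))
        result.append(out_row)
    return result
```

Sharing one exponent per row or column means small elements lose low-order bits relative to the largest element in the same row or column, so the result is an approximation whose accuracy depends on `FRAC_BITS`.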
-
Publication No.: US20240111825A1
Publication date: 2024-04-04
Application No.: US17937229
Application date: 2022-09-30
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Changwon Rhee, Kevin Hurd, Gregory Henry, Peter Caday, Kristopher Wong
Abstract: An apparatus to facilitate single precision support for systolic pipeline in a graphics environment is disclosed. The apparatus includes a processor comprising systolic array hardware including a plurality of data processing units, wherein the systolic array hardware is to: receive data for performance of a matrix multiplication operation in a first precision format; convert an original value of the data into two split values with a second precision format having a lower precision than the first precision format; perform the matrix multiplication operation using the two split values in the second precision format, the matrix multiplication operation comprising a split-term operation that utilizes two passes through the systolic array hardware with feedback wiring and local reduction; and generate an emulated result for the matrix multiplication operation in the first precision format.
-
Publication No.: US20230086275A1
Publication date: 2023-03-23
Application No.: US17482166
Application date: 2021-09-22
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Changwon Rhee, Sabareesh Ganapathy, Gregory Henry, Fangwen Fu
Abstract: Emulating floating point calculation using lower precision format calculations is described. An example of a processor includes a floating point unit (FPU) to provide a native floating point operation in a first precision format; and systolic array hardware including multiple data processing units, wherein the processor is to receive data for performance of a matrix multiplication operation in the first precision format; enable an emulated floating point multiplication operation using one or more values with a second precision format, the second precision format having a lower precision than the first precision format, the emulated floating point multiplication including operation of the systolic array hardware; and generate an emulated result for the matrix multiplication operation.
-
Publication No.: US20220391470A1
Publication date: 2022-12-08
Application No.: US17680483
Application date: 2022-02-25
Applicant: Intel Corporation
Inventors: Gregory Henry, Alexander Heinecke
Abstract: The present disclosure relates to an apparatus that includes decoding circuitry that decodes a single instruction. The single instruction includes an identifier of a first source operand, an identifier of a second source operand, an identifier of a destination, and an opcode indicating that execution circuitry is to multiply the identified first source operand and the identified second source operand and store a result in the identified destination. Additionally, the apparatus includes execution circuitry to execute the single decoded instruction to calculate a dot product by calculating a plurality of products using data elements of the identified first and second operands using values less precise than the identified first and second source operands, summing the calculated products, and storing the summed products in the destination.