-
公开(公告)号:US20240103867A1
公开(公告)日:2024-03-28
申请号:US18521000
申请日:2023-11-28
Applicant: Intel Corporation
Inventor: Bret TOLL , Christopher J. HUGHES , Dan BAUM , Elmoustapha OULD-AHMED-VALL , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
IPC: G06F9/30
CPC classification number: G06F9/30145 , G06F9/30032 , G06F9/30036 , G06F9/30109
Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
-
2.
公开(公告)号:US20230350682A1
公开(公告)日:2023-11-02
申请号:US18309469
申请日:2023-04-28
Applicant: Intel Corporation
Inventor: Raanan SADE , Robert VALENTINE , Bret TOLL , Christopher J. HUGHES , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY
IPC: G06F9/30
CPC classification number: G06F9/30167 , G06F9/30149 , G06F9/30101
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
-
公开(公告)号:US20220100515A1
公开(公告)日:2022-03-31
申请号:US17549221
申请日:2021-12-13
Applicant: Intel Corporation
Inventor: Bret TOLL , Christopher J. HUGHES , Dan BAUM , Elmoustapha OULD-AHMED-VALL , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
-
公开(公告)号:US20220100505A1
公开(公告)日:2022-03-31
申请号:US17549363
申请日:2021-12-13
Applicant: Intel Corporation
Inventor: Bret TOLL , Christopher J. HUGHES , Dan BAUM , Elmoustapha OULD-AHMED-VALL , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
-
公开(公告)号:US20200241873A1
公开(公告)日:2020-07-30
申请号:US16487784
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Menachem ADELMAN , Zeev SPERBER , Mark J. CHARNEY , Bret L. TOLL , Jesus CORBAL , Alexander F. HEINECKE , Barukh ZIV , Elmoustapha OULD-AHMED-VALL , Stanislav SHWARTSMAN
Abstract: Embodiments detailed herein relate to matrix operations. In particular, performing a matrix operation of zeroing a matrix in response to a single instruction. For example, a processor detailed which includes decode circuitry to decode an instruction having fields for an opcode and a source/destination matrix operand identifier; and execution circuitry to execute the decoded instruction to zero each data element of the identified source/destination matrix.
-
公开(公告)号:US20190042261A1
公开(公告)日:2019-02-07
申请号:US16131382
申请日:2018-09-14
Applicant: Intel Corporation
Inventor: Christopher J. HUGHES , Bret TOLL , Dan BAUM , Elmoustapha OULD-AHMED-VALL , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying horizontal tile operations. In one example, a processor includes fetch circuitry to fetch an instruction specifying a horizontal tile operation, a location of a M by N source matrix comprising K groups of elements, and locations of K destinations, wherein each of the K groups of elements comprises the same number of elements, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction by generating K results, each result being generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups, and writing each generated result to a corresponding location of the K specified destination locations.
-
7.
公开(公告)号:US20240329991A1
公开(公告)日:2024-10-03
申请号:US18194327
申请日:2023-03-31
Applicant: Intel Corporation
Inventor: Martin LANGHAMMER , Alexander F. HEINECKE
IPC: G06F9/30
CPC classification number: G06F9/30145 , G06F9/3001 , G06F9/30025
Abstract: An apparatus of an aspect includes decoder circuitry to decode an instruction. The instruction to indicate at least one source floating-point vector, a destination storage location, and at least one value. The source floating-point vector is to have floating-point data elements. The at least one value is to indicate at least one of: (a) a number of significand bits of the floating-point data elements; (b) a number of exponent bits of the floating-point data elements; (c) exponent bias information for the floating-point data elements; or (d) any combination thereof. Execution circuitry coupled with decoder circuitry is to perform operations according to the instruction. The operations include to interpret the floating-point data elements consistent with the at least one value, perform an operation specified by the instruction on the at least one source floating-point vector to generate a result vector, and store the result vector in the destination storage location.
-
公开(公告)号:US20240126555A1
公开(公告)日:2024-04-18
申请号:US17967756
申请日:2022-10-17
Applicant: Intel Corporation
Inventor: Saurabh GAYEN , Christopher J. HUGHES , Utkarsh Y. KAKAIYA , Alexander F. HEINECKE
CPC classification number: G06F9/3836 , G06F9/3004 , G06F9/3555
Abstract: A method of an aspect includes receiving a request for a chained accelerator operation, and configuring a chain of accelerators to perform the chained accelerator operation. This may include configuring a first accelerator to access an input data from a source memory location in system memory, process the input data, and generate first intermediate data. This may also include configuring a second accelerator to receive the first intermediate data, without the first intermediate data having been sent to the system memory, process the first intermediate data, and generate additional data. Other apparatus, methods, systems, and machine-readable medium are disclosed.
-
公开(公告)号:US20240078285A1
公开(公告)日:2024-03-07
申请号:US18502291
申请日:2023-11-06
Applicant: Intel Corporation
Inventor: Dan BAUM , Chen KOREN , Elmoustapha OULD-AHMED-VALL , Michael ESPIG , Christopher J. HUGHES , Raanan SADE , Robert VALENTINE , Mark J. CHARNEY , Alexander F. HEINECKE
CPC classification number: G06F17/16 , G06F9/3001 , G06F9/30101 , G06F9/3016 , G06F9/3802
Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.
-
公开(公告)号:US20220291927A1
公开(公告)日:2022-09-15
申请号:US17706428
申请日:2022-03-28
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Menachem ADELMAN , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Milind B. GIRKAR , Zeev SPERBER , Mark J. CHARNEY , Rinat RAPPOPORT , Jesus CORBAL , Stanislav SHWARTSMAN , Igor YANOVER , Alexander F. HEINECKE , Barukh ZIV , Dan BAUM , Yuri GEBIL
Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information
-
-
-
-
-
-
-
-
-