Patent search ap:("Intel Corporation") AND inv:"Amit Gradstein" Page 2

11.

发明申请
MATRIX OPERATION WITH MULTIPLE TILES PER MATRIX DIMENSION 有权

公开(公告)号：US20230094414A1

公开(公告)日：2023-03-30

申请号：US17484200

申请日：2021-09-24

Applicant: Intel Corporation

Inventor： Menachem Adelman , Amit Gradstein , Simon Rubanovich

IPC: G06F17/16

Abstract: An embodiment of an apparatus comprises a systolic array to perform a matrix operation on two input tiles to produce an output tile result, and circuitry coupled to the systolic array to cause the systolic array to perform respective full matrix operations on more than one tile per matrix dimension in response to a single request. Other embodiments are disclosed and claimed.

12.

发明授权
Systems, apparatuses, and methods for fused multiply add 有权

公开(公告)号：US11507369B2

公开(公告)日：2022-11-22

申请号：US17465905

申请日：2021-09-03

Applicant: Intel Corporation

Inventor： Robert Valentine , Galina Ryvchin , Piotr Majcher , Mark J. Charney , Elmoustapha Ould-Ahmed-Vall , Jesus Corbal , Milind B. Girkar , Zeev Sperber , Simon Rubanovich , Amit Gradstein

IPC: G06F9/30 , G06F7/544 , G06F9/38

Abstract: Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of a M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, add of results from these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element divided by N.

13.

发明授权
Efficient implementation of complex vector fused multiply add and complex vector multiply 有权

公开(公告)号：US11455167B2

公开(公告)日：2022-09-27

申请号：US16701082

申请日：2019-12-02

Applicant: Intel Corporation

Inventor： Raanan Sade , Thierry Pons , Amit Gradstein , Zeev Sperber , Mark J. Charney , Robert Valentine , Eyal Oz-Sinay

IPC: G06F9/30 , G06F9/38 , G06F17/16

Abstract: Disclosed embodiments relate to efficient complex vector multiplication. In one example, an apparatus includes execution circuitry, responsive to an instruction having fields to specify multiplier, multiplicand, and summand complex vectors, to perform two operations: first, to generate a double-even multiplicand by duplicating even elements of the specified multiplicand, and to generate a temporary vector using a fused multiply-add (FMA) circuit having A, B, and C inputs set to the specified multiplier, the double-even multiplicand, and the specified summand, respectively, and second, to generate a double-odd multiplicand by duplicating odd elements of the specified multiplicand, to generate a swapped multiplier by swapping even and odd elements of the specified multiplier, and to generate a result using a second FMA circuit having its even product negated, and having A, B, and C inputs set to the swapped multiplier, the double-odd multiplicand, and the temporary vector, respectively.

14.

发明申请
APPARATUSES, METHODS, AND SYSTEMS FOR 8-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS 有权

公开(公告)号：US20220206801A1

公开(公告)日：2022-06-30

申请号：US17134373

申请日：2020-12-26

Applicant: Intel Corporation

Inventor： Naveen Mellempudi , Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Christopher J. Hughes , Evangelos Georganas , Zeev Sperber , Amit Gradstein , Simon Rubanovich

IPC: G06F9/30 , G06F7/499 , G06F9/38

Abstract: Systems, methods, and apparatuses relating to 8-bit floating-point matrix dot product instructions are described. A processor embodiment includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a destination matrix having single-precision elements, a first source matrix, and a second source matrix, the source matrices having elements that each comprise a quadruple of 8-bit floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the 8-bit floating-point values to single-precision values, a multiplication of different pairs of converted single-precision values to generate plurality of results, and an accumulation of the results with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.

15.

发明授权
Enabling removal and reconstruction of flag operations in a processor 有权

公开(公告)号：US11036509B2

公开(公告)日：2021-06-15

申请号：US14930848

申请日：2015-11-03

Applicant: Intel Corporation

Inventor： Zeev Sperber , Tomer Weiner , Amit Gradstein , Simon Rubanovich , Alex Gerber , Itai Ravid

IPC: G06F9/30

Abstract: In one embodiment, a processor includes a fetch logic to fetch instructions, a decode logic to decode the fetched instructions, and an execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer. Other embodiments are described and claimed.

16.

发明授权
Apparatuses, methods, and systems for transpose instructions of a matrix operations accelerator 有权

公开(公告)号：US10990397B2

公开(公告)日：2021-04-27

申请号：US16370894

申请日：2019-03-30

Applicant: Intel Corporation

Inventor： Amit Gradstein , Simon Rubanovich , Sagi Meller , Zeev Sperber , Jose Yallouz , Robert Valentine

IPC: G06F9/30 , G06F9/38 , G06F17/16

Abstract: Systems, methods, and apparatuses relating to a matrix operations accelerator are described. In one embodiment, a processor includes a matrix operations accelerator circuit that includes a two-dimensional grid of fused multiply accumulate circuits; a first plurality of registers that represents an input two-dimensional matrix coupled to the matrix operations accelerator circuit; a decoder, of a core coupled to the matrix operations accelerator circuit, to decode an instruction into a decoded instruction; and an execution circuit of the core to execute the decoded instruction to cause the two-dimensional grid of fused multiply accumulate circuits to form a transpose of the input two-dimensional matrix when the matrix operations accelerator circuit is in a transpose mode.

17.

发明授权
Systems and methods for performing 16-bit floating-point matrix dot product instructions 有权

公开(公告)号：US10963246B2

公开(公告)日：2021-03-30

申请号：US16186387

申请日：2018-11-09

Applicant: Intel Corporation

Inventor： Alexander F. Heinecke , Robert Valentine , Mark J. Charney , Raanan Sade , Menachem Adelman , Zeev Sperber , Amit Gradstein , Simon Rubanovich

IPC: G06F9/30 , G06F9/38

Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.

18.

发明申请
APPARATUSES, METHODS, AND SYSTEMS FOR HASHING INSTRUCTIONS 有权

公开(公告)号：US20210049013A1

公开(公告)日：2021-02-18

申请号：US17087536

申请日：2020-11-02

Applicant: Intel Corporation

Inventor： Regev Shemy , Zeev Sperber , Wajdi Feghali , Vinodh Gopal , Amit Gradstein , Simon Rubanovich , Sean Gulley , Ilya Albrekht , Jacob Doweck , Jose Yallouz , Ittai Anati

IPC: G06F9/30 , G06F9/38 , H04L9/06

Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

19.

发明授权
Efficient implementation of complex vector fused multiply add and complex vector multiply 有权

公开(公告)号：US10521226B2

公开(公告)日：2019-12-31

申请号：US15941531

申请日：2018-03-30

Applicant: Intel Corporation

Inventor： Raanan Sade , Thierry Pons , Amit Gradstein , Zeev Sperber , Mark J. Charney , Robert Valentine , Eyal Oz-Sinay

IPC: G06F9/30 , G06F17/16 , G06F9/38

Abstract: Disclosed embodiments relate to efficient complex vector multiplication. In one example, an apparatus includes execution circuitry, responsive to an instruction having fields to specify multiplier, multiplicand, and summand complex vectors, to perform two operations: first, to generate a double-even multiplicand by duplicating even elements of the specified multiplicand, and to generate a temporary vector using a fused multiply-add (FMA) circuit having A, B, and C inputs set to the specified multiplier, the double-even multiplicand, and the specified summand, respectively, and second, to generate a double-odd multiplicand by duplicating odd elements of the specified multiplicand, to generate a swapped multiplier by swapping even and odd elements of the specified multiplier, and to generate a result using a second FMA circuit having its even product negated, and having A, B, and C inputs set to the swapped multiplier, the double-odd multiplicand, and the temporary vector, respectively.

20.

发明授权
Systems, apparatuses, and methods for performing a double blocked sum of absolute differences 有权

公开(公告)号：US10303471B2

公开(公告)日：2019-05-28

申请号：US15445741

申请日：2017-02-28

Applicant: Intel Corporation

Inventor： Elmoustapha Ould-Ahmed-Vall , Mostafa Hagog , Robert Valentine , Amit Gradstein , Simon Rubanovich , Zeev Sperber

IPC: G06F9/302 , G06F7/544 , G06F15/78 , G06F9/30 , G06F9/38 , G06F7/50

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification