Systems, Apparatuses, And Methods For Fused Multiply Add

    公开(公告)号:US20210406011A1

    公开(公告)日:2021-12-30

    申请号:US17468258

    申请日:2021-09-07

    Abstract: Embodiments of systems, apparatuses, and methods for fused multiple add. In some embodiments, a decoder decodes a single instruction having an opcode, a destination field representing a destination operand, and fields for a first, second, and third packed data source operand, wherein packed data elements of the first and second packed data source operand are of a first, different size than a second size of packed data elements of the third packed data operand. Execution circuitry then executes the decoded single instruction to perform, for each packed data element position of the destination operand, a multiplication of a M N-sized packed data elements from the first and second packed data sources that correspond to a packed data element position of the third packed data source, add of results from these multiplications to a full-sized packed data element of a packed data element position of the third packed data source, and storage of the addition result in a packed data element position destination corresponding to the packed data element position of the third packed data source, wherein M is equal to the full-sized packed data element divided by N.

    Apparatuses, methods, and systems for hashing instructions

    公开(公告)号:US11188335B2

    公开(公告)日:2021-11-30

    申请号:US17087536

    申请日:2020-11-02

    Abstract: Systems, methods, and apparatuses relating to performing hashing operations on packed data elements are described. In one embodiment, a processor includes a decode circuit to decode a single instruction into a decoded single instruction, the single instruction including at least one first field that identifies eight 32-bit state elements A, B, C, D, E, F, G, and H for a round according to a SM3 hashing standard and at least one second field that identifies an input message; and an execution circuit to execute the decoded single instruction to: rotate state element C left by 9 bits to form a rotated state element C, rotate state element D left by 9 bits to form a rotated state element D, rotate state element G left by 19 bits to form a rotated state element G, rotate state element H left by 19 bits to form a rotated state element H, perform two rounds according to the SM3 hashing standard on the input message and state element A, state element B, rotated state element C, rotated state element D, state element E, state element F, rotated state element G, and rotated state element H to generate an updated state element A, an updated state element B, an updated state element E, and an updated state element F, and store the updated state element A, the updated state element B, the updated state element E, and the updated state element F into a location specified by the single instruction.

    Efficient rotate adder for implementing cryptographic basic operations

    公开(公告)号:US11176278B2

    公开(公告)日:2021-11-16

    申请号:US16236450

    申请日:2018-12-29

    Abstract: Integrated circuits to compute a result of summing m values, rotating the sum by k bits, and adding a summation of n values Bi to Bn to the rotated sum. An embodiment includes: a first carry save adder to add up the m values to generate a first carry and a first sum; rotator circuitry to rotate both the first carry and the first sum by k bits to generate a second carry and a second sum; a second carry save adder to add up the second carry, the second sum, and the summation of values Bi to Bn to generate a third carry and a third sum; two parallel adders to generate a first intermediate result and a second intermediary result based on the third carry and the third sum; and a multiplexer to generate the result utilizing various portions of the first and second intermediate results.

    Apparatus and method of improved permute instructions

    公开(公告)号:US10474459B2

    公开(公告)日:2019-11-12

    申请号:US15808788

    申请日:2017-11-09

    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.

Patent Agency Ranking