APPARATUS AND METHOD FOR CONJUGATE TRANSPOSE AND MULTIPLY

    公开(公告)号:US20220197975A1

    公开(公告)日:2022-06-23

    申请号:US17133456

    申请日:2020-12-23

    Abstract: An apparatus and method for complex matrix conjugation and multiplication. For example, one embodiment of a processor comprises: a decoder to decode a complex matrix conjugation and multiplication instruction including a first source operand to identify a first complex source matrix comprising a first plurality of complex values, a second source operand to identify a second complex source matrix comprising a second plurality of complex values, and a first destination operand to identify a result matrix; execution circuitry to execute the complex matrix conjugation and multiplication instruction, the execution circuitry comprising: matrix conjugation hardware logic to determine a plurality of complex conjugate values corresponding to the first plurality of complex values; transpose hardware logic to transpose the plurality of complex conjugate values to generate a conjugate transpose matrix comprising the complex conjugate values; parallel multiplication circuitry to: multiply real values from the plurality of complex conjugate values of the conjugate transpose matrix with corresponding imaginary values from the second plurality of complex values to generate a first plurality of imaginary products, and multiply imaginary values from the plurality of complex conjugate values of the conjugate transpose matrix with corresponding real values from the second plurality of complex values to generate a second plurality of imaginary products; and addition/subtraction circuitry to add each imaginary product in the first plurality of imaginary products to a corresponding imaginary product in the second plurality of imaginary products to produce a corresponding imaginary component in the result matrix.

    APPARATUS AND METHOD FOR COMPLEX MATRIX TRANSPOSE AND MULTIPLY

    公开(公告)号:US20220197601A1

    公开(公告)日:2022-06-23

    申请号:US17133363

    申请日:2020-12-23

    Abstract: An apparatus and method for complex matrix transpose and multiply. For example, one embodiment of a processor comprises: a decoder to decode a first complex matrix multiplication and transpose instruction including a first source operand to identify a first plurality of real and imaginary values in a first complex source matrix, a second source operand to identify a second plurality of real and imaginary values in a second complex source matrix, and a first destination operand to identify a result matrix with real and imaginary values; execution circuitry to execute the first complex matrix transpose and multiplication instruction, the execution circuitry comprising transpose hardware logic to transpose at least one of the source matrices, parallel multiplication circuitry to multiply real values from the first plurality of real and imaginary values with corresponding real values from the second plurality of real and imaginary values to generate a first plurality of real products, to multiply imaginary values from the first plurality of real and imaginary values with corresponding imaginary values from the second plurality of real and imaginary values to generate a second plurality of real products; and addition/subtraction circuitry to subtract each real product in the second plurality of real products from a corresponding real product in the first plurality of real products to produce a corresponding real value in the result matrix.

    USING FUZZY-JBIT LOCATION OF FLOATING-POINT MULTIPLY-ACCUMULATE RESULTS

    公开(公告)号:US20210279038A1

    公开(公告)日:2021-09-09

    申请号:US17330064

    申请日:2021-05-25

    Abstract: Disclosed embodiments relate to performing floating-point (FP) arithmetic. In one example, a processor is to decode an instruction specifying locations of first, second, and third floating-point (FP) operands and an opcode calling for accumulating a FP product of the first and second FP operands with the third FP operand, and execution circuitry to, in a first cycle, generate the FP product having a Fuzzy-Jbit format comprising a sign bit, a 9-bit exponent, and a 25-bit mantissa having two possible positions for a JBit and, in a second cycle, to accumulate the FP product with the third FP operand, while concurrently, based on Jbit positions of the FP product and the third FP operand, determining an exponent adjustment and a mantissa shift control of a result of the accumulation, wherein performing the exponent adjustment concurrently enhances an ability to perform the accumulation in one cycle.

    SYSTEMS AND METHODS FOR PERFORMING 16-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS

    公开(公告)号:US20190079768A1

    公开(公告)日:2019-03-14

    申请号:US16186387

    申请日:2018-11-09

    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.

Patent Agency Ranking