Apparatus and method of improved insert instructions

    公开(公告)号:US10459728B2

    公开(公告)日:2019-10-29

    申请号:US15809721

    申请日:2017-11-10

    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and, mask the second and fourth instructions at a second resultant vector granularity.

    Instruction and logic for early underflow detection and rounder bypass

    公开(公告)号:US10157059B2

    公开(公告)日:2018-12-18

    申请号:US15280324

    申请日:2016-09-29

    Abstract: A processor for floating point underflow detection includes circuitry to decode a first instruction and a floating point unit. The decoded instruction, when executed by the processor, may be for performing a fused multiply-add (FMA) operation. The floating point unit includes circuitry to determine a non-normalized result of the first instruction based on a first input, a second input, and a third input. The floating point unit further includes circuitry to determine whether underflow exists in the non-normalized result based on a first exponent of the first input, a second exponent of the second input, and a third exponent of the third input.

    Vector mask driven clock gating for power efficiency of a processor

    公开(公告)号:US10133577B2

    公开(公告)日:2018-11-20

    申请号:US13997791

    申请日:2012-12-19

    Abstract: A processor includes an instruction schedule and dispatch (schedule/dispatch) unit to receive a single instruction multiple data (SIMD) instruction to perform an operation on multiple data elements stored in a storage location indicated by a first source operand. The instruction schedule/dispatch unit is to determine a first of the data elements that will not be operated to generate a result written to a destination operand based on a second source operand. The processor further includes multiple processing elements coupled to the instruction schedule/dispatch unit to process the data elements of the SIMD instruction in a vector manner, and a power management unit coupled to the instruction schedule/dispatch unit to reduce power consumption of a first of the processing elements configured to process the first data element.

    Floating point scaling processors, methods, systems, and instructions

    公开(公告)号:US10089076B2

    公开(公告)日:2018-10-02

    申请号:US15922074

    申请日:2018-03-15

    Abstract: A method of an aspect includes receiving a floating point scaling instruction. The floating point scaling instruction indicates a first source including one or more floating point data elements, a second source including one or more corresponding floating point data elements, and a destination. A result is stored in the destination in response to the floating point scaling instruction. The result includes one or more corresponding result floating point data elements each including a corresponding floating point data element of the second source multiplied by a base of the one or more floating point data elements of the first source raised to a power of an integer representative of the corresponding floating point data element of the first source. Other methods, apparatus, systems, and instructions are disclosed.

    Leading change anticipator logic
    89.
    发明授权
    Leading change anticipator logic 有权
    领先的变化预测逻辑

    公开(公告)号:US09274752B2

    公开(公告)日:2016-03-01

    申请号:US13729421

    申请日:2012-12-28

    CPC classification number: G06F7/74 G06F5/012 G06F7/485

    Abstract: In one embodiment, a processor includes at least one floating point unit. The at least one floating point unit may include an adder, leading change anticipator (LCA) logic, and a shifter. The adder may be to add a first operand X and a second operand Y to obtain an output operand having a bit length n. The LCA logic may be to: for each bit position i from n−1 to 1, obtain a set of propagation values and a set of bit values based on the first operand X and the second operand Y; and generate a LCA mask based on the set of propagation values and the set of bit values. The shifter may be to normalize the output operand based on the LCA mask. Other embodiments are described and claimed.

    Abstract translation: 在一个实施例中,处理器包括至少一个浮点单元。 所述至少一个浮点单元可以包括加法器,引导改变预测器(LCA)逻辑和移位器。 加法器可以添加第一操作数X和第二操作数Y以获得具有位长度n的输出操作数。 LCA逻辑可以是:对于从n-1到1的每个比特位置i,基于第一操作数X和第二操作数Y获得一组传播值和一组比特值; 并且基于传播值集合和位值集合来生成LCA掩码。 移位器可以是基于LCA掩码来规范化输出操作数。 描述和要求保护其他实施例。

    Apparatus and method for conjugate transpose and multiply

    公开(公告)号:US12216734B2

    公开(公告)日:2025-02-04

    申请号:US17133456

    申请日:2020-12-23

    Abstract: An apparatus and method for complex matrix conjugation and multiplication. For example, one embodiment of a processor comprises: a decoder to decode a complex matrix conjugation and multiplication instruction including a first source operand to identify a first complex source matrix comprising a first plurality of complex values, a second source operand to identify a second complex source matrix comprising a second plurality of complex values, and a first destination operand to identify a result matrix; execution circuitry to execute the complex matrix conjugation and multiplication instruction, the execution circuitry comprising: matrix conjugation hardware logic to determine a plurality of complex conjugate values corresponding to the first plurality of complex values; transpose hardware logic to transpose the plurality of complex conjugate values to generate a conjugate transpose matrix comprising the complex conjugate values; parallel multiplication circuitry to: multiply real values from the plurality of complex conjugate values of the conjugate transpose matrix with corresponding imaginary values from the second plurality of complex values to generate a first plurality of imaginary products, and multiply imaginary values from the plurality of complex conjugate values of the conjugate transpose matrix with corresponding real values from the second plurality of complex values to generate a second plurality of imaginary products; and addition/subtraction circuitry to add each imaginary product in the first plurality of imaginary products to a corresponding imaginary product in the second plurality of imaginary products to produce a corresponding imaginary component in the result matrix.

Patent Agency Ranking