REDUNDANT EXECUTION FOR RELIABILITY IN A SUPER FMA ALU
    1.
    Patent application
    Status: granted

    Publication No.: US20140189305A1

    Publication date: 2014-07-03

    Application No.: US13732228

    Filing date: 2012-12-31

    Applicant: Brian J. Hickmann

    Inventor: Brian J. Hickmann

    IPC classification: G06F9/30

    Abstract: A system, processor and method to increase computational reliability by using underutilized portions of a data path with a SuperFMA ALU. The method reuses underutilized hardware to implement spatial redundancy: detection during the dispatch stage determines whether the operation may be executed by redundant hardware in the ALU. During execution, if the conditions required by the redundant execution modes are met, the SuperFMA ALU performs the operation with redundant execution and compares the results for a match in order to generate a computational result. Increasing computational reliability through redundant execution is advantageous because the hardware cost of adding support for it is low and, because existing hardware is reused, the implementation complexity of the disclosed method is minimal.
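The dispatch-time check and execute-then-compare flow described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Op`, `leaves_spare_hardware`, `redundant_mode`, and `execute` are hypothetical, the datapath is modeled as a plain Python function, and raising an error on a mismatch is an assumption, since the abstract does not say how a mismatch is handled.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Unit = Callable[[float, float, float], float]


def fma(a: float, b: float, c: float) -> float:
    """Stand-in for one FMA datapath: a * b + c (not a true single-rounding FMA)."""
    return a * b + c


@dataclass
class Op:
    name: str
    unit: Unit
    # Hypothetical dispatch-time property: True when the operation leaves
    # part of the datapath idle, so a duplicate can run alongside it.
    leaves_spare_hardware: bool


def redundant_mode(op: Op) -> bool:
    """Dispatch-stage check: may this operation be executed redundantly?"""
    return op.leaves_spare_hardware


def execute(op: Op, a: float, b: float, c: float,
            spare_unit: Optional[Unit] = None) -> Tuple[float, bool]:
    """Return (result, was_checked). spare_unit lets a caller model a faulty copy."""
    primary = op.unit(a, b, c)
    if not redundant_mode(op):
        return primary, False
    duplicate = (spare_unit or op.unit)(a, b, c)
    if primary != duplicate:
        # Assumption: a mismatch is reported as an error.
        raise RuntimeError(f"redundant execution mismatch in {op.name}")
    return primary, True
```

For example, `execute(Op("fma", fma, True), 2.0, 3.0, 4.0)` runs the operation twice and returns the checked result, while an operation with `leaves_spare_hardware=False` runs once, unchecked.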


    FUNCTIONAL UNIT CAPABLE OF EXECUTING APPROXIMATIONS OF FUNCTIONS
    4.
    Patent application
    Status: granted

    Publication No.: US20120079250A1

    Publication date: 2012-03-29

    Application No.: US12890533

    Filing date: 2010-09-24

    IPC classification: G06F9/302

    Abstract: A semiconductor chip is described having a functional unit that can execute a first instruction and a second instruction. The first instruction multiplies two operands. The second instruction approximates a function according to $C_0 + C_1 X_2 + C_2 X_2^2$. The functional unit has a multiplier circuit. The multiplier circuit has: i) a first input to receive bits of a first operand of the first instruction and bits of a $C_1$ term of the second instruction; ii) a second input to receive bits of a second operand of the first instruction and bits of an $X_2$ term of the second instruction.
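The shared-multiplier idea in this abstract can be illustrated with a small sketch. It assumes the approximation is the quadratic C0 + C1*X2 + C2*X2^2 and that the coefficients are handed in as arguments; the abstract does not say where they come from, and the function names below are hypothetical.

```python
def multiplier(first_input: float, second_input: float) -> float:
    """Stand-in for the single multiplier circuit shared by both instructions."""
    return first_input * second_input


def mul_instruction(op1: float, op2: float) -> float:
    """First instruction: multiply two operands."""
    return multiplier(op1, op2)


def approx_instruction(c0: float, c1: float, c2: float, x2: float) -> float:
    """Second instruction: approximate a function as C0 + C1*X2 + C2*X2^2."""
    # The C1 term goes to the first multiplier input, X2 to the second,
    # mirroring how the abstract maps them onto the multiply operands.
    linear = multiplier(c1, x2)
    # Assumption: the quadratic term is formed elsewhere; plain arithmetic here.
    quadratic = c2 * x2 * x2
    return c0 + linear + quadratic
```

With `c0=1.0, c1=2.0, c2=3.0, x2=0.5`, `approx_instruction` evaluates 1 + 1 + 0.75.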


    APPARATUS AND METHOD FOR FUSED MULTIPLY-MULTIPLY INSTRUCTIONS
    5.
    Patent application
    Status: under examination (published)

    Publication No.: US20160188327A1

    Publication date: 2016-06-30

    Application No.: US14583046

    Filing date: 2014-12-24

    IPC classification: G06F9/30

    Abstract: In one embodiment of the invention, a processor device includes a storage location configured to store a set of source packed-data operands, each of the operands having a plurality of packed-data elements that are positive or negative according to an immediate bit value within one of the operands. The processor also includes a decoder to decode an instruction requiring an input of a plurality of source operands, and an execution unit to receive the decoded instructions and to generate a result that is a product of the source operands. In one embodiment, the result is stored back into one of the source operands, or the result is stored into an operand that is independent of the source operands.
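A rough model of the behavior in this abstract is an element-wise product of packed operands whose signs are selected by immediate bits. The sketch below assumes three source operands and an encoding in which bit i of the immediate negates source i; both are illustrative assumptions, not the encoding defined by the application, and `fused_multiply_multiply` is a hypothetical name.

```python
from typing import List, Sequence


def fused_multiply_multiply(src1: Sequence[float], src2: Sequence[float],
                            src3: Sequence[float], imm: int = 0) -> List[float]:
    """Element-wise product of three packed operands.

    Assumed encoding: bit i of imm (i = 0, 1, 2) negates every element
    of source operand i before the multiplication.
    """
    if not (len(src1) == len(src2) == len(src3)):
        raise ValueError("packed operands must have the same element count")
    signs = [-1.0 if (imm >> i) & 1 else 1.0 for i in range(3)]
    return [
        (signs[0] * a) * (signs[1] * b) * (signs[2] * c)
        for a, b, c in zip(src1, src2, src3)
    ]
```

The returned list plays the role of the destination operand; a caller could store it back over one of the sources or into a separate operand, matching the two options the abstract mentions.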


    COALESCING ADJACENT GATHER/SCATTER OPERATIONS
    7.
    Patent application
    Status: granted

    Publication No.: US20140181464A1

    Publication date: 2014-06-26

    Application No.: US13997784

    Filing date: 2012-12-26

    IPC classification: G06F12/10

    Abstract: According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder that, in response to the first instruction, reads contiguous first and second data elements from a memory location based on the first memory address indicated by the second operand, stores the first data element in a first entry of the first storage location, and stores the second data element in a second entry of a second storage location, the second entry corresponding to the first entry of the first storage location.
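The coalescing described in this abstract can be pictured as one gather that reads two adjacent elements per address and splits them across two destinations. The sketch below is an illustration only: memory is modeled as a Python list, and the per-lane `indices` vector and the function name `coalesced_gather` are assumptions that the abstract does not spell out.

```python
from typing import List, Sequence, Tuple


def coalesced_gather(memory: Sequence[int], base: int,
                     indices: Sequence[int]) -> Tuple[List[int], List[int]]:
    """For each lane, read two contiguous elements at base + index.

    The first element goes to entry i of the first destination, the second
    element to the corresponding entry i of the second destination.
    """
    dst1: List[int] = []
    dst2: List[int] = []
    for idx in indices:
        addr = base + idx
        # One contiguous read replaces two separate gathers of adjacent data.
        first, second = memory[addr:addr + 2]
        dst1.append(first)
        dst2.append(second)
    return dst1, dst2
```

With memory laid out as interleaved pairs such as `[10, 11, 20, 21, 30, 31]`, gathering at indices `[0, 2, 4]` separates the first and second members of each pair into two destinations.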


    Method, apparatus, system for single-path floating-point rounding flow that supports generation of normals/denormals and associated status flags
    8.
    Granted patent
    Status: granted

    Publication No.: US09141586B2

    Publication date: 2015-09-22

    Application No.: US13725268

    Filing date: 2012-12-21

    IPC classification: G06F7/499, G06F17/10, G06F7/00

    Abstract: A mechanism for performing single-path floating-point rounding in a floating point unit is disclosed. A system of the disclosure includes a memory and a processing device communicably coupled to the memory. In one embodiment, the processing device comprises a floating point unit (FPU) to generate a plurality of status flags for a rounded value of a finite nonzero number. The plurality of status flags are generated based on the finite nonzero number without calculating the rounded value of the finite nonzero number. The plurality of status flags comprises an overflow flag and an underflow flag. The FPU determines whether a rounded value should be calculated for the finite nonzero number based on the plurality of status flags and whether the overflow flag is asserted. Upon determining that the rounded value should be calculated for the finite nonzero number based on the plurality of status flags and that the overflow flag is asserted, the FPU calculates the rounded value of the finite nonzero number based on an overflow rounding. Upon determining that the rounded value should be calculated for the finite nonzero number based on the plurality of status flags and that the overflow flag is not asserted, the FPU calculates the rounded value of the finite nonzero number based on a blended reduced precision rounding.
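To make the "flags before rounding" idea concrete, the sketch below predicts overflow and underflow for an exact value without ever forming the rounded result, then picks one of the two rounding paths named in the abstract. It is a simplified model, not the patented circuit: it assumes IEEE 754 binary32 as the target format, round-to-nearest-even, and tininess detected before rounding, and the function names are hypothetical. The two rounding paths themselves are not implemented, since the abstract does not define them.

```python
from fractions import Fraction

# Assumed target format: IEEE 754 binary32 (precision 24, emax 127, emin -126).
P, EMAX, EMIN = 24, 127, -126
# Under round-to-nearest-even, magnitudes at or above this value round past
# the largest finite number (largest finite value plus half an ulp).
OVERFLOW_THRESHOLD = Fraction(2) ** EMAX * (2 - Fraction(1, 2 ** P))
MIN_NORMAL = Fraction(2) ** EMIN
SUBNORMAL_QUANTUM = Fraction(2) ** (EMIN - (P - 1))  # 2**-149


def predict_flags(x: Fraction) -> dict:
    """Predict status flags from the exact value, without rounding it."""
    if x == 0:
        raise ValueError("expected a finite nonzero number")
    mag = abs(x)
    overflow = mag >= OVERFLOW_THRESHOLD
    tiny = mag < MIN_NORMAL
    # A tiny value is inexact when it is not a multiple of the subnormal quantum.
    inexact_when_tiny = (mag / SUBNORMAL_QUANTUM).denominator != 1
    return {"overflow": overflow, "underflow": tiny and inexact_when_tiny}


def select_rounding_path(flags: dict) -> str:
    """Choose the rounding path from the predicted flags (paths named in the abstract)."""
    if flags["overflow"]:
        return "overflow rounding"
    return "blended reduced precision rounding"
```

For instance, `predict_flags(Fraction(2) ** 128)` asserts the overflow flag, so `select_rounding_path` returns the overflow path, while the largest finite binary32 value stays on the other path.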


    METHOD, APPARATUS, SYSTEM FOR SINGLE-PATH FLOATING-POINT ROUNDING FLOW THAT SUPPORTS GENERATION OF NORMALS/DENORMALS AND ASSOCIATED STATUS FLAGS
    9.
    Patent application
    Status: granted

    Publication No.: US20140181169A1

    Publication date: 2014-06-26

    Application No.: US13725268

    Filing date: 2012-12-21

    IPC classification: G06F17/10

    Abstract: A mechanism for performing single-path floating-point rounding in a floating point unit is disclosed. A system of the disclosure includes a memory and a processing device communicably coupled to the memory. In one embodiment, the processing device comprises a floating point unit (FPU) to generate a plurality of status flags for a rounded value of a finite nonzero number. The plurality of status flags are generated based on the finite nonzero number without calculating the rounded value of the finite nonzero number. The plurality of status flags comprises an overflow flag and an underflow flag. The FPU determines whether a rounded value should be calculated for the finite nonzero number based on the plurality of status flags and whether the overflow flag is asserted. Upon determining that the rounded value should be calculated for the finite nonzero number based on the plurality of status flags and that the overflow flag is asserted, the FPU calculates the rounded value of the finite nonzero number based on an overflow rounding. Upon determining that the rounded value should be calculated for the finite nonzero number based on the plurality of status flags and that the overflow flag is not asserted, the FPU calculates the rounded value of the finite nonzero number based on a blended reduced precision rounding.


    Functional unit capable of executing approximations of functions
    10.
    Granted patent
    Status: granted

    Publication No.: US08676871B2

    Publication date: 2014-03-18

    Application No.: US12890533

    Filing date: 2010-09-24

    IPC classification: G06F1/02

    Abstract: A semiconductor chip is described having a functional unit that can execute a first instruction and a second instruction. The first instruction multiplies two operands. The second instruction approximates a function according to $C_0 + C_1 X_2 + C_2 X_2^2$. The functional unit has a multiplier circuit. The multiplier circuit has: i) a first input to receive bits of a first operand of the first instruction and bits of a $C_1$ term of the second instruction; ii) a second input to receive bits of a second operand of the first instruction and bits of an $X_2$ term of the second instruction.
