Correct and efficient sticky bit calculation for exact floating point
divide/square root results
    1.
    Granted invention patent
    Correct and efficient sticky bit calculation for exact floating point divide/square root results (status: expired)

    Publication number: US5787030A

    Publication date: 1998-07-28

    Application number: US498397

    Filing date: 1995-07-05

    Abstract: Quotient digit selection logic is modified so as to prevent a partial remainder equal to the negative divisor from occurring. An enhanced quotient digit selection function prevents the working partial remainder from becoming negative if the result is exact. The enhanced quotient digit selection logic chooses a quotient digit of zero instead of a quotient digit of one when the actual partial remainder is zero. Using a five-bit estimated partial remainder where the upper four bits are zero, a possible carry propagation into the fourth most significant bit is detected. This can be accomplished by looking at the fifth most significant sum and carry bits of the redundant partial remainder. If they are both zero, then a carry propagation out of that bit position into the least significant position of the estimated partial remainder is not possible, and a quotient digit of zero is chosen. In the alternative case, in which one or both of the fifth most significant carry or sum bits of the redundant partial remainder are ones, a quotient digit of one is chosen. This provides a one-cycle saving, since negative partial remainders no longer need to be restored before calculating the sticky bit. Extra hardware is eliminated because it is no longer necessary to provide any extra mechanism for restoring the preliminary final partial remainder. Latency is improved because no additional cycle time is required to restore negative preliminary partial remainders. An optimized five-level circuit is shown which implements the enhanced quotient selection function.
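
The selection rule in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `select_quotient_digit`, the word `width`, and the way the four-bit estimate is formed (adding the top four sum and carry bits and dropping the overflow) are illustrative choices, and the handling of negative estimates follows a conventional radix-2 SRT rule that the abstract does not spell out.

```python
def select_quotient_digit(sum_word: int, carry_word: int, width: int) -> int:
    """Pick a radix-2 quotient digit from a carry-save partial remainder.

    sum_word and carry_word are the two halves of the redundant partial
    remainder, each `width` bits wide (width >= 5). Hypothetical model.
    """
    top = width - 4
    # Four-bit estimate: add the top four sum and carry bits, drop overflow.
    estimate = (((sum_word >> top) & 0xF) + ((carry_word >> top) & 0xF)) & 0xF
    # Fifth most significant sum and carry bits of the redundant remainder.
    s5 = (sum_word >> (width - 5)) & 1
    c5 = (carry_word >> (width - 5)) & 1

    if estimate == 0:
        # From the abstract: with the upper four bits zero, choose 0 when
        # both fifth bits are zero (no carry can propagate in), else 1.
        return 0 if (s5 | c5) == 0 else 1
    if estimate < 0b1000:
        return 1   # ASSUMPTION: positive estimate selects +1
    if estimate == 0b1111:
        return 0   # ASSUMPTION: conventional rule for an estimate of all ones
    return -1      # ASSUMPTION: remaining negative estimates select -1
```

For example, `select_quotient_digit(0x00, 0x00, 8)` yields 0 (an exactly zero remainder no longer forces a digit of one), while `select_quotient_digit(0x08, 0x00, 8)` yields 1 because the fifth sum bit is set.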

    Three overlapped stages of radix-2 square root/division with speculative
execution
    2.
    Granted invention patent
    Three overlapped stages of radix-2 square root/division with speculative execution (status: expired)

    Publication number: US5870323A

    Publication date: 1999-02-09

    Application number: US928073

    Filing date: 1997-09-11

    CPC classification: G06F7/535 G06F7/5525

    Abstract: In hardware SRT division and square root mantissa units, maximal quotient selection overlapping for three quotient digits per cycle is used. An effective radix-8 implementation cascades three partial remainder computation circuits and overlaps three quotient selection circuits. Two carry save adders speculatively compute the possible resulting partial remainders corresponding to each possible value, -1, 0, and +1, of the quotient digit by adding the divisor, not adding anything, and adding the two's complement of the divisor, respectively, thus shortening the critical path of a single SRT iteration producing a single quotient digit. The propagation delays of the two carry save adders which speculatively compute the possible resulting partial remainders are masked by a longer delay through the quotient selection logic.
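
The speculative step described here can be modeled in a few lines. The sketch below is an assumption-laden illustration, not the patented circuit: the names `csa`, `speculative_step`, and `resolve` are hypothetical, the partial remainder is assumed to be shifted left by one before the divisor is applied, and all arithmetic is taken modulo `2**width`.

```python
def csa(a: int, b: int, c: int, mask: int) -> tuple[int, int]:
    """3:2 carry save adder: returns (sum, carry) with a + b + c == sum + carry
    modulo the word size selected by mask."""
    s = (a ^ b ^ c) & mask
    cy = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, cy


def speculative_step(s: int, c: int, divisor: int, width: int) -> dict:
    """Compute the three candidate next partial remainders in carry-save form.

    Two adders run in parallel with digit selection; the digit then only
    has to pick one of the precomputed candidates.
    """
    mask = (1 << width) - 1
    s2, c2 = (s << 1) & mask, (c << 1) & mask   # ASSUMPTION: shift, then add
    neg_d = (-divisor) & mask                   # two's complement of divisor
    return {
        -1: csa(s2, c2, divisor & mask, mask),  # digit -1: add the divisor
        0: (s2, c2),                            # digit 0: add nothing
        +1: csa(s2, c2, neg_d, mask),           # digit +1: add -divisor
    }


def resolve(s: int, c: int, width: int) -> int:
    """Collapse a carry-save pair to a signed integer (for checking only)."""
    v = (s + c) & ((1 << width) - 1)
    return v - (1 << width) if v >> (width - 1) else v
```

Once the quotient digit `q` is known, the next partial remainder is simply `speculative_step(s, c, d, width)[q]`, which is why the adder delay can hide behind the selection logic.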

    Three overlapped stages of radix-2 square root/division with speculative
execution
    3.
    Granted invention patent
    Three overlapped stages of radix-2 square root/division with speculative execution (status: expired)

    Publication number: US5696712A

    Publication date: 1997-12-09

    Application number: US498424

    Filing date: 1995-07-05

    CPC classification: G06F7/535 G06F7/5525

    Abstract: In hardware SRT division and square root mantissa units, maximal quotient selection overlapping for three quotient digits per cycle is used. An effective radix-8 implementation cascades three partial remainder computation circuits and overlaps three quotient selection circuits. Two carry save adders speculatively compute the possible resulting partial remainders corresponding to each possible value, -1, 0, and +1, of the quotient digit by adding the divisor, not adding anything, and adding the two's complement of the divisor, respectively, thus shortening the critical path of a single SRT iteration producing a single quotient digit. The propagation delays of the two carry save adders which speculatively compute the possible resulting partial remainders are masked by a longer delay through the quotient selection logic.

    SIMD TCP/UDP checksumming in a CPU
    4.
    Granted invention patent
    SIMD TCP/UDP checksumming in a CPU (status: expired)

    Publication number: US5953240A

    Publication date: 1999-09-14

    Application number: US880925

    Filing date: 1997-06-23

    IPC classification: G06F7/50 G06F11/00 G06F11/10

    CPC classification: G06F7/507 G06F7/505

    Abstract: A CPU adapted to calculate a checksum simultaneously on multiple values packed into a single register. An adder is provided which adds a number of values packed into a first register to a number of packed values from a second register. The adder is constructed, or partitioned, so that the values do not propagate their carry bits into the next value. A special carry bit adder is provided which adds the carry bit out of each partitioned portion back into the sum value to generate the sum required by the checksum protocol.
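
A software model may make the partitioned add concrete. The sketch below assumes four 16-bit lanes in a 64-bit register (the lane count is an assumption; 16 bits is the word size of the TCP/UDP checksum), and the name `partitioned_add` is illustrative. Each lane is added independently, and the carry out of a lane is added back into that same lane (an end-around carry) instead of spilling into its neighbor.

```python
LANES = 4            # ASSUMPTION: four values packed per register
LANE_BITS = 16       # TCP/UDP checksums operate on 16-bit words
LANE_MASK = (1 << LANE_BITS) - 1


def partitioned_add(a: int, b: int) -> int:
    """Add two packed registers lane by lane with end-around carry."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        x = (a >> shift) & LANE_MASK
        y = (b >> shift) & LANE_MASK
        total = x + y
        carry = total >> LANE_BITS                 # carry out of this lane
        # Fold the carry back into the same lane; this cannot overflow,
        # since the low 16 bits are at most 0xFFFE when a carry occurs.
        lane_sum = (total & LANE_MASK) + carry
        result |= lane_sum << shift
    return result
```

For instance, `partitioned_add(0x0001_FFFF_8000_1234, 0x0001_0001_8000_0001)` gives `0x0002_0001_0001_1235`: the two middle lanes overflow, and each receives its own carry back rather than disturbing the lane above.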

    Merging single precision floating point operands
    5.
    Granted invention patent
    Merging single precision floating point operands (status: in force)

    Publication number: US06463525B1

    Publication date: 2002-10-08

    Application number: US09375700

    Filing date: 1999-08-16

    Applicant: J. Arjun Prabhu

    Inventor: J. Arjun Prabhu

    IPC classification: G06F9/38

    Abstract: Where it is desired to perform a double precision operation using single precision operands, first and second single precision operands are loaded into first and second respective rows of a re-order buffer, and third and fourth single precision operands are loaded into third and fourth respective rows of the re-order buffer. A first merge instruction copies the first and second single precision operands from the respective first and second rows of the re-order buffer into first and second portions of a fifth row of the re-order buffer, thereby concatenating the first and second single precision operands to represent a first double precision operand. A second merge instruction copies the third and fourth single precision operands from the respective third and fourth rows of the re-order buffer into first and second portions of a sixth row of the re-order buffer, thereby concatenating the third and fourth single precision operands to represent a second double precision operand. The first and second double precision operands stored in the fifth and sixth rows, respectively, of the re-order buffer are then provided directly to an associated FPU for execution.
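
The merge step amounts to concatenating two 32-bit patterns into one 64-bit row. A minimal sketch follows, modeling the re-order buffer as a plain Python list; the names `merge` and `as_double` are hypothetical, and placing the first operand in the upper half of the destination row is an assumption the abstract does not fix.

```python
import struct

WORD_MASK = 0xFFFF_FFFF


def merge(rob: list, src_first: int, src_second: int, dst: int) -> None:
    """Copy two single-precision-sized rows into the halves of row dst.

    ASSUMPTION: the first source fills the upper 32 bits, the second
    source the lower 32 bits.
    """
    rob[dst] = ((rob[src_first] & WORD_MASK) << 32) | (rob[src_second] & WORD_MASK)


def as_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE 754 double (for checking)."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


# Rows 0..3 hold the four 32-bit operands; rows 4 and 5 receive the merges.
rob = [0x3FF0_0000, 0x0000_0000, 0x4000_0000, 0x0000_0000, 0, 0]
merge(rob, 0, 1, 4)   # first merge instruction
merge(rob, 2, 3, 5)   # second merge instruction
```

After the two merges, rows 4 and 5 hold the bit patterns of the doubles 1.0 and 2.0 and could be handed to the floating point unit without a further register-file round trip.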

    Quotient digit selection logic for floating point division/square root
    6.
    Granted invention patent
    Quotient digit selection logic for floating point division/square root (status: in force)

    Publication number: US06594681B1

    Publication date: 2003-07-15

    Application number: US09390071

    Filing date: 1999-09-03

    Applicant: J. Arjun Prabhu

    Inventor: J. Arjun Prabhu

    IPC classification: G06F7/38

    Abstract: Quotient digit selection logic using a three-bit carry propagate adder is presented. An enhanced quotient digit selection function prevents the working partial remainder from becoming negative if the result is exact. The enhanced quotient digit selection logic chooses a quotient digit of zero instead of a quotient digit of one when the actual partial remainder is zero. Using a four-bit estimated partial remainder where the upper four bits are zero, a possible carry propagation into the fourth most significant bit is detected. This can be accomplished by looking at the fourth most significant sum and carry bits of the redundant partial remainder. If they are both zero, then a carry propagation out of that bit position into the least significant position of the estimated partial remainder is not possible, and a quotient digit of zero is chosen. This provides a one-cycle saving, since negative partial remainders no longer need to be restored before calculating the sticky bit. Extra hardware is eliminated because it is no longer necessary to provide any extra mechanism for restoring the preliminary final partial remainder. Latency is improved because no additional cycle time is required to restore negative preliminary partial remainders. In an alternative embodiment, where the upper three bits of the estimated partial remainder are ones while the fourth most significant bit is zero, a quotient digit of negative one is chosen. This alternative embodiment allows correct exact results in all rounding modes, including rounding toward plus or minus infinity.

    Quotient digit selection logic for floating point division/square root
    7.
    Granted invention patent
    Quotient digit selection logic for floating point division/square root (status: expired)

    Publication number: US5954789A

    Publication date: 1999-09-21

    Application number: US648410

    Filing date: 1996-05-15

    Abstract: Quotient digit selection logic is modified so as to prevent a partial remainder equal to the negative divisor from occurring. An enhanced quotient digit selection function prevents the working partial remainder from becoming negative if the result is exact, choosing a quotient digit of zero instead of a quotient digit of one when the actual partial remainder is zero. Using a five-bit estimated partial remainder where the upper four bits are zero, a possible carry propagation into the fourth most significant bit is detected. This can be accomplished by looking at the fifth most significant sum and carry bits of the redundant partial remainder. If they are both zero, then a carry propagation out of that bit position into the least significant position of the estimated partial remainder is not possible, and a quotient digit of zero is chosen. This provides a one-cycle saving, since negative partial remainders no longer need to be restored before calculating the sticky bit. Extra hardware is eliminated because it is no longer necessary to provide any extra mechanism for restoring the preliminary final partial remainder. Latency is improved because no additional cycle time is required to restore negative preliminary partial remainders. In an alternative embodiment, where the upper four bits of the estimated partial remainder are ones while the fifth most significant bit is zero, a quotient digit of negative one is chosen. This alternative embodiment allows correct exact results in all rounding modes, including rounding toward plus or minus infinity.

    Exception handling for SIMD floating point-instructions using a floating point status register to report exceptions
    8.
    Granted invention patent
    Exception handling for SIMD floating point-instructions using a floating point status register to report exceptions (status: in force)

    Publication number: US06675292B2

    Publication date: 2004-01-06

    Application number: US09374052

    Filing date: 1999-08-13

    IPC classification: G06F15/00

    CPC classification: G06F9/3861

    Abstract: A method, apparatus, and computer program product for handling IEEE 754 standard exceptions for Single Instruction Multiple Data (SIMD) instructions. Each SIMD sub-operation's corresponding IEEE 754 exception flag is bit-wise “ORed” with an accrued exception field if a trap enable mask field is configured to mask the exception, with the “ORed” result written back to the accrued exception field. If the trap enable mask field is configured to enable the exception, the accrued exception field and a current exception field are cleared, and an unfinished floating-point exception flag is set in a floating-point trap type field. The actual sub-operation(s) causing the exception are determined through software.
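
The reporting rule can be sketched as a small state update on a status register. Everything named here is illustrative: the `Fsr` class, its field names (`tem`, `aexc`, `cexc`, `ftt`), the value chosen for `FTT_UNFINISHED`, and the choice to combine all sub-operation flags before testing them against the mask are assumptions made for the example, not details taken from the patent.

```python
from dataclasses import dataclass

FTT_UNFINISHED = 2   # ASSUMPTION: illustrative encoding of the trap type


@dataclass
class Fsr:
    tem: int = 0     # trap enable mask (1 bit per IEEE 754 exception)
    aexc: int = 0    # accrued exception field
    cexc: int = 0    # current exception field
    ftt: int = 0     # floating-point trap type field


def report_simd_exceptions(fsr: Fsr, lane_flags: list) -> bool:
    """Fold per-sub-operation flags into the status register.

    Returns True when a trap must be taken, in which case software is
    expected to work out which sub-operation raised the exception.
    """
    combined = 0
    for flags in lane_flags:
        combined |= flags
    if combined & fsr.tem:
        # An enabled exception occurred: clear both fields, flag the trap.
        fsr.aexc = 0
        fsr.cexc = 0
        fsr.ftt = FTT_UNFINISHED
        return True
    # All raised exceptions are masked: accumulate them.
    fsr.aexc |= combined
    return False
```

With `tem = 0b00001`, reporting lane flags `[0b10000, 0b00010]` leaves `aexc == 0b10010` and takes no trap, whereas a subsequent `[0b00001, 0]` clears `aexc` and sets `ftt`.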

    Auxiliary register file accessing technique
    9.
    Granted invention patent
    Auxiliary register file accessing technique (status: expired)

    Publication number: US5845307A

    Publication date: 1998-12-01

    Application number: US787339

    Filing date: 1997-01-27

    IPC classification: G06F9/30 G06F9/318 G06F12/02

    Abstract: Certain bits in existing op code formats for a processor do not change from one instruction to another when particular classes of instructions are used. Applicants optionally utilize one or more of these bits to identify one of a plurality of different register files from which to retrieve operands or in which to store the results of an operation. These bits, along with allocated address bits in predetermined address fields, now allow the processor to address many more registers. This can be used to increase the performance of the processor. Those programs not utilizing the bits outside of the address fields for designating a particular register file are backwards compatible with the modified processor.
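
A decoding sketch shows how otherwise-constant op code bits can widen the register address. The field positions below (`REG_SHIFT`, `FILE_SHIFT`), the field widths, and the name `decode_destination` are invented for illustration; the abstract does not give an instruction layout.

```python
# HYPOTHETICAL 32-bit instruction layout, for illustration only.
REG_SHIFT, REG_MASK = 25, 0x1F    # existing 5-bit register address field
FILE_SHIFT, FILE_MASK = 5, 0x3    # two formerly constant op code bits


def decode_destination(instr: int) -> tuple[int, int]:
    """Return (register_file, register_index) for an instruction word.

    Legacy code leaves the file-select bits at zero, so it always lands
    in register file 0 and behaves exactly as before.
    """
    file_id = (instr >> FILE_SHIFT) & FILE_MASK
    reg = (instr >> REG_SHIFT) & REG_MASK
    return file_id, reg


# Four files of 32 registers each: 128 addressable registers in total.
register_files = [[0] * 32 for _ in range(FILE_MASK + 1)]

legacy = 7 << REG_SHIFT                   # old-style instruction, register 7
extended = legacy | (2 << FILE_SHIFT)     # same register, auxiliary file 2
```

Here `decode_destination(legacy)` returns `(0, 7)` and `decode_destination(extended)` returns `(2, 7)`, which is the backward-compatibility property the abstract describes.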
