摘要:
A CPU adapted to calculate a checksum simultaneously on multiple values packed into a single register. An adder is provided which adds a number of values packed into a first register to a number of packed values from a second register. The adder is constructed, or partitioned, so that the values do not propagate their carry bit to the next value. A special carry bit adder is provided which will add a carry bit out of each partitioned portion back into the sum value to generate the sum required by the checksum protocol.
摘要:
Quotient digit selection logic is modified so as to prevent a partial remainder equal to the negative divisor from occurring. An enhanced quotient digit selection function prevents the working partial remainder from becoming negative if the result is exact. The enhanced quotient digit selection logic chooses a quotient digit of zero instead of a quotient digit of one when the actual partial remainder is zero. Using a five bit estimated partial remainder where the upper four bits are zero, a possible carry propagation into fourth most significant bit is detected. This can be accomplished by looking at the fifth most significant sum and carry bits of the redundant partial remainder. If they are both zero, then a carry propagation out of that bit position into the least significant position of the estimated partial remainder is not possible, and a quotient digit of zero is chosen. In the alternative case in which one or both of the fifth most significant carry or sum bits of the redundant partial remainder are ones, a quotient digit of one is chosen. This provides a one cycle savings since negative partial remainders no longer need to be restored before calculating the sticky bit. Extra hardware is eliminated because it is no longer necessary to provide any extra mechanism for restoring the preliminary final partial remainder. Latency is improved because no additional cycle time is required to restore negative preliminary partial remainders. An optimized five-level circuit is shown which implements the enhanced quotient selection function.
摘要:
Where it is desired to perform a double precision operation using single precision operands, first and second single precision operands are loaded into first and second respective rows of a re-order buffer, and third and fourth single precision operands are loaded into third and fourth respective rows of the re-order buffer. A first merge instruction copies the first and second single precision operands from respective first and second rows of the re-order buffer into first and second portions of a fifth row of the re-order buffer, thereby concatenating the first and second single precision operands to represent a first double precision operand. A second merge instruction copies the third and fourth single precision operands from respective third and fourth rows of the re-order buffer into first and second portions of a sixth row of the re-order buffer, thereby concatenating the third and fourth single precision operands to represent a second double precision operand. The first and second double precision operands stored in the fifth and sixth rows, respectively, of the re-order buffer are then provided directly to an associated FPU for execution.
摘要:
Quotient digit selection logic using a three-bit carry propagate adder is presented. An enhanced quotient digit selection function prevents the working partial remainder from becoming negative if the result is exact. The enhanced quotient digit selection logic chooses a quotient digit of zero instead of a quotient digit of one when the actual partial remainder is zero. Using a four bit estimated partial remainder where the upper four bits are zero, a possible carry propagation into fourth most significant bit is detected. This can be accomplished by looking at the fourth most significant sum and carry bits of the redundant partial remainder. If they are both zero, then a carry propagation out of that bit position into the least significant position of the estimated partial remainder is not possible, and a quotient digit of zero is chosen. This provides a one cycle savings since negative partial remainders no longer need to be restored before calculating the sticky bit. Extra hardware is eliminated because it is no longer necessary to provide any extra mechanism for restoring the preliminary final partial remainder. Latency is improved because no additional cycle time is required to restore negative preliminary partial remainders. In an alternative embodiment, where the upper three bits of the estimated partial remainder are ones while the fourth most significant bit is zero, a quotient digit of negative one is chosen. This alternative embodiment allows correct exact results in all rounding modes including rounding toward plus or minus infinity.
摘要:
Quotient digit selection logic is modified so as to prevent a partial remainder equal to the negative divisor from occurring. An enhanced quotient digit selection function prevents the working partial remainder from becoming negative if the result is exact, choosing a quotient digit of zero instead of a quotient digit of one when the actual partial remainder is zero. Using a five bit estimated partial remainder where the upper four bits are zero, a possible carry propagation into fourth most significant bit is detected. This can be accomplished by looking at the fifth most significant sum and carry bits of the redundant partial remainder. If they are both zero, then a carry propagation out of that bit position into the least significant position of the estimated partial remainder is not possible, and a quotient digit of zero is chosen. This provides a one cycle savings since negative partial remainders no longer need to be restored before calculating the sticky bit. Extra hardware is eliminated because it is no longer necessary to provide any extra mechanism for restoring the preliminary final partial remainder. Latency is improved because no additional cycle time is required to restore negative preliminary partial remainders. In an alternative embodiment, where the upper four bits of the estimated partial remainder are ones while the fifth most significant bit is zero, a quotient digit of negative one is chosen. This alternative embodiment allows correct exact results in all rounding modes including rounding toward plus or minus infinity.
摘要:
In hardware SRT division and square root mantissa units maximal quotient selection overlapping for three quotient digits per cycle are used. An effective radix-8 implementation cascades three partial remainder computation circuits and overlaps three quotient selection circuits. Two carry save adders speculatively compute the possible resulting partial remainders corresponding to each possible value, -1, 0, and +1, of the quotient digit by adding the divisor, not adding anything, and adding the two's complement of the divisor, respectively, thus shortening the critical path of a single SRT iteration producing a single quotient digit. The propagation delays of two carry save adders which speculatively compute the possible resulting partial remainders are masked by a longer delay through quotient selection logic.
摘要:
A method, apparatus, and computer program product for handling IEEE 754 standard exceptions for Single Instruction Multiple Data (SIMD) instructions. Each SIMD sub-operation's corresponding IEEE 754 exception flag is bit-wise “ORed” with an accrued exception field if a trap enable mask field is configured to mask the exception, with the “ORed” result written back in the accrued exception field. If the trap enable mask field is configured to enable the exception, the accrued exception field and a current exception field are cleared, and an unfinished floating-point exception flag is set in a floating-point trap type field. The actual sub-operation(s) causing the exception is determined through software.
摘要:
Certain bits in existing op code formats for a processor do not change from one instruction to another when particular classes of instructions are used. Applicants optionally utilize one or more of these bits to identify one of a plurality of different register files from which to retrieve operands or to store the results of an operation. These bits along with allocated address bits in predetermined address fields now allow the processor to address many more registers. This can be used to increase the performance of the processor. Those programs not utilizing the bits outside of the address fields for designating a particular register file are backwards compatible with the modified processor.
摘要:
In hardware SRT division and square root mantissa units maximal quotient selection overlapping for three quotient digits per cycle are used. An effective radix-8 implementation cascades three partial remainder computation circuits and overlaps three quotient selection circuits. Two carry save adders speculatively compute the possible resulting partial remainders corresponding to each possible value, -1, 0 , and +1, of the quotient digit by adding the divisor, not adding anything, and adding the two's complement of the divisor, respectively, thus shortening the critical path of a single SRT iteration producing a single quotient digit. The propagation delays of two carry save adders which speculatively compute the possible resulting partial remainders are masked by a longer delay through quotient selection logic.