摘要:
A microprocessor comprises an instruction execution unit operable to generate an intermediate result vector and a plurality of calculation control indicators and storage external to the instruction execution unit which stores the intermediate result vector and the plurality of calculation control indicators. The intermediate result vector is generated from an application of at least a first arithmetic operation of a compound arithmetic operation. The calculation control indicators indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The subsequent calculations may involve one or more remaining arithmetic operations of the compound arithmetic operation. The intermediate result vector, in combination with the plurality of calculation control indicators, provides sufficient information to generate a result indistinguishable from an infinitely precise calculation of the compound arithmetic operation whose result is reduced in significance to a target data size.
摘要:
A microprocessor prepares a fused multiply-accumulate operation of a form ±A*B±C for execution by issuing first and second multiply-accumulate microinstructions to one or more instruction execution units to complete the fused multiply-accumulate operation. The first multiply-accumulate microinstruction causes an unrounded nonredundant result vector to be generated from a first accumulation of a selected one of (a) the partial products of A and B or (b) C with the partial products of A and B. The second multiply-accumulate microinstruction causes performance of a second accumulation of C with the unrounded nonredundant result vector, if the first accumulation did not include C. The second multiply-accumulate microinstruction also causes a final rounded result to be generated from the unrounded nonredundant result vector, wherein the final rounded result is a complete result of the fused multiply-accumulate operation.
摘要:
A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C. An evaluation is made to detect whether values of A, B, and/or C meet a sufficient condition for performing a joint accumulation of C with partial products of A and B. If so, a joint accumulation of C is done with partial products of A and B and result of the joint accumulation is rounded. If not, then a primary accumulation is done of the partial products of A and B. This generates an unrounded non-redundant result of the primary accumulation. The unrounded result is then truncated to generate an unrounded non-redundant intermediate result vector that excludes one or more least significant bits of the unrounded non-redundant result. A secondary accumulation is then performed, adding or subtracting C to the unrounded non-redundant intermediate result vector. Finally, the result of the secondary accumulation is rounded.
摘要:
A round-for-reround mode (preferably in a BID encoded Decimal format) of a floating point instruction prepares a result for later rounding to a variable number of digits by detecting that the least significant digit may be a 0, and if so changing it to 1 when the trailing digits are not all 0. A subsequent reround instruction is then able to round the result to any number of digits at least 2 fewer than the number of digits of the result. An optional embodiment saves a tag indicating the fact that the low order digit of the result is 0 or 5 if the trailing bits are non-zero in a tag field rather than modify the result. Another optional embodiment also saves a half-way-and-above indicator when the trailing digits represent a decimal with a most significant digit having a value of 5. An optional subsequent reround instruction is able to round the result to any number of digits fewer or equal to the number of digits of the result using the saved tags.
摘要:
A round-for-reround mode (preferably in a BID encoded Decimal format) of a floating point instruction prepares a result for later rounding to a variable number of digits by detecting that the least significant digit may be a 0, and if so changing it to 1 when the trailing digits are not all 0. A subsequent reround instruction is then able to round the result to any number of digits at least 2 fewer than the number of digits of the result. An optional embodiment saves a tag indicating the fact that the low order digit of the result is 0 or 5 if the trailing bits are non-zero in a tag field rather than modify the result. Another optional embodiment also saves a half-way-and-above indicator when the trailing digits represent a decimal with a most significant digit having a value of 5. An optional subsequent reround instruction is able to round the result to any number of digits fewer or equal to the number of digits of the result using the saved tags.
摘要:
A digital signal processor is provided. The digital signal processor includes an execution circuit configured to receive a first data including first bits expressed in a signed magnitude method and a second data including second bits expressed in the signed magnitude method, and a control logic circuit configured to output a control signal that determines a type of operation on the first data and the second data based on a command signal, wherein the execution circuit is further configured to perform an operation on the first data and the second data according to a determined type of operation and generate a result of the operation.
摘要:
Embodiments disclosed pertain to apparatuses, systems, and methods for floating point operations. Disclosed embodiments pertain to a circuit that is capable of processing both a normal and denormal inputs and outputting normal and denormal results, and where a rounding module is used advantageously to reduce operational latency of the circuit.
摘要:
A data processing apparatus and method are provided for multiplying first and second normalized floating point operands in order to generate a result, each normalized floating point operand comprising a significand and an exponent. Exponent determination circuitry is used to compute a result exponent for a normalized version of the result, and rounding value generation circuitry then generates a rounding value by shifting a rounding constant in a first direction by a shift amount that is dependent on the result exponent. Partial product generation circuitry multiplies the significands of the first and second normalized floating point operands to generate the first and second partial products, and the first and second partial products are then added together, along with the rounding value, in order to generate a normalized result significand. Thereafter, the normalized result significand is shifted in a second direction opposite to the first direction, by the shift amount, in order to generate a rounded result significand. This provides a particularly efficient mechanism for multiplying floating point numbers, while correctly rounding the result in situations where the result is subnormal.
摘要:
A microprocessor prepares a fused multiply-accumulate operation of a form ±A*B±C for execution by issuing first and second multiply-accumulate microinstructions to one or more instruction execution units to complete the fused multiply-accumulate operation. The first multiply-accumulate microinstruction causes an unrounded nonredundant result vector to be generated from a first accumulation of a selected one of (a) the partial products of A and B or (b) C with the partial products of A and B. The second multiply-accumulate microinstruction causes performance of a second accumulation of C with the unrounded nonredundant result vector, if the first accumulation did not include C. The second multiply-accumulate microinstruction also causes a final rounded result to be generated from the unrounded nonredundant result vector, wherein the final rounded result is a complete result of the fused multiply-accumulate operation.
摘要:
A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C. An evaluation is made to detect whether values of A, B, and/or C meet a sufficient condition for performing a joint accumulation of C with partial products of A and B. If so, a joint accumulation of C is done with partial products of A and B and result of the joint accumulation is rounded. If not, then a primary accumulation is done of the partial products of A and B. This generates an unrounded non-redundant result of the primary accumulation. The unrounded result is then truncated to generate an unrounded non-redundant intermediate result vector that excludes one or more least significant bits of the unrounded non-redundant result. A secondary accumulation is then performed, adding or subtracting C to the unrounded non-redundant intermediate result vector. Finally, the result of the secondary accumulation is rounded.