Abstract:
A method for operating a fused-multiply-add pipeline in a floating-point unit of a processor is disclosed. A multiplication is initially performed between a first operand and a second operand in a multiplier block to obtain a set of partial product results. The partial product results are sent to a carry-save adder block. A partial product reduction is performed on the partial product results to generate a carry-save result having a sum term and a carry term. The carry-save result is then formatted to generate a carry-out bit. The carry-save result is added to a third operand to generate a final result.
Abstract:
Disclosed below are representative embodiments of methods, apparatus, and systems for performing formal verification. For example, certain embodiments can be used to formally verify a Booth multiplier. For instance, in one example embodiment, a specification of a Booth multiplier circuit is received; an initial model checking operation is performed for a smaller version of the Booth multiplier circuit; a series of subsequent model checking operations are performed for versions of the Booth multiplier circuit that are incrementally larger than the smaller version of the Booth multiplier circuit, wherein, for each incrementally larger Booth multiplier circuit, two or more model checking operations are performed, the two or more model checking operations representing decomposed proof obligations for showing; and a verification result of the Booth multiplier circuit is output.
Abstract:
An apparatus having operand registers, an opcode detector, a carryless preformat unit, a compressor, a left shifter, and exclusive-OR logic. The operand registers receive operands for a carryless multiplication. The opcode detector receives a carryless multiplication instruction, and asserts a carryless signal. The carryless preformat unit partitions a first operand into a plurality of parts that are such that a Booth encoder is precluded from selection of second partial products of a second operand, where the second partial products reflect implicit carry operations. The compressor sums first partial products of the second operand via carry save adders arranged in a Wallace tree configuration, where generation of carry bits is disabled. The left shifter shifts one or more outputs of the compressor. The exclusive-OR logic executes an exclusive-OR function to yield a carryless multiplication result.
Abstract:
A unified computation unit for iterative multiplication and division may include an architecture having a unified integer iterative multiplication and division circuit. A method may include a device receiving a dividend and a divisor for a division operation, separating the dividend into two parts based on the determining, and evaluating whether an overflow situation exists based on the two parts. A single-cycle multiplication unit may include a multi-operand addition schema for partial products compression that implements tree-based addition methods for single-cycle multiplication operations.
Abstract:
A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
Abstract:
A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
Abstract:
A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.
Abstract:
A matrix of execution blocks form a set of rows and columns. The rows support parallel execution of instructions and the columns support execution of dependent instructions. The matrix of execution blocks process a single block of instructions specifying parallel and dependent instructions.
Abstract:
A matrix of execution blocks form a set of rows and columns. The rows support parallel execution of instructions and the columns support execution of dependent instructions. The matrix of execution blocks process a single block of instructions specifying parallel and dependent instructions.
Abstract:
Methods and apparatus relating to reducing power consumption in multi-precision floating point multipliers are described. In an embodiment, certain portions of a multiplier are disabled in response to two or more multiplication operations with the same data size and data type occurring back-to-back. Other embodiments are also claimed and described.