Abstract:
A system is provided for concurrently processing one or more branch instructions in an instruction bundle. The system includes multiple branch execution pipelines, each capable of executing a branch instruction to determine a branch direction, target address, and any side effects. Linking logic receives the resolved branch information and identifies the first branch instruction in execution order for which the branch direction is taken.
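The selection performed by the linking logic can be sketched in software as follows. This is only an illustrative model, not the disclosed circuit: the `ResolvedBranch` record, its `slot` field (position in execution order), and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ResolvedBranch:
    slot: int      # hypothetical: position in execution order within the bundle
    taken: bool    # resolved branch direction
    target: int    # resolved target address


def first_taken_branch(resolved: Sequence[ResolvedBranch]) -> Optional[ResolvedBranch]:
    """Return the earliest branch (in execution order) resolved as taken."""
    for branch in sorted(resolved, key=lambda b: b.slot):
        if branch.taken:
            return branch
    return None  # no branch in the bundle was taken
```

The pipelines may resolve branches in any order; sorting by slot before scanning models the requirement that the earliest taken branch wins.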
Abstract:
A method for gating a clock signal to an execution unit on long latency memory stalls monitors a stall signal, a scoreboard (data) hazard signal, a resource hazard signal, and a data return signal. The clock signal is decoupled from the execution unit when the stall and data hazard signals are asserted for a selected interval and the data return and resource hazard signals are not asserted for a selected interval.
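The gating condition can be modeled as a check over a window of recent cycles. The sketch below assumes one shared interval length for both groups of signals, which the abstract does not state; `CycleSignals` and `should_gate_clock` are hypothetical names.

```python
from typing import NamedTuple, Sequence


class CycleSignals(NamedTuple):
    stall: bool
    data_hazard: bool      # scoreboard hazard
    resource_hazard: bool
    data_return: bool


def should_gate_clock(history: Sequence[CycleSignals], interval: int) -> bool:
    """True when the last `interval` cycles all show a stall and a data hazard
    with no resource hazard and no returning data (assumed shared interval)."""
    if interval <= 0 or len(history) < interval:
        return False
    window = history[-interval:]
    return all(
        c.stall and c.data_hazard and not c.resource_hazard and not c.data_return
        for c in window
    )
```

A single cycle with returning data or a resource hazard inside the window keeps the clock coupled to the execution unit.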
Abstract:
A branch prediction instruction is provided that includes hint information for indicating a storage location for associated branch prediction information in a hierarchy of branch prediction storage structures. When the hint information is in a first state, branch prediction information is stored in a first structure that provides single cycle access to the stored information. When the hint information is in a second state, the branch prediction information is stored in a second structure that provides slower access to the stored information.
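A minimal software model of the hint-directed placement might look like the following. The two dictionaries stand in for the first and second storage structures; the class name, the boolean `hint_fast` encoding of the two hint states, and the slow-structure latency are assumptions made for illustration.

```python
SLOW_LATENCY_CYCLES = 2  # hypothetical; the abstract only says "slower"


class BranchPredictionHierarchy:
    def __init__(self):
        self.fast = {}  # first structure: single-cycle access
        self.slow = {}  # second structure: slower access

    def store(self, branch_addr, target, hint_fast):
        """Place prediction information according to the hint state."""
        structure = self.fast if hint_fast else self.slow
        structure[branch_addr] = target

    def lookup(self, branch_addr):
        """Return (target, access latency in cycles), or None on a miss."""
        if branch_addr in self.fast:
            return self.fast[branch_addr], 1
        if branch_addr in self.slow:
            return self.slow[branch_addr], SLOW_LATENCY_CYCLES
        return None
```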
Abstract:
The present invention is an apparatus to normalize a floating point number. The apparatus has a first storage area comprising the floating point number. The floating point number comprises an exponent field and an explicit bit. The apparatus further comprises a circuit to normalize the floating point number when the explicit bit is not set and the exponent field has a first predetermined value identifying a redundant denormal encoding of the floating point number. Otherwise the encoding of the number is not changed by the circuit.
Abstract:
An apparatus is provided for storing a number in a computer memory, the number originating from one of a plurality of floating point data formats. Each data format from which the number originates has a first exponent bias and a minimum exponent value. The number has a first exponent and an unbiased exponent value, the unbiased exponent value being equal to the difference between the first exponent and the first exponent bias. The number also has a sign and a significand. The apparatus for storing the number in computer memory consists of at least one sign bit and a significand having an explicit integer bit, the explicit integer bit having a first predetermined value when the number is normal and a second predetermined value when the number is denormal. The apparatus also has a second exponent with a second exponent bias, the second exponent being equal to the sum of the unbiased exponent value and the second exponent bias when the number is normal, and equal to the sum of the minimum exponent value and the second exponent bias when the number is denormal.
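The exponent rule can be worked through with a short sketch. The source formats below use the standard IEEE 754 single and double precision biases and minimum exponents; the second bias `REGISTER_BIAS` and the choice of 1 and 0 for the two predetermined values of the explicit integer bit are assumptions, not values given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFormat:
    bias: int          # first exponent bias
    min_exponent: int  # minimum (unbiased) exponent value


SINGLE = SourceFormat(bias=127, min_exponent=-126)
DOUBLE = SourceFormat(bias=1023, min_exponent=-1022)

REGISTER_BIAS = 0xFFFF  # hypothetical second exponent bias


def to_stored_exponent(first_exponent, fmt, is_denormal):
    """Return (explicit integer bit, second exponent) for the stored number."""
    if is_denormal:
        # Denormal: integer bit 0 (assumed), exponent pinned to the format minimum.
        return 0, fmt.min_exponent + REGISTER_BIAS
    # Normal: integer bit 1 (assumed), exponent re-biased to the second bias.
    unbiased = first_exponent - fmt.bias
    return 1, unbiased + REGISTER_BIAS
```

For example, a single precision value with first exponent 127 has unbiased exponent 0 and is stored with second exponent 65535, while a single precision denormal is stored with second exponent 65535 - 126 = 65409.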
Abstract:
Interconnect-dominated large register files are reduced in chip area and delay time. A register file in a processor having a number of execution units is divided into multiple copies. Different groups of execution units can read from and write to their own copy of the file registers by a set of local read and write ports. All of the register-file copies are synchronized by writing data from the execution units to remote write ports in at least some registers in other copies of the register file. Each copy can be divided into local and global registers. While all copies of the global registers continue to be written by the remote write ports, the local registers can be written only by a local cluster of execution units. Alternatively or additionally, all of the execution units can write to their local register-file copy, but only some of the units can write the global registers in all copies of the register file.
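The split between global and local registers can be illustrated with a small behavioral model. It is a sketch only: the class name and the convention that register indices below `num_global` are global are assumptions, and the model ignores port counts and timing.

```python
class ClusteredRegisterFile:
    def __init__(self, num_copies, num_global, num_local):
        self.num_global = num_global
        size = num_global + num_local
        # One full copy of the register file per cluster of execution units.
        self.copies = [[0] * size for _ in range(num_copies)]

    def write(self, cluster, reg, value):
        if reg < self.num_global:
            # Global register: local write plus remote writes to every other copy.
            for copy in self.copies:
                copy[reg] = value
        else:
            # Local register: written only in the writing cluster's own copy.
            self.copies[cluster][reg] = value

    def read(self, cluster, reg):
        # Reads always use the cluster's own copy through its local read ports.
        return self.copies[cluster][reg]
```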
Abstract:
A pipelined data processor has instructions at different stages of execution. Some of the instructions specify virtual addresses into a file of registers having physical addresses. A speculative translator maps the virtual registers of an instruction at one pipeline stage into physical registers for speculative use by the instruction at a later pipeline stage. The registers have multiple differently translated regions. Failure of speculative renaming reverts to an archive copy of renaming data.
Abstract:
A processor having a large register file utilizes a template field for encoding a set of most useful instruction sequences in a long instruction word format. The instruction set of the processor includes instructions which are one of a plurality of different instruction types. The execution units of the processor are similarly categorized into different types, wherein each instruction type may be executed on one or more of the execution unit types. The instructions are grouped together into 128-bit sized and aligned containers called bundles, with each bundle including a plurality of instruction slots and a template field that specifies the mapping of the instruction slots to the execution unit types.
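Decoding such a bundle can be sketched as below. The abstract fixes only the 128-bit bundle size; the 5-bit template in the low-order bits, the three 41-bit slots, and the two entries in `TEMPLATES` are assumptions chosen for illustration.

```python
TEMPLATE_BITS = 5   # assumed template field width
SLOT_BITS = 41      # assumed slot width (5 + 3 * 41 = 128)
NUM_SLOTS = 3

# Illustrative template table: template value -> execution unit type per slot.
TEMPLATES = {
    0x00: ("M", "I", "I"),
    0x10: ("M", "I", "B"),
}


def decode_bundle(bundle):
    """Split a 128-bit bundle into (template, unit types or None, slot values)."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = [
        (bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
        for i in range(NUM_SLOTS)
    ]
    return template, TEMPLATES.get(template), slots
```

Because the template names the unit type for every slot, the individual instructions do not need to carry their own unit-type encoding.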
Abstract:
Moderately coupled floating point and integer units of a processor allow for rapid transfer of data between the two units. The integer unit comprises a plurality of integer registers arranged into an integer register file and coupled to one or more integer execution units. Similarly, the floating point unit comprises a plurality of floating point registers arranged into a floating point register file and coupled to one or more floating point execution units. The two units operate as separate units except for the data transfer between them on a transfer bus. The transfer bus is the only direct data link between the two register files. Multiplexers control the bit transfer between the two register files so that all or some of the bits of a register are transferred to a receiving register. Furthermore, the data transfer scheme allows both numeric and Boolean values to be transferred, and compounding of Booleans can be performed in either numeric unit.
Abstract:
Fast exception processing is disclosed. In one embodiment, a system includes a splice cache, an exception logic, and an instrumentation mechanism. The splice cache contains one or more lightweight handlers. The exception logic is coupled to the splice cache and determines whether the corresponding lightweight handler for an exception is located in the splice cache. The instrumentation mechanism is coupled to the splice cache. The instrumentation mechanism inserts the lightweight handler into an execution stream.
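The lookup-and-insert flow can be modeled roughly as follows. The class and function names, the use of an integer exception code as the cache key, and the choice to splice the handler immediately before the excepting instruction are all assumptions for illustration; the abstract does not specify them.

```python
class SpliceCache:
    """Holds lightweight handlers keyed by a (hypothetical) exception code."""

    def __init__(self):
        self._handlers = {}

    def install(self, exc_code, handler_instrs):
        self._handlers[exc_code] = list(handler_instrs)

    def lookup(self, exc_code):
        return self._handlers.get(exc_code)


def splice_handler(stream, pc, exc_code, cache):
    """Return (new execution stream, hit flag).

    On a hit, the handler is inserted ahead of the instruction at index `pc`.
    On a miss, the stream is returned unchanged so that a conventional
    exception path can be taken instead.
    """
    handler = cache.lookup(exc_code)
    if handler is None:
        return list(stream), False
    return stream[:pc] + handler + stream[pc:], True
```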