Abstract:
Systems, apparatuses, and methods for implementing zero cycle load bypass operations are described. A system includes a processor with at least a decode unit, control logic, mapper, and free list. When a load operation is detected, the control logic determines if the load operation qualifies to be converted to a zero cycle load bypass operation. Conditions for qualifying include the load operation being in the same decode group as an older store operation to the same address. Qualifying load operations are converted to zero cycle load bypass operations. A lookup of the free list is prevented for a zero cycle load bypass operation and a destination operand of the load is renamed with a same physical register identifier used for a source operand of the store. Also, the data of the store is bypassed to the load.
Abstract:
A processor includes a mechanism for disabling a memory array of a branch prediction unit. The processor may include a next fetch prediction unit that may include a number of entries. Each entry may correspond to a next instruction fetch group and may store an indication of whether or not the corresponding the next fetch group includes a conditional branch instruction. In response to an indication that the next fetch group does not include a conditional branch instruction, the fetch prediction unit may be configured to disable, in a next instruction execution cycle, the memory array of the branch prediction unit.
Abstract:
In an embodiment, an apparatus includes a plurality of memories configured to store respective data in a plurality of branch prediction entries. Each branch prediction entry corresponds to at least one of a plurality of branch instructions. The apparatus also includes a control circuit configured to store first data associated with a first branch instruction into a corresponding branch prediction entry in at least one memory of the plurality of memories. The control circuit is further configured to select a first memory of the plurality of memories, to disconnect the first memory from a power supply in response to a detection of a first power mode signal, and to cease storing data in the plurality of memories in response to the detection of the first power mode signal.
Abstract:
A system and method for efficiently indicating branch target addresses. A semiconductor chip predecodes instructions of a computer program prior to installing the instructions in an instruction cache. In response to determining a particular instruction is a control flow instruction with a displacement relative to a program counter address (PC), the chip replaces a portion of the PC relative displacement in the particular instruction with a subset of a target address. The subset of the target address is an untranslated physical subset of the full target address. When the recoded particular instruction is fetched and decoded, the remaining portion of the PC relative displacement is added to a virtual portion of the PC used to fetch the particular instruction. The result is concatenated with the portion of the target address embedded in the fetched particular instruction to form a full target address.
Abstract:
In an embodiment, a processor includes a register file having multiple widths corresponding to different operands sizes of a given data type implemented by the processor. For example, the integer register file may have 32 bit and 64 bit widths for 32 and 64 bit operand sizes. The register file may have a section of registers for each operand size, and the map unit may allocate registers from the appropriate section for each instruction operation based on the operand size of that instruction operation. The register file may consume less integrated circuit area than another register file having the same number of registers, all of which are implemented at the largest operand size. In some embodiments, only the register file and the map unit (specifically the free list management logic in the map unit) are changed to implement the multiple-width register file.
Abstract:
Systems, processors, and methods for determining when to enter loop buffer mode early for loops in an instruction stream. A processor waits until a branch history register has saturated before entering loop buffer mode for a loop if the processor has not yet determined the loop has an unpredictable exit. However, if the loop has an unpredictable exit, then the loop is allowed to enter loop buffer mode early. While in loop buffer mode, the loop is dispatched from a loop buffer, and the front-end of the processor is powered down until the loop terminates.
Abstract:
A system and method for efficiently indicating branch target addresses. A semiconductor chip predecodes instructions of a computer program prior to installing the instructions in an instruction cache. In response to determining a particular instruction is a control flow instruction with a displacement relative to a program counter address (PC), the chip replaces a portion of the PC relative displacement in the particular instruction with a subset of a target address. The subset of the target address is an untranslated physical subset of the full target address. When the recoded particular instruction is fetched and decoded, the remaining portion of the PC relative displacement is added to a virtual portion of the PC used to fetch the particular instruction. The result is concatenated with the portion of the target address embedded in the fetched particular instruction to form a full target address.
Abstract:
A processor includes a mechanism for disabling a memory array of a branch prediction unit. The processor may include a next fetch prediction unit that may include a number of entries. Each entry may correspond to a next instruction fetch group and may store an indication of whether or not the corresponding the next fetch group includes a conditional branch instruction. In response to an indication that the next fetch group does not include a conditional branch instruction, the fetch prediction unit may be configured to disable, in a next instruction execution cycle, the memory array of the branch prediction unit.
Abstract:
In an embodiment, a processor includes a register file having multiple widths corresponding to different operands sizes of a given data type implemented by the processor. For example, the integer register file may have 32 bit and 64 bit widths for 32 and 64 bit operand sizes. The register file may have a section of registers for each operand size, and the map unit may allocate registers from the appropriate section for each instruction operation based on the operand size of that instruction operation. The register file may consume less integrated circuit area than another register file having the same number of registers, all of which are implemented at the largest operand size. In some embodiments, only the register file and the map unit (specifically the free list management logic in the map unit) are changed to implement the multiple-width register file.