Abstract:
A mechanism for reducing power consumption of a cache memory of a processor includes a processor with a cache memory that stores instruction information for one or more instruction fetch groups fetched from a system memory. The cache memory may include a number of ways that are each independently controllable. The processor also includes a way prediction unit. The way prediction unit may enable, in a next execution cycle, a given way within which instruction information corresponding to a target of a next branch instruction is stored in response to a branch taken prediction for the next branch instruction. The way prediction unit may also, in response to the branch taken prediction for the next branch instruction, enable, one at a time, each corresponding way within which instruction information corresponding to respective sequential instruction fetch groups that follow the next branch instruction are stored.
Abstract:
A mechanism for reducing power consumption of a cache memory of a processor includes a processor with a cache memory that stores instruction information for one or more instruction fetch groups fetched from a system memory. The cache memory may include a number of ways that are each independently controllable. The processor also includes a way prediction unit. The way prediction unit may enable, in a next execution cycle, a given way within which instruction information corresponding to a target of a next branch instruction is stored in response to a branch taken prediction for the next branch instruction. The way prediction unit may also, in response to the branch taken prediction for the next branch instruction, enable, one at a time, each corresponding way within which instruction information corresponding to respective sequential instruction fetch groups that follow the next branch instruction are stored.
Abstract:
A cache subsystem is disclosed. The cache subsystem includes a cache configured to store information in cache lines arranged in a plurality of ways. A requestor circuit generates a request to access a particular cache line in the cache. A prediction circuit is configured to generate a prediction of which of the ways includes the particular cache line. A comparison circuit verifies the prediction by comparing a particular address tag associated with the particular cache line to a cache tag corresponding to a predicted one of the ways. Responsive to determining that the prediction was correct, a confirmation indication is stored indicating the correct prediction. For a subsequent request for the particular cache line, the cache is configured to forego a verification of the prediction that the particular cache line is included in the one of the ways based on the confirmation indication.
Abstract:
An embodiment of an apparatus includes a processing circuit and a system memory. The processing circuit may store a pending request in a buffer, the pending request corresponding to a transaction that includes a write request to the system memory. The processing circuit may also allocate an entry in a write table corresponding the transaction. After sending the transaction to the system memory to be processed, the pending request in the buffer may be removed in response to the allocation of the write entry.
Abstract:
A processor includes a mechanism for disabling a memory array of a branch prediction unit. The processor may include a next fetch prediction unit that may include a number of entries. Each entry may correspond to a next instruction fetch group and may store an indication of whether or not the corresponding the next fetch group includes a conditional branch instruction. In response to an indication that the next fetch group does not include a conditional branch instruction, the fetch prediction unit may be configured to disable, in a next instruction execution cycle, the memory array of the branch prediction unit.
Abstract:
Techniques are disclosed relating to a cache for patterns of instructions. In some embodiments, an apparatus includes an instruction cache and is configured to detect a pattern of execution of instructions by an instruction processing pipeline. The pattern of execution may involve execution of only instructions in a particular group of instructions. The instructions may include multiple backward control transfers and/or a control transfer instruction that is taken in one iteration of the pattern and not taken in another iteration of the pattern. The apparatus may be configured to store the instructions in the instruction cache and fetch and execute the instructions from the instruction cache. The apparatus may include a branch predictor dedicated to predicting the direction of control transfer instructions for the instruction cache. Various embodiments may reduce power consumption associated with instruction processing.
Abstract:
Techniques are disclosed relating to power reduction during execution of instruction loops. Multiple different power saving modes may be used by a processor, such as a first power saving mode after only a few loop iterations (e.g., 2-3) and a second, deeper power saving mode after a greater number of loop iterations. The first power saving mode may include keeping a branch predictor and/or other structures active, but the second power saving mode may include reducing power to the branch predictor and/or other structures. An observation mode and an instruction capture mode may also be used by a processor prior to entering a power saving mode for loop execution. Power saving modes may also be achieved during execution of complex loops having multiple backward branches (e.g., nested loops).
Abstract:
Disclosed techniques relate to trace caches. Trace cache circuitry may identify traces that satisfy one or more criteria. Generally, internal branches of a trace should satisfy a threshold bias level in a particular direction. To achieve this goal, the processor may initially assume that branches meet the threshold, track their usefulness in the trace context over time, and prevent inclusion of branches that fall below a usefulness threshold (which indicates that those branches are not sufficiently biased). Branches that do not meet the threshold may be added to a Bloom filter, for example. Usefulness may be tracked during trace training, when valid in a trace cache, or both.
Abstract:
An apparatus includes an instruction cache circuit and an instruction fetch circuit. The instruction fetch circuit is configured to retrieve, from the instruction cache circuit, a fetch group that includes a plurality of instructions for execution by a processing circuit, and to make a determination that the fetch group includes a control transfer instruction that is predicted to be taken. A target address associated with the control transfer instruction is directed to an instruction within the fetch group. The instruction fetch circuit is further configured to, based on the determination, alter instructions within the fetch group in a manner that is based on a type of the control transfer instruction.
Abstract:
A processor includes a mechanism for disabling a memory array of a branch prediction unit. The processor may include a next fetch prediction unit that may include a number of entries. Each entry may correspond to a next instruction fetch group and may store an indication of whether or not the corresponding the next fetch group includes a conditional branch instruction. In response to an indication that the next fetch group does not include a conditional branch instruction, the fetch prediction unit may be configured to disable, in a next instruction execution cycle, the memory array of the branch prediction unit.