Abstract:
Systems and methods for operating a processor include determining confidence levels, such as high, medium, and low confidence levels, associated with in-flight branch instructions in an instruction pipeline of the processor, based on counters used for predicting directions of the in-flight branch instructions. The number of in-flight branch instructions associated with each of the confidence levels is determined. A weighted sum of these numbers, weighted with weights corresponding to the confidence levels, is calculated and compared with a threshold. Based on the comparison, a throttling signal may be asserted to indicate that instructions are to be throttled in a pipeline stage of the instruction pipeline.
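The weighted-sum throttling check reduces to a small amount of arithmetic. Below is a minimal C sketch under assumed details not given in the abstract: three confidence levels, illustrative weights of 4/2/1 (low-confidence branches weighted most heavily), and hypothetical names such as should_throttle.

```c
#include <stdbool.h>

/* Confidence levels assigned to in-flight branches from predictor counters. */
enum conf_level { CONF_LOW, CONF_MEDIUM, CONF_HIGH, NUM_CONF_LEVELS };

/* Illustrative weights: low-confidence branches are the most likely to be
 * mispredicted, so they contribute most to the weighted sum. */
static const unsigned weights[NUM_CONF_LEVELS] = { 4, 2, 1 };

/* Assert the throttling signal when the weighted sum of in-flight branch
 * counts, one count per confidence level, exceeds the threshold. */
bool should_throttle(const unsigned counts[NUM_CONF_LEVELS], unsigned threshold)
{
    unsigned weighted_sum = 0;
    for (int i = 0; i < NUM_CONF_LEVELS; i++)
        weighted_sum += weights[i] * counts[i];
    return weighted_sum > threshold;
}
```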
Abstract:
Replaying speculatively dispatched load-dependent instructions in response to a cache miss for a producing load instruction in an out-of-order processor (OoP) is disclosed. To allow a scheduler circuit to restore register dependencies in a register dependency tracking circuit for a replay operation in response to a cache miss for execution of a load instruction, the scheduler circuit includes a replay circuit. The replay circuit includes a load dependency tracking circuit and is configured to track dependencies of dispatched load instructions in the load dependency tracking circuit. The replay circuit uses these tracked dependencies to restore register dependencies for the dispatched load instructions in the register dependency tracking circuit in response to a replay operation. Thus, the load instruction does not have to be re-allocated to restore register dependencies in the register dependency tracking circuit used for re-dispatching load-dependent instructions.
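As a rough software analogy of the replay circuit, the C sketch below, with hypothetical structures and sizes, records the destination register of each dispatched load so that, on a cache miss, the register can be marked not-ready again and load-dependent instructions re-dispatched without re-allocating the load.

```c
#include <stdbool.h>

#define NUM_REGS  64   /* hypothetical physical register count   */
#define MAX_LOADS 16   /* hypothetical number of in-flight loads */

/* Register dependency tracking: one ready bit per register. */
static bool reg_ready[NUM_REGS];

/* Load dependency tracking: the register each in-flight load produces. */
struct load_entry { bool valid; int dest_reg; };
static struct load_entry load_table[MAX_LOADS];

/* On dispatch, record the producing load's destination register and
 * speculatively mark it ready so dependents can be dispatched. */
void track_dispatched_load(int slot, int dest_reg)
{
    load_table[slot].valid = true;
    load_table[slot].dest_reg = dest_reg;
    reg_ready[dest_reg] = true;
}

/* On a cache miss, restore the dependency from the tracked entry: the
 * destination register becomes not-ready again, so dependents wait for
 * the replayed load instead of the load being re-allocated. */
void replay_on_miss(int slot)
{
    if (load_table[slot].valid)
        reg_ready[load_table[slot].dest_reg] = false;
}
```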
Abstract:
Next line prefetchers employing initial high prefetch prediction confidence states for throttling next line prefetches in processor-based systems are disclosed. A next line prefetcher prefetches a next memory line into cache memory in response to a read operation. To mitigate prefetch mispredictions, the next line prefetcher is throttled to cease prefetching after a prefetch prediction confidence state becomes a no next line prefetch state indicating a number of incorrect predictions. Instead of the initial prefetch prediction confidence state being set to the no next line prefetch state, which must then be built up through correct predictions before a next line prefetch is performed, the initial prefetch prediction confidence state is set to a next line prefetch state that allows next line prefetching. Thus, the next line prefetcher starts prefetching next lines without first requiring correct predictions to be "built up" in the prefetch prediction confidence state. CPU performance may be increased because prefetching begins sooner rather than waiting for correct predictions to occur.
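The confidence state can be modeled as a small saturating counter. A minimal C sketch, assuming a 2-bit counter and hypothetical names, shows the key choice: the counter is initialized to its maximum rather than to the no-prefetch state, so prefetching begins immediately and is throttled only after repeated mispredictions.

```c
#include <stdbool.h>

/* Hypothetical 2-bit saturating prefetch prediction confidence counter.
 * 0 is the "no next line prefetch" state; any higher value allows prefetch. */
#define CONF_MAX     3
#define NO_PREFETCH  0

static unsigned conf = CONF_MAX;  /* initial state set high, not to 0 */

bool should_prefetch_next_line(void)
{
    return conf != NO_PREFETCH;
}

/* Train on the outcome of a prior next line prefetch. */
void update_confidence(bool prediction_was_correct)
{
    if (prediction_was_correct) {
        if (conf < CONF_MAX) conf++;
    } else if (conf > NO_PREFETCH) {
        conf--;  /* enough incorrect predictions reach the no-prefetch state */
    }
}
```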
Abstract:
Selective flushing of instructions in an instruction pipeline in a processor back to an execution-determined target address in response to a precise interrupt is disclosed. A selective instruction pipeline flush controller determines if a precise interrupt has occurred for an executed instruction in the instruction pipeline. The selective instruction pipeline flush controller then determines if the instruction at the correct resolved target address of the instruction that caused the precise interrupt is contained in the instruction pipeline. If so, the selective instruction pipeline flush controller can selectively flush instructions back to the instruction at the correct resolved target address to reduce the amount of new instruction fetching. In this manner, as an example, the performance penalty of precise interrupts can be lessened through less instruction refetching and reduced delay in refilling the instruction pipeline when the instruction at the correct target address is already in the pipeline.
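A simplified C sketch of the selective flush decision follows, assuming the pipeline is modeled as an age-ordered array of slots and using hypothetical names; the actual controller operates on pipeline stages rather than an array.

```c
#include <stdbool.h>
#include <stdint.h>

#define PIPE_DEPTH 32  /* hypothetical number of in-flight instruction slots */

struct pipe_slot { bool valid; uint64_t pc; };
static struct pipe_slot pipe[PIPE_DEPTH];  /* index 0 = oldest instruction */

/* Search the slots younger than the interrupting instruction for the
 * instruction at the resolved target address. If present, flush only the
 * wrong-path slots between them; the target and younger instructions are
 * kept, so no refetch is needed. Otherwise report failure so the caller
 * performs a conventional full flush and refetch from the target. */
bool selective_flush(int interrupt_slot, uint64_t resolved_target_pc)
{
    for (int i = interrupt_slot + 1; i < PIPE_DEPTH; i++) {
        if (pipe[i].valid && pipe[i].pc == resolved_target_pc) {
            for (int j = interrupt_slot + 1; j < i; j++)
                pipe[j].valid = false;  /* flush back to the target only */
            return true;
        }
    }
    return false;  /* target not in pipeline: full flush required */
}
```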
Abstract:
Speculative history forwarding in overriding branch predictors, and related circuits, methods, and computer-readable media are disclosed. In one embodiment, a branch prediction circuit including a first branch predictor and a second branch predictor is provided. The first branch predictor generates a first branch prediction for a conditional branch instruction, and the first branch prediction is stored in a first branch prediction history. The first branch prediction is also speculatively forwarded to a second branch prediction history. The second branch predictor subsequently generates a second branch prediction based on the second branch prediction history, including the speculatively forwarded first branch prediction. By enabling the second branch predictor to base its branch prediction on the speculatively forwarded first branch prediction, an accuracy of the second branch predictor may be improved.
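In software terms, speculative history forwarding amounts to shifting the first predictor's result into the second predictor's history register before the branch resolves. A minimal C sketch with assumed 16-bit global histories and a hypothetical pattern table:

```c
#include <stdbool.h>
#include <stdint.h>

static uint16_t first_history;           /* first predictor's history   */
static uint16_t second_history;          /* second predictor's history  */
static uint8_t  pattern_table[1 << 16];  /* 2-bit counters, illustrative */

/* Record the first prediction in its own history and speculatively
 * forward it into the second predictor's history. */
void forward_first_prediction(bool predicted_taken)
{
    first_history  = (uint16_t)((first_history  << 1) | predicted_taken);
    second_history = (uint16_t)((second_history << 1) | predicted_taken);
}

/* The second (overriding) predictor now sees the forwarded prediction
 * in its history when it indexes its pattern table. */
bool second_predict(uint64_t pc)
{
    unsigned idx = (unsigned)((pc ^ second_history) & 0xFFFF);
    return pattern_table[idx] >= 2;  /* counter in upper half = taken */
}
```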
Abstract:
Providing early instruction execution in an out-of-order (OOO) processor, and related apparatuses, methods, and computer-readable media are disclosed. In one aspect, an apparatus comprises an early execution engine communicatively coupled to a front-end instruction pipeline and a back-end instruction pipeline of an OOO processor. The early execution engine is configured to receive an incoming instruction from the front-end instruction pipeline, and determine whether an input operand of one or more input operands of the incoming instruction is present in a corresponding entry of one or more entries in an early register cache. The early execution engine is also configured to, responsive to determining that the input operand is present in the corresponding entry, substitute the input operand with a non-speculative immediate value stored in the corresponding entry. In some aspects, the early execution engine may execute the incoming instruction using an early execution unit and update the early register cache.
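The early register cache lookup and substitution can be sketched as follows in C, under an assumed structure (a small valid/value table indexed by architectural register) and hypothetical function names.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_ARCH_REGS 32  /* hypothetical architectural register count */

/* Early register cache: known, non-speculative values per register. */
struct early_reg { bool valid; uint64_t value; };
static struct early_reg early_cache[NUM_ARCH_REGS];

/* If an input operand's value is present in the early register cache,
 * substitute it as an immediate so the instruction can execute early. */
bool try_substitute_operand(int src_reg, uint64_t *imm_out)
{
    if (early_cache[src_reg].valid) {
        *imm_out = early_cache[src_reg].value;
        return true;
    }
    return false;
}

/* After early execution, record the result for later consumers. */
void update_early_cache(int dest_reg, uint64_t value)
{
    early_cache[dest_reg].valid = true;
    early_cache[dest_reg].value = value;
}
```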
Abstract:
Aspects disclosed in the detailed description include providing load address predictions using address prediction tables based on load path history in processor-based systems. In one aspect, a load address prediction engine provides a load address prediction table containing multiple load address prediction table entries. Each load address prediction table entry includes a predictor tag field and a memory address field for a load instruction. The load address prediction engine generates a table index and a predictor tag based on an identifier and a load path history for a detected load instruction. The table index is used to look up a corresponding load address prediction table entry. If the predictor tag matches the predictor tag field of the load address prediction table entry corresponding to the table index, the memory address field of the load address prediction table entry is provided as a predicted memory address for the load instruction.
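A minimal C sketch of the lookup path follows; the table size, the hash mixing the load's identifier (here, its PC) with the load path history, and the tag width are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE 1024  /* hypothetical number of prediction entries */

struct lap_entry { bool valid; uint16_t tag; uint64_t mem_addr; };
static struct lap_entry lap_table[TABLE_SIZE];

/* Illustrative hash combining the load's PC with the load path history. */
static uint64_t mix(uint64_t pc, uint64_t path_history)
{
    uint64_t h = pc ^ (path_history * 0x9E3779B97F4A7C15ULL);
    return h ^ (h >> 32);
}

/* Derive the table index and predictor tag from the hash; on a tag match,
 * return the stored memory address as the predicted load address. */
bool predict_load_address(uint64_t pc, uint64_t path_history, uint64_t *addr_out)
{
    uint64_t h = mix(pc, path_history);
    unsigned index = (unsigned)(h % TABLE_SIZE);
    uint16_t tag = (uint16_t)(h >> 16);
    if (lap_table[index].valid && lap_table[index].tag == tag) {
        *addr_out = lap_table[index].mem_addr;
        return true;
    }
    return false;  /* no prediction for this load */
}
```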
Abstract:
A first load instruction specifying a first virtual address misses in a data cache. A delta value is received based on a program counter value of the first load instruction. A second virtual address is computed based on the delta value and the first virtual address. Data associated with the second virtual address is then prefetched from a main memory to the data cache prior to a second load instruction specifying the second virtual address missing in the data cache.
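A compact C sketch of this delta-based prefetch, with an assumed PC-indexed delta table and a stubbed-out prefetch action:

```c
#include <stdint.h>
#include <stdio.h>

#define DELTA_TABLE_SIZE 256  /* hypothetical table of learned deltas */

/* Per-PC delta: distance between successive missing load addresses. */
static int64_t delta_table[DELTA_TABLE_SIZE];

static unsigned pc_index(uint64_t pc) { return (unsigned)(pc % DELTA_TABLE_SIZE); }

/* Stand-in for issuing a hardware prefetch into the data cache. */
static void prefetch_line(uint64_t virtual_addr)
{
    printf("prefetch 0x%llx\n", (unsigned long long)virtual_addr);
}

/* On a data cache miss for a load, fetch the delta learned for the load's
 * PC and prefetch the predicted second virtual address ahead of use. */
void on_load_miss(uint64_t pc, uint64_t miss_vaddr)
{
    int64_t delta = delta_table[pc_index(pc)];
    if (delta != 0)
        prefetch_line(miss_vaddr + (uint64_t)delta);
}
```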