Abstract:
In response to decoding a zero-overhead loop control instruction of an instruction set architecture, processing circuitry sets at least one loop control parameter for controlling execution of one or more iterations of a program loop body of a zero-overhead loop. Based on the at least one loop control parameter, loop control circuitry controls execution of the one or more iterations of the program loop body, the program loop body excluding the zero-overhead loop control instruction. Branch prediction disabling circuitry detects whether the processing circuitry is executing the program loop body of the zero-overhead loop associated with the zero-overhead loop control instruction and, dependent on detecting that it is, disables branch prediction circuitry. This reduces power consumption during a zero-overhead loop, when the branch prediction circuitry is unlikely to provide a benefit.
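A minimal Python sketch of the mechanism described above. All names are hypothetical: the loop control instruction is modeled as a call that records the body bounds and an iteration count, and the predictor enable is a flag gated by whether the program counter sits inside the active body.

    class ZeroOverheadLoopModel:
        """Toy model of zero-overhead loop control with predictor gating."""

        def __init__(self):
            self.loop_start = None      # first instruction of the loop body
            self.loop_end = None        # last instruction of the loop body
            self.iterations = 0         # remaining iterations
            self.predictor_enabled = True

        def exec_loop_instruction(self, start, end, count):
            # Decoding the zero-overhead loop control instruction sets the
            # loop control parameters: body bounds and iteration count.
            self.loop_start, self.loop_end, self.iterations = start, end, count

        def step(self, pc):
            # Disable the branch predictor while the pc is inside the body;
            # it is re-enabled automatically once the loop is exhausted.
            in_body = (self.loop_start is not None
                       and self.loop_start <= pc <= self.loop_end
                       and self.iterations > 0)
            self.predictor_enabled = not in_body
            if in_body and pc == self.loop_end:
                self.iterations -= 1
                return self.loop_start if self.iterations > 0 else pc + 1
            return pc + 1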
Abstract:
A processor includes an execution pipeline having one or more execution units to execute instructions and a branch prediction unit coupled to the execution units. The branch prediction unit includes a branch history table to store prior branch predictions; a branch predictor to predict, in response to a conditional branch instruction, a branch target address of the conditional branch instruction based on the branch history table; and address match logic to compare the predicted branch target address with the address of the next instruction executed immediately following the conditional branch instruction. The address match logic is to cause the execution pipeline to be flushed if the predicted branch target address does not match the address of the next instruction to be executed.
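A small Python sketch of the address-match check, assuming a pipeline object that exposes flush() and refetch() (both hypothetical), with the branch history table reduced to a dictionary from branch address to last resolved target.

    class Pipeline:
        # Stub standing in for the execution pipeline.
        def flush(self):
            print("pipeline flushed")

        def refetch(self, pc):
            print(f"refetch from {pc:#x}")

    class BranchPredictionUnit:
        def __init__(self):
            self.branch_history = {}    # branch pc -> last resolved target

        def predict(self, branch_pc, fallthrough_pc):
            # Predict the conditional branch's target from prior outcomes,
            # defaulting to the sequential address when no history exists.
            return self.branch_history.get(branch_pc, fallthrough_pc)

        def verify(self, branch_pc, predicted_target, actual_next_pc, pipeline):
            # Address match logic: compare the predicted target against the
            # address of the instruction executed immediately after the
            # branch, and flush/refetch on a mismatch.
            self.branch_history[branch_pc] = actual_next_pc
            if predicted_target != actual_next_pc:
                pipeline.flush()
                pipeline.refetch(actual_next_pc)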
Abstract:
One embodiment of the present invention sets forth a technique for instruction level execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. Any in-flight instructions that follow the preemption command in the processing pipeline are captured and stored in a processing task buffer to be reissued when the preempted program is resumed. The processing task buffer is designated as a high priority task to ensure the preempted instructions are reissued before any new instructions for the preempted context when execution of the preempted context is restored.
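The reissue ordering can be sketched with a priority queue, as below. The two priority levels and the task-buffer representation are illustrative assumptions, not the patent's actual structures.

    import heapq

    HIGH_PRIORITY, NORMAL_PRIORITY = 0, 1

    class PreemptionModel:
        def __init__(self):
            self.task_queue = []    # (priority, submit order, instructions)
            self._seq = 0

        def _push(self, priority, instructions):
            heapq.heappush(self.task_queue, (priority, self._seq, list(instructions)))
            self._seq += 1

        def preempt(self, in_flight):
            # Capture in-flight instructions that follow the preemption
            # command into a high-priority task buffer, so they are
            # reissued before any new work for the preempted context.
            self._push(HIGH_PRIORITY, in_flight)

        def submit(self, instructions):
            self._push(NORMAL_PRIORITY, instructions)

        def resume(self):
            # On restore, the high-priority captured buffer drains first.
            while self.task_queue:
                _, _, insns = heapq.heappop(self.task_queue)
                yield from insns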
Abstract:
Techniques are provided that implement automatic data type annotation in dynamically-typed source code. A codebase, which may comprise a plurality of source code files, is scanned at a global level. The resulting scanned data may describe characteristics of the codebase, including variable and function usage. Based on inferences drawn from the scanning, data types are determined for different variables, expressions, or functions to facilitate conversion from dynamically-typed source code to statically-typed source code. For example, if a function is called once with a parameter value of data type A (e.g., class A), and another time with a parameter value of data type B (e.g., class B), a conversion tool may annotate the parameter variable in the declaration of the function with a data type D (e.g., class D) when data type D is identified as a common ancestor (e.g., superclass) of both data type A and data type B.
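This common-ancestor computation maps directly onto Python's class machinery. A brief sketch, with the global scan abstracted to a list of values the parameter was observed to take; common_ancestor and infer_parameter_type are illustrative names, not the tool's API.

    from functools import reduce

    def common_ancestor(type_a, type_b):
        # Walk type_a's method resolution order and return the first class
        # that is also a base of type_b: the nearest common superclass.
        for candidate in type_a.__mro__:
            if issubclass(type_b, candidate):
                return candidate
        return object

    def infer_parameter_type(observed_values):
        # Fold every type the parameter was called with into one
        # annotatable type, as the conversion tool's inference would.
        return reduce(common_ancestor, (type(v) for v in observed_values))

    class A: pass
    class B(A): pass
    class C(A): pass

    print(infer_parameter_type([B(), C()]))    # <class 'A'>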
Abstract:
A computing system includes a microprocessor that receives values for configuring operating modes thereof. A device driver monitors which software applications currently running on the microprocessor are in a predetermined list and responsively dynamically writes the values to the microprocessor to configure its operating modes. Examples of the operating modes the device driver may configure relate to the following: data prefetching; branch prediction; instruction cache eviction; instruction execution suspension; sizes of cache memories, the reorder buffer, and store/load/fill queues; hashing algorithms related to data forwarding and branch target address cache indexing; the number of instructions translated, formatted, and issued per clock cycle; the load delay mechanism; speculative page table walks; instruction merging; the extent of out-of-order execution; caching of non-temporally hinted data; and serial or parallel access of an L2 cache and processor bus in response to an instruction cache miss.
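As a sketch only, the driver's monitoring hook might look like the following. The application names, knob names, and the write_config_register callback are all invented for illustration and do not correspond to any real model-specific register interface.

    # Hypothetical per-application tuning table.
    TUNED_APPS = {
        "app_x.exe": {"data_prefetch": 0, "branch_prediction_aggr": 2},
        "app_y.exe": {"reorder_buffer_size": 48, "speculative_tablewalk": 1},
    }

    def on_process_list_change(running_processes, write_config_register):
        # When an application on the predetermined list is running, write
        # its tuning values to the processor's configuration registers to
        # reconfigure the corresponding operating modes.
        for name in running_processes:
            for knob, value in TUNED_APPS.get(name, {}).items():
                write_config_register(knob, value)

    # e.g. wired to a stub modeling the register write:
    on_process_list_change(["app_x.exe"],
                           lambda knob, v: print(f"write {knob}={v}"))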
Abstract:
The present invention provides an overlay instruction accessing unit and method, and a method and apparatus for compressing and storing a program. The overlay instruction accessing unit is used to execute a program stored in a memory in the form of a plurality of compressed program segments, and comprises: a buffer; a processing unit for issuing an instruction reading request, reading an instruction from the buffer, and executing the instruction; and a decompressing unit for reading a requested compressed instruction segment from the memory in response to the instruction reading request of the processing unit, decompressing the compressed instruction segment, and storing the decompressed instruction segment in the buffer, wherein while the processing unit is executing the instruction segment, the decompressing unit reads, according to a storage address, given in a header corresponding to the instruction segment, of a compressed program segment to be invoked, the corresponding compressed instruction segment from the memory, decompresses it, and stores the decompressed instruction segment in the buffer for later use by the processing unit.
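A Python sketch of the buffer/decompress interplay, using zlib in place of whatever compression the apparatus actually employs; the header layout (a dict naming the next segment) is a simplifying assumption.

    import zlib

    class OverlayLoader:
        def __init__(self, memory):
            self.memory = memory    # segment id -> (header, compressed bytes)
            self.buffer = {}        # segment id -> decompressed instructions

        def fetch(self, seg_id):
            # Decompress the requested segment into the buffer on demand.
            if seg_id not in self.buffer:
                _, blob = self.memory[seg_id]
                self.buffer[seg_id] = zlib.decompress(blob)
            return self.buffer[seg_id]

        def run(self, seg_id):
            code = self.fetch(seg_id)
            header, _ = self.memory[seg_id]
            next_seg = header.get("next_segment")
            if next_seg is not None:
                # In the apparatus this happens concurrently with execution;
                # here the prefetch is modeled as an eager decompress.
                self.fetch(next_seg)
            return code

    mem = {0: ({"next_segment": 1}, zlib.compress(b"segment 0 code")),
           1: ({"next_segment": None}, zlib.compress(b"segment 1 code"))}
    loader = OverlayLoader(mem)
    loader.run(0)    # segment 1 is now decompressed and waiting in the buffer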
Abstract:
A design structure embodied in a machine-readable medium used in a design process includes an apparatus for predictive decoding, the apparatus including register logic for fetching an instruction; predictor logic containing predictor information including prior instruction execution characteristics; logic for obtaining predictor information for the fetched instruction from the predictor logic; and decode logic for generating a selected one of a plurality of decode operation streams corresponding to the fetched instruction, wherein the decode operation stream is selected based on the predictor information.
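A sketch of the selection step in Python; the two decode streams and the "hot"/"default" characteristics are placeholders for whatever execution characteristics the predictor logic actually records.

    class PredictiveDecoder:
        def __init__(self):
            self.predictor = {}    # instruction address -> prior characteristic

        def record(self, address, characteristic):
            # Predictor logic: remember prior execution characteristics.
            self.predictor[address] = characteristic

        def decode(self, address, instruction):
            # Select one of several decode operation streams based on the
            # predictor information for the fetched instruction.
            hint = self.predictor.get(address, "default")
            streams = {"default": self._decode_generic,
                       "hot": self._decode_fast_path}
            return streams[hint](instruction)

        def _decode_generic(self, insn):
            return ["crack", insn]      # conservative multi-op stream

        def _decode_fast_path(self, insn):
            return ["fused", insn]      # optimized stream for hot code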
Abstract:
In one embodiment, the branch prediction mechanism includes a first storage including a first plurality of locations for storing a first set of partial prediction information. The branch prediction mechanism also includes a second storage including a second plurality of locations for storing a second set of partial prediction information. Further, the branch prediction mechanism includes a control unit that performs a first hash function on input branch information to generate a first index for accessing a selected location within the first storage. The control unit also performs a second hash function on the input branch information to generate a second index for accessing a selected location within the second storage. Lastly, the control unit provides a prediction value based on the corresponding partial prediction information in the selected locations of the first and second storages.
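This is structurally similar to hashed two-table direction predictors. A compact Python sketch, where each location holds a 2-bit saturating counter as its partial prediction and the two hash functions are arbitrary but distinct mixes of the same branch information (pc and history).

    class TwoTablePredictor:
        def __init__(self, size=1024):
            self.size = size
            self.table1 = [0] * size    # partial predictions (2-bit counters)
            self.table2 = [0] * size

        def _hash1(self, pc, history):
            return (pc ^ history) % self.size

        def _hash2(self, pc, history):
            # A second, different mixing of the same branch information.
            return ((pc >> 2) ^ (history * 7)) % self.size

        def predict(self, pc, history):
            # Combine the two partial predictions into a single decision.
            total = (self.table1[self._hash1(pc, history)]
                     + self.table2[self._hash2(pc, history)])
            return total >= 4    # taken when both counters lean taken

        def update(self, pc, history, taken):
            for table, idx in ((self.table1, self._hash1(pc, history)),
                               (self.table2, self._hash2(pc, history))):
                table[idx] = min(3, table[idx] + 1) if taken else max(0, table[idx] - 1)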
Abstract:
Conditional branch bytecodes are processed by a Virtual Machine Interpreter (VMI) hardware accelerator that utilizes a branch prediction scheme to determine whether to speculatively process bytecodes while waiting for the CPU to return a condition control variable. The VMI assumes the branch condition will be fulfilled if a conditional branch bytecode calls for a backward jump, and that the branch condition will not be fulfilled if a conditional branch bytecode calls for a forward jump. Alternatively, the VMI makes an assumption only when a conditional branch bytecode calls for a backward jump, or it assumes that the branch condition will be fulfilled whenever it processes any conditional branch bytecode. The VMI speculatively processes only bytecodes that are easily reversible, and suspends speculative processing upon encountering a bytecode that is not easily reversible. If a VMI assumption is invalidated, any speculatively processed bytecodes are reversed.
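The default heuristic is the classic backward-taken/forward-not-taken rule, and the reversibility gate can be sketched as below. The bytecode interface (opcode, execute(), inverse()) and the REVERSIBLE set are assumptions for illustration only.

    def predict_taken(branch_pc, target_pc):
        # Backward jumps (loops) are assumed taken; forward jumps are
        # assumed not taken.
        return target_pc < branch_pc

    REVERSIBLE = {"iload", "iconst", "dup"}    # hypothetical easily-reversed opcodes

    def speculate(bytecodes, undo_log):
        # Speculatively process bytecodes while the CPU resolves the
        # condition; suspend at the first bytecode that cannot be undone.
        for bc in bytecodes:
            if bc.opcode not in REVERSIBLE:
                return
            undo_log.append(bc.inverse())    # record how to reverse this step
            bc.execute()

    def on_condition_resolved(assumption_held, undo_log):
        # If the assumption was invalidated, reverse the speculative work
        # in last-in, first-out order.
        if not assumption_held:
            while undo_log:
                undo_log.pop().execute()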
Abstract:
A processor 2 incorporates a branch prediction mechanism 14, 18, 20 which acts to predict branch outcomes for predicted-type branch instructions. The processor also supports non-predicted-type branch instructions, which are ignored by the branch prediction mechanism 14, 18, 20 and are not subject to prediction. The degradation of the overall performance of the prediction mechanism 14, 18, 20 caused by mispredictions is reduced by employing non-predicted-type branch instructions to represent and control branch operations for which misprediction is known to be likely.
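A one-function sketch of how fetch might treat the two branch types; the kind field, the mnemonics, and the predictor interface are invented for illustration.

    PREDICTED, NON_PREDICTED = "B", "BNP"    # hypothetical branch encodings

    def fetch_next(pc, insn, predictor):
        # The prediction mechanism consults the predictor only for
        # predicted-type branches; non-predicted-type branches resolve in
        # the execute stage and never touch predictor state, so branches
        # known to mispredict cannot degrade its accuracy.
        if insn.kind == PREDICTED:
            return predictor.predict(pc, insn.fallthrough)
        return insn.fallthrough    # no speculation for BNP branches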