Abstract:
A processor architecture includes a register file hierarchy that implements virtual registers, providing a larger set of registers than those directly supported by the instruction set architecture so that multiple copies of the same architectural register can be maintained for different processing threads; the register file hierarchy includes a plurality of hierarchy levels. The processor architecture further includes a plurality of execution units coupled to the register file hierarchy.
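A minimal C sketch of the idea, assuming a hypothetical two-level hierarchy and illustrative sizes (NUM_THREADS, L1_REGS, L2_REGS and the rename table are not taken from the abstract): each thread holds its own copy of an architectural register, mapped to a location in one of the hierarchy levels.

    /* Illustrative only: a two-level register file hierarchy backing per-thread
       virtual copies of the architectural registers.  Sizes are assumptions. */
    #include <stdio.h>

    #define NUM_THREADS  4
    #define ARCH_REGS   16    /* registers visible to the instruction set   */
    #define L1_REGS     32    /* small, fast first-level register file       */
    #define L2_REGS    128    /* larger second-level register file           */

    typedef struct {
        int level;            /* 1 = first-level file, 2 = second-level file */
        int index;            /* index within that level                     */
    } phys_loc_t;

    /* Rename table: one virtual copy of each architectural register per thread. */
    static phys_loc_t rename_table[NUM_THREADS][ARCH_REGS];

    static phys_loc_t lookup(int thread, int arch_reg)
    {
        return rename_table[thread][arch_reg];
    }

    int main(void)
    {
        /* Two threads hold independent copies of architectural register r5. */
        rename_table[0][5] = (phys_loc_t){ .level = 1, .index = 7  };
        rename_table[1][5] = (phys_loc_t){ .level = 2, .index = 42 };

        phys_loc_t a = lookup(0, 5), b = lookup(1, 5);
        printf("thread0 r5 -> L%d[%d], thread1 r5 -> L%d[%d]\n",
               a.level, a.index, b.level, b.index);
        return 0;
    }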
Abstract:
A system and method for using an operation (op) cache is disclosed. The system and method include an op cache for caching previously decoded instructions. The op cache includes a plurality of physically indexed and tagged instructions, allowing sharing of instructions between threads. The op cache is chained through multiple ways, allowing a plurality of instructions in a cache line to be serviced. Op cache storage is split between shared operation storage and immediate/displacement storage to maximize capacity.
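A minimal C sketch, not the disclosed microarchitecture, of how a physically tagged op cache whose entries are chained through multiple ways might be looked up; all sizes and field names (SETS, WAYS, micro_ops, and so on) are illustrative assumptions.

    /* Illustrative only: lookup in a physically tagged op cache whose entries
       for one line are chained across ways.  Sizes/fields are assumptions. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define SETS 64
    #define WAYS  8

    typedef struct {
        bool     valid;
        uint64_t phys_tag;    /* physical tag, so entries can be shared by threads */
        bool     has_next;    /* chained to another way for the same cache line    */
        int      next_way;
        uint32_t micro_ops[4];
    } op_entry_t;

    static op_entry_t cache[SETS][WAYS];

    /* Follow the chain of ways for one physical line, collecting decoded ops. */
    static int fetch_ops(uint64_t paddr, uint32_t *out, int max)
    {
        int      set = (int)((paddr >> 6) % SETS);
        uint64_t tag = paddr >> 12;
        int      count = 0;

        for (int w = 0; w < WAYS; w++) {
            if (!cache[set][w].valid || cache[set][w].phys_tag != tag)
                continue;
            int way = w;
            for (;;) {
                for (int i = 0; i < 4 && count < max; i++)
                    out[count++] = cache[set][way].micro_ops[i];
                if (!cache[set][way].has_next)
                    return count;           /* end of the chain for this line */
                way = cache[set][way].next_way;
            }
        }
        return 0;                           /* miss: fall back to the decoder */
    }

    int main(void)
    {
        uint64_t paddr = 0x7f001040;
        int set = (int)((paddr >> 6) % SETS);
        cache[set][0].valid    = true;      /* install one entry for the demo */
        cache[set][0].phys_tag = paddr >> 12;

        uint32_t ops[16];
        printf("served %d decoded ops from the op cache\n", fetch_ops(paddr, ops, 16));
        return 0;
    }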
Abstract:
A method for securing a first program, the first program including a finite number of program points and evolution rules associated with program points that define the passage from one program point to another. The method includes defining a plurality of exit cases and, when a second program is used in the definition of the first program, defining for each exit case either a branching toward a specific program point of the first program or a declaration of branching impossibility; defining a set of properties to be proven, each associated with one of the constitutive elements of the first program, the set of properties comprising the branching impossibility as a particular property; and establishing the formal proof of the set of properties.
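The exit-case structure can be pictured as a table that maps each exit case of the second program either to a program point of the first program or to a declared impossibility. The C sketch below is purely illustrative; the exit cases, program points, and the runtime assert standing in for the formal proof are all assumptions.

    /* Illustrative only: each exit case maps either to a program point of the
       first program or to a declared impossibility.  The assert stands in for
       the formal proof required by the method. */
    #include <stdio.h>
    #include <assert.h>

    typedef enum { EXIT_OK, EXIT_ERROR, EXIT_OVERFLOW, NUM_EXIT_CASES } exit_case_t;

    #define BRANCH_IMPOSSIBLE (-1)   /* declaration of branching impossibility */

    /* For each exit case: the program point branched to, or IMPOSSIBLE. */
    static const int branch_table[NUM_EXIT_CASES] = {
        [EXIT_OK]       = 12,                 /* continue at program point 12 */
        [EXIT_ERROR]    = 47,                 /* error handler at point 47    */
        [EXIT_OVERFLOW] = BRANCH_IMPOSSIBLE,  /* proven never to occur        */
    };

    static int next_program_point(exit_case_t e)
    {
        int target = branch_table[e];
        assert(target != BRANCH_IMPOSSIBLE && "property violated: impossible exit taken");
        return target;
    }

    int main(void)
    {
        printf("EXIT_OK branches to program point %d\n", next_program_point(EXIT_OK));
        return 0;
    }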
Abstract:
In one embodiment, a micro-processing system includes a hardware structure disposed on a processor core. The hardware structure includes a plurality of entries, each of which is associated with a portion of code and a translation of that code which can be executed to achieve substantially equivalent functionality. The hardware structure includes a redirection array that, when referenced, enables execution to be redirected from a portion of code to its counterpart translation. The entries enabling such redirection are maintained within or evicted from the hardware structure based on usage information for the entries.
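A minimal C sketch, assuming hypothetical names and a simple least-used eviction policy (neither is taken from the abstract), of a redirection array that maps a code address to its translation and keeps usage information for eviction:

    /* Illustrative only: a redirection array mapping code addresses to their
       translations; usage counters drive eviction of cold entries. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define ENTRIES 8

    typedef struct {
        bool     valid;
        uint64_t code_addr;   /* start of the original code portion      */
        uint64_t trans_addr;  /* start of the equivalent translation     */
        unsigned use_count;   /* usage information that guides eviction  */
    } redirect_entry_t;

    static redirect_entry_t array[ENTRIES];

    /* On fetch: a hit redirects execution to the counterpart translation. */
    static bool redirect(uint64_t fetch_addr, uint64_t *target)
    {
        for (int i = 0; i < ENTRIES; i++) {
            if (array[i].valid && array[i].code_addr == fetch_addr) {
                array[i].use_count++;
                *target = array[i].trans_addr;
                return true;
            }
        }
        return false;
    }

    /* Install a mapping, evicting the least-used entry when the array is full. */
    static void install(uint64_t code_addr, uint64_t trans_addr)
    {
        int victim = 0;
        for (int i = 0; i < ENTRIES; i++) {
            if (!array[i].valid) { victim = i; break; }
            if (array[i].use_count < array[victim].use_count) victim = i;
        }
        array[victim] = (redirect_entry_t){ true, code_addr, trans_addr, 1 };
    }

    int main(void)
    {
        install(0x400000, 0x800000);
        uint64_t target;
        if (redirect(0x400000, &target))
            printf("redirected to translation at 0x%llx\n", (unsigned long long)target);
        return 0;
    }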
Abstract:
Instructions and logic provide vector load-op and/or store-op with stride functionality. In some embodiments, responsive to an instruction specifying a set of loads, a second operation, a destination register, an operand register, a memory address, and a stride length, execution units read values in a mask register, wherein fields in the mask register correspond to stride-length multiples from the memory address to data elements in memory. A first mask value indicates that the element has not been loaded from memory, and a second value indicates that the element does not need to be, or has already been, loaded. For each field having the first value, the data element is loaded from memory into the corresponding destination register location, and the corresponding field in the mask register is changed to the second value. Then the second operation is performed using corresponding data in the destination and operand registers to generate results. The instruction may be restarted after faults.
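A software sketch of the described behavior, with addition assumed as the second operation and all names (VLEN, load_op_add_stride, the 1/0 mask encoding) chosen for illustration; because the mask is updated element by element, a faulting execution can be restarted without redoing completed loads.

    /* Illustrative software model: masked strided load followed by an assumed
       second operation (addition).  Mask value 1 = not yet loaded, 0 = loaded
       or not needed, so the loop can restart after a fault without redoing work. */
    #include <stdio.h>
    #include <stdint.h>

    #define VLEN 8

    static void load_op_add_stride(int32_t dest[VLEN], const int32_t operand[VLEN],
                                   uint8_t mask[VLEN],
                                   const int32_t *mem, int stride_elems)
    {
        /* Load every element whose mask field still holds the first value. */
        for (int i = 0; i < VLEN; i++) {
            if (mask[i] == 1) {
                dest[i] = mem[i * stride_elems];   /* strided load from memory   */
                mask[i] = 0;                       /* mark the element as loaded */
            }
        }
        /* The second operation on the destination and operand registers. */
        for (int i = 0; i < VLEN; i++)
            dest[i] += operand[i];
    }

    int main(void)
    {
        int32_t mem[64];
        for (int i = 0; i < 64; i++) mem[i] = i;

        int32_t dest[VLEN] = {0};
        int32_t operand[VLEN] = {100, 100, 100, 100, 100, 100, 100, 100};
        uint8_t mask[VLEN]    = {1, 1, 1, 1, 1, 1, 1, 1};   /* nothing loaded yet */

        load_op_add_stride(dest, operand, mask, mem, 4);    /* stride of 4 elements */
        for (int i = 0; i < VLEN; i++)
            printf("dest[%d] = %d\n", i, dest[i]);          /* 100, 104, 108, ... */
        return 0;
    }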
Abstract:
A processor includes an execution pipeline and monitoring circuitry. The execution pipeline is configured to execute instructions of program code. The monitoring circuitry is configured to monitor the instructions in a segment of a repetitive sequence of the instructions so as to construct a specification of register access by the monitored instructions, to parallelize execution of the repetitive sequence based on the specification, and to terminate monitoring of the instructions and discard the specification in response to detecting a branch mis-prediction in the monitored instructions.
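A minimal C sketch, assuming an illustrative per-register record of last writers and read counts (not the patented monitoring hardware), of building such a register-access specification and discarding it on a branch misprediction:

    /* Illustrative only: record which instruction last wrote each register while
       a repetitive segment executes; discard everything on a misprediction. */
    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>

    #define NUM_REGS 16

    typedef struct {
        bool valid;
        int  last_writer[NUM_REGS];   /* index of the last instruction writing the register */
        int  read_count[NUM_REGS];    /* how often each register was read                   */
    } reg_spec_t;

    static reg_spec_t spec;

    static void monitor_instruction(int insn_idx, int src_reg, int dst_reg)
    {
        if (!spec.valid) return;
        if (src_reg >= 0) spec.read_count[src_reg]++;
        if (dst_reg >= 0) spec.last_writer[dst_reg] = insn_idx;
    }

    static void on_branch_mispredict(void)
    {
        /* Terminate monitoring and discard the partially built specification. */
        memset(&spec, 0, sizeof spec);
    }

    int main(void)
    {
        spec.valid = true;
        monitor_instruction(0, 1, 2);   /* e.g. r2 = f(r1) */
        monitor_instruction(1, 2, 3);   /* e.g. r3 = g(r2) */
        printf("r2 last written by instruction %d\n", spec.last_writer[2]);

        on_branch_mispredict();
        printf("specification valid after mispredict: %d\n", spec.valid);
        return 0;
    }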
Abstract:
A method for translating instructions for a processor. The method includes accessing a plurality of guest instructions that comprise multiple guest branch instructions, and assembling the plurality of guest instructions into a guest instruction block. The guest instruction block is converted into a corresponding native conversion block. The native conversion block is stored in a native cache. A mapping of the guest instruction block to the corresponding native conversion block is stored in a conversion lookaside buffer. Upon a subsequent request for a guest instruction, the conversion lookaside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates whether the guest instruction has a corresponding converted native instruction in the native cache. The converted native instruction is forwarded for execution in response to the hit.
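A minimal C sketch of the mapping step, assuming a direct-mapped table and hypothetical names (clb_lookup, clb_install): a guest instruction block address indexes the conversion lookaside buffer, and a hit yields the address of the converted native block.

    /* Illustrative only: a direct-mapped conversion lookaside buffer mapping a
       guest instruction block address to its native conversion block. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define CLB_ENTRIES 16

    typedef struct {
        bool     valid;
        uint64_t guest_block;    /* guest instruction block address       */
        uint64_t native_block;   /* corresponding native conversion block */
    } clb_entry_t;

    static clb_entry_t clb[CLB_ENTRIES];

    static bool clb_lookup(uint64_t guest_block, uint64_t *native_block)
    {
        unsigned idx = (unsigned)(guest_block >> 6) % CLB_ENTRIES;
        if (clb[idx].valid && clb[idx].guest_block == guest_block) {
            *native_block = clb[idx].native_block;   /* hit: forward native code */
            return true;
        }
        return false;                                /* miss: convert the block  */
    }

    static void clb_install(uint64_t guest_block, uint64_t native_block)
    {
        unsigned idx = (unsigned)(guest_block >> 6) % CLB_ENTRIES;
        clb[idx] = (clb_entry_t){ true, guest_block, native_block };
    }

    int main(void)
    {
        clb_install(0x1000, 0x9000);                 /* after converting the block */
        uint64_t native;
        if (clb_lookup(0x1000, &native))
            printf("hit: execute native block at 0x%llx\n", (unsigned long long)native);
        return 0;
    }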
Abstract:
Techniques are disclosed relating to a cache for patterns of instructions. In some embodiments, an apparatus includes an instruction cache and is configured to detect a pattern of execution of instructions by an instruction processing pipeline. The pattern of execution may involve execution of only instructions in a particular group of instructions. The instructions may include multiple backward control transfers and/or a control transfer instruction that is taken in one iteration of the pattern and not taken in another iteration of the pattern. The apparatus may be configured to store the instructions in the instruction cache and fetch and execute the instructions from the instruction cache. The apparatus may include a branch predictor dedicated to predicting the direction of control transfer instructions for the instruction cache. Various embodiments may reduce power consumption associated with instruction processing.
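A minimal C sketch of one possible detection heuristic, not the disclosed mechanism (GROUP_SIZE and HOT_THRESHOLD are assumptions): when fetches stay within one small group of instructions long enough, the pattern is marked as cacheable so later fetches can be served from the dedicated instruction cache.

    /* Illustrative heuristic only: when fetches stay within one small group of
       instructions long enough, mark the pattern as served from the cache. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define GROUP_SIZE    64      /* bytes covered by the pattern cache     */
    #define HOT_THRESHOLD 100     /* fetches before the pattern is captured */

    static uint64_t group_base;
    static unsigned in_group_count;
    static bool     pattern_cached;

    static void observe_fetch(uint64_t pc)
    {
        if (pc >= group_base && pc < group_base + GROUP_SIZE) {
            if (++in_group_count >= HOT_THRESHOLD)
                pattern_cached = true;   /* later fetches hit the pattern cache */
        } else {
            group_base = pc;             /* left the group: restart detection   */
            in_group_count = 0;
            pattern_cached = false;
        }
    }

    int main(void)
    {
        for (int i = 0; i < 200; i++)
            observe_fetch(0x4000 + (i % 4) * 8);     /* a tight four-instruction loop */
        printf("pattern cached: %s\n", pattern_cached ? "yes" : "no");
        return 0;
    }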
Abstract:
Embodiments relate to a system for relative offset branching in a reduced instruction set computing (RISC) architecture. One aspect is a system that includes memory and a processing circuit communicatively coupled to the memory. The system is configured to perform a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A relative offset value is acquired from the instruction stream. The relative offset value is formatted as an offset relative to a program counter value and sized as a multiple of the fixed instruction width. The relative offset value is added to the program counter value to form a branch target address value. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter.
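A minimal C sketch of the target computation, assuming a 4-byte instruction width: the relative offset, constrained to a multiple of the instruction width, is added to the program counter value to form the branch target.

    /* Illustrative only: form the branch target from a PC-relative offset that
       must be a multiple of the fixed instruction width (4 bytes assumed). */
    #include <stdio.h>
    #include <stdint.h>
    #include <assert.h>

    #define INSN_WIDTH 4

    static uint64_t branch_target(uint64_t pc, int64_t relative_offset)
    {
        assert(relative_offset % INSN_WIDTH == 0);   /* sized in instruction widths */
        return pc + (uint64_t)relative_offset;       /* value loaded into the PC    */
    }

    int main(void)
    {
        uint64_t pc = 0x1000;
        uint64_t target = branch_target(pc, -3 * INSN_WIDTH);  /* back three instructions */
        printf("branch target: 0x%llx\n", (unsigned long long)target);
        return 0;
    }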
Abstract:
According to an aspect, management of auxiliary branch prediction in a processing system including a primary branch predictor and an auxiliary branch predictor is provided. A congruence class of the auxiliary branch predictor is located based on receiving a primary branch predictor misprediction indicator corresponding to a mispredicted target address of the primary branch predictor. An entry is identified in the congruence class having an auxiliary usefulness level set to a least useful level with respect to one or more other entries of the congruence class. Auxiliary data corresponding to the mispredicted target address is installed into the entry. The auxiliary usefulness level of the entry is reset to an initial value based on installing the auxiliary data.
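A minimal C sketch of the install path, with illustrative sizes and names (CLASSES, WAYS, usefulness): on a primary-predictor misprediction the congruence class is located, the least-useful entry is victimized, and the new auxiliary data is installed with its usefulness reset to an initial value.

    /* Illustrative only: install auxiliary prediction data by victimizing the
       least-useful entry of the indexed congruence class. */
    #include <stdio.h>
    #include <stdint.h>

    #define CLASSES 64
    #define WAYS     4
    #define INITIAL_USEFULNESS 1

    typedef struct {
        uint64_t tag;
        uint64_t target;       /* auxiliary predicted target address            */
        unsigned usefulness;   /* raised when the auxiliary prediction is right */
    } aux_entry_t;

    static aux_entry_t aux[CLASSES][WAYS];

    static void on_primary_mispredict(uint64_t branch_addr, uint64_t correct_target)
    {
        aux_entry_t *cls = aux[branch_addr % CLASSES];   /* locate congruence class */

        /* Identify the entry with the least useful usefulness level. */
        int victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (cls[w].usefulness < cls[victim].usefulness)
                victim = w;

        /* Install the auxiliary data and reset usefulness to its initial value. */
        cls[victim] = (aux_entry_t){ branch_addr, correct_target, INITIAL_USEFULNESS };
    }

    int main(void)
    {
        on_primary_mispredict(0x40c0, 0x5000);
        printf("installed auxiliary target 0x%llx\n",
               (unsigned long long)aux[0x40c0 % CLASSES][0].target);
        return 0;
    }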