摘要:
In one embodiment, a processor includes logic, responsive to a first instruction, to perform an operation on a first source operand and a second source operand associated with the first instruction and write a result of the operation to a destination location comprising a third source operand. The write may be a partial write of the destination location to maintain an unmodified portion of the third source operand. Other embodiments are described and claimed.
摘要:
A processor includes a first logic to execute an instruction stream out-of-order, the instruction stream divided into a plurality of strands, the instruction stream and each strand ordered by program order (PO). The processor also includes a second logic to determine an oldest undispatched instruction in the instruction stream and store an associated PO value of the oldest undispatched instruction as an executed instruction pointer. The instruction stream includes dispatched and undispatched instructions. The processor also includes a third logic to determine a most recently retired instruction in the instruction stream and store an associated PO value of the most recently retired instruction as a retirement pointer, a fourth logic to select a range of instructions between the retirement pointer and the executed instruction pointer, and a fifth logic to identify the range of instructions as eligible for retirement.
摘要:
Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
摘要:
Embodiments of a method and apparatus for implementing and maintaining a stack of predicate values with stack synchronization instructions. In one embodiment the apparatus is an out of order hardware/software co-designed processor including instructions to explicitly manage the predicate register stack to maintain stack consistency across branches of executing that push a variable number of predicate values onto the predicate stack. In one embodiment the stack-based predicate register implementation enables early branch calculation and early branch misprediction recovery via early renaming of predicate registers.
摘要:
A processor includes a first logic to execute an instruction stream out-of-order, the instruction stream divided into a plurality of strands, the instruction stream and each strand ordered by program order (PO). The processor also includes a second logic to determine an oldest undispatched instruction in the instruction stream and store an associated PO value of the oldest undispatched instruction as an executed instruction pointer. The instruction stream includes dispatched and undispatched instructions. The processor also includes a third logic to determine a most recently retired instruction in the instruction stream and store an associated PO value of the most recently retired instruction as a retirement pointer, a fourth logic to select a range of instructions between the retirement pointer and the executed instruction pointer, and a fifth logic to identify the range of instructions as eligible for retirement.
摘要:
Embodiments described herein generally relate to the field of multi-strand out-of-order loop processing, and, more specifically, to apparatus and methods to support counted loop exits in a multi-strand loop processor. In one embodiment, a processor includes a loop accelerator comprising a strand documentation buffer and a plurality of strand execution circuits; and a binary translator to receive a plurality of loop instructions, divide the plurality of loop instructions into a plurality of strands, and store a strand documentation for each of the plurality of strands into the strand documentation buffer, each strand documentation indicating at least a number of iterations; wherein the binary translator further causes the loop accelerator to execute the plurality of strands asynchronously and in parallel using the plurality of strand execution circuits, wherein each of the strand execution circuits repeats the strand for the number of iterations indicated in the strand documentation associated with the strand.
摘要:
A processor includes a front end to receive an instruction. The processor also includes a core to execute the instruction. The core includes logic to execute a base function of the instruction to yield a result, generate a predicate value of a comparison of the result based upon a predication setting in the instruction, and set the predicate value in a register. The processor also includes a retirement unit to retire the instruction.
摘要:
An apparatus includes a register file and a binary translator to create a plurality of strands and a plurality of iteration windows, where each iteration window of the plurality of iteration windows is allocated a set of continuous registers of the register file. The apparatus further includes a buffer to store strand documentation for a strand from the plurality of strands, where the strand documentation for the strand is to include an indication of a current register base for the strand. The apparatus further includes an execution circuit to execute an instruction to update the current register base for the strand in the strand documentation for the strand based on a fixed step value and an iteration window size.
摘要:
A processor includes a front end to receive an instruction. The processor also includes a core to execute the instruction. The core includes logic to execute a base function of the instruction to yield a result, generate a predicate value of a comparison of the result based upon a predication setting in the instruction, and set the predicate value in a register. The processor also includes a retirement unit to retire the instruction.
摘要:
Methods and apparatuses to control access to a multiple bank data cache are described. In one embodiment, a processor includes conflict resolution logic to detect multiple instructions scheduled to access a same bank of a multiple bank data cache in a same clock cycle and to grant access priority to an instruction of the multiple instructions scheduled to access a highest total of banks of the multiple bank data cache. In another embodiment, a method includes detecting multiple instructions scheduled to access a same bank of a multiple bank data cache in a same clock cycle, and granting access priority to an instruction of the multiple instructions scheduled to access a highest total of banks of the multiple bank data cache.