Abstract:
An apparatus and method for a dual return stack buffer (RSB) for use in binary translation systems. For example, one embodiment of a processor comprises: a dual return stack buffer (DRSB) comprising a native RSB and an extended RSB (XRSB), the dual RSB to be used within a binary translation execution environment in which guest call-return instruction sequences are translated to native call-return instruction sequences to be executed directly by the processor; the native RSB to store native return addresses associated with the native call-return instruction sequences; and the XRSB to store emulated return addresses associated with the guest call-return instruction sequences, wherein each native return address stored in the RSB is associated with an emulated return address stored in the XRSB.
Abstract:
A processor comprising an instruction execution circuit to execute a second code stored at a second address of a memory, wherein the second code is translated from a first code stored at a first address of the memory and a translation table (TT) controller coupled to a translation table to store a TT entry comprising a mapping between the first address and the second address and an attribute field comprising an attribute value associated with execution of the second code, wherein the TT controller is to monitor execution of the second code by the instruction execution circuit and update, based on a performance metric of the execution, the attribute value of the TT entry.
Abstract:
Methods and apparatuses relating to preventing the execution of a modified instruction. In one embodiment, an apparatus includes a hardware binary translator to translate an instruction to a translated instruction, and a consistency hardware manager to prevent execution of the translated instruction by a hardware processor on detection of a modification to a virtual to physical address mapping of the instruction after the translation.
Abstract:
A processing device including a first shadow register, a second shadow register, and an instruction execution circuit, communicatively coupled to the first shadow register and the second shadow register, to receive a sequence of instructions comprising a first local commit marker, a first global commit marker, and a first register access instruction referencing an architectural register, speculatively execute the first register access instruction to generate a speculative register state value associated with a physical register, responsive to identifying the first local commit marker, store, in the first shadow register, the speculative register state value, and responsive to identifying the first global commit marker, store, in the second shadow register, the speculative register state value.
Abstract:
A processing device including a first shadow register, a second shadow register, and an instruction execution circuit, communicatively coupled to the first shadow register and the second shadow register, to receive a sequence of instructions comprising a first local commit marker, a first global commit marker, and a first register access instruction referencing an architectural register, speculatively execute the first register access instruction to generate a speculative register state value associated with a physical register, responsive to identifying the first local commit marker, store, in the first shadow register, the speculative register state value, and responsive to identifying the first global commit marker, store, in the second shadow register, the speculative register state value.
Abstract:
A mechanism for tracking the control flow of instructions in an application and performing one or more optimizations of a processing device, based on the control flow of the instructions in the application, is disclosed. Control flow data is generated to indicate the control flow of blocks of instructions in the application. The control flow data may include annotations that indicate whether optimizations may be performed for different blocks of instructions. The control flow data may also be used to track the execution of the instructions to determine whether an instruction in a block of instructions is assigned to a thread, a process, and/or an execution core of a processor, and to determine whether errors have occurred during the execution of the instructions.
Abstract:
A processing system implementing techniques for binary translation support using processor instruction prefixes is provided. In one embodiment, the processing system includes a register bank having a plurality of registers to store data for use in executing instructions and a processor core coupled to the register bank. An instruction to be executed by the processor core is received. The instruction is associated with a binary translator operation to translate input instruction sequences to output instruction sequences. An opcode prefix referencing an extended register of the plurality of registers to be used during the binary translator operation. The extended register preserves a source register value of the plurality of registers.
Abstract:
Technologies for partial binary translation on multi-core platforms include a shared translation cache, a binary translation thread scheduler, a global installation thread, and a local translation thread and analysis thread for each processor core. On detection of a hotspot, the thread scheduler first resumes the global thread if suspended, next activates the global thread if a translation cache operation is pending, and last schedules local translation or analysis threads for execution. Translation cache operations are centralized in the global thread and decoupled from analysis and translation. The thread scheduler may execute in a non-preemptive nucleus, and the translation and analysis threads may execute in a preemptive runtime. The global thread may be primarily preemptive with a small non-preemptive nucleus to commit updates to the shared translation cache. The global thread may migrate to any of the processor cores. Forward progress is guaranteed. Other embodiments are described and claimed.
Abstract:
An apparatus and method for a dual return stack buffer (RSB) for use in binary translation systems. An embodiment of a processor includes: a dual return stack buffer (DRSB) comprising a native RSB and an extended RSB (XRSB), the dual RSB to be used within a binary translation execution environment in which guest call-return instruction sequences are translated to native call-return instruction sequences to be executed directly by the processor; the native RSB to store native return addresses associated with the native call-return instruction sequences; and the XRSB to store emulated return addresses associated with the guest call-return instruction sequences, wherein each native return address stored in the RSB is associated with an emulated return address stored in the XRSB.
Abstract:
A hardware-software co-designed processor includes a front end to decode an instruction, an execution unit to execute the instruction, an auxiliary cache to store auxiliary information for consumption during execution of the instruction, an instruction blender, and a retirement unit to retire the instruction. The auxiliary information may include long immediate values, non-working instructions for emulating an untranslated instruction stream, or execution hints, and is not decoded by the front end. The auxiliary cache includes circuitry to receive the auxiliary information from a binary translator, to store the auxiliary information in the auxiliary cache, and to provide the auxiliary information to the instruction blender prior to execution. The instruction blender includes circuitry to receive the auxiliary information, to blend the instruction with the auxiliary information, and to provide the blended instruction to the execution unit. Use of the auxiliary cache may reduce fetch and decode bandwidth requirements.