Abstract:
Various embodiments are generally directed to techniques to detect a return-oriented programming (ROP) attack by verifying target addresses of branch instructions during execution. An apparatus includes a processor component, and a comparison component for execution by the processor component to determine whether there is a matching valid target address for a target address of a branch instruction associated with a translated portion of a routine in a table comprising valid target addresses. Other embodiments are described and claimed.
Abstract:
One embodiment provides an apparatus. The apparatus includes a processor, a chipset, a memory to store a process, and logic. The processor includes one or more core(s) and is to execute the process. The logic is to acquire performance monitoring data in response to a platform processor utilization parameter (PUP) greater than a detection utilization threshold (UT), identify a spin loop based, at least in part, on at least one of a detected hot function and/or a detected hot loop, modify the identified spin loop using binary translation to create a modified process portion, and implement redirection from the identified spin loop to the modified process portion.
Abstract:
Technologies for partial binary translation on multi-core platforms include a shared translation cache, a binary translation thread scheduler, a global installation thread, and a local translation thread and analysis thread for each processor core. On detection of a hotspot, the thread scheduler first resumes the global thread if suspended, next activates the global thread if a translation cache operation is pending, and last schedules local translation or analysis threads for execution. Translation cache operations are centralized in the global thread and decoupled from analysis and translation. The thread scheduler may execute in a non-preemptive nucleus, and the translation and analysis threads may execute in a preemptive runtime. The global thread may be primarily preemptive with a small non-preemptive nucleus to commit updates to the shared translation cache. The global thread may migrate to any of the processor cores. Forward progress is guaranteed. Other embodiments are described and claimed.
Abstract:
Technologies for shadow stack management include a computing device that, when executing a translated call routine in a translated binary, pushes a native return address on to a native stack of the computing device, adds a constant offset to a stack pointer of the computing device, executes a native call instruction to a translated call target, and, after executing the native call instruction, subtracts the constant offset from the stack pointer. Executing the native call instruction pushes a translated return address onto a shadow stack of the computing device. The computing device may map two or more virtual memory pages of the shadow stack onto a single physical memory page. The computing device may execute a translated return routine that pops the native return address from the native stack, adds the constant offset to the stack pointer, and executes a native return instruction. Other embodiments are described and claimed.
Abstract:
One embodiment provides an apparatus. The apparatus includes a processor, a chipset, a memory to store a process, and logic. The processor includes one or more core(s) and is to execute the process. The logic is to acquire performance monitoring data in response to a platform processor utilization parameter (PUP) greater than a detection utilization threshold (UT), identify a spin loop based, at least in part, on at least one of a detected hot function and/or a detected hot loop, modify the identified spin loop using binary translation to create a modified process portion, and implement redirection from the identified spin loop to the modified process portion.
Abstract:
A disclosed example apparatus includes memory; and processor circuitry to: identify a lock-protected section of instructions in the memory; replace lock/unlock instructions with transactional lock acquire and transactional lock release instructions to form a transactional process; and execute the transactional process in a speculative execution.
Abstract:
Technologies for binary translation include a computing device that allocates a translation cache shared by all threads associated with a corresponding execution domain. The computing device assigns a thread to an execution domain, translates original binary code of the thread to generate translated binary code, and installs the translated binary code into the corresponding translation cache for execution. The computing device may allocate a global region cache, generate region metadata associated with the original binary code of a thread, and store the region metadata in the global region cache. The original binary code may be translated using the region metadata. The computing device may allocate a global prototype cache, translate the original binary code of a thread to generate prototype code, and install the prototype code in the global prototype cache. The prototype code may be a non-executable version of the translated binary code. Other embodiments are described and claimed.