Abstract:
Technologies for optimized binary translation include a computing device that determines a cost-benefit metric associated with each translated code block of a translation cache. The cost-benefit metric is indicative of translation cost and performance benefit associated with the translated code block. The translation cost may be determined by measuring translation time of the translated code block. The cost-benefit metric may be calculated using a weighted cost-benefit function based on an expected workload of the computing device. In response to determining to free space in the translation cache, the computing device determines whether to discard each translated code block as a function of the cost-benefit metric. In response to determining to free space in the translation cache, the computing device may increment an iteration count and skip each translated code block if the iteration count modulo the corresponding cost-benefit metric is non-zero. Other embodiments are described and claimed.
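As a rough illustration only (the block fields, the weights, and the prune_cache helper below are assumptions, not the claimed design), a software sketch of scoring translated blocks and applying the iteration-count modulo test when freeing cache space might look like this:

    # Hypothetical sketch of cost-benefit eviction for a binary-translation cache.
    from dataclasses import dataclass

    @dataclass
    class TranslatedBlock:
        address: int
        translation_time_us: float   # measured translation cost
        exec_count: int              # proxy for performance benefit

    def cost_benefit(block, cost_weight=1.0, benefit_weight=1.0):
        """Weighted cost-benefit score; higher means more worth keeping.
        The weights would be tuned for the expected workload."""
        benefit = benefit_weight * block.exec_count
        cost = cost_weight * block.translation_time_us
        # Quantize to a small positive integer so it can feed the modulo test below.
        return max(1, int(round(benefit / (cost + 1e-9))))

    def prune_cache(blocks, iteration_count):
        """Free space: a block is skipped (retained) when
        iteration_count % its cost-benefit metric is non-zero."""
        kept = []
        for blk in blocks:
            metric = cost_benefit(blk)
            if iteration_count % metric != 0:
                kept.append(blk)      # skip this block on this pass
            # otherwise the block is discarded to free space
        return kept

    blocks = [TranslatedBlock(0x1000, 50.0, 400), TranslatedBlock(0x2000, 200.0, 10)]
    survivors = prune_cache(blocks, iteration_count=3)

With these toy numbers, the cheap-to-translate, heavily executed block survives several pruning passes, while the expensive, rarely used block is discarded on the first pass whose iteration count divides its metric.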
Abstract:
A method and apparatus for providing support for self-modifying guest code. The apparatus includes a memory, a hardware buffer, and a processor. The processor is configured to convert guest code to native code and store a converted native code equivalent of the guest code into a code cache portion of the processor. The processor is further configured to maintain the hardware buffer, which tracks respective locations of converted code in the code cache. The hardware buffer is updated based on a respective access to a respective location in the memory associated with a respective location of converted code in the code cache. The processor is further configured to perform a request to modify a memory location after accessing the hardware buffer.
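A purely software model may help picture the role of the tracking buffer; the CodeCacheTracker class and its method names below are hypothetical stand-ins for the claimed hardware buffer, not its implementation:

    # Software model (illustrative only) of a buffer that tracks which guest
    # memory locations have converted code in the code cache, so that a write
    # to guest memory can invalidate stale translations before it completes.
    class CodeCacheTracker:
        def __init__(self):
            self.guest_to_cache = {}   # guest address -> code cache location

        def record_translation(self, guest_addr, cache_addr):
            self.guest_to_cache[guest_addr] = cache_addr

        def write_guest_memory(self, memory, guest_addr, value):
            # Consult the tracking buffer before modifying the memory location.
            if guest_addr in self.guest_to_cache:
                self.invalidate(guest_addr)
            memory[guest_addr] = value

        def invalidate(self, guest_addr):
            cache_addr = self.guest_to_cache.pop(guest_addr)
            print(f"invalidating converted code at cache offset {cache_addr:#x}")

    memory = {}
    tracker = CodeCacheTracker()
    tracker.record_translation(0x4000, 0x100)
    tracker.write_guest_memory(memory, 0x4000, 0x90)  # self-modifying store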
Abstract:
A method for dynamically configuring a graphics pipeline system. The method includes determining an optimal graphics pipeline configuration based on: estimating one or more of the memory power consumption and the computation power consumption of storing and regenerating intermediate results, based on graphics state information and one or more factors; determining granularity for the optimal graphics pipeline configuration based on the graphics state information and the one or more factors; collecting runtime information for primitives from the graphics pipeline hardware, including factors from tessellation, or using graphics state information to determine geometry expansion at an output of one or more shader stages; and determining which intermediate results to save from a previous processing pass by comparing the memory power needed to save the intermediate results with the computation power, as well as the memory power, needed to regenerate the intermediate results in one or more later tile rendering passes.
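The save-versus-regenerate comparison can be sketched as a simple power estimate; the per-byte and per-operation energy constants and the function name below are made-up placeholders, not figures or interfaces from the method:

    # Illustrative sketch of the save-versus-regenerate decision: intermediate
    # results from a previous pass are kept only if storing them costs less
    # power than recomputing them (plus their memory traffic) in later passes.
    def should_save_intermediate(size_bytes, regen_ops, later_passes,
                                 mem_pj_per_byte=5.0, compute_pj_per_op=1.0):
        save_cost = size_bytes * mem_pj_per_byte * 2        # write once, read back
        regen_cost = later_passes * (regen_ops * compute_pj_per_op
                                     + size_bytes * mem_pj_per_byte)
        return save_cost < regen_cost

    # e.g. tessellated geometry reused by 4 tile rendering passes
    print(should_save_intermediate(size_bytes=64_000, regen_ops=500_000, later_passes=4))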
Abstract:
A system for an agnostic runtime architecture. The system includes a system emulation/virtualization converter, an application code converter, and a system converter, wherein the system emulation/virtualization converter and the application code converter implement a system emulation process, and wherein the system converter implements a system and application conversion process for executing code from a guest image using the system converter or the system emulator. The system further includes a reordering process through JIT (just in time) optimization that ensures loads do not dispatch ahead of other loads that are to the same address, wherein a load will check for a same address of subsequent loads from a same thread, and a thread checking process that enables other thread store checks against the entire load queue and a monitor extension.
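A toy model of the load-ordering and store-check behaviour described above is sketched below; the LoadQueue class, its fields, and the re-dispatch handling are illustrative assumptions rather than the claimed hardware or JIT machinery:

    # Illustrative model: a dispatching load compares its address against
    # younger (subsequent) loads from the same thread, and a store from
    # another thread checks against the entire load queue.
    class LoadQueue:
        def __init__(self):
            self.entries = []   # (thread_id, address, sequence_number)

        def dispatch_load(self, thread_id, address, seq):
            # A younger load to the same address from the same thread must not
            # already have dispatched ahead of this one.
            conflict = any(t == thread_id and a == address and s > seq
                           for t, a, s in self.entries)
            self.entries.append((thread_id, address, seq))
            return not conflict   # False -> ordering violation, re-dispatch

        def store_check(self, thread_id, address):
            # A store from another thread checks the whole load queue.
            return [e for e in self.entries
                    if e[0] != thread_id and e[1] == address]

    lq = LoadQueue()
    lq.dispatch_load(thread_id=0, address=0x80, seq=7)
    print(lq.store_check(thread_id=1, address=0x80))   # loads the store must respect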
Abstract:
An apparatus and method for performing a check on the inputs to a mathematical instruction and selecting either a default sequence or a more efficient alternate sequence of operations based on the check. For example, one embodiment of a processor comprises: an arithmetic logic unit (ALU) to perform a plurality of mathematical operations using one or more source operands; instruction check logic to evaluate the source operands for a current mathematical instruction and to determine, based on the evaluation, whether to execute a default sequence of operations including executing the current mathematical instruction by the ALU or to jump to an alternate sequence of operations adapted to provide a result for the mathematical instruction having particular types of source operands more efficiently than the default sequence of operations.
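A high-level sketch of the check-and-select idea follows; the particular special-case operand values and function names are assumptions chosen for illustration, not the embodiment's actual fast paths:

    # Illustrative sketch: evaluate source operands before a multiply and either
    # run the default sequence or jump to a cheaper alternate sequence for
    # special operand values (the special cases chosen here are assumptions).
    import math

    def default_multiply(a, b):
        return a * b                      # stands in for the full ALU sequence

    def alternate_multiply(a, b):
        # Fast paths for operands that make the full sequence unnecessary.
        if a == 0.0 or b == 0.0:
            return 0.0
        if a == 1.0:
            return b
        if b == 1.0:
            return a
        return math.nan                   # NaN operands: propagate directly

    def checked_multiply(a, b):
        special = (a in (0.0, 1.0) or b in (0.0, 1.0)
                   or math.isnan(a) or math.isnan(b))
        return alternate_multiply(a, b) if special else default_multiply(a, b)

    print(checked_multiply(0.0, 123.456))   # alternate sequence
    print(checked_multiply(3.0, 4.0))       # default sequence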
Abstract:
Methods, systems, and computer-readable media for providing a virtualization layer for mobile applications are presented. A computing device may parse code of an application to identify a first set of one or more classes in the application. The computing device may transmit code usable by the first set of one or more classes to a module accessible to the application and create a second set of one or more classes in the application to replace the first set of one or more classes, wherein the second set of one or more classes does not inherit from the first set of one or more classes in an object hierarchy. In some embodiments, the second set of one or more classes provides at least one different function from the first set of one or more classes. The computing device may execute the application comprising the second set of one or more classes.
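The class-replacement scheme can be pictured with a small sketch; the class and module names below are hypothetical, and the example only mirrors the no-inheritance forwarding structure described above:

    # Illustrative sketch: the original class's behaviour is made available
    # through a separate module object, and a replacement class (which does
    # NOT inherit from the original) forwards calls to that module.
    class OriginalNetworkClient:                # first set: class found by parsing
        def fetch(self, url):
            return f"direct fetch of {url}"

    class VirtualizationModule:                 # code made accessible to the app
        def fetch(self, url):
            return f"policy-checked fetch of {url}"

    class ReplacementNetworkClient:             # second set: no inheritance link
        def __init__(self, module):
            self._module = module
        def fetch(self, url):                   # provides different behaviour
            return self._module.fetch(url)

    client = ReplacementNetworkClient(VirtualizationModule())
    print(client.fetch("https://example.com"))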
Abstract:
Systems and techniques are described for modifying an executable file of an application and executing the application using the modified executable file. A described technique includes receiving, by a virtual machine, a request to perform an initial function of an application and an executable file for the application. The virtual machine modifies the executable file by redirecting the executable file to a custom runtime library that includes a custom function configured to initialize the application and to place the application in a paused state. A call to the custom function is added to the executable file. The virtual machine initializes the application by executing the modified executable file, the executing causing the custom function to initialize the application and place the application in a paused state.
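A minimal sketch of the redirected entry point and paused state follows, using a thread and an event to stand in for the virtual machine's pause mechanism; all names are illustrative assumptions:

    # Illustrative sketch: the "modified executable" calls a custom runtime
    # function that initializes the application and leaves it paused until
    # it is explicitly resumed.
    import threading

    class CustomRuntime:
        def __init__(self):
            self.resume_event = threading.Event()

        def custom_init(self, app):
            app.initialize()
            # Place the application in a paused state until told to resume.
            self.resume_event.wait()
            app.run()

    class App:
        def initialize(self): print("app initialized")
        def run(self): print("app running")

    runtime = CustomRuntime()
    app = App()
    # Entry point redirected to the custom function in the custom runtime library.
    t = threading.Thread(target=runtime.custom_init, args=(app,))
    t.start()
    print("app is paused; resuming now")
    runtime.resume_event.set()
    t.join()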
Abstract:
A method and system use exceptions for code specialization in a system that supports transactions. The method and system include inserting one or more branchless instructions into a sequence of computer instructions. The branchless instructions include one or more instructions that are executable if a commonly occurring condition is satisfied and one or more instructions that are configured to raise an exception if the commonly occurring condition is not satisfied.
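The exception-based specialization pattern can be sketched as follows; the indexing example is an assumed stand-in for a commonly occurring condition, and the fallback path stands in for the transactional recovery rather than reproducing it:

    # Illustrative sketch: the hot path assumes the commonly occurring
    # condition (a valid index) and lets an exception, rather than an
    # explicit branch, divert the rare case to general handling.
    def specialized_lookup(table, index):
        # Common case: index is valid; no explicit range check (branch) here.
        return table[index]

    def lookup(table, index):
        try:
            return specialized_lookup(table, index)
        except IndexError:
            # Rare case: fall back to the slower general handling, analogous
            # to aborting and re-executing without the specialization.
            return None

    table = [10, 20, 30]
    print(lookup(table, 1))    # common case, specialized path
    print(lookup(table, 99))   # rare case handled via the exception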
Abstract:
A method includes receiving a program code at a processor. The method also includes generating, via the processor, a heap model corresponding to the program code. The method further includes detecting, via the processor, a linearizable data structure in the program code. The method also includes modifying, via the processor, the heap model based on the detected linearizable data structure. The method further includes analyzing, via the processor, the program code using the modified heap model.
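One way to picture the flow is the toy pipeline below; the heap-model representation, the hard-coded list of linearizable types, and the collapsing of each detected structure into a single abstract node are all illustrative assumptions:

    # Illustrative sketch of the analysis flow: build a heap model from the
    # code, detect a (here, hard-coded) linearizable data structure, collapse
    # it in the model, then analyze using the modified model.
    KNOWN_LINEARIZABLE = {"ConcurrentLinkedQueue", "ConcurrentHashMap"}

    def build_heap_model(program_code):
        # Toy model: one abstract heap node per allocated type name.
        return {line.split()[1] for line in program_code if line.startswith("new ")}

    def detect_linearizable(heap_model):
        return heap_model & KNOWN_LINEARIZABLE

    def modify_heap_model(heap_model, linearizable):
        # Treat each linearizable structure as an atomic black box so the
        # analysis need not explore its internal nodes.
        return {node for node in heap_model if node not in linearizable} | \
               {f"atomic<{node}>" for node in linearizable}

    program = ["new ConcurrentLinkedQueue", "new Widget"]
    model = build_heap_model(program)
    model = modify_heap_model(model, detect_linearizable(model))
    print(model)   # the analysis would proceed over this modified model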
Abstract:
Described herein are optimizations of thread loop intermediate representation (IR) code. One embodiment involves an algorithm that, based on data-flow analysis, computes the sets of temporary variables that are loaded at the beginning of a thread loop and stored upon exit from a thread loop. Another embodiment involves reducing the thread loop trip count for the commonly found case where a piece of a compute shader is executed by a single thread (or a compiler-analyzable range of threads). In yet another embodiment, compute shader thread indices are cached to avoid excessive divisions, further improving execution speed.
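A sketch of the cached thread-index idea (with made-up group dimensions and a toy thread loop) is shown below; it illustrates the effect the IR transformation aims for, not the transformation itself:

    # Illustrative sketch: the 3-D local thread indices are derived once from
    # the flat index with divisions/modulo and reused, instead of being
    # recomputed at each use inside the thread loop.
    GROUP_X, GROUP_Y, GROUP_Z = 8, 8, 4

    def thread_indices(flat_id):
        x = flat_id % GROUP_X
        y = (flat_id // GROUP_X) % GROUP_Y
        z = flat_id // (GROUP_X * GROUP_Y)
        return x, y, z

    def run_thread_loop(kernel, active_threads):
        # Data-flow analysis would also decide which temporaries to load here
        # (loop entry) and store on exit; the index cache is shown explicitly.
        index_cache = {tid: thread_indices(tid) for tid in active_threads}
        for tid in active_threads:
            kernel(*index_cache[tid])

    # Trip range reduced to a compiler-analyzable subset of the full group.
    run_thread_loop(lambda x, y, z: print(x, y, z), active_threads=range(4))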