Abstract:
Systems, apparatuses and methods suitable for optimizing synchronization mechanisms for multi-core processors are provided. The synchronization mechanisms may be optimized by receiving a command stream which comprises a plurality of commands including one or more wait commands, wherein each wait command has an associated state and one or more associated conditions; sequentially processing each command in the command stream until a wait command is reached; and checking the state associated with the wait command to be processed, wherein if said state is a blocking state, further processing of commands in the command stream is paused until each of said wait command's associated conditions is met, and wherein if said state is a non-blocking state, the next command in the command stream is retrieved and processed.
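As a rough illustration of the blocking versus non-blocking wait handling described above, the following Python sketch models a command-stream processor; the class names, the use of callables as conditions, and the busy-wait loop are illustrative assumptions rather than the patented design.

    # Sketch of blocking vs. non-blocking wait handling in a command stream.
    class Command:
        def execute(self):
            print("executing", self.__class__.__name__)

    class WaitCommand(Command):
        def __init__(self, blocking, conditions):
            self.blocking = blocking        # associated state
            self.conditions = conditions    # callables that return True when met

        def conditions_met(self):
            return all(cond() for cond in self.conditions)

    def process_stream(commands):
        pending_waits = []
        for cmd in commands:                          # sequential processing
            if isinstance(cmd, WaitCommand):
                if cmd.blocking:
                    # blocking state: pause until every condition is met
                    while not cmd.conditions_met():
                        pass                          # hardware would stall or poll here
                else:
                    # non-blocking state: note the wait and fetch the next command
                    pending_waits.append(cmd)
            else:
                cmd.execute()
        return pending_waits

    stream = [Command(), WaitCommand(True, [lambda: True]), Command()]
    process_stream(stream)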
Abstract:
An assembly language program for a coarse-grained reconfigurable array (CGRA), having dispatch interface information indicating operations to be performed via a dispatch interface of the CGRA to receive an input, memory interface information indicating operations to be performed via one or more memory interfaces of the CGRA, tile memory information indicating memory variables referring to memory locations to be implemented in tile memories of the CGRA, and a flow description specifying one or more synchronous data flows, through the memory locations referenced via the memory variables in the tile memory information, to produce a result from the input using the CGRA.
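The Python sketch below only illustrates the four kinds of information such a program carries (dispatch interface operations, memory interface operations, tile memory variables, and a flow description); the dataclass layout, field names, and the toy flow notation are assumptions, not the CGRA assembly format itself.

    from dataclasses import dataclass, field

    @dataclass
    class CgraProgram:
        dispatch_ops: list = field(default_factory=list)      # dispatch interface operations
        memory_ops: list = field(default_factory=list)        # memory interface operations
        tile_memory_vars: dict = field(default_factory=dict)  # variables -> tile memory locations
        flows: list = field(default_factory=list)             # synchronous data flows

    program = CgraProgram(
        dispatch_ops=["recv input -> %in"],
        memory_ops=["load weights -> %w"],
        tile_memory_vars={"%in": "tile0.mem[0]", "%w": "tile1.mem[0]", "%out": "tile2.mem[0]"},
        flows=["%in, %w -> mac -> %out"],
    )
    print(program.flows)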
Abstract:
Described is a computer-implemented method of reordering condition checks. Two or more condition checks in computer code that may be reordered within the code are identified. It is determined that a later one of the condition checks is satisfied at a greater frequency than a preceding one of the condition checks. It is determined that there is an absence of side effects in the two or more condition checks. The values of the condition checks are propagated and abstract interpretation is performed on the propagated values. It is determined that the condition checks are exclusive of each other, and the condition checks are reordered within the computer code.
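A small Python example of the underlying idea: two side-effect-free, mutually exclusive checks where the later one is satisfied more often, shown before and after reordering. The function names and the example predicates are illustrative assumptions.

    def classify_original(x):
        if x == 0:          # rarely satisfied
            return "zero"
        elif x > 0:         # satisfied far more often
            return "positive"
        return "negative"

    def classify_reordered(x):
        # the checks are exclusive and side-effect free, so swapping them is safe;
        # the more frequently satisfied check now runs first, reducing average work
        if x > 0:
            return "positive"
        elif x == 0:
            return "zero"
        return "negative"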
Abstract:
An approach to dynamic run-time alias checking comprising: creating a main thread and a helper thread; computing an optimized first region of code in a rollback-only transactional memory associated with the main thread; checking for one or more alias dependencies in an un-optimized first region of code; responsive to a determination, within a predetermined amount of time, that no alias dependencies are present in the un-optimized first region of code, committing the transaction; and, responsive to at least one of a failure to determine the results of the alias dependency check within the predetermined amount of time and a determination within the predetermined amount of time that alias dependencies are present in the un-optimized first region of code, performing a rollback of the transaction and executing the un-optimized first region of code.
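The control flow might look roughly like the following Python sketch, which uses an ordinary helper thread and a timeout in place of real rollback-only transactional memory; all names, the timeout value, and the stand-in commit/rollback behaviour are assumptions.

    import threading

    def run_region(optimized, unoptimized, alias_check, timeout=0.01):
        result = {}

        def helper():
            # helper thread checks the un-optimized region for alias dependencies
            result["has_aliases"] = alias_check()

        checker = threading.Thread(target=helper)
        checker.start()

        speculative_output = optimized()   # runs speculatively ("in the transaction")
        checker.join(timeout)

        if "has_aliases" in result and not result["has_aliases"]:
            return speculative_output      # no aliases found in time: commit
        # check timed out or aliases found: "roll back" and run the un-optimized region
        return unoptimized()

    print(run_region(lambda: "optimized result",
                     lambda: "un-optimized result",
                     lambda: False))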
Abstract:
Systems and methods for generating random address mapping in non-volatile memories using local and global interleaving are provided. One such method for generating a random address mapping for a non-volatile memory (NVM) involves identifying a number of bits (N) in a physical address space of the NVM, selecting G bit(s) of the N bits to be used for global interleaving, where G is less than N, determining a number of bits (N−G) to be used for local interleaving, mapping the G bit(s) using a mapping function for global interleaving, interleaving (N−G) bits using an interleaving function for local interleaving, and generating a combined mapping comprising the mapped G bit(s) and the interleaved (N−G) bits.
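A minimal Python sketch of the bit-splitting step: the low (N−G) bits are permuted for local interleaving, the G remaining bits are remapped for global interleaving, and the two parts are recombined. It assumes, for illustration only, that the G global bits are the high-order bits; the particular mapping table and permutation are arbitrary examples, not the patented functions.

    def map_address(addr, n_bits, g_bits, global_map, local_perm):
        local_bits = n_bits - g_bits
        local_part = addr & ((1 << local_bits) - 1)   # low (N - G) bits
        global_part = addr >> local_bits              # high G bits (assumed global)

        mapped_global = global_map[global_part]       # mapping function (global interleaving)
        interleaved_local = 0                         # interleaving function (local interleaving)
        for src, dst in enumerate(local_perm):        # permute the (N - G) bits
            interleaved_local |= ((local_part >> src) & 1) << dst

        # combined mapping: mapped G bit(s) plus interleaved (N - G) bits
        return (mapped_global << local_bits) | interleaved_local

    # example: N = 6, G = 2, a 4-entry global map and a 4-bit local permutation
    print(bin(map_address(0b101101, n_bits=6, g_bits=2,
                          global_map=[2, 3, 0, 1], local_perm=[1, 0, 3, 2])))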
Abstract:
A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
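A toy Python scheduler along these lines is sketched below: each workitem keeps its own program counter, yields at a synchronization instruction once its counter has been advanced past it, and another workitem then runs. The round-robin scheduling and all names are illustrative assumptions.

    def run_group(instruction_stream, num_workitems):
        pcs = [0] * num_workitems                 # one program counter per workitem
        runnable = list(range(num_workitems))
        while runnable:
            wi = runnable.pop(0)                  # pick a workitem to run
            while pcs[wi] < len(instruction_stream):
                op = instruction_stream[pcs[wi]]
                pcs[wi] += 1                      # PC now points to the next instruction
                if op == "SYNC":
                    runnable.append(wi)           # yield: another workitem runs next
                    break
                print(f"workitem {wi}: {op}")

    run_group(["a", "SYNC", "b"], num_workitems=2)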
Abstract:
A graphics processing unit comprises a programmable execution unit executing graphics processing programs for execution threads to perform graphics processing operations, a local register memory comprising one or more registers, where registers of the register memory are assignable to store data associated with an individual execution thread that is being executed by the execution unit, and where the register(s) assigned to an individual execution thread are accessible only to that associated individual execution thread, and a further local memory that is operable to store data for use in common by plural execution threads, where the data stored in the further local memory is accessible to plural execution threads as they execute. The programmable execution unit is operable to selectively store output data for an execution thread in a register(s) of the local register memory assigned to the execution thread, and/or in the further local memory.
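As a loose software analogy (not the hardware itself), the following Python sketch models per-thread registers that only their owning thread can read and a further local memory shared by the thread group; all class and method names are assumptions.

    class ExecutionUnit:
        def __init__(self, num_threads):
            self.registers = {tid: {} for tid in range(num_threads)}  # private per thread
            self.shared_local = {}                                    # common to all threads

        def store_output(self, tid, name, value, share=False):
            # selectively store the thread's output in its own register(s)
            # and/or in the shared further local memory
            self.registers[tid][name] = value
            if share:
                self.shared_local[name] = value

        def load(self, tid, name):
            # a thread sees its own registers first, then the shared memory
            return self.registers[tid].get(name, self.shared_local.get(name))

    eu = ExecutionUnit(num_threads=4)
    eu.store_output(0, "acc", 42, share=True)
    print(eu.load(3, "acc"))   # 42: the shared value is visible to another thread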
Abstract:
A language-based model to support asynchronous operations set forth in a synchronous syntax is provided. The asynchronous operations are transformed in a compiler into an asynchronous pattern, such as an APM-based pattern (or asynchronous programming model based pattern). The ability to compose asynchronous operations comes from the ability to efficiently call asynchronous methods from other asynchronous methods, pause them, and later resume them, effectively implementing a singly linked stack. One example includes support for ordered and unordered compositions of asynchronous operations. In an ordered composition, each asynchronous operation is started and finished before another operation in the composition is started. In an unordered composition, each asynchronous operation is started and completed independently of the other operations in the unordered composition.
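The ordered/unordered distinction can be illustrated with Python's asyncio as a stand-in for the compiler-generated asynchronous pattern; the helper names and delays below are assumptions.

    import asyncio

    async def fetch(name, delay):
        await asyncio.sleep(delay)
        return name

    async def ordered():
        # each operation starts and finishes before the next one starts
        results = []
        for name, d in [("a", 0.2), ("b", 0.1)]:
            results.append(await fetch(name, d))
        return results

    async def unordered():
        # operations start and complete independently of one another
        tasks = [fetch("a", 0.2), fetch("b", 0.1)]
        return await asyncio.gather(*tasks)   # runs concurrently; completion order is independent

    print(asyncio.run(ordered()))
    print(asyncio.run(unordered()))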
Abstract:
Hardware transactional memory (HTM) systems may guarantee that transactions commit without falling back to non-speculative code paths. A transaction that fails to progress may enter a power mode, giving the transaction priority when it conflicts with non-power-mode transactions. If, during execution of a power-mode transaction, another thread attempts, using a non-power-mode transaction, to access a shared resource being accessed by the power-mode transaction, it may be determined whether any actual data conflict occurs between the two transactions. If no data conflict exists, both transactions may continue to completion. If, however, a data conflict does exist, the power-mode transaction may deny the other transaction access to the shared resource. HTM systems may, in some embodiments, ensure that only one power-mode transaction exists at a time. In other embodiments, multiple, concurrent, power-mode transactions may be supported while ensuring that they access disjoint data sets.
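A simplified Python sketch of the conflict-resolution decision described above (real conflict detection happens in the HTM hardware; the set-overlap test and all names here are illustrative assumptions):

    def resolve_conflict(power_txn_accesses, other_txn_accesses):
        # a data conflict exists only if the two transactions touch overlapping
        # data (simplified here to a set-overlap test on accessed locations)
        conflict = bool(set(power_txn_accesses) & set(other_txn_accesses))
        if not conflict:
            return "both transactions continue to completion"
        # actual data conflict: the power-mode transaction denies the other access
        return "power-mode transaction wins; the other transaction is denied access"

    print(resolve_conflict({"A", "B"}, {"C"}))   # no overlap -> both continue
    print(resolve_conflict({"A", "B"}, {"B"}))   # overlap -> power-mode transaction wins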