Abstract:
A first execution time of a first thread executing on a first processing unit of a multiprocessor is determined. A second execution time of a second thread executing on a second processing unit of the multiprocessor is determined, the first and second threads executing in parallel. Power is set to the first and second processing units to effectuate the first and second threads to finish executing at approximately the same time in future executions of the first and second threads. Other embodiments are also described and claimed.
Abstract:
Method, apparatus and system embodiments to schedule user-level OS-independent “shreds” without intervention of an operating system. For at least one embodiment, the shred is scheduled for execution by a scheduler routine rather than the operating system. The scheduler routine may receive compiler-generated hints from a compiler. The compiler hints may be generated by the compiler without user-provided pragmas, and may be passed to the scheduler routine via an API-like interface. The interface may include a scheduling hint data structure that is maintained by the compiler. Other embodiments are also described and claimed.
Abstract:
A first execution time of a first thread executing on a first processing unit of a multiprocessor is determined. A second execution time of a second thread executing on a second processing unit of the multiprocessor is determined, the first and second threads executing in parallel. Power is set to the first and second processing units to effectuate the first and second threads to finish executing at approximately the same time in future executions of the first and second threads. Other embodiments are also described and claimed.
Abstract:
A first execution time of a first thread executing on a first processing unit of a multiprocessor is determined. A second execution time of a second thread executing on a second processing unit of the multiprocessor is determined, the first and second threads executing in parallel. Power is set to the first and second processing units to effectuate the first and second threads to finish executing at approximately the same time in future executions of the first and second threads. Other embodiments are also described and claimed.
Abstract:
An apparatus associated with identifying a critical thread based on information gathered during meeting point processing is provided. One embodiment of the apparatus may include logic to selectively update meeting point counts for threads upon determining that they have arrived at a meeting point. The embodiment may also include logic to periodically identify which thread in a set of threads is a critical thread. The critical thread may be the slowest thread and criticality may be determined by examining meeting point counts. The embodiment may also include logic to selectively manipulate a configurable attribute of the critical thread and/or core upon which the critical thread will run.
Abstract:
An apparatus associated with identifying a critical thread based on information gathered during meeting point processing is provided. One embodiment of the apparatus may include logic to selectively update meeting point counts for threads upon determining that they have arrived at a meeting point. The embodiment may also include logic to periodically identify which thread in a set of threads is a critical thread. The critical thread may be the slowest thread and criticality may be determined by examining meeting point counts. The embodiment may also include logic to selectively manipulate a configurable attribute of the critical thread and/or core upon which the critical thread will run.
Abstract:
According to one embodiment a system is disclosed. The system includes a central processing unit (CPU), a first cache memory coupled to the CPU to store only data for vital loads that are to be immediately processed at the CPU, a second cache memory coupled to the CPU to store data for semi-vital loads to be processed at the CPU, and a third cache memory coupled to the CPU, the first cache memory and the second cache memory to store non-vital loads to be processed at the CPU.
Abstract:
Disclosed are embodiments of a system, methods and mechanism for management and translation of mapping between logical sequencer addresses and physical or logical sequencers in a multi-sequencer multithreading system. A mapping manager may manage assignment and mapping of logical sequencer addresses or pages to actual sequencers or frames of the system. Rationing logic associated with the mapping manager may take into account sequencer attributes when such mapping is performed Relocation logic associated with the mapping manager may manage spill and fill of context information to/from a backing store when re-mapping actual sequencers. Sequencers may be allocated singly, or may be allocated as part of partitioned blocks. The mapping manager may also include translation logic that provides an identifier for the mapped sequencer each time a logical sequencer address is used in a user program. Other embodiments are also described and claimed.
Abstract:
Parallel cachelets are provided for a level of cache in a microprocessor. The cachelets may be independently addressable. The level of cache may accept multiple load requests in a single cycle and apply each to a respective cachelet. Depending upon the content stored in each cachelet, the cachelet may generate a hit/miss response to the respective load request. Load requests that hit their cachelets may be satisfied therefrom. Load requests that miss their cachelets may be referred to another level of cache.
Abstract:
After an instruction loads data into a register at a first time, the register is monitored to see if it is read in a next clock cycle. When the data is not read in the next clock cycle, the instruction is classified as a slowable instruction. An instruction address associated with the instruction is used to update a history table. The history table stores information to indicate if an instruction is a slowable instruction or a non-slowable instruction. When the instruction address of the instruction is encountered at a second time, the history table is used to determine if the instruction is slowable or non-slowable.