Abstract:
Strand-based computing hardware and dynamically optimizing strandware are included in a high performance microprocessor system. The system operates in real time, automatically and unobservably, to parallelize single-threaded software into parallel strands for execution by cores implemented in a multi-core and/or multi-threaded microprocessor of the system. The system organizes native instructions of the strands into commit groups. The results of each commit group are either atomically committed or entirely discarded. A hierarchical two-level rollback mechanism enables rollback at the granularity of either a single commit group or an entire strand. The system responds to local events (e.g., branch misprediction) via rollback of commit groups, and to global events (e.g., strand-level mis-speculation) via rollback of strands. Rolling back commit groups of a particular strand affects only that strand's commit groups, leaving other strands unaffected.
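The two-level rollback described above can be illustrated with a minimal software sketch. This is only a toy model under stated assumptions, not the disclosed hardware: the `Strand` class, its field names, and the dictionary-based register state are all hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Strand:
    """Hypothetical model of one strand's speculative state."""
    committed: dict = field(default_factory=dict)   # state as of the last commit group
    checkpoint: dict = field(default_factory=dict)  # state when the strand was spawned
    pending: dict = field(default_factory=dict)     # writes of the in-flight commit group

    def write(self, reg, value):
        # Buffer a result; it is not visible until the group commits.
        self.pending[reg] = value

    def commit_group(self):
        # Atomic commit: all buffered results become visible together.
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback_group(self):
        # Local event (e.g. branch misprediction): discard only the
        # in-flight commit group of this strand.
        self.pending.clear()

    def rollback_strand(self):
        # Global event (e.g. strand-level mis-speculation): discard
        # everything this strand has done since it was spawned.
        self.pending.clear()
        self.committed = dict(self.checkpoint)


def rollback_demo():
    a = Strand(committed={"r1": 0}, checkpoint={"r1": 0})
    b = Strand(committed={"r1": 10}, checkpoint={"r1": 10})
    a.write("r1", 1)
    a.commit_group()
    b.write("r1", 11)
    b.commit_group()
    a.write("r1", 2)
    a.rollback_group()   # first level: one commit group of strand a
    after_group = (dict(a.committed), dict(b.committed))
    a.rollback_strand()  # second level: all of strand a
    after_strand = (dict(a.committed), dict(b.committed))
    return after_group, after_strand
```

In this sketch, strand `b` is never touched by either rollback of strand `a`, which mirrors the statement that rolling back one strand's commit groups leaves other strands unaffected.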
Abstract:
A method and hardware apparatus are disclosed for reducing the rollback penalty on exceptions in a microprocessor executing traces of scheduled instructions. Speculative state is committed to the architectural state of the microprocessor at a series of commit points within a trace, rather than as a single atomic operation at the end of the trace.
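The benefit of intermediate commit points can be seen in a small simulation. The sketch below is an assumption-laden illustration rather than the disclosed apparatus: `run_trace`, `set_reg`, and `fault` are hypothetical names, instructions are modeled as Python callables, and the architectural state is a plain dictionary.

```python
def set_reg(name, value):
    """Build a hypothetical instruction that writes one register."""
    def instr(state):
        state[name] = value
    return instr


def fault(state):
    """Hypothetical instruction that always raises an exception."""
    raise ZeroDivisionError("simulated exception")


def run_trace(trace, arch_state, commit_points):
    """Execute `trace` speculatively, committing after each index listed
    in `commit_points`.

    Returns (architectural_state, redo_from), where redo_from is the index
    of the first instruction to re-execute after an exception, or None if
    the whole trace completed.
    """
    arch = dict(arch_state)   # committed architectural state
    spec = dict(arch)         # speculative working copy
    last_commit = 0
    for i, instr in enumerate(trace):
        try:
            instr(spec)
        except Exception:
            # Discard speculative state; resume from the last commit point.
            return arch, last_commit
        if i in commit_points:
            arch = dict(spec)
            last_commit = i + 1
    # The end of the trace acts as a final commit point.
    return dict(spec), None
```

With a trace of `[set_reg("x", 1), set_reg("y", 2), fault, set_reg("z", 3)]`, a commit point after index 1 means the exception costs only the work done since that point. With no intermediate commit points, the whole trace prefix is discarded and must be redone from index 0.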
Abstract:
Strand-based computing hardware and dynamically optimizing strandware are included in a high performance microprocessor system. The system operates in real time, automatically and unobservably, to parallelize single-threaded software into a plurality of parallel strands for execution by cores implemented in a multi-core and/or multi-threaded microprocessor of the system. The microprocessor executes a native instruction set tailored for speculative multithreading. The strandware directs hardware of the microprocessor to collect dynamic profiling information while executing the single-threaded software. The strandware analyzes the profiling information for the parallelization, and uses binary translation and dynamic optimization to produce native instructions that are stored in a translation cache. The translation cache is later accessed to execute the produced native instructions in place of some of the single-threaded software. The system is capable of parallelizing a plurality of single-threaded software applications (e.g., application software, device drivers, operating system routines or kernels, and hypervisors).
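The profile, translate, and cache flow described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the disclosed strandware: the `TranslationCache` class, the `HOT_THRESHOLD` value, and the per-block execution counter used as "profiling information" are all hypothetical, and parallelization into strands is not modeled.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: executions before a block is translated


class TranslationCache:
    """Toy model of profiling-driven binary translation."""

    def __init__(self, translate):
        self.translate = translate  # maps a block address to a callable
        self.counts = Counter()     # dynamic profiling information
        self.cache = {}             # block address -> translated code

    def execute(self, addr, interpret):
        """Run the block at `addr`; return which path was taken."""
        if addr in self.cache:
            # Translated native code runs instead of the original software.
            self.cache[addr]()
            return "native"
        self.counts[addr] += 1
        interpret(addr)
        if self.counts[addr] >= HOT_THRESHOLD:
            # The block is hot: translate it and store the result.
            self.cache[addr] = self.translate(addr)
        return "interpreted"


def translation_demo():
    log = []
    tc = TranslationCache(
        translate=lambda addr: (lambda: log.append(("native", addr)))
    )
    return [
        tc.execute(0x40, lambda a: log.append(("interp", a)))
        for _ in range(5)
    ]
```

In this sketch the first three executions of a block take the original path while the counter accumulates, and later executions are served from the cache.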