Abstract:
Systems and methods for predicting out-of-order instruction-level parallelism (ILP) of threads being executed in a multi-threaded processor and prioritizing scheduling thereof are described herein. One aspect provides for tracking completion of instructions using a global completion table having a head segment and a tail segment; storing prediction values for each instruction in a prediction table indexed via instruction identifiers associated with each instruction, each prediction value indicating whether an instruction is predicted to issue from the head segment or the tail segment; and predicting that threads with more instructions issuing from the tail segment have a higher degree of out-of-order instruction-level parallelism. Other embodiments and aspects are also described herein.
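As a rough illustration of this prediction scheme, the C++ sketch below records, per instruction identifier, whether the instruction issued from the head or the tail segment of the global completion table and scores a thread by its count of tail-segment issues. The head-segment size, the one-bit prediction encoding, and the names IlpPredictor, train, and priority are assumptions for illustration, not details from the abstract.

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    class IlpPredictor {
        static constexpr std::size_t kHeadSegment = 8;  // GCT slots 0..7 form the head segment (assumed)
        std::vector<uint8_t> predTable;                 // indexed by instruction id: 0 = head, 1 = tail
    public:
        explicit IlpPredictor(std::size_t numIds) : predTable(numIds, 0) {}

        // Record whether an instruction issued from the head or the tail segment
        // of the global completion table (distance measured from the GCT head).
        void train(uint32_t iid, std::size_t distanceFromHead) {
            predTable[iid] = (distanceFromHead >= kHeadSegment) ? 1 : 0;
        }

        // Threads whose instructions are predicted to issue from the tail segment
        // are treated as exposing more out-of-order ILP, so they score higher.
        std::size_t priority(const std::vector<uint32_t>& threadIids) const {
            std::size_t tailIssues = 0;
            for (uint32_t iid : threadIids) tailIssues += predTable[iid];
            return tailIssues;
        }
    };

    int main() {
        IlpPredictor p(16);
        p.train(3, 12);                            // instruction 3 issued deep in the tail segment
        p.train(5, 2);                             // instruction 5 issued from the head segment
        std::cout << p.priority({3, 5}) << "\n";   // prints 1: one tail-segment issue
        return 0;
    }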
Abstract:
A hardened store-in cache system includes a store-in cache having lines of a first linesize stored with checkbits, wherein the checkbits include byte-parity bits, and an ancillary store-only cache (ASOC) that holds copies of the most recently stored-to lines of the store-in cache. The ASOC includes fewer lines than the store-in cache, each line of the ASOC having the first linesize stored with the checkbits.
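A minimal sketch of the structure, assuming 64-byte lines, a four-line ASOC with simple first-in eviction, and one parity bit per byte stored alongside the data; these parameters and all names below are illustrative, not taken from the abstract.

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <deque>
    #include <unordered_map>

    constexpr std::size_t kLineSize = 64;            // same linesize in the cache and the ASOC

    struct Line {
        std::array<uint8_t, kLineSize> data{};
        std::array<uint8_t, kLineSize> parity{};     // byte-parity checkbits, one per byte (widened here)
    };

    static uint8_t byteParity(uint8_t b) {           // even parity of a single byte
        b ^= b >> 4; b ^= b >> 2; b ^= b >> 1;
        return b & 1u;
    }

    class HardenedStoreInCache {
        std::unordered_map<uint64_t, Line> cache;    // store-in (write-back) lines with checkbits
        std::unordered_map<uint64_t, Line> asoc;     // ancillary store-only cache: fewer lines
        std::deque<uint64_t> asocOrder;              // most recently stored-to lines, oldest first
        static constexpr std::size_t kAsocLines = 4;
    public:
        void store(uint64_t lineAddr, std::size_t offset, uint8_t value) {
            Line& l = cache[lineAddr];
            l.data[offset]   = value;
            l.parity[offset] = byteParity(value);
            // Mirror the most recently stored-to line into the ASOC.
            if (asoc.find(lineAddr) == asoc.end()) {
                if (asocOrder.size() == kAsocLines) {    // make room: drop the oldest copy
                    asoc.erase(asocOrder.front());
                    asocOrder.pop_front();
                }
                asocOrder.push_back(lineAddr);
            }
            asoc[lineAddr] = l;
        }
    };

    int main() {
        HardenedStoreInCache c;
        c.store(0x1000, 0, 0xAB);   // line 0x1000 is now duplicated, with checkbits, in the ASOC
        return 0;
    }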
Abstract:
A method for implementing dynamic refresh protocols for DRAM based cache includes partitioning a DRAM cache into a refreshable portion and a non-refreshable portion, and assigning incoming individual cache lines to one of the refreshable portion and the non-refreshable portion of the cache based on a usage history of the cache lines. Cache lines corresponding to data having a usage history below a defined frequency are assigned to the refreshable portion of the cache, and cache lines corresponding to data having a usage history at or above the defined frequency are assigned to the non-refreshable portion of the cache.
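A minimal sketch of the placement decision, assuming the usage history is a simple per-line access counter and the defined frequency is a single threshold; the names DramCachePlacer, recordUse, and place are hypothetical.

    #include <cstdint>
    #include <iostream>
    #include <unordered_map>

    enum class Partition { Refreshable, NonRefreshable };

    class DramCachePlacer {
        std::unordered_map<uint64_t, uint32_t> useCount;  // per-line usage history (assumed counter)
        uint32_t threshold;                               // the "defined frequency"
    public:
        explicit DramCachePlacer(uint32_t t) : threshold(t) {}

        void recordUse(uint64_t lineAddr) { ++useCount[lineAddr]; }

        // Lines used at or above the threshold are assigned to the non-refreshable
        // portion; lines used less often go to the refreshable portion.
        Partition place(uint64_t lineAddr) const {
            auto it = useCount.find(lineAddr);
            uint32_t uses = (it == useCount.end()) ? 0u : it->second;
            return (uses >= threshold) ? Partition::NonRefreshable : Partition::Refreshable;
        }
    };

    int main() {
        DramCachePlacer placer(3);
        placer.recordUse(0x40); placer.recordUse(0x40); placer.recordUse(0x40);
        std::cout << (placer.place(0x40) == Partition::NonRefreshable) << "\n";  // prints 1
        std::cout << (placer.place(0x80) == Partition::Refreshable)    << "\n";  // prints 1
        return 0;
    }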
Abstract:
Disclosed is an apparatus to reduce broadcasts in multiprocessors, the apparatus including a plurality of processors; a plurality of memory caches associated with the processors; a plurality of translation lookaside buffers (TLBs) associated with the processors; and a physical memory shared by the processors, memory caches, and TLBs; wherein each TLB includes a plurality of entries for translation of a page of addresses from virtual memory to physical memory, each TLB entry having page characterization information indicating whether the page is private to one processor or shared with more than one processor. Also disclosed are a computer program product and a method to reduce broadcasts in multiprocessors.
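A minimal sketch of how the page characterization might be consulted, assuming one shared/private flag and a private-owner field per TLB entry; the invalidationTargets routine and all names are illustrative stand-ins for the broadcast-avoidance logic.

    #include <cstdint>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    struct TlbEntry {
        uint64_t physPage;
        int      ownerCpu;   // meaningful only while the page is private
        bool     shared;     // page characterization: private to one CPU or shared
    };

    class Tlb {
        std::unordered_map<uint64_t, TlbEntry> entries;   // keyed by virtual page number
    public:
        void insert(uint64_t vpage, TlbEntry e) { entries[vpage] = e; }

        // An invalidation of a private page is sent only to its owner; only shared
        // pages require the full broadcast to every processor.
        std::vector<int> invalidationTargets(uint64_t vpage, int numCpus) const {
            auto it = entries.find(vpage);
            if (it == entries.end()) return {};
            if (!it->second.shared) return { it->second.ownerCpu };
            std::vector<int> all;
            for (int c = 0; c < numCpus; ++c) all.push_back(c);
            return all;
        }
    };

    int main() {
        Tlb tlb;
        tlb.insert(0x10, {0x900, /*ownerCpu=*/2, /*shared=*/false});
        std::cout << tlb.invalidationTargets(0x10, 8).size() << "\n";  // prints 1, not 8
        return 0;
    }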
Abstract:
Iterative write pausing techniques are described for improving the read latency of memory systems, including memory systems with phase change memory (PCM) devices. A PCM device includes a plurality of memory locations and a mechanism for executing an iterative write to one or more of the memory locations in response to receiving a write command that includes data to be written. The executing includes initiating the iterative write, updating a state of the iterative write, pausing the iterative write including saving the state in response to receiving a pause command, and resuming the iterative write in response to receiving a resume command. The resuming is responsive to the saved state and to the data to be written.
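A minimal sketch of the pause/resume state machine, assuming a fixed number of program-and-verify iterations and a saved-state record holding the iteration count, address, and data to be written; a pause command is modeled as a limit on the iterations performed in one call, and all names are illustrative.

    #include <cstddef>
    #include <cstdint>
    #include <iostream>

    class PcmIterativeWriter {
        struct SavedState {
            std::size_t iterationsDone = 0;   // progress preserved across a pause
            uint64_t    addr = 0;
            uint8_t     data = 0;             // the data to be written, kept for resume
            bool        active = false;
        };
        SavedState state;
        static constexpr std::size_t kIterations = 8;   // program/verify passes (assumed)
    public:
        // Write command: initiate the iterative write.
        void initiate(uint64_t addr, uint8_t data) {
            state.iterationsDone = 0;
            state.addr = addr;
            state.data = data;
            state.active = true;
        }

        // Advance the write by up to `budget` iterations; stopping early models a
        // pause, with the state left saved. Returns true once the write completes.
        bool run(std::size_t budget) {
            while (state.active && budget-- > 0) {
                ++state.iterationsDone;                  // one more iteration; state is updated
                if (state.iterationsDone == kIterations) state.active = false;
            }
            return !state.active;
        }

        // Resume command: continue from the saved state with the original data.
        bool resume() { return run(kIterations); }
    };

    int main() {
        PcmIterativeWriter w;
        w.initiate(0x200, 0x5A);
        std::cout << w.run(3) << "\n";    // prints 0: paused after 3 of 8 iterations, reads may proceed
        std::cout << w.resume() << "\n";  // prints 1: resumed from saved state and completed
        return 0;
    }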
Abstract:
A two-level branch history table (TLBHT) is substantially improved by providing a mechanism to prefetch entries from the very large second-level branch history table (L2 BHT) into the active, very fast first-level branch history table (L1 BHT) before the processor uses them in the branch prediction process, and at the same time to prefetch cache misses into the instruction cache. The TLBHT is successful because it can prefetch branch entries into the L1 BHT sufficiently ahead of the time an entry is needed. The same timeliness allows it to prefetch instructions into the cache ahead of their use; in fact, the prefetches produced by the TLBHT can remove most of the cycle-time penalty incurred by cache misses.
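A minimal sketch of the prefetch path, with hash maps standing in for the two BHT levels and a plain queue for instruction-cache prefetch requests; the run-ahead logic that supplies futureBranchPc is not modeled, and all names are illustrative.

    #include <cstdint>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    struct BhtEntry { uint64_t target; bool taken; };

    class TwoLevelBht {
        std::unordered_map<uint64_t, BhtEntry> l1;   // small, fast L1 BHT
        std::unordered_map<uint64_t, BhtEntry> l2;   // very large L2 BHT
        std::vector<uint64_t> icachePrefetches;      // addresses queued for I-cache prefetch
    public:
        void installL2(uint64_t branchPc, BhtEntry e) { l2[branchPc] = e; }

        // Running ahead of the fetch stream: copy an upcoming branch's entry from
        // the L2 BHT into the L1 BHT before prediction needs it, and queue its
        // target so the potential instruction cache miss is covered early as well.
        void prefetchAhead(uint64_t futureBranchPc) {
            auto it = l2.find(futureBranchPc);
            if (it == l2.end()) return;
            l1[futureBranchPc] = it->second;
            if (it->second.taken) icachePrefetches.push_back(it->second.target);
        }

        const BhtEntry* predict(uint64_t branchPc) const {
            auto it = l1.find(branchPc);
            return it == l1.end() ? nullptr : &it->second;
        }

        const std::vector<uint64_t>& pendingIcachePrefetches() const { return icachePrefetches; }
    };

    int main() {
        TwoLevelBht bht;
        bht.installL2(0x400, {0x800, true});
        bht.prefetchAhead(0x400);                                   // done well before fetch reaches 0x400
        std::cout << (bht.predict(0x400) != nullptr) << "\n";       // prints 1: L1 BHT hit
        std::cout << bht.pendingIcachePrefetches().size() << "\n";  // prints 1: target queued for the I-cache
        return 0;
    }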
Abstract:
A hardware-based method for determining when to migrate cache lines to the cache bank closest to the requesting processor, to avoid the remote-access penalty for future requests. In a preferred embodiment, decay counters are enhanced and used to determine the cost of retaining a line as opposed to replacing it, without losing the data. In one embodiment, minimization of off-chip communication is sought; this may be particularly useful in a CMP environment.
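A minimal sketch of the migration decision, using a per-line count of consecutive remote hits as a crude stand-in for the enhanced decay counters and a fixed threshold for the retain-versus-migrate choice; both are assumptions for illustration, not the abstract's cost model.

    #include <cstdint>
    #include <iostream>
    #include <unordered_map>

    class MigrationController {
        struct LineState { int homeBank; uint32_t remoteHits; };
        std::unordered_map<uint64_t, LineState> lines;
        static constexpr uint32_t kMigrateThreshold = 4;   // remote hits before migrating (assumed)
    public:
        void install(uint64_t addr, int bank) { lines[addr] = { bank, 0 }; }

        // On each access, weigh the cost of retaining the line in its current bank
        // against moving it next to the requester; the data itself is never lost,
        // only relocated. Returns the bank that now holds the line.
        int access(uint64_t addr, int requesterBank) {
            LineState& s = lines[addr];
            if (requesterBank == s.homeBank) {
                s.remoteHits = 0;                    // local reuse: keep the line where it is
            } else if (++s.remoteHits >= kMigrateThreshold) {
                s.homeBank = requesterBank;          // migrate to avoid future remote-access penalty
                s.remoteHits = 0;
            }
            return s.homeBank;
        }
    };

    int main() {
        MigrationController mc;
        mc.install(0x1000, /*bank=*/0);
        for (int i = 0; i < 4; ++i) mc.access(0x1000, /*requesterBank=*/3);
        std::cout << mc.access(0x1000, 3) << "\n";   // prints 3: line migrated to the requester's bank
        return 0;
    }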
Abstract:
A system and method of providing a cache system having a store-in policy, affording the advantages of store-in cache operation while simultaneously providing protection against soft errors in locally modified data, which would normally preclude the use of a store-in cache when reliability is paramount. The improved store-in cache mechanism includes a store-in L1 cache; at least one higher-level storage hierarchy; an ancillary store-only cache (ASOC) that holds the most recently stored-to lines of the store-in L1 cache; and a cache controller that controls storing of data to the ASOC and recovering of data from the ASOC, such that the data from the ASOC is used only if parity errors are encountered in the store-in L1 cache.
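A minimal sketch of the recovery path, modeled at byte granularity for brevity, assuming a parity check on every L1 read and a fall-back lookup in the ASOC only when that check fails; the corrupt helper exists purely to demonstrate the recovery and is not part of the described mechanism.

    #include <cstdint>
    #include <iostream>
    #include <optional>
    #include <unordered_map>

    static uint8_t parityOf(uint8_t b) { b ^= b >> 4; b ^= b >> 2; b ^= b >> 1; return b & 1u; }

    struct L1Byte { uint8_t value; uint8_t parity; };

    class StoreInL1WithAsoc {
        std::unordered_map<uint64_t, L1Byte>  l1;    // store-in L1 contents plus parity
        std::unordered_map<uint64_t, uint8_t> asoc;  // copies of recently stored-to data
    public:
        // The cache controller mirrors every store into the ASOC.
        void store(uint64_t addr, uint8_t v) {
            l1[addr]   = { v, parityOf(v) };
            asoc[addr] = v;
        }

        // For demonstration only: model a soft error flipping one data bit in the L1.
        void corrupt(uint64_t addr) { l1[addr].value ^= 0x01; }

        // Normal loads come from the L1; the ASOC copy is used only when the
        // L1 parity check signals corruption.
        std::optional<uint8_t> load(uint64_t addr) const {
            auto it = l1.find(addr);
            if (it == l1.end()) return std::nullopt;
            if (parityOf(it->second.value) == it->second.parity) return it->second.value;
            auto rec = asoc.find(addr);
            return rec == asoc.end() ? std::nullopt : std::optional<uint8_t>(rec->second);
        }
    };

    int main() {
        StoreInL1WithAsoc c;
        c.store(0x20, 0x7E);
        c.corrupt(0x20);                                       // single-bit soft error in the L1
        std::cout << std::hex << int(*c.load(0x20)) << "\n";   // prints 7e, recovered from the ASOC
        return 0;
    }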
Abstract:
An integrated circuit (IC) including unit power control, a leakage reduction circuit for controllably reducing leakage power with reduced LdI/dt noise in the IC, and an activity prediction unit invoking active/dormant states in IC units. The prediction unit determines turn-on and turn-off times for each IC unit. The prediction unit controls a supply voltage select circuit, selectively passing a supply voltage to a separate supply line at the predicted turn-on time and selectively blocking the supply voltage at the predicted turn-off time.
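A minimal sketch of the prediction-driven gating, assuming each unit's prediction is a turn-on/turn-off cycle pair and that wake-ups are limited to one unit per cycle as a crude stand-in for dI/dt-aware staggering; the names and the policy are illustrative, not the circuit described above.

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct UnitPower {
        bool     supplyConnected = false;  // state of the per-unit supply voltage select
        uint64_t predictedOn  = 0;         // predicted turn-on cycle
        uint64_t predictedOff = 0;         // predicted turn-off cycle
    };

    class ActivityPredictionUnit {
        std::vector<UnitPower> units;
    public:
        explicit ActivityPredictionUnit(std::size_t n) : units(n) {}

        void setPrediction(std::size_t u, uint64_t onCycle, uint64_t offCycle) {
            units[u].predictedOn  = onCycle;
            units[u].predictedOff = offCycle;
        }

        // Each cycle, pass the supply to units entering their predicted active
        // window and block it for units leaving it. At most one unit is woken per
        // cycle so the current step (and hence the LdI/dt noise) stays small.
        void tick(uint64_t cycle) {
            bool wokeThisCycle = false;
            for (auto& u : units) {
                bool wantOn = (cycle >= u.predictedOn && cycle < u.predictedOff);
                if (wantOn && !u.supplyConnected && !wokeThisCycle) {
                    u.supplyConnected = true;      // selectively pass the supply voltage
                    wokeThisCycle = true;
                } else if (!wantOn && u.supplyConnected) {
                    u.supplyConnected = false;     // selectively block the supply voltage
                }
            }
        }

        bool isActive(std::size_t u) const { return units[u].supplyConnected; }
    };

    int main() {
        ActivityPredictionUnit apu(2);
        apu.setPrediction(0, /*on=*/10, /*off=*/20);
        apu.setPrediction(1, /*on=*/10, /*off=*/20);
        apu.tick(10);                                             // only unit 0 wakes this cycle
        std::cout << apu.isActive(0) << apu.isActive(1) << "\n";  // prints 10
        apu.tick(11);                                             // unit 1 wakes on the next cycle
        std::cout << apu.isActive(0) << apu.isActive(1) << "\n";  // prints 11
        return 0;
    }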