Abstract:
In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
Abstract:
In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
Abstract:
In one embodiment, a processor may be configured to write ECC granular stores into the data cache, while non-ECC granular stores may be merged with cache data in a memory request buffer. In one embodiment, a processor may be configured to detect that a victim block writeback hits one or more stores in a memory request buffer (or vice versa) and may convert the victim block writeback to a fill. In one embodiment, a processor may speculatively issue stores that are subsequent to a load from a load/store queue, but prevent the update for the stores in response to a snoop hit on the load.
Abstract:
In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Abstract:
In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.
Abstract:
In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Abstract:
In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.
Abstract:
If a consumer instruction specifies a 64 bit source register comprised of results provided by two 32 bit producer instructions, the number of dependencies that must be tracked per source register can be decreased by transforming one or more of the 32 bit producer instructions so that rather than simply storing its result in a 32 bit destination register, the transformed instruction stores its result into a 64 bit logical register along with another 32 bit value held in another 32 bit register.
Abstract:
In one embodiment, a processor comprises a core configured to execute a data cache block write instruction and an interface unit coupled to the core and to an interconnect on which the processor is configured to communicate. The core is configured to transmit a request to the interface unit in response to the data cache block write instruction. If the request is speculative, the interface unit is configured to issue a first transaction on the interconnect. On the other hand, if the request is non-speculative, the interface unit is configured to issue a second transaction on the interconnect. The second transaction is different from the first transaction. For example, the second transaction may be an invalidate transaction and the first transaction may be a probe transaction. In some embodiments, the processor may be in a system including the interconnect and one or more caching agents.
Abstract:
In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache.