摘要:
A data processing system includes multiple processing units all having access to a shared memory system. A processing unit includes a lower level cache configured to serve as a point of systemwide coherency and a processor core coupled to the lower level cache. The processor core includes an upper level cache, an execution unit that executes a store-conditional instruction to generate a store-conditional request that specifies a store target address and store data, and a flag that, when set, indicates the store-conditional request can be completed early in the processor core. The processor core also includes completion logic configured to commit an update of the shared memory system with the store data specified by the store-conditional request based on whether the flag is set.
摘要:
Managing lowest point of coherency (LPC) memory using a service layer adapter, the adapter coupled to a processor and an accelerator on a host computing system, the processor configured for symmetric multi-processing, including receiving, by the adapter, a memory access instruction from the accelerator; retrieving, by the adapter, a real address for the memory access instruction; determining, using base address registers on the adapter, that the real address targets the LPC memory, wherein the base address registers direct memory access requests between the LPC memory and other memory locations on the host computing system; and sending, by the adapter, the memory access instruction and the real address to a media controller for the LPC memory, wherein the media controller for the LPC memory is attached to the adapter via a memory interface.
摘要:
A technique for operating a cache memory of a data processing system includes creating respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory. Dependencies in a dependency data structure of a store queue of the cache memory are set based on the pollution vectors to reduce unnecessary ordering effects. Store operations are dispatched from the store queue in accordance with the dependencies indicated by the dependency data structure.
摘要:
In at least some embodiments, a cache memory of a data processing system receives a transactional memory access request including a target address and a priority of the requesting memory transaction. In response, transactional memory logic detects a conflict for the target address with a transaction footprint of an existing memory transaction and accesses a priority of the existing memory transaction. In response to detecting the conflict, the transactional memory logic resolves the conflict by causing the cache memory to fail the requesting or existing memory transaction based at least in part on their relative priorities. Resolving the conflict includes at least causing the cache memory to fail the existing memory transaction when the requesting memory transaction has a higher priority than the existing memory transaction, the transactional memory access request is a transactional load request, and the target address is within a store footprint of the existing memory transaction.
摘要:
In a data processing system, a selection is made, based at least on an access type of a memory access request, between at least a first timing and a second timing of data transmission with respect to completion of error detection processing on a target memory block of the memory access request. In response to receipt of the memory access request and selection of the first timing, data from the target memory block is transmitted to a requestor prior to completion of error detection processing on the target memory block. In response to receipt of the memory access request and selection of the second timing, data from the target memory block is transmitted to the requestor after and in response to completion of error detection processing on the target memory block.
摘要:
An integrated circuit includes a first communication interface for communicatively coupling the integrated circuit with a coherent data processing system, a second communication interface for communicatively coupling the integrated circuit with an accelerator unit including an effective address-based accelerator cache for buffering copies of data from a system memory, and a real address-based directory inclusive of contents of the accelerator cache. The real address-based directory assigns entries based on real addresses utilized to identify storage locations in the system memory. The integrated circuit further includes directory control logic that configures at least a number of congruence classes utilized in the real address-based directory based on configuration parameters specified on behalf of or by the accelerator unit.
摘要:
A cache coherent data processing system includes at least non-overlapping first, second, and third coherency domains. A master in the first coherency domain of the cache coherent data processing system selects a scope of an initial broadcast of an interconnect operation from among a set of scopes including (1) a remote scope including both the first coherency domain and the second coherency domain, but excluding the third coherency domain that is a peer of the first coherency domain, and (2) a local scope including only the first coherency domain. The master then performs an initial broadcast of the interconnect operation within the cache coherent data processing system utilizing the selected scope, where performing the initial broadcast includes the master initiating broadcast of the interconnect operation within the first coherency domain.
摘要:
In a data processing system, a switch includes a receive data structure including receive entries each uniquely corresponding to a receive window, where each receive entry includes addressing information for one or more mailboxes into which messages can be injected, a send data structure including send entries each uniquely corresponding to a send window, where each send entry includes a receive window field that identifies one or more receive windows, and switch logic. The switch logic, responsive to a request to push a message to one or more receiving threads, accesses a send entry that corresponds to a send window of the sending thread, utilizes contents of the receive window field of the send entry to access one or more of the receive entries, and pushes the message to one or more mailboxes of one or more receiving threads utilizing the addressing information of the receive entry or entries.
摘要:
A technique for operating a cache memory of a data processing system includes creating respective pollution vectors to track which of multiple concurrent threads executed by an associated processor core are currently polluted by a store operation resident in the cache memory. Dependencies in a dependency data structure of a store queue of the cache memory are set based on the pollution vectors to reduce unnecessary ordering effects. Store operations are dispatched from the store queue in accordance with the dependencies indicated by the dependency data structure.
摘要:
In some embodiments, in response to execution of a load-reserve instruction that binds to a load target address held in a store-through upper level cache, a processor core sets a core reservation flag, transmits a load-reserve operation to a store-in lower level cache, and tracks, during a core reservation tracking interval, the reservation requested by the load-reserve operation until the store-in lower level cache signals that the store-in lower level cache has assumed responsibility for tracking the reservation. In response to receipt during the core reservation tracking interval of an invalidation signal indicating presence of a conflicting snooped operation, the processor core cancels the reservation by resetting the core reservation flag and fails a subsequent store-conditional operation. Responsive to not canceling the reservation during the core reservation tracking interval, the processor core determines whether a store-conditional operation succeeds by reference to a pass/fail indication provided by the store-in lower level cache.