Abstract:
A processing device includes a core to execute instructions and memory management circuitry coupled to the memory, the core, and an I/O device that supports page faults. The memory management circuitry includes express invalidation circuitry and page translation permission circuitry. While the core is executing the instructions, the memory management circuitry receives a command to pause communication between the I/O device and the memory. In response to the command, the memory management circuitry modifies the permissions of page translations via the page translation permission circuitry and transmits an invalidation request, via the express invalidation circuitry, to the I/O device, causing the page translations cached in the I/O device to be invalidated.
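Below is a minimal C sketch of the pause flow this abstract describes, assuming a software model of the hardware; every name (page_translation, send_invalidation, pause_io_communication) is an illustrative assumption, not terminology from the patent.

#include <stddef.h>

enum perm { PERM_RW, PERM_NONE };

struct page_translation {
    unsigned long va;    /* virtual address */
    unsigned long pa;    /* physical address */
    enum perm     perm;  /* current access permission */
};

/* Stand-in for the express invalidation path: in hardware this posts an
 * invalidation request so the I/O device drops its cached translation. */
static void send_invalidation(const struct page_translation *pt)
{
    (void)pt;
}

/* On a pause command: revoke permissions first, then invalidate the
 * device's cached copies so it must re-fault through the fault-capable
 * translation machinery instead of using stale entries. */
void pause_io_communication(struct page_translation *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        table[i].perm = PERM_NONE;     /* page translation permission step */
        send_invalidation(&table[i]);  /* express invalidation step */
    }
}

Revoking permissions before invalidating ensures the device cannot re-fetch a still-valid translation in the window between the two steps.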
Abstract:
A distributed processing system includes a cache coherency mechanism that essentially encodes network routing information into sectored presence bits. The mechanism organizes the sectored presence bits as one or more arbitration masks that system switches decode and use directly to route invalidate messages through one or more higher levels of the system. The lower level or levels of the system use local routing mechanisms, such as local directories, to direct the invalidate messages to the individual processors that are holding the data of interest.
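As a rough illustration, the following C sketch shows how a switch could decode such an arbitration mask directly into output ports; the one-port-per-sector mapping and all names are assumptions, not the patent's design.

#include <stdint.h>

#define NUM_SECTORS 16  /* assumed sector count for a 16-bit mask */

/* A switch forwards the invalidate on every port whose sector bit is set
 * in the arbitration mask; lower levels then use local directories to
 * reach the individual processors holding the data. */
void route_invalidate(uint16_t arbitration_mask,
                      void (*forward_to_port)(int port))
{
    for (int port = 0; port < NUM_SECTORS; port++) {
        if (arbitration_mask & (1u << port))
            forward_to_port(port);
    }
}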
Abstract:
A memory system has a plurality of interleaved memory ranks that use SDRAMs requiring a periodic refresh, and an arbiter which controls access to the memory ranks and restricts access to a memory rank being refreshed. The memory ranks are interleaved on a memory module. Counting refresh registers on each memory module are associated with the module's memory ranks. The arbiter has its own counting refresh register. At regular intervals, the arbiter broadcasts a refresh signal along with a refresh address to the modules via a transaction bus. The refresh address provided by the arbiter is latched by the refresh registers, which then begin counting at a pre-programmed interval. A refresh of a particular memory rank is triggered when the refresh register associated with that rank matches the unique identifier assigned to the rank. The arbiter uses its own refresh register to identify the memory rank being refreshed, allowing it to restrict access to that rank. As a result, the memory ranks are refreshed sequentially without ongoing control by the arbiter.
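A hedged C model of the refresh-register behavior, under the assumption that each register latches the broadcast refresh address and then counts until it matches the rank's identifier; all structure and function names are illustrative.

#include <stdbool.h>
#include <stdint.h>

struct refresh_reg {
    uint8_t count;    /* latched from the broadcast refresh address */
    uint8_t rank_id;  /* unique identifier assigned to this rank */
};

/* Broadcast phase: every module latches the same refresh address. */
void latch_refresh_address(struct refresh_reg *r, uint8_t refresh_addr)
{
    r->count = refresh_addr;
}

/* Called once per pre-programmed interval. Returns true when this rank
 * must refresh; because every register counts from the same latched
 * value, the ranks end up refreshing one after another. The arbiter
 * runs the same comparison to know which rank to block. */
bool refresh_tick(struct refresh_reg *r)
{
    bool match = (r->count == r->rank_id);
    r->count++;
    return match;
}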
Abstract:
Various systems and methods for controlling memory traffic flow rate are described herein. A system for computer memory management comprises rate control circuitry configured to: receive a rate exceeded signal from monitoring circuitry, the rate exceeded signal indicating that memory traffic flow from a traffic source exceeds a threshold; receive a distress signal from a memory controller that interfaces with a memory device, the distress signal indicating that the memory device is oversubscribed; and implement throttle circuitry to throttle the memory traffic flow from the traffic source when the rate exceeded signal and the distress signal are both asserted.
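The gating condition lends itself to a compact sketch. The C fragment below assumes a software model where the signals are booleans and the flow measurement is a counter; all names are invented for illustration.

#include <stdbool.h>
#include <stdint.h>

struct rate_control {
    uint64_t threshold;  /* programmed traffic-flow threshold */
    bool     distress;   /* latched distress signal from the controller */
};

/* Monitoring side: the rate-exceeded signal asserts when the measured
 * flow from a traffic source crosses the threshold. */
static bool rate_exceeded(const struct rate_control *rc, uint64_t flow)
{
    return flow > rc->threshold;
}

/* Throttling engages only when BOTH signals are asserted, so a fast
 * source is left alone while the memory device still has headroom. */
bool should_throttle(const struct rate_control *rc, uint64_t flow)
{
    return rate_exceeded(rc, flow) && rc->distress;
}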
Abstract:
Technologies for a distributed hardware queue manager include a compute device having a processor. The processor includes two or more hardware queue managers as well as two or more processor cores. Each processor core can enqueue or dequeue data from the hardware queue manager. Each hardware queue manager can be configured to contain several queue data structures. In some embodiments, the queues are addressed by the processor cores using virtual queue addresses, which are translated into physical queue addresses for accessing the corresponding hardware queue manager. The virtual queues can be moved from one physical queue in one hardware queue manager to a different physical queue in a different hardware queue manager without changing the virtual address of the virtual queue.
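A minimal C sketch of the virtual-to-physical queue translation the abstract implies, assuming a flat lookup table; the entry layout and the names (vq_entry, translate, migrate) are assumptions for illustration.

#include <stddef.h>
#include <stdint.h>

struct vq_entry {
    uint32_t vqa;       /* virtual queue address used by the cores */
    uint16_t hqm_id;    /* which hardware queue manager */
    uint16_t pq_index;  /* physical queue inside that manager */
};

/* Resolve a virtual queue address to its current physical location. */
struct vq_entry *translate(struct vq_entry *table, size_t n, uint32_t vqa)
{
    for (size_t i = 0; i < n; i++)
        if (table[i].vqa == vqa)
            return &table[i];
    return NULL;  /* unmapped virtual queue */
}

/* Migration rewrites only the mapping; callers keep using the same vqa. */
void migrate(struct vq_entry *e, uint16_t new_hqm, uint16_t new_pq)
{
    e->hqm_id   = new_hqm;
    e->pq_index = new_pq;
}

The key property is that migrate rewrites only the mapping, so cores continue to enqueue and dequeue with an unchanged virtual queue address.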
Abstract:
A method and apparatus for preventing system-wide data-dependent stalls is provided. Requests that reach the top of a probe queue and target data not contained in an attached cache memory are stalled until the data is filled into the appropriate location in the cache. Only the associated central processing unit's probe queue is stalled, not the entire system. Accordingly, the present invention allows a system to chain together two or more concurrent operations for the same data block without adversely affecting system performance.
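A small C sketch of the stall decision, assuming the check applies only to the probe at the head of a single CPU's queue; the names and the stubbed cache lookup are illustrative.

#include <stdbool.h>

struct probe {
    unsigned long addr;  /* target address of the probe */
};

/* Stand-in for the attached cache lookup; real hardware would index the
 * cache tags for this block. */
static bool cache_contains(unsigned long addr)
{
    (void)addr;
    return false;  /* pessimistic stub: block not yet filled */
}

/* Only the head of this CPU's probe queue is examined. Returning false
 * stalls just this queue until the fill completes; other CPUs' probe
 * queues keep draining, so the stall never becomes system-wide. */
bool probe_queue_can_advance(const struct probe *head)
{
    return cache_contains(head->addr);
}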
Abstract:
A multiprocessor computer system releases a victim data buffer storing victim data when system control logic determines that a count of the number of probe messages pending at a specified time equals the number of such probe messages that have had an address comparison performed after the specified time. The specified time occurs when a command to write the victim data element to main memory passes a serialization point of the computer system. The address comparison compares the target address of a probe message with the addresses of data stored in the victim data buffer and in the associated cache of a CPU of the computer system.
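The release rule can be modeled in C as a snapshot-and-count, under the assumption that hardware keeps two counters per victim buffer; all structure and function names are invented for illustration.

#include <stdbool.h>
#include <stdint.h>

struct victim_buffer {
    uint32_t pending_at_snapshot;  /* probes pending at the specified time */
    uint32_t compared_since;       /* comparisons done after that time */
    bool     armed;                /* snapshot has been taken */
};

/* The specified time: the write of the victim data passes the system's
 * serialization point, and the pending-probe count is captured. */
void on_pass_serialization_point(struct victim_buffer *vb, uint32_t pending)
{
    vb->pending_at_snapshot = pending;
    vb->compared_since = 0;
    vb->armed = true;
}

/* Called after each probe's address comparison against the victim buffer
 * and the CPU's cache. Returns true once as many comparisons have been
 * performed as there were probes pending, i.e. the buffer can be freed. */
bool on_probe_compared(struct victim_buffer *vb)
{
    if (!vb->armed)
        return false;
    vb->compared_since++;
    return vb->compared_since >= vb->pending_at_snapshot;
}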
Abstract:
In accordance with the present invention, a method and apparatus are provided for maintaining the coherency of victim data from the time the data is stored in a victim data buffer until the time the data is written into main memory. Alternatively, the coherency of the victim data is preserved until a determination is made that pending probe messages do not target the victim data; at that time the victim data buffer can be deallocated. With both arrangements, a central processing unit can release a victim data buffer at a point in time other than when the data stored therein is read from the buffer. Thus, the central processing unit can perform the release or deallocation of the buffer when it is most efficient and when no further access to the data is required.
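A brief C sketch of the two deallocation conditions the abstract names, assuming each victim buffer entry tracks a write-complete flag and a count of pending probes targeting it; the names are assumptions.

#include <stdbool.h>

struct victim_entry {
    unsigned long addr;      /* address of the victim data block */
    bool written_to_memory;  /* victim has reached main memory */
    int  probes_targeting;   /* pending probes that target this block */
};

/* Either condition in the abstract suffices: the data is safely in main
 * memory, or no pending probe can still need it from the buffer. */
bool can_deallocate(const struct victim_entry *v)
{
    return v->written_to_memory || v->probes_targeting == 0;
}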
Abstract:
A method and apparatus for preventing system-wide data-dependent stalls is provided. Requests that reach the top of a probe queue and target data not contained in an attached cache memory subsystem are stalled until the data is filled into the appropriate location in cache memory. Only the associated central processing unit's probe queue is stalled, not the entire system. Accordingly, the present invention allows a system to chain together two or more concurrent operations for the same data block without adversely affecting system performance.