Abstract:
A system for implementing load balancing schemes includes one or more processing units, a memory, and a communication fabric with a plurality of switches coupled to the processing unit(s) and the memory. A switch of the fabric determines a first number of streams on a first input port that are targeting a first output port. The switch also determines a second number of requestors, from all input ports, that are targeting the first output port. The switch then calculates a throttle factor for the first input port by dividing the first number of streams by the second number of requestors, and applies the throttle factor to regulate bandwidth on the first input port for requestors targeting the first output port. The switch calculates and applies throttle factors for the other input ports in the same manner.
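The core arithmetic lends itself to a short illustration. Below is a minimal C sketch of the per-port throttle computation, assuming a four-port switch and treating each stream as a single requestor for simplicity; the port counts, names, and the no-contention fallback are illustrative assumptions, not details from the abstract.

#include <stdio.h>

#define NUM_INPUT_PORTS 4

/* streams_on_port: streams on one input port targeting the first output port.
 * total_requestors: requestors from all input ports targeting that output. */
static double throttle_factor(int streams_on_port, int total_requestors)
{
    if (total_requestors == 0)
        return 1.0;   /* no contention for the output port: no throttling */
    return (double)streams_on_port / (double)total_requestors;
}

int main(void)
{
    int streams[NUM_INPUT_PORTS] = { 2, 1, 4, 1 };
    int total_requestors = 0;

    for (int i = 0; i < NUM_INPUT_PORTS; i++)
        total_requestors += streams[i];   /* assumed: one requestor per stream */

    /* Each input port's share of the output port's bandwidth is scaled by
     * its throttle factor. */
    for (int i = 0; i < NUM_INPUT_PORTS; i++)
        printf("input port %d throttle factor: %.2f\n",
               i, throttle_factor(streams[i], total_requestors));
    return 0;
}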
Abstract:
A system and method for network traffic management between multiple nodes are described. A computing system includes multiple nodes connected to one another. When a home node determines that the number of nodes requesting read access to a given data block assigned to the home node exceeds a threshold, and that a copy of the given data block is already stored at a first node of the multiple nodes in the system, the home node sends a command to the first node. The command directs the first node to forward a copy of the given data block to the home node. The home node then maintains a copy of the given data block and forwards copies of the given data block to other requesting nodes until the home node detects a write request or a lock release request for the given data block.
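The home node's decision can be sketched as a small state check. The following C fragment is a loose illustration under assumed names and an assumed threshold value; the actual command formats and coherence details are not specified in the abstract.

#include <stdbool.h>
#include <stdio.h>

#define READER_THRESHOLD 4   /* assumed value; the abstract only says "a threshold" */

struct block_state {
    int  read_requestors;    /* nodes currently requesting read access         */
    bool cached_remotely;    /* a copy already exists at a first node          */
    bool home_has_copy;      /* home node holds a forwarded copy to serve from */
};

/* True when the home node should command the first node to forward its
 * copy, so the home node can service later readers itself. */
static bool should_pull_copy(const struct block_state *s)
{
    return s->read_requestors > READER_THRESHOLD &&
           s->cached_remotely && !s->home_has_copy;
}

/* A write request or lock release request ends the forwarding phase. */
static void on_write_or_lock_release(struct block_state *s)
{
    s->home_has_copy = false;
}

int main(void)
{
    struct block_state s = { .read_requestors = 6, .cached_remotely = true };

    if (should_pull_copy(&s)) {
        s.home_has_copy = true;   /* first node forwarded its copy */
        printf("home node now forwards copies to requesting nodes\n");
    }

    on_write_or_lock_release(&s);
    printf("forwarding active: %s\n", s.home_has_copy ? "yes" : "no");
    return 0;
}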
Abstract:
A method and device are described for encoding erroneous data in an error correction code (ECC) protected memory. In one embodiment, incoming data including a plurality of data symbols and a data integrity marker is received. At least one extra symbol is used to mark the incoming data as error-free data or erroneous data (i.e., poison) based on the data integrity marker. ECC may be created to protect the data symbols. The ECC may include a plurality of check symbols, a plurality of unused symbols, and the at least one extra symbol. In another embodiment, an error marker may be propagated from a single ECC word to all ECC words of a data block (e.g., a cache line, a page, and the like) to prevent errors due to corruption of the error marker caused by faulty memory in the erroneous ECC word.
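As a rough sketch, the extra-symbol marking might look like the C below; the symbol widths, the clean/poison encodings, and the word layout are assumptions for illustration, and the check-symbol generation is omitted.

#include <stdint.h>
#include <stdio.h>

#define DATA_SYMBOLS  8
#define CHECK_SYMBOLS 2

#define SYMBOL_CLEAN  0x00   /* assumed encoding for error-free data */
#define SYMBOL_POISON 0xFF   /* assumed encoding for erroneous data  */

struct ecc_word {
    uint8_t data[DATA_SYMBOLS];
    uint8_t check[CHECK_SYMBOLS];   /* ECC check symbols (not computed here) */
    uint8_t extra;                  /* marks the word as clean or poisoned   */
};

/* Encode incoming data, marking it as erroneous (poison) or error-free
 * according to the data integrity marker that accompanied it. */
static void encode_word(struct ecc_word *w, const uint8_t *incoming, int poisoned)
{
    for (int i = 0; i < DATA_SYMBOLS; i++)
        w->data[i] = incoming[i];
    w->extra = poisoned ? SYMBOL_POISON : SYMBOL_CLEAN;
    /* Real hardware would also generate the check symbols here so that the
     * data symbols and the extra symbol are ECC-protected together. */
}

int main(void)
{
    uint8_t payload[DATA_SYMBOLS] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    struct ecc_word w;

    encode_word(&w, payload, /*poisoned=*/1);
    printf("extra symbol: 0x%02X\n", w.extra);
    return 0;
}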
Abstract:
A processor includes a set of processing modules, each of the processing modules including a cache and a coherency manager that keeps track of the memory addresses of data stored at the caches of other processing modules. In response to its local cache requesting access to a particular memory address, or to another triggering event, the coherency manager generates a coherency probe. In the event that the generated coherency probe is targeted to multiple processing modules, the coherency manager includes a set of multicast bits indicating the processing modules whose caches include copies of the data targeted by the multicast probe. A transport switch that connects the processing module to the fabric communicates the coherency probe only to the subset of processing modules indicated by the multicast bits.
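A small C sketch of multicast-bit delivery is given below; the module count, bit width, and probe fields are assumptions, and the transport-switch logic is reduced to a loop over the bit mask.

#include <stdint.h>
#include <stdio.h>

#define NUM_MODULES 8

struct coherency_probe {
    uint64_t address;
    uint8_t  multicast_bits;   /* bit i set => module i caches the targeted data */
};

/* Transport switch: forward the probe only to the modules indicated by the
 * multicast bits instead of broadcasting it to every processing module. */
static void deliver_probe(const struct coherency_probe *p)
{
    for (int m = 0; m < NUM_MODULES; m++)
        if (p->multicast_bits & (1u << m))
            printf("probe 0x%llx -> module %d\n",
                   (unsigned long long)p->address, m);
}

int main(void)
{
    struct coherency_probe p = { .address = 0x1000, .multicast_bits = 0x0A };
    deliver_probe(&p);   /* reaches modules 1 and 3 only */
    return 0;
}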
Abstract:
A system and method for mapping an address space to a non-power-of-two number of memory channels are described. Addresses are translated and interleaved to the memory channels such that each memory channel has an equal amount of mapped address space. The address space is partitioned into two regions, and a first translation function is used for memory requests targeting the first region while a second translation function is used for memory requests targeting the second region. The first translation function is based on a first set of address bits and the second translation function is based on a second set of address bits.
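The two-region translation can be illustrated with a toy mapping for three channels. In the C sketch below, the region boundary, the particular address-bit fields, and the modulo reduction are assumptions chosen for readability; the abstract only states that the two regions use different sets of address bits and that the resulting mapping gives each channel an equal amount of address space.

#include <stdint.h>
#include <stdio.h>

#define NUM_CHANNELS    3
#define REGION_BOUNDARY (3ULL << 30)   /* assumed split point between the regions */

/* First translation function: interleave on a low-order block field. */
static unsigned translate_region0(uint64_t addr)
{
    return (unsigned)((addr >> 6) % NUM_CHANNELS);
}

/* Second translation function: interleave on a higher-order field. */
static unsigned translate_region1(uint64_t addr)
{
    return (unsigned)((addr >> 12) % NUM_CHANNELS);
}

static unsigned map_channel(uint64_t addr)
{
    return addr < REGION_BOUNDARY ? translate_region0(addr)
                                  : translate_region1(addr);
}

int main(void)
{
    uint64_t samples[] = { 0x0000, 0x0040, 0x0080, REGION_BOUNDARY + 0x2000 };

    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("addr 0x%llx -> channel %u\n",
               (unsigned long long)samples[i], map_channel(samples[i]));
    return 0;
}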
Abstract:
Systems, methods, and techniques are provided for a fabric addressable memory. A memory access request is received from a host computing device attached via one edge port of one or more interconnect switches, the memory access request directed to a destination segment of a physical fabric memory block that is allocated in local physical memory of the host computing device. The edge port accesses a stored mapping between segments of the physical fabric memory block and one or more destination port identifiers that are each associated with a respective edge port of the fabric addressable memory. The memory access request is routed by the one edge port to a destination edge port based on the stored mapping.
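A minimal C sketch of the edge-port lookup follows; the segment size, table contents, and routing function are illustrative assumptions standing in for the stored mapping described above.

#include <stdint.h>
#include <stdio.h>

#define SEGMENT_SIZE (1u << 20)   /* assumed 1 MiB segments */
#define NUM_SEGMENTS 4

/* Stored mapping at the edge port: segment index -> destination port identifier. */
static const unsigned segment_to_port[NUM_SEGMENTS] = { 2, 2, 5, 7 };

/* Derive the destination segment of the physical fabric memory block from
 * the request address, then look up the edge port that owns that segment. */
static unsigned route_request(uint64_t fabric_addr)
{
    unsigned segment = (unsigned)(fabric_addr / SEGMENT_SIZE) % NUM_SEGMENTS;
    return segment_to_port[segment];
}

int main(void)
{
    uint64_t addr = 0x250000;   /* falls in segment 2 under these assumptions */
    printf("request 0x%llx -> destination port %u\n",
           (unsigned long long)addr, route_request(addr));
    return 0;
}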
Abstract:
A technique for operating a cache is disclosed. The technique includes, based on a workload change, identifying a first allocation permissions policy; operating the cache according to the first allocation permissions policy; based on set sampling, identifying a second allocation permissions policy; and operating the cache according to the second allocation permissions policy.
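One way to picture the selection logic is the C sketch below, which assumes two candidate policies and a set-sampling comparison of miss counts; the policy names, the sampled-set bookkeeping, and the comparison rule are assumptions, not details from the abstract.

#include <stdio.h>

enum alloc_policy { ALLOW_ALL_ALLOC, NO_ALLOC };

/* Sampled "leader" sets each run one candidate policy; the rest of the
 * cache follows whichever candidate currently shows fewer misses. */
struct set_sampler {
    unsigned misses_allow_all;
    unsigned misses_no_alloc;
};

/* First allocation permissions policy, identified on a workload change. */
static enum alloc_policy policy_on_workload_change(void)
{
    return ALLOW_ALL_ALLOC;
}

/* Second allocation permissions policy, identified from set sampling. */
static enum alloc_policy policy_from_sampling(const struct set_sampler *s)
{
    return (s->misses_no_alloc < s->misses_allow_all) ? NO_ALLOC : ALLOW_ALL_ALLOC;
}

int main(void)
{
    struct set_sampler s = { .misses_allow_all = 120, .misses_no_alloc = 80 };

    enum alloc_policy p = policy_on_workload_change();
    printf("policy after workload change: %d\n", p);

    p = policy_from_sampling(&s);
    printf("policy after set sampling:    %d\n", p);
    return 0;
}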
Abstract:
The disclosed computer-implemented method includes partitioning a cache structure into a plurality of cache partitions designated by a plurality of cache types, forwarding a memory request to a cache partition corresponding to a target cache type of the memory request, and performing, using the cache partition, the memory request. Various other methods, systems, and computer-readable media are also disclosed.
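A short C sketch of the partition routing follows; the cache types, partition sizes, and lookup details are illustrative assumptions.

#include <stdio.h>

enum cache_type { TYPE_INSTRUCTION, TYPE_DATA, TYPE_TRANSLATION, NUM_TYPES };

struct cache_partition {
    enum cache_type type;
    unsigned        num_sets;   /* portion carved out of the shared structure */
};

/* One partition per designated cache type. */
static struct cache_partition partitions[NUM_TYPES] = {
    { TYPE_INSTRUCTION, 256 },
    { TYPE_DATA,        512 },
    { TYPE_TRANSLATION, 128 },
};

/* Forward the memory request to the partition matching its target cache
 * type; the actual lookup/fill inside the partition is omitted. */
static struct cache_partition *route_to_partition(enum cache_type target)
{
    return &partitions[target];
}

int main(void)
{
    struct cache_partition *p = route_to_partition(TYPE_DATA);
    printf("request handled by partition with %u sets\n", p->num_sets);
    return 0;
}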
Abstract:
A coherent memory fabric includes a plurality of coherent master controllers and a coherent slave controller. The plurality of coherent master controllers each include a response data buffer. The coherent slave controller is coupled to the plurality of coherent master controllers. The coherent slave controller, responsive to determining that a selected coherent block read command is guaranteed to have only one data response, sends a target request globally ordered message to the selected coherent master controller and transmits responsive data. The selected coherent master controller, responsive to receiving the target request globally ordered message, blocks any coherent probes to an address associated with the selected coherent block read command until receipt of the responsive data is acknowledged by a requesting client.
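The master-side blocking rule can be sketched in C as below; the structure fields, message naming, and the single-pending-read simplification are assumptions.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct pending_read {
    uint64_t address;            /* address of the selected block read command   */
    bool     globally_ordered;   /* target request globally ordered message seen */
    bool     data_acked;         /* requesting client acknowledged the data      */
};

/* A coherent probe to the same address must wait while the read has been
 * globally ordered but its responsive data is not yet acknowledged. */
static bool probe_must_block(const struct pending_read *r, uint64_t probe_addr)
{
    return r->address == probe_addr && r->globally_ordered && !r->data_acked;
}

int main(void)
{
    struct pending_read r = { .address = 0x4000, .globally_ordered = true };

    printf("probe to 0x4000 blocked: %s\n",
           probe_must_block(&r, 0x4000) ? "yes" : "no");

    r.data_acked = true;   /* client acknowledges receipt of the responsive data */
    printf("probe to 0x4000 blocked: %s\n",
           probe_must_block(&r, 0x4000) ? "yes" : "no");
    return 0;
}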
Abstract:
A method includes, in response to each write request of a plurality of write requests received at a memory-side cache device coupled with a memory device, writing payload data specified by the write request to the memory-side cache device, and when a first bandwidth availability condition is satisfied, performing a cache write-through by writing the payload data to the memory device, and recording an indication that the payload data written to the memory-side cache device matches the payload data written to the memory device.
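The write path can be illustrated with the C sketch below; the bandwidth check, the line structure, and the matches_memory flag name are assumptions standing in for the recorded indication described above.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct cache_line {
    uint64_t addr;
    uint64_t payload;
    bool     matches_memory;   /* set when the write-through was performed */
};

/* Assumed stand-in for the first bandwidth availability condition. */
static bool memory_bandwidth_available(void)
{
    return true;
}

static void handle_write(struct cache_line *line, uint64_t addr, uint64_t data)
{
    /* Always write the payload into the memory-side cache. */
    line->addr = addr;
    line->payload = data;
    line->matches_memory = false;

    if (memory_bandwidth_available()) {
        /* Write-through: also send the payload to the memory device and
         * record that the cached copy matches memory, so it can later be
         * dropped without a writeback. */
        /* memory_write(addr, data);  -- device access omitted */
        line->matches_memory = true;
    }
}

int main(void)
{
    struct cache_line line;

    handle_write(&line, 0x8000, 0xDEADBEEF);
    printf("cached payload matches memory: %s\n",
           line.matches_memory ? "yes" : "no");
    return 0;
}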