Abstract:
A content-aware application processing system is provided that allows directed access to data stored in a non-cache memory, bypassing cache-coherent memory. The processor includes a system interface to the cache-coherent memory and a low-latency memory interface to a non-cache-coherent memory. The system interface directs memory accesses for ordinary load/store instructions executed by the processor to the cache-coherent memory. The low-latency memory interface directs memory accesses for non-ordinary load/store instructions executed by the processor to the non-cache memory, bypassing the cache-coherent memory. A non-ordinary load/store instruction can be a coprocessor instruction. The non-cache memory can be a low-latency memory. The processor can include a plurality of processor cores.
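A minimal C model of the two access paths may help fix ideas. The function names and the insn_kind_t type are illustrative assumptions, not interfaces from the abstract; this is a sketch of the steering decision, not an implementation.

#include <stdint.h>

typedef enum { ORDINARY_LOAD, COPROC_LOAD } insn_kind_t;

/* Stub for the system interface: a read that traverses the coherent caches. */
static uint64_t cache_coherent_read(uint64_t addr) { (void)addr; return 0; }

/* Stub for the low-latency interface: a direct read that bypasses the caches. */
static uint64_t low_latency_read(uint64_t addr) { (void)addr; return 0; }

/* Ordinary loads are steered to cache-coherent memory; non-ordinary
 * (e.g., coprocessor) loads go straight to the non-cache memory. */
uint64_t execute_load(insn_kind_t kind, uint64_t addr)
{
    return (kind == ORDINARY_LOAD) ? cache_coherent_read(addr)
                                   : low_latency_read(addr);
}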
Abstract:
Methods and apparatus are provided for selectively replicating a data structure in a low-latency memory. The memory includes multiple individual memory banks configured to store replicated copies of the same data structure. Upon receiving a request to access the stored data structure, a low-latency memory access controller selects one of the memory banks, then accesses the stored data from the selected memory bank. Selection of a memory bank can be accomplished using a thermometer technique comparing the relative availability of the different memory banks. Exemplary data structures that benefit from the resulting efficiencies include deterministic finite automata (DFA) graphs and other data structures that are loaded (i.e., read) more often than they are stored (i.e., written).
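As a sketch of the selection step, assuming an outstanding-request count per bank as the availability measure behind the "thermometer" comparison (the metric and names are assumptions), a controller might pick the least-occupied replica:

#include <stddef.h>

#define NUM_BANKS 8   /* number of banks holding replicas (assumed) */

typedef struct {
    unsigned in_flight;   /* outstanding requests queued at this bank */
} bank_t;

/* Return the index of the bank with the most availability, i.e. the
 * fewest in-flight requests; the read is then issued to that replica. */
size_t select_bank(const bank_t banks[NUM_BANKS])
{
    size_t best = 0;
    for (size_t i = 1; i < NUM_BANKS; i++)
        if (banks[i].in_flight < banks[best].in_flight)
            best = i;
    return best;
}

Because the structure is read far more often than it is written, the cost of keeping NUM_BANKS copies in sync on writes is repaid by spreading the read load.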
Abstract:
A processor is provided for traversing deterministic finite automata (DFA) graphs with incoming packet data in real time. The processor includes at least one processor core and a DFA module that operates asynchronously to the at least one processor core, traversing at least one DFA graph stored in a non-cache memory using packet data stored in a cache-coherent memory.
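The core of such a traversal is one node lookup per input byte. A minimal sketch, assuming a dense 256-way next-node table per node (the actual node encoding is not specified in the abstract):

#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t next[256];   /* next-node index for each possible input byte */
    uint8_t  is_match;    /* nonzero if reaching this node signals a match */
} dfa_node_t;

/* Walk the graph over one packet; returns the final node so traversal
 * can resume across packet boundaries. */
uint32_t dfa_walk(const dfa_node_t *graph, uint32_t node,
                  const uint8_t *pkt, size_t len,
                  void (*report)(size_t offset))
{
    for (size_t i = 0; i < len; i++) {
        node = graph[node].next[pkt[i]];
        if (graph[node].is_match && report)
            report(i);    /* a pattern ends at byte offset i */
    }
    return node;
}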
Abstract:
A computer-readable instruction is described for traversing deterministic finite automata (DFA) graphs to perform a pattern search in incoming packet data in real time. The instruction includes one or more pre-defined fields. One of the fields includes a DFA graph identifier for identifying one of several previously stored DFA graphs. Another one of the fields includes an input reference for identifying input data to be processed using the identified DFA graph. Yet another one of the fields includes an output reference for storing results generated responsive to the processed input data. The instruction is forwarded to a DFA engine adapted to process the input data using the identified DFA graph and to provide results as directed by the output reference.
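A plausible C rendering of such an instruction's fields follows. The names, field widths, and the submission call are assumptions for illustration, not the patent's encoding.

#include <stdint.h>

typedef struct {
    uint32_t graph_id;      /* identifies a previously stored DFA graph */
    uint64_t input_addr;    /* input reference: where the packet data lives */
    uint32_t input_len;     /* bytes of input data to process */
    uint64_t result_addr;   /* output reference: where results are written */
} dfa_instruction_t;

/* Software fills in the fields and enqueues the instruction; the DFA
 * engine consumes it asynchronously and writes results to result_addr. */
static void dfa_submit(const dfa_instruction_t *insn) { (void)insn; }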
Abstract:
In one embodiment, a system comprises a memory and a memory controller that provides both a cache access path and a bypass-cache access path to the memory, receives requests to read graph data from the memory on the bypass-cache access path, and receives requests to read non-graph data from the memory on the cache access path. A method comprises receiving a request at a memory controller to read graph data from a memory on a bypass-cache access path, receiving a request at the memory controller to read non-graph data from the memory on a cache access path, and arbitrating, in the memory controller, among the requests.
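The abstract says only that the controller arbitrates among requests from the two paths; a simple round-robin policy is one possibility, sketched here (the policy itself is an assumption):

#include <stdbool.h>

typedef enum { PATH_BYPASS_CACHE, PATH_CACHE } path_t;

/* Decide which path's request is granted this cycle; alternate when
 * both paths have work so neither starves. */
path_t arbitrate(bool bypass_pending, bool cache_pending)
{
    static path_t last = PATH_CACHE;

    if (bypass_pending && cache_pending)
        last = (last == PATH_BYPASS_CACHE) ? PATH_CACHE : PATH_BYPASS_CACHE;
    else if (bypass_pending)
        last = PATH_BYPASS_CACHE;
    else if (cache_pending)
        last = PATH_CACHE;
    return last;
}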
Abstract:
A computer system has a memory controller that includes read buffers coupled to a plurality of memory channels. The memory controller advantageously eliminates the inter-channel skew caused by memory modules being located at different distances from the memory controller. The memory controller preferably includes a channel interface and synchronization logic circuit for each memory channel. This circuit includes read and write buffers and load and unload pointers for the read buffer. Unload pointer logic generates the unload pointer and load pointer logic generates the load pointer. The pointers preferably are free-running pointers that increment in accordance with two different clock signals. The load pointer increments in accordance with a clock generated by the memory controller that has been routed out to the memory modules and back. The unload pointer increments in accordance with a clock generated by the computer system itself. Because the trace length of each memory channel may differ, the time that it takes for a memory module to provide read data back to the memory controller may differ for each channel. The “skew” is defined as the difference in time between when data arrives on the earliest channel and when data arrives on the latest channel. During system initialization, the pointers are synchronized. After initialization, the pointers are used to load and unload the read buffers in such a way that the effects of inter-channel skew are eliminated.
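In software terms, each channel's read buffer behaves like a small ring written in one clock domain and read in another. A conceptual sketch, with the buffer depth and word size assumed; real hardware would implement this in logic rather than C:

#include <stdint.h>

#define RB_DEPTH 8u   /* ring depth (assumed); must exceed worst-case skew */

typedef struct {
    uint64_t data[RB_DEPTH];
    unsigned load;     /* free-running; advanced by the routed-back clock */
    unsigned unload;   /* free-running; advanced by the system clock */
} read_buf_t;

/* Returned-clock domain: store arriving read data at the load pointer. */
void rb_load(read_buf_t *rb, uint64_t word)
{
    rb->data[rb->load++ % RB_DEPTH] = word;
}

/* System-clock domain: all channels unload in lockstep, so data that
 * arrived at different times re-emerges aligned, hiding the skew. */
uint64_t rb_unload(read_buf_t *rb)
{
    return rb->data[rb->unload++ % RB_DEPTH];
}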
Abstract:
A method and architecture are provided for improved system resource management and allocation for the processing of request and response messages in a computer system. The resource management scheme provides for dynamically sharing system resources, such as data buffers, between request and response messages or transactions. In particular, instead of simply dedicating a portion of the system resources to requests and the remaining portion to responses, a minimum amount of resources is reserved for responses and a minimum amount for requests, while the remaining resources are dynamically shared between both types of messages. The method and architecture of the present invention allow for more efficient use of system resources while avoiding deadlock conditions and ensuring a minimum service rate for requests.
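The allocation rule can be sketched directly: each message class draws first from its reserved minimum and then from the shared pool. A minimal sketch of the grant decision (counter names are assumptions, and buffer release is omitted):

#include <stdbool.h>

typedef struct {
    int req_reserved, rsp_reserved;   /* per-class minimum buffers */
    int shared_total;                 /* buffers shared by both classes */
    int req_used, rsp_used, shared_used;
} buf_pool_t;

/* Grant a buffer to a request or response, or deny if none is free.
 * The reserves guarantee each class a minimum service rate and keep
 * responses flowing, which is what prevents deadlock. */
bool alloc_buffer(buf_pool_t *p, bool is_response)
{
    int *used = is_response ? &p->rsp_used : &p->req_used;
    int reserved = is_response ? p->rsp_reserved : p->req_reserved;

    if (*used < reserved) {                  /* class reserve first */
        (*used)++;
        return true;
    }
    if (p->shared_used < p->shared_total) {  /* then the shared pool */
        p->shared_used++;
        return true;
    }
    return false;
}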
Abstract:
In one embodiment, a system includes memory ports distributed into subsets identified by a subset index, where each memory port has an individual wait time based on its respective workload. The system further comprises a first address hashing unit configured to receive a read request including a virtual memory address that is associated with a replication factor and refers to graph data. The first address hashing unit translates the replication factor into a corresponding subset index based on the virtual memory address, and converts the virtual memory address to a hardware-based memory address referring to graph data in the memory ports within the subset indicated by the corresponding subset index. The system further comprises a memory replication controller configured to direct read requests for the hardware-based address to the one of the memory ports, within the indicated subset, having the lowest individual wait time.
Abstract:
In one embodiment, a system comprises multiple memory ports distributed into multiple subsets, each subset identified by a subset index and each memory port having an individual wait time. The system further comprises a first address hashing unit configured to receive a read request including a virtual memory address that is associated with a replication factor and refers to graph data. The first address hashing unit translates the replication factor into a corresponding subset index based on the virtual memory address, and converts the virtual memory address to a hardware-based memory address that refers to graph data in the memory ports within the subset indicated by the corresponding subset index. The system further comprises a memory replication controller configured to direct read requests for the hardware-based address to the one of the memory ports, within the indicated subset, having the lowest individual wait time.
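Putting the two steps together, a sketch of the flow: address bits select a subset, and the least-loaded port within that subset receives the read. The port count, the choice of hash bits, the wait-time table, and the reading that the replication factor equals the number of ports holding a copy (so it must divide the port count) are all assumptions:

#include <stdint.h>

#define PORTS 16

static unsigned wait_time[PORTS];   /* per-port wait, tracked by controller */

/* Map a virtual address with the given replication factor to a port:
 * the factor fixes the subset size, address bits pick the subset, and
 * the lowest-wait port inside that subset is chosen. */
unsigned pick_port(uint64_t vaddr, unsigned replication)
{
    unsigned subsets = PORTS / replication;   /* replication divides PORTS */
    unsigned subset  = (unsigned)(vaddr >> 7) % subsets;   /* hash bits */
    unsigned base    = subset * replication;

    unsigned best = base;
    for (unsigned p = base + 1; p < base + replication; p++)
        if (wait_time[p] < wait_time[best])
            best = p;
    return best;
}

A larger replication factor thus widens the set of candidate ports per address, trading memory capacity for lower queuing delay on hot graph data.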