摘要:
A DMA device prefetches descriptors into a descriptor prefetch buffer. The size of descriptor prefetch buffer holds an appropriate number of descriptors for a given latency environment. To support a linked list of descriptors, the DMA engine prefetches descriptors based on the assumption that they are sequential in memory and discards any descriptors that are found to violate this assumption. The DMA engine seeks to keep the descriptor prefetch buffer full by requesting multiple descriptors per transaction whenever possible. The bus engine fetches these descriptors from system memory and writes them to the prefetch buffer. The DMA engine may also use an aggressive prefetch where the bus engine requests the maximum number of descriptors that the buffer will support whenever there is any space in the descriptor prefetch buffer. The DMA device discards any remaining descriptors that cannot be stored.
摘要:
A direct memory access (DMA) device includes a barrier and interrupt mechanism that allows interrupt and mailbox operations to occur in such a way that ensures correct operation, but still allows for high performance out-of-order data moves to occur whenever possible. Certain descriptors are defined to be “barrier descriptors.” When the DMA device encounters a barrier descriptor, it ensures that all of the previous descriptors complete before the barrier descriptor completes. The DMA device further ensures that any interrupt generated by a barrier descriptor will not assert until the data move associated with the barrier descriptor completes. The DMA controller only permits interrupts to be generated by barrier descriptors. The barrier descriptor concept also allows software to embed mailbox completion messages into the scatter/gather linked list of descriptors.
摘要:
A method and structure for snooping cache memories of several snooping masters connected to a bus macro, wherein each non-originating snooping master has cache memory, and wherein some, but less than all the cache memories, may have the data requested by an originating snooping master and wherein the needed data in a non-originating snooping master is marked as updated, and wherein a main memory having addresses for all data is connected to the bus macro.Only those non-originating snooping masters which may have the requested data are queried. All the non-originating snooping masters that have been queried reply. If a non-originating snooping master has the requested data marked as updated, that non-originating snooping master returns the updated data to the originating snooping master and possibly to the main memory. If none of the non-originating snooping masters has the requested data marked as updated, then the requested data is read from main memory.
摘要:
A method for granting access to a bus is disclosed where a fair arbitration is modified to account for varying conditions. Each bus master (BM) is assigned a Grant Balance Factor value (hereafter GBF) that corresponds to a desired bandwidth from the bus. Arbitration gives priority BMs with a GBF greater than zero in a stratified protocol where requesting BMs with the same highest priority are granted access first. The GBF of a BM is decremented each time an access is granted. Requesting BMs with a GBF equal to zero are fairly arbitrated when there are no requesting BMs with GBFs greater than zero wherein they receive equal access using a frozen arbiter status. The bus access time may be partitioned into bus intervals (BIs) each comprising N clock cycles. BIs and GBFs may be modified to guarantee balanced access over multiple BIs in response to error conditions or interrupts.
摘要:
A hardware and software mechanism is provided for ensuring that a feature processor card, included with other feature cards in a host system, can be reset without interrupting software running on other feature cards. A delay is provided that starts counting each time a watchdog timer expires. If the watchdog timer is reset by an interrupt service routine, then the feature card processor is assumed to be reset. But, if the watchdog timer is not reset before the delay timer expires, then it is assumed that service routine is corrupt and that external reset of the feature card is required. Upon expiration of the watchdog, an error signal is sent, via the system bus, to the host CPU. Recovery code that is resident on the host CPU is then run and resets the CPU on the feature card. A reset signal is output from the host CPU, via the system bus, to a reset register on the feature card which then forwards the signal to the feature card CPU, thereby initiating reset of the system.
摘要:
A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field and wherein the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.
摘要:
A design structure for piggybacking multiple data tenures on a single data bus grant to achieve higher bus utilization is disclosed. In one embodiment of the design structure, a method in a computer-aided design system includes a source device sending a request for a bus grant to deliver data to a data bus connecting a source device and a destination device. The device receives the bus grant and logic within the device determines whether the bandwidth of the data bus allocated to the bus grant will be filled by the data. If the bandwidth of the data bus allocated to the bus grant will not be filled by the data, the device appends additional data to the first data and delivers the combined data to the data bus during the bus grant for the first data. When the bandwidth of the data bus allocated to the bus grant will be filled by the first data, the device delivers only the first data to the data bus during the bus grant.
摘要:
Mechanisms for providing a cluster-wide system clock in a multi-tiered full graph (MTFG) interconnect architecture are provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of each of the processor chips are synchronized since the heartbeat signals, that are the basis for the internal system clock signals, are synchronized. Mechanisms are provided for performing such synchronization using direct couplings of processor chips within the same processor book, different processor books in the same supernode, and different processor books in different supernodes of the MTFG interconnect architecture.
摘要:
A direct memory access (DMA) device is structured as a loosely coupled DMA engine (DE) and a bus engine (BE). The DE breaks the programmed data block moves into separate transactions, interprets the scatter/gather descriptors, and arbitrates among channels. The DE and BE use a combined read-write (RW) command that can be queued between the DE and the BE. The bus engine (BE) has two read queues and a write queue. The first read queue is for “new reads” and the second read queue is for “old reads,” which are reads that have been retried on the bus at least once. The BE gives absolute priority to new reads, and still avoids deadlock situations.
摘要:
A direct memory access (DMA) device is structured as a loosely coupled DMA engine (DE) and a bus engine (BE). The DE breaks the programmed data block moves into separate transactions, interprets the scatter/gather descriptors, and arbitrates among channels. The DE and BE use a combined read-write (RW) command that can be queued between the DE and the BE. The bus engine (BE) has two read queues and a write queue. The first read queue is for “new reads” and the second read queue is for “old reads,” which are reads that have been retried on the bus at least once. The BE gives absolute priority to new reads, and still avoids deadlock situations.