11. Descriptor prefetch mechanism for high latency and out of order DMA device
    Granted patent (in force)

    Publication No.: US07620749B2

    Publication date: 2009-11-17

    Application No.: US11621789

    Filing date: 2007-01-10

    IPC class: G06F13/28

    CPC class: G06F13/28

    Abstract: A DMA device prefetches descriptors into a descriptor prefetch buffer. The descriptor prefetch buffer is sized to hold an appropriate number of descriptors for a given latency environment. To support a linked list of descriptors, the DMA engine prefetches descriptors on the assumption that they are sequential in memory and discards any descriptors found to violate this assumption. The DMA engine seeks to keep the descriptor prefetch buffer full by requesting multiple descriptors per transaction whenever possible. The bus engine fetches these descriptors from system memory and writes them to the prefetch buffer. The DMA engine may also use an aggressive prefetch, in which the bus engine requests the maximum number of descriptors the buffer will support whenever there is any space in the descriptor prefetch buffer. The DMA device discards any remaining descriptors that cannot be stored.

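A minimal Python sketch of the prefetch policy described in the abstract (class name, field names, and memory model are illustrative, not from the patent): the engine requests as many descriptors per transaction as free buffer space allows, assumes the linked list is laid out sequentially in memory, and discards everything after the first descriptor whose next pointer breaks that assumption.

```python
class DescriptorPrefetchBuffer:
    """Illustrative model of sequential descriptor prefetch with discard."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []  # prefetched descriptors, in program order

    def prefetch(self, memory, head_addr):
        """Request up to `capacity` descriptors starting at head_addr,
        assuming consecutive addresses; return how many were kept."""
        want = self.capacity - len(self.buffer)
        # One bulk transaction for `want` sequential descriptors.
        fetched = [memory[head_addr + i] for i in range(want)
                   if head_addr + i in memory]
        kept, addr = [], head_addr
        for desc in fetched:
            kept.append(desc)
            if desc["next"] != addr + 1:
                break  # chain leaves sequential memory; discard the rest
            addr += 1
        self.buffer.extend(kept)
        return len(kept)
```

For example, if the third descriptor links out of sequence, the speculatively fetched fourth descriptor is discarded even though it was read from memory.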

12. Barrier and Interrupt Mechanism for High Latency and Out of Order DMA Device
    Published application (expired)

    Publication No.: US20080168191A1

    Publication date: 2008-07-10

    Application No.: US11621776

    Filing date: 2007-01-10

    IPC class: G06F13/28 G06F12/14

    CPC class: G06F13/28

    Abstract: A direct memory access (DMA) device includes a barrier and interrupt mechanism that allows interrupt and mailbox operations to occur in such a way that ensures correct operation, but still allows for high performance out-of-order data moves to occur whenever possible. Certain descriptors are defined to be “barrier descriptors.” When the DMA device encounters a barrier descriptor, it ensures that all of the previous descriptors complete before the barrier descriptor completes. The DMA device further ensures that any interrupt generated by a barrier descriptor will not assert until the data move associated with the barrier descriptor completes. The DMA controller only permits interrupts to be generated by barrier descriptors. The barrier descriptor concept also allows software to embed mailbox completion messages into the scatter/gather linked list of descriptors.

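The ordering rule above can be sketched in a few lines of Python (names and data shapes are hypothetical): data moves may complete in any order, but a barrier descriptor's interrupt is allowed to assert only once every earlier descriptor, and the barrier's own data move, have completed.

```python
def barrier_interrupts(descs, completed):
    """Return indices of barrier descriptors whose interrupts may assert.
    `descs` is the ordered descriptor list; `completed` is the set of
    indices whose data moves have finished (possibly out of order).
    Only barrier descriptors carry interrupts, per the abstract."""
    interrupts = []
    for i, d in enumerate(descs):
        if d.get("barrier"):
            prior_done = all(j in completed for j in range(i))
            if prior_done and i in completed:
                interrupts.append(i)  # barrier ordering satisfied
    return interrupts
```

With descriptors 0 and 2 complete but 1 still in flight, a barrier at index 2 holds its interrupt; once 1 completes, the interrupt may assert.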

13. Selective snooping by snoop masters to locate updated data
    Granted patent (expired)

    Publication No.: US07395380B2

    Publication date: 2008-07-01

    Application No.: US10393116

    Filing date: 2003-03-20

    IPC class: G06F12/00 G06F3/00

    CPC class: G06F12/0831 Y02D10/13

    Abstract: A method and structure for snooping the cache memories of several snooping masters connected to a bus macro, wherein each non-originating snooping master has a cache memory, some, but fewer than all, of the cache memories may have the data requested by an originating snooping master, the needed data in a non-originating snooping master is marked as updated, and a main memory having addresses for all data is connected to the bus macro. Only those non-originating snooping masters which may have the requested data are queried. All the non-originating snooping masters that have been queried reply. If a non-originating snooping master has the requested data marked as updated, that non-originating snooping master returns the updated data to the originating snooping master and possibly to the main memory. If none of the non-originating snooping masters has the requested data marked as updated, the requested data is read from main memory.

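A toy Python model of the selective-snoop lookup (the `may_have` filter, cache layout, and return values are illustrative assumptions): only masters that may hold the line are queried, an updated cached copy wins, and main memory is the fallback.

```python
def locate_updated(addr, masters, main_memory):
    """Query only masters whose cache may hold `addr`. If a queried
    master holds the line marked updated, return its copy; otherwise
    read the line from main memory."""
    candidates = [m for m in masters if addr in m["may_have"]]
    for m in candidates:  # non-candidates are never snooped
        line = m["cache"].get(addr)
        if line is not None and line["updated"]:
            return line["data"], m["name"]  # updated copy found in a cache
    return main_memory[addr], "memory"      # no updated copy anywhere
```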

14. Method and apparatus for bus access allocation
    Granted patent (expired)

    Publication No.: US07065595B2

    Publication date: 2006-06-20

    Application No.: US10249271

    Filing date: 2003-03-27

    IPC class: G06F13/362

    CPC class: G06F13/3625

    Abstract: A method for granting access to a bus is disclosed in which a fair arbitration scheme is modified to account for varying conditions. Each bus master (BM) is assigned a Grant Balance Factor (hereafter GBF) value that corresponds to its desired share of bus bandwidth. Arbitration gives priority to BMs with a GBF greater than zero in a stratified protocol, where requesting BMs with the same highest priority are granted access first. The GBF of a BM is decremented each time an access is granted. Requesting BMs with a GBF equal to zero are fairly arbitrated when there are no requesting BMs with GBFs greater than zero, receiving equal access using a frozen arbiter status. The bus access time may be partitioned into bus intervals (BIs), each comprising N clock cycles. BIs and GBFs may be modified to guarantee balanced access over multiple BIs in response to error conditions or interrupts.

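A single grant decision under this scheme might look like the following Python sketch. The tie-breaking rule inside a tier is simplified to lowest ID first; the patent instead uses a fair (freezable) arbiter within each tier, and all names here are illustrative.

```python
def grant(requesters, gbf):
    """Make one arbitration decision. Requesters with GBF > 0 form the
    priority tier; the winner's GBF is decremented on grant. Requesters
    with GBF == 0 are served only when the priority tier is empty."""
    prio = [m for m in requesters if gbf.get(m, 0) > 0]
    winner = min(prio) if prio else min(requesters)  # simplified fairness
    if gbf.get(winner, 0) > 0:
        gbf[winner] -= 1  # consume one unit of the balance factor
    return winner
```

Masters with remaining GBF credit are drained first; once all credits hit zero, the zero-GBF requesters get equal access.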

15. System crash detect and automatic reset mechanism for processor cards
    Granted patent (expired)

    Publication No.: US5333285A

    Publication date: 1994-07-26

    Application No.: US795562

    Filing date: 1991-11-21

    Applicant: Bernard C. Drerup

    Inventor: Bernard C. Drerup

    IPC class: G06F1/24 G06F11/00 G06F11/14

    Abstract: A hardware and software mechanism is provided for ensuring that a feature processor card, included with other feature cards in a host system, can be reset without interrupting software running on the other feature cards. A delay timer is provided that starts counting each time a watchdog timer expires. If the watchdog timer is reset by an interrupt service routine, then the feature card processor is assumed to be reset. But if the watchdog timer is not reset before the delay timer expires, then it is assumed that the service routine is corrupt and that an external reset of the feature card is required. Upon expiration of the watchdog, an error signal is sent, via the system bus, to the host CPU. Recovery code resident on the host CPU is then run and resets the CPU on the feature card. A reset signal is output from the host CPU, via the system bus, to a reset register on the feature card, which then forwards the signal to the feature card CPU, thereby initiating reset of the system.

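The watchdog-plus-delay decision can be replayed as a small event-driven Python sketch (event names and the return values are invented for illustration): a watchdog expiry arms the backup delay timer; an interrupt-service reset disarms it; a delay expiry while still armed means the service routine is corrupt and the host must reset the card.

```python
def supervise(events):
    """Replay timer events for the feature card and report the outcome.
    Events: "wdt_expire" (watchdog expired, delay timer starts),
    "isr_reset" (service routine reset the watchdog in time),
    "delay_expire" (backup delay ran out)."""
    delay_running = False
    for ev in events:
        if ev == "wdt_expire":
            delay_running = True           # arm the backup delay timer
        elif ev == "isr_reset":
            delay_running = False          # card recovered on its own
        elif ev == "delay_expire" and delay_running:
            return "host_resets_card"      # external reset via host CPU
    return "ok"
```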

16. Collective Acceleration Unit Tree Structure
    Published application (in force)

    Publication No.: US20110238956A1

    Publication date: 2011-09-29

    Application No.: US12749100

    Filing date: 2010-03-29

    Abstract: A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field, and the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node, the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation.

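As a rough Python sketch of the data flow (class name, table layout, and the `max` reduction are all assumptions for illustration): a tree identifier selects a resource index, the index selects the stored neighbor destinations for one sub tree, and the input data field is combined with the running partial result held in that resource slice.

```python
class CollectiveAccelerationUnit:
    """Illustrative model: tree ID -> resource index -> neighbors + state."""

    def __init__(self, tree_to_index, neighbors):
        self.tree_to_index = tree_to_index  # tree_id -> resource index
        self.neighbors = neighbors          # index -> neighbor destinations
        self.partial = {}                   # index -> running combined value

    def receive(self, tree_id, value, op=max):
        """Combine an input data field into the sub tree's partial result
        and return the destinations it may be forwarded to."""
        idx = self.tree_to_index[tree_id]
        if idx in self.partial:
            self.partial[idx] = op(self.partial[idx], value)
        else:
            self.partial[idx] = value
        return self.neighbors[idx]
```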

17. Structure for piggybacking multiple data tenures on a single data bus grant to achieve higher bus utilization
    Granted patent (expired)

    Publication No.: US07987437B2

    Publication date: 2011-07-26

    Application No.: US12112818

    Filing date: 2008-04-30

    CPC class: G06F13/364

    Abstract: A design structure for piggybacking multiple data tenures on a single data bus grant to achieve higher bus utilization is disclosed. In one embodiment of the design structure, a method in a computer-aided design system includes a source device sending a request for a bus grant to deliver first data on a data bus connecting the source device and a destination device. The device receives the bus grant, and logic within the device determines whether the bandwidth of the data bus allocated to the bus grant will be filled by the first data. If the bandwidth allocated to the bus grant will not be filled by the first data, the device appends additional data to the first data and delivers the combined data to the data bus during the bus grant for the first data. When the bandwidth allocated to the bus grant will be filled by the first data, the device delivers only the first data to the data bus during the bus grant.

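The piggybacking decision reduces to a packing step, sketched here in Python (function name and the beat-counting model are illustrative): if the tenure the grant was issued for leaves allocated bandwidth unused, queued tenures are appended until the grant is full, and anything left over waits for the next grant.

```python
def pack_grant(grant_beats, first, pending):
    """Pack data tenures into one bus grant. `first` is the tenure the
    grant was requested for; `pending` is the queue of further tenures.
    Tenure lengths are measured in bus beats. Returns (sent, pending)."""
    sent, used = [first], len(first)
    while pending and used + len(pending[0]) <= grant_beats:
        nxt = pending.pop(0)
        sent.append(nxt)        # piggyback on the same grant
        used += len(nxt)
    return sent, pending
```

When the first tenure already fills the grant, only the first tenure is delivered, matching the last sentence of the abstract.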

18. Cluster-wide system clock in a multi-tiered full-graph interconnect architecture
    Granted patent (in force)

    Publication No.: US07921316B2

    Publication date: 2011-04-05

    Application No.: US11853522

    Filing date: 2007-09-11

    IPC class: G06F1/00 G06F1/04 G06F1/12

    CPC class: G06F1/10 G06F1/12

    Abstract: Mechanisms for providing a cluster-wide system clock in a multi-tiered full-graph (MTFG) interconnect architecture are provided. Heartbeat signals transmitted by each of the processor chips in the computing cluster are synchronized. Internal system clock signals are generated in each of the processor chips based on the synchronized heartbeat signals. As a result, the internal system clock signals of the processor chips are synchronized, since the heartbeat signals on which they are based are synchronized. Mechanisms are provided for performing this synchronization using direct couplings of processor chips within the same processor book, in different processor books of the same supernode, and in different processor books of different supernodes of the MTFG interconnect architecture.


19. Method for performing a direct memory access block move in a direct memory access device
    Granted patent (expired)

    Publication No.: US07523228B2

    Publication date: 2009-04-21

    Application No.: US11532562

    Filing date: 2006-09-18

    IPC class: G06F13/28 G06F3/00

    CPC class: G06F13/28

    Abstract: A direct memory access (DMA) device is structured as a loosely coupled DMA engine (DE) and a bus engine (BE). The DE breaks programmed data block moves into separate transactions, interprets the scatter/gather descriptors, and arbitrates among channels. The DE and BE use a combined read-write (RW) command that can be queued between them. The BE has two read queues and a write queue. The first read queue is for “new reads” and the second read queue is for “old reads,” that is, reads that have been retried on the bus at least once. The BE gives absolute priority to new reads while still avoiding deadlock situations.

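The two-read-queue policy can be sketched in Python as follows (class and method names are invented; the real BE is hardware and also handles the write queue and deadlock avoidance, which this sketch omits): a read that is retried on the bus moves to the old-read queue, and new reads are always issued first.

```python
class BusEngine:
    """Illustrative model of the BE's new-read / old-read priority."""

    def __init__(self):
        self.new_reads, self.old_reads, self.writes = [], [], []

    def issue(self, read):
        self.new_reads.append(read)       # fresh read from the DE

    def retried(self, read):
        self.old_reads.append(read)       # retried at least once -> "old"

    def next_read(self):
        """Pick the next read to put on the bus."""
        if self.new_reads:
            return self.new_reads.pop(0)  # absolute priority to new reads
        return self.old_reads.pop(0) if self.old_reads else None
```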

20. DMA Controller with Support for High Latency Devices
    Published application (expired)

    Publication No.: US20080126602A1

    Publication date: 2008-05-29

    Application No.: US11532562

    Filing date: 2006-09-18

    IPC class: G06F13/28

    CPC class: G06F13/28

    Abstract: A direct memory access (DMA) device is structured as a loosely coupled DMA engine (DE) and a bus engine (BE). The DE breaks programmed data block moves into separate transactions, interprets the scatter/gather descriptors, and arbitrates among channels. The DE and BE use a combined read-write (RW) command that can be queued between them. The BE has two read queues and a write queue. The first read queue is for “new reads” and the second read queue is for “old reads,” that is, reads that have been retried on the bus at least once. The BE gives absolute priority to new reads while still avoiding deadlock situations.
