Virtual channels for effective packet transfer
    1.
    Granted Patent
    Virtual channels for effective packet transfer (Active)

    Publication No.: US08539130B2

    Publication Date: 2013-09-17

    Application No.: US12873057

    Filing Date: 2010-08-31

    IPC Classes: G06F12/00 G06F11/00

    CPC Classes: G06F13/1605

    Abstract: The invention sets forth a crossbar unit that includes multiple virtual channels, each virtual channel being a logical flow of data within the crossbar unit. Arbitration logic coupled to source client subsystems is configured to select a virtual channel for transmitting a data request or a data packet to a destination client subsystem based on the type of the source client subsystem and/or the type of data request. Higher priority traffic is transmitted over virtual channels that are configured to transmit data without causing deadlocks and/or stalls. Lower priority traffic is transmitted over virtual channels that can be stalled.

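The channel-selection rule in the abstract can be illustrated with a minimal Python sketch. The client types, request types, and the priority table below are hypothetical placeholders, not taken from the patent; only the idea — priority derived from source client type and request type, high priority mapped to a stall-free channel — follows the abstract.

```python
HIGH_PRIORITY_VC = "vc_no_stall"   # channel configured to avoid deadlocks/stalls
LOW_PRIORITY_VC = "vc_stallable"   # channel that may be back-pressured

# Assumed mapping of (client type, request type) -> traffic priority.
PRIORITY_TABLE = {
    ("display", "read"): "high",
    ("display", "write"): "high",
    ("compute", "read"): "low",
    ("compute", "write"): "low",
}

def select_virtual_channel(client_type: str, request_type: str) -> str:
    """Pick a virtual channel for a packet, as the arbitration logic might."""
    priority = PRIORITY_TABLE.get((client_type, request_type), "low")
    return HIGH_PRIORITY_VC if priority == "high" else LOW_PRIORITY_VC
```

Unknown client/request combinations default to the stallable channel, since stalling low-priority traffic is safe while mis-routing high-priority traffic is not.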

Cache and associated method with frame buffer managed dirty data pull and high-priority clean mechanism
    2.
    Granted Patent
    Cache and associated method with frame buffer managed dirty data pull and high-priority clean mechanism (Active)

    Publication No.: US08464001B1

    Publication Date: 2013-06-11

    Application No.: US12331305

    Filing Date: 2008-12-09

    IPC Classes: G06F12/12 G06F13/00

    Abstract: Systems and methods are disclosed for managing the number of affirmatively associated cache lines related to the different sets of a data cache unit. A tag look-up unit implements two thresholds, which may be configurable thresholds, to manage the number of cache lines related to a given set that store dirty data or are reserved for in-flight read requests. If the number of affirmatively associated cache lines in a given set is equal to a maximum threshold, the tag look-up unit stalls future requests that require an available cache line within that set to be affirmatively associated. To reduce the number of stalled requests, the tag look-up unit transmits a high priority clean notification to the frame buffer logic when the number of affirmatively associated cache lines in a given set approaches the maximum threshold. The frame buffer logic then processes requests associated with that set preemptively.

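The two-threshold policy can be sketched as a small decision function. The threshold values below are assumptions chosen for illustration; the abstract only says they may be configurable and that the clean notification fires as the count approaches the maximum.

```python
MAX_THRESHOLD = 8     # assumed: lines per set that may hold dirty/in-flight data
CLEAN_THRESHOLD = 6   # assumed: point at which a high-priority clean is requested

def handle_request(associated_count: int):
    """Tag-lookup-style decision for one cache set.

    Returns (stall, send_clean): stall the request when the set is at the
    maximum threshold; otherwise request a high-priority clean when the
    count approaches the maximum.
    """
    stall = associated_count >= MAX_THRESHOLD
    send_clean = not stall and associated_count >= CLEAN_THRESHOLD
    return stall, send_clean
```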

Class Dependent Clean and Dirty Policy
    3.
    Patent Application
    Class Dependent Clean and Dirty Policy (Active)

    Publication No.: US20130124802A1

    Publication Date: 2013-05-16

    Application No.: US13296119

    Filing Date: 2011-11-14

    IPC Classes: G06F12/08

    CPC Classes: G06F12/0804

    Abstract: A method for cleaning dirty data in an intermediate cache is disclosed. A dirty data notification, including a memory address and a data class, is transmitted by a level 2 (L2) cache to frame buffer logic when dirty data is stored in the L2 cache. The data classes may include evict_first, evict_normal, and evict_last. In one embodiment, data belonging to the evict_first data class is raster operations data with little reuse potential. The frame buffer logic uses a notification sorter to organize dirty data notifications, where an entry in the notification sorter stores the DRAM bank page number, a first count of cache lines that have resident dirty data, and a second count of cache lines that have resident evict_first dirty data associated with that DRAM bank. The frame buffer logic transmits dirty data associated with an entry when the first count reaches a threshold.

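The notification sorter's bookkeeping can be modeled in a few lines. The flush threshold, class names, and the exact flush action are illustrative assumptions; the abstract only specifies per-bank-page entries with two counts and a flush when the first count reaches a threshold.

```python
from collections import defaultdict

FLUSH_THRESHOLD = 4  # assumed count at which an entry's dirty data is written out

class NotificationSorter:
    """Toy model of the frame-buffer-logic notification sorter."""

    def __init__(self):
        # DRAM bank page -> [dirty_line_count, evict_first_dirty_count]
        self.entries = defaultdict(lambda: [0, 0])
        self.flushed = []  # pages whose dirty data has been transmitted

    def notify(self, bank_page: int, data_class: str):
        entry = self.entries[bank_page]
        entry[0] += 1                       # first count: any resident dirty data
        if data_class == "evict_first":
            entry[1] += 1                   # second count: evict_first dirty data
        if entry[0] >= FLUSH_THRESHOLD:     # first count reached the threshold
            self.flushed.append(bank_page)  # transmit the entry's dirty data
            del self.entries[bank_page]
```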

Coalescing to avoid read-modify-write during compressed data operations
    4.
    Granted Patent
    Coalescing to avoid read-modify-write during compressed data operations (Active)

    Publication No.: US08427495B1

    Publication Date: 2013-04-23

    Application No.: US11954722

    Filing Date: 2007-12-12

    IPC Classes: G06F12/02

    CPC Classes: G06T1/60 H04N19/423

    Abstract: Write operations to a unit of compressible memory, known as a compression tile, are examined to see if the data blocks to be written completely cover a single compression tile. If the data blocks completely cover a single compression tile, the write operations are coalesced into a single write operation and the single compression tile is overwritten with the data blocks. Coalescing multiple write operations into a single write operation improves performance, because it avoids the read-modify-write operations that would otherwise be needed.

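The coverage test that gates coalescing can be sketched directly. The tile size and the (offset, length) representation of write blocks are assumptions for the example.

```python
TILE_SIZE = 256  # assumed compression-tile size in bytes

def covers_tile(tile_base: int, writes) -> bool:
    """Return True if the write blocks, given as (offset, length) pairs,
    completely cover the tile starting at tile_base. Only then can the
    writes be coalesced and the tile overwritten without a
    read-modify-write."""
    covered = [False] * TILE_SIZE
    for offset, length in writes:
        for byte in range(offset, offset + length):
            if tile_base <= byte < tile_base + TILE_SIZE:
                covered[byte - tile_base] = True
    return all(covered)
```

A real implementation would track coverage with bitmasks per tile rather than a byte array, but the decision is the same: full coverage permits a blind overwrite, partial coverage forces a read-modify-write.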

    Managing conflicts on shared L2 bus

    Publication No.: US08195858B1

    Publication Date: 2012-06-05

    Application No.: US12510985

    Filing Date: 2009-07-28

    IPC Classes: G06F13/36 G06F13/00

    CPC Classes: G06F12/0859 G06F2212/302

    Abstract: One embodiment of the present invention sets forth a mechanism to schedule read data transmissions and write data transmissions to/from a cache to frame buffer logic on the L2 bus. When processing a read or a write command, a scheduling arbiter examines a bus schedule to determine whether a read-read conflict, a read-write conflict, or a write-read conflict exists, and allocates an available memory space in a read buffer to store the read data causing the conflict until the read return data transmission can be scheduled. In the case of a write command, the scheduling arbiter then transmits a write request to a request buffer. When processing a write request, the request arbiter examines the request buffers to determine whether a write-write conflict exists. If so, then the request arbiter allocates a memory space in a request buffer to store the write request until the write data transmission can be scheduled.
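The conflict-and-defer behavior can be illustrated with a highly simplified slot model. Representing the bus schedule as a slot-to-transaction map, and folding the read buffer / request buffer into a single "wait for the next free slot" step, are simplifications made for this sketch only.

```python
def schedule_transfer(bus_schedule: dict, slot: int, kind: str) -> int:
    """Place a read or write data transfer on the shared L2 bus schedule.

    If the requested slot is taken (a read-read, read-write, or write-read
    conflict), the transfer is deferred to the next free slot; in the
    patented mechanism the data would wait in a read buffer or request
    buffer until that slot arrives.
    """
    while slot in bus_schedule:  # conflict: slot already carries a transfer
        slot += 1                # defer; data sits in its buffer meanwhile
    bus_schedule[slot] = kind
    return slot
```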

VIRTUAL CHANNELS FOR EFFECTIVE PACKET TRANSFER
    7.
    Patent Application
    VIRTUAL CHANNELS FOR EFFECTIVE PACKET TRANSFER (Active)

    Publication No.: US20110072177A1

    Publication Date: 2011-03-24

    Application No.: US12873057

    Filing Date: 2010-08-31

    IPC Classes: G06F13/14 G06F13/00

    CPC Classes: G06F13/1605

    Abstract: The invention sets forth a crossbar unit that includes multiple virtual channels, each virtual channel being a logical flow of data within the crossbar unit. Arbitration logic coupled to source client subsystems is configured to select a virtual channel for transmitting a data request or a data packet to a destination client subsystem based on the type of the source client subsystem and/or the type of data request. Higher priority traffic is transmitted over virtual channels that are configured to transmit data without causing deadlocks and/or stalls. Lower priority traffic is transmitted over virtual channels that can be stalled.


Memory addressing scheme using partition strides
    8.
    Granted Patent
    Memory addressing scheme using partition strides (Active)

    Publication No.: US07872657B1

    Publication Date: 2011-01-18

    Application No.: US11454362

    Filing Date: 2006-06-16

    Abstract: Systems and methods for addressing memory where data is interleaved across different banks using different interleaving granularities improve graphics memory bandwidth by distributing graphics data for efficient access during rendering. Various partition strides may be selected to modify the number of sequential addresses mapped to each DRAM and change the interleaving granularity. A memory addressing scheme is used to allow different partition strides for each virtual memory page without causing memory aliasing problems in which physical memory locations in one virtual memory page are also mapped to another virtual memory page. When a physical memory address lies within a virtual memory page crossing region, the smallest partition stride is used to access the physical memory.

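The core address-to-partition mapping can be written in one line: consecutive stride-sized runs of addresses rotate round-robin across partitions, so a larger stride means coarser interleaving. The stride values and partition count below are illustrative, not from the patent.

```python
def partition_of(addr: int, stride: int, num_partitions: int) -> int:
    """Map a physical address to a memory partition by interleaving
    `stride`-byte runs of consecutive addresses across the partitions."""
    return (addr // stride) % num_partitions
```

With a 256-byte stride and 4 partitions, addresses 0-255 land in partition 0, 256-511 in partition 1, and so on, wrapping back to partition 0 at address 1024. Near a virtual-page boundary the patent falls back to the smallest stride for both pages, so the two mappings agree and no physical location aliases into two pages.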

Page stream sorter for poor locality access patterns
    9.
    Granted Patent
    Page stream sorter for poor locality access patterns (Active)

    Publication No.: US07664905B2

    Publication Date: 2010-02-16

    Application No.: US11592540

    Filing Date: 2006-11-03

    IPC Classes: G06F13/14

    CPC Classes: G06F13/1626

    Abstract: In some applications, such as video motion compression processing, a request pattern or "stream" of requests for accesses to memory (e.g., DRAM) may have, over a large number of requests, a relatively small number of requests to the same page. Because so few requests target the same page, conventional sorting to aggregate page hits may not be very effective. Reordering the stream can instead be used to "bury" or "hide" much of the necessary precharge/activate time, which can have a highly positive impact on overall throughput. For example, separating accesses to different rows of the same bank by at least a predetermined number of clocks can effectively hide the overhead involved in precharging/activating the rows.

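A greedy reordering in this spirit is easy to sketch: repeatedly pick the next request whose bank was not touched in the last few issued requests, so same-bank accesses are spaced apart and the precharge/activate latency overlaps with other traffic. This is a simplified illustration, not the patented sorter, and the spacing parameter stands in for the "predetermined number of clocks".

```python
def reorder(requests, spacing=2):
    """Greedily reorder (bank, row) requests so accesses to the same bank
    are separated by at least `spacing` other requests where possible."""
    pending = list(requests)
    out = []
    while pending:
        for i, (bank, row) in enumerate(pending):
            recent_banks = [b for b, _ in out[-spacing:]]
            if bank not in recent_banks:
                out.append(pending.pop(i))
                break
        else:
            out.append(pending.pop(0))  # no conflict-free choice; accept the stall
    return out
```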

Method and apparatus for predicting multiple conditional branches
    10.
    Granted Patent
    Method and apparatus for predicting multiple conditional branches (Active)

    Publication No.: US06272624B1

    Publication Date: 2001-08-07

    Application No.: US09285529

    Filing Date: 1999-04-02

    IPC Classes: G06F9/305

    CPC Classes: G06F9/3848

    Abstract: The outcome of a plurality of branch instructions in a computer program is predicted by fetching a plurality or group of instructions in a given slot, along with a corresponding prediction. A group global history (gghist) is maintained to indicate recent program control flow. In addition, a predictor table comprising a plurality of predictions, preferably saturating counters, is maintained. A particular counter is updated when a branch is encountered. The particular counter is associated with a branch instruction by hashing the fetched instruction group's program counter (PC) with the gghist. To predict multiple branch instruction outcomes, the gghist is hashed with the PC to form an index which is used to access naturally aligned but randomly ordered predictions in the predictor table, which are then reordered based on the value of the lower gghist bits. Preferably, instructions are fetched in blocks of eight instructions. The gghist is maintained by shifting in a 1 if a branch in the corresponding group is taken, or a 0 if no branch in the corresponding group is taken. The hashing function is preferably an XOR operation. Preferably, a predictor table counter is incremented when a corresponding branch is taken, but not beyond a maximum value, and is decremented when the corresponding branch is not taken, but not below zero. Preferably, the most significant bit of a counter is used to determine the prediction.

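The core mechanism — an XOR hash of PC and gghist indexing a table of 2-bit saturating counters whose most significant bit gives the prediction, with a taken/not-taken bit shifted into gghist per group — can be modeled directly. The table size is an assumption, and this sketch predicts one branch at a time rather than fetching and reordering the multiple per-group predictions the patent describes.

```python
TABLE_BITS = 10   # assumed predictor table size: 2**10 counters
COUNTER_MAX = 3   # 2-bit saturating counter: 0..3

class GroupBranchPredictor:
    """Single-branch sketch of the gghist-indexed predictor."""

    def __init__(self):
        self.table = [1] * (1 << TABLE_BITS)  # initialized weakly not-taken
        self.gghist = 0                       # group global history

    def _index(self, pc: int) -> int:
        # XOR hash of the group's PC with the global history
        return (pc ^ self.gghist) & ((1 << TABLE_BITS) - 1)

    def predict(self, pc: int) -> bool:
        # most significant bit of the 2-bit counter is the prediction
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool):
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, COUNTER_MAX)  # saturate high
        else:
            self.table[i] = max(self.table[i] - 1, 0)            # saturate low
        # shift in 1 if a branch in the group was taken, else 0
        self.gghist = ((self.gghist << 1) | int(taken)) & ((1 << TABLE_BITS) - 1)
```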