RESOURCE MANAGEMENT SUBSYSTEM THAT MAINTAINS FAIRNESS AND ORDER
    Type: Invention patent application
    Legal status: Valid

    Publication No.: US20130311999A1

    Publication date: 2013-11-21

    Application No.: US13476791

    Filing date: 2012-05-21

    IPC classification: G06F9/50

    CPC classification: G06F9/5011 G06F2209/507

    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
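
    The older-first arbitration, sleeping, and resource stealing described in this abstract can be illustrated with a small software model. The Python below is a minimal sketch under stated assumptions, not the patented TOQ hardware: `Request`, `TotalOrderQueue`, the integer `age` counter, the one-pass-per-cycle scan, and the rule that a sleeping request wakes only when one of its needed resources is released are all hypothetical. Requests are visited oldest first; a request may take a resource held by a younger request, and a request that cannot obtain every resource it needs is put to sleep.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Request:
    age: int                 # smaller = issued earlier (hypothetical global counter)
    needs: frozenset         # common resources that must all be held to progress
    held: set = field(default_factory=set)
    asleep: bool = False


class TotalOrderQueue:
    """Hypothetical model of older-first arbitration; not the patented circuit."""

    def __init__(self, resources):
        self.owner = {r: None for r in resources}  # resource -> Request holding it
        self.pending = []

    def add(self, req):
        self.pending.append(req)
        self.pending.sort(key=lambda r: r.age)     # keep the oldest request first

    def cycle(self):
        """Run one scheduling pass and return the ages of requests that progressed."""
        done = []
        for req in list(self.pending):              # visit oldest to youngest
            if req.asleep:
                continue
            for res in sorted(req.needs - req.held):
                holder = self.owner[res]
                if holder is None or holder.age > req.age:
                    if holder is not None:
                        holder.held.discard(res)    # older request steals from younger
                    self.owner[res] = req
                    req.held.add(res)
            if req.held == req.needs:
                done.append(req)                    # every needed resource allocated
            else:
                req.asleep = True                   # an older request holds something
        released = set()
        for req in done:                            # progressed requests free resources
            self.pending.remove(req)
            for res in req.held:
                self.owner[res] = None
            released |= req.held
        for req in self.pending:                    # wake sleepers whose resources freed
            if req.asleep and req.needs & released:
                req.asleep = False
        return [req.age for req in done]
```

    In this model, a request aged 3 that is asleep holding resource `b` loses it to a newly added request aged 2, which completes first; the younger request is woken when `b` is released and finishes on a later cycle. Because the oldest pending request can always obtain its resources here, at least one request completes on every cycle, which is how stealing rules out deadlock in the sketch.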

    BATCHED REPLAYS OF DIVERGENT OPERATIONS
    Type: Invention patent application
    Legal status: Valid

    Publication No.: US20130159684A1

    Publication date: 2013-06-20

    Application No.: US13329066

    Filing date: 2011-12-16

    IPC classification: G06F9/38 G06F9/312

    CPC classification: G06F9/3851 G06F9/3861

    Abstract: One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via a replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.
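
    The latency benefit of back-to-back batching can be made concrete with a toy cost model. The sketch below is hypothetical and is not taken from the publication: the 128-byte `line_size`, the `loop_latency` parameter, and the assumption that each batch costs one trip around the replay loop plus one cycle per additional batched replay are all illustrative.

```python
def lines_touched(addresses, line_size=128):
    """Count distinct cache lines hit by one warp's addresses (line_size assumed)."""
    return len({addr // line_size for addr in addresses})


def replay_cycles(n_lines, loop_latency, batch=1):
    """Hypothetical cycle cost of the replays needed after the first pass.

    The first pass services one cache line and every other line needs a replay.
    Replays are grouped into batches of up to `batch` operations; each batch pays
    one trip around the replay loop, and the rest of the batch follows
    back-to-back, one operation per cycle. batch=1 models unbatched replays.
    """
    if batch < 1:
        raise ValueError("batch must be at least 1")
    replays = max(n_lines - 1, 0)
    cycles = 0
    while replays > 0:
        size = min(batch, replays)
        cycles += loop_latency + (size - 1)
        replays -= size
    return cycles


addrs = [tid * 128 for tid in range(4)]              # four threads, four lines
n = lines_touched(addrs)                             # 4
print(replay_cycles(n, loop_latency=10))             # 30: three separate loop trips
print(replay_cycles(n, loop_latency=10, batch=2))    # 21: two trips, one batched pair
```

    With only two cache lines there is a single replay, so batching changes nothing in this model; the saving appears from three lines onward, which lines up with the abstract's condition of "more than two cache lines".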

    N-way memory barrier operation coalescing
    Type: Granted invention patent
    Legal status: Valid

    Publication No.: US08997103B2

    Publication date: 2015-03-31

    Application No.: US13441785

    Filing date: 2012-04-06

    Abstract: One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group, execution of subsequent memory operations for the first thread group is suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.
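
    The suspend, coalesce, and release behavior in this abstract can be sketched as a small state machine. This is a minimal Python model under stated assumptions, not the patented unit: the class `BarrierCoalescer`, the `n_way` cap on how many thread groups one coalesced barrier may cover, the FIFO of deferred barriers, and the explicit `complete_barrier()` call standing in for "the barrier has executed" are all hypothetical.

```python
from collections import deque


class BarrierCoalescer:
    """Hypothetical model of N-way memory barrier coalescing (not the patented unit)."""

    def __init__(self, n_way):
        self.n_way = n_way          # max thread groups one coalesced barrier covers
        self.coalesced = set()      # groups covered by the in-flight coalesced barrier
        self.deferred = deque()     # barriers that could not join it, in arrival order
        self.held = []              # (group, op) memory operations currently suspended
        self.executed = []          # memory operations allowed to proceed, in order

    def _blocked(self, group):
        return group in self.coalesced or group in self.deferred

    def barrier(self, group):
        joinable = (not self.deferred and group not in self.coalesced
                    and len(self.coalesced) < self.n_way)
        if joinable:
            self.coalesced.add(group)       # start, or coalesce into, the barrier
        else:
            self.deferred.append(group)     # full, or group already has a barrier

    def memory_op(self, group, op):
        if self._blocked(group):
            self.held.append((group, op))   # suspended behind a pending barrier
        else:
            self.executed.append((group, op))  # unaffected groups keep running

    def complete_barrier(self):
        """Call when the coalesced barrier has executed; release its groups."""
        self.coalesced.clear()
        while (self.deferred and len(self.coalesced) < self.n_way
               and self.deferred[0] not in self.coalesced):
            self.coalesced.add(self.deferred.popleft())  # next coalesced barrier
        still_held = []
        for group, op in self.held:
            if self._blocked(group):
                still_held.append((group, op))
            else:
                self.executed.append((group, op))
        self.held = still_held
```

    In this model, with `n_way=2`, barriers from groups 0 and 1 coalesce, a memory operation from group 2 proceeds immediately, and a third barrier from group 3 waits to form the next coalesced barrier once the first one completes.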

    PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS
    Type: Invention patent application
    Legal status: Under examination (published)

    Publication No.: US20130212364A1

    Publication date: 2013-08-15

    Application No.: US13370173

    Filing date: 2012-02-09

    IPC classification: G06F9/38 G06F9/312

    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into the multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.
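
    The split between pre-scheduled replays and replay-loop replays can be sketched as a list of pipeline passes. This is a hypothetical model rather than the disclosed unit: the function `schedule`, the fixed number `k` of pre-scheduled replays, one cache line serviced per pass, and knowing the line count when the instruction issues are all assumptions.

```python
def schedule(n_lines, k=2):
    """List the passes that service one instruction touching n_lines cache lines.

    Hypothetical model: the first pass is the instruction itself; up to k
    pre-scheduled replays follow it back-to-back; anything still unserviced
    goes around the replay loop one replay at a time. k=0 means no
    pre-scheduling. Assumes the line count is known when the instruction issues.
    """
    passes = ["instr"]
    remaining = max(n_lines - 1, 0)
    pre = min(max(k, 0), remaining)          # pre-scheduled slots actually used
    passes += ["pre"] * pre
    passes += ["loop"] * (remaining - pre)   # leftovers use the replay loop
    return passes


print(schedule(5, k=2))  # ['instr', 'pre', 'pre', 'loop', 'loop']
print(schedule(5, k=0))  # ['instr', 'loop', 'loop', 'loop', 'loop']
```

    Setting `k=0` reproduces the case in which every replay goes around the replay loop; any `k` greater than zero moves up to `k` of those replays directly behind the instruction.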

    MECHANISM FOR TRACKING AGE OF COMMON RESOURCE REQUESTS WITHIN A RESOURCE MANAGEMENT SUBSYSTEM
    Type: Invention patent application
    Legal status: Valid

    Publication No.: US20130311686A1

    Publication date: 2013-11-21

    Application No.: US13476825

    Filing date: 2012-05-21

    IPC classification: G06F5/00

    CPC classification: H04L49/254 G06F9/46

    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
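
    The abstract does not say how request age is recorded. One conventional hardware structure for tracking relative age among a fixed pool of queue slots is an age matrix, sketched below in Python. It is offered as an illustration of that general technique only and is not stated to be the mechanism claimed in this publication; `AgeMatrix` and its methods are hypothetical names.

```python
class AgeMatrix:
    """Relative-age tracker for a fixed pool of request slots (illustrative)."""

    def __init__(self, slots):
        self.n = slots
        self.valid = [False] * slots
        # older[i][j] is True when the request in slot i was allocated before slot j
        self.older = [[False] * slots for _ in range(slots)]

    def allocate(self, i):
        """Record a new request in slot i; every live request is older than it."""
        for j in range(self.n):
            self.older[j][i] = self.valid[j]
            self.older[i][j] = False
        self.valid[i] = True

    def free(self, i):
        self.valid[i] = False

    def oldest(self, candidates):
        """Return the oldest live slot among candidates, or None if there are none."""
        for c in candidates:
            if not any(self.older[o][c] for o in candidates if o != c):
                return c
        return None


m = AgeMatrix(4)
for slot in (2, 0, 3):       # allocate in this order: slot 2 is the oldest
    m.allocate(slot)
print(m.oldest([0, 3, 2]))   # 2
```

    An arbiter could call `oldest()` on the slots of requests contending for the same common resource to find the one that receives priority under the older-first rule; reusing a freed slot makes it the youngest entry, because `allocate()` rewrites both its row and its column.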
