THREAD-LEVEL SLEEP IN A MASSIVELY MULTITHREADED ARCHITECTURE

    Publication Number: US20180314522A1

    Publication Date: 2018-11-01

    Application Number: US15582549

    Application Date: 2017-04-28

    Abstract: A streaming multiprocessor (SM) includes a nanosleep (NS) unit configured to cause individual threads executing on the SM to sleep for a programmer-specified interval of time. For a given thread, the NS unit parses a NANOSLEEP instruction and extracts a sleep time. The NS unit then maps the sleep time to a single bit of a timer and causes the thread to sleep. When the timer bit changes, the sleep time expires, and the NS unit awakens the thread. The thread may then continue executing. The SM also includes a nanotrap (NT) unit configured to issue traps using a similar timing mechanism to that described above. For a given thread, the NT unit parses a NANOTRAP instruction and extracts a trap time. The NT unit then maps the trap time to a single bit of a timer. When the timer bit changes, the NT unit issues a trap.
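
    For context, CUDA exposes a per-thread sleep of this kind through the __nanosleep() intrinsic on devices of compute capability 7.0 and higher. The sketch below is a minimal illustration, not code from the patent: a spin-wait that backs off with __nanosleep() so that only the polling thread sleeps. The function name spin_wait and the backoff constants are assumptions made for the example.

        #include <cuda_runtime.h>

        // Minimal spin-wait sketch: poll a flag, sleeping only the calling thread
        // between polls. Requires compute capability >= 7.0 (e.g. -arch=sm_70).
        __device__ void spin_wait(volatile int *flag)
        {
            unsigned int ns = 8;                    // initial sleep interval in nanoseconds (assumed)
            while (*flag == 0) {                    // wait until another thread sets the flag
                __nanosleep(ns);                    // sleep this thread for roughly ns nanoseconds
                ns = (ns < 256) ? ns * 2 : 256;     // exponential backoff, capped at 256 ns (assumed)
            }
        }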

    TECHNIQUES FOR EFFICIENTLY SYNCHRONIZING MULTIPLE PROGRAM THREADS

    Publication Number: US20220391264A1

    Publication Date: 2022-12-08

    Application Number: US17338377

    Application Date: 2021-06-03

    Abstract: Various embodiments include a parallel processing computer system that enables parallel instances of a program to synchronize at disparate addresses in memory. When the parallel program instances need to exchange data, the program instances synchronize based on a mask that identifies the program instances that are synchronizing. As each program instance reaches the point of synchronization, the program instance blocks and waits for all other program instances to reach the point of synchronization. When all program instances have reached the point of synchronization, at least one program instance executes a synchronous operation to exchange data. The program instances then continue execution at respective and disparate return addresses.
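
    A comparable software-visible mechanism in CUDA is __syncwarp(mask): the threads named in the mask block until all of them have reached a __syncwarp() with the same mask, even when they arrive from divergent code paths (compute capability 7.0 and higher). The sketch below is only an illustration under that assumption, not the patent's implementation; the kernel name exchange_kernel is hypothetical. Two divergent branches synchronize on the same mask, and the lanes then exchange values with __shfl_xor_sync().

        #include <cuda_runtime.h>

        // Illustrative kernel: lanes diverge, synchronize on a common mask, then
        // exchange data with a warp shuffle. Assumes a one-warp (32-thread) block.
        __global__ void exchange_kernel(const float *in, float *out)
        {
            int lane = threadIdx.x & 31;
            unsigned int mask = 0xffffffffu;        // all 32 lanes participate in the barrier

            float v;
            if (lane & 1) {
                v = in[lane] * 2.0f;                // odd lanes reach the barrier here
                __syncwarp(mask);
            } else {
                v = in[lane] + 1.0f;                // even lanes reach the barrier elsewhere
                __syncwarp(mask);
            }

            // Once every lane in the mask has synchronized, data can be exchanged.
            float neighbor = __shfl_xor_sync(mask, v, 1);
            out[lane] = v + neighbor;
        }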

    TECHNIQUES FOR EFFICIENTLY PERFORMING DATA REDUCTIONS IN PARALLEL PROCESSING UNITS

    Publication Number: US20210019198A1

    Publication Date: 2021-01-21

    Application Number: US16513393

    Application Date: 2019-07-16

    Abstract: Techniques are disclosed for reducing the latency associated with performing data reductions in a multithreaded processor. In response to a single instruction associated with a set of threads executing in the multithreaded processor, a warp reduction unit acquires register values stored in source registers, where each register value is associated with a different thread included in the set of threads. The warp reduction unit performs operation(s) on the register values to compute an aggregate value. The warp reduction unit stores the aggregate value in a destination register that is accessible to at least one of the threads in the set of threads. Because the data reduction is performed via a single instruction using hardware specialized for data reductions, the number of cycles required to perform the data reduction is decreased relative to prior-art techniques that are performed via multiple instructions using hardware that is not specialized for data reductions.
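
    CUDA exposes single-instruction warp reductions of this kind through the __reduce_add_sync() family of intrinsics on devices of compute capability 8.0 and higher, replacing the usual multi-step shuffle tree. The sketch below is a minimal illustration, not the patented hardware itself; the kernel name warp_sum is hypothetical and a one-warp-per-output layout is assumed.

        #include <cuda_runtime.h>

        // Illustrative kernel: each warp sums one value per lane with a single
        // reduction instruction. Requires compute capability >= 8.0 (e.g. -arch=sm_80).
        __global__ void warp_sum(const int *in, int *out)
        {
            unsigned int mask = 0xffffffffu;              // all 32 lanes contribute
            int idx = blockIdx.x * blockDim.x + threadIdx.x;
            int total = __reduce_add_sync(mask, in[idx]); // warp-wide sum in one instruction
            if ((threadIdx.x & 31) == 0) {                // lane 0 of each warp stores the result
                out[idx >> 5] = total;
            }
        }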

    TECHNIQUE FOR REDUCING VOLTAGE DROOP BY THROTTLING INSTRUCTION ISSUE RATE

    Publication Number: US20150089198A1

    Publication Date: 2015-03-26

    Application Number: US14033378

    Application Date: 2013-09-20

    CPC classification number: G06F9/3836

    Abstract: An issue control unit is configured to control the rate at which an instruction issue unit issues instructions to an execution pipeline in order to avoid spikes in power drawn by that execution pipeline. The issue control unit maintains a history buffer that reflects, for N previous cycles, the number of instructions issued during each of those N cycles. If the total number of instructions issued during the N previous cycles exceeds a threshold value, then the issue control unit prevents the instruction issue unit from issuing instructions during a subsequent cycle. In addition, the issue control unit increases the threshold value in proportion to the number of previously issued instructions and based on a variety of configurable parameters. Accordingly, the issue control unit maintains granular control over the rate at which the instruction issue unit “ramps up” to a maximum instruction issue rate.
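
    Although the issue control unit is hardware, its policy can be read as a sliding-window issue budget. The following behavioral model is a rough host-side sketch under assumed constants (window length, base threshold, ramp factor); the name IssueControl and every numeric value are illustrative and not taken from the patent.

        #include <cstdio>

        // Rough behavioral model of a sliding-window issue throttle (all values assumed).
        struct IssueControl {
            static const int N = 8;            // history window length, in cycles (assumed)
            int history[N] = {0};              // instructions issued in each of the last N cycles
            int head = 0;                      // index of the oldest history entry
            int windowSum = 0;                 // running total over the window
            int threshold = 4;                 // current issue budget for the window (assumed base)

            // Record one cycle's issue count; return whether issue is allowed next cycle.
            bool cycle(int issuedThisCycle) {
                windowSum += issuedThisCycle - history[head];   // slide the window forward
                history[head] = issuedThisCycle;
                head = (head + 1) % N;
                threshold = 4 + windowSum / 2;                  // grow the budget with recent activity
                return windowSum <= threshold;                  // throttle when the window exceeds it
            }
        };

        int main() {
            IssueControl ic;
            for (int c = 0; c < 16; ++c) {                      // model a steady two-issue workload
                printf("cycle %d: issue allowed next = %d\n", c, ic.cycle(2));
            }
            return 0;
        }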
