Spill-After Programming Model for the Streaming Wave Coalescer

    公开(公告)号:US20250130811A1

    公开(公告)日:2025-04-24

    申请号:US18895737

    申请日:2024-09-25

    Abstract: An apparatus and method for efficiently migrating the execution of threads between multiple parallel lanes of execution. In various implementations, a computing system includes multiple vector processing circuits of a compute circuit that executes multiple lanes of multiple waves. Each lane includes a key indicating a path of execution. When a lane of the multiple lanes of execution executes a stream wave coalescing (SWC) reorder instruction, a control circuit compares keys of waves that have previously executed the SWC reorder instruction. When the number of lanes with a matching key exceeds a threshold and after identifying at least this number of lanes to swap, the control circuit swaps continuation state information (live active state information) between lanes of an emitting wave that do not have a matching key and lanes of contributing waves that do have a matching key. The resulting (reordered) emitting wave executes more efficiently, which increases performance.

    Mitigation Of Undershoot And Overshoot On A Power Rail

    公开(公告)号:US20250004516A1

    公开(公告)日:2025-01-02

    申请号:US18346070

    申请日:2023-06-30

    Abstract: An apparatus and method for efficiently managing voltage droop among replicated compute circuits of an integrated circuit. In various implementations, an integrated circuit includes multiple, replicated compute circuits, each with the circuitry of multiple lanes of execution. Control circuitry of the integrated circuit identifies, early in execution pipelines, groups of instructions to be executed by a corresponding compute circuit, and generates a total power consumption estimate for the groups. The control circuitry maintains N previous total power consumption estimates, and stores the N power consumption estimates in staging circuitry referred to as an “instruction history pipeline.” If any differences between total power consumption estimates of different stages of the instruction history pipeline exceeds a corresponding threshold, then the control circuitry reduces, late in the execution pipeline, the rate of instruction execution of computation lanes of a corresponding compute circuit.

    STREAMING WAVE COALESCER CIRCUIT
    3.
    发明申请

    公开(公告)号:US20250068429A1

    公开(公告)日:2025-02-27

    申请号:US18536982

    申请日:2023-12-12

    Abstract: A Streaming Wave Coalescer (SWC) circuit stores a first set of state values associated with a first subset of threads of a first wave in a bin based on each of the first subset of threads including a first set of instructions to be executed. A second set of state values associated with a second subset of threads of a second wave is stored in the bin based on each of the second subset of threads including the first set of instructions to be executed and based on the first wave and the second wave both being associated with a hard key. A third wave is formed from the threads of the first subset and the second subset and is emitted for execution. As a result of reorganizing the threads and reconstituting a different wave, thread divergence of waves sent for execution is reduced.

Patent Agency Ranking