COMPUTING EFFICIENT CROSS CHANNEL OPERATIONS IN PARALLEL COMPUTING MACHINES USING SYSTOLIC ARRAYS

    公开(公告)号:US20230367740A1

    公开(公告)日:2023-11-16

    申请号:US18310129

    申请日:2023-05-01

    CPC classification number: G06F15/8046 G06F15/8007 G06N20/00

    Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.

    SYSTOLIC ARRAY OF ARBITRARY PHYSICAL AND LOGICAL DEPTH

    公开(公告)号:US20220414053A1

    公开(公告)日:2022-12-29

    申请号:US17304678

    申请日:2021-06-24

    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

    COMPUTING EFFICIENT CROSS CHANNEL OPERATIONS IN PARALLEL COMPUTING MACHINES USING SYSTOLIC ARRAYS

    公开(公告)号:US20210365402A1

    公开(公告)日:2021-11-25

    申请号:US16900236

    申请日:2020-06-12

    Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, the systolic array circuit modified to receive inputs from the single source register and route elements of the single source register to multiple channels in the systolic array circuit.

    Systolic array of arbitrary physical and logical depth

    公开(公告)号:US12174783B2

    公开(公告)日:2024-12-24

    申请号:US17304678

    申请日:2021-06-24

    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

    SHARED LOCAL REGISTERS FOR THREAD TEAM PROCESSING

    公开(公告)号:US20240112295A1

    公开(公告)日:2024-04-04

    申请号:US17958216

    申请日:2022-09-30

    CPC classification number: G06T1/20 G06F9/30098 G06F9/3836

    Abstract: Shared local registers for thread team processing is described. An example of an apparatus includes one or more processors including a graphic processor having multiple processing resources; and memory for storage of data, the graphics processor to allocate a first thread team to a first processing resource, the first thread team including hardware threads to be executed solely by the first processing resource; allocate a shared local register (SLR) space that may be directly reference in the ISA instructions to the first processing resource, the SLR space being accessible to the threads of the thread team and being inaccessible to threads outside of the thread team; and allocate individual register spaces to the thread team, each of the individual register spaces being accessible to a respective thread of the thread team.

Patent Agency Ranking