MULTIPLE ACCUMULATE BUSSES IN A SYSTOLIC ARRAY

    Publication number: US20230385233A1

    Publication date: 2023-11-30

    Application number: US18446357

    Filing date: 2023-08-08

    CPC classification number: G06F15/8046 G06F7/53 G06F7/5443 G06F7/505 G06F9/3001

    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations and provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or to allow increased latency at individual processing elements.
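
The columnar-bus idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the function name `column_mac`, the assignment of row `r` to bus `r % num_busses`, and the final summation of the per-bus results are all hypothetical choices made for illustration.

```python
def column_mac(weights, inputs, num_busses=2):
    """Model one column whose processing elements are interleaved over busses.

    Assumption (not from the abstract): the element in row r uses bus
    r % num_busses, so elements on other busses are skipped entirely.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    # One running partial sum per bus; each bus accumulates independently.
    bus_sums = [0] * num_busses
    for row, (weight, value) in enumerate(zip(weights, inputs)):
        bus = row % num_busses
        # Input partial sum comes from the prior element on the same bus.
        bus_sums[bus] = bus_sums[bus] + weight * value
    return bus_sums


# Two busses: rows 0 and 2 share bus 0, rows 1 and 3 share bus 1.
per_bus = column_mac([1, 2, 3, 4], [5, 6, 7, 8], num_busses=2)
column_output = sum(per_bus)  # assumed final combining step
```

Because each bus chain is half as long as the column in this example, each element has more time per hop, or the whole column can finish sooner.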

    SYSTOLIC ARRAY WITH INPUT REDUCTION TO MULTIPLE REDUCED INPUTS

    Publication number: US20230004523A1

    Publication date: 2023-01-05

    Application number: US17363900

    Filing date: 2021-06-30

    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reducer can receive a particular input and generate multiple reduced inputs from the input. The reduced inputs can include reduced input data elements and/or reduced weights. The systolic array may lack support for inputs with a first bit-length, and the reducers may reduce the bit-length of a given input from the first bit-length to a second, shorter bit-length and provide multiple reduced inputs with the second, shorter bit-length to the array. The systolic array may perform multiply-accumulate operations on each unique combination of the multiple reduced input data elements and the reduced weights to generate multiple partial outputs. The systolic array may sum the partial outputs to generate the output.
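
The arithmetic behind "multiple reduced inputs" can be sketched with integers. This is an illustrative assumption, not the patented reducer: the names `reduce_input` and `reduced_multiply` and the split into a high part and a low remainder are hypothetical. The point is only that if each input equals the sum of its reduced parts, the products of every combination of parts sum to the full-precision product.

```python
from itertools import product


def reduce_input(value, low_bits=8):
    """Split value into two parts that each carry fewer significant bits.

    high keeps the leading bits (trailing low_bits zeroed);
    low is the remainder, so value == high + low exactly.
    """
    high = (value >> low_bits) << low_bits
    low = value - high
    return (high, low)


def reduced_multiply(data, weight, low_bits=8):
    """Multiply via partial outputs over every combination of reduced parts."""
    data_parts = reduce_input(data, low_bits)
    weight_parts = reduce_input(weight, low_bits)
    # Four partial outputs: high*high, high*low, low*high, low*low.
    partial_outputs = [d * w for d, w in product(data_parts, weight_parts)]
    return sum(partial_outputs)


result = reduced_multiply(0x1234, 0x0ABC)
```

In a floating-point design the scaling of the high and low parts would be carried by exponents rather than by zeroed integer bits; the integer form is used here only because it is easy to check exactly.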

    SYSTOLIC ARRAY COMPONENT COMBINING MULTIPLE INTEGER AND FLOATING-POINT DATA TYPES

    Publication number: US20210157549A1

    Publication date: 2021-05-27

    Application number: US16698838

    Filing date: 2019-11-27

    Abstract: Systems and methods are provided to perform multiply-accumulate operations of multiple data types in a systolic array to increase clock speeds and/or reduce the size and quantity of systolic arrays required to perform multiply-accumulate operations of multiple data types. Each processing element in the systolic array can have a shared multiplier and one or more adders. The shared multiplier can have separate and/or shared circuitry, where the shared circuitry is capable of performing at least a part of integer multiplication and at least a part of non-integer multiplication. The one or more adders can be a shared adder or separate adders. The shared adder can have separate and shared circuitry, where the shared circuitry is capable of performing at least a part of integer addition and at least a part of non-integer addition.

    SYSTOLIC ARRAY INCLUDING FUSED MULTIPLY ACCUMULATE WITH EFFICIENT PRENORMALIZATION AND EXTENDED DYNAMIC RANGE

    Publication number: US20210157548A1

    Publication date: 2021-05-27

    Application number: US16698809

    Filing date: 2019-11-27

    Inventor: Thomas Elmer

    Abstract: Systems and methods are provided to perform multiply-accumulate operations of normalized numbers in a systolic array to enable greater computational density, reduce the size of systolic arrays required to perform multiply-accumulate operations of normalized numbers, and/or enable higher throughput operation. The systolic array can be provided normalized numbers by a column of normalizers and can lack support for denormal numbers. Each normalizer can normalize the inputs to each processing element in the systolic array. The systolic array can include a multiplier and an adder. The multiplier can have multiple data paths that correspond to the data type of the input. The multiplier and adder can employ an expanded exponent range to operate on normalized floating-point numbers and can lack support for denormal numbers.

    Increasing performance of computational array accelerators

    Publication number: US12182691B1

    Publication date: 2024-12-31

    Application number: US17249900

    Filing date: 2021-03-17

    Abstract: To improve performance of a computational array, the architecture of the array can be modified to allow the processing engines of a column to operate in parallel and the clock frequency of the array to be increased. The processing engines of each column of the array can be grouped into a series of row groups. The processing engines of each row group can be loaded with input values, and computations on the input values can be carried out in parallel to generate the column output. One or more flip-flop stages can be inserted into the computational logic of each of the processing engines. The computational logic can then be distributed across the flip-flop stages to reduce the propagation delay between flip-flop stages of the processing engine, hence allowing the clock frequency of the array to be increased.

    Systolic array with efficient input reduction and extended array performance

    Publication number: US11880682B2

    Publication date: 2024-01-23

    Application number: US17363894

    Filing date: 2021-06-30

    CPC classification number: G06F9/3001 G06F15/8046

    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
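
Reducing bit-length by dropping trailing bits, with rounding, can be sketched on IEEE 754 single-precision values. This is a minimal illustration under assumptions: the function name `reduce_float32`, the default of 10 kept fraction bits, and the round-half-up rule (by magnitude) are hypothetical choices, and the sketch does not handle NaN or infinity.

```python
import struct


def reduce_float32(value, kept_mantissa_bits=10):
    """Round a float32 to fewer fraction bits by clearing trailing bits.

    Assumptions: round half up in magnitude; NaN and infinity are not handled.
    """
    dropped = 23 - kept_mantissa_bits  # float32 stores 23 fraction bits
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if dropped > 0:
        bits += 1 << (dropped - 1)         # add half of the last kept bit
        bits &= ~((1 << dropped) - 1)      # clear the trailing bits
        bits &= 0xFFFFFFFF                 # stay within 32 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


rounded_up = reduce_float32(1.0 + 2**-11)    # exactly halfway, rounds up
rounded_down = reduce_float32(1.0 + 2**-12)  # below halfway, rounds down
```

Adding the rounding constant to the raw encoding lets a fraction overflow carry into the exponent field, which is the correct result when a value rounds up to the next power of two.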

    Multiple busses interleaved in a systolic array

    Publication number: US11308026B1

    Publication date: 2022-04-19

    Application number: US16915777

    Filing date: 2020-06-29

    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each row of the systolic array can include multiple busses enabling independent transmission of inputs along the respective bus. Each processing element of a given row-oriented bus can receive an input from a prior element of the given row-oriented bus and perform arithmetic operations on the input. Each processing element can generate an output partial sum based on the arithmetic operations and provide the input to a next processing element of the given row-oriented bus, without the input being processed by a processing element of the row located between the two processing elements that uses a different row-oriented bus. Use of row-oriented busses can enable parallelization to increase speed or to allow increased latency at individual processing elements.
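
The effect of interleaved row busses on input propagation can be shown with a toy timing model. Everything here is an assumption made for illustration: the name `arrival_cycles`, the mapping of column `c` to bus `c % num_busses`, one clock cycle per hop, and all busses being fed at cycle 0.

```python
def arrival_cycles(num_columns, num_busses=2):
    """Cycle at which each element of a row first receives an input.

    Assumptions: column c sits on bus c % num_busses, the first element
    on every bus receives the input at cycle 0, and each hop to the next
    element on the same bus costs one cycle.
    """
    return [column // num_busses for column in range(num_columns)]


single_bus = arrival_cycles(8, num_busses=1)  # input ripples through all 8
two_busses = arrival_cycles(8, num_busses=2)  # each bus visits only 4
```

Under these assumptions, the last column sees its input after 3 hops with two busses instead of 7 with one.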

    SYSTOLIC ARRAY INCLUDING FUSED MULTIPLY ACCUMULATE WITH EFFICIENT PRENORMALIZATION AND EXTENDED DYNAMIC RANGE

    Publication number: US20240361986A1

    Publication date: 2024-10-31

    Application number: US18767411

    Filing date: 2024-07-09

    Inventor: Thomas Elmer

    CPC classification number: G06F7/5443 G06F15/8046

    Abstract: Systems and methods are provided to perform multiply-accumulate operations of at least one normalized number in a systolic array. The systolic array can obtain a first input and detect that the first input is denormal. Based on determining the first input is denormal, the systolic array can generate a first normalized number by normalizing the first input. Processing elements of the systolic array can include a multiplier and an adder. The multiplier can multiply the first normalized number by a second normal or normalized number to generate a multiplier product and the adder can add an input partial sum to the multiplier product to generate an addition result.
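
Detecting a denormal input and normalizing it can be sketched for IEEE 754 single precision. This is a hedged illustration, not the claimed circuit: the names `normalize_float32` and `to_float`, and the choice to represent the result as a (sign, unbiased exponent, 24-bit significand) tuple with an exponent allowed to fall below -126, are assumptions. NaN and infinity are not handled.

```python
import math
import struct


def normalize_float32(value):
    """Return (sign, exponent, significand) with a leading 1 where possible.

    The represented value is (-1)**sign * significand * 2**(exponent - 23).
    Assumption: the exponent may go below -126 (expanded exponent range).
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31
    exp_field = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exp_field == 0 and fraction != 0:
        # Denormal: shift the fraction left until bit 23 holds the leading 1.
        shift = 24 - fraction.bit_length()
        return sign, -126 - shift, fraction << shift
    if exp_field == 0:
        return sign, -126, 0  # zero has no leading 1 to normalize
    # Already normal: restore the implicit leading 1.
    return sign, exp_field - 127, fraction | 0x800000


def to_float(sign, exponent, significand):
    """Decode the tuple back to a Python float for checking."""
    magnitude = math.ldexp(significand, exponent - 23)
    return -magnitude if sign else magnitude


# Multiply a normalized denormal by a normal number, then add a partial sum.
a = to_float(*normalize_float32(2**-140))
b = to_float(*normalize_float32(3.0))
addition_result = 0.0 + a * b
```

Once every operand has a leading 1, the multiplier and adder need only one data path for the significand, at the cost of a wider exponent.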
