SIGNED MULTIPLICATION USING UNSIGNED MULTIPLIER WITH DYNAMIC FINE-GRAINED OPERAND ISOLATION

    Publication No.: US20210141603A1

    Publication date: 2021-05-13

    Application No.: US17151115

    Filing date: 2021-01-15

    Abstract: An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), either the first multiplier or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.
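
The selection rule in the abstract above can be sketched in software. This is only an illustration under stated assumptions, not the patented circuit: the function name `multiply`, the default N = 8, the labels "first"/"second"/"third", the operand swap in the mixed case, and the way the three partial products are recombined are all hypothetical choices. The sketch handles unsigned operands only.

```python
def multiply(a: int, b: int, n: int = 8):
    """Unsigned N x N multiply that reports which sub-multipliers would be active.

    Assumed decomposition (not taken from the patent text):
      first  : N/2 x N    multiplier
      second : N/2 x N/2  multiplier
      third  : N/2 x N/2  multiplier
    Returns (product, set of active multiplier labels).
    """
    half = n // 2
    limit = 1 << half          # 2^(N/2): operands below this fit in N/2 bits
    mask = limit - 1
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)

    # A zero operand needs no multiplier at all.
    if a == 0 or b == 0:
        return 0, set()

    # Both operands small: a single N/2 x N/2 multiplier suffices.
    if a < limit and b < limit:
        return a * b, {"second"}

    # One small, one large: route the small operand to the N/2-bit port
    # of the N/2 x N multiplier (operand swap is an assumption).
    if a < limit or b < limit:
        small, large = (a, b) if a < limit else (b, a)
        return small * large, {"first"}

    # Both large: split a into halves and use all three multipliers.
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    p1 = a_hi * b        # first:  N/2 x N
    p2 = a_lo * b_hi     # second: N/2 x N/2
    p3 = a_lo * b_lo     # third:  N/2 x N/2
    return (p1 << half) + (p2 << half) + p3, {"first", "second", "third"}
```

For example, `multiply(3, 5)` activates only one small multiplier, while `multiply(200, 100)` needs all three because both operands are at least 2^(N/2) = 16.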

    ACCELERATE NEURAL NETWORKS WITH COMPRESSION AT DIFFERENT LEVELS

    Publication No.: US20230153586A1

    Publication date: 2023-05-18

    Application No.: US17578428

    Filing date: 2022-01-18

    CPC classification number: G06N3/063 G06F5/01 G06F7/5443

    Abstract: A neural network accelerator includes 2n multiplier circuits, 2n shifter circuits and an adder tree circuit. Each respective multiplier circuit multiplies a first value by a second value to output a first product value. Each respective first value is represented by a first predetermined number of bits beginning at a most significant bit of the first value having a value equal to 1. Each respective second value is represented by a second predetermined number of bits, and each respective first product value is represented by a third predetermined number of bits. Each respective shifter circuit receives the first product value of a corresponding multiplier circuit and left shifts the corresponding product value by the first predetermined number of bits to form a respective second product value. The adder circuit adds each respective second product value to form a partial-sum value represented by a fourth predetermined number of bits.

    HARDWARE CHANNEL-PARALLEL DATA COMPRESSION/DECOMPRESSION

    Publication No.: US20230047025A1

    Publication date: 2023-02-16

    Application No.: US17969671

    Filing date: 2022-10-19

    Abstract: A multichannel data packer includes a plurality of two-input multiplexers and a controller. The plurality of two-input multiplexers is arranged in 2^N rows and N columns in which N is an integer greater than 1. Each input of a multiplexer in a first column receives a respective bit stream of 2^N channels of bit streams. Each respective bit stream includes a bit-stream length based on data in the bit stream. The multiplexers in a last column output 2^N channels of packed bit streams each having a same bit-stream length. The controller controls the plurality of multiplexers so that the multiplexers in the last column output the 2^N channels of bit streams that each has the same bit-stream length.

    PARTIAL SUM COMPRESSION

    Publication No.: US20220413805A1

    Publication date: 2022-12-29

    Application No.: US17407150

    Filing date: 2021-08-19

    Abstract: A method for performing a neural network operation. In some embodiments, the method includes: calculating a first plurality of products, each of the first plurality of products being the product of a weight and an activation; calculating a first partial sum, the first partial sum being the sum of the products; and compressing the first partial sum to form a first compressed partial sum.

    MIXED-PRECISION NEURAL NETWORK ACCELERATOR TILE WITH LATTICE FUSION

    Publication No.: US20220405559A1

    Publication date: 2022-12-22

    Application No.: US17463544

    Filing date: 2021-08-31

    Abstract: A neural network accelerator is disclosed that includes a multiplication unit, an adder-tree unit and an accumulator unit. The multiplication unit and the adder tree unit are configured to perform lattice-multiplication operations. The accumulator unit is coupled to an output of the adder tree to form dot-product values from the lattice-multiplication operations performed by the multiplication unit and the adder tree unit. The multiplication unit includes n multiplier units that perform lattice-multiplication-based operations and output product values. Each multiplier unit includes a plurality of multipliers. Each multiplier unit receives first and second multiplicands that each include a most significant nibble (MSN) and a least significant nibble (LSN). The multipliers in each multiplier unit receive different combinations of the MSNs and the LSNs of the multiplicands. The multiplication unit and the adder can provide mixed-precision dot-product computations.
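
The nibble-wise multiplication described above can be illustrated with a small sketch. It assumes unsigned 8-bit multiplicands and four 4×4 partial products per multiplier unit; the function names `lattice_mul8` and `lattice_dot` are hypothetical, and the actual hardware may organize the partial products and the adder tree differently.

```python
def lattice_mul8(a: int, b: int) -> int:
    """Multiply two unsigned 8-bit values using only 4-bit x 4-bit products."""
    assert 0 <= a < 256 and 0 <= b < 256
    a_msn, a_lsn = a >> 4, a & 0xF   # most / least significant nibble of a
    b_msn, b_lsn = b >> 4, b & 0xF   # most / least significant nibble of b

    # Each small multiplier sees a different MSN/LSN combination.
    hh = a_msn * b_msn
    hl = a_msn * b_lsn
    lh = a_lsn * b_msn
    ll = a_lsn * b_lsn

    # Weight each partial product by the position of its nibbles and add.
    return (hh << 8) + ((hl + lh) << 4) + ll


def lattice_dot(xs, ys) -> int:
    """Dot product accumulated from nibble-based products."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += lattice_mul8(x, y)
    return acc
```

A 4-bit operand is simply an 8-bit operand whose MSN is zero, so the same routine covers 4×4, 4×8 and 8×8 cases, which is one way to read the mixed-precision claim.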

    DUAL-SPARSE NEURAL PROCESSING UNIT WITH MULTI-DIMENSIONAL ROUTING OF NON-ZERO VALUES

    Publication No.: US20220156568A1

    Publication date: 2022-05-19

    Application No.: US17521840

    Filing date: 2021-11-08

    Abstract: A general matrix-matrix (GEMM) accelerator core includes first and second buffers, a control logic circuit, and a first processing element (PE). The first buffer receives a elements of a first matrix A of activation values. The second buffer receives b elements of a second matrix B of weight values. The control logic circuit replaces a zero-valued a element in a first column of the first buffer with a nonzero-valued a element that is within a maximum borrowing distance of a location of the zero-valued a element in the first column of the first buffer. The PE receives a elements from the first column of the first buffer including the nonzero-valued element a selected to replace the zero-valued a element and receives b elements from locations in the second buffer that correspond to locations in the first buffer from where the a elements have been received by the PE.

    SUPPORTING FLOATING POINT 16 (FP16) IN DOT PRODUCT ARCHITECTURE

    Publication No.: US20210319079A1

    Publication date: 2021-10-14

    Application No.: US17153871

    Filing date: 2021-01-20

    Abstract: A dot-product architecture and method are disclosed for calculating floating-point dot-products of two vectors. The architecture includes an array of multiplier units that each include an integer logic that multiplies integer values of corresponding elements of the two vectors, an exponent logic that adds exponent values of the corresponding elements of the two vectors to form unbiased exponent values, and a local shifter that forms a first shifted value by shifting a product-integer value by a number of bits in a predetermined direction based on a difference value between an unbiased exponent value corresponding to the product-integer value and a maximum unbiased exponent value for the array of multiplier units. An adder tree adds shifted values output from local shifters of the array of multiplier units to form an output, and an accumulator accumulates the output of the adder tree.
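
The exponent-alignment scheme in the abstract above can be sketched as follows. This is a software illustration under assumptions, not the disclosed hardware: the helper names `decompose` and `aligned_dot`, the 11-bit integer significand (matching the FP16 significand width including the implicit bit), and the use of a right shift toward the maximum exponent are choices made here for clarity.

```python
import math

SIG_BITS = 11  # assumed significand width (FP16: 10 stored bits + implicit 1)


def decompose(x: float):
    """Split x into (integer significand, exponent) with x == sig * 2**exp
    for values exactly representable in FP16."""
    m, e = math.frexp(x)                 # x = m * 2**e, 0.5 <= |m| < 1
    return int(m * (1 << SIG_BITS)), e - SIG_BITS


def aligned_dot(xs, ys) -> float:
    products = []
    for x, y in zip(xs, ys):
        sx, ex = decompose(x)
        sy, ey = decompose(y)
        # Integer logic multiplies significands; exponent logic adds exponents.
        products.append((sx * sy, ex + ey))
    if not products:
        return 0.0

    # Maximum exponent across the array of multiplier units.
    e_max = max(e for _, e in products)

    # Local shifters align every product to e_max; low-order bits may be lost.
    acc = 0
    for p, e in products:
        acc += p >> (e_max - e)          # integer sum stands in for the adder tree

    return math.ldexp(acc, e_max)        # rescale the integer sum
```

With `xs = [1.5, 2.0, -0.5]` and `ys = [2.0, 0.25, 4.0]`, the aligned integer sum rescales to 1.5. Products with much smaller exponents than the maximum lose low-order bits in the shift, which is the accuracy trade-off inherent in this kind of alignment.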

    MIXED-PRECISION NEURAL PROCESSING UNIT (NPU) USING SPATIAL FUSION WITH LOAD BALANCING

    Publication No.: US20210312325A1

    Publication date: 2021-10-07

    Application No.: US16898433

    Filing date: 2020-06-10

    Abstract: According to one general aspect, an apparatus may include a machine learning system. The machine learning system may include a precision determination circuit configured to: determine a precision level of data, and divide the data into a data subdivision. The machine learning system may exploit sparsity during the computation of each subdivision. The machine learning system may include a load balancing circuit configured to select a load balancing technique, wherein the load balancing technique includes alternately loading the computation circuit with at least a first data/weight subdivision combination and a second data/weight subdivision combination. The load balancing circuit may be configured to load a computation circuit with a selected data subdivision and a selected weight subdivision based, at least in part, upon the load balancing technique. The machine learning system may include a computation circuit configured to compute a partial computation result based, at least in part, upon the selected data subdivision and the weight subdivision.

    SIGNED MULTIPLICATION USING UNSIGNED MULTIPLIER WITH DYNAMIC FINE-GRAINED OPERAND ISOLATION

    Publication No.: US20200150924A1

    Publication date: 2020-05-14

    Application No.: US16276582

    Filing date: 2019-02-14

    Abstract: An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), either the first multiplier or the second and third multipliers are used to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.
