MIXING SPARSITY COMPRESSION
    1.
    Invention Application

    Publication (Announcement) No.: US20230100930A1

    Publication (Announcement) Date: 2023-03-30

    Application No.: US17449576

    Filing Date: 2021-09-30

    Abstract: Techniques for compressing a neural network model by mixing compression ratios (sparsity patterns) are described. The weight tensor of a neural network model is divided into weight groups. The pruning cost of compressing the weight values according to a compression ratio is determined for each weight group, and a pruning cost distribution for the compression ratio is generated from the pruning costs of the weight groups. A cost threshold can then be selected from the pruning cost distribution, and weight groups having a pruning cost below the selected cost threshold are compressed according to the compression ratio. The remaining weight groups can be compressed using one or more less aggressive compression ratios. The cost threshold can be adjusted to tune the overall sparsity and accuracy of the compressed neural network.
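The mixed-sparsity flow in this abstract (group the weight tensor, score each group's pruning cost for an aggressive compression ratio, derive a cost threshold from the distribution, and fall back to a milder ratio for costly groups) can be sketched in software. This is an illustrative sketch, not the patented implementation: the magnitude-based cost metric, the group size, the 2-of-8 / 4-of-8 keep ratios, and the percentile-based threshold are all assumptions chosen for clarity.

```python
import numpy as np

def pruning_cost(group, keep):
    # Assumed cost metric: total magnitude removed when only the
    # `keep` largest-magnitude weights in the group survive.
    mags = np.sort(np.abs(group))
    return mags[:len(group) - keep].sum()

def mixed_sparsity_compress(weights, group_size=8, aggressive_keep=2,
                            fallback_keep=4, cost_percentile=50.0):
    """Prune low-cost groups at the aggressive ratio and the rest at a
    less aggressive one. `cost_percentile` tunes the threshold, trading
    overall sparsity against accuracy."""
    groups = weights.flatten().reshape(-1, group_size)
    # Pruning-cost distribution for the aggressive compression ratio.
    costs = np.array([pruning_cost(g, aggressive_keep) for g in groups])
    threshold = np.percentile(costs, cost_percentile)
    out = np.zeros_like(groups)
    for i, g in enumerate(groups):
        keep = aggressive_keep if costs[i] <= threshold else fallback_keep
        idx = np.argsort(np.abs(g))[-keep:]  # indices of surviving weights
        out[i, idx] = g[idx]
    return out.reshape(weights.shape)
```

Raising `cost_percentile` pushes more groups to the aggressive ratio (higher sparsity, more accuracy risk); lowering it does the opposite, which mirrors the tunable cost threshold described in the abstract.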

    Throughput increase for tensor operations

    Publication (Announcement) No.: US12099840B1

    Publication (Announcement) Date: 2024-09-24

    Application No.: US18185236

    Filing Date: 2023-03-16

    CPC classification number: G06F9/30018 G06F9/30032

    Abstract: A technique for performing a tensor operation includes inputting concatenated data words of a first input tensor and concatenated data words of a second input tensor into a compute channel having a plurality of compute stages coupled in series. The concatenated data words of the first input tensor and the second input tensor represented in a first datatype can be converted into data elements represented in a second datatype using a first subset of the compute stages. A binary operation can be performed on each data element represented in the second datatype from the first input tensor with a corresponding data element represented in the second datatype from the second input tensor to generate output data elements of an output tensor represented in the second datatype using a second subset of the compute stages. The output data elements of the output tensor can then be outputted from the compute channel.
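The two-phase channel described here (a first subset of stages converts concatenated data words into elements of a second datatype, a second subset applies the binary operation) can be modeled functionally. A minimal sketch, assuming 32-bit words that each concatenate two FP16 elements converted to FP32 before the elementwise operation; the word width and datatypes are illustrative choices, not taken from the patent.

```python
import struct

def tensor_binary_op(words_a, words_b, op):
    """Model of the compute channel: stage subset 1 unpacks each 32-bit
    word into two FP16 elements and converts them to float; stage subset
    2 applies `op` to corresponding elements of the two tensors."""
    def unpack(words):
        elems = []
        for w in words:
            for half in (w & 0xFFFF, (w >> 16) & 0xFFFF):
                # '<e' is the struct format for a little-endian FP16 value.
                elems.append(struct.unpack('<e', half.to_bytes(2, 'little'))[0])
        return elems
    a, b = unpack(words_a), unpack(words_b)
    return [op(x, y) for x, y in zip(a, b)]
```

Feeding concatenated words rather than individual elements is what raises throughput: each word entering the channel carries multiple tensor elements, so the serial stage pipeline produces several output elements per input word.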

    COMPUTE ENGINE WITH TRANSPOSE CIRCUITRY
    4.
    Invention Publication

    Publication (Announcement) No.: US20240103813A1

    Publication (Announcement) Date: 2024-03-28

    Application No.: US17934145

    Filing Date: 2022-09-21

    CPC classification number: G06F7/768 G06F7/57 G06F17/16

    Abstract: An integrated circuit that combines transpose and compute operations may include a transpose circuit coupled to a set of compute channels. Each compute channel may include multiple arithmetic logic unit (ALU) circuits coupled in series. The transpose circuit is operable to receive an input tensor, transpose the input tensor, and output a transposed tensor to the set of compute channels. The set of compute channels is operable to generate outputs in parallel, with each of the outputs being generated from a corresponding vector of the transposed tensor.
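Functionally, the circuit fuses a transpose with per-vector computation: each row of the transposed tensor (a column of the input) is routed to its own compute channel, and the channels produce their outputs in parallel. A software sketch of that dataflow, with a per-channel sum reduction standing in for the chained-ALU computation (the reduction choice is an assumption for illustration):

```python
import numpy as np

def transpose_then_compute(x, channel_op=np.sum):
    """Model of the fused unit: the transpose circuit emits x.T, and each
    compute channel reduces one vector (row) of the transposed tensor.
    In hardware the per-channel loop body runs in parallel channels."""
    xt = x.T
    return np.array([channel_op(vec) for vec in xt])
```

Fusing the transpose into the compute path avoids a separate pass that writes the transposed tensor back to memory before the column-wise operation can start.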
