DNNS ACCELERATION WITH BLOCK-WISE N:M STRUCTURED WEIGHT SPARSITY

    Publication No.: US20240160483A1

    Publication date: 2024-05-16

    Application No.: US18097200

    Filing date: 2023-01-13

    CPC classification number: G06F9/5027 G06F9/544

    Abstract: An accelerator core includes first and second buffers and at least one group of k processing elements (PEs). The first buffer receives at least one group of block-wise sparsified first elements. A block size (k,c) of each group of block-wise sparsified first elements includes k rows and c columns, in which k is greater than or equal to 2, k times p equals K, and c times q equals C, where K is an output channel dimension of a tensor of first elements, C is a number of input channels of the tensor of first elements, and p and q are integers. The second buffer receives second elements. Each respective group of PEs receives k rows of first elements from the block of first elements corresponding to that group, and receives the second elements that correspond to the first elements received from the first buffer.
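
The block constraints in the abstract (k >= 2, k times p equals K, c times q equals C) can be illustrated with a small sketch that builds a block-wise N:M mask for a K-by-C weight matrix. The abstract does not say how the kept weights are chosen, so the selection rule here is an assumption: within each (k, c) block, the k rows share one pattern that keeps the n columns with the largest summed magnitude in every group of m consecutive columns. The function name `blockwise_nm_mask` and the parameters `n` and `m` are hypothetical.

```python
def blockwise_nm_mask(weights, k, c, n, m):
    """Return a 0/1 mask for a K x C weight matrix given as a list of rows.

    Assumption (not taken from the abstract): the k rows of a block share one
    N:M pattern, chosen by summed absolute value per column.
    """
    K, C = len(weights), len(weights[0])
    if k < 2 or K % k or C % c or c % m or not 0 < n <= m:
        raise ValueError("need k >= 2, K % k == 0, C % c == 0, c % m == 0, 0 < n <= m")

    mask = [[0] * C for _ in range(K)]
    for r0 in range(0, K, k):                # p = K // k blocks along the rows
        for c0 in range(0, C, c):            # q = C // c blocks along the columns
            for g0 in range(c0, c0 + c, m):  # groups of m columns inside the block
                # Score each column of the group over all k rows of the block.
                scores = [
                    sum(abs(weights[r][g0 + j]) for r in range(r0, r0 + k))
                    for j in range(m)
                ]
                # Keep the n highest-scoring columns; ties go to the lower index.
                keep = sorted(range(m), key=lambda j: -scores[j])[:n]
                for r in range(r0, r0 + k):
                    for j in keep:
                        mask[r][g0 + j] = 1
    return mask


# Example: K = 4, C = 8, block size (k, c) = (2, 4), 2:4 sparsity.
W = [[float(r * 8 + col + 1) for col in range(8)] for r in range(4)]
mask = blockwise_nm_mask(W, k=2, c=4, n=2, m=4)
```

Because the rows of a block share one pattern under this assumption, a group of k PEs could reuse the same column indices when fetching the matching second elements, which is one plausible reading of why the abstract pairs each block with a group of k PEs.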

    MIXED-PRECISION NEURAL PROCESSING UNIT (NPU) USING SPATIAL FUSION WITH LOAD BALANCING

    Publication No.: US20210312325A1

    Publication date: 2021-10-07

    Application No.: US16898433

    Filing date: 2020-06-10

    Abstract: According to one general aspect, an apparatus may include a machine learning system. The machine learning system may include a precision determination circuit configured to: determine a precision level of data, and divide the data into a data subdivision. The machine learning system may exploit sparsity during the computation of each subdivision. The machine learning system may include a load balancing circuit configured to select a load balancing technique, wherein the load balancing technique includes alternately loading the computation circuit with at least a first data/weight subdivision combination and a second data/weight subdivision combination. The load balancing circuit may be configured to load a computation circuit with a selected data subdivision and a selected weight subdivision based, at least in part, upon the load balancing technique. The machine learning system may include a computation circuit configured to compute a partial computation result based, at least in part, upon the selected data subdivision and the weight subdivision.
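
The idea of dividing data into subdivisions and computing partial results per data/weight subdivision pair can be sketched in a few lines. The abstract does not give bit widths, so this sketch assumes unsigned 8-bit operands split into two 4-bit subdivisions, and it skips any pair that contains a zero subdivision as a simple stand-in for exploiting sparsity. The names `split_nibbles` and `fused_multiply` are hypothetical, and the load balancing circuit is not modeled.

```python
def split_nibbles(x):
    """Split an unsigned 8-bit integer into (low, high) 4-bit subdivisions."""
    if not 0 <= x <= 0xFF:
        raise ValueError("expected an unsigned 8-bit value")
    return x & 0xF, x >> 4


def fused_multiply(data, weight):
    """Multiply two unsigned 8-bit values from 4-bit partial products.

    Returns (product, number of partial products actually computed).
    """
    total = 0
    computed = 0
    for i, d in enumerate(split_nibbles(data)):
        for j, w in enumerate(split_nibbles(weight)):
            if d == 0 or w == 0:
                continue  # a zero subdivision contributes nothing, so skip it
            # Subdivision i of data and j of weight carry a 4 * (i + j) bit shift.
            total += (d * w) << (4 * (i + j))
            computed += 1
    return total, computed


# 0x0C has a zero high subdivision, so only 2 of the 4 partial products run.
product, computed = fused_multiply(0xAB, 0x0C)
```

The number of partial products varies with the operand values, which gives some intuition for why a scheme like this would benefit from balancing the subdivision pairs across computation circuits.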
