EFFICIENCY OF VISION TRANSFORMERS WITH ADAPTIVE TOKEN PRUNING

Publication number: US20230368494A1

Publication date: 2023-11-16

Application number: US17978959

Filing date: 2022-11-01

    Abstract: A system and a method are disclosed for training a vision transformer. A token distillation loss of an input image based on a teacher network classification token and a token importance score of a student network (the vision transformer during training) are determined at a pruning layer of the vision transformer. When a current epoch number is odd, sparsification of tokens of the input image is skipped and the dense input image is processed by layers that are subsequent to the pruning layer. When the current epoch number is even, tokens of the input image are pruned at the pruning layer and processed by layers that are subsequent to the pruning layer. A label loss and a total loss for the input image are determined by the subsequent layers and the student network is updated.
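The alternating schedule described above can be sketched in a few lines. This is a hypothetical illustration, not the patented implementation; the function name, the `keep_ratio` parameter, and the score ranking are assumptions introduced for clarity.

```python
def select_tokens(tokens, importance, epoch, keep_ratio=0.5):
    """Decide which tokens pass to the layers after the pruning layer.

    tokens:     sequence of token embeddings
    importance: per-token importance scores from the student network
    epoch:      current training epoch number
    """
    if epoch % 2 == 1:
        # Odd epoch: skip sparsification; forward the dense token set.
        return list(tokens)
    # Even epoch: prune, keeping only the highest-scoring tokens.
    n_keep = max(1, int(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)),
                    key=lambda i: importance[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # preserve original token order
    return [tokens[i] for i in kept]
```

Alternating dense and sparse epochs this way lets the loss on the full token set and the loss on the pruned set both inform the same student weights.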

    ACCELERATE NEURAL NETWORKS WITH COMPRESSION AT DIFFERENT LEVELS

Publication number: US20230153586A1

Publication date: 2023-05-18

Application number: US17578428

Filing date: 2022-01-18

    CPC classification number: G06N3/063 G06F5/01 G06F7/5443

Abstract: A neural network accelerator includes 2n multiplier circuits, 2n shifter circuits and an adder tree circuit. Each respective multiplier circuit multiplies a first value by a second value to output a first product value. Each respective first value is represented by a first predetermined number of bits beginning at a most significant bit of the first value having a value equal to 1. Each respective second value is represented by a second predetermined number of bits, and each respective first product value is represented by a third predetermined number of bits. Each respective shifter circuit receives the first product value of a corresponding multiplier circuit and left shifts the corresponding product value by the first predetermined number of bits to form a respective second product value. The adder tree circuit adds each respective second product value to form a partial-sum value represented by a fourth predetermined number of bits.
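One plausible software model of this datapath is below: each first value is truncated to a fixed-width window starting at its leading 1 bit, the narrow values are multiplied, and the product is left-shifted to restore the magnitude dropped by the window. The function names, the 4-bit window width, and the exact shift amount are assumptions made for illustration; they are not taken from the claims.

```python
def windowed_multiply(first, second, window_bits=4):
    """Approximate first * second using only the top `window_bits`
    bits of `first`, counted from its most significant 1 bit."""
    if first == 0:
        return 0
    msb = first.bit_length() - 1           # position of the leading 1
    shift = max(0, msb + 1 - window_bits)  # low bits dropped by the window
    truncated = first >> shift             # windowed first value
    product = truncated * second           # narrow multiplier circuit
    return product << shift                # shifter restores the magnitude

def partial_sum(firsts, seconds, window_bits=4):
    """Adder-tree stage: sum the restored products."""
    return sum(windowed_multiply(a, b, window_bits)
               for a, b in zip(firsts, seconds))
```

For example, with a 4-bit window, `windowed_multiply(53, 3)` keeps `0b1101` of `0b110101`, yielding 156 against the exact product 159; the narrower multiplier trades a small approximation error for cheaper hardware.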

    PARTIAL SUM COMPRESSION
Invention Application

Publication number: US20220413805A1

Publication date: 2022-12-29

Application number: US17407150

Filing date: 2021-08-19

Abstract: A method for performing a neural network operation. In some embodiments, the method includes: calculating a first plurality of products, each of the first plurality of products being the product of a weight and an activation; calculating a first partial sum, the first partial sum being the sum of the products; and compressing the first partial sum to form a first compressed partial sum.
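The three claimed steps map directly onto a short sketch. The compression step here, dropping low-order bits, is one illustrative choice; the claim does not fix a specific compression scheme, and the `drop_bits` parameter is an assumption.

```python
def compressed_partial_sum(weights, activations, drop_bits=4):
    """Sketch of the claimed flow: products -> partial sum -> compression."""
    # Step 1: multiply each weight by its activation.
    products = [w * a for w, a in zip(weights, activations)]
    # Step 2: accumulate the products into a partial sum.
    psum = sum(products)
    # Step 3: compress the partial sum (here, truncate low-order bits).
    return psum >> drop_bits
```

Compressing the partial sum before it is stored or moved reduces the width of the accumulator datapath, which is typically the motivation for such a step in an accelerator.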
