SELF-PRUNING NEURAL NETWORKS FOR WEIGHT PARAMETER REDUCTION

    Publication No.: US20220129756A1

    Publication Date: 2022-04-28

    Application No.: US17572625

    Application Date: 2022-01-10

    Abstract: A technique to prune weights of a neural network using an analytic threshold function h(w) provides a neural network having weights that have been optimally pruned. The neural network includes a plurality of layers in which each layer includes a set of weights w associated with the layer that enhance a speed performance of the neural network, an accuracy of the neural network, or a combination thereof. Each set of weights is based on a cost function C that has been minimized by back-propagating an output of the neural network in response to input training data. The cost function C is also minimized based on a derivative of the cost function C with respect to a first parameter of the analytic threshold function h(w) and on a derivative of the cost function C with respect to a second parameter of the analytic threshold function h(w).
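
    The key property here is that the threshold is part of the model: because h(w) is analytic, the cost C can be differentiated with respect to its two parameters, so the network learns which weights to drive to zero during ordinary training. A minimal sketch of such a layer follows; the sigmoid form of h(w) and the parameter names alpha and beta are illustrative assumptions, since the abstract only states that h(w) has a first and a second parameter.

```python
# Minimal sketch of a self-pruning layer built around an analytic threshold
# gate h(w). The sigmoid form and the parameters alpha (threshold level) and
# beta (transition sharpness) are assumptions for illustration; the abstract
# only states that h(w) has two trainable parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfPruningLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.alpha = nn.Parameter(torch.tensor(0.05))  # first parameter of h(w)
        self.beta = nn.Parameter(torch.tensor(50.0))   # second parameter of h(w)

    def h(self, w):
        # Smooth gate: ~0 where |w| is well below alpha, ~1 well above it.
        return torch.sigmoid(self.beta * (w.abs() - self.alpha))

    def forward(self, x):
        # Self-pruned weights: small-magnitude weights are driven toward zero,
        # and gradients of the cost C reach alpha and beta through the gate.
        return F.linear(x, self.weight * self.h(self.weight), self.bias)
```

    Because the gate is smooth, gradients flow through it during back-propagation, so the same training pass that fits the weights also adjusts how aggressively each layer prunes.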

    JOINTLY PRUNING AND QUANTIZING DEEP NEURAL NETWORKS

    Publication No.: US20200293893A1

    Publication Date: 2020-09-17

    Application No.: US16396619

    Application Date: 2019-04-26

    Abstract: A system and a method generate a neural network that includes at least one layer having weights and output feature maps that have been jointly pruned and quantized. The weights of the layer are pruned using an analytic threshold function. Each weight remaining after pruning is quantized based on a weighted average of a quantization and dequantization of the weight for all quantization levels to form quantized weights for the layer. Output feature maps of the layer are generated based on the quantized weights of the layer. Each output feature map of the layer is quantized based on a weighted average of a quantization and dequantization of the output feature map for all quantization levels. Parameters of the analytic threshold function, the weighted average of all quantization levels of the weights and the weighted average of each output feature map of the layer are updated using a cost function.
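
    Both operations in this abstract can be expressed as differentiable surrogates so the cost function can update them jointly: an analytic gate prunes, and a soft (weighted-average) quantizer replaces hard rounding. A minimal sketch of the weight path follows; the softmax weighting over quantization levels, the uniform level grid, and the specific gate form are assumptions for illustration.

```python
# Minimal sketch of the weight path: analytic-threshold pruning followed by a
# weighted average of the quantized/dequantized weight over all quantization
# levels. The softmax weighting and the uniform grid are assumed details.
import torch

def soft_quantize(x, levels, temperature=10.0):
    # Weighted average over all dequantized levels; nearer levels weigh more.
    dist = (x.unsqueeze(-1) - levels) ** 2                 # (..., L)
    weights = torch.softmax(-temperature * dist, dim=-1)   # per-level weights
    return (weights * levels).sum(dim=-1)

def prune_then_quantize(w, alpha=0.05, beta=25.0, num_bits=4):
    gate = torch.sigmoid(beta * (w.abs() - alpha))         # analytic threshold pruning
    pruned = w * gate
    levels = torch.linspace(pruned.min().item(), pruned.max().item(), 2 ** num_bits)
    return soft_quantize(pruned, levels)

w = torch.randn(64, 64)
w_q = prune_then_quantize(w)   # quantized weights used to compute the output feature maps
```

    Because every step is differentiable, the gate parameters and the per-level weighting can all be updated by back-propagating the cost function, as the abstract states.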

    METHODS AND ALGORITHMS OF REDUCING COMPUTATION FOR DEEP NEURAL NETWORKS VIA PRUNING

    Publication No.: US20190050735A1

    Publication Date: 2019-02-14

    Application No.: US15724267

    Application Date: 2017-10-03

    Abstract: A method is disclosed to reduce the computational load of a deep neural network. The number of multiply-accumulate (MAC) operations is determined for each layer of the deep neural network, and a pruning error allowance per weight is determined based on the computational load of each layer. For each layer: a threshold estimator is initialized, the weights of the layer are pruned based on a standard deviation of all weights within the layer, and a pruning error per weight is determined. If the pruning error per weight exceeds a predetermined threshold, the threshold estimator is updated for the layer, the weights are repruned using the updated estimator, and the pruning error per weight is re-determined until it is less than the threshold. The deep neural network is then retrained.
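
    Unlike the trained-gate approach above, this is an iterative per-layer procedure: prune against a multiple of the layer's weight standard deviation, check the resulting per-weight error against an allowance tied to the layer's MAC count, and adjust the estimator until the error fits. A minimal sketch follows; the MAC-proportional allowance and the multiplicative update of the estimator are assumptions, as the abstract fixes only the overall structure.

```python
# Minimal sketch of the per-layer pruning loop. The MAC-proportional error
# allowance and the multiplicative update of the threshold estimator are
# assumed details; the abstract specifies only std-based pruning, a per-weight
# error check, threshold refinement, and retraining afterwards.
import numpy as np

def prune_layer(weights, layer_macs, total_macs, base_allowance=1e-3, max_iters=20):
    # Pruning error allowance per weight, scaled by this layer's share of the
    # network's multiply-accumulate (MAC) load.
    allowance = base_allowance * (layer_macs / total_macs)
    estimator = 0.5                                   # threshold estimator, in std units
    pruned = weights
    for _ in range(max_iters):
        threshold = estimator * np.std(weights)       # prune relative to the layer's std
        pruned = np.where(np.abs(weights) < threshold, 0.0, weights)
        error_per_weight = np.mean(np.abs(pruned - weights))
        if error_per_weight < allowance:
            break                                     # error fits: keep this pruning
        estimator *= 0.8                              # update the estimator and re-prune
    return pruned
```

    After every layer has been pruned this way, the network is retrained to recover accuracy, per the final step of the abstract.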

    JOINTLY PRUNING AND QUANTIZING DEEP NEURAL NETWORKS

    Publication No.: US20230004813A1

    Publication Date: 2023-01-05

    Application No.: US17943176

    Application Date: 2022-09-12

    Abstract: A system and a method generate a neural network that includes at least one layer having weights and output feature maps that have been jointly pruned and quantized. The weights of the layer are pruned using an analytic threshold function. Each weight remaining after pruning is quantized based on a weighted average of a quantization and dequantization of the weight for all quantization levels to form quantized weights for the layer. Output feature maps of the layer are generated based on the quantized weights of the layer. Each output feature map of the layer is quantized based on a weighted average of a quantization and dequantization of the output feature map for all quantization levels. Parameters of the analytic threshold function, the weighted average of all quantization levels of the weights and the weighted average of each output feature map of the layer are updated using a cost function.
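
    This continuation shares its abstract with US20200293893A1 above; the sketch there covered the weight path, so the sketch below illustrates the complementary activation path: each output feature map is replaced by a weighted average of its quantized/dequantized values over all quantization levels, keeping the operation differentiable. The uniform level grid and the learnable softmax temperature are assumptions for illustration.

```python
# Minimal sketch of soft quantization of an output feature map: a weighted
# average over all dequantized activation levels. The ReLU-style level range
# and the learnable softmax temperature are assumed details.
import torch

class SoftQuantActivation(torch.nn.Module):
    def __init__(self, num_bits=4, act_range=4.0, temperature=10.0):
        super().__init__()
        self.register_buffer("levels", torch.linspace(0.0, act_range, 2 ** num_bits))
        # Updated by the cost function together with the other parameters.
        self.temperature = torch.nn.Parameter(torch.tensor(temperature))

    def forward(self, fmap):
        dist = (fmap.unsqueeze(-1) - self.levels) ** 2         # (..., L)
        w = torch.softmax(-self.temperature * dist, dim=-1)    # per-level weights
        return (w * self.levels).sum(dim=-1)                   # weighted average

fmap = torch.relu(torch.randn(1, 8, 16, 16))   # an example output feature map
fmap_q = SoftQuantActivation()(fmap)
```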

    SELF-PRUNING NEURAL NETWORKS FOR WEIGHT PARAMETER REDUCTION

    Publication No.: US20190180184A1

    Publication Date: 2019-06-13

    Application No.: US15894921

    Application Date: 2018-02-12

    Abstract: A technique to prune weights of a neural network using an analytic threshold function h(w) provides a neural network having weights that have been optimally pruned. The neural network includes a plurality of layers in which each layer includes a set of weights w associated with the layer that enhance a speed performance of the neural network, an accuracy of the neural network, or a combination thereof. Each set of weights is based on a cost function C that has been minimized by back-propagating an output of the neural network in response to input training data. The cost function C is also minimized based on a derivative of the cost function C with respect to a first parameter of the analytic threshold function h(w) and on a derivative of the cost function C with respect to a second parameter of the analytic threshold function h(w).
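
    This entry shares its abstract with US20220129756A1 above; to complement the layer sketch there, the sketch below shows how the derivatives of the cost C with respect to the first and second parameters of h(w) fall out of ordinary back-propagation once h(w) is analytic. The quadratic toy cost and the sigmoid form of h(w) are illustrative assumptions.

```python
# Minimal sketch of minimizing a cost C jointly over the weights and the two
# parameters of the analytic threshold function h(w). The toy cost and the
# sigmoid form of h(w) are assumptions for illustration.
import torch

w = torch.randn(256, requires_grad=True)         # layer weights
alpha = torch.tensor(0.05, requires_grad=True)   # first parameter of h(w)
beta = torch.tensor(25.0, requires_grad=True)    # second parameter of h(w)
target = torch.randn(256)                        # stands in for training data

opt = torch.optim.SGD([w, alpha, beta], lr=0.01)
for step in range(100):
    h = torch.sigmoid(beta * (w.abs() - alpha))  # analytic threshold function
    pruned_w = w * h                             # self-pruned weights
    C = torch.mean((pruned_w - target) ** 2)     # stand-in cost function C
    opt.zero_grad()
    C.backward()    # yields dC/dw, dC/dalpha, and dC/dbeta in one pass
    opt.step()
```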
