APPARATUS AND METHOD FOR GENERATING EFFICIENT CONVOLUTION

    Publication Number: US20180349317A1

    Publication Date: 2018-12-06

    Application Number: US15611342

    Application Date: 2017-06-01

    CPC classification number: G06F17/15; G06F7/5443; G06F7/556; G06N3/0454; G06N3/063

    Abstract: An apparatus and a method are provided. The apparatus includes a polynomial generator, including an input and an output; a first matrix generator, including an input connected to the output of the polynomial generator, and an output; a second matrix generator, including an input connected to the output of the first matrix generator, and an output; a third matrix generator, including a first input connected to the output of the first matrix generator, a second input connected to the output of the second matrix generator, and an output; and a convolution generator, including an input connected to the output of the third matrix generator, and an output.
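
    The abstract does not name the underlying algorithm, but a chain of a polynomial generator feeding transform-matrix generators that in turn feed a convolution generator matches the structure of Winograd/Toom-Cook fast convolution. The sketch below is a minimal numerical illustration of that construction for F(2, 3) using the standard transform matrices; the matrices, names, and the Winograd interpretation itself are assumptions, not taken from the patent.

        import numpy as np

        # Standard Winograd F(2, 3) transform matrices (illustrative stand-ins for the
        # outputs of the patent's matrix generators; not taken from the patent itself).
        B_T = np.array([[1,  0, -1,  0],
                        [0,  1,  1,  0],
                        [0, -1,  1,  0],
                        [0,  1,  0, -1]], dtype=float)   # input transform
        G = np.array([[1.0,  0.0, 0.0],
                      [0.5,  0.5, 0.5],
                      [0.5, -0.5, 0.5],
                      [0.0,  0.0, 1.0]])                 # filter transform
        A_T = np.array([[1, 1,  1,  0],
                        [0, 1, -1, -1]], dtype=float)    # output transform

        def winograd_f23(d, g):
            """Two outputs of a 3-tap correlation using 4 multiplies instead of 6."""
            m = (G @ g) * (B_T @ d)   # element-wise product in the transform domain
            return A_T @ m            # map back to the output domain

        d = np.array([1.0, 2.0, 3.0, 4.0])       # 4 input samples
        g = np.array([0.5, 1.0, -1.0])           # 3 filter taps
        print(winograd_f23(d, g))                # [-0.5  0. ]
        print(np.correlate(d, g, mode="valid"))  # same result, computed directly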

    JOINTLY PRUNING AND QUANTIZING DEEP NEURAL NETWORKS

    Publication Number: US20200293893A1

    Publication Date: 2020-09-17

    Application Number: US16396619

    Application Date: 2019-04-26

    Abstract: A system and a method generate a neural network that includes at least one layer having weights and output feature maps that have been jointly pruned and quantized. The weights of the layer are pruned using an analytic threshold function. Each weight remaining after pruning is quantized based on a weighted average of a quantization and dequantization of the weight over all quantization levels to form quantized weights for the layer. Output feature maps of the layer are generated based on the quantized weights of the layer. Each output feature map of the layer is quantized based on a weighted average of a quantization and dequantization of the output feature map over all quantization levels. Parameters of the analytic threshold function, the weighted average over all quantization levels of the weights, and the weighted average of each output feature map of the layer are updated using a cost function.
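
    The abstract does not give the exact weighting used to combine the quantization levels, so the sketch below is only a hedged illustration of the idea: a smooth (analytic) threshold gate prunes small weights, and each surviving value is replaced by a weighted average of its dequantized candidates over all quantization levels, here weighted by a softmax over closeness to each level. The function names, the sigmoid gate, and the softmax weighting are assumptions.

        import numpy as np

        def soft_threshold(w, alpha=10.0, t=0.05):
            # Assumed analytic (differentiable) pruning gate: ~0 for |w| < t, ~1 otherwise.
            return 1.0 / (1.0 + np.exp(-alpha * (np.abs(w) - t)))

        def soft_quantize(x, levels, beta=20.0):
            # Weighted average of the dequantized values over all quantization levels.
            # The softmax-over-closeness weighting is an assumption for illustration.
            score = -beta * (x[..., None] - levels) ** 2
            p = np.exp(score - score.max(axis=-1, keepdims=True))
            p /= p.sum(axis=-1, keepdims=True)
            return (p * levels).sum(axis=-1)      # differentiable "quantized" value

        levels = np.linspace(-1.0, 1.0, 9)        # e.g. 9 uniform quantization levels
        w = np.random.randn(4, 4) * 0.3           # toy weight tensor for one layer
        w_jointly_pruned_quantized = soft_threshold(w) * soft_quantize(w, levels)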

    METHODS AND ALGORITHMS OF REDUCING COMPUTATION FOR DEEP NEURAL NETWORKS VIA PRUNING

    Publication Number: US20190050735A1

    Publication Date: 2019-02-14

    Application Number: US15724267

    Application Date: 2017-10-03

    Abstract: A method is disclosed to reduce the computational load of a deep neural network. A number of multiply-accumulate (MAC) operations is determined for each layer of the deep neural network. A pruning error allowance per weight is determined based on the computational load of each layer. For each layer of the deep neural network: a threshold estimator is initialized, and the weights of the layer are pruned based on a standard deviation of all weights within the layer. A pruning error per weight is determined for the layer, and if the pruning error per weight exceeds a predetermined threshold, the threshold estimator is updated for the layer, the weights of the layer are repruned using the updated threshold estimator, and the pruning error per weight is re-determined until the pruning error per weight is less than the threshold. The deep neural network is then retrained.
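
    A minimal sketch of the per-layer loop the abstract describes is given below. The exact form of the pruning-error allowance and of the threshold-estimator update are not specified in the abstract, so the MAC-proportional allowance and the multiplicative update used here are assumptions.

        import numpy as np

        def prune_layer(w, layer_macs, total_macs, error_budget,
                        k_init=1.0, shrink=0.9, max_iter=50):
            """Per-layer loop: prune by k * std(w), compare the pruning error per weight
            against the layer's allowance, and shrink the threshold estimator k until the
            error is small enough. Allowance form and update rule are assumptions."""
            allowance = error_budget * (layer_macs / total_macs) / w.size
            k = k_init                                      # threshold estimator
            for _ in range(max_iter):
                threshold = k * np.std(w)
                pruned = np.where(np.abs(w) < threshold, 0.0, w)
                error_per_weight = np.abs(pruned - w).sum() / w.size
                if error_per_weight < allowance:
                    break
                k *= shrink                                 # update estimator and reprune
            return pruned, k

        w = np.random.randn(64, 64) * 0.1                   # toy layer weights
        pruned_w, k = prune_layer(w, layer_macs=1e6, total_macs=1e7, error_budget=100.0)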

    SELF-PRUNING NEURAL NETWORKS FOR WEIGHT PARAMETER REDUCTION

    Publication Number: US20220129756A1

    Publication Date: 2022-04-28

    Application Number: US17572625

    Application Date: 2022-01-10

    Abstract: A technique to prune weights of a neural network using an analytic threshold function h(w) provides a neural network having weights that have been optimally pruned. The neural network includes a plurality of layers in which each layer includes a set of weights w associated with the layer that enhance a speed performance of the neural network, an accuracy of the neural network, or a combination thereof. Each set of weights is based on a cost function C that has been minimized by back-propagating an output of the neural network in response to input training data. The cost function C is also minimized based on a derivative of the cost function C with respect to a first parameter of the analytic threshold function h(w) and on a derivative of the cost function C with respect to a second parameter of the analytic threshold function h(w).
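
    The abstract identifies two trainable parameters of the analytic threshold function h(w) but not its closed form. The sketch below uses one plausible smooth gate, built from two sigmoids, purely to illustrate how a threshold location and a sharpness parameter can both receive gradients from the cost function C; the specific functional form is an assumption.

        import numpy as np

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        def h(w, alpha, beta):
            # Assumed analytic threshold gate: ~0 for |w| < alpha, ~1 for |w| > alpha,
            # smooth in w, alpha (first parameter), and beta (second parameter), so
            # dC/dalpha and dC/dbeta exist and can be used during back-propagation.
            return sigmoid(beta * (w - alpha)) + sigmoid(-beta * (w + alpha))

        # Effective self-pruned weight in the forward pass: w_eff = w * h(w, alpha, beta).
        w = np.linspace(-0.2, 0.2, 5)
        alpha, beta = 0.05, 100.0          # threshold location and sharpness (illustrative)
        print(w * h(w, alpha, beta))       # small weights are driven toward zero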

    NORMALIZATION METHOD FOR TRAINING DEEP NEURAL NETWORKS

    Publication Number: US20200097829A1

    Publication Date: 2020-03-26

    Application Number: US16186468

    Application Date: 2018-11-09

    Inventor: Weiran DENG

    Abstract: A system and a method to normalize a deep neural network (DNN) in which a mean of activations of the DNN is set to be equal to about 0 for a training batch size of 8 or less, and a variance of the activations of the DNN is set to be equal to about a predetermined value for the training batch size. A minimization module minimizes a sum of a network loss of the DNN plus a sum of a product of a first Lagrange multiplier times the mean of the activations squared plus a sum of a product of a second Lagrange multiplier times a quantity of the variance of the activations minus one squared.
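
    Written out, the objective the abstract describes is the network loss plus, per layer, a first Lagrange multiplier times the squared activation mean plus a second Lagrange multiplier times the squared deviation of the activation variance from its target. The sketch below computes that penalty for a list of per-layer activations; the function and argument names are assumptions.

        import numpy as np

        def normalization_penalty(activations_per_layer, lambda1, lambda2, target_var=1.0):
            # Sum over layers of lambda1_l * mean_l**2 + lambda2_l * (var_l - target_var)**2,
            # pushing each layer's activation mean toward 0 and variance toward the target.
            penalty = 0.0
            for l, a in enumerate(activations_per_layer):
                mu, var = a.mean(), a.var()
                penalty += lambda1[l] * mu ** 2 + lambda2[l] * (var - target_var) ** 2
            return penalty

        # Augmented objective (the multipliers are themselves updated during training):
        #   total_loss = network_loss + normalization_penalty(acts, lambda1, lambda2)
        acts = [np.random.randn(8, 16) for _ in range(3)]   # toy activations, batch size 8
        print(normalization_penalty(acts, lambda1=[1.0] * 3, lambda2=[1.0] * 3))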

    APPARATUS AND METHOD FOR GENERATING EFFICIENT CONVOLUTION

    Publication Number: US20190325004A1

    Publication Date: 2019-10-24

    Application Number: US16460564

    Application Date: 2019-07-02

    Abstract: A method of manufacturing an apparatus and a method of constructing an integrated circuit are provided. The method of manufacturing an apparatus includes forming the apparatus on a wafer or a package with at least one other apparatus, wherein the apparatus comprises a polynomial generator, a first matrix generator, a second matrix generator, a third matrix generator, and a convolution generator; and testing the apparatus, wherein testing the apparatus comprises testing the apparatus using one or more electrical to optical converters, one or more optical splitters that split an optical signal into two or more optical signals, and one or more optical to electrical converters.

    JOINTLY PRUNING AND QUANTIZING DEEP NEURAL NETWORKS

    Publication Number: US20230004813A1

    Publication Date: 2023-01-05

    Application Number: US17943176

    Application Date: 2022-09-12

    Abstract: A system and a method generate a neural network that includes at least one layer having weights and output feature maps that have been jointly pruned and quantized. The weights of the layer are pruned using an analytic threshold function. Each weight remaining after pruning is quantized based on a weighted average of a quantization and dequantization of the weight over all quantization levels to form quantized weights for the layer. Output feature maps of the layer are generated based on the quantized weights of the layer. Each output feature map of the layer is quantized based on a weighted average of a quantization and dequantization of the output feature map over all quantization levels. Parameters of the analytic threshold function, the weighted average over all quantization levels of the weights, and the weighted average of each output feature map of the layer are updated using a cost function.

    METHOD TO BALANCE SPARSITY FOR EFFICIENT INFERENCE OF DEEP NEURAL NETWORKS

    Publication Number: US20200097830A1

    Publication Date: 2020-03-26

    Application Number: US16186470

    Application Date: 2018-11-09

    Inventor: Weiran DENG

    Abstract: A system and a method provide balanced pruning of weights of a deep neural network (DNN), in which the weights of the DNN are partitioned into a plurality of groups, a count of the number of non-zero weights is determined for each group, a variance of the counts across the groups is determined, a loss function of the DNN is minimized using Lagrange multipliers with a constraint that the variance of the counts is equal to 0, and the weights and the Lagrange multipliers are retrained by back-propagation.
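
    A minimal sketch of the balance term described in the abstract: partition the weights into groups, count the non-zero weights in each group, and penalize the variance of those counts so that every group keeps the same number of surviving weights. A hard count is used here only to show the quantity being constrained (it is not differentiable), and the grouping and penalty form are assumptions.

        import numpy as np

        def balance_penalty(w, num_groups, lam):
            # Variance of the per-group non-zero counts; the Lagrange-multiplier
            # constraint in the abstract drives this variance toward 0.
            groups = np.array_split(w.ravel(), num_groups)
            counts = np.array([np.count_nonzero(g) for g in groups], dtype=float)
            return lam * counts.var()

        # Constrained objective (per the abstract, with the multiplier also retrained):
        #   total_loss = network_loss + balance_penalty(w, num_groups, lam)
        w = np.random.randn(1024)
        w[np.abs(w) < 0.5] = 0.0            # toy pruned weight vector
        print(balance_penalty(w, num_groups=8, lam=1.0))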

    SELF-PRUNING NEURAL NETWORKS FOR WEIGHT PARAMETER REDUCTION

    Publication Number: US20190180184A1

    Publication Date: 2019-06-13

    Application Number: US15894921

    Application Date: 2018-02-12

    Abstract: A technique to prune weights of a neural network using an analytic threshold function h(w) provides a neural network having weights that have been optimally pruned. The neural network includes a plurality of layers in which each layer includes a set of weights w associated with the layer that enhance a speed performance of the neural network, an accuracy of the neural network, or a combination thereof. Each set of weights is based on a cost function C that has been minimized by back-propagating an output of the neural network in response to input training data. The cost function C is also minimized based on a derivative of the cost function C with respect to a first parameter of the analytic threshold function h(w) and on a derivative of the cost function C with respect to a second parameter of the analytic threshold function h(w).
