Flexible, lightweight quantized deep neural networks
Abstract:
To improve the throughput and energy efficiency of Deep Neural Networks (DNNs) on customized hardware, lightweight neural networks constrain each weight to a limited combination of powers of 2. In such networks, a multiply-accumulate operation can be replaced with a single shift, or with two shifts and an add. To provide even more design flexibility, the number k of power-of-2 terms can be chosen optimally for each convolutional filter instead of being fixed across all filters. The present invention formulates the selection of k to be differentiable and describes model training for determining k-based weights on a per-filter basis. The present invention achieves higher speeds than existing lightweight neural networks with only minimal accuracy degradation, while also achieving higher computational energy efficiency in ASIC implementations.
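The core idea above can be illustrated with a minimal sketch: greedily approximate a weight as a sum of k signed powers of two, then perform the multiplication with shifts and adds only. This is an assumption-laden illustration (the greedy rounding and the function names are mine), not the patent's differentiable training procedure for selecting k.

```python
import math

def quantize_pow2(w, k):
    """Greedily approximate weight w as a sum of k signed powers of two.

    Returns a list of (sign, exponent) terms such that
    w ~= sum(sign * 2**exponent). Illustrative only; the patent instead
    learns k per filter during training.
    """
    terms = []
    residual = w
    for _ in range(k):
        if residual == 0:
            break
        sign = 1 if residual > 0 else -1
        exp = round(math.log2(abs(residual)))  # nearest power of 2
        terms.append((sign, exp))
        residual -= sign * (2.0 ** exp)
    return terms

def shift_multiply(x, terms):
    """Multiply integer x by the quantized weight using only shifts and adds."""
    acc = 0
    for sign, exp in terms:
        if exp >= 0:
            acc += sign * (x << exp)
        else:
            acc += sign * (x >> -exp)  # negative exponent: right shift
    return acc

# Example: 0.75 = 2**0 - 2**-2, so k=2 terms suffice and
# x * 0.75 becomes (x << 0) - (x >> 2).
terms = quantize_pow2(0.75, k=2)
print(terms)                  # [(1, 0), (-1, -2)]
print(shift_multiply(8, terms))  # 6, i.e. 8 * 0.75
```

With k=1 the multiply collapses to a single shift; with k=2 it costs two shifts and one add, matching the operation counts described in the abstract.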