Retraining-free methods for fast on-the-fly pruning of convolutional neural networks
[Abstract] We explore retraining-free pruning of CNNs. We propose and evaluate three model-independent methods for sparsification of model weights. Our methods are magnitude-based, efficient, and can be applied on-the-fly during model load time, which is necessary in some deployment contexts. We evaluate the effectiveness of these methods in introducing sparsity with minimal loss of inference accuracy using five state-of-the-art pretrained CNNs. The evaluation shows that the methods reduce the number of weights by up to 73% (i.e., a compression factor of 3.7x) without incurring more than 5% loss in Top-5 accuracy. These results also hold for quantized versions of the CNNs. We develop a classifier to determine which of the three methods is most suited for a given model. Finally, we employ additional fine-tuning, which is impractical in our deployment context, and show that it gains only 8% in sparsity. This indicates that our on-the-fly methods capture much of the sparsity that can be attained without retraining, yet remain efficient and straightforward to use. (C) 2019 Elsevier B.V. All rights reserved.
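The abstract does not spell out the three methods, but as a rough illustration of the general idea, a minimal magnitude-based, retraining-free pruning pass applied at load time might look like the following PyTorch sketch. The function name, the per-layer thresholding scheme, and the 70% sparsity target are hypothetical choices, not the paper's specific methods.

import torch
import torchvision

def magnitude_prune_(state_dict, sparsity=0.7):
    # Illustrative sketch: zero out the smallest-magnitude weights in
    # each weight tensor. Not the paper's exact three methods.
    with torch.no_grad():
        for name, w in state_dict.items():
            if w.dim() < 2:                # skip biases / BatchNorm params
                continue
            k = int(w.numel() * sparsity)  # number of weights to drop
            if k == 0:
                continue
            # k-th smallest |w| serves as the per-layer pruning threshold
            thresh = w.abs().flatten().kthvalue(k).values
            w.mul_((w.abs() > thresh).to(w.dtype))  # in-place masking
    return state_dict

# Applied "on the fly" at model load time, before any inference.
# (Model choice is arbitrary; the paper evaluates five pretrained CNNs.)
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.load_state_dict(magnitude_prune_(model.state_dict(), sparsity=0.7))
model.eval()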
[Publication Date] 2019-12-22
[Keywords] Deep learning; Convolutional neural networks; Sparsity; Pruning