Abstract
Deep Neural Networks (DNNs) have demonstrated tremendous success in many applications, but incur a high computational burden at inference time. The 2:4 sparsity pruning method has recently been developed to effectively compress and accelerate DNNs with little to no loss in performance. The method comprises a training phase followed by a pruning step in which 2 out of every 4 consecutive weights are eliminated to obtain a pruned matrix, which is then retrained to fine-tune the remaining weights. The accuracy of the resultant sparse network is maximized by permuting the matrix along the channel dimension so as to maximize the total magnitude of weights preserved during pruning. While earlier works have proposed heuristic methods to generate good permutations, we formalize the problem as a discrete optimization problem. In this paper, we propose four different mathematical programs to determine the optimal permutations and compare their performance on small instances using a standard solver. Further, we develop a complementary column generation scheme to handle DNNs with a realistic number of channels.
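The core objects in the abstract can be made concrete with a small sketch: 2:4 pruning zeroes the two smallest-magnitude weights in each group of four consecutive entries along a row, and a channel permutation reorders columns before pruning so that large weights do not compete within the same group. The snippet below is a minimal illustration, not the paper's formulation; the function names `prune_2_4` and `preserved_magnitude` and the toy matrix are assumptions for demonstration.

```python
import numpy as np

def prune_2_4(W):
    """Zero the 2 smallest-magnitude weights in each group of 4 along each row."""
    W = np.asarray(W, dtype=float)
    rows, cols = W.shape
    assert cols % 4 == 0, "2:4 sparsity needs the column count divisible by 4"
    groups = W.copy().reshape(rows, cols // 4, 4)
    # Indices of the two smallest |w| within each group of four.
    idx = np.argsort(np.abs(groups), axis=2)[:, :, :2]
    np.put_along_axis(groups, idx, 0.0, axis=2)
    return groups.reshape(rows, cols)

def preserved_magnitude(W, perm):
    """Total |w| retained after 2:4 pruning when channels are reordered by perm."""
    return float(np.abs(prune_2_4(np.asarray(W)[:, perm])).sum())

# Toy example: all large weights sit in the first group of four.
W = np.array([[9.0, 8.0, 7.0, 6.0, 1.0, 2.0, 3.0, 4.0]])
identity = list(range(8))
interleave = [0, 1, 4, 5, 2, 3, 6, 7]  # spread large weights across groups
print(preserved_magnitude(W, identity))    # 24.0 (keeps 9,8 and 3,4)
print(preserved_magnitude(W, interleave))  # 30.0 (keeps 9,8 and 7,6)
```

The permutation raises the preserved magnitude from 24 to 30 by separating the four large weights into different groups, which is exactly the objective the paper's mathematical programs optimize over all channel orderings.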
Acknowledgements
We acknowledge the immense contributions of Dr. Jeff Pool, NVIDIA, who introduced us to this problem and supported the numerical testing by sharing the APEX repository and his expertise. Rakesh Nagi also acknowledges NVIDIA for an equipment gift under the Applied Accelerator Program.
Cite this article
Mahajan, M., Hwu, WM. & Nagi, R. Determining optimal channel partition for 2:4 fine grained structured sparsity. Optim Lett (2024). https://doi.org/10.1007/s11590-023-02084-8