
Reprogrammable Non-Linear Circuits Using ReRAM for NN Accelerators

Published: 27 January 2024

Abstract

As artificial intelligence techniques see widespread adoption across the economy, researchers are exploring new ways to reduce the energy consumption of Neural Network (NN) applications, especially as the complexity of NNs continues to increase. Using analog Resistive RAM (ReRAM) devices to compute matrix-vector multiplication in O(1) time complexity is a promising approach, but these implementations often fail to cover the diversity of non-linearities required for modern NN applications. In this work, we propose a novel approach in which the ReRAM devices themselves can be reprogrammed to compute not only the required matrix multiplications but also the activation functions, Softmax, and pooling layers, reducing the energy consumption of complex NNs. This approach also offers more versatility than custom logic for exploring novel NN layouts. Results show that our device outperforms analog and digital field-programmable approaches by up to 8.5× in experiments on real-world human activity recognition and language modeling datasets with convolutional neural network, generative pre-trained Transformer, and long short-term memory models.
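
To make the crossbar computation concrete, the sketch below numerically emulates an idealized ReRAM array performing a matrix-vector multiply in a single analog step: weights are stored as device conductances, inputs are applied as word-line voltages, and each bit line sums the resulting currents (Ohm's law plus Kirchhoff's current law). The array size, conductance range, and the differential-pair weight encoding are illustrative assumptions for this sketch, not details taken from the article, and the model ignores wire resistance, device noise, and ADC quantization.

```python
import numpy as np

# Minimal numerical sketch of an idealized ReRAM crossbar doing one
# matrix-vector multiply in a single analog step. Array sizes, conductance
# ranges, and the differential-pair weight mapping are illustrative
# assumptions, not details from the article.

rng = np.random.default_rng(0)

n_in, n_out = 8, 4                       # word lines (inputs) x bit lines (outputs)
W = rng.normal(0.0, 1.0, (n_in, n_out))  # signed NN weights to be mapped

# Map signed weights onto two positive conductance arrays (a common
# differential-pair encoding): W ~ (G_pos - G_neg) / scale.
g_max = 1e-4                             # siemens, illustrative device limit
scale = g_max / np.abs(W).max()
G_pos = np.clip(W, 0, None) * scale
G_neg = np.clip(-W, 0, None) * scale

V = rng.uniform(0.0, 0.2, n_in)          # input activations applied as voltages

# Each bit line sums V_i * G_ij over its column (Ohm + Kirchhoff), so the
# whole multiply happens in one analog read, independent of matrix size.
I = V @ G_pos - V @ G_neg                # differential column currents

y_analog = I / scale                     # rescale currents back to weight units
y_digital = V @ W                        # reference digital result

print("analog  MVM:", np.round(y_analog, 4))
print("digital MVM:", np.round(y_digital, 4))
print("max abs error:", np.abs(y_analog - y_digital).max())
```

On this idealized model the analog and digital results agree to floating-point precision; in real hardware, ADC resolution, wire resistance, and device variability bound the achievable accuracy, and the non-linear stages (activations, Softmax, pooling) that this work maps onto reprogrammed ReRAM would follow the analog read.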




Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 17, Issue 1
March 2024, 446 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3613534
Editor: Deming Chen

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 27 January 2024
          • Online AM: 10 October 2023
          • Accepted: 15 August 2023
          • Revised: 6 July 2023
          • Received: 22 February 2023

