Abstract
As artificial intelligence techniques spread throughout the economy, researchers are exploring new ways to reduce the energy consumption of Neural Network (NN) applications, especially as NN complexity continues to increase. Using analog Resistive RAM (ReRAM) devices to compute matrix-vector multiplication in O(1) time complexity is a promising approach, but such implementations often fail to cover the diversity of non-linearities required by modern NN applications. In this work, we propose a novel approach in which the ReRAM devices themselves are reprogrammed to compute not only the required matrix multiplications but also the activation functions, Softmax, and pooling layers, reducing energy consumption in complex NNs. Compared to custom logic, this approach offers greater versatility for exploring novel NN layouts. Results show that our device outperforms analog and digital field-programmable approaches by up to 8.5× in experiments on real-world human activity recognition and language modeling datasets with convolutional neural network, generative pre-trained Transformer, and long short-term memory models.
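As a point of reference, the sketch below illustrates the standard analog-crossbar principle the abstract relies on: weights stored as ReRAM conductances produce every output current of a matrix-vector product in a single read step, via Ohm's law and Kirchhoff's current law. This is an idealized illustration under our own assumptions (the names `G`, `V`, and `I` and the value ranges are ours), not the authors' device, circuit, or programming scheme.

```python
import numpy as np

# Idealized ReRAM crossbar model (illustrative assumption, not the paper's circuit):
# each weight is stored as a conductance G[i, j]; applying read voltages V[i] to the
# rows makes every column j sum its currents G[i, j] * V[i] in parallel, so the full
# matrix-vector product I = G.T @ V is available after one read step.

rng = np.random.default_rng(0)
G = rng.uniform(1e-6, 1e-4, size=(4, 3))  # conductances in Siemens, one per weight
V = rng.uniform(0.0, 0.2, size=4)         # read voltages applied to the rows

I = G.T @ V                               # output currents on the column lines
print(I)
```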
Index Terms
- Reprogrammable Non-Linear Circuits Using ReRAM for NN Accelerators