Towards practical superconducting accelerators for machine learning using U-SFQ

Authors:
Patricia Gonzalez-Guerrero, Lawrence Berkeley National Laboratory, USA (ORCID 0000-0003-4377-7496)
Kylie Huch, Lawrence Berkeley National Laboratory, USA (ORCID 0009-0006-7551-0867)
Nirmalendu Patra, Lawrence Berkeley National Laboratory, USA (ORCID 0000-0002-7011-0999)
Thom Popovici, Lawrence Berkeley National Laboratory, USA (ORCID 0000-0002-7271-8092)
George Michelogiannakis, Lawrence Berkeley National Laboratory, USA (ORCID 0000-0003-3743-6054)

ACM Journal on Emerging Technologies in Computing Systems. Accepted on March 2024. https://doi.org/10.1145/3653073

Online AM: 09 April 2024

Abstract

Most popular superconducting circuits operate on information carried by ps-wide, \(\mu\)V-tall single flux quantum (SFQ) pulses. These circuits can operate at frequencies of hundreds of GHz with orders of magnitude lower switching energy than complementary metal-oxide-semiconductor (CMOS) circuits. However, under the stringent area constraints of modern superconductor technologies, fully-fledged, CMOS-inspired superconducting architectures cannot be fabricated at large scales. Unary SFQ (U-SFQ) is an alternative computing paradigm that can address these area constraints. In U-SFQ, information is encoded using a combination of streams of SFQ pulses and the temporal domain. In this work, we extend U-SFQ by introducing novel building blocks such as a multiplier and an accumulator. These blocks reduce area and power consumption by 2\(\times\) and 4\(\times\), respectively, compared with previously proposed U-SFQ building blocks, and yield at least 97% area savings compared with binary approaches. Using these building blocks, we propose a U-SFQ Convolutional Neural Network (CNN) hardware accelerator that achieves peak performance comparable to the state-of-the-art superconducting binary approach (B-SFQ) in 32\(\times\) less area. CNNs can operate with 5 to 8 bits of resolution with no significant degradation in classification accuracy. For 5 bits of resolution, our proposed accelerator yields 5\(\times\) to 63\(\times\) better performance than CMOS and 15\(\times\) to 173\(\times\) better area efficiency than B-SFQ.
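To make the idea of combining pulse streams with temporal encoding more concrete, the following is a minimal behavioral sketch in Python. It is not the circuit described in the article: it assumes one operand is carried by the number of pulses spread evenly over a fixed window, the other by the arrival time of a single pulse that closes a gate, and the product is read as the count of pulses that pass the gate. The function names (`pulse_stream`, `arrival_slot`, `unary_multiply`) and the even pulse spacing are assumptions made only for illustration.

```python
def pulse_stream(value: float, n_slots: int) -> list[int]:
    """Spread round(value * n_slots) pulses evenly over n_slots time slots.

    Assumed encoding for illustration: 1 marks a slot that carries a pulse.
    """
    n_pulses = round(value * n_slots)
    # A slot carries a pulse whenever the running count floor(i * n / N) increments.
    return [
        ((i + 1) * n_pulses) // n_slots - (i * n_pulses) // n_slots
        for i in range(n_slots)
    ]


def arrival_slot(value: float, n_slots: int) -> int:
    """Encode a value as the slot at which a single pulse arrives (temporal encoding)."""
    return round(value * n_slots)


def unary_multiply(a: float, b: float, bits: int = 5) -> float:
    """Approximate a * b for a, b in [0, 1] at the given resolution.

    The pulse stream for `a` passes through a gate that stays open only
    until the temporal pulse for `b` arrives; counting the pulses that
    pass gives roughly a * b * n_slots.
    """
    n_slots = 2 ** bits
    stream = pulse_stream(a, n_slots)
    cutoff = arrival_slot(b, n_slots)
    passed = sum(stream[:cutoff])
    return passed / n_slots


if __name__ == "__main__":
    print(unary_multiply(0.5, 0.5))    # 5-bit resolution, 32 slots
    print(unary_multiply(0.75, 0.25))
```

In this model the window length grows as 2^bits, which is one way to see why low-resolution workloads such as 5-bit CNN inference are a natural fit for unary encodings: each extra bit of resolution doubles the number of time slots per operation.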

Index Terms

Towards practical superconducting accelerators for machine learning using U-SFQ
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability

Published in

ACM Journal on Emerging Technologies in Computing Systems Just Accepted
ISSN:1550-4832
EISSN:1550-4840

Copyright © 2024 Copyright held by the owner/author(s).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States

Journal Family
ACM Journals for the Design of Smart and Connected Systems
Publication History
- Published: 9 April 2024
- Accepted: 8 March 2024
- Revised: 2 October 2023
- Received: 27 April 2023
Author Tags
superconducting digital computing
convolutional neural networks
unary computing
SFQ
Qualifiers
- research-article
