Towards practical superconducting accelerators for machine learning using U-SFQ

Online AM: 9 April 2024

Abstract

Most popular superconducting circuits operate on information carried by ps-wide, \(\boldsymbol{\mu}\)V-tall, single flux quantum (SFQ) pulses. These circuits can operate at frequencies of hundreds of GHz with orders of magnitude lower switching energy than complementary metal-oxide-semiconductor (CMOS) circuits. However, under the stringent area constraints of modern superconductor technologies, fully fledged, CMOS-inspired superconducting architectures cannot be fabricated at large scales. Unary SFQ (U-SFQ) is an alternative computing paradigm that can address these area constraints. In U-SFQ, information is mapped to a combination of streams of SFQ pulses and the temporal domain. In this work, we extend U-SFQ with novel building blocks: a multiplier and an accumulator. These blocks reduce area and power consumption by 2\(\times\) and 4\(\times\), respectively, compared with previously proposed U-SFQ building blocks, and yield at least 97% area savings compared with binary approaches. Using this multiplier and accumulator, we propose a U-SFQ convolutional neural network (CNN) hardware accelerator whose peak performance is comparable to that of a state-of-the-art superconducting binary approach (B-SFQ) in 32\(\times\) less area. CNNs can operate with 5-8 bits of resolution with no significant degradation in classification accuracy. At 5 bits of resolution, our proposed accelerator yields 5\(\times\)-63\(\times\) better performance than CMOS and 15\(\times\)-173\(\times\) better area efficiency than B-SFQ.
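
For intuition, the following is a minimal software sketch of the unary idea the abstract describes: one operand is carried as a stream of pulses whose density encodes its value, the other is carried in the temporal domain as the duration of a gating window, and a pulse counter accumulates their product. The function names and the evenly-spaced encoding are illustrative assumptions, not the paper's Josephson-junction circuits.

```python
# Illustrative software model of unary (pulse-stream + temporal) arithmetic.
# It sketches the encoding idea only; it is not the U-SFQ hardware design.

def pulse_stream(value: int, frame: int) -> list[int]:
    """Deterministic unary encoding (assumed here): exactly `value`
    evenly spaced pulses within a frame of `frame` time slots."""
    return [1 if (i * value) % frame < value else 0 for i in range(frame)]

def unary_multiply(a: int, b: int, frame: int) -> int:
    """Operand `a` rides on pulse density, operand `b` on gate duration
    (measured in frames); counting the gated pulses yields a * b."""
    stream = pulse_stream(a, frame)
    # The gate stays open for b frames; a pulses pass per frame.
    return sum(stream[t % frame] for t in range(b * frame))

assert unary_multiply(5, 7, 32) == 35   # pulse count equals the product
```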


Published in

ACM Journal on Emerging Technologies in Computing Systems (Just Accepted)
ISSN: 1550-4832
EISSN: 1550-4840

Copyright © 2024 held by the owner/author(s).


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 27 April 2023
• Revised: 2 October 2023
• Accepted: 8 March 2024
• Published: 9 April 2024
