Abstract
Most popular superconducting circuits operate on information carried by ps-wide, \(\mu\)V-tall single flux quantum (SFQ) pulses. These circuits can operate at frequencies of hundreds of GHz with orders of magnitude lower switching energy than complementary metal-oxide-semiconductor (CMOS) circuits. However, under the stringent area constraints of modern superconductor technologies, fully fledged, CMOS-inspired superconducting architectures cannot be fabricated at large scales. Unary SFQ (U-SFQ) is an alternative computing paradigm that can address these area constraints. In U-SFQ, information is encoded both in streams of SFQ pulses and in the temporal domain. In this work, we extend U-SFQ with novel building blocks, including a multiplier and an accumulator. These blocks reduce area and power consumption by 2\(\times\) and 4\(\times\), respectively, compared with previously proposed U-SFQ building blocks, and yield at least 97% area savings compared with binary approaches. Using these building blocks, we propose a U-SFQ Convolutional Neural Network (CNN) hardware accelerator that achieves peak performance comparable to the state-of-the-art superconducting binary approach (B-SFQ) in 32\(\times\) less area. CNNs can operate with 5-8 bits of resolution with no significant degradation in classification accuracy. For 5 bits of resolution, our proposed accelerator yields 5\(\times\)-63\(\times\) better performance than CMOS and 15\(\times\)-173\(\times\) better area efficiency than B-SFQ.
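To make the two encodings mentioned in the abstract concrete, the following is a minimal behavioral sketch in Python. It is not the circuit described in the paper. It assumes that one operand is carried as the number of pulses in a window of \(2^{b}\) time slots, that the other is carried as the arrival slot of a single pulse, and that the product is obtained by counting the stream pulses that arrive before that single pulse. The function names (`pulse_stream`, `temporal_slot`, `unary_multiply`) and the even spacing of pulses are illustrative assumptions.

```python
def pulse_stream(x, n_slots):
    """Pulse-stream encoding (assumed): round(x * n_slots) pulses, evenly spread.

    Returns a list of 0/1 values, one per time slot, for x in [0, 1].
    """
    count = round(x * n_slots)
    # Slot i carries a pulse whenever the running quota floor(i * count / n_slots)
    # advances by one; the total telescopes to exactly `count` pulses.
    return [((i + 1) * count) // n_slots - (i * count) // n_slots
            for i in range(n_slots)]


def temporal_slot(y, n_slots):
    """Temporal encoding (assumed): y in [0, 1] becomes the arrival slot of one pulse."""
    return round(y * n_slots)


def unary_multiply(x, y, bits):
    """Estimate x * y by gating the pulse stream with the temporal pulse.

    Hypothetical behavioral model: only stream pulses that arrive before the
    temporal pulse are counted, and the count is scaled back to [0, 1].
    """
    n_slots = 2 ** bits
    stream = pulse_stream(x, n_slots)
    gate = temporal_slot(y, n_slots)
    passed = sum(stream[:gate])
    return passed / n_slots


if __name__ == "__main__":
    for x, y in [(0.5, 0.5), (0.75, 0.5), (0.3, 0.7)]:
        print(x, y, unary_multiply(x, y, bits=5), x * y)
```

In this model the window length grows as \(2^{b}\) with the resolution \(b\), which is one reason low resolutions such as the 5-8 bits cited in the abstract suit unary encodings.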
Index Terms
- Towards practical superconducting accelerators for machine learning using U-SFQ