Abstract
Machine Learning (ML) is a family of models that learn from data to improve performance on a given task. ML techniques, especially recently revived neural networks (deep neural networks), have proven effective for a broad range of applications. ML techniques are conventionally executed on general-purpose processors (such as CPUs and GPGPUs), which are usually not energy efficient, since they invest excessive hardware resources to flexibly support various workloads. Consequently, application-specific hardware accelerators have recently been proposed to improve energy efficiency. However, such accelerators were designed for a small set of ML techniques sharing similar computational patterns, and they adopt complex, informative instructions (control signals) that directly correspond to high-level functional blocks of an ML technique (such as layers in neural networks), or even to an ML technique as a whole. Although straightforward and easy to implement for a limited set of similar ML techniques, this lack of agility in the instruction set prevents such accelerator designs from supporting a variety of different ML techniques with sufficient flexibility and efficiency.
In this article, we first propose a novel domain-specific Instruction Set Architecture (ISA) for NN accelerators, called Cambricon, which is a load-store architecture that integrates scalar, vector, matrix, logical, data transfer, and control instructions, based on a comprehensive analysis of existing NN techniques. We then extend the application scope of Cambricon from NNs to broader ML techniques. We also propose an assembly language, an assembler, and a runtime to support programming with Cambricon, especially targeting large-scale ML problems. Our evaluation over a total of 16 representative yet distinct ML techniques demonstrates that Cambricon exhibits strong descriptive capacity over a broad range of ML techniques and provides higher code density than general-purpose ISAs such as x86, MIPS, and GPGPU. Compared to the latest state-of-the-art NN accelerator design, DaDianNao [7] (which can only accommodate three types of NN techniques), our Cambricon-based accelerator prototype, implemented in TSMC 65nm technology, incurs only negligible latency/power/area overheads, with a versatile coverage of 10 different NN benchmarks and 7 other ML benchmarks. Compared to the recent prevalent ML accelerator PuDianNao, our Cambricon-based accelerator is able to support all the ML techniques as well as the 10 NNs, but with only approximately 5.1% performance loss.
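To make the instruction classes named above concrete, the sketch below decomposes one fully connected NN layer, y = sigmoid(Wx + b), into matrix, vector, and elementwise operations of the kind a load-store ML ISA would expose. This is an illustrative sketch only: the function names (`mmv`, `vav`, `vsigmoid`, `fc_layer`) are hypothetical stand-ins, not Cambricon's actual mnemonics or encodings, and the data-transfer and control instructions are elided.

```python
import math

def mmv(W, x):
    # matrix-mult-vector instruction class: y_i = sum_j W[i][j] * x[j]
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def vav(a, b):
    # vector-add-vector instruction class: elementwise a + b
    return [ai + bi for ai, bi in zip(a, b)]

def vsigmoid(v):
    # elementwise vector activation
    return [1.0 / (1.0 + math.exp(-vi)) for vi in v]

def fc_layer(W, b, x):
    # data-transfer instructions (loads/stores of W, x, b between memory
    # and on-chip scratchpads) are elided in this sketch
    return vsigmoid(vav(mmv(W, x), b))

W = [[0.5, -0.5], [1.0, 1.0]]
b = [0.0, -1.0]
x = [1.0, 1.0]
print(fc_layer(W, b, x))  # both outputs lie in (0, 1)
```

The point of the decomposition is that a small set of scalar/vector/matrix primitives, rather than one monolithic "layer" instruction, suffices to express this layer, which is what gives such an ISA its flexibility across different ML techniques.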
- N. S. Altman. 1992. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46, 3 (1992), 175--185.
- L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Wadsworth International Group, Belmont, CA.
- Srimat Chakradhar, Murugan Sankaradas, Venkata Jakkula, and Srihari Cadambi. 2010. A dynamically configurable coprocessor for convolutional neural networks. In Proceedings of the 37th Annual International Symposium on Computer Architecture.
- Yun-Fan Chang, P. Lin, Shao-Hua Cheng, Kai-Hsuan Chan, Yi-Chong Zeng, Chia-Wei Liao, Wen-Tsung Chang, Yu-Chiang Wang, and Yu Tsao. 2014. Robust anchorperson detection based on audio streams using a hybrid I-vector and DNN system. In Proceedings of the 2014 Annual Summit and Conference on Asia-Pacific Signal and Information Processing Association.
- Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2014. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems.
- Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang, Chengyong Wu, Yunji Chen, and Olivier Temam. 2015. A high-throughput neural network accelerator. IEEE Micro 35, 3 (2015), 24--32.
- Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2014. DaDianNao: A machine-learning supercomputer. In Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture.
- Ping Chi, Wang-Chien Lee, and Yuan Xie. 2016. Adapting B-plus tree for emerging non-volatile memory based main memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD’16).
- Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. 2016. A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In Proceedings of the 43rd International Symposium on Computer Architecture (ISCA’16).
- A. Coates, B. Huval, T. Wang, D. J. Wu, and A. Y. Ng. 2013. Deep learning with COTS HPC systems. In Proceedings of the 30th International Conference on Machine Learning.
- Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Mach. Learn. 20, 3 (1995), 273--297.
- G. E. Dahl, T. N. Sainath, and G. E. Hinton. 2013. Improving deep neural networks for LVCSR using rectified linear units and dropout. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
- A. L. Edwards. 1984. An introduction to linear regression and correlation. Math. Gaz. 69, 2 (1984), 1--17.
- V. Eijkhout. 2011. Introduction to High Performance Scientific Computing. Retrieved from www.lulu.com.
- H. Esmaeilzadeh, P. Saeedi, B. N. Araabi, C. Lucas, and Sied Mehdi Fakhraie. 2006. Neural network stream processing core (NnSP) for embedded systems. In Proceedings of the 2006 IEEE International Symposium on Circuits and Systems.
- Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. 2012. Neural acceleration for general-purpose approximate programs. In Proceedings of the 2012 IEEE/ACM International Symposium on Microarchitecture.
- C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun. 2011. NeuFlow: A runtime reconfigurable dataflow processor for vision. In Proceedings of the 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
- C. Farabet, C. Poulet, J. Y. Han, and Y. LeCun. 2009. CNP: An FPGA-based processor for convolutional networks. In Proceedings of the 2009 International Conference on Field Programmable Logic and Applications.
- E. W. Forgy. 1965. Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometrics 21, 3 (1965), 41--52.
- V. Gokhale, Jonghoon Jin, A. Dundar, B. Martini, and E. Culurciello. 2014. A 240 G-ops/s mobile coprocessor for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
- A. Graves and J. Schmidhuber. 2005. Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks.
- Atif Hashmi, Andrew Nere, James Jamal Thomas, and Mikko Lipasti. 2011. A case for neuromorphic ISAs. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the International Conference on Computer Vision. 1026--1034.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’16). 770--778.
- Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management.
- INTEL. n.d. AVX-512. Retrieved from https://software.intel.com/en-us/blogs/2013/avx-512-instructions.
- INTEL. n.d. MKL. Retrieved from https://software.intel.com/en-us/intel-mkl.
- Fernando J. Pineda. 1987. Generalization of back-propagation to recurrent neural networks. Phys. Rev. Lett. (1987), 602--611.
- K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. 2009. What is the best multi-stage architecture for object recognition? In Proceedings of the 12th IEEE International Conference on Computer Vision.
- Norman P. Jouppi, Cliff Young, Nishant Patil, David A. Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA’17). 1--12.
- V. Kantabutra. 1996. On hardware for computing exponential and trigonometric functions. IEEE Trans. Comput. 45, 3 (1996), 328--339.
- A. Krizhevsky. n.d. cuda-convnet: High-performance C++/CUDA implementation of convolutional neural networks.
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25.
- Pat Langley, Wayne Iba, and Kevin Thompson. 1992. An analysis of Bayesian classifiers. In Proceedings of the 10th National Conference on Artificial Intelligence. 223--228.
- Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio. 2007. An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning.
- Q. V. Le. 2013. Building high-level features using large scale unsupervised learning. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278--2324.
- Daofu Liu, Tianshi Chen, Shaoli Liu, Jinhong Zhou, Shengyuan Zhou, Olivier Temam, Xiaobing Feng, Xuehai Zhou, and Yunji Chen. 2015. PuDianNao: A polyvalent machine learning accelerator. In Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems.
- Shaoli Liu, Zidong Du, Jinhua Tao, Dong Han, Tao Luo, Yuan Xie, Yunji Chen, and Tianshi Chen. 2016. Cambricon: An instruction set architecture for neural networks. In Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture (ISCA’16). 393--405.
- A. A. Maashri, M. DeBole, M. Cotter, N. Chandramoorthy, Yang Xiao, V. Narayanan, and C. Chakrabarti. 2012. Accelerating neuromorphic vision algorithms for recognition. In Proceedings of the 49th ACM/EDAC/IEEE Design Automation Conference.
- Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kyung Kim, and Hadi Esmaeilzadeh. 2016. TABLA: A unified template-based framework for accelerating statistical machine learning. In Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA’16). 14--26.
- G. Marsaglia and W. W. Tsang. 2000. The ziggurat method for generating random variables. J. Stat. Softw. 5, 8 (2000). https://EconPapers.repec.org/RePEc:jss:jstsof:v:005:i08.
- Paul A. Merolla, John V. Arthur, Rodrigo Alvarez-Icaza, Andrew S. Cassidy, Jun Sawada, Filipp Akopyan, Bryan L. Jackson, Nabil Imam, Chen Guo, Yutaka Nakamura, Bernard Brezzo, Ivan Vo, Steven K. Esser, Rathinakumar Appuswamy, Brian Taba, Arnon Amir, Myron D. Flickner, William P. Risk, Rajit Manohar, and Dharmendra S. Modha. 2014. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 6197 (2014), 668--673.
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529--533.
- M. A. Motter. 1999. Control of the NASA Langley 16-foot transonic tunnel with the self-organizing map. In Proceedings of the 1999 American Control Conference.
- NVIDIA. n.d. CUBLAS. Retrieved from https://developer.nvidia.com/cublas.
- C. S. Oliveira and E. Del Hernandez. 2004. Forms of adapting patterns to Hopfield neural networks with larger number of nodes and higher storage capacity. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks.
- Jongse Park, Hardik Sharma, Divya Mahajan, Joon Kyung Kim, Preston Olds, and Hadi Esmaeilzadeh. 2017. Scale-out acceleration for machine learning. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’17). 367--381.
- David A. Patterson and Carlo H. Sequin. 1981. RISC I: A reduced instruction set VLSI computer. In Proceedings of the 8th Annual Symposium on Computer Architecture.
- M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal. 2013. Memory-centric accelerator design for convolutional neural networks. In Proceedings of the 31st IEEE International Conference on Computer Design.
- John C. Platt, Nello Cristianini, and John Shawe-Taylor. 1999. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems 12 (NIPS’99). 547--553. http://papers.nips.cc/paper/1773-large-margin-dags-for-multiclass-classification
- Matt Poremba, Tao Zhang, and Yuan Xie. 2016. Fine-granularity tile-level parallelism in non-volatile memory architecture with two-dimensional bank subdivision. In Proceedings of the IEEE/ACM Design Automation Conference (DAC’16).
- J. R. Quinlan. 1986. Induction of decision trees. Mach. Learn. 1, 1 (1986), 81--106. Kluwer Academic Publishers.
- J. Ross Quinlan. 1996. Bagging, boosting, and C4.5. In Proceedings of the 13th National Conference on Artificial Intelligence and 8th Innovative Applications of Artificial Intelligence Conference (AAAI’96 and IAAI’96). 725--730.
- R. Salakhutdinov and G. E. Hinton. 2012. An efficient learning procedure for deep Boltzmann machines. Neur. Comput. 24, 8 (2012), 1967--2006.
- M. Sankaradas, V. Jakkula, S. Cadambi, S. Chakradhar, I. Durdanovic, E. Cosatto, and H. P. Graf. 2009. A massively parallel coprocessor for convolutional neural networks. In Proceedings of the 20th IEEE International Conference on Application-specific Systems, Architectures and Processors.
- R. Sarikaya, G. E. Hinton, and A. Deoras. 2014. Application of deep belief networks for natural language understanding. IEEE Trans. Aud. Speech Lang. Process. 22, 4 (2014), 778--784.
- P. Sermanet and Y. LeCun. 2011. Traffic sign recognition with multi-scale convolutional networks. In Proceedings of the 2011 International Joint Conference on Neural Networks.
- Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations. http://arxiv.org/abs/1409.1556
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1--9.
- O. Temam. 2012. A defect-tolerant accelerator for emerging high-performance applications. In Proceedings of the 39th Annual International Symposium on Computer Architecture.
- V. Vanhoucke, A. Senior, and M. Z. Mao. 2011. Improving the speed of neural networks on CPUs. In Proceedings of the Deep Learning and Unsupervised Feature Learning Workshop (NIPS’11).
- Yu Wang, Tianqi Tang, Lixue Xia, Boxun Li, Peng Gu, Huazhong Yang, Hai Li, and Yuan Xie. 2015. Energy efficient RRAM spiking neural network for real time classification. In Proceedings of the 25th Edition of the Great Lakes Symposium on VLSI.
- Cong Xu, Dimin Niu, Naveen Muralimanohar, Rajeev Balasubramonian, Tao Zhang, Shimeng Yu, and Yuan Xie. 2015. Overcoming the challenges of cross-point resistive memory architectures. In Proceedings of the 21st International Symposium on High Performance Computer Architecture.
- Tao Xu, Jieping Zhou, Jianhua Gong, Wenyi Sun, Liqun Fang, and Yanli Li. 2012. Improved SOM based data mining of seasonal flu in mainland China. In Proceedings of the 2012 8th International Conference on Natural Computation.
- Xian-Hua Zeng, Si-Wei Luo, and Jiao Wang. 2007. Auto-associative neural network system for recognition. In Proceedings of the 2007 International Conference on Machine Learning and Cybernetics.
- Zhengyou Zhang, M. Lyons, M. Schuster, and S. Akamatsu. 1998. Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron. In Proceedings of the 3rd IEEE International Conference on Automatic Face and Gesture Recognition.
- Jishen Zhao, Guangyu Sun, Gabriel H. Loh, and Yuan Xie. 2013. Optimizing GPU energy efficiency with 3D die-stacking graphics memory and reconfigurable memory interface. ACM Trans. Archit. Code Optim. 10, 4, Article 24 (Dec. 2013), 25 pages.
Index Terms
- An Instruction Set Architecture for Machine Learning