Abstract
Data-intensive industrial and scientific applications increasingly demand higher-precision arithmetic with reduced computation time. In this paper, we present a high-precision, fully pipelined 32-bit floating-point (FP) divider that uses the Newton–Raphson (NR) algorithm, realized with an Urdhva–Tiryakbhyam (UT) multiplier, for System on Chip applications. The divider is based on the multiplicative Newton–Raphson method and supports all IEEE rounding modes with a latency of 15 cycles. The iterative NR computations are performed with an FP multiplier and an FP adder. The key module of the FP multiplier, which computes the mantissa product, is the UT multiplier. UT is an ancient Vedic multiplication technique that has been used for centuries to perform fast multiplication. We implemented two UT multipliers: one using carry look-ahead adders and the other using carry save adders. The results show that the proposed architectures achieve 12% better precision and 24% higher throughput than existing algorithms, at the cost of high on-chip power. The inputs to the divider are represented in the IEEE-754 standard format. The design was developed with Xilinx Vivado software and implemented on a Virtex7 FPGA.
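The two ideas named in the abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design: the function names, the linear seed 48/17 - (32/17)m, the three-iteration count, and the 24-bit default width are all assumptions chosen for illustration, and the sketch models neither the IEEE rounding modes nor the 15-cycle pipeline. `ut_multiply` forms a product column by column from the "vertically and crosswise" partial products, and `nr_reciprocal` applies the NR recurrence x_{k+1} = x_k(2 - m x_k) to approximate 1/m.

```python
import math


def ut_multiply(a: int, b: int, width: int = 24) -> int:
    """Multiply two unsigned `width`-bit integers column by column.

    Column `col` sums every cross product a_i * b_j with i + j == col
    (the "vertically and crosswise" pattern), then passes the carry on.
    The width of 24 is an assumption matching a single-precision mantissa.
    """
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result = 0
    carry = 0
    for col in range(2 * width - 1):
        lo = max(0, col - width + 1)
        hi = min(col, width - 1)
        column_sum = carry + sum(a_bits[i] & b_bits[col - i]
                                 for i in range(lo, hi + 1))
        result |= (column_sum & 1) << col   # low bit stays in this column
        carry = column_sum >> 1             # the rest moves to the next column
    result |= carry << (2 * width - 1)
    return result


def nr_reciprocal(m: float, iterations: int = 3) -> float:
    """Approximate 1/m for m in [0.5, 1) by Newton-Raphson iteration.

    The linear seed below is a commonly used starting value; it is an
    assumption here, not taken from the paper.
    """
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        x = x * (2.0 - m * x)   # one multiply-subtract-multiply step
    return x


def nr_divide(n: float, d: float, iterations: int = 3) -> float:
    """Compute n / d as n * (1/d), with 1/d obtained from nr_reciprocal."""
    if d == 0.0:
        raise ZeroDivisionError("division by zero")
    m, e = math.frexp(abs(d))               # abs(d) == m * 2**e, m in [0.5, 1)
    q = math.ldexp(n * nr_reciprocal(m, iterations), -e)
    return -q if d < 0 else q
```

With this seed the relative error of the starting value is at most 1/17, and each iteration squares it, so three iterations suffice for more than 24 bits in this floating-point model. A hardware divider would instead carry out each multiplication on fixed-width mantissas, for example with a UT-style multiplier, and apply rounding at the end.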
Cite this article
Hanuman, C.R.S., Kamala, J. & Aruna, A.R. Implementation of high precision/low latency FP divider using Urdhva–Tiryakbhyam multiplier for SoC applications. Des Autom Embed Syst 24, 111–125 (2020). https://doi.org/10.1007/s10617-019-09225-2
DOI: https://doi.org/10.1007/s10617-019-09225-2