Implementation of high precision/low latency FP divider using Urdhva–Tiryakbhyam multiplier for SoC applications

Hanuman, C. R. S.; Kamala, J.; Aruna, A. R.

doi:10.1007/s10617-019-09225-2


Published: 05 November 2019

Volume 24, pages 111–125 (2020)

Design Automation for Embedded Systems


Abstract

Data-intensive industrial and scientific applications increasingly demand higher-precision arithmetic with reduced computation time. In this paper, we design a high-precision, fully pipelined 32-bit floating-point (FP) divider for System-on-Chip applications. The divider is based on the Newton–Raphson (NR) multiplicative method, supports all IEEE rounding modes, and has a latency of 15 cycles. The iterative NR computations are performed with an FP multiplier and an FP adder, and the key module of the FP multiplier, which computes the mantissa product, is an Urdhva–Tiryakbhyam (UT) multiplier. UT is a centuries-old Vedic technique for fast multiplication. We implemented two UT multipliers: one using carry look-ahead adders and the other using carry-save adders. The results show that the proposed architectures achieve 12% better precision and 24% higher throughput than existing algorithms, at the cost of higher on-chip power. The divider inputs are represented in the IEEE-754 format. The design was developed in Xilinx Vivado and implemented on a Virtex-7 FPGA.
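The abstract names two techniques without showing them: the NR reciprocal iteration x_{k+1} = x_k(2 − d·x_k), which converges quadratically to 1/d (since 1 − d·x_{k+1} = (1 − d·x_k)²) so that a/d can be formed as a·(1/d), and Urdhva–Tiryakbhyam ("vertically and crosswise") multiplication, which builds each column k of the product from all crosswise bit products a_i·b_j with i + j = k and then propagates carries. The Python sketch below is a bit-level software model under stated assumptions, not the authors' design: the names `ut_multiply`, `nr_reciprocal`, `nr_divide`, and `to_fixed` are hypothetical, the 30-bit fixed-point fraction and three iterations are illustrative choices, and the linear seed 24/17 − (8/17)·d is a textbook choice because the abstract does not say how the divider obtains its initial approximation. Exponent handling, normalization, and IEEE rounding are omitted.

```python
FRAC_BITS = 30           # assumed fixed-point fraction width (23-bit fraction + guard bits)
WIDTH = FRAC_BITS + 2    # all operands below stay under 2**(FRAC_BITS + 1)


def ut_multiply(a: int, b: int, width: int = WIDTH) -> int:
    """Urdhva-Tiryakbhyam (vertically and crosswise) product of two unsigned ints."""
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = 0, 0
    for k in range(2 * width - 1):
        # Column k sums every crosswise bit product a_i * b_j with i + j == k,
        # plus the carry left over from column k - 1.
        column = carry
        for i in range(max(0, k - width + 1), min(k, width - 1) + 1):
            column += a_bits[i] & b_bits[k - i]
        result |= (column & 1) << k   # keep the column's low bit
        carry = column >> 1           # pass the rest on to column k + 1
    # Whatever carry remains becomes the top bit of the 2*width-bit product.
    return result + (carry << (2 * width - 1))


def to_fixed(x: float) -> int:
    """Convert a significand in [1, 2) to fixed point with FRAC_BITS fraction bits."""
    return int(round(x * (1 << FRAC_BITS)))


def nr_reciprocal(d: int, iterations: int = 3) -> int:
    """Approximate 1/d for a fixed-point significand d in [1, 2)."""
    one = 1 << FRAC_BITS
    two = 2 * one
    # Assumed linear seed x0 = 24/17 - (8/17) d; |1 - d*x0| <= 1/17 on [1, 2).
    x = (24 * one - 8 * d) // 17
    for _ in range(iterations):
        dx = ut_multiply(d, x) >> FRAC_BITS         # d * x_k
        x = ut_multiply(x, two - dx) >> FRAC_BITS   # x_{k+1} = x_k * (2 - d * x_k)
    return x


def nr_divide(a: int, d: int, iterations: int = 3) -> int:
    """Fixed-point significand quotient a / d, computed as a * (1/d)."""
    return ut_multiply(a, nr_reciprocal(d, iterations)) >> FRAC_BITS
```

For example, `nr_divide(to_fixed(1.5), to_fixed(1.25)) / 2**FRAC_BITS` comes out very close to 1.2. In hardware, the inner loop over `i` would correspond to parallel AND gates feeding a per-column adder; the abstract's two variants presumably differ in whether those column sums and carries are resolved with carry look-ahead or carry-save adders.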



Similar content being viewed by others

Implementation of multi-precision floating point divider for high speed signal processing applications (Article, 21 June 2019)

Implementation of High-Performance Floating Point Divider Using Pipeline Architecture on FPGA

IEEE 754 Floating Point Pipelined Multiplier with Karatsuba for Mitigations of Area and Power


Author information

Authors and Affiliations

Department of Electronics and Communications Engineering, CEG, Anna University, Chennai, India
C. R. S. Hanuman, J. Kamala & A. R. Aruna


Corresponding author

Correspondence to C. R. S. Hanuman.


About this article

Cite this article

Hanuman, C.R.S., Kamala, J. & Aruna, A.R. Implementation of high precision/low latency FP divider using Urdhva–Tiryakbhyam multiplier for SoC applications. Des Autom Embed Syst 24, 111–125 (2020). https://doi.org/10.1007/s10617-019-09225-2


Received: 19 May 2019
Accepted: 20 October 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10617-019-09225-2
