Implementation of high precision/low latency FP divider using Urdhva–Tiryakbhyam multiplier for SoC applications

Design Automation for Embedded Systems

Abstract

Industrial and scientific data-intensive applications increasingly demand higher-precision arithmetic with reduced computation time. In this paper, we design a high-precision, fully pipelined 32-bit floating-point (FP) divider based on the Newton–Raphson (NR) multiplicative algorithm and realized with an Urdhva–Tiryakbhyam (UT) multiplier for System on Chip applications. The divider supports all IEEE rounding modes with a latency of 15 cycles. The iterative NR computations are performed using an FP multiplier and an FP adder. The key module of the FP multiplier, which computes the mantissa product, is the UT multiplier. UT is an ancient Vedic multiplication technique that has been used for centuries to perform fast multiplication. We implemented two UT multipliers: one using carry look-ahead adders and the other using carry save adders. The results show that the proposed architectures achieve 12% better precision and 24% higher throughput than existing algorithms, at the cost of higher on-chip power. The inputs to the divider are represented in the IEEE-754 format. The design was developed in Xilinx Vivado and implemented on a Virtex-7 FPGA.
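
The two arithmetic ideas named in the abstract can be illustrated with a small software sketch. It is not the authors' hardware design: the function names `ut_multiply` and `nr_divide`, the linear seed `48/17 - 32/17 * m`, and the choice of three iterations are assumptions made for illustration, and the sketch models neither the pipeline, the IEEE rounding modes, nor the carry look-ahead and carry save adder variants. `ut_multiply` forms each product column "vertically and crosswise" and then propagates carries; `nr_divide` refines a reciprocal estimate with the standard Newton–Raphson step x ← x(2 − d·x).

```python
import math


def ut_multiply(a, b, width=24):
    """Multiply two unsigned `width`-bit integers column by column.

    Column k collects every partial-product bit a[i] & b[k - i]
    (the "vertical and crosswise" pattern); carries ripple upward.
    """
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result = 0
    carry = 0
    for k in range(2 * width - 1):
        column = carry
        lo = max(0, k - width + 1)
        hi = min(k, width - 1)
        for i in range(lo, hi + 1):
            column += a_bits[i] & b_bits[k - i]
        result |= (column & 1) << k   # low bit stays in this column
        carry = column >> 1           # the rest moves to the next column
    result += carry << (2 * width - 1)
    return result


def nr_divide(a, b, iterations=3):
    """Approximate a / b via a Newton-Raphson reciprocal of b.

    Assumed seed: x0 = 48/17 - 32/17 * m for m in [0.5, 1), which has
    relative error at most 1/17; each iteration squares that error.
    """
    if b == 0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if b < 0 else 1.0
    m, e = math.frexp(abs(b))         # abs(b) == m * 2**e, 0.5 <= m < 1
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        x = x * (2.0 - m * x)         # x converges to 1 / m
    return a * sign * math.ldexp(x, -e)


if __name__ == "__main__":
    print(ut_multiply(0xABCDEF, 0x123456) == 0xABCDEF * 0x123456)
    print(abs(nr_divide(1.0, 3.0) - 1.0 / 3.0) < 1e-7)
```

Under these assumptions, three iterations bring the relative error of the reciprocal from about 1/17 down to the order of 1e-10, which is below single-precision resolution. The default `width=24` matches the 24-bit significand (including the hidden bit) of an IEEE-754 single-precision operand.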


Author information

Corresponding author

Correspondence to C. R. S. Hanuman.

Cite this article

Hanuman, C.R.S., Kamala, J. & Aruna, A.R. Implementation of high precision/low latency FP divider using Urdhva–Tiryakbhyam multiplier for SoC applications. Des Autom Embed Syst 24, 111–125 (2020). https://doi.org/10.1007/s10617-019-09225-2

  • DOI: https://doi.org/10.1007/s10617-019-09225-2
