Abstract
The emergence of low precision floating-point arithmetic in computer hardware has led to a resurgence of interest in the use of mixed precision numerical linear algebra. For linear systems of equations, there has been renewed enthusiasm for mixed precision variants of iterative refinement. We consider the iterative solution of large sparse systems using incomplete factorization preconditioners. The focus is on the robust computation of such preconditioners in half precision arithmetic and employing them to solve symmetric positive definite systems to higher precision accuracy; however, the proposed ideas can be applied more generally. Even for well-conditioned problems, incomplete factorizations can break down when small entries occur on the diagonal during the factorization. When using half precision arithmetic, overflows are an additional possible source of breakdown. We examine how breakdowns can be avoided and we implement our strategies within new half precision Fortran sparse incomplete Cholesky factorization software. Results are reported for a range of problems from practical applications. These demonstrate that, even for highly ill-conditioned problems, half precision preconditioners can potentially replace double precision preconditioners, although unsurprisingly this may be at the cost of additional iterations of a Krylov solver.
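The two failure modes named in the abstract, small diagonal entries and half precision overflow, can be illustrated with a small sketch. This is not the authors' Fortran software, and it does not claim to reproduce their strategies. It is a dense, pure-Python toy that simulates IEEE half precision by rounding through the `struct` format code `e`, and it combines two safeguards that are common in the incomplete factorization literature: symmetric diagonal scaling, and restarting with a global diagonal shift. The function names, the pivot threshold `tau`, and the initial shift `alpha0` are all hypothetical choices made for illustration.

```python
import math
import struct


def fl16(x):
    """Round a float to IEEE half precision; out-of-range values become +/-inf."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def scale_sym(A):
    """Symmetric diagonal scaling S*A*S with S = diag(1/sqrt(a_ii)), done in double."""
    n = len(A)
    s = [1.0 / math.sqrt(A[i][i]) for i in range(n)]
    return [[A[i][j] * s[i] * s[j] for j in range(n)] for i in range(n)]


def ic0_half(A, shift=0.0, tau=1e-2):
    """Try IC(0) of A + shift*I with every operation rounded to half precision.

    Returns (L, None) on success, or (None, reason) on breakdown.
    tau is an illustrative (hypothetical) threshold for a 'small' pivot.
    """
    n = len(A)
    # Lower triangle of the shifted matrix, stored in simulated half precision.
    L = [[fl16(A[i][j] + (shift if i == j else 0.0)) if j <= i else 0.0
          for j in range(n)] for i in range(n)]
    for k in range(n):
        pivot = L[k][k]
        if not math.isfinite(pivot):
            return None, "overflow"
        if pivot < tau:
            return None, "small pivot"
        L[k][k] = fl16(math.sqrt(pivot))
        for i in range(k + 1, n):
            L[i][k] = fl16(L[i][k] / L[k][k])
            if not math.isfinite(L[i][k]):
                return None, "overflow"
        for j in range(k + 1, n):
            for i in range(j, n):
                # IC(0): update only entries in the sparsity pattern of A.
                if i == j or A[i][j] != 0.0:
                    L[i][j] = fl16(L[i][j] - fl16(L[i][k] * L[j][k]))
    return L, None


def ic0_with_shift(A, alpha0=1e-3, max_tries=20):
    """Restart the factorization with a doubled global shift after each breakdown."""
    shift = 0.0
    for _ in range(max_tries):
        L, reason = ic0_half(A, shift)
        if L is not None:
            return L, shift
        shift = alpha0 if shift == 0.0 else 2.0 * shift
    raise RuntimeError("no successful factorization within max_tries")


if __name__ == "__main__":
    # Small SPD example whose entries exceed the half precision range.
    A = [[1.0e6, 9.99e5, 0.0],
         [9.99e5, 1.0e6, 0.0],
         [0.0, 0.0, 1.0e6]]
    print("unscaled:", ic0_half(A)[1])
    B = scale_sym(A)
    print("scaled, no shift:", ic0_half(B)[1])
    L, shift = ic0_with_shift(B)
    print("scaled, shift used:", shift)
```

In this toy setting the unscaled matrix cannot even be stored in half precision, scaling removes the overflow but exposes a small pivot, and a modest shift lets the factorization complete. The resulting factor approximates the scaled and shifted matrix rather than the original, which is consistent with the abstract's remark that a half precision preconditioner may cost additional Krylov iterations.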