$O(N)$ distributed direct factorization of structured dense matrices using runtime systems
arXiv - CS - Mathematical Software. Pub Date: 2023-11-02, DOI: arxiv-2311.00921
Sameer Deshmukh, Qinxiang Ma, Rio Yokota, George Bosilca

Structured dense matrices result from boundary integral problems in electrostatics and geostatistics, and also from Schur complements in sparse preconditioners such as multi-frontal methods. Exploiting the structure of such matrices can reduce the time for dense direct factorization from $O(N^3)$ to $O(N)$. The Hierarchically Semi-Separable (HSS) matrix is one such low-rank matrix format, and it can be factorized using a Cholesky-like algorithm called ULV factorization. The HSS-ULV algorithm is highly parallel because it removes the dependency on trailing sub-matrices at each HSS level. However, a key merge step that links two successive HSS levels remains a challenge for efficient parallelization. In this paper, we use the asynchronous runtime system PaRSEC with the HSS-ULV algorithm. We compare our work with STRUMPACK and LORAPO, both state-of-the-art implementations of dense direct low-rank factorization. On up to 128 nodes of Fugaku, we achieve up to 2x better factorization time for matrices arising from a diverse set of applications, with similar or better accuracy for all the problems that we survey.
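
As a rough illustration of the structure the abstract refers to (not the paper's implementation, and unrelated to PaRSEC, STRUMPACK, or LORAPO), the sketch below builds a small dense kernel matrix and measures the numerical rank of one off-diagonal block. The 1D exponential kernel, the point set, and the helper names `exponential_kernel_matrix` and `numerical_rank` are assumptions chosen for this example only. For this kernel the off-diagonal block factors as $e^{x_i} e^{-x_j}$, so it has rank 1, while the diagonal block stays full rank.

```python
import numpy as np

def exponential_kernel_matrix(points):
    """Dense matrix K[i, j] = exp(-|x_i - x_j|) for 1D points (toy kernel, an assumption)."""
    return np.exp(-np.abs(points[:, None] - points[None, :]))

def numerical_rank(block, tol=1e-10):
    """Count singular values larger than tol times the largest singular value."""
    s = np.linalg.svd(block, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

n = 256
points = np.linspace(0.0, 1.0, n)   # sorted points, so index blocks are spatial clusters
K = exponential_kernel_matrix(points)

half = n // 2
off_diag_rank = numerical_rank(K[:half, half:])   # interaction between the two halves
diag_rank = numerical_rank(K[:half, :half])       # self-interaction of the first half

print("off-diagonal block rank:", off_diag_rank)
print("diagonal block rank:", diag_rank)
```

A block of size $m \times m$ with rank $k$ can be stored as two $m \times k$ factors, that is $2mk$ numbers instead of $m^2$. Hierarchical formats such as HSS apply this kind of compression recursively to the off-diagonal blocks at every level.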

Updated: 2023-11-03