High-Productivity Parallelism With Python Plus Packages (But Without a Cluster)
Computing in Science & Engineering (IF 2.1), Pub Date: 2021-05-21, DOI: 10.1109/mcse.2021.3082864
John Bartlett, Chris Uchytil, Duane Storti

We present two computing projects, peridynamics simulation and numerical integration on implicit domains, for which we developed high-performance implementations using Python with appropriate packages. Both problems are compute-intensive enough that a straightforward serial implementation is prohibitively slow. While conventional wisdom suggests moving such problems onto a computing cluster, we directly produced high-performance parallel implementations that perform the computing tasks effectively on a single GPU. For the peridynamics application, the only package needed in addition to NumPy is Numba, whose just-in-time (JIT) compiler allows us to write kernel functions in Python and compile them to run in parallel on a CUDA-enabled GPU. Our approach to numerical integration on implicit domains invokes two additional packages, supporting interval arithmetic and dynamic parallelism, to enable tree-structured recursive refinement. Using Python (with only the kernels that require dynamic parallelism written in C) enabled rapid development of concise code that achieves significant performance gains.
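The abstract gives no implementation details, so what follows is only a minimal serial sketch of the general idea behind interval-guided recursive refinement, not the authors' code. It assumes a 2D domain defined implicitly as {(x, y) : f(x, y) <= 0}, a hand-rolled `Interval` class (hypothetical; it stands in for whichever interval-arithmetic package the paper uses and omits the outward rounding a rigorous library applies), and a one-point midpoint rule for the integrand. Each box is evaluated in interval arithmetic: if the enclosure of f is entirely nonpositive the box lies inside the domain, if it is entirely positive the box is discarded, and otherwise the box is split into four children, which produces the tree structure. The names `Interval`, `sq`, `integrate_implicit`, and `disk` are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] (hypothetical helper, not from the paper).

    Simplifying assumption: no outward rounding, so the bounds are not
    rigorous in floating point the way a real interval library's are.
    """
    lo: float
    hi: float

    def __add__(self, other):
        o = _as_interval(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __sub__(self, other):
        o = _as_interval(other)
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __rsub__(self, other):
        return _as_interval(other) - self

    def __mul__(self, other):
        o = _as_interval(other)
        p = (self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi)
        return Interval(min(p), max(p))

    __rmul__ = __mul__

    def sq(self):
        """Tight square: [-1, 1] * [-1, 1] would give [-1, 1], this gives [0, 1]."""
        a, b = abs(self.lo), abs(self.hi)
        if self.lo <= 0.0 <= self.hi:
            return Interval(0.0, max(a, b) ** 2)
        return Interval(min(a, b) ** 2, max(a, b) ** 2)


def _as_interval(v):
    """Promote a plain number to the degenerate interval [v, v]."""
    return v if isinstance(v, Interval) else Interval(float(v), float(v))


def sq(v):
    """Square that works on both floats and Intervals."""
    return v.sq() if isinstance(v, Interval) else v * v


def integrate_implicit(f, g, box, max_depth):
    """Approximate the integral of g over {(x, y) in box : f(x, y) <= 0}."""
    (x0, x1), (y0, y1) = box
    area = (x1 - x0) * (y1 - y0)
    xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
    bound = f(Interval(x0, x1), Interval(y0, y1))  # enclosure of f over the box
    if bound.hi <= 0.0:  # f <= 0 on the whole box: box is inside the domain
        return g(xm, ym) * area  # one-point midpoint rule
    if bound.lo > 0.0:  # f > 0 on the whole box: box is outside
        return 0.0
    if max_depth == 0:  # still ambiguous at the finest level: test the midpoint
        return g(xm, ym) * area if f(xm, ym) <= 0.0 else 0.0
    # Ambiguous box: split into four children (one quadtree node).
    return sum(
        integrate_implicit(f, g, (xs, ys), max_depth - 1)
        for xs in ((x0, xm), (xm, x1))
        for ys in ((y0, ym), (ym, y1))
    )


def disk(x, y):
    """Unit disk as an implicit function: x^2 + y^2 - 1 <= 0."""
    return sq(x) + sq(y) - 1.0


if __name__ == "__main__":
    one = lambda x, y: 1.0
    estimate = integrate_implicit(disk, one, ((-1.0, 1.0), (-1.0, 1.0)), max_depth=9)
    print(f"area estimate {estimate:.6f}, exact {math.pi:.6f}")
```

Only the ambiguous boxes along the boundary are refined, so the work grows with the boundary length rather than with the area of the bounding box. On a GPU those ambiguous boxes are the natural unit of parallel work; in CUDA, dynamic parallelism lets a running kernel launch child kernels, which suits an adaptive tree like this one, whereas the sketch above simply recurses serially on the CPU.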

Updated: 2021-07-30