SMSG: Profiling-Free Parallelism Modeling for Distributed Training of DNN
International Journal of Parallel Programming (IF 1.5), Pub Date: 2022-12-12, DOI: 10.1007/s10766-022-00741-6
Haoran Wang , Thibaut Tachon , Chong Li , Sophie Robert , Sébastien Limet

The increasing size of deep neural networks (DNNs) creates a strong demand for distributed training. An expert can find good hybrid parallelism strategies, but designing suitable strategies by hand is time- and labor-consuming. Automating parallelism strategy generation is therefore crucial and desirable for DNN designers. Several automatic search approaches have recently been studied to free experts from the heavy work of devising parallel strategies. However, these approaches all rely on a numerical cost model, which requires extensive profiling results that lack portability. Such profiling-based approaches cannot lighten the strategy-generation work because the profiled values are not reusable across configurations. Our intuition is that there is no need to estimate the actual execution time of distributed training; it suffices to compare the relative cost of different strategies. We propose SMSG (Symbolic Modeling for Strategy Generation), which analyses cost based on communication and computation semantics. With SMSG, the parallel cost analysis is decoupled from hardware characteristics. SMSG defines cost functions for each kind of operator to quantitatively evaluate the amount of data involved in computation and communication, which eliminates heavy profiling tasks. In addition, SMSG shows how to apply functional transformations based on the third homomorphism theorem to control the high search complexity. Our experiments show that SMSG finds good hybrid parallelism strategies that yield training performance comparable to the state of the art. Moreover, SMSG covers a wide variety of DNN models with good scalability, and it provides portability across changing training configurations that profiling-based approaches cannot.
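The core idea — comparing the relative cost of strategies from data volumes rather than measured execution times — can be illustrated with a minimal sketch. This is not the paper's actual implementation: the operator, the two strategies, and the communication-volume formulas (standard ring all-reduce and all-gather estimates) are assumptions chosen for illustration only.

```python
# Illustrative sketch (not SMSG's actual code): picking between two
# parallelism strategies for a single matmul operator by comparing
# symbolic communication volumes, with no hardware profiling at all.
from dataclasses import dataclass


@dataclass(frozen=True)
class MatMul:
    """A matmul Y = X @ W with X of shape (batch, m, k) and W of shape (k, n)."""
    batch: int
    m: int
    k: int
    n: int


def data_parallel_comm(op: MatMul, devices: int) -> int:
    """Per-step communication under data parallelism: an all-reduce of the
    weight gradient (k*n values). A ring all-reduce moves roughly
    2*(d-1)/d of the tensor per device."""
    grad_size = op.k * op.n
    return 2 * (devices - 1) * grad_size // devices


def tensor_parallel_comm(op: MatMul, devices: int) -> int:
    """Per-step communication when W is split column-wise across devices:
    the output activations (batch*m*n values) must be all-gathered."""
    activation_size = op.batch * op.m * op.n
    return (devices - 1) * activation_size // devices


def cheaper_strategy(op: MatMul, devices: int) -> str:
    """Compare the two strategies purely by symbolic data volume."""
    dp = data_parallel_comm(op, devices)
    tp = tensor_parallel_comm(op, devices)
    return "data-parallel" if dp <= tp else "tensor-parallel"


# A large weight with small activations favours tensor parallelism, and
# vice versa -- the comparison needs no measured execution times.
wide = MatMul(batch=8, m=16, k=4096, n=4096)   # big W, small activations
tall = MatMul(batch=512, m=512, k=64, n=64)    # small W, big activations
print(cheaper_strategy(wide, devices=8))       # tensor-parallel
print(cheaper_strategy(tall, devices=8))       # data-parallel
```

Because both cost functions are expressed in the same unit (elements communicated per step), their ratio is independent of link bandwidth, which is what makes the comparison portable across hardware.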




Updated: 2022-12-13