Interruptible Nodes: Reducing Queueing Costs in Irregular Streaming Dataflow Applications on Wide-SIMD Architectures, International Journal of Parallel Programming



Interruptible Nodes: Reducing Queueing Costs in Irregular Streaming Dataflow Applications on Wide-SIMD Architectures
International Journal of Parallel Programming (IF 1.5), Pub Date: 2022-12-05, DOI: 10.1007/s10766-022-00745-2
Stephen Timcheck, Jeremy Buhler

Streaming dataflow applications are an attractive target to parallelize on wide-SIMD processors such as GPUs. These applications can be expressed as a pipeline of compute nodes connected by edges, which feed outputs from one node to the next. Streaming applications often exhibit irregular dataflow, where the amount of output produced for one input is unknown a priori. Inserting finite queues between pipeline nodes can ameliorate the impact of irregularity and improve SIMD lane occupancy. The sizing of these queues is driven by both performance and safety considerations: relative queue sizes should be chosen to reduce runtime overhead and maximize throughput, but each node’s output queue must be large enough to accommodate the maximum number of outputs produced by one SIMD vector of inputs to the node. When safety and performance considerations conflict, the application may incur excessive memory usage and runtime overhead. In this work, we identify properties of applications that lead to such undesirable behaviors, with examples from applications implemented in our MERCATOR framework for irregular streaming on GPUs. To address these issues, we propose extensions to support interruptible nodes that can be suspended mid-execution if their output queues fill. We illustrate the impacts of adding interruptible nodes to the MERCATOR framework on representative irregular streaming applications from the domains of branching search and bioinformatics.
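The queue-sizing constraint and the idea of suspending a node mid-execution can be illustrated with a small toy model. The sketch below is not MERCATOR's API and not the authors' implementation; every name in it (`min_safe_capacity`, `InterruptibleNode`, `run_pipeline`, `SIMD_WIDTH`) is hypothetical, and it processes inputs sequentially on the CPU rather than as SIMD vectors on a GPU. It assumes a node in which input `n` produces `n` outputs, so the output count per input is irregular.

```python
from collections import deque

SIMD_WIDTH = 4  # hypothetical vector width, chosen small for illustration


def min_safe_capacity(simd_width, max_outputs_per_input):
    """Worst-case output-queue capacity needed to absorb one full SIMD
    vector of inputs when the node cannot be suspended."""
    return simd_width * max_outputs_per_input


class InterruptibleNode:
    """Toy node (an assumption, not MERCATOR code): input n expands to
    the outputs (n, 0), ..., (n, n-1)."""

    def __init__(self, out_capacity):
        self.out_capacity = out_capacity
        self.out_queue = deque()
        self.pending = None  # (input, next output index) saved on suspend

    def run(self, in_queue):
        """Consume inputs until in_queue is empty or out_queue is full.
        Returns True if all work finished, False if the node suspended."""
        while self.pending is not None or in_queue:
            if self.pending is None:
                self.pending = (in_queue.popleft(), 0)
            item, k = self.pending
            while k < item:
                if len(self.out_queue) >= self.out_capacity:
                    self.pending = (item, k)  # remember where we stopped
                    return False
                self.out_queue.append((item, k))
                k += 1
            self.pending = None
        return True


def run_pipeline(inputs, out_capacity):
    """Run the node to completion, draining its output queue (standing in
    for a downstream node) after every suspension."""
    node = InterruptibleNode(out_capacity)
    in_queue = deque(inputs)
    results, suspensions = [], 0
    while True:
        finished = node.run(in_queue)
        results.extend(node.out_queue)
        node.out_queue.clear()
        if finished:
            return results, suspensions
        suspensions += 1


if __name__ == "__main__":
    inputs = [3, 0, 5, 2]
    print("safe capacity without interruption:",
          min_safe_capacity(SIMD_WIDTH, max(inputs)))
    results, suspensions = run_pipeline(inputs, out_capacity=4)
    print(len(results), "outputs,", suspensions, "suspensions")
```

In this toy setting, a non-interruptible node would need room for `SIMD_WIDTH * max_outputs_per_input` items before it could safely fire, while the interruptible version completes with a much smaller queue at the cost of saving and restoring its progress. Whether that trade pays off on a real GPU depends on factors this sketch does not model, such as lane occupancy and scheduling overhead.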


Updated: 2022-12-07
