Validity constraints for data analysis workflows
Future Generation Computer Systems (IF 7.5), Pub Date: 2024-03-25, DOI: 10.1016/j.future.2024.03.037
Florian Schintke, Khalid Belhajjame, Ninon De Mecquenem, David Frantz, Vanessa Emanuela Guarino, Marcus Hilbrich, Fabian Lehmann, Paolo Missier, Rebecca Sattler, Jan Arne Sparka, Daniel T. Speckhard, Hermann Stolte, Anh Duc Vu, Ulf Leser

Porting a scientific data analysis workflow (DAW) to a cluster infrastructure, a new software stack, or even just a new dataset with notably different properties is often challenging. Although the DAW specification gives a structured definition of the steps (tasks) of a complex data analysis and their interdependencies, relevant assumptions may remain unspecified and implicit. Such hidden assumptions often lead to tasks that crash without a meaningful error message, poor overall performance, non-terminating executions, or silently wrong results, to name only a few possible consequences. Searching for the causes of such errors and shortcomings can be tedious and time-consuming in a distributed compute cluster managed by a complex infrastructure stack, which is where DAWs for large datasets are typically executed.

Updated: 2024-03-25