State of the art on quality control for data streams: A systematic literature review
Computer Science Review (IF 12.9), Pub Date: 2023-04-17, DOI: 10.1016/j.cosrev.2023.100554
Mostafa Mirzaie, Behshid Behkamal, Mohammad Allahbakhsh, Samad Paydar, Elisa Bertino

Endless streams of data are now generated by sources such as sensors, applications, and users. Because of possible issues at the source, such as malfunctions in sensors, platforms, or communication, the generated data may be of low quality, which can lead to wrong outcomes for tasks that rely on these data streams. Controlling the quality of data streams has therefore become increasingly important. Many approaches have been proposed for this purpose, and several research areas have emerged as a result. To the best of our knowledge, no systematic literature review comprehensively surveys these approaches, classifies them, and highlights the open challenges.

In this paper, we present the state of the art in quality control for data streams and characterize it along four dimensions. The first dimension is the goal of the quality analysis, which can be either quality assessment or quality improvement. The second is the quality control method, which can be online, offline, or hybrid. The third is the quality control technique. The fourth indicates whether the quality control approach uses any contextual information (inherent, system, organizational, or spatiotemporal context). We compare and critically review the related approaches proposed in the last two decades along these dimensions. We also discuss open challenges and future research directions.




Updated: 2023-04-18