A visual analysis approach for data imputation via multi-party tabular data correlation strategies
Frontiers of Information Technology & Electronic Engineering (IF 3), Pub Date: 2023-12-29, DOI: 10.1631/fitee.2300480
Haiyang Zhu, Dongming Han, Jiacheng Pan, Yating Wei, Yingchaojie Feng, Luoxuan Weng, Ketian Mao, Yuankai Xing, Jianshu Lv, Qiucheng Wan, Wei Chen

Data imputation is an essential pre-processing task for data governance, aimed at filling in incomplete data. However, conventional data imputation methods can only partly alleviate data incompleteness using isolated tabular data, and they fail to achieve the best balance between accuracy and efficiency. In this paper, we present a novel visual analysis approach for data imputation. We develop a multi-party tabular data association strategy that uses intelligent algorithms to identify similar columns and establish column correlations across multiple tables. Then, we perform the initial imputation of incomplete data using correlated data entries from other tables. Additionally, we develop a visual analysis system to refine data imputation candidates. Our interactive system combines the multi-party data imputation approach with expert knowledge, allowing for a better understanding of the relational structure of the data. This significantly enhances the accuracy and efficiency of data imputation, thereby enhancing the quality of data governance and the intrinsic value of data assets. Experimental validation and user surveys demonstrate that this method supports users in verifying and judging the associated columns and similar rows using their domain knowledge.
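
The abstract does not say which algorithms identify similar columns or how correlated entries are selected. As a rough illustration of the two automated steps only (column association across tables, then initial imputation from matching rows), the sketch below assumes Jaccard overlap of column values as the similarity measure and a shared key column for row matching. The function names, the 0.5 threshold, and the sample tables are all hypothetical and are not taken from the paper.

```python
from typing import Optional

# A table maps column name -> list of cell values; None marks a missing cell.
Table = dict[str, list[Optional[str]]]


def jaccard(a: list[Optional[str]], b: list[Optional[str]]) -> float:
    """Jaccard overlap of the non-missing values in two columns."""
    sa = {v for v in a if v is not None}
    sb = {v for v in b if v is not None}
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def match_columns(left: Table, right: Table, threshold: float = 0.5) -> dict[str, str]:
    """Map each left column to its most similar right column, if any
    reaches the (assumed) similarity threshold."""
    matches: dict[str, str] = {}
    for lname, lvals in left.items():
        best_name, best_score = None, threshold
        for rname, rvals in right.items():
            score = jaccard(lvals, rvals)
            if score >= best_score:
                best_name, best_score = rname, score
        if best_name is not None:
            matches[lname] = best_name
    return matches


def impute(left: Table, right: Table, key: str, matches: dict[str, str]) -> Table:
    """Fill missing cells in `left` from the row of `right` that has the
    same key value, using the column mapping in `matches`."""
    right_key = matches[key]
    index = {k: i for i, k in enumerate(right[right_key]) if k is not None}
    filled = {name: list(vals) for name, vals in left.items()}
    for row, k in enumerate(left[key]):
        j = index.get(k)
        if j is None:
            continue  # no correlated row in the other table
        for lname, rname in matches.items():
            if filled[lname][row] is None:
                filled[lname][row] = right[rname][j]
    return filled


# Hypothetical tables held by two parties, with differently named columns.
left = {"id": ["a", "b", "c"], "city": ["Hangzhou", None, "Beijing"]}
right = {
    "uid": ["a", "b", "c", "d"],
    "town": ["Hangzhou", "Shanghai", "Beijing", "Shenzhen"],
}

matches = match_columns(left, right)
filled = impute(left, right, key="id", matches=matches)
print(matches)
print(filled["city"])
```

In the approach the abstract describes, candidates produced by a step like this are then verified and refined by an expert in the visual analysis system rather than accepted automatically.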




Updated: 2023-12-29