Incremental schema integration for data wrangling via knowledge graphs
Semantic Web (IF 3), Pub Date: 2023-06-08, DOI: 10.3233/sw-233347
Javier Flores 1, Kashif Rabbani 2, Sergi Nadal 1, Cristina Gómez 1, Oscar Romero 1, Emmanuel Jamin 3, Stamatia Dasiopoulou 3

Abstract

Virtual data integration is the current go-to approach for data wrangling in data-driven decision-making. In this paper, we focus on automating schema integration, which extracts a homogenised representation of the data source schemata and integrates them into a global schema to enable virtual data integration. Schema integration requires a set of well-known constructs: the data source schemata and wrappers, a global integrated schema, and the mappings between them. Based on these constructs, virtual data integration systems enable fast, on-demand data exploration via query rewriting. Unfortunately, such constructs are currently generated in a largely manual manner, which hinders the feasibility of the approach in real scenarios. The problem is aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged, semi-automatic and incremental approach grounded on knowledge graphs that generates the required schema integration constructs in four main steps: bootstrapping, schema matching, schema integration, and generation of system-specific constructs. We also present NextiaDI, a tool implementing our approach. Finally, we present a comprehensive evaluation of the approach.
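
The four steps named in the abstract can be pictured with a toy sketch. This is not NextiaDI's API, and the abstract does not specify any of the details below: every function name (`bootstrap`, `match`, `integrate`, `export_mappings`), the triple layout, and the `attributeOf` predicate are hypothetical, label similarity from Python's `difflib` merely stands in for a real schema matcher, and the semi-automatic step (a user reviewing proposed alignments) is replaced by accepting every proposal above a threshold.

```python
from difflib import SequenceMatcher


def bootstrap(source, records):
    """Step 1 (bootstrapping): derive a source schema, as triples, from raw records."""
    attrs = sorted({key for record in records for key in record})
    return {(f"{source}.{attr}", "attributeOf", source) for attr in attrs}


def match(global_schema, source_schema, threshold=0.8):
    """Step 2 (schema matching): propose the best global attribute per source attribute."""
    alignments = {}
    for attr_iri, _, _ in sorted(source_schema):
        label = attr_iri.split(".", 1)[1].lower()
        scored = [
            (SequenceMatcher(None, label, g.lower()).ratio(), g)
            for g in sorted(global_schema)
        ]
        if scored:
            score, best = max(scored)
            if score >= threshold:  # assumption: auto-accept instead of user review
                alignments[attr_iri] = best
    return alignments


def integrate(global_schema, source_schema, alignments):
    """Step 3 (schema integration): fold one source into the global schema."""
    merged = {g: set(srcs) for g, srcs in global_schema.items()}
    for attr_iri, _, _ in sorted(source_schema):
        # Unaligned attributes become new global attributes named after their label.
        target = alignments.get(attr_iri, attr_iri.split(".", 1)[1])
        merged.setdefault(target, set()).add(attr_iri)
    return merged


def export_mappings(global_schema):
    """Step 4 (system-specific constructs): here, just readable mapping lines."""
    return [
        f"{g} <- {', '.join(sorted(srcs))}"
        for g, srcs in sorted(global_schema.items())
    ]


# Hypothetical sources, integrated one at a time (the incremental part).
sources = {
    "customers_csv": [{"name": "Ada", "email": "ada@example.org"}],
    "clients_json": [{"Name": "Bob", "phone": "555-0100"}],
}

global_schema = {}
for source, records in sources.items():
    schema = bootstrap(source, records)
    alignments = match(global_schema, schema)
    global_schema = integrate(global_schema, schema, alignments)

for line in export_mappings(global_schema):
    print(line)
```

Because each source is folded into the existing global schema rather than recomputed from scratch, adding a third source only requires one more pass through the loop, which is the sense in which the sketch is incremental.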




Updated: 2023-06-08