Analytic Processing in Data Lakes: A Semantic Query-Driven Discovery Approach
Information Systems Frontiers (IF 5.9), Pub Date: 2024-02-14, DOI: 10.1007/s10796-024-10471-4
Claudia Diamantini, Domenico Potena, Emanuele Storti

Data integration and discovery are open issues in Data Lakes, which may store hundreds of data sources. The present paper addresses these issues for multidimensional data sources, that is, sources containing atomic or derived measures aggregated along a number of dimensions, typically derived from raw data for analytical and reporting purposes. Combining semantic models of metadata with existing data-driven techniques, the paper proposes an approach for discovering mappings between source metadata and concepts in a reference knowledge graph. These mappings enable the definition of reasoning-based techniques to discover, integrate, and rank data sources relevant to a given analytical query. The efficiency and effectiveness of the approach are discussed by means of experiments on real-world scenarios.




Updated: 2024-02-15