Sanitizing data for analysis: Designing systems for data understanding
Electronic Markets (IF 6.017), Pub Date: 2023-10-09, DOI: 10.1007/s12525-023-00677-w
Joshua Holstein, Max Schemmer, Johannes Jakubik, Michael Vössing, Gerhard Satzger

As organizations accumulate vast amounts of data for analysis, a significant challenge remains in fully understanding these datasets to extract accurate information and generate real-world impact. In particular, the high dimensionality of datasets and the lack of sufficient documentation, specifically metadata, often limit the potential to exploit the full value of data via analytical methods. To address these issues, this study proposes a hybrid approach to metadata generation that leverages both the in-depth knowledge of domain experts and the scalability of automated processes. The approach centers on two key design principles, semanticization and contextualization, to facilitate the understanding of high-dimensional datasets. A real-world case study conducted at a leading pharmaceutical company validates the effectiveness of this approach, demonstrating improved collaboration and knowledge sharing among users. By addressing the challenges in metadata generation, this research contributes significantly toward empowering organizations to make more effective, data-driven decisions.




Updated: 2023-10-11