Synthetic data generation: State of the art in health care domain, Computer Science Review

Synthetic data generation: State of the art in health care domain
Computer Science Review (IF 12.9), Pub Date: 2023-02-26, DOI: 10.1016/j.cosrev.2023.100546
Hajra Murtaza, Musharif Ahmed, Naurin Farooq Khan, Ghulam Murtaza, Saad Zafar, Ambreen Bano

Recent progress in artificial intelligence and machine learning has led to the growth of research in every aspect of life, including the health care domain. However, privacy risks and legislation hinder the availability of patient data to researchers. Synthetic data (SD) has been regarded as a privacy-safe alternative to real data and has lately been employed in many research and academic endeavors. This growing body of research needs to be consolidated so that researchers and practitioners can quickly gain a useful understanding of the state of the art in synthetic data generation in health care. The purpose of this study is to collate and synthesize the current state of synthetic data generation through a narrative review of 70 peer-reviewed studies discussing privacy-preserving synthetic medical data generation techniques. The literature shows the effectiveness of synthetic datasets for different applications in research, academia, and testing, as measured by existing statistical and task-based utility metrics. However, longitudinal synthetic data appears to have received insufficient attention. Moreover, a unified metric for generic quality assessment of synthetic data is lacking. The results of this review will serve as a quick reference guide for researchers and practitioners in the health care domain to select a suitable synthetic data strategy for their application based on its strengths and weaknesses, and will pave the way for further research and development in health care.

Updated: 2023-02-26
