Impact of ECG data format on the performance of machine learning models for the prediction of myocardial infarction, Journal of Electrocardiology


Journal of Electrocardiology (IF 1.3), Pub Date: 2024-03-07, DOI: 10.1016/j.jelectrocard.2024.03.005
Ryan A.A. Bellfield, Sandra Ortega-Martorell, Gregory Y.H. Lip, David Oxborough, Ivan Olier

Background: We aim to determine which electrocardiogram (ECG) data format is optimal for machine learning (ML) modelling in the context of myocardial infarction prediction. As an auxiliary objective, we evaluate the viability of using digitised ECG signals for ML modelling.

Methods: Two ECG arrangements were used, displaying 10 s and 2.5 s of data for each lead. For each arrangement, conservative and speculative data cohorts were generated from the PTB-XL dataset, containing 8,358 and 11,621 ECGs, respectively. All ECGs were represented in three data formats: Signal ECGs, Image ECGs, and Extracted Signal ECGs. ML models were trained on each of the three data formats in both cohorts.

Results: For ECGs containing 10 s of data, the Signal and Extracted Signal formats were optimal and statistically similar, with AUCs [95% CI] of 0.971 [0.961, 0.981] and 0.974 [0.965, 0.984], respectively, for the conservative cohort, and 0.931 [0.918, 0.945] and 0.919 [0.903, 0.934], respectively, for the speculative cohort. For ECGs containing 2.5 s of data, the Image ECG format was optimal, with AUCs of 0.960 [0.948, 0.973] and 0.903 [0.886, 0.920] for the conservative and speculative cohorts, respectively.

Conclusion: When available, Signal ECG data should be preferred for ML modelling. Otherwise, the optimal format depends on the data arrangement within the ECG: if the Image ECG contains 10 s of data for each lead, the Extracted Signal ECG is optimal; if it contains only 2.5 s, the Image ECG itself gives the best ML performance.
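The results above are reported as AUCs with 95% confidence intervals. The abstract does not say how those intervals were obtained, so the following is only a minimal sketch of one common approach, a percentile bootstrap over the test set, and should not be read as the authors' method. The function names `auc` and `bootstrap_auc_ci`, the number of resamples, and the toy labels and scores are all illustrative assumptions.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point AUC, lower bound, upper bound) from a percentile bootstrap.

    Assumption: cases are resampled with replacement; resamples that contain
    only one class are skipped because the AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return auc(labels, scores), lower, upper


if __name__ == "__main__":
    # Toy data only: 1 = myocardial infarction, 0 = no infarction (hypothetical).
    toy_labels = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
    toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70, 0.60, 0.30]
    point, lower, upper = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=500)
    print(f"AUC {point:.3f} [{lower:.3f}, {upper:.3f}]")
```

Under this reading, two formats whose intervals overlap substantially, such as the Signal and Extracted Signal ECGs in the conservative cohort, would be hard to separate on AUC alone. The pairwise loop is quadratic in the number of cases, so a rank-based implementation would be preferable for cohorts of the size used in the study.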


Updated: 2024-03-07
