An omics-based machine learning approach to predict diabetes progression: a RHAPSODY study
Diabetologia (IF 8.2), Pub Date: 2024-02-19, DOI: 10.1007/s00125-024-06105-8
Roderick C. Slieker, Magnus Münch, Louise A. Donnelly, Gerard A. Bouland, Iulian Dragan, Dmitry Kuznetsov, Petra J. M. Elders, Guy A. Rutter, Mark Ibberson, Ewan R. Pearson, Leen M. ’t Hart, Mark A. van de Wiel, Joline W. J. Beulens

Aims/hypothesis

People with type 2 diabetes are heterogeneous in their disease trajectory, with some progressing more quickly to insulin initiation than others. Although classical biomarkers such as age, HbA1c and diabetes duration are associated with glycaemic progression, it is unclear how well such variables predict insulin initiation or requirement and whether newly identified markers have added predictive value.

Methods

In two prospective cohort studies as part of IMI-RHAPSODY, we investigated whether clinical variables and three types of molecular markers (metabolites, lipids, proteins) can predict time to insulin requirement using different machine learning approaches (lasso, ridge, GRridge, random forest). Clinical variables included age, sex, HbA1c, HDL-cholesterol and C-peptide. Models were run with unpenalised clinical variables (i.e. always included in the model without weights) or penalised clinical variables, or without clinical variables. Model development was performed in one cohort and the model was applied in a second cohort. Model performance was evaluated using Harrell’s C statistic.
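As an illustration of the evaluation metric only, the following is a minimal sketch of Harrell's C statistic for right-censored time-to-event data. It is not the study's code: the function name `harrell_c`, the toy data and the simplified handling of tied event times (such pairs are skipped) are assumptions made for this example.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C statistic for right-censored data (illustrative sketch).

    times       -- observed follow-up time per individual
    events      -- 1/True if the event (e.g. insulin requirement) was observed,
                   0/False if the individual was censored
    risk_scores -- model output; higher means higher predicted risk

    A pair (i, j) is comparable when i has the shorter time and i had the event.
    It is concordant when i also has the higher risk score. Tied scores count
    as 0.5. Pairs with tied times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored individual cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four individuals, the third is censored at 6 years.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_c = harrell_c(toy_times, toy_events, toy_scores)  # 4 of 5 comparable pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of individuals by time to event.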

Results

Of the 585 individuals from the Hoorn Diabetes Care System (DCS) cohort, 69 required insulin during follow-up (1.0–11.4 years); of the 571 individuals in the Genetics of Diabetes Audit and Research in Tayside Scotland (GoDARTS) cohort, 175 required insulin during follow-up (0.3–11.8 years). Overall, the clinical variables and proteins were selected in the different models most often, followed by the metabolites. The most frequently selected clinical variables were HbA1c (18 of the 36 models, 50%), age (15 models, 41.2%) and C-peptide (15 models, 41.2%). Base models (age, sex, BMI, HbA1c) including only clinical variables performed moderately in both the DCS discovery cohort (C statistic 0.71 [95% CI 0.64, 0.79]) and the GoDARTS replication cohort (C 0.71 [95% CI 0.69, 0.75]). A more extensive model including HDL-cholesterol and C-peptide performed better in both cohorts (DCS, C 0.74 [95% CI 0.67, 0.81]; GoDARTS, C 0.73 [95% CI 0.69, 0.77]). Two proteins, lactadherin and proto-oncogene tyrosine-protein kinase receptor, were most consistently selected and slightly improved model performance.

Conclusions/interpretation

Using machine learning approaches, we show that insulin requirement risk can be predicted moderately well, predominantly by clinical variables. Inclusion of molecular markers improves the prognostic performance beyond that of clinical variables by up to 5%. Such prognostic models could be useful for identifying people with diabetes at high risk of progressing quickly to treatment intensification.

Data availability

Summary statistics of lipidomic, proteomic and metabolomic data are available from a Shiny dashboard at https://rhapdata-app.vital-it.ch.

Updated: 2024-02-20