Development of a Machine Learning Modelling Tool for Predicting HIV Incidence Using Public Health Data from a County in the Southern United States
Clinical Infectious Diseases (IF 11.8), Pub Date: 2024-02-23, DOI: 10.1093/cid/ciae100
Carlos S Saldana 1, Elizabeth Burkhardt 2, Alfred Pennisi 2, Kirsten Oliver 2, John Olmstead 2, David P Holland 3, 4, Jenna Gettings 2, Daniel Mauck 2, David Austin 2, Pascale Wortley 2, Karla V Saldana Ochoa 5

Background: Recent advancements in machine learning (ML) have significantly improved the accuracy of models predicting HIV incidence. These models typically utilize electronic medical records and patient registries. This study aims to broaden the application of these tools by utilizing de-identified public health datasets for notifiable sexually transmitted infections (STIs) from a southern U.S. county known for high HIV incidence rates. The goal is to assess the feasibility and accuracy of ML in predicting HIV incidence, which could potentially inform and enhance public health interventions.

Methods: We analyzed two de-identified public health datasets, spanning January 2010 to December 2021, focusing on notifiable STIs. Our process involved data processing and feature extraction, including sociodemographic factors, STI cases, and social vulnerability index (SVI) metrics. Various ML algorithms were trained and evaluated for predicting HIV incidence, using metrics such as accuracy, precision, recall, and F1 score.

Results: The study included 85,224 individuals, with 2,027 (2.37%) newly diagnosed with HIV during the study period. The ML models demonstrated high performance in predicting HIV incidence among males and females. Influential predictive features for males included age at STI diagnosis, previous STI information, provider type, and SVI. For females, they included age, ethnicity, previous STI information, overall SVI, and race.

Conclusions: The high accuracy of our ML models in predicting HIV incidence highlights the potential of using public health datasets for public health interventions such as tailored HIV testing and prevention. While these findings are promising, further research is needed to translate these models into practical public health applications.
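As a rough illustration of the four evaluation metrics named in the Methods, the sketch below computes accuracy, precision, recall, and F1 score from confusion-matrix counts. It is not the authors' code or pipeline. The `Confusion` class, the `metrics` function, and the "always predict no HIV" baseline are assumptions introduced here for illustration; only the cohort size (85,224) and the number of new HIV diagnoses (2,027) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier (hypothetical helper)."""
    tp: int  # predicted HIV, diagnosed with HIV
    fp: int  # predicted HIV, not diagnosed
    fn: int  # predicted no HIV, diagnosed with HIV
    tn: int  # predicted no HIV, not diagnosed


def metrics(c: Confusion) -> dict[str, float]:
    """Return accuracy, precision, recall, and F1; undefined ratios become 0.0."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Figures taken from the abstract.
TOTAL = 85_224
POSITIVES = 2_027

# Assumed baseline (not from the paper): a classifier that never predicts HIV.
baseline = Confusion(tp=0, fp=0, fn=POSITIVES, tn=TOTAL - POSITIVES)
baseline_metrics = metrics(baseline)

for name, value in baseline_metrics.items():
    print(f"{name}: {value:.4f}")
```

With so few positive cases, this assumed baseline reaches an accuracy of roughly 97.6% while its recall and F1 score are both zero, which suggests why accuracy is usually read alongside precision, recall, and F1 on data this imbalanced.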

Updated: 2024-02-23