Enhancing compound confidence in suspect and non-target screening through machine learning-based retention time prediction
Environmental Pollution (IF 8.9), Pub Date: 2024-03-14, DOI: 10.1016/j.envpol.2024.123763
Dehao Song, Ting Tang, Rui Wang, He Liu, Danping Xie, Bo Zhao, Zhi Dang, Guining Lu

The retention time (RT) of contaminants of emerging concern (CECs) in liquid chromatography-high-resolution mass spectrometry (LC-HRMS) is crucial for database matching in non-targeted screening (NTS) analysis. In this study, we developed a machine learning (ML) model to predict the RTs of CECs in NTS analysis. Using 1051 CEC standards, we evaluated Random Forest (RF), XGBoost, Support Vector Regression (SVR), and Artificial Neural Network (ANN) models, each with molecular fingerprints and chemical descriptors as inputs, to establish an optimal model. The SVR model using chemical descriptors showed good predictive capacity, with two reported performance metrics of 0.850 and 0.925 (the metric symbols are missing from this copy of the abstract). The model was further validated through laboratory NTS compound characterization. When applied to examine CEC occurrence in a large wastewater treatment plant, we identified 40 level S1 CECs (structure confirmed by reference standard) and 234 level S2 compounds (probable structure by library spectrum match). The model predicted RTs for the level S2 compounds, and 153 of them were classified as high confidence (ΔRT < 2 min). The model thus served as a robust filtering mechanism within the analytical framework. This study emphasizes the importance of predicted RTs in NTS analysis and highlights the potential of prediction models. Our research introduces a workflow that enhances NTS analysis by using RT prediction models to determine compound confidence levels.
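
The ΔRT filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that ΔRT is the absolute difference between the observed and the predicted retention time, and it takes only the 2 min threshold from the abstract. The `Candidate` class, the function names, the compound labels, and all RT values are hypothetical and chosen for illustration; in practice the predicted RTs would come from a trained model such as the SVR model the abstract mentions.

```python
from dataclasses import dataclass

# Threshold taken from the abstract: high confidence when delta RT < 2 min.
DELTA_RT_THRESHOLD_MIN = 2.0


@dataclass
class Candidate:
    """A level S2 candidate compound (hypothetical structure for illustration)."""
    name: str                # hypothetical compound label
    observed_rt_min: float   # measured LC-HRMS retention time, in minutes
    predicted_rt_min: float  # retention time from a prediction model, in minutes


def delta_rt(candidate: Candidate) -> float:
    """Absolute observed-vs-predicted RT difference (assumed definition of delta RT)."""
    return abs(candidate.observed_rt_min - candidate.predicted_rt_min)


def split_by_confidence(candidates, threshold=DELTA_RT_THRESHOLD_MIN):
    """Split candidates into (high confidence, lower confidence) lists.

    A candidate is high confidence only if its delta RT is strictly below
    the threshold, matching the "< 2 min" wording of the abstract.
    """
    high, low = [], []
    for candidate in candidates:
        if delta_rt(candidate) < threshold:
            high.append(candidate)
        else:
            low.append(candidate)
    return high, low


# Made-up example values, not data from the study.
candidates = [
    Candidate("compound_A", observed_rt_min=5.10, predicted_rt_min=5.90),
    Candidate("compound_B", observed_rt_min=12.40, predicted_rt_min=8.75),
    Candidate("compound_C", observed_rt_min=3.00, predicted_rt_min=5.00),
]

high, low = split_by_confidence(candidates)
print("high confidence:", [c.name for c in high])
print("lower confidence:", [c.name for c in low])
```

Because the comparison is strict, a candidate whose ΔRT is exactly 2 min (such as `compound_C` here) falls into the lower-confidence group; whether the study treats that boundary the same way is not stated in the abstract.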

Updated: 2024-03-14