pH-dependent solubility prediction for optimized drug absorption and compound uptake by plants
Journal of Computer-Aided Molecular Design (IF 3.5), Pub Date: 2023-02-17, DOI: 10.1007/s10822-023-00496-3
Anne Bonin 1, Floriane Montanari 2, Sebastian Niederführ 3, Andreas H Göller 1

Aqueous solubility is the most important physicochemical property for agrochemical and drug candidates and a prerequisite for uptake, distribution, transport, and ultimately bioavailability in living species. Here we present the first-ever direct machine learning models for pH-dependent solubility in water. For this, we combined almost 300,000 data points from 11 solubility assays performed over 24 years with over one million data points from lipophilicity and melting point experiments. Data were split into three pH classes (acidic, neutral, and basic), representing the conditions of the stomach and intestinal tract in animals and humans, and of phloem and xylem in plants. We find that multi-task neural networks using ECFP-6 fingerprints outperform baseline random forests and single-task neural networks on the individual tasks. Our final model, with three solubility tasks using the pH-class combined data from different assays and five helper tasks, achieves root mean square errors (RMSE) of 0.56 log units overall (acidic 0.61; neutral 0.52; basic 0.54) and Spearman rank correlations of 0.83 (acidic 0.78; neutral 0.86; basic 0.86), making it a valuable tool for profiling compounds in pharmaceutical and agrochemical research. The model allows for the prediction of compound pH profiles with mean and median RMSE per molecule of 0.62 and 0.56 log units, respectively.
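To make the "RMSE per molecule" figure concrete, here is a minimal Python sketch of one plausible way to compute it: an RMSE is taken over the pH classes available for each molecule, and the mean and median are then taken across molecules. The abstract does not spell out the exact procedure, so this reading is an assumption. The function names, the data layout, and all numeric values are hypothetical and chosen only for illustration; they are not data or results from the paper.

```python
import math
from statistics import mean, median

# The three pH classes named in the abstract.
PH_CLASSES = ("acidic", "neutral", "basic")


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def per_molecule_rmse(records):
    """Compute one RMSE per molecule over its measured pH classes.

    records maps a molecule id to a dict of
    pH class -> (observed, predicted) log solubility.
    Molecules with no measured pH class are skipped.
    """
    errors = {}
    for mol_id, by_class in records.items():
        pairs = [by_class[c] for c in PH_CLASSES if c in by_class]
        if not pairs:
            continue
        observed, predicted = zip(*pairs)
        errors[mol_id] = rmse(observed, predicted)
    return errors


# Invented example values (log units); NOT taken from the paper.
records = {
    "mol_A": {"acidic": (-3.0, -3.5), "neutral": (-4.0, -4.5), "basic": (-5.0, -5.5)},
    "mol_B": {"acidic": (-2.0, -2.0), "neutral": (-2.0, -3.0)},
    "mol_C": {"neutral": (-6.0, -6.3), "basic": (-6.0, -5.6)},
}

profile_errors = per_molecule_rmse(records)
mean_rmse = mean(profile_errors.values())
median_rmse = median(profile_errors.values())
print(f"mean per-molecule RMSE:   {mean_rmse:.2f}")
print(f"median per-molecule RMSE: {median_rmse:.2f}")
```

Reporting both statistics is informative because a few molecules with poorly predicted pH profiles pull the mean up while leaving the median largely unchanged, which is consistent with the mean in the abstract being the larger of the two values.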



Updated: 2023-02-17