Abstract
Aqueous solubility is the most important physicochemical property for agrochemical and drug candidates and a prerequisite for uptake, distribution, transport, and ultimately bioavailability in living organisms. Here we present the first direct machine learning models for pH-dependent solubility in water. For this, we combined almost 300,000 data points from 11 solubility assays performed over 24 years with over one million data points from lipophilicity and melting point experiments. The data were split into three pH classes (acidic, neutral, and basic) representing the conditions of the stomach and intestinal tract in animals and humans, and of the phloem and xylem in plants. We find that multi-task neural networks using ECFP-6 fingerprints outperform baseline random forests and single-task neural networks on the individual tasks. Our final model, which combines three solubility tasks (built from the pH-class-combined data of the different assays) with five helper tasks, achieves a root mean square error (RMSE) of 0.56 log units overall (acidic 0.61; neutral 0.52; basic 0.54) and a Spearman rank correlation of 0.83 overall (acidic 0.78; neutral 0.86; basic 0.86), making it a valuable tool for compound profiling in pharmaceutical and agrochemical research. The model also predicts compound pH profiles, with mean and median per-molecule RMSEs of 0.62 and 0.56 log units, respectively.
Author information
Contributions
AB performed the machine learning work and prepared all figures and data. SN and FM provided the machine learning concept and framework. AG identified the datasets and wrote the main manuscript. All authors reviewed the manuscript.
Ethics declarations
Competing interest
The authors declare no competing interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The electronic supplementary material is available in the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bonin, A., Montanari, F., Niederführ, S. et al. pH-dependent solubility prediction for optimized drug absorption and compound uptake by plants. J Comput Aided Mol Des 37, 129–145 (2023). https://doi.org/10.1007/s10822-023-00496-3
DOI: https://doi.org/10.1007/s10822-023-00496-3