
Predicting absolute aqueous solubility by applying a machine learning model for an artificially liquid-state as proxy for the solid-state

Journal of Computer-Aided Molecular Design

Abstract

In this study, we combine machine learning algorithms with QM-derived COSMO-RS descriptors and Morgan fingerprints to predict the absolute solubility of drug-like compounds. The QM-derived descriptors capture the molecular properties of the solute, i.e., the solute–solute interactions in an artificial liquid state (super-cooled liquid) and the solute–solvent interactions in solution. We employ two main approaches to predict solubility. (i) A hypothetical pathway that involves melting the solute at room temperature T = T¯ (\(\Delta_{\mathrm{fus}}G_{A}^{\ominus}\)) and mixing the artificially liquid solute into the solvent (\(\Delta_{m}G_{(A:B)}^{\ominus}\)); in this approach, \(\Delta_{\mathrm{fus}}G_{A}^{\ominus}\) is predicted using machine learning models and \(\Delta_{m}G_{(A:B)}^{\ominus}\) is obtained from COSMO-RS calculations. (ii) Direct solubility prediction using machine learning algorithms. The models were trained on a large number of Bayer in-house compounds for which water solubility data are available at a physiological pH of 6.5 and ambient temperature. We also evaluated our models on external datasets from a solubility challenge. For both in-house and external datasets, our models substantially improve on the absolute solubility predictions of the QSAR model for the artificial liquid state as implemented in the COSMOtherm software. We further show that QM-derived descriptors outperform cheminformatics descriptors. Finally, we present low-cost alternative models based on fragment-based COSMOquick calculations, with only a marginal reduction in the quality of the predicted solubility.
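The two-step pathway in approach (i) can be illustrated with a minimal sketch. It assumes that the free energy of dissolution is simply the sum of the melting and mixing terms, and that the result is read as a mole-fraction solubility in the dilute limit. The function name, the kJ/mol units, the 298.15 K default, and the numerical inputs are hypothetical choices for illustration, not values or conventions taken from the paper.

```python
import math

# Gas constant in kJ/(mol K); the unit choice is an assumption of this sketch.
R_KJ_PER_MOL_K = 8.314462618e-3


def log10_mole_fraction_solubility(dg_fus, dg_mix, temperature=298.15):
    """Combine the two steps of the hypothetical pathway into log10(x).

    dg_fus: free energy of melting the solid into the artificial liquid
            state (kJ/mol); in approach (i) this term comes from an ML model.
    dg_mix: free energy of mixing the liquid solute into the solvent
            (kJ/mol); in approach (i) this term comes from COSMO-RS.
    Assumes additivity of the two steps and a dilute-limit, mole-fraction
    reading of the result; both are simplifications made for this sketch.
    """
    dg_total = dg_fus + dg_mix
    return -dg_total / (R_KJ_PER_MOL_K * temperature * math.log(10))


# Hypothetical inputs, not data from the study.
example = log10_mole_fraction_solubility(dg_fus=10.0, dg_mix=20.0)
print(f"log10(x) = {example:.2f}")
```

In this picture, a larger fusion or mixing free energy lowers the predicted solubility, which is why an error in either term carries over directly to the final estimate.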


Data availability

The online version contains supplementary material available at …

References

  1. Chung TDY, Terry DB, Smith LH (2004) In vitro and in vivo assessment of ADME and PK properties during lead selection and lead optimization—guidelines, benchmarks and rules of thumb. In: Markossian S, Grossman A, Brimacombe K, Arkin M, Auld D, Austin C, Baell J, Chung TDY, Coussens NP, Dahlin JL, Devanarayan V, Foley TL, Glicksman M, Haas JV, Hall MD, Hoare S, Inglese J, Iversen PW, Kales SC, Lal-Nag M, Li Z, McGee J, McManus O, Riss T, Saradjian P, Sittampalam GS, Tarselli M, Trask OJ Jr, Wang Y, Weidner JR, Wildey MJ, Wilson K, Xia M, Xu X (eds) Assay guidance manual. Bethesda

    Google Scholar 

  2. Clark DE, Grootenhuis PD (2002) Progress in computational methods for the prediction of ADMET properties. Curr Opin Drug Discov Devel 5(3):382–390

    CAS  PubMed  Google Scholar 

  3. Dearden JC (2007) In silico prediction of ADMET properties: how far have we come? Expert Opin Drug Metab Toxicol 3(5):635–639

    CAS  PubMed  Google Scholar 

  4. Göller AH, Kuhnke L, Montanari F, Bonin A, Schneckener S, ter Laak A, Wichard J, Lobell M, Hillisch A (2020) Bayer’s in silico ADMET platform: a journey of machine learning over the past two decades. Drug Discov Today 25(9):1702–1709

    PubMed  Google Scholar 

  5. Göller AH, Kuhnke L, ter Laak A, Meier K, Hillisch A (2022) Machine learning applied to the modeling of pharmacological and ADMET absorption, distribution, metabolism, excretion and toxicity (ADMET) endpoints. In: Heifetz A (ed) Artificial intelligence in drug design. New York, Springer, pp 61–101

    Google Scholar 

  6. Kier LB, Hall LH (2005) The prediction of ADMET properties using structure information representations. Chem Biodivers 2(11):1428–1437

    CAS  PubMed  Google Scholar 

  7. Lucas AJ, Sproston JL, Barton P, Riley RJ (2019) Estimating human ADME properties, pharmacokinetic parameters and likely clinical dose in drug discovery. Expert Opin Drug Discov 14(12):1313–1327

    CAS  PubMed  Google Scholar 

  8. Norinder U, Bergstrom CA (2006) Prediction of ADMET properties. ChemMedChem 1(9):920–937

    CAS  PubMed  Google Scholar 

  9. Oliferenko PV, Oliferenko AA, Poda G, Palyulin VA, Zefirov NS, Katritzky AR (2009) New developments in hydrogen bonding acidity and basicity of small organic molecules for the prediction of physical and ADMET properties: part 2—the universal solvation equation. J Chem Inf Model 49(3):634–646

    CAS  PubMed  Google Scholar 

  10. Zhou SF, Zhong WZ (2017) Drug design and discovery: principles and applications. Molecules 22(2):279

    PubMed  PubMed Central  Google Scholar 

  11. Eleftheriadou D, Luette S, Kneuer C (2019) In silico prediction of dermal absorption of pesticides—an evaluation of selected models against results from in vitro testing. SAR QSAR Environ Res 30(8):561–585

    CAS  PubMed  Google Scholar 

  12. Elliott JR, Compton RG (2022) Modeling transcuticular uptake from particle-based formulations of lipophilic products. ACS Agric Sci Technol 2(3):603–614

    CAS  PubMed  PubMed Central  Google Scholar 

  13. Khayet M, Fernandez V (2012) Estimation of the solubility parameters of model plant surfaces and agrochemicals: a valuable tool for understanding plant surface interactions. Theor Biol Med Model 9:45

    CAS  PubMed  PubMed Central  Google Scholar 

  14. Xiao S, Gong Y, Li Z, Fantke P (2021) Improving pesticide uptake modeling into potatoes: considering tuber growth dynamics. J Agric Food Chem 69(12):3607–3616

    CAS  PubMed  Google Scholar 

  15. Avdeef A, Fuguet E, Llinàs A, Ràfols C, Bosch E, Völgyi G, Verbić T, Boldyreva E, Takács-Novák K (2016) Equilibrium solubility measurement of ionizable drugs–consensus recommendations for improving data quality. ADMET and DMPK 4(2):117–178

    Google Scholar 

  16. Fink C, Sun DJ, Wagner K, Schneider M, Bauer H, Dolgos H, Mader K, Peters SA (2020) Evaluating the role of solubility in oral absorption of poorly water-soluble drugs using physiologically-based pharmacokinetic modeling. Clin Pharmacol Ther 107(3):650–661

    CAS  PubMed  Google Scholar 

  17. Llinas A, Avdeef A (2019) Solubility challenge revisited after ten years, with multilab shake-flask data, using tight (SD ∼ 0.17 log) and loose (SD ∼ 0.62 log) test sets. J Chem Inf Model 59(6):3036–3040

    CAS  PubMed  Google Scholar 

  18. Ono A, Matsumura N, Kimoto T, Akiyama Y, Funaki S, Tamura N, Hayashi S, Kojima Y, Fushimi M, Sudaki H, Aihara R, Haruna Y, Jiko M, Iwasaki M, Fujita T, Sugano K (2019) Harmonizing solubility measurement to lower inter-laboratory variance—progress of consortium of biopharmaceutical tools (CoBiTo) in Japan. ADMET DMPK 7(3):183–195

    PubMed  PubMed Central  Google Scholar 

  19. Kuramochi H, Kawamoto K (2006) Modification of UNIFAC parameter table revision 5 for representation of aqueous solubility and 1-octanol/water partition coefficient for POPs. Chemosphere 63(4):698–706

    CAS  PubMed  Google Scholar 

  20. Banerjee S, Howard PH (1988) Improved estimation of solubility and partitioning through correction of UNIFAC-derived activity coefficients. Environ Sci Technol 22(7):839–841

    CAS  PubMed  Google Scholar 

  21. Arbuckle WB (1986) Using UNIFAC to calculate aqueous solubilities. Environ Sci Technol 20(10):1060–1064

    CAS  PubMed  Google Scholar 

  22. Ochsner AB, Sokoloski TD (1985) Prediction of solubility in nonideal multicomponent systems using the UNIFAC group contribution model. J Pharm Sci 74(6):634–637

    CAS  PubMed  Google Scholar 

  23. Banerjee S (1985) Calculation of water solubility of organic compounds with UNIFAC-derived parameters. Environ Sci Technol 19(4):369–370

    CAS  PubMed  Google Scholar 

  24. Fredenslund A, Jones RL, Prausnitz JM (1975) Group-contribution estimation of activity-coefficients in nonideal liquid-mixtures. Aiche J 21(6):1086–1099

    CAS  Google Scholar 

  25. Hildebrand, J. H., Solubility of non-electrolytes. 1936, 2nd ed. Pp. 203. New York: Reinhold Publishing Corp., London: Chapman & Hall, Ltd. 22s. 6d

  26. Hildebrand JH (1949) A critique of the theory of solubility of non-electrolytes. Chem Rev 44(1):37–45

    CAS  PubMed  Google Scholar 

  27. Hildebrand JH (1950) Factors determining solubility among non-electrolytes. Proc Natl Acad Sci USA 36(1):7–15

    CAS  PubMed  PubMed Central  Google Scholar 

  28. Martin A, Paruta AN, Adjei A (1981) Extended hildebrand solubility approach: methylxanthines in mixed solvents. J Pharm Sci 70(10):1115–1120

    CAS  PubMed  Google Scholar 

  29. Martin A, Miralles MJ (1982) Extended Hildebrand solubility approach: solubility of tolbutamide, acetohexamide, and sulfisomidine in binary solvent mixtures. J Pharm Sci 71(4):439–442

    CAS  PubMed  Google Scholar 

  30. Martin A, Wu PL, Adjei A, Lindstrom RE, Elworthy PH (1982) Extended Hildebrand solubility approach and the log linear solubility equation. J Pharm Sci 71(8):849–856

    CAS  PubMed  Google Scholar 

  31. Bustamante P, Escalera B, Martin A, Selles E (1993) A modification of the extended Hildebrand approach to predict the solubility of structurally related drugs in solvent mixtures. J Pharm Pharmacol 45(4):253–257

    CAS  PubMed  Google Scholar 

  32. Lin HM, Nash RA (1993) An experimental method for determining the Hildebrand solubility parameter of organic nonelectrolytes. J Pharm Sci 82(10):1018–1026

    CAS  PubMed  Google Scholar 

  33. Jouyban-Gharamaleki A, Romero S, Bustamante P, Clark BJ (2000) Multiple solubility maxima of oxolinic acid in mixed solvents and a new extension of Hildebrand solubility approach. Chem Pharm Bull (Tokyo) 48(2):175–178

    CAS  PubMed  Google Scholar 

  34. Wu PL, Beerbower A, Martin A (1982) Extended Hansen approach: calculating partial solubility parameters of solid solutes. J Pharm Sci 71(11):1285–1287

    CAS  PubMed  Google Scholar 

  35. Barra J, Lescure F, Doelker E, Bustamante P (1997) The expanded Hansen approach to solubility parameters: Paracetamol and citric acid in individual solvents. J Pharm Pharmacol 49(7):644–651

    CAS  PubMed  Google Scholar 

  36. Hansen CM (2007) Hansen solubility parameters: a user’s handbook. CRC Press

    Google Scholar 

  37. Louwerse MJ, Maldonado A, Rousseau S, Moreau-Masselon C, Roux B, Rothenberg G (2017) Revisiting Hansen solubility parameters by including thermodynamics. ChemPhysChem 18(21):2999–3006

    CAS  PubMed  PubMed Central  Google Scholar 

  38. Famini GR, Headley AD, Wilson L (1994) Using theoretical descriptors in Qsar and Lfer—the role of solute solvent interactions in solubility, acidity and basicity. Abstr Pap Am Chem S 207:96

    Google Scholar 

  39. Abraham MH, Green CE, Acree WE, Hernandez CE, Roy LE (1998) Descriptors for solutes from the solubility of solids: trans-stilbene as an example. J Chem Soc Perk T 2 12:2677–2681

    Google Scholar 

  40. Green CE, Abraham MH, Acree WE, De Fina KM, Sharp TL (2000) Solvation descriptors for pesticides from the solubility of solids: diuron as an example. Pest Manag Sci 56(12):1043–1053

    CAS  Google Scholar 

  41. Acree WE, Abraham MH (2002) Solubility of crystalline nonelectrolyte solutes in organic solvents: mathematical correlation of Benzil solubilities with the Abraham general solvation model. J Solution Chem 31(4):293–303

    CAS  Google Scholar 

  42. Jouyban A, Soltanpour S, Soltani S, Chan HK, Acree WE (2007) Solubility prediction of drugs in water-cosolvent mixtures using Abraham solvation parameters. J Pharm Pharm Sci 10(3):263–277

    CAS  PubMed  Google Scholar 

  43. Jouyban A, Soltanpour S, Soltani S, Tamizi E, Fakhree MAA, Acree WE (2009) Prediction of drug solubility in mixed solvents using computed Abraham parameters. J Mol Liq 146(3):82–88

    CAS  Google Scholar 

  44. Abraham MH, Smith RE, Luchtefeld R, Boorem AJ, Luo R, Acree, Jr. WE (2010) Prediction of solubility of drugs and other compounds in organic solvents. J Pharm Sci 99(3):1500–1515

    CAS  PubMed  Google Scholar 

  45. Abraham MH, Le J (1999) The correlation and prediction of the solubility of compounds in water using an amended solvation energy relationship. J Pharm Sci US 88(9):868–880

    CAS  Google Scholar 

  46. Sutter JM, Jurs PC (1996) Prediction of aqueous solubility for a diverse set of heteroatom-containing organic compounds using a quantitative structure-property relationship. J Chem Inf Comp Sci 36(1):100–107

    CAS  Google Scholar 

  47. Katritzky AR, Wang YL, Sild S, Tamm T, Karelson M (1998) QSPR studies on vapor pressure, aqueous solubility, and the prediction of water-air partition coefficients. J Chem Inf Comp Sci 38(4):720–725

    CAS  Google Scholar 

  48. Yan A, Gasteiger J (2003) Prediction of aqueous solubility of organic compounds based on a 3D structure representation. J Chem Inf Comput Sci 43(2):429–434

    CAS  PubMed  Google Scholar 

  49. Rytting E, Lentz KA, Chen XQ, Qian F, Venkatesh S (2004) A quantitative structure-property relationship for predicting drug solubility in PEG 400/water cosolvent systems. Pharm Res-Dordr 21(2):237–244

    CAS  Google Scholar 

  50. Salahinejad M, Le TC, Winkler DA (2013) Aqueous solubility prediction: do crystal lattice interactions help? Mol Pharmaceut 10(7):2757–2766

    CAS  Google Scholar 

  51. Boobier S, Hose DRJ, Blacker AJ, Nguyen BN (2020) Machine learning with physicochemical relationships: solubility prediction in organic solvents and water. Nat Commun 11(1):5753. https://doi.org/10.1038/s41467-020-19594-z

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Kurotani A, Kakiuchi T, Kikuchi J (2021) Solubility prediction from molecular properties and analytical data using an in-phase deep neural network (Ip-DNN). ACS Omega 6(22):14278–14287

    CAS  PubMed  PubMed Central  Google Scholar 

  53. Ye Z, Ouyang D (2021) Prediction of small-molecule compound solubility in organic solvents by machine learning algorithms. J Cheminform 13(1):98

    CAS  PubMed  PubMed Central  Google Scholar 

  54. Göller AH, Hennemann M, Keldenich J, Clark T (2006) In silico prediction of buffer solubility based on quantum-mechanical and HQSAR- and topology-based descriptors. J Chem Inf Model 46(2):648–658

    PubMed  Google Scholar 

  55. Huuskonen J, Salo M, Taskinen J (1998) Aqueous solubility prediction of drugs based on molecular topology and neural network modeling. J Chem Inf Comput Sci 38(3):450–456

    CAS  PubMed  Google Scholar 

  56. Huuskonen J (2000) Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology. J Chem Inf Comput Sci 40(3):773–777

    CAS  PubMed  Google Scholar 

  57. Manallack DT, Tehan BG, Gancia E, Hudson BD, Ford MG, Livingstone DJ, Whitley DC, Pitt WR (2003) A consensus neural network-based technique for discriminating soluble and poorly soluble compounds. J Chem Inf Comput Sci 43(2):674–679

    CAS  PubMed  Google Scholar 

  58. Jouyban A, Majidi MR, Jalilzadeh H, Asadpour-Zeynali K (2004) Modeling drug solubility in water-cosolvent mixtures using an artificial neural network. Farmaco 59(6):505–512

    CAS  PubMed  Google Scholar 

  59. Lusci A, Pollastri G, Baldi P (2013) Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J Chem Inf Model 53(7):1563–1575

    CAS  PubMed  PubMed Central  Google Scholar 

  60. Deng T, Jia GZ (2020) Prediction of aqueous solubility of compounds based on neural network. Mol Phys. https://doi.org/10.1080/00268976.2019.1600754

    Article  Google Scholar 

  61. Tosca EM, Bartolucci R, Magni P (2021) Application of artificial neural networks to predict the intrinsic solubility of drug-like molecules. Pharmaceutics 13(7):1101

    CAS  PubMed  PubMed Central  Google Scholar 

  62. Jorgensen WL, Buckner JK, Boudon S, Tiradorives J (1988) Efficient computation of absolute free-energies of binding by computer-simulations - application to the methane dimer in water. J Chem Phys 89(6):3742–3746

    CAS  Google Scholar 

  63. Vangunsteren WF, Berendsen HJC (1990) Computer-simulation of molecular-dynamics—methodology, applications, and perspectives in chemistry. Angew Chem Int Edit 29(9):992–1023

    Google Scholar 

  64. Shirts MR, Bair E, Hooker G, Pande VS (2003) Equilibrium free energies from nonequilibrium measurements using maximum-likelihood methods. Phys Rev Lett. https://doi.org/10.1103/PhysRevLett.91.140601

    Article  PubMed  Google Scholar 

  65. van Gunsteren WF, Bakowies D, Baron R, Chandrasekhar I, Christen M, Daura X, Gee P, Geerke DP, Glattli A, Hunenberger PH, Kastenholz MA, Ostenbrink C, Schenk M, Trzesniak D, van der Vegt NFA, Yu HB (2006) Biomolecular modeling: goals, problems, perspectives. Angew Chem Int Ed 45(25):4064–4092

    Google Scholar 

  66. Christ CD, van Gunsteren WF (2007) Enveloping distribution sampling: a method to calculate free energy differences from a single simulation. J Chem Phys. https://doi.org/10.1063/1.2730508

    Article  PubMed  Google Scholar 

  67. Christ CD, van Gunsteren WF (2008) Multiple free energies from a single simulation: extending enveloping distribution sampling to nonoverlapping phase-space distributions. J Chem Phys. https://doi.org/10.1063/1.2913050

    Article  PubMed  Google Scholar 

  68. Christ CD, van Gunsteren WF (2009) Comparison of three enveloping distribution sampling Hamiltonians for the estimation of multiple free energy differences from a single simulation. J Comput Chem 30(11):1664–1679

    CAS  PubMed  Google Scholar 

  69. Khavrutskii IV, Wallqvist A (2011) Improved binding free energy predictions from single-reference thermodynamic integration augmented with Hamiltonian replica exchange. J Chem Theory Comput 7(9):3001–3011

    CAS  PubMed  PubMed Central  Google Scholar 

  70. Miao YL, Sinko W, Pierce L, Bucher D, Walker RC, McCammon JA (2014) Improved reweighting of accelerated molecular dynamics simulations for free energy calculation. J Chem Theory Comput 10(7):2677–2689

    CAS  PubMed  PubMed Central  Google Scholar 

  71. Hospital A, Goñi JR, Orozco M, Gelpí JL (2015) Molecular dynamics simulations: advances and applications. Adv Appl Bioinform Chem 8:37–47

    PubMed  PubMed Central  Google Scholar 

  72. Sidler D, Cristofol-Clough M, Schwaninger A, Riniker S (2017) Replica exchange envelope distribution sampling (RE-EDS): arobust and accurate method to calculate multiple free energy differences from a single simulation. Abstr Pap Am Chem Soc 254.

  73. Hahn DF, Hunenberger PH (2019) Alchemical free-energy calculations by multiple-replica lambda-dynamics: the conveyor belt thermodynamic integration scheme. J Chem Theory Comput 15(4):2392–2419

    CAS  PubMed  Google Scholar 

  74. Filipe HAL, Loura LMS (2022) Molecular dynamics simulations: advances and applications. Molecules 27(7):2105

    CAS  PubMed  PubMed Central  Google Scholar 

  75. Klamt A, Schuurmann G (1993) Cosmo—a new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient. J Chem Soc Perk T 2 5:799–805

    Google Scholar 

  76. Klamt A (1995) Conductor-like screening model for real solvents—a new approach to the quantitative calculation of solvation phenomena. J Phys Chem-Us 99(7):2224–2235

    CAS  Google Scholar 

  77. Klamt A, Jonas V, Burger T, Lohrenz JCW (1998) Refinement and parametrization of COSMO-RS. J Phys Chem A 102(26):5074–5085

    CAS  Google Scholar 

  78. Klamt A (2011) The COSMO and COSMO-RS solvation models. Wires Comput Mol Sci 1(5):699–709

    CAS  Google Scholar 

  79. Klamt A (2016) COSMO-RS for aqueous solvation and interfaces. Fluid Phase Equilibr 407:152–158

    CAS  Google Scholar 

  80. Klamt A (2018) The COSMO and COSMO-RS solvation models. Wires Comput Mol Sci. https://doi.org/10.1002/wcms.1338

    Article  Google Scholar 

  81. Diedenhofen M, Eckert F, Klamt A (2003) Prediction of infinite dilution activity coefficients of organic compounds in ionic liquids using COSMO-RS. J Chem Eng Data 48(3):475–479

    CAS  Google Scholar 

  82. Putnam R, Taylor R, Klamt A, Eckert F, Schiller M (2003) Prediction of infinite dilution activity coefficients using COSMO-RS. Ind Eng Chem Res 42(15):3635–3641

    CAS  Google Scholar 

  83. Kashefolgheta S, Verde AV (2017) Developing force fields when experimental data is sparse: AMBER/GAFF-compatible parameters for inorganic and alkyl oxoanionst. Phys Chem Chem Phys 19(31):20593–20607

    CAS  PubMed  Google Scholar 

  84. Satarifard V, Kashefolgheta S, Vila Verde A, Grafmüller A (2017) Is the solution activity derivative sufficient to parametrize ion-ion interactions? Ions for TIP5P water. J Chem Theory Comput 13(5):2112–2122

    CAS  PubMed  Google Scholar 

  85. Matos GDR, Calabro G, Mobley DL (2019) Infinite dilution activity coefficients as constraints for force field parametrization and method development. J Chem Theory Comput 15(5):3066–3074

    Google Scholar 

  86. Klamt A, Diedenhofen M (2010) Blind prediction test of free energies of hydration with COSMO-RS. J Comput Aid Mol Des 24(4):357–360

    CAS  Google Scholar 

  87. Zhang J, Tuguldur B, van der Spoel D (2015) Force field benchmark of organic liquids: 2—Gibbs energy of solvation. J Chem Inf Model 55(6):1192–1201

    CAS  PubMed  Google Scholar 

  88. Matos GDR, Kyu DY, Loeffler HH, Chodera JD, Shirts MR, Mobley DL (2017) Approaches for calculating solvation free energies and enthalpies demonstrated with an update of the FreeSolv database. J Chem Eng Data 62(5):1559–1569

    PubMed Central  Google Scholar 

  89. Riquelme M, Lara A, Mobley DL, Verstraelen T, Matamala AR, Vohringer-Martinez E (2018) Hydration free energies in the FreeSolv database calculated with polarized iterative Hirshfeld charges. J Chem Inf Model 58(9):1779–1797

    CAS  PubMed  PubMed Central  Google Scholar 

  90. Kashefolgheta S, Oliveira MP, Rieder SR, Horta BAC, Acree WE, Hunenberger PH (2020) Evaluating classical force fields against experimental cross-solvation free energies. J Chem Theory Comput 16(12):7556–7580

    CAS  PubMed  Google Scholar 

  91. Kashefolgheta S, Wang SZ, Acree WE, Hunenberger PH (2021) Evaluation of nine condensed-phase force fields of the GROMOS, CHARMM, OPLS, AMBER, and OpenFF families against experimental cross-solvation free energies. Phys Chem Chem Phys 23(23):13055–13074

    CAS  PubMed  PubMed Central  Google Scholar 

  92. Bannan CC, Burley KH, Chiu M, Shirts MR, Gilson MK, Mobley DL (2016) Blind prediction of cyclohexane-water distribution coefficients from the SAMPL5 challenge. J Comput Aided Mol Des 30(11):927–944

    CAS  PubMed  PubMed Central  Google Scholar 

  93. Bannan CC, Calabro G, Kyu DY, Mobley DL (2016) Calculating partition coefficients of small molecules in octanol/water and cyclohexane/water. J Chem Theory Comput 12(8):4015–4024

    CAS  PubMed  PubMed Central  Google Scholar 

  94. Zhang HY, Jiang Y, Cui ZH, Yin CH (2018) Force field benchmark of amino acids: 2—partition coefficients between water and organic solvents. J Chem Inf Model 58(8):1669–1681

    CAS  PubMed  Google Scholar 

  95. Loschen C, Reinisch J, Klamt A (2020) COSMO-RS based predictions for the SAMPL6 logP challenge. J Comput Aided Mol Des 34(4):385–392

    CAS  PubMed  Google Scholar 

  96. Warnau J, Wichmann K, Reinisch J (2021) COSMO-RS predictions of logP in the SAMPL7 blind challenge. J Comput Aided Mol Des 35(7):813–818

    CAS  PubMed  Google Scholar 

  97. Andersson MP, Bennetzen MV, Klamt A, Stipp SLS (2014) First-principles prediction of liquid/liquid interfacial tension. J Chem Theory Comput 10(8):3401–3408

    CAS  PubMed  Google Scholar 

  98. Remesal ER, Suarez JA, Marquez AM, Sanz JF, Rincon C, Guitian J (2017) Molecular dynamics simulations of the role of salinity and temperature on the hydrocarbon/water interfacial tension. Theor Chem Acc. https://doi.org/10.1007/s00214-017-2096-9

    Article  Google Scholar 

  99. Klamt A, Schwobel J, Huniar U, Koch L, Terzi S, Gaudin T (2019) COSMOplex: self-consistent simulation of self-organizing inhomogeneous systems based on COSMO-RS. Phys Chem Chem Phys 21(18):9225–9238

    CAS  PubMed  Google Scholar 

  100. Andersson MP, Hassenkam T, Matthiesen J, Nikolajsen LV, Okhrimenko DV, Dobberschutz S, Stipp SLS (2020) First-principles prediction of surface wetting. Langmuir 36(42):12451–12459

    CAS  PubMed  Google Scholar 

  101. Abramov YA (2015) Major source of error in QSPR prediction of intrinsic thermodynamic solubility of drugs: solid vs nonsolid state contributions? Mol Pharm 12(6):2126–2141

    CAS  Google Scholar 

  102. Docherty R, Pencheva K, Abramov YA (2015) Low solubility in drug development: de-convoluting the relative importance of solvation and crystal packing. J Pharm Pharmacol 67(6):847–856

    CAS  PubMed  Google Scholar 

  103. McDonagh JL, Palmer DS, van Mourik T, Mitchell JBO (2016) Are the sublimation thermodynamics of organic molecules predictable? J Chem Inf Model 56(11):2162–2179

    CAS  PubMed  Google Scholar 

  104. Bera S, Dong X, Krishnarjuna B, Raab SA, Hales DA, Ji W, Tang Y, Shimon LJW, Ramamoorthy A, Clemmer DE, Wei G, Gazit E (2021) Solid-state packing dictates the unexpected solubility of aromatic peptides. Cell Rep Phys Sci 2(4):100391

    CAS  PubMed  PubMed Central  Google Scholar 

  105. Zhou Y, Wang J, Xiao Y, Wang T, Huang X (2018) The effects of polymorphism on physicochemical properties and pharmacodynamics of solid drugs. Curr Pharm Des 24(21):2375–2382

    CAS  PubMed  Google Scholar 

  106. Gavezzotti A (1994) Are crystal structures predictable? Accounts Chem Res 27(10):309–314

    CAS  Google Scholar 

  107. Dunitz JD (2003) Are crystal structures predictable? Chem Commun 5:545–548

    Google Scholar 

  108. Day GM, Chisholm J, Shan N, Motherwell WS, Jones W (2004) An assessment of lattice energy minimization for the prediction of molecular organic crystal structures. Cryst Growth Des 4(6):1327–1340

    CAS  Google Scholar 

  109. Price SL (2009) Computed crystal energy landscapes for understanding and predicting organic crystal structures and polymorphism. Acc Chem Res 42(1):117–126

    CAS  Google Scholar 

  110. Salahinejad M, Le TC, Winkler DA (2013) Capturing the crystal: prediction of enthalpy of sublimation, crystal lattice energy, and melting points of organic compounds. J Chem Inf Model 53(1):223–229

    CAS  PubMed  Google Scholar 

  111. Price SL (2014) Predicting crystal structures of organic compounds. Chem Soc Rev 43(7):2098–2111

    CAS  PubMed  Google Scholar 

  112. Dybeck EC, Schieber NP, Shirts MR (2016) Effects of a more accurate polarizable Hamiltonian on polymorph free energies computed efficiently by reweighting point-charge potentials. J Chem Theory Comput 12(8):3491–3505

    CAS  PubMed  Google Scholar 

  113. Beran GJO, Nanda K (2010) Predicting organic crystal lattice energies with chemical accuracy. J Phys Chem Lett 1(24):3480–3487

    CAS  Google Scholar 

  114. Buchholz HK, Stein M (2018) Accurate lattice energies of organic molecular crystals from periodic turbomole calculations. J Comput Chem 39(19):1335–1343

    CAS  PubMed  Google Scholar 

  115. Palmer DS, Llinas A, Morao I, Day GM, Goodman JM, Glen RC, Mitchell JB (2008) Predicting intrinsic aqueous solubility by a thermodynamic cycle. Mol Pharm 5(2):266–279

    CAS  PubMed  Google Scholar 

  116. Palmer DS, McDonagh JL, Mitchell JB, van Mourik T, Fedorov MV (2012) First-principles calculation of the intrinsic aqueous solubility of crystalline druglike molecules. J Chem Theory Comput 8(9):3322–3337

    CAS  PubMed  Google Scholar 

  117. Fraczkiewicz R, Lobell M, Göller AH, Krenz U, Schoenneis R, Clark RD, Hillisch A (2015) Best of both worlds: combining pharma data and state of the art modeling technology to improve in silico pKa prediction. J Chem Inf Model 55(2):389–397

    CAS  PubMed  Google Scholar 

  118. (2014) ADMET predictor, version 7.1; Simulations Plus, Inc.: Lancaster

  119. Llinas A, Oprisiu I, Avdeef A (2020) Findings of the second challenge to predict aqueous solubility. J Chem Inf Model 60(10):4791–4803

    CAS  PubMed  Google Scholar 

  120. Henderson LJ (1908) The theory of neutrality regulation in the animal organism. Am J Physiol 21(4):427–448

    Google Scholar 

  121. Henderson LJ (1908) Concerning the relationship between the strength of acids and their capacity to preserve neutrality. Am J Physiol 21(2):173–179

    CAS  Google Scholar 

  122. Po HN, Senozan NM (2001) The Henderson–Hasselbalch equation: its history and limitations. J Chem Educ 78(11):1499–1503

    CAS  Google Scholar 

  123. (2020) Pipeline pilot, version 21.2.0.2574, server version 21.2.0.2575; Dassault Systemes BIOVIA Corp.: San Diego

  124. RDKit: Open-source cheminformatics. https://www.rdkit.org

  125. Riniker S, Landrum GA (2015) Better informed distance geometry: using what we know to improve conformation generation. J Chem Inf Model 55(12):2562–2574

    CAS  Google Scholar 

  126. Spicher S, Grimme S (2020) Robust atomistic modeling of materials, organometallic, and biochemical systems. Angew Chem Int Ed Engl 59(36):15665–15673

    CAS  PubMed  PubMed Central  Google Scholar 

  127. Grimme S, Bannwarth C, Shushkov P (2017) A robust and accurate tight-binding quantum chemical method for structures, vibrational frequencies, and noncovalent interactions of large molecular systems parametrized for all spd-block elements (Z = 1–86). J Chem Theory Comput 13(5):1989–2009

    CAS  PubMed  Google Scholar 

  128. Becke AD (1988) Density-functional exchange-energy approximation with correct asymptotic behavior. Phys Rev A 38(6):3098–3100

    CAS  Google Scholar 

  129. Perdew JP (1986) Density-functional approximation for the correlation energy of the inhomogeneous electron gas. Phys Rev B 33(12):8822–8824

    CAS  Google Scholar 

  130. Eichkorn K, Treutler O, Ohm H, Haser M, Ahlrichs R (1995) Auxiliary basis-sets to approximate coulomb potentials. Chem Phys Lett 240(4):283–289

    CAS  Google Scholar 

  131. Eichkorn K, Weigend F, Treutler O, Ahlrichs R (1997) Auxiliary basis sets for main row atoms and transition metals and their use to approximate Coulomb potentials. Theor Chem Acc 97(1–4):119–124

    CAS  Google Scholar 

  132. TURBOMOLE V7.2 2017, a development of University of Karlsruhe and Forschungszentrum Karlsruhe GmbH, 1989–2007, TURBOMOLE GmbH, since 2007. http://www.turbomole.com

  133. COSMOtherm, release 19, © 2019 COSMOlogic GmbH & Co. KG, a Dassault Systèmes Company

  134. BIOVIA COSMOquick 2021 (2020) Dassault Systemes

  135. Loschen C, Klamt A (2012) COSMOquick: a novel interface for fast σ-profile composition and its application to COSMO-RS solvent screening using multiple reference solvents. Ind Eng Chem Res 51(43):14303–14308

    CAS  Google Scholar 

  136. Hornig M, Klamt A (2005) COSMOfrag: a novel tool for high-throughput ADME property prediction and similarity screening based on quantum chemistry. J Chem Inf Model 45(5):1169–1177

    CAS  PubMed  Google Scholar 

  137. Morgan HL (1965) The generation of a unique machine description for chemical structures—a technique developed at chemical abstracts service. J Chem Doc 5(2):107–113

    CAS  Google Scholar 

  138. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830

    Google Scholar 

  139. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee S-I (2020) From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2(1):56–67

  140. Hall LH, Kier LB (1995) Electrotopological state indexes for atom types—a novel combination of electronic, topological, and valence state information. J Chem Inf Comp Sci 35(6):1039–1045

  141. Huuskonen JJ, Livingstone DJ, Tetko IV (2000) Neural network modeling for estimation of partition coefficient based on atom-type electrotopological state indices. J Chem Inf Comp Sci 40(4):947–955

  142. Huuskonen JJ, Villa AEP, Tetko IV (1999) Prediction of partition coefficient based on atom-type electrotopological state indices. J Pharm Sci 88(2):229–233

  143. Kier LB, Hall LH (1990) An electrotopological-state index for atoms in molecules. Pharm Res 7(8):801–807

  144. Kier LB, Hall LH (1999) Molecular structure description: the electrotopological state. Academic Press

  145. Openochem oestate license. https://github.com/openochem/ochem-external-tools/blob/main/oestate/license.txt

  146. Openochem. https://github.com/openochem

  147. Sushko I, Novotarskyi S, Körner R, Pandey AK, Rupp M, Teetz W, Brandmaier S, Abdelaziz A, Prokopenko VV, Tanchuk VY, Todeschini R, Varnek A, Marcou G, Ertl P, Potemkin V, Grishina M, Gasteiger J, Schwab C, Baskin II, Palyulin VA, Radchenko EV, Welsh WJ, Kholodovych V, Chekmarev D, Cherkasov A, Aires-de-Sousa J, Zhang QY, Bender A, Nigsch F, Patiny L, Williams A, Tkachenko V, Tetko IV (2011) Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aid Mol Des 25(6):533–554

  148. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. KDD '16:785–794

  149. Dia M, Macris N, Krzakala F, Lesieur T, Zdeborová L (2016) Mutual information for symmetric rank-one matrix estimation: a proof of the replica formula. Adv Neural Inf Process Syst. https://doi.org/10.48550/arXiv.1606.04142

  150. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30

  151. Zhang H, Si S, Hsieh CJ (2017) GPU-acceleration for large-scale tree boosting. arXiv preprint arXiv:1706.08359

  152. Moriguchi I, Hirono S, Liu Q, Nakagome I, Matsushita Y (1992) Simple method of calculating octanol/water partition coefficient. Chem Pharm Bull 40(1):127–130

  153. Poda G, Tetko I (2005) Towards predictive ADME profiling of drug candidates: lipophilicity and solubility. In: Abstracts of papers of the American Chemical Society. American Chemical Society, Washington, DC, pp U201–U202

  154. Tetko IV, Bruneau P (2004) Application of ALOGPS to predict 1-octanol/water distribution coefficients, logP, and logD, of AstraZeneca in-house database. J Pharm Sci 93(12):3103–3110

  155. Tetko IV, Poda GI (2004) Application of ALOGPS 2.1 to predict log D distribution coefficient for Pfizer proprietary compounds. J Med Chem 47(23):5601–5604

  156. Tetko IV, Tanchuk VY (2002) Application of associative neural networks for prediction of lipophilicity in ALOGPS 2.1 program. J Chem Inf Comput Sci 42(5):1136–1145

  157. Tetko IV, Tanchuk VY, Kasheva TN, Villa AE (2001) Estimation of aqueous solubility of chemical compounds using E-state indices. J Chem Inf Comput Sci 41(6):1488–1493

  158. Tetko IV, Tanchuk VY, Villa AE (2001) Prediction of n-octanol/water partition coefficients from PHYSPROP database using artificial neural networks and E-state indices. J Chem Inf Comput Sci 41(5):1407–1421

  159. Viswanadhan VN, Ghose AK, Revankar GR, Robins RK (1989) Atomic physicochemical parameters for three dimensional structure directed quantitative structure-activity relationships: 4—additional parameters for hydrophobic and dispersive interactions and their application for an automated superposition of certain naturally occurring nucleoside antibiotics. J Chem Inf Comp Sci 29(3):163–172

  160. Openochem alogps license

Acknowledgements

None.

Funding

The work was funded by Bayer AG.

Author information

Contributions

SK did the QM calculations, created the ML models and wrote the manuscript. AB prepared the datasets. TG and AG developed the concept and guided the work. All authors reviewed the manuscript.

Corresponding author

Correspondence to Andreas H. Göller.

Ethics declarations

Competing interest

The authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

All authors have read and understood the publishing policy, and this manuscript is submitted in accordance with this policy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 680 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gheta, S.K.O., Bonin, A., Gerlach, T. et al. Predicting absolute aqueous solubility by applying a machine learning model for an artificially liquid-state as proxy for the solid-state. J Comput Aided Mol Des 37, 765–789 (2023). https://doi.org/10.1007/s10822-023-00538-w

  • DOI: https://doi.org/10.1007/s10822-023-00538-w
