Abstract
Drug discovery, especially virtual screening and drug repositioning, can be accelerated through better understanding and prediction of drug-target interactions (DTIs). Advances in deep learning, together with the time and financial cost of conventional wet-lab experiments, have made computational methods for DTI prediction increasingly popular. However, most of these methods treat DTI prediction as a binary classification task and ignore the quantitative binding affinity that determines how effectively a drug acts on its target protein. Moreover, the memory footprint and execution time of a model are often neglected in favor of accuracy. To address these challenges, we introduce a novel method, Time-efficient Multimodal Drug Target Binding Affinity (TeM-DTBA), which predicts the binding affinity between drugs and targets by fusing multiple modalities derived from compound structures and target sequences. We employ the Lasso feature selection method, which lowers the dimensionality of the feature vectors and reduces the training time of the proposed model by more than 50%. Results on two benchmark datasets show that our method outperforms state-of-the-art methods. The mean squared errors of 18.8% and 23.19% achieved on the KIBA and Davis datasets, respectively, suggest that our method is more accurate in predicting drug-target binding affinity.
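To illustrate the general idea behind Lasso-based feature selection mentioned above, the following is a minimal sketch, not the authors' implementation. It fits an L1-penalised linear model by cyclic coordinate descent and keeps only the features whose coefficients are non-zero. The function names (`soft_threshold`, `lasso_coordinate_descent`, `select_features`), the penalty value `lam=0.1`, and the synthetic data are all assumptions made for illustration; the abstract does not state how the Lasso step was configured or which features it was applied to.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    X is a list of rows. No intercept is fitted, so inputs are assumed centred.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no signal
            # Correlation of feature j with the residual that excludes j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def select_features(X, y, lam):
    """Return the indices of features with a non-zero Lasso coefficient."""
    w = lasso_coordinate_descent(X, y, lam)
    return [j for j, wj in enumerate(w) if wj != 0.0]


# Synthetic stand-in for fused drug/target feature vectors (assumption):
# only features 0 and 5 actually influence the target value.
rng = random.Random(0)
n_samples, n_features = 200, 20
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [3.0 * row[0] - 2.0 * row[5] + rng.gauss(0.0, 0.1) for row in X]

selected = select_features(X, y, lam=0.1)
X_reduced = [[row[j] for j in selected] for row in X]  # lower-dimensional input
print("kept feature indices:", selected)
print("dimensionality:", n_features, "->", len(selected))
```

Because the L1 penalty drives uninformative coefficients to exactly zero, a downstream model trained on `X_reduced` sees fewer inputs, which is the mechanism by which such a step can shorten training. In practice a library implementation such as scikit-learn's `Lasso` would normally replace the hand-written solver shown here.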
Data availability
The datasets used in the current study are available in the repository https://github.com/hkmztrk/DeepDTA/tree/master/data.
Author information
Contributions
TL: methodology, software, validation, formal analysis, investigation, data curation, writing—original draft. TA: validation, formal analysis, investigation, writing—review and editing. CS: writing—review and editing, formal analysis, conceptualization, validation, visualization.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Ethical approval
The data used in the study adheres to the standard ethical rules and regulations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Liyaqat, T., Ahmad, T. & Saxena, C. TeM-DTBA: time-efficient drug target binding affinity prediction using multiple modalities with Lasso feature selection. J Comput Aided Mol Des 37, 573–584 (2023). https://doi.org/10.1007/s10822-023-00533-1
DOI: https://doi.org/10.1007/s10822-023-00533-1