
Optimised stacked machine learning algorithms for genomics and genetics disorder detection in the healthcare industry

  • Original Article
  • Functional & Integrative Genomics

Abstract

With recent advances in precision medicine and healthcare computing, there is strong demand for machine learning algorithms in genomics that enable rapid analysis of genetic disorders. Technological advances in genomics and imaging provide clinicians with vast amounts of data, but prediction remains largely subjective, which can lead to problematic medical treatment. Machine learning is employed across many areas of healthcare, including clinical research, early disease identification, and the development of new medicines. The main objective of this study is to identify patients who, based on several medical criteria, are more susceptible to having a genetic disorder. A genetic disease prediction algorithm was employed that uses the patient's health history to estimate the probability of a genetic disorder diagnosis. We developed a computationally efficient machine learning approach to predict the overall lifespan of patients with a genomic disorder and to classify patients with a genetic disease. The proposed model stacks a support vector machine (SVM), a random forest (RF), and an extra trees classifier (ETC) using a two-layer meta-estimator. The first layer comprises the baseline models, which make predictions from the dataset; the second layer is a meta-classifier that combines their outputs. Experimental results indicate that the model achieved an accuracy of 90.45% and a recall of 90.19%. The area under the curve (AUC) is 98.1% for mitochondrial diseases, 97.5% for multifactorial diseases, and 98.8% for single-gene inheritance diseases. The proposed approach offers a novel method for predicting patient prognosis that is unbiased, accurate, and comprehensive. In terms of identification accuracy, it outperforms human professionals using the current clinical standard for genetic disease classification.
The stacked model will significantly benefit biomedical research by improving the prediction of genetic diseases.
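The two-layer stacking described in the abstract can be illustrated with scikit-learn's `StackingClassifier`. This is a minimal sketch under stated assumptions, not the authors' implementation: the logistic-regression meta-classifier, the hyperparameters, and the synthetic data from `make_classification` (standing in for the Kaggle dataset) are all assumptions, and `CLASS_NAMES` is a hypothetical labelling of the three disorder subclasses whose per-class AUCs the abstract reports.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, label_binarize
from sklearn.svm import SVC

# Hypothetical class names standing in for the three disorder subclasses.
CLASS_NAMES = ["mitochondrial", "multifactorial", "single-gene"]


def build_stacked_model(random_state=0):
    """Layer 1: SVM, RF and ETC base learners. Layer 2: a meta-classifier."""
    base_learners = [
        # SVMs are scale-sensitive, so features are standardised first;
        # probability=True lets the SVM expose predict_proba to the stack.
        ("svm", make_pipeline(StandardScaler(),
                              SVC(probability=True, random_state=random_state))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=random_state)),
        ("etc", ExtraTreesClassifier(n_estimators=100, random_state=random_state)),
    ]
    # Assumption: logistic regression as the second-layer meta-classifier.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",
        cv=5,  # meta-classifier is trained on out-of-fold base-learner outputs
    )


def evaluate(random_state=0):
    # Synthetic 3-class stand-in data; NOT the dataset used in the article.
    X, y = make_classification(n_samples=1500, n_features=20, n_informative=10,
                               n_classes=3, random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state)
    model = build_stacked_model(random_state).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    proba = model.predict_proba(X_test)  # columns follow model.classes_ = [0, 1, 2]
    # One-vs-rest AUC for each class.
    y_bin = label_binarize(y_test, classes=[0, 1, 2])
    per_class_auc = {name: roc_auc_score(y_bin[:, k], proba[:, k])
                     for k, name in enumerate(CLASS_NAMES)}
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "macro_recall": recall_score(y_test, y_pred, average="macro"),
        "auc": per_class_auc,
    }


if __name__ == "__main__":
    results = evaluate()
    print(f"accuracy={results['accuracy']:.3f} "
          f"macro recall={results['macro_recall']:.3f}")
    for name, auc in results["auc"].items():
        print(f"AUC ({name}): {auc:.3f}")
```

Setting `cv=5` means the meta-classifier never sees base-learner predictions made on the same rows those learners were trained on, which limits leakage from the first layer into the second. The numbers printed here reflect the synthetic data only and say nothing about the article's reported results.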


Fig. 1
Fig. 2
Algorithm 1
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9


Availability of data and material

The dataset used in this study is publicly available on Kaggle: https://www.kaggle.com/datasets/aryarishabh/of-genomes-and-genetics-hackerearth-ml-challenge.


Funding

The authors are thankful to AIDA Lab CCIS Prince Sultan University, Riyadh, Saudi Arabia, for the support.

Author information


Contributions

A.R.: conceptualization, methodology. M.M.: software programming, validation, verification. T.S.: formal analysis, investigation. G.J.: resources, data curation, management.

Corresponding author

Correspondence to Gwanggil Jeon.

Ethics declarations

Ethical approval

Not applicable

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rehman, A., Mujahid, M., Saba, T. et al. Optimised stacked machine learning algorithms for genomics and genetics disorder detection in the healthcare industry. Funct Integr Genomics 24, 23 (2024). https://doi.org/10.1007/s10142-024-01289-z


  • DOI: https://doi.org/10.1007/s10142-024-01289-z
