Efficient Classification of Hallmark of Cancer Using Embedding-Based Support Vector Machine for Multilabel Text

New Generation Computing

Abstract

The hallmarks of cancer are the biological capabilities of tumor cells that help medical experts understand how these cells develop and how they can be identified at different stages of the disease. The hallmarks-of-cancer classification, based on the work of Hanahan and Weinberg, is a widely accepted framework built on 10 hallmark capabilities that collectively enable the development and progression of cancer. It helps researchers identify the key molecular and cellular pathways involved in the disease, which can inform the development of new diagnostic tools and therapies. Multi-label classification assigns a set of labels, rather than a single label, to each sample under study. This paper builds an improved model by combining biomedical domain-specific embeddings for all the extracted biomedical features with a machine learning model. Domain-specific embeddings add semantics to the vector representation of the text. More specifically, the study aims to improve the efficacy of multi-label classification relative to other state-of-the-art methods by using BioWordVec and MeSH embeddings. The experiments showed a significant improvement in the performance of the model, which is trained with a Support Vector Machine (SVM). The paper also examines label correlation, which is studied through a case study with medical domain experts and analyzed with the proposed model.
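The general idea of embedding-based multi-label SVM classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the 2-D word vectors in `TOY_EMBEDDINGS` are hypothetical stand-ins for BioWordVec or MeSH embeddings, the three training texts are invented, documents are represented by simply averaging word vectors, and each hallmark label gets its own independent linear SVM (binary relevance), which ignores the label correlation the paper studies. The SVM is trained with dual coordinate descent on the hinge loss, chosen here only to keep the sketch free of external libraries.

```python
# Hypothetical 2-D word vectors standing in for BioWordVec / MeSH embeddings.
TOY_EMBEDDINGS = {
    "angiogenesis": [1.0, 0.0],
    "vessel": [0.9, 0.1],
    "apoptosis": [0.0, 1.0],
    "death": [0.1, 0.9],
}
LABELS = ["inducing angiogenesis", "resisting cell death"]

# Invented toy training set: each text carries a set of hallmark labels.
TRAIN = [
    ("angiogenesis vessel", {"inducing angiogenesis"}),
    ("apoptosis death", {"resisting cell death"}),
    ("angiogenesis apoptosis", {"inducing angiogenesis", "resisting cell death"}),
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def embed(text):
    """Average the vectors of known tokens; zero vector if none are known."""
    vectors = [TOY_EMBEDDINGS[t] for t in text.lower().split() if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train_linear_svm(X, y, C=100.0, passes=200):
    """Dual coordinate descent for a hinge-loss linear SVM.

    A constant feature of 1.0 is appended so the last weight acts as the bias.
    """
    Z = [x + [1.0] for x in X]
    w = [0.0] * len(Z[0])
    alpha = [0.0] * len(Z)
    for _ in range(passes):
        for i, (z, yi) in enumerate(zip(Z, y)):
            gradient = yi * dot(w, z) - 1.0
            # Newton step on one dual variable, clipped to the box [0, C].
            new_alpha = min(max(alpha[i] - gradient / dot(z, z), 0.0), C)
            step = (new_alpha - alpha[i]) * yi
            alpha[i] = new_alpha
            w = [wj + step * zj for wj, zj in zip(w, z)]
    return w


# Binary relevance: one independent SVM per hallmark label.
X_train = [embed(text) for text, _ in TRAIN]
MODELS = {
    label: train_linear_svm(X_train, [1.0 if label in tags else -1.0 for _, tags in TRAIN])
    for label in LABELS
}


def predict(text):
    """Return every label whose SVM scores the document above zero."""
    z = embed(text) + [1.0]
    return {label for label, w in MODELS.items() if dot(w, z) > 0.0}


print(sorted(predict("vessel death")))
```

Because the prediction is a set, a text can receive zero, one, or several hallmarks at once, which is what distinguishes the multi-label setting from ordinary multi-class classification. In practice a library SVM and real pretrained embeddings would replace the toy pieces above.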

Data availability

On request.

Author information

Corresponding author

Correspondence to Shikha Verma.

Ethics declarations

Conflicts of interest

The authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge, or beliefs) in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Verma, S., Sharan, A. & Malik, N. Efficient Classification of Hallmark of Cancer Using Embedding-Based Support Vector Machine for Multilabel Text. New Gener. Comput. (2024). https://doi.org/10.1007/s00354-024-00248-3

  • DOI: https://doi.org/10.1007/s00354-024-00248-3
