Automatic Extractive Text Summarization using Multiple Linguistic Features

Online AM: 08 April 2024

Abstract

Automatic text summarization (ATS) uses natural language processing (NLP) to produce summaries of distinct categories of information. Applications of these techniques remain limited for low-resource languages such as Hindi. This study proposes a method for automatically generating summaries of Hindi documents using an extractive technique. The approach retrieves pertinent sentences from the source documents by employing multiple linguistic features and machine learning (ML) based on maximum likelihood estimation (MLE) and maximum entropy (ME). We pre-process the input documents, including removing Hindi stop words and stemming. We obtain 15 linguistic feature scores from each document to identify the high-scoring phrases for summary generation. We performed experiments with the proposed text summarizer on BBC News articles, CNN News, DUC 2004, the Hindi Text Short Summarization Corpus, the Indian Language News Text Summarization Corpus, and Wikipedia articles. The Hindi Text Short Summarization Corpus and the Indian Language News Text Summarization Corpus are in Hindi, whereas the BBC News articles, CNN News, and DUC 2004 datasets were translated into Hindi using the Google, Microsoft Bing, and Systran translators for the experiments. Summarization results are reported for both Hindi and English to compare performance on a low-resource and a resource-rich language. The evaluation uses multiple ROUGE metrics along with precision, recall, and F-measure, and shows better performance for the proposed method across multiple ROUGE scores. We compare the proposed method with supervised and unsupervised machine learning methods, including support vector machine (SVM), Naive Bayes (NB), decision tree (DT), latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and K-means clustering, and find that the proposed method outperforms them.
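The pipeline described in the abstract (stop-word removal, feature-based sentence scoring, extraction, and ROUGE evaluation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The stop-word list is a hypothetical toy set, stemming is omitted, and the two features (average term frequency and sentence position) are illustrative stand-ins for the 15 linguistic features and the MLE/ME models, none of which are specified in the abstract. All function names (`tokenize`, `summarize`, `rouge_1`) are assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy stop-word list; the paper's actual Hindi list is not given here.
STOP_WORDS = {"है", "का", "की", "के", "में", "और", "से", "को"}


def tokenize(sentence):
    """Split on whitespace, strip danda/punctuation, and drop stop words."""
    tokens = (tok.strip("।,.?!") for tok in sentence.split())
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order.

    Assumed scoring: average document-level term frequency of the
    sentence's tokens plus a position bonus of 1 / (index + 1).
    """
    tokenized = [tokenize(s) for s in sentences]
    freq = Counter(tok for toks in tokenized for tok in toks)
    scores = []
    for i, toks in enumerate(tokenized):
        tf = sum(freq[t] for t in toks) / len(toks) if toks else 0.0
        position = 1.0 / (i + 1)
        scores.append(tf + position)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall, and F-measure (ROUGE-1)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f_measure
```

Calling `summarize(sentences, k=2)` on a list of sentences returns the two top-scoring sentences, and `rouge_1(summary_text, reference_text)` returns a `(precision, recall, F-measure)` tuple for comparing that extract against a reference summary. A fuller system would replace the two toy features with a richer feature set and a learned weighting.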


          • Published in

            ACM Transactions on Asian and Low-Resource Language Information Processing, Just Accepted
            ISSN: 2375-4699
            EISSN: 2375-4702

            Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Online AM: 8 April 2024
            • Accepted: 1 April 2024
            • Revised: 14 March 2024
            • Received: 30 November 2023

            Qualifiers

            • research-article