research-article

Just Accepted

Automatic Extractive Text Summarization using Multiple Linguistic Features

Authors:

Pooja Gupta (ORCID: 0000-0001-6583-4987), Department of Computer Science, Banasthali Vidyapith, Tonk, Rajasthan, India; Centre for Artificial Intelligence, Banasthali Vidyapith, Tonk, Rajasthan, India

Swati Nigam (ORCID: 0000-0002-2629-8461), Department of Computer Science, Banasthali Vidyapith, Tonk, Rajasthan, India; Centre for Artificial Intelligence, Banasthali Vidyapith, Tonk, Rajasthan, India

Rajiv Singh (ORCID: 0000-0003-4022-9945), Department of Computer Science, Banasthali Vidyapith, Tonk, Rajasthan, India; Centre for Artificial Intelligence, Banasthali Vidyapith, Tonk, Rajasthan, India

ACM Transactions on Asian and Low-Resource Language Information Processing. Accepted April 2024. https://doi.org/10.1145/3656471

Online AM: 08 April 2024

Abstract

Automatic text summarization (ATS) uses natural language processing (NLP) to produce condensed summaries of many kinds of information. For low-resource languages such as Hindi, applications of these techniques remain limited. This study proposes a method for automatically generating summaries of Hindi documents using an extractive technique. The approach retrieves pertinent sentences from the source documents by combining multiple linguistic features with machine learning (ML) based on maximum likelihood estimation (MLE) and maximum entropy (ME). The input documents are first pre-processed, which includes removing Hindi stop words and stemming. We then compute 15 linguistic feature scores from each document and use them to identify the high-scoring sentences that form the summary.

We evaluate the proposed summarizer on BBC News articles, CNN News, DUC 2004, the Hindi Text Short Summarization Corpus, the Indian Language News Text Summarization Corpus, and Wikipedia articles. The Hindi Text Short Summarization Corpus and the Indian Language News Text Summarization Corpus are in Hindi, whereas the BBC News, CNN News, and DUC 2004 datasets were translated into Hindi with the Google, Microsoft Bing, and Systran translators for the experiments. Summarization results are reported for both Hindi and English to compare performance on a low-resource and a rich-resource language. Evaluation uses multiple ROUGE metrics together with precision, recall, and F-measure, and the proposed method achieves better scores across these ROUGE variants. We also compare the proposed method with supervised and unsupervised machine learning approaches, including support vector machine (SVM), Naive Bayes (NB), decision tree (DT), latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and K-means clustering, and find that the proposed method outperforms them.
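The abstract does not list the 15 features or the exact pipeline, so the following is only a minimal Python sketch of the general idea it describes: pre-process Hindi text (stop-word removal and stemming), estimate unigram probabilities by maximum likelihood, combine a few per-sentence feature scores, and extract the top-k sentences. The stop-word list, the suffix list, the four features (term weight, position, length, title similarity), their equal weighting, and all function names are hypothetical choices for illustration, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
HINDI_STOP_WORDS = {"का", "की", "के", "है", "हैं", "में", "और", "से", "को", "यह", "एक", "पर", "ने", "इस"}

# Hypothetical inflectional suffixes for a light longest-match stemmer.
HINDI_SUFFIXES = sorted(["ियों", "ाओं", "ाएं", "ों", "ें", "ता", "ती", "ते", "ना"], key=len, reverse=True)


def split_sentences(text):
    """Split on the Devanagari danda (।) and Latin sentence-final punctuation."""
    return [s.strip() for s in re.split(r"[।.!?]+", text) if s.strip()]


def stem(token):
    """Strip the longest matching suffix, keeping at least two characters of stem."""
    for suffix in HINDI_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def preprocess(sentence):
    """Whitespace tokenization, stop-word removal, then stemming."""
    return [stem(t) for t in sentence.split() if t not in HINDI_STOP_WORDS]


def score_sentences(text, title=""):
    sentences = split_sentences(text)
    processed = [preprocess(s) for s in sentences]
    counts = Counter(t for toks in processed for t in toks)
    total = sum(counts.values()) or 1
    # MLE unigram estimate: P(w) = count(w) / total number of content tokens.
    prob = {w: c / total for w, c in counts.items()}
    title_terms = set(preprocess(title))
    longest = max((len(toks) for toks in processed), default=0) or 1
    n = len(sentences)
    scores = []
    for i, toks in enumerate(processed):
        term_weight = sum(prob[t] for t in toks) / len(toks) if toks else 0.0
        position = 1.0 - i / n  # earlier sentences score higher
        length = len(toks) / longest
        title_sim = len(title_terms & set(toks)) / len(title_terms) if title_terms else 0.0
        # Equal weights are an assumption; a trained model could learn them instead.
        scores.append(term_weight + position + length + title_sim)
    return sentences, scores


def summarize(text, k=2, title=""):
    """Return the k highest-scoring sentences in their original document order."""
    sentences, scores = score_sentences(text, title)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

For example, `summarize("भारत एक विशाल देश है। इस देश में कई भाषाएं बोली जाती हैं। हिंदी सबसे अधिक बोली जाने वाली भाषा है।", k=2)` returns two of the three sentences in document order. Returning the selected sentences in their original order, rather than by score, is a common choice in extractive summarization because it keeps the summary readable.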
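A binary maximum entropy (ME) classifier over sentence features is equivalent to logistic regression, and its weights are obtained by maximum likelihood estimation, that is, by maximizing the conditional log-likelihood of the training labels. The sketch below shows that training loop in plain Python, assuming hypothetical two-dimensional feature vectors (for instance a position score and a title-similarity score) and labels marking whether a sentence appears in a reference summary. It is not the authors' model; their features and training setup are not given in the abstract, and the function names and hyperparameters are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_maxent(X, y, lr=0.5, epochs=500):
    """Fit a binary maximum-entropy (logistic) model by batch gradient ascent
    on the conditional log-likelihood, i.e. MLE of its weights."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = yi - p  # derivative of the log-likelihood w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the batch and take an ascent step.
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b


def summary_probability(w, b, x):
    """Model probability that a sentence with features x belongs in the summary."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

After training on a few labelled sentences, candidate sentences can be ranked by `summary_probability` and the top-k kept, which replaces hand-set feature weights with learned ones.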
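ROUGE-N measures n-gram overlap between a candidate summary and a reference summary: recall divides the clipped overlap count by the number of reference n-grams, precision divides it by the number of candidate n-grams, and the F-measure is their harmonic mean. The following is a simplified Python sketch, assuming whitespace tokenization and no stemming or stop-word removal (the official ROUGE toolkit provides both options); `rouge_n` is a hypothetical helper name, not the toolkit's API.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F-measure) of ROUGE-N for one reference."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For the candidate "the cat sat on the mat" against the reference "the cat lay on the mat", five of six unigrams match, so ROUGE-1 precision, recall, and F-measure are all 5/6; three of five bigrams match, giving 0.6 for ROUGE-2. Because splitting is on whitespace, the same function applies unchanged to Hindi text.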

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing Just Accepted
ISSN: 2375-4699
EISSN: 2375-4702

Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Online AM: 8 April 2024
- Accepted: 1 April 2024
- Revised: 14 March 2024
- Received: 30 November 2023

Author Tags
Language modeling
machine learning
linguistic features
ROUGE
extractive summarization