Abstract
Automatic text summarization (ATS) uses natural language processing (NLP) to produce summaries of many categories of information. For low-resource languages such as Hindi, applications of these techniques remain limited. This study proposes an extractive method for automatically generating summaries of Hindi documents. The approach retrieves pertinent sentences from the source documents by combining multiple linguistic features with machine learning (ML), using maximum likelihood estimation (MLE) and maximum entropy (ME). The input documents are pre-processed by removing Hindi stop words and stemming. Fifteen linguistic feature scores are then computed for each document to identify the high-scoring phrases used to generate the summary. We ran experiments with the proposed summarizer on BBC News articles, CNN News, DUC 2004, the Hindi Text Short Summarization Corpus, the Indian Language News Text Summarization Corpus, and Wikipedia articles. The Hindi Text Short Summarization Corpus and the Indian Language News Text Summarization Corpus are in Hindi, whereas the BBC News, CNN News, and DUC 2004 datasets were translated into Hindi using the Google, Microsoft Bing, and Systran translators. Summarization results are reported for both Hindi and English to compare performance on a low-resource and a resource-rich language. Evaluation uses multiple ROUGE metrics together with precision, recall, and F-measure. We compare the proposed method with supervised and unsupervised machine learning methods, including support vector machine (SVM), Naive Bayes (NB), decision tree (DT), latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and K-means clustering, and find that it outperforms them across multiple ROUGE scores.
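The pipeline outlined in the abstract (pre-process, score sentences by features, keep the top-scoring ones, evaluate with ROUGE) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy stop-word list, and the two features shown (mean term frequency and sentence position) are assumptions standing in for the 15 linguistic features, and stemming and the MLE/ME learning step are omitted. The `rouge_1` helper computes only unigram-overlap precision, recall, and F-measure.

```python
from collections import Counter

# Hypothetical toy stop-word list; the study's actual Hindi list is not given here.
STOP_WORDS = {"है", "के", "का", "की", "में", "और", "से", "को"}


def tokenize(sentence):
    """Split on whitespace, strip the danda and common punctuation, drop stop words."""
    tokens = [tok.strip("।,.?!") for tok in sentence.split()]
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]


def score_sentences(sentences):
    """Score each sentence with two illustrative features (assumed, not the paper's 15)."""
    tokenized = [tokenize(s) for s in sentences]
    term_freq = Counter(tok for toks in tokenized for tok in toks)
    n = len(sentences)
    scores = []
    for i, toks in enumerate(tokenized):
        # Feature 1: mean document-level frequency of the sentence's terms.
        tf_score = sum(term_freq[t] for t in toks) / len(toks) if toks else 0.0
        # Feature 2: position, with earlier sentences scoring higher.
        position_score = (n - i) / n
        scores.append(tf_score + position_score)
    return scores


def summarize(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def rouge_1(candidate_tokens, reference_tokens):
    """Unigram-overlap precision, recall, and F-measure (ROUGE-1 style)."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

As a usage example, `summarize(sentences, k=2)` returns the two highest-scoring sentences of a pre-split document, and `rouge_1(tokenize(" ".join(summary)), tokenize(reference))` compares the result against a reference summary. A learned model would replace the unweighted sum of features with trained weights.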
Index Terms
- Automatic Extractive Text Summarization using Multiple Linguistic Features