Research Article

Improved BIO-Based Chinese Automatic Abstract-Generation Model

Published: 09 March 2024

Abstract

With its unique information-filtering function, text summarization has become a significant component of search engines and question-answering systems. However, existing models that include a copy mechanism often fail to extract important fragments, so the generated content suffers from thematic deviation and insufficient generalization. In particular, traditional generation methods for Chinese automatic summarization often lose semantics because of their reliance on word lists. To address these issues, we introduced the novel BioCopy mechanism for the summarization task: by training the model to predict a BIO tag for each output word and narrowing the probability distribution over the vocabulary, we strengthened its ability to generate continuous segments, which effectively alleviates the problems above. Additionally, we applied stronger regularization to the inputs, sharing sub-network weight parameters across the model and sparsifying the model output to reduce the prediction search space. To further improve performance, we computed bilingual evaluation understudy (BLEU) scores on the English CNN/DailyMail dataset to select filtering thresholds, reducing the difficulty of word segmentation and the output's dependence on the word list. We fully fine-tuned the model on the LCSTS dataset for the Chinese summarization task, conducted small-sample experiments on the CSL dataset, and ran ablation experiments on the Chinese data. The experimental results demonstrate that the optimized model learns the semantic representation of the source text better than other models and performs well with small sample sizes.
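
The abstract only sketches how the BIO tags and the sparse output layer interact, so the following is a minimal illustrative sketch, not the paper's implementation. The sparsemax function follows the published definition of Martins and Astudillo (2016); restrict_next_token, bio_tag, and copy_candidate_ids are hypothetical names introduced here to show how a predicted inside-of-span ("I") tag could confine the next-token distribution to tokens that continue a copied source span, while an "O" tag leaves the full vocabulary available.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (Martins and Astudillo, 2016): a sparse alternative to softmax.
    Low-scoring entries receive exactly zero probability, which is one way to
    shrink the decoder's effective search space, as the abstract describes."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                      # scores in descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = k[1 + k * z_sorted > cumsum]           # indices kept in the support
    k_max = support[-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max          # threshold so the result sums to 1
    return np.maximum(z - tau, 0.0)

def restrict_next_token(vocab_logits, bio_tag, copy_candidate_ids):
    """Illustrative only: if the predicted tag marks the inside of a copied span
    ('I'), mask out every token that cannot continue a source-text span before
    normalizing; otherwise generate over the full vocabulary."""
    logits = np.asarray(vocab_logits, dtype=float).copy()
    if bio_tag == "I":
        mask = np.full_like(logits, -1e9)
        mask[list(copy_candidate_ids)] = 0.0
        logits = logits + mask
    return sparsemax(logits)

# Toy vocabulary of five tokens.
logits = [2.0, 1.0, 0.2, -0.5, 0.1]
print(restrict_next_token(logits, "O", []))      # distribution over the full vocabulary
print(restrict_next_token(logits, "I", [2, 3]))  # mass confined to the copy candidates
```

In the actual model the tag sequence is predicted by the decoder and the copy candidates come from the source document; the hard mask above simply makes concrete the abstract's claim about reducing the probability distribution range on the vocabulary.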



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 3
March 2024, 277 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3613569


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 31 January 2022
• Revised: 18 August 2023
• Accepted: 23 January 2024
• Online AM: 5 February 2024
• Published: 9 March 2024

