Abstract
With its unique information-filtering capability, text summarization has become a significant component of search engines and question-answering systems. However, existing models that include a copy mechanism often fail to extract important fragments, so the generated content suffers from thematic deviation and insufficient generalization. Chinese automatic summarization in particular often loses semantics under traditional generation methods because of their reliance on word lists. To address these issues, we introduce the BioCopy mechanism into the summarization task. By training BIO tags for the predicted words and narrowing the probability distribution over the vocabulary, we strengthen the model's ability to generate continuous segments, which effectively alleviates the above problems. We additionally apply regularization to the inputs so that the model shares sub-network weight parameters, and we sparsify the model output to reduce the search space for prediction. To further improve performance, we compute the bilingual evaluation understudy (BLEU) score on the English CNN/DailyMail dataset to select filtering thresholds, reducing the difficulty of word segmentation and the output's dependence on the word list. We fully fine-tune the model on the LCSTS dataset for the Chinese summarization task and conduct small-sample experiments on the CSL dataset, along with ablation experiments on the Chinese data. The experimental results demonstrate that the optimized model learns the semantic representation of the original text better than competing models and performs well with small sample sizes.
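The two mechanisms named in the abstract can be illustrated compactly. The sketch below is a minimal NumPy illustration, not the paper's implementation: `sparsemax` follows Martins and Astudillo's projection of logits onto the probability simplex (the sparse output distribution the abstract mentions), and `restrict_to_span` is a hypothetical helper showing how a BioCopy-style BIO tag could narrow the next-token distribution to tokens that continue a copied source span; the actual tag predictor and span bookkeeping in the model are more involved.

```python
import numpy as np

def sparsemax(z):
    """Project logits z onto the simplex, yielding a sparse distribution
    (Martins & Astudillo, 2016). Zeroed entries shrink the search space."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # logits in descending order
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, z.size + 1)
    support = 1 + ks * z_sorted > cumsum     # entries kept in the support
    k = ks[support][-1]
    tau = (cumsum[k - 1] - 1) / k            # threshold so kept mass sums to 1
    return np.maximum(z - tau, 0.0)

def restrict_to_span(logits, allowed_ids):
    """Illustrative BioCopy-style restriction (assumed interface): once a B tag
    has been emitted, keep only tokens that continue a source span."""
    masked = np.full_like(logits, -1e9)
    idx = list(allowed_ids)
    masked[idx] = logits[idx]
    return masked

logits = np.array([2.0, 1.0, 0.5, 0.1])
print(sparsemax(logits))                         # e.g. [1., 0., 0., 0.]
print(sparsemax(restrict_to_span(logits, {1, 2})))  # mass only on span tokens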