Abstract
Semi-supervised learning is a promising approach to the problem of insufficient labeled data. Recent methods, grouped into the paradigms of consistency regularization and pseudo-labeling, perform outstandingly on image data but yield only limited improvements on text, because they neglect the discrete nature of textual information and lack high-quality text augmentation transformations. In this paper, we propose the novel SeqMatch method. SeqMatch automatically detects abnormal model states caused by anomalous augmented text, reduces their interference, and instead leverages normal states to strengthen consistency regularization. It also generates hard artificial pseudo-labels, allowing the model to be updated efficiently and optimized toward low entropy. In addition, we design several stronger, well-organized text augmentation pipelines that increase the divergence between the two views of an unlabeled discrete text sequence, enabling the model to learn more from aligning them. Extensive comparative experiments show that SeqMatch significantly outperforms previous methods on three widely used benchmarks. In particular, with a minimal number of labeled examples, SeqMatch achieves a maximum performance improvement of 16.4% over purely supervised training.
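The confidence-thresholded hard pseudo-labeling that the abstract describes can be sketched in a few lines. This is a minimal FixMatch-style illustration, not the paper's actual implementation; the function names and the 0.95 threshold are assumptions for the example:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label_loss(weak_logits, strong_logits, threshold=0.95):
    """Unlabeled-example loss in the FixMatch style.

    If the model's prediction on the weakly augmented view is confident
    enough, its argmax becomes a hard pseudo-label, and we compute
    cross-entropy against the prediction on the strongly augmented view.
    Low-confidence examples are masked out and contribute no loss.
    """
    probs_weak = softmax(weak_logits)
    confidence = max(probs_weak)
    if confidence < threshold:
        return 0.0  # uncertain prediction: skip this example
    target = probs_weak.index(confidence)  # hard pseudo-label (argmax)
    probs_strong = softmax(strong_logits)
    return -math.log(probs_strong[target])  # cross-entropy on strong view
```

A confident weak-view prediction yields a positive cross-entropy term pulling the strong view toward the same class, while an unconfident one is silently dropped; SeqMatch's contribution, per the abstract, is deciding when such augmented views are anomalous rather than trusting the threshold alone.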
Cite this article
Zhang, X., Tan, Z., Lu, F. et al. Adaptive semi-supervised learning from stronger augmentation transformations of discrete text information. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02100-y