Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Journal of Computer Science and Technology (IF 1.9), Pub Date: 2023-07-31, DOI: 10.1007/s11390-021-1119-0
Yi-Ge Xu, Xi-Peng Qiu, Li-Gao Zhou, Xuan-Jing Huang

Fine-tuning pre-trained language models such as BERT has become an effective approach in natural language processing (NLP) and yields state-of-the-art results on many downstream tasks. Recent studies on adapting BERT to new tasks mainly focus on modifying the model structure, re-designing the pre-training tasks, and leveraging external data and knowledge. The fine-tuning strategy itself has yet to be fully explored. In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation. The self-ensemble mechanism builds the teacher model from checkpoints stored in an experience pool. To transfer knowledge from the teacher model to the student model efficiently, we further use knowledge distillation, which we call self-distillation because the distilled knowledge comes from the model itself at earlier time steps. Experiments on the GLUE benchmark and text classification benchmarks show that our proposed approach can significantly improve the adaptation of BERT without any external data or knowledge. We conduct exhaustive experiments to investigate the efficiency of the self-ensemble and self-distillation mechanisms, and our proposed approach achieves a new state-of-the-art result on the SNLI dataset.
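
The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the two mechanisms, assuming a HuggingFace-style sequence-classification model (a call returning an object with a `.logits` field). The pool size, per-epoch snapshot frequency, MSE distillation loss, and weight `lambda_kd` are illustrative assumptions rather than the authors' exact settings: the teacher is the parameter average of recent student checkpoints (self-ensemble), and the student is additionally trained to match the teacher's logits (self-distillation).

```python
import copy
from collections import deque

import torch
import torch.nn.functional as F


def build_teacher(student, pool):
    """Self-ensemble: average the parameters of the checkpoints in the pool."""
    teacher = copy.deepcopy(student)
    with torch.no_grad():
        for name, param in teacher.named_parameters():
            param.copy_(torch.stack([ckpt[name] for ckpt in pool]).mean(dim=0))
    teacher.eval()
    return teacher


def fine_tune(student, loader, epochs=3, pool_size=5, lambda_kd=1.0,
              lr=2e-5, device="cpu"):
    """Fine-tune with self-ensemble + self-distillation (illustrative sketch)."""
    student.to(device)
    optimizer = torch.optim.AdamW(student.parameters(), lr=lr)
    pool = deque(maxlen=pool_size)  # "experience pool" of recent checkpoints

    for _ in range(epochs):
        # Rebuild the teacher from the current pool once per epoch (a simplification).
        teacher = build_teacher(student, pool) if pool else None

        for input_ids, attention_mask, labels in loader:
            input_ids = input_ids.to(device)
            attention_mask = attention_mask.to(device)
            labels = labels.to(device)

            logits = student(input_ids=input_ids,
                             attention_mask=attention_mask).logits
            loss = F.cross_entropy(logits, labels)

            if teacher is not None:
                with torch.no_grad():
                    t_logits = teacher(input_ids=input_ids,
                                       attention_mask=attention_mask).logits
                # Self-distillation: pull the student towards the ensembled teacher.
                loss = loss + lambda_kd * F.mse_loss(logits, t_logits)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Snapshot the student after the epoch and add it to the experience pool.
        pool.append({n: p.detach().clone()
                     for n, p in student.named_parameters()})

    return student
```

Averaging parameters, rather than ensembling predictions, keeps the teacher as cheap to run as a single model; how the checkpoints are collected and combined is precisely the design space the paper investigates.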


