Supervised Contrast Learning Text Classification Model Based on Data Quality Augmentation
ACM Transactions on Asian and Low-Resource Language Information Processing (IF 2). Pub Date: 2024-03-19, DOI: 10.1145/3653300
Liang Wu, Fangfang Zhang, Chao Cheng, Shinan Song

Token-level data augmentation generates text samples by modifying the words of a sentence. However, data that are not easily classified can negatively affect the model. In particular, applying random augmentation operations to samples without considering the role of keywords may produce low-quality supplementary samples. We therefore propose a supervised contrastive learning text classification model based on data quality augmentation (DQA). First, dynamic training is used to screen for high-quality data that carry information beneficial to model training. The selected data are then augmented based on important words that carry label information. To obtain a better text representation for the downstream classification task, we train the model with a standard supervised contrastive loss. Finally, we conduct experiments on five text classification datasets to validate the effectiveness of our model, and run ablation studies to verify the impact of each module on classification.
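The "standard supervised contrastive loss" referred to above is the usual SupCon formulation: for each anchor, the log-probability of its same-label (positive) batch mates is maximized against all other samples. A minimal numpy sketch of that loss follows; the function name, temperature value, and batch layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive (SupCon) loss over one batch (illustrative sketch).

    embeddings: (N, D) array of sample representations; L2-normalized internally.
    labels:     (N,) integer class labels; same-label pairs are positives.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature                      # pairwise cosine similarities
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Log-softmax over every *other* sample in the batch (exclude self-similarity).
    sim = np.where(self_mask, -np.inf, sim)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # Positives: same label as the anchor, excluding the anchor itself.
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    counts = pos.sum(axis=1)
    valid = counts > 0                               # skip anchors with no positive
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / counts[valid]
    return per_anchor.mean()
```

As a sanity check, a batch whose same-label samples are close in embedding space yields a lower loss than one where labels are assigned across dissimilar pairs, which is the pressure the model exploits to shape its text representation.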




Updated: 2024-03-20