Even small correlation and diversity shifts pose dataset-bias issues
Pattern Recognition Letters (IF 5.1) Pub Date: 2024-02-03, DOI: 10.1016/j.patrec.2024.01.026
Alceu Bissoto, Catarina Barata, Eduardo Valle, Sandra Avila

Distribution shifts hinder the deployment of deep learning in real-world problems. They appear when train and test data come from different sources, which commonly happens in practice. Although shifts occur concurrently in many forms (e.g., correlation and diversity shifts) and at many intensities, the literature focuses only on severe and isolated shifts. In this work, we propose a comprehensive examination of distribution shifts across different intensity levels, investigating the nuanced impacts of both mild and severe shifts on the learning process and assessing the interplay between correlation and diversity shifts. We train models in three different scenarios considering synthetic and real correlation and diversity shifts, spanning eight different levels of correlation shift, and evaluate them on both in-distribution and diversity-shifted test sets. Our experiments reveal three major findings: (1) Even small correlation shifts pose dataset-bias issues, presenting a risk of accumulating and combining unaccounted-for weak biases; (2) Models learn robust features in high- and low-shift scenarios but still prefer spurious ones at test time; (3) Diversity shift can attenuate the reliance on spurious correlations. Our work has implications for distribution shift research and practice, providing new insights into how models learn and rely on spurious correlations under different types and intensities of shifts.
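To make the notion of a correlation-shift level concrete, below is a minimal sketch that is not taken from the paper: it assumes a Colored-MNIST-style setup in which a spurious attribute is aligned with the class label for a fraction rho of training samples, and sweeping rho from the unbiased value 1/n_attrs up to 1.0 produces progressively stronger correlation shift. The function make_biased_attr and all parameter names are hypothetical illustrations, not the authors' code.

# Hypothetical sketch (not from the paper): building a training split with a
# controlled level of correlation shift, in the style of Colored-MNIST-like
# biased datasets. A spurious attribute equals the class label for a fraction
# `rho` of samples; sweeping `rho` (e.g., eight values from 0.5 to 1.0 for two
# classes) gives progressively stronger correlation shift.
import numpy as np

def make_biased_attr(labels, rho, n_attrs=2, seed=0):
    """Assign a spurious attribute per sample so that P(attr == label) = rho.

    rho = 1 / n_attrs corresponds to no correlation (the attribute is
    independent of the label); rho = 1.0 is a fully spurious correlation.
    """
    rng = np.random.default_rng(seed)
    aligned = rng.random(len(labels)) < rho
    # Misaligned samples get an attribute drawn uniformly from the *other* classes.
    offsets = rng.integers(1, n_attrs, size=len(labels))
    misaligned = (labels + offsets) % n_attrs
    return np.where(aligned, labels, misaligned)

# Example: eight correlation-shift levels, from unbiased (0.5) to fully biased (1.0).
labels = np.random.default_rng(1).integers(0, 2, size=10_000)
for rho in np.linspace(0.5, 1.0, 8):
    attrs = make_biased_attr(labels, rho)
    print(f"rho={rho:.2f}  empirical P(attr == label) = {(attrs == labels).mean():.2f}")

Training on splits generated this way and then evaluating on an unbiased or diversity-shifted test set is the usual way to quantify how much a model relies on the spurious attribute.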

Updated: 2024-02-03