Inversion dynamics of class manifolds in deep learning reveals tradeoffs underlying generalization
Nature Machine Intelligence (IF 23.8). Pub Date: 2024-01-08. DOI: 10.1038/s42256-023-00772-9
Simone Ciceri, Lorenzo Cassani, Matteo Osella, Pietro Rotondo, Filippo Valle, Marco Gherardi

To achieve near-zero training error in a classification problem, the layers of a feed-forward network have to disentangle the manifolds of data points with different labels to facilitate discrimination. However, excessive class separation can lead to overfitting, because good generalization requires learning invariant features, which involves some level of entanglement. We report on numerical experiments showing how the optimization dynamics finds representations that balance these opposing tendencies, following a non-monotonic trend. After a fast segregation phase, a slower rearrangement (conserved across datasets and architectures) increases the class entanglement. The training error at the inversion is stable under subsampling and across network initializations and optimizers, which characterizes it as a property solely of the data structure and (very weakly) of the architecture. The inversion is the manifestation of tradeoffs elicited by well-defined and maximally stable elements of the training set called ‘stragglers’, which are particularly influential for generalization.
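The non-monotonic entanglement dynamics described above can be probed by tracking a class-separation statistic of the hidden representations over training. The following is a minimal sketch, not the authors' method: it trains a tiny one-hidden-layer network on synthetic two-class data with plain gradient descent and records, each epoch, the ratio of between-class to within-class distance in the hidden layer (a simple proxy for manifold disentanglement; the network size, data, and metric are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data: two overlapping Gaussian blobs in 10-D.
n, d = 200, 10
X = np.vstack([rng.normal(-0.5, 1.0, (n // 2, d)),
               rng.normal(+0.5, 1.0, (n // 2, d))])
y = np.array([0] * (n // 2) + [1] * (n // 2))

# Tiny one-hidden-layer network, trained with plain gradient descent.
h = 32
W1 = rng.normal(0, 0.1, (d, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (h, 1)); b2 = np.zeros(1)

def separation(H, y):
    """Between-class over within-class distance in representation space.
    Larger values mean more disentangled class manifolds."""
    c0, c1 = H[y == 0].mean(0), H[y == 1].mean(0)
    between = np.linalg.norm(c0 - c1)
    within = (np.linalg.norm(H[y == 0] - c0, axis=1).mean()
              + np.linalg.norm(H[y == 1] - c1, axis=1).mean()) / 2
    return between / within

lr, history = 0.1, []
for epoch in range(200):
    # Forward pass: ReLU hidden layer, sigmoid output.
    Z = X @ W1 + b1
    H = np.maximum(Z, 0)
    p = 1 / (1 + np.exp(-(H @ W2 + b2)))
    # Backward pass for binary cross-entropy loss.
    g = (p - y[:, None]) / n
    gW2 = H.T @ g; gb2 = g.sum(0)
    gZ = (g @ W2.T) * (Z > 0)
    gW1 = X.T @ gZ; gb1 = gZ.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
    history.append(separation(H, y))

print(f"separation at epoch 1:   {history[0]:.3f}")
print(f"separation at epoch 200: {history[-1]:.3f}")
```

Plotting `history` against epochs is where one would look for the fast segregation phase followed by a slower re-entanglement; whether an inversion appears depends on the data structure, consistent with the paper's claim that it is primarily a property of the training set.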


