Double data piling: a high-dimensional solution for asymptotically perfect multi-category classification


Journal of the Korean Statistical Society (IF 0.6), Pub Date: 2024-04-03, DOI: 10.1007/s42952-024-00263-6
Taehyun Kim, Woonyoung Chang, Jeongyoun Ahn, Sungkyu Jung

For high-dimensional classification, interpolation of training data manifests as the data piling phenomenon, in which linear projections of data vectors from each class collapse to a single value. Recent research has revealed an additional phenomenon known as the ‘second data piling’ for independent test data in binary classification, providing a theoretical understanding of asymptotically perfect classification. This paper extends these findings to multi-category classification and provides a comprehensive characterization of the double data piling phenomenon. We define the maximal data piling subspace, which maximizes the sum of pairwise distances between piles of training data in multi-category classification. Furthermore, we show that a second data piling subspace that induces data piling for independent data exists and can be consistently estimated by projecting the negatively-ridged discriminant subspace onto an estimated ‘signal’ subspace. By leveraging this second data piling phenomenon, we propose a bias-correction strategy for class assignments, which asymptotically achieves perfect classification. The present research sheds light on benign overfitting and, with the help of high-dimensional asymptotics, enhances the understanding of perfect multi-category classification in high-dimensional discrimination.
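The first data piling phenomenon described in the abstract can be illustrated numerically. The sketch below is not the authors' estimator: it assumes simulated Gaussian data and uses the minimum-norm least-squares fit of centered class indicators as a simple interpolating projection. The function name `piling_directions`, the sample sizes, and the class means are all hypothetical choices made for illustration. When the dimension `p` exceeds the sample size, the training projections within each class coincide up to rounding error, while independent test points generally do not pile exactly along the same directions.

```python
import numpy as np


def piling_directions(X, labels):
    """Minimum-norm W such that centered X @ W equals the centered class indicators.

    Illustrative only: an interpolating projection, not the paper's
    maximal data piling subspace or its second data piling estimator.
    """
    classes = np.unique(labels)
    # n x K indicator matrix, one column per class
    Y = (labels[:, None] == classes[None, :]).astype(float)
    Xc = X - X.mean(axis=0)  # center features
    Yc = Y - Y.mean(axis=0)  # center indicators
    # lstsq returns the minimum-norm solution when p > n
    W, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    return W


def max_within_class_spread(proj, labels):
    """Largest range (max - min) of any projected coordinate within any class."""
    spreads = []
    for k in np.unique(labels):
        block = proj[labels == k]
        spreads.append((block.max(axis=0) - block.min(axis=0)).max())
    return max(spreads)


rng = np.random.default_rng(0)
n_per_class, p, K = 10, 500, 3  # assumed sizes, with p much larger than n
labels = np.repeat(np.arange(K), n_per_class)
means = 0.5 * rng.normal(size=(K, p))  # hypothetical class means

X_train = means[labels] + rng.normal(size=(labels.size, p))
X_test = means[labels] + rng.normal(size=(labels.size, p))

W = piling_directions(X_train, labels)
train_spread = max_within_class_spread(X_train @ W, labels)
test_spread = max_within_class_spread(X_test @ W, labels)

print("within-class spread, training data:", train_spread)
print("within-class spread, test data:", test_spread)
```

The contrast between the two printed spreads is the point of the example: training data collapse onto one pile per class, whereas the test projections stay scattered, which is the gap that the second data piling subspace and the bias-correction strategy in the abstract are meant to address.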


Updated: 2024-04-04
