DECISION THEORETIC BOOTSTRAPPING
International Journal for Uncertainty Quantification ( IF 1.7 ) Pub Date : 2024-01-01 , DOI: 10.1615/int.j.uncertaintyquantification.2023038552
Peyman Tavallali, Hamed Hamze Bajgiran, Danial Esaid, Houman Owhadi

The design and testing of supervised machine learning models combine two fundamental distributions: (1) the training data distribution and (2) the testing data distribution. Although these two distributions are identical and identifiable when the data set is infinite, they are imperfectly known when the data are finite (and possibly corrupted), and this uncertainty must be taken into account for robust uncertainty quantification (UQ). An important case is when the test distribution comes from a modal or localized area of the finite sample distribution. We present a general decision theoretic bootstrapping solution to this problem: (1) partition the available data into a training subset and a UQ subset; (2) take m subsampled subsets of the training set and train m models; (3) partition the UQ set into n sorted subsets and take a random fraction of each to define n corresponding empirical distributions μj; (4) consider the adversarial game in which Player I selects a model i ∈ {1, …, m}, Player II selects the UQ distribution μj, and Player I receives a loss defined by evaluating model i against data points sampled from μj; (5) identify optimal mixed strategies (probability distributions over models and UQ distributions) for both players. These randomized optimal mixed strategies provide optimal model mixtures and UQ estimates given the adversarial uncertainty of the training and testing distributions represented by the game. The proposed approach provides (1) some degree of robustness to in-sample distribution localization/concentration and (2) conditional probability distributions on the output space forming aleatory representations of the uncertainty on the output as a function of the input variable.
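The five steps above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes a toy 1-D regression task, polynomial models, mean squared error as the loss, and the standard linear-programming formulation of a two-player zero-sum matrix game to obtain Player I's optimal mixed strategy (the model mixture). All variable names and modeling choices here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Toy data (assumed): y = sin(x) + noise on a 1-D input.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# (1) Partition into a training subset and a UQ subset.
X_tr, y_tr, X_uq, y_uq = X[:120], y[:120], X[120:], y[120:]

# (2) Train m models on subsampled training sets (degree-3 polynomial fits).
m = 5
models = []
for _ in range(m):
    idx = rng.choice(len(X_tr), size=80, replace=True)
    models.append(np.polyfit(X_tr[idx, 0], y_tr[idx], 3))

# (3) Sort the UQ set (here by x), partition it into n subsets, and take a
#     random fraction of each to define empirical distributions mu_j.
n = 4
order = np.argsort(X_uq[:, 0])
parts = np.array_split(order, n)
mus = [p[rng.random(len(p)) < 0.7] for p in parts]

# (4) Loss matrix L[i, j]: mean squared error of model i on mu_j.
L = np.array([[np.mean((np.polyval(c, X_uq[mu, 0]) - y_uq[mu]) ** 2)
               for mu in mus] for c in models])

# (5) Optimal mixed strategy for Player I (the minimizer) via the LP
#     formulation of a zero-sum game: minimize v subject to
#     sum_i p_i * L[i, j] <= v for all j, with p on the simplex.
c = np.zeros(m + 1)
c[-1] = 1.0                                       # objective: minimize v
A_ub = np.hstack([L.T, -np.ones((n, 1))])         # L^T p - v <= 0
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
              A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)])
p = res.x[:m]                                     # optimal model mixture
```

The vector `p` is Player I's randomized strategy over the m models; predicting with the p-weighted mixture hedges against whichever empirical UQ distribution Player II might select. Player II's optimal strategy can be read off the dual of the same LP.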

Updated: 2024-01-01