Selective inference after feature selection via multiscale bootstrap
Annals of the Institute of Statistical Mathematics (IF 1) | Pub Date: 2022-07-30 | DOI: 10.1007/s10463-022-00838-2
Yoshikazu Terada , Hidetoshi Shimodaira

It is common to report confidence intervals or p-values for selected features (predictor variables in regression), but these often suffer from selection bias. The selective inference approach corrects this bias by conditioning on the selection event. Most existing studies of selective inference consider a specific feature selection algorithm, such as Lasso, and thus have difficulty handling more complicated algorithms. Moreover, existing studies often condition on unnecessarily restrictive events, leading to over-conditioning and lower statistical power. Our novel and widely applicable resampling method via multiscale bootstrap addresses these issues by computing an approximately unbiased selective p-value for the selected features. As a simplification of the proposed method, we also develop a simpler method via the classical bootstrap. We prove that the p-value computed by our multiscale bootstrap method is more accurate than that computed by the classical bootstrap method. Furthermore, numerical experiments demonstrate that our algorithm works well even for more complicated feature selection methods, such as non-convex regularization.
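The classical-bootstrap simplification mentioned above can be illustrated with a toy sketch: resample the data with replacement, rerun feature selection on each resample, and record how often the original selection event recurs. This is only an illustration of the bootstrap-probability idea, not the authors' algorithm; the `select_features` rule (marginal-correlation thresholding) and the `threshold` value are hypothetical stand-ins for an arbitrary selection procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_features(X, y, threshold=0.2):
    # Toy selection rule: keep features whose absolute marginal
    # correlation with y exceeds a threshold (hypothetical stand-in
    # for Lasso, non-convex regularization, etc.).
    corr = np.abs(X.T @ y) / (np.linalg.norm(X, axis=0) * np.linalg.norm(y) + 1e-12)
    return frozenset(np.flatnonzero(corr > threshold))

def bootstrap_selection_probability(X, y, B=500):
    # Classical bootstrap: resample rows with replacement and count
    # how often the resample reproduces the original selection event
    # (the same selected feature set).
    n = X.shape[0]
    observed = select_features(X, y)
    hits = 0
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        if select_features(X[idx], y[idx]) == observed:
            hits += 1
    return hits / B

# Toy data: y depends on the first two of five features.
n, p = 200, 5
X = rng.standard_normal((n, p))
y = X[:, 0] + 0.8 * X[:, 1] + rng.standard_normal(n)
print(bootstrap_selection_probability(X, y))
```

The multiscale bootstrap extends this by repeating the resampling at several sample sizes n' (not just n' = n) and extrapolating the resulting probabilities, which is what yields the more accurate selective p-value claimed in the abstract.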




Updated: 2022-08-01