On the universal consistency of an over-parametrized deep neural network estimate learned by gradient descent
Annals of the Institute of Statistical Mathematics (IF 1), Pub Date: 2024-04-08, DOI: 10.1007/s10463-024-00898-6
Selina Drews , Michael Kohler

Estimation of a multivariate regression function from independent and identically distributed data is considered. An estimate is defined which fits a deep neural network, consisting of a large number of fully connected neural networks computed in parallel, to the data via gradient descent. The estimate is over-parametrized in the sense that the number of its parameters is much larger than the sample size. It is shown that, with a suitable random initialization of the network, a sufficiently small gradient descent step size, and a number of gradient descent steps that slightly exceeds the reciprocal of this step size, the estimate is universally consistent. This means that the expected \(L_2\) error converges to zero for all distributions of the data for which the response variable is square integrable.
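In the standard notation of nonparametric regression, with \(m(x) = \mathbf{E}\{Y \mid \mathbf{X} = x\}\) the regression function and \(m_n\) the estimate computed from \(n\) observations, universal consistency means

\[ \lim_{n \to \infty} \mathbf{E} \int |m_n(x) - m(x)|^2 \, \mathbf{P}_{\mathbf{X}}(dx) = 0 \]

for every distribution of \((\mathbf{X}, Y)\) with \(\mathbf{E}\{Y^2\} < \infty\).

The following is a minimal sketch of the kind of estimate described in the abstract, written in PyTorch purely for illustration. The choice of activation, the widths, the number K of parallel subnetworks, the initialization, and the factor 1.1 in the step count are assumptions and are not taken from the paper; the sketch only shows an over-parametrized average of many small fully connected networks fitted by gradient descent with a small step size and slightly more than 1/step_size steps.

import torch
import torch.nn as nn

class ParallelFCNets(nn.Module):
    """Average of K small fully connected networks computed in parallel.

    Hypothetical instantiation; depth, width, sigmoid activation, and the
    default PyTorch initialization are illustrative assumptions only.
    """
    def __init__(self, d, K=200, width=8, depth=2):
        super().__init__()
        def make_subnet():
            layers, in_dim = [], d
            for _ in range(depth):
                layers += [nn.Linear(in_dim, width), nn.Sigmoid()]
                in_dim = width
            layers.append(nn.Linear(in_dim, 1))
            return nn.Sequential(*layers)
        self.subnets = nn.ModuleList([make_subnet() for _ in range(K)])

    def forward(self, x):
        # The outputs of the parallel subnetworks are averaged.
        return torch.stack([net(x) for net in self.subnets], dim=0).mean(dim=0)

def fit(x, y, d, step_size=1e-3):
    """Gradient descent on the empirical L2 risk with a small step size and
    slightly more than 1/step_size steps, mirroring the abstract's conditions."""
    model = ParallelFCNets(d)
    opt = torch.optim.SGD(model.parameters(), lr=step_size)
    loss_fn = nn.MSELoss()
    n_steps = int(1.1 / step_size)  # slightly exceeds the reciprocal of the step size
    for _ in range(n_steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model

# Example usage on synthetic data: n = 100 observations in dimension d = 2.
x = torch.rand(100, 2)
y = torch.sin(x.sum(dim=1, keepdim=True)) + 0.1 * torch.randn(100, 1)
estimate = fit(x, y, d=2, step_size=1e-2)

Splitting the network into many small parallel subnetworks keeps each subnetwork easy to optimize by gradient descent, while the total parameter count grows linearly in K and can easily exceed the sample size, which is exactly the over-parametrized regime the consistency result covers.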

Updated: 2024-04-09