Nonsmooth Nonconvex Stochastic Heavy Ball
Journal of Optimization Theory and Applications (IF 1.9), Pub Date: 2024-03-11, DOI: 10.1007/s10957-024-02408-3
Tam Le

Motivated by the widespread use of momentum-based algorithms in deep learning, we study a nonsmooth nonconvex stochastic heavy ball method and show its convergence. Our approach builds upon semialgebraic (definable) assumptions commonly met in practical situations and combines nonsmooth calculus with a differential inclusion method. Additionally, we provide general conditions on the sample distribution that ensure convergence of the objective function values. Our results are general enough to justify the use of subgradient sampling in modern implementations that heuristically apply rules of differential calculus to nonsmooth functions, such as backpropagation or implicit differentiation. As with the stochastic subgradient method, our analysis highlights that subgradient sampling can make the stochastic heavy ball method converge to artificial critical points. Thanks to the semialgebraic setting, we address this concern by showing that these artifacts are almost surely avoided when initializations are randomized, leading the method to converge to Clarke critical points.
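The iteration studied here is the classical heavy ball recursion driven by sampled subgradients. Below is a minimal Python sketch of that recursion on a toy nonsmooth nonconvex objective; the objective f(x) = min(|x - 1|, |x + 1|), the noise model, and the step-size schedule are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def subgrad_sample(x):
    # Noisy subgradient oracle for f(x) = min(|x - 1|, |x + 1|):
    # the nearer kink selects the active branch (valid a.e.),
    # plus zero-mean noise to mimic stochastic subgradient sampling.
    g = np.sign(x - 1.0) if x > 0 else np.sign(x + 1.0)
    return g + 0.1 * rng.standard_normal()

def stochastic_heavy_ball(x0, steps=5000, beta=0.9, alpha0=0.5):
    # Heavy ball recursion: x_{k+1} = x_k - alpha_k * g_k + beta * (x_k - x_{k-1}).
    x_prev = x = x0
    for k in range(steps):
        alpha_k = alpha0 / (k + 1)  # vanishing, non-summable step sizes
        g = subgrad_sample(x)
        x, x_prev = x - alpha_k * g + beta * (x - x_prev), x
    return x

print(stochastic_heavy_ball(x0=0.3))  # should approach a Clarke critical point (x = 1)
```

The "artificial critical points" mentioned in the abstract are a known artifact of applying backpropagation rules to nonsmooth functions. A standard illustration (assuming PyTorch and its relu'(0) = 0 convention): relu(x) - relu(-x) equals x everywhere, yet autodiff reports a zero gradient at x = 0, a spurious critical point.

```python
import torch

x = torch.tensor(0.0, requires_grad=True)
f = torch.relu(x) - torch.relu(-x)  # equals x for every real x
f.backward()
print(x.grad)  # tensor(0.): backprop returns 0 at x = 0, although f'(0) = 1
```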



Updated: 2024-03-13