On Stochastic Roundoff Errors in Gradient Descent with Low-Precision Computation
Journal of Optimization Theory and Applications (IF 1.9), Pub Date: 2023-12-20, DOI: 10.1007/s10957-023-02345-7
Lu Xia, Stefano Massei, Michiel E. Hochstenbach, Barry Koren

Abstract

When implementing the gradient descent method in low precision, employing stochastic rounding schemes helps to prevent stagnation of convergence caused by the vanishing gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero-bias property for a larger probability of preserving small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network with an 8-bit floating-point format.
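The unbiased stochastic rounding that the abstract contrasts with the proposed biased schemes is a standard technique. Below is a minimal NumPy sketch of it applied to a gradient descent update, using a coarse fixed-point grid purely for illustration; the paper itself works with an 8-bit floating-point format, and the function names here are hypothetical, not the authors' implementation.

```python
import numpy as np

def stochastic_round(x, frac_bits=4):
    """Unbiased stochastic rounding of x onto a fixed-point grid with
    spacing 2**-frac_bits: a value is rounded up with probability equal
    to its fractional distance from the lower grid point, so the rounded
    result equals x in expectation."""
    x = np.asarray(x, dtype=float)
    scale = 2.0 ** frac_bits
    scaled = x * scale
    lower = np.floor(scaled)
    prob_up = scaled - lower                        # distance to the lower grid point
    up = np.random.random(scaled.shape) < prob_up   # round up with that probability
    return (lower + up) / scale

def gd_step_low_precision(w, grad, lr=1e-2):
    # Gradient descent step with parameters stored in low precision.
    # With round-to-nearest, an update smaller than half the grid spacing
    # is always discarded and the iteration stagnates; with stochastic
    # rounding the small update survives with probability proportional to
    # its relative magnitude, so progress continues in expectation.
    return stochastic_round(w - lr * grad)
```

Because the rounded value equals the exact value in expectation, small gradient steps are preserved on average even when each individual step is below the grid spacing, which is exactly the stagnation mechanism the abstract analyzes.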

Updated: 2023-12-20