Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks
ACM Transactions on Intelligent Systems and Technology (IF 5) Pub Date: 2024-03-29, DOI: 10.1145/3643860
Bum Jun Kim, Hyeyeon Choi, Hyeonah Jang, Sang Woo Kim

L2 regularization of the weights in neural networks is widely used as a standard training technique. Beyond the weights, batch normalization introduces an additional trainable parameter γ, which acts as a scaling factor. However, L2 regularization for γ has received little discussion and is applied inconsistently across libraries and practitioners. In this article, we study whether L2 regularization for γ is valid. To explore this issue, we consider two approaches: (1) variance control, which makes the residual network behave like an identity mapping, and (2) stable optimization through an improved effective learning rate. Through these two analyses, we specify the γ parameters for which L2 regularization is desirable and those for which it is not, and propose four guidelines for managing them. In several experiments, we observed that applying L2 regularization to applicable γ increased classification accuracy by 1% to 4%, whereas applying it to inapplicable γ decreased classification accuracy by 1% to 3%, consistent with our four guidelines. The proposed guidelines were further validated on various tasks and architectures, including variants of residual networks and transformers.
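In common deep learning libraries, the mechanism behind this question is the optimizer's weight decay: batch normalization computes y = γ·x̂ + β on the normalized input x̂, and whether γ (and β) belong in the decayed parameter set is exactly what the guidelines address. As a minimal sketch of that mechanism, assuming PyTorch and an arbitrary ResNet-18 backbone (both choices are illustrative, not the paper's setup), the snippet below splits parameters into two optimizer groups so that L2 regularization is applied to the weights while the decay on the normalization parameters can be tuned independently; the decay coefficients shown are assumed values.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18  # arbitrary residual backbone

model = resnet18()

# Collect BN affine parameters (gamma = module.weight, beta = module.bias)
# separately from all other parameters, without double-counting.
bn_params, other_params = [], []
for module in model.modules():
    if isinstance(module, nn.BatchNorm2d):
        bn_params.extend(p for p in module.parameters() if p.requires_grad)
    else:
        other_params.extend(
            p for p in module.parameters(recurse=False) if p.requires_grad
        )

# Two parameter groups: the usual L2 (weight decay) on weights, plus a
# separately tunable decay for the BN parameters. Setting the second
# coefficient to 0.0 exempts gamma/beta from regularization entirely.
optimizer = torch.optim.SGD(
    [
        {"params": other_params, "weight_decay": 5e-4},  # assumed value
        {"params": bn_params, "weight_decay": 0.0},      # assumed value
    ],
    lr=0.1,
    momentum=0.9,
)
```

A finer-grained split along the lines of the four guidelines would filter batch-normalization layers by their position in the network rather than treating every γ uniformly; the uniform split above is only meant to show the mechanism.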



Updated: 2024-04-02