Information Bottleneck in Deep Learning - A Semiotic Approach
International Journal of Computers Communications & Control (IF 2.7). Pub Date: 2022-01-09. DOI: 10.15837/ijccc.2022.1.4650
Bogdan Musat, Razvan Andonie

The information bottleneck principle was recently proposed as a theory meant to explain some of the training dynamics of deep neural architectures. Via information plane analysis, patterns emerge in this framework, and two phases can be distinguished: fitting and compression. We take a step further and study the behaviour of the spatial entropy characterizing the layers of convolutional neural networks (CNNs), in relation to the information bottleneck theory. We observe pattern formations which resemble the information bottleneck fitting and compression phases. From the perspective of semiotics, the study of signs and sign-using behaviour, the saliency maps of a CNN's layers exhibit aggregations: signs are aggregated into supersigns, a process called semiotic superization. Superization can be characterized by a decrease in entropy and interpreted as information concentration. We discuss the information bottleneck principle from the perspective of semiotic superization and find analogies related to the informational adaptation of the model. In a practical application, we introduce a modification of the CNN training process: we progressively freeze the layers whose saliency map representations show small entropy variation. Such layers can be stopped from training earlier without a significant impact on the network's performance (accuracy), which connects the entropy evolution over time with the training dynamics of the network.
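
Below is a minimal sketch of the progressive-freezing idea described in the abstract, assuming a PyTorch-style training loop. It is an illustration, not the authors' reference implementation: the Shannon entropy over a normalized saliency map is a simplifying assumption, and the names `spatial_entropy`, `EntropyFreezer`, `tol`, and `window` are hypothetical.

```python
# Hypothetical sketch: freeze a CNN layer once the entropy of its
# saliency map stops varying across epochs, following the
# progressive-freezing idea in the abstract. The saliency and
# entropy definitions here are illustrative assumptions.
import torch
import torch.nn as nn


def spatial_entropy(saliency: torch.Tensor, eps: float = 1e-12) -> float:
    """Shannon entropy of a saliency map normalized to a distribution."""
    p = saliency.abs().flatten()
    p = p / (p.sum() + eps)
    return float(-(p * (p + eps).log()).sum())


class EntropyFreezer:
    """Freeze a layer when its entropy varies less than `tol` over `window` epochs."""

    def __init__(self, layer: nn.Module, tol: float = 1e-3, window: int = 3):
        self.layer = layer
        self.tol = tol
        self.window = window
        self.history: list[float] = []
        self.frozen = False

    def step(self, saliency: torch.Tensor) -> None:
        """Record this epoch's entropy and freeze the layer if it has stabilized."""
        if self.frozen:
            return
        self.history.append(spatial_entropy(saliency))
        recent = self.history[-self.window:]
        if len(recent) == self.window and max(recent) - min(recent) < self.tol:
            for param in self.layer.parameters():
                param.requires_grad = False  # stop training this layer early
            self.frozen = True
```

In a training loop, one would call `freezer.step(saliency)` once per epoch with the layer's current saliency map; once frozen, the layer's parameters stop receiving gradient updates while the remaining layers continue training.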
Updated: 2022-02-06