Instabilities in Convnets for Raw Audio
IEEE Signal Processing Letters ( IF 3.9 ) Pub Date : 2024-04-08 , DOI: 10.1109/lsp.2024.3386492
Daniel Haider, Vincent Lostanlen, Martin Ehler, Peter Balazs

What makes waveform-based deep learning so hard? Despite numerous attempts at training convolutional neural networks (convnets) for filterbank design, they often fail to outperform hand-crafted baselines. These baselines are linear time-invariant systems: as such, they can be approximated by convnets with wide receptive fields. Yet, in practice, gradient-based optimization leads to suboptimal results. In our article, we approach this problem from the perspective of initialization. We present a theory of large deviations for the energy response of FIR filterbanks with random Gaussian weights. We find that deviations worsen for large filters and locally periodic input signals, which are both typical for audio signal processing applications. Numerical simulations align with our theory and suggest that the condition number of a convolutional layer follows a logarithmic scaling law between the number and length of the filters, which is reminiscent of discrete wavelet bases.
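The simulations described above can be sketched numerically. For a stride-1 convolutional layer, the frame bounds of the filterbank are the extrema of its energy response (the Littlewood-Paley sum of squared magnitude frequency responses), and the condition number is their ratio. The snippet below is a minimal sketch of that computation for filters with i.i.d. Gaussian weights; the function name, the variance scaling 1/T, and the FFT grid size are illustrative assumptions, not the authors' exact experimental setup.

```python
import numpy as np

def filterbank_condition_number(num_filters, filter_length, n_freqs=4096, seed=0):
    """Condition number of a random Gaussian FIR filterbank (stride 1)."""
    rng = np.random.default_rng(seed)
    # i.i.d. Gaussian weights; variance 1/filter_length is an assumed normalization
    h = rng.normal(0.0, 1.0 / np.sqrt(filter_length),
                   size=(num_filters, filter_length))
    # Energy response on a dense frequency grid: sum_j |h_j_hat(omega)|^2
    H = np.fft.rfft(h, n=n_freqs, axis=1)
    energy = np.sum(np.abs(H) ** 2, axis=0)
    # Frame bounds A (min) and B (max); condition number kappa = B / A >= 1
    A, B = energy.min(), energy.max()
    return B / A

# Example: compare conditioning of short vs. long filters at fixed filter count
kappa_short = filterbank_condition_number(num_filters=40, filter_length=16)
kappa_long = filterbank_condition_number(num_filters=40, filter_length=1024)
```

Averaging such estimates over many seeds, for a grid of filter counts and lengths, is one way to probe the logarithmic scaling law between the number and length of the filters that the abstract reports.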

Updated: 2024-04-08