CARgram: CNN-based accident recognition from road sounds through intensity-projected spectrogram analysis
Digital Signal Processing (IF 2.9), Pub Date: 2024-02-13, DOI: 10.1016/j.dsp.2024.104431
Alessandro Sebastian Podda , Riccardo Balia , Livio Pompianu , Salvatore Carta , Gianni Fenu , Roberto Saia

Road surveillance systems play an important role in traffic monitoring and in detecting hazardous events. In recent years, several artificial intelligence-based approaches have been proposed for this purpose, typically based on the analysis of the acquired video streams. However, occlusions, poor lighting conditions, and the heterogeneity of events can often reduce their effectiveness and reliability. To overcome these limitations, scientific and industrial research has therefore focused on integrating such solutions with audio recognition methods, which help reduce false positives and missed alarms by automatically identifying anomalous traffic sounds, e.g., car crashes and skids. Following this trend, in this work we propose an innovative pipeline for the analysis of intensity-projected audio spectrograms from streams of traffic sounds, which exploits both (i) a visual approach based on a custom, special-purpose Convolutional Neural Network for identifying anomalous events in the sound signal, and (ii) a novel multi-representational encoding of the input, which was shown to significantly improve the recognition accuracy of the neural models. Validated on the public MIVIA dataset, the proposed pipeline achieved a false positive rate of 0.96%, the best result compared with state-of-the-art competitors. Notably, following these findings, a prototype implementation has been deployed on a real-world video surveillance infrastructure.
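
The abstract does not spell out how the intensity-projected spectrograms are built. As a rough illustration only, the following Python sketch turns a raw audio signal into a grayscale spectrogram "image" of the kind a CNN can consume: a Hann-windowed short-time Fourier transform (computed with a naive DFT to stay dependency-free), converted to decibels and linearly projected onto 0 to 255 pixel intensities. The frame length, hop size, -80 dB floor, and the two-channel `multi_channel` encoding are all hypothetical choices for illustration; they are not the CARgram pipeline or its multi-representational encoding.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude STFT via a naive DFT.

    Returns a list of frames, each holding frame_len // 2 + 1 frequency bins.
    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                      for t in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def to_intensity(spec, floor_db=-80.0):
    """Project magnitudes onto 0..255: dB relative to the peak, clipped at floor_db."""
    peak = max(max(row) for row in spec) or 1.0
    img = []
    for row in spec:
        out = []
        for m in row:
            db = 20 * math.log10(max(m, 1e-12) / peak)  # 0 dB at the loudest bin
            db = max(db, floor_db)                       # clip quiet bins to the floor
            out.append(round(255 * (db - floor_db) / -floor_db))
        img.append(out)
    return img


def multi_channel(spec):
    """Hypothetical 2-channel encoding: linear-normalized and dB-projected intensity."""
    peak = max(max(row) for row in spec) or 1.0
    linear = [[round(255 * m / peak) for m in row] for row in spec]
    return [linear, to_intensity(spec)]


if __name__ == "__main__":
    # A pure tone that falls exactly on DFT bin 8 of a 64-sample frame.
    signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
    image = to_intensity(stft_magnitude(signal))
    print(len(image), len(image[0]))
```

For this 256-sample test tone, the sketch yields 7 frames of 33 bins each, with the brightest pixel (255) at bin 8 in every frame. A practical system would use an FFT library rather than the naive DFT and would feed such images (or stacked channels) to a trained CNN.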

Updated: 2024-02-13