Multi-object behaviour recognition based on object detection cascaded image classification in classroom scenes
Applied Intelligence (IF 5.3) Pub Date: 2024-04-11, DOI: 10.1007/s10489-024-05409-x
Min Dang, Gang Liu, Hao Li, Qijie Xu, Xu Wang, Rong Pan

Multi-object behaviour recognition in classroom scenes is challenging: crowded student objects suffer from heavy occlusion, invisible keypoints, and large scale variation, and the behaviours of different students are visually similar, all of which degrade recognition performance. We therefore propose a multi-object behaviour recognition method based on object detection cascaded with image classification: an object detector first extracts student objects, and a Vision Transformer (ViT) then classifies each student's behaviour. Because accurate behaviour recognition depends on accurate detection, we first improve the detector. We introduce a Shallow Auxiliary Module that assists the backbone network in extracting hybrid multi-scale feature information; fusing this multi-scale, multi-channel information alleviates object overlap and scale variation. We further propose a Scale Assignment Fusion Mechanism that non-heuristically guides each object to learn from the optimal feature layer, and an Anchor-free Dynamic Label Assignment that suppresses low-quality bounding-box predictions, stabilising training and improving detection performance. The proposed student object detector achieves a state-of-the-art mAP\(^{50}\) of 88.03 and AP\(_l\) of 57.64, outperforming state-of-the-art object detection methods. Our multi-object behaviour recognition method recognises four behaviour classes and significantly outperforms the comparison methods.
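The cascade described above (detect student objects, then classify each crop with a ViT) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the paper's improved detector (with the Shallow Auxiliary Module, Scale Assignment Fusion Mechanism, and Anchor-free Dynamic Label Assignment) and its trained ViT classifier are not reproduced here, so stock torchvision models stand in for them, and the four behaviour labels are placeholders.

```python
import torch
import torchvision
from torchvision.transforms import functional as F

# Placeholder labels; the paper recognises four behaviour classes but does not name them here.
BEHAVIOURS = ["class_0", "class_1", "class_2", "class_3"]

# Stand-in for the paper's student object detector.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

# Stand-in for the paper's ViT behaviour classifier; the head is swapped for a 4-class
# layer, which would still need to be trained on classroom data.
classifier = torchvision.models.vit_b_16(weights="DEFAULT")
classifier.heads = torch.nn.Linear(classifier.hidden_dim, len(BEHAVIOURS))
classifier.eval()

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


@torch.no_grad()
def recognise_behaviours(image, score_thresh=0.5):
    """Detect student objects in a PIL image, then classify each cropped student with the ViT."""
    det = detector([F.to_tensor(image)])[0]
    results = []
    for box, score in zip(det["boxes"], det["scores"]):
        if score < score_thresh:
            continue  # drop low-confidence detections
        x1, y1, x2, y2 = box.int().tolist()
        crop = image.crop((x1, y1, x2, y2)).resize((224, 224))  # ViT-B/16 expects 224x224 input
        crop_t = F.normalize(F.to_tensor(crop), IMAGENET_MEAN, IMAGENET_STD).unsqueeze(0)
        behaviour = BEHAVIOURS[int(classifier(crop_t).argmax())]
        results.append({"box": (x1, y1, x2, y2), "behaviour": behaviour, "score": float(score)})
    return results
```

In the full system the stock Faster R-CNN would be replaced by the paper's student detector and the ViT head fine-tuned on the four behaviour classes, but the cascade structure is the same: per-image detection followed by per-crop classification.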

Updated: 2024-04-12