How to track and segment fish without human annotations: a self-supervised deep learning approach
Pattern Analysis and Applications (IF 3.9), Pub Date: 2024-02-23, DOI: 10.1007/s10044-024-01227-6
Alzayat Saleh, Marcus Sheaves, Dean Jerry, Mostafa Rahimi Azghadi

Tracking the movements and sizes of fish is crucial to understanding their ecology and behaviour. Knowing where fish migrate, how they interact with their environment, and how their size affects their behaviour can help ecologists develop more effective conservation and management strategies to protect fish populations and their habitats. Deep learning is a promising tool for analysing fish ecology from underwater videos. However, training deep neural networks (DNNs) for fish tracking and segmentation requires high-quality labels, which are expensive to obtain. We propose an alternative unsupervised approach that relies on spatial and temporal variations in video data to generate noisy pseudo-ground-truth labels, and we train a multi-task DNN on these pseudo-labels. Our framework consists of three stages: (1) an optical flow model generates the pseudo-labels using spatial and temporal consistency between frames, (2) a self-supervised model refines the pseudo-labels incrementally, and (3) a segmentation network is trained on the refined labels. We perform extensive experiments to validate our method on three public underwater video datasets and demonstrate its effectiveness for video annotation and segmentation. We also evaluate its robustness to different imaging conditions and discuss its limitations.
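
To make stage (1) concrete, the sketch below shows one simple way a dense optical flow field could be turned into a noisy binary pseudo-label: pixels whose motion magnitude exceeds a threshold are marked as foreground. This is only an illustration under stated assumptions. The abstract does not say which optical flow model or thresholding rule the authors use, and the names `flow_to_pseudo_mask`, `mask_area`, and the `threshold` parameter are hypothetical. The flow field is a hand-written toy grid rather than the output of a real flow model.

```python
import math

def flow_to_pseudo_mask(flow, threshold=1.0):
    """Turn a dense flow field into a binary pseudo-label mask.

    flow: 2D grid (list of rows) of (dx, dy) displacement tuples, in pixels.
    threshold: assumed motion-magnitude cutoff; not taken from the paper.
    Returns a 2D grid of 0/1 with the same shape as `flow`.
    """
    return [
        [1 if math.hypot(dx, dy) > threshold else 0 for dx, dy in row]
        for row in flow
    ]

def mask_area(mask):
    """Count foreground pixels, a crude proxy for apparent object size."""
    return sum(sum(row) for row in mask)

# Toy 3x3 flow field: a small moving blob on a nearly static background.
flow = [
    [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2)],
    [(0.0, 0.1), (2.0, 1.0), (1.5, -1.5)],
    [(0.0, 0.0), (0.0, 3.0), (0.2, 0.1)],
]

mask = flow_to_pseudo_mask(flow, threshold=1.0)
for row in mask:
    print(row)
print("foreground pixels:", mask_area(mask))
```

Masks produced this way are noisy: static fish are missed, and moving water or debris is picked up. That is consistent with the abstract's description of the initial labels as noisy and with the need for the refinement stage (2) before the segmentation network is trained.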




Updated: 2024-02-24