An Algorithm of Complete Coverage Path Planning for Deep-Sea Mining Vehicle Clusters Based on Reinforcement Learning, Advanced Theory and Simulations


An Algorithm of Complete Coverage Path Planning for Deep-Sea Mining Vehicle Clusters Based on Reinforcement Learning
Advanced Theory and Simulations (IF 3.3), Pub Date: 2024-01-14, DOI: 10.1002/adts.202300970
Bowen Xing₁, Xiao Wang₁, Zhenchong Liu₂


This paper proposes a deep reinforcement learning algorithm for complete coverage path planning of deep-sea mining vehicle clusters. First, the mining vehicles and the deep-sea mining environment are modeled. A series of algorithm designs and optimizations is then built on Deep Q-Networks (DQN). A map fusion mechanism integrates the grid matrix data from multiple mining vehicles into a state matrix of the complete environment, and a preprocessing method for the state matrix provides suitable training data for the neural network. The reward function and action selection mechanism are optimized for the requirements of cluster cooperative operation. The algorithm also uses distance constraints to prevent entanglement of the underwater hoses. To improve the training efficiency of the neural network, the algorithm filters and extracts training samples according to a sample quality score. To meet the requirements of the cluster complete coverage mission, Long Short-Term Memory (LSTM) is introduced into the neural network to achieve a better training effect. The resulting algorithm is verified through simulation experiments.
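The abstract does not give implementation details, so the following is only a minimal Python sketch of two of the ideas it names: fusing per-vehicle grid matrices into one state matrix, and checking a distance constraint between vehicles. The cell codes, the element-wise maximum fusion rule, the minimum-separation form of the constraint, and the function names `fuse_maps` and `min_separation_ok` are all assumptions made for illustration, not the authors' method.

```python
import math
from itertools import combinations

# Hypothetical cell codes; the paper's actual grid encoding is not stated in the abstract.
UNKNOWN, FREE, COVERED, OBSTACLE = 0, 1, 2, 3


def fuse_maps(local_maps):
    """Merge per-vehicle grid matrices into a single state matrix.

    Assumption: all grids have the same shape and share one coordinate frame,
    and a higher cell code carries more information, so the element-wise
    maximum is kept for each cell.
    """
    rows, cols = len(local_maps[0]), len(local_maps[0][0])
    return [
        [max(grid[r][c] for grid in local_maps) for c in range(cols)]
        for r in range(rows)
    ]


def min_separation_ok(positions, min_dist):
    """Return True if every pair of vehicles is at least min_dist apart.

    Assumption: the distance constraint is modeled as a minimum pairwise
    Euclidean separation; the paper may define it differently.
    """
    return all(
        math.dist(a, b) >= min_dist for a, b in combinations(positions, 2)
    )


if __name__ == "__main__":
    map_a = [[FREE, UNKNOWN], [UNKNOWN, OBSTACLE]]
    map_b = [[COVERED, FREE], [UNKNOWN, UNKNOWN]]
    print(fuse_maps([map_a, map_b]))
    print(min_separation_ok([(0, 0), (3, 4)], min_dist=5.0))
```

In a DQN setting, a check like `min_separation_ok` could be used to mask out or penalize candidate actions that would bring two vehicles too close, though the abstract does not say which of these the authors chose.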


Updated: 2024-01-14
