Combining multi-agent deep deterministic policy gradient and rerouting technique to improve traffic network performance under mixed traffic conditions
SIMULATION ( IF 1.6 ) Pub Date : 2024-03-22 , DOI: 10.1177/00375497241237831
Hung Tuan Trinh 1 , Sang-Hoon Bae 1 , Duy Quang Tran 2
In the future, mixed traffic flow will include two types of vehicles: connected autonomous vehicles (CAVs) and human-driven vehicles (HDVs). CAVs are emerging as a disruptive alternative to the traditional transportation system: they share real-time data with one another and with roadside units (RSUs) for network management. Reinforcement learning (RL) is a promising approach to traffic signal management in complex urban areas because it can leverage the information gathered from CAVs. In particular, coordinating signal control across many intersections is a critical challenge for multi-agent reinforcement learning (MARL). Motivated by this vision, we propose an approach that combines an actor–critic network–based multi-agent deep deterministic policy gradient (MADDPG) model with a rerouting technique (RT) to improve traffic performance in vehicular networks. This algorithm overcomes the inherent non-stationarity of Q-learning and the high variance of policy gradient (PG) algorithms. Based on centralized learning with decentralized execution, the MADDPG model employs one actor and one critic per agent. The actor network uses only local information to select actions, while the critic network is trained with extra information, including the states and actions of the other agents. Through this centralized learning process, agents can coordinate with each other, diminishing the influence of a non-stationary environment. Unlike previous studies, we not only manage traffic light systems but also consider the effect of vehicle platooning on increasing throughput. Experimental results show that our model outperforms baseline models in terms of traffic performance across different scenarios.
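The centralized-training, decentralized-execution structure described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch, not the authors' implementation: network sizes, two agents, and the random observations are all assumptions made for the example. The key point it demonstrates is the asymmetry the abstract describes — each actor sees only its own local observation, while each critic is conditioned on the joint states and actions of all agents.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps one agent's LOCAL observation to a deterministic action."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # actions bounded in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Centralized critic: scores the JOINT observation-action of all agents."""
    def __init__(self, joint_obs_dim, joint_act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q-value
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

# Assumed toy dimensions: 2 intersections (agents), 4-dim local state, 2-dim action.
n_agents, obs_dim, act_dim = 2, 4, 2
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critics = [Critic(n_agents * obs_dim, n_agents * act_dim) for _ in range(n_agents)]

obs = torch.randn(1, n_agents, obs_dim)  # batch of 1 joint observation

# Decentralized execution: actor i only consumes observation i.
acts = torch.stack([actors[i](obs[:, i]) for i in range(n_agents)], dim=1)

# Centralized training: critic i sees the flattened joint state and joint action,
# which is what lets training remain stable despite the other agents' changing policies.
q_vals = [critics[i](obs.flatten(1), acts.flatten(1)) for i in range(n_agents)]
```

In a full MADDPG training loop, each critic would be regressed toward a temporal-difference target and each actor updated along the gradient of its own critic; the sketch above only shows the forward pass that defines the information each network receives.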
