A novel collaborative agent reinforcement learning framework based on an attention mechanism and disjunctive graph embedding for flexible job shop scheduling problem
Journal of Manufacturing Systems (IF 12.1), Pub Date: 2024-04-08, DOI: 10.1016/j.jmsy.2024.03.012
Wenquan Zhang, Fei Zhao, Yong Li, Chao Du, Xiaobing Feng, Xuesong Mei

The Flexible Job Shop Scheduling Problem (FJSP), a classic NP-hard optimization challenge, has a direct impact on manufacturing system efficiency. Because the FJSP is more complex than the Job Shop Scheduling Problem (JSSP), involving both job and machine selection, we introduced a collaborative agent reinforcement learning (CARL) architecture to tackle this challenge for the first time. To enhance the Co-Markov decision process, we introduced disjunctive graphs to represent state features. However, representations of states and actions often lead to suboptimal solutions because of their intricate variability, so we refined how both are represented. During the solving process, we employed a Graph Attention Network (GAT) to extract global state information from the disjunctive graph and a Transformer Encoder to quantitatively capture the competitive relationships among machines. We configured two independent encoder–decoder components for the job and machine agents, enabling the generation of two distinct action strategies. Finally, we employed the Soft Actor–Critic (SAC) algorithm and an integrated Deep Q Network (DQN), termed D5QN, to train the decision network parameters of the job and machine agents. Our experiments revealed that after just one training session, the collaborative agents acquired strong scheduling strategies: they surpass traditional Priority Dispatching Rules (PDR) in solution quality, outperform several metaheuristic and reinforcement learning algorithms, and run faster than OR-Tools. Empirical results on both randomized and benchmark instances further underscore the robustness of the learned policies in practical, large-scale scenarios. Notably, on the DPpaulli dataset, characterized by a considerable imbalance between the number of operations and machines, our approach achieved optimality in 11 of 18 FJSP instances.
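To make the two attention components above concrete, the following is a minimal, self-contained PyTorch sketch (not the authors' code): a single graph-attention layer embeds the operation nodes of a disjunctive graph, a Transformer encoder processes machine features, and two separate heads score actions for the job and machine agents. Every dimension, feature layout, and name here (GATLayer, N_OPS, N_MACHINES, the random adjacency) is an illustrative assumption, not the paper's reported architecture or hyperparameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention over a dense 0/1 adjacency mask."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h, adj):            # h: (N, in_dim), adj: (N, N)
        z = self.W(h)                     # (N, out_dim)
        n = z.size(0)
        zi = z.unsqueeze(1).expand(n, n, -1)
        zj = z.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.a(torch.cat([zi, zj], dim=-1)).squeeze(-1))
        e = e.masked_fill(adj == 0, float('-inf'))  # attend only along arcs
        alpha = torch.softmax(e, dim=-1)
        return F.elu(alpha @ z)           # (N, out_dim) node embeddings

# Illustrative sizes: 12 operation nodes, 4 candidate machines.
N_OPS, N_MACHINES, D = 12, 4, 32
op_feats = torch.randn(N_OPS, 8)                  # raw operation features
adj = (torch.rand(N_OPS, N_OPS) > 0.7).float()    # stand-in disjunctive arcs
adj.fill_diagonal_(1)                             # keep self-loops

gat = GATLayer(8, D)
op_emb = gat(op_feats, adj)                       # global state embedding

# Transformer encoder over machine features to model machine "competition".
mach_feats = torch.randn(1, N_MACHINES, D)        # (batch, machines, dim)
enc_layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
mach_emb = nn.TransformerEncoder(enc_layer, num_layers=2)(mach_feats)

# Two independent policy heads, one per agent, echoing the CARL design.
job_head = nn.Linear(D, 1)                        # scores each ready operation
mach_head = nn.Linear(D, 1)                       # scores each candidate machine
job_logits = job_head(op_emb).squeeze(-1)         # job agent's action logits
mach_logits = mach_head(mach_emb).squeeze(-1)     # machine agent's action logits
print(job_logits.shape, mach_logits.shape)        # torch.Size([12]) torch.Size([1, 4])

The key detail the sketch preserves is the masked attention: each operation attends only to its disjunctive-graph neighbours, so the embedding reflects precedence and machine-sharing structure rather than an unstructured mixing of all nodes.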
