Less Is More -- On the Importance of Sparsification for Transformers and Graph Neural Networks for TSP
arXiv - CS - Artificial Intelligence. Pub Date: 2024-03-25, DOI: arxiv-2403.17159
Attila Lischka, Jiaming Wu, Rafael Basso, Morteza Haghir Chehreghani, Balázs Kulcsár

Most of the recent studies tackling routing problems like the Traveling Salesman Problem (TSP) with machine learning use a transformer or Graph Neural Network (GNN) based encoder architecture. However, many of them apply these encoders naively by allowing them to aggregate information over the whole TSP instances. We, on the other hand, propose a data preprocessing method that allows the encoders to focus only on the most relevant parts of the TSP instances. In particular, we propose graph sparsification for TSP graph representations passed to GNNs and attention masking for TSP instances passed to transformers, where the masks correspond to the adjacency matrices of the sparse TSP graph representations. Furthermore, we propose ensembles of different sparsification levels, allowing models to focus on the most promising parts while also allowing information flow between all nodes of a TSP instance. In the experimental studies, we show that for GNNs, appropriate sparsification and ensembles of different sparsification levels lead to substantial performance increases of the overall architecture. We also design a new, state-of-the-art transformer encoder with ensembles of attention masking. These transformers reduce the optimality gap from $0.16\%$ to $0.10\%$ for TSP instances of size 100 and from $0.02\%$ to $0.00\%$ for TSP instances of size 50.
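To make the preprocessing idea concrete, below is a minimal sketch of how a sparse TSP graph can be turned into an attention mask. The abstract does not specify the sparsification heuristic, so k-nearest-neighbour sparsification is assumed here for illustration; the function names (knn_adjacency, attention_mask) and the additive 0/-inf mask convention are likewise illustrative choices, not the paper's implementation.

import numpy as np

def knn_adjacency(coords, k):
    # Boolean adjacency matrix of a k-nearest-neighbour sparsification
    # of a Euclidean TSP instance given as an (n, 2) coordinate array.
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self-loops
    nn = np.argsort(dist, axis=1)[:, :k]      # k nearest neighbours per node
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nn.ravel()] = True
    adj |= adj.T                              # symmetrise: keep edge if either node selects it
    return adj

def attention_mask(adj):
    # Additive attention mask: 0 where attention is allowed (sparse edges
    # plus the diagonal), -inf where attention is masked out.
    allowed = adj | np.eye(len(adj), dtype=bool)
    return np.where(allowed, 0.0, -np.inf)

# Example: a 100-city instance with two sparsification levels,
# mimicking an ensemble of attention masks over the same instance.
coords = np.random.rand(100, 2)
masks = [attention_mask(knn_adjacency(coords, k)) for k in (10, 20)]

In a transformer encoder, each such mask would be added to the attention logits before the softmax, so that a given head only aggregates information over the edges retained by its sparsification level, while the ensemble of levels preserves information flow across the full instance.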

Updated: 2024-03-28