SimGCL: graph contrastive learning by finding homophily in heterophily
Knowledge and Information Systems (IF 2.7), Pub Date: 2024-03-01, DOI: 10.1007/s10115-023-02022-1
Cheng Liu, Chenhuan Yu, Ning Gui, Zhiwu Yu, Songgaojun Deng

Abstract

Graph contrastive learning (GCL) has been widely studied in unsupervised graph representation learning. Most existing GCL methods model the invariance of identical instances across different augmented views of a graph and use a graph neural network (GNN) as the underlying encoder to generate node representations. GNNs generally learn a node's representation by aggregating information from its neighbors, so the homophily and heterophily of the graph can strongly affect their performance. Existing GCL methods neglect this effect, which leads to sub-optimal representations on graphs with more complex patterns, especially highly heterophilic ones. We propose a novel Similarity-based Graph Contrastive Learning model (SimGCL), which generates augmented views with a higher homophily ratio at the topology level by adding or removing edges. We treat dimension-wise features as weak labels and introduce a new similarity metric, based on features and on feature dimension-wise distribution patterns, that guides the improvement of homophily in an unsupervised manner. To preserve node diversity in the augmented views, we retain the feature dimensions with higher heterophily, which amplifies the differences between nodes at the feature level. We also use the proposed similarity in the negative sampling process to eliminate possible false negative samples. We conduct extensive experiments comparing our model with ten baseline methods on seven benchmark datasets. Experimental results show that SimGCL significantly outperforms state-of-the-art GCL methods on both homophilic and heterophilic graphs and brings more than 10% improvement on heterophilic graphs.
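To make the idea of raising the homophily ratio by adding or removing edges concrete, here is a minimal sketch in Python with NumPy. It is not the authors' method: plain cosine similarity over raw features stands in for the paper's similarity metric, which the abstract does not specify, and the function names and the `drop_below` / `add_above` thresholds are hypothetical. Class labels are used only to measure homophily afterwards, never to choose edges.

```python
import numpy as np


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label."""
    edges = np.asarray(edges)
    labels = np.asarray(labels)
    if len(edges) == 0:
        return 0.0
    return float(np.mean(labels[edges[:, 0]] == labels[edges[:, 1]]))


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of `features`."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero rows
    return unit @ unit.T


def rewire_by_similarity(edges, features, drop_below=0.5, add_above=0.9):
    """Assumed stand-in for similarity-guided augmentation.

    Drops existing edges whose endpoints are dissimilar and adds edges
    between highly similar node pairs. Thresholds are illustrative only.
    """
    sim = cosine_similarity_matrix(np.asarray(features, dtype=float))
    n = sim.shape[0]
    # Keep only edges between sufficiently similar endpoints (undirected).
    kept = {(min(u, v), max(u, v)) for u, v in edges if sim[u, v] >= drop_below}
    # Add edges between very similar pairs that may not be connected yet.
    for u in range(n):
        for v in range(u + 1, n):
            if sim[u, v] >= add_above:
                kept.add((u, v))
    return sorted(kept)


# Toy heterophilic graph: two feature clusters, mostly cross-cluster edges.
features = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
labels = np.array([0, 0, 1, 1])
edges = [(0, 2), (1, 3), (0, 1)]

augmented = rewire_by_similarity(edges, features)
before = edge_homophily(edges, labels)
after = edge_homophily(augmented, labels)
```

On this toy graph the two cross-cluster edges are dropped and the missing within-cluster edge is added, so the measured homophily of the augmented view is higher than that of the input. The all-pairs loop is quadratic in the number of nodes, so a real graph would need a sparse or approximate nearest-neighbor search instead.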




Updated: 2024-02-07