Predicting the functional state of protein kinases using interpretable graph neural networks from sequence and structural data
Proteins: Structure, Function, and Bioinformatics (IF 2.9), Pub Date: 2023-12-11, DOI: 10.1002/prot.26641
Ashwin Ravichandran, Juan C. Araque, John W. Lawson

Protein kinases are central to cellular activities and are actively pursued as drug targets for several conditions, including cancer and autoimmune diseases. Despite the availability of a large structural database for kinases, methodologies to elucidate the structure–function relationship of these proteins (without manual intervention) are lacking. Such techniques are essential to structural biology and to accelerating drug discovery efforts. Here, we implement an interpretable graph neural network (GNN) framework for classifying the functionally active and inactive states of a large set of protein kinases using only their tertiary structure and amino acid sequence. We show that the GNN models can classify kinase structures with high accuracy (>97%). We implement Gradient-weighted Class Activation Mapping for graphs (Graph Grad-CAM) to automatically identify structurally important residues and residue-residue contacts of the kinases without any a priori input. We show that the motifs identified through the Graph Grad-CAM methodology are functionally critical, consistent with the existing kinase literature. Notably, the highly conserved DFG and HRD motifs of the well-known hydrophobic spine are identified by the interpretable framework, in addition to some of the lesser-known motifs. Further, using Grad-CAM maps as the vector embedding of the protein structures, we identify the subtle differences in the crystal structures among different sub-classes of kinases in the Protein Data Bank (PDB). Frameworks such as the one implemented here, which enable high-throughput identification of protein structure–function relationships, are essential in designing targeted small-molecule therapies as well as in engineering new proteins for novel applications.
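
The idea behind Graph Grad-CAM can be illustrated with a toy example. The sketch below is not the authors' model: it assumes a single graph-convolution layer, mean pooling over residues, and a linear read-out, with hand-picked weights instead of trained ones. The names `contact_graph`, `gcn_layer`, `class_score`, and `graph_grad_cam`, the 8 Å contact cutoff, and the two-dimensional residue features are all hypothetical choices made for illustration. Because the read-out is linear, the gradient of the class score with respect to the feature map is written out by hand rather than obtained from automatic differentiation.

```python
import math


def contact_graph(coords, cutoff=8.0):
    """Adjacency matrix linking residues whose coordinates lie within `cutoff`.

    Every residue is within distance 0 of itself, so self-loops are included.
    """
    n = len(coords)
    return [[1.0 if math.dist(coords[i], coords[j]) <= cutoff else 0.0
             for j in range(n)] for i in range(n)]


def normalize(adj):
    """Row-normalize so each residue averages over itself and its contacts."""
    return [[a / sum(row) for a in row] for row in adj]


def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj_norm, feats, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


def class_score(fmap, head):
    """Mean-pool the feature map over residues, then apply a linear read-out."""
    n = len(fmap)
    pooled = [sum(row[k] for row in fmap) / n for k in range(len(head))]
    return sum(p * w for p, w in zip(pooled, head))


def graph_grad_cam(fmap, head):
    """Per-residue importance: ReLU(sum_k alpha_k * F[n][k]).

    For mean pooling plus a linear head, d(score)/d(F[n][k]) = head[k] / N
    for every residue n. alpha_k is that gradient averaged over residues.
    """
    n = len(fmap)
    grads = [[w / n for w in head] for _ in range(n)]
    alpha = [sum(g[k] for g in grads) / n for k in range(len(head))]
    return [max(0.0, sum(a * f for a, f in zip(alpha, row))) for row in fmap]


# Toy input: four residues on a line, the last one far from the others.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical encoding
weight = [[1.0, -1.0], [0.5, 2.0]]                        # hypothetical layer weights
head = [1.0, 0.5]                                         # hypothetical read-out weights

fmap = gcn_layer(normalize(contact_graph(coords)), feats, weight)
score = class_score(fmap, head)
heatmap = graph_grad_cam(fmap, head)
for index, value in enumerate(heatmap):
    print(f"residue {index}: importance {value:.3f}")
```

In a trained network the gradients would come from backpropagation through the full model, and the resulting per-residue heatmap is the kind of vector the abstract describes using as an embedding of the structure.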

Updated: 2023-12-11