Classifying protein kinase conformations with machine learning
Protein Science (IF 8), Pub Date: 2024-03-19, DOI: 10.1002/pro.4918
Ivan Reveguk, Thomas Simonson

Protein kinases are key actors of signaling networks and important drug targets. They cycle between active and inactive conformations, distinguished by a few elements within the catalytic domain. One is the activation loop, whose conserved DFG motif can occupy DFG‐in, DFG‐out, and some rarer conformations. Annotation and classification of the structural kinome are important, as different conformations can be targeted by different inhibitors and activators. Valuable resources exist; however, large‐scale applications will benefit from increased automation and interpretability of structural annotation. Interpretable machine learning models are described for this purpose, based on ensembles of decision trees. To train them, a set of catalytic domain sequences and structures was collected, somewhat larger and more diverse than existing resources. The structures were clustered based on the DFG conformation and manually annotated. They were then used as training input. Two main models were constructed, which distinguished active/inactive and in/out/other DFG conformations. They considered initially 1692 structural variables, spanning the whole catalytic domain, then identified (“learned”) a small subset that sufficed for accurate classification. The first model correctly labeled all but 3 of 3289 structures as active or inactive, while the second assigned the correct DFG label to all but 17 of 8826 structures. The most potent classifying variables were all related to well‐known structural elements in or near the activation loop and their ranking gives insights into the conformational preferences. The models were used to automatically annotate 3850 kinase structures predicted recently with the Alphafold2 tool, showing that Alphafold2 reproduced the active/inactive but not the DFG‐in proportions seen in the Protein Data Bank. We expect the models will be useful for understanding and engineering kinases.
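
The idea of a tree ensemble that starts from many structural variables and keeps only the few that classify well can be illustrated with a small sketch. It is not the authors' implementation: the data are synthetic, the variable indices are hypothetical, and the ensemble is a deliberately simple bag of depth-1 trees (stumps) written in plain Python. Variable importance is taken, as an assumption, to be the fraction of trees that split on a given variable. The last helper only restates the counts quoted in the abstract as fractions.

```python
import random
from collections import Counter

def make_data(n, n_features, rng):
    """Synthetic stand-in for structural variables; only variable 0 carries signal."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = "active", 0 = "inactive" (hypothetical labels)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] = rng.gauss(3.0 if label else -3.0, 1.0)  # hypothetical informative variable
        X.append(row)
        y.append(label)
    return X, y

def fit_stump(X, y):
    """Depth-1 tree: choose the (variable, threshold, polarity) with the fewest errors."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            # Errors when predicting label 1 for x > t; the flipped rule has n - errors.
            errors = sum((row[f] > t) != bool(lab) for row, lab in zip(X, y))
            for polarity, err in ((1, errors), (0, len(y) - errors)):
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best[1:]

def predict_stump(stump, row):
    f, t, polarity = stump
    return int((row[f] > t) == bool(polarity))

def fit_ensemble(X, y, n_trees, rng):
    """Bagging: each stump is trained on a bootstrap resample of the structures."""
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict(stumps, row):
    """Majority vote over the ensemble."""
    votes = sum(predict_stump(s, row) for s in stumps)
    return int(2 * votes > len(stumps))

def variable_ranking(stumps):
    """Rank variables by the fraction of trees that split on them."""
    counts = Counter(f for f, _, _ in stumps)
    return [(f, c / len(stumps)) for f, c in counts.most_common()]

def fraction_correct(total, mislabeled):
    return (total - mislabeled) / total

rng = random.Random(0)
X, y = make_data(150, 8, rng)
stumps = fit_ensemble(X, y, n_trees=15, rng=rng)
accuracy = sum(predict(stumps, row) == lab for row, lab in zip(X, y)) / len(y)

print("training accuracy:", round(accuracy, 3))
print("variable ranking:", variable_ranking(stumps))
print("abstract, active/inactive:", round(fraction_correct(3289, 3), 5))
print("abstract, DFG label:", round(fraction_correct(8826, 17), 5))
```

Because every tree is a single threshold on a single variable, the ranking can be read directly as "which measurement, and which cutoff, separates the classes", which is the kind of interpretability the abstract emphasizes. A real application would use deeper trees, held-out test structures, and measured structural variables rather than random numbers.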

Updated: 2024-03-19