NetGO 3.0: Protein Language Model Improves Large-scale Functional Annotations
Genomics, Proteomics & Bioinformatics (IF 9.5), Pub Date: 2023-04-17, DOI: 10.1016/j.gpb.2023.04.001
Shaojun Wang 1 , Ronghui You 1 , Yunjia Liu 2 , Yi Xiong 3 , Shanfeng Zhu 4

NetGO 2.0, one of the state-of-the-art automated function prediction (AFP) methods, integrates multi-source information to improve prediction performance. However, it relies mainly on proteins with experimentally supported functional annotations and does not leverage the valuable information contained in the vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations [e.g., Evolutionary Scale Modeling (ESM)-1b embedding] from protein sequences through self-supervision. Here, we represented each protein by its ESM-1b embedding and used logistic regression (LR) to train a new model, LR-ESM, for AFP. The experimental results showed that LR-ESM achieved performance comparable to that of the best-performing component of NetGO 2.0. Therefore, by incorporating LR-ESM into NetGO 2.0, we developed NetGO 3.0 to improve the performance of AFP extensively. NetGO 3.0 is freely accessible at https://dmiip.sjtu.edu.cn/ng3.0.
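The LR-ESM idea described above (a fixed embedding per protein, then logistic regression per function label) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how embeddings are pooled, how the model is regularized, or how it is trained. The toy 4-dimensional vectors stand in for real ESM-1b embeddings, the two labels stand in for GO terms, and every function name here is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_one_vs_rest(X, Y, epochs=500, lr=0.5):
    """Fit one independent logistic regression per label (assumed setup).

    X: list of fixed-length embedding vectors, one per protein.
    Y: list of 0/1 label vectors, one per protein.
    Returns a list of (weights, bias) pairs, one per label.
    """
    n, d, k = len(X), len(X[0]), len(Y[0])
    models = []
    for j in range(k):
        w, b = [0.0] * d, 0.0
        for _ in range(epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for x, y in zip(X, Y):
                # Gradient of the log-loss for one example.
                err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y[j]
                for i in range(d):
                    grad_w[i] += err * x[i]
                grad_b += err
            # Full-batch gradient descent step.
            w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / n
        models.append((w, b))
    return models


def predict_scores(models, x):
    # One probability-like score per label for a single embedding.
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in models]


# Toy data: hypothetical stand-ins for protein embeddings, NOT real ESM-1b output.
random.seed(0)


def toy_embedding(center):
    return [c + random.gauss(0.0, 0.1) for c in center]


X = [toy_embedding([1.0, 0.0, 0.0, 0.0]) for _ in range(10)]
X += [toy_embedding([0.0, 1.0, 0.0, 0.0]) for _ in range(10)]
Y = [[1, 0]] * 10 + [[0, 1]] * 10  # two made-up function labels

models = train_one_vs_rest(X, Y)
scores = predict_scores(models, [1.0, 0.0, 0.0, 0.0])
```

Here `scores` holds one score per label for the query vector. In an ensemble such as the one the abstract describes, per-label scores like these would be one input among several component methods; how NetGO 3.0 combines them is not specified in this text.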




Updated: 2023-04-17