EACVP: An ESM-2 LM Framework Combined CNN and CBAM Attention to Predict Anti-coronavirus Peptides
Current Medicinal Chemistry ( IF 4.1 ) Pub Date : 2024-03-16 , DOI: 10.2174/0109298673287899240303164403
Shengli Zhang 1, 2 , Yuanyuan Jing 1 , Yunyun Liang 3

Background: The novel coronavirus pneumonia (COVID-19) outbreak that began in late 2019 killed millions of people worldwide. Coronaviruses such as SARS-CoV and SARS-CoV-2 cause diseases including severe acute respiratory syndrome (SARS) and COVID-19. Many peptides in the host defense system have antiviral activity, so building efficient models to identify anti-coronavirus peptides is a meaningful task.
Methods: We propose a new prediction model, EACVP. The model uses the evolutionary scale language model (ESM-2 LM) to characterize peptide sequence information. ESM-2 is a language model of the kind used in natural language processing, pre-trained on a highly diverse and dense dataset (UR50/D 2021_04); the pre-trained model yields 320-dimensional peptide sequence features. Compared with traditional feature extraction methods, the information represented by ESM-2 LM is more comprehensive and stable. The features are then fed into a convolutional neural network (CNN), and the lightweight convolutional block attention module (CBAM) applies attention to the CNN feature maps along the spatial and channel dimensions. To verify the rationality of the model structure, we performed ablation experiments on the benchmark and independent test datasets, and we compared EACVP with existing methods on the independent test dataset.
Results: Experimental results show that the ACC, F1-score, and MCC of EACVP are 3.95%, 35.65%, and 0.0725 higher, respectively, than those of the most advanced existing methods. We also tested EACVP on the ENNAVIA-C and ENNAVIA-D datasets, and the results show that EACVP transfers well and is a powerful tool for predicting anti-coronavirus peptides.
Conclusion: The results show that EACVP can fully characterize peptide information, achieves high prediction accuracy, and generalizes to different datasets.
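The attention step described in the Methods can be illustrated with a small sketch. This is not the authors' implementation: it is a dependency-free, one-dimensional adaptation of the general CBAM idea (channel attention followed by spatial attention), and the function names, the reduction ratio `r`, the kernel size `k`, and the random toy weights are all assumptions made for illustration.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def mlp(vec, w1, w2):
    """Shared two-layer MLP without bias: C -> C/r with ReLU, then back to C."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def channel_attention(x, w1, w2):
    """One weight in (0, 1) per channel, from average- and max-pooled descriptors."""
    avg = [sum(ch) / len(ch) for ch in x]
    mx = [max(ch) for ch in x]
    return [sigmoid(a + m) for a, m in zip(mlp(avg, w1, w2), mlp(mx, w1, w2))]

def spatial_attention(x, w_avg, w_max, bias=0.0):
    """One weight in (0, 1) per sequence position, via a zero-padded 1-D convolution."""
    length, k = len(x[0]), len(w_avg)
    pad = k // 2
    avg = [sum(ch[t] for ch in x) / len(x) for t in range(length)]
    mx = [max(ch[t] for ch in x) for t in range(length)]
    weights = []
    for t in range(length):
        s = bias
        for j in range(k):
            src = t + j - pad
            if 0 <= src < length:  # positions outside the sequence count as zero
                s += w_avg[j] * avg[src] + w_max[j] * mx[src]
        weights.append(sigmoid(s))
    return weights

def cbam_1d(x, w1, w2, w_avg, w_max):
    """Rescale a (channels x positions) feature map by channel, then by position."""
    ca = channel_attention(x, w1, w2)
    x = [[ca[c] * v for v in ch] for c, ch in enumerate(x)]
    sa = spatial_attention(x, w_avg, w_max)
    return [[sa[t] * v for t, v in enumerate(ch)] for ch in x]

# Toy feature map and weights (hypothetical sizes, not taken from the paper).
rng = random.Random(0)
C, T, r, k = 8, 12, 4, 7
x = [[rng.uniform(-1, 1) for _ in range(T)] for _ in range(C)]
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(C // r)]
w2 = [[rng.uniform(-0.5, 0.5) for _ in range(C // r)] for _ in range(C)]
w_avg = [rng.uniform(-0.5, 0.5) for _ in range(k)]
w_max = [rng.uniform(-0.5, 0.5) for _ in range(k)]
out = cbam_1d(x, w1, w2, w_avg, w_max)
```

The output has the same shape as the input, and every value is scaled by two factors in (0, 1), so attention here can only down-weight channels and positions relative to one another. In a trained model the weights would be learned jointly with the CNN rather than sampled at random.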
The data and code for the article are available at https://github.com/JYY625/EACVP.git.
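For readers unfamiliar with the three metrics reported in the Results (ACC, F1-score, and MCC), the sketch below computes them from a binary confusion matrix using their standard definitions. The helper name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F1-score, Matthews correlation coefficient)."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return acc, f1, mcc

# Made-up counts for illustration only.
acc, f1, mcc = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"ACC={acc:.4f}  F1={f1:.4f}  MCC={mcc:.4f}")
```

ACC and F1 lie in [0, 1] and are often quoted as percentages, whereas MCC lies in [-1, 1], which may be why the abstract reports the first two gains in percent and the MCC gain as a raw difference.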

Updated: 2024-03-16