WSHNN: A Weakly Supervised Hybrid Neural Network for the Identification of DNA-Protein Binding Sites
Current Computer-Aided Drug Design ( IF 1.7 ) Pub Date : 2024-02-13 , DOI: 10.2174/0115734099277249240129114123
Wenzheng Bao 1 , Baitong Chen 2 , Yue Zhang 3

Introduction: Transcription factors are vital biological components that control gene expression, and their primary biological function is to recognize DNA sequences. Continued research has shown that the specificity of DNA-protein binding plays a significant role in gene expression, regulation, and especially gene therapy. Convolutional Neural Networks (CNNs) have become increasingly popular for predicting DNA-protein-specific binding sites, but their prediction accuracy needs to be improved. Methods: We proposed WSHNN, a framework that combines Multi-Instance Learning (MIL) with a hybrid neural network. First, we used sliding windows to split each DNA sequence into multiple overlapping instances, so that each bag contains multiple instances. Then, the instances were encoded using K-mer encoding. Afterward, the scores of all instances in the same bag were calculated separately by a hybrid neural network. Finally, a fully connected network produced the final prediction for that bag. Results: The framework achieved 90.73% precision (Pre), 82.77% recall, 87.17% accuracy (Acc), an F1-score of 0.8657, and an MCC of 0.7462. In addition, we discussed the effect of K-mer encoding on performance. Compared with other state-of-the-art efforts, the model performs better when using sequence information. Conclusion: The experimental results indicate that Bi-directional Long Short-Term Memory (Bi-LSTM) can better capture long-range relationships in DNA sequences (the code and data are available at https://github.com/baowz12345/Weak_Super_Network).
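
The instance-construction steps named in the Methods (sliding windows, then K-mer encoding) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the window length, stride, and value of k are arbitrary example values, the function names `sliding_instances`, `kmer_one_hot`, and `encode_bag` are hypothetical, and one-hot encoding over k-mers is only one common way to realize a K-mer encoding.

```python
from itertools import product

ALPHABET = "ACGT"


def sliding_instances(seq, window, stride):
    """Split one sequence (a bag) into overlapping fixed-length instances."""
    if window > len(seq):
        return [seq]  # sequence shorter than the window: one instance
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def kmer_one_hot(instance, k):
    """Encode an instance as a list of one-hot rows, one row per k-mer.

    The result has len(instance) - k + 1 rows and 4**k columns.
    """
    # Map every possible k-mer to a column index (AA.., AC.., ... in order).
    index = {"".join(p): j for j, p in enumerate(product(ALPHABET, repeat=k))}
    rows = []
    for pos in range(len(instance) - k + 1):
        row = [0] * (4 ** k)
        j = index.get(instance[pos:pos + k])
        if j is not None:  # k-mers with ambiguous bases (e.g. N) stay all-zero
            row[j] = 1
        rows.append(row)
    return rows


def encode_bag(seq, window=8, stride=2, k=2):
    """Turn one DNA sequence into a bag of encoded instances (assumed defaults)."""
    instances = sliding_instances(seq.upper(), window, stride)
    return [kmer_one_hot(inst, k) for inst in instances]


if __name__ == "__main__":
    bag = encode_bag("ACGTACGTACGT")
    print(len(bag), len(bag[0]), len(bag[0][0]))
```

With these assumed defaults, a 12-base sequence yields a bag of 3 instances, each encoded as 7 rows of 16 columns. In a MIL setting, a model would then score each encoded instance and combine the per-instance scores into one prediction for the bag.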

Updated: 2024-02-13