iProm-Yeast: Prediction Tool for Yeast Promoters Based on ML Stacking
Current Bioinformatics (IF 4), Pub Date: 2023-12-01, DOI: 10.2174/0115748936256869231019113616
Muhammad Shujaat, Hilal Tayara, Sunggoo Yoo, Kil To Chong

Background and Objective: Gene promoters are DNA regulatory elements located near transcription start sites, and they play a crucial role in regulating gene transcription. Despite numerous approaches to promoter prediction, including alignment-, signal-, and content-based methods, accurately identifying promoters remains challenging because their sequences lack explicit features. Consequently, many machine learning and deep learning models for promoter identification have been proposed, but their predictions are still not sufficiently precise. Most recent investigations have concentrated on identifying sigma or plant promoters, while the accurate identification of Saccharomyces cerevisiae promoters remains underexplored. In this study, we introduced “iProm-Yeast”, a method for identifying yeast promoters. Using genome sequences from the eukaryotic yeast Saccharomyces cerevisiae, we investigate vector encoding and promoter classification. Additionally, we developed a more difficult negative set by drawing on promoter sequences rather than non-promoter regions of the genome. This negative reconstruction approach improves classification and reduces the number of false positive predictions.
Methods: To overcome the problems associated with promoter prediction, we investigate alternative vector encoding and feature extraction methods. These strategies are then combined with several machine learning algorithms and a 1-D convolutional neural network model. Our results show that pseudo-dinucleotide composition is the preferable feature encoding and that the machine learning stacking approach is well suited to accurate promoter classification. Furthermore, we provide a negative reconstruction method that uses promoter sequences rather than non-promoter regions, resulting in higher classification performance and fewer false positive predictions.
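The abstract names pseudo-dinucleotide composition (PseDNC) as the best-performing encoding but does not give its settings. The sketch below follows the commonly used PseDNC formulation: 16 normalized dinucleotide frequencies followed by λ sequence-order correlation factors θ_j (the mean squared property difference between dinucleotides j positions apart), with the correlation terms weighted by w and the whole vector normalized to sum to 1. The values of λ (`lam`), w, and the property set are assumptions, not iProm-Yeast's actual configuration, and `TOY_PROPS` is a hypothetical placeholder table rather than real physicochemical data.

```python
from itertools import product

# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ..., TT.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def pse_dnc(seq, props, lam=3, w=0.05):
    """Return a PseDNC vector of length 16 + lam for a DNA sequence.

    props maps every dinucleotide to a list of standardized property values
    supplied by the caller; lam (number of correlation tiers) and w (weight
    of the correlation terms) are free hyperparameters (assumed defaults).
    """
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    L = len(seq)
    if not 0 < lam < L - 1:
        raise ValueError("lam must satisfy 0 < lam < len(seq) - 1")

    pairs = [seq[i:i + 2] for i in range(L - 1)]
    # Normalized occurrence frequency of each dinucleotide.
    freqs = [pairs.count(d) / len(pairs) for d in DINUCS]

    def corr(a, b):
        # Mean squared difference between the property values of two dinucleotides.
        pa, pb = props[a], props[b]
        return sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)

    # theta_j: average correlation between dinucleotides starting j positions apart.
    thetas = []
    for j in range(1, lam + 1):
        n = L - j - 1
        thetas.append(sum(corr(pairs[i], pairs[i + j]) for i in range(n)) / n)

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# HYPOTHETICAL placeholder property table (one property per dinucleotide),
# for illustration only; these are not real standardized physicochemical values.
TOY_PROPS = {d: [i / 15.0] for i, d in enumerate(DINUCS)}

vec = pse_dnc("ATGCGTACGTTAGC", TOY_PROPS, lam=3, w=0.05)
print(len(vec))  # 19 = 16 dinucleotide terms + lam correlation terms
```

In practice each promoter and negative sequence would be mapped to such a vector before being passed to the classifiers; a real run would substitute a published table of standardized dinucleotide properties for `TOY_PROPS`.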
Results: Based on 5-fold cross-validation, the proposed predictor, iProm-Yeast, shows good potential for detecting Saccharomyces cerevisiae promoters, with an accuracy (Acc) of 86.27%, a sensitivity (Sn) of 82.29%, a specificity (Sp) of 89.47%, a Matthews correlation coefficient (MCC) of 0.72, and an area under the receiver operating characteristic curve (AUROC) of 0.98. We also performed a cross-species analysis to assess how well iProm-Yeast generalizes to other species.
Conclusion: iProm-Yeast is a robust method for accurately identifying Saccharomyces cerevisiae promoters. With advanced vector encoding techniques and a negative reconstruction approach, it achieves improved classification accuracy and reduces false positive predictions. In addition, it offers researchers a reliable and precise webserver for studying gene regulation in diverse organisms.
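The reported metrics follow the usual confusion-matrix definitions: Sn = TP/(TP+FN), Sp = TN/(TN+FP), and MCC = (TP·TN − FP·FN)/√((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal plain-Python sketch, assuming promoters are the positive class (label 1); `binary_metrics` and `auroc` are hypothetical helper names. The abstract does not say whether the 5-fold figures are averaged per fold or computed on pooled out-of-fold predictions, so either use of these helpers is an assumption.

```python
import math


def binary_metrics(y_true, y_pred):
    """Acc, Sn, Sp and MCC for labels in {0, 1} (1 = promoter)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])` has TP = TN = 2 and FP = FN = 1, giving Acc = Sn = Sp = 2/3 and MCC = 1/3. The pairwise AUROC is quadratic in the number of examples, which is fine for illustration; a rank-based computation scales better on full genome-derived datasets.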

Updated: 2023-12-01