Heuristic Analysis of Genomic Sequence Processing Models for High Efficiency Prediction: A Statistical Perspective
Current Genomics (IF 2.6), Pub Date: 2022-10-07, DOI: 10.2174/1389202923666220927105311
Aditi R Durge, Deepti D Shrimankar, Ankush D Sawarkar

Genome sequences encode a wide variety of characteristics, including species and sub-species type, genotype, diseases, growth indicators, and yield quality. To analyze these characteristics across different species, researchers have proposed various deep learning models, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Multilayer Perceptrons (MLPs), which vary in evaluation performance, area of application, and the species they process. Because the algorithmic implementations differ so widely, it is difficult for research programmers to select the best genome processing model for their application. To facilitate this selection, the paper reviews a wide variety of such models and compares them in terms of accuracy, area of application, computational complexity, processing delay, precision, and recall. The deep learning and machine learning models reviewed achieve different accuracies for different applications. For multiple genomic data, Repeated Incremental Pruning to Produce Error Reduction with Support Vector Machine (Ripper SVM) achieves an accuracy of 99.7%; for cancer genomic data, an accuracy of 99.27% is reported using the CNN Bayesian method; and for Covid genome analysis, Bidirectional Long Short-Term Memory with CNN (BiLSTM CNN) exhibits the highest accuracy, 99.95%. The precision and recall of the different models are reviewed in a similar way. Finally, the paper concludes with observations on the genomic processing models and recommends applications for their efficient use.

Updated: 2022-10-07