Splice site recognition - deciphering Exon-Intron transitions for genetic insights using Enhanced integrated Block-Level gated LSTM model
Gene (IF 3.5), Pub Date: 2024-04-03, DOI: 10.1016/j.gene.2024.148429
Mohemmed Sha , Mohamudha Parveen Rahamathulla

Bioinformatics is a contemporary interdisciplinary field focused on analyzing the growing number of available genome sequences. Gene variants are differences in DNA sequence among individuals within a population. Splice site recognition is a crucial step in gene expression: during splicing, introns are removed and the exons (coding sequences) of a gene are joined together to form mature messenger RNA (mRNA). Gene-disrupting variants are believed to be a primary cause of neurodevelopmental disorders such as autism spectrum disorder (ASD), a condition that affects individuals, families, and society and has been associated with variants in any one of roughly a hundred genes linked to these disorders. Missense variants, premature stop codons, and deletions alter both the quality and the quantity of the encoded proteins. Predicting exons and introns within genes poses major challenges, including sequencing errors, short reads, incomplete genes, and overlapping genes. Although many traditional techniques have been used to build exon prediction systems, the main difficulty lies in accurately determining exon lengths and classifying the location of splice sites on the DNA strand at exon-intron boundaries. The proposed approach therefore uses a deep learning algorithm to analyze large, complex genomic datasets. An M−LSTM model classifies spliced DNA strands into three classes: exon-intron boundary (EI, labeled 1), intron-exon boundary (IE, labeled 2), and neither (none, labeled 3). The M−LSTM can process long sequences from large datasets, and its gating allows information to be retained over long spans without being overwritten by the current input or output. This enables it to capture long-range dependencies and to mitigate the gradient problems that hamper the training of plain recurrent networks. To assess its efficacy, the proposed model is compared with Naïve Bayes and Random Forest classifiers.
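To make the three-class setup concrete, here is a minimal sketch of a plain single-layer LSTM classifier in pure Python. It is not the authors' M−LSTM, whose block-level gating is not described in this abstract. The sketch one-hot encodes a DNA window, passes it through the input, forget, and output gates, and maps the final hidden state to the three labels (EI = 1, IE = 2, none = 3). The weights are random and untrained, and the names `TinyLSTMClassifier` and `one_hot`, as well as the hidden size of 8, are hypothetical choices for illustration.

```python
import math
import random

# Class labels as described in the abstract: EI = 1, IE = 2, none = 3.
LABELS = {1: "EI", 2: "IE", 3: "none"}
NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """One-hot encode a DNA string; ambiguous bases (e.g. N) become all-zero vectors."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq.upper()]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM + softmax head with random, untrained weights.

    Illustrative only; this is a standard LSTM, NOT the paper's M-LSTM.
    """

    def __init__(self, input_size=4, hidden_size=8, num_classes=3, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        concat_size = input_size + hidden_size
        self.W = {g: mat(hidden_size, concat_size) for g in ("i", "f", "o", "g")}
        self.b = {g: [0.0] * hidden_size for g in ("i", "f", "o", "g")}
        self.b["f"] = [1.0] * hidden_size  # common heuristic: start with the forget gate open
        self.W_out = mat(num_classes, hidden_size)
        self.b_out = [0.0] * num_classes

    def step(self, x, h, c):
        """One LSTM time step: gates decide what to write into and read from the cell state c."""
        z = x + h  # list concatenation [x_t, h_{t-1}]
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in self.W}
        i = [sigmoid(v) for v in pre["i"]]      # input gate
        f = [sigmoid(v) for v in pre["f"]]      # forget gate
        o = [sigmoid(v) for v in pre["o"]]      # output gate
        g = [math.tanh(v) for v in pre["g"]]    # candidate cell values
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def predict_proba(self, seq):
        """Run the whole sequence and return softmax probabilities for [EI, IE, none]."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in one_hot(seq):
            h, c = self.step(x, h, c)
        logits = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, seq):
        """Return the class label 1, 2 or 3 (argmax index 0..2 shifted by one)."""
        probs = self.predict_proba(seq)
        return probs.index(max(probs)) + 1


if __name__ == "__main__":
    model = TinyLSTMClassifier()
    label = model.predict("CAGGTAAGTACGT")
    print(label, LABELS[label])  # arbitrary until the weights are trained
```

In practice the weights would be learned by backpropagation through time on labeled sequence windows; the sketch only shows how the forget and input gates let the cell state `c` carry information across many positions. The forget-gate bias of 1.0 is a common initialization heuristic, not something stated in the abstract.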
In addition, the performance of the proposed system is evaluated using metrics such as recall, F1-score, precision, and accuracy.
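As an illustration of these evaluation metrics, the following hedged sketch computes per-class precision, recall, and F1 for the three labels (EI = 1, IE = 2, none = 3), together with overall accuracy and macro-averaged F1. The function name `per_class_metrics` and the macro-averaging choice are assumptions; the abstract does not say how scores are averaged across classes.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, labels=(1, 2, 3)):
    """Hypothetical helper: per-class precision/recall/F1, accuracy, and macro F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    report = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = {"precision": prec, "recall": rec, "f1": f1}
    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    macro_f1 = sum(r["f1"] for r in report.values()) / len(labels)
    return report, accuracy, macro_f1


if __name__ == "__main__":
    report, acc, macro_f1 = per_class_metrics([1, 1, 2, 2, 3, 3], [1, 2, 2, 2, 3, 1])
    print(f"accuracy={acc:.3f} macro_f1={macro_f1:.3f}")
    for label, scores in report.items():
        print(label, scores)
```

For example, with true labels `[1, 1, 2, 2, 3, 3]` and predictions `[1, 2, 2, 2, 3, 1]`, four of six predictions are correct, so accuracy is 4/6 ≈ 0.667, and the IE class (label 2) gets precision 2/3, recall 1.0, and F1 0.8.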

Updated: 2024-04-03