A Comprehensive Review on Machine Learning Techniques for Protein Family Prediction, The Protein Journal

The Protein Journal (IF 3), Pub Date: 2024-03-01, DOI: 10.1007/s10930-024-10181-5
T. Idhaya, A. Suruliandi, S. P. Raja

Proteomics is a field dedicated to the analysis of proteins in cells, tissues, and organisms, aiming to gain insights into their structures, functions, and interactions. A crucial aspect within proteomics is protein family prediction, which involves identifying evolutionary relationships between proteins by examining similarities in their sequences or structures. This approach holds great potential for applications such as drug discovery and functional annotation of genomes. However, current methods for protein family prediction have certain limitations, including limited accuracy, high false positive rates, and challenges in handling large datasets. Some methods also rely on homologous sequences or protein structures, which introduce biases and restrict their applicability to specific protein families or structures. To overcome these limitations, researchers have turned to machine learning (ML) approaches that can identify connections between protein features and simplify complex high-dimensional datasets. This paper presents a comprehensive survey of articles that employ various ML techniques for predicting protein families. The primary objective is to explore and improve ML techniques specifically for protein family prediction, thus advancing future research in the field. Through qualitative and quantitative analyses of ML techniques, it is evident that multiple methods utilizing a range of classifiers have been applied for protein family prediction. However, there has been limited focus on developing novel classifiers for protein family classification, highlighting the urgent need for improved approaches in this area. By addressing these challenges, this research aims to enhance the accuracy and effectiveness of protein family prediction, ultimately facilitating advancements in proteomics and its diverse applications.

Updated: 2024-03-01
