Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition
Amino Acids (IF 3.5), Pub Date: 2024-03-09, DOI: 10.1007/s00726-023-03368-0
Suman Dutta, Rajkumar U. Zunjare, Anirban Sil, Dwijesh Chandra Mishra, Alka Arora, Nisrita Gain, Gulab Chand, Rashmi Chhabra, Vignesh Muthusamy, Firoz Hossain

The mutant matrilineal (mtl) gene, which encodes a protein with patatin-like phospholipase activity, is involved in in-vivo maternal haploid induction in maize. Doubling the chromosomes of haploids by colchicine treatment leads to complete fixation of inbreds in just one generation, compared to 6–7 generations of selfing. Knowledge of patatin-like proteins in other crops is therefore of great significance for in-vivo haploid induction. So far, no online tool is available that can classify unknown proteins as patatin-like. Here, we aimed to optimize a machine learning-based algorithm to predict the patatin-like phospholipase activity of unknown proteins. Four different kernels [radial basis function (RBF), sigmoid, polynomial, and linear] were used to build support vector machine (SVM) classifiers with six different sequence-based compositional features (AAC, DPC, GDPC, CTDC, CTDT, and GAAC). A total of 1170 protein sequences were analyzed: 585 patatin-like proteins from various monocots, dicots, and microbes, and 585 non-patatin-like proteins from different subspecies of Zea mays. The RBF and polynomial kernels were quite promising in the prediction of patatin-like proteins. Among the six sequence-based compositional features, di-peptide composition attained > 90% prediction accuracy with the RBF and polynomial kernels. Using mutual information, the dipeptides that contributed most to the prediction process were identified. The knowledge generated in this study can be utilized in other crops before any experiment is initiated. The developed SVM model opens a new paradigm for scientists working on in-vivo haploid induction in commercial crops. This is the first report of machine learning applied to the identification of proteins with patatin-like activity.
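
The di-peptide composition (DPC) feature named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name, the example sequence, and the choice to skip pairs containing non-standard residues are assumptions made here for illustration. The sketch turns a protein sequence into a 400-dimensional vector holding the relative frequency of each ordered amino-acid pair.

```python
from itertools import product

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector of a sequence.

    Assumption: overlapping pairs containing a non-standard residue
    (e.g. 'X') are skipped, and frequencies are normalised by the
    number of pairs actually counted.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or made only of non-standard residues.
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


# Hypothetical toy sequence, not taken from the study's dataset.
example = "MAGSL"
vector = dipeptide_composition(example)
```

Vectors of this kind, one per protein, would form the feature matrix passed to an SVM classifier with an RBF or polynomial kernel, as the abstract describes.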




Updated: 2024-03-11