Impact of multi-output and stacking methods on feed efficiency prediction from genotype using machine learning algorithms
Journal of Animal Breeding and Genetics (IF 2.6), Pub Date: 2023-07-05, DOI: 10.1111/jbg.12815
Mónica Mora 1,2, Pablo González 3, José Ramón Quevedo 3, Elena Montañés 3, Llibertat Tusell 2, Rob Bergsma 4, Miriam Piles 2

Feeding represents the largest economic cost in meat production; therefore, selection to improve traits related to feed efficiency is a goal in most livestock breeding programs. Residual feed intake (RFI), that is, the difference between the actual feed intake and the feed intake expected from the animal's requirements, has been used as the selection criterion to improve feed efficiency since it was proposed by Koch in 1963. In growing pigs, it is computed as the residual of the multiple regression of daily feed intake (DFI) on average daily gain (ADG), backfat thickness (BFT), and metabolic body weight (MW). Recently, prediction with single-output machine learning algorithms, using information from SNPs as predictor variables, has been proposed for genomic selection in growing pigs but, as in other species, the prediction quality achieved for RFI has generally been poor. However, it has been suggested that it could be improved through multi-output or stacking methods. To test this, four strategies were implemented to predict RFI. Two of them compute RFI indirectly from the predicted values of its components, obtained from (i) individual predictions (multiple single-output strategy) or (ii) simultaneous predictions (multi-output strategy). The other two predict RFI directly, using (iii) the individual predictions of its components as predictor variables jointly with the genotype (stacking strategy), or (iv) only the genotypes as predictors of RFI (single-output strategy). The single-output strategy was considered the benchmark. This research aimed to test whether strategies (i) to (iii) outperform the benchmark, using data recorded from 5828 growing pigs and 45,610 SNPs. For all strategies, two learning methods were fitted: random forest (RF) and support vector regression (SVR). A nested cross-validation (CV), with an outer 10-fold CV and an inner threefold CV for hyperparameter tuning, was implemented to test all strategies.
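The nested CV described above (outer 10-fold for testing, inner threefold for hyperparameter tuning) can be sketched as follows. This is only an illustration under stated assumptions: closed-form ridge regression is swapped in for RF and SVR to keep the sketch dependency-light, the inner loop selects the penalty by mean Spearman correlation (an assumed tuning criterion), the rank computation ignores ties, and the genotypes, trait, and all function names are hypothetical.

```python
import numpy as np


def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks (no tie handling)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


def ridge_fit(X, y, lam):
    """Closed-form ridge weights on centred data (stand-in for RF/SVR)."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - mu_y))
    return w, mu_y - mu_x @ w


def ridge_predict(model, X):
    w, b = model
    return X @ w + b


def nested_cv(X, y, lambdas, outer_k=10, inner_k=3, seed=0):
    """Return one test-set Spearman correlation per outer fold."""
    rng = np.random.default_rng(seed)
    outer_folds = np.array_split(rng.permutation(len(y)), outer_k)
    scores = []
    for test_idx in outer_folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        inner_folds = np.array_split(rng.permutation(train_idx), inner_k)
        # Inner loop: pick the penalty with the best mean validation score.
        inner_means = []
        for lam in lambdas:
            vals = []
            for val_idx in inner_folds:
                fit_idx = np.setdiff1d(train_idx, val_idx)
                model = ridge_fit(X[fit_idx], y[fit_idx], lam)
                vals.append(spearman(y[val_idx], ridge_predict(model, X[val_idx])))
            inner_means.append(np.mean(vals))
        best_lam = lambdas[int(np.argmax(inner_means))]
        # Outer loop: refit on the whole training part, score on the held-out fold.
        model = ridge_fit(X[train_idx], y[train_idx], best_lam)
        scores.append(spearman(y[test_idx], ridge_predict(model, X[test_idx])))
    return np.array(scores)


# Simulated (hypothetical) genotypes coded 0/1/2 and a simulated trait.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(300, 50)).astype(float)
effects = np.array([1.0, -1.0, 0.8, 0.5, -0.7])  # hypothetical SNP effects
y = X[:, :5] @ effects + rng.normal(0.0, 2.0, 300)

scores = nested_cv(X, y, lambdas=[1.0, 10.0, 100.0])
print(f"Spearman, mean (SD) over outer folds: {scores.mean():.2f} ({scores.std(ddof=1):.2f})")
```

The held-out outer fold is never seen during tuning, so the mean (SD) over the 10 outer scores is an estimate of prediction performance that is not biased by the hyperparameter search.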
This scheme was repeated using, as predictor variables, subsets of increasing size (from 200 to 3000) of the most informative SNPs identified with RF. The highest prediction performance was achieved with 1000 SNPs, although the stability of feature selection was poor (0.13 out of 1). For all SNP subsets, the benchmark showed the best prediction performance. Using RF as the learner and the 1000 most informative SNPs as predictors, the mean (SD) of the 10 values obtained in the test sets was 0.23 (0.04) for the Spearman correlation, 0.83 (0.04) for the zero–one loss, and 0.33 (0.03) for the rank distance loss. We conclude that information on the predicted components of RFI (DFI, ADG, MW, and BFT) does not improve the prediction quality of this trait relative to that obtained with the single-output strategy.
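The abstract defines RFI as the residual of a multiple regression of DFI on ADG, BFT, and MW. A minimal sketch of that computation, assuming ordinary least squares with an intercept and no further model terms, might look like this; the function name `residual_feed_intake` and all simulated phenotype values are hypothetical and not taken from the study.

```python
import numpy as np


def residual_feed_intake(dfi, adg, bft, mw):
    """Return RFI as the OLS residual of DFI regressed on ADG, BFT and MW.

    Assumption: a plain linear model with an intercept; the model used
    in the study may include additional terms.
    """
    dfi = np.asarray(dfi, dtype=float)
    # Design matrix: intercept column plus the three covariates.
    X = np.column_stack([np.ones_like(dfi), adg, bft, mw])
    # Least-squares fit of DFI on the covariates.
    beta, *_ = np.linalg.lstsq(X, dfi, rcond=None)
    # RFI = observed DFI minus the DFI expected from the covariates.
    return dfi - X @ beta


# Simulated (hypothetical) phenotypes for 200 animals.
rng = np.random.default_rng(0)
n = 200
adg = rng.normal(0.9, 0.1, n)    # average daily gain
bft = rng.normal(12.0, 2.0, n)   # backfat thickness
mw = rng.normal(30.0, 3.0, n)    # metabolic body weight
dfi = 0.5 + 1.2 * adg + 0.03 * bft + 0.02 * mw + rng.normal(0.0, 0.15, n)

rfi = residual_feed_intake(dfi, adg, bft, mw)
```

By construction, these residuals average zero and are uncorrelated with ADG, BFT, and MW within the sample, which is why RFI captures the variation in feed intake that is independent of growth, fatness, and body weight.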

Updated: 2023-07-05