Abstract
Best linear unbiased prediction (BLUP) is widely used in plant research to account for experimental variation. For phenotypic values, BLUP accuracy depends largely on properly controlled experimental repetition and on how variable components are specified in the model. Determining BLUP robustness therefore requires evaluating the contribution of each repetition. Here, we assessed the robustness of BLUP values for simulated and empirical phenotypic datasets, where the BLUP value served as the dependent variable and each experimental repetition served as an independent (feature) variable. Our technique combined machine learning and partial dependence. First, we compared the feature importance estimated with neural networks. Second, we compared the estimated average marginal effects of individual repetitions, calculated with a partial dependence analysis. We showed that the contributions of experimental repetitions are unequal in a phenotypic dataset, suggesting that the calculated BLUP value is likely to be influenced by some repetitions more than others, which can, for example, lead to a failure to detect simulated true positive associations. To resolve disproportionate sources, variable components in the BLUP model must be further specified.
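The two diagnostics described above can be sketched on simulated data. The following is a minimal illustration under stated assumptions, not the procedure used in the article: it replaces the neural network with the closed-form BLUP of a balanced one-way random-effects model (y_ij = mu + g_i + e_ij, variance components estimated by the method of moments) and uses that function directly as the prediction model. The noise levels, sample size, and all function names (`simulate`, `fit_blup`, `predict`, `permutation_importance`, `partial_dependence`) are hypothetical choices made for this example.

```python
import random
import statistics

def simulate(n_genotypes=200, noise_sds=(0.5, 1.0, 2.0), seed=1):
    """One row per genotype; each column is a repetition with its own noise SD (assumed values)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_genotypes):
        g = rng.gauss(0.0, 1.0)  # simulated true genotypic effect
        rows.append([g + rng.gauss(0.0, sd) for sd in noise_sds])
    return rows

def fit_blup(rows):
    """Return (grand mean, shrinkage factor) for the balanced model y_ij = mu + g_i + e_ij."""
    n, r = len(rows), len(rows[0])
    grand = statistics.fmean(v for row in rows for v in row)
    means = [statistics.fmean(row) for row in rows]
    ms_between = r * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((v - m) ** 2 for row, m in zip(rows, means) for v in row) / (n * (r - 1))
    var_g = max((ms_between - ms_within) / r, 0.0)  # method-of-moments estimate, floored at 0
    shrink = var_g / (var_g + ms_within / r) if var_g > 0 else 0.0
    return grand, shrink

def predict(row, grand, shrink):
    """BLUP of the genotype effect: the genotype mean shrunk toward the grand mean."""
    return shrink * (statistics.fmean(row) - grand)

def permutation_importance(rows, blups, grand, shrink, k, seed=0):
    """Mean squared change in the predicted BLUP after shuffling repetition k across genotypes."""
    rng = random.Random(seed)
    col = [row[k] for row in rows]
    rng.shuffle(col)
    total = 0.0
    for row, v, b in zip(rows, col, blups):
        permuted = row[:k] + [v] + row[k + 1:]
        total += (predict(permuted, grand, shrink) - b) ** 2
    return total / len(rows)

def partial_dependence(rows, grand, shrink, k, grid):
    """Average prediction when repetition k is fixed at each grid value for every genotype."""
    curve = []
    for v in grid:
        preds = [predict(row[:k] + [v] + row[k + 1:], grand, shrink) for row in rows]
        curve.append(statistics.fmean(preds))
    return curve

rows = simulate()
grand, shrink = fit_blup(rows)
blups = [predict(row, grand, shrink) for row in rows]
n_reps = len(rows[0])
importance = [permutation_importance(rows, blups, grand, shrink, k) for k in range(n_reps)]
grid = [-2.0, 0.0, 2.0]
pd_curves = [partial_dependence(rows, grand, shrink, k, grid) for k in range(n_reps)]

for k in range(n_reps):
    print(f"repetition {k + 1}: importance={importance[k]:.4f}, partial dependence={pd_curves[k]}")
```

In this simplified setting every repetition enters the BLUP with the same weight, so the partial dependence curves share one slope, while the permutation importance grows with the spread of a repetition. A noisier repetition therefore moves the BLUP values more, which is one simple way that unequal contributions can arise. A fitted model such as a neural network could be substituted for `predict` without changing the two diagnostic functions.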
Data availability
All data generated or analyzed during this study are included in this published article and its supplemental information.
Author information
Contributions
PB has contributed to the conception and design of the work; the acquisition, analysis, and interpretation of data; drafted the work; and approved the submitted version. TGL has contributed to the design of the work, the interpretation of data, substantively revised the work, and approved the submitted version. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Communicated by: Izabela Pawłowicz
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Key message
Feature importance and partial dependence can be used to evaluate contributions of individual experimental repetitions toward the robustness of best linear unbiased predictions (BLUPs) for phenotypic value estimated using a linear mixed model.
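For reference, the quantities named in the key message can be written in standard textbook notation. The symbols below are assumptions introduced for this sketch and are not taken from the article: $y$ is the vector of phenotypic observations, $u$ the random effects, $\hat f$ a model fitted to predict the BLUP value from the repetitions $x_1, \dots, x_p$, and $L$ a loss function.

```latex
% Linear mixed model and the BLUP of the random effects
y = X\beta + Zu + e, \qquad u \sim N(0, G), \quad e \sim N(0, R), \quad V = ZGZ^{\top} + R
\hat{u} = G Z^{\top} V^{-1} \bigl( y - X\hat{\beta} \bigr)

% Partial dependence of the fitted model on repetition k:
% fix x_k at a value and average over the observed values of the other repetitions
\widehat{\mathrm{PD}}_k(x_k) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\bigl( x_k, \, x_{i,-k} \bigr)

% Permutation feature importance of repetition k:
% increase in average loss when column k is replaced by a permuted copy \tilde{x}_{i,k}
\widehat{I}_k = \frac{1}{n} \sum_{i=1}^{n} L\bigl( y_i, \hat{f}(\tilde{x}_{i,k}, x_{i,-k}) \bigr)
              - \frac{1}{n} \sum_{i=1}^{n} L\bigl( y_i, \hat{f}(x_{i}) \bigr)
```

Here $x_{i,-k}$ denotes the values of all repetitions other than $k$ for observation $i$, and $y_i$ in the importance formula stands for the target of the fitted model, which in this setting is the BLUP value.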
About this article
Cite this article
Bhandari, P., Lee, T.G. Using machine learning and partial dependence to evaluate robustness of best linear unbiased prediction (BLUP) for phenotypic values. J Appl Genetics 65, 283–286 (2024). https://doi.org/10.1007/s13353-023-00815-2