Using machine learning and partial dependence to evaluate robustness of best linear unbiased prediction (BLUP) for phenotypic values

Bhandari, Prashant; Lee, Tong Geon

doi:10.1007/s13353-023-00815-2

Plant Genetics • Short Communication
Published: 03 January 2024

Journal of Applied Genetics, volume 65, pages 283–286 (2024)

Abstract

Best linear unbiased prediction (BLUP) is widely used in plant research to address experimental variation. For phenotypic values, BLUP accuracy depends largely on properly controlled experimental repetition and on how variable components are outlined in the model. Determining BLUP robustness therefore requires evaluating the contribution of each repetition. Here, we assessed the robustness of BLUP values for simulated or empirical phenotypic datasets, where the BLUP value served as the dependent variable and each experimental repetition served as an independent (feature) variable. Our technique incorporated machine learning and partial dependence. First, we compared feature importance estimated with neural networks. Second, we compared the estimated average marginal effects of individual repetitions, calculated with a partial dependence analysis. We showed that contributions of experimental repetitions are unequal in a phenotypic dataset, suggesting that the calculated BLUP value is likely to be influenced by some repetitions more than others, with consequences such as failing to detect simulated true positive associations. To resolve disproportionate sources, variable components in the BLUP model must be further outlined.
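The two diagnostics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: a fixed weighted sum (`WEIGHTS`, `predict`) stands in for a fitted neural network, the repetition data are randomly generated, and permutation importance is used as one common feature-importance measure. All names and values here are hypothetical.

```python
import random

def partial_dependence(predict, rows, feature, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def permutation_importance(predict, rows, targets, feature, rng):
    """Increase in mean squared error after shuffling one feature column."""
    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(data)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mse(shuffled) - mse(rows)

# Hypothetical stand-in for a fitted model: the target (playing the role of
# the BLUP value) leans on repetition 0 more than on repetitions 1 and 2.
WEIGHTS = [0.6, 0.3, 0.1]

def predict(row):
    return sum(w * x for w, x in zip(WEIGHTS, row))

rng = random.Random(0)
# 200 simulated genotypes, one phenotypic value per repetition (assumed data).
rows = [[rng.gauss(50, 10) for _ in WEIGHTS] for _ in range(200)]
targets = [predict(r) for r in rows]

grid = [30.0, 50.0, 70.0]
pd_curves = [partial_dependence(predict, rows, j, grid) for j in range(3)]
# Slope of each partial dependence curve: average marginal effect per unit.
pd_slopes = [(c[-1] - c[0]) / (grid[-1] - grid[0]) for c in pd_curves]
importances = [permutation_importance(predict, rows, targets, j, rng)
               for j in range(3)]
```

In this toy setting the partial dependence slopes recover the assumed weights and the importance ranking follows them, which is the kind of unequal contribution across repetitions that the abstract describes.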

Data availability

All data generated or analyzed during this study are included in this published article and its supplemental information.

Author information

Authors and Affiliations

Horticultural Sciences Department, University of Florida, Gainesville, FL, 32611, USA
Prashant Bhandari & Tong Geon Lee
Bayer, Chesterfield, MO, 63017, USA
Tong Geon Lee

Contributions

PB contributed to the conception and design of the work and to the acquisition, analysis, and interpretation of data, drafted the work, and approved the submitted version. TGL contributed to the design of the work and the interpretation of data, substantively revised the work, and approved the submitted version. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tong Geon Lee.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Communicated by: Izabela Pawłowicz

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Key message

Feature importance and partial dependence can be used to evaluate contributions of individual experimental repetitions toward the robustness of best linear unbiased predictions (BLUPs) for phenotypic value estimated using a linear mixed model.
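For orientation, the sketch below computes BLUPs in the simplest linear mixed model, a balanced random-intercept model with one value per genotype per repetition. It assumes the variance components are already known, which is not how they would be obtained in practice, and it is not the model fitted in the article. The function name, data, and variance values are hypothetical.

```python
def blup_random_intercepts(phenotypes, var_genotype, var_residual):
    """BLUPs of genotype effects for y_ij = mu + g_i + e_ij (balanced data).

    phenotypes maps each genotype to its values, one per repetition.
    Variance components are assumed known for this illustration.
    """
    all_values = [v for reps in phenotypes.values() for v in reps]
    grand_mean = sum(all_values) / len(all_values)  # estimate of mu
    blups = {}
    for genotype, reps in phenotypes.items():
        n = len(reps)
        # Shrinkage factor: the share of the genotype-mean variance that is
        # attributable to the genotype effect itself.
        shrinkage = var_genotype / (var_genotype + var_residual / n)
        blups[genotype] = shrinkage * (sum(reps) / n - grand_mean)
    return grand_mean, blups

# Hypothetical data: three genotypes measured in three repetitions.
phenotypes = {
    "G1": [10.0, 12.0, 14.0],
    "G2": [20.0, 22.0, 24.0],
    "G3": [30.0, 32.0, 34.0],
}
grand_mean, blups = blup_random_intercepts(
    phenotypes, var_genotype=3.0, var_residual=3.0
)
```

In this simplified model every repetition enters the genotype mean with the same weight, so the formula itself treats repetitions as contributing equally.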

Supplementary information

ESM 1 (PDF 354 kb)
ESM 2 (PDF 72 kb)
ESM 3 (PDF 92 kb)

About this article

Cite this article

Bhandari, P., Lee, T.G. Using machine learning and partial dependence to evaluate robustness of best linear unbiased prediction (BLUP) for phenotypic values. J Appl Genetics 65, 283–286 (2024). https://doi.org/10.1007/s13353-023-00815-2

Received: 18 April 2023
Revised: 17 November 2023
Accepted: 30 November 2023
Published: 03 January 2024
Issue Date: May 2024
DOI: https://doi.org/10.1007/s13353-023-00815-2
