Using machine learning and partial dependence to evaluate robustness of best linear unbiased prediction (BLUP) for phenotypic values

  • Plant Genetics • Short Communication
  • Journal of Applied Genetics

Abstract

Best linear unbiased prediction (BLUP) is widely used in plant research to address experimental variation. For phenotypic values, BLUP accuracy depends largely on properly controlled experimental repetition and on how variable components are specified in the model. Determining BLUP robustness therefore requires evaluating the contribution of each repetition. Here, we assessed the robustness of BLUP values for simulated or empirical phenotypic datasets, treating the BLUP value as the dependent variable and each experimental repetition as an independent (feature) variable. Our technique combined machine learning and partial dependence. First, we compared feature importance estimated with neural networks. Second, we compared the estimated average marginal effects of individual repetitions, calculated with a partial dependence analysis. We showed that experimental repetitions contribute unequally within a phenotypic dataset, suggesting that the calculated BLUP value is likely to be influenced by some repetitions more than others, with consequences such as failing to detect simulated true positive associations. To resolve disproportionate sources, variable components in the BLUP model must be further specified.
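The two diagnostics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how feature importance was computed, so permutation importance is swapped in as one common model-agnostic choice; a fixed weighted average (`WEIGHTS`, hypothetical) stands in for the fitted neural network; and the three repetitions are simulated toy values. Any fitted model's prediction function could be passed in place of `predict`.

```python
import random
from statistics import mean


def partial_dependence(predict, X, j, grid):
    """Average prediction when feature j is forced to each grid value."""
    curve = []
    for v in grid:
        preds = [predict(row[:j] + [v] + row[j + 1:]) for row in X]
        curve.append(mean(preds))
    return curve


def permutation_importance(predict, X, y, j, rng, n_repeats=20):
    """Mean increase in squared error after shuffling feature j."""
    def mse(rows):
        return mean((predict(r) - t) ** 2 for r, t in zip(rows, y))

    base = mse(X)
    increases = []
    for _ in range(n_repeats):
        col = [row[j] for row in X]
        rng.shuffle(col)
        shuffled = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
        increases.append(mse(shuffled) - base)
    return mean(increases)


# Toy data (assumption): 200 genotypes, 3 experimental repetitions each.
rng = random.Random(0)
X = [[rng.gauss(10, 2) for _ in range(3)] for _ in range(200)]

# Hypothetical stand-in for a fitted model: the BLUP-like target leans
# unequally on the three repetitions.
WEIGHTS = [0.6, 0.3, 0.1]


def predict(row):
    return sum(w * x for w, x in zip(WEIGHTS, row))


# Stand-in "BLUP" values, generated from the same rule for illustration.
y = [predict(row) for row in X]

grid = [8.0, 10.0, 12.0]
pd_curves = [partial_dependence(predict, X, j, grid) for j in range(3)]
importances = [permutation_importance(predict, X, y, j, rng) for j in range(3)]

for j in range(3):
    print(f"repetition {j + 1}: importance={importances[j]:.3f}, "
          f"partial dependence={[round(v, 2) for v in pd_curves[j]]}")
```

In this toy setting, a repetition with a steeper partial dependence curve and a larger importance score is one that the predicted value leans on more heavily, which is the kind of imbalance the abstract describes.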

Data availability

All data generated or analyzed during this study are included in this published article and its supplemental information.

Author information

Contributions

PB contributed to the conception and design of the work and to the acquisition, analysis, and interpretation of data; drafted the work; and approved the submitted version. TGL contributed to the design of the work and the interpretation of data, substantively revised the work, and approved the submitted version. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tong Geon Lee.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Communicated by: Izabela Pawłowicz

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Key message

Feature importance and partial dependence can be used to evaluate contributions of individual experimental repetitions toward the robustness of best linear unbiased predictions (BLUPs) for phenotypic value estimated using a linear mixed model.
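For orientation, the textbook form of a linear mixed model and its BLUP is given below. This is the generic formulation, not necessarily the exact model fitted in the study; all symbols are introduced here as assumptions: y is the vector of phenotypic observations, X and Z are design matrices for fixed effects β and random effects u, and G and R are the covariance matrices of u and the residual e.

```latex
\begin{aligned}
y &= X\beta + Zu + e, \qquad u \sim N(0, G),\quad e \sim N(0, R),\\
V &= ZGZ^{\top} + R,\\
\hat{\beta} &= \left(X^{\top}V^{-1}X\right)^{-1}X^{\top}V^{-1}y,\\
\hat{u} &= GZ^{\top}V^{-1}\left(y - X\hat{\beta}\right).
\end{aligned}
```

Because the predicted random effects depend on G and R, how the components are specified in the model directly shapes the resulting BLUP values.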

Supplementary information

ESM 1 (PDF 354 kb)

ESM 2 (PDF 72 kb)

ESM 3 (PDF 92 kb)

Cite this article

Bhandari, P., Lee, T.G. Using machine learning and partial dependence to evaluate robustness of best linear unbiased prediction (BLUP) for phenotypic values. J Appl Genetics 65, 283–286 (2024). https://doi.org/10.1007/s13353-023-00815-2

  • DOI: https://doi.org/10.1007/s13353-023-00815-2
