Estimating genomic relationships of metafounders across and within breeds using maximum likelihood, pseudo-expectation–maximization maximum likelihood and increase of relationships, Genetics Selection Evolution


Genetics Selection Evolution (IF 4.1), Pub Date: 2024-05-02, DOI: 10.1186/s12711-024-00892-9
Andres Legarra, Matias Bermann, Quanshun Mei, Ole F. Christensen

The theory of “metafounders” proposes a unified framework for relationships across base populations within breeds (e.g. unknown parent groups) and across breeds (crosses), together with a sensible compatibility with genomic relationships. Considering metafounders might be advantageous in pedigree best linear unbiased prediction (BLUP) or single-step genomic BLUP. Existing methods to estimate the relationships across metafounders, $\boldsymbol{\Gamma}$, are not well adapted to highly unbalanced data, to genotyped individuals far from the base populations, or to many unknown parent groups (within breed per year of birth).

We derive likelihood methods to estimate $\boldsymbol{\Gamma}$. For a single metafounder, summary statistics of pedigree and genomic relationships allow deriving a cubic equation whose real root is the maximum likelihood (ML) estimate of $\boldsymbol{\Gamma}$. This equation is tested with Lacaune sheep data. For several metafounders, we split the first derivative of the complete likelihood into a first term related to $\boldsymbol{\Gamma}$ and a second term related to Mendelian sampling variances. Approximating the first derivative by its first term results in a pseudo-EM algorithm that iteratively updates the estimate of $\boldsymbol{\Gamma}$ by the corresponding block of the H-matrix. The method extends to complex situations with groups defined by year of birth, modelling the increase of $\boldsymbol{\Gamma}$ using estimates of the rate of increase of inbreeding ($\Delta F$), which results in an expanded $\boldsymbol{\Gamma}$ and in a pseudo-EM+$\Delta F$ algorithm. We compare these methods with the generalized least squares (GLS) method using simulated data: complex crosses of two breeds in equal or unsymmetrical proportions, and two breeds with 10 groups per year of birth within breed. We simulate genotyping either in all generations or only in the last ones.
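The pseudo-EM update described above (replace the current $\boldsymbol{\Gamma}$ by the metafounder block of the H-matrix, then repeat) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `a_gamma`, `pseudo_em_step` and `pseudo_em`, the pedigree encoding, and the stopping rule are all hypothetical. The relationship matrix is built with the usual tabular rules with metafounders placed first, the H-matrix block is taken from the standard single-step formula, and the genomic matrix `G` is assumed to be already on a scale compatible with metafounders.

```python
import numpy as np

def a_gamma(pedigree, gamma):
    """Tabular relationship matrix including metafounders (illustrative).

    pedigree: list of (sire, dam) positions, one pair per individual, in
    birth order. Positions 0..k-1 are the k metafounders; individual i of
    the list sits at position k + i. Unknown parents point to a metafounder.
    """
    k = gamma.shape[0]
    n = k + len(pedigree)
    A = np.zeros((n, n))
    A[:k, :k] = gamma                      # metafounder block is Gamma itself
    for i, (s, d) in enumerate(pedigree, start=k):
        A[i, :i] = 0.5 * (A[s, :i] + A[d, :i])  # average of the parents' rows
        A[:i, i] = A[i, :i]
        A[i, i] = 1.0 + 0.5 * A[s, d]           # self-relationship
    return A

def pseudo_em_step(pedigree, gamma, G, genotyped):
    """One update: metafounder block of H given the current Gamma.

    genotyped: indices into `pedigree` (0-based) of genotyped individuals,
    in the same order as the rows and columns of G.
    """
    k = gamma.shape[0]
    A = a_gamma(pedigree, gamma)
    g = k + np.asarray(genotyped)
    A_gg = A[np.ix_(g, g)]
    A_mg = A[:k, g]
    B = np.linalg.solve(A_gg, A_mg.T).T    # A_mg @ inv(A_gg), A_gg symmetric
    return gamma + B @ (G - A_gg) @ B.T

def pseudo_em(pedigree, G, genotyped, gamma0, n_iter=100, tol=1e-8):
    """Iterate the update until Gamma stops changing (assumed stopping rule)."""
    gamma = np.array(gamma0, dtype=float)
    for _ in range(n_iter):
        new = pseudo_em_step(pedigree, gamma, G, genotyped)
        if np.max(np.abs(new - gamma)) < tol:
            return new
        gamma = new
    return gamma
```

One property that is easy to check is that the update leaves $\boldsymbol{\Gamma}$ unchanged whenever `G` equals the pedigree relationships of the genotyped individuals built from that same $\boldsymbol{\Gamma}$, because the correction term `G - A_gg` is then zero. The sketch forms dense matrices and is only meant for toy pedigrees.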
For a single metafounder, the ML estimate obtained for the Lacaune data corresponded to the maximum of the likelihood. For simulated data, when genotypes were spread across all generations, both the GLS and the pseudo-EM(+$\Delta F$) methods were accurate. With genotypes available only in the most recent generations, the GLS method was biased, whereas the pseudo-EM(+$\Delta F$) approach yielded more accurate and unbiased estimates.

We derived ML, pseudo-EM and pseudo-EM+$\Delta F$ methods to estimate $\boldsymbol{\Gamma}$ in many realistic settings. Estimates are accurate in real and simulated data and have a low computational cost.
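The idea of letting $\boldsymbol{\Gamma}$ grow over time with $\Delta F$ can be illustrated with the textbook recursion for inbreeding, $F_t = 1 - (1 - F_0)(1 - \Delta F)^t$. The sketch below assumes that half the self-relationship of a metafounder, $\gamma/2$, plays the role of the inbreeding coefficient, which follows from the tabular rule $a_{ii} = 1 + \gamma/2$ for an individual with both parents in that metafounder. The function name `gamma_by_generation` is hypothetical, only the within-breed self-relationships are computed, and the exact parameterization of the expanded $\boldsymbol{\Gamma}$ in the paper (including its off-diagonal elements) may differ.

```python
import numpy as np

def gamma_by_generation(gamma0, delta_f, n_generations):
    """Self-relationship of time-defined groups of one breed (illustrative).

    gamma0: self-relationship of the oldest group (between 0 and 2).
    delta_f: rate of increase of inbreeding per generation.
    Returns an array with one value per generation, starting at gamma0.
    """
    t = np.arange(n_generations)
    f0 = gamma0 / 2.0                       # assumed inbreeding equivalent
    f_t = 1.0 - (1.0 - f0) * (1.0 - delta_f) ** t
    return 2.0 * f_t                        # back to the 0..2 scale of gamma

# Toy values chosen only for illustration
gammas = gamma_by_generation(gamma0=0.5, delta_f=0.01, n_generations=10)
```

Under this assumption each generation adds $\Delta F\,(2 - \gamma_t)$ to the previous value, so the sequence increases monotonically and never exceeds 2.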


Updated: 2024-05-02
