Penalized Logistic Regression Analysis for Genetic Association Studies of Binary Phenotypes
Human Heredity (IF 1.8), Pub Date: 2022-06-29, DOI: 10.1159/000525650
Ying Yu, Siyuan Chen, Samantha Jean Jones, Rawnak Hoque, Olga Vishnyakova, Angela Brooks-Wilson, Brad McNeney

Introduction: Increasingly, logistic regression methods for genetic association studies of binary phenotypes must be able to accommodate data sparsity, which arises from unbalanced case-control ratios and/or rare genetic variants. Sparseness leads to maximum likelihood estimators (MLEs) of log odds ratio (log-OR) parameters that are biased away from their null value of zero, and to tests with inflated type 1 error rates. Different penalized-likelihood methods have been developed to mitigate sparse-data bias. We study penalized logistic regression using a class of log-F priors indexed by a shrinkage parameter m to shrink the biased MLE towards zero. For a given m, log-F-penalized logistic regression may be easily implemented using data augmentation and standard software.
Method: We propose a two-step approach to the analysis of a genetic association study: first, a set of variants that show evidence of association with the trait is used to estimate m; and second, the estimated m is used for log-F-penalized logistic regression analyses of all variants using data augmentation with standard software. Our estimate of m is the maximizer of a marginal likelihood obtained by integrating the latent log-ORs out of the joint distribution of the parameters and observed data. We consider two approximate approaches to maximizing the marginal likelihood: (i) a Monte Carlo EM algorithm (MCEM) and (ii) a Laplace approximation (LA) to each integral, followed by derivative-free optimization of the approximation.
Results: We evaluate the statistical properties of our proposed two-step method and compare its performance to that of other shrinkage methods in a simulation study. Our simulation studies suggest that the proposed log-F-penalized approach has lower bias and mean squared error than the other methods considered. We also illustrate the approach on data from a study of genetic associations with "super senior" cases and middle-aged controls.
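For orientation, the quantities named in the Method paragraph can be written out as follows. This is a sketch under stated assumptions rather than the paper's own notation: the symbols \beta (log-OR of a variant), \alpha (nuisance parameters such as the intercept), \ell (log-likelihood), L_k (likelihood contribution of variant k) and K (number of variants used in step one) are introduced here for illustration. The density is the standard log-F(m, m) form, and the handling of nuisance parameters inside each integral is deliberately left out.

```latex
% Standard log-F(m, m) density for a log-OR \beta (B is the beta function)
f(\beta \mid m) = \frac{1}{B(m/2,\, m/2)} \,
                  \frac{\exp(m\beta/2)}{\{1 + \exp(\beta)\}^{m}}

% Penalized log-likelihood for one variant, up to a constant in \beta
\ell_{\mathrm{pen}}(\alpha, \beta)
  = \ell(\alpha, \beta) + \frac{m}{2}\,\beta - m \log\{1 + \exp(\beta)\}

% Marginal likelihood of m over K variants (assumed product form,
% one integral per variant, nuisance parameters omitted)
L(m) = \prod_{k=1}^{K} \int L_k(\beta_k)\, f(\beta_k \mid m)\, d\beta_k
```

Larger m concentrates the prior more tightly around zero, so it shrinks the estimated log-OR more strongly.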
Discussion/Conclusion: We have proposed a method for single rare variant analysis with binary phenotypes by logistic regression penalized by log-F priors. Our method has the advantage of being easily extended to correct for confounding due to population structure and genetic relatedness through a data augmentation approach.
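The data augmentation idea can be illustrated with a minimal sketch. It assumes the standard representation of a log-F(m, m) prior as two pseudo-records with genotype 1 and intercept 0, one "case" and one "control", each weighted m/2; this construction is not quoted from the abstract. The function names `fit_logistic` and `logf_augment`, the toy 2x2 counts, and the fixed value m = 2 are hypothetical. In the proposed method m would be estimated from the data in a first step.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, n_iter=100, tol=1e-12):
    """Newton-Raphson for weighted logistic regression with two coefficients.

    Each row is (x0, x1, y, w): intercept column, genotype column,
    binary outcome, and record weight. Returns (alpha, beta).
    """
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x0, x1, y, w in rows:
            p = expit(a * x0 + b * x1)
            r = w * (y - p)          # weighted residual (score term)
            v = w * p * (1.0 - p)    # weighted variance (information term)
            g0 += r * x0
            g1 += r * x1
            h00 += v * x0 * x0
            h01 += v * x0 * x1
            h11 += v * x1 * x1
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b

def logf_augment(rows, m):
    """Append two pseudo-records that encode a log-F(m, m) prior on beta.

    Assumption: intercept column 0, genotype column 1, one success and
    one failure, each with weight m / 2.
    """
    return rows + [(0.0, 1.0, 1.0, m / 2.0), (0.0, 1.0, 0.0, m / 2.0)]

# Hypothetical sparse 2x2 table, stored as grouped records with counts as weights:
# non-carriers: 10 cases, 40 controls; carriers: 3 cases, 1 control.
rows = [
    (1.0, 0.0, 1.0, 10.0),
    (1.0, 0.0, 0.0, 40.0),
    (1.0, 1.0, 1.0, 3.0),
    (1.0, 1.0, 0.0, 1.0),
]

m = 2.0  # illustrative shrinkage parameter, not an estimate
alpha_mle, beta_mle = fit_logistic(rows)
alpha_pen, beta_pen = fit_logistic(logf_augment(rows, m))

print("MLE log-OR:           ", round(beta_mle, 3))
print("log-F penalized log-OR:", round(beta_pen, 3))
```

Because the pseudo-records are ordinary weighted observations, the same augmentation could be passed to any standard logistic regression routine that accepts record weights. The hand-written Newton solver is used here only to keep the sketch dependency-free.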


Updated: 2022-06-29