

Penalized Logistic Regression Analysis for Genetic Association Studies of Binary Phenotypes
Human Heredity (IF 1.8) Pub Date: 2022-06-29

Introduction: Increasingly, logistic regression methods for genetic association studies of binary phenotypes must be able to accommodate data sparsity, which arises from unbalanced case-control ratios and/or rare genetic variants. Sparseness leads to maximum likelihood estimators (MLEs) of log-OR parameters that are biased away from their null value of zero and tests with inflated type I errors. Different penalized likelihood methods have been developed to mitigate sparse data bias. We study penalized logistic regression using a class of log-F priors indexed by a shrinkage parameter m to shrink the biased MLE toward zero.

Methods: We proposed a two-step approach to the analysis of a genetic association study: first, a set of variants that show evidence of association with the trait is used to estimate m; second, the estimated m is used for log-F-penalized logistic regression analyses of all variants using data augmentation with standard software. Our estimate of m is the maximizer of a marginal likelihood obtained by integrating the latent log-ORs out of the joint distribution of the parameters and observed data. We consider two approximate approaches to maximizing the marginal likelihood: (i) a Monte Carlo EM algorithm and (ii) a Laplace approximation to each integral, followed by derivative-free optimization of the approximation.

Results: We evaluated the statistical properties of our proposed two-step method and compared its performance to other shrinkage methods by a simulation study. Our simulation studies suggest that the proposed log-F-penalized approach has lower bias and mean squared error than other methods considered. We also illustrated the approach on data from a study of genetic associations with “Super Senior” cases and middle-aged controls.

Discussion/Conclusion: We have proposed a method for single rare variant analysis with binary phenotypes by logistic regression penalized by log-F priors. Our method has the advantage of being easily extended to correct for confounding due to population structure and genetic relatedness through a data augmentation approach.
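The second step of the Methods (log-F-penalized logistic regression via data augmentation) can be illustrated with a minimal sketch. It assumes the commonly used construction in which a log-F(m, m) prior on a single log-OR is equivalent to appending two pseudo-records, one "case" and one "control", each with weight m/2, whose covariate for that coefficient is 1 and whose other covariates (including the intercept) are 0. The function names, the toy counts, and the fixed value m = 2 are illustrative assumptions and do not come from the paper, which estimates m from the data. A small hand-written Newton-Raphson solver stands in for the "standard software" mentioned in the abstract.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression by Newton-Raphson; returns the coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))          # fitted probabilities
        grad = X.T @ (w * (y - p))                      # weighted score vector
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None]) # weighted information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def logf_augment(X, y, m, penalized_cols):
    """Append pseudo-records encoding a log-F(m, m) prior on each listed column."""
    n, k = X.shape
    rows, ys, ws = [X], [y.astype(float)], [np.ones(n)]
    for j in penalized_cols:
        pseudo = np.zeros((2, k))   # intercept and all other covariates are 0
        pseudo[:, j] = 1.0          # only the penalized coefficient is "switched on"
        rows.append(pseudo)
        ys.append(np.array([1.0, 0.0]))         # one pseudo-case, one pseudo-control
        ws.append(np.array([m / 2.0, m / 2.0])) # each carries weight m/2
    return np.vstack(rows), np.concatenate(ys), np.concatenate(ws)

# Toy sparse data (hypothetical counts): 4 carriers of a rare variant, 90 non-carriers.
x = np.r_[np.zeros(90), np.ones(4)]
y = np.r_[np.ones(10), np.zeros(80), np.ones(3), np.zeros(1)]
X = np.column_stack([np.ones(94), x])           # column 0: intercept, column 1: variant

beta_mle = fit_weighted_logistic(X, y, np.ones(94))

m = 2.0                                         # illustrative shrinkage parameter
X_aug, y_aug, w_aug = logf_augment(X, y, m, penalized_cols=[1])
beta_pen = fit_weighted_logistic(X_aug, y_aug, w_aug)

print("MLE log-OR:      ", beta_mle[1])         # equals log(24) for these counts
print("Penalized log-OR:", beta_pen[1])         # lies between 0 and the MLE
```

The intercept is left unpenalized here because the pseudo-records set its column to 0. Because the augmented data are fitted as an ordinary weighted logistic regression, the same fit could in principle be done with any software that accepts observation weights.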
Hum Hered 2022;87:69–86


Updated: 2022-06-29
