-
Assessing the feasibility of statistical inference using synthetic antibody-antigen datasets Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2024-04-02 Thomas Minotto, Philippe A. Robert, Ingrid Hobæk Haff, Geir K. Sandve
Simulation frameworks are useful to stress-test predictive models when data is scarce, or to assert model sensitivity to specific data distributions. Such frameworks often need to recapitulate several layers of data complexity, including emergent properties that arise implicitly from the interaction between simulation components. Antibody-antigen binding is a complex mechanism by which an antibody
-
A global test of hybrid ancestry from genome-scale data Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2024-02-17 Md Rejuan Haque, Laura Kubatko
Methods based on the multi-species coalescent have been widely used in phylogenetic tree estimation using genome-scale DNA sequence data to understand the underlying evolutionary relationship between the sampled species. Evolutionary processes such as hybridization, which creates new species through interbreeding between two different species, necessitate inferring a species network instead of a species
-
Integrative pathway analysis with gene expression, miRNA, methylation and copy number variation for breast cancer subtypes Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2024-02-16 Henry Linder, Yuping Zhang, Yunqi Wang, Zhengqing Ouyang
Developments in biotechnologies enable multi-platform data collection for functional genomic units apart from the gene. Profiling of non-coding microRNAs (miRNAs) is a valuable tool for understanding the molecular profile of the cell, both for canonical functions and malignant behavior due to complex diseases. We propose a graphical mixed-effects statistical model incorporating miRNA-gene target relationships
-
Bayesian LASSO for population stratification correction in rare haplotype association studies Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2024-01-18 Zilu Liu, Asuman Seda Turkmen, Shili Lin
Population stratification (PS) is one major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and linear mixed model (LMM) are the current standards for SNP associations, which are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and
-
When is the allele-sharing dissimilarity between two populations exceeded by the allele-sharing dissimilarity of a population with itself? Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-12-11 Xiran Liu, Zarif Ahsan, Tarun K. Martheswaran, Noah A. Rosenberg
Allele-sharing statistics for a genetic locus measure the dissimilarity between two populations as a mean of the dissimilarity between random pairs of individuals, one from each population. Owing to within-population variation in genotype, allele-sharing dissimilarities can have the property that they have a nonzero value when computed between a population and itself. We consider the mathematical properties
-
Mediation analysis method review of high throughput data Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-11-28 Qiang Han, Yu Wang, Na Sun, Jiadong Chu, Wei Hu, Yueping Shen
High-throughput technologies have made high-dimensional settings increasingly common, providing opportunities for the development of high-dimensional mediation methods. We aimed to provide useful guidance for researchers using high-dimensional mediation analysis and ideas for biostatisticians to develop it by summarizing and discussing recent advances in high-dimensional mediation analysis. The method
-
Patterns of differential expression by association in omic data using a new measure based on ensemble learning Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-11-22 Jorge M. Arevalillo, Raquel Martin-Arevalillo
The ongoing development of high-throughput technologies is allowing the simultaneous monitoring of the expression levels for hundreds or thousands of biological inputs with the proliferation of what has been coined as omic data sources. One relevant issue when analyzing such data sources is concerned with the detection of differential expression across two experimental conditions, clinical status or
-
Integrated regulatory and metabolic networks of the tumor microenvironment for therapeutic target prioritization Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-11-21 Tiange Shi, Han Yu, Rachael Hageman Blair
Translation of genomic discovery, such as single-cell sequencing data, to clinical decisions remains a longstanding bottleneck in the field. Meanwhile, computational systems biological models, such as cellular metabolism models and cell signaling pathways, have emerged as powerful approaches to provide efficient predictions in metabolites and gene expression levels, respectively. However, there has
-
Randomized singular value decomposition for integrative subtype analysis of ‘omics data’ using non-negative matrix factorization Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-11-08 Yonghui Ni, Jianghua He, Prabhakar Chalise
Integration of multiple ‘omics datasets for differentiating cancer subtypes is a powerful technique that leverages the consistent and complementary information across multi-omics data. Matrix factorization is a common technique used in integrative clustering for identifying latent subtype structure across multi-omics data. High dimensionality of the omics data and long computation time have been common
-
A novel hybrid CNN and BiGRU-Attention based deep learning model for protein function prediction Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-09-02 Lavkush Sharma, Akshay Deepak, Ashish Ranjan, Gopalakrishnan Krishnasamy
Proteins are the building blocks of all living things. Protein function must be ascertained if the molecular mechanism of life is to be understood. While CNN is good at capturing short-term relationships, GRU and LSTM can capture long-term dependencies. A hybrid approach that combines the complementary benefits of these deep-learning models motivates our work. Protein Language models, which use attention
-
Accurate and fast small p-value estimation for permutation tests in high-throughput genomic data analysis with the cross-entropy method Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-08-25 Yang Shi, Weiping Shi, Mengqiao Wang, Ji-Hyun Lee, Huining Kang, Hui Jiang
Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge in the application of permutation tests in genomic studies is that an enormous number of permutations are often needed to obtain reliable estimates of very small
-
CAT PETR: a graphical user interface for differential analysis of phosphorylation and expression data Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-08-18 Keegan Flanagan, Steven Pelech, Yossef Av-Gay, Khanh Dao Duc
Antibody microarray data provides a powerful and high-throughput tool to monitor global changes in cellular response to perturbation or genetic manipulation. However, while collecting such data has become increasingly accessible, a lack of specific computational tools has made their analysis limited. Here we present CAT PETR, a user friendly web application for the differential analysis of expression
-
Improving the accuracy and internal consistency of regression-based clustering of high-dimensional datasets Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-07-25 Bo Zhang, Jianghua He, Jinxiang Hu, Prabhakar Chalise, Devin C. Koestler
Component-wise Sparse Mixture Regression (CSMR) is a recently proposed regression-based clustering method that shows promise in detecting heterogeneous relationships between molecular markers and a continuous phenotype of interest. However, CSMR can yield inconsistent results when applied to high-dimensional molecular data, which we hypothesize is in part due to inherent limitations associated with
-
A Bayesian model to identify multiple expression patterns with simultaneous FDR control for a multi-factor RNA-seq experiment Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-04-21 Yuanyuan Bian, Chong He, Jing Qiu
It is often of research interest to identify genes that satisfy a particular expression pattern across different conditions such as tissues, genotypes, etc. One common practice is to perform differential expression analysis for each condition separately and then take the intersection of differentially expressed (DE) genes or non-DE genes under each condition to obtain genes that satisfy a particular
-
A fast and efficient approach for gene-based association studies of ordinal phenotypes Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2023-02-01 Nanxing Li, Lili Chen, Yajing Zhou, Qianran Wei
Many human disease conditions need to be measured by ordinal phenotypes, so analysis of ordinal phenotypes is valuable in genome-wide association studies (GWAS). However, existing association methods for dichotomous or quantitative phenotypes are not appropriate to ordinal phenotypes. Therefore, based on an aggregated Cauchy association test, we propose a fast and efficient association method to test
-
pwrBRIDGE: a user-friendly web application for power and sample size estimation in batch-confounded microarray studies with dependent samples Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2022-10-10 Qing Xia, Jeffrey A. Thompson, Devin C. Koestler
Batch effect Reduction of mIcroarray data with Dependent samples usinG Empirical Bayes (BRIDGE) is a recently developed statistical method to address the issue of batch effect correction in batch-confounded microarray studies with dependent samples. The key component of the BRIDGE methodology is the use of samples run as technical replicates in two or more batches, “bridging samples”, to inform batch
-
Distinct characteristics of correlation analysis at the single-cell and the population level Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2022-08-02 Guoyu Wu, Yuchao Li
Correlation analysis is widely used in biological studies to infer molecular relationships within biological networks. Recently, single-cell analysis has drawn tremendous interest for its ability to obtain high-resolution molecular phenotypes. It turns out that there is little overlap of co-expressed genes identified in single-cell level investigations with that of population level investigations
-
Use of SVM-based ensemble feature selection method for gene expression data analysis Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2022-07-13 Shizhi Zhang, Mingjin Zhang
Gene selection is one of the key steps for gene expression data analysis. An SVM-based ensemble feature selection method is proposed in this paper. Firstly, the method builds many subsets by using Monte Carlo sampling. Secondly, ranking all the features on each of the subsets and integrating them to obtain a final ranking list. Finally, the optimum feature set is determined by a backward feature elimination
-
A robust association test with multiple genetic variants and covariates Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2022-06-03 Jen-Yu Lee, Pao-Sheng Shen, Kuang-Fu Cheng
Due to the advancement of genome sequencing techniques, a great stride has been made in exome sequencing such that the association study between disease and genetic variants has become feasible. Some powerful and well-known association tests have been proposed to test the association between a group of genes and the disease of interest. However, some challenges still remain, in particular, many factors
-
Estimation of the covariance structure from SNP allele frequencies Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2022-05-26 Jan van Waaij, Zilong Li, Carsten Wiuf
We propose two new statistics, $\hat{V}$ and $\hat{S}$, to disentangle the population history of related populations from SNP frequency data. If the populations are related by a tree, we show by theoretical means as well as by simulation that the new statistics are able to identify the root of a tree correctly, in contrast to standard statistics, such as the observed matrix of $F_2$-statistics
-
GMEPS: a fast and efficient likelihood approach for genome-wide mediation analysis under extreme phenotype sequencing Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2022-03-10 Janaka S. S. Liyanage, Jeremie H. Estepp, Kumar Srivastava, Yun Li, Motomi Mori, Guolian Kang
Due to many advantages such as higher statistical power of detecting the association of genetic variants in human disorders and cost saving, extreme phenotype sequencing (EPS) is a rapidly emerging study design in epidemiological and clinical studies investigating how genetic variations associate with complex phenotypes. However, the investigation of the mediation effect of genetic variants on phenotypes
-
Challenges for machine learning in RNA-protein interaction prediction Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2022-01-24 Viplove Arora, Guido Sanguinetti
RNA-protein interactions have long been recognised as crucial regulators of gene expression. Recently, the development of scalable experimental techniques to measure these interactions has revolutionised the field, leading to the production of large-scale datasets which offer both opportunities and challenges for machine learning techniques. In this brief note, we will discuss some of the major stumbling
-
Sparse latent factor regression models for genome-wide and epigenome-wide association studies Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2022-01-01 Basile Jumentier, Kevin Caye, Barbara Heude, Johanna Lepeule, Olivier François
Association of phenotypes or exposures with genomic and epigenomic data faces important statistical challenges. One of these challenges is to account for variation due to unobserved confounding factors, such as individual ancestry or cell-type composition in tissues. This issue can be addressed with penalized latent factor regression models, where penalties are introduced to cope with high dimension
-
Frontmatter Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-12-01
Article Frontmatter was published on December 1, 2021 in the journal Statistical Applications in Genetics and Molecular Biology (volume 20, issue 4-6).
-
Inference of genetic regulatory networks with regulatory hubs using vector autoregressions and automatic relevance determination with model selections Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-12-01 Chi-Kan Chen
The inference of genetic regulatory networks (GRNs) reveals how genes interact with each other. A few genes can regulate many genes as targets to control cell functions. We present new methods based on the order-1 vector autoregression (VAR1) for inferring GRNs from gene expression time series. The methods use the automatic relevance determination (ARD) to incorporate the regulatory hub structure into
-
Batch effect reduction of microarray data with dependent samples using an empirical Bayes approach (BRIDGE) Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-12-01 Qing Xia, Jeffrey A. Thompson, Devin C. Koestler
Batch-effects present challenges in the analysis of high-throughput molecular data and are particularly problematic in longitudinal studies when interest lies in identifying genes/features whose expression changes over time, but time is confounded with batch. While many methods to correct for batch-effects exist, most assume independence across samples; an assumption that is unlikely to hold in longitudinal
-
Optimizing weighted gene co-expression network analysis with a multi-threaded calculation of the topological overlap matrix Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-12-01 Min Shuai, Dongmei He, Xin Chen
Biomolecular networks are often assumed to be scale-free hierarchical networks. The weighted gene co-expression network analysis (WGCNA) treats gene co-expression networks as undirected scale-free hierarchical weighted networks. The WGCNA R software package uses an Adjacency Matrix to store a network, next calculates the topological overlap matrix (TOM), and then identifies the modules (sub-networks)
-
A hierarchical Bayesian approach for detecting global microbiome associations Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-10-29 Farhad Hatami, Emma Beamish, Albert Davies, Rachael Rigby, Frank Dondelinger
The human gut microbiome has been shown to be associated with a variety of human diseases, including cancer, metabolic conditions and inflammatory bowel disease. Current approaches for detecting microbiome associations are limited by relying on specific measures of ecological distance, or only allowing for the detection of associations with individual bacterial species, rather than the whole microbiome
-
Low variability in the underlying cellular landscape adversely affects the performance of interaction-based approaches for conducting cell-specific analyses of DNA methylation in bulk samples Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-08-11 Richard Meier, Emily Nissen, Devin C. Koestler
Statistical methods that allow for cell type specific DNA methylation (DNAm) analyses based on bulk-tissue methylation data have great potential to improve our understanding of human disease and have created unprecedented opportunities for new insights using the wealth of publicly available bulk-tissue methylation data. These methodologies involve incorporating interaction terms formed between the
-
AdaReg: data adaptive robust estimation in linear regression with application in GTEx gene expressions Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-07-12 Meng Wang, Lihua Jiang, Michael P. Snyder
The Genotype-Tissue Expression (GTEx) project provides a valuable resource of large-scale gene expressions across multiple tissue types. Under various technical noise and unknown or unmeasured factors, how to robustly estimate the major tissue effect becomes challenging. Moreover, different genes exhibit heterogeneous expressions across different tissue types. Therefore, we need a robust method which
-
Collocation based training of neural ordinary differential equations Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-07-08 Elisabeth Roesch, Christopher Rackauckas, Michael P. H. Stumpf
The predictive power of machine learning models often exceeds that of mechanistic modeling approaches. However, the interpretability of purely data-driven models, without any mechanistic basis is often complicated, and predictive power by itself can be a poor metric by which we might want to judge different methods. In this work, we focus on the relatively new modeling techniques of neural ordinary
-
Frontmatter Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-02-01
Article Frontmatter was published on February 1, 2021 in the journal Statistical Applications in Genetics and Molecular Biology (volume 20, issue 1).
-
An Empirical Bayes approach for the identification of long-range chromosomal interaction from Hi-C data Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-02-01 Qi Zhang, Zheng Xu, Yutong Lai
Hi-C experiments have become very popular for studying the 3D genome structure in recent years. Identification of long-range chromosomal interaction, i.e., peak detection, is crucial for Hi-C data analysis. But it remains a challenging task due to the inherent high dimensionality, sparsity and the over-dispersion of the Hi-C count data matrix. We propose EBHiC, an empirical Bayes approach for peak
-
Fine tuned exploration of evolutionary relationships within the protein universe Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2021-02-01 Danilo Gullotto
In the regime of domain classifications, the protein universe unveils a discrete set of folds connected by hierarchical relationships. Instead, at sub-domain-size resolution and because of physical constraints not necessarily requiring evolution to shape polypeptide chains, networks of protein motifs depict a continuous view that lies beyond the extent of hierarchical classification schemes. A number
-
Measuring evolutionary cancer dynamics from genome sequencing, one patient at a time Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-12-01 Giulio Caravagna
Cancers progress through the accumulation of somatic mutations which accrue during tumour evolution, allowing some cells to proliferate in an uncontrolled fashion. This growth process is intimately related to latent evolutionary forces moulding the genetic and epigenetic composition of tumour subpopulations. Understanding cancer requires therefore the understanding of these selective pressures. The
-
Inferring dynamic gene regulatory networks with low-order conditional independencies – an evaluation of the method Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-12-01 Hamda B. Ajmal, Michael G. Madden
Over a decade ago, Lèbre (2009) proposed an inference method, G1DBN, to learn the structure of gene regulatory networks (GRNs) from high dimensional, sparse time-series gene expression data. Their approach is based on the concept of low-order conditional independence graphs that they extend to dynamic Bayesian networks (DBNs). They present results to demonstrate that their method yields better structural
-
Combining dependent p-values by gamma distributions Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-12-01 Li-Chu Chien
Combining correlated p-values from multiple hypothesis testing is a most frequently used method for integrating information in genetic and genomic data analysis. However, most existing methods for combining independent p-values from individual component problems into a single unified p-value are unsuitable for the correlational structure among p-values from multiple hypothesis testing. Although
-
Bayesian reconstruction of transmission trees from genetic sequences and uncertain infection times Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-12-01 Hesam Montazeri, Susan Little, Mozhgan Mozaffarilegha, Niko Beerenwinkel, Victor DeGruttola
Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods do not generally reliably
-
Accuracy and sensitivity of different Bayesian methods for genomic prediction using simulation and real data. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-08-10 Saheb Foroutaifar
The main objectives of this study were to compare the prediction accuracy of different Bayesian methods for traits with a wide range of genetic architecture using simulation and real data and to assess the sensitivity of these methods to the violation of their assumptions. For the simulation study, different scenarios were implemented based on two traits with low or high heritability and different
-
Spectral dynamic causal modelling of resting-state fMRI: an exploratory study relating effective brain connectivity in the default mode network to genetics. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-08-31 Yunlong Nie, Eugene Opoku, Laila Yasmin, Yin Song, Jie Wang, Sidi Wu, Vanessa Scarapicchia, Jodie Gawryluk, Liangliang Wang, Jiguo Cao, Farouk S Nathoo
We conduct an imaging genetics study to explore how effective brain connectivity in the default mode network (DMN) may be related to genetics within the context of Alzheimer’s disease and mild cognitive impairment. We develop an analysis of longitudinal resting-state functional magnetic resonance imaging (rs-fMRI) and genetic data obtained from a sample of 111 subjects with a total of 319 rs-fMRI scans
-
A weighted empirical Bayes risk prediction model using multiple traits. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-09-04 Gengxin Li, Lin Hou, Xiaoyu Liu, Cen Wu
With rapid advances in high-throughput sequencing technology, millions of single-nucleotide variants (SNVs) can be simultaneously genotyped in a sequencing study. These SNVs residing in functional genomic regions such as exons may play a crucial role in biological process of the body. In particular, non-synonymous SNVs are closely related to the protein sequence and its function, which are important
-
Understanding hormonal crosstalk in Arabidopsis root development via emulation and history matching. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-07-13 Samuel E Jackson, Ian Vernon, Junli Liu, Keith Lindsey
A major challenge in plant developmental biology is to understand how plant growth is coordinated by interacting hormones and genes. To meet this challenge, it is important to not only use experimental data, but also formulate a mathematical model. For the mathematical model to best describe the true biological system, it is necessary to understand the parameter space of the model, along with the links
-
Bivariate traits association analysis using generalized estimating equations in family data. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-05-05 Mariza de Andrade, Mauricio A Mazo Lopera, Nubia E Duarte
Genome wide association study (GWAS) is becoming fundamental in the arduous task of deciphering the etiology of complex diseases. The majority of the statistical models used to address the genes-disease association consider a single response variable. However, it is common for certain diseases to have correlated phenotypes such as in cardiovascular diseases. Usually, GWAS typically sample unrelated
-
Bayesian approach to discriminant problems for count data with application to multilocus short tandem repeat dataset. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-05-04 Koji Tsukuda, Shuhei Mano, Toshimichi Yamamoto
Short Tandem Repeats (STRs) are a type of DNA polymorphism. This study considers discriminant analysis to determine the population of test individuals using an STR database containing the lengths of STRs observed at more than one locus. The discriminant method based on the Bayes factor is discussed and an improved method is proposed. The main issues are to develop a method that is relatively robust
-
Identification of supervised and sparse functional genomic pathways. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-02-29 Fan Zhang, Jeffrey C Miecznikowski, David L Tritchler
Functional pathways involve a series of biological alterations that may result in the occurrence of many diseases including cancer. With the availability of various “omics” technologies it becomes feasible to integrate information from a hierarchy of biological layers to provide a more comprehensive understanding to the disease. In many diseases, it is believed that only a small number of networks
-
Joint variable selection and network modeling for detecting eQTLs. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-02-20 Xuan Cao, Lili Ding, Tesfaye B Mersha
In this study, we conduct a comparison of three most recent statistical methods for joint variable selection and covariance estimation with application of detecting expression quantitative trait loci (eQTL) and gene network estimation, and introduce a new hierarchical Bayesian method to be included in the comparison. Unlike the traditional univariate regression approach in eQTL, all four methods correlate
-
An extended model for phylogenetic maximum likelihood based on discrete morphological characters. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2020-02-20 David A Spade
Maximum likelihood is a common method of estimating a phylogenetic tree based on a set of genetic data. However, models of evolution for certain types of genetic data are highly flawed in their specification, and this misspecification can have an adverse impact on phylogenetic inference. Our attention here is focused on extending an existing class of models for estimating phylogenetic trees from discrete
-
Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-12-12 Oliver M Crook, Laurent Gatto, Paul D W Kirk
The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang & Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference in DP mixture models, by posing clustering as a Bayesian model selection (BMS) problem
-
EBADIMEX: an empirical Bayes approach to detect joint differential expression and methylation and to classify samples. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-11-16 Tobias Madsen, Michał Świtnicki, Malene Juul, Jakob Skou Pedersen
DNA methylation and gene expression are interdependent and both implicated in cancer development and progression, with many individual biomarkers discovered. A joint analysis of the two data types can potentially lead to biological insights that are not discoverable with separate analyses. To optimally leverage the joint data for identifying perturbed genes and classifying clinical cancer samples,
-
A Bayesian framework for identifying consistent patterns of microbial abundance between body sites. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-11-08 Richard Meier, Jeffrey A Thompson, Mei Chung, Naisi Zhao, Karl T Kelsey, Dominique S Michaud, Devin C Koestler
Recent studies have found that the microbiome in both gut and mouth are associated with diseases of the gut, including cancer. If resident microbes could be found to exhibit consistent patterns between the mouth and gut, disease status could potentially be assessed non-invasively through profiling of oral samples. Currently, there exists no generally applicable method to test for such associations
-
Determining the number of components in PLS regression on incomplete data set. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-11-06 Titin Agustin Nengsih, Frédéric Bertrand, Myriam Maumy-Bertrand, Nicolas Meyer
Partial least squares regression - or PLS regression - is a multivariate method in which the model parameters are estimated using either the SIMPLS or NIPALS algorithm. PLS regression has been extensively used in applied research because of its effectiveness in analyzing relationships between an outcome and one or several components. Note that the NIPALS algorithm can provide parameter estimates on
-
Clustering methods for single-cell RNA-sequencing expression data: performance evaluation with varying sample sizes and cell compositions. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-10-28 Aslı Suner
A number of specialized clustering methods have been developed so far for the accurate analysis of single-cell RNA-sequencing (scRNA-seq) expression data, and several reports have been published documenting the performance measures of these clustering methods under different conditions. However, to date, there are no available studies regarding the systematic evaluation of the performance measures
-
Stability selection for lasso, ridge and elastic net implemented with AFT models. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-10-07 Md Hasinur Rahaman Khan, Anamika Bhadra, Tamanna Howlader
The instability in the selection of models is a major concern with data sets containing a large number of covariates. We focus on stability selection which is used as a technique to improve variable selection performance for a range of selection methods, based on aggregating the results of applying a selection procedure to sub-samples of the data where the observations are subject to right censoring
-
Bi-level feature selection in high dimensional AFT models with applications to a genomic study. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-09-17 Hailin Huang, Jizi Shangguan, Peifeng Ruan, Hua Liang
We propose a new bi-level feature selection method for high dimensional accelerated failure time models by formulating the models as a single index model. The method yields sparse solutions at both the group and individual feature levels, along with an expedient algorithm that is computationally efficient and easily implemented. We analyze a genomic dataset for an illustration, and present a simulation
-
A novel individualized drug repositioning approach for predicting personalized candidate drugs for type 1 diabetes mellitus. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-07-10 Hong Zheng
The high cost and high failure rate of drug development motivate the use of drug repositioning in drug discovery. Existing drug repositioning techniques mainly focus on discovering candidate drugs for a given disease, and are not suitable for predicting candidate drugs for an individual sample. Type 1 diabetes mellitus (T1DM) is a disorder of glucose homeostasis caused by autoimmune
-
A multivariate linear model for investigating the association between gene-module co-expression and a continuous covariate. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-03-15 Trishanta Padayachee, Tatsiana Khamiakova, Ziv Shkedy, Perttu Salo, Markus Perola, Tomasz Burzykowski
A way to enhance our understanding of the development and progression of complex diseases is to investigate the influence of cellular environments on gene co-expression (i.e. gene-pair correlations). Often, changes in gene co-expression are investigated across two or more biological conditions defined by categorizing a continuous covariate. However, the selection of arbitrary cut-off points may have
-
Discrete Wavelet Packet Transform Based Discriminant Analysis for Whole Genome Sequences. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-02-15 Hsin-Hsiung Huang, Senthil Balaji Girimurugan
In recent years, alignment-free methods have been widely applied in comparing genome sequences, as these methods compute efficiently and provide desirable phylogenetic analysis results. These methods have been successfully combined with hierarchical clustering methods for finding phylogenetic trees. However, it may not be suitable to apply these alignment-free methods directly to existing statistical
-
LCox: a tool for selecting genes related to survival outcomes using longitudinal gene expression data. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-02-13 Jiehuan Sun, Jose D Herazo-Maya, Jane-Ling Wang, Naftali Kaminski, Hongyu Zhao
Longitudinal genomics data and survival outcomes are common in biomedical studies, where the genomics data are often of high dimension. It is of great interest to select informative longitudinal biomarkers (e.g. genes) related to the survival outcome. In this paper, we develop a computationally efficient tool, LCox, for selecting informative biomarkers related to the survival outcome using the longitudinal
-
Meta-analytic framework for modeling genetic coexpression dynamics. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-02-09 Tyler G Kinzy, Timothy K Starr, George C Tseng, Yen-Yi Ho
Methods for exploring genetic interactions have been developed in an attempt to move beyond single gene analyses. Because biological molecules frequently participate in different processes under various cellular conditions, investigating the changes in gene coexpression patterns under various biological conditions could reveal important regulatory mechanisms. One of the methods for capturing gene coexpression
-
Sliced inverse regression for integrative multi-omics data analysis. Stat. Appl. Genet. Molecul. Biol. (IF 0.9) Pub Date : 2019-01-26 Yashita Jain, Shanshan Ding, Jing Qiu
Advances in next-generation sequencing, transcriptomics, proteomics and other high-throughput technologies have enabled simultaneous measurement of multiple types of genomic data for cancer samples. Analyzed together, these data may reveal new biological insights compared with analyzing a single type of genomic data. This study proposes a novel use of a supervised dimension reduction method, called sliced inverse