We introduce a new estimator for the vector of coefficients β in the linear model y = Xβ + z, where X has dimensions n × p with p possibly larger than n. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to $\min_{b \in \mathbb{R}^p} \frac{1}{2}\|y - Xb\|_{\ell_2}^2 + \lambda_1 |b|_{(1)} + \lambda_2 |b|_{(2)} + \cdots + \lambda_p |b|_{(p)}$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$ and $|b|_{(1)} \geq |b|_{(2)} \geq \cdots \geq |b|_{(p)}$ are the decreasing absolute values of the entries of b. This is a convex program and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical ℓ₁ procedures such as the Lasso. Here, the regularizer is a sorted ℓ₁ norm, which penalizes the regression coefficients according to their rank: the higher the rank (that is, the stronger the signal), the larger the penalty. This is similar to the Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] procedure (BH), which compares more significant p-values with more stringent thresholds. One notable choice of the sequence {λi} is given by the BH critical values λBH(i) = z(1 − i · q/(2p)), where q ∊ (0, 1) and z(α) is the α-quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with λBH provably controls FDR at level q. Moreover, it also appears to have appreciable inferential properties under more general designs X while having substantial power, as demonstrated in a series of experiments run on both simulated and real data.
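To make the sorted ℓ₁ penalty and the BH critical values concrete, here is a minimal sketch in plain Python. It only evaluates the sequence λBH(i) = z(1 − i · q/(2p)) and the SLOPE objective for a given b; it does not solve the convex program. The function names (`bh_lambdas`, `sorted_l1_norm`, `slope_objective`) are hypothetical and chosen for illustration only.

```python
from statistics import NormalDist


def bh_lambdas(p, q):
    """Return [lambda_BH(1), ..., lambda_BH(p)] with lambda_BH(i) = z(1 - i*q/(2p)).

    z is the quantile function of the standard normal distribution.
    """
    if p < 1 or not 0.0 < q < 1.0:
        raise ValueError("need p >= 1 and q in (0, 1)")
    z = NormalDist().inv_cdf
    return [z(1.0 - i * q / (2.0 * p)) for i in range(1, p + 1)]


def sorted_l1_norm(b, lambdas):
    """Sum of lambda_i * |b|_(i), where |b|_(1) >= |b|_(2) >= ... are sorted magnitudes."""
    if len(b) != len(lambdas):
        raise ValueError("b and lambdas must have the same length")
    magnitudes = sorted((abs(v) for v in b), reverse=True)
    return sum(lam * mag for lam, mag in zip(lambdas, magnitudes))


def slope_objective(y, X, b, lambdas):
    """Evaluate 0.5 * ||y - Xb||^2 + sorted l1 penalty; X is a list of rows."""
    residuals = [
        yi - sum(xij * bj for xij, bj in zip(row, b))
        for yi, row in zip(y, X)
    ]
    return 0.5 * sum(r * r for r in residuals) + sorted_l1_norm(b, lambdas)
```

Because the largest λ is paired with the largest |b| entry, the penalty is unchanged by permuting or flipping the signs of the entries of b; only the ranking of magnitudes matters.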
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific. Additionally, effect sizes and the risk prediction scores derived from them in one population may not accurately extrapolate to other populations. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, and replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions, the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease. We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.
Determinants of telomere length across human tissues. Demanelis, Kathryn; Jasmine, Farzana; Chen, Lin S ... Science (American Association for the Advancement of Science), 09/2020, Volume 369, Issue 6509.
Telomere shortening is a hallmark of aging. Telomere length (TL) in blood cells has been studied extensively as a biomarker of human aging and disease; however, little is known regarding variability in TL in nonblood, disease-relevant tissue types. Here, we characterize variability in TLs from 6391 tissue samples, representing >20 tissue types and 952 individuals from the Genotype-Tissue Expression (GTEx) project. We describe differences across tissue types, positive correlation among tissue types, and associations with age and ancestry. We show that genetic variation affects TL in multiple tissue types and that TL may mediate the effect of age on gene expression. Our results provide the foundational knowledge regarding TL in healthy tissues that is needed to interpret epidemiological studies of TL and human health.
We introduce a multiple testing procedure that controls global error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses that are organized hierarchically in a tree structure. We describe a fast algorithm and prove that it controls relevant error rates given certain assumptions on the dependence between the p-values. Through simulations, we demonstrate that the proposed procedure provides the desired guarantees under a range of dependency structures and that it has the potential to gain power over alternative methods. Finally, we apply the method to studies on the genetic regulation of gene expression across multiple tissues and on the relation between the gut microbiome and colorectal cancer.
ABSTRACT
The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and the false discovery rate (FDR) is frequently adopted as a measure of global error. In the interest of interpretability, results are often summarized so that reporting focuses on variants discovered to be associated with some phenotypes. We show that applying FDR‐controlling procedures on the entire collection of hypotheses fails to control the rate of false discovery of associated variants as well as the expected value of the average proportion of false discovery of phenotypes influenced by such variants. We propose a simple hierarchical testing procedure that allows control of both these error rates and provides a more reliable basis for the identification of variants with functional effects. We demonstrate the utility of this approach through simulation studies comparing various error rates and measures of power for genetic association studies of multiple traits. Finally, we apply the proposed method to identify genetic variants that impact flowering phenotypes in Arabidopsis thaliana, expanding the set of discoveries.
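The abstract does not spell out the hierarchical procedure, so the following is only a sketch of one two-stage construction of this general kind, under stated assumptions: each variant is summarized by a Simes p-value across phenotypes, variants are screened with the Benjamini-Hochberg (BH) step-up procedure at level q, and phenotypes are then tested within each selected variant with BH at the reduced level q · R/M, where R of M variants were selected. These design choices and all function names are illustrative assumptions, not the authors' stated method.

```python
def simes_pvalue(pvals):
    """Simes combination of m p-values: min over ranks i of m * p_(i) / i."""
    m = len(pvals)
    ordered = sorted(pvals)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: return sorted indices rejected at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose ordered p-value is below its BH threshold
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k = rank
    return sorted(order[:k])


def two_stage_select(pvalue_table, q):
    """Assumed two-stage scheme; pvalue_table[v] holds variant v's p-values per phenotype.

    Stage 1: BH at level q on per-variant Simes p-values selects variants.
    Stage 2: within each selected variant, BH at level q * R / M selects phenotypes.
    Returns a dict mapping selected variant index -> list of phenotype indices.
    """
    M = len(pvalue_table)
    variant_p = [simes_pvalue(row) for row in pvalue_table]
    selected = bh_reject(variant_p, q)
    q_within = q * len(selected) / M
    return {v: bh_reject(pvalue_table[v], q_within) for v in selected}
```

A usage example on made-up p-values (4 variants by 3 phenotypes, purely illustrative): `two_stage_select([[0.001, 0.8, 0.5], [0.4, 0.6, 0.9], [0.0005, 0.002, 0.7], [0.3, 0.2, 0.95]], 0.05)` selects variants 0 and 2 in the first stage and then tests their phenotypes at level 0.025.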