Significance testing was developed as an objective method for summarizing statistical evidence for a hypothesis. It has been widely adopted in genetic studies, including genome-wide association studies and, more recently, exome sequencing studies. However, significance testing in both genome-wide and exome-wide studies must adopt stringent significance thresholds to allow for multiple testing, and it is useful only when studies have adequate statistical power, which depends on the characteristics of the phenotype and the putative genetic variant, as well as the study design. Here, we review the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
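To make the link between a stringent threshold and statistical power concrete, the following is a minimal sketch rather than the procedure used in the review. It assumes a two-sided z-test, a quantitative trait, and the common approximation that a variant explaining a fraction q of trait variance gives a non-centrality parameter of about N·q/(1 − q). The function names, the threshold of 5 × 10^-8 and the effect size are illustrative assumptions.

```python
from statistics import NormalDist


def power_two_sided_z(ncp: float, alpha: float) -> float:
    """Power of a two-sided z-test whose statistic has mean sqrt(ncp) and unit variance."""
    norm = NormalDist()
    z_crit = -norm.inv_cdf(alpha / 2)  # critical value for the two-sided test
    shift = ncp ** 0.5
    # Probability that the statistic falls in either rejection tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


def ncp_quantitative_trait(n: int, var_explained: float) -> float:
    """Approximate non-centrality parameter (an assumption of this sketch)."""
    return n * var_explained / (1 - var_explained)


alpha_gw = 5e-8  # assumed genome-wide threshold, for illustration only
for n in (5_000, 20_000, 50_000):
    ncp = ncp_quantitative_trait(n, 0.001)  # variant explaining 0.1% of variance
    print(n, round(power_two_sided_z(ncp, alpha_gw), 3))
```

Under these assumptions, power at a fixed effect size rises steeply with sample size, which is why the threshold and the study design have to be considered together.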
Interaction between tumor cells and immune cells in the tumor microenvironment is important in cancer development. Immune cells interact with the tumor cells to shape this process. Here, we use single-cell RNA sequencing analysis to delineate the immune landscape and tumor heterogeneity in a cohort of patients with HBV-associated human hepatocellular carcinoma (HCC). We found that tumor-associated macrophages suppress tumor T cell infiltration and TIGIT-NECTIN2 interaction regulates the immunosuppressive environment. The cell state transition of immune cells towards a more immunosuppressive and exhaustive status exemplifies the overall cancer-promoting immunocellular landscape. Furthermore, the heterogeneity of global molecular profiles reveals the co-existence of intra-tumoral and inter-tumoral heterogeneity, with the latter being more apparent. This analysis of the immunosuppressive landscape and intercellular interactions provides mechanistic information for the design of efficacious immune-oncology treatments in hepatocellular carcinoma.
The gene has been proposed as an attractive unit of analysis for association studies, but a simple yet valid, powerful, and sufficiently fast method of evaluating the statistical significance of all genes in large, genome-wide datasets has been lacking. Here we propose the use of an extended Simes test that integrates functional information and association evidence to combine the p values of the single nucleotide polymorphisms within a gene to obtain an overall p value for the association of the entire gene. Our computer simulations demonstrate that this test is more powerful than the SNP-based test, offers effective control of the type 1 error rate regardless of gene size and linkage-disequilibrium pattern among markers, and does not need permutation or simulation to evaluate empirical significance. Its statistical power in simulated data is at least comparable, and often superior, to that of several alternative gene-based tests. When applied to real genome-wide association study (GWAS) datasets on Crohn disease, the test detected more significant genes than SNP-based tests and alternative gene-based tests. The proposed test, implemented in an open-source package, has the potential to identify additional novel disease-susceptibility genes for complex diseases from large GWAS datasets.
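For orientation, the classical Simes procedure that the proposed test extends can be sketched as below. This is only the textbook version, which takes the minimum over ranks j of m·p_(j)/j for m sorted p values. It does not include the adjustments for linkage disequilibrium or functional information that the abstract describes. The function name and the example p values are made up for illustration.

```python
def simes_p(p_values):
    """Classical Simes combined p value: min over ranks j of m * p_(j) / j."""
    if not p_values:
        raise ValueError("need at least one p value")
    ordered = sorted(p_values)
    m = len(ordered)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


# Hypothetical SNP-level p values for one gene (not real data).
gene_snp_p = [0.04, 0.001, 0.20, 0.03]
print(simes_p(gene_snp_p))
```

Because the combined value comes directly from the sorted p values, no permutation or simulation is needed, which is the property the abstract highlights.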
Hepatocellular carcinoma (HCC) is heterogeneous, rendering its current curative treatments ineffective. The emergence of single-cell genomics represents a powerful strategy in delineating the complex molecular landscapes of cancers. In this study, we demonstrated the feasibility and merit of using single-cell RNA sequencing to dissect the intra-tumoral heterogeneity and analyze the single-cell transcriptomic landscape to detect rare cell subpopulations of significance. Exploration of the inter-relationship among liver cancer stem cell markers showed two distinct major cell populations according to EPCAM expression, and the EPCAM+ cells had upregulated expression of multiple oncogenes. We also identified a CD24+/CD44+-enriched cell subpopulation within the EPCAM+ cells which had specific signature genes and might indicate a novel stemness-related cell subclone in HCC. Notably, knockdown of signature gene CTSE for CD24+/CD44+ cells significantly reduced self-renewal ability of HCC cells in vitro and the stemness-related role of CTSE was further confirmed by in vivo tumorigenicity assays in nude mice. In summary, single-cell genomics is a useful tool to delineate HCC intratumoral heterogeneity at better resolution. It can identify rare but important cell subpopulations, and may guide better precision medicine in the long run.
• Single-cell transcriptomics dissected the intra-tumoral heterogeneity of hepatocellular carcinoma (HCC).
• HCC single cells showed two distinct major cell populations according to EPCAM expression.
• CD24+/CD44+-enriched cell subpopulation was identified within the EPCAM+ cells.
• CTSE was the most upregulated signature gene in CD24+/CD44+-enriched cells.
• Knockdown of CTSE significantly reduced self-renewal ability in vitro and tumorigenicity in vivo.
Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers ($M_e$) for the adjustment of multiple testing, but current methods of calculation for $M_e$ are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate $M_e$. Applying this efficient method implemented in a free software tool named Genetic type 1 error calculator (GEC), we systematically examined the $M_e$, and the corresponding $p$-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets which are widely used in genotype imputation as reference panels. Our results suggested the use of a $p$-value threshold of ~$10^{-7}$ as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent $p$-value thresholds ~$5 \times 10^{-8}$ for current or merged commercial genotyping arrays, ~$10^{-8}$ for all common SNPs in the 1000 Genomes Project dataset and ~$5 \times 10^{-8}$ for the common SNPs only within genes.
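The relationship between an effective number of independent markers and a significance threshold can be illustrated with a short sketch. It assumes the standard Bonferroni and Šidák corrections applied to the effective number of tests. It does not reproduce how GEC estimates that number, and the marker counts used here are hypothetical round figures rather than results from the study.

```python
import math


def bonferroni_threshold(m_e: float, alpha: float = 0.05) -> float:
    """Per-test threshold so that the family-wise error rate is at most alpha."""
    return alpha / m_e


def sidak_threshold(m_e: float, alpha: float = 0.05) -> float:
    """Sidak threshold 1 - (1 - alpha)**(1 / m_e), computed stably for large m_e."""
    return -math.expm1(math.log1p(-alpha) / m_e)


# Hypothetical effective numbers of independent markers (illustrative only).
for m_e in (500_000, 1_000_000, 5_000_000):
    print(m_e, bonferroni_threshold(m_e), sidak_threshold(m_e))
```

With an effective number of one million, the Bonferroni threshold is 5 × 10^-8, which matches the order of magnitude quoted in the abstract for current arrays.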
Genome-wide association studies (GWAS) are commonly employed to study the genetic basis of complex traits/diseases, and a key question is how much heritability could be explained by all single nucleotide polymorphisms (SNPs) in GWAS. One widely used approach that relies on summary statistics only is linkage disequilibrium score regression (LDSC); however, this approach requires certain assumptions about the effects of SNPs (e.g., all SNPs contribute to heritability and each SNP contributes equal variance). More flexible modeling methods may be useful. We previously developed an approach recovering the "true" effect sizes from a set of observed $z$-statistics with an empirical Bayes approach, using only summary statistics. However, methods for standard error (SE) estimation are not available yet, limiting the interpretation of our results and the applicability of the approach. In this study, we developed several resampling-based approaches to estimate the SE of SNP-based heritability, including two jackknife and three parametric bootstrap methods. The resampling procedures are performed at the SNP level as it is most common to estimate heritability from GWAS summary statistics alone. Simulations showed that the delete-$d$-jackknife and parametric bootstrap approaches provide good estimates of the SE. In particular, the parametric bootstrap approaches yield the lowest root-mean-squared-error (RMSE) of the true SE. We also explored various methods for constructing confidence intervals (CIs). In addition, we applied our method to estimate the SNP-based heritability of 12 immune-related traits (levels of cytokines and growth factors) to shed light on their genetic architecture. We also implemented the methods to compute the sum of heritability explained and the corresponding SE in an R package SumVg. In conclusion, SumVg may provide a useful alternative tool for calculating SNP heritability and estimating SE/CI, which does not rely on distributional assumptions of SNP effects.
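The general idea of a jackknife standard error can be sketched as follows. This is a generic grouped (disjoint-block) form of the delete-d jackknife, not the SumVg implementation. The estimator used in the example is a plain mean standing in for a heritability estimator, and the function name and data are hypothetical.

```python
from statistics import fmean, stdev


def block_jackknife_se(values, estimator, block_size=1):
    """Grouped delete-d jackknife SE: drop one disjoint block of size d at a time."""
    values = list(values)
    n = len(values)
    if n % block_size or n // block_size < 2:
        raise ValueError("length must be a multiple of block_size, with at least 2 blocks")
    g = n // block_size
    leave_out = []
    for k in range(g):
        start = k * block_size
        kept = values[:start] + values[start + block_size:]
        leave_out.append(estimator(kept))  # re-estimate without block k
    centre = fmean(leave_out)
    variance = (g - 1) / g * sum((t - centre) ** 2 for t in leave_out)
    return variance ** 0.5


# Made-up per-SNP quantities; for the mean, delete-1 reproduces stdev / sqrt(n).
data = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
print(block_jackknife_se(data, fmean, block_size=1), stdev(data) / len(data) ** 0.5)
print(block_jackknife_se(data, fmean, block_size=2))
```

For the sample mean, the delete-1 result coincides with the usual standard error, which gives a quick sanity check before plugging in a more complex estimator.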
Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
Genetic effects on the liability scale are informative for describing the genetic architecture of binary traits, typically diseases. However, most genetic association analyses on binary traits are performed by logistic regression, and there is no straightforward method that transforms both effect size estimate and standard error from the logit scale to the liability scale. Here, we derive a simple linear transformation of the log odds ratio and its standard error for a single nucleotide polymorphism (SNP) to an effect size and standard error on the liability scale. We show by analytic calculations and simulations that this approximation is accurate when the disease is common and the SNP effect is small. We also apply this method to estimate the contribution of a SNP near the RET gene to the variance of Hirschsprung disease liability, and the age-specific contributions of APOE4 to the variance of Alzheimer’s disease liability. We discuss the approximate linear inter-relationships between genotype and effect sizes on the observed binary, logit, and liability scales, and the potential applications of the linear approximation to statistical power calculation for binary traits.
Schizophrenia (SCZ) is a debilitating neuropsychiatric disorder with high heritability and complex inheritance. In the past decade, successful identification of numerous susceptibility loci has provided useful insights into the molecular etiology of SCZ. However, applications of these findings to clinical classification and diagnosis, risk prediction, or intervention for SCZ have been limited, and elucidating the underlying genomic and molecular mechanisms of SCZ is still challenging. More recently, multiple Omics technologies - genomics, transcriptomics, epigenomics, proteomics, metabolomics, connectomics, and gut microbiomics - have all been applied to examine different aspects of SCZ pathogenesis. Integration of multi-Omics data has thus emerged as an approach to provide a more comprehensive view of biological complexity, which is vital to enable translation into assessments and interventions of clinical benefit to individuals with SCZ. In this review, we provide a broad survey of the single-omics studies of SCZ, summarize the advantages and challenges of different Omics technologies, and then focus on studies in which multiple omics data are integrated to unravel the complex pathophysiology of SCZ. We believe that integration of multi-Omics technologies would provide a roadmap to create a more comprehensive picture of interactions involved in the complex pathogenesis of SCZ, constitute a rich resource for elucidating the potential molecular mechanisms of the illness, and eventually improve clinical assessments and interventions of SCZ to address clinical translational questions from bench to bedside.