High-throughput technologies have revolutionized medical research. The advent of genotyping arrays enabled large-scale genome-wide association studies and methods for examining global transcript levels, which gave rise to the field of "integrative genetics". Other omics technologies, such as proteomics and metabolomics, are now often incorporated into the everyday methodology of biological researchers. In this review, we provide an overview of such omics technologies and focus on methods for their integration across multiple omics layers. Compared with studies of a single omics type, multi-omics offers the opportunity to understand the flow of information that underlies disease.
Glioma incidence is highest in non‐Hispanic Whites, and glioma genome‐wide association studies (GWAS) to date have included only European ancestry (EA) populations. African Americans and Hispanics in the US have varying proportions of EA, African (AA) and Native American ancestries (NAA). It is unknown whether identified GWAS loci or increased EA is associated with increased glioma risk. We assessed whether EA was associated with glioma in African Americans and Hispanics. Data were obtained for 832 cases and 675 controls from the Glioma International Case–Control Study and GliomaSE Case–Control Study who were previously estimated to have <80% EA or who self‐identified as non‐White. We estimated global and local ancestry using fastStructure and RFMix, respectively, with 1000 Genomes Project reference populations. Within groups with ≥40% AA (AFR≥0.4) and ≥15% NAA (AMR≥0.15), genome‐wide association between local EA and glioma was evaluated using logistic regression conditioned on global EA for all gliomas. In AFR≥0.4, we identified two regions associated with increased EA (7q21.11, p = 6.36 × 10⁻⁴; 11p11.12, p = 7.0 × 10⁻⁴) and one associated with decreased EA (20p12.13, p = 0.0026). In addition, we identified a peak at rs1620291 (p = 4.36 × 10⁻⁶) in 7q21.3. Among AMR≥0.15, we found an association with increased EA in one region (12q24.21, p = 8.38 × 10⁻⁴) and with decreased EA in two regions (8q24.21, p = 0.0010; 20q13.33, p = 6.36 × 10⁻⁴). No other significant associations were identified. This analysis identified an association between glioma and two regions previously identified in EA populations (8q24.21, 20q13.33) and four novel regions (7q21.11, 11p11.12, 12q24.21 and 20p12.13). The identification of novel associations with EA suggests regions to target in future genetic association studies.
What's new?
Glioma is rare in non‐White populations, and most glioma genome‐wide association studies have included primarily European ancestry populations. Here, the authors assess whether variation in European ancestry is associated with glioma risk in populations with a combination of European, African and Native American ancestry. Based on African American and Hispanic cases from two large glioma case–control studies, this analysis shows that increased European ancestry in admixed populations may be associated with increased glioma risk. The associations between glioma and two chromosomal regions previously identified in European ancestry populations, and four novel regions, may guide future studies.
Genome-wide association studies (GWAS) and fine-mapping efforts to date have identified more than 100 prostate cancer (PrCa)-susceptibility loci. We meta-analyzed genotype data from a custom high-density array of 46,939 PrCa cases and 27,910 controls of European ancestry with previously genotyped data of 32,255 PrCa cases and 33,202 controls of European ancestry. Our analysis identified 62 novel loci associated (P < 5.0 × 10^[…]) with PrCa and one locus significantly associated with early-onset PrCa (≤55 years). Our findings include missense variants rs1800057 (odds ratio (OR) = 1.16; P = 8.2 × 10^[…]; G>C, p.Pro1054Arg) in ATM and rs2066827 (OR = 1.06; P = 2.3 × 10^[…]; T>G, p.Val109Gly) in CDKN1B. The combination of all loci captured 28.4% of the PrCa familial relative risk, and a polygenic risk score conferred an elevated PrCa risk for men in the ninetieth to ninety-ninth percentiles (relative risk = 2.69; 95% confidence interval (CI): 2.55-2.82) and first percentile (relative risk = 5.71; 95% CI: 5.04-6.48) risk stratum compared with the population average. These findings improve risk prediction, enhance fine-mapping, and provide insight into the underlying biology of PrCa.
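As a rough illustration of how a polygenic risk score combines per-variant effects, the sketch below sums risk-allele dosages weighted by the log odds ratio. It assumes a simple per-allele log-additive model and uses only the two odds ratios quoted above; the function name, the example dosages and the choice of model are assumptions made for illustration, not the score described in the study.

```python
import math

# Odds ratios for the two missense variants quoted in the abstract.
ODDS_RATIOS = {"rs1800057": 1.16, "rs2066827": 1.06}


def polygenic_risk_score(dosages, odds_ratios=ODDS_RATIOS):
    """Sum risk-allele dosages (0, 1 or 2) weighted by log(odds ratio).

    Assumes a per-allele log-additive model (an illustrative simplification).
    """
    return sum(math.log(odds_ratios[snp]) * dose for snp, dose in dosages.items())


# Hypothetical individual: one risk allele at rs1800057, two at rs2066827.
score = polygenic_risk_score({"rs1800057": 1, "rs2066827": 2})

# Odds relative to a carrier of no risk alleles at these two variants.
relative_odds = math.exp(score)
```

Under this model the effects multiply on the odds scale, so `relative_odds` equals 1.16 × 1.06². A real score would include all associated loci and be calibrated against the population distribution.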
Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.
Individuals with psychiatric disorders have elevated rates of autoimmune comorbidity and altered immune signaling. It is unclear whether these altered immunological states have a shared genetic basis with those psychiatric disorders. The present study sought to use existing summary‐level data from previous genome‐wide association studies to determine if commonly varying single nucleotide polymorphisms are shared between psychiatric and immune‐related phenotypes. We estimated heritability and examined pair‐wise genetic correlations using the linkage disequilibrium score regression (LDSC) and heritability estimation from summary statistics methods. Using LDSC, we observed significant genetic correlations between immune‐related disorders and several psychiatric disorders, including anorexia nervosa, attention deficit‐hyperactivity disorder, bipolar disorder, major depression, obsessive compulsive disorder, schizophrenia, smoking behavior, and Tourette syndrome. Loci significantly mediating genetic correlations were identified for schizophrenia when analytically paired with Crohn's disease, primary biliary cirrhosis, systemic lupus erythematosus, and ulcerative colitis. We report significantly correlated loci and highlight those containing genome‐wide associations and candidate genes for respective disorders. We also used the LDSC method to characterize genetic correlations among the immune‐related phenotypes. We discuss our findings in the context of relevant genetic and epidemiological literature, as well as the limitations and caveats of the study.
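For readers unfamiliar with the quantity being reported, a genetic correlation is the genetic covariance between two traits scaled by the square root of the product of their heritabilities. The sketch below shows only that final arithmetic step; the function name and the input values are hypothetical, and it does not reproduce the regression that LDSC uses to estimate the inputs from summary statistics.

```python
import math


def genetic_correlation(genetic_covariance, h2_trait1, h2_trait2):
    """Return r_g = genetic covariance / sqrt(h2_trait1 * h2_trait2)."""
    if h2_trait1 <= 0 or h2_trait2 <= 0:
        raise ValueError("heritability estimates must be positive")
    return genetic_covariance / math.sqrt(h2_trait1 * h2_trait2)


# Hypothetical inputs chosen only to show the arithmetic.
r_g = genetic_correlation(0.03, 0.25, 0.16)
```

Because noisy estimates can push a heritability to zero or below, the function rejects non-positive values instead of returning an undefined ratio.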
Abstract
Motivation
Estimating microbial association networks from high-throughput sequencing data is a common exploratory data analysis approach aiming at understanding the complex interplay of microbial communities in their natural habitat. Statistical network estimation workflows comprise several analysis steps, including methods for zero handling, data normalization and computing microbial associations. Since microbial interactions are likely to change between conditions, e.g. between healthy individuals and patients, identifying network differences between groups is often an integral secondary analysis step. Thus far, however, no unifying computational tool is available that facilitates the whole analysis workflow of constructing, analysing and comparing microbial association networks from high-throughput sequencing data.
Results
Here, we introduce NetCoMi (Network Construction and comparison for Microbiome data), an R package that integrates existing methods for each analysis step in a single reproducible computational workflow. The package offers functionality for constructing and analysing single microbial association networks as well as quantifying network differences. This enables insights into whether single taxa, groups of taxa or the overall network structure change between groups. NetCoMi also contains functionality for constructing differential networks, making it possible to assess whether single pairs of taxa are differentially associated between two groups. Furthermore, NetCoMi facilitates the construction and analysis of dissimilarity networks of microbiome samples, enabling a high-level graphical summary of the heterogeneity of an entire microbiome sample collection. We illustrate NetCoMi’s wide applicability using data sets from the GABRIELA study to compare microbial associations in settled dust from children’s rooms between samples from two study centers (Ulm and Munich).
Availability
R scripts used for producing the examples shown in this manuscript are provided as supplementary data. The NetCoMi package, together with a tutorial, is available at https://github.com/stefpeschel/NetCoMi.
Contact
Tel:+49 89 3187 43258; stefanie.peschel@mail.de
Supplementary information
Supplementary data are available at Briefings in Bioinformatics online.
Improvement of grain yield is an essential long-term goal of maize (Zea mays) breeding to meet continual and increasing food demands worldwide, but the genetic basis remains unclear.
We used 10 different recombinant inbred line (RIL) populations genotyped with high-density markers and phenotyped in multiple environments to dissect the genetic architecture of maize ear traits.
Three methods were used to map the quantitative trait loci (QTLs) affecting ear traits. We found 17–34 minor- or moderate-effect loci that influence ear traits, with little epistasis and few environmental interactions, together accounting for 55.4–82% of the phenotypic variation. Four novel QTLs were validated and fine mapped using candidate gene association analysis, expression QTL analysis and heterogeneous inbred family validation.
The combination of multiple different populations is a flexible and manageable way to collaboratively integrate widely available genetic resources, thereby boosting the statistical power of QTL discovery for important traits in agricultural crops, ultimately facilitating breeding programs.
The majority of risk loci identified by genome-wide association studies (GWAS) are in non-coding regions, hampering their functional interpretation. Instead, transcriptome-wide association studies (TWAS) identify gene-trait associations, which can be used to prioritize candidate genes in disease-relevant tissue(s). Here, we aimed to systematically identify susceptibility genes for coronary artery disease (CAD) by TWAS. We trained prediction models of nine CAD-relevant tissues using EpiXcan based on two genetics-of-gene-expression panels, the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task (STARNET) and the Genotype-Tissue Expression (GTEx). Based on these prediction models, we imputed gene expression of respective tissues from individual-level genotype data on 37,997 CAD cases and 42,854 controls for the subsequent gene-trait association analysis. Transcriptome-wide significant association (i.e. P < 3.85e−6) was observed for 114 genes. Of these, 96 resided within previously identified GWAS risk loci and 18 were novel. Stepwise analyses were performed to study their plausibility, biological function, and pathogenicity in CAD, including analyses for colocalization, damaging mutations, pathway enrichment, phenome-wide associations with human data and expression-traits correlations using mouse data. Finally, CRISPR/Cas9-based gene knockdown of two newly identified TWAS genes, RGS19 and KPTN, in a human hepatocyte cell line resulted in reduced secretion of APOB100 and lipids in the cell culture medium. Our CAD TWAS work (i) prioritized candidate causal genes at known GWAS loci, (ii) identified 18 novel genes to be associated with CAD, and (iii) suggested potential tissues and pathways of action for these TWAS CAD genes.
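The abstract does not say how the transcriptome-wide threshold was derived. If it is a Bonferroni correction at a family-wise error rate of 0.05 (an assumption made here, along with the helper name), the number of tests it implies can be recovered with a short calculation:

```python
ALPHA = 0.05         # assumed family-wise error rate (not stated in the abstract)
THRESHOLD = 3.85e-6  # significance threshold quoted in the abstract

# Number of tests implied by a Bonferroni correction at ALPHA.
implied_tests = ALPHA / THRESHOLD


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests
```

Under that assumption the threshold corresponds to roughly 13,000 tests, which is plausible for the number of genes with a usable expression prediction model.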
Zymoseptoria tritici is the causal agent of Septoria tritici blotch, a major pathogen of wheat globally and the most damaging pathogen of wheat in Europe. A gene-for-gene (GFG) interaction between Z. tritici and wheat cultivars carrying the Stb6 resistance gene has been postulated for many years, but the genes have not been identified.
We identified AvrStb6 by combining quantitative trait locus mapping in a cross between two Swiss strains with a genome-wide association study using a natural population of c. 100 strains from France. We functionally validated AvrStb6 using ectopic transformations.
AvrStb6 encodes a small, cysteine-rich, secreted protein that produces an avirulence phenotype on wheat cultivars carrying the Stb6 resistance gene. We found 16 nonsynonymous single nucleotide polymorphisms among the tested strains, indicating that AvrStb6 is evolving very rapidly. AvrStb6 is located in a highly polymorphic subtelomeric region and is surrounded by transposable elements, which may facilitate its rapid evolution to overcome Stb6 resistance.
AvrStb6 is the first avirulence gene to be functionally validated in Z. tritici, contributing to our understanding of avirulence in apoplastic pathogens and the mechanisms underlying GFG interactions between Z. tritici and wheat.
Summary
Flowering time is one of the major adaptive traits in the domestication of maize and an important selection criterion in breeding. To detect more maize flowering time variants, we evaluated flowering time traits using an extremely large multi‐genetic background population that contained more than 8000 lines under multiple Sino‐United States environments. The population included two nested association mapping (NAM) panels and a natural association panel. Nearly 1 million single‐nucleotide polymorphisms (SNPs) were used in the analyses. Through the parallel linkage analysis of the two NAM panels, both common and unique flowering time regions were detected. Genome wide, a total of 90 flowering time regions were identified. One‐third of these regions were connected to traits associated with the environmental sensitivity of maize flowering time. The genome‐wide association study of the three panels identified nearly 1000 flowering time‐associated SNPs, mainly distributed around 220 candidate genes (within a distance of 1 Mb). Interestingly, two types of regions were significantly enriched for these associated SNPs: the candidate gene regions themselves, and regions approximately 5 kb away from the candidate genes. Moreover, the associated SNPs exhibited high accuracy for predicting flowering time.
Significance Statement
Flowering time reflects an adaptive response to the environment through floral transition to local conditions and thus is an important selection criterion in plant breeding. Major genetic components that regulate flowering time have been cloned, but quantitative trait locus analyses indicate that there are numerous additional components. To detect more maize flowering time variants we evaluated flowering time traits using an extremely large multi‐background population and identified at least 90 flowering time regions and over 200 candidate genes.