Exome association studies to date have generally been underpowered to systematically evaluate the phenotypic impact of very rare coding variants. We leveraged extensive haplotype sharing between 49,960 exome-sequenced UK Biobank participants and the remainder of the cohort (total n ≈ 500,000) to impute exome-wide variants with accuracy R² > 0.5 down to minor allele frequency (MAF) ~0.00005. Association and fine-mapping analyses of 54 quantitative traits identified 1,189 significant associations (P < 5 × 10⁻⁸) involving 675 distinct rare protein-altering variants (MAF < 0.01) that passed stringent filters for likely causality. Across all traits, 49% of associations (578/1,189) occurred in genes with two or more hits; follow-up analyses of these genes identified allelic series containing up to 45 distinct 'likely-causal' variants. Our results demonstrate the utility of within-cohort imputation in population-scale genome-wide association studies, provide a catalog of likely-causal, large-effect coding variant associations and foreshadow the insights that will be revealed as genetic biobank studies continue to grow.
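Imputation accuracy of the kind quoted in this abstract is commonly summarized as the squared Pearson correlation between true allele counts and imputed dosages. The sketch below assumes that definition; the function name `imputation_r2` and the toy genotypes are invented for illustration and are not taken from the study.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true allele counts and imputed dosages."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 entries")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    # Sums of centered products; the 1/n factors cancel in the ratio.
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")  # undefined when either vector is constant
    return cov * cov / (var_t * var_d)


# Toy data (hypothetical): minor-allele counts for ten people and
# the dosages an imputation method might assign to them.
truth = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
dosage = [0.0, 0.1, 0.8, 0.0, 0.0, 0.3, 0.6, 0.0, 0.1, 0.0]
r2 = imputation_r2(truth, dosage)
print(f"imputation R^2 = {r2:.3f}")
```

A variant would clear the R² > 0.5 threshold in this toy case. Real pipelines usually estimate the quantity from held-out sequenced samples or from the imputation model itself.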
Key message
A physical map of Secale cereale chromosome 6R was constructed using deletion mapping, and a new stripe rust resistance gene, Yr83, was mapped to the deletion bin of FL 0.73–1.00 of 6RL.
Rye (Secale cereale L., RR) possesses valuable genes for wheat improvement. In the current study, we report a resistance gene conferring stripe rust resistance effective from seedling to adult plant stages located on chromosome 6R. This chromosome was derived from triticale line T-701 and also carries highly effective resistance to the cereal cyst nematode species Heterodera avenae Woll. A wheat-rye 6R(6D) disomic substitution line exhibited high levels of seedling resistance to Australian pathotypes of the stripe rust (Puccinia striiformis f. sp. tritici; Pst) pathogen and showed an even greater resistance to the Chinese Pst pathotypes in the field. Ten chromosome 6R deletion lines and five wheat-rye 6R translocation lines were developed earlier in the attempt to transfer the nematode resistance gene to wheat and used herein to map the stripe rust resistance gene. These lines were subsequently characterized by sequential multicolor fluorescence in situ hybridization (mc-FISH), genomic in situ hybridization (GISH), mc-GISH, PCR-based landmark unique gene (PLUG), and chromosome 6R-specific length amplified fragment sequencing (SLAF-Seq) marker analyses to physically map the stripe rust resistance gene. The new stripe rust resistance locus was located in a chromosomal bin with fraction length (FL) 0.73–1.00 on 6RL and was named Yr83. A wheat-rye translocation line T6RL (#5) carrying the stripe rust resistance gene will be useful as a new germplasm in breeding for resistance.
The mapping of quantitative trait loci (QTL) aims to identify molecular markers or genomic loci that influence the variation of complex traits. The problem is complicated by the fact that QTL data usually contain a large number of markers across the entire genome, most of which have little or no effect on the phenotype. In this article, we propose several Bayesian hierarchical models for mapping multiple QTL that simultaneously fit and estimate all possible genetic effects associated with all markers. The proposed models use prior distributions for the genetic effects that are scale mixtures of normal distributions with mean zero and variances distributed to give each effect a high probability of being near zero. We consider two types of priors for the variances, exponential and scaled inverse-χ² distributions, which result in a Bayesian version of the popular least absolute shrinkage and selection operator (LASSO) model and the well-known Student's t model, respectively. Unlike most applications where fixed values are preset for hyperparameters in the priors, we treat all hyperparameters as unknowns and estimate them along with other parameters. Markov chain Monte Carlo (MCMC) algorithms are developed to simulate the parameters from the posteriors. The methods are illustrated using well-known barley data.
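The scale-mixture construction behind the Bayesian LASSO can be checked by simulation. The sketch below assumes the standard parameterization, which the abstract does not spell out: each variance is drawn from an exponential distribution with rate λ²/2, and the effect is then drawn from a zero-mean normal with that variance. Marginally the effect follows a Laplace (double-exponential) distribution with rate λ, so E|β| = 1/λ and Var(β) = 2/λ². The function name, seed and value of λ are illustrative choices, not the authors' settings.

```python
import random
from math import sqrt


def draw_lasso_prior_effects(lam, n_draws, seed=0):
    """Draw effects from a normal scale mixture with exponential variances.

    tau2 ~ Exponential(rate = lam**2 / 2), then beta | tau2 ~ Normal(0, tau2).
    Marginally, beta is Laplace with rate lam.
    """
    rng = random.Random(seed)
    effects = []
    for _ in range(n_draws):
        tau2 = rng.expovariate(lam ** 2 / 2.0)    # per-effect variance
        effects.append(rng.gauss(0.0, sqrt(tau2)))  # effect given its variance
    return effects


lam = 2.0  # hypothetical shrinkage parameter
effects = draw_lasso_prior_effects(lam, n_draws=200_000)

mean_abs = sum(abs(b) for b in effects) / len(effects)
variance = sum(b * b for b in effects) / len(effects)  # prior mean is zero

print(f"E|beta|   simulated {mean_abs:.3f}, Laplace theory {1 / lam:.3f}")
print(f"Var(beta) simulated {variance:.3f}, Laplace theory {2 / lam ** 2:.3f}")
```

Most draws land close to zero while a few are large, which is the shrinkage behavior the prior is meant to encode. In a full analysis λ would be given its own prior and sampled with the other parameters, as the abstract describes.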
With the advent of massively parallel sequencing, considerable work has gone into adapting chromosome conformation capture (3C) techniques to study chromosomal architecture at a genome-wide scale. We recently demonstrated that the inactive murine X chromosome adopts a bipartite structure using a novel 3C protocol, termed in situ DNase Hi-C. Like traditional Hi-C protocols, in situ DNase Hi-C requires that chromatin be chemically cross-linked, digested, end-repaired, and proximity-ligated with a biotinylated bridge adaptor. The resulting ligation products are optionally sheared, affinity-purified via streptavidin bead immobilization, and subjected to traditional next-generation library preparation for Illumina paired-end sequencing. Importantly, in situ DNase Hi-C obviates the dependence on a restriction enzyme to digest chromatin, instead relying on the endonuclease DNase I. Libraries generated by in situ DNase Hi-C have a higher effective resolution than traditional Hi-C libraries, which makes them valuable in cases in which high sequencing depth is allowed for, or when hybrid capture technologies are expected to be used. The protocol described here, which involves ∼4 d of bench work, is optimized for the study of mammalian cells, but it can be broadly applicable to any cell or tissue of interest, given experimental parameter optimization.
Using the Immunochip custom SNP array, which was designed for dense genotyping of 186 loci identified through genome-wide association studies (GWAS), we analyzed 11,475 individuals with rheumatoid arthritis (cases) of European ancestry and 15,870 controls for 129,464 markers. We combined these data in a meta-analysis with GWAS data from additional independent cases (n = 2,363) and controls (n = 17,872). We identified 14 new susceptibility loci, 9 of which were associated with rheumatoid arthritis overall and 5 of which were specifically associated with disease that was positive for anticitrullinated peptide antibodies, bringing the number of confirmed rheumatoid arthritis risk loci in individuals of European ancestry to 46. We refined the peak of association to a single gene for 19 loci, identified secondary independent effects at 6 loci and identified association to low-frequency variants at 4 loci. Bioinformatic analyses generated strong hypotheses for the causal SNP at 7 loci. This study illustrates the advantages of dense SNP mapping analysis to inform subsequent functional investigations.
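The abstract does not say which meta-analysis model was used to combine the Immunochip and GWAS data. As one common possibility, the sketch below shows a fixed-effect, inverse-variance-weighted combination of per-study effect estimates. The function name and the input numbers are hypothetical and are not results from the study.

```python
from math import sqrt


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect meta-analysis: weight each study by 1 / SE^2."""
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    # Pooled effect is the weighted mean of the study effects.
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    # Pooled standard error shrinks as total weight grows.
    se = sqrt(1.0 / total_weight)
    return beta, se, beta / se  # effect, standard error, z-score


# Hypothetical log-odds-ratio estimates for one SNP in two datasets.
beta, se, z = inverse_variance_meta(betas=[0.20, 0.10],
                                    standard_errors=[0.05, 0.10])
print(f"pooled beta = {beta:.3f}, SE = {se:.4f}, z = {z:.2f}")
```

The more precise study dominates the pooled estimate, which is why adding a large independent control set can sharpen association signals even when its own per-SNP evidence is modest.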
More accurate and precise phenotyping strategies are necessary to empower high-resolution linkage mapping and genome-wide association studies and for training genomic selection models in plant improvement. Within this framework, the objective of modern phenotyping is to increase the accuracy, precision and throughput of phenotypic estimation at all levels of biological organization while reducing costs and minimizing labor through automation, remote sensing, improved data integration and experimental design. Much like the efforts to optimize genotyping during the 1980s and 1990s, designing effective phenotyping initiatives today requires multi-faceted collaborations between biologists, computer scientists, statisticians and engineers. Robust phenotyping systems are needed to characterize the full suite of genetic factors that contribute to quantitative phenotypic variation across cells, organs and tissues, developmental stages, years, environments, species and research programs. Next-generation phenotyping generates significantly more data than previously and requires novel data management, access and storage systems, increased use of ontologies to facilitate data integration, and new statistical tools for enhancing experimental design and extracting biologically meaningful signal from environmental and experimental noise. To ensure relevance, the implementation of efficient and informative phenotyping experiments also requires familiarity with diverse germplasm resources, population structures, and target populations of environments. Today, phenotyping is quickly emerging as the major operational bottleneck limiting the power of genetic analysis and genomic prediction. The challenge for the next generation of quantitative geneticists and plant breeders is not only to understand the genetic basis of complex trait variation, but also to use that knowledge to efficiently synthesize twenty-first century crop varieties.
Abstract
The UCSC Genome Browser (https://genome.ucsc.edu) is a graphical viewer for exploring genome annotations. For almost two decades, the Browser has provided visualization tools for genetics and molecular biology and continues to add new data and features. This year, we added a new tool that lets users interactively arrange existing graphing tracks into new groups. Other software additions include new formats for chromosome interactions, a ChIP-Seq peak display for track hubs and improved support for HGVS. On the annotation side, we have added gnomAD, TCGA expression, RefSeq Functional elements, GTEx eQTLs, CRISPR Guides and SNPedia, and created a 30-way primate alignment on the human genome. Nine assemblies now have RefSeq-mapped gene models.
Genome-wide association studies have identified breast cancer risk variants in over 150 genomic regions, but the mechanisms underlying risk remain largely unknown. These regions were explored by combining association analysis with in silico genomic feature annotations. We defined 205 independent risk-associated signals with the set of credible causal variants in each one. In parallel, we used a Bayesian approach (PAINTOR) that combines genetic association, linkage disequilibrium and enriched genomic features to determine variants with high posterior probabilities of being causal. Potentially causal variants were significantly over-represented in active gene regulatory regions and transcription factor binding sites. We applied our INQUISIT pipeline for prioritizing genes as targets of those potentially causal variants, using gene expression (expression quantitative trait loci), chromatin interaction and functional annotations. Known cancer drivers, transcription factors and genes in the developmental, apoptosis, immune system and DNA integrity checkpoint gene ontology pathways were over-represented among the highest-confidence target genes.
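One generic way to turn per-variant posterior probabilities into a credible set is to rank variants by posterior and keep adding them until a target coverage is reached. The sketch below illustrates that idea only. It is not necessarily the rule used to define the credible causal variants in this study, and it does not reproduce PAINTOR. The variant identifiers and probabilities are invented.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest top-ranked set of variants whose posteriors sum to >= coverage.

    `posteriors` maps variant ID -> posterior probability of being causal
    for a single association signal (values are expected to sum to ~1).
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant_id, probability in ranked:
        chosen.append(variant_id)
        cumulative += probability
        if cumulative >= coverage:
            break  # stop as soon as the target coverage is reached
    return chosen, cumulative


# Hypothetical posteriors for five variants at one signal.
example = {"var_A": 0.62, "var_B": 0.25, "var_C": 0.09,
           "var_D": 0.03, "var_E": 0.01}
variants, mass = credible_set(example, coverage=0.95)
print(variants, round(mass, 2))
```

Smaller credible sets indicate better-resolved signals, which is the situation in which downstream gene-targeting annotations are most informative.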
In just seven years, next-generation technologies have reduced the cost and increased the speed of DNA sequencing by four orders of magnitude, and experiments requiring many millions of sequencing reads are now routine. In research, sequencing is being applied not only to assemble genomes and to investigate the genetic basis of human disease, but also to explore myriad phenomena in organismic and cellular biology. In the clinic, the utility of sequence data is being intensively evaluated in diverse contexts, including reproductive medicine, oncology and infectious disease. A recurrent theme in the development of new sequencing applications is the creative 'recombination' of existing experimental building blocks. However, there remain many potentially high-impact applications of next-generation DNA sequencing that are not yet fully realized.