Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package.
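As an illustration of the network idea described above, the following minimal sketch groups the UMIs observed at a single mapping position into connected components, joining two UMIs when they differ at no more than one base. It is a simplified stand-in written for this example and should not be read as the algorithm implemented in UMI-tools; the function names, the distance threshold of 1 and the toy reads are all assumptions.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def group_umis(umis, max_dist=1):
    """Group the UMIs seen at one locus (hypothetical helper).

    Returns one (representative UMI, total read count) pair per group,
    where a group is a connected component of the UMI network.
    """
    counts = Counter(umis)
    nodes = list(counts)
    # An edge joins two UMIs that differ at max_dist or fewer positions.
    neighbours = {u: set() for u in nodes}
    for a, b in combinations(nodes, 2):
        if hamming(a, b) <= max_dist:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Depth-first search to collect connected components.
    seen = set()
    groups = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in neighbours[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        # The most abundant UMI stands in for the whole component.
        representative = max(component, key=counts.get)
        groups.append((representative, sum(counts[u] for u in component)))
    return groups


# Toy data (assumed): two abundant UMIs each with a one-base error copy.
reads = ["ACGT"] * 10 + ["ACGA"] + ["TTTT"] * 5 + ["TTTA"] + ["GGCC"] * 3
groups = group_umis(reads)
naive_count = len(set(reads))   # counts every distinct UMI string
network_count = len(groups)     # counts connected components instead
```

On this toy input, counting distinct UMI strings yields five molecules, whereas grouping by connected components yields three, because the two single-read UMIs are absorbed into their abundant neighbours.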
A common question in genomic analysis is whether two sets of genomic intervals overlap significantly. This question arises, for example, when interpreting ChIP-Seq or RNA-Seq data in functional terms. Because genome organization is complex, answering this question is non-trivial.
We present Genomic Association Test (GAT), a tool for estimating the significance of overlap between multiple sets of genomic intervals. GAT implements a null model that the two sets of intervals are placed independently of one another, but allows each set's density to depend on external variables, for example, isochore structure or chromosome identity. GAT estimates statistical significance based on simulation and controls for multiple tests using the false discovery rate.
GAT's source code, documentation and tutorials are available at http://code.google.com/p/genomic-association-tester.
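The simulation-based test described above can be sketched as follows. This is a deliberately reduced illustration and not GAT's implementation: it uses a single linear workspace, places each segment uniformly at random, ignores conditioning on isochores or chromosomes, and omits the false discovery rate step. All names, coordinates and the number of simulations are assumptions.

```python
import random


def covered(intervals):
    """Set of base positions covered by half-open (start, end) intervals."""
    return {p for start, end in intervals for p in range(start, end)}


def overlap_pvalue(segments, annotations, workspace_len, n_sim=1000, seed=0):
    """Empirical p-value for the observed overlap (hypothetical helper).

    Null model: each segment keeps its length but is placed uniformly at
    random in [0, workspace_len), independently of the annotations.
    """
    rng = random.Random(seed)
    annotated = covered(annotations)
    observed = len(covered(segments) & annotated)
    at_least_as_large = 0
    for _ in range(n_sim):
        shuffled = []
        for start, end in segments:
            length = end - start
            new_start = rng.randrange(0, workspace_len - length + 1)
            shuffled.append((new_start, new_start + length))
        if len(covered(shuffled) & annotated) >= observed:
            at_least_as_large += 1
    # The add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_large + 1) / (n_sim + 1)


# Toy coordinates (assumed): two of the three segments sit inside annotations.
annotations = [(1000, 1500), (4000, 4500)]
segments = [(1100, 1200), (4100, 4200), (8000, 8100)]
observed, p_value = overlap_pvalue(segments, annotations, workspace_len=10_000)
```

Here `observed` is the number of overlapping bases in the real data and `p_value` is the fraction of simulations with at least that much overlap. When many annotation sets are tested, these p-values would still need a multiple-testing correction.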
Promiscuous gene expression (PGE) by thymic epithelial cells (TEC) is essential for generating a diverse T cell antigen receptor repertoire tolerant to self-antigens, and thus for avoiding autoimmunity. Nevertheless, the extent and nature of this unusual expression program within TEC populations and single cells are unknown. Using deep transcriptome sequencing of carefully identified mouse TEC subpopulations, we discovered a program of PGE that is common between medullary (m) and cortical TEC, further elaborated in mTEC, and completed in mature mTEC expressing the autoimmune regulator gene (Aire). TEC populations are capable of expressing up to 19,293 protein-coding genes, the highest number of genes known to be expressed in any cell type. Remarkably, in mouse mTEC, Aire expression alone positively regulates 3980 tissue-restricted genes. Notably, the tissue specificities of these genes include known targets of autoimmunity in human AIRE deficiency. Led by the observation that genes induced by Aire expression are generally characterized by a repressive chromatin state in somatic tissues, we found these genes to be strongly associated with H3K27me3 marks in mTEC. Our findings are consistent with AIRE targeting and inducing the promiscuous expression of genes previously epigenetically silenced by Polycomb group proteins. Comparison of the transcriptomes of 174 single mTEC indicates that genes induced by Aire expression are transcribed stochastically at low cell frequency. Furthermore, when present, Aire expression-dependent transcript levels were 16-fold higher, on average, in individual TEC than in the mTEC population.
Early reports indicate that long non-coding RNAs (lncRNAs) are novel regulators of biological responses. However, their role in the human innate immune response, which provides the initial defence against infection, is largely unexplored. To address this issue, here we characterize the long non-coding RNA transcriptome in primary human monocytes using RNA sequencing. We identify 76 enhancer RNAs (eRNAs), 40 canonical lncRNAs, 65 antisense lncRNAs and 35 regions of bidirectional transcription (RBT) that are differentially expressed in response to bacterial lipopolysaccharide (LPS). Crucially, we demonstrate that knockdown of nuclear-localized, NF-κB-regulated, eRNAs (IL1β-eRNA) and RBT (IL1β-RBT46) surrounding the IL1β locus, attenuates LPS-induced messenger RNA transcription and release of the proinflammatory mediators, IL1β and CXCL8. We predict that lncRNAs can be important regulators of the human innate immune response.
Long noncoding RNAs (lncRNAs) are potentially important regulators of cell differentiation and development, but little is known about their roles in B lymphocytes. Using RNA-seq and de novo transcript assembly, we identified 4516 lncRNAs expressed in 11 stages of B-cell development and activation. Most of these lncRNAs have not been previously detected, even in the closely related T-cell lineage. Comparison with lncRNAs previously described in human B cells identified 185 mouse lncRNAs that have human orthologs. Using chromatin immunoprecipitation-seq, we classified 20% of the lncRNAs as either enhancer-associated (eRNA) or promoter-associated RNAs. We identified 126 eRNAs whose expression closely correlated with the nearest coding gene, thereby indicating the likely location of numerous enhancers active in the B-cell lineage. Furthermore, using this catalog of newly discovered lncRNAs, we show that PAX5, a transcription factor required to specify the B-cell lineage, bound to and regulated the expression of 109 lncRNAs in pro-B and mature B cells and 184 lncRNAs in acute lymphoblastic leukemia.
•A total of 4516 lncRNAs were identified across multiple stages of B-cell development and activation.
Whole-genome sequencing (WGS) is becoming widely used in clinical medicine in diagnostic contexts and to inform treatment choice. Here we evaluate the potential of the Oxford Nanopore Technologies (ONT) MinION long-read sequencer for routine WGS by sequencing the reference sample NA12878 and the genome of an individual with ataxia-pancytopenia syndrome and severe immune dysregulation. We develop and apply a novel reference panel-free analytical method to infer and then exploit phase information, which improves single-nucleotide variant (SNV) calling performance from otherwise modest levels. In the clinical sample, we identify and directly phase two non-synonymous de novo variants in SAMD9L (OMIM #159550), inferring that they lie on the same paternal haplotype. Whilst consensus SNV-calling error rates from ONT data remain substantially higher than those from short-read methods, we demonstrate the substantial benefits of analytical innovation. Ongoing improvements to base-calling and SNV-calling methodology must continue for nanopore sequencing to establish itself as a primary method for clinical WGS.
The mechanisms by which the major Polycomb group (PcG) complexes PRC1 and PRC2 are recruited to target sites in vertebrate cells are not well understood. Building on recent studies that determined a reciprocal relationship between DNA methylation and Polycomb activity, we demonstrate that, in methylation-deficient embryonic stem cells (ESCs), CpG density combined with antagonistic effects of H3K9me3 and H3K36me3 redirects PcG complexes to pericentric heterochromatin and gene-rich domains. Surprisingly, we find that PRC1-linked H2A monoubiquitylation is sufficient to recruit PRC2 to chromatin in vivo, suggesting a mechanism through which recognition of unmethylated CpG determines the localization of both PRC1 and PRC2 at canonical and atypical target sites. We discuss our data in light of emerging evidence suggesting that PcG recruitment is a default state at licensed chromatin sites, mediated by interplay between CpG hypomethylation and counteracting H3 tail modifications.
•Absence of DNA methylation recruits Polycomb complexes to pericentric heterochromatin
•H3K9me3 antagonizes activity of PRC2, but not PRC1, at pericentric heterochromatin
•CpG density and antagonism by H3 modifications define genome-wide Polycomb occupancy
•PRC1-mediated H2AK119u1 recruits PRC2 and H3K27me3
Polycomb group proteins are important repressors of developmentally regulated genes, but how these complexes are recruited to their target genes is still largely unknown. In this study, Cooper et al. show that Polycomb group protein recruitment is a combinatorial readout of unmethylated CpG density and antagonism by specific histone tail modifications. Unexpectedly, they also show that monoubiquitylated histone H2A, the modification produced by Polycomb repressor complex 1 (PRC1), is sufficient to recruit PRC2.
miRNAs have shown promise as potential biomarkers for acute myocardial infarction (AMI). However, the currently used quantitative real-time PCR (qRT-PCR) allows solely for relative expression of nucleic acids and is susceptible to day-to-day variability, which has limited the validity of using miRNAs as biomarkers. In this study we explored the technical qualities and diagnostic potential of a new technique, chip-based digital PCR, in quantifying miRNAs in patients with AMI and ischaemia-reperfusion injury (I/R). In a dilution series of synthetic C. elegans miR-39, chip-based digital PCR displayed a lower coefficient of variation (8.9% vs 46.3%) and a lower limit of detection (0.2 copies/μL vs 1.1 copies/μL) compared with qRT-PCR. In serum collected from 24 patients with ST-elevation myocardial infarction (STEMI) and 20 patients with stable coronary artery disease (CAD) after percutaneous coronary intervention (PCI), we used qRT-PCR and multiplexed chip-based digital PCR to quantify the serum levels of miRNA-21 and miRNA-499, as they have been validated in AMI in prior studies. In STEMI, I/R injury was assessed via measurement of ST-segment resolution (ST-R). Chip-based digital PCR revealed a statistically significant difference in miR-21 levels between stable CAD and STEMI groups (118.8 copies/μL vs 59 copies/μL; P=0.0300), whereas qRT-PCR was unable to reach significance (136.4 copies/μL vs 122.8 copies/μL; P=0.2273). For miR-499 levels, both chip-based digital PCR and qRT-PCR revealed statistically significant differences between stable CAD and STEMI groups (2 copies/μL vs 8.5 copies/μL, P=0.0011; 0 copies/μL vs 19.4 copies/μL; P<0.0001). There was no association between miR-21/499 levels and ST-R post-PCI.
Our results show that the chip-based digital PCR exhibits superior technical qualities and promises to be a superior method for quantifying miRNA levels in the circulation, which may become a more accurate and reproducible method for directly quantifying miRNAs, particularly for use in large multi-centre clinical trials.
Two-thirds of gene promoters in mammals are associated with regions of non-methylated DNA, called CpG islands (CGIs), which counteract the repressive effects of DNA methylation on chromatin. In cold-blooded vertebrates, computational CGI predictions often reside away from gene promoters, suggesting a major divergence in gene promoter architecture across vertebrates. By experimentally identifying non-methylated DNA in the genomes of seven diverse vertebrates, we instead reveal that non-methylated islands (NMIs) of DNA are a central feature of vertebrate gene promoters. Furthermore, NMIs are present at orthologous genes across vast evolutionary distances, revealing a surprising level of conservation in this epigenetic feature. By profiling NMIs in different tissues and developmental stages we uncover a unifying set of features that are central to the function of NMIs in vertebrates. Together these findings demonstrate an ancient logic for NMI usage at gene promoters and reveal an unprecedented level of epigenetic conservation across vertebrate evolution. DOI:http://dx.doi.org/10.7554/eLife.00348.001.
Sequence alignment underpins all of comparative genomics, yet it remains an incompletely solved problem. In particular, the statistical uncertainty within inferred alignments is often disregarded, while parametric or phylogenetic inferences are considered meaningless without confidence estimates. Here, we report on a theoretical and simulation study of pairwise alignments of genomic DNA at human-mouse divergence. We find that >15% of aligned bases are incorrect in existing whole-genome alignments, and we identify three types of alignment error, each leading to systematic biases in all algorithms considered. Careful modeling of the evolutionary process improves alignment quality; however, these improvements are modest compared with the remaining alignment errors, even with exact knowledge of the evolutionary model, emphasizing the need for statistical approaches to account for uncertainty. We develop a new algorithm, Marginalized Posterior Decoding (MPD), which explicitly accounts for uncertainties, is less biased and more accurate than other algorithms we consider, and reduces the proportion of misaligned bases by a third compared with the best existing algorithm. To our knowledge, this is the first nonheuristic algorithm for DNA sequence alignment to show robust improvements over the classic Needleman-Wunsch algorithm. Despite this, considerable uncertainty remains even in the improved alignments. We conclude that a probabilistic treatment is essential, both to improve alignment quality and to quantify the remaining uncertainty. This is becoming increasingly relevant with the growing appreciation of the importance of noncoding DNA, whose study relies heavily on alignments. Alignment errors are inevitable, and should be considered when drawing conclusions from alignments. Software and alignments to assist researchers in doing this are provided at http://genserv.anat.ox.ac.uk/grape/.
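For readers unfamiliar with the baseline named above, the classic Needleman-Wunsch recursion can be sketched in a few lines. The sketch returns only the optimal global alignment score, a single point estimate with no measure of uncertainty, and it does not implement MPD. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a: str, b: str, match=1, mismatch=-1, gap=-1) -> int:
    """Optimal global alignment score under a simple linear gap penalty."""
    # previous[j] holds the best score for aligning a[:i-1] with b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            ))
        previous = current
    return previous[-1]


identical = needleman_wunsch_score("ACGT", "ACGT")
one_deletion = needleman_wunsch_score("ACGT", "AGT")
```

Under these assumed scores, two identical four-base sequences score 4, and deleting one base from the second sequence lowers the score to 2 (three matches and one gap).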