A substantial proportion of the genome of many species is derived from transposable elements (TEs). Moreover, through various self-copying mechanisms, TEs continue to proliferate in the genomes of most species. TEs have contributed numerous regulatory, transcript and protein innovations and have also been linked to disease. However, notwithstanding their demonstrated impact, many genomic studies still exclude them because their repetitive nature results in various analytical complexities. Fortunately, a growing array of methods and software tools is being developed to cater for them. This Review presents a summary of computational resources for TEs and highlights some of the challenges and remaining gaps in performing comprehensive genomic analyses that do not simply 'mask' repeats.
Abstract
We present a new update to MetaboAnalyst (version 4.0) for comprehensive metabolomic data analysis, interpretation, and integration with other omics data. Since the last major update in 2015, MetaboAnalyst has continued to evolve based on user feedback and technological advancements in the field. For this year's update, four new key features have been added to MetaboAnalyst 4.0, including: (1) real-time R command tracking and display coupled with the release of a companion MetaboAnalystR package; (2) an MS Peaks to Pathways module for prediction of pathway activity from untargeted mass spectral data using the mummichog algorithm; (3) a Biomarker Meta-analysis module for robust biomarker identification through the combination of multiple metabolomic datasets; and (4) a Network Explorer module for integrative analysis of metabolomics, metagenomics, and/or transcriptomics data. The user interface of MetaboAnalyst 4.0 has been reengineered to provide a more modern look and feel, as well as to give more space and flexibility to introduce new functions. The underlying knowledgebases (compound libraries, metabolite sets, and metabolic pathways) have also been updated based on the latest data from the Human Metabolome Database (HMDB). A Docker image of MetaboAnalyst is also available to facilitate download and local installation. MetaboAnalyst 4.0 is freely available at http://metaboanalyst.ca.
Cancer stem cells are critical for cancer initiation, development, and treatment resistance. Our understanding of these processes, and how they relate to glioblastoma heterogeneity, is limited. To overcome these limitations, we performed single-cell RNA sequencing on 53,586 adult glioblastoma cells and 22,637 normal human fetal brain cells, and compared the lineage hierarchy of the developing human brain to the transcriptome of cancer cells. We find a conserved neural tri-lineage cancer hierarchy centered around glial progenitor-like cells. We also find that this progenitor population contains the majority of the cancer's cycling cells and, as inferred using RNA velocity, is often the originator of the other cell types. Finally, we show that this hierarchical map can be used to identify therapeutic targets specific to progenitor cancer stem cells. Our analyses show that normal brain development reconciles glioblastoma development, suggest a possible origin for glioblastoma hierarchy, and help to identify cancer stem cell-specific targets.
Although emerging evidence suggests that transposable elements (TEs) have contributed novel regulatory elements to the human genome, their global impact on transcriptional networks remains largely uncharacterized. Here we show that TEs have contributed to the human genome nearly half of its active elements. Using DNase I hypersensitivity data sets from ENCODE in normal, embryonic, and cancer cells, we found that 44% of open chromatin regions were in TEs and that this proportion reached 63% for primate-specific regions. We also showed that distinct subfamilies of endogenous retroviruses (ERVs) contributed significantly more accessible regions than expected by chance, with up to 80% of their instances in open chromatin. Based on these results, we further characterized 2,150 TE subfamily-transcription factor pairs that were bound in vivo or enriched for specific binding motifs, and observed that TEs contributing to open chromatin had higher levels of sequence conservation. We also showed that thousands of ERV-derived sequences were activated in a cell type-specific manner, especially in embryonic and cancer cells, and we demonstrated that this activity was associated with cell type-specific expression of neighboring genes. Taken together, these results demonstrate that TEs, and in particular ERVs, have contributed hundreds of thousands of novel regulatory elements to the primate lineage and reshaped the human transcriptional landscape.
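As a rough illustration of the kind of quantity reported above (the share of open chromatin regions that fall in TEs), the following minimal sketch counts how many regions overlap at least one TE annotation. It is not the authors' pipeline. The function names, the single-chromosome simplification, the half-open coordinates, and the 1 bp overlap criterion are all assumptions made for illustration; the abstract does not state how overlap was defined.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_overlapping(regions, te_annotations):
    """Fraction of regions overlapping any TE by at least 1 bp.

    Both inputs are lists of half-open (start, end) tuples on one
    chromosome (an assumption; real data would be grouped by chromosome).
    """
    if not regions:
        return 0.0
    tes = merge_intervals(te_annotations)
    starts = [start for start, _ in tes]
    hits = 0
    for r_start, r_end in regions:
        # Index of the last merged TE that starts before the region ends.
        i = bisect_right(starts, r_end - 1) - 1
        # Merged TEs are disjoint and sorted, so only this one can overlap
        # if any does; it overlaps when it ends after the region starts.
        if i >= 0 and tes[i][1] > r_start:
            hits += 1
    return hits / len(regions)


# Toy coordinates (hypothetical, not taken from ENCODE).
open_regions = [(100, 200), (300, 400), (500, 600), (700, 800)]
te_annotations = [(150, 160), (390, 450), (600, 650)]
fraction = fraction_overlapping(open_regions, te_annotations)
```

In practice this step is usually done with a dedicated interval tool over all chromosomes, and a stricter criterion (for example, a minimum overlap fraction) would change the resulting percentage.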
Transposable elements (TEs) are major components of eukaryotic genomes. However, the extent of their impact on genome evolution, function, and disease remains a matter of intense interrogation. The rise of genomics and large-scale functional assays has shed new light on the multi-faceted activities of TEs and implies that they should no longer be marginalized. Here, we introduce the fundamental properties of TEs and their complex interactions with their cellular environment, which are crucial to understanding their impact and manifold consequences for organismal biology. While we draw examples primarily from mammalian systems, the core concepts outlined here are relevant to a broad range of organisms.
Mammalian genomes are viewed as functional organizations that orchestrate spatial and temporal gene regulation. CTCF, the most characterized insulator-binding protein, has been implicated as a key genome organizer. However, little is known about CTCF-associated higher-order chromatin structures at a global scale. Here we applied chromatin interaction analysis by paired-end tag (ChIA-PET) sequencing to elucidate the CTCF-chromatin interactome in pluripotent cells. From this analysis, we identified 1,480 cis- and 336 trans-interacting loci with high reproducibility and precision. Associating these chromatin interaction loci with their underlying epigenetic states, promoter activities, enhancer binding and nuclear lamina occupancy, we uncovered five distinct chromatin domains that suggest potential new models of CTCF function in chromatin organization and transcriptional control. Specifically, CTCF interactions demarcate chromatin-nuclear membrane attachments and influence proper gene expression through extensive cross-talk between promoters and regulatory elements. This highly complex nuclear organization offers insights toward the unifying principles that govern genome plasticity and function.
The kidney and upper urinary tract develop through reciprocal interactions between the ureteric bud and the surrounding mesenchyme. Ureteric bud branching forms the arborized collecting duct system of the kidney, while ureteric tips promote nephron formation from dedicated progenitor cells. While nephron progenitor cells are relatively well characterized, the origin of ureteric bud progenitors has received little attention so far. It is well established that the ureteric bud is induced from the nephric duct, an epithelial duct derived from the intermediate mesoderm of the embryo. However, the cell state transitions underlying the progression from intermediate mesoderm to nephric duct and ureteric bud remain unknown. Here we show that nephric duct morphogenesis results from the coordinated organization of four major progenitor cell populations. Using single-cell RNA-seq and Cluster RNA-seq, we show that these progenitors emerge in time and space according to a stereotypical pattern. We identify the transcription factors Tfap2a/b and Gata3 as critical coordinators of this progenitor cell progression. This study provides a better understanding of the cellular origin of the renal collecting duct system and associated urinary tract developmental diseases, which may inform guided differentiation of functional kidney tissue.
Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two-thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (~30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non-TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ~30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ~35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.
Epigenomic studies that use next-generation sequencing experiments typically rely on the alignment of reads to a reference sequence. However, because of genetic diversity and the diploid nature of the human genome, we hypothesize that using a generic reference could lead to incorrectly mapped reads and bias downstream results.
We show that accounting for genetic variation using a modified reference genome or a de novo assembled genome can alter histone H3K4me1 and H3K27ac ChIP-seq peak calls either by creating new personal peaks or by the loss of reference peaks. Using permissive cutoffs, modified reference genomes are found to alter approximately 1% of peak calls while de novo assembled genomes alter up to 5% of peaks. We also show statistically significant differences in the number of reads observed in regions associated with the new, altered, and unchanged peaks. We report that short insertions and deletions (indels), followed by single nucleotide variants (SNVs), have the highest probability of modifying peak calls. We show that using a graph personalized genome represents a reasonable compromise between modified reference genomes and de novo assembled genomes. We demonstrate that altered peaks have a genomic distribution typical of other peaks.
Analyzing epigenomic datasets with personalized and graph genomes allows the recovery of new peaks enriched for indels and SNVs. These altered peaks are more likely to differ between individuals and, as such, could be relevant in the study of various human phenotypes.
Human endogenous retrovirus subfamily H (HERVH) is a class of transposable elements expressed preferentially in human embryonic stem cells (hESCs). Here, we report that the long terminal repeats of HERVH function as enhancers and that HERVH is a nuclear long noncoding RNA required to maintain hESC identity. Furthermore, HERVH is associated with OCT4, coactivators and Mediator subunits. Together, these results uncover a new role of species-specific transposable elements in hESCs.