Tumor-derived cell lines have served as vital models to advance our understanding of oncogene function and therapeutic responses. Although substantial effort has been made to define the genomic constitution of cancer cell line panels, the transcriptome remains understudied. Here we describe RNA sequencing and single-nucleotide polymorphism (SNP) array analysis of 675 human cancer cell lines. We report comprehensive analyses of transcriptome features including gene expression, mutations, gene fusions and expression of non-human sequences. Of the 2,200 gene fusions catalogued, 1,435 consist of genes not previously found in fusions, providing many leads for further investigation. We combine multiple genome and transcriptome features in a pathway-based approach to enhance prediction of response to targeted therapeutics. Our results provide a valuable resource for studies that use cancer cell lines.
Plant cells undergo two types of cell cycles: the mitotic cycle, in which DNA replication is coupled to mitosis, and the endocycle, in which DNA replication occurs in the absence of cell division. To investigate DNA replication programs in these two types of cell cycles, we pulse-labeled intact root tips of maize (Zea mays) with 5-ethynyl-2'-deoxyuridine (EdU) and used flow sorting of nuclei to examine DNA replication timing (RT) during the transition from a mitotic cycle to an endocycle. Comparison of the sequence-based RT profiles showed that most regions of the maize genome replicate at the same time during S phase in mitotic and endocycling cells, despite the need to replicate twice as much DNA in the endocycle and the fact that endocycling is typically associated with cell differentiation. However, regions collectively corresponding to 2% of the genome displayed significant changes in timing between the two types of cell cycles. The majority of these regions are small (median size 135 kb), shift to a later RT in the endocycle, and are enriched for genes expressed in the root tip. We also found larger regions that shifted RT in the centromeres of seven of the ten maize chromosomes. These regions covered the majority of the previously defined functional centromere, which ranges from 1 to 2 Mb in size in the reference genome. They replicate mainly during mid S phase in mitotic cells but primarily in late S phase of the endocycle. In contrast, the immediately adjacent pericentromere sequences are primarily late replicating in both cell cycles. Analysis of CENH3 enrichment levels in 8C versus 2C nuclei suggested that there is only a partial replacement of CENH3 nucleosomes after endocycle replication is complete. The shift to later replication of centromeres and the possible reduction in CENH3 enrichment after endocycle replication are consistent with the hypothesis that centromeres are inactivated when their function is no longer needed.
DNA replication during S phase in eukaryotes is a highly regulated process that ensures the accurate transmission of genetic material to daughter cells during cell division. Replication follows a well-defined temporal program, which has been studied extensively in humans, Drosophila, and yeast, where it is clear that the replication process is both temporally and spatially ordered. The replication timing (RT) program is increasingly considered to be a functional readout of genomic features and chromatin organization. Although there is increasing evidence that plants display important differences in their DNA replication process compared to animals, RT programs in plants have not been extensively studied. To address this deficiency, we developed an improved protocol for genome-wide RT analysis by sequencing newly replicated DNA ("Repli-seq") and applied it to the characterization of RT in maize root tips. Our protocol uses 5-ethynyl-2'-deoxyuridine (EdU) to label replicating DNA in vivo in intact roots. It also eliminates the need for cell synchronization, with its frequently associated chemical perturbations, and for cell cultures, which can accumulate genetic and epigenetic differences over time. EdU can be fluorescently labeled under mild conditions and does not degrade subnuclear structure, so labeled and unlabeled nuclei can be distinguished by flow sorting, effectively eliminating the contamination that can result from sorting on DNA content alone. We also developed a pipeline for analyzing and classifying regions of replication and present it as a point-and-click application, Repliscan, that eliminates the need for command-line programming.
Replication timing experiments that use label incorporation and high-throughput sequencing produce peaked data similar to those of ChIP-Seq experiments. However, differences in experimental design, coverage density, and possible results make traditional ChIP-Seq analysis methods inappropriate for replication timing data.
To accurately detect and classify regions of replication across the genome, we present Repliscan. Repliscan robustly normalizes the data, automatically removes outlying and uninformative data points, and classifies Repli-seq signals into discrete combinations of replication signatures. Its quality control steps and self-fitting methods make Repliscan generally applicable and more robust than previous methods that classify regions based on thresholds.
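To make the normalize, filter, and classify steps concrete, here is a minimal Python sketch. It is not Repliscan's implementation: the function names, the median-based G1 outlier rule, the ratio to G1 coverage, and the fixed `share_cutoff` are all illustrative assumptions. In particular, the fixed cutoff is the kind of threshold rule that Repliscan's self-fitting methods are described as improving on.

```python
from statistics import median

PHASES = ("E", "M", "L")  # early, mid and late S-phase fractions


def normalize(counts: list[float]) -> list[float]:
    """Scale raw per-window read counts to reads per million (RPM)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c * 1e6 / total for c in counts]


def classify_windows(phase_counts: dict[str, list[float]],
                     g1_counts: list[float],
                     share_cutoff: float = 0.25,
                     max_fold: float = 3.0) -> dict[int, str]:
    """Map each informative window index to a signature such as 'E', 'EM' or 'EML'.

    Hypothetical rules (not Repliscan's):
      * a window is uninformative if its G1 coverage is zero, and an outlier
        if it exceeds max_fold times the median G1 coverage;
      * each S-phase fraction is expressed as a ratio to G1 coverage;
      * a fraction joins the signature when its share of the window's
        summed ratios is at least share_cutoff.
    """
    g1 = normalize(g1_counts)
    norm = {p: normalize(phase_counts[p]) for p in PHASES}
    limit = max_fold * median(g1)
    calls: dict[int, str] = {}
    for i, g in enumerate(g1):
        if g == 0 or g > limit:
            continue  # drop uninformative or outlying windows
        ratios = {p: norm[p][i] / g for p in PHASES}
        total = sum(ratios.values())
        if total == 0:
            continue  # no S-phase signal at all in this window
        calls[i] = "".join(p for p in PHASES if ratios[p] / total >= share_cutoff)
    return calls


calls = classify_windows(
    {"E": [90, 10, 50, 5, 45], "M": [5, 80, 45, 5, 65], "L": [5, 10, 5, 90, 90]},
    g1_counts=[20, 20, 20, 20, 200],
)
print(calls)  # {0: 'E', 1: 'M', 2: 'EM', 3: 'L'}  (window 4 dropped as a G1 outlier)
```

Dividing by G1 coverage before comparing fractions is one common way to cancel copy-number and mappability effects that the S-phase and G1 libraries share; whether Repliscan does exactly this is not stated here.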
Repliscan is simple to use and effective on organisms with different genome sizes. Even with analysis windows as small as 1 kilobase, reliable profiles can be generated with as little as 2.4× coverage.
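A back-of-the-envelope check shows what 2.4× coverage means for 1 kb windows. This sketch assumes reads are spread uniformly across the genome and uses a hypothetical read length of 100 bp, which the text does not state. Since coverage C = N·L/G for N reads of length L on a genome of size G, a window of W bases expects C·W/L read starts, and Poisson counting noise on that count has a coefficient of variation of 1/sqrt(mean).

```python
import math


def reads_per_window(coverage: float, window_bp: int, read_len_bp: int) -> float:
    """Expected number of read starts in a window, assuming reads are
    distributed uniformly across the genome."""
    return coverage * window_bp / read_len_bp


def poisson_cv(mean_reads: float) -> float:
    """Coefficient of variation of a Poisson count: sd / mean = 1 / sqrt(mean)."""
    return 1 / math.sqrt(mean_reads)


# 2.4x coverage and 1 kb windows come from the text; 100 bp reads is an assumption.
n = reads_per_window(2.4, 1_000, 100)
print(n, round(poisson_cv(n), 3))  # 24.0 0.204
```

Under these assumptions each window collects about two dozen reads, so Poisson noise alone is roughly 20% per window before any smoothing; longer reads at the same coverage would give fewer reads per window and therefore more noise.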
DNA methylation is a chromatin modification that can provide epigenetic regulation of gene and transposon expression. Plants use several pathways to establish and maintain DNA methylation in specific sequence contexts. The chromomethylase (CMT) genes maintain CHG (where H = A, C or T) methylation. The RNA-directed DNA methylation (RdDM) pathway is important for CHH methylation. Transcriptome analysis was performed in a collection of lines carrying mutant alleles for CMT or RdDM-associated genes. While the majority of the transcriptome was not affected, we identified sets of genes and transposon families sensitive to context-specific decreases in DNA methylation in mutant lines. Many of the genes that are up-regulated in CMT mutant lines have high levels of CHG methylation, while genes that are differentially expressed in RdDM mutants are enriched for nearby mCHH islands, implicating context-specific DNA methylation in regulating the expression of a small number of genes. Many genes regulated by CMTs exhibit natural variation for DNA methylation and transcript abundance in a panel of diverse inbred lines. Transposon families with differential expression in the mutant genotypes show few defining features, though several families up-regulated in RdDM mutants show enriched expression in endosperm tissue, highlighting the potential importance of this pathway during reproduction. Taken together, our findings suggest that while the number of genes and transposon families whose expression is reproducibly affected by mild perturbations in context-specific methylation is small, there are distinct patterns for loci affected by RdDM and CMT mutants.
All plants and animals must replicate their DNA, using a regulated process to ensure that their genomes are completely and accurately replicated. DNA replication timing programs have been extensively studied in yeast and animal systems, but much less is known about the replication programs of plants. We report a novel adaptation of the “Repli-seq” assay for use in intact maize (Zea mays) root tips, which include several different cell lineages, and present whole-genome replication timing profiles from cells in early, mid, and late S phase of the mitotic cell cycle. Maize root tips have a complex replication timing program, including regions of distinct early, mid, and late S replication that each constitute between 20 and 24% of the genome, as well as other loci corresponding to ~32% of the genome that exhibit replication activity in two different time windows. Analyses of genomic, transcriptional, and chromatin features of the euchromatic portion of the maize genome provide evidence for a gradient of early replicating, open chromatin that transitions gradually to less open and less transcriptionally active chromatin replicating in mid S phase. Our genome-level analysis also demonstrated that the centromere core replicates in mid S phase, before heavily compacted classical heterochromatin, including pericentromeres and knobs, which replicates during late S phase.
Software containers are an important common currency for portability and reproducibility in the modern world of computing. While they are easy to share through public registries, usage documentation is often lacking, effectively leaving users with black boxes. RollingGantryCrane (RGC) is an open-source tool that takes generic software containers and automatically exposes the internal software through LMOD environment modules. Users provide the URLs of the containers they wish to use, and RGC pulls the containers, collects descriptive metadata from public repositories, scans for non-standard executables on each container's search path, and generates LMOD modulefiles with help text and shell functions that transparently expose applications directly to the command line interface. RGC has been used in production since early 2019 on five systems at the Texas Advanced Computing Center (TACC), allowing users to create bespoke modules, and serves over 3,000 unique tools from the BioContainers project.
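The modulefile-generation step can be sketched in a few lines of Python. This is not RGC's template: the `BASELINE` set of "standard" tools, the image path, and the use of `singularity exec` as the wrapper command are assumptions for illustration. The generated Lua uses Lmod's `help`, `whatis`, and `set_shell_function` modulefile functions, with Lua long strings (`[[...]]`) so the double quotes in `"$@"` need no escaping.

```python
# Hypothetical baseline of tools found in most base images; anything else on
# the container's search path is treated as "non-standard" and gets wrapped.
BASELINE = {"bash", "sh", "ls", "cat", "env", "python3", "perl"}


def nonstandard(executables: list[str], baseline: set[str] = BASELINE) -> list[str]:
    """Return the container executables that are not in the baseline, sorted."""
    return sorted(set(executables) - baseline)


def lua_long_string(text: str) -> str:
    """Quote text as a Lua long string so embedded double quotes need no escaping."""
    return "[[" + text + "]]"


def make_modulefile(name: str, version: str, description: str,
                    image: str, executables: list[str]) -> str:
    """Render a minimal Lmod (Lua) modulefile that exposes each non-standard
    executable as a shell function running inside the container."""
    lines = [
        "help(" + lua_long_string(f"{name} {version}: {description}") + ")",
        f'whatis("Name: {name}")',
        f'whatis("Version: {version}")',
    ]
    for exe in nonstandard(executables):
        bash = f'singularity exec {image} {exe} "$@"'  # forward all arguments (bash)
        csh = f"singularity exec {image} {exe} $*"     # forward all arguments (csh)
        lines.append(
            f'set_shell_function("{exe}", {lua_long_string(bash)}, {lua_long_string(csh)})'
        )
    return "\n".join(lines) + "\n"


# The image path is a hypothetical local file, not a real registry location.
print(make_modulefile("samtools", "1.9", "Tools for manipulating SAM/BAM files",
                      "/containers/samtools_1.9.sif", ["bash", "samtools", "ls"]))
```

Because the shell function shares the tool's name, a user who loads the module can type `samtools ...` as if it were installed natively, which matches the "transparently expose applications" behavior described above.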
The computational complexity of genome assembly has decreased significantly since the advent of third-generation long-read technologies. However, genome annotation still requires significant manual effort from scientists to produce the trustworthy annotations that most bioinformatic analyses require. Current methods for automatic eukaryotic annotation rely on sequence homology, structure, or repeat detection, and each method requires a separate tool, so producing a final annotation involves a complex ensemble of tools. Beyond the nucleotide sequence, one important component of genetic architecture is the presence of epigenetic marks, including DNA methylation. However, no automatic annotation tools currently use this valuable information. As methylation data become more widely available from nanopore sequencing technology, tools that take advantage of patterns in these data will be in demand. The goal of this dissertation was to improve the annotation process by developing and training a recurrent neural network (RNN) on trusted annotations to recognize multiple classes of elements from both the reference sequence and DNA methylation. We found that our proposed tool, RNNotate, detected fewer coding elements than GlimmerHMM and Augustus, but its predictions were more often correct. When predicting transposable elements, RNNotate was more accurate than both RepeatMasker and RepeatScout. Additionally, RNNotate was significantly less sensitive when trained and run without DNA methylation, supporting our hypothesis that methylation improves annotation. To the best of our knowledge, we are not only the first group to use recurrent neural networks for eukaryotic genome annotation, but we also break new ground in the input data by using DNA methylation patterns for prediction.
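One way to feed both the reference sequence and DNA methylation to an RNN is to give each genomic position a small feature vector. The Python sketch below assumes an encoding that the text does not describe: one-hot A/C/G/T, a methylation level in [0, 1], and a flag marking whether a methylation call exists at that position. The `use_methylation` switch loosely mirrors the comparison of training with and without methylation; none of this is RNNotate's actual feature set.

```python
from typing import Optional

BASES = "ACGT"


def encode(seq: str, meth: list[Optional[float]],
           use_methylation: bool = True) -> list[list[float]]:
    """Per-position feature vectors for a sequence model (assumed encoding).

    Channels: one-hot A, C, G, T (all zero for N or other symbols), then,
    if use_methylation is True, the methylation level in [0, 1] and a flag
    saying whether a methylation call exists at that position."""
    if len(seq) != len(meth):
        raise ValueError("seq and meth must have the same length")
    features = []
    for base, level in zip(seq.upper(), meth):
        row = [1.0 if base == b else 0.0 for b in BASES]
        if use_methylation:
            observed = level is not None
            row += [float(level) if observed else 0.0, 1.0 if observed else 0.0]
        features.append(row)
    return features


def windows(features: list[list[float]], size: int, step: int) -> list[list[list[float]]]:
    """Cut per-position features into fixed-length, possibly overlapping chunks
    that a recurrent network can consume one sequence at a time."""
    return [features[i:i + size] for i in range(0, len(features) - size + 1, step)]


x = encode("ACGN", [None, 0.8, None, None])
print(x[1])  # [0.0, 1.0, 0.0, 0.0, 0.8, 1.0]
```

The separate "observed" flag lets a model distinguish an unmethylated cytosine (level 0.0, flag 1.0) from a position with no methylation call at all (both 0.0).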
The Arabidopsis genome replicates in two noninteracting compartments during early/mid and late S phase.
Eukaryotes use a temporally regulated process, known as the replication timing program, to ensure that their genomes are fully and accurately duplicated during S phase. Replication timing programs are predictive of genomic features and activity and are considered to be functional readouts of chromatin organization. Although replication timing programs have been described for yeast and animal systems, much less is known about the temporal regulation of plant DNA replication or its relationship to genome sequence and chromatin structure. We used the thymidine analog, 5-ethynyl-2′-deoxyuridine, in combination with flow sorting and Repli-Seq to describe, at high-resolution, the genome-wide replication timing program for Arabidopsis (Arabidopsis thaliana) Col-0 suspension cells. We identified genomic regions that replicate predominantly during early, mid, and late S phase, and correlated these regions with genomic features and with data for chromatin state, accessibility, and long-distance interaction. Arabidopsis chromosome arms tend to replicate early while pericentromeric regions replicate late. Early and mid-replicating regions are gene-rich and predominantly euchromatic, while late regions are rich in transposable elements and primarily heterochromatic. However, the distribution of chromatin states across the different times is complex, with each replication time corresponding to a mixture of states. Early and mid-replicating sequences interact with each other and not with late sequences, but early regions are more accessible than mid regions. The replication timing program in Arabidopsis reflects a bipartite genomic organization with early/mid-replicating regions and late regions forming separate, noninteracting compartments. The temporal order of DNA replication within the early/mid compartment may be modulated largely by chromatin accessibility.