Since the completion of the Human Genome Project in 2003, extraordinary progress has been made in genome sequencing technologies, which has led to a decreased cost per megabase and an increase in the number and diversity of sequenced genomes. An astonishing complexity of genome architecture has been revealed, driving these sequencing technologies to even greater advances. Some approaches maximize the number of bases sequenced in the least amount of time, generating a wealth of data that can be used to understand increasingly complex phenotypes. Other approaches aim to sequence longer contiguous pieces of DNA, which are essential for resolving structurally complex regions. These and other strategies are providing researchers and clinicians with a variety of tools to probe genomes in greater depth, leading to an enhanced understanding of how genome sequence variants underlie phenotype and disease.
Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
Exome sequencing of 343 families, each with a single child on the autism spectrum and at least one unaffected sibling, reveals de novo small indels and point substitutions, which come mostly from the paternal line in an age-dependent manner. We do not see significantly greater numbers of de novo missense mutations in affected versus unaffected children, but gene-disrupting mutations (nonsense, splice site, and frame shifts) are twice as frequent (59 versus 28). Based on this differential and the number of recurrent and total targets of gene disruption found in our and similar studies, we estimate between 350 and 400 autism susceptibility genes. Many of the disrupted genes in these studies are associated with the fragile X protein, FMRP, reinforcing links between autism and synaptic plasticity. We find FMRP-associated genes are under greater purifying selection than the remainder of genes and suggest they are especially dosage-sensitive targets of cognitive disorders.
► De novo mutations derive mainly from the paternal line in an age-dependent manner
► Mutations disrupting genes are twice as frequent in affected as unaffected siblings
► Many disrupted genes are associated with the fragile X protein, FMRP
► FMRP-associated genes are under unexpectedly strong purifying selection
Iossifov et al. use exome sequencing of 343 families affected by autism to identify de novo gene mutations associated with the disorder. Many of the mutated genes are associated with the fragile X protein FMRP, indicating new links between autism and synaptic plasticity.
Heterochromatin has been defined as deeply staining chromosomal material that remains condensed in interphase, whereas euchromatin undergoes de-condensation. Heterochromatin is found near centromeres and telomeres, but interstitial sites of heterochromatin (knobs) are common in plant genomes and were first described in maize. These regions are repetitive and late-replicating. In Drosophila, heterochromatin influences gene expression, a heterochromatin phenomenon called position effect variegation. Similarities between position effect variegation in Drosophila and gene silencing in maize mediated by "controlling elements" (that is, transposable elements) led in part to the proposal that heterochromatin is composed of transposable elements, and that such elements scattered throughout the genome might regulate development. Using microarray analysis, we show that heterochromatin in Arabidopsis is determined by transposable elements and related tandem repeats, under the control of the chromatin remodelling ATPase DDM1 (Decrease in DNA Methylation 1). Small interfering RNAs (siRNAs) correspond to these sequences, suggesting a role in guiding DDM1. We also show that transposable elements can regulate genes epigenetically, but only when inserted within or very close to them. This probably accounts for the regulation by DDM1 and the DNA methyltransferase MET1 of the euchromatic, imprinted gene FWA, as its promoter is provided by transposable-element-derived tandem repeats that are associated with siRNAs.
During germ cell and preimplantation development, mammalian cells undergo nearly complete reprogramming of DNA methylation patterns. We profiled the methylomes of human and chimp sperm as a basis for comparison to methylation patterns of embryonic stem cells (ESCs). Although the majority of promoters escape methylation in both ESCs and sperm, the corresponding hypomethylated regions show substantial structural differences. Repeat elements are heavily methylated in both germ and somatic cells; however, retrotransposons from several subfamilies evade methylation more effectively during male germ cell development, whereas other subfamilies show the opposite trend. Comparing methylomes of human and chimp sperm revealed a subset of differentially methylated promoters and strikingly divergent methylation in retrotransposon subfamilies, with an evolutionary impact that is apparent in the underlying genomic sequence. Thus, the features that determine DNA methylation patterns differ between male germ cells and somatic cells, and elements of these features have diverged between humans and chimpanzees.
► Single-nucleotide resolution methylomes of human and chimp sperm were produced
► ES cell and sperm hypomethylated regions (HMRs) are structurally distinct
► A large number of repeats evade de novo methylation exclusively in sperm
► The developmental period spent methylated determines pressure for CpG depletion
Comparative analysis of DNA methylation in human and chimp germ cells and human embryonic stem cells provides insights into the coevolution of the genome and the epigenome in humans and suggests that changes to the epigenome may precede changes to the underlying sequence.
Cancers are highly heterogeneous and contain many passenger and driver mutations. To functionally identify tumor suppressor genes relevant to human cancer, we compiled pools of short hairpin RNAs (shRNAs) targeting the mouse orthologs of genes recurrently deleted in a series of human hepatocellular carcinomas and tested their ability to promote tumorigenesis in a mosaic mouse model. In contrast to randomly selected shRNA pools, many deletion-specific pools accelerated hepatocarcinogenesis in mice. Through further analysis, we identified and validated 13 tumor suppressor genes, 12 of which had not been linked to cancer before. One gene, XPO4, encodes a nuclear export protein whose substrate, EIF5A2, is amplified in human tumors, is required for proliferation of XPO4-deficient tumor cells, and promotes hepatocellular carcinoma in mice. Our results establish the feasibility of in vivo RNAi screens and illustrate how combining cancer genomics, RNA interference, and mosaic mouse models can facilitate the functional annotation of the cancer genome.
Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, has become available, and we used this for sequencing the Saccharomyces cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr, specifically for Oxford Nanopore reads, because existing packages were incapable of assembling the long read lengths (5-50 kbp) at such high error rates (between ∼5% and 40% error). With this new method, we were able to perform a hybrid error correction of the nanopore reads using complementary MiSeq data and produce a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the assembly has >99.88% consensus identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.
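The contig N50 quoted above is a standard contiguity statistic: the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that calculation only. The function name `n50` and the toy contig lengths are hypothetical illustrations, not data or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assemblies (arbitrary units), not values from the paper.
fragmented = [60, 55, 50, 40, 30, 20, 10]
contiguous = [700, 150, 100, 50]

print(n50(fragmented))  # the third-longest contig crosses the halfway mark
print(n50(contiguous))  # the single longest contig already covers half
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about correctness, which is why the abstract reports consensus identity against the reference alongside it.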
This study exploits time, the relatively unexplored fourth dimension of gene regulatory networks (GRNs), to learn the temporal transcriptional logic underlying dynamic nitrogen (N) signaling in plants. Our “just-in-time” analysis of time-series transcriptome data uncovered a temporal cascade of cis elements underlying dynamic N signaling. To infer transcription factor (TF)-target edges in a GRN, we applied a time-based machine learning method to 2,174 dynamic N-responsive genes. We experimentally determined a network precision cutoff, using TF-regulated genome-wide targets of three TF hubs (CRF4, SNZ, and CDF1), used to “prune” the network to 155 TFs and 608 targets. This network precision was reconfirmed using genome-wide TF-target regulation data for four additional TFs (TGA1, HHO5/6, and PHL1) not used in network pruning. These higher-confidence edges in the GRN were further filtered by independent TF-target binding data, used to calculate a TF “N-specificity” index. This refined GRN identifies the temporal relationship of known/validated regulators of N signaling (NLP7/8, TGA1/4, NAC4, HRS1, and LBD37/38/39) and 146 additional regulators. Six TFs (CRF4, SNZ, CDF1, HHO5/6, and PHL1) validated herein regulate a significant number of genes in the dynamic N response, targeting 54% of N-uptake/assimilation pathway genes. Phenotypically, inducible overexpression of CRF4 in planta regulates genes resulting in altered biomass, root development, and 15NO₃⁻ uptake, specifically under low-N conditions. This dynamic N-signaling GRN now provides the temporal “transcriptional logic” for 155 candidate TFs to improve nitrogen use efficiency with potential agricultural applications. Broadly, these time-based approaches can uncover the temporal transcriptional logic for any biological response system in biology, agriculture, or medicine.
The SK-BR-3 cell line is one of the most important models for HER2+ breast cancers, which affect one in five breast cancer patients. SK-BR-3 is known to be highly rearranged, although much of the variation is in complex and repetitive regions that may be underreported. Addressing this, we sequenced SK-BR-3 using long-read single molecule sequencing from Pacific Biosciences and developed one of the most detailed maps of structural variations (SVs) in a cancer genome available, with nearly 20,000 variants present, most of which were missed by short-read sequencing. Surrounding the important ERBB2 oncogene (also known as HER2), we discover a complex sequence of nested duplications and translocations, suggesting a punctuated progression. Full-length transcriptome sequencing further revealed several novel gene fusions within the nested genomic variants. Combining long-read genome and transcriptome sequencing enables an in-depth analysis of how SVs disrupt the genome and sheds new light on the complex mechanisms involved in cancer genome evolution.
Elevated levels of CO2 and temperature can both affect plant growth and development, but the signalling pathways regulating these processes are still obscure. MicroRNAs function to silence gene expression, and environmental stresses can alter their expression. Here we identify, using the small RNA-sequencing method, microRNAs whose expression changes significantly in response to either a doubling of the atmospheric CO2 concentration or a temperature increase of 3-6 °C. Notably, nearly all CO2-influenced microRNAs are affected inversely by elevated temperature. Using the RNA-sequencing method, we determine strongly correlated expression changes between miR156/157 and miR172, and their target transcription factors under elevated CO2 concentration. Similar correlations are also found for microRNAs acting in auxin-signalling, stress responses and potential cell wall carbohydrate synthesis. Our results demonstrate that both CO2 and temperature alter microRNA expression to affect Arabidopsis growth and development, and that a miR156/157- and miR172-regulated transcriptional network might underlie the onset of early flowering induced by increasing CO2.