Oesophageal adenocarcinoma (EAC) incidence is rapidly increasing in Western countries. A better understanding of EAC underpins efforts to improve early detection and treatment outcomes. While large EAC exome sequencing efforts to date have found recurrent loss-of-function mutations, oncogenic driving events have been underrepresented. Here we use a combination of whole-genome sequencing (WGS) and single-nucleotide polymorphism-array profiling to show that genomic catastrophes are frequent in EAC, with almost a third (32%, n=40/123) undergoing chromothriptic events. WGS of 22 EAC cases shows that catastrophes may lead to oncogene amplification through chromothripsis-derived double-minute chromosome formation (MYC and MDM2) or breakage-fusion-bridge (KRAS, MDM2 and RFC3). Telomere shortening is more prominent in EACs bearing localized complex rearrangements. Mutational signature analysis also confirms that extreme genomic instability in EAC can be driven by somatic BRCA2 mutations. These findings suggest that genomic catastrophes have a significant role in the malignant transformation of EAC.
Somatic mutation calling from next-generation sequencing data remains a challenge due to the difficulties of distinguishing true somatic events from artifacts arising from PCR, sequencing errors or mis-mapping. Tumor cellularity or purity, sub-clonality and copy number changes also confound the identification of true somatic events against a background of germline variants. We have developed a heuristic strategy and software (http://www.qcmg.org/bioinformatics/qsnp/) for somatic mutation calling in samples with low tumor content and we show the superior sensitivity and precision of our approach using a previously sequenced cell line, a series of tumor/normal admixtures, and 3,253 putative somatic SNVs verified on an orthogonal platform.
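The abstract does not describe the rules the software applies, so the following is only a minimal sketch of what a heuristic tumor/normal SNV classifier can look like. The class `AlleleCounts`, the function `classify_snv` and every threshold are hypothetical and chosen for illustration; they are not the qSNP rule set. The sketch requires a minimum number of variant reads in the tumor rather than a minimum allele fraction, which is one plausible way to stay sensitive when tumor content is low.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Read counts supporting the reference and alternate allele at one site."""
    ref: int
    alt: int

    @property
    def depth(self) -> int:
        return self.ref + self.alt

    @property
    def vaf(self) -> float:
        # Variant allele fraction; 0.0 when the site has no coverage.
        return self.alt / self.depth if self.depth else 0.0


def classify_snv(
    tumor: AlleleCounts,
    normal: AlleleCounts,
    min_depth: int = 8,           # hypothetical threshold
    min_tumor_alt: int = 4,       # hypothetical threshold
    max_normal_alt: int = 1,      # hypothetical threshold
    max_normal_vaf: float = 0.03, # hypothetical threshold
) -> str:
    """Label a candidate SNV using simple read-count rules (illustrative only)."""
    # Too little coverage in either sample to make any call.
    if tumor.depth < min_depth or normal.depth < min_depth:
        return "low_coverage"
    # Require an absolute number of variant reads in the tumor, not a VAF cut-off.
    if tumor.alt < min_tumor_alt:
        return "no_call"
    # Clear variant evidence in the matched normal suggests germline or artifact.
    if normal.alt > max_normal_alt and normal.vaf > max_normal_vaf:
        return "germline_or_artifact"
    return "somatic"


if __name__ == "__main__":
    # A low-fraction tumor variant with a clean matched normal.
    print(classify_snv(AlleleCounts(ref=40, alt=6), AlleleCounts(ref=30, alt=0)))
```

A real caller would also have to account for strand bias, base and mapping quality, and copy number, none of which this sketch attempts.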
Short proteins play key roles in cell signalling and other processes, but their abundance in the mammalian proteome is unknown. Current catalogues of mammalian proteins exhibit an artefactual discontinuity at a length of 100 aa, so that protein abundance peaks just above this length and falls off sharply below it. To clarify the abundance of short proteins, we identify proteins in the FANTOM collection of mouse cDNAs by analysing synonymous and non-synonymous substitutions with the computer program CRITICA. This analysis confirms that there is no real discontinuity at length 100. Roughly 10% of mouse proteins are shorter than 100 aa, although the majority of these are variants of proteins longer than 100 aa. We identify many novel short proteins, including a "dark matter" subset containing ones that lack detectable homology to other known proteins. Translation assays confirm that some of these novel proteins can be translated and localised to the secretory pathway.
MicroRNAs (miRNAs) bind to mRNAs and target them for translational inhibition or transcriptional degradation. It is thought that most miRNA-mRNA interactions involve the seed region at the 5' end of the miRNA. The importance of seed sites is supported by experimental evidence, although there is growing interest in interactions mediated by the central region of the miRNA, termed centered sites. To investigate the prevalence of these interactions, we apply a biotin pull-down method to determine the direct targets of ten human miRNAs, including four isomiRs that share centered sites, but not seeds, with their canonical partner miRNAs.
We confirm that miRNAs and their isomiRs can interact with hundreds of mRNAs, and that imperfect centered sites are common mediators of miRNA-mRNA interactions. We experimentally demonstrate that these sites can repress mRNA activity, typically through translational repression, and are enriched in regions of the transcriptome bound by AGO. Finally, we show that the identification of imperfect centered sites is unlikely to be an artifact of our protocol caused by the biotinylation of the miRNA. However, the fact that there was a slight bias against seed sites in our protocol may have inflated the apparent prevalence of centered site-mediated interactions.
Our results suggest that centered site-mediated interactions are much more frequent than previously thought. This may explain the evolutionary conservation of the central region of miRNAs, and has significant implications for decoding miRNA-regulated genetic networks, and for predicting the functional effect of variants that do not alter protein sequence.
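The distinction between seed and centered sites can be illustrated with a short sequence search. The sketch below is not the analysis used in the study: it assumes the commonly used definitions of a seed site as a perfect match to miRNA nucleotides 2-8 and a centered site as 11 contiguous Watson-Crick pairs to nucleotides 4-14 or 5-15, and it finds only perfect matches, whereas the abstract concerns imperfect centered sites. All function names are hypothetical.

```python
# Watson-Crick complement table for RNA (G:U wobble pairs are not considered).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def site_positions(mrna: str, mirna: str, start: int, end: int) -> list[int]:
    """0-based mRNA positions that perfectly pair with miRNA nucleotides
    start..end (1-based, inclusive, counted from the miRNA 5' end)."""
    target = reverse_complement(mirna[start - 1:end])
    return [
        i for i in range(len(mrna) - len(target) + 1)
        if mrna.startswith(target, i)
    ]


def find_seed_sites(mrna: str, mirna: str) -> list[int]:
    # Assumed definition: perfect match to miRNA nucleotides 2-8.
    return site_positions(mrna, mirna, 2, 8)


def find_centered_sites(mrna: str, mirna: str) -> list[int]:
    # Assumed definition: 11 contiguous pairs to nucleotides 4-14 or 5-15.
    hits = set(site_positions(mrna, mirna, 4, 14))
    hits |= set(site_positions(mrna, mirna, 5, 15))
    return sorted(hits)


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a-5p
    toy_mrna = "AAAA" + reverse_complement(let7a[1:8]) + "AAAA"
    print(find_seed_sites(toy_mrna, let7a), find_centered_sites(toy_mrna, let7a))
```

Detecting imperfect centered sites would require allowing mismatches or wobble pairs within the 11-nucleotide window, which this sketch deliberately leaves out.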
In response to the growing need for functional analysis of the human genome, we have developed a platform for high-throughput functional screening of genes overexpressed from lentiviral vectors. Protein-coding human open reading frames (ORFs) from the Mammalian Gene Collection were transferred into a lentiviral expression vector using the highly efficient Gateway recombination cloning. Target ORFs were inserted into the vector downstream of a constitutive promoter and upstream of an IRES-controlled GFP reporter, so that their transfection, transduction and expression could be monitored by fluorescence. The expression plasmids and viral packaging plasmids were combined and transfected into 293T cells to produce virus, which was then used to transduce the screening cell line. We have optimised the transfection and transduction procedures so that they can be performed using robotic liquid handling systems in arrayed 96-well microplate, one-gene-per-well format, without the need to concentrate the viral supernatant. Since lentiviruses can infect both dividing and non-dividing cells, this system can be used to overexpress human ORFs in a broad spectrum of experimental contexts. We tested the platform in a 1990-gene pilot screen for genes that can increase proliferation of the non-tumorigenic mammary epithelial cell line MCF-10A after removal of growth factors. Transduced cells were labelled with the nucleoside analogue 5-ethynyl-2'-deoxyuridine (EdU) to detect cells progressing through S phase. Hits were identified using high-content imaging and statistical analysis and confirmed with vectors using two different promoters (CMV and EF1α). The screen demonstrates the reliability, versatility and utility of our screening platform, and identifies novel cell cycle/proliferative activities for a number of genes.
The developing mouse kidney is currently the best-characterized model of organogenesis at a transcriptional level. Detailed spatial maps have been generated for gene expression profiling combined with systematic in situ screening. These studies, however, fall short of capturing the transcriptional complexity arising from each locus due to the limited scope of microarray-based technology, which is largely based on "gene-centric" models.
To address this, the polyadenylated RNA and microRNA transcriptomes of the 15.5 dpc mouse kidney were profiled using strand-specific RNA-sequencing (RNA-Seq) to a depth sufficient to complement spatial maps from pre-existing microarray datasets. The transcriptional complexity of RNAs arising from mouse RefSeq loci was catalogued, including 3568 alternatively spliced transcripts and 532 uncharacterized alternate 3' UTRs. Antisense expression was also detected for 60% of RefSeq genes, including uncharacterized non-coding transcripts overlapping the kidney progenitor markers Six2 and Sall1, which were validated by section in situ hybridization. Analysis of genes known to be involved in kidney development, particularly during mesenchymal-to-epithelial transition, showed an enrichment of non-coding antisense transcripts extended along protein-coding RNAs.
The resulting resource further refines the transcriptomic cartography of kidney organogenesis by integrating deep RNA sequencing data with locus-based information from previously published expression atlases. The added resolution of RNA-Seq has provided the basis for a transition from classical gene-centric models of kidney development towards more accurate and detailed "transcript-centric" representations, which highlights the extent of transcriptional complexity of genes that direct complex development events.
Recent RNA-sequencing studies have shown remarkable complexity in the mammalian transcriptome. The ultimate impact of this complexity on the predicted proteomic output is less well defined. We have undertaken strand-specific RNA sequencing of multiple cellular RNA fractions (>20 Gb) to uncover the transcriptional complexity of human embryonic stem cells (hESCs). We have shown that human embryonic stem (ES) cells display a high degree of transcriptional diversity, with more than half of active genes generating RNAs that differ from conventional gene models. We found evidence that more than 1000 genes express long 5' and/or extended 3'UTRs, which was confirmed by "virtual Northern" analysis. Exhaustive sequencing of the membrane-polysome and cytosolic/untranslated fractions of hESCs was used to identify RNAs encoding peptides destined for secretion and the extracellular space and to demonstrate preferential selection of transcription complexity for translation in vitro. The impact of this newly defined complexity on known gene-centric network models such as the Plurinet and the cell surface signaling machinery in human ES cells revealed a significant expansion of known transcript isoforms at play, many predicting possible alternative functions based on sequence alterations within key functional domains.
Somatic rearrangements, which are commonly found in human cancer genomes, contribute to the progression and maintenance of cancers. Conventionally, the verification of somatic rearrangements comprises many manual steps and Sanger sequencing. This is labor intensive when verifying a large number of rearrangements in a large cohort. To increase the verification throughput, we devised a high-throughput workflow that utilizes benchtop next-generation sequencing and in-house bioinformatics tools to link the laboratory processes. In the proposed workflow, primers are automatically designed. PCR and an optional gel electrophoresis step to confirm the somatic nature of the rearrangements are performed. PCR products of somatic events are pooled for Ion Torrent PGM and/or Illumina MiSeq sequencing; the resulting sequence reads are assembled into consensus contigs by a consensus assembler, and an automated BLAT is used to resolve the breakpoints to base level. We compared sequences and breakpoints of verified somatic rearrangements between the conventional and high-throughput workflow. The results showed that next-generation sequencing methods are comparable to conventional Sanger sequencing. The identified breakpoints obtained from next-generation sequencing methods were highly accurate and reproducible. Furthermore, the proposed workflow allows hundreds of events to be processed in a shorter time frame compared with the conventional workflow.
Integrated genomic analysis of 456 pancreatic ductal adenocarcinomas identified 32 recurrently mutated genes that aggregate into 10 pathways: KRAS, TGF-β, WNT, NOTCH, ROBO/SLIT signalling, G1/S transition, SWI-SNF, chromatin modification, DNA repair and RNA processing. Expression analysis defined 4 subtypes: (1) squamous; (2) pancreatic progenitor; (3) immunogenic; and (4) aberrantly differentiated endocrine exocrine (ADEX) that correlate with histopathological characteristics. Squamous tumours are enriched for TP53 and KDM6A mutations, upregulation of the TP63∆N transcriptional network, hypermethylation of pancreatic endodermal cell-fate determining genes and have a poor prognosis. Pancreatic progenitor tumours preferentially express genes involved in early pancreatic development (FOXA2/3, PDX1 and MNX1). ADEX tumours displayed upregulation of genes that regulate networks involved in KRAS activation, exocrine (NR5A2 and RBPJL), and endocrine differentiation (NEUROD1 and NKX2-2). Immunogenic tumours contained upregulated immune networks including pathways involved in acquired immune suppression. These data infer differences in the molecular evolution of pancreatic cancer subtypes and identify opportunities for therapeutic development.