Tumors from individuals with cancer are frequently genetically profiled for information about the driving forces behind the disease. We present the CancerMine resource, a text-mined and routinely updated database of drivers, oncogenes and tumor suppressors in different types of cancer. All data are available online (http://bionlp.bcgsc.ca/cancermine) and downloadable under a Creative Commons Zero license for ease of use.
An outbreak of tuberculosis occurred over a 3-year period in a medium-size community in British Columbia, Canada. The results of mycobacterial interspersed repetitive unit-variable-number tandem-repeat (MIRU-VNTR) genotyping suggested the outbreak was clonal. Traditional contact tracing did not identify a source. We used whole-genome sequencing and social-network analysis in an effort to describe the outbreak dynamics at a higher resolution.
We sequenced the complete genomes of 32 Mycobacterium tuberculosis outbreak isolates and 4 historical isolates (from the same region but sampled before the outbreak) with matching genotypes, using short-read sequencing. Epidemiologic and genomic data were overlaid on a social network constructed by means of interviews with patients to determine the origins and transmission dynamics of the outbreak.
Whole-genome data revealed two genetically distinct lineages of M. tuberculosis with identical MIRU-VNTR genotypes, suggesting two concomitant outbreaks. Integration of social-network and phylogenetic analyses revealed several transmission events, including those involving "superspreaders." Both lineages descended from a common ancestor and had been detected in the community before the outbreak, suggesting a social, rather than genetic, trigger. Further epidemiologic investigation revealed that the onset of the outbreak coincided with a recorded increase in crack cocaine use in the community.
Through integration of large-scale bacterial whole-genome sequencing and social-network analysis, we show that a socioenvironmental factor, most likely increased crack cocaine use, triggered the simultaneous expansion of two extant lineages of M. tuberculosis that was sustained by key members of a high-risk social network. Genotyping and contact tracing alone did not capture the true dynamics of the outbreak. (Funded by Genome British Columbia and others.)
The ability of nanopore sequencing to simultaneously detect modified nucleotides while producing long reads makes it ideal for detecting and phasing allele-specific methylation. However, there is currently no complete software for detecting SNPs, phasing haplotypes, and mapping methylation to these from nanopore sequence data. Here, we present NanoMethPhase, a software tool to phase 5-methylcytosine from nanopore sequencing. We also present SNVoter, which can post-process nanopore SNV calls to improve accuracy in low coverage regions. Together, these tools can accurately detect allele-specific methylation genome-wide using nanopore sequence data with low coverage of about ten-fold redundancy.
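The idea of phasing methylation calls can be illustrated with a minimal sketch. It is not NanoMethPhase's implementation: the input layout (`alleles`, `methylation`), the function names, and the simple majority vote over SNV alleles are all assumptions made for illustration. Each read is assigned to the haplotype whose alleles it matches most often, and methylation calls are then averaged per haplotype and CpG position.

```python
from collections import defaultdict

def phase_read(read_alleles, hap1, hap2):
    """Return 1 or 2 for the haplotype the read's SNV alleles support, or None on a tie."""
    votes1 = sum(1 for pos, base in read_alleles.items() if hap1.get(pos) == base)
    votes2 = sum(1 for pos, base in read_alleles.items() if hap2.get(pos) == base)
    if votes1 == votes2:
        return None  # uninformative or conflicting read: leave it unphased
    return 1 if votes1 > votes2 else 2

def haplotype_methylation(reads, hap1, hap2):
    """Average methylation calls per (haplotype, CpG position) over phased reads.

    reads: list of dicts with hypothetical keys
      "alleles":     {snv_position: base}
      "methylation": {cpg_position: True/False}
    hap1, hap2: {snv_position: base} for the two phased haplotypes.
    """
    calls = defaultdict(list)
    for read in reads:
        hap = phase_read(read["alleles"], hap1, hap2)
        if hap is None:
            continue
        for cpg, is_methylated in read["methylation"].items():
            calls[(hap, cpg)].append(is_methylated)
    return {key: sum(values) / len(values) for key, values in calls.items()}

# Toy data, invented for this example.
hap1 = {100: "A", 200: "C"}
hap2 = {100: "G", 200: "T"}
reads = [
    {"alleles": {100: "A"}, "methylation": {150: True}},
    {"alleles": {100: "G", 200: "T"}, "methylation": {150: False}},
    {"alleles": {100: "A", 200: "T"}, "methylation": {150: True}},  # tie, skipped
]
result = haplotype_methylation(reads, hap1, hap2)
```

A CpG whose averaged methylation differs strongly between haplotype 1 and haplotype 2 would be a candidate for allele-specific methylation; a real tool would also need a statistical test and coverage filters, which this sketch omits.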
Although it is known that the methylation of DNA in 5' promoters suppresses gene expression, the role of DNA methylation in gene bodies is unclear. In mammals, tissue- and cell type-specific methylation is present in a small percentage of 5' CpG island (CGI) promoters, whereas a far greater proportion occurs across gene bodies, coinciding with highly conserved sequences. Tissue-specific intragenic methylation might reduce, or, paradoxically, enhance transcription elongation efficiency. Capped analysis of gene expression (CAGE) experiments also indicate that transcription commonly initiates within and between genes. To investigate the role of intragenic methylation, we generated a map of DNA methylation from the human brain encompassing 24.7 million of the 28 million CpG sites. From the dense, high-resolution coverage of CpG islands, the majority of methylated CpG islands were shown to be in intragenic and intergenic regions, whereas less than 3% of CpG islands in 5' promoters were methylated. The CpG islands in all three locations overlapped with RNA markers of transcription initiation, and unmethylated CpG islands also overlapped significantly with trimethylation of H3K4, a histone modification enriched at promoters. The general and CpG-island-specific patterns of methylation are conserved in mouse tissues. An in-depth investigation of the human SHANK3 locus and its mouse homologue demonstrated that this tissue-specific DNA methylation regulates intragenic promoter activity in vitro and in vivo. These methylation-regulated, alternative transcripts are expressed in a tissue- and cell type-specific manner, and are expressed differentially within a single cell type from distinct brain regions. These results support a major role for intragenic methylation in regulating cell context-specific alternative promoters in gene bodies.
Chardonnay is the basis of some of the world's most iconic wines and its success is underpinned by a historic program of clonal selection. There are numerous clones of Chardonnay available that exhibit differences in key viticultural and oenological traits that have arisen from the accumulation of somatic mutations during centuries of asexual propagation. However, the genetic variation that underlies these differences remains largely unknown. To address this knowledge gap, a high-quality, diploid-phased Chardonnay genome assembly was produced from single-molecule real time sequencing, and combined with re-sequencing data from 15 different Chardonnay clones. There were 1620 markers identified that distinguish the 15 clones. These markers were reliably used for clonal identification of independently sourced genomic material, as well as in identifying a potential genetic basis for some clonal phenotypic differences. The predicted parentage of the Chardonnay haplomes was elucidated by mapping sequence data from the predicted parents of Chardonnay (Gouais blanc and Pinot noir) against the Chardonnay reference genome. This enabled the detection of instances of heterosis, with differentially-expanded gene families being inherited from the parents of Chardonnay. Most surprisingly, however, the patterns of nucleotide variation present in the Chardonnay genome indicate that Pinot noir and Gouais blanc share an extremely high degree of kinship that has resulted in the Chardonnay genome displaying characteristics that are indicative of inbreeding.
The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/.
Repositioning existing drugs for new therapeutic uses is an efficient approach to drug discovery. We have developed a computational drug repositioning pipeline to perform large-scale molecular docking of small molecule drugs against protein drug targets, in order to map the drug-target interaction space and find novel interactions. Our method emphasizes removing false positive interaction predictions using criteria from known interaction docking, consensus scoring, and specificity. In all, our database contains 252 human protein drug targets that we classify as reliable-for-docking as well as 4621 approved and experimental small molecule drugs from DrugBank. These were cross-docked, then filtered through stringent scoring criteria to select top drug-target interactions. In particular, we used MAPK14 and the kinase inhibitor BIM-8 as examples where our stringent thresholds enriched the predicted drug-target interactions with known interactions up to 20 times compared to standard score thresholds. We validated nilotinib as a potent MAPK14 inhibitor in vitro (IC50 40 nM), suggesting a potential use for this drug in treating inflammatory diseases. The published literature indicated experimental evidence for 31 of the top predicted interactions, highlighting the promising nature of our approach. Novel interactions discovered may lead to the drug being repositioned as a therapeutic treatment for its off-target's associated disease, added insight into the drug's mechanism of action, and added insight into the drug's side effects.
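Consensus scoring, one of the filtering criteria named above, can be sketched as follows. This is only an illustration under stated assumptions: the scorer names, the cutoffs, the drug and target labels, and the convention that lower scores are better are all hypothetical and are not taken from the pipeline itself. A drug-target pair is kept only if every scoring function places it at or below that function's cutoff.

```python
def consensus_hits(scores, thresholds):
    """Keep drug-target pairs that pass the cutoff of every scoring function.

    scores:     {(drug, target): {scorer_name: value}}, lower value = better pose
    thresholds: {scorer_name: cutoff}
    Both structures and all names in them are hypothetical.
    """
    return sorted(
        pair
        for pair, by_scorer in scores.items()
        if all(by_scorer[name] <= cutoff for name, cutoff in thresholds.items())
    )

# Invented numbers for illustration only; not results from the study.
scores = {
    ("drugA", "targetX"): {"scorer1": -10.5, "scorer2": -9.0},
    ("drugB", "targetX"): {"scorer1": -11.0, "scorer2": -4.0},  # fails scorer2
    ("drugC", "targetY"): {"scorer1": -6.0, "scorer2": -9.5},   # fails scorer1
}
thresholds = {"scorer1": -9.0, "scorer2": -8.0}
hits = consensus_hits(scores, thresholds)
```

Requiring agreement across scorers trades sensitivity for a lower false-positive rate, which matches the stated emphasis on removing false positive predictions.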
Novel DNA sequencing technologies with the potential for up to three orders of magnitude more sequence throughput than conventional Sanger sequencing are emerging. The instrument now available from Solexa Ltd produces millions of short DNA sequences of 25 nt each. Due to ubiquitous repeats in large genomes and the inability of short sequences to uniquely and unambiguously characterize them, the short read length limits applicability for de novo sequencing. However, given the sequencing depth and the throughput of this instrument, stringent assembly of highly identical sequences can be achieved. We describe SSAKE, a tool for aggressively assembling millions of short nucleotide sequences by progressively searching through a prefix tree for the longest possible overlap between any two sequences. SSAKE is designed to help leverage the information from short sequence reads by stringently assembling them into contiguous sequences that can be used to characterize novel sequencing targets. Availability: Contact: rwarren@bcgsc.ca
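The longest-overlap search can be illustrated with a simplified sketch. It is not SSAKE's code: a dictionary of read prefixes stands in for the prefix tree, only extension in one direction is shown, reverse complements and sequencing errors are ignored, and the function names and the `min_overlap` parameter are assumptions made for this example.

```python
from collections import defaultdict

def build_prefix_index(reads, min_overlap):
    """Map every proper prefix of length >= min_overlap to the reads starting with it."""
    index = defaultdict(set)
    for read in reads:
        for k in range(min_overlap, len(read)):
            index[read[:k]].add(read)
    return index

def extend_contig(seed, reads, min_overlap=3):
    """Greedily extend seed to the right using the longest exact suffix-prefix overlap."""
    unused = set(reads)
    unused.discard(seed)
    index = build_prefix_index(unused, min_overlap)
    longest = max((len(r) for r in unused), default=0)
    contig = seed
    extended = True
    while extended:
        extended = False
        # Try the longest possible overlap first, then progressively shorter ones.
        for k in range(min(len(contig), longest - 1), min_overlap - 1, -1):
            candidates = [r for r in index.get(contig[-k:], ()) if r in unused]
            if candidates:
                read = sorted(candidates)[0]  # deterministic tie-break
                contig += read[k:]            # append the non-overlapping tail
                unused.remove(read)
                extended = True
                break
    return contig

contig = extend_contig("ACGTAC", ["ACGTAC", "GTACGG", "ACGGTT"])
```

In this toy input the seed overlaps `GTACGG` by four bases and the result then overlaps `ACGGTT` by four bases, so the three reads collapse into one contig. Each read is used at most once, which guarantees termination.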
Pancreatic adenocarcinoma presents as a spectrum of a highly aggressive disease in patients. The basis of this disease heterogeneity has proved difficult to resolve due to poor tumor cellularity and extensive genomic instability. To address this, a dataset of whole genomes and transcriptomes was generated from purified epithelium of primary and metastatic tumors. Transcriptome analysis demonstrated that molecular subtypes are a product of a gene expression continuum driven by a mixture of intratumoral subpopulations, which was confirmed by single-cell analysis. Integrated whole-genome analysis uncovered that molecular subtypes are linked to specific copy number aberrations in genes such as mutant KRAS and GATA6. By mapping tumor genetic histories, tetraploidization emerged as a key mutational process behind these events. Taken together, these data support the premise that the constellation of genomic aberrations in the tumor gives rise to the molecular subtype, and that disease heterogeneity is due to ongoing genomic instability during progression.
Networks are typically visualized with force-based or spectral layouts. These algorithms lack reproducibility and perceptual uniformity because they do not use a node coordinate system. The layouts can be difficult to interpret and are unsuitable for assessing differences in networks. To address these issues, we introduce hive plots (http://www.hiveplot.com) for generating informative, quantitative and comparable network layouts. Hive plots depict network structure transparently, are simple to understand and can be easily tuned to identify patterns of interest. The method is computationally straightforward, scales well and is amenable to a plugin for existing tools.
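A node coordinate system of this kind can be sketched in a few lines. The sketch below is an illustration, not the hive plot software: the function name, the user-supplied `axis_of` rule, and the choice of node degree as the position along an axis are assumptions. Each node is placed on one of several radial axes, at a distance from the center set by a network property, so the same network always yields the same layout.

```python
import math
from collections import Counter

def hive_layout(edges, axis_of, n_axes=3):
    """Place nodes on radial axes; position along the axis is the normalized degree.

    edges:   iterable of (node, node) pairs
    axis_of: function mapping a node to an axis index in range(n_axes)
             (a hypothetical, user-supplied assignment rule)
    Returns {node: (x, y)}.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if not degree:
        return {}
    max_degree = max(degree.values())
    coords = {}
    for node, deg in degree.items():
        angle = 2 * math.pi * axis_of(node) / n_axes  # axes are evenly spaced
        radius = deg / max_degree                     # in (0, 1]
        coords[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return coords

# Toy network and axis assignment, invented for this example.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]
axis = {"a": 0, "b": 1, "c": 1, "d": 2}
layout = hive_layout(edges, axis.get)
```

Because the coordinates depend only on node properties and not on a random initialization or iterative relaxation, reordering the edge list leaves the layout unchanged, which is what makes layouts of different networks directly comparable. Nodes with the same axis and degree coincide in this sketch; a fuller implementation would need a rule for separating them.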