With DNA sequencing now getting cheaper more quickly than data storage or computation, the time may have come for genome informatics to migrate to the cloud.
One challenge facing biologists is to tease out useful information from massive data sets for further analysis. A pathway-based analysis may shed light by projecting candidate genes onto protein functional relationship networks. We are building such a pathway-based analysis system.
We have constructed a protein functional interaction network by extending curated pathways with non-curated sources of information, including protein-protein interactions, gene coexpression, protein domain interactions, Gene Ontology (GO) annotations and text-mined protein interactions, which together cover close to 50% of the human proteome. By applying this network to two glioblastoma multiforme (GBM) data sets and projecting cancer candidate genes onto it, we found that the majority of GBM candidate genes form a cluster and are closer to one another than expected by chance. We also found that the majority of GBM samples have sequence-altered genes in two network modules: one mainly comprising genes whose products are localized in the cytoplasm and plasma membrane, and another comprising gene products in the nucleus. Both modules are highly enriched in known oncogenes, tumor suppressors and genes involved in signal transduction. Similar network patterns were also found in breast, colorectal and pancreatic cancers.
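The claim that candidate genes are "closer than expected by chance" implies a comparison against random gene sets. The sketch below shows one generic way to run such a test; the abstract does not state which statistic or null model was used, so the choices here are assumptions: mean pairwise shortest-path distance as the statistic, uniform random node sampling as the null, and a hypothetical ten-node chain (`G0` to `G9`) in place of a real network. A production analysis would more likely use degree-preserving randomization.

```python
import random
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def mean_pairwise_distance(graph, genes):
    """Average shortest-path length over all connected pairs in `genes`."""
    genes = list(genes)
    total, pairs = 0, 0
    for i, a in enumerate(genes):
        dist = bfs_distances(graph, a)  # one BFS per source gene
        for b in genes[i + 1:]:
            if b in dist:  # skip pairs lying in different components
                total += dist[b]
                pairs += 1
    return total / pairs if pairs else float("inf")


def proximity_test(graph, genes, n_perm=1000, seed=0):
    """Empirical p-value: are `genes` closer than random node sets of equal size?"""
    rng = random.Random(seed)
    nodes = sorted(graph)
    observed = mean_pairwise_distance(graph, genes)
    hits = sum(
        mean_pairwise_distance(graph, rng.sample(nodes, len(genes))) <= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_perm + 1)


# Toy, hypothetical network: ten nodes G0..G9 connected in a simple chain.
edges = [(f"G{i}", f"G{i + 1}") for i in range(9)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

observed, p = proximity_test(graph, ["G0", "G1", "G2"])
print(f"mean distance {observed:.2f}, empirical p = {p:.3f}")
```

Three adjacent nodes give the smallest possible mean distance (4/3) on this chain, so only a small fraction of random triples match it and the empirical p-value is correspondingly low.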
We have built a highly reliable functional interaction network upon expert-curated pathways and applied it to the analysis of two genome-wide GBM data sets and several other cancer data sets. The network patterns revealed by our results suggest common mechanisms in cancer biology. Our system should provide a foundation for a network- or pathway-based analysis platform for cancer and other diseases.
Intra-tumor heterogeneity (ITH) is a mechanism of therapeutic resistance and therefore an important clinical challenge. However, the extent, origin, and drivers of ITH across cancer types are poorly understood. To address this, we extensively characterize ITH across whole-genome sequences of 2,658 cancer samples spanning 38 cancer types. Nearly all informative samples (95.1%) contain evidence of distinct subclonal expansions with frequent branching relationships between subclones. We observe positive selection of subclonal driver mutations across most cancer types and identify cancer type-specific subclonal patterns of driver gene mutations, fusions, structural variants, and copy number alterations as well as dynamic changes in mutational processes between subclonal expansions. Our results underline the importance of ITH and its drivers in tumor evolution and provide a pan-cancer resource of comprehensively annotated subclonal events from whole-genome sequencing data.
• Pan-cancer resource of comprehensively annotated intra-tumor heterogeneity (ITH)
• ITH is pervasive across cancers and shows cancer type-specific patterns
• Branching phylogenies are common
• Dynamic changes in mutational processes between subclonal expansions
Dentro et al. provide a comprehensive annotation of intra-tumor heterogeneity and its drivers in cancer evolution.
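Calling a mutation "subclonal" rests on estimating the fraction of cancer cells that carry it. The sketch below shows the standard conversion from a variant allele fraction (VAF) to a cancer cell fraction (CCF), given tumour purity, tumour copy number and mutation multiplicity. It is not the pipeline used in the ITH study above, which combined several reconstruction methods; the function names, the example sample and the 0.8 cut-off in `looks_subclonal` are hypothetical choices for illustration only.

```python
def cancer_cell_fraction(vaf, purity, tumour_copy_number,
                         multiplicity=1, normal_copy_number=2):
    """Convert a variant allele fraction (VAF) into a cancer cell fraction (CCF).

    Expected VAF = purity * multiplicity * CCF
                   / (purity * tumour_copy_number + normal_copy_number * (1 - purity)),
    so the CCF is obtained by solving that expression for CCF.
    """
    total_copies = purity * tumour_copy_number + normal_copy_number * (1 - purity)
    return vaf * total_copies / (purity * multiplicity)


def looks_subclonal(vaf, purity, tumour_copy_number, multiplicity=1, threshold=0.8):
    """Crude call: a CCF clearly below 1 suggests the mutation sits in a subclone.

    The 0.8 threshold is an arbitrary, hypothetical cut-off; real methods model
    read-count uncertainty and cluster many mutations jointly.
    """
    ccf = cancer_cell_fraction(vaf, purity, tumour_copy_number, multiplicity)
    return ccf < threshold


# Hypothetical diploid tumour sample with 80% purity.
print(round(cancer_cell_fraction(0.40, purity=0.8, tumour_copy_number=2), 3))  # 1.0
print(round(cancer_cell_fraction(0.20, purity=0.8, tumour_copy_number=2), 3))  # 0.5
```

A mutation at VAF 0.40 in this sample is consistent with being present in every cancer cell, whereas one at VAF 0.20 is present in only about half of them, which is the kind of signal that, aggregated over many mutations, reveals distinct subclonal expansions.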
Fulfilling the promise of the genetic revolution requires the analysis of large datasets containing information from thousands to millions of participants. However, sharing human genomic data requires protecting subjects from potential harm. Current models rely on de-identification techniques in which the trade-off between privacy and data utility becomes a zero-sum game. Instead, we propose the use of trust-enabling techniques to create a solution in which researchers and participants both win. To do so, we introduce three principles that facilitate trust in genetic research and outline one possible framework built upon those principles. Our hope is that such trust-centric frameworks provide a sustainable solution that reconciles genetic privacy with data sharing and facilitates genetic research.
In a classical view of hematopoiesis, the various blood cell lineages arise via a hierarchical scheme starting with multipotent stem cells that become increasingly restricted in their differentiation potential through oligopotent and then unipotent progenitors. We developed a cell-sorting scheme to resolve myeloid (My), erythroid (Er), and megakaryocytic (Mk) fates from single CD34(+) cells and then mapped the progenitor hierarchy across human development. Fetal liver contained large numbers of distinct oligopotent progenitors with intermingled My, Er, and Mk fates. However, few oligopotent progenitor intermediates were present in the adult bone marrow. Instead, only two progenitor classes predominate, multipotent and unipotent, with Er-Mk lineages emerging from multipotent cells. The developmental shift to an adult "two-tier" hierarchy challenges current dogma and provides a revised framework to understand normal and disease states of human hematopoiesis.
Biology is an information-driven science. Large-scale data sets from genomics, physiology, population genetics and imaging are driving research at a dizzying rate. Simultaneously, interdisciplinary collaborations among experimental biologists, theorists, statisticians and computer scientists have become the key to making effective use of these data sets. However, too many biologists have trouble accessing and using these electronic data sets and tools effectively. A 'cyberinfrastructure' is a combination of databases, network protocols and computational services that brings people, information and computational tools together to perform science in this information-driven world. This article reviews the components of a biological cyberinfrastructure, discusses current and pending implementations, and notes the many challenges that lie ahead.
Pancreatic adenocarcinoma presents as a spectrum of a highly aggressive disease in patients. The basis of this disease heterogeneity has proved difficult to resolve due to poor tumor cellularity and extensive genomic instability. To address this, a dataset of whole genomes and transcriptomes was generated from purified epithelium of primary and metastatic tumors. Transcriptome analysis demonstrated that molecular subtypes are a product of a gene expression continuum driven by a mixture of intratumoral subpopulations, which was confirmed by single-cell analysis. Integrated whole-genome analysis uncovered that molecular subtypes are linked to specific copy number aberrations in genes such as mutant KRAS and GATA6. By mapping tumor genetic histories, tetraploidization emerged as a key mutational process behind these events. Taken together, these data support the premise that the constellation of genomic aberrations in the tumor gives rise to the molecular subtype, and that disease heterogeneity is due to ongoing genomic instability during progression.
Cancers are caused by genomic alterations known as drivers. Hundreds of drivers in coding genes are known but, to date, only a handful of noncoding drivers have been discovered, despite intensive searching. Attention has recently shifted to the role of altered RNA splicing in cancer; driver mutations that lead to transcriptome-wide aberrant splicing have been identified in multiple types of cancer, although these mutations have only been found in protein-coding splicing factors such as splicing factor 3b subunit 1 (SF3B1). By contrast, cancer-related alterations in the noncoding component of the spliceosome, a series of small nuclear RNAs (snRNAs), have barely been studied, owing to the combined challenges of characterizing noncoding cancer drivers and the repetitive nature of snRNA genes. Here we report a highly recurrent A>C somatic mutation at the third base of U1 snRNA in several types of tumour. The primary function of U1 snRNA is to recognize the 5' splice site via base-pairing. This mutation changes the preferential A-U base-pairing between U1 snRNA and the 5' splice site to C-G base-pairing, and thus creates novel splice junctions and alters the splicing pattern of multiple genes, including known drivers of cancer. Clinically, the A>C mutation is associated with heavy alcohol use in patients with hepatocellular carcinoma, and with the aggressive subtype of chronic lymphocytic leukaemia with unmutated immunoglobulin heavy-chain variable regions. The mutation in U1 snRNA also independently confers an adverse prognosis to patients with chronic lymphocytic leukaemia. Our study demonstrates a noncoding driver in spliceosomal RNAs, reveals a mechanism of aberrant splicing in cancer and may represent a new target for treatment. Our findings also suggest that driver discovery should be extended to a wider range of genomic regions.
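The base-pairing argument in the U1 snRNA abstract above can be checked with a simple complementarity count. This minimal sketch assumes the commonly cited human U1 5'-end sequence `AUACUUACCUG` and idealised 9-nt 5' splice sites spanning positions -3 to +6; the constant and function names are hypothetical, G-U wobble pairs are ignored, and a plain match count stands in for any real thermodynamic model.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

U1_WILD_TYPE = "AUACUUACCUG"  # assumed U1 snRNA 5' end, positions 1-11
U1_A3C = U1_WILD_TYPE[:2] + "C" + U1_WILD_TYPE[3:]  # A>C at the third base


def pairing_score(u1_5prime_end, splice_site_9mer):
    """Count Watson-Crick pairs between U1 positions 3-11 and 5'SS positions -3..+6.

    The duplex is antiparallel: U1 position 11 faces splice-site position -3
    and U1 position 3 faces position +6. G-U wobble pairs are ignored here.
    """
    facing = u1_5prime_end[2:11][::-1]  # U1 positions 11 down to 3
    return sum((a, b) in WATSON_CRICK for a, b in zip(facing, splice_site_9mer))


# Idealised 5' splice sites written -3..+6 (exon|intron boundary after "CAG").
site_u6 = "CAGGUAAGU"  # U at +6, pairs with wild-type A3
site_g6 = "CAGGUAAGG"  # G at +6, pairs with mutant C3

for name, u1 in [("wild type", U1_WILD_TYPE), ("A>C", U1_A3C)]:
    print(name, pairing_score(u1, site_u6), pairing_score(u1, site_g6))
# wild type 9 8
# A>C 8 9
```

The mutant U1 loses its pair with a +6 U and gains one with a +6 G, which illustrates how a single-base change in U1 can shift which 5' splice sites it recognizes best.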