Next-generation sequencing technologies have enabled a dramatic expansion of clinical genetic testing both for inherited conditions and diseases such as cancer. Accurate variant calling in NGS data ...is a critical step upon which virtually all downstream analysis and interpretation processes rely. Just as NGS technologies have evolved considerably over the past 10 years, so too have the software tools and approaches for detecting sequence variants in clinical samples. In this review, I discuss the current best practices for variant calling in clinical sequencing studies, with a particular emphasis on trio sequencing for inherited disorders and somatic mutation detection in cancer patients. I describe the relative strengths and weaknesses of panel, exome, and whole-genome sequencing for variant detection. Recommended tools and strategies for calling variants of different classes are also provided, along with guidance on variant review, validation, and benchmarking to ensure optimal performance. Although NGS technologies are continually evolving, and new capabilities (such as long-read single-molecule sequencing) are emerging, the "best practice" principles in this review should be relevant to clinical variant calling in the long term.
Genomics is a relatively new scientific discipline, having DNA sequencing as its core technology. As technology has improved the cost and scale of genome characterization over sequencing’s 40-year ...history, the scope of inquiry has commensurately broadened. Massively parallel sequencing has proven revolutionary, shifting the paradigm of genomics to address biological questions at a genome-wide scale. Sequencing now empowers clinical diagnostics and other aspects of medical care, including disease risk, therapeutic identification, and prenatal testing. This Review explores the current state of genomics in the massively parallel sequencing era.
The myelodysplastic syndromes are a group of hematologic disorders that often evolve into secondary acute myeloid leukemia (AML). The genetic changes that underlie progression from the ...myelodysplastic syndromes to secondary AML are not well understood.
We performed whole-genome sequencing of seven paired samples of skin and bone marrow in seven subjects with secondary AML to identify somatic mutations specific to secondary AML. We then genotyped a bone marrow sample obtained during the antecedent myelodysplastic-syndrome stage from each subject to determine the presence or absence of the specific somatic mutations. We identified recurrent mutations in coding genes and defined the clonal architecture of each pair of samples from the myelodysplastic-syndrome stage and the secondary-AML stage, using the allele burden of hundreds of mutations.
Approximately 85% of bone marrow cells were clonal in the myelodysplastic-syndrome and secondary-AML samples, regardless of the myeloblast count. The secondary-AML samples contained mutations in 11 recurrently mutated genes, including 4 genes that have not been previously implicated in the myelodysplastic syndromes or AML. In every case, progression to acute leukemia was defined by the persistence of an antecedent founding clone containing 182 to 660 somatic mutations and the outgrowth or emergence of at least one subclone, harboring dozens to hundreds of new mutations. All founding clones and subclones contained at least one mutation in a coding gene.
Nearly all the bone marrow cells in patients with myelodysplastic syndromes and secondary AML are clonally derived. Genetic evolution of secondary AML is a dynamic process shaped by multiple cycles of mutation acquisition and clonal selection. Recurrent gene mutations are found in both founding clones and daughter subclones. (Funded by the National Institutes of Health and others.).
Cancer is a disease driven by genetic variation and mutation. Exome sequencing can be utilized for discovering these variants and mutations across hundreds of tumors. Here we present an analysis ...tool, VarScan 2, for the detection of somatic mutations and copy number alterations (CNAs) in exome data from tumor-normal pairs. Unlike most current approaches, our algorithm reads data from both samples simultaneously; a heuristic and statistical algorithm detects sequence variants and classifies them by somatic status (germline, somatic, or LOH); while a comparison of normalized read depth delineates relative copy number changes. We apply these methods to the analysis of exome sequence data from 151 high-grade ovarian tumors characterized as part of the Cancer Genome Atlas (TCGA). We validated some 7790 somatic coding mutations, achieving 93% sensitivity and 85% precision for single nucleotide variant (SNV) detection. Exome-based CNA analysis identified 29 large-scale alterations and 619 focal events per tumor on average. As in our previous analysis of these data, we observed frequent amplification of oncogenes (e.g., CCNE1, MYC) and deletion of tumor suppressors (NF1, PTEN, and CDKN2A). We searched for additional recurrent focal CNAs using the correlation matrix diagonal segmentation (CMDS) algorithm, which identified 424 significant events affecting 582 genes. Taken together, our results demonstrate the robust performance of VarScan 2 for somatic mutation and CNA detection and shed new light on the landscape of genetic alterations in ovarian cancer.
Myelodysplastic syndromes (MDS) are hematopoietic stem cell disorders that often progress to chemotherapy-resistant secondary acute myeloid leukemia (sAML). We used whole-genome sequencing to perform ...an unbiased comprehensive screen to discover the somatic mutations in a sample from an individual with sAML and genotyped the loci containing these mutations in the matched MDS sample. Here we show that a missense mutation affecting the serine at codon 34 (Ser34) in U2AF1 was recurrently present in 13 out of 150 (8.7%) subjects with de novo MDS, and we found suggestive evidence of an increased risk of progression to sAML associated with this mutation. U2AF1 is a U2 auxiliary factor protein that recognizes the AG splice acceptor dinucleotide at the 3' end of introns, and the alterations in U2AF1 are located in highly conserved zinc fingers of this protein. Mutant U2AF1 promotes enhanced splicing and exon skipping in reporter assays in vitro. This previously unidentified, recurrent mutation in U2AF1 implicates altered pre-mRNA splicing as a potential mechanism for MDS pathogenesis.
Massively parallel sequencing technology and the associated rapidly decreasing sequencing costs have enabled systemic analyses of somatic mutations in large cohorts of cancer cases. Here we introduce ...a comprehensive mutational analysis pipeline that uses standardized sequence-based inputs along with multiple types of clinical data to establish correlations among mutation sites, affected genes and pathways, and to ultimately separate the commonly abundant passenger mutations from the truly significant events. In other words, we aim to determine the Mutational Significance in Cancer (MuSiC) for these large data sets. The integration of analytical operations in the MuSiC framework is widely applicable to a broad set of tumor types and offers the benefits of automation as well as standardization. Herein, we describe the computational structure and statistical underpinnings of the MuSiC pipeline and demonstrate its performance using 316 ovarian cancer samples from the TCGA ovarian cancer project. MuSiC correctly confirms many expected results, and identifies several potentially novel avenues for discovery.
Motivation: The sequencing of tumors and their matched normals is frequently used to study the genetic composition of cancer. Despite this fact, there remains a dearth of available software tools ...designed to compare sequences in pairs of samples and identify sites that are likely to be unique to one sample.
Results: In this article, we describe the mathematical basis of our SomaticSniper software for comparing tumor and normal pairs. We estimate its sensitivity and precision, and present several common sources of error resulting in miscalls.
Availability and implementation: Binaries are freely available for download at http://gmt.genome.wustl.edu/somatic-sniper/current/, implemented in C and supported on Linux and Mac OS X.
Contact:
delarson@wustl.edu; lding@wustl.edu
Supplementary information:
Supplementary data are available at Bioinformatics online.
Massively parallel sequencing technologies hold incredible promise for the study of DNA sequence variation, particularly the identification of variants affecting human disease. The unprecedented ...throughput and relatively short read lengths of Roche/454, Illumina/Solexa, and other platforms have spurred development of a new generation of sequence alignment algorithms. Yet detection of sequence variants based on short read alignments remains challenging, and most currently available tools are limited to a single platform or aligner type. We present VarScan, an open source tool for variant detection that is compatible with several short read aligners. We demonstrate VarScan's ability to detect SNPs and indels with high sensitivity and specificity, in both Roche/454 sequencing of individuals and deep Illumina/Solexa sequencing of pooled samples. Availability and Implementation: Source code and documentation freely available at http://genome.wustl.edu/tools/cancer-genomics implemented as a Perl package and supported on Linux/UNIX, MS Windows and Mac OSX. Contact: dkoboldt@genome.wustl.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Data from 8 breast cancer genome-sequencing projects identified 25 patients with HER2 somatic mutations in cancers lacking HER2 gene amplification. To determine the phenotype of these mutations, we ...functionally characterized 13 HER2 mutations using in vitro kinase assays, protein structure analysis, cell culture, and xenograft experiments. Seven of these mutations are activating mutations, including G309A, D769H, D769Y, V777L, P780ins, V842I, and R896C. HER2 in-frame deletion 755-759, which is homologous to EGF receptor (EGFR) exon 19 in-frame deletions, had a neomorphic phenotype with increased phosphorylation of EGFR or HER3. L755S produced lapatinib resistance, but was not an activating mutation in our experimental systems. All of these mutations were sensitive to the irreversible kinase inhibitor, neratinib. These findings show that HER2 somatic mutation is an alternative mechanism to activate HER2 in breast cancer and they validate HER2 somatic mutations as drug targets for breast cancer treatment.
We show that the majority of HER2 somatic mutations in breast cancer patients are activating mutations that likely drive tumorigenesis. Several patients had mutations that are resistant to the reversible HER2 inhibitor lapatinib, but are sensitive to the irreversible HER2 inhibitor, neratinib. Our results suggest that patients with HER2 mutation–positive breast cancers could benefit from existing HER2-targeted drugs.
Significance We present highlights of the first complete domestic cat reference genome, to our knowledge. We provide evolutionary assessments of the feline protein-coding genome, population genetic ...discoveries surrounding domestication, and a resource of domestic cat genetic variants. These analyses span broadly, from carnivore adaptations for hunting behavior to comparative odorant and chemical detection abilities between cats and dogs. We describe how segregating genetic variation in pigmentation phenotypes has reached fixation within a single breed, and also highlight the genomic differences between domestic cats and wildcats. Specifically, the signatures of selection in the domestic cat genome are linked to genes associated with gene knockout models affecting memory, fear-conditioning behavior, and stimulus-reward learning, and potentially point to the processes by which cats became domesticated.
Little is known about the genetic changes that distinguish domestic cat populations from their wild progenitors. Here we describe a high-quality domestic cat reference genome assembly and comparative inferences made with other cat breeds, wildcats, and other mammals. Based upon these comparisons, we identified positively selected genes enriched for genes involved in lipid metabolism that underpin adaptations to a hypercarnivorous diet. We also found positive selection signals within genes underlying sensory processes, especially those affecting vision and hearing in the carnivore lineage. We observed an evolutionary tradeoff between functional olfactory and vomeronasal receptor gene repertoires in the cat and dog genomes, with an expansion of the feline chemosensory system for detecting pheromones at the expense of odorant detection. Genomic regions harboring signatures of natural selection that distinguish domestic cats from their wild congeners are enriched in neural crest-related genes associated with behavior and reward in mouse models, as predicted by the domestication syndrome hypothesis. Our description of a previously unidentified allele for the gloving pigmentation pattern found in the Birman breed supports the hypothesis that cat breeds experienced strong selection on specific mutations drawn from random bred populations. Collectively, these findings provide insight into how the process of domestication altered the ancestral wildcat genome and build a resource for future disease mapping and phylogenomic studies across all members of the Felidae.