We report the results of whole-genome and transcriptome sequencing of tumor and adjacent normal tissue samples from 17 patients with non-small cell lung carcinoma (NSCLC). We identified 3,726 point mutations and more than 90 indels in the coding sequence, with an average mutation frequency more than 10-fold higher in smokers than in never-smokers. Novel alterations were identified in genes involved in chromatin modification and DNA repair pathways, as well as in DACH1, CFTR, RELN, ABCB5, and HGF. Deep digital sequencing revealed diverse clonality patterns in both never-smokers and smokers. All validated EGFR and KRAS mutations were present in the founder clones, suggesting possible roles in cancer initiation. Analysis revealed 14 fusions, including those involving ROS1 and ALK as well as novel fusions involving metabolic enzymes. Cell-cycle and JAK-STAT pathways are significantly altered in lung cancer, along with perturbations in 54 genes that are potentially targetable with currently available drugs.
► Smokers with lung cancer show 10× as many point mutations as never-smokers ► Novel lung cancer genes, including DACH1, CFTR, RELN, ABCB5, and HGF, were identified ► Novel pathway alterations in lung cancer include cell-cycle and JAK-STAT pathways ► Alterations were identified in 54 genes for which targeted drugs are available
Whole-genome sequencing of 17 lung cancer patients reveals that smokers with lung cancer carry 10× as many point mutations as patients who never smoked. Alterations were identified in 54 genes for which targeted drugs are available.
Genomics is a relatively new scientific discipline, with DNA sequencing as its core technology. As technology has reduced the cost and increased the scale of genome characterization over sequencing’s 40-year history, the scope of inquiry has commensurately broadened. Massively parallel sequencing has proven revolutionary, shifting the paradigm of genomics to address biological questions at a genome-wide scale. Sequencing now empowers clinical diagnostics and other aspects of medical care, including disease risk assessment, therapeutic identification, and prenatal testing. This Review explores the current state of genomics in the massively parallel sequencing era.
Exome sequencing of 343 families, each with a single child on the autism spectrum and at least one unaffected sibling, reveals de novo small indels and point substitutions, which come mostly from the paternal line in an age-dependent manner. We do not see significantly greater numbers of de novo missense mutations in affected versus unaffected children, but gene-disrupting mutations (nonsense, splice site, and frameshift) are twice as frequent (59 versus 28). Based on this differential and the number of recurrent and total targets of gene disruption found in our and similar studies, we estimate that there are between 350 and 400 autism susceptibility genes. Many of the disrupted genes in these studies are associated with the fragile X protein, FMRP, reinforcing links between autism and synaptic plasticity. We find that FMRP-associated genes are under greater purifying selection than the remainder of genes and suggest that they are especially dosage-sensitive targets of cognitive disorders.
► De novo mutations derive mainly from the paternal line in an age-dependent manner ► Mutations disrupting genes are twice as frequent in affected as in unaffected siblings ► Many disrupted genes are associated with the fragile X protein, FMRP ► FMRP-associated genes are under unexpectedly strong purifying selection
Iossifov et al. use exome sequencing of 343 families with an autistic child to identify de novo gene mutations associated with autism. Many of the mutated genes are associated with the fragile X protein FMRP, indicating new links between autism and synaptic plasticity.
The unprecedented resolution of high-throughput genomics has enabled the recent discovery of a phenomenon, referred to as chromothripsis, in which specific regions of the genome are shattered and then stitched together in a single devastating event. Potential mechanisms governing this process are now emerging, with implications for our understanding of the role of genomic rearrangements in development and disease.
Gene duplication is an important source of phenotypic change and adaptive evolution. We leverage a haploid hydatidiform mole to identify highly identical sequences missing from the reference genome, confirming that the cortical development gene Slit-Robo Rho GTPase-activating protein 2 (SRGAP2) duplicated three times exclusively in humans. We show that the promoter and first nine exons of SRGAP2 duplicated from 1q32.1 (SRGAP2A) to 1q21.1 (SRGAP2B) ∼3.4 million years ago (mya). Two larger duplications later copied SRGAP2B to chromosome 1p12 (SRGAP2C) and to proximal 1q21.1 (SRGAP2D) ∼2.4 and ∼1 mya, respectively. Sequence and expression analyses show that SRGAP2C is the most likely duplicate to encode a functional protein and is among the most fixed human-specific duplicate genes. Our data suggest a mechanism in which incomplete duplication created a novel gene function, antagonizing parental SRGAP2 function, immediately “at birth” 2–3 mya, a time corresponding to the transition from Australopithecus to Homo and the beginning of neocortex expansion.
► Missing SRGAP2 human-specific genes sequenced using haploid hydatidiform mole DNA ► SRGAP2 duplicated three times in the human lineage ∼1.0–3.4 million years ago ► One duplicate is expressed in the brain and is fixed in copy number in all humans ► The incomplete initial duplication likely antagonized the parent gene at birth
A series of incomplete duplications of an ancestral neuronal gene that took place only in the human lineage generated truncated genes, likely to encode new functions immediately upon “birth.” The appearance of these human-specific genes coincides with the emergence of an expanded neocortex.
The discovery of genetic variation and the assembly of genome sequences are both inextricably linked to advances in DNA-sequencing technology. Short-read massively parallel sequencing has revolutionized our ability to discover genetic variation but is insufficient to generate high-quality genome assemblies or resolve most structural variation. Full resolution of variation is guaranteed only by complete de novo assembly of a genome. Here, we review approaches to genome assembly, the nature of gaps or missing sequences, and biases in the assembly process. We describe the challenges of generating a complete de novo genome assembly using current technologies and the impact that being able to perfectly sequence the genome would have on understanding human disease and evolution. Finally, we summarize recent technological advances that improve both contiguity and accuracy and emphasize the importance of complete de novo assembly, as opposed to read mapping, as the primary means of understanding the full range of human genetic variation.
To provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence-resolved 99,604 insertions, deletions, and inversions, including 2,238 (1.6 Mbp) that are shared among all discovery genomes and an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms that the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes, with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci, thereby improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity.
• We sequence-resolve and annotate 99,604 common human structural variants • 55% of VNTRs map to the ends of chromosomes and correlate with double-strand breaks • Alternate alleles facilitate accurate genotyping with short reads and new associations • We patch the reference and add the diversity needed for developing a pan-human genome
Long-read sequencing allows generation of a large catalog of human structural variants and the development of an algorithm for genotyping SVs from short-read data, clarifying the spectrum and importance of structural variation in the human genome.
Infant acute lymphoblastic leukemia (ALL) with MLL rearrangements (MLL-R) represents a distinct leukemia with a poor prognosis. To define its mutational landscape, we performed whole-genome, exome, RNA, and targeted DNA sequencing on 65 infants (47 MLL-R and 18 non-MLL-R cases) and 20 older children (MLL-R cases) with leukemia. Our data show that infant MLL-R ALL has one of the lowest frequencies of somatic mutations of any sequenced cancer, with the predominant leukemic clone carrying a mean of 1.3 non-silent mutations. Despite this paucity of mutations, we detected activating mutations in kinase-PI3K-RAS signaling pathway components in 47% of cases. Surprisingly, these mutations were often subclonal and were frequently lost at relapse. In contrast to infant cases, MLL-R leukemia in older children had more somatic mutations (mean of 6.5 mutations/case versus 1.3 mutations/case, P = 7.15 × 10⁻⁵) and had frequent mutations (45%) in epigenetic regulators, a category of genes that, with the exception of MLL, was rarely mutated in infant MLL-R ALL.
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1,000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that, although the new assemblies compare favorably to the reference with respect to contiguity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
The relationships between clonal architecture and functional heterogeneity in acute myeloid leukemia (AML) samples are not yet clear. We used targeted sequencing to track AML subclones identified by whole-genome sequencing using a variety of experimental approaches. We found that virtually all AML subclones trafficked from the marrow to the peripheral blood, but some were enriched in specific cell populations. Subclones showed variable engraftment potential in immunodeficient mice. Xenografts were predominantly composed of a single genetically defined subclone, but there was no predictable relationship between the engrafting subclone and the evolutionary hierarchy of the leukemia. These data demonstrate the importance of integrating genetic and functional data in studies of primary cancer samples, both in xenograft models and in patients.
• AML subclones are discrete, genetically distinct entities in AML samples • AML subclones often have unique functional and morphological properties • Engraftment of AML cells in mice is not defined by evolutionary hierarchy • The AML founding clone is not equivalent to the AML-initiating cell in mice
Klco et al. track acute myeloid leukemia (AML) subclones identified by whole-genome sequencing and find that subclones of AML can correspond to different cellular populations within a single AML sample and can have different functional properties in vitro and in immunodeficient mice.