Peer reviewed, Open access
  • Characterizing the Major St...
    Audano, Peter A.; Sulovari, Arvis; Graves-Lindsay, Tina A.; Cantsilieris, Stuart; Sorensen, Melanie; Welch, AnneMarie E.; Dougherty, Max L.; Nelson, Bradley J.; Shah, Ankeeta; Dutcher, Susan K.; Warren, Wesley C.; Magrini, Vincent; McGrath, Sean D.; Li, Yang I.; Wilson, Richard K.; Eichler, Evan E.

    Cell, 01/2019, Volume: 176, Issue: 3
    Journal Article

    In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity.

    • We sequence resolve and annotate 99,604 common human structural variants
    • 55% of VNTRs map to the end of chromosomes and correlate with double-strand breaks
    • Alternate alleles facilitate accurate genotyping with short reads and new associations
    • We patch the reference and add diversity needed for developing a pan human genome

    Long-read sequencing allows generation of a large catalog of human structural variants and the development of an algorithm for genotyping SVs from short-read data, clarifying the spectrum and importance of structural variation in the human genome.