The domestic cat (Felis catus) numbers over 94 million in the USA alone, occupies households as a companion animal, and, like humans, suffers from cancer and common and rare diseases. However, genome-wide sequence variant information is limited for this species. To empower trait analyses, a new cat genome reference assembly was developed from PacBio long sequence reads that significantly improve sequence representation and assembly contiguity. The whole genome sequences of 54 domestic cats were aligned to the reference to identify single nucleotide variants (SNVs) and structural variants (SVs). Across all cats, 16 SNVs predicted to have deleterious impacts and in a singleton state were identified as high priority candidates for causative mutations. One candidate was a stop gain in the tumor suppressor FBXW7. The SNV is found in cats segregating for feline mediastinal lymphoma and is a candidate for inherited cancer susceptibility. SV analysis revealed a complex deletion coupled with a nearby potential duplication event that was shared privately across three unrelated cats with dwarfism and is found within a known dwarfism associated region on cat chromosome B1. This SV interrupted UDP-glucose 6-dehydrogenase (UGDH), a gene involved in the biosynthesis of glycosaminoglycans. Importantly, UGDH has not yet been associated with human dwarfism and should be screened in undiagnosed patients. The new high-quality cat genome reference and the compilation of sequence variation demonstrate the importance of these resources when searching for disease causative alleles in the domestic cat and for identification of feline biomedical models.
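The singleton filter mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that "singleton" means the alternate allele is carried by exactly one cat in the cohort; the abstract does not give the precise definition, and the data layout, function name, and variant IDs below are hypothetical rather than taken from the study.

```python
def singleton_variants(genotypes):
    """Return IDs of variants whose alternate allele is carried by exactly one sample.

    `genotypes` maps a variant ID to a dict of sample name -> alternate-allele
    count (0, 1, or 2). This layout is an assumption made for illustration.
    """
    singletons = []
    for variant_id, calls in genotypes.items():
        # A carrier is any sample with at least one copy of the alternate allele.
        carriers = [sample for sample, alt_count in calls.items() if alt_count > 0]
        if len(carriers) == 1:
            singletons.append(variant_id)
    return singletons


# Toy cohort of three cats and three made-up variants.
toy_genotypes = {
    "var_a": {"cat1": 0, "cat2": 1, "cat3": 0},  # one carrier: singleton
    "var_b": {"cat1": 1, "cat2": 1, "cat3": 0},  # two carriers: not a singleton
    "var_c": {"cat1": 0, "cat2": 0, "cat3": 0},  # no carriers: not a singleton
}
singleton_ids = singleton_variants(toy_genotypes)
```

In a real analysis this step would follow the deleterious-impact prediction described in the abstract, which is not modeled here.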
Identifying the genetic factors that underlie complex traits is central to understanding the mechanistic underpinnings of evolution. Cave-dwelling Astyanax mexicanus populations are well adapted to subterranean life and many populations appear to have evolved troglomorphic traits independently, while the surface-dwelling populations can be used as a proxy for the ancestral form. Here we present a high-resolution, chromosome-level surface fish genome, enabling the first genome-wide comparison between surface fish and cavefish populations. Using this resource, we performed quantitative trait locus (QTL) mapping analyses and found new candidate genes for eye loss such as dusp26. We used CRISPR gene editing in A. mexicanus to confirm the essential role of a gene within an eye size QTL, rx3, in eye formation. We also generated the first genome-wide evaluation of deletion variability across cavefish populations to gain insight into this potential source of cave adaptation. The surface fish genome reference now provides a more complete resource for comparative, functional and genetic studies of drastic trait differences within a species.
The complete assembly of each human chromosome is essential for understanding human biology and evolution. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
...we have developed a system to track individual regions that are under review. The primary assembly unit contains sequences for the non-redundant haploid assembly; this includes the scaffolds that make up the chromosome sequence as well as unplaced and unlocalized scaffolds that are thought to represent novel sequence (not shown in this picture).
Additionally, we wish to engage the research and clinical communities to identify regions that require targeted effort and to incorporate information from groups performing detailed work on specific loci.
The importance of the Gallus gallus (chicken) as a model organism and agricultural animal merits a continuation of sequence assembly improvement efforts. We present a new version of the chicken genome assembly (Gallus_gallus-5.0; GCA_000002315.3), built from combined long single molecule sequencing technology, finished BACs, and improved physical maps. In overall assembled bases, we see a gain of 183 Mb, including 16.4 Mb in placed chromosomes with a corresponding gain in the percentage of intact repeat elements characterized. Of the 1.21 Gb genome, we include three previously missing autosomes, GGA30, 31, and 33, and improve sequence contig length 10-fold over the previous Gallus_gallus-4.0. Despite the significant base representation improvements made, 138 Mb of sequence is not yet located to chromosomes. When annotated for gene content, Gallus_gallus-5.0 shows an increase of 4679 annotated genes (2768 noncoding and 1911 protein-coding) over those in Gallus_gallus-4.0. We also revisited the question of what genes are missing in the avian lineage, as assessed by the highest quality avian genome assembly to date, and found that a large fraction of the original set of missing genes are still absent in sequenced bird species. Finally, our new data support a detailed map of MHC-B, encompassing two segments: one with a highly stable gene copy number and another in which the gene copy number is highly variable. The chicken model has been a critical resource for many other fields of study, and this new reference assembly will substantially further these efforts.
Recurrent rearrangements of Chromosome 8p23.1 are associated with congenital heart defects and developmental delay. The complexity of this region has led to inconsistencies in the current reference assembly, confounding studies of genetic variation. Using comparative sequence-based approaches, we generated a high-quality 6.3-Mbp alternate reference assembly of an inverted Chromosome 8p23.1 haplotype. Comparison with nonhuman primates reveals a 746-kbp duplicative transposition and two separate inversion events that arose in the last million years of human evolution. The breakpoints associated with these rearrangements map to an ape-specific interchromosomal core duplicon that clusters at sites of evolutionary inversion (P = 7.8 × 10
). Refinement of microdeletion breakpoints identifies a subgroup of patients that map to the same interchromosomal core involved in the evolutionary formation of the duplication blocks. Our results define a higher-order genomic instability element that has shaped the structure of specific chromosomes during primate evolution contributing to rearrangements associated with inversion and disease.
The rhesus macaque (Macaca mulatta) is the most widely studied nonhuman primate (NHP) in biomedical research. We present an updated reference genome assembly (Mmul_10, contig N50 = 46 Mbp), increasing the sequence contiguity 120-fold and annotate it using 6.5 million full-length transcripts, thus improving our understanding of gene content, isoform diversity, and repeat organization. With the improved assembly of segmental duplications, we discover novel lineage-specific genes and expand gene families that are potentially informative in studies of evolution and disease susceptibility. Whole-genome sequence data from 853 captive rhesus macaques identifies polymorphism in 85.7 million single-nucleotide and 10.5 million indel variants, including potentially damaging variants in genes associated with human autism and developmental delay, providing a framework for developing non-invasive NHP models of human disease.
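For readers unfamiliar with the contig N50 statistic quoted for Mmul_10, the following is a minimal sketch of the standard definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembled bases. The function name and the toy contig lengths are illustrative and are not data from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of the
    contig at which the running total first reaches half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy example: total is 290, half is 145; 80 + 70 = 150 reaches it at the 70 contig.
toy_n50 = n50([80, 70, 50, 40, 30, 20])
```

A higher N50 means that more of the assembly sits in long contigs, which is why the statistic is commonly used to summarize gains in contiguity.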
A compendium of rhesus macaque genome variation
We present a case of 9p- syndrome with a complex chromosomal event originally characterized by the classical karyotype approach as 46,XX,der(9)t(9;13)(p23;q13). We used advanced technologies (Bionano Genomics genome imaging and 10× Genomics sequencing) to characterize the location of the translocation and accompanying deletion on Chromosome 9 and duplication on Chromosome 13 with single-nucleotide breakpoint resolution. The translocation breakpoint was at Chr 9:190938 and Chr 13:50850492, the deletion at Chr 9:1-190938, and the duplication at Chr 13:50850492-114364328. We identified genes in the deletion and duplication regions that are known to be associated with this patient's phenotype (e.g.,
in dysmorphic facial features,
in developmental delay,
in developmental delay, and
in autism). Our results indicate that clinical genomic assessment of individuals with complex karyotypes can be refined to a single-base-pair resolution when utilizing Bionano and 10× Genomics sequencing. With the 10× Genomics data, we were also able to characterize other variation (e.g., loss of function) throughout the remainder of the patient's genome. Overall, the Bionano and 10× technologies complemented each other and provided important insight into our patient with 9p- syndrome. Altogether, these results indicate that newer technologies can identify precise genomic variants associated with unique patient phenotypes that permit discovery of novel genotype-phenotype correlations and therapeutic strategies.
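The sizes of the deletion and duplication follow directly from the coordinates reported in the abstract above. The sketch below assumes the intervals are 1-based and inclusive, which the abstract does not state, so the results could differ by one base under another convention. The function name is hypothetical.

```python
def span_bp(start, end):
    """Length in base pairs of an interval, assuming 1-based inclusive coordinates."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


# Coordinates as reported in the abstract.
deletion_bp = span_bp(1, 190938)                 # Chr 9:1-190938
duplication_bp = span_bp(50850492, 114364328)    # Chr 13:50850492-114364328

print(f"Chr 9 deletion: {deletion_bp:,} bp")
print(f"Chr 13 duplication: {duplication_bp:,} bp")
```

Under that assumption the deletion covers roughly 0.19 Mb of Chromosome 9 and the duplication roughly 63.5 Mb of Chromosome 13.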