Over the past decade, long-read, single-molecule DNA sequencing technologies have emerged as powerful players in genomics. With the ability to generate reads tens to thousands of kilobases in length with an accuracy approaching that of short-read sequencing technologies, these platforms have proven their ability to resolve some of the most challenging regions of the human genome, detect previously inaccessible structural variants and generate some of the first telomere-to-telomere assemblies of whole chromosomes. Long-read sequencing technologies will soon permit the routine assembly of diploid genomes, which will revolutionize genomics by revealing the full spectrum of human genetic variation, resolving some of the missing heritability and leading to the discovery of novel mechanisms of disease.
Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
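Three terms in the abstract above (homopolymer compression, QV50, and NG50) have simple standard definitions, and a minimal sketch may make them concrete. The function names below are hypothetical and chosen only for illustration. This is not HiCanu's implementation, only the textbook definitions: collapsing runs of identical bases, the Phred scale QV = -10·log10(error rate), and the contig length at which the cumulative sum of contigs, taken from longest to shortest, first reaches half of an assumed genome size.

```python
import math
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases to a single base (illustrative only)."""
    return "".join(base for base, _run in groupby(seq))


def phred_qv(accuracy: float) -> float:
    """Phred-scaled quality value for a per-base accuracy strictly between 0 and 1."""
    return -10.0 * math.log10(1.0 - accuracy)


def ng50(contig_lengths, genome_size):
    """Return the contig length at which the running total, from longest to
    shortest contig, first reaches half of genome_size; None if it never does."""
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if 2 * running_total >= genome_size:
            return length
    return None


if __name__ == "__main__":
    print(homopolymer_compress("AAACCGTTTT"))
    print(round(phred_qv(0.99999), 3))
    # Toy contig lengths and genome size, not data from the paper.
    print(ng50([50, 40, 30, 20, 10], genome_size=200))
```

Under these definitions a per-base accuracy of 99.999% corresponds to an error rate of 10^-5, which is where the QV50 figure comes from. Unlike N50, NG50 is normalized by the genome size rather than by the total assembly length, so it can be undefined when an assembly covers less than half the genome.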
The complete assembly of each human chromosome is essential for understanding human biology and evolution. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
Epigenetic patterns in a complete human genome. Gershman, Ariel; Sauria, Michael E G; Guitart, Xavi ... Science (American Association for the Advancement of Science), 04/2022, Volume 376, Issue 6588
The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.
Abstract
Centromeres are the chromosomal loci essential for faithful chromosome segregation during cell division. Although centromeres are transcribed and produce non-coding RNAs (cenRNAs) that affect centromere function, we still lack a mechanistic understanding of how centromere transcription is regulated. Here, using a targeted RNA isoform sequencing approach, we identified the transcriptional landscape at and surrounding all centromeres in budding yeast. Overall, cenRNAs are derived from transcription readthrough of pericentromeric regions but rarely span the entire centromere and are a complex mixture of molecules that are heterogeneous in abundance, orientation, and sequence. While most pericentromeres are transcribed throughout the cell cycle, centromere accessibility to the transcription machinery is restricted to S-phase. This temporal restriction is dependent on Cbf1, a centromere-binding transcription factor, that we demonstrate acts locally as a transcriptional roadblock. Cbf1 deletion leads to an accumulation of cenRNAs at all phases of the cell cycle which correlates with increased chromosome mis-segregation that is partially rescued when the roadblock activity is restored. We propose that a Cbf1-mediated transcriptional roadblock protects yeast centromeres from untimely transcription to ensure genomic stability.
Lay Summary
Centromeres are essential chromosomal regions that do not encode gene products and instead ensure the accurate partitioning of chromosomes during cell division. Despite the lack of genes, transcription has been detected at centromeres. It has not been clear where this centromeric RNA comes from and how it is regulated. In this study, the authors identified all of the centromeric RNAs at and around budding yeast centromeres during the cell cycle. Unlike RNAs that encode for proteins, centromeric RNAs are a complex mixture of transcripts that result from adjacent RNAs that continue into the centromere. The authors found that most transcription is blocked at the centromere border by a protein called Cbf1. This mechanism shields the centromere from untimely transcription to ensure genome stability.
Inheritance of each chromosome depends upon its centromere. A histone H3 variant, centromere protein A (CENP-A), is essential for epigenetically marking centromere location. We find that CENP-A is quantitatively retained at the centromere upon which it is initially assembled. CENP-C binds to CENP-A nucleosomes and is a prime candidate to stabilize centromeric chromatin. Using purified components, we find that CENP-C reshapes the octameric histone core of CENP-A nucleosomes, rigidifies both surface and internal nucleosome structure, and modulates terminal DNA to match the loose wrap that is found on native CENP-A nucleosomes at functional human centromeres. Thus, CENP-C affects nucleosome shape and dynamics in a manner analogous to allosteric regulation of enzymes. CENP-C depletion leads to rapid removal of CENP-A from centromeres, indicating their collaboration in maintaining centromere identity.
After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist. Here we present a human genome assembly that surpasses the continuity of GRCh38, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
The first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) release is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplication, and other complex regions. The current human genome reference (GRCh38) has been widely used in various human genomic studies. However, the large-scale genomic differences between these two important genome assemblies have not yet been characterized in detail.
Here, in addition to the previously reported "non-syntenic" regions, we find 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed website tool called SynPlotter. The discrepant regions (~21.6 Mbp) excluding telomeric and centromeric regions are highly structurally polymorphic in humans, where the deletions or duplications are likely associated with various human diseases, such as immune and neurodevelopmental disorders. The analyses of a newly identified discrepant region, the KLRC gene cluster, show that the depletion of KLRC2 by a single-deletion event is associated with natural killer cell differentiation in ~20% of humans. Meanwhile, the rapid amino acid replacements observed within KLRC3 are probably a result of natural selection in primate evolution.
Our study provides a foundation for understanding the large-scale structural genomic differences between the two crucial human reference genomes, and is thereby important for future human genomics studies.
Recent breakthroughs with synthetic budding yeast chromosomes expedite the creation of synthetic mammalian chromosomes and genomes. Mammals, unlike budding yeast, depend on the histone H3 variant, CENP-A, to epigenetically specify the location of the centromere, the locus essential for chromosome segregation. Prior human artificial chromosomes (HACs) required large arrays of centromeric α-satellite repeats harboring binding sites for the DNA sequence-specific binding protein, CENP-B. We report the development of a type of HAC that functions independently of these constraints. Formed by an initial CENP-A nucleosome seeding strategy, a construct lacking repetitive centromeric DNA formed several self-sufficient HACs that showed no uptake of genomic DNA. In contrast to traditional α-satellite HAC formation, the non-repetitive construct can form functional HACs without CENP-B or initial CENP-A nucleosome seeding, revealing distinct paths to centromere formation for different DNA sequence types. Our developments streamline the construction and characterization of HACs to facilitate mammalian synthetic genome efforts.
• Development of human artificial chromosomes (HACs) where CENP-A chromatin is seeded
• Seeding CENP-A nucleosome assembly induces centromere formation
• Seeding centromeric chromatin bypasses sequence elements in repetitive centromere DNA
• Non-repetitive HAC templates ease initial construction and downstream genomic analyses
Development of human artificial chromosomes that bypass centromeric DNA removes a key barrier limiting mammalian synthetic genome efforts.