The Neanderthal and Denisovan genomes enabled the discovery of sequences that differ between modern and archaic humans, the majority of which are noncoding. However, our understanding of the ...regulatory consequences of these differences remains limited, in part due to the decay of regulatory marks in ancient samples. Here, we used a massively parallel reporter assay in embryonic stem cells, neural progenitor cells, and bone osteoblasts to investigate the regulatory effects of the 14,042 single-nucleotide modern human-specific variants. Overall, 1791 (13%) of sequences containing these variants showed active regulatory activity, and 407 (23%) of these drove differential expression between human groups. Differentially active sequences were associated with divergent transcription factor binding motifs, and with genes enriched for vocal tract and brain anatomy and function. This work provides insight into the regulatory function of variants that emerged along the modern human lineage and the recent evolution of human gene expression.
Genetic differences that specify unique aspects of human evolution have typically been identified by comparative analyses between the genomes of humans and closely related primates, including more ...recently the genomes of archaic hominins. Not all regions of the genome, however, are equally amenable to such study. Recurrent copy number variation (CNV) at chromosome 16p11.2 accounts for approximately 1% of cases of autism and is mediated by a complex set of segmental duplications, many of which arose recently during human evolution. Here we reconstruct the evolutionary history of the locus and identify bolA family member 2 (BOLA2) as a gene duplicated exclusively in Homo sapiens. We estimate that a 95-kilobase-pair segment containing BOLA2 duplicated across the critical region approximately 282 thousand years ago (ka), one of the latest among a series of genomic changes that dramatically restructured the locus during hominid evolution. All humans examined carried one or more copies of the duplication, which nearly fixed early in the human lineage--a pattern unlikely to have arisen so rapidly in the absence of selection (P < 0.0097). We show that the duplication of BOLA2 led to a novel, human-specific in-frame fusion transcript and that BOLA2 copy number correlates with both RNA expression (r = 0.36) and protein level (r = 0.65), with the greatest expression difference between human and chimpanzee in experimentally derived stem cells. Analyses of 152 patients carrying a chromosome 16p11. rearrangement show that more than 96% of breakpoints occur within the H. sapiens-specific duplication. In summary, the duplicative transposition of BOLA2 at the root of the H. sapiens lineage about 282 ka simultaneously increased copy number of a gene associated with iron homeostasis and predisposed our species to recurrent rearrangements associated with disease.
Topological associating domains (TADs) are self-interacting genomic units crucial for shaping gene regulation patterns. Despite their importance, the extent of their evolutionary conservation and its ...functional implications remain largely unknown. In this study, we generate Hi-C and ChIP-seq data and compare TAD organization across four primate and four rodent species and characterize the genetic and epigenetic properties of TAD boundaries in correspondence to their evolutionary conservation. We find 14% of all human TAD boundaries to be shared among all eight species (ultraconserved), while 15% are human-specific. Ultraconserved TAD boundaries have stronger insulation strength, CTCF binding, and enrichment of older retrotransposons compared to species-specific boundaries. CRISPR-Cas9 knockouts of an ultraconserved boundary in a mouse model lead to tissue-specific gene expression changes and morphological phenotypes. Deletion of a human-specific boundary near the autism-related AUTS2 gene results in the upregulation of this gene in neurons. Overall, our study provides pertinent TAD boundary evolutionary conservation annotations and showcases the functional importance of TAD evolution.
Structural variation and single-nucleotide variation of the complement factor H (CFH) gene family underlie several complex genetic diseases, including age-related macular degeneration (AMD) and ...atypical hemolytic uremic syndrome (AHUS). To understand its diversity and evolution, we performed high-quality sequencing of this ∼360-kbp locus in six primate lineages, including multiple human haplotypes. Comparative sequence analyses reveal two distinct periods of gene duplication leading to the emergence of four CFH-related (CFHR) gene paralogs (CFHR2 and CFHR4 ∼25–35 Mya and CFHR1 and CFHR3 ∼7–13 Mya). Remarkably, all evolutionary breakpoints share a common ∼4.8-kbp segment corresponding to an ancestral CFHR gene promoter that has expanded independently throughout primate evolution. This segment is recurrently reused and juxtaposed with a donor duplication containing exons 8 and 9 from ancestral CFH, creating four CFHR fusion genes that include lineage-specific members of the gene family. Combined analysis of >5,000 AMD cases and controls identifies a significant burden of a rare missense mutation that clusters at the N terminus of CFH P = 5.81 × 10−8, odds ratio (OR) = 9.8 (3.67-Infinity). A bipolar clustering pattern of rare nonsynonymous mutations in patients with AMD (P < 10−3) and AHUS (P = 0.0079) maps to functional domains that show evidence of positive selection during primate evolution. Our structural variation analysis in >2,400 individuals reveals five recurrent rearrangement breakpoints that show variable frequency among AMD cases and controls. These data suggest a dynamic and recurrent pattern of mutation critical to the emergence of new CFHR genes but also in the predisposition to complex human genetic disease phenotypes.
Gene innovation by duplication is a fundamental evolutionary process but is difficult to study in humans due to the large size, high sequence identity, and mosaic nature of segmental duplication ...blocks. The human-specific gene hydrocephalus-inducing 2, HYDIN2, was generated by a 364 kbp duplication of 79 internal exons of the large ciliary gene HYDIN from chromosome 16q22.2 to chromosome 1q21.1. Because the HYDIN2 locus lacks the ancestral promoter and seven terminal exons of the progenitor gene, we sought to characterize transcription at this locus by coupling reverse transcription polymerase chain reaction and long-read sequencing.
5' RACE indicates a transcription start site for HYDIN2 outside of the duplication and we observe fusion transcripts spanning both the 5' and 3' breakpoints. We observe extensive splicing diversity leading to the formation of altered open reading frames (ORFs) that appear to be under relaxed selection. We show that HYDIN2 adopted a new promoter that drives an altered pattern of expression, with highest levels in neural tissues. We estimate that the HYDIN duplication occurred ~3.2 million years ago and find that it is nearly fixed (99.9%) for diploid copy number in contemporary humans. Examination of 73 chromosome 1q21 rearrangement patients reveals that HYDIN2 is deleted or duplicated in most cases.
Together, these data support a model of rapid gene innovation by fusion of incomplete segmental duplications, altered tissue expression, and potential subfunctionalization or neofunctionalization of HYDIN2 early in the evolution of the Homo lineage.
The coding regions of the human and chimpanzee genomes are 99% identical, which implies changes to the noncoding genome are likely responsible for the divergent phenotypes between these species. A ...large body of research has focused on noncoding regions in an attempt to decode their function and implications on phenotypes and disease. In this work, I examined noncoding regulatory sequences unique to modern humans and identified candidate regulatory sequences that may have contributed to modern human speciation. First, I tested the regulatory potential of all fixed or nearly fixed single nucleotide changes in the modern human lineage as compared to Neanderthal and Denisovan using Massively Parallel Reporter Assays (MPRA). We found that a subset of these sequences are potentially active regulatory elements, and an additional subset were differentially active between the archaic and modern sequence versions. The differentially active sequences were linked to genes involved in brain development, vocal tract function, and other phenotypes. Additionally, we annotated changes in CTCF binding sites between humans, great apes, and archaic humans. CTCF is an architectural protein that helps facilitate genome looping and enhancer-promoter interactions. We identified CTCF binding sites that were uniquely gained or lost in the human genome, as compared to great apes (human specific), and separately compared to Neanderthal and Denisovan (recent human specific). We identified 2,230 human specific gained CTCF sites and 24 human specific lost CTCF sites, as compared to great apes. As compared to Neanderthal and Denisovan, we found 24 recent human specific gained sites. We observed an enrichment of human gained sites at TAD boundaries, but no enrichment for the human lost CTCF sites. Additionally, we found that human specific gained CTCF sites were enriched near genes involved in cognitive function and recent human specific CTCF sites were enriched near genes related to chondrocyte differentiation. Finally, I created a stable human iPSC line in which I deleted one human specific gained CTCF site. I differentiated these cells into neurons for further characterization but found that this deletion had no effect on gene expression in the region. My work provides a list of single nucleotide changes and CTCF sites that are interesting targets for future research in determining the effects of noncoding sequence changes on modern human evolution.
Co-option of transposable elements (TEs) to become part of existing or new enhancers is an important mechanism for evolution of gene regulation. However, contributions of lineage-specific TE ...insertions to recent regulatory adaptations remain poorly understood. Gibbons present a suitable model to study these contributions as they have evolved a lineage-specific TE called LAVA (LINE-AluSz-VN-TR-Alu
LIKE), which is still active in the gibbon genome. The LAVA retrotransposon is thought to have played a role in the emergence of the highly rearranged structure of the gibbon genome by disrupting transcription of cell cycle genes. In this study, we investigated whether LAVA may have also contributed to the evolution of gene regulation by adopting enhancer function. We characterized fixed and polymorphic LAVA insertions across multiple gibbons and found 96 LAVA elements overlapping enhancer chromatin states. Moreover, LAVA was enriched in multiple transcription factor binding motifs, was bound by an important transcription factor (PU.1), and was associated with higher levels of gene expression in cis. We found gibbon-specific signatures of purifying/positive selection at 27 LAVA insertions. Two of these insertions were fixed in the gibbon lineage and overlapped with enhancer chromatin states, representing putative co-opted LAVA enhancers. These putative enhancers were located within genes encoding SETD2 and RAD9A, two proteins that facilitate accurate repair of DNA double-strand breaks and prevent chromosomal rearrangement mutations. Co-option of LAVA in these genes may have influenced regulation of processes that preserve genome integrity. Our findings highlight the importance of considering lineage-specific TEs in studying evolution of gene regulatory elements.
Recurrent rearrangements of Chromosome 8p23.1 are associated with congenital heart defects and developmental delay. The complexity of this region has led to inconsistencies in the current reference ...assembly, confounding studies of genetic variation. Using comparative sequence-based approaches, we generated a high-quality 6.3-Mbp alternate reference assembly of an inverted Chromosome 8p23.1 haplotype. Comparison with nonhuman primates reveals a 746-kbp duplicative transposition and two separate inversion events that arose in the last million years of human evolution. The breakpoints associated with these rearrangements map to an ape-specific interchromosomal core duplicon that clusters at sites of evolutionary inversion (P = 7.8 × 10
). Refinement of microdeletion breakpoints identifies a subgroup of patients that map to the same interchromosomal core involved in the evolutionary formation of the duplication blocks. Our results define a higher-order genomic instability element that has shaped the structure of specific chromosomes during primate evolution contributing to rearrangements associated with inversion and disease.