Genetic changes causing brain size expansion in human evolution have remained elusive. Notch signaling is essential for radial glia stem cell proliferation and is a determinant of neuronal number in ...the mammalian cortex. We find that three paralogs of human-specific NOTCH2NL are highly expressed in radial glia. Functional analysis reveals that different alleles of NOTCH2NL have varying potencies to enhance Notch signaling by interacting directly with NOTCH receptors. Consistent with a role in Notch signaling, NOTCH2NL ectopic expression delays differentiation of neuronal progenitors, while deletion accelerates differentiation into cortical neurons. Furthermore, NOTCH2NL genes provide the breakpoints in 1q21.1 distal deletion/duplication syndrome, where duplications are associated with macrocephaly and autism and deletions with microcephaly and schizophrenia. Thus, the emergence of human-specific NOTCH2NL genes may have contributed to the rapid evolution of the larger human neocortex, accompanied by loss of genomic stability at the 1q21.1 locus and resulting recurrent neurodevelopmental disorders.
Display omitted
•NOTCH2NLA,B,C expressed in human fetal brain radial glia stem cells, arose 0.5–4 mya•These genes encode secreted, Notch-related proteins that enhance Notch signaling•Overexpression of NOTCH2NL delays neuronal differentiation, while deletion accelerates it•NOTCH2NLA and NOTCH2NLB serve as breakpoints in 1q21.1 deletion/duplication syndrome
Human-specific Notch paralogs are expressed in radial glia, enhance Notch signaling, and impact neuronal differentiation.
In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 ...insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity.
Display omitted
•We sequence resolve and annotate 99,604 common human structural variants•55% of VNTRs map to the end of chromosomes and correlate with double-strand breaks•Alternate alleles facilitate accurate genotyping with short reads and new associations•We patch the reference and add diversity needed for developing a pan human genome
Long-read sequencing allows generation of a large catalog of human structural variants and the development of an algorithm for genotyping SVs from short-read data, clarifying the spectrum and importance of structural variation in the human genome.
Direct comparisons of human and non-human primate brains can reveal molecular pathways underlying remarkable specializations of the human brain. However, chimpanzee tissue is inaccessible during ...neocortical neurogenesis when differences in brain size first appear. To identify human-specific features of cortical development, we leveraged recent innovations that permit generating pluripotent stem cell-derived cerebral organoids from chimpanzee. Despite metabolic differences, organoid models preserve gene regulatory networks related to primary cell types and developmental processes. We further identified 261 differentially expressed genes in human compared to both chimpanzee organoids and macaque cortex, enriched for recent gene duplications, and including multiple regulators of PI3K-AKT-mTOR signaling. We observed increased activation of this pathway in human radial glia, dependent on two receptors upregulated specifically in human: INSR and ITGB8. Our findings establish a platform for systematic analysis of molecular changes contributing to human brain development and evolution.
•Brain organoids preserve gene expression networks despite elevated metabolic stress•Chimpanzee organoids enable studies of the evolution of human brain development•Primary and organoid samples reveal 261 human-specific gene expression changes•Human radial glia exhibit increased mTOR activation compared to non-human primates
Comparisons of cerebral organoids between chimpanzees, macaques, and humans reveal gene duplications and cell-signaling alterations that explain developmental evolutionary differences that are unique to us as a species.
Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an ...approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits.
We have developed a computational method based on polyploid phasing of long sequence reads to resolve collapsed regions of segmental duplications within genome assemblies. Segmental Duplication ...Assembler (SDA; https://github.com/mvollger/SDA ) constructs graphs in which paralogous sequence variants define the nodes and long-read sequences provide attraction and repulsion edges, enabling the partition and assembly of long reads corresponding to distinct paralogs. We apply it to single-molecule, real-time sequence data from three human genomes and recover 33-79 megabase pairs (Mb) of duplications in which approximately half of the loci are diverged (<99.8%) compared to the reference genome. We show that the corresponding sequence is highly accurate (>99.9%) and that the diverged sequence corresponds to copy-number-variable paralogs that are absent from the human reference genome. Our method can be applied to other complex genomes to resolve the last gene-rich gaps, improve duplicate gene annotation, and better understand copy-number-variant genetic diversity at the base-pair level.
Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length ...complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single- to mega-base pair-sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.
The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in ...human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP).
Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis.
LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.
Gene innovation by duplication is a fundamental evolutionary process but is difficult to study in humans due to the large size, high sequence identity, and mosaic nature of segmental duplication ...blocks. The human-specific gene hydrocephalus-inducing 2, HYDIN2, was generated by a 364 kbp duplication of 79 internal exons of the large ciliary gene HYDIN from chromosome 16q22.2 to chromosome 1q21.1. Because the HYDIN2 locus lacks the ancestral promoter and seven terminal exons of the progenitor gene, we sought to characterize transcription at this locus by coupling reverse transcription polymerase chain reaction and long-read sequencing.
5' RACE indicates a transcription start site for HYDIN2 outside of the duplication and we observe fusion transcripts spanning both the 5' and 3' breakpoints. We observe extensive splicing diversity leading to the formation of altered open reading frames (ORFs) that appear to be under relaxed selection. We show that HYDIN2 adopted a new promoter that drives an altered pattern of expression, with highest levels in neural tissues. We estimate that the HYDIN duplication occurred ~3.2 million years ago and find that it is nearly fixed (99.9%) for diploid copy number in contemporary humans. Examination of 73 chromosome 1q21 rearrangement patients reveals that HYDIN2 is deleted or duplicated in most cases.
Together, these data support a model of rapid gene innovation by fusion of incomplete segmental duplications, altered tissue expression, and potential subfunctionalization or neofunctionalization of HYDIN2 early in the evolution of the Homo lineage.
In this work, I set out to characterize new genes contained specifically within the human genome but absent from any other, including our closest evolutionary cousins. Our interest in these genes is ...twofold: First, gene duplication is a fundamental process by which evolutionary innovations at the organismal level arise; and second, we believe that variation in these new genes is an important and unappreciated source of phenotypic differences between individuals, both normal and pathogenic. The ability to interpret observed variation in a gene, such as to determine if a mutation in that gene is likely to be impactful, requires quality annotation. This includes an understanding of gene structure: how the gene is transcribed, processed, and which base pairs code for protein or have other specific roles. The major challenge presented by this category of genes is that they are highly identical to other parts of the genome, and as such, most methods of investigation struggle to tell them apart. As will be expanded upon, this annotation is currently absent or insufficient for genes that are specific to the human genome. Combined with evolutionary and expression analysis, solving this annotation problem enables us to understand how new genes are created in the human genome at the very earliest stages. Combined with surveys of natural occurring and pathogenic variation, it enables us to understand which new genes are functional and which are not, and among those with function, which harbor deleterious variants that can cause disease. With these goals in mind, I set out to solve the annotation of human-specific duplicate genes most promising for functional status. I performed a close study of one such gene, HYDIN2, where I present an evolutionary analysis that gives insight into gene creation and associated disease mechanism, present a survey of naturally occurring variation, and describe the complex transcriptional pattern of a gene that serves as a case study in how duplication and rearrangement of genome segments can lead to rapid gene innovation. Next, I present a technique to more rapidly and rigorously study the transcription of any recently created duplicate gene. I apply this technique to the body of human-specific duplicate genes as well as other expanded gene families. I show that by improving upon current annotations we gain insight into the structural history, expression pattern, and functional status of such genes. I conclude with logical next steps and promising future directions. Ultimately, this work increments our understanding of how gene duplication leads to evolutionary innovation specifically in human, the functional impact of these species-specific differences, and how variation in these genes can contribute to disease.