To further our understanding of the genetic etiology of autism, we generated and analyzed genome sequence data from 516 idiopathic autism families (2,064 individuals). This resource includes >59 million single-nucleotide variants (SNVs) and 9,212 private copy number variants (CNVs), of which 133,992 and 88 are de novo mutations (DNMs), respectively. We estimate a mutation rate of ∼1.5 × 10⁻⁸ SNVs per site per generation with a significantly higher mutation rate in repetitive DNA. Comparing probands and unaffected siblings, we observe several DNM trends. Probands carry more gene-disruptive CNVs and SNVs, resulting in severe missense mutations and mapping to predicted fetal brain promoters and embryonic stem cell enhancers. These differences become more pronounced for autism genes (p = 1.8 × 10⁻³, OR = 2.2). Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons (p = 3 × 10⁻³), suggesting a path forward for genetically characterizing more complex cases of autism.
• Comprehensive CNV/SNV dataset from whole-genome sequencing of 516 autism families
• Estimated human germline mutation rate of ∼1.5 × 10⁻⁸ substitutions/site/generation
• Autism probands enriched for de novo missense, promoter, and enhancer mutations
• Oligogenic de novo mutation signals for genes enriched in striatal neuron expression
Genomic analysis of 516 families with an autistic child and an unaffected sibling suggests that simplex autism results from de novo mutation and is oligogenic.
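The per-site, per-generation mutation rate quoted above follows from simple arithmetic: the number of de novo SNVs per child divided by the number of sites surveyed across both inherited haploid genomes. The sketch below illustrates only that arithmetic. The function name and all input values are hypothetical, chosen for illustration, and are not the counts or callable-genome size used in the study.

```python
def per_site_mutation_rate(total_dnms, n_children, callable_haploid_bp):
    """Germline mutation rate per site per generation.

    Each child inherits two haploid genomes, so the number of sites
    surveyed per child is 2 * callable_haploid_bp.
    """
    if n_children <= 0 or callable_haploid_bp <= 0:
        raise ValueError("n_children and callable_haploid_bp must be positive")
    dnms_per_child = total_dnms / n_children
    return dnms_per_child / (2 * callable_haploid_bp)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
rate = per_site_mutation_rate(
    total_dnms=9_000,           # hypothetical count of validated de novo SNVs
    n_children=100,             # hypothetical number of sequenced children
    callable_haploid_bp=3.0e9,  # hypothetical callable haploid genome size (bp)
)
print(f"{rate:.2e} mutations per site per generation")
```

In practice an estimate of this kind also depends on the false-positive and false-negative rates of DNM calling, which this sketch does not model.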
In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.
Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single- to mega-base pair-sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.
Recurrent de novo (DN) and likely gene-disruptive (LGD) mutations contribute significantly to autism spectrum disorders (ASDs) but have been primarily investigated in European cohorts. Here, we sequence 189 risk genes in 1,543 Chinese ASD probands (1,045 from trios). We report an 11-fold increase in the odds of DN LGD mutations compared with expectation under an exome-wide neutral model of mutation. In aggregate, ∼4% of ASD patients carry a DN mutation in one of just 29 autism risk genes. The most prevalent gene for recurrent DN mutations is SCN2A (1.1% of patients) followed by CHD8, DSCAM, MECP2, POGZ, WDFY3 and ASH1L. We identify novel DN LGD recurrences (GIGYF2, MYT1L, CUL3, DOCK8 and ZNF292) and DN mutations in previous ASD candidates (ARHGAP32, NCOR1, PHIP, STXBP1, CDKL5 and SHANK1). Phenotypic follow-up confirms potential subtypes and highlights how large global cohorts might be leveraged to prove the pathogenic significance of individually rare mutations.
The interplay of natural selection and genetic drift, influenced by geographic isolation, mating systems and population size, determines patterns of genetic diversity within species. The sperm whale provides an interesting example of a long-lived species with few geographic barriers to dispersal. Worldwide mtDNA diversity is relatively low, but highly structured among geographic regions and social groups, attributed to female philopatry. However, it is unclear whether this female philopatry is due to geographic regions or social groups, or how this might vary on a worldwide scale. To answer these questions, we combined mtDNA information for 1091 previously published samples with 542 newly obtained DNA profiles (394-bp mtDNA, sex, 13 microsatellites) including the previously unsampled Indian Ocean, and social group information for 541 individuals. We found low mtDNA diversity (π = 0.430%) reflecting an expansion event <80 000 years BP, but strong differentiation by ocean, among regions within some oceans, and among social groups. In comparison, microsatellite differentiation was low at all levels, presumably due to male-mediated gene flow. A hierarchical AMOVA showed that regions were important for explaining mtDNA variance in the Indian Ocean, but not Pacific, with social group sampling in the Atlantic too limited to include in analyses. Social groups were important in partitioning mtDNA and microsatellite variance within both oceans. Therefore, both geographic philopatry and social philopatry influence genetic structure in the sperm whale, but their relative importance differs by sex and ocean, reflecting breeding behaviour, geographic features and perhaps a more recent origin of sperm whales in the Pacific.
By investigating the interplay of evolutionary forces operating at different temporal and geographic scales, we show that sperm whales are perhaps a unique example of a worldwide population expansion followed by rapid assortment due to female social organization.
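The mtDNA diversity statistic reported above, nucleotide diversity (π), is conventionally the average proportion of sites that differ between two sequences drawn from the sample. A minimal sketch of that calculation follows. The function name and the toy haplotypes are hypothetical, and the sketch assumes pre-aligned, equal-length sequences with no gaps or ambiguous bases. It is not the software or data used in the study.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same non-zero length")
    # Proportion of mismatched sites for every unordered pair of sequences.
    pair_diffs = [
        sum(a != b for a, b in zip(s1, s2)) / length
        for s1, s2 in combinations(sequences, 2)
    ]
    return sum(pair_diffs) / len(pair_diffs)


# Hypothetical 10-bp haplotypes, for illustration only.
toy_haplotypes = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy_haplotypes)
print(f"pi = {pi:.3%}")
```

A value such as π = 0.430% therefore means that two randomly chosen sequences differ, on average, at roughly 4 of every 1,000 aligned sites.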
Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.
Social affective and communication symptoms are central to autism spectrum disorder (ASD), yet their severity differs across toddlers: Some toddlers with ASD display improving abilities across early ages and develop good social and language skills, while others with "profound" autism have persistently low social, language and cognitive skills and require lifelong care. The biological origins of these opposite ASD social severity subtypes and developmental trajectories are not known.
Because ASD involves early brain overgrowth and excess neurons, we measured size and growth in 4910 embryonic-stage brain cortical organoids (BCOs) from a total of 10 toddlers with ASD and 6 controls (averaging 196 individual BCOs measured/subject). In a 2021 batch, we measured BCOs from 10 ASD and 5 controls. In a 2022 batch, we tested replicability of BCO size and growth effects by generating and measuring an independent batch of BCOs from 6 ASD and 4 control subjects. BCO size was analyzed within the context of our large, one-of-a-kind social symptom, social attention, social brain and social and language psychometric normative datasets ranging from N = 266 to N = 1902 toddlers. BCO growth rates were examined by measuring size changes between 1 and 2 months of organoid development. Neurogenesis markers at 2 months were examined at the cellular level. At the molecular level, we measured activity and expression of Ndel1, which is a prime target for cell cycle-activated kinases, is known to regulate cell cycle, proliferation, neurogenesis, and growth, and is known to be involved in neuropsychiatric conditions.
At the BCO level, analyses showed BCO size was significantly enlarged by 39% and 41% in ASD in the 2021 and 2022 batches, respectively. The larger the embryonic BCO size, the more severe the ASD social symptoms. Correlations between BCO size and social symptoms were r = 0.719 in the 2021 batch and r = 0.873 in the replication 2022 batch. ASD BCOs grew at an accelerated rate nearly 3 times faster than controls. At the cell level, the two largest ASD BCOs had accelerated neurogenesis. At the molecular level, Ndel1 activity was highly correlated with the growth rate and size of BCOs. Two BCO subtypes were found in ASD toddlers: Those in one subtype had very enlarged BCO size with accelerated rate of growth and neurogenesis; a profound autism clinical phenotype displaying severe social symptoms, reduced social attention, reduced cognitive, very low language and social IQ; and substantially altered growth in specific cortical social, language and sensory regions. Those in a second subtype had milder BCO enlargement and milder social, attention, cognitive, language and cortical differences.
Larger samples of ASD toddler-derived BCO and clinical phenotypes may reveal additional ASD embryonic subtypes.
By embryogenesis, the biological bases of two subtypes of ASD social and brain development (profound autism and mild autism) are already present and measurable and involve dysregulated cell proliferation and accelerated neurogenesis and growth. The larger the embryonic BCO size in ASD, the more severe the toddler's social symptoms and the more reduced the social attention, language ability, and IQ, and the more atypical the growth of social and language brain regions.
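The reported relationship between BCO size and social symptom severity is expressed as a correlation coefficient r. Assuming the reported r is a Pearson coefficient, which the abstract does not state, a minimal sketch of the calculation is given below. The function name and the paired values are hypothetical, chosen for illustration, and are not measurements from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired values: organoid size (arbitrary units) per subject
# and that subject's social symptom score. Not data from the study.
organoid_sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
symptom_scores = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(organoid_sizes, symptom_scores)
print(f"r = {r:.3f}")
```

With samples as small as those described (10 and 6 ASD subjects per batch), a coefficient like this carries wide uncertainty, which is consistent with the stated need for larger samples.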
Recurrent copy-number variations (CNVs) at chromosome 16p11.2 are associated with neurodevelopmental diseases, skeletal system abnormalities, anemia, and genitourinary defects. Among the 40 protein-coding genes encompassed within the rearrangement, some have roles in leukocyte biology and immunodeficiency, like SPN and CORO1A. We therefore investigated leukocyte differential counts and disease in 16p11.2 CNV carriers. In our clinically recruited cohort, we identified three deletion carriers from two families (out of 32 families assessed) with neutropenia and lymphopenia. They had no deleterious single-nucleotide or indel variant in known cytopenia genes, suggesting a possible causative role of the deletion. Notably, all three individuals had the lowest copy number of the human-specific BOLA2 duplicon (copy-number range: 3-8). Consistent with the lymphopenia and in contrast with the neutropenia associations, adult deletion carriers from UK Biobank (n = 74) showed lower lymphocyte (Padj = 0.04) and increased neutrophil (Padj = 8.31e-05) counts. Mendelian randomization studies pointed to reduced expression of CORO1A, KIF22, and BOLA2-SMG1P6 as causative for the lower lymphocyte counts. In conclusion, our data suggest that 16p11.2 deletion, and possibly also the lowest dosage of the BOLA2 duplicon, are associated with low lymphocyte counts. There is a trend linking 16p11.2 deletion with lower copy number of the BOLA2 duplicon to higher susceptibility to moderate neutropenia. Higher numbers of cases are warranted to confirm the association with neutropenia and to resolve the involvement of the deletion coupled with deleterious variants in other genes and/or with the structure and copy number of segments in the CNV breakpoint regions.