After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist. Here we present a human genome assembly that surpasses the continuity of GRCh38, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
The current version of the human reference genome, GRCh38, contains a number of errors including 1.2 Mbp of falsely duplicated and 8.04 Mbp of collapsed regions. These errors impact the variant calling of 33 protein-coding genes, including 12 with medical relevance. Here, we present FixItFelix, an efficient remapping approach, together with a modified version of the GRCh38 reference genome that improves the subsequent analysis across these genes within minutes for an existing alignment file while maintaining the same coordinates. We showcase these improvements over multi-ethnic control samples, demonstrating improvements for population variant calling as well as eQTL studies.
The Atacama Desert in Chile, hyperarid and with high-ultraviolet irradiance levels, is one of the harshest environments on Earth. Yet, dozens of species grow there, including Atacama-endemic plants. Herein, we establish the Talabre-Lejía transect (TLT) in the Atacama as an unparalleled natural laboratory to study plant adaptation to extreme environmental conditions. We characterized climate, soil, plant, and soil-microbe diversity at 22 sites (every 100 m of altitude) along the TLT over a 10-y period. We quantified drought, nutrient deficiencies, large diurnal temperature oscillations, and pH gradients that define three distinct vegetational belts along the altitudinal cline. We deep-sequenced transcriptomes of 32 dominant plant species spanning the major plant clades, and assessed soil microbes by metabarcoding sequencing. The top-expressed genes in the 32 Atacama species are enriched in stress responses, metabolism, and energy production. Moreover, their root-associated soils are enriched in growth-promoting bacteria, including nitrogen fixers. To identify genes associated with plant adaptation to harsh environments, we compared 32 Atacama species with the 32 closest sequenced species, comprising 70 taxa and 1,686,950 proteins. To perform phylogenomic reconstruction, we concatenated 15,972 ortholog groups into a supermatrix of 8,599,764 amino acids. Using two codon-based methods, we identified 265 candidate positively selected genes (PSGs) in the Atacama plants, 64% of which are located in Pfam domains, supporting their functional relevance. For 59/184 PSGs with an ortholog, we uncovered functional evidence linking them to plant resilience. As some Atacama plants are closely related to staple crops, these candidate PSGs are a "genetic goldmine" to engineer crop resilience to face climate change.
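The supermatrix step mentioned above (concatenating per-ortholog-group alignments into one long alignment per taxon) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `build_supermatrix`, the toy sequences, and the choice to pad taxa missing from an ortholog group with gap characters are all assumptions made for illustration.

```python
def build_supermatrix(alignments, taxa, missing="-"):
    """Concatenate per-group alignments into one sequence per taxon.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    taxa: ordered list of every taxon that should appear in the output.
    A taxon absent from a group is padded with `missing` (an assumption).
    """
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy data: two ortholog groups, the second lacking "sp2".
toy_alignments = [
    {"sp1": "MKV", "sp2": "MRV", "sp3": "MKI"},
    {"sp1": "LLGA", "sp3": "LMGA"},
]
matrix = build_supermatrix(toy_alignments, ["sp1", "sp2", "sp3"])
```

Every row of the result has the same length (the sum of the group widths), which is the property a downstream tree-building program would typically rely on.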
Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
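To put the quality values above in perspective, the short sketch below converts them to per-base error rates. It assumes the quality value is Phred-scaled (QV = -10 log10 of the error rate), which is the usual convention for k-mer-based assembly QV but is not stated in the abstract itself; the function name `qv_to_error_rate` is illustrative.

```python
def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


before = qv_to_error_rate(70.2)  # estimated error rate before polishing
after = qv_to_error_rate(73.9)   # estimated error rate after polishing

# Ratio of the two error rates; a gain of 3.7 QV units is about 2.3-fold.
fold_reduction = before / after

# Average number of bases per estimated error, before and after.
bases_per_error_before = 1 / before
bases_per_error_after = 1 / after
```

Under this assumption, both values correspond to fewer than one estimated error per ten million bases. The k-mer-based estimate and the 51% of errors reported as fixed are measured differently, so the two figures need not agree exactly.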
Emerging evidence links genes within human-specific segmental duplications (HSDs) to traits and diseases unique to our species. Strikingly, despite being nearly identical by sequence (>98.5%), paralogous HSD genes are differentially expressed across human cell and tissue types, though the underlying mechanisms have not been examined. We compared cross-tissue mRNA levels of 75 HSD genes from 30 families between humans and chimpanzees and found expression patterns consistent with relaxed selection on or neofunctionalization of derived paralogs. In general, ancestral paralogs exhibited greatest expression conservation with chimpanzee orthologs, though exceptions suggest certain derived paralogs may retain or supplant ancestral functions. Concordantly, analysis of long-read isoform sequencing data sets from diverse human tissues and cell lines found that about half of derived paralogs exhibited globally lower expression. To understand mechanisms underlying these differences, we leveraged data from human lymphoblastoid cell lines (LCLs) and found no relationship between paralogous expression divergence and post-transcriptional regulation, sequence divergence, or copy-number variation. Considering cis-regulation, we reanalyzed ENCODE data and recovered hundreds of previously unidentified candidate CREs in HSDs. We also generated large-insert ChIP-sequencing data for active chromatin features in an LCL to better distinguish paralogous regions. Some duplicated CREs were sufficient to drive differential reporter activity, suggesting they may contribute to divergent cis-regulation of paralogous genes. This work provides evidence that cis-regulatory divergence contributes to novel expression patterns of recent gene duplicates in humans.
Comprehending ecological dynamics requires not only knowledge of modern communities but also detailed reconstructions of ecosystem history. Ancient DNA (aDNA) metabarcoding allows biodiversity responses to major climatic change to be explored at different spatial and temporal scales. We extracted aDNA preserved in fossil rodent middens to reconstruct late Quaternary vegetation dynamics in the hyperarid Atacama Desert. By comparing our paleo‐informed millennial record with contemporary observations of interannual variations in diversity, we show that local plant communities behave differently at different timescales. In the interannual (years to decades) time frame, only annual herbaceous plants expand and contract their distributional ranges (emerging from persistent seed banks) in response to precipitation, whereas the distribution of perennials appears to be extraordinarily resilient. In contrast, at longer timescales (thousands of years) many perennial species were displaced up to 1,000 m downslope during pluvial events. Given ongoing and future natural and anthropogenically induced climate change, our results not only provide baselines for vegetation in the Atacama Desert, but also help to inform how these and other high mountain plant communities may respond to fluctuations of climate in the future.
Comprehending ecological dynamics requires not only knowledge of modern communities but also detailed reconstructions of ecosystem history. In this study, we compared long‐term records of Atacama Desert plant biodiversity preserved in fossils to interannual variations in diversity. In the interannual (years to decades) time frame, only annual herbaceous plants expand and contract their distributional ranges in response to precipitation. In contrast, at longer timescales (thousands of years) many perennial and annual species were displaced up to 1,000 m downslope during pluvial events. Our results show how plant communities have responded to past climate change and could help predict how they may respond in the future.
The enhanced cognitive abilities characterizing the human species result from specialized features of neurons and circuits. Here, we report that the hominid-specific gene LRRC37B encodes a receptor expressed in human cortical pyramidal neurons (CPNs) and selectively localized to the axon initial segment (AIS), the subcellular compartment triggering action potentials. Ectopic expression of LRRC37B in mouse CPNs in vivo leads to reduced intrinsic excitability, a distinctive feature of some classes of human CPNs. Molecularly, LRRC37B binds to the secreted ligand FGF13A and to the voltage-gated sodium channel (Nav) β-subunit SCN1B. LRRC37B concentrates inhibitory effects of FGF13A on Nav channel function, thereby reducing excitability, specifically at the AIS level. Electrophysiological recordings in adult human cortical slices reveal lower neuronal excitability in human CPNs expressing LRRC37B. LRRC37B thus acts as a species-specific modifier of human neuron excitability, linking human genome and cell evolution, with important implications for human brain function and diseases.
Revealing hidden plant diversity in arid environments. Carrasco‐Puga, Gabriela; Díaz, Francisca P.; Soto, Daniela C. ... Ecography (Copenhagen), January 2021, Volume 44, Issue 1. Journal article; peer reviewed; open access.
Estimating total plant diversity in extreme or hyperarid environments can be challenging, as adaptations to pronounced climate variability include evading prolonged stress periods through seeds or specialized underground organs. Short‐term surveys of these ecosystems are thus likely poor estimators of actual diversity. Here we develop a multimethod strategy to obtain a more complete understanding of plant diversity from a community in the Atacama Desert. We explicitly test environmental DNA‐based techniques (eDNA) to see if they can reveal the observed and ‘hidden’ (dormant or locally rare) species.
To estimate total plant diversity, we performed long‐term traditional surveys during eight consecutive years, including El Niño and La Niña events; we then analyzed eDNA from soil samples using high‐throughput sequencing. We further used soil pollen analysis and soil seed bank germination assays to identify ‘hidden’ species. Each approach offers different subsets of current biodiversity at different taxonomic, spatial and temporal resolution, with a total of 92 taxa identified along the transect. Traditional field surveys identified 77 plant species over eight consecutive years. Observed community composition greatly varies interannually, with only 22 species seen every year. eDNA analysis revealed 37 taxa, eight of which were ‘hidden’ in our field surveys. Soil samples contain a viable seed bank of 21 taxa. Soil pollen (27 taxa) and eDNA analysis show affinities with vegetation at the landscape scale but a weak relationship to local plot diversity.
Multimethod approaches (including eDNA) in deserts are valuable tools that add to a comprehensive assessment of biodiversity in such extreme environments, where using a single method or observations over a few years is insufficient. Our results can also explain the resilience of Atacama plant communities as ‘hidden’ taxa may have been active in the recent past or could even emerge in the future as accelerated global environmental change continues unabated.
The rapid increase in the availability of transcriptomics data generated by RNA sequencing represents both a challenge and an opportunity for biologists without bioinformatics training. The challenge is handling, integrating, and interpreting these data sets. The opportunity is to use this information to generate testable hypotheses to understand molecular mechanisms controlling gene expression and biological processes (Fig. 1). A successful strategy to generate tractable hypotheses from transcriptomics data has been to build undirected network graphs based on patterns of gene co-expression. Many examples of new hypotheses derived from network analyses can be found in the literature, spanning different organisms including plants and specific fields such as root developmental biology.
In order to make the process of constructing a gene co-expression network more accessible to biologists, here we provide step-by-step instructions using published RNA-seq experimental data obtained from a public database. Similar strategies have been used in previous studies to advance root developmental biology. This guide includes basic instructions for the operation of widely used open source platforms such as Bio-Linux, R, and Cytoscape. Even though the data we used in this example were obtained from Arabidopsis thaliana, the workflow developed in this guide can be easily adapted to work with RNA-seq data from any organism.
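The idea of an undirected co-expression network described above can be sketched in a few lines. This is not the guide's own workflow (which uses Bio-Linux, R, and Cytoscape); it is a minimal Python illustration that assumes Pearson correlation as the co-expression measure and an arbitrary cutoff of |r| >= 0.9. The gene names, expression values, and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges (gene_a, gene_b, r) with |r| >= threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, round(r, 3)))
    return edges


# Hypothetical toy data: expression of three genes across four samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(toy_expression)
```

Each tuple is one edge of the network. Written out as a delimited table with source, target, and weight columns, an edge list of this kind is the sort of input a network viewer such as Cytoscape can import.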
Autism spectrum disorder (ASD) involves complex genetics interacting with the perinatal environment, complicating the discovery of common genetic risk. The epigenetic layer of DNA methylation shows dynamic developmental changes and molecular memory of in utero experiences, particularly in placenta, a fetal tissue discarded at birth. However, current array-based methods to identify novel ASD risk genes lack coverage of the most structurally and epigenetically variable regions of the human genome.
We use whole genome bisulfite sequencing in placenta samples from prospective ASD studies to discover a previously uncharacterized ASD risk gene, LOC105373085, renamed NHIP. Out of 134 differentially methylated regions associated with ASD in placental samples, a cluster at 22q13.33 corresponds to a 118-kb hypomethylated block that replicates in two additional cohorts. Within this locus, NHIP is functionally characterized as a nuclear peptide-encoding transcript with high expression in brain, and increased expression following neuronal differentiation or hypoxia, but decreased expression in ASD placenta and brain. NHIP overexpression increases cellular proliferation and alters expression of genes regulating synapses and neurogenesis, overlapping significantly with known ASD risk genes and NHIP-associated genes in ASD brain. A common structural variant disrupting the proximity of NHIP to a fetal brain enhancer is associated with NHIP expression and methylation levels and ASD risk, demonstrating a common genetic influence.
Together, these results identify and initially characterize a novel environmentally responsive ASD risk gene relevant to brain development in a hitherto under-characterized region of the human genome.