Variability in SARS-CoV-2 susceptibility and COVID-19 disease severity between individuals is partly due to genetic factors. Here, we identify 4 genomic loci with suggestive associations for SARS-CoV-2 susceptibility and 19 for COVID-19 disease severity. Four of these 23 loci likely have an ethnicity-specific component. Genome-wide association study (GWAS) signals in 11 loci colocalize with expression quantitative trait loci (eQTLs) associated with the expression of 20 genes in 62 tissues/cell types (range: 1-43 tissues/gene), including lung, brain, heart, muscle, and skin as well as the digestive system and immune system. We perform genetic fine mapping to compute 99% credible SNP sets, which identify 10 GWAS loci that have eight or fewer SNPs in the credible set, including three loci with a single likely causal SNP. Our study suggests that the diverse symptoms and disease severity of COVID-19 observed between individuals are associated with variants across the genome, affecting gene expression levels in a wide variety of tissue types.
• Identification of 23 genomic loci with suggestive associations for COVID-19 disease
• Colocalized GWAS and eQTL signals associate with expression of 20 genes in 62 tissues
• In total, 45% of GWAS signals do not colocalize with eQTLs in blood or lung
• Genetic fine mapping identifies putative causal variants at COVID-19 GWAS loci
D’Antonio et al. characterize associations between GWAS signals for COVID-19 disease and eQTLs in 69 human tissues to identify causal variants and their underlying molecular mechanisms. They show that diverse symptoms and disease severity of COVID-19 are associated with variants affecting gene expression in a wide variety of tissues.
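The 99% credible SNP sets mentioned above can be illustrated with a minimal sketch. The abstract does not state how the per-SNP posterior probabilities were obtained, so the sketch assumes they are already available and sum to 1 within a locus; the function name, the SNP identifiers, and the probabilities are hypothetical.

```python
def credible_set(posteriors, coverage=0.99):
    """Return the smallest list of SNP ids whose posterior
    probabilities sum to at least `coverage`.

    `posteriors` maps SNP id -> posterior probability of being causal
    (assumed precomputed for one locus; not taken from the paper).
    """
    # Rank SNPs from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for snp, prob in ranked:
        chosen.append(snp)
        total += prob
        if total >= coverage:
            break
    return chosen


# Hypothetical locus with four candidate SNPs.
locus = {"snpA": 0.90, "snpB": 0.08, "snpC": 0.015, "snpD": 0.005}
print(credible_set(locus))
```

Under this definition, a locus with "one single likely causal SNP" is one where the top-ranked SNP alone reaches the coverage threshold.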
Despite their importance in gene innovation and phenotypic variation, duplicated regions have remained largely intractable owing to difficulties in accurately resolving their structure, copy number and sequence content. We present an algorithm (mrFAST) to comprehensively map next-generation sequence reads, which allows for the prediction of absolute copy-number variation of duplicated segments and genes. We examine three human genomes and experimentally validate genome-wide copy number differences. We estimate that, on average, 73-87 genes vary in copy number between any two individuals and find that these genic differences overwhelmingly correspond to segmental duplications (odds ratio = 135; P < 2.2 × 10^-16). Our method can distinguish between different copies of highly identical genes, providing a more accurate assessment of gene content and insight into functional constraint without the limitations of array-based technology.
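As a rough illustration of how mapped read counts can translate into absolute copy number, the sketch below scales per-window read depth by the depth observed in control regions of known copy number. This is a generic read-depth calculation under stated assumptions, not the mrFAST procedure itself, which the abstract does not describe; the function name, the window depths, and the diploid control assumption are all hypothetical.

```python
from statistics import median


def estimate_copy_number(window_depths, control_depths, control_copies=2):
    """Estimate integer copy number per window from read depth.

    `control_depths` are depths in regions assumed to be present in
    `control_copies` copies (an assumption made for this sketch).
    """
    # Depth contributed by a single copy, taken from the control regions.
    per_copy_depth = median(control_depths) / control_copies
    if per_copy_depth <= 0:
        raise ValueError("control depth must be positive")
    return [round(depth / per_copy_depth) for depth in window_depths]


# Hypothetical depths: controls average about 30x for two copies.
controls = [30, 29, 31, 30]
windows = [30, 61, 14, 92]
print(estimate_copy_number(windows, controls))
```

A real pipeline would also need to correct for GC content and mappability before rounding to an integer.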
Differential gene expression defines individual neuron types and determines how each contributes to circuit physiology and responds to injury and disease. The C. elegans Neuronal Gene Expression Map & Network (CeNGEN) will establish a comprehensive gene expression atlas of an entire nervous system at single-neuron resolution.
Differential gene expression determines how neurons contribute to circuit physiology and respond to injury and disease. The C. elegans Neuronal Gene Expression Map & Network (CeNGEN) will establish a high-density gene expression atlas of an entire nervous system.
The Cucurbitaceae includes important crops such as cucumber, melon, watermelon, squash and pumpkin. However, few genetic and genomic resources are available for plant improvement. Some cucurbit species such as cucumber have a narrow genetic base, which impedes construction of saturated molecular linkage maps. We report herein the development of highly polymorphic simple sequence repeat (SSR) markers derived from whole genome shotgun sequencing and the subsequent construction of a high-density genetic linkage map. This map includes 995 SSRs in seven linkage groups that span 573 cM in total, and defines ~680 recombination breakpoints with an average of 0.58 cM between two markers. These linkage groups were then assigned to seven corresponding chromosomes using fluorescent in situ hybridization (FISH). FISH assays also revealed a chromosomal inversion between Cucumis subspecies C. sativus var. sativus L. and var. hardwickii (R.) Alef, which resulted in marker clustering on the genetic map. A quarter of the mapped markers showed relatively high polymorphism levels among 11 inbred lines of cucumber. Among the 995 markers, 49%, 26% and 22% were conserved in melon, watermelon and pumpkin, respectively. This map will facilitate whole genome sequencing, positional cloning, and molecular breeding in cucumber, and enable the integration of knowledge of genes and traits in cucurbits.
Teosinte, the progenitor of maize, is restricted to tropical environments in Mexico and Central America. The pre-Columbian spread of maize from its center of origin in tropical Southern Mexico to the higher latitudes of the Americas required postdomestication selection for adaptation to longer day lengths. Flowering time of teosinte and tropical maize is delayed under long day lengths, whereas temperate maize evolved a reduced sensitivity to photoperiod. We measured flowering time of the maize nested association and diverse association mapping panels in the field under both short and long day lengths, and of a maize-teosinte mapping population under long day lengths. Flowering time in maize is a complex trait affected by many genes and the environment. Photoperiod response is one component of flowering time involving a subset of flowering time genes whose effects are strongly influenced by day length. Genome-wide association and targeted high-resolution linkage mapping identified ZmCCT, a homologue of the rice photoperiod response regulator Ghd7, as the most important gene affecting photoperiod response in maize. Under long day lengths ZmCCT alleles from diverse teosintes are consistently expressed at higher levels and confer later flowering than temperate maize alleles. Many maize inbred lines, including some adapted to tropical regions, carry ZmCCT alleles with no sensitivity to day length. Indigenous farmers of the Americas were remarkably successful at selecting on genetic variation at key genes affecting the photoperiod response to create maize varieties adapted to vastly diverse environments despite the hindrance of the geographic axis of the Americas and the complex genetic control of flowering time.
Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.
Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology.
Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ∼7.9 million base pairs (Mb), representing a ∼300-fold improvement of the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ∼24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous, large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome.
Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions.
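The contig N50 reported above is a standard contiguity statistic: the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal sketch follows; the function name and the contig lengths are hypothetical and are not taken from the F. vesca assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("at least one contig is required")


# Hypothetical contig lengths (arbitrary units).
print(n50([2, 3, 4, 5, 6, 10]))
```

Because N50 depends only on the length distribution, it says nothing about whether contigs are joined correctly, which is why the abstract also reports optical-map anchoring and synteny checks.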
Model-based molecular phylogenetics plays an important role in comparisons of genomic data, and model selection is a key step in all such analyses. We present ModelFinder, a fast model-selection method that greatly improves the accuracy of phylogenetic estimates by incorporating a model of rate heterogeneity across sites not previously considered in this context and by allowing concurrent searches of model space and tree space.
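To make the idea of model selection concrete, the sketch below ranks candidate substitution models by the Bayesian information criterion, BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of alignment sites, and L the maximized likelihood. This is a generic information-criterion comparison and an assumption of this sketch; the abstract does not say which criterion ModelFinder applies. The model names are common ones, but the log-likelihoods and parameter counts are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates, n_sites):
    """Return the name of the candidate with the lowest BIC.

    `candidates` maps model name -> (log-likelihood, free parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))


# Hypothetical fits for a 1000-site alignment (values made up for
# illustration; branch-length parameters are omitted as they are
# shared by all models on a fixed tree).
fits = {
    "JC": (-5200.0, 0),
    "HKY": (-5100.0, 4),
    "GTR+G": (-5095.0, 9),
}
print(best_model(fits, n_sites=1000))
```

Here the richest model has the highest likelihood but does not win, because its extra parameters are not justified by the small gain in fit.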
Expressed Sequence Tags (ESTs) are a source of simple sequence repeats (SSRs) that can be used to develop molecular markers for genetic studies. The availability of ESTs for Quercus robur and Quercus petraea provided a unique opportunity to develop microsatellite markers to accelerate research aimed at studying adaptation of these long-lived species to their environment. As a first step toward the construction of an SSR-based linkage map of oak for quantitative trait locus (QTL) mapping, we describe the mining and survey of EST-SSRs as well as a fast and cost-effective approach (bin mapping) to assign these markers to an approximate map position. We also compared the level of polymorphism between genomic and EST-derived SSRs and addressed the transferability of EST-SSRs in Castanea sativa (chestnut).
A catalogue of 103,000 Sanger ESTs was assembled into 28,024 unigenes, of which 18.6% presented one or more SSR motifs. More than 42% of these SSRs corresponded to trinucleotides. Primer pairs were designed for 748 putative unigenes. Overall 37.7% (283) were found to amplify a single polymorphic locus in a reference full-sib pedigree of Quercus robur. The usefulness of these loci for establishing a genetic map was assessed using a bin mapping approach. Bin maps were constructed for the male and female parental trees, for which framework linkage maps based on AFLP markers were available. The bin set consisted of 14 highly informative offspring selected based on the number and position of crossover sites. The female and male maps comprised 44 and 37 bins, with an average bin length of 16.5 cM and 20.99 cM, respectively. A total of 256 EST-SSRs were assigned to bins and their map position was further validated by linkage mapping. EST-SSRs were found to be less polymorphic than genomic SSRs, but their transferability rate to chestnut, a species phylogenetically related to oak, was higher.
We have generated a bin map for oak comprising 256 EST-SSRs. This resource constitutes a first step toward the establishment of a gene-based map for this genus that will facilitate the dissection of QTLs affecting complex traits of ecological importance.
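The mining step described above, scanning unigene sequences for SSR motifs, can be sketched with a regular-expression search for tandemly repeated di- and trinucleotide units. The abstract does not give the tool or the repeat thresholds used, so the function name, the minimum repeat counts, and the example sequence below are hypothetical.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each SSR in `sequence`."""
    seq = sequence.upper()
    hits = []
    for motif_len, min_count in min_repeats.items():
        # One captured motif followed by at least (min_count - 1) copies.
        pattern = re.compile(
            r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1)
        )
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Hypothetical sequence containing one (GA)6 and one (ATG)5 repeat.
example = "TT" + "GA" * 6 + "CC" + "ATG" * 5 + "CC"
print(find_ssrs(example))
```

Primer design and polymorphism screening, the later steps in the abstract, would then operate on the flanking sequence around each reported start position.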
Over the past decade, long-read, single-molecule DNA sequencing technologies have emerged as powerful players in genomics. With the ability to generate reads tens to thousands of kilobases in length with an accuracy approaching that of short-read sequencing technologies, these platforms have proven their ability to resolve some of the most challenging regions of the human genome, detect previously inaccessible structural variants and generate some of the first telomere-to-telomere assemblies of whole chromosomes. Long-read sequencing technologies will soon permit the routine assembly of diploid genomes, which will revolutionize genomics by revealing the full spectrum of human genetic variation, resolving some of the missing heritability and leading to the discovery of novel mechanisms of disease.