GetOrganelle is a state-of-the-art toolkit to accurately assemble organelle genomes from whole genome sequencing data. It recruits organelle-associated reads using a modified "baiting and iterative mapping" approach, conducts de novo assembly, filters and disentangles the assembly graph, and produces all possible configurations of circular organelle genomes. For 50 published plant datasets, we are able to reassemble the circular plastomes from 47 datasets using GetOrganelle. GetOrganelle assemblies are more accurate than published and/or NOVOPlasty-reassembled plastomes as assessed by mapping. We also assemble complete mitochondrial genomes using GetOrganelle. GetOrganelle is freely released under a GPL-3 license (https://github.com/Kinggerm/GetOrganelle).
Low-level DNA N6-methyldeoxyadenosine (DNA-m6A) has recently been reported across various eukaryotes. Although anti-m6A antibody-based approaches are commonly used to measure DNA-m6A levels, these approaches are known to be confounded by DNA secondary structures, RNA contamination, and bacterial contamination. To evaluate these confounding features, we introduce an approach for systematically validating the selectivity of antibody-based DNA-m6A methods and use a highly selective anti-DNA-m6A antibody to reexamine patterns of DNA-m6A in C. reinhardtii, A. thaliana, and D. melanogaster. Our findings raise caution about the use of antibody-based methods for endogenous m6A quantification and mapping in eukaryotes.
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high-impact variants.
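The thresholds quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the phenome-wide significance level is a Bonferroni correction of the conventional genome-wide level (5 × 10⁻⁸) across the 1,932 end points; the abstract does not state this derivation, so treat it as an assumption. The function names and the example allele frequencies are hypothetical and only illustrate the low-frequency range and the twofold enrichment criterion.

```python
# Assumption: genome-wide significance is the conventional 5e-8, and the
# phenome-wide level divides it by the number of end points tested.
GENOME_WIDE_ALPHA = 5e-8
N_ENDPOINTS = 1932


def phenome_wide_threshold(alpha: float = GENOME_WIDE_ALPHA,
                           n_tests: int = N_ENDPOINTS) -> float:
    """Bonferroni-style threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def is_low_frequency(maf: float) -> bool:
    """Low-frequency range given in the abstract: 0.1% <= MAF < 5%."""
    return 0.001 <= maf < 0.05


def finnish_enrichment(af_finnish: float, af_non_finnish: float) -> float:
    """Ratio of Finnish to non-Finnish European allele frequency."""
    if af_non_finnish == 0:
        return float("inf")  # variant unobserved outside Finland
    return af_finnish / af_non_finnish


threshold = phenome_wide_threshold()
print(f"{threshold:.1e}")  # 2.6e-11

# Hypothetical variant: 2% in Finland, 0.5% in non-Finnish Europeans.
ratio = finnish_enrichment(0.02, 0.005)
print(is_low_frequency(0.02), ratio > 2)
```

Under that assumption the computed value rounds to 2.6 × 10⁻¹¹, which matches the coefficient given for the PWS threshold.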
A fundamental question in microbiology is whether there is a continuum of genetic diversity among genomes, or clear species boundaries prevail instead. Whole-genome similarity metrics such as Average Nucleotide Identity (ANI) help address this question by facilitating high-resolution taxonomic analysis of thousands of genomes from diverse phylogenetic lineages. To scale to available genomes and beyond, we present FastANI, a new method to estimate ANI using alignment-free approximate sequence mapping. FastANI is accurate for both finished and draft genomes, and is up to three orders of magnitude faster than alignment-based approaches. We leverage FastANI to compute pairwise ANI values among all prokaryotic genomes available in the NCBI database. Our results reveal clear genetic discontinuity, with 99.8% of the total 8 billion genome pairs analyzed conforming to >95% intra-species and <83% inter-species ANI values. This discontinuity is manifested with or without the most frequently sequenced species, and is robust to historic additions in the genome databases.
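The discontinuity described above amounts to a simple rule on pairwise ANI values: pairs above 95% fall within a species, pairs below 83% fall between species, and very few pairs land in between. The following is a minimal sketch of that rule, not FastANI itself. The function names and the example ANI values are hypothetical, and the input is assumed to be a list of ANI percentages that have already been computed.

```python
from typing import Iterable

# Thresholds taken from the abstract (percent identity).
INTRA_SPECIES_MIN = 95.0
INTER_SPECIES_MAX = 83.0


def classify_ani(ani: float) -> str:
    """Label one genome pair by its ANI value."""
    if ani > INTRA_SPECIES_MIN:
        return "intra-species"
    if ani < INTER_SPECIES_MAX:
        return "inter-species"
    return "intermediate"  # the sparsely populated 83-95% range


def fraction_conforming(ani_values: Iterable[float]) -> float:
    """Share of pairs that fall outside the 83-95% range."""
    labels = [classify_ani(a) for a in ani_values]
    if not labels:
        raise ValueError("no ANI values supplied")
    conforming = sum(label != "intermediate" for label in labels)
    return conforming / len(labels)


# Hypothetical ANI values for five genome pairs.
example = [98.7, 96.1, 78.4, 80.2, 90.0]
print([classify_ani(a) for a in example])
print(fraction_conforming(example))
```

Applied to a full set of pairwise values, `fraction_conforming` corresponds to the kind of summary statistic the abstract reports as 99.8%.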
Schizophrenia has a heritability of 60–80%, much of which is attributable to common risk alleles. Here, in a two-stage genome-wide association study of up to 76,755 individuals with schizophrenia and 243,649 control individuals, we report common variant associations at 287 distinct genomic loci. Associations were concentrated in genes that are expressed in excitatory and inhibitory neurons of the central nervous system, but not in other tissues or cell types. Using fine-mapping and functional genomic data, we identify 120 genes (106 protein-coding) that are likely to underpin associations at some of these loci, including 16 genes with credible causal non-synonymous or untranslated region variation. We also implicate fundamental processes related to neuronal function, including synaptic organization, differentiation and transmission. Fine-mapped candidates were enriched for genes associated with rare disruptive coding variants in people with schizophrenia, including the glutamate receptor subunit GRIN2A and transcription factor SP4, and were also enriched for genes implicated by such variants in neurodevelopmental disorders. We identify biological processes relevant to schizophrenia pathophysiology; show convergence of common and rare variant associations in schizophrenia and neurodevelopmental disorders; and provide a resource of prioritized genes and variants to advance mechanistic studies.
Glycosylation is the most abundant and diverse form of post-translational modification of proteins that is common to all eukaryotic cells. Enzymatic glycosylation of proteins involves a complex metabolic network and different types of glycosylation pathways that orchestrate enormous amplification of the proteome, producing a diversity of proteoforms and their biological functions. The tremendous structural diversity of glycans attached to proteins poses analytical challenges that limit exploration of specific functions of glycosylation. Major advances in quantitative transcriptomics, proteomics and nuclease-based gene editing are now opening new global ways to explore protein glycosylation through analysing and targeting enzymes involved in glycosylation processes. In silico models predicting cellular glycosylation capacities and glycosylation outcomes are emerging, and refined maps of the glycosylation pathways facilitate genetic approaches to address functions of the vast glycoproteome. These approaches apply commonly available cell biology tools, and we predict that use of (single-cell) transcriptomics, genetic screens, genetic engineering of cellular glycosylation capacities and custom design of glycoprotein therapeutics are advancements that will ignite wider integration of glycosylation in general cell biology.