Plant secondary cell walls constitute the majority of plant biomass. They are predominantly found in xylem cells, which are derived from vascular initials during vascularization. Little is known about these processes in grass species despite their emerging importance as biomass feedstocks. The targeted biofuel crop Sorghum bicolor has a sequenced and well-annotated genome, making it an ideal monocot model for addressing vascularization and biomass deposition.
Here we generated tissue-specific transcriptome and DNA methylome data from sorghum shoots, roots and developing root vascular and nonvascular tissues.
Many genes associated with vascular development in other species show enriched expression in developing vasculature. However, several transcription factor families varied in vascular expression in sorghum compared with Arabidopsis and maize. Furthermore, differential expression of genes associated with DNA methylation was identified between vascular and nonvascular tissues, implying that changes in DNA methylation are a feature of sorghum root vascularization, which we confirmed using tissue-specific DNA methylome data. Roots treated with a DNA methylation inhibitor also showed a significant decrease in root length.
Tissues and organs can be discriminated based on their genomic methylation patterns and methylation context. Consequently, tissue-specific changes in DNA methylation are part of the normal developmental process.
Core Ideas
This was the largest panel of switchgrass genetic diversity generated to date.
The Gulf coast of the United States is the center of genetic diversity for switchgrass.
There was a genetic bottleneck in upland switchgrass.
Switchgrass (Panicum virgatum L.) is a perennial native North American grass present in two ecotypes: upland, found primarily in the northern range of switchgrass habitats, and lowland, found largely in the southern reaches of switchgrass habitats. Previous studies focused on a diversity panel of primarily northern switchgrass, so to expand our knowledge of genetic diversity in a broader set of North American switchgrass, exome capture sequence data were generated for 632 additional, primarily lowland individuals. In total, over 37 million single nucleotide polymorphisms (SNPs) were identified, and a set of 1.9 million high‐confidence SNPs obtained from 1169 individuals from 140 populations (67 upland, 65 lowland, 8 admixed) was used in downstream analyses of genetic diversity and population structure. Seven separate population groups were identified, with moderate genetic differentiation (mean fixation index [Fst] estimate of 0.06) between the lowland and the upland populations. Ecotype‐specific and population‐specific SNPs were identified for use in germplasm evaluations. Relative to rice (Oryza sativa L.), maize (Zea mays L.), soybean [Glycine max (L.) Merr.], and Medicago truncatula Gaertn., analyses of nucleotide diversity revealed a high degree of genetic diversity (0.0135) across all individuals, consistent with the outcrossing mode of reproduction and the polyploidy of switchgrass. This study supports the hypothesis that repeated glaciation events, ploidy barriers, and restricted gene flow caused by flowering time differences have resulted in distinct gene pools across ecotypes and geographic regions. These data provide a resource to associate alleles with traits of interest for forage, restoration, and biofuel feedstock efforts in switchgrass.
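To make the fixation index reported above concrete, here is a minimal Python sketch of Wright's Fst for a biallelic SNP in two populations, averaged over sites. The abstract does not say which Fst estimator the study used, so this textbook form, the equal weighting of the two populations, and the function names are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's Fst = (H_T - H_S) / H_T for two equally weighted populations.

    Assumption: textbook estimator, not necessarily the one used in the study.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = expected_heterozygosity(p_bar)
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # site is monomorphic across both populations
    return (h_total - h_within) / h_total


def mean_fst(freq_pairs: Iterable[Tuple[float, float]]) -> float:
    """Average of per-site Fst values over (p1, p2) allele-frequency pairs."""
    values = [fst_two_populations(p1, p2) for p1, p2 in freq_pairs]
    if not values:
        raise ValueError("no sites supplied")
    return sum(values) / len(values)
```

Identical allele frequencies give an Fst of 0 and fixed differences give 1, so a mean near 0.06 corresponds to modest frequency differences at most sites.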
Summary
Panicum virgatum L. (switchgrass) is a polyploid, perennial grass species that is native to North America, and is being developed as a future biofuel feedstock crop. Switchgrass is present primarily in two ecotypes: a northern upland ecotype, composed of tetraploid and octoploid accessions, and a southern lowland ecotype, composed of primarily tetraploid accessions. We employed high‐coverage exome capture sequencing (~2.4 Tb) to genotype 537 individuals from 45 upland and 21 lowland populations. From these data, we identified ~27 million single‐nucleotide polymorphisms (SNPs), of which 1 590 653 high‐confidence SNPs were used in downstream analyses of diversity within and between the populations. From the 66 populations, we identified five primary population groups within the upland and lowland ecotypes, a result that was further supported through genetic distance analysis. We identified conserved, ecotype‐restricted, non‐synonymous SNPs that are predicted to affect the protein function of CONSTANS (CO) and EARLY HEADING DATE 1 (EHD1), key genes involved in flowering, which may contribute to the phenotypic differences between the two ecotypes. We also identified, relative to the near‐reference Kanlow population, 17 228 genes present in more copies than in the reference genome (up‐CNVs), 112 630 genes present in fewer copies than in the reference genome (down‐CNVs) and 14 430 presence/absence variants (PAVs), affecting a total of 9979 genes, including two upland‐specific CNV clusters. In total, 45 719 genes were affected by an SNP, CNV, or PAV across the panel, providing a firm foundation to identify functional variation associated with phenotypic traits of interest for biofuel feedstock production.
Significance Statement
To develop switchgrass as a biofuel crop, it is important to understand the genetic and phenotypic diversity available in native populations. Here we used more than 2.4 Tb of sequence data from 66 divergent switchgrass populations to identify five population groups delineated by ploidy, ecotype, and geographic location, and found ecotype‐restricted sequence and structural variation in flowering time pathway genes. These sequence data provide a rich resource with which to identify functional variation for phenotypic traits.
Abstract
The Mouse Phenome Database (MPD; https://phenome.jax.org) is a widely used resource that provides access to primary experimental trait data, genotypic variation, protocols and analysis tools for mouse genetic studies. Data are contributed by investigators worldwide and represent a broad scope of phenotyping endpoints and disease-related traits in naïve mice and those exposed to drugs, environmental agents or other treatments. MPD houses individual animal data with detailed, searchable protocols, and makes these data available to other resources via API. MPD provides rigorous curation of experimental data and supporting documentation using relevant ontologies and controlled vocabularies. Most data in MPD are from inbreds and other reproducible strains such that the data are cumulative over time and across laboratories. The resource has been expanded to include the QTL Archive and other primary phenotype data from mapping crosses as well as advanced high-diversity mouse populations including the Collaborative Cross and Diversity Outbred mice. Furthermore, MPD provides a means of assessing replicability and reproducibility across experimental conditions and protocols, benchmarking assays in users' own laboratories, identifying sensitized backgrounds for making new mouse models with genome editing technologies, analyzing trait co-inheritance, finding the common genetic basis for multiple traits and assessing sex differences and sex-by-genotype interactions.
Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitative accuracy has been reported. We sequenced bisulfite-converted DNA from two tissues from each of two healthy human adults and systematically compared five widely used Bisulfite-seq mapping algorithms: Bismark, BSMAP, Pash, BatMeth and BS Seeker. We evaluated their computational speed and genomic coverage and verified their percentage methylation estimates. With the exception of BatMeth, all mappers covered >70% of CpG sites genome-wide and yielded highly concordant estimates of percentage methylation (r² ≥ 0.95). Fourfold variation in mapping time was found between BSMAP (fastest) and Pash (slowest). In each library, 8-12% of genomic regions covered by Bismark and Pash were not covered by BSMAP. An experiment using simulated reads confirmed that Pash has an exceptional ability to uniquely map reads in genomic regions of structural variation. Independent verification by bisulfite pyrosequencing generally confirmed the percentage methylation estimates by the mappers. Of these algorithms, Bismark provides an attractive combination of processing speed, genomic coverage and quantitative accuracy, whereas Pash offers considerably higher genomic coverage.
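The two quantities compared above, per-site percentage methylation and the r² concordance between mappers, can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes each mapper reports methylated and unmethylated read counts per CpG site, the function names are hypothetical, and none of it reflects the internals of Bismark, BSMAP, Pash, BatMeth or BS Seeker.

```python
from typing import Sequence


def percent_methylation(methylated: int, unmethylated: int) -> float:
    """Percentage of reads at a cytosine that remained unconverted (methylated)."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("site has no read coverage")
    return 100.0 * methylated / total


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length series of estimates."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return (sxy * sxy) / (sxx * syy)
```

In practice the two series would hold estimates from two mappers at the CpG sites both of them cover, since sites covered by only one mapper cannot contribute to a concordance measure.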
Effectors of the Type III Secretion System (T3SS) play a pivotal role in establishing and maintaining pathogenicity in the host, and the identification of these effectors is therefore important in understanding virulence. However, the effectors display a high level of sequence diversity, which makes their identification difficult. There is a need to collate and annotate existing effector sequences in public databases to enable systematic analyses of these sequences for development of models for screening and selection of putative novel effectors from bacterial genomes that can be validated by a smaller number of key experiments.
Herein, we present T3SEdb (http://effectors.bic.nus.edu.sg/T3SEdb), a specialized database of annotated T3SS effector (T3SE) sequences containing 1089 records from 46 bacterial species compiled from the literature and public protein databases. Procedures have been defined for i) comprehensive annotation of experimental status of effectors, ii) submission and curation review of records by users of the database, and iii) the regular update of existing and new T3SEdb records. Keyword fielded and sequence searches (BLAST, regular expression) are supported for both experimentally verified and hypothetical T3SEs. More than 171 clusters of T3SEs were detected based on sequence identity comparisons (intra-cluster difference up to ~60%). Owing to this high level of sequence diversity of T3SEs, the T3SEdb provides a large number of experimentally known effector sequences with wide species representation for creation of effector predictors. We created a reliable effector prediction tool, integrated into the database, to demonstrate the application of the database for such endeavours.
T3SEdb is the first specialised database reported for T3SS effectors, enriched with manual annotations that facilitated systematic construction of a reliable prediction model for identification of novel effectors. The T3SEdb represents a platform for inclusion of additional annotations of metadata for future developments of sophisticated effector prediction models for screening and selection of putative novel effectors from bacterial genomes/proteomes that can be validated by a small number of key experiments.
MVAR: A Mouse Variation Registry
El Kassaby, Bahá; Castellanos, Francisco; Gerring, Matthew ...
Journal of Molecular Biology, 09/2024, Volume 436, Issue 17
Journal Article, peer reviewed, open access
• MVAR aggregates and annotates genome variation from large-scale sequencing of different mouse strains and expertly curated variants for phenotypic alleles.
• Variant annotation in MVAR includes variant type, molecular consequence, impact, and region.
• Data in MVAR are accessible in both human- and machine-readable formats.
• MVAR serves as both a stand-alone database of mouse genome variation and as a variant annotation service.
• MVAR is a platform for facilitating genotype-phenotype associations in the laboratory mouse.
• MVAR resource was implemented using a micro-services architecture, providing both interoperability and ease of software maintenance.
The Mouse Variation Registry (MVAR) resource is a scalable registry of mouse single nucleotide variants and small indels and variant annotation. The resource accepts data in standard Variant Call Format (VCF) and assesses the uniqueness of the submitted variants via a canonicalization process. Novel variants are assigned a unique, persistent MVAR identifier; variants that are equivalent to an existing variant in the resource are associated with the existing identifier. Annotations for variant type, molecular consequence, impact, and genomic region in the context of specific transcripts and protein sequences are generated using Ensembl’s Variant Effect Predictor (VEP) and Jannovar. Access to the data and annotations in MVAR is supported via an Application Programming Interface (API) and web application. Researchers can search the resource by gene symbol, genomic region, variant (expressed in Human Genome Variation Society syntax), refSNP identifiers, or MVAR identifiers. Tabular search results can be filtered by variant annotations (variant type, molecular consequence, impact, variant region) and viewed according to variant distribution across mouse strains. The registry currently comprises more than 99 million canonical single nucleotide variants for 581 strains of mice. MVAR is accessible from https://mvar.jax.org.
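The abstract does not describe how MVAR canonicalizes variants, so the following Python sketch only illustrates the general idea of mapping equivalent VCF records to one identifier. It assumes a simple normalization (trimming bases shared by REF and ALT, with no left-alignment against a reference genome), and the `VariantRegistry` class and the `VAR-000001` identifier format are hypothetical, not MVAR's own.

```python
from typing import Dict, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, 1-based position, REF, ALT)


def canonicalize(chrom: str, pos: int, ref: str, alt: str) -> VariantKey:
    """Trim bases shared by REF and ALT so equivalent records share one key.

    Assumption: trimming only; a real pipeline would likely also left-align
    indels against the reference sequence.
    """
    ref, alt = ref.upper(), alt.upper()
    # Drop shared trailing bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, shifting the position accordingly.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return (chrom, pos, ref, alt)


class VariantRegistry:
    """Hypothetical in-memory registry that reuses identifiers for known variants."""

    def __init__(self) -> None:
        self._ids: Dict[VariantKey, str] = {}

    def register(self, chrom: str, pos: int, ref: str, alt: str) -> str:
        key = canonicalize(chrom, pos, ref, alt)
        if key not in self._ids:
            # Novel variant: mint the next identifier (format is illustrative).
            self._ids[key] = f"VAR-{len(self._ids) + 1:06d}"
        return self._ids[key]
```

With this sketch, a record submitted as `chr1:100 CAT>CGT` and another submitted as `chr1:101 A>G` reduce to the same key and therefore receive the same identifier, which is the behaviour the abstract describes for equivalent variants.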
N6-methyldeoxyadenine (6mA) is a noncanonical DNA base modification present at low levels in plant and animal genomes, but its prevalence and association with genome function in other eukaryotic lineages remain poorly understood. Here we report that abundant 6mA is associated with transcriptionally active genes in early-diverging fungal lineages. Using single-molecule long-read sequencing of 16 diverse fungal genomes, we observed that up to 2.8% of all adenines were methylated in early-diverging fungi, far exceeding levels observed in other eukaryotes and more derived fungi. 6mA occurred symmetrically at ApT dinucleotides and was concentrated in dense methylated adenine clusters surrounding the transcriptional start sites of expressed genes; its distribution was inversely correlated with that of 5-methylcytosine. Our results show a striking contrast in the genomic distributions of 6mA and 5-methylcytosine and reinforce a distinct role for 6mA as a gene-expression-associated epigenomic mark in eukaryotes.