Genome sequences from over 200 plant species have already been published, and this number is expected to increase rapidly as sequencing technologies advance. Once a new genome has been assembled and its genes identified, functional annotation of their putative translation products (proteins) using ontologies is of key importance, because it places the sequencing data in a biological context. Furthermore, to keep pace with the rapid production of genome sequences, this functional annotation process must be fully automated. Here we present a redesigned and significantly enhanced MapMan4 framework, together with a revised version of the associated online Mercator annotation tool. Compared with the original MapMan, the new ontology has been expanded almost threefold and enforces stricter assignment rules. This framework was then incorporated into Mercator4, which has been upgraded to reflect current knowledge across land plants, providing protein annotations of comparably high quality for all embryophytes. The annotation process has been optimized so that a plant genome can be annotated in a matter of minutes. The output remains compatible with the established MapMan desktop application.
MapMan4 is a substantial redesign of the MapMan framework incorporating the latest literature knowledge to provide greatly enhanced protein family granularity. The online Mercator4 tool uses this framework to rapidly functionally annotate protein sequences from any land plant species.
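MapMan organizes proteins into hierarchical bins identified by dot-separated codes, so an annotation assigned to a specific bin also counts toward every broader parent bin. The following minimal Python sketch shows how per-bin counts can be rolled up that hierarchy. It assumes hypothetical protein IDs and bin codes, and it does not reproduce Mercator4 output or its file format.

```python
from collections import Counter


def parent_bins(code: str) -> list[str]:
    """Return a dotted bin code and all its ancestors, e.g. '1.2.3' -> ['1', '1.2', '1.2.3']."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def rollup(assignments: dict[str, str]) -> Counter:
    """Count proteins per bin, crediting each protein to its assigned bin and every ancestor."""
    counts: Counter = Counter()
    for bin_code in assignments.values():
        for b in parent_bins(bin_code):
            counts[b] += 1
    return counts


# Hypothetical protein IDs and bin codes, for illustration only.
assignments = {"protA": "1.1.1", "protB": "1.1.2", "protC": "1.2", "protD": "2.1"}
counts = rollup(assignments)
print(counts["1"], counts["1.1"], counts["2"])  # prints: 3 2 1
```

Rolling counts upward lets coarse categories summarize a whole genome, while the leaf bins keep the protein-family granularity.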
Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have therefore developed Trimmomatic as a more flexible and efficient preprocessing tool that correctly handles paired-end data.
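Correct paired-end handling means that the two mates of a pair are processed together. If both survive trimming, they stay paired. If only one survives, it goes to a separate unpaired output so that the pairing of the remaining reads is not broken. The following minimal Python sketch shows that bookkeeping. It is not Trimmomatic's Java implementation: the trailing-quality rule, the Phred+33 offset, the threshold of 20 and the `min_len` cutoff are all illustrative assumptions.

```python
def trim_trailing(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> tuple[str, str]:
    """Remove bases from the 3' end while their Phred score (ASCII - offset) is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def process_pair(r1: tuple[str, str], r2: tuple[str, str], min_len: int = 36):
    """Trim both mates, then route them to paired or unpaired output (hypothetical helper)."""
    t1, t2 = trim_trailing(*r1), trim_trailing(*r2)
    keep1, keep2 = len(t1[0]) >= min_len, len(t2[0]) >= min_len
    if keep1 and keep2:
        return "paired", (t1, t2)   # both mates survive: pairing preserved
    if keep1:
        return "unpaired_1", (t1,)  # mate 2 dropped: mate 1 goes to unpaired output
    if keep2:
        return "unpaired_2", (t2,)  # mate 1 dropped: mate 2 goes to unpaired output
    return "dropped", ()            # neither mate survives


# 'I' encodes Phred 40 and '#' encodes Phred 2 under the Phred+33 offset.
status, reads = process_pair(("ACGTACGTAC", "IIIIIIII##"), ("TTTTTT", "######"), min_len=5)
print(status, reads)  # prints: unpaired_1 (('ACGTACGT', 'IIIIIIII'),)
```

Here mate 1 loses its two low-quality trailing bases but keeps 8 bases, while mate 2 is trimmed to nothing. Only mate 1 is emitted, and it is flagged as unpaired.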
The value of NGS read preprocessing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested.
Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available at http://www.usadellab.org/cms/index.php?page=trimmomatic
usadel@bio1.rwth-aachen.de
Supplementary data are available at Bioinformatics online.
Advances in nanopore technology have made it possible to obtain gigabases of sequence data. Previously, nanopore sequencing was mainly used to analyze microbial samples. Here, we describe the generation of a comprehensive nanopore sequencing data set, with a median read length of 11,979 bp, for a self-compatible accession of the wild tomato species Solanum pennellii. We describe the assembly of its genome to a contig N50 of 2.5 Mb. The assembly pipeline comprised initial read correction with Canu followed by assembly with SMARTdenovo. The resulting raw nanopore-based de novo genome is structurally highly similar to that of the reference S. pennellii LA716 accession, but it has a high error rate and is rich in homopolymer deletions. After polishing the assembly with Illumina reads, we obtained an error rate of <0.02% when assessed against the same Illumina data. We obtained a gene completeness of 96.53%, slightly surpassing that of the S. pennellii reference genome. Taken together, our data indicate that such long-read sequencing data can be used to affordably sequence and assemble gigabase-sized plant genomes.
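Contig N50 is the length L such that contigs of length L or longer together contain at least half of the assembled bases. The following minimal Python sketch shows the calculation. It uses made-up contig lengths, not the S. pennellii assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the largest L such that contigs >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer form of covered >= total / 2
            return length
    raise AssertionError("unreachable for non-negative lengths")


print(n50([5, 4, 3, 2, 1]))  # prints: 4
```

For these toy lengths (15 bases in total), the two longest contigs already cover 9 of the 7.5 bases needed, so the N50 is 4. Similarly, an error rate of <0.02% corresponds to fewer than 2 errors per 10,000 bases.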
Summary
The extreme sensitivity of microsporogenesis to moderately high or low temperatures is a major hindrance to tomato (Solanum lycopersicum) sexual reproduction and hence to year‐round cropping. Consequently, breeding for parthenocarpy, namely fertilization‐independent fruit set, is considered a valuable goal, especially for maintaining sustainable agriculture in the face of global warming. A mutant capable of setting high‐quality seedless (parthenocarpic) fruit was found in a screen of an EMS‐mutagenized tomato population for yield under heat stress. Next‐generation sequencing followed by marker‐assisted mapping and CRISPR/Cas9 gene knockout confirmed that a mutation in SlAGAMOUS‐LIKE 6 (SlAGL6) was responsible for the parthenocarpic phenotype. The mutant is capable of fruit production under heat stress conditions that severely hamper fertilization‐dependent fruit set. Unlike other recessive monogenic parthenocarpy mutants in tomato, Slagl6 mutations impose no homeotic changes: the seedless fruits are of normal weight and shape, pollen viability is unaffected, and sexual reproduction capacity is maintained. This makes Slagl6 an attractive gene for facultative parthenocarpy. The characteristics of the analysed mutant, combined with the gene's expression pattern, implicate SlAGL6 as a key regulator of the transition between the state of ‘ovary arrest’ imposed towards anthesis and fertilization‐triggered fruit set.
A parasitic lifestyle, where plants procure some or all of their nutrients from other living plants, has evolved independently in many dicotyledonous plant families and is a major threat to agriculture globally. Nevertheless, no genome sequence of a parasitic plant has been reported to date. Here we describe the genome sequence of the parasitic field dodder, Cuscuta campestris. The genome contains signatures of a fairly recent whole-genome duplication and lacks genes for pathways superfluous to a parasitic lifestyle. Specifically, genes needed for high photosynthetic activity have been lost, explaining the low photosynthesis rates displayed by the parasite. Moreover, several genes involved in nutrient uptake from the soil have been lost. On the other hand, we find evidence for horizontal gene transfer through the integration of genomic DNA from the parasite's hosts. We conclude that the parasitic lifestyle has left characteristic footprints in the C. campestris genome.
Recent rapid advances in next-generation RNA sequencing (RNA-Seq) provide researchers with unprecedentedly large data sets and open new perspectives in transcriptomics. Furthermore, RNA-Seq-based transcript profiling can be applied to non-model and newly discovered organisms because it does not require a predefined measuring platform (such as microarrays). However, these novel technologies pose new challenges: the raw data need to be rigorously quality-checked and filtered prior to analysis, and proper statistical methods have to be applied to extract biologically relevant information. Given the sheer volume of data, this is no trivial task and requires considerable technical resources along with bioinformatics expertise. To aid individual researchers, we have developed RobiNA, an integrated solution that consolidates all steps of RNA-Seq-based differential gene-expression analysis in one user-friendly cross-platform application featuring a rich graphical user interface. RobiNA accepts raw FastQ files, SAM/BAM alignment files and count tables as input. It supports quality checking, flexible filtering and statistical analysis of differential gene expression based on state-of-the-art biostatistical methods developed in the R/Bioconductor projects. In-line help and a step-by-step manual guide users through the analysis. Installer packages for Mac OS X, Windows and Linux are available under the LGPL licence from http://mapman.gabipd.org/web/guest/robin.
Summary
Recent advances in genomics technologies have greatly accelerated progress in both fundamental plant science and applied breeding research. Concurrently, high‐throughput plant phenotyping is becoming widely adopted in the plant community, promising to alleviate the phenotypic bottleneck. While these technological breakthroughs are significantly accelerating quantitative trait locus (QTL) and causal gene identification, challenges to enabling even more sophisticated analyses remain. In particular, care needs to be taken to standardize, describe and conduct experiments robustly while relying on plant physiology expertise. In this article, we review the state of the art in genome assembly and the future potential of pangenomics in plant research. We also describe the need to standardize and describe phenotypic studies using the Minimum Information About a Plant Phenotyping Experiment (MIAPPE) standard to enable the reuse and integration of phenotypic data. In addition, we show how deep phenotypic data might yield novel trait–trait correlations and review how to link phenotypic data to genomic data. Finally, we provide perspectives on the future of machine learning and its potential for linking phenotypes to genomic features.
Significance statement
This review addresses the state of the art in plant genome sequencing and phenotyping, and how to bridge the two. Genome sequencing is undergoing a new revolution, and phenotyping is becoming increasingly important.
High-throughput measurement of transcript intensities using Affymetrix-type oligonucleotide microarrays has produced a massive quantity of data during the last decade. Different preprocessing techniques exist to convert the raw signal intensities measured by these chips into gene expression estimates. Although these techniques have been widely benchmarked in the context of differential gene expression analysis, there are only a few examples where their performance has been assessed with respect to coexpression-based studies such as sample classification.
In the present paper we benchmark the three most widely used normalization procedures (MAS5, RMA and GCRMA) in the context of inter-array correlation analysis, confirming and extending the finding that RMA and GCRMA consistently overestimate sample similarity upon normalization. We determine that median polish summarization is responsible for generating a large proportion of these over-similarity artifacts. Furthermore, we show that most affected probesets also show internal signal disagreement and tend to be composed of individual probes hitting different gene transcripts. Finally, we provide a correction to the RMA/GCRMA summarization procedure that greatly reduces inter-array correlation artifacts without affecting the detection of differentially expressed genes.
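RMA and GCRMA summarize the probe-level log2 intensities of each probeset into one value per array using Tukey's median polish, which is the step identified above as the main source of over-similarity. The following minimal pure-Python sketch implements textbook median polish. It is not the authors' tRMA correction. The function names, the iteration limit and the convergence tolerance are illustrative assumptions.

```python
from statistics import median


def median_polish(x, max_iter: int = 10, eps: float = 0.01):
    """Tukey's median polish of a probes x arrays matrix of log2 intensities.

    Fits x[i][j] = overall + row_eff[i] + col_eff[j] + resid[i][j] using medians,
    which makes the fit robust to individual outlying probes.
    """
    z = [[float(v) for v in row] for row in x]
    n_rows, n_cols = len(z), len(z[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep each probe's (row) median into its row effect.
        for i in range(n_rows):
            m = median(z[i])
            z[i] = [v - m for v in z[i]]
            row_eff[i] += m
        m = median(col_eff)
        col_eff = [c - m for c in col_eff]
        overall += m
        # Sweep each array's (column) median into its column effect.
        for j in range(n_cols):
            m = median([z[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                z[i][j] -= m
            col_eff[j] += m
        m = median(row_eff)
        row_eff = [r - m for r in row_eff]
        overall += m
        # Stop once the total absolute residual no longer changes appreciably.
        new_sum = sum(abs(v) for zrow in z for v in zrow)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, z


def summarize(x):
    """RMA-style probeset expression per array: overall effect plus column (array) effect."""
    overall, _, col_eff, _ = median_polish(x)
    return [overall + c for c in col_eff]


# Hypothetical log2 intensities: 3 probes (rows) x 3 arrays (columns), purely additive.
x = [[5, 6, 7], [6, 7, 8], [7, 8, 9]]
print(summarize(x))  # prints: [6.0, 7.0, 8.0]
```

In this toy matrix the row effects (-1, 0, 1) absorb the probe-specific offsets, and the column effects carry the array-to-array differences. When the probes of a probeset hit different transcripts and disagree, the medians can hide that disagreement, which is consistent with the artifacts described above.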
We propose tRMA as a modification of RMA to normalize microarray experiments for correlation-based analysis.
Functional gene clusters, containing two or more genes encoding different enzymes for the same pathway, are sometimes observed in plant genomes, most often when the genes specify the synthesis of specialized defensive metabolites. Here, we show that a cluster of genes in tomato (Solanum lycopersicum; Solanaceae) contains genes for terpene synthases (TPSs) that specify the synthesis of monoterpenes and diterpenes from cis-prenyl diphosphates, substrates that are synthesized by enzymes encoded by cis-prenyl transferase (CPT) genes also located within the same cluster. The monoterpene synthase genes in the cluster likely evolved from a diterpene synthase gene in the cluster by duplication and divergence. In the orthologous cluster in Solanum habrochaites, a new sesquiterpene synthase gene was created by duplication of a monoterpene synthase gene, followed by a localized gene conversion event directed by a diterpene synthase gene. The TPS genes in the Solanum cluster encoding cis-prenyl diphosphate-utilizing enzymes are closely related to a tobacco (Nicotiana tabacum; Solanaceae) diterpene synthase gene encoding Z-abienol synthase (Nt-ABS). Nt-ABS uses the substrate copal-8-ol diphosphate, which is made from all-trans geranylgeranyl diphosphate by copal-8-ol diphosphate synthase (Nt-CPS2). The Solanum gene cluster also contains an ortholog of Nt-CPS2, but it appears to encode a nonfunctional protein. Thus, the Solanum functional gene cluster evolved by duplication and divergence of TPS genes, together with alterations in substrate specificity to utilize cis-prenyl diphosphates, and through the acquisition of CPT genes.
Although applied over extremely short timescales, artificial selection has dramatically altered the form, physiology, and life history of cultivated plants. We have used RNA-Seq to define both gene sequence and expression divergence between cultivated tomato and five related wild species. Based on sequence differences, we detect footprints of positive selection in over 50 genes. We also document thousands of shifts in gene-expression level, many of which resulted from changes in selection pressure. These rapidly evolving genes are commonly associated with environmental response and stress tolerance. The importance of environmental inputs during evolution of gene expression is further highlighted by large-scale alteration of the light response coexpression network between wild and cultivated accessions. Human manipulation of the genome has heavily impacted the tomato transcriptome through directed admixture and by indirectly favoring nonsynonymous over synonymous substitutions. Taken together, our results shed light on the pervasive effects that artificial and natural selection have had on the transcriptomes of tomato and its wild relatives.