The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full‐length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.
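The notion of overlapping alignments that can be merged into one assembly can be illustrated with a small compatibility test. The sketch below is not the PASA algorithm itself, which the abstract does not specify in detail; it only assumes that two spliced alignments are mergeable when they overlap on the genome and imply identical introns within the overlapping region. The exon representation and the function names `introns` and `compatible` are hypothetical.

```python
from typing import List, Tuple

# Assumption: an alignment is a sorted list of exons,
# each given as inclusive genomic coordinates (start, end).
Exon = Tuple[int, int]


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return the introns implied by consecutive exons."""
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    ]


def compatible(a: List[Exon], b: List[Exon]) -> bool:
    """True if a and b overlap and agree on splicing inside the overlap."""
    lo = max(a[0][0], b[0][0])    # start of the shared genomic span
    hi = min(a[-1][1], b[-1][1])  # end of the shared genomic span
    if lo > hi:
        return False              # the alignments do not overlap at all

    def introns_in_overlap(exons: List[Exon]) -> List[Tuple[int, int]]:
        # Keep every intron that touches the shared span.
        return [i for i in introns(exons) if i[1] >= lo and i[0] <= hi]

    return introns_in_overlap(a) == introns_in_overlap(b)


# Hypothetical toy alignments.
cdna = [(100, 200), (300, 400)]
est_same_splicing = [(150, 200), (300, 450)]
est_alt_donor = [(150, 220), (300, 450)]
```

Under this assumption, `cdna` and `est_same_splicing` could be placed in one assembly, whereas `est_alt_donor` uses a different donor site and would have to be kept in a separate assembly, which is how a splicing variation would surface.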
A large tomato expressed sequence tag (EST) dataset (152 635 total) was analyzed to gain insights into differential gene expression among diverse plant tissues representing a range of developmental programs and biological responses. These ESTs were clustered and assembled into a total of 31 012 unique gene sequences. To better understand tomato gene expression at a plant system level and to identify differentially expressed and tissue-specific genes, we developed and implemented a digital expression analysis protocol. By clustering genes according to their relative abundance in the various EST libraries, expression patterns of genes across various tissues were generated and genes with similar patterns were grouped. In addition, tissues themselves were clustered for relatedness based on relative gene expression as a means of validating the integrity of the EST data as representative of relative gene expression. Arabidopsis and grape EST collections were also characterized to facilitate cross-species comparisons where possible. Tomato fruit digital expression data were specifically compared with publicly available grape EST data to gain insight into the molecular manifestation of ripening processes across diverse taxa, which resulted in the identification of common transcription factors not previously associated with ripening.
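The idea of comparing genes by their relative abundance across EST libraries can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol used in the study: the abstract does not give the normalization or the similarity measure, so scaling to ESTs per 10 000 and Pearson correlation are assumptions, and the gene names, library names, and counts are invented for the example.

```python
import math
from typing import Dict, List


def relative_abundance(
    counts: Dict[str, Dict[str, int]],
    library_sizes: Dict[str, int],
    scale: int = 10_000,
) -> Dict[str, Dict[str, float]]:
    """Convert raw EST counts to counts per `scale` ESTs in each library."""
    return {
        gene: {lib: scale * n / library_sizes[lib] for lib, n in per_lib.items()}
        for gene, per_lib in counts.items()
    }


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two expression profiles (0.0 if one is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


# Hypothetical counts: ESTs per gene in each tissue library.
library_sizes = {"fruit": 10_000, "leaf": 5_000}
counts = {
    "geneA": {"fruit": 30, "leaf": 0},
    "geneB": {"fruit": 5, "leaf": 10},
}
profiles = relative_abundance(counts, library_sizes)
```

Genes whose normalized profiles correlate strongly would fall into the same group; transposing the table and correlating libraries instead of genes gives the corresponding tissue-level comparison.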
Genetic programs underlying multicellular morphogenesis and cellular differentiation are most often associated with eukaryotic organisms, but examples also exist in bacteria, such as the formation of multicellular, spore-filled fruiting bodies in the order Myxococcales. Most members of the Myxococcales undergo a multicellular developmental program culminating in the formation of spore-filled fruiting bodies in response to starvation. To gain insight into the evolutionary history of fruiting body formation in Myxococcales, we performed a comparative analysis of the genomes and transcriptomes of five Myxococcales species, four of which undergo fruiting body formation (Myxococcus xanthus, Stigmatella aurantiaca, Sorangium cellulosum, and Haliangium ochraceum) and one of which does not (Anaeromyxobacter dehalogenans). Our analyses show that a set of 95 known M. xanthus development-specific genes, although suffering from a sampling bias, are overrepresented and occur more frequently than an average M. xanthus gene in S. aurantiaca, whereas they occur at the same frequency as an average M. xanthus gene in S. cellulosum and in H. ochraceum and are underrepresented in A. dehalogenans. Moreover, genes for entire signal transduction pathways important for fruiting body formation in M. xanthus are conserved in S. aurantiaca, whereas only a minority of these genes are conserved in A. dehalogenans, S. cellulosum, and H. ochraceum. Likewise, global gene expression profiling of developmentally regulated genes showed that genes that are upregulated during development in M. xanthus are overrepresented in S. aurantiaca and slightly underrepresented in A. dehalogenans, S. cellulosum, and H. ochraceum. These comparative analyses strongly indicate that the genetic programs for fruiting body formation in M. xanthus and S. aurantiaca are highly similar and significantly different from the genetic program directing fruiting body formation in S. cellulosum and H. ochraceum.
Thus, our analyses reveal an unexpected level of plasticity in the genetic programs for fruiting body formation in the Myxococcales and strongly suggest that the genetic program underlying fruiting body formation in different Myxococcales is not conserved. The evolutionary implications of this finding are discussed.
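One common way to ask whether a gene set is overrepresented among conserved genes is a one-sided hypergeometric test. The abstract does not state which statistic was used, so the sketch below is an assumption chosen for illustration, and the function name and the toy numbers are hypothetical rather than taken from the study.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: all genes in the reference genome
    successes:  reference genes that have an ortholog in the other species
    draws:      size of the gene set of interest (e.g. development genes)
    k:          members of the gene set that have an ortholog
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Toy numbers only: 10 genes, 5 conserved, a set of 5 genes, all 5 conserved.
p_toy = hypergeom_upper_tail(5, population=10, successes=5, draws=5)
```

A small tail probability would indicate that the gene set is conserved more often than an average gene; the mirror-image lower tail would be the analogous check for underrepresentation.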
Plasmids are mobile genetic elements that play a key role in the evolution of bacteria by mediating genome plasticity and lateral transfer of useful genetic information. Although plasmids were originally considered to be exclusively circular, linear plasmids have also been identified in certain bacterial phyla, notably the actinomycetes. In some cases, linear plasmids engage with chromosomes in an intricate evolutionary interplay, facilitating the emergence of new genome configurations by transfer and recombination or plasmid integration. Genome sequencing of Streptomyces clavuligerus ATCC 27064, a Gram-positive soil bacterium known for its production of a diverse array of biotechnologically important secondary metabolites, revealed a giant linear plasmid of 1.8 Mb in length. This megaplasmid (pSCL4) is one of the largest plasmids ever identified and the largest linear plasmid to be sequenced. It contains more than 20% of the putative protein-coding genes of the species, but none of these is predicted to be essential for primary metabolism. Instead, the plasmid is densely packed with an exceptionally large number of gene clusters for the potential production of secondary metabolites, including a large number of putative antibiotics, such as staurosporine, moenomycin, beta-lactams, and enediynes. Interestingly, cross-regulation occurs between chromosomal and plasmid-encoded genes. Several factors suggest that the megaplasmid came into existence through recombination of a smaller plasmid with the arms of the main chromosome. Phylogenetic analysis indicates that heavy traffic of genetic information between Streptomyces plasmids and chromosomes may facilitate the rapid evolution of secondary metabolite repertoires in these bacteria.
Since the initial publication of its complete genome sequence, Arabidopsis thaliana has become more important than ever as a model for plant research. However, the initial genome annotation was submitted by multiple centers using inconsistent methods, making the data difficult to use for many applications.
Over the course of three years, TIGR has completed its effort to standardize the structural and functional annotation of the Arabidopsis genome. Using both manual and automated methods, Arabidopsis gene structures were refined and gene products were renamed and assigned to Gene Ontology categories. We present an overview of the methods employed, tools developed, and protocols followed, summarizing the contents of each data release with special emphasis on our final annotation release (version 5).
Over the entire period, several thousand new genes and pseudogenes were added to the annotation. Approximately one third of the originally annotated gene models were significantly refined, yielding improved gene structure annotations, and every protein-coding gene was manually inspected and classified using Gene Ontology terms.
Burkholderia species exhibit enormous phenotypic diversity, ranging from the nonpathogenic, soil- and water-inhabiting Burkholderia thailandensis to the virulent, host-adapted mammalian pathogen B. mallei. Genomic diversity is evident within Burkholderia species as well. Individual isolates of Burkholderia pseudomallei and B. thailandensis, for example, carry a variety of strain-specific genomic islands (GIs), including putative pathogenicity and metabolic islands, prophage-like islands, and prophages. These GIs may provide some strains with a competitive advantage in the environment and/or in the host relative to other strains.
Here we present the results of analysis of 37 prophages, putative prophages, and prophage-like elements from six different Burkholderia species. Five of these were spontaneously induced to form bacteriophage particles from B. pseudomallei and B. thailandensis strains and were isolated and fully sequenced; 24 were computationally predicted in sequenced Burkholderia genomes; and eight are previously characterized prophages or prophage-like elements. The results reveal numerous differences in both genome structure and gene content among elements derived from different species as well as from strains within species, due in part to the incorporation of additional DNA, or 'morons', into the prophage genomes. Implications for pathogenicity are also discussed. Lastly, RNAseq analysis of gene expression showed that many of the genes in φ1026b that appear to contribute to phage and lysogen fitness were expressed independently of the phage structural and replication genes.
This study provides the first estimate of the relative contribution of prophages to the vast phenotypic diversity found among the Burkholderiae.
Musa species (Musaceae, Zingiberales) including bananas and plantains are collectively the fourth most important crop in developing countries. Knowledge concerning Musa genome structure and the origin of distinct cultivars has greatly increased over the last few years. Until now, however, no large-scale analyses of Musa genomic sequence have been conducted. This study compares genomic sequence in two Musa species with orthologous regions in the rice genome.
We produced 1.4 Mb of Musa sequence from 13 BAC clones, and annotated and analyzed them along with 4 previously sequenced BACs. The 443 predicted genes revealed that Zingiberales genes share GC content and distribution characteristics with eudicot and Poaceae genomes. Comparison with rice revealed microsynteny regions that have persisted since the divergence of the Commelinid orders Poales and Zingiberales at least 117 Mya. The previously hypothesized large-scale duplication event in the common ancestor of major cereal lineages within the Poaceae was verified. The divergence time distributions for Musa-Zingiber (Zingiberaceae, Zingiberales) orthologs and paralogs provide strong evidence for a large-scale duplication event in the Musa lineage after its divergence from the Zingiberaceae approximately 61 Mya. Comparisons of genomic regions from M. acuminata and M. balbisiana revealed highly conserved genome structure, and indicated that these genomes diverged circa 4.6 Mya.
These results point to the utility of comparative analyses between distantly related monocot species such as rice and Musa for improving our understanding of monocot genome evolution. Sequencing the genome of M. acuminata would provide a strong foundation for comparative genomics in the monocots. In addition, a genome sequence would aid genomic and genetic analyses of cultivated Musa polyploid genotypes in research aimed at localizing and cloning genes controlling important agronomic traits for breeding purposes.
Burkholderia mallei (Bm), the causative agent of the predominantly equine disease glanders, is a genetically uniform species that is very closely related to the much more diverse species Burkholderia pseudomallei (Bp), an opportunistic human pathogen and the primary cause of melioidosis. To gain insight into the relative lack of genetic diversity within Bm, we performed whole-genome comparative analysis of seven Bm strains and contrasted these with eight Bp strains. The Bm core genome (shared by all seven strains) is smaller in size than that of Bp, but the inverse is true for the variable gene sets that are distributed across strains. Interestingly, the biological roles of the Bm variable gene sets are much more homogeneous than those of Bp. The Bm variable genes are found mostly in contiguous regions flanked by insertion sequence (IS) elements, which appear to mediate excision and subsequent elimination of groups of genes that are under reduced selection in the mammalian host. The analysis suggests that the Bm genome continues to evolve through random IS-mediated recombination events, and differences in gene content may contribute to differences in virulence observed among Bm strains. The results are consistent with the view that Bm recently evolved from a single strain of Bp upon introduction into an animal host followed by expansion of IS elements, prophage elimination, and genome rearrangements and reduction mediated by homologous recombination across IS elements.
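The distinction between a core genome (genes shared by all strains) and variable gene sets (genes present in only some strains) reduces to set operations once genes have been grouped into ortholog clusters. The sketch below assumes that grouping has already been done; the strain names, cluster identifiers, and the function name `core_and_variable` are hypothetical and do not come from the study.

```python
from typing import Dict, Set, Tuple


def core_and_variable(strain_genes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split ortholog clusters into core (in every strain) and variable (the rest)."""
    gene_sets = list(strain_genes.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)  # present in all strains
    pan = set.union(*gene_sets)          # present in at least one strain
    return core, pan - core


# Hypothetical ortholog cluster content of three strains.
strains = {
    "strain_1": {"g1", "g2", "g3"},
    "strain_2": {"g1", "g2", "g4"},
    "strain_3": {"g1", "g2"},
}
core, variable = core_and_variable(strains)
```

Running the same function separately on the Bm strains and on the Bp strains would give the two core and variable sets whose sizes the abstract compares.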
Burkholderia multivorans is a Gram-negative bacterium and a member of the Burkholderia cepacia complex, which is frequently associated with respiratory infections in people with cystic fibrosis (CF) and chronic granulomatous disease (CGD). We report the genome sequences of 4 B. multivorans strains, 2 from CF patients and 2 from CGD patients.