Summary
The flowering plant Arabidopsis thaliana is a dicot model organism for research in many aspects of plant biology. A comprehensive annotation of its genome paves the way for understanding the functions and activities of all types of transcripts, including mRNA, the various classes of non‐coding RNA, and small RNA. The TAIR10 annotation update had a profound impact on Arabidopsis research but was released more than 5 years ago. Maintaining the accuracy of the annotation continues to be a prerequisite for future progress. Using an integrative annotation pipeline, we assembled tissue‐specific RNA‐Seq libraries from 113 datasets and constructed 48 359 transcript models of protein‐coding genes in eleven tissues. In addition, we annotated various classes of non‐coding RNA including microRNA, long intergenic RNA, small nucleolar RNA, natural antisense transcript, small nuclear RNA, and small RNA using published datasets and in‐house analytic results. Altogether, we identified 635 novel protein‐coding genes, 508 novel transcribed regions, 5178 non‐coding RNAs, and 35 846 small RNA loci that were formerly unannotated. Analysis of the splicing events and RNA‐Seq based expression profiles revealed the landscapes of gene structures, untranslated regions, and splicing activities to be more intricate than previously appreciated. Furthermore, we present 692 uniformly expressed housekeeping genes, 43% of whose human orthologs are also housekeeping genes. This updated Arabidopsis genome annotation with a substantially increased resolution of gene models will not only further our understanding of the biological processes of this plant model but also of other species.
Significance Statement
The most recent annotation of the Arabidopsis thaliana genome (TAIR10), released more than 5 years ago, had a profound impact on Arabidopsis research. Here we present Araport11, a re‐annotation of the Col‐0 reference genome. We used extensive RNA‐seq data to update and extend structural gene models, thus identifying over 600 novel protein‐coding genes, 500 novel transcribed regions, 5000 non‐coding genes, and 35 000 small RNA loci that formerly eluded annotation.
Most gastrointestinal stromal tumors (GISTs) harbor mutant KIT or platelet-derived growth factor receptor alpha (PDGFRA) kinases, which are imatinib targets. Sunitinib, which targets KIT, PDGFRs, and several other kinases, has demonstrated efficacy in patients with GIST after they experience imatinib failure. We evaluated the impact of primary and secondary kinase genotype on sunitinib activity.
Tumor responses were assessed radiologically in a phase I/II trial of sunitinib in 97 patients with metastatic, imatinib-resistant/intolerant GIST. KIT/PDGFRA mutational status was determined for 78 patients by using tumor specimens obtained before and after prior imatinib therapy. Kinase mutants were biochemically profiled for sunitinib and imatinib sensitivity.
Clinical benefit (partial response or stable disease for ≥ 6 months) with sunitinib was observed for the three most common primary GIST genotypes: KIT exon 9 (58%), KIT exon 11 (34%), and wild-type KIT/PDGFRA (56%). Progression-free survival (PFS) was significantly longer for patients with primary KIT exon 9 mutations (P = .0005) or with a wild-type genotype (P = .0356) than for those with KIT exon 11 mutations. The same pattern was observed for overall survival (OS). PFS and OS were longer for patients with secondary KIT exon 13 or 14 mutations (which involve the KIT-adenosine triphosphate binding pocket) than for those with exon 17 or 18 mutations (which involve the KIT activation loop). Biochemical profiling studies confirmed the clinical results.
The clinical activity of sunitinib after imatinib failure is significantly influenced by both primary and secondary mutations in the predominant pathogenic kinases, which has implications for optimization of the treatment of patients with GIST.
There is an increasing awareness that, as a result of structural variation, a reference sequence representing the genome of a single individual is unable to capture the full gene repertoire found in the species. The large number of genes affected by presence/absence and copy number variation suggests that such variation may contribute to phenotypic and agronomic trait diversity. Here we show by analysis of the Brassica oleracea pangenome that nearly 20% of genes are affected by presence/absence variation. Several genes displaying presence/absence variation are annotated with functions related to major agronomic traits, including disease resistance, flowering time, glucosinolate metabolism and vitamin biosynthesis.
Araport: the Arabidopsis information portal
Krishnakumar, Vivek; Hanlon, Matthew R; Contrino, Sergio ...
Nucleic Acids Research, 01/2015, Volume 43, Database issue
Journal Article, peer reviewed, open access
The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research. It houses the Arabidopsis thaliana genome sequence and associated annotation. It was conceived as a framework that allows the research community to develop and release 'modules' that integrate, analyze and visualize Arabidopsis data that may reside at remote sites. The current implementation provides an indexed database of core genomic information. These data are made available through feature-rich web applications that provide search, data mining, and genome browser functionality, and also by bulk download and web services. Araport uses software from the InterMine and JBrowse projects to expose curated data from TAIR, GO, BAR, EBI, UniProt, PubMed and EPIC CoGe. The site also hosts 'science apps,' developed as prototypes for community modules that use dynamic web pages to present data obtained on-demand from third-party servers via RESTful web services. Designed for sustainability, the Arabidopsis Information Portal strategy exploits existing scientific computing infrastructure, adopts a practical mixture of data integration technologies and encourages collaborative enhancement of the resource by its user community.
Most gastrointestinal stromal tumors (GISTs) have activating mutations in the KIT receptor tyrosine kinase, and most patients with GISTs respond well to Gleevec, which inhibits KIT kinase activity. Here we show that ~35% (14 of 40) of GISTs lacking KIT mutations have intragenic activation mutations in the related receptor tyrosine kinase, platelet-derived growth factor receptor α (PDGFRA). Tumors expressing KIT or PDGFRA oncoproteins were indistinguishable with respect to activation of downstream signaling intermediates and cytogenetic changes associated with tumor progression. Thus, KIT and PDGFRA mutations appear to be alternative and mutually exclusive oncogenic mechanisms in GISTs.
Medicago truncatula, a close relative of alfalfa, is a preeminent model for studying nitrogen fixation, symbiosis, and legume genomics. The Medicago sequencing project began in 2003 with the goal of deciphering sequences originating from the euchromatic portion of the genome. The initial sequencing approach was based on a BAC tiling path, culminating in a BAC-based assembly (Mt3.5) as well as an in-depth analysis of the genome published in 2011.
Here we describe a further improved and refined version of the M. truncatula genome (Mt4.0) based on de novo whole genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG. The ALLPATHS-LG scaffolds were anchored onto the pseudomolecules on the basis of alignments to both the optical map and the genotyping-by-sequencing (GBS) map. The Mt4.0 pseudomolecules encompass ~360 Mb of actual sequence spanning 390 Mb, of which ~330 Mb align perfectly with the optical map. This is a drastic improvement over the BAC-based Mt3.5, which contained only 70% (~250 Mb) of the sequence in the current version. Most of the sequences and genes that previously resided on the unanchored portion of Mt3.5 have now been incorporated into the Mt4.0 pseudomolecules, with the exception of ~28 Mb of unplaced sequences. With regard to gene annotation, the genome has been re-annotated through our gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidence. A total of 50,894 genes (31,661 high confidence and 19,233 low confidence) are included in Mt4.0, which overlap with ~82% of the gene loci annotated in Mt3.5. Of the remaining Mt3.5 genes, 14% have been deprecated to an "unsupported" status and 4% are absent from the Mt4.0 predictions.
Mt4.0 and its associated resources, such as genome browsers, BLAST-able datasets and gene information pages, can be found on the JCVI Medicago web site (http://www.jcvi.org/medicago). The assembly and annotation have been deposited in GenBank (BioProject: PRJNA10791). The heavily curated chromosomal sequences and associated gene models of Medicago will serve as a better reference for legume biology and comparative genomics.
Chickpea (Cicer arietinum L.), an important grain legume crop of the world, is seriously challenged by terminal drought and salinity stresses. However, only a very limited number of molecular markers and candidate genes are available for undertaking molecular breeding in chickpea to tackle these stresses. This study reports the generation and analysis of a comprehensive resource of drought- and salinity-responsive expressed sequence tags (ESTs) and gene-based markers.
A total of 20,162 (18,435 high quality) drought- and salinity-responsive ESTs were generated from ten different root tissue cDNA libraries of chickpea. Sequence editing, clustering and assembly analysis resulted in 6,404 unigenes (1,590 contigs and 4,814 singletons). Functional annotation of unigenes based on BLASTX analysis showed that 46.3% (2,965) had significant similarity (≤ 1E-05) to sequences in the non-redundant UniProt database. BLASTN analysis of unique sequences with ESTs of four legume species (Medicago, Lotus, soybean and groundnut) and three model plant species (rice, Arabidopsis and poplar) provided insights on conserved genes across legumes as well as novel transcripts for chickpea. Of 2,965 (46.3%) significant unigenes, only 2,071 (32.3%) unigenes could be functionally categorised according to Gene Ontology (GO) descriptions. A total of 2,029 sequences containing 3,728 simple sequence repeats (SSRs) were identified and 177 new EST-SSR markers were developed. Experimental validation of a set of 77 SSR markers on 24 genotypes revealed 230 alleles with an average of 4.6 alleles per marker and average polymorphism information content (PIC) value of 0.43. Besides SSR markers, 21,405 high confidence single nucleotide polymorphisms (SNPs) in 742 contigs (with ≥ 5 ESTs) were also identified. Recognition sites for restriction enzymes were identified for 7,884 SNPs in 240 contigs. Hierarchical clustering of 105 selected contigs provided clues about stress-responsive candidate genes and their expression profile showed predominance in specific stress-challenged libraries.
The generated set of chickpea ESTs serves as a resource of high quality transcripts for gene discovery and for the development of functional markers associated with abiotic stress tolerance, which will help facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align the genomes of chickpea and other legume species.
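The PIC value reported above measures how informative a marker is, given its allele frequencies. As an illustrative sketch only, the Python function below computes PIC with the commonly cited formula of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The abstract does not state which formula or software was used, so the formula choice and the function name `polymorphism_information_content` are assumptions, and the allele frequencies in the usage lines are made up for illustration.

```python
from typing import Sequence


def polymorphism_information_content(allele_freqs: Sequence[float]) -> float:
    """Return PIC for one marker (Botstein et al. 1980 form; assumed here).

    allele_freqs: frequencies of each allele at the marker; must sum to 1.
    """
    if not allele_freqs:
        raise ValueError("at least one allele frequency is required")
    if any(p < 0 for p in allele_freqs):
        raise ValueError("allele frequencies must be non-negative")
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")

    # Sum of squared frequencies (expected homozygosity).
    sum_sq = sum(p * p for p in allele_freqs)

    # Correction term summed over every unordered pair of distinct alleles.
    cross = 0.0
    n = len(allele_freqs)
    for i in range(n):
        for j in range(i + 1, n):
            cross += 2.0 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2

    return 1.0 - sum_sq - cross


# Hypothetical allele frequencies, not taken from the study.
print(polymorphism_information_content([0.5, 0.5]))
print(polymorphism_information_content([0.25, 0.25, 0.25, 0.25]))
```

A marker with a single allele gives a PIC of 0, and the value rises as alleles become more numerous and more evenly distributed.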
Human respiratory syncytial virus (RSV) is the leading cause of respiratory tract infections in children globally, with nearly all children experiencing at least one infection by the age of two. Partial sequencing of the attachment glycoprotein gene is conducted routinely for genotyping, but relatively few whole genome sequences are available for RSV. The goal of our study was to sequence the genomes of RSV strains collected from multiple countries to further understand the global diversity of RSV at a whole-genome level.
We collected RSV samples and isolates from Mexico, Argentina, Belgium, Italy, Germany, Australia, South Africa, and the USA from the years 1998-2010. Both Sanger and next-generation sequencing with the Illumina and 454 platforms were used to sequence the whole genomes of RSV A and B. Phylogenetic analyses were performed using the Bayesian and maximum likelihood methods of phylogenetic inference.
We sequenced the genomes of 34 RSVA and 23 RSVB viruses. Phylogenetic analysis showed that the RSVA genome evolves at an estimated rate of 6.72 × 10⁻⁴ substitutions/site/year (95% HPD 5.61 × 10⁻⁴ to 7.6 × 10⁻⁴) and for RSVB the evolutionary rate was 7.69 × 10⁻⁴ substitutions/site/year (95% HPD 6.81 × 10⁻⁴ to 8.62 × 10⁻⁴). We found multiple clades co-circulating globally for both RSV A and B. The predominant clades were GA2 and GA5 for RSVA and BA for RSVB.
Our analyses showed that RSV circulates on a global scale with the same predominant clades of viruses being found in countries around the world. However, the distribution of clades can change rapidly as new strains emerge. We did not observe a strong spatial structure in our trees, with the same three main clades of RSV co-circulating globally, suggesting that the evolution of RSV is not strongly regionalized.
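To give a feel for the per-site rates quoted above, the following Python sketch converts them into an approximate number of substitutions per genome per year. The genome length of 15,200 nt is an assumed round figure for RSV, not a value stated in the abstract, and the function name is hypothetical. The calculation is a plain multiplication and ignores rate variation among sites.

```python
def substitutions_per_genome_per_year(rate_per_site: float, genome_length_nt: int) -> float:
    """Scale a per-site, per-year substitution rate to a whole genome."""
    if rate_per_site < 0:
        raise ValueError("rate must be non-negative")
    if genome_length_nt <= 0:
        raise ValueError("genome length must be positive")
    return rate_per_site * genome_length_nt


# Assumption: approximate RSV genome length; not given in the abstract.
ASSUMED_RSV_GENOME_LENGTH_NT = 15_200

# Point estimates of the per-site rates as quoted in the abstract.
rsva = substitutions_per_genome_per_year(6.72e-4, ASSUMED_RSV_GENOME_LENGTH_NT)
rsvb = substitutions_per_genome_per_year(7.69e-4, ASSUMED_RSV_GENOME_LENGTH_NT)

print(f"RSVA: about {rsva:.1f} substitutions per genome per year")
print(f"RSVB: about {rsvb:.1f} substitutions per genome per year")
```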
Chickpea (Cicer arietinum L.) is the third most important cool season food legume, cultivated in arid and semi-arid regions of the world. The goal of this study was to develop novel molecular markers such as microsatellite or simple sequence repeat (SSR) markers from bacterial artificial chromosome (BAC)-end sequences (BESs) and diversity arrays technology (DArT) markers, and to construct a high-density genetic map based on recombinant inbred line (RIL) population ICC 4958 (C. arietinum)×PI 489777 (C. reticulatum). A BAC-library comprising 55,680 clones was constructed and 46,270 BESs were generated. Mining of these BESs provided 6,845 SSRs, and primer pairs were designed for 1,344 SSRs. In parallel, DArT arrays with ca. 15,000 clones were developed, and 5,397 clones were found polymorphic among 94 genotypes tested. Screening of newly developed BES-SSR markers and DArT arrays on the parental genotypes of the RIL mapping population showed polymorphism with 253 BES-SSR markers and 675 DArT markers. Segregation data obtained for these polymorphic markers and 494 markers data compiled from published reports or collaborators were used for constructing the genetic map. As a result, a comprehensive genetic map comprising 1,291 markers on eight linkage groups (LGs) spanning a total of 845.56 cM distance was developed (http://cmap.icrisat.ac.in/cmap/sm/cp/thudi/). The number of markers per linkage group ranged from 68 (LG 8) to 218 (LG 3) with an average inter-marker distance of 0.65 cM. While the developed resource of molecular markers will be useful for genetic diversity, genetic mapping and molecular breeding applications, the comprehensive genetic map with integrated BES-SSR markers will facilitate its anchoring to the physical map (under construction) to accelerate map-based cloning of genes in chickpea and comparative genome evolution studies in legumes.
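The average inter-marker distance quoted above can be reproduced from the map length and marker count. The Python sketch below shows two common conventions: dividing the total map length by the number of markers, and dividing by the number of intervals (markers minus linkage groups). The abstract does not say which convention was used, so both function names and the choice of conventions are assumptions; the first convention matches the reported 0.65 cM after rounding.

```python
def spacing_per_marker(total_cm: float, n_markers: int) -> float:
    """Average spacing as total map length divided by the marker count."""
    if n_markers <= 0:
        raise ValueError("marker count must be positive")
    return total_cm / n_markers


def spacing_per_interval(total_cm: float, n_markers: int, n_groups: int) -> float:
    """Average spacing as total map length divided by the interval count.

    Each linkage group with k markers has k - 1 intervals, so the map has
    n_markers - n_groups intervals overall.
    """
    n_intervals = n_markers - n_groups
    if n_intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return total_cm / n_intervals


# Figures taken from the abstract: 845.56 cM, 1,291 markers, 8 linkage groups.
per_marker = spacing_per_marker(845.56, 1291)
per_interval = spacing_per_interval(845.56, 1291, 8)

print(f"per marker:   {per_marker:.2f} cM")
print(f"per interval: {per_interval:.2f} cM")
```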
In natural ecosystems, the roots of many plants exist in association with arbuscular mycorrhizal (AM) fungi, and the resulting symbiosis has profound effects on the plant. The most frequently documented response is an increase in phosphorus nutrition; however, other effects have been noted, including increased resistance to abiotic and biotic stresses. Here we used a 16 000-feature oligonucleotide array and real-time quantitative RT-PCR to explore transcriptional changes triggered in Medicago truncatula roots and shoots as a result of AM symbiosis. By controlling the experimental conditions, phosphorus-related effects were minimized, and both local and systemic transcriptional responses to the AM fungus were revealed. The transcriptional response of the roots and shoots differed in both the magnitude of gene induction and the predicted functional categories of the mycorrhiza-regulated genes. In the roots, genes regulated in response to three different AM fungi were identified, and, through split-root experiments, an additional layer of regulation, in the colonized or non-colonized sections of the mycorrhizal root system, was uncovered. Transcript profiles of the shoots of mycorrhizal plants indicated the systemic induction of many genes predicted to be involved in stress or defense responses, and suggested that mycorrhizal plants might display enhanced disease resistance. Experimental evidence supports this prediction, and mycorrhizal M. truncatula plants showed increased resistance to a virulent bacterial pathogen, Xanthomonas campestris. Thus, the symbiosis is accompanied by a complex pattern of local and systemic changes in gene expression, including the induction of a functional defense response.