High-throughput automated partial sequencing of anonymous cDNA clones provides a method to survey the repertoire of expressed genes from an organism. Comparison of the coding capacity of these expressed sequence tags (ESTs) with the sequences in the public data bases results in assignment of putative function to a significant proportion of the ESTs. Thus, the more than 13,400 plant ESTs that are currently available provide a new resource that will facilitate progress in many areas of plant biology. These opportunities are illustrated by a description of the results obtained from analysis of 1500 Arabidopsis ESTs from a cDNA library prepared from equal portions of $\text{poly}(\text{A})^{+}$ mRNA from etiolated seedlings, roots, leaves, and flowering inflorescences. More than 900 different sequences were represented, 32% of which showed significant nucleotide or deduced amino acid sequence similarity to previously characterized genes or proteins from a wide range of organisms. At least 165 of the clones had significant deduced amino acid sequence homology to proteins or gene products that have not been previously characterized from higher plants. A summary of methods for accessing the information and materials generated by the Arabidopsis cDNA sequencing projects is provided.
Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.
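The abstract does not say how the 107 groups were formed. As a rough illustration only, the sketch below clusters ESTs by single linkage: two reads are joined when they share at least `min_shared` exact k-mers, and clusters are the connected components of that graph. The k-mer criterion, the thresholds, the function names, and the toy reads are all assumptions, not the authors' procedure.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(seqs: dict[str, str], k: int = 8, min_shared: int = 3) -> list[list[str]]:
    """Single-linkage clustering (hypothetical criterion): two reads are linked
    when they share at least `min_shared` k-mers; clusters are the connected
    components of that graph, largest first."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    profiles = {name: kmers(s.upper(), k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two components

    groups: dict[str, list[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values(), key=lambda g: (-len(g), g))


# Made-up reads: est1, est2 and est4 overlap; est3 is unrelated.
seqs = {
    "est1": "ATGGCGTACGTTAGCCGATAGGCT",
    "est2": "GGCGTACGTTAGCCGATAGGCTTT",
    "est3": "TTTTCCCCGGGGAAAATTTTCCCC",
    "est4": "CCGATAGGCTTTACGGATCC",
}
clusters = cluster_ests(seqs)
grouped = [c for c in clusters if len(c) > 1]
singletons = [c[0] for c in clusters if len(c) == 1]
print(len(grouped), "group(s);", len(singletons), "unique sequence(s)")
```

Multi-member clusters play the role of the "groups of similar sequences" in the text and singletons the role of the unique sequences. The all-pairs comparison is quadratic and ignores reverse complements, which is acceptable for about a thousand 5' reads but not for large EST collections.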
Amplification and detection of lentiviral DNA inside cells
Haase, A.T. (University of Minnesota Medical School, Minneapolis, MN); Retzel, E.F.; Staskus, K.A.
Proceedings of the National Academy of Sciences - PNAS, 07/1990, Volume 87, Issue 13
Journal Article
Peer reviewed
Open access
Visna virus and human immunodeficiency virus are prototypes of animal and human lentiviruses, respectively, that persist and are disseminated despite the host immune response because cells in the tissues and the bloodstream harbor viral genomes in a covert state. To facilitate identification of these latently infected cells, the polymerase chain reaction has been adapted to amplify viral DNA in fixed cells for detection by in situ hybridization. By using a multiple primer set that generates DNA segments with overlapping cohesive termini, visna virus DNA can be amplified, retained, and detected in infected cells with sensitivities that exceed those of existing methods by more than 2 orders of magnitude. This advance in single-cell technology should prove useful in diagnosing and gaining insight into the pathogenesis of viral infections and provide new opportunities to look for viruses in chronic diseases of unknown etiology.
Summary
We used the IMNpRH2 (12 000‐rad) and IMpRH (7000‐rad) RH panels to integrate 2019 transcriptome (RNA‐seq)‐generated contigs with markers from the porcine genetic and radiation hybrid (RH) maps and bacterial artificial chromosome fingerprinted contigs into 1) parallel framework maps (LOD ≥ 10) on both panels for swine chromosome (SSC) 4 and 2) a high‐resolution comparative map of SSC4 and human chromosomes (HSA) 1 and 8. A total of 573 loci were anchored and ordered on SSC4, closing gaps identified in the porcine sequence assembly Sscrofa9. Alignment of the SSC4 RH map with the genetic map identified five microsatellites incorrectly mapped around the centromeric region in the genetic map. Further alignment of the RH and comparative maps with the genome sequence identified four additional regions of discrepancy that are also suggestive of errors in assembly, three of which were resolved through conserved synteny with blocks on HSA1 and HSA8.
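One simple way to flag markers whose genetic-map placement conflicts with the RH order is to read the genetic positions in RH order, keep one longest non-decreasing subsequence, and report every marker outside it. This is assumed here for illustration and is not stated to be the method used in the study; the function name, the inputs, and the marker positions are hypothetical, and the sketch assumes both maps are oriented the same way along SSC4.

```python
from bisect import bisect_right


def out_of_order_markers(rh_order: list[str], genetic_pos: dict[str, float]) -> list[str]:
    """Flag markers whose genetic-map position conflicts with the RH order.

    Genetic positions are read in RH order; one longest non-decreasing
    subsequence is kept and every marker outside it is reported.
    Markers absent from the genetic map are ignored."""
    markers = [m for m in rh_order if m in genetic_pos]
    values = [genetic_pos[m] for m in markers]

    tails: list[float] = []    # tails[j]: smallest last value of a run of length j + 1
    tail_idx: list[int] = []   # index into `values` of that last value
    prev = [-1] * len(values)  # back-pointers for rebuilding the kept run

    for i, v in enumerate(values):
        j = bisect_right(tails, v)  # bisect_right tolerates equal positions (ties in cM)
        if j == len(tails):
            tails.append(v)
            tail_idx.append(i)
        else:
            tails[j] = v
            tail_idx[j] = i
        prev[i] = tail_idx[j - 1] if j > 0 else -1

    keep: set[int] = set()
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        keep.add(i)
        i = prev[i]
    return [m for idx, m in enumerate(markers) if idx not in keep]


# Hypothetical markers: m3 sits far from its RH neighbours on the genetic map.
rh_order = ["m1", "m2", "m3", "m4", "m5", "m6"]
genetic_cm = {"m1": 0.0, "m2": 5.0, "m3": 40.0, "m4": 10.0, "m5": 15.0, "m6": 20.0}
print(out_of_order_markers(rh_order, genetic_cm))  # ['m3']
```

Because the longest subsequence is not always unique, a flagged marker is only a candidate error; in practice it would be checked against LOD support on both panels, as the framework maps in the text are.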
Motivation: Protein sequence classification is becoming an increasingly important means of organizing the voluminous data produced by large-scale genome sequencing projects. At present, there are several independent classification methods. To aid the general classification effort, we have created a unified protein family resource, MetaFam. MetaFam is a protein family classification built upon 10 publicly accessible protein family databases (Blocks+, DOMO, Pfam, PIR-ALN, PRINTS, PROSITE, ProDom, PROTOMAP, SBASE, and SYSTERS). MetaFam’s family ‘supersets’, as we call them, are created automatically by using set theory to compare families among the databases. Families of one database are matched to those in another when the intersection of their members exceeds that of all other possible family pairings between the two databases. Pairwise family matches are then drawn together transitively to create a new list of protein family supersets.
Results: MetaFam family supersets have several useful features: (1) each superset contains more members than the families from which it is composed, because each of the component family databases works with only a subset of our full non-redundant set of proteins; (2) conflicting assignments can be pinpointed quickly, since our analysis identifies individual members that are in conflict with the majority consensus; (3) family descriptions that are absent from automated databases can frequently be assigned; (4) statistics have been computed comparing domain boundaries, family size distributions, and overall quality of MetaFam supersets; (5) the supersets have been loaded into a relational database to allow for complex queries and visualization of the connections among families in a superset and the consensus of individual domain members; and (6) the quality of individual supersets has been assessed using numerous quantitative measures such as family consistency, connectedness, and size. We anticipate this new resource will be particularly useful to genomic database curators.
Availability: Free access to the MetaFam web server is provided to all users at http://metafam.ahc.umn.edu/.
Contact: metafam@ahc.umn.edu
Supplementary information: Detailed distribution plots on MetaFam 2.0 supersets and their constituent family databases (e.g. superset/family sizes, domain boundary comparisons) are shown at http://metafam.ahc.umn.edu/mf2.0/stats.html. Statistics on the current release of MetaFam can be found at http://metafam.ahc.umn.edu/current_release/stats.html.
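The superset construction described in the Motivation can be sketched as follows. This is one reading of the matching rule, assumed for illustration: two families from different databases are paired only when their member overlap is strictly larger than that of every other pairing involving either family, and the pairs are then merged transitively with a union-find structure. The function names and the toy families and proteins are placeholders; this is not the MetaFam code.

```python
from itertools import combinations

Family = tuple[str, str]  # (database name, family name)


def best_matches(db_a: dict[str, set[str]], db_b: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Pair fa (in db_a) with fb (in db_b) when their overlap is strictly larger
    than the overlap of any other pairing that involves fa or fb."""
    overlap = {(fa, fb): len(ma & mb)
               for fa, ma in db_a.items() for fb, mb in db_b.items()}
    pairs = []
    for (fa, fb), n in overlap.items():
        if n == 0:
            continue
        rivals = (m for (x, y), m in overlap.items()
                  if (x == fa or y == fb) and (x, y) != (fa, fb))
        if all(n > m for m in rivals):
            pairs.append((fa, fb))
    return pairs


def supersets(dbs: dict[str, dict[str, set[str]]]) -> list[set[Family]]:
    """Merge pairwise matches across all database pairs transitively (union-find)."""
    parent: dict[Family, Family] = {(d, f): (d, f) for d, fams in dbs.items() for f in fams}

    def find(x: Family) -> Family:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(dbs, 2):
        for fa, fb in best_matches(dbs[a], dbs[b]):
            parent[find((a, fa))] = find((b, fb))

    groups: dict[Family, set[Family]] = {}
    for fam in list(parent):
        groups.setdefault(find(fam), set()).add(fam)
    return [g for g in groups.values() if len(g) > 1]  # this sketch reports merged groups only


def members(superset: set[Family], dbs: dict[str, dict[str, set[str]]]) -> set[str]:
    """Union of the proteins in every component family."""
    return set().union(*(dbs[d][f] for d, f in superset))


dbs = {
    "Pfam":    {"PF_kinase": {"p1", "p2", "p3"}, "PF_globin": {"p7", "p8"}},
    "PROSITE": {"PS_kinase": {"p2", "p3", "p4"}, "PS_globin": {"p8", "p9"}},
    "PRINTS":  {"PR_kinase": {"p3", "p4", "p5"}},
}
for s in supersets(dbs):
    print(sorted(s), len(members(s, dbs)))
```

In the toy data the kinase superset has five distinct members while no component family has more than three, which mirrors feature (1) in the Results.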
In order to identify the genes and gene functions that underlie key aspects of legume biology, researchers have selected the cool-season legume Medicago truncatula (Mt) as a model system for legume research. A set of >170 000 Mt ESTs has been assembled based on in-depth sampling from various developmental stages and pathogen-challenged tissues. MtDB is a relational database that integrates Mt transcriptome data and provides a wide range of user-defined data mining options. The database is interrogated through a series of interfaces with 58 options grouped into two filters. In addition, the user can select and compare unigene sets generated by different assemblers: Phrap, Cap3 and Cap4. Sequence identifiers from all public Mt sites (e.g. IDs from GenBank, CCGB, TIGR, NCGR, INRA) are fully cross-referenced to facilitate comparisons between different sites, and hypertext links to the appropriate database records are provided for all query results. MtDB's goal is to provide researchers with the means to quickly and independently identify sequences that match specific research interests based on user-defined criteria. The underlying database and query software have been designed for ease of updates and portability to other model organisms. Public access to the database is at http://www.medicago.org/MtDB.
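Cross-referencing identifiers from several sites amounts to mapping every (source, identifier) pair to a shared unigene cluster and looking up the other members of that cluster. The in-memory sketch below assumes each identifier belongs to exactly one cluster; the class, its methods, the cluster names, and the identifiers are hypothetical and do not describe MtDB's actual schema.

```python
from collections import defaultdict

Key = tuple[str, str]  # (source site, identifier)


class CrossReference:
    """Hypothetical index linking identifiers from several sites through a shared
    unigene cluster; each identifier is assumed to belong to one cluster."""

    def __init__(self) -> None:
        self._cluster_of: dict[Key, str] = {}
        self._members: defaultdict[str, set[Key]] = defaultdict(set)

    def add(self, cluster: str, source: str, identifier: str) -> None:
        key = (source, identifier)
        old = self._cluster_of.get(key)
        if old is not None:  # re-assignment: drop the stale membership first
            self._members[old].discard(key)
        self._cluster_of[key] = cluster
        self._members[cluster].add(key)

    def equivalents(self, source: str, identifier: str) -> set[Key]:
        """Identifiers from any site that share a cluster with the query."""
        key = (source, identifier)
        cluster = self._cluster_of.get(key)
        if cluster is None:
            return set()
        return self._members[cluster] - {key}


xr = CrossReference()
xr.add("UG_001", "GenBank", "gb_0001")
xr.add("UG_001", "TIGR", "tigr_0001")
xr.add("UG_001", "NCGR", "ncgr_0001")
xr.add("UG_002", "GenBank", "gb_0002")
print(sorted(xr.equivalents("GenBank", "gb_0001")))
```

Comparing unigene sets from different assemblers could reuse the same idea with one index per assembly, so that a read's cluster in one set can be traced to its cluster in another.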
Motivation: Protein sequence and family data is accumulating at such a rapid rate that state-of-the-art databases and interface tools are required to aid curators with their classifications. We have designed such a system, MetaFam, to facilitate the comparison and integration of public protein sequence and family data. This paper presents the global schema, integration issues, and query capabilities of MetaFam.
Results: MetaFam is an integrated data warehouse of information about protein families and their sequences. This data has been collected into a consistent global schema, and stored in an Oracle relational database. The warehouse implementation allows for quick removal of outdated data sets. In addition to the relational implementation of the primary schema, we have developed several derived tables that enable efficient access from data visualization and exploration tools. Through a series of straightforward SQL queries, we demonstrate the usefulness of this data warehouse for comparing protein family classifications and for functional assignment of new sequences.
Availability: Access to the MetaFam database is provided through a Java applet called MetaFamView, which can be run from the MetaFam web site at http://www.metafam.ahc.umn.edu/. Access to the relational data via named Oracle accounts can be arranged with the authors. Arrangements can also be made to obtain the data in Oracle ‘export dump’ format.
Contact: metafam@ahc.umn.edu
Supplementary information: The complete relational schema, integration scripts, and analysis queries are available from the authors.
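The kind of SQL query the abstract mentions for comparing family classifications can be illustrated with Python's built-in `sqlite3` module standing in for Oracle. The single `membership` table, the `family_overlaps` helper, and the family and protein names are assumptions made for this sketch; the actual MetaFam schema is available from the authors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE membership (
        db         TEXT NOT NULL,  -- source family database
        family_id  TEXT NOT NULL,  -- family identifier within that database
        protein_id TEXT NOT NULL,  -- protein in the non-redundant sequence set
        PRIMARY KEY (db, family_id, protein_id)
    );
    """
)
rows = [
    ("Pfam", "PF_kinase", "p1"), ("Pfam", "PF_kinase", "p2"), ("Pfam", "PF_kinase", "p3"),
    ("Pfam", "PF_globin", "p7"), ("Pfam", "PF_globin", "p8"),
    ("PROSITE", "PS_kinase", "p2"), ("PROSITE", "PS_kinase", "p3"), ("PROSITE", "PS_kinase", "p4"),
    ("PROSITE", "PS_globin", "p8"), ("PROSITE", "PS_globin", "p9"),
]
conn.executemany("INSERT INTO membership VALUES (?, ?, ?)", rows)

# Count shared proteins for every pair of families drawn from two databases.
OVERLAP_SQL = """
    SELECT a.family_id, b.family_id, COUNT(*) AS shared
    FROM membership AS a
    JOIN membership AS b ON a.protein_id = b.protein_id
    WHERE a.db = ? AND b.db = ?
    GROUP BY a.family_id, b.family_id
    ORDER BY shared DESC, a.family_id, b.family_id
"""


def family_overlaps(db_a: str, db_b: str) -> list[tuple[str, str, int]]:
    """Return (family in db_a, family in db_b, shared member count), largest first."""
    return conn.execute(OVERLAP_SQL, (db_a, db_b)).fetchall()


for row in family_overlaps("Pfam", "PROSITE"):
    print(row)
```

Family pairs with no shared proteins never appear in the join, so the result stays small even when each database holds thousands of families.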
As structural and functional genomics efforts provide the biological community with ever-broadening sets of interrelated data, the need to explore such complex information for subtle relationships expands. We present wCLUTO, a Web-enabled version of the stand-alone application CLUTO, designed to apply clustering methods to genomic information. Its first application is focused on clustering transcriptome data from microarrays. Users can upload data into the clustering tool, choose and configure one of several clustering methods, and view the results in a variety of visual formats, including a three-dimensional "mountain" view of the clusters. Parameters can be explored to rapidly examine a variety of clustering results, and the resulting clusters can be downloaded either for manipulation by other programs or to be saved in a format for publication.
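As a generic illustration of partitional clustering of expression profiles, the sketch below performs repeated bisection with spherical 2-means under cosine similarity. It is not wCLUTO's implementation: the seeding rule (most dissimilar pair), the choice to split the largest cluster, the function names, and the toy profiles are all assumptions.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a profile to unit length so dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cosine(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))  # both inputs are already unit vectors


def centroid(vectors: list[list[float]]) -> list[float]:
    return normalize([sum(col) / len(vectors) for col in zip(*vectors)])


def bisect_cluster(indices: list[int], data: list[list[float]], iters: int = 20):
    """Split one cluster in two with spherical 2-means (cosine similarity)."""
    # Deterministic seeds: the pair of profiles with the lowest cosine similarity.
    a, b = min(((i, j) for i in indices for j in indices if i < j),
               key=lambda p: cosine(data[p[0]], data[p[1]]))
    c1, c2 = data[a], data[b]
    left, right = [a], [i for i in indices if i != a]  # fallback split
    for _ in range(iters):
        new_left = [i for i in indices if cosine(data[i], c1) >= cosine(data[i], c2)]
        new_right = [i for i in indices if cosine(data[i], c1) < cosine(data[i], c2)]
        if not new_left or not new_right or new_left == left:
            break  # degenerate split or converged
        left, right = new_left, new_right
        c1 = centroid([data[i] for i in left])
        c2 = centroid([data[i] for i in right])
    return left, right


def repeated_bisection(profiles: list[list[float]], k: int) -> list[list[int]]:
    """Bisect the largest cluster until k clusters exist; returns row indices."""
    data = [normalize(p) for p in profiles]
    clusters = [list(range(len(data)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        if len(clusters[-1]) < 2:
            break  # nothing left to split
        clusters.extend(bisect_cluster(clusters.pop(), data))
    return sorted(sorted(c) for c in clusters)


# Made-up profiles over three conditions: rows 0-2 rise, rows 3-5 fall.
profiles = [
    [1.0, 2.0, 3.0], [2.0, 4.0, 6.5], [1.1, 2.0, 2.9],
    [3.0, 2.0, 1.0], [6.0, 4.0, 2.2], [2.9, 2.1, 1.0],
]
print(repeated_bisection(profiles, 2))
```

Cosine similarity groups genes by the shape of their response rather than its magnitude, which is why the strongly and weakly rising rows above end up together.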
Although sexually reproducing plants share the same principles and most processes of meiosis, they also show distinct features. To address the similarities and differences of early meiosis transcriptomes from the dicot model system Arabidopsis and the monocot model system maize, we performed comparative analyses of RNA-seq data of isolated meiocytes, anthers and seedlings from both species, separately and via orthologous genes. Overall gene expression showed similarities, such as an increased number of reads mapping to unannotated features, and differences, such as the number of differentially expressed genes. We detected major similarities and differences in functional annotations of genes up-regulated in meiocytes, which point to conserved features as well as unique features. Transcriptional regulation seems to be quite similar in Arabidopsis and maize, and we identified known and novel transcription factors and cis-regulatory elements acting in early meiosis. Taken together, meiosis in Arabidopsis and maize is conserved in many ways, but displays key distinctions that lie in the patterns of gene expression.
Pinus taeda L. (loblolly pine) and Arabidopsis thaliana differ greatly in form, ecological niche, evolutionary history, and genome size. Arabidopsis is a small, herbaceous, annual dicotyledon, whereas pines are large, long-lived, coniferous forest trees. Such diverse plants might be expected to differ in a large number of functional genes. We have obtained and analyzed 59,797 expressed sequence tags (ESTs) from wood-forming tissues of loblolly pine and compared them to the gene sequences inferred from the complete sequence of the Arabidopsis genome. Approximately 50% of pine ESTs have no apparent homologs in Arabidopsis or any other angiosperm in public databases. When evaluated by using contigs containing long, high-quality sequences, we find a higher level of apparent homology between the inferred genes of these two species. For those contigs 1,100 bp or longer, ≈90% have an apparent Arabidopsis homolog (E value $< 10^{-10}$). Pines and Arabidopsis last shared a common ancestor ≈300 million years ago. Few genes would be expected to retain high sequence similarity over this period if they did not have essential functions. These observations suggest substantial conservation of gene sequence in seed plants.
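The length-filtered homology estimate amounts to a simple calculation. The sketch below, using a hypothetical `Contig` record and made-up example values, computes the fraction of contigs of at least 1,100 bp whose best Arabidopsis hit has an E value below $10^{-10}$; the study's actual contigs and search settings are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    name: str
    length_bp: int
    best_evalue: Optional[float]  # best Arabidopsis hit; None when there is no hit


def fraction_with_homolog(contigs: list[Contig], min_length: int = 1100,
                          max_evalue: float = 1e-10) -> Optional[float]:
    """Share of contigs >= min_length bp whose best hit has E value < max_evalue."""
    long_contigs = [c for c in contigs if c.length_bp >= min_length]
    if not long_contigs:
        return None  # no contig passes the length filter
    hits = [c for c in long_contigs
            if c.best_evalue is not None and c.best_evalue < max_evalue]
    return len(hits) / len(long_contigs)


toy = [
    Contig("c1", 1500, 1e-50),
    Contig("c2", 1200, 1e-5),   # hit too weak to count
    Contig("c3", 900, 1e-80),   # too short to be evaluated
    Contig("c4", 2000, None),   # no hit at all
    Contig("c5", 1100, 3e-12),
]
print(fraction_with_homolog(toy))  # 0.5 for these made-up values
```

Raising `min_length` restricts the estimate to longer, higher-quality contigs, which is the effect the abstract describes: short ESTs often lack enough coding sequence to produce a significant hit even when a homolog exists.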