Continued sequencing efforts coupled with advances in sequencing technology will lead to the completion of a vast number of small genomes. Whole-genome comparisons represent an important part of the ...analysis of any new genome sequence, as they can provide a better understanding of the biology and evolution of the source organism. Visualization of the results is important, as it allows information from a variety of sources to be integrated and interpreted. However, existing graphical comparison tools lack features needed for efficiently comparing a new genome to hundreds or thousands of existing sequences. Moreover, existing tools are limited in terms of the types of comparisons that can be performed, the extent to which the output can be customized, and the ease with which the entire process can be automated.
The CGView Comparison Tool (CCT) is a package for visually comparing bacterial, plasmid, chloroplast, or mitochondrial sequences of interest to existing genomes or sequence collections. The comparisons are conducted using BLAST, and the BLAST results are presented in the form of graphical maps that can also show sequence features, gene and protein names, COG (Clusters of Orthologous Groups of proteins) category assignments, and sequence composition characteristics. CCT can generate maps in a variety of sizes, including 400 Megapixel maps suitable for posters. Comparisons can be conducted within a particular species or genus, or all available genomes can be used. The entire map creation process, from downloading sequences to redrawing zoomed maps, can be completed easily using scripts included with the CCT. User-defined features or analysis results can be included on maps, and maps can be extensively customized. To simplify program setup, a CCT virtual machine that includes all dependencies preinstalled is available. Detailed tutorials illustrating the use of CCT are included with the CCT documentation.
CCT can be used to visually compare a reference sequence to thousands of existing genomes or sequence collections (next-generation sequencing reads for example) on a standard desktop computer. It provides analysis and visualization functionality not available in any existing circular genome visualization tool. By visually presenting sequence conservation information along with functional classifications and sequence composition characteristics, CCT can be a useful tool for identifying rapidly evolving or novel sequences, horizontally transferred sequences, or unusual functional properties in newly sequenced genomes. CCT is freely available for download at http://stothard.afns.ualberta.ca/downloads/CCT/.
CGView (Circular Genome Viewer) is a Java application and library for generating high-quality, zoomable maps of circular genomes. It converts XML or tab-delimited input into a graphical map (PNG, JPG ...or Scalable Vector Graphics format), complete with sequence features, labels, legends and footnotes. In addition to the default full view map, the program can generate a series of hyperlinked maps showing expanded views. The linked maps can be explored using any Web browser, allowing rapid genome browsing and facilitating data sharing. Availability: CGView (the standalone application, library or applet), sample input, sample maps and documentation can be obtained from http://wishart.biology.ualberta.ca/cgview/ Contact: david.wishart@ualberta.ca
Abstract The Escherichia coli species is comprised of several ‘ecotypes’ inhabiting a wide range of host and natural environmental niches. Recent studies have suggested that novel naturalized ...ecotypes have emerged across wastewater treatment plants and meat processing facilities. Phylogenetic and multilocus sequence typing analyses clustered naturalized wastewater and meat plant E. coli strains into two main monophyletic clusters corresponding to the ST635 and ST399 sequence types, with several serotypes identified by serotyping, potentially representing distinct lineages that have naturalized across wastewater treatment plants and meat processing facilities. This evidence, taken alongside ecotype prediction analyses that distinguished the naturalized strains from their host-associated counterparts, suggests these strains may collectively represent a novel ecotype that has recently emerged across food- and water-associated engineered environments. Interestingly, pan-genomic analyses revealed that the naturalized strains exhibited an abundance of biofilm formation, defense, and disinfection-related stress resistance genes, but lacked various virulence and colonization genes, indicating that their naturalization has come at the cost of fitness in the original host environment.
Genome wide association studies (GWAS) on residual feed intake (RFI) and its component traits including daily dry matter intake (DMI), average daily gain (ADG), and metabolic body weight (MWT) were ...conducted in a population of 7573 animals from multiple beef cattle breeds based on 7,853,211 imputed whole genome sequence variants. The GWAS results were used to elucidate genetic architectures of the feed efficiency related traits in beef cattle.
The DNA variant allele substitution effects approximated a bell-shaped distribution for all the traits while the distribution of additive genetic variances explained by single DNA variants followed a scaled inverse chi-squared distribution to a greater extent. With a threshold of P-value < 1.00E-05, 16, 72, 88, and 116 lead DNA variants on multiple chromosomes were significantly associated with RFI, DMI, ADG, and MWT, respectively. In addition, lead DNA variants with potentially large pleiotropic effects on DMI, ADG, and MWT were found on chromosomes 6, 14 and 20. On average, missense, 3'UTR, 5'UTR, and other regulatory region variants exhibited larger allele substitution effects in comparison to other functional classes. Intergenic and intron variants captured smaller proportions of additive genetic variance per DNA variant. Instead 3'UTR and synonymous variants explained a greater amount of genetic variance per DNA variant for all the traits examined while missense, 5'UTR and other regulatory region variants accounted for relatively more additive genetic variance per sequence variant for RFI and ADG, respectively. In total, 25 to 27 enriched cellular and molecular functions were identified with lipid metabolism and carbohydrate metabolism being the most significant for the feed efficiency traits.
RFI is controlled by many DNA variants with relatively small effects whereas DMI, ADG, and MWT are influenced by a few DNA variants with large effects and many DNA variants with small effects. Nucleotide polymorphisms in regulatory region and synonymous functional classes play a more important role per sequence variant in determining variation of the feed efficiency traits. The genetic architecture as revealed by the GWAS of the imputed 7,853,211 DNA variants will improve our understanding on the genetic control of feed efficiency traits in beef cattle.
Abstract
PHASTEST (PHAge Search Tool with Enhanced Sequence Translation) is the successor to the PHAST and PHASTER prophage finding web servers. PHASTEST is designed to support the rapid ...identification, annotation and visualization of prophage sequences within bacterial genomes and plasmids. PHASTEST also supports rapid annotation and interactive visualization of all other genes (protein coding regions, tRNA/tmRNA/rRNA sequences) in bacterial genomes. Given that bacterial genome sequencing has become so routine, the need for fast tools to comprehensively annotate bacterial genomes has become progressively more important. PHASTEST not only offers faster and more accurate prophage annotations than its predecessors, it also provides more complete whole genome annotations and much improved genome visualization capabilities. In standardized tests, we found that PHASTEST is 31% faster and 2–3% more accurate in prophage identification than PHASTER. Specifically, PHASTEST can process a typical bacterial genome in 3.2 min (raw sequence) or in 1.3 min when given a pre-annotated GenBank file. Improvements in PHASTEST’s ability to annotate bacterial genomes now make it a particularly powerful tool for whole genome annotation. In addition, PHASTEST now offers a much more modern and responsive visualization interface that allows users to generate, edit, annotate and interactively visualize (via zooming, rotating, dragging, panning, resetting), colourful, publication quality genome maps. PHASTEST continues to offer popular options such as an API for programmatic queries, a Docker image for local installations, support for multiple (metagenomic) queries and the ability to perform automated look-ups against thousands of previously PHAST-annotated bacterial genomes. PHASTEST is available online at https://phastest.ca.
Graphical Abstract
Graphical Abstract
Genome wide association studies (GWAS) were conducted on 7,853,211 imputed whole genome sequence variants in a population of 3354 to 3984 animals from multiple beef cattle breeds for five ...carcass merit traits including hot carcass weight (HCW), average backfat thickness (AFAT), rib eye area (REA), lean meat yield (LMY) and carcass marbling score (CMAR). Based on the GWAS results, genetic architectures of the carcass merit traits in beef cattle were elucidated.
The distributions of DNA variant allele substitution effects approximated a bell-shaped distribution for all the traits while the distribution of additive genetic variances explained by single DNA variants conformed to a scaled inverse chi-squared distribution to a greater extent. At a threshold of P-value < 10
, 51, 33, 46, 40, and 38 lead DNA variants on multiple chromosomes were significantly associated with HCW, AFAT, REA, LMY, and CMAR, respectively. In addition, lead DNA variants with potentially large pleiotropic effects on HCW, AFAT, REA, and LMY were found on chromosome 6. On average, missense variants, 3'UTR variants, 5'UTR variants, and other regulatory region variants exhibited larger allele substitution effects on the traits in comparison to other functional classes. The amounts of additive genetic variance explained per DNA variant were smaller for intergenic and intron variants on all the traits whereas synonymous variants, missense variants, 3'UTR variants, 5'UTR variants, downstream and upstream gene variants, and other regulatory region variants captured a greater amount of additive genetic variance per sequence variant for one or more carcass merit traits investigated. In total, 26 enriched cellular and molecular functions were identified with lipid metabolisms, small molecular biochemistry, and carbohydrate metabolism being the most significant for the carcass merit traits.
The GWAS results have shown that the carcass merit traits are controlled by a few DNA variants with large effects and many DNA variants with small effects. Nucleotide polymorphisms in regulatory, synonymous, and missense functional classes have relatively larger impacts per sequence variant on the variation of carcass merit traits. The genetic architecture as revealed by the GWAS will improve our understanding on genetic controls of carcass merit traits in beef cattle.
One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the ...genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variations present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle.
The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs), 24% of which were identified in both animals. Of the total SNPs found in Holstein, Black Angus, and in both animals, 81%, 81%, and 75% respectively are novel. In-depth annotations of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel) between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs). Ten randomly selected CNVs, five genic and five non-genic, were successfully validated using quantitative real-time PCR. The CNVs are enriched for immune system genes and include genes that may contribute to lactation capacity. The majority of the CNVs (69%) were detected as regions with higher abundance in the Holstein bull.
Substantial genetic differences exist between the Black Angus and Holstein animals sequenced in this work and the Hereford reference sequence, and some of this variation is predicted to affect evolutionarily conserved amino acids or gene copy number. The deeply annotated SNPs and CNVs identified in this resequencing study can serve as useful genetic tools, and as candidates in searches for phenotype-altering DNA differences.
We previously demonstrated the existence of naturalized strains of E. coli in wastewater and herein perform an in-depth comparative whole genome analysis of these strains (n = 17). Fourteen of the ...Canadian E. coli strains, isolated from geographically separated wastewater treatment plants, were virtually identical at the core genome and were ≥96% similar at the whole genome level, suggesting clonal-relatedness among these isolates. Remarkably, these strains were shown to be extremely similar to the genome of an E. coli isolated from wastewater in Switzerland, suggesting a global distribution of these strains. The genomes of three other Canadian wastewater strains were more diverse but very similar to the genomes of E. coli isolates collected from U.S. wastewater samples. Based on maximum likelihood phylogenetic analysis, wastewater strains from Canada, the U.S. and Switzerland formed a clade separate from other known enteric phylogroups (i.e., A, B1, B2, D, E) and the cryptic clades. All Canadian, Swiss and U.S. wastewater strains possessed a common SNP biomarker pattern across their genomes, and a sub-population (i.e., 14 Canadian and 1 Swiss strain) also possessed a previously identified wastewater-specific marker known as uspC-IS30-flhDC element. Biochemical heat mapping of 518 categories of genes recapitulated phylogeny, with wastewater strains phenotypically clustering separately from enteric and cryptic clades. Wastewater strains were enriched for stress-response genes (i.e., nutrient acquisition/deprivation, DNA repair, oxidative stress, and UV resistance) - elements reflective of their environmental survival challenges. Wastewater strains were shown to carry a plethora of known antibiotic resistance (AR) genes, the patterns of which were remarkably similar among all Canadian, U.S. and Swiss wastewater strains. Virulence gene composition was also similar among all the wastewater strains, with an abundant representation of virulence genes commonly associated with urinary pathogenic E. coli (UPEC) as well as enterohemorrhagic (EHEC) E. coli. The remarkable degree of similarity between all wastewater strains from Canada, Switzerland and the U.S. suggests the evolution and global-dissemination of water treatment-resistant clone of E. coli. These finding, along with others, raise some important concerns about the potential for emergence of E. coli pathotypes resistant to water-treatment.
Display omitted
•Naturalized wastewater strains of E. coli share a common genetic background.•Naturalized wastewater strains are globally-distributed (Canada, U.S., and Switzerland).•The genomes of wastewater strains of E. coli are enriched in stress-response elements.•Wastewater strains share virulence and antibiotic resistance genes with enteric pathotypes.•The data raises concerns about the emergence of water treatment-resistant pathotypes of E. coli.
NGS-SNP is a collection of command-line scripts for providing rich annotations for SNPs identified by the sequencing of whole genomes from any organism with reference sequences in Ensembl. Included ...among the annotations, several of which are not available from any existing SNP annotation tools, are the results of detailed comparisons with orthologous sequences. These comparisons can, for example, identify SNPs that affect conserved residues, or alter residues or genes linked to phenotypes in another species.
Availability: NGS-SNP is available both as a set of scripts and as a virtual machine. The virtual machine consists of a Linux operating system with all the NGS-SNP dependencies pre-installed. The source code and virtual machine are freely available for download at http://stothard.afns.ualberta.ca/downloads/NGS-SNP/.
Contact:
stothard@ualberta.ca
Supplementary information:
Supplementary data are available at Bioinformatics online.