The Salmonella enterica serovar Newport red onion outbreak of 2020 was the largest foodborne outbreak of Salmonella in over a decade. The epidemiological investigation suggested two farms as the likely source of contamination. However, single nucleotide polymorphism (SNP) analysis of the whole genome sequencing data showed that none of the Salmonella isolates collected from the farm regions were linked to the clinical isolates, preventing the use of phylogenetics in source identification. Here, we explored an alternative method for analyzing the whole genome sequencing data, driven by the hypothesis that if the outbreak strain had come from the farm regions, then the clinical isolates would disproportionately contain plasmids found in isolates from the farm regions due to horizontal transfer.
SNP analysis confirmed that the clinical isolates formed a single, nearly clonal clade with evidence for ancestry in California going back a decade. The clinical clade had a large core genome (4,399 genes) and a large and sparsely distributed accessory genome (2,577 genes, at least 64% on plasmids). At least 20 plasmid types occurred in the clinical clade, more than were found in the literature for Salmonella Newport. A small number of plasmids, 14 from 13 clinical isolates and 17 from 8 farm isolates, were found to be highly similar (>95% identical), indicating they might be related by horizontal transfer. Phylogenetic analysis was unable to determine the geographic origin, isolation source, or time of transfer of the plasmids, likely due to their promiscuous and transient nature. However, our resampling analysis suggested that observing a similar number and combination of highly similar plasmids in random samples of environmental Salmonella enterica within the NCBI Pathogen Detection database was unlikely, supporting a connection between the outbreak strain and the farms implicated by the epidemiological investigation.
Horizontally transferred plasmids provided evidence for a connection between clinical isolates and the farms implicated as the source of the outbreak. Our case study suggests that such analyses might add a new dimension to source tracking investigations, but highlights the need for detailed and accurate metadata, more extensive environmental sampling, and a better understanding of plasmid molecular evolution.
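The resampling idea described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the 0/1 encoding of isolates, and all numbers in the toy example are assumptions made purely for illustration. The sketch repeatedly draws random samples from a background pool of environmental isolates and asks how often a sample contains at least as many isolates with a highly similar plasmid as were observed.

```python
import random


def resampling_p_value(pool_flags, sample_size, observed, n_iter=10_000, seed=0):
    """Estimate how often a random sample matches or exceeds the observed count.

    pool_flags:  list of 0/1 values, one per isolate in a hypothetical
                 background pool (1 = carries a plasmid highly similar to a
                 clinical plasmid). This encoding is an assumption.
    sample_size: number of isolates drawn per resample (without replacement).
    observed:    number of matching isolates seen in the real sample.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(pool_flags, sample_size)
        if sum(sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_iter + 1)


# Toy background with made-up numbers: 1,000 isolates, 2% carry a match.
pool = [1] * 20 + [0] * 980
p = resampling_p_value(pool, sample_size=8, observed=4)
print(f"empirical p-value: {p:.4f}")
```

A small empirical p-value here would mean that the observed number of matches is rarely reproduced by chance in the background pool. A fuller analysis would also need to account for the combination of plasmid types, which this sketch ignores.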
Whole genome sequencing of cultured pathogens is the state-of-the-art public health response for the bioinformatic source tracking of illness outbreaks. Quasimetagenomics can substantially reduce the amount of culturing needed before a high quality genome can be recovered. Highly accurate short read data is analyzed for single nucleotide polymorphisms and multi-locus sequence types to differentiate strains but cannot span many genomic repeats, resulting in highly fragmented assemblies. Long reads can span repeats, resulting in much more contiguous assemblies, but have lower accuracy than short reads.
We evaluated the accuracy of Listeria monocytogenes assemblies from enrichments (quasimetagenomes) of naturally-contaminated ice cream using long read (Oxford Nanopore) and short read (Illumina) sequencing data. Accuracy of ten assembly approaches, over a range of sequencing depths, was evaluated by comparing sequence similarity of genes in assemblies to a complete reference genome. Long read assemblies reconstructed a circularized genome as well as a 71 kbp plasmid after 24 h of enrichment; however, high error rates prevented high fidelity gene assembly, even at 150X depth of coverage. Short read assemblies accurately reconstructed the core genes after 28 h of enrichment but produced highly fragmented genomes. Hybrid approaches demonstrated promising results but had biases based upon the initial assembly strategy. Short read assemblies scaffolded with long reads accurately assembled the core genes after just 24 h of enrichment, but were highly fragmented. Long read assemblies polished with short reads reconstructed a circularized genome and plasmid and assembled all the genes after 24 h enrichment but with less fidelity for the core genes than the short read assemblies.
The integration of long and short read sequencing of quasimetagenomes expedited the reconstruction of a high quality pathogen genome compared to either platform alone. A new and more complete level of information about genome structure, gene order and mobile elements can be added to the public health response by incorporating long read analyses with the standard short read WGS outbreak response.
Multidrug-resistant (MDR) Shigella infections have been identified globally among men who have sex with men (MSM). The highly drug-resistant phenotype often confounds initial antimicrobial therapy, placing patients at risk for adverse outcomes, the development of more drug-resistant strains, and additional treatment failures. New macrolide-resistant Shigella strains complicate treatment further, as azithromycin is a next-in-line antibiotic for MDR strains, and this antibiotic-strain combination is confounded by gaps in the validated clinical breakpoints that clinical laboratories need to interpret macrolide resistance in Shigella.
We present the first high-resolution genomic analyses of 2,097 U.S. Shigella isolates, including those from MDR outbreaks. A sentinel shigellosis case in an MSM patient revealed a strain carrying 12 plasmids, of which two carried known resistance genes, the pKSR100-related plasmid pMHMC-004 and spA-related plasmid pMHMC-012. Genomic-epidemiologic analyses of isolates revealed high carriage rates of pMHMC-004 predominantly in U.S. isolates from men and not in other demographic groups. Isolates genetically related to the sentinel case further harbored elevated numbers of unique replicons, showing the receptivity of this Shigella lineage to plasmid acquisition. Findings from integrated genomic-epidemiologic analyses were leveraged to direct targeted clinical actions to improve rapid diagnosis and patient care and for public health efforts to further reduce spread.
Multidrug-resistant Shigella isolates with resistance to macrolides are an emerging public health threat. We define a plasmid/pathogen complex behind infections seen in the United States and globally in vulnerable patient populations and identify multiple outbreaks in the United States and evidence of intercontinental transmission. Using new tools and sequence information, we experimentally identify the drivers of antibiotic resistance that complicate patient treatment to facilitate improvements to clinical microbiologic testing for their detection. We illustrate the use of these methods to support multiagency efforts to combat multidrug-resistant Shigella using publicly available tools, existing genomic data, and resources in clinical microbiology and public health laboratories to inform credible actions to reduce spread.
Carbapenems, one of the important last-line antibiotics for the treatment of gram-negative infections, are becoming ineffective for treating Acinetobacter baumannii infections. Studies have identified multiple genes (and mechanisms) responsible for carbapenem resistance. In some A. baumannii strains, the presence/absence of putative resistance genes is not consistent with their resistance phenotype, indicating the genomic factors underlying carbapenem resistance in A. baumannii are not fully understood. Here, we describe a large-scale whole-genome genotype-phenotype association study with 349 A. baumannii isolates that extends beyond the presence/absence of individual antimicrobial resistance genes and includes the genomic positions and pairwise interactions of genes. Ten known resistance genes exhibited statistically significant associations with resistance to imipenem, a type of carbapenem: blaOXA-23, qacEdelta1, sul1, mphE, msrE, ant(3”)-II, aacC1, yafP, aphA6, and xerD. A review of the strains without any of these 10 genes uncovered a clade of isolates with diverse imipenem resistance phenotypes. Finer resolution evaluation of this clade revealed the presence of a 38.6 kbp conserved chromosomal region found exclusively in imipenem-susceptible isolates. This region appears to host several HTH-type DNA binding transcriptional regulators and transporter genes. Imipenem-susceptible isolates from this clade also carried two mutually exclusive plasmids that contain genes previously known to be specific to imipenem-susceptible isolates. Our analysis demonstrates the utility of using whole genomes for genotype-phenotype correlations in the context of antibiotic resistance and provides several new hypotheses for future research.
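As a rough illustration of what a presence/absence association test can look like, the sketch below applies a two-sided Fisher's exact test to a 2x2 table of gene carriage versus imipenem phenotype. The abstract does not state which statistical test was used, so the choice of Fisher's exact test, the function name, and the counts in the example are all assumptions for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: resistant isolates with the gene    b: susceptible isolates with the gene
    c: resistant isolates without it       d: susceptible isolates without it
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that x of the gene carriers are resistant.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Made-up counts: 8 of 10 carriers are resistant, 1 of 6 non-carriers is.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

When many genes are tested across hundreds of isolates, the resulting p-values would normally need a multiple-testing correction, and population structure (closely related isolates sharing both genes and phenotype) can inflate such associations.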
Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are lack of consensus or comparisons between feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus-host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof-of-concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access of large quantities of data and is publicly accessible. Many projects of database or index federation fail to provide easier alternatives to access or query information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.
Abstract
Motivation
Scientists seeking to understand the genomic basis of bacterial phenotypes, such as antibiotic resistance, today have access to an unprecedented number of complete and nearly complete genomes. Making sense of these data requires computational tools able to perform multiple-genome comparisons efficiently, yet currently available tools cannot scale beyond several tens of genomes.
Results
We describe PRAWNS, an efficient and scalable tool for multiple-genome analysis. PRAWNS defines a concise set of genomic features (metablocks), as well as pairwise relationships between them, which can be used as a basis for large-scale genotype–phenotype association studies. We demonstrate the effectiveness of PRAWNS by identifying genomic regions associated with antibiotic resistance in Acinetobacter baumannii.
Availability and implementation
PRAWNS is implemented in C++ and Python3, licensed under the GPLv3 license, and freely downloadable from GitHub (https://github.com/KiranJavkar/PRAWNS.git).
Supplementary information
Supplementary data are available at Bioinformatics online.
The Mid-Atlantic Microbiome Meet-up (M3) organization brings together academic, government, and industry groups to share ideas and develop best practices for microbiome research. In January of 2018, M3 held its fourth meeting, which focused on recent advances in biodefense, specifically those relating to infectious disease, and the use of metagenomic methods for pathogen detection. Presentations highlighted the utility of next-generation sequencing technologies for identifying and tracking microbial community members across space and time. However, they also stressed the current limitations of genomic approaches for biodefense, including insufficient sensitivity to detect low-abundance pathogens and the inability to quantify viable organisms. Participants discussed ways in which the community can improve software usability and shared new computational tools for metagenomic processing, assembly, annotation, and visualization. Looking to the future, they identified the need for better bioinformatics toolkits for longitudinal analyses, improved sample processing approaches for characterizing viruses and fungi, and more consistent maintenance of database resources. Finally, they addressed the necessity of improving data standards to incentivize data sharing. Here, we summarize the presentations and discussions from the meeting, identifying the areas where microbiome analyses have improved our ability to detect and manage biological threats and infectious disease, as well as gaps of knowledge in the field that require future funding and focus.
Microbes strongly impact human health and the ecosystem of which they are a part. Rapid improvements and decreasing costs in sequencing technologies have revolutionized the field of genomics and enabled important insights into microbial genome biology and microbiomes. However, new tools and approaches are needed to facilitate the efficient analysis of large sets of genomes and to better associate genomic features with phenotypic characteristics. Here, we built and utilized several tools for large-scale whole-genome analysis for different microbial characteristics, such as antimicrobial resistance and pathogenicity, that are important for human health. Chapters 2 and 3 demonstrate the needs and challenges of population genomics in associating antimicrobial resistance with genomic features. Our results highlight important limitations of reference database-driven analysis for genotype-phenotype association studies and demonstrate the utility of whole-genome population genomics in uncovering novel genomic factors associated with antimicrobial resistance. Chapter 4 describes PRAWNS, a fast and scalable bioinformatics tool that generates compact pan-genomic features. Existing approaches are unable to meet the needs of large-scale whole-genome analyses, either due to scalability limitations or the inability of the genomic features generated to support a thorough whole-genome assessment. We demonstrate that PRAWNS scales to thousands of genomes and provides a concise collection of genomic features which support the downstream analyses. In Chapter 5, we assess whether the combination of long and short-read sequencing can expedite the accurate reconstruction of a pathogen genome from a microbial community. We describe the challenges for pathogen detection in current foodborne illness outbreak monitoring.
Our results show that the recovery of a pathogen genome can be accelerated using a combination of long and short-read sequencing after limited culturing of the microbial community. We evaluated several popular genome assembly approaches and identified areas for improvement. In Chapter 6, we describe SIMILE, a fast and scalable bioinformatics tool that enables the detection of genomic regions shared between several assembled metagenomes. In metagenomics, microbial communities are sequenced directly without culturing. Although metagenomics has furthered our understanding of the microbiome, comparing metagenomic samples is extremely difficult. We describe the need and challenges in comparing several metagenomic samples and present an approach that facilitates large-scale metagenomic comparisons.
Many mobile applications such as Strava or MapMyRide allow cyclists to collect detailed GPS traces of their trips for health or route sharing purposes. However, cycling GPS traces also have a lot of potential from an urban planning perspective. In this paper, we focus on two important issues to characterize urban cyclist behavior: trip purpose and route choice. Cycling trip purpose has been typically analyzed using survey data. Here, we present a method to automatically infer the purpose of a cycling trip using cyclists' personal data, GPS traces and a variety of built-in and social environment features extracted from open datasets characterizing the streets cycled. We evaluate the proposed method using GPS traces from over 7,000 cycling routes in the city of Philadelphia and report F1 scores of up to 86% when four trip purposes are considered. We also present a novel statistical method to identify the role that certain variables characterizing the built-in and social environment play in the selection of a specific cycling route. Our results show that cyclists in Philadelphia tend to favor routes with green areas, safety and centrality.
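For readers unfamiliar with the F1 metric reported above, the following minimal sketch shows one way to compute it for a multi-class trip-purpose classifier. The abstract does not say how F1 was averaged across the four purposes, so macro-averaging is an assumption here, and the purpose labels and predictions are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical labels for six trips (not data from the study).
true_purpose = ["commute", "commute", "leisure", "leisure", "errand", "exercise"]
pred_purpose = ["commute", "leisure", "leisure", "leisure", "errand", "exercise"]
score = macro_f1(true_purpose, pred_purpose)
print(f"macro F1: {score:.3f}")
```

Macro-averaging weights each trip purpose equally regardless of how many trips fall into it; a weighted or micro-averaged F1 would give different values when the classes are imbalanced.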
Advances in biotechnology now allow users to obtain their genetic information, including ancestry and predisposition to various diseases and health issues, with relative ease. With these new commercial services come a host of privacy concerns with respect to data sharing and access. User data is being sold to third parties, including pharmaceutical and biotechnology companies, and may be accessed by law enforcement in accordance with proper legal procedures. Moreover, many users of these services go on to deposit the data they obtain into online, public repositories that are fully accessible to anyone with an internet connection. The full extent of the risks they face may not be apparent to users. This paper reports on a semistructured interview study (n=24) examining user concerns regarding these tests, what information they believe they are revealing, and what they think companies are doing with their data. We find that users are concerned with privacy, and understand at a basic level the nature of the data they are revealing. However, their privacy concerns are often insufficient to deter them from taking such a test, and many have difficulty grasping some of the implications of sharing their genetic information with commercial entities.