Abstract
Summary
Genome-wide association studies (GWAS) in microbes face different challenges from GWAS in eukaryotes. These challenges have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility in the input data used, and adds new methods to interpret the association results.
Availability and implementation
pyseer is written in Python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://pyseer.readthedocs.io.
Supplementary information
Supplementary data are available at Bioinformatics online.
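pyseer's own association models (fixed- and mixed-effects regression with population-structure correction) are not reproduced here. As a minimal sketch, assuming a binary phenotype and a presence/absence variant, the code below shows the simplest unadjusted test of this kind: a Pearson chi-squared test on a 2x2 table, of the sort sometimes used as a cheap prefilter before regression. The function name `chi2_2x2` is hypothetical and is not part of pyseer's API.

```python
import math


def chi2_2x2(variant, phenotype):
    """Hypothetical helper: Pearson chi-squared test (1 df, no continuity
    correction) of a 0/1 variant against a 0/1 phenotype.

    Unadjusted: it ignores population structure, which is exactly what
    microbial GWAS tools must correct for.
    """
    if len(variant) != len(phenotype):
        raise ValueError("variant and phenotype must have the same length")
    # table[v][p] counts samples with variant state v and phenotype p
    table = [[0, 0], [0, 0]]
    for v, p in zip(variant, phenotype):
        table[v][p] += 1
    n = len(variant)
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    if 0 in row or 0 in col:
        return 0.0, 1.0  # no variation in variant or phenotype: untestable
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `chi2_2x2([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])` gives a statistic of 2.0. A real analysis would pass surviving variants on to a model that includes population-structure covariates or random effects.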
Abstract
Population-level comparisons of prokaryotic genomes must take into account the substantial differences in gene content resulting from horizontal gene transfer, gene duplication and gene loss. However, the automated annotation of prokaryotic genomes is imperfect, and errors due to fragmented assemblies, contamination, diverse gene families and mis-assemblies accumulate over the population, leading to profound consequences when analysing the set of all genes found in a species. Here, we introduce Panaroo, a graph-based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies. Panaroo is available at https://github.com/gtonkinhill/panaroo.
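Panaroo's actual pipeline is considerably richer than anything shown here. As a toy illustration only, the sketch below shows the general idea of a graph-based pangenome: gene clusters become nodes, genes adjacent on a contig are joined by edges, and node support counts the genomes carrying each cluster. The pruning rule (drop rarely seen clusters sitting at graph tips, a pattern consistent with contig-end contamination) and all function names are hypothetical assumptions, not Panaroo's algorithm.

```python
from collections import defaultdict


def build_pangenome_graph(genomes):
    """genomes: dict genome_id -> list of contigs, each an ordered list of
    gene-cluster IDs. Returns (support, adjacency)."""
    support = defaultdict(set)
    adjacency = defaultdict(set)
    for genome_id, contigs in genomes.items():
        for contig in contigs:
            for gene in contig:
                support[gene].add(genome_id)
            # connect consecutive genes on the same contig
            for left, right in zip(contig, contig[1:]):
                adjacency[left].add(right)
                adjacency[right].add(left)
    return {g: len(ids) for g, ids in support.items()}, adjacency


def prune_low_support_tips(support, adjacency, min_support):
    """Hypothetical cleaning rule: repeatedly remove clusters found in fewer
    than min_support genomes that sit at a tip (degree <= 1)."""
    support = dict(support)
    adjacency = {g: set(nbrs) for g, nbrs in adjacency.items()}
    for gene in support:
        adjacency.setdefault(gene, set())
    changed = True
    while changed:
        changed = False
        for gene in list(support):
            if support[gene] < min_support and len(adjacency[gene]) <= 1:
                for nbr in adjacency.pop(gene):
                    adjacency[nbr].discard(gene)
                del support[gene]
                changed = True
    return support, adjacency
```

Removing a tip can expose a new tip, which is why the pruning loop repeats until nothing changes.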
We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet process mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10–100 times larger than the existing model-based methods, which we demonstrate by analyzing an alignment of over 110 000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximize the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and subclades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while being significantly faster. The method is made freely available under an open source MIT licence as an easy to use R package at https://github.com/gtonkinhill/fastbaps.
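To make "DPM marginal likelihood" concrete, the sketch below scores a candidate partition of aligned sequences. It uses a Chinese restaurant process prior with concentration `alpha` on the partition, plus an independent Dirichlet-multinomial likelihood (symmetric prior `beta`) per cluster and per locus. This is a generic textbook formulation. The fast approximate search that fastbaps performs, its hyperparameter choices and its R interface are not reproduced. Function names and default values are hypothetical.

```python
import math
from collections import Counter


def log_dm_marginal(counts, n_states, beta):
    """Log marginal likelihood of allele counts at one locus under a
    symmetric Dirichlet(beta) prior over n_states alleles.
    Unobserved alleles contribute lgamma(beta) - lgamma(beta) = 0."""
    n = sum(counts.values())
    value = math.lgamma(n_states * beta) - math.lgamma(n_states * beta + n)
    for c in counts.values():
        value += math.lgamma(beta + c) - math.lgamma(beta)
    return value


def log_partition_score(seqs, labels, alpha=1.0, beta=1.0, n_states=4):
    """Unnormalised log posterior of a clustering of equal-length sequences:
    CRP(alpha) prior on the partition plus Dirichlet-multinomial likelihoods."""
    clusters = {}
    for seq, lab in zip(seqs, labels):
        clusters.setdefault(lab, []).append(seq)
    n = len(seqs)
    # CRP: alpha^K * Gamma(alpha) / Gamma(alpha + n) * prod_c Gamma(n_c)
    score = len(clusters) * math.log(alpha) + math.lgamma(alpha) - math.lgamma(alpha + n)
    score += sum(math.lgamma(len(members)) for members in clusters.values())
    for members in clusters.values():
        for locus in range(len(seqs[0])):
            counts = Counter(s[locus] for s in members)
            score += log_dm_marginal(counts, n_states, beta)
    return score
```

With three copies of `"AAAA"` and three of `"TTTT"`, the two-cluster labelling scores higher than both lumping everything together and putting every sequence in its own cluster. That trade-off between fit and model complexity is what a DPM-based search optimises.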
The DNA damage response (DDR) is an organized network of multiple interwoven components evolved to repair damaged DNA and maintain genome fidelity. Conceptually, the DDR includes damage sensors, transducer kinases, and effectors to maintain genomic stability and accurate transmission of genetic information. We have recently gained a substantially improved molecular and mechanistic understanding of how DDR components are interconnected with inflammatory and immune responses to stress. The DDR shapes both innate and adaptive immune pathways: (i) in the context of innate immunity, DDR components mainly enhance cytosolic DNA sensing and its downstream STimulator of INterferon Genes (STING)-dependent signaling; (ii) in the context of adaptive immunity, the DDR is needed for the assembly and diversification of antigen receptor genes, which is required for T and B lymphocyte development. Imbalances between DNA damage and repair impair tissue homeostasis and lead to replication and transcription stress, mutation accumulation, and even cell death. These impacts from DDR defects can then drive tumorigenesis, secretion of inflammatory cytokines, and aberrant immune responses. Yet, DDR deficiency or inhibition can also directly enhance innate immune responses. Furthermore, DDR defects plus the higher mutation load in tumor cells synergistically produce primarily tumor-specific neoantigens, which are powerfully targeted in cancer immunotherapy by employing immune checkpoint inhibitors to amplify immune responses. Thus, elucidating DDR-immune response interplay may provide critical connections for harnessing immunomodulatory effects plus targeted inhibition to improve efficacy of radiation and chemotherapies, of immune checkpoint blockade, and of combined therapeutic strategies.
Abstract
Mechanistic studies in DNA repair have focused on roles of multi-protein DNA complexes, so how long non-coding RNAs (lncRNAs) regulate DNA repair is less well understood. Yet, lncRNA LINP1 is over-expressed in multiple cancers and confers resistance to ionizing radiation and chemotherapeutic drugs. Here, we unveil structural and mechanistic insights into LINP1’s ability to facilitate non-homologous end joining (NHEJ). We characterized LINP1 structure and flexibility and analyzed interactions with the NHEJ factor Ku70/Ku80 (Ku) and Ku complexes that direct NHEJ. LINP1 self-assembles into phase-separated condensates via RNA–RNA interactions that reorganize to form filamentous Ku-containing aggregates. Structured motifs in LINP1 bind Ku, promoting Ku multimerization and stabilization of the initial synaptic event for NHEJ. Significantly, LINP1 acts as an effective proxy for PAXX. Collective results reveal how lncRNA effectively replaces a DNA repair protein for efficient NHEJ with implications for development of resistance to cancer therapy.
The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (Population Partitioning Using Nucleotide K-mers), software implementing scalable and expandable annotation- and alignment-free methods for population analysis and clustering. Variable-length k-mer comparisons are used to distinguish isolates' divergence in shared sequence and gene content, which we demonstrate to be accurate over multiple orders of magnitude using data from both simulations and genomic collections representing 10 taxonomically widespread species. Connections between closely related isolates of the same strain are robustly identified, despite interspecies variation in the pairwise distance distributions that reflects species' diverse evolutionary patterns. PopPUNK can process 10³–10⁵ genomes in a single batch, with minimal memory use and runtimes up to 200-fold faster than existing model-based methods. Clusters of strains remain consistent as new batches of genomes are added, without needing to reanalyze all genomes de novo. This facilitates real-time surveillance with consistent cluster naming between studies and allows outbreak detection using hundreds of genomes in minutes. Interactive visualization and online publication are streamlined through the automatic output of results to multiple platforms. PopPUNK has been designed as a flexible platform that addresses important issues with currently used whole-genome clustering and typing methods, and has potential uses across bacterial genetics and public health research.
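The idea of separating "divergence in shared sequence and gene content" with variable-length k-mers can be sketched as a regression. The sketch assumes that the proportion of matching k-mers between two genomes behaves like p(k) = (1 − a)(1 − π)^k, where π is a core (sequence) distance and a is an accessory (gene content) distance. Under that assumption, log p(k) is linear in k: the slope gives π and the intercept gives a. Here the match proportion is taken as the exact Jaccard index of k-mer sets. Real tools estimate it from sketches, and PopPUNK's exact estimator and any corrections it applies are not reproduced. All function names are hypothetical.

```python
import math


def kmer_set(seq, k):
    """All distinct substrings of length k (no reverse complements)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def match_proportion(seq_a, seq_b, k):
    """Jaccard index of the k-mer sets of two sequences (exact, unsketched)."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fit_core_accessory(ks, proportions):
    """Least-squares fit of log p(k) = log(1 - a) + k * log(1 - pi).
    Returns (pi, a). All proportions must be > 0 for the log to exist."""
    ys = [math.log(p) for p in proportions]
    mean_k = sum(ks) / len(ks)
    mean_y = sum(ys) / len(ys)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
    sxx = sum((k - mean_k) ** 2 for k in ks)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_k
    core = 1.0 - math.exp(slope)           # pi, from the per-k decay
    accessory = 1.0 - math.exp(intercept)  # a, from the k = 0 extrapolation
    return core, accessory
```

Fitting across a range of k is what lets the two distances be told apart. A SNP-like difference costs more matches as k grows, whereas missing genes remove a roughly constant fraction of k-mers at every length.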
Bacterial genomes vary extensively in terms of both gene content and gene sequence. This plasticity hampers the use of traditional SNP-based methods for identifying all genetic associations with phenotypic variation. Here we introduce a computationally scalable and widely applicable statistical method (SEER) for the identification of sequence elements that are significantly enriched in a phenotype of interest. SEER is applicable to tens of thousands of genomes by counting variable-length k-mers using a distributed string-mining algorithm. Robust options are provided for association analysis that also correct for the clonal population structure of bacteria. Using large collections of genomes of the major human pathogens Streptococcus pneumoniae and Streptococcus pyogenes, SEER identifies relevant previously characterized resistance determinants for several antibiotics and discovers potential novel factors related to the invasiveness of S. pyogenes. We thus demonstrate that our method can answer important biologically and medically relevant questions.
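To show what "counting variable-length k-mers" produces for an association test, the sketch below maps every substring within a length range to the set of genomes containing it, which is that k-mer's presence/absence pattern. It then keeps only k-mers within a genome-frequency window. This naive enumeration is quadratic in memory and ignores reverse complements. It is not SEER's distributed string-mining algorithm, and `count_kmers` and its parameters are hypothetical.

```python
from collections import defaultdict


def count_kmers(genomes, min_len, max_len, min_freq=0.0, max_freq=1.0):
    """genomes: dict genome_id -> sequence string.

    Returns dict k-mer -> frozenset of genome IDs containing it, for all
    k-mers of length min_len..max_len whose genome frequency lies in
    [min_freq, max_freq]. Naive, exhaustive enumeration."""
    presence = defaultdict(set)
    for genome_id, seq in genomes.items():
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                presence[seq[i:i + k]].add(genome_id)
    n = len(genomes)
    return {
        kmer: frozenset(ids)
        for kmer, ids in presence.items()
        if min_freq <= len(ids) / n <= max_freq
    }
```

Each surviving presence pattern can then be tested against the phenotype. Filtering out k-mers present in almost all or almost no genomes removes patterns with too little variation to test.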
In less than a decade, population genomics of microbes has progressed from the effort of sequencing dozens of strains to thousands, or even tens of thousands of strains in a single study. There are now hundreds of thousands of genomes available even for a single bacterial species, and the number of genomes is expected to continue to increase at an accelerated pace given the advances in sequencing technology and widespread genomic surveillance initiatives. This explosion of data calls for innovative methods to enable rapid exploration of the structure of a population based on different data modalities, such as multiple sequence alignments, assemblies and estimates of gene content across different genomes. Here, we present Mandrake, an efficient implementation of a dimensional reduction method tailored for the needs of large-scale population genomics. Mandrake is capable of visualizing population structure from millions of whole genomes, and we illustrate its usefulness with several datasets representing major pathogens. Our method is freely available both as an analysis pipeline (https://github.com/johnlees/mandrake) and as a browser-based interactive application (https://gtonkinhill.github.io/mandrake-web/).
This article is part of a discussion meeting issue ‘Genomic population structures of microbial pathogens’.
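Neighbour-embedding methods for dimensional reduction usually begin by turning pairwise distances into a sparse graph of affinities between near neighbours, and only then optimise a low-dimensional layout. The sketch below shows that first step only, for aligned sequences. It uses Hamming distance, keeps each sample's k nearest neighbours, weights edges with a Gaussian kernel, and symmetrises and normalises the result. The kernel, the `bandwidth` default and the function names are assumptions for illustration, not Mandrake's exact weighting scheme, and the embedding step itself is omitted.

```python
import math


def hamming(a, b):
    """Proportion of mismatching sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_affinities(seqs, k, bandwidth=0.25):
    """Sparse, symmetric, normalised affinity graph over aligned sequences.

    Each sample contributes Gaussian-weighted edges to its k nearest
    neighbours; edges found from both ends have their weights summed."""
    n = len(seqs)
    weights = {}
    for i in range(n):
        dists = sorted((hamming(seqs[i], seqs[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            w = math.exp(-(d / bandwidth) ** 2)
            edge = (min(i, j), max(i, j))
            weights[edge] = weights.get(edge, 0.0) + w
    total = sum(weights.values())
    return {edge: w / total for edge, w in weights.items()}
```

Keeping only k neighbours per sample makes the graph size grow linearly with the number of genomes rather than quadratically, which matters at the scale of millions of samples.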
Bacterial genome data are accumulating at an unprecedented speed due to the routine use of sequencing in clinical diagnoses, public health surveillance, and population genetics studies. Genealogical reconstruction is fundamental to many of these uses; however, inferring genealogy from large-scale genome data sets quickly, accurately, and flexibly is still a challenge. Here, we extend an alignment- and annotation-free method, PopPUNK, to increase its flexibility and interpretability across data sets. Our method, iterative-PopPUNK, rapidly produces multiple consistent cluster assignments across a range of sequence identities. By constructing a partially resolved genealogical tree with respect to these clusters, users can select the resolution most appropriate for their needs. We show the accuracy of iterative-PopPUNK's clusters at all levels of similarity, and of its genealogical inference, on simulated data, and obtain phylogenetically concordant results in real data sets from seven bacterial species. Using two example genome sets, each from a different bacterial species, we show that iterative-PopPUNK can achieve cluster resolutions ranging from phylogroup down to sequence type (ST). The iterative-PopPUNK algorithm is implemented in the "PopPUNK_iterate" program, available as part of the PopPUNK package.
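One way to see why cluster assignments across a range of sequence identities can stay consistent: if clusters are the connected components of a graph that links every pair closer than a threshold, then raising the threshold can only merge clusters, never split them. The resulting clusters are therefore nested. The sketch below illustrates this with a union-find over a small distance table. It is a generic single-linkage illustration under that assumption, not the iterative-PopPUNK algorithm, and the function names are hypothetical.

```python
def clusters_at_threshold(n, distances, threshold):
    """Connected components of the graph on samples 0..n-1 that links
    every pair (i, j) with distances[(i, j)] <= threshold."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), d in distances.items():
        if d <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


def multilevel_clusters(n, distances, thresholds):
    """Cluster assignments at each threshold, from finest to coarsest."""
    return {t: clusters_at_threshold(n, distances, t) for t in sorted(thresholds)}
```

With two tight pairs that are distant from each other, a small threshold yields singletons, an intermediate one yields the two pairs, and a large one yields a single cluster. Each level refines the next.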
Successful infection by mucosal pathogens requires overcoming the mucus barrier. To better understand this key step, we performed a survey of the interactions between human respiratory mucus and the human pathogen Streptococcus pneumoniae. Pneumococcal adherence to adult human nasal fluid was seen only by isolates expressing pilus-1. Robust binding was independent of pilus-1 adhesive properties but required Fab-dependent recognition of RrgB, the pilus shaft protein, by naturally acquired secretory IgA (sIgA). Pilus-1 binding by specific sIgA led to bacterial agglutination, but adherence required interaction of agglutinated pneumococci and entrapment in mucus particles. To test the effect of these interactions in vivo, pneumococci were preincubated with human sIgA before intranasal challenge in a mouse model of colonization. sIgA treatment resulted in rapid immune exclusion of pilus-expressing pneumococci. Our findings predict that immune exclusion would select for nonpiliated isolates in individuals who acquired RrgB-specific sIgA from prior episodes of colonization with piliated strains. Accordingly, genomic data comparing isolates carried by mothers and their children showed that mothers are less likely to be colonized with pilus-expressing strains. Our study provides a specific example of immune exclusion involving naturally acquired antibody in the human host, a major factor driving pneumococcal adaptation.