Faced with the ongoing global pandemic of coronavirus disease, the 'National Reference Centre for Whole Genome Sequencing of microbial pathogens: database and bioinformatic analysis' (GENPAT), formally established at the 'Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise' (IZSAM) in Teramo (Italy), is in charge of SARS-CoV-2 surveillance at the genomic scale. Because this surveillance requires correct and fast assessment of epidemiological clusters from a substantial number of samples, the present study proposes an analytical workflow for accurately identifying the PANGO lineages of SARS-CoV-2 samples and building discriminant minimum spanning trees (MST), bypassing the usual time-consuming phylogenomic inferences based on multiple sequence alignment (MSA) and substitution models. GENPAT constituted two collections of SARS-CoV-2 samples. The first collection consisted of SARS-CoV-2 positive swabs collected by IZSAM from the Abruzzo region (Italy), then sequenced by next generation sequencing (NGS) and analyzed in GENPAT (n = 1592), while the second collection included samples from several Italian provinces and retrieved from the reference Global Initiative on Sharing All Influenza Data (GISAID) (n = 17,201). The main results of the present work showed that (i) GENPAT and GISAID detected the same PANGO lineages, (ii) the PANGO lineages B.1.177 (i.e. historical in Italy) and B.1.1.7 (i.e. 'UK variant') are major concerns today in several Italian provinces, and that the new MST-based method (iii) clusters most of the PANGO lineages together, (iv) has a higher discriminatory power than PANGO lineages, and (v) is faster than the usual phylogenomic methods based on MSA and substitution models. The genome sequencing efforts of Italian provinces, combined with a structured national system of NGS data management, provided support for SARS-CoV-2 surveillance in Italy.
We propose to build phylogenomic trees of SARS-CoV-2 variants through an accurate, discriminant and fast MST-based method that avoids the typical time-consuming steps of MSA and substitution model-based phylogenomic inference.
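As an illustration of the general idea of a minimum spanning tree built directly from pairwise differences between samples, a minimal sketch follows. It is not the GENPAT implementation: the abstract does not state the distance or the MST algorithm, so the Hamming distance over per-sample profiles, Prim's algorithm, and the toy `samples` dictionary are all assumptions made for this example.

```python
def hamming(a, b):
    """Count positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Return MST edges (u, v, weight) using Prim's algorithm.

    The graph is complete: every pair of samples is connected by an edge
    weighted with their Hamming distance (an assumption of this sketch).
    """
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the cheapest edge linking the tree to a sample outside it.
        weight, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, weight))
        in_tree.add(v)
    return edges


# Hypothetical toy profiles (one integer code per position), not real data.
samples = {
    "s1": [1, 1, 1, 1],
    "s2": [1, 1, 1, 2],
    "s3": [1, 1, 3, 2],
    "s4": [4, 4, 4, 4],
}
tree = minimum_spanning_tree(samples)
total_weight = sum(w for _, _, w in tree)
for u, v, w in tree:
    print(f"{u} -- {v} ({w} differences)")
```

No alignment or substitution model is involved in this sketch: only pairwise difference counts are needed. The naive loop recomputes distances at every step, so a real workflow on thousands of samples would precompute the distance matrix.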
•Higher transmissibility of SARS-CoV-2 lineage B.1.1.7 may be related to higher viral loads.
•Higher RNA loads were observed in nasopharyngeal swabs from individuals with lineage B.1.1.7.
•A significantly longer persistence of SARS-CoV-2 was observed in B.1.1.7-infected individuals.
Following the announcement in December 2020 of the emergence of a new variant (VOC 202012/01, B.1.1.7 lineage) in the United Kingdom, targeted surveillance was put in place in the Abruzzo region (Italy), which allowed detection of 313 persons affected by lineage B.1.1.7 up to 20 February 2021. We investigated the results of RT-PCR on nasopharyngeal swabs tested from December 2020 to February 2021 to assess any difference in viral load and persistence between people infected by lineage B.1.1.7 and those infected by other lineages. Statistically significantly lower CT values associated with the detection of the N protein encoding gene (CT N) were observed in persons with lineage B.1.1.7 infection (median CT N = 15.8) in comparison to those infected by other lineages (median CT N = 16.9). A significantly longer persistence of SARS-CoV-2 RNA in nasopharyngeal swabs was observed in persons with lineage B.1.1.7 infection (16 days) in comparison to those infected by other lineages (14 days).
Genomic data-based machine learning tools are promising for real-time surveillance activities performing source attribution of foodborne bacteria such as Listeria monocytogenes. Given the heterogeneity of machine learning practices, our aim was to identify those influencing the source prediction performance of the usual holdout method combined with the repeated k-fold cross-validation method. A large collection of 1 100 L. monocytogenes genomes with known sources was built according to several genomic metrics to ensure authenticity and completeness of genomic profiles. Based on these genomic profiles (i.e. 7-locus alleles, core alleles, accessory genes, core SNPs and pan kmers), we developed a versatile workflow assessing prediction performance of different combinations of training dataset splitting (i.e. 50, 60, 70, 80 and 90%), data preprocessing (i.e. with or without near-zero variance removal), and learning models (i.e. BLR, ERT, RF, SGB, SVM and XGB). The performance metrics included accuracy, Cohen's kappa, F1-score, area under the receiver operating characteristic curve, the precision recall curve or the precision recall gain curve, and execution time. The testing average accuracies from accessory genes and pan kmers were significantly higher than accuracies from core alleles or SNPs. While the accuracies from 70 and 80% of training dataset splitting were not significantly different, those from 80% were significantly higher than the other tested proportions. The near-zero variance removal prevented results from being produced for 7-locus alleles, did not significantly affect the accuracy for core alleles, accessory genes and pan kmers, and significantly decreased the accuracy for core SNPs. The SVM and XGB models did not present significant differences in accuracy between each other and reached significantly higher accuracies than BLR, SGB, ERT and RF, in that order.
However, the SVM model required more computing power than the XGB model, especially for large numbers of descriptors such as core SNPs and pan kmers. In addition to recommendations about machine learning practices for L. monocytogenes source attribution based on genomic data, the present study also provides a freely available workflow to solve other balanced or unbalanced multiclass phenotypes from binary and categorical genomic profiles of other microorganisms without source code modifications.
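The combination of a holdout split with repeated k-fold cross-validation described above can be sketched as follows. This is a hedged, standard-library illustration and not the published workflow: the function names, the choice of k = 10 with 3 repeats, the fixed seed, and the absence of class stratification are all assumptions of this example. Only the collection size (1 100 genomes) and the 80% training proportion come from the abstract.

```python
import random


def holdout_split(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def repeated_kfold(train_indices, k, repeats, seed=0):
    """Yield (fitting, validation) index lists for repeated k-fold CV.

    Each repeat reshuffles the training indices and partitions them into
    k folds; every fold serves once as the validation set.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = train_indices[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            validation = folds[i]
            fitting = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
            yield fitting, validation


# 1 100 genomes with an 80% training proportion, as reported in the abstract;
# k and repeats are illustrative values only.
train, test = holdout_split(1100, 0.8)
splits = list(repeated_kfold(train, k=10, repeats=3))
print(len(train), len(test), len(splits))
```

The held-out testing indices never enter the cross-validation loop, so hyperparameters tuned on the folds can be assessed once on unseen genomes. For unbalanced source classes, a stratified variant of both steps would likely be preferable.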
Streptococcus suis is a pathogen associated with severe diseases in pigs and humans. Human infections have a zoonotic origin in pigs. To assess circulating strains, we characterized the serotypes, sequence types, and antimicrobial susceptibility of 78 S. suis isolates from diseased farmed pigs in Italy during 2017-2019. Almost 60% of infections were caused by serotypes 1/2 and 9. All but 1 of the serotype 2 and 1/2 isolates were confined to a single cluster, and serotype 9 isolates were distributed along the phylogenetic tree. Besides sequence type (ST) 1, the serotype 2 cluster included ST7, which caused severe human infections in China in 1998 and 2005. A large proportion of serotype 9 isolates, assigned to ST123, were resistant to penicillin. The emergence of this clone threatens the successful treatment of S. suis infection. Characterizing S. suis isolates from pigs will promote earlier detection of emerging clones.
We detected severe acute respiratory syndrome coronavirus 2 in an otherwise healthy poodle living with 4 family members who had coronavirus disease. We observed antibodies in serum samples taken from the dog, indicating seroconversion. Full-length genome sequencing showed that the canine and human viruses were identical, suggesting human-to-animal transmission.
Whole genome sequencing analyzed by core genome multi-locus sequence typing (cgMLST) is widely used in surveillance of the pathogenic bacteria Listeria monocytogenes. Given the heterogeneity of available bioinformatics tools to define cgMLST alleles, our aim was to identify parameters influencing the precision of cgMLST profiles.
We used three L. monocytogenes reference genomes from different phylogenetic lineages and assessed the impact of in vitro (i.e. tested genomes, successive platings, replicates of DNA extraction and sequencing) and in silico parameters (i.e. targeted depth of coverage, depth of coverage, breadth of coverage, assembly metrics, cgMLST workflows, cgMLST completeness) on cgMLST precision made of 1748 core loci. Six cgMLST workflows were tested, comprising assembly-based (BIGSdb, INNUENDO, GENPAT, SeqSphere and BioNumerics) and assembly-free (i.e. kmer-based MentaLiST) allele callers. Principal component analyses and generalized linear models were used to identify the most impactful parameters on cgMLST precision.
The isolate's genetic background, cgMLST workflows, cgMLST completeness, as well as depth and breadth of coverage were the parameters that impacted most on cgMLST precision (i.e. identical alleles against reference circular genomes). All workflows performed well at ≥40X of depth of coverage, with high loci detection (> 99.54% for all, except for BioNumerics with 97.78%) and showed consistent cluster definitions using the reference cut-off of ≤7 allele differences.
This highlights that bioinformatics workflows dedicated to cgMLST allele calling are largely robust when paired-end reads are of high quality and when the sequencing depth is ≥40X.
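To make the cut-off of ≤7 allele differences concrete, the sketch below groups hypothetical cgMLST profiles by single-linkage clustering. The abstract reports the cut-off but not the clustering procedure, so the single-linkage rule, the choice to skip loci with missing calls (`None`), and the 20-locus toy profiles are assumptions of this example and do not reproduce any of the six tested workflows.

```python
def allele_differences(p, q):
    """Count loci with different alleles, skipping loci missing in either profile."""
    return sum(
        1 for a, b in zip(p, q)
        if a is not None and b is not None and a != b
    )


def single_linkage_clusters(profiles, cutoff=7):
    """Group isolates connected by chains of pairs within the allele cut-off."""
    names = sorted(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if allele_differences(profiles[u], profiles[v]) <= cutoff:
                parent[find(u)] = find(v)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values())


# Hypothetical profiles over 20 loci (a real scheme here has 1748 core loci).
profiles = {
    "a": [1] * 20,
    "b": [1] * 15 + [2] * 5,    # 5 differences from "a"
    "c": [1] * 10 + [2] * 10,   # 10 from "a", 5 from "b"
    "d": [3] * 20,              # 20 differences from every other isolate
}
clusters = single_linkage_clusters(profiles, cutoff=7)
print(clusters)
```

With single linkage, isolates "a" and "c" fall in the same cluster through "b" even though they differ at 10 loci, which is one reason the choice of linkage rule matters when comparing cluster definitions across workflows.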
Italy's second wave of SARS-CoV-2 has hit hard, with more than three million cases and over 100,000 deaths, representing an almost ten-fold increase in the numbers reported by August 2020. Herein, we present an analysis of 6515 SARS-CoV-2 sequences sampled in Italy between 29 January 2020 and 1 March 2021 and show how different lineages emerged multiple times independently despite lockdown restrictions. Virus lineage B.1.177 became the dominant variant in November 2020, when cases peaked at 40,000 a day, but since January 2021 it is being replaced by the B.1.1.7 'variant of concern'. In addition, we report a sudden increase in another documented variant of concern (lineage P.1) from December 2020 onwards, most likely caused by a single introduction into Italy. We again highlight how international importations drive the emergence of new lineages and that genome sequencing should remain a top priority for ongoing surveillance in Italy.
During November 2021-May 2022, we identified 37 clinical cases of Streptococcus equi subspecies zooepidemicus infections in central Italy. Epidemiologic investigations and whole-genome sequencing showed unpasteurized fresh dairy products were the outbreak source. Early diagnosis by using sequencing technology prevented the spread of life-threatening S. equi subsp. zooepidemicus infections.
Italy was one of the first countries to experience a major epidemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), with >1000 cases confirmed by 1 March 2020. However, virus genome sequence data is sparse and there has been only limited investigation of virus transmission across the country. Here, we provide the most extensive study to date of the genomic epidemiology of SARS-CoV-2 in Italy covering the first wave of infection. We generated 191 new full-length genomes, largely sampled from central Italy (Abruzzo), before, during, and after the enforcement of a nationwide "lockdown" (8 March-3 June). These were combined with 460 published SARS-CoV-2 sequences sampled across Italy. Phylogenetic analysis including global sequence data revealed multiple independent introductions into Italy, with at least 124 instances of sequence clusters representing longer chains of transmission. Eighteen of these transmission clusters emerged before the nationwide lockdown was implemented on 8 March, and an additional 18 had evidence for transmission between different Italian regions. Extended transmission periods between infections of up to 104 days were observed in five clusters. In addition, we found seven clusters that persisted throughout the lockdown period. Overall, we show how importations were an important driver of the first wave of SARS-CoV-2 in Italy.
In this study, we characterized 84 Listeria monocytogenes (Lm) strains having an atypical IVb-v1 profile and isolated in a meat producing plant of Central Italy. They were assigned to the new MLST type ST2801 (CC218). The new ST was widespread in the food-producing environment where it was able to persist for over a year even after cleaning and sanitation. Cluster analysis identified three main clusters genetically close to each other (0–22 allelic differences and 0–28 SNPs) from two different cgMLST types, suggesting a common source. The coexistence of closely related clusters over time could be the result of a different evolution path starting from a common ancestor first introduced in the plant and/or the consequence of the repetitive reintroduction of closely related clones probably by raw materials. All the strains presented several determinants for heavy metals resistance, stress response, biofilm production, and multidrug efflux pumps with no significant differences among the clusters. A total of 53 strains carried pLI100 and the j1776 plasmids, while in one strain, the pLM33 was found in addition to pLI100. Only the strains carrying plasmids presented cadA and cadC for cadmium resistance and the mco gene encoding a multicopper oxidase and gerN for an additional Na+/H+-K+ antiporter. All the strains presented a virulence profile including a full-length inlA gene and the additional LIPI-3. The isolation of a new ST with a large pattern of stress-adaptation genes and able to persist is an important contribution to deepening the current knowledge on the uncommon IVb-v1 and in general on the genomic diversity of Lm.