Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected variants and co-infecting agents. However, NGS is not widely used for small RNA viruses because of incorrectly perceived cost estimates and inefficient utilization of freely available bioinformatics tools.
In this study, we have utilized NGS-based random sequencing of total RNA combined with barcode multiplexing of libraries to quickly, effectively and simultaneously characterize the genomic sequences of multiple avian paramyxoviruses. Thirty libraries were prepared from diagnostic samples amplified in allantoic fluids and their total RNAs were sequenced in a single flow cell on an Illumina MiSeq instrument. After digital normalization, data were assembled using the MIRA assembler within a customized workflow on the Galaxy platform.
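Digital normalization, the step mentioned before assembly, thins out reads from regions that are already deeply covered so that the assembler sees a more even data set. The abstract does not say which implementation was used. The sketch below assumes the common median k-mer abundance rule (a read is kept only while the median count of its k-mers is below a cutoff) and uses exact counting in a dictionary instead of a probabilistic counter. The function name, `k`, and `cutoff` are illustrative choices, and reverse-complement k-mers are not merged.

```python
from collections import Counter
from statistics import median


def digital_normalize(reads, k=20, cutoff=20):
    """Keep a read only if the median abundance of its k-mers is below cutoff.

    Assumption: exact k-mer counts in memory; strands are treated separately.
    """
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read is shorter than k, nothing to count
        # Counter returns 0 for unseen k-mers without inserting them.
        if median(counts[kmer] for kmer in kmers) < cutoff:
            kept.append(read)
            counts.update(kmers)  # only kept reads contribute to the counts
    return kept
```

With a toy input of five identical reads plus one distinct read, `k=3` and `cutoff=2`, only the first two copies of the repeated read and the distinct read are retained.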
Twenty-eight avian paramyxovirus 1 (APMV-1), one APMV-13, four avian influenza and two infectious bronchitis virus complete or nearly complete genome sequences were obtained from the single run. The 29 avian paramyxovirus genomes displayed 99.6% mean coverage based on bases with Phred quality scores of 30 or more. The lower and upper quartiles of sample median depth per position for those 29 samples were 2984 and 6894, respectively, indicating coverage across samples sufficient for deep variant analysis. Sample processing and library preparation took approximately 25-30 h, the sequencing run took 39 h, and processing through the Galaxy workflow took approximately 2-3 h. The cost of all steps, excluding labor, was estimated to be 106 USD per sample.
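The depth summary reported above (lower and upper quartiles of the per-sample median depth) can be reproduced from per-position depth values along the following lines. This is a minimal sketch: the input layout (a dictionary mapping a sample name to a list of per-position depths) and the inclusive quartile method are assumptions, since the abstract does not state how the quartiles were computed.

```python
from statistics import median, quantiles


def depth_quartiles(depths_by_sample):
    """Return (Q1, Q3) of the per-sample median depth.

    depths_by_sample: hypothetical mapping of sample name -> list of
    per-position read depths for that sample's genome.
    """
    sample_medians = [median(depths) for depths in depths_by_sample.values()]
    # n=4 yields the three cut points Q1, Q2, Q3; "inclusive" treats the
    # data as the whole population of samples (an assumption).
    q1, _, q3 = quantiles(sample_medians, n=4, method="inclusive")
    return q1, q3
```

Summarizing each sample by its median first keeps a few extremely deep positions from dominating the cross-sample comparison.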
This work describes an efficient multiplexing NGS approach, a detailed analysis workflow, and customized tools for the characterization of the genomes of RNA viruses. The combination of multiplexing NGS technology with the Galaxy workflow platform resulted in a fast, user-friendly, and cost-efficient protocol for the simultaneous characterization of multiple full-length viral genomes. Twenty-nine full-length or near-full-length APMV genomes with a high median depth were successfully sequenced out of 30 samples. The applied de novo assembly approach also allowed identification of mixed viral populations in some of the samples.
Abstract
Background
Long-read sequencing has shown its tremendous potential to address genome assembly challenges, e.g., achieving the first telomere-to-telomere assembly of a gapless human chromosome. However, many issues remain unresolved when leveraging error-prone long reads to characterize high-complexity metagenomes, for instance, complete/high-quality genome reconstruction from highly complex systems.
Results
Here, we developed an iterative haplotype-resolved hierarchical clustering-based hybrid assembly (HCBHA) approach that capitalizes on a hybrid (error-prone long reads and high-accuracy short reads) sequencing strategy to reconstruct (near-) complete genomes from highly complex metagenomes. Using the HCBHA approach, we first phase short and long reads from the highly complex metagenomic dataset into different candidate bacterial haplotypes, then perform hybrid assembly of each bacterial genome individually. We reconstructed 557 metagenome-assembled genomes (MAGs) with an average N50 of 574 Kb from a deeply sequenced, highly complex activated sludge (AS) metagenome. These high-contiguity MAGs contained 14 closed genomes and 111 high-quality (HQ) MAGs including full-length rRNA operons, which accounted for 61.1% of the microbial community. Leveraging the near-complete genomes, we also profiled the metabolic potential of the AS microbiome and identified 2153 biosynthetic gene clusters (BGCs) encoded within the recovered AS MAGs.
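The contiguity figure quoted above is an N50: the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal sketch of the standard calculation follows; the function name is illustrative, and the input is assumed to be a plain list of contig lengths for one MAG.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")
```

An average N50 across MAGs, as reported in the abstract, would then be the mean of this value computed for each MAG separately.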
Conclusion
Our results established the feasibility of an iterative haplotype-resolved HCBHA approach to reconstruct (near-) complete genomes from highly complex ecosystems, providing new insights into “complete metagenomics”. The retrieved high-contiguity MAGs illustrated that various biosynthetic gene clusters (BGCs) were harbored in the AS microbiome. The high diversity of BGCs highlights the potential to discover new natural products biosynthesized by the AS microbial community, aside from the traditional function (e.g., organic carbon and nitrogen removal) in wastewater treatment.
Clostridium botulinum is a Gram-positive, spore-forming anaerobic bacterium that produces botulinum neurotoxin (BoNT). Closing C. botulinum genomes provides information about the arrangement(s) of their neurotoxin clusters and their location (e.g., chromosome or plasmid), which cannot be assessed using draft genomes. Therefore, we tested the use of long-read sequencing (nanopore sequencing) in combination with short-read sequencing to close two toxin-producing strains. These genomes could be used by the Public Health Emergency Preparedness and Response staff during botulism outbreaks. The genomes of two toxin-producing C. botulinum strains, one from an environmental sample (83F_CFSAN034202) and the other from a clinical sample (CDC51232_CFSAN034200), were sequenced using MinION and MiSeq devices. The genomes, including the chromosomes and the plasmids, were closed by a combination of long-read and short-read sequencing. They belonged to different C. botulinum sequence types (STs), with 83F belonging to ST4 and CDC51232 to ST7. A whole genome single nucleotide polymorphism (SNP) analysis clustered these two strains with strains in lineage 2 (e.g., 6CDC297) and 4 (e.g., NCTC2916) from Group I, respectively. These two strains were also bivalent strains, with the BoNTB and BoNTA4 clusters located in the larger plasmid for CDC51232, and the BoNTB and BoNTA1 clusters both located in the chromosome for 83F. Overall, this study showed the advantage of combining these two sequencing methods to obtain high quality closed C. botulinum genomes that could be used for SNP phylogenies (source tracking) as well as for fast identification of BoNT clusters and their gene arrangements.
Xanthomonas translucens pv. graminis (Xtg) is a major bacterial pathogen of economically important forage grasses, causing severe yield losses. So far, genomic resources for this pathovar consisted mostly of draft genome sequences, and only one complete genome sequence was available, preventing comprehensive comparative genomic analyses. Such comparative analyses are essential for understanding the mechanisms involved in the virulence of pathogens and for identifying virulence factors involved in pathogenicity.
In this study, we produced high-quality, complete genome sequences of four strains of Xtg, complementing the recently obtained complete genome sequence of the Xtg pathotype strain. These genomic resources allowed for a comprehensive comparative analysis, which revealed a high genomic plasticity with many chromosomal rearrangements, although the strains were highly related. A high number of transposases were exclusively found in Xtg and corresponded to 413 to 457 insertion/excision transposable elements per strain. These mobile genetic elements are likely to be involved in the observed genomic plasticity and may play an important role in the adaptation of Xtg. The pathovar was found to lack a type IV secretion system, and it possessed the smallest set of type III effectors in the species. However, three XopE and XopX family effectors were found, while in the other pathovars of the species two or fewer were present. Additional genes that were specific to the pathovar were identified, including a unique set of minor pilins of the type IV pilus, 17 TonB-dependent receptors (TBDRs), and 11 plant cell wall degradative enzymes.
These results suggest a high adaptability of Xtg, conferred by the abundance of mobile genetic elements, which could play a crucial role in pathogen adaptation. The large number of such elements in Xtg compared to other pathovars of the species could, at least partially, explain its high virulence and broad host range. Conserved features that were specific to Xtg were identified, and further investigation will help to determine genes that are essential to pathogenicity and host adaptation of Xtg.
To date, there is a dearth of information on canine parvovirus-2 (CPV-2) from the Caribbean region. During August–October 2020, the veterinary clinic on the Caribbean island of Nevis reported 64 household dogs with CPV-2-like clinical signs (hemorrhagic/non-hemorrhagic diarrhea and vomiting), of which 27 animals died. Rectal swabs/fecal samples were obtained from 43 dogs. A total of 39 of the 43 dogs tested positive for CPV-2 antigen and/or DNA, while 4 samples, negative for CPV-2 antigen, were not available for PCR. Among the 21 untested dogs, 15 had CPV-2 positive littermates. Analysis of the complete VP2 sequences of 32 strains identified new CPV-2a (CPV-2a with Ser297Ala in VP2) as the predominant CPV-2 on Nevis Island. Two nonsynonymous mutations, one rare (Asp373Asn) and the other uncommon (Ala262Thr), were observed in a few VP2 sequences. It was intriguing that new CPV-2a was associated with an outbreak of gastroenteritis on Nevis while found at low frequencies in sporadic cases of diarrhea on the neighboring island of St. Kitts. The nearly complete CPV-2 genomes (4 CPV-2 strains from St. Kitts and Nevis (SKN)) were reported for the first time from the Caribbean region. Eleven substitutions were found among the SKN genomes, which included nine synonymous substitutions, five of which have been rarely reported, and the two nonsynonymous substitutions. Phylogenetically, the SKN CPV-2 sequences formed a distinct cluster, with CPV-2b/USA/1998 strains constituting the nearest cluster. Our findings suggested that new CPV-2a is endemic in the region, with the potential to cause severe outbreaks, warranting further studies across the Caribbean Islands. Analysis of the SKN CPV-2 genomes corroborated the hypothesis that recurrent parallel evolution and reversion might play important roles in the evolution of CPV-2.
Escherichia coli is one of the major pathogens causing mastitis in lactating mammals. We hypothesized that E. coli from the gut and mammary glands may have similar genomic characteristics in the causation of mastitis. To test this hypothesis, we used whole genome sequencing to analyze two multidrug resistant E. coli strains isolated from the mammary tissue (G2M6U) and a fecal sample (G6M1F) of mice with experimentally induced mastitis. Both strains showed resistance to multiple (>7) antibiotics such as oxacillin, aztreonam, nalidixic acid, streptomycin, gentamicin, cefoxitin, ampicillin, tetracycline, azithromycin and nitrofurantoin. The genome of E. coli G2M6U had 59 antimicrobial resistance genes (ARGs) and 159 virulence factor genes (VFGs), while the E. coli G6M1F genome possessed 77 ARGs and 178 VFGs. Both strains were found to be genetically related to many E. coli strains causing mastitis and enteric diseases originating from different hosts and regions. G6M1F had several unique ARGs (e.g., QnrS1, sul2, tetA, tetR, emrK, blaTEM-1/105, and aph(6)-Id, aph(3″)-Ib) conferring resistance to certain antibiotics, whereas G2M6U had a unique heat-stable enterotoxin gene (astA) and 7192 single nucleotide polymorphisms. Furthermore, there were 43 and 111 unique genes identified in the G2M6U and G6M1F genomes, respectively. These results indicate distinct differences in the genomic characteristics of E. coli strains G2M6U and G6M1F that might have important implications for the pathophysiology of mammalian mastitis and for treatment strategies for mastitis in dairy animals.
The novel duck reovirus (NDRV) emerged in southeast China in 2005. The virus causes severe liver and spleen hemorrhage and necrosis in various duck species, bringing serious harm to waterfowl farming. In this study, three strains of NDRV designated as NDRV-ZSS-FJ20, NDRV-LRS-GD20, and NDRV-FJ19 were isolated from diseased Muscovy ducks in Guangdong and Fujian provinces. Pairwise sequence comparisons revealed that the three strains were closely related to NDRV, with nucleotide sequence identities for 10 genomic fragments ranging between 84.8 and 99.8%. In contrast, the nucleotide sequences of the three strains were only 38.9-80.9% similar to the chicken-origin reovirus and only 37.6-98.9% similar to the classical waterfowl-origin reovirus. Similarly, phylogenetic analysis revealed that the three strains clustered together with NDRV and were significantly different from classical waterfowl-origin reovirus and chicken-origin reovirus. In addition, the analyses showed that the L1 segment of the NDRV-FJ19 strain was a recombinant of 03G and J18 strains. Experimental reproduction of the disease showed that the NDRV-FJ19 strain was pathogenic to both ducks and chickens and could lead to symptoms of hemorrhage and necrosis in the liver and spleen. This was somewhat different from previous reports that NDRV is less pathogenic to chickens. In conclusion, we speculated that the NDRV-FJ19 causing duck liver and spleen necrosis is a new variant of a duck orthoreovirus that is significantly different in pathogenicity from any previously reported waterfowl-origin orthoreovirus.
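The pairwise nucleotide identities quoted above are percentages of matching positions between two aligned segments. The sketch below shows one common way to compute such a value. It assumes the two sequences are already aligned to equal length and that columns containing a gap in either sequence are skipped; the abstract does not state which convention or software was used, and the function name is illustrative.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two pre-aligned sequences, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gap columns do not count either way
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Counting gaps as mismatches instead would lower the reported identity, so the convention matters when comparing values across studies.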
Thousands of plasmid sequences are now publicly available in the NCBI nucleotide database, but they are not reliably annotated to distinguish complete plasmids from plasmid fragments, such as gene or contig sequences; therefore, retrieving complete plasmids for downstream analyses is challenging. Here we present a curated dataset of complete bacterial plasmids from the clinically relevant Enterobacteriaceae family. The dataset was compiled from the NCBI nucleotide database using curation steps designed to exclude incomplete plasmid sequences and chromosomal sequences misannotated as plasmids. Over 2000 complete plasmid sequences are included in the curated plasmid dataset. Protein sequences produced by translating each complete plasmid nucleotide sequence in all 6 frames are also provided. Further analysis and discussion of the dataset are presented in an accompanying research article: “Ordering the mob: insights into replicon and MOB typing…” (Orlek et al., 2017) [1]. The curated plasmid sequences are publicly available in the Figshare repository.
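Translating a nucleotide sequence in all 6 frames means reading codons from offsets 0, 1 and 2 on the forward strand and on the reverse complement. The sketch below illustrates the idea and is not the authors' pipeline. It assumes the standard amino acid assignments (the bacterial code assigns the same amino acids and differs only in start codons), writes stop codons as `*` and unknown codons as `X`, and treats the sequence as linear, so codons spanning the origin of a circular plasmid are not produced.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons enumerated in TCAG order line up with the amino acid string above.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, `six_frame_translation("ATGGCC")` gives `MA` in frame +1 and `GH` in frame -1.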
In this study, new chloroplast (cp) resources were developed for the genus Cynara, using whole cp genomes from 20 genotypes, by means of high‐throughput sequencing technologies. Our target species included seven globe artichokes, two cultivated cardoons, eight wild artichokes, and three other wild Cynara species (C. baetica, C. cornigera and C. syriaca). One complete cp genome was isolated using short reads from a whole‐genome sequencing project, while the others were obtained by means of long‐range PCR, for which primer pairs are provided here. A de novo assembly strategy combined with a reference‐based assembly allowed us to reconstruct each cp genome. Comparative analyses among the newly sequenced genotypes and two additional Cynara cp genomes (‘Brindisino’ artichoke and C. humilis) retrieved from public databases revealed 126 parsimony informative characters and 258 singletons in Cynara, for a total of 384 variable characters. Thirty‐nine SSR loci and 34 other INDEL events were detected. After data analysis, 37 primer pairs for SSR amplification were designed, and these molecular markers were subsequently validated in our Cynara genotypes. Phylogenetic analysis based on all cp variable characters provided the best resolution when compared to what was observed using only parsimony informative characters, or only short ‘variable’ cp regions. The evaluation of the molecular resources obtained from this study led us to support the ‘super‐barcode’ theory and consider the total cp sequence of Cynara as a reliable and valuable molecular marker for exploring species diversity and examining variation below the species level.
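The split of variable characters into parsimony informative characters and singletons (126 + 258 = 384 above) follows from a per-column rule on the alignment: a variable site is parsimony informative when at least two different states each occur in at least two sequences, and a singleton otherwise. The sketch below applies that rule. It assumes an equal-length alignment given as a list of strings and skips gaps and ambiguous symbols, which may differ from the settings the authors used; the function name is illustrative.

```python
from collections import Counter


def classify_sites(alignment, ignore="-N?"):
    """Return (parsimony_informative, singleton) site counts for an alignment."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    informative = 0
    singleton = 0
    for column in zip(*alignment):
        counts = Counter(c.upper() for c in column if c.upper() not in ignore)
        if len(counts) < 2:
            continue  # constant site
        # States seen in two or more sequences can support a grouping.
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative += 1
        else:
            singleton += 1
    return informative, singleton
```

The sum of the two returned counts is the number of variable characters, matching the way the totals are reported in the abstract.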
Bacillus firmus nematicidal bacterial strains are used to control plant parasitic nematode infestation of crops in agricultural production. Proteases are presumed to be the primary nematode virulence factors in nematicidal B. firmus, degrading the nematode cuticle and other organs. We determined and compared the whole genome sequences of two nematicidal strains. Comparative genomics with a particular focus on possible virulence determinants revealed a wider range of possible virulence factors in a B. firmus isolate from a commercial bionematicide and a wild type Bacillus sp. isolate with nematicidal activity. The resulting 4.6 Mb B. firmus I-1582 and 5.3 Mb Bacillus sp. ZZV12-4809 genome assemblies contain, respectively, 18 and 19 homologs to nematode-virulent proteases, two nematode-virulent chitinase homologs in ZZV12-4809, and 28 and 36 secondary metabolite biosynthetic clusters, projected to encode antibiotics, small peptides, toxins and siderophores. The results of this study point to the genetic capability of B. firmus and related species for nematode virulence through a range of direct and indirect mechanisms.