Accurate and rapid typing of pathogens is essential for effective surveillance and outbreak detection. Conventional serotyping of Escherichia coli is a delicate, laborious, time-consuming, and expensive procedure. With whole-genome sequencing (WGS) becoming cheaper, it has vast potential in routine typing and surveillance. The aim of this study was to establish a valid and publicly available tool for WGS-based in silico serotyping of E. coli applicable for routine typing and surveillance. A FASTA database of specific O-antigen processing system genes for O typing and flagellin genes for H typing was created as a component of the publicly available Web tools hosted by the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org). All E. coli isolates available with WGS data and conventional serotype information were subjected to WGS-based serotyping employing this specific SerotypeFinder CGE tool. SerotypeFinder was evaluated on 682 E. coli genomes, 108 of which were sequenced for this study, where both the whole genome and the serotype were available. In total, 601 and 509 isolates were included for O and H typing, respectively. The O-antigen genes wzx, wzy, wzm, and wzt and the flagellin genes fliC, flkA, fllA, flmA, and flnA were detected in 569 and 508 genome sequences, respectively. SerotypeFinder for WGS-based O and H typing predicted 560 of 569 O types and 504 of 508 H types, consistent with conventional serotyping. In combination with other available WGS typing tools, E. coli serotyping can be performed solely from WGS data, providing faster and cheaper typing than current routine procedures and making WGS typing a superior alternative to conventional typing strategies.
CYP2D6 is one of the most studied enzymes in the field of pharmacogenetics. The CYP2D6 gene is highly polymorphic with over 100 catalogued star (*) alleles, and clinical CYP2D6 testing is increasingly accessible and supported by practice guidelines. However, the degree of variation at the CYP2D6 locus and homology with its pseudogenes make interrogating CYP2D6 by short-read sequencing challenging. Moreover, accurate prediction of CYP2D6 metabolizer status necessitates analysis of duplicated alleles when an increased copy number is detected. These challenges have recently been overcome by long-read CYP2D6 sequencing; however, such platforms are not widely available. This review highlights the genomic complexities of CYP2D6, current sequencing methods and the evolution of CYP2D6 from allele discovery to clinical pharmacogenetic testing.
Among available genome relatedness indices, average nucleotide identity (ANI) is one of the most robust measurements of genomic relatedness between strains, and has great potential in the taxonomy of bacteria and archaea as a substitute for the labour-intensive DNA–DNA hybridization (DDH) technique. An ANI threshold range (95–96 %) for species demarcation had previously been suggested based on comparative investigation between DDH and ANI values, albeit with rather limited datasets. Furthermore, its generality was not tested on all lineages of prokaryotes. Here, we investigated the overall distribution of ANI values generated by pairwise comparison of 6787 genomes of prokaryotes belonging to 22 phyla to see whether the suggested range can be applied to all species. There was an apparent distinction in the overall ANI distribution between intra- and interspecies relationships at around 95–96 % ANI. We went on to determine which level of 16S rRNA gene sequence similarity corresponds to the currently accepted ANI threshold for species demarcation using over one million comparisons. A twofold cross-validation statistical test revealed that 98.65 % 16S rRNA gene sequence similarity can be used as the threshold for differentiating two species, which is consistent with previous suggestions (98.2–99.0 %) derived from comparative studies between DDH and 16S rRNA gene sequence similarity. Our findings should be useful in accelerating the use of genomic sequence data in the taxonomy of bacteria and archaea.
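As a rough illustration of how the two thresholds above might be applied to a single pair of strains, the following Python sketch encodes the 95–96 % ANI range and the 98.65 % 16S rRNA gene similarity cut-off. The function name `classify_pair`, the returned labels, and the handling of values that fall exactly on a boundary are assumptions made for this example; they are not part of the study.

```python
# Thresholds quoted in the abstract above (percent identity / similarity).
ANI_LOWER = 95.0          # lower end of the suggested ANI range
ANI_UPPER = 96.0          # upper end of the suggested ANI range
SSU_THRESHOLD = 98.65     # 16S rRNA gene sequence similarity cut-off


def classify_pair(ani=None, ssu=None):
    """Return a hedged species call for one pair of strains.

    ani: average nucleotide identity in percent, if genomes are available.
    ssu: 16S rRNA gene sequence similarity in percent, otherwise.
    Boundary handling (< versus <=) is an assumption of this sketch.
    """
    if ani is not None:
        if ani < ANI_LOWER:
            return "different species"
        if ani > ANI_UPPER:
            return "same species"
        return "borderline"  # inside the 95-96 % range
    if ssu is not None:
        if ssu < SSU_THRESHOLD:
            return "different species"
        # High 16S similarity alone does not establish conspecificity.
        return "undetermined"
    raise ValueError("provide an ANI value or a 16S similarity value")


if __name__ == "__main__":
    print(classify_pair(ani=97.2))
    print(classify_pair(ssu=97.0))
```

The 16S branch only ever excludes a shared species; a pair at or above 98.65 % is left undetermined here because the abstract presents that value as a threshold for differentiating species, not for confirming them.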
16S ribosomal RNA gene (rDNA) amplicon analysis remains the standard approach for the cultivation-independent investigation of microbial diversity. The accuracy of these analyses depends strongly on the choice of primers. The overall coverage and phylum spectrum of 175 primers and 512 primer pairs were evaluated in silico with respect to the SILVA 16S/18S rDNA non-redundant reference dataset (SSURef 108 NR). Based on this evaluation a selection of 'best available' primer pairs for Bacteria and Archaea for three amplicon size classes (100–400, 400–1000, ≥ 1000 bp) is provided. The most promising bacterial primer pair (S-D-Bact-0341-b-S-17/S-D-Bact-0785-a-A-21), with an amplicon size of 464 bp, was experimentally evaluated by comparing the taxonomic distribution of the 16S rDNA amplicons with 16S rDNA fragments from directly sequenced metagenomes. The results of this study may be used as a guideline for selecting primer pairs with the best overall coverage and phylum spectrum for specific applications, therefore reducing the bias in PCR-based microbial diversity studies.
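The idea of in silico primer-pair coverage can be sketched in a few lines of Python: a primer pair "covers" a reference sequence if the forward primer and the reverse complement of the reverse primer both match it, in that order. This is a minimal sketch under stated assumptions, not the evaluation procedure used in the study: it requires exact matches (degenerate IUPAC bases aside), and the primers and sequences in the usage example are toy values invented for illustration.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def normalise(seq):
    """Upper-case a sequence and write RNA 'U' as DNA 'T'."""
    return seq.upper().replace("U", "T")


def primer_regex(primer):
    """Compile a (possibly degenerate) primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in normalise(primer)))


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return normalise(seq).translate(COMPLEMENT)[::-1]


def pair_coverage(forward, reverse, sequences):
    """Fraction of sequences matched by both primers in the right order.

    Exact matching with zero mismatches is an assumption of this sketch.
    """
    if not sequences:
        return 0.0
    fwd = primer_regex(forward)
    rev = primer_regex(reverse_complement(reverse))
    hits = 0
    for seq in sequences:
        template = normalise(seq)
        site = fwd.search(template)
        # The reverse site must lie downstream of the forward site.
        if site and rev.search(template, site.end()):
            hits += 1
    return hits / len(sequences)


if __name__ == "__main__":
    # Toy primers and templates, invented for illustration only.
    toy_refs = ["TTACGTCCCGTAAGG", "ACGAGGGCAATT",
                "ACGAGGGAAATT", "GTAACCACGT"]
    print(pair_coverage("ACGN", "TTRC", toy_refs))  # prints 0.5
```

Allowing a fixed number of mismatches, or grouping the per-sequence hits by phylum to obtain a phylum spectrum, would be natural extensions of the same loop.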
Adjacent CpG sites in mammalian genomes can be co-methylated owing to the processivity of methyltransferases or demethylases, yet discordant methylation patterns have also been observed, which are related to stochastic or uncoordinated molecular processes. We focused on a systematic search and investigation of regions in the full human genome that show highly coordinated methylation. We defined 147,888 blocks of tightly coupled CpG sites, called methylation haplotype blocks, after analysis of 61 whole-genome bisulfite sequencing data sets and validation with 101 reduced-representation bisulfite sequencing data sets and 637 methylation array data sets. Using a metric called methylation haplotype load, we performed tissue-specific methylation analysis at the block level. Subsets of informative blocks were further identified for deconvolution of heterogeneous samples. Finally, using methylation haplotypes we demonstrated quantitative estimation of tumor load and tissue-of-origin mapping in the circulating cell-free DNA of 59 patients with lung or colorectal cancer.
► High-quality de novo annotation of metazoan mitochondrial genomes. ► MITOS is available as a fully automatic web server. ► Consistent reannotation of available mitogenomes.
About 2000 completely sequenced mitochondrial genomes are available from the NCBI RefSeq database together with manually curated annotations of their protein-coding genes, rRNAs, and tRNAs. This annotation information, which has accumulated over two decades, has been obtained with a diverse set of computational tools and annotation strategies. Despite all efforts of manual curation it is still plagued by misassignments of reading directions, erroneous gene names, and missing as well as false positive annotations, in particular for the RNA genes. Taken together, this causes substantial problems for fully automatic pipelines that aim to use these data comprehensively for studies of animal phylogenetics and the molecular evolution of mitogenomes. The MITOS pipeline is designed to compute a consistent de novo annotation of the mitogenomic sequences. We show that the results of MITOS match RefSeq and MitoZoa in terms of annotation coverage and quality. At the same time we avoid biases, inconsistencies of nomenclature, and typos originating from manual curation strategies. The MITOS pipeline is accessible online at http://mitos.bioinf.uni-leipzig.de.
Although genomic instability, epigenetic abnormality, and gene expression dysregulation are hallmarks of colorectal cancer, these features have not been simultaneously analyzed at single-cell resolution. Using optimized single-cell multiomics sequencing together with multiregional sampling of the primary tumor and lymphatic and distant metastases, we developed insights beyond intratumoral heterogeneity. Genome-wide DNA methylation levels were relatively consistent within a single genetic sublineage. The genome-wide DNA demethylation patterns of cancer cells were consistent in all 10 patients whose DNA we sequenced. The cancer cells' DNA demethylation degrees clearly correlated with the densities of the heterochromatin-associated histone modification H3K9me3 of normal tissue and those of repetitive element long interspersed nuclear element 1. Our work demonstrates the feasibility of reconstructing genetic lineages and tracing their epigenomic and transcriptomic dynamics with single-cell multiomics sequencing.
Single-cell sequencing-based methods for profiling gene transcript levels have revealed substantial heterogeneity in expression levels among morphologically indistinguishable cells. This variability has important functional implications for tissue biology and disease states such as cancer. Mapping of epigenomic information such as chromatin accessibility, nucleosome positioning, histone tail modifications and enhancer-promoter interactions in both bulk-cell and single-cell samples has shown that these characteristics of chromatin state contribute to expression or repression of associated genes. Advances in single-cell epigenomic profiling methods are enabling high-resolution mapping of chromatin states in individual cells. Recent studies using these techniques provide evidence that variations in different aspects of chromatin organization collectively define gene expression heterogeneity among otherwise highly similar cells.