RNA viruses exist in their hosts as populations of different but related strains. The virus population, often called quasispecies, is shaped by a combination of genetic change and natural selection. Genetic change is due to both point mutations and recombination events. We present a jumping hidden Markov model that describes the generation of viral quasispecies and a method to infer its parameters from next-generation sequencing data. The model introduces position-specific probability tables over the sequence alphabet to explain the diversity that can be found in the population at each site. Recombination events are indicated by a change of state, allowing a single observed read to originate from multiple sequences. We present a specific implementation of the expectation maximization (EM) algorithm to find maximum a posteriori estimates of the model parameters and a method to estimate the distribution of viral strains in the quasispecies. The model is validated on simulated data, showing the advantage of explicitly taking the recombination process into account, and applied to reads obtained from a clinical HIV sample.
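The core idea of a jumping HMM (a read is copied from one strain, but may switch strains between adjacent sites) can be illustrated with a forward-algorithm sketch. This is a simplified stand-in, not the authors' model: it assumes a fixed, known set of strains with given position-specific tables, a uniform starting strain, a single jump probability `rho` with a uniformly chosen target strain, no insertions or deletions, and no EM parameter estimation. The names `tables_from_sequence` and `read_likelihood` are hypothetical.

```python
from typing import Dict, List, Sequence

ALPHABET = "ACGT"


def tables_from_sequence(seq: str, eps: float) -> List[Dict[str, float]]:
    """Build one probability table per site that favours seq's base (1 - eps)."""
    return [
        {a: (1.0 - eps) if a == base else eps / (len(ALPHABET) - 1) for a in ALPHABET}
        for base in seq
    ]


def read_likelihood(
    read: str,
    start: int,
    emissions: Sequence[Sequence[Dict[str, float]]],
    rho: float,
) -> float:
    """P(read) under a simplified jumping HMM, via the forward algorithm.

    emissions[k][j][a]: probability that strain k emits letter a at site j.
    rho: probability of jumping to another strain between adjacent sites
    (a recombination breakpoint); the target strain is chosen uniformly.
    """
    n = len(emissions)
    # alpha[k] = P(read[:i+1], read currently copied from strain k).
    # Uniform prior over the starting strain (an assumption of this sketch).
    alpha = [emissions[k][start].get(read[0], 0.0) / n for k in range(n)]
    for i in range(1, len(read)):
        site = start + i
        total = sum(alpha)
        new_alpha = []
        for k in range(n):
            if n == 1:
                prior = alpha[k]
            else:
                # Either stay on strain k or jump to k from any other strain.
                prior = (1.0 - rho) * alpha[k] + rho * (total - alpha[k]) / (n - 1)
            new_alpha.append(prior * emissions[k][site].get(read[i], 0.0))
        alpha = new_alpha
    return sum(alpha)


strains = [tables_from_sequence("AAAA", 0.0), tables_from_sequence("CCCC", 0.0)]
print(read_likelihood("AAC", 0, strains, rho=0.0))  # 0.0: no single strain explains "AAC"
print(read_likelihood("AAC", 0, strains, rho=0.1))  # 0.5 * 0.9 * 0.1, about 0.045
```

With `rho = 0` the chimeric read "AAC" is impossible, while a small jump probability makes it explainable as a recombinant of the two strains; this is the effect the abstract attributes to modelling recombination explicitly.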
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
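Contig N50, the contiguity figure quoted above, is the length L such that contigs of length at least L together cover at least half of the total assembly length; the NG50 variant uses half of the (estimated) genome size instead. A minimal sketch of the standard definition follows; the function name `nx` is hypothetical.

```python
from typing import Iterable, Optional


def nx(contig_lengths: Iterable[int], fraction: float = 0.5,
       genome_size: Optional[int] = None) -> int:
    """Return N50 (or NG50 when genome_size is given) for a set of contigs.

    Contigs are taken longest first; the answer is the length of the contig
    at which the cumulative length first reaches `fraction` of the total
    assembly length (or of genome_size, for the NGx variant).
    """
    lengths = sorted(contig_lengths, reverse=True)
    target = fraction * (genome_size if genome_size is not None else sum(lengths))
    covered = 0
    for length in lengths:
        covered += length
        if covered >= target:
            return length
    return 0  # assembly too small to reach the target (possible for NGx)


contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(nx(contigs))                   # 8  (N50: 10 + 9 + 8 = 27 of 54)
print(nx(contigs, genome_size=100))  # 3  (NG50: cumulative 52 of 100)
```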
Virus populations can display high genetic diversity within individual hosts. The intra-host collection of viral haplotypes, called viral quasispecies, is an important determinant of virulence, pathogenesis, and treatment outcome. We present HaploClique, a computational approach to reconstruct the structure of a viral quasispecies from next-generation sequencing data as obtained from bulk sequencing of mixed virus samples. We develop a statistical model for paired-end reads accounting for mutations, insertions, and deletions. Using an iterative maximal clique enumeration approach, read pairs are assembled into haplotypes of increasing length, eventually enabling global haplotype assembly. The performance of our quasispecies assembly method is assessed on simulated data for varying population characteristics and sequencing technology parameters. Owing to its paired-end handling, HaploClique compares favorably to state-of-the-art haplotype inference methods. It can reconstruct error-free full-length haplotypes from low coverage samples and detect large insertions and deletions at low frequencies. We applied HaploClique to sequencing data derived from a clinical hepatitis C virus population of an infected patient and discovered a novel deletion of length 357±167 bp that was validated by two independent long-read sequencing experiments. HaploClique is available at https://github.com/armintoepfer/haploclique. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2-5.
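One round of clique-based read merging can be sketched as follows: reads are nodes, an edge joins two reads that could come from the same haplotype, and each maximal clique (found here with the Bron-Kerbosch algorithm with pivoting) is collapsed into a longer super-read. This is a deliberately simplified illustration, not HaploClique itself: the edge rule below (overlap plus exact agreement) is a hypothetical stand-in for the paper's statistical model, and single-end, gap-free reads are assumed. All function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Read = Tuple[int, str]  # (0-based start position, aligned bases, no gaps)


def compatible(a: Read, b: Read, min_overlap: int = 1) -> bool:
    """Hypothetical edge rule: reads overlap and agree on every shared site."""
    (sa, xa), (sb, xb) = a, b
    lo, hi = max(sa, sb), min(sa + len(xa), sb + len(xb))
    if hi - lo < min_overlap:
        return False
    return all(xa[p - sa] == xb[p - sb] for p in range(lo, hi))


def maximal_cliques(adj: Dict[int, Set[int]]) -> List[Set[int]]:
    """Bron-Kerbosch with pivoting over an undirected adjacency map."""
    cliques: List[Set[int]] = []

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


def merge(reads: List[Read], clique: Set[int]) -> Read:
    """Collapse a clique of mutually compatible reads into one super-read."""
    start = min(reads[i][0] for i in clique)
    end = max(reads[i][0] + len(reads[i][1]) for i in clique)
    bases = ["N"] * (end - start)
    for i in clique:
        s, seq = reads[i]
        for offset, base in enumerate(seq):
            bases[s - start + offset] = base
    return start, "".join(bases)


def assemble_round(reads: List[Read]) -> List[Read]:
    """Build the compatibility graph and merge every maximal clique."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(reads))}
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if compatible(reads[i], reads[j]):
                adj[i].add(j)
                adj[j].add(i)
    return sorted(merge(reads, c) for c in maximal_cliques(adj))


reads = [(0, "ACGT"), (2, "GTAA"), (0, "ACCT"), (2, "CTTA")]
print(assemble_round(reads))  # [(0, 'ACCTTA'), (0, 'ACGTAA')]
```

Because every pair in a clique overlaps, the merged interval is contiguous; calling `assemble_round` again on its own output mimics the iterative lengthening of haplotypes described in the abstract.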
Background: The potential of digital technologies for patients is far from exhausted. The regulatory framework is becoming a brake on innovation, owing to digitalisation but also to the trend towards individualisation. The strategies, corporate culture and processes needed to design high-quality, cost-effective healthcare services are still lacking in many healthcare provider organisations.
Health Services 4.0 and patient integration as leverage: With Health Services 4.0 it is possible to improve the outcome of the individual healthcare service and meet the regulatory requirements. This requires the provider to be able to dynamically balance exploitation and exploration. The challenges are to develop innovations in a continuously changing working environment and/or to integrate (medical) technical innovations into the provider's own service processes.
Conclusion: This article focuses on hypotheses from cause-and-effect analyses, formulated as scenarios, concerning the implementation of digital technologies to improve efficiency and effectiveness, both in support of high medical expertise and for a higher level of service quality.
The result is a more detailed analysis of key value drivers, success factors, and internal and external value generators for the design of Health Services 4.0. To date, many issues regarding the use of digital technologies have been only partly analysed and have not yet been shown to deliver more efficient care at a high level of quality.
A company's capacity for ambidexterity is becoming an important dynamic capability: on the one hand, flexibility for new developments; on the other, stability, both in hard factors such as physical value chains and in soft factors such as value-oriented attitudes and behaviour based on empathy.
This article was previously published in German in "Monitor Versorgungsforschung" under the original title "Personalisierung von Behandlungspfaden - Das Potenzial digitaler Technologien" ("Personalisation of Treatment Pathways: The Potential of Digital Technologies"). This translated version faithfully reflects the authors, data, and interpretations of the original.
Circular consensus sequencing with Pacific Biosciences (PacBio) technology generates long (10-25 kilobases), accurate 'HiFi' reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation, pbccs, uses a hidden Markov model. We introduce DeepConsensus, which uses an alignment-based loss to train a gap-aware transformer-encoder for sequence correction. Compared to pbccs, DeepConsensus reduces read errors by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27% and at Q40 by 90%. With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 megabases (Mb) to 17.2 Mb), increase gene completeness (94% to 97%), reduce the false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45) and reduce variant-calling errors by 24%. DeepConsensus models could be trained for the more general problem of analyzing the alignment of other types of sequences, such as unique molecular identifiers or genome assemblies.
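The Q20/Q30/Q40 thresholds are Phred-scaled read qualities, Q = -10 log10(P(error)), so each additional 10 units means a tenfold lower error rate. The short sketch below converts between the two scales and shows what a 42% error reduction would mean for a single read, under the simplifying assumption (not stated in the abstract) that the reduction applies uniformly to that read.

```python
import math


def phred_from_error(error_rate: float) -> float:
    """Phred quality Q = -10 * log10(P(error))."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


def error_from_phred(q: float) -> float:
    """Inverse mapping: P(error) = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


for q in (20, 30, 40):
    print(f"Q{q}: {100 * error_from_phred(q):g}% error")  # 1%, 0.1%, 0.01%

# A 42% error reduction multiplies the error rate by 0.58, which adds
# -10 * log10(0.58), about 2.4 Phred units: a Q20 read would become ~Q22.4.
print(round(phred_from_error(0.58 * error_from_phred(20)), 2))  # 22.37
```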
Next-generation sequencing (NGS) technologies enable new insights into the diversity of virus populations within their hosts. Diversity estimation is currently restricted to single-nucleotide variants or to local fragments of no more than a few hundred nucleotides, as defined by the length of sequence reads. To study complex heterogeneous virus populations comprehensively, novel methods are required that allow for complete reconstruction of the individual viral haplotypes. Here, we show that assembly of whole viral genomes of ∼8600 nucleotides in length is feasible from mixtures of heterogeneous HIV-1 strains derived from defined combinations of cloned virus strains and from clinical samples of an HIV-1 superinfected individual. Haplotype reconstruction was achieved using optimized experimental protocols and computational methods for amplification, sequencing and assembly. We comparatively assessed the performance of the three NGS platforms 454 Life Sciences/Roche, Illumina and Pacific Biosciences for this task. Our results prove and delineate the feasibility of NGS-based full-length viral haplotype reconstruction and provide new tools for studying evolution and pathogenesis of viruses.
Challenges in RNA virus bioinformatics. Marz, Manja; Beerenwinkel, Niko; Drosten, Christian ...
Bioinformatics, 07/2014, Volume 30, Issue 13
Journal article; peer-reviewed; open access
Computer-assisted studies of the structure, function and evolution of viruses remain a neglected area of research. The attention of bioinformaticians to this interesting and challenging field is far from commensurate with its medical and biotechnological importance. It is telling that out of >200 talks held at ISMB 2013, the largest international bioinformatics conference, only one presentation explicitly dealt with viruses. In contrast to many broad, established and well-organized bioinformatics communities (e.g. structural genomics, ontologies, next-generation sequencing, expression analysis), research groups focusing on viruses can probably be counted on the fingers of two hands.
The purpose of this review is to increase awareness among bioinformatics researchers about the pressing needs and unsolved problems of computational virology. We focus primarily on RNA viruses that pose problems to many standard bioinformatics analyses owing to their compact genome organization, fast mutation rate and low evolutionary conservation. We provide an overview of tools and algorithms for handling viral sequencing data, detecting functionally important RNA structures, classifying viral proteins into families and investigating the origin and evolution of viruses.
Determining the composition of viral populations is becoming increasingly important in the field of medical virology. While recently developed computational tools for viral haplotype analysis allow for correcting sequencing errors, they do not always allow for the removal of errors occurring in the upstream experimental protocol, such as PCR errors. Primer IDs (pIDs) are one method to address this problem by harnessing redundant template resampling for error correction. By using a reference mixture of five HIV-1 strains, we show how pIDs can be useful for estimating key experimental parameters, such as the substitution rate of the PCR process and the reverse transcription (RT) error rate. In addition, we introduce a hidden Markov model for determining the recombination rate of the RT PCR process. We found no strong sequence-specific bias in pID abundances (RT efficiencies were the same as those of commonly used short, specific RT primers) and no effects of pIDs on the estimated distribution of the reference viruses.
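The redundancy idea behind primer IDs can be sketched as: group reads by their pID, take a per-site majority vote within each pID family to recover the original template, and count how often individual reads disagree with their family consensus. This is only an illustration of the principle, not the paper's estimator; it does not separate RT from PCR errors or model recombination, and it assumes pre-aligned reads of equal length. The minimum family size of 3 and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TaggedRead = Tuple[str, str]  # (primer ID, aligned read sequence)


def pid_consensus(tagged_reads: List[TaggedRead],
                  min_family_size: int = 3) -> Dict[str, str]:
    """Collapse reads sharing a primer ID into one per-template consensus.

    Families smaller than min_family_size (a hypothetical threshold) are
    dropped: a vote over one or two reads cannot separate enzymatic or
    sequencing errors from true template bases.
    """
    families: Dict[str, List[str]] = defaultdict(list)
    for pid, read in tagged_reads:
        families[pid].append(read)

    consensus: Dict[str, str] = {}
    for pid, reads in families.items():
        if len(reads) < min_family_size:
            continue
        # Majority base in each alignment column.
        consensus[pid] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*reads)
        )
    return consensus


def mismatch_rate(tagged_reads: List[TaggedRead],
                  consensus: Dict[str, str]) -> float:
    """Fraction of read bases that differ from their family's consensus."""
    mismatches = total = 0
    for pid, read in tagged_reads:
        if pid in consensus:
            mismatches += sum(a != b for a, b in zip(read, consensus[pid]))
            total += len(read)
    return mismatches / total if total else 0.0


reads = [("AAAA", "ACGT"), ("AAAA", "ACGT"), ("AAAA", "ACCT"), ("CCCC", "TTTT")]
cons = pid_consensus(reads)
print(cons)                       # {'AAAA': 'ACGT'}; 'CCCC' has only one read
print(mismatch_rate(reads, cons)) # 1 mismatch in 12 bases, about 0.083
```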
• pIDs are unique tags that track a single cDNA molecule through to sequencing.
• pIDs allow for inferring enzymatic error rates.
• pIDs show no strong bias effects within their ID region.
• pIDs do not improve population frequency estimators over standard approaches.
• Public healthcare economics unclear required redundancy incompatible with pooling.
Classical swine fever virus (CSFV) is a positive-sense RNA virus with a high degree of genetic variability among isolates. High diversity is also found in virulence, with strains covering the complete spectrum from avirulent to highly virulent. The underlying genetic determinants are far from being understood. Because the RNA polymerases of RNA viruses lack proofreading activity, genome variants called haplotypes arise during replication. A set of haplotypes is referred to as a viral quasispecies. Genetic variability can confer a fitness advantage by facilitating more effective escape from the host immune response. To investigate the correlation between quasispecies composition and virulence in vivo, we analyzed next-generation sequencing data of CSFV isolates of varying virulence. Viral samples from pigs infected with the highly virulent isolates “Koslov” and “Brescia” showed higher quasispecies diversity and more nucleotide variability than samples from pigs infected with low and moderately virulent isolates.
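The abstract does not state which diversity measure was used. One common choice for quasispecies complexity is the Shannon entropy of the haplotype frequency distribution, optionally normalized by the logarithm of the number of reads so that samples of different depth are comparable. The sketch below assumes that measure and assumes haplotype frequencies are estimated by counting identical sequences; `shannon_entropy` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(haplotypes: Iterable[str], normalized: bool = False) -> float:
    """Shannon entropy (natural log) of the haplotype frequency distribution.

    Frequencies are estimated by counting identical haplotype sequences.
    With normalized=True the value is divided by ln(N), N = number of reads.
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    if normalized:
        return h / math.log(n) if n > 1 else 0.0
    return h


print(shannon_entropy(["ACGT"] * 4))                        # 0.0: a single haplotype
print(shannon_entropy(["ACGT", "ACGT", "ACCT", "ACCT"]))     # ln 2, about 0.693
print(shannon_entropy(["ACGT", "ACGT", "ACCT", "ACCT"], normalized=True))  # 0.5
```

A higher value indicates a more even spread of reads over more haplotypes, which is the sense in which a sample can show "higher quasispecies diversity".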
Motivation: The worldwide community of life scientists has access to a large number of public bioinformatics databases and tools, which are developed and deployed using diverse technologies and designs. More and more of these resources offer programmatic web-service interfaces. However, efficient use of the resources is hampered by the lack of widely used, standard data-exchange formats for the basic, everyday bioinformatics data types. Results: BioXSD has been developed as a candidate for a standard, canonical exchange format for basic bioinformatics data. BioXSD is represented by a dedicated XML Schema and defines syntax for biological sequences, sequence annotations, alignments and references to resources. We have adapted a set of web services to use BioXSD as the input and output format, and implemented a test-case workflow. This demonstrates that the approach is feasible and provides smooth interoperability. Semantics for BioXSD is provided by annotation with the EDAM ontology. We discuss in a separate section how BioXSD relates to other initiatives and approaches, including existing standards and the Semantic Web. Availability: The BioXSD 1.0 XML Schema is freely available at http://www.bioxsd.org/BioXSD-1.0.xsd under the Creative Commons BY-ND 3.0 license. The http://bioxsd.org web page offers documentation, examples of data in BioXSD format, example workflows with source code in common programming languages, an updated list of compatible web services and tools and a repository of feature requests from the community. Contact: matus.kalas@bccs.uib.no; developers@bioxsd.org; support@bioxsd.org