Due to the high mutation rate of the virus, the COVID-19 pandemic evolved rapidly. Certain variants of the virus, such as Delta and Omicron, emerged with altered viral properties leading to severe transmission and death rates. These variants burdened medical systems worldwide and had a major impact on travel, productivity, and the world economy. Unsupervised machine learning methods can compress, characterize, and visualize unlabelled data. This paper presents a framework that uses unsupervised machine learning methods to discriminate and visualize the associations between major COVID-19 variants based on their genome sequences. These methods comprise a combination of selected dimensionality reduction and clustering techniques. The framework processes the RNA sequences by performing a k-mer analysis on the data and then visualizes and compares the results using selected dimensionality reduction methods: principal component analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Our framework also employs agglomerative hierarchical clustering to visualize, using dendrograms, the mutational differences among major variants of concern and the country-wise mutational differences for selected variants (Delta and Omicron). We find that the proposed framework can effectively distinguish between the major variants and has the potential to identify emerging variants in the future.
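The k-mer analysis step mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `kmer_counts` and `kmer_vector`, the choice of k, and the handling of ambiguous bases are all assumptions made for illustration. Each genome is turned into a fixed-length vector of normalized k-mer frequencies, which is the kind of input that PCA, t-SNE, UMAP, or hierarchical clustering could then consume.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence (hypothetical helper).

    RNA input is mapped to the DNA alphabet (U -> T) so that both
    notations give the same counts.
    """
    seq = seq.upper().replace("U", "T")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(seq, k=3):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Only k-mers made of A, C, G, T are kept; windows containing an
    ambiguous base such as N are ignored (an assumption of this sketch).
    """
    counts = kmer_counts(seq, k)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    # Toy sequence, not real viral data.
    vec = kmer_vector("ACGTACGT", k=2)
    print(len(vec), round(sum(vec), 6))
```

With k-mers of length k the vector has 4^k entries regardless of genome length, so sequences of different lengths become directly comparable without alignment.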
Recent years have seen the development of computational tools to assist researchers in performing CRISPR-Cas9 experiments optimally. More specifically, these tools aim to maximize on-target activity (guide efficiency) while also minimizing potential off-target effects (guide specificity) by analyzing the features of the target site. Nonetheless, currently available tools cannot robustly predict experimental success, as prediction accuracy depends on the approximations of the underlying model and how closely the experimental setup matches the data the model was trained on. Here, we present an overview of the available computational tools, their current limitations and future considerations. We discuss new trends around personalized health, in which genomic variants are taken into account when predicting target sites, and other governing factors that can improve prediction accuracy.
Abstract
Motivation
Despite being essential for numerous clinical and research applications, high-resolution human leukocyte antigen (HLA) typing remains challenging, and laboratory tests are also time-consuming and labour intensive. With next-generation sequencing data becoming widely accessible, on-demand in silico HLA typing offers an economical and efficient alternative.
Results
In this study we evaluate the HLA typing accuracy and efficiency of five computational HLA typing methods by comparing their predictions against a curated set of > 1000 published polymerase chain reaction-derived HLA genotypes on three different data sets (whole genome sequencing, whole exome sequencing and transcriptomic sequencing data). The highest accuracy at clinically relevant resolution (four digits) we observe is 81% on RNAseq data by PHLAT and 99% accuracy by OptiType when limited to Class I genes only. We also observed variability between the tools for resource consumption, with runtime ranging from an average of 5 h (HLAminer) to 7 min (seq2HLA) and memory from 12.8 GB (HLA-VBSeq) to 0.46 GB (HLAminer) per sample.
While a minimal coverage is required, other factors also determine prediction accuracy and the results between tools do not correlate well. Therefore, by combining tools, there is the potential to develop a highly accurate ensemble method that is able to deliver fast, economical HLA typing from existing sequencing data.
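To make the notion of accuracy at four-digit resolution concrete, the following sketch truncates HLA allele names to their first two fields and counts how many reference alleles a prediction recovers. It is only an illustration under stated assumptions: the helper names `to_four_digit` and `allele_accuracy` are hypothetical, the example genotypes are invented, and the study's actual scoring procedure may differ.

```python
from collections import Counter


def to_four_digit(allele):
    """Truncate an HLA allele name to its first two fields.

    Example: "A*02:01:01:01" becomes "A*02:01".
    """
    gene, _, fields = allele.partition("*")
    return f"{gene}*{':'.join(fields.split(':')[:2])}"


def allele_accuracy(predicted, reference):
    """Fraction of reference alleles matched at four-digit resolution.

    Both arguments are sequences of genotypes, one per sample and gene,
    each genotype being an unordered pair of allele names.
    """
    matched = total = 0
    for pred, ref in zip(predicted, reference):
        p = Counter(to_four_digit(a) for a in pred)
        r = Counter(to_four_digit(a) for a in ref)
        matched += sum((p & r).values())  # multiset intersection
        total += sum(r.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    # Invented genotypes, for illustration only.
    predicted = [("A*02:01:01", "A*01:01"), ("B*07:02", "B*08:01")]
    reference = [("A*01:01:01:01", "A*02:01"), ("B*07:02", "B*15:01")]
    print(allele_accuracy(predicted, reference))
```

Comparing genotypes as multisets means the order of the two alleles does not matter and a homozygous call is only fully credited when both copies match.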
Editing individual nucleotides is a crucial component for validating genomic disease association. It is currently hampered by CRISPR-Cas-mediated "base editing" being limited to certain nucleotide changes, and only achievable within a small window around CRISPR-Cas target sites. The more versatile alternative, HDR (homology directed repair), has a 3-fold lower efficiency with known optimization factors being largely immutable in experiments. Here, we investigated the variable efficiency-governing factors on a novel mouse dataset using machine learning. We found the sequence composition of the single-stranded oligodeoxynucleotide (ssODN), i.e. the repair template, to be a governing factor. Furthermore, different regions of the ssODN have variable influence, which reflects the underlying mechanism of the repair process. Our model improves HDR efficiency by 83% compared to traditionally chosen targets. Using our findings, we developed CUNE (Computational Universal Nucleotide Editor), which enables users to identify and design the optimal targeting strategy using traditional base editing or, for the first time, HDR-mediated nucleotide changes.
Precise genomic modification using prime editing (PE) holds enormous potential for research and clinical applications. In this study, we generated all-in-one prime editing (PEA1) constructs that carry all the components required for PE, along with a selection marker. We tested these constructs (with selection) in HEK293T, K562, HeLa and mouse embryonic stem (ES) cells. We discovered that PE efficiency in HEK293T cells was much higher than previously observed, reaching up to 95% (mean 67%). The efficiency in K562 and HeLa cells, however, remained low. To improve PE efficiency in K562 and HeLa, we generated a nuclease prime editor and tested this system in these cell lines as well as mouse ES cells. PE-nuclease greatly increased prime editing initiation; however, installation of the intended edits was often accompanied by extra insertions derived from the repair template. Finally, we show that zygotic injection of the nuclease prime editor can generate correct modifications in mouse fetuses with up to 100% efficiency.
Natural variations in a genome can drastically alter the CRISPR-Cas9 off-target landscape by creating or removing sites. Despite the resulting potential side-effects from such unaccounted-for sites, current off-target detection pipelines are not equipped to include variant information. To address this, we developed VARiant-aware detection and SCoring of Off-Targets (VARSCOT).
VARSCOT identifies only 0.6% of off-targets to be common between 4 individual genomes and the reference, with an average of 82% of off-targets unique to an individual. VARSCOT is the most sensitive detection method for off-targets, finding 40 to 70% more experimentally verified off-targets compared to other popular software tools and its machine learning model allows for CRISPR-Cas9 concentration aware off-target activity scoring.
VARSCOT allows researchers to take genomic variation into account when designing individual or population-wide targeting strategies. VARSCOT is available from https://github.com/BauerLab/VARSCOT .
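How a single variant can create an off-target site can be shown with a small sketch. This is not VARSCOT's algorithm: the helper names, the mismatch threshold of 3, the NGG PAM check, and the example sequences are all assumptions chosen for illustration.

```python
def mismatches(guide, site):
    """Number of positions at which guide and site differ (Hamming distance)."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def has_ngg_pam(pam):
    """True if the 3-nt sequence matches the NGG pattern."""
    return len(pam) == 3 and pam.upper()[1:] == "GG"


def is_candidate_off_target(guide, site, pam, max_mismatches=3):
    """Flag a site as a candidate off-target under a simple threshold rule.

    The threshold is an assumption of this sketch, not a value from the source.
    """
    return has_ngg_pam(pam) and mismatches(guide, site) <= max_mismatches


if __name__ == "__main__":
    # Invented 20-nt sequences, for illustration only.
    guide = "GACGTTACCGGATCAAGCTT"
    reference_site = "TACGTAACCGCATCAGGCTT"  # 4 mismatches to the guide
    variant_site = "GACGTAACCGCATCAGGCTT"    # a SNP at position 1 removes one mismatch
    print(is_candidate_off_target(guide, reference_site, "TGG"))
    print(is_candidate_off_target(guide, variant_site, "TGG"))
```

In this toy case the reference genome would not report the site, while the individual carrying the variant has a site within the threshold, which is the kind of difference a variant-aware search is meant to capture.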
Pre‐clinical responses to fast‐moving infectious disease outbreaks heavily depend on choosing the best isolates for animal models that inform diagnostics, vaccines and treatments. Current approaches are driven by practical considerations (e.g. first available virus isolate) rather than a detailed analysis of the characteristics of the virus strain chosen, which can lead to animal models that are not representative of the circulating or emerging clusters. Here, we suggest a combination of epidemiological, experimental and bioinformatic considerations when choosing virus strains for animal model generation. We discuss the currently chosen SARS‐CoV‐2 strains for international coronavirus disease (COVID‐19) models in the context of their phylogeny as well as in a novel alignment‐free bioinformatic approach. Unlike phylogenetic trees, which focus on individual shared mutations, this new approach assesses genome‐wide co‐developing functionalities and hence offers a more fluid view of the ‘cloud of variances’ that RNA viruses are prone to accumulate. This joint approach concludes that while the current animal models cover the existing viral strains adequately, there is substantial evolutionary activity that is likely not considered by the current models. Based on insights from the non‐discrete alignment‐free approach and experimental observations, we suggest isolates for future animal models.
•Detecting integrations of viral and vector genomes is critical in many fields.
•Isling is the first tool identifying integrations of both virus and vector genomes.
•Isling is up to 170% more accurate and 1.6-fold faster than other software.
•Isling enables comparisons of wild-type virus and gene therapy vector integrations.
Detecting viral and vector integration events is a key step when investigating interactions between viral and host genomes. This is relevant in several fields, including virology, cancer research and gene therapy. For example, investigating integrations of wild-type viruses such as human papillomavirus and hepatitis B virus has proven to be crucial for understanding the role of these integrations in cancer. Furthermore, identifying the extent of vector integration is vital for determining the potential for genotoxicity in gene therapies. To address these questions, we developed isling, the first tool specifically designed for identifying viral integrations in both wild-type and vector from next-generation sequencing data. Isling addresses complexities in integration behaviour including integration of fragmented genomes and integration junctions with ambiguous locations in a host or vector genome, and can also flag possible vector recombinations. We show that isling is up to 1.6-fold faster and up to 170% more accurate than other viral integration tools, and performs well on both simulated and real datasets. Isling is therefore an efficient and application-agnostic tool that will enable a broad range of investigations into viral and vector integration. These include comparisons between integrations of wild-type viruses and gene therapy vectors, as well as assessing the genotoxicity of vectors and understanding the role of viruses in cancer.
Being able to link clinical outcomes to SARS‐CoV‐2 virus strains is a critical component of understanding COVID‐19. Here, we discuss how current processes hamper sustainable data collection to enable meaningful analysis and insights. Following the ‘Fast Healthcare Interoperable Resource’ (FHIR) implementation guide, we introduce an ontology‐based standard questionnaire to overcome these shortcomings and describe patient 'journeys' in coordination with the World Health Organization's recommendations. We identify steps in the clinical health data acquisition cycle and workflows that likely have the biggest impact in the data‐driven understanding of this virus. Specifically, we recommend detailed symptoms and medical history using the FHIR standards. We have taken the first steps towards this by making patient status mandatory in GISAID (‘Global Initiative on Sharing All Influenza Data’), immediately resulting in a measurable increase in the fraction of cases with useful patient information. The main remaining limitation is the lack of controlled vocabulary or a medical ontology.
Abstract
In silico predictions combined with in vitro, in vivo, and in situ observations collectively suggest that mouse adaptation of the severe acute respiratory syndrome 2 virus requires an aromatic substitution in position 501 or position 498 (but not both) of the spike protein’s receptor binding domain. This effect could be enhanced by mutations in positions 417, 484, and 493 (especially K417N, E484K, Q493K, and Q493R), and to a lesser extent by mutations in positions 486 and 499 (such as F486L and P499T). Such enhancements, due to more favorable binding interactions with residues on the complementary angiotensin-converting enzyme 2 interface, are, however, unlikely to sustain mouse infectivity on their own based on theoretical and experimental evidence to date. Our current understanding thus points to the Alpha, Beta, Gamma, and Omicron variants of concern infecting mice, whereas Delta and “Delta Plus” lack a similar biomolecular basis to do so. This paper identifies 11 countries (Brazil, Chile, Djibouti, Haiti, Malawi, Mozambique, Reunion, Suriname, Trinidad and Tobago, Uruguay, and Venezuela) where targeted local field surveillance of mice is encouraged because they may have come in contact with humans who had the virus with adaptive mutation(s). It also provides a systematic methodology to analyze the potential for other animal reservoirs and their likely locations.