We present Gene-Aware Variant INterpretation (GAVIN), a new method that accurately classifies variants for clinical diagnostic purposes. Classifications are based on gene-specific calibrations of allele frequencies from the ExAC database, likely variant impact using SnpEff, and estimated deleteriousness based on CADD scores for >3000 genes. In a benchmark on 18 clinical gene sets, we achieve a sensitivity of 91.4% and a specificity of 76.9%. This accuracy is unmatched by 12 other tools. We provide GAVIN as an online MOLGENIS service to annotate VCF files and as an open source executable for use in bioinformatic pipelines. It can be found at http://molgenis.org/gavin.
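To make the idea of gene-specific calibration concrete, the following minimal Python sketch classifies variants using per-gene cutoffs on allele frequency, predicted impact, and CADD score, and then computes sensitivity and specificity against known labels. The `GeneCalibration` structure, the gene names, all threshold values, and the order of the rules are illustrative assumptions. They are not GAVIN's published calibrations or its actual decision logic.

```python
from dataclasses import dataclass

# SnpEff impact categories, ordered from least to most severe.
IMPACT_RANK = {"MODIFIER": 0, "LOW": 1, "MODERATE": 2, "HIGH": 3}


@dataclass(frozen=True)
class GeneCalibration:
    """Hypothetical per-gene thresholds; the values used below are made up."""
    max_pathogenic_af: float    # above this allele frequency, call benign
    min_pathogenic_impact: str  # lowest impact still compatible with pathogenic
    cadd_threshold: float       # CADD score at or above which we call pathogenic


def classify(af, impact, cadd, cal):
    """Classify one variant as 'benign' or 'pathogenic' with gene-specific cutoffs."""
    if af > cal.max_pathogenic_af:
        return "benign"
    if IMPACT_RANK[impact] < IMPACT_RANK[cal.min_pathogenic_impact]:
        return "benign"
    return "pathogenic" if cadd >= cal.cadd_threshold else "benign"


def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for two parallel lists of labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == "pathogenic" and p == "pathogenic" for t, p in pairs)
    fn = sum(t == "pathogenic" and p == "benign" for t, p in pairs)
    tn = sum(t == "benign" and p == "benign" for t, p in pairs)
    fp = sum(t == "benign" and p == "pathogenic" for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical calibrations for two placeholder genes.
calibrations = {
    "GENE_A": GeneCalibration(0.001, "MODERATE", 25.0),
    "GENE_B": GeneCalibration(0.0001, "HIGH", 15.0),
}

# (gene, allele frequency, impact, CADD score, expert label); invented examples.
variants = [
    ("GENE_A", 0.0002, "MODERATE", 28.0, "pathogenic"),
    ("GENE_A", 0.0200, "HIGH", 35.0, "benign"),
    ("GENE_B", 0.00005, "MODERATE", 30.0, "benign"),
    ("GENE_B", 0.00001, "HIGH", 12.0, "pathogenic"),
]

truth = [label for *_, label in variants]
predicted = [classify(af, imp, cadd, calibrations[g]) for g, af, imp, cadd, _ in variants]
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The point of the sketch is that the same CADD score can lead to different calls in different genes, because each gene carries its own cutoffs.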
The diagnostic yield of exome and genome sequencing remains low (8-70%), due to incomplete knowledge of the genes that cause disease. To improve this, we use RNA-seq data from 31,499 samples to predict which genes cause specific disease phenotypes, and develop GeneNetwork Assisted Diagnostic Optimization (GADO). We show that this unbiased method, which does not rely upon specific knowledge of individual genes, is effective both in identifying previously unknown disease gene associations and in flagging genes that have previously been incorrectly implicated in disease. GADO can be run on www.genenetwork.nl by supplying HPO-terms and a list of genes that contain candidate variants. Finally, applying GADO to a cohort of 61 patients for whom exome-sequencing analysis had not resulted in a genetic diagnosis yields likely causative genes for ten cases.
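The inputs described above (a set of HPO terms and a list of candidate genes) suggest a simple ranking step, sketched here in Python. The sketch assumes that each HPO term comes with a per-gene prediction z-score and that scores across terms are combined as sum divided by the square root of the number of terms (Stouffer's method). Both the combination rule and every score below are assumptions for illustration, as are the placeholder gene names; the text does not specify how GADO combines evidence.

```python
import math


def prioritize(candidate_genes, hpo_terms, z_scores):
    """Rank candidate genes by a combined z-score over the patient's HPO terms.

    z_scores maps HPO term -> {gene: z-score}. Genes without any score for the
    given terms are left out of the ranking.
    """
    ranked = []
    for gene in candidate_genes:
        zs = [z_scores[t][gene] for t in hpo_terms if gene in z_scores.get(t, {})]
        if not zs:
            continue
        combined = sum(zs) / math.sqrt(len(zs))  # Stouffer-style combination (assumed)
        ranked.append((gene, combined))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Invented prediction scores for two example HPO term identifiers.
z_scores = {
    "HP:0001250": {"GENE_A": 3.0, "GENE_B": 0.5},
    "HP:0001263": {"GENE_A": 1.0, "GENE_B": -0.5, "GENE_C": 2.0},
}

ranking = prioritize(
    candidate_genes=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    hpo_terms=["HP:0001250", "HP:0001263"],
    z_scores=z_scores,
)
for gene, score in ranking:
    print(gene, round(score, 3))
```

Genes that harbour a candidate variant but have no prediction for the patient's terms (here `GENE_D`) simply drop out, which mirrors the idea of using the phenotype to narrow a variant list.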
Rapid diagnostic whole-genome sequencing has been explored in critically ill newborns, hoping to improve their clinical care and replace time-consuming and/or invasive diagnostic testing. A previous retrospective study in a research setting showed promising results with diagnoses in 57%, but patients were highly selected for known and likely Mendelian disorders. The aim of our prospective study was to assess the speed and yield of rapid targeted genomic diagnostics for clinical application.
We included 23 critically ill children younger than 12 months in ICUs over a period of 2 years. A quick diagnosis could not be made after routine clinical evaluation and diagnostics. Targeted analysis of 3426 known disease genes was performed by using whole-genome sequencing data. We measured diagnostic yield, turnaround times, and clinical consequences.
A genetic diagnosis was obtained in 7 patients (30%), with a median turnaround time of 12 days (ranging from 5 to 23 days). We identified compound heterozygous mutations in the genes associated with Vici syndrome, combined oxidative phosphorylation deficiency-11, and vanishing white matter, and homozygous mutations in the genes associated with nemaline myopathy, progressive mitochondrial myopathy, and GM1-gangliosidosis. In addition, a 1p36.33p36.32 microdeletion was detected in a child with cardiomyopathy.
Rapid targeted genomics combined with copy number variant detection adds important value in the neonatal and pediatric intensive care setting. It led to a fast diagnosis in 30% of critically ill children for whom the routine clinical workup was unsuccessful.
Epidermolysis bullosa is a group of genetic skin conditions characterized by abnormal skin (and mucosal) fragility caused by pathogenic variants in various genes. The disease severity ranges from early childhood mortality in the most severe types to occasional acral blistering in the mildest types. The subtype and severity of EB is linked to the gene involved and the specific variants in that gene, which also determine its mode of inheritance. Current treatment is mainly focused on symptomatic relief such as wound care and blister prevention, because truly curative treatment options are still at the preclinical stage. Given the current level of understanding, the broad spectrum of genes and variants underlying EB makes it impossible to develop a single treatment strategy for all patients. It is likely that many different variant-specific treatment strategies will be needed to ultimately treat all patients. Antisense-oligonucleotide (ASO)-mediated exon skipping aims to counteract pathogenic sequence variants by restoring the open reading frame through the removal of the mutant exon from the pre-messenger RNA. This should lead to the restored production of the protein absent in the affected skin and, consequently, improvement of the phenotype. Several preclinical studies have demonstrated that exon skipping can restore protein production in vitro, in skin equivalents, and in skin grafts derived from EB-patient skin cells, indicating that ASO-mediated exon skipping could be a viable strategy as a topical or systemic treatment. The potential value of exon skipping for EB is supported by a study showing reduced phenotypic severity in patients who carry variants that result in natural exon skipping. In this article, we review the substantial progress made on exon skipping for EB in the past 15 years and highlight the opportunities and current challenges of this RNA-based therapy approach. In addition, we present a prioritization strategy for the development of exon skipping based on genomic information of all EB-involved genes.
Allele specific expression (ASE) concerns divergent expression quantity of alternative alleles and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA sequencing and therefore neglects gene expression regulation such as ASE. To take advantage of ASE in the absence of RNA sequencing, it must be predicted using only DNA variation. We have constructed ASE models from BIOS (n = 3432) and GTEx (n = 369) that predict ASE using DNA features. These models are highly reproducible and comprise many different feature types, highlighting the complex regulation that underlies ASE. We applied the BIOS-trained model to population variants in three genes in which ASE plays a clinically relevant role: BRCA2, RET and NF1. This resulted in predicted ASE effects for 27 variants, of which 10 were known pathogenic variants. We demonstrated that ASE can be predicted from DNA features using machine learning. Future efforts may improve sensitivity and translate these models into a new type of genome diagnostic tool that prioritizes candidate pathogenic variants or regulators thereof for follow-up validation by RNA sequencing. All code and machine learning models used are available at GitHub and Zenodo.
ABSTRACT
Next‐generation sequencing in clinical diagnostics is providing valuable genomic variant data, which can be used to support healthcare decisions. In silico tools to predict pathogenicity are crucial to assess such variants and we have evaluated a new tool, Combined Annotation Dependent Depletion (CADD), and its classification of gene variants in Lynch syndrome by using a set of 2,210 DNA mismatch repair gene variants. These had already been classified by experts from InSiGHT's Variant Interpretation Committee. Overall, we found CADD scores do predict pathogenicity (Spearman's ρ = 0.595, P < 0.001). However, we discovered 31 major discrepancies between the InSiGHT classification and the CADD scores; these were explained in favor of the expert classification using population allele frequencies, cosegregation analyses, disease association studies, or a second‐tier test. Of 751 variants that could not be clinically classified by InSiGHT, CADD indicated that 47 variants were worth further study to confirm their putative pathogenicity. We demonstrate CADD is valuable in prioritizing variants in clinically relevant genes for further assessment by expert classification teams.
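For readers unfamiliar with the statistic reported above, Spearman's ρ is the Pearson correlation of the ranks of two variables, with tied values sharing their average rank. The following self-contained Python sketch computes it for a handful of invented variants. Encoding the expert classification as ordinal classes 1 to 5 is an assumption made for illustration, and the CADD scores shown are made up rather than taken from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Invented example: ordinal expert classes and CADD scores for six variants.
expert_class = [1, 1, 2, 3, 5, 5]
cadd_score = [3.2, 11.0, 9.5, 18.4, 24.1, 33.0]

print(round(spearman_rho(expert_class, cadd_score), 3))
```

Because only ranks enter the calculation, the statistic measures whether higher expert classes tend to go with higher CADD scores, without assuming a linear relationship between them.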
In silico estimation of pathogenicity is becoming a necessity to interpret the number of variants produced by current high‐throughput DNA sequencing. Here we benchmark the promising new CADD method against a gold standard set of curated mismatch repair gene variants classified by the InSiGHT consortium and investigate any discrepancies we find.
ABSTRACT
Microvillus inclusion disease (MVID) is one of the most severe congenital intestinal disorders and is characterized by neonatal secretory diarrhea and the inability to absorb nutrients from the intestinal lumen. MVID is associated with patient‐, family‐, and ancestry‐unique mutations in the MYO5B gene, encoding the actin‐based motor protein myosin Vb. Here, we review the MYO5B gene and all currently known MYO5B mutations and for the first time methodologically categorize these with regard to functional protein domains and recurrence in MYO7A associated with Usher syndrome and other myosins. We also review animal models for MVID and the latest data on functional studies related to the myosin Vb protein. To congregate existing and future information on MVID geno‐/phenotypes and facilitate its quick and easy sharing among clinicians and researchers, we have constructed an online MOLGENIS‐based international patient registry (www.MVID-central.org). This easily accessible database currently contains detailed information on 137 MVID patients together with reported clinical/phenotypic details and 41 unique MYO5B mutations, several of which are unpublished. The future expansion and prospective nature of this registry are expected to improve disease diagnosis, prognosis, and genetic counseling.
Microvillus Inclusion Disease is a severe congenital intestinal disorder characterized by MYO5B mutations. In this article we review and categorize all known MYO5B mutations. We also present an online registry for Microvillus Inclusion Disease patients and their MYO5B mutations, the future expansion of which is expected to improve disease diagnosis, prognosis, and genetic counseling.
Abstract
Motivation
The volume and complexity of biological data are increasing rapidly. Many clinical professionals and biomedical researchers without a bioinformatics background are generating big '-omics' data, but do not always have the tools to manage, process or publicly share these data.
Results
Here we present MOLGENIS Research, an open-source web-application to collect, manage, analyze, visualize and share large and complex biomedical datasets, without the need for advanced bioinformatics skills.
Availability and implementation
MOLGENIS Research is freely available (open source software). It can be installed from source code (see http://github.com/molgenis), downloaded as a precompiled WAR file (for your own server), set up inside a Docker container (see http://molgenis.github.io), or requested as a Software-as-a-Service subscription. For a public demo instance and complete installation instructions see http://molgenis.org/research.
Rare disease patient data are typically sensitive, present in multiple registries controlled by different custodians, and non-interoperable. Making these data Findable, Accessible, Interoperable, and Reusable (FAIR) for humans and machines at source enables federated discovery and analysis across data custodians. This facilitates accurate diagnosis, optimal clinical management, and personalised treatments. In Europe, twenty-four European Reference Networks (ERNs) work on rare disease registries in different clinical domains. The process and the implementation choices for making data FAIR ('FAIRification') differ among ERN registries. For example, registries use different software systems and are subject to different legal regulations. To support the ERNs in making informed decisions and to harmonise FAIRification, the FAIRification steward team was established to work as liaisons between ERNs and researchers from the European Joint Programme on Rare Diseases.
The FAIRification steward team inventoried the FAIRification challenges of the ERN registries and proposed solutions collectively with involved stakeholders to address them. Ninety-eight FAIRification challenges from 24 ERNs' registries were collected and categorised into "training" (31), "community" (9), "modelling" (12), "implementation" (26), and "legal" (20). After curating and aggregating highly similar challenges, 41 unique FAIRification challenges remained. The two categories with the most challenges were "training" (15) and "implementation" (9), followed by "community" (7), and then "modelling" (5) and "legal" (5). To address all challenges, eleven types of solutions were proposed. Among them, the provision of guidelines and the organisation of training activities, which ranged from less-technical "coffee-rounds" to technical workshops and from informal FAIR Games to formal hackathons, resolved the "training" challenges. Obtaining implementation support from technical experts was the solution type for tackling the "implementation" challenges.
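The category counts reported above can be checked with a few lines of Python. The sketch below uses only the numbers given in the text; the variable names are arbitrary, and the "share kept" column is a derived illustration rather than a figure from the study.

```python
# Challenges as collected from the registries, per category.
collected = {
    "training": 31,
    "community": 9,
    "modelling": 12,
    "implementation": 26,
    "legal": 20,
}

# Unique challenges remaining after curation and aggregation, per category.
unique = {
    "training": 15,
    "implementation": 9,
    "community": 7,
    "modelling": 5,
    "legal": 5,
}

total_collected = sum(collected.values())
total_unique = sum(unique.values())

# Categories ordered by number of unique challenges, largest first.
ranking = sorted(unique, key=unique.get, reverse=True)

print(f"collected: {total_collected}, unique: {total_unique}")
for category in ranking:
    kept = unique[category] / collected[category]
    print(f"{category:15s} {collected[category]:3d} -> {unique[category]:3d} ({kept:.0%} kept)")
```

The per-category counts add up to the stated totals of 98 collected and 41 unique challenges.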
This work shows that a dedicated team of FAIR data stewards is an asset for harmonising the various processes of making data FAIR in a large organisation with multiple stakeholders. Additionally, multi-levelled training activities are required to accommodate the diverse needs of the ERNs. Finally, the lessons learned from the experience of the FAIRification steward team described in this paper may help to increase FAIR awareness and provide insights into FAIRification challenges and solutions of rare disease registries.
Exome sequencing is now mainstream in clinical practice. However, identification of pathogenic Mendelian variants remains time-consuming, in part because the limited accuracy of current computational prediction methods requires manual classification by experts. Here we introduce CAPICE, a new machine-learning-based method for prioritizing pathogenic variants, including SNVs and short InDels. CAPICE outperforms the best general (CADD, GAVIN) and consequence-type-specific (REVEL, ClinPred) computational prediction methods, for both rare and ultra-rare variants. CAPICE is easily added to diagnostic pipelines as a pre-computed score file or command-line software, or via an online MOLGENIS web service with an API. Download CAPICE for free and open-source (LGPLv3) at
Keywords: Variant pathogenicity prediction, Machine learning, Exome sequencing, Molecular consequence, Allele frequency, Clinical genetics, Genome diagnostics