The Human Gene Mutation Database (HGMD
®
) constitutes a comprehensive collection of published germline mutations in nuclear genes that underlie, or are closely associated with human inherited ...disease. At the time of writing (March 2017), the database contained in excess of 203,000 different gene lesions identified in over 8000 genes manually curated from over 2600 journals. With new mutation entries currently accumulating at a rate exceeding 17,000 per annum, HGMD represents de facto the central unified gene/disease-oriented repository of heritable mutations causing human genetic disease used worldwide by researchers, clinicians, diagnostic laboratories and genetic counsellors, and is an essential tool for the annotation of next-generation sequencing data. The public version of HGMD (
http://www.hgmd.org
) is freely available to registered users from academic institutions and non-profit organisations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via QIAGEN Inc.
The Human Gene Mutation Database (HGMD
®
) constitutes a comprehensive collection of published germline mutations in nuclear genes that are thought to underlie, or are closely associated with human ...inherited disease. At the time of writing (June 2020), the database contains in excess of 289,000 different gene lesions identified in over 11,100 genes manually curated from 72,987 articles published in over 3100 peer-reviewed journals. There are primarily two main groups of users who utilise HGMD on a regular basis; research scientists and clinical diagnosticians. This review aims to highlight how to make the most out of HGMD data in each setting.
Abstract
Summary
We present FATHMM-XF, a method for predicting pathogenic point mutations in the human genome. Drawing on an extensive feature set, FATHMM-XF outperforms competitors on benchmark ...tests, particularly in non-coding regions where the majority of pathogenic mutations are likely to be found.
Availability and implementation
The FATHMM-XF web server is available at http://fathmm.biocompute.org.uk/fathmm-xf/, and as tracks on the Genome Tolerance Browser: http://gtb.biocompute.org.uk. Predictions are provided for human genome version GRCh37/hg19. The data used for this project can be downloaded from: http://fathmm.biocompute.org.uk/fathmm-xf/
Supplementary information
Supplementary data are available at Bioinformatics online.
Motivation: Advances in high-throughput genotyping and next generation sequencing have generated a vast amount of human genetic variation data. Single nucleotide substitutions within protein coding ...regions are of particular importance owing to their potential to give rise to amino acid substitutions that affect protein structure and function which may ultimately lead to a disease state. Over the last decade, a number of computational methods have been developed to predict whether such amino acid substitutions result in an altered phenotype. Although these methods are useful in practice, and accurate for their intended purpose, they are not well suited for providing probabilistic estimates of the underlying disease mechanism. Results: We have developed a new computational model, MutPred, that is based upon protein sequence, and which models changes of structural features and functional sites between wild-type and mutant sequences. These changes, expressed as probabilities of gain or loss of structure and function, can provide insight into the specific molecular mechanism responsible for the disease state. MutPred also builds on the established SIFT method but offers improved classification accuracy with respect to human disease mutations. Given conservative thresholds on the predicted disruption of molecular function, we propose that MutPred can generate accurate and reliable hypotheses on the molecular basis of disease for ∼11% of known inherited disease-causing mutations. We also note that the proportion of changes of functionally relevant residues in the sets of cancer-associated somatic mutations is higher than for the inherited lesions in the Human Gene Mutation Database which are instead predicted to be characterized by disruptions of protein structure. Availability: http://mutdb.org/mutpred Contact: predrag@indiana.edu; smooney@buckinstitute.org
Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid ...substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.
Technological advances have enabled the identification of an increasingly large spectrum of single nucleotide variants within the human genome, many of which may be associated with monogenic disease ...or complex traits. Here, we propose an integrative approach, named FATHMM-MKL, to predict the functional consequences of both coding and non-coding sequence variants. Our method utilizes various genomic annotations, which have recently become available, and learns to weight the significance of each component annotation source.
We show that our method outperforms current state-of-the-art algorithms, CADD and GWAVA, when predicting the functional consequences of non-coding variants. In addition, FATHMM-MKL is comparable to the best of these algorithms when predicting the impact of coding variants. The method includes a confidence measure to rank order predictions.
The Human Gene Mutation Database (HGMD
®
) is a comprehensive collection of germline mutations in nuclear genes that underlie, or are associated with, human inherited disease. By June 2013, the ...database contained over 141,000 different lesions detected in over 5,700 different genes, with new mutation entries currently accumulating at a rate exceeding 10,000 per annum. HGMD was originally established in 1996 for the scientific study of mutational mechanisms in human genes. However, it has since acquired a much broader utility as a central unified disease-oriented mutation repository utilized by human molecular geneticists, genome scientists, molecular biologists, clinicians and genetic counsellors as well as by those specializing in biopharmaceuticals, bioinformatics and personalized genomics. The public version of HGMD (
http://www.hgmd.org
) is freely available to registered users from academic institutions/non-profit organizations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via BIOBASE GmbH.
It is widely accepted that at least 10% of all mutations causing human inherited disease disrupt splice-site consensus sequences. In contrast to splice-site mutations, the role of auxiliary ...cis-acting elements such as exonic splicing enhancers (ESE) and exonic splicing silencers (ESS) in human inherited disease is still poorly understood. Here we use a top-down approach to determine rates of loss or gain of known human exonic splicing regulatory (ESR) sequences associated with either disease-causing mutations or putatively neutral single nucleotide polymorphisms (SNPs). We observe significant enrichment toward loss of ESEs and gain of ESSs among inherited disease-causing variants relative to neutral polymorphisms, indicating that exon skipping may play a prominent role in aberrant gene regulation. Both computational and biochemical approaches underscore the relevance of exonic splicing enhancer loss and silencer gain in inherited disease. Additionally, we provide direct evidence that both SRp20 (SRSF3) and possibly PTB (PTBP1) are involved in the function of a splicing silencer that is created de novo by a total of 83 different inherited disease mutations in 67 different disease genes. Taken together, we find that ~25% (7154/27,681) of known mis-sense and nonsense disease-causing mutations alter functional splicing signals within exons, suggesting a much more widespread role for aberrant mRNA processing in causing human inherited disease than has hitherto been appreciated.
Loss-of-function genetic variants are frequently associated with severe clinical phenotypes, yet many are present in the genomes of healthy individuals. The available methods to assess the impact of ...these variants rely primarily upon evolutionary conservation with little to no consideration of the structural and functional implications for the protein. They further do not provide information to the user regarding specific molecular alterations potentially causative of disease.
To address this, we investigate protein features underlying loss-of-function genetic variation and develop a machine learning method, MutPred-LOF, for the discrimination of pathogenic and tolerated variants that can also generate hypotheses on specific molecular events disrupted by the variant. We investigate a large set of human variants derived from the Human Gene Mutation Database, ClinVar and the Exome Aggregation Consortium. Our prediction method shows an area under the Receiver Operating Characteristic curve of 0.85 for all loss-of-function variants and 0.75 for proteins in which both pathogenic and neutral variants have been observed. We applied MutPred-LOF to a set of 1142 de novo vari3ants from neurodevelopmental disorders and find enrichment of pathogenic variants in affected individuals. Overall, our results highlight the potential of computational tools to elucidate causal mechanisms underlying loss of protein function in loss-of-function variants.
http://mutpred.mutdb.org.
predrag@indiana.edu.
Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and ...deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.