The growing availability of human genetic variation has given rise to novel methods of measuring genetic tolerance that better interpret variants of unknown significance. We recently developed a concept based on protein domain homology in the human genome to improve variant interpretation. For this purpose, we mapped population variation from the Exome Aggregation Consortium (ExAC) and pathogenic mutations from the Human Gene Mutation Database (HGMD) onto Pfam protein domains. The aggregation of these variation data across homologous domains into meta‐domains allowed us to generate amino acid resolution of genetic intolerance profiles for human protein domains. Here, we developed MetaDome, a fast and easy‐to‐use web server that visualizes meta‐domain information and gene‐wide profiles of genetic tolerance. We updated the underlying data of MetaDome to contain information from 56,319 human transcripts, 71,419 protein domains, 12,164,292 genetic variants from gnomAD, and 34,076 pathogenic mutations from ClinVar. MetaDome allows researchers to easily investigate their variants of interest for the presence or absence of variation at corresponding positions within homologous domains. We illustrate the added value of MetaDome by an example that highlights how it may help in the interpretation of variants of unknown significance. The MetaDome web server is freely accessible at https://stuart.radboudumc.nl/metadome.
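The aggregation step described above, pooling variants across homologous domain instances by aligned position, can be illustrated with a minimal sketch. This is not MetaDome's implementation: the function name `aggregate_meta_domain`, the input layout, and the gene labels `GENE_A` and `GENE_B` are hypothetical, and the toy data are invented purely for illustration.

```python
from collections import defaultdict


def aggregate_meta_domain(domain_instances):
    """Pool variant counts per aligned (consensus) position across domain instances.

    Each instance is a dict with hypothetical keys:
      'gene'     -> label of the gene carrying this domain occurrence
      'variants' -> list of (consensus_position, kind) tuples, where kind is
                    'population' or 'pathogenic'
    Returns a dict: consensus_position -> {'population': n, 'pathogenic': m}.
    """
    profile = defaultdict(lambda: {"population": 0, "pathogenic": 0})
    for instance in domain_instances:
        for position, kind in instance["variants"]:
            profile[position][kind] += 1
    return dict(profile)


# Toy data (assumed, not from the paper): two homologous domain occurrences.
instances = [
    {"gene": "GENE_A", "variants": [(12, "population"), (15, "pathogenic")]},
    {"gene": "GENE_B", "variants": [(12, "population"), (15, "pathogenic"),
                                    (15, "pathogenic")]},
]
profile = aggregate_meta_domain(instances)
```

In this toy profile, position 15 accumulates pathogenic variants from both genes and no population variants, which is the kind of signal a meta-domain view is meant to expose for a variant of unknown significance at the corresponding position in a third gene.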
For next‐generation sequencing technologies, sufficient base‐pair coverage is the foremost requirement for the reliable detection of genomic variants. We investigated whether whole‐genome sequencing (WGS) platforms offer improved coverage of coding regions compared with whole‐exome sequencing (WES) platforms, and compared single‐base coverage for a large set of exome and genome samples. We find that WES platforms have improved considerably in recent years, but at comparable sequencing depth, WGS outperforms WES in terms of covered coding regions. At higher sequencing depth (95x–160x), WES successfully captures 95% of the coding regions with a minimal coverage of 20x, compared with 98% for WGS at 87‐fold coverage. Three different assessments of sequence coverage bias showed consistent biases for WES but not for WGS. We found no clear differences between the technologies in their ability to achieve complete coverage of 2,759 clinically relevant genes. We show that WES performs comparably to WGS in terms of covered bases if sequenced at two to three times higher coverage. This does, however, come at the cost of substantially more sequencing bias in WES approaches. Our findings will guide laboratories to make an informed decision on which sequencing platform and coverage to choose.
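The "fraction of coding bases covered at a minimal depth of 20x" metric used above can be sketched as follows. This is a minimal illustration, assuming per-base depths are already available as a list of integers; the function name `fraction_covered` and the toy depth values are hypothetical and not taken from the study.

```python
def fraction_covered(depths, min_depth=20):
    """Return the fraction of positions whose read depth is at least min_depth.

    depths: per-base read depths for the target (e.g. coding) positions.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Toy example (assumed values): four coding positions, two reach 20x.
toy_depths = [25, 19, 20, 0]
toy_fraction = fraction_covered(toy_depths, min_depth=20)
```

The threshold is inclusive here, so a position at exactly 20x counts as covered; a different convention would shift the reported percentages slightly.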
Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low-quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype-to-phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods, with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.
To identify candidate genes for intellectual disability, we performed a meta-analysis on 2,637 de novo mutations, identified from the exomes of 2,104 patient-parent trios. Statistical analyses identified 10 new candidate ID genes: DLG4, PPM1D, RAC1, SMAD6, SON, SOX5, SYNCRIP, TCF20, TLK2 and TRIP12. In addition, we show that these genes are intolerant to nonsynonymous variation and that mutations in these genes are associated with specific clinical ID phenotypes.
De novo mutations (DNMs) originating in gametogenesis are an important source of genetic variation. We use a data set of 7,216 autosomal DNMs with resolved parent of origin from whole-genome sequencing of 816 parent-offspring trios to investigate differences between maternally and paternally derived DNMs and study the underlying mutational mechanisms. Our results show that the number of DNMs in offspring increases not only with paternal age, but also with maternal age, and that some genome regions show enrichment for maternally derived DNMs. We identify parent-of-origin-specific mutation signatures that become more pronounced with increased parental age, pointing to different mutational mechanisms in spermatogenesis and oogenesis. Moreover, we find DNMs that are spatially clustered to have a unique mutational signature with no significant differences between parental alleles, suggesting a different mutational mechanism. Our findings provide insights into the molecular mechanisms that underlie mutagenesis and are relevant to disease and evolution in humans.
Uniparental disomy (UPD) is the rare occurrence of two homologous chromosomes originating from the same parent and is typically identified by marker analysis or single-nucleotide polymorphism (SNP)-based microarrays. UPDs may lead to disease due to imprinting effects, underlying homozygous pathogenic variants, or low-level mosaic aneuploidies. In this study we detected clinically relevant UPD events in both trio and single exome sequencing (ES) data.
UPD was detected by applying a method based on Mendelian inheritance errors to a cohort of 4,912 ES trios (all UPD types) and by applying median absolute deviation–scaled regions of homozygosity to a cohort of 29,723 single ES samples (isodisomy only).
As positive controls, we accurately identified three mixed UPD, three isodisomy, and two segmental UPD events that had all been previously reported by SNP-based microarrays. In addition, we identified three segmental UPD and 11 isodisomy events. This resulted in a novel diagnosis based on imprinting for one patient, and adjusted genetic counseling for another patient.
UPD can easily be identified using both single and trio ES and may be clinically relevant to patients. UPD analysis should become routine in clinical ES, because it increases the diagnostic yield and could affect genetic counseling.
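The idea of flagging UPD from Mendelian inheritance errors in a trio can be sketched as below. The abstract does not give the details of the authors' method, so this is only an assumed, simplified illustration: genotypes are coded as alternate-allele counts (0, 1, 2), and the function names `classify_site` and `tally_chromosome` are hypothetical. A real pipeline would also need quality filtering and a statistical threshold per chromosome.

```python
from collections import Counter

# Alternate-allele count -> the two alleles carried (0 = reference, 1 = alternate).
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}


def is_mendelian_consistent(child, mother, father):
    """True if the child genotype can arise from one maternal plus one paternal allele."""
    return any(m + f == child for m in ALLELES[mother] for f in ALLELES[father])


def uniparental_options(parent):
    """Child genotypes possible if both copies come from this one parent.

    Covers isodisomy (two copies of one allele) and heterodisomy (both alleles).
    """
    a, b = ALLELES[parent]
    return {2 * a, 2 * b, a + b}


def classify_site(child, mother, father):
    """Label a site as consistent, maternal_only, paternal_only, or uninformative."""
    if is_mendelian_consistent(child, mother, father):
        return "consistent"
    maternal = child in uniparental_options(mother)
    paternal = child in uniparental_options(father)
    if maternal and not paternal:
        return "maternal_only"
    if paternal and not maternal:
        return "paternal_only"
    return "uninformative"


def tally_chromosome(sites):
    """Count site labels for (child, mother, father) genotype triples on one chromosome."""
    return Counter(classify_site(c, m, f) for c, m, f in sites)


# Toy chromosome (assumed genotypes): two errors point to maternal-only inheritance.
toy_sites = [(2, 2, 0), (2, 1, 0), (1, 1, 0), (1, 0, 0)]
toy_counts = tally_chromosome(toy_sites)
```

An excess of `maternal_only` or `paternal_only` sites concentrated on one chromosome, relative to the genome-wide background of genotyping errors, would be the signal of interest. A deletion of the other parent's allele produces the same pattern, so such calls would need follow-up.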
De novo mutations are recognized both as an important source of genetic variation and as a prominent cause of sporadic disease in humans. Mutations identified as de novo are generally assumed to have occurred during gametogenesis and, consequently, to be present as germline events in an individual. Because Sanger sequencing does not provide the sensitivity to reliably distinguish somatic from germline mutations, the proportion of de novo mutations that occur somatically rather than in the germline remains largely unknown. To determine the contribution of post-zygotic events to de novo mutations, we analyzed a set of 107 de novo mutations in 50 parent-offspring trios. Using four different sequencing techniques, we found that 7 (6.5%) of these presumed germline de novo mutations were in fact present as mosaic mutations in the blood of the offspring and were therefore likely to have occurred post-zygotically. Furthermore, genome-wide analysis of “de novo” variants in the proband led to the identification of 4/4,081 variants that were also detectable in the blood of one of the parents, implying parental mosaicism as the origin of these variants. Thus, our results show that an important fraction of de novo mutations presumed to be germline in fact occurred either post-zygotically in the offspring or were inherited as a consequence of low-level mosaicism in one of the parents.
Haploinsufficiency (HI) is the best characterized mechanism through which dominant mutations exert their effect and cause disease. Non-haploinsufficiency (NHI) mechanisms, such as gain-of-function and dominant-negative mechanisms, are often characterized by the spatial clustering of mutations, thereby affecting only particular regions or base pairs of a gene. Variants leading to haploinsufficiency might occasionally cluster as well, for example in critical domains, but such clustering is on the whole less pronounced, with mutations often spread throughout the gene. Here we exploit this property and develop a method to specifically identify genes with significant spatial clustering patterns of de novo mutations in large cohorts. We apply our method to a dataset of 4,061 de novo missense mutations from published exome studies of trios with intellectual disability and developmental disorders (ID/DD) and successfully identify 15 genes with clustering mutations, including 12 genes for which mutations are known to cause neurodevelopmental disorders. For 11 out of these 12, NHI mutation mechanisms have been reported. Additionally, we identify three candidate ID/DD-associated genes of which two have an established role in neuronal processes. We further observe a higher intolerance to normal genetic variation of the identified genes compared to known genes for which mutations lead to HI. Finally, 3D modeling of these mutations on their protein structures shows that 81% of the observed mutations are unlikely to affect the overall structural integrity and that they therefore most likely act through a mechanism other than HI.
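One way to test whether mutations in a gene cluster more tightly than expected by chance is a permutation test, sketched below. The abstract does not specify the statistic or null model the authors used, so the choices here are assumptions for illustration only: the mean pairwise distance between mutated positions as the statistic, uniform random placement along the protein as the null, and the hypothetical function names `mean_pairwise_distance` and `clustering_p_value`.

```python
import random
from itertools import combinations


def mean_pairwise_distance(positions):
    """Average absolute distance over all pairs of mutated positions."""
    pairs = list(combinations(positions, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def clustering_p_value(positions, protein_length, n_permutations=2000, seed=0):
    """Empirical p-value that uniformly placed mutations are at least as clustered.

    positions: 1-based residue positions of the observed mutations.
    Null model (an assumption): each mutation lands uniformly in 1..protein_length.
    """
    if len(positions) < 2:
        raise ValueError("need at least two mutations to measure clustering")
    rng = random.Random(seed)
    observed = mean_pairwise_distance(positions)
    hits = 0
    for _ in range(n_permutations):
        simulated = [rng.randint(1, protein_length) for _ in positions]
        if mean_pairwise_distance(simulated) <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


# Toy inputs (assumed): a tight cluster versus mutations spread along the protein.
p_clustered = clustering_p_value([100, 101, 102, 103, 104], protein_length=1000)
p_spread = clustering_p_value([1, 250, 500, 750, 1000], protein_length=1000)
```

A uniform null ignores local differences in mutability and coverage, so a production analysis would likely weight positions by a mutation-rate model and correct for testing many genes.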
The genetic cause underlying the development of multiple colonic adenomas, the premalignant precursors of colorectal cancer (CRC), frequently remains unresolved in patients with adenomatous polyposis. Here we applied whole-exome sequencing to 51 individuals with multiple colonic adenomas from 48 families. In seven affected individuals from three unrelated families, we identified a homozygous germline nonsense mutation in the base-excision repair (BER) gene NTHL1. This mutation was exclusively found in a heterozygous state in controls (minor allele frequency of 0.0036; n = 2,329). All three families showed recessive inheritance of the adenomatous polyposis phenotype and progression to CRC in at least one member. All three affected women developed an endometrial malignancy or premalignancy. Genetic analysis of three carcinomas and five adenomas from different affected individuals showed a non-hypermutated profile enriched for cytosine-to-thymine transitions. We conclude that a homozygous loss-of-function germline mutation in the NTHL1 gene predisposes to a new subtype of BER-associated adenomatous polyposis and CRC.
The cause of autosomal-dominant retinitis pigmentosa (adRP), which leads to loss of vision and blindness, was investigated in families lacking a molecular diagnosis. A refined locus for adRP on Chr17q22 (RP17) was delineated through genotyping and genome sequencing, leading to the identification of structural variants (SVs) that segregate with disease. Eight different complex SVs were characterized in 22 adRP-affected families with >300 affected individuals. All RP17 SVs had breakpoints within a genomic region spanning YPEL2 to LINC01476. To investigate the mechanism of disease, we reprogrammed fibroblasts from affected individuals and controls into induced pluripotent stem cells (iPSCs) and differentiated them into photoreceptor precursor cells (PPCs) or retinal organoids (ROs). Hi-C was performed on ROs, and differential expression of regional genes and a retinal enhancer RNA at this locus was assessed by qPCR. The epigenetic landscape of the region, and Hi-C RO data, showed that YPEL2 sits within its own topologically associating domain (TAD), rich in enhancers with binding sites for retinal transcription factors. The Hi-C map of RP17 ROs revealed creation of a neo-TAD with ectopic contacts between GDPD1 and retinal enhancers, and modeling of all RP17 SVs was consistent with neo-TADs leading to ectopic retinal-specific enhancer-GDPD1 accessibility. qPCR confirmed increased expression of GDPD1 and increased expression of the retinal enhancer that enters the neo-TAD. Altered TAD structure resulting in increased retinal expression of GDPD1 is the likely convergent mechanism of disease, consistent with a dominant gain of function. Our study highlights the importance of SVs as a genomic mechanism in unsolved Mendelian diseases.