Clinical exome sequencing is nondiagnostic for about 75% of patients evaluated for a possible Mendelian disorder. We examined the ability of systematic reevaluation of exome data to establish additional diagnoses.
The exome and phenotypic data of 40 individuals with previously nondiagnostic clinical exomes were reanalyzed with current software and literature.
A definitive diagnosis was identified for 4 of 40 participants (10%). In these cases the causative variant is de novo and in a relevant autosomal-dominant disease gene. The literature to tie the causative genes to the participants’ phenotypes was weak, nonexistent, or not readily located at the time of the initial clinical exome reports. At the time of diagnosis by reanalysis, the supporting literature was 1 to 3 years old.
Approximately 250 gene–disease and 9,200 variant–disease associations are reported annually. This increase in information necessitates regular reevaluation of nondiagnostic exomes. To be practical, systematic reanalysis requires further automation and more up-to-date variant databases. To maximize the diagnostic yield of exome sequencing, providers should periodically request reanalysis of nondiagnostic exomes. Accordingly, policies regarding reanalysis should be weighed in combination with factors such as cost and turnaround time when selecting a clinical exome laboratory.
Variant pathogenicity classifiers such as SIFT, PolyPhen-2, CADD, and MetaLR assist in interpretation of the hundreds of rare, missense variants in the typical patient genome by deprioritizing some variants as likely benign. These widely used methods misclassify 26 to 38% of known pathogenic mutations, which could lead to missed diagnoses if the classifiers are trusted as definitive in a clinical setting. We developed M-CAP, a clinical pathogenicity classifier that outperforms existing methods at all thresholds and correctly dismisses 60% of rare, missense variants of uncertain significance in a typical genome at 95% sensitivity.
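The idea of fixing a score cutoff at 95% sensitivity and then counting how many uncertain variants fall below it can be illustrated with a minimal sketch. This is not the M-CAP implementation: the function names and the toy scores are hypothetical, and higher scores are assumed to mean "more likely pathogenic".

```python
import math


def threshold_at_sensitivity(pathogenic_scores, sensitivity=0.95):
    """Highest cutoff that still retains at least `sensitivity` of the
    known pathogenic variants (a variant is retained if score >= cutoff)."""
    if not pathogenic_scores:
        raise ValueError("need at least one pathogenic score")
    ranked = sorted(pathogenic_scores, reverse=True)
    # Number of pathogenic variants that must stay at or above the cutoff.
    # The small epsilon guards against floating-point round-up.
    needed = max(1, math.ceil(sensitivity * len(ranked) - 1e-9))
    return ranked[needed - 1]


def fraction_dismissed(candidate_scores, cutoff):
    """Fraction of candidate variants scoring below the cutoff."""
    if not candidate_scores:
        raise ValueError("need at least one candidate score")
    below = sum(1 for score in candidate_scores if score < cutoff)
    return below / len(candidate_scores)


# Toy data (hypothetical): 20 known pathogenic variants, 5 uncertain ones.
known_pathogenic = list(range(1, 21))
uncertain = [0, 1, 1, 3, 5]

cutoff = threshold_at_sensitivity(known_pathogenic)
dismissed = fraction_dismissed(uncertain, cutoff)
```

Choosing the cutoff from known pathogenic variants first, and only then measuring how many uncertain variants are dismissed, keeps the sensitivity guarantee independent of the variants being filtered.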
Manual review of aligned reads for confirmation and interpretation of variant calls is an important step in many variant calling pipelines for next-generation sequencing (NGS) data. Visual inspection can greatly increase the confidence in calls, reduce the risk of false positives, and help characterize complex events. The Integrative Genomics Viewer (IGV) was one of the first tools to provide NGS data visualization, and it currently provides a rich set of tools for inspection, validation, and interpretation of NGS datasets, as well as other types of genomic data. Here, we present a short overview of IGV's variant review features for both single-nucleotide variants and structural variants, with examples from both cancer and germline datasets. IGV is freely available at https://www.igv.org.
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
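Precision and recall against a benchmark set, as quoted above, reduce to counting shared and unshared variants. The sketch below is a simplified illustration, assuming variants can be compared by exact match on a (chromosome, position, ref, alt) tuple; real benchmarking tools also reconcile different representations of the same variant, which this example ignores. All names and variants here are hypothetical.

```python
def precision_recall(called, truth):
    """Compare a call set with a benchmark set by exact variant identity.

    precision = true positives / all calls
    recall    = true positives / all benchmark variants
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy benchmark and call sets (hypothetical variants).
truth_set = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "GA"),
    ("chr2", 900, "T", "C"),
    ("chr5", 10, "C", "G"),
}
call_set = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "GA"),
    ("chr3", 40, "A", "T"),  # not in the benchmark: a false positive
}

precision, recall = precision_recall(call_set, truth_set)
```

Note that a call absent from the benchmark counts against precision even when the benchmark itself is wrong, which is why discordances are worth reviewing before being treated as errors.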
We developed the Genomic Regions Enrichment of Annotations Tool (GREAT) to analyze the functional significance of cis-regulatory regions identified by localized measurements of DNA binding events across an entire genome. Whereas previous methods took into account only binding proximal to genes, GREAT is able to properly incorporate distal binding sites and control for false positives using a binomial test over the input genomic regions. GREAT incorporates annotations from 20 ontologies and is available as a web application. Applying GREAT to data sets from chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) of multiple transcription-associated factors, including SRF, NRSF, GABP, Stat3 and p300 in different developmental contexts, we recover many functions of these factors that are missed by existing gene-based tools, and we generate testable hypotheses. The utility of GREAT is not limited to ChIP-seq, as it could also be applied to open chromatin, localized epigenomic markers and similar functional data sets, as well as comparative genomics sets.
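A binomial test over input regions can be sketched as follows. This is an illustration under stated assumptions, not GREAT's code: it assumes that the regulatory domains of genes carrying a given annotation cover a known fraction of the genome, and asks how surprising it is that at least `n_hits` of `n_regions` input regions land in those domains. The function and parameter names are hypothetical.

```python
from math import comb


def binomial_enrichment_pvalue(n_regions, n_hits, annotated_fraction):
    """Upper-tail binomial probability P(X >= n_hits) for
    X ~ Binomial(n_regions, annotated_fraction)."""
    if not 0.0 <= annotated_fraction <= 1.0:
        raise ValueError("annotated_fraction must lie in [0, 1]")
    if not 0 <= n_hits <= n_regions:
        raise ValueError("n_hits must lie in [0, n_regions]")
    p = annotated_fraction
    # Sum the probability of every outcome at least as extreme as observed.
    return sum(
        comb(n_regions, i) * p**i * (1.0 - p) ** (n_regions - i)
        for i in range(n_hits, n_regions + 1)
    )


# Toy case (hypothetical): 2 input regions, annotation covers half the genome.
p_value = binomial_enrichment_pvalue(n_regions=2, n_hits=1, annotated_fraction=0.5)
```

Because the null expectation is set by the fraction of the genome covered rather than by the number of genes, annotations attached to genes with large regulatory domains are not favored simply for occupying more sequence.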
Humans differ from other animals in many aspects of anatomy, physiology, and behaviour; however, the genotypic basis of most human-specific traits remains unknown. Recent whole-genome comparisons have made it possible to identify genes with elevated rates of amino acid change or divergent expression in humans, and non-coding sequences with accelerated base pair changes. Regulatory alterations may be particularly likely to produce phenotypic effects while preserving viability, and are known to underlie interesting evolutionary differences in other species. Here we identify molecular events particularly likely to produce significant regulatory changes in humans: complete deletion of sequences otherwise highly conserved between chimpanzees and other mammals. We confirm 510 such deletions in humans, which fall almost exclusively in non-coding regions and are enriched near genes involved in steroid hormone signalling and neural function. One deletion removes a sensory vibrissae and penile spine enhancer from the human androgen receptor (AR) gene, a molecular change correlated with anatomical loss of androgen-dependent sensory vibrissae and penile spines in the human lineage. Another deletion removes a forebrain subventricular zone enhancer near the tumour suppressor gene growth arrest and DNA-damage-inducible, gamma (GADD45G), a loss correlated with expansion of specific brain regions in humans. Deletions of tissue-specific enhancers may thus accompany both loss and gain traits in the human lineage, and provide specific examples of the kinds of regulatory alterations and inactivation events long proposed to have an important role in human evolutionary divergence.
The sequence and assembly of human genomes using long‐read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high‐fidelity (HiFi) or continuous long‐read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5‐fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.
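Continuity statistics such as N50 and NG50 follow one simple procedure: sort contigs from longest to shortest and report the length at which the running total first reaches half of a reference total. The sketch below uses the standard definitions (N50 relative to the assembly size, NG50 relative to an expected genome or region size); the function name and contig lengths are hypothetical, and the region-restricted NG50 reported above would additionally require selecting only the contigs near each centromere.

```python
def nx50(contig_lengths, genome_size=None):
    """Return N50 when `genome_size` is None, otherwise NG50.

    The result is the length of the contig at which the cumulative length,
    taken from longest to shortest, first reaches half of the reference
    total (assembly size for N50, `genome_size` for NG50).
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = genome_size if genome_size is not None else sum(contig_lengths)
    half = total / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    # The assembly covers less than half of the stated genome size.
    return 0


# Toy assembly (hypothetical lengths, arbitrary units).
contigs = [80, 70, 50, 40, 30, 20]
n50 = nx50(contigs)                     # reference total = 290
ng50 = nx50(contigs, genome_size=400)   # reference total = 400
```

Using a fixed genome size makes NG50 comparable between assemblies of different total length, which plain N50 is not.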
The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome, and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three- to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community, allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.
Microbiota regulate intestinal physiology by modifying host gene expression along the length of the intestine, but the underlying regulatory mechanisms remain unresolved. Transcriptional specificity occurs through interactions between transcription factors (TFs) and cis-regulatory regions (CRRs) characterized by nucleosome-depleted accessible chromatin. We profiled transcriptome and accessible chromatin landscapes in intestinal epithelial cells (IECs) from mice reared in the presence or absence of microbiota. We show that regional differences in gene transcription along the intestinal tract were accompanied by major alterations in chromatin accessibility. Surprisingly, we discovered that microbiota modify host gene transcription in IECs without significantly impacting the accessible chromatin landscape. Instead, microbiota regulation of host gene transcription might be achieved by differential expression of specific TFs and enrichment of their binding sites in nucleosome-depleted CRRs near target genes. Our results suggest that the chromatin landscape in IECs is preprogrammed by the host in a region-specific manner to permit responses to microbiota through binding of open CRRs by specific TFs.