High-throughput next-generation RNA sequencing has matured into a viable and powerful method for detecting variations in transcript expression and regulation. Proactive quality control is of critical importance as unanticipated biases, artifacts, or errors can potentially drive false associations and lead to flawed results.
We have developed the Quality of RNA-Seq Toolset, or QoRTs, a comprehensive, multifunction toolset that assists in quality control and data processing of high-throughput RNA sequencing data.
QoRTs generates an unmatched variety of quality control metrics, and can provide cross-comparisons of replicates contrasted by batch, biological sample, or experimental condition, revealing any outliers and/or systematic issues that could drive false associations or otherwise compromise downstream analyses. In addition, QoRTs simultaneously replaces the functionality of numerous other data-processing tools, and can quickly and efficiently generate quality control metrics, coverage counts (for genes, exons, and known/novel splice-junctions), and browser tracks. These functions can all be carried out as part of a single unified data-processing/quality control run, greatly reducing both the complexity and the total runtime of the analysis pipeline. The software, source code, and documentation are available online at http://hartleys.github.io/QoRTs.
Next-generation sequencing (NGS) data are used for both clinical care and clinical research. DNA sequence variants identified using NGS are often returned to patients/participants as part of clinical or research protocols. The current standard of care is to validate NGS variants using Sanger sequencing, which is costly and time-consuming.
We performed a large-scale, systematic evaluation of Sanger-based validation of NGS variants using data from the ClinSeq® project. We first used NGS data from 19 genes in 5 participants, comparing them to high-throughput Sanger sequencing results on the same samples, and found no discrepancies among 234 NGS variants. We then compared NGS variants in 5 genes from 684 participants against data from Sanger sequencing.
Of over 5800 NGS-derived variants, 19 were not validated by Sanger data. Using newly designed sequencing primers, Sanger sequencing confirmed 17 of the NGS variants, and the remaining 2 variants had low quality scores from exome sequencing. Overall, we measured a validation rate of 99.965% for NGS variants using Sanger sequencing, which was higher than many existing medical tests that do not necessitate orthogonal validation.
A single round of Sanger sequencing is more likely to incorrectly refute a true-positive variant from NGS than to correctly identify a false-positive variant from NGS. Validation of NGS-derived variants using Sanger sequencing has limited utility, and best practice standards should not include routine orthogonal Sanger validation of NGS variants.
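The validation figures reported above can be checked with a short calculation. This is only a sketch under stated assumptions: the abstract gives the total as "over 5800" variants, so an exact total of 5800 is assumed here, and the two variants with low exome quality scores are assumed to be the only ones counted as unvalidated. The variable names are illustrative.

```python
# Assumption: "over 5800" is approximated as exactly 5800 variants.
total_variants = 5800

# Figures taken from the abstract.
initially_unvalidated = 19   # not validated by the first round of Sanger data
confirmed_on_retest = 17     # confirmed with newly designed sequencing primers

# Assumption: the remaining variants are the only ones counted as unvalidated.
unconfirmed = initially_unvalidated - confirmed_on_retest

# Validation rate after a single round of Sanger sequencing.
first_pass_rate = 100 * (total_variants - initially_unvalidated) / total_variants

# Validation rate after re-sequencing with new primers.
final_rate = 100 * (total_variants - unconfirmed) / total_variants

print(f"Unconfirmed after retest: {unconfirmed}")
print(f"First-pass validation rate: {first_pass_rate:.3f}%")
print(f"Final validation rate: {final_rate:.3f}%")
```

Under these assumptions the final rate agrees with the reported 99.965% to within rounding, and the gap between the first-pass and final rates reflects the true-positive NGS variants that a single round of Sanger sequencing failed to confirm.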
After two decades of improvements, the current human reference genome (GRCh38) is the most accurate and complete vertebrate genome ever produced. However, no single chromosome has been finished end to end, and hundreds of unresolved gaps persist. Here we present a human genome assembly that surpasses the continuity of GRCh38, along with a gapless, telomere-to-telomere assembly of a human chromosome. This was enabled by high-coverage, ultra-long-read nanopore sequencing of the complete hydatidiform mole CHM13 genome, combined with complementary technologies for quality improvement and validation. Focusing our efforts on the human X chromosome, we reconstructed the centromeric satellite DNA array (approximately 3.1 Mb) and closed the 29 remaining gaps in the current reference, including new sequences from the human pseudoautosomal regions and from cancer-testis ampliconic gene families (CT-X and GAGE). These sequences will be integrated into future human reference genome releases. In addition, the complete chromosome X, combined with the ultra-long nanopore data, allowed us to map methylation patterns across complex tandem repeats and satellite arrays. Our results demonstrate that finishing the entire human genome is now within reach, and the data presented here will facilitate ongoing efforts to complete the other human chromosomes.
We report an early-onset spastic ataxia-neuropathy syndrome in two brothers of a consanguineous family characterized clinically by lower extremity spasticity, peripheral neuropathy, ptosis, oculomotor apraxia, dystonia, cerebellar atrophy, and progressive myoclonic epilepsy. Whole-exome sequencing identified a homozygous missense mutation (c.1847G>A; p.Y616C) in AFG3L2, encoding a subunit of an m-AAA protease. m-AAA proteases reside in the mitochondrial inner membrane and are responsible for removal of damaged or misfolded proteins and proteolytic activation of essential mitochondrial proteins. AFG3L2 forms either a homo-oligomeric isoenzyme or a hetero-oligomeric complex with paraplegin, a homologous protein mutated in hereditary spastic paraplegia type 7 (SPG7). Heterozygous loss-of-function mutations in AFG3L2 cause autosomal-dominant spinocerebellar ataxia type 28 (SCA28), a disorder whose phenotype is strikingly different from that of our patients. As defined in yeast complementation assays, the AFG3L2(Y616C) gene product is a hypomorphic variant that exhibited oligomerization defects in yeast as well as in patient fibroblasts. Specifically, the formation of AFG3L2(Y616C) complexes was impaired, both with itself and to a greater extent with paraplegin. This produced an early-onset clinical syndrome that combines the severe phenotypes of SPG7 and SCA28, in addition to other "mitochondrial" features such as oculomotor apraxia, extrapyramidal dysfunction, and myoclonic epilepsy. These findings expand the phenotype associated with AFG3L2 mutations and suggest that AFG3L2-related disease should be considered in the differential diagnosis of spastic ataxias.
Antibodies of the VRC01 class neutralize HIV-1, arise in diverse HIV-1-infected donors, and are potential templates for an effective HIV-1 vaccine. However, the stochastic processes that generate repertoires in each individual of >10^12 antibodies make elicitation of specific antibodies uncertain. Here we determine the ontogeny of the VRC01 class by crystallography and next-generation sequencing. Despite antibody-sequence differences exceeding 50%, antibody-gp120 cocrystal structures reveal VRC01-class recognition to be remarkably similar. B cell transcripts indicate that VRC01-class antibodies require few specific genetic elements, suggesting that naive B cells with VRC01-class features are generated regularly by recombination. Virtually all of these fail to mature, however, with only a few (likely one) ancestor B cell expanding to form a VRC01-class lineage in each donor. Developmental similarities in multiple donors thus reveal the generation of VRC01-class antibodies to be reproducible in principle, thereby providing a framework for attempts to elicit similar antibodies in the general population.
• VRC01-class antibodies from six donors exhibit remarkably similar HIV-1 recognition
• NGS sequencing of six donors with VRC01-class antibodies reveals genetic requirements
• Restricted gene usage for VRC01-class antibodies indicates single ancestor B cell
• Elicitation of VRC01-class antibodies is reproducible
Antibodies capable of neutralizing divergent influenza A viruses could form the basis of a universal vaccine. Here, from subjects enrolled in an H5N1 DNA/MIV-prime-boost influenza vaccine trial, we sorted hemagglutinin cross-reactive memory B cells and identified three antibody classes, each capable of neutralizing diverse subtypes of group 1 and group 2 influenza A viruses. Co-crystal structures with hemagglutinin revealed that each class utilized characteristic germline genes and convergent sequence motifs to recognize overlapping epitopes in the hemagglutinin stem. All six analyzed subjects had sequences from at least one multidonor class, and, in half the subjects, multidonor-class sequences were recovered from >40% of cross-reactive B cells. By contrast, these multidonor-class sequences were rare in published antibody datasets. Vaccination with a divergent hemagglutinin can thus increase the frequency of B cells encoding broad influenza A-neutralizing antibodies. We propose the sequence signature-quantified prevalence of these B cells as a metric to guide universal influenza A immunization strategies.
• Isolation of group 1 and group 2 influenza A-neutralizing antibodies from H5N1 vaccinees
• Discovery of three classes of broadly neutralizing antibodies directed to the HA stem
• Delineation of sequence signatures specific for broadly neutralizing antibodies
• Antibody quantification by NGS to guide the development of a universal vaccine
Quantifying B cells capable of producing broadly neutralizing antibodies against influenza serves as a metric to guide the development of a universal influenza vaccine.
Genome- and exome-sequencing costs are continuing to fall, and many individuals are undergoing these assessments as research participants and patients. The issue of secondary (so-called incidental) findings in exome analysis is controversial, and data are needed on methods of detection and their frequency. We piloted secondary variant detection by analyzing exomes for mutations in cancer-susceptibility syndromes in subjects ascertained for atherosclerosis phenotypes. We performed exome sequencing on 572 ClinSeq participants, and in 37 genes, we interpreted variants that cause high-penetrance cancer syndromes by using an algorithm that filtered results on the basis of mutation type, quality, and frequency and that filtered mutation-database entries on the basis of defined categories of causation. We identified 454 sequence variants that differed from the human reference. Exclusions were made on the basis of sequence quality (26 variants) and high frequency in the cohort (77 variants) or dbSNP (17 variants), leaving 334 variants of potential clinical importance. These were further filtered on the basis of curation of literature reports. Seven participants, four of whom were of Ashkenazi Jewish descent and three of whom did not meet family-history-based referral criteria, had deleterious BRCA1 or BRCA2 mutations. One participant had a deleterious SDHC mutation, which causes paragangliomas. Exome sequencing, coupled with multidisciplinary interpretation, detected clinically important mutations in cancer-susceptibility genes; four of these mutations were in individuals without a significant family history of disease. We conclude that secondary variants of high clinical importance will be detected at an appreciable frequency in exomes, and we suggest that priority be given to the development of more efficient modes of interpretation with trials in larger patient groups.
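The variant-filtering counts reported above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the study's filtering algorithm: the dictionary keys and variable names are illustrative, and the carrier frequency assumes that the participant with the SDHC mutation is distinct from the seven participants with BRCA1 or BRCA2 mutations, which the abstract does not state explicitly.

```python
# Counts taken from the abstract.
participants = 572
variants_identified = 454

# Exclusion steps and the number of variants removed at each one.
exclusions = {
    "low sequence quality": 26,
    "high frequency in cohort": 77,
    "high frequency in dbSNP": 17,
}

# Variants of potential clinical importance left after the exclusions.
remaining = variants_identified - sum(exclusions.values())

# Assumption: the SDHC carrier is not one of the seven BRCA1/BRCA2 carriers.
brca_carriers = 7
sdhc_carriers = 1
carriers = brca_carriers + sdhc_carriers
carrier_percent = 100 * carriers / participants

print(f"Variants remaining after exclusions: {remaining}")
print(f"Participants with a deleterious mutation: {carriers} of {participants}")
print(f"Approximate carrier frequency: {carrier_percent:.1f}%")
```

The remaining count matches the 334 variants stated in the abstract; the carrier frequency is a derived figure that holds only under the assumption noted above.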
Although RNA-Seq data provide unprecedented isoform-level expression information, detection of alternative isoform regulation (AIR) remains difficult, particularly when working with an incomplete transcript annotation. We introduce JunctionSeq, a new method that builds on the statistical techniques used by the well-established DEXSeq package to detect differential usage of both exonic regions and splice junctions. In particular, JunctionSeq is capable of detecting differential usage of novel splice junctions without the need for an additional isoform assembly step, greatly improving performance when the available transcript annotation is flawed or incomplete. JunctionSeq also provides a powerful and streamlined visualization toolset that allows bioinformaticians to quickly and intuitively interpret their results. We tested our method on publicly available data from several experiments performed on the rat pineal gland and Toxoplasma gondii, successfully detecting known and previously validated AIR genes in 19 out of 19 gene-level hypothesis tests. Due to its ability to query novel splice sites, JunctionSeq is still able to detect these differences even when all alternative isoforms for these genes were not included in the transcript annotation. JunctionSeq thus provides a powerful method for detecting alternative isoform regulation even with low-quality annotations. An implementation of JunctionSeq is available as an R/Bioconductor package.
The utility of induced pluripotent stem cells (iPSCs) as models to study diseases and as sources for cell therapy depends on the integrity of their genomes. Despite recent publications of DNA sequence variations in iPSCs, the true scope of such changes for the entire genome is not clear. Here we report the whole-genome sequencing of three human iPSC lines derived from two cell types of an adult donor by episomal vectors. The vector sequence was undetectable in the deeply sequenced iPSC lines. We identified 1,058–1,808 heterozygous single-nucleotide variants (SNVs), but no copy-number variants, in each iPSC line. Six to twelve of these SNVs were within coding regions in each iPSC line, but ∼50% of them are synonymous changes and the remainder are not selectively enriched for known genes associated with cancers. Our data thus suggest that episome-mediated reprogramming is not inherently mutagenic during integration-free iPSC induction.
► Deep whole-genome sequencing of three human iPSC lines generated with episomal vectors
► No vector sequence found in the nuclear and mitochondrial genomes
► Single-nucleotide and copy-number variation occurs at a normal frequency
► No evidence for selective enrichment of variation at functionally relevant loci
Public health officials have raised concerns that plasmid transfer between Enterobacteriaceae species may spread resistance to carbapenems, an antibiotic class of last resort, thereby rendering common health care-associated infections nearly impossible to treat. To determine the diversity of carbapenemase-encoding plasmids and assess their mobility among bacterial species, we performed comprehensive surveillance and genomic sequencing of carbapenem-resistant Enterobacteriaceae in the National Institutes of Health (NIH) Clinical Center patient population and hospital environment. We isolated a repertoire of carbapenemase-encoding Enterobacteriaceae, including multiple strains of Klebsiella pneumoniae, Klebsiella oxytoca, Escherichia coli, Enterobacter cloacae, Citrobacter freundii, and Pantoea species. Long-read genome sequencing with full end-to-end assembly revealed that these organisms carry the carbapenem resistance genes on a wide array of plasmids. K. pneumoniae and E. cloacae isolated simultaneously from a single patient harbored two different carbapenemase-encoding plasmids, indicating that plasmid transfer between organisms was unlikely within this patient. We did, however, find evidence of horizontal transfer of carbapenemase-encoding plasmids between K. pneumoniae, E. cloacae, and C. freundii in the hospital environment. Our data, including full plasmid identification, challenge assumptions about horizontal gene transfer events within patients and identify possible connections between patients and the hospital environment. In addition, we identified a new carbapenemase-encoding plasmid of potentially high clinical impact carried by K. pneumoniae, E. coli, E. cloacae, and Pantoea species, in unrelated patients and in the hospital environment.