Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times.
We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, while dramatically reducing overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results.
SAMBLASTER is open-source C++ code and is freely available for download from https://github.com/GregoryFaust/samblaster.
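To make the idea of marking duplicates as a single streaming pass over read-grouped aligner output concrete, here is a minimal Python sketch. It is not SAMBLASTER's code. It assumes a simplified, hypothetical record tuple in place of full SAM parsing, treats two pairs as duplicates when both ends share reference, strand and unclipped 5' position (a common duplicate-marking signature), and keeps the first pair it encounters. The names `Record`, `unclipped_five_prime` and `mark_duplicates` are illustrative only.

```python
import re
from itertools import groupby
from typing import Iterable, List, Set, Tuple

# Hypothetical, simplified stand-in for one SAM line: (QNAME, FLAG, RNAME, POS, CIGAR),
# with POS the 1-based leftmost aligned reference position, as in SAM.
Record = Tuple[str, int, str, int, str]

FLAG_REVERSE = 0x10     # SAM flag: read maps to the reverse strand
FLAG_DUPLICATE = 0x400  # SAM flag: PCR or optical duplicate

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def unclipped_five_prime(flag: int, pos: int, cigar: str) -> int:
    """Reference coordinate of the read's 5' end, treating clipped bases as if aligned."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if flag & FLAG_REVERSE:
        # Reverse strand: the 5' end is the rightmost base, so add the
        # reference-consuming length plus any trailing clips.
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        trailing = 0
        for n, op in reversed(ops):
            if op not in "SH":
                break
            trailing += n
        return pos + ref_len - 1 + trailing
    # Forward strand: the 5' end is the leftmost base, so subtract leading clips.
    leading = 0
    for n, op in ops:
        if op not in "SH":
            break
        leading += n
    return pos - leading


def mark_duplicates(records: Iterable[Record]) -> List[Record]:
    """Flag read pairs whose mapping signature has already been seen.

    Assumes records arrive grouped by read name, as they do straight out of an aligner.
    """
    seen: Set[tuple] = set()
    out: List[Record] = []
    for _, group in groupby(records, key=lambda r: r[0]):
        pair = list(group)
        if len(pair) != 2:
            # This sketch passes unpaired or supplementary-containing groups through unchanged.
            out.extend(pair)
            continue
        # Order-independent signature: (reference, strand, unclipped 5' position) of both ends.
        signature = tuple(sorted(
            (rname, bool(flag & FLAG_REVERSE), unclipped_five_prime(flag, pos, cigar))
            for _, flag, rname, pos, cigar in pair
        ))
        if signature in seen:
            pair = [(q, f | FLAG_DUPLICATE, rn, p, c) for q, f, rn, p, c in pair]
        else:
            seen.add(signature)
        out.extend(pair)
    return out
```

Because the sketch only needs the set of signatures seen so far and never re-reads its input, it can sit in a pipe between an aligner and BAM compression, which is the placement the abstract describes. Using unclipped rather than aligned positions means that a read soft-clipped by a few bases still matches its unclipped twin.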
Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data efficiently.
Results: This article introduces BEDTools, a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The individual tools can be combined with one another and with standard UNIX commands, facilitating both routine genomics tasks and pipelines that can quickly answer intricate questions of large genomic datasets.
Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools
Contact: aaronquinlan@gmail.com; imh4y@virginia.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
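The basic question behind feature comparison is which intervals in one set overlap intervals in another. The following is a minimal Python sketch of that overlap test, assuming BED-style 0-based, half-open coordinates; it sorts one set per chromosome and uses binary search to skip intervals that start too late. It is an illustration of the operation, not BEDTools' own indexing scheme, and the function name `intersect` is hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

# BED-style interval: (chrom, start, end), 0-based and half-open.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every (A, B) pair of intervals that overlap by at least one base."""
    # Group B by chromosome and sort each group by start coordinate.
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for iv in b:
        by_chrom[iv[0]].append(iv)
    starts: Dict[str, List[int]] = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort(key=lambda iv: iv[1])
        starts[chrom] = [iv[1] for iv in ivs]

    hits: List[Tuple[Interval, Interval]] = []
    for iv in a:
        chrom, start, end = iv
        # Only B intervals that start before this A interval ends can overlap it...
        n = bisect_left(starts.get(chrom, []), end)
        for other in by_chrom.get(chrom, [])[:n]:
            # ...and of those, only the ones that end after it starts.
            if other[2] > start:
                hits.append((iv, other))
    return hits
```

With half-open coordinates, `("chr1", 100, 200)` and `("chr1", 200, 300)` share an endpoint but no base, so the sketch correctly reports no overlap between them.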
Affordable genome sequencing technologies promise to revolutionize the field of human genetics by enabling comprehensive studies that interrogate all classes of genome variation, genome-wide, across the entire allele frequency spectrum. Ongoing projects worldwide are sequencing many thousands, and soon millions, of human genomes as part of various gene mapping studies, biobanking efforts, and clinical programs. However, while genome sequencing data production has become routine, genome analysis and interpretation remain challenging endeavors with many limitations and caveats. Here, we review the current state of technologies for genetic variant discovery, genotyping, and functional interpretation and discuss the prospects for future advances. We focus on germline variants discovered by whole-genome sequencing, genome-wide functional genomic approaches for predicting and measuring variant functional effects, and implications for studies of common and rare human disease.
As whole-genome sequencing becomes the standard in human genetic studies, approaches for genetic variant detection and interpretation are key to discern biological meaning from sequence.
Comprehensive discovery of structural variation (SV) from whole genome sequencing data requires multiple detection signals including read-pair, split-read, read-depth and prior knowledge. Owing to technical challenges, extant SV discovery algorithms either use one signal in isolation, or at best use two sequentially. We present LUMPY, a novel SV discovery framework that naturally integrates multiple SV signals jointly across multiple samples. We show that LUMPY yields improved sensitivity, especially when SV signal is reduced owing to either low-coverage data or low intra-sample variant allele frequency. We also report a set of 4,564 validated breakpoints from the NA12878 human genome. LUMPY is available at https://github.com/arq5x/lumpy-sv.
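What it means to integrate signals "jointly" rather than sequentially can be illustrated with a deliberately simplified Python sketch. Each piece of evidence, whatever its signal type or sample, proposes a candidate window for each side of a breakpoint; evidence that agrees on both sides is pooled into a single call whose windows shrink to the common intersection. This is not LUMPY's algorithm: the greedy clustering, the window-intersection rule, and the names `Evidence`, `Call` and `integrate` are assumptions of the sketch, and read-depth and prior-knowledge signals are not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Span = Tuple[int, int]  # half-open (start, end) window where a breakpoint may lie


@dataclass(frozen=True)
class Evidence:
    signal: str   # hypothetical signal label, e.g. "pair" (discordant read pair) or "split"
    sample: str
    chrom: str
    left: Span    # candidate window for the left breakpoint
    right: Span   # candidate window for the right breakpoint


@dataclass
class Call:
    chrom: str
    left: Span
    right: Span
    support: List[Evidence] = field(default_factory=list)


def overlaps(x: Span, y: Span) -> bool:
    return x[0] < y[1] and y[0] < x[1]


def narrow(x: Span, y: Span) -> Span:
    # Intersection of two overlapping windows.
    return (max(x[0], y[0]), min(x[1], y[1]))


def integrate(evidence: List[Evidence]) -> List[Call]:
    """Greedily pool evidence from all signals and samples into breakpoint calls."""
    calls: List[Call] = []
    for ev in sorted(evidence, key=lambda e: (e.chrom, e.left[0])):
        for call in calls:
            if (call.chrom == ev.chrom and overlaps(call.left, ev.left)
                    and overlaps(call.right, ev.right)):
                # Compatible on both sides: tighten the windows and add the support.
                call.left = narrow(call.left, ev.left)
                call.right = narrow(call.right, ev.right)
                call.support.append(ev)
                break
        else:
            calls.append(Call(ev.chrom, ev.left, ev.right, [ev]))
    return calls
```

In this sketch, a wide discordant-pair window from one sample and a precise split-read position from another end up in one call with a sharpened breakpoint, which hints at why pooling can help when each sample alone carries only weak signal.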
Structural variants (SVs) are an important source of human genome diversity, but their functional effects are poorly understood. We mapped 61,668 SVs in 613 individuals from the GTEx project and measured their effects on gene expression. We estimate that common SVs are causal at 2.66% of eQTLs, a 10.5-fold enrichment relative to their abundance in the genome. Duplications and deletions were the most impactful variant types, whereas the contribution of mobile element insertions was small (0.12% of eQTLs, 1.9-fold enriched). Multitissue analysis of eQTLs revealed that gene-altering SVs show more constitutive effects than other variant types, with 62.09% of coding SV-eQTLs active in all tissues with eQTL activity compared with 23.08% of coding SNV- and indel-eQTLs. Noncoding SVs, SNVs and indels show broadly similar patterns. We also identified 539 rare SVs associated with nearby gene expression outliers. Of these, 62.34% are noncoding SVs that affect gene expression but have modest enrichment at regulatory elements, showing that rare noncoding SVs are a major source of gene expression differences but remain difficult to predict from current annotations. Both common and rare SVs often affect the expression of multiple genes: SV-eQTLs affect an average of 1.82 nearby genes, whereas SNV- and indel-eQTLs affect an average of 1.09 genes, and 21.34% of rare expression-altering SVs show effects on two to nine different genes. We also observe significant effects of SVs on rare gene expression changes at distances of up to 1 Mb from the SV. This provides a mechanism by which individual SVs may have strong or pleiotropic effects on phenotypic variation.
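The enrichment figures above can be unpacked with one line of arithmetic. Assuming fold enrichment $E$ is the fraction of eQTLs at which a variant class is causal, $f_{\mathrm{causal}}$, divided by that class's share of genomic variants, $f_{\mathrm{abundance}}$ (the abstract does not spell out the denominator), the reported numbers imply that common SVs make up roughly 0.25% of the variants considered and mobile element insertions roughly 0.063%:

```latex
\[
E = \frac{f_{\mathrm{causal}}}{f_{\mathrm{abundance}}}
\quad\Longrightarrow\quad
f_{\mathrm{abundance}} = \frac{f_{\mathrm{causal}}}{E}
\]
\[
\text{SVs: } \frac{2.66\%}{10.5} \approx 0.25\%,
\qquad
\text{MEIs: } \frac{0.12\%}{1.9} \approx 0.063\%
\]
```

Read this way, SVs are rare among variants but punch well above their numbers among causal eQTL variants.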
SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.
The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30× using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.
Structural variants (SVs) are an important source of human genetic diversity, but their contribution to traits, disease and gene regulation remains unclear. We mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single-nucleotide variants (SNVs) and short insertion/deletion (indel) variants from deep whole-genome sequencing (WGS). We estimated that SVs are causal at 3.5–6.8% of eQTLs, a substantially higher fraction than prior estimates, and that expression-altering SVs have larger effect sizes than do SNVs and indels. We identified 789 putative causal SVs predicted to directly alter gene expression: most (88.3%) were noncoding variants enriched at enhancers and other regulatory elements, and 52 were linked to genome-wide association study loci. We observed a notable abundance of rare high-impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of common- and rare-variant association studies.
Objectives
To develop evidence-based recommendations for clinicians caring for children (including infants, school-aged children, and adolescents) with septic shock and other sepsis-associated organ dysfunction.
Design
A panel of 49 international experts, representing 12 international organizations, as well as three methodologists and three public members was convened. Panel members assembled at key international meetings (for those panel members attending the conferences), and a stand-alone meeting was held for all panel members in November 2018. A formal conflict-of-interest policy was developed at the onset of the process and enforced throughout. Teleconferences and electronic discussions among the chairs, co-chairs, methodologists, and group heads, as well as within subgroups, served as an integral part of the guideline development process.
Methods
The panel consisted of six subgroups: recognition and management of infection, hemodynamics and resuscitation, ventilation, endocrine and metabolic therapies, adjunctive therapies, and research priorities. We conducted a systematic review for each Population, Intervention, Control, and Outcomes question to identify the best available evidence, statistically summarized the evidence, and then assessed the quality of evidence using the Grading of Recommendations Assessment, Development, and Evaluation approach. We used the evidence-to-decision framework to formulate recommendations as strong or weak, or as a best practice statement. In addition, “in our practice” statements were included when the evidence was insufficient to issue a recommendation but the panel felt that some guidance based on practice patterns may be appropriate.
Results
The panel provided 77 statements on the management and resuscitation of children with septic shock and other sepsis-associated organ dysfunction. Overall, six were strong recommendations, 49 were weak recommendations, and nine were best-practice statements. For 13 questions, no recommendations could be made; but, for 10 of these, “in our practice” statements were provided. In addition, 52 research priorities were identified.
Conclusions
A large cohort of international experts was able to achieve consensus regarding many recommendations for the best care of children with sepsis, acknowledging that most aspects of care had relatively low quality of evidence resulting in the frequent issuance of weak recommendations. Despite this challenge, these recommendations regarding the management of children with septic shock and other sepsis-associated organ dysfunction provide a foundation for consistent care to improve outcomes and inform future research.