MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs), and generates outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network-based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient greater than 0.98 between MOSAIK-assigned and actual mapping qualities. To support studies of any genome, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
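The Smith-Waterman step named in the abstract above can be illustrated with a minimal local-alignment scorer. This is a sketch under stated assumptions, not MOSAIK's implementation: the function name `smith_waterman`, the scoring values (match 2, mismatch -1, linear gap -2) and the toy sequences are all hypothetical, and the hash clustering step that would select candidate reference regions is omitted.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, (i, j) end position in the DP matrix).

    Scoring parameters are illustrative assumptions, not MOSAIK's defaults.
    A linear gap penalty is used for brevity.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    # H[i][j] = best score of a local alignment ending at read[i-1], ref[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            diag = H[i - 1][j - 1] + sub   # align the two bases
            up = H[i - 1][j] + gap         # gap in the reference
            left = H[i][j - 1] + gap       # gap in the read
            # Clamping at 0 is what makes the alignment local
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    # Toy example: the read matches the reference exactly at positions 3-6
    print(smith_waterman("ACGT", "TTACGTTT"))
```

Because cell values never drop below zero, a read can align to a short window of a much longer reference, which is why this formulation tolerates mismatches and short insertions or deletions inside the window.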
As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.
An ultrafast DNA sequence aligner (Isaac Genome Alignment Software) that takes advantage of high-memory hardware (>48 GB) and a variant caller (Isaac Variant Caller) have been developed. We demonstrate that our combined pipeline (Isaac) is four to five times faster than BWA + GATK on equivalent hardware, with comparable accuracy as measured by trio conflict rates and sensitivity. We further show that Isaac is effective in the detection of disease-causing variants and can be run easily and economically on commodity hardware.
Isaac has an open source license and can be obtained at https://github.com/sequencing.
Forward genetic mutational studies, adaptive evolution, and phenotypic screening are powerful tools for creating new variant organisms with desirable traits. However, mutations generated in the process cannot be easily identified with traditional genetic tools. We show that new high-throughput, massively parallel sequencing technologies can completely and accurately characterize a mutant genome relative to a previously sequenced parental (reference) strain. We studied a mutant strain of Pichia stipitis, a yeast capable of converting xylose to ethanol. This unusually efficient mutant strain was developed through repeated rounds of chemical mutagenesis, strain selection, transformation, and genetic manipulation over a period of seven years. We resequenced this strain on three different sequencing platforms. Surprisingly, we found fewer than a dozen mutations in open reading frames. All three sequencing technologies were able to identify each single nucleotide mutation given at least 10-15-fold nominal sequence coverage. Our results show that detecting mutations in evolved and engineered organisms is rapid and cost-effective at the whole-genome level using new sequencing technologies. Identification of specific mutations in strains with altered phenotypes will add insight into specific gene functions and guide further metabolic engineering efforts.
Analysis of genomic sequencing data requires efficient, easy-to-use access to alignment results and flexible data management tools (e.g. filtering, merging, sorting, etc.). However, the enormous amount of data produced by current sequencing technologies is typically stored in compressed, binary formats that are not easily handled by the text-based parsers commonly used in bioinformatics research.
We introduce a software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first publicly available C++ API for BAM file support and a command-line toolkit.
BamTools was written in C++, and is supported on Linux, Mac OSX and MS Windows. Source code and documentation are freely available at http://github.org/pezmaster31/bamtools.
Previously reported applications of the 454 Life Sciences pyrosequencing technology have relied on deep sequence coverage for accurate polymorphism discovery because of frequent insertion and deletion sequence errors. Here we report a new base calling program, Pyrobayes, for pyrosequencing reads. Pyrobayes permits accurate single-nucleotide polymorphism (SNP) calling in resequencing applications, even in shallow read coverage, primarily because it produces more confident base calls than the native base calling program.
The prognostic significance of tumor infiltrating lymphocyte (TIL) response in cutaneous melanoma is controversial. This analysis of data from a prospective, randomized trial included patients with cutaneous melanoma ≥ 1.0 mm Breslow thickness who underwent wide local excision and sentinel lymph node (SLN) biopsy. Univariate and multivariate analyses were performed to determine factors associated with TIL response, disease-free survival (DFS), and overall survival (OS). A total of 515 patients were included; TIL response was classified as "brisk" (n = 100; 19.4%) or "nonbrisk" (n = 415; 80.6%). Patients in the nonbrisk TIL group were more likely to have tumor-positive SLN (17.6% vs 7%; P = 0.0087). On multivariate analysis, nonbrisk TIL response, increased tumor thickness, and ulceration were significant independent predictors of tumor-positive SLN. By Kaplan-Meier analysis, the 5-year DFS rate was 91 per cent for those with a brisk TIL response compared with 86 per cent in the nonbrisk group (P = 0.41). The 5-year OS rates were 95 per cent versus 84 per cent in the brisk versus nonbrisk TIL groups, respectively (P = 0.0083). However, on multivariate analysis, TIL response was not a significant independent factor predicting DFS or OS. TIL response is a significant predictor of SLN metastasis but is not a major predictor of DFS or OS.
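The Kaplan-Meier rates quoted above come from a product-limit estimate of survival over follow-up time. The following is a minimal sketch of that estimator, assuming right-censored data; the function name `kaplan_meier` and the follow-up values are invented for illustration and are not data from the trial.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up duration for each patient
    events: 1 if the event (e.g. death or recurrence) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up in years; 0 marks a censored patient
    toy_times = [1, 2, 3, 4, 5]
    toy_events = [1, 0, 1, 1, 0]
    for t, s in kaplan_meier(toy_times, toy_events):
        print(t, round(s, 3))
```

Censored patients reduce the number at risk without lowering the curve, which is how the estimator uses patients with incomplete follow-up. Comparing two such curves (for example with a log-rank test) is a separate step not shown here.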
The evolution of gene expression is a challenging problem in evolutionary biology, for which accurate, well-calibrated measurements and methods are crucial.
We quantified gene expression with whole-transcriptome sequencing in four diploid, prototrophic strains of Saccharomyces species grown under the same condition to investigate the evolution of gene expression. We found that variation in expression is gene-dependent with large variations in each gene's expression between replicates of the same species. This confounds the identification of genes differentially expressed across species. To address this, we developed a statistical approach to establish significance bounds for inter-species differential expression in RNA-Seq data based on the variance measured across biological replicates. This metric estimates the combined effects of technical and environmental variance, as well as Poisson sampling noise by isolating each component. Despite a paucity of large expression changes, we found a strong correlation between the variance of gene expression change and species divergence (R² = 0.90).
We provide an improved methodology for measuring gene expression changes in evolutionarily diverged species using RNA-Seq, where experimental artifacts can mimic evolutionary effects. GEO Accession Number: GSE32679.
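The idea of separating Poisson sampling noise from other sources of replicate variance can be illustrated with a simple moment-based split. This is a sketch under assumptions, not the statistical approach of the paper: it assumes that read counts for one gene have variance equal to the mean (the Poisson part) plus an excess term standing in for technical and environmental variance, and the function name `split_replicate_variance` and the counts are hypothetical.

```python
from statistics import mean, variance


def split_replicate_variance(counts):
    """Split the replicate variance of one gene's read counts.

    Assumption (not taken from the paper): total variance = Poisson variance
    (equal to the mean count) + excess variance from all other sources.
    """
    m = mean(counts)
    total = variance(counts)            # sample variance across replicates
    poisson = m                         # Poisson variance equals the mean
    excess = max(0.0, total - poisson)  # floor at zero for under-dispersed data
    return {"mean": m, "total": total, "poisson": poisson, "excess": excess}


if __name__ == "__main__":
    # Hypothetical replicate counts for two genes with the same mean
    print(split_replicate_variance([90, 100, 110, 100]))   # close to Poisson
    print(split_replicate_variance([50, 150, 100, 100]))   # strongly over-dispersed
```

Two genes with the same mean count can differ widely in excess variance, which is why a gene-specific bound is needed before calling an inter-species difference significant.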
Introduction: Single-use laundry detergent pods (LDPs) were introduced to the United States in 2010 but had been available in Europe as early as 2001. Case reports of unintentional exposures noted vomiting, ocular injuries, respiratory depression, and central nervous system depression. We summarize clinical effects from unintentional LDP exposures reported to a single poison center over 15 months. Methods: Electronic poison center records were searched using verbatim field and both product and generic codes to identify laundry pod exposures from January 1, 2012, through April 9, 2013. Clinical effects were abstracted to a database and summarized using descriptive statistics. Results: We identified 131 cases between March 2012 and April 2013. Median (interquartile range) age was 2.0 (1.5) years with 4 adult cases; all were coded as unintentional. The most common route was ingestion (120) followed by ocular (14) and dermal (6). Some patients had multiple routes of exposure. Of ingestion exposures, 79 (66%) were managed at home and 41 (34%) were evaluated in a hospital, of which 9 patients were admitted. The median (interquartile range) age of admitted patients was 1.4 (1.1) years. Relevant findings in these admitted children included emesis (78%), central nervous system depression (22%), upper airway effects (56%), lower respiratory symptoms (33%), seizure (n = 1), and intubation (67%). One child with emesis initially managed at home was subsequently intubated for respiratory distress. Discussion: Exposure to LDP can cause significant toxicity, particularly in infants and toddlers. Clinicians should be aware that, compared with traditional detergents, LDP exposure has the potential to cause airway compromise.
Background: The significance of mitotic rate (MR) in melanoma remains controversial. Methods: In this retrospective analysis of a prospective randomized trial that included patients with melanoma of 1.0 mm or greater, all patients underwent wide excision and sentinel lymph node (SLN) biopsy. Univariate and multivariate analyses were performed to evaluate factors predictive of disease-free survival (DFS) and overall survival (OS). Results: A total of 551 patients had MR reported. A cut-off point of 6 mitoses/mm² best discriminated DFS and OS: 455 patients (82.6%) had MR less than 6/mm². SLN were tumor-positive in 14.7% of low MR versus 31.3% of high MR patients (P = .0003). There were significant differences in DFS (P = .0014) and OS (P = .0002) between the 2 groups; however, MR failed to remain significant in the multivariate model. Conclusions: MR is weakly predictive of SLN status but it is not an independent predictor of survival for melanomas 1.0 mm or thicker.