In order for next‐generation sequencing to become widely used as a diagnostic in the healthcare industry, sequencing instrumentation will need to be mass produced with a high degree of quality and economy. One way to achieve this is to recast DNA sequencing in a format that fully leverages the manufacturing base created for computer chips, complementary metal‐oxide semiconductor chip fabrication, which is the current pinnacle of large scale, high quality, low‐cost manufacturing of high technology. To achieve this, ideally the entire sensory apparatus of the sequencer would be embodied in a standard semiconductor chip, manufactured in the same fab facilities used for logic and memory chips. Recently, such a sequencing chip, and the associated sequencing platform, has been developed and commercialized by Ion Torrent, a division of Life Technologies, Inc. Here we provide an overview of this semiconductor chip based sequencing technology, and summarize the progress made since its commercial introduction. We describe in detail the progress in chip scaling, sequencing throughput, read length, and accuracy. We also summarize the enhancements in the associated platform, including sample preparation, data processing, and engagement of the broader development community through open source and crowdsourcing initiatives.
The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation.
We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels.
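As an illustration of the final step described above, the following is a minimal sketch of gapped Smith-Waterman local alignment. It is not BFAST's implementation: the function name, the scoring values, and the linear (non-affine) gap penalty are assumptions chosen only to keep the example short.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment of `read` against `ref`.

    Scoring values and the linear gap penalty are illustrative assumptions.
    Returns (score, aligned_read, aligned_ref); '-' marks a gap.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]

    def sub(i, j):
        # Substitution score for read[i-1] against ref[j-1].
        return match if read[i - 1] == ref[j - 1] else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + sub(i, j)
            up = score[i - 1][j] + gap      # read base against a gap in ref
            left = score[i][j - 1] + gap    # ref base against a gap in read
            score[i][j] = max(0, diag, up, left)  # 0 restarts the local alignment
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    aligned_read, aligned_ref = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_read.append(read[i - 1])
            aligned_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_read.append(read[i - 1])
            aligned_ref.append("-")
            i -= 1
        else:
            aligned_read.append("-")
            aligned_ref.append(ref[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_read)), "".join(reversed(aligned_ref))


if __name__ == "__main__":
    # The read lacks one reference base (a small deletion relative to ref).
    print(smith_waterman("ACGTCAGT", "ACGTGCAGT"))
```

In this toy case the gap in the aligned read marks the deleted base, which is how a gapped local alignment exposes small indels. A production aligner would only run this step on the candidate locations returned by the index lookup, since the dynamic programming cost grows with the product of the two sequence lengths.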
We compare BFAST to a selection of large-scale alignment tools -- BLAT, MAQ, SHRiMP, and SOAP -- in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at (http://bfast.sourceforge.net).
Cytosine DNA methylation is important in regulating gene expression and in silencing transposons and other repetitive sequences. Recent genomic studies in Arabidopsis thaliana have revealed that many endogenous genes are methylated either within their promoters or within their transcribed regions, and that gene methylation is highly correlated with transcription levels. However, plants have different types of methylation controlled by different genetic pathways, and detailed information on the methylation status of each cytosine in any given genome is lacking. To this end, we generated a map at single-base-pair resolution of methylated cytosines for Arabidopsis, by combining bisulphite treatment of genomic DNA with ultra-high-throughput sequencing using the Illumina 1G Genome Analyser and Solexa sequencing technology. This approach, termed BS-Seq, unlike previous microarray-based methods, allows one to sensitively measure cytosine methylation on a genome-wide scale within specific sequence contexts. Here we describe methylation on previously inaccessible components of the genome and analyse the DNA methylation sequence composition and distribution. We also describe the effect of various DNA methylation mutants on genome-wide methylation patterns, and demonstrate that our newly developed library construction and computational methods can be applied to large genomes such as that of mouse.
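The principle behind bisulphite sequencing can be sketched in a few lines: bisulphite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C, so the fraction of C calls at a reference cytosine estimates its methylation level. The sketch below is not the authors' pipeline. The function names are hypothetical, and it assumes forward-strand, ungapped alignments and a reference containing only A, C, G and T.

```python
from collections import defaultdict


def bisulfite_convert(seq, methylated_positions):
    """Simulate conversion: every unmethylated C is read as T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def cytosine_context(reference, pos):
    """Classify the cytosine at `pos` as CG, CHG or CHH (H = A, C or T).

    Returns None when too few downstream bases exist to decide.
    """
    downstream = reference[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) == 2:
        return "CHG" if downstream[1] == "G" else "CHH"
    return None


def methylation_levels(reference, reads):
    """Count methylated calls per reference cytosine.

    `reads` is an iterable of (start, sequence) pairs aligned without gaps
    to the forward strand. Returns {position: (methylated, total)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for start, read in reads:
        for offset, base in enumerate(read):
            pos = start + offset
            # Only C (methylated) or T (converted) calls are informative.
            if pos < len(reference) and reference[pos] == "C" and base in "CT":
                counts[pos][1] += 1
                if base == "C":
                    counts[pos][0] += 1
    return {pos: tuple(c) for pos, c in sorted(counts.items())}


if __name__ == "__main__":
    ref = "ACGTCAGCTT"
    reads = [(0, bisulfite_convert(ref, {1})), (0, bisulfite_convert(ref, set()))]
    for pos, (meth, total) in methylation_levels(ref, reads).items():
        print(pos, cytosine_context(ref, pos), meth, total)
```

Grouping the per-cytosine counts by context is what allows methylation to be summarised separately for CG, CHG and CHH sites. Handling reverse-strand reads (where G is read as A) and incomplete conversion is omitted here.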
The study of reaction–diffusion processes is much more complicated on general curved surfaces than on standard Cartesian coordinate spaces. Here we show how to formulate and solve systems of reaction–diffusion equations on surfaces in an extremely simple way, using only the standard Cartesian form of differential operators, and a discrete unorganized point set to represent the surface. Our method decouples surface geometry from the underlying differential operators. As a consequence, it becomes possible to formulate and solve rather general reaction–diffusion equations on general surfaces without having to consider the complexities of differential geometry or sophisticated numerical analysis. To illustrate the generality of the method, computations for surface diffusion, pattern formation, excitable media, and bulk-surface coupling are provided for a variety of complex point cloud surfaces.
U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30x genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4 kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers.
The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date.
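The "two-base encoded" (color space) reads mentioned above can be illustrated with a short sketch. It assumes the commonly described encoding in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3) and the read is prefixed by a known leading base. The function names are hypothetical and this is not code from BFAST.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_color_space(bases):
    """Encode a base sequence as its first base followed by one color per
    adjacent base pair (color = XOR of the two 2-bit base codes)."""
    colors = [
        str(BASE_TO_CODE[a] ^ BASE_TO_CODE[b])
        for a, b in zip(bases, bases[1:])
    ]
    return bases[0] + "".join(colors)


def decode_color_space(encoded):
    """Decode a color-space string back to bases, starting from the
    leading base and applying each color in turn."""
    current = BASE_TO_CODE[encoded[0]]
    bases = [encoded[0]]
    for color in encoded[1:]:
        current ^= int(color)   # XOR with the color gives the next base code
        bases.append(CODE_TO_BASE[current])
    return "".join(bases)


def color_differences(a, b):
    """Number of positions at which two equal-length encodings differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    reference = encode_color_space("ATGGCA")
    variant = encode_color_space("ATTGCA")   # single-base change G -> T
    print(reference, variant, color_differences(reference, variant))
    # A single miscalled color instead shifts every base decoded after it.
    print(decode_color_space("A31131"))
```

Under this encoding a true single-nucleotide variant changes two adjacent colors, whereas an isolated color miscall corrupts all downstream bases if decoded naively. This is one reason alignment is typically carried out directly in color space, so that the two cases can be told apart.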
Schizophrenia is a devastating neurodevelopmental disorder whose genetic influences remain elusive. We hypothesize that individually rare structural variants contribute to the illness. Microdeletions and microduplications >100 kilobases were identified by microarray comparative genomic hybridization of genomic DNA from 150 individuals with schizophrenia and 268 ancestry-matched controls. All variants were validated by high-resolution platforms. Novel deletions and duplications of genes were present in 5% of controls versus 15% of cases and 20% of young-onset cases, both highly significant differences. The association was independently replicated in patients with childhood-onset schizophrenia as compared with their parents. Mutations in cases disrupted genes disproportionately from signaling networks controlling neurodevelopment, including neuregulin and glutamate pathways. These results suggest that multiple, individually rare mutations altering genes in neurodevelopmental pathways contribute to schizophrenia.
Oral-facial-digital (OFD) syndromes are a heterogeneous group of congenital disorders characterized by malformations of the face and oral cavity, and digit anomalies. Mutations within 12 cilia-related genes have been identified that cause several types of OFD, suggesting that OFDs constitute a subgroup of developmental ciliopathies. Through homozygosity mapping and exome sequencing of two families with variable OFD type 2, we identified distinct germline variants in INTS13, a subunit of the Integrator complex. This multiprotein complex associates with RNA Polymerase II and cleaves nascent RNA to modulate gene expression. We determined that INTS13 utilizes its C-terminus to bind the Integrator cleavage module, which is disrupted by the identified germline variants p.S652L and p.K668Nfs*9. Depletion of INTS13 disrupts ciliogenesis in human cultured cells and causes dysregulation of a broad collection of ciliary genes. Accordingly, its knockdown in Xenopus embryos leads to motile cilia anomalies. Altogether, we show that mutations in INTS13 cause an autosomal recessive ciliopathy, which reveals key interactions between components of the Integrator complex.
While Eulerian schemes work well for most gas flows, they have been shown to admit nonphysical oscillations near some material interfaces. In contrast, Lagrangian schemes work well at multimaterial interfaces, but suffer from their own difficulties in problems with large deformations and vorticity characteristic of most gas flows. We believe that the most robust schemes will combine the best properties of Eulerian and Lagrangian schemes. In this paper, we propose a new numerical method for treating interfaces in Eulerian schemes that maintains a Heaviside profile of the density with no numerical smearing along the lines of earlier work and most Lagrangian schemes. We use a level set function to track the motion of a multimaterial interface in an Eulerian framework. In addition, the use of ghost cells (actually ghost nodes in our finite difference framework) and a new isobaric fix technique allows us to keep the density profile from smearing out, while still keeping the scheme robust and easy to program with simple extensions to multidimensions and multilevel time integration, e.g., Runge–Kutta methods. In contrast, previous methods used ill-advised dimensional splitting for multidimensional problems and suffered from great complexity when used in conjunction with multilevel time integrators.
Since the introduction of next-generation DNA sequencers, the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands.
In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net).
The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.
Tubulin glutamylation is a post-translational modification that occurs predominantly in the ciliary axoneme and has been suggested to be important for ciliary function. However, its relationship to disorders of the primary cilium, termed ciliopathies, has not been explored. Here we mapped a new locus for Joubert syndrome (JBTS), which we have designated as JBTS15, and identified causative mutations in CEP41, which encodes a 41-kDa centrosomal protein. We show that CEP41 is localized to the basal body and primary cilia, and regulates ciliary entry of TTLL6, an evolutionarily conserved polyglutamylase enzyme. Depletion of CEP41 causes ciliopathy-related phenotypes in zebrafish and mice and results in glutamylation defects in the ciliary axoneme. Our data identify CEP41 mutations as a cause of JBTS and implicate tubulin post-translational modification in the pathogenesis of human ciliary dysfunction.