The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation.
We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired-end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole-genome indexes to rapidly map reads to candidate alignment locations, with an arbitrary number of independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels.
We compare BFAST to a selection of large-scale alignment tools -- BLAT, MAQ, SHRiMP, and SOAP -- in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at (http://bfast.sourceforge.net).
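The two stages described in the abstract, index lookup followed by gapped local alignment, can be illustrated with a minimal sketch. It is not BFAST's implementation: the contiguous k-mer index, the function names, and the scoring values (match 2, mismatch -1, linear gap -2) are all assumptions chosen for brevity.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_locations(read, index, k):
    """Collect the read start positions implied by each k-mer hit."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            candidates.add(pos - offset)
    return sorted(candidates)


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (assumed linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical toy data, not taken from the paper.
reference = "ACGTACGTTTGACCA"
read = "TTGACC"
k = 4
index = build_index(reference, k)
hits = candidate_locations(read, index, k)
scores = {pos: smith_waterman(read, reference[pos:pos + len(read)]) for pos in hits}
```

In this sketch the index narrows the search to a few candidate positions, and only those windows are scored with the more expensive dynamic programme. Using several independent indexes, as the abstract describes, would amount to taking the union of the candidate sets before the alignment step.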
We use high-density single nucleotide polymorphism (SNP) genotyping microarrays to demonstrate the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture. We first develop a theoretical framework for detecting an individual's presence within a mixture, then show, through simulations, the limits associated with our method, and finally demonstrate experimentally the identification of the presence of genomic DNA of specific individuals within a series of highly complex genomic mixtures, including mixtures where an individual contributes less than 0.1% of the total genomic DNA. These findings shift the perceived utility of SNPs for identifying individual trace contributors within a forensics mixture, and suggest future research efforts into assessing the viability of previously sub-optimal DNA sources due to sample contamination. These findings also suggest that composite statistics across cohorts, such as allele frequency or genotype counts, do not mask identity within genome-wide association studies. The implications of these findings are discussed.
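The abstract does not state the test statistic, so the following is only a sketch of one plausible distance-based approach: for each SNP, ask whether the individual's allele frequency sits closer to the mixture or to a reference population, then summarise the per-SNP differences as a one-sample t statistic. The function name and the toy values are hypothetical.

```python
import math
from statistics import mean, stdev


def distance_statistic(person, reference, mixture):
    """One-sample t statistic over per-SNP distance differences.

    person    -- the individual's allele frequency per SNP (0, 0.5, or 1)
    reference -- allele frequencies in an assumed reference population
    mixture   -- allele frequencies measured in the mixture
    Positive values mean the person is closer to the mixture than to the reference.
    """
    diffs = [abs(y - r) - abs(y - m) for y, r, m in zip(person, reference, mixture)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical toy values for five SNPs.
person = [0.0, 0.5, 1.0, 1.0, 0.0]
reference = [0.3, 0.4, 0.6, 0.7, 0.2]
mixture = [0.2, 0.45, 0.7, 0.8, 0.1]
t_value = distance_statistic(person, reference, mixture)
```

With hundreds of thousands of SNPs on a genotyping array, even a very small per-SNP shift can accumulate into a large statistic, which is consistent with the abstract's claim that contributors below 0.1% of the mixture can be detected.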
U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30x genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers.
The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date.
• Rate of diagnostic yield of Clinical Exome Sequencing is not increasing.
• Rate of new genetic disease discovery is slowing.
• Novel methods are needed to improve diagnostic and discovery rates.
Neurodevelopmental diseases are characterized by impairments in brain and central nervous system development, and their causes are highly heterogeneous. Although many of these diseases are individually rare, collectively more than 3% of children worldwide are reported to be affected by a neurodevelopmental disease, and many remain undiagnosed even with current genomic tools. Identifying the genetic causes of these diseases allows better clinical management and expands our understanding of human neurodevelopment. Over the past decade, the expansion of genomic sequencing and some methodological improvements have increased the molecular diagnostic yield as well as the discovery of novel genetic causes for a wide spectrum of neurodevelopmental diseases. Here we review the current diagnostic workflow and propose ways of improving the diagnostic yield.
IMPORTANCE: Clinical exome sequencing (CES) is rapidly becoming a common molecular diagnostic test for individuals with rare genetic disorders. OBJECTIVE: To report on initial clinical indications for CES referrals and molecular diagnostic rates for different indications and for different test types. DESIGN, SETTING, AND PARTICIPANTS: Clinical exome sequencing was performed on 814 consecutive patients with undiagnosed, suspected genetic conditions at the University of California, Los Angeles, Clinical Genomics Center between January 2012 and August 2014. Clinical exome sequencing was conducted as trio-CES (both parents and their affected child sequenced simultaneously) to effectively detect de novo and compound heterozygous variants or as proband-CES (only the affected individual sequenced) when parental samples were not available. MAIN OUTCOMES AND MEASURES: Clinical indications for CES requests, molecular diagnostic rates of CES overall and for phenotypic subgroups, and differences in molecular diagnostic rates between trio-CES and proband-CES. RESULTS: Of the 814 cases, the overall molecular diagnosis rate was 26% (213 of 814; 95% CI, 23%-29%). The molecular diagnosis rate for trio-CES was 31% (127 of 410 cases; 95% CI, 27%-36%) and 22% (74 of 338 cases; 95% CI, 18%-27%) for proband-CES. In cases of developmental delay in children (<5 years, n = 138), the molecular diagnosis rate was 41% (45 of 109; 95% CI, 32%-51%) for trio-CES cases and 9% (2 of 23; 95% CI, 1%-28%) for proband-CES cases. The significantly higher diagnostic yield (P value = .002; odds ratio, 7.4; 95% CI, 1.6-33.1) of trio-CES was due to the identification of de novo and compound heterozygous variants. CONCLUSIONS AND RELEVANCE: In this sample of patients with undiagnosed, suspected genetic conditions, trio-CES was associated with higher molecular diagnostic yield than proband-CES or traditional molecular diagnostic methods.
Additional studies designed to validate these findings and to explore the effect of this approach on clinical and economic outcomes are warranted.
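The headline figures in the RESULTS can be reproduced from the reported counts with a short calculation. The sketch below recomputes the odds ratio for developmental delay (45 of 109 trio-CES versus 2 of 23 proband-CES) and an interval for the overall rate (213 of 814). The abstract does not say which interval method was used; the Wilson score interval here is an assumption, as are the function names.

```python
import math


def odds_ratio(diagnosed_a, undiagnosed_a, diagnosed_b, undiagnosed_b):
    """Odds of diagnosis in group A divided by the odds in group B."""
    return (diagnosed_a * undiagnosed_b) / (undiagnosed_a * diagnosed_b)


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (assumed method)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Developmental delay subgroup: 45 of 109 (trio) vs 2 of 23 (proband).
trio_vs_proband = odds_ratio(45, 109 - 45, 2, 23 - 2)

# Overall molecular diagnosis rate: 213 of 814.
overall_rate = 213 / 814
low, high = wilson_interval(213, 814)
```

The odds ratio works out to (45 × 21) / (64 × 2) = 945 / 128, which rounds to the reported 7.4, and the interval for the overall rate rounds to the reported 23%-29%.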
Mutations in DMD disrupt the reading frame, prevent dystrophin translation, and cause Duchenne muscular dystrophy (DMD). Here we describe a CRISPR/Cas9 platform applicable to 60% of DMD patient mutations. We applied the platform to DMD-derived hiPSCs where successful deletion and non-homologous end joining of up to 725 kb reframed the DMD gene. This is the largest CRISPR/Cas9-mediated deletion shown to date in DMD. Use of hiPSCs allowed evaluation of dystrophin in disease-relevant cell types. Cardiomyocytes and skeletal muscle myotubes derived from reframed hiPSC clonal lines had restored dystrophin protein. The internally deleted dystrophin was functional as demonstrated by improved membrane integrity and restoration of the dystrophin glycoprotein complex in vitro and in vivo. Furthermore, miR31 was reduced upon reframing, similar to observations in Becker muscular dystrophy. This work demonstrates the feasibility of using a single CRISPR pair to correct the reading frame for the majority of DMD patients.
• Largest CRISPR/Cas9-mediated deletion of 725 kb of DMD
• Reframed DMD hiPSCs differentiated to cardiac and skeletal muscle express dystrophin
• Internally deleted dystrophin demonstrates functionality in vitro and in vivo
• This single gRNA pair is therapeutically relevant to 60% of DMD mutations
Young et al. demonstrate restoration of the DMD reading frame by CRISPR/Cas9-mediated deletion of up to 725 kb in hiPSCs as a therapeutic strategy for 60% of Duchenne muscular dystrophy patients. The resulting internally deleted protein is shown to be functional in vitro and in vivo.
Human pluripotent stem cells (hPSCs) can be directed to differentiate into skeletal muscle progenitor cells (SMPCs). However, the myogenicity of hPSC-SMPCs relative to human fetal or adult satellite cells remains unclear. We observed that hPSC-SMPCs derived by directed differentiation are less functional in vitro and in vivo compared to human satellite cells. Using RNA sequencing, we found that the cell surface receptors ERBB3 and NGFR demarcate myogenic populations, including PAX7 progenitors in human fetal development and hPSC-SMPCs. We demonstrated that hPSC skeletal muscle is immature, but inhibition of transforming growth factor-β signalling during differentiation improved fusion efficiency, ultrastructural organization and the expression of adult myosins. This enrichment and maturation strategy restored dystrophin in hundreds of dystrophin-deficient myofibres after engraftment of CRISPR-Cas9-corrected Duchenne muscular dystrophy human induced pluripotent stem cell-SMPCs. The work provides an in-depth characterization of human myogenesis, and identifies candidates that improve the in vivo myogenic potential of hPSC-SMPCs to levels that are equal to directly isolated human fetal muscle cells.
Intratumoral heterogeneity contributes to cancer drug resistance, but the underlying mechanisms are not understood. Single-cell analyses of patient-derived models and clinical samples from glioblastoma patients treated with epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs) demonstrate that tumor cells reversibly up-regulate or suppress mutant EGFR expression, conferring distinct cellular phenotypes to reach an optimal equilibrium for growth. Resistance to EGFR TKIs is shown to occur by elimination of mutant EGFR from extrachromosomal DNA. After drug withdrawal, reemergence of clonal EGFR mutations on extrachromosomal DNA follows. These results indicate a highly specific, dynamic, and adaptive route by which cancers can evade therapies that target oncogenes maintained on extrachromosomal DNA.
Cytosine DNA methylation is important in regulating gene expression and in silencing transposons and other repetitive sequences. Recent genomic studies in Arabidopsis thaliana have revealed that many endogenous genes are methylated either within their promoters or within their transcribed regions, and that gene methylation is highly correlated with transcription levels. However, plants have different types of methylation controlled by different genetic pathways, and detailed information on the methylation status of each cytosine in any given genome is lacking. To this end, we generated a map at single-base-pair resolution of methylated cytosines for Arabidopsis, by combining bisulphite treatment of genomic DNA with ultra-high-throughput sequencing using the Illumina 1G Genome Analyser and Solexa sequencing technology. This approach, termed BS-Seq, unlike previous microarray-based methods, allows one to sensitively measure cytosine methylation on a genome-wide scale within specific sequence contexts. Here we describe methylation on previously inaccessible components of the genome and analyse the DNA methylation sequence composition and distribution. We also describe the effect of various DNA methylation mutants on genome-wide methylation patterns, and demonstrate that our newly developed library construction and computational methods can be applied to large genomes such as that of mouse.
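The principle behind bisulphite sequencing can be shown with a small sketch. Bisulphite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C; the fraction of C calls among reads covering a cytosine then estimates its methylation level. The function names and toy sequences are hypothetical, the CG/CHG/CHH context classes (H being A, C, or T) are the standard convention rather than something stated in the abstract, and this is not the authors' pipeline.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate conversion: unmethylated C is read as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def sequence_context(seq, i):
    """Classify the cytosine at index i as CG, CHG, or CHH.

    Returns None when too few downstream bases are available to decide.
    """
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    if i + 1 < len(seq) and seq[i + 1] == "G":
        return "CG"
    if i + 2 < len(seq):
        return "CHG" if seq[i + 2] == "G" else "CHH"
    return None


def methylation_level(c_reads, t_reads):
    """Fraction of reads at a cytosine that still show C after conversion."""
    return c_reads / (c_reads + t_reads)


# Hypothetical toy example: only the cytosine at index 1 is methylated.
converted = bisulfite_convert("ACGTCCA", {1})
contexts = [sequence_context("CGACAGCAA", i) for i in (0, 3, 6)]
```

Separating cytosines by context matters because, as the abstract notes, plants carry different types of methylation that are controlled by different genetic pathways.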
Motivation: The ability to detect copy-number variation (CNV) and loss of heterozygosity (LOH) from exome sequencing data extends the utility of this powerful approach that has mainly been used for point or small insertion/deletion detection.
Results: We present ExomeCNV, a statistical method to detect CNV and LOH using depth-of-coverage and B-allele frequencies, from mapped short sequence reads, and we assess both the method's power and the effects of confounding variables. We apply our method to a cancer exome resequencing dataset. As expected, accuracy and resolution are dependent on depth-of-coverage and capture probe design.
Availability: CRAN package 'ExomeCNV'.
Contact: fsathira@fas.harvard.edu; snelson@ucla.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
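The two signals named in the Results, depth-of-coverage and B-allele frequency, can be illustrated with a minimal sketch. This is not the ExomeCNV package's API or statistical model: the function names, the library-size normalisation, and the toy counts are all assumptions.

```python
import math


def log2_depth_ratio(case_depth, control_depth, case_total, control_total):
    """log2 ratio of library-size-normalised coverage for one exon.

    Values near 0 suggest no copy-number change, negative values a loss,
    and positive values a gain (before any statistical testing).
    """
    case_norm = case_depth / case_total
    control_norm = control_depth / control_total
    return math.log2(case_norm / control_norm)


def b_allele_frequency(b_reads, total_reads):
    """Fraction of reads at a heterozygous site that carry the B allele."""
    return b_reads / total_reads


# Hypothetical toy counts: an exon with half the coverage in the case sample,
# and a heterozygous site whose B allele has dropped well below 0.5.
ratio = log2_depth_ratio(50, 100, 1000, 1000)
baf = b_allele_frequency(10, 40)
```

A halved depth gives a log2 ratio of -1, consistent with loss of one copy in a pure diploid sample, while a B-allele frequency that departs from 0.5 at sites that are heterozygous in the control points to LOH. As the abstract notes, how reliably such shifts can be called depends on the depth of coverage and the capture probe design.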