We developed a novel software package, XCAVATOR, for the identification of genomic regions involved in copy number variants/alterations (CNVs/CNAs) from short- and long-read whole-genome sequencing experiments.
Using simulated and real datasets, we showed that our tool, based on a read count approach, is able to predict the boundaries and the absolute number of DNA copies of CNVs/CNAs with high resolution. To demonstrate the power of our software, we applied it to the analysis of Illumina and Pacific Biosciences data and compared its performance to that of ten other state-of-the-art tools.
All the analyses we performed demonstrate that XCAVATOR is able to detect germline and somatic CNVs/CNAs, outperforming all the other tools we compared it with. XCAVATOR is freely available at http://sourceforge.net/projects/xcavator/.
We developed a novel software tool, EXCAVATOR, for the detection of copy number variants (CNVs) from whole-exome sequencing data. EXCAVATOR combines a three-step normalization procedure with a novel heterogeneous hidden Markov model algorithm and a calling method that classifies genomic regions into five copy number states. We validate EXCAVATOR on three datasets and compare the results with three other methods. These analyses show that EXCAVATOR outperforms the other methods and is therefore a valuable tool for the investigation of CNVs in large-scale projects, as well as in clinical research and diagnostics. EXCAVATOR is freely available at http://sourceforge.net/projects/excavatortool/.
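As a rough illustration of what classifying regions into five copy number states can look like, the sketch below assigns a segment to the state whose expected log2 ratio is closest to the segment's observed value. The state names, the diploid baseline, the floor value for a homozygous deletion and the nearest-centre rule are all assumptions made for this example; they are not taken from EXCAVATOR, whose calling method is not described in this abstract.

```python
import math

# Hypothetical state centres: the expected log2(test/control) ratio of each
# state in a diploid genome. None of these values come from EXCAVATOR.
STATE_CENTRES = {
    "two-copy deletion": -3.0,              # assumed floor: log2(0/2) is undefined
    "one-copy deletion": math.log2(1 / 2),  # -1.0
    "normal": 0.0,
    "one-copy gain": math.log2(3 / 2),      # about 0.585
    "multi-copy gain": math.log2(4 / 2),    # 1.0
}


def call_state(segment_log2_ratio):
    """Return the state whose centre is nearest to the segment's log2 ratio."""
    return min(
        STATE_CENTRES,
        key=lambda state: abs(STATE_CENTRES[state] - segment_log2_ratio),
    )
```

For example, `call_state(-1.0)` selects the one-copy deletion state and `call_state(0.05)` selects the normal state. A real caller would also model noise and sample purity, which this sketch ignores.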
Motivation: The advent of high-throughput sequencing technologies is revolutionizing our ability to discover and genotype DNA copy number variants (CNVs). Read count-based approaches are able to detect CNV regions with an unprecedented resolution. Although this computational strategy has only recently been introduced in the literature, much work has already been done on the preparation, normalization and analysis of this kind of data.
Results: Here we address the many aspects of CNV detection using the read count approach. We first study the characteristics and systematic biases of read count distributions, focusing on the normalization methods designed for removing these biases. Subsequently, we compare the algorithms designed to detect the boundaries of CNVs and we investigate the ability of read count data to predict the exact number of DNA copies. Finally, we review the publicly available tools for analysing read count data. To better understand the state of the art of read count approaches, we compare the performance of the three most widely used sequencing technologies (Illumina Genome Analyzer, Roche 454 and Life Technologies SOLiD) in all the analyses that we perform.
Contact: albertomagi@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
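The read count approach reviewed above can be sketched in a few lines: correct per-window read counts for a systematic bias (here GC content), then convert a test/control ratio into a copy number estimate. This is a minimal sketch under stated assumptions, not any of the reviewed methods. The function names, the 0.05 GC bin width, the median-scaling correction and the diploid baseline are choices made for the example.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(read_counts, gc_fractions, bin_width=0.05):
    """Median-scale window read counts so that GC bins share one median.

    Each count is multiplied by (overall median / median of windows in the
    same GC bin). The bin width is an illustrative assumption.
    """
    overall_median = median(read_counts)
    counts_by_bin = defaultdict(list)
    for count, gc in zip(read_counts, gc_fractions):
        counts_by_bin[int(gc / bin_width)].append(count)
    bin_median = {b: median(values) for b, values in counts_by_bin.items()}

    normalized = []
    for count, gc in zip(read_counts, gc_fractions):
        m = bin_median[int(gc / bin_width)]
        # Leave a window at zero when its whole GC bin has a zero median.
        normalized.append(count * overall_median / m if m > 0 else 0.0)
    return normalized


def estimate_copy_number(test_count, control_count, ploidy=2):
    """Estimate integer copy number from a test/control read count ratio."""
    if control_count <= 0:
        raise ValueError("control read count must be positive")
    return round(ploidy * test_count / control_count)
```

With this sketch, windows that differ only because of their GC content end up with similar normalized counts, and `estimate_copy_number(150, 100)` gives 3 copies under the assumed diploid control.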
The clinical application of technological progress in the identification of DNA alterations has always led to improvements in diagnostic yield in genetic medicine. At the chromosome level, from cytogenetic techniques evaluating numerical and gross structural defects to genomic microarrays detecting cryptic copy number variants, and at the molecular level, from the Sanger method studying the nucleotide sequence of single genes to high-throughput next-generation sequencing (NGS) technologies, resolution and sensitivity have progressively increased, considerably expanding the range of detectable DNA anomalies and, alongside it, of Mendelian disorders with known genetic causes. However, particular genomic regions (e.g., repetitive and GC-rich sequences) are inefficiently analyzed by standard genetic tests, which still rely on laborious, time-consuming and low-sensitivity approaches (e.g., Southern blot for repeat expansions or long-PCR for genes with highly homologous pseudogenes); this accounts for at least part of the patients with undiagnosed genetic disorders. Third-generation sequencing, which generates long reads with improved mappability, is more suitable for the detection of structural alterations and of defects in hardly accessible genomic regions. Although only recently implemented and not yet clinically available, long-read sequencing (LRS) technologies have already shown, in genetic medicine research, a potential that might greatly affect diagnostic yield and reporting times once translated to clinical settings.
The most investigated LRS application concerns the identification of structural variants and repeat expansions, probably because techniques for their detection have not evolved as rapidly as those dedicated to single nucleotide variant (SNV) identification: the gold standard analyses are karyotyping and microarrays for balanced and unbalanced chromosome rearrangements, respectively, and Southern blot and repeat-primed PCR for the amplification and sizing of expanded alleles, all impaired by limited resolution and sensitivity that have not been significantly improved by the advent of NGS. Nevertheless, more recently, with the increased accuracy provided by the latest product releases, LRS has also been tested for SNV detection, especially in genes with highly homologous pseudogenes, and for haplotype reconstruction to assess the parental origin of alleles with pathogenic variants. We provide a review of relevant recent scientific papers exploring the potential of LRS in the diagnosis of genetic diseases and its possible future applications in routine genetic testing.
The advent of Whole Genome Sequencing (WGS) broadened the genetic variation detection range, revealing the presence of variants even in non-coding regions of the genome, which would have been missed using targeted approaches. One of the most challenging issues in WGS analysis concerns the interpretation of annotated variants. This review focuses on tools suitable for the functional annotation of variants falling into non-coding regions. It couples the description of non-coding genomic areas with the results and performance of existing tools for a functional interpretation of the effect of variants in these regions. Tools were tested in a controlled genomic scenario, representing the ground truth and allowing us to determine software performance.
Until recently, thrombocytopenia 2 (THC2) was considered an exceedingly rare form of autosomal dominant thrombocytopenia and only 2 families were known. However, we recently identified mutations in the 5′-untranslated region of the ANKRD26 gene in 9 THC2 families. Here we report on 12 additional pedigrees with ANKRD26 mutations, 6 of which are new. Because THC2 affected 21 of the 210 families in our database, it has to be considered one of the less rare forms of inherited thrombocytopenia. Analysis of all 21 families with ANKRD26 mutations identified to date revealed that thrombocytopenia and bleeding tendency were usually mild. Nearly all patients had no platelet macrocytosis, and this characteristic distinguishes THC2 from most other forms of inherited thrombocytopenia. In the majority of cases, platelets were deficient in glycoprotein Ia and α-granules, whereas in vitro platelet aggregation was normal. Bone marrow examination and serum thrombopoietin levels suggested that thrombocytopenia was derived from dysmegakaryopoiesis. Unexplained high values of hemoglobin and leukocytes were observed in a few cases. An unexpected finding that warrants further investigation was a high incidence of acute leukemia. Given the scarcity of distinctive characteristics, the ANKRD26-related thrombocytopenia has to be taken into consideration in the differential diagnosis of isolated thrombocytopenias.
The SLC12 gene family consists of SLC12A1-SLC12A9, encoding electroneutral cation-coupled chloride co-transporters. SLC12A2 has been shown to play a role in corticogenesis and therefore represents a strong candidate neurodevelopmental disorder gene. Through trio exome sequencing we identified de novo mutations in SLC12A2 in six children with neurodevelopmental disorders. All had developmental delay or intellectual disability ranging from mild to severe. Two had sensorineural deafness. We also identified SLC12A2 variants in three individuals with non-syndromic bilateral sensorineural hearing loss and vestibular areflexia. The SLC12A2 de novo mutation rate was demonstrated to be significantly elevated in the Deciphering Developmental Disorders cohort. All tested variants were shown to reduce co-transporter function in Xenopus laevis oocytes. Analysis of SLC12A2 expression in foetal brain at 16-18 weeks post-conception revealed high expression in radial glial cells, compatible with a role in neurogenesis. Gene co-expression analysis in cells robustly expressing SLC12A2 at 16-18 weeks post-conception identified a transcriptomic programme associated with active neurogenesis. We identify SLC12A2 de novo mutations as the cause of a novel neurodevelopmental disorder and bilateral non-syndromic sensorineural hearing loss and provide further data supporting a role for this gene in human neurodevelopment.
ETV6-related thrombocytopenia is an autosomal dominant thrombocytopenia that has been recently identified in a few families and has been suspected to predispose to hematologic malignancies. To gain further information on this disorder, we searched for ETV6 mutations in the 130 families with inherited thrombocytopenia of unknown origin from our cohort of 274 consecutive pedigrees with familial thrombocytopenia. We identified 20 patients with ETV6-related thrombocytopenia from seven pedigrees. They have five different ETV6 variants, including three novel mutations affecting the highly conserved E26 transformation-specific domain. The relative frequency of ETV6-related thrombocytopenia was 2.6% in the whole case series and 4.6% among the families with known forms of inherited thrombocytopenia. The degree of thrombocytopenia and bleeding tendency of the patients with ETV6-related thrombocytopenia were mild, but four subjects developed B-cell acute lymphoblastic leukemia during childhood, resulting in a significantly higher incidence of this condition compared to that in the general population. Clinical and laboratory findings did not identify any particular defects that could lead to the suspicion of this disorder from the routine diagnostic workup. However, at variance with most inherited thrombocytopenias, platelets were not enlarged. In vitro studies revealed that the maturation of the patients' megakaryocytes was defective and that the patients have impaired proplatelet formation. Moreover, platelets from patients with ETV6-related thrombocytopenia have reduced ability to spread on fibrinogen. Since the dominant thrombocytopenias due to mutations in RUNX1 and ANKRD26 are also characterized by normal platelet size and predispose to hematologic malignancies, we suggest that screening for ETV6, RUNX1 and ANKRD26 mutations should be performed in all subjects with autosomal dominant thrombocytopenia and normal platelet size.
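The two relative frequencies quoted above can be reproduced from the counts given in the abstract. The short check below is a sketch: the first denominator (274 pedigrees) is stated in the text, while the second (151 families with a known form) is an inference made here, obtained as the 144 families that already had a diagnosis (274 minus 130) plus the 7 newly identified ETV6 pedigrees.

```python
TOTAL_PEDIGREES = 274       # stated in the abstract
UNKNOWN_ORIGIN = 130        # stated in the abstract
ETV6_PEDIGREES = 7          # stated in the abstract

# Assumption: "families with known forms" = previously diagnosed families
# plus the newly diagnosed ETV6 pedigrees.
known_forms = (TOTAL_PEDIGREES - UNKNOWN_ORIGIN) + ETV6_PEDIGREES

freq_whole_series = round(100 * ETV6_PEDIGREES / TOTAL_PEDIGREES, 1)
freq_known_forms = round(100 * ETV6_PEDIGREES / known_forms, 1)

print(f"{freq_whole_series}% of all pedigrees, {freq_known_forms}% of known forms")
```

Under that reading, both computed percentages match the values reported in the abstract.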
Inherited thrombocytopenias (IT) are genetic diseases characterized by low platelet count, sometimes associated with congenital defects or a predisposition to develop additional conditions. Next-generation sequencing has substantially improved our knowledge of IT, with more than 40 genes identified so far, but obtaining a molecular diagnosis remains a challenge especially for patients with non-syndromic forms, having no clinical or functional phenotypes that raise suspicion about specific genes. We performed exome sequencing (ES) in a cohort of 116 IT patients (89 families), still undiagnosed after a previously validated phenotype-driven diagnostic algorithm including a targeted analysis of suspected genes. ES achieved a diagnostic yield of 36%, with a gain of 16% over the diagnostic algorithm. This can be explained by genetic heterogeneity and unspecific genotype-phenotype relationships that make the simultaneous analysis of all the genes, enabled by ES, the most reasonable strategy. Furthermore, ES disentangled situations that had been puzzling because of atypical inheritance, sex-related effects or false negative laboratory results. Finally, ES-based copy number variant analysis disclosed an unexpectedly high prevalence of RUNX1 deletions, predisposing to hematologic malignancies. Our findings demonstrate that ES, including copy number variant analysis, can substantially contribute to the diagnosis of IT and can solve diagnostic problems that would otherwise remain a challenge.