Open access
  • Predicting G-Quadruplexes f...
    Barshai, Mira; Orenstein, Yaron

    Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 09/2019
    Conference Proceeding

    G-quadruplexes are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G-quadruplex formation can affect chromatin architecture and gene regulation and has been associated with genomic instability, genetic diseases and cancer progression. G-quadruplex formation in a DNA template can be assessed using polymerase stop assays, which measure polymerase stalling at G-quadruplex sites. An experimental technique called G4-seq was developed by combining features of the polymerase stop assay with Illumina next-generation sequencing. The experimental data produced by this technique provide unprecedented detail on where, and at what intensity, G-quadruplexes form in the human genome. Still, running the experimental protocol on a whole genome is an expensive and time-consuming process. Thus, it is highly desirable to have a computational method to predict G-quadruplex formation in new DNA sequences or whole genomes. Here, we present a new method, called G4detector, to predict G-quadruplexes from DNA sequences based on multi-kernel convolutional neural networks. To test G4detector, we compiled novel high-throughput in vitro and in vivo benchmarks. On these data, we show that G4detector outperforms extant methods for the same task on all benchmark datasets. We visualize the most important features of G4detector models and discover that G-quadruplex formation is highly dependent on G-tract length, G-tract spacing, and the nucleotide composition between tracts. The code and benchmarks are publicly available at github.com/OrensteinLab/G4detector.
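
The idea of scanning a DNA sequence with convolution kernels of several widths can be illustrated with a small sketch. This is not the G4detector architecture or its trained weights: the function names, the kernel widths (3, 4, 5) and the hand-set kernel that rewards G and penalises other bases are all assumptions made for illustration. A trained network would learn its kernels from data such as G4-seq measurements.

```python
# Illustrative sketch only: hand-set kernels, NOT the G4detector model.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_valid(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence without padding."""
    width = len(kernel)
    out = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            row = encoded[start + offset]
            weights = kernel[offset]
            total += sum(x * w for x, w in zip(row, weights))
        out.append(total)
    return out


def relu(values):
    """Clamp negative activations to zero."""
    return [max(0.0, v) for v in values]


def g_run_kernel(width):
    """Hypothetical kernel: +1 for G, -1 for A, C or T at every position."""
    return [[-1.0, -1.0, 1.0, -1.0] for _ in range(width)]


def multi_kernel_features(seq, widths=(3, 4, 5)):
    """Return one global-max-pooled activation per kernel width."""
    encoded = one_hot(seq)
    features = []
    for width in widths:
        activations = relu(conv1d_valid(encoded, g_run_kernel(width)))
        # Global max pooling; sequences shorter than the kernel give 0.0.
        features.append(max(activations) if activations else 0.0)
    return features


if __name__ == "__main__":
    print(multi_kernel_features("TTGGGTTAGGGGTT"))  # [3.0, 4.0, 3.0]
```

Each width responds most strongly to a run of Gs of matching length, so the pooled vector summarises G-tract content at several scales. In a real multi-kernel network these pooled features would feed further layers that output a formation score.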