G-quadruplexes are non-B-DNA structures that form in the genome through Hoogsteen bonds between guanines in single or multiple strands of DNA. The functions of G-quadruplexes are linked to various molecular and disease phenotypes, and thus researchers are interested in measuring G-quadruplex formation genome-wide. Experimentally measuring G-quadruplexes is a long and laborious process. Computational prediction of G-quadruplex propensity from a given DNA sequence is thus a long-standing challenge. Unfortunately, despite the availability of high-throughput datasets measuring G-quadruplex propensity in the form of mismatch scores, extant methods to predict G-quadruplex formation either rely on small datasets or are based on domain-knowledge rules. We developed G4mismatch, a novel algorithm to accurately and efficiently predict G-quadruplex propensity for any genomic sequence. G4mismatch is based on a convolutional neural network trained on almost 400 million human genomic loci measured in a single G4-seq experiment. When tested on sequences from a held-out chromosome, G4mismatch, the first method to predict mismatch scores genome-wide, achieved a Pearson correlation of over 0.8. When benchmarked on independent datasets derived from various animal species, G4mismatch trained on human data predicted G-quadruplex propensity genome-wide with high accuracy (Pearson correlations greater than 0.7). Moreover, when tested in detecting G-quadruplexes genome-wide using the predicted mismatch scores, G4mismatch achieved superior performance compared to extant methods. Last, we demonstrate the ability to deduce the mechanism behind G-quadruplex formation by unique visualization of the principles learned by the model.
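The evaluation described above compares predicted mismatch scores with measured ones using the Pearson correlation. The following is a minimal sketch of that metric in plain Python; the function name and the example scores are made up for illustration and are not taken from the G4mismatch code or data.

```python
import math
from typing import Sequence


def pearson_correlation(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(measured) != len(predicted):
        raise ValueError("inputs must have the same length")
    n = len(measured)
    if n < 2:
        raise ValueError("need at least two paired scores")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    # Centered cross-product and sums of squares.
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_m * var_p)


# Hypothetical mismatch scores, for illustration only (not real G4-seq data).
measured_scores = [2.0, 15.5, 40.0, 8.0, 27.5]
predicted_scores = [3.0, 14.0, 35.0, 10.0, 30.0]
r = pearson_correlation(measured_scores, predicted_scores)
print(f"Pearson r = {r:.3f}")
```

In a held-out-chromosome setting such as the one described, the two lists would hold the measured and predicted scores for loci of the chromosome excluded from training.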
G-quadruplexes (G4s) are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G4 formation can affect chromatin architecture and gene regulation, and has been associated with genomic instability, genetic diseases, and cancer progression. The experimental data produced by the G4-seq experiment provides unprecedented details on G4 formation in the genome. Still, running the experimental protocol on a whole genome is an expensive and time-consuming process. Thus, it is highly desirable to have a computational method to predict G4 formation in new DNA sequences or whole genomes. Here, we present G4detector, a new method based on a convolutional neural network to predict G4s from DNA sequences. On top of the sequence information, we improved prediction accuracy by the addition of RNA secondary structure information. To train and test G4detector, we compiled novel high-throughput benchmarks over the genomes of multiple species measured by the G4-seq protocol. We show that G4detector outperforms extant methods for the same task on all benchmark datasets, can detect G4s genome-wide with high accuracy, and is able to extrapolate human-trained measurements to various non-human species. The code and benchmarks are publicly available on github.com/OrensteinLab/G4detector.
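A convolutional network that predicts G4s from DNA sequence needs a numeric representation of that sequence. The abstract does not state which encoding G4detector uses; the sketch below assumes the common one-hot encoding, and the channel order `ACGT` and the all-zero treatment of ambiguous bases are illustrative assumptions.

```python
from typing import List

# Channel order is an assumption made for this sketch.
ALPHABET = "ACGT"


def one_hot_encode(sequence: str) -> List[List[float]]:
    """Encode a DNA string as a list of 4-element one-hot rows (one row per base)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0.0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1.0
        # Bases outside ACGT (for example N) stay as an all-zero row (assumption).
        encoded.append(row)
    return encoded


for base, row in zip("GGNT", one_hot_encode("GGNT")):
    print(base, row)
```

The resulting matrix has one row per nucleotide and one column per channel, which is the shape a one-dimensional convolution layer typically consumes.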
RNA G-quadruplexes (rG4s) are RNA secondary structures, which are formed by guanine-rich sequences and have important cellular functions. Existing computational tools for rG4 prediction rely on specific sequence features and/or were trained on small datasets, without considering rG4 stability information, and are therefore sub-optimal. Here, we developed rG4detector, a convolutional neural network to identify potential rG4s in transcriptomics data. rG4detector outperforms existing methods in both predicting rG4 stability and in detecting rG4-forming sequences. To demonstrate the biological relevance of rG4detector, we employed it to study RNAs that are bound by the RNA-binding protein G3BP1. G3BP1 is central to the induction of stress granules (SGs), which are cytoplasmic biomolecular condensates that form in response to a variety of cellular stresses. Unexpectedly, rG4detector revealed a dynamic enrichment of rG4s bound by G3BP1 in response to cellular stress. In addition, we experimentally characterized G3BP1 cross-talk with rG4s, demonstrating that G3BP1 is a bona fide rG4-binding protein and that endogenous rG4s are enriched within SGs. Furthermore, we found that reduced rG4 availability impairs SG formation. Hence, we conclude that rG4s play a direct role in SG biology via their interactions with RNA-binding proteins and that rG4detector is a novel and useful tool for rG4 transcriptomics data analyses.
Single-stranded DNA (ssDNA) containing four guanine repeats can form G-quadruplex (G4) structures. While cellular proteins and small molecules can bind G4s, it has been difficult to broadly assess their DNA-binding specificity. Here, we use custom DNA microarrays to examine the binding specificities of proteins, small molecules, and antibodies across ∼15,000 potential G4 structures. Molecules used include fluorescently labeled pyridostatin (Cy5-PDS, a small molecule), BG4 (Cy5-BG4, a G4-specific antibody), and eight proteins (GST-tagged nucleolin, IGF2, CNBP, FANCJ, PIF1, BLM, DHX36, and WRN). Cy5-PDS and Cy5-BG4 selectively bind sequences known to form G4s, confirming their formation on the microarrays. Cy5-PDS binding decreased when G4 formation was inhibited using lithium or when ssDNA features on the microarray were made double-stranded. Similar conditions inhibited the binding of all other molecules except for CNBP and PIF1. We report that proteins have different G4-binding preferences, suggesting unique cellular functions. Finally, competition experiments are used to assess the binding specificity of an unlabeled small molecule, revealing the structural features in the G4 required to achieve selectivity. These data demonstrate that the microarray platform can be used to assess the binding preferences of molecules to G4s on a broad scale, helping to understand the properties that govern molecular recognition.
G-quadruplexes are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G-quadruplex formation can affect chromatin architecture and gene regulation and has been associated with genomic instability, genetic diseases and cancer progression. G-quadruplex formation in a DNA template can be assessed using polymerase stop assays, which measure polymerase stalling at G-quadruplex sites. An experimental technique, called G4-seq, was developed by combining features of the polymerase stop assay with Illumina next-generation sequencing. The experimental data produced by this technique provides unprecedented details on where and at what intensity G-quadruplexes form in the human genome. Still, running the experimental protocol on a whole genome is an expensive and time-consuming process. Thus, it is highly desirable to have a computational method to predict G-quadruplex formation in new DNA sequences or whole genomes. Here, we present a new method, called G4detector, to predict G-quadruplexes from DNA sequences based on multi-kernel convolutional neural networks. To test G4detector, we compiled novel high-throughput in vitro and in vivo benchmarks. On these data, we show that G4detector outperforms extant methods for the same task on all benchmark datasets. We visualize the most important features of G4detector models and discover that G-quadruplex formation is highly dependent on G-tract length, G-tract spacing, and the nucleotide composition between the tracts. The code and benchmarks are publicly available on github.com/OrensteinLab/G4detector.
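The three properties named above (G-tract length, spacing between tracts, and the nucleotides between them) can be read directly from a sequence. The sketch below is only an illustration of those quantities, not part of G4detector; the function names and the default minimum tract length of 2 are assumptions.

```python
import re
from typing import List, Tuple


def g_tracts(sequence: str, min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, length) for every maximal run of G with length >= min_len."""
    return [
        (match.start(), len(match.group()))
        for match in re.finditer(r"G+", sequence.upper())
        if len(match.group()) >= min_len
    ]


def loops_between(sequence: str, tracts: List[Tuple[int, int]]) -> List[str]:
    """Return the subsequences lying between consecutive G-tracts."""
    seq = sequence.upper()
    loops = []
    for (start_a, len_a), (start_b, _) in zip(tracts, tracts[1:]):
        loops.append(seq[start_a + len_a:start_b])
    return loops


example = "GGGTTAGGGTTAGGGTTAGGG"
tracts = g_tracts(example)
print("tract (start, length):", tracts)
print("loops:", loops_between(example, tracts))
```

Tract lengths come from the second element of each tuple, spacing from the loop lengths, and composition from the loop strings themselves.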
G-quadruplexes are nucleic acid secondary structures that form within guanine-rich DNA sequences. G-quadruplex formation can affect chromatin architecture and gene regulation and has been associated with genomic instability, genetic diseases and cancer progression. Here, we present a new method, called G4detector, to predict G-quadruplexes from DNA sequences based on multi-kernel convolutional neural networks. The code and benchmarks are publicly available on \url{github.com/OrensteinLab/G4detector}. As part of this study, we generated novel benchmarks to train and test different computational methods for the task of G4 prediction. We used the high-throughput \textit{in vitro} data generated by the G4-seq protocol %(Chambers et al. 2015).
~\cite{chambers2015high}. We turned each of the three sets into a classification problem by augmenting it with a negative set, using three types of negatives:
\begin{enumerate}
\item random: random genomic sequences
\item dishuffle: randomly shuffled positives while preserving dinucleotide frequencies
\item PQ: predicted G-quadruplexes in the human genome according to the regular expression:
\begin{equation}
[G]^{3+}[ACGT]^{1-7}[G]^{3+}[ACGT]^{1-7}[G]^{3+}[ACGT]^{1-7}[G]^{3+}
\end{equation}
\end{enumerate}
We used the genomic coordinates as measured in the G4 ChIP-seq experiment %(H\"{a}nsel-Hertsch et al. 2016)
~\cite{hansel2016g} to create an \textit{in vivo} benchmark. Since the G4 structures and the loops that comprise them vary in size, using a kernel of fixed size might not be beneficial for identifying the features that characterize a G4 structure. Instead, our method, G4detector, employs three parallel one-dimensional convolution layers %(Zhang et al. 2018).
~\cite{zhang2018high}. The final output assigns to each input DNA sequence a probability of forming a G4 structure. We compared G4detector to three other methods: GraphProt %(Maticzka et al. 2014),
~\cite{maticzka2014graphprot}, Quadron %(Sahakyan et al. 2017),
~\cite{sahakyan2017machine} and G4Hunter %(Bedrat et al. 2016).
~\cite{bedrat2016re}. According to the results, G4detector outperforms all competing methods in predicting \textit{in vitro} G4 formation as well as in predicting \textit{in vivo} formation, maintaining the highest area under the receiver operating characteristic curve (AUC) score, as summarized in Figure~\ref{fig:all}.
\begin{figure}[!h]
\centering
\begin{subfigure}[a]{0.4\textwidth}
\includegraphics[width=\columnwidth]{invitro_c.png}
\caption{}
\label{fig:Ng1}
\end{subfigure}
\begin{subfigure}[b]{0.4\textwidth}
\includegraphics[width=\columnwidth]{invivo_c.png}
\caption{}
\label{fig:Ng2}
\end{subfigure}
\caption{G4detector outperforms extant methods in predicting G4 formation (a) \textit{in vitro} and (b) \textit{in vivo}.}
%using three different types of stabilizers and negative sets.
\label{fig:all}
\end{figure}
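The PQ negative set above is defined by a regular expression with four G-runs of length three or more, separated by loops of one to seven nucleotides. A minimal Python sketch of that expression follows, with the superscripts written as curly-brace quantifiers; the function name and the example sequence are illustrative assumptions, and how the authors actually scanned the genome (strand handling, overlapping matches) is not specified in the text.

```python
import re
from typing import List, Tuple

# Four runs of >= 3 guanines separated by loops of 1-7 nucleotides,
# written in Python regex syntax. Loops may themselves contain G,
# as the expression in the text allows.
PQ_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_predicted_quadruplexes(sequence: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched_text) for non-overlapping matches, left to right."""
    return [
        (match.start(), match.end(), match.group())
        for match in PQ_PATTERN.finditer(sequence.upper())
    ]


# Telomeric-style repeat flanked by two adenines on each side (illustrative input).
example = "AAGGGTTAGGGTTAGGGTTAGGGAA"
for start, end, text in find_predicted_quadruplexes(example):
    print(start, end, text)
```

Only the forward strand is scanned here; a genome-wide scan would normally also search for the complementary C-rich pattern, which this sketch leaves out.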
rG4detector Turner, Maor; Barshai, Mira; Orenstein, Yaron
Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics,
08/2022
Conference Proceeding
Open access
RNA G-quadruplexes (rG4s) are RNA secondary structures, which are formed by guanine-rich sequences and have important cellular functions. Thus, researchers would like to know where and when rG4s are formed throughout the transcriptome. Measuring rG4s experimentally is a long and laborious process, and hence researchers often rely on computational methods to predict the rG4 propensity of a given RNA sequence. However, existing computational methods for rG4 propensity prediction are sub-optimal since they rely on specific sequence features and/or were trained on small datasets and without considering rG4 stability information. Here, we developed rG4detector, a convolutional neural network to predict the rG4 propensity of any given RNA sequence. We demonstrated that rG4detector outperforms existing methods over various transcriptomic datasets. In addition, we used rG4detector to detect potential rG4s in transcriptomic data, and showed that it improves detection performance compared to existing methods. Last, we interrogated rG4detector for the important features it learned and discovered known and novel molecular principles behind rG4 formation. We expect rG4detector to advance future rG4 research by accurate detection and propensity prediction of rG4s. The code, trained models, and processed datasets are publicly available via github.com/OrensteinLab/rG4detector.
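Detecting potential rG4s in a long transcript with a model that scores fixed-length sequences is commonly done by sliding a window along the transcript and scoring each window. The sketch below shows only that general pattern and is not the rG4detector procedure: the scoring function is a placeholder (guanine fraction) standing in for a trained model, and the window length, step, and all names are hypothetical.

```python
from typing import Callable, List, Tuple


def g_fraction(window: str) -> float:
    """Placeholder scorer: fraction of guanines. This is NOT a trained network."""
    return window.count("G") / len(window) if window else 0.0


def scan_transcript(
    transcript: str,
    score_fn: Callable[[str], float] = g_fraction,
    window: int = 30,
    step: int = 1,
) -> List[Tuple[int, float]]:
    """Score every window of the transcript; return (start, score) pairs."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = transcript.upper().replace("T", "U")  # accept DNA-style input as RNA
    if len(seq) <= window:
        return [(0, score_fn(seq))]
    return [
        (start, score_fn(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def best_window(scores: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the first (start, score) pair with the highest score."""
    return max(scores, key=lambda item: item[1])


toy_transcript = "A" * 10 + "GGGUGGGUGGGUGGG" + "A" * 10
print(best_window(scan_transcript(toy_transcript, window=15)))
```

Swapping `g_fraction` for a call into a trained model would keep the scanning logic unchanged, since the scanner only requires a callable that maps a sequence to a score.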
Background
Hemolytic disease of the fetus and newborn (HDFN) is a severe form of anemia caused by maternal antibodies against fetal red blood cells (RBCs) that can cause intrauterine and perinatal morbidity and mortality. The prevalence and specificities of alloantibodies among Israeli pregnant women and clinical outcomes for their fetuses and newborns are unknown.
Study Design and Methods
A retrospective study of women who gave birth between January 1, 2011, and December 31, 2011, was performed. Data were obtained for obstetric admissions from 16 of 27 hospitals, which included results of maternal ABO, D, antibody screens, antibody identification, and requirements for intrauterine or newborn exchange transfusions.
Results
Data on 90 948 women representing 70% of all births during 2011 were analyzed. The antibody screen was positive in 5245 (5.8%) women. Alloantibodies, excluding anti‐D with a titer below 16, were identified in 900 (1.0%) women. Of 191 D– women, 75 (39.3%) had an anti‐D titer of 16 or greater. Other common clinically significant antibodies were anti‐E (204, 23%), anti‐K (145, 16%), and anti‐c (97, 10.8%), alone or in antibody combinations. Multiple alloantibodies were observed in 132 of 900 (15%) women. Severe HDFN developed in 6.8% (9/132) of these pregnancies. Seventeen fetuses and newborns (0.02% of births), including one set of twins, required RBC transfusions. Two fetuses whose mothers had multiple alloantibodies received intrauterine transfusions; one of them was hydropic and died.
Conclusion
The prevalence of RBC alloantibodies was 1.0% among Israeli pregnant women. Transfusion was required in 0.02% of the fetuses and newborns. Severe HDFN developed in 6.8% of pregnancies with multiple maternal alloantibodies.
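The percentages reported in the Results can be recomputed from the stated counts. The short check below does this; the helper name is illustrative, and using the 90 948 women as the denominator for the transfusion rate is an assumption, since the exact number of births is not given.

```python
def percent(part: int, whole: int, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to the given number of digits."""
    return round(100 * part / whole, digits)


WOMEN = 90_948          # women analyzed
ALLOIMMUNIZED = 900     # women with alloantibodies

checks = {
    "positive antibody screen": percent(5245, WOMEN),
    "alloantibodies identified": percent(ALLOIMMUNIZED, WOMEN),
    "anti-D titer >= 16 among 191 D- women": percent(75, 191),
    "anti-E among alloimmunized": percent(204, ALLOIMMUNIZED),
    "anti-K among alloimmunized": percent(145, ALLOIMMUNIZED),
    "anti-c among alloimmunized": percent(97, ALLOIMMUNIZED),
    "multiple alloantibodies": percent(132, ALLOIMMUNIZED),
    "severe HDFN with multiple alloantibodies": percent(9, 132),
    # Denominator assumed to be the number of women, not births.
    "transfused fetuses and newborns": percent(17, WOMEN, digits=2),
}

for label, value in checks.items():
    print(f"{label}: {value}%")
```

The recomputed values agree with the reported ones once the reported figures for anti‐E, anti‐K, and multiple alloantibodies are read as rounded to whole percentages.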