The secondary structure of RNA is integral to the variety of functions it carries out in the cell, and its depiction allows researchers to develop hypotheses about which nucleotides and base pairs are functionally relevant. Current approaches to visualizing secondary structure provide an adequate platform for the conversion of static text-based representations to 2D images, but are limited in the interactivity they offer and in their ability to display larger structures, multiple structures and pseudoknotted structures.
In this article, we present forna, a web-based tool for displaying RNA secondary structure which allows users to easily convert sequences and secondary structures to clean, concise and customizable visualizations. It supports, among other features, the simultaneous visualization of multiple structures, the display of pseudoknotted structures, the interactive editing of the displayed structures, and the automatic generation of secondary structure diagrams from PDB files. It requires no software installation apart from a modern web browser.
The web interface of forna is available at http://rna.tbi.univie.ac.at/forna, and the source code is available on GitHub at www.github.com/pkerpedjiev/forna.
pkerp@tbi.univie.ac.at
Supplementary data are available at Bioinformatics online.
We present HiGlass, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others. We demonstrate its utility in exploring different experimental conditions, comparing the results of analyses, and creating interactive snapshots to share with collaborators and the broader public. HiGlass is accessible online at http://higlass.io and is also available as a containerized application that can be run on any platform.
The three-dimensional conformation of a genome can be profiled using Hi-C, a technique that combines chromatin conformation capture with high-throughput sequencing. However, structural variations often yield features that can be mistaken for chromosomal interactions. Here, we describe a computational method, HiNT (Hi-C for copy Number variation and Translocation detection), which detects copy number variations and interchromosomal translocations within Hi-C data with breakpoints at single base-pair resolution. We demonstrate that HiNT outperforms existing methods on both simulated and real data. We also show that Hi-C can supplement whole-genome sequencing in structural variant detection by locating breakpoints in repetitive regions.
Intermolecular interactions of ncRNAs are at the core of gene regulation events, and identifying the full map of these interactions is crucial for ncRNA functional studies. It is known that RNA-RNA interactions are built up by complementary base pairings between interacting RNAs and that a high level of complementarity between two RNA sequences is a powerful predictor of such interactions. Here, we present RIsearch2, a large-scale RNA-RNA interaction prediction tool that enables quick localization of potential near-complementary RNA-RNA interactions between given query and target sequences. In contrast to previous heuristics, which either search for exact matches while including G-U wobble pairs or employ simplified energy models, we present a novel approach using a single integrated seed-and-extend framework based on suffix arrays. RIsearch2 enables fast discovery of candidate RNA-RNA interactions on a genome/transcriptome-wide scale. We furthermore present an siRNA off-target discovery pipeline that not only predicts the off-target transcripts but also computes the off-targeting potential of a given siRNA. This is achieved by combining genome-wide RIsearch2 predictions with target site accessibilities and transcript abundance estimates. We show that this pipeline accurately predicts siRNA off-target interactions and enables off-targeting potential comparisons between different siRNA designs. RIsearch2 and the siRNA off-target discovery pipeline are available as stand-alone software packages from http://rth.dk/resources/risearch.
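The seed stage of a suffix-array search can be illustrated with a minimal Python sketch. It is not the RIsearch2 implementation: it looks up exact Watson-Crick seed matches only, ignores G-U wobble pairs, the energy model and the extension step, and builds the suffix array naively. All function names are hypothetical.

```python
def build_suffix_array(text):
    # Naive construction by sorting all suffixes; fine for short examples only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def reverse_complement(rna):
    # Watson-Crick complement only; G-U wobble pairs are ignored in this sketch.
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(rna))


def find_seed_sites(target, suffix_array, seed):
    """Return sorted 0-based target positions where `seed` would pair perfectly."""
    pattern = reverse_complement(seed)
    k = len(pattern)

    def prefix(rank):
        # First k characters of the suffix at the given rank.
        start = suffix_array[rank]
        return target[start:start + k]

    # Lower bound: first suffix whose k-prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose k-prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


target = "AUGGCUAGGCU"
suffix_array = build_suffix_array(target)
sites = find_seed_sites(target, suffix_array, "AGCC")
```

Because the suffix array is sorted, every suffix that begins with the reverse complement of the seed occupies one contiguous block, which two binary searches locate without scanning the target.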
Modern DNA sequencing methods produce vast amounts of data that often require mapping to a reference genome. Most existing programs use the number of mismatches between the read and the genome as a measure of quality. This approach lacks a statistical foundation and can, for some data types, result in many wrongly mapped reads. Here we present a probabilistic mapping method based on position-specific scoring matrices, which can take into account not only the quality scores of the reads but also user-specified models of evolution and data-specific biases.
We show how evolution, data-specific biases, and sequencing errors are naturally dealt with probabilistically. Our method achieves better results than Bowtie and BWA on simulated and real ancient and PAR-CLIP reads, as well as on simulated reads from the AT-rich organism P. falciparum, when modeling the biases of these data. For simulated Illumina reads, the method has consistently higher sensitivity for both single-end and paired-end data. We also show that our probabilistic approach can limit the problem of random matches from short reads of contamination and that it improves the mapping of real reads from one organism (D. melanogaster) to a related genome (D. simulans).
The presented work is an implementation of a novel approach to short read mapping where quality scores, prior mismatch probabilities and mapping qualities are handled in a statistically sound manner. The resulting implementation provides not only a tool for biologists working with low-quality and/or biased sequencing data but also a demonstration of the feasibility of using a probability-based alignment method on real and simulated data sets.
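The idea of turning a read and its quality scores into a position-specific scoring matrix can be sketched in Python as follows. This is an illustrative simplification, not the authors' implementation: it assumes a uniform background of 0.25 per base, assumes a sequencing error is equally likely to produce any of the three other bases, and leaves out the evolutionary models and data-specific biases mentioned above. All function names are hypothetical.

```python
import math

BASES = "ACGT"


def phred_to_error(quality):
    # Standard Phred definition: Q = -10 * log10(P_error).
    return 10 ** (-quality / 10)


def read_to_pssm(read, qualities, background=0.25):
    """Build one log-odds column per read position.

    Assumption: an error yields each of the three other bases with equal
    probability, and the error rate is capped at 0.75 so that the observed
    base is never less likely than a random one.
    """
    pssm = []
    for base, quality in zip(read, qualities):
        error = min(phred_to_error(quality), 0.75)
        column = {}
        for candidate in BASES:
            prob = 1 - error if candidate == base else error / 3
            column[candidate] = math.log2(prob / background)
        pssm.append(column)
    return pssm


def score_at(pssm, genome, start):
    # Sum the log-odds of the genomic bases under the read's PSSM.
    window = genome[start:start + len(pssm)]
    return sum(column[base] for column, base in zip(pssm, window))


def best_hit(pssm, genome):
    # Exhaustive scan over all ungapped placements; a real mapper would index.
    positions = range(len(genome) - len(pssm) + 1)
    return max(positions, key=lambda start: score_at(pssm, genome, start))


pssm = read_to_pssm("ACGT", [30, 30, 30, 30])
hit = best_hit(pssm, "TTACGTTT")
```

With this scoring, a mismatch at a low-quality position costs less than one at a high-quality position, which is the behaviour a plain mismatch count cannot express.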
The 4D Nucleome (4DN) Network aims to elucidate the complex structure and organization of chromosomes in the nucleus and the impact of their disruption in disease biology. We present the 4DN Data Portal (https://data.4dnucleome.org/), a repository for datasets generated in the 4DN network and relevant external datasets. Datasets were generated with a wide range of experiments, including chromosome conformation capture assays such as Hi-C and other innovative sequencing and microscopy-based assays probing chromosome architecture. Altogether, the 4DN data portal hosts more than 1800 experiment sets and 36000 files. Results of sequencing-based assays from different laboratories are uniformly processed and quality-controlled. The portal interface allows easy browsing, filtering, and bulk downloads, and the integrated HiGlass genome browser allows interactive visualization and comparison of multiple datasets. The 4DN data portal represents a primary resource for chromosome contact and other nuclear architecture data for the scientific community.
Achromobacter xylosoxidans is an environmental opportunistic pathogen, which infects an increasing number of immunocompromised patients. In this study we combined genomic analysis of a clinically isolated A. xylosoxidans strain with phenotypic investigations of its important pathogenic features. We present a complete assembly of the genome of A. xylosoxidans NH44784-1996, an isolate from a cystic fibrosis patient obtained in 1996. The genome of A. xylosoxidans NH44784-1996 contains approximately 7 million base pairs with 6390 potential protein-coding sequences. We identified several features that render it an opportunistic human pathogen. We found genes involved in anaerobic growth and the pgaABCD operon encoding the biofilm adhesin poly-β-1,6-N-acetyl-D-glucosamine. Furthermore, the genome contains a range of antibiotic resistance genes coding for efflux pump systems and antibiotic-modifying enzymes. In vitro studies of A. xylosoxidans NH44784-1996 confirmed the genomic evidence for its ability to form biofilms, grow anaerobically via denitrification, and resist a broad range of antibiotics. Our investigation enables further studies of the functionality of important identified genes contributing to the pathogenicity of A. xylosoxidans and thereby improves our understanding of this emerging pathogen and our ability to treat it.
We present forgi, a Python library to analyze the tertiary structure of RNA secondary structure elements. Our representation of an RNA molecule is centered on secondary structure elements (stems, bulges and loops). By fitting a cylinder to the helix axis, these elements are carried over into a coarse-grained 3D structure representation. Integration with Biopython allows for handling of all-atom 3D information.
forgi can deal with a variety of file formats including dot-bracket strings, PDB and mmCIF files. It can handle modified residues, missing residues, cofold and multifold structures as well as nucleotide numbers starting at arbitrary positions. We apply this library to the study of stacking helices in junctions and pseudoknots and investigate how far stacking helices in solved experimental structures can deviate from coaxial geometries.
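To make the dot-bracket input format concrete, the following Python sketch parses a dot-bracket string into base pairs and groups stacked pairs into stems. It does not use the forgi API; the function names are hypothetical, and it assumes a single bracket type, so pseudoknot notation with additional bracket characters is rejected.

```python
def parse_dot_bracket(structure):
    """Map each paired position (1-based) to its partner position."""
    pairs = {}
    open_positions = []
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            open_positions.append(pos)
        elif char == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            partner = open_positions.pop()
            pairs[partner] = pos
            pairs[pos] = partner
        elif char != ".":
            # Assumption: only '(', ')' and '.' are supported in this sketch.
            raise ValueError(f"unsupported character {char!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return pairs


def find_stems(pairs):
    """Group directly stacked pairs (i, j), (i+1, j-1), ... into stems."""
    stems = []
    current = []
    for i in sorted(pairs):
        j = pairs[i]
        if i > j:
            continue  # visit each pair once, from its 5' side
        if current and current[-1] == (i - 1, j + 1):
            current.append((i, j))
        else:
            if current:
                stems.append(current)
            current = [(i, j)]
    if current:
        stems.append(current)
    return stems


pairs = parse_dot_bracket("((..((...))..))")
stems = find_stems(pairs)
```

In this example the two stems are separated by unpaired nucleotides on both strands, which corresponds to an interior loop between them, and the innermost unpaired run forms a hairpin loop.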
This paper presents an interactive visualization interface, HiPiler, for the exploration and visualization of regions of interest in large genome interaction matrices. Genome interaction matrices approximate the physical distance of pairs of regions on the genome to each other and can contain up to 3 million rows and columns with many sparse regions. Regions of interest (ROIs) can be defined, e.g., by sets of adjacent rows and columns, or by specific visual patterns in the matrix. However, traditional matrix aggregation or pan-and-zoom interfaces fail to support search, inspection, and comparison of ROIs in such large matrices. In HiPiler, ROIs are first-class objects, represented as thumbnail-like "snippets". Snippets can be interactively explored and grouped or laid out automatically in scatterplots, or through dimension reduction methods. Snippets are linked to the entire navigable genome interaction matrix through brushing and linking. The design of HiPiler is based on a series of semi-structured interviews with 10 domain experts involved in the analysis and interpretation of genome interaction matrices. We describe six exploration tasks that are crucial for analysis of interaction matrices and demonstrate how HiPiler supports these tasks. We report on a user study with a series of data exploration sessions with domain experts to assess the usability of HiPiler as well as to demonstrate respective findings in the data.