HiC-Pro is an optimized and flexible pipeline for processing Hi-C data from raw reads to normalized contact maps. HiC-Pro maps reads, detects valid ligation products, performs quality controls and generates intra- and inter-chromosomal contact maps. It includes a fast implementation of the iterative correction method and is based on a memory-efficient data format for Hi-C contact maps. In addition, HiC-Pro can use phased genotype data to build allele-specific contact maps. We applied HiC-Pro to different Hi-C datasets, demonstrating its ability to easily process large datasets in a reasonable time. Source code and documentation are available at http://github.com/nservant/HiC-Pro.
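The iterative correction method mentioned above balances a contact matrix so that every genomic bin has the same total visibility. As a minimal sketch (dense NumPy matrix; HiC-Pro's actual implementation is sparse and memory-efficient, and the iteration count here is an illustrative choice):

```python
import numpy as np

def iterative_correction(contact_map, n_iter=50):
    """Iteratively balance a symmetric Hi-C contact matrix so that
    all row/column sums become equal. Returns the corrected matrix
    and the per-bin bias vector accumulated over the iterations."""
    W = np.asarray(contact_map, dtype=float).copy()
    bias = np.ones(W.shape[0])
    for _ in range(n_iter):
        s = W.sum(axis=1)
        s /= s[s > 0].mean()      # rescale to keep total counts stable
        s[s == 0] = 1.0           # leave empty bins untouched
        W /= np.outer(s, s)       # symmetric correction
        bias *= s
    return W, bias
```

After convergence the row sums of the corrected matrix are (nearly) uniform, and the bias vector can be reused to normalize other maps at the same resolution.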
More and more cancer studies use next-generation sequencing (NGS) data to detect various types of genomic variation. However, even when researchers have such data at hand, single-nucleotide polymorphism arrays have been considered necessary to assess copy number alterations and especially loss of heterozygosity (LOH). Here, we present the tool Control-FREEC that enables automatic calculation of copy number and allelic content profiles from NGS data, and consequently predicts regions of genomic alteration such as gains, losses and LOH. Taking as input aligned reads, Control-FREEC constructs copy number and B-allele frequency profiles. The profiles are then normalized, segmented and analyzed to assign a genotype status (copy number and allelic content) to each genomic region. When a matched normal sample is provided, Control-FREEC discriminates somatic from germline events. Control-FREEC is able to analyze overdiploid tumor samples and samples contaminated by normal cells. Low mappability regions can be excluded from the analysis using the provided mappability tracks.
Availability: C++ source code is available at: http://bioinfo.curie.fr/projects/freec/
Contact: freec@curie.fr
Supplementary information: Supplementary data are available at Bioinformatics online.
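The core idea behind such a copy-number profile can be illustrated in a toy form: normalize tumor coverage by the matched normal per genomic window, scale by ploidy, and call gains and losses. The window counts, rounding and thresholds below are illustrative, not Control-FREEC's actual normalization (which also models GC content and normal-cell contamination):

```python
import numpy as np

def copy_number_ratio(tumor_counts, normal_counts, ploidy=2):
    """Toy copy-number caller: per-window tumor/normal coverage
    ratio, naive integer copy number, and gain/loss/neutral calls."""
    t = np.asarray(tumor_counts, float)
    n = np.asarray(normal_counts, float)
    ratio = (t / t.mean()) / (n / n.mean())   # coverage ratio per window
    cn = np.round(ratio * ploidy)             # naive integer copy number
    calls = np.where(cn > ploidy, "gain",
                     np.where(cn < ploidy, "loss", "neutral"))
    return ratio, cn, calls
```

A matched normal sample serves exactly this role in practice: it cancels out coverage biases shared by both samples, so only tumor-specific dosage changes remain in the ratio.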
Understanding the etiology of metastasis is of great clinical importance, since it is estimated that metastasis accounts for 90% of cancer patient mortality. Metastasis results from a sequence of multiple steps including invasion and migration. The early stages of metastasis are tightly controlled in normal cells and can be drastically affected by malignant mutations; therefore, they might constitute the principal determinants of the overall metastatic rate even if the later stages take a long time to occur. To elucidate the role of individual mutations or their combinations affecting metastatic development, a logical model has been constructed that recapitulates published experimental results of known gene perturbations on local invasion and migration processes, and predicts the effect of mutations not yet assessed experimentally. The model has been validated using experimental data on transcriptome dynamics following TGF-β-dependent induction of Epithelial to Mesenchymal Transition in lung cancer cell lines. A method to associate gene expression profiles with different stable state solutions of the logical model has been developed for that purpose. In addition, we have systematically predicted alleviating (masking) and synergistic pairwise genetic interactions between the genes composing the model with respect to the probability of acquiring the metastatic phenotype. We focused on several unexpected synergistic genetic interactions leading to theoretically very high metastasis probability. Among them, the synergistic combination of Notch overexpression and p53 deletion shows one of the strongest effects, which is in agreement with a recently published experiment in a mouse model of gut cancer. The mathematical model can recapitulate experimental mutations in both cell line and mouse models.
Furthermore, the model predicts new gene perturbations that affect the early steps of metastasis, highlighting potential intervention points for innovative therapeutic strategies in oncology.
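Classifying a pairwise interaction as alleviating or synergistic amounts to comparing the double-mutant phenotype probability against an expectation built from the single mutants. A minimal sketch, using a multiplicative null model on fold changes (an illustrative convention, not necessarily the exact epistasis definition used in the study; the probabilities and tolerance are hypothetical):

```python
def interaction_type(p_wt, p_a, p_b, p_ab, tol=0.05):
    """Classify a pairwise genetic interaction from the metastatic
    phenotype probabilities of wild type (p_wt), the two single
    mutants (p_a, p_b) and the double mutant (p_ab)."""
    expected = min(1.0, p_a * p_b / p_wt)  # multiplicative null, clamped to 1
    if p_ab > expected + tol:
        return "synergistic"
    if p_ab < expected - tol:
        return "alleviating"
    return "additive"
```

For instance, if two single mutants each quadruple a low baseline probability and the double mutant exceeds their multiplicative expectation, the pair is called synergistic.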
Full text
Available for:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Computational models in systems biology are becoming more important with the advancement of experimental techniques to query the mechanistic details responsible for phenotypes of interest. In particular, Boolean models are well suited to describing the complexity of signaling networks while being simple enough to scale to a very large number of components. With the advance of Boolean model inference techniques, the field is transforming from an artisanal way of building models of moderate size to a more automated one, leading to very large models. In this context, adapting the simulation software to such increases in complexity is crucial.
We present two new developments in the continuous-time Boolean simulators: MaBoSS.MPI, a parallel implementation of MaBoSS which can exploit the computational power of very large CPU clusters, and MaBoSS.GPU, which can use GPU accelerators to perform these simulations.
These implementations enable simulation and exploration of the behavior of very large models, thus becoming a valuable analysis tool for the systems biology community.
The study of response to cancer treatments has benefited greatly from the contribution of different omics data, but their interpretation is sometimes difficult. Some mathematical models based on prior biological knowledge of signaling pathways facilitate this interpretation, but often require fitting of their parameters using perturbation data. We propose a more qualitative mechanistic approach, based on logical formalism and on the sole mapping and interpretation of omics data, which is able to recover differences in sensitivity to gene inhibition without model training. This approach is showcased by the study of BRAF inhibition in patients with melanomas and colorectal cancers, who experience significant differences in sensitivity despite similar omics profiles. We first gather information from the literature and build a logical model summarizing the regulatory network of the mitogen-activated protein kinase (MAPK) pathway surrounding BRAF, with factors involved in the BRAF inhibition resistance mechanisms. The relevance of this model is verified by automatically assessing that it qualitatively reproduces response or resistance behaviors identified in the literature. Data from over 100 melanoma and colorectal cancer cell lines are then used to validate the model's ability to explain differences in sensitivity. This generic model is transformed into personalized cell line-specific logical models by integrating the omics information of the cell lines as constraints of the model. The use of mutations alone allows personalized models to correlate significantly with experimental sensitivities to BRAF inhibition, both from drug and CRISPR targeting, and even better with the joint use of mutations and RNA, supporting multi-omics mechanistic models. A comparison of these untrained models with learning approaches highlights similarities in interpretation and complementarity depending on the size of the datasets.
This parsimonious pipeline, which can easily be extended to other biological questions, makes it possible to explore the mechanistic causes of the response to treatment, on an individualized basis.
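Integrating omics information as model constraints can be pictured very simply: each mutation overrides the update rule of the corresponding node with a constant. The node names and rule encoding below are illustrative, not the paper's actual personalization scheme:

```python
def personalize(model_rules, mutations):
    """Turn a generic logical model (node -> update function over a
    state dict) into a cell-line-specific one: gain-of-function
    ("GOF") mutations pin a node to 1, loss-of-function ("LOF")
    mutations pin it to 0. Unmutated nodes keep their rules."""
    rules = dict(model_rules)
    for node, effect in mutations.items():
        # default argument freezes the constant at definition time
        rules[node] = (lambda state, v=(1 if effect == "GOF" else 0): v)
    return rules
```

The appeal of this scheme is that the generic model is built once from the literature, while each cell line only contributes a small dictionary of constraints, so no parameter fitting is needed.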
Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is to map millions of reads to a reference genome. This problem is exacerbated when dealing with repetitive sequences such as transposable elements, which occupy half of the mammalian genome mass. Sequenced reads coming from these regions introduce ambiguities in the mapping step. Therefore, dedicated parameters and algorithms have to be considered when transposable element regulation is investigated with sequencing datasets.
Here, we used simulated reads on the mouse and human genomes to define the best parameters for aligning transposable element-derived reads to a reference genome. The efficiency of the most commonly used aligners was compared, and we further evaluated how transposable element representation should be estimated using available methods. The mappability of the different transposon families in the mouse and human genomes was calculated, giving an overview of their evolution.
Based on simulated data, we provided recommendations on the alignment and the quantification steps to be performed when transposon expression or regulation is studied, and identified the limits in detecting specific young transposon families of the mouse and human genomes. These principles may help the community to adopt standard procedures and raise awareness of the difficulties encountered in the study of transposable elements.
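Mappability, as used above, measures how uniquely reads of a given length can be placed on the genome. A toy exact-match version (real mappability tracks, e.g. from GEM, allow mismatches and operate on multi-gigabase genomes):

```python
from collections import Counter

def mappability(genome, k):
    """Fraction of positions whose k-mer occurs exactly once in the
    sequence: 1.0 means every read of length k maps uniquely,
    values near 0 indicate highly repetitive sequence."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    unique = sum(1 for km in kmers if counts[km] == 1)
    return unique / len(kmers)
```

Young transposon families, whose copies have not yet diverged, produce long runs of duplicated k-mers and therefore low mappability, which is why they are the hardest to quantify.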
The lack of integrated resources depicting the complexity of the innate immune response in cancer represents a bottleneck for high-throughput data interpretation. To address this challenge, we perform a systematic manual literature mining of molecular mechanisms governing the innate immune response in cancer and represent it as a signalling network map. The cell-type specific signalling maps of macrophages, dendritic cells, myeloid-derived suppressor cells and natural killers are constructed and integrated into a comprehensive meta-map of the innate immune response in cancer. The meta-map contains 1466 chemical species as nodes connected by 1084 biochemical reactions, and it is supported by information from 820 articles. The resource helps to interpret single-cell RNA-Seq data from macrophages and natural killer cells in metastatic melanoma that reveal different anti- or pro-tumor sub-populations within each cell type. Here, we report a new open-source analytic platform that supports data visualisation and interpretation of tumour microenvironment activity in cancer.
Independent Component Analysis (ICA) is a method that models gene expression data as an action of a set of statistically independent hidden factors. The output of ICA depends on a fundamental parameter: the number of components (factors) to compute. The optimal choice of this parameter, related to determining the effective data dimension, remains an open question in the application of blind source separation techniques to transcriptomic data.
Here we address the question of optimizing the number of statistically independent components in the analysis of transcriptomic data so that the components are reproducible across multiple runs of ICA (within the same or varying effective dimensions) and across multiple independent datasets. To this end, we introduce a ranking of independent components based on their stability across multiple ICA computation runs, and define a distinguished number of components (Most Stable Transcriptome Dimension, MSTD) corresponding to the point of qualitative change in the stability profile. Based on a large body of data, we demonstrate that a sufficient number of dimensions is required for biological interpretability of the ICA decomposition, and that the most stable components, with ranks below MSTD, are more likely to be reproduced in independent studies than the less stable ones. At the same time, we show that a transcriptomics dataset can be reduced to a relatively high number of dimensions without losing the interpretability of ICA, even though higher dimensions give rise to components driven by small gene sets.
We suggest a protocol for applying ICA to transcriptomics data, with the possibility of prioritizing components with respect to their reproducibility, which strengthens the biological interpretation. Computing too few components (far fewer than MSTD) is not optimal for interpretability of the results. Components ranked within the MSTD range are more likely to be reproduced in independent studies.
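The stability ranking described above can be sketched as follows: components from one reference run are scored by their best absolute correlation with the components of every other run. This is a minimal version of the stability index behind the MSTD criterion; the actual protocol also clusters components across runs (icasso-style) before ranking:

```python
import numpy as np

def component_stability(runs):
    """Score each component of the first (reference) run by its
    average best absolute correlation with the components of the
    other runs. `runs` is a list of (components x genes) arrays;
    scores near 1 indicate highly reproducible components."""
    ref = runs[0]
    stab = np.zeros(ref.shape[0])
    for other in runs[1:]:
        # cross-correlation block between ref and other components
        c = np.abs(np.corrcoef(ref, other)[:ref.shape[0], ref.shape[0]:])
        stab += c.max(axis=1)  # best match per reference component
    return stab / (len(runs) - 1)
```

Sorting components by this score and looking for the point where the profile drops qualitatively gives the MSTD-style cutoff.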
Mathematical modeling is used as a systems biology tool to answer biological questions and, more precisely, to validate a network that describes biological observations and to predict the effect of perturbations. This article presents an algorithm for modeling biological networks in a discrete framework with continuous time.
There exist two major types of mathematical modeling approaches: (1) quantitative modeling, representing various chemical species concentrations by real numbers, mainly based on differential equations and chemical kinetics formalism; and (2) qualitative modeling, representing chemical species concentrations or activities by a finite set of discrete values. Both approaches answer particular (and often different) biological questions. The qualitative approach permits a simpler and less detailed description of the biological system and efficiently identifies stable states, but remains inconvenient for describing the transient kinetics leading to these states, since time is represented by discrete steps. Quantitative modeling, on the other hand, can describe the dynamical behavior of biological processes more accurately, as it follows the evolution of concentrations or activities of chemical species as a function of time, but it requires a large number of parameters that are difficult to find in the literature.
Here, we propose a modeling framework based on a qualitative approach that is intrinsically continuous in time. The algorithm presented in this article fills the gap between qualitative and quantitative modeling. It is based on a continuous-time Markov process applied to a Boolean state space. In order to describe the temporal evolution of the biological process we wish to model, we explicitly specify the transition rates for each node. For that purpose, we built a language that can be seen as a generalization of Boolean equations. Mathematically, this approach can be translated into a set of ordinary differential equations on probability distributions. We developed a C++ software, MaBoSS, that is able to simulate such a system by applying kinetic Monte Carlo (the Gillespie algorithm) on the Boolean state space. This software, parallelized and optimized, computes the temporal evolution of probability distributions and estimates stationary distributions.
Applications of Boolean kinetic Monte Carlo are demonstrated for three qualitative models: a toy model, a published model of the p53/Mdm2 interaction and a published model of the mammalian cell cycle. Our approach makes it possible to describe kinetic phenomena that were difficult to handle in the original models. In particular, transient effects are represented by time-dependent probability distributions, interpretable in terms of cell populations.
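The simulation scheme described above can be sketched in a few lines: given per-node flip rates that depend on the current Boolean state, the Gillespie algorithm draws an exponential waiting time and picks which node flips. This is a toy single-trajectory sketch, not the MaBoSS engine itself (which averages many such trajectories to estimate probability distributions):

```python
import random

def gillespie_boolean(state, rates, t_max, rng=random):
    """Simulate one trajectory of a continuous-time Markov process
    on a Boolean state space. `rates(state)` returns, for each node,
    the rate of flipping it in the current state; returns the list
    of (time, state) points visited before t_max."""
    state = list(state)
    t, traj = 0.0, [(0.0, tuple(state))]
    while True:
        r = rates(state)
        total = sum(r)
        if total == 0:                  # fixed point: absorbing state
            return traj
        t += rng.expovariate(total)     # waiting time ~ Exp(total rate)
        if t >= t_max:
            return traj
        # choose which node flips, proportionally to its rate
        u, acc = rng.uniform(0, total), 0.0
        for i, ri in enumerate(r):
            acc += ri
            if u <= acc:
                state[i] = 1 - state[i]
                break
        traj.append((t, tuple(state)))
```

Averaging the states of many such trajectories at fixed time points yields the time-dependent probability distributions mentioned above.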
Upon fertilization, the highly specialised sperm and oocyte genomes are remodelled to confer totipotency. The mechanisms of the dramatic reprogramming events that occur have remained unknown, and presumed roles of histone-modifying enzymes are just starting to be elucidated. Here, we explore the function of the oocyte-inherited pool of a histone H3K4 and K9 demethylase, LSD1/KDM1A, during early mouse development. KDM1A deficiency results in developmental arrest by the two-cell stage, accompanied by dramatic and stepwise alterations in H3K9 and H3K4 methylation patterns. At the transcriptional level, the maternal-to-zygotic transition fails to be induced properly and LINE-1 retrotransposons are not properly silenced. We propose that KDM1A plays critical roles in establishing the correct epigenetic landscape of the zygote upon fertilization, in preserving genome integrity and in initiating new patterns of genome expression that drive early mouse development.