Convolutional Neural Networks (CNNs) achieve excellent computer-assisted diagnosis when sufficient annotated training data are available. However, most medical imaging datasets are small and fragmented. In this context, Generative Adversarial Networks (GANs) can synthesize realistic and diverse additional training images to fill gaps in the real image distribution; researchers have improved classification by augmenting data with noise-to-image GANs (e.g., mapping random noise samples to diverse pathological images) or image-to-image GANs (e.g., translating a benign image into a malignant one). Yet, no research has reported results combining noise-to-image and image-to-image GANs for a further performance boost. Therefore, to maximize the Data Augmentation (DA) effect with GAN combinations, we propose a two-step GAN-based DA that generates and refines brain Magnetic Resonance (MR) images with and without tumors separately: (i) Progressive Growing of GANs (PGGANs), a multi-stage noise-to-image GAN for high-resolution MR image generation, first generates realistic and diverse 256×256 images; (ii) Multimodal UNsupervised Image-to-image Translation (MUNIT), which combines GANs and Variational AutoEncoders, or SimGAN, which uses a DA-focused GAN loss, further refines the texture and shape of the PGGAN-generated images to resemble the real ones. We thoroughly investigate CNN-based tumor classification results, also considering the influence of pre-training on ImageNet and of discarding weird-looking GAN-generated images. The results show that, when combined with classic DA, our two-step GAN-based DA significantly outperforms classic DA alone in tumor detection (boosting sensitivity from 93.67% to 97.48%) and also in other medical imaging tasks.
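The "classic DA + GAN-based DA" setting can be sketched as pooling classically augmented real images with GAN-synthesized ones before training. This is a minimal, hypothetical illustration: the augmentation choices (flips, rotations) and the array placeholders stand in for the actual pipeline, which relies on trained PGGAN/MUNIT models.

```python
import numpy as np

def classic_augment(img, rng):
    """Classic DA: a random horizontal flip and a random 90-degree
    rotation (illustrative choices, not the paper's exact transforms)."""
    if rng.random() < 0.5:
        img = np.fliplr(img)
    return np.rot90(img, k=rng.integers(0, 4))

def build_training_pool(real_images, gan_images, rng):
    """Combine classic DA on real images with GAN-synthesized images."""
    pool = [classic_augment(img, rng) for img in real_images]
    pool.extend(gan_images)  # images from the two-step GAN pipeline
    return pool

rng = np.random.default_rng(0)
real = [np.zeros((256, 256)) for _ in range(4)]       # placeholder MR slices
synthetic = [np.ones((256, 256)) for _ in range(2)]   # placeholder GAN output
pool = build_training_pool(real, synthetic, rng)
print(len(pool))  # 6
```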
Cancer cells share several metabolic traits, including the aerobic production of lactate from glucose (the Warburg effect), extensive glutamine utilization, and impaired mitochondrial electron flow. It is still unclear how these metabolic rearrangements, which may involve different molecular events in different cells, contribute to a selective advantage for cancer cell proliferation. To ascertain which metabolic pathways are used to convert glucose and glutamine into balanced energy and biomass production, we performed systematic constraint-based simulations of a model of human central metabolism. Sampling of the feasible flux space allowed us to obtain a large number of randomly mutated cells simulated at different glutamine and glucose uptake rates. We observed that, in the limited subset of proliferating cells, most displayed fermentation of glucose to lactate in the presence of oxygen. At high glutamine utilization rates, the oxidative utilization of glucose was decreased, while the production of lactate from glutamine was enhanced. This emergent phenotype was observed only when the available carbon exceeded the amount that could be fully oxidized by the available oxygen. Under the latter conditions, standard Flux Balance Analysis indicated that: this metabolic pattern is optimal to maximize biomass and ATP production; it requires the activity of a branched TCA cycle, in which glutamine-dependent reductive carboxylation contributes to the production of lipids and proteins; and it is sustained by a variety of redox-controlled metabolic reactions. In a K-ras transformed cell line we experimentally assessed glutamine-induced metabolic changes. We validated the computational results through an extension of Flux Balance Analysis that allows the prediction of metabolite variations. Taken together, these findings offer new understanding of the logic of the metabolic reprogramming that underlies cancer cell growth.
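Flux Balance Analysis reduces to a linear program: maximize a biomass objective over flux vectors v subject to the steady-state constraint S·v = 0 and flux bounds. A minimal sketch on a toy two-reaction network (the matrix and bounds are invented for illustration, not taken from the human central metabolism model):

```python
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix: one internal metabolite A,
# reactions: v1 (uptake -> A), v2 (A -> biomass).
S = np.array([[1.0, -1.0]])

# FBA: maximize the biomass flux v2 subject to S @ v = 0
# (linprog minimizes, so the objective is negated).
c = np.array([0.0, -1.0])
bounds = [(0.0, 10.0),   # uptake capped at 10 (illustrative value)
          (0.0, None)]   # biomass flux unbounded above
res = linprog(c, A_eq=S, b_eq=np.zeros(1), bounds=bounds)
print(res.x)  # [10. 10.]
```

At the optimum the biomass flux saturates the uptake bound, mirroring how FBA solutions are driven by nutrient uptake constraints.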
A central problem in graph mining is finding dense subgraphs, with several applications in different fields, a notable example being the identification of communities. While much effort has been put into the problem of finding a single dense subgraph, only recently has the focus shifted to the problem of finding a set of densest subgraphs. An approach introduced to find possibly overlapping subgraphs is the Top-k-Overlapping Densest Subgraphs problem. Given an integer k ≥ 1 and a parameter λ > 0, the goal of this problem is to find a set of k dense subgraphs that may share some vertices. The objective function to be maximized takes into account the density of the subgraphs, the parameter λ, and the distance between each pair of subgraphs in the solution. The Top-k-Overlapping Densest Subgraphs problem has been shown to admit a 1/10-factor approximation algorithm, while the computational complexity of the problem has been left open. In this paper, we present contributions concerning both the approximability and the computational complexity of the problem. For the approximability, we present approximation algorithms that improve the approximation factor to 1/2 when k is smaller than the number of vertices in the graph, and to 2/3 when k is a constant. For the computational complexity, we show that the problem is NP-hard even when k = 3.
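The density measure underlying this family of problems is the classic ratio between the number of edges induced by a vertex set and its size. A small sketch (the toy graph is invented for illustration):

```python
def density(edges, S):
    """Density of the subgraph induced by vertex set S: |E(S)| / |S|."""
    S = set(S)
    internal = sum(1 for u, v in edges if u in S and v in S)
    return internal / len(S)

# A 4-clique {0,1,2,3} with a pendant vertex 4 attached to vertex 3.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(density(edges, {0, 1, 2, 3}))     # 1.5  (6 edges / 4 vertices)
print(density(edges, {0, 1, 2, 3, 4}))  # 1.4  (7 edges / 5 vertices)
```

The objective of Top-k-Overlapping Densest Subgraphs sums k such densities and adds λ times a pairwise distance term that penalizes near-identical subgraphs.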
Metabolic reprogramming is a general feature of cancer cells. Regrettably, the comprehensive quantification of metabolites in biological specimens does not promptly translate into knowledge about the utilization of metabolic pathways. By estimating fluxes across metabolic pathways, computational models hold the promise to bridge this gap between data and biological functionality. These models, however, currently portray the average behavior of cell populations, masking the inherent heterogeneity that is part and parcel of tumorigenesis as much as of drug resistance. To remove this limitation, we propose single-cell Flux Balance Analysis (scFBA) as a computational framework to translate single-cell transcriptomes into single-cell fluxomes. We show that the integration of single-cell RNA-seq profiles of cells derived from lung adenocarcinoma and breast cancer patients into a multi-scale stoichiometric model of a cancer cell population: 1) significantly reduces the space of feasible single-cell fluxomes; 2) allows the identification of clusters of cells with different growth rates within the population; 3) points out possible metabolic interactions among cells via the exchange of metabolites. The scFBA suite of MATLAB functions, as well as the case study datasets, is available at https://github.com/BIMIB-DISCo/scFBA.
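One common way to turn transcriptomes into flux constraints is to scale each reaction's bounds by the normalized expression of its associated gene. The sketch below shows that idea in the simplest form; it is an illustrative stand-in, not the actual scFBA integration rules, and the reaction names, gene mapping, and values are all hypothetical.

```python
def expression_scaled_bounds(base_bounds, gene_expr, gpr):
    """Scale each reaction's flux bounds by the normalized expression
    of its gene (illustrative sketch; unmapped reactions keep weight 1)."""
    max_expr = max(gene_expr.values())
    scaled = {}
    for rxn, (lb, ub) in base_bounds.items():
        w = gene_expr.get(gpr.get(rxn), max_expr) / max_expr
        scaled[rxn] = (lb * w, ub * w)
    return scaled

bounds = {"HEX1": (0.0, 10.0), "LDH": (-5.0, 5.0)}   # hypothetical reactions
gpr = {"HEX1": "HK2", "LDH": "LDHA"}                 # hypothetical gene map
expr = {"HK2": 8.0, "LDHA": 2.0}                     # one cell's transcripts
print(expression_scaled_bounds(bounds, expr, gpr))
# {'HEX1': (0.0, 10.0), 'LDH': (-1.25, 1.25)}
```

Applying cell-specific bounds like these to each copy of the metabolic network is what shrinks the space of feasible single-cell fluxomes.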
Self-assembling processes are ubiquitous phenomena that drive the organization and the hierarchical formation of complex molecular systems. The investigation of assembly dynamics, emerging from the interactions among biomolecules like amino acids and polypeptides, is fundamental to determine how a mixture of simple objects can yield a complex structure at the nano-scale. In this paper we present HyperBeta, a novel open-source software that exploits an innovative algorithm based on hyper-graphs to efficiently identify and graphically represent the dynamics of β-sheet formation. Unlike existing tools, HyperBeta directly manipulates data generated by coarse-grained molecular dynamics simulation tools (GROMACS), performed using the MARTINI force field. Coarse-grained molecular structures are visualized using HyperBeta's proprietary real-time high-quality 3D engine, which provides a plethora of analysis tools and statistical information, controlled by means of an intuitive event-based graphical user interface. The high-quality renderer relies on a variety of visual cues to improve the readability and interpretability of distance and depth relationships between peptides. We show that HyperBeta is able to track β-sheet formation in coarse-grained molecular dynamics simulations, providing a completely new and efficient means for investigating the kinetics of these nano-structures. HyperBeta will therefore facilitate biotechnological and medical research where these structural elements play a crucial role, such as the development of novel high-performance biomaterials in tissue engineering, or a better comprehension of the molecular mechanisms at the basis of complex pathologies like Alzheimer's disease.
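The appeal of a hyper-graph here is that a single hyperedge can group all the peptides participating in one β-sheet, rather than only pairwise contacts. A minimal, hypothetical data structure (not HyperBeta's actual implementation) illustrating that representation:

```python
from collections import defaultdict

class SheetHypergraph:
    """Minimal hypergraph: each hyperedge groups the peptides that take
    part in one beta-sheet at a given frame (illustrative sketch only)."""
    def __init__(self):
        self.edges = {}                     # sheet id -> set of peptide ids
        self.membership = defaultdict(set)  # peptide id -> sheet ids

    def add_sheet(self, sheet_id, peptides):
        self.edges[sheet_id] = set(peptides)
        for p in peptides:
            self.membership[p].add(sheet_id)

    def sheets_of(self, peptide):
        return self.membership[peptide]

hg = SheetHypergraph()
hg.add_sheet("S1", [1, 2, 3])
hg.add_sheet("S2", [3, 4])
print(sorted(hg.sheets_of(3)))  # ['S1', 'S2']
```

Tracking how hyperedges appear, grow, and merge across simulation frames is what makes the kinetics of sheet formation observable.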
Mathematical modeling and in silico analysis are widely acknowledged as complementary tools to biological laboratory methods for achieving a thorough understanding of the emergent behaviors of cellular processes, in both physiological and perturbed conditions. However, the simulation of large-scale models, consisting of hundreds or thousands of reactions and molecular species, can rapidly exceed the capabilities of Central Processing Units (CPUs). The purpose of this work is to exploit alternative high-performance computing solutions, such as Graphics Processing Units (GPUs), to allow the investigation of these models at reduced computational costs.
LASSIE is a "black-box" GPU-accelerated deterministic simulator, specifically designed for large-scale models and not requiring any expertise in mathematical modeling, simulation algorithms or GPU programming. Given a reaction-based model of a cellular process, LASSIE automatically generates the corresponding system of Ordinary Differential Equations (ODEs), assuming mass-action kinetics. The numerical solution of the ODEs is obtained by automatically switching between the Runge-Kutta-Fehlberg method in the absence of stiffness and the first-order Backward Differentiation Formulae in the presence of stiffness. The computational performance of LASSIE is assessed using a set of randomly generated synthetic reaction-based models of increasing size, ranging from 64 to 8192 reactions and species, and compared to a CPU implementation of the LSODA numerical integration algorithm.
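Deriving the ODE system from a reaction-based model under mass-action kinetics is mechanical: each rate is the rate constant times the product of reactant concentrations, and species derivatives follow from the stoichiometry. A plain CPU sketch of that construction (the toy reactions are invented; LASSIE performs the same calculation in parallel on GPU cores):

```python
import numpy as np

def mass_action_rhs(state, stoich_in, stoich_out, k):
    """Right-hand side dX/dt = (N_out - N_in)^T @ r(X) for mass-action
    kinetics, with r_j = k_j * prod_i X_i^{s_ij}."""
    rates = k * np.prod(state ** stoich_in, axis=1)
    return (stoich_out - stoich_in).T @ rates

# Toy model: A + B -> C (k=0.5), C -> A (k=0.1); species order [A, B, C].
stoich_in  = np.array([[1, 1, 0], [0, 0, 1]], dtype=float)
stoich_out = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)
k = np.array([0.5, 0.1])
x = np.array([2.0, 1.0, 0.0])
print(mass_action_rhs(x, stoich_in, stoich_out, k))  # [-1. -1.  1.]
```

This right-hand side is then handed to the stiffness-aware integrator (Runge-Kutta-Fehlberg or first-order BDF in LASSIE's case).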
LASSIE adopts a novel fine-grained parallelization strategy to distribute across the GPU cores all the calculations required to solve the system of ODEs. By virtue of this implementation, LASSIE achieves up to 92× speed-up with respect to LSODA, reducing the running time from approximately 1 month down to 8 h for models consisting of, for instance, four thousand reactions and species. Notably, thanks to its smaller memory footprint, LASSIE is able to perform fast simulations of even larger models, for which the tested CPU implementation of LSODA failed to reach termination. LASSIE is therefore expected to enable an important breakthrough in Systems Biology applications, allowing faster and more in-depth computational analyses of large-scale models of complex biological systems.
Once considered a nuisance to be filtered out, biochemical noise has recently been shown to play a complex, often fully functional, role in biomolecular networks. The influence of intrinsic and extrinsic noise on biomolecular networks has been intensively investigated in the last ten years, though contributions on the co-presence of both are sparse. Extrinsic noise is usually modeled as an unbounded white or colored Gaussian stochastic process, even though realistic stochastic perturbations are clearly bounded. In this paper we consider Gillespie-like stochastic models of nonlinear networks (i.e., the intrinsic noise) in which the model jump rates are affected by colored, bounded extrinsic noises synthesized by a suitable biochemical state-dependent Langevin system. These systems are described by a master equation, and a simulation algorithm to analyze them is derived. This new modeling paradigm should enlarge the class of systems amenable to modeling. We investigated the influence of both the amplitude and the autocorrelation time of an extrinsic Sine-Wiener noise on: (i) the Michaelis-Menten approximation of noisy enzymatic reactions, which we show to be applicable also in the co-presence of intrinsic and extrinsic noise; (ii) a model of an enzymatic futile cycle; and (iii) a genetic toggle switch. In (ii) and (iii) we show that the presence of a bounded extrinsic noise induces qualitative modifications in the probability densities of the involved chemicals, where new modes emerge, suggesting a possible functional role of bounded noises.
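The interplay of intrinsic and extrinsic noise can be sketched as a Gillespie simulation whose propensity is modulated by a bounded Sine-Wiener process, ξ(t) = B sin(√(2/τ) W(t)), which stays in [-B, B] by construction. The toy decay reaction and all parameter values below are illustrative, not the paper's case studies:

```python
import math, random

def sine_wiener(B, tau, w):
    """Bounded Sine-Wiener noise: B * sin(sqrt(2/tau) * W(t)), |xi| <= B."""
    return B * math.sin(math.sqrt(2.0 / tau) * w)

def ssa_decay(x0, k, B, tau, t_end, seed=0):
    """Gillespie SSA for X -> 0 whose jump rate is modulated by a bounded
    extrinsic noise (illustrative sketch of the co-presence of both noises)."""
    rng = random.Random(seed)
    t, x, w = 0.0, x0, 0.0
    while x > 0 and t < t_end:
        rate = k * x * (1.0 + sine_wiener(B, tau, w))  # rate > 0 since B < 1
        dt = rng.expovariate(rate)
        t += dt
        w += math.sqrt(dt) * rng.gauss(0.0, 1.0)  # advance the Wiener path
        x -= 1
    return x, t

x, t = ssa_decay(x0=50, k=0.1, B=0.5, tau=1.0, t_end=100.0)
print(x, round(t, 2))
```

Because |ξ| ≤ B < 1, the modulated rate never becomes negative, which is exactly the advantage of bounded noises over Gaussian perturbations.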
In order to fully characterize the genome of an individual, the reconstruction of the two distinct copies of each chromosome, called haplotypes, is essential. The computational problem of inferring the full haplotype of a cell starting from read sequencing data is known as haplotype assembly, and consists of assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. Indeed, knowledge of complete haplotypes is generally more informative than analyzing single SNPs, and plays a fundamental role in many medical applications.
To reconstruct the two haplotypes, we addressed the weighted Minimum Error Correction (wMEC) problem, a successful approach to haplotype assembly. This NP-hard problem consists of computing the two haplotypes that partition the sequencing reads into two disjoint sub-sets with the least number of corrections to the SNP values. To this aim, we propose GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms, yielding optimal solutions by means of a global search process. In order to evaluate the effectiveness of our approach, we ran GenHap on two synthetic (yet realistic) datasets, based on the Roche/454 and PacBio RS II sequencing technologies. We compared the performance of GenHap against HapCol, an efficient state-of-the-art algorithm for haplotype phasing. Our results show that GenHap always obtains highly accurate solutions (in terms of haplotype error rate), and is up to 4× faster than HapCol on the Roche/454 instances and up to 20× faster on the PacBio RS II dataset. Finally, we assessed the performance of GenHap on two different real datasets.
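The objective being minimized can be illustrated in its unweighted form: each read is charged the number of SNP corrections needed to match the closer of the two candidate haplotypes. A minimal sketch with invented reads (0/1 alleles, '-' for uncovered positions):

```python
def mec_cost(reads, h1, h2):
    """Minimum-error-correction cost: each read is explained by the closer
    of the two haplotypes (unweighted sketch of the wMEC objective)."""
    def dist(read, hap):
        return sum(1 for r, h in zip(read, hap) if r != '-' and r != h)
    return sum(min(dist(r, h1), dist(r, h2)) for r in reads)

reads = ["010-", "-101", "0100", "1101"]
print(mec_cost(reads, "0100", "1011"))  # 3
```

A genetic algorithm such as GenHap's searches over candidate haplotype pairs to drive this cost as low as possible.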
Future-generation sequencing technologies, producing longer reads with higher coverage, can greatly benefit from GenHap, thanks to its capability of efficiently solving large instances of the haplotype assembly problem. Moreover, the optimization approach proposed in GenHap can be extended to the study of allele-specific genomic features, such as expression, methylation and chromatin conformation, by exploiting multi-objective optimization techniques. The source code and full documentation are available at the following GitHub repository: https://github.com/andrea-tango/GenHap.
Existing techniques for reconstructing tree models of progression for accumulative processes, such as cancer, seek to estimate causation by combining correlation and a frequentist notion of temporal priority. In this paper, we define a novel theoretical framework called CAPRESE (CAncer PRogression Extraction with Single Edges) to reconstruct such models based on the notion of probabilistic causation defined by Suppes. We consider a general reconstruction setting complicated by the presence of noise in the data due to biological variation, as well as experimental or measurement errors. To improve tolerance to noise, we define and use a shrinkage-like estimator. We prove the correctness of our algorithm by showing asymptotic convergence to the correct tree under mild constraints on the level of noise. Moreover, on synthetic data, we show that our approach outperforms the state of the art, that it is efficient even with a relatively small number of samples, and that its performance quickly converges to its asymptote as the number of samples increases. For real cancer datasets obtained with different technologies, we highlight biologically significant differences in the inferred progressions with respect to competing techniques, and we also show how to validate conjectured biological relations with progression models.
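Suppes' notion of prima facie causation combines two empirical conditions: temporal priority, P(c) > P(e), and probability raising, P(e|c) > P(e|¬c). A minimal check over binary mutation profiles (the sample data are invented for illustration; CAPRESE additionally uses a shrinkage-like estimator to rank candidate edges under noise):

```python
def suppes_prima_facie(data, c, e):
    """Check Suppes' conditions on binary event profiles (rows = samples):
    temporal priority P(c) > P(e) and raising P(e|c) > P(e|not c)."""
    n = len(data)
    n_c = sum(row[c] for row in data)
    pc, pe = n_c / n, sum(row[e] for row in data) / n
    e_given_c = sum(row[e] for row in data if row[c]) / max(1, n_c)
    e_given_not_c = sum(row[e] for row in data if not row[c]) / max(1, n - n_c)
    return pc > pe and e_given_c > e_given_not_c

# Hypothetical profiles over two events, with c usually preceding e.
data = [{"c": 1, "e": 1}, {"c": 1, "e": 1},
        {"c": 1, "e": 0}, {"c": 0, "e": 0}]
print(suppes_prima_facie(data, "c", "e"))  # True
```

Edges passing this filter are the candidates among which the progression tree is assembled.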
Cellular Automata (CA) are a well-established bio-inspired model of computation that has been successfully applied in several domains. In recent years, the importance of modelling real systems more accurately has sparked new interest in the study of asynchronous CA (ACA). When using an ACA to model a real system, it is important to determine the fidelity of the model, in particular with regard to the existence (or absence) of certain dynamical behaviors.
This paper is concerned with two broad classes of problems: reachability and preimage existence. For each class, both an existential and a universal version are considered. The following results are proved. Reachability is PSPACE-complete; its resource-bounded version is NP-complete (existential form) or coNP-complete (universal form). The preimage problem is dimension sensitive, in the sense that it is NL-complete (in both the existential and the universal form) for one-dimensional ACA, while it is NP-complete (existential version) or Π2P-complete (universal version) for higher dimensions.
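The reachability question these results classify can be made concrete on a tiny instance: in an asynchronous CA, one cell fires per step, so reachability asks whether some sequence of single-cell updates turns configuration x into y. A brute-force sketch on a 1D ring (the XOR local rule and the configurations are illustrative choices; the hardness results show this search cannot scale in general):

```python
def aca_reachable(rule, x, y, steps):
    """Brute-force check whether configuration y is reachable from x in at
    most `steps` asynchronous updates, where one cell fires per step."""
    frontier = {x}
    for _ in range(steps):
        nxt = set()
        for cfg in frontier:
            n = len(cfg)
            for i in range(n):  # nondeterministic choice: which cell fires
                l, r = cfg[(i - 1) % n], cfg[(i + 1) % n]
                nxt.add(cfg[:i] + (rule(l, cfg[i], r),) + cfg[i + 1:])
        frontier |= nxt
        if y in frontier:
            return True
    return y in frontier

xor_rule = lambda l, m, r: l ^ r  # illustrative local rule
print(aca_reachable(xor_rule, (1, 0, 0), (1, 1, 0), 3))  # True
```

The exponential growth of the frontier set is exactly why the general problem lands in PSPACE rather than admitting an efficient search.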