•This survey presents recent works proposing neural network models for cancer prediction problems. All of the considered works used gene expression datasets to evaluate the proposed models.
•The survey is distinguished from previous work by analysing contributions that combine three basic components: neural networks, gene expression datasets and cancer prediction. It also presents technical details of data preprocessing, model configuration, learning parameters and evaluation metrics, giving the reader a view of recent approaches.
•We grouped the considered works according to the neural network's functionality in each model, first presenting preprocessing techniques, then model configurations and evaluation metrics, followed by a summary.
•A closing discussion highlights practical issues that can be addressed to improve the predictive performance of future models.
Neural networks are powerful tools used widely for building cancer prediction models from microarray data. We review the most recently proposed models to highlight the roles of neural networks in predicting cancer from gene expression data. We identified articles published between 2013 and 2018 in scientific databases using keywords such as cancer classification, cancer analysis, cancer prediction, cancer clustering and microarray data. Analyzing the studies reveals that neural network methods have been used either for filtering (data engineering) the gene expressions in a step prior to prediction; for predicting the existence of cancer, the cancer type or the survivability risk; or for clustering unlabeled samples. This paper also discusses some practical issues that can be considered when building a neural network-based cancer prediction model. Results indicate that the functionality of the neural network determines its general architecture; however, the number of hidden layers, the number of neurons, the hyperparameters and the learning algorithm are decided through trial-and-error techniques.
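The survey's observation that layer and neuron counts are settled by trial and error can be sketched concretely. The snippet below is an illustrative toy, not any surveyed model: it trains a one-hidden-layer network on synthetic "gene expression" data for several candidate hidden sizes and keeps the best validation score. The dataset dimensions, learning rate and epoch count are all assumptions.

```python
# Toy sketch (not a surveyed model): trial-and-error selection of a neural
# network's hidden-layer size on synthetic "gene expression" data.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset: 200 samples x 50 genes, two classes whose means differ
# in the first 5 genes (a crude stand-in for differentially expressed genes).
n, d = 200, 50
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, :5] += y[:, None] * 1.5

# Simple train/validation split.
idx = rng.permutation(n)
tr, va = idx[:150], idx[150:]

def train_mlp(hidden, lr=0.1, epochs=300):
    """One-hidden-layer MLP with sigmoid output, trained by batch gradient descent."""
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, hidden);      b2 = 0.0
    Xt, yt = X[tr], y[tr]
    for _ in range(epochs):
        H = np.tanh(Xt @ W1 + b1)                 # hidden activations
        p = 1 / (1 + np.exp(-(H @ W2 + b2)))      # predicted P(class = 1)
        g = (p - yt) / len(yt)                    # cross-entropy gradient wrt logits
        W2 -= lr * H.T @ g; b2 -= lr * g.sum()
        gH = np.outer(g, W2) * (1 - H**2)         # backprop through tanh
        W1 -= lr * Xt.T @ gH; b1 -= lr * gH.sum(axis=0)
    return W1, b1, W2, b2

def accuracy(params, rows):
    W1, b1, W2, b2 = params
    H = np.tanh(X[rows] @ W1 + b1)
    return np.mean((H @ W2 + b2 > 0) == y[rows])

# Trial and error: try several hidden sizes, keep the best on validation data.
scores = {h: accuracy(train_mlp(h), va) for h in (2, 8, 32)}
best_hidden = max(scores, key=scores.get)
```

In practice the surveyed works repeat exactly this loop, only over more axes (layer count, learning rate, regularization) and with cross-validation instead of a single split.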
The transcriptional network determines a cell's internal state by regulating protein expression in response to changes in the local environment. Due to the interconnected nature of this network, information encoded in the abundance of various proteins will often propagate across chains of noisy intermediate signaling events. The data-processing inequality (DPI) leads us to expect that this intracellular game of "telephone" should degrade this type of signal, with longer chains losing successively more information to noise. However, a previous modeling effort predicted that because the steps of these signaling cascades do not truly represent independent stages of data processing, the limits of the DPI could seemingly be surpassed, and the amount of transmitted information could actually increase with chain length. What that work did not examine was whether this regime of growing information transmission was attainable by a signaling system constrained by the mechanistic details of more complex protein-binding kinetics. Here we address this knowledge gap through the lens of information theory by examining a model that explicitly accounts for the binding of each transcription factor to DNA. We analyze this model by comparing stochastic simulations of the fully nonlinear kinetics to simulations constrained by the linear response approximations that displayed a regime of growing information. Our simulations show that even when molecular binding is considered, there remains a regime wherein the transmitted information can grow with cascade length, though this growth ends after a critical number of links determined by the kinetic parameter values. This inflection point marks where correlations decay in response to an oversaturation of binding sites, screening informative transcription factor fluctuations from further propagation down the chain, where they eventually become indistinguishable from the surrounding levels of noise.
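As a point of reference for the DPI baseline that the paper argues can be surpassed, the following sketch computes the mutual information along a simple chain of additive Gaussian relays, where each link is a genuine data-processing step and information strictly decays. The chain length and noise level are arbitrary choices, and this linear Gaussian chain is not the paper's transcription-factor binding model.

```python
# DPI baseline illustration (assumed toy, not the paper's model): a chain of
# additive Gaussian relays, where information can only decrease with length.
import math

def chain_information(links, sigma=0.5):
    """Mutual information I(X; Y_k) for a chain of additive Gaussian relays.

    X ~ N(0, 1); each relay adds independent N(0, sigma^2) noise, so after k
    links Y_k = X + (k noise terms) and I = 0.5 * ln(1 + 1 / (k * sigma^2))
    nats, by the Gaussian channel capacity formula.
    """
    return [0.5 * math.log(1 + 1 / (k * sigma**2)) for k in range(1, links + 1)]

info = chain_information(6)
# For a genuine processing chain the DPI forces monotone decay down the links.
monotone = all(a > b for a, b in zip(info, info[1:]))
```

The paper's point is that transcriptional cascades violate the assumptions behind this monotone decay, which is why information can transiently grow before the oversaturation effect cuts it off.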
Techniques using machine learning for short term blood glucose level prediction in patients with Type 1 Diabetes are investigated. This problem is significant for the development of effective artificial pancreas technology so accurate alerts (e.g. hypoglycemia alarms) and other forecasts can be generated. It is shown that two factors must be considered when selecting the best machine learning technique for blood glucose level regression: (i) the regression model performance metrics being used to select the model, and (ii) the preprocessing techniques required to account for the imbalanced time spent by patients in different portions of the glycemic range. Using standard benchmark data, it is demonstrated that different regression model/preprocessing technique combinations exhibit different accuracies depending on the glycemic subrange under consideration. Therefore, technique selection depends on the type of alert required. Specific findings are that a linear Support Vector Regression-based model, trained with normal as well as polynomial features, is best for blood glucose level forecasting in the normal and hyperglycemic ranges, while a Multilayer Perceptron trained on oversampled data is ideal for predictions in the hypoglycemic range.
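To make the polynomial-feature and oversampling ideas concrete, here is a hedged toy sketch on a synthetic glucose-like series. It substitutes closed-form ridge regression for the paper's linear Support Vector Regression, and the hypoglycemia threshold, forecast horizon, and oversampling factor are assumptions rather than the benchmark protocol.

```python
# Illustrative sketch only (not the paper's benchmark setup): forecasting a
# synthetic glucose trace with polynomial features, after oversampling the
# rare hypoglycemic samples. Ridge regression stands in for linear SVR.
import numpy as np

rng = np.random.default_rng(2)

# Synthetic CGM-like series (mg/dL): slow oscillation plus sensor noise.
t = np.arange(600)
glucose = 120 + 45 * np.sin(2 * np.pi * t / 144) + rng.normal(0, 5, t.size)

# Supervised framing: 6 lagged readings -> a reading roughly 30 min ahead.
H, lead = 6, 6
Xw = np.stack([glucose[i:i + H] for i in range(t.size - H - lead)])
yw = glucose[H + lead:]

def poly_features(X):
    """Raw lags plus all pairwise products (degree-2 polynomial features)."""
    quad = np.stack([X[:, i] * X[:, j] for i in range(H) for j in range(i, H)],
                    axis=1)
    return np.hstack([X, quad])

# Oversample windows whose target is hypoglycemic (< 70 mg/dL, assumed
# threshold) to counter the imbalanced time spent in that range.
hypo = yw < 70
Xb = np.vstack([Xw, np.repeat(Xw[hypo], 4, axis=0)])
yb = np.concatenate([yw, np.repeat(yw[hypo], 4)])

# Standardize features, center the target, and solve ridge in closed form.
P = poly_features(Xb)
mu, sd = P.mean(0), P.std(0)
Pn = (P - mu) / sd
ym = yb.mean()
w = np.linalg.solve(Pn.T @ Pn + 1.0 * np.eye(Pn.shape[1]), Pn.T @ (yb - ym))

pred = ((poly_features(Xw) - mu) / sd) @ w + ym
rmse = float(np.sqrt(np.mean((pred - yw) ** 2)))
```

The paper's key finding maps onto this sketch as: compute `rmse` separately per glycemic subrange, and let the subrange relevant to the desired alert decide which model/preprocessing combination wins.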
A quantitative adverse outcome pathway (qAOP) consists of one or more biologically based, computational models describing key event relationships linking a molecular initiating event (MIE) to an adverse outcome. A qAOP provides quantitative, dose–response, and time-course predictions that can support regulatory decision-making. Herein we describe several facets of qAOPs, including (a) motivation for development, (b) technical considerations, (c) evaluation of confidence, and (d) potential applications. The qAOP used as an illustrative example for these points describes the linkage between inhibition of cytochrome P450 19A aromatase (the MIE) and population-level decreases in the fathead minnow (FHM; Pimephales promelas). The qAOP consists of three linked computational models for the following: (a) the hypothalamic-pituitary-gonadal axis in female FHMs, where aromatase inhibition decreases the conversion of testosterone to 17β-estradiol (E2), thereby reducing E2-dependent vitellogenin (VTG; egg yolk protein precursor) synthesis, (b) VTG-dependent egg development and spawning (fecundity), and (c) fecundity-dependent population trajectory. While development of the example qAOP was based on experiments with FHMs exposed to the aromatase inhibitor fadrozole, we also show how a toxic equivalence (TEQ) calculation allows use of the qAOP to predict effects of another, untested aromatase inhibitor, iprodione. While qAOP development can be resource-intensive, the quantitative predictions obtained, and TEQ-based application to multiple chemicals, may be sufficient to justify the cost for some applications in regulatory decision-making.
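The TEQ idea reduces to scaling an untested chemical's dose by its potency relative to the reference inhibitor, then feeding the equivalent dose into the qAOP. The sketch below illustrates only that arithmetic; the relative potency value shown is an invented placeholder, not a measurement from the study.

```python
# Hedged sketch of a toxic-equivalence (TEQ) style extrapolation. The
# relative potency below is a made-up illustration, not the paper's value.
def toxic_equivalent_dose(dose, relative_potency):
    """Scale an untested chemical's dose by its potency relative to the
    reference aromatase inhibitor, yielding a reference-equivalent dose that
    can be used as input to the existing qAOP."""
    return dose * relative_potency

# Hypothetical example: if the untested inhibitor were 0.01x as potent as the
# reference chemical, a 100 ug/L exposure maps to a 1 ug/L equivalent input.
equiv = toxic_equivalent_dose(100.0, 0.01)
```

The practical appeal noted in the abstract is exactly this reuse: one calibrated qAOP plus a single potency ratio per new chemical, instead of a full model per chemical.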
Can we evolve better training data for machine learning algorithms? To investigate this question we use population-based optimization algorithms to generate artificial surrogate training data for naive Bayes for regression. We demonstrate that the generalization performance of naive Bayes for regression models is enhanced by training them on the artificial data as opposed to the real data. These results are important for two reasons. Firstly, naive Bayes models are simple and interpretable but frequently underperform compared to more complex "black box" models, and therefore new methods of enhancing accuracy are called for. Secondly, the idea of using the real training data indirectly in the construction of the artificial training data, as opposed to directly for model training, is a novel twist on the usual machine learning paradigm.
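The paper's twist, using real data only to guide the construction of artificial training data, can be illustrated with a deliberately small stand-in: a (1+1) evolution strategy that mutates a five-point artificial training set so that a model fitted to it generalizes to real data. Naive Bayes for regression is replaced here by plain least-squares regression, and every numeric choice is an assumption.

```python
# Toy sketch of the paper's idea (details are assumptions, not the authors'
# method): evolve an artificial training set whose fitted model generalizes
# well on real data. Least squares stands in for naive Bayes for regression.
import numpy as np

rng = np.random.default_rng(3)

# "Real" data: y = 2x + 1 + noise, split into a guidance set and a holdout.
x_real = rng.uniform(-1, 1, 80)
y_real = 2 * x_real + 1 + rng.normal(0, 0.3, 80)
x_fit, y_fit = x_real[:40], y_real[:40]     # used only to score candidates
x_hold, y_hold = x_real[40:], y_real[40:]   # untouched final evaluation split

def fit_line(pts):
    """Least-squares (slope, intercept) through the (x, y) rows of pts."""
    A = np.stack([pts[:, 0], np.ones(len(pts))], axis=1)
    return np.linalg.lstsq(A, pts[:, 1], rcond=None)[0]

def fitness(pts):
    """Real-data MSE of the model trained on the artificial points (lower = fitter)."""
    s, b = fit_line(pts)
    return float(np.mean((s * x_fit + b - y_fit) ** 2))

# (1+1) evolution strategy over a 5-point artificial training set.
parent = rng.uniform(-1, 1, (5, 2))
init_fit = parent_fit = fitness(parent)
for _ in range(300):
    child = parent + rng.normal(0, 0.1, parent.shape)  # Gaussian mutation
    child_fit = fitness(child)
    if child_fit <= parent_fit:                        # keep improvements
        parent, parent_fit = child, child_fit

# The model never sees the real data directly; it is trained on the evolved
# artificial points, then evaluated on the holdout.
s, b = fit_line(parent)
holdout_mse = float(np.mean((s * x_hold + b - y_hold) ** 2))
```

Note the inversion of the usual paradigm: the real data appears only inside `fitness`, steering the surrogate data, never as the model's training input.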
RNA aptamers are relatively short nucleic acid sequences that bind targets with high affinity, and when combined with a riboswitch that initiates translation of a fluorescent reporter protein, can be used as a biosensor for chemical detection in various types of media. These processes span target binding at the molecular scale to fluorescence detection at the macroscale, which involves a number of intermediate rate-limiting physical (e.g., molecular conformation change) and biochemical changes (e.g., reaction velocity), which together complicate assay design. Here we describe a mathematical model developed to aid environmental detection of hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX) using the DsRed fluorescent reporter protein; the model is general enough, however, to potentially predict fluorescence from a broad range of water-soluble chemicals given the values of just a few kinetic rate constants as input. If we expose a riboswitch test population of Escherichia coli bacteria to a chemical dissolved in media, then the model predicts an empirically distinct, power-law relationship between the exposure concentration and the elapsed time of exposure. This relationship can be used to deduce an exposure time that meets or exceeds the optical threshold of a fluorescence detection device and inform new biosensor designs.
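A power-law relationship between exposure concentration and elapsed time can be inverted to find when the signal crosses a detector threshold. The sketch below does exactly that for a hypothetical law F = A·c^a·t^b; the constants A, a, b and the threshold are made up for illustration and are not the model's fitted kinetic parameters.

```python
# Hypothetical power-law exposure sketch (A, a, b, and the threshold are
# invented values, not the paper's fitted constants).
def exposure_time(conc, threshold, A=1.0, a=0.8, b=1.2):
    """Invert F = A * conc**a * t**b for the elapsed time t at which the
    fluorescence signal first meets the detector threshold."""
    return (threshold / (A * conc**a)) ** (1.0 / b)

# Higher exposure concentrations should reach the optical threshold sooner.
t_low = exposure_time(conc=0.5, threshold=10.0)
t_high = exposure_time(conc=5.0, threshold=10.0)
```

In the abstract's terms, this inversion is what lets the model deduce an exposure time that meets or exceeds a given device's optical threshold.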
Large scale biological responses are inherently uncertain, in part as a consequence of noisy systems that do not respond deterministically to perturbations and measurement errors inherent to technological limitations. As a result, they are computationally difficult to model and current approaches are notoriously slow and computationally intensive (multiscale stochastic models), fail to capture the effects of noise across a system (chemical kinetic models), or fail to provide sufficient biological fidelity because of broad simplifying assumptions (stochastic differential equations). We use a new approach to modeling multiscale stationary biological processes that embraces the noise found in experimental data to provide estimates of the parameter uncertainties and the potential mis-specification of models. Our approach models the mean stationary response at each biological level given a particular expected response relationship, capturing variation around this mean using conditional Monte Carlo sampling that is statistically consistent with training data. A conditional probability distribution associated with a biological response can be reconstructed using this method for a subset of input values, which overcomes the parameter identification problem. Our approach could be applied in addition to dynamical modeling methods (see above) to predict uncertain biological responses over experimental time scales. To illustrate this point, we apply the approach to a test case in which we model the variation associated with measurements at multiple scales of organization across a reproduction-related Adverse Outcome Pathway described for teleosts.
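One minimal way to realize "capturing variation around a fitted mean with Monte Carlo sampling consistent with the training data" is to resample residuals around a fitted mean response. The sketch below does this for a linear dose-response toy; the linear mean model and all numbers are assumptions, not the authors' multiscale procedure.

```python
# Minimal sketch (assumed details, not the authors' method): reconstruct a
# conditional response distribution by resampling residuals around a fitted
# mean response, consistent with the spread seen in the training data.
import numpy as np

rng = np.random.default_rng(4)

# Noisy training data: response = 3 * dose + noise.
dose = rng.uniform(0, 10, 200)
response = 3 * dose + rng.normal(0, 2, 200)

# Mean response model (here: a least-squares line) and its residuals.
slope, intercept = np.polyfit(dose, response, 1)
residuals = response - (slope * dose + intercept)

def sample_response(d, n=5000):
    """Monte Carlo samples of the response at dose d: fitted mean plus
    residuals drawn with replacement from the training data."""
    return slope * d + intercept + rng.choice(residuals, size=n, replace=True)

samples = sample_response(5.0)
```

The reconstructed samples inherit the training data's noise level rather than an assumed parametric error model, which is the spirit of the approach described above.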
Per- and polyfluoroalkyl substances (PFAS) are pervasive environmental contaminants, and their relative stability and high bioaccumulation potential create a challenging risk assessment problem. Zebrafish (Danio rerio) data, in principle, can be synthesized within a quantitative adverse outcome pathway (qAOP) framework to link molecular activity with individual or population level hazards. However, even as qAOP models are still in their infancy, there is a need to link internal dose and toxicity endpoints in a more rigorous way to further not only qAOP models but adverse outcome pathway frameworks in general. We address this problem by suggesting refinements to the current state of toxicokinetic modeling for the early development zebrafish exposed to PFAS up to 120 h post-fertilization. Our approach describes two key physiological transformation phenomena of the developing zebrafish: dynamic volume of an individual and dynamic hatching of a population. We then explore two different modeling strategies to describe the mass transfer, with one strategy relying on classical kinetic rates and the other incorporating mechanisms of membrane transport and adsorption/binding potential. Moving forward, we discuss the challenges of extending this model in both timeframe and chemical class, in conjunction with providing a conceptual framework for its integration with ongoing qAOP modeling efforts.
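A bare-bones illustration of the "dynamic volume" idea: one-compartment uptake and elimination integrated by Euler steps while the embryo volume grows over the 120 h window, so internal concentration reflects both mass transfer and growth dilution. The rate constants and linear growth law are placeholders, not the refined zebrafish model.

```python
# Illustrative one-compartment toxicokinetic sketch with a growing volume.
# All rate constants and the linear growth law are assumed placeholders.
def simulate_uptake(c_water=1.0, ku=0.5, ke=0.1, v0=1.0, growth=0.01,
                    dt=0.1, hours=120):
    """Euler integration of dM/dt = ku * C_water - ke * M over `hours`,
    with volume V(t) = v0 + growth * t increasing linearly; the internal
    concentration at each step is M / V(t) (growth dilutes the body burden)."""
    m, t, conc = 0.0, 0.0, []
    for _ in range(int(hours / dt)):
        m += dt * (ku * c_water - ke * m)   # uptake minus elimination
        t += dt
        conc.append(m / (v0 + growth * t))  # concentration in the grown volume
    return conc

conc = simulate_uptake()
```

With a static volume the internal concentration would plateau at the classical ku/ke steady state; the growing denominator is what makes the dynamic-volume refinement matter for early-development exposures.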
As manipulating the self-assembly of supramolecular and nanoscale constructs at the single-molecule level increasingly becomes the norm, new theoretical scaffolds must be erected to replace the thermodynamic and kinetics based models used to describe traditional bulk phase syntheses. Like the statistical mechanics underpinning these latter theories, the framework we propose uses state probabilities as its fundamental objects; but, contrary to the Gibbsian paradigm, our theory directly models the transition probabilities between the initial and final states of a trajectory, foregoing the need to assume ergodicity. We leverage these probabilities in the context of molecular self-assembly to compute the overall likelihood that a specified experimental condition leads to a desired structural outcome. We demonstrate the application of this framework to a toy model in which N identical molecules can assemble into oligomers of different lengths and conclude with a discussion of how the high computational cost of such a fine-grained model can be overcome through approximation when extending it to larger, more complex systems.
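The trajectory-probability idea, estimating the likelihood that a given condition yields a target structure by counting outcomes over stochastic assembly trajectories, can be mocked up as follows. The merge rule, join probability and step budget below are all invented for illustration and are not the paper's kinetics.

```python
# Toy illustration (not the authors' model): estimate, by simulating many
# stochastic trajectories, the probability that N monomers assemble into a
# single oligomer. The join probability and step budget are assumptions.
import random

random.seed(5)

def assembly_outcome(N=6, p_join=0.7, steps=20):
    """One stochastic trajectory: repeatedly pick two oligomers at random;
    they merge with probability p_join. Returns the final sorted multiset of
    oligomer sizes (total mass is conserved)."""
    sizes = [1] * N
    for _ in range(steps):
        if len(sizes) < 2:          # fully assembled: nothing left to merge
            break
        i, j = random.sample(range(len(sizes)), 2)
        if random.random() < p_join:
            merged = sizes[i] + sizes[j]
            sizes = [s for k, s in enumerate(sizes) if k not in (i, j)] + [merged]
    return sorted(sizes)

# Overall likelihood that this experimental condition yields the target
# outcome (a single hexamer), estimated over many trajectories.
trials = 2000
p_full = sum(assembly_outcome() == [6] for _ in range(trials)) / trials
```

The abstract's closing point about computational cost shows up even here: enumerating all outcome multisets grows combinatorially with N, which is why the paper discusses approximations for larger systems.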
The Dynamic Spatial Structure of Flocks
Russell, Nicholas J; Pilkiewicz, Kevin R; Mayo, Michael L
Entropy (Basel, Switzerland), 03/2024, Volume 26, Issue 3
Journal Article; Peer reviewed; Open access
Studies of collective motion have heretofore been dominated by a thermodynamic perspective in which the emergent "flocked" phases are analyzed in terms of their time-averaged orientational and spatial properties. Studies that attempt to scrutinize the dynamical processes that spontaneously drive the formation of these flocks from initially random configurations are far more rare, perhaps owing to the fact that said processes occur far from the eventual long-time steady state of the system and thus lie outside the scope of traditional statistical mechanics. For systems whose dynamics are simulated numerically, the nonstationary distribution of system configurations can be sampled at different time points, and the time evolution of the average structural properties of the system can be quantified. In this paper, we employ this strategy to characterize the spatial dynamics of the standard Vicsek flocking model using two correlation functions common to condensed matter physics. We demonstrate, for modest system sizes with 800 to 2000 agents, that the self-assembly dynamics can be characterized by three distinct and disparate time scales that we associate with the corresponding physical processes of clustering (compaction), relaxing (expansion), and mixing (rearrangement). We further show that the behavior of these correlation functions can be used to reliably distinguish between phenomenologically similar models with different underlying interactions and, in some cases, even provide a direct measurement of key model parameters.
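A minimal Vicsek-style update, included only to make the model concrete, is sketched below. The system size (100 agents, far below the paper's 800 to 2000), noise amplitude, speed and interaction radius are arbitrary assumptions, and only the global polar order parameter is computed here, not the paper's two correlation functions.

```python
# Minimal Vicsek-model sketch (parameters are arbitrary assumptions): agents
# adopt the mean heading of neighbors within radius r, plus angular noise,
# in a periodic box; the polar order parameter tracks the onset of flocking.
import numpy as np

rng = np.random.default_rng(6)

L_box, N, r, v0, eta = 5.0, 100, 1.0, 0.1, 0.2
pos = rng.uniform(0, L_box, (N, 2))
theta = rng.uniform(-np.pi, np.pi, N)

def step(pos, theta):
    # Pairwise displacements with periodic (minimum-image) boundaries.
    d = pos[:, None, :] - pos[None, :, :]
    d -= L_box * np.round(d / L_box)
    near = (d ** 2).sum(-1) < r ** 2              # neighbor mask (incl. self)
    # Each agent aligns with its neighborhood's mean heading, plus noise.
    mean_sin = (near * np.sin(theta)[None, :]).sum(1)
    mean_cos = (near * np.cos(theta)[None, :]).sum(1)
    theta_new = np.arctan2(mean_sin, mean_cos) \
        + eta * rng.uniform(-np.pi, np.pi, N)
    # Move at constant speed along the new heading, wrapping at the walls.
    pos_new = (pos + v0 * np.stack([np.cos(theta_new),
                                    np.sin(theta_new)], 1)) % L_box
    return pos_new, theta_new

for _ in range(200):
    pos, theta = step(pos, theta)

# Polar order parameter: 1 = perfectly aligned flock, ~0 = disordered.
order = float(np.hypot(np.cos(theta).mean(), np.sin(theta).mean()))
```

The paper's analysis starts where this sketch ends: instead of only this steady-state order parameter, it samples configurations like `pos` at many intermediate times and computes spatial correlation functions to resolve the clustering, relaxing and mixing time scales.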