The authors proposed a direct comparison between white- and black-box models to predict the engine brake power of a 15,000 TEU (twenty-foot equivalent unit) containership. A Simplified Naval Architecture Method (SNAM), based on limited operational data, was substantially enhanced by including specific operational parameters. An OAT (one-at-a-time) sensitivity analysis was performed to identify the influence of the most relevant parameters in the white-box model. The black-box method relied on a DNN (deep neural network) composed of two fully connected layers with 4092 and 8192 units. The network was a feed-forward network trained on more than 12,000 data samples encompassing twenty-three input features. The test data were validated against realistic operational data obtained during specific operational windows. Our results agreed favorably with the results obtained for the DNN, which relied on sufficiently observed data for the physical model.
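The one-at-a-time (OAT) analysis mentioned above can be illustrated with a minimal sketch. The function `toy_brake_power`, its parameters (`k`, `speed`, `efficiency`) and the 10% perturbation step are assumptions made for illustration only; they are not the SNAM model or the parameters used by the authors.

```python
def oat_sensitivity(model, baseline, rel_step=0.10):
    """Perturb one input at a time by +/- rel_step around the baseline.

    Returns, for each input, the relative change in the output divided by
    the relative change in the input (a central-difference estimate).
    All baseline values and the baseline output must be non-zero.
    """
    y0 = model(baseline)
    sensitivities = {}
    for name, value in baseline.items():
        up = dict(baseline)
        up[name] = value * (1.0 + rel_step)
        down = dict(baseline)
        down[name] = value * (1.0 - rel_step)
        # Only the parameter `name` changes; all others stay at baseline.
        delta = model(up) - model(down)
        sensitivities[name] = (delta / y0) / (2.0 * rel_step)
    return sensitivities


def toy_brake_power(p):
    # Hypothetical stand-in model (assumption): power grows with the cube
    # of speed and is divided by a propulsive efficiency.
    return p["k"] * p["speed"] ** 3 / p["efficiency"]


baseline = {"k": 2.0, "speed": 10.0, "efficiency": 0.7}
sens = oat_sensitivity(toy_brake_power, baseline)
for name, value in sorted(sens.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

Ranking the inputs by the absolute value of this index gives a quick view of which parameters dominate the output, at the cost of ignoring interactions between parameters.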
Non-coding RNA transcripts originating from Ultraconserved Regions (UCRs) have tissue-specific expression and play relevant roles in the pathophysiology of multiple cancer types. Among them, we recently identified and characterized the ultra-conserved-transcript-8+ (uc.8+), whose levels correlate with grading and staging of bladder cancer. Here, to validate uc.8+ as a potential biomarker in bladder cancer, we assessed its expression and subcellular localization by using tissue microarray on 73 human bladder cancer specimens. We quantified uc.8+ by in-situ hybridization and correlated its expression levels with clinical characteristics and patient survival. The analysis of subcellular localization indicated the simultaneous presence of uc.8+ in the cytoplasm and nucleus of cells from the Low-Grade group, whereas a prevalent cytoplasmic localization was observed in samples from the High-Grade group, supporting the hypothesis of uc.8+ nuclear-to-cytoplasmic translocation in most malignant tumor forms. Moreover, analysis of uc.8+ expression and subcellular localization in tumor-surrounding stroma revealed a marked down-regulation of uc.8+ levels compared to the paired (adjacent) tumor region. Finally, deep machine-learning approaches identified nucleotide sequences associated with uc.8+ localization in the nucleus and/or cytoplasm, allowing the prediction of possible RNA-binding proteins associated with uc.8+ and also recognizing sequences involved in mRNA cytoplasm-translocation. Our model suggests uc.8+ subcellular localization as a potential prognostic biomarker for bladder cancer.
Environmental time series are often affected by missing data, namely data unavailability at certain time points. This paper presents the Iterated Imputation and Prediction algorithm, which allows the prediction of time series with missing data. The algorithm iteratively uses the Correlation Dimension Estimation of the underlying dynamic system generating the time series to fix the model order (i.e., how many past samples are required to model the time series accurately), and Support Vector Machine Regression to estimate the skeleton of the time series. Experimental validation of the algorithm on three environmental time series with missing data, expressing the concentration of ozone at three European sites, shows a small average percentage prediction error for all time series on the test set.
•The paper presents the Iterated Imputation and Prediction (IIP) algorithm for the prediction of time series with missing data.
•IIP uses Correlation Dimension and Support Vector Machine Regression to estimate the model order and the skeleton of time series.
•Correlation Dimension is estimated with the proposed Grassberger-Procaccia-Hough algorithm.
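As a rough illustration of how a correlation dimension can be estimated from a scalar time series, the sketch below builds a time-delay embedding and computes the classical Grassberger-Procaccia correlation sum at two radii. It is the plain Grassberger-Procaccia estimate, not the Grassberger-Procaccia-Hough variant proposed in the paper; the function names, the embedding settings and the two radii are assumptions chosen for the example.

```python
import math


def delay_embed(series, dim, lag=1):
    """Map a scalar series to vectors (x[i], x[i+lag], ..., x[i+(dim-1)*lag])."""
    n = len(series) - (dim - 1) * lag
    return [[series[i + j * lag] for j in range(dim)] for i in range(n)]


def correlation_sum(points, radius):
    """Fraction of distinct point pairs closer than `radius`."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) versus log r between two radii.

    Both radii must be large enough that C(r) > 0; a real analysis would
    fit the slope over a whole scaling region rather than two points.
    """
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(r_large) - math.log(r_small)
    )


# Hypothetical test signal: a sine wave, whose embedding is a closed curve.
series = [math.sin(0.1 * i) for i in range(500)]
points = delay_embed(series, dim=3, lag=5)
estimate = correlation_dimension(points, r_small=0.1, r_large=0.4)
print(f"estimated correlation dimension: {estimate:.2f}")
```

An estimate of this kind gives a lower bound on the number of degrees of freedom of the generating system, which is the sort of information used to choose how many past samples a predictor should see.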
An accurate assessment of a ship's required power is increasingly relevant for ship operations. We used a simplified framework of a data-driven model to predict a ship's fuel consumption. Our approach was based on the learning capabilities of a generalized AutoML (Automated Machine Learning) process, trained with a variety of databases obtained from Computational Fluid Dynamics (CFD) simulations or from simplified numerical methods. These CFD simulations were conducted by solving the Reynolds-Averaged Navier-Stokes (RANS) equations, using the STAR-CCM+ commercial CFD software to calculate the ship resistance at speed under different operating conditions. Initially, we conducted a statistical analysis to select the independent variables before fitting the regression models and identifying potentially wrong assumptions. The AutoML process made it possible to optimize the model's hyperparameters and to design the topology of the neural networks. For a set of unknown scenarios, comparative predictions obtained from the data-driven model and from numerical simulations showed that the data-driven model, trained with results obtained from CFD simulations, accurately and efficiently predicted ship operational parameters under realistic operating conditions, thereby dispensing with the need to perform elaborate CFD computations. Specifically, this low-cost and efficient operational data-driven technique forecasted the ship's operational fuel consumption although only a limited amount of recorded operational data was available.
•Low-cost and efficient operational data-driven technique to forecast the ship's operational fuel consumption despite a limited amount of data.
•Application of the AutoML algorithm allows easy data processing and optimization of hyperparameters while generating the model.
•The AutoML model captured the physically simulated test scenarios, highlighting the advantages in terms of computational effort.
Abstract
Motivation
The cost of drug development has dramatically increased in the last decades, with the number of new drugs approved per billion US dollars spent on R&D halving every year or less. The selection and prioritization of targets is one of the most influential decisions in drug discovery. Here we present a Gaussian Process model for the prioritization of drug targets cast as a problem of learning with only positive and unlabeled examples.
Results
Since the absence of negative samples does not allow standard methods for automatic selection of hyperparameters, we propose a novel approach for hyperparameter selection of the kernel in One Class Gaussian Processes. We compare our method with state-of-the-art approaches on benchmark datasets and then show its application to druggability prediction of oncology drugs. Our score reaches an AUC of 0.90 on a set of clinical trial targets starting from a small training set of 102 validated oncology targets. Our score recovers the majority of known drug targets and can be used to identify a novel set of proteins as drug target candidates.
Availability and implementation
The matrix of features for each protein is available at: https://bit.ly/3iLgZTa. Source code implemented in Python is freely available for download at https://github.com/AntonioDeFalco/Adaptive-OCGP.
Supplementary information
Supplementary data are available at Bioinformatics online.
The seismic analysis of reinforced concrete (RC) structures generally requires significant computational effort, which can be challenging or at least time-consuming even for modern computing systems. In particular, a huge computational effort is required for running optimisation procedures aimed at selecting the “best” retrofitting solution among the wide set of technically feasible ones. Therefore, this paper proposes the use of Machine Learning instead of the mechanistic analyses executed as part of an optimisation procedure for seismic retrofitting of existing RC structures recently proposed by the authors. Specifically, an Artificial Neural Network is trained and employed as a possible substitute for finite element analysis for a rapid and accurate assessment of the relevant performance exhibited by the enhanced configurations of an existing RC building typology. The obtained results demonstrate the effectiveness of an artificial neural network as a computational model to approximate a finite element analysis in seismic retrofitting of RC structures by considering several structural configurations. The proposed methodology can be used to speed up the search for a viable RC strengthening configuration within the whole parametric field of relevance, which can be subsequently refined using more detailed and computationally expensive FE methods.
Fuzzy Cognitive Maps Extraction from Enriched Tweets
Maratea, Antonio; Ciaramella, Angelo; Santillo, Marialuisa
2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)
2022-July-18
Conference Proceeding
Fuzzy Cognitive Maps (FCMs) represent graphically the main concepts of a given domain and their relationships as a directed and weighted graph. As part of a growing need for intelligent systems that produce explanations for the decisions they make (the so-called XAI, eXplainable Artificial Intelligence), and due to their intuitive yet formal nature, FCMs are invaluable tools for modeling complex real-world scenarios. However, they are traditionally created through the analysis of direct interviews with a number of domain experts, hence requiring a largely manual, expensive, and cumbersome effort. The aim of this work is to design, develop and test a method for the automatic generation of FCMs from raw data in the form of Twitter conversations. In order to improve the recognized entities and to cope with brevity, ambiguity and jargon, messages in tweets are first enriched with both domain-specific and general corpora, then analyzed and transformed into meaningful maps. As the data come from a population of common users instead of domain experts, the obtained FCMs are highly variable and should be read more as a snapshot of the beliefs of these users on a specific topic than as an objective representation of what experts think about that topic. From clerical review, reported test cases confirm the viability and effectiveness of the proposed method.
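To make the notion of an FCM as a directed, weighted graph concrete, the following sketch stores the map as a weight matrix and iterates one commonly used activation update (previous activation plus weighted incoming influences, squashed by a sigmoid). The concept names, the weights and the choice of update rule are assumptions for illustration; the paper's extraction method from tweets is not reproduced here.

```python
import math


def sigmoid(x, steepness=1.0):
    return 1.0 / (1.0 + math.exp(-steepness * x))


def fcm_step(state, weights):
    """One synchronous update; weights[j][i] is the influence of concept j on i."""
    n = len(state)
    return [
        sigmoid(state[i] + sum(weights[j][i] * state[j] for j in range(n) if j != i))
        for i in range(n)
    ]


def fcm_run(state, weights, max_iter=100, tol=1e-6):
    """Iterate until activations change by less than `tol` or max_iter is hit."""
    for _ in range(max_iter):
        new_state = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state
        state = new_state
    return state


# Hypothetical three-concept map (names and weights are illustrative only).
concepts = ["price", "demand", "satisfaction"]
weights = [
    [0.0, -0.7, -0.4],  # price lowers demand and satisfaction
    [0.5, 0.0, 0.0],    # demand raises price
    [0.0, 0.6, 0.0],    # satisfaction raises demand
]
final = fcm_run([0.8, 0.2, 0.5], weights)
for name, value in zip(concepts, final):
    print(f"{name}: {value:.3f}")
```

Running the map from different initial activations is the usual way to explore "what if" scenarios once the graph has been obtained.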
In this work, a scheme based on a compressive sampling technique and a fast dictionary learning approach for reconstructing audio content in multimedia streaming is introduced. Audio streaming data are encapsulated in different packets by means of an interleaving technique. The compressive sampling technique is used to reconstruct audio information in case of lost packets, with a sparsifying basis provided by a greedy adaptive dictionary learning algorithm. In order to assess the performance of the methodology, several experiments on speech and musical audio signals are presented.
In recent years, the field of Machine Learning has shown great interest in the processing of structured data, such as sequences, trees and graphs. In this paper an unsupervised recursive learning schema for structured data clustering is introduced. The schema allows processing data organized in graphs for both graph-focused and node-focused applications. The approach uses the Fuzzy C-Means algorithm as a building block. Some experiments are proposed to show its performance and to compare it with another approach known in the literature.
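The Fuzzy C-Means building block named above can be sketched on plain vectors as follows. This is the standard alternating update of cluster centers and memberships, assuming Euclidean distance and fuzzifier m = 2; the recursive schema for graphs described in the paper is not shown, and the sample points are made up for the example.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships); memberships[i][k] is in [0, 1]."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )
        # Membership update: proportional to distance ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dists = [max(math.dist(points[i], c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


# Illustrative data: two well-separated groups of 2-D points.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, memberships = fuzzy_c_means(data, n_clusters=2)
for c in sorted(centers):
    print([round(v, 2) for v in c])
```

Unlike hard clustering, each point keeps a graded membership in every cluster, which is what makes the algorithm convenient as a component inside a larger learning schema.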
•A fuzzy system for environmental risk assessment of genetically modified plants is described.
•The fuzzy system is based on Mamdani inference.
•The fuzzy system's risk assessments have been validated on real-world trial case studies.
•The system decisions have been considered coherent and consistent by human experts.
Environmental risk assessment (ERA) of the deliberate release of genetically modified plants (GMPs) is currently performed by human experts on the basis of their own personal experience and knowledge. In this paper we describe a fuzzy decision system (FDS) for the ERA of GMPs, based on Mamdani fuzzy inference. The risk assessment in the FDS is obtained by using a fuzzy inference system (FIS), implemented using the jFuzzyLogic library. The FDS provides an evaluation process for identifying potential impacts that can reach one or more receptors through a set of migration paths. The decisions derived by the FDS have been validated on real-world cases by the human experts who are in charge of ERA. They have confirmed the reliability and correctness of the fuzzy system decisions.
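A minimal sketch of Mamdani-style inference is given below: rule strengths are computed with min (AND) and max (OR), each rule clips its output fuzzy set, the clipped sets are aggregated with max, and the result is defuzzified by a discretized centroid. It is written in plain Python rather than with jFuzzyLogic, and the input variables (`exposure`, `hazard`), the membership functions and the three rules are assumptions for illustration, not the rule base of the FDS.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets shared by both inputs and the output (scale 0..10).
LOW = (-5.0, 0.0, 5.0)
MEDIUM = (0.0, 5.0, 10.0)
HIGH = (5.0, 10.0, 15.0)


def mamdani_risk(exposure, hazard, steps=200):
    """Return a crisp risk score in [0, 10] for inputs in [0, 10]."""
    if not (0.0 <= exposure <= 10.0 and 0.0 <= hazard <= 10.0):
        raise ValueError("inputs must lie in [0, 10]")
    # Rule firing strengths: min for AND, max for OR.
    fire_low = min(tri(exposure, *LOW), tri(hazard, *LOW))
    fire_medium = max(tri(exposure, *MEDIUM), tri(hazard, *MEDIUM))
    fire_high = max(tri(exposure, *HIGH), tri(hazard, *HIGH))
    numerator = denominator = 0.0
    for k in range(steps + 1):
        y = 10.0 * k / steps
        # Clip each output set by its rule strength, then aggregate with max.
        mu = max(
            min(fire_low, tri(y, *LOW)),
            min(fire_medium, tri(y, *MEDIUM)),
            min(fire_high, tri(y, *HIGH)),
        )
        numerator += y * mu
        denominator += mu
    # Centroid defuzzification over the sampled output domain.
    return numerator / denominator


for exposure, hazard in [(0.0, 0.0), (5.0, 5.0), (3.0, 8.0), (10.0, 10.0)]:
    print(exposure, hazard, round(mamdani_risk(exposure, hazard), 2))
```

With these three rules at least one rule fires for every input pair in the allowed range, so the centroid is always defined; a real rule base would be elicited from the domain experts.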