High-throughput methods in the biological and biomedical fields acquire a large number of molecular parameters, or omics data, in a single experiment. Combining these omics data can significantly increase the capability to recover fine-tuned structures and to reduce the effects of experimental and biological noise in the data.
In this work we propose a multi-view integration methodology, named FH-Clust, for identifying patient subgroups from different omics information (e.g., Gene Expression, miRNA Expression, Methylation). In particular, a hierarchical structure of the patient data is obtained in each omic (or view), and the resulting topologies are then merged through a consensus matrix. One of the main aspects of this methodology is the use of an appropriate metric to measure the dissimilarity between sets of observations. For each view, a dendrogram is obtained by hierarchical clustering based on a fuzzy equivalence relation with Łukasiewicz-valued fuzzy similarity. Finally, a consensus matrix, which summarizes the information of all dendrograms, is formed by combining the multiple hierarchical agglomerations through an approach based on transitive consensus matrix construction. Several experiments and comparisons on real data (e.g., Glioblastoma, Prostate Cancer) assess the proposed approach.
Fuzzy logic allows us to introduce more flexible data agglomeration techniques. From our analysis of the scientific literature, this appears to be the first time a model based on fuzzy logic has been used for the agglomeration of multi-omic data. The results suggest that FH-Clust provides better prognostic value and clinical significance than the analysis of single-omic data alone, and that it is very competitive with other techniques from the literature.
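The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the similarity definition, the use of a max–min transitive closure to obtain a fuzzy equivalence relation, the averaging used for the consensus matrix, and the random data are all simplifying assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

def lukasiewicz_similarity(X):
    """Fuzzy similarity from normalized pairwise distances:
    sim(x, y) = max(0, 1 - d(x, y) / d_max) (Lukasiewicz-style residuum)."""
    D = squareform(pdist(X))
    return np.maximum(0.0, 1.0 - D / D.max())

def transitive_closure(S, max_iter=100):
    """Max-min transitive closure, turning a fuzzy similarity
    into a fuzzy equivalence relation."""
    for _ in range(max_iter):
        # composition: T[i, j] = max_k min(S[i, k], S[k, j])
        T = np.max(np.minimum(S[:, :, None], S[None, :, :]), axis=1)
        T = np.maximum(S, T)
        if np.allclose(T, S):
            break
        S = T
    return S

# Hypothetical multi-view data: 3 omics views for 10 patients
rng = np.random.default_rng(0)
views = [rng.normal(size=(10, d)) for d in (50, 30, 20)]

# Per-view fuzzy equivalence relations, then a consensus matrix
relations = [transitive_closure(lukasiewicz_similarity(V)) for V in views]
consensus = np.mean(relations, axis=0)

# Cluster patients on the consensus dissimilarity (1 - consensus)
Z = linkage(squareform(1.0 - consensus, checks=False), method="average")
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

Each view contributes one fuzzy equivalence relation; averaging them is one simple way to merge the per-view topologies before the final agglomeration.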
Recent studies have indicated that a special class of long non-coding RNAs (lncRNAs), namely Transcribed Ultraconserved Regions (T-UCRs), are transcribed from specific DNA regions that are 100% conserved in the human, mouse, and rat genomes. This is noticeable, as lncRNAs are usually poorly conserved. Despite these peculiarities, T-UCRs remain understudied in many diseases, including cancer, even though dysregulation of T-UCRs is known to be associated with cancer as well as with human neurological, cardiovascular, and developmental pathologies. We have recently reported the T-UCR uc.8+ as a potential prognostic biomarker in bladder cancer.
The aim of this work is to develop a methodology, based on machine learning techniques, for the selection of a predictive signature panel for bladder cancer onset. To this end, we analyzed the expression profiles of T-UCRs from surgically removed normal and bladder cancer tissues using a custom expression microarray. Bladder tissue samples from 24 bladder cancer patients (12 Low Grade and 12 High Grade) with complete clinical data, together with 17 control samples from normal bladder epithelium, were analyzed. After selecting preferentially expressed and statistically significant T-UCRs, we adopted an ensemble of statistical and machine learning approaches (i.e., logistic regression, Random Forest, XGBoost, and LASSO) to rank the most important diagnostic molecules. We identified a signature panel of 13 T-UCRs with altered expression profiles in cancer, able to efficiently discriminate between normal and bladder cancer patient samples. Using this signature panel, we also classified bladder cancer patients into four groups, each characterized by a different survival rate. As expected, the group including only Low Grade bladder cancer patients had greater overall survival than the groups composed mostly of High Grade bladder cancer patients. However, a specific signature of deregulated T-UCRs identifies subtypes of bladder cancer patients with different prognoses regardless of bladder cancer Grade.
Here we present the results of classifying bladder cancer (Low and High Grade) patient samples and normal bladder epithelium controls with a machine learning application. The T-UCR panel can be used to learn an eXplainable Artificial Intelligence model and to develop a robust decision support system for early bladder cancer diagnosis from the urinary T-UCR data of new patients. Using such a system instead of the current methodology would provide a non-invasive approach, sparing patients uncomfortable procedures such as cystoscopy. Overall, these results raise the possibility of new automatic systems that could support RNA-based prognosis and/or cancer therapy in bladder cancer patients, and demonstrate the successful application of Artificial Intelligence to the definition of an independent prognostic biomarker panel.
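The ensemble ranking step can be sketched as follows. This is an illustrative outline under stated assumptions, not the study's code: the data are synthetic (mimicking only the 41-sample shape), gradient boosting stands in for XGBoost to keep the sketch dependency-free, and mean-rank aggregation is one plausible way to combine the per-method importances.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the T-UCR expression matrix:
# 41 samples (tumor vs. normal), 100 probes.
X, y = make_classification(n_samples=41, n_features=100,
                           n_informative=13, random_state=0)

# One importance score per method (L1 logistic regression plays the
# LASSO role; gradient boosting stands in for XGBoost).
models = {
    "lasso": LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
    "rf": RandomForestClassifier(n_estimators=200, random_state=0),
    "gb": GradientBoostingClassifier(random_state=0),
}
ranks = {}
for name, model in models.items():
    model.fit(X, y)
    imp = np.abs(model.coef_).ravel() if hasattr(model, "coef_") \
        else model.feature_importances_
    ranks[name] = rankdata(-imp)  # rank 1 = most important

# Aggregate by mean rank and keep the top 13 features as the panel
mean_rank = np.mean(list(ranks.values()), axis=0)
panel = np.argsort(mean_rank)[:13]
print(sorted(panel.tolist()))
```

Rank aggregation across heterogeneous models is more robust than any single importance score, which is the point of using an ensemble of rankers.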
As of today, bioinformatics is one of the most exciting fields of scientific research. There is a wide-ranging list of challenging problems to face, e.g., pairwise and multiple alignment, motif detection/discrimination/classification, phylogenetic tree reconstruction, protein secondary and tertiary structure prediction, protein function prediction, DNA microarray analysis, and gene regulation/regulatory networks, just to mention a few, and an army of researchers from several scientific backgrounds focus their efforts on developing models to properly address these problems. In this paper, we briefly review some of the many machine learning methods developed in the last two decades for the analysis of gene microarray data, which have had a strong impact on molecular biology. In particular, we focus on the wide range of data clustering and visualization techniques able to find homogeneous data groupings and to reveal their connections in terms of structure, function, and evolution.
The ARGO-USV (Unmanned Surface Vehicle for ARchaeological GeO-application) is a technological project involving a marine drone aimed at devising an innovative methodology for marine geological and geomorphological investigations in shallow areas, which are usually considered critical to investigate with traditional vessels. The methodological approach proposed in this paper has been implemented according to a multimodal mapping technique involving the simultaneous and integrated use of both optical and geoacoustic sensors. This approach has been enriched by tools based on artificial intelligence (AI), specifically intended to be installed onboard the ARGO-USV and aimed at the automatic recognition of submerged targets and the physical characterization of the seabed. The project comprises a main command and control system and a series of dedicated sub-systems successfully tested in different operational scenarios. The ARGO drone is capable of acquiring and storing a considerable amount of georeferenced data during surveys lasting a few hours. Broadcasting all acquired data allows a multidisciplinary team of specialists to cooperate and analyze specific datasets in real time. These features, together with the use of deep-learning-based modules and special attention to green-compliant construction phases, are the particular aspects that make ARGO-USV a modern and innovative project, aiming to improve the knowledge of wide coastal areas while minimizing the impact on these environments. As a proof of concept, we present the extensive mapping and characterization of the seabed from a geoarchaeological survey of the underwater Roman harbor of Puteoli in the Gulf of Naples (Italy), demonstrating that deep learning techniques can work synergistically with seabed mapping methods.
CIBB is a venue that embraces researchers with different backgrounds, ranging from mathematics to computer science, from materials science to medicine, and from engineering to biology, all interested in the investigation and application of computational intelligence methods to open problems in bioinformatics, biostatistics, systems biology, synthetic biology, and medical informatics. The program of this edition was organized with contributions on the main conference scientific area, covering heterogeneous open problems at the forefront of current research, and with special sessions on specific themes such as Computational Methods for Neuroimaging Analysis, Machine Learning in Health Informatics and Biological Systems, Soft Computing Methods for Characterizing Diseases from Omics Data, Engineering Bio-Interfaces and Rudimentary Cells as a Way to Develop Synthetic Biology, Modelling and Simulation Methods for Systems Biology and Systems Medicine, Fast and Efficient Solutions for Computational Intelligence Methods in Bioinformatics, Systems, and Computational Biology, Networking Biostatistics and Bioinformatics, and Machine Explanation—Interpretation of Machine Learning Models for Medicine and Bioinformatics. The organization of this edition of CIBB was supported by the Department of Informatics, Systems and Communication of the University of Milano-Bicocca, Italy, and by the Institute of Biomedical Technologies of the National Research Council, Italy. Besides the papers focused on computational intelligence methods applied to open problems of bioinformatics and biostatistics, the works submitted to CIBB 2019 dealt with algebraic and computational methods to study RNA behaviour, intelligence methods for molecular characterization and dynamics in translational medicine, modeling and simulation methods for computational biology and systems medicine, and machine learning in healthcare informatics and medical biology.
In this work, we propose a novel feature selection framework called Sparse-Modeling Based Approach for Class-Specific Feature Selection (SMBA-CSFS), which simultaneously exploits the ideas of Sparse Modeling and Class-Specific Feature Selection. Feature selection plays a key role in several fields (e.g., computational biology), making it possible to treat models with fewer variables, which, in turn, are easier to explain, provide valuable insight into the importance of each variable's role, and likely speed up experimental validation. Unfortunately, as also corroborated by the no free lunch theorems, none of the approaches in the literature is guaranteed to detect the optimal feature subset for building a final model, so feature selection remains a challenge. The proposed procedure is a two-step approach: (a) a sparse modeling-based learning technique is first used to find the best subset of features for each class of a training set; (b) the discovered feature subsets are then fed to a class-specific feature selection scheme, in order to assess the effectiveness of the selected features in classification tasks. To this end, an ensemble of classifiers is built, where each classifier is trained on its own feature subset discovered in the previous phase, and a proper decision rule is adopted to compute the ensemble responses. To evaluate the performance of the proposed method, extensive experiments were performed on publicly available datasets, in particular from the computational biology field, where feature selection is indispensable: acute lymphoblastic and acute myeloid leukemia, human carcinomas, human lung carcinomas, diffuse large B-cell lymphoma, and malignant glioma. SMBA-CSFS is able to identify and retrieve the most representative features that maximize classification accuracy.
With the top 20 and 80 features, SMBA-CSFS exhibits promising performance compared to its competitors from the literature on all considered datasets, especially those with a larger number of features. Experiments show that the proposed approach may outperform state-of-the-art methods when the number of features is high. For this reason, the introduced approach is well suited for the selection and classification of data with a large number of features and classes.
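The class-specific ensemble scheme of step (b) can be illustrated as follows. This is a minimal sketch, not SMBA-CSFS itself: a generic univariate selector stands in for the sparse-modeling step of (a), the dataset is a toy one, and a max-probability rule plays the role of the ensemble decision rule.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One (selector, classifier) member per class, each trained on the
# feature subset discovered for that class
classes = np.unique(y_tr)
members = []
for c in classes:
    yc = (y_tr == c).astype(int)          # one-vs-rest target for class c
    sel = SelectKBest(f_classif, k=2).fit(X_tr, yc)
    clf = LogisticRegression(max_iter=1000).fit(sel.transform(X_tr), yc)
    members.append((sel, clf))

# Decision rule: each member scores its own class; predict the max
votes = np.column_stack([clf.predict_proba(sel.transform(X_te))[:, 1]
                         for sel, clf in members])
pred = classes[np.argmax(votes, axis=1)]
print(f"ensemble accuracy: {np.mean(pred == y_te):.2f}")
```

The key idea carried over from the paper is that each class gets its own feature subset and its own classifier, and the ensemble combines their class-wise responses.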
Record linkage aims to identify records from multiple data sources that refer to the same real-world entity. It is a well-known data quality process, studied since the second half of the last century, with an established pipeline and a rich literature of case studies mainly covering census, administrative, or health domains. In this paper, a method to recognize matching records from real municipalities and banks through multiple similarity criteria and a Neural Network classifier is proposed: starting from a labeled subset of the available data, several similarity measures are first combined and weighted to build a feature vector, then a Multi-Layer Perceptron (MLP) network is trained and tested to find matching pairs. For validation, seven real datasets were used (three from banks and four from municipalities), purposely chosen in the same geographical area to increase the probability of matches. Training involved only two municipalities, while testing involved all sources (municipalities vs. municipalities, banks vs. banks, and municipalities vs. banks). The proposed method scored remarkable results in terms of both precision and recall, clearly outperforming threshold-based competitors.
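The pipeline of similarity feature vector plus MLP classifier can be sketched as below. This is illustrative only: the field names, the hand-made record pairs, and the use of a generic string ratio in place of the paper's multiple weighted similarity criteria are all assumptions.

```python
import numpy as np
from difflib import SequenceMatcher
from sklearn.neural_network import MLPClassifier

def features(rec_a, rec_b):
    """One similarity score per field (name, address, birth date),
    using a generic string ratio as a stand-in for the paper's
    multiple similarity criteria."""
    return [SequenceMatcher(None, a, b).ratio() for a, b in zip(rec_a, rec_b)]

# Tiny hand-made labeled pairs: 1 = same real-world entity
pairs = [
    (("mario rossi", "via roma 1", "1970-01-01"),
     ("rossi mario", "via roma 1", "1970-01-01"), 1),
    (("mario rossi", "via roma 1", "1970-01-01"),
     ("anna bianchi", "corso italia 5", "1985-06-30"), 0),
    (("luca verdi", "p.za dante 3", "1992-03-12"),
     ("luca verdi", "piazza dante 3", "1992-03-12"), 1),
    (("luca verdi", "p.za dante 3", "1992-03-12"),
     ("gino neri", "via po 9", "1960-11-02"), 0),
]
X = np.array([features(a, b) for a, b, _ in pairs])
y = np.array([m for _, _, m in pairs])

# Small MLP in the spirit of the paper's matcher (lbfgs suits tiny data)
clf = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                    max_iter=5000, random_state=0).fit(X, y)
probe = features(("mario rossi", "via roma 1", "1970-01-01"),
                 ("mario rosi", "via roma 1", "1970-01-01"))
print(clf.predict([probe])[0])
```

Working on similarity feature vectors instead of raw strings is what lets a single trained network generalize across sources with different schemas.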
Two methods were compared to predict a ship’s fuel consumption: the simplified naval architecture method (SNAM) and the deep neural network (DNN) method. The SNAM relied on limited operational data and employed a simplified technique to estimate a ship’s required power by determining its resistance in calm water. Here, the Holtrop–Mennen technique provided the hydrostatic information for each selected voyage, the added resistance in the encountered natural seaways, and the brake power required for each scenario. Additional characteristics, such as efficiency factors, were derived from literature surveys and from assumed working hypotheses. The DNN method comprised multiple fully connected layers with the rectified linear unit (ReLU) nonlinear activation function. This machine-learning-based method was trained on more than 12,000 sample voyages, and the tested data were validated against realistic operational data. Our results demonstrated that, for some ship typologies (general cargo and containerships), the physical models predicted the realistic data more accurately than the machine learning approach, despite the lack of relevant operational parameters. Nevertheless, the DNN method was generally capable of yielding reasonably accurate predictions of fuel consumption for oil tankers, bulk carriers, and RoRo ships.
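A DNN of the kind described (stacked fully connected layers with ReLU activations, regressing fuel consumption from voyage features) can be sketched as follows. Everything here is an assumption for illustration: the input features, the synthetic target function, and the network size are not taken from the study.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic voyages: speed (kn), distance (nm), deadweight (kt),
# significant wave height (m). The target is a made-up smooth
# function standing in for real consumption data.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([rng.uniform(8, 20, n),      # speed
                     rng.uniform(100, 5000, n),  # distance
                     rng.uniform(10, 300, n),    # deadweight
                     rng.uniform(0, 6, n)])      # wave height
fuel = 1e-6 * X[:, 0] ** 2 * X[:, 1] * X[:, 2] ** (2 / 3) \
       * (1 + 0.05 * X[:, 3])
fuel += rng.normal(0, fuel.std() * 0.05, n)      # 5% noise

# Fully connected ReLU network on standardized inputs
scaler = StandardScaler().fit(X[:1600])
model = MLPRegressor(hidden_layer_sizes=(64, 64), activation="relu",
                     solver="lbfgs", max_iter=5000, random_state=0)
model.fit(scaler.transform(X[:1600]), fuel[:1600])
r2 = model.score(scaler.transform(X[1600:]), fuel[1600:])
print(f"held-out R^2: {r2:.3f}")
```

Unlike the SNAM, which computes resistance and brake power from first principles, the network only ever sees (features, consumption) pairs, which is why its accuracy depends on the ship type being well represented in the training voyages.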
The aim of this work is to introduce a novel visual object tracking model based on a siamese network and a vision transformer. Tracking is performed by multiple tokens, exploiting the learning and memorization capabilities of vision transformers. The tracking problem is therefore divided into multiple sub-tasks, and a dedicated token learns each individual sub-task. This makes it possible to learn a robust characterization of the problem with an explainable architecture, revealing the motivation behind the choices the neural network makes. This explainability stems from the transformer's attention mechanism, whose token representations make it straightforward, compared with other architectures and methodologies, to identify where the model's interest is focused. Several experiments on benchmark data show the tracker to be among the best performing compared with the state of the art in explainability, precision, robustness, and speed.
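The attention weights that make this kind of architecture inspectable are standard scaled dot-product attention. The following sketch shows only that generic mechanism on random data; the token/patch setup is hypothetical and not the tracker's actual implementation.

```python
import numpy as np

def attention_weights(Q, K):
    """Scaled dot-product attention weights: row i of the result
    shows where query token i focuses across the keys."""
    scores = Q @ K.T / np.sqrt(K.shape[1])
    e = np.exp(scores - scores.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical setup: 4 sub-task tokens attending over 4 patch embeddings
rng = np.random.default_rng(0)
d = 8
task_tokens = rng.normal(size=(4, d))   # one token per tracking sub-task
patches = rng.normal(size=(4, d))       # search-region patch embeddings

A = attention_weights(task_tokens, patches)
# Each row sums to 1; the argmax marks the patch a sub-task attends to most
print(A.argmax(axis=1))
```

Because each row of the attention matrix is a probability distribution over patches, visualizing it directly shows "where all the interest is focused" for each sub-task, which is the explainability argument made in the abstract.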