High-throughput methods in the biological and biomedical fields acquire a large number of molecular parameters, or omics data, in a single experiment. Combining these omics data can significantly increase the capability to recover fine-tuned structures or to reduce the effects of experimental and biological noise in the data.
In this work we propose a multi-view integration methodology (named FH-Clust) for identifying patient subgroups from different omics information (e.g., gene expression, miRNA expression, methylation). In particular, hierarchical structures of patient data are obtained in each omic (or view), and their topologies are finally merged by a consensus matrix. One of the main aspects of this methodology is the use of a dissimilarity measure between sets of observations, based on an appropriate metric. For each view, a dendrogram is obtained by hierarchical clustering based on a fuzzy equivalence relation with Łukasiewicz-valued fuzzy similarity. Finally, a consensus matrix, which represents the information of all the dendrograms, is formed by combining the multiple hierarchical agglomerations through a transitive consensus matrix construction. Several experiments and comparisons on real data (e.g., glioblastoma, prostate cancer) assess the proposed approach.
Fuzzy logic allows us to introduce more flexible data agglomeration techniques. From an analysis of the scientific literature, this appears to be the first time that a model based on fuzzy logic has been used for the agglomeration of multi-omic data. The results suggest that FH-Clust provides better prognostic value and clinical significance than the analysis of single-omic data alone, and that it is very competitive with other techniques from the literature.
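The per-view step and the consensus can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: features are min-max scaled to [0, 1]; the per-feature similarity is the Łukasiewicz biresidual 1 - |x - y|, averaged over features; a fuzzy equivalence is obtained by max-min transitive closure, so every α-cut is a crisp partition and sweeping α from 1 down to 0 yields the nested levels of a dendrogram. The helper names and the "average the views, then re-close" consensus rule are hypothetical stand-ins for the paper's transitive consensus construction.

```python
import numpy as np

def lukasiewicz_similarity(X):
    """S[i, j] = mean over features of 1 - |z_ik - z_jk|, features min-max scaled to [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, span = X.min(axis=0), X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # constant features: every pair gets similarity 1
    Z = (X - lo) / span
    diff = np.abs(Z[:, None, :] - Z[None, :, :])
    return 1.0 - diff.mean(axis=2)

def maxmin_closure(S):
    """Max-min transitive closure: S <- max(S, S o S) until nothing changes."""
    S = S.copy()
    while True:
        # (S o S)[i, j] = max_k min(S[i, k], S[k, j])
        comp = np.max(np.minimum(S[:, :, None], S[None, :, :]), axis=1)
        new = np.maximum(S, comp)
        if np.array_equal(new, S):
            return S
        S = new

def alpha_cut_labels(E, alpha):
    """Crisp partition {i ~ j iff E[i, j] >= alpha}; an equivalence when E is a fuzzy equivalence."""
    labels = -np.ones(E.shape[0], dtype=int)
    for i in range(E.shape[0]):
        if labels[i] < 0:
            labels[E[i] >= alpha] = labels.max() + 1   # open a new cluster for i's class
    return labels

def consensus(views, alpha):
    """Hypothetical consensus: average the per-view fuzzy equivalences, re-close, then cut."""
    per_view = [maxmin_closure(lukasiewicz_similarity(V)) for V in views]
    C = maxmin_closure(np.mean(per_view, axis=0))
    return C, alpha_cut_labels(C, alpha)
```

Each view is a patients-by-features matrix with the same patients in the same row order, e.g. `C, labels = consensus([expression, methylation], alpha=0.5)`. Re-closing the averaged matrix keeps the consensus transitive, so its α-cuts remain nested partitions.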
• The paper reviews the state of the art of Intrinsic Dimension (ID) estimation methods.
• The paper defines the properties that an ideal ID estimator should have.
• Within this framework, the paper reviews the major ID estimation methods, underlining their advances and the open problems.
Dimensionality reduction methods are preprocessing techniques used to cope with high dimensionality. They aim to project the original data set of dimensionality N, without information loss, onto a lower M-dimensional submanifold. Since the value of M, called the intrinsic dimension (ID), is unknown, techniques that estimate it in advance are quite useful. The aim of the paper is to review the state of the art of ID estimation methods, underlining the recent advances and the open problems.
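As a concrete point of reference, one of the simplest global ID estimators counts how many principal components are needed to explain a chosen fraction of the variance. The sketch below is only that baseline; the 0.95 threshold and the function name are assumptions, and because it is linear it tends to overestimate M when the data lie on a curved submanifold.

```python
import numpy as np

def pca_intrinsic_dimension(X, threshold=0.95):
    """Smallest M whose top-M principal components explain `threshold` of the variance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = s ** 2                               # proportional to PCA variances
    ratio = np.cumsum(var) / var.sum()
    # first index where the cumulative ratio reaches the threshold
    return int(min(np.searchsorted(ratio, threshold) + 1, len(var)))
```

For data generated from two latent coordinates and linearly embedded in five dimensions, this returns 2.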
As of today, bioinformatics is one of the most exciting fields of scientific research. There is a wide-ranging list of challenging problems to face, e.g., pairwise and multiple alignments, motif detection/discrimination/classification, phylogenetic tree reconstruction, protein secondary and tertiary structure prediction, protein function prediction, DNA microarray analysis, and gene regulation/regulatory networks, just to mention a few, and an army of researchers from several scientific backgrounds focus their efforts on developing models to properly address these problems. In this paper, we briefly review some of the huge number of machine learning methods developed in the last two decades for the analysis of gene microarray data, which have had a strong impact on molecular biology. In particular, we focus on the wide-ranging list of data clustering and visualization techniques that are able to find homogeneous data groupings and also make it possible to discover their connections in terms of structure, function, and evolution.
CIBB is a venue that embraces researchers with different backgrounds, ranging from mathematics to computer science, from materials science to medicine, and from engineering to biology, all interested in the investigation and application of computational intelligence methods to open problems in bioinformatics, biostatistics, systems biology, synthetic biology, and medical informatics. The program of this edition was organized into contributions on the main conference scientific area, covering heterogeneous open problems at the forefront of current research, and special sessions on specific themes such as: Computational Methods for Neuroimaging Analysis; Machine Learning in Health Informatics and Biological Systems; Soft Computing Methods for characterizing Diseases from Omics Data; Engineering Bio-Interfaces and Rudimentary Cells as a way to Develop Synthetic Biology; Modelling and Simulation Methods for System Biology and System Medicine; Fast and Efficient Solutions for Computational Intelligence Methods in Bioinformatics, Systems, and Computational Biology; Networking Biostatistics and Bioinformatics; and Machine Explanation: Interpretation of Machine Learning Models for Medicine and Bioinformatics. The organization of this edition of CIBB was supported by the Department of Informatics, Systems and Communication of the University of Milano-Bicocca, Italy, and by the Institute of Biomedical Technologies of the National Research Council, Italy. Besides the papers focused on computational intelligence methods applied to open problems of bioinformatics and biostatistics, the works submitted to CIBB 2019 dealt with algebraic and computational methods to study RNA behaviour, intelligence methods for molecular characterization and dynamics in translational medicine, modeling and simulation methods for computational biology and systems medicine, and machine learning in healthcare informatics and medical biology.
In this work, we propose a novel feature selection framework called Sparse-Modeling Based Approach for Class Specific Feature Selection (SMBA-CSFS), which simultaneously exploits the ideas of sparse modeling and class-specific feature selection. Feature selection plays a key role in several fields (e.g., computational biology), making it possible to work with models that have fewer variables; such models are easier to explain, provide valuable insights into the importance of each variable's role, and are likely to speed up experimental validation. Unfortunately, as also corroborated by the no free lunch theorems, no approach in the literature is best suited to detecting the optimal feature subset for building a final model, so this task still represents a challenge. The proposed feature selection procedure follows a two-step approach: (a) a sparse modeling-based learning technique is first used to find the best subset of features for each class of a training set; (b) the discovered feature subsets are then fed to a class-specific feature selection scheme in order to assess the effectiveness of the selected features in classification tasks. To this end, an ensemble of classifiers is built, in which each classifier is trained on its own feature subset discovered in the previous phase, and a suitable decision rule is adopted to compute the ensemble responses. To evaluate the performance of the proposed method, extensive experiments were performed on publicly available datasets, in particular from the computational biology field, where feature selection is indispensable: acute lymphoblastic leukemia and acute myeloid leukemia, human carcinomas, human lung carcinomas, diffuse large B-cell lymphoma, and malignant glioma. SMBA-CSFS is able to identify the most representative features, namely those that maximize classification accuracy.
With the top 20 and 80 features, SMBA-CSFS exhibits promising performance compared with its competitors from the literature on all considered datasets, especially those with a higher number of features. Experiments show that the proposed approach may outperform state-of-the-art methods when the number of features is high. For this reason, the introduced approach is well suited to the selection and classification of data with a large number of features and classes.
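The two-step structure described above can be sketched with NumPy alone. Everything specific here is an assumption for illustration, not the paper's method: the sparse model is a one-vs-rest lasso solved by ISTA, each class keeps its top-k features by absolute weight, each ensemble member is a nearest-centroid "class c vs rest" scorer on its own subset, and the hypothetical decision rule picks the class whose member is most confident.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam=0.05, n_iter=500):
    """Sparse linear model: min_w (1/2n)||y - Xw||^2 + lam*||w||_1, solved by ISTA."""
    n = X.shape[0]
    L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - grad / L, lam / L)
    return w

class ClassSpecificEnsemble:
    """Hypothetical sketch: one sparse feature subset and one scorer per class."""

    def __init__(self, k=2, lam=0.05):
        self.k, self.lam = k, lam

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.mean_, self.std_ = X.mean(axis=0), X.std(axis=0) + 1e-12
        Z = (X - self.mean_) / self.std_
        self.classes_, self.members_ = np.unique(y), []
        for c in self.classes_:
            target = np.where(y == c, 1.0, -1.0)
            target -= target.mean()                  # no intercept, so center the target
            w = lasso_ista(Z, target, self.lam)      # step (a): sparse model for class c
            feats = np.argsort(-np.abs(w))[: self.k]
            mu_in = Z[y == c][:, feats].mean(axis=0)
            mu_out = Z[y != c][:, feats].mean(axis=0)
            self.members_.append((feats, mu_in, mu_out))
        return self

    def predict(self, X):
        Z = (np.asarray(X, dtype=float) - self.mean_) / self.std_
        # step (b): each member scores "closer to class c than to the rest" on its own subset
        scores = np.column_stack([
            np.sum((Z[:, f] - mo) ** 2, axis=1) - np.sum((Z[:, f] - mi) ** 2, axis=1)
            for f, mi, mo in self.members_
        ])
        return self.classes_[np.argmax(scores, axis=1)]
```

Usage follows the familiar fit/predict pattern, e.g. `ClassSpecificEnsemble(k=20).fit(X_train, y_train).predict(X_test)`; swapping the nearest-centroid scorer for any probabilistic classifier keeps the same structure.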
Several fuzzy c-means-based clustering techniques have been developed to tackle problems in a number of areas, such as pattern recognition, image analysis, communication, and data mining. Among them, a common use of this class of clustering algorithms is in the training of radial basis function neural networks (RBFNNs). In this paper, we describe a novel approach to fuzzy clustering, which organizes the data into clusters on the basis of the input data and a ‘prototype’ regression function built in the output space as a sum of a number of linear local regression models. This methodology is shown to be effective in training RBFNNs, leading to improved performance with respect to other clustering algorithms.
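For context, the standard fuzzy c-means iteration (Bezdek) that such methods build on, and the way its centers typically seed an RBF hidden layer, can be sketched as below. This is only the baseline: the paper's output-space regression prototype is not reproduced, and the Gaussian width `sigma` and least-squares output layer are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard FCM: alternate fuzzy-weighted centers and membership updates."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                                  # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)      # cluster centers
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        D = np.fmax(D, 1e-12)                           # guard points lying on a center
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                   # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    W = U ** m
    return (W @ X) / W.sum(axis=1, keepdims=True), U

def rbf_fit(X, y, centers, sigma):
    """Gaussian hidden layer at the given centers; output weights by least squares."""
    Phi = np.exp(-np.sum((X[:, None, :] - centers[None]) ** 2, axis=2) / (2 * sigma ** 2))
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w
```

A typical pipeline clusters the inputs with `V, U = fuzzy_c_means(X, c)` and then fits the output layer with `rbf_fit(X, y, V, sigma)`.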
Environmental time series are often affected by missing data, namely data unavailability at certain time points. This paper presents the Iterated Imputation and Prediction algorithm, which allows the prediction of time series with missing data. The algorithm iteratively uses the Correlation Dimension Estimation of the underlying dynamical system generating the time series to fix the model order (i.e., how many past samples are required to model the time series accurately), and Support Vector Machine Regression to estimate the skeleton of the time series. Experimental validation of the algorithm on three environmental time series with missing data, expressing the concentration of ozone at three European sites, shows a small average percentage prediction error for all time series on the test set.
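The iterate-impute-refit loop can be sketched as follows. Two substitutions are deliberate and should be read as assumptions: the model order `d` is passed in directly (in IIP it comes from the correlation dimension estimate), and a ridge-regularized linear autoregression stands in for Support Vector Machine Regression so the sketch needs only NumPy. Missing samples are marked with NaN and initialized by linear interpolation.

```python
import numpy as np

def lag_matrix(x, d):
    """Rows [x[t-d], ..., x[t-1]] with targets x[t], for t = d .. len(x)-1."""
    X = np.column_stack([x[i:len(x) - d + i] for i in range(d)])
    return X, x[d:]

def fit_linear(X, y, ridge=1e-3):
    """Ridge-regularized linear autoregression with intercept (stand-in for SVR)."""
    A = np.column_stack([X, np.ones(len(X))])
    return np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ y)

def iterated_impute_predict(series, d, n_rounds=50, tol=1e-8):
    """Alternate model fitting and re-imputation of the missing samples (NaN)."""
    x = np.asarray(series, dtype=float).copy()
    missing = np.isnan(x)
    t = np.arange(len(x))
    x[missing] = np.interp(t[missing], t[~missing], x[~missing])  # initial guess
    fillable = t[missing & (t >= d)]          # missing samples that have d predecessors
    for _ in range(n_rounds):
        X, y = lag_matrix(x, d)
        coef = fit_linear(X, y)
        pred = np.column_stack([X, np.ones(len(X))]) @ coef        # pred[k] estimates x[k + d]
        change = np.max(np.abs(pred[fillable - d] - x[fillable]), initial=0.0)
        x[fillable] = pred[fillable - d]
        if change < tol:
            break
    forecast = np.append(x[-d:], 1.0) @ coef   # one-step-ahead prediction
    return x, forecast
```

Each round refits the model on the current completed series and then replaces only the missing samples, so better imputations yield a better model and vice versa.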
• The paper presents the Iterated Imputation and Prediction (IIP) algorithm for predicting time series with missing data.
• IIP uses Correlation Dimension and Support Vector Machine Regression to estimate the model order and the skeleton of the time series, respectively.
• The Correlation Dimension is estimated with the proposed Grassberger-Procaccia-Hough algorithm.
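The classic Grassberger-Procaccia estimator underlying the proposed variant can be sketched as follows: the correlation integral C(r) is the fraction of point pairs closer than r, and the correlation dimension is the slope of log C(r) versus log r over a scaling range. Here the slope is found by an ordinary least-squares line fit, which is a plain substitute for the Hough-transform step of the paper's method; the choice of radii is an assumption left to the caller.

```python
import numpy as np

def correlation_integral(X, radii):
    """C(r): fraction of distinct point pairs at distance < r, for each r in radii."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    pair_d = dist[np.triu_indices(len(X), k=1)]   # each unordered pair once
    return np.array([(pair_d < r).mean() for r in radii])

def correlation_dimension(X, radii):
    """Slope of log C(r) against log r (least-squares fit over the given radii)."""
    radii = np.asarray(radii, dtype=float)
    C = correlation_integral(X, radii)
    ok = C > 0                                     # log is undefined where no pairs fall
    slope, _ = np.polyfit(np.log(radii[ok]), np.log(C[ok]), 1)
    return slope
```

For points spread uniformly over a square embedded in three dimensions, the estimate is close to 2 (slightly below it at larger radii because of edge effects); for time series, X would be the delay-embedded vectors.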
Recent studies have opened the way for using elicitor-induced resistance in plants as a method to control arthropod pests. In this study, the 1,3-β-glucan laminarin, an elicitor of disease resistance in plants, was tested against the green peach aphid, Myzus persicae (Sulzer) (Hemiptera: Aphididae), on peach (Prunus persica (L.) Batsch, Rosaceae) plantlets, and its effects on short-term mortality and population growth were evaluated. Laminarin exposure did not affect aphid survival in the short term; however, laminarin-treated peach plants sustained fewer nymphs and adults than the control. Aphid populations on plants treated with laminarin declined significantly over the sampling period compared with the control. Moreover, the demographic parameters net reproductive rate (R₀), finite rate of increase (λ), and intrinsic rate of increase (rₘ) all showed decreasing trends in aphid populations reared on laminarin-treated plants. The decline in aphid populations exposed to laminarin seemed to be linked mainly to reduced adult survival, slower nymph development, and lower nymph survival, and only marginally to changes in reproductive outcome. Changes in gene expression leading to the production of defence chemicals by peach plants may help explain the results. However, potential direct effects of laminarin on M. persicae feeding activity and probing behaviour cannot be ruled out. This study provides evidence that, although laminarin did not display insecticidal activity in the short term, this elicitor caused sublethal effects, significantly reducing aphid populations.
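The three demographic parameters are conventionally derived from an age-specific life table with survivorship lₓ and fecundity mₓ: R₀ = Σ lₓmₓ, rₘ is the root of the Euler-Lotka equation Σ e^(-rₘx) lₓmₓ = 1, and λ = e^(rₘ). The abstract does not state which computation the authors used, so the sketch below shows only this textbook form; the bisection bracket and the function name are assumptions.

```python
import math

def demographic_parameters(ages, lx, mx, lo=-1.0, hi=2.0, tol=1e-10):
    """Return (R0, rm, lambda) from an age-specific life table."""
    lm = [l * m for l, m in zip(lx, mx)]
    R0 = sum(lm)                                   # net reproductive rate

    def euler_lotka(r):
        return sum(math.exp(-r * x) * v for x, v in zip(ages, lm)) - 1.0

    # euler_lotka decreases in r; assumes it is positive at lo and negative at hi
    a, b = lo, hi
    while b - a > tol:
        mid = (a + b) / 2
        if euler_lotka(mid) > 0:
            a = mid
        else:
            b = mid
    rm = (a + b) / 2                               # intrinsic rate of increase
    return R0, rm, math.exp(rm)                    # lambda: finite rate of increase
```

For example, a hypothetical cohort that produces 4 offspring per individual at age 2 gives R₀ = 4, rₘ = ln 2 and λ = 2; whenever R₀ < 1, rₘ is negative and λ < 1, i.e. a declining population.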