The data used for analysis are becoming increasingly complex along several directions: high dimensionality, number of examples, and availability of labels for the examples. This poses a variety of challenges for existing machine learning methods, related to analyzing datasets with a large number of examples that are described in a high-dimensional space, where not all examples have labels provided. For example, when investigating the toxicity of chemical compounds, there are many compounds available that can be described with information-rich high-dimensional representations, but not all of the compounds have information on their toxicity. To address these challenges, we propose methods for semi-supervised learning (SSL) of feature rankings. The feature rankings are learned in the context of classification and regression, as well as in the context of structured output prediction tasks: multi-label classification (MLC), hierarchical multi-label classification (HMLC), and multi-target regression (MTR). This is the first work that treats the task of feature ranking uniformly across various tasks of semi-supervised structured output prediction. To the best of our knowledge, it is also the first work on SSL of feature rankings for the tasks of HMLC and MTR. More specifically, we propose two approaches, based on predictive clustering tree ensembles and the Relief family of algorithms, and evaluate their performance across 38 benchmark datasets. The extensive evaluation reveals that rankings based on Random Forest ensembles perform the best for classification tasks (including MLC and HMLC tasks) and are the fastest for all tasks, while ensembles based on extremely randomized trees work best for the regression tasks. Semi-supervised feature rankings outperform their supervised counterparts across the majority of datasets for all of the different tasks, showing the benefit of using unlabeled in addition to labeled data.
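To make the Relief idea concrete, the following is a minimal sketch of the basic supervised, two-class Relief weighting scheme. It is not the semi-supervised extension proposed in the abstract, and the function name `relief_weights`, the Manhattan distance, and the toy data are assumptions made for illustration. Each sampled example rewards features that differ from its nearest example of the other class (the "miss") and penalizes features that differ from its nearest example of the same class (the "hit").

```python
import random

def relief_weights(X, y, n_iterations=50, seed=0):
    """Basic Relief weights for two-class data with features scaled to [0, 1].

    Returns one weight per feature; a larger weight means a more relevant feature.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def distance(a, b):
        # Manhattan distance (an illustrative choice).
        return sum(abs(p - q) for p, q in zip(a, b))

    for _ in range(n_iterations):
        i = rng.randrange(n)
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            diff_miss = abs(X[i][f] - X[miss][f])
            diff_hit = abs(X[i][f] - X[hit][f])
            weights[f] += (diff_miss - diff_hit) / n_iterations
    return weights

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X, y = [], []
for k in range(20):
    noise = k / 19
    X.append([0.1, noise]); y.append(0)
    X.append([0.9, noise]); y.append(1)

weights = relief_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda f: -weights[f])
print(weights, ranking)
```

On this toy data the class-separating feature receives the larger weight and is ranked first. A semi-supervised variant would additionally have to decide how unlabeled examples contribute to the neighbor search, which this sketch does not attempt.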
Multi-target prediction (MTP) serves as an umbrella term for machine learning tasks that concern the simultaneous prediction of multiple target variables. Classical instantiations are multi-label classification, multivariate regression, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. Despite their significant similarities, all these domains have evolved separately into distinct research areas over the last two decades. This has led to the development of a plethora of highly engineered methods and created a substantially high entrance barrier for machine learning practitioners who are not experts in the field. In this work, we present a generic deep learning methodology that can be used for a wide range of multi-target prediction problems. We introduce a flexible multi-branch neural network architecture, partially configured via a questionnaire that helps end users select a suitable MTP problem setting for their needs. Experimental results for a wide range of domains illustrate that the proposed methodology achieves competitive performance compared to methods from specific MTP domains.
Epigenetics, which refers to genetic modifications that change gene expression but are not encoded in DNA, has been shown to be related to oncology, with the potential to influence associated treatments. As such, epigenetic drugs comprise an important new field in cancer therapy; however, drug development is a high-cost and time-consuming procedure. Different epigenetic modifications, such as mutations in DNA methyltransferase and somatic mutations in core histone genes that lead to a global loss of the histone modifications, have innumerable relationships. In this article, we propose a graph neural network-based model for the extraction of molecular features, thus reducing the computational requirements. Through integration with a popular and efficient supervised learner, our model achieves higher prediction accuracy in both single- and multi-target tasks and can determine the pleiotropy associated with drugs, providing theoretical support for drug combination and discovery research.
Due to their diverse bioactivity, natural products (NPs) have been developed as commercial products in the pharmaceutical, food and cosmetic sectors as natural compounds (NCs) and in the form of extracts. Following administration, NCs typically interact with multiple target proteins to elicit their effects. Various machine learning models have been developed to predict multi-target modulating NCs with desired physiological effects. However, due to deficiencies in existing chemical-protein interaction datasets, which are mostly single-labeled and limited, the existing models struggle to predict new chemical-protein interactions. New techniques are needed to overcome these limitations.
We propose a novel NC discovery model called OptNCMiner that offers various advantages. The model is trained via end-to-end learning with a feature extraction step implemented, and it predicts multi-target modulating NCs through multi-label learning. In addition, it offers a few-shot learning approach to predict NC-protein interactions using a small training dataset. OptNCMiner achieved better prediction performance in terms of recall than conventional classification models. It was tested for the prediction of NC-protein interactions using small datasets and for a use case scenario to identify multi-target modulating NCs for type 2 diabetes mellitus complications.
OptNCMiner identifies NCs that modulate multiple target proteins, which facilitates the discovery and the understanding of biological activity of novel NCs with desirable health benefits.
• Contributes to the “general correlation problem” of Process Mining.
• A three-stage approach that recommends horizontal partitioning of the event log, cases’ profiles creation, and multi-target feature evaluation to deliver insights.
• Connects process behavior to cases’ characteristics following a conceptually unsupervised approach.
Certain business environments, like health-care or customer service, host complex and highly variable business processes. In such situations, we expect fluctuating process behavior, which is difficult to attribute to specific causes, at least automatically. This work aims to provide process analysts with an additional tool to discover factors that affect the process flow. To this end, we propose a three-stage methodology to deal with the several challenges of this goal.
Adhering to the process mining paradigm, which calls for evidence-based process analysis and improvement, we introduce a horizontal partitioning approach to identify elements of process behavior during the first stage. Then, during the second stage, we discuss how log manipulations can yield characteristics that reflect various perspectives of the process. Finally, we propose a multi-target feature evaluation step to deliver insights about the associations between characteristics and process behavior.
The proposed methodology is designed to tackle challenges related to the general correlation problem of process mining, like dealing with general process behavior (not just local decisions) and relaxing the independence assumption among the elements of behavior. We demonstrate our approach step by step through a case study on a real-world, open dataset.
An important consideration in conservation and biodiversity planning is an appreciation of the condition or integrity of ecosystems. In this study, we have applied various machine learning methods to the problem of predicting the condition or quality of the remnant indigenous vegetation across an extensive area of south-eastern Australia: the state of Victoria. The field data were obtained using the ‘habitat hectares’ approach. This rapid assessment technique produces multiple scores that describe the condition of various attributes of the vegetation at a given site. Multiple sites were assessed and subsequently circumscribed with GIS and remote-sensed data.
We explore and compare two approaches for modelling this type of data: learning a model for each score separately (single-target approach, a regression tree), or learning one model for all scores simultaneously (multi-target approach, a multi-target regression tree). In order to lift the predictive performance, we also employ ensembles (bagging and random forests) of regression trees and multi-target regression trees. Our results demonstrate the advantages of a multi-target over a single-target modelling approach. While there is no statistically significant difference between the multi-target and single-target models in terms of model performance, the multi-target models are smaller and faster to learn than the single-target ones. Ensembles of multi-target models also improve the spatial prediction of condition.
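The difference between the two approaches can be illustrated at the level of a single tree split. The sketch below assumes one common heuristic for multi-target regression trees, namely choosing the split that minimizes the size-weighted sum of per-target variances; the abstract does not state the exact criterion used, and the function names and toy data are hypothetical. A single-target tree would apply the same search to one score at a time, producing one tree per score.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def multi_target_impurity(Y):
    """Sum of per-target variances over the rows in Y (one target vector per row)."""
    n_targets = len(Y[0])
    return sum(variance([row[t] for row in Y]) for t in range(n_targets))

def best_split(x, Y):
    """Find the threshold on one feature that minimizes the size-weighted impurity.

    Returns (score, threshold); a single split serves all targets at once.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = (float("inf"), None)
    for k in range(1, len(order)):
        low, high = x[order[k - 1]], x[order[k]]
        if low == high:
            continue  # no threshold can separate equal feature values
        left = [Y[i] for i in order[:k]]
        right = [Y[i] for i in order[k:]]
        score = (len(left) * multi_target_impurity(left)
                 + len(right) * multi_target_impurity(right)) / len(x)
        if score < best[0]:
            best = (score, (low + high) / 2)
    return best

# Toy data (hypothetical): one site attribute and two condition scores per site.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
Y = [[0.0, 5.0], [0.0, 5.0], [0.0, 5.0], [1.0, 9.0], [1.0, 9.0], [1.0, 9.0]]
score, threshold = best_split(x, Y)
print(score, threshold)
```

Because one split is shared by all scores, a multi-target tree needs a single structure where the single-target approach needs one tree per score, which is consistent with the smaller model size reported above.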
The usefulness of models of vegetation condition is twofold. First, they provide an enhanced knowledge and understanding of the condition of different indigenous vegetation types, and identify possible biophysical and landscape attributes that may contribute to vegetation decline. Second, these models may be used to map the condition of indigenous vegetation, in support of biodiversity planning, management and investment decisions.
Prediction of dam behavior based on monitoring data is important for dam safety and emergency management. It is crucial to analyze and predict the seepage field. Unlike mechanism-based physical models, machine learning models predict directly from data with high accuracy. However, current prediction models are generally based on environmental variables and the time series of a single measurement point. Sometimes point-by-point modeling is used to obtain multi-point prediction values. In order to improve the prediction accuracy and efficiency of the seepage field, a novel multi-target prediction model (MPM) is proposed in which two deep learning methods are integrated into one framework. The MPM model can capture causal temporal features between environmental variables and target values, as well as latent correlation features between different measurement points at each moment. The features of these two parts are fed into fully connected layers to establish the mapping between the comprehensive feature vector and the multi-target outputs. Finally, the model is trained for prediction in the framework of a feed-forward neural network using standard back propagation. The MPM model can not only describe the variation pattern of measurement values with the change of load and time, but also reflect the spatial distribution relationship of measurement values. The effectiveness and accuracy of the MPM model are verified by two cases. The proposed MPM model is generally applicable to the prediction of other types of physical fields in dam safety besides the seepage field.
DeepMTP is a Python framework designed to be compatible with the majority of machine learning sub-areas that fall under the umbrella of multi-target prediction (MTP). Multi-target prediction includes problem settings like multi-label classification, multivariate regression, multi-task learning, matrix completion, dyadic prediction, and zero-shot learning. Instead of using separate methodologies for the different problem settings, the proposed framework employs a single flexible two-branch neural network architecture that has been proven to be effective across the majority of MTP problem settings. To our knowledge, this is the first attempt at providing a framework that is compatible with more than two MTP problem settings. The source code of the framework is available at https://github.com/diliadis/DeepMTP and an extension with a graphical user interface is available at https://github.com/diliadis/DeepMTP_gui.
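The general shape of a two-branch architecture can be sketched in a few lines of plain Python. This is not the DeepMTP API: the function names, the single tanh layer per branch, the dot-product combination, and the random untrained weights are all assumptions made for illustration. One branch embeds the instance features, the other embeds the target features, and the two embeddings are combined into one score per instance-target pair.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def embed(x, layer):
    """One fully connected layer with a tanh activation."""
    weights, bias = layer
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def two_branch_score(instance_x, target_x, instance_layer, target_layer):
    """Score one (instance, target) pair: dot product of the two embeddings, then sigmoid."""
    u = embed(instance_x, instance_layer)
    v = embed(target_x, target_layer)
    logit = sum(a * b for a, b in zip(u, v))
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
instance_layer = init_layer(4, 3, rng)  # 4 instance features -> 3-dimensional embedding
target_layer = init_layer(2, 3, rng)    # 2 target features -> 3-dimensional embedding

instances = [[0.2, 0.1, 0.0, 0.7], [0.9, 0.3, 0.5, 0.1]]
targets = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = [[two_branch_score(x, t, instance_layer, target_layer) for t in targets]
          for x in instances]
print(scores)
```

Because targets are described by their own features, the same scoring function can in principle be applied to targets that were not seen during training, which is why one architecture of this shape can cover settings such as multi-label classification, dyadic prediction, and zero-shot learning.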
Human mobility prediction is highly useful for route planning and schedule management. However, mobility data are high-dimensional, which makes multi-context prediction difficult within a single model. Mobility data can usually be expressed as home events, work events, shopping events and traveling events. Previous works have only been able to learn and predict one type of mobility event at a time and then integrate the results. As the tensor model has a strong ability to describe high-dimensional information, we propose an algorithm to predict human mobility in tensors of location context data. Using the tensor decomposition method, we extract human mobility patterns with multiple expressions and then synthesize the future mobility event based on these patterns. The experiment is based on real-world location data, and the results show that the tensor decomposition method has the lowest prediction error among the three methods compared. The results also demonstrate the feasibility of our multi-context prediction model.
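As a rough illustration of pattern extraction by tensor decomposition, the sketch below fits a single rank-1 component to a three-way tensor with an alternating power iteration. The abstract does not specify the decomposition used, so the rank-1 model, the function name `rank1_cp`, and the axis interpretation (user, time slot, event type) are assumptions. The sketch covers only the pattern extraction step, not the synthesis of future events, and it assumes a nonzero tensor.

```python
import math

def rank1_cp(T, n_iter=10):
    """Approximate a 3-way tensor T[i][j][k] by scale * a[i] * b[j] * c[k].

    Alternating power iteration; a, b and c are returned with unit length.
    """
    I, J, K = len(T), len(T[0]), len(T[0][0])
    b, c = [1.0] * J, [1.0] * K

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v], norm

    for _ in range(n_iter):
        a = normalize([sum(T[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
                       for i in range(I)])[0]
        b = normalize([sum(T[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
                       for j in range(J)])[0]
        c, scale = normalize([sum(T[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
                              for k in range(K)])
    return scale, a, b, c

# Toy tensor (hypothetical axes): 2 users x 3 time slots x 4 event types
# (home, work, shopping, traveling), built from one underlying pattern.
users, slots, events = [1.0, 2.0], [3.0, 1.0, 2.0], [4.0, 3.0, 1.0, 2.0]
T = [[[u * s * e for e in events] for s in slots] for u in users]

scale, a, b, c = rank1_cp(T)
max_error = max(abs(T[i][j][k] - scale * a[i] * b[j] * c[k])
                for i in range(2) for j in range(3) for k in range(4))
print(max_error)
```

Real mobility tensors would need several components rather than one, and each recovered factor triple can then be read as a pattern linking a group of users, a time profile, and a mix of event types.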