•An orthogonal deep non-negative matrix factorization (Deep-NMF) framework is proposed to learn a non-linear parts-based representation of multi-view data.
•t-SNE visualizations of the features learned by the proposed Deep-NMF and its counterpart confirm the effectiveness of the proposed framework for multi-view clustering.
•The proposed Deep-NMF method learns the consensus manifold of multi-view data and incorporates it in all layers of the multi-layer architecture.
•The objective function is designed to uncover a unique consensus representation that encodes both the view-shared and the view-specific information of multi-view data.
•Extensive experiments, including feature visualization, component-based and multi-layer ability analysis, and comprehensive examples, are conducted and presented in this work.
Multi-view data clustering based on Non-negative Matrix Factorization (NMF) is commonly used for pattern recognition: it groups high-dimensional multi-view data by projecting it to a lower-dimensional space. However, the NMF framework fails to learn an accurate low-dimensional representation of the input data if the data exhibit complex, non-linear relationships. This paper proposes a deep non-negative matrix factorization-based framework for effective multi-view data clustering that uncovers both the non-linear relationships and the intrinsic components of the data. Both the consensus and the complementary information present in multiple views are learned in the proposed framework through normalized cut-type and orthogonality constraints. The optimal manifold of multi-view data is incorporated in all layers of the framework. Extensive experimental results show that the proposed method outperforms state-of-the-art multi-view matrix factorization-based methods.
Semi-supervised symmetric nonnegative matrix factorization (SNMF) has been extensively utilized in both linear and nonlinear data clustering tasks. However, the non-convex objective function of the current SNMF model poses challenges for global optimization and time efficiency. In this study, we leverage label information to propose a convex and unconstrained symmetric matrix factorization (SMF) model, and we thoroughly analyze its convexity properties. To capture high-order relationships among data, the model uses a hypergraph, which is computationally simple, translation invariant, and naturally normalized. Moreover, based on the analysis and the corresponding experiments in the paper, the model exhibits robustness towards outliers to some extent. Because the proposed model is convex and unconstrained, it can be efficiently optimized using the Conjugate Gradient (CG) method, one of the most efficient methods available. We therefore propose a novel Convex Combination-based Sufficient Descent CG (CSDCG) method, which outperforms other methods across 284 optimization problems within the CUTEst library. To evaluate the effectiveness of the proposed method, semi-supervised clustering experiments are conducted on eight datasets in comparison with ten state-of-the-art matrix factorization (MF) methods. The experimental results demonstrate that it handles the clustering problem with better performance and less computational time than the compared methods. The code is available at https://github.com/Pokemer/HCSSMF.
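For orientation, the non-convex SNMF objective that the abstract contrasts against is usually written as f(H) = ||A − HHᵀ||²_F for a symmetric similarity matrix A. The sketch below is not the convex SMF model or the CSDCG solver from the paper. It only evaluates this standard objective and its gradient, 4(HHᵀ − A)H, on a small made-up matrix; the names `snmf_loss` and `snmf_grad` and the data are illustrative assumptions.

```python
def snmf_loss(A, H):
    """Squared Frobenius error ||A - H H^T||_F^2 for a symmetric matrix A."""
    n, r = len(H), len(H[0])
    return sum(
        (A[i][j] - sum(H[i][k] * H[j][k] for k in range(r))) ** 2
        for i in range(n) for j in range(n)
    )

def snmf_grad(A, H):
    """Gradient of snmf_loss with respect to H; equals 4 (H H^T - A) H when A is symmetric."""
    n, r = len(H), len(H[0])
    # Residual R = H H^T - A
    R = [[sum(H[i][k] * H[j][k] for k in range(r)) - A[i][j] for j in range(n)]
         for i in range(n)]
    return [[4.0 * sum(R[i][j] * H[j][k] for j in range(n)) for k in range(r)]
            for i in range(n)]

# Hypothetical 3x3 similarity matrix (symmetric) and a rank-2 factor.
A = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
H = [[0.9, 0.1],
     [0.8, 0.2],
     [0.1, 0.9]]

loss = snmf_loss(A, H)
G = snmf_grad(A, H)
```

Because f is quartic in the entries of H, it is generally non-convex, which is the difficulty the abstract attributes to existing SNMF models.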
•A novel embedded-based multi-label feature selection method is proposed.
•Our method extracts the shared common mode between features and labels.
•Our method uses Non-negative Matrix Factorization to enhance the interpretability.
•An optimization algorithm is proposed for our method.
•Numerous experiments are conducted to demonstrate the superiority of our method.
Multi-label feature selection plays an indispensable role in multi-label learning by eliminating irrelevant and redundant features while retaining relevant ones. Most existing multi-label feature selection methods employ two strategies to construct feature selection models: extracting label correlations to guide the feature selection process, and maintaining consistency between the feature matrix and the reduced low-dimensional feature matrix. However, the data information is described by two data matrices, the feature matrix and the label matrix, and previous methods devote attention to only one of them. To address this issue, we propose a novel feature selection method named Feature Selection considering Shared Common Mode between features and labels (SCMFS). First, we utilize Coupled Matrix Factorization (CMF) to extract the shared common mode between the feature matrix and the label matrix, so that the comprehensive data information in the two matrices is considered. Additionally, Non-negative Matrix Factorization (NMF) is adopted to enhance the interpretability of feature selection. Extensive experiments are conducted on fifteen real-world benchmark data sets with multiple evaluation metrics, and the experimental results demonstrate the classification superiority of the proposed method.
The applications of binary matrices are numerous. Representing a matrix as a mixture of a small collection of latent vectors via low-rank decomposition is often seen as an advantageous method to interpret and analyze data. In this work, we examine the factorizations of binary matrices using standard arithmetic (real and nonnegative) and logical operations (Boolean and ℤ2). We examine the relationships between the different ranks, and discuss when factorization is unique. In particular, we characterize when a Boolean factorization X = W ∧ H has a unique W, a unique H (for a fixed W), and when both W and H are unique, given a rank constraint. We introduce a method for robust Boolean model selection, called BMFk, and show on numerical examples that BMFk not only accurately determines the correct number of Boolean latent features but also reconstructs the pre-determined factors accurately.
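To make the difference between the arithmetics concrete, the following sketch multiplies the same pair of small binary factors under Boolean (OR of ANDs) and ℤ2 (sum modulo 2) rules. It is only an illustration of the two products, not the BMFk method; the function names and the example matrices are assumptions.

```python
def boolean_product(W, H):
    """Boolean product: entry (i, j) is OR over k of (W[i][k] AND H[k][j])."""
    return [[int(any(W[i][k] and H[k][j] for k in range(len(H))))
             for j in range(len(H[0]))]
            for i in range(len(W))]

def z2_product(W, H):
    """Product over Z2: entry (i, j) is the ordinary sum of products, taken modulo 2."""
    return [[sum(W[i][k] * H[k][j] for k in range(len(H))) % 2
             for j in range(len(H[0]))]
            for i in range(len(W))]

# Hypothetical rank-2 binary factors.
W = [[1, 0],
     [1, 1],
     [0, 1]]
H = [[1, 1, 0],
     [0, 1, 1]]

X_bool = boolean_product(W, H)  # [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
X_z2 = z2_product(W, H)         # [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
```

The two results differ only where both latent features are active at once (middle row, middle column): under Boolean arithmetic 1 + 1 stays 1, while over ℤ2 it wraps to 0.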
In both academia and the pharmaceutical industry, large-scale assays for drug discovery are expensive and often impractical, particularly for the increasingly important physiologically relevant model systems that require primary cells, organoids, whole organisms, or expensive or rare reagents. We hypothesized that data from a single high-throughput imaging assay can be repurposed to predict the biological activity of compounds in other assays, even those targeting alternate pathways or biological processes. Indeed, quantitative information extracted from a three-channel microscopy-based screen for glucocorticoid receptor translocation was able to predict assay-specific biological activity in two ongoing drug discovery projects. In these projects, repurposing increased hit rates by 50- to 250-fold over that of the initial project assays while increasing the chemical structure diversity of the hits. Our results suggest that data from high-content screens are a rich source of information that can be used to predict and replace customized biological assays.
•Scalable machine-learning-based method predicting compound activity from images
•Hundreds of assays predicted by one image screen annotating half a million compounds
•Image-based models boosted hit rate and diversity in two drug discovery projects
•Proof of concept justifying further work on image-based learning for drug discovery
Simm et al. demonstrate a computational method to predict the activities of compounds in hundreds of biological assays from a single image-based screen of half a million compounds. The resulting models boosted the identification and diversity of hit compounds for two projects, encouraging further research in this field.
•A new regularizer is proposed based on a linear projection.
•Two iterative update procedures are developed for minimizing the new objective function.
•Various experiments verify the superiority of the proposed algorithm.
Nonnegative Matrix Factorization (NMF) produces interpretable solutions for many applications including collaborative filtering. Typically, regularization is needed to address issues such as overfitting and interpretability, especially for collaborative filtering where the rating matrices are sparse. However, the existing regularizers are typically constructed from the factorization results instead of the rating matrices. Intuitively, we regard these existing regularizers as representing either user factors or item factors and anticipate that a more holistic regularizer could improve the effectiveness of NMF. To this end, we propose a graph regularizer based on a linear projection of the rating matrix, and call the resulting method: Linear Projection and Graph Regularized Nonnegative Matrix Factorization (LPGNMF). We develop two iterative methods to minimize the cost function and derive two update rules named LPGNMF and F-LPGNMF. Additionally, we prove the value of the objective function decreases with LPGNMF and converges to a fixed point with F-LPGNMF. Finally, we test these methods against a number of NMF algorithms on different data sets and show both LPGNMF and F-LPGNMF always achieve smaller errors based on two different error measures.
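As background for the iterative update rules mentioned above, here is a minimal sketch of plain, unregularized NMF with the classical Lee-Seung multiplicative updates for the Frobenius loss. It is not LPGNMF or F-LPGNMF: the graph regularizer and the linear projection are omitted, zeros are treated as observed ratings rather than missing ones, and the rating matrix, the rank, and all function names are assumptions made for illustration.

```python
import random

def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_loss(X, W, H):
    """Squared Frobenius reconstruction error ||X - W H||_F^2."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf(X, rank, iters=200, eps=1e-9, seed=0):
    """Plain NMF via multiplicative updates; returns W, H and the loss history."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Strictly positive initialization keeps every factor entry nonnegative.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    losses = [frob_loss(X, W, H)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        losses.append(frob_loss(X, W, H))
    return W, H, losses

# Hypothetical 5-user x 4-item rating matrix.
X = [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [1, 0, 0, 4],
     [0, 1, 5, 4]]
W, H, losses = nmf(X, rank=2)
```

The loss history is non-increasing under these updates, which is the same kind of monotonicity property the abstract states for the LPGNMF rule.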
Unsupervised classification of hyperspectral remote sensing images, which assigns each pixel of the image to a land-cover class without any training samples, plays an important role in hyperspectral image processing but remains highly challenging because the observed data are complicated and high-dimensional. Although many advanced hyperspectral remote sensing image classification techniques based on supervised and semi-supervised learning have been proposed and confirmed effective in recent years, they require a certain number of high-quality training samples to learn a classifier, and thus cannot work in an unsupervised manner. In this work, we propose a hyperspectral image unsupervised classification framework based on robust manifold matrix factorization and its out-of-sample extension. To address the high feature dimensionality of the hyperspectral image, we propose a unified low-rank matrix factorization that jointly performs dimensionality reduction and data clustering, by which the clustering result can be exactly reproduced, a significant advantage over existing data clustering algorithms such as k-means and spectral clustering. In particular, in the proposed matrix factorization, the ℓ2,1-norm is used to measure the reconstruction loss, which helps to reduce the errors caused by possibly noisy observations. The widely used manifold regularization is also adopted to further improve the proposed model. Furthermore, we design a novel Augmented Lagrangian Method (ALM) based procedure to seek a locally optimal solution of the proposed optimization problem, and suggest an additional out-of-sample extension so that the method can handle large-scale hyperspectral remote sensing images. Experimental results on standard hyperspectral images show that the proposed method achieves competitive clustering accuracy and comparable running time relative to existing data clustering algorithms.
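The ℓ2,1-norm used for the reconstruction loss is commonly defined as the sum of the Euclidean norms of a matrix's rows; some papers sum over columns instead, and the abstract does not say which convention the authors use. The sketch below assumes the row convention and compares it with the Frobenius norm on a made-up residual matrix; the function names and the numbers are illustrative.

```python
import math

def l21_norm(E):
    """Sum of the Euclidean norms of the rows of E (row convention assumed)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in E)

def frobenius_norm(E):
    """Square root of the sum of all squared entries of E."""
    return math.sqrt(sum(v * v for row in E for v in row))

# Hypothetical residual matrix: one row per sample, the last row is an outlier.
E = [[3.0, 4.0],
     [0.0, 0.0],
     [6.0, 8.0]]

l21 = l21_norm(E)         # 5 + 0 + 10 = 15
fro = frobenius_norm(E)   # sqrt(125)
```

Each row contributes its norm linearly to the ℓ2,1 value, whereas a squared Frobenius loss would weight the outlier row by the square of its norm (100 versus 25 here), which is the usual argument for the robustness of ℓ2,1-type losses to noisy samples.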
Explainable recommendation has been shown to make a recommendation system more persuasive, enabling users to trust the system more and make better-informed decisions. Nonnegative Matrix Factorization (NMF) produces interpretable solutions for many applications, including collaborative filtering, owing to its nonnegativity. However, the latent features make it difficult to explain recommendation results to users, because neither the specific meaning of the features that users are interested in nor the extent to which items or users belong to these features is known. To overcome this difficulty, we develop a novel method called Partially Explainable Nonnegative Matrix Factorization (PE-NMF), which replaces part of the latent variables of the item-feature matrix with explicit data, so that users can learn more about the features of the items and then make ideal decisions and recommendations. The objective function of PE-NMF is composed of two parts: one corresponding to explicit features and the other to implicit features. We develop an iterative method to minimize the objective function and derive iterative update rules under which the objective function can be proved to be decreasing. Finally, experiments are conducted on the Yelp, Amazon and Dianping datasets, and the results demonstrate that PE-NMF maintains high prediction performance on both rating prediction and top-N recommendation compared with fully explainable nonnegative matrix factorization (FE-NMF), which is obtained by using explicit opinions instead of the item-feature matrix. PE-NMF also retains almost the same recommendation ability as NMF.