Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics, and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bounds for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite samples under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extents of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival. Supplementary materials for this article are available online.
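The algebra that makes recovery from a subset of observed rows and columns possible can be illustrated with the Schur-complement identity. Partition the matrix as A = [[A11, A12], [A21, A22]] with A22 missing. If A is exactly rank r and A11 also has rank r, then A22 = A21 A11^+ A12. The sketch below is only this noiseless identity with a fixed, assumed `rank` argument (the function name `schur_complete` is hypothetical). It is not the SMC estimator itself, which must handle approximately low-rank, noisy data.

```python
import numpy as np

def schur_complete(a11, a12, a21, rank):
    """Recover the missing block A22 of an exactly low-rank matrix.

    Uses A22 = A21 @ pinv_r(A11) @ A12, where pinv_r is the pseudo-inverse
    of A11 truncated to its top `rank` singular values (rank is assumed known).
    """
    u, s, vt = np.linalg.svd(a11, full_matrices=False)
    u, s, vt = u[:, :rank], s[:rank], vt[:rank]
    a11_pinv = vt.T @ np.diag(1.0 / s) @ u.T  # rank-truncated pseudo-inverse
    return a21 @ a11_pinv @ a12

# Build a rank-3 matrix and hide its bottom-right block.
rng = np.random.default_rng(0)
r, m1, m2, n1, n2 = 3, 20, 30, 15, 25
a = rng.standard_normal((m1 + m2, r)) @ rng.standard_normal((r, n1 + n2))
a11, a12 = a[:m1, :n1], a[:m1, n1:]
a21, a22 = a[m1:, :n1], a[m1:, n1:]

a22_hat = schur_complete(a11, a12, a21, rank=r)
```

Truncating the SVD, rather than calling a generic pseudo-inverse, avoids inverting the round-off-level singular values of A11, which would otherwise amplify numerical noise.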
Single-cell RNA sequencing has proved revolutionary for its potential to zoom into complex biological systems. Genome-wide expression analysis at single-cell resolution provides a window into the dynamics of cellular phenotypes. This facilitates the characterization of transcriptional heterogeneity in normal and diseased tissues under various conditions. It also sheds light on the development or emergence of specific cell populations and phenotypes. However, owing to the paucity of input RNA, a typical single-cell RNA sequencing dataset features a high number of dropout events, where transcripts fail to be amplified.
We introduce mcImpute, a low-rank matrix completion based technique to impute dropouts in single cell expression data. On a number of real datasets, application of mcImpute yields significant improvements in the separation of true zeros from dropouts, cell-clustering, differential expression analysis, cell type separability, the performance of dimensionality reduction techniques for cell visualization, and gene distribution.
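To make "low-rank matrix completion based imputation" concrete, here is a generic soft-thresholded SVD imputation (in the style of soft-impute), offered as an assumption for illustration. It is not presented as mcImpute's exact algorithm. The boolean `mask` marks trusted entries. For expression data one might take `mask = counts > 0`, treating every zero as a candidate dropout, but that choice is likewise an assumption. The function name and the `lam` and `n_iters` parameters are hypothetical.

```python
import numpy as np

def soft_impute(x, mask, lam, n_iters=200):
    """Fill entries where `mask` is False via iterative singular value soft-thresholding."""
    z = np.where(mask, x, 0.0)
    for _ in range(n_iters):
        # Keep observed entries; use the current low-rank estimate for the rest.
        filled = np.where(mask, x, z)
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)  # shrink singular values (nuclear-norm proximal step)
        z = (u * s) @ vt
    return z

# Synthetic rank-2 "expression" matrix with about 30% of entries hidden.
rng = np.random.default_rng(1)
truth = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 40))
mask = rng.random(truth.shape) > 0.3
est = soft_impute(truth, mask, lam=1.0)

missing = ~mask
rel_err = np.linalg.norm((est - truth)[missing]) / np.linalg.norm(truth[missing])
```

A larger `lam` gives a lower-rank, more heavily shrunk estimate. A smaller one fits the observed entries more closely.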
https://github.com/aanchalMongia/McImpute_scRNAseq.
Spectrum sharing enables radar and communication systems to share the spectrum efficiently by minimizing mutual interference. Recently proposed multiple-input multiple-output radars based on sparse sensing and matrix completion (MIMO-MC), in addition to reducing communication bandwidth and power as compared with MIMO radars, offer a significant advantage for spectrum sharing. The advantage stems from the way the sampling scheme at the radar receivers modulates the interference channel from the communication system transmitters, rendering it symbol-dependent and reducing its row space. This makes it easier for the communication system to design its waveforms adaptively so that it minimizes the interference to the radar while meeting rate and power constraints. Two methods are proposed. First, based on knowledge of the radar sampling scheme, the communication system's transmit covariance matrix is designed to minimize the effective interference power (EIP) at the radar receiver, while maintaining a given average capacity and transmit power for the communication system. Second, a joint design of the communication transmit covariance matrix and the MIMO-MC radar sampling scheme is proposed, which achieves even further EIP reduction.
Phasor measurement units (PMUs) provide high temporal-resolution synchrophasor measurements for power system monitoring and control. Frequent data quality issues, such as missing and bad data, prevent the incorporation of synchrophasor data in real-time operations. Most existing data-driven recovery methods assume the power system dynamics can be approximated by a linear dynamical system, and their recovery performance degrades significantly when the power system experiences nonlinear dynamics during significant events. This paper proposes a data-driven Bayesian nonlinear synchrophasor data recovery method (Ba-NSDR) that can recover a consecutive time period of simultaneous data losses or errors across all channels, even when the underlying system is highly nonlinear. The idea is to lift the Hankel matrix of the spatial-temporal synchrophasor data to a higher dimension such that the lifted Hankel matrix is low-rank in that space and can be processed with the kernel trick. Our proposed Bayesian method then infers the probability distributions of the synchrophasor data from the partial observations. Distinctive features of Ba-NSDR include an uncertainty index that measures the accuracy of the recovery result, and robustness to parameter selection. Our method is verified on both synthetic and recorded event datasets.
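The linear-dynamics premise that Ba-NSDR moves beyond can be checked numerically. The following sketch (the helper name `block_hankel` and the synthetic signals are assumptions, not PMU data) stacks time-shifted windows of a multichannel series into a block Hankel matrix. When every channel is a combination of the same two sinusoidal modes, as a linear system would produce, the Hankel matrix has rank 4 (two real dimensions per mode). The kernel lifting used for the nonlinear case is not shown.

```python
import numpy as np

def block_hankel(series, n_rows):
    """Stack sliding windows of a (channels x time) series into a block Hankel matrix.

    Block row i holds series[:, i : i + n_cols], so entry ((i, c), j) equals x_c(i + j).
    """
    n_ch, t = series.shape
    n_cols = t - n_rows + 1
    return np.vstack([series[:, i:i + n_cols] for i in range(n_rows)])

# Three channels, each a mix of the same two oscillatory modes (0.1 and 0.37 rad/sample).
rng = np.random.default_rng(2)
t = np.arange(200)
amp = rng.uniform(0.5, 1.5, size=(3, 2))
phase = rng.uniform(0.0, 2 * np.pi, size=(3, 2))
series = (amp[:, [0]] * np.cos(0.1 * t + phase[:, [0]])
          + amp[:, [1]] * np.cos(0.37 * t + phase[:, [1]]))

h = block_hankel(series, n_rows=10)
s = np.linalg.svd(h, compute_uv=False)
numerical_rank = int(np.sum(s > 1e-8 * s[0]))
```

It is this low rank that lets missing entries be filled in. Under strongly nonlinear dynamics the rank grows, which motivates lifting to a space where it is low again.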
Low-rank matrices play a fundamental role in modeling and computational methods for signal processing and machine learning. In many applications where low-rank matrices arise, these matrices cannot be fully sampled or directly observed, and one encounters the problem of recovering the matrix given only incomplete and indirect observations. This paper surveys modern techniques for exploiting low-rank structure to perform matrix recovery in these settings, covering recent advances in this rapidly developing field. Specific attention is paid to the algorithms most commonly used in practice, the existing theoretical guarantees for these algorithms, and representative practical applications of these techniques.
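One widely used practical family of matrix recovery algorithms factors the unknown matrix as U Vᵀ and alternates ridge-regularized least-squares solves for U and V over the observed entries only. The sketch below is a minimal version of that alternating least squares (ALS) idea, assuming the target rank is known. The function name and the `reg`, `n_iters`, and `seed` parameters are hypothetical choices for illustration.

```python
import numpy as np

def als_complete(x, mask, rank, reg=1e-3, n_iters=50, seed=0):
    """Matrix completion by alternating least squares on observed entries (mask True)."""
    m, n = x.shape
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((m, rank))
    v = rng.standard_normal((n, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(m):  # update row factor i from the columns observed in row i
            vi = v[mask[i]]
            u[i] = np.linalg.solve(vi.T @ vi + ridge, vi.T @ x[i, mask[i]])
        for j in range(n):  # update column factor j from the rows observed in column j
            uj = u[mask[:, j]]
            v[j] = np.linalg.solve(uj.T @ uj + ridge, uj.T @ x[mask[:, j], j])
    return u @ v.T

rng = np.random.default_rng(4)
truth = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
mask = rng.random(truth.shape) < 0.6
est = als_complete(truth, mask, rank=2)

missing = ~mask
rel_err = np.linalg.norm((est - truth)[missing]) / np.linalg.norm(truth[missing])
```

Each subproblem is a small r × r linear solve, which is why factorization methods scale to large matrices. The trade-off is a nonconvex objective whose success depends on initialization and sampling.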
Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candès and collaborators. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has "a little bit of structure." Surprisingly, this simple estimator achieves the minimax error rate up to a constant factor. The method is applied to solve problems related to low rank matrix estimation, blockmodels, distance matrix completion, latent space models, positive definite matrix completion, graphon estimation and generalized Bradley-Terry models for pairwise comparison.
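A minimal sketch of the USVT recipe, as we understand it, assuming entries lie in [-1, 1]. Unobserved entries are set to zero. Singular values of the result below (2 + η)·√(n·p̂) are discarded, where n is the larger dimension and p̂ is the observed fraction. The kept part is rescaled by 1/p̂ and clipped back to [-1, 1]. The default value of `eta` is an assumption.

```python
import numpy as np

def usvt(y, observed, eta=0.01):
    """Universal Singular Value Thresholding (sketch) for a matrix with entries in [-1, 1]."""
    m, n = y.shape
    p_hat = observed.mean()                       # estimated sampling fraction
    x = np.where(observed, y, 0.0)                # unobserved entries set to zero
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    keep = s >= (2 + eta) * np.sqrt(max(m, n) * p_hat)
    w = (u[:, keep] * s[keep]) @ vt[keep] / p_hat  # rescale to undo subsampling
    return np.clip(w, -1.0, 1.0)

# Rank-1 truth in [-1, 1], bounded noise, about half of the entries observed.
rng = np.random.default_rng(3)
m, n = 300, 400
truth = np.outer(rng.uniform(-1, 1, m), rng.uniform(-1, 1, n))
noisy = np.clip(truth + rng.uniform(-0.1, 0.1, (m, n)), -1.0, 1.0)
observed = rng.random((m, n)) < 0.5

est = usvt(noisy, observed)
mse = np.mean((est - truth) ** 2)
```

The threshold sits just above the spectral norm expected from pure noise, so no rank needs to be specified in advance. This is the "universal" aspect of the procedure.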
The central aims of many host or environmental microbiome studies are to elucidate factors associated with microbial community compositions and to relate microbial features to outcomes. However, these aims are often complicated by the high dimensionality, non-normality, sparsity, and compositional nature of microbiome data sets. A key tool in microbiome analysis is beta diversity, defined by the distances between microbial samples. Many different distance metrics have been proposed, each with varying discriminatory power on data with differing characteristics. Here, we propose a compositional beta diversity metric rooted in a centered log-ratio transformation and matrix completion, called robust Aitchison PCA. We demonstrate the benefits of compositional transformations upstream of beta diversity calculations through simulations. Additionally, we demonstrate improved effect size, classification accuracy, and robustness to sequencing depth over current methods on reduced-size sample subsets of several real microbiome data sets. Finally, we highlight the ability of this new beta diversity metric to retain the feature loadings linked to sample ordinations, revealing salient intercommunity niche feature importance.
By accounting for the sparse compositional nature of microbiome data sets, robust Aitchison PCA can yield high discriminatory power and salient feature ranking between microbial niches. The software to perform this analysis is available under an open-source license and can be obtained at https://github.com/biocore/DEICODE; additionally, a QIIME 2 plugin is provided to perform this analysis at https://library.qiime2.org/plugins/deicode/.
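The transformation step can be sketched as a "robust" centered log-ratio (rclr). The clr is computed from each sample's nonzero counts only, and zeros are left as missing values for the subsequent matrix completion. This is an illustrative reimplementation under that assumption, not the DEICODE API. The function name `rclr` here is our own, and the completion and distance steps are not shown.

```python
import numpy as np

def rclr(counts):
    """Robust centered log-ratio: clr over nonzero entries only; zeros become NaN (missing)."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        logs = np.where(counts > 0, np.log(counts), np.nan)
    # Center each sample (row) by the mean log of its observed (nonzero) features.
    # A sample with no nonzero counts would yield an all-NaN row.
    return logs - np.nanmean(logs, axis=1, keepdims=True)

# Two samples (rows) by three features (columns).
out = rclr([[1, 0, 4],
            [2, 2, 0]])
```

For the first sample the nonzero logs are 0 and log 4, with mean log 2, so the transformed values are -log 2 and log 2. The NaN entries are what a low-rank completion would then fill in before distances between samples are computed.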