This paper studies the problem of data-adaptive representations for big, distributed data. It is assumed that a number of geographically distributed, interconnected sites have massive local data and they are interested in collaboratively learning a low-dimensional geometric structure underlying these data. In contrast with previous works on subspace-based data representations, this paper focuses on the geometric structure of a union of subspaces (UoS). In this regard, it proposes a distributed algorithm, termed cloud K-SVD, for collaborative learning of a UoS structure underlying distributed data of interest. The goal of cloud K-SVD is to learn a common overcomplete dictionary at each individual site such that every sample in the distributed data can be represented through a small number of atoms of the learned dictionary. Cloud K-SVD accomplishes this goal without requiring the exchange of individual samples between sites. This makes it suitable for applications where sharing of raw data is discouraged due to either privacy concerns or large data volumes. This paper also provides an analysis of cloud K-SVD that gives insights into its properties, as well as into the deviations of the dictionaries learned at individual sites from a centralized solution, in terms of different measures of local/global data and the topology of interconnections. Finally, the paper numerically illustrates the efficacy of cloud K-SVD on real and synthetic distributed data.
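The sparse-coding step that cloud K-SVD (like centralized K-SVD) relies on, representing each sample with a few dictionary atoms, can be sketched with a generic orthogonal matching pursuit routine. This is an illustrative sketch, not the paper's implementation; the dictionary and data below are synthetic.

```python
import numpy as np

def omp(D, y, k):
    """Greedy sparse coding: select k atoms of dictionary D to approximate y."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)  # refit on support
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)        # unit-norm atoms
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]         # 2-sparse code
y = D @ x_true
x_hat = omp(D, y, k=2)                # sparse representation of y
```

In K-SVD-style algorithms, this coding step alternates with a dictionary-update step; cloud K-SVD performs the latter collaboratively across sites.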
High-rate data communication over a multipath wireless channel often requires that the channel response be known at the receiver. Training-based methods, which probe the channel in time, frequency, and space with known signals and reconstruct the channel response from the output signals, are most commonly used to accomplish this task. Traditional training-based channel estimation methods, typically comprising linear reconstruction techniques, are known to be optimal for rich multipath channels. However, physical arguments and growing experimental evidence suggest that many wireless channels encountered in practice tend to exhibit a sparse multipath structure that gets more pronounced as the signal space dimension grows large (e.g., due to large bandwidth or a large number of antennas). In this paper, we formalize the notion of multipath sparsity and present a new approach to estimating sparse (or effectively sparse) multipath channels that is based on some of the recent advances in the theory of compressed sensing. In particular, it is shown in the paper that the proposed approach, termed compressed channel sensing (CCS), can potentially achieve a target reconstruction error using far less energy and, in many instances, latency and bandwidth than those dictated by traditional least-squares-based training methods.
Distributed machine learning algorithms enable learning of models from datasets that are distributed over a network without gathering the data at a centralized location. While efficient distributed algorithms have been developed under the assumption of faultless networks, failures that can render these algorithms nonfunctional occur frequently in the real world. This paper focuses on the problem of Byzantine failures, which are the hardest to safeguard against in distributed algorithms. While Byzantine fault tolerance has a rich history, existing work does not translate into efficient and practical algorithms for high-dimensional learning in fully distributed (also known as decentralized) settings. In this paper, an algorithm termed Byzantine-resilient distributed coordinate descent is developed and analyzed that enables distributed learning in the presence of Byzantine failures. Theoretical analysis (convex settings) and numerical experiments (convex and nonconvex settings) highlight its usefulness for high-dimensional distributed learning in the presence of Byzantine failures.
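The paper's algorithm is not reproduced here, but a common building block for Byzantine resilience, a coordinate-wise trimmed mean that screens out extreme values before averaging node updates, can be sketched as follows; the node counts and fault model below are illustrative assumptions.

```python
import numpy as np

def trimmed_mean(updates, b):
    """Coordinate-wise trimmed mean over node updates (rows): drop the b
    smallest and b largest values in each coordinate, then average."""
    sorted_updates = np.sort(updates, axis=0)
    return sorted_updates[b:updates.shape[0] - b].mean(axis=0)

honest = np.ones((8, 3))            # 8 honest nodes each report update [1, 1, 1]
byzantine = np.full((2, 3), 1e6)    # 2 Byzantine nodes report huge values
updates = np.vstack([honest, byzantine])
agg = trimmed_mean(updates, b=2)    # robust aggregate, close to [1, 1, 1]
```

A plain average would be dominated by the faulty values; trimming with b at least the number of Byzantine nodes discards them in every coordinate.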
Compressed sensing (CS) has recently emerged as a powerful signal acquisition paradigm. In essence, CS enables the recovery of high-dimensional sparse signals from relatively few linear observations in the form of projections onto a collection of test vectors. Existing results show that if the entries of the test vectors are independent realizations of certain zero-mean random variables, then with high probability the unknown signals can be recovered by solving a tractable convex optimization program. This work extends CS theory to settings where the entries of the test vectors exhibit structured statistical dependencies. It follows that CS can be effectively utilized in linear, time-invariant system identification problems provided the impulse response of the system is (approximately or exactly) sparse. An immediate application is in wireless multipath channel estimation. It is shown here that time-domain probing of a multipath channel with a random binary sequence, along with the use of CS reconstruction techniques, can provide significant improvements in estimation accuracy compared to traditional least-squares-based linear channel estimation strategies. Abstract extensions of the main results are also discussed, in which the theory of equitable graph coloring is employed to establish the utility of CS in settings where the test vectors exhibit more general statistical dependencies.
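As an illustration of the CS reconstruction step, a minimal iterative soft-thresholding (ISTA) solver for the l1-regularized least-squares problem is sketched below; the sensing matrix, sparsity level, and regularization weight are illustrative assumptions, not values from the paper.

```python
import numpy as np

def ista(A, y, lam=0.05, steps=1000):
    """Iterative soft-thresholding for min_x 0.5*||A x - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        z = x - A.T @ (A @ x - y) / L      # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100)) / np.sqrt(40)   # random sensing matrix
x_true = np.zeros(100)
x_true[[5, 42, 77]] = [2.0, -1.5, 1.0]             # 3-sparse signal
y = A @ x_true                                     # 40 noiseless measurements
x_hat = ista(A, y)                                 # recovered sparse vector
```

With far fewer measurements (40) than unknowns (100), the l1 penalty still steers the solver toward the sparse ground truth.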
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in machine learning. While PCA is often thought of as a dimensionality reduction method, its purpose is actually two-fold: dimension reduction and uncorrelated feature learning. Furthermore, the enormity of the dimensions and sample sizes of modern datasets has rendered centralized PCA solutions unusable. In that vein, this paper reconsiders the problem of PCA when data samples are distributed across nodes in an arbitrarily connected network. While a few solutions for distributed PCA exist, they either overlook the uncorrelated feature learning aspect of PCA, tend to have high communication overhead that makes them inefficient, and/or lack 'exact' or 'global' convergence guarantees. To overcome these issues, this paper proposes a distributed PCA algorithm termed FAST-PCA (Fast and exAct diSTributed PCA). The proposed algorithm is efficient in terms of communication and is proven to converge linearly and exactly to the principal components, leading to dimension reduction as well as uncorrelated features. The claims are further supported by experimental results.
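The two-fold purpose noted above can be seen in a small centralized sketch: projecting centered data onto the top eigenvectors of the sample covariance both reduces dimension and produces uncorrelated features. This is plain PCA on synthetic data, not the FAST-PCA algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 5))  # correlated features
Xc = X - X.mean(axis=0)             # center the data
C = Xc.T @ Xc / (len(Xc) - 1)       # sample covariance
eigvals, U = np.linalg.eigh(C)      # eigenvalues returned in ascending order
U = U[:, ::-1][:, :2]               # top-2 principal directions
Z = Xc @ U                          # 2-D projections of the samples
Cz = Z.T @ Z / (len(Z) - 1)         # covariance of the new features
# Cz is (numerically) diagonal: the learned features are uncorrelated
```

The distributed setting replaces this single eigendecomposition with iterative exchanges over the network; the fixed point targeted is the same.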
This letter proposes an algorithm for clustering of two-dimensional data. Instead of "flattening" data into vectors, the proposed algorithm keeps samples as matrices and stores them as lateral slices in a third-order tensor. It is then assumed that the samples lie near a union of free submodules, and their representations under this model are obtained by imposing a low tensor-rank constraint and a structural constraint on the representation tensor. Clustering is carried out using an affinity matrix calculated from the representation tensor. Effectiveness of the proposed algorithm and its superiority over existing methods are demonstrated through experiments on two image datasets.
Detection Theory for Union of Subspaces
Lodhi, Muhammad Asad; Bajwa, Waheed U.
IEEE Transactions on Signal Processing, vol. 66, no. 24, 12/2018
The focus of this paper is on detection theory for union of subspaces (UoS). To this end, generalized likelihood ratio tests (GLRTs) are presented for detection of signals conforming to the UoS model and detection of the corresponding "active" subspace. One of the main contributions of this paper is bounds on the performance of these GLRTs in terms of the geometry of subspaces under various assumptions on the observation noise. The insights obtained through geometrical interpretation of the GLRTs are also validated through extensive numerical experiments on both synthetic and real-world data.
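A minimal sketch of a GLRT-style statistic for a union of known subspaces, under the illustrative assumptions of white Gaussian noise and orthonormal bases: project the observation onto each candidate subspace, use the captured energy as the statistic, and declare the maximizer the active subspace. This is a generic construction, not the paper's exact tests.

```python
import numpy as np

def uos_glrt(y, subspaces):
    """Energy of y captured by each candidate subspace (orthonormal basis
    columns); the max is the union statistic, its argmax the active subspace."""
    stats = [np.linalg.norm(U.T @ y) ** 2 for U in subspaces]
    return max(stats), int(np.argmax(stats))

rng = np.random.default_rng(3)
# Two random 2-dimensional subspaces of R^10, orthonormal bases via QR
subspaces = [np.linalg.qr(rng.standard_normal((10, 2)))[0] for _ in range(2)]
signal = subspaces[1] @ np.array([3.0, -2.0])   # signal lies in subspace 1
y = signal + 0.01 * rng.standard_normal(10)     # lightly noisy observation
stat, active = uos_glrt(y, subspaces)           # active should be 1
```

How well such tests separate the candidates depends on the principal angles between the subspaces, which is the geometric quantity the paper's bounds are phrased in.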
Emerging applications of machine learning in numerous areas, including online social networks, remote sensing, Internet-of-Things (IoT) systems, smart grids, and more, involve continuous gathering of and learning from streams of data samples. Real-time incorporation of streaming data into the learned machine learning models is essential for improved inference in these applications. Furthermore, these applications often involve data that are either inherently gathered at geographically distributed entities due to physical reasons, for example, IoT systems and smart grids, or that are intentionally distributed across multiple computing machines for memory, storage, computational, and/or privacy reasons. Training of machine learning models in this distributed, streaming setting requires solving stochastic optimization (SO) problems in a collaborative manner over communication links between the physical entities. When the streaming data rate is high compared with the processing capabilities of individual computing entities and/or the rate of the communications links, this poses a challenging question: How can one best leverage the incoming data for distributed training of machine learning models under constraints on computing capabilities and/or communications rate? A large body of research in distributed online optimization has emerged in recent decades to tackle this and related problems. This article reviews recently developed methods that focus on large-scale distributed SO in the compute- and bandwidth-limited regimes, with an emphasis on convergence analysis that explicitly accounts for the mismatch between computation, communication, and streaming rates and provides sufficient conditions for order-optimum convergence. In particular, it focuses on methods that solve: 1) distributed stochastic convex problems and 2) distributed principal component analysis, which is a nonconvex problem with a geometric structure that permits global convergence.
For such methods, this article discusses recent advances in terms of distributed algorithmic designs when faced with high-rate streaming data. Furthermore, it reviews theoretical guarantees underlying these methods, which show that there exist regimes in which systems can learn from distributed processing of streaming data at order-optimal rates, nearly as fast as if all the data were processed at a single superpowerful machine.
In a typical multiple-input and multiple-output (MIMO) radar scenario, the receive nodes transmit to a fusion center either samples of the target returns or the results of matched filtering with the transmit waveforms. Based on the data it receives from multiple antennas, the fusion center formulates a matrix, referred to as the data matrix, which, via standard array processing schemes, leads to target detection and parameter estimation. In this paper, it is shown that under certain conditions the data matrix is low rank and thus can be recovered, based on knowledge of a small subset of its entries, via matrix completion (MC) techniques. Leveraging the low-rank property of the data matrix, we propose a new MIMO radar approach, termed MIMO-MC radar, in which each receive node either performs matched filtering with a small number of randomly selected dictionary waveforms or obtains sub-Nyquist samples of the target returns at random sampling instants, and forwards the results to a fusion center. Based on the received samples, and with knowledge of the sampling scheme, the fusion center partially fills the data matrix and subsequently applies MC techniques to estimate the full matrix. MIMO-MC radars share the advantages of MIMO radars with compressive sensing (MIMO-CS), i.e., high resolution with reduced amounts of data, but unlike MIMO-CS radars they do not require grid discretization. The MIMO-MC radar concept is illustrated through a uniform linear array configuration, and its target estimation performance is demonstrated via simulations.
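The matrix completion step can be illustrated with a generic singular value thresholding (SVT) iteration on a synthetic low-rank matrix; the matrix size, sampling rate, and parameters below are illustrative choices, not the paper's radar setup.

```python
import numpy as np

def svt_complete(M_obs, mask, tau=100.0, delta=1.2, steps=300):
    """Singular value thresholding: fill in a low-rank matrix from the
    entries where mask == 1 (M_obs is zero on unobserved entries)."""
    Y = np.zeros_like(M_obs)
    X = np.zeros_like(M_obs)
    for _ in range(steps):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values
        Y += delta * mask * (M_obs - X)           # correct on observed entries only
    return X

rng = np.random.default_rng(4)
M = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 20))  # rank-2 matrix
mask = (rng.random(M.shape) < 0.6).astype(float)  # observe ~60% of the entries
X_hat = svt_complete(mask * M, mask)              # estimate of the full matrix
```

In the MIMO-MC setting, the observed entries correspond to the randomly selected matched-filter outputs or sub-Nyquist samples forwarded to the fusion center.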
This work addresses the problem of learning sparse representations of tensor data using structured dictionary learning. It proposes learning a mixture of separable dictionaries to better capture the structure of tensor data by generalizing the separable dictionary learning model. Two different approaches for learning a mixture of separable dictionaries are explored, and sufficient conditions for local identifiability of the underlying dictionary are derived in each case. Moreover, computational algorithms are developed to solve the problem of learning a mixture of separable dictionaries in both batch and online settings. Numerical experiments are used to show the usefulness of the proposed model and the efficacy of the developed algorithms.
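The separable model that this work generalizes can be stated concretely: a matrix-valued sample Y synthesized as A S B^T is equivalent, after column-major vectorization, to applying the Kronecker-structured dictionary B ⊗ A to vec(S). A quick numerical check of this standard identity (the dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 10))   # sub-dictionary for the column mode
B = rng.standard_normal((6, 7))    # sub-dictionary for the row mode
S = rng.standard_normal((10, 7))   # coefficient matrix (dense here for illustration)
Y = A @ S @ B.T                    # separable synthesis of a matrix-valued sample
# Flattened view: vec(Y) = (B kron A) vec(S), with column-major (Fortran) vec
lhs = Y.flatten(order="F")
rhs = np.kron(B, A) @ S.flatten(order="F")
```

A mixture of separable dictionaries replaces the single Kronecker product with a sum of such terms, trading some structure for better approximation of real tensor data.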