Petabyte-scale distributed storage systems are currently transitioning to erasure codes to achieve higher storage efficiency. Classical codes, such as Reed-Solomon (RS), are highly sub-optimal for ...distributed environments due to their high overhead during single-failure events. Locally repairable codes (LRCs) form a new family of codes that are repair efficient. In particular, LRCs minimize the number of nodes participating in single node repairs. Fundamental bounds and methods for explicitly constructing LRCs suitable for deployment in distributed storage clusters are not fully understood and currently form an active area of research. In this paper, we present an explicit LRC that is simple to construct and is optimal for a specific set of coding parameters. Our construction is based on grouping RS symbols and then adding extra simple parities that allow for small repair locality. For the analysis of the optimality of the code, we derive a new result on the matroid represented by the code's generator matrix.
Locality and Availability in Distributed Storage Rawat, Ankit Singh; Papailiopoulos, Dimitris S.; Dimakis, Alexandros G. ...
IEEE transactions on information theory,
2016-Aug., 2016-8-00, 20160801, Letnik:
62, Številka:
8
Journal Article
Recenzirano
Odprti dostop
This paper studies the problem of information symbol availability in codes: we refer to a systematic code as code with (r, t)-availability if every information (systematic) symbol can be ...reconstructed from t disjoint groups of other code symbols, each of the sizes at most r. This paper shows that it is possible to construct codes that can support a scaling number of parallel reads while keeping the rate to be an arbitrarily high constant. It further shows that this is possible with the minimum Hamming distance arbitrarily close to the Singleton bound. This paper also presents a bound demonstrating a tradeoff between rate, minimum Hamming distance, and availability parameters. Our codes match the aforementioned bound, and their constructions rely on certain combinatorial structures. Resolvable designs provide one way to realize these required combinatorial structures. The two constructions presented in this paper require field sizes, which are linear and exponential in the code length, respectively. From a practical standpoint, our codes are relevant for distributed storage applications involving hot data, i.e., the information, which is frequently accessed by multiple processes in parallel.
We show that the maximization of the sum degrees-of-freedom for the static flat-fading multiple-input multiple-output (MIMO) interference channel (IC) is equivalent to a rank constrained rank ...minimization problem (RCRM), when the signal subspaces span all available dimensions. The rank minimization corresponds to maximizing interference alignment (IA) so that interference spans the lowest dimensional subspace possible. The rank constraints account for the useful signal subspaces spanning all available spatial dimensions. That way, we reformulate the IA requirements to requirements involving ranks. Then, we present a convex relaxation of the RCRM problem inspired by recent results in compressed sensing and low-rank matrix completion theory that rely on approximating rank with the nuclear norm. We show that the convex envelope of the sum of ranks of the interference matrices is the normalized sum of their corresponding nuclear norms and replace the rank constraints with asymptotically equivalent and tractable ones. We then tune our heuristic relaxation for the multicell interference channel. We experimentally show that in many cases the proposed algorithm attains perfect interference alignment and in some cases outperforms previous approaches for finding precoding and receive matrices for interference alignment.
A Repair Framework for Scalar MDS Codes Shanmugam, Karthikeyan; Papailiopoulos, Dimitris S.; Dimakis, Alexandros G. ...
IEEE journal on selected areas in communications,
05/2014, Letnik:
32, Številka:
5
Journal Article
Recenzirano
Several works have developed vector-linear maximum-distance separable (MDS) storage codes that minimize the total communication cost required to repair a single coded symbol after an erasure, ...referred to as repair bandwidth (BW). Vector codes allow communicating fewer sub-symbols per node, instead of the entire content. This allows non trivial savings in repair BW. In sharp contrast, classic codes, like Reed-Solomon (RS), used in current storage systems, are deemed to suffer from naive repair, i.e. downloading the entire stored message to repair one failed node. This mainly happens because they are scalar-linear. In this work, we present a simple framework that treats scalar codes as vector-linear. In some cases, this allows significant savings in repair BW. We show that vectorized scalar codes exhibit properties that simplify the design of repair schemes. Our framework can be seen as a finite field analogue of real interference alignment. Using our simplified framework, we design a scheme that we call clique-repair which provably identifies the best linear repair strategy for any scalar 2-parity MDS code, under some conditions on the sub-field chosen for vectorization. We specify optimal repair schemes for specific (5,3)- and (6,4)-Reed-Solomon (RS) codes. Further, we present a repair strategy for the RS code currently deployed in the Facebook Analytics Hadoop cluster that leads to 20% of repair BW savings over naive repair which is the repair scheme currently used for this code.
Repair Optimal Erasure Codes Through Hadamard Designs Papailiopoulos, DimitrisS; Dimakis, Alexandros G.; Cadambe, Viveck R.
IEEE transactions on information theory,
05/2013, Letnik:
59, Številka:
5
Journal Article
Recenzirano
In distributed storage systems that employ erasure coding, the issue of minimizing the total communication required to exactly rebuild a storage node after a failure arises. This repair bandwidth ...depends on the structure of the storage code and the repair strategies used to restore the lost data. Designing high-rate maximum-distance separable (MDS) codes that achieve the optimum repair communication has been a well-known open problem. Our work resolves, in part, this open problem. In this study, we use Hadamard matrices to construct the first explicit two-parity MDS storage code with optimal repair properties for all single node failures, including the parities. Our construction relies on a novel method of achieving perfect interference alignment over finite fields with a finite number of symbol extensions. We generalize this construction to design m -parity MDS codes that achieve the optimum repair communication for single systematic node failures.
Codes are widely used in many engineering applications to offer robustness against noise . In large-scale systems, there are several types of noise that can affect the performance of distributed ...machine learning algorithms-straggler nodes, system failures, or communication bottlenecks-but there has been little interaction cutting across codes, machine learning, and distributed systems. In this paper, we provide theoretical insights on how coded solutions can achieve significant gains compared with uncoded ones. We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling . For matrix multiplication, we use codes to alleviate the effect of stragglers and show that if the number of homogeneous workers is <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula>, and the runtime of each subtask has an exponential tail, coded computation can speed up distributed matrix multiplication by a factor of <inline-formula> <tex-math notation="LaTeX">\log n </tex-math></inline-formula>. For data shuffling, we use codes to reduce communication bottlenecks, exploiting the excess in storage. We show that when a constant fraction <inline-formula> <tex-math notation="LaTeX">\alpha </tex-math></inline-formula> of the data matrix can be cached at each worker, and <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> is the number of workers, coded shuffling reduces the communication cost by a factor of <inline-formula> <tex-math notation="LaTeX">\left({\alpha + \frac {1}{n}}\right)\gamma (n) </tex-math></inline-formula> compared with uncoded shuffling, where <inline-formula> <tex-math notation="LaTeX">\gamma (n) </tex-math></inline-formula> is the ratio of the cost of unicasting <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> messages to <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> users to multicasting a common message (of the same size) to <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> users. For instance, <inline-formula> <tex-math notation="LaTeX">\gamma (n) \simeq n </tex-math></inline-formula> if multicasting a message to <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> users is as cheap as unicasting a message to one user. We also provide experimental results, corroborating our theoretical gains of the coded algorithms.
We prove that maximum-likelihood (ML) noncoherent sequence detection of orthogonal space-time block coded signals can always be performed in polynomial time with respect to the sequence length for ...static Rayleigh, correlated (in general) channels. Moreover, using recent results on efficient maximization of rank-deficient quadratic forms over finite alphabets, we develop an algorithm that performs ML noncoherent sequence detection with polynomial complexity. The order of the polynomial complexity of the proposed receiver equals two times the rank of the covariance matrix of the vectorized channel matrix. Therefore, the lower the Rayleigh channel covariance rank the lower the receiver complexity. Instead, for Ricean channel distribution, we prove that polynomial complexity is attained through the proposed receiver as long as the mean channel vector is in the range of the covariance matrix of the vectorized channel matrix. Hence, full-rank channel correlation is desired to guarantee polynomial ML noncoherent detection complexity for the case of static Ricean fading. Our results are presented for the general case of block-fading Rayleigh or Ricean channels where we provide conditions under which ML noncoherent sequence detection can be performed in polynomial time through our algorithm.
The computation of the sparse principal component of a matrix is equivalent to the identification of its principal submatrix with the largest maximum eigenvalue. Finding this optimal submatrix is ...what renders the problem NP-hard. In this paper, we prove that, if the matrix is positive semidefinite and its rank is constant, then its sparse principal component is polynomially computable. Our proof utilizes the auxiliary unit vector technique that has been recently developed to identify problems that are polynomially solvable. In addition, we use this technique to design an algorithm which, for any sparsity value, computes the sparse principal component with complexity O(N D+1 ), where N and D are the matrix size and rank, respectively. Our algorithm is fully parallelizable and memory efficient.
Maximum-Likelihood Noncoherent PAM Detection Papailiopoulos, D. S.; Elkheir, G. A.; Karystinos, G. N.
IEEE transactions on communications,
03/2013, Letnik:
61, Številka:
3
Journal Article
Recenzirano
Sequence detection offers improved error-rate performance over conventional symbol-by-symbol detection when channel knowledge is not available at the receiver end. However, maximum-likelihood (ML) ...noncoherent sequence detection is proven to be notoriously intractable in many communication settings. In this work, we develop a new ML sequence detector for pulse-amplitude modulation (PAM) or quadrature-amplitude modulation (QAM) transmissions in unknown Rayleigh fading. Our detector identifies the ML sequence with overall polynomial complexity. This is possible via an auxiliary-angle approach that unlocks a low-rank property of the ML detection problem, reduces the exponential-size set of solution sequences to a polynomial-size set of candidates, and guarantees that the ML sequence is always contained in this substantially smaller set.
Locally Repairable Codes Papailiopoulos, Dimitris S.; Dimakis, Alexandros G.
IEEE transactions on information theory,
2014-Oct., 2014-10-00, 20141001, Letnik:
60, Številka:
10
Journal Article
Recenzirano
Distributed storage systems for large-scale applications typically use replication for reliability. Recently, erasure codes were used to reduce the large storage overhead, while increasing data ...reliability. A main limitation of off-the-shelf erasure codes is their high-repair cost during single node failure events. A major open problem in this area has been the design of codes that: 1) are repair efficient and 2) achieve arbitrarily high data rates. In this paper, we explore the repair metric of locality, which corresponds to the number of disk accesses required during a single node repair. Under this metric, we characterize an information theoretic tradeoff that binds together the locality, code distance, and storage capacity of each node. We show the existence of optimal locally repairable codes (LRCs) that achieve this tradeoff. The achievability proof uses a locality aware flow-graph gadget, which leads to a randomized code construction. Finally, we present an optimal and explicit LRC that achieves arbitrarily high data rates. Our locality optimal construction is based on simple combinations of Reed-Solomon blocks.