We study the problem of making a distributed storage system information-theoretically secure against a passive eavesdropper, and aim to characterize coding schemes that are universally secure for up ...to a given number of eavesdropped nodes. Specifically, we consider minimum storage regenerating (MSR) codes and ask the following question: For an MSR code where a failed node is repaired using all the remaining nodes, is it possible to simultaneously be optimally secure using a single linear coding scheme? We define a pareto-optimality associated with this simultaneity and show that there exists at least one linear coding scheme that is pareto-optimal.
We show that the network coding and index coding problems are equivalent. This equivalence holds in the general setting which includes linear and nonlinear codes. Specifically, we present a reduction ...that maps a network coding instance to an index coding instance while preserving feasibility, i.e., the network coding instance has a feasible solution if and only if the corresponding index coding instance is feasible. In addition, we show that one can determine the capacity region of a given network coding instance with colocated sources by studying the capacity region of a corresponding index coding instance. Previous connections between network and index coding were restricted to the linear case.
One-Shot PIR: Refinement and Lifting D'Oliveira, Rafael G. L.; El Rouayheb, Salim
IEEE transactions on information theory,
2020-April, 2020-4-00, Volume:
66, Issue:
4
Journal Article
Peer reviewed
Open access
We study a class of private information retrieval (PIR) methods that we call one-shot schemes. The intuition behind one-shot schemes is the following. The user's query is regarded as a dot product of ...a query vector and the message vector (database) stored at multiple servers. Privacy, in an information theoretic sense, is then achieved by encrypting the query vector using a secure linear code, such as secret sharing. Several PIR schemes in the literature, in addition to novel ones constructed here, fall into this class. One-shot schemes provide an insightful link between PIR and data security against eavesdropping. However, their download rate is not optimal, i.e., they do not achieve the PIR capacity. Our main contribution is two transformations of one-shot schemes, which we call refining and lifting. We show that refining and lifting one-shot schemes gives capacity-achieving schemes for the cases when the PIR capacity is known. In the other cases, when the PIR capacity is still unknown, refining and lifting one-shot schemes gives, for most parameters, the best download rate so far.
The index coding problem has recently attracted a significant attention from the research community due to its theoretical significance and applications in wireless ad hoc networks. An instance of ...the index coding problem includes a sender that holds a set of information messages X={x 1 ,...,x k } and a set of receivers R . Each receiver (x,H) in R needs to obtain a message x X and has prior side information consisting of a subset H of X . The sender uses a noiseless communication channel to broadcast encoding of messages in X to all clients. The objective is to find an encoding scheme that minimizes the number of transmissions required to satisfy the demands of all the receivers. In this paper, we analyze the relation between the index coding problem, the more general network coding problem, and the problem of finding a linear representation of a matroid. In particular, we show that any instance of the network coding and matroid representation problems can be efficiently reduced to an instance of the index coding problem. Our reduction implies that many important properties of the network coding and matroid representation problems carry over to the index coding problem. Specifically, we show that vector linear codes outperform scalar linear index codes and that vector linear codes are insufficient for achieving the optimum number of transmissions.
We study Private Information Retrieval with Side Information (PIR-SI) in the single-server multi-message setting. In this setting, a user wants to download D messages from a database of K \geq D ...messages, stored on a single server, without revealing any information about the identities of the demanded messages to the server. The goal of the user is to achieve information-theoretic privacy by leveraging the side information about the database. The side information consists of a random subset of M messages. The identities of the messages forming the side information are initially unknown to the server. Our goal is to characterize the capacity of this setting, i.e., the maximum achievable download rate. In our previous work, we have established the PIR-SI capacity for the special case in which the user wants a single message, i.e., D = 1 and showed that the capacity can be achieved through the Partition and Code scheme. In this paper, we focus on the case when the user wants multiple messages, i.e., D\lt /p\gt \gt 1. Our first result is that if the user wants more messages than what they have as side information, i.e., D \gt M, then the capacity is \frac{D}{K-M}, and it can be achieved using a scheme based on the Generalized Reed-Solomon codes. Our second result shows that when D\leq M the capacity can be higher. We present a lower bound on the capacity based on an achievability scheme which we call Generalized Partition and Code.
Codes for Correcting Localized Deletions Kas Hanna, Serge; El Rouayheb, Salim
IEEE transactions on information theory,
2021-April, 2021-4-00, Volume:
67, Issue:
4
Journal Article
Peer reviewed
Open access
We consider the problem of constructing binary codes for correcting deletions that are localized within certain parts of the codeword that are unknown a priori. The model that we study is when ...<inline-formula> <tex-math notation="LaTeX">\delta \leq w </tex-math></inline-formula> deletions are localized in a window of size <inline-formula> <tex-math notation="LaTeX">w </tex-math></inline-formula> bits. These <inline-formula> <tex-math notation="LaTeX">\delta </tex-math></inline-formula> deletions do not necessarily occur in consecutive positions, but are restricted to the window of size <inline-formula> <tex-math notation="LaTeX">w </tex-math></inline-formula>. The localized deletions model is a generalization of the bursty model, in which all the deleted bits are consecutive. In this paper, we construct new explicit codes for the localized model, based on the family of Guess & Check codes which was previously introduced by the authors. The codes that we construct can correct, with high probability, <inline-formula> <tex-math notation="LaTeX">\delta \leq w </tex-math></inline-formula> deletions that are localized in a single window of size <inline-formula> <tex-math notation="LaTeX">w </tex-math></inline-formula>, where <inline-formula> <tex-math notation="LaTeX">w </tex-math></inline-formula> grows with the block length. Moreover, these codes are systematic; have low redundancy; and have efficient deterministic encoding and decoding algorithms. We also generalize these codes to deletions that are localized within multiple windows in the codeword.
We consider the problem of designing private information retrieval (PIR) schemes on data of m files replicated on n servers that can possibly collude. We focus on devising robust PIR schemes that can ...tolerate stragglers, i.e., slow or unresponsive servers. In many settings, the number of stragglers is not known a priori or may change with time. We define universally robust PIR as schemes that achieve PIR capacity asymptotically in m and simultaneously for any number of stragglers up to a given threshold. We introduce Staircase-PIR schemes and prove that they are universally robust. Towards that end, we establish an equivalence between robust PIR and communication efficient secret sharing.
We extend the notion of locality from the Hamming metric to the rank and subspace metrics. Our main contribution is to construct a class of array codes with locality constraints in the rank metric. ...Our motivation for constructing such codes stems from the need to design codes for efficient data recovery from correlated and/or mixed ( i.e., complete and partial) failures in distributed storage systems. Specifically, the proposed local rank-metric codes can recover locally from crisscross errors and erasures , which affect a limited number of rows and/or columns of the storage array. We also derive a Singleton-like upper bound on the minimum rank distance of (linear) codes with rank-locality constraints. Our proposed construction achieves this bound for a broad range of parameters. The construction builds upon Tamo and Barg's method for constructing locally repairable codes with optimal minimum Hamming distance. Finally, we construct a class of constant-dimension subspace codes (also known as Grassmannian codes) with locality constraints in the subspace metric. The key idea is to show that a Grassmannian code with locality can be easily constructed from a rank-metric code with locality by using the lifting method proposed by Silva et al. We present an application of such codes for distributed storage systems, wherein nodes are connected over a network that can introduce errors and erasures.
We study the problem of Private Information Retrieval (PIR) in the presence of prior side information. The problem setup includes a database of K independent messages possibly replicated on several ...servers, and a user that needs to retrieve one of these messages. In addition, the user has some prior side information in the form of a subset of M messages, not containing the desired message and unknown to the servers. This problem is motivated by practical settings in which the user can obtain side information opportunistically from other users or has previously downloaded some messages using classical PIR schemes. The objective of the user is to retrieve the required message without revealing its identity while minimizing the amount of data downloaded from the server. We focus on achieving information-theoretic privacy in two scenarios: (i) the user wants to protect jointly its demand and side information; (ii) the user wants to protect only the information about its demand, but not the side information. To highlight the role of side information, we focus on the case of a single server. We prove that, in the first scenario, the minimum download cost is K-M messages, and in the second scenario, it is ⌈K/M+1⌉ messages. This is a significant improvement compared to the minimum cost of K messages in the setting where the user has no side information. Our proof techniques use a reduction from the PIR with side information problem to an index coding problem. We leverage this reduction to prove converse results, as well as to design achievability schemes.
The growing availability of personal genomics services comes with increasing concerns for genomic privacy. Individuals may wish to withhold sensitive genotypes that contain critical health-related ...information when sharing their data with such services. A straightforward solution that masks only the sensitive genotypes does not ensure privacy due to the correlation structure within the genome. Here, we develop an informationtheoretic mechanism for masking sensitive genotypes, which ensures no information about the sensitive genotypes is leaked. We also propose an efficient algorithmic implementation of our mechanism for genomic data governed by hidden Markov models. Our work is a step towards more rigorous control of privacy in genomic data sharing.