We present Modular Polynomial (MP) Codes for Secure Distributed Matrix Multiplication (SDMM). The construction is based on the observation that one can decode certain proper subsets of the ...coefficients of a polynomial with fewer evaluations than is necessary to interpolate the entire polynomial. We also present Generalized Gap Additive Secure Polynomial (GGASP) codes. Both MP and GGASP codes are shown experimentally to perform favorably in terms of recovery threshold when compared to other polynomials codes for SDMM which use the grid partition. Both MP and GGASP codes achieve the recovery threshold of Entangled Polynomial Codes for robustness against stragglers, but MP codes can decode below this recovery threshold depending on the set of worker nodes which fails. The decoding complexity of MP codes is shown to be lower than other approaches in the literature, due to the user not being tasked with interpolating an entire polynomial.
The problem of providing privacy, in the private information retrieval (PIR) sense, to users requesting data from a distributed storage system (DSS), is considered. The DSS is coded by an ...<inline-formula> <tex-math notation="LaTeX">(n,k,d) </tex-math></inline-formula> maximum distance separable code to store the data reliably on unreliable storage nodes. Some of these nodes can be spies which report to a third party, such as an oppressive regime, which data is being requested by the user. An information theoretic PIR scheme ensures that a user can satisfy its request while revealing no information on which data is being requested to the nodes. A user can trivially achieve PIR by downloading all the data in the DSS. However, this is not a feasible solution due to its high communication cost. We construct PIR schemes with low download communication cost. When there is <inline-formula> <tex-math notation="LaTeX">b=1 </tex-math></inline-formula> spy node in the DSS, in other words, no collusion between the nodes, we construct PIR schemes with download cost <inline-formula> <tex-math notation="LaTeX">\frac {1}{1-R} </tex-math></inline-formula> per unit of requested data (<inline-formula> <tex-math notation="LaTeX">R=k/n </tex-math></inline-formula> is the code rate), achieving the information theoretic limit for linear schemes. The proposed schemes are universal since they depend on the code rate, but not on the generator matrix of the code. Also, if <inline-formula> <tex-math notation="LaTeX">b\leq n-\delta k </tex-math></inline-formula> nodes collude, with <inline-formula> <tex-math notation="LaTeX">\delta =\lfloor {\frac {n-b}{k}}\rfloor </tex-math></inline-formula>, we construct linear PIR schemes with download cost <inline-formula> <tex-math notation="LaTeX">\frac {b+\delta k}{\delta } </tex-math></inline-formula>.
The problem of private information retrieval (PIR) from coded storage systems with colluding, Byzantine, and unresponsive servers is considered. An explicit scheme using an <inline-formula> <tex-math ...notation="LaTeX">n,k </tex-math></inline-formula> Reed-Solomon storage code is designed, protecting against <inline-formula> <tex-math notation="LaTeX">t </tex-math></inline-formula>-collusion, and handling up to <inline-formula> <tex-math notation="LaTeX">b </tex-math></inline-formula> Byzantine and <inline-formula> <tex-math notation="LaTeX">r </tex-math></inline-formula> unresponsive servers, when <inline-formula> <tex-math notation="LaTeX">n>k+t+2b+r-1 </tex-math></inline-formula>. This scheme achieves a PIR rate of <inline-formula> <tex-math notation="LaTeX">(({n-r-(k+2b+t-1)})/{n-r}) </tex-math></inline-formula>. In the case where the capacity is known, namely, when <inline-formula> <tex-math notation="LaTeX">k=1 </tex-math></inline-formula>, it is asymptotically capacity achieving as the number of files grows. Finally, the scheme is adapted to symmetric PIR.
In this paper, the problem of providing privacy to users requesting data over a network from a distributed storage system (DSS) is considered. The DSS, which is considered as the multi-terminal ...destination of the network from the user's perspective, is encoded by a maximum rank distance (MRD) code to store the data on these multiple servers. A private information retrieval (PIR) scheme ensures that a user can request a file without revealing any information on which file is being requested to any of the servers. In this paper, a novel PIR scheme is proposed, allowing the user to recover a file from a storage system with low communication cost, while allowing some servers in the system to collude in the quest of revealing the identity of the requested file. The network is modeled as a random linear network, i.e., all nodes of the network forward random (unknown) linear combinations of incoming packets. Both error-free and erroneous random linear networks are considered.
A private information retrieval (PIR) scheme allows a user to retrieve a file from a database without revealing any information on the file being requested. As of now, PIR schemes have been proposed ...for several kinds of storage systems, including replicated and MDS-coded systems. However, the problem of constructing PIR schemes on regenerating codes has been sparsely considered. A regenerating code is a storage code whose codewords are distributed among nodes, enabling efficient storage of files, as well as low-bandwidth retrieval of files and repair of nodes. Minimum-bandwidth regenerating (MBR) codes define a family of regenerating codes allowing a node repair with optimal bandwidth. Rashmi, Shah, and Kumar obtained a large family of MBR codes using the product-matrix (PM) construction. In this work, a new PIR scheme over PM-MBR codes is designed. The inherent redundancy of the PM structure is used to reduce the download communication complexity of the scheme. A lower bound on the PIR capacity of MBR-coded PIR schemes is derived, showing an interesting storage space vs. PIR rate trade-off compared to existing PIR schemes with the same reconstruction capability. The present scheme also outperforms a recent PM-MBR PIR construction of Dorkson and Ng.
We present Modular Polynomial (MP) Codes for Secure Distributed Matrix Multiplication (SDMM). The construction is based on the observation that one can decode certain proper subsets of the ...coefficients of a polynomial with fewer evaluations than is necessary to interpolate the entire polynomial. We also present Generalized Gap Additive Secure Polynomial (GGASP) codes. Both MP and GGASP codes are shown experimentally to perform favorably in terms of recovery threshold when compared to other comparable polynomials codes for SDMM which use the grid partition. Both MP and GGASP codes achieve the recovery threshold of Entangled Polynomial Codes for robustness against stragglers, but MP codes can decode below this recovery threshold depending on the set of worker nodes which fails. The decoding complexity of MP codes is shown to be lower than other approaches in the literature, due to the user not being tasked with interpolating an entire polynomial.
We consider the problem of designing PIR scheme on coded data when certain nodes are unresponsive. We provide the construction of ν-robust PIR schemes that can tolerate up to ν unresponsive nodes. ...These schemes are adaptive and universally optimal in the sense of achieving (asymptotically) optimal download cost for any number of unresponsive nodes up to ν.
In this work, we introduce a differentially private method for generating synthetic data from vertically partitioned data, \emph{i.e.}, where data of the same individuals is distributed across ...multiple data holders or parties. We present a differentially privacy stochastic gradient descent (DP-SGD) algorithm to train a mixture model over such partitioned data using variational inference. We modify a secure multiparty computation (MPC) framework to combine MPC with differential privacy (DP), in order to use differentially private MPC effectively to learn a probabilistic generative model under DP on such vertically partitioned data. Assuming the mixture components contain no dependencies across different parties, the objective function can be factorized into a sum of products of the contributions calculated by the parties. Finally, MPC is used to compute the aggregate between the different contributions. Moreover, we rigorously define the privacy guarantees with respect to the different players in the system. To demonstrate the accuracy of our method, we run our algorithm on the Adult dataset from the UCI machine learning repository, where we obtain comparable results to the non-partitioned case.
In this paper, the problem of providing privacy to users requesting data over a network from a distributed storage system (DSS) is considered. The DSS, which is considered as the multi-terminal ...destination of the network from the user's perspective, is encoded by a maximum rank distance (MRD) code to store the data on these multiple servers. A private information retrieval (PIR) scheme ensures that a user can request a file without revealing any information on which file is being requested to any of the servers. In this paper, a novel PIR scheme is proposed, allowing the user to recover a file from a storage system with low communication cost, while allowing some servers in the system to collude in the quest of revealing the identity of the requested file. The network is modeled as a random linear network, i.e., all nodes of the network forward random (unknown) linear combinations of incoming packets. Both error-free and erroneous random linear networks are considered.
A private information retrieval (PIR) scheme on coded storage systems with colluding, byzantine, and non-responsive servers is presented. Furthermore, the scheme can also be used for symmetric PIR in ...the same setting. An explicit scheme using an n, k generalized Reed-Solomon storage code is designed, protecting against t-collusion and handling up to b byzantine and r non-responsive servers, when n ≥ n 1' =(ν+1)k+t+2b+r-1, for some integer ν ≥ 1. This scheme achieves a PIR rate of 1-(k+2b+t+r-1)/(n ' -r). In the case where the capacity is known, namely when k=1, it is asymptotically capacity achieving as the number of files grows.