A Memory-Disaggregated Radix Tree
Luo, Xuchuan; Zuo, Pengfei; Shen, Jiacheng ...
ACM Transactions on Storage, 08/2024, Volume 20, Issue 3
Journal Article
Peer reviewed
Disaggregated memory (DM) is an increasingly prevalent architecture with high resource utilization. It separates computing and memory resources into two pools and interconnects them with fast networks. Existing range indexes on DM are based on B+ trees, which suffer from large inherent read and write amplifications. The read and write amplifications rapidly saturate the network bandwidth, resulting in low request throughput and high access latency of B+ trees on DM. In this article, we propose that the radix tree is more suitable for DM than the B+ tree due to its smaller read and write amplifications. However, constructing a radix tree on DM is challenging due to the costly lock-based concurrency control, the bounded memory-side IOPS, and the complicated computing-side cache validation. To address these challenges, we design SMART, the first high-performance radix tree for disaggregated memory. Specifically, we leverage (1) a hybrid concurrency control scheme combining lock-free internal nodes and fine-grained lock-based leaf nodes to reduce lock overhead, (2) a computing-side read-delegation and write-combining technique to break through the IOPS upper bound by reducing redundant I/Os, and (3) a simple yet effective reverse check mechanism for computing-side cache validation. Experimental results show that SMART achieves 6.1× higher throughput under typical write-intensive workloads and 2.8× higher throughput under read-only workloads in YCSB benchmarks, compared with state-of-the-art B+ trees on DM.
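To make the data structure behind this abstract concrete, here is a minimal path-compressed radix tree sketch. It illustrates only the basic structure whose small read/write amplification the abstract argues for; it is NOT the SMART design itself, which additionally provides lock-free internal nodes, read-delegation/write-combining, and computing-side cache validation over RDMA.

```python
# Minimal path-compressed radix tree (illustrative sketch only).
class Node:
    def __init__(self, prefix=b""):
        self.prefix = prefix      # compressed run of key bytes on this edge
        self.children = {}        # branching byte -> child Node
        self.value = None         # payload if a key terminates here

def insert(root, key, value):
    node, rest = root, key
    while True:
        p, i = node.prefix, 0
        while i < len(p) and i < len(rest) and p[i] == rest[i]:
            i += 1
        if i < len(p):            # partial match: split the compressed edge
            child = Node(p[i + 1:])
            child.children, child.value = node.children, node.value
            node.prefix, node.children, node.value = p[:i], {p[i]: child}, None
        rest = rest[i:]
        if not rest:
            node.value = value
            return
        b, rest = rest[0], rest[1:]
        if b not in node.children:
            leaf = Node(rest)
            leaf.value = value
            node.children[b] = leaf
            return
        node = node.children[b]

def lookup(root, key):
    node, rest = root, key
    while True:
        if not rest.startswith(node.prefix):
            return None
        rest = rest[len(node.prefix):]
        if not rest:
            return node.value
        child = node.children.get(rest[0])
        if child is None:
            return None
        node, rest = child, rest[1:]

root = Node()
for k, v in [(b"foo", 1), (b"foobar", 2), (b"fob", 3)]:
    insert(root, k, v)
assert lookup(root, b"foobar") == 2 and lookup(root, b"fox") is None
```

A lookup touches only nodes along the key's own bytes, which is the intuition behind the abstract's claim of smaller read amplification than a B+ tree, whose wide nodes must be fetched whole over the network.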
We consider the problem of private information retrieval (PIR) over a distributed storage system. The storage system consists of N non-colluding databases, each storing an MDS-coded version of M messages. In the PIR problem, the user wishes to retrieve one of the available messages without revealing the message identity to any individual database. We derive the information-theoretic capacity of this problem, which is defined as the maximum number of bits of the desired message that can be privately retrieved per one bit of downloaded information. We show that the PIR capacity in this case is C = (1 + K/N + K^2/N^2 + ... + K^{M-1}/N^{M-1})^{-1} = (1 + R_c + R_c^2 + ... + R_c^{M-1})^{-1} = (1 - R_c)/(1 - R_c^M), where R_c = K/N is the rate of the (N, K) MDS code used. The capacity is a function of the code rate and the number of messages only, regardless of the explicit structure of the storage code. The result implies a fundamental tradeoff between the optimal retrieval cost and the storage cost when the storage code is restricted to the class of MDS codes. The result generalizes the achievability and converse results for the classical PIR with replicated databases to the case of MDS-coded databases.
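The capacity expression above is a finite geometric series in the code rate R_c = K/N, which is why the two forms agree. A short exact-arithmetic check of that identity (the parameter values below are arbitrary examples, not from the paper):

```python
from fractions import Fraction

def pir_capacity(N, K, M):
    """PIR capacity for an (N, K) MDS-coded system with M messages:
    C = (1 + Rc + Rc^2 + ... + Rc^(M-1))^(-1) = (1 - Rc) / (1 - Rc^M)."""
    Rc = Fraction(K, N)
    series = sum(Rc**i for i in range(M))       # geometric series form
    closed = (1 - Rc) / (1 - Rc**M)             # closed form
    assert 1 / series == closed                 # the two forms coincide
    return closed

# Example: N = 4 servers, (4, 2) MDS code (Rc = 1/2), M = 2 messages.
print(pir_capacity(4, 2, 2))   # -> 2/3
```

Using `Fraction` keeps the check exact, so the series/closed-form equality is verified without floating-point error.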
Microgrids are key components of future smart grids, which integrate distributed renewable energy generators to efficiently serve the load locally. However, the intermittent nature of renewable energy generation hinders the reliable operation of microgrids. Besides commonly adopted methods such as deploying an energy storage system (ESS) and a supplementary fuel generator to address the intermittency issue, energy cooperation among microgrids, enabled by energy exchange for sharing, is an appealing new solution. In this paper, we consider the energy management problem for two cooperative microgrids, each with an individual renewable energy generator and ESS. First, assuming that the microgrids' renewable energy generation/load amounts are perfectly known ahead of time, we solve the off-line energy management problem optimally. Based on the obtained solution, we study the impacts of the microgrids' energy cooperation and their ESSs on the total energy cost. Next, inspired by the off-line optimization solution, we propose online algorithms for the real-time energy management of the two cooperative microgrids. It is shown via simulations that the proposed online algorithms perform well in practice, have low complexity, and remain valid under arbitrary realizations of renewable energy generations/loads. Finally, we present one method to extend our proposed online algorithms to the general case of more than two microgrids based on a clustering approach.
Implementing cloud computing opens numerous paths for Web-based service offerings to meet diverse needs. However, data security and privacy have become a critical issue that restricts many cloud applications. One of the major concerns is that cloud operators have the opportunity to reach sensitive data. This concern dramatically increases users' anxiety and reduces the adoptability of cloud computing in many fields, such as the financial industry and governmental agencies. This paper focuses on this issue and proposes an intelligent cryptography approach by which cloud service operators cannot directly reach the complete data. The proposed approach divides the file and separately stores the data on distributed cloud servers. An alternative approach determines whether the data packets need a split in order to shorten the operation time. The proposed scheme is entitled the Security-Aware Efficient Distributed Storage (SA-EDS) model, which is mainly supported by our proposed algorithms, including the Alternative Data Distribution (AD2) Algorithm, the Secure Efficient Data Distributions (SED2) Algorithm, and the Efficient Data Conflation (EDCon) Algorithm. Our experimental evaluations have assessed both security and efficiency, and the results show that our approach can effectively defend against the main threats from clouds while requiring only an acceptable computation time.
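The core idea of splitting a file so that no single cloud server can reach the data can be sketched with a two-share XOR split. This is a generic one-time-pad-style illustration of the split/conflate workflow; the paper's actual SED2, AD2, and EDCon algorithms differ in detail, and the function names below are hypothetical.

```python
import os

def split(data: bytes):
    """Split `data` into two shares for two servers: a random pad and the
    XOR of the data with that pad.  Either share alone is uniformly random,
    so a single cloud operator learns nothing (illustrative sketch only)."""
    pad = os.urandom(len(data))
    masked = bytes(a ^ b for a, b in zip(data, pad))
    return pad, masked

def conflate(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the two shares (cf. the paper's data-conflation step)."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))

secret = b"account balance: 1000"
s1, s2 = split(secret)            # store s1 and s2 on different servers
assert conflate(s1, s2) == secret # only both shares together reveal the data
```

The tradeoff the abstract's "alternative approach" addresses shows up even here: splitting doubles the stored volume and adds per-request work, so deciding which packets to split is a cost/security balance.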
We consider the problem of X-secure and T-private linear computation with graph based replicated storage (GXSTPLC), which enables the user to privately retrieve a linear combination of messages from a set of N distributed servers where each message is restricted to be stored exclusively among a subset of servers, adhering to an X-security constraint. This constraint dictates that any group of up to X colluding servers must not disclose any information about the stored messages. Furthermore, any group of up to T servers is restricted from learning anything about the coefficients of the linear combination retrieved by the user. In this work, we completely characterize the asymptotic capacity of GXSTPLC, i.e., the supremum of achievable rates (the average number of desired symbols retrieved per downloaded symbol), in the limit as the number of messages K approaches infinity. Specifically, it is shown that a prior linear programming based upper bound on the asymptotic capacity of GXSTPLC due to Jia and Jafar is tight (thus settling their conjecture) by constructing achievability schemes. Notably, our achievability scheme also settles the exact capacity (i.e., for finite K) of X-secure linear combination with graph based replicated storage (GXSLC). Our achievability proof builds upon an achievability scheme for a closely related problem named asymmetric X-secure T-private linear computation with graph based replicated storage (Asymm-GXSTPLC) that guarantees non-uniform security and privacy levels across messages and coefficients (of the desired linear combination). In particular, by carefully designing Asymm-GXSTPLC settings for GXSTPLC problems, the corresponding Asymm-GXSTPLC schemes can be reduced to asymptotic capacity achieving schemes for GXSTPLC.
In regard to the achievability scheme for Asymm-GXSTPLC, interesting aspects of our construction include a novel query and answer design which makes use of a Vandermonde decomposition of Cauchy matrices, and a trade-off among message replication, security and privacy thresholds.
Recovery sets for vectors and subspaces are important in the construction of distributed storage system codes. These concepts are also interesting in their own right. In this paper, we consider the following very basic recovery question: what is the maximum number of possible pairwise disjoint recovery sets for each recovered element? The recovered elements in this work are d-dimensional subspaces of a k-dimensional vector space over F_q. Each server stores one representative for each distinct one-dimensional subspace of the k-dimensional vector space, or equivalently a distinct point of PG(k-1, q). As column vectors, the associated vectors of the stored one-dimensional subspaces form the generator matrix of the [(q^k - 1)/(q - 1), k, q^{k-1}] simplex code over F_q. Lower and upper bounds on the maximum number of such recovery sets are provided. It is shown that generally, these bounds are either tight or very close to being tight.
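The storage layout described above can be made concrete for the smallest interesting case. The sketch below builds the generator matrix of the binary simplex code, one column per point of PG(k-1, 2), and checks the constant weight q^{k-1} that gives the code its minimum distance; q = 2 is chosen only to keep the example short, while the paper works over general F_q.

```python
from itertools import product

def simplex_generator(k):
    """Generator matrix of the binary [(2^k - 1), k, 2^(k-1)] simplex code:
    one column per nonzero vector of F_2^k, i.e. per point of PG(k-1, 2).
    (Over general F_q one keeps one representative per 1-dim subspace.)"""
    cols = [v for v in product((0, 1), repeat=k) if any(v)]
    return [[c[i] for c in cols] for i in range(k)]

def codewords(G):
    """Enumerate all 2^k codewords m * G over F_2."""
    k, n = len(G), len(G[0])
    for msg in product((0, 1), repeat=k):
        yield tuple(sum(msg[i] * G[i][j] for i in range(k)) % 2
                    for j in range(n))

G = simplex_generator(3)                    # a [7, 3] code, 7 servers
weights = {sum(c) for c in codewords(G) if any(c)}
print(weights)                              # -> {4}: weight 2^(k-1) for all
```

Each coordinate of a codeword is what one server holds about the encoded vector, which is the setting in which the paper counts pairwise disjoint recovery sets.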
Locally Repairable Codes
Papailiopoulos, Dimitris S.; Dimakis, Alexandros G.
IEEE Transactions on Information Theory, 10/2014, Volume 60, Issue 10
Journal Article
Peer reviewed
Open access
Distributed storage systems for large-scale applications typically use replication for reliability. Recently, erasure codes have been used to reduce the large storage overhead while increasing data reliability. A main limitation of off-the-shelf erasure codes is their high repair cost during single node failure events. A major open problem in this area has been the design of codes that 1) are repair efficient and 2) achieve arbitrarily high data rates. In this paper, we explore the repair metric of locality, which corresponds to the number of disk accesses required during a single node repair. Under this metric, we characterize an information-theoretic tradeoff that binds together the locality, code distance, and storage capacity of each node. We show the existence of optimal locally repairable codes (LRCs) that achieve this tradeoff. The achievability proof uses a locality-aware flow-graph gadget, which leads to a randomized code construction. Finally, we present an optimal and explicit LRC that achieves arbitrarily high data rates. Our locality-optimal construction is based on simple combinations of Reed-Solomon blocks.
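For reference, the locality-distance tradeoff this abstract refers to is usually stated, in its scalar-code form, as the following Singleton-like bound (the paper's full tradeoff also accounts for the per-node storage capacity, which is not shown here):

```latex
% (n, k) code in which every symbol has locality r:
d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2
% Setting r = k recovers the classical Singleton bound d \le n - k + 1,
% so locality r < k is paid for with a loss in minimum distance.
```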
Massive redundant data storage and communication in Network 4.0 environments suffer from low integrity, high cost, and easy tampering. To address these issues, this article proposes a secure data storage and recovery scheme for blockchain-based networks that improves the decentralization, tamper-proofing, real-time monitoring, and management of storage systems; the design supports the dynamic storage, fast repair, and update of distributed data in the data storage system of industrial nodes. A local regenerative code technique is used to repair and store data between failed nodes while ensuring the privacy of user data. That is, when stored data are found to be damaged, multiple local repair groups constructed by vector codes can simultaneously and efficiently repair multiple distributed data storage nodes. Based on the unique chain storage structure, including the data consensus mechanism and smart contracts, the blockchain distributed coding storage structure not only quickly repairs nearby local regenerative codes in the blockchain but also reduces the resource overhead of the data storage process at industrial nodes. Experimental results show that the proposed scheme improves the multi-node data repair rate by 9% and the data storage rate by 8.6%, indicating good security and real-time performance.
A code over a finite alphabet is called locally recoverable (LRC) if every symbol in the encoding is a function of a small number (at most r) of other symbols. We present a family of LRC codes that attain the maximum possible value of the distance for a given locality parameter and code cardinality. The codewords are obtained as evaluations of specially constructed polynomials over a finite field, and reduce to a Reed-Solomon code if the locality parameter r is set equal to the code dimension. The size of the code alphabet for most parameters is only slightly greater than the code length. The recovery procedure is performed by polynomial interpolation over r points. We also construct codes with several disjoint recovering sets for every symbol. This construction enables the system to conduct several independent and simultaneous recovery processes of a specific symbol by accessing different parts of the codeword. This property enables high availability of frequently accessed data ("hot data").
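The recovery-by-interpolation step mentioned above can be sketched generically: within one recovery set, the r + 1 stored symbols are evaluations of a single polynomial of degree less than r, so an erased symbol is recomputed by Lagrange interpolation over the other r points. The toy field GF(13), the locality r = 2, and the evaluation points below are illustrative assumptions, not the paper's construction.

```python
P = 13  # small prime field GF(13) for illustration

def lagrange_at(points, x):
    """Evaluate the unique polynomial through `points` at x, mod P.
    Modular inverses use Fermat's little theorem: a^(P-2) = a^(-1) mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def poly(x):                      # local polynomial of degree < r = 2
    return (3 + 5 * x) % P

group = [(1, poly(1)), (2, poly(2)), (5, poly(5))]  # r + 1 = 3 servers
lost = group.pop(0)                # the server at x = 1 fails
assert lagrange_at(group, 1) == lost[1]  # recovered from the other r points
```

Any r of the r + 1 symbols suffice, which is also why multiple disjoint recovering sets per symbol (the "hot data" property) allow independent simultaneous repairs.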