Optimal locally repairable codes with information locality are considered. Optimal codes are constructed, whose length is also order-optimal with respect to a new bound on the code length derived in ...this article. The length of the constructed codes is super-linear in the alphabet size, which improves upon the well known pyramid codes, whose length is only linear in the alphabet size. The recoverable erasure patterns are also analyzed for the new codes. Based on the recoverable erasure patterns, we construct generalized sector-disk (GSD) codes, which can recover from disk erasures mixed with sector erasures in a more general setting than known sector-disk (SD) codes. Additionally, the number of sectors in the constructed GSD codes is super-linear in the alphabet size, compared with known SD codes, whose number of sectors is only linear in the alphabet size.
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically modified organisms. Data stored ...in this medium are subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present two families of codes for correcting errors due to tandem duplications of a fixed length: the first family can correct any number of errors, while the second corrects a bounded number of errors. We also study codes for correcting tandem duplications of length up to a given constant k, where we are primarily focused on the cases of k = 2,3. Finally, we provide a full classification of the sets of lengths allowed in tandem duplication that result in a unique root for all sequences.
We study array codes which are based on subspaces of a linear space over a finite field, using spreads, <inline-formula> <tex-math notation="LaTeX">q </tex-math></inline-formula>-Steiner systems, and ...subspace transversal designs. We present several constructions of such codes which are <inline-formula> <tex-math notation="LaTeX">q </tex-math></inline-formula>-analogs of some known block codes, such as the Hamming and simplex codes. We examine the locality and availability of the constructed codes. In particular, we distinguish between two types of locality and availability: node versus symbol. The resulting codes have distinct symbol/node locality/availability, allowing a more efficient repair process for a single symbol stored in a storage node of a distributed storage system, compared with the repair process for the whole node.
Tandem repeat sequences are common in the genomes of many organisms and are known to cause important phenomena such as gene silencing and rapid morphological changes. Due to the presence of multiple ...copies of the same pattern in tandem repeats and their high variability, they contain a wealth of information about the mutations that have led to their formation. The ability to extract this information can enhance our understanding of evolutionary mechanisms.
We present a stochastic model for the formation of tandem repeats via tandem duplication and substitution mutations. Based on the analysis of this model, we develop a method for estimating the relative mutation rates of duplications and substitutions, as well as the total number of mutations, in the history of a tandem repeat sequence. We validate our estimation method via Monte Carlo simulation and show that it outperforms the state-of-the-art algorithm for discovering the duplication history. We also apply our method to tandem repeat sequences in the human genome, where it demonstrates the different behaviors of micro- and mini-satellites and can be used to compare mutation rates across chromosomes. It is observed that chromosomes that exhibit the highest mutation activity in tandem repeat regions are the same as those thought to have the highest overall mutation rates. However, unlike previous works that rely on comparing human and chimpanzee genomes to measure mutation rates, the proposed method allows us to find chromosomes with the highest mutation activity based on a single genome, in essence by comparing (approximate) copies of the pattern in tandem repeats.
The prevalence of tandem repeats in most organisms and the efficiency of the proposed method enable studying various aspects of the formation of tandem repeats and the surrounding sequences in a wide range of settings.
The implementation of the estimation method is available at http://ips.lab.virginia.edu/smtr .
We study the Singleton-type bound that provides an upper limit on the minimum distance of locally repairable codes. We present an improved bound by carefully analyzing the combinatorial structure of ...the repair sets. Thus, we show the previous bound is unachievable for certain parameters. We then also provide explicit constructions of optimal codes which show that for certain parameters the new bound is sharp. Additionally, as a byproduct, some previously known codes are shown to attain the new bound and are thus proved to be optimal.
We study error-correcting codes for permutations under the infinity norm, motivated by a novel storage scheme for flash memories called rank modulation. In this scheme, a set of n flash cells are ...combined to create a single virtual multilevel cell. Information is stored in the permutation induced by the cell charge levels. Spike errors, which are characterized by a limited-magnitude change in cell charge levels, correspond to a low-distance change under the infinity norm. We define codes protecting against spike errors, called limited-magnitude rank-modulation codes (LMRM codes), and present several constructions for these codes, some resulting in optimal codes. These codes admit simple recursive, and sometimes direct, encoding and decoding procedures. We also provide lower and upper bounds on the maximal size of LMRM codes both in the general case, and in the case where the codes form a subgroup of the symmetric group. In the asymptotic analysis, the codes we construct outperform the Gilbert-Varshamov-like bound estimate.
Systematic Error-Correcting Codes for Rank Modulation Hongchao Zhou; Schwartz, Moshe; Jiang, Anxiao Andrew ...
IEEE transactions on information theory,
2015-Jan., 2015-1-00, 20150101, Letnik:
61, Številka:
1
Journal Article
Recenzirano
Odprti dostop
The rank-modulation scheme has been recently proposed for efficiently storing data in nonvolatile memories. In this paper, we explore n, k, d systematic error-correcting codes for rank modulation. ...Such codes have length n, k information symbols, and minimum distance d. Systematic codes have the benefits of enabling efficient information retrieval in conjunction with memory-scrubbing schemes. We study systematic codes for rank modulation under Kendall's T-metric as well as under the ℓ ∞ -metric. In Kendall's T-metric, we present k + 2, k, 3 systematic codes for correcting a single error, which have optimal rates, unless systematic perfect codes exist. We also study the design of multierror-correcting codes, and provide a construction of k + t + 1, k, 2t + 1 systematic codes, for large-enough k. We use nonconstructive arguments to show that for rank modulation, systematic codes achieve the same capacity as general error-correcting codes. Finally, in the ℓ ∞ -metric, we construct two n, k, d systematic multierror-correcting codes, the first for the case of d = 0(1) and the second for d = Θ(n). In the latter case, the codes have the same asymptotic rate as the best codes currently known in this metric.
We study network maximum distance separable (MDS) codes, which are a class of network error-correcting codes whose distance attains the Singleton-type bound. The minimum field size of a network MDS ...code is of particular interest, since it impacts the computing complexity at the network nodes. Previous constructions of network MDS codes, which are applicable to general single-source multicast networks, require large field sizes. In this paper, for two specific classes of network topologies, we derive upper and lower bounds on the minimum field size of the corresponding network MDS codes and present explicit constructions. The proposed upper bounds significantly improve upon the previous ones and differ from the lower bounds only by a small factor, which is asymptotically no more than 2. Additionally, we extend the concept of linear network error-correction coding from the scalar case to the vector case, and demonstrate a class of networks in which the minimum field size of the vector network MDS code is substantially smaller than that of the scalar case.
DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may ...benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited to the medium, which inherently replicates stored data in multiple distinct ways, caused by mutations. We consider noise introduced solely by uniform tandem-duplication, and utilize the relation to constant-weight integer codes in the Manhattan metric. By bounding the intersection of the cross-polytope with hyperplanes, we prove the existence of reconstruction codes with full rate, as well as suggest a construction for a family of reconstruction codes.