We live in an increasingly interconnected world, with many organizations operating across countries or even continents. To serve their global user base, organizations are replacing their legacy DBMSs with cloud-based systems capable of scaling OLTP workloads to millions of users. CockroachDB is a scalable SQL DBMS that was built from the ground up to support these global OLTP workloads while maintaining high availability and strong consistency. Just like its namesake, CockroachDB is resilient to disasters through replication and automatic recovery mechanisms. This paper presents the design of CockroachDB and its novel transaction model that supports consistent geo-distributed transactions on commodity hardware. We describe how CockroachDB replicates and distributes data to achieve fault tolerance and high performance, as well as how its distributed SQL layer automatically scales with the size of the database cluster while providing the standard SQL interface that users expect. Finally, we present a comprehensive performance evaluation and share a couple of case studies of CockroachDB users. We conclude by describing lessons learned while building CockroachDB over the last five years.
ALEX: An Updatable Adaptive Learned Index Ding, Jialin; Minhas, Umar Farooq; Yu, Jia; et al.
Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data,
06/2020
Conference Proceeding
Open access
Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+ tree by a factor of up to three in search time and by an order of magnitude in memory footprint. However, it is limited to static, read-only workloads. In this paper, we present a new learned index called ALEX which addresses practical issues that arise when implementing learned indexes for workloads that contain a mix of point lookups, short range queries, inserts, updates, and deletes. ALEX effectively combines the core insights from learned indexes with proven storage and indexing techniques to achieve high performance and low memory footprint. On read-only workloads, ALEX beats the learned index from Kraska et al. by up to 2.2X on performance with up to 15X smaller index size. Across the spectrum of read-write workloads, ALEX beats B+ trees by up to 4.1X while never performing worse, with up to 2000X smaller index size. We believe ALEX presents a key step towards making learned indexes practical for a broader class of database workloads with dynamic updates.
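The core idea in this abstract, an index as a model that predicts a key's position, can be sketched in a few lines. The snippet below is an illustrative toy, not ALEX's actual design: it fits a single least-squares line over a sorted array and corrects the prediction with a binary search bounded by the model's maximum observed error. All names are invented for illustration.

```python
import bisect

class LearnedIndexSketch:
    """Toy learned index: a linear model maps key -> predicted position,
    and a bounded local search fixes up the prediction."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Max prediction error bounds the local search window.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(self.slope * key + self.intercept)

    def lookup(self, key):
        n = len(self.keys)
        pos = self._predict(key)
        lo = max(0, pos - self.err)
        hi = min(n, pos + self.err + 1)
        # Binary search only inside the error window, not the whole array.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None
```

On near-linear key distributions the error window stays tiny, which is exactly why a model can beat a B+ tree: one multiply-add replaces a tree descent.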
FITing-Tree: A Data-aware Index Structure Galakatos, Alex; Markovitch, Michael; Binnig, Carsten; et al.
Proceedings of the 2019 International Conference on Management of Data,
06/2019
Conference Proceeding
Open access
Index structures are one of the most important tools that DBAs leverage to improve the performance of analytics and transactional workloads. However, building several indexes over large datasets can often become prohibitively expensive and consume valuable system resources. In fact, a recent study showed that indexes created as part of the TPC-C benchmark can account for 55% of the total memory available in a modern DBMS. This overhead consumes valuable and expensive main memory, and limits the amount of space available to store new data or process existing data. In this paper, we present a novel data-aware index structure called FITing-Tree which approximates an index using piece-wise linear functions with a bounded error specified at construction time. This error knob provides a tunable parameter that allows a DBA to FIT an index to a dataset and workload by balancing lookup performance against space consumption. To navigate this tradeoff, we provide a cost model that helps determine an appropriate error parameter given either (1) a lookup latency requirement (e.g., 500ns) or (2) a storage budget (e.g., 100MB). Using a variety of real-world datasets, we show that our index is able to provide performance that is comparable to full index structures while reducing the storage footprint by orders of magnitude.
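The piece-wise linear approximation with a bounded error knob described above can be sketched as follows. This is a simplified, hypothetical illustration (a greedy "shrinking-cone" style segmentation over unique sorted keys), not FITing-Tree's actual algorithm; all names are invented.

```python
import bisect

def build_segments(keys, max_error):
    """Greedily cover sorted unique keys with linear segments
    (start_key, slope, start_pos) such that each segment predicts
    every covered position within +/- max_error."""
    segments = []
    i, n = 0, len(keys)
    while i < n:
        start = i
        lo_slope, hi_slope = float('-inf'), float('inf')
        j = i + 1
        while j < n:
            dx = keys[j] - keys[start]
            # Range of slopes keeping point j within +/- max_error.
            new_lo = max(lo_slope, (j - start - max_error) / dx)
            new_hi = min(hi_slope, (j - start + max_error) / dx)
            if new_lo > new_hi:
                break  # cone collapsed: close the segment before point j
            lo_slope, hi_slope = new_lo, new_hi
            j += 1
        slope = (lo_slope + hi_slope) / 2 if lo_slope != float('-inf') else 0.0
        segments.append((keys[start], slope, start))
        i = j
    return segments

def lookup(keys, segments, max_error, key):
    """Find the segment covering `key` (assumes key >= keys[0]),
    predict its position, then search only the error window."""
    starts = [s[0] for s in segments]
    start_key, slope, start_pos = segments[bisect.bisect_right(starts, key) - 1]
    pos = start_pos + round(slope * (key - start_key))
    lo = max(0, pos - max_error)
    hi = min(len(keys), pos + max_error + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else None
```

The error knob is visible directly: a larger `max_error` lets segments stretch further (fewer segments, less space) at the cost of a wider final search window (higher lookup latency), which is the tradeoff the paper's cost model navigates.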
BullFrog is a relational DBMS that supports single-step schema migrations --- even those that are backwards incompatible --- without downtime, and without need for advanced warning. When a schema migration is submitted, BullFrog initiates a logical switch to the new schema, but physically migrates affected data lazily, as it is accessed by incoming transactions. BullFrog's internal concurrency control algorithms and data structures enable concurrent processing of schema migration operations with post-migration transactions, while ensuring exactly-once migration of all old data into the physical layout required by the new schema. BullFrog is implemented as an open source extension to PostgreSQL. Experiments using this prototype over a TPC-C based workload (supplemented to include schema migrations) show that BullFrog can achieve zero-downtime migration to non-trivial new schemas with near-invisible impact on transaction throughput and latency.
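The "logical switch now, physical migration on first access" scheme described above can be illustrated with a toy model. This is not BullFrog's implementation (which operates inside PostgreSQL's concurrency control); it is a minimal, hypothetical sketch of lazy, exactly-once migration, with a single coarse lock standing in for the real system's finer-grained machinery. All names are invented.

```python
import threading

class LazyMigratingTable:
    """Toy model of lazy schema migration: reads go through the new
    schema immediately; a row still in the old layout is transformed
    exactly once, on first access."""

    def __init__(self, old_rows, migrate_fn):
        self.old_rows = dict(old_rows)   # rows still in the old physical layout
        self.new_rows = {}               # rows already migrated to the new layout
        self.migrate_fn = migrate_fn     # old-layout row -> new-layout row
        self.lock = threading.Lock()
        self.migrations = 0              # count of physical migrations performed

    def get(self, key):
        with self.lock:
            if key in self.new_rows:
                return self.new_rows[key]
            if key in self.old_rows:
                # Exactly-once: pop the old copy so no transaction can
                # migrate (or observe) it a second time.
                row = self.migrate_fn(self.old_rows.pop(key))
                self.new_rows[key] = row
                self.migrations += 1
                return row
            return None
```

For example, a backwards-incompatible migration splitting a `name` column into `first`/`last` would pass `migrate_fn=lambda r: dict(zip(("first", "last"), r["name"].split(" ", 1)))`; rows are rewritten only as transactions touch them, so the switch itself is instantaneous.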
Cross-domain Data Management Du, Xiaoyong; Li, Tong; Lu, Wei; et al.
Ji suan ji ke xue (Computer Science), 01/2024, Volume 51, Issue 1
Journal Article
As data becomes a new factor of production and Digital China is promoted as a top-level national strategy, cross-domain data sharing and circulation play a crucial role in maximizing the value of data factors. The country has taken a series of measures, such as completing the overall layout design of the national integrated data center system and launching the "East-West Computing" project, to provide infrastructure for the cross-domain application of data factors. Cross-domain data management faces challenges in communication, data modeling, and data access. This paper explores the connotation, research challenges, and key technologies of cross-domain data management from three perspectives: cross-spatial domain, cross-administrative domain, and cross-trust domain, and discusses its future development trends.
With the continuous development of application requirements and technologies, the differences between distributed storage architectures and traditional storage architectures are becoming increasingly apparent. Distributed storage offers a small fault impact radius, high scalability, and high throughput, and is widely used in many application systems. In a transaction data management system built on a distributed storage architecture, large volumes of transaction data are transmitted and stored across the distributed layer, which helps ensure the accuracy and consistency of the data while also enhancing the stability of the system.
Abstract
Mounting evidence suggests that dysfunction of long non-coding RNAs (lncRNAs) is involved in a wide variety of diseases. A knowledgebase with systematic collection and curation of lncRNA-disease associations is critically important for further examining their underlying molecular mechanisms. In 2013, we presented the first release of LncRNADisease, a database collecting experimentally supported lncRNA-disease associations. Here, we describe an update of the database. The new developments in LncRNADisease 2.0 include (i) an over 40-fold increase in lncRNA-disease associations compared with the previous version; (ii) the transcriptional regulatory relationships among lncRNA, mRNA and miRNA; (iii) a confidence score for each lncRNA-disease association; (iv) experimentally supported circular RNA-disease associations. LncRNADisease 2.0 documents more than 200 000 lncRNA-disease associations. We expect that this database will continue to serve as a valuable resource for potential clinical applications related to lncRNAs. LncRNADisease 2.0 is freely available at http://www.rnanut.net/lncrnadisease/.