We performed whole-transcriptome sequencing and whole-genome sequencing on nine pairs of hepatocellular carcinoma (HCC) tumors and matched adjacent tissues to identify RNA editing events. Using an improved bioinformatics pipeline, we identified a mean of 26,982 editing sites per sample, of which a mean of 89.5% were canonical A→G edits. The editing rate was significantly higher in tumors than in adjacent normal tissues. Comparing tumor and normal tissues within each patient, we found 7 non-synonymous tissue-specific editing events in the coding region, including 4 tumor-specific edits and 3 normal-specific edits, as well as 292 edits that varied in editing degree. We also found significant expression changes in 150 RNA-editing-associated genes in tumors, with 3 of the 4 most significant genes being cancer related. Our results suggest that editing may be related to higher gene expression. These findings indicate that RNA editing modification may play an important role in the development of HCC.
•Mean of 26,982 RNA editing sites per sample, with a mean of 89% A→G edits, in hepatocellular carcinoma.
•The editing rate is significantly higher in tumors than in adjacent normal tissues.
•Seven non-synonymous tissue-specific editing events lie in the coding region.
•RNA editing might be related to higher gene expression.
Hash tables can efficiently determine whether an element exists in a given set and have been widely used in computer networks, the Internet of Things (IoT), data centers, and stream data mining. With the continuous generation of massive data, the memory consumption of hash tables keeps increasing. The emerging Compute Express Link (CXL) technology can significantly expand memory capacity, so porting hash tables from DRAM to CXL memory can alleviate the large amounts of DRAM space they occupy. However, porting hash tables to CXL memory is not a trivial task. This paper analyzes the challenges of porting hash tables to CXL memory and highlights opportunities to address these challenges.
Approximate membership query (AMQ) data structures can approximately determine whether an element is in a set with high efficiency. They are widely used in distributed systems, database systems, bioinformatics, IoT applications, data stream mining, etc. However, the memory consumption of AMQ data structures grows rapidly with the data scale, which limits a system's ability to process massive amounts of data. Emerging persistent memory provides close-to-DRAM access speed and terabyte-level capacity, enabling AMQ data structures to handle massive data. Nevertheless, existing AMQ data structures perform poorly on persistent memory due to intensive random accesses and/or sequential writes. We therefore propose a novel AMQ data structure called the wormhole filter, which achieves high performance on persistent memory by reducing random accesses and sequential writes. In addition, we reduce the number of log records to lower recovery overhead. Theoretical analysis and experimental results show that wormhole filters significantly outperform competitive state-of-the-art AMQ data structures; for example, wormhole filters achieve 23.26× the insertion throughput, 1.98× the positive lookup throughput, and 8.82× the deletion throughput of the best competing baseline.
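The core AMQ behavior described above (definite "no", probabilistic "maybe") can be illustrated with a classic Bloom filter. This is a minimal Python sketch for illustration only; the wormhole filter itself is a different design optimized for persistent memory, and its internals are not shown here.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a classic AMQ data structure.

    A lookup returning False means the element is definitely absent;
    True means it is *possibly* present (false positives can occur,
    false negatives cannot).
    """

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive k independent bit positions from salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))
```

A stored element is always reported as present, which is the "no false negatives" guarantee that makes AMQ structures safe to use as a pre-filter in front of slower storage.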
Approximate membership query (AMQ) data structures can efficiently indicate whether an element exists in a data set. Therefore, they are widely used in data mining applications such as IoT streaming data mining, anomaly detection, duplicate detection, record linkage, and community discovery. The amount of data to be processed in real-world applications often changes frequently and dynamically. Thus, before using an AMQ data structure, its capacity must be configured to the maximum number of elements that will be stored at runtime. We observe that when the number of elements stored in an AMQ data structure is lower than its capacity, a significant amount of space is wasted, making the false positive rate much higher than necessary. To tackle this problem, we propose the variable-length encoding framework. It dynamically adjusts the encoding length of each element according to the number of elements currently stored in the AMQ data structure. With this design, the variable-length encoding framework makes full use of the memory space allocated to AMQ data structures, thereby improving space efficiency and reducing the false positive rate. In addition, as a general encoding scheme, the variable-length encoding framework can be applied to different types of AMQ data structures. Theoretical analysis and evaluation results show that AMQ data structures using the variable-length encoding framework have significantly lower false positive rates than state-of-the-art AMQ data structures. For example, at a load factor of 25%, the variable-length encoding framework reduces the false positive rate of AMQ data structures by 88.15% on average (up to 99.40%).
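The trade-off driving this idea, that a fixed memory budget spread over fewer elements permits longer per-element fingerprints and hence fewer false positives, can be sketched with a toy fingerprint set. The class and its logic below are hypothetical simplifications for illustration; the paper's framework operates inside real AMQ structures rather than over a plain list.

```python
import hashlib

def fingerprint(item, bits):
    """Take the low `bits` bits of a hash as the element's fingerprint."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << bits) - 1)

class VariableLengthFingerprintSet:
    """Toy illustration of variable-length encoding (hypothetical class).

    With a fixed memory budget, storing fewer elements leaves room for a
    longer fingerprint per element; two random items collide on a b-bit
    fingerprint with probability about 2**-b, so longer fingerprints mean
    a lower false positive rate.
    """

    def __init__(self, memory_bits=4096):
        self.memory_bits = memory_bits
        self.items = []  # kept raw here only so we can re-encode on demand

    def add(self, item):
        self.items.append(item)

    def current_fp_bits(self):
        # Spread the memory budget over the stored elements (capped for the demo).
        if not self.items:
            return 64
        return max(1, min(64, self.memory_bits // len(self.items)))

    def might_contain(self, item):
        bits = self.current_fp_bits()
        target = fingerprint(item, bits)
        return any(fingerprint(x, bits) == target for x in self.items)
```

At a 25% load factor, each element gets roughly four times the fingerprint bits it would get at full capacity, which is the underused space the framework reclaims instead of wasting.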