The next-generation sequencing (NGS) technology outputs a huge number of sequences (reads) that require further processing. After applying prefiltering techniques in order to eliminate redundancy and ...to correct erroneous reads, an overlap-based assembler typically finds the longest exact suffix-prefix match between each ordered pair of the input reads. However, another trend has been evolving for the purpose of solving an approximate version of the overlap problem. The main benefit of this direction is the ability to skip time-consuming error-detecting techniques which are applied in the prefiltering stage. In this work, we present and compare two techniques to solve the approximate overlap problem. The first adapts a compact prefix tree to efficiently solve the approximate all-pairs suffix-prefix problem, while the other utilizes a well-known principle, namely, the pigeonhole principle, to identify a potential overlap match in order to ultimately solve the same problem. Our results show that our solution using the pigeonhole principle has better space and time consumption over an FM-based solution, while our solution based on prefix tree has the best space consumption between all three solutions. The number of mismatches (hamming distance) is used to define the approximate matching between strings in our work.
Oblivious RAM (ORAM) schemes exist in order to protect the access pattern of data in a data store. Under an ORAM algorithm, a client accesses a data store in such a way that does not reveal which ...item it is interested in. This is typically accomplished by accessing multiple items each access and periodically reshuffling some, or all, of the data in the data-store. While many recent schemes make the ORAM computation complexity feasible, the performance of practical implementations is still largely limited by computational and storage limitations of the client as well as the bandwidth available between the client and the data store. In a cloud computing environment, where it is commonly assumed that the client is underpowered and you must pay by the gigabyte for data transfer, traditional ORAM methods are not optimal.
Intel’s Software Guard Extensions (SGX) provide a new opportunity for ORAM implementations that can safely outsource the computational and bandwidth requirements along with the data itself, meaning that the client can be very limited and still attain high performance. In this work, we develop efficient techniques for constructing ORAMs that takes advantage of the SGX enclave technology. We demonstrate implementations of multiple ORAM schemes (linear, square root, and path ORAM) using Intel’s SGX. We discuss the limitations of SGX as they pertain to implementing ORAM, and discuss alterations to the standard algorithms to overcome these limitations. We then evaluate the performance of our techniques.
The evolution of the next generation sequencing technology increases the demand for efficient solutions, in terms of space and time, for several bioinformatics problems. This paper presents a ...practical and easy-to-implement solution for one of these problems, namely, the all-pairs suffix-prefix problem, using a compact prefix tree. The paper demonstrates an efficient construction of this time-efficient and space-economical tree data structure. The paper presents techniques for parallel implementations of the proposed solution. Experimental evaluation indicates superior results in terms of space and time over existing solutions. Results also show that the proposed technique is highly scalable in a parallel execution environment.
There is a growing need for systems that efficiently support the work of medical teams at the precision-oncology point of care. Here, we present the implementation of the Molecular Tumor Board Portal ...(MTBP), an academic clinical decision support system developed under the umbrella of Cancer Core Europe that creates a unified legal, scientific and technological platform to share and harness next-generation sequencing data. Automating the interpretation and reporting of sequencing results decrease the need for time-consuming manual procedures that are prone to errors. The adoption of an expert-agreed process to systematically link tumor molecular profiles with clinical actions promotes consistent decision-making and structured data capture across the connected centers. The use of information-rich patient reports with interactive content facilitates collaborative discussion of complex cases during virtual molecular tumor board meetings. Overall, streamlined digital systems like the MTBP are crucial to better address the challenges brought by precision oncology and accelerate the use of emerging biomarkers.
The all-pairs suffix-prefix matching problem is a basic problem in string processing. It has an application in the de novo genome assembly task, which is one of the major bioinformatics problems. Due ...to the large size of the input data, it is crucial to use fast and space efficient solutions. In this paper, we present a space-economical solution to this problem using the generalized Sadakane compressed suffix tree. Furthermore, we present a parallel algorithm to provide more speed for shared memory computers. Our sequential and parallel algorithms are optimized by exploiting features of the Sadakane compressed index data structure. Experimental results show that our solution based on the Sadakane’s compressed index consumes significantly less space than the ones based on noncompressed data structures like the suffix tree and the enhanced suffix array. Our experimental results show that our parallel algorithm is efficient and scales well with increasing number of processors.
Private Function Evaluation (PFE) is the problem of evaluating one party’s private data using a private function owned by another party. Existing solutions for PFE are based on universal circuits ...evaluated in secure multiparty computations or on hiding the circuit’s topology and the gate’s functionality through additive homomorphic encryption. These solutions, however, are not efficient enough for practical use; hence there is a need for more efficient techniques. This work looks at utilizing the Intel Software Guard Extensions platform (SGX) to provide a more practical solution for PFE where the privacy of the data and the function are both preserved. Notably, our solution carefully avoids the pitfalls of side-channel attacks on SGX. We present solutions for two different scenarios: the first is when the function’s owner has an SGX-enabled device and the other is when a third party (or one of the data owners) has the SGX capability. Our results show a clear expected advantage in terms of running time for the first case over the second. Investigating the slowdown in the second case leads to the garbling time which constitutes more than 60% of the consumed time. Both solutions clearly outperform FairplayPF in our tests.
Compressed indices are important data structures in stringology. Compressed versions of many well-known data structures such as suffix tree and suffix array, which are used in string matching ...problems, have been studied and proposed. This paper takes advantage of a very recent compressed suffix array to build a space-economic solution for an important bioinformatics problem, namely the all-pairs suffix prefix problem. The paper also presents a simple technique for parallelizing the solution. Our results show that the proposed solution consumes less than one fifth of the space required by other solutions based on standard data structures. In addition, our results demonstrate that good performance scalability can be achieved by employing the proposed parallel algorithm.