We study the problem of processing supergraph queries on graph databases. A graph database D is a large set of graphs. A supergraph query q on D retrieves all the graphs in D such that q is a supergraph of them. The large number of graphs in databases and the NP-completeness of subgraph isomorphism testing make it challenging to process supergraph queries efficiently. In this paper, a new approach to processing supergraph queries is proposed. Specifically, a method for compactly organizing graph databases is first presented. Common subgraphs of the graphs in a database are stored only once in the compact organization of the database, in order to reduce the overall cost of subgraph isomorphism tests from the stored graphs to queries during query processing. Then, an exact algorithm and an approximate algorithm for generating the significant feature set with optimal order are proposed, followed by algorithms for index construction on graph databases. The optimal order on the feature set reduces the number of subgraph isomorphism tests during query processing. Based on the compact organization of graph databases, a novel algorithm for testing subgraph isomorphisms from multiple graphs to one graph is presented. Finally, based on all the above techniques, a query processing method is proposed. Analytical and experimental results show that the proposed algorithms outperform existing similar algorithms by one to two orders of magnitude.
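To make the query definition above concrete, the following is a minimal brute-force sketch of a supergraph query. It is not the compact organization or the indexing approach proposed in the paper; it only illustrates what the query returns. The graph representation (`make_graph`), the function names, and the toy database are assumptions made for this illustration, and the test runs in exponential time.

```python
from itertools import permutations


def make_graph(labels, edges):
    """Build an undirected labelled graph (hypothetical representation).

    labels: dict mapping node -> label
    edges:  iterable of (u, v) pairs
    """
    return {"labels": dict(labels), "edges": {frozenset(e) for e in edges}}


def is_subgraph_of(g, q):
    """Return True if g is subgraph-isomorphic to q.

    Brute force: try every injective mapping of g's nodes into q's nodes
    and check that labels and edges are preserved.
    """
    g_nodes = list(g["labels"])
    q_nodes = list(q["labels"])
    if len(g_nodes) > len(q_nodes):
        return False
    for image in permutations(q_nodes, len(g_nodes)):
        mapping = dict(zip(g_nodes, image))
        # Node labels must match under the mapping.
        if any(g["labels"][v] != q["labels"][mapping[v]] for v in g_nodes):
            continue
        # Every edge of g must map onto an edge of q.
        if all(frozenset(mapping[v] for v in e) in q["edges"] for e in g["edges"]):
            return True
    return False


def supergraph_query(database, q):
    """Return the names of all graphs in the database that q is a supergraph of."""
    return [name for name, g in database.items() if is_subgraph_of(g, q)]


# Toy data, invented for this example.
q = make_graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])
database = {
    "g1": make_graph({"a": "C", "b": "O"}, [("a", "b")]),
    "g2": make_graph({"a": "C", "b": "C", "c": "C"}, [("a", "b"), ("b", "c")]),
    "g3": make_graph({"a": "C", "b": "O", "c": "C"}, [("a", "b"), ("c", "b")]),
}
result = supergraph_query(database, q)
```

Each database graph is tested against the query independently here, which is exactly the repeated cost that sharing common subgraphs is meant to reduce.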
In traditional medical treatment, the number of hospital patients is too large, medical resources are too tight, and accumulated medical knowledge is difficult to utilize fully. With the help of the Internet and artificial intelligence technology, the reform of traditional medical treatment can enable preliminary diagnosis and triage during diagnosis and treatment. Based on electronic medical records obtained from a hospital and the pre-processed text, a medical knowledge graph based on the BiLSTM-CRF model was constructed. First, named entity recognition was performed, then relation extraction was carried out, and finally the data was imported into the Neo4j database to visualize the knowledge graph. A simple medical question answering system based on the knowledge graph was designed and implemented to provide technical support for medical diagnosis in the form of "one question and one answer". The results show that the system can alleviate some problems in traditional medical treatment to a certain extent.
Graph databases (GDBs) enable efficient queries for searching and analyzing graph data. However, such a query must first extract sub-graphs, and this step is costly because sub-graph matching is NP-complete. GDBs find the sub-graphs specified in a query by graph traversal, that is, by following edges from a node. Traversing a single edge has constant cost, but a traversal that involves several edges is affected by database volume, because the number of candidate edges to traverse grows. To improve performance, it is necessary to reduce the number of traversals performed when executing a query. In this study, we focus on recurrent traversal of edges that have the same relationship. We propose a new graph index that enables efficient traversal of same-type edges in order to improve the performance of sub-graph search.
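The kind of traversal the abstract targets, repeatedly following edges of one relationship type, can be sketched as a plain breadth-first search. This is only the baseline traversal, not the proposed index; the edge-triple format, the function name `reachable_by_type`, and the sample data are assumptions made for this illustration.

```python
from collections import deque


def reachable_by_type(edges, start, rel_type):
    """Return all nodes reachable from start using only edges of rel_type.

    edges: iterable of (source, relationship_type, destination) triples
    The start node itself is excluded from the result.
    """
    # Keep only the adjacency for the requested relationship type.
    adjacency = {}
    for src, rel, dst in edges:
        if rel == rel_type:
            adjacency.setdefault(src, []).append(dst)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


# Toy data, invented for this example.
edges = [
    ("a", "KNOWS", "b"),
    ("b", "KNOWS", "c"),
    ("c", "LIKES", "d"),
    ("b", "LIKES", "e"),
]
knows_closure = reachable_by_type(edges, "a", "KNOWS")
```

Every hop in this loop is one edge traversal, so the number of traversals grows with the number of candidate edges, which is the cost an index over same-type edges would aim to cut.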
Robust Partitioning Scheme for Accelerating SQL Database Khan, Wisal; Zhang, Cheng; Luo, Bin ...
2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT),
2021-Nov.-22
Conference Proceeding
SQL databases have shown tremendous performance over the last four decades. Data consistency, isolation, and durability are their main strengths. However, in relational databases, computing joins at run time is time-consuming. To address this issue, we propose to use the Oracle partitioning technique to enhance the performance of specific queries. Physical database tuning approaches such as partitioning and indexing are used to speed up individual SQL statements. With the partitioning technique, the performance of the Oracle 11g RDBMS improves by approximately 50% (absolute). Additionally, we compare its performance with that of a state-of-the-art NoSQL graph database (Neo4j).
Graph databases have been used widely in different areas. Owing to the type of representation they offer, they have gained popularity in disciplines where the interconnection of the data is a substantial matter. Given the amount of interconnected data that the era of omics has produced, analyzing this data is an important task in medicine, drug design, and many other related fields. This can be done with the help of graph databases. In this paper, a novel multi-bipartite heterogeneous biological graph model is provided. It has been implemented and stored in the graph database Neo4j. Moreover, a modified version of degree centrality (hereafter "Disease Degree Centrality") is adapted to aid in extracting meaningful insights from the graph model at hand. We calculated the Disease Degree Centrality for the intended node and reported the most important protein domains. Finally, we analysed our results in a case study of Menkes and Wilson diseases using the DAVID and InterPro databases.
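For orientation, standard (unmodified) degree centrality can be computed as below. This is the textbook measure, degree divided by n - 1, and not the paper's "Disease Degree Centrality", whose definition is not given in the abstract. The function name and the placeholder node names are assumptions made for this illustration.

```python
def degree_centrality(edges):
    """Standard degree centrality for an undirected graph.

    edges: iterable of (u, v) pairs
    Returns a dict node -> degree / (n - 1), where n is the number of nodes.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Placeholder bipartite data (disease -> protein domain), invented for this example.
edges = [
    ("disease_1", "domain_A"),
    ("disease_1", "domain_B"),
    ("disease_2", "domain_B"),
]
centrality = degree_centrality(edges)
```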
This paper introduces the Entity-Event Knowledge Graph (EEKG) model for clinical data stored in graph databases. We describe how the EEKG model dramatically simplifies the representation of patient data, facilitates temporal queries, enables a 360-degree view of patients, and promotes scalability by partitioning patient data into shards. We solved the practical problem that not all clinical data and life science knowledge can be sharded. The solution is to federate each individual shard with common shared data in a knowledge graph. One such shared data source is the UMLS (Unified Medical Language System) knowledge base, which contains genetic, drug, clinical trial, and Metathesaurus data that we link to individual patient records. We report on several use cases including EMR patient retrieval, matching patients with clinical trials, patient control group selection, and care quality measures.
Regular Expressions for Data Words Libkin, Leonid; Vrgoč, Domagoj
Logic for Programming, Artificial Intelligence, and Reasoning
Book Chapter
Peer reviewed
Open access
In data words, each position carries not only a letter from a finite alphabet, as usual words do, but also a data value coming from an infinite domain. There has been renewed interest in them due to applications in querying and reasoning about data models with complex structural properties, notably XML and, more recently, graph databases. Logical formalisms designed for querying such data often require concise and easily understandable presentations of regular languages over data words.
Our goal, therefore, is to define and study regular expressions for data words. As the automaton model, we take register automata, which are a natural analog of NFAs for data words. We first equip standard regular expressions with limited memory, and show that they capture the class of data words defined by register automata. The complexity of the main decision problems for these expressions (nonemptiness, membership) also turns out to be the same as for register automata. We then look at a subclass of these regular expressions that can define many properties of interest in applications of data words, and show that the main decision problems can be solved efficiently for it.
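As a rough illustration of the automaton model mentioned above, the sketch below simulates a nondeterministic automaton with a single register over data words. It is a simplified stand-in, not the paper's formal definition: the transition format `(source, letter, condition, store, target)`, the condition names `"any"`, `"eq"`, `"neq"`, and the example language are assumptions made for this illustration, and data values are assumed never to be `None`.

```python
def accepts(transitions, initial, accepting, word):
    """Simulate a one-register nondeterministic automaton on a data word.

    transitions: list of (source, letter, condition, store, target), where
                 condition compares the current data value with the register
                 ("any", "eq", "neq") and store says whether to overwrite it.
    word:        list of (letter, data_value) pairs
    """
    # A configuration is (state, register content); None means empty register.
    configs = {(initial, None)}
    for letter, value in word:
        next_configs = set()
        for state, reg in configs:
            for src, sym, cond, store, dst in transitions:
                if src != state or sym != letter:
                    continue
                if cond == "eq" and reg != value:
                    continue
                if cond == "neq" and reg == value:
                    continue
                next_configs.add((dst, value if store else reg))
        configs = next_configs
    return any(state in accepting for state, _ in configs)


# Example language: words over letter "a" (length >= 2) whose first and
# last data values are equal.
transitions = [
    ("q0", "a", "any", True, "q1"),   # store the first data value
    ("q1", "a", "any", False, "q1"),  # skip a middle position
    ("q1", "a", "eq", False, "q2"),   # guess this is the last position
]
first_equals_last = accepts(transitions, "q0", {"q2"}, [("a", 5), ("a", 7), ("a", 5)])
```

The state `q2` has no outgoing transitions, so a run that guesses the last position too early simply dies, which is how the nondeterminism enforces "last".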
Existing Web API search engines allow only category-based browsing and keyword- or tag-based searches for RESTful services, without offering the capability of discovering and composing real-world RESTful services from the viewpoint of application developers. Therefore, we propose a novel approach, referred to as TAD (Transformation-Annotation-Discovery), to address this issue. TAD first transforms OpenAPI (Swagger) documents of RESTful services into a graph structure in a graph database, and provides an annotation engine that automatically annotates each graph node with semantic concepts by using LDA (Latent Dirichlet Allocation) and WordNet. Next, TAD conducts service composition based on the user requirement through two modules: the service discovery chain and pipeline-based composition. The service discovery chain checks service interface compatibility and, based on the Hungarian algorithm, retrieves supported and aided services to bridge the gap between the user requirement and the currently discovered services. The pipeline-based composition module finds services that semantically fit the user's required tasks based on the annotated graph database and sends candidate services to service discovery chains, which simultaneously seek multiple possible composition solutions that fit the user's composition requirement. Experimental results show that the proposed approach performs well under the precision metric.