Keyword search has been widely studied to retrieve relevant substructures from graphs for a given set of keywords. However, existing well-studied approaches aim at finding compact trees/subgraphs containing the keywords, and ignore density, a critical measure of how strongly and stably the keyword nodes are connected in the substructure. In this paper, given a set of keywords <inline-formula><tex-math notation="LaTeX">Q = \lbrace w_1, w_2, \ldots, w_l\rbrace</tex-math> <mml:math><mml:mrow><mml:mi>Q</mml:mi><mml:mo>=</mml:mo><mml:mo>{</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>...</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="zhu-ieq1-2975793.gif"/> </inline-formula>, we study the problem of finding a cohesive subgraph containing <inline-formula><tex-math notation="LaTeX">Q</tex-math> <mml:math><mml:mi>Q</mml:mi></mml:math><inline-graphic xlink:href="zhu-ieq2-2975793.gif"/> </inline-formula> with high density and compactness from a graph <inline-formula><tex-math notation="LaTeX">G</tex-math> <mml:math><mml:mi>G</mml:mi></mml:math><inline-graphic xlink:href="zhu-ieq3-2975793.gif"/> </inline-formula>. We model the cohesive subgraph based on a carefully chosen <inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="zhu-ieq4-2975793.gif"/> </inline-formula>-truss model, and formulate the problem of finding cohesive subgraphs for keyword queries as the minimal dense truss search problem, i.e., finding a minimal subgraph of maximum trussness that covers <inline-formula><tex-math notation="LaTeX">Q</tex-math> <mml:math><mml:mi>Q</mml:mi></mml:math><inline-graphic xlink:href="zhu-ieq5-2975793.gif"/> </inline-formula>. 
However, unlike <inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="zhu-ieq6-2975793.gif"/> </inline-formula>-truss based community search, which can be performed efficiently via local search from a given set of nodes, minimal dense truss search for keyword queries is a nontrivial task because the subset of keyword nodes to be included in the retrieved substructure is not known in advance. To tackle this problem, we first design a novel hybrid KT-Index to keep the keyword and truss information compactly, and then propose an efficient algorithm that performs the search directly on the KT-Index to find the dense truss with the maximum trussness <inline-formula><tex-math notation="LaTeX">G_{den}</tex-math> <mml:math><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math><inline-graphic xlink:href="zhu-ieq7-2975793.gif"/> </inline-formula> without repeated accesses to the original graph. Then, we develop a novel refinement approach to extract the minimal dense truss from the dense truss <inline-formula><tex-math notation="LaTeX">G_{den}</tex-math> <mml:math><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math><inline-graphic xlink:href="zhu-ieq8-2975793.gif"/> </inline-formula>, by checking each node at most once based on the anti-monotonicity property derived from <inline-formula><tex-math notation="LaTeX">k</tex-math> <mml:math><mml:mi>k</mml:mi></mml:math><inline-graphic xlink:href="zhu-ieq9-2975793.gif"/> </inline-formula>-truss, together with several optimization strategies including batch-based deletion, early-stop-based deletion, and local exploration. 
Moreover, we extend the proposed method to handle top-<inline-formula><tex-math notation="LaTeX">r</tex-math> <mml:math><mml:mi>r</mml:mi></mml:math><inline-graphic xlink:href="zhu-ieq10-2975793.gif"/> </inline-formula> search. Extensive experimental studies on real-world networks validate the effectiveness and efficiency of our approaches.
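The k-truss model underlying this abstract is defined by edge trussness (the largest k such that an edge belongs to a k-truss), which is classically computed by iterative support peeling. The sketch below is a minimal pure-Python truss decomposition for illustration only; it is not the paper's KT-Index-based search or refinement algorithm.

```python
from collections import defaultdict

def truss_decomposition(edges):
    """Compute the trussness of every edge via iterative support peeling.

    `edges` is a list of (u, v) pairs; returns {(u, v): trussness} with
    each key normalized so u < v.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    norm = lambda u, v: (u, v) if u < v else (v, u)
    # support(e) = number of triangles that contain edge e
    support = {norm(u, v): len(adj[u] & adj[v]) for u, v in edges}
    trussness = {}
    k = 2
    while support:
        while True:
            # edges whose support cannot sustain membership in a (k+1)-truss
            peel = [e for e, s in support.items() if s <= k - 2]
            if not peel:
                break
            for u, v in peel:
                trussness[(u, v)] = k
                # removing (u, v) breaks one triangle per common neighbor
                for w in adj[u] & adj[v]:
                    for f in (norm(u, w), norm(v, w)):
                        if f in support:
                            support[f] -= 1
                adj[u].discard(v)
                adj[v].discard(u)
                del support[(u, v)]
        k += 1
    return trussness
```

For example, a triangle with a pendant edge attached yields trussness 3 on the triangle edges and 2 on the pendant edge; the anti-monotonicity the paper exploits is visible here, since deleting an edge can only lower the support of its neighbors.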
Trinity
Shao, Bin; Wang, Haixun; Li, Yatao
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data,
06/2013
Conference Proceeding
Computations performed by graph algorithms are data driven, and require a high degree of random data access. Despite great progress in disk technology, disks still cannot provide the level of efficient random access required by graph computation. On the other hand, memory-based approaches usually do not scale due to the capacity limit of single machines. In this paper, we introduce Trinity, a general purpose graph engine over a distributed memory cloud. Through optimized memory management and network communication, Trinity supports fast graph exploration as well as efficient parallel computing. In particular, Trinity leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance. These enable Trinity to support efficient online query processing and offline analytics on large graphs with just a few commodity machines. Furthermore, Trinity provides a high level specification language called TSL for users to declare data schema and communication protocols, which brings great ease-of-use for general purpose graph management and computing. Our experiments show Trinity's performance in both low-latency graph queries and high-throughput graph analytics on web-scale, billion-node graphs.
Life cycle inventory (LCI) databases constitute the basis for the life cycle assessment (LCA) of a product system. LCI data involve complex relationships between the production activities and the environment. The currently used relational database employs a rigid schema structure of two-dimensional tables and lacks direct support for the complex relationships between LCI data. In this paper, a graph database was designed and constructed using the Neo4j graph database management system. First, an LCI knowledge graph (LCIKG) model for the graph database is proposed, which employs the labeled property graph structure and describes the LCI data and the semantic relationships among LCI data concepts. Second, Ecoinvent datasets were successfully used to construct the LCI graph database by automatically extracting Cypher syntax patterns; then, Neo4j was used to store, visualize, and retrieve LCI data. Finally, a set of queries was executed to evaluate the performance of the graph database, and a case study is provided to demonstrate the effectiveness of the proposed graph database. The graph database can effectively reduce the time and effort to query and process LCI data of a product system. Moreover, the dynamic schema of the LCIKG model promotes the scalability and interoperability of LCI data. The completed work provides a feasible solution for the issues and challenges in current LCI research and promotes the wide application of LCA.
•Using graph databases to replace traditional relational databases to store and manage LCI data.•A novel labeled property knowledge graph model for LCI data (LCIKG) is proposed and developed.•An approach based on Cypher syntax patterns to automatically extract nodes and relations of LCIKGs from Ecoinvent datasets.•The wide application prospects of the LCI graph database are shown.
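The idea of automatically extracting Cypher syntax patterns from dataset records can be illustrated with a small generator that renders an activity and its exchange flows as Cypher MERGE statements. The `Activity` and `Flow` labels and the `EXCHANGES` relationship type below are illustrative stand-ins, not the paper's actual LCIKG schema.

```python
def to_cypher(activity, flows):
    """Render one LCI activity and its exchange flows as a Cypher query.

    `activity` is the activity name; `flows` is a list of
    (flow_name, amount) pairs. Labels and relationship types here are
    hypothetical placeholders for the real LCIKG schema.
    """
    lines = [f'MERGE (a:Activity {{name: "{activity}"}})']
    for i, (flow, amount) in enumerate(flows):
        # one MERGE per flow node, plus the exchange edge back to the activity
        lines.append(f'MERGE (f{i}:Flow {{name: "{flow}"}})')
        lines.append(f'MERGE (a)-[:EXCHANGES {{amount: {amount}}}]->(f{i})')
    return "\n".join(lines)
```

Running `to_cypher("steel production", [("CO2", 1.8)])` produces a single query that upserts the activity node, the flow node, and the weighted exchange relationship between them.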
Subject Ontologies represent conceptualizations of disciplinary domains in which concepts symbolize topics that are relevant for the considered domain and are associated with each other by means of specific relations. Usually, these kinds of lightweight ontologies are adopted in knowledge-based educational environments to enable semantic organization and search of resources and, in other cases, to support personalization and adaptation features for learning and teaching experiences. For this reason, applying effective management methodologies for Subject Ontologies is a crucial aspect in engineering such environments. In particular, this paper proposes an approach that uses SKOS (a Semantic Web-based vocabulary providing a standard way to represent knowledge organization systems) for modelling Subject Ontologies. Moreover, the paper underlines the main benefits of SKOS. It focuses on alternative strategies for storing and accessing ontologies in order to support the knowledge sharing, knowledge reuse, planning, assessment, customization, and adaptation processes related to learning scenarios. The results of an early experimentation allowed the authors to define a framework able to support, from both methodological and technological viewpoints, the use of Subject Ontologies in the context of a Semantic Web-based Educational System. The defined framework achieves high performance in terms of response time, which may substantially improve the user experience.
•We defined a framework to use Subject Ontologies in Educational Systems.•Subject Ontologies support learning scenarios by representing themes to treat.•SKOS is a good technical approach to improve the learning experience of users.•We identified a good strategy to treat Subject Ontologies and support SWBESs.•The described approach may improve the performance of the learning processes.
Phillip Ein-Dor advocated that electronic journals be more than a PDF of the established text model. He envisioned a transformation of scholarship. The need for such a transition has only grown since the first issue of JAIS in 2000 because the continuing growth and fragmentation of knowledge limits the generation of new knowledge. We propose drawing on analytics and AI to accelerate and transform scholarship, providing an appropriate tribute to a visionary IS leader.
PHYLODB is a modular and extensible framework for large-scale phylogenetic analyses of sequence-based typing data, which are essential for understanding epidemic evolution. It relies on the Neo4j graph database for data storage and processing, providing a schema and an API for representing and querying phylogenetic data. Custom algorithms are also supported, allowing users to perform heavy computations directly over the data and to store results in the database. Multiple computation results are stored as multilayer networks, promoting and facilitating comparative analyses, as well as avoiding unnecessary ab initio computations. The experimental evaluation results show that PHYLODB is efficient and scalable with respect to both API operations and algorithm execution.
Searching similar graphs in graph databases for a query graph has attracted extensive attention recently. Existing works on graph similarity queries are threshold-based approaches, which return graphs with distances to the query smaller than a given threshold. However, in many applications the number of answer graphs for the same threshold can vary significantly for different queries. In this paper, we study the problem of finding the top-k most similar graphs for a query under the distance measure based on maximum common subgraph (MCS). Since computing MCS is NP-hard, we devise a novel framework to prune unqualified graphs based on lower bounds of the graph distance, and accordingly derive four lower bounds with different tightness and computational cost for pruning. To further reduce the number of MCS computations, we also propose an improved framework based on both lower and upper bounds, and derive three new upper bounds. To support efficient pruning, we design three indexes with different tradeoffs between pruning power and construction cost. To accelerate the index construction, we explore bound relaxation techniques, based on which approximate indexes can be efficiently built. We conducted extensive performance studies on real-life graph datasets to validate the effectiveness and efficiency of our approaches.
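The lower-bound pruning framework described above follows a common filter-and-verify pattern: cheap bounds rule out candidates before the expensive exact (here MCS-based) distance is ever computed. A generic sketch of that pattern, with hypothetical `lower_bound` and `exact_dist` callables standing in for the paper's concrete bounds and distance:

```python
import heapq

def topk_search(query, candidates, k, lower_bound, exact_dist):
    """Top-k search with lower-bound pruning.

    `lower_bound(q, g)` must never exceed `exact_dist(q, g)`; any
    candidate whose bound is already no better than the current k-th
    best distance is skipped without the expensive exact computation.
    """
    best = []  # min-heap of (-distance, graph): best[0] is the k-th best
    for g in candidates:
        if len(best) == k and lower_bound(query, g) >= -best[0][0]:
            continue  # pruned: this candidate cannot enter the top-k
        d = exact_dist(query, g)
        if len(best) < k:
            heapq.heappush(best, (-d, g))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, g))
    return sorted((-nd, g) for nd, g in best)
```

Tighter bounds prune more candidates but cost more to evaluate, which is exactly the tradeoff the paper's four lower bounds and three indexes navigate.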
Graphs are used in many disciplines, including communication, biological, and social networks, as well as mathematics and other fields of science, and graph data management is an active and important area of computer science today. In this research, the authors work on materialization to improve the query response of graph data. The large graph dataset is divided into two categories: one contains the topological data and the other contains the aggregate data, and both are accessed via a PAM (Predicate Aggregate Materialization) engine, which plays an intermediary role. The PAM engine stores query results and, each time a query arrives, checks whether it is new or has already been processed. If the query has already been processed, the engine simply returns the materialized results; if the query is new, the engine extracts the data from the required datasets. After the process completes, the PAM engine materializes the extracted data for reuse. The technique works: it reduces processing time and improves response time.
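The intermediary role described above is essentially result memoization in front of two stores. A minimal sketch under that reading, with the class and store names being illustrative, not the authors' implementation:

```python
class PAMEngine:
    """Sketch of a predicate-aggregate materialization intermediary.

    Repeated queries are answered from materialized results; the
    underlying topology/aggregate datasets are consulted only on a miss.
    """
    def __init__(self, topo_store, agg_store):
        self.topo = topo_store        # topological data
        self.agg = agg_store          # aggregate data
        self.materialized = {}        # query -> previously computed result
    def query(self, q):
        if q in self.materialized:
            return self.materialized[q]          # already processed: reuse
        kind, key = q
        result = (self.topo if kind == "topology" else self.agg)[key]
        self.materialized[q] = result            # materialize for reuse
        return result
```

A repeated query is served from `materialized` without touching either dataset, which is where the reported response-time improvement comes from.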
Graph databases have recently gained a lot of attention in areas where the relationships between data and the data itself are equally important, like the semantic web, social networks, and biological networks. A graph database is simply a database designed to store, query, and modify graphs. Recently, several graph database models have been developed. The goal of this research is to evaluate the performance of the two most popular graph databases, Neo4j and TigerGraph, for network centrality metrics including degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and PageRank. We applied those metrics to a set of real-world networks in both graph databases to measure their performance. Experimental results show Neo4j outperforms TigerGraph for computing the centrality metrics used in this study, but TigerGraph performs better during the data loading phase.
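Of the metrics benchmarked above, PageRank is the least self-explanatory; as a reference point for what the two databases compute, here is a plain power-iteration sketch (not either system's implementation):

```python
def pagerank(adj, d=0.85, iters=50):
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbors]}.

    `d` is the damping factor; ranks always sum to 1.
    """
    nodes = list(adj)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - d) / n for v in nodes}
        for v, outs in adj.items():
            if outs:
                share = d * pr[v] / len(outs)
                for u in outs:
                    nxt[u] += share
            else:
                # dangling node: distribute its rank uniformly
                for u in nodes:
                    nxt[u] += d * pr[v] / n
        pr = nxt
    return pr
```

On a symmetric two-node cycle both nodes converge to rank 0.5, a quick sanity check for any database-computed result.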
In this paper, we approach the design of ID caching technology (IDCT) for graph databases, with the purpose of accelerating queries on graph database data and avoiding redundant graph database query operations, which consume substantial computing resources. Traditional graph database caching technology (GDCT) requires a large amount of memory to store data and suffers from serious data-consistency problems and low cache utilization. To address these issues, we propose a new technology that focuses on an ID allocation mechanism and high-speed ID queries on graph databases. Specifically, the ID of the query result is cached in memory, and data consistency is achieved through real-time synchronization and cache-memory adaptation. In addition, we set up complex queries and simple queries to satisfy all query requirements, and design a cache replacement mechanism based on query action time, query frequency, and memory capacity, thus further improving performance. Extensive experiments show the superiority of our techniques compared with the traditional query approach of graph databases.
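The replacement policy described above scores entries by access time, access count, and capacity. A minimal sketch of such an ID cache, with field names and the exact scoring rule being illustrative assumptions rather than the paper's design (capacity here is an entry count, not bytes):

```python
import time

class IDCache:
    """ID cache sketch: stores node IDs for query results and evicts the
    entry with the fewest hits (ties broken by oldest access time)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # query -> (ids, last_access, hits)
    def get(self, query):
        e = self.entries.get(query)
        if e is None:
            return None
        ids, _, hits = e
        # refresh access time and count on every hit
        self.entries[query] = (ids, time.monotonic(), hits + 1)
        return ids
    def put(self, query, ids):
        if len(self.entries) >= self.capacity and query not in self.entries:
            # evict the lowest-scoring entry: fewest hits, then oldest access
            victim = min(self.entries,
                         key=lambda q: (self.entries[q][2], self.entries[q][1]))
            del self.entries[victim]
        self.entries[query] = (ids, time.monotonic(), 0)
```

Caching only IDs keeps entries small, which is the memory advantage the paper claims over caching full query results.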