The Resource Description Framework (RDF) is a core ingredient and data representation format for Linked Data and the Semantic Web. It supports a generic graph-based data model and data representation format for describing things, including their relationships with other things. As RDF datasets grow rapidly, RDF data management systems must be able to cope with ever-larger amounts of data. Even though physically handling RDF data using a relational table is possible, querying a giant triple table becomes very expensive because of the multiple nested joins required for answering graph queries. In addition, the heterogeneity of RDF data poses entirely new challenges to database systems. This article provides a comprehensive study of the state of the art in handling and querying RDF data. In particular, we focus on data storage techniques, indexing strategies, and query execution mechanisms. Moreover, we provide a classification of existing systems and approaches. We also give an overview of the various benchmarking efforts in this context and discuss some of the open problems in this domain.
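To make the join-cost argument concrete, here is a minimal sketch (Python with SQLite; the table layout, resource names, and example data are hypothetical, not taken from the paper) of the single giant triple table: every RDF statement is one (subject, predicate, object) row, so each additional triple pattern in a graph query adds another self-join.

    # A minimal sketch of the "giant triple table" layout: every RDF
    # statement becomes one (subject, predicate, object) row, and each
    # extra triple pattern in a query costs one more self-join.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("ex:alice", "ex:knows", "ex:bob"),
            ("ex:bob", "ex:worksFor", "ex:acme"),
            ("ex:acme", "ex:locatedIn", "ex:berlin"),
        ],
    )

    # Graph query: "Where is the employer of someone Alice knows located?"
    # Three triple patterns -> two self-joins; on billions of rows, chains
    # of such joins are what make the single-table approach expensive.
    rows = conn.execute(
        """
        SELECT t3.o
        FROM triples t1
        JOIN triples t2 ON t2.s = t1.o AND t2.p = 'ex:worksFor'
        JOIN triples t3 ON t3.s = t2.o AND t3.p = 'ex:locatedIn'
        WHERE t1.s = 'ex:alice' AND t1.p = 'ex:knows'
        """
    ).fetchall()
    print(rows)  # [('ex:berlin',)]

Avoiding this join nesting through dedicated storage layouts and indexes is precisely what the RDF storage and indexing techniques surveyed in the article aim at.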
This paper presents a model for Enterprise Application Integration (EAI) in the modern era of data explosion and globalisation. Application here refers to software, which is in essence a data system, and data refers to both information and knowledge (data serves as a vehicle for information as well as knowledge). The salient features of the model are: (1) separation of business functions from applications and enterprises; (2) a three-layer architecture (conceptual or semantic level, external or application level, internal or realisation level); and (3) integration of structured, semi-structured and non-structured data. To the best of our knowledge, no existing EAI model or solution exhibits all three features. A case study is presented to illustrate how the model works. The model can be used by an individual enterprise or by a group of enterprises that form a network, e.g., a holistic supply chain network.
Construction schedules are written instructions for construction execution, shared between stakeholders to exchange essential project information. However, construction schedules are semi-structured data that lack semantic detail and coherence within and across projects. This study proposes an ontology-based Recurrent Neural Network approach to translate bi-directionally between human-written language and a machine-readable ontological language. The proposed approach is assessed in three areas: text generation accuracy, machine readability, and human understandability. The study collected 30 project schedules with 19,589 activities from a Tier-1 contractor in the UK. The experimental results indicate that: (1) the precision and recall of the text-generation LSTM-RNN model are 0.991 and 0.874, respectively; (2) schedule readability improved through increased semantic distinctiveness, measured as a reduction in cosine similarity from 0.995 to 0.990 (p < 0.01); (3) schedule understandability improved from 75.90% to 85.55%. The proposed approach formalises text descriptions in construction schedules and other construction documents with less labour investment. It helps contractors establish knowledge management systems that learn from historical data and support more informed decisions in similar future scenarios.
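As an illustration of the distinctiveness metric reported above, the sketch below computes cosine similarity between two activity descriptions. The bag-of-words representation and the example strings are stand-ins; the abstract does not specify which text representation the study used.

    # Illustration of the distinctiveness metric: cosine similarity between
    # vector representations of two activity descriptions. Near-duplicate
    # schedule lines score close to 1.0; lower similarity means a more
    # distinctive, more readable schedule.
    from collections import Counter
    from math import sqrt

    def cosine_similarity(text_a: str, text_b: str) -> float:
        a = Counter(text_a.lower().split())
        b = Counter(text_b.lower().split())
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    print(cosine_similarity("install level 2 drywall", "install level 3 drywall"))  # 0.75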
Developers often prefer flexibility over upfront schema design, making semi-structured data formats such as JSON increasingly popular. Large amounts of JSON data are therefore stored and analyzed by relational database systems. In existing systems, however, JSON's lack of a fixed schema results in slow analytics. In this paper, we present JSON tiles, which, without losing the flexibility of JSON, enables relational systems to perform analytics on JSON data at native speed. JSON tiles automatically detects the most important keys and extracts them transparently, often achieving scan performance similar to columnar storage. At the same time, JSON tiles is capable of handling heterogeneous and changing data. Furthermore, we automatically collect statistics that enable the query optimizer to find good execution plans. Our experimental evaluation compares against state-of-the-art systems and research proposals and shows that our approach is both robust and efficient.
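The key-extraction idea can be sketched in a few lines: scan a batch of JSON documents, find the keys that appear in nearly every document, and materialize those as columns while rare keys stay in a per-row JSON remainder. The threshold, function names, and example documents below are illustrative simplifications, not part of the actual system.

    # Simplified sketch of key extraction: frequently occurring keys become
    # column-like value lists; rare/changing keys remain per-row JSON.
    import json

    docs = [
        '{"id": 1, "name": "a", "price": 9.5}',
        '{"id": 2, "name": "b", "price": 3.0, "discount": 0.1}',
        '{"id": 3, "name": "c", "price": 7.2}',
    ]

    def split_hot_keys(raw_docs, threshold=0.9):
        parsed = [json.loads(d) for d in raw_docs]
        counts = {}
        for doc in parsed:
            for key in doc:
                counts[key] = counts.get(key, 0) + 1
        hot = {k for k, c in counts.items() if c / len(parsed) >= threshold}
        # "Columnar" part: one list of values per frequently occurring key.
        columns = {k: [doc.get(k) for doc in parsed] for k in hot}
        # Everything else stays as a per-row JSON remainder.
        rest = [{k: v for k, v in doc.items() if k not in hot} for doc in parsed]
        return columns, rest

    columns, rest = split_hot_keys(docs)
    print(sorted(columns))  # ['id', 'name', 'price']
    print(rest)             # [{}, {'discount': 0.1}, {}]

Scans over the extracted value lists can then proceed at near-columnar speed, while heterogeneous documents still round-trip losslessly through the remainder.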
The data produced by various services should be stored and managed in an appropriate format so that valuable knowledge can be extracted from it conveniently. This need has led to the emergence of various data models, including relational, semi-structured, and graph models. Since the mature relational databases built on the relational model still dominate today's market, there is strong interest in storing and processing semi-structured data and graph data in relational databases, so that the mature and powerful capabilities of relational databases can be applied to these diverse kinds of data. In this survey, we review existing methods for mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches that draw lessons from each model's mapping strategies, as well as a new research topic: mapping multi-model data into relational tables.
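One classic strategy from this design space is the edge-table mapping, sketched below: a semi-structured tree is shredded into rows of (node_id, parent_id, label, value), so documents of any shape fit one generic relational schema. The function name and the example document are hypothetical, and real systems add many refinements (ordering columns, type columns, inlining) on top of this basic idea.

    # Sketch of the generic "edge table" mapping: every node of a
    # semi-structured tree becomes one (node_id, parent_id, label, value)
    # row, assigned in pre-order.
    def shred(value, label="root", rows=None, parent=None):
        rows = [] if rows is None else rows
        node_id = len(rows)
        if isinstance(value, dict):
            rows.append((node_id, parent, label, None))
            for key, child in value.items():
                shred(child, key, rows, node_id)
        elif isinstance(value, list):
            rows.append((node_id, parent, label, None))
            for child in value:
                shred(child, label + "[]", rows, node_id)
        else:
            rows.append((node_id, parent, label, value))
        return rows

    doc = {"order": {"id": 17, "items": [{"sku": "A"}, {"sku": "B"}]}}
    for row in shred(doc):
        print(row)
    # (0, None, 'root', None)
    # (1, 0, 'order', None)
    # (2, 1, 'id', 17)
    # (3, 1, 'items', None)
    # (4, 3, 'items[]', None)
    # (5, 4, 'sku', 'A')
    # ...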
The process of converting natural language specifications into conceptual models requires detailed analysis of natural language text, and designers frequently make mistakes when undertaking this transformation manually. Although many approaches have been used to partly automate this process, one of the main limitations is the lack of a domain-independent ontology that can be used as a repository for entities and relationships to guide the transformation. In this paper, a semi-automated system for mapping natural language text into conceptual models is proposed. The system, called SACMES, combines a linguistic approach with an ontological approach and human intervention to achieve the task. SACMES learns from the natural language specifications that it processes and stores what it learns in a conceptual model ontology and a user history knowledge database. It then uses the stored information to improve performance and reduce the need for human intervention. The evaluation of SACMES demonstrates that: (1) using the system, users' precision and recall in identifying entities of conceptual models increase by 6% and 13%, respectively, while the gains for relationships are even higher: 14% for precision and 23% for recall; (2) the system's performance improves as it processes more natural language requirements, further decreasing the need for human intervention.
Linked Data on the Web is created from structured sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge, i.e., both operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms were used in building the DBpedia 3.9 release: SDType added 3.4 million missing type statements, and SDValidate removed 13,000 erroneous RDF statements from the knowledge base.
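The statistical intuition behind SDType can be sketched as weighted voting: each property used with a resource votes for the types that subjects of that property typically have, and types whose aggregate score clears a confidence threshold are added as new type statements. The distributions, property names, and threshold below are invented for illustration; the actual weighting scheme in the paper is more elaborate.

    # Hedged sketch of SDType-style type inference via property voting.
    from collections import defaultdict

    # P(type | resource is the subject of this property), e.g. estimated by
    # counting over the already-typed part of the knowledge base.
    type_dist = {
        "dbo:birthPlace": {"dbo:Person": 0.95, "dbo:Place": 0.02},
        "dbo:author":     {"dbo:Work": 0.90, "dbo:Person": 0.05},
    }

    def infer_types(properties, threshold=0.5):
        scores = defaultdict(float)
        for prop in properties:
            for rdf_type, p in type_dist.get(prop, {}).items():
                scores[rdf_type] += p / len(properties)  # average the votes
        return {t for t, s in scores.items() if s >= threshold}

    # An untyped resource appearing as subject of dbo:birthPlace is very
    # likely a person.
    print(infer_types(["dbo:birthPlace"]))  # {'dbo:Person'}

Because the distributions are learned from the data itself, no external knowledge is needed, which is what makes the approach applicable to any sufficiently large knowledge base.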
Knowledge graphs have been a hot topic in both public and private domains for the past decade, typically used for large-scale integration and analysis of data based on graph data models. One of the central concepts in this area is the Semantic Web, with its vision of giving information and services on the Web a well-defined meaning through a set of standards. In particular, linked data and ontologies have been essential for data sharing, discovery, integration, and reuse. In this paper, we provide a systematic literature review on knowledge graph creation from structured and semi-structured data sources using Semantic Web technologies. The review covers four prominent publication venues, namely the Extended Semantic Web Conference, the International Semantic Web Conference, the Journal of Web Semantics, and the Semantic Web Journal. It highlights the tools, methods, types of data sources, ontologies, and publication methods, together with the challenges, limitations, and lessons learned in the knowledge graph creation processes.