After more than two decades of use, XML is not only a standard format for exchanging data between different (Web) applications but also the data model of a family of emerging NoSQL databases, called XML databases. In addition to its efficiency in managing conventional data (i.e., data without a temporal reference), XML, thanks to its hierarchical structure, is also an excellent support for storing, manipulating, and querying temporal data. Moreover, although several survey papers have dealt with multiple aspects of XML data, such as XML data modeling, storage, indexing, and querying, there is no survey on XML data manipulation (i.e., XML data insertion, deletion, and modification), a topic that continues to interest researchers in the database community, in both conventional and temporal XML databases. For that reason, we think it is worthwhile to have a paper that reviews and compares research contributions dealing with this topic. Our paper therefore (i) provides an overview of the state of the art of XML data manipulation in conventional and temporal XML databases, (ii) studies the support of such functionality in mainstream commercial DBMSs, and (iii) gives some remarks on possible future research directions related to this issue.
ABSTRACT
The writing of digital text documents has become a lengthy process that usually goes through several revision rounds. Document comparison is important for human readers interested in the changes made by the authors. These documents contain structured data, and text‐centric XML is one of their main storage formats. Current XML diff algorithms represent differences with a limited set of edit operations: insert, delete, move, and update. This approach does not fit the scope of digital text document comparison, where the human reader needs to understand the actual modifications made by the author. Since JATS is a text‐centric XML vocabulary, we propose in this paper a new XML diff algorithm, called jats‐diff, that supports a bijection between higher‐level modifications made by the authors, such as structural changes and restyling, and the changes detected between XML documents. In addition, jats‐diff provides similarity information between different nodes in order to measure the impact of text changes on the XML tree.
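To make the limitation concrete, the sketch below computes the kind of low-level edit script that a conventional diff produces for one level of an XML tree: it aligns the children of two elements with Python's `difflib.SequenceMatcher` and reports only `insert`, `delete`, and `update` operations. This is a hypothetical illustration (the function name `child_edit_script` and the JATS-like sample are assumptions), not jats‐diff; it shows how a rewording or restyling surfaces only as opaque node-level updates.

```python
import difflib
import xml.etree.ElementTree as ET


def child_edit_script(old: ET.Element, new: ET.Element) -> list:
    """Naive single-level diff: align children by (tag, text) and emit basic edit operations."""
    a = [(c.tag, (c.text or "").strip()) for c in old]
    b = [(c.tag, (c.text or "").strip()) for c in new]
    ops = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace":
            # Pair replaced children position by position; leftovers become deletes/inserts.
            for k in range(max(i2 - i1, j2 - j1)):
                if i1 + k < i2 and j1 + k < j2:
                    ops.append(("update", i1 + k, a[i1 + k], b[j1 + k]))
                elif i1 + k < i2:
                    ops.append(("delete", i1 + k, a[i1 + k]))
                else:
                    ops.append(("insert", j1 + k, b[j1 + k]))
        elif tag == "delete":
            ops.extend(("delete", i, a[i]) for i in range(i1, i2))
        else:  # "insert"
            ops.extend(("insert", j, b[j]) for j in range(j1, j2))
    return ops


old = ET.fromstring("<sec><title>Intro</title><p>First.</p><p>Second.</p></sec>")
new = ET.fromstring("<sec><title>Introduction</title><p>First.</p><p>Third.</p><p>Second.</p></sec>")
print(child_edit_script(old, new))
```

Delete positions refer to the old child list and insert positions to the new one; a real XML diff would also recurse into subtrees and detect moves.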
Consider an ordered, static tree T where each node has a label from an alphabet Σ. Tree T may be of arbitrary degree and shape. Our goal is to design a compressed storage scheme for T that supports basic navigational operations among the immediate neighbors of a node (e.g., parent, i-th child, or any child with some label) as well as more sophisticated path-based search operations over its labeled structure.
We present a novel approach to this problem by designing what we call the XBW-transform of the tree, in the spirit of the well-known Burrows-Wheeler transform for strings (Burrows and Wheeler 1994). The XBW-transform uses path-sorting to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. For the first time, by using the properties of the XBW-transform, our compressed indexes go beyond the information-theoretic lower bound and support navigational and path-search operations over labeled trees within (near-)optimal time bounds and entropy-bounded space.
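The path-sorting step can be sketched in a few lines. Assuming a tree represented as hypothetical `(label, children)` tuples, the code below records for each node a triple (last-child bit, label, upward path from its parent to the root), lists the triples in pre-order, and stably sorts them by the upward path; the two resulting arrays play the roles of the structure array and the label array. This is a simplified illustration: it omits the marking of leaf labels and all of the succinct rank/select machinery needed for the compressed index.

```python
from typing import List, Tuple

# Hypothetical tree representation for this sketch: (label, [children...]).
Tree = Tuple[str, list]


def xbw_transform(root: Tree) -> Tuple[List[int], List[str]]:
    """Linearize a labeled tree into (S_last, S_alpha) by stable path-sorting."""
    triples = []

    def visit(node: Tree, is_last: bool, upward: Tuple[str, ...]) -> None:
        label, children = node
        triples.append((1 if is_last else 0, label, upward))
        child_path = (label,) + upward  # upward path as seen by this node's children
        for i, child in enumerate(children):
            visit(child, i == len(children) - 1, child_path)

    visit(root, True, ())  # the root is treated as a "last child" with an empty path
    # Python's sort is stable, so nodes with equal upward paths keep pre-order.
    triples.sort(key=lambda t: t[2])
    s_last = [t[0] for t in triples]
    s_alpha = [t[1] for t in triples]
    return s_last, s_alpha


tree = ("A", [("B", [("D", []), ("E", [])]), ("C", [("D", [])])])
s_last, s_alpha = xbw_transform(tree)
print(s_last)   # [1, 0, 1, 0, 1, 1]
print(s_alpha)  # ['A', 'B', 'C', 'D', 'E', 'D']
```

Children of the same node end up contiguous in both arrays, with a 1 in `S_last` marking where each sibling group ends; this is what lets navigation be answered on the linearized form.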
Our XBW-transform is simple and likely to spur new results in the theory of tree compression and indexing, as well as interesting application contexts. As an example, we use the XBW-transform to design and implement a compressed index for XML documents whose compression ratio is significantly better than that achievable by state-of-the-art tools, and whose query time is orders of magnitude faster.
XML-based attacks are executed against web applications through crafted XML documents that force the XML parser to process unvalidated content. Such attacks can lead to disclosure of sensitive information, malicious code execution, and disruption of services. OWASP ranked XML-based attacks (XML External Entities, XXE) fourth in its 2017 Top 10 list of vulnerabilities. Most of the reported vulnerabilities involving XML documents range from high to critical severity and require immediate attention. According to the National Vulnerability Database, 152 such vulnerabilities were reported in the first five months of 2019. Several classifications of XML vulnerabilities exist, but each is limited to a specific vulnerability. In this paper, the authors propose a classification of XML-based vulnerabilities based on an exhaustive literature survey. Approaches and strategies to mitigate these vulnerabilities are also presented. This work will help web developers design secure parsers that thwart such attacks.
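One commonly recommended, conservative mitigation against XXE and entity-expansion attacks is to refuse DTDs altogether. The sketch below shows this idea with Python's standard library only: it pre-scans the input with `xml.parsers.expat` and aborts as soon as a DOCTYPE or entity declaration appears, and only then hands the bytes to `ElementTree`. The names `UnsafeXMLError` and `safe_fromstring` are hypothetical; this is one illustrative strategy, not the classification or mitigations proposed in the paper.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class UnsafeXMLError(ValueError):
    """Raised when a document contains constructs this sketch refuses to parse."""


def _reject_dtd(data: bytes) -> None:
    """Pre-scan the document with expat and abort on any DTD or entity declaration."""
    scanner = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise UnsafeXMLError(f"DOCTYPE declaration not allowed: {name!r}")

    def on_entity_decl(*args):
        raise UnsafeXMLError("entity declarations are not allowed")

    scanner.StartDoctypeDeclHandler = on_doctype
    scanner.EntityDeclHandler = on_entity_decl
    # An exception raised inside a handler stops parsing and propagates out of Parse().
    scanner.Parse(data, True)


def safe_fromstring(data: bytes) -> ET.Element:
    """Parse XML only after confirming that it declares no DTD (blocks XXE payloads)."""
    _reject_dtd(data)
    return ET.fromstring(data)


print(safe_fromstring(b"<order><id>42</id></order>").find("id").text)  # 42
```

Rejecting every DTD is deliberately strict: it also refuses harmless documents that carry a DOCTYPE, which is often an acceptable trade-off for data-exchange endpoints.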
• We describe to what extent the specificities of DTDs/XSDs affect the Schema Matching task.
• We introduce a template to describe the main components of an XML Matcher, their role and behavior.
• We illustrate how each of these components has been implemented in some popular XML Matchers.
• We analyze commercial tools implementing XML Matchers.
• We discuss XML source clustering and uncertainty management in XML Matchers.
Schema Matching, i.e., the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in the Database and Artificial Intelligence research areas for many years. In the past, it was investigated mainly for classical database models (e.g., E/R schemas, relational databases, etc.). However, in recent years, the widespread adoption of XML in the most disparate application fields has pushed a growing number of researchers to design XML-specific Schema Matching approaches, called XML Matchers, aimed at finding semantic matchings between concepts defined in DTDs and XSDs. XML Matchers do not just take well-known techniques originally designed for other data models and apply them to DTDs/XSDs; they exploit specific XML features (e.g., the hierarchical structure of a DTD/XSD) to improve the performance of the Schema Matching process. The design of XML Matchers is currently a well-established research area. The main goal of this paper is to provide a detailed description and classification of XML Matchers. We first describe to what extent the specificities of DTDs/XSDs affect the Schema Matching task. Then we introduce a template, called XML Matcher Template, that describes the main components of an XML Matcher, their role and behavior. We illustrate how each of these components has been implemented in some popular XML Matchers. We consider our XML Matcher Template as the baseline for objectively comparing approaches that, at first glance, might appear unrelated. The introduction of this template can be useful in the design of future XML Matchers. Finally, we analyze commercial tools implementing XML Matchers and introduce two challenging issues strictly related to this topic, namely XML source clustering and uncertainty management in XML Matchers.
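As a toy illustration of how hierarchical structure can help matching, the hypothetical matcher below scores each pair of slash-separated element paths as a weighted sum of leaf-name similarity and ancestor-path similarity, using `difflib.SequenceMatcher.ratio()` as a stand-in string measure. The weight `alpha`, the `threshold`, and the path inputs are assumptions chosen for the example; real XML Matchers combine many more matchers and constraints.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def name_sim(x: str, y: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()


def match_paths(
    source: List[str], target: List[str], alpha: float = 0.7, threshold: float = 0.6
) -> Dict[str, Tuple[str, float]]:
    """For each source path, pick the best target path by leaf-name plus ancestor-path similarity."""
    result: Dict[str, Tuple[str, float]] = {}
    for s in source:
        s_parts = s.split("/")
        best, best_score = None, 0.0
        for t in target:
            t_parts = t.split("/")
            leaf = name_sim(s_parts[-1], t_parts[-1])
            # Structural context: compare the ancestor paths, not just the element names.
            context = name_sim("/".join(s_parts[:-1]), "/".join(t_parts[:-1]))
            score = alpha * leaf + (1 - alpha) * context
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            result[s] = (best, best_score)
    return result


matches = match_paths(
    ["book/title", "book/isbn", "author/name"],
    ["book/title", "author/name", "publisher/name"],
)
print(matches)
```

Here `author/name` and `publisher/name` have identical leaf names, and it is the ancestor context that makes `author/name` win; `book/isbn` stays unmatched because no candidate reaches the threshold.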
Since the boom in new proposals for efficient querying of XML data is now over and the research world has shifted its attention toward new types of data formats, we believe it is crucial to review what has been done in the area, to help users choose an appropriate strategy and to help scientists exploit these contributions in new areas of data processing. The aim of this work is to provide a comprehensive study of the state of the art of approaches for the structural querying of XML data. In particular, we start with a description of labeling schemes that capture the structure of the data and the respective storage strategies. Then we deal with the key parts of XML query processing: twig query joins, XML query algebras, query plan optimization, and selectivity estimation of XML queries. To the best of our knowledge, this is the first work that provides such a detailed description of XML query processing techniques related to structural aspects, including their theoretical and practical features as well as their mutual compatibility and general usability.
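A classic example of a labeling scheme is region (interval) encoding: one depth-first traversal assigns each element a `(start, end, level)` triple, after which ancestor/descendant and parent/child tests reduce to integer comparisons. The sketch below is an illustrative, generic version of that idea using `xml.etree.ElementTree`; the helper names are hypothetical, and the nested-loop `structural_join` is deliberately naive rather than an optimized twig join.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Label = Tuple[int, int, int]  # (start, end, level)


def region_labels(root: ET.Element) -> Dict[ET.Element, Label]:
    """Assign (start, end, level) region labels with one depth-first traversal."""
    labels: Dict[ET.Element, Label] = {}
    counter = 0

    def visit(elem: ET.Element, level: int) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in elem:
            visit(child, level + 1)
        labels[elem] = (start, counter, level)
        counter += 1

    visit(root, 0)
    return labels


def is_ancestor(a: Label, d: Label) -> bool:
    """a is a proper ancestor of d iff a's region strictly contains d's region."""
    return a[0] < d[0] and d[1] < a[1]


def is_parent(a: Label, d: Label) -> bool:
    """Parent/child additionally requires adjacent levels."""
    return is_ancestor(a, d) and d[2] == a[2] + 1


def structural_join(labels: Dict[ET.Element, Label], anc: str, desc: str) -> List[tuple]:
    """Naive nested-loop evaluation of //anc//desc over the labels."""
    ancs = [e for e in labels if e.tag == anc]
    descs = [e for e in labels if e.tag == desc]
    return [(a, d) for a in ancs for d in descs if is_ancestor(labels[a], labels[d])]


root = ET.fromstring("<a><b><c/></b><d/></a>")
labels = region_labels(root)
print(labels[root], labels[root.find("b/c")])  # (0, 7, 0) (2, 3, 2)
```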
Purpose
Any XML schema definition can be organized according to one of the following design styles: “Russian Doll”, “Salami Slice”, “Venetian Blind” and “Garden of Eden” (with the additional “Bologna” style actually representing the absence of style). Conversion from one design style to another can facilitate the reuse and exchange of schema specifications encoded using the XML schema language. Without computer-aided engineering support, style conversions must be performed very carefully, as they are difficult and error-prone operations. The purpose of this paper is to deal efficiently with such XML schema design style conversions.
Design/methodology/approach
A general approach, named StyleVolution, is proposed for the automatic management of XML schema design style conversions. StyleVolution is equipped with a suite of seven procedures: four for converting a valid XML schema from any other design style to the “Garden of Eden” style, which has been chosen as the normalized XML schema format, and three for converting from the “Garden of Eden” style to any of the other desired design styles.
Findings
Procedures, algorithms and methods for XML schema design style conversions are presented. The feasibility of the approach has been shown by implementing (in the XQuery language) and testing (with the Altova XMLSpy 2019 tool) a suite of seven ready-to-use procedures. Moreover, four test procedures are provided for checking the conformance of a given input XML schema to a schema design style.
Originality/value
The proposed approach implements a new technique for efficiently managing XML schema design style conversions, which can be used to make any given XML schema file conform to a desired design style.
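To show what a conformance test for design styles involves, the sketch below classifies an XSD with simplified, textbook-style rules: Russian Doll has one global element and only anonymous local types; Salami Slice declares every element globally (others are referenced with `ref`) without named types; Venetian Blind has one global element plus named global types; Garden of Eden has all elements and all types global. This is a hypothetical Python/ElementTree heuristic, not the paper's XQuery procedures, and real schemas (groups, attributes, imports) would need more rules.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
TYPE_TAGS = (XS + "complexType", XS + "simpleType")


def classify_style(xsd_text: str) -> str:
    """Heuristically classify an XSD into one of the classic design styles (simplified rules)."""
    schema = ET.fromstring(xsd_text)
    global_elems = schema.findall(XS + "element")  # direct children of xs:schema only
    global_ids = {id(e) for e in global_elems}
    named_types = [t for t in schema if t.tag in TYPE_TAGS and t.get("name")]
    # Local element declarations are nested and named; ref="..." uses are not declarations.
    local_decls = [
        e for e in schema.iter(XS + "element")
        if id(e) not in global_ids and e.get("name") is not None
    ]
    anonymous_types = [t for t in schema.iter() if t.tag in TYPE_TAGS and t.get("name") is None]

    if local_decls:
        if len(global_elems) == 1 and not named_types:
            return "Russian Doll"
        if len(global_elems) == 1 and named_types and not anonymous_types:
            return "Venetian Blind"
    else:
        if not named_types:
            return "Salami Slice"
        if not anonymous_types:
            return "Garden of Eden"
    return "Bologna"  # mixed or no recognizable style


NS = 'xmlns:xs="http://www.w3.org/2001/XMLSchema"'
russian_doll = (
    f'<xs:schema {NS}><xs:element name="book"><xs:complexType><xs:sequence>'
    '<xs:element name="title" type="xs:string"/>'
    "</xs:sequence></xs:complexType></xs:element></xs:schema>"
)
print(classify_style(russian_doll))  # Russian Doll
```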
The pebble tree automaton and the pebble tree transducer are enhanced by additionally allowing an unbounded number of “invisible” pebbles (as opposed to the usual “visible” ones). The resulting pebble tree automata recognize the regular tree languages (i.e., they can validate all generalized DTDs) and hence can find all matches of MSO definable patterns. Moreover, when viewed as a navigational device, they lead to an XPath-like formalism that has a path expression for every MSO definable binary pattern. The resulting pebble tree transducers can apply arbitrary MSO definable tests to (the observable part of) their configurations, they (still) have a decidable typechecking problem, and they can model the recursion mechanism of XSLT. The time complexity of the typechecking problem for conjunctive queries that use MSO definable patterns can often be reduced through the use of invisible pebbles.
Although multi-temporal XML databases supporting schema versioning are used in several domains, such as e-commerce, e-health, and e-government, existing database management systems and XML tools do not provide any support for managing (inserting, updating, and deleting) temporal XML data or temporal XML schema versioning. In addition, whereas much research work in the last decade has focused on schema versioning in temporal XML databases, hardly any attention has been devoted to manipulating data in such databases. To fill this theoretical and practical gap, we propose in this paper a generic approach, named TempoX (Temporal XML), for data manipulation in multi-temporal and multi-schema-version XML databases. Specifically, we (i) define a new multi-temporal XML data model supporting temporal schema versioning, named TempoXDM (Temporal XML Data Model), (ii) introduce the principles on which our approach is based, and (iii) provide the specifications of the basic data manipulation operations: “insert”, “replace”, “evolve”, and “delete”. Moreover, to show the feasibility of TempoX, we use it to propose a temporal XML update language, named TempoXUF (Temporal XQuery Update Facility), as an extension of the W3C XQuery Update Facility language to temporal and versioning aspects. Furthermore, to validate our language proposal, we develop a system prototype, named TempoXUF-Manager, that supports TempoXUF.
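The general idea behind valid-time data manipulation can be sketched with plain ElementTree: every version of an element carries a validity interval, a deletion only closes the current interval, and an evolution closes it and opens a new version. Everything here is hypothetical and generic (the `vtBegin`/`vtEnd` attribute names, the `"now"` marker, ISO date strings compared lexicographically, and the function names); it is not TempoXDM or the TempoXUF semantics, and it ignores transaction time and schema versioning.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NOW = "now"  # hypothetical marker for an interval that is still open


def current_version(parent: ET.Element, tag: str) -> Optional[ET.Element]:
    """Return the child version of `tag` whose validity interval is still open."""
    for e in parent.findall(tag):
        if e.get("vtEnd") == NOW:
            return e
    return None


def temporal_insert(parent: ET.Element, tag: str, text: str, t: str) -> ET.Element:
    """Add a new version valid from t onward; refuse if a current version exists."""
    if current_version(parent, tag) is not None:
        raise ValueError(f"a current <{tag}> already exists")
    e = ET.SubElement(parent, tag, {"vtBegin": t, "vtEnd": NOW})
    e.text = text
    return e


def temporal_delete(parent: ET.Element, tag: str, t: str) -> None:
    """Logical deletion: close the current version's interval instead of removing it."""
    cur = current_version(parent, tag)
    if cur is None:
        raise ValueError(f"no current <{tag}> to delete")
    cur.set("vtEnd", t)


def temporal_evolve(parent: ET.Element, tag: str, text: str, t: str) -> ET.Element:
    """Close the current version at t and open a new version starting at t."""
    temporal_delete(parent, tag, t)
    return temporal_insert(parent, tag, text, t)


def value_at(parent: ET.Element, tag: str, t: str) -> Optional[str]:
    """Time-slice query: value of `tag` valid at instant t (intervals are [vtBegin, vtEnd))."""
    for e in parent.findall(tag):
        end = e.get("vtEnd")
        if e.get("vtBegin") <= t and (end == NOW or t < end):
            return e.text
    return None


emp = ET.Element("employee")
temporal_insert(emp, "salary", "3000", "2020-01-01")
temporal_evolve(emp, "salary", "3500", "2022-07-01")
print(ET.tostring(emp, encoding="unicode"))
```

Because old versions are never physically removed, past states stay queryable with `value_at`, which is the main property a temporal update language has to preserve.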