This paper presents a corpus of pre-processed Mexican laws for computational tasks. The main contributions are the proposed JSON structure and the methodology used to achieve the semi-structured ...corpus with the selected algorithms. Law PDF documents were transformed into plain text, unified by a deconstruction of law–document structure, and labeled with natural language processing techniques considering part of speech (PoS); a process of entity extraction was also performed. The corpus includes the Mexican constitution and the Mexican laws that were collected from the official site in PDF format repealed before 14 October 2021. The collection has 305 documents, including: the Mexican constitution, 289 laws, 8 federal codes, 3 regulations, 2 statutes, 1 decree, and 1 ordinance. The semi-structured database includes the transformation of the set of laws from PDF format to a digital representation in order to facilitate its computational analysis. The documents were migrated to JSON type files to represent internal hierarchical relations. In addition, basic natural language processing techniques were implemented on laws for the identification of part of speech and named entities. The presented data set is mainly useful for text analysis and data science. It could be used for various legislative analysis tasks including: comprehension, interpretation, translation, classification, accessibility, coherence, and searches. Finally, we present some statistic of the identified entities and an example of the usefulness of the corpus for environmental laws.
References to parts of structured documents use their structure to locate the piece of document which is the reference target. On the other hand, XML has become an increasingly important language for ...structured documents. One of its most important related languages is XPath, the language that permits fragments of XML documents to be selected. In this article we present a methodology, and an application case, to automatically extract and solve references to fragments of structured documents. This approach combines structure manipulation and information extraction, to enhance reference extraction tools by improving the precision of the references extracted. We take advantage of XML markup to locate the position within the structure in which the references are found. The use of XPath, one of the most important XML related languages, for reference resolution is original: the resolution tool automatically builds XPath expressions. This proposal is inspired (and implemented) from our work with legislative documents.
The foundation of any activity of all state bodies must primarily be based on state legislation. The military field, which is narrower than the defense field, with all the characteristics and ...properties of the largest collection of kinetic energy and at the same time the integrated energy of state institutionalized coercion, is even more dependent on the meaning and purpose of state legislation. It would be recommended that in the future Slovenian legislation in the military field would more effectively and above all more successfully follow international developments in the field of military legislation, science and military research and development. It should be emphasized that interdisciplinary content in the military field requires special attention, caution and, above all, far-reaching predictability, which is extremely difficult in the modern world, but not impossible. The predictability of future military events requires clear, target-specific and unambiguous legislative content.
This paper surveys the main data models used in projects including the management of changes in digital normative legislation. Models have been classified based on a set of criteria, which are also ...proposed in the paper. Some projects have been chosen as representative for each kind of model. The advantages and problems of each type are analysed, and future trends are identified.