The OpenScience Slovenia metadata dataset contains metadata entries for Slovenian public-domain academic documents, including undergraduate and postgraduate theses, research and professional articles, and other academic document types. The data was collected as part of the establishment of the Slovenian Open-Access Infrastructure, which defined a unified document collection and cataloguing process for Slovenian universities within the infrastructure repositories. It was gathered from several established but separate library systems in Slovenia and merged into a single metadata scheme using metadata deduplication and merging techniques. The dataset consists of text and numerical fields representing attributes that describe the documents: titles, keywords, abstracts, typologies, authors, issue years, and other identifiers such as URL and UDC. It is particularly suited to text mining and text classification tasks, and can also be used to develop or benchmark content-based recommender systems on real-world data.
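The deduplication and merging step mentioned above can be illustrated with a minimal sketch. It is not the procedure actually used to build the dataset, which is not specified here: the record layout (`title`, `year`, `keywords`, `url`), the matching key (normalized title plus issue year), and the merge rule (first non-empty value wins, keyword lists are unioned) are all assumptions chosen for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title):
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title.lower())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z0-9]+", " ", no_marks).split())


def merge_records(records):
    """Group records by (normalized title, year) and merge each group.

    Hypothetical merge rule: for scalar fields the first non-empty value
    wins; keyword lists are unioned in first-seen order.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (normalize_title(rec.get("title", "")), rec.get("year"))
        groups[key].append(rec)

    merged = []
    for group in groups.values():
        out = {}
        for rec in group:
            for field, value in rec.items():
                if field == "keywords":
                    seen = out.setdefault("keywords", [])
                    for keyword in value:
                        if keyword not in seen:
                            seen.append(keyword)
                elif value not in (None, "") and field not in out:
                    out[field] = value
        merged.append(out)
    return merged


# Illustrative records, as if exported from two separate library systems.
records = [
    {"title": "Analiza omrežij", "year": 2014, "keywords": ["omrežja"], "url": ""},
    {"title": "analiza  omrezij.", "year": 2014, "keywords": ["omrežja", "grafi"],
     "url": "https://example.org/1"},
    {"title": "Druga naloga", "year": 2015, "keywords": []},
]
merged = merge_records(records)
print(len(merged), "records after merging")
```

Stripping diacritics before comparison matters for Slovenian titles, where the same work may be catalogued with or without č, š and ž. A production pipeline would likely add fuzzy matching and author comparison, which this sketch omits.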
Abstract
The paper presents the legal, organisational and technical perspectives on the implementation of the Slovenian national open access infrastructure for electronic theses and dissertations as well as for research publications. The infrastructure consists of four institutional repositories and a national portal that aggregates content from the university repositories and other Slovenian archives in order to provide a common search engine, recommendation of similar publications, and similar text detection. We have developed software that is integrated with the universities' information and authentication systems and with COBISS.SI. During the project, the necessary legal background was defined, and processes for the mandatory submission of electronic theses and dissertations as well as of research publications were designed. Processes for data exchange between the institutional repositories and the national portal, and for similar text detection and the recommendation system, were also established. Bilingual web and mobile applications, a recommendation system, and an interface suitable for persons with disabilities are provided to users around the world. The repositories are an effective promotion tool for universities and their researchers, and they are expected to improve the international recognition of Slovenian universities. The national open access infrastructure, with similar text detection support and integration with other systems, will enable the storage of almost eighty percent of the peer-reviewed scientific papers published annually by Slovenian researchers. The majority of electronic theses and dissertations produced each year at Slovenian higher education institutions will also be accessible.
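The recommendation of similar publications can be illustrated with a small content-based sketch. The abstract does not describe the algorithm used by the national portal, so the approach below (TF-IDF weighting with cosine similarity over titles or abstracts) is an assumption, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency, so no weight is zero or undefined.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(docs, query_index, top_k=3):
    """Return indices of the top_k documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[query_index], vec), i)
              for i, vec in enumerate(vectors) if i != query_index]
    scores.sort(key=lambda pair: -pair[0])
    return [i for _, i in scores[:top_k]]


# Illustrative documents standing in for publication titles or abstracts.
docs = [
    "open access repository for theses",
    "institutional repository of electronic theses",
    "protein folding simulation",
]
print(most_similar(docs, 0, top_k=1))
```

The same vectors could serve as a crude starting point for similar text detection, although real plagiarism-detection systems usually compare overlapping passages rather than whole-document term weights.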
Purpose
– The purpose of this paper is to present a technical perspective on implementing the Slovenian open access infrastructure, which consists of four institutional repositories (IRs) and a national portal (NP) that aggregates content from the repositories in order to provide a common search engine, recommendations of similar documents, and similar text detection.
Design/methodology/approach
– During the project, the necessary legal background and processes for mandatory submissions of final study works, research publications and research data were established, as well as processes for data exchange between the IRs and the NP, and processes for similar text detection.
Findings
– The consortium consisted of four Slovenian universities that differ significantly in size, organisation, and workflows. It was anticipated that exactly the same legal background and software would be used for all four repositories. It turned out that complete unification was impossible because of these differences.
Practical implications
– The national open access infrastructure will improve the visibility of Slovenian research organisations. It supports compliance with funders’ open access mandates. The established infrastructure enables the depositing and archiving of approximately 80 percent of the peer-reviewed scientific publications published annually by Slovenian researchers. At the same time, the majority of final study works from Slovenian higher education institutions are available in full-text format.
Originality/value
– This paper describes a technical perspective for setting up a national open access infrastructure, which has not been described in the literature previously.