Data Mining and Knowledge Discovery in Databases (KDD) is a research field concerned with deriving higher-level insights from data. The tasks performed in that field are knowledge-intensive and can often benefit from additional knowledge from various sources. Therefore, many approaches have been proposed in this area that combine Semantic Web data with the data mining and knowledge discovery process. This survey article gives a comprehensive overview of those approaches at different stages of the knowledge discovery process. As an example, we show how Linked Open Data can be used at various stages for building content-based recommender systems. The survey shows that, while numerous interesting research works have been carried out, the full potential of the Semantic Web and Linked Open Data for data mining and KDD is still to be unlocked.
•A MF model with Linked Open Data is developed to handle the data sparsity issue.
•Hidden data and a LOD similarity measure are integrated to enhance recommendations.
•The proposed framework can be applied to any domain for recommendations.
•Experiments were done on the Netflix and MovieLens datasets for validation.
The web contains a huge volume of data, and it grows every moment, to the point that human beings cannot deal with it manually or via traditional tools. Hence, advanced tools are required to filter such massive data and mine the valuable information. Recommender systems are among the most effective tools for this purpose, and collaborative filtering is widely used in them. Collaborative filtering (CF) has been extensively utilized to offer personalized recommendations on electronic business and social network websites. Among CF techniques, matrix factorization is efficient; however, it depends on the users' past transactions, which leads to a data sparsity problem. Another issue with collaborative filtering is the cold start problem, caused by insufficient information about new entities. A novel method is proposed to overcome the data sparsity and cold start problems in CF. For the cold start issue, a Recommender System with Linked Open Data (RS-LOD) model is designed, and for the data sparsity problem, a Matrix Factorization model with Linked Open Data (MF-LOD) is developed. The LOD knowledge base DBpedia is used to find enough information about new entities affected by the cold start issue, and the matrix factorization model is improved to handle data sparsity. Experiments on the Netflix and MovieLens datasets show that our proposed techniques are superior to existing methods, i.e., recommendation accuracy is improved.
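To make the matrix factorization baseline concrete, the following is a minimal sketch of latent-factor learning via stochastic gradient descent on a sparse rating dictionary. It is plain vanilla MF, not the paper's MF-LOD extension, and the rating data is made up for illustration.

```python
import random

def matrix_factorize(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn user/item latent factors from a sparse rating dict
    {(user, item): rating} via stochastic gradient descent."""
    rng = random.Random(seed)
    users = {u for u, _ in ratings}
    items = {i for _, i in ratings}
    P = {u: [rng.uniform(0.0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0.0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factor vectors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, user, item):
    """Predicted rating is the dot product of the latent factor vectors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))

ratings = {("u1", "m1"): 5.0, ("u1", "m2"): 3.0,
           ("u2", "m1"): 4.0, ("u2", "m3"): 1.0}
P, Q = matrix_factorize(ratings)
```

After training, `predict` approximates the observed ratings; the data sparsity problem the paper targets arises exactly because most `(user, item)` cells are unobserved and the model can only learn from the few that are.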
Women’s presence in STEM (Science, Technology, Engineering and Math) has been growing in relevance, yet research on the topic suffers from a lack of data for building consistent analyses. The Equality for Leadership in Latin American STEM Network (ELLAS) aims to develop a Linked Open Data (LOD) platform to help fill this gap. This work is embedded in ELLAS and contributes (1) a triplification (conversion to RDF) of the data in Inep’s Higher Education Census, (2) a methodology for elaborating ontologies, to be used within the project, that enables an analysis of the presence and permanence of women in STEM areas, and (3) its instantiation in the context of Higher Education in the field of computing in Brazil.
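Triplification of tabular census data can be sketched as mapping each record to RDF statements. The base IRI, property names, and column names below are hypothetical stand-ins, not the actual ELLAS ontology or Inep schema.

```python
import csv
import io

# Hypothetical namespace; the real ELLAS ontology uses different IRIs.
BASE = "http://example.org/ellas/"

def row_to_ntriples(row):
    """Map one census-like record to N-Triples statements."""
    subj = f"<{BASE}enrollment/{row['id']}>"
    return [
        f'{subj} <{BASE}vocab/institution> "{row["institution"]}" .',
        f'{subj} <{BASE}vocab/field> "{row["field"]}" .',
        f'{subj} <{BASE}vocab/gender> "{row["gender"]}" .',
        f'{subj} <{BASE}vocab/year> '
        f'"{row["year"]}"^^<http://www.w3.org/2001/XMLSchema#gYear> .',
    ]

# Illustrative one-row CSV standing in for a census export.
data = io.StringIO("id,institution,field,gender,year\n1,USP,Computing,F,2019\n")
triples = row_to_ntriples(next(csv.DictReader(data)))
```

Once every record is emitted this way, the resulting triples can be loaded into any RDF store and queried with SPARQL, e.g., to count enrollments by gender and field.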
Coronavirus disease is a worldwide pandemic. The need for accurate data and information has become critical in this pandemic situation. In Indonesia, the government provides an official website displaying COVID-19 spread statistics. However, the data provided does not follow the 5-star open data scheme. As a result, the data cannot be easily reused or integrated into other datasets and applications. In this paper, we propose an RDF vocabulary for presenting COVID-19 data in Indonesia. In addition, two queries are presented as examples of using our vocabulary and dataset as part of the Linked Open Data movement.
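A vocabulary of this kind could serialize a day of provincial statistics as Turtle along the lines of the sketch below. The namespace, class, and property names are illustrative assumptions, as are the figures; the paper's actual vocabulary will differ.

```python
from datetime import date

# Illustrative prefixes; not the paper's actual namespace.
PREFIXES = (
    "@prefix covid: <http://example.org/covid19-id/vocab#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
)

def daily_report(province, day, confirmed, recovered, deaths):
    """Serialize one day of provincial statistics as a Turtle resource."""
    subj = f"<http://example.org/covid19-id/report/{province}-{day.isoformat()}>"
    return (
        f"{subj} a covid:DailyReport ;\n"
        f'    covid:province "{province}" ;\n'
        f'    covid:date "{day.isoformat()}"^^xsd:date ;\n'
        f"    covid:confirmed {confirmed} ;\n"
        f"    covid:recovered {recovered} ;\n"
        f"    covid:deaths {deaths} .\n"
    )

doc = PREFIXES + daily_report("Jakarta", date(2021, 7, 1), 14619, 9731, 219)
```

Publishing such Turtle with dereferenceable IRIs and links to external datasets is what moves tabular dashboard data toward the 5-star open data level.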
Persistent identifiers are applied to an ever-increasing variety of research objects, including software, samples, models, people, instruments, grants, and projects, and there is a growing need to apply identifiers at a finer and finer granularity. Unfortunately, the systems developed over two decades ago to manage identifiers and the metadata describing the identified objects no longer scale. Communities working with physical samples have grappled with the challenges of the increasing volume, variety, and variability of identified objects for many years. To address these challenges, the IGSN 2040 project explored how metadata and catalogues for physical samples could be shared at the scale of billions of samples across an ever-growing variety of users and disciplines. In this paper, we focus on how to scale identifiers and their describing metadata to billions of objects, and on who the actors involved with this system are. Our analysis of these requirements resulted in the definition of a minimum viable product and the design of an architecture that not only addresses the challenges of increasing volume and variety but, more importantly, is easy to implement because it reuses commonly used Web components. Our solution is based on a Web architectural model that utilises Schema.org, JSON-LD, and sitemaps. Applying these commonly used architectural patterns allows us not only to handle increasing variety but also to comply better with the FAIR Guiding Principles.
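The Schema.org-plus-JSON-LD pattern can be sketched as follows: each sample gets a landing page embedding a JSON-LD record, and a sitemap lists those pages so harvesters can crawl them at scale. The IGSN value, URLs, and property selection below are illustrative, not the project's actual metadata profile.

```python
import json

# Illustrative physical-sample record using generic Schema.org terms.
sample = {
    "@context": "https://schema.org/",
    "@type": "Thing",
    "@id": "https://example.org/sample/XYZ123",
    "name": "Basalt core section XYZ123",
    "identifier": {
        # PropertyValue lets the identifier scheme be named explicitly.
        "@type": "PropertyValue",
        "propertyID": "IGSN",
        "value": "XYZ123",
    },
}

# Serialized JSON-LD, ready to embed in a <script type="application/ld+json">
# block on the sample's landing page.
jsonld = json.dumps(sample, indent=2)
```

Because both JSON-LD and sitemaps are standard Web machinery, any existing crawler infrastructure can aggregate such records without a bespoke catalogue protocol, which is what makes the architecture easy to adopt.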
•We design the main components of an algorithm-agnostic framework to generate natural language explanations;
•We propose a methodology to extract descriptive (direct and indirect) properties about the items, and we use these properties to feed a graph-based explanation model;
•We define a scoring function to rank these explanation patterns, and we use the most relevant ones to generate a template-based natural language explanation;
•We validate our methodology by carrying out a large user study (N = 680) in three different domains: movies, books, and music;
•We integrate our methodology into a conversational recommender system implemented as a Telegram Bot.
In this article we propose a framework that generates natural language explanations supporting the suggestions generated by a recommendation algorithm.
The cornerstone of our approach is the use of Linked Open Data (LOD) for explanation purposes. Indeed, the descriptive properties freely available in the LOD cloud (e.g., the author of a book or the director of a movie) can be used to build a graph that connects the recommendations the user received to the items she previously liked via the properties extracted from the LOD cloud. In a nutshell, our approach is based on the insight that the properties describing both the items the user previously liked and the suggestions she received can be effectively used to explain the recommendations.
Such a framework is both algorithm-independent and domain-independent, thus it can generate a natural language explanation for every kind of recommendation algorithm, and it can be used to explain a single recommendation (Top-1 scenario) as well as a group of recommendations (Top-N scenario). It is worth noting that the algorithm-independent characteristic does not mean that the framework is able to explain to the user how the recommendations have been generated and how the recommendation algorithm works. The framework explains to users why they might like the recommended items, independently from the recommendation algorithm that generated the recommendations.
In the experimental evaluation, we carried out a user study (N = 680) to investigate the effectiveness of our framework in three different domains: movies, books, and music. Results showed that our technique leads to transparent explanations in all the domains, and such explanations proved independent of the specific recommendation algorithm in most of the experimental settings. Moreover, we also showed the effectiveness of our strategy when an entire group of recommendations has to be explained.
As a case study, we integrated the framework into a real-world application, a conversational recommender system implemented as a Telegram Bot. The idea is to use explanations to support both the training phase (when the user expresses her preferences) and the recommendation step (when the user receives the recommendations). Interesting outcomes emerged from these preliminary experiments.
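The core idea of property-based explanation can be sketched in a few lines: intersect the descriptive properties of the recommendation with those of the liked items, then fill a text template with the shared ones. The item data, property pairs, and template wording are illustrative assumptions, not the framework's actual extraction or scoring pipeline.

```python
# Toy property sets of the kind extracted from the LOD cloud (illustrative).
PROPS = {
    "The Godfather": {("director", "Francis Ford Coppola"), ("genre", "Crime")},
    "Goodfellas":    {("genre", "Crime"), ("actor", "Robert De Niro")},
    "Casino":        {("genre", "Crime"), ("actor", "Robert De Niro")},
}

def explain(recommended, liked):
    """Build a template-based explanation from the properties shared between
    the recommended item and each previously liked item."""
    reasons = []
    for item in liked:
        # Shared (property, value) pairs connect the two items in the graph.
        for prop, value in sorted(PROPS[recommended] & PROPS[item]):
            reasons.append(f"like {item}, its {prop} is {value}")
    if not reasons:
        return f"We think you may enjoy {recommended}."
    return f"We suggest {recommended} because, " + "; ".join(reasons) + "."

msg = explain("Casino", ["The Godfather", "Goodfellas"])
```

Because the sketch only consumes item properties and a user profile, it is indifferent to how the recommendation was produced, which is exactly the algorithm-independence the framework claims.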
Collaborative filtering recommendation algorithms generate suggestions based on similar interactions between users. Although this approach provides accurate recommendations, it has two limitations: the popularity bias, which leads to frequently suggesting a small set of the most interacted items, and the systems’ black-box functioning, as they are grounded on complex mathematical models. To improve these aspects of collaborative filtering algorithms, this paper introduces a multi-domain item reordering system based on the best explanation for an item, i.e., the best-ranked paths, extracted from a Linked Open Data knowledge graph, connecting recommended and interacted items. To rank paths, the algorithm assigns a value to the node attributes connecting two items by calculating how popular a property is among the interacted items while being rare in the full set of items. Results on two datasets from the movie and music domains, comparing the proposed reordering system with six baselines from different collaborative filtering families, show that our easy-to-explain approach improved diversity and/or accuracy metrics.
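The scoring idea — value a property highly when it is frequent among the user's interacted items yet rare in the full catalog — is essentially a TF-IDF-like weighting, sketched below. The data is made up and the paper's exact formula may differ.

```python
import math

def property_scores(interacted, catalog):
    """Score (property, value) pairs: count among interacted items times the
    log inverse frequency in the catalog. Both arguments map
    item -> set of (property, value) pairs."""
    n = len(catalog)
    counts = {}
    for props in interacted.values():
        for p in props:
            counts[p] = counts.get(p, 0) + 1
    return {
        p: c * math.log(n / sum(1 for props in catalog.values() if p in props))
        for p, c in counts.items()
    }

# Tiny illustrative catalog: one ubiquitous property, one rare one.
catalog = {
    "i1": {("genre", "Rock"), ("artist", "Nirvana")},
    "i2": {("genre", "Rock")},
    "i3": {("genre", "Rock")},
    "i4": {("genre", "Rock")},
}
scores = property_scores({k: catalog[k] for k in ("i1", "i2")}, catalog)
```

Here `("genre", "Rock")` appears in every catalog item, so its log-inverse-frequency term is zero, while the rare `("artist", "Nirvana")` pair dominates — paths through such discriminative properties are the ones the reordering system promotes.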
•Formalization of the popularity bias and transparency problems in collaborative filtering recommendation algorithms.
•Proposal of personalized extraction of the users’ most relevant Knowledge Graph properties.
•Proposal of an explainable multi-domain item reordering system based on the best explanation for recommendations.
•Evaluation of the proposal comparing accuracy and beyond-accuracy metrics such as diversity, transparency, fairness, and coverage.
Purpose/significance Linked Open Data (LOD) has been widely used in large industries as well as non-profit and government organizations. Libraries and archives are among the early adopters of LOD technology and promote its development. Germany is one of the developed countries in the libraries and archives sector, and there are many successful cases of applying LOD in its libraries and archives. Method/process This paper analyzed the successful application of LOD technology in German libraries and archives using the methods of document investigation, network survey and content analysis. Result/conclusion These cases reveal the relationships between research topics related to libraries and archives and traditional fields of computer science such as artificial intelligence, databases and knowledge discovery. Summing up the characteristics and experience of German practice can provide reference value for the development of relevant practice.