The literature provides a wide range of techniques to assess and improve the quality of data. Due to the diversity and complexity of these techniques, research has recently focused on defining methodologies that help in the selection, customization, and application of data quality assessment and improvement techniques. The goal of this article is to provide a systematic and comparative description of such methodologies. Methodologies are compared along several dimensions, including the methodological phases and steps, the strategies and techniques, the data quality dimensions, the types of data, and, finally, the types of information systems addressed by each methodology. The article concludes with a summary description of each methodology.
In this paper, we discuss the application of the concept of data quality to big data, highlighting how complex it is to define it in a general way. Data quality is already a multidimensional concept, difficult to characterize in precise definitions even in the case of well-structured data. Big data adds two further dimensions of complexity: (i) being "very" source specific, for which we adopt the interesting UNECE classification, and (ii) being highly unstructured and schema-less, often without gold standards to refer to, or very difficult to access. After providing a tutorial on data quality in traditional contexts, we analyze big data by providing insights into the UNECE classification; then, for each type of data source, we choose a specific instance of that type (notably deep Web data, sensor-generated data, and tweets/short texts) and discuss how quality dimensions can be defined in these cases. The overall aim of the paper is therefore to identify further research directions in the area of big data quality, while providing at the same time an up-to-date state of the art on data quality.
In this paper, we address the problem of assessing the social value of open data. While the number of open data initiatives increases and many data sets are currently available to lay people, common citizens, and end users, only a limited number of studies specifically address how to improve the understandability of open data, their usability by common users, and the measurability of their value in terms of concrete outcomes and benefits for the intended communities and for the individuals who appropriate those data, making them more personal and hence more valuable. Our goal is to contribute to the success of open data initiatives by defining a methodology with which to assess their perceived social value. In this paper, we present the conceptual content of the methodology, that is, its main concepts and logical structure, and discuss it by means of an empirical user study in which we applied it to real-life open data sets, involving a large sample of prospective consumers of those open data. In particular, we focus on the health care domain in order to improve the welfare of citizens who need health care services and use the Web to look for relevant information to address those situated needs. Among our main findings, we discovered a clear preference for visual information formats by women with respect to men, and a clear preference for hospital rankings by disease among senior people with respect to younger ones.
Open data initiatives are characterized, in several countries, by a substantial growth in the number of data sets made available for access by public administrations, constituencies, businesses, and other actors, such as journalists, international institutions, and academics, to mention a few. However, most open data sets rely on selection criteria based on a technology-driven perspective, rather than on the potential public and social value of the data to be published. Several experiences and reports confirm this issue, such as those of the Open Data Census. However, there are also relevant best practices. The goal of this paper is to investigate the different dimensions of a framework suitable to support public administrations, as well as constituencies, in assessing and benchmarking the social value of open data initiatives. The framework is tested on three initiatives from three different countries: Italy, the United Kingdom, and Tunisia. The countries were selected to provide a focus on European and Mediterranean countries, considering also the difference in legal frameworks (civil law vs. common law countries).
When tens or even hundreds of schemas are involved in the integration process, criteria are needed for choosing clusters of schemas to be integrated, so as to deal with the integration problem through an efficient iterative process. Schemas in clusters should be chosen according to cohesion and coupling criteria that are based on similarities and dissimilarities among schemas. In this paper, we propose an algorithm for a novel variant of the correlation clustering approach that addresses the problem of assisting a designer in integrating a large number of conceptual schemas. The novel variant introduces upper and lower bounds on the number of schemas in each cluster, in order to avoid overly complex and overly simple integration contexts, respectively. Since the problem is an NP-hard combinatorial problem, we give a heuristic for solving it. An experimental activity demonstrates an appreciable increase in the effectiveness of the schema integration process when clusters are computed by means of the proposed algorithm, with respect to clusters manually defined by an expert.
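The idea of grouping schemas by pairwise similarity while enforcing minimum and maximum cluster sizes can be illustrated with a minimal greedy sketch. This is a hypothetical Python illustration of size-bounded similarity clustering, not the authors' algorithm: the function name, similarity threshold, and greedy growth strategy are all assumptions made for the example.

```python
def cluster_schemas(similarity, n, lower, upper, threshold=0.5):
    """Greedily group n schemas into clusters of bounded size.

    similarity: dict mapping frozenset({i, j}) -> score in [0, 1].
    lower/upper: bounds on cluster size, to avoid overly simple
    and overly complex integration contexts respectively.
    Returns a list of clusters (lists of schema indices).
    Note: the last cluster may fall below `lower` when few
    unassigned schemas remain; a real algorithm would repair this.
    """
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        cluster = [unassigned.pop()]  # seed a new cluster
        # Grow the cluster with the most cohesive remaining schema,
        # stopping at the upper bound.
        while len(cluster) < upper and unassigned:
            def avg_sim(j):
                return sum(similarity.get(frozenset({i, j}), 0.0)
                           for i in cluster) / len(cluster)
            best = max(unassigned, key=avg_sim)
            # Add only if cohesive enough, unless the cluster is
            # still below the lower bound.
            if avg_sim(best) >= threshold or len(cluster) < lower:
                cluster.append(best)
                unassigned.remove(best)
            else:
                break
        clusters.append(cluster)
    return clusters
```

With four schemas where pairs {0, 1} and {2, 3} are highly similar (0.9) and all other pairs are dissimilar (0.1), and bounds lower = upper = 2, the sketch recovers the two cohesive pairs as clusters.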
From Data Quality to Big Data Quality
Batini, Carlo; Rula, Anisa; Scannapieco, Monica
Journal of Database Management, 01/2015, Volume 26, Issue 1
Journal Article · Peer reviewed · Open access
This article investigates the evolution of data quality issues from traditional structured data managed in relational databases to Big Data. In particular, the paper examines the nature of the relationship between data quality and several research coordinates that are relevant in Big Data, such as the variety of data types, data sources, and application domains, focusing on maps, semi-structured texts, linked open data, sensors and sensor networks, and official statistics. Consequently, a set of structural characteristics is identified, and a systematization of the a posteriori correlation between them and quality dimensions is provided. Finally, Big Data quality issues are considered in a conceptual framework suitable to map the evolution of the quality paradigm according to three core coordinates that are significant in the context of the Big Data phenomenon: the data type considered, the source of data, and the application domain. Thus, the framework makes it possible to ascertain the relevant changes in data quality emerging with the Big Data phenomenon, through an integrative and theoretical literature review.
Infographics are a common visual means to inform users. This paper investigates how lay people of different age, gender, and educational background perceive the use of infographics for information visualization in daily tasks. We chose three topics of general interest: weather, study, and work, and three infographics, one for each topic. We administered a questionnaire to people randomly split into two groups: the first group interacted with a static version of each infographic, i.e., a snapshot of it; the second group interacted with the fully configurable infographics. We aimed to assess information quality along different dimensions, to take into account both formal and substantial aspects; interaction quality along dimensions such as usability and ease of use; and design quality along the dimensions of the Visualization Wheel by Cairo, to assess the trade-off between information complexity and the aesthetics of infographics. The goal was to measure whether the quality of infographics affects the perception of information and the users' interaction. The overall results suggest that, although interactive infographics are perceived as more complex, the experience with them is better. From our observations, we derived a model to assess the overall quality of static and interactive infographics, based on information, interaction, and design quality dimensions.
• A user study on static and interactive infographics used for daily tasks.
• An information, interaction, and design quality model for infographics.
• Assessment of whether/how the quality of infographics affects users' experience.
Qualità dei Dati Batini, Carlo; Scannapieco, Monica
2008-05-01
eBook
Poor data quality can seriously hinder or damage the efficiency and effectiveness of organizations and enterprises. Growing awareness of these repercussions has led to important public initiatives, such as the enactment of the "Data Quality Act" in the United States and of Directive 2003/98 of the European Parliament. The authors present a complete and systematic introduction to the broad set of problems related to data quality. The book begins with a detailed description of several data quality dimensions, such as accuracy, completeness, and consistency, and discusses their importance in relation both to different types of data, such as federated data, Web data, and time-dependent data, and to the different categories into which data can be classified. The thorough description of techniques and methodologies, drawn not only from research in the area of data quality but also from related areas such as data mining, probability theory, statistical data analysis, and machine learning, provides an excellent introduction to the current state of the art. The presentation is completed by a brief description and critical comparison of practical tools and methodologies, which will help readers solve their own quality problems. This book is the ideal combination of sound theoretical foundations and applicable practical approaches. It is suited to anyone, whether researcher, student, or professional, interested in a comprehensive overview of data quality problems. It can also be used as a textbook in an introductory course on the subject, or for self-study.
The article investigates the potential role of conceptual modeling for policymaking. It argues that the use of conceptual schemas may provide an effective understanding of public sector information assets and of how they might be used to satisfy the needs of constituencies, thus having a public as well as a social value. The article first defines the information assets of public administration, and then considers the role of conceptual modeling in eliciting social value with regard to open data, using as a case study open data concerning hospitals in the United States, Canada, and Italy. An interpretive framework is outlined to support public managers in choosing the data sets to be "opened," thereby exploiting public sector information assets under a social value perspective.