On the Reuse of Scientific Data Pasquetto, Irene V.; Randles, Bernadette M.; Borgman, Christine L.
Data science journal,
03/2017, Volume 16
Journal Article
Peer reviewed
Open access
While science policy promotes data sharing and open data, these are not ends in themselves. Arguments for data sharing are to reproduce research, to make public assets available to the public, to leverage investments in research, and to advance research and innovation. To achieve these expected benefits of data sharing, data must actually be reused by others. Data sharing practices, especially motivations and incentives, have received far more study than has data reuse, perhaps because of the array of contested concepts on which reuse rests and the disparate contexts in which it occurs. Here we explicate concepts of data, sharing, and open data as a means to examine data reuse. We explore distinctions between use and reuse of data. Lastly we propose six research questions on data reuse worthy of pursuit by the community: How can uses of data be distinguished from reuses? When is reproducibility an essential goal? When is data integration an essential goal? What are the tradeoffs between collecting new data and reusing existing data? How do motivations for data collection influence the ability to reuse data? How do standards and formats for data release influence reuse opportunities? We conclude by summarizing the implications of these questions for science policy and for investments in data reuse.
This Tutorial serves as both an approachable theoretical introduction to mixed-effects modeling and a practical introduction to how to implement mixed-effects models in R. The intended audience is researchers who have some basic statistical knowledge, but little or no experience implementing mixed-effects models in R using their own data. In an attempt to increase the accessibility of this Tutorial, I deliberately avoid using mathematical terminology beyond what a student would learn in a standard graduate-level statistics course, but I reference articles and textbooks that provide more detail for interested readers. This Tutorial includes snippets of R code throughout; the data and R script used to build the models described in the text are available via OSF at https://osf.io/v6qag/, so readers can follow along if they wish. The goal of this practical introduction is to provide researchers with the tools they need to begin implementing mixed-effects models in their own research.
A study on the feasibility of a national open data policy in Zimbabwe was done to document open government data globally and in Zimbabwe. The study showcases the benefits of open government data and the opportunities and challenges toward the development of a national open data policy. Web content analysis and document analysis were used to collect data concerning the readiness of the country in implementing open data activities. The open data barometer was used to gather qualitative data which is essential in assessing the preparedness of the country in opening up government and research data. Content analysis was used to analyse the data which was presented thematically based on the objectives of the study. The findings indicated that the Government of Zimbabwe has endorsed a couple of open data frameworks though some projects are done by non-governmental organizations. The major challenge is implementation of these conventions and commitment to make the data accessible. The results indicated that open data must be made available and accessible within Zimbabwe as a matter of national policy. The author recommends the need for advocacy and continuous awareness creation among the stakeholders so that a national open data policy can be crafted and enacted. The enactment of a national open data policy would guide the use of and access to government data and research data which is valuable in research.
The open‐data scientific philosophy is being widely adopted and proving to promote considerable progress in ecology and evolution. Open‐data global databases now exist on animal migration, species distribution, conservation status, etc. However, a gap exists for data on population dynamics spanning the rich diversity of the animal kingdom world‐wide. This information is fundamental to our understanding of the conditions that have shaped variation in animal life histories and their relationships with the environment, as well as the determinants of invasion and extinction. Matrix population models (MPMs) are among the most widely used demographic tools by animal ecologists. MPMs project population dynamics based on the reproduction, survival and development of individuals in a population over their life cycle. The outputs from MPMs have direct biological interpretations, facilitating comparisons among animal species as different as Caenorhabditis elegans, Loxodonta africana and Homo sapiens. Thousands of animal demographic records exist in the form of MPMs, but they are dispersed throughout the literature, rendering comparative analyses difficult. Here, we introduce the COMADRE Animal Matrix Database, an open‐data online repository, which in its version 1.0.0 contains data on 345 species world‐wide, from 402 studies with a total of 1625 population projection matrices. COMADRE also contains ancillary information (e.g. ecoregion, taxonomy, biogeography, etc.) that facilitates interpretation of the numerous demographic metrics that can be derived from its MPMs. We provide R code to some of these examples. Synthesis: We introduce the COMADRE Animal Matrix Database, a resource for animal demography. Its open‐data nature, together with its ancillary information, will facilitate comparative analysis, as will the growing availability of databases focusing on other aspects of the rich animal diversity, and tools to query and combine them.
Through future frequent updates of COMADRE, and its integration with other online resources, we encourage animal ecologists to tackle global ecological and evolutionary questions with unprecedented sample size.
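As a rough illustration of how a matrix population model projects population dynamics, the sketch below multiplies a stage-abundance vector by a projection matrix and estimates the long-run growth rate by repeated projection. The 3-stage matrix, the starting abundances, and the function names are hypothetical and chosen only for illustration; none of them come from COMADRE or from the authors' R code.

```python
# Hypothetical 3-stage projection matrix (illustrative values, not COMADRE data).
# Row 0 holds per-capita reproduction; the other rows hold survival/transition rates.
A = [
    [0.0, 1.5, 2.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.8, 0.9],
]


def project(matrix, n):
    """Advance the stage vector one time step: n(t+1) = A n(t)."""
    return [sum(a * x for a, x in zip(row, n)) for row in matrix]


def growth_rate(matrix, n, steps=200):
    """Estimate the asymptotic growth rate by repeated projection.

    The vector is rescaled to sum to 1 after each step, so the ratio of
    successive totals approaches the dominant eigenvalue of the matrix.
    """
    lam = 0.0
    for _ in range(steps):
        nxt = project(matrix, n)
        total = sum(nxt)
        lam = total / sum(n)
        n = [x / total for x in nxt]
    return lam


n0 = [100.0, 50.0, 20.0]   # hypothetical starting abundances per stage
n1 = project(A, n0)        # abundances after one time step
lam = growth_rate(A, n0)   # values above 1 indicate a growing population
```

An estimate above 1 indicates long-run growth and one below 1 indicates decline, which is one of the demographic metrics that can be compared across species.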
In the current era of big data, huge quantities of valuable data, which may be of different levels of veracity, are being generated at a rapid rate. Embedded into these big data are implicit, previously unknown and potentially useful information and valuable knowledge that can be discovered by data science solutions, which apply techniques like data mining. There has been a trend that more and more collections of these big data have been made openly available in science, government and non-profit organizations so that people could collaboratively study and analyze these open big data. In this article, we focus on open big data for public transit because public transit (e.g., bus) as a means of transportation is a vital part of many people’s lives. As time is a precious resource, bus delays could negatively affect commuters’ plans. Unfortunately, they are inevitable. Hence, many existing works focused on predicting bus delays. However, predicting on-time or early buses is also important. For instance, commuters who come to a bus stop on time may still miss their buses if the buses leave early. So, in this article, we examine open big data about bus performance (e.g., early, on-time, and late stops). We analyze the data with frequent pattern mining and make predictions with decision-tree based classification. For illustration, we perform predictive analytics on real-life open big data available on Winnipeg Open Data Portal, about bus performance from Winnipeg Transit. It shows the benefits of predictive analytics on open big data for supporting smart transportation services.
A Tale of Open Data Innovations in Five Smart Cities Ojo, Adegboyega; Curry, Edward; Zeleti, Fatemeh Ahmadi
2015 48th Hawaii International Conference on System Sciences,
01/2015
Conference Proceeding, Journal Article
Open access
Open Data initiatives are increasingly considered as defining elements of emerging smart cities. However, few studies have attempted to provide a better understanding of the nature of this convergence and the impact on both domains. This paper presents findings from a detailed study of 18 open data initiatives across five smart cities -- Barcelona, Chicago, Manchester, Amsterdam, and Helsinki. Specifically, the study sought to understand how open data initiatives are shaped by the different smart cities contexts and concomitantly what kinds of innovations are enabled by open data in these cities. The findings highlight the specific impacts of open data innovation on the different smart cities domains, governance of the cities, and the nature of datasets available in the open data ecosystem.
This paper introduces the package open-crypto for free-of-charge and systematic cryptocurrency data collection. The package supports several methods to request (1) static data, (2) real-time data and (3) historical data. It allows users to retrieve data from over 100 of the most popular and liquid exchanges world-wide. New exchanges can easily be added with the help of provided templates or updated with built-in functions from the project repository. The package is available on GitHub and the Python Package Index (PyPI). The data is stored in a relational SQL database and is therefore accessible from many different programming languages. We provide hands-on illustrations for each data type and explanations of the received data, and also demonstrate the usability from R and Matlab. Academic research heavily relies on costly or confidential data; however, open data projects are becoming increasingly important. This project is mainly motivated to contribute to openly accessible software and free data in the cryptocurrency markets to improve transparency and reproducibility in research and any other disciplines.