Data sharing is increasingly an expectation in health research as part of a general move toward more open science. In the United States, in particular, the implementation of the 2023 National Institutes of Health Data Management and Sharing Policy has made it clear that qualitative studies are not exempt from this data sharing requirement. Recognizing this trend, the Palliative Care Research Cooperative Group (PCRC) realized the value of creating a de-identified qualitative data repository to complement its existing de-identified quantitative data repository. The PCRC Data Informatics and Statistics Core leadership partnered with the Qualitative Data Repository (QDR) to establish the first serious illness and palliative care qualitative data repository in the U.S. We describe the processes used to develop this repository, called the PCRC-QDR, as well as our outreach and education among the palliative care researcher community, which led the first ten projects to share their data in the new repository. Specifically, we discuss how we co-designed the PCRC-QDR, created tailored guidelines for depositing and sharing qualitative data depending on the original research context, established uniform expectations for key components of relevant documentation, and applied suitable access controls for sensitive data. We also describe how PCRC was able to leverage its existing community to recruit and guide early depositors, and we outline lessons learned in evaluating the experience. This work advances the establishment of best practices in qualitative data sharing.
Freeze-casting produces materials with complex, three-dimensional pore structures that can be tuned during the solidification process. The range of potential applications of freeze-cast materials is vast and includes structural materials, biomaterials, filtration membranes, pharmaceuticals, and foodstuffs. Fabrication of materials with application-specific microstructures is possible via freeze-casting; however, the templating process is highly complex and the underlying principles are only partially understood. Here, we report the creation of a freeze-casting experimental data repository, which contains data extracted from ∼800 different freeze-casting papers (as of August 2017). These data pertain to variables that link processing conditions to microstructural characteristics and, finally, to mechanical properties. The aim of this work is to facilitate broad dissemination of relevant data to freeze-casting researchers, promote better informed experimental design, and encourage modeling efforts that relate processing conditions to microstructure formation and material properties. An initial, systematic analysis of these data is provided, and key processing-structure-property relationships posited in the freeze-casting literature are discussed and tested against the database. Tools for data visualization and exploration available through the web interface are also provided.
Librarians and Repository Data
Royani, Yupi; Nani Rahayu, Rochani; Saefudin Suriapermana, Ahmad
Khizanah al-Hikmah (Online), 12/2020, Volume 8, Issue 2
Journal Article
Peer reviewed
Open access
This study aims to determine librarians' understanding of the data repository, including its benefits, the engagement of librarians in repository management, and repository implementation. The study surveyed librarians at the Indonesian Institute of Sciences located in Jakarta, Bogor, Cibinong, and Serpong by distributing 45 questionnaires, of which only 36 were returned. The data were tabulated and analyzed using graphs and diagrams. The respondents were female (61.1%) and male (38.9%). Half of the respondents had an undergraduate educational background (50%), the most common length of service was 20-30 years (38.9%), and the most common work location was Jakarta (70.6%). The study showed that respondents with an undergraduate background who work in Jakarta have the highest understanding of the repository (69%). Those most involved in repository management were respondents with bachelor's degrees working in Jakarta (52%). Of the respondents, 47% were involved in the socialization of the data repository and 44% were not. As many as 44% of respondents stated that the data repository has been implemented, and the remaining 56% said it has not. In conclusion, most respondents understand the data repository, especially those with a bachelor's degree who work in Jakarta. The data repository has not been implemented properly because of a lack of outreach to researchers and their low trust in the data repository.
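As a quick reading aid, the counts behind the reported percentages can be reconstructed from the two figures the abstract states (45 questionnaires distributed, 36 returned). The sketch below is only an illustration: the helper `implied_count` is hypothetical, and the head counts it prints are inferred by rounding, not taken from the study.

```python
# Figures stated in the abstract.
distributed = 45
returned = 36

# Share of distributed questionnaires that came back.
response_rate = returned / distributed


def implied_count(percent, total=returned):
    """Nearest whole number of respondents matching a reported percentage.

    Hypothetical helper: the abstract reports percentages only, so these
    counts are inferred by rounding.
    """
    return round(percent / 100 * total)


female = implied_count(61.1)
male = implied_count(38.9)

print(f"response rate: {response_rate:.0%}")
print(f"female: {female}, male: {male}, total: {female + male}")
```

The two inferred counts add up to the 36 returned questionnaires, which is consistent with the gender split being computed over all respondents.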
The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists.
The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity to properly benchmark machine learning algorithms, and there are several gaps in benchmarking problems that still need to be considered.
This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.
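To make the idea of comparing dataset meta-features concrete, here is a minimal sketch that computes a few simple descriptors for a toy classification dataset. The function name `meta_features`, the choice of descriptors, and the toy data are all assumptions made for illustration; the abstract does not list the meta-features the authors actually used.

```python
from collections import Counter


def meta_features(rows, labels):
    """Compute a few simple, illustrative meta-features of a labelled dataset.

    rows   -- list of feature vectors (lists of numbers)
    labels -- list of class labels, one per row
    """
    n_instances = len(rows)
    n_features = len(rows[0]) if rows else 0
    counts = Counter(labels)
    # Fraction of instances in the largest class: a crude imbalance measure.
    majority_fraction = max(counts.values()) / n_instances if n_instances else 0.0
    return {
        "n_instances": n_instances,
        "n_features": n_features,
        "n_classes": len(counts),
        "majority_fraction": majority_fraction,
    }


# Toy dataset, invented for this example.
toy_rows = [[0.1, 1.0], [0.4, 0.0], [0.35, 1.0], [0.8, 0.0]]
toy_labels = ["a", "a", "a", "b"]

toy_meta = meta_features(toy_rows, toy_labels)
print(toy_meta)
```

Computing the same descriptors for every dataset in a suite gives one vector per dataset, and it is vectors of this kind that can be compared to judge how diverse a benchmark collection is.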
In this paper we present the Tardigrada Register (www.tardigrada.net/register): a free, comprehensive, and standardised online data repository for tardigrade taxonomy. We outline key problems of modern tardigrade systematics and propose the Register as a potential solution to some of them. We then describe the idea, structure, and workings of the service and discuss challenges it may face. Most importantly, we hope to convince fellow tardigradologists that sharing their data via the Register will benefit the entire community of contemporary and future tardigrade researchers.
Recognizing the value of open-source research databases in advancing the art and science of HVAC, in 2014 the ASHRAE Global Thermal Comfort Database II project was launched under the leadership of University of California at Berkeley's Center for the Built Environment and The University of Sydney's Indoor Environmental Quality (IEQ) Laboratory. The exercise began with a systematic collection and harmonization of raw data from the last two decades of thermal comfort field studies around the world. The ASHRAE Global Thermal Comfort Database II (Comfort Database), now an online, open-source database, includes approximately 81,846 complete sets of objective indoor climatic observations with accompanying “right-here-right-now” subjective evaluations by the building occupants who were exposed to them. The database is intended to support diverse inquiries about thermal comfort in field settings. A simple web-based interface to the database enables filtering on multiple criteria, including building typology, occupancy type, subjects' demographic variables, subjective thermal comfort states, indoor thermal environmental criteria, calculated comfort indices, environmental control criteria and outdoor meteorological information. Furthermore, a web-based interactive thermal comfort visualization tool has been developed that allows end-users to quickly and interactively explore the data.
• The scope, development, contents, and accessibility of the Comfort Database are documented.
• The Comfort Database II includes approximately 76,000 complete sets of thermal comfort data.
• The Comfort Database provides access to the collected raw data.
• A web-based interactive visualization tool was developed that allows end-users to interactively explore the data.
This paper describes the data repository for the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) initial study cohort. The Cam-CAN Stage 2 repository contains multi-modal (MRI, MEG, and cognitive-behavioural) data from a large (approximately N=700), cross-sectional adult lifespan (18–87 years old) population-based sample. The study is designed to characterise age-related changes in cognition and brain structure and function, and to uncover the neurocognitive mechanisms that support healthy cognitive ageing. The database contains raw and preprocessed structural MRI, functional MRI (active tasks and resting state), and MEG data (active tasks and resting state), as well as derived scores from cognitive behavioural experiments spanning five broad domains (attention, emotion, action, language, and memory), and demographic and neuropsychological data. The dataset thus provides a depth of neurocognitive phenotyping that is currently unparalleled, enabling integrative analyses of age-related changes in brain structure, brain function, and cognition, and providing a testbed for novel analyses of multi-modal neuroimaging data.
• Cross-sectional uniform adult-lifespan population-based data
• Multimodal MRI, fMRI and MEG neuroimaging data
• Unprecedented depth of cognitive phenotyping
• Age-related differences in brain structure, function, and cognition
To address the growing need for a centralized, community resource of published results processed with Skyline, and to provide reviewers and readers immediate visual access to the data behind published conclusions, we present Panorama Public (https://panoramaweb.org/public.url), a repository of Skyline documents supporting published results. Panorama Public is built on Panorama, an open source data management system for mass spectrometry data processed with the Skyline targeted mass spectrometry environment. The Panorama web application facilitates viewing, sharing, and disseminating results contained in Skyline documents via a web browser. Skyline users can easily upload their documents to a Panorama server and allow other researchers to explore uploaded results in the Panorama web interface through a variety of familiar summary graphs as well as annotated views of the chromatographic peaks processed with Skyline. This makes Panorama ideal for sharing targeted, quantitative results contained in Skyline documents with collaborators, reviewers, and the larger proteomics community. The Panorama Public repository employs the full data visualization capabilities of Panorama, which facilitates sharing results with reviewers during manuscript review.
Public repositories have contributed to the maturation of experimental methodology in machine learning. Publicly available data sets have allowed researchers to empirically assess their learners and, jointly with open source machine learning software, they have favoured the emergence of comparative analyses of learners’ performance over a common framework. These studies have brought standard procedures to evaluate machine learning techniques. However, current claims, such as the superiority of enhanced algorithms, are biased by unsupported assumptions made in some common practices.
In this paper, the early steps of the methodology, which refer to data set selection, are inspected. Particularly, the exploitation of the most popular data repository in machine learning—the UCI repository—is examined. We analyse the type, complexity, and use of UCI data sets. The study recommends the design of a mindful data repository, UCI+, which should include a set of properly characterised data sets consisting of a complete and representative sample of real-world problems, enriched with artificial benchmarks. The ultimate goal of the UCI+ is to lay the foundations towards a well-supported methodology for learner assessment.
The Novel Materials Discovery (NOMAD) Laboratory is a user-driven platform for sharing and exploiting computational materials science data. It accounts for the various aspects of data being a crucial raw material and most relevant to accelerate materials research and engineering. NOMAD, with the NOMAD Repository and its code-independent and normalized form, the NOMAD Archive, comprises the world's largest data collection in this field. Based on its findable, accessible, interoperable, reusable (FAIR) data infrastructure, various services are offered, comprising advanced visualization, the NOMAD Encyclopedia, and artificial-intelligence tools. The latter are realized in the NOMAD Analytics Toolkit. A prerequisite for all this is the NOMAD metadata, a unique and thorough description of the data that are produced by all important computer codes of the community. Uploaded data are tagged by a persistent identifier, and users can also request a digital object identifier to make data citable. Developments and advancements of parsers and metadata are organized jointly with users and code developers. In this work, we review the NOMAD concept and implementation, highlight its orthogonality to and synergistic interplay with other data collections, and provide an outlook regarding ongoing and future developments.