This study focuses on the theoretical underpinning of the so-called semantic gap. By analysing the discourses on ideals and practices of image search and image use in terms of different understandings of 'information' and 'communication', this study illuminates the epistemological foundation of different ways of thinking about image descriptions. More precisely, it compares the discourse on metadata production, standards and information management with the discourse among humanities scholars. Close readings of handbooks and best-practice documents, together with interviews with metadata producers, disclose a discourse imbued with a mechanical model of communication and information transmission and a focus on objectivity and effectiveness. At the same time, interviews with humanities scholars and close readings of recent archival theory reveal another understanding of metadata as historically, institutionally, and even individually situated interpretations. As this study shows, attentiveness to the theoretical underpinning of the different ideals, wants and needs regarding metadata for images may illuminate why these differences exist.
Purpose: The purpose of the paper is to provide a framework for addressing the disconnect between metadata and data science. Data science cannot progress without metadata research. This paper takes steps toward advancing the synergy between metadata and data science, and identifies pathways for developing a more cohesive metadata research agenda in data science. Design/methodology/approach: This paper identifies factors that challenge metadata research in the digital ecosystem, defines metadata and data science, and presents the concepts big metadata, smart metadata, and metadata capital as part of a metadata lingua franca connecting to data science. Findings: The "utilitarian nature" and "historical and traditional views" of metadata are identified as two intersecting factors that have inhibited metadata research. Big metadata, smart metadata, and metadata capital are presented as part of a metadata lingua franca to help frame research in the data science research space. Research limitations: There are additional, intersecting factors to consider that likely inhibit metadata research, and other significant metadata concepts to explore. Practical implications: The immediate contribution of this work is that it may elicit response, critique, revision, or, more significantly, motivate research. The work presented can encourage more researchers to consider the significance of metadata as a research-worthy topic within data science and the larger digital ecosystem. Originality/value: Although metadata research has not kept pace with other data science topics, little attention has been directed to this problem. This is surprising, given that metadata is essential for data science endeavors. This examination synthesizes original and prior scholarship to provide new grounding for metadata research in data science.
Purpose. The paper presents the development of metadata standards for resource description (or metadata schemas) at the Library of Congress in Washington, USA, prompted by the rapid development of technology and the growth of the networked environment in recent decades. Approach/methodology/design. In line with this purpose, the paper gives a concise account of the development of each Library of Congress metadata standard individually, with emphasis on the metadata schemas for resource description in the library community: MARC 21, MARCXML, MODS and MADS. This approach allows the authors to analyse and assess in detail the current state of the metadata standards and their evolution over time, and to identify challenges and opportunities for future development. Findings. The paper provides a concise overview of the development of each Library of Congress metadata standard individually, focusing on the metadata schemas for resource description within the library community: MARC 21, MARCXML, MODS and MADS. It also reviews the development of metadata schemas for resource description in related communities (EAD, VRA Core) and of metadata standards for digital libraries intended for archiving and preservation (METS, PREMIS). The paper further discusses potential directions for the development of metadata standards at the Library of Congress, with reference to projects involving semantic web and linked data technologies (BIBFRAME, MODS/RDF, MADS/RDF). It stresses the need for continuous improvement of metadata standards in order to provide better services to users and to keep pace with technological development and user needs. Value. The paper offers a comprehensive review of the development of Library of Congress metadata standards, which is of great importance to the library community and to those engaged in archiving and preserving digital resources. The discussion of possible directions for the development of metadata standards, particularly in the context of the semantic web and linked data, offers insight into future challenges and opportunities for the Library of Congress and the wider community.
Ultimately, this paper contributes to an understanding of the importance and complexity of metadata standards and of their continuous development in light of rapid technological change.
Metadata is becoming more than a tool to facilitate access and retrieval; librarians and other metadata professionals and their users are expecting metadata to perform multiple and diverse purposes and functions: as much as metadata helps connect users to resources, it is expected to appropriately situate resources in relationship to other resources, and within historical and contemporary social contexts. In light of various social justice movements and Diversity, Equity, and Inclusion (DEI) efforts, metadata are being viewed as not only representing resources, but as powerful mechanisms for representing ourselves and others, and moreover, as representative of our organizational, professional, and communal values. Despite these increased demands on metadata and the roles we expect it to play, our frameworks for assessing and evaluating metadata quality have not kept pace. This article proposes that there needs to be increased user-centered research to the end of introducing a new ethical dimension to conventional frameworks and/or expanding our definitions of existing assessment criteria.
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of datasets submitted to PRIDE Archive (the archival component of PRIDE) averaged around 500 per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.
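To make the idea of addressing a single spectrum by identifier concrete, the sketch below splits a Universal Spectrum Identifier into its parts. It assumes the commonly documented colon-separated layout (`mzspec`, collection, run name, index type, index number, and an optional interpretation) and assumes the run name itself contains no colons. The identifier used is a hypothetical placeholder, and `parse_usi` is an illustrative helper, not part of any PRIDE tool.

```python
from typing import NamedTuple, Optional


class USI(NamedTuple):
    collection: str                # dataset accession, e.g. a ProteomeXchange ID
    ms_run: str                    # name of the MS run (file) within the dataset
    index_type: str                # how the spectrum is addressed, e.g. "scan"
    index: str                     # the spectrum's index value, kept as text
    interpretation: Optional[str]  # optional peptide/charge annotation


def parse_usi(usi: str) -> USI:
    """Split a USI string into its fields.

    Assumption: the run name contains no ':' characters. Everything after
    the fifth colon is kept whole as the interpretation.
    """
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a recognisable USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    interpretation = parts[5] if len(parts) == 6 else None
    return USI(collection, ms_run, index_type, index, interpretation)


# Hypothetical identifier, for illustration only.
example = parse_usi("mzspec:PXD000000:example_run_01:scan:1234:PEPTIDEK/2")
print(example)
```

Keeping the index as text avoids assuming that every index type is numeric.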
Effective metadata management is a consistent challenge faced by many scientific experiments. These challenges are magnified by the evolving needs of the experiment, the intricacies of seamlessly integrating a new system with existing analytical frameworks, and the crucial mandate to maintain database integrity. In this work we present the various challenges faced by experiments that produce a large amount of metadata and describe the solution used by the XENON experiment for metadata management.
This scientific review paper challenges the common view of metadata as a necessary evil and a mandatory chore in the data creation and dataset publishing process. Metadata are instead presented as a crucial element to ensure the findability of data services and repositories. This paper describes a path through four levels of metadata management and publication, from default unstructured data, through schema-based metadata with literal values and/or URIs, towards linked open (meta)data providing explicit linkage between reliable data resources. The research was conducted within the European Union's project PoliVisu. Special attention is given to the following: (1) guidance on publication aimed at the broad audience of search engine users and (2) the publication of geo (meta)data not only via standard technologies, such as the OGC Catalogue Service for Web and open data portals, but also through leading search engines (which are Schema.org-based).
• Metadata developed in line with the open linked data concept are scarce in the geosciences.
• The presented paradigm removes the artificial border between data and metadata.
• Open linked (meta)data are advertised attractively in search engines.
• Many times more users can be attracted, both within and beyond the geosciences.
• The incremental implementation approach is a novel path towards open linked (meta)data.
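As a rough illustration of the step from schema-based metadata to search-engine-readable publication, the following sketch builds a Schema.org `Dataset` description as JSON-LD and wraps it in the script tag that web pages typically use to embed such markup. The dataset name, description and URLs are hypothetical, and the helper functions are illustrative rather than taken from the PoliVisu project.

```python
import json


def dataset_jsonld(name, description, keywords, license_uri, landing_page):
    """Build a minimal Schema.org Dataset description as a JSON-LD dict.

    Literal values (name, description, keywords) sit next to URI values
    (license, url), mirroring the move from literals towards linked data.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "keywords": list(keywords),
        "license": license_uri,
        "url": landing_page,
    }


def as_script_tag(doc):
    """Serialise the JSON-LD dict for embedding in an HTML page."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical record, for illustration only.
record = dataset_jsonld(
    name="Traffic intensity (example)",
    description="Hypothetical hourly traffic counts for a city road network.",
    keywords=["traffic", "mobility"],
    license_uri="https://creativecommons.org/licenses/by/4.0/",
    landing_page="https://example.org/datasets/traffic-intensity",
)
print(as_script_tag(record))
```

A fuller record would usually add further Schema.org properties such as a spatial extent or creator; they are omitted here to keep the sketch short.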
This article is one of the first to explore the legal system with a focus on the burgeoning use of metadata in civil cases. Although metadata is embedded in all kinds of digital files including text, audio, and image files, as well as many social media and game applications, few understand how both the visible and embedded information is being “mined” (collected) for a myriad of uses by organizations such as Google or even the United States government. Consequently, in this paper, we explore the implications of metadata use in civil cases and how it could bring a new era of evidence in litigation, which has huge ramifications for how the average citizen may begin to view their privacy in the course of everyday activities.
This article presents a Unified Architecture (UA) for automated point tagging of Building Automation System (BAS) data, based on a combination of data-driven approaches. Advanced energy analytics applications, including fault detection and diagnostics and supervisory control, have emerged as a significant opportunity for improving the performance of our built environment. Effective application of these analytics depends on harnessing structured data from the various building control and monitoring systems, but typical BAS implementations do not employ any standardized metadata schema. While standards such as Project Haystack and Brick Schema have been developed to address this issue, the process of structuring the data, i.e., tagging the points to apply a standard metadata schema, has, to date, been a manual process. This process is typically costly, labor-intensive, and error-prone. In this work we address this gap by proposing a UA that automates the process of point tagging by leveraging the data accessible through connection to the BAS, including time-series data and the raw point names. The UA intertwines supervised classification and unsupervised clustering techniques from machine learning and leverages both their deterministic and probabilistic outputs to inform the point tagging process. Furthermore, we extend the UA to embed additional input and output data-processing modules that are designed to address the challenges associated with the real-time deployment of this automation solution. We test the UA on two datasets for real-life buildings: (i) commercial retail buildings and (ii) office buildings from the National Renewable Energy Laboratory (NREL) campus. The proposed methodology correctly applied 85–90% and 70–75% of the tags in these two test scenarios, respectively, which cover two significantly different building types used to test the UA's fully functional prototype.
The proposed UA therefore offers a promising approach for automatically tagging BAS data, as it reaches close to 90% accuracy. Building further upon this framework to algorithmically identify equipment types and their relationships is an apt direction for future research.
• Advanced energy analytics and Internet of Things (IoT) applications require raw built environment data to be structured.
• A Unified Architecture based on machine learning is proposed to automate the tagging of metadata.
• The proposed Unified Architecture also harnesses available human knowledge in conjunction with machine learning.
• Challenges with real-time deployment of automated metadata tagging applications are identified.
• Two real-life use cases demonstrate the effectiveness of the Unified Architecture.
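The general idea of combining a supervised classifier on raw point names with unsupervised clustering on time-series data can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' Unified Architecture: the classifier is a simple token-vote model, the clustering is one-dimensional single linkage on mean readings, and all point names, tag names, readings and thresholds are hypothetical.

```python
import re
from collections import Counter, defaultdict
from statistics import mean


def tokens(point_name):
    """Split a raw point name into lowercase alphabetic tokens (digits dropped)."""
    return set(re.findall(r"[a-z]+", point_name.lower()))


def train(labelled):
    """Count how often each name token co-occurs with each tag."""
    counts = defaultdict(Counter)
    for name, tag in labelled:
        for tok in tokens(name):
            counts[tok][tag] += 1
    return counts


def classify(counts, name):
    """Return (best_tag, confidence), where confidence is the vote share."""
    votes = Counter()
    for tok in tokens(name):
        votes.update(counts.get(tok, {}))
    if not votes:
        return None, 0.0
    tag, n = votes.most_common(1)[0]
    return tag, n / sum(votes.values())


def cluster_by_mean(series, max_gap=2.0):
    """1-D single-linkage clustering: sort points by mean reading and split
    wherever consecutive means differ by more than max_gap."""
    ordered = sorted(series, key=lambda n: mean(series[n]))
    clusters, current, prev = [], [], None
    for name in ordered:
        m = mean(series[name])
        if prev is not None and m - prev > max_gap:
            clusters.append(current)
            current = []
        current.append(name)
        prev = m
    if current:
        clusters.append(current)
    return clusters


def tag_points(labelled, series, min_conf=0.55):
    """Tag by name first; low-confidence points inherit the majority tag of
    the confidently tagged points in their time-series cluster."""
    counts = train(labelled)
    result = {n: classify(counts, n) for n in series}
    for members in cluster_by_mean(series):
        confident = [result[n][0] for n in members if result[n][1] >= min_conf]
        if not confident:
            continue
        majority = Counter(confident).most_common(1)[0][0]
        for n in members:
            if result[n][1] < min_conf:
                result[n] = (majority, result[n][1])
    return {n: tag for n, (tag, _) in result.items()}


# Hypothetical training names and readings (degrees Celsius).
labelled = [
    ("ahu1_supply_air_temp", "supply_air_temp"),
    ("ahu2_supply_air_temp", "supply_air_temp"),
    ("ahu1_return_air_temp", "return_air_temp"),
    ("ahu2_return_air_temp", "return_air_temp"),
]
series = {
    "rtu5_supply_air_temp": [12.8, 13.1, 13.0],
    "pt_0042": [13.2, 12.9, 13.4],
    "rtu5_return_air_temp": [22.9, 23.2, 23.0],
    "pt_0043": [23.1, 22.8, 23.3],
}
tags = tag_points(labelled, series)
print(tags)
```

In this toy data the cryptically named points `pt_0042` and `pt_0043` get no votes from the name classifier, so they take the tag of the confidently classified point whose readings fall in the same cluster. A realistic system would use richer time-series features and a trained probabilistic classifier.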