The Palomar Transient Factory (PTF) is a multiepochal robotic survey of the northern sky that acquires data for the scientific study of transient and variable astrophysical phenomena. The camera and telescope provide for wide-field imaging in optical bands. In the five years of operation since first light on 2008 December 13, images taken with Mould-R and SDSS-g′ camera filters have been routinely acquired on a nightly basis (weather permitting), and two different Hα filters were installed in 2011 May (656 and 663 nm). The PTF image-processing and data-archival program at the Infrared Processing and Analysis Center (IPAC) is tailored to receive and reduce the data and, from it, generate and preserve astrometrically and photometrically calibrated images, extracted source catalogs, and co-added reference images. Relational databases have been deployed to track these products in operations and in the data archive. The fully automated system has benefited from lessons learned from past IPAC projects and includes features that could be incorporated into other ground-based observatories. Both off-the-shelf and in-house software have been utilized for economy and rapid development. The PTF data archive is curated by the NASA/IPAC Infrared Science Archive (IRSA). A state-of-the-art custom Web interface has been deployed for downloading the raw images, processed images, and source catalogs from IRSA. Access to PTF data products is currently limited to an initial public data release (M81, M44, M42, SDSS Stripe 82, and the Kepler Survey Field). The PTF collaboration intends to release the full PTF data archive when sufficient funding becomes available.
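Because the public products are served through IRSA, they can also be queried programmatically. Below is a minimal sketch using astroquery's IRSA module; the catalog name "ptf_objects" is our assumption for illustration and should be verified against Irsa.list_catalogs().

```python
# Minimal sketch: querying a PTF source catalog served by IRSA via astroquery.
# The catalog name "ptf_objects" is an assumption for illustration; look up
# the actual PTF catalog identifiers with Irsa.list_catalogs().
from astroquery.ipac.irsa import Irsa
from astropy.coordinates import SkyCoord
import astropy.units as u

coord = SkyCoord(ra=148.888, dec=69.065, unit="deg")  # roughly M81, in the public release
table = Irsa.query_region(coord, catalog="ptf_objects",  # hypothetical catalog name
                          spatial="Cone", radius=30 * u.arcsec)
print(table)
```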
A new sea surface temperature (SST) analysis on a centennial time scale is presented. In this analysis, a daily SST field is constructed as the sum of a trend, interannual variations, and daily changes, using in situ SST and sea ice concentration observations. All SST values are accompanied by theory-based analysis errors as a measure of reliability. An improved equation is introduced to represent the ice–SST relationship, which is used to produce SST data from observed sea ice concentrations. Prior to the analysis, biases of individual SST measurement types are estimated for a homogenized long-term time series of global mean SST. Because the metadata necessary for bias correction are unavailable for many historical observational reports, the biases are determined so as to ensure consistency among existing SST and nighttime air temperature observations. The global mean SSTs based on bias-corrected observations agree with those of a previously published study that adopted a different approach. Satellite observations are newly introduced to reconstruct SST variability over data-sparse regions. Moreover, uncertainty in areal means of the present and previous SST analyses is investigated using the theoretical analysis errors and estimated sampling errors. The result confirms the advantages of the present analysis and helps in understanding the reliability of SST for a specific area and time period.
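The additive structure of the analysis can be stated compactly. The notation below is ours, introduced only to summarize the decomposition described above:

```latex
% Daily SST at location x and day t, built as the sum of a secular trend,
% interannual variations, and daily changes, as described in the abstract.
\mathrm{SST}(x,t) \;=\; T_{\mathrm{trend}}(x,t) \;+\; T_{\mathrm{inter}}(x,t) \;+\; T_{\mathrm{daily}}(x,t),
% with every analyzed value accompanied by a theory-based analysis error
% \sigma_{a}(x,t) as its measure of reliability.
```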
On the synthesis of metadata tags for HTML files
Jiménez, Patricia; Roldán, Juan C.; Gallego, Fernando O. ...
Software: Practice & Experience, December 2020, Volume 50, Issue 12
Journal Article
Peer reviewed
Open access
Summary
RDFa, JSON-LD, Microdata, and Microformats make it possible to endow the data in HTML files with metadata tags that help software agents understand them. Unfortunately, many HTML files do not have any metadata tags, which has motivated many authors to work on proposals to synthesize them. These proposals have some problems: the authors either provide an overall picture of their designs without many details on the techniques behind the scenes, or focus on the techniques but do not describe the design of the software systems that support them; many of them cannot deal with data that are encoded using semistructured formats like forms, listings, or tables; and the few proposals that can work on tables can deal with horizontal listings only. In this article, we describe the design of a system that overcomes the previous limitations using a novel embedding approach that outperforms four state-of-the-art techniques on a repository of randomly selected HTML files from 40 different sites. According to our experimental analysis, our proposal achieves an F1 score that outperforms the others by 10.14%; this difference was confirmed to be statistically significant at the standard confidence level.
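To make concrete what "metadata tags" means here, the sketch below embeds a JSON-LD block in HTML and extracts it with the extruct library, a common parser for RDFa, JSON-LD, Microdata, and Microformats. This illustrates the target representation only; it is not the paper's synthesis system.

```python
# A JSON-LD metadata tag embedded in HTML, extracted with extruct.
# The Product example is illustrative; extruct is one of several parsers
# for the four formats named in the abstract.
import extruct

html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product",
 "name": "Acme Widget", "offers": {"@type": "Offer", "price": "9.99"}}
</script>
</head><body>...</body></html>
"""

data = extruct.extract(html, syntaxes=["json-ld", "microdata", "rdfa"])
print(data["json-ld"])  # the structured metadata a software agent can consume
```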
The emergence of nanoinformatics as a key component of nanotechnology and nanosafety assessment for the prediction of engineered nanomaterials (NMs) properties, interactions, and hazards, and for grouping and read-across to reduce reliance on animal testing, has put the spotlight firmly on the need for access to high-quality, curated datasets. To date, the focus has been around what constitutes data quality and completeness, on the development of minimum reporting standards, and on the FAIR (findable, accessible, interoperable, and reusable) data principles. However, moving from the theoretical realm to practical implementation requires human intervention, which will be facilitated by the definition of clear roles and responsibilities across the complete data lifecycle and a deeper appreciation of what metadata is, and how to capture and index it. Here, we demonstrate, using specific worked case studies, how to organise the nano-community efforts to define metadata schemas, by organising the data management cycle as a joint effort of all players (data creators, analysts, curators, managers, and customers) supervised by the newly defined role of data shepherd. We propose that once researchers understand their tasks and responsibilities, they will naturally apply the available tools. Two case studies are presented (modelling of particle agglomeration for dose metrics, and consensus for NM dissolution), along with a survey of the currently implemented metadata schema in existing nanosafety databases. We conclude by offering recommendations on the steps forward and the needed workflows for metadata capture to ensure FAIR nanosafety data.
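As a purely illustrative sketch of what a captured metadata record might hold, the structure below encodes the kinds of fields a data shepherd could require before a dataset enters a nanosafety database. The field names are our assumptions, not a published schema.

```python
# Illustrative only: a minimal nanomaterial metadata record covering roles
# (creator/curator) and FAIR-relevant fields. Field names are assumptions,
# not any schema surveyed in the paper.
from dataclasses import dataclass, field

@dataclass
class NanomaterialRecord:
    material_id: str                  # findable: a persistent identifier
    creator: str                      # who generated the data
    curator: str                      # who checked and harmonized it
    method: str                       # measurement or modelling protocol used
    units: str                        # explicit units aid interoperability
    license: str = "CC-BY-4.0"        # reusable: clear reuse terms
    provenance: list = field(default_factory=list)  # processing steps applied

record = NanomaterialRecord(
    material_id="NM-example-001", creator="lab-A", curator="shepherd-1",
    method="DLS particle sizing", units="nm",
    provenance=["raw upload", "outlier check", "schema validation"],
)
print(record)
```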
Abstract
While metagenomic sequencing has become the tool of preference to study host-associated microbial communities, downstream analyses and clinical interpretation of microbiome data remain challenging due to the sparsity and compositionality of sequence matrices. Here, we evaluate both computational and experimental approaches proposed to mitigate the impact of these outstanding issues. Generating fecal metagenomes drawn from simulated microbial communities, we benchmark the performance of thirteen commonly used analytical approaches in terms of diversity estimation, identification of taxon-taxon associations, and assessment of taxon-metadata correlations under the challenge of varying microbial ecosystem loads. We find quantitative approaches, including experimental procedures to incorporate microbial load variation in downstream analyses, to perform significantly better than computational strategies designed to mitigate data compositionality and sparsity, not only improving the identification of true positive associations but also reducing false positive detection. When analyzing simulated scenarios of low microbial load dysbiosis, as observed in inflammatory pathologies, quantitative methods correcting for sampling depth show higher precision compared to uncorrected scaling. Overall, our findings advocate for a wider adoption of experimental quantitative approaches in microbiome research, yet also suggest preferred transformations for specific cases where determination of the microbial load of samples is not feasible.
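The sketch below contrasts the two families of approaches benchmarked here on toy data: a computational transformation of compositional counts (the centered log-ratio, one common choice, named here by us rather than by the abstract) versus quantitative rescaling by an experimentally measured microbial load. It is not the paper's benchmark code.

```python
# Toy contrast: compositional transformations cannot distinguish two samples
# with identical composition but very different microbial loads, whereas
# quantitative rescaling by measured load can.
import numpy as np

counts = np.array([[120, 30, 50],     # sample 1: taxa counts
                   [240, 60, 100]])   # sample 2: same composition...
load = np.array([1.0e9, 0.5e9])       # ...but a 2x difference in cells per gram

# Compositional route: relative abundances are identical across both samples.
rel = counts / counts.sum(axis=1, keepdims=True)

# Centered log-ratio transform (pseudocount guards against zeros in real data).
logc = np.log(counts + 0.5)
clr = logc - logc.mean(axis=1, keepdims=True)

# Quantitative route: absolute abundances differ, exposing low-load dysbiosis.
absolute = rel * load[:, None]
print(rel, absolute, sep="\n")
```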
Audiovisual resources available online are increasingly important in satisfying users' information needs, yet the quality of the metadata that ensures discoverability of these resources has received little attention, mainly due to a lack of benchmark data. This article addresses that need by presenting results of a comparative evaluation of the accuracy and completeness of Dublin Core metadata records created by metadata beginners to represent audio recordings and video recordings. We present our findings in the context of how metadata training is organized in the program that currently prepares most of those entering the information profession in Kuwait. Findings reveal some similarities, as well as some pronounced differences, in metadata accuracy and completeness patterns for the two kinds of online digital resources: audio recordings and video recordings. Video metadata was found to be of substantially higher quality than audio metadata created by the same beginners. Overall, for audiovisual information resources, we found the Type metadata field to be the least prone to completeness errors, and Format the least prone to accuracy errors. Our data suggest that the Source metadata field is the most vulnerable to accuracy errors in both audio metadata and video metadata. However, the Dublin Core metadata fields with the highest likelihood of completeness errors did not overlap between the two sets of beginner-created metadata records. We discuss examples of the most common metadata errors and compare results with findings of previous research. The empirical data obtained in this study allow assessing how well prepared information professionals are to create metadata that is functional in supporting resource discovery.
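A completeness evaluation of this kind reduces to counting, per Dublin Core field, how many records actually fill that field. The sketch below shows the basic computation on invented records; the field list and data are illustrative, not the study's.

```python
# Minimal sketch of a per-field completeness check over Dublin Core records.
# Records and the field subset are invented for illustration.
DC_FIELDS = ["title", "creator", "type", "format", "source", "date"]

records = [
    {"title": "Lecture 1 audio", "type": "Sound", "format": "audio/mp3"},
    {"title": "Campus tour", "creator": "A. Ali", "type": "MovingImage",
     "format": "video/mp4", "source": "", "date": "2020-02-10"},
]

for dc_field in DC_FIELDS:
    filled = sum(1 for r in records if r.get(dc_field, "").strip())
    print(f"{dc_field:>8}: {filled / len(records):.0%} complete")
```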
This article aims to evaluate how and to what extent metadata of datasets indexed in DataCite offer clear human- or machine-readable information that enables the research data to be linked to a particular research institution. Two main pathways are explored. First, researchers can encode their affiliation information at the moment of data submission. This can be done by means of free-text metadata fields or via the inclusion of identifiers such as GRID/ROR and ORCID. Second, affiliation information can be traced indirectly through linking between a dataset and associated publications, given that the metadata of publications is often more explicit about affiliation information than the metadata of datasets. Both pathways of affiliation information encoding are evaluated on the basis of metadata pertaining to datasets created at the five Flemish universities. It is shown that good practices such as encoding of affiliation information in a dedicated metadata field or inclusion of ORCID in the metadata are on the rise, but could be expanded further. Finally, the establishment of links between datasets and related publications is often lacking in dataset metadata, although there are important differences between data repositories, as is also demonstrated in a more data-intensive follow-up analysis based on random samples of metadata records. It is important that data repositories address this issue by providing a metadata field clearly dedicated to associated publications, prominently displayed on the landing page of the dataset.
Keywords: DataCite, Scholix, research data, metadata, affiliation
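Both pathways can be inspected programmatically through the public DataCite REST API; the sketch below checks a record's creators for affiliation strings and ORCID identifiers, and looks for related DOIs. The DOI is a placeholder and the field handling is simplified.

```python
# Sketch of the kind of metadata inspection described above, against the
# public DataCite REST API (https://api.datacite.org). Placeholder DOI;
# error handling omitted for brevity.
import requests

doi = "10.5061/dryad.example"  # placeholder, replace with a real dataset DOI
resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=30)
attrs = resp.json()["data"]["attributes"]

for creator in attrs.get("creators", []):
    orcids = [i["nameIdentifier"] for i in creator.get("nameIdentifiers", [])
              if i.get("nameIdentifierScheme", "").upper() == "ORCID"]
    print(creator.get("name"), creator.get("affiliation"), orcids)

# Links to associated publications, when present, live in relatedIdentifiers.
pubs = [r for r in attrs.get("relatedIdentifiers", [])
        if r.get("relatedIdentifierType") == "DOI"]
print(f"{len(pubs)} related DOI(s) that may point to associated publications")
```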
The rise of smartphones and web services has made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big-data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1) we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties; it has been implemented in two field studies; (2) we introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one: it allows services to ask questions whose answers are calculated against the metadata instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be shared directly, individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data-science research.
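The sketch below is a conceptual illustration of the SafeAnswers idea, not the openPDS API: a service's question is evaluated inside the personal data store, and only a low-dimensional answer ever leaves it.

```python
# Conceptual illustration of SafeAnswers (not the openPDS API): the raw,
# high-dimensional call-log metadata stays local; the service receives only
# the computed answer.
call_log = [  # raw personal metadata, never shared
    {"peer": "+1555...", "minutes": 4, "hour": 22},
    {"peer": "+1555...", "minutes": 11, "hour": 9},
    {"peer": "+1555...", "minutes": 2, "hour": 23},
]

def answer_night_owl_score(log):
    """Service's question: fraction of calls placed between 22:00 and 06:00."""
    night = sum(1 for c in log if c["hour"] >= 22 or c["hour"] < 6)
    return round(night / len(log), 2)

# Only this single number crosses the boundary to the service.
print(answer_night_owl_score(call_log))  # 0.67
```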
Human untargeted metabolomics studies annotate only ~10% of molecular features. We introduce reference-data-driven analysis to match metabolomics tandem mass spectrometry (MS/MS) data against metadata-annotated source data as a pseudo-MS/MS reference library. Applying this approach to food source data, we show that it increases MS/MS spectral usage 5.1-fold over conventional structural MS/MS library matches and allows empirical assessment of dietary patterns from untargeted data.
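The core operation in any MS/MS library match, conventional or reference-data-driven, is a spectral similarity score; a common choice is the cosine over binned fragment peaks. The sketch below shows that computation on toy peak lists and is not the paper's pipeline; the binning parameters are our assumptions.

```python
# Sketch of spectral library matching: cosine similarity between two MS/MS
# fragment spectra after binning m/z values. Toy peak lists; bin width and
# m/z range are illustrative assumptions.
import numpy as np

def binned(spectrum, bin_width=0.01, max_mz=500.0):
    """Turn a list of (m/z, intensity) peaks into a fixed-length vector."""
    vec = np.zeros(int(max_mz / bin_width))
    for mz, intensity in spectrum:
        vec[int(mz / bin_width)] += intensity
    return vec

def cosine_score(a, b):
    va, vb = binned(a), binned(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

query = [(85.03, 0.4), (127.04, 1.0), (145.05, 0.3)]       # unknown feature
reference = [(85.03, 0.5), (127.04, 0.9), (163.06, 0.2)]   # library spectrum
print(f"cosine similarity: {cosine_score(query, reference):.2f}")
```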