Mark Davison examines several legal models designed to protect databases, considering in particular the EU Directive, the history of its adoption and its transposition into national laws. He compares the Directive with a range of American legislative proposals, as well as the principles of misappropriation that underpin them. The book also contains a commentary on the appropriateness of the various models in the context of moves for an international agreement on the topic. This book will be of interest to academics and practitioners, including those involved with databases and other forms of new media.
Abstract
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. The Entrez system provides search and retrieval operations for most of these data from 39 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Data Management, RefSeq Functional Elements, genome data download, variation services API, Magic-BLAST, QuickBLASTp, and Identical Protein Groups. Resources that were updated in the past year include the genome data viewer, a human genome resources page, Gene, virus variation, OSIRIS, and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
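As a minimal sketch of what using the E-utilities as a programming interface can look like, the snippet below assembles an ESearch request URL for one Entrez database. The helper name `build_esearch_url` and its defaults are illustrative assumptions, not part of NCBI's documentation; the snippet only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL (hypothetical helper, for illustration only).

    db     -- Entrez database name, e.g. "pubmed"
    term   -- search query text
    retmax -- maximum number of record identifiers to return
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Build (but do not send) a PubMed search for the term "BLAST".
url = build_esearch_url("pubmed", "BLAST", retmax=5)
print(url)
```

The resulting URL could be fetched with any HTTP client; heavy use would also need to respect NCBI's usage guidelines.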
Private information retrieval (PIR) is the problem of retrieving, as efficiently as possible, one out of $K$ messages from $N$ non-communicating replicated databases (each holds all $K$ messages) while keeping the identity of the desired message index a secret from each individual database. The information theoretic capacity of PIR (equivalently, the reciprocal of minimum download cost) is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. $T$-private PIR is a generalization of PIR to include the requirement that even if any $T$ of the $N$ databases collude, the identity of the retrieved message remains completely unknown to them. Robust PIR is another generalization that refers to the scenario where we have $M \geq N$ databases, out of which any $M - N$ may fail to respond. For $K$ messages and $M \geq N$ databases out of which at least some $N$ must respond, we show that the capacity of $T$-private and Robust PIR is $(1+T/N+T^{2}/N^{2}+\cdots +T^{K-1}/N^{K-1})^{-1}$.
The result includes as special cases the capacity of PIR without robustness ($M=N$) or $T$-privacy constraints ($T=1$).
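To make the capacity expression concrete, the following sketch evaluates it exactly for small parameters. The function name `pir_capacity` and the argument checks are illustrative assumptions; only the formula itself comes from the abstract.

```python
from fractions import Fraction


def pir_capacity(K: int, N: int, T: int = 1) -> Fraction:
    """Evaluate (1 + T/N + (T/N)^2 + ... + (T/N)^(K-1))^(-1) exactly.

    K -- number of messages
    N -- number of databases that respond
    T -- number of databases that may collude (T = 1 is plain PIR)
    """
    if K < 1 or N < 1 or not 1 <= T <= N:
        raise ValueError("expected K >= 1, N >= 1 and 1 <= T <= N")
    ratio = Fraction(T, N)
    # Geometric sum with K terms, from exponent 0 up to K - 1.
    return 1 / sum(ratio ** k for k in range(K))


print(pir_capacity(K=2, N=2))        # plain PIR, two messages, two databases
print(pir_capacity(K=3, N=2))        # adding a message lowers the capacity
print(pir_capacity(K=2, N=3, T=2))   # two of three databases may collude
```

For example, with $K=2$, $N=2$ and $T=1$ the sum is $1 + 1/2$, so the capacity is $2/3$.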
Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though they may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications.
RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. The Entrez system provides search and retrieval operations for most of these data from 37 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include iCn3D, MutaBind, and the Antimicrobial Resistance Gene Reference Database; and resources that were updated in the past year include My Bibliography, SciENcv, the Pathogen Detection Project, Assembly, Genome, the Genome Data Viewer, BLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Abstract
Drug development involves a deep understanding of the mechanisms of action and possible side effects of each drug, and sometimes results in the identification of new and unexpected uses for drugs, termed drug repurposing. For both serendipitous observations and systematic mechanistic explorations, confirmation of new indications for a drug requires hypothesis building around relevant drug-related data, such as the molecular targets involved and patient and cellular responses. These datasets are available in public repositories, but apart from the computational bottleneck imposed by sifting through the sheer amount of data, a major challenge is selecting which databases to use from an increasingly large number of available databases. Database selection is made harder by the lack of an overview of the types of data offered in each database. To alleviate these problems and to guide the end user through drug repurposing efforts, we provide here a survey of 102 of the most promising and drug-relevant databases reported to date. We summarize the target coverage and types of data available in each database and provide several examples of how multi-database exploration can facilitate drug repurposing.
Organizations that collect large amounts of unstructured data are increasingly turning to nonrelational databases, now frequently called NoSQL databases.
Rigorous evidence identification is essential for systematic reviews and meta-analyses (evidence syntheses) because the sample selection of relevant studies determines a review's outcome, validity, and explanatory power. Yet, the search systems allowing access to this evidence provide varying levels of precision, recall, and reproducibility and also demand different levels of effort. To date, it remains unclear which search systems are most appropriate for evidence synthesis and why. Advice on which search engines and bibliographic databases to choose for systematic searches is limited and lacks systematic, empirical performance assessments. This study investigates and compares the systematic search qualities of 28 widely used academic search systems, including Google Scholar, PubMed, and Web of Science. A novel, query-based method tests how well users are able to interact and retrieve records with each system. The study is the first to show the extent to which search systems can effectively and efficiently perform (Boolean) searches with regard to precision, recall, and reproducibility. We found substantial differences in the performance of search systems, meaning that their usability in systematic searches varies. Indeed, only half of the search systems analyzed and only a few Open Access databases can be recommended for evidence syntheses without adding substantial caveats. In particular, our findings demonstrate why Google Scholar is inappropriate as a principal search system. We call for database owners to recognize the requirements of evidence synthesis and for academic journals to reassess quality requirements for systematic reviews. Our findings aim to support researchers in conducting better searches for better evidence synthesis.
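For readers unfamiliar with the two retrieval measures named above, the sketch below computes precision and recall for a single query using their standard textbook definitions. It is not the study's own query-based method; the function name and the record identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision -- share of retrieved records that are relevant
    recall    -- share of relevant records that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record identifiers: the search returns four records,
# two of which belong to the three truly relevant ones.
p, r = precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
print(f"precision={p:.2f} recall={r:.2f}")
```

A system can score well on one measure and poorly on the other, which is why the two are reported together.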
Using validated stimulus material is crucial for ensuring research comparability and replicability. However, many databases rely solely on bipolar valence ratings, ranging from negative to positive. While this material might be appropriate for certain studies, it does not reflect the complexity of attitudes and therefore might hamper the unambiguous interpretation of some study results. In fact, most databases cannot differentiate between neutral (i.e., neither positive nor negative) and ambivalent (i.e., simultaneously positive and negative) attitudes. Consequently, even presumably univalent (only positive or negative) stimuli cannot be clearly distinguished from ambivalent ones when selected via bipolar rating scales. In the present research, we introduce the Trier Univalence Neutrality Ambivalence (TUNA) database, a database containing 304,262 validation ratings from heterogeneous samples of 3,232 participants and at least 20 (M = 27.3, SD = 4.84) ratings per self-report scale per picture for a variety of attitude objects on split semantic differential scales. As these scales measure positive and negative evaluations independently, the TUNA database makes it possible to distinguish univalence, neutrality, and ambivalence (i.e., potential ambivalence). TUNA also goes beyond previous databases by validating the stimulus materials on affective outcomes such as experiences of conflict (i.e., felt ambivalence), arousal, anger, disgust, and empathy. The TUNA database consists of 796 pictures and is compatible with other popular databases. It focuses on food pictures in various forms (e.g., raw vs. cooked, non-processed vs. highly processed), but includes pictures of other objects that are typically used in research to study univalent (e.g., flowers) and ambivalent (e.g., money, cars) attitudes for comparison.
Furthermore, to facilitate stimulus selection, the TUNA database has an accompanying desktop app that allows easy stimulus selection via a multitude of filter options.