Mark Davison examines several legal models designed to protect databases, considering in particular the EU Directive, the history of its adoption and its transposition into national laws. He compares the Directive with a range of American legislative proposals, as well as the principles of misappropriation that underpin them. The book also contains a commentary on the appropriateness of the various models in the context of moves for an international agreement on the topic. This book will be of interest to academics and practitioners, including those involved with databases and other forms of new media.
Abstract
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. The Entrez system provides search and retrieval operations for most of these data from 39 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include PubMed Data Management, RefSeq Functional Elements, genome data download, variation services API, Magic-BLAST, QuickBLASTp, and Identical Protein Groups. Resources that were updated in the past year include the genome data viewer, a human genome resources page, Gene, virus variation, OSIRIS, and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
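As a rough illustration of what using the E-utilities as a programming interface can look like, the sketch below builds an ESearch request URL. The endpoint path and the parameter names (db, term, retmax, retmode) are taken from NCBI's public E-utilities documentation rather than from the abstract above, and the helper name build_esearch_url is hypothetical. The sketch only constructs the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Base address of the E-utilities, per NCBI's public documentation
# (an assumption here; the abstract itself does not give it).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL for `term` in the Entrez database `db`.

    Hypothetical helper for illustration only.
    """
    params = {
        "db": db,            # Entrez database to search, e.g. "pubmed"
        "term": term,        # query string in Entrez syntax
        "retmax": retmax,    # maximum number of identifiers to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("pubmed", "asthma", retmax=5))
```

The resulting URL could then be fetched with any HTTP client; NCBI's usage guidelines on request rates and API keys would apply to real use.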
Private information retrieval (PIR) is the problem of retrieving, as efficiently as possible, one out of $K$ messages from $N$ non-communicating replicated databases (each holds all $K$ messages) while keeping the identity of the desired message index a secret from each individual database. The information theoretic capacity of PIR (equivalently, the reciprocal of minimum download cost) is the maximum number of bits of desired information that can be privately retrieved per bit of downloaded information. $T$-private PIR is a generalization of PIR to include the requirement that even if any $T$ of the $N$ databases collude, the identity of the retrieved message remains completely unknown to them. Robust PIR is another generalization that refers to the scenario where we have $M \geq N$ databases, out of which any $M - N$ may fail to respond. For $K$ messages and $M \geq N$ databases out of which at least some $N$ must respond, we show that the capacity of $T$-private and Robust PIR is $(1+T/N+T^{2}/N^{2}+\cdots +T^{K-1}/N^{K-1})^{-1}$. The result includes as special cases the capacity of PIR without robustness ($M=N$) or $T$-privacy constraints ($T=1$).
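To make the capacity expression concrete, the following minimal sketch evaluates the reciprocal of the sum 1 + T/N + ... + (T/N)^(K-1) with exact rational arithmetic. The function name pir_capacity and the input check 1 <= T <= N are assumptions made for this illustration; the code only evaluates the stated formula and does not implement a retrieval scheme.

```python
from fractions import Fraction


def pir_capacity(K: int, N: int, T: int = 1) -> Fraction:
    """Evaluate (1 + T/N + (T/N)^2 + ... + (T/N)^(K-1))^(-1) exactly.

    K: number of messages, N: number of responding databases,
    T: number of databases that may collude (T = 1 is plain PIR).
    Hypothetical helper; the range check below is an assumption.
    """
    if K < 1 or N < 1 or not 1 <= T <= N:
        raise ValueError("expected K >= 1, N >= 1 and 1 <= T <= N")
    ratio = Fraction(T, N)
    # Geometric sum with K terms: (T/N)^0 through (T/N)^(K-1).
    total = sum(ratio ** k for k in range(K))
    return 1 / total


if __name__ == "__main__":
    # Two messages, two databases, no collusion: 1 / (1 + 1/2) = 2/3.
    print(pir_capacity(K=2, N=2, T=1))
    # Three messages, two databases: 1 / (1 + 1/2 + 1/4) = 4/7.
    print(pir_capacity(K=3, N=2, T=1))
```

Because the value depends only on K, N, and T, the number M of available databases does not enter the computation, which matches the abstract's statement that only the N responding databases determine the capacity.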
Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications. RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. The Entrez system provides search and retrieval operations for most of these data from 37 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. New resources released in the past year include iCn3D, MutaBind, and the Antimicrobial Resistance Gene Reference Database; and resources that were updated in the past year include My Bibliography, SciENcv, the Pathogen Detection Project, Assembly, Genome, the Genome Data Viewer, BLAST and PubChem. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Organizations that collect large amounts of unstructured data are increasingly turning to nonrelational databases, now frequently called NoSQL databases.
The recent explosive growth of biological data has led to a rapid increase in the number of molecular biology databases. Because these databases are held in many different locations and often use varying interfaces and non-standard data formats, integrating and comparing their data can be difficult and time-consuming. This book provides an overview of the key tools currently available for large-scale comparisons of gene sequences and annotations, focusing on the databases and tools from the University of California, Santa Cruz (UCSC), Ensembl, and the National Center for Biotechnology Information (NCBI). Written specifically for biology and bioinformatics students and researchers, it aims to give an appreciation of the methods by which the browsers and their databases are constructed, enabling readers to determine which tool is the most appropriate for their requirements. Each chapter contains a summary and exercises to aid understanding and promote effective use of these important tools.
Using validated stimulus material is crucial for ensuring research comparability and replicability. However, many databases rely solely on bipolar valence ratings, ranging from negative to positive. While this material might be appropriate for certain studies, it does not reflect the complexity of attitudes and therefore might hamper the unambiguous interpretation of some study results. In fact, most databases cannot differentiate between neutral (i.e., neither positive nor negative) and ambivalent (i.e., simultaneously positive and negative) attitudes. Consequently, even presumably univalent (only positive or negative) stimuli cannot be clearly distinguished from ambivalent ones when selected via bipolar rating scales. In the present research, we introduce the Trier Univalence Neutrality Ambivalence (TUNA) database, a database containing 304,262 validation ratings from heterogeneous samples of 3,232 participants and at least 20 (M = 27.3, SD = 4.84) ratings per self-report scale per picture for a variety of attitude objects on split semantic differential scales. As these scales measure positive and negative evaluations independently, the TUNA database allows researchers to distinguish univalence, neutrality, and ambivalence (i.e., potential ambivalence). TUNA also goes beyond previous databases by validating the stimulus materials on affective outcomes such as experiences of conflict (i.e., felt ambivalence), arousal, anger, disgust, and empathy. The TUNA database consists of 796 pictures and is compatible with other popular databases. It focuses on food pictures in various forms (e.g., raw vs. cooked, non-processed vs. highly processed), but includes pictures of other objects that are typically used in research to study univalent (e.g., flowers) and ambivalent (e.g., money, cars) attitudes for comparison. Furthermore, to facilitate stimulus selection, the TUNA database has an accompanying desktop app that offers a multitude of filter options.
Abstract
The Reactome Knowledgebase (https://reactome.org) provides molecular details of signal transduction, transport, DNA replication, metabolism, and other cellular processes as an ordered network of molecular transformations, an extended version of a classic metabolic map, in a single consistent data model. Reactome functions both as an archive of biological processes and as a tool for discovering unexpected functional relationships in data such as gene expression profiles or somatic mutation catalogues from tumor cells. To support the continued brisk growth in the size and complexity of Reactome, we have implemented a graph database, improved performance of data analysis tools, and designed new data structures and strategies to boost diagram viewer performance. To make our website more accessible to human users, we have improved pathway display and navigation by implementing interactive Enhanced High Level Diagrams (EHLDs) with an associated icon library, and subpathway highlighting and zooming, in a simplified and reorganized website with adaptive design. To encourage re-use of our content, we have enabled export of pathway diagrams as 'PowerPoint' files.