Finally, here is a modern, self-contained text on quantum information theory suitable for graduate-level courses. Developing the subject 'from the ground up', it covers classical results as well as major advances of the past decade. Beginning with an extensive overview of classical information theory suitable for the non-expert, the author then turns his attention to quantum mechanics for quantum information theory, and the important protocols of teleportation, super-dense coding and entanglement distribution. He develops all of the tools necessary for understanding important results in quantum information theory, including capacity theorems for classical, entanglement-assisted, private and quantum communication. The book also covers important recent developments such as superadditivity of private, coherent and Holevo information, and the superactivation of quantum capacity. This book will be warmly welcomed by the upcoming generation of quantum information theorists and the already established community of classical information theorists.
Moving Objects Databases is the first uniform treatment of moving objects databases, the technology that supports GPS and RFID. It focuses on the modeling and design of data from moving objects (such as people, animals, vehicles, hurricanes, forest fires, oil spills, armies, or other objects) as well as the storage, retrieval, and querying of that very voluminous data. It includes homework assignments at the end of each chapter, exercises throughout the text that students can complete as they read, and a solutions manual in the back of the book. This book is intended for graduate or advanced undergraduate students. It is also recommended for computer scientists and database systems engineers and programmers in government, industry and academia, and for professionals from other disciplines, e.g., geography, geology, soil science, hydrology, urban and regional planning, mobile computing, bioterrorism and homeland security. The book demonstrates through many practical examples and illustrations how new concepts and techniques are used to integrate time and space in database applications, and it provides exercises and solutions in each chapter to enable the reader to explore recent research results in practice.
Mark Davison examines several legal models designed to protect databases, considering in particular the EU Directive, the history of its adoption and its transposition into national laws. He compares the Directive with a range of American legislative proposals, as well as the principles of misappropriation that underpin them. The book also contains a commentary on the appropriateness of the various models in the context of moves for an international agreement on the topic. This book will be of interest to academics and practitioners, including those involved with databases and other forms of new media.
The Burrows-Wheeler Transform is a text transformation scheme that has found applications in different aspects of the data explosion problem, from data compression to index structures and search. The BWT belongs to a new class of compression algorithms, distinguished by its ability to perform compression by sorted contexts. More recently, the BWT has also found various applications in addition to text data compression, such as in lossless and lossy image compression, tree-source identification, bioinformatics, machine translation, shape matching, and test data compression. This book will serve as a reference for seasoned professionals and researchers in the area, while providing a gentle introduction that makes it accessible for senior undergraduate students or first-year graduate students embarking upon research in compression, pattern matching, full text retrieval, compressed index structures, or other areas related to the BWT.
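To make "compression by sorted contexts" concrete, the sketch below computes the transform the naive way: append a sentinel, sort all rotations of the string, and read off the last column. Characters that precede similar contexts end up next to each other, which is what later compression stages exploit. This is only a minimal illustration, not the construction the book presents; the function names `bwt` and `inverse_bwt` and the `$` sentinel are assumptions made for the example, and practical implementations typically build on suffix arrays rather than sorting full rotations.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler Transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Every rotation of s, sorted lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the final character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
restored = inverse_bwt(transformed)
```

For the input `banana`, the transform groups the repeated `a` and `n` characters together, and `inverse_bwt` recovers the original string from the transformed one alone.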
This is one book that can genuinely be said to be straight from the horse's mouth. Written by the originator of the technique, it examines parallel coordinates as the leading methodology for multidimensional visualization. Starting from geometric foundations, this is the first systematic and rigorous exposition of the methodology's mathematical and algorithmic components. It covers, among many others, the visualization of multidimensional lines, minimum distances, planes, hyperplanes, and clusters of "near" planes. The last chapter explains in a non-technical way the methodology's application to visual and automatic data mining. The principles of the latter, along with guidelines, strategies and algorithms are illustrated in detail on real high-dimensional datasets.
Engagement with scientific manuscripts is frequently facilitated by Twitter and other social media platforms. As such, the demographics of a paper's social media audience provide a wealth of information about how scholarly research is transmitted, consumed, and interpreted by online communities. By paying attention to public perceptions of their publications, scientists can learn whether their research is stimulating positive scholarly and public thought. They can also become aware of potentially negative patterns of interest from groups that misinterpret their work in harmful ways, either willfully or unintentionally, and devise strategies for altering their messaging to mitigate these impacts. In this study, we collected 331,696 Twitter posts referencing 1,800 highly tweeted bioRxiv preprints and leveraged topic modeling to infer the characteristics of various communities engaging with each preprint on Twitter. We agnostically learned the characteristics of these audience sectors from keywords each user's followers provide in their Twitter biographies. We estimate that 96% of the preprints analyzed are dominated by academic audiences on Twitter, suggesting that social media attention does not always correspond to greater public exposure. We further demonstrate how our audience segmentation method can quantify the level of interest from nonspecialist audience sectors such as mental health advocates, dog lovers, video game developers, vegans, bitcoin investors, conspiracy theorists, journalists, religious groups, and political constituencies. Surprisingly, we also found that 10% of the preprints analyzed have sizable (>5%) audience sectors that are associated with right-wing white nationalist communities. Although none of these preprints appear to intentionally espouse any right-wing extremist messages, cases exist in which extremist appropriation comprises more than 50% of the tweets referencing a given preprint. These results present unique opportunities for improving and contextualizing the public discourse surrounding scientific research.
In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
This open access book is part of the LAMBDA Project (Learning, Applying, Multiplying Big Data Analytics), funded by the European Union, GA No. 809965. Data Analytics involves applying algorithmic processes to derive insights. Nowadays it is used in many industries to allow organizations and companies to make better decisions as well as to verify or disprove existing theories or models. The term data analytics is often used interchangeably with intelligence, statistics, reasoning, data mining, knowledge discovery, and others. The goal of this book is to introduce some of the definitions, methods, tools, frameworks, and solutions for big data processing, starting from the process of information extraction and knowledge representation, via knowledge processing and analytics to visualization, sense-making, and practical applications. Each chapter in this book addresses some pertinent aspect of the data processing chain, with a specific focus on understanding Enterprise Knowledge Graphs, Semantic Big Data Architectures, and Smart Data Analytics solutions. This book is addressed to graduate students from technical disciplines, to professional audiences following continuous education short courses, and to researchers from diverse areas following self-study courses. Basic skills in computer science, mathematics, and statistics are required.
Clinical prediction models estimated with health records data may perpetuate inequities.
To evaluate racial/ethnic differences in the performance of statistical models that predict suicide.
In this diagnostic/prognostic study, performed from January 1, 2009, to September 30, 2017, with follow-up through December 31, 2017, all outpatient mental health visits to 7 large integrated health care systems by patients 13 years or older were evaluated. Prediction models were estimated using logistic regression with LASSO variable selection and random forest in a training set that contained all visits from a 50% random sample of patients (6 984 184 visits). Performance was evaluated in the remaining 6 996 386 visits, including visits from White (4 031 135 visits), Hispanic (1 664 166 visits), Black (578 508 visits), Asian (313 011 visits), and American Indian/Alaskan Native (48 025 visits) patients and patients without race/ethnicity recorded (274 702 visits). Data analysis was performed from January 1, 2019, to February 1, 2021.
Demographic, diagnosis, prescription, and utilization variables and Patient Health Questionnaire 9 responses.
Suicide death in the 90 days after a visit.
This study included 13 980 570 visits by 1 433 543 patients (64% female; mean [SD] age, 42 [18] years). A total of 768 suicide deaths were observed within 90 days after 3143 visits. Suicide rates were highest for visits by patients with no race/ethnicity recorded (n = 313 visits followed by suicide within 90 days, rate = 5.71 per 10 000 visits), followed by visits by Asian (n = 187 visits followed by suicide within 90 days, rate = 2.99 per 10 000 visits), White (n = 2134 visits followed by suicide within 90 days, rate = 2.65 per 10 000 visits), American Indian/Alaskan Native (n = 21 visits followed by suicide within 90 days, rate = 2.18 per 10 000 visits), Hispanic (n = 392 visits followed by suicide within 90 days, rate = 1.18 per 10 000 visits), and Black (n = 65 visits followed by suicide within 90 days, rate = 0.56 per 10 000 visits) patients. The area under the curve (AUC) and sensitivity of both models were high for White, Hispanic, and Asian patients and poor for Black and American Indian/Alaskan Native patients and patients without race/ethnicity recorded. For example, the AUC for the logistic regression model was 0.828 (95% CI, 0.815-0.840) for White patients compared with 0.640 (95% CI, 0.598-0.681) for patients with unrecorded race/ethnicity and 0.599 (95% CI, 0.513-0.686) for American Indian/Alaskan Native patients. Sensitivity at the 90th percentile was 62.2% (95% CI, 59.2%-65.0%) for White patients compared with 27.5% (95% CI, 21.0%-34.7%) for patients with unrecorded race/ethnicity and 10.0% (95% CI, 0%-23.0%) for Black patients.
Results were similar for random forest models, with an AUC of 0.812 (95% CI, 0.800-0.826) for White patients compared with 0.676 (95% CI, 0.638-0.714) for patients with unrecorded race/ethnicity and 0.642 (95% CI, 0.579-0.710) for American Indian/Alaskan Native patients and sensitivities at the 90th percentile of 52.8% (95% CI, 50.0%-55.8%) for White patients, 29.3% (95% CI, 22.8%-36.5%) for patients with unrecorded race/ethnicity, and 6.7% (95% CI, 0%-16.7%) for Black patients.
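The two performance measures reported here, AUC and sensitivity at the 90th percentile, can be computed per subgroup along the lines of the sketch below. It is an illustration only and not the study's analysis code: the function names are hypothetical, the data are made-up toy values, and it assumes that the 90th-percentile cutoff is taken over the risk scores of all evaluation visits (using the nearest-rank method) and then applied within each group. Confidence intervals are omitted.

```python
from collections import defaultdict


def auc(scores, labels):
    """Probability that a random event visit outscores a random non-event visit."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    # Count pairwise wins, with ties counted as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def percentile_threshold(scores, pct=90):
    """Nearest-rank percentile: smallest score with at least pct% of scores at or below it."""
    ordered = sorted(scores)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    return ordered[rank - 1]


def sensitivity(scores, labels, threshold):
    """Share of event visits whose score is at or above the threshold."""
    event_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not event_scores:
        return None
    return sum(s >= threshold for s in event_scores) / len(event_scores)


def metrics_by_group(scores, labels, groups, pct=90):
    """AUC and sensitivity per group, using one cutoff from the pooled scores (assumption)."""
    threshold = percentile_threshold(scores, pct)
    by_group = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        by_group[g][0].append(s)
        by_group[g][1].append(y)
    return {
        g: {"auc": auc(sc, lb), "sensitivity": sensitivity(sc, lb, threshold)}
        for g, (sc, lb) in by_group.items()
    }


# Toy data (invented for illustration, not from the study).
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.6, 0.5, 0.4, 0.7, 0.05]
toy_labels = [1, 0, 0, 0, 0, 1, 0, 1, 0, 0]
toy_groups = ["A"] * 5 + ["B"] * 5
toy_result = metrics_by_group(toy_scores, toy_labels, toy_groups)
```

In the toy data, the model ranks group A's single event above all of its non-events, while group B's events are scattered among its non-events and fall below the pooled cutoff. That is the pattern the abstract describes: a model can discriminate well in one group and poorly in another.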
These suicide prediction models may provide fewer benefits and more potential harms to American Indian/Alaskan Native or Black patients or those with unrecorded race/ethnicity compared with White, Hispanic, and Asian patients. Improving predictive performance in disadvantaged populations should be prioritized to improve, rather than exacerbate, health disparities.