In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.
To compare racial and ethnic differences between obstetrician-gynecologists (ob-gyns) and other large groups of adult medical specialists who provide the predominant care of women. Whether physician diversity influences practice location in underserved areas was also examined.
This cross-sectional study reports an analysis of U.S. national data about racial and ethnic characteristics, gender, and specialty (obstetrics and gynecology, general internal medicine, family medicine, emergency medicine) of 190,379 physicians who came from three resources (Association of American Medical Colleges Student Records System, Association of American Medical Colleges Minority Physicians Database, American Medical Association Physician Masterfile). Underserved locations were identified as being rural, having 20% or more of the population living in poverty or being federally designated as areas of professional shortages or underserved populations. Bivariate measures of associations were performed to study the association between physician race and ethnicity and their practice location.
Female physicians in all specialties were more likely than males to be nonwhite, and ob-gyns were most likely to be female (61.9%). Compared with other studied specialists, ob-gyns had the highest proportion of underrepresented minorities (combined, 18.4%), especially black (11.1%) and Hispanic (6.7%) physicians. Underrepresented minority ob-gyns were more likely than white or Asian ob-gyns to practice in federally funded underserved areas or where poverty levels were high. Native Americans, Alaska Natives, and Pacific Islanders were the ob-gyn group with the highest proportion practicing in rural areas.
Compared with other adult medical specialists, ob-gyns have a relatively high proportion of black and Hispanic physicians. A higher proportion of underrepresented minority ob-gyns practiced in medically underserved areas.
Clinical prediction models estimated with health records data may perpetuate inequities.
To evaluate racial/ethnic differences in the performance of statistical models that predict suicide.
In this diagnostic/prognostic study, performed from January 1, 2009, to September 30, 2017, with follow-up through December 31, 2017, all outpatient mental health visits to 7 large integrated health care systems by patients 13 years or older were evaluated. Prediction models were estimated using logistic regression with LASSO variable selection and random forest in a training set that contained all visits from a 50% random sample of patients (6 984 184 visits). Performance was evaluated in the remaining 6 996 386 visits, including visits from White (4 031 135 visits), Hispanic (1 664 166 visits), Black (578 508 visits), Asian (313 011 visits), and American Indian/Alaskan Native (48 025 visits) patients and patients without race/ethnicity recorded (274 702 visits). Data analysis was performed from January 1, 2019, to February 1, 2021.
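The patient-level split described above (all visits from a 50% random sample of patients go to training, the rest to evaluation) can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the field name `patient_id`, and the function `split_by_patient` are hypothetical and are not taken from the study.

```python
import random

def split_by_patient(visits, train_fraction=0.5, seed=0):
    """Split visit records so that every visit by a given patient
    lands in the same set (hypothetical helper, not the study's code)."""
    # Sample patients, not visits, to avoid leaking a patient across sets.
    patient_ids = sorted({v["patient_id"] for v in visits})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_train = int(len(patient_ids) * train_fraction)
    train_ids = set(patient_ids[:n_train])
    train = [v for v in visits if v["patient_id"] in train_ids]
    test = [v for v in visits if v["patient_id"] not in train_ids]
    return train, test

# Toy data: 10 patients with 3 visits each (made-up values).
visits = [{"patient_id": p, "visit": k} for p in range(10) for k in range(3)]
train, test = split_by_patient(visits)
train_patients = {v["patient_id"] for v in train}
test_patients = {v["patient_id"] for v in test}
```

Sampling at the patient level rather than the visit level keeps repeated visits by one person from appearing on both sides of the split.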
Demographic, diagnosis, prescription, and utilization variables and Patient Health Questionnaire 9 responses.
Suicide death in the 90 days after a visit.
This study included 13 980 570 visits by 1 433 543 patients (64% female; mean [SD] age, 42 [18] years). A total of 768 suicide deaths were observed within 90 days after 3143 visits. Suicide rates were highest for visits by patients with no race/ethnicity recorded (n = 313 visits followed by suicide within 90 days, rate = 5.71 per 10 000 visits), followed by visits by Asian (n = 187 visits followed by suicide within 90 days, rate = 2.99 per 10 000 visits), White (n = 2134 visits followed by suicide within 90 days, rate = 2.65 per 10 000 visits), American Indian/Alaskan Native (n = 21 visits followed by suicide within 90 days, rate = 2.18 per 10 000 visits), Hispanic (n = 392 visits followed by suicide within 90 days, rate = 1.18 per 10 000 visits), and Black (n = 65 visits followed by suicide within 90 days, rate = 0.56 per 10 000 visits) patients. The area under the curve (AUC) and sensitivity of both models were high for White, Hispanic, and Asian patients and poor for Black and American Indian/Alaskan Native patients and patients without race/ethnicity recorded. For example, the AUC for the logistic regression model was 0.828 (95% CI, 0.815-0.840) for White patients compared with 0.640 (95% CI, 0.598-0.681) for patients with unrecorded race/ethnicity and 0.599 (95% CI, 0.513-0.686) for American Indian/Alaskan Native patients. Sensitivity at the 90th percentile was 62.2% (95% CI, 59.2%-65.0%) for White patients compared with 27.5% (95% CI, 21.0%-34.7%) for patients with unrecorded race/ethnicity and 10.0% (95% CI, 0%-23.0%) for Black patients.
Results were similar for random forest models, with an AUC of 0.812 (95% CI, 0.800-0.826) for White patients compared with 0.676 (95% CI, 0.638-0.714) for patients with unrecorded race/ethnicity and 0.642 (95% CI, 0.579-0.710) for American Indian/Alaskan Native patients and sensitivities at the 90th percentile of 52.8% (95% CI, 50.0%-55.8%) for White patients, 29.3% (95% CI, 22.8%-36.5%) for patients with unrecorded race/ethnicity, and 6.7% (95% CI, 0%-16.7%) for Black patients.
These suicide prediction models may provide fewer benefits and more potential harms to American Indian/Alaskan Native or Black patients or those with unrecorded race/ethnicity compared with White, Hispanic, and Asian patients. Improving predictive performance in disadvantaged populations should be prioritized to improve, rather than exacerbate, health disparities.
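The subgroup comparison reported in this abstract (AUC and sensitivity at the 90th percentile, computed separately per racial/ethnic group) can be illustrated with a small sketch. The data below are made up, the function names are hypothetical, and the sketch assumes that the 90th-percentile threshold is taken over all pooled risk scores; the abstract does not state how the threshold was defined.

```python
def auc(scores, labels):
    """Probability that a random event outranks a random non-event
    (ties count half). Returns None if either class is absent."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def percentile_threshold(scores, pct=90):
    """Score at the given percentile of the pooled scores (assumption:
    the cut point is shared by all subgroups)."""
    ordered = sorted(scores)
    idx = min(len(ordered) - 1, len(ordered) * pct // 100)
    return ordered[idx]

def sensitivity(scores, labels, threshold):
    """Share of events whose score is at or above the threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not pos:
        return None
    return sum(s >= threshold for s in pos) / len(pos)

# Toy records: (group, risk score, event within 90 days). Made-up values.
records = [
    ("A", 0.90, 1), ("A", 0.80, 0), ("A", 0.30, 0), ("A", 0.20, 0), ("A", 0.95, 1),
    ("B", 0.40, 1), ("B", 0.50, 0), ("B", 0.10, 0), ("B", 0.60, 1), ("B", 0.70, 0),
]
threshold = percentile_threshold([s for _, s, _ in records])

results = {}
for group in sorted({g for g, _, _ in records}):
    scores = [s for g, s, _ in records if g == group]
    labels = [y for g, _, y in records if g == group]
    results[group] = {
        "auc": auc(scores, labels),
        "sensitivity": sensitivity(scores, labels, threshold),
    }
```

In this toy example group A is ranked perfectly while group B is ranked no better than chance, which is the kind of gap that a pooled performance figure would hide.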
Data Science Dinov, Ivo D; Velev, Milen Velchev
2022.
eBook
The book includes many illustrations of model-based and model-free spacekime analytic techniques applied to economic forecasting, identification of functional brain activation, and high-dimensional cohort phenotyping.
This open access book is part of the LAMBDA Project (Learning, Applying, Multiplying Big Data Analytics), funded by the European Union, GA No. 809965. Data Analytics involves applying algorithmic processes to derive insights. Nowadays it is used in many industries to allow organizations and companies to make better decisions as well as to verify or disprove existing theories or models. The term data analytics is often used interchangeably with intelligence, statistics, reasoning, data mining, knowledge discovery, and others. The goal of this book is to introduce some of the definitions, methods, tools, frameworks, and solutions for big data processing, starting from the process of information extraction and knowledge representation, via knowledge processing and analytics to visualization, sense-making, and practical applications. Each chapter in this book addresses some pertinent aspect of the data processing chain, with a specific focus on understanding Enterprise Knowledge Graphs, Semantic Big Data Architectures, and Smart Data Analytics solutions. This book is addressed to graduate students from technical disciplines, to professional audiences following continuous education short courses, and to researchers from diverse areas following self-study courses. Basic skills in computer science, mathematics, and statistics are required.
The need to align investments in health research and development (R&D) with public health demands is one of the most pressing global public health challenges. We aim to provide a comprehensive description of available data sources, propose a set of indicators for monitoring the global landscape of health R&D, and present a sample of country indicators on research inputs (investments), processes (clinical trials), and outputs (publications), based on data from international databases. Total global investments in health R&D (both public and private sector) in 2009 reached US$240 billion. Of the US$214 billion invested in high-income countries, 60% of health R&D investments came from the business sector, 30% from the public sector, and about 10% from other sources (including private non-profit organisations). Only about 1% of all health R&D investments were allocated to neglected diseases in 2010. Diseases of relevance to high-income countries were investigated in clinical trials seven-to-eight-times more often than were diseases whose burden lies mainly in low-income and middle-income countries. This report confirms that substantial gaps in the global landscape of health R&D remain, especially for and in low-income and middle-income countries. Too few investments are targeted towards the health needs of these countries. Better data are needed to improve priority setting and coordination for health R&D, ultimately to ensure that resources are allocated to diseases and regions where they are needed the most. The establishment of a global observatory on health R&D, which is being discussed at WHO, could address the absence of a comprehensive and sustainable mechanism for regular global monitoring of health R&D.
Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications brings together all the information, tools and methods a professional will need to efficiently use text mining applications and statistical analysis. Winner of a 2012 PROSE Award in Computing and Information Sciences from the Association of American Publishers, this book presents a comprehensive how-to reference that shows the user how to conduct text mining and statistically analyze results. In addition to providing an in-depth examination of core text mining and link detection tools, methods and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection using real world example tutorials in such varied fields as corporate, finance, business intelligence, genomics research, and counterterrorism activities. The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase dramatically. Extensive case studies, most in a tutorial format, allow the reader to 'click through' the example using a software program, thus learning to conduct text mining analyses in the most rapid manner of learning possible. Numerous examples, tutorials, power points and datasets are available via the companion website on Elsevierdirect.com. A glossary of text mining terms is provided in the appendix.
Designing Data Spaces Otto, Boris; ten Hompel, Michael; Wrobel, Stefan
2022, 2022-07-21
eBook
Open access
This open access book provides a comprehensive view on data ecosystems and platform economics from methodical and technological foundations up to reports from practical implementations and applications in various industries. To this end, the book is structured in four parts: Part I “Foundations and Contexts” provides a general overview about building, running, and governing data spaces and an introduction to the IDS and GAIA-X projects. Part II “Data Space Technologies” subsequently details various implementation aspects of IDS and GAIA-X, including, e.g., data usage control, the usage of blockchain technologies, or semantic data integration and interoperability. Next, Part III describes various “Use Cases and Data Ecosystems” from various application areas such as agriculture, healthcare, industry, energy, and mobility. Part IV eventually offers an overview of several “Solutions and Applications”, including, e.g., products and experiences from companies like Google, SAP, Huawei, T-Systems, Innopay and many more. Overall, the book provides professionals in industry with an encompassing overview of the technological and economic aspects of data spaces, based on the International Data Spaces and Gaia-X initiatives. It presents implementations and business cases and gives an outlook to future developments. In doing so, it aims at proliferating the vision of a social data market economy based on data spaces which embrace trust and data sovereignty.
Data management for libraries Krier, Laura; Strasser, Carly A
2014., 2013, 2014-03-30, 2014-01-01
eBook
This guide offers a start-to-finish primer on understanding, building, and maintaining a data management service, showing another way the academic library can be invaluable to researchers.