Superhuman AI for multiplayer poker
Brown, Noam; Sandholm, Tuomas
Science (American Association for the Advancement of Science), 08/2019, Volume 365, Issue 6456
Journal Article
Peer reviewed
Open access
In recent years there have been great strides in artificial intelligence (AI), with games often serving as challenge problems, benchmarks, and milestones for progress. Poker has served for decades as such a challenge problem. Past successes in such benchmarks, including poker, have been limited to two-player games. However, poker in particular is traditionally played with more than two players. Multiplayer games present fundamental additional issues beyond those in two-player games, and multiplayer poker is a recognized AI milestone. In this paper we present Pluribus, an AI that we show is stronger than top human professionals in six-player no-limit Texas hold'em poker, the most popular form of poker played by humans.
Recent long-read assemblies often exceed the quality and completeness of available reference genomes, making validation challenging. Here we present Merqury, a novel tool for reference-free assembly evaluation based on efficient k-mer set operations. By comparing k-mers in a de novo assembly to those found in unassembled high-accuracy reads, Merqury estimates base-level accuracy and completeness. For trios, Merqury can also evaluate haplotype-specific accuracy, completeness, phase block continuity, and switch errors. Multiple visualizations, such as k-mer spectrum plots, can be generated for evaluation. We demonstrate on both human and plant genomes that Merqury is a fast and robust method for assembly validation.
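The k-mer comparison described above can be sketched in a few lines. This is a simplified illustration, not Merqury's implementation: it uses distinct k-mers only, ignores reverse complements and k-mer multiplicity, and does not filter out low-frequency (likely erroneous) read k-mers. The function names are hypothetical, and the quality-value formula (per-base accuracy taken as the k-th root of the shared k-mer fraction) is an assumption that the abstract itself does not state.

```python
import math

def kmers(seq, k):
    """Return the set of distinct k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_evaluation(assembly, reads, k):
    """Compare assembly k-mers against read k-mers (simplified sketch)."""
    asm = kmers(assembly, k)
    read_set = set()
    for read in reads:
        read_set |= kmers(read, k)

    shared = asm & read_set      # assembly k-mers supported by the reads
    asm_only = asm - read_set    # assembly k-mers absent from the reads (likely errors)

    # Completeness: fraction of read k-mers that the assembly recovers.
    completeness = len(shared) / len(read_set) if read_set else 0.0

    # Assumed accuracy model: a k-mer is correct only if all k bases are correct,
    # so per-base accuracy is the k-th root of the shared fraction.
    p_base = (len(shared) / len(asm)) ** (1.0 / k) if asm else 0.0
    error = 1.0 - p_base
    qv = -10.0 * math.log10(error) if error > 0 else math.inf

    return {"shared": len(shared), "asm_only": len(asm_only),
            "completeness": completeness, "qv": qv}

if __name__ == "__main__":
    print(kmer_evaluation("ACGTTCGT", ["ACGTACGT"], k=3))
```

A production tool would count canonical k-mers with a dedicated k-mer counter rather than Python sets, which do not scale to genome-sized inputs.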
When comparing outcomes after sepsis, it is essential to account for patient case mix to make fair comparisons. We developed a model to assess risk-adjusted 30-day mortality in the Michigan Hospital Medicine Safety’s sepsis initiative (HMS-Sepsis).
Can HMS-Sepsis registry data adequately predict risk of 30-day mortality? Do performance assessments using adjusted vs unadjusted data differ?
Retrospective cohort of community-onset sepsis hospitalizations in HMS-Sepsis registry (4/2022-9/2023), with split derivation (70%) and validation (30%) cohorts. We fit a risk-adjustment model (HMS-Sepsis mortality model) incorporating acute physiology, demographic, and baseline health data and assessed model performance using c-statistics, Brier scores, and comparisons of predicted vs observed mortality by deciles of risk. We compared hospital performance (1st quintile, middle quintiles, 5th quintile) using observed versus adjusted mortality to understand the extent to which risk-adjustment impacted hospital performance assessment.
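The three performance measures named above can be illustrated with a minimal pure-Python sketch. It shows the standard definitions only, not the study's code; the function names and the toy data are hypothetical, and the pairwise c-statistic is written for clarity rather than speed.

```python
def c_statistic(y_true, y_prob):
    """Probability that a random death is assigned a higher predicted risk
    than a random survivor (ties count as half). Equivalent to the ROC AUC."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def calibration_by_bins(y_true, y_prob, n_bins=10):
    """Sort by predicted risk, split into n_bins equal-sized groups (deciles
    when n_bins=10), and return (mean predicted, observed rate) per group."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows

if __name__ == "__main__":
    y = [0, 0, 1, 1]                 # toy outcomes (1 = died within 30 days)
    risk = [0.1, 0.4, 0.35, 0.8]     # toy predicted risks
    print(c_statistic(y, risk), brier_score(y, risk))
    print(calibration_by_bins(y, risk, n_bins=2))
```

A well-calibrated model yields groups in which the mean predicted risk is close to the observed mortality rate; a c-statistic of 0.5 indicates no discrimination and 1.0 indicates perfect discrimination.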
Among 17,514 hospitalizations from 66 hospitals during the study period, 12,260 (70%) were used for model derivation and 5,254 (30%) for model validation. 30-day mortality for the total cohort was 19.4%. The final model included 13 physiologic variables, two physiologic interactions, and 16 demographic and chronic health variables. The most significant variables were age, metastatic solid tumor, temperature, altered mental status, and platelet count. The model c-statistic was 0.82 for the derivation cohort, 0.81 for the validation cohort, and ≥0.78 for all subgroups assessed. Overall calibration error was 0.0% and mean calibration error across deciles of risk was 1.5%. Standardized mortality ratios yielded different assessments than observed mortality for 33.9% of hospitals.
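To see why standardized mortality ratios can rank hospitals differently from observed mortality, consider the following sketch. It assumes the common indirect-standardization definition (observed deaths divided by the sum of model-predicted risks); the abstract does not spell out the exact computation, and the two hospitals and their numbers are invented purely for illustration.

```python
def observed_mortality(deaths):
    """Crude (unadjusted) mortality rate for one hospital."""
    return sum(deaths) / len(deaths)

def standardized_mortality_ratio(deaths, predicted_risks):
    """Observed deaths divided by expected deaths, where expected deaths are
    the sum of each patient's model-predicted risk (assumed definition)."""
    expected = sum(predicted_risks)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return sum(deaths) / expected

if __name__ == "__main__":
    # Hypothetical hospital A: sicker patients, higher crude mortality.
    a_deaths, a_risks = [1, 0, 0, 1], [0.6, 0.3, 0.2, 0.9]
    # Hypothetical hospital B: healthier patients, lower crude mortality.
    b_deaths, b_risks = [0, 1, 0, 0], [0.05, 0.1, 0.05, 0.05]

    for name, d, r in (("A", a_deaths, a_risks), ("B", b_deaths, b_risks)):
        print(name, observed_mortality(d), standardized_mortality_ratio(d, r))
```

In this toy example hospital A has the higher crude mortality but an SMR near 1 (deaths match expectation), while hospital B has the lower crude mortality but an SMR near 4, so the two measures order the hospitals differently.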
The HMS-Sepsis mortality model had strong discrimination, adequate calibration, and reclassified one-third of hospitals to a different performance category from unadjusted mortality. Based on its strong performance, the HMS-Sepsis mortality model can aid in fair hospital benchmarking, assessment of temporal changes, and observational causal inference analysis.
For more than two decades, many efforts have been made to develop methods for extracting urban objects from data acquired by airborne sensors. In order to make the results of such algorithms more comparable, benchmarking data sets are of paramount importance. Such a data set, consisting of airborne image and laserscanner data, has been made available to the scientific community by ISPRS WGIII/4. Researchers were encouraged to submit their results of urban object detection and 3D building reconstruction, which were evaluated based on reference data. This paper presents the outcomes of the evaluation for building detection, tree detection, and 3D building reconstruction. The results achieved by different methods are compared and analysed to identify promising strategies for automatic urban object extraction from current airborne sensor data, as well as common problems of state-of-the-art methods.
•We organized two challenges for landmark detection, pathology classification and teeth segmentation in dental x-ray image analysis.
•Datasets include 400 cephalometric images and 120 bitewing images with a referenced standard generated by medical experts.
•The datasets and the evaluation software will be made available to the research community, further encouraging future developments in this field.
Dental radiography plays an important role in clinical diagnosis, treatment and surgery. In recent years, efforts have been made to develop computerized dental X-ray image analysis systems for clinical use. A novel framework for objective evaluation of automatic dental radiography analysis algorithms has been established under the auspices of the IEEE International Symposium on Biomedical Imaging 2015 Bitewing Radiography Caries Detection Challenge and Cephalometric X-ray Image Analysis Challenge. In this article, we present the datasets, methods and results of the challenge and lay down the principles for future uses of this benchmark. The main contributions of the challenge include the creation of the dental anatomy data repository of bitewing radiographs, the creation of the anatomical abnormality classification data repository of cephalometric radiographs, and the definition of objective quantitative evaluation for comparison and ranking of the algorithms. With this benchmark, seven automatic methods for analysing cephalometric X-ray images and two automatic methods for detecting bitewing radiography caries have been compared, and detailed quantitative evaluation results are presented in this paper. Based on the quantitative evaluation results, we believe automatic dental radiography analysis is still a challenging and unsolved problem. The datasets and the evaluation software will be made available to the research community, further encouraging future developments in this field. (http://www-o.ntust.edu.tw/~cweiwang/ISBI2015/)
Drug combination discovery depends on reliable synergy metrics, but no consensus exists on the correct synergy criterion to characterize combined interactions. The fragmented state of the field confounds analysis, impedes reproducibility, and delays clinical translation of potential combination treatments. Here we present a mass-action based formalism to quantify synergy. With this formalism, we clarify the relationship between the dominant drug synergy principles, and present a mapping of commonly used frameworks onto a unified synergy landscape. From this, we show how biases emerge due to intrinsic assumptions which hinder their broad applicability and impact the interpretation of synergy in discovery efforts. Specifically, we describe how traditional metrics mask consequential synergistic interactions, and contain biases dependent on the Hill slope and maximal effect of single drugs. We show how these biases systematically impact synergy classification in large combination screens, potentially misleading discovery efforts. Thus the proposed formalism can provide a consistent, unbiased interpretation of drug synergy, and accelerate the translatability of synergy studies.
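For readers unfamiliar with the quantities mentioned above, the sketch below shows a single-drug Hill dose-response curve (parameterized by Hill slope and maximal effect) and one widely used traditional synergy criterion, Bliss independence. Bliss is chosen here only as an example of a traditional metric; the abstract does not name it, and this is not the paper's mass-action formalism. All function and parameter names are hypothetical.

```python
def hill_effect(dose, ec50, hill_slope, e_max=1.0):
    """Fractional effect of a single drug under the Hill equation:
    E(d) = e_max * d^h / (ec50^h + d^h)."""
    if dose < 0 or ec50 <= 0 or hill_slope <= 0:
        raise ValueError("dose must be >= 0; ec50 and hill_slope must be > 0")
    d_h = dose ** hill_slope
    return e_max * d_h / (ec50 ** hill_slope + d_h)

def bliss_expected(effect_a, effect_b):
    """Expected combined fractional effect if the two drugs act independently."""
    return effect_a + effect_b - effect_a * effect_b

def bliss_excess(observed_combined, effect_a, effect_b):
    """Observed minus Bliss-expected effect; positive values are read as synergy."""
    return observed_combined - bliss_expected(effect_a, effect_b)

if __name__ == "__main__":
    ea = hill_effect(dose=1.0, ec50=1.0, hill_slope=2.0)   # 0.5 at dose == ec50
    eb = hill_effect(dose=1.0, ec50=1.0, hill_slope=1.0)   # 0.5 at dose == ec50
    print(bliss_expected(ea, eb), bliss_excess(0.8, ea, eb))
```

Because the Bliss expectation is computed from the single-drug effects, it depends on each drug's Hill slope and maximal effect through `hill_effect`, which is one way the parameter dependence described in the abstract can enter a traditional metric.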
•We assess the tradeoffs between three pillars of performance.
•We construct several scenarios representing policymakers' preferences.
•The sustainable development potential has varied across time and space.
•European OECD member countries have outperformed non-European ones.
Aiming to achieve sustainable development, a constantly growing number of countries have strived to promote economic growth while simultaneously mitigating environmental degradation and maximizing social welfare. However, despite the importance attributed to social well-being in contemporary discourse, its role has not received much attention in the performance evaluation literature. We propose a novel, multi-stage framework based on three dimensions of performance, allowing us to assess the tradeoffs between economic, environmental, and social efficiency in 28 OECD member countries from 2000 to 2019. We construct several scenarios representing policymakers' preferences by altering the weights assigned to the different performance pillars, allowing us to assess the environmental and social repercussions of economic growth. Our findings suggest that policies promoting relatively balanced growth patterns can offer opportunities for higher performance across all three pillars. At the same time, prioritizing development along any single dimension can trigger a relatively significant drop in progress in terms of the other two pillars. We also demonstrate that the sustainable development potential has varied across time and space. Comparisons suggest that the European OECD member countries have outperformed their non-European counterparts in terms of economic performance, health outcomes, life expectancy, and carbon dioxide (CO2) emissions. Our results can provide policymakers with insights into strategies for promoting economic growth that account for sustainable development objectives.
Highlights
• There is heterogeneity in the setting where dashboards are used.
• There is heterogeneity in the design of dashboards and users targeted.
• Dashboard use may be associated with improved outcomes in some contexts.
• It is unclear what dashboard characteristics are related to improved outcomes.