Big data and machine learning tools have jointly empowered humans to make data-driven decisions. However, many of them capture empirical associations that might be spurious due to confounding factors and subgroup heterogeneity. The famous Simpson's paradox is one such phenomenon, in which aggregated and subgroup-level associations contradict each other, causing cognitive confusion and difficulty in making adequate interpretations and decisions. Existing tools provide little insight for humans to locate, reason about, and prevent pitfalls of spurious association in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. These include a Confounder Dashboard, which can automatically identify possible confounding factors, and a Subgroup Viewer, which allows for the visualization and comparison of diverse subgroup patterns that likely or potentially result in a misinterpretation of causality. Additionally, we propose a Reasoning Storyboard, which uses a flow-based approach to illustrate paradoxical phenomena, as well as an interactive Decision Diagnosis panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed "de-paradox" workflow and the designed visual analytic system are effective in helping human users to identify and understand spurious associations, as well as to make accountable causal decisions.
• During the first year of the pandemic, U.S. Black and Latinx persons had lower case fatality rates (CFRs) overall than their white counterparts. However, after adjusting for age, Black and Latinx persons under age 65 had higher CFRs, a partial example of Simpson's paradox.
• The racial and ethnic disparity in CFR was highest among the youngest adults (18–49).
• There is considerable variability in observed CFR between states, likely due to differences in testing rates and reporting.
• More complete national data are needed to fully understand racial and ethnic disparities in the impact of the pandemic.
During the initial 12 months of the pandemic, racial and ethnic disparities in COVID-19 death rates received considerable attention, but it has been unclear whether disparities in death rates were due to disparities in case fatality rates (CFRs), incidence rates, or both. We examined differences in observed COVID-19 CFRs between U.S. White, Black/African American, and Latinx individuals during this period.
Using data from the COVID Tracking Project and the Centers for Disease Control and Prevention COVID-19 Case Surveillance Public Use dataset, we calculated CFR ratios comparing Black and Latinx to White individuals, both overall and separately by age group. We also used a model of monthly COVID-19 deaths to estimate CFR ratios, adjusting for age, gender, and differences across states and time.
Overall, Black and Latinx individuals had lower CFRs than their White counterparts. However, after adjusting for age, Black and Latinx individuals younger than 65 had higher CFRs than White individuals of the same age. CFRs varied substantially across states and time.
Disparities in COVID-19 case fatality among U.S. Black and Latinx individuals under age 65 were evident during the first year of the pandemic. Understanding racial and ethnic differences in COVID-19 CFRs is challenging due to limitations in available data.
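The reversal reported above, a lower overall CFR alongside a higher CFR among those under 65, can be reproduced with a small numeric sketch. The counts below are hypothetical and were invented purely for illustration; they are not taken from the COVID Tracking Project or CDC data, and the group labels `A` and `B` and the helper names `cfr` and `cfr_ratio` are assumptions of this example.

```python
# Hypothetical case and death counts, invented for illustration only.
# Group B's cases are concentrated in the younger, lower-risk stratum.
cases = {
    "A": {"<65": 1000, "65+": 1000},
    "B": {"<65": 1800, "65+": 200},
}
deaths = {
    "A": {"<65": 10, "65+": 100},
    "B": {"<65": 36, "65+": 20},
}


def cfr(group, age=None):
    """Case fatality rate (deaths / cases), overall or within one age stratum."""
    if age is None:
        return sum(deaths[group].values()) / sum(cases[group].values())
    return deaths[group][age] / cases[group][age]


def cfr_ratio(age=None):
    """CFR of group B relative to group A."""
    return cfr("B", age) / cfr("A", age)


overall_ratio = cfr_ratio()        # below 1: B looks better in aggregate
under65_ratio = cfr_ratio("<65")   # above 1: B is worse among those under 65
over65_ratio = cfr_ratio("65+")    # about 1: no difference among those 65+

print(f"overall CFR ratio (B vs A): {overall_ratio:.2f}")
print(f"under-65 CFR ratio:         {under65_ratio:.2f}")
print(f"65+ CFR ratio:              {over65_ratio:.2f}")
```

In this sketch the aggregate ratio falls below 1 only because group B's cases skew young; within the under-65 stratum its CFR is twice that of group A.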
We point out an instantiation of Simpson's paradox in COVID-19 case fatality rates (CFRs): comparing a large-scale study from China (February 17) with early reports from Italy (March 9), we find that CFRs are lower in Italy for every age group, but higher overall. This phenomenon is explained by a stark difference in case demographic between the two countries. Using this as a motivating example, we introduce basic concepts from mediation analysis and show how these can be used to quantify different direct and indirect effects when assuming a coarse-grained causal graph involving country, age, and case fatality. We curate an age-stratified CFR dataset with >750k cases and conduct a case study, investigating total, direct, and indirect (age-mediated) causal effects between different countries and at different points in time. This allows us to separate age-related effects from others unrelated to age and facilitates a more transparent comparison of CFRs across countries at different stages of the COVID-19 pandemic. Using longitudinal data from Italy, we discover a sign reversal of the direct causal effect in mid-March, which temporally aligns with the reported collapse of the healthcare system in parts of the country. Moreover, we find that direct and indirect effects across 132 pairs of countries are only weakly correlated, suggesting that a country's policy and case demographic may be largely unrelated. We point out limitations and extensions for future work, and finally, discuss the role of causal reasoning in the broader context of using AI to combat the COVID-19 pandemic.
Impact Statement: During a global pandemic, understanding the causal effects of risk factors such as age on COVID-19 fatality is an important scientific question. Since randomised controlled trials are typically infeasible or unethical in this context, causal investigations based on observational data, such as the one carried out in this article, will therefore be crucial in guiding our understanding of the available data. Causal inference, in particular mediation analysis, can be used to resolve apparent statistical paradoxes; help educate the public and decision-makers alike; avoid unsound comparisons; and answer a range of causal questions pertaining to the pandemic, subject to transparently stated assumptions. Our exposition helps clarify how mediation analysis can be used to investigate direct and indirect effects along different causal paths and thus serves as a stepping stone for future studies of other important risk factors for COVID-19 besides age.
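The decomposition into total, direct, and indirect (age-mediated) effects described above can be sketched with the standard mediation formulas, assuming the coarse-grained graph country → age → fatality plus country → fatality with no unobserved confounding. The age shares and CFRs below are hypothetical, invented for illustration and not drawn from the curated dataset; the labels `c0` and `c1` and the helper `expected_cfr` are likewise assumptions of this example rather than the authors' code.

```python
# Hypothetical age-stratified inputs, invented for illustration only.
AGE_GROUPS = ["0-49", "50-69", "70+"]

# P(age | country): share of confirmed cases falling in each age group.
age_share = {
    "c0": {"0-49": 0.6, "50-69": 0.3, "70+": 0.1},
    "c1": {"0-49": 0.2, "50-69": 0.4, "70+": 0.4},
}

# f(age, country): case fatality rate within each age group.
# Country c1 has the lower CFR in every age group.
cfr = {
    "c0": {"0-49": 0.004, "50-69": 0.04, "70+": 0.15},
    "c1": {"0-49": 0.002, "50-69": 0.03, "70+": 0.14},
}


def expected_cfr(age_from, cfr_from):
    """Overall CFR using one country's case age mix and another's age-specific CFRs."""
    return sum(age_share[age_from][a] * cfr[cfr_from][a] for a in AGE_GROUPS)


baseline = expected_cfr("c0", "c0")

# Total effect: switch both the age mix and the age-specific CFRs from c0 to c1.
total_effect = expected_cfr("c1", "c1") - baseline

# Natural direct effect: switch the age-specific CFRs, keep c0's age mix.
direct_effect = expected_cfr("c0", "c1") - baseline

# Natural indirect effect: switch the age mix, keep c0's age-specific CFRs.
indirect_effect = expected_cfr("c1", "c0") - baseline

print(f"total effect:    {total_effect:+.4f}")
print(f"direct effect:   {direct_effect:+.4f}")
print(f"indirect effect: {indirect_effect:+.4f}")
```

With these made-up numbers the direct effect is negative while the indirect, age-mediated effect is positive and larger in magnitude, which is the pattern behind a CFR that is lower in every age group yet higher overall. The natural direct and indirect effects defined this way need not sum exactly to the total effect.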
Understanding why diversity sometimes limits disease is essential for managing outbreaks; however, mechanisms underlying this ‘dilution effect’ remain poorly understood. Negative diversity‐disease relationships have previously been detected in plant communities impacted by an emerging forest disease, sudden oak death. We used this focal system to empirically evaluate whether these relationships were driven by dilution mechanisms that reduce transmission risk for individuals or by the fact that disease was averaged across the host community. We integrated laboratory competence measurements with plant community and symptom data from a large forest monitoring network. Richness increased disease risk for bay laurel trees, dismissing possible dilution mechanisms. Nonetheless, richness was negatively associated with community‐level disease prevalence because the disease was aggregated among hosts that vary in disease susceptibility. Aggregating observations (which is surprisingly common in other dilution effect studies) can lead to misinterpretations of dilution mechanisms and bias towards a negative diversity‐disease relationship.
We studied natural forests infested by sudden oak death to empirically evaluate whether previously observed negative diversity‐disease relationships were driven by dilution mechanisms that reduce transmission risk for individuals, or by the fact that disease was averaged across the host community. Due to distinct underlying drivers at different hierarchical levels, the effects of diversity on disease risk were positive, negative and neutral depending on how the disease was measured. We show that aggregating observations (which is surprisingly common in other dilution effect studies) can lead to misinterpretations of dilution mechanisms and bias towards a negative diversity‐disease relationship.
Bitter fighting among Christian factions and immoral behavior among Church leaders led to a transition to secular thought in Europe (see Zaman (2018) for details). One of the consequences of the rejection of religion was the rejection of all unobservables. Empiricists like David Hume rejected all knowledge that was not based on observations and logic. He famously stated: “If we take in our hand any volume; of divinity or school metaphysics, for instance; let us ask, Does it contain any abstract reasoning concerning quantity or number? No. Does it contain any experimental reasoning concerning matter of fact and existence? No. Commit it then to the flames: for it can contain nothing but sophistry and illusion.” David Hume further realized that causality was not observable. This means that it is observable that event Y happened after event X, but it is not observable that Y happened due to X. The underlying mechanisms that connect X to Y are not observable. The current article discusses the impact of changing causal structures on relationships and on the results of econometric analysis. It shows that conventional econometric analysis is devoid of causal chains, which makes it impossible to get realistic results.
The Simpson's paradox unraveled. Hernán, Miguel A; Clayton, David; Keiding, Niels
International Journal of Epidemiology, 06/2011, Volume 40, Issue 3
Journal Article. Peer reviewed. Open access.
Background In a famous article, Simpson described a hypothetical data example that led to apparently paradoxical results.
Methods We make the causal structure of Simpson's example explicit.
Results We show how the paradox disappears when the statistical analysis is appropriately guided by subject-matter knowledge. We also review previous explanations of Simpson's paradox that attributed it to two distinct phenomena: confounding and non-collapsibility.
Conclusion Analytical errors may occur when the problem is stripped of its causal context and analyzed merely in statistical terms.
We present the problem of measuring the strength of a causal interaction, starting from the linear perspective and generalizing to a nonlinear measure of causal influence. The proposed measure of causal strength is interpretable and we demonstrate that it may be estimated efficiently using Gaussian process regression. We validate our results on several examples and connect our results to the existing causal inference literature.
A previous note illustrated how the odds of an outcome have an undesirable property for risk summarization and communication: Noncollapsibility, defined as a failure of a group measure to represent a simple average of the measure over individuals or subgroups. The present sequel discusses how odds ratios amplify odds noncollapsibility and provides a basic numeric illustration of how noncollapsibility differs from confounding of effects (with which it is often confused). It also draws a connection of noncollapsibility to sparse-data bias in logistic, log-linear, and proportional-hazards regression.
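The distinction between noncollapsibility and confounding can be made concrete with a small numeric sketch. The risks below are hypothetical and chosen only for illustration; they are not the figures from the note itself. The two strata are equally sized in both treatment arms, so the stratum variable is independent of treatment and there is no confounding, yet the marginal odds ratio still differs from the common stratum-specific odds ratio. The names `strata`, `odds`, and `odds_ratio` are assumptions of this example.

```python
# Hypothetical risks, invented for illustration only.
# Each stratum makes up half of both the treated and the untreated arm,
# so stratum membership is independent of treatment (no confounding).
strata = [
    {"weight": 0.5, "risk_untreated": 0.2, "risk_treated": 0.5},
    {"weight": 0.5, "risk_untreated": 0.5, "risk_treated": 0.8},
]


def odds(p):
    """Convert a risk (probability) into odds."""
    return p / (1 - p)


def odds_ratio(risk_treated, risk_untreated):
    """Odds ratio comparing treated with untreated."""
    return odds(risk_treated) / odds(risk_untreated)


# Stratum-specific odds ratios: both are about 4.
stratum_ors = [odds_ratio(s["risk_treated"], s["risk_untreated"]) for s in strata]

# Marginal risks are weighted averages of the stratum risks.
marginal_untreated = sum(s["weight"] * s["risk_untreated"] for s in strata)
marginal_treated = sum(s["weight"] * s["risk_treated"] for s in strata)

# The marginal odds ratio is not an average of the stratum odds ratios.
marginal_or = odds_ratio(marginal_treated, marginal_untreated)

# The risk difference, by contrast, is collapsible in this setting.
stratum_rds = [s["risk_treated"] - s["risk_untreated"] for s in strata]
marginal_rd = marginal_treated - marginal_untreated

print("stratum odds ratios:", [round(x, 3) for x in stratum_ors])
print("marginal odds ratio:", round(marginal_or, 3))
print("stratum risk differences:", [round(x, 3) for x in stratum_rds])
print("marginal risk difference:", round(marginal_rd, 3))
```

Here both stratum odds ratios are about 4 while the marginal odds ratio is about 3.45, even though the risk difference is 0.3 in each stratum and marginally. The gap in the odds ratio therefore reflects the measure itself, not a confounded comparison.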
The authors describe the relative benefits of conducting meta-analyses with (a) individual participant data (IPD) gathered from the constituent studies and (b) aggregated data (AD), or the group-level statistics (in particular, effect sizes) that appear in reports of a study's results. Given that both IPD and AD are equally available, meta-analysis of IPD is superior to meta-analysis of AD. IPD meta-analysis permits synthesists to perform subgroup analyses not conducted by the original collectors of the data, to check the data and analyses in the original studies, to add new information to the data sets, and to use different statistical methods. However, the cost of IPD meta-analysis and the lack of available IPD data sets suggest that the best strategy currently available is to use both approaches in a complementary fashion such that the first step in conducting an IPD meta-analysis would be to conduct an AD meta-analysis. Regardless of whether a meta-analysis is conducted with IPD or AD, synthesists must remain vigilant in how they interpret their results. They must avoid ecological fallacies, Simpson's paradox, and interpretation of synthesis-generated evidence as supporting causal inferences.