Directed acyclic graphs (DAGs) are an increasingly popular approach for identifying confounding variables that require conditioning when estimating causal effects. This review examined the use of DAGs in applied health research to inform recommendations for improving their transparency and utility in future research.
Original health research articles published during 1999-2017 mentioning 'directed acyclic graphs' (or similar) or citing DAGitty were identified from Scopus, Web of Science, Medline and Embase. Data were extracted on the reporting of estimands, DAGs and adjustment sets, alongside the characteristics of each article's largest DAG.
A total of 234 articles were identified that reported using DAGs. A fifth (n = 48, 21%) reported their target estimand(s) and half (n = 115, 48%) reported the adjustment set(s) implied by their DAG(s). Almost two-thirds of the articles (n = 144, 62%) made at least one DAG available. DAGs varied in size but averaged 12 nodes (interquartile range (IQR): 9-16, range: 3-28) and 29 arcs (IQR: 19-42, range: 3-99). The median saturation (i.e. percentage of total possible arcs) was 46% (IQR: 31-67, range: 12-100). Of these DAGs, 37% (n = 53) included unobserved variables, 17% (n = 25) included 'super-nodes' (i.e. nodes containing more than one variable) and 34% (n = 49) were visually arranged so that the constituent arcs flowed in the same direction (e.g. top-to-bottom).
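Saturation can be computed directly from a DAG's node and arc counts. The sketch below assumes that the total possible number of arcs for a DAG with n nodes is n(n - 1)/2, i.e. every pair of nodes joined by exactly one arc under a single causal ordering; the function names are illustrative and not taken from the review. Because the reported median saturation is a median of per-DAG values, it need not equal the saturation implied by the median node and arc counts.

```python
def max_arcs(n_nodes: int) -> int:
    """Maximum number of arcs in a DAG with n_nodes nodes (assumed n(n-1)/2)."""
    return n_nodes * (n_nodes - 1) // 2


def saturation(n_nodes: int, n_arcs: int) -> float:
    """Percentage of all possible arcs that are present in the DAG."""
    possible = max_arcs(n_nodes)
    if not 0 <= n_arcs <= possible:
        raise ValueError("arc count is impossible for an acyclic graph of this size")
    return 100.0 * n_arcs / possible


# Hypothetical DAG with the median node and arc counts reported above.
print(round(saturation(12, 29), 1))
```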
There is substantial variation in the use and reporting of DAGs in applied health research. Although this partly reflects their flexibility, it also highlights some potential areas for improvement. This review hence offers several recommendations to improve the reporting and use of DAGs in future research.
During the first wave of the COVID-19 pandemic, the United Kingdom experienced one of the highest per-capita death tolls worldwide. It is debated whether this may partly be explained by the relatively late initiation of voluntary social distancing and mandatory lockdown measures. In this study, we used simulations to estimate the number of cases and deaths that would have occurred in England by 1 June 2020 if these interventions had been implemented one or two weeks earlier, and the impact on the required duration of lockdown.
Using officially reported data on the number of Pillar 1 lab-confirmed cases of COVID-19 and associated deaths occurring in England from 3 March to 1 June, we modelled the natural (i.e. observed) growth of cases and the counterfactual (i.e. hypothetical) growth of cases that would have occurred had measures been implemented one or two weeks earlier. Under each counterfactual condition, we estimated the expected number of deaths and the time required to reach the incidence observed under natural growth on 1 June.
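The counterfactual logic can be illustrated with a deliberately simplified piecewise-exponential model: incidence grows at a pre-intervention rate until measures start, then changes at a post-intervention rate, and the counterfactual simply moves the start date earlier. This is a sketch under stated assumptions, not the model used in the study; every parameter value (initial incidence, growth rates, intervention day) is hypothetical, with day 0 standing for 3 March and day 90 for 1 June.

```python
import math

# Hypothetical parameters, NOT estimates from the study: day 0 = 3 March,
# day 90 = 1 June, measures introduced on day 20, growth rates per day.
START, END = 0, 90
PARAMS = {"intervention_day": 20, "i0": 50.0, "r_pre": 0.13, "r_post": -0.03}


def daily_incidence(day, intervention_day, i0, r_pre, r_post):
    """Piecewise-exponential daily incidence: growth at r_pre until the
    intervention, then growth (or decline) at r_post afterwards."""
    if day <= intervention_day:
        return i0 * math.exp(r_pre * (day - START))
    peak = i0 * math.exp(r_pre * (intervention_day - START))
    return peak * math.exp(r_post * (day - intervention_day))


def cumulative_cases(params):
    """Total cases from START to END inclusive."""
    return sum(daily_incidence(d, **params) for d in range(START, END + 1))


def days_to_reach(target, params):
    """Days after the intervention until incidence first falls to `target`."""
    day = params["intervention_day"]
    while daily_incidence(day, **params) > target:
        day += 1
    return day - params["intervention_day"]


observed_total = cumulative_cases(PARAMS)
target = daily_incidence(END, **PARAMS)  # natural-growth incidence on "1 June"
observed_days = days_to_reach(target, PARAMS)

results = {}
for weeks in (1, 2):
    # Counterfactual: identical dynamics, measures introduced 7 * weeks days earlier.
    cf = dict(PARAMS, intervention_day=PARAMS["intervention_day"] - 7 * weeks)
    reduction = 1 - cumulative_cases(cf) / observed_total
    results[weeks] = (reduction, days_to_reach(target, cf))
    print(f"{weeks} week(s) earlier: {100 * reduction:.0f}% fewer cases, "
          f"{results[weeks][1]} vs {observed_days} days to reach 1-June incidence")
```

Under this toy model, shifting the intervention earlier by k days scales the entire post-intervention trajectory down by a factor of exp(-r_pre * k), which is why small delays during exponential growth have large cumulative consequences.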
Introducing measures one week earlier would have reduced the number of confirmed COVID-19 cases in England up to 1 June by 74%, resulting in approximately 21,000 fewer hospital deaths and 34,000 fewer total deaths; the required time spent in full lockdown could also have been halved, from 69 to 35 days. Acting two weeks earlier would have reduced cases by 93%, resulting in between 26,000 and 43,000 fewer deaths.
Our modelling supports the claim that the relatively late introduction of social distancing and lockdown measures likely increased the scale, severity, and duration of the first wave of COVID-19 in England. Our results highlight the importance of acting swiftly to minimise the spread of an infectious disease when case numbers are increasing exponentially.
Estimating relative causal effects (i.e., “substitution effects”) is a common aim of nutritional research. In observational data, this is usually attempted using 1 of 2 statistical modeling approaches: the leave-one-out model and the energy partition model. Despite their widespread use, there are concerns that neither approach is well understood in practice.
We aimed to explore and illustrate the theory and performance of the leave-one-out and energy partition models for estimating substitution effects in nutritional epidemiology.
Monte Carlo data simulations were used to illustrate the theory and performance of both the leave-one-out model and energy partition model, by considering 3 broad types of causal effect estimands: 1) direct substitutions of the exposure with a single component, 2) inadvertent substitutions of the exposure with several components, and 3) average relative causal effects of the exposure instead of all other dietary sources. Models containing macronutrients, foods measured in calories, and foods measured in grams were all examined.
The leave-one-out and energy partition models performed equally well when the target estimand involved substituting a single exposure with a single component, provided all variables were measured in the same units. Bias occurred when the substitution involved >1 substituting component. Leave-one-out models that examined foods by mass while adjusting for total energy intake evaluated obscure estimands.
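The equivalence for a single substitution can be checked with a small Monte Carlo sketch. It assumes three hypothetical dietary components measured in the same energy units, linear effects with arbitrary coefficients, and no confounding. The leave-one-out model drops the substituting component B and adjusts for total energy, while the energy partition model includes every component and takes the difference of two coefficients.

```python
import numpy as np


def ols(y, *covariates):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(2024)
n = 100_000
# Hypothetical energy intakes (units of 100 kcal) from three components.
a = rng.normal(8, 2, n)  # exposure
b = rng.normal(6, 2, n)  # the single substituting component
c = rng.normal(5, 1, n)  # another dietary component
total = a + b + c
beta_a, beta_b, beta_c = 0.3, 0.1, 0.5  # arbitrary "true" effects
y = beta_a * a + beta_b * b + beta_c * c + rng.normal(0, 1, n)

truth = beta_a - beta_b                # replace 1 unit of B with 1 unit of A
loo = ols(y, a, c, total)[1]           # leave B out, adjust for total energy
coef_ep = ols(y, a, b, c)              # energy partition: all components
ep = coef_ep[1] - coef_ep[2]           # difference of the A and B coefficients
print(f"truth {truth:.3f}, leave-one-out {loo:.3f}, energy partition {ep:.3f}")
```

The two estimates agree up to floating-point error because regressing on (A, C, A + B + C) is a linear reparameterisation of regressing on (A, B, C); this relies on A, B and C sharing units, which is why mixing grams with total energy breaks the correspondence.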
Regardless of the approach, substitution models need to be constructed from clearly defined causal effect estimands. Estimands involving a single exposure and a single substituting component are typically estimated more accurately than estimands involving more complex substitutions. The practice of examining foods measured in grams or portions while adjusting for total energy intake is likely to deliver obscure relative effect estimands with unclear interpretations.
Four models are commonly used to adjust for energy intake when estimating the causal effect of a dietary component on an outcome: 1) the “standard model” adjusts for total energy intake, 2) the “energy partition model” adjusts for remaining energy intake, 3) the “nutrient density model” rescales the exposure as a proportion of total energy, and 4) the “residual model” indirectly adjusts for total energy by using a residual. It remains underappreciated that each approach evaluates a different estimand and only partially accounts for confounding by common dietary causes.
We aimed to clarify the implied causal estimand and interpretation of each model and evaluate their performance in reducing dietary confounding.
Semiparametric directed acyclic graphs and Monte Carlo simulations were used to identify the estimands and interpretations implied by each model and explore their performance in the absence or presence of dietary confounding.
The “standard model” and the mathematically identical “residual model” estimate the average relative causal effect (i.e., a “substitution” effect) but provide biased estimates even in the absence of confounding. The “energy partition model” estimates the total causal effect but only provides unbiased estimates in the absence of confounding or when all other nutrients have equal effects on the outcome. The “nutrient density model” has an obscure interpretation but attempts to estimate the average relative causal effect rescaled as a proportion of total energy. Accurate estimates of both the total and average relative causal effects may instead be derived by simultaneously adjusting for all dietary components, an approach we term the “all-components model.”
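The conditions under which the energy partition model fails can be illustrated by simulation. The sketch assumes a hypothetical common cause U that affects every nutrient but affects the outcome only through diet, and two "other" nutrients with unequal effects. Adjusting for remaining energy (B + C) cannot block both confounding paths because it collapses B and C into a single sum, whereas the all-components model adjusts for B and C separately. All coefficients are arbitrary.

```python
import numpy as np


def ols(y, *covariates):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(42)
n = 200_000
u = rng.normal(0, 1, n)                # common cause of dietary intake
x = 5 + u + rng.normal(0, 1, n)        # exposure nutrient
b = 5 + 2 * u + rng.normal(0, 1, n)    # other nutrient 1
c = 5 - u + rng.normal(0, 1, n)        # other nutrient 2
# U affects the outcome only through diet; B and C have unequal effects.
y = 0.3 * x + 0.1 * b + 0.5 * c + rng.normal(0, 1, n)

remaining = b + c
energy_partition = ols(y, x, remaining)[1]  # adjusts for remaining energy
all_components = ols(y, x, b, c)[1]         # adjusts for every component
print(f"true total effect 0.300, energy partition {energy_partition:.3f}, "
      f"all-components {all_components:.3f}")
```

If the effects of B and C were equal, the outcome would depend on them only through B + C, and adjusting for remaining energy would then suffice, consistent with the condition stated above.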
Lack of awareness of the estimand differences and accuracy of the 4 modeling approaches may explain some of the apparent heterogeneity among existing nutritional studies. This raises serious questions regarding the validity of meta-analyses where different estimands have been inappropriately pooled.
Reply to WC Willett et al. Tomova, Georgia D; Arnold, Kellyn F; Gilthorpe, Mark S ...
The American Journal of Clinical Nutrition, 08/2022, Volume 116, Issue 2. Journal Article
Background: Compositional data comprise ‘parts’ of a ‘whole’ (or total), where the parts sum to the whole. In compositional data with fixed totals (e.g. hours within a day), only relative causal effects can be estimated because the effect of increasing one component (e.g. time spent physically active) cannot be distinguished from the effect of decreasing one or more other components (e.g. time spent sedentary).
Compositional data are not well understood, but their structure has recently been conceptualised using directed acyclic graphs (DAGs) with deterministic nodes. This work encourages the use of a simple, well-established approach, known as the isotemporal (‘leave-one-out’) model, for estimating relative causal effects in compositional data.
However, the isotemporal model has been criticised as unsuitable in the presence of non-linear effects. Other, more technically demanding approaches, known as Compositional Data Analysis (CoDA) methods, are promoted instead.
This study is the first to investigate the performance of DAG-informed regression models for estimating causal effects in compositional data with fixed totals using simulated data, where the ground truth is known.
Methods: Using the DagSim package in Python, we simulated compositional data with fixed totals, using the example of physical activity data, in which sleep, sedentary behaviour (SB), light physical activity (LPA), and moderate and vigorous physical activity (MVPA) sum to a fixed total of 24 hours. The time spent in each state was then simulated to contribute to levels of an outcome (fasting plasma glucose, FPG), either in a strictly linear manner or through non-linear relationships.
We assessed the performance of the DAG-informed isotemporal approach by comparing model estimates with the known (simulated) true relative causal effect of each component on the outcome.
Results: Accurate relative causal effect estimates were obtained using the DAG-informed isotemporal approach, provided the models were parameterised correctly. When the model was not parameterised correctly, e.g. when linear terms were used to model non-linear relationships, the estimates were biased. In the literature, the isotemporal model is used almost exclusively with linear terms, which might explain some of the previous misconceptions that it is unsuitable for modelling compositional data.
Conclusion: In compositional data with fixed totals, a simple DAG-informed isotemporal modelling approach recovers the true relative causal effect as long as any non-linear relationships are appropriately parameterised. This method is a viable alternative to the more technically challenging and specialist CoDA methods. The findings cannot be generalised to compositional data with varying totals, which require separate investigation.
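The role of parameterisation can be illustrated without DagSim, using plain NumPy as a stand-in. The sketch assumes hypothetical Dirichlet-distributed 24-hour compositions and an arbitrary outcome model in which only MVPA acts non-linearly (a quadratic term). Because SB is left out of the isotemporal model, each coefficient is relative to SB; the target is the marginal effect of moving a small amount of time from SB to MVPA, which under the assumed model depends on current MVPA.

```python
import numpy as np


def ols(y, *covariates):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(7)
n = 50_000
# Hypothetical 24-hour compositions (hours): sleep, SB, LPA, MVPA.
hours = 24 * rng.dirichlet([8.0, 9.0, 5.0, 1.0], size=n)
sleep, sb, lpa, mvpa = hours.T

# Arbitrary "true" outcome model: MVPA acts non-linearly (quadratic).
fpg = (4.5 + 0.02 * sleep + 0.05 * sb + 0.01 * lpa
       - 0.4 * mvpa + 0.1 * mvpa**2 + rng.normal(0, 0.3, n))

# Isotemporal models: SB is left out, so coefficients are relative to SB.
lin = ols(fpg, sleep, lpa, mvpa)             # linear terms only
quad = ols(fpg, sleep, lpa, mvpa, mvpa**2)   # non-linearity parameterised


def true_effect(m):
    """Marginal change in FPG when time moves from SB to MVPA at MVPA = m."""
    return (-0.4 + 0.2 * m) - 0.05


for m in (0.5, 2.0):
    print(f"MVPA={m} h: true {true_effect(m):+.3f}, "
          f"linear {lin[3]:+.3f}, quadratic {quad[3] + 2 * quad[4] * m:+.3f}")
```

The linear model can only return one slope for every starting level of MVPA, so it cannot match a relative effect that changes with MVPA; adding the quadratic term restores the correspondence.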
Deterministic variables are variables that are functionally determined by one or more parent variables. They commonly arise when a variable has been functionally created from one or more parent variables, as with derived variables, and in compositional data, where the 'whole' variable is determined from its 'parts'. This article introduces how deterministic variables may be depicted within directed acyclic graphs (DAGs) to help with identifying and interpreting causal effects involving derived variables and/or compositional data. We propose a two-step approach in which all variables are initially considered, and a choice is made whether to focus on the deterministic variable or its determining parents. Depicting deterministic variables within DAGs brings several benefits. It is easier to identify and avoid misinterpreting tautological associations, i.e., self-fulfilling associations between deterministic variables and their parents, or between sibling variables with shared parents. In compositional data, it is easier to understand the consequences of conditioning on the ‘whole’ variable, and correctly identify total and relative causal effects. For derived variables, it encourages greater consideration of the target estimand and greater scrutiny of the consistency and exchangeability assumptions. DAGs with deterministic variables are a useful aid for planning and interpreting analyses involving derived variables and/or compositional data.
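Flagging tautological associations can be mechanised once deterministic nodes are marked in the DAG. The sketch below uses a hypothetical parent-set representation and a hypothetical helper, `tautological_pairs`, that applies the two cases named above: a deterministic variable with each of its parents, and two deterministic siblings sharing at least one parent. The example graph (a change score and a mean computed from the same two measurements) is illustrative only, and flagged pairs may also carry genuine causal association alongside the tautological part.

```python
from itertools import combinations

# Hypothetical DAG: node -> set of parents. Deterministic nodes would be drawn
# with double outlines and double-lined incoming arcs.
parents = {
    "age": set(),
    "X0": {"age"},
    "X1": {"X0", "age"},
    "change": {"X0", "X1"},  # change = X1 - X0
    "mean": {"X0", "X1"},    # mean = (X0 + X1) / 2
}
deterministic = {"change", "mean"}


def tautological_pairs(parents, deterministic):
    """Pairs whose association is (at least partly) true by construction."""
    pairs = set()
    for node in deterministic:
        for p in parents[node]:
            pairs.add(frozenset((node, p)))      # deterministic child with parent
    for a, b in combinations(sorted(deterministic), 2):
        if parents[a] & parents[b]:
            pairs.add(frozenset((a, b)))         # siblings sharing a parent
    return pairs


for pair in sorted(tuple(sorted(p)) for p in tautological_pairs(parents, deterministic)):
    print(pair)
```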
Background: Four modelling approaches are commonly used to adjust for overall energy intake when seeking to estimate the causal effect of an individual dietary component on an outcome: (1) the ‘standard model’ adjusts for total energy intake, (2) the ‘energy partition model’ adjusts for remaining energy intake, (3) the ‘nutrient density model’ examines the exposure as a proportion of total energy, and (4) the ‘residual model’ indirectly adjusts for total energy by using the residual from regressing the exposure nutrient on total energy intake. Unfortunately, it remains underappreciated that each approach evaluates a different causal effect estimand and only partially accounts for confounding by common causes of dietary intake and composition.
Methods: Semi-parametric directed acyclic graphs and Monte Carlo simulations were used to identify the estimand implied by each approach and the correct interpretation of the model results. The performance of each model for estimating the corresponding target estimand was explored both in the absence and presence of confounding that acts through diet. An alternative approach based on the energy partition model that simultaneously adjusts for all competing dietary components, termed the ‘all-components model’, was also explored and compared with the four traditional approaches. This model involves using the weighted coefficients of different dietary components to estimate any desired causal effect estimand.
Results: The ‘standard model’ and the mathematically identical ‘residual model’ both estimate the average relative causal effect (i.e. a ‘substitution’ effect) but provide biased estimates even in the absence of any confounding. The ‘energy partition model’, which adjusts for remaining energy intake, estimates the total causal effect (i.e. an ‘additive’ effect) but only provides unbiased estimates in the absence of confounding or when all individual nutrients have equal effects on the outcome.
The ‘nutrient density model’ does not target a causally meaningful estimand but can provide extremely biased estimates of the average relative causal effect of the exposure rescaled as a percentage of total energy intake. Accurate estimates of both the total and average relative causal effects were obtained with the ‘all-components model’.
Conclusion: Only the ‘all-components model’ produces unbiased estimates of different causal effects. Lack of awareness of the estimand differences and accuracy of the different modelling approaches may explain some of the apparent heterogeneity among existing nutritional studies. Serious questions may be raised regarding the validity of meta-analyses where different strategies returning different estimands have been inappropriately pooled.
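The statement that the standard and residual models are mathematically identical follows from the Frisch-Waugh-Lovell theorem: the coefficient on the exposure after adjusting for total energy equals the coefficient obtained by regressing the outcome on the part of the exposure that is orthogonal to total energy (and the intercept). The sketch below checks this numerically with hypothetical, arbitrarily parameterised intakes; the same coefficient results whether or not total energy is also included alongside the residual.

```python
import numpy as np


def ols(y, *covariates):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(1)
n = 10_000
# Hypothetical intakes (100 kcal units): exposure nutrient and the rest of the diet.
exposure = rng.normal(8, 2, n)
other = rng.normal(12, 3, n)
total = exposure + other
y = 0.3 * exposure + 0.1 * other + rng.normal(0, 1, n)

standard = ols(y, exposure, total)[1]        # 'standard model'
a0, a1 = ols(exposure, total)                # regress exposure on total energy
residual = exposure - (a0 + a1 * total)      # energy-adjusted exposure
residual_model = ols(y, residual)[1]         # 'residual model'
print(f"standard {standard:.6f}, residual {residual_model:.6f}")
```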
Background: Dietary guidelines often recommend substituting certain nutrients or foods with healthier alternatives, based on the available evidence from nutritional epidemiology. The effects of food substitutions can be examined by conducting isocaloric dietary interventions, but experimental studies are often not practical or sufficiently generalisable. Therefore, nutritional epidemiology is highly reliant on observational data, in which food substitutions can be explored using mathematical modelling. The two modelling approaches commonly used for estimating substitution effects are known as (1) the ‘leave-one-out’ model, in which total energy intake and all dietary components are included as covariates, excluding the nutrient(s) that the exposure should be substituted with; and (2) the energy partition model, in which all dietary components are included as covariates, without further adjustment for total energy intake. It remains underappreciated that these approaches do not perform equally well for estimating substitution effects, and that there is limited evidence on whether they produce unbiased estimates.
Methods: Semi-parametric directed acyclic graphs and Monte Carlo data simulations were used to explore the performance of the two approaches for estimating the following estimands: 1) the average relative causal effect (i.e. the joint effect of increasing intake of the exposure and decreasing the intake of all other nutrients, while keeping total energy intake constant), 2) the relative effect of increasing the exposure nutrient and decreasing the intake of one other nutrient, and 3) the relative effect of increasing the exposure nutrient and decreasing the intake of a combination of other nutrients. The approaches were explored both in the absence and presence of confounding that acts through diet.
Results: The ‘leave-one-out’ model produced a biased estimate of the average relative causal effect even in the absence of any confounding.
It robustly estimated the relative effect of increasing the exposure while decreasing one other specific nutrient, regardless of whether confounding was present, but produced biased estimates when a combination of other nutrients was decreased, even in the absence of confounding. The energy partition model robustly estimated all three estimands of interest, producing unbiased estimates regardless of whether confounding was present.
Conclusion: Only the energy partition model produces unbiased estimates of different substitution effects in the context of nutritional epidemiology. It performs equally well even in the presence of confounding that acts through diet. Substitution analyses using the ‘leave-one-out’ approach might not be robust, and existing studies using this model might suffer from bias.
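Why the leave-one-out model struggles with a combination of substitutes can be seen in a simulation. The sketch assumes four hypothetical, independent, normally distributed components, arbitrary effects, and a target of replacing one unit made up of 50% B and 50% C with one unit of A. Leaving both B and C out while adjusting for total energy implicitly weights B and C by their variances, so the estimate approaches 0.3 - (0.1 * 1 + 0.5 * 9) / 10 = -0.16 rather than the target. The energy partition model lets the analyst choose the weights explicitly.

```python
import numpy as np


def ols(y, *covariates):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(3)
n = 200_000
a = rng.normal(8, 2, n)  # exposure nutrient
b = rng.normal(6, 1, n)  # substitute 1 (low variance)
c = rng.normal(6, 3, n)  # substitute 2 (high variance)
d = rng.normal(5, 2, n)  # remaining diet
total = a + b + c + d
beta = {"a": 0.3, "b": 0.1, "c": 0.5, "d": 0.2}  # arbitrary "true" effects
y = (beta["a"] * a + beta["b"] * b + beta["c"] * c + beta["d"] * d
     + rng.normal(0, 1, n))

# Target: replace 1 unit of (50% B + 50% C) with 1 unit of A.
truth = beta["a"] - 0.5 * beta["b"] - 0.5 * beta["c"]

loo = ols(y, a, d, total)[1]           # leave B and C out, adjust for total
_, ba, bb, bc, _ = ols(y, a, b, c, d)  # energy partition: all components
ep = ba - (0.5 * bb + 0.5 * bc)        # explicitly weighted substitution
print(f"truth {truth:.3f}, leave-one-out {loo:.3f}, energy partition {ep:.3f}")
```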
Background: Deterministic variables are variables that are fully explained by one or more parent variables. They are extremely common in the health and social sciences, typically arising in the following three forms:
1) Transformed variables are simple derived variables (e.g., macrosomia) that are functionally created from a single parent variable (e.g., birthweight).
2) Composite variables are complex derived variables (e.g., body mass index) that are functionally created from two or more parent variables (e.g., weight and height).
3) In compositional data, ‘whole’ variables are total variables (e.g., total mass) that contain two or more distinct ‘part’ variables (e.g., fat mass and lean mass).
The identification and interpretation of causal effects involving deterministic variables is challenging, and misinterpretations are common, suggesting a need for new approaches and aids to thinking. We introduce how deterministic variables can be depicted within directed acyclic graphs (DAGs) and discuss how this approach can help with identifying and interpreting causal effects involving compositional data and composite variables.
Methods: We propose a two-step approach to the handling of deterministic variables when identifying and interpreting causal effects. First, a ‘full’ DAG is drawn that includes all deterministic variables and all determining parents. For clarity, we recommend depicting deterministic variables with double-outlined nodes and all their incoming arcs as double-lined. Next, an explicit choice is made whether to focus on the deterministic variable(s) or the determining parents. This approach ensures that the context and assumptions are given sufficiently thorough attention, reducing the risk of misinterpretation.
Results: Depicting deterministic variables within DAGs brings several benefits.
First, it is easier to identify and avoid misinterpreting tautological associations, i.e., self-fulfilling associations between variables with shared algebraic parent variables. An example of a tautological association is the negative correlation between a change score variable (X1-X0) and the baseline variable (X0). Second, in compositional data, it is easier to understand the consequences of conditioning on the ‘whole’ variable (e.g. adjusting for total energy intake), and in turn to identify the appropriate strategies for estimating total and relative causal effects. Finally, for composite variables, depicting deterministic variables within DAGs encourages greater consideration of the target estimand and focusses attention on whether the consistency and exchangeability assumptions can be satisfied.
Conclusion: Including deterministic variables within DAGs makes it easier to identify, understand, and/or avoid a range of common biases and fallacies that may complicate the estimation of causal effects in analyses involving composite variables and compositional data.
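The change-score example can be made concrete with a short derivation and simulation. Assuming X0 and X1 have equal variances and correlation rho, Cov(X1 - X0, X0) = rho - 1 and Var(X1 - X0) = 2(1 - rho), so Corr(X1 - X0, X0) = -sqrt((1 - rho) / 2). This is negative for any rho < 1, purely because the change score contains -X0 by construction. The value of rho below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000
rho = 0.5  # hypothetical correlation between baseline and follow-up
cov = [[1.0, rho], [rho, 1.0]]  # equal variances assumed
x0, x1 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
change = x1 - x0  # deterministic child of both X0 and X1

observed = np.corrcoef(change, x0)[0, 1]
expected = -np.sqrt((1 - rho) / 2)  # follows from change containing -X0
print(f"simulated {observed:.3f}, derived {expected:.3f}")
```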