Handling missing data in clinical research Heymans, Martijn W.; Twisk, Jos W.R.
Journal of clinical epidemiology,
November 2022, Volume 151
Journal Article
Peer reviewed
Open access
Because missing data are present in almost every study, it is important to handle them properly. First, the missing data mechanism should be considered: data can be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). When missing data are MCAR, a complete case analysis can be valid. When missing data are MAR, a complete case analysis also leads to valid results in some situations; in most situations, however, imputation should be used. Regarding imputation methods, multiple imputation is strongly advised because it leads to valid estimates that incorporate the uncertainty about the imputed values. When missing data are MNAR, even multiple imputation does not lead to valid results. A complication is that it is not possible to distinguish whether missing data are MAR or MNAR. Finally, it should be realized that preventing missing data is always better than treating them.
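To make concrete how multiple imputation carries the uncertainty about the imputed values into the final estimate, the sketch below pools one coefficient across imputed datasets with Rubin's rules. It is an illustration, not code from the article: the function name `pool_rubin` and the five estimates and standard errors are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching standard errors")
    pooled = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])   # average within-imputation variance
    between = variance(estimates)                   # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between          # total variance of the pooled estimate
    return pooled, sqrt(total)


# Hypothetical results for one regression coefficient from m = 5 imputed datasets.
estimates = [0.52, 0.47, 0.55, 0.50, 0.46]
std_errors = [0.10, 0.11, 0.10, 0.12, 0.10]
pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
```

With these made-up numbers the pooled standard error (about 0.114) is larger than the square root of the average within-imputation variance (about 0.106). The difference is the extra variance between imputations, which a single imputation would ignore.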
To evaluate fMRI whole-brain resting-state functional connectivity changes in relation to cognitive decline in Parkinson disease (PD) over a 3-year period.
Resting-state fMRI scans were acquired in 55 patients with PD (mean age 65.8 years, SD 6.37; average disease duration 9.24 years, SD 3.96) and 15 matched controls (mean age 64.4 years, SD 8.65). We first performed overall (i.e., 1 whole-brain mean) as well as regional (i.e., for all individual regions of interest) functional connectivity analyses, in which we compared subject groups cross-sectionally. After a 3-year follow-up period, 36 patients with PD and 12 controls were rescanned to study functional connectivity changes over time and correlate the changes in functional connectivity with deteriorating cognitive and motor function in the PD sample.
In the cross-sectional analysis, we found widespread decreases of resting-state functional connectivity in patients with PD in comparison to controls. Subsequent comparison between the 2 timepoints revealed that patients with PD displayed further decreases in functional connectivity independent of aging effects. These functional connectivity changes were most prominent for posterior parts of the brain and correlated across time with clinical measures of disease progression, especially cognitive decline.
In this fMRI study in PD, we demonstrated a progressive loss of resting-state functional connectivity over a period of 3 years for multiple brain regions, especially in posterior parts of the brain. The strong correlation with decreasing cognitive performance supports the pathophysiologic role of reduced functional connectivity in cognitive decline and the development of dementia in PD.
Mediation analysis methodology has undergone many advancements over the years, with the most recent and important advancement being the development of causal mediation analysis based on the counterfactual framework. However, a previous review showed that for experimental studies the uptake of causal mediation analysis remains low. The aim of this paper is to review the methodological characteristics of mediation analyses performed in observational epidemiologic studies published between 2015 and 2019 and to provide recommendations for the application of mediation analysis in future studies.
We searched the MEDLINE and EMBASE databases for observational epidemiologic studies published between 2015 and 2019 in which mediation analysis was applied as one of the primary analysis methods. Information was extracted on the characteristics of the mediation model and the applied mediation analysis method.
We included 174 studies, most of which applied traditional mediation analysis methods (n = 123, 70.7%). Causal mediation analysis was not often used to analyze more complicated mediation models, such as multiple mediator models. Most studies adjusted their analyses for measured confounders, but did not perform sensitivity analyses for unmeasured confounders and did not assess the presence of an exposure-mediator interaction.
To ensure a causal interpretation of the effect estimates in the mediation model, we recommend that researchers use causal mediation analysis and assess the plausibility of the causal assumptions. The uptake of causal mediation analysis can be enhanced through tutorial papers that demonstrate the application of causal mediation analysis, and through the development of software packages that facilitate the causal mediation analysis of relatively complicated mediation models.
Competing events are often ignored in epidemiological studies. Conventional methods for the analysis of survival data assume independent or noninformative censoring, which is violated when subjects who experience a competing event are censored. Because many survival studies do not apply competing risk analysis, we explain and illustrate in a nonmathematical way how to analyze and interpret survival data in the presence of competing events.
Using data from the Longitudinal Aging Study Amsterdam, both marginal analyses (Kaplan–Meier method and Cox proportional-hazards regression) and competing risk analyses (cumulative incidence function CIF, cause-specific and subdistribution hazard regression) were performed. We analyzed the association between sex and depressive symptoms, in which death before the onset of depression was a competing event.
The Kaplan–Meier method overestimated the cumulative incidence of depressive symptoms. Instead, the CIF should be used. As the subdistribution hazard model has a one-to-one relation with the CIF, it is recommended for prediction research, whereas the cause-specific hazard model is recommended for etiologic research.
When competing risks are present, the type of research question guides the choice of the analytical model to be used. In any case, results should be presented for all event types.
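The statement that the Kaplan-Meier method overestimates the cumulative incidence when competing events are censored can be checked on a small example. The sketch below is not the authors' analysis: the function name, the event coding (0 = censored, 1 = event of interest, 2 = competing event) and the six toy observations are assumptions made for illustration. It compares the complement of the Kaplan-Meier estimate with the Aalen-Johansen estimate of the CIF.

```python
def km_complement_and_cif(times, events, cause=1):
    """Return (1 - KM, CIF) for `cause` at the end of follow-up.

    events: 0 = censored, 1 = event of interest, 2 = competing event.
    """
    naive_surv = 1.0    # Kaplan-Meier survival, competing events treated as censoring
    overall_surv = 1.0  # probability of being free of any event just before time t
    cif = 0.0           # cumulative incidence of the event of interest
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, e in zip(times, events) if u == t and e == cause)
        d_any = sum(1 for u, e in zip(times, events) if u == t and e != 0)
        cif += overall_surv * d_cause / at_risk   # uses survival just before t
        naive_surv *= 1 - d_cause / at_risk
        overall_surv *= 1 - d_any / at_risk
    return 1 - naive_surv, cif


# Toy data: 3 events of interest, 2 competing events, 1 censored subject.
times = [1, 2, 3, 4, 5, 6]
events = [1, 2, 1, 2, 1, 0]
km_incidence, cif = km_complement_and_cif(times, events)
```

On this toy data the complement of the Kaplan-Meier estimate is 0.6875, whereas the CIF is 0.5, which equals the observed proportion of subjects (3 of 6) who experienced the event of interest.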
Objectives
Regardless of the proportion of missing values, complete-case analysis is most frequently applied, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data in cases when some, many, or all item scores are missing in a multi-item instrument.
Study Design and Setting
Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques to handle missing data were applied to decide on the optimal technique for each scenario. Fitted regression coefficients were compared using bias and coverage as performance parameters.
Results
Mean imputation caused biased estimates in every missing data scenario when data were missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score.
Conclusion
We recommend applying MI to the item scores to get the most accurate regression model estimates. Moreover, we advise not to use any form of mean imputation to handle missing data.
Purpose
Much research has been performed on physical exposures during work (e.g. lifting, trunk flexion or body vibrations) as risk factors for low back pain (LBP); however, results are inconsistent. Information on the effect of doses (e.g. spinal force or low back moments) on LBP may be more reliable but is still lacking. The aim of the present study was to investigate the prospective relationship of cumulative low back loads (CLBL) with LBP and to compare the association of this mechanical load measure to exposure measures used previously.
Methods
The current study was part of the Study on Musculoskeletal disorders, Absenteeism and Health (SMASH), in which 1,745 workers completed questionnaires. Physical load at the workplace was assessed by video-observations and force measurements. These measures were used to calculate CLBL. Furthermore, a 3-year follow-up was conducted to assess the occurrence of LBP. Logistic regressions were performed to assess associations of CLBL and physical risk factors established earlier (i.e. lifting and working in a flexed posture) with LBP. Furthermore, CLBL and the risk factors combined were assessed as predictors in logistic regression analyses to assess the association with LBP.
Results
Results showed that CLBL is a significant risk factor for LBP (OR: 2.06 (1.32–3.20)). Furthermore, CLBL had a more consistent association with LBP than two of the three risk factors reported earlier.
Conclusions
From these results it can be concluded that CLBL is a risk factor for the occurrence of LBP, having a more consistent association with LBP compared to most risk factors reported earlier.
The efficacy of psychodynamic therapies for depression remains open to debate because of a paucity of high-quality studies. The authors compared the efficacy of psychodynamic therapy with that of cognitive-behavioral therapy (CBT), hypothesizing nonsignificant differences and the noninferiority of psychodynamic therapy relative to CBT.
A total of 341 adults who met DSM-IV criteria for a major depressive episode and had Hamilton Depression Rating Scale (HAM-D) scores ≥14 were randomly assigned to 16 sessions of individual manualized CBT or short-term psychodynamic supportive therapy. Severely depressed patients (HAM-D score >24) also received antidepressant medication according to protocol. The primary outcome measure was posttreatment remission rate (HAM-D score ≤7). Secondary outcome measures included mean posttreatment HAM-D score and patient-rated depression score and 1-year follow-up outcomes. Data were analyzed with generalized estimating equations and mixed-model analyses using intent-to-treat samples. Noninferiority margins were prespecified as an odds ratio of 0.49 for remission rates and a Cohen's d value of 0.30 for continuous outcome measures.
No statistically significant treatment differences were found for any of the outcome measures. The average posttreatment remission rate was 22.7%. Noninferiority was shown for posttreatment HAM-D and patient-rated depression scores but could not be demonstrated for posttreatment remission rates or any of the follow-up measures.
The findings extend the evidence base of psychodynamic therapy for depression but also indicate that time-limited treatment is insufficient for a substantial number of patients encountered in psychiatric outpatient clinics.
The interpretation of a regression coefficient obtained from a longitudinal data analysis is a combination of a within-subject part and a between-subject part. The hybrid model is used to disentangle the two components. The purpose of this article was to illustrate and discuss the use of the hybrid model in epidemiologic studies.
In the hybrid model the between-subject part of the relationship is obtained using the individual mean value over time, whereas the within-subject part is obtained using the deviation score, that is, the differences between the observations and the individual mean value.
It was shown that the regression coefficient of a standard mixed model analysis is a sort of weighted average of the between- and within-subject parts of the relationship. When the outcome was continuous, the separate analyses to estimate the two components of a longitudinal relationship gave the same results as the estimation in the hybrid model. However, for a dichotomous outcome, the estimates were slightly different.
The hybrid model is an elegant, easy to perform method to disentangle the within- and between-subject part of a relationship in longitudinal studies.
• The between-subject part is obtained by the individual mean value over time.
• The within-subject part is obtained by using the deviation score.
• The deviation score is the difference between observations and individual mean.
• The results of a hybrid logistic model should be interpreted with caution.
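The two covariates used by the hybrid model described above, the individual mean value over time and the deviation score, can be derived from long-format data in a few lines. The sketch below is an illustration under assumptions, not the authors' code: the function name `split_between_within` and the toy records for subjects "a" and "b" are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_between_within(records):
    """Split a time-varying covariate into between- and within-subject parts.

    records: list of (subject_id, x) pairs, one per observation.
    Returns a list of (subject_id, x, between_part, within_part) tuples.
    """
    values_by_subject = defaultdict(list)
    for subject_id, x in records:
        values_by_subject[subject_id].append(x)
    # Between-subject part: the individual mean value over time.
    subject_mean = {s: mean(v) for s, v in values_by_subject.items()}
    # Within-subject part: the deviation of each observation from that mean.
    return [(s, x, subject_mean[s], x - subject_mean[s]) for s, x in records]


# Hypothetical repeated measurements of one covariate for two subjects.
records = [("a", 2.0), ("a", 4.0), ("a", 6.0), ("b", 10.0), ("b", 14.0)]
rows = split_between_within(records)
```

Both derived columns would then be entered together as covariates in the longitudinal regression model, so that each receives its own coefficient. The deviation scores sum to zero within each subject by construction.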
Logistic regression is often used for mediation analysis with a dichotomous outcome. However, previous studies showed that the indirect effect and proportion mediated are often affected by a change of scales in logistic regression models. To circumvent this, standardization has been proposed. The aim of this study was to show the relative performance of the unstandardized and standardized estimates of the indirect effect and proportion mediated based on multiple regression, structural equation modeling, and the potential outcomes framework for mediation models with a dichotomous outcome.
We compared the performance of the effect estimates yielded by the three methods using a simulation study and two real-life data examples from an observational cohort study (n = 360).
Lowest bias and highest efficiency were observed for the estimates from the potential outcomes framework and for the crude indirect effect ab and the proportion mediated ab/(ab + c') based on multiple regression and SEM.
We advise the use of either the potential outcomes framework estimates or the ab estimate of the indirect effect and the ab/(ab + c') estimate of the proportion mediated based on multiple regression and SEM when mediation analysis is based on logistic regression. Standardization of the coefficients prior to estimating the indirect effect and the proportion mediated may not increase the performance of these estimates.
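The crude indirect effect ab and the proportion mediated ab/(ab + c') named above are simple functions of three regression coefficients. The sketch below only illustrates that arithmetic; the function name `indirect_and_proportion_mediated` and the coefficient values are hypothetical, and the coefficients are assumed to have been estimated beforehand (a from the exposure-mediator model, b and c' from the outcome model).

```python
def indirect_and_proportion_mediated(a, b, c_prime):
    """Return (ab, ab / (ab + c')) from previously estimated coefficients.

    a:       effect of the exposure on the mediator
    b:       effect of the mediator on the outcome, adjusted for the exposure
    c_prime: direct effect of the exposure on the outcome, adjusted for the mediator
    """
    indirect = a * b
    total = indirect + c_prime
    if total == 0:
        raise ValueError("proportion mediated is undefined when ab + c' equals zero")
    return indirect, indirect / total


# Hypothetical coefficients, for illustration only.
indirect, proportion = indirect_and_proportion_mediated(a=0.5, b=0.4, c_prime=0.3)
```

With these made-up values the indirect effect is 0.2 and the proportion mediated is 0.4. The proportion is hard to interpret when ab and c' have opposite signs, because the denominator can then approach zero.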
Although alterations in resting-state functional connectivity between brain regions have previously been reported in Parkinson's disease, the spatial organization of these changes remains largely unknown. Here, we longitudinally studied brain network topology in Parkinson's disease in relation to clinical measures of disease progression, using magnetoencephalography and concepts from graph theory. We characterized whole-brain functional networks by means of a standard graph analysis approach, measuring clustering coefficient and shortest path length, as well as the construction of a minimum spanning tree, a novel approach that allows a unique and unbiased characterization of brain networks. We observed that brain networks in early-stage, untreated patients displayed lower local clustering with preserved path length in the delta frequency band in comparison to controls. Longitudinal analysis over a 4-year period in a larger group of patients showed a progressive decrease in local clustering in multiple frequency bands together with a decrease in path length in the alpha2 frequency band. In addition, minimum spanning tree analysis revealed a decentralized and less integrated network configuration in early-stage, untreated Parkinson's disease that also progressed over time. Moreover, the longitudinal changes in network topology identified with both techniques were associated with deteriorating motor function and cognitive performance. Our results indicate that impaired local efficiency and network decentralization are very early features of Parkinson's disease that continue to progress over time, together with reductions in global efficiency. As these network changes appear to reflect clinically relevant phenomena, they hold promise as markers of disease progression.