Background:
Propensity score (PS) analyses are increasingly used in multiple sclerosis (MS) research, largely owing to the greater availability of large observational cohorts and registry databases.
Objective:
To evaluate the use and quality of reporting of PS methods in the recent MS literature.
Methods:
We searched the PubMed database for articles published between January 2013 and July 2019. We restricted the search to comparative effectiveness studies of two disease-modifying therapies.
Results:
Thirty-nine studies were included in the review, with most studies (62%) published within the past 3 years. All studies reported the list of covariates used for the PS model, but only 21% of studies mentioned how those covariates were selected. Most studies used PS matching (72%), followed by PS adjustment (18%), weighting (15%), and stratification (3%), with some overlap. Most studies using matching or weighting reported checking post-PS covariate imbalance (91%), although about 45% of these studies relied on p values from various statistical tests. Only 25% of studies using matching reported calculating robust standard errors for the PS analyses.
Conclusions:
The quality of reporting of PS methods in the MS literature is sub-optimal in general, and in some cases, inappropriate methods are used.
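The balance check mentioned in the Results can be illustrated with a standardized mean difference (SMD), which, unlike a p value, does not depend on sample size. The following is a minimal sketch, assuming a single continuous covariate; the function name and the age values are hypothetical and are not taken from the reviewed studies.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Absolute standardized mean difference for one continuous covariate.

    Uses the pooled standard deviation sqrt((s1^2 + s0^2) / 2).
    """
    diff = abs(mean(treated) - mean(control))
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # No variability in either group: balance is perfect or undefined.
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd


# Hypothetical ages in two matched treatment groups (illustrative only).
age_treated = [34, 41, 38, 45, 50, 36]
age_control = [35, 40, 39, 44, 52, 37]

smd = standardized_mean_difference(age_treated, age_control)
print(f"SMD for age: {smd:.3f}")
```

A threshold of 0.1 is often cited in the methodological literature as indicating acceptable balance, although the choice of threshold remains a judgment call.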
Background:
With many disease-modifying therapies currently approved for the management of multiple sclerosis, there is a growing need to evaluate the comparative effectiveness and safety of those therapies from real-world data sources. Propensity score methods have recently gained popularity in multiple sclerosis research to generate real-world evidence. Recent evidence suggests, however, that the conduct and reporting of propensity score analyses are often suboptimal in multiple sclerosis studies.
Objectives:
To provide practical guidance to clinicians and researchers on the use of propensity score methods within the context of multiple sclerosis research.
Methods:
We summarize recommendations on the use of propensity score matching and weighting based on the current methodological literature, and provide examples of good practice.
Results:
Step-by-step recommendations are presented, starting with covariate selection and propensity score estimation, followed by guidance on the assessment of covariate balance and implementation of propensity score matching and weighting. Finally, we focus on treatment effect estimation and sensitivity analyses.
Conclusion:
This comprehensive set of recommendations highlights key elements that require careful attention when using propensity score methods.
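As a rough illustration of the weighting and treatment effect estimation steps, the sketch below computes inverse probability of treatment weights and a weighted difference in mean outcomes. It assumes the propensity scores have already been estimated; the function names and data are hypothetical, and the sketch does not cover the variance estimation or sensitivity analyses that the recommendations address.

```python
def ate_weights(treatment, propensity):
    """Inverse probability of treatment weights targeting the ATE.

    Treated patients get 1 / ps, untreated patients get 1 / (1 - ps).
    """
    return [
        1 / p if t == 1 else 1 / (1 - p)
        for t, p in zip(treatment, propensity)
    ]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def ipw_mean_difference(outcome, treatment, propensity):
    """Weighted mean outcome in the treated minus that in the untreated."""
    weights = ate_weights(treatment, propensity)
    treated = [(y, w) for y, t, w in zip(outcome, treatment, weights) if t == 1]
    control = [(y, w) for y, t, w in zip(outcome, treatment, weights) if t == 0]
    mean_treated = weighted_mean(*zip(*treated))
    mean_control = weighted_mean(*zip(*control))
    return mean_treated - mean_control


# Hypothetical data: outcome, treatment indicator, estimated propensity score.
outcome = [2.0, 4.0, 1.0, 1.0]
treatment = [1, 1, 0, 0]
propensity = [0.5, 0.5, 0.5, 0.5]

print(ipw_mean_difference(outcome, treatment, propensity))
```

Extreme propensity scores (close to 0 or 1) produce very large weights, which is one reason the weight distribution is usually inspected before estimating effects.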
To evaluate primary causes of death after spontaneous subarachnoid hemorrhage (SAH) and externally validate the HAIR score, a prognostication tool, in a single academic institution.
We reviewed all patients with SAH admitted to our neuro-intensive care unit between 2010 and 2016. Univariate and multivariate logistic regressions were performed to identify predictors of in-hospital mortality. The HAIR score predictors were Hunt and Hess grade at treatment decision, age, intraventricular hemorrhage, and rebleeding within 24 hours. Validation of the HAIR score was characterized with the receiver operating characteristic curve, the area under the curve, and a calibration plot.
Among 434 patients with SAH, in-hospital mortality was 14.1%. Of the 61 deaths, 54 (88.5%) had a neurologic cause of death or withdrawal of care and 7 (11.5%) were cardiac deaths. Median time from SAH to death was 6 days. The main causes of death were effect of the initial hemorrhage (26.2%), rebleeding (23%) and refractory cerebral edema (19.7%). Factors significantly associated with in-hospital mortality in the multivariate analysis were age, Hunt and Hess grade, and intracerebral hemorrhage. Maximum lumen size was also a significant risk factor after aneurysmal SAH. The HAIR score had a satisfactory discriminative ability, with an area under the curve of 0.89.
The in-hospital mortality is lower than in previous reports, attesting to the continuing improvement of our institutional SAH care. The major causes are the same as in previous reports. Despite a different therapeutic protocol, the HAIR score showed good discrimination and could be a useful tool for predicting mortality.
• In-hospital mortality after SAH continues to decrease.
• Direct effect of the index bleed is the leading cause of mortality after SAH.
• Robust risk factors for mortality include age and Hunt and Hess grade.
• The HAIR score can be a useful tool for predicting in-hospital mortality.
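The area under the curve used to assess discrimination can be read as the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor. The sketch below computes it by pairwise comparison; the function name, scores and outcomes are hypothetical and are not data from the study.

```python
def auc(scores, died):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (death, survivor) pairs in which the patient who
    died has the higher score; tied scores count as half.
    """
    pos = [s for s, y in zip(scores, died) if y == 1]
    neg = [s for s, y in zip(scores, died) if y == 0]
    if not pos or not neg:
        raise ValueError("Both outcome classes are required.")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical prognostic scores and in-hospital deaths (1 = died).
scores = [0, 2, 2, 4, 6, 8]
died = [0, 0, 1, 0, 1, 1]

print(f"AUC: {auc(scores, died):.2f}")
```

Discrimination is only one aspect of validation; the calibration plot mentioned in the methods addresses whether predicted and observed risks agree.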
Real-world data sources offer opportunities to compare the effectiveness of treatments in practical clinical settings. However, relevant outcomes are often recorded selectively and collected at irregular measurement times. It is therefore common to convert the available visits to a standardized schedule with equally spaced visits. Although more advanced imputation methods exist, they are not designed to recover longitudinal outcome trajectories and typically assume that missingness is non-informative. We, therefore, propose an extension of multilevel multiple imputation methods to facilitate the analysis of real-world outcome data that are collected at irregular observation times. We illustrate multilevel multiple imputation in a case study evaluating two disease-modifying therapies for multiple sclerosis in terms of time to confirmed disability progression. This survival outcome is derived from repeated measurements of the Expanded Disability Status Scale, which is collected when patients come to the healthcare center for a clinical visit and for which longitudinal trajectories can be estimated. Subsequently, we perform a simulation study to compare the performance of multilevel multiple imputation to commonly used single imputation methods. Results indicate that multilevel multiple imputation leads to less biased treatment effect estimates and improves the coverage of confidence intervals, even when outcomes are missing not at random.
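To make the idea of converting irregular visits to a standardized schedule concrete, the sketch below applies last observation carried forward (LOCF) onto an equally spaced grid. LOCF is used here only as an assumed example of a single imputation method; the abstract does not specify which single imputation methods were compared, and the function name and values are hypothetical.

```python
from bisect import bisect_right


def locf_on_grid(visit_days, values, grid_days):
    """Carry the last observed value forward onto a regular visit grid.

    visit_days must be sorted in ascending order. Grid points that precede
    the first observed visit are returned as None.
    """
    imputed = []
    for day in grid_days:
        idx = bisect_right(visit_days, day)  # number of visits on or before day
        imputed.append(values[idx - 1] if idx > 0 else None)
    return imputed


# Hypothetical EDSS measurements at irregular visit days for one patient.
visit_days = [0, 100, 250]
edss = [2.0, 2.5, 3.0]
grid_days = [0, 90, 180, 270]  # equally spaced schedule (every 90 days)

print(locf_on_grid(visit_days, edss, grid_days))
```

Because a single carried-forward value ignores both the trajectory between visits and the uncertainty of the imputation, it serves as a natural contrast to the multilevel multiple imputation approach described above.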
Physical activity is beneficial to lipid profiles; however, the association between sedentary behavior and sleep and pediatric dyslipidemia remains unclear. We aimed to investigate whether sedentary behavior or sleep predicted lipid profiles in children over a 2-year period.
Six hundred and thirty children from the QUALITY cohort, with at least one obese parent, were assessed prospectively at ages 8-10 and 10-12 years. Measures of sedentary behavior included self-reported TV viewing and computer/video game use. Seven-day accelerometry was used to derive sedentary behavior and sleep duration. Adiposity was assessed using DEXA scans. Twenty-four-hour dietary recalls yielded estimates of carbohydrate and fat intake. Outcomes included fasting total cholesterol, triglycerides, HDL and LDL-cholesterol. Multivariable models were adjusted for adiposity and diet.
At both Visit 1 (median age 9.6 years) and Visit 2 (median age 11.6 years), children were of normal weight (55%), overweight (22%), or obese (22%). Every additional hour of TV viewing at Visit 1 was associated with a 7.0% triglyceride increase (95% CI: 3.5, 10.6; P < 0.01) and 2.6% HDL decrease (95% CI: -4.2, -0.9; P < 0.01) at Visit 2; findings remained significant after adjusting for adiposity and diet. Every additional hour of sleep at Visit 1 predicted a 4.8% LDL decrease (95% CI: -9.0, -0.5; P = 0.03) at Visit 2, after adjusting for fat intake; this association became nonsignificant after controlling for adiposity.
Longer screen time during childhood appears to worsen lipid profiles in early adolescence, even after accounting for other major lifestyle habits. There is preliminary evidence of a deleterious effect of shorter sleep duration, which should be considered in further studies.
Background
Real-time automated analysis of videos of the microvasculature is an essential step in the development of research protocols and clinical algorithms that incorporate point-of-care microvascular analysis. In response to the call for validation studies of available automated analysis software by the European Society of Intensive Care Medicine, and building on a previous validation study in sheep, we report the first human validation study of AVA 4.
Methods
Two retrospective perioperative datasets of human microcirculation videos (P1 and P2) and one prospective healthy volunteer dataset (V1) were used in this validation study. Video quality was assessed using the modified Microcirculation Image Quality Selection (MIQS) score. Videos were initially analyzed with (1) AVA software 3.2 by two experienced investigators using the gold standard semi-automated method, followed by an analysis with (2) AVA automated software 4.1. Microvascular variables measured were perfused vessel density (PVD), total vessel density (TVD), and proportion of perfused vessels (PPV). Bland–Altman analysis and intraclass correlation coefficients (ICC) were used to measure agreement between the two methods. Each method’s ability to discriminate between microcirculatory states before and after induction of general anesthesia was assessed using paired t-tests.
Results
Fifty-two videos from P1, 128 videos from P2 and 26 videos from V1 met inclusion criteria for analysis. Correlational analysis and Bland–Altman analysis revealed poor agreement and no correlation between AVA 4.1 and AVA 3.2. Following the induction of general anesthesia, TVD and PVD measured using AVA 3.2 increased significantly for P1 (p < 0.05) and P2 (p < 0.05). However, these changes could not be replicated with the data generated by AVA 4.1.
Conclusions
AVA 4.1 is not a suitable tool for research or clinical purposes at this time. Future validation studies of automated microvascular flow analysis software should aim to measure the new software’s agreement with the gold standard, its ability to discriminate between clinical states and the quality thresholds at which its performance becomes unacceptable.
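For readers unfamiliar with Bland–Altman analysis, the sketch below computes its two main summaries: the mean difference between methods (bias) and the 95% limits of agreement. It is a minimal illustration, assuming paired measurements of the same videos; the function name and the vessel density values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of the paired differences; the 95% limits of
    agreement are bias +/- 1.96 standard deviations of the differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("Measurements must be paired.")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical total vessel density readings from two analysis methods.
semi_automated = [10.0, 12.0, 14.0, 16.0]
automated = [9.0, 13.0, 13.0, 17.0]

bias, lower, upper = bland_altman(semi_automated, automated)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Whether a given set of limits represents acceptable agreement depends on the clinically tolerable difference, which has to be defined separately from the calculation.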
Comparative effectiveness research using real-world data often involves pairwise propensity score matching to adjust for confounding bias. We show that corresponding treatment effect estimates may have limited external validity, and propose two visualization tools to clarify the target estimand.
We conduct a simulation study to demonstrate, with bivariate ellipses and joy plots, that differences in covariate distributions across treatment groups may affect the external validity of treatment effect estimates. We showcase how these visualization tools can facilitate the interpretation of target estimands in a case study comparing the effectiveness of teriflunomide (TERI), dimethyl fumarate (DMF) and natalizumab (NAT) on manual dexterity in patients with multiple sclerosis.
In the simulation study, estimates of the treatment effect greatly differed depending on the target population. For example, when comparing treatment B with C, the estimated treatment effect (and respective standard error) varied from -0.27 (0.03) to -0.37 (0.04) in the type of patients initially receiving treatment B and C, respectively. Visualization of the matched samples revealed that covariate distributions vary for each comparison and cannot be used to target one common treatment effect for the three treatment comparisons. In the case study, the bivariate distribution of age and disease duration varied across the population of patients receiving TERI, DMF or NAT. Although results suggest that DMF and NAT improve manual dexterity at 1 year compared with TERI, the effectiveness of DMF versus NAT differs depending on which target estimand is used.
Visualization tools may help to clarify the target population in comparative effectiveness studies and resolve ambiguity about the interpretation of estimated treatment effects.
Background
Comparing real-world effectiveness and tolerability of therapies for relapsing-remitting multiple sclerosis is increasingly important, though average treatment effects fail to capture possible treatment effect heterogeneity. With the clinical course of the disease being highly heterogeneous across patients, precision medicine methods enable treatment response heterogeneity investigations.
Objective
To compare real-world effectiveness and discontinuation profiles between dimethyl fumarate and fingolimod while investigating treatment effect heterogeneity with precision medicine methods.
Methods
Adults initiating dimethyl fumarate or fingolimod as a second-line therapy were selected from a French registry. The primary outcome was annualized relapse rate at 12 months. Seven secondary outcomes relative to discontinuation and disease progression were considered. A precision medicine framework was used to characterize treatment effect heterogeneity.
Results
Annualized relapse rates at 12 months were similar for dimethyl fumarate and fingolimod. The odds of treatment persistence were 47% lower for patients treated with dimethyl fumarate relative to those treated with fingolimod (odds ratio: 0.53, 95% confidence interval: 0.39, 0.70). None of the five precision medicine scoring approaches identified treatment heterogeneity.
Conclusion
These findings substantiated the similar effectiveness and different discontinuation profiles for dimethyl fumarate and fingolimod as a second-line therapy for relapsing-remitting multiple sclerosis, with no significant effect heterogeneity observed.
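The link between the reported odds ratio and the quoted percentage reduction in the odds of persistence can be checked with a one-line conversion. The sketch below is illustrative only, and the function name is hypothetical.

```python
def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percent change in odds."""
    return (odds_ratio - 1) * 100


# Point estimate and confidence limits reported in the Results.
for label, value in [("estimate", 0.53), ("lower", 0.39), ("upper", 0.70)]:
    print(label, round(percent_change_in_odds(value)))
```

Note that this is a change in odds, not in the probability of persistence; the two differ unless the outcome is rare.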
Background
Multiple sclerosis (MS) comparative effectiveness research needs to go beyond average treatment effects (ATEs) and post hoc subgroup analyses.
Objective
This retrospective study assessed overall and patient-specific effects of dimethyl fumarate (DMF) versus teriflunomide (TERI) in patients with relapsing-remitting MS.
Methods
A novel precision medicine (PM) scoring approach leverages advanced machine learning methods and adjusts for imbalances in baseline characteristics between patients receiving different treatments. Using the German NeuroTransData registry, we implemented and internally validated different scoring systems to distinguish patient-specific effects of DMF relative to TERI based on annualized relapse rates, time to first relapse, and time to confirmed disease progression.
Results
Among 2791 patients, there was superior ATE of DMF versus TERI for the two relapse-related endpoints (p = 0.037 and 0.018). Low to moderate signals of treatment effect heterogeneity were detected according to individualized scores. An MS patient subgroup was identified for whom DMF was more effective than TERI (p = 0.013): older (45 versus 38 years), longer MS duration (110 versus 50 months), not newly diagnosed (74% versus 40%), and no prior glatiramer acetate usage (35% versus 5%).
Conclusion
The implemented approach can disentangle prognostic differences from treatment effect heterogeneity and provide unbiased patient-specific profiling of comparative effectiveness based on real-world data.
Background
High station at specific points in the first stage of labor, such as a floating head on admission, or at 4-cm dilation or when arrest of dilation occurs, is associated with higher rates of failure to deliver vaginally. Therefore, it could be useful to know if station is within an expected range at a given dilation during the first stage. Arrest of descent disorders have been defined thus far using criteria applicable in the second stage. Statistical modeling is an attractive methodology to characterize the relationship between station and dilation because the resulting mathematical expressions could be used as a reference for comparison in the future. In addition, they can be used to produce a finely graded assessment of descent using numerical terms such as percentile rankings. A 2-step approach to potentially improving the assessment of station could be to develop a statistical model that describes the general relationship between station and dilation in the first stage of uncomplicated births and then determine if such a model would have identified births with complications related to poor labor progress. Given the complex nature of labor data, especially the imprecision of dilation and station measurement, it is not immediately evident that such a model is identifiable or what its precision would be.
Objective
We sought to characterize in mathematical terms the relationship of station to dilation during the first stage of labor for nulliparous and multiparous women with spontaneous vaginal births.
Study Design
This retrospective cohort study included 28,121 exams from 5555 women with singleton cephalic presentations at ≥37 weeks’ gestation with electronic fetal monitoring tracings, who delivered vaginally without instrumentation and had 5-minute Apgar scores >6 at 2 academic community referral hospitals in 2012 through 2013. Women with a previous cesarean birth were excluded. We used longitudinal statistical techniques suitable to biological data that were irregularly sampled with repeated measures over time.
Results
A linear relationship was observed between station and dilation. For both nulliparous and multiparous women the final model was a linear regression with random effects for intercept and slope and a first-order autoregressive correlation structure. The 5th-95th percentile range of station at any given dilation spanned about 3-4 cm.
Conclusion
Our results demonstrate a general trend of increasing descent of the presenting part as dilation advances during the first stage of labor in women who delivered vaginally without instrumentation. We propose that the mathematical expressions describing this relationship may be valuable in the assessment of first-stage labor progression.
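The final model described in the Results can be written out as a sketch. The abstract gives no equations, so the notation below is an assumption: s_ij is the station of woman i at exam j, d_ij is the corresponding dilation, and indexing the autoregressive correlation by exam order is one possible reading of "first-order autoregressive".

```latex
s_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, d_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \mathbf{G}),
\qquad
\operatorname{Corr}(\varepsilon_{ij}, \varepsilon_{ik}) = \rho^{|j-k|}
```

Here \beta_0 and \beta_1 are the population intercept and slope, b_{0i} and b_{1i} are woman-specific deviations with covariance matrix G, and \rho governs how quickly the correlation between residuals decays across exams. Percentile curves of station at a given dilation would follow from the fitted mean and the combined variance of the random effects and residuals.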