There has been a dramatic shift in use of bariatric procedures, but little is known about their long-term comparative effectiveness.
To compare weight loss and safety among bariatric procedures.
Retrospective observational cohort study, January 2005 to September 2015 (ClinicalTrials.gov: NCT02741674).
41 health systems in the National Patient-Centered Clinical Research Network.
65 093 patients aged 20 to 79 years with body mass index (BMI) of 35 kg/m² or greater who had bariatric procedures.
32 208 Roux-en-Y gastric bypass (RYGB), 29 693 sleeve gastrectomy (SG), and 3192 adjustable gastric banding (AGB) procedures.
Estimated percent total weight loss (TWL) at 1, 3, and 5 years; 30-day rates of major adverse events.
Total numbers of eligible patients with weight measures at 1, 3, and 5 years were 44 978 (84%), 20 783 (68%), and 7159 (69%), respectively. Thirty-day rates of major adverse events were 5.0% for RYGB, 2.6% for SG, and 2.9% for AGB. One-year mean TWLs were 31.2% (95% CI, 31.1% to 31.3%) for RYGB, 25.2% (CI, 25.1% to 25.4%) for SG, and 13.7% (CI, 13.3% to 14.0%) for AGB. At 1 year, RYGB patients lost 5.9 (CI, 5.8 to 6.1) percentage points more weight than SG patients and 17.7 (CI, 17.3 to 18.1) percentage points more than AGB patients, and SG patients lost 12.0 (CI, 11.6 to 12.5) percentage points more than AGB patients. Five-year mean TWLs were 25.5% (CI, 25.1% to 25.9%) for RYGB, 18.8% (CI, 18.0% to 19.6%) for SG, and 11.7% (CI, 10.2% to 13.1%) for AGB. Patients with diabetes, those with BMI less than 50 kg/m², those aged 65 years or older, African American patients, and Hispanic patients lost less weight than patients without those characteristics.
Potential unobserved confounding due to nonrandomized design; electronic health record databases had missing outcome data.
Adults lost more weight with RYGB than with SG or AGB at 1, 3, and 5 years; however, RYGB had the highest 30-day rate of major adverse events. Small subgroup differences in weight loss outcomes were observed.
Patient-Centered Outcomes Research Institute.
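The percent total weight loss (TWL) outcome reported in the bariatric cohort above can be illustrated with a minimal Python sketch. It assumes the conventional definition, weight lost as a share of baseline weight; the function name and the example weights are hypothetical, and the study itself reports model-estimated means rather than this raw per-patient calculation.

```python
def percent_total_weight_loss(baseline_kg: float, followup_kg: float) -> float:
    """Percent total weight loss: weight lost as a percentage of baseline weight.

    Assumed (conventional) definition; not taken verbatim from the study.
    """
    if baseline_kg <= 0:
        raise ValueError("baseline weight must be positive")
    return 100.0 * (baseline_kg - followup_kg) / baseline_kg


# Hypothetical patient: 130 kg before surgery, 97.5 kg one year later.
twl_one_year = percent_total_weight_loss(130.0, 97.5)
print(f"1-year TWL: {twl_one_year:.1f}%")
```

Differences between procedures are then expressed in percentage points, that is, as one TWL value minus another, which is how the between-procedure comparisons above are reported.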
Clinical prediction models estimated with health records data may perpetuate inequities.
To evaluate racial/ethnic differences in the performance of statistical models that predict suicide.
In this diagnostic/prognostic study, performed from January 1, 2009, to September 30, 2017, with follow-up through December 31, 2017, all outpatient mental health visits to 7 large integrated health care systems by patients 13 years or older were evaluated. Prediction models were estimated using logistic regression with LASSO variable selection and random forest in a training set that contained all visits from a 50% random sample of patients (6 984 184 visits). Performance was evaluated in the remaining 6 996 386 visits, including visits from White (4 031 135 visits), Hispanic (1 664 166 visits), Black (578 508 visits), Asian (313 011 visits), and American Indian/Alaskan Native (48 025 visits) patients and patients without race/ethnicity recorded (274 702 visits). Data analysis was performed from January 1, 2019, to February 1, 2021.
Demographic, diagnosis, prescription, and utilization variables and Patient Health Questionnaire 9 responses.
Suicide death in the 90 days after a visit.
This study included 13 980 570 visits by 1 433 543 patients (64% female; mean [SD] age, 42 [18] years). A total of 768 suicide deaths were observed within 90 days after 3143 visits. Suicide rates were highest for visits by patients with no race/ethnicity recorded (n = 313 visits followed by suicide within 90 days, rate = 5.71 per 10 000 visits), followed by visits by Asian (n = 187 visits followed by suicide within 90 days, rate = 2.99 per 10 000 visits), White (n = 2134 visits followed by suicide within 90 days, rate = 2.65 per 10 000 visits), American Indian/Alaskan Native (n = 21 visits followed by suicide within 90 days, rate = 2.18 per 10 000 visits), Hispanic (n = 392 visits followed by suicide within 90 days, rate = 1.18 per 10 000 visits), and Black (n = 65 visits followed by suicide within 90 days, rate = 0.56 per 10 000 visits) patients. The area under the curve (AUC) and sensitivity of both models were high for White, Hispanic, and Asian patients and poor for Black and American Indian/Alaskan Native patients and patients without race/ethnicity recorded. For example, the AUC for the logistic regression model was 0.828 (95% CI, 0.815-0.840) for White patients compared with 0.640 (95% CI, 0.598-0.681) for patients with unrecorded race/ethnicity and 0.599 (95% CI, 0.513-0.686) for American Indian/Alaskan Native patients. Sensitivity at the 90th percentile was 62.2% (95% CI, 59.2%-65.0%) for White patients compared with 27.5% (95% CI, 21.0%-34.7%) for patients with unrecorded race/ethnicity and 10.0% (95% CI, 0%-23.0%) for Black patients.
Results were similar for random forest models, with an AUC of 0.812 (95% CI, 0.800-0.826) for White patients compared with 0.676 (95% CI, 0.638-0.714) for patients with unrecorded race/ethnicity and 0.642 (95% CI, 0.579-0.710) for American Indian/Alaskan Native patients and sensitivities at the 90th percentile of 52.8% (95% CI, 50.0%-55.8%) for White patients, 29.3% (95% CI, 22.8%-36.5%) for patients with unrecorded race/ethnicity, and 6.7% (95% CI, 0%-16.7%) for Black patients.
These suicide prediction models may provide fewer benefits and more potential harms to American Indian/Alaskan Native or Black patients or those with unrecorded race/ethnicity compared with White, Hispanic, and Asian patients. Improving predictive performance in disadvantaged populations should be prioritized to improve, rather than exacerbate, health disparities.
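The subgroup comparison of sensitivity at the 90th percentile described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the risk-score cutoff is taken from the pooled score distribution (nearest-rank percentile), a visit is flagged when its score is strictly above that cutoff, and the function names, group labels, and toy data are all hypothetical rather than drawn from the study.

```python
import math
from collections import defaultdict


def percentile_threshold(scores, percentile=90.0):
    """Nearest-rank percentile of the pooled risk-score distribution."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def sensitivity_by_group(scores, events, groups, percentile=90.0):
    """Share of events in each group whose score exceeds the pooled cutoff."""
    threshold = percentile_threshold(scores, percentile)
    flagged = defaultdict(int)  # events above the cutoff, per group
    total = defaultdict(int)    # all events, per group
    for score, event, group in zip(scores, events, groups):
        if event:
            total[group] += 1
            if score > threshold:
                flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}


# Toy data (hypothetical): 20 visits with risk scores 1..20.
scores = list(range(1, 21))
events = [1 if s in {5, 12, 19, 20} else 0 for s in scores]
groups = ["B" if s <= 12 else "A" for s in scores]
print(sensitivity_by_group(scores, events, groups))
```

In this toy example both events in group A fall above the cutoff while neither event in group B does, which mirrors the kind of gap the abstract reports: a single model can flag most events in one group and miss most in another.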
Abstract
Methodological advancements in epidemiology, biostatistics, and data science have strengthened the research world’s ability to use data captured from electronic health records (EHRs) to address pressing medical questions, but gaps remain. We describe methods investments that are needed to curate EHR data toward research quality and to integrate complementary data sources when EHR data alone are insufficient for research goals. We highlight new methods and directions for improving the integrity of medical evidence generated from pragmatic trials, observational studies, and predictive modeling. We also discuss needed methods contributions to further ease data sharing across multisite EHR data networks. Throughout, we identify opportunities for training and for bolstering collaboration among subject matter experts, methodologists, practicing clinicians, and health system leaders to help ensure that methods problems are identified and resulting advances are translated into mainstream research practice more quickly.
The National Committee for Quality Assurance recommends response and remission as indicators of successful depression treatment for the Healthcare Effectiveness and Data Information Set. Effect size and severity-adjusted effect size (SAES) offer alternative metrics. This study compared measures and examined the relationship between baseline symptom severity and treatment success.
Electronic records from two large integrated health systems (Kaiser Permanente Colorado and Washington) were used to identify 5,554 new psychotherapy episodes with a baseline Patient Health Questionnaire (PHQ-9) score of ≥10 and a PHQ-9 follow-up score from 14-180 days after treatment initiation. Treatment success was defined for four measures: response (≥50% reduction in PHQ-9 score), remission (PHQ-9 score <5), effect size ≥0.8, and SAES ≥0.8. Descriptive analyses examined agreement of measures. Logistic regression estimated the association between baseline severity and success on each measure. Sensitivity analyses evaluated the impact of various outcome definitions and loss to follow-up.
Effect size ≥0.8 was most frequently attained (72% across sites), followed by SAES ≥0.8 (66%), response (46%), and remission (22%). Response was the only measure not associated with baseline PHQ-9 score. Effect size ≥0.8 favored episodes with a higher baseline PHQ-9 score (odds ratio [OR]=2.3, p<0.001, for 10-point difference in baseline PHQ-9 score), whereas SAES ≥0.8 (OR=0.61, p<0.001) and remission (OR=0.43, p<0.001) favored episodes with lower baseline scores.
Response is preferable for comparing treatment outcomes, because it does not favor more or less baseline symptom severity, indicates clinically meaningful improvement, and is transparent and easy to calculate.
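The two threshold-based measures defined above, response (≥50% reduction in PHQ-9 score) and remission (PHQ-9 score <5), can be written as a short Python sketch. The function names and example scores are hypothetical; effect size and SAES are left out because the abstract does not give their formulas.

```python
def is_response(baseline_phq9: int, followup_phq9: int) -> bool:
    """Response: at least a 50% reduction in PHQ-9 score from baseline."""
    if baseline_phq9 <= 0:
        raise ValueError("baseline PHQ-9 score must be positive")
    return (baseline_phq9 - followup_phq9) / baseline_phq9 >= 0.5


def is_remission(followup_phq9: int) -> bool:
    """Remission: follow-up PHQ-9 score below 5."""
    return followup_phq9 < 5


# Two hypothetical episodes with different baseline severity.
for baseline, followup in [(12, 4), (24, 11)]:
    print(baseline, followup, is_response(baseline, followup), is_remission(followup))
```

Both hypothetical episodes meet the response criterion, but only the one with the lower baseline score reaches remission. That is consistent with the finding that remission favors episodes with lower baseline scores, because a fixed cutoff is harder to reach from a higher starting point.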
The authors examined whether machine-learning models could be used to analyze data from electronic health records (EHRs) to predict patients' responses to antidepressant medications.
EHR data from a Washington State health system identified patients ages ≥13 years who started an antidepressant medication in 2016 in a community practice setting and had a baseline Patient Health Questionnaire-9 (PHQ-9) score of ≥10 and at least one PHQ-9 score recorded 14-180 days later. Potential predictors of a response to antidepressants were extracted from the EHR and included demographic characteristics, psychiatric and substance use diagnoses, past psychiatric medication use, mental health service use, and past PHQ-9 scores. Random-forest and penalized regression analyses were used to build models predicting follow-up PHQ-9 score and a favorable treatment response (≥50% improvement in score).
Among 2,469 patients starting antidepressant medication treatment, the mean±SD baseline PHQ-9 score was 17.3±4.5, and the mean lowest follow-up score was 9.2±5.9. Outcome data were available for 72% of the patients. About 48% of the patients had a favorable treatment response. The best-fitting random-forest models yielded a correlation between predicted and observed follow-up scores of 0.38 (95% CI=0.32-0.45) and an area under the receiver operating characteristic curve for a favorable response of 0.57 (95% CI=0.52-0.61). Results were similar for penalized regression models and for models predicting last PHQ-9 score during follow-up.
Prediction models using EHR data were not accurate enough to inform recommendations for or against starting antidepressant medication. Personalization of depression treatment should instead rely on systematic assessment of early outcomes.
Additional data comparing longer-term problems associated with various bariatric surgical procedures are needed for shared decision-making.
To compare the risks of intervention, operation, endoscopy, hospitalization, and mortality up to 5 years after 2 bariatric surgical procedures.
Adults who underwent Roux-en-Y gastric bypass (RYGB) or sleeve gastrectomy (SG) between January 1, 2005, and September 30, 2015, within the National Patient-Centered Clinical Research Network. Data from 33 560 adults at 10 centers within 4 clinical data research networks were included in this cohort study. Information was extracted from electronic health records using a common data model and linked to insurance claims and mortality indices. Analyses were conducted from January 2018 through October 2019.
Bariatric surgical procedures.
The primary outcome was time until operation or intervention. Secondary outcomes included endoscopy, hospitalization, and mortality rates.
Of 33 560 adults, 18 056 (54%) underwent RYGB, and 15 504 (46%) underwent SG. The median (interquartile range) follow-up for operation or intervention was 3.4 (1.6-5.0) years for RYGB and 2.2 (0.9-3.6) years for SG. The overall mean (SD) patient age was 45.0 (11.5) years, and the overall mean (SD) patient body mass index was 49.1 (7.9). The cohort was composed predominantly of women (80%) and white individuals (66%), with 26% of Hispanic ethnicity. Operation or intervention was less likely for SG than for RYGB (hazard ratio, 0.72; 95% CI, 0.65-0.79; P < .001). The estimated, adjusted cumulative incidence rates of operation or intervention at 5 years were 8.94% (95% CI, 8.23%-9.65%) for SG and 12.27% (95% CI, 11.49%-13.05%) for RYGB. Hospitalization was less likely for SG than for RYGB (hazard ratio, 0.82; 95% CI, 0.78-0.87; P < .001), and the 5-year adjusted cumulative incidence rates were 32.79% (95% CI, 31.62%-33.94%) for SG and 38.33% (95% CI, 37.17%-39.46%) for RYGB. Endoscopy was less likely for SG than for RYGB (hazard ratio, 0.47; 95% CI, 0.43-0.52; P < .001), and the adjusted cumulative incidence rates at 5 years were 7.80% (95% CI, 7.15%-8.43%) for SG and 15.83% (95% CI, 14.94%-16.71%) for RYGB. There were no differences in all-cause mortality between SG and RYGB.
Interventions, operations, and hospitalizations were relatively common after bariatric surgical procedures and were more often associated with RYGB than SG.
ClinicalTrials.gov identifier: NCT02741674.
There is increasing interest in clinical prediction models for rare outcomes such as suicide, psychiatric hospitalizations, and opioid overdose. Accurate model validation is needed to guide model selection and decisions about whether and how prediction models should be used. Split-sample estimation and validation of clinical prediction models, in which data are divided into training and testing sets, may reduce predictive accuracy and precision of validation. Using all data for estimation and validation increases sample size for both procedures, but validation must account for overfitting, or optimism. Our study compared split-sample and entire-sample methods for estimating and validating a suicide prediction model.
We compared performance of random forest models estimated in a sample of 9,610,318 mental health visits ("entire-sample") and in a 50% subset ("split-sample") as evaluated in a prospective validation sample of 3,754,137 visits. We assessed optimism of three internal validation approaches: for the split-sample prediction model, validation in the held-out testing set and, for the entire-sample model, cross-validation and bootstrap optimism correction.
The split-sample and entire-sample prediction models showed similar prospective performance; the area under the curve (AUC) and 95% confidence interval was 0.81 (0.77-0.85) for both. Performance estimates evaluated in the testing set for the split-sample model (AUC = 0.85 [0.82-0.87]) and via cross-validation for the entire-sample model (AUC = 0.83 [0.81-0.85]) accurately reflected prospective performance. Validation of the entire-sample model with bootstrap optimism correction overestimated prospective performance (AUC = 0.88 [0.86-0.89]). Measures of classification accuracy, including sensitivity and positive predictive value at the 99th, 95th, 90th, and 75th percentiles of the risk score distribution, indicated similar conclusions: bootstrap optimism correction overestimated classification accuracy in the prospective validation set.
While previous literature demonstrated the validity of bootstrap optimism correction for parametric models in small samples, this approach did not accurately validate performance of a rare-event prediction model estimated with random forests in a large clinical dataset. Cross-validation of prediction models estimated with all available data provides accurate independent validation while maximizing sample size.
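The two entire-sample validation approaches compared above, cross-validation and bootstrap optimism correction, can be sketched in plain Python. This is a minimal illustration, not the study's implementation: the "model" is a hypothetical toy learner that picks the single best feature (a stand-in for the random forest), the data are simulated, and all function names are assumptions made for the example.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit(rows, labels):
    """Toy learner (hypothetical): choose the feature with the best training AUC."""
    return max(range(len(rows[0])), key=lambda j: auc([r[j] for r in rows], labels))


def predict(feature, rows):
    return [r[feature] for r in rows]


def bootstrap_corrected_auc(rows, labels, n_boot=50, seed=0):
    """Apparent AUC minus the average optimism estimated from bootstrap refits."""
    rng = random.Random(seed)
    apparent = auc(predict(fit(rows, labels), rows), labels)
    n, optimism = len(rows), []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:  # AUC is undefined without both classes
            continue
        b_rows = [rows[i] for i in idx]
        b_model = fit(b_rows, b_labels)
        boot_auc = auc(predict(b_model, b_rows), b_labels)  # on the bootstrap sample
        orig_auc = auc(predict(b_model, rows), labels)      # on the original sample
        optimism.append(boot_auc - orig_auc)
    return apparent - sum(optimism) / n_boot


def cross_validated_auc(rows, labels, k=5, seed=0):
    """AUC of pooled held-out predictions from k-fold cross-validation."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    held_out = [0.0] * len(rows)
    for fold in range(k):
        test_idx = order[fold::k]
        test_set = set(test_idx)
        train_idx = [i for i in order if i not in test_set]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        for i, s in zip(test_idx, predict(model, [rows[i] for i in test_idx])):
            held_out[i] = s
    return auc(held_out, labels)


# Simulated data: one weakly informative feature plus four noise features.
data_rng = random.Random(1)
labels = [i % 2 for i in range(60)]
rows = [[data_rng.random() + 0.5 * y] + [data_rng.random() for _ in range(4)] for y in labels]
print("bootstrap-corrected AUC:", round(bootstrap_corrected_auc(rows, labels), 3))
print("cross-validated AUC:", round(cross_validated_auc(rows, labels), 3))
```

Both functions use every observation for estimation. They differ in how they guard against optimism: cross-validation scores each observation with a model that never saw it, whereas the bootstrap subtracts an estimated optimism from the apparent performance. The toy data are not expected to reproduce the overestimation reported above, which was observed for a random forest fit to a large, rare-event dataset.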
Bariatric surgery can lead to substantial improvements in type 2 diabetes (T2DM), but outcomes vary across procedures and populations. It is unclear which bariatric procedure has the most benefits for patients with T2DM.
To evaluate associations of bariatric surgery with T2DM outcomes.
This cohort study was conducted in 34 US health system sites in the National Patient-Centered Clinical Research Network Bariatric Study. Adult patients with T2DM who had bariatric surgery between January 1, 2005, and September 30, 2015, were included. Data analysis was conducted from April 2017 to August 2019.
Roux-en-Y gastric bypass (RYGB) or sleeve gastrectomy (SG).
Type 2 diabetes remission, T2DM relapse, percentage of total weight lost, and change in glycosylated hemoglobin (hemoglobin A1c).
A total of 9710 patients were included (median [interquartile range] follow-up time, 2.7 [2.9] years; 7051 female patients [72.6%]; mean [SD] age, 49.8 [10.5] years; mean [SD] BMI, 49.0 [8.4]; 6040 white patients [72.2%]). Weight loss was significantly greater with RYGB than SG at 1 year (mean difference, 6.3 [95% CI, 5.8-6.7] percentage points) and 5 years (mean difference, 8.1 [95% CI, 6.6-9.6] percentage points). The T2DM remission rate was approximately 10% higher in patients who had RYGB (hazard ratio, 1.10 [95% CI, 1.04-1.16]) than those who had SG. Estimated adjusted cumulative T2DM remission rates for patients who had RYGB and SG were 59.2% (95% CI, 57.7%-60.7%) and 55.9% (95% CI, 53.9%-57.9%), respectively, at 1 year and 86.1% (95% CI, 84.7%-87.3%) and 83.5% (95% CI, 81.6%-85.1%) at 5 years postsurgery. Among 6141 patients who experienced T2DM remission, the subsequent T2DM relapse rate was lower for those who had RYGB than those who had SG (hazard ratio, 0.75 [95% CI, 0.67-0.84]). Estimated relapse rates for those who had RYGB and SG were 8.4% (95% CI, 7.4%-9.3%) and 11.0% (95% CI, 9.6%-12.4%) at 1 year and 33.1% (95% CI, 29.6%-36.5%) and 41.6% (95% CI, 36.8%-46.1%) at 5 years after surgery. At 5 years, compared with baseline, hemoglobin A1c was reduced 0.45 (95% CI, 0.27-0.63) percentage points more for patients who had RYGB vs patients who had SG.
In this large multicenter study, patients who had RYGB had greater weight loss, a slightly higher T2DM remission rate, less T2DM relapse, and better long-term glycemic control compared with those who had SG. These findings can help inform patient-centered surgical decision-making.
Purpose
Observational studies assessing effects of medical products on suicidal behavior often rely on health record data to account for pre‐existing risk. We assess whether high‐dimensional models predicting suicide risk using data derived from insurance claims and electronic health records (EHRs) are superior to models using data from insurance claims alone.
Methods
Data from seven large health systems identified outpatient mental health visits by patients aged 11 or older between 1/1/2009 and 9/30/2017. Data for the 5 years prior to each visit identified potential predictors of suicidal behavior typically available from insurance claims (e.g., mental health diagnoses, procedure codes, medication dispensings) and additional potential predictors available from EHRs (self‐reported race and ethnicity, responses to Patient Health Questionnaire or PHQ‐9 depression questionnaires). Nonfatal self‐harm events following each visit were identified from insurance claims data, and fatal self‐harm events were identified by linkage to state mortality records. Random forest models predicting nonfatal or fatal self‐harm over 90 days following each visit were developed in a 70% random sample of visits and validated in a held‐out sample of 30%. Performance of models using linked claims and EHR data was compared to models using claims data only.
Results
Among 15 845 047 encounters by 1 574 612 patients, 99 098 (0.6%) were followed by a self‐harm event within 90 days. Overall classification performance did not differ between the best‐fitting model using all data (area under the receiver operating curve or AUC = 0.846, 95% CI 0.839–0.854) and the best‐fitting model limited to data available from insurance claims (AUC = 0.846, 95% CI 0.838–0.853). Competing models showed similar classification performance across a range of cut‐points and similar calibration performance across a range of risk strata. Results were similar when the sample was limited to health systems and time periods where PHQ‐9 depression questionnaires were recorded more frequently.
Conclusion
Investigators using health record data to account for pre‐existing risk in observational studies of suicidal behavior need not limit that research to databases including linked EHR data.