Background
The size of the margin strongly influences the required sample size in non-inferiority and equivalence trials. What is sometimes ignored, however, is that for trials with binary outcomes, the scale of the margin – risk difference, risk ratio or odds ratio – also has a large impact on power and thus on sample size requirement. When considering several scales at the design stage of a trial, these sample size consequences should be taken into account. Sometimes, changing the scale may be needed at a later stage of a trial, for example, when the event proportion in the control arm turns out different from expected. Also after completion of a trial, a switch to another scale is sometimes made, for example, when using a regression model in a secondary analysis or when combining study results in a meta-analysis that requires unifying scales. The exact consequences of such switches are currently unknown.
Methods and Results
This article first outlines sample size consequences for different choices of analysis scale at the design stage of a trial. We add a new result on sample size requirement comparing the risk difference scale with the risk ratio scale. Then, we study two different approaches to changing the analysis scale after the trial has commenced: (1) mapping the original non-inferiority margin using the event proportion in the control arm that was anticipated at the design stage or (2) mapping the original non-inferiority margin using the observed event proportion in the control arm. We use simulations to illustrate consequences on type I and type II error rates. Methods are illustrated on the INES trial, a non-inferiority trial that compared single birth rates in subfertile couples after different fertility treatments. Our results demonstrate large differences in required sample size when choosing between risk difference, risk ratio and odds ratio scales at the design stage of non-inferiority trials. In some cases, the sample size requirement is twice as large on one scale compared with another. Changing the scale after commencing the trial using anticipated proportions mainly impacts type II error rate, whereas switching using observed proportions is not advised due to not maintaining type I error rate. Differences were more pronounced with larger margins.
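The dependence of sample size on the analysis scale can be illustrated with the standard normal-approximation formulas. The following is a minimal sketch, not the article's own derivation: it assumes equal true event risks in both arms, 1:1 allocation, an unfavourable outcome, a one-sided test, and a risk ratio analysed on the log scale. The function names and the example values (5% control risk, 5 percentage point margin) are illustrative assumptions.

```python
import math
from statistics import NormalDist


def _z_sum(alpha, power):
    """Sum of the standard normal quantiles for one-sided alpha and power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha) + nd.inv_cdf(power)


def n_per_arm_rd(p, delta, alpha=0.025, power=0.90):
    """Per-arm sample size for a risk difference margin `delta`,
    assuming the true event risk is `p` in both arms."""
    z = _z_sum(alpha, power)
    return math.ceil(z ** 2 * 2 * p * (1 - p) / delta ** 2)


def n_per_arm_rr(p, rr_margin, alpha=0.025, power=0.90):
    """Per-arm sample size for a risk ratio margin `rr_margin`,
    tested on the log scale, assuming the true risk is `p` in both arms.
    Uses Var(log RR) ~ 2 * (1 - p) / (n * p)."""
    z = _z_sum(alpha, power)
    return math.ceil(z ** 2 * 2 * (1 - p) / p / math.log(rr_margin) ** 2)


# Map the risk difference margin to a risk ratio margin
# using the anticipated control risk.
p_control = 0.05
delta = 0.05
rr_margin = (p_control + delta) / p_control  # tolerated risk 10% vs 5% -> RR of 2

n_rd = n_per_arm_rd(p_control, delta)
n_rr = n_per_arm_rr(p_control, rr_margin)
print(f"risk difference scale: {n_rd} per arm")
print(f"risk ratio scale:      {n_rr} per arm")
```

Under these assumptions the risk ratio scale needs roughly twice as many participants per arm as the risk difference scale, even though both margins correspond to the same tolerated event risk at the anticipated control risk.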
Conclusions
Trialists should be aware that the analysis scale can have a large impact on type I and type II error rates in non-inferiority trials.
Over the past decade, immune checkpoint inhibitors (ICIs) have transformed the management of multiple malignancies including lung cancer. However, the optimal use of these agents in terms of duration, dose and administration frequency remains unknown. Focusing on anti-PD1 agents nivolumab and pembrolizumab in the context of non-small cell lung cancer, we argue that several lines of evidence suggest current administration regimens of these drugs may result in overtreatment with potentially important implications for cost, quality of life and toxicity. This review summarizes evidence for the scope to optimize anti-PD1 regimens, the limitations of existing data and potential approaches to solve these problems including with a novel multi-arm clinical trial design implemented in the recently opened REFINE-Lung study.
People with radiographic evidence for pulmonary tuberculosis (TB), but negative sputum cultures, have increased risk of developing culture-positive TB. Recent expansion of X-ray screening is leading to increased identification of this group. We set out to synthesise the evidence for treatment to prevent progression to culture-positive disease.
We conducted a systematic review and meta-analysis. We searched for prospective trials evaluating the efficacy of TB regimens against placebo, observation, or alternative regimens, for the treatment of adults and children with radiographic evidence of TB but culture-negative respiratory samples. Databases were searched up to 18 Oct 2022. Study quality was assessed using ROB 2·0 and ROBINS-I. The primary outcome was progression to culture-positive TB. Meta-analysis with a random effects model was conducted to estimate pooled efficacy. This study was registered with PROSPERO (CRD42021248486).
We included 13 trials (32,568 individuals) conducted between 1955 and 2018. Radiographic and bacteriological criteria for inclusion varied. 19·1% to 57·9% of participants with active x-ray changes and no treatment progressed to culture-positive disease. Progression was reduced with any treatment (6 studies, risk ratio [RR] 0·27, 95%CI 0·13-0·56), although multi-drug TB treatment (RR 0·11, 95%CI 0·05-0·23) was significantly more effective than isoniazid treatment (RR 0·63, 95%CI 0·35-1·13) (p = 0·0002).
Multi-drug regimens were associated with significantly reduced risk of progression to TB disease for individuals with radiographically apparent, but culture-negative TB. However, most studies were old, conducted prior to the HIV epidemic and with outdated regimens. New clinical trials are required to identify the optimal treatment approach.
Missing information is a major drawback in analyzing data collected in many routine health care settings. Multiple imputation assuming a missing at random mechanism is a popular method to handle missing data. The missing at random assumption cannot be confirmed from the observed data alone, hence the need for sensitivity analysis to assess robustness of inference. However, sensitivity analysis is rarely conducted and reported in practice. We analyzed routine paediatric data collected during a cluster randomized trial conducted in Kenyan hospitals. We imputed missing patient and clinician-level variables assuming the missing at random mechanism. We also imputed missing clinician-level variables assuming a missing not at random mechanism. We incorporated opinions from 15 clinical experts in the form of prior distributions and shift parameters in the delta adjustment method. An interaction between trial intervention arm and follow-up time, hospital, clinician and patient-level factors were included in a proportional odds random-effects analysis model. We performed these analyses using R functions derived from the jomo package. Parameter estimates from multiple imputation under the missing at random mechanism were similar to multiple imputation estimates assuming the missing not at random mechanism. Our inferences were insensitive to departures from the missing at random assumption using either the prior distributions or shift parameters sensitivity analysis approach.
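The idea behind the delta adjustment method can be shown with a toy example. This is a deliberately simplified sketch and not the analysis described above, which used multilevel imputation in R with the jomo package: here a single continuous variable is imputed from a normal distribution fitted to the observed values, parameter uncertainty is not propagated, and the shift `delta` (hypothetical here, elicited from experts in the study) is added to every imputed value. The function name and data are illustrative assumptions.

```python
import random
from statistics import mean, stdev


def pooled_mean_delta(values, delta, n_imputations=20, seed=1):
    """Pooled mean over several imputed data sets.

    `values` holds observed numbers and None for missing entries.
    Missing entries are drawn from a normal distribution fitted to the
    observed values (a crude missing-at-random imputation) and then
    shifted by `delta`; delta = 0 reproduces the MAR analysis.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu, sd = mean(observed), stdev(observed)
    estimates = []
    for _ in range(n_imputations):
        completed = [
            v if v is not None else rng.gauss(mu, sd) + delta
            for v in values
        ]
        estimates.append(mean(completed))
    # Point estimates are pooled by simple averaging across imputations.
    return mean(estimates)


# Hypothetical data: 10 records, 3 of them missing.
data = [1.0, 2.0, 3.0, None, 4.0, None, 5.0, None, 2.5, 3.5]
mar_estimate = pooled_mean_delta(data, delta=0.0)
mnar_estimate = pooled_mean_delta(data, delta=-1.0)
print(mar_estimate, mnar_estimate)
```

Repeating the analysis over a range of plausible `delta` values and checking whether the conclusions change is what makes this a sensitivity analysis; with the same seed, the two estimates above differ only by `delta` times the proportion of missing values.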
Most UK adolescents do not achieve recommended levels of physical activity. Previous studies suggested that perceptions of the neighbourhood environment could contribute to explain differences in physical activity behaviours. We aimed to examine whether five measures of perceptions - perceived bus stop proximity, traffic safety, street connectivity, enjoyment of the neighbourhood for walking/cycling, and personal safety - were longitudinally associated with common forms of physical activity, namely walking to school, walking for leisure, and a composite measure of outdoor physical activity. We further aimed to investigate the moderating role of gender.
We used longitudinal data from the Olympic Regeneration in East London (ORiEL) study, a prospective cohort study. In 2012, 3106 adolescents aged 11 to 12 were recruited from 25 schools in 4 deprived boroughs of East London. Adolescents were followed-up in 2013 and 2014. The final sample includes 2260 adolescents surveyed at three occasions. We estimated logistic regression models using Generalised Estimating Equations to test the plausibility of hypotheses on the nature of the longitudinal associations (general association, cumulative effect, co-varying trajectories), adjusting for potential confounders. Item non-response was handled using multiple imputation.
Longitudinal analyses indicate little evidence that perceptions of the neighbourhood are important predictors of younger adolescent physical activity. There was weak evidence that greater perceived proximity to bus stops is associated with a small decrease in the probability of walking for leisure. Results also indicate that poorer perception of personal safety decreases the probability of walking for leisure. There was some indication that better perception of street connectivity is associated with more outdoor physical activity. Finally, we found very little evidence that the associations between perceptions of the neighbourhood and physical activity differed by gender.
This study suggests that younger adolescents' perceptions of their neighbourhood environment, and changes in these perceptions, did not consistently predict physical activity in a deprived and ethnically diverse urban population. Future studies should use situation-specific measures of the neighbourhood environment and physical activity to better capture the hypothesised processes and explore the relative roles of the objective environment, parental and adolescents' perceptions in examining differences in types of physical activity.
Increasing interest has centered on the psychotherapeutic working alliance as a means of understanding clinical change in digital mental health interventions in recent years. However, little is understood about how and to what extent a digital mental health program can have an impact on the working alliance and clinical outcomes in a blended (therapist plus digital program) cognitive behavioral therapy (bCBT) intervention for depression.
This study aimed to test the difference in working alliance scores between bCBT and treatment as usual (TAU), examine the association between working alliance and depression severity scores in both arms, and test for an interaction between system usability and working alliance with regard to the association between working alliance and depression scores in bCBT at 3-month assessments.
We conducted a secondary data analysis of the E-COMPARED (European Comparative Effectiveness Research on Blended Depression Treatment versus Treatment-as-usual) trial, which compared bCBT with TAU across 9 European countries. Data were collected in primary care and specialized services between April 2015 and December 2017. Eligible participants aged 18 years or older and diagnosed with major depressive disorder were randomized to either bCBT (n=476) or TAU (n=467). bCBT consisted of 6-20 sessions of bCBT (involving face-to-face sessions with a therapist and an internet-based program). TAU consisted of usual care for depression. The main outcomes were scores of the working alliance (Working Alliance Inventory-Short Revised-Client [WAI-SR-C]) and depressive symptoms (Patient Health Questionnaire-9 [PHQ-9]) at 3 months after randomization. Other variables included system usability scores (System Usability Scale-Client [SUS-C]) at 3 months and baseline demographic information. Data from baseline and 3-month assessments were analyzed using linear regression models that adjusted for a set of baseline variables.
Of the 945 included participants, 644 (68.2%) were female, and the mean age was 38.96 years (IQR 38). bCBT was associated with higher composite WAI-SR-C scores compared to TAU (B=5.67, 95% CI 4.48-6.86). There was an inverse association between WAI-SR-C and PHQ-9 in bCBT (B=-0.12, 95% CI -0.17 to -0.06) and TAU (B=-0.06, 95% CI -0.11 to -0.02), in which as WAI-SR-C scores increased, PHQ-9 scores decreased. Finally, there was a significant interaction between SUS-C and WAI-SR-C with regard to an inverse association between higher WAI-SR-C scores and lower PHQ-9 scores in bCBT (b=-0.030, 95% CI -0.05 to -0.01; P=.005).
To our knowledge, this is the first study to show that bCBT may enhance the client working alliance when compared to evidence-based routine care for depression that services reported offering. The working alliance in bCBT was also associated with clinical improvements that appear to be enhanced by good program usability. Our findings add further weight to the view that the addition of internet-delivered CBT to face-to-face CBT may positively augment experiences of the working alliance.
ClinicalTrials.gov NCT02542891, https://clinicaltrials.gov/study/NCT02542891; German Clinical Trials Register DRKS00006866, https://drks.de/search/en/trial/DRKS00006866; Netherlands Trials Register NTR4962, https://www.onderzoekmetmensen.nl/en/trial/25452; ClinicalTrials.Gov NCT02389660, https://clinicaltrials.gov/study/NCT02389660; ClinicalTrials.gov NCT02361684, https://clinicaltrials.gov/study/NCT02361684; ClinicalTrials.gov NCT02449447, https://clinicaltrials.gov/study/NCT02449447; ClinicalTrials.gov NCT02410616, https://clinicaltrials.gov/study/NCT02410616; ISRCTN Registry ISRCTN12388725, https://www.isrctn.com/ISRCTN12388725?q=ISRCTN12388725&filters=&sort=&offset=1&totalResults=1&page=1&pageSize=10; ClinicalTrials.gov NCT02796573, https://classic.clinicaltrials.gov/ct2/show/NCT02796573.
RR2-10.1186/s13063-016-1511-1.
Diagnostic delay is associated with lower chances of cancer survival. Underlying comorbidities are known to affect the timely diagnosis of cancer. Diffuse large B-cell (DLBCL) and follicular lymphomas (FL) are primarily diagnosed amongst older patients, who are more likely to have comorbidities. Characteristics of clinical commissioning groups (CCG) are also known to impact diagnostic delay. We assess the association between comorbidities and diagnostic delay amongst patients with DLBCL or FL in England during 2005-2013.
Multivariable generalised linear mixed-effect models were used to assess the main association. Empirical Bayes estimates of the random effects were used to explore between-cluster variation. The latent normal joint modelling multiple imputation approach was used to account for partially observed variables.
We included 30,078 and 15,551 patients diagnosed with DLBCL or FL, respectively. Amongst patients from the same CCG, having multimorbidity was strongly associated with the emergency route to diagnosis (DLBCL: odds ratio 1.56, CI 1.40-1.73; FL: odds ratio 1.80, CI 1.45-2.23). Amongst DLBCL patients, the diagnostic delay was possibly correlated with CCGs that had higher population densities.
Underlying comorbidity is associated with diagnostic delay amongst patients with DLBCL or FL. Results suggest a possible correlation between CCGs with higher population densities and diagnostic delay of aggressive lymphomas.
Background:
Designing trials to reduce treatment duration is important in several therapeutic areas, including tuberculosis and bacterial infections. We recently proposed a new randomised trial design to overcome some of the limitations of standard two-arm non-inferiority trials. This DURATIONS design involves randomising patients to a number of duration arms and modelling the so-called ‘duration-response curve’. This article investigates the operating characteristics (type-1 and type-2 errors) of different statistical methods of drawing inference from the estimated curve.
Methods:
Our first estimation target is the shortest duration non-inferior to the control (maximum) duration within a specific risk difference margin. We compare different methods of estimating this quantity, including using model confidence bands, the delta method and bootstrap. We then explore the generalisability of results to estimation targets which focus on absolute event rates, risk ratio and gradient of the curve.
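The bootstrap approach to the first estimation target can be sketched as follows. This is an illustration under strong simplifying assumptions rather than the method evaluated in the article: the duration-response curve is approximated by linear interpolation between arm-level cure proportions (a stand-in for a fitted regression model), the data are simulated, and all function names, durations, cure probabilities and the 10 percentage point margin are hypothetical.

```python
import random


def cure_curve(durations, props, d):
    """Piecewise-linear interpolation of arm-level cure proportions at d."""
    if d >= durations[-1]:
        return props[-1]
    for i in range(len(durations) - 1):
        d0, d1 = durations[i], durations[i + 1]
        if d0 <= d <= d1:
            return props[i] + (props[i + 1] - props[i]) * (d - d0) / (d1 - d0)
    raise ValueError("duration outside the randomised range")


def shortest_noninferior(arms, margin, n_grid=100):
    """Shortest duration d such that every duration from d up to the
    control (maximum) duration has an estimated cure proportion no more
    than `margin` below the control arm.
    `arms` maps duration -> list of 0/1 outcomes (1 = cure)."""
    durations = sorted(arms)
    props = [sum(arms[d]) / len(arms[d]) for d in durations]
    lo, hi = durations[0], durations[-1]
    threshold = props[-1] - margin
    best = hi
    for k in range(n_grid, -1, -1):  # scan downwards from the control duration
        d = lo + (hi - lo) * k / n_grid
        if cure_curve(durations, props, d) < threshold:
            break
        best = d
    return best


def bootstrap_upper(arms, margin, n_boot=500, level=0.975, seed=1):
    """Upper percentile bound for the target duration, resampling
    patients with replacement within each duration arm."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = {d: rng.choices(y, k=len(y)) for d, y in arms.items()}
        estimates.append(shortest_noninferior(resampled, margin))
    estimates.sort()
    return estimates[min(n_boot - 1, int(level * n_boot))]


# Simulated trial: four duration arms (days), 200 patients each.
sim = random.Random(0)
true_cure = {8: 0.75, 12: 0.86, 16: 0.90, 20: 0.90}
arms = {d: [1 if sim.random() < p else 0 for _ in range(200)]
        for d, p in true_cure.items()}

est = shortest_noninferior(arms, margin=0.10)
upper = bootstrap_upper(arms, margin=0.10)
print(f"point estimate: {est:.2f} days, bootstrap upper bound: {upper:.2f} days")
```

Recommending the upper bound rather than the point estimate is one conservative way to account for uncertainty in the estimated curve; the same resampling loop could be reused for other targets by swapping the function applied to each resampled data set.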
Results:
We show through simulations that, in most scenarios and for most of the estimation targets, using the bootstrap to estimate variability around the target duration leads to good results for DURATIONS design-appropriate quantities analogous to power and type-1 error. Using model confidence bands is not recommended, while the delta method leads to inflated type-1 error in some scenarios, particularly when the optimal duration is very close to one of the randomised durations.
Conclusions:
Using the bootstrap to estimate the optimal duration in a DURATIONS design has good operating characteristics in a wide range of scenarios and can be used with confidence by researchers wishing to design a DURATIONS trial to reduce treatment duration. Uncertainty around several different targets can be estimated with this bootstrap approach.
Background
The population-level summary measure is a key component of the estimand for clinical trials with time-to-event outcomes. This is particularly the case for non-inferiority trials, because different summary measures imply different null hypotheses. Most trials are designed using the hazard ratio as summary measure, but recent studies suggested that the difference in restricted mean survival time might be more powerful, at least in certain situations. In a recent letter, we conjectured that differences between summary measures can be explained using the concept of the non-inferiority frontier and that for a fair simulation comparison of summary measures, the same analysis methods, making the same assumptions, should be used to estimate different summary measures. The aim of this article is to make such a comparison between three commonly used summary measures: hazard ratio, difference in restricted mean survival time and difference in survival at a fixed time point. In addition, we aim to investigate the impact of using an analysis method that assumes proportional hazards on the operating characteristics of a trial designed with any of the three summary measures.
Methods
We conduct a simulation study in the proportional hazards setting. We estimate difference in restricted mean survival time and difference in survival non-parametrically, without assuming proportional hazards. We also estimate all three measures parametrically, using flexible survival regression, under the proportional hazards assumption.
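For readers unfamiliar with the non-parametric estimator, the restricted mean survival time up to a horizon `tau` is the area under the Kaplan-Meier curve between 0 and `tau`. The sketch below is a minimal pure-Python illustration of that calculation, not the simulation code used in the study; the function name and the toy data are assumptions.

```python
def km_rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve
    from 0 to `tau`.

    `times` are follow-up times and `events` are 1 (event) or 0 (censored).
    If the last observation before `tau` is censored, the curve is
    extended horizontally to `tau` (one common convention).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, area, last_t = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        n_events = n_leaving = 0
        # Collect everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        area += surv * (t - last_t)  # rectangle under the current step
        if n_events:
            surv *= 1 - n_events / n_at_risk
        n_at_risk -= n_leaving
        last_t = t
    area += surv * (tau - last_t)  # final piece up to the horizon
    return area


# Toy example: difference in RMST between two small arms at tau = 4.
control = km_rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=4)
experimental = km_rmst([2, 3, 4, 5], [1, 0, 1, 1], tau=4)
rmst_difference = experimental - control
print(control, experimental, rmst_difference)
```

Without censoring, this estimator reduces to the sample mean of min(T, tau), which gives a quick sanity check on the implementation.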
Results
Comparing the hazard ratio assuming proportional hazards with the other summary measures not assuming proportional hazards, relative performance varies substantially depending on the specific scenario. Fixing the summary measure, assuming proportional hazards always leads to substantial power gains compared to using non-parametric methods. Fixing the modelling approach to flexible parametric regression assuming proportional hazards, difference in restricted mean survival time is most often the most powerful summary measure among those considered.
Conclusion
When the hazards are likely to be approximately proportional, reflecting this in the analysis can lead to large gains in power for difference in restricted mean survival time and difference in survival. The choice of summary measure for a non-inferiority trial with time-to-event outcomes should be made on clinical grounds; when any of the three summary measures discussed here is equally justifiable, difference in restricted mean survival time is most often associated with the most powerful test, on the condition that it is estimated under proportional hazards.
Non-inferiority trials are increasingly used to evaluate new treatments that are expected to have secondary advantages over standard of care, but similar efficacy on the primary outcome. When designing a non-inferiority trial with a binary primary outcome, the choice of effect measure for the non-inferiority margin (e.g. risk ratio or risk difference) has an important effect on sample size calculations; furthermore, if the control event risk observed is markedly different from that assumed, the trial can quickly lose power or the results become difficult to interpret.
We propose a new way of designing non-inferiority trials to overcome the issues raised by unexpected control event risks. Our proposal involves using clinical judgement to specify a 'non-inferiority frontier', i.e. a curve defining the most appropriate non-inferiority margin for each possible value of control event risk. Existing trials implicitly use frontiers defined by a fixed risk ratio or a fixed risk difference. We discuss their limitations and propose a fixed arcsine difference frontier, using the power-stabilising transformation for binary outcomes, which may better represent clinical judgement. We propose and compare three ways of designing a trial using this frontier: testing and reporting on the arcsine scale; testing on the arcsine scale but reporting on the risk difference or risk ratio scale; and modifying the margin on the risk difference or risk ratio scale after observing the control event risk according to the power-stabilising frontier.
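A fixed arcsine difference frontier can be made concrete with a short calculation. The sketch below assumes an unfavourable binary outcome, takes the arcsine difference implied by the design-stage control risk and risk difference margin, and holds it constant as the control risk changes. The function name and the example risks are illustrative assumptions rather than values from the article.

```python
import math


def arcsine_frontier_margin(p_control, p_design, delta_design):
    """Risk difference margin implied by a fixed arcsine difference frontier.

    The frontier constant is fixed at the design stage as
        asin(sqrt(p_design + delta_design)) - asin(sqrt(p_design)),
    and the margin for any other control risk is back-calculated from it.
    """
    constant = (math.asin(math.sqrt(p_design + delta_design))
                - math.asin(math.sqrt(p_design)))
    # Cap the angle at pi/2 so the tolerated risk cannot exceed 1.
    angle = min(math.asin(math.sqrt(p_control)) + constant, math.pi / 2)
    tolerated_risk = math.sin(angle) ** 2
    return tolerated_risk - p_control


# Design assumptions: 5% control risk, 5 percentage point margin.
for observed in (0.01, 0.05, 0.10, 0.20):
    margin = arcsine_frontier_margin(observed, p_design=0.05, delta_design=0.05)
    print(f"control risk {observed:.2f}: "
          f"risk difference margin {margin:.3f}, "
          f"risk ratio margin {(observed + margin) / observed:.2f}")
```

In this example the risk difference margin widens as the control risk moves from 1% towards 20% while the implied risk ratio margin narrows, so the frontier sits between the fixed risk difference and fixed risk ratio frontiers.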
Testing and reporting on the arcsine scale leads to results which are challenging to interpret clinically. For small values of control event risk, testing on the arcsine scale and reporting results on the risk difference scale produces confidence intervals at a higher level than the nominal one or non-inferiority margins that are slightly smaller than those back-calculated from the power-stabilising frontier alone. However, working on the arcsine scale generally requires a larger sample size compared to the risk difference scale. Therefore, working on the risk difference scale, modifying the margin after observing the control event risk, might be preferable, as it requires a smaller sample size. However, this approach tends to slightly inflate type I error rate; a solution is to use a slightly lower significance level for testing, although this modestly reduces power. When working on the risk ratio scale instead, the same approach based on the modification of the margin leads to power levels above the nominal one, maintaining type I error under control.
Our proposed methods of designing non-inferiority trials using power-stabilising non-inferiority frontiers make trial design more resilient to unexpected values of the control event risk, at the sole cost of requiring somewhat larger sample sizes when the goal is to report results on the risk difference scale.