In drug development, it sometimes occurs that a new drug does not demonstrate effectiveness for the full study population but appears to be beneficial in a relevant subgroup. If the subgroup of interest was not part of a confirmatory testing strategy, the inflation of the overall type I error is substantial, and such a subgroup finding can only be seen as exploratory at best. To support such exploratory findings, an appropriate replication of the subgroup finding should be undertaken in a new trial. We should, however, be reasonably confident in the observed treatment effect size to be able to use this estimate in a replication trial in the subpopulation of interest. We were therefore interested in evaluating the bias of the estimate of the subgroup treatment effect, after selection based on significance for the subgroup in an overall "failed" trial. Different scenarios, involving continuous as well as dichotomous outcomes, were investigated via simulation studies. It is shown that the bias associated with subgroup findings in overall nonsignificant clinical trials is on average large and varies substantially across plausible scenarios. This renders the subgroup treatment estimate from the original trial of limited value for designing the replication trial. An empirical Bayesian shrinkage method is suggested to minimize this overestimation. The proposed estimator appears to offer either a good or a conservative correction to the observed subgroup treatment effect and hence provides a more reliable subgroup treatment effect estimate for adequate planning of future studies.
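The selection mechanism described above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the study: a continuous outcome with unit variance, a normal approximation for the effect estimates, the same true effect in the subgroup and its complement, and illustrative values for `true_effect`, `n_per_arm`, and `subgroup_fraction`. It does not implement the empirical Bayesian shrinkage estimator.

```python
import math
import random
import statistics


def simulate_selected_subgroup_effect(true_effect=0.2, n_per_arm=200,
                                      subgroup_fraction=0.3, n_trials=20000,
                                      z_crit=1.96, seed=1):
    """Mean subgroup estimate among trials that fail overall but have a
    significant subgroup. All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    n_sub = int(n_per_arm * subgroup_fraction)
    n_rest = n_per_arm - n_sub
    # Standard errors of a difference in means with unit outcome variance.
    se_sub = math.sqrt(2.0 / n_sub)
    se_rest = math.sqrt(2.0 / n_rest)
    se_all = math.sqrt(2.0 / n_per_arm)
    selected = []
    for _ in range(n_trials):
        # Draw the two independent estimates from their sampling distributions.
        est_sub = rng.gauss(true_effect, se_sub)
        est_rest = rng.gauss(true_effect, se_rest)
        # The overall estimate is the sample-size-weighted average.
        est_all = (n_sub * est_sub + n_rest * est_rest) / n_per_arm
        overall_failed = est_all / se_all < z_crit
        subgroup_significant = est_sub / se_sub > z_crit
        if overall_failed and subgroup_significant:
            selected.append(est_sub)
    return statistics.mean(selected), len(selected)


if __name__ == "__main__":
    mean_selected, n_selected = simulate_selected_subgroup_effect()
    print(f"true effect 0.2, mean selected estimate {mean_selected:.3f} "
          f"({n_selected} selected trials)")
```

Because only subgroup estimates that cross the significance threshold are kept, every selected estimate exceeds `z_crit * se_sub`, so under these assumptions the selected mean necessarily overstates the true effect.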
• Subgroups: prominent role in marketing authorisation applications (MAAs).
• Regulators: how and for what purpose(s) are assessments made?
• Assessments: mainly related to consistency or heterogeneity of treatment effect.
• Main role: choice of the final indication.
Marketing authorisation application dossiers relating to medicinal products containing new active substances and evaluated by the European Medicines Agency (EMA) over the period 2012–2015 were examined. Major objections and other concerns relating to efficacy and safety in the day 80 assessment reports were reviewed. Overall, approved products have more subgroup concerns than nonapproved products, which seems to be a consistent pattern. Subgroup analyses are mainly assessed to ensure that subgroups of patients who might lack a positive benefit-risk ratio are not wrongly included in the approved treatment indication.
To enhance the utility of transfusion data for research, ideally every transfusion should be linked to a primary clinical indication. In electronic patient records, many diagnostic and procedural codes are registered, but unfortunately, it is usually not specified which one is the reason for transfusion. Therefore, a method is needed to determine the most likely indication for transfusion in an automated way.
An algorithm to identify the most likely transfusion indication was developed and evaluated against a gold standard based on the review of medical records for 234 cases by 2 experts. In a second step, information on misclassification was used to fine-tune the initial algorithm. The adapted algorithm predicts, out of all data available, the most likely indication for transfusion using information on medical specialism, surgical procedures, and diagnosis and procedure dates relative to the transfusion date.
The adapted algorithm correctly predicted 74.4% of indications in the sample (75.5% when extrapolated to the full data set). The kappa score, which corrects for the number of options to choose from, was 0.63. This indicates that the algorithm performs substantially better than chance.
It is possible to use an automated algorithm to predict the indication for transfusion in terms of procedures and/or diagnoses. Before implementation of the algorithm in other data sets, the obtained results should be externally validated in an independent hospital data set.
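For readers unfamiliar with the kappa score reported above, the following minimal sketch shows how such a chance-corrected agreement measure can be computed. It assumes the reported statistic is Cohen's kappa, which the abstract does not state, and the indication labels in the usage example are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: share of cases where both sources give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both sources labelled independently,
    # each with its own marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels: algorithm output versus expert review.
    algorithm = ["surgery", "surgery", "oncology", "oncology"]
    experts = ["surgery", "surgery", "oncology", "surgery"]
    print(cohens_kappa(algorithm, experts))
```

A kappa of 0 corresponds to agreement no better than chance and 1 to perfect agreement, which is why the raw percentage of correct predictions and the kappa score can differ noticeably.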
In drug development and drug licensing, it sometimes occurs that a new drug does not demonstrate effectiveness for the full study population, but there appears to be benefit in a relevant, pre-defined subgroup. This raises the question of how strong the evidence from such a subgroup is, and which confirmatory testing strategies are the most appropriate. Hence, we considered the type I error and the power of a subgroup result in a trial with non-significant overall results and of suitable replication strategies. In the case of a single trial, the inflation of the overall type I error is substantial and can be up to twice as large, especially in relatively small subgroups. This also increases the risk of starting a replication trial that should not be done, if such a second trial is not already available. The overall type I error is almost controlled by using an appropriate replication strategy. This confirms the required cautious interpretation of promising subgroups, even in the case that overall trial results were perceived to be close to significance.
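The inflation in the single-trial case can be illustrated with a short simulation. This is a sketch under simplifying assumptions that are not from the study: no true effect anywhere, two-sided z-tests at the 5% level, one subgroup, and a "positive" conclusion whenever either the overall test or the subgroup test is significant. The value of `subgroup_fraction` is illustrative.

```python
import math
import random


def overall_type1_error(subgroup_fraction=0.2, n_trials=100000,
                        z_crit=1.96, seed=7):
    """Probability of a false-positive claim when a significant subgroup
    is accepted after a non-significant overall result (null is true)."""
    rng = random.Random(seed)
    w_sub = math.sqrt(subgroup_fraction)
    w_rest = math.sqrt(1.0 - subgroup_fraction)
    false_positives = 0
    for _ in range(n_trials):
        # Independent z-statistics for the subgroup and its complement.
        z_sub = rng.gauss(0.0, 1.0)
        z_rest = rng.gauss(0.0, 1.0)
        # The overall z-statistic combines both parts, weighted by the
        # square root of their share of the sample.
        z_all = w_sub * z_sub + w_rest * z_rest
        if abs(z_all) > z_crit or abs(z_sub) > z_crit:
            false_positives += 1
    return false_positives / n_trials


if __name__ == "__main__":
    for fraction in (0.1, 0.3, 0.5, 0.8):
        print(fraction, overall_type1_error(subgroup_fraction=fraction))
```

The smaller the subgroup, the weaker the correlation between the subgroup test and the overall test, so the two tests behave more like independent chances of a false positive and the combined error rate moves toward twice the nominal level.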
Background: The 17-item Hamilton depression rating scale (HAMD17) is the standard efficacy outcome in antidepressant clinical trials. It is criticized for multidimensionality and for poorly discriminating treatment from placebo. HAMD subscales may overcome these limitations and reduce the sample size of clinical trials. This study compared the discriminative performance of the HAMD17 and three established HAMD subscales (Bech, Maier-Philipp, Gibbons) across a range of antidepressants with different mechanisms of action.
Methods: We analyzed data from 24 clinical trials including 3692 patients randomized to tricyclic or tetracyclic antidepressants (TCAs or TeCAs), selective serotonin reuptake inhibitors (SSRIs) or placebo. Data were analyzed using a mixed model for repeated measurements (MMRM). Standardized effect sizes for the HAMD17 and subscales were derived for every time-point, and their effect on sample size was evaluated.
Results: For TCAs and TeCAs vs. placebo, the HAMD17 consistently provided the highest standardized effects. The sample size to establish efficacy at week six was >25 percent smaller than for any of the subscales. However, for SSRIs vs. placebo, the HAMD17 provided slightly smaller standardized effects and was the least efficient outcome. There were no relevant differences between the subscales.
Limitations: Data were derived exclusively from mirtazapine trials. Conclusions are restricted to clinical trial settings.
Conclusions: Comparative performance of the HAMD17 and various subscales strongly depends on type of antidepressant. Results support using HAMD17 as primary endpoint in clinical trials, but it will be beneficial to pro-actively include subscales as additional endpoints to successfully establish treatment effects of new antidepressants.
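The link between standardized effect size and sample size can be made concrete with the usual normal-approximation formula for comparing two means, n per group = 2(z_{1-α/2} + z_{1-β})² / d². The sketch below is an assumption for illustration only: the study derived its effects from an MMRM, and the effect sizes in the usage example are hypothetical, not results from the trials.

```python
import math
from statistics import NormalDist


def n_per_group(std_effect, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sided two-sample comparison
    with standardized effect size std_effect (normal approximation)."""
    if std_effect <= 0:
        raise ValueError("standardized effect size must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_beta) ** 2 / std_effect ** 2)


if __name__ == "__main__":
    # Hypothetical standardized effects for two competing outcome scales.
    for label, d in (("scale A", 0.40), ("scale B", 0.35)):
        print(label, d, n_per_group(d))
```

Because the required sample size scales with 1/d², even a modest difference in standardized effect between two outcome measures translates into a noticeably different trial size.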
Meta-analyses are typically triggered by a (potentially false-significant) finding in one of the preceding primary studies. We studied the consequences for a meta-analysis of also including the primary studies that triggered it.
We analytically determined the bias of the treatment effect estimates obtained by meta-analysis, conditional on the number of included primary and false-significant studies. The type I error rate and power of the meta-analysis were assessed using simulations. We applied a method for bias-correction, by subtracting an analytically derived bias from the treatment effect estimated in meta-analysis.
Bias in meta-analytical effects and type I error rates increased as more primary studies with false-significant effects were included. When 20% of the primary studies showed false-significant effects, the bias was 0.33 (z-score) instead of 0, and the type I error rate was 23% instead of 5%. After applying the bias correction, the type I error rate returned to the nominal 5%.
Inclusion of primary studies with false-significant effects leads to biased effect estimates and inflated type I error rates in the meta-analysis, depending on the number of false-significant studies. This bias can be adjusted for.
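The direction of this bias can be sketched with the mean of a truncated normal distribution. The code below is not the derivation used in the study, and its output need not match the reported figures. It assumes a true effect of zero, equally sized primary studies, false-significant studies that are all significant in the positive direction (z above the two-sided critical value), and remaining studies that are unselected draws from the null distribution.

```python
from statistics import NormalDist


def expected_mean_z(fraction_false_significant, alpha=0.05):
    """Expected average z-score across primary studies when a given
    fraction is false-significant (assumptions listed in the lead-in)."""
    if not 0.0 <= fraction_false_significant <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Mean of a standard normal truncated to values above z_crit.
    mean_z_given_significant = std_normal.pdf(z_crit) / (1.0 - std_normal.cdf(z_crit))
    # Unselected null studies contribute an expected z-score of zero.
    return fraction_false_significant * mean_z_given_significant


if __name__ == "__main__":
    for fraction in (0.0, 0.1, 0.2, 0.5):
        print(fraction, round(expected_mean_z(fraction), 3))
```

Under these assumptions the expected bias grows linearly with the share of false-significant studies, and subtracting the expected value from the pooled estimate is one simple form of the correction idea described above.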
The first studies on renal denervation (RDN) suggest that this treatment is feasible, effective, and safe in the short term. Presently available data are promising, but important uncertainties exist; therefore, SYMPATHY has been initiated. SYMPATHY is a multicenter, randomized, controlled trial in which patients are randomized to RDN in addition to usual care (intervention group) or to continued usual care (control group). Randomization will take place in a ratio of 2 to 1. At least 300 participants will be included to answer the primary objective. Sample size may be extended to a maximum of 570 to address key secondary objectives. The primary objective is to assess whether RDN added to usual care, compared with usual care alone, reduces blood pressure (BP) (ambulatory daytime systolic BP) in subjects with an average daytime systolic BP ≥135 mm Hg, despite use of ≥3 BP-lowering agents, 6 months after RDN. Key secondary objectives are evaluated at 6 months and at regular intervals during continued follow-up and include the effect of RDN on the use of BP-lowering agents, in different subgroups (across strata of estimated glomerular filtration rate and of baseline BP), on office BP, quality of life, and cost-effectiveness.