Student's two-sample t test is generally used for comparing the means of two independent samples, for example, two treatment arms. Under the null hypothesis, the t test assumes that the two samples arise from the same normally distributed population with unknown variance. Adequate control of the Type I error requires that the normality assumption holds, which is often examined by means of a preliminary Shapiro-Wilk test. The following two-stage procedure is widely accepted: If the preliminary test for normality is not significant, the t test is used; if the preliminary test rejects the null hypothesis of normality, a nonparametric test is applied in the main analysis.
Equally sized samples were drawn from exponential, uniform, and normal distributions. The two-sample t test was conducted if either both samples (Strategy I) or the collapsed set of residuals from both samples (Strategy II) had passed the preliminary Shapiro-Wilk test for normality; otherwise, the Mann-Whitney U test was conducted. By simulation, we separately estimated the conditional Type I error probabilities for the parametric and the nonparametric part of the two-stage procedure. Finally, we assessed the overall Type I error rate and the power of the two-stage procedure as a whole.
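A minimal sketch of Strategy I and of the conditional error estimation, assuming SciPy; the sample size, simulation count, and nominal levels below are illustrative choices, not those of the study:

```python
# Two-stage procedure (Strategy I) under H0 and Monte Carlo estimates of
# the conditional Type I error rates. Sample size, simulation count, and
# nominal levels are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_sim, alpha_pre, alpha = 30, 20_000, 0.05, 0.05
t_rej = t_tot = u_rej = u_tot = 0

for _ in range(n_sim):
    # both samples from the same exponential population, so H0 is true
    x, y = rng.exponential(1.0, n), rng.exponential(1.0, n)
    if stats.shapiro(x).pvalue > alpha_pre and stats.shapiro(y).pvalue > alpha_pre:
        t_tot += 1                      # preliminary test passed: t test
        t_rej += stats.ttest_ind(x, y).pvalue < alpha
    else:
        u_tot += 1                      # otherwise: Mann-Whitney U test
        u_rej += stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha

print("conditional Type I error, t test:", t_rej / max(t_tot, 1))
print("conditional Type I error, U test:", u_rej / max(u_tot, 1))
print("overall Type I error:", (t_rej + u_rej) / n_sim)
```

Splitting the rejections by which branch of the procedure was taken is exactly what makes the conditional Type I error rates visible alongside the overall rate.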
Preliminary testing for normality seriously altered the conditional Type I error rates of the subsequent main analysis for both parametric and nonparametric tests. We discuss possible explanations for the observed results, the most important one being the selection mechanism due to the preliminary test. Interestingly, the overall Type I error rate and power of the entire two-stage procedure remained within acceptable limits.
The two-stage procedure might be considered incorrect from a formal perspective; nevertheless, in the investigated examples, this procedure seemed to satisfactorily maintain the nominal significance level and had acceptable power properties.
Many clinical trials focus on the comparison of the treatment effect between two or more groups concerning a rarely occurring event. In this situation, showing a relevant effect with acceptable power requires the observation of a large number of patients over a long period of time. For feasibility reasons, it is therefore often considered to include several event types of interest, non-fatal or fatal, and to combine them within a composite endpoint. Commonly, a composite endpoint is analyzed with standard survival analysis techniques by assessing the time to the first occurring event. This approach neglects that an individual may experience more than one event, which leads to a loss of information. As an alternative, composite endpoints can be analyzed with models for recurrent events. A number of such models exist, e.g., regression models based on count data or Cox-based models such as the approaches of Andersen and Gill; Prentice, Williams and Peterson; or Wei, Lin and Weissfeld. Although some of these methods have already been compared in the literature, there is no systematic investigation of the special requirements regarding composite endpoints.
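As a rough illustration of how these Cox-based models differ mainly in their risk sets and baseline hazards, the sketch below fits an Andersen-Gill model and a stratified Prentice-Williams-Peterson (total-time) model to simulated counting-process data. It assumes the Python package lifelines; the data-generating mechanism, the cap at three events per subject, and all parameter values are illustrative assumptions:

```python
# Sketch: Andersen-Gill (AG) vs. Prentice-Williams-Peterson (PWP) on
# simulated counting-process data, using the lifelines package (an
# assumption; any Cox software accepting (start, stop] rows would do).
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(0)
rows = []
for i in range(200):                    # 200 subjects, 1:1 allocation
    treat = i % 2
    rate = 0.6 if treat else 1.0        # true recurrent-event hazard ratio 0.6
    t, enum = 0.0, 1
    while enum <= 3:                    # follow at most 3 events per subject
        gap = rng.exponential(1.0 / rate)
        if t + gap >= 5.0:              # administrative censoring at t = 5
            rows.append((i, t, 5.0, 0, enum, treat))
            break
        rows.append((i, t, t + gap, 1, enum, treat))
        t, enum = t + gap, enum + 1

df = pd.DataFrame(rows, columns=["id", "start", "stop", "event", "enum", "treat"])

# AG: one common baseline hazard for all events, robust (sandwich) variance
ag = CoxTimeVaryingFitter().fit(df, id_col="id", start_col="start",
                                stop_col="stop", event_col="event",
                                formula="treat", robust=True)
# PWP (total time): separate baseline hazard per event number via strata
pwp = CoxTimeVaryingFitter().fit(df, id_col="id", start_col="start",
                                 stop_col="stop", event_col="event",
                                 formula="treat", strata="enum", robust=True)
print(ag.summary[["coef", "exp(coef)"]])
print(pwp.summary[["coef", "exp(coef)"]])
```

The Wei-Lin-Weissfeld approach would instead fit one marginal Cox model per event number on total time, with every subject considered at risk for every event.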
In this work, we provide a simulation-based comparison of recurrent event models applied to composite endpoints under different realistic clinical trial scenarios.
We demonstrate that the Andersen-Gill model and the Prentice-Williams-Peterson models show similar results under various data scenarios, whereas the Wei-Lin-Weissfeld model delivers effect estimates that can deviate considerably under commonly encountered data scenarios.
Based on the conducted simulation study, this paper helps to understand the pros and cons of the investigated methods in the context of composite endpoints and therefore provides recommendations for an adequate statistical analysis strategy and a meaningful interpretation of results.
Why do you need a biostatistician? Zapf, Antonia; Rauch, Geraldine; Kieser, Meinhard. BMC Medical Research Methodology, 02/2020, Volume 20, Issue 1.
The quality of medical research importantly depends, among other aspects, on valid statistical planning of the study, analysis of the data, and reporting of the results, which is usually guaranteed by a biostatistician. However, there are several related professions alongside the biostatistician, for example epidemiologists, medical informaticians, and bioinformaticians. For medical experts, it is often not clear what the differences between these professions are and how the specific role of a biostatistician can be described. For physicians involved in medical research, this is problematic because false expectations often lead to frustration on both sides. Therefore, the aim of this article is to outline the tasks and responsibilities of biostatisticians in clinical trials as well as in other fields of application in medical research.
Even though adaptive two-stage designs with unblinded interim analyses are becoming increasingly popular in clinical trial designs, there is a lack of statistical software to make their application more straightforward. The package adoptr fills this gap for the common case of two-stage one- or two-arm trials with (approximately) normally distributed outcomes. In contrast to previous approaches, adoptr optimizes the entire design upfront, which allows maximal efficiency. To facilitate experimentation with different objective functions, adoptr supports a flexible way of specifying both (composite) objective scores and (conditional) constraints by the user. Special emphasis was put on providing measures to aid practitioners with the validation process of the package.
In clinical trials, the opportunity for an early stop during an interim analysis (either for efficacy or for futility) can save considerable time and financial resources. This is especially important if the planning assumptions required for the power calculation are based on a low level of evidence. For example, when including two primary endpoints in the confirmatory analysis, the power of the trial depends on the effects of both endpoints and on their correlation. Assessing the feasibility of such a trial is therefore difficult, as the number of parameter assumptions to be correctly specified is large. For this reason, so-called 'group sequential designs' are of particular importance in this setting. Whereas the choice of adequate boundaries to stop a trial early for efficacy has been broadly discussed in the literature, the choice of optimal futility boundaries has not been investigated so far, although it may have serious consequences for the performance characteristics of the design.
In this work, we propose a general method to construct 'optimal' futility boundaries according to predefined criteria. Further, we present three different group sequential designs for two endpoints that apply these futility boundaries. Our methods are illustrated by a real clinical trial example and by Monte Carlo simulations.
By construction, the proposed method of choosing futility boundaries maximizes the probability of correctly stopping in the case of small or opposite effects while limiting the power loss and the probability of 'wrongly' stopping the study. Our results clearly demonstrate the benefit of using such 'optimal' futility boundaries, especially compared to futility boundaries commonly applied in practice.
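As a back-of-the-envelope illustration of these trade-offs (not the paper's specific criteria), the sketch below computes the interim stopping probabilities induced by a single futility boundary in a one-sided two-stage z-test; the boundary, information fraction, and effect sizes are illustrative assumptions:

```python
# Interim operating characteristics of a futility boundary b_f in a
# one-sided two-stage z-test. With information fraction t, the interim
# statistic is Z_1 ~ N(theta * sqrt(t), 1), where theta is the drift at
# the final analysis. Boundary and effect sizes are illustrative.
from scipy.stats import norm

t, b_f = 0.5, 0.0                                  # interim halfway, b_f = 0
theta_planned = norm.ppf(0.975) + norm.ppf(0.8)    # drift giving 80% power

for label, theta in [("opposite effect", -theta_planned),
                     ("no effect      ", 0.0),
                     ("planned effect ", theta_planned)]:
    p_stop = norm.cdf(b_f - theta * t**0.5)        # P(Z_1 < b_f | theta)
    print(f"{label}: P(stop for futility) = {p_stop:.3f}")
```

With these numbers, a boundary of zero stops almost surely under an opposite effect, half the time under no effect, and sacrifices only a few percent of power under the planned effect, which is the kind of balance an 'optimal' boundary formalizes.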
As the properties of futility boundaries are often not considered in practice, and unfavorably chosen futility boundaries may imply poor properties of the study design, we recommend assessing the performance of these boundaries according to the criteria proposed here.
Sample size calculation is a central aspect of planning a clinical trial. The sample size is calculated based on parameter assumptions, such as the treatment effect and the endpoint's variance. A fundamental problem of this approach is that the true distribution parameters are not known before the trial. Hence, sample size calculation always contains a certain degree of uncertainty, leading to the risk of underpowering or oversizing a trial. One way to cope with this uncertainty is to use an adaptive design. Adaptive designs allow the sample size to be adjusted at an interim analysis. There is a large number of such recalculation rules to choose from. To guide the choice of a suitable adaptive design with sample size recalculation, previous literature suggests a conditional performance score for studies with a normally distributed endpoint. However, binary endpoints are also frequently applied in clinical trials, and the application of the conditional performance score to binary endpoints has not yet been investigated.
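For context (not the conditional performance score itself), the sketch below shows the standard approximate sample size formula for comparing two proportions and a naive interim recalculation that plugs in the observed rates; all rates and levels are illustrative assumptions:

```python
# Per-group sample size for a two-sided test of two proportions (normal
# approximation) and a naive interim recalculation that plugs in the
# observed rates. All rates and levels are illustrative assumptions.
from math import ceil
from scipy.stats import norm

def n_per_group(p0, p1, alpha=0.05, power=0.8):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(z**2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p1 - p0)**2)

print(n_per_group(0.30, 0.45))   # planning: assumed rates 30% vs. 45%
print(n_per_group(0.33, 0.42))   # interim: observed rates -> larger n
```

A recalculation rule of this kind is exactly what a conditional performance score is meant to evaluate: how well the adjusted sample size performs given the data observed at interim.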
We extend the theory of the conditional performance score to binary endpoints by suggesting a related one-dimensional score parametrization. We moreover perform a simulation study to evaluate the operational characteristics and to illustrate the application.
We find that the score definition can be extended without modification to the case of binary endpoints. We represent the score results by a single distribution parameter and therefore derive a single effect measure, which contains the difference in proportions Δ = p_1 − p_0 between the intervention and the control group, as well as the proportion p_0 in the control group.
This research extends the theory of the conditional performance score to binary endpoints and demonstrates its application in practice.
Network meta-analysis is an extension of the classical pairwise meta-analysis and allows multiple interventions to be compared based on both head-to-head comparisons within trials and indirect comparisons across trials. Bayesian or frequentist models are applied to obtain effect estimates with credible or confidence intervals. Furthermore, p-values or similar measures may be helpful for the comparison of the included arms, but related methods have not yet been addressed in the literature. In this article, we discuss how hypothesis testing can be done in a Bayesian network meta-analysis.
An index is presented and discussed in a Bayesian modeling framework. Simulation studies were performed to evaluate the characteristics of this index. The approach is illustrated by a real data example.
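The paper's index is not reproduced here, but a generic sketch shows the kind of quantity such an index can be: a posterior probability computed from MCMC draws of a relative effect, usable for superiority or non-inferiority decisions. The simulated draws and the margin below are stand-in assumptions:

```python
# A generic Bayesian decision index: the posterior probability, computed
# from MCMC draws of a relative effect d (e.g. a log odds ratio), that a
# treatment beats control. The draws are simulated stand-ins for real
# MCMC output; the paper's specific index is not reproduced here.
import numpy as np

rng = np.random.default_rng(0)
d = rng.normal(-0.3, 0.15, 10_000)   # stand-in posterior sample of d

p_sup = np.mean(d < 0)               # superiority: P(d < 0 | data)
p_ni = np.mean(d < 0.1)              # non-inferiority with margin 0.1

# test-like decision at one-sided level 2.5%: act if the index > 0.975
print(p_sup, p_sup > 0.975)
print(p_ni, p_ni > 0.975)
```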
The simulation studies revealed that the Type I error rate is controlled. The approach can be applied in a superiority as well as in a non-inferiority setting.
Test decisions can be based on the proposed index. The index may be a valuable complement to the commonly reported results of network meta-analyses. The method is easy to apply and comes at no (noticeable) additional computational cost.
Adaptive enrichment designs are an attractive option for clinical trials that aim at demonstrating efficacy of therapies, which may show different benefit for the full patient population and a prespecified subgroup. In these designs, based on interim data, either the subgroup or the full population is selected for further exploration. When selection is based on efficacy data, this introduces bias into the commonly used maximum likelihood estimator. For the situation of two-stage designs with a single prespecified subgroup, we present six alternative estimators and investigate their performance in a simulation study. The most consistent reduction of bias over the range of scenarios considered was achieved by a method combining the uniformly minimum variance conditionally unbiased estimator with a conditional moment estimator. Application of the methods is illustrated by a clinical trial example.
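A small simulation sketch of the selection effect described above: when the population with the larger interim estimate is selected, the naive maximum likelihood estimator that pools both stages is biased upward. The interim estimates for the full population and the subgroup are treated as independent for simplicity, and all numbers are illustrative assumptions:

```python
# Selection bias of the naive maximum likelihood estimator in a two-stage
# enrichment design: the population (full F or subgroup S) with the larger
# stage-1 estimate is selected, and both stages are pooled. Interim
# estimates are treated as independent for simplicity; all numbers are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n1 = n2 = 100                     # per-stage sample sizes
theta_f = theta_s = 0.2           # true standardized effects (equal here)
bias = []

for _ in range(50_000):
    zf1 = rng.normal(theta_f, 1 / n1**0.5)   # stage-1 estimate, full pop.
    zs1 = rng.normal(theta_s, 1 / n1**0.5)   # stage-1 estimate, subgroup
    theta, est1 = (theta_f, zf1) if zf1 >= zs1 else (theta_s, zs1)
    est2 = rng.normal(theta, 1 / n2**0.5)    # stage-2 estimate, selected pop.
    mle = (n1 * est1 + n2 * est2) / (n1 + n2)
    bias.append(mle - theta)

print("mean bias of the naive MLE:", np.mean(bias))   # clearly positive
```

The alternative estimators discussed in the abstract are designed to remove or reduce exactly this kind of selection-induced bias.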