A draft addendum to ICH E9 was released for public consultation in August 2017. The addendum focuses on two topics particularly relevant for randomized confirmatory clinical trials: estimands and sensitivity analyses. The need to amend ICH E9 grew out of the realization that the objectives of a clinical trial stated in the protocol and the accompanying quantification of the “treatment effect” reported in a regulatory submission are often not aligned. We embed time-to-event endpoints in the estimand framework and discuss how the four estimand attributes described in the addendum apply to time-to-event endpoints. We point out that if the proportional hazards assumption is not met, the estimand targeted by the most prevalent methods used to analyze time-to-event endpoints, the logrank test and Cox regression, depends on the censoring distribution. We discuss, for a large randomized clinical trial, how the analyses for the primary and secondary endpoints as well as the sensitivity analyses actually performed in the trial can be seen in the context of the addendum. To the best of our knowledge, this is the first attempt to do so for a trial with a time-to-event endpoint. Questions that remain open with the addendum for time-to-event endpoints and beyond are formulated, and recommendations for the planning of future trials are given. We hope that this will contribute to developing a common framework, based on the final version of the addendum, that can be applied to designs, protocols, statistical analysis plans, and clinical study reports in the future.
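The censoring dependence noted above can be made concrete numerically. The limiting value of the Cox estimate under non-proportional hazards is approximately an event-weighted average of the time-varying log hazard ratio, so extending follow-up changes the estimand. A minimal sketch with entirely hypothetical rates, assuming a delayed-effect alternative and purely administrative censoring:

```python
import math

# Hypothetical delayed-effect alternative: hazard ratio 1 before t = 2
# and 0.4 afterwards; the control hazard is constant at 0.1.
def h0(t): return 0.1
def hr(t): return 1.0 if t < 2.0 else 0.4
def h1(t): return h0(t) * hr(t)

def surv(h, t, step=0.001):
    # survival function via numerical integration of the cumulative hazard
    u, H = 0.0, 0.0
    while u < t:
        H += h(u) * step
        u += step
    return math.exp(-H)

def cox_limit_hr(tau, step=0.01):
    # Rough approximation of the limiting Cox hazard ratio under
    # administrative censoring at tau: an event-weighted average of
    # log hr(t) over [0, tau], equal allocation assumed.
    num = den = 0.0
    t = step / 2
    while t < tau:
        events = (surv(h0, t) * h0(t) + surv(h1, t) * h1(t)) * step
        num += events * math.log(hr(t))
        den += events
        t += step
    return math.exp(num / den)

print(cox_limit_hr(2.0))  # follow-up ends before the effect starts -> 1.0
print(cox_limit_hr(6.0))  # longer follow-up -> averaged HR well below 1
```

The two calls target different quantities even though the data-generating mechanism is identical; only the censoring distribution differs.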
In this paper, we derive the joint distribution of progression-free and overall survival as a function of transition probabilities in a multistate model. No assumptions on copulae or latent event times are needed, and the model is allowed to be non-Markov. From the joint distribution, statistics of interest can then be computed. As an example, we provide closed formulas and statistical inference for Pearson's correlation coefficient between progression-free and overall survival in a parametric framework. The example is inspired by recent approaches to quantifying the dependence between progression-free survival, a common primary outcome in Phase 3 trials in oncology, and overall survival. We complement these approaches by providing methods of statistical inference while at the same time working within a much more parsimonious modeling framework. Our approach is completely general and can be applied to other measures of dependence. We also discuss extensions to nonparametric inference. Our analytical results are illustrated using a large randomized clinical trial in breast cancer.
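As a toy illustration of the idea (not the paper's closed formulas), one can simulate a simple Markov illness-death model with hypothetical constant transition hazards and read off Pearson's correlation between PFS and OS from the simulated joint distribution:

```python
import math
import random

random.seed(1)

# Hypothetical constant transition hazards of an illness-death model:
L01 = 0.10  # stable -> progression
L02 = 0.02  # stable -> death without progression
L12 = 0.15  # progression -> death

def simulate_one():
    # time spent in the initial state
    t0 = random.expovariate(L01 + L02)
    if random.random() < L01 / (L01 + L02):   # progression occurs first
        pfs = t0
        os = t0 + random.expovariate(L12)     # additional time to death
    else:                                     # death without progression
        pfs = os = t0
    return pfs, os

n = 50_000
data = [simulate_one() for _ in range(n)]
mx = sum(p for p, _ in data) / n
my = sum(o for _, o in data) / n
sxy = sum((p - mx) * (o - my) for p, o in data) / n
sx = math.sqrt(sum((p - mx) ** 2 for p, _ in data) / n)
sy = math.sqrt(sum((o - my) ** 2 for _, o in data) / n)
rho = sxy / (sx * sy)
print(round(rho, 3))  # positive by construction: OS is never shorter than PFS
```

In the parametric framework of the paper this correlation has a closed form; the simulation only serves to show how the joint distribution arises from the transition hazards.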
The assessment of safety is an important aspect of the evaluation of new therapies in clinical trials, with analyses of adverse events being an essential part of this. Standard methods for the analysis of adverse events, such as the incidence proportion, that is, the number of patients with a specific adverse event out of all patients in the treatment group, do not account for varying follow-up times or competing risks. Alternative approaches such as the Aalen–Johansen estimator of the cumulative incidence function have been suggested. Theoretical arguments and numerical evaluations support the application of this more advanced methodology, but, to our knowledge, there is as yet insufficient empirical evidence as to whether these methods would lead to different conclusions in safety evaluations. The Survival analysis for AdVerse events with VarYing follow-up times (SAVVY) project strives to close this gap in evidence by conducting a meta-analytical study to assess empirically the impact of the methodology on the conclusions of the safety assessment. Here we present the rationale and statistical concept of the empirical study conducted as part of the SAVVY project. The statistical methods are presented in unified notation, and examples of their implementation in R and SAS are provided.
The development of oncology drugs progresses through multiple phases, where after each phase a decision is made about whether to move a molecule forward. Early phase efficacy decisions are often made on the basis of single-arm studies, using a set of rules to define whether the tumor improves (“responds”), remains stable, or progresses (Response Evaluation Criteria in Solid Tumors, RECIST). These decision rules implicitly assume some form of surrogacy between tumor response and long-term endpoints such as progression-free survival (PFS) or overall survival (OS). With the emergence of new therapies, for which the link between RECIST tumor response and long-term endpoints is either not yet established or weaker than with classical chemotherapies, tumor response-based rules may not be optimal. In this paper, we explore the use of a multistate model for decision-making based on single-arm early phase trials. The multistate model makes it possible to account for more information than the simple RECIST response status, namely, the time to response, the duration of response, the PFS time, and the time to death. We propose to base the efficacy decision on the OS hazard ratio (HR) comparing a historical control to data from the experimental treatment, with the latter predicted from a multistate model based on early phase data with limited survival follow-up. Using two case studies, we illustrate the feasibility of estimating such an OS HR. We argue that, in the presence of limited follow-up and small sample sizes, and making realistic assumptions within the multistate model, the OS prediction is acceptable and may lead to better early decisions within the development of a drug.
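The mechanics of such a prediction can be sketched as follows (all hazards hypothetical, not the case-study data): for a Markov illness-death model with constant transition hazards, the model-implied OS survival function has a closed form, from which a predicted OS hazard and a time-varying HR against an exponential historical control can be read off:

```python
import math

# Hypothetical constant transition hazards, e.g. estimated from early phase data:
L01, L02, L12 = 0.30, 0.05, 0.20  # progression, death w/o progression, death after progression
LC = 0.25                         # hypothetical historical-control OS hazard (exponential)

def os_surv(t):
    # P(alive at t) = P(still stable) + P(progressed and alive);
    # closed form for the Markov illness-death model with constant hazards
    a = L01 + L02
    p00 = math.exp(-a * t)
    p01 = L01 / (a - L12) * (math.exp(-L12 * t) - math.exp(-a * t))
    return p00 + p01

def os_hazard(t, eps=1e-5):
    # numerical derivative of the cumulative OS hazard
    return -(math.log(os_surv(t + eps)) - math.log(os_surv(t - eps))) / (2 * eps)

for t in (0.5, 1.0, 2.0):
    print(t, round(os_hazard(t) / LC, 3))  # model-predicted, time-varying OS HR vs control
```

Note that the predicted OS hazard, and hence the HR, changes over time even with constant transition hazards; summarizing it as a single HR requires further assumptions, which is part of what the paper investigates.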
The SAVVY project aims to improve the analyses of adverse events (AEs) in clinical trials through the use of survival techniques appropriately dealing with varying follow-up times and competing ...events (CEs). This paper summarizes key features and conclusions from the various SAVVY papers.
Summarizing several papers, which report theoretical investigations using simulations as well as an empirical study including randomized clinical trials from several sponsor organizations, we investigate biases from ignoring varying follow-up times or CEs. The bias of commonly used estimators of the absolute (incidence proportion and one minus Kaplan-Meier) and relative (risk and hazard ratio) AE risk is quantified. Furthermore, we provide a cursory assessment of how pertinent guidelines for the analysis of safety data deal with the features of varying follow-up times and CEs.
SAVVY finds that, both for avoiding bias and for categorizing the evidence on the treatment effect on AE risk, the choice of estimator is key and more important than features of the underlying data such as the percentage of censoring, the proportion of CEs, the amount of follow-up, or the value of the gold standard.
The choice of the estimator of the cumulative AE probability and the definition of CEs are crucial. Whenever varying follow-up times and/or CEs are present in the assessment of AEs, SAVVY recommends using the Aalen-Johansen estimator (AJE) with an appropriate definition of CEs to quantify AE risk. There is an urgent need to improve pertinent clinical trial guidelines for reporting AEs so that incidence proportions and one minus Kaplan-Meier estimators are finally replaced by the AJE with an appropriate definition of CEs.
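The SAVVY papers provide R and SAS implementations; as a language-neutral sketch, the direction of the bias is easy to see on a small, entirely made-up dataset. One minus Kaplan-Meier, which censors the competing death times, overstates the cumulative AE probability relative to the Aalen-Johansen estimator:

```python
# Entirely hypothetical data: observation time and status per patient
# ('ae' = adverse event, 'death' = competing event, 'cens' = censored)
times  = [1, 2, 3, 4, 5, 6]
status = ['ae', 'death', 'ae', 'cens', 'death', 'ae']

def aalen_johansen_ae(times, status):
    # cumulative incidence of the AE, treating death as a competing event
    S, F = 1.0, 0.0
    for t in sorted(set(times)):
        n = sum(1 for u in times if u >= t)  # at risk just before t
        d_ae = sum(1 for u, s in zip(times, status) if u == t and s == 'ae')
        d_ev = sum(1 for u, s in zip(times, status) if u == t and s in ('ae', 'death'))
        F += S * d_ae / n        # probability of an AE at t: still event-free, then AE
        S *= 1 - d_ev / n        # event-free survival drops for AE *and* death
    return F

def one_minus_km_ae(times, status):
    # naive estimator: treats deaths as censored observations
    S = 1.0
    for t in sorted(set(times)):
        n = sum(1 for u in times if u >= t)
        d_ae = sum(1 for u, s in zip(times, status) if u == t and s == 'ae')
        S *= 1 - d_ae / n
    return 1 - S

print(aalen_johansen_ae(times, status))  # 7/12, about 0.583
print(one_minus_km_ae(times, status))    # 1.0 -- overstates the AE risk
```

Patients who die without an AE can never have one, which is exactly the information the naive estimator discards.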
A randomized trial allows estimation of the causal effect of an intervention compared to a control in the overall population and in subpopulations defined by baseline characteristics. Often, however, clinical questions also arise regarding the treatment effect in subpopulations of patients who would experience clinical or disease-related events post-randomization. Events that occur after treatment initiation and potentially affect the interpretation or the existence of the measurements are called intercurrent events in the ICH E9(R1) guideline. If the intercurrent event is a consequence of treatment, randomization alone is no longer sufficient to meaningfully estimate the treatment effect. Analyses comparing the subgroups of patients without the intercurrent event under intervention and control will not estimate a causal effect. This is well known, but post-hoc analyses of this kind are commonly performed in drug development. An alternative approach is the principal stratum strategy, which classifies subjects according to their potential occurrence of an intercurrent event on both study arms. We illustrate with examples that questions formulated through principal strata occur naturally in drug development and argue that approaching these questions with the ICH E9(R1) estimand framework has the potential to lead to more transparent assumptions as well as more adequate analyses and conclusions. In addition, we provide an overview of the assumptions required for estimation of effects in principal strata. Most of these assumptions are unverifiable and should hence be based on solid scientific understanding. Sensitivity analyses are needed to assess the robustness of conclusions.
Adapting the final sample size of a trial to the evidence accruing during the trial is a natural way to address planning uncertainty. Since the sample size is usually determined by an argument based on the power of the trial, an interim analysis raises the question of how the final sample size should be determined conditional on the accrued information. To this end, we first review and compare common approaches to estimating conditional power, which is often used in heuristic sample size recalculation rules. We then discuss the connection between heuristic sample size recalculation and optimal two-stage designs, demonstrating that the latter is the superior approach in a fully preplanned setting. Hence, unplanned design adaptations should only be conducted as a reaction to trial-external new evidence, operational needs to violate the originally chosen design, or post hoc changes in the optimality criterion, but not as a reaction to trial-internal data. We show that commonly discussed sample size recalculation rules lead to paradoxical adaptations, where an initially planned optimal design is not invariant under the adaptation rule even if the planning assumptions do not change. Finally, we propose two alternative ways of reacting to newly emerging trial-external evidence that are consistent with the originally planned design and avoid such inconsistencies.
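For a normally distributed test statistic, the textbook conditional-power formula (one of the approaches commonly reviewed in this context) can be sketched as follows; the interim z-value, information fraction, and drift parameters below are hypothetical numbers, and the critical value is a hard-coded approximation:

```python
import math

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def conditional_power(z1, t1, theta, alpha=0.025):
    # Brownian-motion formulation: B(t1) = z1*sqrt(t1); under drift theta,
    # B(1) ~ N(B(t1) + theta*(1 - t1), 1 - t1); reject if B(1) > z_{1-alpha}.
    z_alpha = 1.959964  # approx. Phi^{-1}(0.975) for alpha = 0.025
    return Phi((z1 * math.sqrt(t1) + theta * (1 - t1) - z_alpha) / math.sqrt(1 - t1))

t1, z1 = 0.5, 1.2                    # hypothetical interim: half the information, z = 1.2
theta_planned = 1.959964 + 1.281552  # drift giving 90% power at one-sided alpha = 0.025
theta_trend = z1 / math.sqrt(t1)     # "current trend": plug in the interim estimate

print(conditional_power(z1, t1, theta_planned))  # under the planning assumptions
print(conditional_power(z1, t1, theta_trend))    # more pessimistic for this interim z
```

The gap between the two numbers illustrates why heuristic recalculation rules are sensitive to the choice of drift, which is part of what the paper's comparison addresses.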
For a trial with primary endpoint overall survival for a molecule with curative potential, statistical methods that rely on the proportional hazards assumption may underestimate the power and the time to final analysis. We show how a cure proportion model can be used to obtain the necessary number of events and the appropriate timing via simulation. If phase 1 results for the new drug are exceptional and/or the medical need in the target population is high, a phase 3 trial might be initiated directly after phase 1. Building a futility interim analysis into such a pivotal trial may mitigate the uncertainty of moving directly to phase 3. However, if cure is possible, overall survival might not be mature enough at the interim to support a futility decision. We propose to base this decision on an intermediate endpoint that is sufficiently associated with survival. Planning for such an interim can be interpreted as making a randomized phase 2 trial part of the pivotal trial: if the trial is stopped at the interim, the data would be analyzed and a decision on a subsequent phase 3 trial would be made; if the trial continues, then the phase 3 trial is already underway. To select a futility boundary, we propose a mechanistic simulation model that connects the intermediate endpoint and survival. We illustrate how this approach was used to design a pivotal randomized trial in acute myeloid leukemia and discuss the historical data that informed the simulation model as well as operational challenges when implementing it.
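The event-projection step can be sketched with a simple mixture cure model; all parameters below (cure proportion, hazard, accrual pattern, sample size) are hypothetical illustrations, not values from the trial:

```python
import random

random.seed(7)

N = 1000         # patients
ACCRUAL = 2.0    # uniform accrual over 2 years
PI_CURE = 0.35   # hypothetical cure proportion (pooled across arms)
LAMBDA = 0.5     # exponential death hazard for non-cured patients

def simulate_trial():
    # per patient: calendar entry time and event time (None if cured)
    pts = []
    for _ in range(N):
        entry = random.uniform(0, ACCRUAL)
        if random.random() < PI_CURE:
            pts.append((entry, None))                       # cured: no event ever
        else:
            pts.append((entry, random.expovariate(LAMBDA))) # time from entry to death
    return pts

def events_by(calendar_time, pts):
    # number of deaths observed by a given calendar time
    return sum(1 for entry, ev in pts
               if ev is not None and entry + ev <= calendar_time)

pts = simulate_trial()
for tau in (1, 2, 4, 8):
    print(tau, events_by(tau, pts))
# events plateau near N * (1 - PI_CURE): waiting longer cannot recover the
# "missing" events, which is what drives the timing of the final analysis
```

Under proportional hazards without cure, the expected event count grows toward N; with a cure fraction it saturates, so an event-driven analysis planned without the cure model may be scheduled far too optimistically.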
For the analysis of a time-to-event endpoint in a single-arm or randomized clinical trial, it is generally perceived that the interpretation of a given estimate of the survival function, or of the comparison between two groups, hinges on some quantification of the amount of follow-up. Typically, a median of some loosely defined quantity is reported. However, whichever median is reported, it typically does not answer the questions trialists actually have about follow-up quantification. In this paper, inspired by the estimand framework, we formulate a comprehensive list of relevant scientific questions that trialists have when reporting time-to-event data. We illustrate how these questions should be answered and show that reference to an unclearly defined follow-up quantity is not needed at all. In drug development, key decisions are made based on randomized controlled trials, and we therefore also discuss relevant scientific questions not only when looking at a time-to-event endpoint in one group, but also for comparisons. We find that different thinking about some of the relevant scientific questions around follow-up is required depending on whether a proportional hazards assumption can be made or other patterns of survival functions are anticipated, for example, delayed separation, crossing survival functions, or the potential for cure. We conclude the paper with practical recommendations.
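A tiny illustration of why “median follow-up” is ambiguous: on the same made-up data, three definitions in common use give three different numbers (the data and the selection of definitions are hypothetical, chosen only to show the ambiguity):

```python
# Hypothetical observation times and event indicators (1 = event, 0 = censored)
times  = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 0, 1, 0, 1, 0, 0]

def median(xs):
    xs = sorted(xs)
    n = len(xs)
    return xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2

# (a) median observation time, ignoring event status
m_obs = median(times)

# (b) median observation time among censored patients only
m_cens = median([t for t, e in zip(times, events) if e == 0])

# (c) reverse Kaplan-Meier: the Kaplan-Meier median of the censoring
#     distribution, i.e. with the roles of events and censorings swapped
def km_median(times, events):
    pairs = sorted(zip(times, events))
    S = 1.0
    for t, e in pairs:
        n = sum(1 for u, _ in pairs if u >= t)  # at risk just before t
        if e:
            S *= 1 - 1 / n
        if S <= 0.5:
            return t
    return float('inf')

m_rkm = km_median(times, [1 - e for e in events])

print(m_obs, m_cens, m_rkm)  # 4.5, 6, 7: three answers to "median follow-up"
```

None of the three numbers, on its own, says which scientific question about follow-up it answers, which is the gap the question-based approach above is meant to close.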