From: Applicability and added value of novel methods to improve drug development in rare diseases

Description of the method | Requirements for use of the method | Potential advantages | Potential disadvantages |
---|---|---|---|

compared to developmental plans that supported approval | |||

LEVEL OF EVIDENCE | |||

Extrapolation [50] | |||

In small populations, a full independent drug development program to demonstrate efficacy may not be ethical, feasible or necessary. Extrapolation of evidence from a larger population to the smaller target population is widely used to support decisions in this situation. For the justification of requirements specified in EMA Paediatric Investigation Plans, this paper discusses how to specify the clinical trial design in the target population when the data from the source population are not yet available at the time of planning, but development in the target population will only start after a treatment effect in the source population has been demonstrated. A framework based on prior beliefs is formulated to investigate whether the significance level for the test of the primary endpoint in confirmatory trials can be relaxed, and the sample size reduced, while controlling a certain level of certainty about the effects. The procedure is based on a so-called skepticism factor that quantifies the belief that a treatment effect observed in the larger population can be extrapolated to the target population. |
Factors that influence the possibility for extrapolation: ▪ Same underlying mechanism of action, similarity of response to treatment, similar dose-response relationship so the mechanism is translatable to the target population ▪ Same disease symptoms in adults and children, regarding similarity of disease progression ▪ Timing of the paediatric trial compared to the adult trial should allow extrapolation ▪ Repurposed drug or not or extension of indication |
▪ Optimised use of available evidence for the entire development programme ▪ Reduction of sample size |
▪ Difficulty lies in its novelty and application ▪ Parameters for prior need to be set realistically |

META-ANALYSIS | |||

Prior distributions for variance parameters in sparse-event meta-analysis (Pateras K, personal communication) | |||

The small sample sizes in rare diseases make it particularly valuable to pool the data of small studies in a meta-analysis. When the primary outcome is binary, small sample sizes increase the chance of observing zero events. The frequentist random-effects model is known to induce bias and to result in improper interval estimation of the overall treatment effect in a meta-analysis with zero events. Bayesian hierarchical modelling could be a promising alternative. However, Bayesian models are known to be sensitive to the choice of prior distribution for the between-study variance (heterogeneity) in sparse settings. In a rare disease setting, only limited data will be available to base the prior on; it is therefore crucial to identify priors with robust properties. This paper shows that the Uniform(−10, 10) heterogeneity prior on the log(τ²) scale shows appropriate 95% coverage and induces relatively acceptable under- or overestimation of both the overall treatment effect and heterogeneity, across a wide range of heterogeneity levels. The results are illustrated with two examples of meta-analyses with a few small trials. |
▪ ≥ 2 RCTs ▪ Same endpoint in at least two trials, of which one is a primary endpoint ▪ Binary endpoint(s) ▪ Sparse events ▪ Not a prerequisite, but patient allocation ratio ideally 1:1 ▪ Treatment effect size estimates reported in a harmonized (or harmonisable) manner ▪ Not a prerequisite, but ideally an equal number of patients allocated per study |
▪ Optimised use and variance estimation in a sparse-event meta-analysis ▪ Quicker, optimal selection and use of appropriate heterogeneity prior distributions |
▪ Use of informative priors (even for heterogeneity) may be controversial ▪ Optimal choice is simulation based, and it is unknown whether it is best in a specific situation |

Heterogeneity estimators in zero cells meta-analysis [51] | |||

When a meta-analysis consists of a few small trials that report zero events, accounting for heterogeneity in the estimation of the overall effect is challenging. In practice, features of the data, such as the presence of zero events in at least one study arm, restrict the meta-analysis methods that can be employed and lead to deviations from the pre-planned analysis. Estimators that performed reasonably robustly when estimating the overall treatment effect across a range of heterogeneity assumptions were the Sidik-Jonkman, Hartung-Makambi and improved Paule-Mandel estimators. The relative performance of estimators did not materially differ between making a predefined or data-driven choice. The simulations confirmed that heterogeneity cannot be estimated reliably in a few small trials that report zero events. Estimators whose performance depends strongly on the presence of heterogeneity should be avoided. The choice of estimator does not need to depend on whether or not zero cells are observed. | ▪ Quicker and optimal selection of heterogeneity estimator in a sparse-event meta-analysis | ▪ Niche method and does not cover all heterogeneity estimators | |

INNOVATIVE TRIAL DESIGNS | |||

Critical appraisal of delayed-start design proposed as alternative to randomized controlled design in the field of rare diseases [52] | |||

In a delayed-start randomization design, patients are randomised at baseline to receive either the intervention (early-start group) or placebo (delayed-start group). After a certain period of time, the latter switch to the intervention until trial completion, thereby reducing the time on placebo. Data collected at the end of the placebo phase allow for causal inferences, whereas the data collected at trial completion allow for investigation of disease-modifying effects. |
▪ The comparator needs to be placebo (severity and predictability of condition to allow for placebo arm use) ▪ Intervention needs to have lasting response/remission ▪ Ideal for slowly and constantly progressing diseases |
▪ All patients eventually receive treatment ▪ Robust evidence from the randomised and controlled first phase of the design ▪ The switch time-point can provide extra information |
▪ Delay in some patients receiving treatment, compared to a single arm trial (but not different from parallel control arm) ▪ Not (always) more efficient than parallel arm trial |

Sample size reassessment and hypothesis testing in adaptive survival trials [53] | |||

This design allows a sample size reassessment during a trial where the primary outcome is the time to the occurrence of an event. The sample size reassessment is performed in an interim analysis and may be based on unblinded interim data, including secondary endpoints. The paper discusses the major drawbacks of a fully unmasked sample size recalculation (i.e. a decision based on all available efficacy and safety data), namely potential intentional changes in the behaviour of the investigators and the potential impossibility of including all patients in the final analysis, and proposes a test statistic that allows the inclusion of all patients. |
▪ In case the sample size re-assessment is unmasked ▪ Time to outcome faster than accrual rate ▪ At least one interim analysis |
▪ Increased precision for sample size reassessment ▪ Preservation of type I error ▪ Inclusion of all (more patients) in the final test statistic (increases regulatory acceptability) |
▪ Logistically and resource-wise more demanding ▪ Risks of bias associated with unblinded re-assessment |

Multi-arm group sequential designs with a simultaneous stopping rule [54] | |||

A design with 3 arms or more, with planned interim analyses with a simultaneous stopping rule using predefined boundaries. This rule aims to detect at least one efficacious treatment out of all tested arms. The trial may stop for one or more arms because of futility, or for all arms when efficacy is proven for at least one of them. |
▪ At least 3 arms including control (placebo) ▪ At least 1 interim analysis ▪ Time to outcome faster than accrual/enrolment rate ▪ Developed for normally distributed endpoints; transferable to other types (e.g. binary), relying on asymptotic normality of the corresponding test statistics |
▪ More patients are randomized to a treatment arm due to the common control arm ▪ More efficient use of available patients (i.e. lower average sample number than for the separate stopping rule, up to 21% depending on design) ▪ Possibility of head-to-head comparisons between different treatments |
▪ Not applicable to historically/externally controlled studies ▪ More complex trial conduct ▪ In case of no early stopping, interim analyses could result in an overall longer trial compared to single-stage designs ▪ The potential to marginally miss a second efficacious intervention |

Sequential design for small samples starting from a maximum sample size [55] | |||

Using a group sequential design, an analysis will be performed before the trial is finished, based on the available data collected at that (pre-defined) moment. The aim of this design is to pick up signals of large benefit or lack of benefit earlier. The proposed method uses the maximum available sample size as a starting point for planning the study, taking into account the desired chance to pick up a therapeutic effect if it really exists, and then continues with the refined calculations of the limit boundaries. This method determines the optimal number of interim analyses to be performed, while keeping low the chance of concluding that a treatment works when in reality it does not. |
▪ Needs to start from maximum sample size that can be recruited ▪ At least 1 interim analysis ▪ Time to outcome faster than accrual rate |
▪ Increased precision when using prior knowledge (from historical data or previous trials) to estimate treatment effect size, and thereby increased precision for the adjustment of boundaries ▪ Optimised use of maximum available patient pool in the development programme (especially for ultra-rare settings) |
▪ More interim analyses will provide extra work ▪ Sufficient level of evidence, but not overwhelming |

Bayesian sample size re-estimation using power priors [56] | |||

Bayesian statistics use probability distributions, often including a distribution that expresses the belief in the intervention before the start of the trial (the prior). For normally distributed outcomes, an assumption for the variance needs to be made to inform the sample size needed, which is usually based on limited prior information, especially in small populations. When using a Bayesian approach, the aggregation of prior information on the variance with newly collected data is more formalized. The uncertainty surrounding prior estimates can be modelled with prior distributions. The authors adapt the previously suggested methodology to facilitate sample size re-estimation. In addition, they suggest the use of power priors so that the operating characteristics can be controlled. |
▪ At least 1 interim analysis ▪ Randomisation ▪ 1 control and 1 experimental arm ▪ Developed for continuous endpoints, transportable to other types of outcomes |
▪ More efficient use of available patients for the development programme (i.e. smaller sample size) ▪ Increased precision when optimally using prior knowledge (from historical data or previous trials) to estimate treatment effect size ▪ Control of type I error | ▪ Extra patients needed in case of effect size overestimation |
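
The dependence of the required sample size on the variance assumption can be illustrated with a minimal sketch. It uses the standard fixed-design formula for a two-sided z-test comparing two normal means, not the Bayesian re-estimation procedure of [56]; the function name `n_per_arm` and the numeric inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(sigma, delta, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided z-test of two normal means.

    sigma: assumed common standard deviation (the quantity that is
           usually uncertain at the planning stage)
    delta: difference in means that the trial should detect
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value of the two-sided test
    z_beta = z(power)            # quantile matching the target power
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)


# Illustrative values only: a modest misjudgement of sigma changes the
# required number of patients considerably.
for sigma in (1.0, 1.2, 1.5):
    print(f"sigma={sigma}: {n_per_arm(sigma, delta=1.0)} patients per arm")
```

Because the sample size scales with the square of the assumed standard deviation, an interim update of the variance estimate can change the target enrolment substantially, which is the motivation for re-estimation.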

Dynamic borrowing using power priors that control type I error [57] | |||

In rare diseases, where available data is scarce and heterogeneity between trials is less well understood, the current methods of meta-analysis fall short. The concept of power priors can be useful, particularly for borrowing evidence from a single historical study. Such power priors are expressed as a parameter, which in most situations has a direct translation as a fraction of the sample size of the historical study that is included in the analysis of the new study. However, the possibility of borrowing data from a historical trial will usually be associated with an inflation of the type I error. Therefore in this paper a new, simple method of estimating the power parameter in the power prior formulation is suggested, suitable when only one historical dataset is available. This method is based on predictive distributions and parameterized in such a way that the type I error can be controlled, by calibrating the degree of similarity between the new and historical data. |
▪ Essential to have robust data from ideally previous similar studies ▪ Developed for normal responses in a one or two group setting, but the generalization to other models is straightforward |
▪ More efficient use of available patients for the development programme (i.e. smaller sample size) ▪ Increased precision when optimally using prior knowledge (from historical data or previous trials) to estimate treatment effect size ▪ Control of operational characteristics while modelling the heterogeneity between the historical and emerging data | ▪ Extra patients needed in case of effect size overestimation |
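
A minimal sketch of how a power prior discounts historical data, assuming a normal outcome with known standard deviation, a flat initial prior, and a fixed power parameter `a0`. This is only the basic power prior mechanism; the calibration of the power parameter proposed in [57] is not reproduced here, and all names and numbers are hypothetical.

```python
def power_prior_posterior(mean_hist, n_hist, mean_new, n_new, sigma, a0):
    """Posterior for a normal mean when historical data are raised to the power a0.

    With known sigma and a flat initial prior, the historical study
    contributes as if it contained a0 * n_hist patients.
    Returns (posterior mean, posterior sd, effective borrowed sample size).
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie between 0 and 1")
    n_borrowed = a0 * n_hist            # fraction of the historical sample size
    n_total = n_borrowed + n_new
    post_mean = (n_borrowed * mean_hist + n_new * mean_new) / n_total
    post_sd = sigma / n_total ** 0.5
    return post_mean, post_sd, n_borrowed


# Illustrative values: a0 = 0 ignores the historical study, a0 = 1 pools it fully.
for a0 in (0.0, 0.5, 1.0):
    print(a0, power_prior_posterior(1.0, 40, 2.0, 20, sigma=1.0, a0=a0))
```

The sketch keeps `a0` fixed to show the interpretation as a fraction of the historical sample size; a dynamic borrowing method would instead choose `a0` from the agreement between the historical and new data.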

STUDY ENDPOINTS AND STATISTICAL ANALYSIS | |||

Fallback tests for co-primary endpoints [58] | |||

Usually, when the efficacy of an intervention is measured by co-primary endpoints, efficacy may be claimed only if for each endpoint an individual statistical test is significant. While this strategy controls the type I error, it is often very conservative, and does not allow for inference if only one of the co-primary endpoints shows significance. This paper describes the use of fall-back tests. They reject the null hypothesis in exactly the same way as the classical tests, with the advantage that they allow for inference in settings where only some of the co-primary endpoints show a significant effect. Similarly to the fall-back tests defined for hierarchical testing procedures, these fall-back tests for co-primary endpoints allow testing to continue even if the primary objective of the trial was not met. |
▪ At least 2 co-primary endpoints ▪ One test per endpoint |
▪ No need for hierarchical pre-specification and testing of multiple co-primary endpoints ▪ Improved statistical testing (more chances to detect one dimension of treatment effect and benefit even if the primary objective has not been met) ▪ Control of family-wise error rate (FWER) | ▪ Potentially more patients needed |

Optimal exact tests for multiple binary endpoints [59] | |||

In confirmatory trials with small sample sizes, hypothesis tests developed for large samples - based on asymptotic distributions - are often not valid. Exact non-parametric procedures are applied instead. However, exact non-parametric procedures are based on discrete test statistics and can become very conservative. With standard adjustments for multiple testing, they become even more conservative. Exact multiple testing procedures are proposed for the setting where multiple binary endpoints are compared in two parallel groups. Based on the joint conditional distribution of the test statistics of Fisher’s exact test, the optimal rejection regions for intersection hypothesis tests are constructed. To efficiently search the large space of possible rejection regions, an optimization algorithm is proposed based on constrained optimization and integer linear programming. Depending on the objective of the optimization, the optimal test yields maximal power under a specific alternative, maximal exhaustion of the nominal type I error rate, or the largest possible rejection region controlling the type I error rate. Applying the closed testing principle, the authors construct optimized multiple testing procedures with strong familywise error rate control. In addition, they propose a greedy algorithm for nearly optimal tests, which is computationally more efficient. |
▪ Multiple dichotomous/binary outcomes ▪ Two or more endpoints ▪ Gain is strongest in very small sample sizes (1 to 50 per group) ▪ A priori definition of the optimization criterion. ▪ Prior assumption on effect sizes when optimizing power |
▪ Optimised multiple testing procedure for dichotomous endpoints ▪ Maximal power use of the statistical test ▪ Control of family-wise error rate (FWER) ▪ Robust evidence ▪ Useful for (very) small sample sizes | ▪ Potentially more patients needed |
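
The conservativeness of exact tests in small samples can be made concrete with a short sketch. It computes the actual type I error rate of a standard one-sided Fisher's exact test for a single binary endpoint; it does not implement the optimised rejection regions of [59]. The function names, group size, and success rate are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def fisher_one_sided_p(x1, x2, n):
    """One-sided Fisher p-value (group 1 better) for two groups of size n."""
    total = x1 + x2
    upper = min(n, total)
    # Hypergeometric tail: probability of x1 or more successes in group 1,
    # conditional on the total number of successes.
    tail = sum(comb(n, k) * comb(n, total - k) for k in range(x1, upper + 1))
    return tail / comb(2 * n, total)


def actual_size(n, p, alpha):
    """Unconditional rejection probability when both groups share success rate p."""
    size = 0.0
    for x1 in range(n + 1):
        for x2 in range(n + 1):
            if fisher_one_sided_p(x1, x2, n) <= alpha:
                size += (comb(n, x1) * p**x1 * (1 - p) ** (n - x1)
                         * comb(n, x2) * p**x2 * (1 - p) ** (n - x2))
    return size


alpha = 0.025
print(f"nominal level {alpha}, actual size {actual_size(10, 0.5, alpha):.4f}")
```

Because the test statistic is discrete, the actual size stays below the nominal level; the gap is the unused type I error that optimised rejection regions aim to exhaust.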

Simultaneous inference for multiple marginal GEE models [60] | |||

A framework is proposed for using generalized estimating equation models for each endpoint marginally considering dependencies within the same subject. The asymptotic joint normality of the stacked vector of marginal estimating equations is used to derive Wald-type simultaneous confidence intervals and hypothesis tests for linear contrasts of regression coefficients of the multiple marginal models. The small sample performance of this approach is improved by adapting the bias correction proposed by Mancl and DeRouen to the estimate of the joint covariance matrix of the regression coefficients from multiple models. As a further improvement a multivariate t-distribution with appropriate degrees of freedom is specified as reference distribution. Alternatively, a generalized score test based on the stacked estimating equations is derived. By means of simulation studies, control of type I error rate for these methods is shown even with small sample sizes and also increased power compared to a Bonferroni multiplicity adjustment. The proposed methods are suitable to efficiently use the information from dependent observations of multiple endpoints in small-sample studies. If simultaneous confidence intervals for two or more endpoints are of interest, this approach can be used. Additionally, an R software package has been developed (‘mmmgee’) for computational implementation of this framework. | ▪ Repeated measurements |
▪ Robust evidence from longitudinal data ▪ Estimation of endpoints separately while taking into account dependencies within the same patient |
▪ Technically more complex ▪ Not all trials make use of repeated measurements |

Goal Attainment Scaling [61] | |||

Goal Attainment Scaling is a measurement instrument that measures the attainment of different goals of patients in a standardized way. The goals are measured in the same way for every patient, but the content of the goals can be different between patients. To apply goal attainment scaling, the caregiver and the patient sit together to decide what the goals of the patient are, and how they can be defined in five levels. Next, the patient receives the intervention (preferably blinded). Then after the intervention the patient and doctor assess how well the goals have been attained. Due to the different content of the goals for different patients goal attainment scaling can be used in groups of patients who all have different complaints, which is often the case in rare diseases. Another advantage is that it is very sensitive to change. |
▪ Essential that there is no primary endpoint that is relevant for all patients ▪ Heterogeneous disease course with stable baseline values for goal(s) setting ▪ It has to be actual treatment (not prevention) ▪ (Can only be interpreted in a) randomised controlled trial ▪ Measurement relevant at functional level |
▪ The goals are individually defined in consultation with patients and chosen per patient, hence customised measurement of therapeutic effect ▪ Time-demanding aspect, needed for detailed construction and definition of goals, may be less of a concern when there is a (very) limited number of available patients ▪ Direct patient involvement in efficacy assessment |
▪ Time-consuming to set (multiple) goals individually per patient ▪ Choice of goals must be realistic and associated with potential treatment effect ▪ Translation of effect size at group level to clinical benefit difficult |
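
As a hedged illustration of how attainment levels on individual goals are commonly summarised, the sketch below computes the Kiresuk-Sherman T-score. The formula, the coding of the five levels as −2 to +2, and the conventional inter-goal correlation of 0.3 are assumptions that are not stated in the table above; the function name and the example values are hypothetical.

```python
from math import sqrt


def gas_t_score(levels, weights=None, rho=0.3):
    """Kiresuk-Sherman T-score for one patient.

    levels:  attainment level per goal, coded -2, -1, 0, +1, +2
             (0 = expected outcome); assumed coding of the five levels
    weights: optional importance weight per goal (defaults to equal weights)
    rho:     assumed correlation between goal scores (0.3 by convention)
    """
    if not levels:
        raise ValueError("at least one goal is required")
    if weights is None:
        weights = [1.0] * len(levels)
    if len(weights) != len(levels):
        raise ValueError("one weight per goal is required")
    if any(x not in (-2, -1, 0, 1, 2) for x in levels):
        raise ValueError("levels must be coded from -2 to +2")
    weighted_sum = sum(w * x for w, x in zip(weights, levels))
    scale = sqrt((1 - rho) * sum(w * w for w in weights)
                 + rho * sum(weights) ** 2)
    return 50 + 10 * weighted_sum / scale


# Two patients with different numbers of goals end up on the same scale.
print(gas_t_score([0, 0, 0]))          # all goals attained as expected
print(gas_t_score([1, -1], [2, 1]))    # weighted goals, mixed attainment
```

The standardisation is what allows patients with different goals, and different numbers of goals, to be compared on a single scale centred at 50.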