
Table 3 Regulatory perspectives on the utility of the N-of-1 trial data

From: Aggregated N-of-1 trials for unlicensed medicines for small populations: an assessment of a trial with ephedrine for myasthenia gravis

Themes and subthemes

Response of NHCI regarding reimbursement

Response of MEB regarding licensing

Considerations for accepting evidence from N-of-1 trials^a

Condition and treatment

Chronic, stable disease; fast on- and offset of treatment

Chronic, stable disease; fast on- and offset of treatment effect

Prevalence

A traditional RCT is not feasible due to small patient numbers. The Feasible Information Trajectory [24] indicates that the N-of-1 design is acceptable.

General approaches to rare conditions exist.^b

The N-of-1 design is a last resort option for rare conditions

Generalizability to population level

The results of individual patients’ trials must be aggregated to allow a statement about effectiveness at the population level.

Patients should be sufficiently diverse (e.g., characteristics, context of care).^c

Individual results should be shown.^d

An overall mean may not be interpretable if there is high heterogeneity.^e

-

The number of patients involved [is] crucial for generalization of effect

Consider how many measurements are needed per patient and the total number of patients to be included.

The number of periods assessed must be sufficient

-

Multiple study sites are preferable

Other

The evidence of effectiveness must be published in a peer-reviewed journal.

N-of-1 trials are not suitable for substantiating safety in a benefit/risk assessment for market authorization. Safety needs to be substantiated otherwise.

Evidence from this series of trials with ephedrine as add-on for myasthenia gravis

Patient characteristics and external validity

Patients were receiving usual care as recommended in international guidelines, thus the usual care is acceptable.

One patient was taking low-dose prednisone and an immunosuppressant during the trial. How might this have influenced the results?

All patients were female. Are the results generalizable to males?

The MG population in this study is heterogeneous, with differences in disease duration, baseline scores, and concomitant treatment. Hence, the extent to which these results, based on 4 subjects, can be extrapolated to the general MG population should be justified.

Comparator

Placebo control is suitable for answering the question whether adding ephedrine to usual care is more effective than not adding it.

-

Outcome measures

QMG is acceptable as primary outcome measure because it is endorsed by MGFA. Clarification is desirable on QMG’s validation and correlation to EQ-5D, in case EQ-5D cannot be used for this indication.

-

Timing

Is a 5-day treatment period sufficient to show a clinically relevant effect on the chosen outcome measures, considering that previous studies in myasthenia gravis used 14-day treatment periods?

-

Variability of observed effects within patients: questioning the validity of the N-of-1 design for this indication

 

On a group level, ephedrine tended to be effective based on the effect estimators, but responses within patients were highly variable. Figure 1 of the briefing document [Additional file 1] presents the mean of the placebo episodes and the mean of the ephedrine episodes per patient. However, the observed scores within these episodes were rather variable. A clear, consistent treatment effect within a patient was not observed. In response, the Applicant stated that motor performance in MG is highly variable over the day. This calls into question whether the assumptions of the N-of-1 trial can be met, as patients appear not to return to baseline levels of motor performance after stopping treatment. In theory, MG could have met the assumptions for an N-of-1 trial, but this was not clearly observed in the data. In short, individual patients could not be classified as responders. Moreover, the severity of motor symptoms in MG may vary from day to day. It may even vary within a day, as motor performance declines during the day, especially with exercise. As such, it may be questioned whether the included patients had the correct baseline characteristics: was disease activity constant?

Within each patient, responses on the different scales were not consistent, which added to the variability of the data and hampered interpretation of the results.

Clinical relevance and statistical significance of effect in primary outcome measure

The clinical relevance of the observed effect, a 1-point reduction in QMG, requires further support. For the initial treatment of myasthenia gravis, a reduction of 3.5 points on the QMG scale is clinically relevant. Is a reduction of 1 point a small, average or large effect? Expression as an SMD would be helpful. Is there literature to support its relevance for add-on ephedrine? Is an MCID known for QMG?

Although significant p values were presented for the treatment effect of add-on ephedrine compared to add-on placebo, the clinical relevance of the effects was considered inconclusive. The effect size is small, which calls the clinical relevance of the effects into question. The minimal clinically important difference initially postulated by the Applicant was 3.5 points on the QMG scale. This was based on literature and was not met. In the discussion meeting, the Applicant stated that this might have been too ambitious and that a lower effect size may still be relevant. However, this requires a justification of why the observed differences would be clinically relevant.

Interpretation of statistical results in the trial patients compared to inference at population level

-

-

Outcomes not addressed in the trial

The trial at hand is not suitable for answering the question whether ephedrine postpones or prevents the use of treatments with a higher risk profile. NHCI wondered whether applicants could provide insight from this series of trials or literature.

The argument that the treatment might reduce or postpone corticosteroid use should be demonstrated.

 

Clinical relevance should also be discussed in terms of benefit/risk. (N-of-1 trial not suitable, see above). Safety needs to be substantiated otherwise. The effect size should be weighed against safety e.g., long-term cardiovascular risks. Applicant did state that intermittent use was aimed for.

Sufficiency of the evidence for a decision

NHCI cannot at present give a definitive answer whether this series of N-of-1 trials shows that the treatment in question is “Established medical science and medical practice” [33]

The data from the trial are not adequate to base a marketing authorization upon.

Desired level of precision/how many patients still to include (for MEB only): What outcome or type of analysis would be recommended?

It is the applicants’ responsibility to demonstrate that the statistical methodology as applied to this aggregated N-of-1 trial design is the correct methodology to enable a reliable statement about an effect at the population level.

The observed data do not allow the conclusion that the conditions for an N-of-1 trial have been met.

Firm recommendations on what would suffice [how many more patients to include in the aggregated N-of-1 trial and/or what specific outcome or type of analysis] cannot be made, as this depends on the reasons for the failure of the current study design [to show clinical relevance] [e.g., insufficiencies in the trial design, inclusion of an incorrect patient population and/or the drug in fact being ineffective]. Options might be inclusion of a positive control with a clear symptomatic effect (e.g., an N-of-1 trial with placebo/ephedrine/acetylcholine crossovers) in combination with selection of a more responsive patient population. Revision of the inclusion criteria of the study population to ensure constant disease activity may also increase the probability of showing a symptomatic treatment effect. A different study design, e.g., a parallel group trial with a longer treatment duration in which the day-to-day variability in scores can be averaged, may also be considered.

 

At the present stage, NHCI could not make a decision under the framework of “Established medical science and medical practice” on whether ephedrine is reimbursable for the indication under consideration, on the basis of the trial results and the scope of the trial (including the number of patients).

 
  a. Including the MEB’s advice on N-of-1 methodology in general, given during the design stage of the ephedrine trial before the data were available
  b. For rare diseases an applicant may propose and motivate (and discuss with the MEB) which level of evidence they would consider sufficient, e.g., increase type I error to 10% instead of 5% and/or register with a small sample (because the disease is very rare, mechanism of action well-understood and generalizable)
  c. Applicants should present a discussion of why the treatment effect could be generalized to the population intended. This should address whether the included patients are sufficiently diverse (patient characteristics, context of care surrounding the patients)
  d. Dealing with several N-of-1 trials is like dealing with a meta-analysis with patients instead of trials. Therefore, as an analogue of a forest plot, a box-and-whisker plot should be provided. Per patient, this will provide information on the median (and mean) effect, the quartile range of effects, and the full range of effects seen for that patient. This will provide information on the generalizability (boxes closer together: more uniform effect over patients) as well as on the repeatability of the effect within a patient (smaller boxes: better repeatability)
  e. If heterogeneity of effects is substantial, a discussion of sources of and explanations for this heterogeneity should be attempted. This should include differences in patients’ characteristics, their context (e.g., the surrounding care in the hospital) and treatment (is the treatment implemented differently per patient?). If an overall mean is given, its interpretability should be discussed. E.g., if treatment effects vary widely, an overall mean has no meaningful interpretation. However, a large variation of effects that are all “positive”, supported by a significant overall test, could support the statement that the treatment works (with differential effect)
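
The per-patient summary described in footnote d (median, mean, quartile range and full range of effects) could be computed along the following lines. This is a minimal sketch, not the method used in the trial: the function name `summarize_patient`, the patient labels and all numbers are hypothetical and invented purely for illustration, and the quartile convention (Python's "inclusive" method) is an assumption, since the table does not specify one.

```python
from statistics import mean, median, quantiles


def summarize_patient(effects):
    """Return the numbers a box-and-whisker plot would show for one patient.

    `effects` is a list of per-period treatment effects for a single patient
    (assumed here: active score minus placebo score for each period pair).
    """
    # Assumption: "inclusive" quartiles; other conventions give other values.
    q1, _, q3 = quantiles(effects, n=4, method="inclusive")
    return {
        "min": min(effects),
        "q1": q1,
        "median": median(effects),
        "mean": mean(effects),
        "q3": q3,
        "max": max(effects),
    }


# HYPOTHETICAL data, not taken from the ephedrine trial.
hypothetical_effects = {
    "patient_A": [-2.0, -1.0, 0.0, -1.0],
    "patient_B": [1.0, -3.0, -1.0, 0.0],
}

summaries = {
    patient: summarize_patient(values)
    for patient, values in hypothetical_effects.items()
}

for patient, summary in summaries.items():
    box_height = summary["q3"] - summary["q1"]  # smaller box: better repeatability
    print(patient, summary, "IQR:", box_height)
```

Placing the per-patient boxes side by side gives the forest-plot analogue the footnote asks for: boxes that sit close together suggest a more uniform effect across patients, and narrow boxes suggest a repeatable effect within a patient.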