Fatigability in spinal muscular atrophy: validity and reliability of endurance shuttle tests

Background: To determine the construct validity and test-retest reliability of Endurance Shuttle Tests as outcome measures for fatigability of remaining motor functions in children and adults with Spinal Muscular Atrophy (SMA) across the severity spectrum.
Results: We assessed the Endurance Shuttle Nine Hole Peg Test (ESNHPT), Endurance Shuttle Box and Block Test (ESBBT) and Endurance Shuttle Walk Test (ESWT) in 61 patients with SMA types 2–4, 25 healthy controls (HC) and 15 disease controls (DC). Convergent validity, discriminative validity and test-retest reliability were investigated. Additionally, we compiled the Endurance Shuttle Test Combined Score (ESTCS) by selecting the most relevant endurance test for each individual. Of the patients with SMA, 54%, 70% and 73% demonstrated increased fatigability on the ESNHPT, ESBBT and ESWT, respectively. Endurance response in SMA was characterized by a decrease in muscle strength, an increase in muscle fatigue and an increase in motor adaptations, thereby confirming convergent validity. Patients with SMA showed increased drop-out rates and a shorter endurance time compared to HC and DC, demonstrating good discriminative validity. Test-retest reliability was moderate to excellent (ICCs ranging from .78 to .91), with a trend towards better performance on retest. The ESTCS increased sample size and drop-out rate up to 100% and 85%, respectively.
Conclusions: Fatigability is an important additional dimension of physical impairment across the severity spectrum in children and adults with SMA. The ESTs are reliable and valid for documenting fatigability of walking and of proximal and distal arm function in SMA and are therefore promising outcome measures for use in clinical trials.


Background
Hereditary proximal Spinal Muscular Atrophy (SMA) is a severe neuromuscular disorder with predominantly infantile or childhood onset and is caused by deficiency of the survival motor neuron (SMN) protein due to loss of function of the SMN1 gene [1]. SMA is characterised by progressive loss of muscle strength and motor function, with a large clinical variety ranging from severe hypotonia in the first months of life (type 1), through stalled gross motor development but the ability to sit without support (type 2) and difficulties with or the loss of ambulation later in life (type 3), to relatively mild impairments in adulthood (type 4) [2][3][4][5]. Fatigability, defined as the inability to sustain repetitive physical activities, is increasingly being recognized as an important additional dimension of physical impairments and a target for therapeutic interventions [6][7][8][9]. Research into the effect on fatigability of both SMN-augmenting treatment strategies and pharmacological compounds specifically targeting skeletal muscle is hampered by the lack of sensitive and clinically relevant outcome measures for the assessment of fatigability [10][11][12][13]. Therefore, we recently established content validity and feasibility of the Endurance Shuttle Tests [7,14,15]. The primary objective of this study was to determine construct validity and reliability of the Endurance Shuttle Walk Test, Endurance Shuttle Box and Block Test and Endurance Shuttle Nine Hole Peg Test as outcome measures for fatigability of walking and of proximal and distal arm function in SMA types 2-4. The second objective was to compile and evaluate the Endurance Shuttle Test Combined Score to increase sensitivity and provide one single outcome measure for a broad range of phenotypes.

Subjects
Patients with SMA type 2, 3a, 3b and 4 were recruited from the Dutch national SMA registry (www.treatnmd.eu/ patient registries) [2,16]. To minimize selection bias, all eligible patients from a total of more than 300 enrolled in this registry were invited to participate. All patients had a confirmed homozygous deletion of the SMN1 gene or a heterozygous SMN1 deletion in combination with a disabling point mutation on the second SMN1 allele. Disease controls with another (genetically) confirmed neuromuscular disease were recruited from the paediatric neuromuscular outpatient clinic at the University Medical Center Utrecht and from Rijndam Rehabilitation Center in Rotterdam, the Netherlands. Healthy controls were recruited from the HU University of Applied Sciences, the University Medical Center Utrecht and through the subjects' social networks of family, friends and schoolmates. Inclusion criteria were an age between 8 and 60 years and the ability to follow test instructions. Subjects were excluded if they had a history of Myasthenia Gravis or another neuromuscular disorder known to cause fatigability or affect neuromuscular junction function, if they used drugs that change neuromuscular transmission, or if they had other medical problems that could interfere with the outcomes of the testing.

Study design
The study consisted of three visits (V1, V2, V3) within approximately 6 weeks (Table 1). At V1 we documented baseline characteristics, and subjects practiced the endurance tests for 1 min to reduce the learning effect on test-retest reliability. At V2 and V3, subjects performed test 1 (test) and test 2 (retest), respectively, at home or at the exercise laboratory in our hospital (both under supervision), depending on the subject's preference. There was a resting period of at least 1 week between V2 and V3.

Muscle strength
We assessed muscle strength of 22 muscle groups on both sides using a slightly modified Medical Research Council (MRC) score (i.e. no distinction between MRC 0 and 1; in both cases we used a score of 1) and calculated the MRC sum score (range: 44-220) [2]. We calculated a subscore for upper limb strength using 11 muscle groups of the upper limb on both sides.
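The stated range of the sum score follows directly from the scoring rule: 22 muscle groups on two sides give 44 grades, each between 1 and 5. A minimal sketch of that arithmetic is given below; the function name and the input layout (one left/right pair per muscle group) are assumptions made for illustration and are not taken from the study protocol.

```python
def mrc_sum_score(scores):
    """Sum modified MRC grades over all assessed muscle groups and sides.

    `scores` is a list of (left, right) grade pairs, one per muscle group
    (hypothetical layout). Grades 0 and 1 are both counted as 1, as
    described in the text.
    """
    total = 0
    for left, right in scores:
        for grade in (left, right):
            if not 0 <= grade <= 5:
                raise ValueError(f"MRC grade out of range: {grade}")
            total += max(grade, 1)  # collapse MRC 0 into 1
    return total


weakest = mrc_sum_score([(0, 0)] * 22)    # no contraction anywhere
strongest = mrc_sum_score([(5, 5)] * 22)  # normal strength everywhere
print(weakest, strongest)  # 44 220
```

The same function applied to the 11 upper limb muscle groups would give an upper limb subscore ranging from 22 to 110.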

Endurance shuttle tests
The Endurance Shuttle Nine Hole Peg Test (ESNHPT), Endurance Shuttle Box and Block Test (ESBBT) and Endurance Shuttle Walk Test (ESWT) were performed according to standardized procedures as previously described [7]. In short, we instructed subjects to repeatedly place and return 9 pegs in 9 holes, move 10 blocks over a partition or walk 10 m at 75% of their previously determined, individualized maximum speed. The individual rounds were paced by auditory signals. The test was ended when the subject was not able to keep up the pre-set pace during two consecutive shuttles or when the maximal duration of 20 min was reached (test completion). Subjects performed all tests they were physically capable of in a predetermined order, starting with the ESNHPT, followed by the ESBBT and the ESWT. Subjects recovered between tests for at least 30 min. Fourteen out of 25 (56%) HC performed tests for a duration of 10 (rather than 20) minutes. This test duration was chosen for the initial protocol but was later changed to 20 min to optimize outcome [7]. We corrected for differences in test duration during statistical analysis. For each performed Endurance Shuttle Test (EST), we documented two outcomes: 'drop-out' (Yes/No) and 'time to limitation' (Tlim) (sec). Drop-out was defined as the inability to endure the maximum duration of 20 min. We also documented test acceptability, defined as the willingness to perform the endurance test again in the future, using a visual analogue scale (VAS) with a range of 0-10 [17].
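The stopping rule and the two documented outcomes can be illustrated with a short sketch. It assumes that each shuttle has a fixed duration at the individualized pace and that Tlim is read at the end of the second consecutive missed shuttle; neither detail is specified in the text, and the function and parameter names are hypothetical.

```python
MAX_DURATION_S = 20 * 60  # maximal test duration from the text (20 min)


def score_est(on_pace, shuttle_s, max_s=MAX_DURATION_S):
    """Return (drop_out, tlim_seconds) for one endurance shuttle test.

    on_pace   : sequence of booleans, one per paced shuttle, True if the
                subject kept up with the auditory signal
    shuttle_s : fixed duration of one shuttle in seconds (assumption)
    """
    elapsed = 0.0
    misses = 0
    for kept_up in on_pace:
        elapsed += shuttle_s
        if elapsed >= max_s:
            return False, float(max_s)  # test completion, no drop-out
        misses = 0 if kept_up else misses + 1
        if misses == 2:  # two consecutive shuttles off pace
            return True, elapsed
    raise ValueError("record ends before drop-out or test completion")


# Illustrative record: one isolated miss is tolerated, two in a row end the test.
record = [True] * 5 + [False, True, False, False]
print(score_est(record, shuttle_s=10))  # (True, 90.0)
```

A subject who completes the test contributes a Tlim of 1200 s without an event, which is what allows the outcome to be analysed as censored time-to-event data.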

Fatigability parameters
We compared muscle strength, self-reported fatigue and motor adaptations before and directly after each EST.

Changes in muscle strength
For change in muscle strength, we performed quantitative hand-held myometry (type CT 3001, C.I.T. Technics, Groningen) according to standardized procedures to measure maximal voluntary contraction (MVC) of five muscle groups of the dominant arm (shoulder abduction, elbow flexion, wrist extension, hand grip and pinch grip) in subjects who performed the ESNHPT and ESBBT, and of the dominant leg (hip flexion, hip abduction, knee extension, knee flexion and ankle dorsal flexion) in subjects who performed the ESWT [18].

Self-reported fatigue
Subjects reported on general and local muscle fatigue with the OMNI scale of perceived exertion (0-10) [19].

Motor adaptations
We video-taped all patients during each EST to capture motor adaptations. Two assessors (BB, LH) independently compared four different aspects of performance between the first two and last two rounds of each EST: a reduced ability to use different parts of the body together smoothly and efficiently; an increase in compensatory movements (i.e. movements used habitually to achieve functional motor skills when a normal movement pattern has not been established or is unavailable); an increase in synkinesis (e.g. non-functional involuntary movement of muscles or limbs accompanying a voluntary movement); and a decrease in the ability to move against gravity [20,21]. 'Motor adaptation' was assumed when at least one aspect was scored as abnormal and 'no motor adaptation' when all aspects were normal. The assessors resolved any disagreements through discussion.

Construct validity
Construct validity refers to the degree to which the scores of an instrument are consistent with predefined hypotheses regarding relationships to scores of other instruments (convergent validity) or differences among relevant groups (discriminative validity) [15].

Convergent validity
To determine convergent validity, we used a linear mixed model (LMM) to assess muscle strength and self-reported fatigue in SMA while accounting for within-subject clustering with a random intercept. Time (0 = before, 1 = after the EST) was added to the model as a fixed effect. Subsequently, we added 'drop-out' and the interaction between 'time' and 'drop-out' as fixed effects to determine the effect of drop-out on muscle strength and self-reported fatigue. The association between drop-out and motor adaptations was studied with Pearson's chi-square test and Fisher's exact test. We hypothesized that subjects with SMA would demonstrate lower muscle strength, higher self-reported fatigue and more motor adaptations directly after the EST compared to before.
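The association test between drop-out and motor adaptations operates on a 2x2 table. The sketch below shows one common way to compute a two-sided Fisher's exact p-value (summing all tables with the same margins that are no more probable than the observed one); it is not the authors' analysis code, and the counts in the example are invented purely for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum all tables at most as likely as the observed one (small tolerance
    # guards against floating-point ties being missed).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


# Hypothetical counts: rows = drop-out yes/no, columns = motor adaptation yes/no.
p_value = fisher_exact_two_sided(18, 4, 5, 15)
print(f"p = {p_value:.4f}")
```

Fisher's exact test is typically preferred over the chi-square test when expected cell counts are small, which is plausible for the smaller subgroups in this kind of study.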

Discriminative validity
We used the log-rank test to study whether the ESWT and ESBBT could discriminate between SMA and HC and the ESNHPT between SMA, HC and DC. Event probabilities were estimated using Kaplan-Meier estimates. Group differences in age (between SMA, HC and DC) and muscle strength (between SMA and DC) were tested with the Mann-Whitney U test. We hypothesized that patients with SMA would demonstrate increased drop-out rates and shorter endurance time compared to HC and DC.
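As an illustration of the time-to-event approach, the sketch below computes a Kaplan-Meier curve and a two-group log-rank statistic in plain Python. Drop-out is treated as the event and test completion as censoring, which is an interpretation of the outcome definitions above; the function names are hypothetical and the code is not the study's analysis pipeline.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : time to limitation (Tlim) in seconds
    events : True for drop-out, False for test completion (censored)
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, drops, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            drops += data[i][1]
            removed += 1
            i += 1
        if drops:
            survival *= 1 - drops / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p-value) with 1 df."""
    a = list(zip(times_a, events_a))
    b = list(zip(times_b, events_b))
    diff, var = 0.0, 0.0
    for t in sorted({t for t, e in a + b if e}):
        n_a = sum(1 for x, _ in a if x >= t)  # still at risk in group a
        n_b = sum(1 for x, _ in b if x >= t)
        d_a = sum(1 for x, e in a if e and x == t)
        d = d_a + sum(1 for x, e in b if e and x == t)
        n = n_a + n_b
        diff += d_a - d * n_a / n  # observed minus expected events in a
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no informative event times")
    chi2 = diff ** 2 / var
    return chi2, erfc(sqrt(chi2 / 2))  # chi-square survival function, 1 df


# Illustrative data only: two drop-outs and three completions at 1200 s.
print(kaplan_meier([120, 300, 300, 1200, 1200], [True, True, False, False, False]))
```

Because censored subjects stay in the risk set until their last observed time, subjects who stop very early and subjects who complete the full test both contribute to the estimate.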

Reliability
For test-retest reliability, we calculated the two-way mixed intra-class correlation coefficients (ICC), type consistency. We defined ICCs as 'excellent' if the lower bound of the 95% CI was > 0.80, 'high' if it ranged between 0.7-0.8, and 'moderate' if it ranged between 0.5-0.7 [22]. For agreement between test completion of test 1 and test 2, we calculated Cohen's kappa, considering a kappa of 40-60% as moderate, 60-80% as substantial and > 80% as excellent agreement [22]. Due to repeated measurements of the time-to-event outcome (i.e. trial 1 and 2), we used a linear mixed Cox model with a Gaussian distribution to account for intra-individual clustering [23]. The linear mixed Cox model estimated the effect of retest (i.e. trial 2) on the probability of drop-out, expressed as a hazard ratio. As a visual illustration of the test-retest effect on the drop-out probability, we modeled the first test (i.e. trial 1) using a parametric Weibull model. Subsequently, we reduced the estimated Weibull hazard rate with the hazard ratio from the linear mixed Cox model.
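The two agreement statistics can be sketched as follows. The ICC shown is the single-measure form of the two-way mixed, consistency ICC, computed from the ANOVA mean squares as (MS_subjects - MS_error) / (MS_subjects + (k - 1) MS_error); whether the study used single or average measures is not stated, so that choice is an assumption, as are the function names and the example data.

```python
def icc_consistency(scores):
    """Single-measure, two-way mixed, consistency ICC.

    `scores` holds one row per subject with k trial values (here k = 2:
    Tlim on test and retest).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between trials
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def cohens_kappa(first, second):
    """Cohen's kappa for two paired lists of categorical ratings."""
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    labels = set(first) | set(second)
    expected = sum(first.count(l) / n * second.count(l) / n for l in labels)
    return (observed - expected) / (1 - expected)


# Illustrative data: every subject lasts 30 s longer on retest.
tlim = [[100, 130], [200, 230], [300, 330]]
print(icc_consistency(tlim))

# Illustrative test completion (True = completed) on test 1 and test 2.
print(cohens_kappa([True, True, False, False], [True, False, False, False]))
```

The consistency type ignores a systematic shift between trials, so a uniform improvement on retest does not lower the ICC; a systematic retest effect therefore has to be examined separately, which is what the hazard ratio for retest does in the analysis above.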

Subject characteristics
Sixty-one patients with SMA, 25 healthy controls and 15 disease controls completed the study (Table 2). Three participants were excluded due to perceived burden

Construct validity and reliability
This section describes the validity and reliability outcomes for each EST separately and for the ESTCS.

Endurance shuttle tests

ESNHPT
We observed an increase in general fatigue and local muscle fatigue of the upper arm, lower arm and hand after the test in patients with SMA (Table 3). We did not find a decrease in muscle strength. Motor adaptation occurred more frequently in patients with SMA with drop-out (p < .001). Drop-out was significantly higher in SMA compared to HC and DC (p < .001) (Fig. 1a). Drop-out was different between SMA type 2, type 3a and type 3b-4 (p = .001) (Fig. 1b). The test-retest reliability was moderate (Table 4). Agreement on test completion between test 1 and test 2 was substantial. We observed a trend towards better performance on retest, but this was not significant (Fig. 2a).

ESBBT
We observed a decrease in muscle strength of shoulder abduction and an increase in muscle fatigue of the upper arm, lower arm and hand after the test in patients with SMA (Table 3). We did not find a significant difference between patients with and without drop-out. Motor adaptation occurred more frequently in patients with SMA with drop-out (p < .001). Drop-out was significantly higher in SMA compared to HC (p < .001) (Fig. 1c). Drop-out was different between SMA type 2, type 3a and type 3b-4 (p = .001) (Fig. 1d). The test-retest reliability was high (Table 4). Agreement on test completion between test 1 and test 2 was excellent. We observed a trend towards better performance on retest, but this was not significant (Fig. 2b).

ESWT
We observed a decrease in muscle strength of knee flexion, an increase in general muscle fatigue and upper leg muscle fatigue, and an increase in motor adaptations after the test in patients with SMA (Table 3). We did not find a significant difference between patients with and without drop-out. Drop-out was significantly higher in SMA compared to HC (p < .001) (Fig. 1e). The test-retest reliability was high and agreement on test completion between test and retest was excellent (Table 4). We observed a trend towards better performance on retest, but this was not significant (Fig. 2c).

ESTCS
The test-retest reliability and agreement between test 1 and test 2 were moderate (Table 4). We observed a trend towards better performance on retest, but this was not significant (Fig. 2d).

Discussion
The primary objective of this study was to determine construct validity and reliability of the ESTs in patients with SMA. Results of our study indicate good convergent validity of the ESTs to assess fatigability and good discriminative validity between patients with SMA, HC and DC. Even with similar muscle strength, patients with SMA showed a higher frequency of drop-out and a shorter endurance time than disease controls. These results indicate that fatigability is an important dimension of physical impairment in SMA, separate from muscle strength.
The high prevalence of fatigability we report in both mildly and severely affected patients with SMA is consistent with recent studies that reported increased fatigability in ambulatory patients with SMA type 3 using the 6-min walk test (6MWT) and in type 2 patients with the repetitive Nine Hole Peg Test (r9HPT) [24,25]. The 6MWT and the r9HPT, however, do not cover the large severity spectrum of SMA and use different methodologies, which makes them difficult to compare. Therefore, we developed a set of endurance shuttle tests based on the same construct, using the same methodology in patients with mild, moderate and severe motor impairments [7]. The ESNHPT showed increased sensitivity of approximately 64% to capture fatigability during fine motor tasks in patients with SMA type 3a, compared to 36% using the r9HPT [25]. The ESBBT is the first validated and sensitive fatigability test for proximal arm function in SMA and may be complementary to outcome measures that focus on arm motor function, such as the Revised Upper Limb Measure (RULM), by adding the dimension of endurance [26]. Few studies have addressed the prevalence of fatigability and the variability in endurance capacity between ambulatory patients [24,27]. Our results show that most ambulatory patients do show fatigability during walking, but that the moment at which it occurs is highly variable. The fact that over 80% of the patients with SMA were able to walk for more than 6 min at a constant walking speed during the ESWT suggests that the currently used 6MWT might not be sensitive enough to capture fatigability in patients with moderately limited ambulatory capacity. The ESWT could be a good alternative to capture change in endurance in ambulatory patients. The reliability of the ESTs was good (ICCs .78-.91) and similar to that of the r9HPT and 6MWT (ICCs .71-.99) [25,28].
Reliability of the ESNHPT was slightly lower than that of the ESBBT and ESWT, which was explained primarily by a learning effect we observed in some videos. We did not detect a learning effect in a previous study on the value of the r9HPT to document fatigability in SMA, so we anticipated that a practice session of 1 min would be sufficient to correct for motor learning [25]. Based on the findings in this study, a complete practice test of the entire duration of 20 min should be applied in the future. Ideally, outcome measures can be used across the severity spectrum of SMA without large floor and ceiling effects. These and previously published data on motor function and endurance suggest that current performance measures are not sensitive enough to capture possible changes at the extreme ends of the spectrum of physical abilities [25,29]. A commonly used method to counteract this problem in functional scales, adding items to both ends of the hierarchical scale, is not applicable to exercise testing [26,30,31]. The second objective of this study was to develop a combined score that would allow comparison of patients with varying severity on their individual most relevant endurance test, thereby increasing sensitivity and circumventing subgroup analysis with less statistical power. The ESTCS increased sensitivity to detect fatigability and increased sample size compared to the ESNHPT (+ 31%, N = + 6), the ESBBT (+ 15%, N = + 24) and the ESWT (+ 12%, N = + 46). At the same time, test-retest reliability of the ESTCS was slightly lower than the reliability of the individual ESTs. This implies that in the choice between a separate EST and the ESTCS, the size and heterogeneity of the study sample and the degree of reliability and sensitivity that are necessary to demonstrate trial efficacy have to be taken into account.
An important strength of this study was the application of survival analysis to quantify fatigability in SMA, which gave us the opportunity to include patients with severe fatigability who could only sustain the specific endurance test for a short amount of time. Alternative methods that look at change over time, such as the 6MWT, or at repetitions, such as the r9HPT, might underestimate fatigability because patients who drop out early are often not included in the analysis. The use of hazard ratios is an innovative approach to test reliability and can be used to determine efficacy in clinical trials by calculating the difference with the hazard ratio of the treatment versus placebo group. Longitudinal natural history studies and data from clinical trials are now required to determine whether the ESTs are sensitive enough to detect clinically meaningful changes over time. We were not able to determine discriminative validity of the ESWT and the ESBBT between SMA and DC, since few of the included patients with muscular dystrophy were able to walk or lift their arms against gravity. Disease controls are generally hard to recruit and difficult to match with SMA on the severity and distribution of muscle weakness. Despite the limited number of DCs, we made a first step to explore differences in fatigability response between subjects with SMA and other neuromuscular diseases. The lower endurance time in patients with SMA compared to DC is in line with previous results using the repetitive nine hole peg test [25]. The available data suggest that the dramatic deterioration in muscle performance that we observed in many subjects with SMA is not present to the same extent in disease controls, even with similar muscle strength, but this needs further confirmation.

Conclusion
We show that the Endurance Shuttle Tests are reliable and valid to assess fatigability in patients with SMA across the spectrum of disease severity. This makes them promising outcome measures for application in standard care and clinical trials in patients with SMA.