Quantitative strength testing in ALS clinical trials

Abstract
Objective: To study the attributes of quantitative strength testing using hand-held dynamometry (HHD) as an efficacy measure in 2 large phase 3 amyotrophic lateral sclerosis (ALS) trials.
Methods: In the phase 3 trials of ceftriaxone and dexpramipexole, 513 and 943 patients, respectively, were enrolled in double-blind, randomized, placebo-controlled trials with planned follow-up of at least 1 year. Patients were studied every 3 months in the ceftriaxone study and every 2 months in the dexpramipexole study. Evaluators of HHD were trained and had to show evidence of adequate performance of strength testing; the testing paradigm involved testing 9 muscle groups in the upper and lower extremity bilaterally. Neither drug significantly affected any outcome measure. Strength measurements were evaluated by individual muscle and by megascores, which averaged scaled strength measures to produce an overall measure of muscle strength.
Results: A measure combining rate of decline with both within- and between-patient variabilities of measurement, the coefficient of variation for rate of change, was calculated, and showed that HHD overall performed slightly less well than Amyotrophic Lateral Sclerosis Functional Rating Scale–revised (ALSFRS-R) but better than vital capacity. Individual muscles were highly correlated to the identical muscles on the contralateral side, as well as to other muscles in the same body region. Strength decline was correlated both with ALSFRS-R and vital capacity.
Conclusion: Quantitative strength testing using HHD is a reliable and reproducible measure of decline in ALS.
GLOSSARY
- ALS = amyotrophic lateral sclerosis;
- ALSFRS-R = Amyotrophic Lateral Sclerosis Functional Rating Scale–revised;
- CoV(R) = coefficient of variation for rate of change;
- HHD = hand-held dynamometer;
- MMT = manual muscle testing;
- VC = vital capacity
Clinical trials in amyotrophic lateral sclerosis (ALS) have employed a variety of efficacy measures in recent years.1–4 Survival has been considered the gold standard, but its use requires long studies with many participants. Functional rating scales combine attributes and may be insensitive to change. Quality of life measures have been developed specifically for ALS, but change in such measures with disease progression has been surprisingly modest.5 Vital capacity (VC) assesses only a limited aspect of the motor system; however, VC is reproducibly measured in patients with ALS, declines at a predictable rate, and rate of decline predicts survival.6–8
The hallmark of ALS is a progressive loss of motor function. Patients become progressively weaker over time; this can be measured qualitatively9–12 or quantitatively. Quantitative muscle testing in ALS was pioneered by Andres et al. in the 1980s.13 With extensive evaluator training, reproducible data were obtained from which extremity specific megascores could be calculated. However, the evaluation required multiple participant position changes and was fatiguing; in addition, a limited number of muscles could be tested.
More recently, a method of quantitative muscle strength testing using a hand-held dynamometer (HHD) was developed. With HHD, more muscles are evaluated. A standardized training program, well-defined patient positions for each muscle tested, and validation of training competency were all put in place to minimize variability. Two recent phase 3 trials employed quantitative strength testing with HHD as a secondary outcome.3,14 We describe the characteristics of HHD as compared to other measures used in both studies.
METHODS
Primary results of the dexpramipexole (EMPOWER) and ceftriaxone trials have been described.3,14 In the EMPOWER study, 943 participants, randomized 1:1 to either placebo or dexpramipexole, were enrolled at 81 study sites in North America and Europe. The ceftriaxone study enrolled 513 participants in 59 sites in North America only, 340 on ceftriaxone and 173 on placebo. For EMPOWER, patients (aged >18 years) were eligible for enrollment if they had a diagnosis of possible, laboratory-supported probable, probable, or definite ALS, a slow VC >65% of the predicted normal value, and symptom duration <2 years before enrollment. Participants were permitted to use riluzole if they had been on a stable dose for ≥30 days. Inclusion criteria were identical to the ceftriaxone study except that VC was ≥60% predicted and disease duration <3 years. In both studies, some patients were treated longer than 12 months; however, only the first 12 months of treatment were included in this analysis. Patients were seen for efficacy evaluations every 3 months during the ceftriaxone study and every 2 months for EMPOWER; patients who suspended active treatment of study medication were nonetheless encouraged to continue their regularly scheduled outcome visits according to the intent-to-treat principle. Efficacy measures included Amyotrophic Lateral Sclerosis Functional Rating Scale–revised (ALSFRS-R), VC, ALS-Specific Quality of Life instrument, assessment of vital status, and quantitative strength testing using HHD of 9 upper and lower extremity muscles measured bilaterally (table 1). The instrument chosen for use was the MicroFET2.
Slopes and coefficient of variation (CoV) for rate of change estimates based on regression analysis
Prior to sites being approved to begin recruitment, training for performance for all outcome measures was completed. For strength testing, this training included viewing an instructional video followed by a hands-on training session using normal volunteers. Evaluators were trained on participant positioning for each muscle to be evaluated, as well as their own positioning to minimize variability (the HHD training manual is included as appendix e-1 on the Neurology® Web site at Neurology.org). Subsequent to this training session, all evaluators were required to test 4 healthy volunteers twice, and test-retest variability was determined for each muscle tested. Average test-retest variability was required to be less than 15% before evaluators were certified to study patients; repeated testing occurred until this criterion was met. For VC, evaluators tested 3 healthy participants, and had to demonstrate less than 15% variability between the maximum VC value and the next highest value. For ALSFRS-R, evaluators used vignettes of the scale being performed on patients with ALS, and rated the vignettes. For 2 vignettes, they were required to be within 2 points of the total score as rated by an expert panel, and have no more than a 1-point difference from the panel on any given item. The same training and certification process was followed for both studies, and the trainers were from the same group as well.
For each participant visit during the trial, strength testing was performed with the patient sitting in a hard-backed chair with armrests. If a patient was too weak to transfer comfortably, he or she could be tested in a wheelchair. Each muscle was tested with a standard beginning position, usually midway between maximum flexion and extension of the muscle being studied. After the standard position was attained, the evaluator placed the HHD device along the limb, with standard position defined for each muscle. The patient was instructed to steadily increase force against the device until he or she achieved maximal exertion; during that time, the evaluator attempted to match the patient's force so that the limb did not move. After maximum strength was achieved, the evaluator exerted force against the dynamometer sufficient to overcome the participant's contraction. For very strong muscles (for example, knee extensors), evaluators occasionally could not overcome the patient's strength. This failure was noted on the case report form. For each muscle to be tested, 2 evaluations were performed separated by at least 15 seconds. If the variability of the 2 evaluations was less than 15%, the maximum value was recorded and the evaluator proceeded to the next muscle to be tested. If variability was greater than 15%, a third trial was performed, and the maximum value of the 3 trials was accepted. Order of muscle testing was standardized. Measurements were made in pounds. A value of 0 was assigned to a given muscle if the patient could not assume the testing position due to weakness, or could attain the position but not exert measurable force. Strength testing for each session took less than 20 minutes to complete.
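The two-or-three-trial rule described above can be illustrated with a minimal Python sketch. The text does not state how "variability" between 2 trials was computed; the sketch assumes it is the absolute difference divided by the larger value, and it assumes that a variability of exactly 15% triggers a third trial. The function names are hypothetical and are not taken from the study protocol.

```python
from typing import Sequence


def relative_variability(first: float, second: float) -> float:
    """Assumed definition: absolute difference divided by the larger trial value."""
    largest = max(first, second)
    if largest == 0:
        return 0.0  # both trials recorded 0 lb, so they agree
    return abs(first - second) / largest


def needs_third_trial(first: float, second: float, threshold: float = 0.15) -> bool:
    """True when the 2 trials differ by the threshold or more (boundary case is an assumption)."""
    return relative_variability(first, second) >= threshold


def recorded_strength(trials: Sequence[float]) -> float:
    """Value entered for the muscle: the maximum of the 2 or 3 trials, in pounds."""
    if len(trials) not in (2, 3):
        raise ValueError("expected 2 or 3 trials per muscle")
    return max(trials)
```

For example, trials of 20 and 18 lb differ by 10% under this definition, so the larger value would be recorded without a third trial; trials of 20 and 15 lb differ by 25% and would call for a third.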
Standard protocol approvals, registrations, and patient consents.
For both studies, institutional review board approval was obtained before participants were enrolled, and all participants signed informed consent documents.
Analysis.
Muscle strength data were analyzed by individual muscle groups and by combined muscle groups (named megascores). To create a megascore, individual muscles were standardized using data from 228 healthy participants (not matched by age or sex to ALS participants), from which individual muscle means and SDs were calculated (table e-1). Z scores for each muscle measurement were calculated using these means and SDs. Individual Z scores were then averaged to produce a megascore for a single extremity, for both upper extremities, both lower extremities, and a global megascore including all muscle groups. In addition to the Z score transformation, individual muscles were transformed to a percent change from baseline value, where muscle strength in pounds at baseline was set to 100% and changes over time expressed as percent change from that value. Using percent change from baseline, individual muscles could be combined into megascores in a manner similar to Z scores.
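As an illustration of the Z score megascore, the following Python sketch standardizes each muscle against healthy reference values and averages the results. The reference means and SDs shown are hypothetical placeholders, not the values in table e-1, and the muscle names and function names are assumptions made for the example.

```python
from statistics import mean

# Hypothetical healthy reference values in pounds: muscle -> (mean, SD).
# These are placeholders for illustration, NOT the values from table e-1.
HEALTHY_NORMS = {
    "elbow_flexion": (50.0, 10.0),
    "wrist_extension": (30.0, 6.0),
    "knee_extension": (80.0, 16.0),
}


def z_score(muscle, strength_lb, norms=HEALTHY_NORMS):
    """Standardize one strength measurement against the healthy reference."""
    ref_mean, ref_sd = norms[muscle]
    return (strength_lb - ref_mean) / ref_sd


def megascore(measurements, norms=HEALTHY_NORMS):
    """Average the Z scores of all muscles in `measurements` (muscle -> pounds)."""
    if not measurements:
        raise ValueError("at least one muscle measurement is required")
    return mean(z_score(m, v, norms) for m, v in measurements.items())
```

Passing only upper extremity muscles, only lower extremity muscles, or all muscles to `megascore` would give the corresponding extremity or global megascore for one visit.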
For individual muscle scores as well as megascores, the rate of change for each outcome was estimated for each participant using the slope of linear regression based on measurements from the participant's entire follow-up. Participants with only 1 visit were not included in the analysis. The coefficient of variation for rate of change, a summary statistic that represented both rate of change in muscle function and variability for all patients, was calculated (CoV[R] = [SD of individual slopes of decline]/[mean slope of decline]). The same statistic was calculated for decline in VC and ALSFRS-R.
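The CoV(R) calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the analysis code used in the trials: it fits an ordinary least-squares slope per participant, uses the sample SD (n − 1 denominator), and takes the absolute value of the mean slope so that the statistic is positive when strength declines. The function names and data layout are hypothetical.

```python
from statistics import mean, stdev


def ols_slope(times, values):
    """Least-squares slope of values against times for one participant."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least 2 paired visits")
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("visit times must not all be identical")
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx


def cov_of_rate(participants):
    """CoV(R) = SD of individual slopes / |mean slope|.

    `participants` is a list of (times, values) pairs; participants with
    only 1 visit are skipped, as in the text.
    """
    slopes = [ols_slope(t, v) for t, v in participants if len(t) >= 2]
    if len(slopes) < 2:
        raise ValueError("need at least 2 participants with 2 or more visits")
    return stdev(slopes) / abs(mean(slopes))
```

With three participants whose outcome falls by 1, 2, and 3 units per month, the SD of the slopes is 1 and the mean slope is −2, giving CoV(R) = 0.5.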
CoV is a useful statistic for assessing the variability of an outcome measure while taking into account the magnitude of the outcome.15,16 It is typically used as a descriptive statistic. When applied to slopes, CoV estimates both the variability of rates of progression of a measure that are intrinsic to the aspect of the underlying disease being measured, as well as variability of measurement due to either evaluator or participant factors. An outcome with lower CoV would suggest that the outcome measure is more efficient than one with higher CoV when used in clinical trials. We used the CoV in this context in this article purely descriptively, with no intention of making formal inferences regarding the differences either between outcome measures or across studies. At present, there is no way to distinguish variability of measurement from variability of underlying rates of progression.
In addition to slope of decline, strength in individual muscles was correlated both with the same muscle on the contralateral limb as well as with other muscles tested. Similar correlation coefficients were calculated using slope of decline of individual muscles.
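The side-to-side and muscle-to-muscle comparisons can be illustrated with a short Python sketch. The text does not specify which correlation coefficient was used; the sketch assumes the Pearson product-moment coefficient, and the function name and example values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired values, e.g. left vs right strength."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least 2 paired observations")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)
```

The same function applies whether the paired inputs are baseline strengths or per-participant slopes of decline for the two muscles being compared.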
For the longitudinal changes of megascores over study follow-up as shown in figure 1, in order to demonstrate the extent of linearity in strength loss over time, month of visit was treated as a categorical variable and the average decline at each follow-up visit was estimated using linear mixed-effects model.
Number of data points for each study and time period are shown below (A). Megascores for lower extremity (A), upper extremity (B), and all muscles (C) are represented.
RESULTS
Neither study overall showed significant differences between active treatment and placebo on any outcome measure. For this reason, placebo and treatment groups were combined for both graphic display and for analysis. For combinatorial analyses, % change from baseline and Z score transformations produced measures that were approximately equally variable; for this reason, only Z score transformations will be presented. Overall, there was remarkable consistency in ALSFRS-R, VC, and HHD between the 2 studies (table 1). ALSFRS-R had the lowest CoV(R) for rate of change, followed by the HHD megascore, with VC showing the highest CoV.
All muscle groups declined in strength over time. For the lower extremities, CoV(R) for individual muscles were in general more variable than for upper extremity muscles; the most variable muscle in both studies was knee extension, while wrist extension in the upper extremity was least variable for both studies (knee extension ceftriaxone: 2.30, 2.32 [L/R]; dexpramipexole: 2.03, 1.82 [L/R]; wrist extension ceftriaxone: 1.38, 1.35 [L/R]; dexpramipexole: 1.25, 1.22 [L/R]).
Baseline strength in a given muscle was highly correlated side to side. Side to side correlation coefficients ranged from 0.90 to 0.65 (table 2). In addition, all upper extremity muscles were well correlated with other muscles, with nonidentical muscle pairs ranging from 0.69 to 0.28 for upper extremities and 0.63–0.58 for lower extremities. In addition, rate of decline in a given muscle was closely correlated to the rate of decline of the same muscle on the contralateral side. Side to side correlation coefficients ranged from 0.82 to 0.43 for identical muscle pairs, 0.55–0.19 for nonidentical upper extremity muscles, and 0.56–0.10 for lower extremity muscles (all p < 0.001). Figure 2 shows representative muscle pairs for both studies.
Side-to-side correlations between muscle strength at baseline and slopes of decline
Data from the ceftriaxone study are on the left; dexpramipexole study, on the right.
For both studies, strength expressed by megascores declined monotonically over time (figure 1). Rates of decline were faster for the upper extremity than lower extremity, and the pattern of change was linear over the time period studied. Changes in HHD megascores were also well-correlated with ALSFRS-R and VC (table 3).
Correlation of slopes of different outcome measures for dexpramipexole and ceftriaxone studies
DISCUSSION
For both the phase 3 studies of dexpramipexole and ceftriaxone, standardized evaluator training and validation of performance were employed to train evaluators on ALSFRS-R, VC, and quantitative strength measurement. All measures were performed with a high degree of consistency; the sensitivities of each measure as expressed by CoV(R) were similar across studies. As this measure is the ratio of the SD of all slopes to the mean slope, variability from all sources (patient-related and measure-related) affects the value. Thus, an important point is that, for all measures, formal training of evaluators will reduce CoV(R). Although individual muscles differed in their CoV(R), the differences were not great (table e-1). Knee extension was most variable, perhaps because it is often the strongest muscle tested and thus susceptible to a ceiling effect. However, knee extension is critical for ambulation in ALS, and for that reason it was considered important to include in the analysis.
ALSFRS-R was the least variable measure; this is most likely due to the fact that, for each question, a 0–4 scale is employed for which the boundaries between grades are broad. Thus the variability of the measure is less than that of continuous variables for which small changes in capacity result in changes in measurement. Both quantitative strength and VC are continuous measures, allowing for more variability. Formal power analyses are based both on rates of decline of a measure and the variability between patients; thus, purely from these considerations, ALSFRS-R as a primary outcome measure allows for the smallest sample size of the 3 measures discussed.
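The link between CoV(R) and sample size can be made concrete with a standard two-sample approximation, sketched below in Python. This is an illustration under stated assumptions, not a power analysis from either trial: it assumes 1:1 allocation, a two-sided test, normally distributed slopes with equal SD in both arms, and a treatment effect expressed as a fractional slowing of the mean slope. Under those assumptions, participants per arm is approximately 2(z₁₋α/₂ + z_power)² × CoV(R)² / (fractional slowing)². The function name and the CoV(R) values in the example are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def participants_per_arm(cov_r, fractional_slowing, alpha=0.05, power=0.80):
    """Approximate participants per arm needed to detect a fractional slowing of decline."""
    if cov_r <= 0 or fractional_slowing <= 0:
        raise ValueError("cov_r and fractional_slowing must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * cov_r ** 2 / fractional_slowing ** 2
    return ceil(n)


# Hypothetical CoV(R) values, chosen only to show the scaling.
n_low = participants_per_arm(1.0, 0.25)
n_high = participants_per_arm(1.5, 0.25)
```

Because the required sample size scales with the square of CoV(R), an outcome with a CoV(R) of 1.5 needs roughly 2.25 times as many participants as one with a CoV(R) of 1.0 to detect the same fractional slowing.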
Perhaps surprisingly, quantitative strength, expressed as an average of multiple muscle groups in the upper and lower extremities, had a lower CoV(R) than VC for both studies. As performed using HHD, strength testing is inherently limited by the strength of the evaluator; if for a given muscle the evaluator cannot overcome the strength of the patient, the value obtained will reflect the force exerted by the evaluator rather than the patient. Despite this clear ceiling effect, strength measurements were consistent and reproducible.
Strength has been tested in a variety of ways in ALS trials, and is a critical outcome measure for trials in many neuromuscular diseases. Perhaps most commonly, manual muscle testing (MMT) has been employed. With MMT, strength is graded qualitatively on a 5- or 10-point scale. Even with defined evaluator training, MMT for an individual muscle is extremely variable, and the scale itself is a noninterval scale. Despite these limitations, CoV for rate of change can be reduced by testing many muscles. In a multicenter study in which 36 muscles were tested and included in an average value, CoV was slightly greater than 1.0.9 Because of the variability of measurement and the poor correlation of MMT grades to quantitative muscle strength, Munsat et al. developed the Tufts Quantitative Neuromuscular Evaluation, which tested isometric strength using a strain gauge that was moved to various positions while patients assumed multiple positions on a flat table. With formal training, this test was reliable and reproducible, and produced lower CoV(R) values than MMT when the number of tested muscles was equal. However, because patients had to move to a variety of positions, it was fatiguing and took over 40 minutes to assess a limited number of muscle groups. To reduce patient fatigue, increase speed of testing, and allow more muscles to be tested in a quantitative fashion, quantitative muscle testing using HHD was developed. In both the ceftriaxone and dexpramipexole studies, a full testing protocol took approximately 10 minutes and included 18 muscle groups. From the data presented here, CoV(R) was better than MMT using 36 muscles.
One potential limitation to quantitative strength testing using HHD is the fact that, for very strong muscles, the evaluator may not be able to overcome the inherent strength of the muscle being tested. In that situation, the strength measured may reflect that of the evaluator instead of the patient. It is likely for this reason that the CoV(R) for knee flexors and extensors was in general higher than for other muscles, as these muscle groups are often strong even in the ALS population. However, the overall reliability of the global and extremity megascores indicates that strength could nonetheless be measured reliably.
An interesting observation present in both studies was the extent to which strength in the same muscles on both sides was correlated, both with respect to baseline values and rate of change over time. Even more surprising was the fact that all muscles of the upper extremity were well-correlated with each other, and the same was true for the lower extremity. These observations suggest that, for most patients, ALS is a much more generalized disease during the period of time patients are participating in a clinical trial. This may be a function of the duration of illness at the time of trial enrollment; for the ceftriaxone study, mean duration of illness prior to screening was 18 months, while it was 15.2 months for the dexpramipexole study. Clearly, though many patients present focally, weakness becomes disseminated over time. Given that the ceftriaxone and dexpramipexole trials had different disease duration requirements (less than 36 vs less than 24 months), and the muscle-to-muscle correlation coefficients were similar, it seems that disease duration itself is not the governing factor. With respect to consideration of outcomes in future trials, the high muscle-to-muscle correlation implies that, if there were a sensitive and reliable measure that could only be performed on a single muscle or a limited muscle set, such a measure might still capture an accurate picture of overall disease course. Though not included in the set of muscles tested by HHD, VC, a measure of respiratory muscle strength, may also change in close association with skeletal muscles. For dexpramipexole, correlation coefficient for HHD and VC was 0.46, while it was 0.52 for ceftriaxone.
The CoV(R) is a measure that combines both the variability of assessment and the variability of the rate of decline across patients. Although a useful way to compare measures, other factors are important as well. The relevance of the measure to the underlying disease process is critical, as is the potential sensitivity to disease modification with the therapeutic agent being tested. As loss of strength is an intrinsic component of decline in ALS, strength testing seems clearly relevant and has been adopted in many recent phase 3 trials. Sensitivity to therapeutic change can only be determined in a trial where there is a positive effect of treatment. However, based on data from the studies reported here, quantitative strength is clearly a measure that should be included in future studies. HHD is a reliable, time-efficient, and low-cost method of monitoring strength in an ALS clinical trial and thus is a viable candidate for this assessment.
AUTHOR CONTRIBUTIONS
Jeremy M. Shefner: study concept and design, analysis and interpretation, study supervision. Dawei Liu, Melanie Leitner, David Schoenfeld, Toby Ferguson, and Merit Cudkowicz: analysis and interpretation, critical revision of the manuscript for important intellectual content. Donald R. Johns: critical revision of the manuscript for important intellectual content.
STUDY FUNDING
The dexpramipexole clinical trial was funded entirely by Biogen-Idec. The ceftriaxone clinical trial was funded by NIH/National Institute of Neurological Disorders and Stroke (U01NS077179, PI Dr. Merit Cudkowicz).
DISCLOSURE
J. Shefner served on an Advisory Board for Biogen-Idec, received consulting income from Cytokinetics, Inc., Voyager Pharmaceuticals, and ISIS Pharmaceuticals, received income as neuromuscular section editor for UpToDate, and received research support from NIH, ALSA, MDA, Cytokinetics, and Biogen-Idec. D. Liu is an employee of and holds stock in Biogen-Idec. M. Leitner is an employee of and holds stock in Biogen-Idec. D. Schoenfeld has received consulting income from Alexion Pharma, Mitsubishi Pharma, and research support from ALS Therapy Alliance, NIH, American Burn Association, CDC, and Skulpt. D. Johns is an employee of and holds stock in Biogen-Idec. T. Ferguson is an employee of and holds stock in Biogen-Idec. M. Cudkowicz received consulting income from Genentech, Denali, AstraZeneca, Cytokinetics, and Biogen, gave expert testimony for Teva, and has received research support from ALSA, National Institute of Neurological Disorders and Stroke, and MDA. Go to Neurology.org for full disclosures.
Footnotes
Funding information and disclosures deemed relevant by the authors, if any, are provided at the end of the article.
Supplemental data at Neurology.org
- Received October 27, 2015.
- Accepted in final form April 26, 2016.
- © 2016 American Academy of Neurology
REFERENCES
- 1. Miller RG, Block G, Katz JS, et al.
- 2.
- 3.
- 4.
- 5. Robbins RA, Simmons Z, Bremer BA, Walsh SM, Fischer S
- 6. Traynor B, Zhang H, Shefner J, Schoenfeld D, Cudkowicz M
- 7. Czaplinski A, Yen AA, Appel SH
- 8.
- 9. Great Lakes ALS Study Group. A comparison of muscle strength testing techniques in amyotrophic lateral sclerosis. Neurology 2003;61:1503–1506.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16. Shefner JM, Watson ML, Simionescu L, et al.