Measuring Friedreich ataxia
Complementary features of examination and performance measures

Abstract
Objective: To assess the potential validity of performance measures and examination-based scales in Friedreich ataxia (FA) by examining their correlation with disease characteristics.
Methods: The authors assessed the properties of a candidate clinical outcome measure, the Friedreich Ataxia Rating Scale (FARS), and simple performance measures (9-hole peg test, the timed 25-foot walk, PATA test, and low-contrast letter acuity) in 155 patients with FA from six institutions, and correlated the scores with disease duration, functional disability, activities of daily living scores, age, and shorter GAA repeat length to assess whether these measures capture the severity of neurologic dysfunction in FA.
Results: Scores for the FARS and performance measures correlated significantly with functional disability, activities of daily living scores, and disease duration, showing that these measures meet essential criteria for construct validity for measuring the progressive nature of FA. In addition, the FARS and transformed performance measures scores were predicted by age and shorter GAA repeat length in linear regression models accounting for sex and testing site. Correlations between performance measures were moderate in magnitude, suggesting that each test captures separate yet related dimensions of neurologic function in FA and that a composite measure might better predict disease status. Composite measures created using cohort means and standard deviations predicted disease status better than or equal to single performance measures or examination-based measures.
Conclusions: The Friedreich Ataxia Rating Scale, performance measures, and performance measure composites provide valid assessments of disease progression in Friedreich ataxia.
Friedreich ataxia (FA) is an autosomal recessive disorder of progressive ataxia, dysarthria, diabetes, scoliosis, and cardiomyopathy.1–3 An expanded GAA triplet repeat is found in both alleles of the FRDA gene in 97% of patients.4 The length of the shorter GAA repeat provides a marker of genetic severity. The remaining patients carry a point mutation on one allele and an expanded repeat on the other.1–5 While no clinical outcome measure has been fully validated in patients with FA, a clinical examination-based measure, the Friedreich Ataxia Rating Scale (FARS), has excellent interrater reliability.6 The FARS is derived from ordinal grading of a directed neurologic examination, which may limit its sensitivity and reproducibility.
In other disorders, measures based on performance scales, such as the MS Functional Composite (MSFC), may potentially replace examination-based scales.7 The MSFC combines the timed 25-foot walk (T25FW) for ambulation, the 9-hole peg test (9HPT) for arm function, and a test for cognitive function. Low-contrast letter acuity is under evaluation as a visual component.8 A similar approach may prove effective for FA. In a small FA cohort, performance measures capture neurologic dysfunction in a manner reflecting neurologic and genetic severity.9 The present study assesses whether the FARS and performance measures reflect disease severity in a multi-institutional cohort of FA patients, compares results of performance measures directly with the FARS, and uses an approach matching that of the MSFC to create composite measures that capture the progressive nature of FA.
Methods.
We examined 155 patients with genetically confirmed FA at the University of Pennsylvania/Children's Hospital of Philadelphia (47 patients), University of California, Los Angeles (50 patients), Emory University (28 patients), University of Mississippi (14 patients), University of Minnesota (12 patients), and University of Texas Medical Branch (4 patients). The FARS, activities of daily living (ADL), and disability ratings were performed as described.6 9HPT and T25FW testing were conducted as in the MSFC.7 For the 9HPT, as a summary measure, the mean of the scores from two trials with each hand was calculated. For the T25FW, the mean of two trials was the summary measure. The reciprocal peg test (9HPT-1) and reciprocal T25FW (T25FW-1) were calculated as the reciprocal of the summary measure. For the T25FW and 9HPT, the variability between trials was recorded and expressed as the method error between the trials (ME = SD/1.414) and the coefficient of variation of the ME (CVME = 2ME/[X1+X2] × 100). The intraclass correlation coefficient (ICC) between the two trials was also calculated. When a patient could not perform the 9HPT or T25FW, it was determined whether it reflected an FA-associated abnormality (weakness, dysmetria, fatigue, disease-related pain) or was unrelated to FA (pain unrelated to disease, fracture, patient refusal, insufficient time available). Twenty patients could not complete 9HPT testing. In 19 of these, failure to perform testing was related to FA. Eighty-two patients could not complete T25FW testing. In 79, failure to perform testing was related to FA. For analysis, patients unable to complete tests due to FA-related factors were given a score of infinity while patients unable to complete the test for unrelated reasons were excluded from analysis.
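The summary measures and between-trial variability statistics described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function names are hypothetical, the SD in ME = SD/1.414 is assumed to be the SD of the between-trial differences, and X1 and X2 are assumed to be the mean scores of the first and second trials. A score of infinity (inability to perform the test for FA-related reasons) maps to a reciprocal of zero.

```python
import math
import statistics


def reciprocal_score(trial_times):
    """Mean of the trial times, then its reciprocal (hypothetical helper).

    math.inf marks a trial the patient could not perform for FA-related
    reasons; the reciprocal of an infinite time is taken as 0.
    """
    if any(math.isinf(t) for t in trial_times):
        return 0.0
    return 1.0 / statistics.fmean(trial_times)


def method_error(trial1, trial2):
    """Return (ME, CVME in percent) for paired trial scores.

    Assumption: SD is the sample SD of the between-trial differences,
    so ME = SD / sqrt(2), and CVME = 2 * ME / (X1 + X2) * 100 where
    X1 and X2 are the trial means.
    """
    differences = [a - b for a, b in zip(trial1, trial2)]
    me = statistics.stdev(differences) / math.sqrt(2)
    cvme = 2 * me / (statistics.fmean(trial1) + statistics.fmean(trial2)) * 100
    return me, cvme


# Made-up timings in seconds for three patients, two trials each.
me, cvme = method_error([10.0, 12.0, 14.0], [11.0, 12.0, 16.0])
print(f"ME = {me:.3f} s, CVME = {cvme:.1f}%")
```

Only patients who completed both trials should be passed to `method_error`, since an infinite score would make the SD undefined.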
Low-contrast letter acuity testing using low-contrast Sloan letter charts (Precision Vision, La Salle, IL) was performed using a protocol that demonstrates significant differences between controls and patients with FA.9,10 The present study utilized retroilluminated charts, thus minimizing the need for identical lighting between different centers. Patients wore their standard refractive correction for distance. Each eye was also tested separately at high contrast, allowing monocular visual acuity to be determined. One patient could not complete testing for a reason unrelated to FA and was excluded from analysis.
PATA testing was performed as described previously.6 Briefly, patients were asked to say the bisyllabic phrase “PATA” as many times as possible in 10 seconds. This was repeated once, and data were calculated as the mean of the two trials. The variability between trials was assessed in a manner analogous to the T25FW and 9HPT. Six patients could not complete PATA testing, all for reasons unrelated to FA, and were excluded from analysis. A control cohort was obtained from neurologically asymptomatic individuals at the primary institution and compared to the patients at that location.
The SF-36, a generic health-related quality of life scale, was administered to subjects 18 years or older as described previously.11 The SF-36 has established norms in the United States population, and is scored as a Mental Component Summary and a Physical Component Summary reflecting patients' perceptions of these distinct abilities.
Patients were genetically confirmed using commercial or research testing. Patients met one of the following criteria: 1) two expanded FRDA alleles (143 patients); 2) typical FA phenotype and a genetically confirmed sibling (2 patients); 3) a single expanded allele and a point mutation in the FRDA gene confirmed by DNA sequencing (6 patients; table E-1 on the Neurology Web site at www.neurology.org); 4) a single expanded allele and a presumed point mutation based on phenotypic resemblance to FA (4 patients). Demographic characteristics recorded included patient age, GAA triplet repeat length of the smaller FRDA allele, symptomatic disease duration, age at onset, age at diagnosis, and first symptom.
Creation of composite scores.
Composite scores from the 9HPT-1 and the T25FW-1 scores (designated Z2) and the 9HPT-1, the T25FW-1, and the Sloan chart scores (designated Z3) were created by the basic methods used in the MSFC.7 Raw scores from each test were tabulated and converted to Z scores for that test by subtracting the cohort mean from the raw score, and then dividing by the cohort SD to create a Z score for the test. Thus, the cohort mean score is equal to a Z score of zero, one SD unit worse than the cohort mean receives a Z score of -1, and one SD better receives a score of 1. The composite Z scores were created by averaging the Z scores of the individual components. The reliability of the two-component composite score Z2 was evaluated based on the two trials of the 9HPT and T25FW. The variability between trials was calculated and expressed as the ME displayed in Z units. The ICC between the two trials was also calculated.
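The Z-score construction described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the study's actual code: the function names are hypothetical, the sample SD (n - 1 denominator) is assumed because the text does not say which SD was used, and every component is assumed to be oriented so that a higher raw score means better function (as is the case for reciprocal times and letters read correctly).

```python
import statistics


def z_scores(raw_scores):
    """Convert raw scores to Z scores using the cohort mean and SD."""
    cohort_mean = statistics.fmean(raw_scores)
    cohort_sd = statistics.stdev(raw_scores)  # assumption: sample SD
    return [(score - cohort_mean) / cohort_sd for score in raw_scores]


def composite_z(*components):
    """Average the per-test Z scores for each patient.

    Each argument is a list of raw scores for one test, in the same
    patient order. Two components give a Z2-style composite, three a
    Z3-style composite.
    """
    per_test = [z_scores(component) for component in components]
    return [statistics.fmean(patient) for patient in zip(*per_test)]


# Made-up values for three patients.
peg_reciprocal = [0.010, 0.020, 0.030]   # 1 / seconds
walk_reciprocal = [0.05, 0.10, 0.15]     # 1 / seconds
print(composite_z(peg_reciprocal, walk_reciprocal))
```

Because each test is standardized before averaging, tests measured in different units contribute on the same scale, which is the property the composite relies on.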
Data analysis.
Correlation coefficients (Pearson or Spearman) were calculated to examine the relation of performance measure scores with disease characteristics including age, symptomatic disease duration, and disability stage.9 Nonparametric analyses were used when data did not fit assumptions for normality (skewness values substantially > 1). When the data fit assumptions for normality, multivariate linear regression analyses were used to examine the relation of measure scores with GAA repeat length accounting simultaneously for age, sex, and testing site and to examine the rate of change of different measures with disease duration. Analyses were performed using Stata 8.0 software (StataCorp, College Station, TX) and InStat (GraphPad Software). To account for multiple comparisons, p < 0.01 was the standard used for significance.
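The choice between Pearson and Spearman correlation based on skewness can be sketched as follows. This is a minimal standard-library illustration, not the Stata or InStat procedure used in the study: the function names are hypothetical, the moment-based skewness estimator is one of several in common use, and the cutoff of 1.0 is an assumed stand-in for "substantially > 1".

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5."""
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m3 = sum((v - mean) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def choose_correlation(x, y, skew_limit=1.0):
    """Use Spearman if either variable is skewed beyond the assumed limit."""
    if abs(skewness(x)) > skew_limit or abs(skewness(y)) > skew_limit:
        return "spearman", spearman(x, y)
    return "pearson", pearson(x, y)


print(choose_correlation([1, 2, 3, 4, 100], [1, 2, 3, 4, 5]))
```

Rank-based correlation depends only on ordering, so it is unaffected by a few extreme values in a skewed measure.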
Results.
The present cohort resembles those previously identified in large investigations (table E-2). The length of the shorter GAA repeat correlated with age at onset (RS = –0.60), and the typical patient used a walker based on functional disability scores (table E-3). We initially sought to ensure that two simple measures of progressive neurologic changes—the ADL scale and the functional disability scale—each reflected a decline in neurologic abilities as revealed by correlation with disease duration. Each correlated with disease duration (RS = 0.63 for ADL vs duration, RS = 0.64 for functional disability score vs duration, both p < 0.0001), and they correlated highly with each other (RS = 0.85, p < 0.0001). As the present group includes a heterogeneous population of FA patients with differing GAA repeat lengths, correlations of neurologic measures with duration should be significant but substantially less than one, as seen for the ADL and functional disability scores.
Features of the FARS.
While the ADL scale and functional disability ratings are useful for categorizing neurologic impairment, they should be insensitive to disease progression based on their ordinal/categorical properties. We hypothesized that the FARS could capture progressive neurologic dysfunction as assessed by cross-sectional analysis of the cohort in relation to disease duration, functional rating, and ADL scores (all of which reflect progressive disease) (table E-4). The total FARS score correlated highly with disability and ADL scales and significantly with disease duration. In addition, in multivariate linear regression models accounting for sex and testing site, the FARS score was predicted by GAA repeat length (p < 0.0001) and age (p < 0.0001). Other variables did not predict FARS scores (testing site, p = 0.76; sex, p = 0.74; R2 overall = 0.43). In addition, each individual FARS subscale correlated highly with duration, disability, and ADL score, with only modest differences between the lowest of these correlations (bulbar) and the highest (upright stability) (table E-5). Overall, the correlation of the complete neurologic examination with duration, disability, and ADL scores was higher than any subscore alone except for the upright stability score. Factor analysis showed that 99% of the variation among subscale scores was explained by a single factor. This shows that the FARS captures the neurologic features of the diverse phenotype of FA, including the predisposition of FA for impairment of upright stability.
Features of performance measures.
The T25FW allows patients to use their typical assist device. Of the 73 patients who completed the test, 43 used no assistive device, 8 used unilateral assistance, and 22 used bilateral assistance. Among patients completing the test, data were skewed, but analysis of the T25FW−1 created a more normal distribution (as observed in our previous study of a smaller cohort) with a skewness of 0.74 (table E-3).9 The T25FW was relatively reproducible (based on data of those able to perform the test); the ME was 1.1 seconds and the CVME between trials was 9.9%. The ICC between trials was 0.999 ± 0.001, and the rank correlation between trials was RS = 0.97. Similar reproducibility was found for the T25FW−1 with the ME being 0.0077 seconds−1 with a CVME of 11.8%. The ICC between trials was 0.994 ± 0.003, and the rank correlation was RS = 0.99.
The data from the 9HPT (calculated from the mean of both hands) were also skewed (as were data from dominant and nondominant hands individually; data not shown) but normalized with reciprocal transformation, as noted in our previous work and in the MSFC (table E-3).7,9 The correlation between hands was high (RS = 0.95). Scores from the 9HPT were also relatively reproducible. Among patients who completed the test, for the average of dominant and nondominant hands, the ME between trials was 14.5 seconds, and the coefficient of variation was 8.2%. The ICC between the trials was 0.97, while the rank correlation between trials was RS = 0.97. Similar results were found for the dominant and nondominant hand individually (data not shown). For the 9HPT−1 using dominant and nondominant hands together, the ME between trials was 0.00084 seconds−1, and the CVME was 5.6%. The ICC between the trials was 0.99, and the rank correlation between trials was RS = 0.99.
Sloan chart scores were not skewed in the summary measure (table E-3). However, scores from the high contrast chart were substantially skewed, reflecting a ceiling effect as almost all subjects successfully read the high contrast chart. Of 154 subjects, 127 had binocular high contrast Snellen acuity equivalents of 20/25 or better, and 81 had binocular acuities of 20/16.5 or better. On the high contrast chart, patients performed better with both eyes than with either eye alone. The median right-eye and left-eye scores were four letters worse than both eyes combined (both p < 0.0001). Scores correlated highly between eyes (RS = 0.80; p < 0.0001) and between charts of differing contrast (100% vs 2.5%, RS = 0.82; 100% vs 1.25%, RS = 0.78; 2.5% vs 1.25%, RS = 0.95).
PATA scores differed between controls and FA patients (p < 0.0001; data not shown). Data were normally distributed, and PATA scores were reproducible between trials (table E-3) (ME = 1.6; CVME = 10.7%). The ICC between trials was 0.93 and the rank correlation between trials was RS = 0.90.
Each performance measure correlated significantly with measures of disease progression (ADL, duration, FARS, functional disability score; table E-4), showing that they capture progressive disease, although the correlations were substantially lower for the PATA test and slightly lower for the Sloan charts. Similarly, we used linear regression models to assess whether age, GAA repeat length, testing site, and sex predicted each performance measure (table E-6). Testing site significantly predicted none of the performance measures, and sex had no significant influence on any performance measures (though it was of marginal significance in predicting T25FW−1 scores). GAA repeat length and age were significant predictors of T25FW−1, 9HPT−1, and Sloan Chart scores. Models for PATA scores were marginally significant overall, and age and GAA repeat length only marginally predicted PATA scores. These data show that performance measures capture progressive aspects of FA.
Pearson correlation coefficients among performance measures ranged from modest to high, suggesting that they capture different dimensions of neurologic dysfunction in FA (table E-7). Some of the differences reflect the differing temporal course of change of each measure (figure E-1).
These results predict that a composite measure derived by combining different performance measures might provide an appropriate measure of disease. We created composite measures using the rules designed for the MSFC, in which measures are averaged using SD units. The PATA test was not included in the composite measures due to the weaker correlation of PATA scores with disease features. The composite measures created from the combination of T25FW−1 and 9HPT−1 testing (called Z2) and from the combination of T25FW−1, 9HPT−1, and Sloan Chart (called Z3) testing remained normal in distribution (table E-3). When the scores from the two trials of the walk and peg test were treated separately, the Z2 score was reproducible; the ICC value was 0.99 between trials. The ME for the walk component was 0.094 Z units, for the peg component it was 0.086 Z units, and for the Z2 score it was 0.066 Z units.
In addition, the composite measures correlated with features of progressive disease. Both Z2 and Z3 scores correlated with ADL scores, functional disability scores, disease duration, and FARS scores with coefficients that were generally higher than those of individual performance measures (table E-4). The composite scores were also predicted by age and GAA repeat length with higher R2 values than individual performance measures (table E-6). In addition, the correlations with duration, ADL scores, and disability scores, and the results from multivariate linear regression models, were equal to or higher than those of the FARS, suggesting that the composite measures capture disease progression to a degree equal to or greater than that of the neurologic examination-based FARS. While the Z2 composite appeared to have floor effects late in FA, the Z3 composite measure changed relatively uniformly over the course of FA (figure).
Figure. Temporal course of the Friedreich Ataxia Rating Scale (FARS) and composite measures. The course of change of composite performance measures with time is demonstrated by the plot of scores with disease duration. While the Z3 composite score changed with duration over the entire illness, the FARS and the Z2 score reached a plateau in some patients in late disease.
Relation of the FARS and composite performance measures with SF-36 scores.
We also assessed whether the FARS and the composite Z scores capture subjective dysfunction in FA using a health-related quality of life (HRQOL) measure, the SF-36. Patients with FA performed worse than the norms for the American population on the Physical Component Summary (PCS) of the SF-36 (T score = 33.0 ± 9.0; p < 0.0001) but had results similar to norms on the Mental Component Summary (MCS) (T score = 50.9 ± 9.4). We then assessed whether FARS and performance measure composite scores correlated with PCS or MCS scores. The FARS (RC = 0.37; p = 0.0002), Z2 (RC = 0.41; p = 0.0001), and Z3 (RC = 0.36; p = 0.0005) scores correlated moderately with PCS scores but not with MCS scores (RC < 0.05 for all measures), showing that they capture the subjective physical dysfunction identified by patients with FA.
Utility of the FARS and composite measures among different age groups.
As FA affects both children and adults, clinical measures of FA must be useful in different age groups. We examined whether the significant correlations of the FARS and composite performance measures observed for the entire cohort were also observed when subjects were stratified by age. In subjects 18 years or younger, both FARS and composite measure scores still correlated highly with functional disability and ADL scores, and correlated moderately with disease duration (table E-8). Correlations with ADL scores, functional disability scores, and duration were also highly significant in subjects over 18. In addition, FA is sometimes subdivided into typical FA and late-onset FA (after age 25), which vary slightly in phenotype as well as in age at onset.1–3 When the data were analyzed based on age at onset, correlations of the FARS, Z2, and Z3 scores with ADL and functional disability scores remained high. Correlations were higher in each of the stratified groups than in the whole cohort, consistent with the heterogeneous nature of the entire cohort. Late onset and typical FA are also differentiated by GAA repeat length. When FARS scores were stratified based on GAA repeat length, the correlations of FARS and composite measure scores with ADL score, duration, and functional disability score remained high. These data show that each of these clinical measures captures progressive neurologic dysfunction in subgroups of our cohort.
We then examined the linear regression of FARS and composite measure scores with symptomatic disease duration, and used this to define the slope of the best fit line for the relation of measures with disease duration among different subgroups (analogous to the plots shown in the figure) (table E-9). This slope should provide an estimate of the average rate of change in function of different subgroups over time. Overall, the FARS, Z2, and Z3 scores correlated moderately with disease duration, although the duration data were minimally skewed. The slope of the regression line was relatively similar among groups with different GAA repeat lengths, being only slightly higher in the 251 to 500 and 501 to 750 repeat subgroups, and substantially lower in the individuals with GAA repeat lengths of less than 251. This confirms that individuals with very short GAA repeat lengths not only present later but also progress more slowly when viewed as a cohort.1–3 In contrast, the slope of the line for younger subjects (age 30 and under) was substantially higher than that of the overall cohort or older age groups. This shows that when the FARS and composite measures are used in a large cohort, both should be most sensitive to change among the younger (and thus usually earlier onset) subjects.
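The slope comparison described above rests on an ordinary least-squares fit of each measure against disease duration within a subgroup. A minimal sketch follows, assuming a simple one-predictor fit; the function name and the sample values are hypothetical and are not data from the study.

```python
import statistics


def fit_line(durations, scores):
    """Least-squares slope and intercept of score versus duration.

    In a cross-sectional cohort the slope serves as an estimate of the
    average change in score per year of disease.
    """
    mean_x = statistics.fmean(durations)
    mean_y = statistics.fmean(scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(durations, scores))
    sxx = sum((x - mean_x) ** 2 for x in durations)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up composite Z scores for one subgroup, by years of disease.
durations = [0.0, 5.0, 10.0, 15.0]
composite_scores = [1.0, 0.5, 0.0, -0.5]
slope, intercept = fit_line(durations, composite_scores)
print(f"estimated change per year: {slope:.3f} Z units")
```

Running the same fit separately for each subgroup (for example, by age band or GAA repeat range) yields slopes that can be compared in the way the text describes.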
Discussion.
This study demonstrates that the FARS and performance measures both capture aspects of neurologic function in FA. FARS, performance-measure, and performance-measure-composite scores correlate with disease duration, functional disability, ADL scores, and generic physical HRQOL scores. These scores (except for the PATA test) are also predicted by genetic severity of FA (accounting for age). In contrast, testing site and sex had no influence on FARS or performance measure scores. The results all support the validity of the FARS and performance scores as clinical measures of the progressive course of FA. In addition, correlations of the FARS and composite measure scores with markers of severity are highly significant in subgroups stratified by patient age, age at onset, and GAA repeat length. This suggests that both the FARS and performance measure composites may be useful in subtypes of FA.
These measures may provide complementary information when used in different clinical situations. Since all of the measures had ceiling or floor effects depending on the relative level of disability of individual patients, the measure used in any situation may be best selected for the disability level of the target FA population. For example, the performance measure composite Z3 may be the most useful measure in examining change in patients with more advanced disease, as the FARS and Z2 scores had floor effects among this group. Both the T25FW and the 9HPT had substantial floor effects, and in fact the T25FW could be performed by only about 50% of the patients. This suggests that composites using these performance tests may be most useful in FA populations in which such effects are minimized.
In addition, the continuous nature of the performance measures may provide better sensitivity than the categorical nature of the FARS in many situations. Performance-measure-based scales have excellent reliability. The 9HPT, PATA, and timed walk have high interrater reliability in FA, and Sloan letter chart testing has shown high interrater reliability both in controls and in progressive neurologic disorders such as multiple sclerosis.6,12 However, whether this potential improved sensitivity and reliability is noted in longitudinal studies also depends on the test-retest features of the measures. In the present study, the immediate test-retest method error for the two-component composite is roughly equal to the predicted yearly change based on the slope of the regression line of the Z2 score and duration. The test-retest features of performance measures over longer periods of time and the reproducibility of the FARS in test-retest situations are not yet known. Data on these properties from future studies will influence sample size calculations for clinical trials.
One method for decreasing the necessary sample size in future therapeutic trials is to identify those individuals destined to change at a faster rate and enrich clinical trial populations with such individuals. The FARS and composite scores were most sensitive to change in patients at age 30 or less. This group includes mainly subjects with early onset reaching moderate disability in their teenage and young adult years. It excludes individuals with very long GAA lengths who usually begin to reach maximal disability slightly later than age 30 and individuals with shorter GAA repeat lengths who have not presented by age 30. Based on this result, future therapeutic trials may be best directed to patients less than 30. While one might predict that GAA repeat length would be the best predictor of the average rate of change, the present cohort contains individuals with long GAA repeat lengths who are almost maximally disabled, confounding the simple use of triplet repeat length as a predictor of the average rate of change. Only among patients with GAA repeat lengths of 250 repeats or less was the predicted progression rate slower. Ideally, one could use both GAA repeat length and age as elements for stratifying the group, but the size of our cohort limits the ability to make such calculations.
Several questions remain to be investigated. Both performance measures and the FARS are susceptible to practice effects. In addition, the exact change in any measure that is clinically important in FA is unknown at present. In the present study, dimensions in the composite scores are combined based on SD units, allowing the continuous nature of individual results to be retained in a normally distributed composite score. While this improves the sensitivity of the measure, it assumes that a clinically relevant change in each measure is similarly proportional to the SD of that measure. This is supported by the correlation of scores with HRQOL results in the present study. Further studies with more detailed and disease-specific HRQOL components may suggest modifications in the weights for individual components in the composite measures, and also define the degree of change in the composites and the FARS that is clinically important to patients. In addition, a performance measure composite for FA ideally should have a measure of speech, but the PATA test barely meets criteria for capturing disease severity on its own. Further development of speech measures for use in FA is ongoing. Finally, all measures are likely to have decreased reproducibility at the extremely young end of the age range of FA, less than age 10. In this age range, neurologic abilities, including performance scores, are still improving among normal individuals, confounding all measures of declining performance.13 Our present data do not have sufficient power to determine how much this affects performance measure results in this age range, but longitudinal studies of this population may be particularly useful in this regard.
While the present study represents an early step in the development of sensitive clinical measures for assessment of FA, the best test of clinical measures in FA will involve serial, parallel comparisons of the FARS with performance and composite performance measures. Such comparison studies are ongoing and will further define the role of different measures in clinical studies in FA.
Acknowledgment
The authors thank William Hartnett, Marianne Wilcox, Martin Ohman, and the rest of the EDS team for designing the Web-based data entry system used for this multicenter study.
Footnotes
Additional material related to this article can be found on the Neurology Web site. Go to www.neurology.org and scroll down the Table of Contents for the June 13 issue to find the title link for this article.
See also page 1717
Supported by grants from the Muscular Dystrophy Association, the Friedreich Ataxia Research Alliance, and a Beeson Scholar Award from AFAR to D.R.L., and NIH grant EY 13273 to L.J.B.
Disclosure: The authors report no conflicts of interest.
Received August 17, 2005. Accepted in final form February 21, 2006.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
Subramony SH, May W, Lynch D, et al. Measuring Friedreich ataxia: interrater reliability of a neurologic rating scale. Neurology 2005;64:1261–1262.
- 7.
Cutter GR, Baier ML, Rudick RA, et al. Development of a multiple sclerosis functional composite as a clinical trial outcome measure. Brain 1999;122:871–882.
- 8.
Balcer LJ, Baier ML, Cohen JA, et al. Contrast letter acuity as a visual component for the Multiple Sclerosis Functional Composite. Neurology 2003;61:1367–1373.
- 9.
Lynch DR, Farmer JM, Wilson RL, Balcer LJ. Performance measures in Friedreich ataxia: potential utility as clinical outcome tools. Mov Disord 2005;20:777–782.
- 10.
- 11.
Ritvo PG, Fischer JS, Miller DM, et al. Multiple Sclerosis Quality of Life Inventory (MSQLI): a user's manual. New York: National Multiple Sclerosis Society, 1997.
- 12.
Balcer LJ, Baier ML, Pelak VS, et al. New low-contrast vision charts: reliability and test characteristics in patients with multiple sclerosis. Mult Scler 2000;6:163–171.
- 13.