Measuring progression of cerebral white matter lesions on MRI
Visual rating and volumetrics

Abstract
Objective: To evaluate the concordance of a volumetric method for measuring white matter lesion (WML) change with visual rating scales.
Methods: The authors selected a stratified sample of 20 elderly people (mean age 72 years, range 61 to 88 years) with an MRI examination at baseline and at 3-year follow-up from the community-based Rotterdam Scan Study (RSS). Four raters assessed WML change with four different visual rating scales: the Fazekas scale, the Scheltens scale, the RSS scale, and a new visual rating scale that was designed to measure change in WML. The authors assessed concordance with a volumetric method with scatter plots and correlations, and interobserver agreement with intraclass correlation coefficients.
Results: For assessment of change in WML, the Fazekas, Scheltens, and periventricular part of the RSS scale showed little correlation with volumetrics, and low interobserver agreement. The authors’ new WML change scale and the subcortical part of the RSS scale showed good correlation with volumetrics. After additional training, the new WML change scale showed good interobserver agreement for measuring WML change.
Conclusions: Commonly used visual rating scales are not well suited for measuring change in white matter lesion severity. The authors’ new white matter lesion change scale is more accurate and precise, and may be of use in studies focusing on progression of white matter lesions.
Cerebral white matter lesions (WML) are thought to result from small vessel disease, and their presence and severity increase with age and with arterial hypertension.1-3 Although the clinical significance of these lesions remains to be fully understood, WML have been associated with dementia, depression, and stroke.4-6 In healthy elderly people, WML are associated with poorer cognitive function and with depressive symptoms.7-9 Patients with Alzheimer disease (AD), vascular dementia, and depression have more severe WML than controls.4,5 It has been suggested that WML progress gradually over time and may ultimately lead to subcortical vascular dementia and vascular depression, or contribute to the clinical expression of AD.10 Few studies have determined progression of WML over time, and their findings are difficult to compare because they used different visual rating scales to assess WML progression.11-13 Evaluating WML progression is clinically important, since it is needed to determine the natural course of these lesions and to study the effect of interventions. Visual rating scales have proven their value in cross-sectional studies, but very little is known about the sensitivity and reliability of these scales for measuring change in WML over time. Volumetric methods may provide the most objective assessment, but they are often time consuming and therefore not always feasible in large studies. The objective of the present study was to evaluate three commonly used visual rating scales (the Fazekas scale,14 the Rotterdam Scan Study [RSS] scale,3 and the Scheltens scale15) in terms of accuracy and precision in measuring change of WML in a defined population. We compared the degree of concordance with a volumetric method, and the reproducibility of these scales. In addition, we introduce a simple visual rating scale that was designed to measure change in WML over time.
We compared the performance of this WML change scale with the other three visual rating scales, and with the volumetric method.
Methods.
Subjects.
The scan material used in the present study originates from participants in the RSS, a population-based study designed to investigate causes and consequences of age-related brain changes in elderly people.7 In 1995 through 1996, 1,077 nondemented elderly people aged 60 to 90 years underwent a baseline examination that included a cranial MRI scan. In 1999 through 2000, 787 of the 973 participants who were alive and eligible (not institutionalized, not moved abroad) were re-examined (response rate 81%). Of these, 668 underwent a second MRI (69% of the 973 eligible participants). We selected scan pairs from 10 participants, in a nonrandom manner, to serve as a training set. Additionally, we randomly selected 20 participants who had a baseline and follow-up scan, within three strata (tertiles) of baseline subcortical WML severity as assessed with the RSS scale.3 To cover the whole range of the WML distribution, we selected seven participants from the first tertile, seven from the second, and six from the third. The mean age of participants was 72 years (range 61 to 88 years), 10 (50%) were women, and 8 (40%) had hypertension. The mean time between the first and second MRI was 3.3 years (range 2.9 to 4.0 years).
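The stratified random selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function names and the input format (a dictionary from a participant ID to baseline subcortical WML severity) are hypothetical, and tertile cut points are computed with Python's `statistics.quantiles`. With many tied values (for example, participants without any subcortical lesions), the tertile strata can end up unequal in size.

```python
import random
import statistics


def tertile_cutpoints(values):
    """Return the two cut points that split `values` into tertiles."""
    low, high = statistics.quantiles(values, n=3)
    return low, high


def stratified_sample(severity, per_stratum=(7, 7, 6), seed=None):
    """Randomly draw per_stratum[i] participants from tertile i of severity.

    severity: dict mapping a participant ID to baseline subcortical WML
    severity (hypothetical input; only participants with both scans).
    """
    rng = random.Random(seed)
    low, high = tertile_cutpoints(list(severity.values()))
    strata = ([], [], [])
    for pid, value in severity.items():
        if value <= low:
            strata[0].append(pid)
        elif value <= high:
            strata[1].append(pid)
        else:
            strata[2].append(pid)
    sample = []
    for members, k in zip(strata, per_stratum):
        # random.sample raises ValueError if a stratum has fewer than k members
        sample.extend(rng.sample(members, k))
    return sample
```

Sampling within tertiles, rather than from the whole pool, guarantees that mild, intermediate, and severe baseline WML are all represented in a small sample of 20.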
MRI scanning and white matter lesions.
Axial T1-, T2-, and proton density-weighted cerebral MR scans were made on a 1.5-Tesla scanner (Siemens, Erlangen, Germany). The following pulse sequences were applied: T1 (700 msec/14 msec/2 [repetition time/echo time/excitations]), T2 (2,200 msec/80 msec), and proton density (2,200 msec/20 msec). Slice thickness was 5 mm, with an interslice gap of 1 mm, and a matrix size of 192 × 256 pixels. MRI protocols were identical at baseline and at follow-up. We defined WML as hyperintense lesions located in the cerebral white matter that are visible on both T2- and proton density-weighted images and do not have a hypointense center on proton density-weighted images (as lacunes do). Lesions were considered periventricular when directly adjacent to the ventricles and subcortical otherwise. If a periventricular lesion extended 10 mm perpendicularly from the ventricular border, the part extending beyond that distance was by definition scored as a subcortical lesion.
Rating scales.
We assessed baseline WML severity and WML progression with four different visual rating scales. The Fazekas scale rates WML in both the periventricular and subcortical region on a 0 to 3 scale.14 The Scheltens scale rates WML in the periventricular region on a 0 to 6 scale and in the subcortical region on a 0 to 24 scale, on the basis of the size and number of the lesions.15 It also includes ratings for the basal ganglia and infratentorial areas, which were not used in this report. The RSS scale rates periventricular WML on a 0 to 9 scale; for subcortical WML, a lesion volume is approximated from the number and size of the lesions.3 In addition, we designed and used a new simple scale to measure WML change: the WML change scale. In this scale, change in WML (−1 decrease, 0 no change, +1 increase) is scored in three periventricular locations (frontal caps, lateral bands, occipital caps), resulting in a periventricular score of −3 to +3, and in four subcortical locations (frontal, parietal, temporal, and occipital), resulting in a subcortical score of −4 to +4. Increase is defined as the occurrence of a new focal lesion or the enlargement of a previously visible lesion; decrease is defined as the reverse (i.e., disappearance or shrinkage).
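A minimal sketch of how the per-location ratings of the WML change scale combine into the two regional scores, assuming each location is rated −1, 0, or +1 as defined above. The dictionary-based input format and the function names are hypothetical.

```python
PERIVENTRICULAR = ("frontal caps", "lateral bands", "occipital caps")
SUBCORTICAL = ("frontal", "parietal", "temporal", "occipital")
ALLOWED_RATINGS = (-1, 0, 1)  # decrease, no change, increase


def region_score(ratings, locations):
    """Sum the per-location change ratings for one region."""
    missing = set(locations) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total = 0
    for location in locations:
        rating = ratings[location]
        if rating not in ALLOWED_RATINGS:
            raise ValueError(f"{location}: rating must be -1, 0 or +1, got {rating!r}")
        total += rating
    return total


def wml_change_scores(periventricular, subcortical):
    """Return (periventricular score in -3..+3, subcortical score in -4..+4)."""
    return (
        region_score(periventricular, PERIVENTRICULAR),
        region_score(subcortical, SUBCORTICAL),
    )


pv = {"frontal caps": 0, "lateral bands": 1, "occipital caps": 1}
sc = {"frontal": 0, "parietal": 1, "temporal": 0, "occipital": -1}
print(wml_change_scores(pv, sc))  # (2, 0)
```

Because each location contributes at most one point, a lesion that grows substantially and a lesion that grows slightly both add +1; the scale counts locations with change rather than the amount of change.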
Visual rating system.
All ratings were performed at the VU medical center. The MRI studies were in digital format. Four raters (N.D.P., E.C.W.v.S., E.J.v.D., M.S.) rated WML on baseline and follow-up images using the four visual rating scales. Raters were blinded to clinical information, but not to name, age, and scan year. WML were rated on proton density- and T2-weighted images by direct side-by-side comparison of the scans on a personal computer, using the viewing program Radworks (version 5.1, Applicare, Zeist, the Netherlands). To optimize the comparability of the baseline and follow-up scans, the images had been registered and resliced with the software package Mirit, which uses mutual information as the optimization criterion.16 After a training session in which the four raters, working in pairs, assessed the 10 scan pairs of the training set, a consensus meeting was held among the authors to identify and resolve any differences in application of the various scales. Each rater then individually scored the 20 series of baseline and follow-up MRI studies. The rating scales were always applied in the same order: first the Fazekas scale, second the RSS scale, third the Scheltens scale, and finally our WML change scale. Because raters knew which scan was the baseline and which the follow-up, their ratings may have been biased toward finding an increase in WML severity. To estimate this potential systematic measurement error, two raters reassessed the 20 series with the WML change scale in the native domain, first blinded to scan date and 2 weeks later not blinded to scan date.
Volumetric assessments.
We quantified WML volume on proton density images on a workstation (Sparc 5; SUN, Palo Alto, CA). One reader identified lesions on the registered images and then determined the lesion areas with in-house software (Show_Images, version 3.6.1) in the native domain, to avoid artificial enlargement of lesion areas due to reslicing. We used a seed-growing method to determine WML areas on each slice for periventricular WML (frontal caps, lateral bands, occipital caps) and subcortical WML (frontal, parietal, temporal, and occipital).17 WML areas were not recorded separately for the right and left hemisphere. By summing the lesion area on each slice multiplied by the interslice distance, we calculated total WML volumes for the different regions. The volumetric assessments were performed twice, 6 months apart, and the mean of the two assessments was used in the analyses. The intraclass correlation coefficients for intrarater agreement on the baseline assessment were 0.84 for periventricular and 0.97 for subcortical WML volume, with an SD of the difference between the two ratings of 1.4 mL for periventricular and 0.86 mL for subcortical WML.18
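The volume calculation described above (the lesion area on each slice multiplied by the interslice distance, summed over slices) can be illustrated as below. The default interslice distance of 6 mm is an assumption derived from the 5-mm slice thickness plus 1-mm gap given under MRI scanning; the function names and inputs are hypothetical, and the seed-growing segmentation that produces the per-slice areas is not shown.

```python
def lesion_volume_ml(slice_areas_mm2, interslice_distance_mm=6.0):
    """Sum of per-slice lesion areas times the interslice distance.

    slice_areas_mm2: lesion area on each slice, in mm^2.
    interslice_distance_mm: ASSUMED 5 mm slice + 1 mm gap = 6 mm.
    Returns the volume in mL (1 mL = 1,000 mm^3).
    """
    if any(a < 0 for a in slice_areas_mm2):
        raise ValueError("areas must be non-negative")
    return sum(slice_areas_mm2) * interslice_distance_mm / 1000.0


def regional_volumes_ml(areas_by_region, interslice_distance_mm=6.0):
    """Apply lesion_volume_ml to each region, e.g. 'occipital caps', 'parietal'."""
    return {
        region: lesion_volume_ml(areas, interslice_distance_mm)
        for region, areas in areas_by_region.items()
    }


volumes = regional_volumes_ml({"occipital caps": [120.0, 250.0, 80.0], "parietal": [15.0]})
# occipital caps: 450 mm^2 * 6 mm = 2,700 mm^3 = 2.7 mL; parietal: 0.09 mL
```

Multiplying by the center-to-center distance rather than the slice thickness assumes that lesion area changes smoothly across the unsampled gap between slices.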
Data analysis.
For the volumetric assessment, change in WML volume was calculated by subtracting the baseline WML volume from the follow-up volume.19 Pearson’s correlation coefficient was used to assess the relation between baseline WML severity and WML change in the periventricular and subcortical region. For the visual rating scales (Fazekas scale, Scheltens scale, RSS scale), change in WML score was calculated by subtracting the baseline from the follow-up rating for each rater separately. Progression on the visual rating scales was defined as an increase of 1 point or more. We made scatter plots to visualize the relation between the change in WML assessed with the volumetric method (in mL) and with the visual rating scales. Furthermore, we assessed concordance between the visual scales and volumetrics with the nonparametric Spearman’s rho, for which a value of 0 indicates no relationship between the variables and a value of 1 indicates perfect correlation. We quantified the interobserver agreement on the visual rating scales with intraclass correlation coefficients. The intraclass correlation coefficient is the biologic variation between participants divided by the sum of that variation and the variation between raters. We estimated the possible bias in the visual ratings that may have been introduced by the raters’ knowledge of which images were the baseline and which the follow-up. Bias was expressed as the mean difference in scores on the WML change scale between not-blinded and blinded ratings.18 Furthermore, we assessed the 95% limits of agreement between the not-blinded and blinded methods.
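The change scores, the progression criterion, and the rank correlation described above can be sketched in plain Python. Spearman’s rho is computed here as Pearson’s correlation of the ranks, with tied values receiving their average rank, which is one standard way to handle the many ties produced by coarse visual scales. The function names are hypothetical and this is not the software used in the study. Because a coarse scale may yield the same change score for every participant, the sketch returns NaN when either variable has no variation.

```python
import math


def change_score(baseline, follow_up):
    """Change = follow-up minus baseline (volume in mL, or scale points)."""
    return follow_up - baseline


def progressed(baseline_score, follow_up_score):
    """Progression on a visual scale: an increase of 1 point or more."""
    return change_score(baseline_score, follow_up_score) >= 1


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either variable is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, volumetric changes of 0.2, 1.5, 3.0, and 4.1 mL paired with scale changes of 0, 1, 1, and 2 points give rho of about 0.95: the two tied scale scores keep the correlation below 1 even though the ordering never disagrees.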
Results.
WML severity at baseline and change.
In the periventricular region, the median WML volume measured with the volumetric method was 3.3 mL (range 1.6 to 10.4) at baseline, with a median increase of 0.7 mL (range −2.1 to 6.7); in the subcortical region, the median baseline volume was 0.2 mL (range 0 to 15.2), with a median increase of 0.1 mL (range −0.4 to 3.5). The mean increase was 1.4 mL (SD 2.2) in the periventricular region and 0.5 mL (SD 0.9) in the subcortical region. Given the mean interval of 3.3 years between scans, this corresponds to mean rates of increase of 0.42 mL per year in the periventricular region and 0.15 mL per year in the subcortical region. Figure 1 shows an example of WML progression in the periventricular and subcortical region.
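The annual rates above follow from dividing the mean volume change by the mean scan interval of 3.3 years reported under Subjects. The sketch below reproduces that arithmetic, assuming this is how the rates were derived; averaging each participant’s own change divided by his or her own interval could give slightly different values. The function name is hypothetical.

```python
def annual_rate(mean_change_ml, mean_interval_years):
    """Mean change divided by mean follow-up interval, in mL per year."""
    if mean_interval_years <= 0:
        raise ValueError("interval must be positive")
    return mean_change_ml / mean_interval_years


MEAN_INTERVAL_YEARS = 3.3  # mean time between baseline and follow-up MRI
for region, mean_change in (("periventricular", 1.4), ("subcortical", 0.5)):
    print(f"{region}: {annual_rate(mean_change, MEAN_INTERVAL_YEARS):.2f} mL/year")
# periventricular: 0.42 mL/year
# subcortical: 0.15 mL/year
```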
Figure 1. White matter lesion (WML) progression in an 88-year-old woman who participated in our study. The composite shows a slice from the baseline study (left) and a corresponding slice from the follow-up study (right). After 3.5 years, WML progression has occurred (arrows) in the left and right occipital caps (periventricular region), extending into the parietal subcortical regions.
WML volumes at baseline were positively correlated with WML change (Pearson correlation coefficient 0.70, p = 0.001 for periventricular WML; 0.90, p < 0.001 for subcortical WML). Table 1 gives the WML severity at baseline and WML change, as well as the number of participants with WML progression, as assessed with the different visual rating scales by the four raters. Several methods showed an average increase in WML in both the periventricular and subcortical region, but the number of participants showing progression varied widely between methods.
Table 1 WML severity at baseline, change, and number of participants with progression for the four visual rating scales and for the four raters
Correlation between volumetric assessment and visual rating scales.
We evaluated the concordance between volumetric WML change and the change assessed with the visual rating scales. This was done separately for each of the four raters, and after averaging the visual ratings of the four raters to reduce noise due to interobserver disagreement. Figure 2 shows scatter plots of the relationship between WML change measured with the volumetric assessment and with the visual rating scales (average of four raters) in the periventricular and subcortical region. Visual inspection of the scatter plots shows comparatively good agreement between the WML change scale and the volumetric method, although the WML change scale tends to overestimate lesion change in the subcortical region (figure 2H) and may systematically underestimate lesion change in the periventricular region as volume change gets larger (figure 2D). Table 2 gives the nonparametric Spearman’s rho between the volumetric method and the four visual rating scales for WML change. Only the subcortical part of the RSS scale and the WML change scale showed significant correlation with the volumetric method for raters 1, 2, and 3, and for the average of the four raters (see table 2).
Figure 2. Scatter plots showing the relation between change in white matter lesions (WML) measured with the volumetric method (x-axis) and the different visual rating scales (y-axis) in the periventricular region: (A) Fazekas scale, (B) Rotterdam Scan Study (RSS) scale, (C) Scheltens scale, (D) WML change scale; and in the subcortical region: (E) Fazekas scale, (F) RSS scale, (G) Scheltens scale, (H) WML change scale.
Table 2 Correlation between volumetrics and visual rating of WML change assessed by the nonparametric Spearman rho test
Interobserver agreement.
The intraclass correlation coefficients for the interobserver agreement on baseline WML severity and change for the visual rating scales, including our new WML change scale, are presented in table 3. Values <0.20 were considered to reflect poor agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 good agreement, and 0.81 to 1.00 very good agreement.18 The interobserver agreement for the baseline ratings on the Fazekas, Scheltens, and RSS scales was fair to good for the periventricular region and good to very good for the subcortical region (see table 3). However, agreement on change was poor for the Fazekas and Scheltens scales, fair for the WML change scale and the periventricular part of the RSS scale, and moderate for the subcortical part of the RSS scale.
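The text does not specify which intraclass correlation model was used. As one possibility, the sketch below computes a one-way random-effects ICC, often written ICC(1,1), from ANOVA mean squares, comparing the variation between participants with the variation among raters scoring the same participant; two-way models, which separate systematic rater differences from residual error, can give different values. The category labels follow the thresholds listed above, with 0.20 itself treated as poor (an assumption, since the listed ranges leave that boundary open). Function names are hypothetical.

```python
import math


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) from ANOVA mean squares.

    ratings: one row per participant, each row holding the scores given
    by the same k raters (k >= 2).
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 participants, each scored by the same k >= 2 raters")
    row_means = [sum(row) / k for row in ratings]
    grand_mean = sum(row_means) / n
    # between-participant mean square
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # within-participant mean square: rater disagreement plus residual error
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    denominator = msb + (k - 1) * msw
    if denominator == 0:
        return float("nan")  # every score identical: ICC undefined
    return (msb - msw) / denominator


def agreement_category(icc):
    """Map an ICC to the verbal categories used in the text."""
    if math.isnan(icc):
        raise ValueError("ICC is undefined")
    if icc <= 0.20:
        return "poor"
    if icc <= 0.40:
        return "fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "very good"
```

With this form, any disagreement among raters on the same participant inflates the within-participant mean square and lowers the ICC, which is why change scores, having a narrower spread between participants than baseline scores, tend to show lower agreement.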
Table 3 Interobserver agreement for the visual rating scales for baseline and change measurements
The raters had used the existing rating scales of Fazekas, Scheltens, and the RSS before and were thus better acquainted with them. In a post hoc study performed after additional training with the new WML change scale, two of the raters rated 200 additional scan pairs. In this second sample, the interobserver agreement was 0.73 for the periventricular region and 0.72 for the subcortical region, indicating good agreement.
Effect of blinding to the scan date.
The mean differences in score on the WML change scale between the not-blinded and blinded methods were +0.075 (SD 0.40) points in the periventricular region and −0.025 (SD 0.44) points in the subcortical region. This indicates no substantial bias toward higher progression when images are scored with knowledge of which are the baseline and which the follow-up images. The 95% limits of agreement between the not-blinded and blinded methods were −0.71 to +0.86 for the periventricular region and −0.84 to +0.89 for the subcortical region, which suggests that for an individual, the not-blinded and blinded methods are unlikely to disagree by more than one point on the WML change scale.
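The bias and 95% limits of agreement above correspond to the mean paired difference plus or minus 1.96 standard deviations of the differences (the Bland–Altman approach). The sketch below computes them both from paired scores and from reported summary values; the function names are hypothetical. Applied to the periventricular summary values (mean +0.075, SD 0.40), it reproduces the reported limits of −0.71 to +0.86.

```python
import statistics


def bias_and_limits(not_blinded, blinded, z=1.96):
    """Mean paired difference (bias) and its 95% limits of agreement."""
    if len(not_blinded) != len(blinded) or len(blinded) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(not_blinded, blinded)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, (bias - z * sd, bias + z * sd)


def limits_from_summary(bias, sd, z=1.96):
    """95% limits of agreement from a reported mean difference and SD."""
    return bias - z * sd, bias + z * sd


low, high = limits_from_summary(0.075, 0.40)
print(f"periventricular: {low:+.2f} to {high:+.2f}")  # periventricular: -0.71 to +0.86
```

Since the WML change scale moves in whole points, limits that stay within ±1 mean that blinded and not-blinded ratings of the same participant would rarely differ by a full scale step.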
Discussion.
We evaluated three commonly used visual rating scales and one new simple visual rating scale in terms of their ability to measure change in WML severity on MRI. We assessed the concordance of the visual assessments with volumetric change, and quantified the reproducibility of the scales for measuring WML change. In a stratified sample from a defined population, over a 3-year period, both the volumetric method and the visual rating scales showed, on average, an increase in WML. We found significant correlations with volumetric change for the subcortical part of the RSS scale and for the new WML change scale. The interobserver agreement on change was moderate for the subcortical part of the RSS scale and fair for the WML change scale. In a post hoc study, the interobserver agreement for the WML change scale improved to good after the observers had become familiar with the scale. The Fazekas, Scheltens, and periventricular part of the RSS scale showed poor correlation with volumetric change, and poor to fair interobserver agreement on the change measurements.
Several methodologic issues need to be addressed. First, there is currently no gold standard for the assessment of WML change, and our volumetric method cannot be interpreted as such. Second, the comparison between volumetric WML change and WML change measured with the different visual scales is complicated by differences in the type of data (continuous versus categorical) the methods yield. We used rank correlation to evaluate the relationship between the visual scales and volumetrics. Unlike agreement, correlation is not affected by the scale of measurement, but it does depend on the range of values in the sample.19 Therefore, the presented correlations cannot be interpreted as agreement between the visual scales and volumetrics, although they do allow comparison between the visual scales. Third, visual rating was performed side by side and with knowledge of the time sequence of the scans, which may have biased ratings toward higher progression. However, we evaluated this possible bias in a post hoc analysis and found the effect to be very small. Fourth, registration of the follow-up scans to the baseline scans may have caused a blurring effect; although this effect was judged to be small, it may have contributed to higher progression ratings with the visual rating scales.
The Fazekas, Scheltens, and RSS scales were designed for cross-sectional assessments of WML. When applied cross-sectionally, these scales show good intra- and interobserver agreement, which largely corresponds to our finding of fair to very good interobserver agreement for the baseline assessments.3,15,20 A previous study reported that visual ratings with the Fazekas and Scheltens scales correlate significantly with quantitative volumetric assessments.20 However, visual assessment of WML change with these scales is problematic, and there are several explanations for their disappointing performance in measuring change. As shown in our data and in other data sets, baseline WML severity is positively correlated with WML change.11 WML that were already rated in the highest category at baseline (and that are most likely to progress) cannot contribute to progression on these scales because of a ceiling effect. Furthermore, new lesions may develop, or existing lesions may grow, without crossing the category boundaries of the scales, so that the change goes undetected. The subcortical part of the RSS scale performed better in capturing change, most likely because it incorporates the number and size of lesions in more detail, thus avoiding a ceiling effect. However, the subcortical part of this scale is elaborate and time consuming. Our new WML change scale, which was designed to measure WML change, not only appeared to be valid but also takes less time to apply. Although agreement on progression was initially only fair, it improved to good after additional training.
We found that during a 3-year period WML volume increased at a mean rate of 0.42 mL per year in the periventricular region and 0.15 mL per year in the subcortical region. These figures on rate of change do not directly reflect the rate of change in the population at large because of the way we constructed our sample for this validation study. The NHLBI Twin Study reported a mean WML volume increase of 0.38 mL per year in 168 individual male twins with a mean age of 72 years,21 which is comparable to the range of our findings on rate of change.
When we compare the interobserver agreement on the visual scales for the baseline assessments in the present study to those reported in the literature, interobserver agreement was comparable for periventricular WML on the Fazekas scale (0.37 versus 0.35 to 0.74) and for subcortical WML on the RSS scale (0.90 versus 0.88), higher for subcortical WML on the Fazekas scale (0.84 versus 0.34 to 0.78) and Scheltens scale (0.84 versus 0.69), but lower for periventricular WML on the Scheltens scale (0.56 versus 0.71) and RSS scale (0.64 versus 0.79 to 0.90).3,15,20
Acknowledgments
Supported by the Netherlands Organization for Scientific Research (grant 904-61-096).
The authors thank Dirk Knol for statistical advice.
- Received May 27, 2003.
- Accepted in final form December 1, 2003.
References
1. Longstreth WT, Jr., Manolio TA, Arnold A, et al. Clinical correlates of white matter findings on cranial magnetic resonance imaging of 3301 elderly people. The Cardiovascular Health Study. Stroke. 1996; 27: 1274–1282.
2. Pantoni L, Garcia JH. Pathogenesis of leukoaraiosis: a review. Stroke. 1997; 28: 652–659.
3. de Leeuw FE, de Groot JC, Achten E, et al. Prevalence of cerebral white matter lesions in elderly people: a population based magnetic resonance imaging study. The Rotterdam Scan Study. J Neurol Neurosurg Psychiatry. 2001; 70: 9–14.
4. Barber R, Scheltens P, Gholkar A, et al. White matter lesions on magnetic resonance imaging in dementia with Lewy bodies, Alzheimer’s disease, vascular dementia, and normal aging. J Neurol Neurosurg Psychiatry. 1999; 67: 66–72.
5. O’Brien J, Desmond P, Ames D, Schweitzer I, Harrigan S, Tress B. A magnetic resonance imaging study of white matter lesions in depression and Alzheimer’s disease. Br J Psychiatry. 1996; 168: 477–485.
6.
7.
8. DeCarli C, Murphy DG, Tranh M, et al. The effect of white matter hyperintensity volume on brain structure, cognitive performance, and cerebral metabolism of glucose in 51 healthy adults. Neurology. 1995; 45: 2077–2084.
9. Steffens DC, Krishnan KR, Crump C, Burke GL. Cerebrovascular disease and evolution of depressive symptoms in the Cardiovascular Health Study. Stroke. 2002; 33: 1636–1644.
10. Awad IA, Johnson PC, Spetzler RF, Hodak JA. Incidental subcortical lesions identified on magnetic resonance imaging in the elderly. II. Postmortem pathological correlations. Stroke. 1986; 17: 1090–1097.
11. Schmidt R, Fazekas F, Kapeller P, Schmidt H, Hartung HP. MRI white matter hyperintensities: three-year follow-up of the Austrian Stroke Prevention Study. Neurology. 1999; 53: 132–139.
12. Veldink JH, Scheltens P, Jonker C, Launer LJ. Progression of cerebral white matter hyperintensities on MRI is related to diastolic blood pressure. Neurology. 1998; 51: 319–320.
13.
14.
15.
16.
17. van Walderveen MA, Barkhof F, Hommes OR, et al. Correlating MRI and clinical disease activity in multiple sclerosis: relevance of hypointense lesions on short-TR/short-TE (T1-weighted) spin-echo images. Neurology. 1995; 45: 1684–1690.
18. Altman DG. Practical statistics for medical research. Boca Raton: Chapman & Hall, 1999.
19.
20. Kapeller P, Barber R, Vermeulen RJ, et al. Visual rating of age-related white matter changes on magnetic resonance imaging: scale comparison, interrater agreement, and correlations with quantitative measurements. Stroke. 2003; 34: 441–445.
21. DeCarli C, Swan GE, Park M, Reed T, Wolf PA, Carmelli D. Longitudinal changes in brain and white matter hyperintensity volumes among elderly male twins from the NHLBI twin study. Neurology. 2002; 58 (suppl 3): A399.