Determining the Minimal Important Change of Everyday Functioning in Dementia

Background and Objectives: Decline in everyday functioning is a key clinical change in Alzheimer disease and related disorders (ADRD). An important challenge remains the determination of what constitutes a clinically meaningful change in everyday functioning. We aimed to investigate this by establishing the minimal important change (MIC): the smallest amount of change that has a meaningful effect on patients' lives. We retrospectively investigated meaningful change in a memory clinic cohort.
Methods: In the first, qualitative part of the study, community-recruited informal caregivers of patients with ADRD and memory clinic clinicians completed a survey in which they judged various situations representing changes in everyday functioning. Their judgments of meaningful change were used to determine thresholds for MIC, both for decline and improvement, on the Amsterdam Instrumental Activities of Daily Living (IADL) Questionnaire. In the second, quantitative part, we applied these values in an independent longitudinal cohort study of unselected memory clinic patients.
Results: MIC thresholds were established at the average threshold of caregivers (N = 1,629; 62.4 ± 9.5 years; 77% female) and clinicians (N = 13): −2.2 points for clinically meaningful decline and +5.0 points for clinically meaningful improvement. Memory clinic patients (N = 230; 64.3 ± 7.7 years; 39% female; 60% dementia diagnosis) were followed for 1 year, 102 (45%) of whom showed a decline larger than the MIC, after a mean of 6.7 ± 3.5 months. Patients with a dementia diagnosis and more atrophy of the medial temporal lobe had larger odds (odds ratio [OR] = 3.4, 95% CI [1.5–7.8] and OR = 5.0, 95% CI [1.2–20.0], respectively) for passing the MIC threshold for decline than those with subjective cognitive complaints and no atrophy.
Discussion: We were able to operationalize clinically meaningful decline in IADL by determining the MIC. The usefulness of the MIC was supported by our findings from the clinical sample that nearly half of a sample of unselected memory clinic patients showed a meaningful decline in less than a year. Disease stage and medial temporal atrophy were predictors of functional decline greater than the MIC. Our findings provide guidance in interpreting changes in IADL and may help evaluate treatment effects and monitor disease progression.

Alzheimer disease and related disorders (ADRD) are characterized by a gradual decline in cognitive and daily functioning, eventually leading to dementia. 1 Although changes in cognitively complex "instrumental activities of daily living" (IADLs) may occur in preclinical and prodromal disease stages, 2,3 little is known about the clinical meaningfulness of these initial changes. Determining clinical meaningfulness has become especially important because treatment and prevention studies are increasingly targeting early populations. 4,5 Regulatory agencies emphasize that the clinical efficacy of newly developed drugs should be predicated on a meaningful effect on relevant outcome measures. 6 The clinical meaningfulness of changes addresses a fundamental issue: What amount of change on a clinical outcome measure constitutes a change that is meaningful, or important, for the patient? This question has only been sparsely investigated, and definitions are inconsistent. Some have argued that the mere presence of any change in performance on questionnaires addressing everyday functioning is clinically meaningful. 7,8 Others have reasoned that clinical meaningfulness comprises prediction of future conversion from normal cognition to mild cognitive impairment (MCI) or dementia. 9 The first definition may overgeneralize and include changes due to noise, whereas the second may miss more subtle changes that can still have an impact on a patient's life. In the present work, we use the term "minimal important change" (MIC), which has been defined as the smallest within-person change that is important to the patient. 10,11 The MIC can be determined using anchors, 12 in which an external appraisal of the change, such as a single question on global perceived change, is used as an "anchor" to determine a MIC on an instrument (e.g., "On a scale of 0-10, how would you describe the patient now, compared with 1 year ago? [0: no change; 10: much worse]").
A downside of this method is that the MIC then depends on the anchor and the anchor's quality. It has been shown that the anchor can be more strongly influenced by the patient's final status rather than reflecting the actual change. 13 An alternative can be found in a new systematic, qualitative approach 14 in which stakeholders (i.e., patients, caregivers, and clinicians) are asked to compare fictional patient summaries with different levels of impairment in the area that is being measured. Thresholds are then placed at the first point where the stakeholders indicate that a difference is meaningful. 14 The thresholds thus represent the MIC, and any change beyond it is deemed clinically meaningful.
We set out to establish the thresholds for MIC on the Amsterdam IADL Questionnaire (A-IADL-Q), an extensively validated measure of everyday functioning. 15,16 Subsequently, we applied the MIC thresholds to data from a cohort of memory clinic patients and registered how many passed the MIC threshold and which demographic, biological, and neuropsychological factors were associated with surpassing the MIC threshold.

Methods
Our study comprised 2 parts: a qualitative part to establish the MIC thresholds and a quantitative part in which we applied the MIC to a cohort of memory clinic patients, to investigate the frequency of passing the MIC threshold within 1 year and which factors were associated with surpassing the MIC threshold.

Standard Protocol Approvals, Registrations, and Patient Consent
This study was approved by the ethical review board of the VU University Medical Center. All included participants provided informed consent for the use of their data in accordance with the Declaration of Helsinki.

Establishing MIC Thresholds
Participants
We recruited participants for an online survey to establish MIC thresholds on the A-IADL-Q through the Dutch Brain Research Registry (hersenonderzoek.nl). 17 We selected people who indicated that they were direct relatives and/or informal caregivers of people with a dementia-related diagnosis. Potential participants were excluded if they reported having received such a diagnosis themselves. Recruitment ran from February to April 2020. We also invited clinicians (neurologists, geriatricians, nurse specialists, and neuropsychologists) working in memory clinics in the Netherlands to complete the same survey.
Glossary: ADRD = Alzheimer disease and related disorders; A-IADL-Q = Amsterdam IADL Questionnaire; Aβ = amyloid beta 1-42; GDS = Geriatric Depression Scale; IADLs = instrumental activities of daily living; IRT = item response theory; MCI = mild cognitive impairment; MIC = minimal important change; MMSE = Mini-Mental State Examination; MTA = medial temporal atrophy; OR = odds ratio; SCD = subjective cognitive decline; WAIS = Wechsler Adult Intelligence Scale; ZBI = Zarit Burden Interview.

Materials: A-IADL-Q
The A-IADL-Q is an adaptive questionnaire aimed at measuring functional impairment in early dementia. 15 The questionnaire is self-administered and completed by a caregiver. Previous studies have shown robust psychometric properties, including sensitivity to change and good construct validity. 18,19 The questionnaire consists of 70 items assessing cognitively complex everyday activities. The total scores ("T-scores") are computed using item response theory (IRT), which uses mathematical models to calculate probabilities for item endorsement given a person's ability. This scoring method is described in more detail elsewhere. 16,19 The T-scores have a mean of 50 and a SD of 10 in the memory clinic. Lower scores indicate more impairment.

Materials: Vignettes
We created 18 vignettes using IRT item parameters that showed the most likely item responses at different total scores, that is, at different levels of functional impairment. To find the most likely responses at various T-scores, we used a script created by Morgan and colleagues. 20 To obtain the optimal balance between distinguishable levels of functional impairment and small distances between the vignettes, we placed them 0.2 SDs apart. We created 6 reference vignettes, which were spread across the total score distribution, representing different base levels of functioning. Cases were given a random sex and common last name and placed at the following T-scores:

Procedures
Survey respondents (both caregivers and clinicians) were randomly branched into 1 of 6 groups, each of which received a different "case" with a unique reference vignette. They were then shown 7 "comparison vignettes," which ranged from −8 to +6 points from the reference vignette.
Following previously outlined procedures, 14 we presented vignettes in pairs, with the reference vignette representing the patient's functioning "1 year ago" and each comparison vignette representing a new situation "now." Respondents judged whether the functioning "now" was better, worse, or the same as "1 year ago" (Figure 1). If the respondent considered there to be a decline or an improvement, they were then asked to state whether the decline or improvement in functioning would make a meaningful difference in everyday life. This was the core question of the survey. If the respondent judged both vignettes to represent the same level of daily functioning, the next situation was shown.
Individual MIC thresholds resulting from the survey responses represent the smallest change indicated as being meaningful. Thus, the score difference for the first situation that the respondent rated as a meaningful change in daily functioning was considered the threshold for MIC. Thresholds were determined separately for decline and improvement and could range from −8 to −2 and +2 to +6, respectively. When a respondent did not rate any of the presented comparison vignettes as a clinically meaningful change, their threshold was considered missing. We also investigated 2 types of misjudgment. First, when a respondent judged a comparison vignette anchored on a score representing more severe functional impairment than the reference vignette as an improvement (or vice versa), this judgment was considered out-of-range and treated as a judgment of no change. Second, we examined paradoxical judgments. When a smaller distance between reference and comparison vignettes was rated as a meaningful change and a larger distance was not (e.g., a 4-point decrease is judged as meaningful, whereas a 6-point decrease is not), the latter judgment is considered paradoxical.
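The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring code used in the study: the function names and the data layout are hypothetical, and the offsets are assumed to be −8 to +6 in steps of 2 points (0.2 SD), excluding 0, which is what the 7 comparison vignettes and the stated threshold ranges imply. Whether an out-of-range judgment can also count as paradoxical is not specified in the text; here it is simply treated as "no meaningful change" before paradoxical judgments are counted.

```python
from typing import Dict, Optional, Tuple

# Assumed score differences (comparison minus reference) in T-score points.
OFFSETS = [-8, -6, -4, -2, 2, 4, 6]

Judgment = Tuple[str, bool]  # (direction: "worse" | "same" | "better", meaningful?)


def is_meaningful(offset: int, direction: str, meaningful: bool) -> bool:
    """Out-of-range judgments (direction contradicts the sign of the score
    difference) and 'same' judgments are treated as no meaningful change."""
    expected = "worse" if offset < 0 else "better"
    return direction == expected and meaningful


def mic_thresholds(
    judgments: Dict[int, Judgment],
) -> Tuple[Optional[int], Optional[int], int]:
    """Return (decline threshold, improvement threshold, paradoxical count).

    A threshold is None (missing) when no change in that direction was
    rated as meaningful.
    """
    flags = {off: is_meaningful(off, *judgments[off]) for off in OFFSETS}
    declines = [off for off in OFFSETS if off < 0 and flags[off]]
    improvements = [off for off in OFFSETS if off > 0 and flags[off]]
    # The threshold is the smallest absolute change rated as meaningful.
    decline_thr = max(declines) if declines else None
    improve_thr = min(improvements) if improvements else None
    # Paradoxical: a larger change in the same direction was not rated as
    # meaningful although a smaller one was.
    paradoxical = sum(
        1
        for off in OFFSETS
        if not flags[off]
        and (
            (off < 0 and decline_thr is not None and off < decline_thr)
            or (off > 0 and improve_thr is not None and off > improve_thr)
        )
    )
    return decline_thr, improve_thr, paradoxical


# Hypothetical respondent.
example = {
    -8: ("worse", True),
    -6: ("worse", False),  # paradoxical, because -4 was rated meaningful
    -4: ("worse", True),
    -2: ("worse", False),
    2: ("same", False),
    4: ("worse", True),    # out-of-range, treated as no change
    6: ("better", True),
}
print(mic_thresholds(example))  # (-4, 6, 1)
```

For this hypothetical respondent, the decline threshold is −4, the improvement threshold is +6, and one judgment (the 6-point decline) is paradoxical.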

Participants and Procedures
Next, we applied the MIC thresholds retrospectively to a cohort of consecutive memory clinic patients and their caregivers from the Amsterdam Dementia Cohort, 21 who visited Alzheimer Center Amsterdam for dementia screening between July 2013 and May 2015. Eligibility criteria were (1) a completed baseline A-IADL-Q from the screening visit, (2) the presence of a caregiver, (3) the availability to complete the follow-up A-IADL-Q online at home, and (4) adequate knowledge of the Dutch language. We did not select for diagnosis.
At the baseline screening visit, caregivers completed the A-IADL-Q, while the patients underwent a standard neuropsychological test battery. The screening visit also included a neurologic examination, brain MRI, and a lumbar puncture. 21 Diagnoses were made in a multidisciplinary consensus meeting in which the results from the screening visit were discussed. 21 Clinical diagnoses were made according to the criteria for subjective cognitive decline (SCD), MCI, dementia, Alzheimer disease, frontotemporal dementia, dementia with Lewy bodies, and vascular dementia. 21 Non-Alzheimer disease types of dementia were grouped to avoid small group sizes.
Caregivers were then invited to complete the A-IADL-Q from home at 4 follow-up waves: 3, 6, 9, and 12 months after baseline. At each follow-up wave, caregivers were also asked to rate on a visual analogue scale ranging from 0 (no decline/no burden) to 100 (very large decline/very large burden) (1) how much they thought the patient had declined from baseline and (2) how much burden they experienced from taking care of the patient. These 2 questions served as anchors. Caregivers could opt out at any point during the study. Invitations to participate were sent through e-mail at each wave, even when a previous wave was missed, unless the caregiver explicitly opted out of the study.

Clinical Measures
A standardized neuropsychological assessment was performed at baseline and included the Dutch version of the Auditory Verbal Learning Task 22 and the Visual Association Test, 23 to measure episodic memory. The Trail Making Test, Part B, 24 Wechsler Adult Intelligence Scale (WAIS) digit span backward, 25 letter fluency, 26 and Stroop Color-Word Task card III 27 were used to measure executive functioning. Attention and speed were measured using the Trail Making Test, Part A, 24 Stroop Color-Word Task card I, 27 the Letter Digit Substitution Test, 28 and the WAIS digit span forward. 25 Language tasks included the naming portion of the Visual Association Test 23 and the category fluency (animal naming) task. 26 We calculated Z-scores for the neuropsychological domains: episodic memory, executive functioning, attention/speed of processing, and language. Before Z-scoring, tests were reverse scored as necessary so that higher Z-scores represent better cognitive functioning. The Z-scores were computed using the means and SDs of the measures in the entire sample.
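As an illustration of the Z-scoring step, the sketch below reverse scores a test where lower raw values are better, standardizes each test against the whole sample, and combines tests into a domain score. The helper names and the data are hypothetical. Averaging the test Z-scores within a domain is an assumption, because the text does not state how the domain scores were composed, and missing data are not handled.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def z_score_test(raw: Sequence[float], higher_is_better: bool) -> List[float]:
    """Z-score one test using the mean and SD of the entire sample."""
    # Reverse score first (here by negation) so that higher is always better.
    oriented = [x if higher_is_better else -x for x in raw]
    m, sd = mean(oriented), stdev(oriented)
    return [(x - m) / sd for x in oriented]


def domain_z(tests: Sequence[Tuple[Sequence[float], bool]]) -> List[float]:
    """Combine several tests (one raw score per patient) into one domain score.

    Assumption: the domain score is the mean of the test Z-scores.
    """
    per_test = [z_score_test(raw, hib) for raw, hib in tests]
    return [mean(patient_scores) for patient_scores in zip(*per_test)]


# Hypothetical data for 4 patients.
tmt_a_seconds = [30.0, 45.0, 60.0, 90.0]   # completion time, lower is better
digit_span_forward = [9.0, 8.0, 6.0, 5.0]  # span score, higher is better

attention = domain_z([(tmt_a_seconds, False), (digit_span_forward, True)])
print([round(z, 2) for z in attention])
```

Because each test is standardized within the sample, the resulting domain scores average to zero, and the first hypothetical patient (fastest and longest span) receives the highest score.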
The Mini-Mental State Examination (MMSE) was used as an indication of general cognitive performance, with higher scores representing better cognition. 29 The 15-item version of the Geriatric Depression Scale (GDS) was used as an indicator for depressive symptoms, 29 with higher scores representing more severe depressive complaints. The Zarit Burden Interview (ZBI) was used to determine the level of burden the caregiver experienced from caring for the patient, with scores ranging from 0 to 88 and higher scores indicating a larger caregiver burden. 30

Biological Measures
At baseline, patients underwent a standard MRI protocol on a 1.5 or 3 Tesla scanner. 21 All scans were visually rated by a radiologist who was blind to other clinical information. Visual rating scales were used on T1-weighted and fluid-attenuated inversion recovery images to provide measures of atrophy and other neurodegenerative structural changes and included the medial temporal atrophy (MTA) scale, 31 the posterior atrophy scale, 32 the global cortical atrophy scale, 33 and the Fazekas scale 34 for white matter hyperintensities. Cerebral microbleeds were counted.
Figure 1. First, 2 vignettes are shown side-by-side, with one representing functioning of a fictional patient 1 year ago (the "reference vignette" on the top left, anchored in the example at T = 46) and the other representing functioning now (the "comparison vignette" on the top right, anchored in the example at T = 42). The respondent is asked to indicate whether they think the problems have worsened, remained the same, or improved from 1 year ago to now. Depending on the answer, they will be asked a follow-up question to determine whether the change (if any) was meaningful.
Amyloid beta 1-42 (Aβ) levels in CSF were measured using ELISA (Innogenetics-Fujirebio, Ghent, Belgium) at the Neurochemistry Laboratory. 35 We dichotomized amyloid status into negative or positive for AD based on our center's cutoff of <813 pg/mL. 36 We also computed the ratio between phosphorylated tau and Aβ. A subset of participants underwent amyloid PET scans, using 11C-Pittsburgh compound-B, 18F-flutemetamol, 18F-florbetapir, or 18F-florbetaben. The result of the PET scan was dichotomized as either negative or positive for AD based on visual read by an independent nuclear radiologist.
APOE genotyping was performed after automated genomic DNA isolation from 2 to 4 mL EDTA blood. The DNA was subjected to PCR testing, checked for size and quantity using a QIAxcel DNA Fast Analysis kit (Qiagen), and sequenced using Sanger sequencing on an ABI130XL. Patients with either 1 or 2 ε4 alleles were classified as APOE ε4 carriers.

Statistical Analyses
To obtain MIC thresholds, we averaged individual thresholds separately for each of the 6 cases, as well as across all informal caregivers, all clinicians, and the entire survey sample. We then established the final MIC thresholds as the average of the caregivers' average threshold and the clinicians' average threshold.
In the clinical cohort, patients were divided into 3 groups at each follow-up visit based on whether they surpassed the thresholds for MIC: (1) patients showing no meaningful change, (2) patients showing a meaningful decline, and (3) patients showing a meaningful improvement. In addition, patients were also classified in the same groups as based on their last visit (i.e., final status). The time in months from baseline to the first visit at which the MIC thresholds were surpassed was also recorded.
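The grouping described above can be illustrated as follows. This is a sketch under stated assumptions rather than the analysis code of the study: the function names and the example scores are hypothetical, the thresholds are the values reported in the Results (−2.2 and +5.0 T-score points), and the comparison is inclusive ("or more"). Only completed waves are passed in, so the final status is based on the last completed visit.

```python
from typing import List, Optional, Tuple

MIC_DECLINE = -2.2      # T-score points; a change at or below this is a meaningful decline
MIC_IMPROVEMENT = 5.0   # T-score points; a change at or above this is a meaningful improvement


def classify_change(baseline: float, follow_up: float) -> str:
    """Classify the change from baseline against the MIC thresholds."""
    change = follow_up - baseline
    if change <= MIC_DECLINE:
        return "decline"
    if change >= MIC_IMPROVEMENT:
        return "improvement"
    return "no meaningful change"


def summarize_patient(
    baseline: float, visits: List[Tuple[int, float]]
) -> Tuple[Optional[str], Optional[Tuple[int, str]]]:
    """visits holds (months since baseline, T-score) for completed waves, in order.

    Returns the final status (at the last completed visit) and the first
    visit at which either threshold was surpassed, as (months, group).
    """
    labels = [(months, classify_change(baseline, score)) for months, score in visits]
    final_status = labels[-1][1] if labels else None
    first_surpassed = next(
        ((months, group) for months, group in labels if group != "no meaningful change"),
        None,
    )
    return final_status, first_surpassed


# Hypothetical patient with a baseline T-score of 50 and 4 completed waves.
visits = [(3, 49.0), (6, 47.0), (9, 51.0), (12, 46.5)]
print(summarize_patient(50.0, visits))  # ('decline', (6, 'decline'))
```

In this hypothetical case the patient first passes the threshold for decline at 6 months, drifts back within the thresholds at 9 months, and is classified as showing a meaningful decline at the final visit.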
Group differences were tested using linear or logistic regressions, as appropriate. The Tukey range test was used to correct for multiple comparisons. Possible attrition bias was investigated by comparing baseline characteristics of patients who completed the last follow-up wave with those who dropped out.
Finally, we ran multinomial logistic regression models to identify baseline characteristics that were associated with the MIC groups (decline or improvement greater than the MIC, with no change beyond the MIC as the reference group), including screening instruments (MMSE, GDS, ZBI, diagnostic group), neuropsychological assessments (episodic memory, executive functioning, attention, processing speed, and language domain Z-scores), Alzheimer disease genetic risk factors and amyloid biomarkers, and MRI. All factors were investigated individually, with adjustments for sex, education, baseline age, and syndrome diagnosis (SCD, MCI, or dementia). Analyses were run in R version 4.1.1, 37 using the "nnet" package version 7.3-16 for the multinomial logistic regressions. 38
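For readers less familiar with the output of such models: a multinomial logistic regression yields coefficients on the log-odds scale relative to the reference group, and an odds ratio is obtained by exponentiating the coefficient. The sketch below shows this conversion in Python. It assumes a Wald-type 95% CI (coefficient ± 1.96 standard errors), which is a common choice but is not stated in the text, and the coefficient and standard error are hypothetical values, not results from this study.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds coefficient and its standard error into an
    odds ratio with a Wald-type confidence interval (assumed method)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for one predictor in the "decline" equation.
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(beta=1.2, se=0.4)
print(f"OR = {odds_ratio:.1f}, 95% CI [{ci_low:.1f}, {ci_high:.1f}]")
```

An interval that excludes 1 indicates that the predictor is associated with membership of that MIC group relative to the reference group at the 5% level.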

Data Availability
Data not provided in the article because of space limitations may be shared (anonymized) at the request of any qualified investigator for purposes of replicating procedures and results.

Results
Almost all caregivers (n = 1,599; 98%) rated at least one of the situations as showing an important decline. An overview of how many caregivers reached the MIC threshold in each situation is given in eTable 2, links.lww.com/WNL/C83. We observed a difference in the proportion of caregivers who reached the threshold between those who saw the case with the lowest reference T-score (Mr. Garcia, T = 34) and all other cases (p < 0.001). The average MIC threshold for decline was 2.4 ± 1.0 points among all caregivers (Table 1). The average threshold varied by the reference vignette: Caregivers who judged the Mr. Garcia case with the lowest T-score had the highest average threshold. The average threshold was also significantly higher in the group of caregivers who judged the case with a T-score of 50, compared with the other groups. Most participants (n = 1,216; 75%) made no paradoxical judgments for decline. Clinicians unanimously rated the smallest decline in scores as an important decline, placing the clinicians' MIC for decline at −2.0.
Most participants (n = 1,078; 66%) made no paradoxical judgments for improvement. Only 362 caregivers (22%) rated any of the improvements as important. In the groups where the reference vignette had a higher level of functioning (T = 54 and T = 50), more caregivers reached the MIC threshold for improvement. The average MIC threshold for improvement was 4.7 ± 1.3 points (Table 1). Five clinicians detected a meaningful improvement, with an average threshold of 5.2 ± 1.1.
Taken together, the MIC threshold for decline was established at −2.2 (i.e., the average of −2.4 for caregivers and −2.0 for clinicians), with a decline of 2.2 points or more indicating a meaningful decline. The MIC threshold for improvement was established at +5.0 (i.e., the average of +4.7 for caregivers and +5.2 for clinicians), meaning that an increase in the T-score of 5.0 points or more shows a meaningful improvement in everyday functioning.
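The arithmetic behind these final thresholds is an unweighted average of the 2 group means, so clinicians (N = 13) and caregivers (N = 1,629) contribute equally. The short sketch below reproduces it from the values given above and, purely for contrast, shows what a sample-size-weighted mean for decline would look like. The function names are hypothetical, and the weighted variant is an illustration only, not an analysis reported in the study.

```python
def final_mic(caregiver_mean: float, clinician_mean: float) -> float:
    """Unweighted average of the two stakeholder group means."""
    return (caregiver_mean + clinician_mean) / 2


def pooled_mean(mean_a: float, n_a: int, mean_b: float, n_b: int) -> float:
    """Sample-size-weighted mean, shown only for comparison."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


mic_decline = final_mic(-2.4, -2.0)
mic_improvement = final_mic(4.7, 5.2)
print(f"{mic_decline:.2f} {mic_improvement:.2f}")  # -2.20 4.95

# Weighted by the number of respondents with a decline threshold
# (1,599 caregivers and 13 clinicians, as reported above).
pooled_decline = pooled_mean(-2.4, 1599, -2.0, 13)
print(f"{pooled_decline:.2f}")  # -2.40
```

The unweighted improvement average is 4.95, which is reported as +5.0. The weighted decline mean stays at about −2.4 because the caregiver group is so much larger, which shows that the unweighted average gives the clinicians' judgments more influence than their numbers alone would.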
The number of patients showing a meaningful decline from baseline increased with each follow-up wave, whereas the number of patients showing meaningful improvement or no meaningful change decreased. In subsequent analyses, we used the groups as defined at the patient's last completed visit. At the last visit, 104 patients (45%) showed a meaningful decline, whereas 36 (16%) showed a meaningful improvement. The remaining 90 patients (39%) did not show a meaningful change during their follow-up. The anchors indicated a stronger decline from baseline in patients who surpassed the MIC for decline (mean 39.0 ± 30.0) than in patients who showed no meaningful change (19.3 ± 21.5; mean difference p < 0.001) or a meaningful improvement (12.1 ± 17.2; mean difference p < 0.001). Similarly, caregivers experienced a greater burden from taking care of patients who surpassed the MIC for decline (38.2 ± 28.5) than from taking care of patients who did not change meaningfully (29.2 ± 26.0; mean difference p < 0.001) or patients who surpassed the MIC for improvement (15.7 ± 23.2; mean difference p < 0.001).
Table 3 summarizes the number of patients who reached the MIC thresholds for decline and improvement and the average time in months it took to reach them, for the entire sample, as well as for each diagnostic group separately. There were no significant differences between any of the diagnostic groups in time to

Discussion
In this study, we involved informal caregivers and clinicians of patients with ADRD to determine what amount of change in functional impairments constitutes a clinically meaningful change. We established thresholds for the MIC, both for evaluating meaningful decline and meaningful improvement on the A-IADL-Q. We found that patients with dementia and more severe atrophy of the medial temporal lobe were more likely to show a meaningful decline in daily functioning than patients with SCD and with no atrophy.
The clinical meaningfulness of changes in cognitive and functional measures is of vital importance to track disease progression in clinical practice. It is also important for evaluating potential treatment effects. Full approval by the US Food and Drug Administration of disease-modifying treatments is contingent on the evidence of a meaningful benefit, 6 yet the interpretation of outcome measures remains difficult, 39 and there is considerable variability in how clinical meaningfulness is defined and investigated. Consensus is yet to be reached. 40 Some methods have methodological and conceptual limitations, including inadequate reliability and validity. 14,41,42 Distribution-based methods rely on statistics and are neither informed by clinical information nor do they translate to what is clinically meaningful. External anchors can give an indication of the perceived magnitude or importance of a change, but they may also be affected by current status, 13,42 which renders them less reliable for investigating the clinical meaningfulness of changes. More importantly, neither method considers input from the target population, although only the individuals themselves, and those who are close to them, can indicate whether a change is impactful. Still, these methods are commonly used in dementia research, 8,43-46 possibly because more elaborate qualitative approaches require extensive work. Our study is unique in the field of ADRD research in that it uses a systematic qualitative method involving the most important stakeholders.
Overall, we found that most caregivers considered the smallest amount of decline clinically meaningful. This suggests that even subtle decline in IADL functioning has a meaningful impact on the daily life of a patient. Depending on the base level of functioning, slightly differing amounts of change were considered meaningful. When someone's level of functioning is more impaired, a stronger decline may be necessary before it is considered meaningful. When functioning is relatively good, a small decline in functioning seems to have a meaningful impact.
When looking at changes in the opposite direction, we found that only when impairments were initially relatively limited did more than half of the respondents identify important improvements. However, it is of interest that the threshold for minimal important improvement was higher when the level of functioning was better, compared with when there were more impairments at baseline. This finding seems to suggest that meaningful improvement from a more impaired status may require a somewhat smaller change, whereas meaningful improvement from a less impaired baseline may only occur when the change is relatively large.
This last finding links to another important point of discussion in the context of disease-modifying treatments and prevention studies: Does the absence of a meaningful decline constitute a clinical benefit or should a meaningful improvement be achieved? We found that determining the threshold for meaningful improvement was much more difficult than for decline. Less than a quarter of caregivers considered any of the situations to represent a meaningful improvement, which seems to imply that improvements in functioning need to be larger before they have an impact on daily life. However, it is also possible that imagining an improvement in daily functioning in the context of dementia is difficult because this is currently not a reality. With the rapid advances in drug development, 47 the exercise of establishing MIC thresholds on outcome measures may need to be repeated as our understanding of what is possible changes.
The second part of our study was to apply the MIC thresholds in a real-life data set. Just under half of a nonselected group of memory clinic patients passed the MIC threshold for decline within 1 year and thus showed a meaningful decline, on average within approximately 7 months. Patients who were diagnosed with dementia were more likely to show a meaningful decline than those diagnosed with subjective cognitive decline. Those with more MTA were more likely to show a meaningful decline than those with no atrophy. When the caregiver experienced a larger burden, the patient was less likely to surpass the MIC threshold for improvement. These findings provide further evidence that biological and cognitive factors underlie changes in IADL functioning: We previously found that any decline in IADL functioning was associated with disease severity, i.e., that patients with dementia declined faster than patients with subjective cognitive complaints, 18 and that worse IADL performance was associated with atrophy in the medial temporal lobe. 48 Studies with other IADL measures related changes in IADL to disease stage, 3 amyloid burden, 49 and executive functioning, 50 irrespective of the clinical meaningfulness of changes. In the present work, we show that disease stage, atrophy, and caregiver burden are associated with clinically meaningful changes in everyday functioning. It is therefore recommended that these factors be included in research of disease progression.
This study has some limitations. The qualitative method we used in the first part of our study is relatively new, which means that methodological guidelines are yet to be established. We followed earlier work and presented changes that ranged from one-fifth to four-fifths of a SD in the total score. Had we presented a smaller amount of change (e.g., a tenth of a SD), it is possible that the MIC thresholds would have been lower still. However, such small changes may have been too subtle to distinguish and may also fall within the measurement error of the instrument. Similarly, if we had included larger amounts of change, more respondents may have reached the MIC threshold for improvement, which would then be more reliable. Future studies could replicate our findings in new samples, including outside of the Netherlands and representing individuals with different backgrounds and older ages. In the second part of our study, nonadherence was quite high. Dropouts and missed visits may have affected our estimates of the number of patients who passed the MIC thresholds. It is possible that patients who declined more severely discontinued their participation in the study, which may have led to an underestimation of actual decline. We did not find that patients who dropped out differed from those who completed the last visit, making this a less likely explanation. A further limitation was that we applied the MIC thresholds retrospectively and therefore did not ask the participants in the clinical sample whether they agreed with the MIC category that their loved one fell into. However, we did find that, on the anchor questions, participants indicated that their loved ones declined more strongly and that the caregiver burden was larger, when the patient passed the MIC for meaningful decline.
A particular strength of this study was our qualitative approach to establish thresholds for meaningful changes, involving different stakeholders (informal caregivers and clinicians). The frequent measurements with short intervals allowed us to pinpoint after how much time each patient first passed the threshold for meaningful decline. Finally, all patients underwent an elaborate diagnostic workup which provided a clear clinical diagnosis and allowed us to investigate a range of baseline characteristics to relate to IADL changes.
In conclusion, we performed a crucial investigation of the clinical meaningfulness of changes in IADL functioning. We applied a qualitative method involving stakeholders to determine the smallest amount of change in everyday functioning that has a meaningful impact on the patient's life and applied the thresholds we established to a cohort of memory clinic patients. Our findings have implications for evaluating possible treatment effects in clinical trials, as well as for monitoring disease progression in clinical practice.