Revealing Individual Neuroanatomical Heterogeneity in Alzheimer Disease Using Neuroanatomical Normative Modeling

Background and Objectives
Alzheimer disease (AD) is highly heterogeneous, with marked individual differences in clinical presentation and neurobiology. To explore this, we used neuroanatomical normative modeling to index regional patterns of variability in cortical thickness. We aimed to characterize individual differences and outliers in cortical thickness in patients with AD, people with mild cognitive impairment (MCI), and controls. Furthermore, we assessed the relationships between cortical thickness heterogeneity and cognitive function, β-amyloid, phosphorylated-tau, and ApoE genotype. Finally, we examined whether cortical thickness heterogeneity was predictive of conversion from MCI to AD.

Methods
Cortical thickness measurements across 148 brain regions were obtained from T1-weighted MRI scans from 62 sites of the Alzheimer's Disease Neuroimaging Initiative. AD was determined by clinical and neuropsychological examination with no comorbidities present. Participants with MCI had reported memory complaints, and controls were cognitively normal. A neuroanatomical normative model indexed cortical thickness distributions using a separate healthy reference data set (n = 33,072), which used hierarchical Bayesian regression to predict cortical thickness per region using age and sex, while adjusting for site noise. Z-scores per region were calculated, resulting in a Z-score brain map per participant. Regions with Z-scores < −1.96 were classified as outliers.

Results
Patients with AD (n = 206) had a median of 12 outlier regions (out of a possible 148), with the highest proportion of outliers (47%) in the parahippocampal gyrus. For 62 regions, over 90% of these patients had cortical thicknesses within the normal range. Patients with AD had more outlier regions than people with MCI (n = 662) or controls (n = 159) (F(2, 1,022) = 95.39, p = 2.0 × 10^−16). They were also more dissimilar to each other than people with MCI or controls (F(2, 1,024) = 209.42, p = 2.2 × 10^−16). A greater number of outlier regions were associated with worse cognitive function, CSF protein concentrations, and an increased risk of converting from MCI to AD within 3 years (hazard ratio 1.028, 95% CI 1.016–1.039, p = 1.8 × 10^−16).

Discussion
Individualized normative maps of cortical thickness highlight the heterogeneous effect of AD on the brain. Regional outlier estimates have the potential to be a marker of disease and could be used to track an individual's disease progression or treatment response in clinical trials.

Alzheimer disease (AD) is the commonest cause of dementia and is characterized by a progressive deterioration in cognitive functioning and independence. 1 The AD spectrum comprises substantial clinical and biological differences between patients, which are recognized in clinical and research criteria. 2 These differences include variations in genetic basis, 3 symptom profile, age at onset, trajectory and severity, 4,5 biomarker readouts (e.g., CSF β-amyloid [Aβ] levels), 6 comorbidities, 7 and atrophy patterns. 8 Despite this, conventional statistical analyses focus on group averages. This approach assumes that AD affects different patients in similar ways, 9 and so characterizes only the average patient. To reach the goal of precision medicine for AD, we need to look beyond the average and design statistical approaches that reflect patient heterogeneity at the individual level.
Neuroimaging has revealed that differences in brain structure are very common in patients with AD. 10 Neuroimaging methods are the gold standard for understanding the in vivo brain; 11 specifically, structural imaging has been described as the imaging workhorse of neurodegeneration and is commonly recommended in AD diagnostic guidelines. 12 With this in mind, large structural neuroimaging data sets are increasingly available for dementia, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI), the Open Access Series of Imaging Studies (OASIS), and the National Alzheimer's Coordinating Center, and for the general population (e.g., UK Biobank [UKB] and the Human Connectome Project). These data sets provide the ability to chart variation across cohorts and facilitate individual prediction.
Furthermore, large neuroimaging data sets have supported the development and application of data-driven methods in AD research. These methods have revealed that differences in brain structure are very common in patients. 8,13 Moreover, they have enabled the estimation of disease subtypes from neuroimaging data, as a way to disentangle heterogeneity by grouping patients by distinctive neurobiological and cognitive characteristics 8,10,13,14 and disease progression. 15 Such subtypes have the potential to stratify patient groups for clinical decision making, such as regarding treatment strategy, services and therapies tailored to clinico-radiologic phenotype, and/or trial enrollment. 16,17 Nevertheless, there are challenges associated with the clinical translation of neuroimaging-derived subtypes. 10 These include the validity of subtypes, how distinct subtypes are from each other, and how stable subtypes are over the disease course. 13,18 Moreover, by design, clustering assumes homogeneity within each cluster, which obscures individual-level variation and therefore limits the representation of heterogeneity in the sample. 19 For instance, individual-level variation is seen in atypical, nonamnestic AD (which accounts for up to a third of young-onset AD), which results in challenges to diagnosis and appropriate care. 17 Arguably, assessing the neurobiology of AD at the individual patient level will provide a precise understanding of each patient's disease and likely outcomes and will facilitate tailored treatment strategies. However, although this concept of patient-centered, individualized precision medicine for AD is well established, current research efforts are limited.
Neuroanatomical normative modeling is an emerging technique that captures individual-level variability in the brain. It can provide individual statistical inferences with respect to an expected normative distribution or trajectory over time. Specifically, this is done by modeling the relationship between neurobiological variables (e.g., neuroimaging features) and covariates (e.g., demographic variables such as age and sex) to map centiles of variation across a cohort (i.e., Z-scores). An individual can then be located within the normative distribution to establish to what extent they deviate from the expected pattern in each measure, and a map can be generated of where and to what extent an individual's brain differs from the norm. 20,21 This technique has been shown to be suitable for precise mapping of individual patterns of variation in brain structure across multiple psychiatric and neurodevelopmental disorders. 20,22–24 Such findings motivate the first application of neuroanatomical normative modeling to AD.
Here, we examine individual patterns of variation in brain structure in patients with AD using neuroanatomical normative modeling. Using the well-characterized, multisite ADNI data set, we applied a recent implementation of the normative modeling framework, hierarchical Bayesian regression. This technique has been shown to efficiently accommodate intersite variation and provides computational scaling, which is useful when using large studies, or combining smaller studies together, that are acquired across multiple sites in a federated learning framework. 25–27 Our main objective was to quantify spatial patterns of neuroanatomical heterogeneity using cortical thickness measures in patients with AD, people with mild cognitive impairment (MCI), and cognitively normal controls by calculating deviations from normative ranges for each brain region and then identifying statistical outliers.
Specifically, we aimed to (1) assess the extent of neuroanatomical variability between individual patients based on overlapping or distinct patterns of outliers, (2) quantify group differences in between-participant dissimilarity, (3) relate the quantity of neuroanatomical outliers to cognitive performance and AD biomarkers, and (4) examine whether the number of outliers relates to subsequent disease progression from MCI to AD.

Glossary
Aβ = β-amyloid; AD = Alzheimer disease; ADNI = Alzheimer's Disease Neuroimaging Initiative; FDR = false discovery rate; IQR = interquartile range; MCI = mild cognitive impairment; MMSE = Mini-Mental State Examination; OASIS = Open Access Series of Imaging Studies; p-tau = phosphorylated-tau; tOC = total outlier count; UCSF = University of California, San Francisco; UKB = UK Biobank.

Participants
Participants were derived from 2 data sets: (1) a reference data set that comprised healthy people across the human lifespan and (2) a clinical target data set, which included people with AD or MCI in addition to age-matched cognitively normal controls. The reference data set was made by combining data on healthy people from multiple publicly available sources, 27 including OASIS, the Adolescent Brain Cognitive Development study, and UKB, detailed in eTable 1 (links.lww.com/WNL/C774). The clinical data used in the preparation of this article were obtained from the ADNI database. 28 The criterion for study inclusion was the availability of a baseline T1-weighted MRI that passed quality control. Furthermore, participants with AD had to meet the National Institute of Neurological and Communicative Disorders and Stroke-AD and Related Disorders Association criteria for probable AD and were screened to exclude genetic risk for familial AD. Participants with MCI reported a subjective memory concern either autonomously or via an informant or clinician and had no significant levels of impairment in other cognitive domains.

Standard Protocol Approvals, Registrations, and Patient Consents
Written informed consent was obtained from all participants before experimental procedures were performed. Approval was received from an ethical standards committee for ADNI study data use.

MRI Acquisition
For the clinical data set, T1-weighted images were acquired at multiple sites using 3T MRI scanners. Detailed MRI protocols for T1-weighted sequences are available online. 29 The quality of raw scans was evaluated by the University of California, San Francisco (UCSF) before our exclusion criteria were applied. Scans were excluded based on technical problems, significant motion artifacts, and clinical abnormalities. 30

Estimation of Cortical Thickness
T1-weighted scans from both the reference and ADNI data sets were processed using a mix of FreeSurfer versions 5 and 6. Cortical thickness values were generated using the recon-all cross-sectional approach. 31 This cortical thickness algorithm calculates the mean distance between vertices of a corrected, triangulated estimated gray/white matter surface and the gray matter/CSF (pial) surface, 32 which generated the cortical thickness of each region of the Destrieux atlas. 33 This yielded the mean cortical thickness and cortical thickness values for 148 regions for each participant.
Quality control of FreeSurfer processing for the reference data set relied on automated filtering, excluding scans with a median-centered absolute Euler number higher than 25, as used in prior work. 26,27 The exclusion of outliers based on Euler numbers has been shown to be a reliable quality control strategy in large neuroimaging cohorts. 34,35 For the ADNI, quality control was based on a visual review of each cortical region performed by UCSF. Only scans that passed this quality control were used.

Neuroanatomical Normative Modeling
A hierarchical Bayesian regression model was trained on multisite data to generate normative models per region using the covariates age and sex. This was based on the population variation in the reference data set (training data); the model adaptively pools parameter estimates across sites via a shared prior over regression parameters. 27 This simultaneously accounts for intersite variation and allows sites to borrow strength from one another in a fully Bayesian framework. The advantage of training the models on the large independent data set, compared with just using the ADNI, is that the ADNI consists of many sites with small sample sizes. Training on the ADNI alone would result in unstable estimates of normative distributions that could be strongly influenced by outliers or sampling bias. Here, by training on over 33,000 scans from 9 data sets (60 sites), the model produces a stable distribution of estimates across the entire lifespan. Next, these estimates were conditioned to our specific context, using an adapted transfer learning approach. 27 The parameters of the reference normative model were recalibrated to the ADNI data set using 70% of healthy controls per ADNI site; 70% was chosen to give stable estimates of the transferred model parameters, given that many of the scan sites in the ADNI have quite small sample sizes. The remaining 30% of healthy controls plus people with MCI and patients with AD were used to assess the heterogeneity in neuroanatomical presentation. This process generated regional and mean cortical thickness Z-scores for each participant in the clinical data set, relative to the normative range of the reference data set. All modeling steps were performed using PCNtoolkit (version 0.20).
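To make the idea of a regional deviation score concrete, the following minimal sketch fits a plain least-squares line of cortical thickness on age in a toy reference sample and then scores a new individual as Z = (observed − predicted) / residual SD. This is a deliberately simplified stand-in and not the hierarchical Bayesian regression or the PCNtoolkit implementation used in the study: it ignores sex and site effects, and every function name and data value is hypothetical.

```python
from statistics import mean, stdev

def fit_normative(ages, thickness):
    """Least-squares fit of thickness = intercept + slope * age.

    Returns (intercept, slope, sd), where sd is the sample standard
    deviation of the residuals in the reference sample.
    """
    a_bar, t_bar = mean(ages), mean(thickness)
    sxx = sum((a - a_bar) ** 2 for a in ages)
    sxy = sum((a - a_bar) * (t - t_bar) for a, t in zip(ages, thickness))
    slope = sxy / sxx
    intercept = t_bar - slope * a_bar
    residuals = [t - (intercept + slope * a) for a, t in zip(ages, thickness)]
    return intercept, slope, stdev(residuals)

def z_score(model, age, observed):
    """Deviation of an observed thickness from the age-expected value, in SD units."""
    intercept, slope, sd = model
    return (observed - (intercept + slope * age)) / sd

# Hypothetical reference data for one region (thickness in mm)
ref_ages = [55, 60, 65, 70, 75, 80]
ref_thick = [2.62, 2.55, 2.53, 2.44, 2.42, 2.34]
model = fit_normative(ref_ages, ref_thick)

# Score a hypothetical 72-year-old with a thin cortex in this region
z = z_score(model, age=72, observed=2.10)
is_outlier = z < -1.96  # lower-bound threshold used in the study
```

Repeating this per region gives one Z-score map per participant; the hierarchical model additionally shares information across sites, which this sketch does not attempt.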

Statistical Analysis

Group Cortical Thickness Differences
Cortical thickness group comparisons were conducted using t tests at each region and corrected for multiple comparisons using the false discovery rate (FDR). Significant p values were mapped onto the Destrieux atlas using the R package ggseg. 36
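The multiple-comparison step can be sketched as follows. The text states only that FDR correction was applied; this example assumes the Benjamini-Hochberg step-up procedure, which is one common choice and is not confirmed by the source. The function name and the p values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a test is significant at FDR level alpha.

    Assumes the Benjamini-Hochberg step-up procedure (an assumption; the
    source does not name the FDR method).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * alpha
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Hypothetical p values from region-wise group comparisons
region_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
significant = benjamini_hochberg(region_p)
```

With these toy values, only the first two regions survive correction, even though four have uncorrected p < 0.05.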

Outlier Definition and Statistics
Outliers in terms of low cortical thickness were identified for each region, defined as Z < −1.96 (corresponding to the bottom 2.5% of the normative distribution of cortical thickness). We only used the lower bound threshold for outliers as we were interested in cortical thinning associated with neurodegeneration. The number of outliers was summed across 148 regions for each participant to give a total outlier count (tOC) across regions. Linear regression tested for group differences in mean cortical thickness Z-score and tOC. In addition, group comparisons at each region were conducted using χ² tests (FDR corrected). The Hamming distance, which counts the positions at which two binary thresholded cortical thickness outlier vectors differ, was used to measure dissimilarity between individuals. Median Hamming distances were compared between groups. To explore spatial patterns of cortical thickness outliers per group, the proportion of participants within each group whose cortical thickness was an outlier (i.e., Z < −1.96) was calculated for each region. This enabled visualization of the extent to which patterns of outlier regions overlap or are distinct. This was mapped using the Destrieux atlas via the R package ggseg. All statistical analyses were implemented in R version 3.6.2.
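The outlier definition, the tOC, and the between-participant Hamming distance described above can be illustrated with a short sketch. It uses 5 regions instead of 148 and an unnormalized Hamming distance (a raw count of differing regions); whether the original analysis normalized the distance is not stated, so that choice is an assumption. All names and Z-score values are hypothetical, and the study itself used R rather than Python.

```python
from itertools import combinations
from statistics import median

THRESHOLD = -1.96  # lower-bound cut-off: bottom 2.5% of the normative distribution

def outlier_vector(z_scores):
    """Binary vector: 1 where a region's Z-score falls below the threshold."""
    return [1 if z < THRESHOLD else 0 for z in z_scores]

def hamming(u, v):
    """Number of regions at which two binary outlier vectors differ."""
    return sum(a != b for a, b in zip(u, v))

# Hypothetical Z-score maps: 3 participants x 5 regions
z_maps = {
    "p1": [-2.4, -0.3, -2.1, 0.5, -1.0],
    "p2": [-0.2, -2.5, -2.0, 0.1, -0.8],
    "p3": [0.4, 0.2, -0.5, 1.1, -0.1],
}

vectors = {pid: outlier_vector(z) for pid, z in z_maps.items()}

# Total outlier count (tOC) per participant
toc = {pid: sum(v) for pid, v in vectors.items()}

# Pairwise dissimilarity within the group, summarized by the median
pairwise = [hamming(vectors[a], vectors[b]) for a, b in combinations(vectors, 2)]
median_distance = median(pairwise)
```

Note that p1 and p2 have the same tOC but differ in which regions are outliers, which is exactly the kind of heterogeneity the Hamming distance is meant to capture.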

Outlier Associations With Cognitive Function and CSF Markers
Linear regression adjusting for age, sex, years of education, and Clinical Dementia Rating (sum of boxes) examined the relationship between tOC and cognitive composite scores (memory using ADNI MEM or executive function using ADNI EF). 37 We assessed the interaction effects of the diagnostic group within a subsequent regression. Furthermore, linear regression adjusting for age and sex only examined the relationship between tOC and CSF markers (Aβ and phosphorylated-tau [p-tau]). Here, we also assessed the interaction effects of the diagnostic group within a subsequent regression. To stratify outlier maps in both the MCI and AD groups, we used total scores from the Mini-Mental State Examination (MMSE).

MCI to AD Conversion Analysis
Follow-up diagnosis status data, up to 3 years from the baseline scan, were obtained from 454 people with MCI. In total, 76 people with MCI at baseline had converted to AD within 3 years. We then ran a survival analysis using Cox proportional hazards regression to assess whether tOC related to the risk of converting from MCI to AD, controlling for age and sex. We used a Kaplan-Meier plot to illustrate how either a low or high tOC (split via median) can contribute to the risk of converting.
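The median split and Kaplan-Meier curves described above can be sketched as follows. This illustrates only the descriptive Kaplan-Meier step; the Cox proportional hazards regression with age and sex covariates is not reproduced here. The cohort values and function name are hypothetical.

```python
from statistics import median

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of remaining conversion-free probability.

    times:  follow-up time for each person (months to conversion or censoring)
    events: True if the person converted at that time, False if censored
    Returns [(time, survival probability)] at each time with a conversion.
    """
    curve, surv = [], 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        converted = sum(1 for ti, e in zip(times, events) if ti == t and e)
        if converted:
            surv *= 1 - converted / at_risk
            curve.append((t, surv))
    return curve

# Hypothetical MCI cohort: (tOC, months of follow-up, converted to AD?)
cohort = [(2, 36, False), (3, 30, True), (5, 36, False), (8, 24, True),
          (14, 12, True), (20, 18, True), (25, 6, True), (9, 36, False)]

# Split into low and high tOC groups at the median
cut = median(toc for toc, _, _ in cohort)
groups = {"low": ([], []), "high": ([], [])}
for toc, months, converted in cohort:
    key = "high" if toc > cut else "low"
    groups[key][0].append(months)
    groups[key][1].append(converted)

curves = {name: kaplan_meier(t, e) for name, (t, e) in groups.items()}
```

In this toy cohort the high-tOC curve drops earlier and further than the low-tOC curve, mirroring the qualitative pattern the Kaplan-Meier plot is intended to show.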

Data Availability
Statistical analysis scripts are available on GitHub (github.com/serenaverdi/ADNI_normative-modelling). The neuroanatomical normative model was generated using the PCNtoolkit software package (github.com/amarquand/PCNtoolkit). ADNI data used in this study are publicly available and can be requested following ADNI Data Sharing and Publications Committee guidelines: adni.loni.usc.edu/data-samples/access-data/

Participants
In the reference data set, a total of n = 33,072 T1-weighted MRI scans were collated across 60 sites (this sample is described in detail in Kia et al. 27 and summarized in eTable 1, links.lww.com/WNL/C774). The clinical ADNI data set amounted to 1,492 participants who were scanned across 62 sites (Table 1). Here, 70% of controls were removed from the clinical data set and were used as a calibration data set to adapt the normative model to the new sites. These controls were randomly selected and stratified across sites and gender to make sure all sites and genders were present in the adaptation set. This left a total of 1,027 participants in the final clinical data set.
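A stratified random selection of this kind might look like the sketch below. It is an illustrative assumption about how such a split can be done, not the authors' script: the function name, record fields, seed, and the rule of keeping at least one control per stratum in the calibration set are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(participants, fraction=0.7, seed=0):
    """Randomly assign `fraction` of each (site, sex) stratum to calibration.

    Returns (calibration, held_out). At least one member of every stratum
    goes to calibration so that each site and sex is represented there.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for p in participants:
        strata[(p["site"], p["sex"])].append(p)

    calibration, held_out = [], []
    for members in strata.values():
        rng.shuffle(members)
        n_cal = max(1, round(fraction * len(members)))
        calibration.extend(members[:n_cal])
        held_out.extend(members[n_cal:])
    return calibration, held_out

# Hypothetical controls: 2 sites x 2 sexes, 10 people per stratum
controls = [
    {"id": i, "site": f"site{i % 2}", "sex": "F" if i % 4 < 2 else "M"}
    for i in range(40)
]
calibration, held_out = stratified_split(controls)
```

The held-out controls would then join the MCI and AD participants in the final clinical data set, while the calibration controls are used only to adapt the model to the new sites.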

Patients With AD Have Smaller Cortical Thicknesses Than People With MCI or With Normal Cognition
Mean cortical thicknesses were compared across participant groups. Age- and sex-adjusted mean cortical thickness [...] Region-level pairwise group comparisons (148 regions in total, FDR corrected) showed higher numbers of outliers in cortical thickness in 79 regions in AD vs controls, in 63 regions in AD vs MCI, and in 1 region in MCI vs controls. Region-level group differences in outlier count were most evident within temporoparietal and, to a lesser extent, frontal and occipital regions (Figure 1A).

Patients With AD Are Less Similar to Each Other Than People With MCI or With Normal Cognition
Hamming distance matrices indicated greater within-group dissimilarity in patients with AD, relative to MCI or control participants, who were most similar to each other in spatial patterns of outliers (Figure 2 [...]

Patients With AD Have Spatially Higher Proportions of Cortical Thickness Outliers
The proportion of outliers defined within each group differed in regional patterns between AD, MCI, and control groups. This is illustrated in Figure 1B [...] gyrus was the region with the highest outlier percentage (14% of the MCI group). For the control group, only 66 regions had outliers above the expected 2.5%. The left occipital temporal lateral sulcus was the region with the highest outlier percentage (6% of controls). [...] (Figure 5A). This is illustrated within a Kaplan-Meier plot, which shows how a high tOC can contribute to the risk of converting in comparison to a low tOC (Figure 5B).

Discussion
In this study, we defined individual spatial patterns of cortical thickness outliers and illustrated that AD does not affect different people in a uniform way. Moreover, our analysis quantified and visualized these individual differences in patterns of cortical atrophy. Overall, the results of the present study provide evidence of (1) heterogeneous patterns of cortical thickness between patients with AD, (2) associations of cortical thickness heterogeneity with cognitive performance and CSF Aβ and p-tau, and (3) the potential of individualized markers of cortical thickness heterogeneity to predict survival time before conversion from the MCI stage to diagnosed AD.
Our findings both complement and offer additional information to the established understanding of AD. We observed a high tOC in patients with AD, consistent with the evidence of cortical thinning as a consequence of AD neuropathology. 38 Moreover, we also observed significant associations between cortical thinning and poor cognitive performance, a decrease in CSF Aβ, and an increase in CSF p-tau (Figure 3), which is also consistent with previous findings. 39,40 Atrophy has also been associated with the risk of progression from MCI to AD 41 (Figure 5), alongside a combination of other biomarkers. 42 Importantly, these previous studies examined the correlates of common patterns of cortical atrophy; conversely, we considered individual variability in patterns of cortical thickness, as opposed to assessing group average relationships. This highlights that individualized measures of neuroanatomy are sensitive to both nonimaging disease markers and disease progression.
The tOC has the potential to be used as an individual patient metric of poor brain health to help inform clinical decisions. Indeed, similar measures have recently been adopted as a clinical measure, that is, brain volume/thickness patient Z-scores. However, these have been calculated using different normative modeling techniques, 43,44 which base their normative population on smaller reference samples; limit modeling to just whole brain, or within specific regions; and do not account for site-related variation (i.e., site effects). These studies also did not fully relate these to clinical outcomes and cognitive scores. Our tOC can provide an optimized measure here and will translate within clinical applications for precision medicine. When assessing regional heterogeneity of the ADNI sample, we observed more outliers in patients with AD in temporal regions such as the hippocampus and the cingulate cortex. These are areas known to be sensitive to neurodegeneration in AD 45 and are responsible for clinical symptoms in AD. 46 However, looking beyond these group-average regional differences, we observe that the highest proportion of outliers in a single region was less than 50% in the AD group (Figure 1). This suggests that the individual spatial patterns of outliers in AD only partially overlap between patients; if atrophy were homogeneous (as assumed within group averages), we might expect 100% of participants to have outliers here.
Figure caption (partial): Crosses indicate censoring points (i.e., age at last diagnosis assessment). The filled color represents the 95% confidence intervals. (B) Mapped is the proportion of regional outliers among people with MCI who converted to patients with AD. AD = Alzheimer disease; MCI = mild cognitive impairment; tOC = total outlier count.
The observed variation in atrophy in the temporal lobe is consistent with subtyping studies. 8,47 Also, a recent study used normative modeling to estimate neuroanatomical heterogeneity within the ADNI cohort, which shows similarities of variation in atrophy within the temporal regions. 48 However, in comparison to these studies, our specific application of neuroanatomical normative modeling has enabled the creation of an individual metric of neuroanatomical heterogeneity, characterized the spatially distributed nature of alterations in MCI and AD, and assessed how neuroanatomical variability relates to cognitive performance, CSF biomarkers, disease progression, or genetic factors. Furthermore, our study employs a normative modeling technique (hierarchical Bayesian regression), which crucially accounts for the confounding effects of multiple scanning sites when evaluating neuroanatomical heterogeneity in AD.
Going further, our study reveals that each patient not only differs in the number of outliers they have, but the regional patterns of outliers markedly differ (Video 1). The latter is reflected in large levels of dissimilarity between individuals with AD (Figure 2). Potentially, one reason for the variable patterns of atrophy is simply disease stage, whereby more atrophy appears with greater disease progression. However, our results indicate that this is not the case, as when closely examining patients of very similar demographics and clinical characteristics, being at a comparable disease stage (e.g., based on MMSE score), heterogeneous patterns of cortical atrophy were still present (Figure 4).
It is surprising to observe that cognitively normal controls also showed some outliers, suggesting a degree of within-group heterogeneity (Figures 1 and 2). Therefore, the assumption of homogeneity in case-control studies should be made with caution, even in control groups. Statistical designs for basic research and clinical trials should better reflect this heterogeneity in brain structure.
A few considerations can be made regarding the data sets used within the study. Although the reference data set includes over 30,000 individuals, we should be cautious about assuming that it is representative of a healthy population. Also, patients who volunteer for research studies (i.e., ADNI) do not necessarily reflect the clinical population. Future neuroanatomical normative modeling studies could supplement the reference data set with MRI scans acquired from routine clinical visits, community cohorts, or other less selective sources. Finally, the reference data were processed with a variety of FreeSurfer versions. While it is impractical to unify the image processing retrospectively, the different versions of FreeSurfer may potentially add noise to the normative models. This represents an important caveat to consider and further investigate.
As the ADNI comprises more participants with early-stage dementia, examining late-stage patients with AD may offer insights into the heterogeneity in spatial patterns of atrophy across the disease course. Clinical observations have suggested that late-stage patients with AD have widespread atrophy across the brain; therefore, we may hypothesize such patients will have less heterogeneous patterns of atrophy. However, regardless of the heterogeneous patterns of atrophy, the tOC can still provide information about the extent of cortical atrophy in a given individual.
Another limitation of the ADNI data set is the underrepresentation of cognitive domains beyond memory, executive function, and language. Between a quarter and a third of the AD group exhibit parieto-occipital outliers, comparable to separate parieto-occipital predominant subtypes associated with prominent visuospatial dysfunction. 10 Further characterization of how outlier distribution relates to nonmemory/executive symptoms may be of particular clinical relevance, for example, given the implications of visuospatial dysfunction for diminished autonomy, falls risk, and appropriate services. 17,49,50
Future efforts when applying neuroanatomical normative modeling to AD data should incorporate serial neuroimaging across multiple time points. This will help define patient-level longitudinal trajectories. Mapping neuroanatomical variability using neuroanatomical normative modeling at different time points has the potential to improve predictions of disease progression or treatment response at the level of the individual patient. Apart from our MCI to AD analysis, the sample used in this study is cross-sectional, reflecting a snapshot in time, yet heterogeneity has been shown to differ temporally. 51 Potentially, data-driven staging methods (e.g., SuStaIn 15 ) may also provide clinically useful information on longitudinal trends of individual heterogeneity while taking account of an individual's disease stage.
Furthermore, it will also be valuable to map variation using other neuroanatomical metrics, such as subcortical volumes. Our methodology can be extended to include subcortical volumes by using a reference data set that has such data available. 23 Future efforts that adopt this could enrich our understanding of regional anatomic heterogeneity between patients.
We provide a quantitative approach to estimate variability in brain atrophy at the regional level for individual patients. Individualized maps of neuroanatomical outliers were related to cognitive performance and CSF biomarkers. Furthermore, the number of outliers, based on individual patterns, helped predict conversion from MCI to AD. These individual neuroanatomical maps, derived from normative models, have the potential to be a marker of AD state. These could index disease progression or even evaluate the effectiveness of potential disease-modifying treatments tailored to the individual patient.