C9orf72, age at onset, and ancestry help discriminate behavioral from language variants in FTLD cohorts

Objective We sought to characterize C9orf72 expansions in relation to genetic ancestry and age at onset (AAO) and to use these measures to discriminate the behavioral from the language variant syndrome in a large pan-European cohort of frontotemporal lobar degeneration (FTLD) cases.
Methods We evaluated expansion frequency in the entire cohort (n = 1,396; behavioral variant frontotemporal dementia [bvFTD] [n = 800], primary progressive aphasia [PPA] [n = 495], and FTLD–motor neuron disease [MND] [n = 101]). We then focused on the bvFTD and PPA cases and tested for association between expansion status, syndromes, genetic ancestry, and AAO using Fisher exact tests, analysis of variance with Tukey post hoc tests, and logistic and nonlinear mixed-effects model regressions.
Results We found C9orf72 pathogenic expansions in 4% of all cases (56/1,396). Expansion carriers were distributed differently across syndromes: 12/101 FTLD-MND (11.9%), 40/800 bvFTD (5%), and 4/495 PPA (0.8%). While addressing population substructure through principal components analysis (PCA), we defined 2 patient groups with Central/Northern (n = 873) and Southern European (n = 523) ancestry. The proportion of expansion carriers was significantly higher in bvFTD than in PPA (5% vs 0.8% [p = 2.17 × 10⁻⁵; odds ratio (OR) 6.4; confidence interval (CI) 2.31–24.99]), as well as in individuals with Central/Northern European compared to Southern European ancestry (4.4% vs 1.8% [p = 1.1 × 10⁻²; OR 2.5; CI 1.17–5.99]). Pathogenic expansions and Central/Northern European ancestry independently and inversely correlated with AAO. Our prediction model (based on expansion status, genetic ancestry, and AAO) predicted a diagnosis of bvFTD with 64% accuracy.
Conclusions Our results indicate a correlation between pathogenic C9orf72 expansions, AAO, PCA-based Central/Northern European ancestry, and a diagnosis of bvFTD, implying that complex genetic risk architectures differently underpin the behavioral and language variant syndromes.

Frontotemporal lobar degeneration (FTLD) is the second most common form of young-onset dementia after Alzheimer disease. 1 The major clinical syndromes are behavioral variant frontotemporal dementia (bvFTD) 2 and language dysfunctions, broadly called primary progressive aphasia (PPA); the latter is subdivided into semantic dementia or semantic variant PPA and progressive nonfluent aphasia (PNFA) or nonfluent/agrammatic variant PPA. 2,3 FTLD can also occur together with motor neuron disease (MND) or amyotrophic lateral sclerosis in a continuous spectrum of phenotypes. 4 In FTLD, repeat expansions in C9orf72 5 have been previously reported to occur in ~25% 6-10 of familial and ~6% 11 of sporadic cases (i.e., individuals with no clear familial history or genetic aetiology 12). Several studies have shown higher frequencies of pathogenic C9orf72 expansions in Northern vs Southern European patients (North-South axis), especially in historically isolated populations (such as the Finnish 13,14), leading to the hypothesis that a Scandinavian founder might be at the basis of the spread of the C9orf72 expansion. 15 Other studies (based on the geographic location of the recruiting sites) challenged the North-South axis concept, reporting a high frequency (~25%) of pathogenic expansions in the Spanish population 10 or implying the existence of more than 1 risk haplotype. 16-19 Patients with FTLD with abnormal C9orf72 repeat expansions exhibit marked phenotypic and pathologic heterogeneity, suggesting the presence of additional (genetic and environmental) modifiers. 20 Despite conflicting studies reporting either direct or inverse correlation between repeat length and age at onset (AAO), C9orf72 expansions have been suggested to act as a genetic modifier of AAO. 16,21-24
We analyzed 1,396 FTLD cases gathered through the International FTD Genetics Consortium (IFGC) (ifgcsite.wordpress.com/) phase III initiative, aiming at (1) characterizing C9orf72 expansions in relation to genetic ancestry and AAO and (2) assessing the usefulness of these measures in discriminating the behavioral from the language variant syndrome.

Methods
Standard protocol approvals, registrations, and patient consents
Each contributing site obtained written informed consent from all patients to be part of extended genetic studies; the current study is approved under institutional review board approval 9811/001.
Cohort, clinical phenotyping
FTLD cases were collected between 2016 and 2018 (within the IFGC phase III project [ifgcsite.wordpress.com/ongoing-projects/]). The samples were recruited by clinicians and research groups who are part of the IFGC network and based in Italy, Spain, Germany, the Netherlands, Belgium, the United Kingdom, Sweden, Norway, Slovenia, or the United States (supplementary table 1, doi.org/10.5522/04/12418157). Patients were diagnosed at each contributing site (supplementary table 2, doi.org/10.5522/04/12418157) in a harmonized fashion according to international consensus criteria such as those of Neary et al. 2 (for FTLD), Rascovsky et al. 25 (for bvFTD), Gorno-Tempini et al. 3 (for PPA [semantic dementia or PNFA]), and Strong et al. 4 (for FTLD-MND).
Genotyping, C9orf72 repeat expansions, and analysis cohorts
A total of 1,454 cases were successfully genotyped by means of the NeuroArray 26 on the Illumina Infinium platform. Genotypes were used to inform on population substructure via standard principal components analysis (PCA) (supplementary figure 1, doi.org/10.5522/04/12418157), which led to the exclusion of 44 population outliers and allowed us to address population substructure within the cohort (we identified 2 distinct [Nordic and Mediterranean] clusters; supplementary figure 2, doi.org/10.5522/04/12418157). We also assessed cryptic relatedness and excluded 14 first- or second-degree related individuals, leaving a cohort of 1,396 cases (group 0), for which C9orf72 expansion status (i.e., presence/absence of pathogenic expansions) was known, for analyses. Frequencies of pathogenic expansions were assessed in group 0 and further analyses were performed in (1) 1,295 cases (group 1: n = 800 bvFTD and n = 495 PPA) with known C9orf72 expansion status; (2) 1,179 cases (group 2: n = 756 bvFTD and n = 423 PPA) with known C9orf72 expansion status and AAO data available; and (3) 734 cases (group 3: n = 462 bvFTD and n = 272 PPA) with AAO and repeat counts (rc; screened via repeat-primed PCR) (see references 27 and 28; supplementary Methods and supplementary figure 3, doi.org/10.5522/04/12418157; and figure 1A).
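The exclusions and subgroup sizes above can be cross-checked with simple arithmetic. The following Python sketch is only an illustration (the study's own analyses were run in R); all variable names are hypothetical, and the FTLD-MND count of 101 is taken from the abstract.

```python
# Cohort bookkeeping; counts are copied from the text, names are illustrative.
genotyped = 1454      # cases successfully genotyped
pca_outliers = 44     # population outliers excluded after PCA
related = 14          # first- or second-degree relatives excluded

group0 = genotyped - pca_outliers - related

# Syndrome counts per analysis group, as reported.
group0_syndromes = {"bvFTD": 800, "PPA": 495, "FTLD-MND": 101}
group1 = {"bvFTD": 800, "PPA": 495}   # known expansion status
group2 = {"bvFTD": 756, "PPA": 423}   # expansion status and AAO available
group3 = {"bvFTD": 462, "PPA": 272}   # AAO and repeat counts available

totals = {
    name: sum(group.values())
    for name, group in [("group 1", group1), ("group 2", group2), ("group 3", group3)]
}

print(group0, sum(group0_syndromes.values()), totals)
```

Each computed total agrees with the figure given in the text (1,396 for group 0, then 1,295, 1,179, and 734).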

Statistical analyses
We first assessed the frequency of pathogenic expansions in the entire cohort (group 0). The information on presence/absence of expansions was used as a binary variable (0 = absence of expansion; 1 = presence of expansion). We then investigated differences in the frequencies of pathogenic expansions across bvFTD and PPA and the Nordic and Mediterranean clusters in group 1 (Fisher exact test) and in group 3 (logistic regression); in the latter, we used rc as a categorical variable (using no, short, intermediate, and long as factor levels) considering the following 4 categories: no expansions (rc = 2/3), short expansions (4 ≤ rc ≤ 8), intermediate expansions (9 ≤ rc ≤ 24), and long expansions (rc ≥ 25), the latter representing expansions in the pathogenic range (see references 10 and 22, supplementary Methods, and supplementary figure 3, doi.org/10.5522/04/12418157).
Glossary
AAO = age at onset; bvFTD = behavioral variant frontotemporal dementia; FTLD = frontotemporal lobar degeneration; IFGC = International FTD Genetics Consortium; LOOCV = leave-one-out cross-validation; MND = motor neuron disease; PCA = principal components analysis; PNFA = progressive nonfluent aphasia; PPA = primary progressive aphasia; rc = repeat counts; SNP = single nucleotide polymorphism.
We then evaluated association between AAO and syndrome, genetic ancestry, and expansions (i.e., presence/absence used as a binary variable; see above) alone and with genetic ancestry as a covariate in group 2 (t test and logistic regression) and in group 3 (t test, analysis of variance with Tukey post hoc test, and logistic and linear mixed-effects model). In the latter case, we used rc as a categorical variable (see above).
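The two codings of expansion status used in these models (the binary presence/absence variable and the 4-level rc factor) can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the function names are hypothetical, and treating any rc of 3 or fewer as "no expansion" is an assumption based on the stated "rc = 2/3".

```python
def rc_category(rc: int) -> str:
    """Map a repeat count to one of the 4 factor levels described in the text."""
    if rc >= 25:
        return "long"           # pathogenic range
    if rc >= 9:
        return "intermediate"   # 9 <= rc <= 24
    if rc >= 4:
        return "short"          # 4 <= rc <= 8
    return "no"                 # assumption: rc of 3 or fewer


def has_pathogenic_expansion(rc: int) -> int:
    """Binary coding: 1 = presence of a pathogenic expansion, 0 = absence."""
    return 1 if rc_category(rc) == "long" else 0


# Hypothetical repeat counts, for illustration only.
example_counts = [2, 5, 8, 9, 24, 25, 700]
print([(rc, rc_category(rc), has_pathogenic_expansion(rc)) for rc in example_counts])
```

Keeping the thresholds in one function means the binary variable and the categorical factor cannot drift apart if a cut-off is revised.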
Finally, we sought to build a model to predict syndrome (bvFTD vs PPA) using (1) presence/absence of pathogenic expansions (as binary variable [see above] for group 2) or (2) rc (as categorical variable [see above] for group 3), ancestry as binary variable, and AAO as continuous variable. We used logistic regression models evaluated by cross-validation (i.e., leave-one-out cross-validation [LOOCV] and K-fold cross-validation). A summary of the analysis workflow can be found in figure 1B.
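To illustrate how a logistic regression classifier can be scored with LOOCV and K-fold cross-validation, the sketch below uses pure Python and synthetic data. It is not the authors' R pipeline: the data generator, its coefficients, the gradient-descent fitting routine, and all function names are assumptions made for this example, and the printed accuracies say nothing about the real cohort.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(w, xi):
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit by batch gradient descent; returns [intercept, w1, w2, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(linear(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    return 1 if sigmoid(linear(w, xi)) >= 0.5 else 0


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(X, y, k):
    """Fraction of held-out cases classified correctly across k folds."""
    correct = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in fold)
    return correct / len(X)


def synthetic_cases(n=40, seed=1):
    """Hypothetical cases: [carrier, nordic, standardized AAO]; label 1 = bvFTD."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        carrier = 1 if rng.random() < 0.1 else 0
        nordic = 1 if rng.random() < 0.6 else 0
        aao_z = rng.gauss(0.0, 1.0)
        p_bvftd = sigmoid(0.3 + 1.5 * carrier + 0.2 * nordic - 0.4 * aao_z)
        X.append([carrier, nordic, aao_z])
        y.append(1 if rng.random() < p_bvftd else 0)
    return X, y


X, y = synthetic_cases()
loocv_acc = cv_accuracy(X, y, k=len(X))  # LOOCV is K-fold with K = n
kfold_acc = cv_accuracy(X, y, k=5)
print(f"LOOCV accuracy: {loocv_acc:.2f}; 5-fold accuracy: {kfold_acc:.2f}")
```

Both schemes estimate out-of-sample accuracy from the same model; LOOCV is simply the limiting case in which each fold holds a single case.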
All analyses were performed in R (version 3.6.0) using RStudio (version 1.2.1335).

Data availability
All data generated or analyzed during this study are included in this published article and supplementary files 1 and 2 at doi.org/10.5522/04/12418157.

C9orf72 expansion frequency and syndromes
We assessed the frequency of pathogenic expansions in the entire cohort and across the different syndromes in the group 0 cases (figure 1  5, doi.org/10.5522/04/12418157)-we analyzed the distribution of pathogenic expansions across syndromes and clusters. Stratified Fisher exact test showed significant differences in the distribution of the pathogenic expansions between bvFTD and PPA in the Nordic (but not the Mediterranean) cluster (p = 1 × 10⁻⁴; OR 7.87; 95% CI 2.43-40.52), and between the Nordic and the Mediterranean clusters for the bvFTD (but not PPA) syndrome (p = 1.9 × 10⁻²; OR 2.95; 95% CI 1.31-7.52), suggesting that ancestry (Nordic) and syndrome (bvFTD) are independently associated with pathogenic expansions (table 3).
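As a worked illustration of a Fisher exact test on a 2 × 2 table, the sketch below uses the unstratified carrier counts given in the abstract (40/800 bvFTD and 4/495 PPA), not the stratified tables behind the values above, which are not reproduced in this text. It is a pure-Python sketch with hypothetical function names; the two-sided rule (summing all tables no more probable than the observed one) is assumed to match the convention used by the authors' software.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Rows: bvFTD, PPA; columns: carriers, non-carriers (counts from the abstract).
bvftd_carriers, bvftd_total = 40, 800
ppa_carriers, ppa_total = 4, 495
a, b = bvftd_carriers, bvftd_total - bvftd_carriers
c, d = ppa_carriers, ppa_total - ppa_carriers

sample_or = (a * d) / (b * c)   # cross-product (sample) odds ratio
p_value = fisher_exact_two_sided(a, b, c, d)
print(f"sample OR = {sample_or:.2f}, p = {p_value:.2e}")
```

The cross-product odds ratio is about 6.46, slightly above the reported 6.4; the small difference may reflect a conditional maximum likelihood estimate, which is what R's `fisher.test` reports.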

Syndrome prediction
We then sought to build a model to predict syndrome (bvFTD vs PPA) and assess its accuracy. We analyzed both groups 2 and 3 cases using expansion status (presence/absence of expansion for group 2 and the 4 rc factor levels for group 3

Discussion
This study aimed to characterize C9orf72 expansions in relation to genetic ancestry and AAO and to assess the usefulness of these measures in discriminating the behavioral from the language variant syndrome in a large pan-European cohort of 1,396 FTLD cases.
To our knowledge, the current work is unique in that, prior to characterizing the expansions, we excluded population substructure bias using genome-wide genotyping data to cluster the cases on the basis of their genetic makeup. After PCA, we identified 2 distinct clusters including samples with geographic ancestry corresponding to Southern Europe (Mediterranean cluster) and Central/Northern Europe (Nordic cluster). Our analyses showed not only that patients from the Nordic cluster presented a significantly higher frequency of pathogenic C9orf72 expansions compared to the Mediterranean cluster, but also that a core stretch of markers (n = 8) of the Finnish risk haplotype 29 appeared to be conserved across the Nordic expansion carriers, whereas there was a similar tendency for (just) 2 of such markers in the Mediterranean expansion carriers. Several studies have shown higher frequencies of long C9orf72 expansions in Northern vs Southern European patients (North-South axis). 13-15 Other studies (based on the geographic location of the recruiting sites) challenged the North-South axis concept 10 or the founder effect, implying the existence of more than 1 risk haplotype. 16-19 Taken together, our current data appear to support the North-South axis hypothesis and suggest that rearrangements (and instability) 16,19 at the C9orf72 locus might have occurred, reducing the level of conservation of the original risk haplotype across the European population.
We found pathogenic expansions in ~4% of all cases, and the proportion of expansion carriers was significantly higher in bvFTD than in PPA. The fact that we overall identified significant association between pathogenic expansions and a diagnosis of bvFTD and Central/Northern European ancestry (findings in line with previous reports 8,10,13,20,30-34) suggests that C9orf72 expansions might serve as a useful genetic fingerprint to define subpopulations of FTLD (figure 3). Of note, we observed a trend of association with syndrome (bvFTD) and genetic ancestry (Central/Northern European) already supported by the intermediate repeat counts (9 ≤ rc ≤ 24) category. This appears in line with previous reports suggesting that individuals with 7-24 alleles might have an increased risk to convert to carriers of pathologic repeat expansions 10,22 and may, altogether, be useful information in the context of diagnostics.
Despite some previous conflicting reports of direct (or inverse) correlation between C9orf72 expansions and AAO, 16,21,23 we (as others 22,24) found a significant inverse correlation between C9orf72 expansion length and AAO. In addition, and interestingly, our data also indicate that Central/Northern European genetic ancestry contributes to a decreased AAO (independently of the expansions), possibly implying a more complex genetic signature (or architecture), and subsequently molecular mechanisms, underpinning this feature. Clearly, disease mechanisms that involve C9orf72 expansion length and AAO are complex; thus, it is likely that additional factors further modulate their relationship and effect on the phenotype (see also Babić Leko et al. 5).
While using expansion length, genetic ancestry, and AAO in a regression model to discriminate behavioral from language variant subtypes, we found that such measures supported a prediction of bvFTD with 64% accuracy.
Our results have a number of implications. First, given that significant variation exists in the genetic architecture of the Caucasian population, 35 genetic variability characterizing and differentiating Nordic vs Mediterranean subjects (such as in the case of our cohort) might influence predisposition to harboring longer repeat expansions. In other repeat expansion diseases (e.g., Huntington disease or other microsatellite diseases, including myotonic dystrophy and spinocerebellar ataxias 35), the presence of specific haplogroups in Western European populations occurs with a manifold increase in prevalence of repeats compared to other ethnic groups and populations. 36 Second, different genetic risk architectures underpinning different (and possibly genetically more homogeneous) subpopulations of patients may exist within the FTLD population.
In a nutshell, our results imply that a significantly higher proportion of FTLD cases with Nordic rather than Mediterranean genetic ancestry is likely to develop bvFTD in the presence of intermediate and long (pathogenic) expansions, whereas long (pathogenic) expansions are (almost) negligible in PPA, regardless of ancestry. Clearly, multiple factors including genetic heterogeneity, epigenetic changes, ethnicity, as well as environmental factors and habits that may subsist within and across multicultural cohorts, all together, contribute to disease predisposition, onset, and progression. 22,37,38 These concepts, reinforced by our study, warrant further characterization of genetic, environmental, and additional clinical measures to fine-tune models able to predict disease outcome to complement diagnostic criteria, and possibly assist in the identification of informative cohorts for tailored clinical trials and the development of effective personalized therapies.