A genome-wide association study in multiple system atrophy

Objective: To identify genetic variants that play a role in the pathogenesis of multiple system atrophy (MSA), we undertook a genome-wide association study (GWAS). Methods: We performed a GWAS with >5 million genotyped and imputed single nucleotide polymorphisms (SNPs) in 918 patients with MSA of European ancestry and 3,864 controls. MSA cases were collected from North American and European centers, one third of which were neuropathologically confirmed. Results: We found no significant loci after stringent multiple testing correction. A number of regions emerged as potentially interesting for follow-up at p < 1 × 10−6, including SNPs in the genes FBXO47, ELOVL7, EDN1, and MAPT. Contrary to previous reports, we found no association of the genes SNCA and COQ2 with MSA. Conclusions: We present a GWAS in MSA. We have identified several potentially interesting gene loci, including the MAPT locus, whose significance will have to be evaluated in a larger sample set. Common genetic variation in SNCA and COQ2 does not seem to be associated with MSA. In the future, additional samples of well-characterized patients with MSA will need to be collected to perform a larger MSA GWAS, but this initial study forms the basis for these next steps.

To investigate whether common genetic risk factors play a role in MSA, we performed a genome-wide association study (GWAS) including 918 patients with MSA and 3,864 controls.
METHODS Study design. We performed a multicenter GWAS with patients with MSA of European ancestry and regionally matched controls. Collection sites for this study included brain banks and clinical centers in Germany, Austria, Netherlands, Denmark, the United Kingdom, Italy, Portugal, Spain, and the United States (table e-1 at Neurology.org). Since patients were recruited in different geographic regions across Europe and the United States, we matched cases with regional controls. Genotype data of healthy individuals from previous GWAS were obtained from Germany, the United Kingdom, the United States, and Italy. The MSA cases were grouped into 4 geographic regions and each case was matched with 4 controls from the same geographic region: UK and Irish MSA cases were matched with UK controls; Northern European MSA cases from Germany, Netherlands, Belgium, Austria, Denmark, and Sweden were matched with German controls; Southern European cases from Italy and Spain were matched with Italian controls; American MSA cases were matched with American controls. Matching was carried out using multidimensional scaling based on the first 2-component vectors for each subpopulation, derived from common variants assayed in both cases and controls (minor allele frequency [MAF] . 0.05). A flowchart of studied populations and quality control steps is shown in figure e-1.
Study participants. DNA samples from a total of 1,030 patients with MSA were collected for this study (details for each population are shown in table 1, demographics are listed in table e-2). Patients were clinically diagnosed with possible or probable MSA (n 5 699) by movement disorders specialists, or pathologically with definite MSA (n 5 331) by neuropathologists according to Gilman criteria. 13 For controls, we used genotype data from 3,864 previously published neurologically normal individuals from the United Kingdom (n 5 936 samples), Germany (n 5 944 samples), United States (n 5 794 samples), and Italy (n 5 1,190 samples) (table 1). All controls were genotyped on Illumina Bead-Chips (Illumina, San Diego, CA, USA). Details about sample collection and genotyping procedures in control cohorts have been described elsewhere. 7,[14][15][16] Standard protocol approvals, registrations, and patient consents. The appropriate institutional review boards approved the study, and written informed consent was obtained for each participant.
Quality control. For quality control, we excluded samples with a genotyping call rate ,95%, duplicate samples, cryptically related individuals (pi-hat threshold .0.2), individuals in whom the reported sex did not match the genotypic sex, and individuals with non-European ancestry as determined by multiple dimensional scaling analysis (samples deviating .6 SDs from the CEU/ TSI population were excluded; see figure e-2). Next, we excluded SNPs with inaccurate cluster separation (cluster separation score ,0.3), SNPs that were not shared among all 3 genotyping chip versions, and SNPs with an individual SNP call rate ,95%. Following this step, we excluded SNPs with missingness by haplotype or phenotype that exceeded a significance of p value , 0.0001 or a significant deviation from Hardy-Weinberg equilibrium with a p value , 0.00001. Of the remaining SNPs, only those with a MAF .0.01 were used for downstream analyses.
Power calculations and heritability analysis. Power calculations were performed for a GWAS testing 918 cases and 3,864 controls under a log-additive and a recessive model (QUANTO software v1.2.3). A minor allele frequency .1% and a 2-sided a 5 5 3 10 28 were assumed. Under the log-additive model (figure e-3A), power analysis for this study indicated greater than 80% power to detect associated loci with an odds ratio greater than 1.8 at risk allele frequencies between 7% and 87%. We recently also performed a heritability analysis based on our GWAS data demonstrating that the heritability for MSA due to common coding variants is estimated to be between 2.1% and 6.7%. 17 Genotype imputation. About 4,903,804 SNPs were imputed based on haplotype reference data from the 1000 Genomes project (1000genomes.org/; December 2010 version). These data were derived from studying the genomic sequence of 104 participants of European ancestry. Imputations were performed for each of the 4 cohorts separately using MACH software as described elsewhere (version 1.0.16). 18,19 SNPs with an R 2 ,0.3 and MAF ,0.01 were excluded from further analysis as imputed genotypes below this threshold are likely to be of poor quality.
Association tests. Due to the relatively diverse ancestries of European and US cohorts included in this study, we used principal component vectors 1 to 10 from a multidimensional scaling analysis generated in PLINK as covariates to adjust for possible population substructure in the logistic regression model  l 5 1.057. For each SNP, p values and odds ratios under the additive models were calculated using Mach2dat software. 18,19 Only SNPs exceeding the conservative Bonferroni threshold for multiple testing (p , 5 3 10 28 ) were considered genome-wide significant. We also performed a subanalysis for association in pathologically confirmed MSA cases (n 5 295 cases after quality control) vs 3,864 controls. The genomic inflation factor in this subgroup was l 5 1.024. SNCA SNP genotyping using a restriction enzyme assay. SNP rs11931074, located downstream of the SNCA gene, was regenotyped using a restriction enzyme assay. Primers were designed to amplify the SNP and surrounding region. Restriction enzyme Bsr1 (New England Biolabs, Ipswich, MA) was used to cut the PCR product. The major allele was cut, whereas the minor allele remained uncut. Disease association was tested with a x 2 test.
Analysis of brain tissue quantitative trait loci. For SNPs of interest, we attempted to infer functional consequences in frontal cortex and cerebellar tissue samples from neurologically normal individuals that were assayed for both genome-wide methylation and expression levels. 20 These analyses may shed light on potential disease mechanisms for follow-up in future studies. We tested cis associations (any methylation or expression probes 61 Mb from each SNP) in each of the datasets. 21 RESULTS Genotyping, imputation, and quality control.
After quality control procedures, the total study consisted of 918 cases and 3,864 controls. Of the MSA cases, 291 had a pathologically confirmed diagnosis. Samples were successfully genotyped for 267,998 SNPs and imputed to 4,903,804 SNPs. After removal of extreme ancestry outliers and using multidimensional scaling component vectors as covariates in regression models, only mild population stratification was evident based on genomic inflation factor calculations (l 5 1.057) (figure e-2).
Association results. Statistical analysis of association under an additive model was performed in the full case-control dataset (figure 1) and in the subset of MSA cases with neuropathologically confirmed diagnosis. Loci with the lowest p values (,1 3 10 26 ) in the full dataset and pathology confirmed cases are shown in tables 2 and e-3, respectively. In the full dataset, we identified 4 loci for future follow-up in a larger sample series at a p value ,1 3 10 27 including the genes FBXO47, ELOVL7, EDN1, and MAPT. None of the tested SNPs surpassed the Bonferroni threshold for multiple testing (p value , 5 3 10 28 ) in the full dataset or in the subanalysis of pathologically confirmed cases.
COQ2 in the MSA GWAS data. Recently, homozygous (or compound heterozygous) mutations in COQ2 were reported in familial cases of MSA in Japan. 10 While coding variants were also found at a higher frequency in sporadic MSA compared to controls, this finding did not replicate outside of Japanese MSA samples. We investigated the COQ2 locus for common variation in our GWAS data. A total of 453 SNPs in the COQ2 gene and the flanking region 6100 kb (build 36.3 positions) were tested for association. The most associated SNP was chr4:84473327 with a p value of 0.02169, which is far from genome-wide or even regional Bonferroni-adjusted significance. We therefore lack evidence that common genetic variation in the COQ2 locus plays a major role in MSA risk in the European/Northern American population.
SNCA rs11931074 SNP genotyping. SNP rs11931074 in the SNCA locus (or rs3822086, which is in close linkage disequilibrium) has been linked to MSA in several smaller studies. 8,22,23 This SNP is not represented on the Om-niExpress Bead arrays; hence, it was not genotyped in any samples run on this array (n 5 703) and consequently was not in the dataset of genotyped SNPs in common. Instead, the allele frequency was calculated postimputation, demonstrating that it was neither significantly associated in the entire dataset (p value 5 0.4722) nor in the subgroup of definite MSA cases (p value 5 0.4407). We decided to regenotype SNP rs11931074 in the MSA GWAS cohort using a restriction enzyme to exclude technical or imputation uncertainty (despite a high imputation score of 0.9670). Within the GWAS cohort, 906 MSA samples were available for regenotyping (table 3). In controls, data for this SNP were available from the bead array data. The minor allele of SNP rs11931074 was slightly more common in MSA cases vs controls (8.3% vs 7.3%) but again no significant association was found with MSA (x 2 5 2.6; p value 5 0.27). This is in concordance with the imputed GWAS finding, but contradicts previous reports in the literature. 8,22,23 Analysis of brain tissue quantitative trait loci in candidate loci. Twenty-one out of 24 SNPs of interest (p value , 1 3 10 25 ) passed quality control in the mRNA expression datasets, and 23 SNPs of interest passed quality control in the CpG methylation datasets. We tested multiple probes per SNP in each set of analyses. A total of 168 unique SNP-probe pairs were tested in the frontal cortex mRNA expression dataset, 165 pairs in the cerebellar mRNA expression dataset, and 391 pairs in the frontal cortex and cerebellar CpG methylation dataset. Associations were tested using linear regression adjusting for appropriate covariates and resulting p values were adjusted based on the falsediscovery rate correction as previously described. 21 After correcting for multiple testing, we found 8 significant associations between SNPs of interest and either CpG methylation or mRNA expression (table e-4). All of these were located on chromosome 17 and associated with 3 SNPs (rs78523330, rs9303521, and rs916888) in a region spanning 8 MB. SNP rs9393521 is located only 174 Kb upstream of MAPT, a gene previously suggested as a risk gene for MSA, and it is within a large 900 kb inversion polymorphism surrounding the MAPT locus.
DISCUSSION The contribution of common genetic factors to disease development has been established for many neurodegenerative conditions, which are complex, often sporadic conditions such as PD and progressive supranuclear palsy. In MSA, the identity of these contributing genetic factors remains almost entirely unknown. Here, we describe a GWAS investigating common genetic markers in 918 MSA cases and 3,864 controls for disease association. Although none of the tested SNPs surpassed the stringent Bonferroni threshold, we identified 4 MSA regions with p values ,1 3 10 26 : FBXO47, ELOVL7, EDN1, and MAPT. These loci are most promising for further follow-up.
Among the most highly associated regions is the MAPT locus. MAPT is an intriguing candidate as it has already been implicated in a number of neurodegenerative diseases, particularly in tauopathies such as progressive supranuclear palsy, frontotemporal degeneration, and Alzheimer dementia. 24,25 Although tau is not considered a key protein in MSA neuropathology, PD-which is also an a-synucleinopathy-has been found to be consistently associated with common variation in the MAPT locus. 7 Furthermore, MSA was also previously found to be associated with the MAPT H1 haplotype in a small cohort of American MSA cases. 9 These results, however, will need to be verified in a larger MSA cohort.
Located on chromosome 6, EDN1 is part of the endothelin gene family, which functions in the maintenance of vascular tone. Studies have demonstrated the presence of endothelin peptides in nonvascular structures including epithelial cells, glia, and neurons. 26,27 Given the phenotype of autonomic dysfunction or failure in patients with MSA, one may hypothesize that such pathophysiology may be associated with variation in EDN1.
ELOVL7, originating on chromosome 5, is involved in transferase activity. Chiefly, transferase is a condensing enzyme that acts as a catalyst for the synthesis of saturated and polyunsaturated very long chain fatty acids. 28,29 FBXO47 has a known association with papillary renal cell carcinoma. 30 Located on chromosome 17, FBXO47 promotes protein ubiquitination and degradation via phosphorylation, which are critical mechanisms in many neurodegenerative diseases. 30,31 This locus could provide an important mechanism to MSA pathology in light of the oligodendroglial a-synuclein burden, but requires further study to draw any conclusions.
Common variation in the COQ2 gene (recently found to be mutated in familial and sporadic probable MSA cases in Japan) did not show a significant association in the MSA GWAS data. This suggests that common genetic variation in this gene does not play a major role in disease pathogenesis in patients with MSA of European descent.
We were unable to replicate the previously reported association of variants at the SNCA. 8,22,23 A Korean study was also unable to reproduce the association indicating possible differences between populations or an actual null result. 32 The previously associated SNP rs11931074 has considerable variation in frequency among different populations. The reported risk allele has a frequency of about 7%-8% in our different European controls and in the European HapMap data. 33 In contrast, the same allele is much more abundant in Asian and African populations (e.g., 58% in Japanese and 68% in Yorubans). Our study attempts to account for both interpopulation and intrapopulation heterogeneity through principal component analysis, and interpopulation heterogeneity of SNCA is a plausible explanation for the previous findings.
The quantitative trait analysis data suggest a complex risk locus on chromosome 17. This includes several SNPs in the region associated with multiple expression or methylation effects. Several of these genes are involved in pathways potentially relevant to MSA pathology: namely, the ribosylation protein (ARL17A), the ribosome protein (RPL19), and the proteasome protein (PSMB3).
The disease pathology of several neurodegenerative disorders illustrates a common theme of disrupted proteasomal protein clearance and degradation. Notably, PSMB3 may be involved in trinucleotide repeat expansion-a phenomenon seen in several hereditary neurologic disorders (in particular autosomal dominant ataxias)-and it may be driving the trinucleotide expansion process. 34 Supporting this hypothesis, there is considerable clinical overlap between MSA and several hereditary ataxias. 35,36 Similarly, ribosomal dysfunction with aberrant protein synthesis may promote neurodegeneration. 37 Interestingly, no alteration in expression of MAPT-the most obvious candidate gene in the chromosome 17 locuswas found. All but one associated QTL in MSA were found in cerebellar tissue (table e-4). Since the cerebellum is among the most disturbed brain regions in MSA, differences in expression in this region are consistent with an etiologic role in MSA.
There are a number of limitations to this study. First, although this represents by far the largest collection of MSA samples to date, the number of MSA cases is still relatively small for a GWAS. GWAS are designed to identify disease variants of relatively high frequency (MAF .1%) in a population operating under the common disease/common variant hypothesis. 38 In PD, most identified risk factors have an odds ratio (OR) between 0.8 and 1.5. 39 Similarly, in Alzheimer disease-apart from APOE (OR ;4)-common genetic risk factors have low ORs ranging from 0.8 to 1.2. In these diseases, successful identification of significant loci was only feasible with considerably larger sample sizes than our study. However, with our current number we can reasonably exclude the existence of common risk alleles of large effect. A second limitation is the use of population control cohorts rather than age-matched controls, which increases the chance for population heterogeneity. We applied stringent corrections, including principal components from multidimensional scaling as covariates in our statistical model to address this concern. Third, the misdiagnosis rate of clinically ascertained patients, in particular those in early disease stages, can be as high as 38%. 40 We addressed this concern by including a large number of pathology-proven cases (; onethird of all patients in this study). The inclusion of pathology-confirmed cases is particularly important for future follow-up studies in MSA.
Finally, a major challenge is that currently no additional large cohorts of MSA cases with a similar genetic background are available. A comparison of European and non-European cohorts may be informative with regard to interpopulation heterogeneity.
None of the studied variants were statistically significant after appropriate correction for multiple testing. However, we identified 4 promising loci that could reveal important insight into the disease pathways of MSA. Increasing the sample size of our GWAS will be key to the successful confirmation of these candidate loci and the identification of other risk genes.