Meta-analysis in more than 17,900 cases of ischemic stroke reveals a novel association at 12q24.12

Objectives: To perform a genome-wide association study (GWAS) using the Immunochip array in 3,420 cases of ischemic stroke and 6,821 controls, followed by a meta-analysis with data from more than 14,000 additional ischemic stroke cases. Methods: Using the Immunochip, we genotyped 3,420 ischemic stroke cases and 6,821 controls. After imputation we meta-analyzed the results with imputed GWAS data from 3,548 cases and 5,972 controls recruited from the ischemic stroke WTCCC2 study, and with summary statistics from a further 8,480 cases and 56,032 controls in the METASTROKE consortium. A final in silico “look-up” of 2 single nucleotide polymorphisms in 2,522 cases and 1,899 controls was performed. Associations were also examined in 1,088 cases with intracerebral hemorrhage and 1,102 controls. Results: In an overall analysis of 17,970 cases of ischemic stroke and 70,764 controls, we identified a novel association on chromosome 12q24 (rs10744777, odds ratio [OR] 1.10 [1.07–1.13], p = 7.12 × 10−11) with ischemic stroke. The association was with all ischemic stroke rather than an individual stroke subtype, with similar effect sizes seen in different stroke subtypes. There was no association with intracerebral hemorrhage (OR 1.03 [0.90–1.17], p = 0.695). Conclusion: Our results show, for the first time, a genetic risk locus associated with ischemic stroke as a whole, rather than in a subtype-specific manner. This finding was not associated with intracerebral hemorrhage.

cardiovascular disease and stroke, suggesting the nonstroke content of the Immunochip may provide additional information when considering the stroke phenotype. 5,6 We report here the use of the Immunochip as the initial phase of a targeted GWAS, followed by meta-analysis with full GWAS data from WTCCC2 and an international collaboration of ischemic stroke GWAS data (METASTROKE). This is followed by in silico replication (i.e., ascertainment from previous data without the need for de novo genotyping) with data from the INTERSTROKE and VISP studies.
METHODS Study design and participating studies. The discovery sample consisted of 6 cohorts of patients of European ancestry with ischemic stroke. Participating centers were based in Belgium, Germany, the Netherlands (the PROMISe Study), Poland, Sweden, and the UK (2 cohorts, one from London [Imperial College; the BRAINS study] and one from Glasgow). All cohorts provided geographically and ancestry-matched controls. For the purposes of meta-analysis, the UK cohorts were treated as a single center in line with previous analyses undertaken in WTCCC2. 1
Analysis plan. The analysis plan for this study was to perform a single meta-analysis of available data as follows: (1) association analysis of imputed Immunochip data; (2) meta-analysis with HapMap2-imputed WTCCC2 data and METASTROKE consortium data for which summary statistics were available; and (3) in silico look-up of significant SNPs from the meta-analysis in the INTERSTROKE cohort 7 and the VISP cohort. 8 The populations used in both WTCCC2 and METASTROKE have been previously reported. 1,3 The WTCCC2 data have also been contributed to METASTROKE. Therefore, for this analysis the WTCCC2 data were removed from METASTROKE to prevent duplication of individuals, as was the BRAINS dataset, which overlapped with BRAINS cases contributing to the Immunochip discovery cohort. Table 1 includes full details of the discovery cohorts and outlines details of the WTCCC2, METASTROKE, INTERSTROKE, VISP, and intracerebral hemorrhage (ICH) cohorts. The standard GWAS significance threshold of p < 5 × 10−8 was set a priori.
We additionally determined whether genome-wide associated SNPs from this analysis were also associated with primary ICH by in silico replication in GWAS data from a meta-analysis of 1,088 ICH cases and 1,102 controls (Genetics of Cerebral Hemorrhage with Anticoagulation [GOCHA] study). 9,10 Full population details and demographics of all consortia are available in their original publications. 1,3,7,8,9,10
this design of the Immunochip, WTCCC2 disease areas, including ischemic stroke, were able to suggest ~3,000 novel SNPs for incorporation into the array. The Immunochip therefore contains a subset of stroke-specific SNPs from an early analysis of WTCCC2 ischemic stroke data. However, for this study we used the entire Immunochip content. The 6 discovery phase cohorts used the commercially available Immunochip array (Illumina, San Diego, CA). Genotyping for the PROMISe study (the Netherlands) was performed independently in Utrecht, the Netherlands. Genotyping for the remaining 5 case cohorts was performed at the Sanger Centre, Hinxton, Cambridge, UK. Swedish controls were provided and genotyped by the Swedish SLE network, Uppsala, Sweden. Belgian control samples were provided through the efforts of the International Multiple Sclerosis Genetics Consortium. Analysis and quality control (QC) of the PROMISe study (the Netherlands) were performed in Utrecht; for all other cohorts they were performed at St George's, University of London, UK.
The Immunochip datasets were each imputed separately to the 1000 Genomes Phase 1 integrated variant set (March 2012) using IMPUTE v2.2.2. 11 Standard parameters were used with the exception of the number of haplotypes (k), which was increased to 100 to maximize accuracy. In total, between 123,920 and 135,006 SNPs were directly genotyped for the Belgian, German, Polish, Swedish, and British cohorts. After imputation and QC filtering to exclude SNPs with an IMPUTE info score <0.3 or minor allele frequency (MAF) <0.01, there were between 3,601,403 and 4,170,444 autosomal SNPs for final analysis. For the PROMISe study (the Netherlands), stricter QC parameters on the directly genotyped SNPs resulted in 88,511 directly genotyped SNPs and 3,524,203 SNPs for final analysis after imputation. Full details of SNPs at all stages for the discovery cohorts are listed in table e-1. The lambda value (a measure of genomic control to account for overinflation of false-positive results) for the imputed Immunochip cohorts (filtered for heterogeneity or missingness as in the meta-analysis) was λ = 1.165, equating to λ1000 = 1.036 using the method of de Bakker et al. 12 Selecting the stroke-specific subset of SNPs produced λ = 1.252 (λ1000 = 1.055), and the "reading and writing SNP subset" from WTCCC2, used as a common null set, produced λ = 1.300 (λ1000 = 1.066), showing little evidence for inflation in the stroke subset or the Immunochip overall. QQ plots are shown in figure e-1.
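As an illustration of the sample-size rescaling of lambda described above, the following minimal sketch assumes the commonly used formula that rescales the excess inflation to an equivalent study of 1,000 cases and 1,000 controls. The function name lambda_1000 is ours, and the sample sizes are the discovery totals given in the text (3,420 cases, 6,821 controls); this is not the authors' code.

```python
def lambda_1000(lam, n_cases, n_controls):
    """Rescale a genomic-control lambda to 1,000 cases and 1,000 controls.

    Assumed formula (not quoted from the paper):
    lambda_1000 = 1 + (lambda - 1) * (1/n_cases + 1/n_controls) / (1/1000 + 1/1000)
    """
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale


# Lambda values reported in the text for the three SNP sets.
reported = [
    ("all Immunochip SNPs", 1.165),
    ("stroke-specific subset", 1.252),
    ("reading and writing subset", 1.300),
]
for label, lam in reported:
    print(f"{label}: lambda_1000 = {lambda_1000(lam, 3420, 6821):.3f}")
```

Under these assumptions the rescaled values come out at 1.036, 1.055, and 1.066, matching the figures reported in the text.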
Association analysis of clean imputed datasets was performed for each Immunochip cohort individually using the frequentist test under an additive model as implemented in SNPTEST v2, 13 including sex and 10 principal components as covariates. Imputed genotype probabilities were taken into account using a missing data likelihood score test or an expectation-maximization method for SNPs with low MAF or high uncertainty. Association analyses were performed on all ischemic stroke cases and for the defined subtypes of large artery stroke, small vessel disease, and cardioembolic stroke.
Meta-analyses were performed using an inverse-variance weighted fixed-effects model as implemented in METAL. 14 SNPs were taken into consideration only if they were present in at least 50% of datasets, were genotyped or imputed in all phases, and if the p value for Cochran's Q test for heterogeneity exceeded 1 × 10−3. For targeted replication with data from INTERSTROKE and VISP, summary statistics were provided for 2 SNPs (rs17696736 and rs10744777) and meta-analyzed as above.
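For readers unfamiliar with the pooling step, a minimal sketch of inverse-variance weighted fixed-effects meta-analysis for a single SNP is given here. It is not METAL's code: the function name fixed_effects_meta and the per-cohort estimates are hypothetical, and the inputs are assumed to be log odds ratios with their standard errors. Cochran's Q is returned with its degrees of freedom; the heterogeneity p value would be obtained from a chi-square distribution with that many degrees of freedom.

```python
import math


def fixed_effects_meta(betas, ses):
    """Pool per-cohort log odds ratios with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return {
        "beta": beta,
        "se": se,
        "or": math.exp(beta),
        "z": z,
        "p": p,
        "q": q,
        "df": len(betas) - 1,
    }


# Hypothetical per-cohort estimates (log OR, SE); not taken from the study.
result = fixed_effects_meta([0.10, 0.08, 0.12], [0.03, 0.04, 0.05])
print(result)
```

Because the weights are the reciprocals of the squared standard errors, larger cohorts with more precise estimates dominate the pooled odds ratio.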
Conditional analysis. Conditional analysis of the Chr12 locus was performed by including genotype dosage of each of the 10 genome-wide significant SNPs as a covariate in the logistic regression model independently, as well as by inclusion of all 10 SNPs together. This was performed on Immunochip and WTCCC2 data only since individual level genotypes were not available for METASTROKE.
Risk factor defined analysis. A case-only risk factor defined analysis was performed by classifying presence (case) or absence (control) of defined risk factors in the Immunochip and WTCCC2 cohorts (individual genotype level data were not available for the METASTROKE, INTERSTROKE, or VISP cohorts). The 5 cardiovascular risk factors hypertension, diabetes, hypercholesterolemia, coronary artery disease/ischemic heart disease, and smoking status were then assessed independently across the Chr12 locus.
RESULTS Meta-analysis of Immunochip data. Analysis of the 6 Immunochip cohorts comprising 3,420 cases and 6,821 controls resulted in identification of 3 SNPs spanning 2 independent loci on chromosomes 10q26 and 19q13 exceeding a genome-wide significance threshold of 5 × 10−8 for all ischemic stroke, and a further 9 SNPs spanning 5 loci in large artery ischemic stroke (top SNPs in each of the 7 loci shown in table 2, block 1, full findings in table e-2). No SNP exceeded a genome-wide threshold in cardioembolic stroke or small vessel disease when examining Immunochip cohorts alone.
Meta-analysis with WTCCC2 and METASTROKE data confirmed previously published associations between the HDAC9 locus on chromosome 7p21 and large artery stroke, and between PITX2 and ZFHX3 loci at 4q25 and 16q22 and cardioembolic stroke. Meta-analysis also revealed a novel locus at 12q24.12 (top SNP rs17696736, p = 6.06 × 10−10). All of these loci exceeded a genome-wide threshold of 5 × 10−8. The 3 SNPs identified in the Immunochip data alone showed no replication (table 2, block 2).
Targeted in silico replication of the 12q24.12 region in INTERSTROKE and VISP strengthened the association, revealing a new top SNP at this locus (rs10744777, p = 7.12 × 10−11, odds ratio [OR] 1.10 [1.07–1.13]). A forest plot displaying results for all cohorts at rs10744777 is shown in figure 1A. The association was similar across all ischemic stroke subtypes as measured by effect size, with no evidence of subtype specificity (table 3).
Conditional analysis of 12q24.12. The significant SNPs in 12q24.12 spanned 2 Mb of DNA encompassing 16 genes. Across this region, 10 SNPs reached significance levels of p < 5 × 10−8 (figure e-2A). To investigate whether there was a single signal or multiple signals across this locus, we performed a conditional analysis on rs10744777 in those cohorts for which we had genotypic level data (Immunochip and WTCCC2). None of the 9 other genome-wide significant SNPs remained significant after controlling for rs10744777 (table e-3 and figure e-2B). The same effect was identified when conditioning on each of the 9 other SNPs independently (data not shown).
Risk factor defined analysis. A lack of risk factor data in all controls prevented risk factors from being used in a conventional stratified analysis. However, to examine whether underlying risk factors were driving the observed association between the 12q24.12 region and all ischemic stroke, we performed a case-only risk factor defined analysis in which cases were subdivided on the basis of presence (case) or absence (control) of 5 available cardiovascular risk factors: hypertension, diabetes mellitus, hypercholesterolemia, smoking, and past history of symptomatic coronary artery disease/ischemic heart disease. These are defined in the online supplementary material. Regional analysis of the Chr12 locus was performed on Immunochip and WTCCC2 data (a lack of individual level genotypes prevented this analysis from being conducted in METASTROKE data). Table 4 shows the lack of association for rs10744777 when classifying cases and controls by the presence of risk factors. Therefore we found no evidence that the association with stroke was mediated via a single conventional cardiovascular risk factor.
DISCUSSION Adopting a GWAS approach with direct genotyping and imputation with the Immunochip array, followed by meta-analysis with existing GWAS data in ischemic stroke, we identified a novel risk locus for ischemic stroke on chromosome 12q24.12. Unlike all previous GWAS-identified ischemic stroke loci, this locus does not appear to be associated with a single subtype but rather is associated with ischemic stroke as a whole.
The 12q24.12 locus had been included in the Immunochip due to its association with type 1 diabetes mellitus and was not one of the 3,000 stroke SNPs included from initial analysis of the WTCCC2 stroke study. 15 Type 1 diabetes, although associated with an increased risk of stroke and other premature cardiovascular disease, is rare and therefore accounts for only a very small proportion of total stroke risk on a population basis. In addition to type 1 diabetes, the 12q24.12 locus has been associated with a number of cardiovascular risk factors, including blood pressure and cholesterol levels. To investigate whether the association might be mediated via these risk factors, we performed a case-only risk factor defined analysis, which showed no evidence that the association with stroke was mediated via conventional cardiovascular risk factors. Furthermore, given the strong role hypertension plays as a risk factor for hemorrhage, the lack of association with ICH is additional evidence that hypertension does not mediate the association.
Table 2 Top SNP for each locus exceeding a genome-wide threshold of 5 × 10−8
The mechanism by which this variant might increase risk of all ischemic stroke without increasing ICH risk is uncertain. However, as the mechanisms of arterial disease differ between stroke subtypes, this finding would be consistent with a systemic risk factor such as altered coagulation rather than a risk factor associated with a single ischemic stroke subtype. Chromosome 12q24.12 has been identified as a region likely to have undergone positive selection in Europeans about 3,000–4,000 years ago, and as such features a complex long-ranging linkage disequilibrium (LD) pattern that does not facilitate identification of a causal variant among the 10 genome-wide significant SNPs identified in this study. 16,17 However, the lead SNP, rs10744777, has been identified as an expression quantitative trait locus for ALDH2 in monocytes. 18 ALDH2 codes for mitochondrial aldehyde dehydrogenase 2, which plays a key role in ethanol metabolism but has also emerged as a potentially protective agent in myocardial ischemia. 19 The only nonsynonymous SNP in LD with any of the genome-wide significant variants is rs3184504 in SH2B3. rs3184504 has previously been associated with blood pressure and coronary artery disease/myocardial infarction. 20,21 The missense mutation is classified as benign (PolyPhen-2) and tolerated (SIFT); however, it is thought to change 2 transcription factor motifs. SH2B3 (Src homology 2-B3, also: Lnk) has been implicated in inflammation and innate immunity and has been shown to influence endothelial cell migration and adhesion in vitro. 22 Detailed functional work will be required to elucidate the possibility of these 2 or any of the other 14 genes as potential candidates in ischemic stroke. We did perform a conditional analysis on rs3184504 in the Immunochip and WTCCC2 data, whereby rs10744777 remained significant (OR = 1.12, 95% confidence interval 1.06–1.18, p = 3.52 × 10−5). We therefore cannot exclude the possibility of independent signals in this region underlying the range of reported significant phenotypes.
In addition to the identification of chromosome 12q24.12 as a novel stroke locus, we show for the first time genome-wide significant association of SNPs in PITX2 and HDAC9 with all ischemic stroke; previously, associations had only been detected with cardioembolic and large artery stroke subtypes, respectively. This is likely to be an effect of increased sample size, however, as there is no evidence for these loci being risk alleles in subtypes other than cardioembolic stroke and large artery stroke.
There are a number of limitations to this analysis that would benefit from further exploration. The Immunochip is a chip focused preferentially on immune-related genes. As such, this study cannot be considered to be a "full" GWAS in a true sense, since regions of the genome lacking immune-related genes will not be covered. This effect is negated somewhat by imputation, but it is possible there are other risk alleles for ischemic stroke that remain to be identified in a cohort of this size. Although we included almost 18,000 stroke cases in the largest stroke GWAS meta-analysis to date, the numbers in individual subtypes were smaller; therefore, we cannot completely exclude the possibility that the association with chromosome 12q24.12 is predominantly mediated by a single subtype. Although the effect sizes in the subtype analyses are similar, we are unable to test directly whether they differ significantly, because controls were shared between subtypes. This could be overcome by subtype-specific analyses as a primary endpoint in a future study. We are also unable to definitively exclude the possibility of this association being driven by underlying risk factors we are not powered to detect. The lack of risk factor data in controls prevented a more conventional stratified analysis. However, we performed a case-only risk factor defined analysis and this showed no evidence that the association with stroke was mediated via conventional cardiovascular risk factors. There is also the limitation of all such case-control studies that controls may become cases in the future, and in the case of stroke it is possible controls may have had a clinically undiagnosed "silent" stroke. Larger sample sizes are one mechanism that can be used to mitigate this possibility. In line with other genetic studies, the ORs associated with this finding are highly significant but small. Further functional genetic studies will be required to elucidate the mechanism of action associated with this finding, leading to patient benefit.
We have identified, through the largest meta-analysis of ischemic stroke GWAS data to date, a novel locus on Chr12 increasing risk in all subtypes of ischemic stroke but not ICH. Previous GWAS associations with stroke have been subtype specific, and this represents the first genome-wide association increasing risk of all ischemic stroke subtypes.
Table 3 Association between rs10744777 and ischemic stroke and its subtypes by analysis stage
approval, statistical analysis. Joshua C. Bis: analysis or interpretation of data, accepts responsibility for conduct of research and will give final approval, statistical analysis. Giorgio B. Boncoraglio: drafting/revising the manuscript, accepts responsibility for conduct of research and will give final approval, acquisition of data. James Meschia: drafting/revising the manuscript, accepts responsibility for conduct of research and will give final approval, acquisition of data. M. Arfan Ikram: drafting/revising the manuscript, accepts responsibility for conduct of research and will give final approval, acquisition of data, study supervision, obtaining funding. Bjorn M. Hansen: drafting/revising the manuscript, accepts responsibility for conduct of research and will give final approval, acquisition of data.
Joan Montaner: study concept or design, analysis or interpretation of data, accepts responsibility for conduct of research and will give final approval, acquisition of data. Gudmar Thorleifsson: analysis or interpretation of data, accepts responsibility for conduct of research and will give final approval, statistical analysis. Kari Stefanson: analysis or interpretation of data, accepts responsibility for conduct of research and will give final approval, acquisition of data, study supervision. Jonathan Rosand: drafting/revising the manuscript, study concept or design, accepts responsibility for conduct of research and will give final approval, acquisition of data, study supervision, obtaining funding. Paul I. W. de Bakker: drafting/revising the manuscript, study concept or design, accepts responsibility for conduct of research and will give final approval, acquisition of data, statistical analysis. Martin Farrall: study concept or design, analysis or interpretation of data, accepts responsibility for conduct of research and will give final approval, acquisition of data, obtaining funding. Martin Dichgans: drafting/revising the manuscript, analysis or interpretation of data, accepts responsibility for conduct of research and will give final approval, acquisition of data. Hugh S. Markus: drafting/revising the manuscript, study concept or design, analysis or interpretation of data, accepts responsibility for conduct of research and will give final approval, acquisition of data, study supervision, obtaining funding. Steve Bevan: drafting/revising the manuscript, study concept or design, analysis or interpretation of data, accepts responsibility for conduct of research and will give final approval, acquisition of data, statistical analysis, study supervision.

STUDY FUNDING
No targeted funding reported.