Dystrophin quantification

Objective: We formed a multi-institution collaboration to compare dystrophin quantification methods, reach a consensus on the most reliable method, and report its biological significance in the context of clinical trials. Methods: Five laboratories with expertise in dystrophin quantification performed a data-driven comparative analysis of a single reference set of normal and dystrophinopathy muscle biopsies using quantitative immunohistochemistry and Western blotting. We developed standardized protocols and assessed inter- and intralaboratory variability over a wide range of dystrophin expression levels. Results: Results from the different laboratories were highly concordant with minimal inter- and intralaboratory variability, particularly with quantitative immunohistochemistry. There was a good level of agreement between data generated by immunohistochemistry and Western blotting, although immunohistochemistry was more sensitive. Furthermore, mean dystrophin levels determined by alternative quantitative immunohistochemistry methods were highly comparable. Conclusions: Considering the biological function of dystrophin at the sarcolemma, our data indicate that quantitative immunohistochemistry and Western blotting, used in combination, are reliable biochemical outcome measures for Duchenne muscular dystrophy clinical trials, and that results obtained with standardized protocols can be comparable between competent laboratories. The methodology validated in our study will facilitate the development of experimental therapies focused on dystrophin production and their regulatory approval.

measures study group (BOM-SG) to provide a data-driven reproducible methodology for dystrophin quantification. In a pilot study comparing the sensitivity and reliability of the preferred individual laboratories' methodologies, we found significant levels of inter- and intralaboratory variability (data not shown). Herein, we present a controlled analysis of proposed standard operating procedures for quantitative immunohistochemistry and Western blotting for evaluation of dystrophin expression. We discuss the biological significance of our data in the context of dystrophic muscle pathology. We demonstrate that data from different laboratories can be comparable, thus validating immunohistochemistry and Western blotting as biochemical biomarkers for DMD clinical trials.
Standard protocol approvals, registrations, and patient consents. We obtained written informed consent for the use of archived muscle tissues from all patients or guardians (as appropriate) under a protocol approved by the Nationwide Children's Hospital institutional review board. The studies at Great Ormond Street Hospital were performed under approval by the National Research Ethics Committee (05/MRE12/32).
Muscle biopsies. We selected muscle biopsies previously archived as part of the United Dystrophinopathy Project in the Flanigan laboratory. All biopsies had been assessed for dystrophin content on a clinical or research basis and were dispensed labeled only with a blinded code, maintained in the Flanigan laboratory. 18 Each laboratory received the same number of unfixed frozen serial 10-μm transverse muscle sections on microscope slides and an Eppendorf tube containing forty 10-μm sections of frozen muscle tissue. All laboratories were informed of the identity of the control biopsies.
Immunohistochemistry. The staining protocol, based on that of Taylor et al., 18 was as follows:
• Transverse sections were air-dried at room temperature for 20 to 30 minutes and circled with a hydrophobic peroxidase-antiperoxidase pen.
• Primary dystrophin (rabbit C-terminal ab15277; Abcam, Cambridge, MA) and spectrin (monoclonal NCL-SPEC1; Leica Microsystems Inc., Buffalo Grove, IL) antibodies were diluted (1:400 and 1:100, respectively) in phosphate-buffered saline (PBS) and incubated with the sections for 1 hour at room temperature.
• Sections were washed (×3) in PBS for 3 minutes each.
• Each laboratory used secondary antibodies compatible with their microscope, e.g., Alexa Fluor 488 goat anti-mouse immunoglobulin G (IgG) (A11017; Molecular Probes, Eugene, OR) and Alexa Fluor 568 goat anti-rabbit IgG (A11036; Molecular Probes). These were diluted 1:500 in PBS and incubated for 30 minutes in the dark at room temperature.
• Sections were washed (×3) in PBS for 3 minutes.
All laboratories measured dystrophin intensity using the Arechavala-Gomeza method, which measures the fluorescent intensity of 40 specific sarcolemmal regions of interest 19 selected manually at random. Each of these regions of interest includes maximum and minimum intensity data points that are used in the data analysis. In parallel, 3 laboratories (1, 4, and 5) also quantified dystrophin using the Taylor method. 18 This method makes use of the double staining with spectrin, another sarcolemmal protein whose level is unaffected in dystrophinopathy muscle, to create a mask that defines only the sarcolemmal area in each image. This mask allows the measurement of the intensity of the sarcolemmal area of the whole image. 18 In addition, laboratory 4 used the Beekman method in which a spectrin mask is also used to select the sarcolemma of each individual fiber of the image. 20 With this algorithm, the individual intensities of an average of 350 fibers are measured and the mean dystrophin intensity of the fiber population is calculated using Definiens software. 20 The key difference between these methods relates to the number of fibers measured per captured image. For a detailed protocol of each method, see e-Methods on the Neurology® Web site at Neurology.org.
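The spectrin-mask idea behind the Taylor method can be illustrated with a minimal Python sketch. It assumes a simple fixed threshold on the spectrin channel to define sarcolemmal pixels; the threshold value, the function name `sarcolemmal_mean_intensity`, and the toy pixel values are illustrative assumptions and do not reproduce the published script, whose details are given in the e-Methods.

```python
def sarcolemmal_mean_intensity(dystrophin, spectrin, threshold):
    """Mean dystrophin intensity over pixels where spectrin exceeds a threshold.

    Both images are lists of rows with identical dimensions. The fixed
    threshold is an assumption made for illustration only.
    """
    selected = [
        d
        for d_row, s_row in zip(dystrophin, spectrin)
        for d, s in zip(d_row, s_row)
        if s > threshold  # pixel belongs to the spectrin-defined mask
    ]
    if not selected:
        raise ValueError("spectrin mask is empty; check the threshold")
    return sum(selected) / len(selected)


# Toy 3x3 two-channel image (hypothetical values): bright spectrin at the corners.
spectrin_image = [[200, 10, 200], [10, 10, 10], [200, 10, 200]]
dystrophin_image = [[80, 5, 100], [5, 5, 5], [60, 5, 120]]

mean_intensity = sarcolemmal_mean_intensity(
    dystrophin_image, spectrin_image, threshold=100
)
```

Only the four pixels selected by the spectrin mask contribute to the mean, so non-sarcolemmal background does not dilute the measurement.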
Each laboratory used their preferred image acquisition equipment (e.g., ImageJ-based software, Odyssey infrared imaging system); the data were normalized to α-actinin and presented relative to an average of the 2 controls.
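As a hedged illustration of the normalization described above, the following Python sketch divides each dystrophin band signal by its loading-control signal and expresses the result relative to the mean of the control samples. The function name, the tuple layout, the band values, and the choice to report a percentage are assumptions made for the example.

```python
from statistics import mean


def relative_dystrophin(sample, controls):
    """Return dystrophin as a percentage of the mean control level.

    `sample` and each entry of `controls` are (dystrophin, actinin)
    band signals; this layout is a hypothetical convention.
    """
    def ratio(band):
        dystrophin_signal, actinin_signal = band
        if actinin_signal <= 0:
            raise ValueError("loading-control signal must be positive")
        return dystrophin_signal / actinin_signal

    control_level = mean(ratio(c) for c in controls)
    return 100.0 * ratio(sample) / control_level


# Hypothetical band signals: two controls and one patient sample.
controls = [(1000.0, 500.0), (900.0, 600.0)]  # ratios 2.0 and 1.5
patient = (350.0, 500.0)                      # ratio 0.7
percent_of_control = relative_dystrophin(patient, controls)
```

With these made-up numbers the patient sample comes out at about 40% of the mean control level.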
Statistical analysis. Experiments were performed in triplicate and statistical analysis was performed using GraphPad Prism version 5.03 (GraphPad Software, La Jolla, CA). The coefficient of variation (CV) was calculated using the formula CV = SD/mean × 100. For intralaboratory analysis, the CV for each laboratory for each biopsy was calculated (tables e-1 to e-6) and the CVs from the 6 biopsies were averaged for each laboratory.
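The CV calculation can be sketched in Python as follows. The sketch assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and all measurement values are invented for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV (%) = SD / mean x 100, using the sample SD (an assumption)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100


def average_cv(replicates_per_biopsy):
    """Average the per-biopsy CVs, as done for the intralaboratory analysis."""
    return mean(coefficient_of_variation(v) for v in replicates_per_biopsy)


# Interlaboratory example: one biopsy measured by 5 laboratories (hypothetical).
interlab_cv = coefficient_of_variation([60, 70, 80, 65, 75])

# Intralaboratory example: triplicates for 2 biopsies from one laboratory.
intralab_cv = average_cv([[10, 10, 10], [90, 100, 110]])
```

Because the CV divides by the mean, samples with very low dystrophin levels can show a large CV even when the absolute SD is small.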
The Bland-Altman plot was used to assess the agreement between different methods. 21
RESULTS
Each laboratory ranked the samples according to the relative level of dystrophin expression determined by each technique (table 1). There was a high level of agreement among all laboratories for both immunohistochemistry and Western blotting. All laboratories identified the 3 BMD samples as having the highest dystrophin protein levels, although this top order differed between immunohistochemistry and Western blotting. By Western blot analysis, no laboratory could detect dystrophin in sample B and only 2 laboratories (3 and 4) could detect trace amounts of dystrophin protein in sample E.
Inter- and intralaboratory variability of dystrophin quantification using immunohistochemistry. From each laboratory's data (Arechavala-Gomeza 19 method), we calculated the mean (±SD) dystrophin levels of each sample and the CV (figure 1). Overall, the level of variability observed among the different laboratories was minimal with an average SD of 7.78 (ranging between 3.33 for sample E and 11.93 for sample A). We calculated the CV for each sample to statistically measure the degree of variation between the laboratories. A CV value of less than 20% is considered optimal. 22 The CV values averaged 33% (ranging between 23% for sample A and 67% for sample B); samples A, C, D, and F had CV values between 20% and 30%. We next analyzed intralaboratory variability in the same manner (figure 1). We calculated the average CV value from each laboratory (see tables e-1 to e-6 for individual data). The CV values for immunohistochemistry were below 30% for all laboratories, with laboratories 4 and 5 having low CV values of 14% and 14%, respectively.
Figure 1. Inter- and intralaboratory variability of dystrophin quantification using immunohistochemistry. Five laboratories each quantified the level of dystrophin expression in the same 6 biopsies using a standardized immunohistochemistry protocol; data were analyzed using the Arechavala-Gomeza method. 19 To assess interlaboratory variability, the mean ± SD for each biopsy was calculated as well as the coefficient of variation (CV). Note how this variation is higher for those samples containing less dystrophin (E and B). To assess intraassay precision within each laboratory, the mean ± SD for each laboratory per sample was calculated as well as the average CV per laboratory. Laboratories are unidentified.
While all laboratories were able to use the Arechavala-Gomeza method, 19 some laboratories had access to software that enabled them to directly compare this method with additional automated methods. Three laboratories (1, 4, and 5) analyzed the same samples using the Taylor method and one laboratory (4) compared 3 different methods (Arechavala-Gomeza, 19 Taylor, 18 and Beekman 20 methods). We analyzed the same images using the above intensity measurement techniques and we assessed the level of agreement between them by plotting the mean (±SD) of each sample for all techniques (figure 2A). Next, rather than calculate the correlation coefficient, which can hide a considerable lack of agreement, 23 we plotted the data with a regression line, plotting the more automated Taylor 18 or Beekman 20 methods against the Arechavala-Gomeza 19 method (figure 2B). We then selected the 2 methods used by more than one laboratory (Arechavala-Gomeza 19 and Taylor 18 methods), observed that the mean data from the 2 techniques were essentially identical (figure 2, A and B), and generated a Bland-Altman plot (figure 2C). 23 This analysis shows that both methods are equivalent: the bias (the mean of the differences between the methods) was only 2.103 and the 95% limits of agreement were between 10.83 and −6.63.
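The quantities read from a Bland-Altman plot can be computed as in the following Python sketch. It uses the conventional definition of the 95% limits of agreement (bias ± 1.96 × SD of the paired differences); the function name and the paired values are hypothetical and are not the study data.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return bias and 95% limits of agreement for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(method_a, method_b)]
    pair_means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(differences)                 # mean of the differences
    half_width = 1.96 * stdev(differences)   # sample SD of the differences
    return {
        "bias": bias,
        "lower": bias - half_width,
        "upper": bias + half_width,
        "means": pair_means,         # x-axis of the plot
        "differences": differences,  # y-axis of the plot
    }


# Hypothetical per-sample means from two quantification methods.
result = bland_altman([10, 20, 30, 40], [8, 21, 27, 38])
```

The bias always lies midway between the two limits, which gives a quick consistency check on any reported trio of values.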
Inter- and intralaboratory variability of dystrophin quantification using Western blotting. We assessed the level of inter- and intralaboratory variability observed with Western blotting as above (figure 3). We observed more variability with Western blotting than immunohistochemistry with a mean SD of 15.95 (ranging between 0.89 for sample E and 33.09 for sample F). The CV values for Western blotting averaged 80% (ranging between 23% for sample F and 223% for sample E) confirming a higher degree of variability with this technique; only samples D and F had CV values nearing 20% (figure 3). The CV values were particularly affected by 2 of the samples (B and E) being at/below the limit of sensitivity; our results thus indicate that the interlaboratory variability improves as the level of dystrophin increases.
Intralaboratory variability was also more pronounced than for immunohistochemistry. Only laboratory 1 had an optimal CV value of 0.3%; laboratory 3 had the highest at 119% (figure 3).
Immunohistochemistry and Western blotting data comparison. To assess the level of agreement between the immunohistochemistry and Western blotting data, we plotted the mean (±SD) of each sample for both techniques (figure 4A), plotted the data with a regression line (figure 4B), and generated a Bland-Altman plot of the difference between the methods against their mean (figure 4C). 23 The level of bias was −14.18 and the upper and lower limits of agreement were 64.96 and −93.32, respectively. Although our sample size is small, the scatter of data points in figure 4C suggests that as the mean increases, the difference between the 2 methods also increases. Thus, while the immunohistochemistry and Western blotting data are somewhat comparable, the data are not in perfect agreement; this is unlikely to be attributable to technical problems but rather to the different properties of mutant dystrophin. For example, there is a large discrepancy for sample F (patient with BMD with a deletion of exons 10-44), for which the level of dystrophin quantified by Western blotting is considerably higher than that determined by immunohistochemistry (see discussion).
Figure 2. Assessing the agreement between different methods of immunohistochemical dystrophin measurement. The mean data from each method were compared in a bar chart ± SD (A) and plotted with a regression line (B). The difference between the Arechavala-Gomeza and Taylor methods was plotted against their mean in a Bland-Altman plot (C) where the mean of the differences between the methods represents the bias (i.e., the value determined by one method minus the value determined by the other method) and the upper and lower 95% confidence limits represent the upper and lower limits of agreement, respectively (the difference between the 2 methods should lie within these bounds on 95% of occasions).
DISCUSSION
Dystrophin expression is being used as a secondary outcome measure in several clinical trials, but the lack of standardized procedures limits the ability to compare these different studies.
We set up a study in which we first standardized the methodologies for detecting dystrophin expression and then applied them to assess patients with DMD and BMD. Our data show that optimized immunohistochemistry and Western blotting were surprisingly concordant given the variable nature of dystrophinopathy biopsies (e.g., variable fibro-fatty replacement between biopsies and variable dystrophin content within serial sections of the same biopsy). We demonstrated that properly handled tissue can be distributed to multiple centers internationally to achieve comparable results. Because recent studies demonstrated that dystrophin expression can vary between different controls, we distributed control biopsies to each laboratory. 19 In the context of clinical trials, this is a variable that will need to be considered, either by using one set of control samples, which may not be realistic, or by using humanized mouse muscle. 24,25 In any case in a clinical trial, the use of each patient's pre- or nontreated muscle biopsy is of paramount importance because of the variable levels of revertant fibers and trace dystrophin expression in each patient. 15,16 Our study demonstrates that mean dystrophin levels obtained using 3 alternative immunohistochemical methods were comparable, suggesting that when the mean dystrophin level per biopsy is reported, the choice of which published script is used is not crucial, 18-20 although extra information could be obtained with some approaches over others. 26 We have validated the robustness, in a multicentre setting, of the Arechavala-Gomeza 19 (5 laboratories) and Taylor 18 (3 laboratories) methods by evaluating the precision of results generated by different equipment and operators. The Beekman 20 method was only tested in one of the participating laboratories.
One limitation of this study was that the numbers of controls and test specimens were relatively small, because it is extremely challenging to obtain human muscle biopsies of the size needed for such a comparative study.
Figure 3. Inter- and intralaboratory variability of dystrophin quantification using Western blotting. Five laboratories each quantified the level of dystrophin expression in the same 6 biopsies using a standardized Western blotting protocol. To assess interlaboratory variability, the mean ± SD for each laboratory and biopsy was plotted on a bar chart and the average coefficient of variation (CV) per laboratory calculated. To assess intralaboratory variation, the mean ± SD for each laboratory per sample was calculated as well as the average CV per laboratory. Laboratories are unidentified.
Immunohistochemistry and Western blotting (measuring sarcolemmal and total dystrophin, respectively) give information that is not necessarily identical, but rather complementary, especially in diseased muscle. For example, in some samples (e.g., BMD sample A, c.40_41delGA), the level of dystrophin determined by both techniques was highly comparable, while with others (e.g., BMD sample F, del ex 10-44), the level of dystrophin quantified by Western blotting was significantly higher than that determined by immunohistochemistry. Considering that this patient carries a large deletion removing a significant portion of the actin-binding domain, 27,28 and the ability of this mutant dystrophin to bind to the sarcolemma, we suggest that Western blotting in this case overestimates the amount of functional dystrophin. Similar findings have been reported in transgenic mdx mice carrying BMD-like molecules, in which lower levels of mini-dystrophin were observed in the sarcolemmal Western blot fractions compared with mice expressing full-length dystrophin. 29 Considering that many BMD mutations (and the equivalent DMD mutations after exon skipping) affect the 3-dimensional structure and actin-binding properties of dystrophin, 30-32 capturing both the total amount of dystrophin in the homogenate as well as its localization at the sarcolemma is clearly important. Assessing dystrophin using immunohistochemistry is also important, because a different pattern of expression can lead to differences in the functional outcome irrespective of the total amount of protein present. For example, in transgenic mdx mice, mice with a low, but uniform dystrophin expression have a milder phenotype than mdx mice with a higher, but variable pattern. 33 Immunohistochemistry and Western blotting are not necessarily the most novel methods for dystrophin quantification but they remain widely available and accessible.
Alternative but less widely available techniques, such as mass spectrometry 34 and ELISA, may be advantageous for detecting linear increments of dystrophin from very small amounts of sample. However, their use in isolation would not be desirable because of the issues related to the functionality and localization of mutant dystrophin discussed above. Based on the results of our study, we recommend that dystrophin restoration in clinical trials should be quantified using parallel techniques, which are, in hierarchy of importance: (1) sarcolemmal dystrophin quantification by immunohistochemistry, and (2) quantitative Western blotting or alternative techniques measuring total dystrophin levels in muscle homogenates such as mass spectrometry. Counting dystrophin-positive fibers is also used, but interlaboratory reliability has not been assessed; it currently relies on a qualitative rather than quantitative operational definition for "positive fibers." Nevertheless, within a single laboratory, the reproducibility of counting dystrophin-positive fibers has been indicated, although the use of a pretreatment threshold is paramount. 7,9 Our study demonstrates that when biopsy preparation and antibody protocols are standardized, multiple laboratories are able to reliably measure dystrophin expression using existing techniques. We therefore recommend the use of standardized immunohistochemical and Western blotting methods in parallel as robust biochemical outcome measures for DMD clinical trials.
Figure 4. Assessing the agreement between immunohistochemistry and Western blotting for dystrophin quantification. The mean immunohistochemistry and Western blotting data for each biopsy were compared in a bar chart ± SD (A) and plotted with a regression line (B). The difference between the methods was plotted against their mean in a Bland-Altman plot (C) where the mean of the differences between the methods represents the bias (i.e., the value determined by one method minus the value determined by the other method) and the upper and lower 95% confidence limits represent the upper and lower limits of agreement, respectively (the difference between the 2 methods should lie within these bounds on 95% of occasions).

AUTHOR CONTRIBUTIONS
Karen Anthony: drafting/revising the manuscript for content, study concept or design, analysis or interpretation of data, acquisition of data, statistical analysis, and study coordination. Virginia Arechavala-Gomeza: drafting/revising the manuscript for content, study concept or design, analysis or interpretation of data, acquisition of data, and study coordination. Laura E. Taylor: study concept or design, contribution of vital reagents/tools/patents, and study supervision or coordination. Adeline Vulin: analysis or interpretation of data and acquisition of data. Yuuki Kaminoh: analysis or interpretation of data, acquisition of data, statistical analysis. Silvia Torelli: analysis or interpretation of data, acquisition of data, drafting/revising the manuscript for content. Lucy Feng: drafting/revising the manuscript for content, including medical writing for content, and contribution of vital reagents/tools/patents. Narinder Janghra: analysis or interpretation of data, acquisition of data. Gisèle Bonne: analysis or interpretation of data and study supervision/coordination. Maud Beuvin: analysis or interpretation of data. Rita Barresi: revising the manuscript for content, acquisition of data. Matt Henderson: revising the manuscript for content, contribution of vital reagents/tools/patents and acquisition of data. Steven Laval: analysis or interpretation of data. Afrodite Lourbakos: drafting/revising the manuscript for content, including medical writing for content, study concept or design, and analysis or interpretation of data. Giles Campion: drafting/revising the manuscript for content, including medical writing for content, analysis or interpretation of data, and study supervision or coordination. Volker Straub: study concept or design, analysis or interpretation of data, acquisition of data, study supervision or coordination, and obtaining funding. Thomas Voit: study concept and design, interpretation of data, study supervision, and obtaining funding.
Caroline Sewry: drafting/revising the manuscript for content. Jennifer Morgan: drafting/revising the manuscript for content, study concept or design, analysis or interpretation of data. Kevin M. Flanigan: drafting/revising the manuscript for content, including medical writing for content, study concept or design, analysis or interpretation of data, contribution of vital reagents/tools, acquisition of data, and study supervision/coordination. Francesco Muntoni: drafting/revising the manuscript for content, including medical writing for content, study concept or design, analysis or interpretation of data, study supervision or coordination, and obtaining funding.