Reliability of the NINDS Myotatic Reflex Scale

Abstract
The assessment of deep tendon reflexes is useful for localization and diagnosis of neurologic disorders, but only a few studies have evaluated their reliability. We assessed the reliability of four neurologists, instructed in two different countries, in using the National Institute of Neurological Disorders and Stroke (NINDS) Myotatic Reflex Scale. To evaluate the role of training in using the scale, the neurologists randomly and blindly evaluated a total of 80 patients, 40 before and 40 after a training session. Inter- and intraobserver reliability were measured with kappa statistics. Our results showed substantial to near-perfect intraobserver reliability and moderate-to-substantial interobserver reliability of the NINDS Myotatic Reflex Scale. Reproducibility was better for reflexes in the lower than in the upper extremities. Neither educational background nor the training session influenced the reliability of our results. The NINDS Myotatic Reflex Scale has sufficient reliability to be adopted as a universal scale.
NEUROLOGY 1996;47:969-972
The assessment of deep tendon reflexes is an essential component of the neurologic examination. Myotatic reflex scales [1-8] have been developed in an attempt to obtain "hard" clinical data, necessary for objective longitudinal assessment of patients, by the same or different physicians, and for the evaluation of patients in clinical trials. They can also assist when comparing different studies.
In an effort to standardize the measurement of deep tendon reflexes, the National Institute of Neurological Disorders and Stroke (NINDS) developed a myotatic reflex scale. [8] The similarity of this scale to other reflex scales may facilitate its universal acceptance and utilization. [9] However, rater agreement in using this scale has not been evaluated, and, in general, there has been little research on the reliability of reflex scales. The lack of explicit testing of the reliability of reflex scales restricts their use since reliability is an essential requirement for scientific quality in data management. Reliability includes the study of intraobserver and interobserver agreement. A previous study evaluating the reliability of the nine-point "Mayo Clinic Scale" found considerable disagreement among neurologists. [10] Other studies that evaluated myotatic reflexes without using specific scales also found low agreement between neurologists, [11,12] even though interobserver agreement is not influenced by patients' sex, age, mode of admission, or diagnosis at discharge. [12]
To assess the reliability of the NINDS Myotatic Reflex Scale, [8] four neurologists randomly and blindly evaluated the deep tendon reflexes of 40 subjects. To evaluate the effects of educational background, we chose neurologists who had been instructed in two different countries. Since "training" may reduce interobserver variability, [11,13] we compared the performance of the same group of neurologists before and after being trained to administer this scale.
Methods.
This study was conducted in the Department of Neurology of the Ramos Mejia Hospital, Buenos Aires, Argentina. Reflexes were scored with the NINDS Myotatic Reflex Scale (table 1). Four clinical neurologists with similar backgrounds (two instructed in the United States and two in Argentina; table 2) evaluated a total of 80 subjects: 40 before and 40 after a training session.
Table 1. NINDS Myotatic Reflex Scale [8]
0 = Reflex absent
1 = Reflex small, less than normal; includes a trace response or a response brought out only with reinforcement
2 = Reflex in lower half of normal range
3 = Reflex in upper half of normal range
4 = Reflex enhanced, more than normal; includes clonus if present, which optionally can be noted in an added verbal description of the reflex
Table 2. Raters' attributes
During the training session, examining neurologists (raters) agreed both to use the same type of reflex hammer (long hammer with a soft round rubber head) and to employ the same techniques to elicit each of the reflexes. The raters examined a group of 10 subjects, who did not participate in the present study, until agreement among raters was reached on reflex scores.
Subjects were informed of the study and gave written consent. After examining the first set of 40 patients, the raters completed a form in which they described the type of hammer they had used and gave details of their examining techniques. Each rater, in random order, assessed the reflexes of each subject twice, with 2 hours between assessments. Raters were unfamiliar with the subjects they examined and were provided only with the subject's age and chief complaint. Each rater completed a standardized form immediately after examining a subject. Subjects differed by sex, age, and neurologic diagnosis (table 3).
Table 3. Diagnoses
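The randomized, blinded assessment order described above can be generated straightforwardly. The sketch below (Python; a hypothetical illustration, not the study's actual procedure) draws an independent random rater order for each subject:

```python
import random

random.seed(0)  # fixed seed for reproducibility of this illustration only
raters = ["rater 1", "rater 2", "rater 3", "rater 4"]
subjects = [f"subject {i:02d}" for i in range(1, 41)]

# Independent random permutation of raters for each subject; each rater
# then repeats the assessment 2 hours later.
schedule = {s: random.sample(raters, k=len(raters)) for s in subjects}
print(schedule["subject 01"])
```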
We will present only the results of the first examination, since the interobserver reliability findings based on the first examination were similar to those based on the second.
Statistical analysis.
Intra- and interobserver reliabilities were evaluated using the kappa coefficient, a measure of agreement beyond chance. [14] Similar to a correlation coefficient, kappa varies from -1.0 (complete disagreement) through 0.0 (chance agreement) to +1.0 (perfect agreement). Strength of agreement was designated as poor (kappa < 0.0), slight (0.0 ≤ kappa ≤ 0.2), fair (0.21 ≤ kappa ≤ 0.4), moderate (0.41 ≤ kappa ≤ 0.6), substantial (0.61 ≤ kappa ≤ 0.8), and near perfect to perfect (0.81 ≤ kappa ≤ 1.0), as previously suggested. [15] Since our data are ordinal, we used weighted kappa to allow credit for two types of partial rater agreement. First, weights varied with the extent of the discrepancy; less credit was given to larger differences in scores (i.e., a disagreement between scores 1 and 2 was weighted differently from one between scores 1 and 4) (table 4). Second, since rater confusion between a pathologic reflex and a normal reflex is much more serious than confusion between normal reflexes, deviations from scores indicative of pathology (0 = reflex absent; 4 = reflex enhanced) were weighted differently (table 4). [16] For example, a disagreement between scores 4 and 3 (pathologic and normal reflex) was given less credit than one between scores 3 and 2 (reflexes in the upper and lower halves of the normal range). Weights were based on those presented by Cicchetti (table 4). [16] The statistical significance of differences among and between kappa values was determined using the procedure described by Fleiss [17]; p values were two-tailed. The level of significance was set at p < 0.05, except for pairwise comparisons, which were Bonferroni-corrected.
Table 4. Credit weights (credits for partial agreement) for calculating kappa [16]
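For readers who want to reproduce the calculation, the sketch below (Python; not part of the original study) computes a weighted kappa from two raters' scores. The formula is kappa_w = (p_o - p_e) / (1 - p_e), where p_o is the observed weighted agreement and p_e the weighted agreement expected by chance. The credit matrix W is an illustrative placeholder, not the Cicchetti weights of table 4; it only mimics their structure, with full credit on the diagonal and little credit when a pathologic score (0 or 4) is confused with a normal-range one.

```python
import numpy as np

def weighted_kappa(scores_a, scores_b, weights):
    """Weighted kappa for two raters on a k-point ordinal scale.

    `weights` is a k x k matrix of agreement credits: 1.0 on the
    diagonal (full agreement), partial credit off the diagonal.
    """
    k = weights.shape[0]
    counts = np.zeros((k, k))
    for a, b in zip(scores_a, scores_b):
        counts[a, b] += 1          # joint distribution of the two ratings
    p = counts / counts.sum()
    p_o = (weights * p).sum()                              # observed agreement
    p_e = (weights * np.outer(p.sum(1), p.sum(0))).sum()   # chance agreement
    return (p_o - p_e) / (1.0 - p_e)

# Placeholder credit matrix for the 5-point scale (rows/columns are
# scores 0-4); values are illustrative only, NOT those of table 4.
W = np.array([
    [1.0, 0.5, 0.2, 0.0, 0.0],
    [0.5, 1.0, 0.6, 0.3, 0.0],
    [0.2, 0.6, 1.0, 0.6, 0.2],
    [0.0, 0.3, 0.6, 1.0, 0.5],
    [0.0, 0.0, 0.2, 0.5, 1.0],
])

rater_a = [2, 3, 2, 1, 4, 2, 3, 0, 2, 3]   # made-up example scores
rater_b = [2, 2, 2, 1, 4, 3, 3, 0, 2, 4]
print(f"weighted kappa = {weighted_kappa(rater_a, rater_b, W):.2f}")
```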
Results.
Intraobserver reliability before training.
Intraobserver reliability by rater was substantial to near-perfect, with median kappa values ranging from 0.68 to 0.91. Reliability for rater 3 (0.68) was significantly lower than for the other three raters (0.81 to 0.91); reliability for rater 4 (0.91) was significantly higher than for rater 1 (0.81). Median intraobserver reliability of each reflex was substantial to near perfect (kappa = 0.77 to 0.89). Differences between reflexes were not significant.
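The significance statements above rest on the procedure in Fleiss, [17] which, for two independent kappa estimates, reduces to a z-test on their difference. A minimal sketch of that comparison follows (Python); it assumes each kappa's standard error has already been estimated, and the SE values below are hypothetical, not taken from the study.

```python
from math import erf, sqrt

def compare_kappas(k1, se1, k2, se2):
    """Two-tailed large-sample z-test for two independent kappa estimates."""
    z = (k1 - k2) / sqrt(se1**2 + se2**2)
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))  # normal tail prob.
    return z, p

# e.g., rater 3 vs. rater 4 before training, with hypothetical SEs.
# With four raters there are 4*3/2 = 6 pairs, so a Bonferroni-corrected
# threshold of 0.05/6 (about 0.008) applies to pairwise comparisons.
z, p = compare_kappas(0.68, 0.05, 0.91, 0.04)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
```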
Interobserver reliability before training.
Interobserver reliability by pairs of raters was moderate-to-substantial, with median kappa values ranging from 0.50 to 0.64. Pairwise comparison found no significant differences between pairs of raters.
Median interobserver reliability of each reflex was moderate to substantial (kappa = 0.43 to 0.74). The patellar reflex (right = 0.74; left = 0.69) achieved the highest interobserver agreement, while the biceps reflex had the lowest (right = 0.46; left = 0.43). In general, reliability was better for the lower-extremity reflexes (0.61 to 0.74) than for the upper-extremity reflexes (0.43 to 0.57).
No significant differences in reliability were found when examined by country of education (United States or Argentina).
Form.
Before training, most neurologists used a short reflex hammer (table 2). All of the raters reported a nonsystematic evaluation of the reflexes (e.g., limbs were not always examined in the same position, nor were reflexes always elicited by the same method), but patients were examined mostly in the seated position.
Intraobserver reliability after training.
Intraobserver reliability by rater was substantial to near-perfect, with median kappa values ranging from 0.77 to 0.89. Reliability for rater 3 (0.77) was significantly lower than for raters 2 (0.89) and 4 (0.89). Median intraobserver reliability of each reflex was near perfect (kappa = 0.80 to 0.92). Differences between reflexes were not significant.
Interobserver reliability after training.
Interobserver reliability by pairs of observers was moderate-to-substantial, with median kappa values ranging from 0.51 to 0.61. Pairwise comparison found no significant differences between pairs of raters.
Median interobserver reliability of each reflex was moderate to substantial (kappa = 0.43 to 0.80). Median reliability was significantly better for the lower-extremity reflexes (kappa = 0.67 to 0.80) than for the upper-extremity reflexes (kappa = 0.43 to 0.57).
No significant differences in reliability were found when examined by country of education.
Effects of training.
There were no significant effects of training on intraobserver reliability either by raters or by reflexes. No significant differences were found for interobserver reliability either by pairs of raters or by reflexes.
Discussion.
The intraobserver reliability of the NINDS Myotatic Reflex Scale was substantial to near-perfect, while the agreement between raters was moderate-to-substantial. Our study is difficult to compare with a previous reliability study of the "Mayo Clinic scale," which used only percentage of agreement in the analysis. [10] Percentage of agreement fails to account for concordance due to chance. In spite of using more rigorous statistical measures, the agreement between neurologists in our study was higher than previously reported in studies evaluating selected reflexes. [8-10] However, the agreement between neurologists was not optimal. Interobserver variation cannot be explained by physiologic changes of the reflexes, because the subjects were examined within 2 hours, nor by training or methods of eliciting reflexes. We eliminated bias related to order of evaluation by randomly assigning the order of the examining neurologists.
Even though the neurologists used the same techniques and the same type of reflex hammer to elicit each reflex, and had agreed on how to score the reflexes, training did not influence the assessment of reflexes. Our findings disagree with a recent study showing that differences in techniques of evaluating ankle reflexes were relevant, [18] and with those reporting that training improved performance. [11,13] Country of education apparently had no significant effect on interobserver reliability.
It is unclear why attempts to standardize the method of eliciting reflexes were unsuccessful. Biologic variation of the reflexes is an unlikely reason; whether subjects were or were not relaxed after being examined four times is a moot point, since variability is lower in preactivated muscles. [19] Our study suggests that the performance of neurologists using the NINDS Myotatic Reflex Scale is independent of the techniques they use. The lack of influence of training and technique may reflect the fact that the neurologists were already performing at a high level, leaving little room for improvement.
Similar to previous reports, [10,20,21] reproducibility was better for reflexes in the lower than in the upper extremities. Hyperactive lower-extremity reflexes cannot explain our findings, as there were no differences between the scores of the reflexes in the upper and lower extremities. Moreover, the neurologists had no difficulty in eliciting patients' reflexes, but the upper extremities are harder to position.
We purposely conducted the study out of the context of a routine clinical examination. Additional information might have increased the reliability of the scale. [22] The NINDS Myotatic Reflex Scale has sufficient reliability to be adopted as a universal scale. The use of the NINDS Myotatic Reflex Scale should improve communication among neurologists and the quality of clinical investigations.
Note.
Readers can obtain 4 pages of supplementary material from the National Auxiliary Publications Service, c/o Microfiche Publications, PO Box 3513, Grand Central Station, New York, NY 10163-3513. Request document no. 05301. Remit with your order (not under separate cover), in US funds only, $7.75 for photocopies or $4.00 for microfiche. Outside the United States and Canada, add postage of $4.50 for the first 20 pages and $1.00 for each 10 pages of material thereafter, or $1.75 for the first microfiche and $.50 for each fiche thereafter. There is a $15.00 invoicing charge on all orders filled before payment.
Acknowledgment
We gratefully acknowledge the study subjects for their generous participation.
Copyright 1996 by Advanstar Communications Inc.
REFERENCES
1. Members of the Department of Neurology, Mayo Clinic and Mayo Foundation. Clinical examinations in neurology. 6th ed. St. Louis: Mosby, 1991:240-254.
2. Medical Research Council. Aids to the examination of the peripheral nervous system. Memorandum no. 45. London: Her Majesty's Stationery Office, 1976.
3. DeJong RN, Haerer AF. Case taking and the neurologic examination. In: Joynt RJ, ed. Clinical neurology. Vol 1. Philadelphia: Lippincott, 1995:49-68.
4. Gunderson CH. Essentials of clinical neurology. New York: Raven Press, 1990:85-86.
5. Bradley WG, Daroff RB, Fenichel GM, Marsden CD. Neurology in clinical practice. 2nd ed. Boston: Butterworth-Heinemann, 1996:437.
6. Warlow C. Handbook of neurology. Oxford: Blackwell Scientific Publications, 1993:7-8.
7. Greenberg DA, Aminoff MJ, Simon RP. Clinical neurology. Norwalk: Appleton and Lange, 1993:314.
8. Hallett M. NINDS Myotatic Reflex Scale [letter]. Neurology 1993;43:2723.
9.
10. Stam J, van Crevel H. Reliability of the clinical and electromyographic examination of tendon reflexes. J Neurol 1990;237:427-431.
11. Tomasello F, Mariani F, Fieschi C, et al. Assessment of interobserver differences in the Italian multicenter study on reversible cerebral ischemia. Stroke 1982;13:32-35.
12. Hansen M, Christensen PB, Sindrup SH, Olsen NK, Kristensen O, Friis ML. Inter-observer variation in the evaluation of neurological signs: patient-related factors. J Neurol 1994;241:492-496.
13. Vreeling FW, Jolles J, Verhey FRJ, Houx PJ. Primitive reflexes in healthy, adult volunteers and neurological patients: methodological issues. J Neurol 1993;240:495-504.
14. Cyr L, Francis K. Measures of clinical agreement for nominal and categorical data: the kappa coefficient. Comput Biol Med 1992;22:239-246.
15.
16. Cicchetti DV. Assessing inter-rater reliability for rating scales: resolving some basic issues. Br J Psychiatry 1976;129:452-456.
17. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York: John Wiley & Sons, 1981:222.
18.
19. Toft E, Sinkjaer T, Rasmussen A. Stretch reflex variation in the relaxed and the pre-activated quadriceps muscle of normal humans. Acta Neurol Scand 1991;84:311-315.
20. Ross RT. How to examine the nervous system. New Hyde Park: Medical Examination Publishing, 1985:212-231.
21.
22. Vogel HP. Influence of additional information on interrater reliability in the neurologic examination. Neurology 1992;42:2076-2081.