# Procedures for setting normal values

"Normal values" occupy an important place in medical practice. The judgment regarding the presence or absence of disease, and the need to perform further testing or to treat, may depend importantly on whether measured patient characteristics are within "normal limits". The measurements of interest cover a broad spectrum, including most medical and laboratory assessments. Frequently, however, the quality of the available reference values is far from ideal, mainly because of unrepresentative selection of healthy subjects, inadequate sample size, lack of medical evaluation of subjects, failure to measure other variables that influence the attribute studied, and failure to rigorously apply appropriate statistical analysis. Better normal values might permit more sensitive (detecting disease when it is present) and more specific (not detecting disease when it is absent) identification of disease. Better reference values, even for such established tests as nerve conduction studies, might tangibly improve the value of the test. The approaches we describe might also serve to create quality reference values for newly introduced measures and tests to be used in medical practice, epidemiologic research, and controlled clinical trials.

In the following sections, we discuss clinical characteristics and laboratory measurements for which normal values might be needed, why a representative sample of subjects from a population is best, the deletions from this cohort that should be considered, additional variables that might also be assessed, and the specific statistical steps used to estimate variable-specific percentile values.

This article is fashioned as a didactic guide. The insights are intuitive or practical extensions from consideration of scientific and statistical concepts. We provide a short list of references for background reading ^{[1-15]} and cite specific additional studies related to the examples we include.

Tests and clinical phenomena for which reference limits can be set. Normal values have been published for such measurements as height, weight, pulse, blood pressure, features of the electrocardiogram, characteristics of pulmonary function tests, attributes of nerve conduction, and a variety of tests performed on body fluids. What these items have in common is that they are measured on a continuum and can be measured accurately and reproducibly.

There undoubtedly are some clinical, psychometric, pathologic, cytologic, and imaging phenomena that might be quantitated and for which reference limits could be estimated. For these phenomena, measuring instruments may need to be developed. Ideally, the scale developed would be simple, have clearly defined boundaries between the units of measurement, be graded over a wide range, and give reproducible measurements both in health and disease.

As the following examples will show, some tests are designed to measure variability in health, while others are designed to characterize and scale disease abnormality. In the first case, a departure from a normal result may indicate disease. The test may also provide some indication of the severity of disease, but in this respect is often limited. Examples are given below of several test instruments employing scales that characterize the disease process (eg, Neuropathy Symptom Profile (NSP) and Minnesota Multiphasic Personality Inventory (MMPI)). In the second case, the measuring instrument is designed to characterize and quantitate or stage disease.

Example 1. The Medical Research Council (MRC) grading scale of muscle weakness. Many neurologists grade muscle weakness using the MRC grading scale ^{[16]} as follows: 5 = normal strength; 4 = weakness, but strength greater than 3; 3 = weakness, but the limb can just be moved against gravity; 2 = severe weakness, but the limb can be moved with gravity eliminated; 1 = visible contraction of muscle, but the limb cannot be moved even when gravity is eliminated; and 0 = paralysis. This scale has worth because (1) it is easily understood; (2) it is in common use (perhaps more than one-half of North American neurologists use it); and (3) reproducibility is good for severe weakness. The major disadvantages are (1) the minimal criteria for grade 4 weakness depend on a physician's judgment of what is normal or abnormal, a judgment that may be difficult because muscle strength presumably varies with the physician, muscle group, age, gender, physical fitness, and perhaps other variables; (2) grade 4 encompasses too broad a range of weakness, from mild to severe; (3) the test is limited to one muscle or functional muscle group and does not provide a global estimate of muscle weakness or neurologic abnormality; (4) sensation and reflexes are not included; and (5) the steps of the scale are of uneven magnitude.

Neurologists have expanded MRC grade 4 into 4-, 4, and 4+ to indicate that the muscle could overcome slight, intermediate, or strong force. Even this approach is not standardized, since the effort needed for a small, weak muscle is undefined and grossly different from that needed for a large, strong muscle.

The scale was designed for the assessment of muscle recovery after nerve injury. For this purpose, it is an excellent scale. It is not an ideal scale for assessing weakness in neuromuscular disease. Since it is designed to characterize disease, it would not be helpful for assessing a healthy subject cohort.

Example 2. The Neuropathy Disability Score (NDS). The NDS is also designed to quantitate neurologic deficit in disease; it was designed to provide an overall score of neuropathic deficit. The scaling approach is a modification of the one introduced by neurologists at Mayo Clinic. Dyck et al ^{[17,18]} modified this grading approach by choosing a list of muscles, reflexes, and sensory modalities of the index finger and great toe so as to express the overall neuropathic deficit as one number. Weakness is graded as 0 = normal, 1 = 25% weak, 2 = 50% weak, 3 = 75% weak, and 4 = paralyzed, taking into account the muscle and the age, gender, height, weight, and physical fitness of the subject. Reflexes are graded as 0 = normal, 1 = decreased, and 2 = absent, taking into account the variables listed for weakness. Touch-pressure, vibration, joint position, and pin prick of the index finger and great toe are graded as 0 = normal, 1 = decreased, and 2 = absent, again taking these variables into account. Cranial nerve function is also included. A person without any abnormality has a score of 0; the maximum deficit score is 244. The desirable features of the NDS are (1) it provides a global scale of weakness, state of the reflexes, and distal sensation; (2) the gradations of the scale are conceptually of equal magnitude; (3) it normalizes results for site, age, gender, height, weight, and physical fitness; and (4) it has been demonstrated to be meaningful and quite reproducible among trained observers in a cohort of neuropathy patients with varying degrees of neuropathic abnormality.
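To make the arithmetic of a summed score concrete, the Python sketch below checks each item grade against the ranges stated above (0 to 4 for weakness, 0 to 2 for reflexes and sensation) and adds them into one number. The item names and the three-item "exam" are hypothetical; the actual NDS item list, including cranial nerves, that yields a maximum of 244 is not reproduced here, and judging what counts as "normal" for the subject remains the examiner's task.

```python
# Maximum grade per item type, following the grading rules described above.
MAX_GRADE = {"weakness": 4, "reflex": 2, "sensation": 2}


def nds_total(items):
    """Sum item grades; items is a list of (item_name, item_type, grade) tuples.

    Item names are hypothetical placeholders, not the official NDS item list.
    """
    total = 0
    for name, kind, grade in items:
        if not 0 <= grade <= MAX_GRADE[kind]:
            raise ValueError(f"{name}: grade {grade} outside 0..{MAX_GRADE[kind]}")
        total += grade
    return total


# Hypothetical partial examination.
exam = [
    ("hip flexion, left", "weakness", 1),    # 25% weak
    ("ankle reflex, right", "reflex", 2),    # absent
    ("vibration, great toe, left", "sensation", 1),  # decreased
]
print(nds_total(exam))
```

A person with no abnormality on every item scores 0, matching the convention in the text.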

This scaling is clearly designed to characterize and quantitate neuromuscular conditions like peripheral neuropathy. In chronic inflammatory demyelinating polyradiculoneuropathy ^{[19]} and in neuropathy associated with monoclonal gammopathy, ^{[20]} the overall neuropathic deficit as measured with NDS was significantly related to the summated compound muscle action potentials (CMAPs) of limb nerves--the latter a test that is objective and not subject to bias. Efficacy of plasma exchange, compared with sham exchange, was demonstrated using NDS and ΣCMAP (CMAP values from different nerves added together). Would it be valuable to study a healthy subject cohort? It could be done to let the examiner know whether his or her judgment of normality is correct, but publishing these normal values would have little meaning since another examiner might make other judgments. Therefore, although the NDS is quite useful in characterizing and quantitating global severity of neuromuscular disease, eg, peripheral neuropathy, it depends heavily on physician judgment and is only partially standardized.

Example 3. The Neuropathy Symptom Profile. Unlike the foregoing two tests, this one is standardized and reflects variability of neuromuscular symptoms in health and in disease ^{[21]}. Printed questions read by the subject or patient are answered by checking boxes marked true or false, and the questionnaire is scored by optical reader. The questions have been divided into scales for neuropathy, weakness, sensory, autonomic, and so on. Because the questionnaire has been given to a healthy subject cohort free of neuromuscular disease, the normal variability of the responses has been determined. The questionnaire has also been given to patients with various neuromuscular diseases, so that patterns of abnormality have been demonstrated. In contrast to the two preceding measurement instruments, this one is standardized: the same questions are asked in the same way and are scored in the same way. Of course, the wording may be understood and interpreted differently by men and women, young and old, or persons from different ethnic or educational groups. It may be important, therefore, to use normal values from the same population as the disease or experimental group.

Example 4. Quantitative sensory testing. The quantitative sensory testing approaches we have introduced (Computer Assisted Sensory Examination System IV (CASE IV)) ^{[22]} were designed to characterize sensory threshold and its variability in health and to recognize disease by finding abnormality of threshold. Because the stimulus waveform is the same regardless of stimulus intensity, because intensity varies over a broad range, and because standard approaches to testing and estimating threshold are used, it is possible to determine thresholds specific for site, age, gender, and other variables. In the ideal system, all aspects of testing are standardized ^{[23]}. The stimulus employs an appropriate invariant waveform, stimulus intensities are given in steps over a broad range of magnitudes, the magnitude of each step is known, and stimuli are calibrated at frequent intervals to ensure that they remain within narrow tolerances. Mechanical stimuli are delivered from a machine-held (not hand-held) arm with a constant load, and temperature pulses are superimposed on a constant baseline temperature. The testing algorithm is standardized and validated in groups of healthy subjects and patients with disease, and it is programmed into the controller computer. The algorithm used to estimate threshold is also validated and programmed into memory. Once normal limits have been estimated, it is possible to print out percentiles for modality, site, age, and other variables in a given patient with neuromuscular disease. The CASE IV system provides this level of standardization.

This test, therefore, is highly standardized and is designed to evaluate sensory threshold in healthy subjects and to recognize abnormality in disease.

Example 5. Minnesota Multiphasic Personality Inventory. In the development of the MMPI, a battery of questions was given to persons with and without defined categories of psychoneurosis. Normal limits were set for individual scales of anxiety, depression, hypochondriasis, and so on. The pattern of abnormality among scales provides some insight into the type and severity of psychoneurosis and of psychosis.

Example 6. Kurtzke's scaling of neurologic deficits. Kurtzke's scaling of neurologic deficits ^{[24-26]} is clearly designed to characterize and quantitate neurologic deficit in one disease, multiple sclerosis. Functional measures of activities of daily living, eg, walking a certain distance, are combined with certain neurologic findings. Clearly, a study of normal subjects would have little value.

These examples are perhaps sufficient to indicate that testing of healthy subjects is inherent in the design of some testing instruments and is necessary to express meaningful results. Examples include the determination of vibratory, cooling, and heat-pain detection thresholds. With a battery of sensory tests or the different scales of the NSP and MMPI, certain patterns of disease may be revealed, so these instruments may provide characterizing and even diagnostic information. Other grading scales serve solely to characterize and quantitate the severity of a disease; for these, studies of normal subjects are not generally needed.

Another concept about clinical measuring instruments should be introduced. It is important that measurements be objective and not subject to volition or to bias by either the patient or the observer. Consider the problem of muscle force. Two objective tests of muscle function are the muscle twitch and the CMAP. Because the magnitude of these responses is thought to be unrelated to volition or to patient or observer bias, they may be regarded as objective tests. Unfortunately, only a few muscles can be tested by these methods, and the procedures are somewhat time consuming and expensive. By contrast, measurement of isometric contraction force with a strain gauge depends on motivation and effort and is influenced by concurrent discomfort. For sensory nerve function, the sensory nerve action potential is an objective measure. For autonomic nerves, a variety of objective tests are available.

Selection of the reference cohort. Setting normal values for a physical attribute or test result requires selection of a reference group of subjects who will reflect the norm or standard against which a patient's attribute or result can be compared. Considerable care needs to be taken in the selection of these reference subjects. Too often, the controls consist of the investigator, his or her laboratory associates, volunteers obtained by advertisement, or simply patients not known to have the phenomenon or disease studied. For some purposes, such a reference cohort may be suitable, but for other purposes, it may not be. Chosen persons may be unrepresentative of the population. The reference group may be atypical by age, gender, height, weight, diet, and other variables that may influence the characteristic studied. Therefore, ideally one should begin by randomly selecting persons from the general population. Assuming that a sufficiently large number of persons is studied, a random cohort will have been selected that should be representative by age, gender, ethnic origin, and other variables of interest. When reporting, one should describe the selection approaches and the reference population in sufficient detail so that the physician or investigator using the reference table can know whether the values might be appropriate for his or her purposes.

Having chosen randomly sampled persons from the community, one must next decide which persons, if any, should be excluded. The choice of whether certain diseases are to be a basis for exclusion may depend on what the reference values are to be used for. If one wishes to characterize a physical attribute irrespective of disease, one would include everyone. If, on the other hand, one wishes to provide reference values for detection of a disease, patients with the disease in question should be excluded. To illustrate, in our studies of nerve conduction, we exclude not only persons with neuropathy but also persons with neurologic diseases whose effects on nerve conduction are unknown and persons with diseases known to predispose to neuropathy.

Even when obvious disease that may affect a test result has been excluded, the reference cohort may reflect an unhealthy sample. Assume that the average western adult is overweight and eats too much animal fat. Many persons in the population may then be unhealthy. Weight and plasma lipids from this population might, in part, reflect this unhealthy lifestyle. For certain purposes, therefore, one might want to exclude from the reference cohort persons with unhealthy behavior.

A more subtle issue is whether a certain departure from usual values of a characteristic for which reference limits are being set may itself be used as an exclusion criterion. Let us take the example of a subject who was randomly selected, who is not known to have a neurologic disease, and whose clinical examination is normal but who has conduction velocities of limb nerves in the range of 20 to 30 m/sec. Such values clearly are below what is usually encountered. One gains further insight from the discovery that the abnormality of nerve conduction is generalized and other nerves are similarly affected. It is known that this phenomenon may occur in a patient with inherited neuropathy ^{[27]}. Despite the circularity in reasoning, one is perhaps justified in excluding certain values known from previous experience to be unequivocally abnormal.

Inviting persons randomly chosen, eg, from a census list, does not guarantee that the cohort agreeing to participate will be representative of the community. One should compare the participating subjects to nonparticipating subjects. It is more likely that the cohort may be representative of the community if the test can be quickly done, is painless, and does not inconvenience the subject. Thus, it is more likely that a representative cohort of subjects can be acquired for the measurement of such simple tests as blood pressure and blood constituents than for procurement of biopsied nerve or brain specimens.

Prospective evaluation of the reference population. For certain purposes, it may be useful to study previously obtained and stored material for a test result or phenomenon. The use of such banked tissue for purposes of setting reference limits sometimes is questionable and may reflect circular thinking. Let us assume that a laboratory is interested in developing an antibody assay for the detection of a disease. Let us assume further that many serum samples are submitted to the laboratory because they may reflect the disease in question. The investigator might then divide the serum samples into two groups: those from subjects with the disease (presumably adjudicated to have the disease by clinical criteria) and those from subjects not known to have the disease. To assume that those not known to have the disease are an appropriate reference source is questionable, because they may in fact have the disease in mild or different form (as suspected) or have another disease that might affect the test result. There is also the distinct possibility, perhaps likelihood, that the persons who had serum banked are not representative of the community. Finally, stored serum may have changed during storage, so its results cannot be directly compared with results from the freshly obtained serum of the patients to be evaluated.

For these reasons, it is best to specifically recruit a random cohort of persons from the community and then to assess their physical features and diseases sufficiently to know which persons, if any, should be excluded.

More than age should be taken into account. Although many physicians probably appreciate that age may affect normal values for such physiologic measurements as blood pressure, many do not recognize that a variety of other variables may be as influential as age, or more so, in determining normal values. To illustrate, prostate specific antigen and a variety of hormones are influenced not only by age but also by gender and smoking. Altitude may affect hemoglobin concentration. Daily exercise and dietary and mineral intake may influence pulse and blood pressure. In a study of vibratory detection threshold, we found important associations with anatomic site, age, gender, body surface area, and body mass index. For certain attributes of nerve conduction (for example, the sensory nerve action potential), age, gender, and body mass index are important. The list differs for various attributes of nerve conduction. Some of these associations are readily explained; others are not. Thus, persons continuously exposed to higher altitudes presumably respond to the lower oxygen tension by forming more hemoglobin. The displacement of the vibratory stimulus may need to be greater in the fat than in the lean person because more fat is interposed between the stimulus and the receptors situated adjacent to tendons.

In these examples, we have assumed that the associations with the explanatory variables occur independently of the disease process of interest. For example, persons may have diminished sensation as a result of the aging process, apart from neuropathic disease, and it is the fact of neuropathic disease that we want to detect, not aging. However, in some instances, it may be the aging process itself that is producing both the progression of disease and worsening of the measurement of interest. For example, age-associated loss of bone density in women is indicative of osteoporosis. In this case, one may not wish to adjust for age: bone density below a certain level at any age may be of the same medical significance.

Ideally, decisions about which variables to adjust for in determining reference values would be made from a priori biologic and medical information regarding the importance of each variable. In practice, however, such information may be either imperfect or nonexistent. In these instances, the statistical considerations discussed below may be helpful. In applying these algorithms, it will continue to be necessary to keep issues of biologic mechanism and plausibility in mind. For example, the fact that average values of some laboratory measurements change with age has become increasingly well recognized, so that adjustment is increasingly made for such changes. Less well recognized is that the variability among subjects may increase with age as well. In our studies, we have found that variability in sensation tends to increase with age. In examining the nature of the increasing variability, it became apparent that the lower percentiles remained somewhat constant but that the upper percentiles (representing diminished sensation) often increased markedly with age. Statistical algorithms must take such trends into account.

Cost effectiveness of prospectively assessing normal controls. The cost of selecting, evaluating, and analyzing a prospectively chosen normal cohort is considerable. Some thought, therefore, should be given to whether the improved estimate of abnormality and detection of disease that result can be justified. Accurate, generalizable reference values may aid in research studies, detection of disease, and accuracy of follow-up. For certain epidemiologic and controlled clinical trial studies, this degree of accuracy in quantifying abnormality may be justified. For other purposes (for example, a preliminary survey of a phenomenon in a population), this degree of sophistication may be unnecessary.

Statistical methods for computing normal values. 1. A single dependent variable. In considering the statistical techniques useful in obtaining normal values, we start with the simplest situation: only one characteristic (or, in statistical language, one dependent variable) for which reference values are desired, and no explanatory variables (independent variables) such as age or gender are to be considered. In the past, interest has centered on the 2.5th and 97.5th percentiles, commonly referred to as a normal range. However, in view of the wide availability of computers, there is no need to restrict attention to the resulting dichotomy of normal/abnormal. It is more informative to indicate the actual percentile attained by an individual patient, and computing each of the percentiles (from 1 through 99) is easily done with a computer. Notationally, we will refer to the $P$th percentile for a variable Y as $Y(P)$, for $P = 1, \ldots, 99$.

Notice that Y(P) is defined as the value such that P% of values in the reference population are less than Y(P). To estimate Y(P) from a study population, one simply selects the corresponding percentile in the set of study values. This method is simple and universally valid, requiring no esoteric mathematical assumptions. Notice, however, that large sample sizes are required to estimate very large or very small percentiles. This is not unreasonable. For example, common sense would indicate that, if we are to estimate the 99th percentile (that value below which 99% of persons fall), we would need at least 99 subjects in our study. Conversely, it should be apparent that a sample of 15 subjects, for example, would not suffice.
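The distribution-free approach just described can be sketched in a few lines of Python using only the standard library: `statistics.quantiles(..., n=100)` returns the 99 cut points that serve as estimates of $Y(1), \ldots, Y(99)$, and the percentile attained by a patient is the share of reference values below the patient's value. The choice of the `"exclusive"` interpolation rule and the toy reference sample of the integers 1 to 100 are assumptions for illustration, not the authors' procedure.

```python
import bisect
import statistics


def reference_percentiles(values):
    """Estimate Y(P) for P = 1..99 directly from a reference sample.

    No distributional assumption is made; 'exclusive' is one of several
    interpolation conventions between neighbouring ordered values.
    """
    cuts = statistics.quantiles(values, n=100, method="exclusive")  # 99 cut points
    return dict(zip(range(1, 100), cuts))


def percentile_attained(values, y):
    """Percentage of reference values strictly below a patient's value y."""
    ordered = sorted(values)
    return 100.0 * bisect.bisect_left(ordered, y) / len(ordered)


reference = list(range(1, 101))  # hypothetical reference sample of 100 values
table = reference_percentiles(reference)
print(table[50], table[95])
print(percentile_attained(reference, 50))
```

With only a handful of reference values the extreme cut points are little more than interpolations or extrapolations near the sample minimum and maximum, which is the sample-size point made in the text.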

Unfortunately, there is a common misconception that statistical wizardry can obviate the need for large sample sizes in setting normal values. This misconception is based on the erroneous belief that 95% of values in any population can be expected to lie within the mean ± 2 SDs. The assertion is based on the assumption that values in a population must follow what statisticians refer to as a Gaussian, or normal, distribution. The use of the label "normal distribution" is unfortunate, since there is no basis for assuming that values in any population must follow such a distribution.

The lack of any rational basis for assuming that values in a population follow a Gaussian distribution was pointed out eloquently by Elveback et al ^{[5-7]}. We will only add here that a Gaussian distribution implies the existence of negative values in the population, a circumstance that is often impossible. On the other hand, if one accepts that a Gaussian distribution is not a mathematical necessity but might only hold approximately, one must be concerned with the accuracy of the approximation (strictly speaking, one might argue that any statement is true "approximately"). In this case, however, one must acknowledge that judging the accuracy of a normal distribution in estimating normal percentiles in any specific instance would require knowledge of the true percentiles. Simply having a bell-shaped curve is not sufficient to ensure close agreement of percentiles.
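The following Python sketch illustrates the point on simulated data: for a right-skewed, strictly positive measurement, the mean ± 2 SD "range" can have a negative lower limit, while the empirical 2.5th and 97.5th percentiles stay within the possible values. The log-normal population, its parameters, and the random seed are assumptions chosen only to produce skewed data; they do not represent any measurement discussed in this article.

```python
import random
import statistics


def gaussian_vs_empirical(sample):
    """Contrast mean +/- 2 SD with the empirical 2.5th and 97.5th percentiles."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    cuts = statistics.quantiles(sample, n=40)  # 39 cut points, one every 2.5%
    return {
        "mean_2sd": (mean - 2 * sd, mean + 2 * sd),
        "empirical": (cuts[0], cuts[-1]),  # 2.5th and 97.5th percentiles
    }


rng = random.Random(1)
# Hypothetical right-skewed, strictly positive measurement (log-normal).
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
limits = gaussian_vs_empirical(sample)
print(limits)
```

Every simulated value is positive, so a negative lower limit from mean minus 2 SD is impossible in the population, mirroring the observation made for the urea data below.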

The observation that the "normal law of errors" lacks both a mathematical and an empirical basis was also pointed out eloquently by Poincaré: "Everyone believes in the normal law of error, the physicists because they think that the mathematicians have proved it to be a mathematical necessity, the mathematicians because they believe that physicists have established it by laboratory demonstration" ^{[6]}. To illustrate the issues raised thus far, we consider a population of serum urea values obtained from consecutive patients at the Mayo Clinic. The population consisted of 5,594 values, which were stored on a computer. We then generated a true random sample of 100 values from this population (Table 1). Note that the 95th percentile may be estimated by the value 82, the 95th largest value. In fact, any number between 69 and 82 will exceed 95% of values in the sample, and thus any of these could be used to estimate the 95th percentile. Rather than choosing the largest value, an intermediate value is usually chosen, and various strategies are available for making the choice. Note also that the mean minus 2 SDs yields a negative number.
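Because the sample 95th percentile is not unique, software packages differ in which value in the admissible interval they report. For a sample of 100, any value above the 95th ordered value and up to the 96th exceeds 95% of the sample. The Python sketch below lists a few common choices for a hypothetical sample of the integers 1 to 100 (not the urea data); the two interpolation rules are the `"exclusive"` and `"inclusive"` methods of the standard-library `statistics.quantiles`.

```python
import statistics

sample = list(range(1, 101))  # hypothetical sample of n = 100 values, sorted
lower, upper = sample[94], sample[95]  # 95th and 96th ordered values

# Candidate estimates of the 95th percentile; all lie between the two order statistics.
estimates = {
    "lower order statistic": lower,
    "upper order statistic": upper,
    "midpoint": (lower + upper) / 2,
    "exclusive interpolation": statistics.quantiles(sample, n=100, method="exclusive")[94],
    "inclusive interpolation": statistics.quantiles(sample, n=100, method="inclusive")[94],
}
for rule, value in estimates.items():
    print(f"{rule}: {value}")
```

With well-spread data the choice matters little, but when the ordered values are far apart in the tail (as with 69 and 82 above) the reported limit can shift noticeably.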

To illustrate the need for large sample sizes in estimating percentiles, we generated nine additional samples of size 100 from this population (Table 2). Notice that estimates of the mean and median ($P_{50}$) are quite stable, indicating that n = 100 is a sufficiently large sample size for estimating a typical value. (Note also that medians are more appropriate than means when the distribution of values is skewed, as in the present instance, with the largest values in the population more spread out than the smallest values.) However, the upper percentiles vary markedly from one sample to the next. In general, we recommend that a sample size of 100 be considered a minimum for reliably estimating normal values.
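The resampling experiment can be imitated in Python with simulated data. In the sketch below, a hypothetical log-normal population of 5,594 values stands in for the urea values (the actual Mayo data are not reproduced); ten samples of size 100 are drawn, and the sample-to-sample spread of the median is compared with that of the 95th percentile. The distribution parameters and seed are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical skewed population standing in for the urea values (not the Mayo data).
population = [rng.lognormvariate(3.3, 0.6) for _ in range(5594)]

medians, p95s = [], []
for _ in range(10):  # ten independent samples of size 100
    sample = rng.sample(population, 100)
    cuts = statistics.quantiles(sample, n=100)
    medians.append(cuts[49])  # 50th percentile
    p95s.append(cuts[94])     # 95th percentile

print("SD of sample medians:", round(statistics.stdev(medians), 2))
print("SD of sample 95th percentiles:", round(statistics.stdev(p95s), 2))
```

In runs of this kind the 95th percentile typically varies several times more than the median, because few observations fall in the sparse upper tail.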

2. A single independent variable. Next, suppose that values of Y vary with age. In this case, one would wish to estimate percentiles taking into account the ages of persons in the population. The methodology for doing so is described below.

Step 1. Visually inspect a scatterplot of the data for outliers. If present, these values are to be deleted from the regression analyses that follow (steps 3 and 4) but not from the computation of Z scores and percentiles at the end (steps 5 through 7).

Step 2. Visually inspect for skewness, possibly indicating the need to transform the data (taking logarithms of original values, for example).
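Visual inspection can be supplemented by a simple numerical check. The Python sketch below computes the moment coefficient of skewness before and after a logarithmic transformation; the simulated log-normal data and the seed are assumptions for illustration, and a skewness near zero after transformation is only a rough guide, not a formal test.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Moment coefficient of skewness: mean cubed deviation divided by SD cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


rng = random.Random(3)
raw = [rng.lognormvariate(3.0, 0.5) for _ in range(2000)]  # hypothetical right-skewed data
logged = [math.log(v) for v in raw]  # logarithmic transformation
print(round(sample_skewness(raw), 2), round(sample_skewness(logged), 2))
```

If a transformation is used, percentiles are computed on the transformed scale and converted back (here with `math.exp`), since percentiles are preserved by monotonic transformations.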

Step 3. Evaluate whether the average value of Y changes with age. This may be done by constructing a regression equation relating Y to age, of the form $\hat{Y} = b_0 + b_1\,\text{age}$, where $\hat{Y}$ is the value of Y predicted for a given age. The decision to adjust for age at this point would depend on several factors. Is the association statistically significant? Is the association with age of sufficient magnitude to warrant an adjustment? In assessing this, one might compare the SD of the Y values to the SD of the differences between the Y values actually observed and the corresponding estimates obtained from the regression equation. (These differences are referred to as the residuals from regression.) Alternatively, one might compare the squared values of these quantities, which statisticians refer to as variances. The percent reduction in the variance of Y is given by a statistic called "R squared". If Y is found to depend on age, one might then also consider adding higher-order terms (age squared, for example) to the estimation equation to allow for a curved relation. If no adjustment is made for age at this point, $\hat{Y}$ is set equal to the mean of Y.
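The quantities compared in step 3 can be computed directly. The Python sketch below fits $\hat{Y} = b_0 + b_1\,\text{age}$ by ordinary least squares and reports the SD of Y, the SD of the residuals, and R squared (one minus the ratio of residual variance to the variance of Y). It is a descriptive sketch only: it omits the significance test, and the six-subject example data are hypothetical.

```python
import statistics


def age_fit_summary(ages, ys):
    """Fit Y_hat = b0 + b1*age by least squares; compare SD of Y with SD of residuals."""
    ma, my = statistics.fmean(ages), statistics.fmean(ys)
    b1 = (sum((a - ma) * (y - my) for a, y in zip(ages, ys))
          / sum((a - ma) ** 2 for a in ages))
    b0 = my - b1 * ma
    residuals = [y - (b0 + b1 * a) for a, y in zip(ages, ys)]
    # Residuals from a fit with an intercept have mean zero, so this is 1 - SSres/SStot.
    r_squared = 1 - statistics.pvariance(residuals) / statistics.pvariance(ys)
    return {"b0": b0, "b1": b1,
            "sd_y": statistics.stdev(ys),
            "sd_residuals": statistics.stdev(residuals),
            "r_squared": r_squared}


# Hypothetical measurements in six subjects.
summary = age_fit_summary([22, 35, 41, 50, 63, 70], [3.1, 3.4, 3.3, 3.9, 4.2, 4.6])
print(summary)
```

A large drop from `sd_y` to `sd_residuals` (equivalently, a large R squared) suggests that an age adjustment is worth making.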

Step 4. It is equally important to consider whether the variability in Y values changes with age. This is commonly the case, since the aging process can be expected to progress at different rates, and in different forms, in individual subjects in the population. For percentiles above the 50th, one may estimate the effect of age on variability by regressing the positive residuals (R) against age, obtaining an equation of the form $\hat{R} = C_0 + C_1\,\text{age}$. The need to adjust for an association with age at this point is judged in a manner analogous to the method described in step 3. If no adjustment is made, $\hat{R}$ is set equal to 1.

Step 5. For each subject (including the outliers excluded from the regression analyses), compute an age-adjusted Z score: $Z = (Y - \hat{Y})/\hat{R}$.

Step 6. Compute the percentiles for Z, using the methodology described in section 1, obtaining $Z(P)$ for $P = 1, \ldots, 99$. (We have explicitly described the algorithm for the upper percentiles. The lower percentiles are obtained in the same manner, but using the absolute values of the negative residuals from step 3 in place of the positive residuals in step 4.)

Step 7. To obtain percentiles for Y as a function of age, one computes $Y(P) = \hat{R}\,Z(P) + \hat{Y}$.
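Steps 3 through 7 can be strung together in a compact Python sketch. It fits $\hat{Y}$ against age, fits one scale line $\hat{R}$ to the positive residuals (upper tail) and another to the absolute negative residuals (lower tail), forms Z scores, takes their empirical percentiles, and back-transforms. Several points are our assumptions rather than the authors' specification: steps 1 and 2 (outlier removal, transformation) and all significance tests are omitted, both tails are always age-adjusted, the 50th percentile is taken from the upper-tail fit, and the fitted $\hat{R}$ is assumed to stay positive over the ages studied. The simulated data are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares fit of y = c0 + c1*x; returns (c0, c1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    c1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - c1 * mx, c1


def build_age_percentiles(ages, ys):
    """Steps 3-7 for one explanatory variable; returns a function (P, age) -> Y(P)."""
    b0, b1 = fit_line(ages, ys)                          # step 3: Y_hat vs age
    resid = [y - (b0 + b1 * a) for a, y in zip(ages, ys)]
    pos = [(a, r) for a, r in zip(ages, resid) if r > 0]  # step 4: upper tail
    neg = [(a, -r) for a, r in zip(ages, resid) if r < 0]  # |negative residuals|
    c_up = fit_line([a for a, _ in pos], [r for _, r in pos])
    c_lo = fit_line([a for a, _ in neg], [r for _, r in neg])

    def r_hat(c, age):
        return c[0] + c[1] * age  # assumed positive over the age range studied

    # Step 5: Z scores for every subject; step 6: their percentiles, P = 1..99.
    zq_up = statistics.quantiles([r / r_hat(c_up, a) for a, r in zip(ages, resid)], n=100)
    zq_lo = statistics.quantiles([r / r_hat(c_lo, a) for a, r in zip(ages, resid)], n=100)

    def percentile(p, age):
        """Step 7: Y(P) = R_hat(age) * Z(P) + Y_hat(age)."""
        c, zq = (c_up, zq_up) if p >= 50 else (c_lo, zq_lo)
        return r_hat(c, age) * zq[p - 1] + b0 + b1 * age

    return percentile


# Hypothetical data: both the mean and the spread of Y increase with age.
rng = random.Random(11)
ages = [rng.uniform(20, 80) for _ in range(2000)]
ys = [10 + 0.1 * a + rng.gauss(0, 1 + 0.05 * a) for a in ages]
y_pct = build_age_percentiles(ages, ys)
for age in (25, 75):
    print(age, [round(y_pct(p, age), 1) for p in (5, 50, 95)])
```

Fitting separate scale lines to the two tails lets the upper percentiles fan out with age while the lower ones stay nearly flat, the pattern described earlier for sensory thresholds.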

3. Multiple explanatory variables. In practice, one may have a variety of characteristics that may influence the measurement of interest, such as age, gender, weight, height, occupation, and so forth. The methodology for incorporating such variables is easily accomplished, using the basic principles described in section 2. However, in developing the analogous regression equations for $\hat{Y}$ and $\hat{R}$ (accounting for factors that influence average values and variability, respectively), we recommend using a stepwise regression approach. Thus, in the regression equation for $\hat{Y}$, one first identifies the variable that best predicts Y. After including this variable in the prediction equation, one then identifies which of the remaining variables adds the most information (provides the best prediction of Y) beyond the information already available from the first variable entered. One continues in this way until none of the remaining variables provides a statistically significant improvement in the prediction of Y. This process yields a prediction equation of the form $\hat{Y} = b_0 + b_1\,\text{age} + b_2\,\text{weight} + \cdots$.
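Forward stepwise selection as described can be sketched in pure Python. At each stage the candidate that most reduces the residual sum of squares is tried, and it enters only if its partial F statistic reaches a threshold. Using a fixed "F-to-enter" of 4 (roughly a 5% significance level for large samples) in place of an exact p-value is our simplification, as are the hand-rolled least-squares solver and the simulated data with hypothetical variables `age`, `gender`, and `unrelated`; a statistical package would normally be used instead.

```python
import random


def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus the given columns."""
    n, k = len(y), len(columns) + 1
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Normal equations (X'X) b = X'y as an augmented matrix, solved by Gauss-Jordan.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    b = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(rows, y))


def forward_stepwise(candidates, y, f_to_enter=4.0):
    """Add variables one at a time while the best one's partial F >= f_to_enter."""
    n, chosen = len(y), []
    current = rss([], y)
    while len(chosen) < len(candidates):
        trials = {name: rss([candidates[v] for v in chosen + [name]], y)
                  for name in candidates if name not in chosen}
        best = min(trials, key=trials.get)
        df_resid = n - len(chosen) - 2  # intercept + entered variables + candidate
        f_stat = (current - trials[best]) / (trials[best] / df_resid)
        if f_stat < f_to_enter:
            break
        chosen.append(best)
        current = trials[best]
    return chosen


rng = random.Random(5)
n = 300
data = {"age": [rng.uniform(20, 80) for _ in range(n)],
        "gender": [rng.choice([0, 1]) for _ in range(n)],
        "unrelated": [rng.gauss(0, 1) for _ in range(n)]}
y = [2 + 0.5 * a + 3 * g + rng.gauss(0, 1) for a, g in zip(data["age"], data["gender"])]
print(forward_stepwise(data, y))
```

The same routine, applied to the positive (or absolute negative) residuals instead of Y, would give a stepwise equation for $\hat{R}$.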

One can obtain an equation for R in similar fashion and compute Z scores as described previously, ultimately obtaining percentiles for Y. Although the computations are more complicated than with a single explanatory variable, they are easily carried out with the stepwise regression routines included in most statistical software packages. Once the equations for obtaining percentiles have been developed, one simply enters a subject's values of the selected explanatory variables to obtain the corresponding percentiles.
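Once the stepwise equations for Y and R and the percentiles of Z are available, obtaining a percentile for a new subject reduces to evaluating two linear equations, as in this sketch. The function name `subject_percentile` and every coefficient below are hypothetical placeholders, not estimates from any study; a variable that stepwise selection left out of one equation is simply given a coefficient of 0 there.

```python
import numpy as np


def subject_percentile(x, b, c, z_p):
    """Hypothetical helper: Y(P) = R * Z(P) + Y-hat for one subject.

    x:   the subject's explanatory variables, e.g. [age, weight]
    b:   coefficients [b0, b1, b2, ...] of the equation for Y-hat
    c:   coefficients [C0, C1, C2, ...] of the equation for R
    z_p: percentile of Z for the desired P (from the same side as c)
    """
    x1 = np.concatenate(([1.0], np.asarray(x, dtype=float)))  # prepend intercept
    y_hat = float(x1 @ np.asarray(b, dtype=float))
    r = float(x1 @ np.asarray(c, dtype=float))
    return r * z_p + y_hat


# Hypothetical upper percentile for a 50-year-old weighing 70 kg;
# weight was not selected for the R equation, so its coefficient is 0.
print(subject_percentile([50, 70], b=[1.0, 0.1, 0.02], c=[0.5, 0.01, 0.0], z_p=1.645))
```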

In some instances, it may be desirable to have percentiles available in tabular or graphic form, rather than (or in addition to) a computer printout. Approximate tables can be made by defining strata obtained by dividing the explanatory variables into intervals, computing a selected percentile of interest (such as the 95th) at the midpoints of the intervals defining each stratum, and tabulating these percentiles. Graphic displays may be obtained by retaining one variable (such as age) as a quantitative variable: for each of the strata defined by the remaining explanatory variables, one may plot Y against that variable, showing both the data points and the estimated percentile line. Such displays are also helpful in verifying that the algorithm functioned appropriately.
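Approximate tables of the kind described above can be produced by evaluating the fitted percentile at the midpoints of each stratum. The sketch assumes two explanatory variables (age and weight) and accepts the percentile as any callable, such as a wrapper around the fitted equations; `percentile_table` and the example function are hypothetical, with illustrative coefficients only.

```python
import numpy as np


def percentile_table(age_edges, weight_edges, pct_fn):
    """Hypothetical helper: tabulate a percentile at each age-by-weight stratum midpoint.

    age_edges, weight_edges: interval boundaries, e.g. [20, 40, 60, 80]
    pct_fn(age, weight): returns the chosen percentile (e.g. the 95th)
    Rows of the returned table follow the age intervals, columns the weight intervals.
    """
    a = np.asarray(age_edges, dtype=float)
    w = np.asarray(weight_edges, dtype=float)
    age_mid = (a[:-1] + a[1:]) / 2
    weight_mid = (w[:-1] + w[1:]) / 2
    table = np.array([[pct_fn(am, wm) for wm in weight_mid] for am in age_mid])
    return age_mid, weight_mid, table


# Hypothetical 95th-percentile function (illustrative coefficients only)
age_mid, weight_mid, table = percentile_table(
    [20, 40, 60, 80], [50, 70, 90], lambda age, wt: 9.0 + 0.1 * age + 0.02 * wt
)
for am, row in zip(age_mid, table):
    print(f"age {am:4.0f}: " + "  ".join(f"{v:6.2f}" for v in row))
```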

4. Multiple independent variables and multiple dependent variables. One is sometimes confronted with a battery of test results. Recognizing that healthy persons may often have one or two unusual values, one may wish to identify patients whose overall pattern of values justifies further medical attention. Building on the analysis described previously, one may evaluate the possibility of unusual patterns by focusing on the Z scores obtained in the study of each dependent variable. Specifically, for each subject in the reference population, one computes the distance of that subject's multivariate array of Z scores from the mean of the Z scores. The statistical measure of distance used is called Mahalanobis' distance (see Appendix), and the required computations are easily performed with most statistical software packages. Letting D represent the distance computed for each subject, one may set percentiles for these D values using the algorithm in section 1.
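A minimal sketch of the distance computation, assuming the Z scores are arranged with one row per reference subject and one column per dependent variable. For each subject it evaluates $D^2 = (Z - \bar{Z})'\,S^{-1}\,(Z - \bar{Z})$ using the sample mean vector and sample variance-covariance matrix, and `np.percentile` again stands in for the section 1 percentile method; the function name is hypothetical.

```python
import numpy as np


def mahalanobis_distances(Z):
    """Hypothetical helper: distance of each subject's Z vector from the mean vector.

    Z: array of shape (n_subjects, K), one column per dependent variable.
    """
    Z = np.asarray(Z, dtype=float)
    diff = Z - Z.mean(axis=0)
    S = np.atleast_2d(np.cov(Z, rowvar=False))  # K x K variance-covariance matrix
    S_inv = np.linalg.inv(S)
    # D^2 = (Z - Zbar)' S^-1 (Z - Zbar), evaluated row by row
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
    return np.sqrt(d2)


# Hypothetical Z scores for 300 reference subjects on K = 3 tests
rng = np.random.default_rng(2)
Z = rng.standard_normal((300, 3))
D = mahalanobis_distances(Z)
cutoff = np.percentile(D, 95)  # 95th percentile of D in the reference group
print(cutoff, int((D > cutoff).sum()))
```

Working with $D^2$ instead of D flags exactly the same subjects at any percentile cutoff, because the square root is monotone.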

Example 7. The measurement of nerve conduction has proven useful in the detection and characterization of disease of peripheral nerves. Typically, a response is elicited from a muscle or from the nerve itself by suprathreshold stimulation of nerve fibers. For the purposes of this discussion, we confine our interest to one nerve (the tibial nerve) and one attribute, the compound muscle action potential (CMAP). The CMAP provides an approximate measure of the number of motor axons to foot muscles, but its size, shape, and latency may be affected by various functional and structural derangements of motor fibers, by events at the neuromuscular junction, or by changes in the muscle itself. Knowing the latency, amplitude, and configuration of the evoked potential, and especially how these depart from normal, provides characterizing information important in diagnosis. The more accurately one can define the normal characteristics of the response in health, and correct for variables that affect the CMAP but are not due to disease of the nerve, junction, or muscle, the better one can define derangements of the neuromuscular apparatus itself. The variables that influence tibial CMAP are age, body surface area, and glycosylated hemoglobin; this was learned from a study of more than 300 healthy subjects selected at random from the Rochester, MN, population ^{[28]}.

Conclusions. Increasingly, it is possible to measure neurologic characteristics accurately and reproducibly. Where this can be done, it may be desirable to evaluate prospectively, using the same measures, a randomly chosen cohort representative of the population. From this cohort, one should delete persons who have the dysfunction, abnormality, or disease being studied. It is also valuable to measure the variables that might influence the end point or characteristic independently of the disease process. With study of a sufficiently large cohort (at least 100 persons, ideally 300 or more), the algorithm described here can be used to derive results specific for age, gender, and other variables. Personal computers should allow the operator to calculate specific percentiles of a characteristic that take a variety of patient variables into account. More valid and accurate estimates of percentiles of abnormality allow better separation of patients according to the specific dysfunction or end points of interest.

Appendix. Here we describe the computation of D, Mahalanobis' distance. Additional details may be found in Morrison ^{[10]}. For each subject, we have an array of Z scores. If we let K represent the number of dependent variables, the array of Z scores may be expressed as a vector $Z' = (Z_{1}, Z_{2}, \ldots, Z_{K})$. The variances and covariances of the individual elements of the Z vector, estimated from the reference population, are represented by a matrix denoted by S. Using matrix notation, the distance of a subject's Z vector from the mean vector $\bar{Z}$ is given by

$$D^{2} = (Z - \bar{Z})'\,S^{-1}\,(Z - \bar{Z}).$$

- Copyright 1995 by Modern Medicine Publications, Inc., a subsidiary of Edgell Communications, Inc.

## REFERENCES

- 1.
- 2. Dybkaer R, Jorgensen K, Nyboe J. Statistical terminology in clinical chemistry reference values. Scand J Clin Lab Invest 1975;35:45-74.
- 3.
- 4. Dybkaer R. The theory of reference values. J Clin Chem Clin Biochem 1982;10:841-845.
- 5. Elveback LR, Taylor WF. Statistical methods of estimating percentiles. Ann N Y Acad Sci 1969;161:538-548.
- 6. Elveback LR, Guillier CL, Keating FR Jr. Health, normality, and the ghost of Gauss. JAMA 1970;211:69-75.
- 7. Elveback LR. A discussion of some estimation problems encountered in establishing "normal values". In: Gabrieli ER, ed. Clinically oriented documentation of laboratory data. New York: Academic Press, 1972:117-137.
- 8. Lott JA. Estimation of reference ranges: how many subjects are needed? Clin Chem 1992;38:648-650.
- 9. Massod MF. Nonparametric percentile estimate of clinical normal ranges. Am J Med Technol 1977;43:243-252.
- 10. Morrison DF. Multivariate statistical methods. 2nd ed. New York: McGraw Hill, 1976.
- 11. Naus AJ, Borst A, Kuppens PS. Determination of n-dimensional reference ellipsoids using patient data. J Clin Chem Clin Biochem 1982;20:75-80.
- 12.
- 13.
- 14. Schneider AJ. Some thoughts on normal, or standard, values in clinical medicine. Pediatrics 1960;26:973-984.
- 15. Winkel P, Lyngbye J, Jorgensen K. The normal region--a multivariate problem. Scand J Clin Lab Invest 1972;30:339-344.
- 16. Medical Research Council. Aids to the examination of the peripheral nervous system, memorandum no. 45. London, Her Majesty's Stationery Office. London: White Rose Press, 1975.
- 17. Dyck PJ, Sherman WR, Hallcher LM, et al. Human diabetic endoneurial sorbitol, fructose, and myo-inositol related to sural nerve morphometry. Ann Neurol 1980;8:590-596.
- 18. Dyck PJ. Quantitating severity of neuropathy. In: Dyck PJ, Thomas PK, Griffin JW, Low PA, Poduslo JF, eds. Peripheral neuropathy. 3rd ed. Philadelphia: WB Saunders, 1993:686-697.
- 19. Dyck PJ, Daube J, O'Brien P, et al. Plasma exchange in chronic inflammatory demyelinating polyradiculoneuropathy. N Engl J Med 1986;314:461-465.
- 20.
- 21. Dyck PJ, Karnes J, O'Brien PC, Swanson CJ. Neuropathy symptom profile in health, motor neuron disease, diabetic neuropathy, and amyloidosis. Neurology 1986;36:1300-1308.
- 22. Dyck PJ, Karnes JL, O'Brien PC, Zimmerman IR. Detection thresholds of cutaneous sensation in humans. In: Dyck PJ, Thomas PK, Griffin JW, Low PA, Poduslo JF, eds. Peripheral neuropathy. 3rd ed. Philadelphia: WB Saunders, 1993:706-728.
- 23. Dyck PJ. Quantitative sensory testing: a consensus report from the Peripheral Neuropathy Association. Neurology 1993;43:1050-1052.
- 24. Kurtzke JF. A new scale for evaluating disability in multiple sclerosis. Neurology 1955;5:580-583.
- 25. Kurtzke JF. On the evaluation of disability in multiple sclerosis. Neurology 1961;11:686-694.
- 26. Kurtzke JF. Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology 1983;33:1444-1452.
- 27. Dyck PJ, Chance P, Lebo R, Carney JA. Hereditary motor and sensory neuropathies. In: Dyck PJ, Thomas PK, Griffin JW, Low PA, Poduslo JF, eds. Peripheral neuropathy. 3rd ed. Philadelphia: WB Saunders, 1993:1094-1136.
- 28. Dyck PJ, Kratz KM, Karnes JL, et al. The prevalence by staged severity of various types of diabetic neuropathy, retinopathy, and nephropathy in a population-based cohort: the Rochester Diabetic Neuropathy Study. Neurology 1993;43:817-824.
