Testing a test
A report card for DWI in acute stroke

In this issue of Neurology, two reports describe the use of MRI in acute stroke.1,2 Both conclude that diffusion-weighted imaging (DWI) is better than CT or conventional MRI for patient care. How should the data be evaluated? If these were treatment trials, the answer would be straightforward. Prospective, randomized, blinded, placebo-controlled trials designed to minimize bias and carried out with a sufficient number of patients and an important clinical endpoint are widely accepted as the best evidence for treatment efficacy. Other study designs are more subject to bias and provide weaker evidence. Specific criteria based on assessment of the quality of the published evidence are used to evaluate therapies and provide recommendations for patient care.3-5
Similar evidence-based standards exist for the evaluation of studies of diagnostic tests. Proper study design to minimize bias is as critical for evaluation of diagnostic tests as it is for the evaluation of new treatments. Studies with methodologic shortcomings tend to overestimate the accuracy of diagnostic tests.6 The optimal design for assessing the accuracy of a diagnostic test is a prospective, blinded comparison of the test to a reference standard in a consecutive series of patients from a relevant clinical population.6 Prospective studies with predefined endpoints are necessary to prevent the bias that occurs when results are used to guide the selection of endpoints or establish criteria for a positive test result. Even subsequent blinded reading cannot overcome this problem. If the comparison is to an existing technology, patients must be selected before either test is done; the tests must be performed close together in time, obtained in random order, and interpreted independently of each other.7,8
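The design requirements above (enroll patients before either test is done, obtain the tests in random order, and perform them close together in time) can be sketched as a small allocation routine. This is a minimal illustration, not a protocol taken from either study. The patient identifiers, the function names `assign_test_order` and `within_window`, and the 60-minute default window are hypothetical.

```python
import random

# Hypothetical labels for the two tests being compared (e.g., CT and DWI).
TESTS = ("CT", "DWI")


def assign_test_order(patient_ids, seed=0):
    """Randomly assign, for each consecutively enrolled patient, which test is done first.

    Allocation is made at enrollment, before either test is performed.
    A fixed seed keeps the allocation reproducible for later auditing.
    """
    rng = random.Random(seed)
    orders = {}
    for pid in patient_ids:
        order = list(TESTS)
        rng.shuffle(order)
        orders[pid] = tuple(order)
    return orders


def within_window(time_a_min, time_b_min, max_minutes=60):
    """Return True if the two tests were done within max_minutes of each other.

    max_minutes is a hypothetical protocol parameter, not a value from the source.
    """
    return abs(time_a_min - time_b_min) <= max_minutes


orders = assign_test_order(["P01", "P02", "P03", "P04"], seed=42)
for pid, order in orders.items():
    print(pid, "->", " then ".join(order))
print(within_window(0, 35), within_window(0, 180))
```

Fixing the seed makes the allocation reproducible, so it can be shown that the order was set at enrollment rather than chosen after the results were known.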
There are a variety of methodologic aspects of proper study design that are specific to the evaluation of diagnostic tests.6,9-12 Some have been discussed in a recent Neurology editorial and will be mentioned here only briefly.13 Selection bias occurs when the study is not performed in a consecutive series of patients. Spectrum bias (or referral bias) occurs when a test performs differently in different groups of patients, so that results obtained in one group do not apply to another. It is therefore important to evaluate the test in the same patient population in which it will be used. Work-up bias (verification bias) occurs when the results of the test under study influence which patients are referred for the reference standard. Review bias occurs when the diagnostic test is not interpreted completely blinded to all other information. Incorporation bias occurs when the results of the test under study are included among the criteria used to establish the reference standard.
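The effect of work-up (verification) bias can be shown with a hypothetical 2 × 2 table. In this sketch every test-positive patient is sent for the reference standard, but only 10% of test-negative patients are. The cohort counts, the 10% figure, and the helper names are illustrative assumptions, not data from either study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # test positive, disease present on the reference standard
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def sensitivity(c: Counts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    return c.tn / (c.tn + c.fp)


def verified_subset(full: Counts, frac_neg_verified: float) -> Counts:
    """Counts actually observed when all test positives but only a fraction
    of test negatives are referred for the reference standard.

    Assumes test negatives are selected for verification at random with
    respect to disease status.
    """
    return Counts(
        tp=full.tp,
        fp=full.fp,
        fn=round(full.fn * frac_neg_verified),
        tn=round(full.tn * frac_neg_verified),
    )


# Hypothetical cohort of 1000 patients: 200 with disease, true sensitivity
# 0.80 and true specificity 0.90 (illustrative numbers only).
full = Counts(tp=160, fp=80, fn=40, tn=720)
observed = verified_subset(full, frac_neg_verified=0.10)

print(f"true:     sens={sensitivity(full):.3f} spec={specificity(full):.3f}")
print(f"verified: sens={sensitivity(observed):.3f} spec={specificity(observed):.3f}")
```

With these numbers the apparent sensitivity rises from 0.80 to about 0.98 and the apparent specificity falls from 0.90 to about 0.47, because most test-negative patients never reach the reference standard.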
Kent and Larson7 proposed four grades of methodologic quality for studies of diagnostic tests:
- A) Studies with broad generalizability to a variety of patients and no significant flaws in research methods
- B) Studies with a narrower spectrum of generalizability and with only a few flaws that are well described so that their impact on conclusions can be assessed
- C) Studies with several flaws in research methods, small sample sizes, or incomplete reporting
- D) Studies with multiple flaws in research methods or reports of opinion unsubstantiated by data
Rigorous double-blinded randomized controlled trials (RCTs) are grade A evidence. Grade B evidence includes RCTs with a narrow spectrum of patients. Well-designed observational studies and nonrandomized trials are grade C; flawed observational studies and nonrandomized trials are grade D. Grade A and B studies are considered high-quality evidence, grade C studies provide weak evidence, and grade D studies are considered noncontributory.7
In addition to these grades of evidence based on methodologic quality, five levels of clinical efficacy have been proposed for assessing diagnostic technology: 1) technical capacity; 2) diagnostic accuracy; 3) diagnostic impact; 4) therapeutic impact; and 5) patient outcome.7 As with trials of therapeutic agents, the most important criterion for the usefulness of a diagnostic test is whether it leads to a change in management that improves patient outcome (level 5 efficacy).14 The endpoints for patient outcome studies are important clinical measures of patient well-being, e.g., survival, relief of pain, functional status, and quality of life.7 Therapeutic impact (level 4 efficacy) is the effect of the diagnostic test on patient care. New diagnostic tests may reduce test-associated risks or discomforts, reduce costs, improve treatment selection (e.g., eliminate unnecessary surgery without affecting overall outcome), or provide more accurate prognostic information. Establishing level 4 efficacy provides a less compelling basis for clinical utility than establishing level 5 efficacy.12,14 Documenting changes in management, or even cost savings, without assessing their effect on patient outcome can be misleading. In one study, MRI of the head and spine led to changes in diagnosis and management plans in a majority of patients but no change in quality of life 4 months later.15 Levels 1 through 3 address lesser degrees of clinical efficacy. Diagnostic impact (level 3) is the accuracy and clinical value of the test compared with existing alternatives. Diagnostic accuracy (level 2) is the accuracy in detecting and classifying pathology, typically measured in terms of true positive ratios, false positive ratios, and receiver operating characteristic (ROC) curves. Both levels 2 and 3 require comparison with a reference standard.
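Diagnostic accuracy (level 2) is usually summarized by the true positive ratio (sensitivity) and false positive ratio (1 minus specificity) at each cutoff for calling a test positive; plotting these pairs gives the ROC curve. The sketch below assumes readers scored each scan on a hypothetical 1 to 5 confidence scale that is then compared with a reference standard. The data and function names are illustrative, and the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """(false positive ratio, true positive ratio) pairs at every cutoff.

    A scan is called positive when its score >= cutoff. labels: 1 = disease
    present on the reference standard, 0 = absent.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both diseased and non-diseased patients")
    points = [(0.0, 0.0)]  # cutoff above every score: nothing called positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical reader confidence scores (5 = definite acute infarct) and
# reference-standard labels for ten patients.
scores = [5, 4, 4, 3, 2, 1, 3, 2, 1, 1]
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc(roc_points(scores, labels)):.4f}")
```

For these illustrative scores the area under the curve is 0.8125; calling a score of 3 or more positive gives a true positive ratio of 0.75 at a false positive ratio of about 0.33.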
Technical capacity (level 1 efficacy) is the capability of the technique to reproducibly display recognizable images that demonstrate pathology with good intra- and inter-observer reliability.
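Inter-observer reliability for a categorical reading is commonly quantified with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch, assuming two readers independently classify the same scans; the readings are hypothetical.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two readers' categorical readings of the same scans."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("readings must be non-empty and paired")
    n = len(rater_a)
    # Observed proportion of scans on which the two readers agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each reader's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1:
        return math.nan  # undefined when both readers use one identical category
    return (p_observed - p_expected) / (1 - p_expected)


reader_a = ["lesion", "lesion", "lesion", "none", "none", "lesion", "none", "lesion", "none", "none"]
reader_b = ["lesion", "lesion", "none", "none", "none", "lesion", "none", "lesion", "lesion", "none"]
print(f"kappa = {cohens_kappa(reader_a, reader_b):.2f}")
```

Here the readers agree on 8 of 10 scans, but because chance agreement is 0.5 the kappa is 0.6.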
How, then, do these two Neurology reports fare when evaluated by these criteria? Lansberg et al. compared DWI and CT in acute stroke in 19 consecutive patients.1 This study used prospective data collection in a consecutive series of appropriate patients, but it is not clear whether the endpoints were specified before the start of the study. Reviewers were blinded to clinical information, but it is not stated whether CT and MRI were interpreted without knowledge of the other test (possible review bias). The reference standard for infarct size and localization was DWI at 36 hours. The study therefore suffers from incorporation bias, because it measures the ability of early DWI to predict later DWI. Because CT and DWI measure different tissue characteristics, this bias works against CT. In addition, CT scans were performed earlier than DWI, further biasing the data in favor of DWI. The three endpoints chosen have no demonstrated relationship to patient outcome. Accuracy in localizing cerebral infarction to the middle cerebral, posterior cerebral, or anterior cerebral artery territory within the first 7 hours has not been shown to be important for determining treatment. Involvement of more than 33% of the middle cerebral artery territory as a risk factor for hemorrhage with thrombolytic therapy was established from early CT in patients treated within 6 hours of stroke onset.16 It does not necessarily follow that the same determination made with DWI would also predict an increased risk of hemorrhage, because the two modalities measure different tissue characteristics and identify different patients. Furthermore, there is no evidence that this finding on CT should influence the use of thrombolytic therapy within the first 3 hours, the time window in which treatment has been shown to be effective.17 The correlation of acute lesion volume with final infarct size may be of interest, but it is also of unproven clinical importance.
This study provides grade C evidence for level 3 efficacy.
Albers et al. analyzed 40 consecutive patients with acute stroke who underwent DWI 6 to 48 hours after symptom onset.2 They used prospective data collection in a consecutive series of appropriate patients, but it is not clear whether the endpoints were specified before the start of the study. Reviewers were not blinded to clinical information, and DWI scans were interpreted immediately after the baseline T2-weighted and proton density images (review bias). The authors report the incidence of “potentially clinically relevant findings”: localization by vascular territory; lesions in different vascular territories suggestive of a proximal source of embolism; and clarification that lesions seen on conventional MRI were not actually acute. The manuscript cites no data showing that these “potentially clinically relevant findings” affect patient outcome. The definition of “different vascular territories” includes the distinction between isolated subcortical infarcts and other hemispheric infarcts; in only one patient was the initial lesion thought to be in a truly different vascular territory (posterior versus anterior circulation). The authors provide no data to substantiate the claim that multiple lesions in different vascular territories on DWI are associated with a higher incidence of proximal embolic sources after the necessary adjustment for other evidence of cardiac pathology, such as abnormalities on EKG, chest X-ray, history, and physical examination. Until such data are derived from a large series of patients, the predictive value of this finding over and above other commonly recognized clinical predictors is unknown. Finally, for differentiating acute from subacute lesions, the reference standard for lesion age was DWI itself (incorporation bias).
This same type of bias is present for the assessment of vascular localization, as DWI data were not explicitly excluded from the data set used as the reference standard for localization. Information regarding therapeutic impact is limited to a few case reports. This study provides grade C evidence for level 3 efficacy and grade D evidence for level 4 efficacy.
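The objection to the claim about multiple lesions in different vascular territories is, in effect, a call for adjustment for confounding: the DWI finding may appear predictive only because patients with multiple-territory lesions also tend to have other evidence of cardiac pathology. The editorial does not name an adjustment method; one standard way to adjust for a single categorical confounder is the Mantel-Haenszel odds ratio, sketched below. The counts are invented for illustration, and a real analysis with several clinical predictors in a large series would more likely use multivariable regression.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2 x 2 table.

    a: finding+/outcome+, b: finding+/outcome-, c: finding-/outcome+, d: finding-/outcome-.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled across strata of a confounder.

    strata: list of (a, b, c, d) tuples, one 2 x 2 table per stratum.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts (not data from Albers et al.). Finding: multiple DWI
# lesions in different vascular territories. Outcome: proximal embolic source.
# Strata: other evidence of cardiac pathology (EKG, chest X-ray, history, exam).
cardiac_evidence = (20, 10, 10, 5)
no_cardiac_evidence = (2, 8, 5, 20)
crude = tuple(x + y for x, y in zip(cardiac_evidence, no_cardiac_evidence))

print(f"crude OR    = {odds_ratio(*crude):.2f}")
print(f"adjusted OR = {mantel_haenszel_or([cardiac_evidence, no_cardiac_evidence]):.2f}")
```

In this constructed example the crude odds ratio is about 2.04, but within each stratum of cardiac evidence the odds ratio is 1.0, so the apparent association disappears after adjustment.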
Defining criteria to evaluate studies of diagnostic testing establishes a basis for recommendations regarding use in clinical practice. These studies provide grade C evidence for diagnostic impact (level 3 efficacy). Evidence for therapeutic impact (level 4 efficacy) is anecdotal (grade D). No information on patient outcome (level 5 efficacy) is provided. Such evidence is not of sufficient quality grade or efficacy level to support a recommendation for the use of DWI in the routine care of patients with acute ischemic stroke. Further studies with more rigorous experimental design and endpoints that are directly related to patient outcome would be necessary to determine the role of DWI in the care of patients with acute stroke.
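The report card can be summarized as a small data structure pairing each study's efficacy level with its Kent and Larson grade. The grades and levels below are those assigned in the text. The decision rule in `supports_routine_use`, which requires grade A or B evidence at level 4 or 5, is a hypothetical simplification for illustration, not a published criterion.

```python
from dataclasses import dataclass
from enum import IntEnum


class Grade(IntEnum):
    # Kent and Larson grades of methodologic quality (A best, D worst).
    A = 1
    B = 2
    C = 3
    D = 4


class Level(IntEnum):
    # Levels of clinical efficacy for diagnostic technology.
    TECHNICAL_CAPACITY = 1
    DIAGNOSTIC_ACCURACY = 2
    DIAGNOSTIC_IMPACT = 3
    THERAPEUTIC_IMPACT = 4
    PATIENT_OUTCOME = 5


HIGH_QUALITY = {Grade.A, Grade.B}


@dataclass(frozen=True)
class Entry:
    study: str
    level: Level
    grade: Grade


def supports_routine_use(card):
    """Hypothetical rule: some high-quality (grade A or B) evidence at level 4 or 5."""
    return any(e.grade in HIGH_QUALITY and e.level >= Level.THERAPEUTIC_IMPACT for e in card)


dwi_card = [
    Entry("Lansberg et al.", Level.DIAGNOSTIC_IMPACT, Grade.C),
    Entry("Albers et al.", Level.DIAGNOSTIC_IMPACT, Grade.C),
    Entry("Albers et al.", Level.THERAPEUTIC_IMPACT, Grade.D),
]

for e in dwi_card:
    print(f"{e.study:<18} level {e.level.value} ({e.level.name.lower()}): grade {e.grade.name}")
print("supports routine use:", supports_routine_use(dwi_card))
```

Under this assumed rule the DWI evidence does not qualify, which matches the editorial's conclusion.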
Footnotes
- See also pages 1548, 1552, 1557, and 1562.
References
- Lansberg MG, Albers GW, Beaulieu C, Marks MP. Comparison of diffusion-weighted MRI and CT in acute stroke. Neurology 2000;54:1557–1561.
- Albers GW, Lansberg MG, Norbash AM, et al. Yield of diffusion-weighted MRI for detection of potentially relevant findings in stroke patients. Neurology 2000;54:1562–1567.
- Broderick JP, Adams HP, Barsan W, et al. Guidelines for the management of spontaneous intracerebral hemorrhage: a statement for healthcare professionals from a special writing group of the Stroke Council, American Heart Association. Stroke 1999;30:905–915.
- American Academy of Neurology. Practice parameter: stroke prevention in patients with nonvalvular atrial fibrillation. Report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology 1998;51:671–673.
- Holloway RG, Feasby TE. To test or not to test? That is the question. Neurology 1999;53:1905–1907.
- Henkin RE. “Frankly, my dear, I don’t give a damn.” J Nucl Med 1996;37:1073–1074.
- Dixon AK, Southern JP, Teale A, et al. Magnetic resonance imaging of the head and spine: effective for the clinician or the patient? BMJ 1991;302:78–82.
- NINDS t-PA Stroke Study Group. Intracerebral hemorrhage after intravenous t-PA for ischemic stroke. Stroke 1997;28:2109–2118.