RT Journal Article
SR Electronic
T1 Education Research: Bias and poor interrater reliability in evaluating the neurology clinical skills examination
JF Neurology
JO Neurology
FD Lippincott Williams & Wilkins
SP 904
OP 908
DO 10.1212/WNL.0b013e3181b35212
VO 73
IS 11
A1 Schuh, L. A.
A1 London, Z.
A1 Neel, R.
A1 Brock, C.
A1 Kissela, B. M.
A1 Schultz, L.
A1 Gelb, D. J.
YR 2009
UL http://n.neurology.org/content/73/11/904.abstract
AB Objective: The American Board of Psychiatry and Neurology (ABPN) has recently replaced the traditional, centralized oral examination with the locally administered Neurology Clinical Skills Examination (NEX). The ABPN postulated the experience with the NEX would be similar to the Mini-Clinical Evaluation Exercise, a reliable and valid assessment tool. The reliability and validity of the NEX have not been established. Methods: NEX encounters were videotaped at 4 neurology programs. Local faculty and ABPN examiners graded the encounters using 2 different evaluation forms: an ABPN form and one with a contracted rating scale. Some NEX encounters were purposely failed by residents. Cohen’s kappa and intraclass correlation coefficients (ICC) were calculated for local vs ABPN examiners. Results: Ninety-eight videotaped NEX encounters of 32 residents were evaluated by 20 local faculty evaluators and 18 ABPN examiners. The interrater reliability for a determination of pass vs fail for each encounter was poor (kappa 0.32; 95% confidence interval [CI] = 0.11, 0.53). ICC between local faculty and ABPN examiners for each performance rating on the ABPN NEX form was poor to moderate (ICC range 0.14-0.44), and did not improve with the contracted rating form (ICC range 0.09-0.36). ABPN examiners were more likely than local examiners to fail residents. Conclusions: There is poor interrater reliability between local faculty and American Board of Psychiatry and Neurology examiners.
A bias was detected for favorable assessment locally, which is concerning for the validity of the examination. Further study is needed to assess whether training can improve interrater reliability and offset bias. ABIM=American Board of Internal Medicine; ABPN=American Board of Psychiatry and Neurology; CI=confidence interval; HFH=Henry Ford Hospital; ICC=intraclass correlation coefficients; IM=internal medicine; mini-CEX=Mini-Clinical Evaluation Exercise; NEX=Neurology Clinical Skills Examination; RITE=residency inservice training examination; UC=University of Cincinnati; UM=University of Michigan; USF=University of South Florida.