Criticism of Standards

In this post we review the major criticisms of standards and of the methods for deriving them.

Standards and the methods used to establish them are not without critics. Perhaps the most famous and vocal opponent of standard-setting is Gene Glass. Glass pointed out that many elements of standard-setting rest on arbitrary choices, such as which judges are chosen, the standard those judges set, and the method used (Shepard, 1984). Because of this arbitrariness, Glass questioned the usefulness of setting standards at all (Hambleton & Eignor, 1980). In response to this criticism, W. Popham noted that there are two dictionary definitions of arbitrary. One implies capriciousness, while the other describes decisions made with careful deliberation. Popham admitted that all standard-setting methods require human judgment, but argued that they are not necessarily arbitrary in the capricious sense. Since “classification decisions are often unavoidable, the judgments should be made as defensibly and reasonably as possible” (Shepard, 1984, p. 170).

Another frequently mentioned criticism of standard-setting methods is their lack of reliability. Studies documenting the reliability of standards have produced mixed findings. Several studies have found that standards vary with the method used and show very little reliability (Behuniak, Archambault, & Gable, 1982). Other studies, such as a two-part study by Norcini and Shea (1992), have found different results. The first part compared the standards established by two different groups of judges using the same method and items; the second compared the standards set by a single group of judges two years apart. The standards were similar when set by different judges using the same method, and they were comparable when set two years apart by the same judges. As part of the methodology, judges were given performance values for the examinees taking the exam. The authors reported that the correlation between the judges' estimates and the performance values provided was very high (Norcini & Shea, 1992). A possible alternative explanation of these findings is that the judges were simply following the patterns in the performance data rather than truly providing the estimates for the standard that the methodology requires.
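To make the two comparisons above concrete, here is a minimal Python sketch. It assumes an Angoff-style procedure in which each judge estimates, for every item, the probability that a borderline examinee answers correctly, and the cut score is the sum of the mean estimates across items. That procedure, the function names, and all of the numbers are illustrative assumptions, not data or code from Norcini and Shea (1992).

```python
from statistics import mean


def item_means(ratings):
    """Average the judges' estimates for each item.

    `ratings` is a list of judges, each a list of per-item probability
    estimates (hypothetical Angoff-style ratings).
    """
    n_items = len(ratings[0])
    return [mean(judge[i] for judge in ratings) for i in range(n_items)]


def cut_score(ratings):
    """Angoff-style cut score: sum of the mean item estimates."""
    return sum(item_means(ratings))


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ratings: two groups of two judges rating the same five items.
group_a = [[0.60, 0.75, 0.40, 0.85, 0.55],
           [0.65, 0.70, 0.45, 0.80, 0.50]]
group_b = [[0.55, 0.80, 0.35, 0.90, 0.60],
           [0.60, 0.70, 0.45, 0.85, 0.55]]

# Hypothetical performance values (proportion correct) shown to the judges.
p_values = [0.62, 0.74, 0.41, 0.88, 0.53]

cut_a = cut_score(group_a)
cut_b = cut_score(group_b)
r = pearson(item_means(group_a), p_values)

print(f"Cut score, group A: {cut_a:.3f}")
print(f"Cut score, group B: {cut_b:.3f}")
print(f"Correlation of group A estimates with performance values: {r:.3f}")
```

In this made-up example the two groups arrive at nearly the same cut score, and the estimates track the performance values almost perfectly. The sketch cannot distinguish between judges who independently agree with the data and judges who simply copy it, which is exactly the ambiguity raised by the alternative explanation.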


Behuniak, P., Archambault, F. X., & Gable, R. K. (1982). Angoff and Nedelsky standard-setting procedures: Implications for the validity of proficiency test score interpretation. Educational and Psychological Measurement, 42, 247-255.

Hambleton, R. K., & Eignor, D. R. (1980). Competency test development, validation, and standard-setting. In R. M. Jaeger & C. K. Tittle (Eds.), Minimum competency achievement testing: Motives, models, measures, and consequences (pp. 367-396). Berkeley, CA: McCutchan.

Norcini, J. J., & Shea, J. A. (1992). The reproducibility of standards over groups and occasions. Applied Measurement in Education, 5(1), 63-72.

Shepard, L. A. (1984). Setting performance standards. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 169-198). Baltimore, MD: The Johns Hopkins University Press.