Standard Setting Overview

In this post we provide an overview of the standard-setting process and where it fits in the test development process.

In the test development process, the setting of standards is arguably the most controversial step. A standard is established to separate those with enough knowledge and skill in the area being tested from those lacking such knowledge and skill. The intent of the standard is to answer the question “How much is enough” (Livingston & Zieky, 1989, p. 121), not to discover a single correct point at which a candidate can be clearly identified as a master or a novice. The imposition of a cutscore, the minimum score that a candidate must obtain to pass the exam, generally creates an artificial barrier separating masters from nonmasters. In reality, there is generally very little difference between a candidate who scores one point below the cutscore and a candidate who scores at the cutscore (Dwyer, 1996). Identifying the cutscore is therefore not an easy task. Robert Ebel referred to this dilemma when he stated:

There is a widespread popular belief that any person who takes a test either passes or fails it…. This is patently false…. A second popular belief is that when a test is used to pass or to fail examinees, the distinction between the two outcomes is clear-cut and unequivocal. This is almost never true. (Ebel, 1979, p. 337)

The process of setting a cutscore generally involves reducing an entire continuum of knowledge and skill to a dichotomous pass/fail scale (Dwyer, 1996). Where this point should be placed is generally the subject of much debate. To further complicate the issue, there are myriad methods for establishing this standard. The characteristics or focus points of each method can have a large influence on the final standard. The method used to set a standard must be selected by carefully evaluating the circumstances and characteristics of the measurement instrument. After the method is selected, the standard-setting session should be carried out methodically to ensure the best possible adherence to the tenets of the methodology. According to Shepard (1984), “the validity of the final classification decisions will depend as much upon the validity of the standard as upon the validity of the test content” (p. 169).
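To make the reduction from a continuum to a pass/fail decision concrete, here is a minimal sketch in Python. The candidate labels, the raw scores, the cutscore of 70, and the function name classify are all hypothetical values chosen for illustration; none of them come from a real exam or from any particular standard-setting method.

```python
def classify(scores, cutscore):
    """Map each raw score to 'pass' if it meets the cutscore, else 'fail'."""
    return {
        name: ("pass" if score >= cutscore else "fail")
        for name, score in scores.items()
    }


# Hypothetical raw scores on a 100-point exam (illustrative values only).
scores = {"A": 95, "B": 70, "C": 69, "D": 40}

# Assumed cutscore; in practice this value is the outcome of a
# standard-setting session, not a number picked in advance.
cutscore = 70

results = classify(scores, cutscore)
for name, outcome in results.items():
    print(name, scores[name], outcome)
```

In this sketch, candidates B and C differ by a single point yet receive opposite outcomes, while candidates A and B differ by 25 points and receive the same outcome. That is the artificial barrier described above: the scores are continuous, but the decision is not.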

REFERENCES

Dwyer, C. A. (1996). Cut scores and testing: Statistics, judgment, truth and error. Psychological Assessment, 8(4), 360-362.

Ebel, R. L. (1979). Essentials of educational measurement. Englewood Cliffs, NJ: Prentice-Hall.

Livingston, S. A., & Zieky, M. J. (1989). A comparative study of standard-setting methods. Applied Measurement in Education, 2(2), 121-141.

Shepard, L. A. (1984). Setting performance standards. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 169-198). Baltimore, MD: The Johns Hopkins University Press.