Angoff Best Practices

Best Practices for Angoff Standard Setting Workshops

In this post we share insights and best practices that we have learned from conducting dozens of Angoff standard setting workshops.

Conceptually, an Angoff standard setting workshop is simple: define the minimally qualified candidate (MQC) description and have the subject matter experts (SMEs) predict how the minimally qualified candidate will perform on each item. In practice, there are a few things we have learned through experience that you can do to make your Angoff workshop better. We have tried to present them in roughly sequential order.

An Authoritative Version of the Items and Ratings

In the past, SMEs have participated in Angoffs by referring to the items in a text document or PDF and then recording their predictions in a spreadsheet. If the facilitator needs to adjust an item, communicating those changes to the group becomes a challenge. Once the SMEs are done with their ratings, the facilitator needs to collect and combine all of the ratings for analysis. Disseminating the items and collecting the ratings is time-consuming and introduces the potential for errors. We have found it much more effective to have one central, authoritative source for the items and a single place for collecting the ratings, with no manual collection or combining. It is even better if you can use a single integrated system to present the items and collect the ratings.

Use the Angoff to Catch Gross Errors

Best practice is to have the items technically reviewed prior to the Angoff process. The items should be done, reviewed, and as close to their final versions as possible. Many times we have seen an Angoff workshop devolve into a glorified technical review. It is hard for the SMEs to make predictions on items that are still in a raw, unfinished, unreviewed state. Things really get bogged down and everyone loses focus when an Angoff turns into a wordsmithing workshop.

Angoff workshops should be for making predictions on how the MQC will perform on the items and NOT for fixing problems with the items. Once in a while there are gross errors in an item that need to be corrected and that is ok. Those kinds of errors are generally easy to find and correct.

If you can ensure that the items are tech reviewed prior to the Angoff, your workshop will go much more smoothly.

SME Interaction

During an Angoff we have found that there needs to be a centralized place for the SMEs to leave observations on the items. This way only one SME has to take the time to write up an observation and everyone else can simply agree with OR refute the observation. It makes things much more productive and does not lead to group-think or trending in the predictions.

Two Angoff Rounds Work Best

There are two main benefits of doing two Angoff rounds:

The SMEs are forced to look at each item twice. I can't count the number of times SMEs have said, "Oh, I read this item wrong the first time" or "I missed that last time." Sometimes they are rushed or distracted during the first round and miss things that they catch the second time through. Being led through the items a second time gives them a chance to re-evaluate the items and their initial predictions.
The second round discussions often bring out insights and perspectives the SMEs have not all considered. Each SME has valid and valuable experience with the material and has something to share. Doing so in a controlled, facilitated manner is very effective for the group.

Discussion if Range Exceeds Threshold

In an Angoff second round discussion, pay particular attention to items where the range between the lowest prediction and the highest prediction is greater than 30 percentage points. A range that wide represents a huge difference of opinion on how the minimally qualified candidates will perform on the item. The discussion will bring out the logic, rationale, and experience that led each SME to predict the way they did. The purpose of the discussion is NOT to sway the other SMEs to a single way of thinking OR to make the predictions more aligned with each other. Even so, after hearing another SME's logic and rationale, SMEs very often say, "That is a valid point. I will adjust my prediction."
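One way to find the items that warrant discussion is to compute the range of first-round predictions for each item. The following is a minimal Python sketch, assuming predictions are recorded as percentages from 0 to 100. The function name, the data layout, and the sample ratings are hypothetical and are here for illustration only.

```python
def items_needing_discussion(ratings, threshold=30):
    """Return {item_id: range} for items whose prediction range exceeds threshold.

    ratings maps item_id -> {sme_name: prediction in percent (0-100)}.
    This layout is an assumption made for the sketch.
    """
    flagged = {}
    for item_id, predictions in ratings.items():
        values = list(predictions.values())
        if not values:
            continue  # no predictions recorded for this item yet
        spread = max(values) - min(values)
        if spread > threshold:
            flagged[item_id] = spread
    return flagged


# Hypothetical first-round ratings, for illustration only.
round_one = {
    "item-1": {"SME A": 60, "SME B": 70, "SME C": 65},
    "item-2": {"SME A": 40, "SME B": 85, "SME C": 55},
}

flagged = items_needing_discussion(round_one)
print(flagged)
```

The check uses a strict "greater than" comparison, so an item whose range is exactly 30 is not flagged. The threshold is a parameter so that a facilitator can tighten or loosen it.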

There are no Bad Predictions

Sometimes an SME will feel self-conscious about their predictions. All predictions are accurate and valuable because they are based on the SME's experience working with candidates who have the same skills, knowledge, and experience as the minimally qualified candidate. It is good to reassure the SMEs that their predictions are all equally valid and valuable.

Avoid Very Low and Very High Predictions

During an Angoff, an SME will sometimes be tempted to predict that 0% of MQCs will get an item correct because it is too hard. In reality, a candidate relying only on dumb luck could guess the right answer some portion of the time. Using other test-taking techniques, such as elimination, increases the candidate's chances of getting the item correct.

Likewise, SMEs will sometimes be tempted to predict that 100% of MQCs will get an item correct because it is so easy. In reality, it is important to consider how many MQCs will misread the item or talk themselves out of the answer because it seemed too easy.

In general, if an SME makes a very low or very high prediction, we ask them to be prepared to justify it. Many times such predictions indicate a problem with an item.
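To make "very low" and "very high" concrete, one option is to compare each prediction against the share of candidates who would answer correctly by blind guessing. The sketch below assumes a multiple-choice item with exactly one correct option, where that share is 100 divided by the number of options. The function names, the ceiling of 95, and the sample predictions are assumptions chosen for illustration and are not values from this post.

```python
def chance_floor(num_options):
    """Percent of candidates expected to answer correctly by blind guessing
    on a multiple-choice item with exactly one correct option."""
    return 100.0 / num_options


def predictions_to_justify(predictions, num_options, ceiling=95):
    """Return {sme_name: prediction} for predictions below the guessing
    floor or above the ceiling (the ceiling value is an assumption)."""
    floor = chance_floor(num_options)
    return {
        sme: value
        for sme, value in predictions.items()
        if value < floor or value > ceiling
    }


# Hypothetical predictions for one four-option item, for illustration only.
item_predictions = {"SME A": 0, "SME B": 55, "SME C": 100}
to_justify = predictions_to_justify(item_predictions, num_options=4)
print(to_justify)
```

A flagged prediction is a prompt for a conversation, not an error. As noted above, it may point to a problem with the item itself.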

How to Derive the Minimally Qualified Candidate Description

Here are a few ideas that you can consider in generating the MQC description for your Angoff:


Start with statements describing your exam content areas. This is generally a good place to start the discussion. It frames the scope of content and you just need to fill in the details and describe what the content areas mean in the context of the MQC.
Identify the kinds of tasks the candidates will be expected to perform related to each content area.
For each content area, ask the SMEs leading questions like "What skills/knowledge/experience would a minimally qualified candidate have to have in order to accomplish the tasks represented by this objective?"
Sometimes it makes sense to equate a particular level of knowledge with the content of a specific course, even if the course is not a prerequisite for the exam.
Maintain focus on the minimally qualified candidate. Sometimes the conversation will drift to levels that represent a more qualified candidate.

Really Emphasize the MQC Description

The minimally qualified candidate description cannot be over-emphasized. You might feel like you are beating a dead horse by bringing up the MQC description so often. You know you've done your job well when the SMEs remind each other of the MQC so you don't have to. Try to work "MQC" into the questions you ask of the SMEs, for example: "Will you help us understand your thought process in predicting that so many MQCs would get this item correct?"

Visualize the MQC

Subject matter experts (SMEs) need to really connect with the description of the minimally qualified candidate. We have found that it really helps to ask the SMEs to visualize someone they have worked with who matches the description of the MQC. The question then becomes clearer, more real, and more concrete: "What portion of people like the person you visualized would get this item correct?"

Should vs Would

Sometimes SMEs ask themselves "How many MQCs should get this item right?" We want all of the candidates to get every item right. They should ALL get the items right. The reality is that the candidates are not all equally prepared for the exam, and not all of them would get the items right. The distinction between should and would is a subtle one, but we have found it helpful to frame the SMEs' predictions with a "How many would...?" question instead of a "should" question.

Eliminate some SME Predictions

At the end of the process there may be individual SMEs whose predictions, for whatever reason, were consistently and significantly lower or higher than the other SMEs' predictions. If included in the final analysis, such an SME's predictions could single-handedly pull the standard or cut score for the exam down or up. These kinds of predictions should NOT be included in the final analysis and derivation of the standard or cut score.
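This post does not prescribe how to identify such SMEs, so the following Python sketch shows only one possible approach. It measures how far each SME's predictions sit, on average, from the other SMEs' predictions on the same items, and it computes the cut score as the average of the per-item mean predictions. The function names, the 20-point cutoff, and the sample ratings are all assumptions made for illustration.

```python
from statistics import mean


def sme_bias(ratings, sme):
    """Mean signed difference between one SME's predictions and the
    average of the other SMEs' predictions on the same items."""
    diffs = []
    for predictions in ratings.values():
        others = [p for name, p in predictions.items() if name != sme]
        if sme in predictions and others:
            diffs.append(predictions[sme] - mean(others))
    return mean(diffs)


def cut_score(ratings, excluded=()):
    """Average of the per-item mean predictions, skipping excluded SMEs."""
    item_means = []
    for predictions in ratings.values():
        kept = [p for name, p in predictions.items() if name not in excluded]
        item_means.append(mean(kept))
    return mean(item_means)


# Hypothetical final-round ratings in percent, for illustration only.
ratings = {
    "item-1": {"SME A": 60, "SME B": 65, "SME C": 62, "SME D": 30},
    "item-2": {"SME A": 70, "SME B": 75, "SME C": 72, "SME D": 40},
    "item-3": {"SME A": 50, "SME B": 55, "SME C": 53, "SME D": 25},
}
smes = ["SME A", "SME B", "SME C", "SME D"]

# Assumed cutoff: flag an SME whose average gap exceeds 20 percentage points.
outliers = [s for s in smes if abs(sme_bias(ratings, s)) > 20]

print(outliers)
print(round(cut_score(ratings), 2))
print(round(cut_score(ratings, excluded=outliers), 2))
```

Comparing the cut score with and without the flagged SME shows how much a single set of predictions moves the result. Any cutoff of this kind is a judgment call, so it is worth documenting the rule and applying it the same way to low and high outliers.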