What We Can Learn From Evaluating Student Learning
As mentioned at the outset of this chapter, one important function of testing students is to determine how effective the learning opportunities provided by the course have been in enabling students to achieve the outcomes established for them. To make such a determination, it is necessary to evaluate the evaluation. Does the assessment differentiate among levels of student performance? Do assessments align in content with the desired learning outcomes? Which learning outcomes have been best achieved, and which need to be better taught?
Using Item Analysis to Test the Test
After a test has been administered and graded, a good way to judge how well it differentiates among students, particularly in the case of a limited-choice test, is to perform an item analysis. It is especially important to do this when test items will be reused or when there is sufficient doubt about the test results to consider dropping some items as invalid when computing the final grade. If machine-scannable test forms are used and processed by the Office of the University Registrar, the instructor will receive a printout with item analysis results already computed. See below for instructions on how to interpret these statistics, or call Faculty & TA Development (614-292-3644) for a consultation.
If the instructor is scoring the test, standard statistical software packages (such as SPSS or DataDesk) are available for doing item analysis. It is possible to perform an item analysis without a computer, however, especially if the test is short and the class size is small. The information below explains how to compute the two most common item analysis statistics and the principles behind them.
Procedures for Computing Difficulty and Discrimination Indexes
The Difficulty Index of an item tells you the percentage of students who got the item correct.
The Discrimination Index tells how well this item correlates with the entire test: Did the students who did well on the test in general do well on this question?
Follow these steps to compute Difficulty and Discrimination Indexes:
- Score each test by marking correct answers and writing the total number correct on each test.
- Sort the papers in numerical order according to the total score.
- Determine the upper, middle, and lower groups. One way to do this is to call the top 27% (some people use the top third) of the papers the upper group, the bottom 27% (some people use the bottom third) the lower group, and the remaining papers the middle group.
- Summarize the number correct and number wrong for each group.
- Calculate the Difficulty Index for each item by adding the number of students from all groups who chose the correct response and dividing that sum by the total number of students who took the test. The Difficulty Index will range from 0 to 1, with a difficult item being indicated by an index of less than .50 and an easy item being indicated by an index of over .80.
- Calculate the Discrimination Index by first calculating for both the upper and lower group students the percentage who answered each item correctly. Subtract the percentage of lower group students from the percentage of upper group students to get the index. The index will range from -1 to +1, with a discrimination over 0.3 being desirable and a negative index indicating a possibly flawed item.
The grid above illustrates an item analysis for a simple set of scores for 37 students on a 10-item test. The names of the 10 students (approximately 27% of the total students) with the highest scores are listed as the "upper group"; the 10 students with the lowest scores (again, approximately 27%) are listed as the "lower group"; and the remaining 17 are listed as the "middle group." On item 1, for example, the Difficulty Index was calculated by totaling the correct responses (C = correct response, I = incorrect response) and dividing by the number of students (19/37 = .51). The item appears to be on the difficult end of the range.

The Discrimination Index for the same item was obtained by first calculating the percentage correct for both the upper and lower groups (20% and 90%, respectively), then subtracting the percentage for the lower group from that of the upper group (.20 - .90 = -.70). This negative Discrimination Index indicates that the item is probably flawed.
Note that the students who scored poorly on the exam as a whole did well on this item, while the students who got the top total scores on the exam did poorly, which is the reverse of what one would expect. A mistake in the answer key, or some error in the question that only the more astute students would catch, might be the cause. Questions that are inappropriately difficult and those that fail to discriminate effectively should be revised.
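For instructors who score tests themselves and prefer a short script to a statistics package, the steps above can be sketched in Python. This is a minimal illustration, not a prescribed procedure. The function names, the input layout (one list of 1s and 0s per student), and the sample data are all assumptions made for this example. Students with tied totals at a group boundary are placed by sort order, which is also an assumption.

```python
def group_size(n_students, fraction=0.27):
    """Number of papers in each of the upper and lower groups."""
    return max(1, round(n_students * fraction))


def item_analysis(responses, fraction=0.27):
    """Return a (difficulty, discrimination) pair for each item.

    responses: one list per student, with 1 for a correct answer and 0 for
    an incorrect one (a hypothetical input layout chosen for this sketch).
    """
    n_students = len(responses)
    n_items = len(responses[0])

    # Score each test and sort the papers from highest to lowest total.
    ranked = sorted(responses, key=sum, reverse=True)

    # Split off the upper and lower groups; the rest is the middle group.
    k = group_size(n_students, fraction)
    upper, lower = ranked[:k], ranked[-k:]

    results = []
    for i in range(n_items):
        # Difficulty: proportion of all students who answered correctly.
        difficulty = sum(r[i] for r in responses) / n_students
        # Discrimination: upper-group proportion minus lower-group proportion.
        p_upper = sum(r[i] for r in upper) / k
        p_lower = sum(r[i] for r in lower) / k
        results.append((difficulty, p_upper - p_lower))
    return results


# Made-up responses for six students on a four-item test (not real data).
toy = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]

for number, (difficulty, discrimination) in enumerate(
        item_analysis(toy, fraction=1 / 3), start=1):
    print(f"Item {number}: difficulty {difficulty:.2f}, "
          f"discrimination {discrimination:+.2f}")
```

In this made-up data, the fourth item is answered correctly only by the lowest-scoring students, so its Discrimination Index is negative, the same pattern the text identifies as a sign of a possibly flawed item. With the default fraction of 0.27, a class of 37 yields upper and lower groups of 10 students each, matching the example above.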