One of the most troublesome challenges of classroom instruction is the classroom test. Some teachers create their own tests and others use “book tests.” But the concern among principals—and complaints among parents and students—remains: Are we testing what we’re teaching, and are we using valid tests?
Classroom Tests Compared to Other Tests
Classroom Tests. The purpose of the classroom test, usually given at the completion of a Unit or chapter, is to measure student mastery of the specific standards for just that Unit. It is this level of test for which the classroom teacher is most responsible, and it should be the most helpful. The results are quickly obtained, they are specific to what has just been taught, and they are diagnostic as to individual student needs. But classroom tests should not be about “points earned” toward a passing grade. They must reflect “standards mastered” toward a composite picture of learning.
But classroom tests are only one side of the assessment coin. The other side is the array of commercial tests that are external to the classroom. A few of these are listed below:
- High Stakes Tests. The purpose of high-stakes tests is to measure levels of academic performance on state-level content standards. The scores compare students to their age-mates within a district, a state, or even the country. Examples include the AIR Test (from the American Institutes for Research), the PARCC Test (from the Partnership for Assessment of Readiness for College and Careers), and various State Proficiency Tests. The states use these tests to rate districts in terms of the number of students who are proficient. Additionally, these test scores reflect each student’s “AYP” (adequate yearly progress) to determine the direction of his or her individual growth.
- Commercial Diagnostic Tests. These tests identify individual student strengths and weaknesses in a set of specific academic skills. The scores inform teachers where and how to intervene with each student to improve proficiency. Examples include the Stanford Diagnostic Reading Test, the KeyMath test, and DIBELS (Dynamic Indicators of Basic Early Literacy Skills).
- Benchmark Tests. Many districts also use semester or quarterly Benchmark tests. These tests determine student mastery of standards taught in a particular quarter. Collectively, they represent a year’s worth of mastery.
- Standardized Achievement Tests. A major spoke in the commercial test wheel is the classic Achievement Test. These tests determine student mastery of a select array of skills in comparison to other students their age around the state or the country. Examples include the ACT, the SAT, the TerraNova, and the Iowa Tests of Basic Skills.
The Structure of Traditional Classroom Paper-Pencil Tests
If districts are truly data-driven, and do not just say they are, their classroom assessments fall at two levels. One level is FORMATIVE (interim or short-cycle tests), used to determine immediately whether re-teaching is needed and what must be re-taught. The second level is SUMMATIVE (end-of-unit assessments), used to determine student mastery at the end of instruction. Teachers may call their summative tests Chapter Tests or Unit Tests.
In some cases, teachers use the tests published by their textbook companies. In other cases, they create their own. Either way, there are several important considerations in the selection or creation of classroom tests:
- The test items must be validly constructed to measure the Unit standards—not just the topic but the level of rigor required as well;
- The tests must parallel the formats students will see on their high-stakes tests. These are (1) Multiple Choice, some of which have more than one correct answer; and (2) Constructed Response items that require students to show they can extend the concept beyond a classroom situation. This may involve:
  - Error analysis
  - Citing text detail from a document
  - Showing their work
  - Writing an explanation
  - Drawing a diagram or making a graphic
  - Interpreting data
  - Comparing two or more documents
- The test items must reflect what has actually been taught, again at the level of rigor required in the standards.
In our experience, very few teachers have had training in how to construct or select effective tests. And now with the current emphasis on rigorous academic standards, the challenge is even greater.
EdFOCUS helps teachers learn to construct valid test items that reflect the content standards. The training includes:
- How to decide which type of test item (multiple choice, constructed response, or both) is the most valid for the standard being measured.
- How to unpack each standard in terms of its content and its cognitive demand.
- How many test items are needed to determine mastery of a standard.
- Whether multiple standards can be assessed with the same test items.
- How to construct Multiple Choice items:
  - how to construct valid stems
  - how to devise diagnostic distractors so they reveal specific “misunderstandings” that help pinpoint the exact need for intervention
- How to design Constructed Response items:
  - how to develop items that require students to construct meaning for themselves, in terms of both the topic and the level of rigor required
  - how to pre-write expected answers that align with expectations, but also reveal any misunderstandings that call for intervention
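To make the idea of diagnostic distractors concrete, here is a minimal sketch, not an EdFOCUS tool. The test item (47 + 38), the option letters, the error descriptions, and the student responses are all invented for illustration. The point is that each wrong option is tied to one likely misunderstanding, so a simple tally of wrong answers suggests where re-teaching may be needed.

```python
from collections import Counter

# Hypothetical item: "47 + 38 = ?" (correct answer: 85, option B).
# Each wrong option is written to match one likely error (assumed, not from the source).
ANSWER_KEY = "B"
DISTRACTOR_NOTES = {
    "A": "75: added the columns but dropped the regrouped ten",
    "C": "715: wrote 15 in the ones place without regrouping",
    "D": "9: subtracted instead of adding",
}

def tally_misunderstandings(responses):
    """Count how many students chose each wrong option."""
    wrong = [choice for choice in responses.values() if choice != ANSWER_KEY]
    return Counter(wrong)

# Invented responses for five students (illustration only).
responses = {"Ana": "B", "Ben": "A", "Cal": "A", "Dee": "C", "Eli": "B"}

counts = tally_misunderstandings(responses)
for choice, n in counts.most_common():
    note = DISTRACTOR_NOTES.get(choice, "unrecognized response")
    print(f"{n} student(s) -> {note}")
```

In this made-up class, the most common wrong answer points to regrouping, which is more useful to the teacher than knowing only that three of five students answered correctly.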
The Necessity of Performance or Authentic Classroom Assessments
In contrast to lists of splinter skills and individual steps in learner outcomes (such as “add 2-digit numbers with regrouping”), the current more rigorous content standards are holistic and performance-based (such as “solve real-world math problems involving 2-digit numbers”). That is, students are expected to actually apply what they have learned in the classroom to scenarios from daily living. The mastery of performance standards cannot be determined solely by traditional tests. To verify independent and enduring mastery, the assessments must be authentic, life-based, and parallel to the standards themselves.
The idea of performance assessments is not new. It’s been the mainstay of Career and Technical schools for three decades, and the professions of law, medicine, and even plumbing have always used performance to determine competence. To “grade” or evaluate the quality of performance, teachers should use a Rubric or checklist of criteria taken from the standards being measured. EdFOCUS has seen firsthand the negative results of districts purchasing books of Rubrics that do not match the standards.
EdFOCUS consultants provide teachers with the rationale behind authentic assessments and offer several sample formats. These include a variety of original written products, original math problems, error analyses, and the deep-level on-demand analyses of unfamiliar texts and documents. Rather than starting from scratch, teachers are also provided actual performance assessments they can adapt. EdFOCUS is proud to have samples at all grade levels and subject areas. A few are listed below:
- Math: Given the purchase of a used boat for $20,000 and a depreciation factor of 15% per year, the student writes an exponential “depreciation” model to represent the value of the boat after 3.5 years.
- English/Language Arts: Each student prepares the pro and con of a position (e.g., school uniforms) and delivers it as a 3-minute newscast, as a “talking head” might on FOX or CNN.
- Science: The student describes the activities of the cell by creating a graphic organizer to show how the activities in the cell process work together for the benefit of the individual cell as well as the overall organism.
- Social Studies: Each student devises a lesson plan to teach younger children about the 1930s that includes (a) the Great Depression; (b) the Dust Bowl; (c) the New Deal; and (d) an impact or a lesson learned for our own times. The lesson plan includes a presentation “script,” visuals, and at least one student handout.
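As one illustration of what an expected answer to the Math sample above might look like, here is a short sketch. It assumes the boat loses 15% of its current value each year and that the same model applies to a fractional year; the function name and its parameters are our own, chosen only for this example. Under those assumptions the model is value = 20,000 × (0.85)^t, with t in years.

```python
def boat_value(years, price=20_000, rate=0.15):
    """Value after `years`, assuming the boat loses `rate` of its current value each year."""
    return price * (1 - rate) ** years

# Evaluate the model at t = 3.5 years, as the task asks.
value = boat_value(3.5)
print(f"Value after 3.5 years: ${value:,.2f}")
```

With these assumptions the boat is worth roughly $11,324 after 3.5 years. A student who instead subtracts a flat 15% of the original price each year would have a linear model, not an exponential one, and a scoring Rubric can be written to catch that difference.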
For each authentic or performance assessment they add to their Units, teachers are provided help in creating a scoring Rubric—drawn from the standards reflected by the project.