How will we evaluate student performance on tasks? (Part 6)

--

Student responses to assignments and assessment items that have a single correct answer can be scored using an answer key or a scanning machine. In contrast, performance tasks are typically open-ended, and teachers must therefore use their judgment when evaluating the resulting products and performances. By using a set of established criteria aligned with targeted standards/outcomes, teachers can evaluate students’ products and performances fairly, consistently, and defensibly. In this blog, we’ll explore: 1) four types of criteria for evaluating student performance on open-ended tasks; 2) four kinds of criterion-based evaluation tools; 3) practical processes for designing effective rubrics; and 4) benefits to teachers and students.

Types of Evaluative Criteria

Criteria are guidelines or rules for judging student responses, products or performances. In essence, they describe what is most important in student work in relation to identified learning goals. Criteria serve as the foundation for the development of a rubric, a tool for evaluating student work according to a performance scale.


I propose four general categories of criteria that can be used to evaluate student work depending on the targeted standards or outcomes and the purpose of the performance task. The four criterion types focus on evaluating content, process, quality, and impact. Let’s consider each type.

  1. Content criteria are used to evaluate the degree of a student’s knowledge and understanding of facts, concepts and principles.
  2. Process criteria are used to evaluate the proficiency level of performance of a skill or process, as well as the effectiveness of the methods and procedures used in a task.
  3. Quality criteria are used to evaluate the overall quality and craftsmanship of a product or performance.
  4. Impact criteria are used to evaluate the overall results or effects of a product or performance given its purpose and audience.

Figure 6.1 presents some descriptive terms associated with each of the four criterion types.

Figure 6.1 — Descriptive Terms for Criterion Types



Criterion Type | Descriptive Terms (examples)
Content | accurate, clearly explained, complete, expert, knowledgeable
Process | collaborative, coordinated, efficient, methodical, precise
Quality | creative, organized, polished, effectively designed, well crafted
Impact | entertaining, informative, persuasive, satisfying, successful

Here is an example in which all four types of criteria are used to evaluate the dining experience in a restaurant:

  • Content — the server accurately describes the appetizers, main courses, side items, desserts and drinks; all meals and drinks are correctly delivered as ordered
  • Process — the kitchen staff collaborates well and coordinates with the server; the server checks on diners regularly
  • Quality — all the dishes are cooked to taste, presented in an aesthetically pleasing manner, and served in a timely fashion
  • Impact — the meal is tasty and satisfying to all diners

It is important to note that in this example, the four criteria are relatively independent of one another. For example, the server may accurately describe the content of the menu items, but the food may arrive late and be overcooked. When different traits or criteria are important in a performance, they should be evaluated on their own. This analytic approach allows for more specific feedback to be provided to the learner (as well as to the teacher) than does an overall, holistic rating.

While these four categories reflect possible types of criteria, I certainly do not mean to suggest that a teacher should use all four types for each and every performance task. Rather, teachers should select only the criterion types that are appropriate for the targeted standards or outcomes, as well as for the specific qualities on which they want to provide feedback to learners. Having said this, I want to make a case for the value of including Impact criteria in conjunction with authentic performance tasks. The more a task is set in an authentic context, the more important it is to consider the overall impact of the resulting performance. Indeed, we want students to move beyond “compliance” thinking (e.g., How many words does it have to be? Is this what you want? How many points is this worth?) to consider the overall effectiveness of their work given the intended purpose and target audience. Impact criteria suggest important questions that students should ask themselves. For example:

  • Did my story entertain my readers?
  • Was my diagram informative?
  • Could the visitor find their way using my map?
  • Did I find answers to my research questions?
  • Was my argument persuasive?
  • Was the problem satisfactorily solved?

Educators can help their students see purpose and relevance by including Impact criteria as they work on authentic performance tasks.

So… given these four types of criteria, how should a teacher decide which criteria should be used to evaluate student performance on a specific task? The answer may surprise you. In a standards-based system, criteria are derived primarily from the targeted standards or outcomes being assessed, rather than from the particulars of the performance task. For example, if a teacher is focusing on the CCSS E/LA Standard of Informative Writing, then the criteria for any associated performance task will likely require students to be: accurate (i.e., the information presented is correct), complete (i.e., all relevant aspects of the topic are addressed), clear (i.e., the reader can easily understand the information presented; appropriate descriptive vocabulary is used), organized (i.e., the information is logically framed and sequenced), and conventional (i.e., proper punctuation, capitalization, spelling, and sentence formation/transitions are used so that the reader can follow the writing effortlessly).

This point may seem counter-intuitive: How can you determine the evaluative criteria until you know the task? What if one version of a task required students to produce a visual product (e.g., a poster or graphic organizer) while another version of the same task asked students to give a verbal explanation? Certainly, there are different criteria involved in evaluating such different products and performances!

Indeed, there may be different secondary criteria related to a particular product or performance. For example, if students were to create a visual product to show their understanding of a concept in history, then we could include quality criteria (e.g., the visual should be neat and graphically appealing). However, the primary criteria in this example should focus on the content associated with the history standard instead of simply the qualities of the product (in this case, a visual display).

This point can be lost on students, who tend to fixate on the surface features of whatever performance or product they are developing at the expense of the content being assessed. For example, think of the science fair project where the backboard display is a work of art while the science content and the project’s conclusions are superficial.

Criterion-Based Evaluation Tools

Once the key criteria have been identified for a given performance (based on the targeted standards/outcomes), we can use them to develop more specific evaluation tools. Let’s now examine four types of criterion-based scoring tools used to evaluate student performance: the criterion list, the holistic rubric, the analytic rubric, and the developmental rubric.


Criterion List

A basic and practical tool for evaluating student performance consists of a listing of key criteria, sometimes referred to as a performance list. For example, my wife was a high school art teacher and department chair. She and her department colleagues identified the following four key criteria that they used in evaluating student art portfolios.

  • Composition — Effective use of elements of art and principles of design in organizing space.
  • Originality — Evidence of development of unique ideas.
  • Visual Impact — Sensitivity in use of line, color and form to effectively convey ideas and mood.
  • Craftsmanship — Skill in use of media tools and technique. Attention to detail and care for results.

Here is another example of a criterion list for composing a fairy tale (Figure 6.2):

Figure 6.2 — Criterion List for a Fairy Tale



Key Criteria

  1. Plot — The plot has a clear beginning, middle, and end that is carried throughout the tale.
  2. Setting — The setting is described with details and shown through the events in the story.
  3. Characterization — The characters are interesting and fit the story.
  4. Details — The story contains descriptive details that help explain the plot, setting, and characters.
  5. Fairy Tale Elements — The story contains the elements of a fairy tale (e.g., appropriate characters, settings of the past, events that can’t really happen, magical events).
  6. Pictures — Detailed pictures are effectively used to help tell the story.
  7. Mechanics — The fairy tale contains correct spelling, capitalization, and punctuation, with no errors in mechanics.

Well-developed criterion lists identify the key elements that define success on a performance task. They communicate to students how their products or performances will be judged and which elements are most important. Despite these benefits, criterion lists do not provide detailed descriptions of performance levels; in other words, there is no qualitative description of the difference between a higher and a lower rating on a given element. Thus, different teachers using the same performance list may rate the same student’s work quite differently.

Well-crafted rubrics can address this limitation. A rubric is based on a set of criteria and includes a description of performance levels according to a fixed scale (e.g., a 4-point scale). Let’s examine three types of rubrics.
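
For readers who find it helpful to see that structure made concrete, here is a minimal sketch (in Python, purely illustrative; the criteria, levels, and descriptors are invented placeholders, not an actual rubric from this series) of how a rubric pairs each criterion with a descriptor at every point on a fixed scale:

    # A rubric as a lookup: each criterion maps to a descriptor for
    # every level on a fixed 4-point scale. All text is placeholder.
    rubric = {
        "accuracy": {
            4: "All information presented is correct.",
            3: "Information is correct with minor slips.",
            2: "Several factual errors are present.",
            1: "Information is largely inaccurate.",
        },
        "organization": {
            4: "Logically framed and sequenced throughout.",
            3: "Generally organized, with minor lapses.",
            2: "Organization is uneven and hard to follow.",
            1: "No discernible organization.",
        },
    }

    def descriptor(criterion: str, level: int) -> str:
        """Return the performance-level description for a criterion."""
        return rubric[criterion][level]

    print(descriptor("organization", 3))

The point of the structure is simply that every criterion has a described level at each score point; that is what distinguishes a rubric from a bare criterion list.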

Holistic Rubric

A holistic rubric provides an overall rating of a student’s performance, typically yielding a single score. Here is an example of a holistic rubric for a scientific investigation task.

Holistic Rubric for a Scientific Investigation

4 — The student’s investigation includes a stated hypothesis, follows a logical and detailed procedure, collects relevant and sufficient data, thoroughly analyzes the results, and reaches a conclusion that is fully supported by the data. The investigative process and conclusion are clearly and accurately communicated in writing so that others could replicate the investigation.

3 — The student’s investigation includes a hypothesis, follows a step-by-step procedure, collects data, analyzes the results, and reaches a conclusion that is generally supported by the data. The process and findings are communicated in writing with some omissions or minor inaccuracies. Others could most likely replicate the investigation.

2 — The student’s stated hypothesis is unclear. The procedure is somewhat random and sloppy. Some relevant data is collected but not accurately recorded. The analysis of results is superficial and incomplete, and the conclusion is not fully supported. The findings are communicated so poorly that it would be difficult for others to replicate the investigation.

1 — The student’s investigation lacks a stated hypothesis and does not follow a logical procedure. The data collected is insufficient or irrelevant. Results are not analyzed, and the conclusion is missing or vague and not supported by data. The communication is weak or non-existent.



Since they yield an overall rating, holistic rubrics are well suited for summative evaluation and grading. However, they typically do not offer a detailed analysis of the strengths and weaknesses of a student’s work and are thus less effective at providing specific feedback to learners.

Holistic rubrics can also present a challenge for teachers when they are evaluating a complex performance with multiple dimensions. For example, consider two different students who have completed a graphic design project. One student uses visual symbols to clearly communicate an abstract idea. However, her design involves clip art that is sloppily pasted onto the graphic. A second student creates a beautiful and technically sophisticated design, yet his main idea is trivial. How would those respective pieces be scored using a holistic rubric? Often, the compromise involves averaging, whereby both students might receive the same score or grade, yet for substantially different reasons. Averaging obscures the important distinctions in each student’s performance and doesn’t provide the student with detailed feedback. If all a student receives is a score or rating, it is difficult for them to know exactly what the grade means or what refinements are needed in the future.
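
To make the averaging problem concrete, here is a small illustrative sketch (Python; the dimensions and scores are hypothetical, chosen only to mirror the two students described above):

    # Two hypothetical students scored on a 4-point scale across two
    # dimensions of the graphic design project described above.
    student_a = {"communication of idea": 4, "craftsmanship": 2}
    student_b = {"communication of idea": 2, "craftsmanship": 4}

    def holistic_score(scores: dict) -> float:
        """Collapse the dimension scores into one overall rating."""
        return sum(scores.values()) / len(scores)

    print(holistic_score(student_a))  # 3.0
    print(holistic_score(student_b))  # 3.0 -- identical ratings for
                                      # substantially different work

An analytic rubric would instead report the 4 and the 2 separately, preserving exactly the distinction that the holistic average erases.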

Analytic Rubric

An analytic rubric divides a product or performance into distinct elements or traits and judges each independently. Analytic rubrics are well suited to judging complex performances (e.g., multi-faceted problem solving or a research project) involving several significant dimensions. As evaluation tools, they provide more specific information (feedback) to students, parents and teachers about the strengths of a performance and the areas needing improvement.

Here is an example of an analytic rubric for mathematical problem solving (Figure 6.6).

Figure 6.6 — Analytic Rubric for Mathematical Problem Solving




Score 4
  • Reasoning — An efficient and effective strategy is used and progress towards a solution is evaluated. Adjustments in strategy, if needed, are made, and/or alternative strategies are considered. There is sound mathematical reasoning throughout.
  • Computation — All computations are performed accurately and completely. There is evidence that computations are checked. A correct answer is obtained.
  • Representation — Abstract or symbolic mathematical representations are constructed and refined to analyze relationships, clarify or interpret the problem elements, and guide solutions.
  • Communication — Communication is clear, complete, and appropriate to the audience and purpose. Precise mathematical terminology and symbolic notation are used to communicate ideas and mathematical reasoning.

Score 3
  • Reasoning — An effective strategy is used and mathematical reasoning is sound.
  • Computation — Computations are generally accurate. Minor errors do not detract from the overall approach. A correct answer is obtained once minor errors are corrected.
  • Representation — Appropriate and accurate mathematical representations are used to interpret and solve problems.
  • Communication — Communication is generally clear. A sense of audience and purpose is evident. Some mathematical terminology is used to communicate ideas and mathematical reasoning.

Score 2
  • Reasoning — A partially correct strategy is used, or a correct strategy is applied to only part of the task. There is some attempt at mathematical reasoning, but flaws in reasoning are evident.
  • Computation — Some errors in computation prevent a correct answer from being obtained.
  • Representation — An attempt is made to construct mathematical representations, but some are incomplete or inappropriate.
  • Communication — Communication is uneven. There is only a vague sense of audience or purpose. Everyday language is used, or mathematical terminology is not always used correctly.

Score 1
  • Reasoning — No strategy is used, or a flawed strategy is tried that will not lead to a correct solution. There is little or no evidence of sound mathematical reasoning.
  • Computation — Multiple errors in computation are evident. A correct solution is not obtained.
  • Representation — No attempt is made to construct mathematical representations, or the representations are seriously flawed.
  • Communication — Communication is unclear and incomplete. There is no awareness of audience or purpose. The language is imprecise and does not make use of mathematical terminology.

Analytic rubrics help students understand the nature of quality work since these evaluation tools identify the important dimensions of a product or performance. Moreover, teachers can use the information provided by an analytic evaluation to target instruction to particular areas of need (e.g., the students are generally accurate in their computations, but less effective at describing their mathematical reasoning).
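
As a sketch of what that diagnostic use might look like (Python; the class data is invented for illustration), a teacher could average a class’s analytic scores by dimension to see where instruction should be targeted:

    # Invented analytic scores (4-point scale) for a small class.
    class_scores = [
        {"reasoning": 3, "computation": 4, "representation": 3, "communication": 2},
        {"reasoning": 2, "computation": 4, "representation": 3, "communication": 2},
        {"reasoning": 3, "computation": 3, "representation": 2, "communication": 1},
    ]

    for dim in class_scores[0]:
        average = sum(s[dim] for s in class_scores) / len(class_scores)
        print(f"{dim}: {average:.1f}")

    # reasoning: 2.7, computation: 3.7, representation: 2.7,
    # communication: 1.7 -- computation is strong, while communicating
    # mathematical reasoning is the area to reteach.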

Since there are several traits to consider, using an analytic rubric may take a bit more time than assigning a single score. However, I believe the more specific feedback that results is well worth the additional time, especially given the ultimate goal of improving learning and performance.

Developmental Rubric

A third type of rubric, the developmental rubric, describes growth along a proficiency continuum ranging from novice to expert. As examples, think of the colored belts that designate proficiency levels in karate or the swimming levels of the American Red Cross.

Developmental rubrics are well suited to subjects that emphasize skill performance. Hence, they are a natural fit for English/language arts, physical education, the arts, and language acquisition. The American Council on the Teaching of Foreign Languages (ACTFL) has developed sets of longitudinal proficiency rubrics for listening, speaking, reading, and writing that can be used in conjunction with assessments for world languages; these are available on the ACTFL website.

Similar developmental rubrics exist for English/language arts. Bonnie Campbell-Hill, for example, has created a set of proficiency continuums for literacy.

Developmental rubrics are generic in that they are not tied to any particular performance task or age/grade level. Thus, teachers across the grades can profile student proficiency levels on the same rubric. Furthermore, an agreed-upon longitudinal scale enables learners, teachers, and parents to collectively chart progress toward desired accomplishments.

Yes, but… One often hears concerns about subjectivity when performance is judged, whether during an Olympic figure skating event, at a juried art exhibit, or when teachers evaluate students’ products and performances for a task. Admittedly, all performance evaluation can be considered subjective in that human judgment is required. However, that does not mean such judgments are destined to be biased or arbitrary. Student performance can be judged reliably, as years of experience with statewide writing assessments, music adjudications, and AP art portfolio reviews have demonstrated. The reliability of evaluation increases with: 1) clear criteria embedded in well-developed rubrics; 2) models or anchors of performance coupled with the rubrics; and 3) training and practice in scoring student work.
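
One simple way such consistency can be checked is the rate of exact agreement between two trained raters scoring the same set of work. Here is a minimal sketch (Python; the paired ratings are invented):

    # Invented scores two raters gave the same ten essays (4-point scale).
    rater_1 = [4, 3, 3, 2, 4, 1, 3, 2, 4, 3]
    rater_2 = [4, 3, 2, 2, 4, 1, 3, 3, 4, 3]

    matches = sum(a == b for a, b in zip(rater_1, rater_2))
    rate = matches / len(rater_1)
    print(f"Exact agreement: {matches}/{len(rater_1)} = {rate:.0%}")  # 8/10 = 80%

In practice, scoring programs use more robust statistics (e.g., weighted kappa), but the underlying idea is the same: clear criteria, anchor examples, and rater training push this agreement rate up.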

Conclusion


Over the years, I have observed five benefits resulting from the use of well-developed rubrics — two for teachers and three for students:

Benefits for Teachers

  1. Scoring Reliability — A rubric constructed around clearly defined performance criteria assists teachers in reducing subjective judgments when they evaluate student work. The resulting performance evaluations, including grades, are thus more defensible to students and parents. When a common rubric is used throughout a department or grade-level team, school or district (with accompanying anchor examples), the consistency of judgments (i.e., scoring reliability) by teachers across classrooms and schools increases.
  2. Focused Instruction — Clearly developed rubrics help clarify the meaning of standards and serve as targets for teaching. Indeed, teachers often observe that the process of evaluating student work against established criteria makes them more attentive to addressing those qualities in their teaching.

Benefits for Students

  1. Clear Targets — When well-developed rubrics are presented to students at the start of a task, they are not left to guess what is most important or how their work will be judged.
  2. Feedback — Educational research consistently shows that formative assessment and feedback can significantly enhance student performance. Clear performance criteria embedded in analytic rubrics enable teachers to provide the detailed feedback that learners need to improve their performance.
  3. Guides for Self Assessment — When teachers share performance criteria and rubrics with students, learners can use these tools for self-assessment and goal setting.

Through the use of rubrics in these ways, educators can enhance the quality of student learning and performance, not simply evaluate it.

For a collection of authentic performance tasks and associated rubrics, see Defined STEM: http://www.definedstem.com

For a complete professional development course on performance tasks for your school or district, see Performance Task PD with Jay McTighe: http://www.performancetask.com

For more information about the design and use of performance tasks, see Core Learning: Assessing What Matters Most by Jay McTighe: http://www.schoolimprovement.com

Article originally posted:
URL: http://blog.performancetask.com/how-will-we-evaluate-student-performance-on-tasks-part-6/ | Article Title: How Will We Evaluate Student Performance On Tasks? | Website Title: PerformanceTask.com | Publication date: 2016-03-02
