The press for accountability is driving many reforms in education today. Educators at all levels are being asked to provide evidence to show that what they do makes a difference. In most cases these requests for evidence relate to the impact of school leaders and teachers on student achievement. But increasingly, such requests extend to educators’ learning as well. Questions are raised about the value of the professional learning experiences in which educators engage, the impact of those experiences on their professional practice, and the effects of those practices on student learning outcomes.

Historically, educators have considered professional learning and development to be their right. Throughout the world, time for professional learning is included in nearly all teachers’ contracts for employment. But in recent years, government officials and policy-makers concerned with accountability have begun to question that right. As economic conditions decline and education budgets grow tight, they look at what schools spend on educators’ professional learning and want to know what benefits it brings. Does the school or school district’s investment in professional learning for educators yield tangible payoffs, or might that money be spent in better ways? Such questions highlight the importance of evaluating the effects of educators’ professional learning (Guskey 1998, 1999).

The professional learning experiences of educators cover a broad range of activities. Many of those experiences involve personal reflections on the everyday interactions between teachers and students. Teachers regularly try new approaches to instruction, gather information to learn how well those approaches work for their students, and then decide what changes need to be made or what instructional alternatives might be considered to improve students’ learning success. Thoughtful deliberations on these ongoing classroom interactions are a vital part of every teacher’s professional learning.

In addition, school leaders and teachers also take part in a variety of more structured professional learning experiences specifically designed to enhance their knowledge and improve their professional skills. These experiences include not only the broad spectrum of seminars and workshops in which educators engage, but also online programs, peer observations, professional learning communities, coaching or mentoring, university courses, conferences, and the like. In this chapter we will focus primarily on evaluating the effects of these more structured and formalized professional learning activities.

1 The Lack of Good Evidence

Educators generally pay little attention to evaluating their professional learning activities, even those that are more structured. When they do, those evaluations tend to be restricted to descriptive accounts of what took place or surveys of participants’ reactions to the experience. Rarely do evaluations consider the impact on teachers’ classroom practices or resultant improvements in student learning. Two large-scale reports produced in the United States document the extent of this lack of well-designed evaluations.

In Reviewing the Evidence on How Teacher Professional Development Affects Student Achievement (Yoon et al. 2007), a team of scholars from the American Institutes for Research analyzed the findings from over 1,300 studies and evaluation reports published over a period of 20 years that potentially addressed the impact of teachers’ professional learning on student learning outcomes. Using the U.S. Department of Education’s “What Works Clearinghouse (WWC) Evidence Standards” to judge the quality of evidence presented in these investigations, the team identified only nine studies of sufficient quality for drawing valid conclusions about the characteristics of effective professional development for educators (see Guskey and Yoon 2009). All of the other studies and reports had significant design or methodological flaws that challenged the credibility of their findings.

The second report, Does Teacher Professional Development Have Effects on Teaching and Learning? (Blank et al. 2008), came from the Council of Chief State School Officers’ study of teacher professional development programs in mathematics and science sponsored by the U.S. National Science Foundation. The authors of this report reviewed evaluation studies from a voluntary sample of 25 professional development programs nominated by 14 states. Presumably, these programs represented the best of the best. Their analysis of study reports and papers from these nominated programs revealed that only seven program evaluations reported measurable effects of teacher professional development on subsequent student outcomes. No examination of the quality or validity of this evidence was conducted.

Some might argue that significant progress has been made in more recent years and that our knowledge of effective professional learning is improving. But even the most current evidence indicates that much of the research on educators’ professional learning, as well as most evaluations of professional development initiatives, continues to be descriptive rather than quantitative (Sawchuk 2010). Hard data on what professional learning models lead to better teaching and improved student learning remain difficult to find (Viadero 2011). In addition, new investigations employing rigorous methodological designs continue to yield uninspiring results.

Two recent, randomized field studies funded by the U.S. Department of Education, for example, investigated intensive professional development programs. Both studies found no effects on student achievement, even though the programs were generally aligned with the characteristics of effective professional development identified in the Yoon et al. (2007) review. In the first study, two professional learning approaches based on a popular early reading program were found to increase teachers’ knowledge of literacy development and their use of explicit reading instruction during the year of the intervention, but had little effect on the reading achievement of second-grade students in high-poverty schools (Garet et al. 2008). In the second investigation, a professional development initiative focusing on secondary math was found to have a significant effect on instructional practice but little impact on teachers’ content knowledge or students’ learning (Garet et al. 2011). Neither study offered sufficient evidence to direct specific improvement efforts.

What lessons can be gained from these recent studies and the earlier reviews? First, they make clear how few well-designed investigations and evaluation reports currently exist to adequately judge the effectiveness of educators’ professional learning experiences. While numerous studies and reports claim to have addressed this issue, most lack the rigor necessary to draw valid conclusions. But second and more important, they also represent a call to action for educators at all levels to plan better studies and more systematic evaluations in order to gain the critical evidence needed to guide improvements in professional learning programs and practice.

2 The Need for Sound Evaluations

The reasons so little good evidence currently exists on the effects of educators’ professional learning experiences are a matter of speculation. One reason may be educators’ commonly held perception of evaluation as a costly, time-consuming process that diverts attention from important planning, implementation, and follow-up activities. In addition, many educators undoubtedly believe that they lack the skill and expertise needed to become involved in rigorous evaluations. As a result they either ignore evaluation issues completely, or leave them to “evaluation experts” who are called in at the end and asked to determine if what was done made any difference. Sadly, these last-minute, post hoc evaluation efforts are seldom adequate in determining any experience or activity’s true effects.

Good evaluations, however, do not have to be costly or complicated. What they require is thoughtful planning, the ability to ask good questions, and a basic understanding of how to collect appropriate evidence in order to find valid answers. In many ways, good evaluations are merely the refinement of everyday thinking. They provide sound, meaningful, and sufficiently reliable information that allows thoughtful and responsible decisions to be made about professional learning processes and effects.

In this chapter we will explore the evaluation of educators’ professional learning experiences within the context of accountability. Three basic questions are addressed: (1) What does evaluation mean in this context? (2) What purposes do professional learning evaluations serve? and (3) What are the critical levels of professional learning evaluation? Finally, the implications of the answers to these questions are considered with regard to accountability issues.

3 What Does Evaluation Mean in This Context?

Just as there are many forms of professional learning for educators, there are also many forms of evaluation. While experts may disagree on the best definition of evaluation, a useful operational definition for most purposes is: Evaluation is the systematic investigation of merit or worth (adapted from the Joint Committee on Standards for Educational Evaluation 1994).

Each part of this definition holds special significance. The word “systematic” distinguishes this process from the multitude of informal evaluation acts in which people consciously engage. “Systematic” implies that evaluation in this context is thoughtful, intentional, and purposeful. It is done for clear reasons and with explicit intent. Although its specific purpose may vary from one setting to another, all good evaluations are organized and deliberate.

Because it is systematic, some educators mistakenly believe that professional learning evaluation is appropriate only for planned seminars and workshops, but not for the wide range of other less structured, ongoing, job-embedded professional learning activities. Regardless of the form it takes, however, professional learning is not a haphazard process. It is, or should be, purposeful and results- or goal-driven (Schmoker 2004, 2006). Its objectives remain clear: to make a positive difference in teaching, to help educators reach high standards and, ultimately, to have a positive impact on students. This is true of seminars and workshops, as well as study groups, professional learning communities, action research, collaborative planning, curriculum development, structured observations, peer coaching and mentoring, and individually-guided professional learning activities. To determine if the goals of these activities are met, or if progress is being made, requires systematic evaluation.

“Investigation” refers to collecting and analyzing appropriate and pertinent information. While no evaluation can be completely objective, the process is not founded on opinion or conjecture. Rather, it is based on acquiring specific, relevant, and valid evidence examined through appropriate methods and techniques.

Using “merit or worth” in the definition implies appraisal and judgment. Evaluations are designed to determine something’s value. They help answer questions such as:

  • Is this experience or activity leading to the intended results?

  • Is it better than what was done in the past?

  • Is it better than another, competing activity?

  • Is it worth the costs?

Answers to these questions require more than a statement of findings. They demand an appraisal of quality and judgments of value, based on the best evidence available. Such appraisals are the basis of accountability.

4 What Purposes Do Professional Learning Evaluations Serve?

The purposes of evaluation are generally classified in three broad categories: planning, formative, and summative. Most evaluations actually fulfill all three of these purposes, although the emphasis on each changes during various stages of the evaluation process. While this blending of purposes blurs their distinction, differentiating their intent helps clarify understanding of evaluation procedures (Stevens et al. 1995).

4.1 Planning

Planning evaluation occurs before a professional learning program or activity begins, although certain aspects may be continual and ongoing. It is designed to give those involved in program development and implementation a precise understanding of what is to be accomplished, what procedures will be used, and how success will be determined. In essence, it lays the groundwork for all other evaluation activities. While some advocate an “evaluability assessment” prior to planning as a means of determining if a professional learning experience or activity is “evaluable” (Wholey et al. 2004), others contend that planning evaluation done well makes such assessment unnecessary (Guskey 2000a).

Planning evaluation involves appraisal of a professional learning program or activity’s critical attributes, usually on the basis of previously established standards. These include the specified goals, the proposal or plan to achieve those goals, the concept or theory underlying the proposal, the overall evaluation plan, and the likelihood that the plan can be carried out with the time and resources available. In addition, planning evaluation typically includes a determination of needs, assessment of the characteristics of participants, careful analysis of the context, and the collection of relevant baseline information.

Evaluation for planning purposes is sometimes referred to as “preformative evaluation” (Scriven 1991) and may be thought of as “preventative evaluation.” It helps decision makers know if professional learning endeavors are headed in the right direction and likely to produce the desired results. It also helps identify and remedy early on the difficulties that might plague later evaluation efforts. Furthermore, planning evaluation helps ensure that other evaluation purposes can be met in an efficient and timely manner.

4.2 Formative

Formative evaluation occurs during the operation of a professional learning experience or activity. Its purpose is to provide those responsible for the activity with ongoing information about whether things are going as planned and if expected progress is being made. If not, this same information can be used to guide necessary improvements (Scriven 1967).

The most useful formative evaluations focus on the conditions for success. They address issues such as:

  • What conditions are necessary for success?

  • Have those conditions for success been met?

  • Can they be improved?

In many cases, formative evaluation is a recurring process that takes place at multiple times throughout the life of the professional learning program or activity. Many program developers, in fact, constantly engage in the process of formative evaluation. They use evidence gathered at each step of development and implementation to make adjustments, modifications, or revisions (Fitzpatrick et al. 2004).

To keep formative evaluations efficient, Scriven (1991) recommends using them as “early warning” evaluations. In other words, they provide an early version of the final, overall evaluation. As development and implementation proceed, formative evaluation can consider intermediate benchmarks of success to determine what is working as expected and what difficulties must be overcome. Flaws can be identified and weaknesses located in time to make the adaptations necessary for success.

4.3 Summative

Summative evaluation is conducted at the completion of a professional learning experience or activity. Its purpose is to provide program developers and decision makers with judgments about the program or activity’s overall merit or worth. Summative evaluation describes what was accomplished, what the consequences were (positive and negative), what the final results were (intended and unintended), and, in some cases, whether the benefits justify the costs (P. Phillips 2002).

Unlike formative evaluations that are used to guide improvements, summative evaluations present decision makers with the information they need to make crucial decisions about a professional learning program or activity. Should it be continued? Continued with modifications? Expanded? Discontinued? Ultimately, its focus is “the bottom line.”

Summative evaluation may focus on either internal or external interests. In other words, it may address issues relevant to program designers who want to know if the professional learning experience truly accomplished the goals for which it was intended. But in many instances, summative evaluation addresses the concerns of external parties such as policy makers, funding organizations, or government agencies. These efforts often take the form of “third-party evaluations,” in which a group other than the program designers or those implementing the program is asked to gather and analyze evidence on the effects, presumably to offer an impartial perspective on the results.

Perhaps the best description of the distinction between formative and summative evaluation is one offered by Robert Stake: “When the cook tastes the soup, that’s formative; when the guests taste the soup, that’s summative” (quoted in Scriven 1991, p. 169).

Unfortunately, many educators associate evaluation with its summative purposes only. Important information that could help guide planning, development, and implementation is often neglected, even though such information can be key in determining a professional learning program or activity’s overall success. Summative evaluation, although necessary, often comes too late to be much help. Thus, while the relative emphasis on planning, formative, and summative evaluation changes through the life of a professional learning program or activity, all three are essential to a meaningful evaluation.

5 What Are the Critical Levels of Professional Learning Evaluation?

Planning, formative, and summative evaluation all involve collecting and analyzing information. Effective professional learning evaluation requires consideration of the five critical stages or levels of information shown in Table 44.1 (Guskey 2000a, 2002a, 2005). These five levels represent an adaptation of an evaluation model developed by Kirkpatrick (1959, 1998) for judging the value of supervisory training programs in business and industry. Kirkpatrick’s model, although widely applied, has seen limited use in education because of inadequate explanatory power. While helpful in addressing a broad range of “what” questions, many find it lacking when it comes to explaining “why” (Alliger and Janak 1989; Holton 1996).

The five levels in this model are hierarchically arranged, from simple to more complex. With each succeeding level, the process of gathering evaluation information requires more time and resources. And because each level builds on those that come before, success at one level is usually necessary for success at higher levels.

Table 44.1 Five levels of professional learning evaluation

  1. Participants’ reactions
  2. Participants’ learning
  3. Organizational support and change
  4. Participants’ use of new knowledge and skills
  5. Student learning outcomes

5.1 Level 1: Participants’ Reactions

The first level of evaluation looks at participants’ reactions to the professional learning experience. This is the most common form of professional learning evaluation and the easiest type of information to gather and analyze.

At Level 1 the questions addressed focus on whether or not participants liked the experience. Did they feel their time was well spent? Did the content and material make sense to them? Were the activities well planned and meaningful? Was the leader knowledgeable, credible, and helpful? Did they find the information useful?

Also important for some professional learning experiences are questions related to the context, such as: Was the room the right temperature? Were the chairs comfortable? Were the refreshments fresh and tasty? To some, questions such as these may seem silly and inconsequential. But experienced professional development leaders know the importance of attending to these basic human needs.

Information on participants’ reactions is usually gathered through questionnaires handed out at the end of a program or activity, or by online surveys distributed later through email. These questionnaires and surveys typically include a combination of rating-scale items and open-ended response questions that allow participants to provide more personalized comments. Because of the general nature of this information, many organizations use the same questionnaire or survey for all of their professional learning activities, regardless of the format.
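As a concrete illustration, the brief Python sketch below shows one way such rating-scale responses might be tallied once collected. It is a minimal sketch only: the file name reaction_survey.csv, the assumption that every column is a rating-scale item scored from 1 to 5, and the summary statistics chosen are illustrative assumptions rather than features of any particular instrument.

    # Minimal sketch: summarizing Level 1 rating-scale responses.
    # Assumes a hypothetical CSV file, "reaction_survey.csv", in which each row
    # is one participant and each column is one rating-scale item scored 1-5.
    import csv
    from collections import Counter
    from statistics import mean

    with open("reaction_survey.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    for item in (rows[0].keys() if rows else []):
        ratings = [int(r[item]) for r in rows if r[item].strip()]
        if not ratings:
            continue
        counts = dict(sorted(Counter(ratings).items()))
        print(f"{item}: n={len(ratings)}, mean={mean(ratings):.2f}, distribution={counts}")

A summary of this kind can flag individual items or sessions that participants rated poorly, which is typically where improvement efforts at Level 1 begin.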

Some educators refer to these measures of participants’ reactions as “happiness quotients,” insisting that they reveal only the entertainment value of an experience or activity, not its quality or worth. But measuring participants’ initial satisfaction provides information that can help improve the design and delivery of professional learning programs or activities in valid ways. In addition, positive reactions from participants are usually a necessary prerequisite to higher level evaluation results.

5.2 Level 2: Participants’ Learning

In addition to liking their professional learning experiences, participants ought to learn something from them. Level 2 focuses on measuring the new knowledge, skills, and perhaps attitudes or dispositions that participants gained (Guskey 2002b). Depending on the goals of the professional learning program or activity, this can involve anything from a pencil-and-paper assessment (Can participants describe the critical attributes of effective questioning techniques and give examples of how these might be applied in common classroom situations?) to a simulation or full-scale skill demonstration (Presented with a variety of classroom conflicts, can participants diagnose each situation, and then prescribe and carry out a fair and workable solution?). Oral or written personal reflections, or examinations of the portfolios participants assemble also can be used to document their learning.

Although Level 2 evaluation information often can be gathered at the completion of a professional learning program or activity, it usually requires more than a standardized form. And because measures must show attainment of specific learning goals, indicators of successful learning need to be outlined before activities begin.

Careful evaluators also consider possible “unintended” learning outcomes, both positive and negative. Professional learning activities that engage teachers and school leaders in collaboration, for example, can additionally foster a positive sense of community and shared purpose among participants (Supovitz 2002). But in some instances, individuals collaborate to block change or inhibit advancement (Corcoran et al. 2001; Little 1990). Investigations further show that collaborative efforts sometimes run headlong into enormous conflicts over professional beliefs and practices that can impede progress (Achinstein 2002). Thus even the best planned professional learning endeavors occasionally yield completely unanticipated negative consequences.

If there is concern that participants may already possess the requisite knowledge and skills, some form of pre- and post-assessment may be required. Analyzing this information provides a basis for improving the content, format, and organization of professional learning programs and activities.
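As a simple illustration of such a pre- and post-assessment comparison, the sketch below computes participants’ gain scores from paired data. The scores are invented for illustration; a real evaluation would use the indicators of successful learning specified before the activity began.

    # Minimal sketch: pre/post comparison of participants' learning (Level 2).
    # The paired scores below are hypothetical; each position refers to the
    # same participant before and after the professional learning activity.
    from statistics import mean, stdev

    pre  = [12, 15, 9, 14, 11, 16, 10, 13]   # pre-assessment scores
    post = [16, 18, 14, 17, 15, 19, 13, 17]  # post-assessment scores, same order

    gains = [after - before for before, after in zip(pre, post)]
    print(f"Mean gain: {mean(gains):.2f} points (SD = {stdev(gains):.2f})")
    print(f"Participants who improved: {sum(g > 0 for g in gains)} of {len(gains)}")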

5.3 Level 3: Organizational Support and Change

At Level 3 the focus shifts from participants to organizational dimensions that may be vital to the success of the professional learning experience or activity. Organizational elements also can sometimes hinder or prevent success, even when the individual aspects of professional development are done right (Sparks 1996).

Suppose, for example, that a group of secondary educators participates in a professional learning experience on aspects of cooperative learning. As part of their experience they gain an in-depth understanding of cooperative learning theory and organize a variety of classroom activities based on cooperative learning principles. Following their learning experience they implement these activities in classes where students are graded or marked “on the curve,” according to their relative standing among classmates, and great importance is attached to each student’s individual class rank. Organizational grading policies and practices such as these, however, make learning highly competitive and thwart the most valiant efforts to have students cooperate and help each other learn. When graded “on the curve,” students must compete against each other for the few scarce rewards (high grades) dispensed by the teacher. Cooperation is discouraged since helping other students succeed lessens the helper’s chance of success (Guskey 2000b).

The lack of positive results in this case does not reflect poor training or inadequate learning on the part of the participating teachers, but rather organizational policies that are incompatible with implementation efforts. Problems at Level 3 have essentially canceled the gains made at Levels 1 and 2 (Sparks and Hirsh 1997). That is precisely why professional learning evaluations must include information on organizational support and change.

Level 3 questions focus on the organizational characteristics and attributes necessary for success. Did the professional learning activities promote changes that were aligned with the mission of the school? Were changes at the individual level encouraged and supported at the building and district levels (Corcoran et al. 2001)? Were sufficient resources made available, including time for sharing and reflection (Colton and Langer 2005; Langer and Colton 1994)? Were successes recognized and shared? Issues such as these often play a large part in determining the success of any professional learning program.

Procedures for gathering information at Level 3 differ depending on the goals of the professional learning program or activity. They may involve analyzing school records, examining the minutes of follow-up meetings, or administering questionnaires that tap issues related to the organization’s advocacy, support, accommodation, facilitation, and recognition of change efforts. Structured interviews with participants and school administrators also can be helpful. This information is used not only to document and improve organizational support for professional learning, but also to inform future change initiatives.

5.4 Level 4: Participants’ Use of New Knowledge and Skills

At Level 4 the primary question is: Did the new knowledge and skills that participants learned make a difference in their professional practice? The key to gathering relevant information at this level of evaluation rests in specifying clear indicators of both the degree and quality of implementation. Unlike Levels 1 and 2, this information cannot be gathered at the end of a professional learning program or activity. Enough time must pass to allow participants to adapt the new ideas and practices to their settings. And because implementation is often a gradual and uneven process, measures of progress may need to be gathered at several time intervals.

Depending on the goals of the professional learning program or activity, this information may involve questionnaires or structured interviews with participants and their school leaders. Oral or written personal reflections, or examinations of participants’ journals or portfolios also might be considered. The most accurate information typically comes from direct observations, either by trained observers or using digital recordings. These observations, however, should be kept as unobtrusive as possible (for examples, see Hall and Hord 1987).

Analyzing this information provides evidence on current levels of use. It also helps professional development leaders restructure future programs and activities to facilitate better and more consistent implementation.

5.5 Level 5: Student Learning Outcomes

Level 5 addresses “the bottom line” in education: What was the impact on students? Did the professional learning program or activity benefit them in any way? The particular student learning outcomes of interest will depend, of course, on the goals of that specific professional learning endeavor. In addition to the stated goals, the program or activity may result in important unintended outcomes. Suppose, for example, that students’ average scores on large-scale assessments went up, but so did the school dropout rate. Because mixed results such as these are so typical, evaluations should always include multiple measures of student learning (Chester 2005; Guskey 2007).

Since stakeholders vary in their trust of different sources of evidence, it is unlikely that any single indicator of success will prove adequate or sufficient to all. Providing acceptable evidence for judging the effects of professional learning activities will almost always require multiple sources of evidence. In addition, these sources of evidence must be carefully matched to the needs and perceptions of different stakeholder groups (Guskey 2012).

Results from large-scale assessments and nationally-normed standardized exams may be important for accountability purposes and will need to be included. In addition, school leaders often consider these measures to be valid indicators of success. Teachers, however, generally see limitations in large-scale assessment results. These types of assessments are typically administered only once per year, and results may not be available until several months later. By that time, the school year may have ended and students promoted to another teacher’s class. So while important, many teachers do not find such results particularly useful (Guskey 2007).

Teachers put more trust in results from their own assessments of student learning – classroom assessments, common formative assessments, and portfolios of student work. They turn to these sources of evidence for feedback to determine if the new strategies or practices they are implementing really make a difference. Classroom assessments provide timely, targeted, and instructionally relevant information that also can be used to plan revisions when needed. Since teachers comprise a major stakeholder group in any professional learning activity, sources of evidence that they trust and believe will be particularly important to include.

Measures of student learning typically include cognitive indicators of student performance and achievement, such as assessment results, portfolio evaluations, marks or grades, and scores from standardized tests. But in addition, affective and behavioral indicators of student performance can be relevant as well. Student surveys designed to measure how much students like school; their perceptions of teachers, fellow students, and themselves; their sense of self-efficacy; and their confidence in new learning situations can be especially informative. Evidence on school attendance, enrollment patterns, dropout rates, class disruptions, and disciplinary actions also provides important outcome measures. In some areas, parents’ or families’ perceptions may be a vital consideration. This is especially true in initiatives that involve changes in grading practices, report cards, or other aspects of school-to-home and home-to-school communication (Epstein and Associates 2009; Guskey 2002c).

Furthermore, Level 5 evaluations should be made as methodologically rigorous as possible. Rigor, however, does not imply that only one evaluation method or design can produce credible evidence. Although randomized designs (i.e., true experimental studies) represent the gold standard in scientific research, especially in studies of causal effects, a wide range of quasi-experimental designs can produce valid results. When evaluations are replicated with similar findings, that validity is further enhanced. One of the best ways to enhance an evaluation’s methodological rigor, however, is to plan for meaningful comparisons.

In many cases evidence on outcomes at Level 5 is gathered from a single school or school district in a single setting for a restricted time period. Unfortunately, from a design perspective, such evidence lacks both reliability and validity. Regardless of whether results are positive or not, so many alternative explanations may account for the results that most authorities would consider such outcomes dubious at best and meaningless at worst (Guskey and Yoon 2009).

It may be, for example, that the planned professional learning endeavors did, indeed, lead to noted improvements. But maybe the improvements were the result of a change in leadership or personnel instead. Maybe the community or student population changed. Maybe changes in government policies or assessments made a difference. Maybe other simultaneously implemented interventions were responsible. The possibility that these or other extraneous factors influenced results makes it impossible to draw definitive conclusions.

The best way to counter these threats to the validity of results is to include a comparison group – another similar group of educators or schools not involved in the current activity or perhaps engaged in a different activity. Ideal comparisons involve the random assignment of students, teachers, or schools to different groups. But because that is rarely possible in most education settings, finding similar classrooms, schools, or school districts provides the next best option. In some cases involvement in a professional learning activity can be staggered so that half of the group of teachers or schools that volunteer can be randomly selected to take part initially while the others delay involvement and serve as the comparison group. In other cases comparisons can be made to “matched” classrooms, schools, or school districts that share similar characteristics related to motivation, size, and demographics.
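The sketch below illustrates, with invented school-level data, the kind of simple comparison such a design allows: the mean gain of participating schools is set against that of a matched comparison group and expressed as a standardized difference. It is a minimal sketch under the stated matching assumptions, not a complete analysis.

    # Minimal sketch: comparing outcomes for schools whose teachers took part
    # in the professional learning activity with a matched comparison group.
    # The gain scores are hypothetical; the design assumes the groups were
    # matched on characteristics such as size, demographics, and motivation.
    from math import sqrt
    from statistics import mean, stdev

    participating = [4.2, 5.1, 3.8, 6.0, 4.7, 5.5]   # mean gain per school
    comparison    = [2.9, 3.4, 4.0, 2.5, 3.1, 3.6]   # matched schools, no program

    diff = mean(participating) - mean(comparison)
    # Pooled standard deviation gives a rough standardized effect size (Cohen's d).
    pooled_sd = sqrt((stdev(participating) ** 2 + stdev(comparison) ** 2) / 2)
    print(f"Difference in mean gains: {diff:.2f}")
    print(f"Approximate effect size: {diff / pooled_sd:.2f}")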

Using comparison groups does not eliminate the effects of extraneous factors that might influence results. It simply allows planners greater confidence in attributing the results attained to the particular program or activity being considered. In addition, other investigative methods may be used to formulate important questions and develop new measures relating to professional growth (Raudenbush 2005).

Student and school records provide the majority of information at Level 5. Results from questionnaires and structured interviews with students, parents, teachers, and administrators could be included as well. Level 5 information is used summatively to document a program or activity’s overall impact. But formatively, it can help guide improvements in all aspects of professional learning, including design, implementation, and follow-up. In some cases information on student learning outcomes is used to estimate the cost effectiveness of professional learning programs and activities, sometimes referred to as “return on investment,” or “ROI evaluation” (Parry 1996; Phillips 1997; Todnem and Warner 1993).
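As a minimal illustration of the ROI idea, the sketch below applies the usual formula, ROI = (benefits − costs) / costs, to hypothetical figures. In practice the difficult step is producing a defensible monetary estimate of the benefits associated with improved student learning, not the arithmetic itself.

    # Minimal sketch: a simple "return on investment" (ROI) calculation of the
    # kind sometimes applied to Level 5 results. All figures are hypothetical.
    program_costs = 48_000.0        # e.g., facilitators, materials, release time
    estimated_benefits = 63_000.0   # monetized estimate of the program's benefits

    roi_percent = (estimated_benefits - program_costs) / program_costs * 100
    print(f"ROI = {roi_percent:.1f}% on costs of ${program_costs:,.0f}")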

6 Implications for Improvement

Three important implications stem from this model for evaluating professional learning. First, each of the five evaluation levels is important. Although evaluation at any level can be done well or poorly, the information gathered at each level provides vital data for improving the quality of professional learning programs and activities. And while each level relies on different types of information that may be gathered at different times, no level can be neglected.

Second, tracking effectiveness at one level tells little about impact at the next level. Although success at an early level may be necessary for positive results at the next higher one, it is clearly not sufficient (Cody and Guskey 1997). Breakdowns can occur at any point along the way. Sadly, most government officials and policy makers fail to recognize the difficulties involved in moving from professional learning experiences (Level 1) to improvements in student learning (Level 5). They also tend to be unaware of the complexity of this process, as well as the time and effort required to build this connection (Guskey 1997; Guskey and Sparks 2004).

The third implication, and perhaps the most important, is that in planning professional learning programs and activities to impact student learning, the order of these levels must be reversed. In other words, education leaders must plan “backward” (Guskey 2001a, b, 2003), starting where they want to end up and then working back (Hirsh 2012).

7 Backward Planning for Accountability

In backward planning, educators first decide what student learning outcomes they want to achieve and what evidence best reflects those outcomes (Level 5). Relevant evidence provides the basis for accountability. School leaders and teachers must decide, for example, if they want to improve students’ reading comprehension, enhance their skills in problem solving, develop their sense of confidence in learning situations, improve their behavior in class, their persistence in school, or their collaboration with classmates. Critical analyses of data from assessments of student learning, samples of student work, and school records are especially useful in identifying these student learning goals.

Next they must determine, on the basis of pertinent research, what instructional practices and policies will most effectively and efficiently produce those outcomes (Level 4). They need to ask questions such as: What evidence verifies that these particular practices and policies will produce the results we want? How good or reliable is that evidence? Was it gathered in contexts similar to ours? In this process, leaders must be particularly mindful of innovations that are more “opinion-based” than “research-based,” promoted by people more concerned with “what sells” to desperate educators than with “what works” with students. Before jumping on any educational bandwagon, they must make sure that trustworthy evidence validates the chosen approach.

After that, leaders need to consider what aspects of organizational support need to be in place for those practices and policies to be implemented (Level 3). Many valuable improvement efforts fail miserably due to a lack of active participation and clear support from school leaders (Guskey 2004). Others prove ineffective because the resources required for implementation were not provided. The lack of time, instructional materials, or necessary technology can severely impede teachers’ attempts to use the new knowledge and skills acquired through a professional learning experience. A big part of planning involves ensuring that organizational elements are in place to support the desired practices and policies.

Then, leaders must decide what knowledge and skills the participating professionals must have in order to implement the prescribed practices and policies (Level 2). In other words, what must they know and be able to do to successfully adapt the innovation to their specific situation and bring about the sought-after change?

Finally, consideration turns to what set of experiences will enable participants to acquire the needed knowledge and skills (Level 1). Seminars and workshops, especially when paired with collaborative planning, structured opportunities for practice with feedback, and follow-up coaching can be a highly effective means of sharing information and expanding educators’ knowledge. Action research projects, organized study groups, collegial exchange, professional learning communities, and a wide range of other activities can all be effective, depending on the specified purpose of the professional learning activity.

What makes this backward planning process so important is that the decisions made at each level profoundly affect those made at the next. For example, the particular student learning outcomes being sought influence the kinds of practices and policies that need to be implemented. Likewise, the practices and policies to be implemented influence the kinds of organizational support or change required, and so on.

The context-specific nature of this work complicates matters further. Even if school leaders and teachers agree on the student learning outcomes they want to achieve, what works best in one context with a particular community of educators and a particular group of students might not work equally well in another context with different educators and different students. This is what makes developing examples of truly universal “best practices” in professional development so difficult. What works always depends on where, when, and with whom.

Unfortunately, professional development leaders frequently fall into the same trap in planning that teachers do when they plan their lessons. Teachers often plan in terms of what they are going to do, instead of what they want their students to know and be able to do. Similarly, those planning professional learning programs and activities often focus on what they will do (workshops, seminars, institutes, etc.) and how they will do it (study groups, action research, peer coaching, etc.). Their planning tends to be “event-based” or “process-based.” This not only diminishes the effectiveness of their efforts, it also makes evaluation much more difficult.

The most effective professional learning planning begins with clear specification of the student learning outcomes to be achieved and the sources of evidence that best reflect those outcomes. With those goals articulated, school leaders and teachers then work backward. Not only will this make planning much more efficient, it also provides a format for addressing the issues most crucial to evaluation. As a result, it makes evaluation a natural part of the planning process and offers a basis for accountability.

8 Conclusion

Many good things are done in the name of professional learning. One could argue, in fact, that no significant educational improvement effort has succeeded in the absence of high quality professional development for educators. Unfortunately, many rotten things also pass for professional learning. What leaders in education have not done well is provide evidence to document the difference between the two. The new demands for accountability today make presenting that evidence more crucial than ever.

Evaluation provides the key to making the distinction. The procedures involved in meaningful evaluations are not especially complicated. They also do not require skills beyond those already possessed by most education leaders. Gathering and analyzing evidence on the five levels of professional development evaluation simply requires careful planning and thoughtful deliberation. Leaders who plan backward in that process, beginning with the clear articulation of student learning goals and specification of the evidence that best reflects achievement of those goals, have taken the most important first step. They have created the foundation for meaningful and purposeful evaluation. They also have established the basis for addressing any questions regarding accountability. All other aspects of evaluation will stem from that essential first step. Those who proceed in this way will not only improve the quality of educators’ professional learning experiences, they will enhance the success of professional development endeavors everywhere.