A common focus of teacher-preparation programs (TPPs) is to develop teachers’ questioning skills and an understanding of how different types of questions serve different pedagogical purposes. In mathematics education, teacher questions that generate mathematical discussion among students have been increasingly emphasized. For example, the Common Core State Standards for Mathematical Practices (Common Core State Standards Initiative 2010) expect students to “construct viable arguments and critique the reasoning of others” and “express regularity in repeated reasoning”—practices that require mathematical talk among students and teacher. Whole-class discussions can support mathematical talk and, as such, are important pedagogical components of mathematics lessons. However, student talk does not ensure learning; teachers must ask generative questions at strategic times in order to elicit, build on, and extend student thinking about a mathematical point (Sleep 2012; Smith and Stein 2011). To orchestrate a productive whole-class discussion about mathematics, a teacher must possess a repertoire of types of questions to draw a range of students into the discussion, elicit their thinking, and hold them accountable for communicating their ideas using academic language. TPPs should afford preservice teachers multiple opportunities to learn the theories supporting various questioning practices and develop questioning skills (Lobato et al. 2005; Stein et al. 2008).

An underlying expectation of TPPs is that, as prospective teachers try different strategies and reflect with mentors, their practice improves over time and comes to reflect theories and techniques presented in their coursework. Unfortunately, as Feiman-Nemser (2001) notes, field experiences “are often limited, disconnected from university coursework and inconsistent” (p. 17). Lampert and Ball (1999) explain, “Prospective teachers are often in the end most influenced by what they see their cooperating teachers do or by their own memories from school” (p. 39). Such observations point to the strong influence of field experiences in teacher preparation and offer one explanation for why prospective teachers sometimes fail to take up teaching practices that are emphasized in TPP coursework (Gainsburg 2012). The study reported here explored this phenomenon through a focus on a specific practice: questioning. We investigated the decisions mathematics prospective teachers made about what types of questions to ask students during whole-class discussions.

A veteran teacher’s practice could be expected to remain relatively stable for long periods (e.g., a year) and so might be captured by a few observations occurring close together in time. In contrast, we presumed that a prospective teacher’s practice would evolve significantly over the weeks or months of her apprenticeship and so would be less reliably captured at a single point in time. Our study aimed to determine whether and how the participating prospective teachers revised their questioning practices over the course of their final field experience. Our study contributes to and extends current research on the types of questions prospective teachers ask in mathematics classrooms by investigating why they ask the questions they do and whether this changes over time.

Guskey (1986) documented the substantial impact that student responses have on teachers’ decisions about practice. More recent studies suggest that this same phenomenon applies to prospective teachers’ decisions. Studying one beginning high-school teacher, Lloyd (2008) found that the teacher’s decisions about classroom interactions and implementing a new curriculum were largely determined by his perception of his students’ expectations about mathematics-classroom roles and activity. This teacher perceived that his students were resistant to whole-class discussions and group work; as a result, he adjusted the curriculum and made his instruction primarily univocal. Similar results were found in Bohl and Van Zoest’s (2002) case study of a prospective teacher and her mentor’s use of new curriculum materials; in this study, the prospective teacher described her difficulty eliciting responses in whole-class discussions. Acknowledging the potential impact of student responses on prospective teachers’ instructional decisions, our study also examined student responses to the prospective teachers’ questions.

Literature review

Sociocultural theory encourages researchers to attend to language, as language is considered a mediator of meaning making and central to the learning process—briefly, “Learning should be seen as a communicative process” (César and Santos 2006, p. 333). Planas et al. (2018) describe two recent approaches to research on language in mathematics education: “that which takes language itself as the object of study and that which uses language as a vehicle for studying other phenomena” (p. 198). Our study takes the second approach, examining classroom language in order to understand the opportunities for mathematical thinking that are available to students. As Nasir et al. (2008) observe, language is “the primary source of our interaction with and reflection on the world” (p. 198), and how talk is structured in the math classroom shapes what is learned and by whom, and how students position themselves as doers and learners of mathematics. Planas et al. (2018) go further, equating mathematical activity with engagement in a particular form of discourse; therefore, understanding mathematical activity necessitates “studying that discourse and understanding its characteristics” (p. 199). Like Nasir et al., Planas et al. note the importance of studying “the use or function of language as well as its form,” in order to examine mathematical processes undertaken by students and teachers (p. 200).

Learning does not come automatically from talking. Students must go beyond providing answers to mathematical problems, to describing why they used certain strategies and approaches when solving, and they must use precise mathematical language so that the teacher and classmates can understand their thinking and expand on it (Nathan and Knuth 2003; Sfard and Kieran 2001). According to Smith and Stein (2011), teacher actions that support students’ explanations include posing

good questions that can guide students’ attention to previously unnoticed features of a problem or they can loosen up their thinking so that they gain a new perspective on what is being asked. Good questions also force students to articulate their thinking so that it is understandable to another human being; this articulation, in and of itself, is often a catalyst to learning. (p. 62)

Teacher questions that solicit procedures, calculations, or known facts may be key to the discussion but may be insufficient to promote rich discourse. Boaler and Brodie (2004) highlight three important types of teacher questions: exploring mathematical meanings or relationships, probing students to explain their thinking, and soliciting contributions from members of the class (p. 776). Questions that extend student thinking, such as, “Can you say more about that?” or simply, “Why?”, or open-ended questions that ask students to compare, choose, or consider “What if…?”, require students to elaborate in ways that deepen the discussion and enhance opportunities for conceptual understanding.

Prior research, however, has shown that the vast majority of teachers’ questions require students only to recall facts, rules, procedures, or calculations (Bennett 2010; Franke et al. 2009; Gall 1984; Kawanaka and Stigler 1999; Schleppenbach et al. 2007). In an early study of teachers’ questions, Stevens (1912) found that, in high-school classes of various subjects, two-thirds of the teachers’ questions solicited the direct recall of textbook information. More than 70 years later, Gall (1984) reviewed studies of questioning practices from the intervening decades and concluded, “In a half-century there has been no essential change in the kinds of questions used…. About 60% of teachers’ questions require students to recall facts; about 20% require students to think; and the remaining 20% are procedural” (p. 713). We acknowledge that a balance of question types is important in the mathematics classroom and that requests to recall facts and procedures have their purpose. But an imbalanced situation, in which students are rarely asked high-level questions, can be detrimental to students’ learning of and views about mathematics.

In mathematics-education research, investigations of teachers’ questions in whole-class discussion have been situated within larger studies of mathematical discourse in classrooms. Definitions of question types vary across studies, but researchers commonly place mathematics-teachers’ questions on a spectrum that ranges from closed (i.e., soliciting a one-word answer—yes/no, true/false, or a calculation result), to procedural or factual recall (i.e., soliciting predictable responses that state established knowledge—a formula, definition, or process), to conceptual or open ended (i.e., soliciting unpredictable responses that require reasoning and explanation). This body of research (e.g., Bennett 2010; Franke et al. 2009; Kawanaka and Stigler 1999; Sahin and Kulm 2008; Schleppenbach et al. 2007) has revealed that mathematics teachers at all experience levels have difficulty asking open-ended or deeper-level questions during discussions; the majority of teachers’ questions are closed. Inoue and Buczynski (2011), studying prospective teachers in particular, observed that their subjects not only asked few open-ended questions, they often answered their own questions.

In their analysis of 231 videos from the Third International Mathematics and Science Study (TIMSS), Kawanaka and Stigler (1999) found the most frequently asked content-related teacher questions in eighth-grade mathematics classrooms in Germany, Japan, and the USA were those that “(a) request[ed] a relatively short response, such as vocabulary, numbers, formulas, single rules, prescribed solution methods, or an answer for computation; (b) request[ed] that a student read the response from a notebook or textbook; and (c) request[ed] that a student choose among alternatives” (p. 258). US teachers asked significantly more yes/no questions than did Japanese teachers, while German and Japanese teachers asked significantly more describe/explain questions than did US teachers (Kawanaka and Stigler 1999). In another international comparison, Schleppenbach et al. (2007) analyzed student–teacher interactions in 31 videos from fourth- and fifth-grade mathematics classrooms in the USA and China. Here, too, the questions asked by US teachers were primarily requests for computation, whereas the Chinese teachers’ questions tended to solicit procedures, reasoning, rules, or terms, and to check for understanding or agreement. These studies point to a consistent lack of open-ended teacher questions throughout American K–12 mathematics education and the need to develop questioning techniques in teacher education. These studies say less, however, about the reasons for this phenomenon, or how individual teachers’ questioning practices change over time.

A smaller body of research about prospective teachers reveals a similar underuse of open-ended questions, but, again, little is known about what influences their questioning-related decisions or how those decisions change during the course of their apprenticeship (Weiland et al. 2014). Blanton et al. (2001) examined one mathematics prospective teacher’s developing classroom-discourse practice. Although this prospective teacher genuinely wanted her students to participate, early in her student-teaching semester she relied on questioning strategies that required students primarily to compute simple answers, recall information, or describe procedures previously learned (p. 233). Later in the semester, however, she made “an underlying shift away from instructional questions…to questions that explored students’ solutions and strategies” (p. 237). Blanton et al. concluded that the act of self-reflecting on her practice with the research team contributed to this shift. In a 6-month case study of two prospective teachers, Bennett (2010) (who also served as their mentor) examined their use of questions in whole-class discussions in high-school mathematics classes. Bennett categorized the prospective teachers’ questions as follow up, probing for fact, and probing for understanding. His prospective teachers increasingly asked all three types of questions over the 6-month period, but the majority of their questions remained probing for fact. Bennett documented neither the responses of the students nor the prospective teachers’ reasons for their questioning decisions. Bennett inferred, however, similarly to Blanton et al., that the overall increase in questioning was due to his post-observation interviews with the prospective teachers, which explicitly drew their attention to their questioning practices.

Franke et al. (2009) have commented on the difficulty of analyzing teacher questions and stress the importance of including student responses in such investigations, because “one cannot strip what teachers say from the context in which it happens or from how students engage with classroom interaction” (p. 391). When Inoue and Buczynski (2011) investigated eight prospective teachers’ use of open-ended questions in elementary-level mathematics lessons, they examined the students’ responses. The eight prospective teachers attempted open-ended questions but failed to capitalize on their students’ responses because they were unable to anticipate and/or fully comprehend those responses. Some of these prospective teachers ignored unanticipated student responses and sometimes answered their own questions. Moyer and Milewicz (2002), similarly, studied 48 prospective teachers as they interviewed elementary students individually about a mathematical task. These prospective teachers often responded inappropriately, speeding up their questioning when the interviewee seemed bored, or being flummoxed when the interviewee gave an unexpected answer. In both of these studies, the prospective teachers were observed only once, offering no information about development over time.

Our study aligned its methodological approach with that taken by Simon and his colleagues (Simon and Tzur 1999; Simon et al. 2000). Addressing the need in the field of mathematics education for “new conceptual frameworks for thinking about teaching” (Simon and Tzur 1999, p. 258), Simon and colleagues investigated the development of mathematics-teachers’ practice in the context of promoting reform. Their conception of the term practice included “not only what teachers do but also what they think about what they do and their motivations for the actions they take” (Simon et al. 2000, p. 581). To create these multifaceted accounts of practice, Simon and colleagues collected data sets from each teacher that comprised observations of instruction and interviews that inquired, among other things, about the teachers’ rationales for their plans and instructional moves. Because Simon and his colleagues presumed that “every teacher’s approach is rational and coherent from his or her perspective” (Simon and Tzur 1999, p. 261), it was crucial for them to uncover the teachers’ rationales and incorporate those into the accounts.

Schoenfeld (1998) proposed a theoretical model of teacher decision making, which drew together a teacher’s knowledge, beliefs, goals, and plans—all of which were presumed to depend on the teacher’s perception of the teaching environment. Disentangling the components of the prospective teachers’ decisions about their practice was beyond the scope of our study. Following Simon and colleagues (Simon and Tzur 1999; Simon et al. 2000), as well as other researchers (e.g., Kohler et al. 2008; Rich and Hannafin 2008), we took the stance that teachers’ rationales for practice (also referred to in the literature as teachers’ thinking, decisions, or reasons) are valuable to study, as explanatory of teachers’ actions, even when the beliefs, knowledge, values, goals, emotions, and other factors that underlie their rationales are unknown.

Our study examined the kinds of questions that prospective teachers asked their students during whole-class discussions in high-school mathematics classes. We inquired about their rationales for the kinds of questions they asked, and we explored how their questions and rationales changed over time. Understanding the interplay among these factors should benefit TPPs by informing the ways prospective teachers are mentored in the field and prepared in courses. In particular, learning the trajectory of the prospective teachers’ questioning practices—toward improved questioning over time or toward the decline of TPP-taught techniques—would help teacher educators understand how and when professional-development interventions would be most effective.

Three-phase study

Overview

This study was exploratory in nature, with different phases of analyses surfacing iteratively (Agar 2006).

In Phase 1, video observations of prospective teachers’ lessons were analyzed to identify the kinds of questions the prospective teachers asked and how those changed over a 10-week period. The prospective teachers were also interviewed near the beginning and end of their field experience. The interview inquired about their teaching practices and views, using a protocol that was grounded in a teaching artifact: a state-required teaching performance assessment called the Performance Assessment for California Teachers (PACT) Teaching Event.

Driven by Phase 1 results showing declines in both the number of questions asked over time and the proportion of open-ended and deeper-level questions, Phase 2 involved analyzing the interviews to identify views and rationales expressed by the prospective teachers that related to their questioning practices and how those views changed over time.

These interviews revealed that the prospective teachers had revised downward their perception of the value of open-ended, deeper-level questions. Also in the interviews, the prospective teachers implicated the negative responses of their students to such questions as the reason for this changed view. Thus, in Phase 3, transcripts of the lesson videos were reanalyzed to characterize the students’ responses to the prospective teachers’ questions, to obtain a fuller picture of the phenomenon.

In the next sections, we describe the participants and setting. Then, we present the data collection and analysis methods and results by phase, to reflect the chronology and logic of the study process.

Setting and participants

The four study participants were enrolled in the same, yearlong (four-quarter) TPP in Southern California that led to a master’s degree (M.Ed.) and a mathematics-teaching credential. The study took place in the fourth (Spring) quarter, during the final, 16-week student-teaching experience. (The participants had also had a 6-week field experience in the third quarter, in which they mainly observed teaching.) For this final field experience, the prospective teachers were present at their school sites all day, planning and teaching a full load of classes without interruption from their cooperating teachers.

Prior to the final quarter, the participants had completed most of their coursework, which included an ongoing, math-specific, student-teaching support seminar and a math-teaching methods course. The methods and seminar courses, as well as other program courses, stressed the importance of and strategies for asking thought-eliciting questions. The seminar in particular required the prospective teachers to incorporate such questions into their lesson plans for student teaching, and the seminar instructor coached individual prospective teachers in this effort. Seminar continued into the final quarter; the few other courses the participants took in the final quarter were not math-specific. Also during the final quarter, the prospective teachers met with a university field supervisor four to eight times to discuss the supervisor’s classroom observations, review lesson plans, reflect, and set goals.

Out of the nine prospective teachers in the program cohort, four volunteered to participate in this study: Anthony, Carl, Linda, and Michelle (gender-preserving pseudonyms). Prior to the study, the first author had served as a field supervisor in this TPP but had not supervised any of the four participants. During the study, the four prospective teachers were placed at three different urban high schools with similar demographics.

Data collection, analysis, and results by phase

Phase 1 data collection and analysis

The first author videoed the prospective teachers’ lessons in their entirety (50–90 min each), once a week, eight times during the 10-week period, for a total of 32 video records. Observations began in the fifth week of the 16-week assignment to give the prospective teachers time to collect video consent forms and become familiar enough with the curriculum and pacing to appropriately select dates for video observations. Video observations ended 2 weeks prior to the end of the school semester on the recommendation of the prospective teachers, who reported that their classes would be reviewing for final exams then and not having many whole-class discussions. During the 10-week observation period, each school had a week of school-wide testing and a week of vacation. Observations were not conducted during these weeks, resulting in eight observed lessons for each prospective teacher.

Observation lessons were selected collaboratively. The first author selected a consistent class period for each prospective teacher that would allow her to observe multiple prospective teachers in a single day (i.e., the selected periods did not overlap in time). These periods were associated with different courses across the prospective teachers: 9th-grade Geometry (Anthony), mixed-grade Algebra 1B (Carl), 9th-grade Algebra 1 (Linda), and mixed-grade Algebra 2 (Michelle). Then, each week, each prospective teacher selected the observation day, based on these instructions from the first author: “I am investigating your communication with the whole class. I want to video your whole-class lesson, but I need it to be a class when there is no testing and a lot of interaction between you and the whole class, for instance, a class when you are introducing a topic, or have a reform-oriented activity or investigation planned.” Often, the prospective teachers made last-minute changes to a requested date based on changes in their plans or school schedule; because they chose dates to satisfy this request for interactive lessons, it is reasonable to expect that the observed lesson was usually the one with the most whole-class discussion for the week.

For video analysis, episodes were considered whole-class discussions when teacher and students were discussing a mathematical concept or problem as an entire class. Whole-class discussions occurred at different points in the lessons, during homework review, classwork review, and instruction. The portion of time in each lesson that was spent in whole-class discussion ranged from 60 to 79%, providing approximately 28 h of video of whole-class discussions across the four participants.

All dialogue during these 28 h was transcribed. All mathematics-related questions asked by the prospective teachers were then extracted from the transcripts. Utterances that did not technically take the form of a question but that seemed intended to function as one or that prompted a student response (e.g., “A negative multiplied by a negative is—”) were included for analysis. Questions or utterances that did not seem intended to elicit a verbal, mathematical response from students (e.g., “OK?,” “Ya?,” and “See?”) were excluded.

The identified questions were then coded iteratively without a predetermined list of question types. This allowed a broad range of types to emerge and kept all questions in view. Codes reflected the anticipated response from students—the type(s) of statements the question would be expected to generate. Actual student responses were not taken into account when characterizing questions, because we were interested in the prospective teacher’s intent in crafting the question—the sort of response the prospective teacher was aiming for, which was independent of how the students actually responded. For example, if a prospective teacher asked, “Is this a right triangle?” the expected response would be either “Yes” or “No,” so the question was coded as a “Yes/No” question (even if a student actually answered something else, such as, “I don’t know”). Or, if a prospective teacher asked a student “Why?” the expected response would be some sort of explanation; this type of question was coded as an “Explain Thinking” question (even if no student actually did explain her thinking in response).

Twenty different types of questions ultimately emerged from the analysis. Those 20 types fell into two larger categories: Half of the question types anticipated a response from students that would be predictable by the prospective teacher. These sought a statement by the student of established knowledge, such as repeating a known formula, reading from available text, providing a vocabulary word or definition, or responding with a single word, such as yes/no, true/false, or a calculation (see Table 1 for examples). The other half of the question types anticipated an unpredictable response from students—unknown responses involving students’ explanations of their thinking (see Table 2). These two categories reflect categories previously identified in the literature. For example, predictable questions (PQs) would seem to subsume Gall’s (1984) categories of “recall facts” and “procedural,” as well as types of questions found in the TIMSS (Kawanaka and Stigler 1999) that requested short responses, reading from a text, or choosing among stated alternatives (including yes/no). Unpredictable questions (UQs), on the other hand, echo Gall’s category of questions that “require students to think,” and the TIMSS “describe/explain” requests, both of which are considered “deeper level” or “open ended,” as well as the “authentic” questions described by Kelly et al. (2018), and the “inquiring questions” identified by Alrø and Skovsmose (2004). For most of the analyses in this study, these two broader categories—predictable and unpredictable—were used. (See Table 3 for the question types in each category.)

Table 1 Examples of questions that anticipated predictable responses
Table 2 Examples of questions that anticipated unpredictable responses
Table 3 Question codes

A coding challenge was that the same question could have different purposes, depending on context—in particular, how the student was expected to arrive at the answer. For example, if a prospective teacher asked a student to state the length of a segment, we needed to determine whether the answer was already visible (i.e., written on the board), in which case this question would be coded as P1. Alternatively, the segment length might need to be calculated or estimated, in which case various other codes could apply, requiring further inferences about the availability of information that the student would need for the calculation or estimation and whether the student could follow an already-taught procedure or needed to invent a strategy. To increase the reliability of this coding, the first author worked with her faculty advisor, who regularly reviewed sample questions and discussed their codes with her, especially for questions that required more inference to code. Once the 20 codes had been established, the full set of questions was coded and the frequency of each type of question was recorded.
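To make the tallying step concrete, the sketch below shows one way coded questions could be aggregated into per-type and per-lesson counts. It is a minimal illustration in Python, not the study’s actual analysis; the records, and any code labels beyond those named in the text (P1–P3, P8, U7), are hypothetical.

```python
from collections import Counter

# Hypothetical records of coded questions: (prospective_teacher, lesson, code).
# Codes follow the Table 3 convention (P* = predictable, U* = unpredictable);
# these example rows are illustrative, not the study data.
coded_questions = [
    ("Anthony", 2, "U7"),  # eliciting in-the-moment thinking
    ("Anthony", 2, "P2"),  # soliciting a calculation result
    ("Linda",   5, "P8"),  # soliciting an agree/disagree choice
    ("Carl",    3, "P3"),  # verbalizing the next step in a procedure
    # ... one record per extracted question (989 in the study)
]

def category(code: str) -> str:
    """Collapse the 20 fine-grained codes into the two broader categories."""
    return "PQ" if code.startswith("P") else "UQ"

# Frequency of each fine-grained question type (the counts recorded in Phase 1).
type_counts = Counter(code for _, _, code in coded_questions)
print(type_counts)

# Per-lesson counts of predictable vs. unpredictable questions (cf. Fig. 1).
per_lesson = Counter((lesson, category(code)) for _, lesson, code in coded_questions)
for (lesson, cat), n in sorted(per_lesson.items()):
    print(f"Lesson {lesson}: {cat} = {n}")
```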

Phase 1 results: the prospective teachers’ questions

As a group, the prospective teachers asked more than twice as many predictable questions as unpredictable ones over the study period. Table 4 shows the numbers and types of questions asked by the group of prospective teachers over their eight lessons. Out of 989 total questions asked, 302 (31%) were unpredictable. Within the unpredictable category, the prospective teachers asked more “How?” than “Why?” questions. In the predictable category, the prospective teachers asked more questions that solicited the calculations, steps, or choices involved in solving a problem than questions that prompted students to recall or restate formulas, vocabulary, given information, or previous steps or problems. Overall, four types of questions prevailed: soliciting a calculation result (P2), soliciting an agree/disagree choice (P8), prompting students to verbalize the next step in a procedure (P3), and eliciting in-the-moment thinking (U7).

Table 4 Total predictable and unpredictable questions over eight lessons

The kinds of questions the prospective teachers asked changed over time. The prospective teachers asked a greater variety of question types in the earlier lessons. In Lessons 2 and 3 (about 6 or 7 weeks into their assignments), the prospective teachers, as a group, asked every type of question that was ever observed during the 10-week period—variety that was not seen in later lessons. Nearly half of all unpredictable questions (UQs) that were asked over the 10-week period were asked during Lessons 2 (30%) and 3 (17%). In Lesson 1, a large majority of the prospective teachers’ questions were predictable (PQs). The number of UQs rose considerably in Lesson 2, but in the weeks to follow decreased again. Most tellingly, the prospective teachers, as a group, asked ten times as many UQs in Lesson 2 (91 UQs) as they did in Lesson 8 (9 UQs), the final observation. Figure 1 compares the numbers of PQs and UQs that the prospective teachers, as a group, asked during each lesson.

Fig. 1
figure 1

Number of predictable and unpredictable questions asked per lesson

It is important to note that the overall number of teacher questions also decreased over the 10-week period, with decreases in both categories. Apparently, some phenomenon other than a gradual avoidance of UQs was also at play. We have no hypothesis about what caused this overall decrease, but we believe we can rule out three explanations. One is that course content changed over the 10-week period to become less conducive to UQs. This is unlikely to explain the overall decrease in questions because the decrease was seen with every teacher, despite their differing courses, and because the content shifted multiple times within each curriculum during the eight lessons. While it is possible that in one prospective teacher’s class, the topics in the earlier weeks were more conducive to question asking, it is unlikely that this would be true of four different curricula (no two prospective teachers taught the same topic in any of the eight lessons). Another possible explanation for reduced questioning is that the amount of time spent in whole-class discussion also decreased, limiting the opportunity to observe questions in later weeks. This explanation can be ruled out because there was no evidence of a decrease in whole-class discussion time; this remained close to 77% of class time throughout the study period. A third explanation is that the need to prepare for standardized testing suppressed teacher questions. Again, we rule out this explanation because of timing. All four classrooms experienced statewide testing during the study period, but testing came immediately after Lesson 4 or 5 for every prospective teacher. Thus, any test preparation would have been more intense in the first three to four lessons (when more teacher questions were documented) than in later lessons. While not an explanation, it is noteworthy that the steep drop in total questions in Lessons 7 and 8 was driven by two prospective teachers, Anthony and Linda, who asked no questions during these lessons.

The unexplained overall decrease in questions notwithstanding, it remains the case that the proportion of UQs also decreased. Figure 1 shows that the percent of unpredictable questions ranged from a high of 49% in Lesson 2 to a low of 18% in Lesson 8. The number of UQs decreased from 221 asked during Lessons 1–4 to 81 asked during Lessons 5–8—a decrease of 63%, while the number of PQs decreased from 428 asked during Lessons 1–4 to 259 asked during Lessons 5–8—a decrease of only 39%.
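Stated as simple arithmetic, the two rates of decline reported above follow directly from the lesson-block totals (a worked restatement of the percentages already given, not new data):

```latex
% Decline from Lessons 1--4 to Lessons 5--8
\frac{221 - 81}{221} \approx 0.63 \quad \text{(unpredictable questions)}
\qquad
\frac{428 - 259}{428} \approx 0.39 \quad \text{(predictable questions)}
```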

The patterns identified for the group applied to the individuals, too. Figure 2 shows that, despite varying total numbers of questions asked by the prospective teachers, all but Michelle asked declining proportions of UQs after a spike in Lesson 2 or 3. (Michelle’s proportion of UQs was slightly higher in Lesson 7 [28%] than in Lesson 3 [26%], but these represent small numbers of UQs: Michelle asked only 9 UQs in Lesson 7, compared to Anthony, Linda, and Carl’s highs of 47, 17, and 20 UQs, respectively, in Lesson 2 or 3.)

Fig. 2
figure 2

Percents of unpredictable questions asked by each prospective teacher

For the reasons just discussed, this decline in UQs seems unlikely to be explained by curricular or topic changes over time, by the amount of whole-class discussion, or by standardized testing. Further, the nature and structure of the lessons did not observably change for any of the prospective teachers but followed a similar pattern across prospective teachers and weeks: a homework review was followed by a warm-up problem; the prospective teacher then provided direct instruction with solved examples while students took notes; and finally students worked guided-practice exercises (sometimes with manipulatives) individually or in small groups. In short, neither the curriculum nor the lesson structure accounted for the decline in UQs.

Phase 2 data collection and analysis

The first author interviewed the prospective teachers individually, twice. The first interview occurred during the first week of the study, about 4 weeks into the student-teaching assignment. The second interview occurred near the end of the study period. The two interviews used different protocols, but both were designed to elicit the prospective teachers’ views on teaching and about questioning in particular, to provide insight into why the prospective teachers asked the kinds of questions they did and, if their questioning had changed over time, why. While these interviews were conducted prior to Phase 1 data analysis, they are described here as part of Phase 2 because their analysis was done in light of the Phase 1 results.

The first interview was grounded in the prospective teachers’ written responses on the Performance Assessment for California Teachers (PACT) Teaching Event (www.pacttpa.org)—a high-stakes, state-mandated, culminating assessment of teaching (and precursor to the edTPA). This interview focused on the prospective teachers’ narrative responses to three prompts from the PACT Instructional Task (one of five tasks):

  (a) Describe any routines or working structures of the class (e.g., group work roles, class discussion norms) that were operating in the learning task(s).

  (b) In the instruction, how do you further the students’ knowledge and skills and engage them intellectually in understanding mathematical concepts, procedures, and reasoning?

  (c) Describe the strategies you used to monitor student learning during the learning task.

The PACT designers’ presumed purpose for these prompts matched our interest: These prompts were expected to reveal prospective teachers’ views on methods for engaging students intellectually, including how whole-class discussions should be orchestrated and the uses of questioning. Indeed, of the many responses required on the PACT Teaching Event, it was in their responses to these three prompts that all four prospective teachers had described or implied their views on teaching and learning, questioning practices, or whole-class discussion. (No PACT prompts ask about questioning specifically, but we considered the generality of these three prompts a methodological advantage. Interviewing the prospective teachers explicitly and only about questioning would have revealed the focus of the classroom observations and perhaps artificially elevated the number of UQs the participants would naturally have asked, as Blanton et al. (2001) and Bennett (2010) suspected had occurred in their studies.) Because the prospective teachers had submitted their PACT Teaching Events just before the first interview, their responses were assumed to reflect their views at the start of the study.

Prior to the first interview, the first author had reviewed these PACT responses and highlighted statements that appeared to reflect personal views on teaching and learning mathematics and on the types of questions the prospective teachers felt they should ask their students. These included statements in which words or phrases such as “feel,” “think,” “believe,” and “have come to realize” were used in reference to teaching or learning mathematics; expressions of opinions about students’ ability to learn mathematics; and claims about the value of certain kinds of questions or prompts for students. The main purpose of this interview was to verify the first author’s perception that each highlighted statement indeed reflected the prospective teacher’s view and to clarify that view. During the interview, the first author explained, “These are the statements I found that I thought represented your views on teaching.” Then, for each highlighted statement, she asked, “Could you elaborate on what you meant by—” and read the statement. These first interviews lasted approximately 20 min each.

After the 10-week observation period, the first author again interviewed each prospective teacher individually for about an hour. Prior to these second interviews, the prospective teachers were reminded of the topics of each of their videoed lessons and asked to select a lesson that had significantly impacted their teaching approach for future lessons. In the first half of the interview, the interviewer reviewed the prospective teacher’s now 10-week-old responses from the PACT Teaching Event and, for each, asked, “Do you still feel this way?” and “Why?” or “Why not?” Every prospective teacher reported having changed some of their earlier views. When a prospective teacher reported a changed view, the interviewer followed up with questions such as, “How has this change in your feelings affected your teaching?” or “What other types of challenges have you faced?” as appropriate.

For the second half of the interview, the prospective teachers were invited to (re)view the video of the lesson they had selected and asked what it was about the lesson that had impacted them or taught them something about teaching. All four prospective teachers took the opportunity to review the video. They were given the freedom to maneuver through the video, stop and discuss segments, or voice a question to themselves about what had occurred. The first author had prepared six questions:

  • Why did you choose this lesson?

  • What was your rationale or goal when you were planning this lesson?

  • Did you change your lesson plan while you were actually teaching?

  • Describe anything you would do differently the next time you teach this lesson.

  • Describe any challenges you faced implementing this lesson.

  • Is there anything else you want to add?

Every prospective teacher was asked the first and last question; the remaining questions were asked, as time permitted, if the prospective teacher had not already answered them spontaneously. After the prospective teachers had finished talking about the video, they were asked one final question: “Did you happen to notice or do you have any comments about the questions you asked the students?”

Each interview was audio recorded and transcribed. Statements that were determined to express views or rationales related to questioning, other teaching strategies, or student learning were extracted. Extracted statements from the initial and final interview were tabulated for comparison, to illuminate similarities and differences between each prospective teacher’s views at the beginning and end of the study period.

Phase 2 results: the prospective teachers’ views and rationales

During the first interview, all four prospective teachers noted that when they asked questions that required their students to explain their thinking (UQs), their students often struggled or seemed frustrated with the question. The prospective teachers explained that they had discovered they needed to “scaffold more” (as Linda put it) than they had originally expected. They repeatedly expressed concern about their students feeling “confused” or “scared.” Linda observed, “When I ask students ‘Why?’ they freak out and think they’re wrong.” The prospective teachers felt compelled to put their students at ease by asking better or more carefully phrased questions. As Carl noted, “I noticed that my students were afraid of trying to answer because they didn’t want to look foolish. I just need to be more aware of what I ask.”

Putting students at ease did not, during the first interview, seem to mean backing away from UQs; at this point, the prospective teachers still trusted the efficacy of UQs and requiring their students to explain their thinking. As Anthony said,

I don’t believe in directly giving my students the answer to my question if they don’t know it. Instead, I prefer to guide them and give them hints or questions to the extent where they are able to discover the answer on their own.

Michelle, similarly, reflected, “During presentations, I need to refrain from the urge to explain concepts for students and ask better questions so that [the students] explain.”

The second interview revealed a noticeable shift in all four prospective teachers’ views on what types of questions they should ask their students. Now, they expressed a need to avoid questions that challenged students or asked them to explain their thinking, because such questions “frustrated students” who had low math skills and caused whole-class discussions to go “awry.” All four prospective teachers complained that many of their students did not like math, which made it difficult to engage them in discussion or even get them to answer a question. The prospective teachers wanted to find ways to reduce student frustration, “ease students’ fears,” and “boost their confidence”—sentiments similar to ones expressed in the first interview. But now the prospective teachers felt that the way to accomplish this was to offer students “tricks” and “shortcuts” rather than “confusing them with questions.” Michelle explained, “I realize now that my students benefit more from direct instruction. They need to watch a step-by-step example, then follow it by doing a similar problem independently before they encounter abstract concepts.” Anthony observed, “When you throw a hard question at them, they don’t understand, and it throws them off. I noticed a lot of them wanted that step-by-step of what to do.” Carl came to a similar conclusion: “Once I ask, they are stuck and need a little bit of help so, I had to make instruction very, um, very explicit.” Linda was no different: “I asked them and they couldn’t understand. I showed them the shortcut the next day.”

Thus, a causal mechanism emerged to explain the decline in UQs—a mechanism that the prospective teachers themselves partly explicated: In the early weeks of student teaching, the prospective teachers asked UQs because they had viewed UQs as pedagogically important (presumably having developed this view in their TPP coursework); the questions were met with student resistance, frustration, and inability to answer; the prospective teachers gradually came to interpret this reaction as an indication that UQs were pedagogically ineffective for their students; and the prospective teachers abandoned UQs in favor of more-direct instruction, PQs, and the provision of procedural steps and shortcuts. We cannot prove this causal mechanism with the methods of this study, but the evidence from Phase 2 shows that the prospective teachers’ views on the value of UQs indeed changed, while the evidence from Phase 1 confirms that the prospective teachers abandoned UQs as the weeks wore on. For a fuller picture of this phenomenon, we returned to the videos to examine the students’ responses, in Phase 3.

Phase 3 data collection and analysis

The Phase 2 results indicated that the prospective teachers had moved away from UQs because they had perceived negative reactions from their students. We returned to the videos and transcripts to systematically examine the students’ responses, hoping to illuminate the nature of the students’ responses, negative or otherwise.

The first author identified in the transcripts the instances of students’ responses to each of the prospective teachers’ 989 questions. Then, she viewed the video of those responses to get a better sense of the students’ affect when responding (e.g., through voice tone, facial expression, and body language) than the transcript alone could provide. In an iterative process, she described the nature of each response that she initially considered negative, then developed a set of (non-exclusive) types of responses, until she reached a set that exhausted her descriptions. Ultimately, this set was:

  • Student is defensive about having been asked the question and expresses this defensiveness instead of answering.

  • Student answers the question with a non-mathematical (invalid) response.

  • Student pronounces the question unimportant or irrelevant.

  • Student may attempt to respond but displays amplified stress or worry.

  • Student expresses caring only about the solution to the task and not further discussion.

  • Student gives a numerical solution to the task rather than the requested explanation.

Responses coded as nonnegative, in contrast, comprised appropriate answers to the question, sincere attempts to answer, and reasons for not answering (such as “I don’t know”); in all cases, there was no detectable negative valence. (Some of the 989 questions were also coded as receiving no response.) Table 5 shows excerpts from the data to illustrate how the questions and responses were coded.

Table 5 Examples of coded responses

Once all responses were coded, the number and proportion of negative and nonnegative responses were compared between PQs and UQs and across time.
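As a rough sketch of how that comparison could be tabulated, one might cross-tabulate each question’s broad category with the coded valence of its response, as below. This is an illustrative Python fragment under hypothetical records, not the study’s actual analysis.

```python
from collections import Counter

# Hypothetical (question_category, response_code) pairs, one per question asked;
# in the study, all 989 questions were coded this way. Values are illustrative.
responses = [
    ("UQ", "negative"),
    ("UQ", "nonnegative"),
    ("PQ", "nonnegative"),
    ("PQ", "no_response"),
    # ...
]

counts = Counter(responses)

# Negative-response rate per question category, with all questions asked
# in that category as the denominator (matching the 38/302 = 13% figure).
for cat in ("PQ", "UQ"):
    total = sum(n for (c, _), n in counts.items() if c == cat)
    negative = counts.get((cat, "negative"), 0)
    rate = negative / total if total else 0.0
    print(f"{cat}: {negative}/{total} questions drew a negative response ({rate:.0%})")
```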

Phase 3 results: student responses

For the full set of 989 questions asked over the 10-week period, UQs elicited a greater number of negative responses from students than did PQs: 38 (13%) of the 302 UQs were met with negative responses versus only one of the 687 PQs. Table 6 shows the distribution of the UQs and negative responses for each prospective teacher.

Table 6 Distribution of negative student responses to unpredictable questions

Table 6 gives additional perspective on the prospective teachers’ perceptions. It is true that, across all lessons, UQs received far more negative responses than did PQs (38 vs. 1), but the number of negative responses received by any prospective teacher in a single lesson was small. Only three lessons had more than two negative responses: Anthony and Linda in Lesson 2 and Linda in Lesson 5 (boldfaced in Table 6). Further, in these three lessons, the prospective teacher had asked a large number of UQs, so the three or four negative responses received represented small percents of the responses given to UQs (6, 20, and 21%, respectively). Lessons that show high percents of negative responses (Linda, Lesson 3, 40%; Michelle, Lesson 5, 100%; Michelle, Lesson 6, 50%) are less indicative, as small numbers of UQs were asked in these lessons, eliciting only one or two negative responses. Student responses coded as nonnegative, however, were not necessarily positive. Most were unemotional responses with short answers or claims not to know the answer. Few UQs were answered with extended explanations, arguments, or descriptions of students’ thought processes—the kinds of responses UQs are intended to elicit. In the next section, we discuss possible explanations for the prospective teachers’ perceptions and changed practice.

Discussion

Over the 10-week period of this study, the majority of questions that the prospective teachers asked during whole-class discussions required predictable responses (PQs). Yet, the prospective teachers also made attempts in the earlier weeks to ask unpredictable questions (UQs) that aimed to elicit their students’ thinking. All four prospective teachers initially expressed the view—consistent with what their TPP promoted—that asking thought-eliciting questions was important. Over the 10 weeks, however, the prospective teachers’ questioning practices evolved away from what they had learned in their TPP courses and had initially valued. After 10 weeks, the practice of asking UQs had practically been extinguished in their classrooms. Interview data indicated that the prospective teachers had made conscious decisions to avoid UQs because of their students’ responses.

We infer that the prospective teachers’ decisions were at least in part affectively mediated, affording relief from the discomfort of what the prospective teachers perceived as occasions of strong student resistance and negativity. This inference is consistent with findings by Guskey (1986), Lloyd (2008), and Bohl and Van Zoest (2002) that show the impact of student responses on teachers’ and prospective teachers’ decisions.

Interestingly, the analysis of student responses showed that the vast majority of the prospective teachers’ UQs had not elicited negative responses. Anthony, for example, asked 107 UQs in his first four lessons and only received seven negative responses. Rather than suggest that the teachers had overestimated their students’ resistance, however, we can understand their perception. First, although two or three may seem like small numbers on a data table, two or three negative outbursts in a single lesson likely feel significant to a prospective teacher. Table 6 gives no indication of the intensity of each negative response, but the examples in Table 5 capture some of the emotional “heat” that was apparent in the videos. Second, because most of the students’ responses to UQs, while nonnegative, were not particularly positive, the prospective teachers received little positive reinforcement for asking UQs, which likely brought the negative responses into sharper relief while suggesting that UQs “didn’t work” as intended for their students. Considering these points helps us appreciate why an apparently small number of negative responses may have caused considerable distress for the prospective teachers.

Limitations

The ability to generalize from this study is limited because it involved only four prospective teachers in one TPP. Another potential limitation is the pattern of observations: one lesson per teacher per week. While this pattern afforded a view of the prospective teachers’ questioning over time, it obscured possible developments in the course of the classroom discussions of a single topic over consecutive lessons.

We acknowledge that our window on these classrooms may lack ecological validity due to the presence of the researcher and the prospective teachers’ selection of the video days. The prospective teachers may have tried to teach in ways they thought would please a researcher associated with their TPP, and they may have chosen their video days to present the “best” picture of their teaching (in fact, they were essentially asked to do this). If anything, this would seem to skew our results conservatively, overestimating the prospective teachers’ natural rate of UQs. Possible effects of the presence of the researcher and camera on the students are harder to predict. Students might work harder or less hard to please their prospective teacher in front of an audience, depending on their rapport with their prospective teacher. Finally, the prospective teachers may have been influenced by the culture of their schools or departments, by their cooperating teachers, or by each other, in ways we could not detect. We do know, however, that the prospective teachers’ university-course schedule left them little time to observe or collaborate with other teachers at their schools or to attend staff meetings.

We did not attempt to document the prospective teacher’s purpose in asking each question. The isolated exchanges we studied convey little of the context surrounding each teacher question, and we recognize that one question can have different intents or effects on students depending on context. It may not always have been the case that a prospective teacher’s PQ was aimed at helping students accomplish the task at hand or reducing their frustration; some PQs might have been thoughtfully designed to promote the development of students’ thinking. Still, in the aggregate (of nearly 1000 questions), UQs give students more opportunity for thinking and expression than do PQs, and an overall decline in UQs over time can be expected to change the cognitive demand on students. Our method is consistent with other studies in the literature we reviewed that identify types of questions in isolation from their context and report their effects on students.

Nor did we explicitly ask the prospective teachers why they abandoned UQs over time; thus, our inferences about what motivated the overall decline in UQs may have missed other factors. The prospective teachers may have been reacting to their students’ inability to answer UQs and concluding that UQs were not leading to desired learning outcomes. The prospective teachers may have felt increasing pressure, as the weeks progressed, to accelerate the pace of content coverage, which they may have believed UQs were impeding. Further research should examine whether these or other factors inhibit high-level questioning from prospective teachers (and teachers). Our interview data, however, suggest that these prospective teachers were reacting primarily to their students’ responses. Moreover, they were not simply acting to escape the discomfort of student resistance. The prospective teachers also gave pedagogical rationales for reducing their use of UQs. They had come to feel that UQs were neither instructionally effective nor appropriate for their students. And when the prospective teachers did explain their intent for the use of PQs, it was never to deepen understanding.

Implications

This study illustrates a process by which well-intentioned prospective teachers, in trying to support their students, were led to make decisions that ran counter to what they had been taught in their TPP and which could be detrimental to their students’ learning. This study only examined questioning, but we suspect that the kinds of questions teachers ask offer insight into their overall teaching approach. Thus, our findings raise important questions for TPPs and for future research.

Many mathematics TPPs currently aim to instill the key understandings and skills that ground the practice of asking high-level questions: that such questions offer opportunities for deeper understanding, that student mathematical talk promotes learning, and how to facilitate discussions effectively. This study suggests that these understandings, while necessary, are insufficient to ensure uptake of the practice. Teacher educators must better understand how field experiences shape prospective teachers’ views, decisions, and actions. Anticipating field-based influences—in particular the resistance of students—teacher educators should use university coursework and field supervision to “inoculate” prospective teachers against pressures that could drive them away from TPP-promoted practices. Just making prospective teachers aware, before they begin fieldwork, of the ways that their students are likely to respond to unfamiliar practices would reduce the surprise and discomfort that arise when students respond less than positively. This pre-awareness would also prevent prospective teachers from automatically equating student resistance with faulty teaching.

Teacher educators can also help prospective teachers acquire the skills needed to prepare their students emotionally and intellectually for high-level discussions and productive struggle. Prospective teachers could read case studies (e.g., Chazan 2000; Lampert 2001; Lampert and Blunk 1998; Staples 2007) of exemplary mathematics teachers who transformed the culture of their classrooms to support high-level discourse. Such case studies illuminate that this transformation is gradual and requires deliberate teacher moves to shape classroom norms and build students’ discursive abilities. In TPP courses, prospective teachers could be asked to read transcripts or view video of classroom discussions that illustrate a range of ways—positive and negative—in which students might respond to high-level demands; then prospective teachers could be guided in collaborative conversations about how teachers should respond in turn. Once in the field, prospective teachers could be guided in this same exercise but with transcripts or videos of their own class. Research articles, such as one by Hufferd-Ackles et al. (2004), offer rubrics or criteria for analyzing various dimensions of student and teacher behavior at different “levels” of discourse. Such rubrics emphasize the gradualness of transforming classroom discourse but also provide images of partial transformation and give prospective teachers more reasonable and concrete benchmarks. Prospective teachers could also interview students about their views on their roles in learning mathematics and in classroom discussions in particular, to better understand the beliefs, expectations, fears, and hopes that underlie the students’ reactions.

Future research should study the effectiveness of the various recommendations suggested here in helping prospective teachers continue to use TPP-promoted questioning practices, even when students do not respond as desired, as well as search for other strategies. It is also important to further examine teaching trajectories, following teachers over time to understand how their practice evolves as they gain experience and what factors influence their views and actions at different points in their career. This study only covered a 10-week span. Although 10 weeks may represent a major portion of the student-teaching experience, and many prospective teachers develop considerably in that brief period, 10 weeks is a tiny fraction of a teacher’s overall career (we hope). Better understanding the career-long trajectory of the relationship among a teacher’s environmental influences, her views, and her practices would allow professional development to be tailored to teachers’ experience level.