Introduction

According to the U.S. Distance Learning Association (cited in Bernard et al. 2009), interaction is an essential component of the distance learning experience: “distance education refers specifically to learning activities within a K-12, higher education, or professional continuing education environment, where interaction is an integral component” (p. 1246). Although online interactions can occur in a number of different ways, the most common fall into one of three types: student-instructor, student–student, or student-content (Moore 1989). Ultimately, the pedagogical goal of all three types is to increase students’ understanding of the course content (Thurmond and Wambach 2004). Based on the results of a recent meta-analysis (Bernard et al. 2009), that goal is, indeed, being met. That is, after examining the achievement effects of 74 studies in which at least one interaction treatment was implemented, Bernard and his colleagues concluded that embedding interaction within distance education courses had a positive impact on student learning. Furthermore, student–student and student-content interactions had significantly higher effect sizes than student-instructor interactions.

Moore (1989) defined student-content interactions as those that result in “changes in the learner’s understanding, the learner’s perspective, or the cognitive structures of the learner’s mind” (p. 2). Although student-content interactions typically occur when students complete course readings, engage with multimedia materials (e.g., simulations, software), or finish course assignments, participation in course-related online discussions can also facilitate student-content interactions. While both forms of student-content interaction have the potential to promote learning, Cunningham (1992) claimed that listening or reading, by themselves, cannot challenge learners’ egocentric thinking sufficiently to generate new learning. As noted by Pea (1993), knowledge construction is a social, dialogic process. In online learning environments, this type of meaning making is accomplished, primarily, through the use of asynchronous discussions, designed to engage students in the processes of articulating, reflecting on, and negotiating their understandings of course content (Jonassen et al. 1995). As such, the student–student interactions that occur via asynchronous discussions offer a meaningful way to facilitate student-content interactions.

Currently, asynchronous discussions are considered the cornerstone of online courses (De Wever et al. 2006). According to Haavind (2006), online discussions enable students to explore multiple perspectives, negotiate content meaning, and identify their own knowledge gaps. Used in both wholly online and hybrid courses, asynchronous discussions can replace or extend in-class dialogue, providing opportunities for students to interact with each other over course-related topics. In reviewing the importance of interaction to students’ learning in online environments, Oncu and Cakir (2011) reiterated Zhu’s conclusion: “instruction is most effective when it is in the form of discussions or dialogues” (p. 1099).

However, measuring student-content interaction through participation in online discussion forums poses a number of challenges. For example, which types of interactions should be counted? Clearly, not every student post is meaningful or relevant to course content (Ertmer and Stepich 2004; Ertmer et al. 2007). In general, researchers (De Wever et al. 2006; Meyer 2004) have moved away from quantitative measures of interaction (e.g., number of posts) to more qualitative measures (e.g., quality of posts), typically defined in terms of critical thinking (Ertmer and Stepich 2004; Lee 2008; Walker 2004; Yang 2002).

Critical thinking, according to Halpern (2003), is “…thinking that is purposeful, reasoned, and goal-directed—the kind of thinking involved in solving problems, formulating inferences, calculating likelihoods, and making decisions” (p. 6). While critical thinking is not identical to higher-order thinking, many authors use these terms synonymously. Generally speaking, both higher-order and critical thinking are described as involving those cognitive processes that are at the higher levels of Bloom’s taxonomy (1956) such as analysis, synthesis, and evaluation (Pear et al. 2001; Szabo and Schwartz 2008).

Although online discussions have the potential to engage learners in meaningful discourse and to promote critical thinking related to course content, simply giving students the opportunity to discuss course content does not automatically lead to higher levels of thinking (McLoughlin and Mynard 2009). Based on the results of their research, Garrison, Anderson, and Archer (2001) noted that over 80% of students’ discussion posts reflected lower levels of thinking. Similarly, Gilbert and Dabbagh (2005) reported that approximately 75–80% of their students’ online postings were at the lower levels of Bloom’s taxonomy (e.g., knowledge, comprehension, application).

There are many reasons why students’ online postings may reflect relatively low levels of critical thinking, with a key reason being the structure of the discussions, in general, and of the question prompts, more specifically (Bradley et al. 2008; Yang 2002). Both Blanchette (2001) and Meyer (2004) observed that students’ responses reflected the level of the questions posed by the instructor. That is, if students were asked to describe a personal experience related to a topic, they tended to share personal stories; if asked to solve a dilemma posed by a case study, they tended to propose and justify solutions (Meyer 2004). Seemingly, by modifying the types of questions asked, faculty could more readily target the kinds of learning outcomes they wished their students to attain (Andrews 1980).

Over the years, research has consistently demonstrated a strong relationship between the level of teachers’ questions and the level of subsequent student responses (Bloom 1956; Dillon 1994). However, the majority of this research has been conducted in face-to-face settings (Andre 1979; Pear et al. 2001; Vogler 2008). As the popularity of online instruction grows (Allen and Seaman 2008), it is important to examine the nature of this relationship in the online environment as well. How do teachers’ question prompts influence the responses posted by their online students, as well as the amount of interaction that occurs among the students? Are some questions more productive than others at eliciting greater amounts of interaction at the higher levels of thinking?

Recently, researchers have started to examine the link between the structure/type of question prompt and the quality of students’ postings (Bradley et al. 2008; Meyer 2004; Richardson and Ice 2010). For example, after examining students’ postings in two different courses, McLoughlin and Mynard (2009) concluded that the different proportions of postings at the different levels of critical thinking were “likely due to the nature of the task and the wording of the prompt” (p. 155). Based on their results, McLoughlin and Mynard recommended that online discussions be carefully constructed or the “circumstances conducive to promoting higher-order thinking may not arise” (p. 149). According to Blanchette (2001), teachers’ uses of low-level questions in online discussions can actually discourage student participation and thus, limit their opportunities to interact with the content, think critically, and ultimately, to learn.

Researchers have used a variety of classification schemes to categorize the types of questions teachers ask (Andrews 1980; Bloom 1956; Wilen 1991). For example, Andrews’ (1980) typology classified discussion prompts in terms of the strategy being used (e.g., brainstorm, general invitation, funnel) and included nine different types (see Table 1). In contrast, Bloom’s classification scheme represents, more directly, levels of critical thinking (e.g., application, analysis, synthesis; see Table 2). Using Bloom’s scheme, questions are classified as representing lower levels of thinking if they involve knowledge recall, comprehension, or application; higher level questions tend to require analysis, synthesis, or evaluation (Ertmer et al. 2007; Pear et al. 2001).

Table 1 Types of question prompts (adapted from Andrews 1980)
Table 2 Levels of thinking represented by Bloom’s taxonomy

Researchers also have developed different ways to “count” interactions in online discussions (Ertmer and Stepich 2004; Rourke et al. 1999; Swan 2002). For example, Andrews (1980) used the term “mileage” to describe quality, productive face-to-face discussions. According to Andrews, discussions with greater mileage were those that (1) elicited a variety of student responses (NSS—number of individual student contributions), (2) involved the majority of the class (NS—number of students active in the discussion), (3) displayed momentum, that is, students continued interacting without additional prompting (STT—the duration of all student talk), (4) engaged students directly with each other (NS-S—number of instances in which one student’s comment is followed immediately by another student’s comment), and (5) resulted in a greater percentage of student talk versus teacher talk (%S—number of separate student comments divided by the total number of all comments following a given question). Notably, all of these criteria, with the possible exception of number 3 (duration of student talk), can be applied to online discussions.
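To illustrate how these indicators might be operationalized for an online transcript, the sketch below computes four of them from an ordered list of posts. This is a minimal sketch under stated assumptions, not part of Andrews’ work or the present study: the post record (author identifier and role) is hypothetical, and STT is omitted because duration has no direct analogue in asynchronous discussions.

```python
# Minimal illustrative sketch (hypothetical data layout): computing four of
# Andrews' "mileage" indicators from an ordered transcript of one discussion.

from dataclasses import dataclass

@dataclass
class Post:
    author_id: str   # hypothetical identifier of the poster
    role: str        # "student" or "instructor"

def mileage_indicators(posts: list[Post]) -> dict:
    student_posts = [p for p in posts if p.role == "student"]
    nss = len(student_posts)                          # NSS: individual student contributions
    ns = len({p.author_id for p in student_posts})    # NS: students active in the discussion
    # NS-S: a student's comment followed immediately by another student's comment
    ns_s = sum(
        1 for prev, curr in zip(posts, posts[1:])
        if prev.role == curr.role == "student" and prev.author_id != curr.author_id
    )
    pct_s = nss / len(posts) if posts else 0.0        # %S: student comments / all comments
    # STT (duration of all student talk) is omitted; see the note above.
    return {"NSS": nss, "NS": ns, "NS-S": ns_s, "%S": pct_s}
```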

Purpose

Questions are one of the primary strategies used to facilitate student interaction in online discussions (Wang 2005); thus, it is important to understand how different types/levels of questions influence students’ subsequent responses and interactions. To what extent do students respond to high-level questions with high-level responses, as suggested in the literature? Do some types/levels of questions more readily lead to greater levels of student engagement and/or interaction? If so, which ones? This research was designed to examine, more closely, which types and levels of questions resulted in the greatest amounts of interaction and the highest quality of students’ responses. The research questions guiding our efforts included:

  • What is the relationship between the level of question prompt (using Bloom’s taxonomy) and the level of students’ responses (using Bloom’s taxonomy)? Which levels of question prompts promote the highest levels of critical thinking?

  • What is the relationship between the type of question prompt (using Andrews’ typology) and the level of students’ responses (using Bloom’s taxonomy)? Which types of question prompts promote the highest levels of critical thinking?

  • Which levels and types of question prompts promote the greatest amount of student–student interactions, particularly at the highest levels of critical thinking?

Method

Overview

This exploratory descriptive study was designed to examine the relationships among question types and levels and students’ subsequent responses/interactions in online discussion forums. Question prompts were classified both by type, as outlined by Andrews (1980), and by levels of critical thinking, as outlined by Bloom (1956). Students’ responses (n = 850), taken from 19 discussion forums, were coded using Bloom’s six levels of cognitive processing: knowledge, comprehension, application, analysis, synthesis, and evaluation. Interaction patterns were determined using three of Andrews’ “mileage” indicators: average number of responses/student, average number of student–student sequences per question prompt, and average number of discussion threads and posts within a thread for each question prompt. Interaction patterns were then compared to levels of critical thinking elicited by each prompt to determine which question prompts led to the greatest amounts of interaction at the higher levels of critical thinking.

Context

In order to examine the relationships between the levels and types of question prompts and the levels of students’ responses in online discussions, we examined discussion prompts from 10 asynchronous courses, taught by seven different instructors across five semesters: spring and fall 2008, and spring, summer, and fall 2009. Three courses were taught primarily online, while seven used online discussions to augment regular class meetings. Courses ranged in size from 9 to 221 students (n = 569) and represented six disciplines: Educational Technology; Educational Psychology; English Education; Literacy and Language; Speech, Language, and Hearing Sciences; and Veterinary Medicine (see Table 3). The students in each course engaged in online discussions related to course content during 16-week semesters. In general, students received participation points for the responses they posted in the online discussions.

Table 3 Course and participant details

Data collection

Ninety-two question prompts were collected from the 10 courses and classified using both Andrews’ (1980) typology and Bloom’s (1956) taxonomy. To ensure accuracy in the categorization, all 92 questions were classified by two authors independently and then compared for any differences. After reviewing each of the classifications, the researchers discussed their differences and clarified individual interpretations in order to reach consensus. During these deliberations, two of Andrews’ categories (quiz show and multiple consistent) were eliminated and one category (shotgun/funnel) was divided into two distinct categories (shotgun and funnel). The two categories were eliminated because none of the 92 discussion prompts fell into either of them. The shotgun/funnel category was divided because it appeared to include two different types of prompts, which we believed had the potential to lead to different types of student responses.

It is important to note that many discussion starters included multiple questions for students to consider (see “Appendix”). In general, this was not an issue when classifying prompts according to Andrews’ typology, since many of his categories (e.g., shotgun, analytic convergent) were defined based on this characteristic. However, classifying multiple questions within a single discussion prompt using Bloom’s taxonomy was more difficult; that is, different questions within the same prompt often represented different levels of thinking. To address this issue, we made the decision to “code up” (Garrison et al. 2001). That is, we used the highest level of Bloom’s taxonomy coded within a prompt as the final code for that entire prompt. The rationale was that if one question required analysis, for example, in addition to knowledge or comprehension, students were still being prompted to think analytically to answer that portion of the prompt. This is similar to the approach used by Bradley et al. (2008), in which the final code assigned to a response was the one representing the highest level of thinking observed.
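The “code up” rule can be expressed compactly. The sketch below is a minimal illustration under assumed labels (lower-case Bloom level names ordered from knowledge to evaluation); it is not taken from the study’s coding materials.

```python
# Minimal illustrative sketch of the "code up" rule (assumed level labels).

BLOOM_ORDER = ["knowledge", "comprehension", "application",
               "analysis", "synthesis", "evaluation"]

def code_up(question_levels: list[str]) -> str:
    """Return the highest Bloom level coded among the questions in one prompt."""
    return max(question_levels, key=BLOOM_ORDER.index)

# Example: a prompt mixing a recall question with an analysis question
# is coded, as a whole, at the analysis level.
assert code_up(["knowledge", "analysis", "comprehension"]) == "analysis"
```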

After coming to consensus on the classifications for the 92 discussion prompts, we then selected 18 discussions (2 from each of the final 9 Andrews’ categories; see Table 1) to use for the analysis of students’ postings. Questions were selected with the goal of including at least one prompt from each course. After meeting this criterion, we then selected discussions that included a relatively greater number of student posts (more breadth) or greater amounts of interaction (more depth), and/or provided the clearest examples of the question types. Finally, one additional question was selected based on the fact that it was one of the relatively few prompts that represented the higher levels of Bloom’s taxonomy (synthesis and evaluation). The final set of 19 question prompts is included in the “Appendix”.

Data analysis

After identifying 19 discussions for coding, the two authors independently coded students’ postings in four of the discussions using Bloom’s taxonomy. Postings were scored at the message level, which varied in length from a sentence to several paragraphs. After coming to consensus on the codes for the responses in these four discussions, each researcher independently coded approximately half of the remaining 15. Following this, discussion codes were entered into NVivo, a qualitative analysis software package. Matrix coding queries were conducted to examine relationships among specific, selected variables (question type, question level, etc.).

To answer our first research question regarding the relationship between level of question prompt and level of students’ responses, we totaled the number of responses that were coded at each level of Bloom’s taxonomy for each of the 19 selected discussions. First, we grouped the 19 question prompts according to the six levels of Bloom’s taxonomy, which resulted in the following number of questions per level: Knowledge = 1, Comprehension = 3, Application = 5, Analysis = 6, Synthesis = 1, and Evaluation = 3. Then, we calculated the total number of students’ responses at each level of thinking for each category of questions. Finally, we calculated the percentage of responses at each level. This enabled us to see the extent to which lower- and higher-level questions led to lower- or higher-level responses. Because questions at the lower and higher levels of Bloom’s taxonomy were not used as frequently as those at the middle levels, we decided to group discussion prompts into three categories (low, middle, and high) to provide a more robust comparison between different levels of questions.
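As a concrete illustration of this tally, the sketch below totals coded responses and converts them to percentages for question prompts grouped into the low, middle, and high bands. It is a minimal sketch under stated assumptions, not the study’s analysis script: the data layout is hypothetical, and the banding of question prompts is assumed to combine the lower, middle, and upper two Bloom levels, mirroring the grouping of responses described in the next paragraph. The same tally, with Andrews’ question types as the grouping variable, supports the analysis for the second research question.

```python
# Minimal illustrative sketch (hypothetical data layout, not the study's code).
# Each discussion is represented as (question_level, [response_level, ...]),
# with levels given as lower-case Bloom labels.

from collections import Counter

BLOOM_BAND = {
    "knowledge": "low", "comprehension": "low",
    "application": "middle", "analysis": "middle",
    "synthesis": "high", "evaluation": "high",
}

def response_percentages(discussions):
    """Percentage of responses at each Bloom level, per question-prompt band."""
    counts = {"low": Counter(), "middle": Counter(), "high": Counter()}
    for question_level, responses in discussions:
        counts[BLOOM_BAND[question_level]].update(responses)
    return {
        band: {level: round(100 * n / sum(tally.values()), 1)
               for level, n in tally.items()}
        for band, tally in counts.items() if tally
    }

# Example with made-up data: one low-level and one high-level prompt.
example = [
    ("comprehension", ["knowledge", "comprehension", "application"]),
    ("evaluation", ["comprehension", "analysis", "analysis", "evaluation"]),
]
print(response_percentages(example))
```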

To answer our second question regarding which question type (using Andrews’ typology) prompted the greatest percentage of high-level responses, we calculated the total number and percentage of responses that were coded at each level of Bloom’s taxonomy for each question type. For the purposes of this analysis, we used the original 18 discussion prompts (2 from each of the final 9 Andrews’ categories) in order to represent each question type equally. After reviewing our initial results, we decided to group students’ responses into low, medium, and high levels by combining the respective lower, middle, and upper two categories of Bloom’s taxonomy. This enabled us to see, more clearly, which types of questions tended to result in greater proportions of high-, medium-, or low-level responses.

To answer our third research question, interaction patterns were analyzed using frequency data. Based on the mileage criteria outlined by Andrews (1980), we counted (1) the average number of responses per student to each question prompt, (2) the average number of student–student sequences for each question prompt, and (3) the average number of student posts within each thread within a discussion. Furthermore, we examined each of these mileage indicators twice: first, classifying question prompts using Andrews’ typology and second, classifying prompts using Bloom’s taxonomy.
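As a concrete illustration of these counts, the sketch below shows one way the three interaction measures could be computed for a single question prompt. It is a minimal sketch under stated assumptions, not the study’s actual procedure: the post record (author and thread identifiers) is hypothetical, and a student–student sequence is operationalized here as two consecutive posts by different students.

```python
# Minimal illustrative sketch (assumed data layout, not the study's code).
# Each post in a discussion records its author and the thread it belongs to.

from collections import Counter
from dataclasses import dataclass

@dataclass
class ForumPost:
    author_id: str   # hypothetical student identifier
    thread_id: str   # hypothetical thread identifier

def interaction_measures(posts: list[ForumPost], n_students: int) -> dict:
    # (1) average number of responses per student for this prompt
    avg_responses = len(posts) / n_students if n_students else 0.0
    # (2) student-student sequences, interpreted here as consecutive posts
    #     (in chronological order) made by two different students
    sequences = sum(
        1 for prev, curr in zip(posts, posts[1:])
        if prev.author_id != curr.author_id
    )
    # (3) average number of posts within each thread of the discussion
    thread_sizes = Counter(p.thread_id for p in posts)
    avg_posts_per_thread = (
        sum(thread_sizes.values()) / len(thread_sizes) if thread_sizes else 0.0
    )
    return {
        "avg_responses_per_student": avg_responses,
        "student_student_sequences": sequences,
        "avg_posts_per_thread": avg_posts_per_thread,
    }
```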

Issues of reliability and validity

Lincoln and Guba (1985) recommended that qualitative results be evaluated using the standard of “trustworthiness,” as established by credibility and confirmability. In this study, credibility was established by examining a relatively large number of discussions facilitated by instructors across 10 different courses from six different disciplines, thus providing triangulation of data sources. The use of multiple researchers supported the confirmability of the findings: two researchers examined the data individually and then collaboratively as a means of developing consensus on the coding for each question prompt as well as the resulting students’ postings.

Results

Relationship between question level and level of response (Bloom by Bloom)

The results of this study support the hypothesis that higher levels of questions facilitate higher levels of students’ responses (Bloom 1956; Meyer 2004). Figure 1 illustrates general trends in the levels of students’ responses to question prompts categorized at a low, medium, or high level of Bloom’s taxonomy. First, there is a general downward trend in students’ responses at the knowledge and comprehension levels as questions move toward the higher levels: knowledge and comprehension responses decreased from 53% of the total responses to low level questions to 38% of the responses to high level questions. Second, there is a general upward trend in students’ responses at the analysis, synthesis, and evaluation levels as questions move toward the higher levels: analysis responses increased from 25% of the responses to low level questions to 32% of the responses to high level questions, and synthesis and evaluation responses increased from 1% of the responses to low level questions to 17% of the responses to high level questions. Finally, there was no apparent trend among the application responses, which were fairly equally evident for low and medium level questions (20 and 24%, respectively) but observed relatively less frequently (12%) for high level questions.

Fig. 1
figure 1

Level of students’ responses when presented with questions at a low, medium, or high level of Bloom’s taxonomy

As suggested in the literature, higher level questions tended to lead to higher level responses, while lower level questions tended to lead to lower level responses (Limbach and Waugh 2005; Meyer 2004). However, none of the three levels of questions resulted in a majority of responses at the highest two levels of Bloom’s taxonomy. High level questions still resulted in 33% of the responses at the comprehension level, which may reflect students’ tendencies to simply restate or interpret information from their course readings or other postings. Although we would expect student comprehension to serve as the foundation for higher level thinking (Bradley et al. 2008; Kunen et al. 1981), responses that stop at this level suggest that students are failing to build on these understandings to engage in more complex thinking tasks. Kunen et al. (1981) argued that an overreliance on this kind of thinking actually decreases student achievement. On a positive note, however, 32% of the responses to high level questions were at the analysis level. This suggests that high level questions can be used effectively to prompt students to make comparisons, argue the pros and cons for an issue, and/or distinguish subtle differences between ideas or concepts.

Relationship between question type and level of response (Andrews by Bloom)

Based on our analysis of 816 coded responses, nearly half of the responses to the 18 question prompts were classified as low level (47%), with an equal percentage classified at the medium level (47%). Only a few messages (6%) reached the highest levels of thinking. As illustrated in Fig. 2, a small percentage of responses to analytic convergent (14%), focal (13%), and lower divergent (12%) questions reached the highest levels of thinking based on Bloom’s taxonomy. In general, these questions required students to integrate ideas, to make decisions, or to take a position and justify it. In contrast, shotgun (80%), lower divergent (68%), critical incident (66%), and playground (62%) questions mainly resulted in responses at the medium level of Bloom’s taxonomy. In general, these questions required students to analyze information and/or to provide personal examples of the concepts being discussed.

Fig. 2
figure 2

Level of students’ responses when presented with different types of question prompts

Among the nine question types, lower divergent questions seemed to be the most effective at generating student thinking at the medium and high levels. Overall, for this question type, 12% of students’ responses reached the highest levels (synthesis and evaluation), while 62% represented thinking at the medium level of Bloom’s taxonomy (application and analysis). A review of students’ postings revealed that these questions tended to prompt students to integrate material from multiple sources and to connect relevant ideas from previous discussion posts to support their opinions or decisions. This finding aligns with Zsohar and Smith’s (2008) conclusion that discussion prompts that incorporate course material, require reflective thinking that goes beyond the facts, and call for judgment to produce knowledge can facilitate higher levels of critical thinking. This suggests that lower divergent questions can be effective in facilitating comparatively higher levels of critical thinking for students participating in online discussions.

General invitation (69%) and brainstorm (68%) questions resulted in the greatest proportion of responses at the lowest levels of thinking, compared to other question types. Due to the structure of these types of questions (i.e., asking students to offer a wide range of responses on a given topic), students tended to exchange ideas, search for explanations, and use personal opinions to support their arguments. This suggests that these questions are primarily effective in prompting students to share their initial ideas on a topic and demonstrate their basic understanding of an issue. However, higher levels of thinking may be possible if instructors add additional prompts that challenge students to go beyond their current understandings by comparing statements and arguments, looking for evidence, critiquing the evidence found, and then making thoughtful decisions based on that evidence (Jonassen et al. 1995).

Interaction patterns: how much are students talking?

Across the 19 discussion forums, the average number of posts/student was 4.6 (SD = 3.29), ranging from a low of .65 for a general invitation question at the knowledge level to a high of 13.95 for a brainstorming question at the application level. In general, students averaged the highest number of posts/student for questions at the comprehension and application levels of Bloom’s taxonomy (6.5 and 6.7, respectively), supporting our earlier conjecture that students tend to be comfortable posting responses requiring low-medium levels of critical thinking.

Using Andrews’ typology, brainstorming and playground questions averaged the highest number of posts/student, at 10.8 and 7.2, respectively, suggesting that these questions can generate many responses, although not necessarily at the higher levels of critical thinking, as noted earlier. The lowest average number of posts/student occurred for knowledge (.65) and evaluation (2.4) questions and for funnel (1.7) and shotgun (1.4) questions. Although these results may be explained, at least in part, by the manner in which the instructors structured the discussions (e.g., not explicitly requiring students to respond to each other), alternative explanations may lie in the structure of the questions themselves. For example, knowledge and funnel questions typically require students to respond with a single “right” answer. Tables 4 and 5 provide a more detailed analysis of average student responses to question prompts classified by type (Andrews) and by level (Bloom).

Table 4 Average number of student responses and average number of student–student sequences per Andrews’ question type
Table 5 Average number of student responses and average number of student–student sequences per Bloom’s question level

Interaction patterns: how much are students talking to each other?

While the average number of student responses/prompt provides a rough measure of the amount of student talk in a discussion, it does not necessarily capture how much students are talking to each other. Rather, the average number of student–student sequences and the average number of posts within a thread provide better measures of this. Across all discussions, the average number of student–student sequences was 4.04 (SD = 2.71), ranging from a low of .2 for a funnel question at the analysis level to a high of 9.1 for an analytic convergent question at the application level. The average number of threads/discussion was 11.8 (SD = 3.79), ranging from a low of 7 for a brainstorming question at the comprehension level to a high of 20 threads for two questions, both at the analysis level: a funnel question and a shotgun question. It is important to remember that, in general, the more threads observed, the less interaction, as students are more likely to be posting isolated responses rather than responding to their peers. For example, the funnel prompt that resulted in 20 threads had only 4 threads with more than one post. In contrast, the analytic convergent prompt that resulted in 9 threads had anywhere from 5 to 14 postings in each, suggesting a much more interactive discussion.

Further examination of the student–student sequences for question prompts classified by Andrews’ typology demonstrated that brainstorming and playground questions resulted in the highest average number at 7.1 and 7.5, respectively (see Table 4). These two question types also had the greatest average number of student responses/prompt, as noted earlier.

Using Bloom’s taxonomy (see Table 5), questions at the comprehension, synthesis, and application levels resulted in the highest average number of student–student sequences (6.9, 5.4, and 5.0, respectively). However, because there was only one question prompt coded at the synthesis level, it is impossible to know if this pattern would hold for other synthesis questions. As noted for Andrews’ question types, two of the same questions that resulted in the highest average number of student posts/prompt also resulted in the highest average number of student–student sequences.

The questions that elicited the lowest average number of student–student sequences included knowledge and evaluation questions (0 and 2.7, respectively) and shotgun and funnel questions (.8 and 1.5, respectively). These were the same questions identified earlier as resulting in the lowest average number of posts/student. Thus, based on the results of this study, the average number of student responses/prompt and the average number of student–student sequences appeared to be highly correlated. Andrews (1980) also reported a significant positive correlation (r = .93; p < .001) between these two measures and, furthermore, reported little variation in the correlation across instructors. Given this, instructors may be able to determine the general quality of their online discussions using a single measure. For example, it is relatively easy, especially given the tracking functions in today’s learning management systems, to identify which discussions have the greatest number of student responses without specifically having to count the number of student–student sequences. This, then, allows an instructor to identify where additional support is (or is not) needed.

Discussion

Questions are one of the most common and effective strategies for facilitating learning (Clasen and Bonk cited in Limbach and Waugh 2005), both in online and face-to-face environments. In asynchronous online discussions, question prompts play an important role in facilitating critical thinking, specifically through student–student and student-content interactions (Bernard et al. 2009; Blanchette 2001; Rourke et al. 1999; Meyer 2004). Yet critical thinking does not happen automatically in online discussions; rather, instructors must pay close attention to the questions they ask and the facilitation strategies they use (Andrews 1980; Bradley et al. 2008; Vogler 2008). The results of this study suggest that by modifying the questions we ask, we may be able to increase the amount of critical thinking that occurs among our students.

This study examined the relationships between different types and levels of questions and the level of students’ subsequent responses. According to Blanchette (2001), “the cognitive level of the question is a greater determinant of interaction than is the syntactic form [yes–no questions, wh–questions]” (p. 46). Furthermore, the results of her research showed that the cognitive level of students’ responses to instructors’ questions matched the cognitive level of those questions. The results of our study lend support to Blanchette’s findings. That is, in this study, lower level questions tended to result in responses that were primarily at the lower levels of Bloom’s taxonomy, while higher level questions were able to generate more responses at the higher levels. However, responses at the higher levels of Bloom’s taxonomy were still fairly infrequent (approximately 15%), supporting the findings of other researchers: the majority of postings in online discussion boards tend to reflect thinking at the lower levels of Bloom’s taxonomy (Garrison et al. 2001; Gilbert and Dabbagh 2005; McLoughlin and Mynard 2009). This suggests that the potential for questions to elicit higher level responses does not rest solely on the level of question posed.

In this study, questions at every level of Bloom’s taxonomy elicited at least some responses reflecting higher levels of thinking. However, for the most part, additional coaching and/or prompting appears to be needed to facilitate students’ thinking at these higher levels. This is similar to what Biggs and Collis (1982) suggested when they proposed paying less attention to the initial question and more to the interaction that occurs during a discussion. According to Biggs and Collis, the interaction that follows the initial question can be especially effective in focusing students’ attention on higher levels of thinking. For example, explicitly asking students to integrate information from a number of different postings or outside readings might be one way to prompt students to engage in synthesis, while asking them to select and justify a proposed solution from among those offered by their peers might prompt engagement in evaluative thinking. Without these additional prompts, many students seem to miss these opportunities to advance their thinking to the higher levels of Bloom’s taxonomy (Bradley et al. 2008).

Additionally, our results support the notion that divergent questions (i.e., questions that are open-ended, seeking a variety of responses) are relatively more likely to lead to responses at the medium and higher levels of Bloom’s taxonomy than convergent questions (i.e., those that seek one or more specific answers). This is similar to the conclusion drawn by Limbach and Waugh (2005), who stated, “the most productive questions [are those which] will elicit a variety of responses, inviting students to think about and respond at a higher level” (p. 53).

However, our results diverge from Andrews’ (1980) results in several important ways. First, Andrews found that playground, brainstorm, and focal questions tended to facilitate responses at the higher levels of Bloom’s taxonomy; all of these question types were classified as being “divergent, higher level, straightforward, and structured” (p. 145), which Andrews described as the “most fruitful question types” in terms of creating productive discussions. Although our results also demonstrated that playground and brainstorm questions led to “high mileage” discussions, brainstorm questions, specifically, resulted in one of the highest percentages of low-level responses. Playground and focal questions led primarily to application and comprehension level responses, respectively.

As a second point of contrast, in Andrews’ (1980) study, analytic convergent and lower-level divergent questions tended to generate about “half the mileage” of the playground, brainstorm, and focal questions (p. 148). Yet, in our study, one of the two analytic convergent questions (AC1) generated a highly interactive discussion and was one of the more effective question types for eliciting higher level responses. One possible reason for the differences between our results and Andrews’ may lie in the existence of a course effect, an instructor effect, or both. According to Andrews (1980), instructors who use a consistent style of questioning, especially at the higher levels, may generate responses that are more consistently at a higher level, even when the occasional lower level question is used. In addition, a closer look at this specific prompt (see “Appendix”), which was used in a relatively small graduate course, reveals that the first analytic convergent question (AC1) was a multi-question prompt that specifically asked students to translate a theory into practice and to query other students about the theories they were assigned. In contrast, the second analytic convergent question (AC2), which was also used in a small graduate course, did not elicit high levels of interaction or as many high level responses. In comparison to AC1, AC2 narrowed students’ focus to one that was primarily self-reflective, with no encouragement to respond to others or to consider alternative interpretations of the content being discussed. Thus, the specific details of a question prompt may differentially influence the resulting interaction as well as the level of students’ responses. Of course, it is also possible that students in these courses were differentially capable and/or motivated to respond to questions at a higher level. Future research is needed to explore these ideas.

As a final point of differentiation, Andrews’ (1980) work was based on an analysis of face-to-face discussions in which students had little opportunity to reflect on the questions being posed. The potential for cognitive overload, especially when responding to a series of inconsistent questions (e.g., shotgun, funnel), would likely have been greater than in the online environment, where students can take more time to sort through the various questions posed and respond to a specific sub-question with which they feel more comfortable. This, then, may have eliminated some of the difficulties the students in Andrews’ study experienced, and may explain why our results showed more variation in response levels than Andrews’ did. For example, Andrews noted that both the shotgun question and the funnel question led to confusion and “withdrawal” for the students in his study. However, in our study, both of these prompts led to at least some responses at the highest level of Bloom’s taxonomy.

Still, we recognize that Andrews’ indicators provide primarily a quantitative measure of quality. That is, a discussion with high “mileage” could still be off-topic or elicit student responses at the lower levels of Bloom’s taxonomy. Thus, an additional measure of interaction quality is needed to determine the extent to which interactions are specifically content-relevant. For example, Rourke et al. (1999) delineated five types of interaction indicators that “provide evidence the other is attending” (citing Short, Williams, and Christie, p. 7) and that serve as connectors or links between individual posts. Future research might entail adapting these indicators to define the extent to which students’ interactions revolve around course content.

Limitations and suggestions for future research

This was an exploratory descriptive study; when interpreting our results, it is important to recognize its limitations. First, Pear and his colleagues (Crone-Todd et al. 2000; Pear et al. 2001) described the difficulties involved in using Bloom’s taxonomy to code questions and responses, particularly if the coders are not content experts and/or are unfamiliar with the context of the discussions. Although the use of two coders in this study helped minimize this limitation, the possibility still exists that we misinterpreted the intent of a question or misjudged students’ responses.

Second, discussion questions were collected from 10 different courses, representing six different disciplines, including both undergraduate and graduate levels. Although others (Bradley et al. 2008; Gilbert and Dabbagh 2005; Schrire 2006) have reported that graduate students tend to demonstrate a higher frequency of high level responses than undergraduates (Blanchette 2001; Meyer 2004), we did not examine differences among these populations. Furthermore, we did not examine the relationship between the level of students’ responses and their relative capability (e.g., intellectually or motivationally) to respond at higher levels. Future research might examine more closely the differences among students’ responses at different achievement and educational levels, as well as in different courses and disciplines, and with different instructors. This has the potential to lead to more specific guidelines for the types and levels of questions to use with different populations.

Finally, although we coded 92 question prompts and 850 student responses, the results of this study are based on a small subsample of the total data set available. That is, although we coded 19 discussion forums (question prompts and student responses), our sample included only two prompts for each of Andrews’ question types and as few as one prompt at some of Bloom’s levels. Additional discussions, representing each of the types and levels, must be examined to verify the patterns we observed in this initial, smaller sample.

Implications

The results of this study have important implications for instructors who teach online, especially those looking for general guidelines regarding how to structure discussion prompts to elicit high quality student responses. Because instructors have considerable control over which questions they ask, and how they structure them, deliberate use of different types/levels of questions may enable them to engender higher quality interactions among students. One strategy instructors might consider is to combine ideas from both Andrews and Bloom when designing question prompts. For example, if instructors are interested in generating highly interactive discussions, they might start with the guidelines offered by Andrews to design a question prompt with high mileage. Then, after creating the overall structure of the prompt, instructors could use guidelines from Bloom’s taxonomy to target specific levels of thinking.

For example, in this study, brainstorming and playground questions generated high levels of interaction. Unfortunately, brainstorm questions also generated relatively low levels of thinking. However, an instructor could modify this type of question to target the higher or middle levels of Bloom’s taxonomy. Using a brainstorming question as the base, instructors might stimulate deeper thinking by asking students to go beyond the recall or comprehension level by describing underlying relationships or by making connections among ideas.

Generally speaking, it is important to use a variety of question types in order to target different learning outcomes and to create a reasonable balance among the complexities inherent to specific question types (Chin 2004). Fortunately, in the online environment, instructors have the opportunity to modify initial questions if students seem confused or frustrated. Furthermore, if they find that students are responding with simple interpretations or unsupported opinions, they can post additional questions prompting students to provide evidence or to integrate their ideas with those presented by someone with a divergent view. In this way, initial question prompts can be bolstered to facilitate the levels of thinking ultimately desired by the instructor.

Conclusion

Despite the importance of discussions to student learning in online courses, student-content interactions have been relatively under-researched, particularly in comparison to instructor-student and student–student interactions (Swan 2002). Although questions are used for many instructional reasons, such as focusing attention, promoting recall, and encouraging reflection, using questions to stimulate critical, or higher-order, thinking is one of the most important goals of education (Gibson 2009). Studies have shown that online discussions can support higher-order thinking (Gilbert and Dabbagh 2005; Richardson and Ice 2010), particularly through the use of effective questioning techniques. The results of this study provide additional evidence that discussion questions, especially those at the higher levels of Bloom’s taxonomy, can be leveraged for the benefit of our students. It is our hope that by examining the patterns observed in our results, as well as the individual question prompts used by instructors in their courses, others will be able to modify their own discussion prompts to stimulate higher levels of thinking among their students.