Collaborative learning is considered an important pedagogical activity that supports students’ learning. Yet, successful peer collaboration requires students to be engaged with and aware of each other’s thinking (Kuhn 2015). Involving students in peer feedback is one way to ensure their active engagement during collaborative learning (Phielix et al. 2010). Peer feedback is regarded in this paper as the qualitative variant of peer assessment and is defined as a learning activity in which individuals or small-group constellations exchange, react to and/or act upon information about their performance on a particular learning task with the purpose of accomplishing implicit or explicit shared and individual learning goals.

Providing feedback on the work of a peer is a complex assessment skill that can be improved with training (Sluijsmans et al. 2004). Previous studies showed that peer-feedback skills are influenced by students’ individual characteristics, such as domain knowledge (Patchan and Schunn 2015), and by task characteristics, such as complexity (Van Zundert et al. 2012a). However, the role of the peer-feedback provider’s domain knowledge has largely been ignored in studies on peer-feedback training. Moreover, the learning tasks used in most peer-feedback studies are presentations or written essays; few studies have investigated students’ peer-feedback provision skills using complex learning tasks that involve scientific reasoning (e.g. Gan and Hattie 2014; Lavy and Shriki 2014; Van Zundert et al. 2012b).

In mathematics education in particular, despite the importance of peer-feedback activities for teaching preservice teachers to assess scientific reasoning tasks such as geometric proofs (Lavy and Shriki 2014), no study, to our knowledge, has empirically investigated training preservice teachers in peer-feedback provision skills on proof tasks. Additionally, involving preservice teachers in peer-feedback training not only influences their assessment skills but can also shape their perspective on peer feedback (e.g. Sluijsmans et al. 2004). Thus, preservice teachers’ beliefs about peer-feedback provision should be investigated more systematically when peer-feedback provision training is conducted. In the next sections, we discuss the factors that can influence training peer-feedback provision skills when complex mathematical tasks are used.

Peer feedback and geometric construction tasks

Geometric constructions are useful scientific reasoning tasks because they are discovery tasks in which the results of learned mathematical problems (i.e. theorems or concepts) can be applied to physical objects in the real world (Schoenfeld 1986). Similar to geometry proofs, constructions should be supported by deductions. However, research has shown that students at school and university levels, including preservice teachers, regard deductive proofs as irrelevant to geometric constructions (Kuzniak and Houdement 2001; Miyakawa 2004; Schoenfeld 1989; Tapan and Arslan 2009).

The failure to use the deductive mathematical knowledge that students already possess is attributed to passive instruction (Schoenfeld 1989). Preservice teachers can be actively engaged with the learning task by involving them in peer-feedback provision. However, since they rely on empirical approaches when performing geometric construction tasks, they are likely to use the same approaches during peer-feedback provision unless they are trained and supported to use specific types of feedback.

Feedback levels

Hattie and Timperley’s (2007) progressive feedback model offers a conceptualization of learners’ engagement with the learning task and of the learning and self-regulation processes associated with it. According to this model, there are four types (i.e. levels) of feedback that have different effects on learning. The first is feedback at the task level, which refers to the correctness or incorrectness of the solution or of the content knowledge used to solve the task (e.g. “your answer is correct”). The second is feedback about the learning processes and strategies that can be used to solve not only the task at hand but also other similar tasks (process level, e.g. “your justifications should be based on theorems when you perform constructions”). The third is self-regulation feedback, which directs learners to monitor and regulate their learning goals (e.g. “what would happen if you changed the size of the angle?”). Self-regulation feedback does not provide information to the learner about what should be done; instead, it stimulates learners to act on and reflect about their learning.

The fourth level is feedback about the self, which is typically used for motivational purposes (self-level). It includes no information about the learning task but refers to learners’ personal characteristics, most often in the form of general non-task-related praise (e.g. “you are smart!”). Importantly, this feedback is not the same as what the feedback literature calls self-feedback (e.g. Butler and Winne 1995): self-feedback is internal feedback and part of self-regulation, whereas feedback about the self is personal and focuses on the self.

Process and self-regulation feedback levels are more beneficial for deeper processing and mastery learning because they stimulate deeper engagement with the learning task (Hattie and Timperley 2007). Feedback at the self-level is regarded as the least useful feedback for learning because it diverts the learner’s attention to the self and away from the learning task (Kluger and DeNisi 1996).

Feedback levels in peer-feedback research

Owing to its comprehensibility, several researchers have used the Hattie and Timperley (2007) model to train or analyse the peer feedback of high school students in reading, mathematics and chemistry (e.g. Gan 2011; Gan and Hattie 2014; Harris et al. 2015). These studies reported that peer feedback was dominated by the task and self levels (Gan and Hattie 2014; Harris et al. 2015). Nonetheless, when prompted and/or trained to provide peer feedback at the different levels on chemistry lab reports, students could provide more peer feedback at the process and self-regulation levels (Gan 2011). However, it is unclear whether this also applies to complex mathematics tasks such as geometric constructions.

Engaging preservice mathematics teachers in peer-feedback provision in which they are instructed to provide peer feedback at the process and self-regulation levels can benefit their assessment skills for geometric constructions, as providing these types of peer feedback requires deeper processing of the task. The progressive nature of the feedback levels in the Hattie and Timperley (2007) model can help students move beyond the surface features of the construction (i.e. how it looks), consider the processes through which the construction was created and rely on deductive reasoning, all of which are needed to provide peer feedback at the process and self-regulation levels. Nonetheless, since most of students’ peer feedback is at the task and self levels (Gan and Hattie 2014), they need to be trained to provide peer feedback at the higher levels.

Training peer-feedback skills

Peer feedback is usually a new practice for most students, including preservice teachers. Students often feel uncertain about their ability to assess the work of a peer and wish for more support (Cheng and Warren 1997). There is evidence that training preservice teachers to assess the work of peers results in better peer-assessment skills (e.g. Sluijsmans et al. 2004). Similarly, training students with peer-feedback prompts was found to improve the feedback provided at the process and self-regulation levels on chemistry lab reports (Gan 2011). Thus, providing peer feedback at the higher levels can be trained, but students need instructional support, especially when complex learning tasks such as geometric constructions are used.

Instructional support for peer feedback training

Van Zundert et al. (2012a) recommended training domain knowledge before training assessment skills when a complex learning task is used. However, in a typical face-to-face classroom, instruction time is limited, and sequential training of domain knowledge and peer-feedback skills might not be feasible. One way to address this challenge is to support students with domain knowledge scaffolds and peer-feedback scaffolds (i.e. tools that help students succeed in performing challenging tasks; Quintana et al. 2004).

An efficient domain knowledge scaffold for supporting students during peer-feedback provision is the worked example, which typically presents a problem together with its solution (Renkl 2014). Several studies in geometry and algebra education showed that teaching with worked examples works better than conventional teaching, especially for students with low domain knowledge (e.g. Carroll 1994; Paas and Van Merriënboer 1994; Reiss et al. 2008). A worked example used as a domain knowledge scaffold can be expected to reduce the demands introduced by the complexity of the task and, consequently, help students focus on the peer-feedback provision process.

Feedback scaffolds that can support the peer-feedback provision activity are prompts and an evaluation rubric. Structuring the activity through feedback provision prompts can lead to more peer feedback at the process and self-regulation levels (Gan and Hattie 2014; Gielen et al. 2010). Yet, students may still focus on only one aspect of the peer solution. Providing students with an evaluation rubric (i.e. task-specific criteria) against which to judge the peer solution can help them focus on its essential parts. The use of evaluation rubrics was found to increase the accuracy of peer assessment (Panadero et al. 2013), and rubrics can be combined with feedback provision prompts to elicit peer feedback at the different levels.

In sum, structured peer-feedback training seems to have the potential to improve preservice mathematics teachers’ feedback skills on peer solutions to geometric construction tasks. Yet, this assumption requires empirical examination. Importantly, peer-feedback providers’ level of domain knowledge is likely to influence the degree to which they benefit from the training, as well as their perspective on peer-feedback provision. In the next sections, we discuss the role of domain knowledge during peer-feedback provision and students’ beliefs about peer-feedback provision.

Domain knowledge and peer-feedback provision

Providing feedback on peer solutions to a learning task requires an understanding of the task. Van Zundert et al. (2012b) showed that domain knowledge is a prerequisite for assessing the work of a peer, especially when the learning task is complex. Thus, the type of peer feedback provided is likely to be influenced by the provider’s domain knowledge. Patchan and Schunn (2015) reported that, in academic writing, peer feedback from students with low domain knowledge was dominated by praise, whereas peer feedback from students with high domain knowledge involved more criticism. However, there is still limited evidence on whether the type of peer feedback provided on complex mathematical tasks, such as geometric constructions, is also influenced by domain knowledge.

Students’ beliefs about peer feedback

According to the Interactional Framework of Feedback (Strijbos and Müller 2014), the individual characteristics of providers and recipients, including their beliefs, are equally important for feedback. While recipients’ beliefs influence the processing of the feedback message, providers’ beliefs are expected to shape the composition of the feedback message (Strijbos and Müller 2014). Students’ beliefs about the helpfulness of peer feedback were found to be positively associated with (self-reported) self-regulation and negatively associated with GPA (Brown et al. 2016). Similarly, students’ beliefs about the usefulness of peer feedback were found to be positively associated with perceived peer-feedback accuracy and with trust in oneself as a provider and in the peer as a recipient (Rotsaert et al. 2017).

In studies implementing peer-feedback activities, changes in students’ beliefs about peer feedback are often investigated, given that such beliefs might be associated with students’ insecurities regarding the usefulness of peer feedback or their ability to provide it. Some studies that measured students’ beliefs about peer-feedback usefulness or their confidence in peer-feedback provision before and after the peer-feedback activities reported positive changes (e.g. Cheng and Warren 1997; Sluijsmans et al. 2004). In contrast, in a more recent EFL writing study, Wang (2014) reported a decrease in students’ beliefs about peer-feedback usefulness, which was attributed to several factors, among them domain knowledge.

So far, most studies have focused only on changes in beliefs about peer-feedback usefulness or confidence regarding peer-feedback provision after involving students in peer-feedback provision (e.g. Sluijsmans et al. 2004). Peer-feedback-related epistemological beliefs (e.g. critical evaluation, self-reflection; see Nicol et al. 2014) are also important when investigating peer-feedback provision, and changes in these beliefs are yet to be explored. While domain knowledge is suggested to play a role in how students’ beliefs about peer-feedback provision change (Wang 2014), to our knowledge no study has empirically tested this.

Current research and hypotheses

The aim of this study was to investigate the impact of structured peer-feedback training on preservice mathematics teachers’ skills in providing feedback on peer solutions to geometric construction tasks and on their peer-feedback provision beliefs, taking their domain knowledge into account. We formulated the following hypotheses:

  • H1a: Structured peer-feedback training leads to improved peer feedback at the higher levels (i.e. process and self-regulation).

  • H1b: Higher levels of domain knowledge result in more peer feedback at the process and self-regulation levels and less peer feedback at the task and self levels after structured peer-feedback training.

  • H2a: Structured training results in changes in beliefs about peer-feedback provision.

  • H2b: Students’ level of domain knowledge influences the direction of changes in beliefs.

Method

Participants

The participants were 58 preservice middle school mathematics teachers from a large university in southern Germany. Participation was a course requirement, and the students received no additional compensation. The study ran throughout the semester, with data collection and peer-feedback training taking place over several sessions. Only 43 of the 58 students were present at all measurement occasions (9 males, 34 females; mean age 22.51 years, SD = 2.36) and were included in the analyses.

Design

We implemented a quasi-experimental mixed design to investigate the impact of peer-feedback providers’ domain knowledge (between-subjects, addressing H1b and H2b) on changes in the levels of their written peer feedback and their beliefs about peer-feedback provision after a structured peer-feedback provision training on geometric construction tasks (within-subject, addressing H1a and H2a) (see Fig. 1).

Fig. 1

A detailed illustration of the study design showing, in sequence, the different activities that took place during the peer-feedback training. Geom test = domain knowledge test; Fict peer solution = fictional peer solution to geometric construction task(s); Geom task = geometric construction tasks; PFPQ = peer-feedback provision beliefs questionnaire; Intro PF levels = introduction to peer-feedback levels

Materials

Geometric construction tasks

The domain tasks consisted of a set of geometric objects (e.g. a line, a circle and an angle) and asked for the construction of a specified object (e.g. a tangent to the circle that intersects the line in a congruent angle). Solving the task requires (a) performing the construction, (b) describing the construction step by step and (c) providing reasoning to show why the construction yields the specified object. The participants were required to provide written peer feedback on fictional peer solutions to these domain tasks.

Fictional peer solutions

One fictional peer solution was used for the pretest and one for the posttest. The geometric construction tasks on which the peer solutions were based were similar (constructing a line tangent to a circle and parallel/perpendicular to another line for the pretest and the posttest, respectively). Both fictional peer solutions contained (a) some correct steps, but partly followed an incorrect strategy, (b) correct descriptions of some but not all steps, (c) correct and incorrect reasoning steps and (d) vague language in some parts of the solution. The graphical construction in the peer solutions matched the description but was not completely accurate.

Feedback provision prompts

A visual organiser (developed by Gan 2011) with progressive prompts reflecting the different levels of feedback in Hattie and Timperley’s (2007) model was used (see Hattie and Gan 2011 for the visual organiser). We extended Gan’s visual organiser with additional prompts, mostly knowledge integration/self-reflection prompts (adopted from Chen et al. 2009; King 2002; Nückles et al. 2009; see Table 1). Most of the added prompts were at the self-regulation level because students are assumed to benefit more from generating knowledge integration/self-reflective questions, as doing so requires them to think deeply about the learning task (King 2002).

Table 1 Prompts added to Hattie and Gan (2011) feedback level visual organiser and their sources

Evaluation rubric

The evaluation rubric consisted of a set of criteria that could be used in combination with the feedback prompts to judge the peer solution and produce written peer feedback. More specifically, the students had to judge (a) the construction of the geometric object, (b) the description of the construction and (c) the reasoning provided to prove the construction correct (see Fig. 2). The feedback provision prompts could be applied to each section of the evaluation rubric, which directed the students to all parts of the peer solution.

Fig. 2

Evaluation rubric given to students in the training sessions to judge the construction procedure of the geometric figure, the description of construction and the reasoning provided for the construction

Worked example

All students received a standard worked example of the geometric construction task on which they had to provide written peer feedback during the peer-feedback training. Since the purpose of this study was to improve students’ peer-feedback skills, not their domain knowledge, we provided the worked example to ensure that those with low domain knowledge could still practise providing written peer feedback at the higher levels (i.e. process and self-regulation).

Measures

Domain knowledge test

To measure students’ domain knowledge, a basic geometry knowledge test (Ufer, Heinze and Reiss 2008) consisting of 49 true/false items on different topics (e.g. properties of an equilateral triangle, properties of a parallelogram, transversals) was administered at the pretest (M = 37.93, SD = 5.15, Cronbach’s α = 0.77, maximum score 49). Based on their scores, students were grouped into the lowest (M = 32.05, SD = 3.17), middle (M = 38.33, SD = 1.32) and highest (M = 43.67, SD = 1.65) thirds of the sample.
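To make the grouping procedure concrete, the following minimal sketch (in R, the language used for the main analyses) shows such a tertile split; the data frame students and the score column geom_score are hypothetical names, not taken from the original study.

```r
# Hypothetical illustration of the tertile split; 'students$geom_score'
# holds the scores on the 49-item geometry test.
cutoffs <- quantile(students$geom_score, probs = c(1/3, 2/3))

students$dk_group <- cut(students$geom_score,
                         breaks = c(-Inf, cutoffs, Inf),
                         labels = c("low", "medium", "high"))

# Per-group descriptives, analogous to the reported means and SDs
aggregate(geom_score ~ dk_group, data = students, FUN = mean)
aggregate(geom_score ~ dk_group, data = students, FUN = sd)
```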

Peer-feedback provision questionnaire

Students’ beliefs about (a) learning from peer-feedback provision (LPF) (e.g. “I learn from providing peer feedback”), (b) confidence regarding peer-feedback provision (CPF) (e.g. “I feel confident when providing positive feedback to my peers”) and (c) engaging in reasoning during peer-feedback provision (RPF) (e.g. “Providing peer feedback helps me to be critical about my own arguments”) were measured before and after the peer-feedback training using the peer-feedback provision questionnaire. The questionnaire was developed for the current study and consists of 40 items, five of which were adapted from Linderbaum and Levy’s (2010) Feedback Orientation Scale. The items were scored on a 6-point Likert scale (strongly disagree, disagree, slightly disagree, slightly agree, agree, strongly agree). Subscale means were calculated for further analyses.

The instrument (initially 44 items) was piloted with an independent sample of students (N = 83). Parallel analysis following O’Connor’s (2000) procedure revealed a three-factor solution that corresponded to the theoretical structure of the questionnaire (Hinkin 1998). The items were subjected to multiple principal component analyses (PCAs) because the ratio of the sample size to the number of items was too small to include all 44 items in one analysis. Three rounds of PCA were conducted, each with two theoretically distinct scales as the components (LPF vs. CPF, CPF vs. RPF and LPF vs. RPF). Items were retained only if they loaded meaningfully on the intended theoretical component with factor loadings > 0.40 and had a value of at least 0.50 on the diagonal of the anti-image correlation matrix (Field 2009).

The procedure resulted in the exclusion of four items from the RPF scale. The three scales supported by the PCAs consisted of (a) ten items for LPF (Cronbach’s α = 0.87), (b) 17 items measuring CPF (Cronbach’s α = 0.91) and (c) 13 items for RPF (Cronbach’s α = 0.88). PCA was not conducted for the current study as the sample size was not sufficient, but the scales’ reliabilities were equally high for the present sample: LPF (Cronbach’s α pre = 0.83, Cronbach’s α post = 0.90), CPF (Cronbach’s α pre = 0.90, Cronbach’s α post = 0.96) and RPF (Cronbach’s α pre = 0.71, Cronbach’s α post = 0.87).
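For transparency, the pilot-validation steps described above could be run along the following lines with the psych package in R. This is a hedged sketch: the data frame pilot (the 44 pilot items) and the item-name vectors lpf_items and cpf_items are hypothetical placeholders, and the varimax rotation is an assumption, since the rotation method is not reported here.

```r
library(psych)

# Parallel analysis (O'Connor 2000) to decide on the number of components
fa.parallel(pilot, fa = "pc")

# One of the three pairwise PCA rounds (here: LPF vs. CPF items)
pca_lpf_cpf <- principal(pilot[, c(lpf_items, cpf_items)],
                         nfactors = 2, rotate = "varimax")
print(pca_lpf_cpf$loadings, cutoff = 0.40)   # retention rule: loadings > .40

# Per-item MSA values = diagonal of the anti-image correlation matrix (>= .50)
KMO(pilot[, c(lpf_items, cpf_items)])$MSAi
```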

Peer-feedback levels

The peer-feedback levels at the pretest and posttest were coded using a coding scheme based on Hattie and Timperley’s (2007) model. Previous studies showed the applicability of this feedback model to peer feedback and developed coding schemes that were successfully used to analyse peer feedback (e.g. Gan and Hattie 2014; Harris et al. 2015). The coding scheme (Table 2) was adapted from Gan and Hattie (2014). Although peer feedback at the self-level was not prompted for, it was included in the coding because peer feedback often includes statements of this nature (Harris et al. 2015).

Table 2 Scheme used to code the written peer feedback based on the feedback levels adapted from Gan and Hattie (2014)

Prior to coding, students’ written peer feedback was segmented following the procedure of Strijbos et al. (2006), with the smallest meaningful segment as the unit of analysis. Two coders (the first author and a student assistant) independently segmented 10% of the data, reaching an acceptable percentage agreement level (81.80% lower bound, 82.30% upper bound), after which the first author segmented the remainder. The same two coders independently coded 10% of the segments (Krippendorff’s α = 0.76). The remaining segments were then coded by the first author. Proportions of peer feedback at each level (task, process, self-regulation, self) were calculated for further analyses.
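A minimal sketch of the reliability check on the double-coded segments, assuming the kripp.alpha function from the R package irr; the matrix below is fabricated toy data purely for illustration.

```r
library(irr)

# Toy data: feedback-level codes (1 = task, 2 = process, 3 = self-regulation,
# 4 = self) assigned by the two coders to the same ten segments.
codes <- rbind(coder1 = c(1, 1, 2, 3, 2, 1, 4, 2, 3, 1),
               coder2 = c(1, 2, 2, 3, 2, 1, 4, 2, 1, 1))

# Krippendorff's alpha for nominal codes (reported above: alpha = 0.76)
kripp.alpha(codes, method = "nominal")
```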

Procedure

In the first session, all participants completed the domain knowledge test and provided written peer feedback on a fictional peer solution to a geometric construction task to establish their baseline peer-feedback skills. They then completed the questionnaire on their beliefs about peer-feedback provision. In sessions two to five, all participants received peer-feedback provision training that consisted of two stages. The first stage comprised two instructional sessions of 45 min each. In the first of these, the notion of peer feedback was discussed with the students, who shared their thoughts about peer feedback, its benefits, what it should look like and their insecurities regarding it. Then, the feedback levels (task, process, self-regulation and self) were introduced and discussed. At that point, the students also received the feedback provision prompts accompanied by the evaluation rubric. In the remainder of the first session and in the second session, the students took part in several individual and group activities to better understand the different levels of feedback. They had to (a) identify each feedback level in written peer-feedback comments, (b) transform feedback at one level into a higher level and (c) work in groups to provide written peer feedback on a solution, as well as share and discuss their peer feedback with the rest of the class.

In the second stage of the training, which also involved two sessions, we provided the students with worked examples in addition to the feedback prompts and the evaluation rubric they had received earlier. The students received a fictional peer solution and practised providing written peer feedback on it with the help of the instructional scaffolds (i.e. feedback prompts, evaluation rubric and worked example). At the end of the semester, each participant provided written peer feedback on a fictional peer solution in the absence of all instructional scaffolds and answered the peer-feedback provision questionnaire again.

Analyses

We used a repeated-measures multivariate analysis of variance (MANOVA) followed by repeated-measures ANOVAs to analyse the peer feedback provided by the preservice mathematics teachers. Because the peer-feedback data violated the assumption of normality, we applied a semi-parametric repeated-measures MANOVA using the MANOVA.RM package in R (Friedrich et al. 2017). The test implements the rank-based ANOVA-type statistic (ATS) for factorial designs (Brunner et al. 1999). For the post hoc tests, we also used the ATS, via the R package nparLD (Noguchi et al. 2012).
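As an illustration of this pipeline, a hedged sketch of the rank-based model fit is given below. The long-format data frame pf_long and its columns (prop, dk_group, occasion, level, id) are hypothetical names, and the argument names of RM() should be checked against the MANOVA.RM version in use.

```r
library(MANOVA.RM)

# 'pf_long': one row per student x occasion x feedback level, with the
# proportion of feedback at that level in 'prop'.
fit <- RM(prop ~ dk_group * occasion * level,
          data    = pf_long,
          subject = "id",                    # student identifier
          within  = c("occasion", "level"),  # within-subject factors
          iter    = 10000)                   # resampling iterations
summary(fit)   # reports the ANOVA-type statistic (ATS) for each effect
```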

In the ATS tests for within-subject factors and for interactions involving within-subject factors, the denominator degrees of freedom used to approximate the distribution are set to infinity (Brunner et al. 1999), because the degrees of freedom used in conventional ANOVA produce conservative measures (Bathke et al. 2009). An effect size measure for the ATS is the relative effect, which can be interpreted as the probability that a randomly chosen observation from the sample yields a smaller value for a specific peer-feedback level (e.g. task or process) than a randomly chosen observation from a domain knowledge group (e.g. low, medium or high) at a specific measurement occasion (i.e. pretest or posttest).
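The post hoc tests and the relative effects could be obtained per feedback level with nparLD, roughly as sketched below (again using the hypothetical pf_long layout from above); the RTE component of the fitted object contains relative treatment effects of the kind plotted in Fig. 3.

```r
library(nparLD)

# Rank-based 2 x 3 sub-design for one feedback level (here: task level)
task_dat <- subset(pf_long, level == "task")
post <- nparLD(prop ~ occasion * dk_group, data = task_dat,
               subject = "id", description = FALSE)

post$ANOVA.test   # ATS tests for occasion, group and their interaction
post$RTE          # relative treatment effects per group x occasion cell
```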

Changes in students’ beliefs about peer-feedback provision were analysed using parametric MANOVAs followed by repeated-measures ANOVAs. Following recommendations by Lakens (2013), partial eta-squared (ηp²) is used as the measure of effect size; values of 0.01, 0.06 and 0.14 indicate small, medium and large effects, respectively (Cohen 1988). Hypotheses 1a and 1b were tested as directional, whereas hypotheses 2a and 2b were tested as non-directional.
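For reference, partial eta-squared follows its standard analysis-of-variance definition:

$$
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
$$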

Results

Data inspection

The standardised skewness and kurtosis were within the acceptable range of ± 3 (Tabachnick and Fidell 2013) for all the peer-feedback provision questionnaire subscales, and there were no extreme outliers. However, the standardised skewness and kurtosis values were outside the acceptable range for the proportions of task-level peer feedback (skewness = − 4.18, kurtosis = 5.20), process-level peer feedback (skewness = 4.92, kurtosis = 6.68), self-regulation level peer feedback (skewness = 7.82, kurtosis = 13.63) and self-level peer feedback (skewness = 11.19, kurtosis = 29.33) in the pretest and for self-regulation level peer feedback (skewness = 3.49, kurtosis = 3.28) and self-level peer feedback (skewness = 5.77, kurtosis = 6.17) in the posttest. Seven univariate outliers were identified: one outlier for task-level peer feedback, one for process-level peer feedback, one for self-regulation level peer feedback and two outliers for self-level peer feedback in the pretest. One univariate outlier was identified for self-regulation level peer feedback and one for self-level peer feedback in the posttest. The outliers were checked to ensure that they were actual values. All outliers were retained as non-parametric tests were used to analyse the peer-feedback level variables.
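The standardisation used in this screening divides each statistic by its standard error; a small sketch under that assumption, using the approximate large-sample standard errors sqrt(6/N) for skewness and sqrt(24/N) for kurtosis and reusing the hypothetical pf_long layout from the Analyses section, is given below.

```r
# Standardised skewness and (excess) kurtosis = statistic / standard error
std_skew_kurt <- function(x) {
  n  <- length(x)
  m  <- mean(x)
  s2 <- mean((x - m)^2)
  skew <- mean((x - m)^3) / s2^1.5
  kurt <- mean((x - m)^4) / s2^2 - 3
  c(std.skewness = skew / sqrt(6 / n),
    std.kurtosis = kurt / sqrt(24 / n))
}

# Example: task-level proportions at the pretest
std_skew_kurt(with(pf_long, prop[level == "task" & occasion == "pre"]))
```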

Peer-feedback levels after the training

We performed a 2 × 3 repeated-measures MANOVA with domain knowledge and measurement occasion (pretest vs. posttest) as the independent variables and the peer-feedback levels (task, process, self-regulation and self) as the dependent variables. According to the ATS, there was a significant three-way interaction between domain knowledge, measurement occasion and peer-feedback level, F(4.26, ∞) = 4.16, p = .001. There were significant interactions between domain knowledge and peer-feedback level, F(3.29, 49.85) = 3.50, p = .019, and between measurement occasion and peer-feedback level, F(2.49, ∞) = 9.62, p < .001. There was no significant interaction between domain knowledge and measurement occasion, F(1.84, ∞) = 0.04, p = .955. There was a significant main effect of peer-feedback level, F(2.07, 49.85) = 283.38, p < .001, and no significant main effects of measurement occasion, F(1, ∞) = 0.05, p = .825, or of domain knowledge, F(1.63, 49.85) = 0.01, p = .972.

Follow-up 2 × 3 repeated-measures ANOVAs with domain knowledge and measurement occasion as the independent variables and the proportion of each peer-feedback level as the dependent variable were computed with Bonferroni corrections.

Peer feedback at task-level

There was a significant interaction between domain knowledge and measurement occasion, F(1.86, ∞) = 4.99, p = .032 (Table 3, Fig. 3a). There was no significant main effect of measurement occasion, F(1, ∞) = 4.07, p = .128, and no significant main effect of domain knowledge, F(1.88, 34.87) = 3.95, p = .124, on peer feedback at the task level. Post hoc multiple comparisons with Bonferroni correction revealed a significant difference between low domain knowledge students (M Rank = 57.40, SD = 0.33) and medium domain knowledge students (M Rank = 23.93, SD = 0.31), F(1, ∞) = 11.78, p = .002. No significant differences were found between low and high domain knowledge students (M Rank = 36.00, SD = 0.46), F(1, ∞) = 2.87, p = .273, or between medium and high domain knowledge students, F(1, ∞) = 1.21, p = .815.

Table 3 Means and standard deviations of peer feedback levels (task, process, self-regulation and self) before (pretest) and after (posttest) the training for each domain knowledge group
Fig. 3

Relative effects of peer-feedback levels: a task, b process, c self-regulation and d self, for each domain knowledge group (low, medium and high) before (pretest) and after (posttest) the peer-feedback training

Peer feedback at process-level

There was no significant interaction between domain knowledge and measurement occasion, F(1.99, ∞) = 1.65, p = .772. Similarly, there were no significant main effects of domain knowledge, F(1.95, 37.35) = 1.82, p = .708, or of measurement occasion, F(1, ∞) = 0.45, p = 1, on peer feedback at the process level (Table 3, Fig. 3b).

Peer feedback at self-regulation level

There was a significant interaction between domain knowledge and measurement occasion, F(1.91, ∞) = 4.72, p = .040. There was no significant main effect of domain knowledge, F(1.90, 35.38) = 2.29, p = .472, but a significant main effect of measurement occasion, F(1, ∞) = 11.16, p = .004, on peer feedback at the self-regulation level (Table 3, Fig. 3c). Post hoc multiple comparisons with Bonferroni correction showed a significant difference between low domain knowledge students (M Rank = 36.10, SD = 0.25) and high domain knowledge students (M Rank = 55.50, SD = 0.40), F(1, ∞) = 9.02, p = .008, and between low and medium domain knowledge students (M Rank = 58.32, SD = 0.44), F(1, ∞) = 5.90, p = .045. No significant difference was found between medium and high domain knowledge students, F(1, ∞) = 0.01, p = 1.

Peer feedback at self-level

There was no significant interaction between domain knowledge and measurement occasion, F(1.82, ∞) = 0.32, p = 1. Similarly, there was no significant main effect of domain knowledge, F(1.30, 18.81) = 2.60, p = .468, or of measurement occasion, F(1, ∞) = 0.74, p = 1, on peer feedback at the self-level (Table 3, Fig. 3d; note that the lines for the low and medium domain knowledge groups are superimposed in the figure).

Changes in beliefs about peer-feedback provision

We performed a 2 × 3 MANOVA with domain knowledge and measurement occasion (pretest vs. posttest) as the independent variables and peer-feedback provision beliefs (LPF, CPF and RPF) as the dependent variables. The results revealed significant multivariate main effects of measurement occasion, Pillai’s trace = 0.337, F(1, 40) = 20.31, p < .001, ηp² = 0.34, and of peer-feedback provision beliefs, Pillai’s trace = 0.39, F(2, 39) = 12.64, p < .001, ηp² = 0.34. There were no significant multivariate interactions between measurement occasion and domain knowledge, Pillai’s trace = 0.01, F(2, 40) = 0.17, p = .844, ηp² = 0.01; between peer-feedback provision beliefs and measurement occasion, Pillai’s trace = 0.02, F(2, 39) = 0.47, p = .627, ηp² = 0.02; between peer-feedback provision beliefs and domain knowledge, Pillai’s trace = 0.01, F(4, 80) = 0.11, p = .978, ηp² = 0.01; or between peer-feedback provision beliefs, domain knowledge and measurement occasion, Pillai’s trace = 0.06, F(4, 80) = 0.66, p = .620, ηp² = 0.03.

Follow-up 2 × 3 repeated-measures ANOVAs with Bonferroni correction were performed with each peer-feedback provision belief as the dependent variable and measurement occasion (pretest, posttest) and level of domain knowledge (low, medium and high) as the independent variables. There were significant main effects of measurement occasion on LPF, F(2, 40) = 10.93, p = .006, ηp² = 0.21; on CPF, F(2, 40) = 9.81, p = .009, ηp² = 0.20; and on RPF, F(2, 40) = 26.19, p < .001, ηp² = 0.40 (Table 4). There were no significant interactions between domain knowledge and measurement occasion on LPF, F(2, 40) = 0.80, p = 1, ηp² = 0.04; on CPF, F(2, 40) = 0.16, p = .851, ηp² = 0.008; or on RPF, F(2, 40) = 0.33, p = 1, ηp² = 0.02. There were no significant main effects of domain knowledge on LPF, F(2, 40) = 1.23, p = .690, ηp² = 0.07; on CPF, F(2, 40) = 0.94, p = 1, ηp² = 0.04; or on RPF, F(2, 40) = 1.70, p = .60, ηp² = 0.07. Students’ beliefs about peer-feedback provision decreased significantly after the training, regardless of their level of domain knowledge (see Table 4).

Table 4 Means and standard deviations of beliefs about peer feedback provision for each domain knowledge group before (pretest) and after (posttest) the training

Discussion

This study investigated (a) whether preservice mathematics teachers with different levels of domain knowledge benefited differentially from structured peer-feedback training and (b) how students’ beliefs about peer-feedback provision changed in response to the training. The students received training on providing peer feedback at different levels (i.e. task, process and self-regulation) and practised providing written peer feedback on two fictional peer solutions with the help of three instructional scaffolds (worked example, feedback provision prompts, evaluation rubric).

Improvement of peer-feedback levels

The results indicated an increase in peer feedback provided at the highest level (i.e. self-regulation), but only for medium and high domain knowledge students (hypothesis 1a was partially supported; hypothesis 1b was not supported). This finding suggests that engaging preservice teachers in peer-feedback provision activities structured with instructional scaffolds can help them process solutions to geometric construction tasks at deeper levels. However, domain knowledge appears to be a prerequisite as suggested by Van Zundert et al. (2012b). This is also supported by the finding that low domain knowledge students ended up providing more peer feedback at the task-level. Even when trained with a set of instructional scaffolds including a worked example, providing higher levels of peer feedback seems to be challenging for low domain knowledge students.

In contrast to the study by Gan and Hattie (2014), peer feedback at the process level did not significantly increase after the training in our study. One explanation might be the type of task: in Gan and Hattie’s (2014) study, the object of peer feedback was a chemistry lab report of an experiment, which might elicit procedural comments to a larger extent. Conversely, in the geometric constructions used in our study, many processes and strategies are rather implicit and might therefore be harder to provide peer feedback on. A second explanation might be the research sample. Gan and Hattie (2014) conducted their study with high school students, whereas our study was conducted with preservice teachers, who can be regarded as having more domain knowledge than high school students and might not have realised the importance of providing process-related peer feedback to their peers. This difference requires empirical investigation, for instance by comparing the peer feedback preservice mathematics teachers provide on geometric construction tasks to that provided by high school students.

Although no significant differences were found between medium and high domain knowledge students in the process and self-regulation peer-feedback levels after the training, medium domain knowledge students descriptively provided slightly more peer feedback at those higher levels. Furthermore, only the medium domain knowledge group provided significantly less task-level peer feedback than the low domain knowledge group after the training. These findings may suggest that medium domain knowledge students benefited most from the structured training in providing more feedback at the higher levels. However, the relatively lower amount of peer feedback at the process and self-regulation levels provided by the high domain knowledge students does not necessarily indicate that they did not process the peer solution deeply. Self-regulation level peer feedback might already have been internalised by students with high domain knowledge, who might therefore not consider it relevant to provide peer feedback at that level (akin to experts’ automatization of cognitive processes; see Nathan et al. 2001). Alternatively, they might not be able to verbalise self-regulation peer feedback, as their mastery of the construction task might preclude them from verbalising all individual steps of the procedure (Ericsson and Charness 1994; Ericsson and Crutcher 1991). Conversely, through the process of providing feedback to a fictional peer, the medium domain knowledge students, who still have room for improvement, might have realised the importance of self-regulation peer feedback for their own improvement and consequently used it more than their high domain knowledge counterparts. Further research with larger samples is required to explore whether such differences between high and medium domain knowledge students with respect to the higher peer-feedback levels exist and, if so, to find explanations for them.

Decrease in beliefs about peer-feedback provision

Students’ beliefs about peer-feedback provision (LPF, CPF and RPF) all decreased after the training, with a medium effect size (hypothesis 2a was supported). This finding is consistent with the study by Wang (2014), in which students’ beliefs about the usefulness of peer feedback decreased over repeated peer-feedback activities. Wang (2014) suggested two related factors contributing to this decrease in perceived usefulness: lack of domain knowledge and lack of domain skills. In the present study, the decrease was observed at all domain knowledge levels, so we conclude that hypothesis 2b was not supported. It might be that the geometric construction tasks were difficult for all participants, so that their beliefs became less positive regardless of their domain knowledge. Another potential explanation is provided by the self-assessment literature, which often reports that students who perform poorly on the target skill tend to overestimate their performance (Panadero et al. 2016). Students in this study might have over- or underestimated their ability to provide peer feedback due to a lack of, or limited, previous experience with peer feedback. When they were then introduced to the peer-feedback levels and repeatedly experienced producing peer feedback at the higher levels on complex geometric construction tasks, they might have realised that providing peer feedback was more complex than they had expected.

The decrease in beliefs observed in this study is inconsistent with previous studies in which students’ beliefs became more positive after the peer-assessment/feedback activities (e.g. Cheng and Warren 1997; Sluijsmans et al. 2004). Two factors might contribute to these contradictory findings: (a) the complexity of the peer-feedback object (i.e. essay vs. geometric constructions) (Van Zundert et al. 2012b) and (b) the type of the peer-feedback product (i.e. grade or feedback; low or high peer-feedback levels). Furthermore, while many peer-feedback studies, including ours, investigated changes in students’ beliefs about peer-feedback provision, very few examined the impact of students’ peer-feedback-related beliefs on their performance or on the type of peer feedback they provide. Future research could examine (a) the impact of the complexity of the task and of the peer-feedback product on students’ beliefs about peer-feedback provision and (b) the impact of different peer-feedback-related beliefs on the type of peer feedback provided and on learning outcomes.

Methodological limitations

Although the training improved peer-feedback provision at the higher levels, the separate effects of the individual instructional scaffolds could not be determined. A study with a larger sample is required to systematically vary different combinations of instructional scaffolds in peer-feedback training and compare the resulting peer feedback between conditions. Furthermore, students’ domain knowledge was determined by testing their factual knowledge, which is a good predictor of performance on geometric tasks (Ufer et al. 2009). Yet, their meta-cognitive reasoning, which is expected to be closely related to self-reflection comments (i.e. self-regulation feedback), was not tested. A combination of domain knowledge and meta-cognitive measures would provide a more informative picture of students’ knowledge level in future studies. Additionally, this study focused on peer-feedback provision skills without direct measures of training effects on students’ task-specific performance (i.e. on the geometric construction tasks). While peer-feedback skills are essential for learners, it is important to investigate whether training these skills can also improve task-specific learning outcomes. Future studies could investigate whether peer-feedback provision training fosters performance on construction tasks.

Practical implications

The findings of the current study show that domain knowledge is an important factor for the type of peer feedback provided (i.e. for reaching progressively higher levels). Therefore, it is important to take students’ basic domain knowledge into account when designing peer-feedback activities in classrooms, as it might be challenging for low domain knowledge students to provide higher levels of peer feedback even with the help of instructional scaffolds. Teachers should not assume that instructional scaffolds automatically compensate for the lack of knowledge required to provide peer feedback, especially at the higher levels (i.e. process and self-regulation). A sequential training in which domain knowledge is trained first, followed by peer-feedback skills, seems to be more beneficial for students with low domain knowledge (Van Zundert et al. 2012a).