The ability to estimate one’s own performance accurately—that is, metacognitive monitoring ability—is important in the process of self-regulated learning and affects future performance: Only when students are aware that they lack knowledge to achieve high performance can they seek resources to fill these knowledge gaps. As longitudinal data by Rinne and Mazzocco (2014) have shown, students who had better metacognitive monitoring ability also had larger gains in performance. Students’ ability to estimate their own performance can be measured via metacognitive judgments. These judgments differ depending on the time of assessment—that is, on whether they are assessed before or after a performance test—and according to their grain size—that is, whether they are related to a whole test or to single items (Hacker, Bol, & Keener, 2008). Judgments prior to testing, which are called predictions, are usually solicited to judge future performance on a comprehension test after students have read a specific text (e.g., Lin & Zabrucky, 1998; Maki & McGuire, 2002). One drawback of predictions about test performance, however, is that they do not rely exclusively on the test to be done (which is usually unknown beforehand), but on more general beliefs about one’s own performance in a specific domain (cf. Hacker, Bol, Horgan, & Rakow, 2000). In contrast, postdictions are judgments made after a test or a specific item has been completed. Students can accordingly base their judgments on their knowledge of the test items, and especially on how well they believe they have performed (see Hacker et al., 2000). By referring to the specific test or item, postdictions appear to be a better indicator of metacognitive monitoring ability than are predictions (Schraw, 2009; Stankov & Crawford, 1996). Consequently, examining postdictions especially adds to our understanding of metacognitive monitoring ability—specifically, of whether or not students are able to monitor their learning appropriately. In recent studies, postdictions were generally more accurate than predictions, likely because students then had more detailed information about what to judge (Hacker et al., 2000; Maki, Jonas, & Kallod, 1994; Pierce & Smith, 2001; Zabrucky, Agler, & Moore, 2009). For these reasons, postdictions are the focus of the present studies.

The accuracy of metacognitive judgments depends not only on the point in time at which they are assessed (pre- vs. postdictions). Differences in accuracy have also been shown between local judgments (or microjudgments) that refer to single items and global judgments (or macrojudgments) that refer to an entire test (Nietfeld, Cao, & Osborne, 2005). Previous literature has indicated that local and global judgments must be distinguished from each other because their frames of reference differ (Gigerenzer, Hoffrage, & Kleinbölting, 1991). The studies by Gigerenzer et al. (1991), Mazzoni and Nelson (1995), Nietfeld et al. (2005), Schraw (1994), and Stankov and Crawford (1996) consistently yielded more accurate judgments for global judgments than for local judgments. Whereas the latter type usually results in overconfidence, global judgments tend to be more appropriate and underconfident (Liberman, 2004). This phenomenon was termed the confidence–frequency effect by Gigerenzer and colleagues. It denotes a systematic difference between a metacognitive monitoring judgment of confidence in a single event (local judgment) and a metacognitive monitoring judgment of the frequency of correct answers in total (global judgment). Although local judgments appear to be less accurate, they provide important information about whether students are able to discern correctly and incorrectly solved items. To investigate metacognitive monitoring ability for global and local postdictions, two studies were implemented, each focusing on one of these two kinds of judgments.

The accuracy of metacognitive judgments is usually assessed via a difference score that indicates under- or overestimation. However, if some students highly overestimate performance and others highly underestimate, a difference value close to zero would result for the whole sample, suggesting perfect (mean) accuracy (Hacker, Bol, & Bahbahani, 2008). For that reason, the difference score might not be a valid indicator of how accurate students actually are. An absolute difference score provides further information on the accuracy of a judgment and overcomes this limitation of the difference score. To analyze metacognitive judgments at an item level, several approaches can be applied (see Lichtenstein & Fischhoff, 1977, or Schraw, 2009, for discussions of different calibration measures). For example, gamma is a commonly applied measure (Nelson, 1986); however, it has the disadvantage of not being able to measure any bias (Jang, Wallsten, & Huber, 2012).
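To make these scoring approaches concrete, the following sketch (in Python, with hypothetical data and function names rather than code from any of the cited studies) computes a signed difference score, an absolute difference score, and Goodman–Kruskal gamma from item-level confidence and correctness data.

```python
# Illustrative sketch only: accuracy measures for metacognitive judgments,
# computed on hypothetical data (performance scores on a 0-100 percentage scale).

def difference_score(judged_pct, actual_pct):
    """Signed bias: positive = overestimation, negative = underestimation."""
    return judged_pct - actual_pct

def absolute_difference_score(judged_pct, actual_pct):
    """Unsigned accuracy: 0 = perfect calibration, larger = less accurate."""
    return abs(judged_pct - actual_pct)

def goodman_kruskal_gamma(confidences, correct):
    """Goodman-Kruskal gamma over item-level (confidence, correctness) pairs.

    Gamma captures relative accuracy (resolution) but, as noted above,
    cannot reflect overall bias.
    """
    concordant = discordant = 0
    n = len(confidences)
    for i in range(n):
        for j in range(i + 1, n):
            product = (confidences[i] - confidences[j]) * (correct[i] - correct[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        return float("nan")
    return (concordant - discordant) / (concordant + discordant)

# A student who judged 75 % of the items correct but actually solved 50 %:
print(difference_score(75.0, 50.0))           # 25.0 -> overestimation
print(absolute_difference_score(75.0, 50.0))  # 25.0 -> low accuracy
print(goodman_kruskal_gamma([5, 4, 2, 1], [1, 1, 0, 0]))  # 1.0 -> perfect resolution
```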

Metacognitive monitoring ability and performance

Alongside general tendencies toward over- or underestimation in local and global judgments, students’ performance has been a strong predictor of judgment accuracy: Higher-performing students are more accurate than lower-performing students in their metacognitive monitoring judgments (Bol & Hacker, 2001; Hacker, Bol, & Keener, 2008; Nietfeld et al., 2005). Kruger and Dunning (1999) conducted several studies within different domains to investigate how accurately students at different performance levels estimated their performance in a previously performed test (global postdictions). Kruger and Dunning investigated whether students were able to judge their own performance in comparison to others’ performance and with regard to their own raw performance score. After grouping students post-hoc in four performance quartiles, they found across all domains that students in the bottom performance quartile significantly overestimated their performance relative to students in the top performance quartile. On the basis of this finding, Kruger and Dunning argued that “incompetence . . . not only causes poor performance but also the inability to recognize that one’s performance is poor” (p. 1130). They concluded that the lowest-performing students were unaware of being unskilled due to a lack of metacognitive skills. This prominent study sparked further studies investigating the unskilled-and-unaware effect in different settings and domains (e.g., Ehrlinger, Johnson, Banner, Dunning, & Kruger, 2008). However, not all previous studies could replicate the proposed effect; heterogeneous results occurred especially with test items in (educational) psychology (cf. Hacker et al., 2000; Hartwig & Dunlosky, 2014). These differences in study outcomes may be due to differences in group performance levels, because students are classified as low-performing in relation to the other study participants. For instance, in the studies by Hacker et al. (2000), only a small group of the students categorized as very-low-performing were unaware with respect to pre- and postdictions. Additionally, alternate (methodological) explanations were provided for the low accuracy of low-performing students, such as the better-than-average effect or regression to the mean (Burson, Larrick, & Klayman, 2006; Krajc & Ortmann, 2008; Krueger & Mueller, 2002). Others suggested that different assessment and analysis approaches to judging performance might lead to different interpretations of the results (Ackerman, Beier, & Bowen, 2002). Nevertheless, “perhaps the leading interpretation is that low performers are overconfident because they have a general deficit of metacognitive insight” (Miller & Geraci, 2011, p. 502).

Awareness of metacognitive monitoring ability

Miller and Geraci (2011) further investigated the supposed unawareness of low-performing students by soliciting their predictions in an exam situation and assessing metacognitive monitoring ability on the basis of these performance judgments. In addition, the authors asked for second-order judgments (SOJs) to assess the students’ awareness of their metacognitive monitoring ability. SOJs are confidence judgments for previously given performance judgments (Dunlosky, Serra, Matvey, & Rawson, 2005) and can be regarded as meta-monitoring (Dunlosky et al., 2005) or meta-metacognitive judgments (Buratti & Allwood, 2012). Confirming the results by Kruger and Dunning (1999), Miller and Geraci found that low-performing students’ performance judgments were too high in comparison with their actual performance. Analysis of the SOJs revealed, however, that the students seemed aware of their low metacognitive monitoring ability. Hence, although low-performing students provided exaggerated performance estimates, their confidence in their performance judgments was significantly lower than that of higher-performing students, who provided more accurate estimates. The authors differentiated between functional overconfidence—that is, people estimate their performance higher than it actually is—and subjective overconfidence—that is, people are overly certain of their estimates. Accordingly, low-performing students are functionally but not subjectively overconfident.

In their study, Miller and Geraci (2011) extended Kruger and Dunning’s (1999) research by soliciting predictions in a regular exam situation and by using SOJs. Whether the results concerning functional and subjective overconfidence also apply to SOJs of postdictions, however, has not yet been investigated. Postdictions are based on the privileged knowledge of the type of test and its items (Hacker et al., 2000), do not rely as heavily on general beliefs such as self-efficacy and self-concept, and are therefore assumed to be better indicators of metacognitive monitoring. They are also of particular importance for individual self-assessment in the process of self-regulated learning (e.g., when solving mock exams). Hence, the question is whether, with postdictions, low-performing students are still functionally unaware. To that end, in the present two studies we investigated postdictions. The first study presented in this article focused on global judgments. In the second study, we additionally investigated local judgments, whose assessment offers several theoretical and methodological advantages. The specific strengths of local judgments are that they provide a fine-grained picture of students’ metacognitive monitoring ability and that they are a more specific measure of metacognitive monitoring because students have to refer to specific items (Nietfeld et al., 2005). Moreover, the assessment of local judgments offers the advantage of calculating internal consistency for the judgments, which would not be possible with a single global judgment. Local judgments refer to specific items and might also be less strongly influenced by a personality trait such as self-concept (Gigerenzer et al., 1991). Hence, rules of thumb cannot be applied, and students have to make a new decision for each item. Finally, local judgments are necessary as a source of information when students try to assess their knowledge (i.e., to monitor their knowledge in the process of self-regulated learning) and want to know explicitly which contents they need to study further (and not only to judge the proportion of knowledge that they already possess).

The use of local judgments enables differences in SOJs to be investigated with regard to the accuracy of performance judgments at an item level. Two questions regarding SOJs are of interest: whether the respective item was solved correctly, and whether the person indicated having solved the item correctly (performance judgment). Such a classification of actual performance and judged performance on single items corresponds to the classification of items according to signal detection theory (Green & Swets, 1966). The four possible combinations are hits (correctly solved and expected to be correct), misses (correctly solved but expected to be wrong), correct rejections (wrong solution and expected to be wrong), and false alarms (wrong solution but expected to be correct) (cf. Schraw, Kuch, & Gutierrez, 2013, or Winne & Muis, 2011, for overviews of the statistical tools to assess metacognitive judgment accuracy according to signal detection theory). The successful analysis of metacognitive judgments with these measures (see, e.g., Barrett, Dienes, & Seth, 2013; Jang et al., 2012; Maniscalco & Lau, 2012; Masson & Rotello, 2009) was the point of origin for our analyses of SOJs. We investigated how confident students are about items that they accurately judged as being correct/incorrect (hits/correct rejections), as compared to items for which they overestimated (false alarm) or underestimated (misses) their performance. This specific analysis at an item level goes beyond earlier studies based on SOJs that focused on judgments of averaged test performance.
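As an illustration of this classification, the following sketch (hypothetical data and names; Python) assigns each answered item to one of the four signal detection categories on the basis of its actual correctness and the corresponding local performance judgment.

```python
# Illustrative sketch only: classify each item by the combination of actual
# correctness and the student's dichotomous performance judgment.

def classify_item(answer_correct: bool, judged_correct: bool) -> str:
    if answer_correct and judged_correct:
        return "hit"                 # solved correctly, judged correct
    if answer_correct and not judged_correct:
        return "miss"                # solved correctly, judged incorrect
    if not answer_correct and not judged_correct:
        return "correct_rejection"   # solved incorrectly, judged incorrect
    return "false_alarm"             # solved incorrectly, judged correct

# Hypothetical item-level data for one student.
answers   = [True, True, False, False, True]
judgments = [True, False, False, True, True]

print([classify_item(a, j) for a, j in zip(answers, judgments)])
# ['hit', 'miss', 'correct_rejection', 'false_alarm', 'hit']
```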

Aims of the studies

The present two studies aimed to broaden previous findings on metacognitive monitoring ability and the awareness of this ability, especially in low-performing students. First, to investigate whether low-performing students are indeed unaware of their expected low metacognitive monitoring ability, in Study 1 we examined SOJs in addition to performance judgments. In accordance with Miller and Geraci (2011), lower-performing students were assumed to be overconfident in their performance judgments (functional overconfidence). However, the low-performing students’ confidence in their performance judgments should be lower than that of the high-performing students (no subjective overconfidence). Second, Study 2 tested whether global judgments and their respective SOJs would be more accurate than the averaged local ones (confidence–frequency effect). How this might affect the judgment accuracy of students at different performance levels was a key research question. To investigate judgment accuracy, an absolute difference score, in addition to a difference score, was calculated for global as well as for local judgments. Because under- and overestimation cancel each other out in the difference score, the absolute difference score was expected to reveal a larger discrepancy than the difference score. Third, the SOJs of local judgments were analyzed at an item level to investigate whether students discriminate in their SOJs according to the four possible combinations of performance and performance judgments outlined by signal detection theory. Assuming that students are aware of their metacognitive monitoring ability, the SOJs of correctly estimated items (hits or correct rejections) were expected to be higher than the SOJs for wrongly estimated items (misses or false alarms).

General method

Two studies were conducted to extend previous work on students’ ability to judge their own performance, and thus their metacognitive awareness. In both studies, undergraduate education students were asked to postdict their own performance as well as to provide a respective SOJ. In Study 1, a single global judgment was investigated—that is, a judgment about performance on the entire test. In Study 2, local judgments were investigated, as well. The data from Study 1 were collected within an ecologically valid setting during a regular exam situation, and Study 2 was conducted as a laboratory study. The contents of the implemented tests were similar in both studies.

Study 1

Overview

The study was implemented in the final exam at the end of a study term. All data included in this study were collected from all participating students at the same point in time. After students completed their exam, they were asked to voluntarily provide two judgments (a performance judgment and an SOJ) on a separate paper sheet. Altogether, students were given 45 min to complete the exam and to make the two judgments.

Method

Participants

The participating students attended an educational psychology course for undergraduate education students. Of the 351 students who took the corresponding exam, 196 voluntarily provided performance judgments and SOJs. Further analyses relied on this subsample only. Most were first-year students (89.2 %) and female (72.8 %), which is typical for an introductory course in education.

Instruments

Performance

The final exam served as an indicator of students’ performance in educational psychology. The students had to answer 32 multiple-choice questions (four options, single select, Cronbach’s α = .81). A sample item was “Which method is suitable to assess learning processes? (A) Portfolio, (B) Oral exam, (C) Written exam, or (D) Interview” [the correct answer is “(A) Portfolio”].

Postdicted performance judgment

After students completed their exam, they were asked to judge the raw score of items that they answered correctly (global performance judgment). The implemented question was, “What do you think: How many of the 32 test items did you solve correctly? _____ items.”

SOJ

Finally, students were asked to provide an SOJ: They were requested to judge their confidence in their performance judgment on a 5-point Likert-type scale. The implemented item was “How confident are you that your performance judgment is correct?” Students selected their confidence rating on a 5-point smiley scale, displayed in Fig. 1 (see Händel & Fritzsche, 2015; Jäger, 2004). The frowning face on the left represented low confidence (1), and the smiling face on the right represented high confidence (5).

Fig. 1 The implemented 5-point smiley rating scale for the second-order judgments

Data analysis

To interpret the resulting scores more easily, the performance score and the global performance judgment score were recoded into percentage scores; for example, 16 of 32 items solved correctly was recoded to 50. A difference score was calculated as “estimated performance – actual performance” in order to investigate the degree of over- or underestimation of the global performance judgment. Negative values (max = –100) refer to an underestimation, and positive values (max = 100) to an overestimation of performance. The absolute value of the difference score was calculated in order to investigate the accuracy of the performance judgments. The absolute difference score was | estimated performance – actual performance |. Values close to 0 indicated high accuracy, and values close to 100 indicated low accuracy.
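A minimal sketch of this scoring, assuming the 32-item test and hypothetical raw values (not the authors’ analysis code):

```python
# Illustrative sketch only: Study 1 scoring with hypothetical raw values.

N_ITEMS = 32

def to_percentage(raw_score, n_items=N_ITEMS):
    return 100.0 * raw_score / n_items

# A student who solved 16 items but judged 20 to be correct:
actual = to_percentage(16)   # 50.0
judged = to_percentage(20)   # 62.5

difference = judged - actual            # +12.5 -> overestimation
absolute_difference = abs(difference)   # 12.5 -> distance from perfect accuracy (0)

print(difference, absolute_difference)
```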

To investigate the influence of performance on the accuracy of the respective judgments, we followed the procedure implemented by Kruger and Dunning (1999) and Miller and Geraci (2011) by grouping students into four performance quartiles (Q1 = lowest-performing students, to Q4 = highest-performing students). A multivariate analysis of variance (MANOVA) was performed with the performance quartile as the independent variable and judgment, difference, absolute difference, and SOJ as dependent variables.
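The grouping and analysis could be implemented roughly as follows (a sketch assuming pandas and statsmodels, with placeholder data and hypothetical column names; not the authors’ code):

```python
# Illustrative sketch only: quartile grouping and MANOVA on placeholder data.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
n = 196  # sample size of Study 1

# Placeholder data; in the study, these columns would hold each student's
# scores (performance and judgments on a 0-100 scale, SOJs on a 1-5 scale).
df = pd.DataFrame({
    "performance": rng.uniform(30, 100, n),
    "judgment": rng.uniform(30, 100, n),
    "soj": rng.integers(1, 6, n),
})
df["difference"] = df["judgment"] - df["performance"]
df["abs_difference"] = df["difference"].abs()

# Post-hoc grouping into four performance quartiles (Q1 = lowest performers).
df["quartile"] = pd.qcut(df["performance"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# MANOVA with performance quartile as independent variable and the judgment,
# difference, absolute difference, and SOJ scores as dependent variables.
manova = MANOVA.from_formula(
    "judgment + difference + abs_difference + soj ~ quartile", data=df
)
print(manova.mv_test())
```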

Results

Overall, students had high performance scores and slightly underestimated their performance. Table 1 shows the descriptive statistics for each of the performance quartiles into which students were grouped. Figure 2 additionally provides a quick overview of the descriptive study results, summarized for both studies.

Table 1 Descriptive statistics [M (SD)] for performance, performance judgments, differences, absolute differences, and second-order judgments (SOJs) in Study 1, reported separately by performance quartiles for global judgments
Fig. 2 Summarized results for Studies 1 and 2: Performance, performance judgments, differences, absolute differences, and second-order judgments (SOJs), reported separately for each performance quartile. Note that the performance and difference scores are represented via bar charts measured on the left-hand ordinate; the SOJs are represented via line graphs measured on the right-hand ordinate. Q = Quartile

A MANOVA resulted in the following statistical differences between the performance quartiles. First, the difference scores differed significantly between the performance quartiles [F(3, 196) = 23.34, p < .001, ηp² = .267]: Students in the bottom quartile overestimated their performance, whereas students in the other three quartiles underestimated their performance (Tukey post-hoc comparisons between all quartiles except Q2–Q3 and Q3–Q4 were statistically significant, ps < .001). The quartile differences in absolute differences did not reach statistical significance [F(3, 196) = 2.34, p = .075, ηp² = .035]. However, students in Q1 showed the highest absolute difference on a descriptive level. Finally, a significant difference emerged in the SOJs: F(3, 185) = 3.64, p = .014, ηp² = .054. Tukey post-hoc tests indicated that students in Q1 and Q2 were less confident in their ratings than the students in the top quartile (ps = .023 and .034, respectively).

Discussion

The aim of the first study was to investigate whether low-performing students are unaware of their (expected) low metacognitive monitoring ability. To create a combined investigation of metacognitive monitoring ability and of the awareness of this ability, Kruger and Dunning’s (1999) study procedure was extended to include SOJs (cf. Miller & Geraci, 2011) and was conducted in a regular exam setting (see also the first study by Ehrlinger et al., 2008). Before considering low-performing students’ degrees of awareness of their expected low metacognitive monitoring ability, we first discuss whether their metacognitive monitoring ability was in fact low. Our results replicated the unskilled-and-unaware effect in its original form (in other words, Q1 students were functionally overconfident) with respect to educational psychology questions. This result is noteworthy because it contradicts an earlier study also conducted with introductory psychology items in which the unskilled-and-unaware effect was not shown (Hartwig & Dunlosky, 2014). As expected, students in the lowest performance quartile overestimated their performance according to the difference score (functional overconfidence). In contrast, students in higher performance quartiles underestimated their own performance, which might be explained by a statistical artifact: The students in Q4 had a performance score of nearly 90. Hence, they had little opportunity to overestimate, but much room to underestimate, their own performance. In addition, the effect discussed by Liberman (2004) might have played a role: When students provide global judgments, they do not sufficiently take the guessing rate into account.

Furthermore, in the present study we investigated the accuracy of the postdictions via the absolute difference score as a measure for metacognitive monitoring ability. The absolute difference scores did not differ significantly between the four performance quartiles. That is, students in the four performance quartiles seemed equally able to judge their global performance accurately. However, the descriptive statistics and the medium (albeit nonsignificant) effect size indicate that students in the lowest performance quartile tended to be the least accurate.

Regarding the issue of main interest—awareness of metacognitive monitoring ability assessed via SOJs—the students in the lowest performance quartile (who were functionally overconfident) were least confident in their performance estimations. That is, although low-performing students seemed to overestimate their own performance and appeared—on a descriptive level—least accurate, they seemed to acknowledge this in their confidence about their performance judgments. In other words, they did not seem to be subjectively overconfident. By contrast, the underestimation of students in the high performance quartile was accompanied by the highest SOJs.

These results have implications for self-regulated learning. When students test their knowledge via solving a mock exam for which they do not have the sample solution, their further learning process will be based on their anticipated success. By considering their largely overestimated performance judgments only, low-performing students would presumably not invest enough effort in the further learning process, whereas high-performing students, who largely underestimate their performance, would presumably invest more effort than necessary. This learning pattern seems especially detrimental for low-performing students. On closer examination, however, this potential consequence appears to be attenuated by the lower SOJs, indicating that low-performing students are indeed aware of their inaccurate performance judgments.

Study 1 was limited by the fact that each student provided only one performance judgment and only one SOJ. From a methodological perspective, this did not allow us to compute internal consistencies of the performance judgments or SOJs. A global judgment can be used to gain a general idea of whether students are over- or underconfident, or whether they correctly estimate their performance. However, global judgments do not provide any information about whether students are able to discern correctly and incorrectly solved items. Imagine two students (Susan and Robert) who both provide a correct judgment of 20 correctly solved items. Asking them which items were correctly solved could reveal that Susan shows perfect calibration by picking exactly the 20 items she solved correctly, whereas Robert shows worse calibration by picking only ten of the items he solved correctly (also picking ten incorrectly solved items and failing to identify the other ten items he actually solved correctly). In consequence, global judgments might be a good indicator of general over- or underconfidence, but they provide little information about students’ ability to distinguish between correctly and incorrectly solved items. It was hence our aim to overcome this limitation in Study 2. To gain a finer-grained picture of students’ metacognitive monitoring abilities, performance judgments and their respective SOJs were investigated at an item level in Study 2.

Study 2

Taking into account the confidence–frequency effect, in the second study we investigated local judgments. In particular, the study addressed the question: Do the differences between performance quartiles occur not only with global judgments, but also with averaged local judgments? In addition, we analyzed whether local SOJs differ on the basis of whether an item was correctly or mistakenly judged as being correct or incorrect.

Overview

This study was conducted as a laboratory study in which students were asked for global and local judgments. Students were tested in groups of 10–20 persons. Each session lasted about 45 min and was guided by one of the authors of this article. The participating students were asked to complete a performance test covering educational psychology topics. Local performance judgments as well as their respective SOJs were collected after each test item. Furthermore, after students had completed the whole test, they were asked to provide a global judgment and SOJ, as in Study 1.

Method

Participants

A total of N = 115 undergraduate education students voluntarily participated in the study (75.9 % female). The students were recruited from different advanced courses in educational psychology. They were enrolled in different terms, most of them (78.3 %) in the third to the fifth term. Only students who had already passed their exam for the introductory course in educational psychology were recruited for the study. The reason was that participating students should not gain any informational advantage over other students for an exam they had not yet passed.

Instruments

Performance

A test with 32 multiple-choice questions (four options, single select) about educational psychology topics served as an indicator for students’ performance (Cronbach’s α = .60). A sample item was, “What kind of learning strategy is the actualization of prior knowledge? (A) An organizational strategy, (B) An elaboration strategy, (C) A self-regulatory strategy, or (D) A resource strategy” [the correct answer is “(B) An elaboration strategy”].

Postdicted performance judgments

After each test item, students were asked to indicate whether their answer was correct or not, resulting in 32 local performance judgments (Cronbach’s α = .88). The implemented question was, “Do you think your answer is correct?” Students had to tick one of two boxes labeled “yes” or “no.” In addition, the students made a global performance judgment on a separate paper sheet after completing the whole test. The global performance judgment was the same as in Study 1.

SOJs

For each item, students were asked how confident they were about their performance judgment (again operationalized via a 5-point Likert smiley scale, with the frowning face on the left representing low confidence [1] and the smiling face on the right representing high confidence [5]; see Fig. 1). The 32 SOJs revealed high reliability (Cronbach’s α = .90). In addition, an SOJ was requested for the students’ global performance judgment (see Study 1).

Data analysis

For the global judgments, the same scores were calculated as in Study 1 (a difference score and an absolute difference score). To compare the results of the global to the local judgments, a mean percentage score was calculated for the test score and the corresponding local judgments (averaged score). This was done by counting the number of “yes” judgments, divided by the number of total items and multiplied by 100 (e.g., a student who judged half of the items to be solved correctly, indicated by a “yes” judgment for half of the items, would obtain a mean score of 50). This recoding facilitated calculating an averaged difference and an averaged absolute difference score, as we described for the global judgments in Study 1. For the 32 SOJs, a mean score was calculated. The levels of the local and global scores (performance, differences, absolute differences, and SOJs) were compared via paired-sample t tests.
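A sketch of this scoring and of the local–global comparison, assuming NumPy and SciPy and using placeholder data under hypothetical variable names:

```python
# Illustrative sketch only: averaged local judgments and the paired-sample
# t test against the global judgments, using placeholder data.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
n_students, n_items = 115, 32

# local_yes: 0/1 matrix of "yes" judgments; global_judgment_pct: global
# judgments already recoded to a 0-100 percentage scale (placeholder values).
local_yes = rng.integers(0, 2, size=(n_students, n_items))
global_judgment_pct = rng.uniform(30, 90, size=n_students)

# Averaged local judgment: number of "yes" judgments divided by the number
# of items and multiplied by 100.
averaged_local_pct = 100.0 * local_yes.sum(axis=1) / n_items

# Paired-sample t test comparing the averaged local with the global judgments.
t, p = ttest_rel(averaged_local_pct, global_judgment_pct)
print(f"t = {t:.2f}, p = {p:.3f}")
```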

The local judgments were further investigated with two different approaches. First, the averaged local judgments and SOJs were examined as a function of performance. That is, students were again grouped into quartiles according to their respective performance. As in Study 1, and in line with Miller and Geraci’s (2011) analyses, a MANOVA was performed, with performance quartile as the independent variable and the averaged judgments, differences, absolute differences, and SOJs as dependent variables.

Second, the local SOJs were analyzed at an item level with regard to four categories: whether or not students solved the item correctly, combined with the question of whether or not students thought they had solved the item correctly (hits, misses, correct rejections, and false alarms). This classification was used to analyze whether SOJs differed as to whether an item was accurately or inaccurately judged as correct or incorrect. That is, SOJs were analyzed according to performance (item correct or incorrect) and to the performance judgment (which could be accurate or not). To do so, paired-sample t tests were computed between the SOJs of each category.
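The item-level SOJ analysis could be implemented along the following lines (a sketch with placeholder data and hypothetical names; not the authors’ code):

```python
# Illustrative sketch only: average SOJs within each signal detection
# category per student, then compare categories with paired-sample t tests.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
n_students, n_items = 115, 32

# Placeholder item-level data: correctness, yes/no judgment, and 1-5 SOJ.
correct = rng.integers(0, 2, size=(n_students, n_items)).astype(bool)
judged = rng.integers(0, 2, size=(n_students, n_items)).astype(bool)
sojs = rng.integers(1, 6, size=(n_students, n_items)).astype(float)

def mean_soj(mask, sojs):
    """Per-student mean SOJ over the items selected by mask (NaN if none)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        means = np.where(mask, sojs, 0.0).sum(axis=1) / mask.sum(axis=1)
    return np.where(mask.any(axis=1), means, np.nan)

hit_soj  = mean_soj(correct & judged, sojs)
fa_soj   = mean_soj(~correct & judged, sojs)    # false alarms
miss_soj = mean_soj(correct & ~judged, sojs)
cr_soj   = mean_soj(~correct & ~judged, sojs)   # correct rejections

# Example comparison (hits vs. false alarms), excluding students for whom a
# category is empty; the other pairwise comparisons follow the same pattern.
valid = ~np.isnan(hit_soj) & ~np.isnan(fa_soj)
t, p = ttest_rel(hit_soj[valid], fa_soj[valid])
print(f"hits vs. false alarms: t = {t:.2f}, p = {p:.3f}")
```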

Results

The descriptive statistics of the local and global judgments are shown in Table 2. As expected, students’ global judgments and their SOJs after having completed the whole test were lower than the mean value of the local judgments and the corresponding SOJs. A negative difference score was found for the global judgments, as compared to a higher and positive difference score for the local judgments. In addition, the global judgment was closer than the mean value of the local judgments to the actual performance score (lower absolute difference score). All paired-sample t tests comparing the respective local and global scores were significant (postdicted performance: p < .001, Cohen’s d = 0.88; difference: p < .001, d = 0.89; absolute difference: p = .004, d = 0.38; SOJ: p = .003, d = 0.37).

Table 2 Descriptive statistics [M (SD)] for performance, performance judgments, differences, absolute differences, and SOJs, reported separately for (averaged) local and global judgments

Students were grouped into four performance quartiles (see Table 3 for the descriptive statistics for performance, performance judgments, and SOJs, grouped into the four performance quartiles). Those with higher performance showed less overestimation than did students with lower performance [F(3, 116) = 5.22, p = .002, ηp² = .123; Tukey post-hoc tests indicated significant differences between Q1 and Q3/Q4; p = .006/p = .009]. Significant differences were also observed for absolute differences: The higher the performance, the lower the absolute difference [F(3, 116) = 7.19, p < .001, ηp² = .161]. Significant post-hoc differences existed between Q1 and Q3/Q4 (p = .005/p < .001) and between Q2 and Q4 (p = .036). That is, students in the lower-performing quartiles were less accurate than those in the higher-performing quartiles in assessing their performance judgments. Finally, the SOJs differed between the four performance quartiles [F(3, 116) = 6.99, p < .001, ηp² = .158; Tukey post-hoc: p < .001 for Q1 as compared to Q4, p = .021 for Q1 as compared to Q2]. Students in the higher performance quartiles were more confident that their performance judgments were correct, indicated by higher SOJs.

Table 3 Descriptive statistics [M (SD)] for performance, performance judgments, differences, absolute differences, and SOJs in Study 2, reported separately by performance quartiles for local judgments

For the investigation of SOJs at an item level, the local performance judgments were additionally coded in accordance with signal detection theory. For more than half of the items, students correctly estimated their performance (hits and correct rejections). Students mistakenly believed that they had solved an item correctly (false alarm) more often than they mistakenly believed that they had incorrectly solved one (misses); see Table 4 for the exact percentages of items.

Table 4 Percentages of items and the respective SOJs [M (SD)] of items that were accurately or inaccurately judged as correct/incorrect

The respective SOJs differed with regard to the accuracy of the judgments (see Table 4). The highest SOJs were provided for hits, followed, however, not by the SOJs for correct rejections but by those for false alarms. The paired-sample t tests of the SOJs for the four types of judgments were significant at the p < .001 level, except for the comparison of the items that were judged as incorrect (misses and correct rejections: p = .120). The effect sizes were medium to high (Cohen’s ds: hit vs. correct rejection = 1.21, hit vs. false alarm = 0.60, hit vs. miss = 1.37, correct rejection vs. false alarm = –0.76, false alarm vs. miss = 0.92).

Discussion

The aim of the second study was to investigate the confidence–frequency effect and to analyze SOJs at an item level. First, the local judgments and their respective SOJs were shown to be reliable measures. Second, the global judgment was more accurate than the averaged local one. This is in line with the confidence–frequency effect and with previous research comparing global and local judgments (Gigerenzer et al., 1991; Mazzoni & Nelson, 1995; Nietfeld et al., 2005; Schraw, 1994).

Two relevant findings emerged from the analyses with the averaged local judgments: Students in the lowest performance quartile overestimated their performance to the highest degree and were least accurate. Furthermore, students in the lowest performance quartile experienced lower confidence in their performance ratings, indicated by the lower SOJs. This indicates that low-performing students were aware of their inaccurate judgments (see Miller & Geraci, 2011).

A further aim of the second study was to investigate whether students can discern between correct and incorrect answers. To that end, local judgments were analyzed with reference to the classification system of signal detection theory. Until now, previous research on local (second-order) judgments had used confidence scales rather than dichotomous items for performance judgments (Buratti & Allwood, 2012; Dunlosky et al., 2005). This made it impossible to categorize performance judgments according to signal detection theory. Our analyses based on the four categories showed that students demonstrated higher confidence in hits than in misses or false alarms. However, they were also more confident of hits than of correct rejections. In addition, the SOJs were higher for false alarms than for misses or correct rejections. Hence, students were more confident if they thought that they had solved the item correctly (regardless of whether this was true or not) than if they actually provided an accurate judgment (i.e., rightly judged that they did not know an item). This result conflicts with our expectation that SOJs would be higher for correct judgments (hits and correct rejections) than for wrong judgments (misses and false alarms). It means that expecting to have solved an item correctly also evokes a more positive SOJ than does expecting to have solved an item incorrectly. Taken together, students’ SOJs differed not only with regard to the accuracy of their performance judgments but also with regard to their judgments themselves (yes/no).

Four different assumptions might be responsible for the counterintuitive result of higher confidence in false alarms than in correct rejections. First, if students had been absolutely sure that the answer they chose was wrong, they could have selected another one of the four possible responses in the multiple-choice test. Therefore, it seems plausible that their confidence in answers judged to be incorrect would be lower. Second, it is assumed that individuals tend to be more confident about results that are desirable outcomes (like a correctly solved test item). Wishful thinking might play a role, meaning that students truly hoped that their judgment about a correct answer was accurate. If this applies, students’ high confidence in false alarms, although unfavorable from a self-regulated learning perspective, can be explained from a motivational perspective. Third, students’ performance judgments and their SOJs might not really differ (an assumption discussed by Buratti & Allwood, 2012). That is, if students state that they have solved an item correctly, they confirm the statement with their SOJ. However, a previous validation study by the authors (unpublished) does not support this conjecture. In the interview study, the majority of students (17 out of 18) distinguished the performance judgments and the SOJs in the intended manner (i.e., they described the performance judgment as such and the SOJ as confidence in the previously given performance judgment). Finally, the result can be associated with the trace accessibility model of Koriat (1993, 1995). Although Koriat’s studies investigated another type of judgment than we did in our research—namely, feelings of knowing—his suggestions about how judgments emerge might be applicable to SOJs of postdictions at an item level. Judgments then might be higher for false alarms than for correct rejections because the amount of partial information that comes to mind is higher if students think that the answer is correct than if they correctly acknowledge that they do not know the answer.

Higher SOJs for false alarms and hits than for misses might also be explained by the different amount of partial information that comes to mind. The low SOJs for misses might additionally be a result of guessing. That is, if students make a guess because they have absolutely no idea of the correct answer, they might judge the item as not being correctly solved. However, they might assume that the item could nonetheless be correct, on the basis of a 25 % chance of correctly guessing for a multiple-choice item with four options, and might provide a correspondingly low SOJ.

The present study is the first to reveal that SOJs differ with regard to hits, correct rejections, misses, and false alarms. A methodological constraint of this procedure, however, is that the SOJs in each of the four categories were based on unequal numbers of items per person, because the numbers of hits, correct rejections, misses, and false alarms differed between individual students. The implications for future analyses of SOJs in the context of signal detection theory will be outlined in the General Discussion.

General discussion

Are low-performing students aware or unaware of their metacognitive monitoring ability (the unskilled-and-unaware effect proposed by Kruger & Dunning, 1999)? Two studies were conducted with global and local postdictions as indicators for metacognitive monitoring ability, with SOJs as the indicator for metacognitive awareness. In both studies, students were grouped in four performance quartiles, and their metacognitive monitoring ability was investigated via two scores: a difference score and an absolute difference score. In addition, SOJs were analyzed at an item level, and we investigated whether students differ in their metacognitive awareness depending on the combination of items’ correctness and performance judgments classified in terms of signal detection theory.

Functional versus subjective overconfidence in low-performing students

The two studies revealed that in their global and local judgments, low-performing students overestimated their performance (functional overconfidence). Furthermore, the accuracy of performance judgments was lowest in low-performing students (at a descriptive level for the global judgments in Study 1, and significantly different in Study 2, with local judgments). This implies that low-performing students seem to have few chances to improve their future performance, because their overestimations would lead them to feel comparably capable and would not induce them to invest further resources for learning. However, the low-performing students seemed to be aware of their low metacognitive ability: The students in the four performance quartiles differed significantly in their SOJs, with low-performing students being least confident in their judgments, and high-performing students being most confident. These results suggest that low-performing students (although functionally overconfident) are aware of their comparably low metacognitive ability (no subjective overconfidence). That is, the results of Miller and Geraci (2011) for global predictions were extended by using global postdictions and SOJs in an exam setting (Study 1), along with averaged local postdictions and the respective SOJs in Study 2.

The question remains why low-performing students—although overestimating their performance—seem to be aware of their low judgment accuracy. Students who possess the requested test knowledge (i.e., high-performing students) seem to find it easier to judge the number of items correct, and therefore to be confident in their judgments. By contrast, low-performing students likely know that they made some guesses in the multiple-choice test, and consequently are less confident in their performance judgments. Another potential explanation is that low-performing students might have given up on improving their learning or knowledge and are resigned to receiving low grades (cf. Hacker et al. 2000). In that case, at least some of the low-performing students might not be heavily engaged in the monitoring process and might not be motivated to provide accurate judgments that could allow them to take responsibility for future learning. These students might consequently provide lower SOJs.

When we compared global and local performance judgments across all performance quartiles, the global judgments were more accurate than the averaged local ones, which is in line with the confidence–frequency effect. Consistent with the results discussed by Liberman (2004), the local judgments were higher than the actual performance score (overconfidence), and the global judgments were lower than actual performance (underconfidence). Of interest was how this influenced quartile differences; when we compared students at different performance levels, the patterns of results were quite similar for local and global judgments. That is, the type of judgment had no influence on the differences in accuracy between the quartiles.

Item-level analyses

Finally, the local SOJs were categorized in line with signal detection theory, and differences in the SOJs were analyzed according to whether the performance judgments were hits, misses, correct rejections, or false alarms. As expected, SOJs were highest for hits. Contrary to our expectations, however, SOJs were higher for items classified as false alarms than for items classified as correct rejections. Students were more conservative in their confidence ratings if they thought that their answer to an item was incorrect. For the SOJs, it thus seems relevant not only whether the person made a correct performance judgment, but whether or not the person believed that an item was correctly solved. Assuming that students make these considerations while solving a (mock) exam and that SOJs affect future learning, students might overstudy content they already know (but have not identified as such), while neglecting to study content that they (incorrectly) assume they already know.

Limitations of the studies

Four methodological aspects might limit the validity of the studies. First, the settings of the two studies differed (a real exam setting vs. a laboratory study), and the tests were of different difficulties and reliabilities. Nevertheless, the patterns of results were comparable. The lower test scores and low internal consistency of the test in Study 2 might have resulted from the different setting, in which students did not explicitly prepare for the test. Students had already learned the subject matter of the test, but they seemed to have forgotten parts of it, with different students forgetting different parts. Nonetheless, the low reliability of the test limits the significance of the results presented in Study 2.

Second, assessing local and global judgments within the same sample and the same test, as was done in Study 2, has a drawback that might also have affected earlier studies: Given the fact that the local judgments had already been provided beforehand, the global judgments might have been influenced by them. Recent research by Hartwig and Dunlosky (2014), however, has refuted this concern, inasmuch as the authors did not find any dependence of global judgments on previously provided local judgments. A comparison between two different tests, one using local and the other using global judgments, would, in addition, be less economical and would hamper comparison of the judgments—for instance, through possible discrepancies in test difficulty.

Third, the analyses of differences and absolute differences according to performance quartiles lacked statistical independence, since both scores shared some variance with the performance score.

Finally, our data do not provide information about whether low-performing students are more conservative in their confidence ratings because they think their performance judgments are too high or because they think their performance judgments are too low (see also Miller & Geraci, 2011). It also might well be that low-performing students generally provide lower performance and confidence judgments than high-performing students. Hence, although the low-performing students were admittedly the most overconfident, they nevertheless provided the lowest performance judgments relative to the other quartiles. However, a review of data from another study (Jacob, Händel, Markus, & Eberle, 2014) that used a subsample of the Study 1 participants indicated that students in the lowest performance quartile did not generally score lower on questionnaire items that asked, for example, about self-efficacy, learning goals, or the use of learning strategies. Further measures, such as the fit of the absolute difference scores and the SOJs (cf. Händel & Fritzsche, 2015), could provide more evidence on whether low-performing students’ SOJs can in fact be regarded as appropriate.

Be that as it may, in our study, the analysis of SOJs at an item level provided information about the types of items for which students in general made lower SOJs (i.e., correct rejections and misses).

Implications

First, we discuss the implications of the assessment procedure we used. Second, we discuss potential work for future studies. Our studies applied Miller and Geraci’s (2011) procedure of asking for performance judgments and the respective SOJs through both global and local postdictions. In our view, postdictions of raw performance scores are a valid measurement of metacognitive monitoring. In particular, judgments of raw performance are assumed to be an easier task than percentile rank judgments (Hartwig & Dunlosky, 2014), because judging raw performance requires only judging the self, whereas judging performance relative to others (percentile ranks) requires both judging the self and estimating the performance of others. Although methodological effects (such as regression to the mean or the better-than-average effect) and motivational biases (such as the self-enhancement bias or wishful thinking) might be an issue for the accuracy of performance judgments, they are assumed to be more evident in research that asks for individuals’ estimations of performance relative to others than for judgments of raw performance scores (cf. Kwan, John, Kenny, Bond, & Robins, 2004, on the distinction of self-enhancement as social comparison and self-enhancement as self-insight). This argument also applies to SOJs, which are regarded as a valid measurement of metacognitive awareness. In addition, the results for the SOJs in our studies differed from those for the performance judgments, revealing that SOJs are not functionally equivalent to performance judgments. Specifically, asking for SOJs is not the same as asking twice for a performance judgment (as was discussed by Allwood, Granhag, & Johansson, 2003). Notwithstanding, performance judgments and SOJs are probably both influenced by the particular content knowledge. That is, the more information or content knowledge regarding a specific item that students have at their disposal, the more strongly they believe that they provided the correct answer to the item, and consequently they are confident that their postdiction was correct. Nevertheless, because SOJs are not directly linked to performance in a domain but to metacognitive monitoring ability, they are regarded as a valid indicator for assessing metacognitive awareness. When asked for SOJs, students likely feel compelled to take a critical stance and to reflect on behavioral consequences, such as engaging in learning activities to fill knowledge gaps (Buratti & Allwood, 2012; Metcalfe, 1998). Indeed, making an SOJ might be similar to the second stage of the two-stage process discussed for delayed judgments of learning (cf. Son & Metcalfe, 2005; see also the two-process hypotheses discussed by Dunlosky et al., 2005). Our results indicate that low-performing students are not completely unaware, which should encourage researchers and practitioners to support students in providing accurate judgments. Indeed, study results by Ryvkin, Krajč, and Ortmann (2012) indicated that the unskilled are not doomed to remain unaware.

To strengthen the outcomes on postdictions combined with SOJs, the present studies need to be replicated using, for instance, other tests, domains, and samples. In particular, a more reliable test would be needed when investigating local performance judgments and their respective SOJs. Because the focus of our studies was on SOJs rather than on the metacognitive monitoring judgments themselves, we decided to calculate a difference score and to categorize the performance judgments according to signal detection theory. Because our results are aggregated across individuals (who of course differed in terms of their numbers of hits, correct rejections, false alarms, and misses), multilevel analyses would be needed to gain further insight into SOJs for local judgments. The analysis of SOJs from a multilevel perspective looks very promising for further studies on the nature of SOJs (Murayama, Sakaki, Yan, & Smith, 2014). In addition, further measures to assess students’ awareness of their metacognitive monitoring abilities need to be implemented (see above for recent developments of measures based on signal detection theory). For instance, measures of resolution can provide information about whether students are able to discern between correct and incorrect answers. More specifically, future studies could include measures based on signal detection theory, such as dₐ, which is the distance between the means of the distributions of metacognitive judgments of hits versus false alarms (cf. Masson & Rotello, 2009); other measures to assess students’ calibration are discussed in Schraw et al. (2013).
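To illustrate what such a signal-detection-based measure involves, the following sketch (hypothetical values; not part of our analyses) computes dₐ from a hit rate, a false-alarm rate, and a given zROC slope s; with s = 1 the formula reduces to the familiar equal-variance d′. Estimating s itself requires fitting the confidence-rating ROC (see Masson & Rotello, 2009), which is omitted here.

```python
# Illustrative sketch only: d_a = sqrt(2 / (1 + s**2)) * (z(HR) - s * z(FAR)),
# where HR is the hit rate, FAR the false-alarm rate, and s the zROC slope.
import math
from statistics import NormalDist

def d_a(hit_rate: float, fa_rate: float, s: float = 1.0) -> float:
    z = NormalDist().inv_cdf
    return math.sqrt(2.0 / (1.0 + s ** 2)) * (z(hit_rate) - s * z(fa_rate))

print(d_a(hit_rate=0.80, fa_rate=0.45, s=1.0))  # equal-variance case (d')
print(d_a(hit_rate=0.80, fa_rate=0.45, s=0.8))  # unequal-variance d_a
```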

Our results indicate that under- and overestimation cancel each other out in the difference score, which would lead to an overvaluation of students’ accuracy if it were used as the sole measure of metacognitive judgments, as has been done in earlier studies. For that reason, we recommend calculating an accuracy score, such as an absolute difference score, in addition to a difference score. Finally, future research will need to investigate the impact of our findings on students’ learning processes.

Conclusion

According to the results of the present studies and the work by Miller and Geraci (2011), there is strong evidence that, although low-performing students are inaccurate in their performance judgments and tend to overestimate their own performance, they do not appear to be overconfident in the accuracy of these judgments. An analysis of their SOJs suggests that even if students are functionally unaware, they are subjectively aware. That is, although low-performing students overestimated their performance, they seemed to know this, or were at least less confident in their performance judgments than high-performing students. For self-regulated learning, functional overconfidence seems less problematic, provided that students are at least subjectively aware. If they implicitly know that they might not be as good as they think they are, they will presumably take responsibility for their learning.