1 Introduction

Many earlier psychological and social–psychological studies claim that people are overconfident, a finding sometimes labelled the better-than-average effect.Footnote 1 However, recent economic studies that analyze choice behavior instead of verbal reports show that individuals are able to estimate their skills correctly or even underestimate them.Footnote 2 A biased self-assessment can lead to systematic biases in individuals’ decision making, which makes the topic of high interest to economists.Footnote 3 Yet, barely any research has analyzed how overconfidence, in comparison to underconfidence, is perceived by others, and whether individuals adapt their reported self-assessment to others’ perception.

I use a controlled laboratory study to address this topic, thereby focusing on two aspects: first, I analyze whether underconfident individuals are perceived as more or less likeable than overconfident individuals. Secondly, I explore whether under- or overconfidence is perceived as a stronger signal for ambition and effort. As one’s self-assessment might be a signal of one’s actual performance, and therefore not the self-assessment but the estimated performance might be rewarded, I control for performance as follows: the individuals compared exhibit the same relative rank based on their performance in a task, conducted at the beginning of the experiment. My findings contribute to the expanding literature analyzing the advantages of overconfidence in comparison to underconfidence. Thereby I add to the questions why individuals might (rationally) exhibit a bias in their reported self-assessment and in which situations we should expect individuals to over- or underestimate themselves. How reported over- or underconfidence is perceived by others is difficult to analyze in the field as self-confidence interacts with other characteristics in many ways. The anonymous laboratory setting allows me to separate the causal effects of over- and underconfidence on others’ appraisal, by only varying the accuracy of subjects’ reported self-assessment.

The experiment consists of two parts. In the first part all subjects perform an incentivized real effort task which serves as the basis for their self-assessment. In the second part two thirds of the subjects (agents) are assigned a rank based on their relative performance, where each rank is assigned to two subjects. The two agents having the same rank are assigned to one of the remaining participants (principals). Both agents estimate their relative rank and the principal learns by how many ranks each of them over- or underestimated himself.Footnote 4 In treatment SYMP the principal chooses which of the two agents he wants to give 5 Euros to. In treatment PERF the principal has a monetary incentive to choose the agent who performs better in a repetition of the real effort task. The only information the principal gets is the deviation of the agents’ self-assessments and the information that both agents have the same actual rank. This element of the design is essential as subjects on higher ranks are more likely to be underconfident, while subjects on lower ranks are more likely to be overconfident, due to the limited scale for self-assessment. If subjects’ actual ranks differed, principals might choose the underconfident agent not because they prefer underconfidence, but because underconfidence might signal a higher actual rank.

The results show that it can be advantageous to be underconfident with respect to the perception of others. In SYMP principals reward the underconfident agent significantly more often than the overconfident agent. In PERF principals bet on the underconfident agent significantly more often than on the overconfident agent. Questionnaire data reveals that underconfidence is preferred over overconfidence, and that the less self-confident agent is expected to exert more effort to improve himself, while the more self-confident agent is expected to rest on his high self-perception.

I also analyze whether the antipathy towards overconfidence is anticipated by eliciting agents’ (incentivized) beliefs of principals’ selection choices. Moreover, to test whether agents strategically bias their self-assessment in order to increase their selection chances, I conduct two control treatments without monetary incentives to be selected by the principal. Agents’ beliefs in PERF show that they do not expect underconfidence to signal a higher performance than overconfidence. Correspondingly, there is no difference in self-assessment between PERF and its control treatment. In contrast, agents in SYMP anticipate that underconfidence is rewarded significantly more often than overconfidence, and men state marginally significantly lower ranks in SYMP than in the non-strategic control treatment. Yet, there is no difference in self-assessment for women. One possible explanation is that women do not downgrade their self-assessment strategically. Yet, I rather suggest that they lower their self-assessment even in the non-strategic setting, as its accuracy is still observable. Thus, they might still be afraid that their image might suffer when being overconfident.Footnote 5 Furthermore, women and men might downgrade their self-assessment in non-strategic settings due to an idea which goes back to Myerson (1991). He suggests that people internalize optimal behavior from certain situations and behave the same way in similar but different situations.Footnote 6 Thus, it might be the case that women have somehow internalized the social norm of modesty and downgrade their self-assessment even in environments in which the (monetary) need for modesty is absent.Footnote 7

There is an extensive and expanding literature on overconfidence. One strand of this literature analyzes whether people are overconfident (e.g., Clark and Friesen 2009; Hoelzl and Rustichini 2005; Svenson 1981), thereby focusing on the definition of overconfidence, the appropriate measurement, and influencing factors (see e.g., Benoît and Dubra 2011; Moore and Healy 2008). This paper is rather related to the literature identifying potential consequences of a biased self-assessment. Within this literature many papers focus on non-payoff maximizing decisions caused by a biased self-assessment, e.g., overinvestment, value-destroying mergers of CEOs (Camerer and Lovallo 1999; Malmendier and Tate 2005, 2008; Odean 1999), and suboptimal selection of payment schemes (Dohmen and Falk 2011; Niederle et al. 2013; Niederle and Vesterlund 2007), or work environments (Niederle and Yestrumskas 2008). Another strand of the literature, mainly theoretical work, identifies utility enhancing aspects of being overconfident, providing (behavioral) explanations for overconfidence at the same time. Overconfidence may directly enhance well-being (Akerlof and Dickens 1982; Brunnermeier and Parker 2005; Caplin and Leahy 2001; Koszegi 2006), boost one’s motivation and willpower (Bénabou and Tirole 2002; Brocas and Carrillo 2000), or increase performance (Compte and Postlewaite 2004).

Only very few recent papers consider the impact of one’s self-assessment on others, and whether individuals account for others’ perception when stating their self-assessment. Ewers and Zimmermann (2012) theoretically and experimentally analyze whether individuals bias their self-assessment due to image concerns. They find that individuals state a higher self-assessment if reports (but not true performances) are observed by an audience (anonymity is lifted) in comparison to the situation in which reports are private.Footnote 8 Yet, they find that individuals do not bias their reported self-assessments if the true performance is also publicly revealed. Thus, in their paper subjects do not try to signal high confidence or modesty if the accuracy of one’s self-assessment is observable. However, in contrast to my study, in Ewers and Zimmermann (2012) individuals have no monetary incentives to strategically bias their reported self-assessment. Moreover, their study does not analyze how one’s reported self-assessment is perceived by others, i.e., if one’s social image or expected ability is actually increased by stating a higher self-assessment.

The perception of self-assessment and the strategic incentives to bias one’s self-assessment are addressed in an experimental study by Charness et al. (2012). They investigate whether individuals bias their stated confidence about their performance to deter or motivate others to enter a two-player tournament, and whether others react to it. They find that males inflate their stated confidence when deterrence is strategically optimal, and that men and women deflate their confidence if encouraging entry is strategically optimal. Moreover, they observe that the higher the stated confidence of the other person is, the less likely individuals are to enter the competition. In line with these results Reuben et al. (2012) observe that men inflate their self-assessment to be voted as the group leader, which turns out to be a successful strategy.Footnote 9 However, in contrast to my experimental design, in Charness et al. (2012) and Reuben et al. (2012) individuals’ actual performance is unknown and might strongly differ. Expecting all individuals to exhibit the same bias in self-assessment, the ranking of subjects’ self-assessment most likely corresponds to the ranking of subjects’ true performance. Thus, it is not the bias in self-assessment that reveals information about the actual performance, but the self-assessment per se. This is different in my experiment, in which both agents have the same actual (relative) performance, enabling me to investigate whether the bias in self-assessment serves as a performance signal. In many real-life situations in which individuals have to assess their performance, e.g., in promotion interviews or wage negotiations, the true performance is somehow appraised or at least partly known. Thus, the accuracy of the self-assessment might also be evaluated.

To the best of my knowledge this study is the first to experimentally test whether a principal prefers an over- or an underconfident agent. Yet, theoretical studies exist that provide different predictions. Gervais and Goldstein (2007) suggest that skill and effort are complements, thus an overconfident agent makes a higher effort choice due to underestimating the cost of effort or overestimating his marginal productivity. Sautmann (2013) suggests that overconfident agents overestimate their expected payoff, thus receiving higher incentives with the same wage. In contrast, Santos-Pinto (2008) suggests that a positive self-image and effort are substitutes, as an overconfident agent thinks that he has to exert less effort for the same outcome than an underconfident agent.

This paper might also contribute to the literature on gender differences in self-assessment.Footnote 10 I suggest that the observed antipathy towards overconfidence adds to the explanation of the gender difference in reported self-assessment, as women report experiencing emotions, i.e., the negative attitude towards overconfidence, more intensely than men (see e.g., Brody 1997; Grossman and Wood 1993). In addition, they might even be punished more harshly than men when being self-confident (Eagly 1987; Rudman 1998). My results could moreover provide an explanation for the findings of Ludwig and Thoma (2012), who observe that women are ashamed to overestimate themselves in public, while men are not.

The rest of this paper is structured as follows. In Sect. 2 I describe the experimental design and the two different treatments. In Sect. 3 I present the selection behavior of principals. In Sect. 4 I analyze whether agents anticipate principals’ preferences and whether they strategically bias their self-assessment. I conclude in Sect. 5.

2 Experimental design

The experiment consists of two parts and a questionnaire, with separate instructions for each part. In part 1 all participants conduct a real effort task (task 1), which is to solve Raven’s Advanced Progressive Matrices, a measure of cognitive ability (Raven 2003). For each matrix participants have to select one out of eight symbols fitting the visual pattern of the matrix. An example of a matrix is given in Fig. 1.

Fig. 1 Example of a Raven Advanced Progressive Matrix

Both ability and effort are needed to succeed in this task. The participants have 5 min to solve as many matrices as possible. After choosing a symbol, they receive feedback whether their chosen symbol is correct, and thereafter, the next matrix appears. Once they have chosen a symbol, they cannot go back and correct it. It is not possible to skip a matrix without making a choice either. On subjects’ screens the remaining time as well as the number of correctly and wrongly solved matrices is displayed. The maximum number of matrices is 22; none of the subjects managed to reach the last matrix. Subjects are informed about their absolute performance, but not about their relative performance or the performance of others. They receive five tokens for each matrix they solve correctly. For each wrong answer five tokens are deducted from their earnings. Yet, they receive at least zero tokens for part 1. During the whole experiment participants earn tokens, which are converted into Euros at the end of the experiment, at an exchange rate of 1 Euro for 10 tokens. Before the task starts, participants solve two matrices as a trial without payment. After part 1 is finished the instructions for part 2 are distributed. The instructions for both parts are read aloud.
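The payment rule for part 1 can be summarized in a few lines of Python. This is only an illustrative sketch of the rule as described above, not the software used in the experiment; the function and constant names are hypothetical.

```python
# Illustrative sketch of the part 1 payment rule (names are hypothetical).
TOKENS_PER_MATRIX = 5   # tokens gained per correct answer and lost per wrong answer
TOKENS_PER_EURO = 10    # exchange rate: 10 tokens = 1 Euro


def part1_tokens(correct: int, wrong: int) -> int:
    """Tokens earned in part 1: +5 per correct, -5 per wrong, floored at zero."""
    return max(0, TOKENS_PER_MATRIX * (correct - wrong))


def tokens_to_euros(tokens: int) -> float:
    """Convert tokens to Euros at the end of the experiment."""
    return tokens / TOKENS_PER_EURO
```

For instance, a participant with nine correct and three wrong answers would earn 30 tokens, and a participant with more wrong than correct answers would earn zero tokens for part 1.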

At the beginning of part 2 subjects are randomly assigned a role. Out of the 24 participants in each session, the role A is assigned to 8 participants (principals) and the role B is assigned to 16 participants (agents). All 16 agents are ranked from 1 to 8, according to their performance in task 1, where each rank is assigned to two agents. The best and second best agent receive rank 1, the third and fourth best receive rank 2 and so on. The worst and second worst agent receive rank 8.Footnote 11 The two agents having the same rank form a pair, i.e., there are eight pairs in each session. Each pair is randomly assigned to one of the eight principals.
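The ranking and pairing procedure can be sketched as follows. The sketch assumes that performance is summarized by a single score per agent and that ties are broken randomly; both are assumptions made for illustration, and the function names are hypothetical.

```python
import random


def assign_ranks(scores, rng=None):
    """Map each agent to a rank from 1 (best) to len(scores) // 2 (worst).

    scores: dict of agent id -> task 1 score (assumed: higher is better).
    Two consecutive agents in the performance ordering share one rank.
    """
    rng = rng or random.Random()
    agents = list(scores)
    rng.shuffle(agents)  # assumption: ties in score are broken randomly
    # sort() is stable, so tied agents keep their shuffled relative order
    agents.sort(key=lambda agent: scores[agent], reverse=True)
    return {agent: position // 2 + 1 for position, agent in enumerate(agents)}


def pair_by_rank(ranks):
    """Group the agents who share a rank into pairs: rank -> list of two agents."""
    pairs = {}
    for agent, rank in ranks.items():
        pairs.setdefault(rank, []).append(agent)
    return pairs
```

With 16 agents this yields eight pairs, one for each rank, and each pair can then be matched randomly to one of the eight principals.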

I conduct two main treatments and two control treatments. In each treatment both agents estimate their rank between 1 and 8 and the principal selects one of the two agents. The only information the principal receives when making his choice is the deviation of the agents’ self-assessment, i.e., whether an agent under- or overestimated himself and to what extent. What differs between the treatments are the principal’s incentives for his choices.

Agents receive 20 tokens if their estimated rank corresponds with their actual rank. The payment for the accuracy of the guessed rank was not announced in the instructions, but only on the screen of the agents. Otherwise an inequity averse principal might reward the agent who did not estimate his rank correctly independent of his preferences for an accurate self-assessment.Footnote 12

As the scale for subjects’ relative self-assessment is limited, it naturally occurs that subjects on a higher rank are more likely to underestimate their rank and subjects on a lower rank are more likely to overestimate their rank. If the actual ranks of the two agents differed, underconfidence might signal a higher actual rank than overconfidence. Therefore, the principal’s choice might be influenced by beliefs about the agents’ actual ranks, mitigating the attitude towards over- or underconfidence. I exclude this effect by pairing two subjects on the same rank. Then, the deviations of agents’ self-assessments might only hint at a difference in the absolute performance, the agents’ expected performance or their self-confidence.

2.1 Treatment sympathy (SYMP)

In this treatment each principal selects one of his two agents, who then receives 50 tokens. To exclude fairness concerns of the principal, it is neither possible to split the 50 tokens nor to avoid the decision. The only information the principal receives about the two agents is the deviation of their guessed ranks from their actual rank. Based on this information, he chooses to whom to give the 50 tokens. I use the strategy vector method (SVM): before the principal learns the actual deviations of the agents, each principal makes a decision for each of the same 12 situations. His choice becomes relevant for the situation that actually applies to the agents’ deviations. For the case that none of the situations applies, the principal also chooses whether the agent who estimates the higher (better) rank or the agent who estimates the lower (worse) rank shall receive the 50 tokens. The accuracy of the agents’ estimated ranks is not considered in this decision.Footnote 13

The different situations are listed in Table 1. The deviation is negative if an agent underestimates himself and it is positive if an agent overestimates himself.Footnote 14
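The sign convention can be made concrete with a short sketch. It assumes that the deviation is computed as the actual rank minus the estimated rank, with rank 1 being the best rank; this formula is inferred from the sign convention above and is an assumption, as are all names in the code.

```python
N_RANKS = 8  # ranks run from 1 (best) to 8 (worst)


def deviation(actual_rank: int, estimated_rank: int) -> int:
    """Assumed definition: negative = underestimation, positive = overestimation."""
    for rank in (actual_rank, estimated_rank):
        if not 1 <= rank <= N_RANKS:
            raise ValueError(f"rank must lie between 1 and {N_RANKS}")
    return actual_rank - estimated_rank


def label(dev: int) -> str:
    """Classify a deviation by its sign."""
    if dev < 0:
        return "underconfident"
    if dev > 0:
        return "overconfident"
    return "accurate"


def feasible_deviations(actual_rank: int) -> list:
    """All deviations an agent on a given rank can produce on the 1-8 scale."""
    return [deviation(actual_rank, estimate) for estimate in range(1, N_RANKS + 1)]
```

The last helper illustrates the limited scale: an agent on rank 1 cannot produce a positive deviation, and an agent on rank 8 cannot produce a negative one, which is why both agents of a pair share the same actual rank.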

Table 1 The situations for which principals take a decision

In situations 1a, 1b, 2a, and 2b one agent estimates his rank correctly and the other agent either under- or overestimates himself. In situations 3a and 3b both agents either under- or overestimate their ranks, but the absolute deviation differs. These situations are to test whether accuracy is preferred over over- and underconfidence. In situations 4a, 4b, 5a, and 5b one agent is over- and the other agent is underestimating his rank and the absolute deviations of the agents differ. The principals’ choices in these situations will reveal whether the size of the deviation is more important than the direction of the deviation. In situations 6a and 6b one agent under- and the other overestimates his rank and both agents have the same absolute deviation. Here the decisions of the principals will reveal whether they prefer under- or overconfidence.Footnote 15

For every situation the principal selects the agent he wants to receive the 50 tokens. If the two agents estimate the same rank, and therefore have the same deviation, a situation not covered by the decisions of the principal, chance determines which agent receives the 50 tokens.Footnote 16
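How a principal’s pre-committed decisions translate into an actual selection can be sketched as follows. The data representation is hypothetical: situations are keyed by the unordered pair of deviations, and only situations 6a (deviations of −1 and +1) and 6b (−2 and +2) are filled in, since their deviations are stated in the text. The sketch also assumes the sign convention in which a larger deviation corresponds to a better stated rank.

```python
import random

# Hypothetical encoding: situation (pair of deviations) -> deviation of the chosen agent.
EXAMPLE_DECISIONS = {
    frozenset((-1, 1)): -1,  # situation 6a: this principal picks the underconfident agent
    frozenset((-2, 2)): -2,  # situation 6b: same choice
}


def resolve_selection(decisions, fallback, dev_a, dev_b, rng=None):
    """Return (selected agent, reason) for agents "a" and "b".

    fallback is "better" or "worse": which stated rank the principal rewards
    when none of the listed situations applies.
    """
    rng = rng or random.Random()
    if dev_a == dev_b:
        # Same estimated rank: chance decides.
        return rng.choice(["a", "b"]), "chance"
    key = frozenset((dev_a, dev_b))
    if key in decisions:
        return ("a" if decisions[key] == dev_a else "b"), "situation"
    # Both agents share the actual rank, so the larger deviation is the better stated rank.
    a_states_better = dev_a > dev_b
    if fallback == "better":
        return ("a" if a_states_better else "b"), "fallback"
    return ("b" if a_states_better else "a"), "fallback"
```

Because the principal commits to a choice for every situation before learning the realized deviations, one obtains a complete decision profile per principal rather than a single observation.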

After his decisions, the principal learns the actual deviations of the two agents and thus which situation becomes relevant. He neither learns the actual nor the estimated ranks of the agents. The agents learn their actual rank at the end of the experiment, but do not learn the guessed rank of the other agent. They get informed whether the principal selected them or not, or whether the selection happened by chance.

Every principal receives 50 tokens for part 2 independent of his decision.

Control-treatment for agents (SYMP-CON)

To check whether agents bias their self-assessment in order to be selected by the principal, I conduct a control treatment in which the agent receiving the 50 tokens is not selected by the principal, but by chance. However, to keep social image concerns constant, each pair of agents is assigned to a principal and the principal learns the actual deviations of the two agents. Moreover, I ask the principals to make the same (hypothetical) decisions as in SYMP, but without any monetary consequences for any participant. Agents do not know that principals make these hypothetical choices.

2.2 Treatment performance (PERF)

In PERF the principal takes the same decisions as in SYMP, i.e., using the SVM, the principal chooses one out of two agents, based on the deviations of their self-assessments. However, the motivation of the principal is different. While in SYMP the principal’s choice does not influence his monetary payoff, in PERF it does. To maximize his expected earnings, the principal should choose the agent with the higher expected performance in a repetition of task 1, which is called task 2.

After the principals’ decisions, all agents perform the same task as in part 1, with different matrices. Every agent solves the matrices for himself, but the payment scheme is competitive, with the two agents competing against each other. The agent, who achieves the higher difference of correctly minus wrongly solved matrices, receives 50 tokens. In case of a tie, the agent having solved more matrices correctly, receives the 50 tokens. If this number is also equal, chance decides.
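The rule that determines the winner of task 2 can be written down compactly. The following is a minimal sketch of the rule as described above, with hypothetical names; each agent’s result is represented as a pair of correctly and wrongly solved matrices.

```python
import random


def task2_winner(result_a, result_b, rng=None):
    """Return "a" or "b" given (correct, wrong) counts for each agent."""
    rng = rng or random.Random()
    correct_a, wrong_a = result_a
    correct_b, wrong_b = result_b
    score_a, score_b = correct_a - wrong_a, correct_b - wrong_b
    if score_a != score_b:
        # Primary criterion: correct minus wrong answers.
        return "a" if score_a > score_b else "b"
    if correct_a != correct_b:
        # First tie-breaker: more correctly solved matrices.
        return "a" if correct_a > correct_b else "b"
    # Second tie-breaker: chance.
    return rng.choice(["a", "b"])
```

For example, results of (8, 3) and (6, 1) both give a difference of five, so the agent with eight correct answers wins on the first tie-breaker.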

The principals do not participate in task 2. They bet on the agent chosen in situations 1–12. If the agent who the principal bets on wins the competition, the principal receives 50 tokens. If both agents estimate the same rank, chance determines who the principal bets on. After his decisions the principal learns about the actual deviations of both agents and which situation is relevant. As in SYMP, the principal does not learn the actual or the estimated ranks of the agents. An agent receives 50 tokens if the principal bets on him. This monetary incentive is introduced to analyze whether agents bias their self-assessment in order to be selected. The payment is only announced on the agents’ screens, thus principals do not know that an agent receives money when they bet on him, in order to avoid that feelings of sympathy and inequity aversion influence principals’ choices. An agent learns whether the principal bet on him, but only after task 2. This is common knowledge. An agent does not learn the estimated rank of the other agent, except when both agents estimated the same rank; in that case they learn that a chance move decides on whom the principal bets.

Control-treatment for agents (PERF-CON)

The only difference between PERF-CON and PERF is that in PERF-CON agents do not receive any money when selected by the principal. Thus, they do not have a monetary incentive to bias their reported self-assessment. As principals do not know about agents’ payment when being selected, PERF-CON and PERF are identical for principals.

In both treatments all subjects are asked additional incentivized questions. In particular they estimate the mean deviation of agents and the choice behavior of principals. After the announcement of the payoffs, subjects complete a questionnaire asking for their age, gender, subject of study, choice motivation, and a self-assessment of risk preferences (Dohmen et al. 2011). Figure 2 illustrates the timeline of the treatment PERF. SYMP has the same course except that there is no task 2.

Fig. 2 The timeline of PERF

2.3 Experimental procedure

I conducted the computerized experiment in the Munich Experimental Laboratory for Economic and Social Sciences (MELESSA) at the University of Munich during spring 2012. The experiment was programmed and conducted with the software z-tree (Fischbacher 2007) and participants were recruited via ORSEE (Greiner 2004). In total 216 subjects participated in the experiment (mainly students from the universities in Munich). I ran three sessions of SYMP and two sessions of each of the other treatments SYMP-CON, PERF and PERF-CON. Subjects were randomly assigned to sessions and could take part in one session only. Each session had 24 subjects and lasted a little less than 1 h. Subjects earned 12.15 Euros on average (including a show-up fee of 4 Euros).

3 Experimental results

In this section I analyze the choice behavior of principals. I will analyze agents’ self-assessment in the next section.

3.1 Treatment SYMP

For the analysis of principals’ choices I pool the real (SYMP) and hypothetical (SYMP-CON) choices as they do not differ significantly.Footnote 17 Moreover, in this section, I refer to the results in SYMP which are actually the pooled results of SYMP and SYMP-CON. Table 2 reports the shares of principals choosing agent 1 or agent 2 for each of the 12 different situations.

Table 2 Principals’ selection behavior in SYMP

The results clearly show that principals have a preference for accuracy. If agents’ absolute deviations differ (situations 1a–5b), the agent who guesses his rank correctly (situations 1a–2b) or who has the smaller absolute deviation (3a–5b) is selected significantly more often.

Result 1

Principals reward agents with a correct guess or with smaller deviations significantly more often than agents with larger deviations, regardless of the sign of the deviation.

If the absolute deviations are the same (situations 6a and 6b) principals favor agents who are underconfident rather than overconfident. In situation 6a 65 % of principals (26 out of 40) reward the agent who underestimates his rank by one rank, whereas only 35 % of principals (14 out of 40) reward the agent who overestimates his rank by one rank. This distribution is weakly significantly different from a 50:50 split (two-sided binomial test, p = 0.080). The result is even stronger in situation 6b, in which 87.5 % of principals (35 out of 40) reward the agent who underestimates his rank by two ranks and only 12.5 % of principals (5 out of 40) reward the agent who overestimates his rank by two ranks (two-sided binomial test, p < 0.001). If the absolute deviation is unknown, and principals are asked to choose either the agent estimating the lower rank or the agent estimating the higher rank, 67.5 % of principals reward the agent estimating the lower rank (two-sided binomial test, p = 0.038).
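For readers who wish to retrace the binomial tests, the following sketch computes an exact two-sided p-value against a 50:50 split by doubling the upper tail, which is valid because the null distribution is symmetric. It is an illustration only and not necessarily the routine used for the reported values, so small differences in the third decimal are possible. The count of 27 out of 40 is derived here from the reported share of 67.5 %.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int) -> float:
    """Exact two-sided binomial test of H0: p = 0.5."""
    # By symmetry, use the larger of the two counts and double its upper tail.
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


for label, successes, n in [("6a", 26, 40), ("6b", 35, 40), ("no situation applies", 27, 40)]:
    print(label, round(binomial_two_sided_p(successes, n), 4))
```

Under this convention the three p-values come out at roughly 0.08, well below 0.001, and roughly 0.038, in line with the pattern of significance reported above.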

Result 2

Agents who are underconfident are rewarded significantly more often than agents who are overconfident if the absolute deviation is the same.

The preference for underconfidence over overconfidence can also be seen when taking a closer look at situations 4 and 5. In 4a (5a) only 5 % (2.5 %) of principals choose the less accurate, overconfident agent, whereas in 4b (5b) 32.5 % (27.5 %) choose the less accurate, underconfident agent. The differences between situations 4a versus 4b and 5a versus 5b are significant (McNemar p = 0.007 and 0.002). Note that there is no difference in selection behavior between situations 1a versus 1b and 2a versus 2b. Thus, the higher selection rates of the less accurate agent in situations 4b and 5b, in comparison to 4a and 5a, seem to be driven by an aversion towards overconfidence.

These results are confirmed by probit regressions, which are reported in Table 3.Footnote 18 The dependent variable is 1 if the agent with the larger absolute deviation is selected. I include a dummy for the situations, in which the agent with the larger absolute deviation states the lower rank (situations 1b, 2b, 3b, 4b, and 5b). Column (1) includes principals’ choices for the situations 1–3, i.e., the situations in which either one agent has a deviation of 0 (situations 1 and 2), or both agents are either under- or overconfident (situations 3). Column (2) includes the principals’ choices for the situations 4 and 5, i.e., all situations in which agents have a different absolute deviation and one agent is under- and the other agent is overconfident. In both regressions I include control variables for risk and gender (of principals) and cluster on principals.Footnote 19

Table 3 Probit of selection of the less accurate agent in SYMP

The coefficient of the dummy variable in situations 1–3 is not significant, while in situations 4–5 it is highly significant. This shows that the less accurate agent is more likely to be selected if he is under- and the other agent is overconfident.

A modest agent who underestimates his performance is more likely to be rewarded by the principal than an overconfident agent, who believes that his performance has been better than it actually was. In the questionnaire answered by subjects at the end of the experiment, a reasoning for this behavior is revealed: most subjects prefer modest persons to self-confident persons. Some subjects even stated explicitly that they dislike people who are overconfident.

3.2 Treatment PERF

As the treatments PERF and PERF-CON are identical for the principals and only differ in the agents’ payment about which principals are not informed, I pool the data of the principals’ decisions and talk about the results in PERF, while actually meaning the results of PERF and PERF-CON. The principals’ choice behavior is reported in Table 4.

Table 4 Principals’ selection behavior in PERF

In contrast to SYMP the preference for accuracy is less pronounced. In situations 1a, 2a, 3a, 4a, and 5a the more accurate agent is the one being less confident about his performance. Here, significantly more principals select the more precise and less confident agent. However, in situations 1b, 2b, 3b, 4b, and 5b the more accurate agent is the one being more confident about his performance. Here, the more accurate agent is not selected significantly more often, i.e., selection rates are not significantly different from a 50:50 split as confirmed by binomial tests reported in Table 4. Note that even for situations 1b and 2b, in which one agent correctly assesses himself and the other agent underestimates himself, roughly 50 % of principals select the underconfident agent.

Result 3

More accurate agents are expected to perform better than overconfident agents.

Result 4

Underconfident agents are expected to perform as well as agents who estimate their rank more precisely or even correctly.

If agents’ absolute deviations are equal (situations 6a and 6b), the underconfident agent is selected more often than the overconfident agent. In situation 6a 62.5 % of principals (20 out of 32) bet on the agent who underestimates his rank by one rank, while only 37.5 % (12 out of 32) bet on the agent who overestimates his rank by one rank. However, the result is not significant when using a two-sided binomial-test (p = 0.216). Yet, in situation 6b 75 % of principals (24 out of 32) bet on the agent underestimating his rank by two ranks, while only 25 % (8 out of 32) bet on the agent overestimating his rank by two ranks. This is significantly different from a 50:50 split (two-sided binomial-test, p = 0.008). Moreover, 62.5 % select the agent estimating the lower rank and 37.5 % select the agent estimating the higher rank (p = 0.216).

Result 5

Underconfidence seems to be perceived as a stronger signal for future performance than overconfidence.

These results also become apparent when conducting the same two probit regressions as in SYMP. The results are reported in Table 5.Footnote 20

Table 5 Probit of selection of the less accurate agent in PERF

As above, the dependent variable is 1 if the principal selects the agent with the higher absolute deviation. The significance of the coefficients of the dummy variables shows that the less accurate agent is more likely to be selected if he is underconfident. Note that in contrast to SYMP the coefficient in column (1) is also highly significant. This shows that even if the more accurate agent is assessing himself correctly (or the deviation of his self-assessment goes in the same direction), the less accurate agent is more likely to be selected if he is under- and not overconfident.

In addition, the within-subject analysis of principals’ choices shows that roughly one third of the principals always select the agent stating the lower rank, independent of the agents’ absolute deviations. Another third of principals always select the more accurate agent. Very few principals always select the agent stating the higher rank, and some principals seem to select randomly. Thus, in situations in which selection is distributed equally, the selection does not occur randomly, but most principals have a preference for accuracy or a low self-assessment. This is also confirmed by answers given in the questionnaire.

The data shows that both agents are equally likely to win the competition, independent of their self-assessment. Considering all agents stating the worse rank, a share of 52 % (14 out of 27) wins task 2.Footnote 21 Yet, only very few principals seem to choose randomly, but rather seem to have a clear preference. In the questionnaire I asked subjects to state reasons for their choice. The most frequently given answer was that they expect the underconfident agent to try harder to improve and thus to exert more effort in task 2 than the overconfident agent. The few principals who choose the overconfident agent stated that they expect self-confidence to enhance performance. The reason for selecting the accurate agent was that an agent being able to estimate his performance correctly is expected to have a high overall level of performance.

4 Further analysis

The results in the previous section show that being overconfident might have negative consequences. This might explain why individuals are sometimes modest when assessing their performance. In this section, I analyze whether agents anticipate the preference for underconfidence and whether they (consciously) downgrade their self-assessment.

4.1 Anticipation of principals’ preferences

After the principals’ choices, agents estimate principals’ selection behavior for the situations listed in Table 6. For each situation they estimate how many of the eight principals in their session selected the agent with the deviation listed first. One question is randomly chosen for payment, and participants receive 1 Euro if their answer does not differ by more than \(\pm\)0.5 from the correct answer. The agents’ average estimations are reported in Table 6.
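The payment rule for the belief elicitation can be written down compactly. The following is a minimal sketch only: the function name and its parameters are hypothetical, and it assumes that an answer lying exactly 0.5 away from the correct number still counts as correct.

```python
def belief_payment(estimate, actual, tolerance=0.5, prize_eur=1.0):
    """Payment (in Euro) for one randomly drawn estimation question.

    `estimate` is the agent's guess of how many of the eight principals
    chose the first-listed agent; `actual` is the true number.
    Assumption: a difference of exactly `tolerance` is still paid.
    """
    return prize_eur if abs(estimate - actual) <= tolerance else 0.0


# Illustrative values only (not taken from the experiment)
payment_close = belief_payment(estimate=5.5, actual=6)  # within the band
payment_far = belief_payment(estimate=4, actual=6)      # outside the band
```

Because the true number of principals is an integer, a tolerance of 0.5 means that an integer guess is paid only if it is exactly right, while a half-step guess is paid if it lies next to the true number.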

Table 6 Agents’ average estimation of principals’ choice behavior

In SYMP agents anticipate the preference for a correct self-assessment (1a, 1b) but underestimate the preference for smaller mistakes (4b). However, they anticipate principals’ preference for underconfidence and their aversion towards overconfidence (6a). In PERF agents correctly anticipate that underconfident agents are expected to perform as well as more precise agents (1b, 4b), but do not anticipate the strong preference for underconfident agents (6a).

4.2 Strategic adaptation of reported self-assessment

Agents only anticipate principals’ preference for underconfidence in SYMP, but not in PERF. If agents adapt their reported self-assessment to be selected by the principal, agents’ self-assessment should thus be lower in SYMP than in PERF. This actually holds true, as reported further below. Yet, as the settings in the two treatments are different, I cannot exclude that other factors are also responsible for the difference in reported self-assessments. To cleanly check whether agents adapt their reported self-assessment to principals’ preferences, I compare the stated ranks in SYMP and PERF with the stated ranks in the control treatments SYMP-CON and PERF-CON.

To control for potential differences in performance across treatments, I calculate the deviations of agents’ stated ranks instead of only comparing agents’ reported self-assessments. However, as agents’ ranks are determined endogenously within a session, agents’ actual ranks might be influenced by performance differences across sessions. Therefore, I use simulated ranks instead of the actual ranks to calculate agents’ deviations. The simulated rank of an agent is the rank that is most likely assigned to him, given his performance and the performance distribution of all agents in all treatments.Footnote 22 I calculate the deviation of an agent as his simulated rank minus his guessed rank. Thus, a negative deviation represents underconfidence, and a positive deviation represents overconfidence. Table 7 gives an overview of agents’ average guessed and simulated ranks as well as their average deviation in all treatments.
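One way to picture the simulated rank is as the modal rank obtained when an agent is repeatedly placed in a comparison group drawn from the pooled performance distribution. The sketch below illustrates that idea only; it is not the exact procedure of the paper. The function names, the group size, the sampling with replacement, the number of draws, and the handling of ties are all assumptions made for illustration.

```python
import random
from collections import Counter


def simulated_rank(score, pooled_scores, group_size, draws=10_000, seed=0):
    """Most frequent rank of `score` in randomly composed comparison groups.

    Assumptions (hypothetical, for illustration): rivals are sampled with
    replacement from the pooled scores of all agents, rank 1 is the best
    rank, and ties are resolved in the agent's favour.
    """
    rng = random.Random(seed)
    rank_counts = Counter()
    for _ in range(draws):
        rivals = rng.choices(pooled_scores, k=group_size - 1)
        rank = 1 + sum(r > score for r in rivals)  # rivals strictly better
        rank_counts[rank] += 1
    return rank_counts.most_common(1)[0][0]


def deviation(simulated, guessed):
    """Simulated minus guessed rank: negative means underconfident."""
    return simulated - guessed


# Invented scores, not data from the experiment
pool = [12, 15, 9, 20, 17, 11, 14, 18]
top_rank = simulated_rank(25, pool, group_size=4)  # beats every rival
dev = deviation(simulated=top_rank, guessed=2)     # agent guessed rank 2
```

With rank 1 as the best rank, an agent who guesses a worse (numerically higher) rank than his simulated rank obtains a negative deviation, which matches the sign convention used in the text.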

Table 7 Guessed and simulated ranks of agents

While agents are slightly underconfident in SYMP (average deviation \(-\)0.13), they are slightly overconfident in the other three treatments.Footnote 23 The deviation across SYMP and PERF is significantly different (MWU, two-sided, p = 0.021). I cannot differentiate whether agents bias their reported self-assessment upwards in PERF or downwards in SYMP.Footnote 24 Moreover, the differences between treatments and control treatments are not significant (MWU, two-sided, SYMP: p = 0.562, PERF: p = 0.236). As participants do not seem to anticipate the preference for underconfidence in PERF, the absence of a difference in deviations between PERF and PERF-CON is not surprising. However, I expected a lower reported self-assessment in SYMP than in SYMP-CON. A possible explanation for the missing difference is that in SYMP-CON (as in SYMP) principals get to know whether an agent over- or underestimated himself. Ludwig and Thoma (2012) show that women state a lower self-assessment if overestimation becomes public, whereas men do not. This effect might bias women’s reported self-assessment in the same direction as the ambition of being selected by the principal, leading to a low reported self-assessment in both treatments. As men do not seem to be influenced by observable overestimation, I conduct ordered probit regressions for each gender separately. The results are reported in Table 8.Footnote 25 The dependent variable is the guessed rank. The independent variables are the simulated rank (as performance measure) and dummies for the treatments PERF, SYMP-CON, and PERF-CON. I also control for risk aversion.
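The treatment comparisons above rely on two-sided Mann-Whitney U (MWU) tests. As a rough illustration of what such a test computes, the sketch below implements the U statistic with a normal approximation and a tie correction, but without a continuity correction. Whether the reported p-values were obtained this way or as exact values is not stated in the text, so this should be read as an assumption. The deviation values in the example are invented and are not the experimental data.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U of sample x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all observations identical
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_two_sided


# Invented deviations for two treatments (illustration only)
dev_a = [-1, 0, -2, 0, 1, -1]
dev_b = [1, 0, 2, 1, 0, 1]
u_stat, p_value = mann_whitney_u(dev_a, dev_b)
```

For small samples an exact test would usually be preferable to the normal approximation used here.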

Table 8 Ordered probit of guessed rank

Column (1) reports the results for women, column (2) reports the results for men. For men, the coefficient of SYMP-CON is marginally significant (p = 0.072) and negative, i.e., men rank themselves worse in SYMP than in SYMP-CON. It seems as if men strategically downgrade their reported self-assessment in SYMP to increase their selection chances.

There is no significant difference in reported self-assessment for women across SYMP and SYMP-CON. Note that there is no gender difference in reported self-assessment (each treatment considered separately), with one exception: SYMP-CON. Here, men rank themselves higher than women (MWU, two-sided, p = 0.032), which fits the hypothesis that men downgrade their reported self-assessment in SYMP.

Besides a conscious downgrade of reported self-assessment, the social aversion towards overconfidence may also influence subjects’ self-assessment unconsciously. Subjects’ reported self-assessment might be influenced not only in situations in which others learn their self-assessment, but per se. Charness et al. (2012) show that individuals might act out of unconscious strategic concerns, even in situations in which strategic concerns are absent. They pick up the idea proposed by Myerson (1991) and further developed by Samuelson (2001) that, for the sake of convenience, people make the same decisions in situations that appear to be similar. This can lead to suboptimal behavior in certain situations. While Charness et al. (2012) argue that subjects might be overconfident due to an internalization of the positive impact of overconfidence, I suggest that the opposite can be the case.

5 Conclusion

In this paper I analyze how overconfidence in comparison to underconfidence is perceived by others. The results reveal that underconfident agents are perceived as more likeable than overconfident agents and are expected to exhibit a higher performance in a real effort task. Questionnaire answers suggest that modest agents are expected to be more ambitious to improve, while overconfident agents are instead reputed to rest on their high self-confidence.

Elicited beliefs of agents show that they do not expect the principals to select the underconfident agent more often when performance is the critical selection criterion. However, they anticipate that underconfidence is deemed more likeable than overconfidence. The comparison of reported self-assessments to a treatment in which the principal cannot make a selection choice (non-strategic setting) shows that men slightly deflate their reported self-assessment strategically to be rewarded by the principal. Yet, women do not. An explanation might be given by women’s shame of overestimation (Ludwig and Thoma 2012), suggesting that women deflate their reported self-assessment even in the non-strategic setting, as principals still learn the deviation of agents’ reported self-assessment. Thus, apart from monetary consequences, women might expect their social image to suffer. Moreover, besides the conscious adaptation of reported self-assessment, individuals might have internalized the negative attitude towards overconfidence and might be modest in situations in which no strategic concerns are present.

While further research is needed to precisely identify in which situations individuals might bias their reported self-assessment consciously or unconsciously, the results reported in this paper provide an important fact one should consider when eliciting and interpreting individuals’ stated self-assessment: subjects might not state the self-assessment that they actually have, but the one that has the largest signaling value.