1 Introduction

In Mizrahi (2013), I argue that arguments from expert opinion are weak arguments. An argument from expert opinion is an argument one makes “under conditions of uncertainty” (Mizrahi 2016, p. 246), i.e., when the truth-value of p is unknown and there is no reason to believe that p is the case other than the fact that an expert judges that p is the case. To appeal to expert opinion is to take the expert’s judgment that p is the case as (defeasible) evidence that (probably) p. For if there is evidence that p is the case independent of the fact that an expert judges that p, then why appeal to expert opinion at all? We can “cut the middleman” and appeal directly to the evidence that supports p (Mizrahi 2013, p. 71). To say that an argument from expert opinion is weak is to say that an expert’s judgment that p does not make it significantly more likely that p is the case. This claim is supported by experimental studies on expert performance. Such studies show that expert performance is often no better than the performance of novices or even chance.
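Put in probabilistic terms, the weakness claim can be given a rough Bayesian gloss. The following reconstruction is mine, not a formalism that appears in Mizrahi (2013): writing E_p for the event that an expert asserts that p,

```latex
% Posterior probability of p given an expert's assertion of p:
P(p \mid E_p) = \frac{P(E_p \mid p)\, P(p)}{P(E_p \mid p)\, P(p) + P(E_p \mid \neg p)\, P(\neg p)}
% The weakness claim is then that the likelihood ratio
% P(E_p | p) / P(E_p | \neg p) is close to 1, so that
% P(p | E_p) is not significantly greater than the prior P(p).
```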

Some argumentation theorists have responded to my argument in Mizrahi (2013), but I think that their responses miss the main question that my argument raises. That is to say, instead of addressing the question of how to justify the assumption that expert judgments are a reliable source of evidence, an assumption that accounts of arguments from expert opinion simply take for granted, those who responded to Mizrahi (2013) have chosen to focus on my formulation of arguments from expert opinion, accusing me of “refusing to countenance the possibility that other premises of the form of the argument from expert opinion need to be taken into account” (Walton 2014, p. 142),Footnote 1 on my examples, accusing me of being a “radical sceptic about expertise” (Seidel 2014, p. 215),Footnote 2 and even on objections raised against my argument (rather than the argument itself), accusing me of being “confused” (Hinton 2015, p. 542).

However, as I show in Mizrahi (2013, pp. 67–72), virtually all formulations of arguments from expert opinion, including Walton’s (2006, p. 750), take it for granted that an expert’s judgment that p counts as (defeasible) evidence for p. For example, in Walton et al.’s (2008, p. 20) argument scheme for Appeal to Expert Opinion (Version IV), it is the Conditional Premise that captures the basic assumption that expert judgments count as (defeasible) evidence for propositions:

Conditional Premise: If source E is an expert in a subject domain S containing proposition A, and E asserts that proposition A is true (false), then A may plausibly be taken to be true (false).

Similarly, in Wagemans’ (2011, p. 337) “scheme for argumentation from expert opinion,” this basic assumption is stated as follows: “Accepting that O is asserted by E renders acceptable that O is true or acceptable.” No matter how many premises an argumentation scheme for arguments from expert opinion contains, or how many critical questions are added to that argumentation scheme (cf. Walton et al. 2008, pp. 14–15), the basic assumption is that expert judgments count as (defeasible) evidence for propositions. After all, that is precisely what an argument from expert opinion is supposed to be: an inference from what an expert judges to be the case to the conclusion that what the expert judges to be the case probably is the case.Footnote 3 The question that my argument in Mizrahi (2013) raises, then, is this: Why assume that an expert’s judgment that p is (defeasible) evidence for p? One answer appeals to what Goodwin calls the “background norm of respect for expertise.” According to Goodwin (2011, p. 293):

the appeal to expert authority is a blackmail and bond transaction (Goodwin 2001) that brings the background norm of respect for expertise to bear in a particular situation. The appeal gives citizens some means to assess the expert and some reason to trust what he says; it also forces them to heed his views. The appeal also requires the expert to take responsibility for his views, putting himself in a position to be held accountable if he turns out to be wrong (emphasis added).Footnote 4

Goodwin (2011, p. 293) calls this “‘he would not risk…so I can trust’ reasoning” and says that it “is widespread in other situations where people are trying to communicate in the face of information asymmetries.” As I point out in Mizrahi (2013, p. 72) however, the fact that this sort of “he would not risk…so I can trust” reasoning is widespread does not make it good reasoning. The tendency to think in terms of stereotypes is also widespread, perhaps even hardwired into the human brain (Cohen 2009, pp. 164–165). But stereotypical thinking is clearly not a good way to reason. So the question is this: why should we respect expertise at all (let alone accept such respect as a “background norm”)? In a recent reply to critics (Mizrahi 2016, p. 241), I put this question as follows:

The question, then, is whether arguments from expert opinion are good ampliative arguments, i.e., whether we can gain new knowledge by arguing on the basis of expert opinion (emphasis in original).

In other words, when we appeal to expert opinion, we often need experts to weigh in on questions whose answers are unknown, and we expect experts to be able to provide new answers that are significantly more likely to be correct than not. For instance, we turn to political pundits and analysts for insight into the electoral process. But if it turns out that the judgments of these pundits and analysts are unreliable, as seems to have been the case in the recent Brexit vote in the UK and the recent presidential election in the United States,Footnote 5 then why trust expert judgments at all? If they are indeed unreliable, then expert judgments cannot be a trustworthy source of evidence or knowledge.

In Mizrahi (2013), I consider the following response to this question concerning the justification for the “background norm of respect for expertise”:

The assumption that an expert’s judgment that p is (defeasible) evidence for p is warranted only if expert judgments are more reliable (i.e., significantly more likely to be true) than novice judgments.

In other words, we should respect expertise, or treat expert opinion as a trustworthy source of evidence, because experts are significantly more likely to get things right than novices are. As I argue in Mizrahi (2013, pp. 63–65), however, this response is inadequate because expert judgments under uncertainty are not significantly more likely to be true than novice judgments are. Since the question of whether expert judgments are reliable is an empirical question, I looked at experimental studies aimed at testing expert performance. As I observed there (Mizrahi 2013, pp. 63–65), many experimental studies on expert performance show that experts often fail to perform better than novices on cognitive tasks such as decision-making, forecasting, diagnosing, and the like. In fact, expert judgments under uncertainty are often no more likely to be true than false, which means that, statistically speaking, they are not significantly better than guessing. Based on such empirical evidence derived from experimental studies on expert performance, in Mizrahi (2013, pp. 58–59), I argue as follows:

  1. An expert’s judgment that p is (defeasible) evidence for p only if expert judgments are reliable.

  2. It is not the case that expert judgments are reliable.

Therefore,

  3. It is not the case that an expert’s judgment that p is (defeasible) evidence for p.
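Schematically, the argument is an instance of modus tollens. Writing E for “an expert’s judgment that p is (defeasible) evidence for p” and R for “expert judgments are reliable” (my shorthand, not notation from Mizrahi 2013):

```latex
% Premise 1 is the "only if" claim; premise 2 denies the necessary condition.
(E \rightarrow R), \quad \neg R \;\; \therefore \;\; \neg E
```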

If cogent, this argument shows that the “background norm of respect for expertise” cannot be justified by claiming that expert judgments are reliable (either significantly more likely to be true than novice judgments or significantly more likely to be true than false). The first premise of this argument states that reliability is a necessary condition for being a trustworthy source of evidence, hence the “only if.” Being unreliable disqualifies a source from being a trustworthy source of evidence. As Devitt (2015, p. 687) puts the point with respect to intuitive judgments:

intuitions are a source of evidence because they are reliable. And if they really are reliable, which does not of course require them to be infallible, then they are indeed a source of evidence. (And, one might add, this is true of any judgment, whether intuitive or not.) (emphasis in original).

In other words, only reliable sources can be trustworthy sources of evidence. To use my own example from Mizrahi (2013, p. 65):

Would you trust a watch that gets the time right 55% of the time? Would you trust a thermometer that gets the temperature right 55% of the time? I suspect the answer to these questions is “no.” Similarly, a method of reasoning, such as appealing to expert opinion, is trustworthy only if expert opinion is [reliable].
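To make the watch example concrete, here is a minimal sketch (my own illustration, not a calculation from Mizrahi 2013) of how little a 55%-reliable source moves a Bayesian agent, on the simplifying assumption that the source errs symmetrically:

```python
def posterior(prior: float, hit_rate: float) -> float:
    """Probability that p is true given that a source with the stated
    hit rate asserts p (two-hypothesis Bayesian update; assumes the
    source asserts p with probability hit_rate when p is true and
    with probability 1 - hit_rate when p is false)."""
    numerator = hit_rate * prior
    return numerator / (numerator + (1 - hit_rate) * (1 - prior))

print(posterior(0.5, 0.55))  # ~0.55: a 55%-reliable source barely moves a 50-50 prior
print(posterior(0.5, 0.95))  # ~0.95: a highly reliable source moves it substantially
```

On this toy model, reliability is precisely what confers evidential weight: a source that is right only slightly more often than chance leaves the hypothesis nearly as uncertain as before.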

Accordingly, if there were evidence that expert judgments are reliable, either significantly more likely to be true than false or at least significantly more likely to be true than novice judgments are, then that would be a strong reason to think that expert judgments can be a trustworthy source of (defeasible) evidence for the truth of such judgments. Since there is no evidence that expert judgments are reliable (in fact, there is evidence to the contrary, i.e., that expert judgments are not more reliable than novice judgments), there is no reason to think that expert judgments are a trustworthy source of (defeasible) evidence for the truth of such judgments.

If this is correct, then the justification for the “background norm of respect for expertise” cannot be that expert judgments are reliable, since experimental studies suggest that expert judgments are not more reliable than novice judgments. In fact, there is empirical evidence that expert judgments are often not significantly more likely to be true than false.

In this paper, I will consider another potential response to the question concerning the justification for the “background norm of respect for expertise.” This response goes as follows:

The assumption that an expert’s judgment that p is (defeasible) evidence for p is warranted only if experts are not susceptible to the kinds of cognitive biases that novices are susceptible to.

In other words, we should respect expertise, or treat expert opinion as a trustworthy source of evidence, because, unlike novices, experts are immune to cognitive biases, such as confirmation bias, framing effects, order effects, and the like. Since the question whether or not experts are immune to cognitive biases is an empirical question, I will look at experimental studies aimed at testing for cognitive biases among experts and novices. I will argue that this response, too, is inadequate, since experimental studies on cognitive biases show that experts are vulnerable to pretty much the same cognitive biases that novices are vulnerable to. If this is correct, then the “background norm of respect for expertise” cannot be justified by claiming that experts are immune to cognitive biases, and thus this basic assumption at the core of arguments from expert opinion, according to which expert judgments are a trustworthy source of evidence, remains unjustified.

2 Are Experts Immune to Cognitive Biases?

For present purposes, I take it as uncontroversial that, from an epistemic point of view, we should not trust or respect sources of evidence that are not reliable. My critics would probably agree with that. For example, Seidel (2014, p. 195) writes that he has “no objection to this latter claim [namely, that “arguments from expert opinion are weak arguments unless the fact that E says that p makes it significantly more likely that p is true” (Mizrahi 2013, p. 58)] since it just is a formulation of the close connection between the reliability and the epistemic trustworthiness of an epistemic source.” In Mizrahi (2013), I cite experimental studies that provide empirical evidence against the claim that expert judgments are reliable (either significantly more likely to be true than novice judgments or significantly more likely to be true than false).

The aforementioned empirical evidence notwithstanding, some might still think that we should respect expertise, or treat expert opinion as a trustworthy source of evidence, because, unlike the judgments of novices, the judgments of experts are not subject to cognitive biases. In other words, if expert judgments are unbiased judgments, then they should be trusted. For example, confirmation bias is “a tendency to search out and pay special attention to information that supports one’s beliefs, while ignoring information that contradicts a belief” (Goodwin 2010, p. 8). If we were to find out that, unlike novices, experts are immune to confirmation bias, then that would provide a strong reason to think that expert judgments are reliable, and thus that we should respect expertise. The question, then, is whether or not experts are as susceptible to confirmation bias (as well as other cognitive biases) as novices are. Since this is an empirical question, we should look at experimental studies aimed at testing for cognitive biases among experts and novices.

Here are a few examples of experimental findings about the cognitive biases novices and experts are susceptible to in a variety of domains:

  • Driving: Waylen et al. (2004, p. 323) “compared expert police drivers with novice police drivers,” and found that “[d]espite their extensive additional training and experience, experts still appear to be as susceptible to illusions of superiority as everyone else.”

  • Law: In a series of experimental studies, Guthrie et al. (2001, 2007, 2011) “demonstrated that anchoring, hindsight, framing, and other documented biases influence the way in which judges [even “judges who specialize in a specific area of law”] analyze legal vignettes” (Teichman and Zamir 2014, p. 691).

  • Medicine: The results of several experimental studies (e.g., McNeil et al. 1982) show that decisions made by patients, medical students, and physicians are all subject to framing effects (e.g., whether the same treatment is presented as a potential loss or as a potential gain).Footnote 6

  • Philosophy: A growing body of experimental work suggests that, just like the judgments of non-philosophers, the judgments of professional philosophers are influenced by cognitive biases. For instance, Schwitzgebel and Cushman (2012) show that the moral judgments of professional philosophers, just like the moral judgments of non-philosophers, are affected by the order in which hypothetical scenarios are presented. In a follow-up study, Schwitzgebel and Cushman (2015) show that framing and order effects persist despite high levels of academic expertise and familiarity with the hypothetical scenarios (e.g., the trolley problem) that were presented to the professional philosophers who participated in the study.

  • Software engineering: Leventhal et al. (1994) found that expert programmers are susceptible to confirmation bias, or what they call “positive test bias,” just as novices are. The results of their experimental study show evidence of positive test bias, i.e., of the tendency of software testers to test software against data that supports their judgments about that software regardless of the subjects’ level of expertise. (See also Calikli and Bener 2015.)

Accordingly, the results of many experimental studies do not support the claim that experts are not susceptible to the kinds of cognitive biases that novices are susceptible to. In fact, such experimental studies provide empirical evidence to the contrary; that is, they show that experts are susceptible to cognitive biases just like everyone else. As Eisenstein and Lodish (2002, p. 437) put it, the results of experimental studies like those mentioned above suggest that “[e]xperts can fall prey to the same array of cognitive biases that affect novices, resulting in sub-optimal performance and unreliability.”

In addition to the aforementioned cognitive biases, there is empirical evidence suggesting that there are several decision heuristics that both experts and novices tend to use. These unreliable decision heuristics have the following characteristics:

First, the errors in judgments attributable to the heuristic are systematic and directional; that is, they always act in the same way and in the same direction. Second, they are general and nontransferable; that is, all humans are susceptible to the errors and knowledge of how they act does not immunize us against them. Third, they are independent of intelligence and education; that is, experts make the same mistakes as novices (Lash et al. 2009, p. 6; emphasis added).

Two of these decision heuristics are:

  • Anchoring and adjustment: a decision-making process in which “people make estimates by starting from an initial value that is adjusted to yield a final answer” (Tversky and Kahneman 1974, p. 1128). These “adjustments are typically insufficient” (Tversky and Kahneman 1974, p. 1128). For example, Northcraft and Neale (1987) conducted a study whose results show that both novices (undergraduate students) and experts (real estate agents) use anchoring and adjustment in information-rich, real-world settings (as opposed to a laboratory setting). (A toy model of this heuristic follows this list.)

  • Overconfidence: “people are often more confident in their judgments than is warranted by the facts” (Griffin and Tversky 1992, p. 411). Numerous experimental studies show evidence of overconfidence in expert judgments of physicians (Lusted 1977), clinical psychologists (Oskamp 1965), lawyers (Wagenaar and Keren 1986), negotiators (Neale and Bazerman 1991), engineers (Kidd 1970), and security analysts (Staël von Holstein 1972). As Griffin and Tversky (1992, p. 412) report: “As one critic described expert prediction, ‘often wrong but rarely in doubt’.”
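As flagged above, here is a toy model of anchoring and adjustment (my own construction; the values are invented and do not come from Northcraft and Neale 1987), in which estimates start at an anchor and adjust only part of the way toward the true value:

```python
def anchored_estimate(true_value: float, anchor: float, adjustment: float) -> float:
    """Anchoring and adjustment: start at the anchor and move only a
    fraction of the way toward the true value. adjustment < 1 models the
    'typically insufficient' adjustment of Tversky and Kahneman (1974)."""
    return anchor + adjustment * (true_value - anchor)

true_price = 250_000  # hypothetical true value of a property
for anchor in (150_000, 350_000):
    # Illustrative adjustment rates only; the cited studies report the
    # presence of anchoring in both groups, not these particular numbers.
    novice = anchored_estimate(true_price, anchor, adjustment=0.6)
    expert = anchored_estimate(true_price, anchor, adjustment=0.7)
    print(f"anchor={anchor:,}: novice={novice:,.0f}, expert={expert:,.0f}")
```

Because the adjustment factor is below 1 for both groups, both sets of estimates remain pulled toward the anchor, which is the systematic, directional error described by Lash et al. (2009).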

In a study that looked at both anchoring and overconfidence, Englich et al. (2005) show that, just like novices, criminal law experts are susceptible to anchoring but tend to be very confident about their biased judgments. As Englich et al. (2005, p. 718) put it, the results of their experimental study suggest that “expertise does not protect against assimilative sentencing bias.” More generally, “experts, like lay citizens, tend to draw on various heuristics—even when such usage does not fit context” (Perez 2015, p. 119).

To avoid confirmation bias, and to make sure that the aforementioned studies on cognitive biases among experts and novices are representative, I conducted a more systematic review of the literature on cognitive biases among experts and novices. In particular, I searched the Web of Science for articles containing the terms ‘expert’, ‘novice’, ‘bias’, and ‘judgment’. This search turned up 21 results, from which I excluded studies that are not specifically about cognitive biases among experts and novices. Four articles did not meet this selection criterion. The first presents the results of a psychometric (as opposed to a “bare-bone”) meta-analysis of judgmental achievement (i.e., accuracy of judgments) and decision-making. The “results indicated that students reached a slightly higher judgmental achievement than experts” (Kaufmann et al. 2013, p. 13). After “separating expertise within domains,” and applying the psychometric method, “judgmental achievement among business experts dropped to a low value,” and “experts’ judgments in other research domains decreased [as well]” (Kaufmann et al. 2013, p. 13). The second discusses the use of decision heuristics and diagnostic aids by organizational consultants, but does not provide new empirical evidence on cognitive biases among experts and novices (Armenakis et al. 1990). The third presents the results of a study of novice performance in evaluating e-learning environments, which shows that novice evaluators perform better when they use pattern-based inspection (Lanzilotti et al. 2011). The fourth presents the results of a study on gender recognition, which shows that “conductors were more often judged to be male” (Wöllner and Deconinck 2013, p. 84).

Of the remaining 17 search results, the following study reports that expert judgments exhibit a cognitive bias that novice judgments do not:

  • Experts are susceptible to “the curse of knowledge, a bias whereby knowledgeable people are unable to ignore information they hold that others do not” (Hinds 1999, p. 206; emphasis in original).

The following studies report that expert and novice judgments exhibit similar cognitive biases but to varying degrees or in different directions:

  • “Perceptual judgments of aesthetics in gymnastics appear to be subject to specific memory influences regardless of the judge’s expertise” (Ste-Marie and Lee 1991, p. 129).

  • “No significant difference between the magnitude of experts’ [experienced police drivers] and novices’ [inexperienced police drivers] aggregate biases [namely, self-enhancement bias and illusions of superiority] were found” (Waylen et al. 2004, p. 329).

  • “[N]on-experts were more likely to exhibit focal brand favorableness bias than experts” (Posavac et al. 2005, p. 93).

  • “[T]o the extent that framing bias occurs, the bias seems to occur in both expert clinicians and novice students” (Brailey et al. 2001, p. 74); “there was no evidence that students were more likely than clinicians to exhibit a decision-consistent framing bias” (Brailey et al. 2001, p. 74); “expert clinicians are less affected by confirmatory bias than students” (Brailey et al. 2001, p. 75).

  • “[I]rrespective of the level of expertise in teaching, classroom teachers and university students misjudged a tutee’s understanding” (Herppich et al. 2013, p. 256).

  • Art experts are not “immune from verbalization bias [i.e., “an increase in the evaluative significance of easily verbalized stimulus attributes brought about by the evaluator’s efforts to verbally encode a stimulus” (McGlone et al. 2005, p. 242)] in their evaluation of the abstract works [of art]” (McGlone et al. 2005, p. 249), but “experts were better able to articulate the reasons for their judgments than novices” (McGlone et al. 2005, pp. 250–251).

  • The judgments of “both novices and experts were affected [by self-assessment]; it was little effected by warning that self-assessment is unreliable” (Chen and Kemp 2012, p. 587).

  • “[T]he salesperson selections made by novice and expert sales managers were equally biased, albeit in different directions, with novices outweighing and expert underweighting historical performance trends” (DeCarlo et al. 2015, p. 1484).

The following studies report, as a “response bias,” a tendency among either experts or novices to favor one particular judgment over another:

  • “[G]oalkeepers were significantly biased towards deception decisions, whereas neither the field players nor the novices showed this bias” (Canal-Bruland and Schmidt 2009, p. 239).

  • “A contextual bias effect was observed for novice and expert participants when comparing and assessing fingerprints. That bias effect was stronger for novice participants. Experts were more resistant to bias suggestions towards individualization and less so to suggestions towards inconclusive and exclusion” (Langenburg et al. 2009, p. 581).

  • “Reading the text [on baseball] reduced confidence bias scores more for the higher knowledge than for the lower knowledge participants” (Griffin et al. 2009, p. 1008).

  • “[T]he law enforcement group [compared to a group of students] showed a bias to respond ‘lie’ and not greater discrimination of message veracity [in deception detection]” (Bond 2008, p. 343).

  • “[B]oth experts’ and novices’ judgments are biased in that they assign excessively heavier weight to more pathological information and that this bias is stronger among experts than among novices” (Ganzach 1997, pp. 956–957).

  • Unlike amateur soccer players (the novices), “experts [i.e., assistant referees] showed a higher response bias toward not responding to the stimulus. […] [I]t appears that with years of practice and increased task-specific expertise, international assistant referees have biased their response criterion into a more conservative direction” (Put et al. 2013, p. 581).Footnote 7

  • Among groups of Norwegian detectives, Norwegian novices, English detectives, and English novices, “all groups, except the English [detectives], were clearly biased toward investigative hypotheses involving the abduction or killing of the girl by family members. […] Although the identity of the body had not been established, all groups, including the English [detectives], displayed a clear bias toward investigative hypotheses involving murder” (Fahsing and Ask 2016, p. 215).

  • “[W]hen recalling locations in scenes of geological relevance, geology experts’ conceptual knowledge resulted in different patterns of bias compared to experts from unrelated fields” (Holden et al. 2016, p. 450).

Of the 17 search results, none report that expert judgments are immune to cognitive biases or that expertise provides protection against cognitive biases. However, as mentioned above, one study did find that “expert clinicians are less affected by confirmatory bias than students” (Brailey et al. 2001, p. 75; emphasis added). So I looked beyond the sample of articles produced by my search through the Web of Science for additional studies on cognitive biases among experts and novices that report similar results. I did manage to find one study whose results could be construed as showing that expertise might provide at least some measure of protection against confirmation bias. Krems and Zierer (1994) argue that “high-domain knowledge experts have less of a confirmation bias than intermediates or novices” (emphasis added). This means that, like intermediates and novices, experts are still susceptible to confirmation bias, just not as susceptible as intermediates and novices are. The results of this study are taken to show that experts are less susceptible to confirmation bias than novices are because the latter take more time to correct erroneous assumptions than the former (cf. Mendel et al. 2011).Footnote 8 Accordingly, the judgments of experts are still biased, but experts correct their biased decisions faster than novices do.

To sum up, then, most of the empirical evidence on cognitive biases among experts and novices suggests that “expert judgment is subject to various biases, such as the hindsight bias,” to varying degrees, and that “experts are not immune to these biases” (Perez 2015, p. 119). In the words of Kahneman (1991, p. 144), “there is much evidence that experts are not immune to the cognitive illusions that affect other people.” If we should not trust biased judgments, and expert judgments are biased judgments, much like novice judgments are, then it follows that we should not trust expert judgments.

3 Objections and Replies

I have argued that the “background norm of respect for expertise,” i.e., the assumption that an expert’s judgment that p is (defeasible) evidence for p, cannot be justified by claiming that experts are immune to the sort of cognitive biases that novices are susceptible to. This is because empirical evidence suggests that experts are susceptible to cognitive biases just like everyone else. My overall argument can be stated as follows:

  1. An expert’s judgment that p is (defeasible) evidence for p only if experts are not susceptible to the sort of cognitive biases that novices are susceptible to.

  2. It is not the case that experts are not susceptible to the sort of cognitive biases that novices are susceptible to.

Therefore,

  3. It is not the case that an expert’s judgment that p is (defeasible) evidence for p.

If cogent, this argument shows that the “background norm of respect for expertise” cannot be justified by claiming that experts are immune to cognitive biases. In other words, if there were evidence that experts are immune to the sort of cognitive biases that novices are susceptible to, then that would be a strong reason to think that expert judgments are reliable, and hence that they can be a trustworthy source of (defeasible) evidence for the truth of such judgments. Since there is no evidence that experts are immune to the sort of cognitive biases that novices are susceptible to (in fact, there is empirical evidence to the contrary; that is, that experts are vulnerable to pretty much the same cognitive biases as novices), there is no reason to think that expert judgments are a trustworthy source of (defeasible) evidence for the truth of such judgments.

If this is correct, then the justification for the “background norm of respect for expertise” cannot be that experts are immune to the sort of cognitive biases that novices are susceptible to, since experimental studies on cognitive biases provide empirical evidence that expert judgments are susceptible to pretty much the same cognitive biases that novice judgments are susceptible to. Given that it is also not the case that expert judgments are significantly more likely to be true than novice judgments or significantly more likely to be true than false, as I argue in Mizrahi (2013), the justification for the “background norm of respect for expertise” cannot be that expert opinion is reliable (either significantly more likely to be true than novice opinion or significantly more likely to be true than false). If this is correct, then the “background norm of respect for expertise,” the basic assumption at the core of arguments from expert opinion according to which expert judgments are a trustworthy source of evidence, remains unjustified. In this section, I will consider a couple of objections to my overall argument.

Before I do so, however, it is important to acknowledge the limits of my argument. In particular, my argument concerns what Walton (1992, p. 48) calls “cognitive authority,” i.e., “authority based on [professed] knowledge (‘cognitive’, ‘epistemic’)” (Walton 2016, p. 2), as opposed to what Walton (1992, p. 48) calls “administrative authority,” i.e., “authority based on directives (‘administrative’, ‘executive’, ‘deontic’)” (Walton 2016, p. 2). Similarly, expertise can be thought of in terms of “what experts [claim to] know that others do not” (Phillips et al. 2008, p. 300; emphasis in original), or in terms of “what experts can do that others cannot” (Phillips et al. 2008, p. 300; emphasis in original).Footnote 9 The empirical evidence concerning expert susceptibility to cognitive biases, then, pertains to experts qua cognitive or epistemic authorities, not to experts qua administrative or executive authorities. From an epistemic point of view, a source of evidence is trustworthy just in case it is reliable (Devitt 2015, p. 687; Mizrahi 2013, p. 65). In particular, judgments or opinions should be trusted only if they are reliable. From an epistemic point of view, if expert judgments are unreliable, they should not be trusted. This argument is about what experts say or judge, not about what experts can do. Accordingly, it could be the case that experts are better than novices are at doing things other than expressing true opinions or making unbiased judgments. For instance, evidence suggests that experts process visual information faster than novices do (Gegenfurtner et al. 2011). However, speed does not entail epistemic reliability (either in terms of getting things right more often than not or in terms of being free from cognitive biases). From an epistemic point of view, then, we need good reasons to think that expertise guarantees epistemic reliability, especially since empirical evidence suggests that “[i]ntelligence and experience do not make a person immune to cognitive biases” (Cooper and Frain 2017, p. 26).

Since critics have claimed that my argument in Mizrahi (2013) is “self-undermining” (Seidel 2014, p. 210), some might be tempted to level a similar charge against my argument here. So, in response, it is important to note that the warrant for the claim that experts are not immune to the cognitive biases that affect all of us is not that experts on cognition say so. Rather, the warrant for the claim that experts are not immune to the cognitive biases that affect all of us is empirical evidence gathered from experimental studies on cognitive biases among experts and novices. There is a clear difference between accepting a claim because it is judged to be true by an expert and accepting a claim because empirical evidence supports it. In other words, an argument that proceeds from a premise about an expert’s judgment that p to the conclusion that p is true or probable is clearly different from an argument that proceeds from a premise about an experimental procedure that yields result r to the conclusion that r is true or probable. While the former is an appeal to a person’s judgment, the latter is an appeal to a procedure (cf. Mizrahi 2016, p. 248). Indeed, in Walton’s (2016, pp. 130–131) argumentation scheme for arguments from expert opinion, an expert is taken to be a source of evidence (the “major premise”) but there is a critical question about whether or not the expert’s assertion is based on evidence (the “backup evidence question”).Footnote 10 This is why I argue in Mizrahi (2013, pp. 67–72) that Walton’s argumentation scheme for arguments from expert opinion faces a dilemma: if there is evidence for an expert’s assertion from a source that is more reliable than expert opinion, why rely on expert opinion at all?

Of course, to say that “there is a clear difference between the inference from ‘Expert E judges that p’ to ‘p’ (where p is an opinion or judgment whose truth value is unknown and the only reason to accept p is that E asserts it) and the inference from ‘Experiment X yields the result that r’ to ‘r’” (Mizrahi 2016, p. 248) is not to say that scientific investigation is devoid of human judgment. Historians (e.g., Kuhn 2000), philosophers (e.g., Douglas 2009), and sociologists of science (e.g., Latour and Woolgar 1986) have pointed out that both epistemic and non-epistemic judgments play an important role in scientific practice. Such judgments, however, have to be “trained judgments,” to use Daston and Galison’s terminology, i.e., judgments made by scientists who are already working through “mechanical objectivity,” that is, scientists engaging in the pursuit of knowledge “that bears no trace of the knower” (Daston and Galison 2007, p. 17). For example, blinding or masking procedures are designed to eliminate any “trace of the knower” by guaranteeing that the experimenters or the participants (single-blind), or both (double-blind), are unaware of the treatment assignment in a clinical study. In this way, blinding is designed to minimize biases, such as assessment bias and observation bias (Bacchieri and Della Cioppa 2007, pp. 216–217). Indeed, interventions such as blinding are needed in research precisely because “[e]xperts are not immune to these cognitive vulnerabilities and hence often exhibit bias in their conclusions, but might be unaware of it” (MacLean and Dror 2016, p. 13).
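For illustration, blinded assignment can be sketched as randomization hidden behind opaque codes (a schematic of the idea, not an actual clinical protocol):

```python
import random

def double_blind_assignment(participants, seed=42):
    """Randomly assign each participant to treatment or placebo, returning
    only opaque codes; the code-to-arm key is sealed (held by a third
    party) until the study is unblinded."""
    rng = random.Random(seed)
    key = {}      # code -> arm; not visible during the study
    blinded = {}  # participant -> code; all that anyone involved sees
    for i, pid in enumerate(participants):
        code = f"ARM-{i:03d}"
        key[code] = rng.choice(["treatment", "placebo"])
        blinded[pid] = code
    return blinded, key

blinded, sealed_key = double_blind_assignment(["p1", "p2", "p3", "p4"])
print(blinded)  # codes only: no "trace of the knower" in assessments
```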

The origin of this idea of scientific objectivity can be traced back to the motto of the Royal Society of London: “Nullius in verba” or “take nobody’s word for it” (https://royalsociety.org/about-us/history/). In adopting this motto, the Fellows of the Royal Society did not mean to suggest that they do not trust each other as working scientists (or, more precisely, natural philosophers). Rather, this motto “is an expression of the determination of Fellows to withstand the domination of authority and to verify all statements by an appeal to facts determined by experiment” (https://royalsociety.org/about-us/history/, emphasis added). In other words, the Fellows’ motto is meant to suggest that they do not trust testimony as a source of evidence. The Fellows of the Royal Society probably picked up this idea from Locke, who argued “both that other people are a highly unreliable source of information and that, even when they speak truthfully, one cannot gain true knowledge merely by taking someone else’s word” (Huemer 2002, p. 217). As Locke (1975, p. 101) writes:

The floating of other Mens Opinions in our brains makes us not one jot the more knowing, though they happen to be true. What in them was Science, is in us but Opiniatrety, whilst we give up our Assent only to revered Names, and do not, as they did, employ our own Reason to understand those Truths, which gave them reputation (emphasis in original).

For Locke, then, as well as for the Fellows of the Royal Society, testimony is not a trustworthy source of evidence.Footnote 11

For similar reasons, in Mizrahi (2016, p. 242), I distinguish “between epistemic trust in expertise, i.e., trusting that p is true because E judges that p is true, and what we might call professional trust, i.e., trusting that E is a professional.” Scientists can trust one another professionally, i.e., they trust that their fellow scientists are professionals who can be trusted to follow protocols of experimentation, randomization, blinding, and the like. But that does not mean that they trust each other epistemically, i.e., that they “take each other’s word” without question. The same distinction applies outside of the domain of science as well. For instance, to use my example in Mizrahi (2016, p. 242), “I may trust my physician insofar as she is a professional who knows what she is doing (e.g., she knows how to perform physical examinations and make clinical decisions) but I may still mistrust her diagnosis in a particular case and seek a second opinion.” This is why, when our physicians misdiagnose, as they often do (Makary 2016), we do not say that they are not professional doctors; they are still professionals, even when they get things wrong. But if they do get things wrong more often than not, or if their judgments are biased, then their expert judgments are not reliable, and thus not a trustworthy source of evidence or knowledge.

I think that these remarks should also help in terms of pre-empting another objection, namely, that we have to appeal to expert opinion because it is an indispensable source of evidence. After all, the objection goes, what else is there? In reply, I would like to make two points. First, even if it were true that we have to appeal to expert opinion, and that it is an indispensable source of evidence, it would still not follow that expert opinion is a reliable or trustworthy source of evidence. From the fact that the only tool I have at my disposal is a hammer, it does not follow that a hammer is a good tool for fixing faulty smartphones. Likewise, even if it is true that we have to appeal to expert opinion, it does not follow that appealing to expert opinion is rationally justified. I take it that, as argumentation theorists, we should not be content with claiming that expert opinion is an indispensable source of evidence. Rather, we should try to justify it, i.e., to show that expert opinion is a trustworthy source of evidence. The question, then, is how the “background norm of respect for expertise” (i.e., that an expert’s judgment that p is defeasible evidence for p) can be justified.

Second, it is in fact not true that we have to appeal to expert opinion. Some alternatives to expert opinion include algorithms, Artificial Intelligence (AI), machine learning, and neural networks. In medicine, for instance, where “medical error is the third most common cause of death in the US” (Makary 2016), algorithms are used as decision aids for the purposes of diagnosis and treatment. According to the Encyclopedia of Medical Decision Making:

Algorithms are branching-logic pathways that permit the application of carefully defined criteria to the task of identifying or classifying different types of the same entity. Clinical algorithms are often represented as schematic models or flow diagrams of the clinical decision pathway described in a guideline (p. 135).

For example, Ramos et al. (2016) present evidence that an ICU admission triage algorithm demonstrated higher interrater reliability than physicians’ judgments, and Craig et al. (2010, p. 11) developed a “computerized diagnostic model” that “outperformed clinical judgment for the diagnosis of fever in young children.” Such clinical algorithms are now becoming more widely used in medicine (Winters-Miner et al. 2015).
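To illustrate the “branching-logic pathway” idea, here is a deliberately simplified and entirely hypothetical triage rule (the criteria and thresholds are invented for illustration, not clinical guidance):

```python
def icu_triage(age: int, heart_rate: int, systolic_bp: int) -> str:
    """A toy branching-logic pathway in the style of a clinical algorithm.
    All thresholds are made up for the sake of the example."""
    if systolic_bp < 90:
        return "admit to ICU"  # hypotension branch
    if heart_rate > 130:
        return "admit to ICU" if age >= 65 else "admit to ward, monitor"
    return "discharge with follow-up"

print(icu_triage(age=70, heart_rate=140, systolic_bp=110))  # admit to ICU
```

Because the pathway is fixed, two raters given the same inputs always reach the same classification, which is one way to understand how such algorithms can achieve higher interrater reliability than unaided clinical judgment.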

At this point, some might be tempted to retort that appealing to expert opinion is still indispensable, for it is experts who design and “train” the algorithms in the first place. But this retort fails to take into consideration the types of algorithms, and other decision aids, that are currently available. In keeping with the theme of examples from the field of medicine, AI and machine learning technology can assist medical professionals in their clinical decision-making. As Kononenko (2001, p. 90) points out:

Machine learning technology is currently well suited for analyzing medical data, and in particular there is a lot of work done in medical diagnosis in small specialized diagnostic problems.

One prominent example of the application of machine learning in medicine is IBM’s Watson. Watson’s capabilities include natural language processing, hypothesis generation and evaluation, and evidence-based learning, which are put to use in helping medical professionals make clinical decisions, such as diagnosis and treatment for cancer patients (IBM 2013).
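As a minimal sketch of the kind of machine learning Kononenko describes (a toy example requiring scikit-learn; nothing here reflects Watson’s actual architecture or any real diagnostic model), a classifier can be fitted to labeled cases and then used to classify new ones:

```python
from sklearn.tree import DecisionTreeClassifier

# Features: [fever, cough, chest_pain] coded as 0/1; labels: toy diagnoses.
X = [[1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 0], [1, 1, 1], [0, 0, 1]]
y = ["flu", "pneumonia", "cold", "healthy", "pneumonia", "healthy"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)
print(model.predict([[1, 1, 0]]))  # e.g. ['flu'] for a feverish, coughing patient
```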

In addition to AIs like IBM’s Watson, there are also neural networks that can learn from experience to the point that they outperform human experts on a particular cognitive task. For example, a deep neural network model called “Watch, Listen, Attend, and Spell (WLAS),” which was developed by researchers from Google and Oxford University, can transcribe videos better than a professional lip reader (Chung et al. 2016). Another AI system that surpasses human experts in pattern recognition tasks is Giraffe, “a chess engine that uses self-play to discover all its domain-specific knowledge, with minimal hand-crafted knowledge given by the programmer” (Lai 2015, p. 2). Ever since IBM’s Deep Blue beat the world chess champion, Garry Kasparov, in 1997, machines have become much better than professional chess players, to the point where chess players nowadays “prefer to stay away from computers as opponents” (Siegel 2016). Unlike other chess-playing machines, however, Giraffe learns to play chess by evaluating moves rather than calculating every possible move (Lai 2015, p. 16).Footnote 12
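The contrast between evaluating candidate moves with a learned function and calculating every possible continuation can be sketched schematically (an illustration of the general idea, not Giraffe’s actual code):

```python
from typing import Callable, List

def pick_move(moves: List[str], evaluate: Callable[[str], float]) -> str:
    """Choose the move with the highest evaluation score rather than
    expanding the full game tree (schematic of evaluation-based play)."""
    return max(moves, key=evaluate)

# Stand-in for a trained evaluation network: any callable that maps a
# candidate move (or the resulting position) to a score would do here.
toy_scores = {"e4": 0.31, "d4": 0.29, "Nf3": 0.27, "h4": -0.12}
print(pick_move(list(toy_scores), toy_scores.get))  # e4
```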

The point, then, is that appealing to expert opinion is not indispensable; there are alternative sources of evidence that one could appeal to. Of course, there may be pragmatic reasons why we might be wary of AI (see, e.g., Bostrom 2014). From an epistemic point of view, however, a source of evidence is trustworthy just in case it is reliable. If it turns out that the outputs of neural networks are more reliable (either more likely to be true than false or immune to cognitive biases) than the judgments of human experts, then from a purely epistemic point of view, we should prefer the former to the latter.

In that respect, it is also important to note that conceding that expert opinion is not a reliable source of evidence does not mean that “anything goes” and that people should simply make up their own minds without appealing to any source of evidence. Again, there are sources of evidence other than expert opinion as well as a variety of decision aids, such as algorithms and neural networks. Indeed, given the limitations of our sensory perception, we generally need instruments, such as microscopes and telescopes, to help us in our epistemic pursuits. As Hooke (Micrographia, Preface) puts it, scientific instruments, like his microscope, are needed for the “inlargement of the dominion of the senses.”Footnote 13 Given our cognitive limitations, then, it should not be surprising that we also need instruments, such as algorithms, to help us in our epistemic pursuits. In fact, just as non-experts may need assistance from decision aids, such as the Carneades Argumentation System (CAS), to evaluate expert opinion (Walton 2016), experts may need assistance from decision aids, such as algorithms and neural networks, to make more reliable, and less biased, judgments.

4 Conclusion

In this paper, I have argued that the justification for the “background norm of respect for expertise” cannot be that experts are immune to the sort of cognitive biases that novices are susceptible to. This is because experimental studies on cognitive biases provide empirical evidence that expert judgments under uncertainty are susceptible to pretty much the same cognitive biases that novice judgments are susceptible to. Given that it is also not the case that expert judgments are significantly more likely to be true than novice judgments or significantly more likely to be true than false (Mizrahi 2013), the justification for the “background norm of respect for expertise” cannot be that expert opinion is reliable (either significantly more likely to be true than novice judgments or significantly more likely to be true than false).

If this is correct, then the basic assumption at the core of accounts of arguments from expert opinion, according to which expert judgments count as (defeasible) evidence for propositions, remains unjustified. And so the question I have raised in Mizrahi (2013) remains unanswered: what is the justification for the “background norm of respect for expertise,” i.e., the assumption that an expert’s judgment that p is (defeasible) evidence for p?

Some might think that the argument I have advanced in this paper is mostly “negative,” and that it would be ideal if I could offer a “positive” proposal in the form of a new argumentation scheme for arguments from expert opinion. I am afraid that I do not know how to justify the assumption that expert opinion counts as (defeasible) evidence other than to show that expert opinion is reliable. I think Devitt (2015, p. 687) is right in saying that judgments count as a source of evidence only if they are reliable. (See also Mizrahi 2013, p. 65.) As we have seen, however, there is empirical evidence suggesting that expert opinion is not reliable, both in terms of not being significantly more likely to be true than novice opinion (Mizrahi 2013) and in terms of not being immune to the cognitive biases that affect novice opinion. Since being unreliable disqualifies a source from being a trustworthy source of evidence, at the very least we should suspend judgment about the evidential status of expert opinion until new evidence concerning the reliability of expert opinion comes to light.

However, the brief remarks I have made at the end of Sect. 3 do point to what I think might be a promising way forward. As I have suggested above, our current epistemic enterprises may be such that they cannot be adequately pursued without the aid of instruments (e.g., telescopes, microscopes, MRI, etc.) that enhance the dominion of perception. Similarly, I submit, perhaps we have reached a point where our epistemic enterprises are such that they cannot be adequately pursued without the aid of instruments that enhance the dominion of cognition. If this is correct, then a promising way forward might be to think of algorithms, neural networks, and other decision aids, not as replacing expert opinion, but as enhancing expert performance. If it turns out that expert opinions aided by algorithms, neural networks, and other decision aids are more reliable than unaided expert opinion, then, from an epistemic point of view, we should prefer the former to the latter. Just as unaided observations of celestial bodies would not be trusted nowadays,Footnote 14 unaided expert opinions should not be trusted either. If this is correct, then it opens up new avenues of research for argumentation theorists to investigate how, and by what means, expert opinion can be aided so that it is more reliable and less biased, and how arguments from expert opinion can thereby be made stronger.