
there is no species of reasoning more common, more useful, and even necessary to human life, than that which is derived from the testimony of men, and the reports of eyewitnesses and spectators. (Hume 1977, p. 74)

1 Introduction

Within philosophy, appreciation of the fact that much of what we believe is derived from the assertions of others has meant that testimony, after many decades of neglect, has become a topic of considerable interest (e.g. Coady 1992; Kusch and Lipton 2002; Adler 2006). The focus here has largely been on the extent to which testimony may be said to give rise to ‘knowledge’, and how it relates to other sources of knowledge, in particular, perception.

In keeping with this emphasis, there has been a long tradition of psychological research that has focused on the reliability of testimony. Motivated particularly by the issue of evaluating witnesses in legal contexts, there has been much research both into how reliable people are as witnesses, even where they are well intentioned, and into how people evaluate the reliability of witness evidence. Limitations on reliability arise not just from simple ‘forgetting’ but from the reconstructive nature of memory which makes memory sensitive to the way information is elicited (e.g. Loftus 1975); consequently, research has identified factors relating to the characteristics of the event, the witness and the procedures by which testimony is gained (see, e.g. Wells and Olsen 2003, for a review). Studies concerned with the evaluation of witnesses have examined factors affecting how people weigh and interpret evidence (e.g. Carlson and Russo 2001; Schuller et al. 2001; Weinstock and Flaton 2004); they have also examined people’s responses to different types of testimony such as testimony by experts or by children and so on (e.g. Eaton and O’Callaghan 2001; ForsterLee et al. 2000; Krauss and Sales 2001), and they have tried to examine how testimony is seen relative to other types of evidence (e.g. Skolnick and Shaw 2001).

Motivated by the epistemological importance of testimony, the reception of testimony has also been a recent concern within developmental psychology. Not only is much of what we think we know derived from the testimony of others, but it is, to a considerable extent, acquired early in life. Developmental research has established that, contrary to long-held beliefs, even young children are not uniformly credulous. Rather, they display considerable selectivity in who they choose to learn from. Specifically they will select informants with whom they have had previous interactions and prefer those for whom past interactions have indicated reliability and expertise (for a review, see Harris and Corriveau 2011).

However, testimony need not be viewed as just a particular kind of evidence. It is also a very general feature of argumentative discourse. In any argument, reasons are necessarily advanced by a specific agent; they are not simply abstract propositions floating about. The presentation of an argument can itself be viewed as a type of testimony (see also Adler 2006), and the inherent combination of argument content with an argument source raises the question of how the source should be taken into account. Here, a common view seems to have been that arguments should ‘speak for themselves’ and that the source should play no role. This is manifest in the fact that arguments involving characteristics of the source itself have traditionally been viewed as fallacious. The ad hominem argument, which seeks to undermine the credibility of the source (e.g. Walton 1998; see also Oaksford and Hahn, this volume), is a staple of the traditional catalogue of fallacies (e.g. Woods et al. 2004) and is routinely featured in textbooks on critical thinking (e.g. Bowell and Kemp 2002; Hughes et al. 2010; Rainbolt and Dwyer 2012). The reasoning behind this is that properties of the speaker alone are insufficient to undermine an argument. Likewise, the appeal to authority (or argumentum ad verecundiam, Walton 1998), which seeks support for a position from the credentials of the speaker, is viewed as fallacious because the fact that the speaker is of high standing is irrelevant to the argument itself and does nothing to improve it.

In keeping with this, there is also a long social-psychological tradition of research on persuasion that has treated message content and message source as two more or less independent variables that are associated with psychologically distinct routes through which we are persuaded. Specifically, persuasion research has distinguished between analytical and heuristic routes to persuasion (e.g. Chaiken 1980; Petty et al. 1981; Petty and Cacioppo 1984; Eagly and Chaiken 1993). The analytic route is characterised by careful scrutiny of the message content in order to determine the merits of the argument, whereas the heuristic route is characterised by more low-effort, shallow processing which focuses on putatively superficial and readily available characteristics of the message such as the perceived credibility of the source, the attractiveness or likeability of the source, or the quality of presentation. Although it has been acknowledged that there may be circumstances in which characteristics of the source may themselves be considered as cues that are relevant in analytic processing (e.g. Petty and Wegener 1999; see also Kruglanski and Stroebe 2005), source characteristics, by and large, have been associated with qualitatively inferior evaluation.

One may ask, though, whether such a separation between source and content is really normatively desirable. Clearly, there are cases where one is in a position to evaluate everything about the content of an argument. In such cases, it is not clear what source considerations could add. However, at least as prevalent are cases in which there is some uncertainty about the content – concerning, for example, its veracity or completeness. Here, source characteristics could provide additional information, at least in principle, and it would seem questionable to ignore information that could be inductively useful. Moreover, a strict separation between content and source presupposes a fundamental distinction between ‘argument’ and ‘evidence’, given that reliability considerations seem essential to testimonial evidence. There may be some types of reasons which should be viewed as only one or the other. However, there would seem to be many more where no real distinction exists, and this can be seen nowhere more clearly than in the overlap between the supposedly fallacious appeal to authority and the testimony of experts. Hence, it should also come as no surprise that more recent treatments of the fallacies have moved away from treating either ad hominem arguments or appeals to authority as always fallacious (see, e.g. Tindale 2007, and examples therein).

This then raises the questions of how source considerations should be taken into account, how fallacious and non-fallacious ad hominem arguments should be distinguished (see also Oaksford and Hahn, this volume) and, more generally, how source and content characteristics should be combined in the overall evaluation of an argument or piece of evidence.

In this chapter, we examine these questions both from a Bayesian perspective and from the perspective of plausible reasoning, drawing out and contrasting theoretical positions and comparing them with experimental data concerning people’s intuitions.

2 Testimony, Argumentation and the ‘Third Way’

Argumentation typically involves uncertainty. In arguing, we seek to convince those who are not yet fully convinced of a position. There is no point seeking to convince further someone who is already fully convinced of a position, nor is there, practically, much point in trying to convince someone of a position that they are certain is actually wrong. This severely limits the role of classical logic in everyday argument, and most everyday arguments are not logically valid (see, e.g. Toulmin 1958; Perelman and Olbrechts-Tyteca 1969). Appropriate norms for rational argument must consequently deal naturally with uncertainty. The probability calculus provides the standard formal tool for dealing with uncertainty. However, many authors have held the view that probabilities are inappropriate or insufficient for dealing with argumentation (for references and discussion of some of these critiques, see, e.g. Hahn and Oaksford 2006b) and have advanced the view that there is some third form of reasoning, in addition to deduction and induction, that requires formal development. Specifically, ‘plausible reasoning’ may constitute such a third option (see, e.g. Pollock 2001; Walton 2004). However, despite many differences in detail, one may consider as proponents of a potential ‘third way’ any of the many default logics, non-monotonic logics and logics of practical reasoning that have been proposed (see, e.g. Prakken and Vreeswijk 2002, for an overview). Furthermore, not only may these ‘third way’ approaches be considered as candidates for formalising argumentation, they are frequently advanced as tools for dealing with uncertainty per se.

A central concept within this third way tradition is the notion of ‘presumption’ or ‘default’ (see Rescher 1976). A presumption is a position that is adopted, ‘as a rule’, in the absence of specific counter-indication. A dialectic borrowing from law, the notion of presumption is closely related to another legal import, the concept of burden of proof (for an overview and critical evaluation of the burden of proof in the context of argumentation, see Hahn and Oaksford 2007b). The basis for presumption is plausibility. Rescher (1977), for example, states this as follows:

Presumption favors the most plausible of rival alternatives – when indeed there is one. This alternative will always stand until set aside (by the entry of another, yet more plausible, presumption). (p. 38)

Plausibility, for Rescher, is not a matter of probability but rather of how well something ‘fits’ within our overall framework of cognitive commitments.

Interestingly, testimony plays an important role in this for Rescher:

The standing of sources in point of authoritativeness affords one major entry point to plausibility. In this approach, a thesis is more or less plausible depending on the reliability of the sources that vouch for it – their entitlement to qualify as well-informed or otherwise in a position to make good claims to credibility. It is on this basis that ‘expert testimony’ and ‘general agreement’ (the consensus of men) come to count as conditions for plausibility. (p. 39)

This thread is developed further by Walton (2008), who explicitly adopts the ‘third way’ approach based on defaults and defeasible claims as a framework for ‘modeling rational thinking about witness testimony as a kind of evidence’ (p. 3). Walton shares Rescher’s ready dismissal of probability as a tool (see, e.g. Walton 2008, pp. 92–102, 2001; Rescher 1976, in particular, pp. 28–39; for counterarguments, see, e.g. Hahn and Oaksford 2006b; note also, however, that despite some discussion of Bayesianism, Walton typically seems to think of probabilities as ‘objective’, ‘statistical’ quantities, see, e.g. Walton 2008, pp. 206–209). Walton rejects the view that witness testimony may be seen as inductively strong (2008, p. 99). He seeks instead to provide rational, normative guidance on testimony by characterising ‘the appeal to witness testimony’ as a particular kind of argument with its own structure, that is, its own premises and conclusions, and requirements or queries that must be satisfied in order for the argument to be cogent. Specifically, Walton tries to establish a so-called argumentation scheme for witness testimony. An argumentation scheme is a stereotypical pattern of inference that is characterised by its specific type of premises and conclusion, along with the nature of the inferential link between the two (see also, e.g. Walton 1996, 2008; Verheij 2003). This scheme-based approach seeks to broaden the range of circumstances in which a conclusion can be viewed as rationally derived from a set of premises which are assumed to be true in order to capture the many informal arguments that are beyond the reach of classical logic. In general, the conclusion is defeasible (held tentatively subject to further information), and the rationality of a particular scheme rests on the fact that the defeasible inference or presumption is typically plausible, given what we know about the world.

For testimony, Walton provides a number of related schemes, the most basic of which is the ‘argument from a position to know’ and of which the ‘argument from expert opinion’ (discussed more extensively below) is a subtype. Walton’s paradigmatic example concerns a dialogue in which someone lost in a foreign city asks a stranger for directions to the central station. Here, the person seeking directions presumes that the stranger is familiar with the city, and the underlying scheme in such a dialogue, the ‘argument from a position to know’ (Walton 1996, p. 61), has the following structure:

  • Major premise: Source a is in a position to know about things in a certain subject domain S containing a proposition A.

  • Minor premise: a asserts that A (in domain S) is true (false).

  • Conclusion: A is true.

This scheme shifts ‘a probative weight’ from the premises to the conclusion (see also Walton et al. 2008), which is rendered defeasibly plausible or acceptable. However, matching the argument from a position to know are three critical questions:

  • CQ1: Is a in a position to know whether A is true (false)?

  • CQ2: Is a an honest (trustworthy, reliable) source?

  • CQ3: Did a assert that A is true (false)?

In a given case, the argumentation scheme is evaluated in light of these critical questions. When such a question is asked, the probative weight ‘shifts’, and it shifts back again only if the question is answered satisfactorily.

Identifying the structure of particular types of arguments is an important and interesting issue for argumentation research. However, at the end of the day, the practically most pressing question in evaluating any particular argument or line of reasoning is how strong it should be considered to be. Normative frameworks must also have something to say about this issue. Such summary evaluation needs to reflect the fact that evidence can be more or less compelling, and that, often, multiple sources of evidence must be combined. The majority of ‘third way’ approaches arguably bypass this evaluation question altogether, not just in the context of testimony. Walton (2008) also suggests that finding satisfactory answers to the question of evaluating strength in the case of testimony is more difficult than just drawing out its structural characteristics. However, Walton (2008) does draw together various evaluation rules that have been proposed in the literature and expands on these to provide a framework for the evaluation of plausible arguments.

Integral to this are two different evaluation contexts that, according to Walton (2008; see also Walton 1992), need to be distinguished: these are what are known in the argumentation literature as linked and convergent arguments (or ‘coordinative’ and ‘subordinative’ argumentation, see also van Eemeren and Grootendorst 2004, and Johnson 2000, for discussion and further references). In a convergent argument, a number of arguments each independently support a claim. By contrast, in linked arguments, arguments depend on each other and provide support for the claim only in combination. Although this distinction is not captured by classical logic, Walton maintains that it is fundamental in dialectic contexts because these two types of arguments can be attacked (or need to be defended) in very different ways. In the case of a linked argument, an opponent will seek out the weakest of the premises because, once this fails, the whole argument fails. However, this strategy is not sufficient in the case of a convergent argument because ‘taking out’ one premise still leaves the other intact as a separate line of support.

For linked arguments, Walton proposes that the overall strength of the argument is determined by the weakest link. This so-called MIN rule (see also Walton 1992) follows on from proposals by Rescher (1976) and Pollock (2001).

Rescher put forward Theophrastus’ rule as a consequence condition for plausible reasoning:

when a set of mutually consistent propositions in a given set of propositions with plausibility values entails some other proposition in that set, the resulting proposition cannot be less plausible than the least plausible among them. (Rescher 1976, p. 15)

Pollock (1995, pp. 95–101) generalised Theophrastus’ rule to chains of arguments through ‘the weakest link principle’ (p. 99):

the degree of support of the conclusion of a deductive argument is the minimum of the degrees of support for its premises.

Walton, however, is explicit in extending this weakest link principle to arguments that are not deductively valid (e.g. Walton 2008, p. 96).

For convergent arguments, by contrast, a different rule is required, and for these, Walton (1992, 2008) recommends the MAX rule, whereby the overall plausibility of the conclusion corresponds to the plausibility of the strongest of the independent lines of support (or is at least as strong).

The result then is the so-called MAXMIN rule (Walton 1992, p. 43, 2008), whereby a reasoner is instructed to:

At each local argument in the sequence of connected argumentation, use the least plausible premise rule if the argument is linked, and use the most plausible premise rule if the argument is convergent

Testimony and source reliability, for Walton (2008), involve linked arguments. The content and the source of a testimonial statement are not independent lines of support that stand one without the other but rather are inextricably linked. In the remainder of this chapter, it will be argued that the MAXMIN rule does not provide an appropriate approach to testimony and that it conflicts with fundamental intuitions. Before dealing with the case of testimony specifically, however, some general concerns regarding the MAXMIN rule will be discussed.

3 Some Problems for MAXMIN

A fundamental problem for the consequence condition, and with it the MIN rule, involves conjunction. From A and B, it follows deductively that A & B. Hence, by these rules, it should be the case that (Footnote 1):

$$ \mathrm{Plausibility}(A \& B) \ge \mathrm{MIN}\big(\mathrm{Plausibility}(A),\ \mathrm{Plausibility}(B)\big) $$
(1)

However, intuitive examples can readily be found where this does not seem to be the case. Walton (1992) discusses two; the first is as follows:

  • A = Jones is less than 5 ft tall.

  • B = Jones is an all-star forward in the NBA for the Los Angeles Lakers.

  • Conclusion (A&B) = Jones is a less than 5-ft-tall all-star forward in the NBA for the Los Angeles Lakers.

Here, even if there is evidence to support the plausibility of both A and B individually, the plausibility of an extremely successful basketball player who is less than 5 ft tall seems rather limited.

The problem here may be that the two statements seem to point in opposite directions, and it might thus be addressed by an additional condition Rescher (1976) placed on plausible inference, namely, that the propositions in the set must be ‘logically compatible and materially consonant with one another’ (p. 15). This condition would seem to raise more problems than it solves, in that it remains entirely unclear how such consonance is to be assessed (see also Walton 1992); it also indicates the limitations of the rule, which now leaves even such simple cases without an evaluation procedure.

However, the compatibility condition is also insufficient in that neither logical nor probabilistic conflict is required to generate problematic examples, as is clear from Walton’s second example:

  • A = The first flip of this coin will be heads.

  • B = The second flip of this coin will be heads.

  • Conclusion (A&B) = Both the first and the second flip of this coin will be heads.

A and B are entirely compatible, but their conjunction nevertheless seems less plausible than each of them individually, and, as Walton (1992) concedes, ‘plausibility seems to parallel probability in this case’ (p. 38). It is the contention of this chapter that this is true in other cases also.

This particular case (which links also to Kyburg’s (1961) ‘lottery paradox’ and Makinson’s (1965) ‘preface paradox’; see Wheeler 2007 for a review, and, on the preface paradox, see also Blamey, this volume; Footnote 2), however, is particularly transparent, in that there are clear probabilities we are willing to attribute to coin tosses. The example illustrates that conjunction is not ‘probability functional’ (Adams 1998). Whereas compounds such as ‘E and L’ and ‘E or L’ are truth-functional, in the sense that their truth values are functions of the truth values of E and L, they are not probability functional, because their probabilities are not functions (solely) of the probabilities of E and L. By the same token, probabilities are not simply ‘degrees of truth’ as in many-valued logics, and the combination of probability and logic must respect the unique inference and combination rules of each if it is to be successful.

For probabilities, in fact, the conjunction of two events can be no more probable than the less probable of the two events, that is,

$$ P(A \& B) \le \mathrm{MIN}\big(P(A),\ P(B)\big) $$
(2)

and violations of this in judgement are known as the ‘conjunction fallacy’ (e.g. Tversky and Kahneman 1983). That is, in the case of probabilities and their conjunction, the ‘weakest link’ does not provide a lower bound on strength as stipulated by the consequence condition and the MIN rule but rather an upper bound.

Probabilistically, the value assigned to a conjunction is governed by the relationship

$$ P(A \& B) = P(A)\,P(B|A) = P(B)\,P(A|B) $$
(3)

that is, the relationship between A and B, as captured by the conditional probabilities, also matters.

Hence, in the specific example of the coin toss, with P(A) and P(B) each reflecting an unbiased coin at .5, P(A&B), by Eq. 3, equals .25. In other words, the probability of the conjunction can be lower than the minimum of P(A) and P(B) and can be as low as zero. Specifically, the lower bound on the value of the conjunction is 0 if P(A) + P(B) ≤ 1, and P(A) + P(B) − 1 otherwise (Footnote 3). The coin example is troubling for the consequence condition because it would otherwise seem to fall squarely within its remit. The example may, however, be relegated to the role of an ‘exception’ if everyday arguments can be mapped onto probabilities only in exceptional circumstances or if it can be shown that our intuitions about everyday arguments clearly follow the consequence condition and the MIN rule instead. It will be the goal of the remainder of this chapter to show that neither is, in fact, the case.
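Before moving on, the bounds just described can be made concrete with a minimal sketch (in Python; the fair-coin values are those of Walton’s example, and everything else is purely illustrative):

```python
# Illustrative comparison of the MIN rule with the probabilistic bounds
# on a conjunction. All probabilities are assumed values for exposition.

def conjunction_bounds(p_a, p_b):
    """Bounds on P(A & B) given only the marginals P(A) and P(B)."""
    lower = max(0.0, p_a + p_b - 1.0)   # the conjunction may even be impossible
    upper = min(p_a, p_b)               # it can never exceed the weaker conjunct
    return lower, upper

p_a = p_b = 0.5                         # two fair coin flips
lower, upper = conjunction_bounds(p_a, p_b)

print(f"MIN rule (claimed lower bound): {min(p_a, p_b):.2f}")
print(f"Probabilistic upper bound:      {upper:.2f}")
print(f"Probabilistic lower bound:      {lower:.2f}")
print(f"P(A & B) under independence:    {p_a * p_b:.2f}")
```

Run on the coin example, the MIN rule’s putative lower bound of .5 turns out to be the probabilistic upper bound, while the true lower bound is 0 and the value under independence is .25.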

4 A Bayesian Perspective

At the heart of the Bayesian perspective on testimony is Bayes’ theorem – a normative rule for updating beliefs based on new evidence (Footnote 4):

$$ P(h|e) = \frac{P(h)\,P(e|h)}{P(h)\,P(e|h) + P(\neg h)\,P(e|\neg h)} $$
(4)

according to which one’s posterior degree of belief in a hypothesis, h, in light of the evidence, P(h|e), is a function of one’s initial, prior degree of belief, P(h), and how likely it is that the evidence one observed would have occurred if one’s initial hypothesis was true, P(e|h), as opposed to if it was false, P(e|¬h). These latter two quantities P(e|h) and P(e|¬h) may be thought of as the ‘hit rate’ and ‘false positive rate’ of a diagnostic test. Their ratio, the so-called likelihood ratio, provides a natural measure of the diagnosticity of the evidence – that is, its informativeness regarding the hypothesis in question.

Crucially, if P(e|h) > P(e|¬h), then receipt of the evidence will result in an increase in belief in h, whereas if P(e|h) < P(e|¬h), then receipt of the evidence will result in a decrease, and if the two are equal, our beliefs remain unchanged. Moreover, the magnitude of the difference between P(e|h) and P(e|¬h) will influence directly how much change in belief is brought about – more diagnostic evidence will lead to higher posterior degrees of belief. Finally, where there is more than one piece of evidence, their combined impact is readily derived through sequential application of Bayes’ theorem, taking the posterior at each step as the new prior that is combined with the next piece of evidence in order to calculate its impact.
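A minimal sketch of this updating process (plain Python; the hit and false-positive rates below are assumed values, not drawn from any of the studies discussed here) may help make the sequential logic concrete:

```python
# Minimal sketch of sequential Bayesian updating for a binary hypothesis.
# The hit and false-positive rates are illustrative assumptions only.

def update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(h | e) via Bayes' theorem (Eq. 4)."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1.0 - prior) * p_e_given_not_h)

belief = 0.5                                   # prior P(h)
evidence_items = [(0.8, 0.2), (0.6, 0.4)]      # (hit rate, false-positive rate) per item
for hit, false_positive in evidence_items:
    belief = update(belief, hit, false_positive)    # posterior becomes the next prior
    print(f"belief after this item: {belief:.3f}")  # 0.800, then 0.857
```

Note that the second, less diagnostic item still raises the degree of belief; its impact is simply smaller.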

This captures naturally the simple case in which multiple independent witnesses all provide the same testimony. This may be illustrated with a further example of Walton’s (1992, p. 42):

Virgil said sincerely that there is a fire.

Vanessa said sincerely that there is a fire.

Therefore, there is a fire at the university.

where Virgil is a highly reliable source and Vanessa somewhat less reliable. From the Bayesian perspective, this means that the ratio P(‘Virgil says fire’|fire)/P(‘Virgil says fire’|no fire) is greater than that of P(‘Vanessa says fire’|fire)/P(‘Vanessa says fire’|no fire). Consequently, Virgil’s testimony on its own will lead to greater degrees of belief in the presence of a fire than Vanessa’s. However, receiving Vanessa’s independent evidence will further increase our belief in the presence of a fire, as will every further witness, even one who is less reliable still (as long as the relevant likelihood ratio is greater than 1). In other words, each witness has their own impact on our conviction, with that impact scaled by their reliability (Footnote 5).
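In odds form, this combination of independent reports is simply a product of likelihood ratios. The sketch below uses made-up ratios (Virgil treated as more reliable than Vanessa) purely to illustrate the point:

```python
# Illustrative odds-form combination of independent witness reports.
# The prior and the likelihood ratios are assumptions chosen for exposition.

def posterior_from_reports(prior, likelihood_ratios):
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr                     # each independent report multiplies the odds
    return odds / (1.0 + odds)

prior_fire = 0.1
print(posterior_from_reports(prior_fire, [9.0]))        # Virgil alone: 0.50
print(posterior_from_reports(prior_fire, [9.0, 3.0]))   # plus Vanessa: 0.75
```

Vanessa’s less diagnostic report still adds to the overall conviction, rather than being discarded in favour of the single strongest line of support.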

From the perspective of plausible reasoning, this constitutes a convergent argument because each premise provides a separate, independent line of evidence; consequently, the MAX rule is applied (Walton 1992, p. 42). At best, this leaves the exact plausibility of the conclusion under-defined, because the rule stipulates simply that “in a convergent argument the conclusion is at least as plausible as the most plausible premise” (p. 42, italics added); at worst, it ignores entirely Vanessa’s testimony (and an army of potential further witnesses like her) if for convergent arguments we “take the maximum of the value of the premises” (p. 44). Either way, this seems a less satisfactory treatment.

This, however, is really only the simplest case of testimony. More subtle, and considerably less well-examined, issues arise when witnesses differ not just in the reliability of their testimony but also in its content. These issues will be the focus of the remainder of this chapter.

5 Message Content and Message Source: Exploring Norms and Intuitions

Where witnesses differ not just in reliability but also in the content of their testimony, the impact of both of these factors on the believability (or plausibility) of the conclusion needs to be taken into account.

There are two ways in which source reliability might be factored into a Bayesian model of a given task. The first is to consider source reliability as an exogenous variable; that is, inherent characteristics of the evidence – or message content – and the characteristics of the source providing that evidence are (implicitly) combined into a single, overall likelihood ratio (as in, e.g. Birnbaum and Mellers 1983; Birnbaum and Stegner 1979; Corner and Hahn 2009). In other words, evaluation is based on the subjective probability of the composite evidence E ‘that specific message from that specific source’ conditional on truth or falsity of the hypothesis, that is, P(E|H) and P(E|¬H).

The second possibility is to model source reliability endogenously, capturing it through one or more explicit variables in the model (as in, e.g. Bovens and Hartmann 2003; Friedman 1987; see also Goldman 1999; Hahn et al. 2009; Hahn and Oaksford 2007a, b; Pearl 1988; Schum 1981, 1994). This involves a cascaded inference in a hierarchical model. Figure 1 shows a simple hierarchical model with which to capture an evidence report from a partially reliable source. This model captures explicitly the fact that what is received is a report of some evidence via a partially reliable source, not the evidence directly. In other words, it naturally captures cases of testimony where evidence of an event is based on a witness’s description, not on first-hand experience.

Fig. 1 A hierarchical model in which the reliability of the reporting source is captured explicitly. Three levels are distinguished: the underlying hypothesis H, the evidence E and the source’s actual report of that evidence, Erep

The likelihood ratio associated with such an evidence report, Erep, is given by Eq. 5:

$$ \frac{P(E|H)\,[P(E_{rep}|E,H) - P(E_{rep}|\neg E,H)] + P(E_{rep}|\neg E,H)}{P(E|\neg H)\,[P(E_{rep}|E,\neg H) - P(E_{rep}|\neg E,\neg H)] + P(E_{rep}|\neg E,\neg H)} $$
(5)

Here, P(Erep|E,H) represents the probability of an evidence report, Erep, to the effect that the evidence E obtains, given that both E and H (the hypothesis) are true, and so on (see also Schum 1981). If the witness is completely reliable and reports only the true state of the evidence, then Eq. 5 reduces simply to the standard direct relationship between evidence and hypothesis. An immediate, general characteristic of testimony arises from this formalisation. Specifically, the evidential characteristics of the report vis-à-vis the hypothesis are a multiplicative combination of the diagnosticity of the evidence itself and the characteristics of the reporting source, that is, the source’s own hit and false alarm rate regarding the true state of that evidence.
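A small sketch of Eq. 5 (in Python; the probability values are illustrative assumptions, and the computation uses the equivalent marginalisation form P(Erep|H) = P(E|H)P(Erep|E,H) + P(¬E|H)P(Erep|¬E,H)) makes this multiplicative structure explicit:

```python
# Sketch of the report likelihood ratio of Eq. 5, written in the equivalent
# marginalisation form. All probability values below are illustrative assumptions.

def report_likelihood_ratio(p_e_h, p_e_noth,
                            p_rep_e_h, p_rep_note_h,
                            p_rep_e_noth, p_rep_note_noth):
    """P(Erep|H) / P(Erep|not-H) for a report relayed by a partially reliable source."""
    p_rep_h = p_e_h * p_rep_e_h + (1 - p_e_h) * p_rep_note_h
    p_rep_noth = p_e_noth * p_rep_e_noth + (1 - p_e_noth) * p_rep_note_noth
    return p_rep_h / p_rep_noth

# A fully reliable witness (reports E exactly when E is true) recovers the
# evidence's own diagnosticity, P(E|H) / P(E|not-H) = 0.7 / 0.2 = 3.5:
print(report_likelihood_ratio(0.7, 0.2, 1.0, 0.0, 1.0, 0.0))

# A partially reliable witness (hit rate 0.9, false alarm rate 0.1) dilutes it:
print(report_likelihood_ratio(0.7, 0.2, 0.9, 0.1, 0.9, 0.1))   # ~2.54
```

The second call shows the source’s imperfect hit and false-alarm rates being folded multiplicatively into the diagnosticity of what is ultimately received.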

If we contrast sources high and low in reliability and contrast arguments from these sources that are either weak or strong, then this multiplicative combination means that we should see not only independent contributions of source reliability and argument strength to posterior degree of belief, but these factors should also interact (see Fig. 2 below). This is indeed what is observed in recent experimental studies of argumentation (Footnote 6). Specifically, participants in Hahn et al.’s (2009) studies saw arguments such as the following:

Dave: This drug is safe.

Jimmy: How do you know?

Dave: Because I read that there have been fifty experiments conducted, and they didn’t find any side effects.

Jimmy: Where did you read that?

Dave: I read it in the journal Science just yesterday.

(strong content/reliable source)

or,

Dave: This drug is safe.

Jimmy: How do you know?

Dave: Because I read that there has been one experiment conducted, and it didn’t find any side effects.

Jimmy: Where did you read that?

Dave: I got sent a circular email from excitingnews@wowee.com

(weak content/unreliable source)

as well as the combinations ‘strong content/unreliable source’ and ‘weak content/reliable source’. Figure 2 shows the resultant ratings of convincingness given by participants. The convincingness of the arguments was affected both by the nature of the source and by the content of the argument, with a statistical interaction between the two, in line with the Bayesian norm. This interaction can be seen in the figure in the ratings for the strong content/reliable source condition, which receives an extra ‘boost’ relative to the difference between reliable and unreliable sources in the weak argument condition.

This may be contrasted, once again, with the evaluation suggested by a plausible reasoning perspective. As noted above, Walton (2008) states that such cases should be considered as linked arguments. On receiving an argument from a source of given reliability, one can attack either the argument itself or the reliability of the source. Undermining the reliability of the source will also undermine the argument (unless, of course, that argument has some independent basis). Hence, the two components form a linked argument, in which, according to the MIN rule, the overall strength depends on the weakest link. If the plausibility of the conclusion is set to the weaker of the two components, however, then evaluation will necessarily be blind to one of the dimensions of variation considered in the matrix of Fig. 2, panel (a), and the data of panel (b). Specifically, if the degree of plausibility assigned to the claim that the source is reliable is less than the plausibility value assigned to the content of either the weak or the strong argument, then the variation in strength of content is immaterial. Conversely, if the plausibility value attached to the source is higher than that attached to the argument content, then the variation in reliability is without consequence. Thus, the MIN rule would seem to violate fundamental intuitions about argument strength across such simple sets of arguments.

Fig. 2 Varying both source reliability and argument strength. Panel (a) highlights the factorial combinations arising from contrasts of weak and strong evidence combined with low and high source reliability; Panel (b) shows data from a corresponding experimental manipulation in Exp. 1 from Hahn et al. (2009).

This limitation stems from the fact that the weakest link idea is implemented in what is in effect a ‘loser takes all’ fashion. At fault here is not the intuition that the impact of testimony should somehow be limited by the reliability of the source, but rather the specific way in which this fundamental intuition is implemented. Notably, there is a way in which source reliability caps the influence of evidence within a Bayesian framework as well (see also Oaksford and Hahn, this volume). Returning to Eq. 5 above and its multiplicative nature, we had noted that if the witness is completely reliable and reports only the true state of the evidence, then Eq. 5 reduces simply to the standard direct relationship between evidence and hypothesis. By the same token, where the evidence is entirely deterministic and arises if and only if the hypothesis is true (i.e., P(E|H) = 1, P(E|¬H) = 0), the hit and false positive rates of the witness completely determine the characteristics of the report. From this latter case, it can be seen that less than perfect reliability of the witness necessarily reduces the overall diagnosticity of the evidence received. How diagnostic the report can be, and hence what posterior degree of belief it can bring about, is capped by the reliability of the witness (see Hahn et al. 2009).

The effects of this are demonstrated in Fig. 3, which contrasts the resultant posterior degree of belief arising from message content that ranges in strength from weak to extremely strong (as measured by the likelihood ratio associated with the content itself) when that evidence is received from a partially as opposed to a fully reliable source. In other words, the Bayesian perspective captures naturally the sense that limits on the reliability of the source must limit the ultimate conviction in the conclusion that their argument brings about, but it does so without the counter-intuitive consequences of the MIN rule or weakest link principle.
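The following sketch reproduces this qualitative pattern (it is not the parameterisation behind Fig. 3; the reliability values, the prior of .5 and the mapping from likelihood ratio to P(E|H) and P(E|¬H) are all assumptions made for illustration):

```python
# Sketch of how source reliability caps the attainable posterior, however
# strong the message content. All parameter choices are illustrative assumptions.

def posterior_from_report(prior, content_lr, hit, false_alarm):
    """Posterior P(H | Erep) when content with likelihood ratio `content_lr`
    is relayed by a source with the given hit and false-alarm rates."""
    # One convenient way to turn a likelihood ratio into P(E|H) and P(E|not-H):
    p_e_h = content_lr / (content_lr + 1.0)
    p_e_noth = 1.0 / (content_lr + 1.0)
    p_rep_h = p_e_h * hit + (1.0 - p_e_h) * false_alarm
    p_rep_noth = p_e_noth * hit + (1.0 - p_e_noth) * false_alarm
    num = prior * p_rep_h
    return num / (num + (1.0 - prior) * p_rep_noth)

for content_lr in (1, 3, 10, 100, 1000):
    fully = posterior_from_report(0.5, content_lr, hit=1.0, false_alarm=0.0)
    partial = posterior_from_report(0.5, content_lr, hit=0.8, false_alarm=0.2)
    print(f"content LR {content_lr:>4}: fully reliable {fully:.3f} | partially reliable {partial:.3f}")
```

With the assumed hit and false-alarm rates of .8 and .2, the posterior obtained via the partially reliable source never exceeds .8, however strong the content, whereas the fully reliable source allows the posterior to approach 1.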

Fig. 3 The figure contrasts the relative impact of receiving the same message (varying in evidential strength as measured by the likelihood ratio) given variation in the reliability of the reporting source. The x-axis captures message strength, and the y-axis indicates the resultant degree of conviction, measured as posterior degree of belief (with the prior always set to .5)

Finally, there has been interest recently in considering another way in which message content and source reliability interact, namely, not just as distinct factors that determine how convincing an argument is, but rather as sources of evidence that may be seen to possess inferential value about each other. This intuition is captured in a further hierarchical model of evidence from partially reliable sources by Bovens and Hartmann (2003).

Again, the hypothesis (or conclusion) at stake, the source and the evidence presented by the source all have an explicit representation within a simple Bayesian belief network. This simple network is shown in Fig. 4. The network captures the intuition that what a source actually reports, Erep, is determined both by the ‘actual’ evidence and the reliability of the source, in that a less than fully reliable source may misreport the evidence in question. But it differs from the hierarchical model of Fig. 1 above in exactly which relationships and factors are represented explicitly and hence which can be explicitly reasoned about. The above model of Fig. 1 distinguishes as explicit variables the ‘actual’ evidence E and the source’s report, Erep. By contrast, the present model in Fig. 4 wraps this distinction into a direct relationship between hypothesis H and report. But it represents as an explicit variable the reliability of the source, Rel, whereas the former model captures the reliability of the source in the relationship (i.e., likelihood ratio) between E and Erep. The two models share all the general characteristics discussed so far, namely, the multiplicative relationship between message content and message source in their effect on posterior degree of belief, and the fact that the degree of reliability of the source ‘caps’ the impact of the source’s evidence (see also Hahn et al. 2009). However, by representing the reliability of the source as a separate variable, the model of Fig. 4 captures the intuition that the content of the message potentially has an impact not just on our degree of belief in the conclusion (hypothesis) but also on how reliable we consider the source to be, even in those circumstances where we are by no means certain in our beliefs. Receiving an evidence report that conflicts with our beliefs about the hypothesis can not only shift our belief in that hypothesis but also, simultaneously, lower our belief in the source’s credibility.
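A much-simplified sketch in the spirit of the Fig. 4 model illustrates this two-way traffic. Here the unreliable source is treated as a randomiser that reports ‘true’ at a fixed rate, and all numerical values are assumptions chosen for illustration rather than parameters from Bovens and Hartmann (2003):

```python
# Simplified sketch of a Fig. 4-style network: binary hypothesis H, binary
# reliability Rel (independent of H a priori) and a positive report.
# A reliable source reports the truth about H; an unreliable one reports
# 'true' at a fixed rate a. All numbers are illustrative assumptions.
from itertools import product

p_h, p_rel, a = 0.2, 0.7, 0.5    # prior on H, prior on reliability, randomisation rate

def p_report_true(h, rel):
    if rel:
        return 1.0 if h else 0.0   # reliable source tracks the truth about H
    return a                       # unreliable source is a randomiser

# Joint probability of each (H, Rel) state together with a positive report.
joint = {}
for h, rel in product([True, False], repeat=2):
    prior = (p_h if h else 1 - p_h) * (p_rel if rel else 1 - p_rel)
    joint[(h, rel)] = prior * p_report_true(h, rel)

z = sum(joint.values())
print("P(H | report)   =", round(sum(v for (h, _), v in joint.items() if h) / z, 3))     # rises from 0.2
print("P(Rel | report) =", round(sum(v for (_, rel), v in joint.items() if rel) / z, 3))  # falls from 0.7
```

Because the report asserts something the prior deems unlikely, it simultaneously raises the probability of the hypothesis and lowers the estimated reliability of the source – precisely the kind of mutual inference the explicit Rel variable makes possible.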

Fig. 4 A simple explicit model of hypothesis, evidence and source. The model shown consists of three binary variables representing the hypothesis or claim in question, H, the evidence report provided by the source, Erep, and a variable governing the reliability of the source, Rel. As indicated by the arrows, the evidence report is influenced by both the truth/falsity of the hypothesis and whether or not the source is reliable; however, the reliability of the source and the truth/falsity of the hypothesis itself are assumed (in this model) to be independent

Bovens and Hartmann (2003) demonstrate a number of interesting consequences of this model for central questions in epistemology, and the underlying intuition is embodied also by the agents in Olsson and Angerer’s simulations of knowledge in networks of interacting agents (see, e.g. Olsson, this volume). Empirical support for this intuition, finally, stems from a recent study (Jarvstad and Hahn 2011, Exp. 2) demonstrating that participants readily drew conclusions about the degree of reliability of a source based on the content of a source’s very simple communications, even though participants had no way of being sure about that content (see also Reimer et al. 2004).

6 Rehousing Argumentation Schemes Within a Bayesian Framework

We have sought to demonstrate thus far that the evaluation rules for plausible reasoning conflict both with the Bayesian framework and with common intuition. This does not, however, mean that the argumentation schemes and the critical questions that accompany them within Walton’s defeasible reasoning framework are without merit. Rather, they genuinely capture criteria that are typically relevant. The critical questions introduced in the context of the example of the visitor asking for directions above are ones that are clearly relevant, and it is where, in the normal course of affairs, we have reason to believe that the criteria they describe are met that the inference from testimony to the actual location of the central station seems justified.

At the same time, it is the contention of the Bayesian approach that notions such as ‘relevance’, ‘typically’ and ‘in the normal course of affairs’ can be handled naturally within the probability calculus (see, e.g. Pearl 1988; and in the argumentation context specifically, also Hahn and Oaksford 2006a, b). Moreover, the calculus captures naturally dynamic changes in relevance through the notion of conditional independence (see Pearl 1988): the probability of outcome A in light of variable B may be different in the presence of C than it is without it, and such dependence relationships are captured naturally in the graph structures of Bayesian belief networks that we have drawn on already.

So, for example, in the network of Fig. 1 above, the testimonial report Erep will cease to be relevant if – for whatever reason – we gain access to the evidence, E, itself; once the state of E is known, receiving a report on its state no longer changes our belief in H (relatedly, experimental evidence from the persuasion literature finds that the impact of source reliability is moderated by the recipient’s own ‘expertise’; see, e.g. Ratneshwar and Chaiken 1991). Bayesian belief networks represent relevant variables (as nodes), capture directions of influence between them as weighted links, and do so in a way that supports probabilistic inference (i.e., the propagation of beliefs). In the remainder, we thus seek to provide a simple example of how an argumentation scheme can be represented within this formalism. For this example, we use Walton’s argumentation scheme for the argument from expert opinion, which is a subtype of the argument from a position to know outlined above.
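Before turning to that example, the screening-off point just described can be checked directly by enumeration. The sketch below assumes, for simplicity, that the report depends on H only through the evidence E; all probability values are illustrative:

```python
# Sketch of conditional independence (screening off) in the chain H -> E -> Erep,
# assuming the report depends on H only through E. Probabilities are illustrative.
from itertools import product

p_h = 0.3
p_e_given = {True: 0.8, False: 0.1}      # P(E | H), P(E | not-H)
p_rep_given = {True: 0.9, False: 0.2}    # P(Erep | E), P(Erep | not-E)

def joint(h, e, rep):
    return ((p_h if h else 1 - p_h)
            * (p_e_given[h] if e else 1 - p_e_given[h])
            * (p_rep_given[e] if rep else 1 - p_rep_given[e]))

def p_h_given(e=None, rep=None):
    """P(H = True | the stated observations), by brute-force enumeration."""
    num = den = 0.0
    for h, ev, r in product([True, False], repeat=3):
        if (e is not None and ev != e) or (rep is not None and r != rep):
            continue
        p = joint(h, ev, r)
        den += p
        num += p if h else 0.0
    return num / den

print(p_h_given(rep=True))          # the report alone is informative about H
print(p_h_given(e=True))            # ~0.774
print(p_h_given(e=True, rep=True))  # identical: once E is known, Erep adds nothing
```

The last two values coincide, reflecting that E screens the report off from H in this structure.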

Appeals to expert opinion arise in any situation in which we lack specialised knowledge in a domain and, in some cases, might be the only option we have available to us (e.g. consulting our G.P. to diagnose a set of symptoms). Such appeals can take the form of the fallacy of ad verecundiam, if the appeal is made to ‘parties having no legitimate claim to authority in the matter at hand’ (Copi and Cohen 1994, p. 119). The task of evaluating the strength of an appeal to expert opinion is essentially, therefore, one of evaluating the expertise of the party to whom the appeal is made. Walton (1997, 2008, p. 218) outlined six critical questions for evaluating the strength of an appeal to expert opinion (Table 1). As we shall see, these six questions are well captured within a Bayesian network.

Table 1 The six critical questions for the appeal to expert opinion (Adapted from Walton 2008, p. 218)

Figure 5 shows a simple Bayesian network within which it is possible to evaluate the answers to all six of the questions outlined by Walton (Table 1), and here, we outline how each question is addressed within the network. Walton (2008, p. 218) defines the credibility of S as an expert source in terms of whether S has mastery of a domain of knowledge or skill. We conflate the expertise question and the field question, since expertise is only relevant if it lies in the particular domain under consideration. These questions are therefore captured by the prior probability assigned to ‘expertise’. Of course, the network could be extended to the top right to allow for parents of this node, enabling evidence to be presented in support of S’s credentials. Such nodes could be direct representations of Walton’s five subquestions critical to determining whether S might be called an expert, pertaining to qualifications, references, record of experience, record of successful predictions and record of previous projects reviewed by other experts.

Fig. 5 A Bayesian network representation of the appeal to expert opinion, within which all critical questions raised by Walton (2008) can be addressed

‘What did S assert that implies H?’ In Fig. 5, S directly asserts that H is true. S might, however, choose to assert only an intermediate fact (e.g. ‘Evidence’ in Fig. 5). In this instance, the degree to which this assertion implies H is given by the likelihood ratio P(Evidence|H)/P(Evidence|¬H).

‘Is S personally reliable as a source?’ is captured by the prior degree of belief assigned to the ‘trustworthiness’ node. As with the ‘expertise’ node, the network could be extended to provide evidence for S’s trustworthiness.

‘Is H consistent with what other experts assert?’ Nodes ‘Hrep (from S2)’ and ‘Hrep (from S3)’ represent the reports of other experts on whether or not H is true.

‘Is S’s assertion based on evidence?’ This question is captured by the opinion question already covered. Note, though, that the network can be extended for cases in which S’s statement is based on more than a single item of evidence.

Figure 5 therefore illustrates how central features of appeals to expert opinion can be captured within a Bayesian network. Given the known evidence and the assignment of conditional probability values, the network prescribes how likely H is to be true; moreover, information propagates through the network, updating degrees of belief in the expertise and trustworthiness of the expert source (as in Jarvstad and Hahn 2011, Exp. 2).
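A deliberately reduced sketch of such a network (only H, Expertise, Trustworthiness and a single assertion from S; the priors and conditional probabilities are assumptions chosen for illustration, not values from the chapter) shows this propagation in both directions:

```python
# Much-reduced illustration of a Fig. 5-style network: hypothesis H, the
# source's Expertise and Trustworthiness, and S's assertion that H is true.
# All probability values are illustrative assumptions.
from itertools import product

p_h, p_exp, p_trust = 0.2, 0.8, 0.8

def p_assert(h, expert, trustworthy):
    """P(S asserts H | H, Expertise, Trustworthiness)."""
    if expert and trustworthy:
        return 0.9 if h else 0.1    # a competent, honest expert largely tracks the truth
    return 0.5                      # otherwise the assertion is uninformative about H

joint = {}
for h, ex, tr in product([True, False], repeat=3):
    prior = ((p_h if h else 1 - p_h)
             * (p_exp if ex else 1 - p_exp)
             * (p_trust if tr else 1 - p_trust))
    joint[(h, ex, tr)] = prior * p_assert(h, ex, tr)

z = sum(joint.values())

def marginal(idx):
    return sum(v for state, v in joint.items() if state[idx]) / z

print("P(H | assertion)              =", round(marginal(0), 3))   # rises from 0.2
print("P(Expertise | assertion)      =", round(marginal(1), 3))   # falls from 0.8
print("P(Trustworthiness | assertion) =", round(marginal(2), 3))  # falls from 0.8
```

Asserting a claim that the priors deem unlikely raises P(H) while simultaneously lowering the posterior probability that the source is both expert and trustworthy, mirroring the bidirectional updating described above.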

Hence, it is the contention of this chapter that significant progress can be achieved by marrying the insights of scheme-based approaches with the formal framework of probability and that this will supply the satisfactory framework for evaluation that is presently missing (on such combination, see also Grabmair and Ashley, this volume).

From this perspective, reasoning appropriately about source reliability in any given context involves Bayesian inference within a suitable model (see also Lagnado et al. 2012). How complex that model needs to be depends on the context – in particular, on the relevant dimensions of variation within it. Where multiple sources all report the same content, it is not necessary to separate out content and source. Where the specific factors determining reliability are irrelevant or simply unknown, summary representation through a single variable ‘reliability’ will suffice. In other contexts, it will be useful to separate out personal trustworthiness and expertise, as in the model of Fig. 5, or possibly sincerity, observational sensitivity, and objectivity (lack of bias), as in Schum (1994), or, as Goldman (1999) puts it, competence, opportunity, and honesty. Sometimes it may also be useful to distinguish different cognitive processes affecting the reliability of the report (e.g. Friedman 1987). Though specific contexts may require either more detail or less, the criteria that scheme-based approaches have sought to identify are ones that are often likely to be a concern and hence are good candidates for our models. This chapter has tried to describe the basic building blocks from which such models are assembled and the key conceptual implications that the nature of these building blocks has for thinking about testimony.

7 Concluding Remarks

Testimony, as many have argued recently, is central to the way we acquire information about the world and form our beliefs and opinions. Hence, testimony is central also to argumentation. This was long overlooked, and thinking about argument was dominated by the view that arguments should somehow ‘stand for themselves’, independently of the person advancing them. Consequently, considerations of the source were branded fallacious. Walton’s work on the ad hominem (Walton 1998) and ad verecundiam fallacies (Walton 1997) has done much to challenge that view, and the present upsurge of philosophical interest in testimony lends support to this challenge.

Nevertheless, there may be areas of argumentation where source considerations are unnecessary or inappropriate. One important limit for the relevance of source reliability considerations has already been mentioned: where the recipient of an argument possesses independent means by which to verify its content, consideration of the source becomes unnecessary.

At the same time, the examples considered in this chapter have been limited to statements that involve facts. They have not concerned statements that are purely about values (‘democracy is good’), and there may be limits to the role of testimony in contexts of practical reasoning because value statements possess different criteria of evaluation. Certainly, the Bayesian framework as discussed so far applies only to conclusions that are true or false. However, many arguments involving values concern the choice of actions under conditions of uncertainty, and such choice falls under the normative scope of decision theory. Here, recent work on consequentialist arguments (‘we should not raise taxes, because it will ruin the economy’), including the purportedly fallacious ‘slippery slope argument’ (‘if we allow medical screening of embryos, it will be designer babies next’), shows how such arguments can be captured in a Bayesian, decision-theoretic framework (Hahn and Oaksford 2006a, 2007a; Corner et al. 2011; Thompson et al. 2005). Examining in detail the way testimony might operate in such contexts that involve both fact and values seems an important issue for future research.

Finally, it is worth mentioning other research on the fallacies of argumentation that draws on the probability calculus, because the range of argument forms and examples discussed in these works (see, e.g. on the ‘argument from ignorance’, Oaksford and Hahn 2004; on circular arguments, Shogenji 2000; Hahn and Oaksford 2006a, 2007b; Hahn 2011; Shogenji, this volume) adds to the examples discussed in this chapter in making clear that the application of Bayesian probability as a formal framework is not, as some have assumed, limited to arguments that are overtly numerical or statistical.