1 Introduction

Causal theories of mental content come in many varieties, but they are all based on the same motivating idea—that the content of a given mental representation type is determined by what causes tokens of that type.Footnote 1 If, say, the content of perceptual belief type b is the proposition that that’s a dog, then, the story goes, this is because tokens of b are caused by the presence of dogs. This is just part of the picture, however, since tokens of b are also caused by foxes-at-a-distance, by retinal states of various sorts, and by lots of other things. The general challenge is to say why b’s content is one proposition rather than another, and to spell out an answer in the language of causality, perhaps employing ideas from logic and probability theory along the way.Footnote 2

This challenge has many facets. Suppose, for simplicity, that tokens of b occur only when they are caused by dogs or by foxes-at-a-distance. If a given causal theory T implies that b’s content is the proposition that that’s a dog-or-fox-at-a-distance and isn’t the proposition that that’s a dog or the proposition that that’s a fox-at-a-distance, then T entails that beliefs of this type never misrepresent—they are never false. It may be acceptable that a theory should occasionally judge that some belief types have contents that are never false. However, any theory that goes farther, and judges that misrepresentation is impossible, has gone too far. Misrepresentation is ubiquitous and theories of content, whether they are causal or not, must explain why beliefs have contents that are sometimes false. This is the point of the infamous “disjunction problem.”Footnote 3

The disjunction problem has a cousin, the so-called “distality problem.”Footnote 4 Suppose now that tokens of b are caused by dogs, by retinal states of type s, and by nothing else, where dogs cause retinal states of type s, and the latter, in turn, cause tokens of b. If b’s content is the proposition that that’s a dog, and a given causal theory T says that b’s content is the proposition that a token of s is occurring (which concerns the more proximate cause), and isn’t the proposition that that’s a dog (which concerns the more distal cause), then T has made a mistake. Here again, there’s nothing wrong with a theory that entails that some beliefs are about retinal states, but a theory that says beliefs are never about events in the external world has gone too far.

The disjunction problem and the distality problem both concern ways in which a theory of content can go wrong. To cleanly separate these two problems, it helps to think of the disjunction problem as a synchronic problem and the distality problem as diachronic (Stampe 1977, 44); see Fig. 1. The arrows from Cd to X, from X to Cp, and from Cp to B are causal; they indicate that Cd (the distal cause in the chain) causes X, X causes Cp (the proximate cause in the chain), and Cp causes B, where B is the proposition that this or that organism has a token of b at a given time.Footnote 5 The arrow from X to X  ∨  Y, in contrast, is logical; it indicates that X logically entails X ∨ Y. The issue raised by the disjunction problem concerns whether b’s content is X and isn’t the “simultaneous” proposition X ∨ Y. The issue in the distality problem concerns whether b’s content is X and isn’t the “earlier” proposition Cd or the “later” proposition Cp.Footnote 6

Fig. 1 The synchronic disjunction problem and the diachronic distality problem

Our interest here is in purely probabilistic causal (ppc) theories. Suppose that tokens of b are caused by X1, by X2, …, and by Xn. Let T be a ppc theory. If T entails that b’s content is X1 and isn’t X2, …, or Xn, then this is solely because of B’s probabilistic (and causal) profile with respect to X1, X2, …, and Xn. Perhaps, for example, it’s because B indicates that X1 is true in that Pr(X1  | B) = 1, but does not do the same for any of X2, …, or Xn. More specifically, a ppc theory T takes as “input” a set (perhaps infinite) of candidate propositions for b’s content (where, since a ppc theory is a causal theory, these propositions are restricted to propositions about things that can cause tokens of b), a probability distribution defined over B and those propositions, and nothing else, and then “outputs” a verdict on b’s content—for instance, the verdict that b’s content is X and isn’t any other proposition.Footnote 7
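To fix ideas, here is a minimal sketch of that input/output structure. It is our own schema rather than any particular theory: the names (World, Distribution, ppc_verdict) are ours, and the rule parameter is a stand-in for whatever purely probabilistic condition a given ppc theory imposes on a candidate proposition.

```python
# A minimal sketch (our own schema, not any particular theory) of the
# input/output structure of a purely probabilistic causal (ppc) theory:
# candidate propositions plus a probability distribution go in, a verdict
# about b's content comes out.
from typing import Callable, FrozenSet, Mapping, Optional

# A "world" is the set of atomic propositions true at it, e.g. frozenset({"B", "X"}).
World = FrozenSet[str]
Distribution = Mapping[World, float]  # world -> probability; values sum to 1

# The rule stands in for whatever purely probabilistic condition a given ppc
# theory places on a candidate X (relative to B, the other candidates, and Pr).
Rule = Callable[[str, list, Distribution], bool]

def ppc_verdict(candidates: list, dist: Distribution, rule: Rule) -> Optional[str]:
    """Return the unique candidate certified as b's content, if there is one."""
    certified = [X for X in candidates if rule(X, candidates, dist)]
    return certified[0] if len(certified) == 1 else None
```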

Some causal theories, in contrast, are probabilistic but only “partially” so. They take as input not just a set of candidate propositions for b’s content and a probability distribution defined over B and those propositions, but also something else.Footnote 8

We will focus on ppc theories, but this isn’t because we’re convinced that an adequate theory of content should be purely probabilistic. We harbor no such conviction. Our motivation, rather, is that the prospects of ppc theories in the context of the disjunction problem and the distality problem have been underexplored in the literature, and that this gap is unfortunate since a more thorough examination might be significant.Footnote 9 If it turns out that some ppc theories are able to cope with the disjunction problem and the distality problem, and if ppc theories are nonetheless problematic as a whole, then this isn’t because of the disjunction and distality problems. If, instead, it turns out that no ppc theory can handle both problems, then this provides additional motivation for partially probabilistic theories and also for non-probabilistic theories (i.e., theories that don’t take probability distributions as relevant inputs). Furthermore, it might be that a more thorough examination of ppc theories will suggest novel theories in the partially probabilistic camp, and perhaps some such theories will compare favorably with extant partially probabilistic theories.

The remainder of this paper is divided into five sections. In Sect. 2, we clarify how we mean for the disjunction and distality problems to be understood. We also introduce a third problem, which we call “the hard problem.” In Sect. 3, we describe four types of ppc theory, and present two theories of each type. We call the eight theories in question “T1,” “T2,” and so on. Some of these have been discussed in the extant literature, but others are new. We also note several—240, to be exact—hybrids of two or more of T1–T8. In Sect. 4, we describe which of our candidate theories can handle the disjunction problem, which can handle the distality problem, and which can handle the hard problem. It turns out that though some can handle the first, and some can handle the second, none can handle the third. This is our main result. In Sect. 5, we consider three potential responses to that result, and argue that the first two fail, but the third has some promise. In Sect. 6, we offer some concluding comments.

2 The disjunction problem, the distality problem, and the hard problem

2.1 The disjunction problem

Our target theories of content are causal, from which it follows that some candidate propositions for b’s content are ruled out from the start. For example, B is ruled out as b’s content, on the grounds that b can’t be caused by B’s being true, and propositions about the future are ruled out as well, and for the same reason.

Consistent with this, consider the following:

(DISJ1):

b’s content is X, X ∨ Y, or Y.

(DISJ2):

1 > Pr(X ∨ Y) > Pr(X&Y) = 0.

(DISJ3):

Pr(B&X) > 0 and Pr(B&Y) > 0.

We will say that a given ppc theory T can handle the disjunction problem if and only if there is a belief state b, there are propositions X, X ∨ Y, and Y, and there is a probability distribution such that (i) (DISJ2) and (DISJ3) hold and (ii) given the assumption that (DISJ1) holds, T outputs the result that b’s content is X and isn’t X ∨ Y or Y.Footnote 10 We propose that an adequate ppc theory of content should be able to handle the disjunction problem thus understood. A ppc theory that passes this test thus makes room for misrepresentation.
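For concreteness, here is a minimal sketch, with an illustrative joint distribution of our own (the numbers are not drawn from our appendices), of what it takes for a distribution over B, X, and Y to satisfy (DISJ2) and (DISJ3).

```python
# Toy joint distribution over B, X, Y, encoded as {(b, x, y): probability}
# with 1 = true and 0 = false.  X and Y are treated as mutually exclusive,
# as (DISJ2) requires; the particular numbers are merely illustrative.
joint = {
    (1, 1, 0): 0.30,  # B & X
    (1, 0, 1): 0.10,  # B & Y
    (1, 0, 0): 0.05,  # B & ~X & ~Y
    (0, 1, 0): 0.20,  # ~B & X
    (0, 0, 1): 0.25,  # ~B & Y
    (0, 0, 0): 0.10,  # ~B & ~X & ~Y
}

def pr(event):
    """Probability of an event, given as a predicate on (b, x, y) triples."""
    return sum(p for world, p in joint.items() if event(world))

pr_XorY  = pr(lambda w: w[1] == 1 or w[2] == 1)
pr_XandY = pr(lambda w: w[1] == 1 and w[2] == 1)
pr_BandX = pr(lambda w: w[0] == 1 and w[1] == 1)
pr_BandY = pr(lambda w: w[0] == 1 and w[2] == 1)

disj2 = 1 > pr_XorY > pr_XandY == 0    # (DISJ2)
disj3 = pr_BandX > 0 and pr_BandY > 0  # (DISJ3)
print(disj2, disj3)  # True True for the numbers above
```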

The probabilities deployed in (DISJ2) and (DISJ3), and in what follows, should not be understood as credences. That would put the cart before the horse, since we want to consider theories that characterize propositional contents in terms of probabilities; this means that the probabilities themselves should not involve degrees of belief. A broadly “objective” interpretation of probability is needed, but we won’t assume any particular objective interpretation here.Footnote 11,Footnote 12

Our way of understanding the disjunction problem differs from Fodor’s (1987) in that his, but not ours, requires that each of X and Y be causally sufficient but not necessary for b.Footnote 13 We prefer ours because it doesn’t involve that requirement and thus is more general, at least in that respect. We leave it open, however, that an adequate theory of content should also be able to handle Fodor’s version of the disjunction problem. The important point here is just that an adequate theory of content should be able to handle ours.Footnote 14

Assumptions (DISJ2) and (DISJ3) are pretty modest; for example, they do not entail that B is probabilistically dependent on X or on Y. The two assumptions therefore fail to reflect a feature of our simple example. We said that tokens of b are caused by dogs and by foxes-at-a-distance, and it is natural to assume that these causes raise the probability of b’s occurring. This modesty is all for the good, however, since we are especially interested in finding theories of content that fail to solve the disjunction problem. If there is no probability distribution that satisfies our austere requirements, there is no probability distribution that satisfies a logically stronger set of requirements. We grant, though, that exploring different formulations of the disjunction problem is a good project for the future, and we take one step in that direction in Sect. 5.3. These points about the disjunction problem also apply to our formulation of the distality problem, to which we now turn.

2.2 The distality problem

Let’s return to the causal chain depicted in Fig. 1, from Cd to X to Cp to B, and consider:

(DIST1):

b’s content is X, Cp, or Cd.

(DIST2):

B’s probability is increased by each of Cp, X, and Cd, Cp’s probability is increased by each of X and Cd, and X’s probability is increased by Cd.

(DIST3):

Cp screens off each of X and Cd from B, and X screens off Cd from each of B and Cp.Footnote 15

These last two propositions are based on the assumptions that (a) B is caused by each of Cd, X, and Cp, (b) Cp is caused by each of Cd and X, (c) X is caused by Cd, (d) causes (at least typically) increase the probabilities of their effects, and (e) events often screen off their causes from their effects.Footnote 16

We will say that a given ppc theory T can handle the distality problem if and only if there is a belief state b, there are propositions X, Cp, and Cd that are causally related in the way just described, and there is a probability distribution such that (i) (DIST2) and (DIST3) hold and (ii) given the assumption that (DIST1) holds, T outputs the result that b’s content is X and isn’t Cp or Cd.Footnote 17 We propose that an adequate ppc theory of content should be able to handle the distality problem thus understood.
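The same can be done for the distality conditions. The sketch below uses a simple Markov chain from Cd to X to Cp to B with illustrative parameters of our own; screening off is read here, in one standard way, as conditional independence given the screener.

```python
from itertools import product
from math import isclose

# Illustrative Markov chain Cd -> X -> Cp -> B (our own numbers, chosen only
# so that (DIST2) and (DIST3) come out true); 1 = true, 0 = false.
p_cd = 0.3
p_x_given  = {1: 0.9, 0: 0.2}   # Pr(X | Cd),  Pr(X | ~Cd)
p_cp_given = {1: 0.8, 0: 0.1}   # Pr(Cp | X),  Pr(Cp | ~X)
p_b_given  = {1: 0.7, 0: 0.05}  # Pr(B | Cp),  Pr(B | ~Cp)

joint = {}
for cd, x, cp, b in product((1, 0), repeat=4):
    p = p_cd if cd else 1 - p_cd
    p *= p_x_given[cd] if x else 1 - p_x_given[cd]
    p *= p_cp_given[x] if cp else 1 - p_cp_given[x]
    p *= p_b_given[cp] if b else 1 - p_b_given[cp]
    joint[(cd, x, cp, b)] = p

def pr(event):
    return sum(p for w, p in joint.items() if event(w))

def cond(event, given):
    return pr(lambda w: event(w) and given(w)) / pr(given)

CD, X, CP, B = (lambda w: w[0] == 1, lambda w: w[1] == 1,
                lambda w: w[2] == 1, lambda w: w[3] == 1)

# (DIST2): each earlier event in the chain raises the probability of each later one.
dist2 = all(cond(later, earlier) > pr(later)
            for later, earlier in [(B, CP), (B, X), (B, CD),
                                   (CP, X), (CP, CD), (X, CD)])

# (DIST3): Cp screens off X and Cd from B, and X screens off Cd from B and Cp.
def screens_off(screener, distal, effect):
    both = lambda w: screener(w) and distal(w)
    return isclose(cond(effect, both), cond(effect, screener))

dist3 = all([screens_off(CP, X, B), screens_off(CP, CD, B),
             screens_off(X, CD, B), screens_off(X, CD, CP)])
print(dist2, dist3)  # True True
```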

We mean for the distality problem to be understood so that it resembles but is distinct from “the solipsism problem.” The latter says that an adequate theory of meaning must allow for the possibility that organisms have beliefs about things outside their own minds. The former goes farther and says that an adequate theory of meaning must allow for the possibility that organisms have beliefs about things outside their own bodies; this happens, for example, when you have beliefs about dogs as opposed to your retinal states. Any theory that can handle the distality problem can handle the solipsism problem, but not vice versa.Footnote 18

2.3 The hard problem

We will say that a given ppc theory T can handle the hard problem if and only if T can handle both the disjunction problem and the distality problem. The hard problem is harder to handle than the disjunction problem and the distality problem taken individually. Why we give the hard problem that moniker will be clear by the end of Sect. 4.

3 A gaggle of ppc theories

This section has seven subsections. In the first four, we set out ppc theories T1–T8. In the fifth, we provide a table of those eight theories, and note an important distinction. In the sixth, we relate T1–T8 to various probabilistic theories in the extant literature. In the seventh, we provide two schemas for constructing hybrids of two or more of T1–T8. The result is a total of 240 additional ppc theories.

3.1 Maximum-probability theories

Consider the following:

  • T1: For any b and X, b’s content is X if and only if Pr(X | B) = 1.

  • T2: For any b and X, b’s content is X if and only if Pr(B | X) = 1.

We call these theories “Maximum-Probability Theories,” since each says that whether b’s content is X is a matter of whether a given probability has the maximum value of unity. The probability at issue in T1 is the probability of X given B. The probability at issue in T2 is the probability of B given X. Since there can be cases where the one probability equals unity but the other one does not, T1 and T2 are logically distinct.
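The point that the two conditions can come apart is easy to illustrate with a toy joint distribution of our own, in which every X-world is a B-world but not every B-world is an X-world.

```python
# Toy joint distribution over B and X (illustrative numbers of our own) in
# which T2's condition holds for X but T1's does not: Pr(B | X) = 1, yet
# Pr(X | B) < 1 because B has probability on worlds where X is false.
joint = {
    ("B", "X"):  0.3,
    ("B", "~X"): 0.2,
    ("~B", "X"): 0.0,
    ("~B", "~X"): 0.5,
}

def pr(pred):
    return sum(p for world, p in joint.items() if pred(world))

pr_B     = pr(lambda w: w[0] == "B")
pr_X     = pr(lambda w: w[1] == "X")
pr_BandX = pr(lambda w: w == ("B", "X"))

t1_holds = pr_BandX / pr_B == 1   # Pr(X | B) = 1?  No: 0.6
t2_holds = pr_BandX / pr_X == 1   # Pr(B | X) = 1?  Yes
print(t1_holds, t2_holds)         # False True
```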

It might seem that T1 and T2 are too demanding in requiring probabilities of 1, and that they should be relaxed so that the probabilities in question need to be high but don’t need to be maximally high. However, note that these relaxed versions of T1 and T2 would be open to the worry that there’s no non-arbitrary threshold for high probability. Why, for example, set the bar at 0.95 as opposed to 0.949?Footnote 19

Note too that T1 and T2 can be understood so that the probabilities in question are restricted to special circumstances. Dretske (1981), for example, defends a theory in the neighborhood of T1 on which Pr(X | B) is relativized to a certain “training” period. This allows there to be cases after the relevant training period where B is true but X is false.

3.2 Increase-in-probability theories

T1 and T2 contrast with the following theories:

  • T3: For any b and X, b’s content is X if and only if Pr(X | B) > Pr(X).

  • T4: For any b and X, b’s content is X if and only if Pr(B | X) > Pr(B).

These are “Increase-in-Probability Theories.” Although neither T3 nor T4 is equivalent to either T1 or T2, T3 and T4 are equivalent to each other, since increase in probability is symmetric. We therefore count them as a single theory, which we call “T3/T4.”
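To spell out the symmetry (assuming that Pr(B) and Pr(X) are both positive):

$$ \Pr(X \mid B) > \Pr(X) \iff \Pr(X \& B) > \Pr(X)\Pr(B) \iff \Pr(B \mid X) > \Pr(B) $$

So the right-hand sides of T3 and T4 hold in exactly the same cases.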

We don’t know of any proponents of T3/T4, although Artiga and Sebastián (forthcoming) discuss it. We mention this theory for completeness, but there’s a further reason: even if T3/T4 is implausible on its own, maybe that theory can be used to construct a hybrid theory on which b’s content is X if and only if the right-hand side of T3/T4 holds and some additional condition does too. We explore this possibility in Sect. 3.7.

T3/T4 resembles T1 and T2: its right-hand side, like T1’s right-hand side and T2’s right-hand side, is non-contrastive. To see whether proposition X is the content of b, you don’t need to consider an alternative proposition Y. T1, T2, and T3/T4 are in that respect unlike the theories to which we now turn.

3.3 Highest-probability theories

Here are two more theories:

  • T5: For any b and X, b’s content is X if and only if Pr(X | B) > Pr(Y | B) for any Y distinct from X.Footnote 20

  • T6: For any b and X, b’s content is X if and only if Pr(B | X) > Pr(B | Y) for any Y distinct from X.

These are “Highest-Probability Theories.” The right-hand side of T5 says that the probability of X given B is greater than the probability of any other proposition given B. The right-hand side of T6 says that the probability of B given X is greater than the probability of B given any other proposition. There are cases where the right-hand side of the one holds but the right-hand side of the other does not, so T5 and T6, unlike T3 and T4, are logically distinct.
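Here is one such case, built from a toy distribution of our own: X is the most probable candidate given B, yet B is more probable given Y than given X.

```python
# A toy distribution (our own illustrative numbers) in which T5 and T6 pull
# apart over the candidates X and Y: Pr(X | B) > Pr(Y | B), yet Pr(B | X) < Pr(B | Y).
joint = {
    ("B", "X"): 0.30,  ("B", "Y"): 0.10,  ("B", "neither"): 0.00,
    ("~B", "X"): 0.50, ("~B", "Y"): 0.00, ("~B", "neither"): 0.10,
}

def pr(pred):
    return sum(p for w, p in joint.items() if pred(w))

def cond(pred, given):
    return pr(lambda w: pred(w) and given(w)) / pr(given)

is_B = lambda w: w[0] == "B"
is_X = lambda w: w[1] == "X"
is_Y = lambda w: w[1] == "Y"

print(round(cond(is_X, is_B), 3), round(cond(is_Y, is_B), 3))  # 0.75 0.25 -> T5 favors X
print(round(cond(is_B, is_X), 3), round(cond(is_B, is_Y), 3))  # 0.375 1.0 -> T6 favors Y
```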

3.4 Highest-degree-of-confirmation theories

T5 and T6 are contrastive analogues of the non-contrastive T1 and T2. The following, in turn, are contrastive analogues of T3 and T4:

  • T7: For any b and X, b’s content is X if and only if DOC(X, B) > DOC(Y, B) for any Y distinct from X.

  • T8: For any b and X, b’s content is X if and only if DOC(B, X) > DOC(B, Y) for any Y distinct from X.

These are “Highest-Degree-of-Confirmation Theories,” where, for any propositions E and H, DOC(H, E) is the degree to which E confirms H, where confirmation is a matter of increase in probability. The right-hand side of T7 says that the degree to which B confirms X is greater than the degree to which B confirms any other proposition. The right-hand side of T8 says that the degree to which X confirms B is greater than the degree to which any other proposition confirms B.

How is degree of confirmation to be measured? Several prima facie plausible answers to this question have been discussed in the literature.Footnote 21 One is that the degree to which E confirms H is a matter of the difference between H’s probability given E (i.e., H’s posterior probability relative to E) and H’s prior probability:

$$ \text{DOC}_{\text{DM}}(H, E) = \Pr(H \mid E) - \Pr(H) $$

This is the “difference measure” of degree of confirmation. Another prima facie plausible answer is that the degree to which E confirms H is a matter of the ratio of H’s probability given E and H’s prior probability:

$$ \text{DOC}_{\text{RM}}(H, E) = \Pr(H \mid E)/\Pr(H) $$

This is the “ratio measure” of degree of confirmation. DOCDM and DOCRM both meet the following minimal adequacy condition on measures of degree of confirmation:

(*):

There is a number n such that DOC(H, E) >/=/< n if and only if Pr(H | E) > / = / < Pr(H).Footnote 22

Here “n” is the neutral point between confirmation and disconfirmation. For DOCDM, the neutral point n is 0; for DOCRM, the neutral point is 1.

DOCDM and DOCRM are not equivalent. DOCRM is symmetric in that DOCRM(H, E) = DOCRM(E, H) in all cases. This isn’t true of DOCDM, for DOCDM(H, E) ≠ DOCDM(E, H) in some cases.
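A quick sketch, using toy numbers of our own, makes the contrast concrete.

```python
# The two measures applied to a toy joint distribution over H and E (our own
# numbers), illustrating that DOC_RM is symmetric while DOC_DM generally is not.
joint = {("H", "E"): 0.20, ("H", "~E"): 0.10,
         ("~H", "E"): 0.30, ("~H", "~E"): 0.40}

def pr(pred):
    return sum(p for w, p in joint.items() if pred(w))

def cond(pred, given):
    return pr(lambda w: pred(w) and given(w)) / pr(given)

H = lambda w: w[0] == "H"
E = lambda w: w[1] == "E"

def doc_dm(h, e):  # difference measure: Pr(H | E) - Pr(H)
    return cond(h, e) - pr(h)

def doc_rm(h, e):  # ratio measure: Pr(H | E) / Pr(H)
    return cond(h, e) / pr(h)

print(round(doc_rm(H, E), 4), round(doc_rm(E, H), 4))  # 1.3333 1.3333 (equal)
print(round(doc_dm(H, E), 4), round(doc_dm(E, H), 4))  # 0.1 0.1667 (unequal)
```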

We take no stand here on whether one of DOCDM and DOCRM is preferable to the other. It turns out, however, that if T7 and T8 are understood in terms of DOCRM, then T6, T7, and T8 are all logically equivalent to each other. We show this in Appendix 1. So, since T6 is already on the table, and since no two of T6, T7, and T8 are logically equivalent when T7 and T8 are understood in terms of DOCDM, we shall understand T7 and T8 in terms of DOCDM.Footnote 23 This choice allows an additional ppc theory to be placed on the table.

3.5 B-to-X theories versus X-to-B theories

T1–T8 are listed in Table 1. Even though we count T3 and T4 as a single theory, we list them separately in the table so as to highlight an important distinction. T1, T3, T5, and T7 are “B-to-X” theories in that the right-hand side of each involves a conditional probability that “moves” from B as the conditioning proposition to X as the conditioned proposition. T2, T4, T6, and T8, in contrast, are “X-to-B” theories in that the right-hand side of each involves a conditional probability that “moves” from X as the conditioning proposition to B as the conditioned proposition. T3 and T4 are logically equivalent to each other, but this isn’t true in general when it comes to B-to-X theories and their X-to-B counterparts.Footnote 24

Table 1 A partial taxonomy of ppc theories

3.6 Extant probabilistic theories

T1–T8 are all inspired by extant probabilistic theories (whether or not they are pure). First, T1, T2, and T3/T4 are inspired by Dretske’s (1981, 1983) theory. This is a theory on which b’s content is X only if Pr(X | B) = 1 > Pr(X).Footnote 25 T1 and T2 are like Dretske’s in requiring a maximal probability of unity, whereas T3/T4 is like Dretske’s in requiring a probability increase. Second, T5 and T6 are inspired by Rupert’s (1999) theory, on which Pr(B | X) needs to be greater than Pr(B | Y) for any Y distinct from X, but doesn’t need to have the maximal value of unity. Rupert restricts his theory to natural kind concepts, and so, strictly speaking, it isn’t identical to T6 (which isn’t thus restricted). Even so, T6 is obviously similar to Rupert’s, and so is T5 in requiring a highest probability as opposed to a maximum probability. Third, T7 and T8 are inspired by Eliasmith’s (2005) and Usher’s (2001) theories. These are theories on which DOCRM(B, X) needs to be greater than DOCRM(B, Y) for any Y distinct from X, but doesn’t need to clear some absolute threshold.Footnote 26 They frame their theories in terms of “information” as opposed to “confirmation,” yet T8 is nonetheless similar to their theories, and so is T7 in requiring a highest degree of confirmation as opposed to a degree of confirmation greater than some absolute threshold.

3.7 Hybrid theories

There are ppc theories additional to T1–T8. Consider, for example, the following:

  • T1&T2: For any b and X, b’s content is X if and only if (i) Pr(X | B) = 1 and (ii) Pr(B | X) = 1.

  • T1 ∨ T2: For any b and X, b’s content is X if and only if (i) Pr(X | B) = 1 or (ii) Pr(B | X) = 1.

Each of these theories is based on T1 and T2. The difference is that the right-hand side of T1&T2 is the conjunction of T1’s and T2’s right-hand sides, whereas the right-hand side of T1 ∨ T2 is their disjunction.

This is the tip of the iceberg. T1&T2 is but one of 120 instances of the following conjunctive schema (where each theory in question is one of T1–T8):

  • Ti&Tj&…&Tn: For any b and X, b’s content is X if and only if Ti’s right-hand side holds, Tj’s right-hand side holds, …, and Tn’s right-hand side holds.

Similarly, T1 ∨ T2 is but one of 120 instances of the following disjunctive schema (where each theory in question is one of T1–T8):

  • Ti ∨ Tj ∨ … ∨ Tn: For any b and X, b’s content is X if and only if Ti’s right-hand side holds, Tj’s right-hand side holds, …, or Tn’s right-hand side holds.

These schemas yield a total of 240 ppc theories in addition to T1–T8. That is a lot.

We will mention additional ppc theories in Sect. 5, but we now have enough theories to get started. There are seven “basic” (non-hybrid) theories (T1, T2, T3/T4, T5, T6, T7, and T8) and 240 hybrids formed from these basics (T1&T2, T1&T3, etc.).

4 How theories T1–T8 and their hybrids fare

T1–T8 are a mixed bag when it comes to the disjunction problem. T1 and T5 fall prey to the disjunction problem, whereas the remaining theories—T2, T3/T4, T6, T7, and T8—do not. These results are established in Appendix 2. A mix of good news and bad also arises in connection with the distality problem, though here the pattern is different. T1, T2, T3/T4, T6, T7, and T8 all succumb to the distality problem, whereas T5 does not. These results are established in Appendix 3. When it comes to the hard problem, in contrast, T1–T8 are all in the same boat: each of them falls prey to the hard problem. This follows from the fact that each of them falls prey to the disjunction problem or the distality problem.

These results are summarized in Table 2. A “Yes” in a cell indicates that the theory in question can handle the problem in question; a “No” in a cell means that the theory cannot.

Table 2 Which problems can T1–T8 handle?

What about the two hundred forty hybrids of two or more of T1–T8 noted in Sect. 3.7? It turns out that none of them can handle the hard problem either. We show this in Appendix 4. Some of them can handle the disjunction problem, and some can handle the distality problem, but none can handle both.

We are not the first to note that various ppc theories have trouble with distality. For example, consider this passage from Artiga and Sebastián (forthcoming):

Consider the Fusiform Face Area (FFA), which is usually thought to represent faces…. Suppose we discover that a certain neural network R in the FFA selectively fires with significant intensity when there is a face and also that, given that R is active, the entity that is more likely to be present is a face. One might think these observations suffice for establishing the fact that the brain state represents face according to SGIT. Unfortunately, it is unclear that SGIT can deliver this result. Consider, for instance, the set of neuronal structures in the thalamus that are active whenever there is a face in front of the subject. If R has the highest statistical dependence with faces, it will also normally have the highest statistical dependence with these neuronal states in early vision. Thus, SGIT would entail that this activity in FFA represents neuronal activation in another part of the brain. This is of course an extremely counterintuitive result. Indeed, even if there was some principled way of excluding other brain states from being represented, other inadequate contents such as face-looking thing could probably not be avoided. (Artiga and Sebastián forthcoming, p. 8, italics original)

Here “SGIT” is short for “Scientifically Guided Informational Theories.” T6 is an example of such a theory, and so is T8 when understood in terms of DOCRM.Footnote 27 In our terminology, and focusing on T6, Artiga and Sebastián’s worry is that T6 outputs the mistaken result that b’s content is the proposition Cp, which describes neuronal activation in the brain, and isn’t X, which describes a face.

There’s an important difference, though, between Artiga and Sebastián’s discussion and ours. Consider the following claims from the passage just quoted:

If R has the highest statistical dependence with faces, it will also normally have the highest statistical dependence with these neuronal states in early vision.

Indeed, even if there was some principled way of excluding other brain states from being represented, other inadequate contents such as face-looking thing could probably not be avoided.

These claims are prima facie plausible, but Artiga and Sebastián provide no argument in support of them. The claims are simply asserted. In contrast, we prove in Appendix 3 that there are no probability distributions such that (i) (DIST2) and (DIST3) hold and (ii) given the assumption that (DIST1) holds, T6 outputs the result that b’s content is X and isn’t Cp or Cd.Footnote 28

One final note is in order. The fact that a given theory solves the disjunction problem or the distality problem doesn’t mean that there are realistic probability distributions—probability distributions in line with the relevant frequencies in the actual world—of the sort in question. Consider T5, for example, and the fact that it can handle the distality problem. It could be that no realistic probability distribution is such that (i) (DIST2) and (DIST3) hold and (ii) given the assumption that (DIST1) holds, T5 outputs the result that b’s content is X and isn’t Cp or Cd. Our adequacy conditions are very weak, which is why failing to meet them is a death blow to a theory, whereas meeting them is a minor victory.Footnote 29

5 Three potential responses to the hard problem

It would be premature to give up hope at this point, and conclude that no ppc theory can handle the hard problem. There are potential responses to consider.

5.1 Weaken T1–T8

In Sect. 3.7, we noted 240 hybrid ppc theories that were formed by using two or more of T1–T8. Each of those hybrid theories is like T1–T8 in that it gives a necessary and sufficient condition for b’s meaning X, and each is like T1–T8 in that it falls prey to the hard problem. What about ppc theories that are like T1–T8 except that they give only a sufficient condition for b’s meaning X, or give only a necessary condition for b’s meaning X? Let “TiS” be Ti (for any i = 1, 2, …, 8) when weakened so as to give only a sufficient condition for b’s meaning X, and let “TiN” be Ti (for any i = 1, 2, …, 8) when weakened so as to give only a necessary condition for b’s meaning X. Can any of these weaker theories handle the hard problem?

The situation is perfectly uniform when it comes to T1S–T8S: none of them can handle the disjunction problem or the distality problem; hence none of them can handle the hard problem. The reason is straightforward. None of T1S–T8S gives a necessary condition for b’s meaning X, and thus none of them can rule out any candidate proposition as b’s content. For example, although there might be cases where T5S outputs the result that b’s content is X, T5S is unable to output the result that b’s content is not, say, X ∨ Y.Footnote 30

It might seem that the situation is similar with respect to T1N–T8N. For, it might seem that because none of T1N–T8N gives a sufficient condition for b’s meaning X, none of them can “rule in” any candidate proposition as b’s content. However, consider (DISJ1) and (DIST1). The former says that b’s content is X, X ∨ Y, or Y, while the latter says that b’s content is X, Cp, or Cd. Take some case where (DIST2) and (DIST3) hold, and suppose that one of these theories, say T5N, rules out each of Cp and Cd as b’s content because Pr(X | B) is greater than each of Pr(Cp | B) and Pr(Cd | B). Then given the assumption that (DIST1) holds, it follows by T5N that b’s content is X.

The situation with respect to T1N–T8N is summarized in Table 3. First, neither T1N nor T5N can handle the disjunction problem, but the other theories can. Second, T5N can handle the distality problem, but none of the other theories can. Third, none of the theories can handle the hard problem. These results are established in Appendix 5.

Table 3 Which problems can T1N–T8N handle?

It’s interesting that the pattern for the necessary-condition theories T1N–T8N (as summarized in Table 3) is identical to the pattern for the necessary-and-sufficient-condition theories T1–T8 (as shown in Table 2). We conjecture that the reason for this confluence is to be found in (DISJ1) and (DIST1).

5.2 Move from confirmation to either correlation or mutual information

It’s not uncommon for theorists to use the term “correlation” in such a way that a high degree of correlation is just a matter of a high conditional probability. Consider, for example, the following passage from Fodor:

However, the crude treatment just sketched clearly won’t do: it is open to an objection that can be put like this: If there are wild tokenings of R, it follows that the nomic dependence of R upon S is imperfect; some R-tokens – the wild ones – are not caused by S tokens. Well, but clearly they are caused by something; i.e., by something that is, like S, sufficient but not necessary for bringing Rs about. Call this second sort of sufficient condition the tokening of situations of type T. Here’s the problem: R represents the state of affairs with which its tokens are causally correlated. Some representations of type R are causally correlated with states of affairs of type S; some representations of type R are causally correlated with states of affairs of type T. So it looks as though what R represents is not either S or T, but rather the disjunction (S  ∨  T): The correlation of R with the disjunction is, after all, better than its correlation with either of the disjuncts and, ex hypothesi, correlation makes information and information makes representation. If, however, what Rs represent is not S but (S  ∨  T), then tokenings of R that are caused by T aren’t, after all, wild tokenings and our account of misrepresentation has gone West. (Fodor 1984, p. 240, emphasis original)

Dretske (1983, pp. 83–84) also uses “correlation” to refer to a single conditional probability.

Fodor’s claims in the above quote make sense, if “high degree of correlation” means high conditional probability. For, switching to our notation, if X and Y each entails X ∨ Y but not vice versa (since X and Y are mutually exclusive), it follows that Pr(X | B) < Pr(X ∨ Y | B) and Pr(Y | B) < Pr(X ∨ Y | B), which means that B’s “degree of correlation” with X ∨ Y is greater than both its degree of correlation with X and its degree of correlation with Y.

This usage of “correlation,” however, is miles away from standard usage in statistics. Consider:

$$ r(H, E) = \frac{\Pr(H \& E) - \Pr(H)\Pr(E)}{\sqrt{\Pr(H)\Pr(\sim H)\Pr(E)\Pr(\sim E)}} $$

This is the Pearson correlation coefficient applied to propositions rather than to quantitative variables.Footnote 31 Pearson’s r(H, E) has a range of [−1, 1], where r(H, E) > / = / < 0 if and only if Pr(H | E) > / = / < Pr(H). Suppose, for example, that Pr(H | E) = 0.990 < 0.999 = Pr(H). Then though Pr(H | E) is high and thus E’s degree of correlation as understood by Fodor is high, r(H, E) is negative and thus it isn’t high.

But are there cases where X entails X ∨ Y but not vice versa, and yet it’s not the case that r(X, B) < r(X ∨ Y, B)? Yes, for there are cases where X entails X ∨ Y but not vice versa, and yet Pr(X | B) > Pr(X) whereas Pr(X ∨ Y | B) < Pr(X ∨ Y).Footnote 32 Any such case is a case where r(X, B) > 0 > r(X ∨ Y, B).
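Here is one such case, with a toy distribution of our own (the numbers are merely illustrative): B raises X’s probability while lowering the probability of X ∨ Y, so the two Pearson correlations have opposite signs.

```python
from math import sqrt

# Toy joint distribution (our own numbers): X and Y are mutually exclusive,
# B raises X's probability but lowers the probability of X-or-Y, and so
# r(X, B) > 0 > r(X v Y, B).
joint = {
    ("B", "X"): 0.15,  ("B", "Y"): 0.05,  ("B", "neither"): 0.30,
    ("~B", "X"): 0.05, ("~B", "Y"): 0.45, ("~B", "neither"): 0.00,
}

def pr(pred):
    return sum(p for w, p in joint.items() if pred(w))

def r(h, e):
    """Pearson correlation between two propositions (predicates on worlds)."""
    ph, pe = pr(h), pr(e)
    phe = pr(lambda w: h(w) and e(w))
    return (phe - ph * pe) / sqrt(ph * (1 - ph) * pe * (1 - pe))

B      = lambda w: w[0] == "B"
X      = lambda w: w[1] == "X"
X_or_Y = lambda w: w[1] in ("X", "Y")

print(round(r(X, B), 3))       # 0.25  (positive)
print(round(r(X_or_Y, B), 3))  # about -0.655  (negative)
```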

There are clear respects in which correlation as standardly understood in statistics is similar to confirmation as standardly understood in Bayesian confirmation theory. But, at the same time, there are important differences. Unlike DOCDM, r is symmetric in that r(H, E) = r(E, H) in all cases. And unlike DOCRM, r is maximal at 1 precisely when H and E entail each other.Footnote 33

This suggests that the way to solve the hard problem may be to replace Highest-Degree-of-Confirmation Theories such as T7 with a Highest-Degree-of-Correlation Theory like the following:

  • T9: For any b and X, b’s content is X if and only if r(X, B) > r(Y, B) for any Y distinct from X.Footnote 34

However, we show in Appendix 6 that T9 falls prey to the distality problem.Footnote 35 Hence T9 cannot solve the hard problem.

Perhaps if T9 were modified in terms of some degree-of-correlation measure other than r, the resulting theory would be able to handle the hard problem. We leave that question for the future, and turn now to “mutual information.”

Some philosophers use the expression “mutual information” to refer to the logarithm, base 2, of DOCRM(H, E):

$$ mi(H, E) = \log\left[\frac{\Pr(H \mid E)}{\Pr(H)}\right] $$

We noted in Sect. 3.4 that T7 and T8 can be understood in terms of DOCRM, and that if they are so understood, then T6, T7, and T8 are all logically equivalent to each other. The same is true if T7 and T8 are understood in terms of mi. This is because DOCRM and mi are ordinally equivalent in that for any H1, H2, E1, and E2, DOCRM(H1, E1)  > / = / < DOCRM(H2, E2) if and only if mi(H1, E1) > / = / < mi(H2, E2); see Sect. 3.6 for relevant background. Hence it won’t help to modify T7 and T8 in terms of mi.

However, there’s another usage of the expression “mutual information” in the literature. In information theory (Cover and Thomas 2006), the expression “mutual information” is standardly used to refer not to mi, but to a weighted average of mi:

$$ MI(\varGamma_{H}, \varGamma_{E}) = \sum\nolimits_{i,j} \Pr(H_{i} \& E_{j})\log\left[\frac{\Pr(H_{i} \mid E_{j})}{\Pr(H_{i})}\right] $$

Note that whereas H and E are propositions, ΓH = {H1, H2, …, Hn} and ΓE = {E1, E2, …, Em} are partitions of propositions (i.e., sets of pairwise mutually exclusive and jointly exhaustive propositions).Footnote 36 Can T7 and T8 be modified by using MI, and if so, would this help in terms of handling the hard problem?

It isn’t clear how best to modify T7 and T8 by using MI, but we have a suggestion. Consider:

  • T10: For any b and X, b’s content is X if and only if there are partitions Γ1 = {B, ~B} and Γ2 = {X, Y1, …, Yn} such that (i) MI(Γ1, Γ2) > MI(Γ1, Γ3) for any Γ3 distinct from Γ2 and (ii) mi(X, B) > mi(Yi, B) for any Yi in Γ2.Footnote 37

Think of this as working in two steps. First, MI narrows down b’s content to the members of a particular partition. Call this “the content partition.” Second, mi induces a further narrowing, to a particular member of the content partition. For example, suppose that the candidate content partitions are Γ2 = {X, Y, Z} and Γ3 = {X*, Y*, Z*}, that MI(Γ1, Γ2) is greater than MI(Γ1, Γ3), and that mi(X, B) is greater than each of mi(Y, B) and mi(Z, B). Given that MI(Γ1, Γ2) is greater than MI(Γ1, Γ3), it follows that the content partition is Γ2, and so, as none of X*, Y*, and Z* is a member of that partition, b’s content isn’t X*, Y*, or Z*. Given that Γ2 is the content partition, and that mi(X, B) is greater than each of mi(Y, B) and mi(Z, B), it then follows that b’s content is X and isn’t Y or Z.

What are the candidate content partitions in the disjunction problem and the distality problem? It might seem that one of the candidate content partitions in the disjunction problem should include X, X ∨ Y, and Y, and that, similarly, one of the candidate content partitions in the distality problem should include X, Cp, and Cd. But note that X, X ∨ Y, and Y aren’t pairwise mutually exclusive, and neither are X, Cp, and Cd, so no set with X, X ∨ Y, and Y as members or with X, Cp, and Cd as members is a partition. We will assume, instead, that the candidate content partitions in the context of the disjunction problem are Γ2 = {X, Y, ~X & ~Y} and Γ3 = {X ∨ Y, ~X & ~Y}, and that the candidate content partitions in the context of the distality problem are Γ2 = {X, ~X}, Γ3 = {Cp, ~Cp}, and Γ4 = {Cd, ~Cd}.
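To make the two-step procedure concrete, here is a sketch in code; the joint distribution is a toy example of our own and is not one of the constructions used in Appendix 7.

```python
from math import log2

# A sketch of T10's two-step selection on a toy distribution (our own numbers):
# step 1 uses MI to pick the content partition, step 2 uses mi to pick a member.
joint = {
    ("B", "X"): 0.30,  ("B", "Y"): 0.10,  ("B", "neither"): 0.05,
    ("~B", "X"): 0.20, ("~B", "Y"): 0.25, ("~B", "neither"): 0.10,
}

def pr(pred):
    return sum(p for w, p in joint.items() if pred(w))

def mi(h, e):
    """Pointwise mutual information: log2 of Pr(H | E) / Pr(H)."""
    return log2(pr(lambda w: h(w) and e(w)) / (pr(h) * pr(e)))

def MI(gamma_h, gamma_e):
    """Average mutual information between two partitions (lists of predicates)."""
    return sum(pr(lambda w: h(w) and e(w)) * mi(h, e)
               for h in gamma_h for e in gamma_e
               if pr(lambda w: h(w) and e(w)) > 0)

B, notB = (lambda w: w[0] == "B"), (lambda w: w[0] == "~B")
X, Y, N = (lambda w: w[1] == "X"), (lambda w: w[1] == "Y"), (lambda w: w[1] == "neither")
XorY    = lambda w: w[1] in ("X", "Y")

gamma1 = [B, notB]                          # {B, ~B}
gamma2 = {"X": X, "Y": Y, "~X&~Y": N}       # the finer candidate partition
gamma3 = {"XvY": XorY, "~X&~Y": N}          # the coarser candidate partition

# Step 1: the content partition is the candidate with the higher MI with {B, ~B}.
mi_fine, mi_coarse = MI(list(gamma2.values()), gamma1), MI(list(gamma3.values()), gamma1)
content_partition = gamma2 if mi_fine > mi_coarse else gamma3

# Step 2: within that partition, the content is the member with the highest mi with B.
content = max(content_partition, key=lambda name: mi(content_partition[name], B))
print(content)  # "X" for these numbers
```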

We show in Appendix 7 that T10 falls prey to the distality problem and thus can’t handle the hard problem.Footnote 38 Hence, just as it won’t help solve the hard problem to move from confirmation to correlation in the sense of r, it also won’t help to move from confirmation to mutual information in the sense of MI.

5.3 Move to a degree-of-confirmation measure other than DOCDM

T7 and T8, when understood in terms of DOCDM, fall prey to the distality problem, but what if they are modified in terms of some degree-of-confirmation measure other than DOCDM (and also other than DOCRM)? Does this allow them to handle the distality problem? If so, does this modification also enable them to handle the hard problem?

Before answering this question, it’s important to note here that our proofs in Appendix 3 regarding T7 and T8 go well beyond these theories when understood in terms of DOCDM. Our proof that T7 falls prey to the distality problem generalizes to T7 when understood in terms of any degree-of-confirmation measure such that:

  • Weak Law of Likelihood: For any E, H1, and H2, if (i) Pr(E | H1) > Pr(E | H2) and (ii) Pr(E | ~H1) < Pr(E | ~H2), then DOC(H1, E) > DOC(H2, E).Footnote 39

Further, our proof that T8 falls prey to the distality problem generalizes to T8 when understood in terms of any degree-of-confirmation measure such that:

  • Final Probability Incrementality: For any E1, E2, and H, if Pr(H | E1) > Pr(H | E2), then DOC(H, E1) > DOC(H, E2).Footnote 40

It won’t help, then, to simply understand T7 and T8 in terms of any old degree-of-confirmation measure that is distinct from DOCDM.

However, there are degree-of-confirmation measures on which Weak Law of Likelihood or Final Probability Incrementality does not hold. Here is an example:

$$ \text{DOC}_{\text{DM}^{*}}(H, E) = \Pr(H \mid E)^{2} - \Pr(H)^{2} $$

This is a variant of DOCDM on which Final Probability Incrementality holds but Weak Law of Likelihood does not. Now consider:

  • T11: For any b and X, b’s content is X if and only if DOCDM*(X, B) > DOCDM*(Y, B) for any Y distinct from X.

It turns out, surprisingly, that T11 can handle the hard problem. We show this in Appendix 8.
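For reference, here is DOCDM* together with the T11 selection rule, written out as a sketch. The numbers fed to it at the end are merely illustrative (they are not claimed to satisfy (DIST2) or (DIST3)), and the sketch does not reproduce the Appendix 8 constructions.

```python
# The DOC_DM* measure and the T11 selection rule as code.  This is a sketch of
# the rule only; the example probabilities below are illustrative placeholders.
def doc_dm_star(pr_h_given_e, pr_h):
    """Pr(H | E)^2 - Pr(H)^2."""
    return pr_h_given_e ** 2 - pr_h ** 2

def t11_content(candidates):
    """Given {name: (Pr(X | B), Pr(X))}, return the candidate (if unique) whose
    DOC_DM* score strictly exceeds the score of every other candidate."""
    scores = {name: doc_dm_star(*probs) for name, probs in candidates.items()}
    best = max(scores, key=scores.get)
    return best if all(scores[best] > s for n, s in scores.items() if n != best) else None

# Illustrative placeholder posteriors and priors for three candidates.
print(t11_content({"X": (0.8, 0.3), "Cp": (0.9, 0.7), "Cd": (0.6, 0.2)}))  # X
```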

This does not mean that T11 has any real plausibility in the context of ppc theories. DOCDM* is a rather strange-looking measure (to say the least). Why square X’s posterior and prior probabilities? Why not instead take them to the third power, or the fourth? Further, there are variants of the disjunction problem and the distality problem to consider. Consider:

(DISJ4):

Pr(B | X) > Pr(B) and Pr(B | Y) > Pr(B).

Let’s say that a given ppc theory T can handle the disjunction* problem if and only if there is a probability distribution D such that (i) (DISJ2), (DISJ3), and (DISJ4) hold and (ii) given the assumption that (DISJ1) holds, T outputs the result that b’s content is X and isn’t X ∨ Y or Y. If, as it seems, an adequate theory of content should be able to handle the disjunction* problem thus understood, then T11 is inadequate. For, as we show in Appendix 9, T11 falls prey to the disjunction* problem.

However, the fact that at least one Highest-Degree-of-Confirmation Theory can handle the hard problem perhaps provides some hope for ppc theories.

6 Conclusion

The disjunction problem is a problem for some but not all of T1–T8 and their 240 hybrids, and likewise with respect to the distality problem. The hard problem, in contrast, is a problem for every single one of T1–T8 and their 240 hybrids (Sect. 4). This generalizes to various weakened versions of T1–T8 (Sect. 5.1), to T7 and T8 both when they are modified in terms of Pearson’s correlation measure r and when they are modified in terms of mutual information MI (Sect. 5.2), to T7 when modified in terms of any degree-of-confirmation measure that, like DOCDM, meets Weak Law of Likelihood (Sect. 5.3), and to T8 when modified in terms of any degree-of-confirmation measure that, like DOCDM, meets Final Probability Incrementality (Sect. 5.3). The hard problem is recalcitrant! However, it doesn’t bring down every ppc theory in logical space. T11, for instance, can handle it. We don’t claim that T11 is the right theory of semantic content, in part because it succumbs to the strengthened version of the disjunction problem described in Sect. 5.3. Rather, our point is that ppc theories have resources that should be scrutinized further, in connection with the hard problem, and with respect to other conditions of adequacy as well.Footnote 41 We note, in this regard, that although we have looked at a largish number of candidate ppc theories, we have used a small handful of formal tools to construct those theories; many of those tools are Bayesian. There are other formal frameworks that are worth exploring.Footnote 42

We close with an analogy. Adaptationism is a research program in evolutionary biology that aims to explain the current traits of organisms in terms of natural selection in ancestral populations. It would be absurd to dismiss this research program on the grounds that adaptationism has so far failed to explain this or that trait in a given biological population. We feel the same way about the development of ppc theories of semantic content. This is a research program, and noting defects in this or that ppc theory hardly suffices to show that the whole program is bankrupt. Philosophers should be just as tenacious as scientists!