Abstract
The disjunction problem and the distality problem each presents a challenge that any theory of mental content must address. Here we consider their bearing on purely probabilistic causal (ppc) theories. In addition to considering these problems separately, we consider a third challenge—that a theory must solve both. We call this “the hard problem.” We consider 8 basic ppc theories along with 240 hybrids of them, and show that some can handle the disjunction problem and some can handle the distality problem, but none can handle the hard problem. This is our main result. We then discuss three possible responses to that result, and argue that though the first two fail, the third has some promise.
1 Introduction
Causal theories of mental content come in many varieties, but they are all based on the same motivating idea—that the content of a given mental representation type is determined by what causes tokens of that type.Footnote 1 If, say, the content of perceptual belief type b is the proposition that that’s a dog, then, the story goes, this is because tokens of b are caused by the presence of dogs. This is just part of the picture, however, since tokens of b are also caused by foxes-at-a-distance, by retinal states of various sorts, and by lots of other things. The general challenge is to say why b’s content is one proposition rather than another, and to spell out an answer in the language of causality, perhaps employing ideas from logic and probability theory along the way.Footnote 2
This challenge has many facets. Suppose, for simplicity, that tokens of b occur only when they are caused by dogs or by foxes-at-a-distance. If a given causal theory T implies that b’s content is the proposition that that’s a dog-or-fox-at-a-distance and isn’t the proposition that that’s a dog or the proposition that that’s a fox-at-a-distance, then T entails that beliefs of this type never misrepresent—they are never false. It may be acceptable that a theory should occasionally judge that some belief types have contents that are never false. However, any theory that goes farther, and judges that misrepresentation is impossible, has gone too far. Misrepresentation is ubiquitous and theories of content, whether they are causal or not, must explain why beliefs have contents that are sometimes false. This is the point of the infamous “disjunction problem.”Footnote 3
The disjunction problem has a cousin, the so-called “distality problem.”Footnote 4 Suppose now that tokens of b are caused by dogs, by retinal states of type s, and by nothing else, where dogs cause retinal states of type s, and the latter, in turn, cause tokens of b. If b’s content is the proposition that that’s a dog, and a given causal theory T says that b’s content is the proposition that a token of s is occurring (which concerns the more proximate cause), and isn’t the proposition that that’s a dog (which concerns the more distal cause), then T has made a mistake. Here again, there’s nothing wrong with a theory that entails that some beliefs are about retinal states, but a theory that says beliefs are never about events in the external world has gone too far.
The disjunction problem and the distality problem both concern ways in which a theory of content can go wrong. To cleanly separate these two problems, it helps to think of the disjunction problem as a synchronic problem and the distality problem as diachronic (Stampe 1977, 44); see Fig. 1. The arrows from Cd to X, from X to Cp, and from Cp to B are causal; they indicate that Cd (the distal cause in the chain) causes X, X causes Cp (the proximate cause in the chain), and Cp causes B, where B is the proposition that this or that organism has a token of b at a given time.Footnote 5 The arrow from X to X ∨ Y, in contrast, is logical; it indicates that X logically entails X ∨ Y. The issue raised by the disjunction problem concerns whether b’s content is X and isn’t the “simultaneous” proposition X ∨ Y. The issue in the distality problem concerns whether b’s content is X and isn’t the “earlier” proposition Cd or the “later” proposition Cp.Footnote 6
Our interest here is in purely probabilistic causal (ppc) theories. Suppose that tokens of b are caused by X1, by X2, …, and by Xn. Let T be a ppc theory. If T entails that b’s content is X1 and isn’t X2, …, or Xn, then this is solely because of B’s probabilistic (and causal) profile with respect to X1, X2, …, and Xn. Perhaps, for example, it’s because B indicates that X1 is true in that Pr(X1 | B) = 1, but does not do the same for any of X2, …, or Xn. More specifically, a ppc theory T takes as “input” a set (perhaps infinite) of candidate propositions for b’s content (where, since a ppc theory is a causal theory, these propositions are restricted to propositions about things that can cause tokens of b), a probability distribution defined over B and those propositions, and nothing else, and then “outputs” a verdict on b’s content—for instance, the verdict that b’s content is X and isn’t any other proposition.Footnote 7
Some causal theories, in contrast, are probabilistic but only “partially” so. They take as input not just a set of candidate propositions for b’s content and a probability distribution defined over B and those propositions, but also something else.Footnote 8
We will focus on ppc theories, but this isn’t because we’re convinced that an adequate theory of content should be purely probabilistic. We harbor no such conviction. Our motivation, rather, is that the prospects of ppc theories in the context of the disjunction problem and the distality problem have been underexplored in the literature, and that this gap is unfortunate since a more thorough examination might be significant.Footnote 9 If it turns out that some ppc theories are able to cope with the disjunction problem and the distality problem, and if ppc theories are nonetheless problematic as a whole, then this isn’t because of the disjunction and distality problems. If, instead, it turns out that no ppc theory can handle both problems, then this provides additional motivation for partially probabilistic theories and also for non-probabilistic theories (i.e., theories that don’t take probability distributions as relevant inputs). Furthermore, it might be that a more thorough examination of ppc theories will suggest novel theories in the partially probabilistic camp, and perhaps some such theories will compare favorably with extant partially probabilistic theories.
The remainder of this paper is divided into five sections. In Sect. 2, we clarify how we mean for the disjunction and distality problems to be understood. We also introduce a third problem, which we call “the hard problem.” In Sect. 3, we describe four types of ppc theory, and present two theories of each type. We call the eight theories in question “T1,” “T2,” and so on. Some of these have been discussed in the extant literature, but others are new. We also note several—240, to be exact—hybrids of two or more of T1–T8. In Sect. 4, we describe which of our candidate theories can handle the disjunction problem, which can handle the distality problem, and which can handle the hard problem. It turns out that though some can handle the first, and some can handle the second, none can handle the third. This is our main result. In Sect. 5, we consider three potential responses to that result, and argue that the first two fail, but the third has some promise. In Sect. 6, we offer some concluding comments.
2 The disjunction problem, the distality problem, and the hard problem
2.1 The disjunction problem
Our target theories of content are causal, from which it follows that some candidate propositions for b’s content are ruled out from the start. For example, B is ruled out as b’s content, on the grounds that b can’t be caused by B’s being true, and propositions about the future are ruled out as well, and for the same reason.
Consistent with this, consider the following:
- (DISJ1): b’s content is X, X ∨ Y, or Y.
- (DISJ2): 1 > Pr(X ∨ Y) > Pr(X&Y) = 0.
- (DISJ3): Pr(B&X) > 0 and Pr(B&Y) > 0.
We will say that a given ppc theory T can handle the disjunction problem if and only if there is a belief state b, there are propositions X, X ∨ Y, and Y, and there is a probability distribution such that (i) (DISJ2) and (DISJ3) hold and (ii) given the assumption that (DISJ1) holds, T outputs the result that b’s content is X and isn’t X ∨ Y or Y.Footnote 10 We propose that an adequate ppc theory of content should be able to handle the disjunction problem thus understood. A ppc theory that passes this test thus makes room for misrepresentation.
The probabilities deployed in (DISJ2) and (DISJ3), and in what follows, should not be understood as credences. That would put the cart before the horse, since we want to consider theories that characterize propositional contents in terms of probabilities; this means that the probabilities themselves should not involve degrees of belief. A broadly “objective” interpretation of probability is needed, but we won’t assume any particular objective interpretation here.Footnote 11,Footnote 12
Our way of understanding the disjunction problem differs from Fodor’s (1987) in that his, but not ours, requires that each of X and Y be causally sufficient but not necessary for b.Footnote 13 We prefer ours because it doesn’t involve that requirement and thus is more general, at least in that respect. We leave it open, however, that an adequate theory of content should also be able to handle Fodor’s version of the disjunction problem. The important point here is just that an adequate theory of content should be able to handle ours.Footnote 14
Assumptions (DISJ2) and (DISJ3) are pretty modest; for example, they do not entail that B is probabilistically dependent on X or on Y. The two assumptions therefore fail to reflect a feature of our simple example. We said that tokens of b are caused by dogs and by foxes-at-a-distance, and it is natural to assume that these causes raise the probability of b’s occurring. This modesty is all for the good, however, since we are especially interested in finding theories of content that fail to solve the disjunction problem. If there is no probability distribution that satisfies our austere requirements, there is no probability distribution that satisfies a logically stronger set of requirements. We grant, though, that exploring different formulations of the disjunction problem is a good project for the future, and we take one step in that direction in Sect. 5.3. These points about the disjunction problem also apply to our formulation of the distality problem, to which we now turn.
2.2 The distality problem
Let’s return to the causal chain depicted in Fig. 1, from Cd to X to Cp to B, and consider:
- (DIST1): b’s content is X, Cp, or Cd.
- (DIST2): B’s probability is increased by each of Cp, X, and Cd; Cp’s probability is increased by each of X and Cd; and X’s probability is increased by Cd.
- (DIST3): Cp screens off each of X and Cd from B, and X screens off Cd from each of B and Cp.Footnote 15
These last two propositions are based on the assumptions that (a) B is caused by each of Cd, X, and Cp, (b) Cp is caused by each of Cd and X, (c) X is caused by Cd, (d) causes (at least typically) increase the probabilities of their effects, and (e) events often screen off their causes from their effects.Footnote 16
We will say that a given ppc theory T can handle the distality problem if and only if there is a belief state b, there are propositions X, Cp, and Cd that are causally related in the way just described, and there is a probability distribution such that (i) (DIST2) and (DIST3) hold and (ii) given the assumption that (DIST1) holds, T outputs the result that b’s content is X and isn’t Cp or Cd.Footnote 17 We propose that an adequate ppc theory of content should be able to handle the distality problem thus understood.
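Conditions (DIST2) and (DIST3) can be checked mechanically on a concrete distribution. What follows is only a minimal sketch under stated assumptions: the chain from Cd to X to Cp to B is modeled as a four-variable Markov chain with made-up link probabilities, "Z screens off A from C" is read as Pr(C | Z & A) = Pr(C | Z), and the helper names `link`, `pr`, and `cond` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers: Pr(Cd) = 1/2, and each link in the chain
# Cd -> X -> Cp -> B makes the child true with probability 9/10 when the
# parent is true and 1/10 when the parent is false.
P_CD = F(1, 2)

def link(parent):
    return F(9, 10) if parent else F(1, 10)

# Build the joint distribution over worlds (cd, x, cp, b).
joint = {}
for cd, x, cp, b in product([True, False], repeat=4):
    p = P_CD if cd else 1 - P_CD
    for parent, child in ((cd, x), (x, cp), (cp, b)):
        q = link(parent)
        p *= q if child else 1 - q
    joint[(cd, x, cp, b)] = p

def pr(*events):
    """Probability that all of the given events hold."""
    return sum(p for w, p in joint.items() if all(e(w) for e in events))

def cond(a, *given):
    """Conditional probability of a given the conjunction of `given`."""
    return pr(a, *given) / pr(*given)

Cd = lambda w: w[0]
X = lambda w: w[1]
Cp = lambda w: w[2]
B = lambda w: w[3]

# (DIST2): each cause raises the probability of each downstream effect.
dist2 = all(cond(effect, cause) > pr(effect)
            for cause, effect in ((Cp, B), (X, B), (Cd, B),
                                  (X, Cp), (Cd, Cp), (Cd, X)))

# (DIST3): Cp screens off X and Cd from B; X screens off Cd from B and Cp.
dist3 = (cond(B, Cp, X) == cond(B, Cp) == cond(B, Cp, Cd)
         and cond(B, X, Cd) == cond(B, X)
         and cond(Cp, X, Cd) == cond(Cp, X))

print(dist2, dist3)
print(cond(B, Cp), cond(B, X), cond(B, Cd))
```

Exact fractions are used so that the screening-off equalities are tested without floating-point error. In this particular distribution, Pr(B | Cp) exceeds Pr(B | X), which in turn exceeds Pr(B | Cd).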
We mean for the distality problem to be understood so that it resembles but is distinct from “the solipsism problem.” The latter says that an adequate theory of meaning must allow for the possibility that organisms have beliefs about things outside their own minds. The former goes farther and says that an adequate theory of meaning must allow for the possibility that organisms have beliefs about things outside their own bodies; this happens, for example, when you have beliefs about dogs as opposed to your retinal states. Any theory that can handle the distality problem can handle the solipsism problem, but not vice versa.Footnote 18
2.3 The hard problem
We will say that a given ppc theory T can handle the hard problem if and only if T can handle both the disjunction problem and the distality problem. The hard problem is harder to handle than the disjunction problem and the distality problem taken individually. Why we give the hard problem that moniker will be clear by the end of Sect. 4.
3 A gaggle of ppc theories
This section has seven subsections. In the first four, we set out ppc theories T1–T8. In the fifth, we provide a table of those eight theories, and note an important distinction. In the sixth, we relate T1–T8 to various probabilistic theories in the extant literature. In the seventh, we provide two schemas for constructing hybrids of two or more of T1–T8. The result is a total of 240 additional ppc theories.
3.1 Maximum-probability theories
Consider the following:
- T1: For any b and X, b’s content is X if and only if Pr(X | B) = 1.
- T2: For any b and X, b’s content is X if and only if Pr(B | X) = 1.
We call these theories “Maximum-Probability Theories,” since each says that whether b’s content is X is a matter of whether a given probability has the maximum value of unity. The probability at issue in T1 is the probability of X given B. The probability at issue in T2 is the probability of B given X. Since there can be cases where the one probability equals unity but the other one does not, T1 and T2 are logically distinct.
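A small worked case makes the distinctness concrete. The numbers below are hypothetical and chosen only for illustration: the right-hand side of T1 holds, while the right-hand side of T2 does not.

```latex
% Hypothetical joint probabilities over X and B (they sum to 1).
\[
\Pr(X \wedge B) = 0.2, \quad
\Pr(X \wedge \neg B) = 0.3, \quad
\Pr(\neg X \wedge B) = 0, \quad
\Pr(\neg X \wedge \neg B) = 0.5
\]
\[
\Pr(X \mid B) = \frac{0.2}{0.2 + 0} = 1,
\qquad
\Pr(B \mid X) = \frac{0.2}{0.2 + 0.3} = 0.4
\]
```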
It might seem that T1 and T2 are too demanding in requiring probabilities of 1, and that they should be relaxed so that the probabilities in question need to be high but don’t need to be maximally high. However, note that these relaxed versions of T1 and T2 would be open to the worry that there’s no non-arbitrary threshold for high probability. Why, for example, set the bar at 0.95 as opposed to 0.949?Footnote 19
Note too that T1 and T2 can be understood so that the probabilities in question are restricted to special circumstances. Dretske (1981), for example, defends a theory in the neighborhood of T1 on which Pr(X | B) is relativized to a certain “training” period. This allows there to be cases after the relevant training period where B is true but X is false.
3.2 Increase-in-probability theories
T1 and T2 contrast with the following theories:
- T3: For any b and X, b’s content is X if and only if Pr(X | B) > Pr(X).
- T4: For any b and X, b’s content is X if and only if Pr(B | X) > Pr(B).
These are “Increase-in-Probability Theories.” Although neither T3 nor T4 is equivalent with either of T1 and T2, T3 and T4 are equivalent with each other, since increase in probability is symmetric. We therefore count them as a single theory, which we call “T3/T4.”
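The symmetry of probability increase can be spelled out in a few steps. The derivation below assumes that Pr(X) and Pr(B) are both positive, so that the conditional probabilities are defined.

```latex
\begin{align*}
\Pr(X \mid B) > \Pr(X)
  &\iff \frac{\Pr(X \wedge B)}{\Pr(B)} > \Pr(X) \\
  &\iff \Pr(X \wedge B) > \Pr(X)\,\Pr(B) \\
  &\iff \frac{\Pr(X \wedge B)}{\Pr(X)} > \Pr(B) \\
  &\iff \Pr(B \mid X) > \Pr(B)
\end{align*}
```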
We don’t know of any proponents of T3/T4, although Artiga and Sebastián (forthcoming) discuss it. We mention this theory for completeness, but there’s a further reason: even if T3/T4 is implausible on its own, maybe that theory can be used to construct a hybrid theory on which b’s content is X if and only if the right side of T3/T4 holds and some additional condition does too. We explore this possibility in Sect. 3.7.
T3/T4 resembles T1 and T2: its right-hand side, like T1’s right-hand side and T2’s right-hand side, is non-contrastive. To see whether proposition X is the content of b, you don’t need to consider an alternative proposition Y. T1, T2, and T3/T4 are in that respect unlike the theories to which we now turn.
3.3 Highest-probability theories
Here are two more theories:
- T5: For any b and X, b’s content is X if and only if Pr(X | B) > Pr(Y | B) for any Y distinct from X.Footnote 20
- T6: For any b and X, b’s content is X if and only if Pr(B | X) > Pr(B | Y) for any Y distinct from X.
These are “Highest-Probability Theories.” The right-hand side of T5 says that the probability of X given B is greater than the probability of any other proposition given B. The right-hand side of T6 says that the probability of B given X is greater than the probability of B given any other proposition. There are cases where the right-hand side of the one holds but the right-hand side of the other does not, so T5 and T6, unlike T3 and T4, are logically distinct.
3.4 Highest-degree-of-confirmation theories
T5 and T6 are contrastive analogues of the non-contrastive T1 and T2. The following, in turn, are contrastive analogues of T3 and T4:
- T7: For any b and X, b’s content is X if and only if DOC(X, B) > DOC(Y, B) for any Y distinct from X.
- T8: For any b and X, b’s content is X if and only if DOC(B, X) > DOC(B, Y) for any Y distinct from X.
These are “Highest-Degree-of-Confirmation Theories,” where, for any propositions E and H, DOC(H, E) is the degree to which E confirms H, where confirmation is a matter of increase in probability. The right-hand side of T7 says that the degree to which B confirms X is greater than the degree to which B confirms any other proposition. The right-hand side of T8 says that the degree to which X confirms B is greater than the degree to which any other proposition confirms B.
How is degree of confirmation to be measured? Several prima facie plausible answers to this question have been discussed in the literature.Footnote 21 One is that the degree to which E confirms H is a matter of the difference between H’s probability given E (i.e., H’s posterior probability relative to E) and H’s prior probability:
DOCDM(H, E) = Pr(H | E) − Pr(H)
This is the “difference measure” of degree of confirmation. Another prima facie plausible answer is that the degree to which E confirms H is a matter of the ratio of H’s probability given E and H’s prior probability:
DOCRM(H, E) = Pr(H | E) / Pr(H)
This is the “ratio measure” of degree of confirmation. DOCDM and DOCRM both meet the following minimal adequacy condition on measures of degree of confirmation:
- (*): There is a number n such that DOC(H, E) > / = / < n if and only if Pr(H | E) > / = / < Pr(H).Footnote 22
Here “n” is the neutral point between confirmation and disconfirmation. For DOCDM, the neutral point n is 0; for DOCRM, the neutral point is 1.
DOCDM and DOCRM are not equivalent. DOCRM is symmetric in that DOCRM(H, E) = DOCRM(E, H) in all cases. This isn’t true of DOCDM, for DOCDM(H, E) ≠ DOCDM(E, H) in some cases.
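Both claims can be verified directly, reading the two measures as the prose above describes them (the difference, and the ratio, of posterior to prior). The first line shows why the ratio measure is symmetric whenever Pr(H) and Pr(E) are positive; the numerical case that follows uses hypothetical values to exhibit an asymmetry in the difference measure.

```latex
\[
\mathrm{DOC}_{RM}(H, E)
  = \frac{\Pr(H \mid E)}{\Pr(H)}
  = \frac{\Pr(H \wedge E)}{\Pr(H)\,\Pr(E)}
  = \frac{\Pr(E \mid H)}{\Pr(E)}
  = \mathrm{DOC}_{RM}(E, H)
\]
% Hypothetical values: Pr(H) = 0.5, Pr(E) = 0.25, Pr(H & E) = 0.25.
\[
\mathrm{DOC}_{DM}(H, E) = \Pr(H \mid E) - \Pr(H) = 1 - 0.5 = 0.5,
\qquad
\mathrm{DOC}_{DM}(E, H) = \Pr(E \mid H) - \Pr(E) = 0.5 - 0.25 = 0.25
\]
```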
We take no stand here on whether one of DOCDM and DOCRM is preferable to the other. It turns out, however, that if T7 and T8 are understood in terms of DOCRM, then T6, T7, and T8 are all logically equivalent to each other. We show this in Appendix 1. So, since T6 is already on the table, and since no two of T6, T7, and T8 are logically equivalent when T7 and T8 are understood in terms of DOCDM, we shall understand T7 and T8 in terms of DOCDM.Footnote 23 This choice allows an additional ppc theory to be placed on the table.
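One way to see why the ratio measure collapses T7 and T8 into T6 is sketched below. This is an informal reconstruction, not the proof given in Appendix 1, and it assumes Pr(B) > 0. Since Pr(B) is the same on both sides of the comparison, it cancels; the equivalence of T7 with T8 then follows from the symmetry of DOCRM noted above.

```latex
\begin{align*}
\mathrm{DOC}_{RM}(B, X) > \mathrm{DOC}_{RM}(B, Y)
  &\iff \frac{\Pr(B \mid X)}{\Pr(B)} > \frac{\Pr(B \mid Y)}{\Pr(B)} \\
  &\iff \Pr(B \mid X) > \Pr(B \mid Y) \\[4pt]
\mathrm{DOC}_{RM}(X, B) > \mathrm{DOC}_{RM}(Y, B)
  &\iff \mathrm{DOC}_{RM}(B, X) > \mathrm{DOC}_{RM}(B, Y)
\end{align*}
```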
3.5 B-to-X theories versus X-to-B theories
T1–T8 are listed in Table 1. Even though we count T3 and T4 as a single theory, we list them separately in the table so as to highlight an important distinction. T1, T3, T5, and T7 are “B-to-X” theories in that the right-hand side of each involves a conditional probability that “moves” from B as the conditioning proposition to X as the conditioned proposition. T2, T4, T6, and T8, in contrast, are “X-to-B” theories in that the right-hand side of each involves a conditional probability that “moves” from X as the conditioning proposition to B as the conditioned proposition. T3 and T4 are logically equivalent to each other, but this isn’t true in general when it comes to B-to-X theories and their X-to-B counterparts.Footnote 24
3.6 Extant probabilistic theories
T1–T8 are all inspired by extant probabilistic theories (whether or not they are pure). First, T1, T2, and T3/T4 are inspired by Dretske’s (1981, 1983) theory. This is a theory on which b’s content is X only if Pr(X | B) = 1 > Pr(X).Footnote 25 T1 and T2 are like Dretske’s in requiring a maximal probability of unity, whereas T3/T4 is like Dretske’s in requiring a probability increase. Second, T5 and T6 are inspired by Rupert’s (1999) theory, on which Pr(B | X) needs to be greater than Pr(B | Y) for any Y distinct from X, but doesn’t need to have the maximal value of unity. Rupert restricts his theory to natural kind concepts, and so, strictly speaking, it isn’t identical to T6 (which isn’t thus restricted). Even so, T6 is obviously similar to Rupert’s, and so is T5 in requiring a highest probability as opposed to a maximum probability. Third, T7 and T8 are inspired by Eliasmith’s (2005) and Usher’s (2001) theories. These are theories on which DOCRM(B, X) needs to be greater than DOCRM(B, Y) for any Y distinct from X, but doesn’t need to clear some absolute threshold.Footnote 26 They frame their theories in terms of “information” as opposed to “confirmation,” yet T8 is nonetheless similar to their theories, and so is T7 in requiring a highest degree of confirmation as opposed to a degree of confirmation greater than some absolute threshold.
3.7 Hybrid theories
There are ppc theories additional to T1–T8. Consider, for example, the following:
- T1&T2: For any b and X, b’s content is X if and only if (i) Pr(X | B) = 1 and (ii) Pr(B | X) = 1.
- T1 ∨ T2: For any b and X, b’s content is X if and only if (i) Pr(X | B) = 1 or (ii) Pr(B | X) = 1.
Each of these theories is based on T1 and T2. The difference is that the right-hand side of T1&T2 is the conjunction of T1’s and T2’s right-hand sides, whereas the right-hand side of T1 ∨ T2 is their disjunction.
This is the tip of the iceberg. T1&T2 is but one of 120 instances of the following conjunctive schema (where each theory in question is one of T1–T8):
- Ti&Tj&…&Tn: For any b and X, b’s content is X if and only if Ti’s right-hand side holds, Tj’s right-hand side holds, …, and Tn’s right-hand side holds.
Similarly, T1 ∨ T2 is but one of 120 instances of the following disjunctive schema (where each theory in question is one of T1–T8):
- Ti ∨ Tj ∨ … ∨ Tn: For any b and X, b’s content is X if and only if Ti’s right-hand side holds, Tj’s right-hand side holds, …, or Tn’s right-hand side holds.
These schemas yield a total of 240 ppc theories in addition to T1–T8. That is a lot.
We will mention additional ppc theories in Sect. 5, but we now have enough theories to get started. There are seven “basic” (non-hybrid) theories (T1, T2, T3/T4, T5, T6, T7, and T8) and 240 hybrids formed from these basics (T1&T2, T1&T3, etc.).
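The count of 120 per schema can be reproduced by enumeration. The sketch below assumes that a hybrid is fixed by which basic theories it combines (order and repetition ignored) and that T3/T4 counts once, so that there are seven basics; the variable names are illustrative only.

```python
from itertools import combinations

# The seven basic theories, with T3 and T4 counted as the single theory T3/T4.
basics = ["T1", "T2", "T3/T4", "T5", "T6", "T7", "T8"]

# Every selection of two or more distinct basic theories, order ignored.
selections = [combo
              for size in range(2, len(basics) + 1)
              for combo in combinations(basics, size)]

# One conjunctive and one disjunctive hybrid per selection.
conjunctive = ["&".join(combo) for combo in selections]
disjunctive = [" ∨ ".join(combo) for combo in selections]

print(len(conjunctive), len(disjunctive), len(conjunctive) + len(disjunctive))
```

The enumeration agrees with the closed form: seven basics have 2^7 = 128 subsets, and dropping the empty set and the seven singletons leaves 120.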
4 How theories T1–T8 and their hybrids fare
T1–T8 are a mixed bag when it comes to the disjunction problem. T1 and T5 fall prey to the disjunction problem, whereas the remaining theories (T2, T3/T4, T6, T7, and T8) do not. These results are established in Appendix 2. A mix of good news and bad also arises in connection with the distality problem, though here the pattern is different. T1, T2, T3/T4, T6, T7, and T8 all succumb to the distality problem, whereas T5 does not. These results are established in Appendix 3. When it comes to the hard problem, in contrast, T1–T8 are all in the same boat: each of them falls prey to the hard problem. This follows from the fact that each of them falls prey to the disjunction problem or the distality problem.
These results are summarized in Table 2. A “Yes” in a cell indicates that the theory in question can handle the problem in question; a “No” in a cell means that the theory cannot.
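To illustrate the disjunction verdicts for three of the theories, the sketch below applies T1, T2, and T5 to one hypothetical distribution satisfying (DISJ2) and (DISJ3). It assumes that the quantifier "for any Y distinct from X" ranges only over the three candidates allowed by (DISJ1); the numbers and helper names are made up. The distribution serves as a witness that T2 can output X alone. For T1 and T5 it is merely an instance of the failure, and the general argument is the one in Appendix 2.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over worlds (state, b), where state is
# "X", "Y", or "N" (neither). X and Y never co-occur, so Pr(X & Y) = 0.
joint = {
    ("X", True): F(3, 10), ("X", False): F(0),
    ("Y", True): F(1, 10), ("Y", False): F(2, 10),
    ("N", True): F(0),     ("N", False): F(4, 10),
}

def pr(*events):
    """Probability that all of the given events hold."""
    return sum(p for w, p in joint.items() if all(e(w) for e in events))

def cond(a, given):
    """Conditional probability Pr(a | given)."""
    return pr(a, given) / pr(given)

X = lambda w: w[0] == "X"
Y = lambda w: w[0] == "Y"
X_or_Y = lambda w: w[0] in ("X", "Y")
B = lambda w: w[1]
candidates = {"X": X, "X ∨ Y": X_or_Y, "Y": Y}  # the options in (DISJ1)

disj2 = 1 > pr(X_or_Y) > pr(X, Y) == 0
disj3 = pr(B, X) > 0 and pr(B, Y) > 0

# T1: candidates Z with Pr(Z | B) = 1.
t1 = [name for name, z in candidates.items() if cond(z, B) == 1]
# T2: candidates Z with Pr(B | Z) = 1.
t2 = [name for name, z in candidates.items() if cond(B, z) == 1]
# T5: candidates Z whose Pr(Z | B) strictly exceeds that of every rival.
t5 = [name for name, z in candidates.items()
      if all(cond(z, B) > cond(rival, B)
             for rival_name, rival in candidates.items()
             if rival_name != name)]

print(disj2, disj3, t1, t2, t5)
```

Here T2 selects X and nothing else, whereas T1 and T5 both select the disjunction, because Pr(X ∨ Y | B) = 1 while Pr(X | B) = 3/4.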
What about the 240 hybrids of two or more of T1–T8 noted in Sect. 3.7? It turns out that none of them can handle the hard problem either. We show this in Appendix 4. Some of them can handle the disjunction problem, and some can handle the distality problem, but none can handle both.
We are not the first to note that various ppc theories have trouble with distality. For example, consider this passage from Artiga and Sebastián (forthcoming):
Consider the Fusiform Face Area (FFA), which is usually thought to represent faces…. Suppose we discover that a certain neural network R in the FFA selectively fires with significant intensity when there is a face and also that, given that R is active, the entity that is more likely to be present is a face. One might think these observations suffice for establishing the fact that the brain state represents face according to SGIT. Unfortunately, it is unclear that SGIT can deliver this result. Consider, for instance, the set of neuronal structures in the thalamus that are active whenever there is a face in front of the subject. If R has the highest statistical dependence with faces, it will also normally have the highest statistical dependence with these neuronal states in early vision. Thus, SGIT would entail that this activity in FFA represents neuronal activation in another part of the brain. This is of course an extremely counterintuitive result. Indeed, even if there was some principled way of excluding other brain states from being represented, other inadequate contents such as face-looking thing could probably not be avoided. (Artiga and Sebastián forthcoming, p. 8, italics original)
Here “SGIT” is short for “Scientifically Guided Informational Theories.” T6 is an example of such a theory, and so is T8 when understood in terms of DOCRM.Footnote 27 In our terminology, and focusing on T6, Artiga and Sebastián’s worry is that T6 outputs the mistaken result that b’s content is the proposition Cp, which describes neuronal activation in the brain, and isn’t X, which describes a face.
There’s an important difference, though, between Artiga and Sebastián’s discussion and ours. Consider the following claims from the passage just quoted:
If R has the highest statistical dependence with faces, it will also normally have the highest statistical dependence with these neuronal states in early vision.
Indeed, even if there was some principled way of excluding other brain states from being represented, other inadequate contents such as face-looking thing could probably not be avoided.
These claims are prima facie plausible, but Artiga and Sebastián provide no argument in support of them. The claims are simply asserted. In contrast, we prove in Appendix 3 that there are no probability distributions such that (i) (DIST2) and (DIST3) hold and (ii) given the assumption that (DIST1) holds, T6 outputs the result that b’s content is X and isn’t Cp or Cd.Footnote 28
One final note is in order. The fact that a given theory solves the disjunction problem or the distality problem doesn’t mean that there are realistic probability distributions (probability distributions in line with the relevant frequencies in the actual world) of the sort in question. Consider T5, for example, and the fact that it can handle the distality problem. It could be that no realistic probability distribution is such that (i) (DIST2) and (DIST3) hold and (ii) given the assumption that (DIST1) holds, T5 outputs the result that b’s content is X and isn’t Cp or Cd. Our adequacy conditions are very weak, which is why failing to meet them is a death blow to a theory, whereas meeting them is a minor victory.Footnote 29
5 Three potential responses to the hard problem
It would be premature to give up hope at this point, and conclude that no ppc theory can handle the hard problem. There are potential responses to consider.
5.1 Weaken T1–T8
In Sect. 3.7, we noted 240 hybrid ppc theories that were formed by using two or more of T1–T8. Each of those hybrid theories is like T1–T8 in that it gives a necessary and sufficient condition for b’s meaning X, and each is like T1–T8 in that it falls prey to the hard problem. What about ppc theories that are like T1–T8 except that they give only a sufficient condition for b’s meaning X or give just a necessary condition for b’s meaning X? Let “TiS” be Ti (for any i = 1, 2, …, 8) when weakened so as to give only a sufficient condition for b’s meaning X, and let “TiN” be Ti (for any i = 1, 2, …, 8) when weakened so as to give only a necessary condition for b’s meaning X. Can any of these weaker theories handle the hard problem?
The situation is perfectly uniform when it comes to T1S–T8S: none of them can handle the disjunction problem or the distality problem; hence none of them can handle the hard problem. The reason why is straightforward. None of T1S–T8S gives a necessary condition for b’s meaning X, and thus none of them can rule out any candidate proposition as b’s content. For example, although there might be cases where T5S outputs the result that b’s content is X, T5S is unable to output the result that b’s content is not, say, X ∨ Y.Footnote 30
It might seem that the situation is similar with respect to T1N–T8N. For, it might seem that because none of T1N–T8N gives a sufficient condition for b’s meaning X, none of them can “rule in” any candidate proposition as b’s content. However, consider (DISJ1) and (DIST1). The former says that b’s content is X, X ∨ Y, or Y, while the latter says that b’s content is X, Cp, or Cd. Take some case where (DIST2) and (DIST3) hold, and suppose that T5N, for example, rules out each of Cp and Cd as b’s content because Pr(X | B) is greater than each of Pr(Cp | B) and Pr(Cd | B). Then given the assumption that (DIST1) holds, it follows by T5N that b’s content is X.
The situation with respect to T1N–T8N is summarized in Table 3. First, neither T1N nor T5N can handle the disjunction problem, but the other theories can. Second, T5N can handle the distality problem, but none of the other theories can. Third, none of the theories can handle the hard problem. These results are established in Appendix 5.
It’s interesting that the pattern for the necessary-condition theories T1N–T8N (as summarized in Table 3) is identical to the pattern for the necessary-and-sufficient-condition theories T1–T8 (as shown in Table 2). We conjecture that the reason for this confluence is to be found in (DISJ1) and (DIST1).
5.2 Move from confirmation to either correlation or mutual information
It’s not uncommon for theorists to use the term “correlation” in such a way that a high degree of correlation is just a matter of a high conditional probability. Consider, for example, the following passage from Fodor:
However, the crude treatment just sketched clearly won’t do: it is open to an objection that can be put like this: If there are wild tokenings of R, it follows that the nomic dependence of R upon S is imperfect; some R-tokens – the wild ones – are not caused by S tokens. Well, but clearly they are caused by something; i.e., by something that is, like S, sufficient but not necessary for bringing Rs about. Call this second sort of sufficient condition the tokening of situations of type T. Here’s the problem: R represents the state of affairs with which its tokens are causally correlated. Some representations of type R are causally correlated with states of affairs of type S; some representations of type R are causally correlated with states of affairs of type T. So it looks as though what R represents is not either S or T, but rather the disjunction (S ∨ T): The correlation of R with the disjunction is, after all, better than its correlation with either of the disjuncts and, ex hypothesi, correlation makes information and information makes representation. If, however, what Rs represent is not S but (S ∨ T), then tokenings of R that are caused by T aren’t, after all, wild tokenings and our account of misrepresentation has gone West. (Fodor 1984, p. 240, emphasis original)
Dretske (1983, pp. 83–84) also uses “correlation” to refer to a single conditional probability.
Fodor’s claims in the above quote make sense, if “high degree of correlation” means high conditional probability. For, switching to our notation, if X and Y each entails X ∨ Y but not vice versa (since X and Y are mutually exclusive), it follows that Pr(X | B) < Pr(X ∨ Y | B) and Pr(Y | B) < Pr(X ∨ Y | B), which means that B’s “degree of correlation” with X ∨ Y is greater than both its degree of correlation with X and its degree of correlation with Y.
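The step behind these inequalities can be made explicit. The following is a minimal sketch, assuming (as above) that X and Y are mutually exclusive, and assuming in addition that Pr(X | B) > 0 and Pr(Y | B) > 0; the latter assumption is not stated above, but it is needed for the inequalities to be strict:

$$\begin{aligned} \Pr (X \vee Y|B) & = \Pr (X|B) + \Pr (Y|B) - \Pr (X \& Y|B) \\ & = \Pr (X|B) + \Pr (Y|B) \\ & > \Pr (X|B) \end{aligned}$$

The second line uses mutual exclusivity, the third uses Pr(Y | B) > 0, and the inequality for Y follows by symmetric reasoning.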
This usage of “correlation,” however, is miles away from standard usage in statistics. Consider:
$$r(H,E) = \frac{\Pr (H \& E) - \Pr (H)\Pr (E)}{\sqrt {\Pr (H)\Pr (\sim H)\Pr (E)\Pr (\sim E)} }$$
This is the Pearson correlation coefficient applied to propositions rather than to quantitative variables.Footnote 31 Pearson’s r(H, E) has a range of [−1, 1], where r(H, E) > / = / < 0 if and only if Pr(H | E) > / = / < Pr(H). Suppose, for example, that Pr(H | E) = 0.990 < 0.999 = Pr(H). Then though Pr(H | E) is high and thus E’s degree of correlation with H as understood by Fodor is high, r(H, E) is negative and thus it isn’t high.
But are there cases where X entails X ∨ Y but not vice versa, and yet it’s not the case that r(X, B) < r(X ∨ Y, B)? Yes, for there are cases where X entails X ∨ Y but not vice versa, and yet Pr(X | B) > Pr(X) whereas Pr(X ∨ Y | B) < Pr(X ∨ Y).Footnote 32 Any such case is a case where r(X, B) > 0 > r(X ∨ Y, B).
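The contrast can be checked numerically. The sketch below is illustrative rather than a reproduction of our own computations: it assumes the standard propositional form of Pearson's coefficient, r(H, E) = [Pr(H & E) − Pr(H)Pr(E)] / √[Pr(H)Pr(~H)Pr(E)Pr(~E)], and applies it to the probability distribution given in Appendix 2. The names `DIST`, `pr`, and `pearson_r` are hypothetical helpers introduced only for this example.

```python
from fractions import Fraction as F
from math import sqrt

# Probability distribution from Appendix 2, keyed by truth values of (X, Y, B).
DIST = {
    (True, True, True): F(0),
    (True, True, False): F(0),
    (True, False, True): F(10, 57),
    (True, False, False): F(0),
    (False, True, True): F(3, 31),
    (False, True, False): F(16, 29),
    (False, False, True): F(10, 57),
    (False, False, False): F(32, 51243),
}

def pr(event):
    """Probability of the proposition picked out by event(x, y, b)."""
    return sum(p for (x, y, b), p in DIST.items() if event(x, y, b))

def pearson_r(h, e):
    """Pearson's r for propositions h and e (assumed standard form)."""
    ph, pe = pr(h), pr(e)
    cov = pr(lambda x, y, b: h(x, y, b) and e(x, y, b)) - ph * pe
    return float(cov) / sqrt(float(ph * (1 - ph) * pe * (1 - pe)))

# Propositions as predicates over the truth values of (X, Y, B).
X = lambda x, y, b: x
Y = lambda x, y, b: y
X_OR_Y = lambda x, y, b: x or y
B = lambda x, y, b: b

r_x = pearson_r(X, B)
r_xy = pearson_r(X_OR_Y, B)
r_y = pearson_r(Y, B)
print(round(r_x, 3), round(r_xy, 3), round(r_y, 3))
```

On this distribution r(X, B) comes out positive while r(X ∨ Y, B) and r(Y, B) come out negative, which is the pattern described above.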
There are clear respects in which correlation as standardly understood in statistics is similar to confirmation as standardly understood in Bayesian confirmation theory. But, at the same time, there are important differences. Unlike DOCDM, r is symmetric in that r(H, E) = r(E, H) in all cases. And unlike DOCRM, r is maximal at 1 precisely when H and E entail each other.Footnote 33
This suggests that the way to solve the hard problem may be to replace Highest-Degree-of-Confirmation Theories such as T7 with a Highest-Degree-of-Correlation Theory like the following:
- T9: For any b and X, b’s content is X if and only if r(X, B) > r(Y, B) for any Y distinct from X.Footnote 34
However, we show in Appendix 6 that T9 falls prey to the distality problem.Footnote 35 Hence T9 cannot solve the hard problem.
Perhaps if T9 were modified in terms of some degree-of-correlation measure other than r, the resulting theory would be able to handle the hard problem. We leave that question for the future, and turn now to “mutual information.”
Some philosophers use the expression “mutual information” to refer to the logarithm, base 2, of DOCRM(H, E):
$${\text{mi}}(H,E) = \log_{2} [\Pr (H|E)/\Pr (H)]$$
We noted in Sect. 3.4 that T7 and T8 can be understood in terms of DOCRM, and that if they are so understood, then T6, T7, and T8 are all logically equivalent to each other. The same is true if T7 and T8 are understood in terms of mi. This is because DOCRM and mi are ordinally equivalent in that for any H1, H2, E1, and E2, DOCRM(H1, E1) > / = / < DOCRM(H2, E2) if and only if mi(H1, E1) > / = / < mi(H2, E2); see Sect. 3.6 for relevant background. Hence it won’t help to modify T7 and T8 in terms of mi.
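The ordinal equivalence can be seen directly. The following sketch assumes that DOCRM(H, E) = Pr(H | E)/Pr(H), that mi is its base-2 logarithm, and that the ratios involved are positive so that the logarithms are defined. Because the base-2 logarithm is strictly increasing on the positive reals:

$$\begin{aligned} {\text{DOC}}_{{{\text{RM}}}} (H_{1} ,E_{1} ) > {\text{DOC}}_{{{\text{RM}}}} (H_{2} ,E_{2} ) & \Leftrightarrow \log_{2} {\text{DOC}}_{{{\text{RM}}}} (H_{1} ,E_{1} ) > \log_{2} {\text{DOC}}_{{{\text{RM}}}} (H_{2} ,E_{2} ) \\ & \Leftrightarrow {\text{mi}}(H_{1} ,E_{1} ) > {\text{mi}}(H_{2} ,E_{2} ) \end{aligned}$$

The same reasoning applies with "=" or "<" in place of ">", so any theory that depends only on the ordering induced by DOCRM delivers the same verdicts when restated in terms of mi.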
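However, there’s another usage of the expression “mutual information” in the literature. In information theory (Cover and Thomas 2006), the expression “mutual information” is standardly used to refer not to mi, but to a weighted average of mi:
$${\text{MI}}(\Gamma_{H} ,\Gamma_{E} ) = \sum\limits_{i = 1}^{n} {\sum\limits_{j = 1}^{m} {\Pr (H_{i} \& E_{j} )\,{\text{mi}}(H_{i} ,E_{j} )} }$$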
Note that whereas H and E are propositions, ΓH = {H1, H2, …, Hn} and ΓE = {E1, E2, …, Em} are partitions of propositions (i.e., sets of pairwise mutually exclusive and jointly exhaustive propositions).Footnote 36 Can T7 and T8 be modified by using MI, and if so, would this help in terms of handling the hard problem?
It isn’t clear how best to modify T7 and T8 by using MI, but we have a suggestion. Consider:
- T10: For any b and X, b’s content is X if and only if there are partitions Γ1 = {B, ~ B} and Γ2 = {X, Y1, …, Yn} such that (i) MI(Γ1, Γ2) > MI(Γ1, Γ3) for any Γ3 distinct from Γ2 and (ii) mi(X, B) > mi(Yi, B) for any Yi in Γ2.Footnote 37
Think of this as working in two steps. First, MI narrows down b’s content to the members of a particular partition. Call this “the content partition.” Second, mi induces a further narrowing, to a particular member of the content partition. For example, suppose that the candidate content partitions are Γ2 = {X, Y, Z} and Γ3 = {X*, Y*, Z*}, that MI(Γ1, Γ2) is greater than MI(Γ1, Γ3), and that mi(X, B) is greater than each of mi(Y, B) and mi(Z, B). Given that MI(Γ1, Γ2) is greater than MI(Γ1, Γ3), it follows that the content partition is Γ2, and so, as none of X*, Y*, and Z* is a member of that partition, b’s content isn’t X*, Y*, or Z*. Given that Γ2 is the content partition, and that mi(X, B) is greater than each of mi(Y, B) and mi(Z, B), it then follows that b’s content is X and isn’t Y or Z.
What are the candidate content partitions in the disjunction problem and the distality problem? It might seem that one of the candidate content partitions in the disjunction problem should include X, X ∨ Y, and Y, and that, similarly, one of the candidate content partitions in the distality problem should include X, Cp, and Cd. But note that X, X ∨ Y, and Y aren’t pairwise mutually exclusive, and neither are X, Cp, and Cd, so no set with X, X ∨ Y, and Y as members or with X, Cp, and Cd as members is a partition. We will assume, instead, that the candidate content partitions in the context of the disjunction problem are Γ2 = {X, Y, ~ X & ~ Y} and Γ3 = {X ∨ Y, ~ X & ~ Y}, and that the candidate content partitions in the context of the distality problem are Γ2 = {X, ~ X}, Γ3 = {Cp, ~ Cp}, and Γ4 = {Cd, ~ Cd}.
We show in Appendix 7 that T10 falls prey to the distality problem and thus can’t handle the hard problem.Footnote 38 Hence, just as it won’t help solve the hard problem to move from confirmation to correlation in the sense of r, it also won’t help to move from confirmation to mutual information in the sense of MI.
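To make the two-step reading of T10 concrete, the sketch below applies it to the candidate partitions for the disjunction problem, using the probability distribution from Appendix 2. It is an illustration under stated assumptions, not the proof in Appendix 7: it takes mi(H, E) to be log2[Pr(H | E)/Pr(H)] and MI to be the sum of mi over the cells of the two partitions, weighted by Pr(Hi & Ej), with zero-probability cells contributing nothing. The helper names (`DIST`, `pr`, `both`, `mi`, `MI`) are hypothetical.

```python
from fractions import Fraction as F
from math import log2

# Probability distribution from Appendix 2, keyed by truth values of (X, Y, B).
DIST = {
    (True, True, True): F(0),
    (True, True, False): F(0),
    (True, False, True): F(10, 57),
    (True, False, False): F(0),
    (False, True, True): F(3, 31),
    (False, True, False): F(16, 29),
    (False, False, True): F(10, 57),
    (False, False, False): F(32, 51243),
}

def pr(event):
    """Probability of the proposition picked out by event(x, y, b)."""
    return sum(p for (x, y, b), p in DIST.items() if event(x, y, b))

def both(h, e):
    """Conjunction of two propositions."""
    return lambda x, y, b: h(x, y, b) and e(x, y, b)

def mi(h, e):
    """Assumed form: log2 of Pr(h | e)/Pr(h), i.e. Pr(h & e)/(Pr(h)Pr(e))."""
    return log2(float(pr(both(h, e)) / (pr(h) * pr(e))))

def MI(gamma_h, gamma_e):
    """Assumed form: Pr(h & e)-weighted sum of mi over partition cells."""
    total = 0.0
    for h in gamma_h:
        for e in gamma_e:
            joint = pr(both(h, e))
            if joint > 0:  # convention: a zero-probability cell adds nothing
                total += float(joint) * mi(h, e)
    return total

X = lambda x, y, b: x
Y = lambda x, y, b: y
NEITHER = lambda x, y, b: (not x) and (not y)
X_OR_Y = lambda x, y, b: x or y
B = lambda x, y, b: b
NOT_B = lambda x, y, b: not b

GAMMA_1 = [B, NOT_B]
GAMMA_2 = [X, Y, NEITHER]
GAMMA_3 = [X_OR_Y, NEITHER]

# Step (i): which candidate partition carries more information about B?
step_one = MI(GAMMA_1, GAMMA_2) > MI(GAMMA_1, GAMMA_3)
# Step (ii): within GAMMA_2, does X have the highest mi with B?
step_two = mi(X, B) > mi(Y, B) and mi(X, B) > mi(NEITHER, B)
print(step_one, step_two)
```

On this distribution both steps favor X, which fits the claim that T10 can handle the disjunction problem; the sketch says nothing about the distality problem, where T10 fails.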
5.3 Move to a degree-of-confirmation measure other than DOCDM
T7 and T8, when understood in terms of DOCDM, fall prey to the distality problem, but what if they are modified in terms of some degree-of-confirmation measure other than DOCDM (and also other than DOCRM)? Does this allow them to handle the distality problem? If so, does this modification also enable them to handle the hard problem?
Before answering this question, it’s important to note here that our proofs in Appendix 3 regarding T7 and T8 go well beyond these theories when understood in terms of DOCDM. Our proof that T7 falls prey to the distality problem generalizes to T7 when understood in terms of any degree-of-confirmation measure such that:
- Weak Law of Likelihood: For any E, H1, and H2, if (i) Pr(E | H1) > Pr(E | H2) and (ii) Pr(E | ~ H1) < Pr(E | ~ H2), then DOC(H1, E) > DOC(H2, E).Footnote 39
Further, our proof that T8 falls prey to the distality problem generalizes to T8 when understood in terms of any degree-of-confirmation measure such that:
- Final Probability Incrementality: For any E1, E2, and H, if Pr(H | E1) > Pr(H | E2), then DOC(H, E1) > DOC(H, E2).Footnote 40
It won’t help, then, to simply understand T7 and T8 in terms of any old degree-of-confirmation measure that is distinct from DOCDM.
However, there are degree-of-confirmation measures on which Weak Law of Likelihood or Final Probability Incrementality does not hold. Here is an example:
$${\text{DOC}}_{{{\text{DM}}^{*} }} (H,E) = \Pr (H|E)^{2} - \Pr (H)^{2}$$
This is a variant of DOCDM on which Final Probability Incrementality holds but Weak Law of Likelihood does not. Now consider:
- T11: For any b and X, b’s content is X if and only if DOCDM*(X, B) > DOCDM*(Y, B) for any Y distinct from X.
It turns out, surprisingly, that T11 can handle the hard problem. We show this in Appendix 8.
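As a rough illustration of how T11 behaves, the sketch below evaluates it on the disjunction-problem distribution from Appendix 2. It is not the proof in Appendix 8 and covers only the disjunction half of the hard problem. It assumes that DOCDM* squares the posterior and the prior, that is, DOCDM*(H, E) = Pr(H | E)² − Pr(H)²; this form is an assumption of the sketch, and the helper names are hypothetical.

```python
from fractions import Fraction as F

# Probability distribution from Appendix 2, keyed by truth values of (X, Y, B).
DIST = {
    (True, True, True): F(0),
    (True, True, False): F(0),
    (True, False, True): F(10, 57),
    (True, False, False): F(0),
    (False, True, True): F(3, 31),
    (False, True, False): F(16, 29),
    (False, False, True): F(10, 57),
    (False, False, False): F(32, 51243),
}

def pr(event):
    """Probability of the proposition picked out by event(x, y, b)."""
    return sum(p for (x, y, b), p in DIST.items() if event(x, y, b))

def pr_given(h, e):
    """Conditional probability Pr(h | e); assumes Pr(e) > 0."""
    return pr(lambda x, y, b: h(x, y, b) and e(x, y, b)) / pr(e)

def doc_dm_star(h, e):
    """Assumed form of DOCDM*: squared posterior minus squared prior."""
    return pr_given(h, e) ** 2 - pr(h) ** 2

X = lambda x, y, b: x
Y = lambda x, y, b: y
X_OR_Y = lambda x, y, b: x or y
B = lambda x, y, b: b

scores = {
    "X": doc_dm_star(X, B),
    "X or Y": doc_dm_star(X_OR_Y, B),
    "Y": doc_dm_star(Y, B),
}
for name, value in scores.items():
    print(name, float(value))
```

On this distribution X receives the highest score of the three candidates, so under the stated assumption T11 would select X rather than X ∨ Y or Y.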
This does not mean that T11 has any real plausibility in the context of ppc theories. DOCDM* is a rather strange-looking measure (to say the least). Why square X’s posterior and prior probabilities? Why not instead take them to the third power, or the fourth? Further, there are variants of the disjunction problem and the distality problem to consider. Consider:
- (DISJ4): Pr(B | X) > Pr(B) and Pr(B | Y) > Pr(B).
Let’s say that a given ppc theory T can handle the disjunction* problem if and only if there is a probability distribution D such that (i) (DISJ2), (DISJ3), and (DISJ4) hold and (ii) given the assumption that (DISJ1) holds, T outputs the result that b’s content is X and isn’t X ∨ Y or Y. If, as it seems, an adequate theory of content should be able to handle the disjunction* problem thus understood, then T11 is inadequate. For, as we show in Appendix 9, T11 falls prey to the disjunction* problem.
However, the fact that at least one Highest-Degree-of-Confirmation Theory can handle the hard problem perhaps provides some hope for ppc theories.
6 Conclusion
The disjunction problem is a problem for some but not all of T1–T8 and their 240 hybrids, and likewise with respect to the distality problem. The hard problem, in contrast, is a problem for every single one of T1–T8 and their 240 hybrids (Sect. 4). This generalizes to various weakened versions of T1–T8 (Sect. 5.1), to T7 and T8 both when they are modified in terms of Pearson’s correlation measure r and when they are modified in terms of mutual information MI (Sect. 5.2), to T7 when modified in terms of any degree-of-confirmation measure that, like DOCDM, meets Weak Law of Likelihood (Sect. 5.3), and to T8 when modified in terms of any degree-of-confirmation measure that, like DOCDM, meets Final Probability Incrementality (Sect. 5.3). The hard problem is recalcitrant! However, it doesn’t bring down every ppc theory in logical space. T11, for instance, can handle it. We don’t claim that T11 is the right theory of semantic content, in part because it succumbs to the strengthened version of the disjunction problem described in Sect. 5.3. Rather, our point is that ppc theories have resources that should be scrutinized further, in connection with the hard problem, and with respect to other conditions of adequacy as well.Footnote 41 We note, in this regard, that although we have looked at a largish number of candidate ppc theories, we have used a small handful of formal tools to construct those theories; many of those tools are Bayesian. There are other formal frameworks that are worth exploring.Footnote 42
We close with an analogy. Adaptationism is a research program in evolutionary biology that aims to explain the current traits of organisms in terms of natural selection in ancestral populations. It would be absurd to dismiss this research program on the grounds that adaptationism has so far failed to explain this or that trait in a given biological population. We feel the same way about the development of ppc theories of semantic content. This is a research program, and noting defects in this or that ppc theory hardly suffices to show that the whole program is bankrupt. Philosophers should be just as tenacious as scientists!
Notes
The extant literature on causal theories of mental content is huge. See Adams and Aizawa (2017, sec. 4.4) for a helpful overview.
Here and throughout our focus is on causal theories of the contents of perceptual belief types. How exactly such theories should be incorporated into fully general theories of the contents of belief types is a difficult but separate issue (as is the issue of how they should be incorporated into theories of the contents of mental state types other than beliefs). See Adams and Aizawa (2017, sec. 4.4), Buras (2009), and Gerken (2014) for relevant discussion.
The disjunction problem is widely associated with Fodor. This makes some sense, since Fodor makes heavy use of it in his 1984 paper “Semantics, Wisconsin style” and in many subsequent works (e.g., Fodor 1987, Ch. 4; 1990a, Ch. 3; 1990b). But there are works prior to Fodor (1984) in which the essence of the disjunction problem is discussed (though not under that name). See, e.g., Stampe (1977, p. 44) and Dretske (1983, p. 89). For further discussion and references, see Adams and Aizawa (2017).
Here, for ease of presentation, we’re being a bit sloppy. Cd, X, Cp, and B are propositions, not events, but causation is a relationship among events, not propositions. We trust that readers will not be thrown into confusion by this.
We aren’t assuming that causes always precede their effects; it suffices that this is often the case with causal chains leading to belief states.
We put “input” and “output” in scare quotes because we aren’t saying that theories of content are decision procedures in the sense of providing an algorithm.
An anonymous reviewer asks whether partially probabilistic causal theories are logically stronger than their corresponding purely probabilistic causal theories. Take some ppc theory T and some partially probabilistic variant of it T*, and suppose that each theory has the form “For any b and X, b’s content is X if and only if …” where the right-hand side of T* is logically stronger than the right-hand side of T. It doesn’t follow that T* is logically stronger than T. In fact, it might well be that T and T* are mutually exclusive. Compare: the right-hand side of “S is a bachelor if and only if S is an unmarried adult male” is logically stronger than the right-hand side of “S is a bachelor if and only if S is an adult male”, but the first biconditional is inconsistent with and thus isn’t logically stronger than the second.
We aren’t claiming here that there has been absolutely no discussion of ppc theories in the context of the disjunction problem and the distality problem. Artiga and Sebastián (forthcoming) examine the theories of Eliasmith (2005), Rupert (1999), and Usher (2001) in that context. We will discuss these theories in what follows. However, the present point is that the class of ppc theories in logical space goes well beyond these examples. This should be abundantly clear by the end of Sect. 3.
Note that the right-hand side of this biconditional leaves it open that there is a belief state b, there are propositions X, X ∨ Y, and Y, and there is a probability distribution such that (i) (DISJ2) and (DISJ3) hold and (ii) given the assumption that (DISJ1) holds, it’s not the case that T outputs the result that b’s content is X and isn’t X ∨ Y or Y.
There are numerous interpretations of probability; see Hájek (2012) for helpful discussion.
An anonymous reviewer objects that ppc theories don’t apply to cases where we lack sample frequencies, and that this is true in many cases of perceptual beliefs. The issue, though, isn’t whether we have the requisite sample data. The issue is whether the requisite probabilities exist. If they do, then even if we don’t have evidence about their values, ppc theories have application.
Here is Fodor’s (1987, p. 102, emphasis original) formulation: “… a viable causal theory of content has to acknowledge two kinds of cases where there are disjoint causally sufficient conditions for the tokenings of a symbol: the case where the content of the symbol is disjunctive (‘A’ expresses the property of being (A v B)) and the case where the content of the symbol is not disjunctive and some of the tokenings are false (‘A’ expresses the property of being A, and B-caused ‘A’ tokenings misrepresent)”.
It should be clear, then, that we are giving a necessary condition, not a sufficient condition, for the adequacy of a theory of content. The same is true with respect to the distality problem and the hard problem as formulated below.
Here we have in mind “no-impact” screening-off: For any P, Q, and R, Q screens-off P from R precisely when Pr(R | Q & P) = Pr(R | Q) and Pr(R | ~ Q & P) = Pr(R | ~ Q). This kind of screening-off is logically stronger than “positive impact screening-off” and “negative impact screening-off” as formulated in Roche and Shogenji (2014).
We aren’t assuming that causality is transitive. We’re assuming only that some effects are caused both by their causes and by the causes of their causes.
The distality problem so understood resembles what Artiga and Sebastián (forthcoming, Sec. 3.2) call the “wrong distality attribution problem,” but the two are different.
Here we assume, contrary to Descartes, that minds have spatial locations – they are inside the bodies of minded individuals.
Note too that what we say about T1 and T2 in relation to the hard problem carries over to T1 and T2 when relaxed so that the probabilities in question need to be high but don’t need to be maximally high.
Here and throughout by “distinct” we mean “logically independent”.
This is a compressed way of saying that there is a degree n such that (i) DOC(H, E) > n if and only if Pr(H | E) > Pr(H), (ii) DOC(H, E) = n if and only if Pr(H | E) = Pr(H), and (iii) DOC(H, E) < n if and only if Pr(H | E) < Pr(H).
In Sect. 5.3, we address variants of T7 and T8 on which neither DOCDM nor DOCRM is assumed.
Consider the following logarithmic variant of DOCRM:
$${\text{DOC}}_{{{\text{LRM}}}} (H,E) = \log [\Pr (H|E)/\Pr (H)]$$
Strictly speaking, Eliasmith (2005) and Usher (2001) frame their theories so that DOCLRM(B, X) needs to be greater than DOCLRM(B, Y) for any Y distinct from X. Since, however, DOCRM and DOCLRM are ordinally equivalent to each other, it follows that DOCRM(B, X) > DOCRM(B, Y) for any Y distinct from X precisely when DOCLRM(B, X) > DOCLRM(B, Y) for any Y distinct from X.
Why are the theories in question called “informational”? See Artiga and Sebastián (forthcoming, p. 3, n. 4) for an explanation. In Sect. 5.2, we discuss two senses of “mutual information” in the literature, and how they relate to ppc theories and the hard problem.
Artiga and Sebastián consider a theory that we have yet to address. They call it “INFO.” It can be put like this:
INFO: For any b and X, b’s content is X if and only if (i) Pr(B | X) > Pr(B | Y) for any Y distinct from X and (ii) Pr(X | B) > Pr(X | B*) for any B* distinct from B.
This is like T6 except that it also requires that the probability of X given B be greater than the probability of X given any other proposition B* to the effect that the organism in question has a token of belief type b* at the time in question. It turns out, though, that our proof that T6 falls prey to the distality problem carries over to INFO. The problem is that any probability distribution on which (DIST2) and (DIST3) hold is such that Pr(B | X) < Pr(B | Cp).
An anonymous referee suggests, in effect, that if X causes different retinal states on different occasions, and each such retinal state always causes B, then X’s probability given B is greater than the probabilities of the various retinal-state descriptions (taken individually) given B, and that because of this, b’s content is X and isn’t some retinal-state description. This idea is captured by T5, but we have two comments. First, T5 falls prey to the disjunction problem. Second, even if X’s probability given B is higher than the probabilities of the various retinal-state descriptions given B, it might be that X’s probability given B is not greater than the probability of the disjunction of the various retinal-state descriptions. See Roche and Sober (forthcoming, third to last paragraph in section 5) for further discussion.
It might be objected that since b’s content can’t both be X and be X ∨ Y (though its content might both entail X and entail X ∨ Y), any case where T5S outputs the result that b’s content is X is ipso facto a case where T5S outputs the result that b’s content isn’t X ∨ Y. The problem, though, is that T5S itself implies otherwise. For, any case where Pr(X | B) = 1 is also a case where Pr(X ∨ Y | B) = 1.
See Kemeny and Oppenheim (1952, p. 314, proof of Theorem 2) for discussion of how to understand correlation in the context of propositions.
We give an example in Appendix 2.
DOCRM(H, E) can be arbitrarily close to 1 (the neutral point for DOCRM) when H and E entail each other. For, when H and E entail each other, DOCRM(H, E) = 1/Pr(H), and this ratio approaches 1 as Pr(H) approaches 1.
Since r is symmetric, it follows that T9 is logically equivalent to T8 when understood in terms of r.
T9 can handle the disjunction problem, but we won’t explain why here.
For discussion of how best to interpret MI, see Roche and Shogenji (2018).
Since mi is symmetric, it follows that the second condition on the right-hand side of T10 is logically equivalent to the condition that mi(B, X) > mi(B, Yi) for any Yi in Γ2.
It can be shown that T10, like T9, can handle the disjunction problem, but we won’t bother with that here.
For discussion of Weak Law of Likelihood and related theses, see Roche and Shogenji (2014).
Final Probability Incrementality is logically weaker than the principle that Crupi et al. (2013) call by the same name.
We remind the reader that what we’ve been saying about “the disjunction problem” and “the distality problem” has, for most of this paper, really been about highly specific “versions” of each. As noted in Sect. 2, other versions are possible, and exploring them is worthwhile.
Here we draw the reader’s attention to Roche and Sober (forthcoming), where the Akaike Information Criterion is used to investigate the epistemology of hypotheses that attribute false beliefs.
What about the disjunction problem? The probability distribution given in Appendix 2 is such that (DISJ2) and (DISJ3) hold. It’s also such that r(X, B) ≈ 0.512 > −0.510 ≈ r(X ∨ Y, B) and r(X, B) ≈ 0.512 > − 0.815 ≈ r(Y, B). Given the assumption that (DISJ1) holds, T9 outputs the result that b’s content is X and isn’t X ∨ Y or Y. Hence T9 can handle the disjunction problem.
References
Adams, F., & Aizawa, K. (2017). Causal theories of mental content. In E. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Summer 2017 ed.). https://plato.stanford.edu/archives/sum2017/entries/content-causal/.
Armstrong, D. (1968). A materialist theory of the mind. London: Routledge & Kegan Paul.
Artiga, M., & Sebastián, M. (forthcoming). Informational theories of content and mental representation. Review of Philosophy and Psychology.
Buras, T. (2009). An argument against causal theories of mental content. American Philosophical Quarterly, 46, 117–129.
Cover, T., & Thomas, J. (2006). Elements of information theory (2nd ed.). Hoboken: Wiley.
Crupi, V., Chater, N., & Tentori, K. (2013). New axioms for probability and likelihood ratio measures. British Journal for the Philosophy of Science, 64, 189–204.
Crupi, V., Tentori, K., & Gonzalez, M. (2007). On Bayesian measures of evidential support: Theoretical and empirical issues. Philosophy of Science, 74, 229–252.
Dretske, F. (1981). Knowledge and the flow of information. Cambridge, MA: MIT Press.
Dretske, F. (1983). Précis of Knowledge and the flow of information. Behavioral and Brain Sciences, 6, 55–90.
Eells, E., & Fitelson, B. (2002). Symmetries and asymmetries in evidential support. Philosophical Studies, 107, 129–142.
Eliasmith, C. (2005). A new perspective on representational problems. Journal of Cognitive Science, 6, 97–123.
Field, H. (1990). “Narrow” aspects of intentionality and the information-theoretic approach to content. In E. Villanueva (Ed.), Information, semantics, and epistemology (pp. 102–116). Oxford: Blackwell.
Fitelson, B. (2008). A decision procedure for probability calculus with applications. Review of Symbolic Logic, 1, 111–125.
Fodor, J. (1984). Semantics, Wisconsin style. Synthese, 59, 231–250.
Fodor, J. (1987). Psychosemantics: The problem of meaning in the philosophy of mind. Cambridge, MA: MIT Press.
Fodor, J. (1990a). A theory of content and other essays. Cambridge, MA: MIT Press.
Fodor, J. (1990b). Information and representation. In P. Hanson (Ed.), Information, language, and cognition (pp. 175–190). Oxford: Oxford University Press.
Gerken, M. (2014). A puzzle about mental self-representation and causation. Philosophical Psychology, 27, 890–906.
Hájek, A. (2012). Interpretations of probability. In E. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Winter 2012 ed.). https://plato.stanford.edu/archives/win2012/entries/probability-interpret/.
Kemeny, J., & Oppenheim, P. (1952). Degree of factual support. Philosophy of Science, 19, 307–324.
Nozick, R. (1981). Philosophical explanations. Cambridge, MA: Harvard University Press.
Roche, W., & Shogenji, T. (2014). Dwindling confirmation. Philosophy of Science, 81, 114–137.
Roche, W., & Shogenji, T. (2018). Information and inaccuracy. British Journal for the Philosophy of Science, 69, 577–604.
Roche, W., & Sober, E. (forthcoming). Hypotheses that attribute false beliefs—A two-part epistemology (Darwin + Akaike). Mind & Language.
Rupert, R. (1999). The best test theory of extension: First principle(s). Mind & Language, 14, 321–355.
Shogenji, T. (2003). A condition for transitivity in probabilistic support. British Journal for the Philosophy of Science, 54, 613–616.
Stampe, D. (1977). Toward a causal theory of linguistic representation. In P. French, H. K. Wettstein, & T. E. Uehling (Eds.), Midwest studies in philosophy (Vol. 2, pp. 42–63). Minneapolis: University of Minnesota Press.
Usher, M. (2001). A statistical referential theory of content: Using information theory to account for misrepresentation. Mind & Language, 16, 311–334.
Acknowledgements
We thank Martha Gibson, Eric Saidel, Alan Sidelle, Larry Shapiro, Tomoji Shogenji, Dennis Stampe, Mike Steel, Marius Usher, and two anonymous referees for helpful feedback.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Some of the claims in the appendices below can be readily verified by using elementary probability theory, but others are more difficult. Some are based on various results in the extant literature; here readers can refer to the cited works for details. Still others were verified or found on Mathematica with Fitelson’s PrSAT (on which see Fitelson 2008). This is true, for example, of (A2.3.1) in Appendix 2.
Appendix 1: DOCRM and T6, T7, and T8
Bayes’s theorem implies that:
These equalities imply:
Given this, it follows that if T7 and T8 are understood in terms of DOCRM, then the right sides of T6, T7, and T8 are all logically equivalent to each other. Hence, if T7 and T8 are understood in terms of DOCRM, then T6, T7, and T8 are all logically equivalent to each other.
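The reasoning can be spelled out as follows. This is a sketch that assumes DOCRM(H, E) = Pr(H | E)/Pr(H) and that all the probabilities conditioned on are positive; it is offered as one way of filling in the step, with Z standing for an arbitrary candidate content. By Bayes’s theorem:

$${\text{DOC}}_{{{\text{RM}}}} (Z,B) = \frac{\Pr (Z|B)}{\Pr (Z)} = \frac{\Pr (B|Z)\Pr (Z)/\Pr (B)}{\Pr (Z)} = \frac{\Pr (B|Z)}{\Pr (B)} = {\text{DOC}}_{{{\text{RM}}}} (B,Z)$$

Since Pr(B) is the same positive number whichever Z is chosen, DOCRM(X, B) > DOCRM(Y, B) holds precisely when DOCRM(B, X) > DOCRM(B, Y) holds, and each of these holds precisely when Pr(B | X) > Pr(B | Y). If the right sides of T7, T8, and T6 are given by these three conditions respectively (an assumption here, since the theories are stated in Sect. 3), the equivalence follows.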
Appendix 2: How T1–T8 fare in terms of the disjunction problem
3.1 T1 and Disjunction
Any probability distribution on which (DISJ2) and (DISJ3) hold is such that Pr(X | B) < 1. It follows that any such distribution is such that T1 outputs the result that b’s content isn’t X. Hence T1 falls prey to the disjunction problem.
3.2 T5 and Disjunction
Any probability distribution on which (DISJ2) and (DISJ3) hold is such that Pr(X | B) < Pr(X ∨ Y | B) and Pr(Y | B) < Pr(X ∨ Y | B). It follows that any such distribution is such that given the assumption that (DISJ1) holds, T5 outputs the result that b’s content is X ∨ Y and isn’t X or Y. Hence T5 falls prey to the disjunction problem.
3.3 T2, T3/T4, T6, T7, T8 and Disjunction
Consider the following probability distribution:
X | Y | B | Pr |
---|---|---|---|
T | T | T | \( 0 \) |
T | T | F | \( 0 \) |
T | F | T | \( \frac{10}{57} \) |
T | F | F | \( 0 \) |
F | T | T | \( \frac{3}{31} \) |
F | T | F | \( \frac{16}{29} \) |
F | F | T | \( \frac{10}{57} \) |
F | F | F | \( \frac{32}{51243} \) |
It follows on this distribution that
Hence both (DISJ2) and (DISJ3) hold. It also follows that:
Given (A2.3.3) and the assumption that (DISJ1) holds, each of T2 and T6 outputs the result that b’s content is X and isn’t X ∨ Y or Y. Given (A2.3.4), (A2.3.5), (A2.3.6), and the assumption that (DISJ1) holds, each of T3/T4, T7, and T8 outputs the result that b’s content is X and isn’t X ∨ Y or Y. Hence each of T2, T3/T4, T6, T7, and T8 can handle the disjunction problem.Footnote 43
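The quantities that drive these verdicts can be recomputed exactly from the table above. The sketch below is a hedged illustration and does not reproduce (A2.3.1) through (A2.3.6) themselves: it reports Pr(B), the likelihoods Pr(B | X), Pr(B | X ∨ Y), and Pr(B | Y), and the values of the difference measure, which is assumed here to be DOCDM(H, E) = Pr(H | E) − Pr(H). The helper names are hypothetical.

```python
from fractions import Fraction as F

# Probability distribution from the table above, keyed by (X, Y, B).
DIST = {
    (True, True, True): F(0),
    (True, True, False): F(0),
    (True, False, True): F(10, 57),
    (True, False, False): F(0),
    (False, True, True): F(3, 31),
    (False, True, False): F(16, 29),
    (False, False, True): F(10, 57),
    (False, False, False): F(32, 51243),
}

def pr(event):
    """Probability of the proposition picked out by event(x, y, b)."""
    return sum(p for (x, y, b), p in DIST.items() if event(x, y, b))

def pr_given(h, e):
    """Conditional probability Pr(h | e); assumes Pr(e) > 0."""
    return pr(lambda x, y, b: h(x, y, b) and e(x, y, b)) / pr(e)

def doc_dm(h, e):
    """Assumed difference measure: posterior minus prior."""
    return pr_given(h, e) - pr(h)

X = lambda x, y, b: x
Y = lambda x, y, b: y
X_OR_Y = lambda x, y, b: x or y
B = lambda x, y, b: b

total = sum(DIST.values())  # exact check that the table is a distribution
likelihoods = {
    "X": pr_given(B, X),
    "X or Y": pr_given(B, X_OR_Y),
    "Y": pr_given(B, Y),
}
confirmations = {
    "X": doc_dm(X, B),
    "X or Y": doc_dm(X_OR_Y, B),
    "Y": doc_dm(Y, B),
}
print("sum:", total, "Pr(B):", pr(B))
for name in likelihoods:
    print(name, likelihoods[name], float(confirmations[name]))
```

On this distribution Pr(B | X) is maximal and exceeds Pr(B), whereas Pr(B | X ∨ Y) and Pr(B | Y) both fall below Pr(B); likewise, the assumed difference measure is positive for X and negative for X ∨ Y and for Y.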
Appendix 3: How T1–T8 fare in terms of the distality problem
4.1 T1 and Distality
Any probability distribution on which (DIST2) and (DIST3) hold is such that Pr(X | B) = 1 only if Pr(Cp | B) = 1. Hence any probability distribution on which (DIST2) and (DIST3) hold is such that T1 outputs the result that b’s content is X only if it also outputs the result that b’s content is Cp. Hence T1 falls prey to the distality problem.
4.2 T2, T6, T8 and Distality
Any probability distribution on which (DIST2) and (DIST3) hold is such that:
(A3.2.3) and (A3.2.4) imply (see Shogenji 2003):
They further imply:
Given (A3.2.5) and (A3.2.6), and given (A3.2.1) and (A3.2.2), it follows that:
This inequality implies:
By similar reasoning, it can be shown that:
Given (A3.2.8), T2 outputs the result that b’s content isn’t X. Given (A3.2.8), (A3.2.10), and the assumption that (DIST1) holds, T6 outputs the result that b’s content is Cp and isn’t X or Cd. Given (A3.2.7), (A3.2.9), and the assumption that (DIST1) holds, T8 outputs the result that b’s content is Cp and isn’t X or Cd. Hence each of T2, T6, and T8 falls prey to the distality problem.
4.3 T3/T4 and Distality
Any probability distribution on which (DIST2) holds is such that:
Given (A3.3.2), T3/T4 outputs the result that b’s content is X. But given (A3.3.1) and (A3.3.3), T3/T4 also outputs both the result that b’s content is Cp and the result that b’s content is Cd. Hence T3/T4 falls prey to the distality problem.
4.4 T5 and Distality
Consider the following probability distribution:
Cd | X | Cp | B | Pr | Cd | X | Cp | B | Pr |
---|---|---|---|---|---|---|---|---|---|
T | T | T | T | \( \frac{72}{565} \) | F | T | T | T | \( \frac{51}{2260} \) |
T | T | T | F | \( \frac{96}{113113} \) | F | T | T | F | \( \frac{17}{113113} \) |
T | T | F | T | \( \frac{1}{113} \) | F | T | F | T | \( \frac{17}{10848} \) |
T | T | F | F | \( \frac{96}{1243} \) | F | T | F | F | \( \frac{17}{1243} \) |
T | F | T | T | \( 0 \) | F | F | T | T | \( 0 \) |
T | F | T | F | \( 0 \) | F | F | T | F | \( 0 \) |
T | F | F | T | \( \frac{1}{67} \) | F | F | F | T | \( \frac{19395421}{313141920} \) |
T | F | F | F | \( \frac{96}{737} \) | F | F | F | F | \( \frac{19395521}{35880845} \) |
It follows on this distribution that:
Hence (DIST2) and (DIST3) both hold. It also follows that:
Given the assumption that (DIST1) holds, T5 outputs the result that b’s content is X and isn’t Cp or Cd. Hence T5 can handle the distality problem.
4.5 T7 and Distality
Any probability distribution on which (DIST2) holds is such that:
These inequalities imply:
Any probability distribution on which (DIST3) holds is such that:
These equalities imply:
(A3.5.3), (A3.5.4), (A3.5.7), and (A3.5.8) together imply (see Roche and Shogenji 2014):
(A3.5.9) and (A3.5.10) together imply (see Roche and Shogenji 2014):
By similar reasoning, it can be shown that:
Given the assumption that (DIST1) holds, T7 outputs the result that b’s content is Cp and isn’t X or Cd. Hence T7 falls prey to the distality problem.
Appendix 4: How hybrid theories fare in terms of the hard problem
5.1 Conjunctive theories based on one or more of T2, T6, T7, and T8
We show in Appendix 3 that any probability distribution on which (DIST2) and (DIST3) hold is such that:
It follows that for any conjunctive theory T based on one or more of T2, T6, T7, and T8, there are no probability distributions such that (i) (DIST2) and (DIST3) hold and (ii) given the assumption that (DIST1) holds, T outputs the result that b’s content is X and isn’t Cp or Cd. Hence no conjunctive theory based on one or more of T2, T6, T7, and T8 can handle the distality problem. Hence no such theory can handle the hard problem.
5.2 Conjunctive theories based on T5
We show in Appendix 2 that any probability distribution on which (DISJ2) and (DISJ3) hold is such that Pr(X | B) < Pr(X ∨ Y | B) and Pr(Y | B) < Pr(X ∨ Y | B). It follows that for any conjunctive theory T based on T5, there are no probability distributions such that (i) (DISJ2) and (DISJ3) hold and (ii) given the assumption that (DISJ1) holds, T outputs the result that b’s content is X and isn’t X ∨ Y or Y. Hence no conjunctive theory based on T5 can handle the disjunction problem. Hence no such theory can handle the hard problem.
5.3 The T1&T3/T4 Conjunctive Theory
There is only one conjunctive theory left to consider: T1&T3/T4. We note in Appendix 2 that any probability distribution on which (DISJ2) and (DISJ3) hold is such that Pr(X | B) < 1. It follows that there are no probability distributions such that (i) (DISJ2) and (DISJ3) hold and (ii) T1&T3/T4 outputs the result that b’s content is X. Hence T1&T3/T4 falls prey to the disjunction problem. Hence it falls prey to the hard problem.
5.4 Disjunctive theories based on one or more of T6, T7, and T8
We show in Appendix 3 that any probability distribution on which (DIST2) and (DIST3) hold is such that:
It follows that for any disjunctive theory T based on one or more of T6, T7, and T8, there are no probability distributions such that (i) (DIST2) and (DIST3) hold and (ii) given the assumption that (DIST1) holds, T outputs the result that b’s content isn’t Cp. Hence no disjunctive theory based on one or more of T6, T7, and T8 can handle the distality problem. Hence no such theory can handle the hard problem.
5.5 Disjunctive theories based on T5
We show in Appendix 2 that any probability distribution on which (DISJ2) and (DISJ3) hold is such that Pr(X | B) < Pr(X ∨ Y | B) and Pr(Y | B) < Pr(X ∨ Y | B). It follows that for any disjunctive theory T based on T5, there are no probability distributions such that (i) (DISJ2) and (DISJ3) hold and (ii) T outputs the result that b’s content isn’t X ∨ Y. Hence no disjunctive theory based on T5 can handle the disjunction problem. Hence no such theory can handle the hard problem.
5.6 Disjunctive theories based on T3/T4
Any probability distribution on which (DIST2) holds is such that Pr(B | Cp) > Pr(B). It follows that for any disjunctive theory T based on T3/T4, there are no probability distributions such that (i) (DIST2) and (DIST3) hold and (ii) T outputs the result that b’s content isn’t Cp. Hence no disjunctive theory based on T3/T4 can handle the distality problem. Hence no such theory can handle the hard problem.
5.7 T1 ∨ T2
There is only one disjunctive theory left to consider: T1 ∨ T2. We note in Appendix 3 that any probability distribution on which (DIST2) and (DIST3) hold is such that:
It follows that there are no probability distributions such that (i) (DIST2) and (DIST3) hold and (ii) T1 ∨ T2 outputs the result that b’s content is X and isn’t Cp. Hence T1 ∨ T2 falls prey to the distality problem. Hence it falls prey to the hard problem.
Appendix 5: How T1N–T8N fare in terms of the hard problem
6.1 T1N and the disjunction problem
We note in Appendix 2 that any probability distribution on which (DISJ2) and (DISJ3) hold is such that Pr(X | B) < 1. It follows that any such distribution is such that T1N outputs the result that b’s content isn’t X. Hence T1N falls prey to the disjunction problem.
6.2 T2N, T3/T4N, T6N, T7N, and T8N and the disjunction problem
We show in Appendix 2 that there are probability distributions on which (DISJ2) and (DISJ3) hold and:
Given (A2.3.3), and given the assumption that (DISJ1) holds, T2N, T6N, and T8N all output the result that b’s content is X. Hence each of T2N, T6N, and T8N can handle the disjunction problem. Given (A2.3.4), (A2.3.5), and (A2.3.6), and given the assumption that (DISJ1) holds, T3/T4N and T7N both output the result that b’s content is X. Hence T3/T4N and T7N can handle the disjunction problem.
6.3 T5N and the disjunction problem
We note in Appendix 2 that when (DISJ2) and (DISJ3) hold, it follows that Pr(X | B) < Pr(X ∨ Y | B). Hence when (DISJ2) and (DISJ3) hold, T5N rules out X as b’s content. Hence T5N falls prey to the disjunction problem.
6.4 T1N and the distality problem
We note in Appendix 3 that when (DIST2) and (DIST3) hold, it follows that Pr(X | B) = 1 only if Pr(Cp | B) = 1. Hence when (DIST2) and (DIST3) hold, T1N rules out Cp as b’s content only if it also rules out X as b’s content. Hence T1N falls prey to the distality problem.
6.5 T2N, T3/T4N, T6N, T7N, and T8N and the distality problem
We show in Appendix 3 that when (DIST2) and (DIST3) hold, it follows that:
(A3.2.8) implies that Pr(B | X) isn’t equal to 1, that Pr(B | X) isn’t greater than Pr(B | Cp), and that Pr(B | X) − Pr(B) isn’t greater than Pr(B | Cp) − Pr(B). Hence when (DIST2) and (DIST3) hold, T2N, T6N, and T8N all rule out X as b’s content. Hence each of T2N, T6N, and T8N falls prey to the distality problem. Given that (A3.3.1) holds when (DIST2) and (DIST3) hold, it follows that when (DIST2) and (DIST3) hold, T3/T4N doesn’t rule out Cp as b’s content. Hence T3/T4N falls prey to the distality problem. Given that (A3.5.11) holds when (DIST2) and (DIST3) hold, it follows that when (DIST2) and (DIST3) hold, T7N rules out X as b’s content. Hence T7N falls prey to the distality problem.
6.6 T5N and the distality problem
We show in Appendix 3 that there are probability distributions on which (DIST2) and (DIST3) hold and:
Given (A3.4.15), T5N rules out each of Cp and Cd as b’s content. Given this, and given the assumption that (DIST1) holds, it follows by T5N that b’s content is X. Hence T5N can handle the distality problem.
Appendix 6: How T9 fares in terms of the hard problem
First, note that:
We show in Appendix 3 that any probability distribution on which (DIST2) and (DIST3) hold is such that:
These inequalities imply (see Roche and Shogenji 2014):
It follows that any probability distribution on which (DIST2) and (DIST3) hold is such that:
By similar reasoning, it can be shown that any probability distribution on which (DIST2) and (DIST3) hold is such that:
It follows that any probability distribution on which (DIST2) and (DIST3) hold is such that given the assumption that (DIST1) holds, T9 outputs the result that b’s content is Cp and isn’t X or Cd. Hence T9 falls prey to the distality problem.Footnote 44 Hence it falls prey to the hard problem.
Appendix 7: How T10 fares with respect to the hard problem
We noted in Appendix 3 that any probability distribution on which (DIST3) holds is such that:
It follows from these equalities that:
This means that each member of Γ3 = {Cp, ~ Cp} screens-off each member of Γ2 = {X, ~ X} from each member of Γ1 = {B, ~ B}. Given this, it follows by the so-called “Data Processing Inequality” (Cover and Thomas 2006, Ch. 2) that:
But then there are no probability distributions such that (i) (DIST2) and (DIST3) hold and (ii) given the assumption that (DIST1) holds, T10 outputs the result that b’s content is X and isn’t Cp or Cd. Hence T10 falls prey to the distality problem. Hence T10 falls prey to the hard problem.
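The appeal to the Data Processing Inequality can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the parameters are hypothetical, the joint distribution is built as a chain from X to Cp to B so that Cp screens off X from B by construction, and the inequality is taken in its standard form, that the mutual information between X and B cannot exceed the mutual information between Cp and B. It does not encode (DIST2) or the definition of T10.

```python
from math import log2

# Hypothetical parameters (illustrative assumptions, not taken from the paper).
P_X = 0.3                                # Pr(X)
P_CP_GIVEN_X = {True: 0.9, False: 0.2}   # Pr(Cp | X), Pr(Cp | ~X)
P_B_GIVEN_CP = {True: 0.8, False: 0.1}   # Pr(B | Cp), Pr(B | ~Cp)


def bern(p, value):
    """Probability that a binary variable with Pr(True) = p takes `value`."""
    return p if value else 1.0 - p


# Joint over (x, cp, b), factorised as Pr(x) * Pr(cp | x) * Pr(b | cp).
# B depends on X only through Cp, so Cp screens off X from B.
joint = {
    (x, cp, b): bern(P_X, x) * bern(P_CP_GIVEN_X[x], cp) * bern(P_B_GIVEN_CP[cp], b)
    for x in (True, False)
    for cp in (True, False)
    for b in (True, False)
}


def marginal(indices):
    """Marginal distribution over the variables at the given positions."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in indices)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(i, j):
    """Mutual information (in bits) between the variables at positions i and j."""
    pij, pi, pj = marginal((i, j)), marginal((i,)), marginal((j,))
    return sum(
        p * log2(p / (pi[(a,)] * pj[(c,)]))
        for (a, c), p in pij.items()
        if p > 0
    )


def pr_b_given(x, cp):
    """Pr(B | X = x, Cp = cp), computed from the joint distribution."""
    return joint[(x, cp, True)] / (joint[(x, cp, True)] + joint[(x, cp, False)])


mi_x_b = mutual_information(0, 2)    # information X carries about B
mi_cp_b = mutual_information(1, 2)   # information Cp carries about B

print(f"I(X;B) = {mi_x_b:.4f} bits, I(Cp;B) = {mi_cp_b:.4f} bits")
```

With these assumed parameters the more proximal variable Cp carries at least as much information about B as the more distal X does, which is the pattern the argument above relies on.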
Appendix 8: How T11 fares with respect to the hard problem
The probability distribution given in Appendix 2 is such that (DISJ2) holds, (DISJ3) holds, and:
The probability distribution given in Appendix 3 is such that (DIST2) holds, (DIST3) holds, and:
Given (A8.1), (A8.2), and the assumption that (DISJ1) holds, T11 outputs the result that b’s content is X and isn’t X ∨ Y or Y. Hence T11 can handle the disjunction problem. Given (A8.3), (A8.4), and the assumption that (DIST1) holds, T11 outputs the result that b’s content is X and isn’t Cp or Cd. Hence T11 can handle the distality problem. Hence T11 can handle the hard problem.
Appendix 9: How T11 fares with respect to the disjunction* problem
First, note that:
Any probability distribution on which (DISJ2), (DISJ3), and (DISJ4) hold is such that DOCDM*(X, B) > 0, DOCDM*(Y, B) > 0, and:
It follows that any probability distribution on which (DISJ2), (DISJ3), and (DISJ4) hold is such that given the assumption that (DISJ1) holds, T11 outputs the result that b’s content is X ∨ Y and isn’t X or Y. Hence T11 falls prey to the disjunction* problem.
Cite this article
Roche, W., Sober, E. Disjunction and distality: the hard problem for purely probabilistic causal theories of mental content. Synthese 198, 7197–7230 (2021). https://doi.org/10.1007/s11229-019-02516-y