1 Introduction

Loosely speaking, coherence is the property of propositions hanging or fitting together, dovetailing with or mutually supporting each other (cf. BonJour 1985; Olsson 2005). It is the key concept of any coherentist theory of justification or truth. Nevertheless, as many authors have pointed out, coherentists have placed little emphasis on elucidating their central concept—or in Nicholas Rescher’s words, “the coherence theorists themselves have not always been too successful in explicating the nature of coherence” (Rescher 1973, p. 33). In order to overcome this supposed shortcoming, various philosophers have attempted to provide a mathematically precise explication of the concept of coherence using probability theory. The results are so-called probabilistic coherence measures (cf. Douven and Meijs 2007; Fitelson 2003, 2004; Glass 2002; Meijs 2006; Olsson 2002; Roche 2013; Schippers 2014; Schupbach 2011; Shogenji 1999). Of course, these measures have to be examined with respect to their claim of measuring coherence adequately. So far, there have been two common ways to do this: (i) formulating adequacy constraints and proving whether or not they are satisfied by a measure, and (ii) developing paradigmatic test cases that provide an intuitive normative coherence assessment and testing whether a measure is in line with this assessment.

This paper concentrates on the second approach, although in a slightly different way. Rather than using test cases and the provided normative coherence assessment as a benchmark for probabilistic coherence measures, the test cases are used as vignettes for psychological experiments, in which participants are asked for subjective coherence assessments of specified sets of propositions. Accordingly, the results of the experiments can be used to (i) evaluate the normative coherence assessments provided by the test cases and to (ii) evaluate the suitability of the tested measures as predictors of the participants’ coherence assessments. The paper is structured as follows. In Sect. 2 a collection of probabilistic coherence measures that have been proposed in the literature is introduced. In Sects. 3 and 4 the psychological study including methods and results is described. Finally, Sect. 5 discusses which conclusions can be drawn from the results.

2 Probabilistic measures of coherence

The notion of a probabilistic coherence measure can be introduced formally in a straightforward manner. Let L be a classical propositional language consisting of atomic formulas closed under some functionally complete selection of classical logical connectives such as e.g. \(\{\lnot ,\wedge \}\) and let \(P:L\rightarrow [0,1]\) be a probability function over L with conditional probability defined by \(P(x_1|x_2)=P(x_1\wedge x_2)/P(x_2)\) for any \(x_2\in L\) with \(P(x_2)\ne 0\). Furthermore, let \(2^L_{\ge 2}\) denote the set of all non-empty, non-singleton subsets of L and \(\mathbf {P}\) the set of all probability functions over L. A probabilistic coherence measure can then be defined as a partial function \(C:2^L_{\ge 2}\times \mathbf {P}\rightarrow \mathbb {R}\) assigning real numbers to sets of propositions under some joint probability distribution. By contrast, a probabilistic measure of support, on which a coherence measure can be based, is a partial function \(S:L\times L\times \mathbf {P}\rightarrow \mathbb {R}\) assigning real numbers to pairs of propositions under some probability distribution, where the first argument is commonly interpreted as a hypothesis and the second as a piece of evidence. Notice that we will omit reference to a particular probability function as a separate function argument of coherence or support measures.

Still, these are only very general requirements a probabilistic coherence measure should meet. The question which probabilistic information should be taken into account by a probabilistic coherence measure in order to adequately quantify the degree of coherence has been answered in different ways leading to different kinds of measures. They can be categorized into three groups: (i) measures that quantify coherence in terms of deviation from probabilistic independence (see Sect. 2.1), (ii) in terms of relative set-theoretic overlap (see Sect. 2.2) and (iii) in terms of average mutual support (see Sect. 2.3). In the following we briefly introduce the approaches and the resulting measures.

2.1 Deviation from independence measures

According to standard textbooks on probability theory (cf. e.g. Kolmogorov 1956), a set X of propositions \(x_1,\ldots ,x_n\) is said to be n-wise negatively dependent iff \(P(x_1\wedge \ldots \wedge \,x_n)<P(x_1)\times \ldots \times P(x_n)\), independent iff \(P(x_1\wedge \ldots \wedge x_n)=P(x_1)\times \ldots \times P(x_n)\) and positively dependent iff \(P(x_1\wedge \ldots \wedge x_n)>P(x_1)\times \ldots \times P(x_n)\). This definition can be rearranged by dividing the term on the left-hand side by the term on the right-hand side. Positive dependence is then defined as a value in the open interval \((1,\infty )\), independence as a value of 1 and negative dependence as a value in the half-open interval [0, 1). This can be considered the basic idea underlying Shogenji’s (1999) coherence measure. According to Shogenji, the degree of coherence of a finite set X of propositions can be computed by dividing the joint probability of X’s propositions by the product over their marginal probabilities. This quantifies the propositions’ ratio-wise deviation from their independence threshold value \(\theta =1\). This value is interpreted as neutrality such that values below \(\theta \) indicate degrees of incoherence and values above \(\theta \) indicate degrees of coherence:

$$\begin{aligned} {C}_{sho}(X)=\frac{P\left( \bigwedge _{x_i\in X}x_i\right) }{\prod _{x_i\in X}P(x_i)} \end{aligned}$$
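To make the computation concrete, here is a minimal Python sketch of \(C_{sho}\), representing propositions as sets of possible worlds; the distribution and the two propositions below are illustrative assumptions, not taken from any of the cited test cases.

```python
from functools import reduce

# Toy probability space (an assumed distribution): worlds are truth
# assignments to two atomic propositions.
P = {("T", "T"): 0.4, ("T", "F"): 0.1, ("F", "T"): 0.1, ("F", "F"): 0.4}

def prob(event):
    """Probability of an event, represented as a set of worlds."""
    return sum(P[w] for w in event)

def c_sho(props):
    """Shogenji's measure: joint probability divided by the
    product of the marginal probabilities."""
    joint = prob(set.intersection(*props))
    marginals = reduce(lambda a, b: a * b, (prob(x) for x in props))
    return joint / marginals

x1 = {w for w in P if w[0] == "T"}  # first atom is true
x2 = {w for w in P if w[1] == "T"}  # second atom is true
print(c_sho([x1, x2]))  # 0.4 / (0.5 * 0.5) = 1.6, above the threshold 1
```

Under the distribution assumed here the two propositions are positively dependent, so the value exceeds the neutrality threshold \(\theta =1\).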

As Fitelson (2003) and Schupbach (2011) have pointed out, Shogenji’s measure suffers from a lack of subset sensitivity when applied to a set of more than two propositions. This is due to the fact that for any set X consisting of n propositions there are probability distributions such that some subsets of X are i-wise negatively dependent, independent or positively dependent but not j-wise negatively dependent, independent or positively dependent for \(i\ne j\) where \(i,j\le n\) (cf. Pfeiffer 1990). Therefore, Schupbach (2011) has suggested the following alternative generalization: to assess the degree of coherence of X, apply a log-normalized version of \(C_{sho}\) to each set \(X'_{ij}\), i.e. the i-th subset of X containing \(j\ge 2\) propositions. Divide each resulting coherence value by the number of subsets with j members, sum up the resulting values and divide this sum by X’s cardinality minus one, ignoring singleton sets:

$$\begin{aligned} {{C}_{sch}(X)=\dfrac{\sum \nolimits _{j=2}^{n}\sum \nolimits _{i=1}^{\left( {\begin{array}{c}n\\ j\end{array}}\right) } \log \left( {C}_{sho}(X'_{ij})\right) \times \left( {\begin{array}{c}n\\ j\end{array}}\right) ^{-1}}{n-1}} \end{aligned}$$

Although the measure is more fine-grained, it is still based on the idea of measuring coherence in terms of deviation from independence. However, due to the log-normalization the threshold value of Schupbach’s measure for neutrality is \(\theta =0\), such that values in \((-\infty ,0)\) indicate degrees of incoherence and values in \((0,\infty )\) indicate degrees of coherence.
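Schupbach’s generalization can also be sketched in Python, again representing propositions as sets of possible worlds; the uniform three-atom distribution is an assumption chosen so that the expected value (full independence, hence the neutral value 0) is easy to check by hand.

```python
import math
from functools import reduce
from itertools import combinations, product

# Uniform (hence fully independent) distribution over the eight truth
# assignments to three atoms -- an illustrative assumption.
P = {w: 1 / 8 for w in product("TF", repeat=3)}

def prob(event):
    return sum(P[w] for w in event)

def c_sho(props):
    joint = prob(set.intersection(*props))
    marginals = reduce(lambda a, b: a * b, (prob(x) for x in props))
    return joint / marginals

def c_sch(props):
    """Schupbach's measure: for each subset size j >= 2, average the
    log-normalized C_sho over all subsets of that size, then average
    the per-size means by dividing by n - 1."""
    n = len(props)
    total = 0.0
    for j in range(2, n + 1):
        subs = list(combinations(props, j))
        total += sum(math.log(c_sho(list(s))) for s in subs) / len(subs)
    return total / (n - 1)

atoms = [{w for w in P if w[i] == "T"} for i in range(3)]
print(c_sch(atoms))  # 0.0: independence yields the neutral value
```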

2.2 Relative overlap measures

Glass (2002) and Olsson (2002) have proposed a different measure of coherence that is based on a set-theoretically inspired understanding of coherence. Here, the joint probability over all propositions \(x_1,\ldots ,x_n\) in some set X is interpreted as their overlapping set-theoretic surface. Likewise, the probability that any of these propositions is true is interpreted as their total set-theoretic surface. In order to compute the degree of coherence of X, Glass and Olsson suggest simply dividing the probability of the conjunction by the probability of the disjunction over X’s members. Set-theoretically speaking, this can be understood as quantifying the propositions’ relative overlap:

$$\begin{aligned} {C}_{go}(X)=\frac{P\left( \bigwedge _{x_i\in X}^{}x_i\right) }{P\left( \bigvee _{x_i\in X}x_i\right) } \end{aligned}$$

It is easy to see that the measure has the codomain [0, 1] where 0 means no overlap at all and 1 means identity of overlap and total surface of the propositions. But unlike the two measures mentioned before, the threshold \(\theta \) cannot be based on probabilistic independence. One can, however, argue that the threshold is .5 in the case of two propositions \(\{x_1,x_2\}\), since values below this threshold would indicate that \(x_1\) coheres better with \(\lnot x_2\) than with \(x_2\). In any case, Bovens and Hartmann (2003) have shown that this measure has problems with respect to subset-sensitivity similar to those of Shogenji’s measure. In order to overcome these difficulties Meijs (2006) has suggested the following alternative: in order to assess the coherence of X, take the straight average over all \(C_{go}\) values applied to every subset \(X'_i\) of X with \(|X'_i|\ge 2\):

$$\begin{aligned} {C}_{mei}(X)=\frac{\sum _{i=1}^{(2^{n}-n)-1}{C}_{go}(X'_i)}{(2^{n}-n)-1} \end{aligned}$$

This measure is obviously more fine-grained but it is easy to see that the codomain [0, 1] and the threshold \(\theta \) remain the same.
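Both relative-overlap measures admit a short Python sketch over sets of possible worlds; the distribution below is an assumed toy example.

```python
from itertools import combinations

# Assumed toy distribution over truth assignments to two atoms.
P = {("T", "T"): 0.3, ("T", "F"): 0.2, ("F", "T"): 0.2, ("F", "F"): 0.3}

def prob(event):
    return sum(P[w] for w in event)

def c_go(props):
    """Glass-Olsson measure: P(conjunction) / P(disjunction)."""
    return prob(set.intersection(*props)) / prob(set.union(*props))

def c_mei(props):
    """Meijs' measure: straight average of C_go over all subsets
    with at least two members."""
    subs = [list(s) for j in range(2, len(props) + 1)
            for s in combinations(props, j)]
    return sum(c_go(s) for s in subs) / len(subs)

x1 = {w for w in P if w[0] == "T"}
x2 = {w for w in P if w[1] == "T"}
print(c_go([x1, x2]))  # 0.3 / 0.7, roughly 0.43, below the threshold 0.5
```

For a two-element set \(C_{mei}\) coincides with \(C_{go}\), since the only subset with at least two members is the set itself.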

2.3 Average mutual support measures

A whole family of coherence measures can be obtained using an approach systematically developed by Douven and Meijs (2007). According to their approach, coherence is to be understood as average mutual support. And since there is a variety of probabilistic measures of support (for overviews cf. Crupi et al. 2007; Festa 2012), one can easily obtain a huge collection of candidates for coherence measures based on them. The basic idea runs as follows: to assess the coherence of X, consider all pairs \((X',X'')_i\) where \(X'\) and \(X''\) are non-empty, disjoint subsets of X. For each pair, take the conjunction over the propositions contained in each of the two sets, apply some chosen probabilistic support measure S, and average the resulting degrees of support:

$$\begin{aligned} {C_{S}}(X)=\dfrac{\sum ^{{(3^{n}-2^{n+1})+1}}_{i=1}{S}\left( \bigwedge _{x_j\in X'} x_j,\bigwedge _{x_k\in X''} x_k\right) _i}{(3^n-2^{n+1})+1} \end{aligned}$$
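The scheme can be sketched generically in Python: enumerate all ordered pairs of non-empty, disjoint subsets, conjoin the propositions in each subset, apply S and average. Propositions are represented as sets of possible worlds; the distribution and the choice of the difference measure \(P(x_1|x_2)-P(x_1)\) as an example S are illustrative assumptions.

```python
from itertools import combinations

# Assumed toy distribution over truth assignments to two atoms.
P = {("T", "T"): 0.4, ("T", "F"): 0.1, ("F", "T"): 0.1, ("F", "F"): 0.4}

def prob(event):
    return sum(P[w] for w in event)

def s_diff(h, e):
    """Difference measure P(h|e) - P(h), one example support measure."""
    return prob(h & e) / prob(e) - prob(h)

def c_support(props, S):
    """Douven-Meijs scheme: average S over all ordered pairs of
    non-empty, disjoint subsets, each subset conjoined."""
    idx = range(len(props))
    subsets = [set(c) for j in range(1, len(props) + 1)
               for c in combinations(idx, j)]
    vals = [S(set.intersection(*[props[i] for i in a]),
              set.intersection(*[props[i] for i in b]))
            for a in subsets for b in subsets if not a & b]
    return sum(vals) / len(vals)  # there are 3^n - 2^(n+1) + 1 such pairs

x1 = {w for w in P if w[0] == "T"}
x2 = {w for w in P if w[1] == "T"}
print(c_support([x1, x2], s_diff))  # roughly 0.3
```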

For his coherence measure Fitelson (2004) has chosen a case-sensitive variation of Kemeny and Oppenheim’s (1952) measure of factual support. The values of the resulting coherence measure are in \([-1,1]\) with \(\theta =0\):

$$\begin{aligned} {S}_{fit}(x_1,x_2)={\left\{ \begin{array}{ll} \frac{P(x_2|x_1)-P(x_2|\lnot {x_1})}{P(x_2|x_1)+P(x_2|\lnot {x_1})} &{}\,\,\, {\text {if}}\;x_2\nvdash x_1\,{\text {and}}\,x_2\nvdash \lnot x_1\\ 1 &{}\,\,\, {\text {if}}\;x_2\vdash x_1\;{\text {and}}\;x_2\nvdash \bot \\ -1 &{}\,\,\, {\text {if}}\;x_2\vdash \lnot x_1 \end{array}\right. } \end{aligned}$$

Douven and Meijs (2007) have investigated three further support measures as foundations for probabilistic coherence measures, namely Carnap’s (1950) difference measure with codomain \([-1,1)\) and \(\theta =0\), Keynes’ (1921) relevance quotient and Good’s (1984) likelihood ratio measure, both with codomain \([0,\infty )\) and \(\theta =1\). Notice that due to commutativity or ordinal equivalence (up to identity) one would obtain identical coherence measures if one used Levi’s (1962) corroboration measure or Mortimer’s (1988) confirmation measure instead of Carnap’s difference measure, Kuipers’ (2000) confirmation measure or Finch’s (1960) confirmation measure +1 instead of Keynes’, and finally Joyce’s (2008) odds-ratio measure instead of Good’s likelihood-ratio measure. Douven and Meijs’ favourite is the coherence measure based on Carnap’s difference measure \({S}_{car}(x_1,x_2)=P(x_1|x_2)-P(x_1)\).

Siebel and Wolff (2008) have extended the collection of candidate measures by taking into account Carnap’s (1950) relevance measure with codomain \((-1,1)\) and \(\theta =0\), Nozick’s (1981) counterfactual likelihood difference measure, Popper’s (1954) corroboration measure and Rescher’s (1958) measure of evidential support, all three with values in \([-1,1]\) with \(\theta =0\):

$$\begin{aligned} \begin{array}{c} \displaystyle {S}_{car'}(x_1,x_2)=P(x_1\wedge x_2)- P(x_1)\cdot P(x_2)\\ \displaystyle {S}_{noz}(x_1,x_2)=P(x_2|x_1)-P(x_2|\lnot {x_1})\\ \displaystyle {S}_{pop}(x_1,x_2)=\frac{P(x_2|x_1)-P(x_2)}{P(x_2|x_1)+P(x_2)}\cdot \left( 1+P(x_1)\cdot P(x_1|x_2)\right) \\ \displaystyle {S}_{res}(x_1,x_2)=\frac{P(x_1|x_2)-P(x_1)}{1-P(x_1)}\cdot P(x_2)\\ \end{array} \end{aligned}$$

A more recent proposal is due to Roche (2013). His favourite coherence measure is based on Douven and Meijs’ approach and a case-sensitive version of absolute—as opposed to incremental—support, namely the conditional probability. The codomain of the resulting measure is obviously [0, 1], and just as for the Glass-Olsson measure \(\theta =0.5\), although the interpretation of this value differs from that of the \(\theta \) values of the other measures. Here, values above \(\theta \) mean that some proposition is supported to a stronger degree than its negation, while values below indicate the opposite.

$$\begin{aligned} {S}_{roc}(x_1,x_2)={\left\{ \begin{array}{ll} P(x_1|x_2) &{}\,\,\, {\text {if}}\;x_2\nvdash x_1\,{\text {and}}\,x_2\nvdash \lnot x_1 \\ 1 &{}\,\,\, {\text {if}}\;x_2\vdash x_1\;{\text {and}}\;x_2\nvdash \bot \\ 0 &{}\,\,\, {\text {if}}\;x_2\vdash \lnot x_1 \end{array}\right. } \end{aligned}$$
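Plugging \(S_{roc}\) into the Douven-Meijs scheme gives a runnable sketch of Roche’s measure. In the finite worlds model assumed here, propositions are sets of possible worlds, so entailment is modeled as set inclusion between events; the distribution is again an illustrative assumption.

```python
from itertools import combinations

# Assumed toy distribution; propositions are sets of worlds, so
# "x2 entails x1" is modeled as the inclusion e <= h between events.
P = {("T", "T"): 0.4, ("T", "F"): 0.1, ("F", "T"): 0.1, ("F", "F"): 0.4}

def prob(event):
    return sum(P[w] for w in event)

def s_roc(h, e):
    """Roche's case-sensitive absolute support: the conditional
    probability, with the entailment cases fixed at 1 and 0."""
    if e and e <= h:      # e entails h and e is consistent
        return 1.0
    if not (e & h):       # e entails not-h
        return 0.0
    return prob(h & e) / prob(e)

def c_roc(props):
    """Roche's coherence measure via the Douven-Meijs scheme."""
    idx = range(len(props))
    subsets = [set(c) for j in range(1, len(props) + 1)
               for c in combinations(idx, j)]
    vals = [s_roc(set.intersection(*[props[i] for i in a]),
                  set.intersection(*[props[i] for i in b]))
            for a in subsets for b in subsets if not a & b]
    return sum(vals) / len(vals)

x1 = {w for w in P if w[0] == "T"}
x2 = {w for w in P if w[1] == "T"}
print(c_roc([x1, x2]))  # 0.8, well above the neutrality threshold 0.5
```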

Another recent coherence measure has been developed by Schippers (2014) and is based on his own measure of support. The values of this measure are in \([-1,1]\) with \(\theta =0\). Notice that one can obtain the very same coherence measure by using the so-called power PC measure by Cheng (1997).

$$\begin{aligned} {S}_{sch}(x_1,x_2)={\left\{ \begin{array}{ll} \frac{P(x_1|x_2)-P(x_1|\lnot {x_2})}{1-P(x_1|\lnot {x_2})} &{}\,\,\, {\text {if}}\; P(x_1|x_2)\ge P(x_1)\\ \frac{P(x_1|x_2)-P(x_1|\lnot {x_2})}{P(x_1|\lnot {x_2})} &{} \,\,\,{\text {if}}\; P(x_1|x_2)<P(x_1) \end{array}\right. } \end{aligned}$$

Finally, Koscholke (2015) has added four further candidate measures to the investigation, namely Crupi’s (2007) z-measure with values in \([-1,1]\) and \(\theta =0\), Gaifman’s (1979) measure with values in \([0,\infty )\) and \(\theta =1\), and Rips’ (2001) measure and Shogenji’s (2012) justification measure—which according to him is also a measure of evidential support—both with values in \((-\infty ,1]\) and \(\theta =0\):

$$\begin{aligned}&{S}_{cru}(x_1,x_2)={\left\{ \begin{array}{ll} \frac{P(x_1|x_2)-P(x_1)}{1-P(x_1)} &{}\,\,\, {\text {if}}\; P(x_1|x_2)\ge P(x_1)\\ \frac{P(x_1|x_2)-P(x_1)}{P(x_1)} &{}\,\,\, {\text {if}}\; P(x_1|x_2)<P(x_1) \end{array}\right. }\\&{S}_{gai}(x_1,x_2)=\dfrac{P(\lnot {x_1})}{P(\lnot {x_1}|x_2)}\\&{S}_{rip}(x_1,x_2)=1-\dfrac{P(\lnot {x_2}|x_1)}{P(\lnot {x_2})}\\&S_{sho}(x_1,x_2)=\frac{\log _2 P(x_1|x_2)-\log _2 P(x_1)}{-\log _2 P(x_1)} \end{aligned}$$

Notice that the codomains and the \(\theta \) values of the confirmation measures carry over to the coherence measures that are based on the respective measures. It is also worth noticing that although many of the presented confirmation measures have ordinally equivalent versions, the resulting coherence measures are not necessarily ordinally equivalent. Having introduced the candidate measures, we may now turn to the experiments.

3 Methods

For the experiments a collection of test cases from Koscholke (2015) has been employed as vignettes. These test cases include Akiba’s dice case (cf. Akiba 2000), Bovens and Hartmann’s Tweety and their Tokyo murder cases (cf. Bovens and Hartmann 2003), Glass’ dodecahedron case (cf. Glass 2005), Meijs’ samurai and his rabbit case (cf. Meijs 2005), Meijs and Douven’s plane lottery case (cf. Meijs and Douven 2007), Schupbach’s robber case (cf. Schupbach 2011), Siebel’s pickpocketing case (cf. Siebel 2004) and Siebel and Schippers’ inconsistent testimony case (cf. Schippers and Siebel 2014). An overview of the employed test cases is given in Appendix 1, the test case results for each measure in Appendix 2.

Notice that Harris and Hahn (2009) have conducted a study very similar to the one presented here. However, they only investigated the empirical adequacy of Bovens and Hartmann’s (2003) coherence quasi-ordering and only for a modified version of their Tokyo murder case. The present study can therefore be understood as an extension of Harris and Hahn’s project with respect to both coherence measures and test cases.

3.1 Participants

57 participants (36 female, mean age = 25.8) were recruited from the Decision Lab Subject Pool of the University of Göttingen using the online recruiting tool ORSEE (cf. Greiner 2004). Participants received a show-up fee of 7 Euros (approx. USD 9.50) or course-credit.

3.2 Procedure and materials

The participants answered three questionnaires online no later than twelve hours before they arrived for the main study in the lab. The questionnaires comprised a translation of the brief form of the preference for consistency scale (cf. Cialdini et al. 1995) consisting of nine items, the numeracy scale (cf. Weller et al. 2013) consisting of fourteen items, and the cognitive reflection test (cf. Frederick 2005) consisting of three items. In the lab participants were presented with the ten test cases in random order. Except for Bovens and Hartmann’s (2003) Tokyo murder case and Siebel’s (2004) pickpocketing case, each test case consists of two sets of propositions. Participants were first asked to indicate in which of the two sets the propositions fit together better or whether they fit together equally well. Then participants were asked to use a continuous slider ranging from \(-\)100 to 100 to indicate the degree to which the propositions of each set fit together. In Bovens and Hartmann’s Tokyo murder case participants were asked to rank-order the five sets of propositions according to how well the propositions fit together. Here, they also had to rate the degree of coherence of each set of propositions using the slider. For Siebel’s (2004) pickpocketing case the participants were asked whether the propositions fit together or not. Then again the participants had to use the slider to evaluate how well the propositions fit together. Finally, participants were asked to provide demographic data and received a written debriefing.

3.3 Assessment of predictive accuracy

We assessed three variables to evaluate how well the coherence measures predict participants’ coherence assessments. We recorded whether participants chose the first or the second set of propositions as more coherent or judged the sets to be equally coherent. The first variable, choices (see Sect. 4.1), is the agreement between participants’ choices and the coherence assessments of each measure. For Bovens and Hartmann’s Tokyo murder case we recorded the coherence ranking participants gave to the five sets of propositions. The second variable, ranking (see Sect. 4.2), is the percentage of participants who ranked propositions according to the rankings given by the measures. We also recorded the continuous coherence judgments participants gave for each set of propositions in each test case. The third variable, judgments (see Sect. 4.3), is the fit between the observed judgments and coherence predictions as assessed in a mixed-linear-regression model for each measure, as explained in more detail below.

4 Results

4.1 Choices

Most measures can predict participants’ choices better than chance, i.e. 33 % for three choice options. Correctly predicted choices range from 31 to 60 % between the measures. The three best measures—\(C_{go}, C_{mei}\) and \(C_{S_{roc}}\)—perform equally well, with around 59 to 60 % of correctly predicted choices (see Fig. 1).

Fig. 1 Correctly predicted choices

4.2 Rankings

Only six participants (i.e. 11 %) rank-ordered the five pairs of propositions in Bovens and Hartmann’s Tokyo murder test case in the way predicted by 44 % of all measures, i.e. rank-order 1, 5, 4, 3, 2. The largest group of 19 participants (33 %) used a similar ranking differing only in the order of the final two pairs of propositions, i.e. rank-order 1, 5, 4, 2, 3 instead of 1, 5, 4, 3, 2. Thus, if we allow for one error in the ranking of the final two pairs, 44 % of the measures predict 44 % of participants correctly. Furthermore, allowing a switch in the second and third ranking (i.e. rank-order 1, 4, 5, 2, 3 or 1, 4, 5, 3, 2), the remaining 56 % of measures predict another 14 % of all participants’ rankings. Overall, 68 % of participants behave (although not perfectly) in line with at least one of the measures. This also means that a considerable percentage of participants (i.e. 32 %) do not rank-order the pairs of propositions in accordance with any measure. The three best measures for predicting choices—\(C_{go}, C_{mei}\) and \(C_{S_{roc}}\)—also predict the ranking that the largest group of participants gave quite well (see Fig. 2).

Fig. 2 Correctly predicted rankings

4.3 Judgments

The investigated coherence measures differ in their assessment of coherence in the various test cases, which results in a unique profile for each measure (corresponding to the rows of Table 2 in Appendix 2). We used these profiles as predictors for participants’ continuous coherence judgments to test how well the measures can account for the judgments. Conceptually, for each measure j we fitted the profile to the coherence judgments of all participants. To account for the different scaling of predictions and judgments (the latter ranging from \(-\)100 to 100 in the study), a scaling factor \(b_{ij}\) was estimated from the data as a regression weight to expand the profile. To account for differences in the extent of scaling for each participant i, and to also account statistically for repeated ratings from the same participants, \(b_{ij}\) consists of the sum of a value shared by all participants and an individual value estimated from the judgments of all participants and of participant i. We further accounted for the direction of the predictions by subtracting the neutrality value from all values for each measure, restricting predictions above the neutrality value to judgments above 0, predictions below the neutrality value to judgments below 0, and predictions identical to the neutrality value to zero-judgments. Technically, this can be achieved by including the prediction profile of a measure, after subtracting the neutrality value, in a mixed-linear-regression without an intercept, as a fixed effect plus a random effect for each participant.

In order to compare the measures we used the Bayesian Information Criterion (BIC) (cf. Schwarz 1978) from each regression model for each measure as an indicator of how well a measure can account for the participants’ continuous coherence judgments (see Fig. 3). The results from the analysis, using a software package for linear and nonlinear mixed effects models (cf. Pinheiro et al. 2013) in R (cf. R Core Team 2015), show that the measure \(C_{S_{roc}}\) can account for participants’ judgments best. The evidence from the data for \(C_{S_{roc}}\) is extreme, with a Bayes factor of \(2\times 10^{11}\) between \(C_{S_{roc}}\) and the next best fitting measure (cf. Jeffreys 1961; Wagenmakers 2007). Overall, predictions based on \(C_{S_{roc}}\) can describe participants’ judgment ratings very well (see Fig. 4).

Fig. 3 Bayesian information criterion \(\varDelta \text {BIC}=\text {BIC}_{j}-\text {BIC}_{C_{S_{roc}}}\) (i.e. difference of BIC between each measure and the best fitting measure by Roche 2013). Notice that \(C_{sch}, C_{S_{goo}}, C_{S_{gai}}\) and \(C_{S_{sho}}\) do not provide a BIC score since they are undefined for some test cases

Fig. 4 Scatterplot of mean observed coherence judgments in the test cases and predicted judgments according to the best-fitting measure \(C_{S_{roc}}\) (cf. Roche 2013). Note that dotted circles around means indicate 95 % confidence intervals. Pearson correlation between observed and predicted means is \(r = .84\) (\(t(18) = 6.7, p < .001\))

4.4 Ability and personality as predictors of coherence-judgments

We also analyzed the relation between individual scaling factors \(b_{i}\) for the measure \(C_{S_{roc}}\) and the participants’ ability to process numbers on the one hand and their personality on the other hand. The numeracy scale measures “the ability to understand, manipulate, and use numerical information, including probabilities” (Weller et al. 2013, p. 198) by asking participants to solve mainly statistical problems (e.g. “If Person A’s chance of getting a disease is 1 in 100 in 10 years, and person B’s risk is double that of A, what is B’s risk?”). Since the most successful measure \(C_{S_{roc}}\) in predicting participants’ answers is based on conditional probabilities, we hypothesized that people who are sensitive to the measure \(C_{S_{roc}}\) as reflected in a higher scaling factor \(b_{i}\) should also score higher on the numeracy scale.

Table 1 Correlations (Pearson’s r) between the scaling factor \(b_{i}\) for \(C_{S_{roc}}\), the numeracy scale, the cognitive reflection task, and the preference for consistency scale

We found a low correlation of \(r = .30\) (\(t(49) = 2.24, p < .05\)) in the predicted direction (see Table 1). Closer inspection revealed that this correlation is driven by a single participant. After removing this participant from the analysis the correlation decreased to \(r = .13\) (\(t(48) = 0.94, p = .35\)).

The cognitive reflection test (cf. Frederick 2005) measures whether people rely on their first, incorrect intuitive answer or reflect more on a task before giving an answer (e.g. “A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?”). We again predicted a positive correlation between the scaling factor and the cognitive reflection test and found a low positive (\(r = .11\)) but non-significant correlation (\(t(49) = 0.79, p = 0.43\)). The preference for consistency scale (cf. Cialdini et al. 1995) measures individuals’ preference for their own and others’ behavior being consistent and predictable (e.g. “It is important to me that those who know me can predict what I will do”). Consistent and predictable coherence judgments can be achieved either by being sensitive to coherence and thereby clearly disentangling different degrees of coherence between sets of propositions, or by being insensitive to coherence and behaving similarly regarding all sets of propositions. In the analysis we found weak support for the second account: participants with a high preference for consistency show lower scaling factors (\(r = -.28\); \(t(49) = -2.04, p < .05\)). Overall, the analyses show that the impact of ability and personality on subjective coherence assessments is low.

5 Conclusion

The evaluation of the psychological experiments clearly shows that some probabilistic coherence measures perform better than others in predicting subjective coherence assessments in the employed test cases. In particular, one measure stands out from the crowd: Roche’s (2013) coherence measure based on Douven and Meijs’ average mutual support approach and the conditional probability. This measure shows decent results with respect to comparative coherence assessments (see Sects. 4.1, 4.2) as well as absolute, continuous coherence judgments (see Sect. 4.3).

It is, however, important to notice that this does not mean that measures showing a weak performance as predictors of subjective coherence assessments should be completely disregarded as inadequate. First, being able to predict subjective coherence assessments for a specific case does not ensure that the predicted coherence assessments themselves are correct. It might turn out that based on philosophical considerations the subjective coherence assessments for a certain scenario need to be corrected and as a consequence might be better captured by a measure that has wrongly been disregarded. Second, the empirical adequacy of a probabilistic coherence measure is only one component among others—e.g. satisfaction of certain coherence desiderata or performance in coherence-related test cases—that should be taken into account when evaluating the overall adequacy of a probabilistic coherence measure. Interestingly enough, Roche’s measure also cuts a good figure in these two respects (cf. Schippers 2014; Koscholke 2015). Therefore, this investigation can be understood as providing further, empirical support for the claim that Roche’s measure is a very promising candidate for an adequate probabilistic measure of coherence.