1 Introduction

Loosely speaking, coherence is the property of propositions hanging or fitting together, dovetailing with or mutually supporting each other (cf. BonJour 1985; Olsson 2005). It is the key concept of any coherentist theory of justification or truth. Nevertheless, as many authors have pointed out, coherentists have placed little emphasis on elucidating their central concept—or in Nicholas Rescher’s words, “the coherence theorists themselves have not always been too successful in explicating the nature of coherence” (Rescher 1973, p. 33). In order to overcome this supposed shortcoming, various philosophers have attempted to provide a mathematically precise explication of the concept of coherence using probability theory. The results are so-called probabilistic coherence measures (cf. Douven and Meijs 2007; Fitelson 2003, 2004; Glass 2002; Meijs 2006; Olsson 2002; Roche 2013; Schippers 2014; Schupbach 2011; Shogenji 1999). Of course, these measures have to be examined with respect to their claim of measuring coherence adequately. So far, there have been two common ways to do this: (i) formulating adequacy constraints and proving whether or not they are satisfied by a measure, and (ii) developing paradigmatic test cases that provide an intuitive normative coherence assessment and testing whether a measure is in line with this assessment.

This paper concentrates on the second approach, although in a slightly different way. Rather than using test cases and the provided normative coherence assessment as a benchmark for probabilistic coherence measures, the test cases are used as vignettes for psychological experiments, in which participants are asked for subjective coherence assessments of specified sets of propositions. Accordingly, the results of the experiments can be used to (i) evaluate the normative coherence assessments provided by the test cases and to (ii) evaluate the suitability of the tested measures as predictors of the participants’ coherence assessments. The paper is structured as follows. In Sect. 2 a collection of probabilistic coherence measures that have been proposed in the literature is introduced. In Sects. 3 and 4 the psychological study including methods and results is described. Finally, Sect. 5 discusses which conclusions can be drawn from the results.

2 Probabilistic measures of coherence

The notion of a probabilistic coherence measure can be introduced formally in a straightforward manner. Let L be a classical propositional language consisting of atomic formulas closed under some functionally complete selection of classical logical connectives such as e.g. \(\{\lnot ,\wedge \}\) and let \(P:L\rightarrow [0,1]\) be a probability function over L with conditional probability defined by \(P(x_1|x_2)=P(x_1\wedge x_2)/P(x_2)\) for any \(x_2\in L\) with \(P(x_2)\ne 0\). Furthermore, let \(2^L_{\ge 2}\) denote the set of all non-empty, non-singleton subsets of L and \(\mathbf {P}\) the set of all probability functions over L. A probabilistic coherence measure can then be defined as a partial function \(C:2^L_{\ge 2}\times \mathbf {P}\rightarrow \mathbb {R}\) assigning real numbers to sets of propositions under some joint probability distribution. By contrast, a probabilistic measure of support, on which a coherence measure can be based, is a partial function \(S:L\times L\times \mathbf {P}\rightarrow \mathbb {R}\) assigning real numbers to pairs of propositions under some probability distribution, where the first argument is commonly interpreted as a hypothesis and the second as a piece of evidence. Notice that we will omit reference to a particular probability function as a separate function argument of coherence or support measures.

Still, these are only very general requirements a probabilistic coherence measure should meet. The question which probabilistic information should be taken into account by a probabilistic coherence measure in order to adequately quantify the degree of coherence has been answered in different ways leading to different kinds of measures. They can be categorized into three groups: (i) measures that quantify coherence in terms of deviation from probabilistic independence (see Sect. 2.1), (ii) in terms of relative set-theoretic overlap (see Sect. 2.2) and (iii) in terms of average mutual support (see Sect. 2.3). In the following we briefly introduce the approaches and the resulting measures.

2.1 Deviation from independence measures

According to standard textbooks on probability theory (cf. e.g. Kolmogorov 1956), a set X of propositions \(x_1,\ldots ,x_n\) is said to be n-wise negatively dependent iff \(P(x_1\wedge \ldots \wedge \,x_n)<P(x_1)\times \ldots \times P(x_n)\), independent iff \(P(x_1\wedge \ldots \wedge x_n)=P(x_1)\times \ldots \times P(x_n)\) and positively dependent iff \(P(x_1\wedge \ldots \wedge x_n)>P(x_1)\times \ldots \times P(x_n)\). This definition can be rearranged by dividing the term on the left-hand side by the term on the right-hand side. Positive dependence is then defined as a value in the open interval \((1,\infty )\), independence as a value of 1 and negative dependence as a value in the half-open interval [0, 1). This can be considered the basic idea underlying Shogenji’s (1999) coherence measure. According to Shogenji, the degree of coherence of a finite set X of propositions can be computed by dividing the joint probability of X’s propositions by the product over their marginal probabilities. This quantifies the propositions’ ratio-wise deviation from their independence threshold value \(\theta =1\). This value is interpreted as neutrality such that values below \(\theta \) indicate degrees of incoherence and values above \(\theta \) indicate degrees of coherence:

$$\begin{aligned} {C}_{sho}(X)=\frac{P\left( \bigwedge _{x_i\in X}x_i\right) }{\prod _{x_i\in X}P(x_i)} \end{aligned}$$
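To make the computation concrete, here is a minimal Python sketch of \(C_{sho}\), representing propositions as sets of possible worlds; the distribution and the two propositions below are illustrative assumptions, not taken from any of the cited test cases.

```python
from functools import reduce

# Toy probability space (an assumed distribution): worlds are truth
# assignments to two atomic propositions.
P = {("T", "T"): 0.4, ("T", "F"): 0.1, ("F", "T"): 0.1, ("F", "F"): 0.4}

def prob(event):
    """Probability of an event, represented as a set of worlds."""
    return sum(P[w] for w in event)

def c_sho(props):
    """Shogenji's measure: joint probability divided by the
    product of the marginal probabilities."""
    joint = prob(set.intersection(*props))
    marginals = reduce(lambda a, b: a * b, (prob(x) for x in props))
    return joint / marginals

x1 = {w for w in P if w[0] == "T"}  # first atom is true
x2 = {w for w in P if w[1] == "T"}  # second atom is true
print(c_sho([x1, x2]))  # 0.4 / (0.5 * 0.5) = 1.6, above the threshold 1
```

Under the distribution assumed here the two propositions are positively dependent, so the value exceeds the neutrality threshold \(\theta =1\).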

As Fitelson (2003) and Schupbach (2011) have pointed out, Shogenji’s measure suffers from a lack of subset sensitivity when applied to a set of more than two propositions. This is due to the fact that for any set X consisting of n propositions there are probability distributions such that some subsets of X are i-wise negatively dependent, independent or positively dependent but not j-wise negatively dependent, independent or positively dependent for \(i\ne j\) where \(i,j\le n\) (cf. Pfeiffer 1990). Therefore, Schupbach (2011) has suggested the following alternative generalization: to assess the degree of coherence of X, apply a log-normalized version of \(C_{sho}\) to each set \(X'_{ij}\), i.e. the i-th subset of X containing \(j\ge 2\) propositions. Divide each resulting coherence value by the number of subsets with j members, sum up the resulting values and divide this sum by X’s cardinality minus one, ignoring singleton sets:

$$\begin{aligned} {{C}_{sch}(X)=\dfrac{\sum \nolimits _{j=2}^{n}\sum \nolimits _{i=1}^{\left( {\begin{array}{c}n\\ j\end{array}}\right) } \log \left( {C}_{sho}(X'_{ij})\right) \times \left( {\begin{array}{c}n\\ j\end{array}}\right) ^{-1}}{n-1}} \end{aligned}$$

Although the measure is more fine-grained, it is still based on the idea of measuring coherence in terms of deviation from independence. However, due to the log-normalization the threshold value of Schupbach’s measure for neutrality is \(\theta =0\), such that values in \((-\infty ,0)\) indicate degrees of incoherence and values in \((0,\infty )\) indicate degrees of coherence.
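Schupbach’s generalization can also be sketched in Python, again representing propositions as sets of possible worlds; the uniform three-atom distribution is an assumption chosen so that the expected value (full independence, hence the neutral value 0) is easy to check by hand.

```python
import math
from functools import reduce
from itertools import combinations, product

# Uniform (hence fully independent) distribution over the eight truth
# assignments to three atoms -- an illustrative assumption.
P = {w: 1 / 8 for w in product("TF", repeat=3)}

def prob(event):
    return sum(P[w] for w in event)

def c_sho(props):
    joint = prob(set.intersection(*props))
    marginals = reduce(lambda a, b: a * b, (prob(x) for x in props))
    return joint / marginals

def c_sch(props):
    """Schupbach's measure: for each subset size j >= 2, average the
    log-normalized C_sho over all subsets of that size, then average
    the per-size means by dividing by n - 1."""
    n = len(props)
    total = 0.0
    for j in range(2, n + 1):
        subs = list(combinations(props, j))
        total += sum(math.log(c_sho(list(s))) for s in subs) / len(subs)
    return total / (n - 1)

atoms = [{w for w in P if w[i] == "T"} for i in range(3)]
print(c_sch(atoms))  # 0.0: independence yields the neutral value
```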

2.2 Relative overlap measures

Glass (2002) and Olsson (2002) have proposed a different measure of coherence that is based on a set-theoretically inspired understanding of coherence. Here, the joint probability over all propositions \(x_1,\ldots ,x_n\) in some set X is interpreted as their overlapping set-theoretic surface. Likewise, the probability that any of these propositions is true is interpreted as their total set-theoretic surface. In order to compute the degree of coherence of X, Glass and Olsson suggest simply dividing the probability of the conjunction by the probability of the disjunction over X’s members. Set-theoretically speaking, this can be understood as quantifying the propositions’ relative overlap:

$$\begin{aligned} {C}_{go}(X)=\frac{P\left( \bigwedge _{x_i\in X}^{}x_i\right) }{P\left( \bigvee _{x_i\in X}x_i\right) } \end{aligned}$$

It is easy to see that the measure has the codomain [0, 1] where 0 means no overlap at all and 1 means identity of overlap and total surface of the propositions. But unlike the two measures mentioned before, the threshold \(\theta \) cannot be based on probabilistic independence. One can, however, argue that the threshold is .5 in the case of two propositions \(\{x_1,x_2\}\), since values below this threshold would indicate that \(x_1\) coheres better with \(\lnot x_2\) than with \(x_2\). In any case, Bovens and Hartmann (2003) have shown that this measure has problems with respect to subset-sensitivity similar to those of Shogenji’s measure. In order to overcome these difficulties Meijs (2006) has suggested the following alternative: in order to assess the coherence of X, take the straight average over all \(C_{go}\) values applied to every subset \(X'_i\) of X with \(|X'_i|\ge 2\):

$$\begin{aligned} {C}_{mei}(X)=\frac{\sum _{i=1}^{(2^{n}-n)-1}{C}_{go}(X'_i)}{(2^{n}-n)-1} \end{aligned}$$

This measure is obviously more fine-grained but it is easy to see that the codomain [0, 1] and the threshold \(\theta \) remain the same.
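Both relative-overlap measures admit a short Python sketch over sets of possible worlds; the distribution below is an assumed toy example.

```python
from itertools import combinations

# Assumed toy distribution over truth assignments to two atoms.
P = {("T", "T"): 0.3, ("T", "F"): 0.2, ("F", "T"): 0.2, ("F", "F"): 0.3}

def prob(event):
    return sum(P[w] for w in event)

def c_go(props):
    """Glass-Olsson measure: P(conjunction) / P(disjunction)."""
    return prob(set.intersection(*props)) / prob(set.union(*props))

def c_mei(props):
    """Meijs' measure: straight average of C_go over all subsets
    with at least two members."""
    subs = [list(s) for j in range(2, len(props) + 1)
            for s in combinations(props, j)]
    return sum(c_go(s) for s in subs) / len(subs)

x1 = {w for w in P if w[0] == "T"}
x2 = {w for w in P if w[1] == "T"}
print(c_go([x1, x2]))  # 0.3 / 0.7, roughly 0.43, below the threshold 0.5
```

For a two-element set \(C_{mei}\) coincides with \(C_{go}\), since the only subset with at least two members is the set itself.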

2.3 Average mutual support measures

A whole family of coherence measures can be obtained using an approach systematically developed by Douven and Meijs (2007). According to their approach, coherence is to be understood as average mutual support. And since there is a variety of probabilistic measures of support (for overviews cf. Crupi et al. 2007; Festa 2012), one can easily obtain a huge collection of candidates for coherence measures based on them. The basic idea runs as follows: to assess the coherence of X, consider all pairs \((X',X'')_i\) where \(X'\) and \(X''\) are non-empty, disjoint subsets of X. For each pair, take the conjunction over the propositions contained in each of the two sets, apply some chosen probabilistic support measure S, and average the resulting degrees of support:

$$\begin{aligned} {C_{S}}(X)=\dfrac{\sum ^{{(3^{n}-2^{n+1})+1}}_{i=1}{S}\left( \bigwedge _{x_j\in X'} x_j,\bigwedge _{x_k\in X''} x_k\right) _i}{(3^n-2^{n+1})+1} \end{aligned}$$
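The scheme can be sketched generically in Python: enumerate all ordered pairs of non-empty, disjoint subsets, conjoin the propositions in each subset, apply S and average. Propositions are represented as sets of possible worlds; the distribution and the choice of the difference measure \(P(x_1|x_2)-P(x_1)\) as an example S are illustrative assumptions.

```python
from itertools import combinations

# Assumed toy distribution over truth assignments to two atoms.
P = {("T", "T"): 0.4, ("T", "F"): 0.1, ("F", "T"): 0.1, ("F", "F"): 0.4}

def prob(event):
    return sum(P[w] for w in event)

def s_diff(h, e):
    """Difference measure P(h|e) - P(h), one example support measure."""
    return prob(h & e) / prob(e) - prob(h)

def c_support(props, S):
    """Douven-Meijs scheme: average S over all ordered pairs of
    non-empty, disjoint subsets, each subset conjoined."""
    idx = range(len(props))
    subsets = [set(c) for j in range(1, len(props) + 1)
               for c in combinations(idx, j)]
    vals = [S(set.intersection(*[props[i] for i in a]),
              set.intersection(*[props[i] for i in b]))
            for a in subsets for b in subsets if not a & b]
    return sum(vals) / len(vals)  # there are 3^n - 2^(n+1) + 1 such pairs

x1 = {w for w in P if w[0] == "T"}
x2 = {w for w in P if w[1] == "T"}
print(c_support([x1, x2], s_diff))  # roughly 0.3
```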

For his coherence measure Fitelson (2004) has chosen a case-sensitive variation of Kemeny and Oppenheim’s (1952) measure of factual support. The values of the resulting coherence measure are in \([-1,1]\) with \(\theta =0\):

$$\begin{aligned} {S}_{fit}(x_1,x_2)={\left\{ \begin{array}{ll} \frac{P(x_2|x_1)-P(x_2|\lnot {x_1})}{P(x_2|x_1)+P(x_2|\lnot {x_1})} &{}\,\,\, {\text {if}}\;x_2\nvdash x_1\,{\text {and}}\,x_2\nvdash \lnot x_1\\ 1 &{}\,\,\, {\text {if}}\;x_2\vdash x_1\;{\text {and}}\;x_2\nvdash \bot \\ -1 &{}\,\,\, {\text {if}}\;x_2\vdash \lnot x_1 \end{array}\right. } \end{aligned}$$

Douven and Meijs (2007) have investigated three further support measures as foundations for probabilistic coherence measures, namely Carnap’s (1950) difference measure with codomain \([-1,1)\) and \(\theta =0\), Keynes’ (1921) relevance quotient and Good’s (1984) likelihood ratio measure, both with codomain \([0,\infty )\) and \(\theta =1\). Notice that due to commutativity or ordinal equivalence (up to identity) one would obtain identical coherence measures if one used Levi’s (1962) corroboration measure or Mortimer’s (1988) confirmation measure instead of Carnap’s difference measure, Kuipers’ (2000) confirmation measure or Finch’s (1960) confirmation measure +1 instead of Keynes’, and finally Joyce’s (2008) odds-ratio measure instead of Good’s likelihood-ratio measure. Douven and Meijs’ favourite is the coherence measure based on Carnap’s difference measure \({S}_{car}(x_1,x_2)=P(x_1|x_2)-P(x_1)\).

Siebel and Wolff (2008) have extended the collection of candidate measures by taking into account Carnap’s (1950) relevance measure with codomain \((-1,1)\) and \(\theta =0\), Nozick’s (1981) counterfactual likelihood difference measure, Popper’s (1954) corroboration measure and Rescher’s (1958) measure of evidential support, all three with values in \([-1,1]\) with \(\theta =0\):

$$\begin{aligned} \begin{array}{c} \displaystyle {S}_{car'}(x_1,x_2)=P(x_1\wedge x_2)- P(x_1)\cdot P(x_2)\\ \displaystyle {S}_{noz}(x_1,x_2)=P(x_2|x_1)-P(x_2|\lnot {x_1})\\ \displaystyle {S}_{pop}(x_1,x_2)=\frac{P(x_2|x_1)-P(x_2)}{P(x_2|x_1)+P(x_2)}\cdot \left( 1+P(x_1)\cdot P(x_1|x_2)\right) \\ \displaystyle {S}_{res}(x_1,x_2)=\frac{P(x_1|x_2)-P(x_1)}{1-P(x_1)}\cdot P(x_2)\\ \end{array} \end{aligned}$$

A more recent proposal is due to Roche (2013). His favourite coherence measure is based on Douven and Meijs’ approach and a case-sensitive version of absolute—as opposed to incremental—support, namely the conditional probability. The codomain of the resulting measure is obviously [0, 1], and just as for the Glass-Olsson measure \(\theta =0.5\), although the interpretation of this value differs from that of the \(\theta \) values of the other measures. Here, values above \(\theta \) mean that some proposition is supported to a stronger degree than its negation, while values below indicate the opposite.

$$\begin{aligned} {S}_{roc}(x_1,x_2)={\left\{ \begin{array}{ll} P(x_1|x_2) &{}\,\,\, {\text {if}}\;x_2\nvdash x_1\,{\text {and}}\,x_2\nvdash \lnot x_1 \\ 1 &{}\,\,\, {\text {if}}\;x_2\vdash x_1\;{\text {and}}\;x_2\nvdash \bot \\ 0 &{}\,\,\, {\text {if}}\;x_2\vdash \lnot x_1 \end{array}\right. } \end{aligned}$$
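Plugging \(S_{roc}\) into the Douven-Meijs scheme gives a runnable sketch of Roche’s measure. In the finite worlds model assumed here, propositions are sets of possible worlds, so entailment is modeled as set inclusion between events; the distribution is again an illustrative assumption.

```python
from itertools import combinations

# Assumed toy distribution; propositions are sets of worlds, so
# "x2 entails x1" is modeled as the inclusion e <= h between events.
P = {("T", "T"): 0.4, ("T", "F"): 0.1, ("F", "T"): 0.1, ("F", "F"): 0.4}

def prob(event):
    return sum(P[w] for w in event)

def s_roc(h, e):
    """Roche's case-sensitive absolute support: the conditional
    probability, with the entailment cases fixed at 1 and 0."""
    if e and e <= h:      # e entails h and e is consistent
        return 1.0
    if not (e & h):       # e entails not-h
        return 0.0
    return prob(h & e) / prob(e)

def c_roc(props):
    """Roche's coherence measure via the Douven-Meijs scheme."""
    idx = range(len(props))
    subsets = [set(c) for j in range(1, len(props) + 1)
               for c in combinations(idx, j)]
    vals = [s_roc(set.intersection(*[props[i] for i in a]),
                  set.intersection(*[props[i] for i in b]))
            for a in subsets for b in subsets if not a & b]
    return sum(vals) / len(vals)

x1 = {w for w in P if w[0] == "T"}
x2 = {w for w in P if w[1] == "T"}
print(c_roc([x1, x2]))  # 0.8, well above the neutrality threshold 0.5
```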

Another recent coherence measure has been developed by Schippers (2014) and is based on his own measure of support. The values of this measure are in \([-1,1]\) with \(\theta =0\). Notice that one can obtain the very same coherence measure by using the so-called power PC measure by Cheng (1997).

$$\begin{aligned} {S}_{sch}(x_1,x_2)={\left\{ \begin{array}{ll} \frac{P(x_1|x_2)-P(x_1|\lnot {x_2})}{1-P(x_1|\lnot {x_2})} &{}\,\,\, {\text {if}}\; P(x_1|x_2)\ge P(x_1)\\ \frac{P(x_1|x_2)-P(x_1|\lnot {x_2})}{P(x_1|\lnot {x_2})} &{} \,\,\,{\text {if}}\; P(x_1|x_2)<P(x_1) \end{array}\right. } \end{aligned}$$

Finally, Koscholke (2015) has added four further candidate measures to the investigation, namely Crupi’s (2007) z-measure with values in \([-1,1]\) and \(\theta =0\), Gaifman’s (1979) measure with values in \([0,\infty )\) and \(\theta =1\), and Rips’ (2001) measure and Shogenji’s (2012) justification measure—which according to him is also a measure of evidential support—both with values in \((-\infty ,1]\) and \(\theta =0\):

$$\begin{aligned}&{S}_{cru}(x_1,x_2)={\left\{ \begin{array}{ll} \frac{P(x_1|x_2)-P(x_1)}{1-P(x_1)} &{}\,\,\, {\text {if}}\; P(x_1|x_2)\ge P(x_1)\\ \frac{P(x_1|x_2)-P(x_1)}{P(x_1)} &{}\,\,\, {\text {if}}\; P(x_1|x_2)<P(x_1) \end{array}\right. }\\&{S}_{gai}(x_1,x_2)=\dfrac{P(\lnot {x_1})}{P(\lnot {x_1}|x_2)}\\&{S}_{rip}(x_1,x_2)=1-\dfrac{P(\lnot {x_2}|x_1)}{P(\lnot {x_2})}\\&S_{sho}(x_1,x_2)=\frac{\log _2 P(x_1|x_2)-\log _2 P(x_1)}{-\log _2 P(x_1)} \end{aligned}$$

Notice that the codomains and the \(\theta \) values of the confirmation measures carry over to the coherence measures that are based on the respective measures. It is also worth noticing that although many of the presented confirmation measures have ordinally equivalent versions, the resulting coherence measures are not necessarily ordinally equivalent. Having introduced the candidate measures, we may now turn to the experiments.

3 Methods

For the experiments a collection of test cases from Koscholke (2015) has been employed as vignettes. These test cases include Akiba’s dice case (cf. Akiba 2000), Bovens and Hartmann’s Tweety and their Tokyo murder cases (cf. Bovens and Hartmann 2003), Glass’ dodecahedron case (cf. Glass 2005), Meijs’ samurai and his rabbit case (cf. Meijs 2005), Meijs and Douven’s plane lottery case (cf. Meijs and Douven 2007), Schupbach’s robber case (cf. Schupbach 2011), Siebel’s pickpocketing case (cf. Siebel 2004) and Siebel and Schippers’ inconsistent testimony case (cf. Schippers and Siebel 2014). An overview of the employed test cases is given in Appendix 1, the test case results for each measure in Appendix 2.

Notice that Harris and Hahn (2009) have conducted a study very similar to the one presented here. However, they only investigated the empirical adequacy of Bovens and Hartmann’s (2003) coherence quasi-ordering and only for a modified version of their Tokyo murder case. The present study can therefore be understood as an extension of Harris and Hahn’s project with respect to both coherence measures and test cases.

3.1 Participants

57 participants (36 female, mean age = 25.8) were recruited from the Decision Lab Subject Pool of the University of Göttingen using the online recruiting tool ORSEE (cf. Greiner 2004). Participants received a show-up fee of 7 Euros (approx. USD 9.50) or course-credit.

3.2 Procedure and materials

The participants answered three questionnaires online no later than twelve hours before they arrived for the main study in the lab. The questionnaires comprised a translation of the brief form of the preference for consistency scale (cf. Cialdini et al. 1995) consisting of nine items, the numeracy scale (cf. Weller et al. 2013) consisting of fourteen items, and the cognitive reflection test (cf. Frederick 2005) consisting of three items. In the lab participants were presented with the ten test cases in random order. Except for Bovens and Hartmann’s (2003) Tokyo murder case and Siebel’s (2004) pickpocketing case, each test case consists of two sets of propositions. Participants were first asked to indicate in which of the two sets the propositions fit together better or whether they fit together equally well. Then participants were asked to use a continuous slider ranging from \(-\)100 to 100 to indicate the degree to which the propositions of each set fit together. In Bovens and Hartmann’s Tokyo murder case participants were asked to rank-order the five sets of propositions according to how well the propositions fit together. Here, they also had to rate the degree of coherence of each set of propositions using the slider. For Siebel’s (2004) pickpocketing case the participants were asked whether the propositions fit together or not. Then again the participants had to use the slider to evaluate how well the propositions fit together. Finally, participants were asked to provide demographic data and received a written debriefing.

3.3 Assessment of predictive accuracy

We assessed three variables to evaluate how well the coherence measures predict participants’ coherence assessments. We recorded whether participants chose the first or the second set of propositions as more coherent or judged the sets to be equally coherent. The first variable, choices (see Sect. 4.1), is the agreement between participants’ choices and the coherence assessments of each measure. For Bovens and Hartmann’s Tokyo murder case we recorded the coherence ranking participants gave to the five sets of propositions. The second variable, ranking (see Sect. 4.2), is the percentage of participants who ranked propositions according to the rankings given by the measures. We also recorded the continuous coherence judgments participants gave for each set of propositions in each test case. The third variable, judgments (see Sect. 4.3), is the fit between the observed judgments and coherence predictions as assessed in a mixed-linear-regression model for each measure, as explained in more detail below.

4 Results

4.1 Choices

Most measures can predict participants’ choices better than chance, i.e. 33 % for three choice options. Correctly predicted choices range from 31 to 60 % between the measures. The three best measures—\(C_{go}, C_{mei}\) and \(C_{S_{roc}}\)—perform equally well, with around 59 to 60 % of correctly predicted choices (see Fig. 1).

Fig. 1 Correctly predicted choices

4.2 Rankings

Only six participants (i.e. 11 %) rank-ordered the five pairs of propositions in Bovens and Hartmann’s Tokyo murder test case in the way predicted by 44 % of all measures, i.e. rank-order 1, 5, 4, 3, 2. The largest group of 19 participants (33 %) used a similar ranking differing only in the order of the final two pairs of propositions, i.e. rank-order 1, 5, 4, 2, 3 instead of 1, 5, 4, 3, 2. Thus, if we allow for one error in the ranking of the final two pairs, 44 % of the measures predict 44 % of participants correctly. Furthermore, allowing a switch in the second and third ranking (i.e. rank-order 1, 4, 5, 2, 3 or 1, 4, 5, 3, 2), the remaining 56 % of measures predict another 14 % of all participants’ rankings. Overall, 68 % of participants behave (although not perfectly) in line with at least one of the measures. This also means that a considerable percentage of participants (i.e. 32 %) do not rank-order the pairs of propositions in accordance with any measure. The three best measures for predicting choices—\(C_{go}, C_{mei}\) and \(C_{S_{roc}}\)—also predict the ranking that the largest group of participants gave quite well (see Fig. 2).

Fig. 2 Correctly predicted rankings

4.3 Judgments

The investigated coherence measures differ in their assessment of coherence in the various test cases, which results in a unique profile for each measure (corresponding to the rows of Table 2 in Appendix 2). We used these profiles as predictors for participants’ continuous coherence judgments to test how well the measures can account for the judgments. Conceptually, for each measure j we fitted the profile to the coherence judgments of all participants. To account for the different scaling of predictions and judgments (the latter ranging from \(-\)100 to 100 in the study), a scaling factor \(b_{ij}\) was estimated from the data as a regression weight to expand the profile. To account for differences in the extent of scaling for each participant i, and to also account statistically for repeated ratings from the same participants, \(b_{ij}\) consists of the sum of a value shared by all participants and an individual value estimated from the judgments of all participants and of participant i. We further accounted for the direction of the predictions by subtracting the neutrality value from all values for each measure, restricting predictions above the neutrality value to judgments above 0, predictions below the neutrality value to judgments below 0, and predictions identical to the neutrality value to zero-judgments. Technically, this can be achieved by including the prediction profile of a measure, after subtracting the neutrality value, in a mixed-linear-regression without an intercept, as a fixed effect plus a random effect for each participant.

In order to compare the measures we used the Bayesian Information Criterion (BIC) (cf. Schwarz 1978) from each regression model for each measure as an indicator of how well a measure can account for the participants’ continuous coherence judgments (see Fig. 3). The results from the analysis, using a software package for linear and nonlinear mixed effects models (cf. Pinheiro et al. 2013) in R (cf. R Core Team 2015), show that the measure \(C_{S_{roc}}\) can account for participants’ judgments best. The evidence from the data for \(C_{S_{roc}}\) is extreme, with a Bayes factor of \(2\times 10^{11}\) between \(C_{S_{roc}}\) and the next best fitting measure (cf. Jeffreys 1961; Wagenmakers 2007). Overall, predictions based on \(C_{S_{roc}}\) can describe participants’ judgment ratings very well (see Fig. 4).

Fig. 3 Bayesian information criterion \(\varDelta \text {BIC}=\text {BIC}_{j}-\text {BIC}_{C_{S_{roc}}}\) (i.e. difference of BIC between each measure and the best fitting measure by Roche 2013). Notice that \(C_{sch}, C_{S_{goo}}, C_{S_{gai}}\) and \(C_{S_{sho}}\) do not provide a BIC score since they are undefined for some test cases

Fig. 4 Scatterplot of mean observed coherence judgments in the test cases and predicted judgments according to the best-fitting measure \(C_{S_{roc}}\) (cf. Roche 2013). Note that dotted circles around means indicate 95 % confidence intervals. Pearson correlation between observed and predicted means is \(r = .84\) (\(t(18) = 6.7, p < .001\))

4.4 Ability and personality as predictors of coherence-judgments

We also analyzed the relation between individual scaling factors \(b_{i}\) for the measure \(C_{S_{roc}}\) and the participants’ ability to process numbers on the one hand and their personality on the other hand. The numeracy scale measures “the ability to understand, manipulate, and use numerical information, including probabilities” (Weller et al. 2013, p. 198) by asking participants to solve mainly statistical problems (e.g. “If Person A’s chance of getting a disease is 1 in 100 in 10 years, and person B’s risk is double that of A, what is B’s risk?”). Since the most successful measure \(C_{S_{roc}}\) in predicting participants’ answers is based on conditional probabilities, we hypothesized that people who are sensitive to the measure \(C_{S_{roc}}\) as reflected in a higher scaling factor \(b_{i}\) should also score higher on the numeracy scale.

Table 1 Correlations (Pearson’s r) between the scaling factor \(b_{i}\) for \(C_{S_{roc}}\), the numeracy scale, the cognitive reflection task, and the preference for consistency scale

We found a low correlation of \(r = .30\) (\(t(49) = 2.24, p < .05\)) in the predicted direction (see Table 1). Closer inspection revealed that this correlation is driven by a single participant. After removing this participant from the analysis the correlation decreased to \(r = .13\) (\(t(48) = 0.94, p = .35\)).

The cognitive reflection test (cf. Frederick 2005) measures whether people rely on their first, incorrect intuitive answer or reflect more on a task before giving an answer (e.g. “A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?”). We again predicted a positive correlation between the scaling factor and the cognitive reflection test and found a low positive (\(r = .11\)) but non-significant correlation (\(t(49) = 0.79, p = 0.43\)). The preference for consistency scale (cf. Cialdini et al. 1995) measures individuals’ preference for their own and others’ behavior being consistent and predictable (e.g. “It is important to me that those who know me can predict what I will do”). Consistent and predictable coherence judgments can be achieved either by being sensitive to coherence and thereby clearly disentangling different degrees of coherence between sets of propositions, or by being insensitive to coherence and behaving similarly regarding all sets of propositions. In the analysis we found weak support for the second account: participants with a high preference for consistency show lower scaling factors (\(r = -.28\); \(t(49) = -2.04, p < .05\)). Overall, the analyses show that the impact of ability and personality on subjective coherence assessments is low.

5 Conclusion

The evaluation of the psychological experiments clearly shows that some probabilistic coherence measures perform better than others in predicting subjective coherence assessments in the employed test cases. In particular, one measure stands out from the crowd: Roche’s (2013) coherence measure based on Douven and Meijs’ average mutual support approach and the conditional probability. This measure shows decent results with respect to comparative coherence assessments (see Sects. 4.1, 4.2) as well as absolute, continuous coherence judgments (see Sect. 4.3).

It is, however, important to notice that this does not mean that measures showing a weak performance as predictors of subjective coherence assessments should be completely disregarded as inadequate. First, being able to predict subjective coherence assessments for a specific case does not ensure that the predicted coherence assessments themselves are correct. It might turn out that based on philosophical considerations the subjective coherence assessments for a certain scenario need to be corrected and as a consequence might be better captured by a measure that has wrongly been disregarded. Second, the empirical adequacy of a probabilistic coherence measure is only one component among others—e.g. satisfaction of certain coherence desiderata or performance in coherence-related test cases—that should be taken into account when evaluating the overall adequacy of a probabilistic coherence measure. Interestingly enough, Roche’s measure also cuts a good figure in these two respects (cf. Schippers 2014; Koscholke 2015). Therefore, this investigation can be understood as providing further, empirical support for the claim that Roche’s measure is a very promising candidate for an adequate probabilistic measure of coherence.