1 Introduction

The use of analogy can be found in many different areas of thought, and there has been much theorizing about its different uses (Guarini et al. 2009). In ethics and law, analogical reasoning and argument often involve starting with some source case where there is agreement or clarity on how it should be treated, and using it to draw a conclusion about some target case where there is disagreement or confusion. The idea is that there are relevant similarities between the source and target that warrant them being treated in the same way, notwithstanding the differences between the cases. In other words, there is a kind of assessment of the overall similarity between cases that depends on some of the local similarities and differences found in those cases. This paper will be about the kind of similarity at work in analogical reasoning and argument in ethics.

In other work, I have argued against deductive reconstructions of analogical arguments in ethics and law (Guarini 2004, 2010a). In other words, I have argued against the idea that analogical arguments need to be understood as implicitly appealing to some exceptionless rule from which the classifications of the cases in question can be monotonically deduced. If the similarity between cases does not consist in the classification or judgement about the cases being deducible from a common, exceptionless rule, in what does it consist? Brewer (1996, p. 952) puzzled over this and labelled theorists as “mystics” if they denied a deductive moment to analogy. Just to be clear, it is not only analogy in law Brewer is puzzling over; his puzzlement about analogy without deduction applies to analogical arguments in general.

There is a large body of literature in Law and AI on similarity in case-based reasoning that is anything but mystical, and much of it does not treat analogy as deductive. (See Rissland 2006 for a brief overview.) Even those taking a more traditional philosophical approach to analogy (Postema 2002, 2007; Sunstein 1993, 1996, 1999, 2000) would not see themselves as mystics. So we have theorists who do computer modeling who do not treat analogical arguments as involving monotonic deductions from substantive rules and applicable facts, and we have more traditional philosophical approaches that do the same. This paper will make use of considerations both from computational modeling and from more traditional philosophical approaches to defend a view on the nature of similarity in analogical reasoning and argument in ethics.

The philosophical considerations are very much informed by the debates between particularists and generalists in moral philosophy. The computational considerations come from computational neural modeling. Each will be used to inform the other. An artificial neural network trained to classify moral situations will be presented, and distinctions between different types of substantive moral rule or standard will be used to interpret what is going on in the network. Also, it will be argued that computational tools that can be used to analyse network activity may be used to inform debates about particularism and generalism.

Given that computational modeling of moral case classification is used, this paper is, among other things, a contribution in Machine Ethics. As discussed in the introduction to this issue, there are practical and reflexive motives for Machine Ethics. This paper has the reflexive motivation. It builds on previous work (especially Guarini 2010b, 2011), and can be considered a sister paper to Guarini (forthcoming), which cites this work for its defense of a state space approach to similarity.

1.1 State Space Approaches to Similarity

There are different approaches to understanding similarity. Churchland (1989, 2007) has been writing for years about the ability of artificial neural networks to partition high dimensional state spaces to set up similarity spaces. State space models of similarity have been criticised for their ineffectiveness at capturing the type of similarity needed to make sense of analogy (Markman and Gentner 2005) and have been criticised for other reasons as well (Laakso and Cottrell 2006). Among other things, this work will attempt to rehabilitate state space models of similarity, at least with respect to the kind of similarity at work in moral or ethical reasoning. This will be done by presenting an Artificial Neural Network (ANN) simulation and subjecting the moral state space produced by the ANN to analysis. I will argue that these state spaces are usefully understood in terms of contributory or pro tanto standards, as discussed in the moral philosophy literature on particularism and generalism.

1.2 Implications for Particularism and Generalism

Jonathan Dancy (2004) has put forward a holistic view of the nature of moral reasons. Dancy (1999, 2004) has also suggested that ANNs might be helpful in explaining how we could learn the difference between right and wrong without making use of rules or principles. He identifies different forms of holism and contrasts his position with an atomistic conception of reasons. I prefer to think in terms of the nonlocality of reasons, with atomism being on the maximally local end of the spectrum, various forms of holism being on the other end of the spectrum, still other possibilities being somewhere in the middle, and positions becoming increasingly nonlocal as we move away from atomism. Dancy (2004, pp. 113–117) recognizes the possibility of positions in the middle but argues against them. In other places (Guarini 2010b, 2011) I have argued that there may be a conceptual middle ground somewhere between the more thoroughgoing forms of particularism and generalism. The arguments in this paper should be taken in that context, as an attempt to make a case for the defensibility of a conceptual middle ground. There is no denial herein that reasons are nonlocal, or that analyses of ANNs and their state spaces could help us to understand how this is so, but it does not follow that we need to go all the way to holism.

1.3 Outline

The second part of this paper introduces different conceptions of moral standards or rules. It also introduces the Moral Case Classifier (MCC), an artificial neural network trained to classify moral situations. The third part of the paper engages in a detailed discussion of the MCC, arguing that its behavior is usefully characterized with contributory standards, not in more holistic or thoroughly particularist ways. The fourth part of the paper examines the nonlocality of reasons. I argue that (a) contributory standards can allow for variable relevance and the nonlocality of reasons, and (b) the MCC can be understood in those terms. The last part of that section argues that a state space approach to similarity together with the nonlocality of reasons would render it unsurprising that we could see similarity between cases even if the classifications of those cases were not being monotonically deduced from some exceptionless or total moral standard. The potential value of mathematical and computational tools in enriching philosophical discourse about the nature of reasons is also discussed. The fifth part of the paper considers some objections and replies. Central among these is the alleged inability of state space approaches to capture key features of similarity. Important qualifications are added throughout part five and the concluding part six.

2 Background

2.1 Approaches to Principles or Standards

McKeever and Ridge (2006) have provided a useful overview of a number of different ways of thinking of moral principles and rules. This paper will only consider two types of principle or rule or standard: the absolute, exceptionless, or total standard, and the contributory or pro tanto standard. The mark of the total standard is that it licenses monotonic, deductive entailments when combined with the appropriate facts. Consider the rule “Killing is wrong.” If we interpret that as a total standard and combine it with the fact that Jack killed Jill, we can deduce that Jack did something wrong. Moreover, if we interpret this rule as a total standard, it will not matter what other premises we add to our original premise set: we can still deduce that Jack did something wrong. Take the same rule—that killing is wrong—and interpret it as a contributory standard, and we get a different result. For example, if Jill kills Jack in self-defense, and “Killing is wrong” is a contributory standard, then we could say that killing contributed to the impermissibility of the action, but the action being done in self-defense contributed to its permissibility, and if the latter outweighs the former, then, all things considered, the action is permissible. There are different ways of conceiving of contributory or pro tanto standards. On all approaches, a contributory standard specifies a direction of contribution for some feature or consideration. On a strong conception, the amount or magnitude of the contribution is fixed; on a weaker conception, the extent of the contribution made in a given direction can vary, but the direction of contribution remains constant. It is the weaker conception that will be used in this paper, and it starts to play an important role in Sect. 3.4. A more detailed discussion that looks into reasons for the variation in magnitude takes place in Sect. 4.2.
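
To make the contrast concrete, here is a minimal Python sketch of the two readings of “Killing is wrong.” It is an illustration only: the feature names, directions, and magnitudes are invented for the example, and nothing here is drawn from the MCC.

```python
# A total standard licenses a monotonic deduction: adding premises can
# never retract the conclusion. A weak contributory standard fixes only
# a direction of contribution; magnitudes may vary by context.

# Total standard: "killing is wrong" as an exceptionless rule.
def total_standard(case_features):
    # No added feature can ever defeat the verdict once killing is present.
    return "impermissible" if "killing" in case_features else "no verdict"

# Weak contributory standards: each feature has a fixed direction
# (+1 toward permissibility, -1 toward impermissibility); magnitudes
# are supplied per case because they can vary with context.
DIRECTIONS = {"killing": -1, "self_defense": +1}

def contributory_verdict(case_magnitudes):
    # case_magnitudes: feature -> nonnegative strength in this case
    balance = sum(DIRECTIONS[f] * m for f, m in case_magnitudes.items())
    return "permissible" if balance > 0 else "impermissible"

print(total_standard({"killing", "self_defense"}))                   # impermissible
print(contributory_verdict({"killing": 1.0, "self_defense": 1.5}))   # permissible
```

On the total reading, self-defense changes nothing; on the contributory reading, it can outweigh the contribution made by killing.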

Thoroughgoing particularists such as Jonathan Dancy reject standards of all kinds. Dancy (2000, 2004) is neither a sceptic nor a relativist. He thinks that there are right answers to moral questions; he simply does not believe that we need rules or principles or standards to discover them or reason about them. When challenged (Jackson et al. 2000) as to whether a particularist understanding of moral reasons would allow us to understand how we learn the difference between right and wrong, permissible or impermissible, or the like, Dancy (1999) suggested that Artificial Neural Networks (ANNs) might be able to show us how learning about right and wrong could take place without substantive moral rules. To be sure, ANNs make use of rules—such as the training algorithm and the activation equation—but these are not substantive moral rules.

2.2 The Moral Case Classifier

The Moral Case Classifier (MCC) is a simple recurrent ANN. It has 8 input units, 24 hidden units, 1 output unit, and 24 context units. See Fig. 1. There is full interconnection between input and hidden units, and between hidden units and the output unit. The values from the hidden units are copied to the context units, which can then feed their input back to the hidden units. Details have been published elsewhere (Guarini 2010b, 2011). The network is trained to deal with actions involving two possible agents or subjects—Jack and Jill, with a different vector representing each one. There are two possible actions—killing and allowing to die, with a different vector for each of these as well. And there are six motives and eight consequences, with a different vector for each of these 14 items. The idea is to feed a description of a case into the network one phrase at a time. For example, first the vector for Jill is fed in, and information is processed and copied to the context units; then the vector for kills is fed in with the information from the context units, and everything is processed and copied to the context units; and the process continues until the entire case is presented. Table 1 provides some sample cases. The context units are used as a kind of working memory where past processing results are stored while the network builds up its own internal representation of the case at the level of the hidden units. The desired output is set to 1 for permissible actions and −1 for impermissible actions.
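
For readers who want the architecture in concrete terms, the following is a minimal sketch of an Elman-style forward pass with the MCC’s dimensions. The tanh activation, the random weights, and the stand-in phrase vectors are illustrative assumptions; the actual training details are in Guarini (2010b, 2011).

```python
import numpy as np

# A sketch of a simple recurrent (Elman-style) forward pass with the
# MCC's dimensions: 8 input, 24 hidden, 24 context, 1 output.
rng = np.random.default_rng(0)
W_ih = rng.normal(0, 0.1, (24, 8))    # input  -> hidden
W_ch = rng.normal(0, 0.1, (24, 24))   # context -> hidden
W_ho = rng.normal(0, 0.1, (1, 24))    # hidden -> output

def classify_case(phrase_vectors):
    """Feed a case in one phrase (an 8-dim vector) at a time."""
    context = np.zeros(24)
    hidden = context
    for x in phrase_vectors:
        hidden = np.tanh(W_ih @ x + W_ch @ context)
        context = hidden.copy()       # copy hidden values to context units
    return np.tanh(W_ho @ hidden)     # near +1 permissible, near -1 impermissible

case = [rng.normal(size=8) for _ in range(4)]  # e.g. agent, action, motive, consequence
print(classify_case(case))
```

The context vector is the “working memory” described above: it carries the result of processing earlier phrases forward into the processing of later ones.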

Fig. 1
figure 1

A schematic of the Moral Case Classifier (bottom left), with some hidden units labeled. Solid lines indicate feed forward connections; dashed lines indicate copy connections that work both ways. The values of the hidden units can be represented as a point in a state space (upper right). Given n hidden units, an n dimensional space is needed to represent one pattern of activation (or one set of values across the hidden units) as a single point in that n dimensional space

Table 1 Sample cases

Two strategies have been used in training the MCC. On one approach, the desired output is set to zero until the entire case is presented to the network. Call this straight training. On another approach, the network is being trained to classify the case as it is being presented to the network. Call this subcase training. Table 2 provides an example. Previous work (Guarini 2010b, 2011) has shown that training by subcases is vastly superior to straight training. In some instances, networks untrainable by straight training could be trained using subcase training. In instances where networks were trainable by straight training, subcase training was invariably faster. All simulations discussed herein make use of subcase training.
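
The difference between the two regimes can be made concrete with a sketch of the target schedules. This is a simplified rendering: how each subcase is labeled is left as an input, since in general each subcase is labeled on its own merits.

```python
# Straight training: for a case presented as T phrases with overall label
# `label` (+1 or -1), the desired output is zero until the final phrase.
def straight_targets(num_phrases, label):
    return [0.0] * (num_phrases - 1) + [label]

# Subcase training: a target is supplied after every phrase, namely the
# label of the subcase (the prefix of the case) presented so far.
def subcase_targets(subcase_labels):
    # one label per prefix of the case, e.g. [-1, -1, 1, 1]
    return list(subcase_labels)

print(straight_targets(4, -1))          # [0.0, 0.0, 0.0, -1]
print(subcase_targets([-1, -1, 1, 1]))  # [-1, -1, 1, 1]
```

Subcase training thus provides an error signal at every step of the sequence rather than only at its end, which is one natural way to understand why it trains faster.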

Table 2 Straight training versus subcase training

3 Recent Simulations

3.1 Motivating the State Space Approach

The most recent simulations run on the MCC have been done with 59 training cases and 267 testing cases. Analyses of the hidden unit activation vectors have been done with the intention of trying to understand better the kind of similarity at work in analogical reasoning in ethics. Let us have a look at an example of such reasoning.

In discussing the ethics of abortion, Thomson (1971) used the famous violinist example for a number of reasons. In this case, you are kidnapped, knocked unconscious and hooked up to a famous violinist who needs your kidneys to filter his blood; when you wake up, you are told that if you disconnect yourself, the violinist will die. Assume you have to remain hooked up for 9 months. At one point Thomson suggests that the violinist case is similar to the case of pregnancy resulting from rape. The idea appears to be that in both cases, one life has been made dependent on another through force. Some have claimed that in the case of the violinist, unplugging yourself and walking away amounts to allowing the violinist to die, and in cases of abortion, killing is taking place. According to Thomson there is enough similarity between the case of the violinist and the case of rape induced pregnancy that if it is morally permissible to “walk away” from the violinist (or allow the violinist to die), then permissibility also attaches to abortion (or killing the fetus) in the case of rape induced pregnancy.

In its training and testing, the MCC was given cases that are similar to the cases we have been examining. After the network is trained, we can examine its own internal representation of all the cases it has been given. Each of these internal representations has its own pattern of activation across 24 hidden units. That pattern of activation can be represented as a vector, and plotting that vector in a state space would require 24 dimensions, one for the value of each hidden unit (or each element in the vector). Each case can be represented as a point in a 24 dimensional space. This can be a desirable thing to do since algebraic and statistical techniques then become available for the analysis of these cases/points in the network’s moral state space. If we could develop a way of visualizing more than three dimensions of information, we might be able to see how the cases are related to one another in state space.

3.2 How to Do Things with Cones

Visualizing three dimensions of information is not that difficult; more than three is another story altogether. But it is not impossible, if we “cheat” a little bit. Instead of using a point, let us use a geometric solid. It turns out that a cone can be useful in this regard. Let us place a cone in a three dimensional space. See Fig. 2. The location of the center of the base of the cone represents three dimensions of information (just as a single point would). The diameter of the base represents another dimension of information, and the height of the cone still another dimension. Normalization is used to prevent the cone from becoming too wide or thin, or too tall or short. If a cone is too thin or short, it cannot be seen; if it is too wide or tall, it could hide or occlude too many other cones represented in the same space. Where the tip of the cone is pointing can be used to represent three more dimensions of information. We can colour the base of the cone, and that colour could represent three dimensions of information using red, green, blue colour coding. Normalization is also used to restrict the bounds of each of the constituent colours. Another three dimensions of information can be represented by colouring the shell of the cone. That is a total of 14 dimensions of information. I was a little naughty when I described this as cheating. Strictly speaking, there is no cheating going on. Normalizing and scaling are perfectly common strategies for representing information. Using the dimensions of a geometric solid and the colour coding of its surfaces to represent information is non-standard, but hardly inappropriate. Granted, it does have problems that using points does not have. For example, cones can overlap, and this can make it difficult to directly examine some cones. See Fig. 3. That is a price worth paying, though, since software can be used to rotate the space, zoom in, zoom out, delete specified cones, and engage in other manipulations of the space and its objects so that the desired information can be examined.
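
The encoding just described can be summarized as a mapping from a 14-dimensional vector to the drawing parameters of a single cone: 3 dimensions for the base-centre position, 1 for base diameter, 1 for height, 3 for tip direction, 3 for base colour, and 3 for shell colour. The sketch below is one way such a mapping might look; the normalization bounds are illustrative assumptions standing in for whatever bounds keep cones visible without excessive occlusion.

```python
import numpy as np

def normalize_columns(data, lo=0.0, hi=1.0):
    """Rescale each dimension across the whole data set into [lo, hi]."""
    d = np.asarray(data, dtype=float)
    mins, maxs = d.min(axis=0), d.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    return lo + (hi - lo) * (d - mins) / span

def cone_params(data14):
    """data14: (n_cases, 14) array; returns drawing parameters per cone."""
    d = np.asarray(data14, dtype=float)
    size = normalize_columns(d[:, 3:5], 0.2, 1.0)   # diameter, height kept visible
    rgb = normalize_columns(d[:, 8:14], 0.0, 1.0)   # two RGB triples
    tips = d[:, 5:8] / (np.linalg.norm(d[:, 5:8], axis=1, keepdims=True) + 1e-12)
    return [{"centre": d[i, 0:3], "diameter": size[i, 0], "height": size[i, 1],
             "tip_dir": tips[i], "base_rgb": rgb[i, 0:3], "shell_rgb": rgb[i, 3:6]}
            for i in range(len(d))]

pcs = np.random.default_rng(0).normal(size=(326, 14))  # stand-in for PC scores
print(cone_params(pcs)[0])
```

Normalizing each dimension over the whole data set is what prevents any one cone from being too thin, short, wide, or tall relative to the others.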

Fig. 2
figure 2

Using a cone to represent 14 dimensions of information. The colour of the base captures 3 dimensions of information, one dimension encoded as the red component (R), one as the green component (G), and one as the blue component (B). The colour of the shell can encode another 3 dimensions of information using the same technique. See Fig. 3 for rendering in full colour

Fig. 3
figure 3

326 moral cases in 14 dimensions

Why cones? 14 dimensions served my initial purposes since the early versions of the MCC had 14 possible phrases as input, and I was interested in assessing the contribution of these phrases or features. More elaborate simulations with more phrases are being carried out, and there are plans for using different visualization techniques in the future. In short, there is nothing special about cones. Cubes could have been used, and colouring their faces yields 18 dimensions of information on its own (6 faces × 3 colour dimensions). If we add three more dimensions for the location of the centre of the cube in three dimensional space, and one more dimension for the size of the cube, that gives us 22 dimensions. If we used rectangular solids, then instead of using size to code for one dimension, we could use size to code for 3 dimensions (length, width, and depth) which brings us to 24 dimensions of information. However, much more rotation of the space is required to see all the faces of the cubes or rectangular solids, which can be time consuming. Cones provided a useful compromise between the amount of information that could be represented and the ease with which that information could be viewed.

As indicated earlier, the MCC’s moral state space has 24 dimensions. (I could not get the network to train reliably with fewer than 24 hidden units.) If using cones allows us to visualize 14 dimensions of information, which 14 dimensions should we represent? Picking 14 of the hidden units at random and plotting them is not a promising strategy since that could well leave out units that play an important role. There is a better way. Principal component analysis can be used to compute 24 dimensions of statistical variance over the data in the original 24 dimensional space, with those dimensions ranked from greatest variance to least. The original space can be thought of as being rotated so that the dimension of greatest variance (first principal component) falls on the x-axis, the second dimension of greatest variance (second principal component) falls on the y-axis, and so on for the first 14 principal components.
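
In code, the projection might look like the following sketch, which uses a singular value decomposition to compute principal components; the random matrix is a stand-in for the MCC’s actual hidden unit activations.

```python
import numpy as np

# Project each case's 24-dim hidden unit vector onto its first 14
# principal components. H stands in for the real activations
# (326 cases x 24 hidden units).
rng = np.random.default_rng(0)
H = rng.normal(size=(326, 24))            # placeholder hidden-unit vectors

H_centered = H - H.mean(axis=0)           # centre the cloud of cases
U, S, Vt = np.linalg.svd(H_centered, full_matrices=False)
# Rows of Vt are the principal directions, ordered by decreasing variance.
scores = H_centered @ Vt[:14].T           # each case as a point in 14 dims
print(scores.shape)                       # (326, 14)
```

The rows of `scores` are what get handed to the cone visualization: the first three components fix a cone’s position, and the remaining eleven drive its size, orientation, and colours.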

3.3 Similarity in MCC’s Moral State Space

All the training and testing cases are represented in Fig. 3. Each cone is plotted using the first 14 principal components of the case it represents. In essence, this is a statistically reconstructed version of the MCC’s moral state space, designed to render salient information easy to view. Notice how the distinction between permissible and impermissible shows up neatly on the x-axis (the first principal component). It is important to note that the MCC was not given any information about the similarity or dissimilarity of cases in its training set. It generates the similarity space as a byproduct of solving the problem of learning to classify cases as permissible or impermissible. In short, we get the similarity space for free. This is interesting since there is a kind of poverty of stimulus with respect to instruction about similarity; we simply do not go around telling children in great detail what is similar to what, but they do recognize and use similarity nonetheless.Footnote 1 It does not follow that similarity spaces are innate, only that any learned similarity space is not the result of direct instruction about similarities. As we saw, such a space may be a byproduct of another learning task.

While further analysis needs to be done on this data set, preliminary analysis yields some interesting results. The MCC was trained so that violinist-type cases come out permissible: x is permitted to allow y to die so that x can obtain freedom from an imposed burden. Cases that involved x killing y so that x could be freed from an imposed burden were classified as impermissible—think of abortion in cases of rape induced pregnancy. Some have been persuaded by the similarity between the violinist case and rape induced pregnancy to change their views. The point in this paper is not to take a stand on such substantive issues. Rather, it is to try to understand how similarity can play the role that it sometimes does in analogical reasoning. How can we capture the idea that two cases which are initially classified in different ways are also being treated as similar?

In Fig. 3, red highlighting is used to pick out one case involving killing to free oneself from an imposed burden (i.e. an abortion-type case). This case shows up in the impermissibility region of the first principal component. The preceding notwithstanding, there is a way to see this case as more similar to the permissible cases than to the impermissible cases. But first, a brief mathematical digression is in order.

Measuring similarity is tricky business. There are many possible metrics, and it is not always obvious which one is the best to use. For forcefully articulated concerns regarding such matters, see Laakso and Cottrell (2006). Euclidean (or straight line) distance is only one possible metric. Have a look at Fig. 4a, b. To which cluster of points is X closest? If we use distance from the mean of a cluster, in Fig. 4a we would say that X is closest to cluster B. The centre points of the clusters are the means for those clusters. If we use straight line distance from the mean of a cluster in Fig. 4b, we get an intuitively odd result: X is equidistant from, or could belong to, either cluster. In Fig. 4b, the Euclidean distance between X and A is the same as the distance between X and B. Fortunately, Prasanta Chandra Mahalanobis defined a statistical (non-straight line or non-Euclidean) distance measure that allows us to say that X is closer to cluster A than to cluster B in Fig. 4b. This point can be put in different ways: X is closer to cluster A; X is more similar to cluster A; or even X belongs in cluster A. The metric is called Mahalanobis distance (MD), and it is computed over the covariance matrices of the data points in question. It can be generalized to n dimensions, which is useful for our purposes. Since Mahalanobis distance is the distance of a point to a mean, we do not need multiple clusters. We can examine one cluster and compute the Mahalanobis distance of selected points from the mean of that cluster. While this metric is often used for normal and multivariate normal distributions, it is sufficiently general that it can apply to distributions that fail to satisfy those constraints. Indeed, its applicability to many different types of distribution is a motivation for its use herein. I am unaware of any empirical work on human similarity judgments of ethical situations that speaks to what sorts of metrics may actually be at work. Given that MD does not make assumptions about the normality of distributions and how they may impact similarity, and given that there is no a priori reason for making such assumptions, it is a useful choice, for now. If it turns out that some other metric would be a better measure of scalar similarity between cases, so be it. There is no essential commitment here to MD. Finally, while there is a mathematical relationship between MD and principal components, MD should not be confused with distance along a single principal component. So while a case on the x-axis (first principal component) of Fig. 3 may be grouped with impermissible cases, its MD may place it closer to the permissible cases. There are other features of a case (such as its score on other principal components) which can affect the MD.
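
For concreteness, here is a sketch of the computation, along with synthetic data reproducing the situation in Fig. 4b: a point that is Euclidean-equidistant from two cluster means but Mahalanobis-closer to the more spread-out cluster. The clusters are invented for illustration.

```python
import numpy as np

def mahalanobis(x, cluster):
    """MD of x from a cluster: sqrt((x - mu)^T Sigma^{-1} (x - mu)).

    Distances are measured in units of the cluster's spread, so a gap
    along a direction of high variance counts for less.
    """
    mu = cluster.mean(axis=0)
    cov = np.cov(cluster, rowvar=False)
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

rng = np.random.default_rng(1)
A = rng.normal(0, 3.0, size=(200, 2))         # widely spread cluster, mean ~(0, 0)
B = rng.normal([10, 0], 0.5, size=(200, 2))   # tight cluster, mean ~(10, 0)
x = np.array([5.0, 0.0])                      # Euclidean-equidistant from both means
print(mahalanobis(x, A), mahalanobis(x, B))   # x is Mahalanobis-closer to A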

Fig. 4
figure 4

a (top), b (bottom) In a, if we use Euclidean distance, point X is closer to the mean (centre) of cluster B than to the mean of cluster A. In b, if we use Euclidean distance, X is equidistant from the means of the clusters. Still in Fig. 4b, using Mahalanobis distance, X is closer to the mean of cluster A. (Color figure online)

If we use MD as our metric, it turns out the abortion-type case identified two paragraphs back is closer to or more similar to the set of permissible cases than it is to the set of impermissible cases. This is a little unexpected given that the network classifies the case as impermissible at the output layer, so this is worth looking into a bit further.

3.4 MCC and Contributory Standards

The MCC has two sets of synaptic weights encoding what it has learned about how to classify situations. The first set contributes to the generation of hidden unit representations (out of which similarity space is constructed); the second set operates on the hidden unit representations to generate an output. If we are looking at the final outputs, and we want to attempt an analysis in terms of contributory standards, there are at least three possibilities we need to consider. First, killing is contributing to permissibility, and freedom from imposed burden is contributing to impermissibility, and on balance the case comes out impermissible. Second, killing contributes to impermissibility, and freedom from imposed burden contributes to permissibility, and on balance the case comes out impermissible. Finally, still another option is that the combination of killing and freedom from imposed burden jointly contributes to impermissibility. The first option is easy to rule out. While the network was never directly trained on cases like “Jack kills Jill” or “Jill kills Jack”, when it is tested on such cases, it delivers the output of impermissible, which makes it difficult to argue that killing, in general, contributes to permissibility. We can also use a vector the network has never seen in its training as a sort of blank or dummy vector to test it on cases like this:

“Jill _____ Jack and freedom from imposed burden results.”

When we do this, the output comes back permissible. This result, combined with the testing on killing, makes the second of the three options look well motivated: killing is contributing to impermissibility, freedom from an imposed burden is contributing to permissibility, and, at the level of the final output, the impermissibility contributed by killing is outweighing the permissibility contributed by obtaining freedom from an imposed burden. Still, someone might argue that it is not true that killing contributes one way, and freedom from an imposed burden contributes another way; rather, it might be the case that killing-resulting-in-freedom-from-imposed-burden is contributing to impermissibility as a unified whole. This is a very strong form of holism. Claiming that there is no independent contribution for killing and freedom from imposed burden is a little odd given the above results, but it is at least logically possible that when the features occur separately, they make an independent contribution, but when they occur together, there is no separate contribution of constituents; rather, there is a strictly indivisible contribution of the whole.
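
The probing methodology just described can be sketched as follows. The network here is untrained (random weights), so the printed numbers mean nothing; the point is the shape of the tests: single-feature cases the network never saw in training, and a case with one phrase blanked out by a dummy vector. The phrase names are illustrative.

```python
import numpy as np

# Probe a case classifier with (1) a bare killing case never seen in
# training, and (2) a case whose action phrase is replaced by a dummy
# vector never used in training, isolating the remaining features.
rng = np.random.default_rng(2)
W_ih, W_ch, W_ho = (rng.normal(0, 0.1, s) for s in [(24, 8), (24, 24), (1, 24)])

def classify_case(phrases):
    context = np.zeros(24)
    for x in phrases:
        context = np.tanh(W_ih @ x + W_ch @ context)
    return float(np.tanh(W_ho @ context))   # +: permissible, -: impermissible

PHRASE = {n: rng.normal(size=8) for n in ["jack", "jill", "kills", "freedom"]}
DUMMY = rng.normal(size=8)  # blank action: "Jill ____ Jack and freedom ... results"

print(classify_case([PHRASE["jack"], PHRASE["kills"], PHRASE["jill"]]))
print(classify_case([PHRASE["jill"], DUMMY, PHRASE["jack"], PHRASE["freedom"]]))
```

With the trained MCC, the first probe comes back impermissible and the second permissible, which is the evidence cited above for the directions of contribution.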

I will argue that the analysis of the state space militates against the holistic interpretation of the cases we are looking at. Even though the case we are considering (killing resulting in freedom from an imposed burden) is classified as impermissible, it is closer to permissible cases than impermissible cases. That needs explaining. If we look at the first principal component (the x-axis in Fig. 3), there is a revealing clustering pattern for the impermissible cases. Certain features appear to be playing the role of impermissibility-makers. In general, the more of the following features a case has, the further to the right it is on the x-axis: killing, doing something out of revenge, many innocents suffering, and the like. In general, the more of the following features a case has, the further to the left it is on the x-axis: doing something in self-defense, obtaining freedom from an imposed burden, relieving extreme suffering, and the like. These appear to be playing the role of permissibility-makers.Footnote 2 For now, let us consider another thought provoking example of clustering. The height of cones plots the fifth principal component. Let us consider the subset of permissible cases. In that subset, cases involving freedom from imposed burden have tall cones, as do cases in which the lives of many innocents are saved, and the combination of these two yields even taller cones. Cases involving self-defense also have tall cones, as do cases involving extreme suffering being relieved, and the combination of those yields the only cones as tall as those combining freedom from imposed burden with the lives of many innocents being saved. If it really were the case that killing-to-obtain-freedom-from-imposed-burden were contributing as an indivisible whole, the structure of the similarity space would become something of a mystery. Why does killing (reliably) contribute to a case ranking further to the right on the permissibility-impermissibility axis than similar cases not involving killing? Why does adding freedom from an imposed burden to a case move it further to the left on the x-axis? Why is there further clustering involving these and other features along other principal components? It is not clear how talk of indivisible wholes is helpful here.

Okay, okay, but how is it that by using MD, killing to obtain freedom from imposed burden comes out similar to the permissible cases at the level of hidden unit analysis but ends up being classified as impermissible in the final output? The above results lead me to the following abduction:

(a) the synaptic weights at the level of the hidden units (which is where the similarity spaces we have discussed are set up) make it the case that killing contributes to impermissibility, and freedom from imposed burden contributes to permissibility, and freedom from imposed burden outweighs killing at this level, and

(b) the synaptic weights at the level of the output maintain the directions of contribution of killing (impermissible) and freedom from imposed burden (permissible), but the magnitudes of the contributions change, so killing now outweighs freedom from imposed burden.

We need to explain the output classifications of cases containing both freedom from imposed burden and killing as well as outputs of cases involving just one of those features, and (b) helps us to do that. We also need to explain the structure of the similarity space, and (a) helps us to do that. In both cases, a weak contributory standard was used in the explanation.
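
A toy calculation may help to see how (a) and (b) can both hold. Both levels give killing a negative (impermissibility) direction and freedom from imposed burden a positive (permissibility) direction; only the magnitudes differ between levels. The numbers are invented for illustration and are not extracted from the MCC’s weights.

```python
# Both features present in the case under discussion.
killing, freedom = 1.0, 1.0

# Hidden level: freedom outweighs killing, so in similarity space the
# case sits nearer the permissible cases.
hidden_score = (-1.0) * killing + (+1.5) * freedom   # = +0.5 (permissible side)

# Output level: directions preserved, magnitudes changed, so killing
# now outweighs freedom and the final classification is impermissible.
output_score = (-2.0) * killing + (+1.5) * freedom   # = -0.5 (impermissible)

print(hidden_score, output_score)
```

Nothing about a weak contributory standard is violated here: each feature keeps a constant direction of contribution while the magnitudes vary between the two levels of processing.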

An alternative explanation might be offered as an objection. It might be thought that the above results are due largely to differences in the number of cases that have the features we have just been discussing. While appealing to the number of cases involving killing and freedom from imposed burden as somehow skewing things in one direction is a possible response, it does not work here. The number of such cases in the training set is 4, and the number of such cases in the testing set is 33, and they all come out impermissible. The number of cases involving allowing death and freedom from imposed burden in the training set is also 4, and the number in the testing set is also 33, and they all come out permissible. The number of cases involving freedom from imposed burden that end up impermissible is equal to the number of cases involving freedom from imposed burden that end up permissible. So if the number of cases involving killing and freedom from imposed burden is not skewing things, it is just not clear on a holistic approach why cases involving freedom from imposed burden and killing are not closer to other cases that are also classified as impermissible, such as killing to make money or killing out of revenge, than they are to permissible cases. Saying that “freedom from an imposed burden pulls such cases closer to the permissibility subspace” refers to the contribution of a constituent, and this move is not open to an approach that treats cases as indivisible wholes. Perhaps an answer to this concern is forthcoming, but without one, the contributory approach is better motivated as an interpretation of the MCC results we are looking at than the indivisible whole approach.

Why care about hidden unit representations? The MCC only performs a low-level classification task. It does not do any high-level reasoning or reflection. However, it is conceivable that hidden unit representations of the MCC could be fed into another process that does high-level tasks. The properties of the low-level representations would then be available to the high-level processes.Footnote 3 Contributions made by specific features or relations would be among these properties since the information is implicit in the representations. Something else that might become available to the high-level process(es) is information about how one case is related to another case or even clusters. In considering all these matters, it is useful to distinguish between implicit contributory standards and explicit ones. In the above argument, the point is that it is useful to think of the MCC in terms of contributory standards that are implicit in the synaptic weights of the network. If another system were to extract that information and represent it to itself, then the standards have been made explicit. Of course, the process of attempting to render the implicit standards explicit need not be infallible, which could lead to genuine surprise in the system (or agent) when a mismatch is discovered between more or less automatic (low-level) classification of cases and the deliberate, linguistically mediated (high-level) reflection about the standards we use to classify cases. Different resources may be applied at different levels. This distinction between implicit and explicit uses of contributory standards could be used to make sense of analogical reasoning and argument. There may be a kind of low-level, implicit or intuitive sense of what is similar to what (which could be understood in terms of implicit contributory standards), and these intuitions may inform the high-level, reflective attempts to reason and argue with similarity, which would involve linguistically articulated appeals to contributing factors or standards. Guarini (forthcoming) discusses such an approach in more detail. Of course, this high-level reflective work may also feed back to the low-level classification processes and modify what we consider similar to what. I see this general approach as compatible with both relativistic and non-relativistic approaches to ethics; see Guarini (2010b) for a brief discussion.

At this point it might be objected that the use of MD and principal component analysis (PCA) is ad hoc, and that any argument against interpreting the MCC’s behaviour in a holistic manner is based on ad hoc premises. Given the ability of MD to be applied to different types of distribution (as discussed above), there is some reason to think that it is not ad hoc as a starting place. PCA is widely used in examining the behaviour of neural networks and the state spaces they set up, so there is some motivation for this technique as well. However, to respond to this concern more fully, we need to separate the issue of which metric to select from the issue of using contributory standards in understanding similarity. It is difficult to contest that one of the many desiderata of a computational model of moral recognition, reasoning, and argument would be the ability to deal with similarity. We may well have to use other metrics; so be it if there are better metrics to use. The bigger issue is how we make sense of similarity. The moral of much of the above argument (and some of what is to come) is that indivisible wholes make it difficult to understand overall similarity and difference between cases since that similarity seems to depend on the constituents of the cases and how those constituents interact. Other metrics may well lead to different results for what counts as similar to what. However, the notion of a contributory standard is compatible with many possible metrics, so challenging a given metric does not go very far in challenging the arguments herein. To defend the kind of holism considered above, one would have to show not only that a different metric would yield different results, but that it is somehow possible to make sense of similarity without the idea of contributory standards of any kind.

4 The Nonlocality of Reasons

A preliminary analysis of the MCC suggests that a certain kind of holistic interpretation of what it is doing simply does not work. It also suggests that interpreting the state space in terms of contributory standards is useful. We should not be satisfied with any of this. It turns out that what contributory standards might be is a rather complicated thing. Section 4.1 provides a more sophisticated account of such standards. It also turns out that holism is rather complicated as there are a number of different ways of conceiving of what it might be. Three types of holism are considered in Sect. 4.2, including a moderate form endorsed by Dancy. The variability of relevance is closely linked with Dancy’s holism. In Sect. 4.3, it is argued that a non-holistic and nonlocalFootnote 4 conception of reasons can allow for quite a bit of variable relevance without going all the way to holism. The idea that will emerge is that there is a range of possibilities that starts with an atomistic or local conception of reasons, moves through increasingly nonlocal conceptions of reasons and ends at the strongest forms of nonlocality that are holistic. The more a given feature depends on the presence of other features for the contribution it makes, the more nonlocal it will be said to be. It will be shown that there are interesting forms of nonlocality that stop short of various forms of holism.

4.1 Enablers, Disablers, Attenuators, and Amplifiers

There are different ways of thinking about contributory standards. On a weak conception, a contributory standard specifies that some feature contributes either to permissibility or to impermissibility, but not both.Footnote 5 The extent to which the feature contributes to permissibility (or impermissibility) is not specified. In other words, the direction of the contribution is specified, but the amount of the contribution is not. On a strong conception of contributory standards, the extent to which some feature contributes in a given direction is specified. This second conception embodies a very local, atomistic if you will, conception of reasons. Both the direction and the amount of contribution function in the same way in all cases. To find out whether an action is permissible or not would simply require adding up the reasons for and against, each of which makes a fixed contribution one way or the other, to see what we get. The first conception of contributory standards is weaker. While it allows that some features make a contribution in a specific direction, the extent of that contribution can vary. Let us examine this first conception in more detail.

While a given feature φ may contribute to permissibility, some other feature α may amplify or increase the extent to which φ contributes in the direction it does. Still another feature ω may attenuate or diminish the extent to which φ contributes in the direction it does. The same can be said for features that contribute to impermissibility. For example, for many, preplanning appears to act as an amplifier with respect to the impermissibility of killing. If you are of the view that killing contributes to impermissibility, and preplanning (or premeditating the killing) makes it even worse, then preplanning is functioning as an amplifier for killing since it is increasing the amount of its contribution in its default direction. Other features may act as attenuators on killing, diminishing the extent to which it contributes to impermissibility. To complicate matters further, a single feature may act as an amplifier under some circumstances, but as an attenuator under other circumstances. Let us consider, again, preplanning. Intuitions do vary on these matters, so the following example is used not because everyone would agree, but to show how the variability in question might work. If someone plans in great detail the robbing of a bank, and much of the planning goes into how to kill the bank tellers even if they offer no resistance, then the planning may amplify the impermissibility contributed by killing. Let us consider a very different scenario: a just war, one fought in self-defense. Country X has been attacked and is acting in self-defense, and much planning goes on with respect to how to kill, in an ethically acceptable manner and in accordance with the laws of war, members of the invading army. This kind of planning in this context may attenuate the impermissibility contributed by killing (and that impermissibility may be outweighed by the permissibility contributed by saving the lives of many innocents). Killing people willy-nilly in war (or any other context) is reprehensible. In some contexts where killing can, all things considered, be justified, there is a danger that it might get out of hand. War is such a context, and planning the killing in these contexts might actually attenuate the impermissibility contributed by killing (by preventing it from getting out of hand). So, perhaps planning acts as an amplifier to some φ under circumstances cα and as an attenuator under circumstances cω. In short, φ can have enabling and disabling conditions (conditions required for φ to make any contribution at all, and conditions that prevent it from doing so); it can have amplifiers and attenuators; and whether something counts as an amplifier or attenuator depends on the details of the circumstances. To keep things simple, for now, we will assume that if we have a series of contributors φ1, φ2, φ3,… φn, the φi are neither among the enablers and disablers nor among the ci. In other words, if φ1 contributes to permissibility subject to enablers and disablers, then the rest of the φi are not stated in the list of enablers and disablers, and they are not functioning as attenuators or amplifiers.
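
One way to make this machinery concrete is the following sketch of a weak contributory standard: direction fixed, contribution gated by disablers, magnitude scaled by context-dependent modifiers. The features, circumstances, and multipliers are invented for the example.

```python
def contribution(direction, base, context, disablers=(), modifiers=()):
    """Weak contributory standard for a feature phi.

    direction: +1 (permissibility) or -1 (impermissibility), fixed.
    modifiers: (feature, circumstance, factor) triples; the same feature
    can amplify (factor > 1) under one circumstance and attenuate
    (factor < 1) under another, as with preplanning below.
    """
    if any(d in context for d in disablers):
        return 0.0                        # a disabler silences phi entirely
    magnitude = base
    for feature, circumstance, factor in modifiers:
        if feature in context and circumstance in context:
            magnitude *= factor           # magnitude changes; direction never does
    return direction * magnitude

MODIFIERS = [("preplanning", "bank_robbery", 2.0),   # amplifier here
             ("preplanning", "just_war", 0.5)]       # attenuator here

# Killing contributes toward impermissibility (direction -1) in both cases,
# but preplanning pushes the magnitude in opposite ways.
print(contribution(-1, 1.0, {"preplanning", "bank_robbery"}, modifiers=MODIFIERS))  # -2.0
print(contribution(-1, 1.0, {"preplanning", "just_war"}, modifiers=MODIFIERS))      # -0.5
```

Note that in both calls the sign of the contribution is unchanged; only its magnitude varies with circumstance, which is exactly what the weak conception permits.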

Let us say that claims or a list of conditions are surveyable if they can be used as a premise, part of a premise, or as premises in practical reasoning.Footnote 6 Once the claim or the list becomes too long to be used as a premise in practical reasoning, we will say that it is unsurveyable. Given some φ, we can ask whether either of the following are surveyable:

(i) the enabling and disabling conditions for φ, or

(ii) the cα and cω for φ.

We can see the nonlocality of reasons increasing if one of (i) or (ii) is not surveyable (since that would mean that the contribution of φ depends on more and more things other than φ). If neither (i) nor (ii) is surveyable, then nonlocality increases even more. The longer the list of qualifiers with respect to the functioning of some φ, the more nonlocal φ is as a reason to allow, do, or refrain from doing something. It is my view that the unsurveyability of (i) undermines the possibility of a weak contributory standard in a way that the unsurveyability of (ii) does not. A weak contributory standard only states the direction of the contribution, not the magnitude; more specifically, it can allow the magnitude of that contribution to vary from case to case (as long as the direction remains constant). So even if the list of attenuators and amplifiers for some φ is unsurveyable, as long as the list of enablers and disablers is surveyable, we could still state a standard to the effect that, subject to a surveyable list of enablers/disablers, φ contributes in a specific direction (though the magnitude of that contribution may vary in ways that cannot be stated surveyably). If it turns out that the list of enablers and disablers is unsurveyable, then we run the risk of saying that sometimes φ contributes in one direction, and sometimes not. Leaving it at that would undermine the possibility of a contributory standard for φ, at least to the extent that we conceive of such a standard as something that could be used to guide us in practical reasoning. (See Sect. 4.3 for further reflections on these matters.) If we use the word “principle” in a strictly stipulated sense, we might say that some contributory principle could be true even if neither (i) nor (ii) is surveyable. The idea is that, in principle, a true description exists of the circumstances under which φ makes a contribution in a given direction. However, from the truth of a contributory principle for φ, it does not follow that there is a contributory standard for φ, a standard that could inform reasoning or conduct.

Thus far we have ignored the interaction effects among the various φi, but something does need to be said about them. There are at least two kinds of interaction effect we can consider. First, some of the φi may act not only as contributors, but also as amplifiers and attenuators to the other φi. Second, some of the φi may act not only as contributors or as a reason for doing or not doing something, but also as an enabler or disabler to other φi. Let us consider each of these in turn.

Some amplifiers and attenuators may have their effects on some φ but not make a contribution on their own, and some amplifiers and attenuators may actually be among the φi that make contributions on their own as well. In this latter case, we have the first kind of interaction effect just listed above. Let us consider φ1: “doing something in defense of your own life.” Subject to enablers and disablers, we will say that φ1 contributes to permissibility. Say that φ2 is “killing a human being.” Subject to its own enablers and disablers, φ2 contributes to impermissibility. Over and above that, it is also possible that φ1 has an attenuating effect on φ2. In other words, whatever φ2 contributes in the absence of attenuators and amplifiers, that contribution is diminished in the presence of φ1. φ2 still contributes to impermissibility, but less so in the presence of φ1. This interaction effect between φ1 and φ2 does not, on its own, undermine the idea of a weak contributory standard, which simply states the direction of a contribution but not the magnitude of the contribution. This would be true even if the list of such interaction effects is unsurveyable.

Things are different with the second kind of interaction effect. Here, we can consider the possibility that some of the φi may actually enable or disable some of the other φi. I will argue that if we allow for that, problems are created for contributory standards.

As I have discussed elsewhere (Guarini 2010b, pp. 393–394), Dancy has a well-considered objection to contributory standards. Isolation is often used as a test for the alleged contribution of some φ.

To figure out in which direction φ contributes, ask this: if φ were isolated from all other relevant features, how would it contribute? Dancy argues that the question makes no sense. Say that “ought implies can” is among the relevant enabling conditions for promise keeping to contribute to permissibility. If it is impossible for you to keep the promise due to circumstances beyond your control, we will say that this disables the breaking of the promise from contributing to impermissibility. Now, such enabling/disabling conditions are among the relevant considerations we would screen off if we screened off all other relevant matters. What would it mean to ask, “How does promise keeping (or breaking) contribute independent of all other relevant information?” Alternatively, “How does promise keeping (or breaking) contribute if it is the only relevant consideration?” If information about enablers/disablers is relevant, and if we are not allowed to consult it, then it is hard to know what to say about the preceding questions because the contribution of promise keeping (or breaking) seems to depend on some other enabling and disabling considerations. In the aforementioned paper, I agreed that, as traditionally conceived, there is a problem with the isolation test since it requires screening off all other relevant information. I also argued that it could be modified. Instead of asking what contribution some φ makes when isolated from all other relevant considerations, ask in what direction some φ contributes in the absence of the rest of the φi but not in the absence of enabling/disabling conditions. (For example, to assess the contribution of killing, isolate it from promise breaking, lying,…, but not from the enablers/disablers relevant to killing.) The idea is that if we have access to the enablers/disablers with respect to some φ, then we can say something about the direction in which φ contributes. This modified isolation test allows us to speak to the direction of contribution of some φ, but not in a way that requires isolation from everything that is relevant, only isolation from the effects of the other φi.

There is a concern with the above strategy: it can only work if the φi themselves are not among the enabling/disabling conditions. To say that determining the direction of contribution of some φ requires isolating it from the remaining φi but not from the enablers/disablers can only make sense if the remaining φi are not among the enablers/disablers. So, allowing the φi to serve not only as contributors but as enablers/disablers appears to undercut the possibility of formulating an isolation test for contributors. If there are contributory standards, and if they are surveyable, and if we are to use some sort of isolation test to identify the direction of contribution, then to a first approximation, it would be correct to say that the contributors, i.e. the φi, cannot be among the enablers and disablers. To concede such a result would be to further constrain what could count as contributory standards,Footnote 7 but not in a way that obviously undercuts the serviceability of such standards.

4.2 Holisms: Moderate, Extreme, and Ultra

Dancy distinguishes between two kinds of holism: moderate and extreme. I will argue that extreme holism allows for different interpretations, and the stronger of the interpretations I will call ultra holism. It will be argued that extreme and ultra holism are incapable of giving an account of how we generalize from cases we have learned, or figured out how to classify, to cases we have not seen before. The critique of Dancy’s moderate holism will be different. Dancy allows for default reasons, and he uses an explanation test to determine when default reasons have switched their polarity. The explanation test will be shown to be problematic. It will also be shown that there are differences between contributory or pro tanto reasons and Dancy’s default reasons.

4.2.1 On Ultra Holism

The following is how Dancy (2004, p. 191) makes the distinction between extreme and moderate holism:

The extreme claim is that no feature is a reason except in context, and the status of every reason as a reason is contextually grounded. The moderate claim is that there can be such a thing as a default rational polarity of a feature, though this can be reversed or annulled by other features of the context.

Dancy allows for what he calls a default polarity for reasons, but he does not think that we can formulate contributory standards with those default polarities. Instead of writing of reasons as contributors, Dancy refers to reasons as favourers and disfavourers. The default polarity of a reason is its default tendency to favour or disfavour doing some action. Dancy’s moderate holism only requires that the default polarity can be reversed or annulled by other features of the context. His talk of favourers and disfavourers should not be confused with talk of contributors or contributory standards. Dancy rejects the isolation test and any other way of formulating contributory standards.

Let us start by looking at the distinction between extreme and moderate holism. Say that we have a very long list of contributors: φ1, φ2, φ3,… φn. Let us also say that for any φ we consider, the list of its enabling and disabling conditions is not surveyable, and let us say that its associated ci are not surveyable. Let us further imagine a neural network that is trained to classify moral situations despite the complexity of the contributions being made by the φi. All of this still does not show that extreme holism is correct. Extreme holism says that some specific φ does not make a contribution in a given direction (permissible or impermissible) on its own; it only does so in the context of all the other features that are present. Let us consider the example from Sect. 3.4 and add assumptions about the unsurveyability of qualifiers, where by the unsurveyability of qualifiers is meant the unsurveyability both of enablers/disablers and of attenuators/amplifiers. It is not simply that

(a) killing (subject to a list of unsurveyable qualifiers) makes a specific contribution;

(b) gaining freedom from an imposed burden (subject to a list of unsurveyable qualifiers) makes a specific contribution;

(c) each of the contributors may be qualifying the extent to which the others are contributing, and

(d) the final classification is the result of the contributions of each of the contributors.

Extreme holism goes beyond this. It says that killing only serves to favour or disfavour in a context, but there are at least two different ways in which we could understand that view. First, we could take the view that killing-to-obtain-freedom-from-imposed-burden is contributing as an indivisible whole. Since there are no constituents contributing in different directions, if we were to think in terms of that case’s similarity to other cases, there is no reason to think that this case is more or less similar to any other case. Overall similarity between cases is a function of both local similarities and differences between the cases (or which of the φi they have in common, and which they do not). On the approach we are considering, there is no such thing as one φ contributing one way and another contributing in another way; there is just the indivisible whole making its contribution. Let us call this ultra holism. Without local constituents contributing in different ways, it is hard to see how ultra holism could make sense of similarities between cases, since such global assessments depend on similarities and differences in the constituents of cases. Ultra holism essentially says that the constituents of cases do not matter; the case is an indivisible whole. If constituents do not matter, that seems to preclude an informative similarity (i.e. one with variation) since there do not appear to be any dimensions along which to define variation. To think of it in terms of similarity space, say that we have three cases that we are plotting in a two dimensional space, and our overall metric for similarity between any two cases is Euclidean distance. Now, place the three cases on the vertices of an equilateral triangle, one case on each vertex—they are all equidistant from one another. We can define higher dimensional structures in higher dimensional spaces to achieve the same equidistance relationship between larger numbers of cases, as the sketch below illustrates. If ultra holism can allow for any sort of similarity space, it appears to be committed to the preceding sort of similarity space, which is a kind of limiting or degenerate example of such spaces since there is no variation in similarity. Indeed, it is hard to imagine how a being or system could learn cases of this sort in a way that allows for generalization to new cases since the learned cases have no informative (i.e. varying) similarity relations holding between them such that, when a being or system gets a new case, it can be classified based on its similarity or dissimilarity to other cases (because there is no basis for such similarity assessments). If ultra holism were to allow for dimensions of variation in similarity, then it would turn into either a form of extreme holism or moderate holism.
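
The higher-dimensional construction is easy to exhibit: for any n, the n standard basis vectors in R^n are pairwise equidistant (Euclidean distance √2), so arbitrarily many cases can be placed so that no case is more similar to any one case than to any other. The sketch below checks this for n = 5.

```python
import numpy as np

# One case per vertex of a regular simplex: the rows of the identity
# matrix in R^n are mutually equidistant, so the resulting "similarity
# space" carries no information about which cases resemble which.
n = 5
cases = np.eye(n)
dists = {(i, j): np.linalg.norm(cases[i] - cases[j])
         for i in range(n) for j in range(i + 1, n)}
print(set(np.round(list(dists.values()), 10)))   # a single value: sqrt(2)
```

This is the degenerate similarity space in the argument above: distances exist, but they vary in no way that could guide the classification of a new case.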

4.2.2 On Extreme Holism

Let us reserve the term “extreme holism” for something weaker than ultra holism. Say that in the case under consideration, there is no default contribution for either killing or obtaining freedom, but when we combine the two in a case, each contributes in a different direction. There is no surveyable way to state a default polarity since such things do not exist (so contributory standards do not exist), but features can contribute in a specific direction in a given context. Call this an extreme holistic approach. It allows us to talk about similarities and differences between cases because features can make contributions in different directions.

If there are no default contributions, it is hard to see how a system could ever learn to generalize to new cases. For example, killing is not a default reason for doing anything, and saving the lives of many individuals is not a default reason for doing anything, but when we combine them, suddenly we have a reason for doing something. If this were true for all cases, it looks like the classification of every case would have to be stored separately; how, then, do we generalize to cases we have not seen before? When we learn to classify situations, we are not just learning that, say, situations in which φ1 and φ2 are jointly present are permissible, that situations in which φ3, φ4, and φ5 occur together are impermissible, and so on for many other φi and their inclusion in cases. That would make us massive lookup tables, having no generality of any kind, default or otherwise, to bring to the classification of new situations. Dancy (2004, pp. 112–113 and 184–185) shows sensitivity to this concern, and it appears to be a motivator for his discussion of default values.[8] To see the point another way, think of being on the receiving end of a case which you have to classify. Let us say you have no defaults or contributors or generalities at all to bring to bear on the case in question. How is it that you classify it? Objection: but look, extreme holism allows for similarity relations because it allows for features to contribute in varying ways in the context of different cases, so those similarity relations can be brought to bear on new cases. Reply: when we think through what it means to bring such similarity relations to bear, it appears to mean that we are at least applying default tendencies, something extreme holism rejects. Let us see why this is so.
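The contrast between a pure lookup table and even the crudest similarity-based generalizer can be put in a few lines of code (the binary feature encoding and labels below are my own illustrative inventions):

```python
import numpy as np

# Toy encoding: each case is a binary vector over hypothetical features
# (phi1 = "killing", phi2 = "saving many lives", phi3 = "consent given");
# labels: 1 = permissible, 0 = impermissible. All values are invented.
train_cases = np.array([[1, 1, 0],
                        [1, 0, 0],
                        [0, 1, 1],
                        [0, 0, 1]])
train_labels = np.array([1, 0, 1, 1])

# A pure lookup table: classifications stored case by case.
lookup = {tuple(case): label for case, label in zip(train_cases, train_labels)}

new_case = (1, 1, 1)  # a combination of features never seen before
print(lookup.get(new_case, "no stored classification"))  # the table is silent

# A crude similarity-based classifier generalizes by importing the label
# of the nearest stored case -- which already presupposes that stored
# features carry significance that can be exported to new cases.
dists = np.linalg.norm(train_cases - np.array(new_case), axis=1)
print("classified as:", train_labels[np.argmin(dists)])
```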

Using Dancy’s language, similarity relations are made possible by some features counting as favourers, and other features counting as disfavourers, with overall similarity and difference between the cases being a matter of how those favourers and disfavourers are working in the various cases. To bring those similarities to bear on new cases just is to apply the favouring or disfavouring tendencies of reasons. If the combination of φ1 and φ2 (call it case-φ1φ2) bears similarity or difference relations to the combination of φ2, φ3, and φ4 (call it case-φ2φ3φ4), and those relations are brought to bear on some new case involving φ1 and φ4 (call it case-φ1φ4), we need to ask how that can be. We cannot take the polarity of φ1 from case-φ1φ2, and the polarity of φ4 from case-φ2φ3φ4, and apply those defaults to case-φ1φ4 because—by hypothesis of extreme holism—we are not supposed to have any idea what the tendencies of the φi are outside of the cases in which they occur. It will do no good to say, “well, when we see φ1 and φ4 in the context of case-φ1φ4, we just know that they play the same role there as they did in some other cases.” The issue is precisely how we come to be aware of such things, especially if we are not importing information about defaults or tendencies or some such from older cases we have already learned to new ones. To say that we are applying an understanding of overall similarities from old cases to new ones is problematic because those overall similarities are understood in terms of the directions of impact of the φi, so any application of overall similarity between cases requires taking the contributions of the φi out of their original cases—which we are not supposed to be able to do on pain of becoming moderate holists—and applying them to the new case. In short, if extreme holism allows for the application of some similarity space to new cases, it appears to collapse into moderate holism, which does allow for default reasons; if extreme holism does not allow for the application of its similarity space to new cases, then it has no account of how to generalize from what it has learned to new cases—the same catastrophic predicament shared by ultra holism.

4.2.3 On Moderate Holism

Moderate holism does allow for default polarities, but the polarities are subject to change in a given context. We need to examine both the idea that polarities can change, and the idea of what these polarities are. First, consider what it would mean for the polarities to change in a context. Using the above example, say, for the sake of argument, that the influence of obtaining freedom could change the default direction in which killing contributes or simply send killing’s influence to zero. In such a situation, it looks as if obtaining freedom is both favouring (contributing toward permissibility) and disabling killing from functioning in its default way. (In the case under consideration, I would claim that killing is not disabled; I am simply considering the possibility of disabling it to draw out a difference between moderate and ultra holism.) So obtaining freedom is both a favourer and, at the same time, a disabler of some other feature. Freedom from an imposed burden is functioning as a favourer while sending the impact of killing to zero. This is still different from ultra holism because it is allowed that φx is making its own contribution. In terms of similarity assessments, that matters, because overall similarity between cases depends on constituents. Of course, moderate holism can also allow for φx to contribute in one direction and φy to contribute in another, just like extreme holism. The difference between the two is about default values.

If some φ has a default polarity, the marker of it contributing in a non-default way is that it needs explaining (Dancy 2004, p. 113). If a feature has no default disposition, then when placed in a context where it does give us a reason to do something, that is what needs explaining. I will focus on default reasons and the alleged need for an explanation when they no longer carry their default force. The problem with identifying default polarities in this way is that it may not pick out what is needed. Say we are told that Jill kills Jack, and the person telling us this thinks that it is morally permissible, and we are not given any other information. Most likely, we would ask for an explanation. We are then given an explanation: Jill killed Jack to save her own life and to save the lives of others. So what do we say about killing in this context? Does it continue to contribute in its default way, or have the other features disabled the default contribution of killing? When I suggest to colleagues that killing can continue to contribute in its default way, I am frequently pressed to explain why. Consider a standard cave escape scenario:

Case 1

There are many people in a cave, and a large individual accidentally gets trapped in the entrance, obstructing the only way out. The only way to save everyone’s life is to kill the individual blocking the entrance. The individual is killed, and everyone is saved.

Now consider two variations.

Case 2

As above, except that it is possible to save everyone by gravely, but not mortally, wounding the obstructing individual. The individual is removed from the entrance, thereby gravely wounding him, and everyone is saved.

Case 3

As above, except that the obstructer simply needs to be asked to exert himself a little, and the entryway will be cleared. The obstructer is asked to exert himself; the entryway is cleared; no injuries are incurred, and everyone is saved.

Morally speaking, Case 1 is more similar to Case 2 than it is to Case 3. If someone were not sure whether killing was acceptable in Case 1, and interlocutors wanted to argue from analogy that it is permissible, they might have a chance if they used Case 2 as a source case for permissibility and then argued for the permissibility of the action in Case 1, but it would seem almost perverse to use Case 3 as a source case and argue from the permissibility of asking someone to exert himself to the permissibility of killing someone in Case 1. The reason for all this is that, for the purpose of arguing for permissibility or impermissibility, Case 3 is more similar to Case 2 than to Case 1. If the similarity relation is as I stated it, this should not be a surprise. A plausible way of making sense of the similarity relations is to say that in Case 1, killing contributes to impermissibility even if that is outweighed by saving the lives of many innocents, which contributes toward permissibility. In Case 2, gravely wounding someone contributes to impermissibility, but again it is outweighed. In Case 3, there is nothing contributing to impermissibility, which is why it is so much more morally dissimilar from Case 1 than Case 2 is. Indeed, to claim that saving the lives of many innocents somehow disables the negative contribution of killing by sending it to zero would do great violence to our views on similarity in the above cases, since we would lose the ability to explain the rather significant differences in similarity. (To do even greater violence, imagine that the polarity of killing was reversed and now contributed to permissibility.)
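The similarity ordering just defended can be mimicked in a toy state space. In the sketch below (Python with NumPy), the three cave cases are encoded as signed contribution vectors; the dimensions and magnitudes are invented for illustration and are not the MCC’s encodings:

```python
import numpy as np

# Hypothetical contribution profiles for the three cave cases.
# Dimensions: [harm inflicted (toward impermissibility),
#              lives saved (toward permissibility)].
case1 = np.array([-0.9, 0.8])   # killing the obstructer; everyone saved
case2 = np.array([-0.6, 0.8])   # gravely wounding him; everyone saved
case3 = np.array([ 0.0, 0.8])   # merely asking him to move; everyone saved

d12 = np.linalg.norm(case1 - case2)
d13 = np.linalg.norm(case1 - case3)
d23 = np.linalg.norm(case2 - case3)
print(f"d(1,2)={d12:.2f}  d(2,3)={d23:.2f}  d(1,3)={d13:.2f}")
# d(1,2)=0.30 < d(2,3)=0.60 < d(1,3)=0.90: Case 1 is closer to Case 2 than
# to Case 3, and Case 3 is closer to Case 2 than to Case 1, as in the text.
# Sending killing's contribution to zero (the "disabling" reading) would
# set case1 equal to case3 here, destroying exactly these differences.
```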

The approach to contributory standards sketched in Sect. 4.1 allows that a feature can continue to contribute to impermissibility even if the action as a whole is, all things considered, permissible. As we saw in this section, Dancy’s moderate particularism can allow for this by using default reasons.[9] It might be claimed that in the above cases, we need to focus on the killing, and since killing continues to contribute to impermissibility, it does not need an explanation, so the explanation test is (allegedly) intact since it is only when the default polarity actually changes that we need an explanation (of that change). However, sometimes the very constancy of the direction of contribution of killing may be in need of explanation. If you have the intuition that killing is, all things considered, permissible in Case 1, then you may not (at first) have held the view that killing continues to contribute to impermissibility in that case. For individuals who have the intuition that killing had its polarity neutralized in Case 1, an explanation (which I provided using contrasting cases) is required for why we should say that killing continues to contribute in its default way, and that is contrary to what the explanation test requires. Default reasons are not supposed to require explanation if they contribute in their default way. Sometimes, it is not obvious in what direction some feature is contributing, even if it is contributing in its default direction.

Dancy associates default reasons with the explanation test; contributory or pro tanto reasons (or standards) are sometimes associated with some sort of isolation test. Default reasons do not admit of a surveyable statement about when they count in their default way and when they do not. To a first approximation,[10] contributory standards do allow for a surveyable statement of when a reason contributes in a given direction. That surveyability notwithstanding, it turns out that much of the work that Dancy wants done can be carried out with contributory standards. Provided a contributory standard is qualified with enabling conditions, such a standard can easily allow for the contribution of a feature to change. Allowing for some variable relevance—a point that is near and dear to the particularist’s heart—is not a problem for someone defending contributory standards. How nonlocal one believes reasons to be turns in large part on how much variable relevance one is prepared to accept. Even a defender of contributory standards can allow for quite a bit of variable relevance given the way enablers/disablers and attenuators/amplifiers work.
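To fix ideas, a qualified contributory standard can be given a minimal formal sketch. The function below (Python; the parameter names and numbers are my own illustrative constructions, not anything drawn from the MCC) shows how a single feature can keep a default polarity while enablers/disablers and attenuators/amplifiers vary its relevance:

```python
def contribution(default_polarity, enabled=True, amplifier=1.0):
    """Signed contribution of one feature in a context.

    default_polarity: +1.0 favours permissibility, -1.0 disfavours it.
    enabled: False when a disabler is present in the context.
    amplifier: context-sensitive scaling (attenuators: 0 < a < 1;
               amplifiers: a > 1).
    """
    return default_polarity * amplifier if enabled else 0.0

# killing as a default disfavourer, attenuated but not disabled:
print(contribution(-1.0, enabled=True, amplifier=0.5))   # -0.5
# the same feature with a disabler present contributes nothing:
print(contribution(-1.0, enabled=False))                 # 0.0
```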

4.3 Taking Stock: Nonlocality, Contributory Standards, and the MCC

What we have seen in Sect. 4 of this paper is that there are different ways for reasons to be nonlocal. A simple form of atomism asserts that some feature always contributes in a specific direction and to a specific extent; this is a maximally localist view. As soon as we allow the extent of that contribution to vary with amplifiers and attenuators, then our conception of reasons is no longer just about a specific atom because to determine a contribution, we are not looking very locally at one thing, but are looking nonlocally by examining how different things interact. We can push this further by saying that we need lists of enabling and disabling conditions for a feature to count in a particular direction. This leads us to take a perspective that is even more nonlocal in order to assess the contribution that some feature makes. If the lists of enablers/disablers and amplifiers/attenuators are unsurveyable, then we have even more nonlocality. Moderate holism, extreme holism, and ultra holism explore even more nonlocal conceptions of reasons. There are various ways of subscribing to the nonlocality of reasons without going all the way to the various forms of holism we have considered.

In Sect. 3, it was argued that contributory standards could be used to make sense of the similarity space set up by the MCC. It is completely compatible with that view that some contributors are also acting as attenuators or amplifiers; thus far, insufficient analysis has been done to determine whether this is so. What should be noted is that not only could we use tools from traditional second-order moral philosophy to inform how we understand the state space set up by the MCC, we could also use the tools of computational modeling to shed some light on the different ways of understanding the nonlocality of reasons. How could we see similarity between cases if we are not deducing it from some exceptionless or total rule? Well, it might be that in the process of learning to classify situations, we produce a similarity space (or spaces) as a by-product of learning to do the classification. We might be sensitive to the distance relations of the cases in that space even if we are not able to formulate some total, substantive moral rule that allows us to monotonically deduce the classification of the cases. Say that the list of enablers for contributory standards is long but surveyable, and say that the list of amplifying and attenuating considerations is unsurveyable. It may turn out to be quite difficult to understand all of our moral reasoning in terms of total moral standards that allow for monotonic deductions about the classification of cases. This does not lead to mysticism; nor need it lead to despair. While we do have many intuitions as to what may be similar to what, natural language does not contain many resources for modeling such intuitions. Many mathematical tools do exist for analysing state spaces. If we understand classified cases as occupying positions in a state space, a wide variety of tools become available for understanding various possible similarity relations. A new set of tools could conceivably lead to transformations in debates between particularists and generalists, which have been largely about the role of rules, principles, or standards as expressed in a natural language.
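The idea that a similarity space can fall out of classification learning is easy to illustrate (assuming scikit-learn; the features, labels, and tiny network below are toy stand-ins, not the MCC): train a small network to classify cases, then read distances off its hidden-layer activations.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Invented binary "cases": features 0-1 disfavour, features 2-3 favour.
X = rng.integers(0, 2, size=(200, 4)).astype(float)
y = (X[:, 2] + X[:, 3] > X[:, 0] + X[:, 1]).astype(int)

net = MLPClassifier(hidden_layer_sizes=(6,), activation="tanh",
                    max_iter=2000, random_state=0).fit(X, y)

# The hidden layer was trained only to support classification, yet its
# activation vectors double as a similarity space over cases.
def hidden(x):
    return np.tanh(x @ net.coefs_[0] + net.intercepts_[0])

a, b, c = hidden(X[0]), hidden(X[1]), hidden(X[2])
print("d(case0, case1) =", np.linalg.norm(a - b))
print("d(case0, case2) =", np.linalg.norm(a - c))
```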

The reason for suggesting that the unsurveyability of enablers/disablers for contributors leads in the direction of particularism is that the generality of norms is something that everyone has assumed should be expressible in a natural language so that it could be used in reasoning and argument. Fair enough, as far as that goes, but maybe there is further to go. What if it turned out that

(a) we could give a mathematical characterization of a state space which made it clear that some features had strong tendencies to contribute in specific directions; and

(b) we could give a computational and empirical argument that the complexity of the state space could not be mastered unless such strong tendencies existed (i.e. that the strong tendencies are necessary if we are to master the requisite similarity space); but

(c) it turned out that there was no surveyable way of listing in natural language all the enablers/disablers and attenuators/amplifiers that attach to the general tendencies?

Would this be particularism or generalism? Given the need for a certain kind of generality, that would be a point in favour of generalism; given that the generality could not be stated with a surveyable list of enablers/disablers for the contributors, that would be a point in favour of particularism. For Dancy, it is not just that the lists of qualifiers would be unsurveyable; it is that there cannot be any necessity in any of the default tendencies or polarities that he considers. In short, if it turns out that (a) through (c) are true of moral reasons (or even some subset of them), then something between the more thoroughgoing forms of generalism and particularism would end up being true. Such a possibility could only be considered, never mind evaluated, if we take seriously the tools of mathematical and computational modeling. Put another way, using these tools could help us to see possibilities that we have not yet seriously considered. The possibility of (a) through (c) being true is raised to make the preceding point. Nothing in this paper has demonstrated that these claims are true, and there is no reason to believe that they are true of the MCC in its current form. Of course, the MCC and its training and testing sets are rather simple, and it would be no surprise if what holds for the MCC does not hold for more complex systems. However, the possibility we have been considering suggests that even if the required level of complexity in moral cognition is such that the lists of natural language qualifiers to contributors are unsurveyable, the spectrum of possibilities for the nonlocality of reasons may have stopping places even before we get to Dancy’s moderately holistic particularism.

5 Some Objections and Replies

As with any approach, objections can be raised. Thus far, we have simply given the MCC the features it is expected to use to classify situations. Things get very complicated very quickly when we expect a being or system to infer features that are relevant to classification. Section 5.1 deals with this type of consideration in rudimentary ways and argues that the ability to classify situations would have to draw on other resources to have the slightest chance of working. Sections 5.2 and 5.3 deal with objections that are specific to state space approaches to similarity. All the talk about state spaces and neural networks might lead some to wonder how this work could be considered of philosophical interest; Sect. 5.4 engages concerns along these lines.

5.1 Beyond Surface Similarities

Let us consider two cases: Jack rapes Jill, and Huey enslaves Dewey. Given the kinds of similarity we have been considering, we would not be in a position to assert that these two cases have anything in common, since there are no explicitly stated features that they share. In other words, they share no surface similarities. However, it is not impossible to see a similarity. For example, we may say that both cases are instances of one individual violating the autonomy of another individual, even though autonomy is nowhere mentioned in the description of the cases. Indeed, we might say that those two cases are more similar to one another than to a third case in virtue of autonomy being violated in the two but not in the third. This raises a number of issues. First, there is the issue of not remaining content with a given description; we often have to infer features of a case that are not supplied with the description of that case, and those features may be relevant both to the classification of a case and to its similarity to other cases. If the inferred features are ones that the network has already been trained on, then this might not be a major problem. Perhaps some other system responsible for extracting more information from a case—a supplemental information extractor—could process the cases and feed a more detailed description of the case into the network. However, this strategy will not work in cases where the inferred features were not in the training set of the network. This poses a more serious problem, to which we now turn.

Imagine that someone is taught that slavery and rape are wrong, but the word “autonomy” is not in their vocabulary. Someone might believe that the rape and enslavement cases are both instances of morally impermissible behaviours and not notice any other similarity beyond that. Based simply on surface considerations, the cases are not seen as all that similar. After learning about and reflecting on the nature of autonomy, the individual’s similarity space might be reshaped such that the cases in question are seen as being more similar than they used to be, because rape and slavery are grievous violations of autonomy, of being able to live a self-directed life. The idea that learning a new concept could restructure a similarity space (or spaces) is important. In the previous paragraph we considered the strategy of using a separate system for drawing inferences or otherwise extracting more information from a case in order to feed the MCC a more complete description of that case. But that approach requires that the MCC be trained on cases that include the more detailed information. If the MCC is not trained on cases involving autonomy—and that is the scenario we are now considering—then it will not know what to do with cases that have autonomy in the description. One way the MCC has of dealing with newly learned concepts would be to retrain on redescribed versions of all the relevant cases in its training set. It is not at all clear that this is psychologically plausible. Retraining on the entire training set is a tall order, and when we learn new concepts we often have dramatic and sudden changes in how we see things. It is the suddenness of the change that needs to be reflected on. Retraining on an entire training set takes time, and when suddenness is what needs explaining, we have a problem. It might be possible to deal with this sort of problem by taking the MCC already trained to classify some cases, and then giving it a handful of new cases to train on further, where those new cases include references to autonomy. It might (and that qualifier looms very, very large indeed) be possible for such training to happen quickly. Call this the updated MCC. Again, in addition to the MCC, let us hypothesise a separate system that can extract information not provided as input. Originally, it could not extract information about autonomy, but we will say that the newly educated version of the supplemental information extractor can. Given a new case, the modified information extractor could infer whether considerations of autonomy are present and provide a more detailed description to the updated MCC, which could then classify the new case. Even the old cases on which the MCC was originally trained, which contained no reference to autonomy, may be treated differently when fed to the modified supplemental information extractor, since the modified extractor can extract information about whether those old cases involve autonomy violation.
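One highly simplified way to picture the updated MCC in code (assuming scikit-learn; the feature encoding, the labels, and the device of reserving an initially unused “autonomy” slot are all my own illustrative assumptions—genuinely new input features would require architectural change, not just further training):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
# Original training data: the last feature slot ("autonomy violated") is
# reserved but always zero, so the original MCC-like net never uses it.
X_old = np.hstack([rng.integers(0, 2, size=(200, 3)),
                   np.zeros((200, 1))]).astype(float)
y_old = (X_old[:, 0] > X_old[:, 1]).astype(int)  # an invented labeling rule

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X_old, y_old)

# "Updated MCC": instead of retraining from scratch on a redescribed
# training set, continue training briefly on a handful of new cases in
# which the autonomy feature is actually switched on.
X_new = np.array([[0., 0., 0., 1.], [1., 0., 0., 1.], [0., 1., 0., 1.]])
y_new = np.array([0, 0, 0])        # autonomy violations: impermissible
for _ in range(50):                # a short burst of further training
    net.partial_fit(X_new, y_new)

print(net.predict([[0., 0., 1., 1.]]))  # an unseen autonomy-involving case
```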

That is all very sketchy, and certainly too simplistic. To start, it assumes that a situation is being provided as input in the form of language. To be sure, sometimes that happens: we are given a description of a situation, and we classify it. Often, though, we see or hear something going on, and we classify the situation. Perhaps those sensory inputs are being converted to something language-like, but even if we make such an assumption, we need to recognize that the conversion process may involve filtering of various sorts that may affect how classification happens, filtering which was nowhere mentioned above. Moreover, the supplemental information extraction described above operates independently of the MCC and then passes along its results to the MCC. What if it is not only the case that whether we think autonomy is present informs case classification, but also that how we engage in case classification informs whether we think considerations of autonomy are at issue? If that turns out to be the case, then a much more elaborate account would be required.

That the problems discussed above should arise is no surprise. We have been looking at one neural network, and there is no reason to think that one network could possibly generate all the temporally appropriate functional patterns we are interested in. What is needed is to look more generally at issues of cognitive architecture and how concept learning interacts with classification and other learning tasks—a detailed exploration of which is well beyond the scope of this paper. However, some remarks on state spaces are in order. From the fact that the state space set up by a network trained on surface similarities cannot capture everything we want to capture, it does not follow that a state space approach to similarity needs to be discarded. It may be possible to interpret the interaction of multiple networks as setting up a state space that can be restructured in ways that are more powerful and psychologically plausible than how the MCC’s state space could be restructured. We may also need to discuss the possibility of multiple state spaces and the coordination of the types of similarity information they embody. The final part of this paper contains a few more thoughts along these lines. For now, it will suffice to say that we need to keep the issue of what the MCC can do separate from the issue of what it might be possible to do with state space approaches to similarity, which are in no way wedded to a single simple recurrent neural network.

5.2 The Concern About Scalar Metrics

One of the implications of work discussed herein is that state space models of similarity may be richer than some suspect. As developed, such a model can allow for a multidimensional account of similarity as well as an overall or scalar account of similarity. Markman and Gentner (2005) have argued that state space analyses, in virtue of using an overall, scalar distance metric, do not capture the various respects in which items being compared are similar and dissimilar. This may be true of some state space approaches, but not all. We have already seen that freedom from an imposed burden was an important dimension of similarity in the above simulation, with such cases scoring high on the fifth principal component, and that is not all. For example, “self-defense” and “many innocents suffer” show up on other principal components. There is no reason we cannot compare the overall similarity between two cases by looking at many local similarities and differences, where these local similarities and differences show up as the structuring principles of the distribution of cases in the moral state space. Contributory standards are, if you will, morphologically implemented in the distribution patterns of the moral state space. Various scalar metrics are available for overall, scalar assessments of similarity and difference.
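The point that a state space supports both scalar and respect-by-respect comparison can be shown in a few lines (assuming scikit-learn; the random 14-dimensional vectors below merely stand in for case encodings, and the components carry none of the interpretations found for the MCC):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
cases = rng.normal(size=(50, 14))  # stand-ins for 14-dimensional encodings

pca = PCA(n_components=5).fit(cases)
Z = pca.transform(cases)           # principal-component scores per case

i, j = 0, 1
# Scalar, overall similarity: one number...
print("overall distance:", round(float(np.linalg.norm(Z[i] - Z[j])), 2))
# ...but the same space also supports respect-by-respect comparison:
# per-component differences say where the two cases agree and differ.
for k, diff in enumerate(np.abs(Z[i] - Z[j]), start=1):
    print(f"  difference on PC{k}: {diff:.2f}")
```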

Part of the concern over scalar metrics may be related to the issue of relational structure in analogy. Gentner is one of the giants in analogy research—arguably the mother of contemporary analogy research in psychology and cognitive science. Her work (Gentner 1983) played a central role in ushering in the structure-mapping paradigm of analogy in those fields. If overall analogical similarity between two cases is to be assessed in terms of the local relational constituents of the cases, then looking only at a scalar metric—one number—washes out all the local comparisons between the cases. While there is some utility to the overall comparison, state space analyses need not restrict us to that sort of metric. Similarity with respect to the local relational features of cases may show up on principal component analysis (or other types of mathematical analysis). A state space approach need not deny the importance of relational structure in analogues; what it does need to do is show how this information can be captured. While the MCC does not deal with cases of higher order relational structure (relations about relations, and relations about relations about relations,…), it is far from obvious that a simple recurrent network could not be trained to deal with such cases. It is also possible that higher order relational structure could be captured in state spaces, perhaps as subspaces within subspaces (within subspaces…).

5.3 The Asymmetry or Directionality Concern

Of course, there are other objections to state space models of similarity and analogy. One of them has to do with asymmetry or directionality. To use a stock example, it is one thing to say “this butcher is a surgeon” but quite another to say “this surgeon is a butcher.” The direction of the similarity often matters. As it turns out, this is often true in arguments as well. It is one thing to appeal to a violinist type case to argue for a conclusion about a rape-abortion type case, but it would be something different to appeal to a rape-abortion type case to argue for a conclusion about a violinist type case.[11] The worry is that if we understand similarity simply using some sort of overall (scalar) distance metric, then we have lost the directionality of the analogy, since mere distance tells us nothing about the direction of the analogy. To be sure, directionality needs to be explained, and there are at least two strategies open to the defender of state space approaches when looking at similarity in moral argument. One strategy would be to argue that state spaces will not explain the directionality of the analogy. The various distance relations available in state spaces are useful as far as they go, but they are not the whole story and need to be supplemented with other considerations to explain directionality. For example, in the context of argument between two interlocutors, there are contextual considerations that determine which cases can serve as sources and which may be targets. Cases that are in dispute are targets, and which cases can serve as sources will depend on where agreement can be found between the interlocutors. There is no reason to expect state space approaches to settle such issues, since the directionality of arguing about cases is determined by contextual relations that hold between interlocutors. (It may turn out that there are important differences between how we understand analogies in argument and other types of analogy, such as the two predicative analogies at the beginning of this paragraph.)

A second strategy would go further and claim that a better understanding of state spaces may place constraints on how the directionality of analogical argument works (even if it does not entirely define directionality). For example, it might turn out that if we are arguing that action A is permissible by appeal to some source case, then for the selection of the source case to be effective in persuasion, it is best to pick a source whose Mahalanobis distance to our interlocutor’s permissibility subset is smaller than its distance to his or her impermissibility subset. Then again, maybe some other distance metric between cases will be an informative constraint. There is no room to explore all the possibilities here. The point of raising this possibility is to show that a state space approach can be qualified but still make contributions to understanding the role of analogy in argument.
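As a sketch of the constraint just floated (Python with NumPy; the two-dimensional point clouds standing in for an interlocutor’s classified cases are invented for illustration):

```python
import numpy as np

def mahalanobis(x, subset):
    """Mahalanobis distance from point x to the distribution of `subset`."""
    mu = subset.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(subset, rowvar=False))
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

rng = np.random.default_rng(3)
# Invented stand-ins for an interlocutor's classified cases in state space.
permissible   = rng.normal(loc=[ 1.0,  1.0], scale=0.4, size=(30, 2))
impermissible = rng.normal(loc=[-1.0, -1.0], scale=0.4, size=(30, 2))

candidate_source = np.array([0.8, 0.6])
d_perm = mahalanobis(candidate_source, permissible)
d_imp  = mahalanobis(candidate_source, impermissible)
print(f"to permissibility subset: {d_perm:.2f}; "
      f"to impermissibility subset: {d_imp:.2f}")
# On the constraint floated above, this candidate is a promising source
# for arguing permissibility only if d_perm < d_imp.
```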

A variation on the second strategy could be used when we are looking at the case of an individual. In the monolectical case, someone (say, Jack) might be confused: perhaps he treats the case of abortion when pregnancy results from rape one way (as impermissible), and treats disconnection from the violinist another way (as permissible), but he thinks the cases are similar and should be treated in the same way. It would be one thing to treat the violinist case as the source and reason to a conclusion about the rape case as the target, and quite another to treat the rape case as the source and reason to a conclusion about the violinist case as the target. The direction of the analogy matters, even in the monolectical case. Appealing to intersubjective differences (the first strategy raised in this section) to account for directionality will not work because we are dealing with one subject. However, we might still be able to use other features of individuals to account for the direction in which they might reason. Discrepancies between (1) the classification of cases and (2) the similarity relations holding between cases may place constraints on the direction in which an individual reasons. If B is originally classified as impermissible but is closer to permissible cases that include A, then the individual might reason from A to B (arguing that B should be treated as permissible). If, on the other hand, A is originally classified as impermissible but is closer to permissible cases that include B, then the individual might reason from B to A (arguing that A should be treated as permissible). Of course, these are not the only options. They assume that we are reasoning analogically, and that we are satisfied with our initial sense of what is similar to what—and neither of these need be the case. By appeal to general theoretical considerations, for example, we might reconceive what we consider similar to what. For now, though, let us put considerations of the preceding type to the side. The point being made is that state space approaches may have something to offer to an account of similarity, even with respect to understanding the directionality of similarity.

Similarity is a complex notion, and we should not expect it to work the same way in all contexts of its use. To see this, consider the notion of trust. Should Jill trust Jack? Well, it depends: trust him for what? If Jack is a brilliant cardiac surgeon who has never lost a patient, it would make sense for Jill to trust Jack to carry out her heart operation. If the same Jack is a womanizing, disloyal individual, then it would make sense for Jill not to trust him as a partner in an exclusive, intimate relationship. Does Jill trust Jack? Should she? Perhaps for some things she does and she should, and for other things, not so much. We get this variation in trust even if we do not vary the object of trust (Jack). We can get further variation by varying the object: does Jill trust her thermostat? Humans can fail to be trustworthy in ways that are different from how a thermostat can fail, though both can fail to be trustworthy. Is x similar to y? Is x more similar to y than it is to z? The answers to these questions are going to depend on the sorts of things we are comparing, and on what the comparison is for. Are we comparing objects? States of affairs? Actions? Why are we doing the comparison? No claim has been made here that a state space approach will give us everything we need to understand every kind of similarity. I have been considering similarity between moral cases (involving actions, motives, and consequences), and how that similarity may be used in reasoning and argument. Even if we focus on just that kind of similarity, there are differences between the dialectical and monolectical cases. If we look at the dialectical use of similarity in moral argument, the similarities and differences between interlocutors are an important consideration in understanding the direction in which an analogical argument might or might not work. The issue becomes even more difficult in the multilectical case: imagine a politician in a democratic country making an analogical argument to a large audience. Here, what governs the choice of the source case will likely be informed not by what a single interlocutor believes, but by what the majority of the audience believes about various issues. In the monolectical case of using an analogy to reason through a confusion, differences between interlocutors are simply not at issue (unless one is trying to reconcile one’s views with those of others). As we have seen, though, a state space approach can make a contribution in both the monolectical and dialectical cases. How we assess similarity will depend on what we are comparing and the reason(s) for the comparison.

5.4 How is Any of This Philosophy?

Look, it is like this: philosophy deals with logical, conceptual, interpretive, and normative analyses; cognitive and neural modeling is descriptive, so none of this stuff about computational neural modeling could possibly be relevant to philosophy. I have had a number of colleagues (including referees) put variations of that complaint to me, and given that some readers may share it, I will say a few words about it here.

Philosophical discourse can proceed, and has proceeded, on assumptions about empirical matters, which is not to say that such discourse reduces entirely to empirical questions. The debate between particularists and generalists is a second-order debate about the nature of moral reasoning. As I have noted elsewhere (Guarini 2010b), it is to Dancy’s great credit that he acknowledged that his position has empirical commitments and dealt with them in the appropriate way. If a particularist claims that we can learn about right and wrong without the use of generalities, then make no mistake about it: that involves an empirical commitment. The claim that we can reason about right and wrong without generalities also involves an empirical commitment. These are claims about how cognition works or could possibly work, and surely empirical considerations are relevant to assessing them.

Objection:

perhaps philosophers should not be concerned with empirical and computational claims about how cognition does or could work, and to the extent that they have been, they have not been doing philosophy. Sometimes this sort of objection is supplemented with the view that philosophy should only be concerned with cognition as it could be studied a priori, or if not strictly a priori, then at least from the armchair. Philosophers (most of us, anyway) are not trained as empirical scientists.

Reply:

ignoring empirical and computational work would be a great leap in the wrong direction, and one need not be a scientist to reflect on and consider such work—there is quite a bit of useful work done in philosophy of science, and most of it by people who do not hold a Ph.D. in a science. Virtually all the discussion between particularists and generalists has been in terms of sentential ways of thinking about cognition. Generalities are conceived of as principles, rules, or standards—all sentential vehicles of some sort. The debate is structured around what role such things play in moral argument and cognition. Elsewhere (Guarini 2010b, 2011), I argued for the view that there is a kind of conceptual middle ground between the more thoroughgoing forms of particularism and generalism, one which sees neither principles nor cases as primary. In Guarini (2011), I suggested that the state space approach provides a natural way of understanding that middle ground. I argued that to understand how the state space evolves during the training of the network, we need to notice both (a) the variation in the complexity of the cases being presented to the network and (b) the generalities implicit in the structuring of the distribution of the cases in the state space, where these generalities are not sententially encoded. I am not insisting that such an approach is correct—far from it. It is early days still in understanding cognition, and methodological pluralism in these matters is surely the order of the day. The point is that empirical and computational work helps us to raise and consider possibilities that might not otherwise occur to us (as argued in Sect. 4.3 above and in the closing sections of Guarini 2011). For all we know, this could lead to new ways of thinking of logics as well. If we use a vector to represent a case, perhaps we could quantify over vectors to develop logics of multidimensional similarity that capture the logic in similarity-based argument more perspicuously than existing logics do. Perhaps. It may be that by exploiting the properties of vectors, such logics might provide resources for stating norms for similarity-based reasoning and argument that currently go unstated because we have not yet developed the conceptual or logical resources for stating them in a perspicuous manner.[12] Maybe. The reason for these speculations is to make it clear that nothing in this paper is meant to suggest that we should dispense with the normative, with logic, or with conceptual or interpretive analyses. On the contrary, the suggestion is that by having a philosophically informed look at empirical and computational work, we may discover new fodder for our traditional activities, and new ways of engaging in our traditional enterprises may emerge. We should not be surprised if some of our traditional debates undergo significant transformations in the process.

Objection:

but if we use empirical and computational results to inform some of our activities as philosophers, then philosophical work done in that way will stand or fall with the empirical or computational work.

First reply:

this is an honest way to stand or fall. Discussions between particularists and generalists have made use of empirical assumptions. It is best to be honest about that, acknowledge it, and subject the assumptions to careful scrutiny (including any available empirical evidence).[13] We may then go on to consider new empirical or computational work that could inform our debates, and if that work has its problems (as it surely will), then our views will need still further revisions. And so it goes. But that is nothing new: there has been quite a bit of change in western philosophy over the last 25 centuries or so.

Second reply:

if the idea is that philosophers are supposed to be interested only in necessary truths, because once we have them they cannot fall, and necessary truths are discovered a priori, then we are not being sufficiently sensitive to a number of points. We can start by noticing that, depending on our views about nomological necessity and epistemology, we might well be able to empirically discover necessary truths, so it does not follow from a concern with necessity that the empirical is irrelevant. But even for those not sympathetic to such Kripkean ideas, it is worth remembering that some alleged necessary truths turned out to be neither necessary nor true, so the search for necessity is no guarantee that some prospective truth will not fall. Finally, there are no arguments in this paper against the existence of necessary truths or the pursuit of such truths; it has only been argued that searching for them should not preclude us from also examining empirical and computational matters relevant to philosophical work.

6 Qualifications and Conclusions

Nothing in this paper should be read as suggesting that the MCC (at 57 neurons!) is biologically plausible, and it certainly need not be (and almost certainly is not) the case that there is just one neural network that would participate in doing low-level classification. To complicate things further, there may well be different low-level pattern classifiers setting up their own similarity spaces, and these classifiers may feed their outputs into high-level reasoning systems which then have to manage the different similarity spaces developed by the low-level classifiers, with feedback relations between the high- and low-level systems. It may even turn out that there are different similarity spaces, but that no one similarity space from a given neural network is a moral similarity space. In other words, there might be different similarity spaces, each of them containing elements of the moral, the legal, the conventional, and other matters, so that only subspaces of each of these larger spaces form moral similarity spaces, and high-level systems have to learn how to filter out the moral subsimilarity spaces in order to operate on them. It may turn out that in humans, talk of a single or general moral similarity space is a kind of approximation, or maybe a regulative ideal. In any event, existing research in functional neuroimaging (Greene and Haidt 2002; Young and Dungan 2012) suggests that there is no one place in the brain where everything relevant to moral reasoning or even classification takes place, so we should fully expect things to be considerably more complex than anything sketched out above.

The purpose of this paper has not been to weigh in on any first-order normative issues in moral or ethical discourse. The focus herein has been on second-order issues pertaining to the nature of reasons. The positions we take on the nature of reasons can be used to support second-order claims about what might or might not count as acceptable ethical reasoning or argument. Particularists and generalists often engage in such debates without taking substantive views on first-order moral issues. Second-order normative claims about the nature of moral reasoning have not been made, but to the extent that the ideas in this paper might be used to do normative work, it would be at that level of analysis. This paper has been largely speculative, examining some possible views on the structure of moral reasons, arguing against some conceptions, expressing some sympathy for nonlocal conceptions, but not taking a firm and detailed stand on the structure of moral reasons in human beings.

We have been examining similarity in moral or ethical case classification. This paper does not address all the concerns with state space models of similarity, and the concerns not addressed will have to await future work. While the computational model presented herein is certainly not the whole story about similarity reasoning, it is hoped that the model, together with the analyses of it, will provide new ways of thinking about state space models of similarity as they might apply to ethical or moral reasoning.

The reader might be wondering how the views expressed herein might connect up with other views. What would the implications of a state space approach be for work on moral grammar (Mikhail 2011) or moral sentiment (Nichols 2004)? There are many possibilities here, and this is not the place to explore them in any detail. That said, I would caution against claiming that state space approaches to similarity are diametrically opposed to other approaches. Perhaps sentiment or affect plays an important role in constraining the evolution or structure of a moral state space; the same might be said for hypothesized moral grammars. There are a number of different ways in which computationally and empirically informed work on moral cognition might develop. Even though philosophers have been thinking about the nature of moral reasoning for over two millennia, there is still much to be discovered about what moral sensitivity, reasoning, and argument might (or could) be.

While we should not forget the lessons that we have learned over some twenty five centuries of moral thought in the western philosophical traditions, neither should we close our eyes to the new tools that are at our disposal. The title for Sect. 3.2 is a deliberate play on Austin’s (1962) How to do Things with Words. To many, there will be something very strange in trying to understand anything about moral sensitivity or reasoning using vectors in high dimensional state spaces, and cones to represent 14 dimensions of information, and so on. We really are used to natural language treatments using words, and everyday words at that. However, formal tools are nothing new in philosophy—just think of the different ways in which deontic logics have been used to model obligation. Part of the burden of this paper has been to show that we can put new tools in our tool box, from which it does not follow (at least not without further argument) that older tools should be dispensed with. All tools, new or old, have to prove their worth by what they can do.

Similarity plays an important role in moral sensitivity, reasoning and argument. We need to better understand that role. How affect and other considerations may contribute to similarity, or how similarity may contribute to affect and other features of moral cognition—these too need to be better understood. And so does much else. There is plenty of work to be done.