Introduction

Although there is evidence of pursuits in the area of Artificial Intelligence (AI) before 1956, it is widely considered that its birthplace was Dartmouth College, where John McCarthy and nine other scientists spent two months working on a detailed study of the subject (McCarthy et al. 2006; Russell and Norvig 2009). Fast forward to 2019, and it’s difficult to imagine an industry that has not been impacted by AI. For example, artificial agents are used to drive cars (Daily et al. 2017), help us with personal assistant tasks (Leviathan and Matias 2017), beat the world’s best players in games like Go (Silver et al. 2017), assist us in providing better healthcare (Jiang et al. 2017), and even help militaries gain strategic advantages (Sapaty 2015).

Due to the increase in scope and autonomy of artificial agents, many philosophers and ethicists have raised concerns around deploying them without the necessary measures in place for safe and ethical integration into society (Moor 2006; Dameski 2018; Allen and Wallach 2012; Anderson and Anderson 2007). In particular, there are concerns about how increasingly autonomous artificial agents will treat human beings and whether this treatment will be considered ethical. Moor (2006) states it bluntly when he writes: “we want machines to treat us well”. The emergent field of enquiry dealing with how machines treat us is called Machine Ethics (Anderson and Anderson 2007; Moor 2006; Allen and Wallach 2012), and it is primarily focused on “developing computer systems and robots capable of making moral decisions” (Allen and Wallach 2012).

Many philosophers have argued that computationally-based agents can be considered artificial moral agents (AMAs) if they are built to incorporate the relevant ethical dimensions in their decision-making processes (Abney 2012; Scheutz and Malle 2017; Floridi and Sanders 2004; Sullins 2006; Moor 2006; Johnson 2006). Abney (2012), for instance, argues that non-cognitive and emotional elements contribute to moral decision making. He further argues, however, that they do not ultimately determine whether or not an agent is moral. What ultimately determines the morality of an agent, according to Abney (2012), is its ability to deliberately and rationally arrive at ethical decisions and actions. In other words, a rational, though emotionless, robot could be classified as an AMA if it were to meet the requirement above. This is the central philosophical idea in the claim that computationally-based agents can be AMAs. It is a claim that computational rationality can entail artificial moral agency.

The AMA project is held back by the seemingly disjointed manner in which its advocates have sought to advance it. For example, there are numerous projects from the sciences that have sought to build AMAs independently of meaningful considerations from normative ethics. These projects often end up with poorly conceptualised AMAs that will not stand the test of philosophical scrutiny. Similarly, there have been plenty of philosophical arguments, both for and against the possibility of artificial moral agency. However, philosophical arguments alone will not advance the AMA project. This dichotomy of approaches in the AMA project can “distract from the immediate task of making increasingly autonomous robots safer and more respecting of moral values, given present or near-future technology” (Allen and Wallach 2012).

Consequently, the purpose of this article is to invite both developers (i.e. engineers and scientists) and philosophers to consider how models of computational rationality might be applied in the building of well conceptualised and formulated AMAs. I will do this by putting forward a proposal for such a model of computational rationality applied to the problem of artificial morality. This will hopefully shift the discussion from a mostly philosophical debate about whether or not artificial morality is possible, to a discussion of the models that can practically demonstrate it. The next three sections will seek to clarify the concepts of computational rationality and artificial moral agency, before delving into the proposed model and some of its anticipated limitations.

Computational rationality

Computational rationality is perhaps best described as approximating decision making for maximum utility while using the optimal computational resources (Lewis et al. 2014). It is about making rational decisions within a computational framework. As Gershman et al. (2015) note, computational rationality is a convergence of ideas from AI, cognitive science and neuroscience around intelligence, and in particular, its computational nature. They go to great lengths in their work to show how ideas of computation from AI have inspired researchers in the cognitive and neurosciences, and vice versa. To get a proper grasp of computational rationality, however, we need to look back a few decades to the works of Simon (1955), Horvitz (1987), and others.

Many of the ideas in computational rationality stem from the tradition of Herbert Simon, who was an economist and political scientist. While Turing (1950) and others were postulating about the nature of machine intelligence, Simon brought much-needed constraints on the kind of rationality that could be achieved by computationally bounded agents. He started looking at candidate definitions for bounded rationality when he was deriving a model for rational choice (Simon 1955, 1972; Selten 1990). He argued that agents do not always have all the information they require to make a decision and that their internal computation is limited in how it can use the available data to make rational decisions. Bounded rationality was, therefore, a way for him to “formulate the process of rational choice in situations where we wish to take explicit account of the ‘internal’ as well as the ‘external’ constraints that define the problem of optimisation for the organism” (Simon 1955, p. 2).

These ideas inspired many works in AI, a field which also found itself dealing with creating intelligent agents that operate under many of the constraints that Herbert Simon saw in general organisms. Most notably, Horvitz (1987, 1988), and others at the then Medical Computer Science group at Stanford, took the ideas forward (Horvitz et al. 1989). Horvitz argued that probability and utility theories, both of which were generally considered normative for decision making in computer science, were insufficient for the real-world problems that machine intelligence systems were trying to solve. Real-world problems often go beyond the standard axiomatic basis defined by utility and probability theories, as they are often characterised by uncertain and limited information (thus making the process of modelling and knowledge representation difficult). Furthermore, machine intelligence systems have limited computational resources, which makes the application of classical decision-theoretic approaches to many real-world problems difficult and, many times, intractable (Horvitz 1987).

To deal with these problems, Horvitz suggested looking at various optimisation and heuristic strategies to resolve some of the challenges in real-world decision making. Notably, he proposed the notions of flexible inference and decision-theoretic control. Various inference techniques have been developed over the years that allow partial inference with limited information or partial execution. This also paved the way for the concept of meta-reasoning, which refers to a program that is aware of various inference strategies and can select the best strategy based on the type of problem that needs to be solved (Horvitz 1989). These types of inference strategies present a natural fit for the optimisation and heuristic framework of Horvitz. Decision-theoretic control represents the ability of the agent to determine how best to execute a specific inference strategy based on a trade-off between computation time, precision, maximum expected utility (MEU) and the cost of delaying the action. Balancing these trade-offs, along with suitable or multiple inference strategies, represents the core idea in the approach of Horvitz.
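To make this trade-off concrete for developers, the sketch below shows one minimal way a meta-reasoner might weigh the expected utility of an inference strategy against the cost of the delay it introduces. The strategies, numbers, and the linear delay-cost model are illustrative assumptions of mine, not a reconstruction of Horvitz's own formalism.

```python
# A minimal, hypothetical sketch of decision-theoretic control: a meta-reasoner
# picks the inference strategy whose estimated result quality, net of the cost
# of the computation time it needs, is highest. All names and numbers below are
# illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    expected_utility: float   # estimated MEU of the decision this strategy yields
    expected_runtime: float   # estimated seconds of computation it needs

def cost_of_delay(seconds: float, urgency: float) -> float:
    """Cost of postponing action, growing with computation time and task urgency."""
    return urgency * seconds

def select_strategy(strategies: list[Strategy], urgency: float) -> Strategy:
    """Choose the strategy with the best utility-minus-delay trade-off."""
    return max(
        strategies,
        key=lambda s: s.expected_utility - cost_of_delay(s.expected_runtime, urgency),
    )

if __name__ == "__main__":
    candidates = [
        Strategy("exact_inference", expected_utility=0.95, expected_runtime=30.0),
        Strategy("bounded_search", expected_utility=0.85, expected_runtime=2.0),
        Strategy("fast_heuristic", expected_utility=0.60, expected_runtime=0.1),
    ]
    # With plenty of time, exhaustive inference wins; under pressure, cheaper,
    # partial strategies become rational choices.
    print(select_strategy(candidates, urgency=0.001).name)  # exact_inference
    print(select_strategy(candidates, urgency=0.2).name)    # fast_heuristic
```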

The ideas of Horvitz and Simon have persisted well over time, with many AI researchers adopting them (Marwala 2013; Zilberstein 2013; Russell and Subramanian 1995; Genewein et al. 2015; Lewis et al. 2014; Gershman et al. 2015). Russell and Subramanian (1995) use these ideas to develop what they call provably bounded-optimal agents. Bounded-optimal agents are machine intelligence systems whose solutions to problems are optimal for the information that they can acquire from the task environment and the limitations of their programs and architectures. In other words, optimality is what the agent can achieve, given its internal and external constraints, and not necessarily what a perfectly rational agent would do for a given task. This conception of a bounded-optimal agent formed the foundation for what is now referred to as computationally rational agents in recent literature (Gershman et al. 2015; Lewis et al. 2014).

Fittingly, the work of Gershman et al. (2015), which includes Horvitz as a co-author, likely represents one of the clearest pictures of what computational rationality is, and what it can be. As the authors note, computational rationality has the potential to be a “unifying framework for the study of intelligence in minds, brains, and machines” (Gershman et al. 2015, p. 278). I support this claim and further posit that computational rationality can be a unifying framework not only for ideas in the sciences, but also in Philosophy, and more specifically, in Machine Ethics. After all, it was Aristotle who first placed a strict emphasis on practical rationality as a basis for virtuous and ethical action (Miller 1984). I aim to clarify how exactly computational rationality can be an integrative framework for machine ethics by showing how the ideas of Gershman et al. (2015), Horvitz (1987), Russell and Subramanian (1995), and others, can be applied to the question of building artificial moral agents. I will do this by discussing the epistemic capacities required for moral agency and considering whether these capacities can be replicated or approximated within a framework of computational rationality.

Artificial moral agency

Before delving into the details of the computability of the capacities necessary for moral agency, I need to first define what I mean by an artificial moral agent. Generally speaking, the idea of agency denotes the capacity of an agent to act independently (Schlosser 2015). Moral agency, in turn, denotes the capacity of an agent to act independently in making morally charged decisions and taking morally charged actions, and to bear a level of responsibility and accountability for the consequences of those decisions and actions (Parthemore and Whitby 2014). Moral agency implies a certain understanding and knowledge of what is good and what is bad (morality) and being able to discern what is right from what is wrong (ethics). Moral agency should not be confused with moral goodness or ethical uprightness. Its emphasis is on the agent’s ability to be responsible for its decisions and actions, regardless of whether those actions are evaluated as morally good or bad.

The definition above gives us a good idea of the notion of moral agency, but it does not address who or what can be included in the class of moral agents. How the concept of moral agency is framed is important because asking who is a moral agent already presupposes personhood, which is generally taken to be embodied in human beings. Parthemore and Whitby (2014) suggest framing the question more broadly by asking “when is any agent a moral agent?”. Such open-ended framing of the question allows one to consider a wider set of agents for inclusion in the class of moral agents. When one asks the question in this way, three broad categories of moral agents seem to emerge from the literature. These categories are: biological moral agents (Torrance 2008; Churchland 2014; Liao 2010; Rottschaefer 2000); conscious moral agents (Parthemore and Whitby 2013, 2014; Himma 2009); and artificial moral agents (Abney 2012; Scheutz and Malle 2017; Floridi and Sanders 2004; Sullins 2006; Moor 2006; Johnson 2006). I will place my focus on artificial moral agents.

The proponents of artificial moral agency can be further subdivided into two groups. The first group argues that most, if not all, of the full range of moral decisions can be computed by some near-term or future artificial agent (Abney 2012; Sullins 2006; Allen and Wallach 2012). The second group argues that only certain kinds of moral decisions can be computed using current approaches to AI and that the full range of moral decisions will require super-rational capacities (Scheutz and Malle 2017; Johnson 2006). Let us call the former group of views strong machine ethics, and the latter weak machine ethics. Strong machine ethics refers to the argument that moral agency can likely be fully achieved with an appropriate level of (computational) intelligence. On the other hand, weak machine ethics refers to the argument that full moral agency, at least in its historic and somewhat anthropomorphic roots (Torrance 2013), will not be achieved using current computational approaches to AI. As a result, robots will only have a pseudo or functional morality. I will consider definitions of artificial moral agency from both the strong and weak machine ethics perspectives.

Given this context, I can now discuss my candidate definition for artificial moral agency. To do this, it is essential to understand that current approaches to machine ethics are primarily computational, i.e. they are dealing with computational morality. Outside of significant advances in new approaches to designing artificial agents, it seems unlikely that this will change soon. Even those that recognise that some notion of consciousness will be required for general intelligence (Franklin 2003), and indeed full moral agency (Wallach et al. 2011), are only working towards functional approximations of it—mostly using a combination of cognitive architectures and computational implementations (Franklin et al. 2014; Lucentini and Gudwin 2015). The nature of machine ethics implementations, it would seem, will remain almost certainly computational, at least for the foreseeable future.

The definition of moral agency given by Parthemore and Whitby (2014, p. 1) serves as a good reference. However, the previous discussion showed that different people mean different things when they use the term ‘moral agent’. What is important for researchers and designers in machine ethics is to state clearly in which sense we mean the term ‘moral agent’ and to specify exactly what our definition of it is. To illustrate, I define artificial moral agency (in the weak sense) by modifying Parthemore and Whitby’s definition as follows:

An artificial moral agent is a computationally-based agent that one appropriately holds responsible for its actions and their consequences, and artificial moral agency is the distinct type of agency that such an agent possesses.

I refer to moral agency in the weak sense, meaning that I believe not all moral decisions can be made rationally—super-rational capacities are required for others. A strong machine ethics view of artificial moral agency can also be defined and clarified by following a similar process. The definition of artificial moral agency above is somewhat ontological in that it emphasises the nature of the agent. However, in theory, a definition based on the agent’s moral capability could also be derived. Thankfully, Moor (2006) has already developed a taxonomy that helps characterise the level of ethical capability in artificial agents.

Moor describes four different kinds of AMAs, each according to capability. These four kinds are (in order of increasing ethical capability): ethical impact agents; implicit ethical agents; explicit ethical agents; and full ethical agents. Though a full examination of Moor’s taxonomy is outside of the scope of this article, I submit that my sample definition is quite consistent with what Moor calls an explicit ethical agent. For the remainder of this article, I will use the sample definition of artificial moral agency stated above (in the weak sense), complemented by the use of the term explicit ethical agent, to be what I mean when referring to an AMA.

Artificial moral agency within a framework of computational rationality

I now need to show how the concept of artificial moral agency is compatible with a framework of computational rationality. Firstly, I will argue that the capacities necessary for moral agency lend themselves naturally to being computable. Secondly, I will argue that many of the problems computational rationality was envisaged to solve are also present in computational morality, and that these same problems can be addressed through a framework of computational rationality in the tradition of Gershman et al. (2015), Horvitz (1987), Russell and Subramanian (1995), and others. Let me begin by examining the claim that the capacities required for moral agency can be computed.

So far, I have avoided stating which capacities are required for moral agency. In the literature, these capacities can include emotions, empathy, free will, rationality, cognition (including mental and intentional states), concepts, awareness, amongst others (Wallach et al. 2011; Parthemore and Whitby 2013, 2014; Himma 2009; Torrance 2008). One way to get around this issue is to focus on what these various capacities give you as a result. In other words, instead of arguing about which capacities (and combinations thereof) will result in some facet of moral agency, focus on the outcome that is expected to be achieved. This is precisely what philosophers such as Sullins (2006) and Floridi and Sanders (2004) do by focusing on the top-level requirements for artificial moral agency and abstracting away the detail regarding the exact capacities required. I choose to focus on the requirements expressed by Floridi and Sanders because they conceptualise artificial moral agency within a weak machine ethics framework, as opposed to Sullins, who conceptualises it within a strong machine ethics framework.

Floridi and Sanders define the requirements for artificial moral agency as interactivity (being aware of and responsive to environmental stimuli), adaptability (the ability to change internal states according to environmental stimuli) and autonomy (the ability to change internal states according to the agent’s own transition rules, independently of environmental stimuli) (Floridi and Sanders 2004). Focusing on the top-level requirements for moral agency and abstracting away details around required capacities is essentially a focus on ‘mindless’ morality—a form of morality that distinctly suits a computational framing of moral agency. It does not care how autonomy or intentionality, for instance, are achieved—it only cares that they are achieved. This is precisely what Floridi and Sanders (2004) are alluding to when they talk about moral agency at different levels of abstraction (LoA). At a low enough LoA, a human being would also not be considered a moral agent since we would be dealing with their biological make-up, the neurobiological processes in their brains and other cognitive processes which, at that level, would seem indistinguishable from a machine.

Similarly, artificial agents observed at a low enough LoA are simply electronic components and code, and at that level, we cannot decide on moral agency. However, at a high enough LoA, these low-level processes and components are abstracted such that we only see the outcomes of their decisions. We wouldn’t ordinarily know how exactly the AMA functions, only that it seems to have goals and intentions, can function autonomously, and can learn new things over time. At that LoA, we would be forced to admit that the robot acts in a manner that is consistent with our expectations of moral agents (Coeckelbergh 2014).
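To give this reading a concrete, if simplified, shape, the sketch below expresses interactivity, adaptability and autonomy as purely computational properties of an agent observed at a suitable LoA. The class, its methods and its internal state are my own illustrative assumptions and not a formalism drawn from Floridi and Sanders.

```python
# A minimal, hypothetical sketch of Floridi and Sanders' three requirements read
# as computational properties. Everything here is an illustrative assumption.

class CandidateMoralAgent:
    def __init__(self):
        self.learned_values = {}   # stimulus -> learned moral appraisal
        self.internal_clock = 0

    # Interactivity: sense an environmental stimulus and respond to it.
    def respond(self, stimulus):
        return self.learned_values.get(stimulus, "no-preference")

    # Adaptability: environmental feedback changes the agent's internal state.
    def adapt(self, stimulus, feedback):
        self.learned_values[stimulus] = feedback

    # Autonomy: the state also changes by the agent's own transition rules,
    # independently of any current stimulus.
    def tick(self):
        self.internal_clock += 1


# Observed at this level of abstraction, we only see stimuli going in and
# responses coming out, not how the internal transitions are realised.
agent = CandidateMoralAgent()
agent.adapt("person_in_crosswalk", "yield")
print(agent.respond("person_in_crosswalk"))   # yield
```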

Floridi and Sanders’ approach to defining the requirements for artificial moral agency is not without its critiques, the strongest of which likely comes from Himma (2009). He argues that, under Floridi and Sanders’ formulation, rattlesnakes, for example, could be wrongly considered to be moral agents. If, as Himma’s example goes, the rattlesnake acts in response to hunger and kills something, then it would have acted autonomously, certainly interactively, and apparently with some ability to learn. The crux of Himma’s argument seems to be that only praise- or blameworthy agents could be moral agents. There are two issues with Himma’s argument, especially as it pertains to artificial moral agency.

Firstly, Himma’s argument presupposes that discourse around moral agency is equivalent to responsibility analysis and that no room exists for prescriptive discourse in the identification of moral agents (Floridi and Sanders 2004). Secondly, and to use his example, the rattlesnake would not qualify as a moral agent, according to Floridi and Sanders’ requirements, because it cannot learn moral values. It is only responding to instinct.

An artificial agent, on the other hand, can be programmed to simulate the capacity to learn (morally), and thus could qualify as an AMA. How good an AMA it will be (i.e. responsibility analysis) is a different matter altogether, and will require us to build models of computational morality and to evaluate them. To be clear, without consciousness or intentional/unconscious mental states, the AMA could not be a full moral agent, but that is why we put the qualifier ‘artificial’ in front of ‘moral agent’. In theory, its moral performance will lie somewhere between that of a rattlesnake and that of a full moral agent such as a human being (Moor 2006).

I have argued that the capacities required for moral agency, as expressed by Floridi and Sanders (2004), lend themselves to being computable. However, that is not the only reason that artificial moral agency is compatible with a framework of computational rationality. Computational rationality exists as a framework primarily because artificial agents are not perfectly rational. They face many internal and external constraints, such as limited computational resources, limited information about the problem at hand, limited time (and space) within which to make a decision, the tractability of the problem itself, and so on.

As it turns out, AMAs face many of the same constraints and limitations as computationally rational agents. AMAs have to make moral decisions despite limitations of computational resources, information, time, and the tractability of the moral decision itself. I posit that the problem of computational morality is simply a special case, albeit a complex one, of computational rationality, and that many of the approaches to solving computational rationality in the general case can be used to further enhance the prospects for computational morality.

For example, the emergence of hybrid approaches, i.e. combinations of model-based (top-down) and model-free (bottom-up) methods, as a superior choice for certain complex tasks in computational rationality (Gershman et al. 2015), and the fact that prominent researchers in machine ethics believe that a combination of top-down and bottom-up approaches will likely be required to solve certain kinds of complex moral decisions (Allen et al. 2005), lends further credence to the idea that the two domains are more related than different. Just as Russell and Subramanian (1995) popularised the concept of a bounded-optimal agent, perhaps it is time to start talking about bounded-optimal artificial moral agents, i.e. AMAs that arrive at moral decisions based on the information they can acquire from the environment, given the limitations of their software architectures and programming. A toy sketch of the hybrid idea follows below; after that, I will briefly discuss a basic conceptual model for a computationally rational AMA.
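The following toy sketch is only meant to fix the hybrid idea: a bottom-up, learned (model-free) score proposes actions, while top-down, explicit rules filter out impermissible ones. The actions, scores, and rule are invented for illustration and do not come from any of the cited systems.

```python
# A toy illustration of a hybrid (top-down + bottom-up) choice. All actions,
# scores and rules below are invented assumptions for illustration only.

def hybrid_choice(candidate_actions, learned_score, hard_constraints):
    """Return the best-scoring action that violates no explicit top-down rule."""
    permitted = [a for a in candidate_actions
                 if all(rule(a) for rule in hard_constraints)]
    if not permitted:
        return None   # defer to a human or a safe default if nothing is permitted
    return max(permitted, key=learned_score)


actions = ["swerve_left", "swerve_right", "brake_hard"]
learned = {"swerve_left": 0.7, "swerve_right": 0.9, "brake_hard": 0.5}.get
no_oncoming_lane = lambda a: a != "swerve_right"   # assumed top-down prohibition

print(hybrid_choice(actions, learned, [no_oncoming_lane]))   # swerve_left
```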

A model for an optimally-bounded, computationally rational AMA

The proposed model for a computationally rational AMA is based on the idea of an optimally-bounded, computationally rational agent that has been discussed thus far. I openly base the model on Russell and Norvig (2009, p. 55) (see Fig. 1), whose conception of a general learning agent is simple yet comprehensive. The ideas in computational rationality can be integrated into the model of any general artificial and intelligent agent, so long as its key tenets, such as bounded-optimality, the separation of meta-reasoning from specific algorithms for reasoning, and the use of formal and heuristic methods, are preserved. Figure 1 depicts the structure of a general learning agent which can perform certain actions in an environment, through its sensors and actuators, according to a set performance standard (perhaps set by a human being). The general learning agent can also improve its decision making and performance capability over time, and generate new problems (goals) that can help it to improve performance further and learn new ways to reason.

Fig. 1
figure 1

A generic representation of a general learning agent (Russell and Norvig 2009)

I present Fig. 2 as a proposed high-level conceptual model for a computationally rational AMA. The agent gathers bounded information from the environment and processes it in the ethical performance element, which is responsible for ethical as well as general reasoning. The decisions and actions from this element are then transferred back to the environment (via the relevant actuators and communication mechanisms). The learning element and problem generator are left as-is from Russell and Norvig’s conception. They are responsible for updating the performance element with new ways to reason and for generating new ideas for future performance, respectively. The critic element is also similar, except that, in addition to allowing external input (e.g. human input) to modify the performance of the agent, it also allows the agent to provide a human-understandable rationale for its performance.

Fig. 2
figure 2

A conceptual model for an optimally-bounded, computationally rational AMA

Figure 3 zooms in on the ethical performance element, where the ethical meta-reasoner is responsible for deciding on the best ethical framework (or combination thereof) and one or more programs to execute in order to arrive at an optimally-bounded ethical decision. The ethical performance element thus separates high-level meta-reasoning activities from their execution. However, it still exposes the ethical meta-reasoner to the information from the environment, to allow it to make the optimal choice of execution strategy. At a high level, the proposed AMA would meet the requirements of interactivity (it can receive information from the environment and act on it), adaptability (it can change its performance state through the ethical performance and learning elements), and autonomy (it can behave in a somewhat autonomous manner through the problem generator, which generates new ideas about how to execute performance in the future). Additionally, it can receive a new performance standard and explain its current performance to a human being. A speculative sketch of this element in code follows Fig. 3.

Fig. 3
figure 3

A detailed view of the ethical performance element
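The following is a speculative sketch of how the ethical performance element of Figs. 2 and 3 might be organised in code. The program names, frameworks, and the quality/cost scoring are placeholder assumptions of mine rather than part of the proposed model; the point is only the separation of ethical meta-reasoning from the programs it selects and executes under a resource bound.

```python
# A speculative sketch of the ethical performance element, not a working AMA.
# The scoring, budget handling, and example programs are placeholder assumptions.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

Percept = Dict[str, float]   # bounded information gathered from the environment
Decision = str

@dataclass
class EthicalProgram:
    name: str
    framework: str                          # e.g. a rule-based or outcome-based scheme
    run: Callable[[Percept], Decision]
    expected_quality: float                 # estimated quality of its decisions
    expected_cost: float                    # estimated computational cost (seconds)

@dataclass
class EthicalPerformanceElement:
    programs: List[EthicalProgram]
    rationale: List[str] = field(default_factory=list)

    def meta_reason(self, percept: Percept) -> EthicalProgram:
        """Ethical meta-reasoner: pick the program with the best quality/cost
        trade-off that fits the time budget implied by the percept."""
        budget = percept.get("time_budget", 1.0)
        feasible = [p for p in self.programs if p.expected_cost <= budget] or self.programs
        return max(feasible, key=lambda p: p.expected_quality - p.expected_cost)

    def act(self, percept: Percept) -> Decision:
        program = self.meta_reason(percept)
        decision = program.run(percept)
        # Stored so the critic can surface a human-readable rationale later on.
        self.rationale.append(f"{program.framework}:{program.name} -> {decision}")
        return decision


element = EthicalPerformanceElement(programs=[
    EthicalProgram("cautious_rules", "deontic", lambda p: "stop", 0.7, 0.05),
    EthicalProgram("utility_search", "consequentialist", lambda p: "proceed", 0.9, 2.0),
])
print(element.act({"time_budget": 0.1}))   # stop: only the cheap program fits the budget
```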

With regard to potential limitations of the model, I have argued in earlier sections that the AMA is conceptualised to have weak machine ethics (Sect. 3). As such, we can expect that the AMA would only be capable of making some, but not all, moral decisions. At this stage, it would be difficult to determine which moral decisions it would be able to make, and such a determination lies outside the scope of this article. However, I can speculate that moral decision making in situations where (bounded) information is readily available and accessible to the AMA should be theoretically possible. Such contexts could include highly domain-specific environments, such as self-driving cars, healthcare robots, loan approval bots, home assistants, and the like.

The model depends heavily on the availability of bounded information. Thus, I expect that moral decisions requiring little to no external information (i.e. abstract decision-making) would be difficult to compute, at least initially, until the AMA learns a sufficient representation of moral values. Furthermore, there is the general issue (not necessarily a limitation, but an unknown) of how the model would internally represent its learned moral values, and how this would map to actions that affect real agents in the real world.

Conclusion

The purpose of this article was to advance an argument and a model for artificial moral agency based on a framework of computational rationality. This was done by showing that computational rationality can be an integrative framework that combines the scientific and philosophical elements of artificial moral agency in a consistent and logical manner. In particular, I argued that the capacities required for artificial moral agency, as well as the aspects of functional consciousness that underpin them, are computable. I further argued that computational morality is a special, albeit complex, case of computational rationality, and hence that many techniques originally developed for general rationality can be adapted for computational morality. I then briefly proposed a conceptual model for a bounded-optimal, computationally rational AMA.

Some philosophers and scientists might reject the idea of a bounded-optimal artificial moral agent. After all, the stakes can be quite high when it comes to moral decision making, as the wrong decision could have significant moral and societal implications. However, we need to start somewhere, and I suggest that starting from a weak machine ethics perspective allows us to begin to test its limits and the sorts of domains and contexts where it can be applied. The model proposed is an invitation for dialogue and feedback, and the hope is that many philosopher-developer pairs can be formed to solve the problem of constraining weak AI systems and making them more respecting of human moral values.

I have specifically chosen to omit mentions of the ethical frameworks that the AMA should follow, as the main purpose of this article was to locate artificial moral agency within a framework of computational rationality. Future research needs to focus on the kinds of ethical frameworks an optimally-bounded, computationally rational AMA ought to follow. Further research into appropriate software architectures for the AMA, and the type of programs that can form part of the ethical performance element’s program space, is also required.