
1 Introduction

Humans reason—of that there is no doubt. But what sort(s) of reasoning do we do? Clearly there are some among us who do mathematical reasoning, and do it well. And, it has been argued that all reasoning is an attempt to reach the ideal model of mathematics, i.e., to arrive at true conclusions (from given assumptions).

Perhaps for this reason, efforts to examine human reasoning have tended to be in formal logical dress, mimicking the rigor of mathematics, e.g., Aristotle, Leibniz, Boole, and Frege.Footnote 1 But there is evidence—recounted below—that human reasoning is not always aimed specifically at true conclusions.

The field of artificial intelligence also followed in this math-based mode, at least initially, despite many doubters. One such doubter, Marvin Minsky, pointed out an embarrassingly obvious difficulty: much of human reasoning tends to be non-monotonic. What we conclude depends not only on what we believe but also on what we do not believe. That is, we draw conclusions on the basis of not having certain beliefs.

Minsky’s famous example is essentially this: Told Tweety is a bird, we may reasonably come to believe that Tweety can fly. But if we had been told Tweety is a bird and furthermore is a penguin, we would not have drawn that conclusion. That is, the original conclusion (Tweety can fly) was produced by an inferential process that can be halted by the presence of additional information. This completely flies (pardon the pun) in the face of mathematical reasoning, where only ironclad guaranteed true conclusions are of interest. Minsky’s example highlights the fact that in everyday life very often we are content (indeed may have no other choice but) to seek a very highly plausible conclusion. The real world has far too many parameters for us to be able to have strict data on them all, so we end up reasoning as if we had beliefs such as “most birds fly”, and so on. And this relaxing of the truth-demand into a plausibility demand opens the door to retractions in the face of further evidence.

Well, this did not stop the logic-based AI-ers from using logic. It simply encouraged them to find better logics, so-called non-monotonic logics, where an enlarged set of assumptions might lead to a different set of conclusions that happens to be missing one of the original ones. A number of proposals for such logics soon surfaced, and had high degrees of success, most notably those of McCarthy and of Reiter.Footnote 2

Nevertheless, smart artificial systems did not spring up. It seems that being non-monotonic is not enough. In fact, it was pointed out a number of times that these new logics tended to be designed for the purpose of specifying the kinds of conclusions a smart reasoner ought to come to, but were not in general useful to system designers. The logics did not lend themselves to specifying ways for a system to actually arrive at these conclusions. The biggest roadblock was that the logics in question—like those before them—provided a characterisation of the set of all theorems that would follow from given axioms (or beliefs). This set typically is infinite, and the logics give little or no indication as to how and in what order these theorems are to be proven. Thus a system designer is stuck with the task of building an inference engine that produces, little by little, the theorems specified by one of the formal logics.

Yet even if one solves these problems, another surfaces. Very many of the theorems are of no use whatsoever to a given system’s activity. Logics in general tend to have promiscuous rules of inference, concluding sentence after sentence without regard for their usefulness. This problem of relevance had long been recognised. But possibly an even worse aspect is that time (a lot of time) is being used up, both on these irrelevant results and on the enormous (even infinite) set of all theorems. Somehow a real-world reasoner must exercise some control over its reasoning so that time is not wasted without regard to real-world exigencies.

This is a major difficulty because time is highly significant in almost all endeavours, and because formal inference crunches on and on forever, oblivious to time. Clearly, on-board logicFootnote 3 must be able to take into account the fact that time passes as reasoning is going on, and what is important at one moment might not be so at another. In short, a reasoner ought to realise that “now” changes out from under it; it never stands still. So reasoning about time is slippery. As an example, the following makes perfect sense and is essential for effective on-board logic, yet is absurd from the point of view of spec logic:

From Now(t) infer ¬Now(t).

That is, as soon as the time is known to be t, it no longer is t. Given a small unit or grain of time (e.g., a second, or a millisecond), the gist of the above can be approximated by this rule:

$$\frac{t : \mathit{Now}(t)} {t + 1 : \mathit{Now}(t + 1)}$$

The above “clock” rule is the essential feature of so-called active logics, a species of on-board logic. While it might not seem particularly revolutionary, it has major consequences. Three of the most important are as follows:

  1.

    Reasoning can keep up with deadlines. Given a noon lunch appointment, one reasons at 11:30 a.m. that at 11:45 one should start walking to the restaurant. Then at 11:44 one reasons that it is time to stop reading the newspaper and put on one’s coat. And by the time one’s coat is on, one reasons that it is time to walk (because by the time that reasoning has been done it will be close enough to 11:45). Trivial enough, but impossible to do with spec logics. But the clock rule makes deadline sensitive reasoning possible.Footnote 4

    Here is a much-simplified example of the above form of reasoning in active logic, involving a deadline. We have annotated each time-step in the reasoning with the actual time on the left; via the clock rule, the logic has effective access to this information as well, assuming it is started off with the correct time. In each step below we have placed the agent’s relevant beliefs at that time, with any new ones listed first; among these is always the current time, Now(t). And the last step shown has the newly inferred belief “Walk” as well. Note that beliefs of the form Now(t) are not inherited to the next step (see above clock rule) but that in general other beliefs—such as that one should start walking at 11:45—tend to be retained (precisely which beliefs are to be retained is a subtle issue; in particular cases there are useful heuristics but no single general principle). Note that, in general, a belief at one time is carried forward (remains a belief) at later times—for instance

    Now(11 : 45) →  Walk

    remains a belief indefinitely, whereas some special beliefs, such as knowledge of the current time, are dropped at the next step and replaced by a new belief (in this case due to the above clock rule, because time is always changing; but something similar can occur whenever there is reason to no longer hold a belief). Here is the example:

    $$\begin{array}{rcl} & & \mathrm{[11 : 30] : Now(11 : 30),Now(11 : 45)} \rightarrow \mathrm{Walk} \\ & & \mathrm{[11 : 30 : 01] : Now(11 : 30 : 01),Now(11 : 45)} \rightarrow \mathrm{Walk} \\ & & \ldots \\ & & \mathrm{[11 : 44 : 59] : Now(11 : 44 : 59),Now(11 : 45)} \rightarrow \mathrm{Walk} \\ & & \mathrm{[11 : 45] : Now(11 : 45),Now(11 : 45)} \rightarrow \mathrm{Walk} \\ & & \mathrm{[11 : 45 : 01] : Now(11 : 45 : 01),Walk,Now(11 : 45)} \rightarrow \mathrm{Walk} \\ \end{array}$$

    At time 11:45 above, modus ponens goes to work on the then-current beliefs, and by 11:45:01 has inferred Walk. One simplifying assumption here is that it takes one “step” of time to apply an inference rule. Note that the belief that at 11:45 the Walk action should begin is still there among the beliefs, even though it is not likely to be useful; this can be “pruned” by a cleanup rule that drops conditionals of the form

    $$\mathit{Now}(t) \rightarrow X$$

    after time t has passed; after all, Now(t) will never be true again after that time, so the conditional remains (vacuously) true but can never again be used to conclude anything.

  2.

    Inconsistency is a disaster for spec logics: in the presence of a contradiction they accept every sentence as a theorem, making them useless. Paraconsistent logics adopt various means to avoid this “explosion” of consequences. But what is really needed is a paraconsistent logic with the ability not only to side-step a contradiction, but to notice it and consider what to do about it, possibly altering its status as a belief. After all, it might be an important clue to something amiss. Again, time comes to the rescue, providing a temporal “stratification” of theorems according to when they are proven, so that the time at which one sentence is proven (believed) allows inferences at the next time-step to comment on the previous result, such as that it is in contradiction with other beliefs and should be abandoned:

    $$\frac{t : P,\neg P} {t + 1 : \mathit{Contra}(P,t)}$$

    However, active logic does not discover all inconsistencies; that is in general not computable in finite time. It simply scans the current knowledge base for an occurrence of a wff and its negation. If deeper inconsistencies remain, so be it: just as a human may unknowingly entertain contradictory beliefs, so with active logic. Only when a contradiction is noticed (such as a direct contradiction between a formula and its negation) is an agent (human or otherwise) in a position to do anything about it.

    Also, once P and ¬P are noticed and removed from the KB, there is no general method for adjudicating between them. In general, the reasoning agent may have to be content with uncertainty. In particular cases, there are heuristics that may be useful, such as deciding in favour of P if the evidence that produced it is more compelling than that for ¬P. That of course requires additional machinery.

  3.

    Inconsistency is just one example of a situation needing some sort of change (e.g., coming to distrust various sentences). But more generally, any manner of change might be called for in a given situation. Even a change in language may be needed, if for example that is a plausible way to resolve an inconsistency. For instance, given the beliefs “John is reading,” and “John is wagging his tail,” one might consider that the word John is being used to name two different entities. This might then prompt the introduction of two new names, John1 and John2. But to do this, the reasoner must have breathing room: time to make such changes before the inference engine rushes ahead to the infinitely many theorems that would arise from the two earlier beliefs, which together imply something implausible, namely that a dog is reading.

    So where does this leave us? Are we closer to commonsense reasoning? We think so. One feature that we have identified as a key to commonsense reasoning is the ability to notice—and respond usefully to—anomalies. And it turns out that anomalies can easily be cast in the form of mismatches between expectations and observations, i.e., a contradiction—or close enough so that the Contra and Distrust rules can go into action. An evolving-time logic such as active logic provides just this capability; a minimal computational sketch of these ideas closes this section.
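To make the stepwise behaviour described above concrete, here is a minimal Python sketch of a single active-logic time step, covering the clock rule, one-step modus ponens, the Contra rule, and the pruning of stale Now(t) → X conditionals. The belief encoding, the function name `step`, and the timing of pruning are our own illustrative simplifications, not the actual active logic implementation.

```python
# Minimal, illustrative sketch of one active-logic time step (not the real implementation).
# Beliefs are tuples: ("now", t), ("if", p, q), ("not", p), ("contra", p, t), or plain strings.

def step(beliefs, t):
    """Compute the belief set at time t+1 from the belief set held at time t."""
    new = {("now", t + 1)}                       # clock rule: Now(t) gives way to Now(t+1)

    for b in beliefs:                            # inheritance of beliefs to the next step
        if isinstance(b, tuple) and b[0] == "now":
            continue                             # Now(.) beliefs are never inherited
        if (isinstance(b, tuple) and b[0] == "if"
                and isinstance(b[1], tuple) and b[1][0] == "now" and b[1][1] <= t):
            continue                             # cleanup rule: drop Now(s) -> X once s has passed
        new.add(b)

    for b in beliefs:                            # modus ponens, taking one step of time
        if isinstance(b, tuple) and b[0] == "if" and b[1] in beliefs:
            new.add(b[2])

    for b in beliefs:                            # Contra rule: note direct contradictions
        if isinstance(b, tuple) and b[0] == "not" and b[1] in beliefs:
            new.add(("contra", b[1], t))

    return new

# Deadline example (times encoded as seconds past 11:00 for brevity).
kb = {("now", 44 * 60 + 59), ("if", ("now", 45 * 60), "Walk")}
kb = step(kb, 44 * 60 + 59)    # at 11:45:00 the belief Now(11:45:00) appears
kb = step(kb, 45 * 60)         # by 11:45:01 modus ponens has fired
print("Walk" in kb)            # True
```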

2 Human Paraconsistency

Humans are very good at dealing with—reasoning and acting in the face of—uncertainty, change and even contradictions. In contrast, AI systems, especially those implemented with logic-based reasoning mechanisms, are notoriously bad at coping with these pervasive features of real environments. One widely-accepted conclusion from these observations has been that humans do not use logic-based mechanisms to implement their core reasoning abilities. And, indeed, there is a great deal of empirical evidence that seems to point in this direction. Humans often fail to achieve the ideal of valid logical deduction, and in many contexts we seem to utilise representational formats more suited to non- or extra-logical manipulations.

For instance, a large body of research has established that people are less likely to judge instances of modus tollens to be valid than instances of modus ponens.Footnote 5 Moreover, people are subject to some characteristic logical fallacies, such as the converse error (Example 12.1) and the inverse error (Example 12.2):

Example 12.1.

If the horses went to the watering hole, we would see their tracks.

We see their tracks. ∴ The horses went to the watering hole.

Example 12.2.

If the horses went to the watering hole, we would see their tracks.

The horses did not go to the watering hole. ∴ We will not see their tracks.

Interestingly, despite trouble with modus tollens in general, people have little trouble with that logical form in the following sort of case:

Example 12.3.

If the horses went to the watering hole, we would see their tracks.

We do not see their tracks. ∴ The horses did not go to the watering hole.

This pattern of results has suggested to many that what looks like logical reasoning is actually causal reasoning. Rather than building formal logical models from these sentences and judging the validity of the argument, we are in fact building causal models of the situations depicted, and judging the likelihood of the outcome. And, indeed, by those standards, arguments Examples 12.1 and 12.2 represent fairly plausible inferences.
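The validity claims above are easy to check mechanically: an argument form over p and q is valid just in case no assignment of truth values makes all the premises true and the conclusion false. The brute-force check below is our own illustration and makes no claims about how people actually reason.

```python
from itertools import product

def valid(premises, conclusion):
    """An argument form is valid iff no truth assignment satisfies the premises but not the conclusion."""
    return all(conclusion(p, q)
               for p, q in product([True, False], repeat=2)
               if all(prem(p, q) for prem in premises))

imp = lambda a, b: (not a) or b   # material conditional

forms = {
    "modus ponens":          ([lambda p, q: imp(p, q), lambda p, q: p],     lambda p, q: q),
    "modus tollens":         ([lambda p, q: imp(p, q), lambda p, q: not q], lambda p, q: not p),
    "converse error (12.1)": ([lambda p, q: imp(p, q), lambda p, q: q],     lambda p, q: p),
    "inverse error (12.2)":  ([lambda p, q: imp(p, q), lambda p, q: not p], lambda p, q: not q),
}
for name, (premises, conclusion) in forms.items():
    print(name, valid(premises, conclusion))
# modus ponens, modus tollens: True; converse and inverse errors: False
```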

Similarly, results from the Wason card selection task in Johnson-Laird and Wason (1970) apparently point to the use of inference mechanisms that are not logic-based. In this task, participants are shown four cards, e.g. (A, K, 2, 7), given a rule of the form “If a card has a vowel on one side, it has an even number on the other” (p → q), and asked to choose the cards they need to turn over to check the validity of the rule. The majority of participants choose A and 2 (i.e. p and q), even though the logically correct choice is A and 7 (p and ¬q). To cite just two examples of how this evidence has been interpreted, Oaksford and Chater (1994) take it to indicate that decision making is instead driven by considerations of information yield (according to their analysis, turning over A and 2 yields more information about the rule than does turning over any other two cards), while Cosmides (1989)—after noticing that participants make the logically correct choices when the abstract rule is replaced with one governing social conduct, e.g., “If you drink beer you must be over 21”—argues for the existence of mechanisms specialised for reasoning about social exchanges.
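The same point can be made computationally for the selection task: a card needs to be turned over exactly when its hidden side could reveal a violation of the rule. The following sketch is our own illustration, not part of the cited studies.

```python
# Cards show one side; the rule under test is "vowel on one side -> even number on the other".

def is_vowel(c):
    return c in "AEIOU"

def must_turn(visible):
    """A card must be turned iff its hidden side could falsify the rule (a vowel paired with an odd number)."""
    if visible.isdigit():
        return int(visible) % 2 != 0   # an odd number might hide a vowel (the not-q card)
    return is_vowel(visible)           # a vowel might hide an odd number (the p card)

print([card for card in ["A", "K", "2", "7"] if must_turn(card)])   # ['A', '7']
```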

Such results—and there are many more like them—are of course deeply interesting, and assimilating them will be crucial to articulating a complete model of the mechanisms supporting human reasoning. And while we do not wish to question the existence and importance of the many different non-logical mechanisms that have been proposed to account for the vast amounts of available data on human reasoning and decision-making—including causal and other mental models (Gentner and Stevens 1983; Johnson-Laird 1983), Bayesian inference (Oaksford and Chater 2007), social exchange modules (Cosmides 1989), expected utility curves (Kahneman and Tversky 1979), frequency sensitivity (Gigerenzer 1994), and expected information gain (Oaksford and Chater 1994)—we would like to suggest that there is nevertheless room for continued empirical attention to human logical reasoning, for at least the following reasons.

First, and most obvious, from the fact that humans possess some inferential mechanisms that are not logic-based, it does not follow that we do not have and use some native logic-based reasoning abilities. It might be noted in support of this thought that people’s vulnerability to fallacies like those presented in arguments Examples 12.1 and 12.2 largely disappears when the propositions involved are not causally related as they are in the examples. This suggests that causal-model-based mechanisms may be interfering with logic-based ones in circumstances in which both potentially apply.

Second, from the fact that logic-based AI is brittle while humans are not, it does not follow that human flexibility is necessarily or entirely the result of non-logical capacities. It may be that human logic takes a special form, or has certain features, or interacts with non-logical capacities in particular ways, and these attributes of human logic have simply not been captured in prevailing logic-based AI systems.

Third, even if it is proven that humans have no natural, native, logic-based inference mechanisms, the fact that humans can nevertheless reason logically would mean that our non- and extra- logical capacities can be harnessed to this end. Thus, investigating human logical reasoning, particularly in the face of contradiction and change, may help us understand what is special about our implementation of logic such that it supports the observed flexibility of human reasoning.

Fourth and finally, given the significant advantages of logic-based implementations in AI—including the fact that rule-based systems are relatively easy for humans to understand, and therefore to trust, and that changing their behaviour is as simple and quick as changing the rules that govern it (something that is not the case in systems that require extensive (re-)training)—it behooves us to consider how logic-based systems can be made more robust in the face of various perturbations. Human perturbation-tolerance can be a source of ideas and inspiration in this task.

Unfortunately, perhaps because human flexibility has been largely taken as an indication of non- or extra-logical mechanisms at work, there has been relatively little empirical work on human performance in the face of contradictory or changing information in specifically logical contexts. There has nevertheless been some work along these lines, enough to draw some preliminary conclusions that can be used to guide the development of more robust logic-based systems. We will first review the results, and then discuss what we take the implications to be.

In one interesting set of experiments Dean Sharpe and Lacroix (1999) asked adults and children how they resolve assertions of the form p& ¬p, such as the response “yes and no” to the question “Was the movie good?” In this work, 24 adults and 48 children (ranging in ages from 3 to 8) were told a story about two characters having dinner. At the end of the meal, one asks the other, “Did you like your supper?”, to which the other character replies “Yes and no. I liked my supper and I didn’t like it.” Participants were asked to explain what the second character meant by the response.

The vast majority of participants (around 70%, including some children as young as 4), dealt with the contradiction by reinterpreting the statement p (I liked my supper) to take advantage of the internal structure of the object “supper”. That is, they took the character to be asserting that he liked one part of the supper, but didn’t like a different part. In addition, two other strategies were employed. Two adults and nine children reinterpreted the meaning of p by drawing attention to the applicability of the predicate “like”. These participants said things like: the supper was average, so he neither liked it nor didn’t like it. In addition, four of the adults, but only one of the children simply denied p, explaining that he didn’t like the supper, but was trying to be polite. There were no other resolution strategies employed. The authors summarise their main findings by noting that “adults and even preschoolers possess interpretive structures—particularly object structure—that are non-classical in the sense that they can be used to resolve apparent contradictions” (Sharpe and Lacroix 1999, p. 489).

A different set of experiments revealed some similar tendencies. Renee Elio (1997, 1998) asked what strategies people use to resolve logical contradictions of the form {p, p → q,  ¬q}. Participants were given premises like:

Example 12.4.

  • If the ignition key is turned the car will start.

  • The ignition key was turned.

They were then told:

  • The car did not start.

and asked: assuming that this last statement (C) is true, which of the two original premises, (A) the conditional or (B) the statement that the key was turned, do you think it is more plausible to disbelieve? What revision would you make to that statement to make it consistent with the other premises?

Overall, participants were more inclined (around 60% of the time) to doubt p → q than they were to doubt p,Footnote 6 and when they did so they usually (around 63% of the time) made the statement consistent by re-interpreting the meaning of p, typically by adding conditions. Thus, participants might revise the statement to read “If the ignition is turned and the battery is not dead, then …” Most of the remaining revisions (around 30%) involved reinterpretations of q, with the effect of turning the rule into a default, e.g. “If the ignition key is turned the car will usually start.”

This last finding is related to an interesting discovery by Byrne (1989), that reasoners seem to tacitly treat many rules as defaults, and thus can be made to suppress valid inferences under certain conditions. In her studies she found that while participants were happy to accept as valid inferences like:

Example 12.5.

If she has an essay to write then she will study late in the library.

She has an essay to write. ∴ She will study late in the library.

they will suppress the logically valid inference if certain additional premises are added, as in the following.

Example 12.6.

If she has an essay to write then she will study late in the library.

If the library stays open then she will study late in the library.

She has an essay to write. ∴ She will study late in the library.

In the case of argument Example 12.6, participants’ chance of accepting the conclusion that she will study late in the library drops from 96% to 38%.

So, what do these interesting findings mean?

  1.

    Humans maintain control over their inferences, and don’t necessarily come to all logically valid conclusions.

  2.

    This control is content based, in that they do not manage inference by ceasing to apply valid rules to all applicable forms, but instead selectively block application of valid rules to certain formulas. As Byrne concludes: “The moral of these experiments is that in order to explain how people reason, we need to explain how the premises of the same apparent logical form can be interpreted in quite different ways.”

  3.

    Reinterpretation of the meanings of premises is the most commonly used strategy for dealing with contradictory formulas. People maintain consistency of beliefs by changing their meanings in appropriate ways.

  4.

    People use only a few strategies to address inconsistencies; these strategies nevertheless suffice for the purposes of everyday reasoning.

Can these features be captured in a formal system? We think so, and active logic is intended as one proposal for how that might be done. For instance, feature 1 is captured by active logic’s stepwise character—an active logic reasons in time and, through the use of rules like contra(), permits “inspection” of its beliefs at each step. This allows an active logic to decide whether to continue to trust certain beliefs, or cease using them in further inference. In conjunction with this, active logic allows sentences to be “superscripted”, as in the earlier example of the two Johns. This is a formal device implementing features 2 and 3, above. Its effect is to give an active logic the freedom to resolve contradictions by giving sentences different interpretations. Exactly how all of this is effected by active logic is described in detail in the section on active logic below, and in Anderson et al. (2008). Before getting to that, however, we turn to a brief survey of some of the many other approaches to implementing AI reasoning systems. This will allow us to better highlight the unique, and we think valuable, features of active logic.

3 Formal Models of Human Reasoning

From a theoretical perspective, an AI reasoning system typically consists of two main components: (1) a logical formalism for knowledge representation and (2) an inference engine that derives new knowledge from existing knowledge. Depending on the logical formalism and the theoretical and philosophical motivations behind the reasoning system, the inference mechanism can be deductive, inductive, non-monotonic, default, defeasible, etc. An important issue in the implementation of the inference engine is the use of heuristics, since the complexity of an exhaustive algorithmic approach is typically prohibitively high. In the following subsections we survey some reasoning systems that take different approaches to knowledge representation and inference.

General intelligence in human beings can be analyzed in terms of levels of description (see Newell 1990). Each level corresponds to a particular degree of abstraction or, more concretely, to a particular timescale of intelligent tasks: each increase of an order of magnitude on the timescale instantiates a new, higher level of abstraction. Levels can be grouped into three bands (see Rosenbloom et al. 1991): (1) the neural band, which corresponds to levels of no more than a few milliseconds and is the focus of the connectionist community; (2) the cognitive band, which corresponds to levels from a few milliseconds up to a few seconds and is the focus of the cognitive science community; and (3) the rational band, which corresponds to complex goal-oriented planning and action taking at least on the order of seconds and is the focus of the logicist and expert-systems communities.

3.1 Soar

Soar (see Laird et al. 1987; Rosenbloom et al. 1991) is an implementation of a theory-based approach to general intelligence that focuses on the cognitive band. The relationship of Soar to the other bands is investigated in Newell (1990), Rosenbloom (1989) and Rosenbloom et al. (1990). Soar assumes no distinction between human intelligence and machine intelligence; hence it has been used extensively both for developing artificial intelligence applications and for building cognitive models.

The architecture of Soar can be described by four levels of abstraction. First it uses an associative parallel memory to store long-term knowledge, and to identify and retrieve knowledge relevant to the current problem solving context. This knowledge is stored as a set of productions of the form P : condition →   action, where the correct action is performed when its preconditions hold. Memory access consists of the parallel execution of these productions. The result of this access is the retrieval of information into a short-term working memory that stores contextual information in the form of interrelated objects with attribute-value pairs. For example, an object representing a blue Ford car owned by Heather might look like

$$\left [\mathit{Id} = \mathit{te}12,\mathit{type} = \mathit{car},\mathit{model} = \mathit{Ford},\mathit{color} = \mathit{blue},\mathit{owner} = \mathit{Heather}\right ]$$

The second level of abstraction in Soar’s architecture is the decision-making mechanism, which proceeds in two-phase elaborate-decide cycles. During elaboration, memory is accessed repeatedly and the relevant productions are executed in parallel. Then, in the decide phase, one or more of the retrieved actions is performed, based upon preference knowledge about which actions are acceptable and/or desirable.
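The cycle just described can be caricatured in a few lines of Python. The sketch below uses our own toy data structures and a made-up preference table; it is meant only to convey the elaborate-decide idea, not Soar’s actual machinery.

```python
# Toy elaborate-decide cycle (illustrative only; not Soar's actual implementation).
working_memory = [{"id": "te12", "type": "car", "model": "Ford",
                   "color": "blue", "owner": "Heather"}]

def propose_wash(wm):
    return [("wash", obj["id"]) for obj in wm if obj.get("type") == "car"]

def propose_refuel(wm):
    return [("refuel", obj["id"]) for obj in wm if obj.get("type") == "car"]

productions = [propose_wash, propose_refuel]   # long-term memory as productions
preferences = {"refuel": 2, "wash": 1}         # preference knowledge over proposed actions

def elaborate_decide(wm):
    # Elaboration: all productions are matched against working memory "in parallel".
    proposals = [action for production in productions for action in production(wm)]
    if not proposals:
        return None          # no alternatives: an impasse, which would trigger a sub-goal
    # Decision: choose among the proposals using preference knowledge.
    return max(proposals, key=lambda action: preferences.get(action[0], 0))

print(elaborate_decide(working_memory))        # ('refuel', 'te12')
```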

Above decision making comes the determination of goals. Goals are set whenever the decision procedure reaches a situation (called an impasse) where either no alternatives remain, or alternatives exist but there is not enough discriminating information to choose among them (Rosenbloom et al. 1991). Along with the determination of a new goal, a new problem context is generated, which allows decision making to continue. If another impasse is encountered in the new context, then a new sub-goal and context are generated and the whole process recurs.

The final layer of abstraction is learning. When Soar resolves an impasse it summarises and generalises all the reasoning that led to its resolution. This adds new knowledge to its long-term memory that will prevent the occurrence of such an impasse in similar future situations. Soar’s learning mechanism can be used to learn new conceptual knowledge, learn new procedures, and correct its knowledge from the feedback obtained from its interactions with the surrounding environment.

3.2 Cyc

Cyc is a reasoning system that focuses on the construction of a vast knowledge base (KB) of trivial and commonsense knowledge (see Lenat et al. 1990; Lenat and Guha 1990). The rationale behind Cyc is as follows. The research and design of AI reasoning systems has largely concentrated on the development of a logical formalism for knowledge representation and an efficient inference engine based on that formalism. However, little attention has been given to the construction of a real KB, or at least an approximation to one, that grounds the whole enterprise in reality (the raw material over which the reasoning engine operates). Such a KB would encode commonsense knowledge about the world that we take for granted, concerning things such as time, space, agenthood, life, death, etc.

The early systems lacked the kind and amount of knowledge that would make them effective. With modest-sized KBs (\({10}^{2}\)–\({10}^{3}\) domain-specific assertions or rules), such systems sometimes showed very impressive performance in narrow task domains but notable problems remained. For example, consider an expert system that contains the following rules from Lenat et al. (1990):

$$\begin{array}{rcl} & & \mathit{if\ frog(x),\ then\ amphibian(x)} \\ & & \mathit{if\ amphibian(x),\ then\ lays\_eggs\_in\_water(x)} \\ & & \mathit{if\ lays\_eggs\_in\_water(x),\ then\ lives\_near\_lots\_of(x,water)} \\ & & \mathit{if\ lives\_near\_lots\_of(x,water),\ then\ }\neg \mathit{lives\_in\_desert(x)} \\ \end{array}$$

Given the assertion that Freda is a frog, the expert system can conclude various facts about Freda, such as that Freda is an amphibian, lays eggs in water, lives near lots of water, etc. However, it cannot answer simple commonsense questions that would seem trivial to humans, such as: Does Freda lay eggs (simply lay eggs, as opposed to laying eggs in water)? Is Freda sometimes in water? Is Freda a living being? Hence, such expert systems with complex detailed knowledge were very rigid and non-robust, and could easily fail when encountering a situation or question even slightly different from the intended narrow domain.
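The brittleness point can be seen concretely with a toy forward chainer over the first three rules (our own sketch; Cyc itself is vastly larger and does not work this way). Everything reachable by the rules is derived, but a question phrased with a predicate the system has never seen, such as plain lays_eggs, simply fails.

```python
# Toy forward chaining over the frog rules (illustrates the brittleness point only).
rules = [
    ("frog", "amphibian"),
    ("amphibian", "lays_eggs_in_water"),
    ("lays_eggs_in_water", "lives_near_lots_of_water"),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

freda = forward_chain({"frog"}, rules)
print("lays_eggs_in_water" in freda)   # True: within the intended narrow domain
print("lays_eggs" in freda)            # False: the "obvious" commonsense question goes unanswered
```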

Cyc is an attempt to overcome this brittleness. Its philosophy is to build a vast KB (at least on the order of millions of facts) containing general commonsense facts, domain-specific facts, general heuristics, specific heuristics, and heuristics for analogizing.

The construction of Cyc is, by its very nature, incremental. This includes the representation language, the inference engine, and of course the KB itself.

3.3 OSCAR

As opposed to Soar which is intended to simulate the cognitive band, OSCAR is constructed to simulate the rational band (Pollock 1992). It is an architecture for rational agents based upon an evolving philosophical theory of rational cognition (Pollock 1999). The general architecture is described in Pollock (1995). OSCAR’s overall behaviour can be briefly described by the following cycle: (1) OSCAR has beliefs representing the surrounding environment, (2) it evaluates the current situation according to these beliefs, then (3) it engages in an activity to change the world to its liking and to update its belief system. The most distinguishing feature of OSCAR is that most of its rational cognition is performed by epistemic cognition, cognition about what to believe, as opposed to practical cognition which is cognition about what to do.

OSCAR is essentially a defeasible reasoner. Additionally, by providing it with the axiom schemas of first-order logic it becomes a complete theorem prover for that logic (that is, OSCAR is able to deduce every valid first-order formula). Defeasible reasoning leads to conclusions that are not necessarily deductively valid: the truth of the premises, along with a rationally compelling argument, provides good support for the conclusion, even though it is still possible for the premises to be true and the conclusion false. Such premises are called prima facie reasons. Conclusions supported defeasibly might have to be withdrawn later in the face of additional information (Pollock 1999). For instance, if something looks red to me, that gives me a prima facie reason for thinking that it is red. But if someone I trust insists that it is not red, that gives a rebutting defeater; this kind of defeater attacks the conclusion. Another kind of defeater attacks the relationship between the premises and the conclusion: for example, learning that the object was illuminated by red light should weaken my belief that it is red. The interested reader may consult Pollock (1987, 1989, 1991a,b) for further details.
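A minimal sketch of the colour example, in our own encoding rather than OSCAR’s, shows the difference between the two kinds of defeater: a rebutting defeater attacks the conclusion itself, while an undercutting defeater attacks the link between the prima facie reason and the conclusion.

```python
# Toy defeasible inference about "the object is red" (illustrative; not OSCAR's architecture).
REASON      = "looks_red"                       # prima facie reason for "is_red"
REBUTTER    = "trusted_source_says_not_red"     # attacks the conclusion itself
UNDERCUTTER = "red_light_illumination"          # attacks the reason-conclusion link

def defeasibly_justified(evidence):
    if REASON not in evidence:
        return False
    return REBUTTER not in evidence and UNDERCUTTER not in evidence

print(defeasibly_justified({"looks_red"}))                                  # True
print(defeasibly_justified({"looks_red", "trusted_source_says_not_red"}))   # False (rebutted)
print(defeasibly_justified({"looks_red", "red_light_illumination"}))        # False (undercut)
```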

3.4 SNePS

SNePS, the Semantic Network Processing System (Shapiro 1979; Shapiro and Rapaport 1987, 1992; Shapiro 1993), is a logic-based approach to natural language understanding and commonsense reasoning. Its ultimate goal is to acquire new knowledge through natural language interaction, either with human agents or through media such as books, journals, radio, TV, etc. SNePS should generally be able to represent everything expressible in natural language and should be able to reason in the presence of incomplete, circular, or inconsistent information.

Reasoning in SNePS is done through a formalism called SNePSLOG (SNePS logic), an enhanced version of first-order logic adapted to the natural language context (Shapiro 2000). For example, one of the features of SNePSLOG is the implementation of a new logical connective andor(i, j), which can be used to express the fact that an object satisfies some of the properties among several alternatives. This is not easily expressible in first-order logic because it is neither an inclusive or nor an exclusive or. The general formal syntax of andor(i, j) is:

$$\mathit{andor}(i,j)\{{P}_{1},\ldots ,{P}_{n}\}$$

is true if and only if at least i and at most j of the first-order properties \({P}_{1},\ldots ,{P}_{n}\) are true. Another improvement to first-order logic is the addition of the connective thresh, which has the following syntactic form:

$$\mathit{thresh}(i,j)\{{P}_{1},\ldots ,{P}_{n}\}$$

and is true if and only if fewer than i or more than j of \({P}_{1},\ldots ,{P}_{n}\) hold. This connective could be used to capture equivalences among first-order properties. More connectives, quantifiers and other logical features are included in SNePSLOG (see Shapiro 2000).
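Both connectives have a direct computational reading. The two functions below are our own illustration of that reading over a list of truth values; they are not SNePSLOG’s implementation.

```python
def andor(i, j, props):
    """andor(i, j){P1, ..., Pn}: true iff at least i and at most j of the Pk are true."""
    true_count = sum(1 for p in props if p)
    return i <= true_count <= j

def thresh(i, j, props):
    """thresh(i, j){P1, ..., Pn}: true iff fewer than i or more than j of the Pk are true."""
    true_count = sum(1 for p in props if p)
    return true_count < i or true_count > j

print(andor(1, 1, [True, False, False]))   # True: exactly one of three holds (an exclusive or)
print(thresh(1, 2, [True, True, True]))    # True: more than 2 of the 3 propositions hold
```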

SNePS memory is a semantic network modeled as a directed graph. Nodes in this graph represent concepts, individuals, general and specific rules, and propositions. The neighbours of any node in the semantic network can determine more complex structural properties of that node. For example, composite rules, propositions, and concepts can be formed by following a path of several nodes along the edges. Figure 12.1 shows an example of a SNePS semantic network. The nodes ‘Max’, ‘David’, and ‘John’ represent individuals, the node ‘Male’ represents a property, and the nodes ‘Has_gender’, ‘Is_parent_of’, ‘Is_child_of’, and ‘Equiv’ represent binary relations.

Fig. 12.1 An example of a SNePS network (Adapted from Shapiro et al. (1968))

3.5 ACT-R

The ACT-R architecture is a simulation environment that supports the creation of cognitive models capable of predicting and explaining human behaviour (Anderson et al. 2004; Lebiere and Anderson 1993). The architecture is constrained by the theory of rational analysis, an empirical program that aims at explaining the functions and purposes of cognitive processes (Anderson 1990, 1991; Oaksford and Chater 1999). According to rational analysis, it is important to step back from the investigation of human methods and mechanisms and ask about the environment within which these mechanisms are applied (Gray et al. 2006). In the context of ACT-R, each component of the cognitive system is optimised with respect to environmental demands, given computational limitations (Taatgen et al. 2006). On this pragmatic approach, truth is not a fundamental notion in ACT-R, though it is a derivative one: useful, demand-based knowledge (either sensed directly from the surrounding environment or extracted from the current beliefs given the contextual environment) is usually true (a weaker notion than the defeasible reasoning described above for OSCAR); however, true knowledge is not necessarily useful (deducing Fermat’s Last Theorem or settling the Continuum Hypothesis is of no use in everyday activities). This is in contrast to purely logic-based systems, which are built upon (presumed) true premises acted upon by sound reasoning rules irrespective of usefulness, usefulness itself not being a logical notion. As will be seen below, the notion of usefulness/utility upon which ACT-R is based is manifested in the design of its memory.

ACT-R has two kinds of memory: declarative memory for facts and procedural memory for rules. Declarative memory is defined by items called chunks. Chunks have different levels of activation which reflect both their general access pattern and their relevance to the current context. Chunks that are frequently accessed receive a high activation. This activation decays stochastically over time if the chunk is not used. Procedural memory is defined by a set of production rules. Similar to the use of activation in declarative memory, each production rule has an associated utility value that determines its usefulness in reaching the desired goal. Selection of productions is based on the values of this attribute which are updated stochastically through the use of learning mechanisms.
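A noise-free version of ACT-R’s base-level learning equation conveys the idea of use-dependent activation; the sketch below omits the noise term, spreading activation and retrieval thresholds of the full architecture, and the numbers are made up.

```python
import math

def base_level_activation(use_times, now, d=0.5):
    """Simplified base-level activation: B = ln(sum over past uses of (time since use)^-d)."""
    return math.log(sum((now - t) ** (-d) for t in use_times if t < now))

recently_used = [90.0, 95.0, 99.0]   # chunk retrieved often and recently
rarely_used   = [10.0]               # chunk retrieved once, long ago

print(base_level_activation(recently_used, now=100.0))   # about  0.57: high activation
print(base_level_activation(rarely_used,   now=100.0))   # about -2.25: activation has decayed
```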

4 Active Logic

In contrast with most of the systems outlined above, active logic was explicitly designed to capture some of the non-classical aspects of human commonsense reasoning, including time-awareness, control of inference, paraconsistency and non-monotonicity, as well as the ability to re-interpret the meanings of formulas. We have provided a detailed semantics (for a propositional version of active logic) in Anderson et al. (2008), but we offer some of the highlights here.

Formulas in active logic are expressed in a sorted first-order language \(\mathcal{L}\) with two parts \({\mathcal{L}}_{w}\), a propositional language in which are expressed facts about the world, and \({\mathcal{L}}_{a}\), a first-order language used to express facts about the agent, including the agent’s beliefs, for instance that the agent’s time is now t, that the agent believes P, or that the agent discovered a contradiction in its beliefs at a given time.

\({\mathcal{L}}_{w}\) is a propositional language consisting of the following symbols:

  • A set S of sentence symbols (propositional or sentential variables), \(S =\{ {S}_{i}^{j} : i,j \in \mathbb{N}\}\), where \(\mathbb{N}\) is the set of natural numbers.

  • The propositional connectives ¬ and →

  • Left and right parentheses ( and )

\(S{n}_{{\mathcal{L}}_{w}}\) is the set of sentences of \({\mathcal{L}}_{w}\) formed in the usual way. These represent the propositional beliefs of the agent about the world. For instance \({S}_{1}^{0}\) might mean “John is happy”. For later use we assume there is a fixed lexicographic ordering for the sentences in \(S{n}_{{\mathcal{L}}_{w}}\).

\({\mathcal{L}}_{a}\) contains the unary predicate symbol Now, used to express the agent’s time; the binary predicate symbol Contra, used to indicate the existence of a direct contradiction in the agent’s beliefs at a given time; and the binary predicate symbol Bel, which expresses the fact that the agent had a particular belief at a given time. \({\mathcal{L}}_{a}\) contains only the connective ¬; hence statements such as Bel(θ, t) → Bel(θ, t + 1) are not in the language.

All inferences in active logic depend on the knowledge base (KB) of the agent. The agent’s knowledge base at time t, \(K{B}_{t}\), is a finite set of sentences from \(\mathcal{L}\), that is, \(K{B}_{t} \subseteq S{n}_{\mathcal{L}}\). In the case of \(K{B}_{0}\) we allow only formulas of \(S{n}_{{\mathcal{L}}_{w}}\) whose superscripts are all 0.

For \({\mathcal{L}}_{w}\), we use a fairly standard notion of interpretation \(h : S{n}_{{\mathcal{L}}_{w}} \rightarrow \{ T,F\}\) over the sentences in \({\mathcal{L}}_{w}\) that extends an \({\mathcal{L}}_{w}\)-truth assignment h as follows:

$$\begin{array}{rcl} & & h(\neg \varphi ) = T\;\Longleftrightarrow\;h(\varphi ) = F \\ & & h(\varphi \rightarrow \psi ) = F\;\Longleftrightarrow\;(h(\varphi ) = T\mbox{ and }h(\psi ) = F) \\ \end{array}$$

We also stipulate a standard definition of consistency for \({\mathcal{L}}_{w}\): a set of \({\mathcal{L}}_{w}\) sentences is consistent iff there is some interpretation h in which all the sentences are true. Notationally we write the usual \(h \models \Sigma \) to mean that all the sentences of Σ are assigned T by h.
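Because \({\mathcal{L}}_{w}\) is purely propositional over ¬ and →, both the extension of a truth assignment and the consistency check can be written out directly. The encoding below (strings for sentence symbols, tagged tuples for compound sentences) is our own and is only meant as a brute-force illustration for small sentence sets.

```python
from itertools import product

# Sentences of L_w: a string names a sentence symbol; ("not", s) and ("imp", s1, s2)
# build negations and conditionals. A truth assignment h maps sentence symbols to booleans.

def value(sentence, h):
    if isinstance(sentence, str):
        return h[sentence]
    if sentence[0] == "not":
        return not value(sentence[1], h)
    if sentence[0] == "imp":                 # h(p -> q) = F  iff  h(p) = T and h(q) = F
        return (not value(sentence[1], h)) or value(sentence[2], h)
    raise ValueError(sentence)

def symbols(sentence):
    if isinstance(sentence, str):
        return {sentence}
    return set().union(*(symbols(part) for part in sentence[1:]))

def consistent(sentences):
    syms = sorted(set().union(*(symbols(s) for s in sentences)))
    return any(all(value(s, dict(zip(syms, vals))) for s in sentences)
               for vals in product([True, False], repeat=len(syms)))

print(consistent(["S1_0", ("imp", "S1_0", "S2_0")]))   # True
print(consistent(["S1_0", ("not", "S1_0")]))           # False: a direct contradiction
```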

The interpretation for \({\mathcal{L}}_{a}\) is somewhat more unusual. The symbol for the interpretation is \({H}_{t+1}^{\Sigma }\); it is an interpretation at time t + 1 based on Σ, where Σ is to be understood formally as any set of sentences from \(\mathcal{L}\). For current purposes, the most important aspects of the interpretation are as follows:

  • The predicate symbol Now has the following semantics: \({H}_{t+1}^{\Sigma }\models \mathit{Now}(s)\) ⇔ s = t + 1 and Now(t) ∈ Σ; otherwise \({H}_{t+1}^{\Sigma }\models \neg \mathit{Now}(s)\).

  • The predicate symbol Contra has the following semantics: \({H}_{t+1}^{\Sigma }\models \mathit{Contra}(\sigma ,s)\) ⇔ either s < t and Contra(σ, s) ∈ Σ, or s = t and both σ and ¬σ are in Σ; otherwise \({H}_{t+1}^{\Sigma }\models \neg \mathit{Contra}(\sigma ,s)\).

  • The predicate symbol Bel has the following semantics: \({H}_{t+1}^{\Sigma }\models \mathit{Bel}(\theta ,s)\) ⇔ either s < t and Bel(θ, s) ∈ Σ, or s = t and θ ∈ Σ; otherwise \({H}_{t+1}^{\Sigma }\models \neg \mathit{Bel}(\theta ,s)\).

For this version of active logic, we assume that the sentences in \({\mathcal{L}}_{a}\) are consistent, but allow for the possibility of inconsistency in the set of \({\mathcal{L}}_{w}\) sentences. We use the term Γ to refer to the potentially inconsistent set of \({\mathcal{L}}_{w}\) sentences in Σ: \(\Gamma = \Sigma \cap S{n}_{{\mathcal{L}}_{w}}\).

In order to model the sentences in Γ, active logic uses an “apperception function”. The notion of an apperception function is intended to help capture, at least roughly, how the world might seem to an agent with a given inconsistent belief set Γ. For a real agent, only some logical consequences are believed at any given time, since it cannot manage to infer all the potentially infinitely many consequences in a finite time, let alone in the present moment. Moreover, even if the agent has contradictory beliefs, the agent still has a view of the world, and there will be limits on what the agent will and won’t infer. This is in sharp distinction to the classical notion of a model, where (1) inconsistent beliefs are ruled out of bounds, since then there are no models, and (2) all logical consequences of the KB are true in all models.

The idea is simple: suppose \({S}_{i}^{0}\), \({S}_{i}^{0} \rightarrow {S}_{j}^{0}\) and \(\neg {S}_{j}^{0}\) are all in Γ. We imagine that the agent might not realise, at first, that the two instances of \({S}_{i}\) are in fact instances of the same sentence symbol. That is, it might seem to the agent that the world is one in which, say, \({S}_{i}^{1}\) is true, and so is \({S}_{i}^{2} \rightarrow {S}_{j}^{0}\).

The apperception functions we define can make changes only to Γ. An apperception function does not change Σ − Γ. We use the same notation ap when the apperception function is applied to an occurrence of a sentence symbol, a sentence, or a set of sentences. We start by defining a function that changes the superscripts of sentence symbols to 0. This is used to recover the original direct contradictions that were modified by the assignment of superscripts.

Definition

For any sentence \(\phi \in S{n}_{{\mathcal{L}}_{w}}\), let z(ϕ) be the sentence ϕ with all superscripts reset to 0. If \(\Sigma \subseteq S{n}_{{\mathcal{L}}_{w}}\), then z(Σ) = { z(ϕ) | ϕ ∈ Σ}.

Definition

An apperception (awareness) ap is a function ap: Σ → Σ′, where Σ and Σ′ are sets of \(\mathcal{L}\)-sentences. An ap is represented as a finite sequence of nonnegative integers \(\langle {n}_{1},\ldots ,{n}_{p}\rangle \). The effect of ap on Σ is as follows:

  1.

    Let Σ be a set of \(\mathcal{L}\)-sentences and let \(\Gamma = \Sigma \cap S{n}_{{\mathcal{L}}_{w}}\). Using the lexicographic order given earlier, let the k-th sentence symbol occurrence in Γ be \({S}_{i}^{j}\). The effect of ap = \(\langle {n}_{1},\ldots ,{n}_{p}\rangle \) is to change \({S}_{i}^{j}\) to \({S}_{i}^{{n}_{k}}\) if 1 ≤ k ≤ p; otherwise \({S}_{i}^{j}\) is unchanged.

  2.

    ap(Σ) = (Σ − Γ) ∪ap(Γ). (ap does not change Σ − Γ).

Example 12.7.

Let Σ = { Now(5), Bel(\({S}_{2}^{0}\), 4), \(\neg {S}_{2}^{1}\), \({S}_{2}^{1}\), \({S}_{1}^{0} \rightarrow {S}_{5}^{4}\)}. In this case Γ = { \(\neg {S}_{2}^{1}\), \({S}_{2}^{1}\), \({S}_{1}^{0} \rightarrow {S}_{5}^{4}\)}. Writing the elements lexicographically yields ord(Γ) = { \({S}_{2}^{1}\), \(\neg {S}_{2}^{1}\), \({S}_{1}^{0} \rightarrow {S}_{5}^{4}\)}. Consider ap = ⟨1, 3, 2, 16, 7⟩. Then ap(Σ) = { Now(5), Bel(\({S}_{2}^{0}\), 4), \({S}_{2}^{1}\), \(\neg {S}_{2}^{3}\), \({S}_{1}^{2} \rightarrow {S}_{5}^{16}\)}.
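The renaming in Example 12.7 can be reproduced mechanically. In the sketch below (our own encoding), sentence symbols \({S}_{i}^{j}\) are tuples ("S", i, j), compound sentences are tagged tuples, the agent-language sentences are simply left alone, and the lexicographic ordering is taken as given in the example.

```python
# Apply an apperception function ap = <n1, ..., np> to the ordered world sentences of Gamma:
# the k-th occurrence of a sentence symbol gets superscript n_k (1-based), later ones are unchanged.

def apply_ap(ap, ordered_world_sentences):
    counter = {"k": 0}

    def rename(s):
        if isinstance(s, tuple) and s[0] == "S":      # a sentence symbol ("S", i, j) for S_i^j
            counter["k"] += 1
            k = counter["k"]
            return ("S", s[1], ap[k - 1]) if k <= len(ap) else s
        return (s[0],) + tuple(rename(part) for part in s[1:])

    return [rename(s) for s in ordered_world_sentences]

# Gamma from Example 12.7 in its assumed lexicographic order: S_2^1, ¬S_2^1, S_1^0 -> S_5^4.
gamma = [("S", 2, 1), ("not", ("S", 2, 1)), ("imp", ("S", 1, 0), ("S", 5, 4))]

print(apply_ap([1, 3, 2, 16, 7], gamma))
# [('S', 2, 1), ('not', ('S', 2, 3)), ('imp', ('S', 1, 2), ('S', 5, 16))]
```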

The purpose of the apperception functions is to get rid of inconsistencies in Σ. Hence we are interested only in apperception functions that output consistent sets. The set of apperception functions that do this depends on Σ.

Definition

Let AP denote the class of all apperception functions. \(A{P}_{\Sigma }\) = { ap ∈ AP | ap(Σ) is consistent}.

It turns out that \(A{P}_{\Sigma }\) is never empty (Anderson et al. 2008).

At this point we are ready to define the notion of active consequence at time t—the active logic equivalent of logical consequence. Here again, the full technical details are given in Anderson et al. (2008), but we outline some of the more important elements here. We start by defining the concept of 1-step active consequence as a relationship between sets of sentences Σ and Θ of \(\mathcal{L}\), where Σ ⊆ \(K{B}_{t}\) and Θ is a potential subset of \(K{B}_{t+1}\). When we define this notion we want to make sure that Θ contains only sentences required by Σ and the definition of \({H}_{t+1}^{\Sigma }\). This is the reason for the next definition.

Definition

Given Σ and ap ∈ \(A{P}_{\Sigma }\), define dcs(Γ) = { ϕ ∈ Γ | ∃ψ ∈ Γ such that z(ϕ) = ¬z(ψ) or ¬z(ϕ) = z(ψ)} and \(a{p}_{z}(\Gamma ) = \mathit{ap}(\Gamma ) -\mathit{dcs}(\Gamma )\).

The meaning of Definition 12.4 is that we are removing direct contradictions from ap(Γ) while ignoring the superscripts.

Definition

Let \(\Sigma ,\Theta \subseteq S{n}_{\mathcal{L}}\). Then Θ is said to be a 1-step active consequence of Σ at time t, written \(\Sigma {\models }_{1}\Theta \), if and only if ∃ap ∈ \(A{P}_{\Sigma }\) such that

  i.

    If \(\sigma \in \Theta \cap S{n}_{{\mathcal{L}}_{w}}\) then \(a{p}_{z}(\Gamma )\models \sigma \) (σ is a classical logical consequence of \(a{p}_{z}(\Gamma )\)), and

  ii.

    If \(\sigma \in \Theta \cap S{n}_{{\mathcal{L}}_{a}}\) then \({H}_{t+1}^{(\Sigma -\Gamma )\, \cup \, z(\Gamma )}\models \sigma \).

Definition

  i.

    Let \(\Sigma ,\Theta \subseteq S{n}_{\mathcal{L}}\). Then Θ is said to be an n-step active consequence of Σ at time t, written \(\Sigma {\models }_{n}\Theta \), if and only if

    $$\begin{array}{rcl} \exists \Delta \subseteq S{n}_{\mathcal{L}}: \Sigma {\models }_{n-1}\Delta \;and\;\Delta {\models }_{1}\Theta .& & \end{array}$$
    (12.1)
  ii.

    We say that Θ is an active consequence of Σ, written \(\Sigma {\models }_{a}\Theta \), if and only if \(\Sigma {\models }_{n}\Theta \) for some positive integer n.

Next we give some examples to illustrate the concept of active consequence.

Example 12.8.

  i.

    Let Σ = { Now(t), \({S}_{1}^{0}\), \({S}_{1}^{0} \rightarrow {S}_{4}^{0}\), \({S}_{12}^{0}\)} and Θ = { Now(t + 1), \({S}_{4}^{0}\), \({S}_{12}^{0}\)}. Let ap ∈ \(A{P}_{\Sigma }\) be the identity function. It is easy to see that { \({S}_{4}^{0}\), \({S}_{12}^{0}\)} are logical consequences of { \({S}_{1}^{0}\), \({S}_{1}^{0} \rightarrow {S}_{4}^{0}\), \({S}_{12}^{0}\)}. Also by definition \({H}_{t+1}^{\Sigma }\models \mathit{Now}(t + 1)\). Hence \(\Sigma {\models }_{1}\Theta \).

  ii.

    Let Σ = { \({S}_{1}^{0}\), \({S}_{2}^{0}\), \({S}_{2}^{0} \rightarrow \neg {S}_{1}^{0}\)} and Θ = { Contra(\({S}_{1}^{0}\), t + 1)}. We will see that \(\Sigma {\models }_{2}\Theta \). Let Δ = { \({S}_{1}^{1}\), \(\neg {S}_{1}^{2}\)}. Then \(\Sigma {\models }_{1}\Delta \), through the apperception function ap(Σ) = { \({S}_{1}^{1}\), \({S}_{2}^{2}\), \({S}_{2}^{2} \rightarrow \neg {S}_{1}^{2}\)}. Then \(\Delta {\models }_{1}\Theta \) by the second part of the definition, regardless of the apperception function applied in this step.

Note that in Example 12.8(ii), it is not the case that \(\Sigma {\models }_{1}\{\mathit{Contra}({S}_{1}^{0},t)\}\), even though the conditions for the later appearance of the relevant direct contradiction were already in place at time t. This underlines the fact that in active logic it can take time for consequences to appear in the KB. Apperception functions give active logic agents control over which inferences to make, and which to suppress. They allow the agent to have inconsistent beliefs while still having a consistent world model. Moreover, this allows us to see how an agent with inconsistent beliefs could avoid vacuously concluding any proposition, and also reason in a directed way, by applying inference rules only to an appropriately apperceived subset of its beliefs.

For instance, consider the following active logic inference:

Definition

If φ, ¬φ ∈ \(K{B}_{t}\), where \(\varphi \in S{n}_{{\mathcal{L}}_{w}}\), then the direct contradiction inference rule is defined as follows:

$$\frac{t : \varphi ,\neg \varphi } {t + 1 : \mathit{Contra}(\varphi ,t)}$$

This inference is sound based on the definition and interpretation of Contra. And because of this, along with apperception functions, the following inference is unsound:

Definition

Let \(\Sigma \subseteq S{n}_{{\mathcal{L}}_{w}}\) be inconsistent. Let \(\psi \in S{n}_{{\mathcal{L}}_{w}}\). We define the explosive rule with respect to the language \({\mathcal{L}}_{w}\) as follows.

$$\frac{t : \Sigma ;\mathit{Inconsistent}(\Sigma )} {t + 1 : \psi }$$

The explosive inference rule is unsound. For consider the case where ψ is \(\neg ({S}_{1}^{0} \rightarrow {S}_{1}^{0})\). No apperception function ap that turns Σ into a consistent set can logically derive ψ. Hence \(\mathit{ap}(\Sigma )\not\models \psi \) for any such ap, and so \(\Sigma {\not\models }_{1}\{\psi \}\).

This shows that active logic is paraconsistent. We hope that this approach to paraconsistency can shed some light on focused, step-wise, resource-bounded reasoning more generally. More details on the semantics for active logic, and many more examples of its use, can be found in Anderson et al. (2008).

5 Comparison with Reasoning Systems and Formalisms

Active logic possesses several interesting properties. It has a temporal component so that inference occurs in time: for a set of formulas Γ at time t deduce formula ϕ at time t + 1. Active logic is paraconsistent as both ϕ and ¬ϕ may hold at some time t. Active logic is also non-monotonic because a formula ϕ that holds at time t does not necessarily hold at time t + 1; this happens in particular when ϕ and ¬ϕ are replaced by the Contra formula.

We are not aware of any other logic system that possesses such a temporal component as well as paraconsistency and non-monotonicity. Soar, Cyc and ACT-R do not appear to incorporate any of these features, and while OSCAR is non-monotonic, it is neither time-tracking nor paraconsistent. The closest of the above systems to having the distinctive features of active logic is SNePS, but there are some important differences between the two approaches. For instance, although SNePS incorporates a time-tracking feature, in a SNePS-based agent NOW is a meta-logical variable, rather than a logical term fully integrated into the SNePS semantics. The variable NOW is implemented so that it does, indeed, change over time, but this change is the result of actions triggering an external time-variable update. In active logic, in contrast, reasoning itself implies the passage of time. Perhaps in part because of this difference, SNePS is a monotonic logic, whereas active logic is non-monotonic, leveraging the facts that beliefs are held at times, and that beliefs can be held about beliefs, to easily represent such things as “I used to believe P, but now I believe ¬P” using the Bel operator. SNePS is also able to represent beliefs about beliefs, but there is no indication that this ability is leveraged by SNePS to guide belief updates. Rather, all beliefs are about states holding over time, so that belief change is effected by allowing beliefs to expire, rather than by formally retracting them. This is a strategy similar to that employed by the situation calculus (which does not itself incorporate a changing Now term) (McCarthy and Hayes 1969). Finally, although SNePS is a paraconsistent logic, in SNePS contradictions imply nothing at all, whereas in active logic contradictions imply Contra, a meta-level operator that can trigger further reasoning.

Nevertheless, although there are few examples of implemented systems with the features of active logic, we know that a substantial amount of work has been done on non-monotonic paraconsistent logics. While these logics are not really comparable to active logics, we provide here information on some such systems.

An early influential paraconsistent non-monotonic logical system was presented in Priest (1989). The logic LP has three truth values: True, False, and Both. The connectives and entailment in LP are defined as in classical logic, but on account of the third truth value, LP is paraconsistent. LP is then extended to \(L{P}_{m}\), with consistency as a default assumption, and a default consequence relation \({\models }_{m}\) is defined using minimal models. \(L{P}_{m}\) is a non-monotonic paraconsistent system.

Another such system is a combination of LEI (Logic of Epistemic Inconsistency) and IDL (Inconsistent Default Logic), called IDL&LEI. We refer to Martins et al. (2002) for details, including a multiple-world semantics. Formulas in LEI are divided into two groups: the irrevocable formulas and the plausible formulas; the latter are distinguished by a question mark, as in \({\alpha }^{?}\). No contradictions are allowed involving any irrevocable formula; contradictions are allowed only for plausible formulas. LEI is paraconsistent. Non-monotonicity is obtained by adding default rules using IDL. The IDL&LEI system has both an elegant syntax and a multiple-world semantics.

Finally we mention the work in Arieli and Avron (1998) where a non-monotonic paraconsistent logic uses Belnap’s four-valued logic with a notion of logical consequence based on minimal preferential models. The approach here is primarily semantical. (Actually, it turns out that a four-valued semantics is available also for IDL.) The recent paper by Arieli (2007) uses quantified Boolean formulas in the context of multiple-valued logics to represent several non-monotonic paraconsistent logics. This paper also contains many references to recent related work.

6 Conclusions

As many psychological experiments show, the logic used by humans differs substantially from classical logic, and for just this reason it may be better suited to commonsense reasoning. Hence logic-based AI systems should be attuned to, and where possible implement, these non-classical features. We have described several AI reasoning systems, as well as active logic, a logic designed to capture features such as time-awareness, control of inference, paraconsistency, and non-monotonicity that we think are important to human commonsense reasoning.