
1 Introduction

Are you as rational as a clever philosopher or a professor of economics? Well, you answer, it depends on what “rational” means. In the traditional view of rationality, the decision maker possesses all information that can possibly be gathered and based on it makes all logically correct deductions, which she uses to make an optimal decision. For example, when choosing among probabilistic options, this decision maker knows all possible outcomes of each option, knows the probability that each outcome will occur, is able to assign a numerical utility to each outcome and finally calculates the expected utility of each option and picks an option which maximizes it.

This traditional kind of rationality is called unbounded rationality. In contrast, bounded rationality refers to problems for which there is not adequate time or computational resources to obtain all information and find an optimal solution but nevertheless a good solution must be identified. In other words, bounded rationality is the realistic kind of rationality that laypeople and experts need to exhibit in their lives and work, save for decisions for which all possible values and probabilities of all options can be known, such as in casinos.

Herbert Simon (1916–2001), one of the great twentieth-century polymaths—who sometimes also wore the hat of an Operational Researcher—is credited as the father of bounded rationality, but he refrained from giving a precise definition. Thus, there are multiple views of bounded rationality (Rubinstein 1998; Gigerenzer and Selten 2001; Lee 2011; Katsikopoulos 2014).

This chapter presents one view of bounded rationality, which I see as particularly relevant to Operational Research (OR). This view has a very strong behavioral component: it consists of prescriptive models of decision making, which have also been used to describe people’s actual behavior. The models include the few pieces of information that people use and also specify the simple ways in which people process this information. These models go under labels such as “fast and frugal heuristics” (Gigerenzer et al. 1999), “simple models” (Hogarth and Karelaia 2005), “psychological heuristics” (Katsikopoulos 2011) and “simple rules” (Sull and Eisenhardt 2012). This chapter uses the label psychological heuristics for all of these.

The contribution of the chapter is fourfold: The conceptual foundation of the psychological heuristics research program, along with a discussion of its relationship to soft and hard OR, is provided in Sect. 2.2. Then, Sect. 2.3 presents an introduction to models of psychological heuristics. In Sect. 2.4, conditions are reviewed under which models of psychological heuristics perform better or worse than more complex models of optimization in problems of multi-attribute choice, classification and forecasting; based on these conditions, a guide is provided for deciding which of the two approaches to use for which types of problems. Finally, Sect. 2.5 concludes by providing the main take-home messages and briefly discusses the role that psychological heuristics can play in OR theory and practice.

2 The Conceptual Foundation of Psychological Heuristics

There are at least three interpretations of heuristics which are relevant to this chapter. First, in hard OR, heuristics refers to computationally simple models which allow one to “quickly [find] good feasible solutions” (Hillier and Lieberman 2001, p. 624). The other two interpretations of heuristics come from the behavioral sciences, such as psychology and economics. Kahneman et al. (1982) focused on the experimental study of psychological processes that “in general…are quite useful, but sometimes lead to severe and systematic errors” (Tversky and Kahneman 1974, p. 1124) and proposed informal models (i.e. models that do not make precise quantitative predictions) of heuristics. Gigerenzer et al. (1999) developed and tested formal models of heuristics that, they argued, “…when compared to standard benchmark strategies…, can be faster, more frugal and more accurate at the same time” (Gigerenzer and Todd 1999, p. 22).

Katsikopoulos (2011) proposed a definition which is a hybrid of these interpretations, i.e. psychological heuristics are formal models for making decisions that:

(i) rely heavily on core psychological capacities (e.g. recognizing patterns or recalling information from memory);

(ii) do not necessarily use all available information and process the information they use by simple computations (e.g. ordinal comparisons or un-weighted sums);

(iii) are easy to understand, apply and explain.

Requirements (i), (ii) and (iii) are partly underspecified, but the following discussion should clarify their meaning. Consider the problem of choosing one out of many apartments to rent based on attributes such as price, duration of contract, distance from the center of town and so on. The standard approach of hard OR, decision analysis (Keeney and Raiffa 1976), includes eliciting attribute weights, single attribute functions, interactions among attributes, and so on. Then these different pieces of information are integrated by using additive or multi-linear functions. On the other hand, a psychological heuristic for solving the problem could be to decide based on one attribute (e.g. price) or order attributes by subjective importance and decide based on the first attribute in the order which sufficiently discriminates among the alternatives (Hogarth and Karelaia 2005).

For example, price could be ranked first and contract duration second, and prices could differ only by 50 pounds per month while contract durations could differ by a year, in which case the apartment with the longest contract would be chosen (assuming that you prefer longer to shorter contracts). In a review of 45 studies, Ford et al. (1989) found that people very often use such heuristics for choosing items as diverse as apartments, microwaves and birth control methods.

As a second example, consider the problem of forecasting which one of two companies will have higher stock value five years from now. Assuming that you recognize only one of the two companies, a psychological heuristic for making such decisions is to pick the recognized company (Goldstein and Gigerenzer 2009). This is in stark contrast with doing the computations of mean-variance portfolio optimization (Markowitz 1952).

Psychological heuristics differ from the heuristics of the “heuristics-and-biases” research program of Kahneman et al. (1982) mainly in that they are models which make precise quantitative predictions. For further discussion, see Kelman (2011) and Katsikopoulos and Gigerenzer (2013). Formal modeling also differentiates psychological heuristics from the “naturalistic decision making” research program (Zsambok and Klein 1997). For a discussion of how the two programs are related and can learn from each other, see Keller et al. (2010). For a discussion of how psychological heuristics can be integrated with systems approaches (Sage 1992), see Clausing and Katsikopoulos (2008).

Psychological heuristics target problems which have been tackled by hard OR models as well. In these problems, there is a clear objective (e.g. choose the company with the higher stock value five years from now), and the success of a method may be evaluated by using standards such as agreement with the ground truth (e.g. company stock values). Like hard OR methods, heuristics are models of people’s behavior and thus differ from a mere restatement or reuse of managerial intuition. In particular, they are formalized so that they conform to (i), (ii) and (iii).

Psychological heuristics differ from the heuristics of hard OR in that they are not mere computational shortcuts but have an identifiable psychological basis. This psychological basis can be due to expertise (Zsambok and Klein 1997). For example, some experienced managers are aware of the fact that customers who have not bought anything from an apparel company in the last nine months are very unlikely to buy something in the future, and use this single attribute to make more accurate decisions about targeted advertising than they could using a standard forecasting model (83% vs. 75%; Wuebben and von Wangenheim 2008). Furthermore, the psychological basis of heuristics can be available to laypeople as well. For example, a human child can recognize faces better than currently available software (with the possible exception of new anti-terrorist technologies).

Of course, some heuristics of hard OR may formally look like the heuristics a person would spontaneously use, as in solving the traveling salesman problem by always going to the closest unvisited town. But the process of arriving at the heuristics is different. Unlike hard OR models, psychological heuristics are not derived by solving or approximating the solution of an optimization model. Rather, psychological heuristics are based on the observation and analysis of human behavior, and in particular of how people make good decisions with little data.

Psychological heuristics have a nuanced relationship with methods of soft OR (Rosenhead and Mingers 2001). The main point is that psychological heuristics and soft OR methods target different problems. Unlike soft OR, the heuristics discussed in this chapter do not apply to wicked problems (Churchman 1967) with unclear objectives or multiple disagreeing stakeholders. The success of soft OR methods may mean that communication among stakeholders was enhanced or that consensus was achieved (Mingers 2011), whereas the success of psychological heuristics may be measured quantitatively.

On the other hand, there is a crucial point of convergence of psychological heuristics and soft OR. Both approaches acknowledge the possibility that high-quality data—say, on utilities or probabilities—is missing, and tailor their methods appropriately.

Table 2.1 summarizes these conceptual connections among soft OR, psychological heuristics and hard OR. It can be argued that psychological heuristics lie between hard and soft OR.

Table 2.1 A summary of conceptual connections among soft OR, psychological heuristics and hard OR

3 Models of Psychological Heuristics

A main family of psychological heuristics is lexicographic models (Fishburn 1974). Consider the problem of choosing one out of many apartments to rent based on attributes such as price, duration of contract, and distance from the center of town. Lexicographic models decide based on one attribute—say, price—or order attributes by subjective importance and decide based on the first attribute in the order which sufficiently discriminates among the alternatives (Hogarth and Karelaia 2005). For example, price could be ranked first and contract duration second, and prices could differ only by 50 pounds per month while contract durations could differ by a year, in which case the apartment with the longest contract would be chosen (assuming that you prefer longer to shorter contracts).
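As a minimal sketch, the lexicographic heuristic can be written in a few lines. The attribute coding (higher is always better), the ordering and the discrimination thresholds below are illustrative assumptions, not taken from the chapter:

```python
def lexicographic_choice(options, attributes):
    """Choose among options using attributes in order of subjective importance.

    `options` maps option names to dicts of attribute values, coded so that
    higher is always better. `attributes` is a list of (name, threshold)
    pairs: an attribute "sufficiently discriminates" when the best and worst
    values on it differ by more than its threshold.
    """
    for name, threshold in attributes:
        values = {opt: attrs[name] for opt, attrs in options.items()}
        if max(values.values()) - min(values.values()) > threshold:
            return max(values, key=values.get)  # decide on this attribute alone
    # No attribute discriminated sufficiently: decide on the first attribute.
    first = attributes[0][0]
    return max(options, key=lambda opt: options[opt][first])

# The apartment example: cheapness is ranked first, but prices differ by only
# 50 pounds per month (below the 100-pound threshold), so the decision passes
# to contract duration, and the apartment with the longer contract is chosen.
apartments = {
    "A": {"cheapness": -950, "contract_months": 12},
    "B": {"cheapness": -900, "contract_months": 24},
}
order = [("cheapness", 100), ("contract_months", 6)]
print(lexicographic_choice(apartments, order))  # → B
```

Note that only ordinal comparisons on one attribute at a time are performed; no attribute weights and no utility function are ever estimated.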

Lexicographic models have been applied to problems of multi-attribute choice, classification and forecasting. In multi-attribute choice, the objective is to choose, based on its attribute values, the one out of many alternatives with the maximum true multi-attribute utility to the decision maker, such as, for example, overall satisfaction from renting an apartment.

In classification, the objective is to classify an object into one out of many possible categories, again based on its attribute values. For example, one classification problem is to decide if a patient with some known symptoms, such as intense chest pain, is at a high risk of a heart attack and needs to be in the emergency room or should just be monitored in a regular nursing bed.

Forecasting refers to any type of problem where the ground truth is not known now but will be available in the future (e.g. company stock values in five years). It does not necessarily refer to making point estimates (e.g. predicting the stock value of a company in five years). Rather, forecasting here could mean making multi-attribute choices (e.g. which one of two companies will have a higher stock value in five years?) or classifications (e.g. will this company be bankrupt within five years?) into the future.

It is a mathematical fact that lexicographic models for multi-attribute choice, classification and forecasting can be formally represented by a simple graphical structure, called fast and frugal trees (Martignon et al. 2008). An example fast and frugal tree is provided in Fig. 2.1. It was developed for classifying vehicles approaching a military checkpoint as hostile or non-hostile (Keller and Katsikopoulos 2016). Fast and frugal trees use a small number of attributes, which are first ordered and then inspected one at a time. Every time an attribute is inspected, a yes-or-no question on the value of the attribute is asked. Typically, the question refers to an ordinal comparison; for example, in the first attribute of the tree in Fig. 2.1, the number of occupants in the vehicle is compared to 1. For each attribute, one of the two possible answers leads to an immediate classification (e.g. in the tree of Fig. 2.1, the vehicle is immediately classified as non-hostile if there is more than one occupant), whereas the other possible answer leads to the inspection of the next attribute. Of course, a classification is made for each answer on the last attribute in the order.

Fig. 2.1 A fast and frugal tree for classifying vehicles approaching a military checkpoint as hostile or non-hostile (Keller and Katsikopoulos 2016)
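The decision logic of such a tree can be sketched in a few lines of code. Only the first question (comparing the number of occupants to 1) is taken from the tree of Fig. 2.1; the remaining attributes and their exits below are hypothetical placeholders:

```python
def classify_vehicle(vehicle):
    """Classify an approaching vehicle as 'non-hostile' or 'hostile'.

    Attributes are inspected one at a time; each question has one immediate
    exit, and the last question exits on both answers.
    """
    if vehicle["occupants"] > 1:          # first attribute (from Fig. 2.1)
        return "non-hostile"              # immediate exit
    if vehicle["complies_with_signals"]:  # hypothetical second attribute
        return "non-hostile"
    # hypothetical last attribute: a classification is made on both answers
    return "non-hostile" if vehicle["known_local"] else "hostile"

print(classify_vehicle({"occupants": 3, "complies_with_signals": False,
                        "known_local": False}))  # → non-hostile
```

The structure makes the frugality visible: a vehicle with several occupants is classified after a single question, and no question ever combines more than one attribute.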

Typically, attributes are ordered by a measure of the statistical correlation between each attribute of the object and the utility or category of the object (Martignon et al. 2008). This means that data on attributes and utilities or categories of objects is required. This data comprises the training set. It has been found that when people are given a training set of adequate size and enough time to learn from it, they can order attributes by their correlation (Broeder and Newell 2008).

It is important to note that fast and frugal trees do not necessarily require statistical data. An alternative possibility is expert knowledge, combined with a task analysis (Vicente 1999). Indeed, the tree of Fig. 2.1 could not be built based on statistics, because the available database, 1,060 incident reports of situations involving motor vehicles approaching a NATO military checkpoint in Afghanistan between January 2004 and December 2009, included only seven successful suicide attacks, and for those only one attribute was available.

Because of this, methods of statistics and computer science, such as classification and regression trees (Breiman et al. 1984) and support vector machines (Vapnik 1999), also cannot be applied to this problem in an obvious way. The tree of Fig. 2.1 was built based on semi-structured interviews with German armed forces training instructors and combat-experienced personnel, and a literature review. Had it been applied in Afghanistan, the tree would have reduced civilian casualties by 60% (from 204 to 78) (Keller and Katsikopoulos 2016).

Financial and medical practitioners have been positive toward fast and frugal trees. Economists from the Bank of England developed a fast and frugal tree for forecasting whether or not a bank is at risk of bankruptcy (Aikman et al. 2014), anticipating that it will be a useful aid to regulators. The tree used four economic indicators: leverage ratio in the balance sheet, market-based capital ratio, total amount of wholesale funding and loan-to-deposit ratio. In a dataset of 116 banks which had more than 100 billion USD in assets at the end of 2006, the tree correctly identified 82% of the banks which subsequently failed and 50% of the banks which did not fail. The fast and frugal tree was not outperformed by any of 20 versions of logistic regression, the usual tool of financial economics, which used the same economic indicators; at the same time, the tree was much easier to understand and use.

Louis Cook and his team at the Emergency Medical Services Division of the New York City Fire Department used a fast and frugal tree for deciding which of the victims of the September 11 terrorist attack needed urgent care (Cook 2001). Based on their own medical experience, Green and Mehr (1997) developed a fast and frugal tree for the heart attack problem discussed earlier, which improved upon the unaided performance of doctors in a Michigan hospital. Overall, it has been argued that fast and frugal trees make the medical decision process more transparent and easier to understand and to communicate to patients (Elwyn et al. 2001).

There are many other models of psychological heuristics beyond lexicographic ones (Gigerenzer et al. 2011). Another main type of heuristics is tallying (or unit-weights) models (Dawes and Corrigan 1974). Tallying models are linear models for multi-attribute choice, classification and forecasting in which the weights of all attributes are set to 1.
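A minimal sketch of tallying, under the usual assumption that all attributes have first been coded so that the value 1 favors the same outcome (the attribute names below are illustrative):

```python
def tally(option):
    """Score an option by summing its attribute values with unit weights."""
    return sum(option.values())

def tallying_choice(options):
    """Choose the option with the highest tally."""
    return max(options, key=lambda name: tally(options[name]))

# Apartment A wins on 2 of the 3 (equally weighted) attributes.
apartments = {
    "A": {"good_price": 1, "long_contract": 0, "close_to_center": 1},
    "B": {"good_price": 0, "long_contract": 1, "close_to_center": 0},
}
print(tallying_choice(apartments))  # → A
```

Because every weight is fixed at 1, nothing is estimated from data, which is exactly what makes tallying robust when training data is scarce.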

Surprisingly, it has been found in applications in psychometrics and personnel selection that tallying could sometimes forecast better than linear regression with unconstrained attribute weights (Bobko et al. 2007). Tallying also was not outperformed by any of 13 versions of Markowitz’s mean-variance optimization model in allocating wealth across assets in seven real financial portfolios (DeMiguel et al. 2009).

Finally, note that tallying and lexicographic models occupy the two extremes of a continuum: in tallying models, each attribute can compensate for any other attribute, whereas in lexicographic models, the first discriminating attribute cannot be compensated for by all other attributes put together.

The few applications discussed in this section suggest that psychological heuristics compete well with more complex models used in statistics, computer science and hard OR. But are these isolated incidents? The next section provides a systematic review.

4 When to Use Psychological Heuristics and When Not To

In 1979, Herbert Simon wrote: “decision makers can [find] optimal solutions for a simplified world, or satisfactory solutions for a more realistic world. Neither approach, in general, dominates the other and both have continued to co-exist in the world of management science” (Simon 1979, p. 498).

Almost 40 years later, this point can be elaborated on: a fair amount of research has focused on the comparison between one approach to finding satisfactory solutions, psychological heuristics, and the more standard approach of using models of optimization, where an optimum of a mathematical function that models a simplified but supposedly sufficient version of the problem is computed (this definition of optimization is inspired by Kimball 1958). Here, optimization models include regressions (linear, logistic and regularized), Bayesian networks (such as naïve Bayes), neural networks, classification and regression trees, and support vector machines.

Some empirical evidence from this research was provided in Sects. 2.2 and 2.3, with the examples on targeted advertisement, identification of hostiles at checkpoints and flagging banks at a high risk of bankruptcy. For a systematic review of the empirical evidence, which comes from such diverse domains as economics, management, health, transportation and engineering, see Katsikopoulos (2011).

In this section, I focus on the theoretical analyses. A general framework for understanding the comparative performance of psychological heuristics and more complex models is provided by the statistical theory of prediction, and in particular by the bias-variance decomposition of prediction error (Geman et al. 1992; Gigerenzer and Brighton 2009).

This decomposition is a mathematical fact which says that the prediction error of any model is the sum of two terms. The first term is called bias, and it measures how well, on the average, the model agrees with the ground truth. Complex models—which usually have many parameters—tend to have less bias than simple models—which usually have fewer parameters—because when parameters can be tweaked, the agreement between model prediction and ground truth can increase as well. For example, Markowitz’s multi-parameter optimization model achieves low bias, whereas tallying attribute values has zero parameters and has relatively high bias.

But this is not the whole story. There is a second term, called variance, which contributes to a model’s total prediction error. Variance measures the variation of model predictions around the model’s average prediction. Unlike the bias term, when it comes to the variance term, model complexity is less of a blessing and more of a curse. Complex multi-parameter models tend to have higher variance than simple models with fewer parameters, because more parameters can combine in more ways and generate more distinct predictions.

For example, one can intuit why simple models tend to have lower variance than more complex models for small training set sizes. The smaller the training set, the more likely it is that sampling error and natural variations in the instances which are included in the training set will lead to variation in the parameter estimates of a given model. This variation can be expected to have an influence on the more heavily parameterized models to a greater degree than on the simpler rules. In an extreme case, Markowitz’s multi-parameter optimization model has relatively high variance, whereas tallying has zero variance because it has zero parameters.

Because a model’s total prediction error is the sum of its bias and variance, one can see that the result can go either way: a simple or a more complex model can have higher predictive accuracy in a particular dataset, depending on whether an advantage in bias is larger than an advantage in variance in this dataset.
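The decomposition can be illustrated numerically. In the simulation below (an assumed setup, not an example from the chapter), a linear model with weights fitted by least squares is compared with tallying on small training sets: the fitted model is nearly unbiased but its predictions vary across training sets, whereas tallying, having no free parameters, has some bias but exactly zero variance.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.5, 0.9])     # ground-truth attribute weights
x_test = np.array([1.0, 1.0])     # fixed test point
y_true = true_w @ x_test          # = 2.4

n_train, n_reps = 5, 2000
ols_preds, tally_preds = [], []
for _ in range(n_reps):
    # Draw a fresh small training set each repetition.
    X = rng.normal(size=(n_train, 2))
    y = X @ true_w + rng.normal(scale=1.0, size=n_train)
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # fitted weights
    ols_preds.append(w_hat @ x_test)
    tally_preds.append(np.sum(x_test))  # unit weights: no estimation at all

# With such a small training set, the fitted model's variance typically
# exceeds tallying's squared bias, so the simpler model can win overall.
for name, preds in [("OLS", np.array(ols_preds)),
                    ("tallying", np.array(tally_preds))]:
    bias2 = (preds.mean() - y_true) ** 2
    variance = preds.var()
    print(f"{name:8s} bias^2={bias2:.3f} variance={variance:.3f}")
```

Increasing `n_train` shrinks the fitted model's variance, which is why condition (i) below refers to information that is too scarce to estimate parameters reliably.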

It has been argued that in practice variance may be more critical than bias (Brighton and Gigerenzer 2015). This claim is consistent with a recent review of the forecasting literature which concluded that all valid evidence-based forecasting methods are simple and urged decision makers to accept forecasts only from simple methods (Green and Armstrong 2015).

Surprisingly, it has been recently discovered that simple rules may also achieve competitive bias in practice. This happens when there exists an attribute or an alternative option which dominates the others.

An attribute dominates other attributes when it is subjectively much more important to the decision maker than the other attributes. For example, the distance of an apartment to the city center may be much more important to a particular renter than other apartment attributes. A second meaning of attribute dominance is when an attribute is statistically much more informative of the utility of options than other attributes. For instance, time since last purchase predicts future sales much more accurately than customer age does (Wuebben and von Wangenheim 2008). It has been analytically shown that lexicographic models which decide based on a dominant attribute incur zero bias (Martignon and Hoffrage 2002; Katsikopoulos 2011).

An alternative option dominates other options when its attribute values are better than or equal to the attribute values of the other options. In this case, most psychological heuristics incur zero bias. Furthermore, less restrictive definitions of dominance exist, which also have been shown to lead to zero bias for lexicographic models and tallying (Baucells et al. 2008). These results hold when utility is an additive or multi-linear function of the attributes (Katsikopoulos et al. 2014).
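Checking for a dominant alternative is straightforward; a sketch, assuming all attributes are coded so that higher values are better:

```python
def dominant_option(options):
    """Return the name of an option whose attribute values are at least as
    good as those of every other option on all attributes, or None."""
    for name, attrs in options.items():
        others = (o for n, o in options.items() if n != name)
        if all(all(attrs[a] >= other[a] for a in attrs) for other in others):
            return name
    return None

# Apartment A is at least as good as B on every attribute, so A dominates.
print(dominant_option({
    "A": {"cheapness": 3, "contract": 2, "closeness": 5},
    "B": {"cheapness": 2, "contract": 2, "closeness": 4},
}))  # → A
```

When such an option exists, any heuristic that respects dominance picks it, which is the sense in which the heuristic incurs zero bias.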

One may think that dominant attributes and alternatives are rare in the real world. In fact, the opposite seems to be the case (Şimşek 2013). Across 51 real datasets, it was found that dominant attributes exist in 93% of the datasets with binary attributes (i.e. attributes taking the values 0 or 1) and in 83% of the datasets with numeric attributes, and that dominant alternatives exist in 87% and 58% of the binary and numeric datasets, respectively.

In sum, the conclusion of the theoretical work is that psychological heuristics tend to perform better than more complex models of optimization when (i) the information available is not of high quality or not ample enough to estimate the parameters of models reliably or (ii) there exists one attribute or one alternative option which dominates the others. On the other hand, when neither condition (i) nor condition (ii) holds, more complex models tend to perform better than psychological heuristics.

Condition (i) essentially says that a problem is difficult. Such difficulties may arise when a problem is dynamic or future developments are unpredictable. If (i) holds, a simple model’s advantage in the variance component of the prediction error tends to be much larger than its disadvantage in the bias component, and simpler models have a very good chance of outperforming more complex models.

An interesting interpretation of condition (ii) is that it says that the problem is easy, in the following sense: either there exists one alternative option which is better than all other options and the decision maker needs only to identify it, or there exists one attribute which is so important or informative that it suffices to consult only this attribute and again the decision maker needs only to identify it. If (ii) holds, as empirical research has shown that it often does in practice, several simple models achieve zero bias and thus can indeed outperform more complex models.

Based on the empirical and theoretical results, shown in Table 2.2, a guide is provided for deciding which of the two approaches to use for which types of problems.

Table 2.2 A guide for deciding which of the two approaches to decision making to use for which types of problems

5 Conclusions

This chapter presented one view of bounded rationality, which I see as particularly relevant to OR. Psychological heuristics, which have been used to describe people’s actual behavior, were proposed as prescriptive methods for how people should make multi-attribute choices, classify objects into categories and make forecasts. Psychological heuristics specify the few pieces of information that people—experts as well as laypeople—use and the simple ways in which people process this information. A few relevant examples were provided, including targeted advertisement, identification of hostiles at checkpoints (Fig. 2.1) and flagging of banks at a high risk of bankruptcy.

Why should one consider psychological heuristics as prescriptive models when so much effort has already been put into developing models of optimization? In one of his fables, Russell Ackoff (1979) complained about pronouncing optimization models optimal without checking if their assumptions held: a very large intrasystem distribution problem was modeled as a linear programming problem, and its optimal solution was derived; the argument offered for implementing this solution was that its performance was superior, according to the linear programming model, to that of another solution!

Ultimately, choosing a method for making decisions should be based on facts. This chapter contrasted the empirical evidence and theoretical analyses on the relative performance of psychological heuristics and optimization models, in problems of multi-attribute choice, classification and forecasting (Sect. 2.4). On the basis of these facts, a guide was provided for deciding which of the two approaches to use for which types of problems (Table 2.2). Perhaps the main message is that, so far as we know, psychological heuristics should be chosen for problems that are either easy or difficult and more complex models should be used for problems in between.

Of course, more work needs to be done. For example, most psychological heuristics research has ignored the case of more than two alternatives or categories (for exceptions, see Hogarth and Karelaia 2005 and Katsikopoulos 2013), which may be more representative of real problems. But in any case, the study of psychological heuristics can serve as a conceptual bridge between soft and hard OR. This point was also made in Sect. 2.2 (Table 2.1).

But can psychological heuristics scale up to more complex problems, as for example strategic problems with unclear objectives and multiple disagreeing stakeholders? French et al. (2009) seem to believe they cannot, when they say that psychological heuristics can be applied to “simple decision tasks with known correct solutions” (p. 169) and to “some tactical and operational decisions” (p. 419).

I basically agree with French et al. (2009) that it is not yet possible to scale up the formal models of psychological heuristics presented in this chapter. But there are two caveats: The first is that psychological heuristics require only that there exist a correct solution, not that it be given to them. In fact, as was shown, psychological heuristics perform especially well when the correct solution will be available in the future. This is a point where psychological heuristics exhibit the kind of robust power of human intuition and expertise (Klein 1999) that is often lost in hard OR and that soft OR tries to capture. The second caveat is that a heuristics approach has in fact been applied to problems of understanding information about health conditions and making informed decisions about treatments; these are problems where patients, doctors, pharmaceutical companies, health administrators and policy makers often have unclear or conflicting objectives (Gigerenzer and Gray 2011). These heuristics are based on knowledge from the psychology of thinking, perception and emotion and from social psychology. Integrating this approach with the one presented in this chapter and with soft and hard OR is a key task for the future.