The conjunction of utility theory and decision theory involves formulations of decision making in which the criteria for choice among competing alternatives are based on numerical representations of the decision agent’s preferences and values. Utility theory as such refers to these representations and to assumptions about preferences that correspond to various numerical representations. Although it is a child of decision theory, utility theory has emerged as a subject in its own right as seen, for example, in the contemporary review by Fishburn (see Representation of Preferences). Readers interested in more detail on representations of preferences should consult that entry.

Our discussion of utility theory and decision theory will follow the useful three-part classification popularized by Luce and Raiffa (1957), namely decision making under certainty, risk, and uncertainty. My descriptions differ slightly from theirs.

Certainty refers to formulations that exclude explicit consideration of chance or uncertainty, including situations in which the outcome of each decision is known beforehand. Most of consumer demand theory falls within this category.

Risk refers to formulations that involve chance in the form of known probabilities or odds, but excludes unquantified uncertainty. Games of chance and insurance decisions with known probabilities for possible outcomes fall within the risk category. Note that ‘risk’ as used here is only tangentially associated with the common notion that equates risk with the possibility of something bad happening.

Uncertainty refers to formulations in which decision outcomes depend explicitly on events that are not controlled by the decision agent and whose resolutions are known to the agent only after the decision is made. Probabilities of the events are regarded as meaningless, unknowable, or assessable only with reference to personal judgement. Situations addressed by the theory of noncooperative games and statistical decision theory typically fall under this heading.

A brief history of the subject will provide perspective for our ensuing discussion of the three categories.

Historical Remarks

The first important paper on the subject was written by Daniel Bernoulli (1738) who, in conjunction with Gabriel Cramer, sought to explain why prudent agents often choose among risky options in a manner contrary to expected profit maximization. One example is the choice of a sure $10,000 profit over a risky venture that loses $5000 or gains $30,000, each with probability 1/2. Bernoulli argued that many such choices could be explained by maximization of the expected utility (‘moral worth’) of risky options, wherein the utility of wealth increases at a decreasing rate. He thus introduced the idea of decreasing marginal utility of wealth as well as the maximization of expected utility.
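To see Bernoulli’s point in numbers, the sketch below compares the two options above under expected profit and under the expected utility of final wealth with a logarithmic utility; the initial wealth of $20,000 is an assumption made only for this illustration.

```python
import math

# Hypothetical illustration of Bernoulli's argument: assumed initial
# wealth of $20,000 and logarithmic utility of final wealth.
initial_wealth = 20_000

sure_profit = 10_000
gamble = [(-5_000, 0.5), (30_000, 0.5)]   # (profit, probability)

# Expected profit favours the gamble ...
expected_profit = sum(profit * prob for profit, prob in gamble)
print(expected_profit, ">", sure_profit)            # 12500 > 10000

# ... but expected log-utility of final wealth favours the sure option.
def log_utility(wealth):
    return math.log(wealth)

eu_sure = log_utility(initial_wealth + sure_profit)
eu_gamble = sum(prob * log_utility(initial_wealth + profit)
                for profit, prob in gamble)
print(eu_sure, ">", eu_gamble)                      # about 10.31 > 10.22
```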

Although Bernoulli’s ideas were endorsed by Laplace and others, they had little effect on the economics of decision making under risk until quite recently. On the other hand, his notion of decreasing marginal utility became central in consumer economics during the latter part of the 19th century (Stigler 1950), especially in the works of Gossen, Jevons, Menger, Walras and Marshall.

During this early period, utility was often viewed as a measurable psychological magnitude. This notion of intensive measurable utility, which was sometimes represented by the additive form \( u_1(x_1)+u_2(x_2)+\cdots +u_n(x_n) \) for commodity bundles \( (x_1,x_2,\ldots ,x_n) \), was subsequently replaced in the ordinalist revolution of Edgeworth, Fisher, Pareto, and Slutsky by the view that utility represents nothing more than the agent’s preference ordering over consumption bundles. A revival of intensive measurable utility occurred after 1920 when Frisch, Lange and Alt axiomatized the notion of comparable preference differences, but it did not regain the prominence it once held.

Bernoulli’s long-dormant principle of the maximization of expected utility reappeared with force in the expected utility theory of von Neumann and Morgenstern (1944, 1947). Unlike Bernoulli and Cramer, who favoured an intensive measurable view of utility, von Neumann and Morgenstern showed how the expected-utility form can arise solely from simple preference comparisons between risky options. They thus accomplished for decision making under risk what the ordinalists accomplished for demand theory a generation earlier.

Although little noted at the time, Ramsey (1931), in an essay written in 1926 and published posthumously, attempted something more ambitious than the utility theory for risky decisions of von Neumann and Morgenstern. Ramsey’s aim was to show how assumptions about preferences between uncertain decisions imply not only a utility function for outcomes but also a subjective or personal probability distribution over uncertain events such that one uncertain decision is preferred to another precisely when the former has greater subjective (probability) expected utility. Ramsey’s outline of a theory of decision making under uncertainty greatly influenced the first complete theory of subjective expected utility, due to Savage (1954). Savage also drew heavily on Bruno de Finetti’s seminal ideas on subjective probability, which are similar in ways to views espoused much earlier by Bayes and Laplace.

During the historical period, several unsuccessful proposals were made to replace ‘utility’ by a term better suited to the concepts it represents. Despite these failures, the terms ordinal utility and cardinal utility, introduced by Hicks and Allen (1934) to distinguish between the ordinalist viewpoint and the older measurability view of utility as a ‘cardinal magnitude’, caught on. Present usage adheres to the following measurement theoretic definitions.

Let \( \succ \) denote the relation ‘is preferred to’ on a set X of decision alternatives, outcomes, commodity bundles, or whatever. Suppose preferences are ordered and can be represented by a real-valued function u on X as

$$ x\succ y\iff u(x)>u(y), $$
(1)

for all x and y in X. We then say that u is an ordinal utility function if it satisfies (1) but is subject to no further restrictions. Then any other real function v that preserves the order of \( \succ \), or satisfies (1) in place of u, is also an ordinal utility function, and all such functions for the given \( \succ \) are equivalent in the ordinal context. A different preference ordering on X will have a different equivalence class of order-preserving functions. If u is also required to be continuous, we may speak of continuous ordinal utility.

If u satisfies (1) and is restricted by subsidiary conditions in such a way that v also satisfies (1) and the subsidiary conditions if and only if there are numbers a > 0 and b such that

$$ v(x)= au(x)+b, \ \mathrm{for}\;\mathrm{all}\;x\;\mathrm{in}\;X, $$
(2)

then u is a cardinal utility function and is said to be unique up to a positive (a > 0) linear transformation. Subsidiary conditions that force (2) under appropriate structure for X include additivity \( u(x_1,\ldots ,x_n)=u_1(x_1)+\cdots +u_n(x_n) \) with n ≥ 2, the linearity property \( u[\lambda p+(1-\lambda )q]=\lambda u(p)+(1-\lambda )u(q) \) of expected utility, and the ordered preference-difference representation \( (x,y)\succ^{\ast }(z,w)\iff u(x)-u(y)>u(z)-u(w) \). Only the last of these, where \( (x,y)\succ^{\ast }(z,w) \) says that the intensity of preference for x over y exceeds the intensity of preference for z over w, involves a view of preference that goes beyond the basic relation \( \succ \).
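As a small illustration of the invariance in (2), the sketch below checks that an assumed utility function and a positive linear transformation of it order two lotteries identically under the expected-utility (linearity) property; the utility function and the lotteries are invented for the example.

```python
# Illustration (not from the text): a positive linear transformation
# v = a*u + b with a > 0 preserves the ordering of expected utilities.
def expected_utility(lottery, u):
    """lottery: list of (outcome, probability) pairs."""
    return sum(prob * u(x) for x, prob in lottery)

u = lambda x: x ** 0.5          # an assumed (concave) utility of money
v = lambda x: 3.0 * u(x) + 7.0  # positive linear transformation, a = 3, b = 7

p = [(100, 0.5), (0, 0.5)]      # two illustrative lotteries
q = [(36, 1.0)]

assert (expected_utility(p, u) > expected_utility(q, u)) == \
       (expected_utility(p, v) > expected_utility(q, v))
```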

Decisions Under Certainty

Representation (1) is the preeminent utility representation for decision making under certainty. It presumes that the preference relation \( \succ \) is

$$ {\displaystyle \begin{array}{l} asymmetric:\mathrm{if}\;x\succ y\;\mathrm{then}\;\mathrm{not}\;\left(y\succ x\right),\\ {} negatively \ transitive:\mathrm{if}\;x\succ z\;\mathrm{then}\;x\succ y\;\mathrm{or}\;y\succ z,\end{array}} $$

and, when X is uncountably infinite, that there is a countable subset \( C_0 \) in X such that, whenever \( x\succ y \), there is a z in \( C_0 \) such that \( x\gtrsim z\gtrsim y \), where \( x\gtrsim z \) means not \( (z\succ x) \). An asymmetric and negatively transitive relation is often referred to as a weak order, and in this case both \( \succ \) and its induced indifference relation ~, defined by

$$ x\sim y\;\mathrm{if} \ \mathrm{neither}\;x\succ y\;\mathrm{nor}\;y\succ x, $$

are transitive, that is \( (x\succ y, y\succ z)\Rightarrow x\succ z \) and \( (x\sim y, y\sim z)\Rightarrow x\sim z \). If X is a connected and separable topological space, the countable \( C_0 \) condition can be replaced by the assumption that the preferred-to-x set \( \{y: y\succ x\} \) and the less-preferred-than-x set \( \{y: x\succ y\} \) are open sets in X’s topology for every x in X. When this holds, u can be taken to be continuous. If the countable \( C_0 \) condition fails when \( \succ \) is a weak order, (1) cannot hold and instead we could represent \( \succ \) by vector-valued utilities ordered lexicographically. For details and further references, see Fishburn (1970, 1974).
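For a finite X the construction behind (1) is elementary: count, for each alternative, how many alternatives it is strictly preferred to. The sketch below does this for a small, made-up weak order; the alternatives and function names are illustrative only.

```python
# Illustrative construction of an ordinal utility for a finite weak order.
# The strict preference relation is given as ordered pairs (x, y) meaning
# "x is preferred to y"; the alternatives and pairs are made up.
X = {"a", "b", "c", "d"}
prefers = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "d"), ("c", "d")}
# Here b ~ c (neither is preferred to the other), so a ≻ {b, c} ≻ d.

def ordinal_utility(X, prefers):
    # u(x) = number of alternatives that x is strictly preferred to;
    # for a weak order this satisfies x ≻ y  ⇔  u(x) > u(y).
    return {x: sum(1 for y in X if (x, y) in prefers) for x in X}

u = ordinal_utility(X, prefers)
assert all((u[x] > u[y]) == ((x, y) in prefers) for x in X for y in X)
print(u)   # e.g. {'a': 3, 'b': 1, 'c': 1, 'd': 0}
```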

Economics is often concerned with situations in which any one of a number of subsets of X might arise as the feasible set from which a choice is required. For example, if X is a commodity space \( \{(x_1,\ldots ,x_n): x_i\ge 0\ \mathrm{for}\ i=1,\ldots ,n\} \), then the feasible set at price vector \( p=(p_1,\ldots ,p_n)>(0,\ldots ,0) \) and disposable income m ≥ 0 is the opportunity set \( \{(x_1,\ldots ,x_n): p_1x_1+\cdots +p_nx_n\le m\} \) of commodity bundles that can be purchased at p and m. The allure of (1) in such situations is that the same u can be used for choice by maximization of utility for each non-empty feasible set Y so long as the set

$$ \underset{u}{\max }Y=\left\{x\;\mathrm{in}\;Y:u(x)\ge u(y)\;\mathrm{for}\;\mathrm{all}\;y\;\mathrm{in}\;Y\right\} $$

is not empty. The existence of non-empty \( \max_uY \) is assured if Y is finite or if it is a compact subset of a connected and separable topological space on which u is upper semicontinuous.

When (1) holds, the set

$$ \underset{\succ }{\max }Y=\left\{x\;\mathrm{in}\;Y:y\succ x\;\mathrm{for}\;\mathrm{no}\;y\;\mathrm{in}\;Y\right\} $$

of maximally-preferred elements in Y is identical to \( \max_uY \). On the other hand, \( \max_{\succ }Y \) can be non-empty when no utility function satisfies (1). For example, if X is finite, then \( \max_{\succ }Y \) is non-empty for every non-empty subset Y of X if, and only if, X contains no preference cycle, that is no \( x_1,\ldots ,x_m \) such that \( {x}_1\succ {x}_2\succ \dots \succ {x}_m\succ {x}_1 \). In this case it is always possible to define u on X so that, for all x and y in X,

$$ x\succ y\Rightarrow u(x)>u(y). $$
(3)

Then choices can still be made by maximization of utility since \( \max_uY \) will be a non-empty subset of \( \max_{\succ }Y \). However, if \( \succ \) has cycles, then the principle of choice by maximization breaks down.
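The sketch below illustrates these points on small, made-up relations: it computes \( \max_{\succ }Y \) for an acyclic relation and then shows that a preference cycle leaves the choice set empty; the data are illustrative only.

```python
# Illustrative computation of the maximal-element choice set max_≻ Y
# for a finite relation given as ordered pairs (x, y) meaning x ≻ y.
def max_preferred(Y, prefers):
    """Elements of Y that no element of Y is preferred to."""
    return {x for x in Y if not any((y, x) in prefers for y in Y)}

# An acyclic relation: a ≻ b, b ≻ c.
acyclic = {("a", "b"), ("b", "c")}
print(max_preferred({"a", "b", "c"}, acyclic))   # {'a'}

# A preference cycle a ≻ b ≻ c ≻ a: no maximal element exists,
# so choice by maximization breaks down on this feasible set.
cyclic = {("a", "b"), ("b", "c"), ("c", "a")}
print(max_preferred({"a", "b", "c"}, cyclic))    # set()
```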

The situation for infinite X and suitably constrained feasible sets is somewhat different. Sonnenschein (1971) shows for the commodity space setting that \( \succ \) can have cycles while every opportunity set Y has a non-empty \( \max_{\succ }Y \). His key assumptions are a semicontinuity condition on \( \gtrsim \) and the assumption that every preferred-to-x set is convex. Thus, choice by maximal preference may obtain when \( \succ \) can be characterized by neither (1) nor (3).

\( \max_{\succ }Y \) for opportunity sets Y in commodity space is the agent’s demand correspondence (which depends on p and m) or, if each \( \max_{\succ }Y \) is a singleton, his demand function. The revealed preference approach of Samuelson and others represents an attempt to base the theory of consumer demand directly on demand functions without invoking preference as an undefined primitive. If f(p, m) denotes the consumer’s unique choice at (p, m) from the opportunity set there, we say that commodity bundle x is revealed to be preferred to commodity bundle y if y ≠ x and there is a (p, m) at which x = f(p, m) and \( p_1y_1+\cdots +p_ny_n\le m \). Conditions can then be stated (Uzawa 1960; Houthakker 1961; Hurwicz and Richter 1971) for the revealed preference relation such that there exists a utility function u on X for which \( \max_uY=\{f(p,m)\} \) when Y is the opportunity set at (p, m), for every such Y.
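As an illustration of the revealed preference idea, the following sketch takes a few hypothetical demand observations (prices, income, chosen bundle), builds the revealed preference relation just defined, and looks for a direct violation in which two distinct bundles are each revealed preferred to the other; the data are invented and the check covers only this simplest two-observation condition.

```python
# Hypothetical demand data: each observation is (prices p, income m, chosen bundle x).
observations = [
    ((1.0, 2.0), 10.0, (4.0, 3.0)),   # bundle A chosen
    ((2.0, 1.0), 10.0, (3.0, 4.0)),   # bundle B chosen
]

def cost(p, y):
    return sum(pi * yi for pi, yi in zip(p, y))

def revealed_preferred(obs_i, obs_j):
    """x_i is revealed preferred to x_j if x_j was affordable when x_i was chosen."""
    (p, m, x_i), (_, _, x_j) = obs_i, obs_j
    return x_i != x_j and cost(p, x_j) <= m

# Direct (two-observation) consistency check in the spirit of revealed preference:
# no two distinct bundles should each be revealed preferred to the other.
violations = [(i, j)
              for i, oi in enumerate(observations)
              for j, oj in enumerate(observations)
              if i < j and revealed_preferred(oi, oj) and revealed_preferred(oj, oi)]
print(violations or "no direct violations found")
```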

The revealed preference approach in demand theory has stimulated a more general theory of choice functions. A choice function C is a mapping from a family of non-empty feasible subsets of X into subsets of X such that, for each feasible Y, C(Y) is a non-empty subset of Y. The choice set C(Y) describes the ‘best’ things in Y. Research in this area has identified conditions on C that allow it to be represented in interesting ways. Examples appear in Fishburn (1973, chapter 15) and Sen (1977). One is the condition

$$ {\displaystyle \begin{array}{l}\mathrm{if}\;Y\subseteq Z\;\mathrm{and}\;Y\cap C(Z)\;\mathrm{is} \ \mathrm{non}-\mathrm{empty},\\ {}\mathrm{then} \ C(Y)=Y\cap C(Z).\end{array}} $$

When every two-element and three-element subset of X is feasible, this implies that the revealed preference relation \( {\succ}_r \), defined by \( x{\succ}_ry \) if x ≠ y and C({x, y}) = {x}, is a weak order. The weaker condition

$$ \mathrm{if}\;Y\subseteq Z \ \mathrm{then} \ Y\cap C(Z)\subseteq C(Y) $$

implies that \( {\succ}_r \) has no cycles when every non-empty finite subset of X is feasible.
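A minimal sketch, with a made-up choice function on a three-element X, of how the two conditions above and the relation \( {\succ}_r \) can be checked mechanically; the data and helper names are illustrative.

```python
from itertools import combinations

# A made-up choice function on X = {a, b, c}: every non-empty subset is
# feasible, represented as a frozenset mapped to its choice set.
X = {"a", "b", "c"}
C = {
    frozenset("a"): {"a"}, frozenset("b"): {"b"}, frozenset("c"): {"c"},
    frozenset("ab"): {"a"}, frozenset("ac"): {"a"}, frozenset("bc"): {"b"},
    frozenset("abc"): {"a"},
}

def satisfies_strong_condition(C):
    # if Y ⊆ Z and Y ∩ C(Z) is non-empty, then C(Y) = Y ∩ C(Z)
    return all(not (Y <= Z and Y & C[Z]) or C[Y] == Y & C[Z]
               for Y in C for Z in C)

def satisfies_weak_condition(C):
    # if Y ⊆ Z then Y ∩ C(Z) ⊆ C(Y)
    return all(not Y <= Z or (Y & C[Z]) <= C[Y] for Y in C for Z in C)

print(satisfies_strong_condition(C), satisfies_weak_condition(C))   # True True

# Revealed preference: x ≻_r y  if  x ≠ y and C({x, y}) = {x}.
revealed = {(x, y) for x, y in combinations(sorted(X), 2)
            if C[frozenset({x, y})] == {x}} | \
           {(y, x) for x, y in combinations(sorted(X), 2)
            if C[frozenset({x, y})] == {y}}
print(sorted(revealed))   # [('a', 'b'), ('a', 'c'), ('b', 'c')] — a weak order here
```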

Decisions Under Risk

Let \(\mathscr{P}\) be a convex set of probability measures on an algebra \(\mathscr{A}\) of subsets of an outcome set X. Thus, for every p in \(\mathscr{P}\), p(A) ≥ 0 for all A in \(\mathscr{A}\), p(X) = 1, and p(A ∪ B) = p(A) + p(B) whenever A and B are disjoint sets in \(\mathscr{A}\). Convexity means that λp + (1 − λ)q is in \(\mathscr{P}\) whenever p and q are in \(\mathscr{P}\) and 0 ≤ λ ≤ 1. We assume that each {x} is in \(\mathscr{A}\) and that each measure with p({x}) = 1 for some x in X is in \(\mathscr{P}\).

The basic expected utility representation is, for all p and q in \(\mathscr{P}\),

$$ p\succ q\iff {\int}_Xu(x)\mathrm{d}p(x)>{\int}_Xu(x)\mathrm{d}q(x), $$
(4)

where u is a real-valued function on X. When u satisfies (4), it is unique up to a positive linear transformation. The expected utility representation follows from the preference axioms of von Neumann and Morgenstern (1947) when each p in \(\mathscr{P}\) has p(A) = 1 for a finite A in \(\mathscr{A}\). Other cases are axiomatized in Fishburn (1970, 1982a). The most important axiom besides weak order is the independence condition, which says that, for all p, q and r in \(\mathscr{P}\) and all 0 < λ < 1,

$$ p\succ q\Rightarrow \lambda p+\left(1-\lambda \right)r\succ \lambda q+\left(1-\lambda \right)r. $$
(5)

If $5000 with certainty is preferred to a 50–50 gamble for $12,000 or $0, then (5) says that a 50–50 gamble for $5000 or −$20,000 will be preferred to a gamble that returns $12,000, $0 or −$20,000 with probabilities 1/4, 1/4 and 1/2 respectively: take r(−$20,000) = 1 and λ = 1/2.
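A short numerical sketch of this instance of (5): it forms the two compound gambles and confirms that, under any expected-utility representation in which the sure $5000 is preferred to the 50–50 gamble, the mixtures are ordered the same way. The exponential utility used to generate numbers is an assumption for illustration only.

```python
import math

# Illustrative check of the mixture arithmetic behind (5):
# p = $5000 for sure, q = 50-50 gamble for $12,000 or $0,
# r = -$20,000 for sure, λ = 1/2.  Lotteries are dicts of outcome: probability.
def mix(lam, p, r):
    """The compound lottery λp + (1 − λ)r."""
    return {x: lam * p.get(x, 0.0) + (1 - lam) * r.get(x, 0.0)
            for x in set(p) | set(r)}

def expected_utility(lottery, u):
    return sum(prob * u(x) for x, prob in lottery.items())

p = {5000: 1.0}
q = {12000: 0.5, 0: 0.5}
r = {-20000: 1.0}

print(mix(0.5, p, r))   # $5000 and -$20,000, each with probability 1/2
print(mix(0.5, q, r))   # $12,000, $0 and -$20,000 with probabilities 1/4, 1/4, 1/2

# Under any expected-utility representation with EU(p) > EU(q), linearity forces
# the mixtures to be ordered the same way; an assumed exponential utility is used.
u = lambda x: -math.exp(-x / 10_000)
assert expected_utility(p, u) > expected_utility(q, u)
assert expected_utility(mix(0.5, p, r), u) > expected_utility(mix(0.5, q, r), u)
```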

The principle of choice for expected utility says to choose an expected-utility maximizing decision or measure in the feasible subset \(\mathscr{L}\) of \(\mathscr{P}\) when such a measure exists. Since convex combinations of measures in \(\mathscr{L}\) can be formed at little or no cost with the use of random devices, feasible sets are often assumed to be convex. Although this will not create a maximizing combination when none existed prior to convexification under the usual expected utility model, it can create maximally-preferred measures in more general theories that allow cyclic preferences. Convex feasible sets are also important in the minimax theory of noncooperative games (Nash 1951) and economic equilibrium without ordered preferences (Mas-Colell 1974; Shafer and Sonnenschein 1975).

Expected utility for the special case of monetary outcomes has sired extensive literatures on risk attitudes (Pratt 1964; Arrow 1974) and stochastic dominance (Whitmore and Findlay 1978; Bawa 1982). Risk attitudes involve curvature properties of an increasing utility function (u″ < 0 for risk aversion) and their economic consequences for expected-utility maximizing agents. Stochastic dominance relates shape features of u to distribution function comparisons. For example, \( \int u\,\mathrm{d}p\ge \int u\,\mathrm{d}q \) for all increasing u if and only if p({x : x ≥ c}) ≥ q({x : x ≥ c}) for every real c.
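A brief sketch of this first-order stochastic dominance condition for two made-up discrete distributions; it compares the upper-tail probabilities p({x : x ≥ c}) and q({x : x ≥ c}) at each outcome that occurs.

```python
# Illustrative first-order stochastic dominance check for discrete
# distributions given as dicts mapping outcomes to probabilities.
def upper_tail(dist, c):
    """Probability of the set {x : x >= c}."""
    return sum(prob for x, prob in dist.items() if x >= c)

def first_order_dominates(p, q):
    """True if p({x : x >= c}) >= q({x : x >= c}) at every relevant c."""
    cutoffs = set(p) | set(q)
    return all(upper_tail(p, c) >= upper_tail(q, c) for c in cutoffs)

p = {10: 0.2, 20: 0.3, 30: 0.5}   # made-up distributions
q = {10: 0.4, 20: 0.3, 30: 0.3}

print(first_order_dominates(p, q))   # True: p gives more weight to high outcomes

# Consistent with ∫u dp ≥ ∫u dq for increasing u, e.g. u(x) = x:
mean = lambda d: sum(x * prob for x, prob in d.items())
print(mean(p) >= mean(q))            # True
```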

Alternatives to expected utility maximization with monetary outcomes base criteria for choice on distribution function parameters such as the mean, variance, below-target semivariance, and loss probability (Markowitz 1959; Libby and Fishburn 1977). The best known of these are mean (more is better)/variance (more is worse) models developed by Markowitz (1959), Tobin (1965) and others. Whether congruent with expected utility or not (Chipman 1973), such models assume that preferences between distributions depend only on the parameters used.

Recent research in utility/decision theory of risky decisions has been motivated by empirical results (Allais and Hagen 1979; Kahneman and Tversky 1979; Slovic and Lichtenstein 1983) which reveal systematic and persistent violations of the expected utility axioms, including (5) and transitivity. Alternative utility models that weaken (5) but retain weak order have been proposed by Kahneman and Tversky (1979), Machina (1982), and others. A representation which presumes neither (5) nor transitivity is axiomatized by Fishburn (1982b).

Decisions Under Uncertainty

We adopt Savage’s (1954) formulation in which each potential decision is characterized by a function f, called an act, from a set S of states into a set X of consequences. The consequence that occurs if f is taken and state s obtains is f(s). Exactly one state will obtain, the agent does not know which it is, and the act chosen will not affect its realization. Examples of states are possible temperatures in central London at 12 noon next 14 July and possible closing prices of a company’s stock next Thursday.

Suppose S and the set F of available acts are finite, and that there is a utility function u on X that satisfies (1) and perhaps other conditions. Choice criteria that avoid the question of subjective probabilities on \(\mathscr{S}\) (Luce and Raiffa 1957, chapter 13) include

$$ {\displaystyle \begin{array}{l} \textit{maximin utility}:\ \mathrm{choose}\ f\ \mathrm{to}\ \mathrm{maximize}\ \underset{S}{\min }u\left[f(s)\right];\\ {} \textit{minimax loss}:\ \mathrm{choose}\ f\ \mathrm{to}\ \mathrm{minimize}\ \underset{S}{\max}\left\{\underset{g\in F}{\max }u\left[g(s)\right]-u\left[f(s)\right]\right\};\\ {} \textit{Hurwicz}\ \alpha :\ \mathrm{given}\ 0\le \alpha \le 1,\ \mathrm{choose}\ f\ \mathrm{to}\ \mathrm{maximize}\ \alpha\, \underset{S}{\max }u\left[f(s)\right]+\left(1-\alpha \right)\underset{S}{\min }u\left[f(s)\right].\end{array}} $$

Maximin, which maximizes the worst that can happen, is very conservative. Minimax loss (or regret), which is less conservative than maximin, minimizes the maximum difference between the best that could happen and what actually happens. Hurwicz α ranges from maximin (α = 0) to ‘maximax’ (α = 1).

Another criterion maximizes the average value of u[f(s)] over s. This is tantamount to the subjective expected utility model with equal probability for each state.
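The following sketch evaluates these criteria on a small, invented table of utilities u[f(s)] for three acts and three states; the numbers and the Hurwicz parameter are illustrative only.

```python
# Illustrative utility table u[f(s)] for three acts (rows) and three states
# (columns); all numbers are made up for the example.
U = {
    "f1": [ 4,  4,  4],
    "f2": [10,  2,  0],
    "f3": [ 7,  5,  1],
}
alpha = 0.5   # illustrative Hurwicz parameter

maximin = max(U, key=lambda f: min(U[f]))
maximax = max(U, key=lambda f: max(U[f]))
hurwicz = max(U, key=lambda f: alpha * max(U[f]) + (1 - alpha) * min(U[f]))
laplace = max(U, key=lambda f: sum(U[f]) / len(U[f]))   # equal-probability average

# Minimax loss (regret): compare each act, state by state, with the best act there.
best_by_state = [max(U[f][s] for f in U) for s in range(3)]
regret = lambda f: max(best - U[f][s] for s, best in enumerate(best_by_state))
minimax_loss = min(U, key=regret)

print(maximin, minimax_loss, hurwicz, maximax, laplace)   # f1 f3 f2 f2 f3
```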

Subjective probability as developed by Ramsey, de Finetti and Savage quantifies partial beliefs by the extent to which we are prepared to act on them. If you would rather bet £100 on horse A than on horse B then, for you, A has the higher probability of winning. If your beliefs adhere to appropriate axioms for a comparative probability relation \( {\succ}^{\ast } \) on the algebra \(\mathscr{S}\) of subsets of S (Fishburn 1986), then there is a probability measure ρ on \(\mathscr{S}\) such that, for all A and B in \(\mathscr{S}\), A \( {\succ}^{\ast } \) B ⇔ ρ(A) > ρ(B).

Savage’s axioms for \( \succ \) on F (Savage 1954; Fishburn 1970, chapter 14) imply the existence of a bounded utility function u on X and a probability measure ρ on \(\mathscr{S}\) such that, for all f and g in F,

$$ f\succ g\iff {\int}_Su\left[f(s)\right]\mathrm{d}\rho (s)>{\int}_Su\left[g(s)\right]\mathrm{d}\rho (s), $$
(6)

with u unique up to a positive linear transformation and ρ unique. His axioms include weak order, independence axioms that in part yield the preceding representation of \( {\succ}^{\ast } \), and a continuity condition. Many other people (Fishburn 1981) have developed alternative axiomatizations of (6) and closely-related representations.

Recent alternatives to Savage’s subjective expected utility theory have been motivated by the empirical results cited in the preceding section and by Ellsberg’s (1961) challenges to the traditional subjective probability model. Suppose an urn contains 30 red balls and 60 others that are black and yellow in an unknown proportion. One ball is to be drawn at random. Many people are observed to prefer a bet on red to a bet on black, and a bet on ‘black or yellow’ to a bet on ‘red or yellow’. By the traditional model, the first preference gives ρ(red) > ρ(black) while the second gives ρ(black) > ρ(red), a contradiction.
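The inconsistency can be confirmed by brute force: the sketch below searches a grid of additive probability assignments for the urn and finds none that reproduces both observed preferences when the expected utility of a bet is simply its probability of winning; the grid and utilities are illustrative.

```python
# Illustrative confirmation that no additive probability assignment
# (ρ_red, ρ_black, ρ_yellow) reproduces both observed preferences when the
# expected utility of a bet is its winning probability (u(win)=1, u(lose)=0).
grid = [i / 100 for i in range(101)]
consistent = [
    (red, black, round(1 - red - black, 2))
    for red in grid for black in grid if red + black <= 1
    # bet on red preferred to bet on black:
    if red > black
    # bet on 'black or yellow' preferred to bet on 'red or yellow':
    and black + (1 - red - black) > red + (1 - red - black)
]
print(consistent)   # [] — the two preferences are jointly inconsistent
```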

Schmeidler (1984) axiomatizes a utility model that replaces the additive probability measure ρ in (6) by a monotone [A ⊆ B ⇒ ρ(A) ≤ ρ(B)] but not necessarily additive measure, and argues that his model can accommodate Ellsberg’s phenomena. A different model (Loomes and Sugden 1982) retains additive ρ but accommodates other violations of independence and cyclic preferences.

Maximization of subjective expected utility is the core principle of Bayesian decision theory (Savage 1954; Raiffa and Schlaifer 1961; Winkler 1972). This name, used in distinction to classical methods of statistical analysis pioneered by R.A. Fisher, Jerzy Neyman, Egon Pearson, and Abraham Wald, recognizes the unabashed use of subjective probability and the revision of probabilities in light of new evidence by the basic formula of conditional probability known as Bayes’s Theorem.

A typical problem in statistical decision theory is to decide which of several possible experiments, if any, to perform for the purpose of gathering additional information that will be used in a subsequent decision. In the Bayesian approach, the primary states that occasion the need for further information can be enriched to incorporate potential experimental outcomes in such a way that (6) refers to the entire decision process. The problem can then be decomposed, as is usually done in practice, to compute optimal subsequent decisions based on particular experiments and their possible outcomes. Decision functions for each experiment that map outcomes into best subsequent acts can then be compared to determine a best experiment. Various methods of analysis in the Bayesian mode are described and illustrated in Raiffa and Schlaifer (1961).
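A minimal sketch of such a preposterior analysis for a two-state, two-act problem with a single optional binary experiment; the prior, likelihoods, utilities and the cost of the experiment are all invented for illustration.

```python
# Illustrative Bayesian analysis: decide whether to run a binary experiment
# before choosing between two acts under two states.  All numbers are made up.
prior = {"s1": 0.3, "s2": 0.7}
utility = {("a1", "s1"): 100, ("a1", "s2"): 0,
           ("a2", "s1"): 40,  ("a2", "s2"): 60}
likelihood = {("pos", "s1"): 0.8, ("neg", "s1"): 0.2,   # P(outcome | state)
              ("pos", "s2"): 0.1, ("neg", "s2"): 0.9}
experiment_cost = 5

def best_act_value(posterior):
    """Maximum subjective expected utility over acts, as in (6)."""
    return max(sum(posterior[s] * utility[(a, s)] for s in prior)
               for a in ("a1", "a2"))

# Expected utility of deciding now, without the experiment.
value_now = best_act_value(prior)

# Preposterior analysis: average the value of the optimal subsequent decision
# over the experiment's possible outcomes (Bayes's Theorem), minus its cost.
value_experiment = 0.0
for outcome in ("pos", "neg"):
    marginal = sum(likelihood[(outcome, s)] * prior[s] for s in prior)
    posterior = {s: likelihood[(outcome, s)] * prior[s] / marginal for s in prior}
    value_experiment += marginal * best_act_value(posterior)
value_experiment -= experiment_cost

print(value_now, value_experiment)   # the experiment is worthwhile if the second is larger
```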

See Also