
1 Introduction

We will start with discrete-time utility optimization which is now a classical subject and can be treated as a Markov decision process in discrete time 0 ≤ n ≤ N. Our main goal will be an application to adequate pricing of financial derivatives, in particular options, which is an important subject of financial mathematics. A financial market is studied where two assets, bond and stock, can be traded under transaction costs. A mutual fund is a good example for the stock. Under concavity and homogeneity assumptions on the utility function U, it is known that the optimal policy has a cone structure not only for models without but also for models with linear transaction costs, see below. In the present paper we will focus on such models.

An Explanatory Model

In order to describe the application of the optimal policy from utility maximization to pricing of financial derivatives, let us first consider a simple model with only one period [0, N] (starting in 0 and finishing in N = 1) and without transaction costs. Let \(B_{N}\) be the value on the bank account at N if we start with one unit of money \(B_{0} = 1\). Then \(B_{N}^{-1}\) is the classical discount factor. For fixed initial wealth x, the policy can be described by a real number θ, the investment in the stock. Then the wealth at N is \(X_{N}^{\theta } = (x-\theta )B_{N} + S_{0}^{-1}\theta \,S_{N} = B_{N}(x -\theta +S_{0}^{-1}\theta \,B_{N}^{-1}S_{N})\), where \(S_{0}\) and \(S_{N}\) are the stock prices at 0 and N and \(S_{0}^{-1}\theta\) is the invested number of stocks.

The classical present value principle for pricing future incomes is based on the expectation of discounted quantities. According to this principle, an adequate price for a contingent claim offering \(S_{N}\), i.e. one unit of stock, at N would be \(\mathop{\mathrm{pr}}\nolimits (S_{N}) = E[B_{N}^{-1}S_{N}]\). But this answer may be wrong, because we know in the present situation of a financial market that \(S_{0}\) is the adequate price: starting with \(S_{0}\) one is sure to have \(S_{N}\) at N. In general, however, one has \(E[B_{N}^{-1}S_{N}]\neq B_{0}^{-1}S_{0} = S_{0}\) and not the equality one would like to have. Note that the equality means that the discounted stock price process \(\{B_{0}^{-1}S_{0},B_{N}^{-1}S_{N}\}\) is a martingale. It was a great discovery for the stochastic community when one realized that martingales come into play. This is the reason for a change of measure where the original real-world probability measure P is replaced by an artificial martingale measure Q with Radon-Nikodym density q w.r.t. P. One wants to study adequate prices \(\mathop{\mathrm{pr}}\nolimits (C)\) for a contingent claim C depending on the underlying financial derivative and maturing at N. In the present simple model, one has \(C = f(S_{N})\) for some function f, since \(S_{N}\) is the only random variable. In multiperiod models, C is contingent upon the whole development of the stock up to N. After a change of measure, one considers the present value principle under Q:

$$\displaystyle{ \mathop{\mathrm{pr}}\nolimits (C) = E[q\,B_{N}^{-1}C] = E_{ Q}[B_{N}^{-1}C]\quad \mbox{ with }\quad \mathop{\mathrm{pr}}\nolimits (S_{ N}) = E_{Q}[B_{N}^{-1}S_{ N}] = S_{0}. }$$
(21.1)

Then \(\{B_{0}^{-1}S_{0},B_{N}^{-1}S_{N}\}\) is a martingale under Q and \(\mathop{\mathrm{pr}}\nolimits (\,\cdot \,)\) is called a consistent price system because of the relation \(\mathop{\mathrm{pr}}\nolimits (S_{N}) = S_{0}\). In general, however, there are several choices for a martingale measure Q, and one has to specify an additional preference in order to single out one measure Q and thus one generally agreed price. Therefore, no preference-independent pricing of financial derivatives is possible.

Construction of a Price System

Now we explain the relations to utility optimization and how to construct a martingale measure Q and thus a consistent price system by the optimal investment \(\theta ^{{\ast}}\). Let us consider the portfolio optimization problem where the wealth at N is \(X_{N}^{\theta } = B_{N}(x -\theta +S_{0}^{-1}\theta \,B_{N}^{-1}S_{N})\) defined as above and where we study \(\max _{\theta }E[U(B_{N}^{-1}X_{N}^{\theta })]\). Then we get for the optimal investment \(\theta ^{{\ast}}\) by differentiating:

$$\displaystyle{E\left [U'\left (B_{N}^{-1}X_{ N}^{\theta ^{{\ast}} }\right )\left (S_{0}^{-1}B_{ N}^{-1}S_{ N} - 1\right )\right ] = 0\quad \mbox{ or }\quad E\left [c\,U'(B_{N}^{-1}X_{ N}^{\theta ^{{\ast}} })B_{N}^{-1}S_{ N}\right ] = S_{0},}$$

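if the constant c is chosen such that \(E[c\,U'(B_{N}^{-1}X_{N}^{\theta ^{{\ast}} })] = 1\). Since \(B_{N}^{-1}X_{N}^{\theta ^{{\ast}} } = x +\theta ^{{\ast}}(S_{0}^{-1}B_{N}^{-1}S_{N} - 1)\), the first-order condition gives \(E[U'(B_{N}^{-1}X_{N}^{\theta ^{{\ast}} })\,B_{N}^{-1}X_{N}^{\theta ^{{\ast}} }] = x\,E[U'(B_{N}^{-1}X_{N}^{\theta ^{{\ast}} })]\), and one obtains \(c = x\,E[U^{{\ast}}(B_{N}^{-1}X_{N}^{\theta ^{{\ast}} })]^{-1}\) with \(U^{{\ast}}(w):= U'(w)\,w\). Now we can set \(q = c\,U'(B_{N}^{-1}X_{N}^{\theta ^{{\ast}} })\) for q as above and we get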

$$\displaystyle{ \mathop{\mathrm{pr}}\nolimits (C) = x\,E\left [U^{{\ast}}\left (B_{ N}^{-1}X_{ N}^{\theta ^{{\ast}} }\right )\right ]^{-1}\,E\left [U'\left (B_{ N}^{-1}X_{ N}^{\theta ^{{\ast}} }\right )B_{N}^{-1}C\right ] }$$
(21.2)

where typically x = 1. In fact we then have \(E[q\,B_{N}^{-1}S_{N}] = S_{0}\) and q thus defines a martingale measure. By a ‘marginal rate of substitution’ argument it can be shown how this price depends in a traditional way on the investor’s preference or relative risk aversion (see Davis [7], Schäl [26, Introduction]).
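The construction of q can be checked numerically. The following Python sketch is only an illustration under assumed data: a two-point distribution for \(S_{N}\), the power utility \(U(w) = w^{\gamma }/\gamma\) with γ = 0.5, and the hypothetical helper names `foc`, `theta_star`, `q` and `price`, none of which come from the text. It solves the first-order condition by bisection and then forms the density \(q = c\,U'(\cdot )\).

```python
# Hypothetical one-period market; all numbers are assumptions for illustration.
B_N, S_0, x, GAMMA = 1.05, 100.0, 1.0, 0.5
OUTCOMES = [(0.6, 120.0), (0.4, 90.0)]   # (probability, stock price S_N)


def u_prime(w):
    """Derivative of the power utility U(w) = w**GAMMA / GAMMA."""
    return w ** (GAMMA - 1.0)


def disc_return(s_n):
    """Discounted relative return S_N / (S_0 * B_N)."""
    return s_n / (S_0 * B_N)


def disc_wealth(theta, s_n):
    """Discounted terminal wealth x - theta + theta * S_N / (S_0 * B_N)."""
    return x - theta + theta * disc_return(s_n)


def foc(theta):
    """First-order condition E[U'(wealth) * (S_N / (S_0 B_N) - 1)]."""
    return sum(p * u_prime(disc_wealth(theta, s)) * (disc_return(s) - 1.0)
               for p, s in OUTCOMES)


# Bisection on the interval where wealth stays positive; foc is decreasing there.
lo = -0.999 * x / (max(disc_return(s) for _, s in OUTCOMES) - 1.0)
hi = 0.999 * x / (1.0 - min(disc_return(s) for _, s in OUTCOMES))
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if foc(mid) > 0.0:
        lo = mid
    else:
        hi = mid
theta_star = 0.5 * (lo + hi)

# Normalizing constant c and density q = c * U'(discounted optimal wealth).
c = 1.0 / sum(p * u_prime(disc_wealth(theta_star, s)) for p, s in OUTCOMES)


def q(s_n):
    return c * u_prime(disc_wealth(theta_star, s_n))


def price(claim):
    """pr(C) = E[q * C / B_N] for a claim C = claim(S_N)."""
    return sum(p * q(s) * claim(s) / B_N for p, s in OUTCOMES)


print(theta_star, price(lambda s: s), price(lambda s: max(s - 100.0, 0.0)))
```

In this sketch `price(lambda s: s)` reproduces \(S_{0}\) up to rounding, which is the martingale property. A two-point market admits only one martingale measure, so the choice of U does not affect the price here; with more outcomes the price would in general depend on U.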

The Numeraire Portfolio

In the present paper, a special martingale measure Q is studied which is defined by the concept of the numeraire portfolio. Then the choice of Q can be justified by a change of numeraire (discount factor) in place of a change of measure. For this approach one has to choose for U the log-utility with \(U'(w) = w^{-1}\) and \(U^{{\ast}}(w) = 1\) (see Becherer [2], Bühlmann and Platen [3], Christensen and Larsen [4], Goll and Kallsen [9], Karatzas and Kardaras [13], Korn et al. [17], Korn and Schäl [15, 16], Long [19], Platen [21], Schäl [25]). The optimal investment \(\theta ^{{\ast}}\) is called log-optimal. In fact, then one obtains \(q = c\,(B_{N}^{-1}X_{N}^{\theta ^{{\ast}} })^{-1}\) and \(\mathop{\mathrm{pr}}\nolimits (C) = E[q\,B_{N}^{-1}C] = E[c\,(X_{N}^{\theta ^{{\ast}} })^{-1}C]\) and c = 1 for x = 1 since \(U^{{\ast}}(w) = 1\). As a result we finally get

$$\displaystyle{ \mathop{\mathrm{pr}}\nolimits (C) = E[(X_{N}^{\theta ^{{\ast}} })^{-1}C]. }$$
(21.3)

Comparing (21.3) with the possibly wrong price \(\mathop{\mathrm{pr}}\nolimits (C) = E[B_{N}^{-1}C]\) (see above) and with a consistent price (21.1), we see the following: In (21.3) we stick to the original probability measure but replace \(B_{N}\) with the wealth \(X_{N}^{\theta ^{{\ast}} }\) which can be realized on the market when starting with x = 1 on the bank account and investing according to \(\theta ^{{\ast}}\). When looking for a discount factor, we thus assume that we will use x = 1 in an optimal way instead of investing exclusively in the bank account. As a consequence, the (generalized) discount factor \((X_{N}^{\theta ^{{\ast}} })^{-1}\) is random.

We think that it is easier to explain a change of the discount factor to a non-expert than a change of measure since we here have a financial market where we have more choices for investing one unit of money and not only the choice to invest in the bank account.
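The change of numeraire can be illustrated in the same kind of toy market. The Python sketch below assumes a two-point distribution for \(S_{N}\) and x = 1; the numbers and the names `theta_star`, `numeraire_wealth` and `price` are hypothetical. For two outcomes, the log-optimal investment follows in closed form from the first-order condition, and claims are priced under the original measure P by dividing by the log-optimal wealth.

```python
# Hypothetical two-point market with initial wealth x = 1 (all numbers assumed).
B_N, S_0 = 1.05, 100.0
P_UP, S_UP, S_DOWN = 0.6, 120.0, 90.0

# Absolute discounted excess returns in the up and the down state.
u = S_UP / (S_0 * B_N) - 1.0      # > 0
v = 1.0 - S_DOWN / (S_0 * B_N)    # > 0

# Log-optimal investment: root of p*u/(1 + theta*u) - (1 - p)*v/(1 - theta*v) = 0.
theta_star = (P_UP * u - (1.0 - P_UP) * v) / (u * v)


def numeraire_wealth(s_n):
    """Wealth B_N * (1 - theta* + theta* * S_N / (S_0 * B_N)) of the log-optimal policy."""
    return B_N * (1.0 - theta_star + theta_star * s_n / (S_0 * B_N))


def price(claim):
    """pr(C) = E[C / X_N^{theta*}] under the original measure P."""
    return (P_UP * claim(S_UP) / numeraire_wealth(S_UP)
            + (1.0 - P_UP) * claim(S_DOWN) / numeraire_wealth(S_DOWN))


print(theta_star, price(lambda s: s), price(lambda s: B_N))
```

No density q appears in `price`: the only change relative to the naive present value is that \(B_{N}\) is replaced by the random wealth of the log-optimal policy. In this sketch the stock is priced at \(S_{0}\) and the bond payoff \(B_{N}\) at 1, up to rounding.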

The General Model with Transaction Costs

The problem of the paper is to carry over this idea to multiperiod financial models (where N ≥ 1) in the presence of transaction costs. For such models, utility maximization and in particular log-optimality are also well studied. The wealth at stage n will be given by portfolios \((Y _{n},Z_{n})\) with generic values (y, z) describing the value of the stock account and the bank account at time n, respectively. It is known that the log-optimal dynamic portfolio can be described by two Merton lines in the (y, z)-plane (see Kamin [12], Constantinides [5], Sass [22]) in place of one Merton line as in the setting without transaction costs. For results in continuous time see Davis and Norman [8], Magill and Constantinides [20] and Shreve and Soner [27].

Here we will contribute to that theory. We need a natural region for portfolios (y, z) and therefore allow for negative values of y and z (but with y + z > 0), i.e. for short selling and borrowing. For any stage n < N, the region of admissible portfolios will be the solvency region and it is divided by the two Merton lines into three cones where it is optimal either (i) to buy, (ii) to sell, or (iii) not to trade, respectively. These properties simplify numerical studies considerably. When looking for a natural region, ‘natural’ means that it is as large as possible and that these three cones are not empty. Empty cones can occur if one restricts to nonnegative values of y and z. We will provide a moment condition (R3) on the returns which ensures that the cones are not empty. Furthermore we will deal with open action spaces in order to be sure that the optimal action lies in the interior. This is needed for the argument that the derivative vanishes at a maximum point, which was also used above in the simple explanatory model.

Martingale Measures and the Numeraire Portfolio

Martingale measures and price systems are also discussed in the literature for models with transaction costs, see Jouini and Kallal [10], Koehl et al. [14], Kusuoka [18], Schachermayer [24]. As explained above, they are basic for the concept of a numeraire portfolio. Now the goal of the paper is the following: Study the log-optimal dynamic portfolio and show that it defines a numeraire portfolio. The definition of martingale measures is not so evident in the presence of transaction costs.

When maximizing the expected utility \(E[U(B_{N}^{-1}(Y _{N} + Z_{N}))]\), we will use \(Y _{N} + Z_{N}\) as total wealth at time N as in Bäuerle and Rieder [1, Sect. 4.5] and Cvitanić and Karatzas [6]. A more general concept can also be used where one introduces liquidation costs L at time N and considers \(L(Y _{N}) + Z_{N}\) in place of \(Y _{N} + Z_{N}\). For this problem we refer the reader to Sass and Schäl [23]. Since L is not differentiable, this case would cause a lot of additional problems and additional assumptions are needed. Indeed, this paper aims at providing the proof in the case without liquidation costs, since this case allows for much more straightforward arguments and requires fewer assumptions.

A contingent claim C, maturing in N, is split into a contingent claim \(Y ^{C}\) for the stock account and a contingent claim \(Z^{C}\) for the bank account. Then a price for \((Y ^{C},Z^{C})\) turns out to be

$$\displaystyle{ \mathop{\mathrm{pr}}\nolimits (Y ^{C},Z^{C}) = E\left [(Y _{ N}^{{\ast}} + Z_{ N}^{{\ast}})^{-1}(Y ^{C} + Z^{C})\right ]. }$$
(21.4)

Here \(Y _{N}^{{\ast}} + Z_{N}^{{\ast}}\) is the wealth at N under the optimal dynamic portfolio. The role of \((Y _{N}^{{\ast}} + Z_{N}^{{\ast}})^{-1}\) is that of a generalized discount factor and \((Y _{N}^{{\ast}},Z_{N}^{{\ast}})\) is then called a numeraire portfolio at N.

Main Result

As main result, the log-optimal portfolio indeed turns out to define a numeraire portfolio also for models with transaction costs. As in the classical case without transaction costs, the message is the following: under very general conditions you don’t need to change the measure for pricing a contingent claim. You can stick to the probability measure P describing the real market and thus being open to statistical procedures. Instead of the bank account you must use the wealth of the log-optimal policy, starting with one unit of money as usual, as reference unit or benchmark (in the terminology of Platen [21]). Thus we see a contingent claim C relative to \(Y _{N}^{{\ast}} + Z_{N}^{{\ast}}\). Working with P is also extremely useful when integrating the modeling of risk into finance as in combined finance and insurance problems, see Bühlmann and Platen [3].

2 The Financial Model

The bond with prices \(B_{n}\), n = 0, …, N, will be described by deterministic interest rates \(r_{n} - 1 \geq 0\) and the stock with prices \(S_{n}\), n = 0, …, N, will be described by the relative return process \(\{R_{n} - 1,\,n = 1,\ldots,N\}\), where the \(R_{n}\) are positive independent random variables. Let \(B_{0} = 1\) and \(S_{0}> 0\) be deterministic. Then

$$\displaystyle{ B_{n} = B_{n-1}r_{n},\quad B_{n}^{-1}S_{ n} = B_{n-1}^{-1}S_{ n-1}R_{n},\quad n = 1,\ldots,N. }$$
(21.5)

We write \(\mathbb{F} =\{ \mathcal{F}_{n},n = 0,\ldots,N\}\) for the filtration generated by \(\{R_{n},n = 1,\ldots,N\}\) where \(\mathcal{F}_{0}\) is trivial and \(\mathcal{F} = \mathcal{F}_{N}\).

A trading strategy is given by a real valued \(\mathbb{F}\)-adapted stochastic process \(\{\varDelta _{n},0 \leq n <N\}\) describing the amount of money (wealth) invested in the stock. For the transaction \(\varDelta _{n}\), the total cost \(K(\varDelta _{n})\) with transaction costs 0 ≤ μ < 1, λ ≥ 0 has to be paid, where

$$\displaystyle{ K(\theta ):= (1+\lambda )\theta \mbox{ for }\theta \geq 0,\quad K(\theta ):= (1-\mu )\theta \mbox{ for }\theta \leq 0. }$$
(21.6)

A trading strategy will define a dynamic portfolio \(\{(Y _{n},Z_{n}),0 \leq n \leq N\}\) describing the wealth \(\{Y _{n}\}\) on the stock account and the wealth \(\{Z_{n}\}\) on the bank account. We get the budget equations

$$\displaystyle\begin{array}{rcl} Y _{n}& =& \overline{Y }_{n-1}r_{n}R_{n},\quad Z_{n} = \overline{Z}_{n-1}r_{n} {}\end{array}$$
(21.7)
$$\displaystyle\begin{array}{rcl}\overline{Y }_{n-1}& =& Y _{n-1} +\varDelta _{n-1},\quad \overline{Z}_{n-1} = Z_{n-1} - K(\varDelta _{n-1}),{}\end{array}$$
(21.8)

where \(\overline{Y }_{n-1}\) and \(\overline{Z}_{n-1}\) are the wealth on the stock account and the bank account after trading. We consider self-financing trading strategies where no additional wealth is added or consumed. Note that \(K(y) \geq y\) and \(K(\alpha y) =\alpha K(y)\) for α ≥ 0 (positive homogeneity).
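As a small illustration of (21.6)-(21.8), the following Python sketch implements the cost function and one step of the budget equations. The function names `K` and `budget_step` mirror the notation of the text, but the signature and the numbers in the example are assumptions.

```python
def K(theta, lam, mu):
    """Total cost (21.6): buying theta >= 0 costs (1 + lam) * theta,
    selling theta <= 0 changes the bank account by (1 - mu) * theta."""
    return (1.0 + lam) * theta if theta >= 0 else (1.0 - mu) * theta


def budget_step(y, z, delta, r, R, lam, mu):
    """One period of (21.7)-(21.8): trade delta, then apply interest r and return R."""
    y_bar = y + delta                 # stock account after trading
    z_bar = z - K(delta, lam, mu)     # bank account after trading
    return y_bar * r * R, z_bar * r


# Hypothetical numbers: buy stock worth 10 at 2% cost, then the stock gains 10%.
y1, z1 = budget_step(50.0, 50.0, 10.0, r=1.0, R=1.1, lam=0.02, mu=0.01)
print(y1, z1)
```

The purchase of 10 units of wealth in stock reduces the bank account by 10.2, so the transaction cost is paid entirely from the bank account.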

We will only consider admissible trading strategies where the investor stays solvent at any time in the following sense:

$$\displaystyle{ (a)\quad Y _{N} + Z_{N}> 0\quad \mbox{ and }\quad (b)\quad Z_{n} - K(-Y _{n})> 0\quad \mbox{ for }\quad n <N. }$$
(21.9)

Note that (21.9) implies \(Y _{n} + Z_{n}> 0\) for n ≤ N, since \(K(-y) \geq -y\).

3 The Markov Decision Model

To ease notation we shall now assume \(r_{n} = 1\) and thus \(B_{n} = 1\), 1 ≤ n ≤ N. This is a usual assumption and means that one works directly with the discounted quantities \(B_{n}^{-1}S_{n}\) and \(B_{n}^{-1}B_{n} = 1\) instead of \(S_{n}\) and \(B_{n}\).

We will work with a Markov decision process where the state is described by (y, z) where y denotes the wealth on the stock account and z the wealth on the bank account.

Definition 21.3.1.

  1. a.

    The state space at n is \(\mathcal{S}_{N}:=\{ (y,z)\,:\, y + z> 0\}\) for n = N and \(\mathcal{S}:=\{ (y,z)\,:\, z - K(-y)> 0\} =\{ (y,z)\,:\, (1-\mu )y + z> 0,(1+\lambda )y + z> 0\}\) for n < N.

  2. b.

    An action θ will denote the transaction describing the amount of money (wealth) invested in the stock. The set of admissible actions will be defined below.

  3. c.

    The law of motion is defined by the budget Eqs. (21.7) and (21.8) where \(\{R_{n},n = 1,\ldots,N\}\) are independent (but not necessarily identically distributed) random variables. Thus, given the state (y, z) and the action θ at n − 1, the distribution of the state at n is that of

    $$\displaystyle{\left ((y+\theta )R_{n},z - K(\theta )\right ).}$$

\(\mathcal{S}_{N}\) is called the solvency region at stage N and \(\mathcal{S}\) is called the solvency region at all stages n < N. Obviously \(\mathcal{S}_{N}\) is defined as \(\mathcal{S}\) replacing (λ, μ) by (0, 0). Thus, \(\mathcal{S}_{N}\) and \(\mathcal{S}\) are open convex cones and the boundaries are formed by half-lines. The condition ( 21.9) can be written as \((Y _{N},Z_{N}) \in \mathcal{S}_{N}\) and \((Y _{n},Z_{n}) \in \mathcal{S}\) for n < N. We will make the following assumptions on R n .

Assumption 21.3.2.

We assume for n = 1, …, N that \(R_{n}\) is bounded by real constants \(\underline{R}\), \(\overline{R}\) with

$$\displaystyle\begin{array}{rcl} \mathrm{(R1)}& & 0 <\underline{ R} \leq R_{n} \leq \overline{R}, {}\\ \mathrm{(R2)}& & \underline{R} <1-\mu,\quad 1+\lambda <\overline{R}, {}\\ \mathrm{(R3)}& & E[(R_{n} -\underline{ R})^{-1}] = E[(\overline{R} - R_{ n})^{-1}] = \infty. {}\\ \end{array}$$

For convenience, we omit the index n for \(\underline{R}\), \(\overline{R}\). Assumption (R3) implies that \(\underline{R}\), \(\overline{R}\) are in the support of \(R_{n}\). Then (R2) implies a no-arbitrage condition, i.e., there is a chance that one can lose money and that one can win money when investing in the stock. Assumption (R3) is far from necessary. Indeed, one only needs that \(E[(R_{n} -\underline{ R})^{-1}]\) and \(E[(\overline{R} - R_{n})^{-1}]\) are big enough. But it is complicated to quantify this property for each stage. Assumption (R3) is satisfied if \(P(R_{n} = r)> 0\) for \(r =\underline{ R},\overline{R}\) or if \(R_{n}\) has the uniform distribution on \([\underline{R},\overline{R}]\).

Definition 21.3.3.

\(\varGamma:=\{ (y,z)\,:\, (y\,r,z) \in \mathcal{S}\mbox{ for }\underline{R} \leq r \leq \overline{R}\}\) and Γ N are the pre-solvency regions where Γ N is defined as Γ replacing \(\mathcal{S}\) with \(\mathcal{S}_{N}\) and thus (λ, μ) by (0, 0).

Obviously \(\varGamma _{N}\) contains all states at time N − 1 after trading such that the system is in \(\mathcal{S}_{N}\) at time N for every possible value r of \(R_{N}\). Assumption (R2) now guarantees that \(\varGamma _{N} \subset \mathcal{S}\) and one can move from any state \((y,z) \in \mathcal{S}\setminus \varGamma _{N}\) to a state \((y+\theta,z - K(\theta )) \in \varGamma _{N}\) by buying (θ > 0) or selling (θ < 0).

Lemma 21.3.4.

\(\varGamma =\{ (y,z)\,:\, (1-\mu )\underline{R}\,y + z> 0,\,(1+\lambda )\overline{R}\,y + z> 0\}\) and \(\varGamma _{N} =\{ (y,z)\,:\,\underline{ R}\,y + z> 0,\,\overline{R}\,y + z> 0\}\). Being defined by strict inequalities, \(\varGamma\) and \(\varGamma _{N}\) are open convex cones, and their boundaries are formed by two rays.
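Lemma 21.3.4 can be checked against Definition 21.3.3 by brute force. The Python sketch below is only such a check under assumed parameters (λ = 0.02, μ = 0.01, \(\underline{R} = 0.8\), \(\overline{R} = 1.3\), which satisfy (R2)); the function names are hypothetical. It tests membership in Γ once via a grid of return values r and once via the two inequalities of the lemma.

```python
def K(theta, lam, mu):
    """Total cost (21.6)."""
    return (1.0 + lam) * theta if theta >= 0 else (1.0 - mu) * theta


def in_S(y, z, lam, mu):
    """Solvency region at stages n < N: z - K(-y) > 0."""
    return z - K(-y, lam, mu) > 0


def in_gamma_bruteforce(y, z, lam, mu, r_lo, r_hi, steps=200):
    """Definition 21.3.3, checked on a grid of r values that includes both endpoints."""
    for k in range(steps + 1):
        t = k / steps
        r = (1.0 - t) * r_lo + t * r_hi
        if not in_S(y * r, z, lam, mu):
            return False
    return True


def in_gamma_lemma(y, z, lam, mu, r_lo, r_hi):
    """The two inequalities of Lemma 21.3.4."""
    return (1.0 - mu) * r_lo * y + z > 0 and (1.0 + lam) * r_hi * y + z > 0


LAM, MU, R_LO, R_HI = 0.02, 0.01, 0.8, 1.3   # assumed parameters
grid = [(y, z) for y in range(-10, 11) for z in range(-10, 11)]
agree = all(
    in_gamma_bruteforce(y, z, LAM, MU, R_LO, R_HI) == in_gamma_lemma(y, z, LAM, MU, R_LO, R_HI)
    for y, z in grid
)
print(agree)
```

For example, the state (10, −8) lies in \(\mathcal{S}\) but not in Γ under these parameters: the investor is solvent now but would be insolvent after the worst return if no trade took place.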

Definition 21.3.5.

The set of admissible actions θ at stage n < N − 1 will be chosen as

$$\displaystyle{\mathcal{A}(y,z):=\{\theta \,:\, (y+\theta,z - K(\theta )) \in \varGamma \},\quad (y,z) \in \mathcal{S},}$$

and at stage N − 1 as \(\mathcal{A}_{N-1}(y,z)\) defined as \(\mathcal{A}(y,z)\) replacing Γ with Γ N .

Thus \(\varDelta _{n-1} \in \mathcal{A}(Y _{n-1},Z_{n-1})\) implies \((Y _{n},Z_{n}) \in \mathcal{S}\) for n < N. Important quantities will depend on the state (y, z) only through y∕(y + z) and are thus independent of α on the ray {(α y, α z) : α > 0}. This fact will entail an important cone structure. Therefore we introduce the risky fraction

$$\displaystyle{ \varPi _{n}:= Y _{n}/(Y _{n} + Z_{n}). }$$
(21.10)

We will restrict attention to situations where \(Y _{n} + Z_{n}\) is strictly positive. Then \(\varPi _{n}\) is well-defined.

Convention 21.3.6.

If y, z, and π appear in the same context, then we always mean π = y∕(y + z).

By use of Assumption (R2), it is easy to prove the following lemma.

Lemma 21.3.7.

There exist some functions \(\underline{\vartheta },\overline{\vartheta }: (-\lambda ^{-1},\mu ^{-1}) \rightarrow \mathbb{R}\) such that

$$\displaystyle{\mathcal{A}(y,z) =\{\theta \,;\,\underline{\vartheta }(\pi ) <\theta /(y + z) <\overline{\vartheta }(\pi )\}.}$$

The same result holds for \(\mathcal{A}_{N-1}\) replacing \((-\lambda ^{-1},\mu ^{-1})\) by \(\mathbb{R}\), i.e. (λ,μ) by (0,0).

Then the interval \((\underline{\vartheta }(\cdot ),\overline{\vartheta }(\cdot ))\) will be a function of \(\varPi _{n}\) for \((Y _{n},Z_{n}) \in \mathcal{S}\). Note that \(\overline{\vartheta }(\pi )\) may be negative (if π is too large) and \(\underline{\vartheta }(\pi )\) may be positive (if π is too small).

We will use the log-utility and consider the following maximization problem:

$$\displaystyle{ G_{n}^{{\ast}}(y,z):=\sup \, E[\log (Y _{ N} + Z_{N})\,\vert \,Y _{n} = y,\,Z_{n} = z], }$$
(21.11)

where the supremum is taken over all admissible trading strategies. The expectation in (21.11) is well-defined. In fact, for given (y, z), the integrand \(\log (Y _{N} + Z_{N})\) is bounded from above. For that fact it is sufficient to consider the case without transaction costs which was treated in Korn and Schäl [15, Theorem 4.12]. From dynamic programming we know that we can restrict to Markov policies where \(\varDelta _{n} =\delta _{n}(Y _{n},Z_{n})\). Here a trading strategy is described by a Markov policy \(\{\delta _{n},n = 0,\ldots,N - 1\}\) if the decision rule \(\delta _{n}\) is a function on \(\mathcal{S}\) with \(\delta _{N-1}(y,z) \in \mathcal{A}_{N-1}(y,z)\) and \(\delta _{n}(y,z) \in \mathcal{A}(y,z)\) for n < N − 1. Set

$$\displaystyle{ G_{n}(y,z):= E[G_{n+1}^{{\ast}}(y\,R_{ n+1},z)]. }$$
(21.12)

Then the following optimality equation holds:

$$\displaystyle{ G_{n}^{{\ast}}(y,z) = \text{max}_{\theta }G_{ n}(y+\theta,z - K(\theta )), }$$
(21.13)

where θ runs through \(\mathcal{A}_{N-1}(y,z)\) for n = N − 1 and through \(\mathcal{A}(y,z)\) for n < N − 1. The optimality criterion states (see e.g. [1, Theorem 2.3.8]): If there are maximizers \(\theta ^{{\ast}} =\delta _{n}(y,z)\) such that

$$\displaystyle{ G_{n}(y +\theta ^{{\ast}},z - K(\theta ^{{\ast}})) = \text{max}_{\theta }G_{ n}(y+\theta,z - K(\theta )), }$$
(21.14)

then {δ n } defines an optimal Markov policy.
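For the last stage n = N − 1, the optimality equation can be solved by brute force, since (21.11) and (21.12) give \(G_{N-1}(y,z) = E[\log (y\,R_{N} + z)]\). The Python sketch below is an illustration only: it assumes a two-point distribution of \(R_{N}\) on \(\{\underline{R},\overline{R}\}\), assumed cost rates, and a plain grid search over θ in place of an exact maximization. The names `G_last` and `best_action` are hypothetical.

```python
import math

LAM, MU = 0.02, 0.01                     # hypothetical cost rates
RETURNS = [(0.5, 0.8), (0.5, 1.3)]       # hypothetical (probability, value) of R_N


def K(theta):
    """Total cost (21.6)."""
    return (1.0 + LAM) * theta if theta >= 0 else (1.0 - MU) * theta


def G_last(y, z):
    """E[log(y * R_N + z)]; -inf if some outcome gives non-positive wealth."""
    total = 0.0
    for p, r in RETURNS:
        wealth = y * r + z
        if wealth <= 0.0:
            return -math.inf
        total += p * math.log(wealth)
    return total


def best_action(y, z, step=0.01, n_steps=10000):
    """Grid search for a maximizer theta of G_last(y + theta, z - K(theta))."""
    best_theta, best_value = 0.0, -math.inf
    for k in range(-n_steps, n_steps + 1):
        theta = k * step
        value = G_last(y + theta, z - K(theta))
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value


for y, z in [(20.0, 80.0), (70.0, 30.0), (150.0, -30.0)]:
    theta, _ = best_action(y, z)
    y_bar, z_bar = y + theta, z - K(theta)
    print((y, z), round(theta, 2), round(y_bar / (y_bar + z_bar), 4))
```

Under these assumptions the first state leads to a purchase, the second to no trade, and the third to a sale; after trading, the risky fractions of the first and third state are approximately 0.49 and 1.01. Because the two outcomes of \(R_{N}\) are the endpoints of its range, returning `-inf` for non-positive wealth is the same as restricting θ to \(\mathcal{A}_{N-1}(y,z)\).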

Definition 21.3.8.

We call a line \(\{(y+\theta,z - (1-\mu )\theta )\,:\,\theta \in \mathbb{R}\}\) a sell-line and a line \(\{(y+\theta,z - (1+\lambda )\theta )\,:\,\theta \in \mathbb{R}\}\) a buy-line.

We can now state the main theorem on the structure of the optimal Markov policy.

Theorem 21.3.9.

For n = N − 1,…,1,0 we have

  1. a.

    There exist numbers \(-1/\lambda <a_{n} \leq b_{n} <1/\mu\) such that the following holds: There exists an optimal Markov policy {δ n } where {δ n } is defined by

    1. (i)

      δ n = 0 on the no-trading cone \(\mathcal{T}_{n}^{\mathrm{notr}}:=\{ (y,z) \in \mathcal{S}\,:\, a_{n} \leq \pi \leq b_{n}\}\),

    2. (ii)

      δ n (y,z) = θ < 0 on the sell cone \(\mathcal{T}_{n}^{\mathrm{sell}}:=\{ (y,z) \in \mathcal{S}\,;\,b_{n} <\pi <1/\mu \}\) such that (y + θ,z − (1 −μ)θ) is situated on the ray {(αb n ,α(1 − b n )) : α ≥ 0},

    3. (iii)

      δ n (y,z) = θ > 0 on the buy cone \(\mathcal{T}_{n}^{\mathrm{buy}}:=\{ (y,z) \in \mathcal{S}\,:\, -1/\lambda <\pi <a_{n}\}\) such that (y + θ,z − (1 + λ)θ) is situated on the ray {(αa n ,α(1 − a n )) : α ≥ 0}.

  2. b.

    G n (αy,αz) = log α + G n (y,z) for α > 0 and G n (y,z) is concave and isotone in each component.

  3. c.

    On the sell-line through (y,z), G n attains its maximum in a point (αb n ,α(1 − b n )) for some \(\alpha \in \mathbb{R}\) . On the buy-line through (y,z), G n attains its maximum in a point (αa n ,α(1 − a n )) for some \(\alpha \in \mathbb{R}\).

  4. d.

    The sell cone and the buy cone (and of course the no-trading cone) are not empty.

Condition (R3) is only used for part (d) in Theorem 21.3.9, but it will play an important role in Sects. 21.4 and 21.5. Now the theorem has the following interpretation. Selling can be interpreted as a walk on a sell-line in the (y, z)-plane. For (y, z) in the sell-cone, optimal selling then means to walk on a sell-line (starting in (y, z)) until one reaches the boundary of the no-trading-cone. The situation for the buy-cone is similar. \(\mathcal{T}_{n}^{\mathrm{notr}} \cup \{ 0\}\) is a closed convex cone and \(\mathcal{T}_{n}^{\mathrm{notr}}\) degenerates to the Merton-line if μ = λ = 0. In the present general case the boundaries of \(\mathcal{T}_{n}^{\mathrm{notr}}\) may be called the two Merton-lines. The proof of the theorem is given in Appendix 21.6. A similar result holds for the power utility function \(U_{\gamma }(w) =\gamma ^{-1}w^{\gamma }\), 0 ≠ γ < 1 (see Sass and Schäl [23]).
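Once the thresholds \(a_{n} \leq b_{n}\) are known, the policy of Theorem 21.3.9(a) reduces to a short formula: the trade θ is the point where the buy-line or sell-line through (y, z) meets the boundary ray of the no-trading cone. The Python sketch below spells this out; the thresholds in the example (0.3 and 0.6) are purely hypothetical inputs, not values computed in the text, and the function names are assumptions.

```python
def optimal_trade(y, z, a, b, lam, mu):
    """Trade theta of the cone policy for given thresholds a <= b.

    Buy cone (pi < a): solve (y + theta) = a * (y + z - lam * theta).
    Sell cone (pi > b): solve (y + theta) = b * (y + z + mu * theta).
    """
    pi = y / (y + z)
    if pi < a:
        return (a * (y + z) - y) / (1.0 + a * lam)
    if pi > b:
        return (b * (y + z) - y) / (1.0 - b * mu)
    return 0.0


def post_trade(y, z, theta, lam, mu):
    """Portfolio (y + theta, z - K(theta)) after the transaction theta."""
    cost = (1.0 + lam) * theta if theta >= 0 else (1.0 - mu) * theta
    return y + theta, z - cost


# Hypothetical thresholds and cost rates.
A_N, B_N_THRESHOLD, LAM, MU = 0.3, 0.6, 0.02, 0.01
for y, z in [(20.0, 80.0), (50.0, 50.0), (80.0, 20.0)]:
    theta = optimal_trade(y, z, A_N, B_N_THRESHOLD, LAM, MU)
    y_bar, z_bar = post_trade(y, z, theta, LAM, MU)
    print((y, z), theta, y_bar / (y_bar + z_bar))
```

The denominators \(1 + a_{n}\lambda\) and \(1 - b_{n}\mu\) are positive because \(-1/\lambda <a_{n} \leq b_{n} <1/\mu\), so a purchase is always positive and a sale always negative.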

4 Martingale Properties of the Optimal Markov Decision Process

Given the optimal policy \(\{\delta _{n}\}\) from Theorem 21.3.9, the initial value (y, z), and the sequence \(R_{n}(\omega )\), n ≥ 1, we can construct the state process \((Y _{n}(\omega ),Z_{n}(\omega ))\), n ≥ 0. In the sequel we will only consider this process \(\{(Y _{n},Z_{n}),n = 0,\ldots,N\}\) determined by the optimal policy. In this section we want to prove a martingale property of the optimal Markov decision process which is important for the financial application. In the model without transaction costs, \(\{(Y _{n} + Z_{n})^{-1}\}\) is a martingale. In the presence of transaction costs one has to modify \(Y _{n}\) by a factor \(\rho _{n}\) which is close to one if the transaction costs are small. Our main goal will be to prove that \(\{(\rho _{n}Y _{n} + Z_{n})^{-1}\}\) is then a martingale.

Besides the risky fraction Π n we will consider the risky fraction after trading \(\overline{\varPi }_{n}\) defined by

$$\displaystyle{ \overline{\varPi }_{n}:= \overline{Y }_{n}/(\overline{Y }_{n} + \overline{Z}_{n}). }$$
(21.15)

Further we introduce

$$\displaystyle{ \hat{\varPi }(\pi,r):= \frac{\pi r} {\pi r + 1-\pi }. }$$
(21.16)

Then we obtain from Theorem 21.3.9:

$$\displaystyle\begin{array}{rcl} \overline{\varPi }_{n}& =& 1_{\{\varPi _{n}\leq a_{n}\}}a_{n} + 1_{\{a_{n}<\varPi _{n}<b_{n}\}}\varPi _{n} + 1_{\{\varPi _{n}\geq b_{n}\}}b_{n} {}\end{array}$$
(21.17)
$$\displaystyle\begin{array}{rcl} \varPi _{n+1}& =& \overline{Y }_{n}R_{n+1}/(\overline{Y }_{n}R_{n+1} + \overline{Z}_{n}) =\hat{\varPi } (\overline{\varPi }_{n},R_{n+1}).{}\end{array}$$
(21.18)
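The two displayed recursions above say that the risky fraction is first projected onto the interval \([a_{n},b_{n}]\) and then transported by the return through (21.16). A minimal Python sketch of this dynamics follows; the thresholds and returns in the example are hypothetical, as are the function names.

```python
def pi_hat(pi, r):
    """(21.16): risky fraction after the stock account is multiplied by r."""
    return pi * r / (pi * r + 1.0 - pi)


def pi_after_trading(pi, a, b):
    """Risky fraction after trading: projection of pi onto [a, b]."""
    return min(max(pi, a), b)


def simulate_fractions(pi0, returns, thresholds):
    """Iterate the recursion; thresholds[n] = (a_n, b_n), returns[n] = R_{n+1}."""
    path = [pi0]
    for r, (a, b) in zip(returns, thresholds):
        path.append(pi_hat(pi_after_trading(path[-1], a, b), r))
    return path


# Hypothetical example: three periods with constant thresholds (0.3, 0.6).
path = simulate_fractions(0.2, [1.3, 0.8, 1.3], [(0.3, 0.6)] * 3)
print(path)
```

Only the risky fraction is tracked here; the absolute wealth level drops out, which reflects the cone structure of the policy.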

By the definition of (Y n , Z n ) above, we know that ( 21.11) becomes

$$\displaystyle{ G_{n}^{{\ast}}(y,z) = E[\log (Y _{ N} + Z_{N})\,\vert \,Y _{n} = y,Z_{n} = z]. }$$
(21.19)

Then we have \(G_{N-1}^{{\ast}}(y,z) = G_{N-1}(y,z)\) for (y, z) in the no-trading cone \(a_{N-1} \leq \pi \leq b_{N-1}\) where

$$\displaystyle{ G_{N-1}(y,z) = E[\log (y\,R_{N} + z)]. }$$
(21.20)

Definition 21.4.1.

We define \(H_{N}:= Y _{N} + Z_{N} =\rho _{N}Y _{N} + Z_{N}\), where \(\rho _{N}:= 1\), and for n = N − 1, …, 0

$$\displaystyle\begin{array}{rcl} \rho _{n}&:=& E[\rho _{n+1}R_{n+1}H_{n+1}^{-1}\,\vert \,\mathcal{F}_{ n}]/E[H_{n+1}^{-1}\,\vert \,\mathcal{F}_{ n}], {}\\ H_{n}&:=& \rho _{n}Y _{n} + Z_{n}. {}\\ \end{array}$$

Remark 21.4.2.

In Definition 21.4.1, ρ n is well-defined since H n+1 is positive and bounded away from zero given \((\overline{Y }_{n},\overline{Z}_{n}) = (y,z) \in \varGamma _{N}\) (and Γ, respectively).

Lemma 21.4.3.

One can write \(\rho _{n} =\hat{\rho } _{n}(\varPi _{n})\) for some function \(\hat{\rho }_{n}\) , i.e. ρ n depends on the history only through Π n.

  1. a.

    For a n ≤π ≤ b n

    $$\displaystyle{\hat{\rho }_{n}(\pi ) = E[\hat{\rho }_{n+1}(\hat{\varPi }(\pi,R_{n+1}))R_{n+1}H_{n+1}^{-1}]/E[H_{ n+1}^{-1}],}$$

    where \(H_{n+1} =\hat{\rho } _{n+1}(\hat{\varPi }(\pi,R_{n+1}))\pi \,R_{n+1} + 1-\pi\).

  2. b.

    For π ≤ a n we have \(\hat{\rho }_{n}(\pi ) =\hat{\rho } _{n}(a_{n})\).

  3. c.

    For π ≥ b n we have \(\hat{\rho }_{n}(\pi ) =\hat{\rho } _{n}(b_{n})\).

Proof.

For n = N we set \(\hat{\rho }_{N} = 1\). For the induction step n + 1 → n let \(\overline{\varPi }_{n} =\pi\) and \(\overline{Y }_{n} + \overline{Z}_{n} = x\) be fixed. Then \(\rho _{n} = E[\hat{\rho }_{n+1}(\hat{\varPi }(\pi,R_{n+1}))R_{n+1}H_{n+1}^{-1}\,\vert \,\mathcal{F}_{n}]/E[H_{n+1}^{-1}\,\vert \,\mathcal{F}_{n}]\), where \(H_{n+1} =\hat{\rho } _{n+1}(\varPi _{n+1})Y _{n+1} + Z_{n+1} = x\left (\hat{\rho }_{n+1}(\hat{\varPi }(\pi,R_{n+1}))\pi \,R_{n+1} + 1-\pi \right )\). Thus \(\rho _{n}\) is in fact a function of \(\overline{\varPi }_{n} =\pi\) and thus \(\hat{\rho }_{n}\) a function of \(\varPi _{n}\).

Now (b) and (c) follow in view of (21.17). □ 

Lemma 21.4.4.

\(\hat{\rho }_{n}\) is continuous.

Proof.

We know that \(\hat{\rho }_{N} \equiv 1\) is continuous. We will prove now that \(\hat{\rho }_{n}\) is continuous if \(\hat{\rho }_{n+1}\) is continuous. By Lemma 21.4.3(b), (c), \(\hat{\rho }_{n}\) is continuous for \(\pi \leq a_{n}\) and for \(\pi \geq b_{n}\). For \(a_{n} \leq \pi \leq b_{n}\) the statement follows from Lemma 21.4.3(a), since \(\hat{\varPi }(\pi,r)\) is continuous in π. □ 

Theorem 21.4.5.

  1. a.

    \(\{H_{n}^{-1},n = 0,\ldots,N\}\) is a martingale,

  2. b.

    \(1-\mu \leq \rho _{n} \leq 1+\lambda\), n = 0,…,N.

The proof is given in Appendix 21.6.
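Both statements of Theorem 21.4.5 can be checked numerically for the last stage. The Python sketch below is an illustration under assumptions: a two-point distribution of \(R_{N}\), assumed cost rates, and the thresholds \(a_{N-1}\), \(b_{N-1}\) taken as the roots of \(\hat{\rho }_{N-1}(a) = 1+\lambda\) and \(\hat{\rho }_{N-1}(b) = 1-\mu\). This characterization is derived for the sketch by differentiating \(E[\log (\cdot )]\) along a buy-line and a sell-line; it is not quoted from the text. All function names are hypothetical.

```python
LAM, MU = 0.02, 0.01                     # hypothetical cost rates
RETURNS = [(0.5, 0.8), (0.5, 1.3)]       # hypothetical (probability, value) of R_N


def rho_hat(pi):
    """E[R / H] / E[1 / H] with H = pi * R + 1 - pi, for a post-trade fraction pi."""
    num = sum(p * r / (pi * r + 1.0 - pi) for p, r in RETURNS)
    den = sum(p / (pi * r + 1.0 - pi) for p, r in RETURNS)
    return num / den


def solve_decreasing(f, target, lo, hi):
    """Bisection for f(pi) = target when f is decreasing on [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed first-order characterization of the thresholds at stage N - 1.
a = solve_decreasing(rho_hat, 1.0 + LAM, -3.0, 4.5)
b = solve_decreasing(rho_hat, 1.0 - MU, -3.0, 4.5)


def rho(pi):
    """rho_{N-1} as a function of the pre-trade fraction (constant outside [a, b])."""
    return rho_hat(min(max(pi, a), b))


def expected_inverse_wealth(y, z):
    """E[1 / (Y_N + Z_N)] when the cone policy with thresholds a, b is used at N - 1."""
    pi = y / (y + z)
    if pi < a:
        theta = (a * (y + z) - y) / (1.0 + a * LAM)
        y, z = y + theta, z - (1.0 + LAM) * theta
    elif pi > b:
        theta = (b * (y + z) - y) / (1.0 - b * MU)
        y, z = y + theta, z - (1.0 - MU) * theta
    return sum(p / (y * r + z) for p, r in RETURNS)


for y, z in [(20.0, 80.0), (70.0, 30.0), (150.0, -30.0)]:
    h_prev = rho(y / (y + z)) * y + z    # H_{N-1} = rho_{N-1} * Y_{N-1} + Z_{N-1}
    print((y, z), rho(y / (y + z)), h_prev * expected_inverse_wealth(y, z))
```

The last printed number is \(H_{N-1}\,E[H_{N}^{-1}]\), which equals 1 up to rounding in all three cones, and the printed values of ρ stay between 1 − μ and 1 + λ.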

5 Price Systems and the Numeraire Portfolio

Price Systems and Martingale Measures Q

In this section discount factors play an important role. The theory then seems to become more transparent if we write the discount factor \(B_{n}^{-1}\) explicitly. We are interested in an alternative probability measure Q with density \(q = dQ/dP\) w.r.t. P, where Q has the same null sets as P, i.e. Q and P are equivalent. Then we have

$$\displaystyle{ q> 0\mbox{ a.s. and }\quad E[q] = 1,\quad Q(A) =\int _{A}q\,dP\quad \mbox{ for }A \in \mathcal{F}. }$$
(21.21)

Now consider a contingent claim \((Y ^{C},Z^{C})\) maturing in N and split into a contingent claim \(Y ^{C}\) for the stock account and a contingent claim \(Z^{C}\) for the bank account. We want to find a price \(\mathop{\mathrm{pr}}\nolimits (Y ^{C},Z^{C})\) for \((Y ^{C},Z^{C})\) and will use the following approach (ansatz) if \((Y ^{C},Z^{C})\) is bounded or if \(Y ^{C} + Z^{C} \geq 0\):

$$\displaystyle{ \mathop{\mathrm{pr}}\nolimits (Y ^{C},Z^{C}) = E_{ Q}\left [B_{N}^{-1}(Y ^{C} + Z^{C})\right ] = E\left [q\,B_{ N}^{-1}(Y ^{C} + Z^{C})\right ]. }$$
(21.22)

Theorem 21.5.1.

\(\mathop{\mathrm{pr}}\nolimits (\,\cdot \,)\) as given by ( 21.22 ) defines a price system, i.e. one has \(\mathop{\mathrm{pr}}\nolimits (Y ^{C},Z^{C})> 0\) for any (Y C ,Z C ) with the properties

$$\displaystyle{ Y ^{C} + Z^{C} \geq 0\quad \mbox{ a.s.},\qquad P(Y ^{C} + Z^{C}> 0)> 0. }$$
(21.23)

The proof of Theorem 21.5.1 is given by Kusuoka [18] for finite probability spaces. There it is shown that the form ( 21.22) is also necessary for a consistent price system as defined in Theorem 21.5.3 below. See also Sass and Schäl [23]. We will write

$$\displaystyle{ q_{n}:= E[q\,\vert \,\mathcal{F}_{n}]. }$$
(21.24)

Then \(\{q_{n}\}\) is the density process and is a martingale under P by definition. Now we define \(\{\rho _{n}\}\) given \(q = q_{N}\), \(\rho _{N} = 1\). It will turn out that the process agrees with \(\{\rho _{n}\}\) as defined in Sect. 21.4.

Definition 21.5.2.

\(q_{n}\rho _{n}B_{n}^{-1}S_{n}:= E[q\,B_{N}^{-1}S_{N}\,\vert \,\mathcal{F}_{n}]\), (i.e. \(\rho _{n} = E_{Q}[R_{n+1}\cdots R_{N}\,\vert \,\mathcal{F}_{n}]\)).

The equation in parentheses follows from Bayes’ rule. Then \(\{q_{n}\rho _{n}B_{n}^{-1}S_{n}\}\) is a martingale under P by definition which also means, in view of Bayes’ rule, that \(\{\rho _{n}B_{n}^{-1}S_{n}\}\) is a martingale under Q. If there are no transaction costs, i.e. λ = μ = 0, we have under condition (21.25) below \(\rho _{n} = 1\), 1 ≤ n ≤ N. Then the discounted stock price process \(\{B_{n}^{-1}S_{n}\}\) forms a martingale under the probability measure Q with density q and density process \(\{q_{n}\}\). That is the reason for calling Q a martingale measure then.

Now we define the notion of a consistent price system and give a condition in terms of {ρ n }.

Theorem 21.5.3.

Assume for 1 ≤ n ≤ N

$$\displaystyle{ 1-\mu \leq \rho _{n} \leq 1 +\lambda. }$$
(21.25)

Then the price system \(\mathop{\mathrm{pr}}\nolimits (\,\cdot \,)\) is consistent , i.e.

$$\displaystyle\begin{array}{rcl} & & \mathop{\mathrm{pr}}\nolimits (Y ^{C},Z^{C}) = 1\quad \mbox{ for }\quad (Y ^{C},Z^{C}) = (0,B_{ N});{}\end{array}$$
(21.26)
$$\displaystyle\begin{array}{rcl} & & (1-\mu )S_{0} \leq \mathop{\mathrm{pr}}\nolimits (Y ^{C},Z^{C}) \leq (1+\lambda )S_{ 0}\quad \mbox{ for }\quad (Y ^{C},Z^{C}) = (S_{ N},0);{}\end{array}$$
(21.27)
$$\displaystyle\begin{array}{rcl} & & \mathop{\mathrm{pr}}\nolimits (Y ^{C},Z^{C}) \leq 0\quad \mbox{ for }\quad (Y ^{C},Z^{C}) = (Y _{ N},Z_{N}),{}\end{array}$$
(21.28)

where (Y N ,Z N ) is the terminal portfolio under an arbitrary admissible policy with start in (Y 0 ,Z 0 ) = (0,0).

Relation ( 21.26) is natural. If one starts with 1 unit of bond, then one can be sure to have B N on the bank account at N. Relation ( 21.27) is also natural. Let us only consider the case λ = μ = 0 without transaction costs. If one starts then with 1 unit of stock, then one can be sure to have S N on the stock account at N. Relation ( 21.28) excludes a sort of arbitrage opportunity. Starting with nothing one can never reach a portfolio with a positive price. The proof of Theorem 21.5.3 is given by Kusuoka [18] for finite probability spaces. There it is shown that ( 21.25) is also necessary for a consistent price system.

The Numeraire Portfolio

Now we can explain the main purpose of the paper in terms of this section. We study the following problem: Can we replace the discount factor \(B_{N}^{-1}\) by a more general one, \(H_{N}^{-1}\), where \(H_{N}\) is the terminal total wealth under some traded portfolio, and then keep to the original (physical) probability measure in place of Q? Thus we want to find an admissible policy with start in \((Y _{0},Z_{0})\) and with total wealth \(H_{N} = Y _{N} + Z_{N}\) at N such that \(E[q\,B_{N}^{-1}(Y ^{C} + Z^{C})] = E[H_{N}^{-1}(Y ^{C} + Z^{C})]\). Then we have to define q by

$$\displaystyle{ B_{N}^{-1}q = c\,(Y _{ N} + Z_{N})^{-1} = c\,H_{ N}^{-1},\quad c = E[H_{ N}^{-1}B_{ N}]^{-1}, }$$
(21.29)

where the case c = 1 is of particular interest.

From now on, we return to the setting where B n  ≡ 1.

Lemma 21.5.4.

The definition of {ρ n } in Sect.  21.4 agrees with Definition  21.5.2 and we have q n = c H n −1.

We will require that c = 1 in Corollary 21.1 below.

Proof.

Let (Y N , Z N ) be the portfolio at N under the optimal policy as in Sect. 21.4. Set H N : = Y N + Z N  = ρ N Y N + Z N , \(\rho _{n}:= E[\rho _{n+1}R_{n+1}H_{n+1}^{-1}\,\vert \,\mathcal{F}_{n}]/E[H_{n+1}^{-1}\,\vert \,\mathcal{F}_{n}]\) as in Definition 21.4.1 and define H n : = ρ n Y n + Z n , n < N. Then we can conclude from Theorem 21.4.5(a) that

$$\displaystyle{ \{H_{n}^{-1}\}\quad \mbox{ is a martingale.} }$$
(21.30)

Upon setting q = q N : = cH N −1 as above, we obtain \(q_{n} = E[c\,H_{N}^{-1}\,\vert \,\mathcal{F}_{n}] = c\,H_{n}^{-1}\) and \(\rho _{n}H_{n}^{-1} =\rho _{n}E[H_{n+1}^{-1}\,\vert \,\mathcal{F}_{n}] = E[\rho _{n+1}R_{n+1}H_{n+1}^{-1}\,\vert \,\mathcal{F}_{n}]\). This yields

$$\displaystyle\begin{array}{rcl} q_{n}\rho _{n}S_{n}& =& c\,H_{n}^{-1}\rho _{ n}S_{n} = c\,S_{n}\,E[\rho _{n+1}R_{n+1}H_{n+1}^{-1}\,\vert \,\mathcal{F}_{ n}] {}\\ & =& c\,E[\rho _{n+1}S_{n+1}H_{n+1}^{-1}\,\vert \,\mathcal{F}_{ n}] = E[q_{n+1}\rho _{n+1}S_{n+1}\,\vert \,\mathcal{F}_{n}]. {}\\ \end{array}$$

Thus {q n ρ n S n } is a martingale under P and the definition of ρ n in Sect. 21.4 agrees with Definition 21.5.2. □ 

Now we are allowed to apply Theorem 21.4.5(b) and we get condition (21.25). Hence Theorem 21.5.3 applies and we know that \(\mathop{\mathrm{pr}}\nolimits (Y ^{C},Z^{C}) = c\,E[H_{N}^{-1}(Y ^{C} + Z^{C})]\) is a consistent price system. For c we have 1 = E[q] = cE[H N −1] = cH 0 −1 by (21.30). Thus

$$\displaystyle{ c = H_{0} =\rho _{0}Y _{0} + Z_{0}. }$$
(21.31)

For models without transaction costs, one usually starts with one unit of money to get the discount factor. If we do the same in the present case, then we start with (Y 0, Z 0) = (0, 1) and thus with c = H 0 = 1. Thus we get the following corollary as the main result.

Corollary 21.1.

Let {(Y n ,Z n )} be generated by an optimal policy as in Sect.  21.4 . If we start with (Y 0 ,Z 0 ) = (0,1) or more generally with H 0 = ρ 0 Y 0 + Z 0 = 1, then a consistent price system is given by

$$\displaystyle{\mathop{\mathrm{pr}}\nolimits (Y ^{C},Z^{C}) = E[(Y _{ N} + Z_{N})^{-1}(Y ^{C} + Z^{C})].}$$

Definition 21.5.5.

In the situation of Corollary 21.1 we call the dynamic portfolio {(Y n , Z n )} a numeraire portfolio.
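
As a sanity check of Corollary 21.1 in the simplest special case, the following sketch considers a one-period binomial market without transaction costs (λ = μ = 0, B ≡ 1). The parameters `u`, `d`, `p` and all function names are illustrative assumptions and do not appear in the paper. In this toy case the log-optimal terminal wealth H N, started from (Y 0, Z 0) = (0, 1), should price the bond claim at 1 and the stock claim at S 0, using only the physical probability p.

```python
def log_optimal_wealth(u, d, p):
    """Terminal wealth (H_up, H_down) of the log-optimal portfolio.

    Assumed toy model (not from the paper): one period, bond B = 1,
    stock return R = u with probability p and R = d otherwise,
    d < 1 < u, no transaction costs. Initial wealth is 1 and theta
    is the amount of money invested in the stock.
    """
    if not (0.0 < d < 1.0 < u and 0.0 < p < 1.0):
        raise ValueError("need 0 < d < 1 < u and 0 < p < 1")
    a, b = u - 1.0, d - 1.0
    # First order condition E[(R - 1) / (1 + theta * (R - 1))] = 0,
    # solved in closed form for the two-point distribution.
    theta = -(p * a + (1.0 - p) * b) / (a * b)
    return 1.0 + theta * a, 1.0 + theta * b


def price(claim_up, claim_down, u, d, p):
    """pr(C) = E[H_N^{-1} C], expectation under the physical measure."""
    h_up, h_down = log_optimal_wealth(u, d, p)
    return p * claim_up / h_up + (1.0 - p) * claim_down / h_down


if __name__ == "__main__":
    u, d, p, s0 = 1.2, 0.9, 0.5, 100.0
    print(price(1.0, 1.0, u, d, p))                  # bond claim (0, B_N)
    print(price(s0 * u, s0 * d, u, d, p))            # stock claim (S_N, 0)
    print(price(max(s0 * u - 100.0, 0.0),
                max(s0 * d - 100.0, 0.0), u, d, p))  # call with strike 100
```

Because this toy market is complete and frictionless, the resulting prices coincide with the usual risk-neutral prices; the point of the sketch is only that no change of measure is needed once H N is used as discount factor.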

6 Conclusive Remarks

Extension 21.6.1.

A similar result can be derived for power utility \(U_{\gamma }(x) = x^{\gamma }/\gamma\) with \(U'_{\gamma }(w) = w^{\gamma -1}\) and \(U_{\gamma }^{{\ast}}(w) = U'_{\gamma }(w)\,w = w^{\gamma }\) for 0 ≠ γ < 1, where γ = 0 would correspond to the log-utility. When starting again with (Y 0, Z 0) = (0, 1), one obtains a consistent price system (see Sass and Schäl [23]) by

$$\displaystyle{ \mathop{\mathrm{pr}}\nolimits ^{\gamma }(Y ^{C},Z^{C}) = E[U_{\gamma }^{{\ast}}(Y _{ N} + Z_{N})]^{-1}E[U'_{\gamma }(Y _{ N} + Z_{N})(Y ^{C} + Z^{C})], }$$
(21.32)

where {(Y n , Z n )} now is the optimal dynamic portfolio for U γ . Then (R3) is to be replaced by \(E[(R_{n} -\underline{ R})^{\gamma -1}] = E[(\overline{R} - R_{n})^{\gamma -1}] = \infty\). Now (21.32) formally corresponds to formula (21.2), but Y N + Z N still depends on the transaction costs. On the one hand, the power utility allows one to work with a more general relative risk aversion 1 −γ of the investor. On the other hand, we have to work with a probability measure Q γ  ≠ P. In fact, we then have

$$\displaystyle{Q_{\gamma }(A) =\int _{A}q_{\gamma }\,dP,\quad A \in \mathcal{F},\quad \mbox{ and }\quad q_{\gamma } = E[U_{\gamma }^{{\ast}}(Y _{ N} + Z_{N})]^{-1}U'_{\gamma }(Y _{ N} + Z_{N})\tilde{B}_{N}}$$

if we decide for \(\tilde{B}_{N}^{-1}\) as discount factor. We can choose \(\tilde{B}_{N} = B_{N}\) or \(\tilde{B}_{N} = Y _{N} + Z_{N}\) or more generally \(\tilde{B}_{N} = Y _{N}^{0} + Z_{N}^{0}\), where {(Y n 0, Z n 0)} is the dynamic portfolio under any admissible policy {δ n 0}.
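
Formula (21.32) can be illustrated in the same spirit. The following is a minimal sketch under strong simplifying assumptions that are not made in the paper: one period, a two-point stock return, B ≡ 1, and no transaction costs. The names `u`, `d`, `p`, `power_optimal_wealth` and `price_gamma` are hypothetical. It computes the optimal terminal wealth for U γ in closed form and then evaluates pr γ(C) = E[U γ *(H N)] −1 E[U′ γ (H N) C].

```python
def power_optimal_wealth(u, d, p, gamma):
    """Terminal wealth (H_up, H_down) maximising E[H^gamma / gamma].

    Assumed toy model: one period, bond B = 1, stock return R in {u, d}
    with P(R = u) = p, d < 1 < u, no transaction costs, initial wealth 1.
    Requires gamma < 1; gamma = 0 reproduces the log-optimal wealth.
    """
    if not (0.0 < d < 1.0 < u and 0.0 < p < 1.0 and gamma < 1.0):
        raise ValueError("need 0 < d < 1 < u, 0 < p < 1 and gamma < 1")
    a, b = u - 1.0, d - 1.0
    # First order condition E[(R - 1) * H^(gamma - 1)] = 0 gives the
    # ratio m = H_up / H_down of the two terminal wealth values.
    k = -(1.0 - p) * b / (p * a)
    m = k ** (1.0 / (gamma - 1.0))
    theta = (m - 1.0) / (a - m * b)       # money invested in the stock
    return 1.0 + theta * a, 1.0 + theta * b


def price_gamma(claim_up, claim_down, u, d, p, gamma):
    """Price as in (21.32): E[H^gamma]^{-1} * E[H^(gamma - 1) * C]."""
    h_up, h_down = power_optimal_wealth(u, d, p, gamma)
    norm = p * h_up ** gamma + (1.0 - p) * h_down ** gamma
    value = (p * h_up ** (gamma - 1.0) * claim_up
             + (1.0 - p) * h_down ** (gamma - 1.0) * claim_down)
    return value / norm


if __name__ == "__main__":
    u, d, p, s0 = 1.2, 0.9, 0.5, 100.0
    for gamma in (-1.0, 0.5):
        print(gamma,
              price_gamma(1.0, 1.0, u, d, p, gamma),          # bond claim
              price_gamma(s0 * u, s0 * d, u, d, p, gamma))    # stock claim
```

In this frictionless toy case the bond and stock claims are priced at 1 and S 0 for every admissible γ, which is the consistency property of Theorem 21.5.3 with λ = μ = 0.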

Algorithm 21.6.2.

The pricing of financial derivatives under proportional transaction costs can now be done efficiently as follows. First, by backward induction one can numerically find the boundaries a N−1, …, a 0 and b N−1, …, b 0 of the no-trade region, which exist according to Theorem 21.3.9(c). Second, having computed these constants, the dynamic portfolio (Y n , Z n ), n = 0, …, N, under the optimal policy can be computed forward in time for any path of the stock prices. These computations are independent of the specific claims we want to price. For any financial derivative C = (Y C, Z C) we find a price according to Corollary 21.1. Since this price system is consistent, the resulting price does not lead to arbitrage. This price is preference based. Since it depends on the log-optimal portfolio, it corresponds to an investor with logarithmic utility, which has relative risk aversion 1. Different relative risk aversions 1 −γ > 0 can be covered by using power utility functions as in Extension 21.6.1. For these, too, the computation is efficient in the sense that the optimal policy can be computed first and then prices for any claim can be found by taking expectations as in (21.32).
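
The second step of this procedure and the subsequent pricing can be sketched as follows. This is only an illustration under assumptions that are not taken from the paper: the boundaries are assumed to be already available (the backward induction is not shown), the no-trade region is assumed to be expressed as a ≤ Y/(Y + Z) ≤ b for the stock fraction of total wealth, the bond satisfies B n ≡ 1, and the expectation in Corollary 21.1 is replaced by an average over simulated paths. All function names and the numerical boundaries are hypothetical placeholders.

```python
import random


def rebalance(y, z, a, b, lam, mu):
    """Move (y, z) to the nearest boundary of the no-trade region.

    Assumption: the region is {a <= y / (y + z) <= b}, where y is the
    stock and z the bond holding. Buying stock worth delta costs
    (1 + lam) * delta; selling stock worth delta yields (1 - mu) * delta.
    """
    frac = y / (y + z)
    if frac < a:                      # buy until the fraction equals a
        delta = (a * (y + z) - y) / (1.0 + a * lam)
        return y + delta, z - (1.0 + lam) * delta
    if frac > b:                      # sell until the fraction equals b
        delta = (y - b * (y + z)) / (1.0 - b * mu)
        return y - delta, z + (1.0 - mu) * delta
    return y, z                       # inside the region: no trade


def terminal_wealth(returns, lower, upper, lam, mu, y0=0.0, z0=1.0):
    """Forward pass: H_N = Y_N + Z_N along one path of stock returns."""
    y, z = y0, z0
    for r, a, b in zip(returns, lower, upper):
        y, z = rebalance(y, z, a, b, lam, mu)
        y *= r                        # stock grows by R_{n+1}, bond stays
    return y + z


def mc_price(payoff, paths, lower, upper, lam, mu):
    """Average of H_N^{-1} * (Y^C + Z^C) over the given return paths."""
    total = 0.0
    for returns in paths:
        h = terminal_wealth(returns, lower, upper, lam, mu)
        total += payoff(returns) / h
    return total / len(paths)


if __name__ == "__main__":
    rng = random.Random(0)
    n_steps, s0, strike = 3, 100.0, 100.0
    paths = [[rng.choice((1.2, 0.9)) for _ in range(n_steps)]
             for _ in range(10_000)]
    # Placeholder boundaries, NOT the optimal ones from Theorem 21.3.9(c).
    lower, upper = [0.4] * n_steps, [0.6] * n_steps

    def call(returns):
        s = s0
        for r in returns:
            s *= r
        return max(s - strike, 0.0)

    print(mc_price(call, paths, lower, upper, lam=0.01, mu=0.01))
```

The design mirrors the remark above: `terminal_wealth` depends only on the stock path and the boundaries, so it can be evaluated once per path and reused for every claim. With placeholder boundaries the output is not a consistent price; consistency requires the optimal boundaries.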

The formulation of a utility optimization problem in discrete time 0 ≤ n ≤ N for a financial market as a Markov decision model is now classical. This is also true for models with transaction costs (see Kamin [12], Constantinides [5]). However, we add some new features. In particular, we use the first order condition of the optimal action as for (21.2). For that argument, it is necessary that the optimal action lies in the interior of the action space, which is guaranteed by working with open action spaces. In fact, the first order condition leads to the martingale property in Theorem 21.4.5(a).

In Lemma 21.5.4, {H n −1} is identified as the density process {q n } and we see that the martingale property for {H n −1} must necessarily hold. Moreover this property is also used in Lemma 21.5.4 to show that {H n −1 ρ n S n } is a martingale as well.

The paper treats a financial model with one stock (and one bond). But models with d stocks (d > 1) and transaction costs play an important role, and one can ask for extensions of the present results to models with several stocks. Numerical results show that for d > 1 the structure of the optimal policy may be complicated. Without knowing the structure of the optimal policy, one can however prove by use of the methods of Kallsen and Muhle-Karbe [11] that the main result remains true for models where the underlying probability space is finite. In fact, for such models the optimal policy defines a dynamic portfolio which is a numeraire portfolio. It seems to be unknown whether this extends to infinite probability spaces.