1 Introduction

Markowitz’s mean-variance model (see [10]) initiated the well-known return-risk asset allocation framework. An investor who adopts a mean-variance criterion seeks the investment policy that maximizes the expected value of the terminal wealth (i.e., return) while minimizing the variance of the terminal wealth (i.e., risk), which can be formulated as follows,

$$\begin{aligned} (MV(\gamma )):&\quad \min \quad \text{ Var }(X_{1}|X_{0})-\gamma \mathbb {E}[X_{1}|X_{0}],\\ (MV(\omega )):&\quad \max \quad \mathbb {E}[X_{1}|X_{0}]-\omega \text{ Var }(X_{1}|X_{0}), \end{aligned}$$

where \(X_{0}\) is the initial wealth level, \(X_1\) is the terminal wealth level, and \(\gamma \ge 0\) and \(\omega \ge 0\) are the trade-off parameters between the two conflicting objectives. We call \(\gamma \) and \(\omega \) risk aversion parameters, since they represent the risk aversion attitude of the investor. When \(\gamma =0\) or \(\omega =+\infty \), the investor is totally risk averse.

Besides constant risk aversion parameters, several state-dependent risk aversion parameters have been proposed. Björk et al. [4] and Wu [14] proposed, in a continuous time setting and a multi-period setting respectively, that the risk aversion parameter \(\omega \) takes a fractional form of the current wealth level \(X_{t}\),

$$\begin{aligned} \omega (X_{t})=\frac{\omega }{X_{t}}, \quad (\omega \ge 0). \end{aligned}$$

Hu et al. [7] proposed that the risk aversion parameter \(\gamma \) is a linear function of the current wealth level \(X_{t}\),

$$\begin{aligned} \gamma (X_{t})=\mu _{1}X_{t}+\mu _{2}, \quad (\mu _{1}\ge 0), \end{aligned}$$

in continuous time. Cui et al. [6] proposed an extended piecewise linear risk aversion parameter in a multi-period setting as follows,

$$\begin{aligned} \gamma (X_{t})= {\left\{ \begin{array}{ll} \gamma ^{+}(X_{t}-\rho _{t}^{-1}W), &{} \text {if } X_{t}\ge \rho _{t}^{-1}W, \\ -\gamma ^{-}(X_{t}-\rho _{t}^{-1}W), &{} \text {if } X_{t}< \rho _{t}^{-1}W, \end{array}\right. }\quad (\gamma ^{+}\ge 0,~ \gamma ^{-}\ge 0), \end{aligned}$$

where \(\rho _t^{-1}\) is the riskless discount factor and W is the preset investment target. For Björk et al.’s setting, due to the positiveness of controlled wealth level \(X_t\) in continuous time, the investor is always risk averse and the higher the wealth level, the lower the risk aversion. In Wu’s setting, when the wealth level is negative, \(\omega (X_{t})\) is negative, i.e., the investor is risk seeking and tries to maximize both the expected value and the variance of the terminal wealth. In Hu et al.’s setting, when the wealth level is less than \(-\mu _{2}/\mu _{1}\), \(\gamma (X_{t})\) becomes negative, i.e., the investor tries to minimize the variance as well as the expected value of the terminal wealth. In Cui et al.’s setting, the investor is always risk averse and may have different views with respect to the difference between current wealth level and discounted preset investment target. In this paper, we solve the continuous time mean-variance portfolio optimization problem with Cui et al. [6]’s piecewise linear risk aversion parameter.
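The sign behaviour of the three specifications discussed above can be illustrated with a minimal Python sketch. The function names and the numerical values are hypothetical, chosen only for illustration, and the discount factor is assumed to be the continuous time one, \(\rho _t^{-1}=e^{-r(T-t)}\), as defined in Sect. 2.

```python
import math

def omega_fractional(x, omega):
    """Fractional form omega(x) = omega / x (undefined at x = 0)."""
    return omega / x

def gamma_linear(x, mu1, mu2):
    """Linear form gamma(x) = mu1 * x + mu2."""
    return mu1 * x + mu2

def gamma_piecewise(x, t, T, r, W, gamma_plus, gamma_minus):
    """Piecewise linear form around the discounted target exp(-r (T - t)) * W."""
    gap = x - math.exp(-r * (T - t)) * W
    return gamma_plus * gap if gap >= 0 else -gamma_minus * gap

if __name__ == "__main__":
    # Illustrative wealth levels; parameters are assumptions, not from the paper.
    for x in (-2.0, 0.5, 3.0):
        print(x,
              omega_fractional(x, 1.0),
              gamma_linear(x, 1.0, 1.0),
              gamma_piecewise(x, 0.0, 1.0, 0.05, 1.0, 0.5, 2.0))
```

At a negative wealth level such as x = -2, the fractional and linear forms return negative values, while the piecewise linear form stays nonnegative whenever \(\gamma ^{+}\ge 0\) and \(\gamma ^{-}\ge 0\).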

Consider the dynamic mean-variance portfolio optimization problem at time 0,

$$\begin{aligned} (MV_{0}(\gamma (X_{0}))): \quad \min _{\{u_t\}_{t\in [0,T]}} \quad \text{ Var }(X_{T}|X_{0})-\gamma (X_0)\mathbb {E}[X_{T}|X_{0}], \end{aligned}$$

where \(u_t\) is the investment policy at time t and \(X_T\) is the terminal wealth level. At time 0, although \(\gamma (X_0)\) is a known constant, the problem is hard to solve because of the non-smoothing property of the variance term, i.e., \(\text{ Var }(X_T|X_0)\ne \text{ Var }(\text{ Var }(X_T|X_t)|X_0)\). Li and Ng [9] and Zhou and Li [15] adopted the embedding scheme to derive the optimal mean-variance policy under a multi-period setting and a continuous time setting, respectively. We call the optimal mean-variance policy determined at time 0 the pre-committed optimal mean-variance policy. However, for \(t>0\), the investor faces a truncated mean-variance portfolio optimization problem,

$$\begin{aligned} (MV_{t}(\gamma (X_{t}))): \quad \min _{\{u_s\}_{s\in [t,T]}} \quad \text{ Var }(X_{T}|X_{t})-\gamma (X_t)\mathbb {E}[X_{T}|X_{t}], \end{aligned}$$

whose short term optimal mean-variance policy is different from the pre-committed optimal mean-variance policy derived at time 0 in general (see Basak and Chabakauri [2], Cui et al. [5], Wang and Forsyth [13]). This phenomenon is called time inconsistency. In the language of dynamic programming, Bellman’s principle of optimality is not applicable in this dynamical return-risk portfolio selection model, as the global and local objectives are not consistent (see Artzner et al. [1], Cui et al. [5]). In the fields of dynamic risk measures and dynamic risk management, time consistency is a basic requirement of dynamic risk measures (see Rosazza Gianin [11], Artzner et al. [1] and Jobert and Rogers [8]).
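For reference, the non-smoothing property of the variance can be made explicit through the law of total variance. This is a standard identity, stated here as a sketch with \(X_0\) deterministic, and it is not specific to this model:

$$\begin{aligned} \text{ Var }(X_{T}|X_{0})=\mathbb {E}[\text{ Var }(X_{T}|X_{t})|X_{0}]+\text{ Var }(\mathbb {E}[X_{T}|X_{t}]|X_{0}),\quad 0\le t\le T. \end{aligned}$$

Because of the second term, the variance at time 0 is not simply the expected value of the conditional variance at time t. The time 0 objective therefore cannot be written as a conditional expectation of the time t objective, which is the structure that Bellman’s principle of optimality relies on.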

How can this inconsistency be resolved? Basak and Chabakauri [2] extended Strotz [12]’s proposal of a strategy of consistent planning and reformulated the dynamic mean-variance model as an intrapersonal game in which the investor optimally chooses the policy at any time t, on the premise that he or she has already decided his or her time-consistent policies in the future. For example, in a multi-period setting, the investor at time t faces the following nested portfolio selection problem

$$\begin{aligned} (NMV_{t}(\gamma (X_t))): \quad \min _{u_{t}}&\quad \text{ Var }(X_{T}|X_{t})-\gamma (X_t)\mathbb {E}[X_{T}|X_{t}],\\ \text{ s.t. }&\quad u_{j} \text{ solves } (NMV_{j}(\gamma (X_j))), \quad t< j \le T-1, \end{aligned}$$

with terminal period problem given as

$$\begin{aligned} (NMV_{T-1}(\gamma (X_{T-1}))): \quad \min _{u_{T-1}} \quad \text{ Var }(X_{T}|X_{T-1})-\gamma (X_{T-1})\mathbb {E}[X_{T}|X_{T-1}]. \end{aligned}$$

The subgame Nash equilibrium solution of the nested problem is called the time consistent mean-variance policy, and it can be derived by backward induction. Basak and Chabakauri [2] assumed that the investor has a constant risk aversion during the investment period. Björk et al. [4], Hu et al. [7], Wu [14] and Cui et al. [6] extended Basak and Chabakauri’s results by studying different state-dependent risk aversion parameters. For general time inconsistent control problems, Björk and Murgoci [3] proposed the time inconsistent stochastic control framework to derive the time consistent control.

In this paper, we adopt the time inconsistent stochastic control framework to derive the time consistent mean-variance policy (the subgame Nash equilibrium policy) for the continuous time mean-variance portfolio selection problem with the piecewise linear risk aversion parameter defined above. Unlike in the multi-period setting, we face a Hamilton–Jacobi–Bellman (HJB) system of equations with nonsmooth coefficients, which may not admit a classical solution. To overcome this difficulty, we first establish some important properties of the time consistent mean-variance policy and then solve the HJB system of equations in two separate domains. Combining the solutions in the two domains yields the time consistent mean-variance policy. In Sect. 2, we formulate the continuous time mean-variance model with piecewise linear risk aversion. In Sect. 3, we obtain the explicit time consistent mean-variance policy by solving the extended HJB system.

2 Portfolio optimization formulation

Our market setting is a standard Black–Scholes model, which includes a risky asset (such as a stock) and a riskless asset (such as a bank account). Denoting the stock price by \(S_t\) and the bank account by \(B_t\), the dynamics of \(S_t\) and \(B_t\) are as follows,

$$\begin{aligned}&\left\{ \begin{array}{l} dS_{t} = \mu S_{t}dt+\sigma S_{t}dW_{t},\\ S_0=s_0, \end{array} \right. \\&\left\{ \begin{array}{l} dB_{t}= rB_{t}dt,\\ B_0=b_0, \end{array} \right. \end{aligned}$$

where \(r>0\) is the interest rate of the bank account, \(\mu \) is the appreciation rate of the stock, \(\sigma >0\) is the volatility (dispersion rate) of the stock and \(W_t\) is a standard Brownian motion defined on a filtered probability space \((\varOmega , \mathcal {F}_T, \{\mathcal {F}_t\}_{t\in [0,T]},P)\). We assume that r, \(\mu \), \(\sigma \) are constants. In the analysis below, we study self-financing portfolios (without consumption) consisting of the risky stock and the bank account. Denoting the dollar value invested in the risky asset at time t by \(u_{t}\), the value of the portfolio at time t, \(X_t^u\), is given by

$$\begin{aligned}&\left\{ \begin{array}{l} dX_{t}^u= [rX_{t}^u+(\mu -r)u_{t}]dt+\sigma u_{t}dW_{t},\\ X_0^u=x_0, \end{array} \right. \end{aligned}$$
(1)

where \(x_0\) is the initial wealth level.
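As an illustration of the wealth dynamics (1), the following is a minimal Euler-Maruyama sketch in Python. It assumes a constant dollar position u, which is a simplification made only for this example and is not the policy derived in this paper. The function names and parameter values are hypothetical. For constant u the mean of \(X_T^u\) solves a linear ordinary differential equation, which gives the closed form used as a check.

```python
import math
import random

def simulate_wealth(x0, u, r, mu, sigma, T, n_steps, rng):
    """Euler-Maruyama path of dX = [r X + (mu - r) u] dt + sigma u dW
    for a constant dollar amount u held in the stock (illustrative policy)."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        x += (r * x + (mu - r) * u) * dt + sigma * u * dw
    return x

def mean_terminal_wealth(x0, u, r, mu, T):
    """Closed-form E[X_T] for constant u and r > 0:
    m'(t) = r m(t) + (mu - r) u with m(0) = x0."""
    growth = math.exp(r * T)
    return x0 * growth + (mu - r) * u * (growth - 1.0) / r

if __name__ == "__main__":
    rng = random.Random(0)
    paths = [simulate_wealth(1.0, 0.5, 0.03, 0.08, 0.2, 1.0, 100, rng)
             for _ in range(2000)]
    # The sample mean should be close to the closed-form mean.
    print(sum(paths) / len(paths), mean_terminal_wealth(1.0, 0.5, 0.03, 0.08, 1.0))
```

With \(\sigma =0\) the scheme is deterministic and converges to the closed-form mean as the step size shrinks, which gives a simple sanity check of the drift term.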

At time t, the investor faces a mean-variance portfolio selection problem,

$$\begin{aligned} (MV_t(\gamma (x))):\quad \min _{\{u_s\}_{s\in [t,T]}}&\text{ Var }_{t,x}(X_{T}^u)-\gamma (x)\mathbb {E}_{t,x}[X_T^u]\\ \text{ s.t. }&\left\{ \begin{array}{l} dX_{s}^u= [rX_{s}^u+(\mu -r)u_{s}]ds+\sigma u_{s}dW_{s},\\ X_t^u=x, \end{array} \right. \end{aligned}$$

where \(\text{ Var }_{t,x}(X_{T}^u)=\text{ Var }(X_{T}^u|X_t^u=x)\), \(\mathbb {E}_{t,x}[X_T^u]=\mathbb {E}[X_T^u|X_t^u=x]\). The risk aversion parameter of the investor is assumed to be

$$\begin{aligned} \gamma (x)= {\left\{ \begin{array}{ll} \gamma ^{+}(x-\rho _{t}^{-1}W), &{} \text {if } x\ge \rho _{t}^{-1}W,\\ -\gamma ^{-}(x-\rho _{t}^{-1}W), &{} \text {if } x< \rho _{t}^{-1}W, \end{array}\right. } \end{aligned}$$

where W is the investment target set by the investor, \(\rho _{t}^{-1}=e^{-r(T-t)}\) is the riskless discount factor from current time t to terminal time T, and \(\gamma ^{+}\ne 0\) and \(\gamma ^{-}\ne 0\) are the constant risk aversion coefficients. Different signs of \(\gamma ^{+}\) and \(\gamma ^{-}\) correspond to different risk attitudes of the investor. In the case of \(\gamma ^{+}> 0\) and \(\gamma ^{-}>0\), if the current wealth level is larger than the discounted investment target, the investor becomes less risk averse as the current wealth level increases; if the current wealth level is less than the discounted investment target, the investor becomes less risk averse as the current wealth level decreases.

Setting \(Y_{t}^u=X_{t}^u-\rho _{t}^{-1}W\), we have

$$\begin{aligned} dY_{t}^u&= dX_{t}^u-r\rho _{t}^{-1}Wdt\\&= [r(X_{t}^u-\rho _{t}^{-1}W)+(\mu -r)u_{t}]dt+\sigma u_{t}dW_{t}\\&= [rY_{t}^u+(\mu -r)u_{t}]dt+\sigma u_{t}dW_{t}. \end{aligned}$$

Furthermore,

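$$\begin{aligned} \text{ Var }_{t,x}(X_{T}^u)&=\text{ Var }_{t,y}(Y_T^u),\\ \mathbb {E}_{t,x}[X_T^u]&=\mathbb {E}_{t,y}[Y_T^u]+W, \end{aligned}$$

with \(y=x-\rho _t^{-1}W\), since \(\rho _T^{-1}=1\) gives \(Y_T^u=X_T^u-W\). The term \(-\gamma (x)W\) that this substitution adds to the objective does not depend on the policy, so it can be dropped. Thus, problem \((MV_t(\gamma (x)))\) is equivalent to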

$$\begin{aligned} (MV_t(\gamma (y))):\quad \min _{\{u_s\}_{s\in [t,T]}}&\text{ Var }_{t,y}(Y_{T}^{u})-\gamma (y)\mathbb {E}_{t,y}[Y_{T}^{u}]\nonumber \\ \text{ s.t. }&\left\{ \begin{array}{l} dY_{s}^u= [rY_{s}^u+(\mu -r)u_{s}]ds+\sigma u_{s}dW_{s},\\ Y_t^u=y, \end{array} \right. \end{aligned}$$
(2)

where

$$\begin{aligned} \gamma (y)= {\left\{ \begin{array}{ll} \gamma ^{+}y, &{} \text {if } y \ge 0, \\ -\gamma ^{-}y, &{} \text {if } y < 0. \end{array}\right. } \end{aligned}$$
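The change of variable \(Y_{t}^u=X_{t}^u-\rho _{t}^{-1}W\) can be checked numerically with a short Python sketch. It simulates the wealth and the shifted state under the same policy and the same Brownian increments. Because \(\rho _{T}^{-1}=1\), the two terminal values should differ by W, up to discretization error. The function name, the feedback rule and the parameter values are hypothetical and serve only as an illustration.

```python
import math
import random

def simulate_pair(x0, W, r, mu, sigma, T, n_steps, policy, rng):
    """Euler paths of X (wealth) and Y = X - exp(-r (T - t)) W, driven by the
    same dollar position u_t = policy(t, y) and the same Brownian increments."""
    dt = T / n_steps
    x = x0
    y = x0 - math.exp(-r * T) * W  # Y_0 = X_0 - rho_0^{-1} W
    for i in range(n_steps):
        u = policy(i * dt, y)
        dw = rng.gauss(0.0, math.sqrt(dt))
        x += (r * x + (mu - r) * u) * dt + sigma * u * dw
        y += (r * y + (mu - r) * u) * dt + sigma * u * dw
    return x, y

if __name__ == "__main__":
    # Hypothetical feedback rule, chosen only to exercise the code.
    x_T, y_T = simulate_pair(1.0, 1.2, 0.03, 0.08, 0.2, 1.0, 2000,
                             lambda t, y: 0.5 * abs(y), random.Random(0))
    print(x_T - y_T)  # close to W = 1.2, up to discretization error
```

The check does not depend on the policy used, since the position u enters both dynamics in the same way.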

3 Time consistent mean-variance policy

We define the objective function as

$$\begin{aligned} J(t,y,u)=\text{ Var }_{t,y}(Y_{T}^{u})-\gamma (y)\mathbb {E}_{t,y}[Y_{T}^{u}]=\mathbb {E}_{t,y}[F(y,Y_T^u)]+G(y,\mathbb {E}_{t,y}[Y_{T}^{u}]), \end{aligned}$$
(3)

where \(F(y,z)=z^2-\gamma (y)z\) and \(G(y,z)=-z^2\). We seek the time consistent investment policy \(\hat{u}\) (i.e., the subgame Nash equilibrium policy) for \(J(t,y,u)\), which is rigorously defined in [4] and restated in the following definition.
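The decomposition in (3) rests on the identity \(\text{ Var }(Z)=\mathbb {E}[Z^2]-(\mathbb {E}[Z])^2\). It can be checked on any sample with a minimal Python sketch. The function names and the sample distribution are hypothetical, and `gamma_y` stands for the fixed number \(\gamma (y)\).

```python
import random
import statistics

def objective_direct(samples, gamma_y):
    """Var(Y_T) - gamma(y) E[Y_T], using the population variance of the sample."""
    return statistics.pvariance(samples) - gamma_y * statistics.fmean(samples)

def objective_decomposed(samples, gamma_y):
    """E[F(y, Y_T)] + G(y, E[Y_T]) with F(y, z) = z^2 - gamma(y) z and G(y, z) = -z^2."""
    mean_F = statistics.fmean([z * z - gamma_y * z for z in samples])
    mean_z = statistics.fmean(samples)
    return mean_F - mean_z ** 2

if __name__ == "__main__":
    rng = random.Random(0)
    ys = [rng.gauss(1.0, 0.5) for _ in range(1000)]  # illustrative sample of Y_T
    # The two values agree up to floating-point rounding.
    print(objective_direct(ys, 0.7), objective_decomposed(ys, 0.7))
```

The term \(G(y,\mathbb {E}_{t,y}[Y_T^u])\) is a nonlinear function of an expectation, which is why the problem falls outside the standard dynamic programming setting.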

Definition 1

(Björk et al. [4]) Given a policy \(\hat{u}\), construct a policy \(u_h\) by

$$\begin{aligned} u_h(s,y)=\left\{ \begin{array}{ll} u, &{}\quad \text{ for } t\le s<t+h,\\ \hat{u}(s,y), &{}\quad \text{ for } t+h\le s\le T, \end{array}\right. \end{aligned}$$

where \(u\in \mathbb {R}\), \(h>0\) and \((t,y)\in [0,T]\times \mathbb {R}\) are arbitrarily chosen. If

$$\begin{aligned} \liminf _{h\rightarrow 0}\frac{J(t,y,u_h)-J(t,y,\hat{u})}{h}\ge 0 \end{aligned}$$

for all \(u\in \mathbb {R}\) and \((t,y)\in [0,T]\times \mathbb {R}\), we say that \(\hat{u}\) is an equilibrium policy. The equilibrium value function V is defined by

$$\begin{aligned} V(t,y)=J(t,y,\hat{u}). \end{aligned}$$

Following the time inconsistent stochastic control approach proposed in Björk and Murgoci [3] and Björk et al. [4], the extended HJB system of equations for the subgame Nash equilibrium problem takes the following form:

$$\begin{aligned}&\displaystyle \inf _{u\in \mathbb {R}}\{ (\mathbf {A}^uV)(t,y)-(\mathbf {A}^uf)(t,y,y)+(\mathbf {A}^uf^y)(t,y)\\&\displaystyle \quad -\mathbf {A}^u(G\diamond g)(t,y)+(\mathbf {H}^ug)(t,y)\}=0,\quad 0\le t\le T, \\&\displaystyle \mathbf {A}^{\hat{u}}f^{z}(t,y)=0,\quad 0\le t\le T, \\&\displaystyle \mathbf {A}^{\hat{u}}g(t,y)=0,\quad 0\le t\le T, \\&\displaystyle V(T,y)=F(y,y)+G(y,y),\\&\displaystyle f(T,y,z)=F(z,y),\\&\displaystyle g(T,y)=y, \end{aligned}$$

where the infinitesimal operator \(\mathbf {A}^u\) and the notations \(f^z\), \(G\diamond g\), \(\mathbf {H}^u g\) are defined by

$$\begin{aligned}&\displaystyle \mathbf {A}^u=\frac{\partial }{\partial t}+[ry+(\mu -r)u]\frac{\partial }{\partial y} +\frac{1}{2}\sigma ^2u^2 \frac{\partial ^2}{\partial y^2},\\&\displaystyle f^z(t,y)=f(t,y,z),\\&\displaystyle (G\diamond g)(t,y)=G(y,g(t,y)),\\&\displaystyle \mathbf {H}^u g(t,y)=\frac{\partial G}{\partial z}(y,g(t,y))\cdot \mathbf {A}^u g(t,y). \end{aligned}$$

Given \(F(y,z)=z^2-\gamma (y)z\) and \(G(y,z)=-z^2\), the HJB system of equations reduces to the following,

$$\begin{aligned}&\displaystyle V_{t}(t,y) + \inf _{u\in \mathbb {R}}\Big \{[ry+(\mu -r)u](V_{y}(t,y)-f_{z}(t,y,y)) \nonumber \\&\displaystyle \quad +\frac{1}{2}\sigma ^{2}u^{2}(V_{yy}(t,y)-f_{zz}(t,y,y)\nonumber \\&\displaystyle \quad -2f_{yz}(t,y,y)+2(g_{y}(t,y))^{2})\Big \}=0,\quad 0\le t\le T, \end{aligned}$$
(4)
$$\begin{aligned}&\displaystyle f_{t}(t,y,z)+[ry+(\mu -r)\hat{u}]f_{y}(t,y,z)+ \frac{1}{2}\sigma ^{2}\hat{u}^{2}f_{yy}(t,y,z) =0,\quad 0\le t\le T, \nonumber \\ \end{aligned}$$
(5)
$$\begin{aligned}&\displaystyle g_{t}(t,y)+[ry+(\mu -r)\hat{u}]g_{y}(t,y)+\frac{1}{2}\sigma ^{2}\hat{u}^{2}g_{yy}(t,y) =0,\quad 0\le t\le T, \end{aligned}$$
(6)
$$\begin{aligned}&\displaystyle V(T,y) =-\gamma (y)y,\end{aligned}$$
(7)
$$\begin{aligned}&\displaystyle f(T,y,z) =y^2-\gamma (z)y,\end{aligned}$$
(8)
$$\begin{aligned}&\displaystyle g(T,y) =y, \end{aligned}$$
(9)

where \(V_t\), \(V_y\), \(V_{yy}\), \(f_z\), \(f_{zz}\), \(f_{yz}\), \(g_y\) are the corresponding partial derivatives of \(V(t,y)\), \(f(t,y,z)\) and \(g(t,y)\).

However, as the risk aversion parameter \(\gamma (y)\) is a nonsmooth function, the HJB system (4)–(9) may not admit a classical solution. To overcome this difficulty, we first investigate the properties of the time consistent mean-variance policy.

Proposition 1

The time consistent mean-variance policy has the following properties:

  1. Whenever the state \(Y_t^u=0\), the time consistent mean-variance policy over \([t,T]\) is

    $$\begin{aligned} \hat{u}(s, Y_s^{\hat{u}}=0)=0,\quad \text{ for }\quad t\le s\le T. \end{aligned}$$
  2. When \(Y_0^u\le 0~(\text{or } \ge 0)\), the state at time t along the time consistent mean-variance policy is always nonpositive (or nonnegative), i.e., \(Y_t^{\hat{u}}\le 0~(\text{or } \ge 0)\).

Proof

When the state \(Y_s^{\hat{u}}=0\) at a future time s (\(t\le s\le T\)), we have \(\gamma (0)=0\) and the investor becomes totally risk averse. The objective function of the investor at time s then reduces to the variance of the terminal state, which attains its minimum value 0 when all the wealth is invested in the riskless asset over the time interval \([s,T]\). Moreover, along the proposed policy \(\hat{u}(s,Y_s^{\hat{u}})=0\) (i.e., investing all the wealth in the riskless asset), the state remains at \(Y_s^{\hat{u}}=0\) for \(t\le s\le T\). Therefore, the investor remains totally risk averse at all future time instants and keeps investing all the wealth in the riskless asset, so the decisions at all future time instants are consistent and the subgame Nash equilibrium policy is to invest all the wealth in the riskless asset.

For the second property, as the state process \(\{Y_t^{\hat{u}}\}_{t\in [0,T]}\) has continuous paths, \(Y_t^{\hat{u}}\) must reach zero before changing its sign, and it then remains at zero by the first property. \(\square \)

With the help of Proposition 1, we can solve the HJB system (4)–(9) separately in the two domains \(\{(t,y)|~y\ge 0\}\) and \(\{(t,y)|~y\le 0\}\). In each domain, the risk aversion parameter \(\gamma (\cdot )\) is a smooth function. Our main result is given in the following theorem.

Theorem 1

Under piecewise linear state-dependent risk aversion framework, the time consistent mean-variance policy is given by

$$\begin{aligned} \hat{u}(t,y)= {\left\{ \begin{array}{ll} -k^{+}(t)y, &{} \text {if } y \ge 0, \\ -k^{-}(t)y, &{} \text {if } y \le 0, \end{array}\right. } \end{aligned}$$
(10)

where

$$\begin{aligned} k^+(t)&=\frac{\mu -r}{\sigma ^{2}}\cdot \frac{2c^+(t)-\gamma ^+b^+(t)-2(b^+(t))^2}{2c^+(t)},\end{aligned}$$
(11)
$$\begin{aligned} k^-(t)&=\frac{\mu -r}{\sigma ^{2}}\cdot \frac{2c^-(t)+\gamma ^-b^-(t)-2(b^-(t))^2}{2c^-(t)}. \end{aligned}$$
(12)

The parameters \(b^{+}(t)\), \(c^{+}(t)\), \(b^{-}(t)\), \(c^{-}(t)\) solve the following system of ordinary differential equations,

$$\begin{aligned} \dot{c}^{+}(t)+ 2[r-(\mu -r)k^+(t)]c^+(t) + \sigma ^{2}(k^+(t))^{2}c^+(t)=0,\\ c^+(T)=1,\\ \dot{b}^+(t)+[r-(\mu -r)k^+(t)]b^+(t)=0,\\ b^+(T)=1,\\ \dot{c}^-(t)+ 2[r-(\mu -r)k^-(t)]c^-(t)+ \sigma ^{2}(k^-(t))^{2}c^-(t)=0,\\ c^-(T)=1,\\ \dot{b}^-(t)+[r-(\mu -r)k^-(t)]b^-(t)=0,\\ b^-(T)=1, \end{aligned}$$

where \(\dot{c}^+(t)\), \(\dot{b}^+(t)\), \(\dot{c}^-(t)\) and \(\dot{b}^-(t)\) are the first order derivatives with respect to time t.
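The coupled system above has no explicit time dependence other than through \(b^{\pm }\) and \(c^{\pm }\), so it can be integrated numerically backward from the terminal conditions. The following Python sketch uses a classical fourth order Runge-Kutta scheme. It is one possible numerical approach and not a method taken from this paper. The names `feedback_gain`, `solve_coefficients` and `signed_gamma` are hypothetical, and the sketch assumes that \(c^{\pm }(t)\) stays away from zero on \([0,T]\).

```python
import math

def feedback_gain(b, c, signed_gamma, mu, r, sigma):
    """k(t) from (11)/(12). Pass signed_gamma = +gamma_plus for y >= 0 and
    signed_gamma = -gamma_minus for y <= 0, so both cases share one formula."""
    return (mu - r) / sigma**2 * (2*c - signed_gamma*b - 2*b*b) / (2*c)

def solve_coefficients(signed_gamma, mu, r, sigma, T, n_steps=1000):
    """Integrate the ODEs for (b, c) backward from b(T) = c(T) = 1 with RK4.
    Returns lists ts, bs, cs, ks on the uniform grid t_0 = 0, ..., t_n = T."""
    def rhs(b, c):
        k = feedback_gain(b, c, signed_gamma, mu, r, sigma)
        a = r - (mu - r) * k
        # Reversed time tau = T - t: db/dtau = a b, dc/dtau = (2a + sigma^2 k^2) c.
        return a * b, (2*a + sigma**2 * k*k) * c

    h = T / n_steps
    b, c = 1.0, 1.0
    bs, cs = [b], [c]
    for _ in range(n_steps):
        k1 = rhs(b, c)
        k2 = rhs(b + 0.5*h*k1[0], c + 0.5*h*k1[1])
        k3 = rhs(b + 0.5*h*k2[0], c + 0.5*h*k2[1])
        k4 = rhs(b + h*k3[0], c + h*k3[1])
        b += h / 6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        c += h / 6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        bs.append(b)
        cs.append(c)
    # Values were stored from t = T down to t = 0; flip to increasing time.
    bs.reverse()
    cs.reverse()
    ts = [i * h for i in range(n_steps + 1)]
    ks = [feedback_gain(bi, ci, signed_gamma, mu, r, sigma)
          for bi, ci in zip(bs, cs)]
    return ts, bs, cs, ks
```

For example, `solve_coefficients(0.5, 0.08, 0.03, 0.2, 1.0)` approximates \(k^+\) on [0, 1] for the illustrative choice \(\gamma ^+=0.5\), and passing `-gamma_minus` as the first argument gives \(k^-\). A useful sanity check is the terminal value: with \(b^+(T)=c^+(T)=1\), formula (11) gives \(k^+(T)=-(\mu -r)\gamma ^+/(2\sigma ^2)\).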

Proof

We have the probabilistic interpretations of \(f(t,y,z)\), \(g(t,y)\) and \(V(t,y)\) as follows,

$$\begin{aligned} f(t,y,z)&=\mathbb {E}_{t,y}[F(z,Y_T^{\hat{u}})]=\mathbb {E}_{t,y}[(Y_T^{\hat{u}})^2]-\gamma (z)\mathbb {E}_{t,y}[Y_T^{\hat{u}}],\\ g(t,y)&=\mathbb {E}_{t,y}[Y_T^{\hat{u}}],\\ V(t,y)&=\mathbb {E}_{t,y}[F(y,Y_T^{\hat{u}})]+G(y,g(t,y))=f(t,y,y)-g^2(t,y). \end{aligned}$$

Then, substituting \(V(t,y)=f(t,y,y)-g^2(t,y)\), HJB equation (4) reduces to

$$\begin{aligned}&f_{t}(t,y,y) -2g(t,y)g_t(t,y) + \inf _{u\in \mathbb {R}}\Big \{[ry+(\mu -r)u](f_y(t,y,y)-2g(t,y)g_y(t,y)) \\&\quad + \frac{1}{2}\sigma ^{2}u^{2}(f_{yy}(t,y,y)-2g(t,y)g_{yy}(t,y))\Big \}=0, \end{aligned}$$

which implies

$$\begin{aligned} \hat{u}(t,y)=-\frac{\mu -r}{\sigma ^2}\frac{f_{y}(t,y,y)-2g(t,y)g_{y}(t,y)}{f_{yy}(t,y,y)-2g(t,y)g_{yy}(t,y)}. \end{aligned}$$
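The step from the reduced HJB equation to this candidate policy is a pointwise quadratic minimization in u. As a sketch, write \(\alpha \) and \(\beta \) (shorthand introduced here only for illustration) for the coefficients of the terms that are linear and quadratic in u:

$$\begin{aligned} \alpha&=(\mu -r)\big (f_{y}(t,y,y)-2g(t,y)g_{y}(t,y)\big ),\quad \beta =\sigma ^{2}\big (f_{yy}(t,y,y)-2g(t,y)g_{yy}(t,y)\big ),\\ \frac{\partial }{\partial u}\Big (\alpha u+\frac{1}{2}\beta u^{2}\Big )&=\alpha +\beta u=0\quad \Longrightarrow \quad \hat{u}=-\frac{\alpha }{\beta }. \end{aligned}$$

The stationary point is a minimizer only if \(\beta >0\). For the candidate solutions considered in this proof, \(f_{yy}-2gg_{yy}=2c^{\pm }(t)\), so the condition amounts to \(c^{\pm }(t)>0\).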

For \(\{(t,y) |~ y\ge 0\}\), due to the second property in Proposition 1, \(z=Y_T^{\hat{u}}\) has the same sign as y. We show that the HJB system is solved by

$$\begin{aligned} V(t,y)=&~ [c^+(t)-\gamma ^+b^+(t)-(b^+(t))^2] y^2,\\ f(t,y,z) =&~ c^+(t)y^2-\gamma ^+b^+(t)yz,\\ g(t,y) =&~ b^{+}(t)y. \end{aligned}$$

It is easy to check that, with \(c^+(T)=b^+(T)=1\), \(V(T,y)\), \(f(T,y,z)\) and \(g(T,y)\) satisfy the terminal conditions (7)–(9). Furthermore,

$$\begin{aligned} g_{t}(t,y)&=\dot{b}^+(t)y, \quad g_{y}(t,y)=b^{+}(t), \quad g_{yy}(t,y)=0,\\ f_{t}(t,y,z)&= \dot{c}^{+}(t)y^2-\gamma ^+\dot{b}^+(t)yz,\quad f_{y}(t,y,z) = 2c^+(t)y-\gamma ^+b^+(t)z,\\ f_{yy}(t,y,z)&= 2c^+(t). \end{aligned}$$

The time consistent policy is given by

$$\begin{aligned} \hat{u}(t,y)&= -\frac{\mu -r}{\sigma ^{2}}\frac{2c^+(t)-\gamma ^+b^+(t)-2(b^+(t))^2}{2c^+(t)}y=-k^+(t) y. \end{aligned}$$

Then, the HJB equations (4)–(6) become,

$$\begin{aligned}&\displaystyle \dot{c}^{+}y^2-\gamma ^+\dot{b}^+y^2-2b^{+}\dot{b}^+y^2 + [ry+(\mu -r)\hat{u}][2c^+y \\&\displaystyle \quad -\gamma ^+b^+y-2(b^{+})^2y] + \sigma ^{2}\hat{u}^{2}c^+=0,\\&\displaystyle \dot{c}^{+}y^2-\gamma ^+\dot{b}^+yz +[ry+(\mu -r)\hat{u}][2c^+y-\gamma ^+b^+z]+\sigma ^{2}\hat{u}^{2}c^+=0,\\&\displaystyle \dot{b}^+y+[ry+(\mu -r)\hat{u}]b^+=0, \end{aligned}$$

which implies

$$\begin{aligned}&\displaystyle \dot{c}^{+}+ 2[r-(\mu -r)k^+]c^+ + \sigma ^{2}(k^+)^{2}c^+=0,\\&\displaystyle \dot{b}^++[r-(\mu -r)k^+]b^+=0. \end{aligned}$$

Here we omit the arguments of functions \(c^+(t)\), \(b^+(t)\) and \(k^+(t)\).

For \(\{(t,y) |~ y\le 0\}\), due to the second property in Proposition 1, \(z=Y_T^{\hat{u}}\) has the same sign as y. We show that the HJB system is solved by

$$\begin{aligned}&\displaystyle V(t,y)=~ [c^-(t)+\gamma ^-b^-(t)-(b^-(t))^2] y^2,\\&\displaystyle f(t,y,z) =~ c^-(t)y^2+\gamma ^-b^-(t)yz,\\&\displaystyle g(t,y) =~ b^{-}(t)y. \end{aligned}$$

It is easy to check that, with \(c^-(T)=b^-(T)=1\), \(V(T,y)\), \(f(T,y,z)\) and \(g(T,y)\) satisfy the terminal conditions (7)–(9). Furthermore,

$$\begin{aligned} g_{t}(t,y)&=\dot{b}^-(t)y, \qquad g_{y}(t,y)=b^-(t), \qquad g_{yy}(t,y)=0,\\ f_{y}(t,y,z)&=2c^-(t)y+\gamma ^-b^-(t)z, \qquad f_{t}(t,y,z) = \dot{c}^-(t)y^2+\gamma ^-\dot{b}^-(t)yz,\\ f_{yy}(t,y,z)&= 2c^-(t). \end{aligned}$$

The time consistent policy is given by

$$\begin{aligned} \hat{u}(t,y)&= -\frac{\mu -r}{\sigma ^{2}}\frac{2c^-(t)+\gamma ^-b^-(t)-2(b^-(t))^2}{2c^-(t)}y=-k^-(t) y. \end{aligned}$$

Then, the HJB equations (4)–(6) become,

$$\begin{aligned}&\displaystyle \dot{c}^-y^2+\gamma ^-\dot{b}^-y^2-2b^-\dot{b}^-y^2 + [ry+(\mu -r)\hat{u}][2c^-y \\&\displaystyle \quad +\gamma ^-b^-y-2(b^-)^2y] + \sigma ^{2}\hat{u}^{2}c^-=0,\\&\displaystyle \dot{c}^-y^2+\gamma ^-\dot{b}^-yz +[ry+(\mu -r)\hat{u}][2c^-y+\gamma ^-b^-z]+\sigma ^{2}\hat{u}^{2}c^-=0,\\&\displaystyle \dot{b}^-y+[ry+(\mu -r)\hat{u}]b^-=0, \end{aligned}$$

which implies

$$\begin{aligned}&\displaystyle \dot{c}^-+ 2[r-(\mu -r)k^-]c^- + \sigma ^{2}(k^-)^{2}c^-=0,\\&\displaystyle \dot{b}^-+[r-(\mu -r)k^-]b^-=0. \end{aligned}$$

Here we also omit the arguments of functions \(c^-(t)\), \(b^-(t)\) and \(k^-(t)\). \(\square \)

Remark 1

Theorem 1 has shown that in both domains \(\{(t,y)|y\ge 0\}\) and \(\{(t,y)|y\le 0\}\), the time consistent mean-variance policy has the same form as the one in Björk et al. [4]’s paper. It is an extension of [4]’s result.

Remark 2

Substituting the time consistent mean-variance policy \(\hat{u}\) back into the dynamics of the state \(Y_t^u\) in (2), we see that \(Y_{t}^{\hat{u}}\) is a geometric Brownian motion with time-dependent coefficients, which implies \(Y_s^{\hat{u}}> 0~(\text{ or } < 0)\) for \(s\in (t,T]\) if and only if \(Y_{t}^{\hat{u}}=y> 0~ (\text{ or } < 0)\). This means that if the current wealth level is larger than (or less than) the discounted investment target, the future wealth level along the time consistent mean-variance policy is always larger than (or less than) the discounted investment target.
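To make the geometric Brownian motion claim concrete, the following is a sketch that assumes \(k^{\pm }(t)\) is bounded on [0, T], an assumption that is not verified here. Writing \(k=k^{+}\) when \(y>0\) and \(k=k^{-}\) when \(y<0\), substituting \(\hat{u}(s,y)=-k(s)y\) into the state dynamics in (2) and applying Itô’s formula to the logarithm of \(|Y_s^{\hat{u}}|\) gives

$$\begin{aligned} dY_{s}^{\hat{u}}&=[r-(\mu -r)k(s)]Y_{s}^{\hat{u}}ds-\sigma k(s)Y_{s}^{\hat{u}}dW_{s},\\ Y_{s}^{\hat{u}}&=y\exp \Big \{\int _{t}^{s}\Big [r-(\mu -r)k(\tau )-\frac{1}{2}\sigma ^{2}k(\tau )^{2}\Big ]d\tau -\sigma \int _{t}^{s}k(\tau )dW_{\tau }\Big \},\quad t\le s\le T. \end{aligned}$$

The exponential factor is strictly positive, so \(Y_{s}^{\hat{u}}\) keeps the sign of y on the whole interval.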