Fused lasso nearly-isotonic signal approximation in general dimensions

Pastukhov, Vladimir

doi:10.1007/s11222-024-10432-6

Fused lasso nearly-isotonic signal approximation in general dimensions

Original Paper
Open access
Published: 22 May 2024

Volume 34, article number 120, (2024)
Cite this article

Download PDF

You have full access to this open access article

Statistics and Computing Aims and scope Submit manuscript

Fused lasso nearly-isotonic signal approximation in general dimensions

Download PDF

Vladimir Pastukhov¹

267 Accesses
Explore all metrics

Abstract

In this paper, we introduce and study fused lasso nearly-isotonic signal approximation, which is a combination of fused lasso and generalized nearly-isotonic regression. We show how these three estimators relate to each other and derive solution to a general problem. Our estimator is computationally feasible and provides a trade-off between monotonicity, block sparsity, and goodness-of-fit. Next, we prove that fusion and near-isotonisation in a one-dimensional case can be applied interchangably, and this step-wise procedure gives the solution to the original optimization problem. This property of the estimator is very important, because it provides a direct way to construct a path solution when one of the penalization parameters is fixed. Also, we derive an unbiased estimator of degrees of freedom of the estimator.

Group Fused Lasso

Bayesian fused lasso modeling via horseshoe prior

Article Open access 21 August 2023

A dual symmetric Gauss-Seidel technique-based proximal ADMM for robust fused lasso estimation

Article 23 April 2024

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

This work is motivated by recent papers in nearly-constrained estimation in several dimensions, and by the papers in generalised penalized least squared regression. The subject of penalized estimators starts with $L_{1}$-penalisation, cf. Tibshirani (1996), which is called lasso signal approximation, and $L_{2}$-penalisation, which is usually addressed as ridge regression (Hoerl and Kennard 1970) or sometimes as Tikhonov–Philips regularization (Phillips 1962; Tikhonov et al. 1995). The first generalisation of lasso is $L_{1}$-penalisation imposed on the successive differences of the coefficients. For a given sequence of data points ${\varvec{y}} \in {\mathbb {R}}^{n}$, the fusion approximator (cf. Rinaldo 2009) is given by

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F}) = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} +\lambda _{F} \sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}|.\nonumber \\ \end{aligned}$$

(1)

The combination of fusion approximator and lasso is called fused lasso estimator and is given by:

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FL}({\varvec{y}}, \lambda _{F}, \lambda _{L}){} & {} = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2}\nonumber \\{} & {} \quad +\lambda _{F}\sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}| + \lambda _{L} ||{\varvec{\beta }}||_{1}. \end{aligned}$$

(2)

The fused lasso was introduced in Tibshirani et al. (2005), and its asymptotic properties were studied in detail in Rinaldo (2009). Also, it is worth to note that in the paper Tibshirani and Taylor (2011) the estimator in (1) is called the fused lasso, while the estimator in (2) is addressed as the sparse fused lasso.

In the area of constrained inference the basic and simplest problem is isotonic regression in onedimension. For a given sequence of data points ${\varvec{y}} \in {\mathbb {R}}^{n}$, the isotonic regression is the following approximation

$$\begin{aligned} \hat{{\varvec{\beta }}}^{I} = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min }||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2}, \ \text {subject to} \ \beta _{1} \le \beta _{2} \le \dots \le \beta _{n},\nonumber \\ \end{aligned}$$

(3)

i.e. it is $\ell ^{2}$-projection of the vector ${\varvec{y}}$ onto the set of non-increasing vectors in ${\mathbb {R}}^{n}$. The notion of isotonic “regression” in this context might be confusing. Nevertheless, it is a standard notion in this subject, cf., for example, the papers Best and Chakravarti (1990); Stout (2013), where the notation “isotonic regression” is used for the isotonic projection of a general vector. Also, in this paper we use the notations “regression”, “estimator” and “approximator” interchangeably. A general introduction to isotonic regression can be found, for example, in Robertson et al. (1988).

The nearly-isotonic regression, introduced in Tibshirani et al. (2011) and studied in detail in Minami (2020), is a less restrictive version of isotonic regression, and it is given by the following optimization problem

$$\begin{aligned} \hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI})= & {} \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} \nonumber \\{} & {} + \lambda _{NI}\sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}|_{+}, \end{aligned}$$

(4)

where $x_{+} = x \cdot 1\{x > 0 \}$.

In this paper we combine the fused lasso estimator with nearly-isotonic regression and call the resulting estimator as fused lasso nearly-isotonic signal approximator, i.e. for a given sequence of data points ${\varvec{y}} \in {\mathbb {R}}^{n}$, the problem in one-dimensional case is the following optimization:

$$\begin{aligned} \begin{aligned}&\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})\\&\quad = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2}+ \lambda _{F} \sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}| \\&\qquad + \lambda _{L} ||{\varvec{\beta }}||_{1} + \lambda _{NI}\sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}|_{+}. \end{aligned} \end{aligned}$$

(5)

Also, in the case of $\lambda _{F} \ne 0$ and $\lambda _{NI} \ne 0$ with $\lambda _{L} = 0$ we call the estimator fused nearly-isotonic regression, i.e.

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})\equiv & {} {} \hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI}) \nonumber \\= & {} \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2}\nonumber \\{} & {} + \lambda _{F} \sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}|\nonumber \\{} & {} +\lambda _{NI}\sum _{i=1}^{n-1}|\beta _{i} - \beta _{i+1}|_{+}. \end{aligned}$$

(6)

This generalisation of nearly-isotonic regression in (6) was proposed in the conclusion of the paper Tibshirani et al. (2011). Next, a one-dimensional fused nearly-isotonic regression was considered and numerically solved in Yu et al. (2022) with time complexity ${\mathcal {O}}(n)$. Nevertheless, first, in this paper we consider and solve the problem in general dimensions. Second, for fixed penalisation parameters in a one-dimensional case we also provide a solution with linear complexity and an exact partly path solution (when one of the parameters is fixed and the path is with respect to the other one) with complexity ${\mathcal {O}}(n\log (n))$.

It is also worth to mention the paper Gómez et al. (2022), where the authors studied the nearly-isotonic approximator with extra penalisation term

$$\begin{aligned} (\beta _{i} - \beta _{i+1})^{2}\cdot 1\{(\beta _{i} - \beta _{i+1}) > 0 \} \end{aligned}$$

with an additional lasso penalty. Also, in the paper Gaines et al. (2018) the authors did a comparison of the algorithms to solve the lasso with linear constraints, which is called constrained lasso.

In the next step we state the problem defined in (5) for the general case of isotonic constraints with respect to a general partial order. First, we have to introduce the notation.

1.1 Notation

We start with basic definitions of partial order and isotonic regression. Let ${\mathcal {I}} = \{{\varvec{i}}_{1}, \dots , {\varvec{i}}_{n}\}$ be some index set. Next, we define the following binary relation $\preceq $ on ${\mathcal {I}}$.

A binary relation $\preceq $ on ${\mathcal {I}}$ is called partial order if

it is reflexive, i.e. ${\varvec{j}}\preceq {\varvec{j}}$ for all ${\varvec{j}} \in {\mathcal {I}}$;
it is transitive, i.e. ${\varvec{j}}_{1}, {\varvec{j}}_{2}, {\varvec{j}}_{3} \in {\mathcal {I}}$, ${\varvec{j}}_{1} \preceq {\varvec{j}}_{2}$ and ${\varvec{j}}_{2} \preceq {\varvec{j}}_{3}$ imply ${\varvec{j}}_{1} \preceq {\varvec{j}}_{3}$;
it is antisymmetric, i.e. ${\varvec{j}}_{1}, {\varvec{j}}_{2} \in {\mathcal {I}}$, ${\varvec{j}}_{1} \preceq {\varvec{j}}_{2}$ and ${\varvec{j}}_{2} \preceq {\varvec{j}}_{1}$ imply ${\varvec{j}}_{1} = {\varvec{j}}_{2}$.

Further, a vector ${\varvec{\beta }}\in {\mathbb {R}}^{n}$ indexed by ${\mathcal {I}}$ is called isotonic with respect to the partial order $\preceq $ on ${\mathcal {I}}$ if ${\varvec{j}}_{1} \preceq {\varvec{j}}_{2}$ implies $\beta _{{\varvec{j}}_{1}} \le \beta _{{\varvec{j}}_{2}}$. We denote the set of all isotonic vectors in ${\mathbb {R}}^{n}$ with respect to the partial order $\preceq $ on ${\mathcal {I}}$ by ${\varvec{{\mathcal {B}}}}^{is}$, which is a closed convex cone in ${\mathbb {R}}^{n}$ and it is also called isotonic cone. Next, a vector ${\varvec{\beta }}^{I}\in {\mathbb {R}}^{n}$ is the isotonic regression of an arbitrary vector ${\varvec{y}} \in {\mathbb {R}}^{n}$ over the pre-ordered index set ${\mathcal {I}}$ if

$$\begin{aligned} {\varvec{\beta }}^{I} = \underset{{\varvec{\beta }} \in {\varvec{{\mathcal {B}}}}^{is}}{\arg \min } \sum _{{\varvec{j}} \in {\mathcal {I}}}(\beta _{{\varvec{j}}} - y_{{\varvec{j}}})^{2}. \end{aligned}$$

(7)

For any partial order relation $\preceq $ on ${\mathcal {I}}$ there exists a directed graph $G = (V,E)$, with $V = {\mathcal {I}}$ and E is the minimal set of edges such that

$$\begin{aligned}{} & {} E = \{({\varvec{j}}_{1}, {\varvec{j}}_{2}), \, \text {where} \, ({\varvec{j_{1}}},{\varvec{j_{2}}}) \,\nonumber \\{} & {} \quad \text { is the ordered pair of vertices from } \, {\mathcal {I}}\}, \end{aligned}$$

(8)

and an arbitrary vector ${\varvec{\beta }} \in {\mathbb {R}}^{n}$ is isotonic with respect to $\preceq $ iff $\beta _{{\varvec{l_{1}}}} \le \beta _{{\varvec{l_{2}}}}$, given that E contains the chain of edges from ${\varvec{l}}_{1} \in V$ to ${\varvec{l}}_{2} \in V$.

Now we can generalise the estimators discussed above. First, equivalent to the definition in (7), a vector ${\varvec{\beta }}^{I}\in {\mathbb {R}}^{n}$ is the isotonic regression of an arbitrary vector ${\varvec{y}} \in {\mathbb {R}}^{n}$ indexed by the partially ordered index set ${\mathcal {I}}$ if

$$\begin{aligned} {\varvec{\beta }}^{I} = \underset{{\varvec{\beta }}}{\arg \min } \sum _{{\varvec{j}} \in {\mathcal {I}}}(\beta _{{\varvec{j}}} - y_{{\varvec{j}}})^{2}, \end{aligned}$$

(9)

subject to $\beta _{{\varvec{l_{1}}}} \le \beta _{{\varvec{l_{2}}}}$, whenever E contains the chain of edges from ${\varvec{l}}_{1} \in V$ to ${\varvec{l}}_{2} \in V$.

Second, for the directed graph $G = (V, E)$, which corresponds to the partial order $\preceq $ on ${\mathcal {I}}$, the nearly-isotonic regression of ${\varvec{y}}\in {\mathbb {R}}^{n}$ indexed by ${\mathcal {I}}$ is given by

$$\begin{aligned} \hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI})= & {} \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} \nonumber \\{} & {} + \lambda _{NI}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}|_{+}. \end{aligned}$$

(10)

This generalisation of nearly-isotonic regression was introduced and studied in Minami (2020).

Next, fused and fused lasso approximators for a general directed graph $G = (V, E)$ are given by

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F}) = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} +\lambda _{F}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}|,\nonumber \\ \end{aligned}$$

(11)

and

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FL}({\varvec{y}}, \lambda _{F}, \lambda _{L}){} & {} = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} \nonumber \\{} & {} \quad +\lambda _{F}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}| + \lambda _{L} ||{\varvec{\beta }}||_{1}. \end{aligned}$$

(12)

These optimization problems were introduced and solved for a general graph in Friedman et al. (2007); Hoefling (2010); Tibshirani and Taylor (2011).

Further, let D denote the oriented incidence matrix for the directed graph $G = (V, E)$, corresponding to $\preceq $ on ${\mathcal {I}}$. We choose the orientation of D in the following way. Assume that the graph G with n vertexes has m edges. Next, assume we label the vertexes by $\{1, \dots , n\}$ and edges by $\{1, \dots , m\}$. Then, D is $m\times n$ matrix with

$$\begin{aligned} D_{i,j} = {\left\{ \begin{array}{ll} 1, &{} \quad \hbox { if vertex } j \hbox { is the source of edge } i, \\ -1, &{} \quad \hbox { if vertex } j \hbox { is the target of edge } i,\\ 0, &{} \quad \text {otherwise}. \end{array}\right. } \end{aligned}$$

(13)

In order to clarify the notations we consider the following examples of partial order relation. First, let us consider the monotonic order relation in the one-dimensional case. Let ${\mathcal {I}} = \{1, \dots , n\}$, and for $j_{1} \in {\mathcal {I}}$ and $j_{2} \in {\mathcal {I}}$ we naturally define $j_{1}\preceq j_{2}$ if $j_{1} \le j_{2}$. Further, if we let $V = {\mathcal {I}}$ and $E = \{(i, i+1): i = 1, \dots , n-1 \}$, then $G = (V, E)$ is the directed graph which corresponds to the one-dimensional order relation on ${\mathcal {I}}$. Figure 1 displays the graph and the incidence matrix for the graph.

Next, we consider two dimensional case with bimonotonic constraints. The notion of bimonotonicity was first introduced in Beran and Dümbgen (2010) and it means the following. Let us consider the index set

$$\begin{aligned}{} & {} {\mathcal {I}} = \{ {\varvec{i}}= (i^{(1)},i^{(2)}): \, i^{(1)}=1,2,\dots , n_{1}, \\{} & {} i^{(2)}=1,2,\dots , n_{2}\} \end{aligned}$$

with the following order relation $\preceq $ on it: for ${\varvec{j}}_{1}, {\varvec{j}}_{2}\in {\mathcal {I}}$ we have ${\varvec{j}}_{1} \preceq {\varvec{j}}_{2}$ iff $j^{(1)}_{1} \le j^{(1)}_{2}$ and $j^{(2)}_{1} \le j^{(2)}_{2}$. Then, a vector ${\varvec{\beta }}\in {\mathbb {R}}^{n}$, with $n=n_{1}n_{2}$, indexed by ${\mathcal {I}}$ is called bimonotone if it is isotonic with respect to bimonotone order $\preceq $ defined on its index ${\mathcal {I}}$. Further, we define the directed graph $G = (V, E)$ with vertexes $V = {\mathcal {I}}$, and the edges

$$\begin{aligned} \begin{aligned} E ={}&\{((l, k),(l, k+1) ): \, 1 \le l \le n_{1}, 1 \le k \le n_{2} - 1\}\\ \cup \,&\{((l, k),(l+1, k) ): \, 1 \le l \le n_{1}-1, 1 \le k \le n_{2} \}. \end{aligned} \end{aligned}$$

The labeled directed graph for bimonotone constraints and its incidence matrix are displayed in Fig. 2.

1.2 General statement of the problem

Now we can state the general problem studied in this paper. Let ${\varvec{y}} \in {\mathbb {R}}^{n}$ be a signal indexed by the index set ${\mathcal {I}}$ with the partial order relation $\preceq $ defined on ${\mathcal {I}}$. Next, let $G=(V,E)$ be the directed graph corresponding to $\preceq $ on ${\mathcal {I}}$. The fused lasso nearly-isotonic signal approximation with respect to $\preceq $ on ${\mathcal {I}}$ (or, equivalently, to the directed graph $G = (V, E)$, corresponding to $\preceq $) is given by

$$\begin{aligned} \begin{aligned}&\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})\\&\quad = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} + \lambda _{F} \sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}| \\&\qquad + \lambda _{L} ||{\varvec{\beta }}||_{1} + \lambda _{NI}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}|_{+}. \end{aligned} \end{aligned}$$

(14)

Therefore, the estimator in (14) is a combination of the estimators in (10) and (12).

Equivalently, we can rewrite the problem in the following way:

$$\begin{aligned} \begin{aligned} \hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})&= \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} \\&\quad + \lambda _{F} ||D{\varvec{\beta }}||_{1}\\&\quad + \lambda _{L} ||{\varvec{\beta }}||_{1} + \lambda _{NI}||D{\varvec{\beta }}||_{+}, \end{aligned} \end{aligned}$$

(15)

where D is the oriented incidence matrix of the graph $G =(V, E)$. Here, we clarify that in the case of penalisation with the incidence matrix D we assume that ${\varvec{\beta }}$ is indexed according to the indexing of the edges in the graph $G =(V, E)$. Analogously to the definition in the one-dimensional case, if $\lambda _{L} = 0$ we call the estimator fused nearly-isotonic approximator and denote it by $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})$.

Here, it is worth to mention recent papers in constrained estimation (Deng and Zhang 2020; Han et al. 2019; Han and Zhang 2020), where the authors studied the asymptotic properties of the isotonic regression in general dimensions. Also, in paper Wang et al. (2015) $\ell _{1}$-trend filtering was generalised for the case of a general graph.

1.3 Organisation of the paper

The rest of the paper is organized as follows. In Sect. 2 we provide the numerical solution to the fused lasso nearly-isotonic signal approximator. Section 3 is dedicated to the theoretical properties of the estimator. We show how the solutions to the fused lasso nearly-isotonic regression, fused lasso, and nearly-isotonic regression are related to each other. Also, we prove that in the one-dimensional case the new estimator has agglomerative property and the procedures of near-isotonisation and fusion can be swapped and provide the solution to the original problem. Next, in Sect. 4 we derive the unbiased estimator of the degrees of freedom of the estimator. Furthermore, in Sect. 5 we discuss the computational aspects, do the simulation study and show that the estimator is computationally feasible for moderately large data sets. Also, we illustrate the usage of the estimator for the real data set. The article closes with a conclusion and a discussion of possible generalisations in Sect. 6. The proofs of all results are given in the Appendix. The R and Python implementations of the estimator are available upon request.

2 Solution to the fused lasso nearly-isotonic signal approximator

First, we consider fused nearly-isotonic regression, i.e. in (15) we assume that $\lambda _{L} = 0$.

Theorem 1

For a fixed data vector ${\varvec{y}} \in {\mathbb {R}}^{n}$ indexed by the index set ${\mathcal {I}}$ with the partial order relation $\preceq $ defined on ${\mathcal {I}}$, the solution to the fused nearly-isotonic problem in (15) is given by

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = {\varvec{y}} - D^{T} \hat{{\varvec{\nu }}}(\lambda _{F}, \lambda _{NI}) \end{aligned}$$

(16)

with

$$\begin{aligned}{} & {} \hat{{\varvec{\nu }}}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \underset{{\varvec{\nu }} \in {\mathbb {R}}^{m}}{\arg \min } \, \frac{1}{2}||{\varvec{y}} - D^{T}{\varvec{\nu }}||_{2}^{2} \nonumber \\{} & {} \quad \text {s. t.} \quad - \lambda _{F} {\varvec{1}} \le {\varvec{\nu }} \le (\lambda _{F} + \lambda _{NI}){\varvec{1}}, \end{aligned}$$

(17)

where D is the incidence matrix of the directed graph $G = (V, E)$ with n vertices and m edges corresponding to $\preceq $ on ${\mathcal {I}}$, ${\varvec{1}}\in {\mathbb {R}}^{m}$ is the vector whose all elements are equal to 1 and the notation ${\varvec{a}} \le {\varvec{b}}$ for vectors ${\varvec{a}},{\varvec{b}} \in {\mathbb {R}}^{m}$ means $a_{i} \le b_{i}$ for all $i = 1, \dots , m$.

Next, we provide the solution to the fused lasso nearly-isotonic regression.

Theorem 2

For a given vector ${\varvec{y}}$ indexed by ${\mathcal {I}}$, the solution to the fused lasso nearly-isotonic signal approximator $\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI})$ is given by soft thresholding the fused nearly-isotonic regression $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}},\lambda _{F}, \lambda _{NI})$, i.e.

$$\begin{aligned}{} & {} {\hat{\beta }}^{FLNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI}) \nonumber \\{} & {} \quad = {\left\{ \begin{array}{ll} {\hat{\beta }}^{FNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{NI}) - \lambda _{L}, &{} \,\text { if } \, {\hat{\beta }}^{FNI}_{{\varvec{i}}} \ge \lambda _{L}, \\ 0, &{} \, \text { if } \, |{\hat{\beta }}^{FNI}_{{\varvec{i}}}| \le \lambda _{L},\\ {\hat{\beta }}^{FNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{NI}) + \lambda _{L}, &{}\, \text { if } \, {\hat{\beta }}^{FNI}_{{\varvec{i}}} \le -\lambda _{L}, \end{array}\right. } \end{aligned}$$

(18)

for ${{\varvec{i}}}\in {\mathcal {I}}$.

From this result we can conclude that adding lasso penalisation does not add much to the computational complexity of the solution. The computational aspects of fused nearly-isotonic approximator will be discussed in Sect. 5 below. In the next section we discuss properties of the fused lasso nearly-isotonic regression.

3 Properties of the fused lasso nearly-isotonic signal approximator

We start with a proposition which shows how the solutions to the optimization problems (11), (10) and (15) are related to each other. This result will be used in the next section to derive degrees of freedom of the fused lasso nearly-isotonic signal approximator.

Proposition 3

For a fixed data vector ${\varvec{y}}$ indexed by ${\mathcal {I}}$ and penalisation parameters $\lambda _{NI}$ and $\lambda _{F}$ the following relations between estimators $\hat{{\varvec{\beta }}}^{F}$, $\hat{{\varvec{\beta }}}^{NI}$ and $\hat{{\varvec{\beta }}}^{FNI}$ hold

$$\begin{aligned}{} & {} \hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{F}\left( {\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI}\right) , \end{aligned}$$

(19)

$$\begin{aligned}{} & {} \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{NI}({\varvec{y}} + \lambda _{F}D^{T}{\varvec{1}}, \lambda _{NI} + 2\lambda _{F})\nonumber \\{} & {} \qquad \qquad \qquad \qquad \qquad \quad = \hat{{\varvec{\beta }}}^{F}\left( {\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F}\right) \nonumber \\ \end{aligned}$$

(20)

and

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})= & {} \hat{{\varvec{\beta }}}^{FL}({\varvec{y}}- \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}},\nonumber \\{} & {} \frac{1}{2}\lambda _{NI} + \lambda _{F}, \lambda _{L}), \end{aligned}$$

(21)

where D is the oriented incidence matrix for the graph $G = (V, E)$ corresponding to the partial order relation $\preceq $ on ${\mathcal {I}}$.

Further, let us introduce two "naive" versions of $\hat{{\varvec{\beta }}}^{FNI}$. Instead of simultaniously penalise by fusion and isotonisation we consider the following two-step procedures:

$$\begin{aligned} \begin{aligned} \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})&= {} \hat{{\varvec{\beta }}}^{NI}(\hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F}), \lambda _{NI}) \\&\equiv \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} || \hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F}) \\&\quad - {\varvec{\beta }}||_{2}^{2} + \lambda _{NI}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}|_{+}, \end{aligned} \end{aligned}$$

(22)

and

$$\begin{aligned} \begin{aligned} \hat{{\varvec{\beta }}}^{NI\rightarrow F}({\varvec{y}}, \lambda _{NI}, \lambda _{F})&= {} \hat{{\varvec{\beta }}}^{F}(\hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}), \lambda _{F}) \\&\equiv \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} || \hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}) \\&\quad - {\varvec{\beta }}||_{2}^{2} + \lambda _{F}\sum _{({\varvec{i}},{\varvec{j}})\in E}|\beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}|. \end{aligned} \end{aligned}$$

(23)

Below we prove that both “naive” methods in the one-dimensional case with a simple monotonic restriction defined above are not only equivalent, but both methods provide the solution to the fused nearly-isotonic regression.

First, we have to prove that, analogously to fused lasso and nearly-isotonic regression, as one of the penalization parameters increases, the constant regions in the solution $\hat{{\varvec{\beta }}}^{FLNI}$ can only be joined together and not split apart. In the paper Minami (2020) this property of the estimator was called agglomerative property. We prove this result only for the one-dimensional monotonic order, and the general case is an open question.

Proposition 4

(Agglomerative property of FLNI estimator) Let ${\mathcal {I}} = \{1, \dots , n\}$ with the natural order for integers defined on it. Next, let ${\varvec{\lambda }} = (\lambda _{F}, \lambda _{L}, \lambda _{NI})$ and ${\varvec{\lambda }}^{*}= (\lambda _{F}^{*}, \lambda _{L}^{*}, \lambda _{NI}^{*})$ are the triples of penalisation parameters such that one of the elements of ${\varvec{\lambda }}^{*}$ is greater than the corresponding element in ${\varvec{\lambda }}$, while two others are the same. Next, assume that for some i the solution $\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}},{\varvec{\lambda }})$ satisfies

$$\begin{aligned} {\hat{\beta }}_{i}^{FLNI}({\varvec{y}},{\varvec{\lambda }}) = {\hat{\beta }}_{i+1}^{FLNI}({\varvec{y}},{\varvec{\lambda }}). \end{aligned}$$

Then for ${\varvec{\lambda }}^{*}$ we have

$$\begin{aligned} {\hat{\beta }}_{i}^{FLNI}({\varvec{y}},{\varvec{\lambda }}^{*}) = {\hat{\beta }}_{i+1}^{FLNI}({\varvec{y}},{\varvec{\lambda }}^{*}). \end{aligned}$$

Now we can prove the commutability property of the “naive” estimators and the equivalence of the approach to the fused nearly-isotonic regression.

Theorem 5

(Commutability property of FNI estimator) Let $\hat{{\varvec{\beta }}}^{F\rightarrow NI}\!({\varvec{y}}, \lambda _{F}, \lambda _{NI})$ and $\hat{{\varvec{\beta }}}^{NI\rightarrow F}\!({\varvec{y}}, \lambda _{NI}, \lambda _{F})$ be the “naive” versions of the fused nearly-isotonic approximator, defined in (22) and (23), in the case of one-dimensional monotonic constraint. Then, we have

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})= & {} \hat{{\varvec{\beta }}}^{NI\rightarrow F}({\varvec{y}}, \lambda _{NI}, \lambda _{F}) \\= & {} \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}). \end{aligned}$$

One of the first conclusions of Theorem 5 is commutability of strict isotonisation (which corresponds to the large values of $\lambda _{NI}$) and fusion. For big values of $\lambda _{NI}$, fused lasso nearly-isotonic signal approximation is, in principle, analogous to the approach studied in Gao et al. (2020), where the authors studied estimation of isotonic piecewise constant signals solving the following optimization problem

$$\begin{aligned} {\varvec{\beta }}^{*} = \underset{{\varvec{\beta }} \in {\varvec{{\mathcal {B}}}}^{is}_{n,k} }{\arg \min } \sum _{j = 1}^{n}(\beta _{j} - y_{j})^{2} + pen(n, k), \end{aligned}$$

(24)

where

$$\begin{aligned} \begin{aligned} {\varvec{{\mathcal {B}}}}^{is}_{n,k} = \{&{\varvec{\beta }}\in {\mathbb {R}}^{n}: \text { there exists } \{ a_{j} \}_{j=0}^{k} \text { and } \{ \mu _{j} \}_{j=1}^{k}\\&\text { such that } 0 \le a_{0} \le a_{1} \le \dots \le a_{k} = n, \\&\mu _{1} \le \mu _{2} \le \dots \le \mu _{k}, \text { and } \beta _{i} = \mu _{j}\\&\text { for all } i \in (a_{j-1}: a_{j}] \}, \end{aligned} \end{aligned}$$

and pen(n, k) is a penalization term which depends on n and k but not on ${\varvec{y}}$. Therefore, the result of Theorem 5 provides an alternative approach to obtain an exact solution in the estimation isotonic piecewise constant signals.

4 Degrees of freedom

In this section we discuss the estimation of the degrees of freedom for the fused nearly-isotonic regression and the fused lasso nearly-isotonic signal approximator. Let us consider the following nonparametric model

$$\begin{aligned} {\varvec{Y}} = {\varvec{\mathring{\beta }}} + {\varvec{\varepsilon }}, \end{aligned}$$

where ${\varvec{\mathring{\beta }}}\in {\mathbb {R}}^{n}$ is an unknown signal, and the error term ${\varvec{\varepsilon }}\in {\mathcal {N}}({\varvec{0}}, \sigma ^{2}{\varvec{I}})$.

The degrees of freedom is a measure of complexity of the estimator, and following Efron (1986), for the fixed values of $\lambda _{F}$, $\lambda _{L}$ and $\lambda _{Ni}$ the degrees of freedom of $\hat{{\varvec{\beta }}}^{FNI}$ and $\hat{{\varvec{\beta }}}^{FLNI}$ are given by

$$\begin{aligned}{} & {} df(\hat{{\varvec{\beta }}}^{FLNI}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}))\nonumber \\{} & {} \quad = \frac{1}{\sigma ^{2}} \sum _{i=1}^{n}\textrm{Cov}[{\hat{\beta }}^{FLNI}_{i}({\varvec{Y}}, \lambda _{F},\lambda _{L}, \lambda _{NI}), Y_{i}]. \end{aligned}$$

(25)

The next theorem provides the unbiased estimators of the degrees of freedom $df(\hat{{\varvec{\beta }}}^{FNI})$ and $df(\hat{{\varvec{\beta }}}^{FLNI})$.

Theorem 6

For the fixed values of $\lambda _{F}$, $\lambda _{L}$ and $\lambda _{Ni}$ let

$$\begin{aligned} K^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})= & {} \#\{\text {fused groups in }\\{} & {} \quad \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})\}, \end{aligned}$$

and

$$\begin{aligned} K^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})= & {} \#\{\text {non-zero fused groups in } \\{} & {} \quad \hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})\}. \end{aligned}$$

Then we have

$$\begin{aligned} {\mathbb {E}}[K^{FNI}({\varvec{Y}}, \lambda _{F}, \lambda _{NI})] = df(\hat{{\varvec{\beta }}}^{FNI}({\varvec{Y}}, \lambda _{F}, \lambda _{NI})), \end{aligned}$$

and

$$\begin{aligned}{} & {} {\mathbb {E}}[K^{FLNI}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})]\\{} & {} \quad = df(\hat{{\varvec{\beta }}}^{FLNI}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})). \end{aligned}$$

We can potentially use the estimate of degrees of freedom for an unbiased estimation of the true risk ${\mathbb {E}}[\sum _{i=1}^{n}(\mathring{\beta }_{i}- {\hat{\beta }}^{FLNI}_{i}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}))^2]$, which is given by the ${\hat{C}}_{p}$ statistic

$$\begin{aligned} \begin{aligned} {\hat{C}}_{p}(\lambda _{F}, \lambda _{L}, \lambda _{NI})&=\sum _{i=1}^{n}(y_{i} - {\hat{\beta }}^{FLNI}_{i} ({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}))^2 \\&\quad - n\sigma ^{2} + 2\sigma ^{2}K^{FLNI}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}). \end{aligned} \end{aligned}$$

Though, we note that in real applications the variance $\sigma ^{2}$ is unknown. The variance estimator for the case of one-dimensional isotonic regression was introduced in Meyer and Woodroofe (2000). To the authors’ knowledge, the variance estimator even for the one-dimensional nearly-isotonic regression is an open problem.

In order to illustrate the performance of the degrees of freedom estimator, we generate $M = 3000$ independent samples from te following signal on a grid

$$\begin{aligned} Y_{i,j} = \max {(i,j)} + \varepsilon _{i,j}, \end{aligned}$$

where $i,j= 1,\dots , 5$ and ${\varvec{\varepsilon }}\in {\mathcal {N}}(0, 0.25)$. Using Monte-Carlo simulations, we estimate $\textrm{Cov}[{\hat{\beta }}^{FLNI}_{i}, Y_{i}]$ and compare the estimated true value of df with the estimator defined in Theorem 6. The result for different values of penalisation parameters is given in Fig. 3.

5 Computational aspects, simulation study and application to a real data set

First of all, recall that the solution to the fused lasso nearly-isotonic approximator is given by

$$\begin{aligned}{} & {} {\hat{\beta }}^{FLNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI})\\{} & {} \quad = {\left\{ \begin{array}{ll} {\hat{\beta }}^{FNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{NI}) - \lambda _{L}, &{} \,\text { if } \, {\hat{\beta }}^{FNI}_{{\varvec{i}}} \ge \lambda _{L}, \\ 0, &{} \, \text { if } \, |{\hat{\beta }}^{FNI}_{{\varvec{i}}}| \le \lambda _{L},\\ {\hat{\beta }}^{FNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{NI}) + \lambda _{L}, &{}\, \text { if } \, {\hat{\beta }}^{FNI}_{{\varvec{i}}} \le -\lambda _{L}, \end{array}\right. } \end{aligned}$$

for ${{\varvec{i}}}\in {\mathcal {I}}$, with

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = {\varvec{y}} - D^{T} \hat{{\varvec{\nu }}}(\lambda _{F}, \lambda _{NI}) \end{aligned}$$

where

$$\begin{aligned}{} & {} \hat{{\varvec{\nu }}}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \underset{{\varvec{\nu }} \in {\mathbb {R}}^{m}}{\arg \min } \, \frac{1}{2}||{\varvec{y}} - D^{T}{\varvec{\nu }}||_{2}^{2}\\{} & {} \quad \text {s. t.} - \lambda _{F} {\varvec{1}} \le {\varvec{\nu }} \le (\lambda _{F} + \lambda _{NI}){\varvec{1}}, \end{aligned}$$

where D is the incidence matrix displayed in Fig. 1 (a) for the one-dimensional case. The matrix D is full raw ranked, therefore, the problem is strictly convex. Next, we have similar box-type constraints as in the problem of the $L_{1}$-trend filtering example and we can solve the problem with ${\mathcal {O}}(n)$ time complexity.

Second, note that in one-dimensional case the time complexities of path solution algorithms for the nearly-isotonic regression and the fusion approximator are equal to ${\mathcal {O}}(n\log (n))$, cf. Tibshirani et al. (2011); Hoefling (2010); Bento et al. (2018) with the references therein. Therefore, if we have $\lambda _{F}$ fixed, then using the result of Theorem 5 we can get the solution path with respect to $\lambda _{NI}$ with the time complexity ${\mathcal {O}}(n\log (n))$. Further, if we fix $\lambda _{NI}$ then, again, using Theorem 5 we can obtain the solution path with respect to $\lambda _{F}$ with complexity ${\mathcal {O}}(n\log (n))$. In paper Yu et al. (2022) the one-dimensional fused nearly-isotonic regression was solved for fixed values of penalisation parameters. Therefore, one-dimensional fused lasso and nearly-isotonic regression have been studied in detail, therefore, in our paper we focus on the two-dimensional case.

The case of several dimensions is more complicated. Note, that, for example, even in the case of two dimensions the matrix D, displayed in Fig. 2, is not full raw ranked. Therefore, the dual problem is not strictly convex. At the same time one can see that the matrix D is sparse diagonal. Therefore, we apply the recently developed OSQP algorithm, cf. Stellato et al. (2020). The time complexity of the solution is linear with respect to the number of edges in the graph, i.e. it is ${\mathcal {O}}(|E|)$.

The exact solution for fixed values of penalisation parameters can be obtained using the results of the paper Minami (2020), where the author proposed the algorithm for a general graph with computational complexity ${\mathcal {O}}(n|E|\log (\frac{n^{2}}{|E|}))$. Therefore, in principle, using the relation between fused nearly-isotonic regression and nearly-isotonic regression proved in Proposition 3 it is possible to obtain the exact solution to the fused nearly-isotonic approximation for a general graph.

First, recall that from Theorem 2 it follows that the solution with $\lambda _{L} \ne 0$ is given by soft-thresholding of the solution with $\lambda _{L} = 0$. Therefore, lasso penalization does not add much to the complexity, and we concentrate on the case with $\lambda _{L} = 0$. Following Minami (2020), we use the following bi-monotone functions (bisigmoid and bicubic) to test the performance of the fused nearly-isotonic approximator:

$$\begin{aligned} f_{bs}(x^{(1)}, x^{(2)})= & {} \frac{1}{2}\Big (\frac{e^{16x^{(1)} - 8}}{1 + e^{16x^{(1)} - 8}} + \frac{e^{16x^{(2)} - 8}}{1 + e^{16x^{(2)} - 8}}\Big ),\\ f_{bc}(x^{(1)}, x^{(2)})= & {} \frac{1}{2}\Big ( (2x^{(1)} -1)^{3} + (2x^{(2)} -1)^{3} \Big ) + 2, \end{aligned}$$

where $x^{(1)} \in [0,1)$ and $x^{(2)} \in [0,1)$.

The simulation experiment is performed in the following way. First, we generate homogeneous grid $k\times k$:

$$\begin{aligned} x^{(1)}_{k} = \frac{k-1}{d} \quad \text {and} \quad x^{(2)}_{k} = \frac{k-1}{d}, \end{aligned}$$

for $k = 1, \dots , d$. The size of the side d varies in $\{ 2\times 10^{2}, 4\times 10^{2}, 6\times 10^{2}, 8\times 10^{2}, 10^{3} \}$. Next, we uniformly generate penalisation parameters $\lambda _{F}$ and $\lambda _{NI}$ from U(0, 5). We perform 10 runs and compute computational times for each d. Analogously to Stellato et al. (2020), we consider two cases of OSQP algorithm: low precision case with $\varepsilon _{abs} = \varepsilon _{rel} = 10^{-3}$, and high precision case with $\varepsilon _{abs} = \varepsilon _{rel} = 10^{-5}$ (for the details of the settings in OSQP we refer to Stellato et al. (2020)). Figure 4 below provides these computational times. All the computations were performed on MacBook Air (Apple M1 chip), 16 GB RAM. From these results we can conclude that the estimator is computationally feasible for moderate sized data sets (i.e. for the grids with millions of nodes).

Next, Fig. 5 visualizes the fused nearly-isotonic approximator. We use the Adult data set, available from the UCI Machine Learning repository (Becker and Kohavi 1996). The target variable in this data set is either a person’s salary is greater than 50 000 dollars per year or less. We use two features (education number and working hours per week) and each bar in the figure is the proportion of people making more that the amount of money mentioned above. This data set was used, for example, in Wang et al. (2022).

From Fig. 5 we can see that fused nearly-isotonic regression provides a trade-off between monotonicity, block sparsity and goodness-of-fit.

6 Conclusion and discussion

In this paper we introduced and studied the fused lasso nearly-isotonic signal approximator in general dimensions. The main result is that the estimator is computationally feasible and it provides interplay between fusion and monotonisation. Also, we proved that the properties of the new estimator are very similar to the properties of the fusion estimator and the nearly-isotonic regression.

In our opinion, one of the most important results is Theorem 5, where we proved the commutability property of fusion and nearly-isotonisation, because for the fixed values of one of the penalisation parameters we can immediately obtain the path solution with respect to the other one. Path algorithm for fused lasso exists (Hoefling 2010; Tibshirani and Taylor 2011). At the same time, to the authors’ knowledge, path algorithm for nearly-isotonic regression in general dimensions has not been developed yet. Therefore, further direction could be the solution for the nearly-isotonic regression, and, next, to prove if commutability holds in a general dimensional case.

One of the other possible directions is to study the asymptotic properties. In particular, it is interesting to understand the rate of convergence for different model selection and cross-validation procedures of choosing penalisation parameters.

Another direction is to study properties of the solution when $\lambda _{F}$ and $\lambda _{NI}$ are not the same for each vertex. An example where one must use different penalisation parameters is the case when the data points are measured along non-homogeneously spaced grid. It is important to note that, as discussed in Minami (2020), this case is different and even in the one-dimensional case the estimator will behave differently. In particular, the agglomerative property of the nearly-isotonic regression holds if the penalisation parameters satisfy the certain relation, cf. Proposition A.1. in Minami (2020), which is crucial for the solution path.

Finally, in our opinion, it is interesting to study different combinations of penalisation estimators, even though, practically, in this case one needs more data, because there will be more penalisation parameters to estimate.

References

Becker, B., Kohavi, R.: UCI Machine Learning Repository (1996). http://archive.ics.uci.edu/ml
Bento, J., Furmaniak, R., Ray, S.: On the complexity of the weighted fused lasso. IEEE Signal Process. Lett. 25(10), 1595–1599 (2018)
Article Google Scholar
Beran, R., Dümbgen, L.: Least squares and shrinkage estimation under bimonotonicity constraints. Stat. Comput. 20(2), 177–189 (2010)
Article MathSciNet Google Scholar
Best, M.J., Chakravarti, N.: Active set algorithms for isotonic regression; a unifying framework. Math. Program. 47(1), 425–439 (1990)
Article MathSciNet Google Scholar
Deng, H., Zhang, C.-H.: Isotonic regression in multi-dimensional spaces and graphs. Ann. Stat. 48(6), 3672–3698 (2020)
Article MathSciNet Google Scholar
Efron, B.: How biased is the apparent error rate of a prediction rule? J. Am. Stat. Assoc. 81(394), 461–470 (1986)
Article MathSciNet Google Scholar
Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1(2), 302–332 (2007)
Article MathSciNet Google Scholar
Gaines, B.R., Kim, J., Zhou, H.: Algorithms for fitting the constrained lasso. J. Comput. Graph. Stat. 27(4), 861–871 (2018)
Article MathSciNet Google Scholar
Gao, C., Han, F., Zhang, C.-H.: On estimation of isotonic piecewise constant signals. Ann. Stat. 48(2), 629–654 (2020)
Article MathSciNet Google Scholar
Gómez, A., He, Z., Pang, J.-S.: Linear-step solvability of some folded concave and singly-parametric sparse optimization problems. Math. Program. 198, 1–42 (2022)
Han, Q., Zhang, C.-H.: Limit distribution theory for block estimators in multiple isotonic regression. Ann. Stat. 48(6), 3251–3282 (2020)
Han, Q., Wang, T., Chatterjee, S., Samworth, R.J.: Isotonic regression in general dimensions. Ann. Stat. 47(5), 2440–2471 (2019)
Article MathSciNet Google Scholar
Hoefling, H.: A path algorithm for the fused lasso signal approximator. J. Comput. Graph. Stat. 19(4), 984–1006 (2010)
Hoerl, A.E., Kennard, R.W.: Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12(1), 55–67 (1970)
Article Google Scholar
Kim, S.-J., Koh, K., Boyd, S., Gorinevsky, D.: $\ell _1$ trend filtering. SIAM Rev. 51(2), 339–360 (2009)
Meyer, M., Woodroofe, M.: On the degrees of freedom in shape-restricted regression. Ann. Stat. 28(4), 1083–1104 (2000)
Article MathSciNet Google Scholar
Minami, K.: Estimating piecewise monotone signals. Electron. J. Stat. 14(1), 1508–1576 (2020)
Article MathSciNet Google Scholar
Phillips, D.L.: A technique for the numerical solution of certain integral equations of the first kind. J ACM (JACM) 9(1), 84–97 (1962)
Article MathSciNet Google Scholar
Rinaldo, A.: Properties and refinements of the fused lasso. Ann. Stat. 37(5B), 2922–2952 (2009)
Article MathSciNet Google Scholar
Robertson, T., Wright, F.T., Dykstra, R.L.: Order Restricted Statistical Inference. Wiley, New York (1988)
Stellato, B., Banjac, G., Goulart, P., Bemporad, A., Boyd, S.: OSQP: an operator splitting solver for quadratic programs. Math. Progr. Comput. 12(4), 637–672 (2020)
Article MathSciNet Google Scholar
Stout, Q.F.: Isotonic regression via partitioning. Algorithmica 66(1), 93–112 (2013)
Article MathSciNet Google Scholar
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996)
Article MathSciNet Google Scholar
Tibshirani, R.J., Taylor, J.: The solution path of the generalized lasso. Ann. Stat. 39(3), 1335–1371 (2011)
Article MathSciNet Google Scholar
Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 67(1), 91–108 (2005)
Article MathSciNet Google Scholar
Tibshirani, R.J., Hoefling, H., Tibshirani, R.: Nearly-isotonic regression. Technometrics 53(1), 54–61 (2011)
Article MathSciNet Google Scholar
Tikhonov, A.N., Goncharsky, A., Stepanov, V., Yagola, A.G.: Numerical Methods for the Solution of Ill-posed Problems, vol. 328. Springer, Dordrecht (1995)
Book Google Scholar
Wang, Y.-X., Sharpnack, J., Smola, A., Tibshirani, R.: Trend filtering on graphs. In: Artificial Intelligence and Statistics, pp. 1042–1050. PMLR (2015)
Wang, X., Ying, J., Cardoso, J.V.M., Palomar, D.P.: Efficient algorithms for general isotone optimization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 8575– 8583 (2022)
Yu, Z., Chen, X., Li, X.: A dynamic programming approach for generalized nearly isotonic optimization. Math. Program. Comput. 15, 1–31 (2022)

Download references

Acknowledgements

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundataion.

Funding

Open access funding provided by University of Vienna.

Author information

Authors and Affiliations

Department of Statistics and Operations Research, University of Vienna, Oskar-Morgenstern-Platz 1, 1090, Vienna, Austria
Vladimir Pastukhov

Authors

Vladimir Pastukhov
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

I am the only author of this paper.

Corresponding author

Correspondence to Vladimir Pastukhov.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Proofs of the results

Proof of Theorem 1

First, following the derivations of $\ell _{1}$ trend filtering and generalised lasso in Kim et al. (2009) and Tibshirani and Taylor (2011), respectively, we can write the optimization problem in (6) in the following way

$$\begin{aligned}{} & {} \underset{{\varvec{\beta }}, {\varvec{z}}}{\text {minimize}} \, \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} + \lambda _{F} ||{\varvec{z}}||_{1} + \lambda _{NI}||{\varvec{z}}||_{+} \\{} & {} \quad \text {subject to} \quad D {\varvec{\beta }} = {\varvec{z}} \in {\mathbb {R}}^{m}. \end{aligned}$$

Further, the Lagrangian is given by

$$\begin{aligned} L({\varvec{\beta }}, {\varvec{z}}, {\varvec{\nu }}){} & {} = \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} + \lambda _{F} ||{\varvec{z}}||_{1} \nonumber \\{} & {} \quad + \lambda _{NI}||{\varvec{z}}||_{+} + {\varvec{\nu }}^{T}(D {\varvec{\beta }} - {\varvec{z}}), \end{aligned}$$

(A1)

where ${\varvec{\nu }} \in {\mathbb {R}}^{m}$ is a dual variable.

Note that

$$\begin{aligned}{} & {} \underset{{\varvec{z}}}{\min } \big (\lambda _{F} ||{\varvec{z}}||_{1} + \lambda _{NI}||{\varvec{z}}||_{+} - {\varvec{\nu }}^{T}{\varvec{z}}\big )\\{} & {} \quad = {\left\{ \begin{array}{ll} 0, &{} \quad \text {if} \quad -\lambda _{F} {\varvec{1}} \le {\varvec{\nu }} \le (\lambda _{F} + \lambda _{NI}){\varvec{1}}, \\ -\infty , &{} \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned}{} & {} \underset{{\varvec{\beta }}}{\min } \left( \frac{1}{2} ||{\varvec{y}} - {\varvec{\beta }}||_{2}^{2} + {\varvec{\nu }}^{T} D {\varvec{\beta }}\right) \\{} & {} \quad = -\frac{1}{2}{\varvec{\nu }}^{T}DD^{T}{\varvec{\nu }} + {\varvec{y}}^{T}D^{T}{\varvec{\nu }}\\{} & {} \quad = - \frac{1}{2} ||{\varvec{y}} - D^{T}{\varvec{\nu }}||_{2}^{2} + \frac{1}{2}{\varvec{y}}^{T}{\varvec{y}}. \end{aligned}$$

Next, the dual function is given by

$$\begin{aligned} \begin{aligned} g(\nu ) ={}&\underset{{\varvec{\beta }}, {\varvec{z}}}{\min } \, L({\varvec{\beta }}, {\varvec{z}}, {\varvec{\nu }}) \\ =&{\left\{ \begin{array}{ll} -\frac{1}{2} ||{\varvec{y}} - D^{T}{\varvec{\nu }}||_{2}^{2} + \frac{1}{2}{\varvec{y}}^{T}{\varvec{y}}, \\ \qquad \qquad \qquad \text {if} \quad -\lambda _{F} {\varvec{1}} \le {\varvec{\nu }} \le (\lambda _{F} + \lambda _{NI}){\varvec{1}}, \\ -\infty , \qquad \qquad \qquad \text {otherwise}, \end{array}\right. } \end{aligned} \end{aligned}$$

and, therefore, the dual problem is

$$\begin{aligned}{} & {} \hat{{\varvec{\nu }}}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \underset{{\varvec{\nu }}}{\arg \max } \, g(\nu ) \\{} & {} \quad \text {subject to} \quad -\lambda _{F} {\varvec{1}} \le {\varvec{\nu }} \le (\lambda _{F} + \lambda _{NI}){\varvec{1}}, \end{aligned}$$

which is equivalent to

$$\begin{aligned}{} & {} \hat{{\varvec{\nu }}}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \underset{{\varvec{\nu }}}{\arg \min } \, \frac{1}{2}||{\varvec{y}} - D^{T}{\varvec{\nu }}||_{2}^{2} \\{} & {} \quad \text {subject to} \quad -\lambda _{F} {\varvec{1}} \le {\varvec{\nu }} \le (\lambda _{F} + \lambda _{NI}){\varvec{1}}. \end{aligned}$$

Lastly, taking first derivative of Lagrangian $L({\varvec{\beta }}, {\varvec{z}}, {\varvec{\nu }})$ with respect to ${\varvec{\beta }}$ we get the following relation between $\hat{{\varvec{\beta }}}^{FNI}(\lambda _{F}, \lambda _{NI})$ and $\hat{{\varvec{\nu }}}({\varvec{y}}, \lambda _{F}, \lambda _{NI})$

$$\begin{aligned} \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = {\varvec{y}} - D^{T} \hat{{\varvec{\nu }}}({\varvec{y}}, \lambda _{F}, \lambda _{NI}). \end{aligned}$$

$\Box $

Proof of Theorem 2

The proof is similar to the derivation of the solution of the fused lasso in Friedman et al. (2007). Nevertheless, for compleeteness of the paper we provide the proof for $\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI})$.

The subgradient equations (which are necessary and sufficient conditions for the solution of (5)) for $\beta _{{\varvec{i}}}$, with ${\varvec{i}}\in {\mathcal {I}}$, are

$$\begin{aligned} \begin{aligned}&g_{{\varvec{i}}}(\lambda _{L})\\&\quad = -(y_{{\varvec{i}}} - \beta _{{\varvec{i}}}) + \lambda _{NI}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E} q_{{\varvec{i}}, {\varvec{j}}} - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}q_{{\varvec{j}}, {\varvec{i}}}\right) \\&\qquad +\lambda _{F}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E}t_{{\varvec{i}}, {\varvec{j}}} - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}t_{{\varvec{j}}, {\varvec{i}}}\right) + \lambda _{L}s_{i} = 0, \end{aligned}\nonumber \\ \end{aligned}$$

(A2)

where

$$\begin{aligned}{} & {} q_{{\varvec{i}}, {\varvec{j}}}: {\left\{ \begin{array}{ll} = 1, &{} \text {if } \, \beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}> 0,\\ = 0, &{} \text {if } \, \beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}< 0,\\ \in [0,1], &{} \text {if } \, \beta _{{\varvec{i}}} =\beta _{{\varvec{j}}}, \end{array}\right. } \nonumber \\{} & {} \quad t_{{\varvec{i}}, {\varvec{j}}}: {\left\{ \begin{array}{ll} = 1, &{} \text {if } \, \beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}> 0,\\ = -1, &{} \text {if } \, \beta _{{\varvec{i}}} - \beta _{{\varvec{j}}}< 0,\\ \in [-1, 1], &{} \text {if } \, \beta _{{\varvec{i}}} =\beta _{{\varvec{j}}}, \end{array}\right. } \nonumber \\{} & {} \quad s_{{\varvec{i}}}: {\left\{ \begin{array}{ll} = 1, &{} \text {if } \, \beta _{{\varvec{i}}} > 0, \\ = -1, &{} \text {if } \, \beta _{{\varvec{i}}} < 0,\\ \in [-1, 1], &{} \text {if } \, \beta _{{\varvec{i}}} = 0. \end{array}\right. } \end{aligned}$$

(A3)

Next, let $q_{{\varvec{i}}, {\varvec{j}}}(\lambda _{L})$, $t_{{\varvec{i}}, {\varvec{j}}}(\lambda _{L})$ and $s_{{\varvec{i}}}(\lambda _{L})$ denote the values of the parameters defined above at some value of $\lambda _{L}$. Note, the values of $\lambda _{NI}$ and $\lambda _{F}$ are fixed. Therefore, if ${\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}},\lambda _{F}, 0, \lambda _{NI}) \ne 0$ for $s_{{\varvec{i}}}(0)$ we have

$$\begin{aligned} s_{{\varvec{i}}}(0) = {\left\{ \begin{array}{ll} 1, &{} \text {if } \, {\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}},\lambda _{F}, 0, \lambda _{NI}) > 0, \\ -1, &{} \text {if } \, {\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}},\lambda _{F}, 0, \lambda _{NI}) < 0,\\ \end{array}\right. } \end{aligned}$$

and for ${\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI}) = 0$ we can set $s_{{\varvec{i}}}(0) = 0$.

Next, let $\hat{{\varvec{\beta }}}^{ST}(\lambda _{L})$ denote the soft thresholding of $\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI})$, i.e.

$$\begin{aligned}{} & {} {\hat{\beta }}_{{\varvec{i}}}^{ST}(\lambda _{L}) \\{} & {} \quad = {\left\{ \begin{array}{ll} {\hat{\beta }}^{FLNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, 0, \lambda _{NI}) - \lambda _{L}, \\ \qquad \qquad \qquad \qquad \quad \text {if} \quad {\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI}) \ge \lambda _{L}, \\ 0, \qquad \qquad \qquad \qquad \text {if } |{\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0,\lambda _{NI})| \le \lambda _{L},\\ {\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI}) + \lambda _{L}, \\ \qquad \qquad \qquad \qquad \text {if} \quad {\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}},\lambda _{F}, 0, \lambda _{NI}) \le -\lambda _{L}. \end{array}\right. } \end{aligned}$$

The goal is to prove that $\hat{{\varvec{\beta }}}^{ST}(\lambda _{L})$ provides the solution to (14).

Note, analogously to the proof for the fused lasso estimator in Lemma A.1 at Friedman et al. (2007), if either ${\hat{\beta }}_{{\varvec{i}}}^{ST}(\lambda _{L}) \ne 0$ or ${\hat{\beta }}_{{\varvec{j}}}^{ST}(\lambda _{L}) \ne 0$, and ${\hat{\beta }}_{{\varvec{i}}}^{ST}(\lambda _{L}) < {\hat{\beta }}_{{\varvec{j}}}^{ST}(\lambda _{L})$ or ${\hat{\beta }}_{{\varvec{i}}}^{ST}(\lambda _{L}) > {\hat{\beta }}_{{\varvec{j}}}^{ST}(\lambda _{L})$, then we also have ${\hat{\beta }}_{{\varvec{i}}}^{ST}(0) < {\hat{\beta }}_{{\varvec{j}}}^{ST}(0)$ or ${\hat{\beta }}_{{\varvec{i}}}^{ST}(0) > {\hat{\beta }}_{{\varvec{j}}}^{ST}(0)$, respectively. Therefore, soft thresholding of $\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI})$ does not change the ordering of these pairs and we have $q_{{\varvec{i}}, {\varvec{j}}}(\lambda _{L}) = q_{{\varvec{i}},{\varvec{j}}}(0)$ and $t_{{\varvec{i}}, {\varvec{j}}}(\lambda _{L}) = t_{{\varvec{i}},{\varvec{j}}}(0)$. Next, if for some $({\varvec{i}}, {\varvec{j}})\in E$ we have ${\hat{\beta }}_{{\varvec{i}}}^{ST}(\lambda _{L}) = {\hat{\beta }}_{{\varvec{j}}}^{ST}(\lambda _{L}) = 0$, then $q_{{\varvec{i}}, {\varvec{j}}} \in [0,1]$ and $t_{{\varvec{i}}, {\varvec{j}}} \in [-1, 1]$, and, again, we can set $t_{{\varvec{i}}, {\varvec{j}}}(\lambda _{L}) = t_{{\varvec{i}}, {\varvec{j}}}(0)$, and $q_{{\varvec{i}}, {\varvec{j}}}(\lambda _{L}) = q_{{\varvec{i}}, {\varvec{j}}}(0)$.

Now let us insert ${\hat{\beta }}_{{\varvec{i}}}^{ST}(\lambda _{L})$ into subgradient equations (A2) and show that we can find $s_{{\varvec{i}}}(\lambda _{L}) \in [0,1]$, for all ${\varvec{i}}\in {\mathcal {I}}$.

First, assume that for some ${\varvec{i}}$ we have ${\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI}) \ge \lambda _{L}$. Then

$$\begin{aligned} \begin{aligned} g_{{\varvec{i}}}(\lambda _{L})&={} -(y_{{\varvec{i}}} - {\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI})) - \lambda _{L}\\&+ \lambda _{NI}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E} q_{{\varvec{i}}, {\varvec{j}}}(\lambda _{L}) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}q_{{\varvec{j}}, {\varvec{i}}}(\lambda _{L})\right) \\&\quad +\lambda _{F}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E}t_{{\varvec{i}}, {\varvec{j}}}(\lambda _{L}) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}t_{{\varvec{j}}, {\varvec{i}}}(\lambda _{L})\right) \\&\quad + \lambda _{L}s_{{\varvec{i}}}(\lambda _{L})\\&={} -(y_{{\varvec{i}}} - {\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI}))\\&\quad + \lambda _{NI}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E} q_{{\varvec{i}}, {\varvec{j}}}(0) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}q_{{\varvec{j}}, {\varvec{i}}}(0)\right) \\&\quad +s\lambda _{F}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E}t_{{\varvec{i}},{\varvec{j}}}(0) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}t_{{\varvec{j}}, {\varvec{i}}}(0)\right) \\&\quad + \lambda _{L}s_{{\varvec{i}}}(\lambda _{L}) - \lambda _{L} = 0. \end{aligned} \end{aligned}$$

Note, that

$$\begin{aligned} \begin{aligned}&-\left( y_{i} - {\hat{\beta }}_{i}^{FLNI}({\varvec{y}},\lambda _{F}, 0, \lambda _{NI})\right) \\&\quad + \lambda _{NI}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E} q_{{\varvec{i}}, {\varvec{j}}}(0) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}q_{{\varvec{j}}, {\varvec{i}}}(0)\right) \\&\quad +\lambda _{F}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E}t_{{\varvec{i}}, {\varvec{j}}}(0) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}t_{{\varvec{j}}, {\varvec{i}}}(0)\right) = 0, \end{aligned} \end{aligned}$$

because $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) \equiv \hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}},\lambda _{F}, 0, \lambda _{NI})$.

Therefore, if $s_{i}(\lambda _{L}) = \textrm{sign}{{\hat{\beta }}_{{\varvec{i}}}^{ST}(\lambda _{L})} = 1$, then $g_{{\varvec{i}}}(\lambda _{L}) = 0$.

The proof for the case when ${\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI}) \le -\lambda _{L}$ is similar and one can show that $g_{{\varvec{i}}}(\lambda _{L}) = 0$ if $s_{{\varvec{i}}}(\lambda _{L}) = \textrm{sign}{{\hat{\beta }}_{{\varvec{i}}}^{ST}(\lambda _{L})} = -1$.

Second, assume that $|{\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}}, \lambda _{F}, 0, \lambda _{NI})| < \lambda _{L}$. Then, ${\hat{\beta }}_{{\varvec{i}}}^{ST}(\lambda _{L}) = 0$, and

$$\begin{aligned} \begin{aligned} g_{{\varvec{i}}}(\lambda _{L})&={} -y_{{\varvec{i}}} + \lambda _{NI}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E} q_{{\varvec{i}}, {\varvec{j}}}(\lambda _{L}) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}q_{{\varvec{j}}, {\varvec{i}}}(\lambda _{L})\right) \\&\quad +\lambda _{F}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E}t_{{\varvec{i}}, {\varvec{j}}}(\lambda _{L}) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}t_{{\varvec{j}}, {\varvec{i}}}(\lambda _{L})\right) \\&\quad + \lambda _{L}s_{{\varvec{i}}}(\lambda _{L})\\&={} -y_{{\varvec{i}}} + \lambda _{NI}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E} q_{{\varvec{i}}, {\varvec{j}}}(0)- \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}q_{{\varvec{j}}, {\varvec{i}}}(0)\right) \\&\quad +\lambda _{F}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E}t_{{\varvec{i}}, {\varvec{j}}}(0) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}t_{{\varvec{j}}, {\varvec{i}}}(0)\right) \\&\quad + \lambda _{L}s_{{\varvec{i}}}(\lambda _{L}) = 0. \end{aligned} \end{aligned}$$

Next, if we let $s_{{\varvec{i}}}(\lambda _{L}) = {\hat{\beta }}_{{\varvec{i}}}^{FLNI}({\varvec{y}},\lambda _{F}, 0, \lambda _{NI})/\lambda _{L}$, then, again, we have

$$\begin{aligned} \begin{aligned} g_{{\varvec{i}}}(\lambda _{L})&={}-(y_{i} - {\hat{\beta }}_{i}^{FLNI}({\varvec{y}},\lambda _{F}, 0, \lambda _{NI}))\\&\quad + \lambda _{NI}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E} q_{{\varvec{i}}, {\varvec{j}}}(0) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}q_{{\varvec{j}}, {\varvec{i}}}(0)\right) \\&\quad +\lambda _{F}\left( \sum _{{\varvec{j}}:({\varvec{i}}, {\varvec{j}})\in E}t_{{\varvec{i}}, {\varvec{j}}}(0) - \sum _{{\varvec{j}}:({\varvec{j}}, {\varvec{i}})\in E}t_{{\varvec{j}}, {\varvec{i}}}(0)\right) = 0, \end{aligned} \end{aligned}$$

Therefore, we have proved that $\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI}) = \hat{{\varvec{\beta }}}^{ST}(\lambda _{L})$. $\Box $

Proof of Proposition 3

First, from Tibshirani et al. (2011) the solution to the nearly-isotonic problem is given by

$$\begin{aligned} \hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}) = {\varvec{y}} - D^{T} \hat{{\varvec{v}}}({\varvec{y}}, \lambda _{NI}), \end{aligned}$$

with

$$\begin{aligned} \hat{{\varvec{v}}}({\varvec{y}}, \lambda _{NI})= & {} \underset{{\varvec{v}} \in {\mathbb {R}}^{n-1}}{\arg \min } \, \frac{1}{2}||{\varvec{y}} - D^{T}{\varvec{v}}||_{2}^{2} \quad \text {subject to} \\{} & {} {\varvec{0}} \le {\varvec{v}} \le \lambda _{NI}{\varvec{1}}, \end{aligned}$$

and from Tibshirani and Taylor (2011) it follows

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F}) = {\varvec{y}} - D^{T} \hat{{\varvec{w}}}({\varvec{y}}, \lambda _{F}), \end{aligned}$$

with

$$\begin{aligned} \hat{{\varvec{w}}}({\varvec{y}}, \lambda _{F})= & {} \underset{{\varvec{w}} \in {\mathbb {R}}^{n-1}}{\arg \min } \, \frac{1}{2}||{\varvec{y}} - D^{T}{\varvec{w}}||_{2}^{2}\\{} & {} \quad \text {subject to} \quad -\lambda _{F} {\varvec{1}} \le {\varvec{w}} \le \lambda _{F}{\varvec{1}}. \end{aligned}$$

Second, let us introduce a new variable ${\varvec{v}}^{*} = {\varvec{v}} - \frac{\lambda _{NI}}{2} {\varvec{1}}$. Then

$$\begin{aligned} \hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}) = {\varvec{y}} -D^{T}\frac{\lambda _{NI}}{2} {\varvec{1}} - D^{T} \hat{{\varvec{v}}}^{*}({\varvec{y}}, \lambda _{NI}), \end{aligned}$$

where

$$\begin{aligned} \hat{{\varvec{v}}}^{*}({\varvec{y}}, \lambda _{NI}) =&\underset{{\varvec{v}}^{*} \in {\mathbb {R}}^{n-1}}{\arg \min } \, \frac{1}{2}||{\varvec{y}} - D^{T}\frac{\lambda _{NI}}{2} {\varvec{1}} - D^{T}{\varvec{v}}^{*}||_{2}^{2} \\&\text {s. t.} \quad -\frac{\lambda _{NI}}{2} {\varvec{1}} \le {\varvec{v}}^{*} \le \frac{\lambda _{NI}}{2} {\varvec{1}}. \end{aligned}$$

Therefore, we have proved that $\hat{{\varvec{\beta }}}^{NI}({\varvec{y}}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{F}({\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI})$.

The proof for the fused lasso nearly-isotonic estimator is the same with the change of variable ${\varvec{u}}^{*} = {\varvec{u}} + D^{T}\lambda _{F} {\varvec{1}}$ in (16) and (17) for the proof of the first equality in (20) and with ${\varvec{u}}^{*} = {\varvec{u}} - \frac{\lambda _{NI}}{2} {\varvec{1}}$ for the second equality.

Next, we prove the result for the case of fused lasso nearly-isotonic approximator. From Theorem 2 we have

$$\begin{aligned}{} & {} {\hat{\beta }}^{FLNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI})\\{} & {} \quad = {\left\{ \begin{array}{ll} {\hat{\beta }}^{FNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{NI}) - \lambda _{L}, &{} \quad \text {if} \quad {\hat{\beta }}^{FNI}_{{\varvec{i}}} \ge \lambda _{L}, \\ 0, &{} \quad \text {if} \quad |{\hat{\beta }}^{FNI}_{{\varvec{i}}}| \le \lambda _{L},\\ {\hat{\beta }}^{FNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{NI}) + \lambda _{L}, &{} \quad \text {if} \quad {\hat{\beta }}^{FNI}_{{\varvec{i}}} \le -\lambda _{L}, \end{array}\right. } \end{aligned}$$

for ${\varvec{i}} \in {\mathcal {I}}$.

Further, using (20) we have

$$\begin{aligned}{} & {} {\hat{\beta }}^{FLNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI}) = {\hat{\beta }}^{F}_{{\varvec{i}}}({\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \\{} & {} \frac{1}{2}\lambda _{NI} + \lambda _{F}) - \lambda _{L}, \end{aligned}$$

if ${\hat{\beta }}^{F}_{{\varvec{i}}}({\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F}) \ge \lambda _{L}$,

$$\begin{aligned} {\hat{\beta }}^{FLNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI}) = 0, \end{aligned}$$

if $|{\hat{\beta }}^{F}_{{\varvec{i}}}({\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F})| \le \lambda _{L}$,

$$\begin{aligned} {\hat{\beta }}^{FLNI}_{{\varvec{i}}}({\varvec{y}},\lambda _{F}, \lambda _{L},\lambda _{NI})= & {} {\hat{\beta }}^{F}_{{\varvec{i}}}({\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \\{} & {} \times \frac{1}{2}\lambda _{NI} + \lambda _{F}) + \lambda _{L}, \end{aligned}$$

if ${\hat{\beta }}^{F}_{{\varvec{i}}}({\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F}) \le -\lambda _{L}$.

Therefore, we obtain

$$\begin{aligned} \begin{aligned}&\hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}) \\&\quad = \underset{{\varvec{\beta }} \in {\mathbb {R}}^{n}}{\arg \min } \, \frac{1}{2} \left\| {\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}} - {\varvec{\beta }}\right\| _{2}^{2} + \left( \frac{1}{2}\lambda _{NI} + \lambda _{F}\right) ||D{\varvec{\beta }}||_{1}\\&\qquad + \lambda _{L} ||{\varvec{\beta }}||_{1} \equiv \hat{{\varvec{\beta }}}^{FL}\left( {\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F}, \lambda _{L}\right) . \end{aligned} \end{aligned}$$

$\Box $

Let us consider the following cases separately.

Case 1: $\lambda _{NI}$and $\lambda _{F}$are fixed and $\lambda _{L}^{*} > \lambda _{L}$. The result of the proposition for this case follows directly from Theorem 2.

Case 2: $\lambda _{F}$ and $\lambda _{L}$ are fixed and $\lambda _{NI}^{*} > \lambda _{NI}$. Let us consider the fused nearly-isotonic regression and write the subgradient equations

$$\begin{aligned}{} & {} g_{i}(\lambda _{NI}) = -(y_{i} - \beta _{i}) + \lambda _{NI}(q_{i}(\lambda _{NI}) - q_{i-1}(\lambda _{NI}))\\{} & {} \quad + \lambda _{F}(t_{i}(\lambda _{NI}) - t_{i-1}(\lambda _{NI})) = 0, \end{aligned}$$

where $q_{i}$ and $t_{i}$, with $i = 1, \dots , n$, are defined in (A3), and, analogously to the proof of Theorem 2, $q(\lambda _{NI})$, $t(\lambda _{NI})$ denote the values of the parameters defined above at some value of $\lambda _{NI}$.

Assume that for $\lambda _{NI}$ in the solution $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})$ we have a following constant region

$$\begin{aligned} \begin{aligned}&{\hat{\beta }}_{j-1}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) \ne {\hat{\beta }}_{j}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \dots \\&= {\hat{\beta }}_{j+k}^{FNI}({\varvec{y}},\lambda _{F}, \lambda _{NI}) \ne {\hat{\beta }}_{j+k +1}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}), \end{aligned} \end{aligned}$$

(A4)

and in the same way as in Tibshirani et al. (2011) for $\lambda _{NI}^{*}$ we consider the subset of the subgradient equations

$$\begin{aligned} \begin{aligned} g_{i}(\lambda _{NI})&= -(y_{i} - \beta _{i}) + \lambda _{NI}^{*}(q_{i}(\lambda _{NI}^{*}) - q_{i-1}(\lambda _{NI}^{*}))\\&\quad + \lambda _{F}(t_{i}(\lambda _{NI}^{*}) - t_{i-1}(\lambda _{NI}^{*})) = 0, \end{aligned} \end{aligned}$$

(A5)

with $i = j, \dots , k$, and show that there exists the solution for which (A4) holds, $q_{i} \in [0, 1]$ and $t_{i} \in [-1, 1]$.

Note first that as $\lambda _{NI}$ increases, (A4) holds until the merge with other groups happens, which means that $q_{j-1}, q_{j+k} \in \{0, 1\}$ and $t_{j-1}, t_{j+k} \in \{-1, 1\}$ will not change their values until the merge of this constant region. Also, as it follows from (A3), for $i \in [j, j+ k]$ the value of $t_{i}$ is in $[-1,1]$. Therefore, without any violation of the restrictions on $t_{i}$ we can assume that $t_{i}(\lambda _{NI}^{*}) = t_{i}(\lambda )$ for any $i \in [j, j + k - 1]$.

Next, taking pairwise differences between subgradient equations for $\lambda _{NI}$ we have

$$\begin{aligned}{} & {} \lambda _{NI}A\tilde{{\varvec{q}}}(\lambda _{NI}) + \lambda _{F}A\tilde{{\varvec{t}}}(\lambda _{NI}) \\{} & {} \quad = D\tilde{{\varvec{y}}} + \lambda _{NI}{\varvec{c}}(\lambda _{NI}) + \lambda _{F}{\varvec{d}}(\lambda _{NI}), \end{aligned}$$

where D is displayed at Fig. 1,

$$\begin{aligned} A = \begin{bmatrix} 2 &{} -1 &{} 0 &{} \dots &{} 0 &{} 0 &{} 0 \\ -1 &{} 2 &{} -1 &{} \dots &{} 0 &{} 0 &{} 0 \\ \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} 0 &{} \dots &{} -1 &{} 2 &{} -1\\ 0 &{} 0 &{} 0 &{} \dots &{} 0 &{} -1 &{} 2 \end{bmatrix}, \end{aligned}$$

(A6)

and

$$\begin{aligned} \begin{aligned} \tilde{{\varvec{y}}}&= (y_{j}, \dots , y_{j + k}),\\ \tilde{{\varvec{q}}}(\lambda _{NI})&= (q_{j}(\lambda _{NI}), \dots , q_{j + k - 1}(\lambda _{NI})),\\ \tilde{{\varvec{t}}}(\lambda _{NI})&= (t_{j}(\lambda _{NI}), \dots , t_{j + k - 1}(\lambda _{NI})),\\ {\varvec{c}}(\lambda _{NI})&= (q_{j-1}(\lambda _{NI}), 0, \dots , 0, q_{j+k}(\lambda _{NI})),\\ {\varvec{d}}(\lambda _{NI})&= (t_{j-1}(\lambda _{NI}), 0, \dots , 0, t_{j+k}(\lambda _{NI})). \end{aligned} \end{aligned}$$

Since A is invertible we have

$$\begin{aligned}{} & {} \lambda _{NI}\tilde{{\varvec{q}}}(\lambda _{NI}) + \lambda _{F}\tilde{{\varvec{t}}}(\lambda _{NI})\\{} & {} \quad = A^{-1}D\tilde{{\varvec{y}}} + \lambda _{NI}A^{-1}{\varvec{c}}(\lambda _{NI}) + \lambda _{F}A^{-1}{\varvec{d}}(\lambda _{NI}), \end{aligned}$$

and, since $\tilde{{\varvec{q}}}(\lambda _{NI})$ and $\tilde{{\varvec{t}}}(\lambda _{NI})$ provide the solution to the subgradient equations (A5), then

$$\begin{aligned} -\lambda _{F} \le \lambda _{NI}\tilde{{\varvec{q}}}(\lambda _{NI}) + \lambda _{F}\tilde{{\varvec{t}}}(\lambda _{NI}) \le \lambda _{NI} + \lambda _{F}. \end{aligned}$$

(A7)

Next, as pointed out at Friedman et al. (2007) and Tibshirani et al. (2011)

$$\begin{aligned}{} & {} (A^{-1})_{i,1} = (n-i+1)/(n+1) \quad \text {and} \\{} & {} (A^{-1})_{i,n} = i/(n+1), \end{aligned}$$

then, one can show that

$$\begin{aligned} -\lambda _{F}{\varvec{1}}\preceq & {} \lambda _{NI}A^{-1}{\varvec{c}}(\lambda _{NI}) + \lambda _{F}A^{-1}{\varvec{d}}(\lambda _{NI}) \nonumber \\\preceq & {} \lambda _{NI}{\varvec{1}} + \lambda _{F}{\varvec{1}}. \end{aligned}$$

(A8)

Further, let us consider the case of $\lambda _{NI}^{*} > \lambda _{NI}$. Then we have

$$\begin{aligned}{} & {} \lambda _{NI}^{*}\tilde{{\varvec{q}}}(\lambda _{NI}^{*}) + \lambda _{F}\tilde{{\varvec{t}}}(\lambda _{NI}^{*})\\{} & {} \quad = A^{-1}D\tilde{{\varvec{y}}} + \lambda _{NI}^{*}A^{-1}{\varvec{c}}(\lambda _{NI}^{*}) + \lambda _{F}A^{-1}{\varvec{d}}(\lambda _{NI}^{*}). \end{aligned}$$

Recall, above we set $\tilde{{\varvec{t}}}(\lambda _{NI}^{*}) = \tilde{{\varvec{t}}}(\lambda _{NI})$, and $q_{j-1}, q_{j+k}, t_{j-1}$ and $t_{j+k}$ does not change their values until the merge, which means that ${\varvec{c}}(\lambda _{NI}^{*}) = {\varvec{c}}(\lambda _{NI})$, and ${\varvec{d}}(\lambda _{NI}^{*}) = {\varvec{d}}(\lambda _{NI})$.

Therefore, the subgradient equations for $\lambda _{NI}^{*}$ can be written as

$$\begin{aligned}{} & {} \lambda _{NI}^{*}\tilde{{\varvec{q}}}(\lambda _{NI}^{*}) + \lambda _{F}\tilde{{\varvec{t}}}(\lambda _{NI})\\{} & {} \quad = A^{-1}D\tilde{{\varvec{y}}} + \lambda _{NI}^{*}A^{-1}{\varvec{c}}(\lambda _{NI}) + \lambda _{F}A^{-1}{\varvec{d}}(\lambda _{NI}). \end{aligned}$$

Next, since the term $A^{-1}D\tilde{{\varvec{y}}}$ is not changed, $-\lambda _{F} \le \lambda _{F}\tilde{{\varvec{t}}}(\lambda _{NI}) \le \lambda _{F}$, and

$$\begin{aligned} -\lambda _{F}{\varvec{1}} \preceq \lambda _{NI}^{*}A^{-1}{\varvec{c}}(\lambda _{NI}) + \lambda _{F}A^{-1}{\varvec{d}}(\lambda _{NI}) \preceq \lambda _{NI}^{*}{\varvec{1}} + \lambda _{F}{\varvec{1}}, \end{aligned}$$

then we have

$$\begin{aligned} {\varvec{0}} \preceq \tilde{{\varvec{q}}}(\lambda _{NI}^{*}) \preceq {\varvec{1}}. \end{aligned}$$

Therefore we proved that ${\hat{\beta }}_{i}^{FNI}({\varvec{y}},{\varvec{\lambda }}^{*}) = {\hat{\beta }}_{i+1}^{FNI}({\varvec{y}},{\varvec{\lambda }}^{*})$. Since ${\hat{\beta }}_{i}^{FLNI}({\varvec{y}},{\varvec{\lambda }}^{*})$ is given by soft thresholding of ${\hat{\beta }}_{i}^{FNI}({\varvec{y}},{\varvec{\lambda }}^{*})$, then ${\hat{\beta }}_{i}^{FLNI}({\varvec{y}},{\varvec{\lambda }}^{*}) = {\hat{\beta }}_{i+1}^{FLNI}({\varvec{y}},{\varvec{\lambda }}^{*})$ for $i \in [j, k]$.

Case 3: $\lambda _{NI}$ and $\lambda _{L}$ are fixed and $\lambda _{F}^{*} > \lambda _{F}$. The proof for this case is virtually identical to the proof for the Case 2. In this case we assume that $q_{i}(\lambda _{F}^{*}) = q_{i}(\lambda _{2 })$ for any $i \in [j, j + k-1]$. Next, $q_{j-1}, q_{j+k}, t_{j-1}$ and $t_{j+k}$ do not change their values until the merge, which, again, means that ${\varvec{c}}(\lambda _{F}^{*}) = {\varvec{c}}(\lambda _{F})$, and ${\varvec{d}}(\lambda _{F}^{*}) = {\varvec{d}}(\lambda _{F})$. Finally, we can show that

$$\begin{aligned} -{\varvec{1}} \preceq \tilde{{\varvec{t}}}(\lambda _{F}^{*}) \preceq {\varvec{1}}. \end{aligned}$$

$\Box $

Proof of Theorem 5

For some fixed $\lambda _{F}$ and $\lambda _{NI}$, let us write subgradient equations for the fused lasso nearly-isotonic approximator:

$$\begin{aligned} g_{i} = -(y_{i} - \beta _{i}) + \lambda _{NI}(q_{i} - q_{i-1}) + \lambda _{F}(t_{i} - t_{i-1}) = 0, \end{aligned}$$

for $i = 1, \dots , n$, where $q_{i}$ and $t_{i}$, with $i = 1, \dots , n-1$, are given by

$$\begin{aligned}{} & {} q_{i}: {\left\{ \begin{array}{ll} = 1, &{} \text {if } \, \beta _{i} - \beta _{i+1}> 0,\\ = 0, &{} \text {if } \, \beta _{i} - \beta _{i+1}< 0,\\ \in [0,1], &{} \text {if } \, \beta _{i} =\beta _{i+1}, \end{array}\right. } \nonumber \\{} & {} \quad t_{i}: {\left\{ \begin{array}{ll} = 1, &{} \text {if } \, \beta _{i} - \beta _{i+1} > 0,\\ = -1, &{} \text {if } \, \beta _{i} - \beta _{i+1} < 0,\\ \in [-1, 1], &{} \text {if } \, \beta _{i} =\beta _{i+1}, \end{array}\right. } \end{aligned}$$

(A9)

and $q_{0} = q_{n} = t_{0} = t_{n} = 0$.

Second, assume that in the solution $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})$ there are K distinct constant regions ${\mathcal {A}}(\lambda _{F}, \lambda _{NI}) = \{ A_{1}, \dots , A_{K} \}$, and $f_{j}$ and $l_{j}$ are the first and last indices, respectively, in the region $A_{j}$. Therefore, using the telescoping sums, for $k\in A_{j}$ the solution $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})$ can be written as

$$\begin{aligned} {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI})= & {} \frac{\sum _{i=f_{j}}^{l_{j}}y_{i}}{|A_{j}|} - \lambda _{NI}\frac{q_{f_{j+1}} - q_{l_{j}}}{|A_{j}|} \\{} & {} - \lambda _{F} \frac{t_{f_{j+1}} - t_{l_{j}}}{|A_{j}|}, \end{aligned}$$

with $|A_{j}| = \#\{j: y_{j}\in A_{j} \}$.

We, first, prove that

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}). \end{aligned}$$

Let us fix some $\lambda _{F}$, and take $\lambda _{NI}^{*} > \lambda _{NI}$ such that $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{*})$ has the same constant regions as $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})$. Therefore, analogously to the case of one dimensional nearly-isotonic regression in Tibshirani et al. (2011), for a fixed $\lambda _{NI}$ the solution is linear function of $\lambda _{NI}$ in between the values of $\lambda _{NI}$ (which are called knots) where some constant regions merge.

Assume now that $\lambda _{NI} = 0$. Next, assume that in the solution $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, 0)$ there are K(0) distinct constant regions ${\mathcal {A}}(\lambda _{F}, 0) = \{ A_{1}, \dots , A_{K} \}$, and $f_{j}$ and $l_{j}$ are the first and last indices, respectively, in those region $A_{j}$.

Next, we increase the value of $\lambda _{NI}^{*} > \lambda _{NI}$ and assume that we still have the same constant regions as for $\lambda _{F}$ and $\lambda _{NI}$, i.e.

$$\begin{aligned} {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{*})= & {} \frac{\sum _{i=f_{j}}^{l_{j}}y_{i}}{|A_{j}|} - \lambda _{NI}^{*}\frac{q_{f_{j+1}} - q_{l_{j}}}{|A_{j}|} \\{} & {} - \lambda _{F} \frac{t_{f_{j+1}} - t_{l_{j}}}{|A_{j}|}, \end{aligned}$$

i.e. at the value $\lambda _{NI}^{*}$ not merge has happened, which means that

$$\begin{aligned} {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{*}) \ne {\hat{\beta }}^{FNI}_{k'}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{*}) \end{aligned}$$

if k and $k'$ are not in the same $A_{j}\in {\mathcal {A}}(\lambda _{F}, 0)$. Next, recall that for any $k \in A_{j}$ we have

$$\begin{aligned} {\hat{\beta }}^{F}_{k}({\varvec{y}}, \lambda _{F})= & {} {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, 0) \nonumber \\{} & {} = \frac{\sum _{i=f_{j}}^{l_{j}}y_{i}}{|A_{j}|} - \lambda _{F} \frac{t_{f_{j+1}} - t_{l_{j}}}{|A_{j}|}. \end{aligned}$$

(A10)

Therefore, ${\hat{\beta }}^{F}_{k}({\varvec{y}}, \lambda _{F})$ has the same constant regions as ${\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, 0)$.

Then, recall that

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{NI}(\hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F}), \lambda _{NI}) \end{aligned}$$

Next, let us choose $\lambda '_{NI} < \lambda _{NI}^{*} $ such that, again, the constant regions of $\hat{{\varvec{\beta }}}^{NI}({\hat{\beta }}^{F}_{k}({\varvec{y}}, \lambda _{F}), \lambda '_{NI})$ are the same as for ${\hat{\beta }}^{F}_{k}({\varvec{y}}, \lambda _{F})$ and ${\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI})$. Then, for $k\in A_{j}$ the solution is given by

$$\begin{aligned} {\hat{\beta }}^{F\rightarrow NI}_{k}({\varvec{y}}, \lambda _{F}, \lambda '_{NI}) = \frac{\sum _{i=f_{j}}^{l_{j}}{\hat{\beta }}^{F}_{i}}{|A_{j}|} - \lambda '_{NI}\frac{q_{f_{j+1}} - q_{l_{j}}}{|A_{j}|}, \end{aligned}$$

and using (A10) we get

$$\begin{aligned}{} & {} {\hat{\beta }}^{F\rightarrow NI}_{k}({\varvec{y}}, \lambda _{F}, \lambda '_{NI}) \\{} & {} \quad = \frac{\sum _{i=f_{j}}^{l_{j}}y_{i}}{|A_{j}|} - \lambda '_{NI}\frac{q_{f_{j+1}} - q_{l_{j}}}{|A_{j}|} - \lambda _{F} \frac{t_{f_{j+1}} - t_{l_{j}}}{|A_{j}|}, \end{aligned}$$

which means that the solution is linear function of $\lambda '_{NI}$ until some constant regions merge.

Note now

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda '_{NI}) = \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda '_{NI}) \end{aligned}$$

and, obviously, this equality holds at least until constant regions merge. Let $\lambda _{NI}^{(1)}$ be the first value of $\lambda _{NI}$ when the first merge happens. At the value $\lambda _{NI}^{(1)}$ the equality

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)}) = \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)}) \end{aligned}$$

holds, since $\hat{{\varvec{\beta }}}^{NI}$ is continuous in $\lambda _{NI}$.

Assume for simplicity of notation that at $\lambda _{NI} = \lambda _{NI}^{(1)}$ the constant region $A_{j}$ merges with constant region $A_{j+1}$. Therefore, for $k\in A_{j}\cup A_{j+1}$ we have

$$\begin{aligned}{} & {} {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)}) \\{} & {} \quad = \frac{\sum _{i=f_{j}}^{l_{j+1}}y_{i}}{|A_{j}| + |A_{j+1}|} - \lambda _{NI}^{(1)}\frac{q_{f_{j+2}} - q_{l_{j}}}{|A_{j}|+ |A_{j+1}|}\\{} & {} \qquad - \lambda _{F} \frac{t_{f_{j+2}} - t_{l_{j}}}{|A_{j}|+ |A_{j+1}|}, \end{aligned}$$

and for $k \in A_{m} \ne A_{j}\cup A_{j+1}$:

$$\begin{aligned}{} & {} {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)}) \\{} & {} \quad = \frac{\sum _{i=f_{m}}^{l_{m}}y_{i}}{|A_{m}|} - \lambda _{NI}^{(1)}\frac{q_{f_{m+1}} - q_{l_{m}}}{|A_{m}|} - \lambda _{F} \frac{t_{f_{m+1}} - t_{l_{m}}}{|A_{m}|}. \end{aligned}$$

Further, for $\hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)})$ for $k\in A_{j}\cup A_{j+1}$ we have

$$\begin{aligned}{} & {} {\hat{\beta }}^{F\rightarrow NI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)}) \\{} & {} \quad = \frac{\sum _{i=f_{j}}^{l_{j+1}} {\hat{\beta }}^{F}_{i} }{|A_{j}| + |A_{j+1}|} - \lambda _{NI}^{(1)}\frac{q_{f_{j+2}} - q_{l_{j}}}{|A_{j}|+ |A_{j+1}|}\\{} & {} \quad = {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)}), \end{aligned}$$

and for $k \in A_{m} \ne A_{j}\cup A_{j+1}$:

$$\begin{aligned}{} & {} {\hat{\beta }}^{F\rightarrow NI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)}) \\{} & {} \quad = \frac{\sum _{i=f_{m}}^{l_{m}} {\hat{\beta }}^{F}_{i} }{|A_{m}|} - \lambda _{NI}^{(1)}\frac{q_{f_{m+1}} - q_{l_{m}}}{|A_{m}|} \\{} & {} \quad = {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)}). \end{aligned}$$

Next, let us increase $\lambda _{NI}^{(1)}$ by $\delta \lambda $ so that no merge in $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI} + \delta \lambda )$ happens. Then, for $k\in A_{j}\cup A_{j+1}$ we have

$$\begin{aligned} \begin{aligned}&{\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)} + \delta \lambda ) \\&\quad = \frac{\sum _{i=f_{j}}^{l_{j+1}}y_{i}}{|A_{j}| + |A_{j+1}|} - (\lambda _{NI}^{(1)} + \delta \lambda ) \frac{q_{f_{j+2}} - q_{l_{j}}}{|A_{j}|+ |A_{j+1}|} \\&\qquad - \lambda _{F} \frac{t_{f_{j+2}} - t_{l_{j}}}{|A_{j}|+ |A_{j+1}|}, \end{aligned} \end{aligned}$$

and for $k \in A_{m} \ne A_{j}\cup A_{j+1}$:

$$\begin{aligned}{} & {} {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)} + \delta \lambda )\\{} & {} \quad = \frac{\sum _{i=f_{m}}^{l_{m}}y_{i}}{|A_{m}|} - (\lambda _{NI}^{(1)} + \delta \lambda )\frac{q_{f_{m+1}} - q_{l_{m}}}{|A_{m}|} - \lambda _{F} \frac{t_{f_{m+1}} - t_{l_{m}}}{|A_{m}|}. \end{aligned}$$

Further, in the case of $\hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)})$ we increase $\lambda $ by $\delta \lambda ' < \delta \lambda $ and, therefore, we have for $k\in A_{j}\cup A_{j+1}$:

$$\begin{aligned}{} & {} {\hat{\beta }}^{F\rightarrow NI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)} + \delta \lambda ') \\{} & {} \quad = \frac{\sum _{i=f_{j}}^{l_{j+1}} {\hat{\beta }}^{F}_{i} }{|A_{j}| + |A_{j+1}|} - (\lambda _{NI}^{(1)}+\delta \lambda ')\frac{q_{f_{j+2}} - q_{l_{j}}}{|A_{j}|+ |A_{j+1}|}, \end{aligned}$$

and for $k \in A_{m} \ne A_{j}\cup A_{j+1}$:

$$\begin{aligned}{} & {} {\hat{\beta }}^{F\rightarrow NI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)} + \delta \lambda ') \\{} & {} \quad = \frac{\sum _{i=f_{m}}^{l_{m}} {\hat{\beta }}^{F}_{i} }{|A_{m}|} - (\lambda _{NI}^{(1)}+\delta \lambda ')\frac{q_{f_{m+1}} - q_{l_{m}}}{|A_{m}|}. \end{aligned}$$

Therefore, before the next merge happens we have the following relation between the estimators $ \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})$ and $\hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI})$

$$\begin{aligned}{} & {} {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)} + \delta \lambda )\\{} & {} \quad = {\hat{\beta }}^{F\rightarrow NI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)} + \delta \lambda ') \\{} & {} \qquad + (\delta \lambda - \delta \lambda ')\frac{q_{f_{j+2}} - q_{l_{j}}}{|A_{j}|+ |A_{j+1}|}, \end{aligned}$$

if $k\in A_{j}\cup A_{j+1}$, and

$$\begin{aligned}{} & {} {\hat{\beta }}^{FNI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)} + \delta \lambda ) \\{} & {} \quad = {\hat{\beta }}^{F\rightarrow NI}_{k}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(1)} + \delta \lambda ') + (\delta \lambda - \delta \lambda ')\frac{q_{f_{m+1}} - q_{l_{m}}}{|A_{m}|}, \end{aligned}$$

for $k \in A_{m} \ne A_{j}\cup A_{j+1}$.

We have proved that before the second merge we have

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) \end{aligned}$$

and at the value of $\lambda _{NI}^{(2)}$ when the second merge of some constant regions happens we have

$$\begin{aligned} \hat{{\varvec{\beta }}}^{F\rightarrow NI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(2)}) = \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}^{(2)}) \end{aligned}$$

by the continuity.

We can continue this process until the last knot point in the path. Therefore we proved the equality of the estimators. The proof of

$$\begin{aligned} \hat{{\varvec{\beta }}}^{NI\rightarrow F}({\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{FNI}({\varvec{y}}, \lambda _{F}, \lambda _{NI}). \end{aligned}$$

is virtually the same with $q_{j}$ suitably changed to $t_{j}$ and $\lambda _{NI}$ to $\lambda _{F}$ and using the propertis of fused lasso from Hoefling (2010). $\Box $

Proof of Theorem 6

First, for the fused estimator $\hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F})$ let

$$\begin{aligned} K^{F}({\varvec{y}}, \lambda _{F}) = \#\{\text {fused groups in } \hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F})\}. \end{aligned}$$

Then, as it follows from Tibshirani and Taylor (2011), for $\hat{{\varvec{\beta }}}^{F}({\varvec{y}}, \lambda _{F})$ we have

$$\begin{aligned} {\mathbb {E}}[K^{F}({\varvec{Y}}, \lambda _{F})] = df\left( \hat{{\varvec{\beta }}}^{F}({\varvec{Y}}, \lambda _{F})\right) . \end{aligned}$$

Next, from Proposition 3, it follows

$$\begin{aligned}{} & {} \hat{{\varvec{\beta }}}^{FNI}\left( {\varvec{y}}, \lambda _{F}, \lambda _{NI}) = \hat{{\varvec{\beta }}}^{F}({\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F}\right) . \end{aligned}$$

Therefore, using the property of covariance we have

$$\begin{aligned} \begin{aligned}&df(\hat{{\varvec{\beta }}}^{FNI}( {\varvec{Y}}, \lambda _{F}, \lambda _{NI})) \\&\quad =\sum _{i=1}^{n}\textrm{Cov}\left[ {\hat{\beta }}^{FNI}_{i}({\varvec{Y}}, \lambda _{F}, \lambda _{NI}), Y_{i}\right] \\&\quad = \sum _{i=1}^{n}\textrm{Cov}\left[ {\hat{\beta }}^{F}_{i}\left( {\varvec{Y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F}\right) , Y_{i}\right] \\&\quad = \sum _{i=1}^{n}\textrm{Cov}\Bigg [{\hat{\beta }}^{F}_{i}\left( {\varvec{Y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F}\right) , Y_{i} \\&\qquad - \frac{\lambda _{NI}}{2} [D^{T}{\varvec{1}}]_{i}\Bigg ] \\&\quad ={\mathbb {E}}\left[ K^{F}\left( {\varvec{Y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F}\right) \right] \\&\quad \equiv {\mathbb {E}}\Bigg [K^{FNI}({\varvec{Y}}, \lambda _{F}, \lambda _{NI})\Bigg ], \end{aligned} \end{aligned}$$

where $[{\varvec{a}}]_{i}$ denotes i-th element in the vector ${\varvec{a}}\in {\mathbb {R}}^{n}$.

Next, we prove the result for the fused lasso nearly-isotonic approximator. From Proposition 3 we have

$$\begin{aligned}{} & {} \hat{{\varvec{\beta }}}^{FLNI}({\varvec{y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI}) \\{} & {} \quad = \hat{{\varvec{\beta }}}^{FL}\left( {\varvec{y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F}, \lambda _{L}\right) . \end{aligned}$$

Next, for the fused lasso $\hat{{\varvec{\beta }}}^{FL}({\varvec{y}}, \lambda _{F}, \lambda _{L})$ defined in (2) let

$$\begin{aligned} K^{FL}({\varvec{y}}, \lambda _{F}, \lambda _{L})= & {} \#\{\text {non-zero fused groups in } \\{} & {} \quad \hat{{\varvec{\beta }}}^{FL}({\varvec{y}}, \lambda _{F}, \lambda _{L})\}, \end{aligned}$$

and from Tibshirani and Taylor (2011) it follows

$$\begin{aligned} {\mathbb {E}}\left[ K^{FL}({\varvec{Y}}, \lambda _{F}, \lambda _{L})\right] = df\left( \hat{{\varvec{\beta }}}^{FL}({\varvec{Y}}, \lambda _{F}, \lambda _{L})\right) . \end{aligned}$$

Further, again, using the property of the covariance, we have

$$\begin{aligned} \begin{aligned}&df\left( \hat{{\varvec{\beta }}}^{FLNI} ({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})\right) \\&\quad ={} \sum _{i=1}^{n}\textrm{Cov}\left[ {\hat{\beta }}^{FLNI}_{i}({\varvec{Y}}, \lambda _{F},\lambda _{L}, \lambda _{NI}), Y_{i}\right] \\&\quad =\sum _{i=1}^{n}\textrm{Cov}\left[ {\hat{\beta }}^{FL}_{i}\left( {\varvec{Y}}{-} \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} {+} \lambda _{F}, \lambda _{L}\right) , Y_{i}\right] \\&\quad =\sum _{i=1}^{n}\textrm{Cov}\Bigg [{\hat{\beta }}^{FL}_{i}\left( {\varvec{Y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F},\lambda _{L}\right) , Y_{i} \\&\qquad - \frac{\lambda _{NI}}{2} [D^{T}{\varvec{1}}]_{i}\Bigg ]\\&\quad ={\mathbb {E}}\left[ K^{FL}\left( {\varvec{Y}} - \frac{\lambda _{NI}}{2} D^{T}{\varvec{1}}, \frac{1}{2}\lambda _{NI} + \lambda _{F},\lambda _{L}\right) \right] \\&\quad \equiv {\mathbb {E}}\left[ K^{FLNI}({\varvec{Y}}, \lambda _{F}, \lambda _{L}, \lambda _{NI})\right] . \end{aligned} \end{aligned}$$

Lastly, we note that the proof for the unbiased estimator of the degrees of freedom for nearly-isotonic regression, given in Tibshirani et al. (2011), can be done in the same way as in the current proof, using the relation (19) and, again, the result of the paper Tibshirani and Taylor (2011) for the fusion estimator $\hat{{\varvec{\beta }}}^{FLNI}({\varvec{Y}}, \lambda _{F})$. $\Box $

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Pastukhov, V. Fused lasso nearly-isotonic signal approximation in general dimensions. Stat Comput 34, 120 (2024). https://doi.org/10.1007/s11222-024-10432-6

Download citation

Received: 19 October 2023
Accepted: 12 April 2024
Published: 22 May 2024
DOI: https://doi.org/10.1007/s11222-024-10432-6

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Fused lasso nearly-isotonic signal approximation in general dimensions

Abstract

Similar content being viewed by others

Group Fused Lasso

Bayesian fused lasso modeling via horseshoe prior

A dual symmetric Gauss-Seidel technique-based proximal ADMM for robust fused lasso estimation

Explore related subjects

1 Introduction

1.1 Notation

1.2 General statement of the problem

1.3 Organisation of the paper

2 Solution to the fused lasso nearly-isotonic signal approximator

Theorem 1

Theorem 2

3 Properties of the fused lasso nearly-isotonic signal approximator

Proposition 3

Proposition 4

Theorem 5

4 Degrees of freedom

Theorem 6

5 Computational aspects, simulation study and application to a real data set

6 Conclusion and discussion

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's Note

Appendix A Proofs of the results

Appendix A Proofs of the results

Proof of Theorem 1

Proof of Theorem 2

Proof of Proposition 3

Proof of Theorem 5

Proof of Theorem 6

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation