
1 Introduction

In various tasks in digital archiving of cultural heritage, including 3D reconstruction [1, 2], numerical optimization plays a central role. More specifically, we often optimize some \(\ell _p\)-norm of a cost vector (equivalently, its p-th power \(\ell _p^p\)) or a combination of different vector norms. For example, in compressive sensing, an unconstrained form of Lasso (least absolute shrinkage and selection operator) [3]

$$\begin{aligned} \min _x \Vert A x - b \Vert _2^2 + \lambda \Vert x \Vert _1 \end{aligned}$$
(1)

is used for reconstructing sparse signal x while ensuring data fitting. As another example, Tikhonov regularization (or ridge regression) [4],

$$\begin{aligned} \min _x \Vert A x - b \Vert _2^2 + \lambda \Vert \varGamma x\Vert _2^2, \end{aligned}$$
(2)

appears in image restoration, super-resolution, and image deblurring. These objective functions can be further augmented by additional \(\ell _p\)-norm terms that encode constraints specific to particular problems. For example, the elastic net [5] is defined as

$$\begin{aligned} \min _x \Vert A x - b \Vert _2^2 + \lambda _1 \Vert x\Vert _1 + \lambda _2 \Vert x\Vert _2^2 \end{aligned}$$
(3)

by regularizing the solution using both \(\ell _1\)- and \(\ell _2\)-norm terms. Some special cases are known to have closed-form solutions, e.g., the minimizer of Tikhonov regularization (2) is given by \(\hat{x} = (A^T A + \lambda \varGamma ^T \varGamma )^{-1} A^T b\), or can be transformed into a simpler expression, e.g., the elastic net problem (3) can be rewritten as an equivalent Lasso problem (1) with some augmentation [5].
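As a concrete illustration, the Tikhonov closed-form solution above amounts to a single linear solve. The following is a minimal NumPy sketch on an arbitrary random problem (the sizes, seed, and regularization weight are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)
Gamma = np.eye(20)          # Gamma = I gives standard ridge regression
lam = 0.1

# Closed-form minimizer of ||Ax - b||_2^2 + lam * ||Gamma x||_2^2, cf. Eq. (2)
x_hat = np.linalg.solve(A.T @ A + lam * Gamma.T @ Gamma, A.T @ b)
```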

Generally, these \(\ell _p\) \((p \ge 1)\) unconstrained minimization problems can be solved by any convex optimization method. To gain greater computational performance, problem-specific structures are typically exploited to design tailored solution methods. For example, least-squares problems that consist of \(\ell _2\)-norms can be solved analytically, and for large-scale problems, conjugate gradient methods [6] are employed for faster computation. When the rank of the design matrix is small enough, randomized singular value decomposition (R-SVD) [7] may be employed for further acceleration. It is also understood that \(\ell _1\) minimization problems can be transformed into linear programs [8], which can be efficiently solved by an interior point method [9]. On the other hand, it is still of broad interest to improve the performance of the general norm approximation problem, because practitioners frequently need to test various formulations with different norms when designing computer vision applications. For example, one might initially formulate a regression problem with an \(\ell _2\)-norm but later add an \(\ell _1\) or \(\ell _2\) regularizer to stabilize the solution.

This motivates us to develop a fast solver for the generalized norm approximation problem:

$$\begin{aligned} \min _x \sum _{k=1}^K \lambda _k \left\| A_k x-b_k \right\| ^{p_k}_{p_k}, \end{aligned}$$
(4)

where \(k \in \{1, \dots , K\}\) indexes the terms, \(A_k \in \mathbb {R}^{m_k \times n}\) and \(b_k \in \mathbb {R}^{m_k}\) are the design matrix and constant vector that define the k-th linear objective, and the overall objective function is a linear combination of the \(p_k\)-th powers of \(\ell _{p_k}\)-norms weighted by \(\lambda _k\). Our method is built upon a simple yet powerful iteratively reweighted least-squares (IRLS) scheme, in which the problem reduces to iteratively solving a linear system derived as the normal equation of a weighted sum of squares of the terms.

In this paper, we present a fast method for deriving an approximate solution to this problem that outperforms state-of-the-art solution methods such as [10, 11]. Our method exploits the fact that the solution is updated gradually over IRLS iterations, and achieves acceleration by passing the previous estimate as the initial guess to an LSQR solver [12] at each iteration. The proposed method is faster and more stable than previous state-of-the-art approaches, as shown in the experimental validation. In addition, to our knowledge a solution method for the general expression (4) has not been explicitly described in the literature; we show that the general form can be solved in a unified manner regardless of the number of terms and the particular \(\ell _p\)-norm objectives, and that it benefits from the proposed acceleration.

2 Related Works

Early studies of IRLS date back to the 1960s [13], where it was developed for approximating a Chebyshev (\(\ell _{\infty }\)) norm. It was later extended to approximate general \(\ell _p\)-norm terms [14]. While the early studies focused on convex approximations with \(p \ge 1\), mostly with a single term, the focus later shifted to the non-convex case \(p < 1\). Prior to that shift, sparse recovery using the IRLS scheme was known as FOCUSS [15], which has proven useful for robust estimation tasks. With the rise of compressive sensing and sparse recovery, norm approximation with \(p<1\) has been extensively studied. Chartrand and Yin [16] introduced a regularizer \(\epsilon \) augmenting the IRLS weight that decreases from large to small over iterations, effectively smoothing the objective function and thereby avoiding local minima. Daubechies et al. [10] proposed an alternative method for updating the weight over iterations and showed its convergence property in sparse recovery. Candès et al. [17] introduced an iteratively reweighted \(\ell _1\) minimization method, which repeatedly solves an \(\ell _1\) minimization problem, for further enhancing sparsity. Wipf and Nagarajan [18] provided an extensive analysis of \(\ell _2\) and \(\ell _1\) reweighting schemes [16, 17] and made a distinction between separable (i.e., the weight of a particular coefficient depends only on that coefficient in the previous iteration) and non-separable iterative reweighting schemes. The development of effective numerical algorithms for sparse recovery is still an active research topic, and there is broad interest in the area.

With these theoretical and algorithmic developments, the IRLS scheme has expanded its application domain. It has been used for various signal processing applications, such as FIR filter design [19], image deblurring with \(\ell _p\)-norm (\(p=0.8\)) minimization [20, 21], denoising based on TV regularization [22], and super-resolution [23]. The IRLS scheme has also been used for nuclear norm minimization [24] and structured sparse recovery [25]. These results have widened the use of the IRLS scheme across even more diverse applications. This paper is motivated by the increasing need to accelerate the minimization of general sums of \(\ell _p^p\) terms in various computer vision applications.

Because the problem (4) is unconstrained and convex when \(p_k \ge 1\), it can also be solved by a family of efficient quasi-Newton methods, such as the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method, although such general convex optimizers are typically not optimal in terms of performance.

3 Fast General Norm Approximation

This section describes the proposed method for general norm approximation using IRLS. In Sect. 3.1, we begin by briefly reviewing IRLS and showing how the generalized form of Eq. (4) can be solved via IRLS. We then describe an acceleration method using LSQR in Sect. 3.2.

3.1 IRLS for Norm Approximation Problems

Since the 1960s, it has been understood that norm approximation problems can be solved via IRLS [13, 14]. In the IRLS framework, norm approximation problems are cast as iteratively solving weighted least-squares problems. Let us take the example of minimizing the p-th power of the \(\ell _p\)-norm of a real-valued vector \(Ax - b\):

$$\begin{aligned} \min _x f(x), \quad f(x) = \left\| A x-b \right\| _{p}^{p}. \end{aligned}$$
(5)

The above can be expressed using a weighted sum of squares of the vector as:

$$\begin{aligned} \min _x f(x), \quad f(x) = \left\| W (A x - b) \right\| _2^2 , \end{aligned}$$
(6)

with a proper diagonal weight matrix W whose elements are all non-negative; W is the matrix of weights to be determined in IRLS. The problem (6) is a quadratic program; therefore the minimizer \(x^*\) is attained when

$$\begin{aligned} \frac{\partial }{\partial x} \left\| W (A x- b)\right\| _2^2= 2 p \left( A^T W^T W A x - A^T W^T W b\right) = 0. \end{aligned}$$
(7)

Therefore, the approximate solution \(x^*\) becomes

$$\begin{aligned} x^* = (A^T W^T W A)^{-1} A^T W^T W b. \end{aligned}$$
(8)

In the IRLS scheme, the weight matrix W is iteratively refined so that the weighted least-squares objective better approximates the original \(\ell _p\) objective. Let \(w_i\) and \(e_i\) denote the i-th diagonal element of W and the i-th element of the residual vector \(e = A x - b\), respectively. Since

$$\begin{aligned} \left\| A x- b \right\| _p^p = \sum _i |e_i|^{p-2} |e_i|^2 = \sum _i w_i^2 |e_i|^2, \end{aligned}$$
(9)

at each iteration the weight matrix element \(w_i\) is updated as \(w_i \leftarrow |e_i|^{p/2-1}\) if \(e_i \ne 0\), and \(w_i \leftarrow 1/\varepsilon \) otherwise, where \(\varepsilon \) is a sufficiently small positive value. Typically, the weight matrix W is initialized as an identity matrix.
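The weight update can be written compactly; the following is a minimal sketch of the rule above (the function name and the value of \(\varepsilon \) are our own choices):

```python
import numpy as np

def irls_weights(e, p, eps=1e-8):
    """IRLS weights: w_i = |e_i|**(p/2 - 1) if e_i != 0, else 1/eps."""
    w = np.full(e.shape, 1.0 / eps)
    nz = e != 0
    w[nz] = np.abs(e[nz]) ** (p / 2.0 - 1.0)
    return w
```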

IRLS for General Norm Approximation. We have seen how \(\ell _p\)-norm minimization for a single term is achieved by IRLS. We now describe its extension to multiple terms in order to apply IRLS to the general norm approximation problem (4). For a general norm approximation problem

$$\begin{aligned} \min _x f(x), \quad f(x) = \sum _{k=1}^K \lambda _k \left\| A_k x-b_k \right\| ^{p_k}_{p_k}, \end{aligned}$$
(10)

there are K terms, each defined by an \(\ell _{p_k}\)-norm (where \(p_k\) may differ across terms) and weighted by \(\lambda _k\). With K weight matrices \(W_k\), it can be approximated by

$$\begin{aligned} f(x) = \sum _{k=1}^K \lambda _k \left( A_k x - b_k \right) ^T W_k^T W_k \left( A_k x -b_k \right) , \end{aligned}$$
(11)

in a similar manner to the single-term case. Differentiating f(x) with respect to x and setting the derivative to zero yields the normal equation

$$\begin{aligned} \frac{\partial f(x)}{\partial x} = \sum _{k=1}^K 2 p_k \lambda _k \left( A_k^T W_k^T W_k A_k x - A_k^T W_k^T W_k b_k \right) = 0. \end{aligned}$$
(12)

Therefore, the minimizer \(x^*\) can be obtained by

$$\begin{aligned} x^* = \left( \sum _{k=1}^K p_k \lambda _k A_k^T W_k^T W_k A_k \right) ^{-1} \left( \sum _{k=1}^K p_k \lambda _k A_k^T W_k^T W_k b_k\right) . \end{aligned}$$
(13)

The pseudo-code of IRLS for general norm approximation is summarized in Algorithm 1.

Algorithm 1. IRLS for general norm approximation.
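Since the algorithm box is not reproduced here, the following NumPy sketch spells out the loop as described above, reusing the irls_weights sketch from earlier; the function name, the fixed iteration count, and the direct solve of Eq. (13) in place of an arbitrary linear solver are our own choices:

```python
import numpy as np

def irls_general(terms, n, iters=50, eps=1e-8):
    """Approximate min_x sum_k lam_k * ||A_k x - b_k||_{p_k}^{p_k}, cf. Eq. (4).

    `terms` is a list of (lam_k, A_k, b_k, p_k) tuples.
    """
    x = None
    for _ in range(iters):
        M = np.zeros((n, n))
        v = np.zeros(n)
        for lam, A, b, p in terms:
            if x is None:
                w = np.ones(len(b))              # W_k initialized as identity
            else:
                w = irls_weights(A @ x - b, p, eps)
            WA = w[:, None] * A                  # W_k A_k with diagonal W_k
            M += p * lam * (WA.T @ WA)           # sum_k p_k lam_k A_k^T W_k^T W_k A_k
            v += p * lam * (WA.T @ (w * b))      # sum_k p_k lam_k A_k^T W_k^T W_k b_k
        x = np.linalg.solve(M, v)                # Eq. (13)
    return x
```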

3.2 Acceleration of IRLS

The major bottleneck of the IRLS algorithm is the need to solve a weighted least-squares problem at every iteration until convergence. In particular, when the size of matrix \(A_k\) is large, the computation cost increases significantly if an analytic solution method like Eq. (13) is used. To accelerate the whole algorithm, the weighted least-squares problem must be solved efficiently. We exploit the fact that the solution x and the weight matrices \(W_k\) change gradually over IRLS iterations, and use the previous estimate of x to efficiently update the solution using LSQR.

Previously, conjugate gradient methods have been applied in a similar spirit to the least-squares problem within an iterative framework [25, 26]. They are effective when the design matrix A in \(\Vert Ax-b\Vert _2^2\) is well-conditioned. Convergence of the conjugate gradient method is characterized by the following bound on the relative error [27]:

$$\begin{aligned} \frac{\left\| x-x^{(n)}\right\| _A}{\left\| x-x^{(0)}\right\| _A} \le 2 \left( \frac{\sqrt{\kappa }-1}{\sqrt{\kappa }+1}\right) ^n, \end{aligned}$$
(14)

where \(\left\| \cdot \right\| _A\) indicates the A-norm, and \(\kappa \) is the condition number of A, calculated as \(\kappa = \sigma _{\text {max}}(A) / \sigma _{\text {min}}(A)\), where \(\sigma _{\text {max}}(A)\) and \(\sigma _{\text {min}}(A)\) are the maximum and minimum singular values of A, respectively. The upper bound on the relative error is governed by \(\kappa \): a larger \(\kappa \) makes convergence from \(x^{(0)}\) to \(x^{(n)}\) over n iterations slower. Unfortunately, for the weighted least-squares problem \(WAx = Wb\) in IRLS, the condition number of the matrix WA naturally increases as the iterations proceed [26]. To illustrate this issue, we ran a preliminary experiment applying IRLS to \(\min _x \Vert Ax - b\Vert _1\) ten times with randomly generated matrices \(A \in \mathbb {R}^{500 \times 400}\) and vectors \(b \in \mathbb {R}^{500}\), and also in a larger-scale setting with \(A \in \mathbb {R}^{1000 \times 800}\) and \(b \in \mathbb {R}^{1000}\). Figure 1 shows the variation of the average condition number over iterations, plotted on a log scale. As seen in the figure, the condition number grows exponentially over iterations, which makes conjugate gradient methods slower and less stable in later iterations.
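This trend can be reproduced qualitatively with a short script; the following is our own sketch for the smaller \(\ell _1\) setting (the seed, the clamping of near-zero residuals, and the iteration count are our choices), not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 400))
b = A @ rng.standard_normal(400)
b[rng.choice(500, 50, replace=False)] *= -1       # flip 10% of signs as outliers

x = np.zeros(400)
for it in range(15):
    e = A @ x - b
    w = np.maximum(np.abs(e), 1e-8) ** -0.5       # p = 1 weights, clamped near zero
    WA = w[:, None] * A
    print(it, np.linalg.cond(WA))                 # condition number of W A per iteration
    x = np.linalg.solve(WA.T @ WA, WA.T @ (w * b))
```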

Fig. 1. Growth of the condition number of A in each iteration. The plots show the average condition numbers over ten trials for each setting.

To overcome this issue, previous approaches use preconditioning: multiplying by a matrix P called a preconditioner yields an equivalent least-squares problem whose matrix has a smaller condition number [25, 26, 28]. However, the preconditioner P is problem-dependent [25]; therefore, it is not straightforward to incorporate the preconditioned conjugate gradient method into the generalized norm approximation problem.

To avoid these problems, we use the LSQR method [12], which is a stable iterative method for ill-conditioned least-squares problems [29, 30]. In LSQR, the least-squares problem is reduced to another least-squares problem involving a bidiagonal matrix, and the reduced problem is solved by QR factorization. To accelerate the LSQR computation in the IRLS framework, we use a “warm start” strategy, taking the previous estimate of the solution as the initial guess for the next iteration. The pseudo-code of LSQR for obtaining the solution \(x^{(i+1)}\) at the \((i+1)\)-th iteration from the previous estimate \(x^{(i)}\) in our context is shown in Algorithm 2.

Algorithm 2. LSQR with warm start.
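Algorithm 2 follows the standard LSQR recursion started from \(x^{(i)}\). The same effect can be obtained with an off-the-shelf LSQR routine by solving for the correction against the residual of the previous estimate; the following wrapper around SciPy's lsqr is our own sketch, not the paper's implementation:

```python
import numpy as np
from scipy.sparse.linalg import lsqr

def lsqr_warm_start(A, b, x0, atol=1e-8, btol=1e-8, iter_lim=None):
    """Solve min_x ||Ax - b||_2 warm-started at x0.

    LSQR itself starts from zero, so we solve for the correction dx against the
    residual b - A x0 and return x0 + dx; a good x0 leaves a small residual and
    therefore requires few LSQR iterations.
    """
    r0 = b - A @ x0
    dx = lsqr(A, r0, atol=atol, btol=btol, iter_lim=iter_lim)[0]
    return x0 + dx
```

For the multi-term problem (4), the K weighted blocks \(\sqrt{p_k \lambda _k} W_k A_k\) and \(\sqrt{p_k \lambda _k} W_k b_k\) can be stacked vertically into a single least-squares problem before each such call.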

4 Performance Evaluation

To evaluate the computational efficiency of the proposed method, we test the algorithm on the following norm minimization problems:

$$\begin{aligned} \text {(Problem 1)} \qquad&\min _x \left\| A_1 x - b_1\right\| _1, \end{aligned}$$
(15)
$$\begin{aligned} \text {(Problem 2)} \qquad&\min _x \left\| A_2 x - b_2\right\| _2^2 + \left\| A_3 x - b_3\right\| _1, \end{aligned}$$
(16)

where \(A_1 \in \mathbb {R}^{500 \times 400}, b_1 \in \mathbb {R}^{500}\) and \(A_2, A_3 \in \mathbb {R}^{1000 \times 800}, b_2, b_3 \in \mathbb {R}^{1000}\). The matrices \(A_k\) and the solution x are randomly generated, and each vector \(b_k\) is computed as \(b_k \leftarrow A_k x\). To add outliers to the systems, the signs of \(10\%\) of the elements in each \(b_k\) are flipped.
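In the notation of (4), Problem 2 is simply two terms. A sketch of the data generation and term list (the seed and distributions are our own choices; irls_general refers to the sketch in Sect. 3.1):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 1000, 800
A2, A3 = rng.standard_normal((m, n)), rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b2, b3 = A2 @ x_true, A3 @ x_true
for b in (b2, b3):                          # flip the signs of 10% of each b_k
    b[rng.choice(m, m // 10, replace=False)] *= -1

# Problem 2 as a list of (lambda_k, A_k, b_k, p_k) terms
terms = [(1.0, A2, b2, 2.0), (1.0, A3, b3, 1.0)]
# x_hat = irls_general(terms, n)
```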

Table 1. Computation times and residuals of each method applied to Problems 1 and 2.

We implement the following methods and compare their computation times on a PC equipped with an Intel Core i7 930 @ 2.8 GHz and 24 GB of memory.

  • L-BFGS

  • IRLS by QR decomposition with column pivoting (IRLS-QR)

  • IRLS by Jacobi-preconditioned Conjugate Gradient method (IRLS-CG)

  • IRLS by Jacobi-preconditioned Conjugate Gradient method with warm start (IRLS-CG-WS)

  • IRLS by LSQR (IRLS-LSQR)

  • IRLS by LSQR with warm start (IRLS-LSQR-WS) (proposed method)

We use ALGLIB [31] for L-BFGS and the Eigen [32] matrix library for matrix operations. Because L-BFGS does not converge well on Problems 1 and 2, which include non-smooth \(\ell _1\)-norms, we approximate \(\left\| A_1 x - b_1\right\| _1\) by the smooth surrogate \(\sum _i \sqrt{(A_1 x-b_1 )_i^2 + \gamma }\) with a sufficiently small positive value \(\gamma = 10^{-8}\) when applying L-BFGS.
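For reference, with SciPy's L-BFGS-B in place of ALGLIB, the smoothed objective and its gradient could be written as follows (a sketch under that smoothing, not the paper's implementation):

```python
import numpy as np
from scipy.optimize import minimize

def smoothed_l1(x, A, b, gamma=1e-8):
    """Smooth surrogate of ||Ax - b||_1: sum_i sqrt(r_i^2 + gamma), with its gradient."""
    r = A @ x - b
    s = np.sqrt(r * r + gamma)
    return s.sum(), A.T @ (r / s)

# res = minimize(smoothed_l1, x0=np.zeros(A1.shape[1]), args=(A1, b1),
#                jac=True, method="L-BFGS-B")   # A1, b1 as in Problem 1
```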

Fig. 2. Variation of computation times over iterations. Our method benefits from the warm start strategy and the computation time quickly drops at the early stage of iterations.

Fig. 3. Variation of residuals over iterations.

Table 1 summarizes the average computation times and residuals over ten trials, and Figs. 2 and 3 show the computation times and residuals over iterations when solving Problems 1 and 2, respectively. As shown in Table 1, the IRLS-based methods are faster and more accurate than L-BFGS, even though we use a relaxed tolerance for L-BFGS for faster convergence. Although IRLS-QR solves the smaller problem quickly with the smallest residual, its computation time grows rapidly as the matrix size increases.

The effectiveness of the warm start strategy is clearly seen for both the conjugate gradient and LSQR variants of IRLS, which converge about 20 times faster with it. As shown in Fig. 2, while the computation times of the methods without warm start increase drastically as the iterations proceed, they are significantly reduced with warm start. Figure 3 also shows that the warm start strategy is effective in reducing the residual, by providing a good initial guess for the ill-conditioned least-squares problems at later iterations.

5 Applications

The expression of the general norm approximation (4) offers the flexibility to treat diverse objective functions in a unified manner, and such problems can generally benefit from the proposed efficient computation method. As example use cases, we show two applications in this section: photometric stereo in Sect. 5.1 and surface reconstruction from normals in Sect. 5.2.

5.1 Photometric Stereo

Photometric stereo is a method for estimating the surface normals of a scene from its appearance variations under different lightings. Let us assume that we have an \(f \times 3\) light direction matrix L that contains f distinct light directions as row vectors. A scene is illuminated under these light directions, yielding a corresponding observation vector \(o \in \mathbb {R}_+^f\) for each pixel. Assuming that the scene reflectance obeys Lambert's law, the image formation model of a pixel can be expressed as

$$\begin{aligned} o = Ln, \end{aligned}$$
(17)

where n is the surface normal vector, scaled by the diffuse albedo, that we wish to estimate. When the number of light directions is greater than three \((f > 3)\), Lambertian photometric stereo [33] determines the scaled surface normal by the least-squares approximate solution as

$$\begin{aligned} n^* = \mathop {\mathrm {argmin}}_n \Vert Ln - o\Vert _2^2. \end{aligned}$$
(18)

The solution method for the above problem is straightforward and does not even require our method, but it can still be represented as a special case of (4).
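For completeness, the least-squares solution (18) for all pixels at once is a single call on a synthetic example (the light directions, normals, and names below are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
f, num_pixels = 40, 1000
L = rng.standard_normal((f, 3))              # stand-in light direction matrix
N_true = rng.standard_normal((3, num_pixels))
O = L @ N_true                               # Lambertian observations, Eq. (17)

N = np.linalg.lstsq(L, O, rcond=None)[0]     # scaled normals, Eq. (18)
albedo = np.linalg.norm(N, axis=0)
normals = N / np.maximum(albedo, 1e-12)      # unit surface normals
```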

In reality, scene reflectance may contain specularities, which can be regarded as unmodelled outliers. This motivated previous work [34, 35] to use \(\ell _1\)-norm minimization for robust estimation:

$$\begin{aligned} n^* = \mathop {\mathrm {argmin}}_n \Vert Ln - o \Vert _1. \end{aligned}$$
(19)

This corresponds to minimizing the residual \(e = o - Ln\), since the above can be rewritten as the problem \(\min _n \Vert e\Vert _1\) s.t. \(e = o-Ln\). This hard constraint can be relaxed into a regularizer, as in [34, 35],

$$\begin{aligned} n^* = \mathop {\mathrm {argmin}}_{n, e} \lambda _1 \Vert o - Ln - e \Vert _2^2 + \Vert e \Vert _1 \end{aligned}$$
(20)

which allows a certain magnitude of Gaussian error in the constraint.

In a special case where we have an external method for determining the surface normal, e.g., a normal obtained from depth measured by structured light, and can use it as a prior for stabilizing the solution, the photometric stereo problem can be formulated as

$$\begin{aligned} n^* = \mathop {\mathrm {argmin}}_n \Vert Ln - o \Vert _1 + \lambda _2 \Vert n - n_g\Vert _2^2, \end{aligned}$$
(21)

in which \(n_g\) is the surface normal obtained by the external method.

These four problem settings (18)–(21) can all be represented by the general norm approximation (4), and our method is applicable to any of them. To demonstrate this, we render 40 images each of the Bunny and Caesar scenes under different light directions and solve the four problems. We set \(\lambda _1 = \lambda _2 = 10^{6}\). \(n_g\) in (21) is generated by adding \(1\%\) Gaussian noise to every element of the ground-truth normals. Figure 4 shows the estimated surface normals and error maps of the proposed IRLS-LSQR-WS method. Putting aside the variations in error due to the different formulations, it shows that our method is applicable to diverse formulations.
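To make the mapping to (4) concrete, setting (21) for a single pixel is just two terms; a sketch using the general IRLS loop from Sect. 3.1 (all variable names and the synthetic inputs are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((40, 3))                 # stand-in light matrix (f = 40)
n_true = np.array([0.3, 0.4, 0.866])
o = L @ n_true                                   # observations for one pixel
n_g = n_true + 0.01 * rng.standard_normal(3)     # noisy prior normal

lam2 = 1e6
terms = [(1.0, L, o, 1.0),                       # ||L n - o||_1 data term
         (lam2, np.eye(3), n_g, 2.0)]            # lam2 * ||n - n_g||_2^2 prior term
# n_hat = irls_general(terms, 3)                 # sketch from Sect. 3.1
```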

Fig. 4. Surface normal estimates and error maps with photometric stereo. (a)–(d) correspond to the settings (18)–(21). “Error” indicates the angular error in degrees, and “Time” corresponds to the computation time.

Fig. 5. Surface reconstruction from normal maps by the proposed method (IRLS-LSQR-WS).

5.2 Surface Reconstruction from Normals

Once the surface normal is obtained by photometric stereo or shape from shading, one may wish to reconstruct a surface from the normals, e.g., by [36]. From a surface normal \(n = (n_x, n_y, n_z)^T\) defined in the image coordinates (u, v), the gradients \(g_u\) and \(g_v\) are computed as

$$\begin{aligned} g_u (u,v) = \frac{n_x(u,v)}{n_z(u,v)}, \quad g_v (u,v) = \frac{n_y(u,v)}{n_z(u,v)}. \end{aligned}$$
(22)

Let \(G_u, G_v \in \mathbb {R}^{m \times n}\) denote the matrices of \(g_u\) and \(g_v\), respectively. These gradients correspond to the first-order derivatives of the surface \(Z \in \mathbb {R}^{m \times n}\) to be reconstructed. Therefore, vectorizing Z into \(z \in \mathbb {R}^{mn}\) and stacking the vectorized \(G_u\) and \(G_v\) into \(g \in \mathbb {R}^{2mn}\), the relationship can be written with a differentiation matrix \(D \in \mathbb {R}^{2mn \times mn}\) as

$$\begin{aligned} Dz \approx g, \end{aligned}$$
(23)

where D is composed of first-order difference blocks built from the identity matrix I and scaled by the pixel size h. Similar to [37], we consider outliers in the normals that are 10 times larger than the true normals and that corrupt \(10\%\) of the pixels. In this setting, we use \(\ell _1\)-residual minimization to reconstruct the surface in the presence of outliers:

$$\begin{aligned} \min _z \left\| Dz - g \right\| _1. \end{aligned}$$
(24)

This problem, again, is a special case of the general norm approximation problem (4), and our method can be applied to derive the solution z. We use three scenes, Bunny, Dragon, and Happy Buddha, as targets for testing this scenario. For comparison, we also show the result of surface reconstruction by \(\ell _2\)-norm minimization as a reference.
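The following sketch shows one common construction of D (forward differences via Kronecker products on the row-major vectorization of Z) and how (24) maps onto the general solver; the paper's exact block form of D and its boundary handling are not specified here, so this construction is an assumption:

```python
import numpy as np
import scipy.sparse as sp

def grad_operator(m, n, h=1.0):
    """Forward-difference D in R^{2mn x mn} acting on z = Z.ravel() (row-major)."""
    def fd(k):
        D = sp.diags([-np.ones(k), np.ones(k - 1)], [0, 1], format="lil")
        D[k - 1, :] = 0                          # no forward neighbour at the boundary
        return sp.csr_matrix(D) / h
    Du = sp.kron(sp.identity(m), fd(n))          # derivative along u (within a row)
    Dv = sp.kron(fd(m), sp.identity(n))          # derivative along v (across rows)
    return sp.vstack([Du, Dv]).tocsr()

# Gu, Gv: m-by-n gradient maps from Eq. (22)
# g = np.concatenate([Gu.ravel(), Gv.ravel()])
# D = grad_operator(m, n)
# z = irls_general([(1.0, D.toarray(), g, 1.0)], m * n)   # Eq. (24); dense only for small images
```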

Figure 5 shows the reconstructed surfaces, the \(\ell _2\)-residuals from the ground truth, and the computation times. The accuracy indicates the strength of \(\ell _1\)-residual minimization, but more importantly, our method is capable of handling any of these formulations thanks to the generalized form of norm minimization (4).

6 Conclusions

We presented a fast general norm approximation method that is applicable to diverse problem settings in computer vision. The proposed method (IRLS-LSQR-WS) was assessed in comparison to other state-of-the-art techniques and shows favorable computational efficiency and accuracy at the same time. In addition to the numerical tests, we showed application scenarios, taking photometric stereo and surface reconstruction as examples, to illustrate the usefulness of the general norm approximation.

During the experiments, we found that the proposed method is advantageous over IRLS-CG-WS in terms of stability: IRLS-CG-WS occasionally becomes unstable when the condition number of the problem grows rapidly, while the proposed method does not suffer from this issue. On the other hand, IRLS-CG-WS tends to converge slightly faster than the proposed method when the design matrix A is sparse. We are interested in studying this trade-off by characterizing problems according to the structure of the design matrix. In addition, further acceleration by preconditioning for LSQR [38] is another interesting avenue to investigate.