1 Introduction

Uncalibrated photometric stereo (UPS) is the problem of recovering the 3D shape of an object and associated lighting conditions, given images taken with varying, unknown illumination. In this work we replace the existing pipeline for solving UPS with an integrated approach. This paper, like much prior work  [2, 4, 11, 14, 22, 31, 33], focuses on Lambertian objects illuminated by a single distant point light source in each image. Existing methods, pioneered by [14], formulate UPS as the problem of finding a low-rank factorization of the measurements. Specifically, given m images each with p pixels, let M denote the \(m \times p\) matrix containing the pixel intensities. These methods optimize

$$\begin{aligned} \underset{\hat{M}}{\min } \Vert \hat{M}-M \Vert ^{2}_{F} \quad \mathrm {s.t. \quad rank}(\hat{M})=3. \end{aligned}$$
(1)

This problem can be solved by SVD, from which we produce a family of solutions, each consisting of a set of light sources, albedos, and surface normals. These solutions are related by a \(3 \times 3\) ambiguity matrix. The surface normals provided by SVD are in general inconsistent with the partial derivatives of the surface (i.e., they are not integrable). Consequently, existing methods apply an additional sequence of steps aimed at reducing the ambiguity and fitting a surface to the recovered normals.
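To make this step concrete, the following minimal NumPy sketch (function name ours) performs the rank-3 factorization and illustrates the resulting ambiguity:

```python
import numpy as np

def rank3_factor(M):
    """Rank-3 factorization of the m x p measurement matrix M (Eq. 1)
    via truncated SVD: M_hat = L_hat @ S_hat with rank(M_hat) = 3."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    r = np.sqrt(s[:3])
    L_hat = U[:, :3] * r          # m x 3: candidate lights
    S_hat = r[:, None] * Vt[:3]   # 3 x p: candidate scaled normals
    return L_hat, S_hat

# The factorization is unique only up to an invertible 3 x 3 matrix A:
# (L_hat @ np.linalg.inv(A)) @ (A @ S_hat) yields the same M_hat.
```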

In this paper we propose instead to optimize:

$$\begin{aligned} \underset{\hat{M}}{\min }&~ \Vert \hat{M}-M \Vert ^{2}_{F} \nonumber \\ \mathrm {s.t.}&~ \hat{M} \ \text {is rank 3 and produced by an integrable surface}. \end{aligned}$$
(2)

Equation (1) optimizes over all rank-3 matrices, which can represent sets of images produced by any set of surface normals. In contrast, (2) optimizes over only those rank-3 matrices that correspond to integrable surfaces.

Fig. 1 An illustration of our approach. Blue represents the set of rank-3 matrices, while red represents the subset of those that correspond to integrable surfaces. Our optimization seeks the integrable matrix (red dot) closest to the measurements (black dot). If instead we first find the nearest rank-3 matrix and then select an integrable matrix (blue dots), we may obtain a suboptimal solution.

Intuitively, a single optimization over all constraints may produce a better global optimum than a sequence of optimizations in which constraints are applied one at a time to progressively narrow the solution (see the illustration in Fig. 1). In UPS specifically, the measurement matrix may contain many errors due to shadows and specular effects. Consequently, while in theory UPS can be solved with as few as three images, SVD can properly handle these modeling errors only when many images are supplied; indeed, current methods [2, 11] typically use 10 or more. With a large number of images, SVD-based methods can absorb noise and outliers by solving a heavily overconstrained problem, but with fewer images their solutions tend to be noisy. Our method incorporates integrability directly into the estimation, providing valuable additional constraints that help identify a better subspace when only a few noisy images are available. Our experiments indicate that our method produces reasonable reconstructions with as few as four images and good reconstructions with six, significantly improving over state-of-the-art methods in this regime.

In our approach we optimize a cost function based on (2) over the surface, lighting, normals and (restored) error-free observations. The cost ensures that the normals and lighting are consistent with the measurements, which must have low rank, and we add constraints that ensure integrability. This is somewhat tricky because rank constraints apply to the measurements while integrability constraints apply to the normals. We show that by constructing a single rank-3 matrix that contains the normals, measurements and lighting, we can impose the rank and integrability constraints together. Specifically, we use a truncated nuclear norm approach [15] to enforce the rank constraint, while integrability is expressed as linear equalities. This leads to a single non-convex problem that we solve with a series of Alternating Direction Method of Multipliers (ADMM) steps [5, 13].

Our formulation also allows us to easily account for missing data in the measurement matrix, which commonly occurs when pixels are dark due to shadows or saturated due to specularities. Some prior approaches handle this with a preprocessing step, adding yet another optimization to the pipeline [31]; we instead handle missing data with matrix completion based on the rank constraint. Since non-convex optimization requires a good initialization, we initialize our optimization using prior approaches.

2 Background and Previous Work

In this section we define the problem of uncalibrated photometric stereo for Lambertian objects in detail and review past work. We assume that we view an object in multiple images from a fixed viewpoint. In each image the object is illuminated by a single, distant point light source. We represent the lighting in image i by \(l_i \in R^3\), where the direction of \(l_i\) points toward the light source and \(\Vert l_i\Vert \) represents its magnitude. We represent the object by a set of surface normals \(\hat{n}_j \in R^{3}\) and albedos \(\rho _j \in R\), one per pixel. Images are then formed according to:

$$\begin{aligned} M_{ij} = \max (0, \rho _{j} l_{i}^\mathrm{T} \hat{n}_j) \end{aligned}$$
(3)

where \(M_{ij}\) denotes the j-th pixel of the i-th image. We define the surface normal \(\hat{n}_j=\frac{n_j}{\Vert n_j\Vert }\), \(n_j = (-z_{x}, -z_{y},1)^\mathrm{T}\), where \(z_x\) and \(z_y\) denote the partial derivatives of the surface z(x, y) at pixel j. Negative values of \(\rho _{j} l_{i}^\mathrm{T} \hat{n}_j\) are set to 0; these appear as attached shadows.

We now express the formation of all images with matrix operations. We define S to be a \(3 \times p\) matrix whose column j contains \(\rho _j\hat{n}_j\). Given m images, we stack the lighting vectors into an \(m \times 3\) matrix L whose i-th row is \(l_i^\mathrm{T}\). We concatenate all the images, one per row, to form the \(m \times p\) observation matrix M, where p is the number of pixels. In the absence of shadows, we can then write the UPS equation as:

$$\begin{aligned} M = LS. \end{aligned}$$
(4)

Classical work on photometric stereo (e.g., [30]; see a recent review in [1]) assumes that the lighting is known, obtained by careful calibration. With L known, (4) can be solved as a linear least squares problem. A more general and challenging case is uncalibrated photometric stereo, in which L is unknown. A common approach, which we use as a Baseline algorithm, follows the steps of Algorithm 1.

Algorithm 1 (Baseline): (1) use SVD to find a rank-3 factorization \(M \approx \hat{L}\hat{S}\); (2) find a \(3 \times 3\) transformation that makes the recovered normals integrable, reducing the linear ambiguity to a GBR; (3) resolve the remaining GBR ambiguity using additional priors; (4) integrate the normals to recover the surface.
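To illustrate the image model and the calibrated solve mentioned above (not Algorithm 1 itself), here is a minimal NumPy sketch; function names are ours, and shadowed pixels are handled naively:

```python
import numpy as np

def render(L, normals, albedo):
    """Images per Eq. (3): M_ij = max(0, rho_j * l_i^T n_hat_j).
    L: m x 3 lights, normals: 3 x p unit columns, albedo: length p."""
    return np.maximum((L @ normals) * albedo[None, :], 0.0)

def calibrated_ps(M, L):
    """With L known, Eq. (4) is linear least squares for S
    (shadowed pixels ignored here for brevity)."""
    S, *_ = np.linalg.lstsq(L, M, rcond=None)
    albedo = np.linalg.norm(S, axis=0)            # rho_j = ||S_j||
    normals = S / np.maximum(albedo, 1e-12)       # unit normals
    return normals, albedo
```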

Belhumeur et al. [4] showed that in UPS an integrable set of surface normals can only be recovered up to a generalized bas-relief (GBR) transformation. A number of recent papers have concentrated on methods for resolving the GBR ambiguity. Researchers have used priors on the albedo distribution [2], reflectance extrema [11], the total variation norm [24], grouping based on image appearance and color [25], inter-reflections [9], isotropy and symmetries [29], and specularity [10] as constraints while solving for the GBR. All of these methods first use the Baseline of Algorithm 1 to obtain a solution up to the GBR.
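For reference, the GBR is the standard three-parameter family that maps a surface z(x, y) to

$$\begin{aligned} \bar{z}(x,y) = \lambda z(x,y) + \mu x + \nu y, \qquad G = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ \mu&\nu&\lambda \end{bmatrix}, \quad \lambda \ne 0, \end{aligned}$$

with pseudo-normals transforming by \(G^{-\mathrm{T}}\) and lights by G, so that every member of the family produces identical images.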

Recent works have explored a variety of other research directions in photometric stereo. Mecca et al. [17] proposed an integrated, PDE-based approach to calibrated photometric stereo that uses only two images under perspective projection; it is not clear how to extend this to the uncalibrated setting. Basri et al. [3] extended the Baseline to handle multiple light sources in each image using a spherical harmonics formulation. Chandraker et al. [8] proposed a method to handle attached and cast shadows in the case of multiple light sources per image. In [27] the authors determine the visibility subspace for a set of images to remove cast and attached shadows when performing UPS. Various works have addressed non-Lambertian materials (e.g., Georghiades et al. [12] and Okabe et al. [20]).

In the context of Lambertian UPS, Georghiades et al. [12] proposed removing shadows and specularities and recovering the missing pixel values with matrix completion algorithms, e.g., damped Wiberg [21] or Cabral's algorithm [6]. Wu et al. [31] proposed a Robust PCA formulation for calibrated photometric stereo; their approach seeks a low-rank (not necessarily rank-3) approximation to M while removing outlier pixels (corresponding to shadows and specularities). Oh et al. [18, 19] applied Robust PCA in the context of calibrated photometric stereo, replacing the nuclear norm with a Truncated Nuclear Norm (TNN) regularizer [15]. In [11], Favaro et al. used Robust PCA as a preprocessing step for the Baseline UPS algorithm.

3 Our Approach

In this section we introduce our integrated formulation, which enforces integrability of the surface normals while solving the uncalibrated photometric stereo problem. Recall from (4) that the measurement matrix M factors as \(M=LS\). To access the derivatives of z(x, y) we write S as a product

$$\begin{aligned} S = N \Lambda , \end{aligned}$$
(5)

where N is a \(3 \times p\) matrix whose j-th column is \(-n_j = (z_{x}, z_{y},-1)^\mathrm{T}\) and \(\Lambda = \mathrm {diag}(\lambda _1, \lambda _2, \ldots , \lambda _p)\) with \(\lambda _j = -\rho _j / \Vert n_j\Vert \), so that \(N\Lambda \) indeed reproduces the columns \(\rho _j \hat{n}_j\) of S. We next define the matrix:

$$\begin{aligned} X= \begin{bmatrix} X^{I}&X^{N} \\ X^{L}&X^{M} \end{bmatrix} = \begin{bmatrix} I&N\\ L&M \Lambda ^{-1} \end{bmatrix}, \end{aligned}$$
(6)

where X is \((3+m)\times (3+p)\). The matrix X, the diagonal \(\Lambda ,\) and the depth values z(x, y) form the unknowns in our optimization. Note that, because \(LN=M \Lambda ^{-1}\), the following holds for any non-degenerate \(3 \times 3\) matrix A:

$$\begin{aligned} X = \begin{bmatrix} A^{-1} \\ L A^{-1} \end{bmatrix}\begin{bmatrix} A&A N \end{bmatrix}. \end{aligned}$$
(7)

This shows that X is rank 3. The matrix A represents a linear ambiguity. However, forcing the normals in N to be integrable will reduce this ambiguity to the GBR.
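For intuition, a small NumPy sketch (ours) that assembles X from noiseless L and N, where \(M\Lambda ^{-1} = LN\) holds exactly:

```python
import numpy as np

def assemble_X(L, N):
    """Build the (3+m) x (3+p) matrix X of Eq. (6) for noiseless data,
    where M @ inv(Lambda) equals L @ N exactly.  L: m x 3, N: 3 x p."""
    top = np.hstack([np.eye(3), N])
    bottom = np.hstack([L, L @ N])
    return np.vstack([top, bottom])

# Sanity check: np.linalg.matrix_rank(assemble_X(L, N)) == 3 for
# generic L and N, illustrating the rank-3 claim behind Eq. (7).
```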

To force integrability we denote by \(\mathbf {z}=(z_1,\ldots ,z_p)^\mathrm{T}\) the vector of unknown depth values and require

$$\begin{aligned} X^N = \begin{bmatrix} D_x \mathbf {z},&D_y \mathbf {z},&-\mathbf {1} \end{bmatrix}^\mathrm{T}, \end{aligned}$$
(8)

where \(D_x,D_y\) denote, respectively, the x- and y-derivative operators and \(\mathbf {1}\) denotes the vector of all 1’s.
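As one possible discretization (an assumption on our part, not a prescription from the paper), \(D_x\) and \(D_y\) can be built as sparse forward-difference matrices over a row-major flattened grid:

```python
import numpy as np
import scipy.sparse as sp

def derivative_ops(h, w):
    """Sparse forward-difference versions of D_x, D_y (Eq. 8) for an
    h x w depth map flattened in row-major order; boundary rows are
    handled crudely and would need care in a full implementation."""
    dx = sp.diags([-1.0, 1.0], [0, 1], shape=(w, w))
    dy = sp.diags([-1.0, 1.0], [0, 1], shape=(h, h))
    Dx = sp.kron(sp.eye(h), dx, format="csr")  # d/dx within each row
    Dy = sp.kron(dy, sp.eye(w), format="csr")  # d/dy across rows
    return Dx, Dy
```

With these operators, recovering \(\mathbf {z}\) from estimated derivatives (as in step 3 of the inner loop below, Eq. 19) reduces to a sparse linear least squares problem, solvable with, e.g., `scipy.sparse.linalg.lsqr`.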

Additional constraints are obtained by noticing that, because \(0 \le \rho _j \le 1\) and \(\Vert n_j\Vert \ge 1\),

$$\begin{aligned} -1 \le \lambda _{j} \le 0 \end{aligned}$$
(9)

and

$$\begin{aligned} X^{I} = I_{3\times 3}. \end{aligned}$$
(10)

We are now ready to define our optimization function. Let W be a binary, \(m \times p\) matrix so that \(W_{ij}=0\) if \(M_{ij}\) is missing and \(W_{ij}=1\) otherwise, and let

$$\begin{aligned} f_{\mathrm{data}}(X, \Lambda ) = \frac{1}{2}\left\| W \odot (M - X^{M} \Lambda )\right\| ^{2}_F, \end{aligned}$$
(11)

where \(\odot \) denotes element-wise multiplication. Then (2) can be written as

$$\begin{aligned} \underset{X,\Lambda ,\mathbf {z}}{\min }&\quad f_{\mathrm{data}}(X,\Lambda ) \nonumber \\ \mathrm {s.t.}&\quad \mathrm {rank}(X)=3, ~(8), ~(9), \mathrm {and}~(10). \end{aligned}$$
(12)

Handling the rank-3 constraint Enforcing the non-convex constraint \(\mathrm {rank}(X)=3\) is challenging. In the context of matrix completion, a recent paper [15] proposed the Truncated Nuclear Norm (TNN) regularization term:

$$\begin{aligned} f_{\mathrm{tnn}}(X) = \Vert X\Vert _* - \sum \limits _{k=1}^{3} \sigma _{k}(X), \end{aligned}$$
(13)

where \(\Vert X\Vert _*\) denotes the nuclear norm of X and \(\sigma _k(X)\) is the k-th largest singular value of X. Clearly, \(f_{\mathrm{tnn}}(X)=0\) if and only if \(\mathrm {rank}(X) \le 3\). We use \(f_{\mathrm{tnn}}\) as a regularizer and solve

$$\begin{aligned} \underset{X,\Lambda ,\mathbf {z}}{\min }&\quad ~ f_{\mathrm{data}}(X,\Lambda ) + c ~ f_{\mathrm{tnn}}(X) \nonumber \\ \mathrm {s.t.}&\quad ~~ ~(8), ~(9), \mathrm {and}~(10), \end{aligned}$$
(14)

where c is a preset scalar.
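The regularizer is simple to evaluate; a sketch:

```python
import numpy as np

def f_tnn(X, r=3):
    """Truncated nuclear norm (Eq. 13): the nuclear norm minus the r
    largest singular values, i.e., the tail sum of singular values.
    Zero exactly when rank(X) <= r."""
    s = np.linalg.svd(X, compute_uv=False)
    return s[r:].sum()
```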

There are several ways to handle the rank constraint. One is to factor M explicitly into L, N and \(\rho \) with the integrability constraint on N; this trilinear decomposition can be solved with alternating steps, but such alternating optimizations are often slow to converge. Another technique, widely used in matrix completion, is nuclear norm (NN) relaxation, \(\sum _k \sigma _{k}(X)\). Although NN relaxation is convex, TNN regularization [15] has been shown to outperform it on matrix completion problems. A third option is to impose \(\mathrm {rank}(X)=3\) directly with ADMM. We implemented both NN relaxation and the direct \(\mathrm {rank}(X)=3\) constraint and empirically observed TNN to outperform both; we therefore use TNN regularization in this paper.

4 Optimization Using ADMM

In this section we introduce a method for solving (14). This is a challenging problem because both \(f_{\mathrm{data}}\) and \(f_{\mathrm{tnn}}\) are non-convex: \(f_{\mathrm{data}}\) (11) is bilinear in X and \(\Lambda \), while \(f_{\mathrm{tnn}}\) (13) is a difference of two convex functions. Our solution is based on a nested iteration in which the outer loop uses majorization to decrease \(f_{\mathrm{tnn}}\), whereas the inner loop uses scaled ADMM with alternation to decrease the data term.

Outer loop Following [15], at each iteration of the outer loop we replace \(f_{\mathrm{tnn}}(X)\) with a majorizer. Specifically, at iteration k let \(X^{(k)}=U\Sigma V^\mathrm{T}\) be the singular value decomposition of the current iterate, and let \(U_3\) (resp. \(V_3\)) be the matrix containing the left (resp. right) singular vectors corresponding to the three largest singular values of \(X^{(k)}\). \(U_3\) and \(V_3\) are determined in the outer loop and held constant throughout the inner loop. We then define

$$\begin{aligned} f_{\mathrm{maj}}^{(k)}(X) = \Vert X\Vert _* - \mathrm {trace}(U_3^\mathrm{T} X V_3). \end{aligned}$$
(15)

It was shown in [15] that \(f_{\mathrm{maj}}^{(k)}(X) \ge f_{\mathrm{tnn}}(X)\) for all X and that \(f_{\mathrm{maj}}^{(k)}(X^{(k)})=f_{\mathrm{tnn}}(X^{(k)})\), and so decreasing \(f_{\mathrm{maj}}\) leads to decreasing \(f_{\mathrm{tnn}}\).
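A sketch of this outer-loop computation (function names ours):

```python
import numpy as np

def majorizer_factors(X_k, r=3):
    """U_3, V_3 for the majorizer of Eq. (15), computed once per outer
    iteration from the SVD of X^(k) and held fixed in the inner loop."""
    U, _, Vt = np.linalg.svd(X_k, full_matrices=False)
    return U[:, :r], Vt[:r].T

def f_maj(X, U3, V3):
    """Majorizer of f_tnn at X^(k): ||X||_* - trace(U3^T X V3)."""
    return np.linalg.svd(X, compute_uv=False).sum() - np.trace(U3.T @ X @ V3)
```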

Inner loop In the inner loop we seek to minimize

$$\begin{aligned} \underset{X,\Lambda ,\mathbf {z}}{\min } \quad&f_{\mathrm{data}}(X,\Lambda ) + c f_{\mathrm{maj}}^{(k)}(X) \nonumber \\ \mathrm {s.t.}&\quad ~(8), ~(9), \mathrm {and}~(10). \end{aligned}$$
(16)

We use scaled ADMM, a variant of the augmented Lagrangian method that splits the objective function and solves the resulting subproblems separately. We maintain a second copy of X, denoted Y, and form the augmented Lagrangian of (16) as follows:

$$\begin{aligned} \max _\Gamma \min _{X,\Lambda ,\mathbf {z},Y}&\quad \frac{1}{2}\left\| W\odot (M - X^{M} \Lambda )\right\| _F^2 \nonumber \\&+\,c \left( \Vert Y\Vert _* - \mathrm {trace}(U_3^\mathrm{T} Y V_3)\right) + \frac{\tau }{2}\Vert Y-X+\Gamma \Vert _F^2 \nonumber \\ s.t. \ X^{I} = I_{3\times 3},&~ -1 \le \lambda _j \le 0 ~ \forall j, ~ X^N = \begin{bmatrix} D_x \mathbf {z},&D_y \mathbf {z},&-\mathbf {1} \end{bmatrix}^\mathrm{T}, \end{aligned}$$
(17)

where \(\frac{\tau }{2}\Vert Y-X+\Gamma \Vert ^2_F\) is the augmented Lagrangian penalty term, \(\tau \) is a constant, and \(\Gamma \) is a matrix of Lagrange multipliers of the same size as X, updated by the ADMM steps [5, 13]. We next describe the ADMM steps, which are applied iteratively.

Step 1 Solving for \((X,\Lambda ,\mathbf {z})\).

In each iteration, k, we solve the following subproblems:

  1.

    Optimize w.r.t. \(X^{I}\): \(X^{I\,(k+1)} = I_{3 \times 3}\).

  2.

    Optimize w.r.t. \(X^{L}\):

    $$\begin{aligned} X^{L\,(k+1)}= & {} \underset{X^{L}}{{{\mathrm{{\arg \!\min }}}}} \Vert Y^{L\, (k)}-X^{L}+\Gamma ^{L\, (k)}\Vert _F^2 \nonumber \\= & {} Y^{L\,(k)} + \Gamma ^{L\,(k)}. \end{aligned}$$
    (18)
  3.

    Optimize w.r.t. \(X^{N}\) and \(\mathbf {z}\):

    $$\begin{aligned} \left( X^{N\,(k+1)},\mathbf {z}^{(k+1)}\right)&= \underset{X^{N},\mathbf {z}}{{{\mathrm{{\arg \!\min }}}}} \left\| Y^{N\,(k)}-X^{N}+\Gamma ^{N\,(k)} \right\| _F^2 \nonumber \\ \mathrm {s.t.}~~ X^N&= \begin{bmatrix} D_x \mathbf {z},&D_y \mathbf {z},&-\mathbf {1} \end{bmatrix}^\mathrm{T}. \end{aligned}$$
    (19)

    The problem is solved by setting the third row of \(X^{N\,(k+1)}\) to \(-\mathbf {1}\) and by substituting \(D_x\mathbf {z}\) and \(D_y\mathbf {z}\) for the first two rows of \(X^N\) in the objective, obtaining linear least squares equations in \(\mathbf {z}\) that can be solved directly.

  4.

    Optimize w.r.t. \(X^{M}\) and \(\Lambda \):

    $$\begin{aligned} \left( X^{M\,(k+1)},\Lambda ^{(k+1)} \right) =\,&\underset{X^{M},\Lambda }{{{\mathrm{{\arg \!\min }}}}} \frac{1}{2}\left\| W\odot (M - X^{M}\Lambda )\right\| _F^2 \\&+\frac{\tau }{2} \left\| Y^{M\,(k)}-X^{M}+\Gamma ^{M\,(k)}\right\| _F^2 \nonumber \\ ~~~ \mathrm {s.t.} ~~~ -1 \le \lambda _j \le 0 ~ \forall j. \end{aligned}$$

    We separate this into known and unknown pixels based on W. For an unknown pixel j in frame i (\(W_{ij}=0\)) the first term vanishes, and the minimization determines the corresponding entry of \(X^{M}\) as:

    $$\begin{aligned} X^{M\, (k+1)}_{ij} = Y_{ij}^{M\,(k)}+\Gamma _{ij}^{M\,(k)}. \end{aligned}$$
    (20)

    For the known pixels, since \(\Lambda \) is diagonal we can write these equations separately for each column j (corresponding to the j-th pixel):

    $$\begin{aligned} \left( X^{M\,(k+1)}_{j},\lambda _{j}^{(k+1)}\right) = \,&\underset{X_{j}^{M},\lambda _{j}}{{{\mathrm{{\arg \!\min }}}}} \frac{1}{2} \left\| (W_j \odot \left( M_j - \lambda _{j} X_{j}^{M} \right) \right\| ^{2}_{2} \nonumber \\&+ \frac{\tau }{2}\left\| Y_{j}^{M\,(k)}-X_{j}^{M}+\Gamma _{j}^{M\,(k)}\right\| _{2}^{2} \nonumber \\ ~~~ \mathrm {s.t.} ~~ -1 \le \lambda _j \le 0. \end{aligned}$$
    (21)

    The problem (21) is non-convex. We solve it with alternating optimization, updating \(X^{M}\) and \(\Lambda \) by the following steps until convergence (a code sketch follows this list). \(\mathbf {X^{M}}\): Let \(\tilde{M_{j}}=W_{j} \odot M_{j}\), \(\tilde{X_{j}}=W_{j} \odot X^{M}_{j}\) and \(\tilde{A}^{M\,(k)}_{j}=W_{j} \odot (Y_{j}^{M\,(k)}+\Gamma _{j}^{M\,(k)})\). Then,

    $$\begin{aligned} \tilde{X_{j}}&= \underset{\tilde{X_{j}}}{{{\mathrm{{\arg \!\min }}}}}~ \frac{1}{2} \left\| \tilde{M_j} - \lambda _{j} \tilde{X_j}\right\| ^{2}_{2} + \frac{\tau }{2} \left\| \tilde{A}^{M\,(k)}_j - \tilde{X_j}\right\| ^{2}_{2} \nonumber \\&= \dfrac{\lambda _j \tilde{M_j} + \tau \tilde{A}^{M\,(k)}_j}{\lambda ^{2}_j + \tau }. \end{aligned}$$
    (22)

    \(\mathbf {\Lambda }\):

    $$\begin{aligned} \lambda _j&= \underset{\lambda _j}{{{\mathrm{{\arg \!\min }}}}}~ \frac{1}{2} \left\| \tilde{M_j} - \lambda _j \tilde{X_j}\right\| ^2_2 ~~ \mathrm {s.t.} ~ -1 \le \lambda _j \le 0, \nonumber \\&=\min ( 0,\max ( -1, \tilde{X}_j^\mathrm{T} \tilde{M}_j/ \Vert \tilde{X}_j \Vert ^2_2 ) ). \end{aligned}$$
    (23)
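A sketch of this per-pixel alternation (function name ours; `Aj` stands for the column \(Y_{j}^{M}+\Gamma _{j}^{M}\)):

```python
import numpy as np

def update_pixel(Mj, Wj, Aj, lam, tau, iters=10):
    """Alternation of Eqs. (22)-(23) for one pixel column j.
    Mj: observed column of M; Wj: 0/1 mask; Aj: Y_j^M + Gamma_j^M;
    lam: current lambda_j in [-1, 0]; tau: ADMM penalty weight."""
    Mj_t, Aj_t = Wj * Mj, Wj * Aj
    Xj_t = Aj_t
    for _ in range(iters):
        Xj_t = (lam * Mj_t + tau * Aj_t) / (lam ** 2 + tau)   # Eq. (22)
        nrm2 = Xj_t @ Xj_t
        if nrm2 > 0:                                          # Eq. (23)
            lam = min(0.0, max(-1.0, (Xj_t @ Mj_t) / nrm2))
    return Xj_t, lam
```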

Step 2 Solving for Y. This requires solving

$$\begin{aligned} Y^{(k+1)}= & {} {{\mathrm{{\arg \!\min }}}}_{Y} c \left( \Vert Y\Vert _{*} - \mathrm {trace}\left( U_3^\mathrm{T} Y V_3\right) \right) \nonumber \\&+\, \frac{\tau }{2} \left\| Y - X^{(k+1)} + \Gamma ^{(k)}\right\| _F^2. \end{aligned}$$
(24)

Below we show that this problem can be solved in closed form by applying the shrinkage operator, obtaining

$$\begin{aligned} Y^{(k+1)} = D_{\frac{c}{\tau }}\left( X^{(k+1)} - \Gamma ^{(k)} + \dfrac{c}{\tau } U_3 V_3^\mathrm{T}\right) , \end{aligned}$$
(25)

where the shrinkage operator \(D_{t}(.)\) is defined as follows. For a scalar s we define \(D_t(s)=\mathrm {sign}(s)\times \max (|s|-t,0)\). For a diagonal matrix \(S=\mathrm {diag}(s_1,s_2,\ldots )\) with nonnegative entries we define \(D_t(S)=\mathrm {diag}(D_t(s_1),D_t(s_2), \ldots )\). Finally, for a general matrix \(\Upsilon \), let \(\Upsilon =\tilde{U} S \tilde{V}^\mathrm{T}\) be its singular value decomposition, then \(D_{t}(\Upsilon )=\tilde{U} D_t(S) \tilde{V}^\mathrm{T}\).
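In code, the shrinkage operator amounts to soft-thresholding the singular values; a sketch:

```python
import numpy as np

def shrink(C, t):
    """Singular-value shrinkage D_t(C): soft-thresholds the singular
    values of C by t; solves min_Y ||Y||_* + (1/2t) ||Y - C||_F^2."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt
```

In this notation, Step 2 (Eq. 25) reads `Y = shrink(X - Gamma + (c / tau) * U3 @ V3.T, c / tau)`.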

To derive (25), we rewrite (24) as:

$$\begin{aligned} Y^{(k+1)}= & {} \mathop {{{\mathrm{{\arg \!\min }}}}}\limits _{Y} \Vert Y\Vert _{*} + \dfrac{\tau }{2 c} \Vert Y - X^{(k+1)} + \Gamma ^{(k)} \nonumber \\&- \dfrac{c}{\tau }U_3 V_3^\mathrm{T} \Vert ^{2}_{F} - T, \end{aligned}$$
(26)

where \(T=\mathrm {trace}( V_3U_3^\mathrm{T} (X^{(k+1)}-\Gamma ^{(k)}))+\dfrac{c}{2 \tau }\Vert U_3V_3^\mathrm{T}\Vert _F^2\) is independent of Y. Equation (26) has the general form \(\underset{Y}{\min }\Vert Y\Vert _{*} + \dfrac{1}{2t}\Vert Y -C\Vert ^2_F\), whose solution is \(D_t(C)\), as shown in [7], implying (25).

Step 3 Update of \(\Gamma \). The matrix \(\Gamma \) contains Lagrange multipliers that are used in the saddle-point formulation (17) to enforce the equality constraint \(X=Y\). The following update is a gradient ascent step that acts to maximize the augmented Lagrangian (17) for \(\Gamma .\) For details, see [5, 13].

$$\begin{aligned} \Gamma ^{(k+1)} = \Gamma ^{(k)} + \left( Y^{(k+1)} - X^{(k+1)}\right) . \end{aligned}$$
(27)

The entire optimization process is listed in Algorithm 2. We will make the code available.

Algorithm 2 (Our approach): initialize X, \(\Lambda \) and \(\mathbf {z}\) using RPCA; in each outer iteration, compute \(U_3, V_3\) from the SVD of \(X^{(k)}\) and run the inner ADMM (Steps 1–3) to convergence; repeat until the TNN-regularized cost (16) converges.

5 Experimental Results

In this section we evaluate our algorithm on both real-world and synthetic examples, comparing it with two versions of the Baseline algorithm (with and without RPCA preprocessing). We compare the following methods:

Baseline :

Algorithm 1 described in Sect. 2. This method is used in [2, 9, 10, 11, 25, 29].

RPCA :

Images are preprocessed using Robust PCA [31], with parameters chosen as suggested by [11]; we then apply the Baseline algorithm to the resulting matrix. This method is used in [11]. RPCA solves a sparse low-rank optimization to detect shadows and other non-Lambertian effects, using \(L_1\) regularization to identify outlier pixels even when they do not produce intensities near 0 or 1. The refined intensities obtained from RPCA may fall outside the range [0, 1]: an obtuse angle between the surface normal and the light direction can produce negative intensities, and specularities can produce intensities greater than 1.

Our(NC) :

Our proposed formulation as described in Sect. 4 using \(W=1\), i.e., no completion. This allows comparison to Baseline, which also does not perform matrix completion.

Our(MC) :

Our proposed formulation as in Sect. 4 with \(W_{ij} \in \{0,1\}\), allowing for matrix completion. In both versions of our algorithm we use \(c=1\). We identify missing pixels as those with normalized intensity outside the range (0.02, 0.98). We use the RPCA algorithm to perform UPS, and the obtained normals and lights initialize our algorithm, as highlighted in Algorithm 2.

All the tested methods solve for the surface only up to a GBR ambiguity. To compare the results with ground truth, we find the GBR that optimizes the fit to ground truth and measure the residual error.

In the presence of a large number of images with noise and non-Lambertian effects, we expect the sequential pipelines of Baseline and RPCA, both built on SVD, to produce accurate solutions, because the problem solved by SVD is then heavily overconstrained. With fewer images, our integrated method can produce a more accurate decomposition by using both rank and integrability constraints to find the right linear subspace. We therefore expect our integrated approach to improve over Baseline and RPCA as the number of images decreases. The following subsections present results on synthetic and real-world data that support this claim.

5.1 Experiments on Synthetic Data

We use five real objects (“cat,” “owl,” “rock,” “horse,” “buddha”) to produce synthetic images; their shapes are obtained by applying calibrated photometric stereo to a publicly available dataset [16]. We use the normals and albedos of these objects to generate images. Each image is generated with a randomly selected light source lying, on average, 30 degrees from the viewing direction. We clip intensities outside the range [0, 1], creating shadows and specularities. All images are of size \(512 \times 340\), with objects occupying 29–72 K pixels; a segmentation mask is also supplied. To show how performance varies with the number of images \(N_I\), we use sets of 4, 6, 8, 10, 15, 20, 25 and 30 images. We add Gaussian noise with standard deviation ranging from 1 to 7% (in steps of 2%) of the maximum intensity. For each noise level, we run five trials with random noise and lighting. Thus, with five objects, four noise levels and five random simulations, we have a total of 100 experiments for each of the 8 set sizes. As a measure of performance, we compute the error in the reconstructed depth map: letting \(Z_T\) denote the ground-truth surface and \(Z_{\mathrm{rec}}\) the reconstruction, \(Z_{\mathrm{err}}=100 \times \frac{\Vert Z_T - Z_{\mathrm{rec}} \Vert }{\Vert Z_T \Vert }\). To compare two algorithms (say, algorithm A vs. algorithm B), we define the following two measures:

Relative improvement (in %) Denote by \(e^a_k\) and \(e^b_k\) the depth errors in trial k for algorithms A and B, respectively. The relative improvement of algorithm B over A is the average of \(\frac{(e^a_k - e^b_k)}{e^a_k}\) over all K trials for each choice of \(N_I\), expressed as a percentage.

Percent of improved trials The fraction of trials in which algorithm B improves over A. In the notation above, this is \(\frac{1}{K}\sum _{k=1}^K \mathbb {I}(e^{\mathrm {b}}_k < e^{\mathrm {a}}_k)\), where \(\mathbb {I}(\cdot )\) is the indicator function and K is the total number of trials for each choice of \(N_I\). The measure is expressed as a percentage.
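These measures are straightforward to compute; a NumPy sketch (function names ours):

```python
import numpy as np

def depth_error(Z_true, Z_rec):
    """Z_err of Sect. 5.1, in percent."""
    return 100.0 * np.linalg.norm(Z_true - Z_rec) / np.linalg.norm(Z_true)

def relative_improvement(e_a, e_b):
    """Average of (e_a - e_b) / e_a over trials, in percent."""
    e_a, e_b = np.asarray(e_a), np.asarray(e_b)
    return 100.0 * np.mean((e_a - e_b) / e_a)

def percent_improved(e_a, e_b):
    """Fraction of trials in which algorithm B beats A, in percent."""
    e_a, e_b = np.asarray(e_a), np.asarray(e_b)
    return 100.0 * np.mean(e_b < e_a)
```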

Fig. 2 Performance comparison of the Our(MC) algorithm to RPCA (blue) and Baseline (yellow) for different numbers of input images with Gaussian noise under a pure Lambertian model. The left bar plot shows the relative improvement achieved by our algorithm; the right plot shows the percent of trials in which our algorithm outperformed each competing algorithm. (a) Median error in depth, (b) median error in normal.

Fig. 3 Performance comparison of the Our(MC) algorithm to RPCA (blue) and Baseline (yellow) for different numbers of input images with Gaussian noise under the Phong model. The left bar plot shows the relative improvement achieved by our algorithm; the right plot shows the percent of trials in which our algorithm outperformed each competing algorithm. (a) Median error in depth, (b) median error in normal.

Fig. 4 Performance comparison of Our(MC) with RPCA and Baseline under varying noise, created using the Phong model.

In Fig. 2 we compare the performance of Our(MC) with Baseline and RPCA on synthetic data in the presence of Gaussian noise, initializing our method with RPCA. We observe that as the number of images decreases, our method improves relative to Baseline and RPCA. With simple Gaussian noise, RPCA provides no additional advantage, since there are no outliers.

In Fig. 3 we compare the performance of our methods on synthetic data with Gaussian noise and with specularities generated by the Phong reflectance model [23, 28]. Mathematically, each image \(M_{i}\) can be written as:

$$\begin{aligned} M_{i} = L_{i}S + k_{s} (V^\mathrm{T}R)^{\alpha }, \end{aligned}$$
(28)

where V is the viewing direction and R denotes, for each pixel, the direction of perfect reflection of the incoming light \(L_i\). Larger \(\alpha \) produces sharper specularities, while larger \(k_s\) causes more light to be reflected specularly. We use \(k_s = 0.2\) and \(\alpha = 10\). We observe that the advantage of Our(MC) diminishes as the number of images increases, as expected. This experiment shows that even though our method is designed for Lambertian objects, it can tolerate a certain amount of model irregularity such as specularity. With four images our method beats RPCA in 85% of all trials, with a relative improvement of 22.12%.
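A possible NumPy rendering of this image formation (a sketch under our own normalization assumptions; names ours):

```python
import numpy as np

def phong_images(L, normals, albedo, ks=0.2, alpha=10, view=(0.0, 0.0, 1.0)):
    """Phong rendering per Eq. (28): Lambertian term plus specular lobe.
    R is the reflection of each (unit) light about each unit normal."""
    v = np.asarray(view)
    M = np.maximum((L @ normals) * albedo[None, :], 0.0)   # Lambertian
    for i, l in enumerate(L):
        l_hat = l / np.linalg.norm(l)
        # R = 2 (n . l) n - l, computed for all p normals at once (3 x p)
        R = 2.0 * normals * (normals.T @ l_hat) - l_hat[:, None]
        M[i] += ks * np.maximum(v @ R, 0.0) ** alpha       # specular
    return np.clip(M, 0.0, 1.0)
```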

In Fig. 4 we compare Our(MC) with Baseline and RPCA as the noise level varies, for different numbers of images (4, 6, 10 and 15). We conclude that our method is robust to noise and that its advantage does not degrade as noise increases.

5.2 Experiments on Real-World Data

5.2.1 Lambertian Objects

To test our approach on real data, we use two publicly available datasets [16, 32] consisting of five and seven objects, respectively. We perform uncalibrated photometric stereo on a set of images and use the result of calibrated photometric stereo as ground truth. The datasets provide calibrated lighting, which we use together with the code provided by [16, 32] to obtain normals and depth maps; the depth map, albedo and surface normals obtained from calibrated PS serve as ground truth for photometric stereo with unknown lighting, as in [2]. To show how performance varies with the number of images, we select subsets of 4, 6, 8 and 10 images for each object, with 10 random selections per subset size for each of the 12 objects. Thus, we have 120 experiments for every subset size.

In Fig. 5 we compare the performance of our methods, Our(MC) and Our(NC), with Baseline and RPCA as the number of images varies. For fewer images our methods outperform Baseline and RPCA by a significant margin, and they are comparable to RPCA for more images. For four images Our(MC) outperforms Baseline in 84.9% of cases with a relative improvement of 30.6%, and outperforms RPCA in 81.4% of cases with a relative improvement of 12%. For ten images, however, we beat Baseline in 75% of cases with a relative improvement of 10.7%, and beat RPCA in only 47.3% of cases with a relative improvement of \(-7.2\)%.

Fig. 5 Performance comparison of the Our(MC) and Our(NC) algorithms to RPCA and Baseline on real images.

Fig. 6 Average surface reconstruction error with four (top) and six (bottom) real images of 12 objects over 10 random trials, using Our(MC), RPCA and Baseline.

Figure 6 shows the average reconstruction error obtained by Our(MC), RPCA and Baseline on 12 real-world objects over 10 random simulations. We observe that Our(MC) outperforms RPCA on 11 of the 12 objects for four images and on 10 of the 12 objects for six images (and is comparable on one). With ten images, the average reconstruction error of Our(MC) over all objects and trials is 4.6%; this rises to 8.1% with four images and is only 5.4% with six. This shows that we obtain reasonable reconstructions with four images and good reconstructions with as few as six.

Figure 7 shows the average median angular error in the surface normals obtained by Our(MC), RPCA and Baseline on 12 real-world objects over 10 random simulations. Our(MC) outperforms both RPCA and Baseline on all 12 objects for four images and on 10 of the 12 objects for six images. Thus, also in terms of surface normal error, our algorithm outperforms RPCA and Baseline when fewer images are available (Fig. 8).

Fig. 7 Average median angular error in the surface normals with 4, 6, 8 and 10 real images of 12 objects over 10 random trials, using Our(MC), RPCA and Baseline.

Fig. 8 Convergence of the ADMM algorithm for each iteration of TNN, as in Eq. (17) (left); convergence of the TNN-regularized cost function, as in Eq. (16) (right).

Fig. 9 Reconstruction error \(|Z_T - Z_{\mathrm{rec}}|\) for Baseline, RPCA and Our(MC) on “Cat,” “Owl,” “Pig” and “Hippo” (one per row). The left column shows results for four images, the right for ten.

In Fig. 9 we compare the surface reconstruction errors of Baseline, RPCA and Our(MC) on some of our real-world examples. Figure 10 shows two views of surfaces reconstructed by Our(MC) from four images, demonstrating reasonable surface reconstruction. These results suggest that jointly enforcing rank and integrability constraints can significantly improve the performance of photometric stereo when only a few images are available.

Fig. 10 Two views of surfaces reconstructed with the Our(MC) algorithm from four images. Each column shows two views of the surface reconstructed for “Cat,” “Owl,” “Pig” and “Hippo,” respectively.

Table 1 Median surface normal reconstruction error for four images
Table 2 Median surface normal reconstruction error for six images
Table 3 Median surface normal reconstruction error for eight images
Table 4 Median surface normal reconstruction error for ten images
Table 5 Median surface normal reconstruction error for 15 images
Table 6 Median surface normal reconstruction error for 20 images
Table 7 Median surface normal reconstruction error for 30 images
Table 8 Median surface normal reconstruction error for 40 images
Table 9 Median surface normal reconstruction error for all (96) images (We set RPCA parameter to 0.5)

In general, we see that incorporating matrix completion into our formulation yields a slight improvement, with Our(MC) somewhat outperforming Our(NC). This indicates that the gain of our method over RPCA and Baseline is due mostly to the joint optimization formulation and not to matrix completion. We further note that RPCA significantly improves over Baseline: RPCA identifies outliers and exploits that extra information for better recovery, which suggests that its robust error function is important. Nevertheless, our integrated approach, which lacks a robust cost function like RPCA's, still outperforms RPCA for four and six images and is roughly equal for eight or ten. This shows that an integrated approach is very useful for a small number of images and provides gains comparable to RPCA for more images. An interesting topic for future work is to amend the cost function of Our(MC) to include RPCA's robust handling of errors, to see whether this further improves performance.

5.2.2 Non-Lambertian Objects

We also test our method on 8 objects from a non-Lambertian dataset [26], comparing it with RPCA and Baseline. For each object we choose five different random sub-samples of 4, 6, 8, 10, 15, 20, 30 and 40 images. Tables 1, 2, 3, 4, 5, 6, 7, 8 and 9 report the results of Our(MC), RPCA and Baseline for different numbers of images. The results show that our method is also robust in the presence of non-Lambertian objects.

Our(MC) consistently performs better than Baseline and RPCA; even for a large number of images, such as 96, it improves over RPCA by 6.19% on average over all objects in the dataset.

For an image of size \(512 \times 340\) with an object occupying an area of 30 K pixels, our algorithm takes 20 min on a 2.7 GHz Intel Core i5 machine. Figure 8 shows a typical convergence graph of the ADMM algorithm solving the optimization problem (17) (left) and of the TNN-regularized optimization (16) (right). We empirically observe that ADMM converges to a local minimum; since the problem is non-convex, there is no guarantee of convergence to the global minimum.

6 Conclusion and Future Work

In this paper we have introduced a new low-rank constrained optimization method for solving uncalibrated photometric stereo from fewer images. The key to this approach is to combine rank and integrability constraints in a single optimization problem, relying on a novel formulation that exposes both depth and surface normals to the optimization and links them through an integrability constraint. We showed how to perform this optimization using a truncated nuclear norm and ADMM. For fewer images, our joint formulation produces better solutions than methods that rely on SVD, and we have shown promising results compared to Baseline approaches on both real and synthetic examples. We also observe that our method can tolerate a certain degree of model irregularity, as it outperformed RPCA on synthetic examples with specularities generated using the Phong model.

In the future, it will be interesting to apply the idea of Robust PCA to our formulation. We would also like to extend this work to handle more general lighting configurations, e.g., using spherical harmonic approximations to lighting.