1 Introduction

3D-reconstruction is one of the most important goals of computer vision. Among the many techniques which can be used to accomplish this task, shape-from-shading [28] and photometric stereo [64] are photometric techniques: they exploit the relationship between the gray or color levels of the image, the shape of the scene (assumed to be opaque), its reflectance and the luminous flux that illuminates it.

Let us first introduce some notation that will be used throughout this paper. We describe a point \(\mathbf {x}\) on the scene surface by its coordinates \([x, y, z]^\top \) in a frame originating from the optical center C of the camera, such that the plane Cxy is parallel to the image plane and the Cz axis coincides with the optical axis and faces the scene (cf. Fig. 1). The coordinates \([u, v]^\top \) of a point \(\mathbf {p}\) in the image (pixel) are relative to a frame Ouv whose origin is the principal point O, and whose axes Ou and Ov are parallel to Cx and Cy, respectively. If f denotes the focal length, the conjugation relationship between \(\mathbf {x}\) and \(\mathbf {p}\) is written, in perspective projection:

$$\begin{aligned} {\left\{ \begin{array}{ll} x = \dfrac{z}{f} \, u, \\ y = \dfrac{z}{f} \, v. \end{array}\right. } \end{aligned}$$
(1.1)
Fig. 1

Schematic representation of the geometric setup. A point \(\mathbf {x} = [x,y,z]^\top \in \mathbb {R}^3\) on the scene surface and a pixel \(\mathbf {p} = [u,v]^\top \in \mathbb {R}^2\) in the image plane are conjugated according to Eq. (1.1). Equation (2.1) states that, when the scene is illuminated by a LED located in \(\mathbf {x}_s \in \mathbb {R}^3\), the gray level \(I(\mathbf {p})\) of the pixel \(\mathbf {p}\) conjugated to \(\mathbf {x}\) is a function of the angle between the lighting vector \(\mathbf {s}(\mathbf {x})\) and the normal \(\mathbf {n}(\mathbf {x})\) to the surface in \(\mathbf {x}\) (illuminance), of the angle \(\theta \) between the principal direction \(\mathbf {n}_s\) of the LED and \(\mathbf {s}(\mathbf {x})\) (anisotropy), of the distance \(\Vert \mathbf {x}-\mathbf {x}_s\Vert \) between the surface point and the light source location (inverse-of-square falloff), and of the albedo in \(\mathbf {x}\) (Lambertian reflectance)

The 3D-reconstruction problem consists in estimating, in each pixel \(\mathbf {p}\) of a part \(\varOmega \) of the image domain, its conjugate point \(\mathbf {x}\) in 3D-space. Equation (1.1) shows that it suffices to find the depth z to determine \(\mathbf {x} = \left[ x,y,z\right] ^\top \) from \(\mathbf {p} = \left[ u,v\right] ^\top \). The only unknown of the problem is thus the depth map z, which is defined as follows:

$$\begin{aligned} \begin{array}{rccl} &{}z: \varOmega \subset \mathbb {R}^2 &{} \rightarrow &{} \mathbb {R}^+ \\ &{} \mathbf {p} = [u,v]^\top &{} \mapsto &{} z(\mathbf {p}). \end{array} \end{aligned}$$
(1.2)

In this article, we are interested in the 3D-reconstruction of Lambertian surfaces by photometric stereo. The reflectance at a point of such a surface is completely characterized by a coefficient \(\rho \), called the albedo, which is 0 if the point is black and 1 if it is white. Photometric stereo is nothing but an extension of shape-from-shading: instead of a single image, the former uses \(m \geqslant 3\) shots \(I^i,\, i \in \{1,\ldots ,m\}\), taken from the same viewpoint, but under varying lighting. Considering multiple images makes it possible to circumvent the difficulties of shape-from-shading: photometric stereo techniques are able to unambiguously estimate the 3D-shape as well as the albedo, i.e., without resorting to any prior.

A parallel and uniform illumination can be characterized by a vector \(\mathbf {s} \in \mathbb {R}^3\) oriented toward the light source, whose norm is equal to the luminous flux density. We call \(\mathbf {s}\) the lighting vector. For a Lambertian surface, the classical modeling of photometric stereo is written, in each pixel \(\mathbf {p} \in \varOmega \), as the following systemFootnote 1:

$$\begin{aligned} I^i(\mathbf {p}) = \rho (\mathbf {x}) \,\, \mathbf {s}^i \cdot \mathbf {n}(\mathbf {x}),\quad i\in \{1,\ldots ,m\}, \end{aligned}$$
(1.3)

where \(I^i(\mathbf {p})\) denotes the gray level of \(\mathbf {p}\) under a parallel and uniform illumination characterized by the lighting vector \(\mathbf {s}^i\), \(\rho (\mathbf {x})\) denotes the albedo in the point \(\mathbf {x}\) conjugate to \(\mathbf {p}\), and \({\mathbf {n}}(\mathbf {x})\) denotes the unit-length outgoing normal to the surface in this point. Since there is a one-to-one correspondence between the points \(\mathbf {x}\) and the pixels \(\mathbf {p}\), we write for convenience \(\rho (\mathbf {p})\) and \(\mathbf {n}(\mathbf {p})\), in lieu of \(\rho (\mathbf {x})\) and \(\mathbf {n}(\mathbf {x})\). Introducing the notation \(\mathbf {m}(\mathbf {p}) = \rho (\mathbf {p}) \, \mathbf {n}(\mathbf {p})\), System (1.3) can be rewritten in matrix form:

$$\begin{aligned} \mathbf {I}(\mathbf {p}) = \mathbf {S} \, \mathbf {m}(\mathbf {p}), \end{aligned}$$
(1.4)

where vector \(\mathbf {I}(\mathbf {p}) \in \mathbb {R}^m\) and matrix \(\mathbf {S} \in \mathbb {R}^{m\times 3}\) are defined as follows:

$$\begin{aligned} \mathbf {I}(\mathbf {p}) = \begin{bmatrix} I^1(\mathbf {p}) \\ \vdots \\ I^m(\mathbf {p}) \end{bmatrix} \quad \text {and} \quad \mathbf {S} = \begin{bmatrix} \mathbf {s}^{1 \top } \\ \vdots \\ \mathbf {s}^{m \top } \end{bmatrix}. \end{aligned}$$
(1.5)

As soon as \(m \geqslant 3\) non-coplanar lighting vectors are used, matrix \(\mathbf {S}\) has rank 3. The (unique) least-squares solution of System (1.4) is then given by

$$\begin{aligned} \mathbf {m}(\mathbf {p}) = \mathbf {S}^\dagger \, \mathbf {I}(\mathbf {p}), \end{aligned}$$
(1.6)

where \(\mathbf {S}^\dagger \) is the pseudo-inverse of \(\mathbf {S}\). From this solution, we easily deduce the albedo and the normal:

$$\begin{aligned} \rho (\mathbf {p}) = \Vert \mathbf {m}(\mathbf {p}) \Vert \quad \text {and} \quad \mathbf {n}(\mathbf {p}) = \frac{\mathbf {m}(\mathbf {p})}{\Vert \mathbf {m}(\mathbf {p})\Vert }. \end{aligned}$$
(1.7)

The normal field estimated in this way must eventually be integrated so as to obtain the depth map, knowing that the boundary conditions, the shape of the domain \(\varOmega \) and depth discontinuities significantly complicate this task [55].
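The per-pixel least-squares solution of Eqs. (1.6) and (1.7) is straightforward to implement. A minimal numpy sketch (function name ours), processing all pixels at once:

```python
import numpy as np

def classical_photometric_stereo(I, S):
    """Solve Eq. (1.4) in the least-squares sense for every pixel.

    I : (m, n) array of gray levels, one row per lighting, n pixels.
    S : (m, 3) lighting matrix stacking the rows s^{i T}, Eq. (1.5).
    Returns the albedo rho (n,) and unit normals N (3, n), Eq. (1.7).
    """
    # m(p) = S^+ I(p) for all pixels at once, Eq. (1.6)
    M = np.linalg.pinv(S) @ I
    rho = np.linalg.norm(M, axis=0)      # albedo = ||m(p)||, Eq. (1.7)
    N = M / np.maximum(rho, 1e-12)       # unit-length normals, guard /0
    return rho, N
```

The normal integration step that follows Eq. (1.7) is not sketched here, since it is the separate (and harder) problem discussed in [55].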

Ensuring lighting directionality, as required by Model (1.3), necessitates a complex optical setup [45]. It is much easier to use light-emitting diodes (LEDs) as light sources, but with this type of light source, we should expect significant changes in the modeling, and therefore in the numerical solution. The aim of our work is to conduct a comprehensive and detailed study of photometric stereo under point light source illumination such as that of LEDs.

1.1 Related Works

Modeling the luminous flux emitted by a LED is a well-studied problem, see for instance [46]. One model which is frequently considered in computer vision is that of a nearby point light source. This model involves an inverse-of-square law describing the attenuation of the lighting intensity with respect to distance, which has long been identified as a key feature for solving shape-from-shading [32] and photometric stereo [12]. Attenuation with respect to the deviation from the principal direction of the source (anisotropy) has also been considered [7].

If the surface to reconstruct lies in the vicinity of a plane, it is possible to capture a map of this attenuation using a white planar reference object. Conventional photometric stereo [64] can then be applied to the images compensated by the attenuation maps [3, 40, 61]. Otherwise, it is necessary to include the attenuation coefficients in the photometric stereo model, which yields a nonlinear inverse problem to be solved.

This is easier to achieve if the parameters of the illumination model have been calibrated beforehand. Many methods exist for estimating a source location [1, 4, 11, 17, 22, 54, 59, 62]. Such methods triangulate this location during a calibration procedure, by resorting to specular spheres. This can also be achieved online, by introducing spheres into the scene to reconstruct [37]. Calibrating anisotropy is a more challenging problem, which was tackled recently in [48, 67] by using images of a planar surface. Some photometric stereo methods also circumvent calibration by (partly or completely) automatically inferring the lighting during the 3D-reconstruction process [36,37,38, 44, 51, 57].

Still, even in the calibrated case, designing numerical schemes for solving photometric stereo under nearby point light sources remains difficult. When only two images are considered, the photometric stereo model can be simplified using image ratios. This yields a quasi-linear PDE [42, 43] which can be solved by provably convergent front propagation techniques, provided that a boundary condition is known. To improve robustness, this strategy has been adapted to the multi-image case in [38, 39, 41, 56], using variational methods. However, convergence guarantees are lost. Instead of considering such a differential approach, another class of methods [2, 8, 13, 29, 34, 47, 51, 69] rather modifies the classical photometric stereo framework [64], by alternatingly estimating the normals and the albedo, integrating the normals into a depth map, and updating the lighting based on the current depth. Yet, no convergence guarantee exists. A method based on mesh deformation has also been proposed in [68], but its convergence is not established either.

1.2 Contributions

In contrast to existing works, which focus either on modeling, calibrating or solving photometric stereo with nearby point light sources such as LEDs, the objective of this article is to propose a comprehensive study of all these aspects of the problem. Building upon our previous conference papers [56,57,58], we introduce the following innovations:

  • We present in Sect. 2 an accurate model for photometric stereo under point light source illumination. As in recent works [38, 39, 41,42,43, 47, 48, 67], this model takes into account the nonlinearities due to distance and to the anisotropy of the LEDs. Yet, it also clarifies the notions of albedo and of source intensity, which are shown to be relative to a reference albedo and to several parameters of the camera, respectively. This section also introduces a practical calibration procedure for the location, the orientation and the relative intensity of the LEDs.

  • Section 3 reviews and improves two state-of-the-art numerical solutions in several manners. We first modify the alternating method [2, 8, 13, 29, 34, 47, 51, 69] by introducing an estimation of the shape scale, in order to recover the absolute depth without any prior. We then study the PDE-based approach which employs image ratios for eliminating the nonlinearities [38, 39, 41, 56], and empirically show that local minima can be avoided by employing an augmented Lagrangian strategy. Nevertheless, neither of these state-of-the-art methods is provably convergent.

  • Therefore, we introduce in Sect. 4 a new, provably convergent method, inspired by the one recently proposed in [57]. It is based on a tailored alternating reweighted least-squares scheme for approximately solving the nonlinearized system of PDEs. Following [58], we further show that this method is easily extended in order to address shadows and specularities.

  • In Sect. 5, we build upon the analysis conducted in [56] in order to tackle the case of RGB-valued images, before concluding and suggesting several future research directions in Sect. 6.

2 Photometric Stereo Under Point Light Source Illumination

Conventional photometric stereo [64] assumes that the primary luminous fluxes are parallel and uniform, which is difficult to guarantee. It is much easier to illuminate a scene with LEDs.

Keeping this in mind, we have developed a photometric stereo-based setup for the 3D-reconstruction of faces, which includes \(m = 8\) LEDsFootnote 2 located at about 30 cm from the scene surface (see Fig. 2a). The face is photographed by a Canon EOS 7D camera with focal length \(f = 35\) mm. Triggering the shutter in burst mode, while synchronously lighting the LEDs, provides us with \(m = 8\) images such as those of Fig. 2b–d. In this section, we aim at modeling the formation of such images, by establishing the following result:

If the m LEDs are modeled as anisotropic (imperfect Lambertian) point light sources, if the surface is Lambertian and if all the automatic settings of the camera are deactivated, then the formation of the m images can be modeled as follows, for \(i \in \{1,\ldots ,m\}\):

$$\begin{aligned} I^i(\mathbf {p}) = \varPsi ^i \, \overline{\rho }(\mathbf {p}) \left[ \frac{ \mathbf {n}^i_s \cdot \left( \mathbf {x}-\mathbf {x}^i_s \right) }{\Vert \mathbf {x}-\mathbf {x}^i_s\Vert } \right] ^{\mu ^i} \frac{\left\{ (\mathbf {x}^i_s-\mathbf {x}) \cdot \mathbf {n}(\mathbf {p}) \right\} _+}{\Vert \mathbf {x}^i_s-\mathbf {x}\Vert ^3}, \end{aligned}$$
(2.1)

where:

  • \(I^i(\mathbf {p})\) is the “corrected gray level” at pixel  \(\mathbf {p}\) conjugate to a point \(\mathbf {x}\) located on the surface [cf. Eq. (2.12)];

  • \(\varPsi ^i\) is the intensity of the ith source multiplied by an unknown factor, which is common to all the sources and depends on several camera parameters and on the albedo \(\rho _0\) of a Lambertian planar calibration pattern [cf. Eq. (2.14)];

  • \( \overline{\rho }(\mathbf {p})\) is the albedo of the surface point \(\mathbf {x}\) conjugate to pixel \(\mathbf {p}\), relatively to  \(\rho _0\) [cf. Eq. (2.22)];

  • \(\mathbf {n}^i_s \in \mathbb {S}^2 \subset \mathbb {R}^3\) is the (unit-length) principal direction of the ith source,  \(\mathbf {x}^i_s \in \mathbb {R}^3\) its location (cf. Fig. 2), and \(\mu ^i \ge 0\) its anisotropy parameter [cf. Fig. 3 and Eq. (2.5)];

  • \(\{\cdot \}_+\) is the positive part operator, which accounts for self-shadows:

    $$\begin{aligned} \{x\}_+ = \max \{x,0\}. \end{aligned}$$
    (2.2)

In Eq. (2.1), the anisotropy parameters \(\mu ^i\) are (indirectly) provided by the manufacturer [cf. Eq. (2.6)], and the other LED parameters \(\varPsi ^i\), \(\mathbf {n}^i_s\) and \(\mathbf {x}^i_s\) can be calibrated thanks to the procedure described in Sect. 2.2. The only unknowns in System (2.1) are thus the depth z of the 3D-point \(\mathbf {x}\) conjugate to \(\mathbf {p}\), its (relative) albedo \(\overline{\rho }(\mathbf {p})\) and its normal \(\mathbf {n}(\mathbf {p})\). The estimation of these unknowns will be discussed in Sects. 3 and 4. Before that, let us show step-by-step how to derive Eq. (2.1).
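Model (2.1) can also be evaluated directly, which is useful for simulating images or checking a calibration. A minimal sketch (function name ours, assuming calibrated parameters) computing the corrected gray level produced by one LED at one surface point:

```python
import numpy as np

def corrected_gray_level(x, n, rho_bar, x_s, n_s, Psi, mu):
    """Simulate the corrected gray level I^i(p) of Eq. (2.1) for one LED.

    x, n     : 3D surface point and its unit-length outgoing normal
    rho_bar  : relative albedo at the conjugate pixel
    x_s, n_s : LED location and unit-length principal direction
    Psi, mu  : relative intensity and anisotropy parameter
    """
    d = x - x_s
    r = np.linalg.norm(d)
    anisotropy = (n_s @ d / r) ** mu          # [n_s.(x - x_s)/||x - x_s||]^mu
    shading = max((x_s - x) @ n, 0.0)         # {(x_s - x).n}_+, self-shadows
    return Psi * rho_bar * anisotropy * shading / r**3
```

Setting \(\mu = 0\) recovers an isotropic nearby point light source with inverse-of-square falloff.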

Fig. 2

a Our photometric stereo-based experimental setup for 3D-reconstruction of faces using a Canon EOS 7D camera (highlighted in red) and \(m = 8\) LEDs (highlighted in blue). The walls are painted black in order to avoid reflections between the scene and the environment. b–d Three out of the \(m = 8\) images obtained by this setup (Color figure online)

2.1 Modeling the Luminous Flux Emitted by a LED

For the LEDs we use, the characteristic illuminating volume is of the order of one cubic millimeter. Therefore, in comparison with the scale of a face, each LED can be seen as a point light source located at \(\mathbf {x}_s \in \mathbb {R}^3\). At any point \(\mathbf {x} \in \mathbb {R}^3\), the lighting vector \(\mathbf {s}(\mathbf {x})\) is necessarily radial, i.e., collinear with the unit-length vector \(\mathbf {u}_r = \frac{\mathbf {x} - \mathbf {x}_s}{\Vert \mathbf {x} - \mathbf {x}_s \Vert }\). Using spherical coordinates \((r, \theta , \phi )\) of \(\mathbf {x}\) in a frame having \(\mathbf {x}_s\) as origin, it is written

$$\begin{aligned} \mathbf {s}(\mathbf {x}) = -\frac{\varPhi (\theta ,\phi )}{r^2} \, \mathbf {u}_r, \end{aligned}$$
(2.3)

where \(\varPhi (\theta ,\phi )\geqslant 0\) denotes the intensity of the sourceFootnote 3, and the \(1/r^2\) attenuation is a consequence of the conservation of luminous energy in a non-absorbing medium. Vector \(\mathbf {s}(\mathbf {x})\) is purposely oriented in the opposite direction from that of the light, in order to simplify the writing of the Lambertian model.

Model (2.3) is very general. We could project the intensity \(\varPhi (\theta ,\phi )\) on the spherical harmonics basis, which allowed Basri et al. to model the luminous flux in the case of uncalibrated photometric stereo [6]. We could also sample \(\varPhi (\theta ,\phi )\) in the vicinity of a plane, using a plane with known reflectance [3, 40, 61].

Using the specific characteristics of LEDs may lead to a more accurate model. Indeed, most LEDs emit a luminous flux which is invariant under rotation around a principal direction indicated by a unit-length vector \(\mathbf {n}_s\) [46]. If \(\theta \) is defined relatively to \(\mathbf {n}_s\), this means that \(\varPhi (\theta ,\phi )\) is independent of \(\phi \). The lighting vector in \(\mathbf {x}\) induced by a LED located in \(\mathbf {x}_s\) is thus written

$$\begin{aligned} \mathbf {s}(\mathbf {x}) = \frac{\varPhi (\theta )}{\Vert \mathbf {x}_s-\mathbf {x}\Vert ^2} \, \frac{\mathbf {x}_s-\mathbf {x}}{\Vert \mathbf {x}_s-\mathbf {x}\Vert }. \end{aligned}$$
(2.4)

The dependency on \(\theta \) of the intensity \(\varPhi \) characterizes the anisotropy of the LED. The function \(\varPhi (\theta )\) is generally decreasing over \([0, \pi / 2]\) (cf. Fig. 3).

Fig. 3

Intensity patterns of the LEDs used. a Anisotropy function \(\varPhi (\theta ) / \varPhi _0\) as a function of \(\theta \). b Polar representation. These diagrams show that \(\theta _{1/2} = \pi /3\), which corresponds to \(\mu = 1\) according to Eq. (2.6) (Lambertian source). Source: http://www.lumileds.com/uploads/28/DS64-pdf

An anisotropy model satisfying this constraint is that of “imperfect Lambertian source”:

$$\begin{aligned} \varPhi (\theta ) = \varPhi _0 \, \cos ^\mu \theta , \end{aligned}$$
(2.5)

which contains two parameters \(\varPhi _0=\varPhi (0)\) and \(\mu \geqslant 0\), and models both isotropic sources (\(\mu =0\)) and Lambertian sources (\(\mu =1\)). Model (2.5) is empirical, and more elaborate models are sometimes considered [46], yet it has already been used in photometric stereo [38, 39, 41, 42, 47, 48, 57, 67], including the case where all the LEDs are arranged on a plane parallel to the image plane, in such a way that \(\mathbf {n}_s = [0,0,1]^\top \) [43]. Model (2.5) has proven effective and, moreover, LED manufacturers provide the angle \(\theta _{1/2}\) such that \(\varPhi (\theta _{1/2}) = \varPhi _0/2\), from which we deduce, using (2.5), the value of \(\mu \):

$$\begin{aligned} \mu = -\frac{\log (2)}{\log (\cos \theta _{1/2})}. \end{aligned}$$
(2.6)

As shown in Fig. 3, the angle \(\theta _{1/2}\) is \(\pi /3\) for the LEDs we use. From Eq. (2.6), we deduce that \(\mu =1\), which means that these LEDs are Lambertian. Plugging the expression (2.5) of \(\varPhi (\theta )\) into (2.4), we obtain

$$\begin{aligned} \mathbf {s}(\mathbf {x}) = \varPhi _0 \, \cos ^\mu \theta \, \frac{\mathbf {x}_s-\mathbf {x}}{\Vert \mathbf {x}_s-\mathbf {x}\Vert ^3}, \end{aligned}$$
(2.7)

where we explicitly keep \(\mu \) to address the most general case. Model (2.7) thus includes seven parameters: three for the coordinates of \(\mathbf {x}_s\), two for the unit vector \(\mathbf {n}_s\), plus \(\varPhi _0\) and \(\mu \). Note that \(\mathbf {n}_s\) appears in this model through the angle \(\theta \).
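Equations (2.6) and (2.7) translate directly into code. A small sketch (function names ours), first recovering \(\mu \) from the manufacturer's half-intensity angle, then evaluating the lighting vector of an imperfect Lambertian LED:

```python
import numpy as np

def mu_from_half_angle(theta_half):
    """Anisotropy parameter from the half-intensity angle, Eq. (2.6)."""
    return -np.log(2.0) / np.log(np.cos(theta_half))

def lighting_vector(x, x_s, n_s, Phi0, mu):
    """Lighting vector s(x) of an imperfect Lambertian LED, Eq. (2.7).

    x, x_s : scene point and LED location
    n_s    : unit-length principal direction of the LED
    """
    d = x_s - x
    r = np.linalg.norm(d)
    cos_theta = n_s @ (x - x_s) / r     # theta between n_s and x - x_s
    return Phi0 * cos_theta**mu * d / r**3
```

For \(\theta _{1/2} = \pi /3\), as in Fig. 3, the first function returns \(\mu = 1\) (Lambertian source).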

In its uncalibrated version, photometric stereo allows the 3D-reconstruction of a scene surface without knowing the lighting. Uncalibrated photometric stereo has been widely studied, including the case of nearby point light sources [29, 36, 44, 51, 69], but whenever possible, it is preferable to calibrate the lightingFootnote 4.

2.2 Calibrating the Luminous Flux Emitted by a LED

Most calibration methods for a point light source [1, 4, 11, 17, 22, 54, 59, 62] take into account neither the attenuation of the luminous flux density as a function of the distance to the source, nor the possible anisotropy of the source, which may lead to relatively imprecise results. To our knowledge, few calibration procedures take these phenomena into account. In [67], Xie et al. use a single pattern, which is partially specular and partially Lambertian, to calibrate a LED. We intend to improve this procedure by using two patterns, one specular and the other Lambertian. The specular one will be used to determine the location of the LEDs by triangulation, and the Lambertian one to determine some other parameters by minimizing the reprojection error, as recently proposed by Pintus et al. in [53].

2.2.1 Specular Spherical Calibration Pattern

The location \(\mathbf {x}_s\) of a LED can be determined by triangulation. In [54], Powell et al. advocate the use of a spherical mirror. To estimate the locations of the \(m = 8\) LEDs for our setup, we use a billiard ball. Under perspective projection, the edge of the silhouette of a sphere is an ellipse, which we detect using a dedicated algorithm [52]. It is then easy to determine the 3D-coordinates of any point on the surface, as well as its normal, since the radius of the billiard ball is known. For each pose of the billiard ball, detecting the reflection of the LED allows us to determine, by reflecting the line of sight on the spherical mirror, a line in 3D-space passing through \(\mathbf {x}_s\). In theory, two poses of the billiard ball are enough to estimate \(\mathbf {x}_s\), even if two lines in 3D-space do not necessarily intersect, but the use of ten poses improves the robustness of the estimation.
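The triangulation step can be sketched as a least-squares intersection of the reflected lines. The formulation below (ours, not necessarily the paper's exact implementation) finds the point minimizing the sum of squared distances to a bundle of 3D lines, which handles the fact that the lines do not exactly intersect:

```python
import numpy as np

def nearest_point_to_lines(points, directions):
    """Least-squares estimate of the point closest to a bundle of 3D lines.

    Each reflected line j is given by a point points[j] (e.g., on the
    spherical mirror) and a direction directions[j]. Stacking the normal
    equations sum_j (I - d_j d_j^T)(x - p_j) = 0 gives a 3x3 linear system.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(points, directions):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the line
        A += P
        b += P @ p
    return np.linalg.solve(A, b)
```

With the ten poses of the billiard ball mentioned above, the overdetermined system averages out the detection errors on each reflected line.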

2.2.2 Lambertian Model

To estimate the principal direction \(\mathbf {n}_s\) and the intensity \(\varPhi _0\) in Model (2.7), we use a Lambertian calibration pattern. A surface is Lambertian if the apparent brightness of any point \(\mathbf {x}\) located on it is independent of the viewing angle. The luminance \(L(\mathbf {x})\), which is equal to the luminous flux emitted per unit of solid angle and per unit of apparent surface, is independent of the direction of emission. However, the luminance is not characteristic of the surface, as it depends on the illuminance \(E(\mathbf {x})\) (the notation E comes from the French "éclairement"), that is to say on the luminous flux per unit area received by the surface in \(\mathbf {x}\). The relationship between luminance and illuminanceFootnote 5 is written, for a Lambertian surface:

$$\begin{aligned} L(\mathbf {x}) = \frac{\rho (\mathbf {x})}{\pi }\,E(\mathbf {x}), \end{aligned}$$
(2.8)

where the albedo \(\rho (\mathbf {x})\in [0,1]\) is defined as the proportion of luminous energy which is reemitted, i.e., \(\rho (\mathbf {x}) = 1\) if \(\mathbf {x}\) is white, and \(\rho (\mathbf {x}) = 0\) if it is black.

The parameter \(\rho (\mathbf {x})\) is enough to characterize the reflectanceFootnote 6 of a Lambertian surface. In addition, the illuminance at a point \(\mathbf {x}\) of a (not necessarily Lambertian) surface with normal \(\mathbf {n}(\mathbf {x})\), lit by the lighting vector \(\mathbf {s}(\mathbf {x})\), is writtenFootnote 7

$$\begin{aligned} E(\mathbf {x}) = \left\{ \mathbf {s}(\mathbf {x}) \cdot \mathbf {n}(\mathbf {x})\right\} _+. \end{aligned}$$
(2.9)

Focusing the camera on a point \(\mathbf {x}\) of the scene surface, the illuminance \(\epsilon (\mathbf {p})\) of the image plane, at pixel \(\mathbf {p}\) conjugate to \(\mathbf {x}\), is related to the luminance \(L(\mathbf {x})\) by the following “almost linear” relationship [27]:

$$\begin{aligned} \epsilon (\mathbf {p}) = \beta \, \cos ^4\alpha (\mathbf {p}) \, L(\mathbf {x}), \end{aligned}$$
(2.10)

where \(\beta \) is a proportionality coefficient characterizing the brightness of the image, which depends on several factors such as the lens aperture and the magnification. The factor \(\cos ^4 \alpha (\mathbf {p})\), where \(\alpha (\mathbf {p})\) is the angle between the line of sight and the optical axis, is responsible for darkening at the periphery of the image. This effect should not be confused with vignetting, since it occurs even with ideal lenses [16].

With current photosensitive receptors, the gray level \(J(\mathbf {p})\) at pixel \(\mathbf {p}\) is almost proportionalFootnote 8 to its illuminance \(\epsilon (\mathbf {p})\), except of course in case of saturation. Denoting by \(\gamma \) this coefficient of quasi-proportionality, and combining equalities (2.8), (2.9) and (2.10), we get the following expression of the gray level at a pixel \(\mathbf {p}\) conjugate to a point \(\mathbf {x}\) located on a Lambertian surface:

$$\begin{aligned} J(\mathbf {p}) = \gamma \, \beta \, \cos ^4\alpha (\mathbf {p}) \, \frac{\rho (\mathbf {x})}{\pi } \, \left\{ \mathbf {s}(\mathbf {x}) \cdot \mathbf {n}(\mathbf {x})\right\} _+. \end{aligned}$$
(2.11)

We have already mentioned that there is a one-to-one correspondence between a point \(\mathbf {x}\) and its conjugate pixel \(\mathbf {p}\), which allows us to denote \(\rho (\mathbf {p})\) and \(\mathbf {n}(\mathbf {p})\) instead of \(\rho (\mathbf {x})\) and \(\mathbf {n}(\mathbf {x})\). As the factor \(\cos ^4 \alpha (\mathbf {p})\) is easy to calculate in each pixel \(\mathbf {p}\) of the photosensitive receptor, since \(\cos \alpha (\mathbf {p}) = \frac{f}{\sqrt{\Vert \mathbf {p}\Vert ^2+f^2}}\), we can very easily compensate for this source of darkening and will manipulate from now on the “corrected gray level”:

$$\begin{aligned} I(\mathbf {p}) = \frac{J(\mathbf {p})}{\cos ^4\alpha (\mathbf {p})}= \gamma \, \beta \, \frac{\rho (\mathbf {p})}{\pi } \, \left\{ \mathbf {s}(\mathbf {x}) \cdot \mathbf {n}(\mathbf {p}) \right\} _+. \end{aligned}$$
(2.12)
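The compensation of Eq. (2.12) is straightforward to implement, using \(\cos \alpha (\mathbf {p}) = f / \sqrt{\Vert \mathbf {p}\Vert ^2 + f^2}\). A minimal sketch (function name ours):

```python
import numpy as np

def correct_gray_levels(J, u, v, f):
    """Compensate the cos^4 alpha darkening, Eq. (2.12).

    J    : raw gray levels J(p)
    u, v : pixel coordinates relative to the principal point (same unit as f)
    f    : focal length
    """
    cos_alpha = f / np.sqrt(u**2 + v**2 + f**2)
    return J / cos_alpha**4
```

At the principal point the correction is the identity, and it grows toward the periphery of the image, where \(\alpha (\mathbf {p})\) is largest.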

2.2.3 Lambertian Planar Calibration Pattern

To estimate the parameters \(\mathbf {n}_s\) and \(\varPhi _0\) in Model (2.7), i.e., to achieve photometric calibration, we use a second calibration pattern consisting of a checkerboard printed on a white paper sheet, which is itself stuck on a plane (cf. Fig. 4), with the hope that the unavoidable outliers to the Lambertian model will not influence the accuracy of the estimates too much.

Fig. 4

Two out of the q poses of the Lambertian planar calibration pattern used for the photometric calibration of the LEDs. The parts of the white cells which are used for estimating the principal directions and intensities of the LEDs are highlighted in red

The use of a convex calibration pattern (planar, in this case) has a significant advantage: the lighting vector \(\mathbf {s}(\mathbf {x})\) at any point \(\mathbf {x}\) of the surface is purely primary i.e., it is only due to the light source, without “bouncing” on other parts of the surface of the target, provided that the walls and surrounding objects are covered in black (see Fig. 2a). Thanks to this observation, we can replace the lighting vector \(\mathbf {s}(\mathbf {x})\) in Eq. (2.12) by the expression (2.7) which models the luminous flux emitted by a LED. From (2.7) and (2.12), we deduce the gray level \(I(\mathbf {p})\) of the image of a point \(\mathbf {x}\) located on this calibration pattern, illuminated by a LED:

$$\begin{aligned} I(\mathbf {p}) = \gamma \, \beta \, \frac{\rho (\mathbf {p})}{\pi } \, \varPhi _0 \cos ^\mu \theta \frac{ \left\{ (\mathbf {x}_s-\mathbf {x}) \cdot \mathbf {n}(\mathbf {p}) \right\} _+}{\Vert \mathbf {x}_s-\mathbf {x}\Vert ^3}. \end{aligned}$$
(2.13)

If \(q \geqslant 3\) poses of the checkerboard are used, numerous algorithms exist for unambiguously estimating the coordinates of the points \(\mathbf {x}^j\) of the pattern, for the different poses \(j \in \{1,\ldots ,q\}\). These algorithms also allow the estimation of the q normals \(\mathbf {n}^j\) (we omit the dependency in \(\mathbf {p}\) of \(\mathbf {n}^j\), since the pattern is planar), as well as the intrinsic parameters of the cameraFootnote 9. As for the albedo, even if the use of white paper does not guarantee that \(\rho (\mathbf {p}) \equiv 1\), it still seems reasonable to assume \(\rho (\mathbf {p}) \equiv \rho _0\), i.e., a uniform albedo in the white cells. We can then group all the multiplicative coefficients of the right-hand side of Eq. (2.13) into a single coefficient

$$\begin{aligned} \varPsi = \gamma \, \beta \, \frac{\rho _0}{\pi } \, \varPhi _0. \end{aligned}$$
(2.14)

With this definition, and knowing that \(\theta \) is the angle between vectors \(\mathbf {n}_s\) and \(\mathbf {x}-\mathbf {x}_s\), Eq. (2.13) can be rewritten, in a pixel \(\mathbf {p}\) of the set \(\varOmega ^j\) containing the white pixels of the checkerboard in the \(j^\mathrm{th}\) pose (these pixels are highlighted in red in the images of Fig. 4):

$$\begin{aligned} I^j(\mathbf {p}) = \varPsi \left[ \frac{\mathbf {n}_s \cdot \left( \mathbf {x}^j-\mathbf {x}_s\right) }{\Vert \mathbf {x}^j-\mathbf {x}_s\Vert } \right] ^\mu \frac{\left\{ (\mathbf {x}_s-\mathbf {x}^j) \cdot \mathbf {n}^j \right\} _+}{\Vert \mathbf {x}_s-\mathbf {x}^j\Vert ^3}. \end{aligned}$$
(2.15)

To ensure that \(\varPsi \) in Eq. (2.15) is independent of the pose j, we must deactivate all automatic settings of the camera, so that \(\beta \) and \(\gamma \) remain constant.

Since \(\mathbf {x}_s\) is already estimated, and the value of \(\mu \) is known, the only unknowns in Eq. (2.15) are \(\mathbf {n}_s\) and \(\varPsi \). Two cases may occur:

  • If the LED to calibrate is isotropic, i.e., if \(\mu =0\), then there is no need to estimate \(\mathbf {n}_s\), and \(\varPsi \) can be estimated in the least-squares sense, by solving

    $$\begin{aligned} \underset{\varPsi }{{\min }} \sum _{j=1}^{q} \sum _{\mathbf {p} \in \varOmega ^j} \left[ I^j(\mathbf {p}) - \varPsi \, \frac{\left\{ (\mathbf {x}_s-\mathbf {x}^j) \cdot \mathbf {n}^j \right\} _+}{\Vert \mathbf {x}_s-\mathbf {x}^j\Vert ^3} \right] ^2, \end{aligned}$$
    (2.16)

    whose solution is given by

    $$\begin{aligned} \varPsi = \frac{\sum _{j=1}^{q} \sum _{\mathbf {p} \in \varOmega ^j} I^j(\mathbf {p}) \, \frac{\left\{ (\mathbf {x}_s-\mathbf {x}^j) \cdot \mathbf {n}^j \right\} _+}{\Vert \mathbf {x}_s-\mathbf {x}^j\Vert ^3}}{\sum _{j=1}^{q} \sum _{\mathbf {p} \in \varOmega ^j} \left[ \frac{\left\{ (\mathbf {x}_s-\mathbf {x}^j) \cdot \mathbf {n}^j \right\} _+}{\Vert \mathbf {x}_s-\mathbf {x}^j\Vert ^3} \right] ^2}. \end{aligned}$$
    (2.17)
  • Otherwise (if \(\mu >0\)), Eq. (2.15) can be rewritten

    $$\begin{aligned} \underbrace{\varPsi ^{\frac{1}{\mu }} \, \mathbf {n}_s}_{\mathbf {m}_s} \cdot \,(\mathbf {x}^j-\mathbf {x}_s) = \left[ I^j(\mathbf {p}) \, \frac{\Vert \mathbf {x}_s-\mathbf {x}^j\Vert ^{3+\mu }}{\left\{ (\mathbf {x}_s-\mathbf {x}^j) \cdot \mathbf {n}^j \right\} _+} \right] ^{\frac{1}{\mu }} . \end{aligned}$$
    (2.18)

    The least-squares estimation of vector \(\mathbf {m}_s\) defined in (2.18) is thus written

    $$\begin{aligned} \underset{\mathbf {m}_s}{{\min }} \sum _{j=1}^{q} \sum _{\mathbf {p} \in \varOmega ^j} \left[ \mathbf {m}_s \cdot (\mathbf {x}^j-\mathbf {x}_s) - \left[ I^j(\mathbf {p}) \, \frac{\Vert \mathbf {x}_s-\mathbf {x}^j\Vert ^{3+\mu }}{ \left\{ (\mathbf {x}_s-\mathbf {x}^j) \cdot \mathbf {n}^j\right\} _+} \right] ^{\frac{1}{\mu }} \right] ^2. \end{aligned}$$
    (2.19)

    This linear least-squares problem can be solved using the pseudo-inverse. From this estimate, we easily deduce those of parameters \(\mathbf {n}_s\) and \(\varPsi \):

    $$\begin{aligned} \mathbf {n}_s = \frac{\mathbf {m}_s}{\Vert \mathbf {m}_s\Vert } \quad \text {and} \quad \varPsi = \Vert \mathbf {m}_s\Vert ^\mu . \end{aligned}$$
    (2.20)

In both cases, it is impossible to deduce from the estimate of \(\varPsi \) that of \(\varPhi _0\), because in the definition (2.14) of \(\varPsi \), the product \(\gamma \, \beta \, \frac{\rho _0}{\pi }\) is unknown. However, since this product is the same for all LEDs (deactivating all automatic settings of the camera makes \(\beta \) and \(\gamma \) constant), all the intensities \(\varPhi _0^i\), \(i \in \{1,\ldots ,m\}\), are estimated up to a common factor.
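For the anisotropic case \(\mu > 0\), Eqs. (2.18)–(2.20) reduce to a single linear least-squares solve. A sketch under noise-free assumptions (function name ours), which discards points where the positive part vanishes:

```python
import numpy as np

def calibrate_led(I, X, N, x_s, mu):
    """Estimate n_s and Psi from Eqs. (2.18)-(2.20), assuming mu > 0.

    I  : (k,) corrected gray levels at the calibration points
    X  : (k, 3) 3D points x^j on the Lambertian pattern
    N  : (k, 3) unit normals n^j of the corresponding poses
    x_s: (3,) LED location, already triangulated
    """
    D = X - x_s                                   # x^j - x_s
    r = np.linalg.norm(D, axis=1)
    shading = np.sum((x_s - X) * N, axis=1)       # (x_s - x^j) . n^j
    keep = shading > 0                            # discard self-shadowed points
    rhs = (I[keep] * r[keep]**(3 + mu) / shading[keep]) ** (1.0 / mu)
    m_s, *_ = np.linalg.lstsq(D[keep], rhs, rcond=None)   # Eq. (2.19)
    n_s = m_s / np.linalg.norm(m_s)               # Eq. (2.20)
    Psi = np.linalg.norm(m_s) ** mu
    return n_s, Psi
```

The isotropic case (\(\mu = 0\)) is even simpler, since Eq. (2.17) gives \(\varPsi \) in closed form.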

Figure 5 shows a schematic representation of the experimental setup of Fig. 2a, where the LEDs parameters were estimated using our calibration procedure.

Fig. 5
figure 5

Two views of a schematic representation of the experimental setup of Fig. 2a. The camera center is located in (0, 0, 0). A black marker characterizes the location \(\mathbf {x}_s\) of each LED (unit mm), the orientation of a blue arrow its principal direction \(\mathbf {n}_s\), and the length of this arrow its intensity \(\varPhi _0\) (up to a common factor) (Color figure online)

2.3 Modeling Photometric Stereo with Point Light Sources

If the luminous flux emitted by a LED is described by Model (2.7), then we obtain from (2.13) and (2.14) the following equation for the gray level at pixel \(\mathbf {p}\):

$$\begin{aligned} I(\mathbf {p}) = \varPsi \, \frac{\rho (\mathbf {p})}{\rho _0} \left[ \frac{\mathbf {n}_s \cdot \left( \mathbf {x}-\mathbf {x}_s \right) }{\Vert \mathbf {x}-\mathbf {x}_s\Vert } \right] ^\mu \frac{\left\{ (\mathbf {x}_s-\mathbf {x}) \cdot \mathbf {n}(\mathbf {p})\right\} _+}{\Vert \mathbf {x}_s-\mathbf {x}\Vert ^3}. \end{aligned}$$
(2.21)

Let us introduce a new definition of the albedo relative to the albedo \(\rho _0\) of the Lambertian planar calibration pattern:

$$\begin{aligned} \overline{\rho }(\mathbf {p}) = \frac{\rho (\mathbf {p})}{\rho _0}. \end{aligned}$$
(2.22)

By writing Eq. (2.21) with respect to each LED, and by using Eq. (2.22), we obtain, in each pixel \(\mathbf {p}\in \varOmega \), the system of equations (2.1), for \(i\in \{1,\ldots ,m\}\).

To solve this system, the introduction of the auxiliary variable \(\overline{\mathbf {m}}(\mathbf {p}) = \overline{\rho }(\mathbf {p})\, \mathbf {n}(\mathbf {p})\) may seem relevant, since this vector is not constrained to have unit length, but we will see that this trick loses some of its appeal. Defining the following m vectors, for \(i\in \{1,\ldots ,m\}\):

$$\begin{aligned} \mathbf {t}^i(\mathbf {x}) = \varPsi ^i \left[ \frac{\mathbf {n}^i_s \cdot \left( \mathbf {x}-\mathbf {x}^i_s\right) }{\Vert \mathbf {x}-\mathbf {x}^i_s\Vert } \right] ^{\mu ^i} \frac{\mathbf {x}^i_s-\mathbf {x}}{\Vert \mathbf {x}^i_s-\mathbf {x}\Vert ^3}, \end{aligned}$$
(2.23)

and neglecting self-shadows (i.e., \(\{x\}_+ = x\)), System (2.1) can be rewritten in matrix form:

$$\begin{aligned} \mathbf {I}(\mathbf {p}) = \mathbf {T}(\mathbf {x}) \, \overline{\mathbf {m}}(\mathbf {p}), \end{aligned}$$
(2.24)

where \(\mathbf {I}(\mathbf {p})\in \mathbb {R}^m\) has been defined in (1.5) and \(\mathbf {T}(\mathbf {x})\in \mathbb {R}^{m\times 3}\) is defined as follows:

$$\begin{aligned} \mathbf {T}(\mathbf {x}) = \begin{bmatrix} \mathbf {t}^{1}(\mathbf {x})^\top \\ \vdots \\ \mathbf {t}^{m}(\mathbf {x})^\top \end{bmatrix}. \end{aligned}$$
(2.25)

Equation (2.24) is similar to (1.4). Knowing the matrix field \(\mathbf {T}(\mathbf {x})\) would allow us to estimate its field of pseudo-inverses in order to solve (2.24), just as calculating the pseudo-inverse of \(\mathbf {S}\) allows us to solve (1.4). However, the matrix field \(\mathbf {T}(\mathbf {x})\) depends on \(\mathbf {x}\), and thus on the unknown depth. This simple difference induces major changes when it comes to the numerical solution, as discussed in the next two sections.
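To make this concrete: if the depth, and hence each \(\mathbf {x}\), were known, System (2.24) could indeed be solved pixelwise through the pseudo-inverse of \(\mathbf {T}(\mathbf {x})\). A minimal NumPy sketch (the `leds` container, a list of `(x_s, n_s, Psi, mu)` tuples, is an assumed data layout, not notation from the paper):

```python
import numpy as np

def lighting_matrix(x, leds):
    """Stack the m lighting vectors t^i(x) of Eq. (2.23) into the
    matrix T(x) of Eq. (2.25), for a 3D point x and a list of
    calibrated LEDs given as (x_s, n_s, Psi, mu) tuples."""
    rows = []
    for x_s, n_s, Psi, mu in leds:
        v = x - x_s
        r = np.linalg.norm(v)
        rows.append(Psi * (n_s @ v / r) ** mu * (x_s - x) / r ** 3)
    return np.array(rows)

def solve_pixel(I, x, leds):
    """Least-squares solution of Eq. (2.24) at one pixel, assuming the
    depth (hence x) is known: returns the relative albedo rho_bar and
    the unit normal n recovered from m_bar = rho_bar * n."""
    m_bar, *_ = np.linalg.lstsq(lighting_matrix(x, leds), I, rcond=None)
    return np.linalg.norm(m_bar), m_bar / np.linalg.norm(m_bar)
```

The whole difficulty of the next sections is precisely that `x` is unknown, so this pixelwise solve cannot be applied directly.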

3 A Review of Two Variational Approaches for Solving Photometric Stereo Under Point Light Source Illumination, with New Insights

In this section, we study two variational approaches from the literature for solving photometric stereo under point light source illumination.

The first one inverts the nonlinear image formation model by recasting it as a sequence of simpler subproblems [2, 8, 13, 29, 34, 47, 51, 69]. It consists in estimating the normals and the albedo, assuming that the depth map is fixed, then integrating the normals into a new depth map, and iterating. We show in Sect. 3.1 how to improve this standard method in order to estimate absolute depth, without resorting to any prior.

The second approach first linearizes the image formation model by resorting to image ratios, then directly estimates the depth by solving the resulting system of PDEs in an approximate manner [38, 39, 41, 56]. We show in Sect. 3.2 that state-of-the-art solutions, which resort to fixed point iterations, may be trapped in local minima. This shortcoming can be avoided by rather using an augmented Lagrangian algorithm.

As in these state-of-the-art methods, self-shadows will be neglected throughout this section, i.e., we abusively assume \(\{x\}_+ = x\). To enforce robustness, we simply follow the approach advocated in [10], which systematically eliminates, in each pixel, the highest gray level, which may come from a specular highlight, as well as the two lowest ones, which may correspond to shadows. More elaborate methods for ensuring robustness will be discussed in Sect. 4.

Apart from robustness issues, we will see that the state-of-the-art methods studied in this section remain unsatisfactory, because their convergence is not established.

3.1 Scheme Inspired by the Classical Numerical Solution of Photometric Stereo

For solving Problem (2.24), it seems quite natural to adapt the solution (1.6) of the linear model (1.4). To linearize (2.24), we have to assume that matrix \(\mathbf {T}(\mathbf {x})\) is known. If we proceed iteratively, this can be made possible by replacing, at iteration \((k+1)\), \(\mathbf {T}(\mathbf {x})\) by \(\mathbf {T}(\mathbf {x}^{(k)})\). This very simple idea has led to several numerical solutions [2, 8, 13, 29, 34, 47, 51, 69], which all require some kind of a priori knowledge of the depth. In contrast, the scheme we propose here requires none, which constitutes a significant improvement. This new scheme consists in the following algorithm:

Algorithm 1

For this scheme to be completely specified, we need to set the initial 3D-shape \(\mathbf {x}^{(0)}\). We use as initial guess a fronto-parallel plane at distance \(z_0\) from the camera, \(z_0\) being a rough estimate of the mean distance from the camera to the scene surface.

3.1.1 Integration of Normals

Stages 3 and 4 of the scheme above are trivial and can be achieved pixelwise, but Stages 5 and 6 are trickier. From equalities in (1.1), and by denoting \(\nabla z(\mathbf {p}) = \left[ \partial _u z(\mathbf {p}),\partial _v z(\mathbf {p})\right] ^\top \) the gradient of z in \(\mathbf {p}\), it is easy to deduce that the (non-unit-length) vector

$$\begin{aligned} \overline{\mathbf {n}}(\mathbf {p}) = \begin{bmatrix} f \, \partial _u z(\mathbf {p}) \\ f \, \partial _v z(\mathbf {p}) \\ -z(\mathbf {p}) - \mathbf {p} \cdot \nabla z(\mathbf {p}) \end{bmatrix} \end{aligned}$$
(3.4)

is normal to the surface. Expression (3.4) shows that integrating the (unit-length) normal field \(\mathbf {n}\) allows the depth z to be estimated only up to a scale factor \(\kappa \in \mathbb {R}\), since:

$$\begin{aligned} \mathbf {n}(\mathbf {p}) \propto \begin{bmatrix} f \, \partial _u z(\mathbf {p}) \\ f \, \partial _v z(\mathbf {p}) \\ -z(\mathbf {p}) - \mathbf {p} \cdot \nabla z(\mathbf {p}) \end{bmatrix} \propto \begin{bmatrix} f \, \partial _u (\kappa \, z)(\mathbf {p}) \\ f \, \partial _v (\kappa \, z)(\mathbf {p}) \\ - (\kappa \, z)(\mathbf {p}) - \mathbf {p} \cdot \nabla (\kappa \, z)(\mathbf {p}) \end{bmatrix}. \end{aligned}$$
(3.5)
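This scale ambiguity is easily checked numerically: scaling the depth map by \(\kappa\) (and hence its gradient, since \(\nabla (\kappa z) = \kappa \nabla z\)) scales \(\overline{\mathbf {n}}(\mathbf {p})\) by \(\kappa\), leaving the unit normal unchanged. A minimal sketch:

```python
import numpy as np

def n_bar(z, grad_z, p, f):
    """Non-unit-length surface normal of Eq. (3.4) at pixel p = [u, v],
    from the depth z and its gradient grad_z = [du_z, dv_z]."""
    return np.array([f * grad_z[0], f * grad_z[1], -z - p @ grad_z])

# n_bar is jointly linear in (z, grad_z), so multiplying the depth map
# by kappa multiplies n_bar by kappa: the unit normal is unchanged,
# which is exactly the ambiguity stated in Eq. (3.5).
```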

The collinearity of \(\overline{\mathbf {n}}(\mathbf {p})\) and \(\mathbf {n}(\mathbf {p}) = [n_1(\mathbf {p}),n_2(\mathbf {p}),n_3(\mathbf {p})]^\top \) leads to the system

$$\begin{aligned} {\left\{ \begin{array}{ll} n_3(\mathbf {p}) \, f \, \partial _u z(\mathbf {p}) + n_1(\mathbf {p}) \left[ z(\mathbf {p}) + \mathbf {p} \cdot \nabla z(\mathbf {p}) \right] = 0, \\ n_3(\mathbf {p}) \, f \, \partial _v z(\mathbf {p}) + n_2(\mathbf {p}) \left[ z(\mathbf {p}) + \mathbf {p} \cdot \nabla z(\mathbf {p}) \right] = 0, \end{array}\right. } \end{aligned}$$
(3.6)

which is homogeneous in \(z(\mathbf {p})\). Introducing the change of variable \(\tilde{z} = \log (z)\), which is valid since \(z > 0\), (3.6) is rewritten

$$\begin{aligned} {\left\{ \begin{array}{ll} \left[ f \, n_3(\mathbf {p}) + u \, n_1(\mathbf {p}) \right] \partial _u \tilde{z}(\mathbf {p}) + v \, n_1(\mathbf {p}) \partial _v \tilde{z}(\mathbf {p}) = - n_1(\mathbf {p}), \\ u \, n_2(\mathbf {p}) \partial _u \tilde{z}(\mathbf {p}) + \left[ f \, n_3(\mathbf {p}) + v \, n_2(\mathbf {p}) \right] \partial _v \tilde{z}(\mathbf {p}) = - n_2(\mathbf {p}). \end{array}\right. } \end{aligned}$$
(3.7)

The determinant of this system is equal to

$$\begin{aligned} f \, n_3(\mathbf {p}) \left[ u \, n_1(\mathbf {p}) {+} v \, n_2(\mathbf {p}) {+} f \, n_3(\mathbf {p})\right] {=} f \, n_3(\mathbf {p}) \left[ \overline{\mathbf {p}} \cdot \mathbf {n}(\mathbf {p})\right] , \end{aligned}$$
(3.8)

if we denote

$$\begin{aligned} \overline{\mathbf {p}} = [u,v,f]^\top . \end{aligned}$$
(3.9)

It is then easy to deduce the solution of (3.7):

$$\begin{aligned} \nabla \tilde{z}(\mathbf {p}) = - \frac{1}{\overline{\mathbf {p}} \cdot \mathbf {n}(\mathbf {p})} \begin{bmatrix} n_1(\mathbf {p}) \\ n_2(\mathbf {p}) \end{bmatrix}. \end{aligned}$$
(3.10)
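Equation (3.10) is a purely local computation. A minimal per-pixel NumPy sketch:

```python
import numpy as np

def log_depth_gradient(n, p, f):
    """Gradient of z_tilde = log(z) at pixel p = [u, v], from the unit
    normal n = [n1, n2, n3], following Eq. (3.10) with p_bar = [u, v, f]."""
    p_bar = np.array([p[0], p[1], f])
    return -n[:2] / (p_bar @ n)
```

Note that the result is invariant to the sign and norm of `n`, so any vector collinear to the normal can be passed.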

Let us now come back to Stages 5 and 6 of Algorithm 1. The new normal field is \(\mathbf {n}^{(k+1)}(\mathbf {p})\), from which we can deduce the gradient \(\nabla \tilde{z}^{(k+1)}(\mathbf {p})\) thanks to Eq. (3.10). By integrating this gradient between a pixel \(\mathbf {p}_0\), chosen arbitrarily inside \(\varOmega \), and any pixel \(\mathbf {p}\in \varOmega \), and knowing that \(z = \exp \{\tilde{z}\}\), we obtain:

$$\begin{aligned} z^{(k+1)}(\mathbf {p}) = z^{(k+1)}(\mathbf {p}_0) \, \exp \left\{ \int _{\mathbf {p}_0}^{\mathbf {p}} \nabla \tilde{z}^{(k+1)}(\mathbf {q}) \cdot \mathrm {d}\mathbf {q} \right\} . \end{aligned}$$
(3.11)

This integral can be calculated along one single path inside \(\varOmega \) going from \(\mathbf {p}_0\) to \(\mathbf {p}\), but since the gradient field \(\nabla \tilde{z}^{(k+1)}(\mathbf {p})\) is never rigorously integrable in practice, the result usually depends on the choice of the path [66]. The most common remedy to this well-known problem is to resort to a variational approach, see for instance [55] for some discussion.
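For illustration, the simplest (non-variational) realization of Eq. (3.11) integrates the gradient along one fixed path: first along the top row, then down each column. This is a sketch assuming unit pixel spacing and an exactly integrable field; real implementations should use a variational integrator, as discussed above.

```python
import numpy as np

def integrate_log_depth(gu, gv, du=1.0, dv=1.0):
    """Path integration of grad z_tilde = (gu, gv), both (H,W), from the
    top-left pixel p0, followed by exponentiation (Eq. (3.11)).
    Returns the depth map up to the unknown factor z(p0)."""
    W = gu.shape[1]
    row0 = np.cumsum(np.concatenate([[0.0], gu[0, :-1] * du]))            # along row 0
    cols = np.vstack([np.zeros(W), np.cumsum(gv[:-1, :] * dv, axis=0)])   # down columns
    return np.exp(row0[None, :] + cols)
```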

Expression (3.11) confirms that the depth can only be calculated, from \(\mathbf {n}^{(k+1)}(\mathbf {p})\), up to a scale factor equal to \(z^{(k+1)}(\mathbf {p}_0)\). Let us determine this scale factor by minimization of the reprojection error of Model (2.24) over the entire domain \(\varOmega \). Knowing that, from (1.1) and (3.9), we get \(\mathbf {x} = \frac{z}{f}\, \overline{\mathbf {p}}\), this comes down to solving the following nonlinear least-squares problem:

$$\begin{aligned}&z^{(k+1)}(\mathbf {p}_0) = \underset{w \, \in \, \mathbb {R}^+}{\arg \min ~} \mathcal {E}_{\mathrm {alt}}(w):= \sum _{\mathbf {p} \in \varOmega } \Big \Vert \mathbf {I}(\mathbf {p}) \nonumber \\&\quad - \mathbf {T} \Big (\frac{w}{f} \exp \left\{ \int _{\mathbf {p}_0}^{\mathbf {p}} \nabla \tilde{z}^{(k+1)}(\mathbf {q}) \cdot \mathrm {d}\mathbf {q}\right\} \overline{\mathbf {p}} \Big ) \, \overline{\mathbf {m}}^{(k+1)}(\mathbf {p}) \Big \Vert ^2, \end{aligned}$$
(3.12)

which allows us to eventually write the 3D-shape update (Stages 5 and 6):

$$\begin{aligned} \mathbf {x}^{(k+1)} = \frac{z^{(k+1)}(\mathbf {p}_0)}{f} \, \exp \left\{ \int _{\mathbf {p}_0}^{\mathbf {p}} \nabla \tilde{z}^{(k+1)}(\mathbf {q}) \cdot \mathrm {d}\mathbf {q} \right\} \overline{\mathbf {p}}. \end{aligned}$$
(3.13)

3.1.2 Experimental Validation

Despite the lack of theoretical guarantee, convergence of this scheme is empirically observed, provided that the initial 3D-shape \(\mathbf {x}^{(0)}\) is not too distant from the scene surface. For the curves in Fig. 6, several fronto-parallel planes with equation \(z\equiv z_0\) were tested as initial guess. The mean distance from the camera to the scene being approximately 700 mm, it is not surprising that the fastest convergence is observed for this value of \(z_0\). Besides, this graph also shows that substantially under-estimating the initial scale is not a problem, whereas over-estimating it severely slows down the process.

Fig. 6
figure 6

Evolution of the energy \(\mathcal {E}_{\mathrm {alt}}\) of the alternating approach, defined in (3.12), as a function of the iterations, when the initial 3D-shape is a fronto-parallel plane with equation \(z\equiv z_0\). The data used are the \(m=8\) images of the plaster statuette of Fig. 2. The proposed scheme consists in alternating normal estimation, normal integration and scale estimation (cf. Algorithm 1). It converges toward the same solution (at different speeds) for the five tested values of \(z_0\)

Figure 7 compares the 3D-shape obtained by photometric stereo, from sub-images of size \(920\times 1178\) in full resolution (bounding box of the statuette), which contain 773,794 pixels inside \(\varOmega \), with the ground truth obtained by laser scanning, which contains 1,753,010 points. The point density is thus almost the same on the front of the statuette, since we did not reconstruct its back. However, our result is achieved in less than ten seconds (five iterations of a MATLAB code on a recent i7 processor), instead of several hours for the ground truth, while we also estimate the albedo.

Fig. 7
figure 7

a 3D-reconstruction and b albedo obtained with Algorithm 1. c Ground truth 3D-shape obtained by laser scanning. Photometric stereo not only provides a 3D-shape qualitatively similar to the laser scan, but also provides the albedo

Fig. 8
figure 8

a Histogram of point-to-point distances between the alternating 3D-reconstruction and the ground truth (cf. Fig. 7). The median value is 1.3 mm. b Spatial distribution of these distances. The histogram peak is not located at zero. As we will see in Sect. 3.2, this bias can be avoided by resorting to a differential approach based on PDEs

Figure 8a shows the histogram of point-to-point distances between our result (Fig. 7a) and the ground truth (Fig. 7c). The median value is 1.3 mm. The spatial distribution of these distances (Fig. 8b) shows that the largest distances are observed on the highest slopes of the surface. This clearly comes from the fact that, even for a diffuse material such as plaster, the Lambertian model is not valid under grazing lighting, and that self-shadows were neglected.

More realistic reflectance models, such as the one proposed by Oren and Nayar in [49], would perhaps improve the accuracy of the 3D-reconstruction at such points, and we will see in Sect. 4 how to handle self-shadows. But, as we shall see now, bias also comes from normal integration. In the next section, we describe a different formulation of photometric stereo which avoids integration altogether, by solving a system of PDEs in z.

3.2 Direct Depth Estimation Using Image Ratios

The scheme proposed in Sect. 3.1 suffers from several defects. It requires integrating the gradient \(\nabla \tilde{z}^{(k+1)}(\mathbf {p})\) at each iteration. This is not achieved by the naive formulation (3.12), but using more sophisticated methods which overcome the problem of non-integrability [14]. Still, bias due to inaccurate normal estimation should not have to be corrected during integration. Instead, it seems more justified to directly estimate the depth map, without resorting to intermediate normal estimation. This can be achieved by recasting photometric stereo as a system of quasi-linear PDEs.

3.2.1 Differential Reformulation of Problem (2.24)

Let us recall (cf. Eq. (1.1)) that the coordinates of the 3D-point \(\mathbf {x}\) conjugate to a pixel \(\mathbf {p}\) are completely characterized by the depth \(z(\mathbf {p})\):

$$\begin{aligned} \mathbf {x} = \frac{z(\mathbf {p})}{f} \, \begin{bmatrix} \mathbf {p} \\ f \end{bmatrix}. \end{aligned}$$
(3.14)

The vectors \(\mathbf {t}^i(\mathbf {x})\) defined in (2.23) thus depend on the unknown depth values \(z(\mathbf {p})\). Using once again the change of variable \(\tilde{z} = \log (z)\), we consider from now on each \(\mathbf {t}^i\), \(i \in \{1,\ldots ,m\}\), as a vector field depending on the unknown map \(\tilde{z}\):

$$\begin{aligned} \begin{array}{rccl} \mathbf {t}^i(\tilde{z}):&{} \varOmega &{} \rightarrow &{} \mathbb {R}^3 \\ &{} \mathbf {p} &{} \mapsto &{} \mathbf {t}^i(\tilde{z})(\mathbf {p}) = \varPsi ^i \left[ - \frac{ \mathbf {n}_s^i \cdot \mathbf {v}^i(\tilde{z})(\mathbf {p})}{\Vert \mathbf {v}^i(\tilde{z})(\mathbf {p})\Vert }\right] ^{\mu ^i} \frac{\mathbf {v}^i(\tilde{z})(\mathbf {p})}{\Vert \mathbf {v}^i(\tilde{z})(\mathbf {p})\Vert ^3}, \end{array} \end{aligned}$$
(3.15)

where each field \(\mathbf {t}^i(\tilde{z})\) depends in a nonlinear way on the unknown (log-) depth map \(\tilde{z}\), through the following vector field:

$$\begin{aligned} \begin{array}{rccl} \mathbf {v}^i(\tilde{z}):&{} \varOmega &{} \rightarrow &{} \mathbb {R}^3 \\ &{} \mathbf {p} &{} \mapsto &{} \mathbf {v}^i(\tilde{z})(\mathbf {p}) = \mathbf {x}^i_s - \frac{\exp \left( \tilde{z}(\mathbf {p})\right) }{f} \, \begin{bmatrix} \mathbf {p} \\ f \end{bmatrix}. \end{array} \end{aligned}$$
(3.16)

Knowing that the (non-unit-length) vector \(\overline{\mathbf {n}}(\mathbf {p})\) defined in (3.4), divided by \(z(\mathbf {p})\), is normal to the surface, and still neglecting self-shadows, we can rewrite System (2.1), in each pixel \(\mathbf {p}\in \varOmega \):

$$\begin{aligned}&I^i(\mathbf {p}) = \frac{\overline{\rho }(\mathbf {p})}{d(\tilde{z})(\mathbf {p})} \, \mathbf {t}^i(\tilde{z})(\mathbf {p}) \cdot \begin{bmatrix} f \nabla \tilde{z}(\mathbf {p}) \\ -1 - \mathbf {p} \cdot \nabla \tilde{z}(\mathbf {p}) \end{bmatrix}, \nonumber \\&\quad ~ i\in \{1,\dots ,m\}, \end{aligned}$$
(3.17)

with

$$\begin{aligned} d(\tilde{z})(\mathbf {p}) = \sqrt{f^2 \left\| \nabla \tilde{z}(\mathbf {p}) \right\| ^2 + \left( -1 - \mathbf {p} \cdot \nabla \tilde{z}(\mathbf {p}) \right) ^2 }. \end{aligned}$$
(3.18)

3.2.2 Partial Linearization of (3.17) Using Image Ratios

In comparison with Eqs. (2.1), the PDEs (3.17) explicitly depend on the unknown map \(\tilde{z}\), and thus remove the need for alternating normal estimation and integration. However, these equations present two difficulties: they are nonlinear and cannot be solved locally. We can eliminate the nonlinearity due to the normalization coefficient \(d(\tilde{z})(\mathbf {p})\). Indeed, neither the relative albedo \(\overline{\rho }(\mathbf {p})\), nor this coefficient, depends on the index i of the LED. We deduce from any pair \(\{i,j\} \in \{1,\ldots ,m\}^2\), \(i \ne j\), of equations from (3.17) the following equalities:

$$\begin{aligned} \frac{\overline{\rho }(\mathbf {p})}{d(\tilde{z})(\mathbf {p})}&= \frac{I^i(\mathbf {p})}{\mathbf {a}^i(\tilde{z})(\mathbf {p}) \cdot \nabla \tilde{z}(\mathbf {p}) - b^i(\tilde{z})(\mathbf {p}) } \nonumber \\&= \frac{I^j(\mathbf {p})}{\mathbf {a}^j(\tilde{z})(\mathbf {p}) \cdot \nabla \tilde{z}(\mathbf {p}) - b^j(\tilde{z})(\mathbf {p}) }, \end{aligned}$$
(3.19)

with the following definitions of \(\mathbf {a}^i(\tilde{z})(\mathbf {p})\) and \(b^i(\tilde{z})(\mathbf {p})\), denoting \(\mathbf {t}^i(\tilde{z})(\mathbf {p}) = [t^i_{1}(\tilde{z})(\mathbf {p}),t^i_{2}(\tilde{z})(\mathbf {p}),t^i_{3}(\tilde{z})(\mathbf {p})]^\top \):

$$\begin{aligned} \mathbf {a}^i(\tilde{z})(\mathbf {p})&= f \begin{bmatrix} t^i_{1}(\tilde{z})(\mathbf {p}) \\ t^i_{2}(\tilde{z})(\mathbf {p}) \end{bmatrix} - t^i_{3}(\tilde{z})(\mathbf {p}) \, \mathbf {p}, \end{aligned}$$
(3.20)
$$\begin{aligned} b^i(\tilde{z})(\mathbf {p})&= t^i_{3}(\tilde{z})(\mathbf {p}). \end{aligned}$$
(3.21)

From equalities (3.19), we obtain:

$$\begin{aligned}&\underbrace{\begin{bmatrix} I^i(\mathbf {p}) \, \mathbf {a}^j(\tilde{z})(\mathbf {p}) - I^j(\mathbf {p}) \, \mathbf {a}^i(\tilde{z})(\mathbf {p}) \end{bmatrix}}_{\mathbf {a}^{i,j}(\tilde{z})(\mathbf {p})} \cdot \, \nabla \tilde{z}(\mathbf {p}) \nonumber \\&\quad = \underbrace{\left[ I^i(\mathbf {p}) \, b^j(\tilde{z})(\mathbf {p}) - I^j(\mathbf {p}) \, b^i(\tilde{z})(\mathbf {p})\right] }_{b^{i,j}(\tilde{z})(\mathbf {p})}. \end{aligned}$$
(3.22)

The fields \(\mathbf {a}^{i,j}(\tilde{z})\) and \(b^{i,j}(\tilde{z})\) defined in (3.22) depend on \(\tilde{z}\) but not on \(\nabla \tilde{z}\): Eq. (3.22) is thus a quasi-linear PDE in z over \(\varOmega \). It could be solved by the characteristic strips expansion method [42, 43] if we were dealing with \(m=2\) images only, but using a larger number of images is necessary in order to design a robust 3D-reconstruction method. Since we are provided with \(m > 2\) images, we follow [20, 38, 39, 41, 56, 60] and write \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) PDEs such as (3.22) formed by the \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) pairs \(\{i,j\} \in \{1,\ldots ,m\}^2\), \(i \ne j\). Forming the matrix field \(\mathbf {A}(\tilde{z}):\,\varOmega \rightarrow \mathbb {R}^{\left( {\begin{array}{c}m\\ 2\end{array}}\right) \times 2}\) by concatenation of the row vectors \(\mathbf {a}^{i,j}(\tilde{z})(\mathbf {p})^\top \), and the vector field \(\mathbf {b}(\tilde{z}):\,\varOmega \rightarrow \mathbb {R}^{\left( {\begin{array}{c}m\\ 2\end{array}}\right) }\) by concatenation of the scalar values \(b^{i,j}(\tilde{z})(\mathbf {p})\), the system of PDEs to solve is written:

$$\begin{aligned} \mathbf {A}(\tilde{z}) \, \nabla \tilde{z} = \mathbf {b}(\tilde{z}) \quad \text {over}~\varOmega . \end{aligned}$$
(3.23)
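The per-pixel assembly of System (3.23) can be sketched as follows, where `t` stacks the m lighting vectors \(\mathbf {t}^i(\tilde{z})(\mathbf {p})\) evaluated at the current pixel (a minimal sketch; the helper name and array layout are our own):

```python
import numpy as np
from itertools import combinations

def ratio_system(I, t, p, f):
    """Per-pixel assembly of System (3.23): each LED pair {i,j} yields
    one row a^{i,j} of A(z_tilde) and one entry b^{i,j} of b(z_tilde)
    (Eq. (3.22)), from the m lighting vectors t (m,3) and gray levels I (m,)."""
    a = f * t[:, :2] - t[:, 2:3] * p[None, :]   # Eq. (3.20)
    b = t[:, 2]                                 # Eq. (3.21)
    rows, rhs = [], []
    for i, j in combinations(range(len(I)), 2):
        rows.append(I[i] * a[j] - I[j] * a[i])  # a^{i,j}
        rhs.append(I[i] * b[j] - I[j] * b[i])   # b^{i,j}
    return np.array(rows), np.array(rhs)
```

By construction, if the gray levels exactly follow (3.19) for some gradient \(\nabla \tilde{z}(\mathbf {p})\), that gradient satisfies every one of the \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) assembled equations.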

This new differential formulation of photometric stereo seems simpler than the original differential formulation (3.17), since the main source of nonlinearity, due to the denominator \(d(\tilde{z})(\mathbf {p})\), has been eliminated. However, it still presents two difficulties. First, the PDEs (3.23) are generally incompatible and hence do not admit an exact solution. It is thus necessary to estimate an approximate one, by resorting to a variational approach. Assuming that each of the \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) equalities in System (3.23) is satisfied up to an additive, zero-mean, Gaussian noise, one should estimate such a solution by solving the following variational problem:

$$\begin{aligned} \underset{\tilde{z}: \varOmega \rightarrow \mathbb {R}}{\min ~} \,\mathcal {E}_{\mathrm {rat}}(\tilde{z}) := \Vert \mathbf {A}(\tilde{z}) \, \nabla \tilde{z} - \mathbf {b}(\tilde{z}) \Vert _{L^2(\varOmega )}^2. \end{aligned}$$
(3.24)

Second, the PDEs (3.22) do not allow the scale of the scene to be estimated. Indeed, when all the depth values simultaneously tend to infinity, both sides of (3.22) tend to zero (because the coordinates of \(\mathbf {t}^i\) do so, cf. (3.15)). Thus, a large, distant 3D-shape will always “better” fit these PDEs (in the sense of the criterion \(\mathcal {E}_{\mathrm {rat}}\) defined in Eq. (3.24)) than a small, nearby one (cf. Figs. 10, 11). A “locally optimal” solution close to a very good initial estimate should thus be sought.

3.2.3 Fixed Point Iterations for Solving (3.24)

It has been proposed in [38, 39, 41, 56] to iteratively estimate a solution of Problem (3.24), by uncoupling the (linear) estimation of \(\tilde{z}\) from the (nonlinear) estimations of \(\mathbf {A}(\tilde{z})\) and of \(\mathbf {b}(\tilde{z})\). This can be achieved by rewriting (3.24) as the following constrained optimization problem:

$$\begin{aligned} \begin{array}{l} \underset{\tilde{z}: \varOmega \rightarrow \mathbb {R}}{\min ~} \, \Vert \mathbf {A} \, \nabla \tilde{z} - \mathbf {b}\Vert _{L^2(\varOmega )}^2 \\ \text {s.t.} {\left\{ \begin{array}{ll} \mathbf {A} &{}= \mathbf {A}(\tilde{z}), \\ \mathbf {b} &{}= \mathbf {b}(\tilde{z}), \end{array}\right. } \end{array} \end{aligned}$$
(3.25)

and resorting to a fixed point iterative scheme:

$$\begin{aligned} \tilde{z}^{(k+1)}&= \underset{\tilde{z}: \varOmega \rightarrow \mathbb {R}}{\arg \min ~} \Vert \mathbf {A}^{(k)} \, \nabla \tilde{z} - \mathbf {b}^{(k)} \Vert _{L^2(\varOmega )}^2, \end{aligned}$$
(3.26)
$$\begin{aligned} \mathbf {A}^{(k+1)}&= \mathbf {A}(\tilde{z}^{(k+1)}), \end{aligned}$$
(3.27)
$$\begin{aligned} \mathbf {b}^{(k+1)}&= \mathbf {b}(\tilde{z}^{(k+1)}). \end{aligned}$$
(3.28)

In the linear least-squares variational problem (3.26), the solution can be computed only up to an additive constant. Therefore, the matrix of the system arising from the normal equations associated with the discretized problem is symmetric and positive, but rank-1 deficient, and thus only semi-definite. Figure 9 shows that this may cause the fixed point scheme to fail to decrease the energy at each iteration. This issue can be resolved by resorting to the alternating direction method of multipliers (ADMM), a standard procedure which dates back to the 1970s [15, 18], but has been revisited recently [9].
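The rank-1 deficiency is easily exhibited on a toy 1D analogue, where the discretized gradient is a forward-difference operator whose null space is spanned by the constant vector (a minimal sketch, not the actual 2D discretization used in the paper):

```python
import numpy as np

def forward_diff(n):
    """1D forward-difference operator (n-1, n), a stand-in for the
    discretized gradient in (3.26); D @ ones == 0."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx], D[idx, idx + 1] = -1.0, 1.0
    return D

# Normal-equations matrix of a toy instance of (3.26): symmetric,
# positive semi-definite, with exactly one zero eigenvalue, so the
# solution is defined only up to an additive constant.
M = forward_diff(6).T @ forward_diff(6)
eig = np.linalg.eigvalsh(M)
```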

Fig. 9
figure 9

Evolution of the energy \(\mathcal {E}_{\mathrm {rat}}\) of the ratio-based approach, defined in (3.24), as a function of the iterations, for the data of Fig. 2 (the initial 3D-shape is a fronto-parallel plane with equation \(z \equiv 700~\hbox {mm}\)). With the fixed point scheme, the energy is not always decreased after each iteration, contrary to the ADMM scheme we are going to introduce

3.2.4 ADMM Iterations for Solving (3.24)

Instead of “freezing” the nonlinearities of the variational problem (3.24), \(\tilde{z}\) can be estimated not only from the linearized parts, but also from the nonlinear ones. To this end, we introduce an auxiliary variable \(\overline{z}\) and reformulate Problem (3.24) as follows:

$$\begin{aligned} \begin{array}{rl} &{} \underset{\overline{z},\tilde{z}}{\min } \left\| \mathbf {A}(\overline{z}) \, \nabla \tilde{z} - \mathbf {b}(\overline{z}) \right\| _{L^2(\varOmega )}^2 \\ &{} \text {s.t.}~ \tilde{z} = \overline{z}. \end{array} \end{aligned}$$
(3.29)

In order to solve the constrained optimization problem (3.29), let us introduce a dual variable h and a descent step \(\nu \). A local solution of (3.29) is then obtained at convergence of the following algorithm:

Fig. 10
figure 10

3D-reconstructions after 10 iterations of the ADMM scheme, taking as initial guess different fronto-parallel planes \(z \equiv z_0\). The median of the distances to ground truth is, from left to right: 3.05, 2.88, 1.68, 2.08 and 5.86. When the initial guess is too close to the camera, the 3D-reconstruction is flattened, while the scale is over-estimated when starting too far away from the camera (although this yields a lower energy, see Fig. 11). a \(z_0= 500\) mm, b \(z_0= 650\) mm, c \(z_0= 700\) mm, d \(z_0= 750\) mm and e \(z_0= 900\) mm

Algorithm 2

Stage (3.30) of Algorithm 2 is a linear least-squares problem which can be solved using the normal equations of its discrete formulation. The presence of the regularization term now guarantees the positive definiteness of the matrix of the system. This matrix is however too large to be inverted directly. Therefore, we resort to the conjugate gradient algorithm.

Thanks to the auxiliary variable \(\overline{z}\), which decouples \(\nabla \tilde{z}\) and \(\tilde{z}\) in Problem (3.29), Stage (3.31) of Algorithm 2 is a local nonlinear least-squares problem: in fact, \(\nabla \overline{z}\) is not involved in this problem, which can be solved pixelwise. Problem (3.31) thus reduces to a nonlinear least-squares estimation problem of one real variable, which can be solved by a standard method such as the Levenberg-Marquardt algorithm.

Because of the nonlinearity of Problem (3.31), it is unfortunately impossible to guarantee convergence of this ADMM scheme, which depends on the initialization and on the parameter \(\nu \) [9]. A reasonable initialization strategy consists in using the solution provided by Algorithm 1 (cf. Sect. 3.1). As for the descent step \(\nu \), we iteratively calculate its optimal value according to the varying penalty parameter procedure described in [9]. Finally, the iterations stop when the relative variation of the criterion of Problem (3.24) falls below a threshold of \(10^{-4}\).
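The varying penalty rule of [9] balances the primal residual \(\Vert \tilde{z} - \overline{z}\Vert \) against the dual residual. A minimal sketch of such a residual-balancing update; the thresholds \(\mu = 10\) and \(\tau = 2\) are the usual defaults from [9], an assumption on our part:

```python
def update_penalty(nu, r_primal, r_dual, mu=10.0, tau=2.0):
    """Residual-balancing update of the ADMM penalty parameter nu,
    in the spirit of the varying-penalty rule of [9]: increase nu
    when the primal residual dominates the dual one (to enforce the
    constraint z_tilde = z_bar more strongly), and decrease it in
    the opposite case."""
    if r_primal > mu * r_dual:
        return tau * nu
    if r_dual > mu * r_primal:
        return nu / tau
    return nu
```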

Figure 9 shows that with such choices, Problem (3.24) is solved more efficiently than with the fixed point scheme: the energy is now decreased at each iteration. Figure 11 shows that this holds regardless of the initial guess, although initialization has a strong impact on the solution, as confirmed by Fig. 10.

Fig. 11
figure 11

Evolution of the energy \(\mathcal {E}_{\mathrm {rat}}\) defined in (3.24), as a function of the iterations, for the data of Fig. 2. Using as initialization \(\tilde{z}^{(0)} \equiv \log (z_0)\), the ADMM scheme always converges toward a local minimum, yet this minimum strongly depends on the value of \(z_0\). Besides, a lower final energy does not necessarily mean a better 3D-reconstruction, as shown in Fig. 10. Hence, not only is a careful initial guess of primary importance, but the criterion derived from image ratios also prevents automatic scale estimation

Figure 12 shows the 3D-reconstruction obtained by refining the results of Sect. 3.1 using Algorithm 2. At first sight, the 3D-shape depicted in Fig. 12a seems hardly different from that of Fig. 7a, but the comparison of histograms in Figs. 8a and 12b indicates that bias has been significantly reduced. This shows the superiority of direct depth estimation over alternating normal estimation and integration.

Fig. 12
figure 12

a 3D-reconstruction obtained with Algorithm 2, using the result from Fig. 7a as initial guess. b Histogram of point-to-point distances between this 3D-shape and the ground truth (cf. Fig. 7c). The median value is 1.2 mm

However, the lack of convergence guarantees and the strong dependency on the initialization remain limiting bottlenecks. The method discussed in the next section overcomes both these issues.

4 A New, Provably Convergent Variational Approach for Photometric Stereo Under Point Light Source Illumination

When it comes to solving photometric stereo under point light source illumination, there are two main difficulties: the dependency of the lighting vectors on the depth map (cf. Eq. (3.15)), and the presence of the nonlinear coefficient ensuring that the normal vectors have unit-length (cf. Eq. (3.18)).

The alternating strategy from Sect. 3.1 solves the former issue by freezing the lighting vectors at each iteration, and the latter by simultaneously estimating the normal vector and the albedo. The objective function tackled in this approach, which is based on the reprojection error, seems to be the most relevant. Indeed, the final result seems to be independent of the initialization, although convergence is not established.

On the other hand, the differential strategy from Sect. 3.2 explicitly tackles the nonlinear dependency of lighting on the depth, and eliminates the other nonlinearity using image ratios. Directly estimating depth reduces bias, but the objective function derived from image ratios admits a global solution which is not acceptable (depth uniformly tending to \(+\infty \)), albedo is not estimated and convergence is not established either.

Therefore, an ideal numerical solution should: (i) build upon a differential approach, in order to reduce bias, (ii) avoid linearization using ratios, in order to avoid the trivial solution and allow albedo estimation, and (iii) be provably convergent. The variational approach described in this section, initially presented in [57], satisfies these three criteria.

4.1 Proposed Discrete Variational Framework

The nonlinearity of the PDEs (3.17) with respect to \(\nabla \tilde{z}\), due to the nonlinear dependency of \(d(\tilde{z})\) (see Eq. (3.18)), is challenging. We could explicitly consider this nonlinear coefficient within a variational framework [26], but we rather take inspiration from the way conventional photometric stereo [64] is linearized and integrate the nonlinearity inside the albedo variable, as we proposed recently in [57, 58]. Instead of estimating \(\overline{\rho }(\mathbf {p})\) in each pixel \(\mathbf {p}\), we thus rather estimate:

$$\begin{aligned} \tilde{\rho }(\mathbf {p}) = \frac{\overline{\rho }(\mathbf {p})}{d(\tilde{z})(\mathbf {p})}. \end{aligned}$$
(4.1)

The system of PDEs (3.17) is then rewritten as

$$\begin{aligned}&I^i(\mathbf {p}) = \tilde{\rho }(\mathbf {p}) \, \left[ \mathbf {Q}(\mathbf {p}) \, \mathbf {t}^i(\tilde{z})(\mathbf {p}) \right] \cdot \begin{bmatrix} \nabla \tilde{z}(\mathbf {p}) \\ - 1 \end{bmatrix},\nonumber \\&\quad i\in \{1,\ldots ,m\}, \end{aligned}$$
(4.2)

where we use the following notation, \(\forall \mathbf {p} = \left[ u,v\right] ^\top \in \varOmega \):

$$\begin{aligned} \mathbf {Q}(\mathbf {p}) = \begin{bmatrix} f &{} 0 &{} -u \\ 0 &{} f &{} -v \\ 0 &{} 0 &{} 1 \end{bmatrix}. \end{aligned}$$
(4.3)

System (4.2) is a system of quasi-linear PDEs in \((\tilde{\rho },\tilde{z})\), because \(\mathbf {t}^i(\tilde{z})\) only depends on \(\tilde{z}\), and not on \(\nabla \tilde{z}\). Once \(\tilde{\rho }\) and \(\tilde{z}\) are estimated, it is straightforward to recover the “real” albedo \(\overline{\rho }\) using (4.1).

Let us now denote by \(j \in \{ 1, \ldots , n\}\) the indices of the pixels inside \(\varOmega \), by \(I^i_j\) the gray level of pixel j in image \(I^i\), by \(\tilde{\varvec{\rho }}\in \mathbb {R}^n\) and \(\tilde{\mathbf {z}} \in \mathbb {R}^n\) the vectors stacking the unknown values \(\tilde{\rho }_j\) and \(\tilde{z}_j\), by \(\mathbf {t}^i_j(\tilde{z}_j) \in \mathbb {R}^3\) the vector \(\mathbf {t}^i(\tilde{z})\) at pixel j, which smoothly (though nonlinearly) depends on \(\tilde{z}_j\), and by \(\mathbf {Q}_j\) the matrix defined in Eq. (4.3) at pixel j. Then, the discrete counterpart of System (4.2) is written as the following system of nonlinear equations in \((\tilde{\varvec{\rho }},\tilde{\mathbf {z}})\):

$$\begin{aligned}&I^i_j = \tilde{\rho }_j \, \left[ \mathbf {Q}_j \, \mathbf {t}^i_j(\tilde{z}_j) \right] \cdot \begin{bmatrix} \left( \nabla \tilde{\mathbf {z}} \right) _j \\ - 1 \end{bmatrix}, \nonumber \\&\quad i\in \{1,\ldots ,m\},\, j \in \{1,\ldots ,n\}, \end{aligned}$$
(4.4)

where \(\left( \nabla \tilde{\mathbf {z}} \right) _j \in \mathbb {R}^2\) represents a finite differences approximation of the gradient of \(\tilde{z}\) at pixel j Footnote 13.
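The per-pixel gradient \(\left( \nabla \tilde{\mathbf {z}} \right) _j\) can be assembled once and for all as two sparse matrices acting on the stacked depth vector. The following Python sketch (the function name and the choice of a backward difference at the boundary are our own assumptions, not prescribed by the text) builds such operators for an h-by-w pixel grid:

```python
import numpy as np
import scipy.sparse as sp

def gradient_operators(h, w):
    """Build sparse matrices D_u, D_v (each n-by-n, n = h*w) such that
    D_u @ z and D_v @ z approximate the two components of the gradient of a
    depth map z stored row-major as a vector of length n. Forward differences
    are used, falling back to a backward difference on the last column/row so
    that the stencil stays inside the image domain."""
    def diff1d(k):
        # forward difference, with a backward difference at the last sample
        D = sp.diags([-np.ones(k), np.ones(k - 1)], [0, 1]).tolil()
        D[k - 1, k - 2], D[k - 1, k - 1] = -1.0, 1.0
        return D.tocsr()
    Du = sp.kron(sp.eye(h), diff1d(w), format="csr")  # derivative along u (columns)
    Dv = sp.kron(diff1d(h), sp.eye(w), format="csr")  # derivative along v (rows)
    return Du, Dv
```

With such operators, \(\left( \nabla \tilde{\mathbf {z}} \right) _j\) is simply the pair formed by the j-th entries of `Du @ z` and `Dv @ z`.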

Our goal is to jointly estimate \(\tilde{\varvec{\rho }}\in \mathbb {R}^n\) and \(\tilde{\varvec{z}}\in \mathbb {R}^n\) from the set of nonlinear equations (4.4), as solution of the following discrete optimization problem:

$$\begin{aligned} \min _{\begin{array}{c} \tilde{\varvec{\rho }},\tilde{\varvec{z}} \end{array}} \mathcal {E}(\tilde{\varvec{\rho }},\tilde{\varvec{z}}) := \sum _{j=1}^n \sum _{i=1}^m\phi \left( r^i_j(\tilde{\varvec{\rho }},\tilde{\varvec{z}})\right) , \end{aligned}$$
(4.5)

where the residual \(r^i_j(\tilde{\varvec{\rho }},\tilde{\varvec{z}})\) depends locally (and linearly) on \(\tilde{\varvec{\rho }}\), but globally (and nonlinearly) on \(\tilde{\varvec{z}}\):

$$\begin{aligned} r^i_j(\tilde{\varvec{\rho }},\tilde{\varvec{z}}) = \tilde{\rho }_j \left\{ \zeta ^i_j(\tilde{\varvec{z}}) \right\} _+ - I^i_j, \end{aligned}$$
(4.6)

with

$$\begin{aligned} \zeta ^i_j(\tilde{\varvec{z}}) = \left[ \mathbf {Q}_j \mathbf {t}^i_j(\tilde{z}_j) \right] \cdot \begin{bmatrix} (\nabla \tilde{\varvec{z}})_j \\ -1 \end{bmatrix}. \end{aligned}$$
(4.7)

An advantage of our formulation is its genericity, i.e., it is independent of the choice of the operator \(\{\cdot \}_+\) and of the function \(\phi \). For a fair comparison with the algorithms in Sect. 3, one can use \(\{x\}_+ = x\) and \(\phi (x) = \phi _{\text {LS}}(x) = x^2\). To improve robustness, self-shadows can be explicitly handled by using \(\{x\}_+ = \max \{x,0\}\), and the estimator \(\phi \) can be chosen as any \(\mathbb {R} \rightarrow \mathbb {R}^+\) function which is even, twice continuously differentiable, and monotonically increasing over \(\mathbb {R}^+\), such that:

$$\begin{aligned} \frac{\phi '(x)}{x}\ge \phi ''(x),~ \forall x \in \mathbb {R}. \end{aligned}$$
(4.8)

A typical example is Cauchy’s robust M-estimatorFootnote 14:

$$\begin{aligned} \phi _{\text {Cauchy}}(x) = \lambda ^2 \log \left( 1+\frac{x^2}{\lambda ^2}\right) , \end{aligned}$$
(4.9)

where the parameter \(\lambda \) is user-defined (we use \(\lambda = 0.1\)).
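For concreteness, here is a short Python sketch of this estimator and of the reweighting function \(\phi '(x)/x\) that appears later in Eq. (4.17); the closed form \(\phi '(x) = 2x/(1+x^2/\lambda ^2)\) follows from (4.9), while the function names and the array-based interface are ours:

```python
import numpy as np

LAM = 0.1  # scale parameter lambda, set to 0.1 as in the text

def phi_cauchy(x, lam=LAM):
    """Cauchy's robust M-estimator, Eq. (4.9)."""
    return lam**2 * np.log1p((np.asarray(x, dtype=float) / lam)**2)

def cauchy_weight(r, lam=LAM):
    """Reweighting function phi'(r)/r = 2 / (1 + r^2/lam^2), with the
    convention (used later in Eq. (4.17)) that the weight is 0 when r = 0."""
    r = np.asarray(r, dtype=float)
    return np.where(r != 0.0, 2.0 / (1.0 + (r / lam)**2), 0.0)
```

One can check that this estimator satisfies condition (4.8): \(\phi '(x)/x - \phi ''(x) = 4(x/\lambda )^2 / (1+(x/\lambda )^2)^2 \ge 0\).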

4.2 Alternating Reweighted Least-Squares for Solving (4.5)

Our goal is to find a local minimizer \((\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)\) for (4.5), which must satisfy the following first-order conditionsFootnote 15:

$$\begin{aligned} \frac{\partial \mathcal {E}}{\partial \tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^*\!,\!\tilde{\varvec{z}}^*)&=\sum _{j=1}^n \sum _{i=1}^m \phi '(r^i_j(\tilde{\varvec{\rho }}^*\!,\tilde{\varvec{z}}^*)) \frac{\partial r^i_j}{\partial \tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^*\!,\tilde{\varvec{z}}^*) = {\varvec{0}}, \end{aligned}$$
(4.10)
$$\begin{aligned} \frac{\partial \mathcal {E}}{\partial \tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^*\!,\!\tilde{\varvec{z}}^*)&=\sum _{j=1}^n \sum _{i=1}^m \phi '(r^i_j(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)) \frac{\partial r^i_j}{\partial \tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*) = {\varvec{0}}, \end{aligned}$$
(4.11)

with:

$$\begin{aligned} \frac{\partial r^i_j}{\partial \tilde{\rho }_l }(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)&= {\left\{ \begin{array}{ll} \{\zeta ^i_j(\tilde{\varvec{z}}^*)\}_+ &{}\quad \text {if}~ l = j, \\ 0 &{}\quad \text {if}~ l \ne j, \\ \end{array}\right. } \end{aligned}$$
(4.12)
$$\begin{aligned} \frac{\partial r^i_j}{\partial \tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)&=\tilde{\rho }^*_j \, \chi (\zeta ^i_j(\tilde{\varvec{z}}^*)) \, \partial \zeta ^i_j(\tilde{\varvec{z}}^*). \end{aligned}$$
(4.13)

In (4.13), \(\chi \) is the (sub-)derivative of \(\{\cdot \}_+\), which is a constant function equal to 1 if \(\{x\}_+ = x\), and the Heaviside function if \(\{x\}_+ = \max \{x,0\}\).

For this purpose, we derive an alternating reweighted least-squares (ARLS) scheme. As its name suggests, the ARLS scheme alternates Newton-like steps over \(\tilde{\varvec{\rho }}\) and \(\tilde{\varvec{z}}\), which can be interpreted as iteratively reweighted least-squares iterations. Similar to the well-known iteratively reweighted least-squares (IRLS) algorithm [63], ARLS solves the original (possibly non-convex) problem (4.5) iteratively, by recasting it as a series of simpler quadratic programs.

Given the current estimate \((\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})\) of the solution, ARLS first freezes \(\tilde{\varvec{z}}\) and updates \(\tilde{\varvec{\rho }}\) by minimizing the following local quadratic approximation of \(\mathcal {E}(\cdot ,\tilde{\varvec{z}}^{(k)})\) around \(\tilde{\varvec{\rho }}^{(k)}\) Footnote 16:

$$\begin{aligned}&\mathcal {E}(\cdot ,\tilde{\varvec{z}}^{(k)}) \approx \sum _{j=1}^n\sum _{i=1}^m \Bigg \{ \phi \left( r^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})\right) \nonumber \\&\quad + \frac{\phi '(r^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}))}{r^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})} \, \frac{r^i_j(\cdot ,\tilde{\varvec{z}}^{(k)})^2 - r^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})^2 }{2} \Bigg \}, \end{aligned}$$
(4.14)

where we set \(\frac{\phi '(r^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}))}{r^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})} = 0\) if \(r^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) = 0\).

Then, \(\tilde{\varvec{\rho }}\) is frozen and \(\tilde{\varvec{z}}\) is updated by minimizing a local quadratic approximation of \(\mathcal {E}(\tilde{\varvec{\rho }}^{(k+1)},\cdot )\) around \(\tilde{\varvec{z}}^{(k)}\), which is entirely analogous to (4.14). Iterating this procedure yields the following alternating sequence of reweighted least-squares problems:

$$\begin{aligned} \tilde{\varvec{\rho }}^{(k+1)}&= \underset{\tilde{\varvec{\rho }}\in \mathbb {R}^n}{\arg \min }~ \mathcal {E}_{\tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }};\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) \nonumber \\&:= \frac{1}{2} \sum _{j=1}^n\sum _{i=1}^m w^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) \, r^i_j(\tilde{\varvec{\rho }},\tilde{\varvec{z}}^{(k)})^2, \end{aligned}$$
(4.15)
$$\begin{aligned} \tilde{\varvec{z}}^{(k+1)}&= \underset{\tilde{\varvec{z}}\in \mathbb {R}^n}{\arg \min }~ \mathcal {E}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}};\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}) \nonumber \\&:= \frac{1}{2} \sum _{j=1}^n\sum _{i=1}^m w^i_j(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}) \, r^i_j(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}})^2. \end{aligned}$$
(4.16)

Here, the functions \(\mathcal {E}_{\tilde{\varvec{\rho }}}\) and \(\mathcal {E}_{\tilde{\varvec{z}}}\) are the above local quadratic approximations minus the constants which play no role in the optimization, and the following (lagged) weight variable w is usedFootnote 17:

$$\begin{aligned} w^i_j(\tilde{\varvec{\rho }},\tilde{\varvec{z}}) = {\left\{ \begin{array}{ll} \dfrac{\phi '(r^i_j(\tilde{\varvec{\rho }},\tilde{\varvec{z}}))}{r^i_j(\tilde{\varvec{\rho }},\tilde{\varvec{z}})} &{}\text {if}~r^i_j(\tilde{\varvec{\rho }},\tilde{\varvec{z}}) \ne 0, \\ 0&{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(4.17)

4.2.1 Solution of the \(\tilde{\varvec{\rho }}\)-Subproblem

Problem (4.15) can be rewritten as the following n independent linear least-squares problems, \(j \in \{1,\ldots ,n\}\):

$$\begin{aligned} \tilde{\rho }_j^{(k+1)} = \underset{\tilde{\rho }_j\in \mathbb {R}}{\arg \min }~ \frac{1}{2} \sum _{i=1}^m w^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})\, r^i_j(\tilde{\varvec{\rho }},\tilde{\varvec{z}}^{(k)})^2. \end{aligned}$$
(4.18)

Each problem (4.18) almost always admits a unique solution. When it does not, we set \(\tilde{\rho }_j^{(k+1)} = \tilde{\rho }_j^{(k)}\). The update thus admits the following closed-form solution:

$$\begin{aligned} \tilde{\rho }_j^{(k+1)} = {\left\{ \begin{array}{ll} \dfrac{\sum _{i=1}^m w^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) \left\{ \zeta ^i_j(\tilde{\varvec{z}}^{(k)}) \right\} _+ I^i_j }{\sum _{i=1}^m w^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) \left\{ \zeta ^i_j(\tilde{\varvec{z}}^{(k)}) \right\} _+^2 } \\ \quad \text {if}~ \sum _{i=1}^m w^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) \left\{ \zeta ^i_j(\tilde{\varvec{z}}^{(k)}) \right\} _+^2 > 0,\\ \tilde{\rho }_j^{(k)} ~ \text {if}~ \sum _{i=1}^m w^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) \left\{ \zeta ^i_j(\tilde{\varvec{z}}^{(k)}) \right\} _+^2 = 0. \end{array}\right. } \end{aligned}$$
(4.19)

The second case in (4.19) means that \(\tilde{\varvec{\rho }}^{(k+1)}\) is set to be the solution of (4.15) which has minimal (Euclidean) distance to \(\tilde{\varvec{\rho }}^{(k)}\).

The update (4.19) can also be obtained by remarking that, since (4.15) is a linear least-squares problem, the solution of the equation \(\partial \mathcal {E}_{\tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }};\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) = {\varvec{0}}\) is attained in one step of the Newton method:

$$\begin{aligned} \tilde{\varvec{\rho }}^{(k+1)}=\tilde{\varvec{\rho }}^{(k)}-H_{\tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^{(k)}, \tilde{\varvec{z}}^{(k)})^\dagger \, \partial \mathcal {E}_{\tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^{(k)} ;\tilde{\varvec{\rho }}^{(k)} ,\tilde{\varvec{z}}^{(k)}). \end{aligned}$$
(4.20)

In (4.20), the n-by-n matrix \(H_{\tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})\) is the Hessian of \(\mathcal {E}_{\tilde{\varvec{\rho }}}(\cdot ;\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})\) at \(\tilde{\varvec{\rho }}^{(k)}\) Footnote 18, i.e.:

$$\begin{aligned}&\delta \tilde{\varvec{\rho }}^\top H_{\tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) \, \delta \tilde{\varvec{\rho }}=\sum _{j=1}^n\sum _{i=1}^m w^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) \nonumber \\&\quad \left( \delta \tilde{\rho }_j\{\zeta ^i_j(\tilde{\varvec{z}}^{(k)})\}_+\right) ^2 \end{aligned}$$
(4.21)

for any \(\delta \tilde{\varvec{\rho }}= \left[ \delta \tilde{\rho }_1,\ldots ,\delta \tilde{\rho }_n \right] ^\top \in \mathbb {R}^n\). Since the n problems (4.18) are independent, it is a diagonal matrix with entry (jj) equal to \(e_j = \sum _{i=1}^m w^i_j(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) \left\{ \zeta ^i_j(\tilde{\varvec{z}}^{(k)}) \right\} _+^2\). This matrix is singular if one of the entries \(e_j\) is equal to zero, but its pseudo-inverse always exists: it is an n-by-n diagonal matrix whose entry (jj) is equal to \(1/ e_j\) as soon as \(e_j >0\), and to 0 otherwise. The updates (4.19) and (4.20) are thus strictly equivalent.
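Since the update (4.19) is a one-line weighted least-squares formula per pixel, it can be vectorized over all pixels at once. The following Python sketch assumes the weights, the clamped shadings \(\{\zeta ^i_j\}_+\) and the gray levels are stored as m-by-n arrays (a layout convention of ours, not of the text):

```python
import numpy as np

def update_albedo(rho_prev, W, Zp, I):
    """Closed-form albedo update of Eq. (4.19).
    rho_prev: (n,)   previous estimate tilde{rho}^{(k)}
    W:        (m, n) lagged weights w^i_j
    Zp:       (m, n) clamped shadings {zeta^i_j}_+
    I:        (m, n) gray levels I^i_j
    Where the denominator vanishes, the previous value is kept, as
    prescribed by the second case of (4.19)."""
    num = np.sum(W * Zp * I, axis=0)
    den = np.sum(W * Zp**2, axis=0)
    rho = rho_prev.astype(float).copy()
    ok = den > 0
    rho[ok] = num[ok] / den[ok]
    return rho
```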

4.2.2 Solution of the \(\tilde{\varvec{z}}\)-Subproblem

The depth update (4.16) is a nonlinear least-squares problem, due to the nonlinearity of \(r^i_j(\tilde{\varvec{\rho }},\tilde{\varvec{z}})\) with respect to \(\tilde{\varvec{z}}\). We therefore introduce an additional linearization step, i.e., we follow a Gauss-Newton strategy. A first-order Taylor approximation of \(r^i_j(\tilde{\varvec{\rho }}^{(k+1)},\cdot )\) around \(\tilde{\varvec{z}}^{(k)}\) yields, using (4.13):

$$\begin{aligned}&\mathcal {E}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}};\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}) \approx \overline{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}};\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}) \nonumber \\&\quad := \frac{1}{2} \sum _{j=1}^n\sum _{i=1}^m w^i_j(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}) \Big (r^i_j(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}) \nonumber \\&\quad +\tilde{\rho }^{(k+1)}_j \chi (\zeta ^i_j(\tilde{\varvec{z}}^{(k)})) \, (\tilde{\varvec{z}}-\tilde{\varvec{z}}^{(k)})^\top \partial \zeta ^i_j(\tilde{\varvec{z}}^{(k)})\Big )^2. \end{aligned}$$
(4.22)

Therefore, we replace the update (4.16) by

$$\begin{aligned} \tilde{\varvec{z}}^{(k+1)} = \underset{\tilde{\varvec{z}}\in \mathbb {R}^n}{\arg \min }~\overline{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}};\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}), \end{aligned}$$
(4.23)

which is a linear least-squares problem whose solution is attained in one step of the Newton methodFootnote 19:

$$\begin{aligned} \tilde{\varvec{z}}^{(k+1)}=\tilde{\varvec{z}}^{(k)}-H_{\tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^{(k+1)}, \tilde{\varvec{z}}^{(k)})^\dagger \, \partial \overline{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}}^{(k)};\tilde{\varvec{\rho }}^{(k+1)}, \tilde{\varvec{z}}^{(k)}), \end{aligned}$$
(4.24)

where the n-by-n matrix \(H_{\tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)})\) is the Hessian of \(\overline{\mathcal {E}}_{\tilde{\varvec{z}}}(\cdot ;\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)})\) at \(\tilde{\varvec{z}}^{(k)}\), i.e.:

$$\begin{aligned}&\delta \tilde{\varvec{z}}^\top H_{\tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}) \delta \tilde{\varvec{z}}= \sum _{j=1}^n\sum _{i=1}^m \, w^i_j(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}) \nonumber \\&\quad \Big (\tilde{\rho }^{(k+1)}_j \chi (\zeta ^i_j(\tilde{\varvec{z}}^{(k)}))\delta \tilde{\varvec{z}}^\top \partial \zeta ^i_j(\tilde{\varvec{z}}^{(k)})\Big )^2 \end{aligned}$$
(4.25)

for any \(\delta \tilde{\varvec{z}}\in \mathbb {R}^n\).

In practice, \(H_{\tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)})^\dagger \,\partial \overline{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}}^{(k)};\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)})\) in Eq. (4.24) is computed (inexactly) by preconditioned conjugate gradient iterations up to a relative tolerance of \(10^{-4}\) (less than fifty iterations in our experiments).
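Under the assumption that the stacked residual Jacobian with respect to the depth is available as a sparse matrix, this inexact Gauss-Newton step can be sketched as follows (the Jacobi preconditioner is our own illustrative choice; the text only specifies preconditioned conjugate gradients with a 1e-4 relative tolerance):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def depth_step(J, w, r):
    """One inexact Gauss-Newton update dz for the depth, cf. Eqs. (4.22)-(4.24).
    J: (m*n, n) sparse Jacobian of the residuals with respect to z at the
       current iterate (rows rho_j * chi(zeta^i_j) * d zeta^i_j, stacked over
       i and j); w: (m*n,) lagged weights; r: (m*n,) residuals.
    Solves the Gauss-Newton system H dz = -g, with H = J^T diag(w) J and
    g = J^T diag(w) r, by Jacobi-preconditioned conjugate gradients."""
    H = (J.T @ sp.diags(w) @ J).tocsr()
    g = J.T @ (w * r)
    d = H.diagonal()
    M = sp.diags(np.where(d > 0, 1.0 / d, 0.0))  # Jacobi preconditioner
    dz, info = cg(H, -g, M=M)                    # info == 0 on convergence
    return dz, info
```

The updated depth is then \(\tilde{\varvec{z}}^{(k+1)} = \tilde{\varvec{z}}^{(k)} + \) `dz`.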

4.2.3 Implementation Details

The proposed ARLS algorithm is summarized in Algorithm 3.


In our experiments, we use constant vectors as initializations for \(\tilde{\varvec{z}}\) and \(\tilde{\varvec{\rho }}\), i.e., the surface is initially approximated by a plane with uniform albedo. Iterations are stopped when the relative difference between two successive values of the energy \(\mathcal {E}\) defined in (4.5) falls below a threshold set to \(10^{-3}\). In our setup using \(m=8\) HD images and a recent i7 processor at 3.50 GHz with 32 GB of RAM, each depth update (the albedo one has negligible cost) required a few seconds, and 10–50 updates were enough to reach convergence.
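The overall structure of the scheme, including the stopping criterion on the relative energy decrease, can be sketched as follows; the two update callbacks are placeholders standing for the subproblem solutions (4.19) and (4.24), and the function names are ours:

```python
def arls(rho0, z0, energy, update_rho, update_z, tol=1e-3, max_iter=100):
    """Outer ARLS loop, sketched.
    energy(rho, z) evaluates E of Eq. (4.5); update_rho and update_z are
    placeholders for the albedo update (4.19) and the depth update (4.24).
    Iterations stop when the relative decrease of E falls below tol."""
    rho, z = rho0, z0
    E_prev = energy(rho, z)
    for _ in range(max_iter):
        rho = update_rho(rho, z)   # closed-form, pixelwise
        z = update_z(rho, z)       # one inexact Gauss-Newton step
        E = energy(rho, z)
        if abs(E_prev - E) <= tol * abs(E_prev):
            break
        E_prev = E
    return rho, z
```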

4.3 Convergence Analysis

In this subsection, we present a local convergence theory for the proposed ARLS scheme. The proofs are provided in the appendix.

We write \(A\succeq B\) (resp. \(A\succ B\)) when the difference matrix \(A-B\) is positive semi-definite (resp. positive definite). The spectral radius of a matrix is denoted by \(\mathrm {sr}(\cdot )\).

4.3.1 ARLS as Newton Iterations

It is easily deduced from Eqs. (4.10), (4.15) and (4.17) that \(\partial \mathcal {E}_{\tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^{(k)};\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}) = \frac{\partial \mathcal {E}}{\partial \tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})\), and thus (4.20) can also be written as

$$\begin{aligned} \tilde{\varvec{\rho }}^{(k+1)} =\tilde{\varvec{\rho }}^{(k)} - H_{\tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})^\dagger \, \frac{\partial \mathcal {E}}{\partial \tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)}), \end{aligned}$$
(4.26)

which is a quasi-Newton step with respect to the \(\tilde{\varvec{\rho }}\)-subproblem in (4.5), provided that \(H_{\tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})\) is a “reasonable” approximation of \(\frac{\partial ^2 \mathcal {E}}{\partial \tilde{\varvec{\rho }}^2}(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})\). Lemma 1 will clarify what “reasonable” means here.

Regarding the \(\tilde{\varvec{z}}\)-update, let us remark that the Gauss-Newton step (4.23) for (4.16) can also be viewed as an approximate solution of the \(\tilde{\varvec{z}}\)-subproblem in (4.5), linearized around \(\tilde{\varvec{z}}^{(k)}\) as follows:

$$\begin{aligned}&\min _{\tilde{\varvec{z}}\in \mathbb {R}^n} \tilde{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}};\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}):= \sum _{j=1}^n\sum _{i=1}^m \,\phi \Big (r^i_j(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}) \nonumber \\&\quad +\tilde{\rho }^{(k+1)}_j \chi (\zeta ^i_j(\tilde{\varvec{z}}^{(k)})) \, (\tilde{\varvec{z}}-\tilde{\varvec{z}}^{(k)})^\top \partial \zeta ^i_j(\tilde{\varvec{z}}^{(k)})\Big ). \end{aligned}$$
(4.27)

Since \(\partial \overline{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}}^{(k)};\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}) = \partial \tilde{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}}^{(k)};\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)})\) [see Eqs. (4.17), (4.22), (4.27)], (4.24) can also be written as

$$\begin{aligned} \tilde{\varvec{z}}^{(k+1)}=\tilde{\varvec{z}}^{(k)}-H_{\tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)})^\dagger \, \partial \tilde{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}}^{(k)};\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)}), \end{aligned}$$
(4.28)

which is a quasi-Newton step for (4.27)Footnote 20, provided that matrix \(H_{\tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)})\) is a “reasonable” approximation of the Hessian \(\partial ^2\tilde{\mathcal {E}}_{\tilde{\varvec{z}}}(\cdot ,\tilde{\varvec{\rho }}^{(k+1)},\tilde{\varvec{z}}^{(k)})\) at \(\tilde{\varvec{z}}^{(k)}\). Let us now explain our meaning of “reasonable”.

4.3.2 A Majorization Result

The following lemma establishes the (local) majorization properties of \(H_{\tilde{\varvec{\rho }}}\) and \(H_{\tilde{\varvec{z}}}\) over the Hessian matrices \(\frac{\partial ^2 \mathcal {E}}{\partial \tilde{\varvec{\rho }}^2}\) and \(\partial ^2 \tilde{\mathcal {E}}_{\tilde{\varvec{z}}}\), respectively.

Lemma 1

If the following condition holds at \((\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)\):

$$\begin{aligned} \zeta ^i_j(\tilde{\varvec{z}}^*)\ne 0, \quad \forall (i,j)\in \{1,\ldots ,m\}\times \{1,\ldots ,n\}, \end{aligned}$$
(4.29)

then we have

$$\begin{aligned} {\left\{ \begin{array}{ll} H_{\tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }},\tilde{\varvec{z}})&{} \succeq \frac{\partial ^2 \mathcal {E}}{\partial \tilde{\varvec{\rho }}^2}(\tilde{\varvec{\rho }},\tilde{\varvec{z}}), \\ H_{\tilde{\varvec{z}}}(\tilde{\varvec{\rho }},\tilde{\varvec{z}}) &{} \succeq \partial ^2 \tilde{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}};\tilde{\varvec{\rho }},\tilde{\varvec{z}}), \end{array}\right. } \end{aligned}$$
(4.30)

whenever \((\tilde{\varvec{\rho }},\tilde{\varvec{z}})\) lies in some small neighborhood of \((\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)\).

4.3.3 Convergence Proof for ARLS

The next theorem contains the main result of our local convergence analysis.

Theorem 1

Assume that, for some iteration k, the iterate \((\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})\) generated by Algorithm 3 is sufficiently close to some local minimizer \((\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)\) where, in addition to (4.29), the following conditions hold:

$$\begin{aligned}&\frac{\partial \mathcal {E}}{\partial \tilde{\varvec{\rho }}}(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)={\varvec{0}}, \quad \frac{\partial \mathcal {E}}{\partial \tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)={\varvec{0}}, \end{aligned}$$
(4.31)
$$\begin{aligned}&\begin{bmatrix} \dfrac{\partial ^2 \mathcal {E}}{\partial \tilde{\varvec{\rho }}^2}(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)&\dfrac{\partial ^2 \mathcal {E}}{\partial \tilde{\varvec{\rho }}\partial \tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*) \\ \dfrac{\partial ^2 \mathcal {E}}{\partial \tilde{\varvec{\rho }}\partial \tilde{\varvec{z}}}(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)&\dfrac{\partial ^2 \mathcal {E}}{\partial \tilde{\varvec{z}}^2}(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*) \end{bmatrix} \succ \mathbf {O}, \end{aligned}$$
(4.32)
$$\begin{aligned}&\partial ^2 \tilde{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}}^*;\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*) \succ \mathbf {O}, \end{aligned}$$
(4.33)
$$\begin{aligned}&\mathrm {sr}\left( \partial ^2 \tilde{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}}^*;\tilde{\varvec{\rho }}^*\!\!,\!\tilde{\varvec{z}}^*)^{-1}\! \left( \!\dfrac{\partial ^2 \mathcal {E}}{\partial \tilde{\varvec{z}}^2}(\tilde{\varvec{\rho }}^*\!,\!\tilde{\varvec{z}}^*\!)-\partial ^2 \tilde{\mathcal {E}}_{\tilde{\varvec{z}}}(\tilde{\varvec{z}}^*;\tilde{\varvec{\rho }}^*\!\!,\!\tilde{\varvec{z}}^*)\right) \right) \!<\!1. \end{aligned}$$
(4.34)

Then we have \(\lim _{k\rightarrow \infty }(\tilde{\varvec{\rho }}^{(k)},\tilde{\varvec{z}}^{(k)})=(\tilde{\varvec{\rho }}^*,\tilde{\varvec{z}}^*)\).

As a remark, conditions (4.31) and (4.32) assumed in Theorem 1 are typically referred to as the first-order and second-order sufficient optimality conditions, while conditions (4.33) and (4.34) are similar to the local convergence criteria for the Gauss-Newton method, see, e.g., [21, Theorem 1]. In our experiments, these conditions always seem to be satisfied, i.e., the convergence of ARLS in the form of Algorithm 3 is always observed. If needed, they may however be explicitly enforced by replacing \(\{\cdot \}_+\) by its (smooth) proximity operator, and incorporating a line search step into ARLS, see [57].

4.4 Experimental Validation

For fair comparison with the methods discussed in Sect. 3, we first consider least-squares estimation without explicit self-shadows handling, i.e., \(\phi (x) = x^2\) and \(\{x\}_+ = x\). The results in Figs. 13 and 14 show that, unlike the previous least-squares differential method from Sect. 3.2, the new scheme always converges toward a similar solution for a wide range of initial estimates.

Fig. 13

a Evolution of the energy \(\mathcal {E}\) of the proposed approach, defined in (4.5), using least-squares estimation, as a function of the iteration number, for the data of Fig. 2. As long as the initial scale is not over-estimated too much, the proposed scheme converges toward similar solutions for different initial estimates (cf. Fig. 14), though with different speeds. b 3D-model obtained at convergence, using \(z_0 = 750\) mm. c Histogram of point-to-point distances between (b) and the ground truth (cf. Fig. 7c). As in the experiment of Fig. 12, the median value is 1.2 mm, yet this result is almost independent of the initialization, and is obtained using a provably convergent algorithm

Fig. 14

3D-reconstructions after 50 iterations of the proposed scheme, taking as initial guess different fronto-parallel planes \(z \equiv z_0\) and using least-squares estimation. Similar results are obtained whatever the initialization, at least as long as the initial scale is not over-estimated too much. a \(z_0 = 500\) mm, b \(z_0 = 650\) mm, c \(z_0 = 700\) mm, d \(z_0 = 750\) mm and e \(z_0 = 900\) mm

Although the accuracy of the results obtained with this new scheme is not improved, the influence of the initialization is much reduced and convergence is guaranteed. Besides, it is straightforward to improve robustness by simply changing the definitions of the function \(\phi \) and of the operator \(\{ \cdot \}_+\), whereas ensuring the robustness of the ratio-based approach is not an easy task [41, 60]. Figure 15 shows the result obtained using Cauchy’s M-estimator \(\phi _{\text {Cauchy}}\) and explicit self-shadows handling, i.e., \(\{x\}_+ = \max \{x,0\}\).

Fig. 15

Same as Fig. 13, but using Cauchy’s robust M-estimator and explicit self-shadows handling. Despite the non-convexity of the estimator, convergence is similar to that obtained in the previous experiment. However, the median value of the 3D-reconstruction error is now 0.91 mm, which is to be compared with the previous value 1.2 mm (cf. Fig. 13)

5 Estimating Colored 3D-Models by Photometric Stereo

So far, we have considered only gray level images. In this section, we extend our study to RGB-valued images, in order to estimate colored 3D-models using photometric stereo. Similar to Sect. 2, we will first establish the image formation model and discuss calibration. Then, we will show how to modify the algorithm from Sect. 4 in order to handle RGB images.

5.1 Spectral Dependency of the Luminous Flux Emitted by a LED

We need to introduce a spectral dependency in Model (2.7) to extend our study to color. It seems reasonable to limit this dependency to the intensity (\(\lambda \) denotes the wavelength):

$$\begin{aligned} \mathbf {s}(\mathbf {x},\lambda ) = \varPhi (\lambda ) \, \cos ^\mu \theta \, \frac{\mathbf {x}_s-\mathbf {x}}{\Vert \mathbf {x}_s-\mathbf {x}\Vert ^3}. \end{aligned}$$
(5.1)

Model (5.1) is more complex than Model (2.7), because the intensity \(\varPhi _0 \in \mathbb {R}^+\) has been replaced by the emission spectrum \(\varPhi (\lambda )\), which is a function (cf. Fig. 16a). The calibration of \(\varPhi (\lambda )\) could be achieved using a spectrometer, but we will show how to extend the procedure from Sect. 2.2, which requires nothing more than a camera and two calibration patterns.

Given a point \(\mathbf {x}\) of a Lambertian surface with albedo \(\rho (\mathbf {x})\), under the illumination described by the lighting vector \(\mathbf {s}(\mathbf {x})\), we get from (2.8), (2.9) and (2.10) the expression of the illuminance \(\epsilon (\mathbf {p})\) of the image plane in the pixel \(\mathbf {p}\) conjugate to \(\mathbf {x}\):

$$\begin{aligned} \epsilon (\mathbf {p}) = \beta \, \cos ^4\alpha (\mathbf {p}) \, \frac{\rho (\mathbf {x})}{\pi } \, \left\{ \mathbf {s}(\mathbf {x}) \cdot \mathbf {n}(\mathbf {x}) \right\} _+. \end{aligned}$$
(5.2)

This expression is easily extended to the case where \(\mathbf {s}(\mathbf {x})\) and \(\rho (\mathbf {x})\) depend on \(\lambda \):

$$\begin{aligned} \epsilon (\mathbf {p},\lambda ) = \beta \, \cos ^4\alpha (\mathbf {p}) \, \frac{\rho (\mathbf {x},\lambda )}{\pi } \, \left\{ \mathbf {s}(\mathbf {x},\lambda ) \cdot \mathbf {n}(\mathbf {x}) \right\} _+. \end{aligned}$$
(5.3)

The one-to-one correspondence between the points \(\mathbf {x}\) and the pixels \(\mathbf {p}\) allows us to write \(\rho (\mathbf {p},\lambda )\) and \(\mathbf {n}(\mathbf {p})\), in lieu of \(\rho (\mathbf {x},\lambda )\) and \(\mathbf {n}(\mathbf {x})\). In addition, the light effectively received by each cell goes through a colored filter characterized by its transmission spectrum \(c_\star (\lambda )\), \(\star \in \{R,G,B\}\), whose maximum lies, respectively, in the red, green and blue ranges (cf. Fig. 16b). To define the color levels \(I_\star (\mathbf {p})\), \(\star \in \{R,G,B\}\), by analogy with the expression (2.12) of the (corrected) gray level \(I(\mathbf {p})\), we must multiply (5.3) by \(c_\star (\lambda )\) and integrate over the entire spectrum:

$$\begin{aligned} I_\star (\mathbf {p}) = \frac{\gamma \, \beta }{\pi } \, \left\{ \left[ \int _{\lambda =0}^{+\infty } c_\star (\lambda ) \, \rho (\mathbf {p},\lambda ) \,\mathbf {s}(\mathbf {x},\lambda ) \, \mathrm {d}\lambda \right] \cdot \mathbf {n}(\mathbf {p}) \right\} _+. \end{aligned}$$
(5.4)

Using a Lambertian calibration pattern which is uniformly white, i.e., such that \(\rho (\mathbf {p},\lambda ) \equiv \rho _0\), allows us to rewrite (5.4) as follows:

$$\begin{aligned} I_\star (\mathbf {p}) = \gamma \, \beta \, \frac{\rho _0}{\pi } \, \left\{ \left[ \int _{\lambda =0}^{+\infty } c_\star (\lambda ) \,\mathbf {s}(\mathbf {x},\lambda ) \, \mathrm {d}\lambda \right] \cdot \mathbf {n}(\mathbf {p}) \right\} _+, \end{aligned}$$
(5.5)

which is indeed an extension of (2.17) to RGB images, since (5.5) can be rewritten

$$\begin{aligned} I_\star (\mathbf {p}) = \gamma \, \beta \, \frac{\rho _0}{\pi } \, \left\{ \mathbf {s}_\star (\mathbf {x}) \cdot \mathbf {n}(\mathbf {p}) \right\} _+, \end{aligned}$$
(5.6)

provided that the three colored lighting vectors \(\mathbf {s}_\star (\mathbf {x})\) are defined as follows:

$$\begin{aligned} \mathbf {s}_\star (\mathbf {x}) = \int _{\lambda =0}^{+\infty } c_\star (\lambda ) \, \mathbf {s}(\mathbf {x},\lambda ) \, \mathrm {d}\lambda , \quad \star \in \{R,G,B\}. \end{aligned}$$
(5.7)

Replacing the lighting vector \(\mathbf {s}(\mathbf {x},\lambda )\) in (5.7) by its expression (5.1), we obtain the following extension of Model (2.7) to color:

$$\begin{aligned} \mathbf {s}_\star (\mathbf {x}) = \varPhi _\star \, \cos ^\mu \theta \, \frac{\mathbf {x}_s-\mathbf {x}}{\Vert \mathbf {x}_s-\mathbf {x}\Vert ^3}, \quad \star \in \{R,G,B\}, \end{aligned}$$
(5.8)

where the colored intensities \(\varPhi _\star \) are defined as follows:

$$\begin{aligned} \varPhi _\star = \int _{\lambda =0}^{+\infty } c_\star (\lambda ) \, \varPhi (\lambda ) \, \mathrm {d}\lambda , \quad \star \in \{R,G,B\}. \end{aligned}$$
(5.9)

The spectral dependency of the lighting vector \(\mathbf {s}(\mathbf {x},\lambda )\) expressed in (5.1) is thus partially described by Model (5.8), which contains nine parameters: three for the coordinates of \(\mathbf {x}_s\), two for the unit-length vector \(\mathbf {n}_s\), plus the three colored intensities \(\varPhi _R\), \(\varPhi _G\), \(\varPhi _B\), and the anisotropy parameter \(\mu \). Nonetheless, since the definition (5.9) of \(\varPhi _\star \) depends on \(c_\star (\lambda )\), it follows that the parameters \(\varPhi _R\), \(\varPhi _G\) and \(\varPhi _B\) are not really characteristic of the LED, but of the camera-LED pair.
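Were the spectra \(\varPhi (\lambda )\) and \(c_\star (\lambda )\) available as sampled curves, the integrals (5.9) would reduce to numerical quadrature. The following Python sketch illustrates this; the sampled spectra are hypothetical inputs, since in practice the text bypasses them through the camera-based calibration of Sect. 5.2:

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal quadrature rule, kept explicit for clarity."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def colored_intensities(lam, Phi, c_filters):
    """Colored intensities Phi_star of Eq. (5.9), by numerical quadrature.
    lam: (k,) wavelength samples; Phi: (k,) LED emission spectrum;
    c_filters: dict mapping 'R', 'G', 'B' to (k,) transmission spectra."""
    return {star: trapezoid(c * Phi, lam) for star, c in c_filters.items()}
```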

Fig. 16
figure 16

Source: http://www.lumileds.com/uploads/28/DS64-pdf

a Emission spectrum \(\varPhi (\lambda )\) of the LEDs used. b Camera response functions in the three channels R, G, B, for the Canon EOS 50D camera [33] (which is similar to the Canon EOS 7D we use). Our extension to RGB images of the calibration procedure from Sect. 2.2 requires nothing more than a camera and two calibration patterns. Therefore, we do not need either of these diagrams in practice.

5.2 Spectral Calibration of the Luminous Flux Emitted by a LED

We again use the Lambertian planar calibration pattern from Sect. 2.2. Since it is planar, hence convex, the incident light comes solely from the LED, without self-interreflections. We can thus replace \(\mathbf {s}_\star (\mathbf {x})\) by its definition (5.8) in the expression (5.6) of the color level \(I_\star (\mathbf {p})\). Assuming that \(\mathbf {x}_s\) is estimated by triangulation and that the anisotropy parameter \(\mu \) is provided by the manufacturer, we then have to solve, in each channel \(\star \in \{R,G,B\}\), the following problem, which extends Problem (2.19) (q denotes the number of poses of the Lambertian calibration pattern):

$$\begin{aligned} \underset{\mathbf {m}_{s,\star }}{{\min }} \sum _{j=1}^{q} \sum _{\mathbf {p} \in \varOmega ^j} \left[ \mathbf {m}_{s,\star } \cdot (\mathbf {x}^j-\mathbf {x}_s) - \left[ I_\star ^j(\mathbf {p}) \, \frac{\Vert \mathbf {x}_s-\mathbf {x}^j\Vert ^{3+\mu }}{ \left\{ (\mathbf {x}_s-\mathbf {x}^j) \cdot \mathbf {n}^j\right\} _+} \right] ^{\frac{1}{\mu }} \right] ^{2} , \end{aligned}$$
(5.10)

where \(\mathbf {m}_{s,\star }\) is defined by analogy with \(\mathbf {m}_s\) (cf. (2.18)):

$$\begin{aligned} \mathbf {m}_{s,\star } = {\varPsi _\star }^{\frac{1}{\mu }} \, \mathbf {n}_s, \end{aligned}$$
(5.11)

and \(\varPsi _\star \) is defined by analogy with \(\varPsi \) (cf. (2.14)):

$$\begin{aligned} \varPsi _\star = \gamma \, \beta \, \frac{\rho _0}{\pi } \, \varPhi _\star . \end{aligned}$$
(5.12)
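Since the residual in (5.10) is linear in \(\mathbf {m}_{s,\star }\), each channel's calibration reduces to an ordinary linear least-squares problem, from which \(\varPsi _\star \) and \(\mathbf {n}_s\) follow via (5.11). The sketch below is hypothetical NumPy code under simplifying assumptions (all poses stacked into flat arrays, shadowed points simply discarded):

```python
import numpy as np

def calibrate_channel(X, N, I, x_s, mu):
    """Estimate (Psi_star, n_s) for one color channel by solving the
    linear least-squares problem (5.10).

    X   : (k, 3) surface points on the calibration pattern (all poses stacked)
    N   : (k, 3) unit normals of the pattern at those points
    I   : (k,)   corrected color levels
    x_s : (3,)   LED position (estimated by triangulation)
    mu  : anisotropy parameter (from the datasheet)
    """
    D = x_s - X                                              # rows: x_s - x^j
    shading = np.maximum(np.einsum('ij,ij->i', D, N), 0.0)   # {(x_s - x).n}_+
    keep = shading > 0                                       # discard shadowed points
    r = np.linalg.norm(D[keep], axis=1)
    # data term of (5.10): [I * ||x_s - x||^(3+mu) / {(x_s - x).n}_+]^(1/mu)
    b = (I[keep] * r**(3 + mu) / shading[keep])**(1.0 / mu)
    A = -D[keep]                                             # rows: x^j - x_s
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    psi = np.linalg.norm(m)**mu                              # since m = Psi^(1/mu) n_s
    n_s = m / np.linalg.norm(m)
    return psi, n_s
```

On noiseless synthetic data generated from the model itself, this recovers \(\varPsi _\star \) and \(\mathbf {n}_s\) up to machine precision.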

Each problem (5.10) allows us to estimate one colored intensity \(\varPhi _R\), \(\varPhi _G\) or \(\varPhi _B\) (up to a common factor) and the principal direction \(\mathbf {n}_s\), which is thus estimated three times. Table 1 gathers the values obtained for one of the LEDs of our setup. The three estimates of \(\mathbf {n}_s\) are consistent, but instead of arbitrarily choosing one of them, we compute their weighted mean, using spherical coordinates.
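The fusion of the three per-channel estimates of \(\mathbf {n}_s\) can be sketched as follows. Note that the paper averages spherical coordinates, whereas this illustrative variant takes a weighted mean of the unit vectors and renormalizes; the two behave similarly for nearby directions, and the vector mean avoids azimuth wrap-around issues.

```python
import numpy as np

def average_directions(dirs, weights):
    """Combine several unit-vector estimates of n_s into a single direction.

    dirs    : (k, 3) per-channel estimates of n_s
    weights : (k,)   confidence weights (e.g., per-channel fit quality)
    """
    dirs = np.asarray(dirs, float)
    w = np.asarray(weights, float)
    v = (w[:, None] * dirs).sum(axis=0)   # weighted vector sum
    return v / np.linalg.norm(v)          # project back onto the unit sphere
```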

Table 1 Parameters of one of the LEDs of our setup, estimated by solving (5.10) in each color channel

In Table 1, the values of \(\widehat{\varPsi }_R\), \(\widehat{\varPsi }_G\) and \(\widehat{\varPsi }_B\) are given without unit because, from the definition (5.12) of \(\varPsi _\star \), only their relative values are meaningful. As it happens, the value of \(\widehat{\varPsi }_G\) is roughly twice those of \(\widehat{\varPsi }_R\) and \(\widehat{\varPsi }_B\), but this does not mean that \(\varPhi (\lambda )\) is twice as high in the green range as in the red or blue ranges, since the definition (5.9) of a given colored intensity \(\varPhi _\star \) also depends on the transmission spectrum \(c_\star (\lambda )\) of the considered channel.

Our calibration procedure relies on the assumption that the calibration pattern is uniformly white, i.e., that \(\rho (\mathbf {p},\lambda ) \equiv \rho _0\), which may be inexact, yet this in no way invalidates our rationale. Indeed, if we assume that the color of the “white” cells of the Lambertian checkerboard (cf. Fig. 4) is uniform, i.e., \(\rho (\mathbf {p},\lambda ) = \rho (\lambda )\), \(\forall \mathbf {p} \in \varOmega ^j\), and if we denote by \(\rho _0\) the maximum value of \(\rho (\lambda )\), Eq. (5.5) is still valid, provided that \(c_\star (\lambda )\) is replaced by the function \(\overline{c}_\star (\lambda )\) defined as followsFootnote 21:

$$\begin{aligned} \overline{c}_\star (\lambda ) = \frac{\rho (\lambda )}{\rho _0} \, c_\star (\lambda ). \end{aligned}$$
(5.13)

5.3 Photometric Stereo Under Colored Point Light Source Illumination

If we aim to extend Model (2.21) to RGB images, it must be possible to write the color level at \(\mathbf {p}\), in each channel \(\star \in \{R,G,B\}\), in the following manner:

$$\begin{aligned} I_\star (\mathbf {p}) = \varPsi _\star \, \frac{\rho _\star (\mathbf {p})}{\rho _0} \left[ \frac{\mathbf {n}_s \cdot \left( \mathbf {x}-\mathbf {x}_s \right) }{\Vert \mathbf {x}-\mathbf {x}_s\Vert } \right] ^\mu \frac{\left\{ (\mathbf {x}_s-\mathbf {x}) \cdot \mathbf {n}(\mathbf {p})\right\} _+}{\Vert \mathbf {x}_s-\mathbf {x}\Vert ^3} \end{aligned}$$
(5.14)

where the colored albedos \(\rho _\star (\mathbf {p})\) are extensions of the albedo \(\rho (\mathbf {p})\) to the RGB case. Equating the two expressions of \(I_\star (\mathbf {p})\) given in (5.4) and in (5.14), and using the definition (5.1) of \(\mathbf {s}(\mathbf {x},\lambda )\), we obtain:

$$\begin{aligned} \varPsi _\star \frac{\rho _\star (\mathbf {p})}{\rho _0} \, = \frac{\gamma \beta }{\pi } \int _{\lambda =0}^{+\infty } c_\star (\lambda ) \, \rho (\mathbf {p},\lambda ) \, \varPhi (\lambda ) \, \mathrm {d}\lambda . \end{aligned}$$
(5.15)

Using the definitions (5.12) and (5.9) of \(\varPsi _\star \) and \(\varPhi _\star \), (5.15) yields the following expression for the colored albedos:

$$\begin{aligned} \rho _\star (\mathbf {p}) = \frac{\int _{\lambda =0}^{+\infty } c_\star (\lambda ) \, \rho (\mathbf {p}, \lambda ) \,{\varPhi }(\lambda ) \, \mathrm {d}\lambda }{\int _{\lambda =0}^{+\infty } c_\star (\lambda ) \, {\varPhi }(\lambda ) \, \mathrm {d}\lambda },\, \star \in \{R,G,B\}, \end{aligned}$$
(5.16)

which is the mean of \(\rho (\mathbf {p},\lambda )\) over the entire spectrum, weighted by the product \(c_\star (\lambda ) \, {\varPhi }(\lambda )\). In addition, although the transmission spectrum \(c_\star (\lambda )\) depends only on the camera, the emission spectrum \({\varPhi }(\lambda )\) usually varies from one LED to another. Thus, generalizing photometric stereo under point light source illumination to RGB images requires superscripting the colored albedos with the LED index i. Hence, it seems that we have to solve, at each pixel \(\mathbf {p}\in \varOmega \), the following problem:
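To illustrate Eq. (5.16), the sketch below evaluates a colored albedo on a discrete wavelength grid. All three spectra are made-up Gaussian stand-ins for the real curves of Fig. 16, which we do not have in tabulated form; on a uniform grid, the ratio of integrals reduces to a ratio of sums.

```python
import numpy as np

# Hypothetical spectra on a 1-nm wavelength grid (380-780 nm): a white-LED-like
# emission Phi(lambda), a Gaussian red-channel response c_R(lambda), and a
# reddish surface reflectance rho(lambda). None of these are measured curves.
lam = np.linspace(380, 780, 401)
gauss = lambda mu, sig: np.exp(-0.5 * ((lam - mu) / sig)**2)
phi = 0.6 * gauss(450, 15) + gauss(560, 60)   # blue peak + broad phosphor lobe
c_R = gauss(600, 40)                          # red-channel response
rho = 0.2 + 0.6 * (lam > 550)                 # low below 550 nm, high above

# Colored albedo of Eq. (5.16): spectral mean of rho weighted by c_R * Phi.
w = c_R * phi
rho_R = (w * rho).sum() / w.sum()
```

As expected, the resulting \(\rho _R\) lies between the extreme values of \(\rho (\lambda )\), pulled toward the wavelengths where \(c_R(\lambda )\,\varPhi (\lambda )\) is large.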

$$\begin{aligned}&I_\star ^i(\mathbf {p}) = \varPsi ^i_\star \, \frac{\rho ^i_\star (\mathbf {p})}{\rho _0} \left[ \frac{\mathbf {n}^i_s \cdot \left( \mathbf {x}-\mathbf {x}_s^i \right) }{\Vert \mathbf {x}-\mathbf {x}_s^i\Vert } \right] ^{\mu ^i} \frac{\left\{ (\mathbf {x}_s^i-\mathbf {x}) \cdot \mathbf {n}(\mathbf {p})\right\} _+}{\Vert \mathbf {x}_s^i-\mathbf {x}\Vert ^3},\nonumber \\&\quad ~i\in \{1,\ldots ,m\},\quad \star \in \{R,G,B\}. \end{aligned}$$
(5.17)

System (5.17) is underdetermined, because it contains 3m equations with \(3m+3\) unknowns: one colored albedo \(\rho _\star ^i(\mathbf {p})\) per equation, the depth \(z(\mathbf {p})\) of the 3D-point \(\mathbf {x}\) conjugate to \(\mathbf {p}\) (from which we get the coordinates of \(\mathbf {x}\)), and the normal \(\mathbf {n}(\mathbf {p})\). Apart from this numerical difficulty, the dependency on i of the colored albedos is puzzling: since the albedo is clearly a photometric characteristic of the surface, independent of the lighting, the same should hold for the colored albedos. This shows that the extension of photometric stereo to RGB images is potentially intractable in the general case. However, such an extension is known to be possible in two specific cases [56]:

  • For a non-colored surface, i.e., when \(\rho (\mathbf {p},\lambda ) = \rho (\mathbf {p})\), we deduce from (5.16) that \(\rho _R(\mathbf {p}) = \rho _G(\mathbf {p}) = \rho _B(\mathbf {p}) = \rho (\mathbf {p})\). Problem (5.17) is thus written:

    $$\begin{aligned}&I_\star ^i(\mathbf {p}) = \varPsi ^i_\star \, \frac{\rho (\mathbf {p})}{\rho _0} \left[ \frac{\mathbf {n}^i_s \cdot \left( \mathbf {x}-\mathbf {x}_s^i \right) }{\Vert \mathbf {x}-\mathbf {x}_s^i\Vert } \right] ^{\mu ^i} \frac{\left\{ (\mathbf {x}_s^i-\mathbf {x}) \cdot \mathbf {n}(\mathbf {p})\right\} _+}{\Vert \mathbf {x}_s^i-\mathbf {x}\Vert ^3},\nonumber \\&\quad i\in \{1,\ldots ,m\},\quad \star \in \{R,G,B\}. \end{aligned}$$
    (5.18)

    If the albedo is known, and if a channel dependency is added to the source parameters \(\mathbf {x}^i_s\), \(\mathbf {n}^i_s\) and \(\mu ^i\), then System (5.18) has 3 unknowns and 3m independent equations: a single RGB image may suffice to make the problem well-determined. This well-known case, which dates back to the 1990s [35], has been applied to real-time 3D-reconstruction of a white-painted deformable surface [23].

  • When the sources are non-colored, i.e., when \({\varPhi }^i(\lambda ) \equiv \varPhi _0\), \(\forall i \in \{1,\ldots ,m\}\), (5.16) gives:

    $$\begin{aligned} \rho _\star (\mathbf {p}) = \frac{\int _{\lambda =0}^{+\infty } c_\star (\lambda ) \, \rho (\mathbf {p},\lambda ) \, \mathrm {d}\lambda }{\int _{\lambda =0}^{+\infty } c_\star (\lambda ) \, \mathrm {d}\lambda },~ \star \in \{R,G,B\}. \end{aligned}$$
    (5.19)

    Since this expression is independent from i, Problem (5.17) is rewritten:

    $$\begin{aligned}&I_\star ^i(\mathbf {p}) = \varPsi _\star \, \frac{\rho _\star (\mathbf {p})}{\rho _0} \left[ \frac{\mathbf {n}^i_s \cdot \left( \mathbf {x}-\mathbf {x}_s^i \right) }{\Vert \mathbf {x}-\mathbf {x}_s^i\Vert } \right] ^{\mu ^i} \frac{\left\{ (\mathbf {x}_s^i-\mathbf {x}) \cdot \mathbf {n}(\mathbf {p})\right\} _+}{\Vert \mathbf {x}_s^i-\mathbf {x}\Vert ^3}, \nonumber \\&\quad i\in \{1,\ldots ,m\},~\star \in \{R,G,B\}. \end{aligned}$$
    (5.20)

    In (5.20), the parameter \(\varPsi _\star \) is independent of i, but it genuinely depends on the channel \(\star \), even though the sources are supposed to be non-colored, since in the definition (5.12) of \(\varPsi _\star \), the colored intensity \(\varPhi _\star \) is channel-dependent (cf. Eq. (5.9)). System (5.20), which has 3m equations and six unknowns, is overdetermined if \(m\geqslant 3\). If \(m=2\), the number of equations equals the number of unknowns, but the system is rank-deficient, since at each point, the six lighting vectors are coplanar. Additional information (e.g., a boundary condition) is then required [43].
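The rank deficiency claimed for \(m=2\) is easy to check numerically: at a given surface point, the three channel-wise lighting vectors of each source are all proportional to the same geometric vector, so the six stacked vectors span at most a plane. A small sketch with made-up values:

```python
import numpy as np

# Geometric lighting vectors of the two sources at one surface point (made up).
v1 = np.array([0.2, -0.1, -1.0])
v2 = np.array([-0.3, 0.4, -0.9])
psi = {'R': 1.0, 'G': 2.1, 'B': 0.9}   # illustrative channel intensities

# The six per-channel lighting vectors are channel-wise rescalings of v1, v2,
# so the stacked 6x3 matrix has rank 2, not 3.
L = np.vstack([p * v for v in (v1, v2) for p in psi.values()])
rank = np.linalg.matrix_rank(L)
```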

Another case where the colored albedos are independent of i is when the m LEDs all share the same emission spectrum, up to multiplicative coefficients (\(\varPhi ^i(\lambda ) = \kappa ^i \, \varPhi (\lambda ),\,\forall i \in \{1,\ldots ,m\}\)). Under this assumption, the colored albedos \(\rho _\star (\mathbf {p})\) need not be indexed by i, according to their definition (5.16). Note, however, that the parameters \(\varPsi _\star \) still have to be indexed by i in this case. Using the notation

$$\begin{aligned} \overline{\rho }_\star (\mathbf {p}) = \frac{\rho _\star (\mathbf {p})}{\rho _0},\quad \star \in \{R,G,B\}, \end{aligned}$$
(5.21)

we obtain the following result:

Under the same hypotheses as in Eq. (2.1), if the m light sources share the same emission spectrum, up to a multiplicative coefficient, then the m RGB images can be modeled as follows:

$$\begin{aligned}&I_\star ^i(\mathbf {p}) = \varPsi _\star ^i \, \overline{\rho }_\star (\mathbf {p}) \left[ \frac{ \mathbf {n}^i_s \cdot \left( \mathbf {x}-\mathbf {x}^i_s\right) }{\Vert \mathbf {x}-\mathbf {x}^i_s\Vert } \right] ^{\mu ^i} \frac{ \left\{ (\mathbf {x}^i_s-\mathbf {x}) \cdot \mathbf {n}(\mathbf {p}) \right\} _+}{\Vert \mathbf {x}^i_s-\mathbf {x}\Vert ^3}, \nonumber \\&\quad \,i\in \{1,\ldots ,m\}, \quad \star \in \{R,G,B\}. \end{aligned}$$
(5.22)

where:

  • \(I^i_\star \) is the (corrected) color level in channel \(\star \);

  • \(\varPsi _R^i\), \(\varPsi _G^i\) and \(\varPsi _B^i\) are the colored intensities of the ith source, multiplied by an unknown factor, which is common to all the sources and depends on several camera parameters and on the albedo \(\rho _0\) (cf. Eqs. (5.9) and (5.12));

  • \(\overline{\rho }_\star \) is the colored albedo in channel \(\star \), relative to \(\rho _0\) (cf. Eq. (5.21)).

For the setup of Fig. 2a, the \(m=8\) LEDs probably do not share exactly the same spectrum, even though they come from the same batch; still, this assumption is more realistic than that of “non-colored sources”, and it better justifies the use of (5.22), which models both the spectral dependency of the albedo and that of the luminous fluxes.

The calibration procedure described in Sect. 5.2 provides us with the values of the parameters \(\mathbf {x}^i_s\), \(\mathbf {n}^i_s\) and \(\varPsi _\star ^i\), \(i \in \{1,\ldots ,m\}\), while the parameters \(\mu ^i\), \(i\in \{1,\ldots ,m\}\), are provided by the manufacturer. The unknowns of System (5.22) are thus the depth \(z(\mathbf {p})\) of \(\mathbf {x}\), the normal \(\mathbf {n}(\mathbf {p})\) and the three colored albedos \(\overline{\rho }_\star (\mathbf {p})\), \(\star \in \{R,G,B\}\). Resorting to RGB images thus allows us to replace the system (2.1) of m equations in four unknowns with the system (5.22) of 3m equations in six unknowns, which should yield more accurate results.
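For concreteness, the forward model (5.22) at a single pixel can be sketched as follows; the data layout (a list of per-source parameter dictionaries) is purely illustrative.

```python
import numpy as np

def render_color_levels(x, n, rho_bar, sources):
    """Evaluate Model (5.22): the 3m color levels at one pixel.

    x       : (3,) surface point conjugate to the pixel
    n       : (3,) unit outward normal at x
    rho_bar : (3,) relative colored albedos (R, G, B)
    sources : list of m dicts with keys 'x_s', 'n_s', 'mu' and 'psi',
              where 'psi' holds the three calibrated intensities Psi_star^i
    """
    I = np.zeros((len(sources), 3))
    for i, s in enumerate(sources):
        d = x - s['x_s']
        r = np.linalg.norm(d)
        aniso = (s['n_s'] @ d / r) ** s['mu']     # [n_s.(x - x_s)/||x - x_s||]^mu
        shading = max(-(d @ n), 0.0) / r**3       # {(x_s - x).n}_+ / ||x_s - x||^3
        I[i] = s['psi'] * rho_bar * aniso * shading
    return I
```

With m sources this returns a \(m \times 3\) array, i.e., the 3m equations constraining the six per-pixel unknowns.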

5.4 Solving Colored Photometric Stereo Under Point Light Source Illumination

The alternating strategy from Sect. 3.1 is not straightforward to adapt to the case of RGB-valued images, because the albedo is channel-dependent, while the normal vector is not. Principal component analysis could be employed [5], but we already know from Sect. 3 that a differential approach should be preferred anyway.

A PDE-based approach similar to that of Sect. 3.2 is advocated in [56]: ratios between color levels can be computed in each channel \(\star \in \{R,G,B\}\), thus eliminating the colored albedos \(\overline{\rho }_\star (\mathbf {p})\) and obtaining a system of PDEs in z similar to (3.23). The PDEs to solve remain quasi-linear, unlike in [30]. Yet, we know that the solution strongly depends on the initialization.

On the other hand, it is straightforward to adapt the method recommended in Sect. 4, by turning the discrete optimization problem (4.5) into

$$\begin{aligned} \min _{\begin{array}{c} \tilde{\varvec{\rho }}_R,\tilde{\varvec{\rho }}_G,\tilde{\varvec{\rho }}_B,\tilde{\varvec{z}} \end{array}} \sum _{\star \in \{R,G,B\}} \sum _{j=1}^n \sum _{i=1}^m\phi \left( r^i_{\star ,j}(\tilde{\varvec{\rho }}_\star ,\tilde{\varvec{z}})\right) , \end{aligned}$$
(5.23)

with the following new definitions, which use straightforward notations for the channel dependencies:

$$\begin{aligned} r^i_{\star ,j}(\tilde{\varvec{\rho }}_\star ,\tilde{\varvec{z}})&= \tilde{\rho }_{\star ,j} \left\{ \zeta ^i_{\star ,j}(\tilde{\varvec{z}})\right\} _+ -I^i_{\star ,j}, \end{aligned}$$
(5.24)
$$\begin{aligned} \zeta ^i_{\star ,j}(\tilde{\varvec{z}})&= \left[ \mathbf {Q}_j \mathbf {t}^i_{\star ,j}(\tilde{z}_j)\right] \cdot \begin{bmatrix} (\nabla \tilde{\varvec{z}})_j \\ -1 \end{bmatrix}. \end{aligned}$$
(5.25)

The actual solution of (5.23) follows immediately from the algorithm described in Sect. 4.2. The depth update simply uses three times as many equations, which improves its robustness, while each colored albedo is estimated independently in its channel, in exactly the same way as in Sect. 4.2.
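For instance, with the depth held fixed, minimizing the weighted sum of squared residuals (5.24) over one pixel's albedo in one channel admits the closed form sketched below; the IRLS weights derived from the estimator \(\phi \) are taken as given.

```python
import numpy as np

def albedo_update(I, zeta_plus, weights):
    """Closed-form albedo update for one pixel and one channel, within a
    reweighted least-squares scheme applied to residuals (5.24).

    I         : (m,) observed color levels of the pixel in this channel
    zeta_plus : (m,) current shading values {zeta^i}_+ (depth held fixed)
    weights   : (m,) current IRLS weights derived from the estimator phi
    """
    num = (weights * zeta_plus * I).sum()      # weighted normal equation, numerator
    den = (weights * zeta_plus**2).sum()       # and denominator
    return num / den if den > 0 else 0.0       # shadowed everywhere: albedo unobservable
```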

Since the depth estimation now uses more data, the 3D-model of Fig. 17, which uses RGB images, is improved in two ways, in comparison with that of Fig. 15: it is not only colored, but also more accurate.

Fig. 17
figure 17

a 3D-model estimated from the \(m=8\) images of Fig. 2, which are RGB images. b Histogram of the distances between this 3D-shape and the ground truth (cf. Fig. 7c). Using RGB images improves the result, in comparison with the experiment of Fig. 15: the median of the point-to-point distances to the ground truth is now equal to 0.85 mm

6 Conclusion and Perspectives

In this article, we describe a photometric stereo-based 3D-reconstruction setup using LEDs as light sources. We first model the luminous flux emitted by a LED, then the resulting photometric stereo problem. We present a practical procedure for calibrating photometric stereo under point light source illumination, and then study several numerical solutions. Existing methods are based either on alternating estimation of normals and depth, or on direct depth estimation using image ratios. Each of these methods has its own advantages, but their convergence is not established. Hence, we introduce a new, provably convergent solution based on alternating reweighted least-squares. Finally, we extend the whole study to RGB images.

The result of Fig. 18 suggests that our goal, i.e., the estimation of colored 3D-models of faces by photometric stereo, has been reached. Of course, many other types of 3D-scanners exist, but ours relies only on hardware which is easy to obtain: a relatively mainstream camera, eight LEDs and an Arduino controller to synchronize the LEDs with the shutter release. Another significant advantage of our 3D-scanner is that it also estimates the albedo.

Fig. 18
figure 18

ac Three RGB images (out of \(m=8\)) of a face captured by our setup. d Estimated 3D-shape. e Colored 3D-model. Since their estimation is relative to the Lambertian planar calibration pattern, the colored albedos of the 3D-model may appear different from the colors of the images

However, there may still be some points where the shape, and therefore the albedo, are poorly estimated. In the example of Fig. 19, the area under the nose, which is dimly lit, is poorly reconstructed (this problem does not appear in the example of Fig. 18, because the face is oriented in such a way that it is “well” illuminated). Although such artifacts remain confined, thanks to robust estimation, future extensions of our work could get rid of them by resorting to an additional regularization term in the variational model.

Fig. 19
figure 19

ac Three images (out of \(m=8\)) of a face. d Estimated 3D-shape. e Colored 3D-model. The 3D-reconstruction is not satisfactory under the nose, which is a dimly lit area. Robustness of the proposed method to shadows could still be improved

Besides dealing with these defects, other questions arise. In particular, could we extend our 3D-scanner to full 3D-reconstruction, by coupling the proposed method with multi-view 3D-reconstruction techniques [24]? Aside from obtaining a more complete 3D-reconstruction, this would circumvent the difficult problem of handling possible discontinuities in a depth map, although Fig. 19 suggests that employing a non-convex estimator already partly allows the recovery of such sharp structures [14].

Finally, the proposed numerical framework could be extended in order to automatically refine the calibration. Several steps in that direction have already been taken in [38, 44, 51, 57], but either without convergence analysis [38, 44, 51] or in the restricted case where only the source intensities are refined [57]. Providing a provably convergent method for uncalibrated photometric stereo under point light source illumination would thus constitute a natural extension of our work.