1.1 Introduction

Inferring the 3D-shape of a scene is necessary in many applications such as quality control, augmented reality or medical diagnosis. Depending upon the requirements of the application, 3D-estimation can be carried out using a variety of technological solutions, from coordinate measuring machines to X-ray scanners. Over the last decades, digital cameras have become a reliable alternative to such sensors, as they represent a reasonable compromise between resolution and affordability. Given one or several 2D-images of a scene captured by a digital camera, the process of estimating its 3D-shape is called 3D-reconstruction. It is a classic inverse problem in computer vision, which has been addressed in several ways.

A 3D-model consists of geometric (position, orientation, etc.) and photometric (color, texture, etc.) information. Knowing both makes it possible to render synthetic images, by simulating the trajectory of the light rays from the sources to the camera, after reflection on the surface of the scene. 3D-scanning is the dual of rendering: one aims at a geometric and photometric characterisation of the scene’s surface by reversing the trajectory of the light rays. In fact, 3D-scanning comprises the two subproblems of 3D-reconstruction (estimating the scene’s geometry) and appearance estimation (estimating its photometric properties).

The various 3D-reconstruction techniques from digital cameras are grouped under the generic terminology shape-from-X, X indicating that shape estimation can be based on various clues (shadows, contours, etc.). The main shape-from-X techniques are presented in Table 1.1. In this table, they are classified according to the clue they are based on (photometric or geometric) and the number of images they require.

Table 1.1 Main shape-from-X techniques. Geometric techniques aim at identifying and analysing features. This presentation rather focuses on photometric techniques, which aim at inverting a physics-based image formation model

Geometric shape-from-X techniques are built upon the identification of features in the image(s). On the other hand, photometric shape-from-X techniques are based on the analysis of the quantity of light received in each photosite of the camera’s sensor. Photometric 3D-reconstruction techniques indeed rely on a physics-based forward image formation model describing the interactions between light, matter and the camera, and aim at inverting this model in order to infer the geometry of the scene and, possibly, its photometric properties.

There exist out-of-the-box solutions for geometric 3D-reconstruction, e.g. Microsoft Kinect (based on stereopsis or structured light, depending on the version), or the CMPMVS [54] or AliceVision [2] projects (based on structure-from-motion and stereopsis). In contrast, such solutions are lacking for photometric techniques, which are usually viewed as “lab” reconstruction techniques because they rely on several assumptions on the acquisition setup. Still, they hold great promise in terms of the level of geometric detail that can be recovered, and of applicability to a wide range of materials.

The aim of this chapter is to present an overview of the three main photometric shape-from-X techniques: shape-from-shading, photometric stereo and shape-from-polarisation. We first review in Sect. 1.2 the shape-from-shading problem, which consists in inferring geometry from a single image. Then, we discuss two techniques where multiple images are analysed under controlled incident or reflected lighting. In photometric stereo (Sect. 1.3), a series of images is acquired under varying incident lighting, which makes it possible to estimate both the shape and the reflectance of the pictured surface. In shape-from-polarisation (Sect. 1.4), it is the state of polarisation of the reflected light which is analysed, by considering a series of images acquired with a controllable polarising filter attached to the camera. Finally, Sect. 1.5 concludes this study by presenting the subsequent chapters of this volume.

1.2 Shape-from-Shading

Inferring 3D-geometry from a single image of a shaded surface is a problem known as shape-from-shading (SfS). This technique was first developed in the 1970s at MIT, under the impetus of Horn [48].

1.2.1 Non-differential SfS Models

Let us briefly outline the problem, attaching to the camera a 3D-coordinate system Oxyz, such that Oxy coincides with the image plane and Oz with the optical axis. Assuming orthographic projection, the visible part of the scene is, up to a scale factor, a graph \(z = u(\mathbf{x} )\), where \(\mathbf{x} = [x,y]^\top \) is an image point. The SfS problem can be modelled by the image irradiance equation [49]:

$$\begin{aligned} I(\mathbf{x} ) = R(\mathbf{n} (\mathbf{x} )), \end{aligned}$$
(1.1)

where \(I(\mathbf{x} )\) is the graylevel at point \(\mathbf{x} \) (in fact, \(I(\mathbf{x} )\) is the irradiance at point \(\mathbf{x} \), but both quantities are proportional), and the radiance \(R(\mathbf{n} (\mathbf{x} ))\) gives the value of the light re-emitted by the surface as a function of its orientation, i.e. of the unit normal \(\mathbf{n} (\mathbf{x} )\) to the surface at the 3D-point \([x,y,u(\mathbf{x} )]^\top \) conjugate with \(\mathbf{x} \) (cf. Fig. 1.1). Assuming that, besides I, the radiance function R is also known, Eq. (1.1) constitutes a non-differential model of SfS, in which the unknown is the normal \(\mathbf{n} (\mathbf{x} )\).

Fig. 1.1 The surface is represented as a graph \(z = u(\mathbf {x})\), where \(\mathbf {x} = \left[ x,y\right] ^\top \) is an image point in the reconstruction domain \(\Omega \). The normal at the surface point \(\left[ x,y,u(\mathbf {x})\right] ^\top \) conjugate with \(\mathbf {x}\) is denoted by \(\mathbf {n}(\mathbf {x})\), and the incident light direction by \(\varvec{\omega }\)

Let us assume there is a single light source at infinity, whose direction is characterised by the unit vector \(\varvec{\omega } = [\omega _{1},\omega _{2},\omega _{3}]^\top \in \mathbb {R}^{3}\), and whose intensity is denoted by \(\psi (\mathbf{x} )\). Let us also assume for simplicity that the surface is Lambertian, i.e. an ideal diffuse reflecting surface for which the apparent brightness is independent of the viewing angle. Then, R is written in such a way that Eq. (1.1) becomes

$$\begin{aligned} I(\mathbf{x} ) = r(\mathbf{x} ) \, \psi (\mathbf{x} ) \, \varvec{\omega }^\top \mathbf{n} (\mathbf{x} ). \end{aligned}$$
(1.2)

In Eq. (1.2), \(r(\mathbf{x} )\) is the reflectance (or albedo), and the scalar product \(\psi (\mathbf{x} ) \, \varvec{\omega }^\top \mathbf{n} (\mathbf{x} )\) is called shading. This is another example of a non-differential SfS model.

Equation (1.2) is fundamentally ill-posed, according to the trompe-l’œil principle, which is well illustrated by Adelson and Pentland’s “workshop metaphor” [1] (cf. Fig. 1.2). If a painter, a light designer and a sculptor are asked to design an artwork explaining a given image \(I(\mathbf{x} )\), they may propose very different, but plausible, solutions. The painter will assume a planar surface and uniform lighting, the changes in intensity being explained by changes in reflectance \(r(\mathbf{x} )\). The light designer may propose a sophisticated lighting configuration \(\psi (\mathbf{x} )\) placed in front of a planar surface with uniform reflectance. Finally, the sculptor will assume that lighting and reflectance are uniform and explain the changes in intensity solely by the shading, which results from variations in the local orientation \(\mathbf{n} (\mathbf{x} )\) of the surface.
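
The three explanations can be mimicked at a single pixel with a few lines of Python. This is only a sketch of Eq. (1.2): the function name `lambert_graylevel`, the frontal light direction and the numerical values are assumptions chosen for illustration.

```python
import math

def lambert_graylevel(r, psi, omega, n):
    """Graylevel predicted by Eq. (1.2): I = r * psi * omega^T n."""
    return r * psi * sum(w * c for w, c in zip(omega, n))

omega = (0.0, 0.0, 1.0)        # frontal light direction (assumed)
flat = (0.0, 0.0, 1.0)         # normal of a fronto-parallel plane
tilt = math.radians(60.0)      # tilt such that omega^T n = cos(60 deg)
tilted = (math.sin(tilt), 0.0, math.cos(tilt))

explanations = {
    # Painter: flat surface, uniform light, darker paint.
    "painter": lambert_graylevel(0.5, 1.0, omega, flat),
    # Light designer: flat white surface, dimmer light.
    "light designer": lambert_graylevel(1.0, 0.5, omega, flat),
    # Sculptor: white surface, uniform light, tilted normal.
    "sculptor": lambert_graylevel(1.0, 1.0, omega, tilted),
}
for name, graylevel in explanations.items():
    print(f"{name}: I = {graylevel:.3f}")
```

All three triples \((r, \psi , \mathbf{n} )\) predict the same graylevel, so this pixel alone cannot tell them apart.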

Fig. 1.2 Adelson and Pentland’s “workshop metaphor” [1]. To explain an image (a) in terms of reflectance, lighting and shape, (b) a painter, (c) a light designer and (d) a sculptor will design three different, but plausible, solutions. Inferring the shape (d) from a single image is the shape-from-shading problem

This last explanation, which comes down to inverting Eq. (1.2) in order to infer a 3D-shape, assuming everything is known but \(\mathbf{n} (\mathbf{x} )\), is precisely the shape-from-shading problem. So it is assumed that the reflectance is known, which is usually written \(r(\mathbf{x} ) \equiv 1\), and that the lighting is uniform, i.e. \(\psi (\mathbf{x} ) \equiv 1\).

1.2.2 Differential SfS Models

Let us now turn to differential SfS models. Under orthographic projection, the normal is easily expressed as

$$\begin{aligned} \mathbf{n} (\mathbf{x} )=\frac{1}{\sqrt{1+p(\mathbf{x} )^2+q(\mathbf{x} )^2}} \, [- p(\mathbf{x} ), - q(\mathbf{x} ),1]^\top , \end{aligned}$$
(1.3)

where

$$\begin{aligned} p := \frac{\partial u}{\partial x} \qquad \text {and} \qquad q := \frac{\partial u}{\partial y}, \end{aligned}$$
(1.4)

so that \(\nabla u(\mathbf{x} ) = [p(\mathbf{x} ),q(\mathbf{x} )]^\top \). It is easily deduced from Eqs. (1.2) and (1.3), assuming \(r(\mathbf{x} ) \equiv 1\) and \(\psi (\mathbf{x} ) \equiv 1\), that the following equation holds true for a general parallel lighting whose direction is characterised by \(\varvec{\omega } = [\omega _{1},\omega _{2},\omega _{3}]^\top \):

$$\begin{aligned} I(\mathbf{x} ) \sqrt{1+| \nabla u(\mathbf{x} ) |^{2}}+[\omega _{1},\omega _{2}] \, \nabla u(\mathbf{x} )-\omega _{3} = 0. \end{aligned}$$
(1.5)

This is a first-order nonlinear partial differential equation (PDE) of Hamilton–Jacobi type, which constitutes an example of a differential SfS model, in which the unknown is now the function u, called the height map. This equation has to be solved on a compact domain \(\Omega \subset \mathbb {R}^{2}\), called the reconstruction domain.
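
The step from Eqs. (1.2) and (1.3) to Eq. (1.5) can be checked numerically at one pixel. The following Python sketch assumes \(r \equiv 1\) and \(\psi \equiv 1\); the function names and the sample values of \(\varvec{\omega }\), p and q are illustrative assumptions.

```python
import math

def normal_from_gradient(p, q):
    """Unit normal of the graph z = u(x) under orthographic projection, Eq. (1.3)."""
    s = math.sqrt(1.0 + p * p + q * q)
    return (-p / s, -q / s, 1.0 / s)

def sfs_residual(I, p, q, omega):
    """Left-hand side of Eq. (1.5); it vanishes when (p, q) explains I."""
    return (I * math.sqrt(1.0 + p * p + q * q)
            + omega[0] * p + omega[1] * q - omega[2])

omega = (0.6, 0.0, 0.8)      # unit light direction (illustrative value)
p, q = 0.3, -0.2             # surface slopes at one pixel (illustrative values)

n = normal_from_gradient(p, q)
I = sum(w * c for w, c in zip(omega, n))   # Eq. (1.2) with r = psi = 1
residual = sfs_residual(I, p, q, omega)
print(residual)   # zero up to rounding errors
```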

The PDE which appears in most of the papers on SfS corresponds to frontal lighting, i.e. \(\varvec{\omega } = [0,0,1]^\top \). This assumption leads to the eikonal equation, which is a particular case of the differential SfS model (1.5):

$$\begin{aligned} \left| \nabla u(\mathbf{x} ) \right| = f(\mathbf{x} ):= \sqrt{\frac{1}{I(\mathbf{x} )^2}-1}, \end{aligned}$$
(1.6)

where the graylevel function I, which typically takes integer values between 0 and 255, is implicitly rescaled to take real values in the range [0, 1].

Note that even in the simplest case (the eikonal equation) we get a nonlinear PDE of the first order, and the solutions are a priori non-differentiable and non-unique, even if we complement Eq. (1.6) with a Dirichlet boundary condition, i.e. \(u=g\) on the boundary \(\partial \Omega \) of \(\Omega \). Moreover, the right-hand side of the eikonal equation is not always defined because \(I(\mathbf {x})\) can vanish at some points. A simple example is the hemisphere \(z=\sqrt{1-x^2-y^2}\) under a parallel frontal lighting. In this case, \(I(\mathbf {x})=0\) at the equator. However, in that simple situation the boundary condition \(u=0\) can help to solve the problem.
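
The hemisphere example can be verified numerically. The Python sketch below evaluates the right-hand side f of Eq. (1.6) from the frontal-light graylevel and compares it with the analytic gradient norm of the hemisphere; the function names and sample points are assumptions made for illustration.

```python
import math

def eikonal_rhs(I):
    """Right-hand side f of Eq. (1.6) for a graylevel I in (0, 1]."""
    if I <= 0.0:
        raise ValueError("f is undefined where the graylevel vanishes")
    return math.sqrt(max(1.0 / (I * I) - 1.0, 0.0))

def hemisphere(x, y):
    """Height, gradient norm and frontal-light graylevel of z = sqrt(1 - x^2 - y^2)."""
    u = math.sqrt(1.0 - x * x - y * y)
    grad_norm = math.hypot(x, y) / u             # |grad u| = sqrt(x^2 + y^2) / u
    I = 1.0 / math.sqrt(1.0 + grad_norm ** 2)    # Eq. (1.2) with omega = [0, 0, 1]
    return u, grad_norm, I

samples = [(0.0, 0.0), (0.3, 0.4), (-0.6, 0.1), (0.0, 0.9)]   # inside the unit disc
errors = []
for x, y in samples:
    u, grad_norm, I = hemisphere(x, y)
    errors.append(abs(grad_norm - eikonal_rhs(I)))
print(max(errors))   # close to machine precision
```

On the equator the graylevel vanishes and `eikonal_rhs` raises an error, which reflects the fact that f is not defined there.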

Under an oblique light direction, the same example becomes more difficult because there will be a black shadow region \(\Omega _s\subset \Omega \) where \(I(\mathbf {x}) \equiv 0\), and in that region the model has no information to reconstruct the surface. The boundary of \(\Omega _s\), which is not known a priori, is a curve on which it would be difficult to impose boundary conditions in the numerical approximation. In general, the curve delimiting the region \(\Omega _s\) depends on the shape of the surface and on the light source direction \(\varvec{\omega }\). Note that in the case of black shadows, the model is clearly unable to produce a reasonable surface approximation, because the information is missing. In this situation, one can follow a global approach, which avoids imposing boundary conditions on \(\partial \Omega _s\). This leads to the concept of maximal solution, where we solve the PDE on the whole domain with the standard Dirichlet boundary condition on \(\partial \Omega \) (not on \(\partial \Omega _s\)), and recover a linear reconstruction on \(\Omega _s\) (we refer to [17, 34, 35] for more details).

1.2.3 Ill-Posedness of the SfS Models

The “workshop metaphor” illustrated in Fig. 1.2 is representative of the ill-posedness of SfS, because a posteriori estimating 3D-geometry from a single image is possible only if reflectance and lighting are known a priori. The reliability of these priors is of fundamental importance to guarantee that the solution of SfS is meaningful.

This is illustrated in Fig. 1.3, which shows how the assumptions leading to Eq. (1.6), i.e. a uniform reflectance (\(r(\mathbf{x} ) \equiv 1\)) and a parallel uniform lighting (\(\psi (\mathbf{x} ) \equiv 1\) and \(\varvec{\omega } = [0,0,1]^\top \)), which are only rough approximations in the case of Fig. 1.3a, yield the erroneous interpretation of the 3D-shape shown in Fig. 1.3b. Nevertheless, this shape is an exact solution of Eq. (1.6), as shown by frontally relighting this uniformly white 3D-shape (cf. Fig. 1.3c).

Fig. 1.3 Illustration of the importance of reflectance and lighting priors on the solution of SfS [25]. (a) A well-known graylevel image \(I(\mathbf{x} )\). (b) 3D-shape estimated by solving the SfS model (1.6), by (wrongly) assuming uniform reflectance \(r(\mathbf{x} ) \equiv 1\), and uniform frontal lighting, i.e. \(\psi (\mathbf{x} ) \equiv 1\) and \(\varvec{\omega } = [0,0,1]^\top \). The 3D-reconstruction largely departs from the true geometry. Yet, by taking a picture of the uniformly white 3D-shape (b) from above, using the camera’s flash as single light source, one gets image (c), which resembles (a). 3D-shape (b) is thus a plausible explanation of (a): the bias comes from the inappropriate reflectance and lighting priors

Even when reflectance and lighting are known, i.e. when \(r(\mathbf{x} ) \equiv 1\) and \(\psi (\mathbf{x} ) \equiv 1\), the non-differential model (1.2) of SfS remains ill-posed:

$$\begin{aligned} I(\mathbf{x} ) = \varvec{\omega }^\top \mathbf{n} (\mathbf{x} ). \end{aligned}$$
(1.7)

Except for some sparse singular points, where \(\mathbf{n} (\mathbf{x} )\) points in the same direction as \(\varvec{\omega }\), there exists an infinity of surface normals explaining the graylevel in one pixel. It follows from Eq. (1.7) that these normals \(\mathbf{n} (\mathbf{x} )\) form a cone of revolution around the lighting direction \(\varvec{\omega }\). It is thus very difficult to locally solve SfS.
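
The cone of admissible normals can be sampled explicitly. In the following Python sketch, every returned vector is a unit normal satisfying Eq. (1.7) for the same graylevel; the function names, the light direction and the graylevel 0.8 are illustrative assumptions, and visibility of the sampled normals from the camera is not checked.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalise(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)

def normals_on_cone(I, omega, count=8):
    """Sample unit normals n such that omega^T n = I, cf. Eq. (1.7)."""
    # Orthonormal basis (e1, e2) of the plane orthogonal to omega.
    helper = (1.0, 0.0, 0.0) if abs(omega[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalise(cross(omega, helper))
    e2 = cross(omega, e1)
    radius = math.sqrt(1.0 - I * I)       # sine of the half-angle of the cone
    normals = []
    for k in range(count):
        theta = 2.0 * math.pi * k / count
        normals.append(tuple(
            I * w + radius * (math.cos(theta) * a + math.sin(theta) * b)
            for w, a, b in zip(omega, e1, e2)))
    return normals

omega = normalise((0.2, -0.3, 1.0))       # light direction (illustrative value)
cone = normals_on_cone(0.8, omega)
for n in cone:
    print(tuple(round(c, 3) for c in n), round(dot(omega, n), 3))
```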

A simple example in dimension 1 is given by the surface \(z=u_1(x)=1-x^2\) in the interval \((-1,1)\) (cf. Fig. 1.4a), under vertical lighting \(\varvec{\omega } = [0, 1]^\top \), which satisfies an equation of the form (1.6) and the homogeneous boundary condition \(u_1(1) = u_1(-1) = 0\). However, the function \(u_2(x) = -u_1(x)\) in the same interval still satisfies this equation, since \(\left| \nabla u(\mathbf{x} ) \right| \) is the same and the same boundary condition holds. This example is an illustration of the famous concave/convex ambiguity of SfS.

Note that \(u_1\) and \(u_2\) are two differentiable solutions to the same problem. If we decide to also accept solutions which are differentiable only almost everywhere (a very natural choice in view of real applications), we suddenly have an infinite number of solutions, which can be obtained simply by considering all the possible reflections of one of those solutions, e.g. \(u_1\), with respect to a horizontal axis located at the height \(z = h\), where \(h \in (0,1)\). This is illustrated in Fig. 1.4b, where three such solutions are exhibited. If one fixes the height at the singular point, then only one of these solutions can be accepted (we refer to [64] for this result), but such additional knowledge is clearly not very realistic. A general theory for weak solutions of Hamilton–Jacobi equations (which include the eikonal equation) has been developed over the last decades, starting from the seminal paper by Crandall and Lions [24]. We refer the interested reader to the book [8] and the references therein.
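
The 1D ambiguity is easy to reproduce. The Python sketch below builds \(u_1\), \(u_2\) and three reflected solutions, and checks that they all share the slope magnitude \(2|x|\) and the homogeneous boundary condition. The helper names, the heights h and the sample points (chosen away from the kinks) are assumptions made for illustration.

```python
def u1(x):
    return 1.0 - x * x

def u2(x):
    return -u1(x)

def reflected(h):
    """Reflect u1 about the horizontal axis z = h wherever u1 exceeds h."""
    return lambda x: min(u1(x), 2.0 * h - u1(x))

def slope_magnitude(u, x, step=1e-6):
    """|u'(x)| by centred finite differences (valid away from the kinks)."""
    return abs(u(x + step) - u(x - step)) / (2.0 * step)

candidates = [u1, u2, reflected(0.25), reflected(0.5), reflected(0.75)]
xs = [-0.9, -0.6, -0.2, 0.1, 0.4, 0.8]     # sample points away from the kinks
f = [2.0 * abs(x) for x in xs]             # common right-hand side |u1'(x)| = 2|x|

worst = max(abs(slope_magnitude(u, x) - fx)
            for u in candidates for x, fx in zip(xs, f))
boundary = max(abs(u(x)) for u in candidates for x in (-1.0, 1.0))
print(worst, boundary)   # both vanish up to finite-difference error
```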

Fig. 1.4 (a) Example of the 1D-surface \(z = u_1(x) = 1-x^2\). (b) Under vertical lighting, three other solutions (amongst an infinity), which are differentiable almost everywhere, satisfy the same eikonal equation as \(u_1\)

Practical ways to reduce this ambiguity include resorting to more realistic models such as perspective camera [117] or near-lighting [89]. However, it has been shown that this remains insufficient to ensure well-posedness [16]. Recently, the introduction of an attenuation factor in the brightness equations relative to various perspective SfS models has made the corresponding differential problems well-posed. In [18], a unified approach based on the theory of viscosity solutions has been proposed, showing that the brightness equations coming from different non-Lambertian reflectance models with the attenuation term admit a unique viscosity solution.

1.2.4 Numerical Approximation

An important step towards the numerical solution of SfS was achieved when inverse problems in computer vision caught the attention of mathematicians [64]. Efficient numerical approaches were suggested to solve the eikonal equation (1.6), which is the fundamental differential model of SfS, relating the surface slope to the image graylevel. By construction, this equation can only be solved globally; therefore, SfS ambiguities are reduced in comparison with local approaches. They are not eliminated, however, because the concave/convex ambiguity remains.

An overview of the numerical methods for solving SfS can be found in [31, 134]. PDE-based methods (e.g. [34]) find a viscosity solution to the eikonal equation. As an example, let us consider the basic eikonal equation (1.6). A typical technique for solving it is to use a finite difference scheme. One example is the following iterative Lax–Friedrichs scheme which, in its simplest form, can be written as

$$\begin{aligned} u^{(k+1)}_{i,j} =&\frac{u^{(k)}_{i-1,j}+u^{(k)}_{i+1,j}+u^{(k)}_{i,j-1}+u^{(k)}_{i,j+1}}{4} \\&- \frac{1}{2} \left( \sqrt{\left( \frac{u^{(k)}_{i+1,j}-u^{(k)}_{i-1,j}}{2}\right) ^2+\left( \frac{u^{(k)}_{i,j+1}-u^{(k)}_{i,j-1}}{2}\right) ^2} - f_{i,j} \right) , \nonumber \end{aligned}$$
(1.8)

where \(f_{i,j}\) is the right-hand side of the eikonal equation (1.6) at the pixel \((i,j)\), \(u_{i,j}\) is the height at this pixel, and the index k is the iteration number. The values \(\{u^{(0)}_{i,j}\}\) represent an initial guess for the height, typically a constant value. Let us briefly explain the meaning of the iterative scheme: the first term is an average of the four values around the pixel \((i,j)\), and inside the square root are the centered finite difference approximations of the partial derivatives \(\partial u / \partial x\) and \(\partial u/\partial y\). In practice, several approximation schemes are available, e.g. finite difference schemes [86, 103] or semi-Lagrangian schemes [32, 33]. Most of the efficient schemes use upwind approximations of the derivatives and additional terms to control the diffusion in the scheme. Let us also mention that a fast-marching version of these methods drastically reduces the CPU time for this type of algorithm [26, 103] and has been extensively applied in the area of image processing. Another delicate point is that the graylevel function I is typically a discontinuous function, so the approximation scheme should take into account this lack of regularity (a result in this direction is in [36]).
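
As a rough illustration, scheme (1.8) can be written in a few lines of Python. This is a sketch under stated assumptions, not a reference implementation: unit pixel spacing, a homogeneous Dirichlet condition on the border of a square grid, a toy right-hand side \(f \equiv 1\), and a fixed number of iterations. The function name is hypothetical.

```python
import math

def lax_friedrichs_eikonal(f, iterations=2000):
    """Iterate scheme (1.8) with unit spacing, keeping u = 0 on the border."""
    rows, cols = len(f), len(f[0])
    u = [[0.0] * cols for _ in range(rows)]      # constant initial guess u^(0)
    change = 0.0
    for _ in range(iterations):
        new = [row[:] for row in u]              # Jacobi-style update from u^(k)
        change = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                average = (u[i - 1][j] + u[i + 1][j]
                           + u[i][j - 1] + u[i][j + 1]) / 4.0
                dx = (u[i + 1][j] - u[i - 1][j]) / 2.0   # centred differences
                dy = (u[i][j + 1] - u[i][j - 1]) / 2.0
                new[i][j] = average - 0.5 * (math.hypot(dx, dy) - f[i][j])
                change = max(change, abs(new[i][j] - u[i][j]))
        u = new
    return u, change

size = 11
f = [[1.0] * size for _ in range(size)]          # toy right-hand side (assumed)
u, last_change = lax_friedrichs_eikonal(f)
print(last_change, u[size // 2][size // 2])
```

The averaging term introduces numerical diffusion, so the fixed point is a smoothed approximation of the solution of (1.6); this is one reason why upwind schemes are often preferred in practice.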

On the other hand, many optimisation-based methods have been proposed to compute the normal field \(\mathbf {n}\). Under orthographic projection, (1.3) shows that \(\mathbf {n}\) just depends on p and q as defined in (1.4). Therefore, there exists a function \(\mathcal {R}\) such that \(\mathcal {R}(p(\mathbf {x}),q(\mathbf {x})) := R(\mathbf {n}(\mathbf {x}))\). From this and Eq. (1.1), the following least-squares variational model of SfS is derived (robust estimators have also been used [130]):

$$\begin{aligned} \underset{p,q:\,\Omega \rightarrow \mathbb {R}}{\min } \, \displaystyle \int _{\Omega } \Big | I(\mathbf{x} ) - \mathcal {R}(p(\mathbf{x} ),q(\mathbf{x} )) \Big |^2 \, \mathrm {d}\mathbf {x}. \end{aligned}$$
(1.9)

As already mentioned in the previous subsection, this problem is clearly ill-posed. Nevertheless, if u is of class \(C^2\), p and q are not independent since, according to Schwarz’s theorem, \(\partial p / \partial y = \partial q / \partial x\). For numerical reasons [49], this hard constraint is usually replaced by a quadratic regularisation term weighted by a hyper-parameter \(\lambda >0\), which gives the following problem, better posed than (1.9):

$$\begin{aligned} \underset{p,q:\,\Omega \rightarrow \mathbb {R}}{\min } \, \displaystyle \int _{\Omega } \Big | I(\mathbf{x} ) - \mathcal {R}(p(\mathbf{x} ),q(\mathbf{x} )) \Big |^2 \, \mathrm {d}\mathbf {x} + \lambda \displaystyle \int _{\Omega } \displaystyle \Big | \frac{\partial p}{\partial y}(\mathbf{x} )-\frac{\partial q}{\partial x}(\mathbf{x} ) \Big |^2 \, \mathrm {d}\mathbf {x}. \end{aligned}$$
(1.10)

Another regularisation term has been extensively used, since it is easier to discretise [62]:

$$\begin{aligned} \underset{p,q:\,\Omega \rightarrow \mathbb {R}}{\min } \, \displaystyle \int _{\Omega } \Big | I(\mathbf{x} ) - \mathcal {R}(p(\mathbf{x} ),q(\mathbf{x} )) \Big |^2 \, \mathrm {d}\mathbf {x} + \lambda \displaystyle \int _{\Omega } \displaystyle \Big [ |\nabla p(\mathbf{x} )|^2 + |\nabla q(\mathbf{x} )|^2 \Big ] \, \mathrm {d}\mathbf {x}. \end{aligned}$$
(1.11)

Typical optimisation methods are descent methods. For instance, the Euler–Lagrange equations derived from (1.11) are written (dependencies on \(\mathbf {x}\) are omitted):

$$\begin{aligned} \big [ I - \mathcal {R}(p,q) \big ] \, \frac{\partial \mathcal {R}}{\partial p}(p,q) + \lambda \, \Delta p = 0 \quad \text {and} \quad \big [ I - \mathcal {R}(p,q) \big ] \, \frac{\partial \mathcal {R}}{\partial q}(p,q) + \lambda \, \Delta q = 0. \end{aligned}$$
(1.12)

Using the classical discrete approximation of the Laplacian \(\Delta p\) at pixel \((i,j)\) (written here up to a constant factor of 4, which can be absorbed into \(\lambda \)):

$$\begin{aligned} \Delta p_{i,j} \approx \frac{p_{i+1,j}+p_{i-1,j}+p_{i,j+1}+p_{i,j-1}}{4} - p_{i,j}, \end{aligned}$$
(1.13)

the following iterative scheme for solving (1.11) follows from (1.12) and (1.13) [53]:

$$\begin{aligned} {\left\{ \begin{array}{ll} p_{i,j}^{(k+1)} = \overline{p}_{i,j}^{(k)} + \displaystyle \frac{1}{\lambda } \, \big [ I_{i,j} - \mathcal {R}\big (p_{i,j}^{(k)},q_{i,j}^{(k)}\big ) \big ] \, \frac{\partial \mathcal {R}}{\partial p}\big (p_{i,j}^{(k)},q_{i,j}^{(k)}\big ), \\ [0.3cm] q_{i,j}^{(k+1)} = \overline{q}_{i,j}^{(k)} + \displaystyle \frac{1}{\lambda } \, \big [ I_{i,j} - \mathcal {R}\big (p_{i,j}^{(k)},q_{i,j}^{(k)}\big ) \big ] \, \frac{\partial \mathcal {R}}{\partial q}\big (p_{i,j}^{(k)},q_{i,j}^{(k)}\big ), \end{array}\right. } \end{aligned}$$
(1.14)

where \(\overline{p}\) denotes the local average of p, \(p^{(0)}\) and \(q^{(0)}\) are given initial conditions, and the index k is the number of the iteration.
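
One sweep of scheme (1.14) can be sketched in Python as follows, assuming a Lambertian reflectance map \(\mathcal {R}(p,q) = \varvec{\omega }^\top \mathbf {n}\) with \(\mathbf {n}\) given by Eq. (1.3). The light direction, the replication of values at the image border and all function names are assumptions made for this illustration.

```python
import math

OMEGA = (0.3, 0.2, math.sqrt(1.0 - 0.3 ** 2 - 0.2 ** 2))   # unit light (assumed)

def reflectance_map(p, q, omega=OMEGA):
    """Lambertian R(p, q) = omega^T n(p, q), with n given by Eq. (1.3)."""
    s = math.sqrt(1.0 + p * p + q * q)
    return (-omega[0] * p - omega[1] * q + omega[2]) / s

def reflectance_gradient(p, q, omega=OMEGA):
    """Analytic partial derivatives (dR/dp, dR/dq) of the Lambertian R."""
    s2 = 1.0 + p * p + q * q
    s = math.sqrt(s2)
    R = reflectance_map(p, q, omega)
    return (-omega[0] / s - p * R / s2, -omega[1] / s - q * R / s2)

def local_average(field, i, j):
    """Mean of the four neighbours, replicating values at the border (assumed)."""
    rows, cols = len(field), len(field[0])
    neighbours = [(max(i - 1, 0), j), (min(i + 1, rows - 1), j),
                  (i, max(j - 1, 0)), (i, min(j + 1, cols - 1))]
    return sum(field[a][b] for a, b in neighbours) / 4.0

def horn_step(I, p, q, lam):
    """One sweep of scheme (1.14) over the whole grid."""
    rows, cols = len(I), len(I[0])
    p_new = [[0.0] * cols for _ in range(rows)]
    q_new = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            error = I[i][j] - reflectance_map(p[i][j], q[i][j])
            dRdp, dRdq = reflectance_gradient(p[i][j], q[i][j])
            p_new[i][j] = local_average(p, i, j) + error * dRdp / lam
            q_new[i][j] = local_average(q, i, j) + error * dRdq / lam
    return p_new, q_new

# Sanity check on a plane: the true slopes are a fixed point of the scheme.
p0, q0, size = 0.4, -0.1, 4
I = [[reflectance_map(p0, q0)] * size for _ in range(size)]
p = [[p0] * size for _ in range(size)]
q = [[q0] * size for _ in range(size)]
p1, q1 = horn_step(I, p, q, lam=10.0)
drift = max(abs(p1[i][j] - p0) + abs(q1[i][j] - q0)
            for i in range(size) for j in range(size))
print(drift)
```

The check only confirms consistency of the update with (1.12); it says nothing about convergence from an arbitrary initialisation.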

To avoid divergence of such schemes [30], it has been proposed to directly minimise the functional in (1.11), using conjugate gradient descent [62, 115] or line search [29], but the approximate solution is typically a local minimum. A way to overcome this limitation is to use global optimisation, e.g. simulated annealing [27]. Finally, multi-resolution strategies have been used to decrease the CPU time [115].

Even if some optimisation-based methods aim to directly solve the SfS problem in the height u, as for instance [23] where a parametric model with few parameters is used, most of them first compute a normal field \(\mathbf {n}\). Once the components p and q of the normal (cf. Eq. (1.4)) have been computed, it remains to integrate them into a height map. Several methods can be used for this task, depending on the application’s requirements in terms of speed, robustness to noise in the estimated normal field and preservation of discontinuities [95]. For instance, a standard solution for the recovery of a smooth height map consists in considering the quadratic variational problem:

$$\begin{aligned} \underset{u:\,\Omega \rightarrow \mathbb {R}}{\min } \int _\Omega \bigg | \nabla u(\mathbf {x}) - \begin{bmatrix} p(\mathbf {x}) \\ q(\mathbf {x}) \end{bmatrix} \bigg |^2\,\mathrm {d}\mathbf {x}, \end{aligned}$$
(1.15)

which can be solved, e.g. using Fourier analysis [37], discrete sine or cosine transform [109] or iterative methods [7], depending upon the shape of \(\Omega \) and the boundary conditions.
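
As an example of the iterative route, the following Python sketch minimises a discrete version of (1.15) by Gauss-Seidel sweeps. The discretisation (forward differences of u compared with the average of the gradient at the two end pixels, unit spacing, no boundary condition imposed) is one possible choice assumed here, not necessarily that of the cited works; the function name is hypothetical.

```python
def integrate_gradient(p, q, sweeps=1000):
    """Gauss-Seidel minimisation of a discrete version of (1.15).

    p[i][j] and q[i][j] approximate du/dx and du/dy; index i runs along x
    and j along y (conventions assumed for this sketch).
    """
    rows, cols = len(p), len(p[0])
    u = [[0.0] * cols for _ in range(rows)]
    for _ in range(sweeps):
        for i in range(rows):
            for j in range(cols):
                total, count = 0.0, 0
                if i > 0:                 # neighbour at x - 1
                    total += u[i - 1][j] + 0.5 * (p[i - 1][j] + p[i][j])
                    count += 1
                if i < rows - 1:          # neighbour at x + 1
                    total += u[i + 1][j] - 0.5 * (p[i + 1][j] + p[i][j])
                    count += 1
                if j > 0:                 # neighbour at y - 1
                    total += u[i][j - 1] + 0.5 * (q[i][j - 1] + q[i][j])
                    count += 1
                if j < cols - 1:          # neighbour at y + 1
                    total += u[i][j + 1] - 0.5 * (q[i][j + 1] + q[i][j])
                    count += 1
                u[i][j] = total / count
    mean = sum(map(sum, u)) / (rows * cols)
    return [[value - mean for value in row] for row in u]   # fix the free constant

# Synthetic test surface u = 0.1 x^2 - 0.05 y^2 + 0.02 x y and its gradient.
size = 8
truth = [[0.1 * i * i - 0.05 * j * j + 0.02 * i * j for j in range(size)]
         for i in range(size)]
p = [[0.2 * i + 0.02 * j for j in range(size)] for i in range(size)]
q = [[-0.1 * j + 0.02 * i for j in range(size)] for i in range(size)]

u = integrate_gradient(p, q)
truth_mean = sum(map(sum, truth)) / size ** 2
error = max(abs(u[i][j] - (truth[i][j] - truth_mean))
            for i in range(size) for j in range(size))
print(error)
```

Since only the gradient is observed, the height is recovered up to an additive constant, which is fixed here by subtracting the mean.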

1.2.5 Applications of SfS

The natural application of SfS is the 3D-reconstruction of a scene from a single image. However, in real-world settings the assumptions formulated above on reflectance and lighting are too restrictive. Therefore, efforts have recently been devoted to moving beyond the assumptions of Lambertian reflectance [56, 118, 119] and controlled illumination [55, 92]. In such works, reflectance and lighting are allowed to take a more general form, yet they still must be calibrated. To remove this limitation, additional priors must be introduced, as is common in the field of intrinsic image decomposition where the reflectance is often assumed to be piecewise smooth [9]. Alternatively, deep learning techniques can be employed to simultaneously estimate shape, reflectance and lighting, provided that the object to reconstruct resembles those in the learning database [102]. In the absence of such priors, SfS can be combined with another 3D-reconstruction technique: the latter provides a coarse prior on geometry, whose details are then refined using SfS. In this view, SfS has been combined with shape-from-texture [123], structure-from-motion [38], multi-view stereopsis [61, 68, 72] or depth sensors [41, 85].

An alternative strategy to resolve the ambiguities of SfS consists in using additional images taken under varying lighting. This approach, which is called photometric stereo, will be discussed in the next section.

1.3 Photometric Stereo

The photometric stereo technique, first developed by Woodham [129], is an extension of SfS which considers several images acquired under the same viewing angle, but various lighting conditions.

1.3.1 Well-Posedness of PS

One may reasonably hope that shape inference by PS will be better-posed, in comparison with the single-image case of SfS. Indeed, 3D-shape and Lambertian reflectance can be exactly and uniquely determined from a set of three images taken under non-coplanar, uniform, calibrated directional lighting. This is easily shown by considering a system of \(m\ge 3\) image irradiance equations such as (1.2), obtained under illumination with uniform intensity \(\psi (\mathbf {x}) \equiv 1\), but varying direction characterised by vectors \(\varvec{\omega }_i,\,i \in \{1,\dots ,m\}\):

$$\begin{aligned} I_i(\mathbf{x} ) = r(\mathbf{x} ) \, \varvec{\omega }_i^\top \mathbf{n} (\mathbf{x} ), \qquad i \in \{1,\dots ,m\}. \end{aligned}$$
(1.16)

This system of equations comes down to a linear system of m equations in \(\mathbf{m} (\mathbf{x} ) := r(\mathbf{x} )\,\mathbf{n} (\mathbf{x} )\). Provided that \(m=3\) and the three illumination vectors \(\varvec{\omega }_i,\,i\in \{1,2,3\}\), are non-coplanar, there exists a unique solution \(\mathbf{m} (\mathbf{x} )\) of this system, from which the albedo can be extracted as \(r(\mathbf{x} ) = |\mathbf {m}(\mathbf {x}) |\) and the surface normal as \(\mathbf {n}(\mathbf {x}) = \frac{\mathbf {m}(\mathbf {x})}{|\mathbf {m}(\mathbf {x}) |}\). When \(m>3\), the system is over-determined and an approximate solution can be estimated, e.g. in the least-squares sense, as long as the m illumination vectors remain non-coplanar. An example of a result obtained with this approach on a banknote is presented in Fig. 1.5. It illustrates well the unique ability of PS both to estimate fine-scale geometric details, and to estimate the reflectance.
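
The per-pixel estimation can be sketched in Python as follows. The sketch solves system (1.16) in the least-squares sense through its normal equations, which reduces to the exact solution when \(m=3\); the function names, the coplanarity threshold and the synthetic data are assumptions made for illustration.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def photometric_stereo_pixel(intensities, lights):
    """Least-squares solution m = r n of system (1.16) at one pixel (m >= 3)."""
    # Normal equations A m = b with A = sum_i w_i w_i^T and b = sum_i I_i w_i.
    A = [[sum(w[r] * w[c] for w in lights) for c in range(3)] for r in range(3)]
    b = [sum(I * w[r] for I, w in zip(intensities, lights)) for r in range(3)]
    a1, a2, a3 = A
    det = dot(a1, cross(a2, a3))
    if abs(det) < 1e-12:
        raise ValueError("illumination vectors are (nearly) coplanar")
    # Columns of the inverse of a 3x3 matrix with rows a1, a2, a3 (times det).
    cols = (cross(a2, a3), cross(a3, a1), cross(a1, a2))
    m = tuple(sum(b[k] * cols[k][r] for k in range(3)) / det for r in range(3))
    albedo = math.sqrt(dot(m, m))
    normal = tuple(c / albedo for c in m)
    return albedo, normal

# Synthetic pixel: known albedo and normal, four non-coplanar unit lights.
true_albedo = 0.7
norm = math.sqrt(0.2 ** 2 + 0.1 ** 2 + 1.0)
true_normal = (0.2 / norm, -0.1 / norm, 1.0 / norm)
lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8), (-0.6, 0.0, 0.8)]
intensities = [true_albedo * dot(w, true_normal) for w in lights]

albedo, normal = photometric_stereo_pixel(intensities, lights)
print(albedo, normal)
```

With noise-free data the estimate matches the ground truth up to rounding; shadows and specularities, which violate (1.16), are not handled by this sketch.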

Fig. 1.5 Photometric stereo-based 3D-reconstruction of a 10 euro banknote. From a set of images captured under varying lighting (left), PS infers both the surface geometry (top-right, we show the RGB-coded estimated normals and the 3D-shape obtained by integration of the normals), as well as its reflectance (bottom-right, we show the estimated albedo)

PS can however be ill-posed in two particular scenarios. Firstly, when lighting is unknown (uncalibrated PS), the local estimation of surface normals is under-constrained. As in SfS, the problem must be reformulated globally, and the integrability constraint must be imposed [133]. But even then, a low-frequency ambiguity known as the generalised bas-relief ambiguity remains [12]: it is necessary to introduce additional priors, see [107] for an overview of existing uncalibrated photometric stereo approaches, and [19] for a modern solution based on deep learning. Another situation where PS is ill-posed is when only two images are considered [91]: in each pixel there exist two possible normals explaining the pair of graylevels, even with known reflectance and lighting. Again, integrability must be imposed in order to limit the ambiguities [84].
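
The two-image ambiguity can be made explicit at one pixel. Assuming unit albedo and two known, non-parallel unit lighting vectors, the Python sketch below computes both unit normals compatible with a pair of graylevels; the function names and numerical values are illustrative assumptions.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_light_normals(I1, I2, w1, w2):
    """Both unit normals n with w1^T n = I1 and w2^T n = I2 (unit albedo)."""
    g = dot(w1, w2)                      # Gram matrix [[1, g], [g, 1]] of unit lights
    det = 1.0 - g * g
    a = (I1 - g * I2) / det              # component of n in the plane of w1 and w2
    b = (I2 - g * I1) / det
    in_plane = tuple(a * x + b * y for x, y in zip(w1, w2))
    c = cross(w1, w2)                    # direction left free by the two equations
    remainder = 1.0 - dot(in_plane, in_plane)
    if remainder < 0.0:
        raise ValueError("no unit normal explains this pair of graylevels")
    t = math.sqrt(remainder / dot(c, c))
    return (tuple(x + t * y for x, y in zip(in_plane, c)),
            tuple(x - t * y for x, y in zip(in_plane, c)))

norm = math.sqrt(0.2 ** 2 + 0.1 ** 2 + 1.0)
true_normal = (0.2 / norm, -0.1 / norm, 1.0 / norm)
w1, w2 = (0.0, 0.0, 1.0), (0.6, 0.0, 0.8)
I1, I2 = dot(w1, true_normal), dot(w2, true_normal)

first, second = two_light_normals(I1, I2, w1, w2)
print(first)
print(second)
```

One candidate is the true normal and the other is its mirror image with respect to the plane spanned by the two lighting vectors; nothing at the pixel level distinguishes them, hence the need for the integrability constraint.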

1.3.2 Numerical Solving of PS

In the previous subsection, we described a simple strategy to estimate the surface normals by photometric stereo. The knowledge of surface normals is however not sufficient to fully characterise the geometry of the pictured scene. To obtain a complete 3D-representation, the normals must then be integrated into a height map. We have already discussed this integration problem in Sect. 1.2.4, and we refer the reader to [95] for a comprehensive overview.

With this pipeline, one first estimates the surface normals, and then integrates them into a height map. This strategy is however suboptimal, since any error in the normal estimation step will propagate to the subsequent normal integration step. An alternative strategy is to reformulate (1.16) as a system of partial differential equations in the unknown height map u, and directly estimate u. For instance, one may consider the ratio of two equations such as (1.16), for \(i\ne j\), while replacing the surface normal \(\mathbf{n} (\mathbf{x} )\) by its definition (1.3). This yields the following PDE:

$$\begin{aligned} \left[ I_i(\mathbf{x} ) \, \varvec{\omega }_j - I_j(\mathbf{x} ) \, \varvec{\omega }_i \right] ^\top \begin{bmatrix} -\nabla u(\mathbf{x} ) \\ 1 \end{bmatrix} = 0, \qquad \forall \mathbf{x} \in \Omega , \end{aligned}$$
(1.17)

which is linear in \(\nabla u\), and is independent of the reflectance r. It can be solved, e.g. using a finite difference upwind scheme or semi-Lagrangian methods [69]. When more than a single pair of images is considered, the joint approximate solving of the system of equations such as (1.17), obtained for every pair \(\{i,j\}\), can be formulated as a variational problem:

$$\begin{aligned} \underset{u:\,\Omega \rightarrow \mathbb {R}}{\min } \, \underset{i<j}{\sum \sum } \int _\Omega \Bigg | \left[ I_i(\mathbf{x} ) \, \varvec{\omega }_j - I_j(\mathbf{x} ) \, \varvec{\omega }_i \right] ^\top \begin{bmatrix} -\nabla u(\mathbf{x} ) \\ 1 \end{bmatrix} \Bigg |^2\,\mathrm {d}\mathbf {x}. \end{aligned}$$
(1.18)

Such an approach, initially proposed in [90, 110], also easily extends to more elaborate camera or reflectance models [71].
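
The two key properties of (1.17), linearity in the gradient and independence of the reflectance, can be illustrated at a single pixel. The Python sketch below treats p and q as free pointwise unknowns and solves the pairwise equations in the least-squares sense; it is not a solver for the PDE or for the variational problem (1.18). Function names and numerical values are assumptions.

```python
import math

def gradient_from_ratios(intensities, lights):
    """Least-squares (p, q) from the pairwise equations (1.17) at one pixel."""
    # Each pair (i, j) gives a1 * p + a2 * q = a3, with a = I_i w_j - I_j w_i.
    s11 = s12 = s22 = t1 = t2 = 0.0
    count = len(lights)
    for i in range(count):
        for j in range(i + 1, count):
            a = tuple(intensities[i] * wj - intensities[j] * wi
                      for wi, wj in zip(lights[i], lights[j]))
            s11 += a[0] * a[0]
            s12 += a[0] * a[1]
            s22 += a[1] * a[1]
            t1 += a[0] * a[2]
            t2 += a[1] * a[2]
    det = s11 * s22 - s12 * s12          # 2x2 normal equations
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

def render(albedo, p, q, lights):
    """Graylevels predicted by Eq. (1.16), with the normal given by Eq. (1.3)."""
    s = math.sqrt(1.0 + p * p + q * q)
    n = (-p / s, -q / s, 1.0 / s)
    return [albedo * sum(w * c for w, c in zip(light, n)) for light in lights]

lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]
p_true, q_true = 0.3, -0.2
estimates = [gradient_from_ratios(render(albedo, p_true, q_true, lights), lights)
             for albedo in (0.4, 0.9)]
print(estimates)
```

Both albedo values lead to the same estimate of the gradient, which is the reason why the reflectance cannot be recovered from the ratios alone.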

Nevertheless, this ratio-based approach does not provide the reflectance, unlike the simple pipeline presented in the previous subsection. Moreover, solving the linearised partial differential equations (1.17) is not equivalent to solving the original Eqs. (1.16): for instance, Gaussian noise on the images turns into Cauchy noise on the ratios, making least-squares inference suboptimal. Thus, the joint recovery of height and reflectance by variational inversion of the image irradiance Eqs. (1.16) has also been explored. For example, plugging the definition (1.3) of \(\mathbf {n}(\mathbf {x})\) into (1.16), the joint estimation of height and reflectance in a least-squares sense leads to

$$\begin{aligned} \underset{u,r:\,\Omega \rightarrow \mathbb {R}}{\min } \sum _{i=1}^m \int _\Omega \Bigg | I_i(\mathbf{x} ) - r(\mathbf{x} ) \, \varvec{\omega }_i^\top \begin{bmatrix} -\nabla u(\mathbf{x} ) / \sqrt{1 + |\nabla u(\mathbf{x} ) |^2} \\ 1 / \sqrt{1 + |\nabla u(\mathbf{x} ) |^2} \end{bmatrix} \Bigg |^2\,\mathrm {d}\mathbf {x}, \end{aligned}$$
(1.19)

which can be solved, e.g. using alternating reweighted least-squares [93].

1.3.3 PS with Non-trivial Reflectance or Lighting

The surface has been assumed Lambertian in our models, and lighting has been assumed directional, but those assumptions are difficult to satisfy in real-world scenarios. An important feature of PS, in comparison with SfS, is that the redundancy provided by the multiple images enables relaxing such assumptions. Indeed, shadows or off-Lambertian effects such as specularities can be coped with by solving PS in a robust manner, for instance, by resorting to sparse regression which treats such effects as outliers to the Lambertian model [52, 93]. Other ways to deal with off-Lambertian effects include inverting a reflectance model which is more sophisticated than Lambert’s [71, 106] or pre-processing the images according to a low-rank prior [131]. Let us also mention data-driven methods, which either compare the intensity variations with those observed on a reference object with known shape [46] or resort to a deep neural network trained on a large dataset [98].
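As an illustration of the robust strategy, the sketch below estimates the vector \(\mathbf {m} = r\, \mathbf {n}\) at one pixel by minimising the sum of absolute residuals with iteratively reweighted least squares, so that an observation which violates Lambert's law is progressively down-weighted. It is a simple stand-in for the sparse-regression methods of [52, 93], not a reimplementation of them. The function name, the number of iterations and the synthetic data are assumptions.

```python
import numpy as np

def robust_ps_pixel(intensities, omegas, n_iter=50, eps=1e-8):
    """L1 fit of m = r * n in I_i = omega_i^T m, by iteratively reweighted least squares."""
    m, *_ = np.linalg.lstsq(omegas, intensities, rcond=None)  # plain least-squares start
    for _ in range(n_iter):
        residuals = omegas @ m - intensities
        # Square roots of the L1 weights 1 / |residual| (eps avoids a division by zero).
        w = 1.0 / np.sqrt(np.maximum(np.abs(residuals), eps))
        m, *_ = np.linalg.lstsq(w[:, None] * omegas, w * intensities, rcond=None)
    return m

# Eight light directions on a cone around the viewing axis (synthetic setup).
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
omegas = np.stack(
    [0.5 * np.cos(angles), 0.5 * np.sin(angles), np.full(8, np.sqrt(0.75))], axis=1
)
m_true = 0.8 * np.array([0.2, -0.1, 1.0]) / np.linalg.norm([0.2, -0.1, 1.0])
intensities = omegas @ m_true
intensities[3] += 0.5  # one specular highlight, i.e. an outlier to Lambert's model

m_ls, *_ = np.linalg.lstsq(omegas, intensities, rcond=None)
m_robust = robust_ps_pixel(intensities, omegas)
err_ls = np.linalg.norm(m_ls - m_true)
err_robust = np.linalg.norm(m_robust - m_true)
```

On this example the plain least-squares estimate is visibly biased by the highlight, whereas the reweighted estimate stays close to the true vector. The reflectance and the normal then follow as the norm and the direction of the estimated vector.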

Another direction of research on PS is the study of more realistic lighting models, in order to simplify the acquisition of data. For instance, some methods have been developed to handle images acquired under nearby point light illumination [70], which finds a natural application in LED-based photometric stereo [94]. This makes it possible to build a simple acquisition setup based on cheap hardware. Extended light sources have also been considered, which makes it possible, for instance, to use an LCD screen as light source [21]. Finally, other approaches have considered the case of natural illumination in order to bring PS outdoors [11], and numerical solving methods based on variational principles [42] or deep learning [47] have recently been suggested.

1.3.4 Combining PS and Other 3D-Reconstruction Methods

A criticism which is frequently formulated against PS is that it excels at recovering high-frequency geometric details, yet is prone to a low-frequency bias which may distort the global geometry [82]. In fact, such a bias usually comes from a mismatch between the assumptions behind the image formation model and the actual experimental conditions, e.g. assuming a directional light source instead of a nearby point light one. Therefore, the methods discussed in the previous subsection provide a remedy to such a bias.

On the other hand, it is sometimes simpler from a practical perspective to stick to the simplest assumptions, and rather remove the low-frequency bias by coupling PS with another 3D-reconstruction method such as shape-from-silhouette [122], multi-view stereopsis [63] or depth sensing [87]. In such works, PS provides the fine-scale geometric details, which are combined with the gross geometry provided by the alternative technique.

Another interesting application of PS is 3D-reconstruction from a single shot, which can be achieved by using a multichannel camera coupled with monochromatic coloured light sources which are simultaneously turned on: each channel can then be viewed as a graylevel image obtained under a single light source. This idea, which dates back to the nineties [60], has more recently been applied to the real-time 3D-reconstruction of deformable surfaces by combining PS with optical flow [44] or scene flow [40].

1.3.5 Applications of PS

The ability of PS to estimate both the fine-scale geometric details and the reflectance of the surface has proven useful in many applications. Here, we briefly highlight a few of them. For instance, PS can be used to infer 3D-models for augmented reality, which can be very helpful for computer-aided surgery using laparoscopy [22]. Another medical application of PS is the characterisation of the shape and color of melanoma, as proposed in [114]. Besides medical applications, PS has been extensively used in the field of quality control, e.g. for the inspection of defects on metallic surfaces [113]. Also, let us mention Reflectance Transformation Imaging (RTI) techniques, which are based on PS principles, and allow one to interactively relight the pictured surfaces. Such an approach finds a natural application in the field of cultural heritage, see the recent survey [88] for an overview. Finally, Chap. 7 in the present volume addresses a novel application, which is the estimation of facial aging.

1.4 Shape-from-Polarisation

Another problem belonging to the shape-from-X class is shape-from-polarisation. The goal is the same, i.e. to recover the 3D-shape of the object, but the input data are different: they consist of polarisation information.

1.4.1 Description and Generation of a Polarisation Image

When unpolarised light is reflected by a surface, it becomes partially polarised [128]. This applies to both specular reflection [96] and diffuse reflection [4], the latter being caused by subsurface scattering. Using a linear polarising filter placed in front of a camera, a sequence of \(m\ge 3\) images (cf. Fig. 1.6a) is captured by rotating the filter to different polariser angles \(\vartheta _j\), \(j\in \left\{ 1,\dots ,m\right\} \). The measured brightness at each pixel \(\mathbf{x} \) varies according to the transmitted radiance sinusoid

$$\begin{aligned} i_{\vartheta _j}(\mathbf{x} )=\frac{I_{\text {max}}(\mathbf{x} )+I_{\text {min}}(\mathbf{x} )}{2} +\frac{I_{\text {max}}(\mathbf{x} )-I_{\text {min}}(\mathbf{x} )}{2}\cos [2\vartheta _j-2\phi (\mathbf{x} )], \end{aligned}$$
(1.20)

where \(\phi (\mathbf{x} )\) is the phase angle, \(I_{\text {max}}\) the maximum measured pixel brightness and \(I_{\text {min}}\) the minimum one.

Fig. 1.6 a Polarimetric capture, and b–d decomposition into polarisation images, from captured data of a piece of fruit. Pictures taken and adapted from [112]

A polarisation image (cf. Fig. 1.6b–d), i.e. the full set of polarisation data for a given object or scene, can be obtained by decomposing the sinusoid at every pixel into three separate components [127]: the phase angle, \(\phi (\mathbf{x} )\), the unpolarised intensity, \(i_{\text {un}}(\mathbf{x} )\) and the degree of polarisation, \(\rho (\mathbf{x} )\), where

$$\begin{aligned} i_{\text {un}}(\mathbf{x} )=\frac{I_{\text {max}}(\mathbf{x} )+I_{\text {min}}(\mathbf{x} )}{2}\ \quad \text { and }\ \quad \rho (\mathbf{x} ) = \frac{I_{\text {max}}(\mathbf{x} )-I_{\text {min}}(\mathbf{x} )}{I_{\text {max}}(\mathbf{x} ) +I_{\text {min}}(\mathbf{x} )}. \end{aligned}$$
(1.21)

The phase angle \(\phi (\mathbf{x} )\) is directly related to the angle of the linearly polarised component of the reflected light and can be defined as the angle of maximum or minimum transmission. Since polarisers cannot distinguish between two angles separated by \(\pi \) radians, the range of initially acquired phase measurements is \([0,\pi )\). Therefore, there is a \(\pi \) ambiguity, since two maxima in pixel brightness are found as the polariser is rotated through \(2\pi \). The unpolarised image \(i_{\text {un}}(\mathbf{x} )\) is simply the image that would be obtained using a standard camera. The degree of polarisation \(\rho (\mathbf{x} )\) can be defined in terms of refractive index and zenith angle of the surface normal [128], but the explicit formula is different depending on the polarisation model used, as we will see in Sect. 1.4.2 below.

These quantities can be estimated from the captured image sequence using different methods, e.g. the Levenberg–Marquardt nonlinear curve fitting algorithm [4], linear methods [50] or following the procedure suggested by Wolff in [127] for the specific case of \(m=3\), \(\vartheta \in \{ 0, \frac{\pi }{4}, \frac{\pi }{2}\}\).
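As a sketch of the linear route, one may expand the cosine in (1.20) as \(i_{\vartheta _j} = a_0 + a_1\cos 2\vartheta _j + a_2\sin 2\vartheta _j\), with \(a_0 = i_{\text {un}}\), \(a_1 = i_{\text {un}}\,\rho \cos 2\phi \) and \(a_2 = i_{\text {un}}\,\rho \sin 2\phi \), and solve for the three coefficients at every pixel by linear least squares. The Python code below follows this idea; it is a minimal illustration, not the specific algorithm of [50], and the function name and synthetic data are assumptions.

```python
import numpy as np

def fit_polarisation_image(images, polariser_angles):
    """Fit the sinusoid (1.20) at every pixel.

    images: (m, H, W) stack; polariser_angles: (m,) array in radians, m >= 3.
    Returns (i_un, rho, phi), each of shape (H, W), with phi in [0, pi).
    """
    m, height, width = images.shape
    design = np.stack(
        [np.ones(m), np.cos(2 * polariser_angles), np.sin(2 * polariser_angles)], axis=1
    )
    coeffs, *_ = np.linalg.lstsq(design, images.reshape(m, -1), rcond=None)
    a0, a1, a2 = coeffs
    i_un = a0
    rho = np.sqrt(a1**2 + a2**2) / a0                # degree of polarisation, cf. (1.21)
    phi = np.mod(0.5 * np.arctan2(a2, a1), np.pi)    # phase angle, defined modulo pi
    return i_un.reshape(height, width), rho.reshape(height, width), phi.reshape(height, width)

# Synthetic 2 x 2 polarisation image observed under four polariser angles.
polariser_angles = np.array([0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4])
i_un_true = np.array([[0.6, 0.8], [0.5, 0.9]])
rho_true = np.array([[0.3, 0.1], [0.2, 0.05]])
phi_true = np.array([[1.0, 2.5], [0.2, 3.0]])
images = i_un_true * (1 + rho_true * np.cos(2 * polariser_angles[:, None, None] - 2 * phi_true))

i_un, rho, phi = fit_polarisation_image(images, polariser_angles)
```

With more than three polariser angles the system is overdetermined, so the same code averages out part of the measurement noise.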

1.4.2 Diffuse and Specular Polarisation Models

A polarisation image provides information on the azimuth and zenith angles of the normal, and, hence, a constraint on the surface normal direction at each pixel. The exact nature of the constraint depends on the polarisation model used.

Using a diffuse polarisation model, the phase angle \(\phi (\mathbf{x} )\) is the polariser angle \(\vartheta _j\) at which \(I_{\text {max}}\) is observed. It determines the azimuth angle \(\alpha (\mathbf{x} )\in [0,2\pi )\) of the surface normal up to a \(\pi \) ambiguity: \(\alpha (\mathbf{x} ) = \phi (\mathbf{x} )\ \text {or}\ \phi (\mathbf{x} ) + \pi \). The degree of polarisation \(\rho _d(\mathbf{x} )\), on the other hand, is related to the zenith angle \(\theta (\mathbf{x} )\in [0,\frac{\pi }{2}]\) of the normal in viewer-centered coordinates (i.e. the angle between the normal and viewer) as follows:

$$\begin{aligned} \rho _d(\mathbf{x} ) = \frac{\sin ^2\theta (\mathbf{x} )\, \big (\eta - \frac{1}{\eta }\big )^2}{4 \cos \theta (\mathbf{x} )\, \sqrt{\eta ^2 - \sin ^2 \theta (\mathbf{x} )} - \sin ^2 \theta (\mathbf{x} ) \, \big (\eta + \frac{1}{\eta }\big )^2 + 2\, \eta ^2 + 2}, \end{aligned}$$
(1.22)

where \(\eta \) is the refractive index (in general, \(\eta \) is unknown, but for most dielectrics typical values range between 1.4 and 1.6, hence an accurate estimate of geometry can be obtained without a precise estimate of \(\eta \) [4]).

Instead, using a specular polarisation model, the azimuth angle of the surface normal is perpendicular to the phase of the specular polarisation [97] leading to a \(\frac{\pi }{2}\) shift, so that the azimuth angle corresponds to polariser angle \(\vartheta _j\) at which \(I_{\text {min}}\) is observed: \(\alpha (\mathbf{x} ) = \phi (\mathbf{x} ) \pm \frac{\pi }{2}\). Regarding the degree of polarisation \(\rho _s(\mathbf{x} )\), it relates to the zenith angle according to

$$\begin{aligned} \rho _s(\mathbf{x} ) = \frac{2 \sin ^2 \theta (\mathbf{x} ) \, \cos \theta (\mathbf{x} ) \, \sqrt{\eta ^2-\sin ^2 \theta (\mathbf{x} )}}{\eta ^2-\sin ^2 \theta (\mathbf{x} )-\eta ^2 \sin ^2 \theta (\mathbf{x} )+2 \sin ^4 \theta (\mathbf{x} )}, \end{aligned}$$
(1.23)

and in that case the dependency of the degree of polarisation \(\rho _s\) on \(\eta \) is weaker than in the diffuse case.
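The two models (1.22) and (1.23) are straightforward to evaluate numerically. The following Python sketch does so for a refractive index of 1.5 (an assumed value within the range quoted above) and checks two closed-form values which follow from the formulas: \(\rho _d = (\eta ^2-1)/(\eta ^2+1)\) at grazing angle \(\theta = \frac{\pi }{2}\), and \(\rho _s = 1\) at the Brewster angle \(\theta = \arctan \eta \). Function names are illustrative.

```python
import numpy as np

def rho_diffuse(theta, eta):
    """Degree of diffuse polarisation, Eq. (1.22)."""
    s2 = np.sin(theta) ** 2
    num = s2 * (eta - 1.0 / eta) ** 2
    den = (4.0 * np.cos(theta) * np.sqrt(eta**2 - s2)
           - s2 * (eta + 1.0 / eta) ** 2 + 2.0 * eta**2 + 2.0)
    return num / den

def rho_specular(theta, eta):
    """Degree of specular polarisation, Eq. (1.23)."""
    s2 = np.sin(theta) ** 2
    num = 2.0 * s2 * np.cos(theta) * np.sqrt(eta**2 - s2)
    den = eta**2 - s2 - eta**2 * s2 + 2.0 * s2**2
    return num / den

eta = 1.5  # assumed refractive index of a dielectric
theta = np.linspace(0.0, np.pi / 2, 91)
rho_d = rho_diffuse(theta, eta)    # curve of the diffuse model over [0, pi/2]
rho_s = rho_specular(theta, eta)   # curve of the specular model over [0, pi/2]

rho_d_grazing = rho_diffuse(np.pi / 2, eta)
rho_s_brewster = rho_specular(np.arctan(eta), eta)
```

Since the specular curve rises to 1 at the Brewster angle and returns to 0 at grazing angle, a given value of \(\rho _s\) is generally compatible with two zenith angles.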

1.4.3 3D-Shape Recovery Using Polarisation Information

The phase angle \(\phi (\mathbf{x} )\) (cf. Fig. 1.6b) and the degree of polarisation \(\rho (\mathbf{x} )\) (cf. Fig. 1.6d) of reflected light convey information about the surface orientation through information on zenith and azimuth angles and, therefore, provide a cue for 3D-shape recovery.

The SfP cue has several attractive properties: it requires only a single viewpoint and a single illumination condition, it is invariant to illumination direction and surface albedo, and it provides information about both the zenith and azimuth angles of the surface normal. Unfortunately, polarisation information alone restricts the surface normal at each pixel to two possible directions, and therefore provides only ambiguous estimates of the surface orientation.

SfP methods can be categorised into three groups:

  1. Methods which use only polarisation information (cf. Sect. 1.4.3.1). They are passive since, typically, a polarisation image is obtained by capturing a sequence of images in which a linear polarising filter is rotated in front of the camera (possibly with unknown rotation angles [100]). These methods can be considered “single shot” methods by using custom CCD cameras configured for polarisation imaging (Footnote 1) or by mounting the polarisation filter on a CMOS sensor in order to acquire polarisation information in real time (Footnote 2).

  2. Methods which combine polarisation with shading cues (cf. Sect. 1.4.3.2).

  3. Methods which combine a polarisation image with an additional cue (cf. Sect. 1.4.3.3) such as stereo, multispectral measurements, an RGBD sensor or active polarised illumination.

SfP methods can also be classified according to the polarisation model (dielectric versus metal, diffuse [4, 50, 77], specular [79] or hybrid models [116]) and whether they compute shape in the surface normal or surface height domain.

1.4.3.1 Resolution Using Only Polarisation Information

The earliest work focused on the capture, decomposition and visualisation of polarisation images was by Wolff in the nineties [127], although earlier work on shape recovery from polarisation information dates back to 1962 [108]. Both Atkinson and Hancock [4] and Miyazaki et al. [77] disambiguated the polarisation normals via propagation from the boundary under an assumption of global convexity. Huynh et al. [50] also disambiguated polarisation normals with a global convexity assumption, estimating the refractive index in addition. These works used a diffuse polarisation model whereas Morel et al. [79] used a specular polarisation model for metals. Recently, Taamazyan et al. [116] introduced a mixed diffuse/specular polarisation model. All of these methods estimate surface normals which must then be integrated into a height map. Moreover, since they rely entirely on the weak shape cue provided by polarisation and do not enforce integrability, the results are extremely sensitive to noise.

1.4.3.2 Polarisation and Shading Cues

A polarisation image also contains an unpolarised intensity channel (cf. Fig. 1.6c), which provides a shading cue. Mahmoud et al. [67] used a shape-from-shading cue assuming known light source direction, known albedo and Lambertian reflectance, in order to disambiguate the polarisation normals. Atkinson and Hancock [6] used calibrated, three-source Lambertian photometric stereo for disambiguation, while avoiding the assumption of known albedo. Smith et al. [111] showed how to express polarisation and shading constraints directly in terms of surface height, leading to a robust and efficient linear least-squares solution. They also showed how to estimate the illumination, up to a binary ambiguity, making the method uncalibrated. However, they require known or uniform albedo. This requirement was afterwards relaxed in [112], where spatially varying albedo was estimated from a single polarisation image, assuming known illumination and strong smoothness assumptions. In [120], variants of the aforementioned method were developed by introducing additional constraints which arise when a second light source is considered, which makes it possible to relax the uniform albedo assumption even under unknown lighting. In this work, albedo-invariant or phase-invariant formulations were proposed. Another differential approach has been proposed in [65], where the geometry of the object is described through its level-sets for both diffuse and specular reflections. Ngo et al. [83] derived constraints which allowed surface normals, light directions and refractive index to be estimated from polarisation images under varying lighting. However, this approach requires at least four light sources.

1.4.3.3 Combining Polarisation with Other Cues

In order to solve the ambiguities generated by models using only polarisation information, several attempts have been made to combine SfP with other cues. In addition to photometric cues (from SfS or PS), auxiliary geometric information can be considered. Stereo cues have been combined with polarisation to obtain surface orientation information since the nineties [126]. Rahmann et al. [96] proposed to reconstruct specular surfaces taking polarisation images from multiple views. The reconstruction problem is solved by an optimisation scheme where the surface geometry is modelled by a set of hierarchical basis functions. Atkinson et al. [3, 5] refined estimates of the surface normal to establish correspondences between two views of an object, extracting surface patches from each view. Multi-View Stereo (MVS) and polarisation have also been adopted for transparent and specular objects [73, 76], and a polarimetric MVS method applied to objects with mixed polarisation models is proposed in [28]. In contrast to this last method, which is offline and requires manual preparation, Yang et al. proposed in [132] a fully automatic approach to produce a height map in real time using two views. More than two views have been used in [20]. Space carving [75, 76] or RGBD sensors [57, 58] have been employed to obtain an initial 3D-shape, from which the ambiguities in SfP are resolved. Zhu et al. [135] used polarisation and an RGBD stereo pair to disambiguate the polarisation surface normal estimates using a higher order graphical model. Cameras with multiple spectral bands [51, 74] could be useful for disambiguating and estimating the refractive index of the surface.

1.4.4 An Example of Numerical Resolution for Shape Recovery

In this section, we want to give an example of numerical resolution of the SfP problem, either by following a non-differential approach, which considers as unknowns the partial derivatives p and q as defined in Eq. (1.4), or by solving a linear differential system directly in the height u.

We assume orthographic projection and directional illumination. We consider only the diffuse polarisation model, hence, the degree of polarisation is defined as in (1.22), and the object we want to recover is composed of dielectric (i.e. non-metallic) materials. Moreover, the refractive index \(\eta \) is supposed to be a known constant, and interreflections are neglected. In order to estimate the phase angle \(\phi (\mathbf{x} )\) and the degree of diffuse polarisation \(\rho _d(\mathbf{x} )\) at each point, we fit the data to the transmitted radiance sinusoid (1.20) following one of the aforementioned methods, e.g. the idea by Wolff [127]. The zenith angle \(\theta (\mathbf{x} )\) of the surface normal can be obtained by inverting Eq. (1.22), which yields

$$\begin{aligned}&\cos \theta (\mathbf{x}) = \mathbf {n}(\mathbf{x})\cdot \mathbf {v} = f(\rho (\mathbf{x}),\eta ) = \\&\sqrt{\frac{2\, \rho + 2\, \eta ^2\, \rho - 2\, \eta ^2 + \eta ^4 + {\rho }^2 + 4\, \eta ^2\, {\rho }^2 - \eta ^4\, {\rho }^2 - 4\, \eta ^3\, \rho \, \sqrt{- \left( \rho - 1\right) \, \left( \rho + 1\right) } + 1}{\eta ^4\, {\rho }^2 + 2\, \eta ^4\, \rho + \eta ^4 + 6\, \eta ^2\, {\rho }^2 + 4\, \eta ^2\, \rho - 2\, \eta ^2 + {\rho }^2 + 2\, \rho + 1}}, \nonumber \end{aligned}$$
(1.24)

where we have denoted \(\rho _d\) simply by \(\rho \) and we have dropped the dependency of \(\rho \) on \(\mathbf{x}\) for readability. The normal vector defined in (1.3) can be written in terms of azimuth and zenith angles as

$$\begin{aligned} \mathbf {n}(\mathbf{x}) = \begin{bmatrix} \cos \alpha (\mathbf{x})\sin \theta (\mathbf{x}) \\ \sin \alpha (\mathbf{x})\sin \theta (\mathbf{x}) \\ \cos \theta (\mathbf{x}) \\ \end{bmatrix}. \end{aligned}$$
(1.25)

Remembering that the phase angle \(\phi (\mathbf{x} )\) determines the azimuth angle \(\alpha (\mathbf{x} )\) of the normal up to a \(\pi \) ambiguity (\(\alpha (\mathbf{x} ) = \phi (\mathbf{x} )\ \text {or}\ \alpha (\mathbf{x} ) = \phi (\mathbf{x} ) + \pi \)), the normal vector can be estimated up to an ambiguity. Several attempts have been made to disambiguate the azimuth angle, as explained in Sect. 1.4.3.1. Once the surface normal has been estimated, the height, which is the actual unknown of the problem, can be recovered by integration. Again, we refer the interested reader to Sect. 1.2.4 and the survey [95] for some discussion on the integration problem.
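The non-differential route can be sketched in a few lines of Python: the zenith angle is obtained from the closed form (1.24), and the two candidate normals follow from (1.25) with \(\alpha = \phi \) and \(\alpha = \phi + \pi \). The degree of polarisation used in the example is simulated with (1.22) so that the round trip can be checked. Function names and numerical values are assumptions, and the disambiguation step itself is not addressed.

```python
import numpy as np

def cos_zenith_from_rho(rho, eta):
    """Closed-form inversion (1.24) of the diffuse polarisation model (1.22)."""
    root = np.sqrt(-(rho - 1.0) * (rho + 1.0))
    num = (2*rho + 2*eta**2*rho - 2*eta**2 + eta**4 + rho**2
           + 4*eta**2*rho**2 - eta**4*rho**2 - 4*eta**3*rho*root + 1)
    den = (eta**4*rho**2 + 2*eta**4*rho + eta**4 + 6*eta**2*rho**2
           + 4*eta**2*rho - 2*eta**2 + rho**2 + 2*rho + 1)
    # Clipping only guards against round-off pushing the ratio outside [0, 1].
    return np.sqrt(np.clip(num / den, 0.0, 1.0))

def candidate_normals(phi, rho, eta):
    """The two normals (1.25) compatible with a phase angle and a degree of polarisation."""
    cos_t = cos_zenith_from_rho(rho, eta)
    sin_t = np.sqrt(1.0 - cos_t**2)
    return [np.array([np.cos(a) * sin_t, np.sin(a) * sin_t, cos_t])
            for a in (phi, phi + np.pi)]

# Simulate rho with the forward model (1.22) for a known zenith angle.
eta, theta_true, phi = 1.5, 0.6, 0.4
s2 = np.sin(theta_true) ** 2
rho = s2 * (eta - 1/eta)**2 / (4*np.cos(theta_true)*np.sqrt(eta**2 - s2)
                               - s2*(eta + 1/eta)**2 + 2*eta**2 + 2)

n_a, n_b = candidate_normals(phi, rho, eta)
theta_est = np.arccos(n_a[2])
```

The two candidates share the same third component and have opposite projections onto the image plane, which is precisely the ambiguity that the methods of Sect. 1.4.3.1 try to resolve.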

As an alternative, we can solve the problem directly in the unknown height following a differential approach, starting again from a single polarisation image, but using also the unpolarised intensity quantity, which is the image obtained using a standard camera for the SfS problem. For example, let us assume Lambertian reflectance, known illumination and uniform albedo that is factored into the light source vector \(\varvec{\omega }\). The shading constraint coming from the unpolarised intensity channel of a polarisation image reads as (cf. Eq. (1.5)):

$$\begin{aligned} i_{\text {un}}(\mathbf{x}) = \frac{- \omega _1 \, p(\mathbf{x}) - \omega _2 \, q(\mathbf{x}) + \omega _3}{\sqrt{1+p(\mathbf{x})^2+q(\mathbf{x})^2}}. \end{aligned}$$
(1.26)

Since we are working in a viewer-centered coordinate system, with the viewer \(\mathbf {v} = \left[ 0,0,1\right] ^\top \), Eq. (1.24) simplifies to \(n_3(\mathbf{x}) = f(\rho (\mathbf{x}),\eta )\), which can be expressed in terms of the surface gradient as

$$\begin{aligned} f(\rho (\mathbf{x}),\eta ) = \frac{1}{\sqrt{1+p(\mathbf{x})^2+q(\mathbf{x})^2}}. \end{aligned}$$
(1.27)

Now, by using the image ratio technique commonly applied also in PS-SfS problems [119], taking a ratio between (1.26) and (1.27), the nonlinear normalisation factor vanishes, yielding the following linear equation in the surface gradient:

$$\begin{aligned} \frac{i_{\text {un}}(\mathbf{x})}{f(\rho (\mathbf{x}),\eta )} = - \omega _1 \, p(\mathbf{x}) - \omega _2 \, q(\mathbf{x}) + \omega _3. \end{aligned}$$
(1.28)

Instead of disambiguating the polarisation normals at each pixel locally, as illustrated before following a non-differential approach, here we express the azimuth ambiguity as a collinearity condition which is satisfied by either of the two possible azimuth angles. In this way, we postpone resolution of the ambiguity until surface height is computed, solving the azimuthal ambiguities in a globally optimal way.

More in detail, for the diffuse case we require that the projection of the surface normal into the image plane Oxy, \([n_1(\mathbf{x}),n_2(\mathbf{x})]^\top \), is collinear with a vector pointing in the phase angle direction, \([\sin \phi (\mathbf{x}),\cos \phi (\mathbf{x})]^\top \). This requirement translates into the following condition:

$$\begin{aligned} \mathbf {n}(\mathbf{x})^\top [\cos \phi (\mathbf{x}),-\sin \phi (\mathbf{x}),0]^\top = 0. \end{aligned}$$
(1.29)

By rewriting \(\mathbf {n}(\mathbf{x})\) in terms of the surface gradient, noting that the nonlinear normalisation term is always non-null, we obtain from Eq. (1.29) a second linear equation in the surface gradient:

$$\begin{aligned} - p(\mathbf{x})\cos \phi (\mathbf{x}) + q(\mathbf{x})\sin \phi (\mathbf{x}) = 0. \end{aligned}$$
(1.30)

At this point, after approximating the surface gradient, e.g. by using finite differences, we arrive at a linear system of equations in terms of the unknown surface height, which can be solved using linear least-squares. For stability reasons, priors on convexity and smoothness can be added to the linear system. For more information on this idea and for details on the implementation, we refer the interested reader to [111, 112].
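A minimal Python sketch of this construction is given below, under several simplifying assumptions: a small grid with unit step, dense matrices, forward differences (backward on the last row and column), no convexity or smoothness prior, and a quantity \(f(\rho ,\eta )\) simulated directly from the ground-truth gradient rather than computed from (1.24). The phase is generated so as to satisfy (1.30). It illustrates how (1.28) and (1.30) are stacked, and is not the implementation of [111, 112]; all names are hypothetical.

```python
import numpy as np

def difference_matrices(height, width):
    """Dense finite-difference matrices: forward differences, backward on the last row/column."""
    n = height * width
    Dx, Dy = np.zeros((n, n)), np.zeros((n, n))
    for i in range(height):
        for j in range(width):
            k = i * width + j
            sx = 1.0 if j < width - 1 else -1.0
            sy = 1.0 if i < height - 1 else -1.0
            Dx[k, k + int(sx)], Dx[k, k] = sx, -sx
            Dy[k, k + int(sy) * width], Dy[k, k] = sy, -sy
    return Dx, Dy

def height_from_polarisation(i_un, f, phi, omega):
    """Stack Eqs. (1.28) and (1.30) at every pixel and solve for u by least squares."""
    height, width = i_un.shape
    Dx, Dy = difference_matrices(height, width)
    # Eq. (1.28): -omega_1 p - omega_2 q = i_un / f - omega_3
    A_shading = -omega[0] * Dx - omega[1] * Dy
    b_shading = (i_un / f).ravel() - omega[2]
    # Eq. (1.30): -p cos(phi) + q sin(phi) = 0
    A_phase = -np.cos(phi).ravel()[:, None] * Dx + np.sin(phi).ravel()[:, None] * Dy
    A = np.vstack([A_shading, A_phase])
    b = np.concatenate([b_shading, np.zeros(height * width)])
    # u is defined up to an additive constant; lstsq returns the minimum-norm solution.
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return u.reshape(height, width)

# Synthetic test surface and the corresponding noise-free data.
height, width = 8, 8
y, x = np.mgrid[0:height, 0:width].astype(float)
u_true = 0.4 * x + 0.2 * y + 0.05 * x**2
Dx, Dy = difference_matrices(height, width)
p, q = Dx @ u_true.ravel(), Dy @ u_true.ravel()
omega = np.array([0.2, 0.1, 1.0])  # lighting vector with the albedo factored in
f = (1.0 / np.sqrt(1.0 + p**2 + q**2)).reshape(height, width)                  # cf. (1.27)
i_un = (-omega[0] * p - omega[1] * q + omega[2]).reshape(height, width) * f    # cf. (1.26)
phi = np.mod(np.arctan2(p, q), np.pi).reshape(height, width)                   # satisfies (1.30)

u_est = height_from_polarisation(i_un, f, phi, omega)
```

On this noise-free example the estimated height matches the true one up to an additive constant. For images of realistic size, sparse matrices and a sparse least-squares solver would be the natural choice.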

1.4.5 Applications

The polarisation state of light reflected by a surface provides a cue on the material properties of the surface and, via a relationship with surface orientation, the 3D-shape. Polarisation has been used for several applications since the nineties, including early work on material segmentation [128] and diffuse/specular reflectance separation [80]. In recent years, there has been an increasing interest in using polarisation information for 3D-shape estimation [57, 83, 111, 116]. Notable applications include polarised laparoscopy [45] and, more generally, biomedical applications [121]. Beyond 3D-reconstruction, polarisation has recently been used for other tasks, for example image segmentation [104], robot dynamic navigation [13, 14], image enhancement [99, 100] and reflection separation by a deep learning approach [66], which simplifies previous works requiring three images from different polariser angles [59, 101, 124]. For more details on possible applications, we refer the interested reader to Chap. 6 of this volume.

1.5 A Short Presentation of This Volume

As we said in the introduction, the volume contains several contributions which represent recent trends in 3D-reconstruction via photometric techniques. Here is an overview of the chapters.

In Chap. 2, Breuss and Yarahmadi focus on a shape-from-shading model which is more realistic than the one we described in Sect. 1.2, since perspective projection is considered. A comprehensive state of the art of perspective SfS (PSfS) is carried out. The case of a Lambertian surface illuminated either by a parallel and uniform luminous flux, or by a nearby point light source, is more specifically addressed. Finally, a comparative study is carried out between two methods of resolution of the PSfS problem under directional lighting, both of which are based on the fast-marching algorithm.

In Chap. 3, Or-El et al. tackle the problem of refining the depth map provided by RGBD sensors, by applying shape-from-shading techniques. The authors propose three ways to solve this problem. The first one is a model-based approach effective for Lambertian surfaces, which refines the depth map by an SfS strategy applied to the RGB image. This approach is then extended to specular objects, using a Phong-type model and the infrared image with the attached (near) light source. Lastly, a deep learning-based solution is proposed.

In Chap. 4, Gallardo et al. tackle the problem of the 3D-reconstruction of deformable surfaces using non-rigid structure-from-motion and shading. The authors propose an optimisation-based strategy, which aims at finding the geometry (parameterised by vertices) and reflectance (parameterised by a finite set of albedo values) which minimise a cost function combining a shape-from-shading term and a structure-from-motion one. Additional terms are also included in the cost function: a contour boundary one, a smoothness one and a quasi-isometry one. The resulting non-convex optimisation problem is addressed by a careful heuristic initialisation followed by an iterative, Gauss–Newton-based refinement over all variables in a multi-scale fashion. The proposed method is evaluated both qualitatively and quantitatively against the state of the art.

In Chap. 5, Brahimi et al. present a theoretical contribution on the well-posedness of uncalibrated photometric stereo under general illumination. In particular, they prove that there is no ambiguity for the perspective model if lighting is represented by first-order spherical harmonics. In the process of establishing their main result they also provide a comprehensive survey of the available results regarding the well-posedness of several photometric stereo problems and they examine in detail the case of the orthographic projection. For this problem they prove that, even in the case of spherical harmonics, the concave/convex ambiguity still persists. They conclude with some numerical experiments.

Chapter 6, authored by Shi et al., represents a concise survey on SfP. After an introduction, the authors briefly recall the Fresnel theory, which is the theoretical basis of polarisation imaging. The process for the formation of a polarisation image is described, giving also details on the data acquisition. The authors discuss the estimation of azimuth and zenith angles of the normal for surfaces with different reflectance properties (specular, diffuse, and mixed polarisation). Then, the combination of SfP with auxiliary information is explored, e.g. geometric cues, spectral cues, photometric cues and deep learning. Moreover, applications which can benefit from polarisation information, in addition to the 3D-shape recovery, are presented. The chapter ends with a discussion on problems still open.

Finally, Chap. 7 by Dahlan et al. addresses the problem of facial aging estimation, using light scattering photometry. It is shown that the roughness parameter of several BRDF models is correlated with the age. Therefore, facial aging estimation can be carried out by fitting a BRDF model to an input image. In this work, geometry estimation is carried out using photometric stereo, by resorting to an illumination dome. Then, given the estimated normals, an image with frontal lighting is used to infer the BRDF parameters. Various experiments are carried out to study whether these estimated parameters correlate with age and it is shown that this is the case for the roughness parameter. Several tests on real images are illustrated and analysed.