
In his short story “Let’s Get Together” (1957), Isaac Asimov described a critical situation: a US officer was interviewing the head of the Department of Robotics in Cheyenne to learn more about the research in that area. He was scared by the fact that ten humanoids built by the enemy were walking around the States. At that time this was completely unrealistic, but it seems that times have changed.

Nowadays robots can walk, read, speak, play the piano and behave like humans in many circumstances, although their “intelligence” is still rather limited. However, recent advances in computer vision allow them to recognize a scene or an object and to walk safely in an unknown environment avoiding obstacles. This is due to findings in many research areas, including computer vision. Here we will describe in particular the Shape-from-Shading (SfS) problem, an inverse problem which amounts to reconstructing the 3D structure of an object from a single 2D grey-value image. For this task, the SfS process relies on information about the illumination and the light reflectance in the scene. This problem, first introduced by Horn [9] in his PhD thesis, is a classic inverse problem in computer vision with many potential applications beyond robotics, see e.g. [16, 6, 11, 20, 7].

The Classical Shape from Shading Model

Let us give a brief outline of the SfS problem and introduce the basic assumptions of the classical model based on a single image. We attach to the camera a three-dimensional coordinate system (Oxyz) such that Oxy coincides with the image plane and Oz with the optical axis; orthogonal projection is usually considered to simplify the model. Under this simplification, the visible part of the scene is, up to a scale factor, a graph z = u(x), where x = (x, y) is an image point. The datum of the problem is the image, represented by the irradiance (or brightness) function I. The function I is measured at each point (pixel) of the image, for example as a grey level (from 0 to 255). However, to construct a continuous model, we will assume that I takes real values in the interval [0, 1], where 0 corresponds to black and 1 to white, passing through infinitely many grey levels. The SfS problem is characterized by the image irradiance equation:

$$\displaystyle{ R(\mathbf{n}(\mathbf{x})) = I(\mathbf{x}), }$$
(1)

where I(x) is the irradiance (or brightness) measured in the image at point x and R(n(x)) is the reflectance function, which gives the value of the light re-emitted by the surface as a function of its orientation i.e., of the unit normal n(x) to the surface at the point (x, u(x)). This normal can be expressed as:

$$\displaystyle{ \mathbf{n}(\mathbf{x}) = \frac{1} {\sqrt{1 + p(\mathbf{x} )^{2 } + q(\mathbf{x} )^{2}}}\,(-p(\mathbf{x}),-q(\mathbf{x}),1), }$$
(2)

where

$$\displaystyle{ p = \partial u/\partial x,\;q = \partial u/\partial y\,\mbox{ so that}\,\nabla u(\mathbf{x}) = (p(\mathbf{x}),q(\mathbf{x})) }$$
(3)

(here ∂u/∂x and ∂u/∂y are the partial derivatives of u with respect to the x and y coordinates and ∇u is the gradient of u). The height function u, which is the unknown of the problem, has to be reconstructed on a compact domain \(\varOmega \subset \mathbb{R}^{2}\), called the “reconstruction domain”. Assume that there is a unique light source at infinity whose direction is indicated by the unit vector \(\omega = (\omega _{1},\omega _{2},\omega _{3}) \in \mathbb{R}^{3}\), and let us assume, for simplicity, that ω is given. In order to simplify the model, it is usually assumed that the surface has uniform reflectance properties, so that the light reflected by the surface depends only on the angle between the normal to the surface and the direction of the light ω, scaled by a positive factor γ (the albedo). A surface with these properties is called a Lambertian surface, and for it we can write (1) as R(n(x)) = γ ω ⋅ n(x). Setting the light source at infinity guarantees that all the light rays are parallel. Then, setting the uniform albedo equal to 1 and applying definition (2), we obtain

$$\displaystyle{ I(\mathbf{x})\sqrt{1 + \vert \nabla u(\mathbf{x} )\vert ^{2}} + (\omega _{1},\omega _{2}) \cdot \nabla u(\mathbf{x}) -\omega _{3} = 0\qquad \mbox{ for}\,\,\mathbf{x} \in \Omega, }$$
(4)

which is a first order non-linear partial differential equation of the Hamilton-Jacobi type (here ⋅ is the scalar product). We will also consider the simplified equation which appears in most of the literature and corresponds to a vertical light source at infinity i.e., ω = (0, 0, 1). Then (4) reduces to the eikonal equation

$$\displaystyle{ \left \vert \nabla u(\mathbf{x})\right \vert = f(\mathbf{x})\qquad \mbox{ for}\,\,\mathbf{x} \in \Omega, }$$
(5)

where

$$\displaystyle{ f(\mathbf{x}) = \sqrt{ \frac{1} {I(\mathbf{x})^{2}} - 1}. }$$
(6)

Since this equation depends only on ∇u, if u is a solution then u + constant is also a solution, so one has (at least) to add the value of u at the boundary of the silhouette Ω in order to select a unique solution (we will see later that this is not enough to get uniqueness). Equation (4) or (5) must be complemented with boundary conditions on ∂Ω and with additional information to select a unique solution. For an image containing an “occluding boundary”, it is usual to consider this boundary as ∂Ω. For example, in Fig. 1, the part of the image representing the object in grey levels is the silhouette Ω, and ∂Ω coincides with the occluding boundary.
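Concretely, the right-hand side f of the eikonal formulation (5)–(6) is readily computed from a normalized image. The following Python/NumPy sketch is our own illustration (the function names are hypothetical): it clips I away from 0, where f blows up, and also flags the singular points where I is maximal.

```python
import numpy as np

def eikonal_rhs(I, eps=1e-6):
    """Right-hand side f(x) = sqrt(1/I^2 - 1) of the eikonal SfS equation (6).

    I is the normalized brightness in (0, 1]; values are clipped away from 0
    because f blows up as I -> 0 (black regions carry no shape information).
    """
    I = np.clip(np.asarray(I, dtype=float), eps, 1.0)
    return np.sqrt(1.0 / I**2 - 1.0)

def singular_points(I, tol=1e-3):
    """Boolean mask of singular points, where I is maximal (I = 1, so f = 0)."""
    return np.asarray(I) >= 1.0 - tol
```

For instance, a pixel with brightness I = 1/√2 gives f = 1, while a singular point with I = 1 gives f = 0.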

Fig. 1

Object with occluding boundary, which might be used to define the silhouette Ω (the white region on the right)

A natural choice is to consider Dirichlet type boundary conditions in order to take into account (at least) two different possibilities. The first corresponds to the assumption that the surface is standing on a flat background i.e., we set:

$$\displaystyle{ u(\mathbf{x}) = 0\qquad \mbox{ for}\,\,\mathbf{x} \in \partial \Omega. }$$
(7)

The second possibility occurs when the height of the surface on the occluding boundary is known, e.g. this happens when we know (or assume) that the object is a surface of revolution around a given axis. This situation leads to the more general condition:

$$\displaystyle{ u(\mathbf{x}) = g(\mathbf{x})\qquad \mbox{ for}\,\,\mathbf{x} \in \partial \Omega. }$$
(8)

The solution of the above Dirichlet problems (4)–(7) or (4)–(8) will characterize a surface corresponding to the brightness function I(x) measured in Ω, which is the datum of the problem. Points x ∈ Ω where I(x) is maximal correspond to the particular situation in which ω and n(x) point in the same direction: these points are usually called “singular points” and they play an important role in characterizing a unique solution.
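The geometric quantities entering the model are easy to evaluate numerically. Below is a small Python/NumPy sketch (our own illustration, with hypothetical names) of the unit normal (2) and of the Lambertian reflectance R(n) = γ ω ⋅ n; it can be used, e.g., to synthesize test images from a known gradient field.

```python
import numpy as np

def unit_normal(p, q):
    """Unit normal n(x) to the graph z = u(x) from the gradient (p, q), Eq. (2)."""
    denom = np.sqrt(1.0 + p**2 + q**2)
    # n = (-p, -q, 1) / sqrt(1 + p^2 + q^2), stacked in the last axis
    return np.stack([-p, -q, np.ones_like(p)], axis=-1) / denom[..., None]

def lambertian(p, q, omega, gamma=1.0):
    """Lambertian reflectance R(n) = gamma * omega . n for a unit light direction omega."""
    n = unit_normal(p, q)
    return gamma * (n @ np.asarray(omega, dtype=float))
```

At a singular point (∇u = 0 with vertical light ω = (0, 0, 1)) this returns the maximal brightness γ, consistent with the discussion above.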

The first remark is that nonlinear partial differential equations such as (1) do not, in general, have regular solutions (i.e. solutions which are differentiable at every point). So the natural framework in which to set the problem is that of weak solutions, which do not satisfy the equation everywhere but are at least continuous, with (possibly) discontinuities in the gradient. Starting from the paper by Rouy and Tourin [18], the most recent approach to the resolution of SfS uses the notion of viscosity solutions to first order PDEs, see e.g. [1]. This is also natural from the point of view of computer vision applications: if you look around, you can see many objects whose surface is not smooth. Unfortunately, the Dirichlet problem (4)–(8) can have several “weak solutions” in the viscosity sense, and also several classical solutions. As an example, all the surfaces represented in Fig. 2 are viscosity solutions of the same equations (5)–(7), which is a particular case of (4)–(8). The solution represented in Fig. 2a is the maximal solution and is regular (differentiable). All the non-smooth a.e. solutions which can be obtained from it by a reflection with respect to a horizontal axis are still admissible weak solutions (see Fig. 2b). In fact, the value of the scalar product ω ⋅ n(x) corresponding to any of these surfaces is the same for every x and coincides with the value computed on the regular solution (Fig. 2a). This is the concave/convex ambiguity of the SfS model. In this example, the lack of uniqueness of viscosity solutions is due to the presence of a singular point, where the right-hand side of (5) vanishes. An additional effort is then needed to define our solution, since the lack of uniqueness is also a big drawback when trying to compute a numerical solution. In order to circumvent these difficulties, the problem is usually solved by adding some information, such as the height at each singular point [14] or the complete knowledge of a level curve [12, 13].

More recently, an attempt has been made to eliminate the need for a priori additional information. Still within the framework of viscosity solutions, an effort has been made to characterize the maximal solution without additional information and to build approximation schemes which converge to that solution [8, 3]. Let us finally mention that Eq. (4) is not the most general equation for SfS. We have made many assumptions along the way but, in general, real materials are not Lambertian, the albedo is not constant, and the direction ω is not known a priori. The situation is even more complex if interreflections are taken into account, but we will not treat these cases in detail.
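To give a flavour of the PDE approach, here is a minimal Python/NumPy sketch of a monotone upwind scheme in the spirit of Rouy and Tourin [18] for the eikonal equation (5) with the homogeneous Dirichlet condition (7). It is a simplified illustration of ours (plain Gauss–Seidel sweeps of the standard upwind update), not the authors' original scheme.

```python
import numpy as np

def solve_eikonal(f, h=1.0, n_sweeps=50):
    """Iterative upwind solver for |grad u| = f on a rectangular grid,
    with u = 0 on the boundary of the grid (Dirichlet condition (7)).

    A sketch in the spirit of Rouy-Tourin-type monotone schemes: repeated
    Gauss-Seidel sweeps of the standard upwind update, converging to the
    (discrete) viscosity solution.
    """
    f = np.asarray(f, dtype=float)
    u = np.full(f.shape, np.inf)
    u[0, :] = u[-1, :] = 0.0          # Dirichlet data u = 0 on the boundary
    u[:, 0] = u[:, -1] = 0.0
    ny, nx = f.shape
    for _ in range(n_sweeps):
        for i in range(1, ny - 1):
            for j in range(1, nx - 1):
                a = min(u[i - 1, j], u[i + 1, j])   # upwind neighbour in y
                b = min(u[i, j - 1], u[i, j + 1])   # upwind neighbour in x
                fh = f[i, j] * h
                if abs(a - b) >= fh:                # one-sided update
                    cand = min(a, b) + fh
                else:                               # two-sided (quadratic) update
                    cand = 0.5 * (a + b + np.sqrt(2.0 * fh**2 - (a - b)**2))
                u[i, j] = min(u[i, j], cand)
    return u
```

With f ≡ 1 this computes the (discrete) distance-like solution to the boundary, the maximal "tent" of Fig. 2a in one dimension.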

Fig. 2

Illustration of the concave/convex ambiguity: (a) maximal solution and (b) a.e. solutions giving the same image

In order to obtain a numerical solution of the SfS problem, two main strategies are available. The first is to solve directly the partial differential equation combined with the boundary conditions; this method will give an approximation of u provided additional conditions on the approximation scheme are satisfied. The second is the variational approach, where we first solve an optimization problem in the unknowns (p, q) at every pixel of the image. Then, using the field (p, q) and recalling the definition (3), we can reconstruct u by an integration along a path, made for example of a horizontal plus a vertical step. We will not give the details here, but they can be found in the survey [7].
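The second step of the variational approach, the reconstruction of u from (p, q) by integration along a path, can be sketched as follows. This is our own illustration (names are hypothetical): a horizontal leg along the first row followed by vertical legs down each column, with a trapezoidal rule per step.

```python
import numpy as np

def integrate_gradient(p, q, h=1.0):
    """Reconstruct u (up to an additive constant) from (p, q) = grad u by
    integrating along an L-shaped path: first along the top row (using p),
    then down each column (using q), cf. definition (3).

    Assumes (p, q) satisfies the integrability condition (9); otherwise the
    result depends on the chosen path.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    u = np.zeros_like(p)
    # horizontal leg along the first row: trapezoidal steps with p = du/dx
    u[0, 1:] = np.cumsum(0.5 * (p[0, 1:] + p[0, :-1]) * h)
    # vertical legs down each column: trapezoidal steps with q = du/dy
    u[1:, :] = u[0, :] + np.cumsum(0.5 * (q[1:, :] + q[:-1, :]) * h, axis=0)
    return u
```

For a field that violates (9), integrating along a different path (vertical first, then horizontal) would give a different surface, which is why the integrability constraint enters the functionals below.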

The first approach is rather standard, and the difficulty is to make the discretization of the continuous equation converge to the right solution. In the variational approach, the main point is the choice of the functional to be minimized in the first step. In this approach, some regularity of the surface is assumed, and the approximate solutions computed are typically local minima of the functional. Note that u appears in the image irradiance equation only through its first derivatives p and q, which are not independent functions. In fact, if u is regular and twice differentiable,

$$\displaystyle{ \partial p/\partial y = \partial q/\partial x. }$$
(9)

and this condition guarantees integrability along a path. The only problem with these unknowns is that p and q become infinite at every point x belonging to the occluding boundary. This is not a cause for concern if no point x in the reconstruction domain Ω is such that I(x) = 0. Since Eq. (9) is a hard constraint on p and q, the most natural functional associated with Eqs. (1) and (9) is

$$\displaystyle{ \begin{array}{cl} \mathcal{F}_{1}\left (p,q,\mu \right ) =\int _{\mathbf{x}\in \Omega }&\left [r(p(\mathbf{x}),q(\mathbf{x})) - I(\mathbf{x})\right ]^{2}\,d\mathbf{x} \\ & +\int _{\mathbf{x}\in \Omega }\mu (\mathbf{x})\left [\partial p/\partial y(\mathbf{x}) - \partial q/\partial x(\mathbf{x})\right ]\,d\mathbf{x},\end{array} }$$
(10)

(here the function r is such that R(n(x)) = r(p(x), q(x))). Since the minimization of \(\mathcal{F}_{1}\) is rather slow, Horn and Brooks have also proposed the functional

$$\displaystyle{ \begin{array}{cl} \mathcal{F}_{2}\left (p,q\right ) =\int _{\mathbf{x}\in \Omega }&\left [r(p(\mathbf{x}),q(\mathbf{x})) - I(\mathbf{x})\right ]^{2}\,d\mathbf{x} \\ & +\lambda _{\mbox{ int}}\int _{\mathbf{x}\in \Omega }\left [\partial p/\partial y(\mathbf{x}) - \partial q/\partial x(\mathbf{x})\right ]^{2}\,d\mathbf{x}. \end{array} }$$
(11)

where the constraint term for integrability becomes a penalty term and the positive constant weight λ int is called the “integrability factor” (note that the choice of λ int is arbitrary).
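A discrete version of the functional (11) is straightforward to evaluate. The following sketch is our own illustration (with hypothetical names); it uses central differences for the integrability defect ∂p/∂y − ∂q/∂x.

```python
import numpy as np

def F2(p, q, I, r, lam_int=0.1, h=1.0):
    """Discrete version of the Horn-Brooks functional (11): brightness
    mismatch plus a quadratic penalty on the integrability defect.

    r(p, q) is the reflectance map (e.g. the Lambertian one) and lam_int is
    the (arbitrary) integrability factor.
    """
    # data-fidelity term: [r(p, q) - I]^2 summed over the grid
    fidelity = np.sum((r(p, q) - I) ** 2) * h * h
    # integrability penalty: [dp/dy - dq/dx]^2 via finite differences
    p_y = np.gradient(p, h, axis=0)   # rows  ~ y direction
    q_x = np.gradient(q, h, axis=1)   # columns ~ x direction
    penalty = lam_int * np.sum((p_y - q_x) ** 2) * h * h
    return fidelity + penalty
```

For a perfectly consistent pair, e.g. p = q = 0 with the vertical-light Lambertian map r(p, q) = 1/√(1 + p² + q²) and I ≡ 1, the functional vanishes; any brightness mismatch or curl in (p, q) makes it strictly positive.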

The PSFS Model and Related Equations

Several improvements with respect to the original problem presented in the previous section have been studied. One of the most relevant is the introduction of perspective deformations in the model as proposed in [16, 5, 19], also taking into account a light attenuation factor [17].

Let us sketch the model for PSFS with a point light source located at the optical center O and a light attenuation term (see Fig. 3). Let x = (x, y) be a point in the image domain Ω, where Ω is an open bounded subset of \(\mathbb{R}^{2}\). Let I = I(x, y) > 0 be the normalised brightness function. We have \(I = \frac{E(x,y)} {\sigma }\), where E is the grey level of the given image and σ is the product of the surface albedo (which tells us to what extent the surface reflects light) and the light source intensity. Moreover, f denotes the focal length, i.e. the distance between the optical center O of the camera and the two-dimensional plane onto which the scene of interest is mapped.

Fig. 3

Notations for the PSFS model with point light source at optical center O

M will represent a generic point on the surface Σ to be reconstructed. We choose as unknown of the problem the function \(u:\Omega \rightarrow \mathbb{R}\) such that

$$\displaystyle{ M = M(x,y) = u(x,y)\,m'\,, }$$
(12)

where

$$\displaystyle{ m' = \frac{\mathsf{f}} {\sqrt{x^{2 } + y^{2 } + \mathsf{f} ^{2}}}\ m\quad \mbox{ and }\quad m = (x,y,-\mathsf{f})^{\top }. }$$
(13)

Note that, according to these notations, u > 0 holds as the depicted scene is in front of the camera. Here we denote by r(x, y) the distance between the point light source and the point M(x, y) on the surface. It holds \(u(x,y) = r(x,y)/\mathsf{f}\), since the light source location coincides with the optical center.

The model associated with the PSFS problem is obtained again from the image irradiance equation (1). We denote by ω(x, y) the unit vector representing the light source direction at the point M(x, y) (note that in the classical SfS model this vector is constant, but this is not the case here):

$$\displaystyle{ \omega (x,y) = \frac{(-x,-y,\mathsf{f})^{\top }} {\sqrt{x^{2 } + y^{2 } + \mathsf{f} ^{2}}}. }$$
(14)

Adding the assumptions of a light attenuation term and of a Lambertian surface, the function R is defined as

$$\displaystyle{ R(\mathbf{n}(x,y)) = \frac{\omega (x,y) \cdot \mathbf{n}(x,y)} {r(x,y)^{2}}, }$$
(15)

with an attenuation factor equal to the inverse of the squared distance from the source. Expression (15) would still hold for any location of the point light source, but the same would not be true for the equality \(u(x,y) = r(x,y)/\mathsf{f}\) nor for (14). The case of a light source coinciding with the optical center corresponds more or less to endoscopic images [16] and to photographs taken at short distance with the camera flash [17]. Another notable advantage of the PSFS model with a point light source at the optical center is that there are no shadows in the image.

Finally, by the irradiance equation and (15) we obtain the PSFS equation

$$\displaystyle{ \frac{\omega (x,y) \cdot \mathbf{n}(x,y)} {r(x,y)^{2}} = I(x,y). }$$
(16)

In order to write down the corresponding PDE, it is useful to introduce the new unknown v = ln(u) (recall that u > 0). Then Eq. (16) can be written as

$$\displaystyle{ H(x,y,v,\nabla v):= \frac{I(x,y)} {Q(x,y)}\,\mathsf{f}^{2}\,W(x,y,\nabla v) - e^{-2v(x,y)} = 0\,,\quad (x,y) \in \varOmega }$$
(17)

where

$$\displaystyle{ Q(x,y):= \frac{\mathsf{f}} {\sqrt{x^{2 } + y^{2 } + \mathsf{f} ^{2}}} }$$
(18)

and

$$\displaystyle{ W(x,y,\nabla v):= \sqrt{\mathsf{f} ^{2 } \|\nabla v\|^{2 } + (\nabla v \cdot (x, y))^{2 } + Q(x, y)^{2}}, }$$
(19)

(here \(\|\cdot \|\) denotes the Euclidean norm). Figure 4 shows the result for this model on a real image representing two cups (see [2] for more details on these models).
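For the reader who wants to experiment, the Hamiltonian of Eqs. (17)–(19) can be evaluated on a grid as in the following sketch. This is our own illustration (names are hypothetical); it assumes the image is centred on the optical axis and uses central differences for ∇v.

```python
import numpy as np

def psfs_residual(v, I, f, h=1.0):
    """Evaluate the PSFS Hamiltonian H(x, y, v, grad v) of Eqs. (17)-(19)
    on a grid, for the unknown v = ln(u) and focal length f.

    The pixel grid is centred so that (x, y) = (0, 0) is the optical axis.
    """
    ny, nx = I.shape
    y, x = np.meshgrid(np.arange(ny) - (ny - 1) / 2.0,
                       np.arange(nx) - (nx - 1) / 2.0, indexing="ij")
    y = y * h
    x = x * h
    Q = f / np.sqrt(x**2 + y**2 + f**2)                      # Eq. (18)
    v_y, v_x = np.gradient(v, h)                             # central differences
    W = np.sqrt(f**2 * (v_x**2 + v_y**2)                     # Eq. (19)
                + (v_x * x + v_y * y)**2 + Q**2)
    return I / Q * f**2 * W - np.exp(-2.0 * v)               # Eq. (17)
```

A quick sanity check: where ∇v = 0 one has W = Q, so the residual reduces to I f² − e^(−2v); a solver for (17) would drive this residual to zero over the whole domain.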

Fig. 4

(Left) Real image; (right) 3D reconstruction

Recent Developments on Shape from Shading Models

More recently, the classic model based on the Lambertian assumption and on orthographic projection has been improved in order to deal with more realistic assumptions and to improve the results on real images. For example, one can consider the light source at a finite distance from the object, so that the rays are not parallel, and/or include black shadows (i.e. regions where I(x) = 0) in the image. Black shadows are usually forbidden for two main reasons. The first reason is technical: in the classical model the eikonal equation can be applied only where I(x) > 0, because I appears in the denominator of the right-hand side (6). The second is the remark that if the image is black we have no information on the real surface, so we cannot reconstruct it: every surface whose height is below the “black ray” separating light and shadow could be a reasonable solution. Note that “black shadow” regions are typically projected by the object onto the background and/or onto its own surface, and appear when the light source moves away from the vertical. This situation often appears in real images taken at sunset or at dawn. Since it is impossible, for lack of information, to reconstruct the real surface corresponding to these regions, we can follow two different strategies. Either we apply the model only to the regions where I > 0 and deal with all the boundary conditions corresponding to these regions, or we solve the problem on the whole image and reconstruct a “virtual surface” on the black shadow regions. It is reasonable to define this virtual surface as the one described by the rays separating light and shadow.

The model has been extended in order to include in the solution the flat surface separating light and shadow (a result is shown in Fig. 5).

Fig. 5

From left to right: the real image with black shadows, the grid discretization of the domain Ω, the 3D reconstruction

In Stereo Shape from Shading, one considers two images taken from different points of view under the same light conditions. This situation mimics our vision system, based on two eyes separated by a small distance. It has been shown that for this specific problem there is just one solution, i.e. the problem is well-posed [4].

Another recent improvement is the photometric stereo approach, which is now considered a very promising technique. In this approach, we use several images taken from the same point of view but under different light source directions. This is the laboratory situation where one can move the light source around an object while taking several pictures. Clearly, for every image we can use one of the previous models for SfS (orthographic, perspective, light attenuation), and this gives rise to a system of nonlinear partial differential equations. For example, by applying the orthographic model and taking two images I 1 and I 2 which correspond respectively to two different light source directions ω^1 and ω^2, we get the following system of nonlinear partial differential equations

$$\displaystyle{ \left \{\begin{array}{ll} I_{1}(\mathbf{x})\sqrt{1 + \vert \nabla u(\mathbf{x} )\vert ^{2}} + (\omega _{1}^{1},\omega _{2}^{1}) \cdot \nabla u(\mathbf{x}) -\omega _{3}^{1} = 0,\quad &\mbox{ for}\ \mathbf{x} \in \Omega \\ I_{2}(\mathbf{x})\sqrt{1 + \vert \nabla u(\mathbf{x} )\vert ^{2}} + (\omega _{1}^{2},\omega _{2}^{2}) \cdot \nabla u(\mathbf{x}) -\omega _{3}^{2} = 0,\quad &\mbox{ for}\ \mathbf{x} \in \Omega \\ \end{array} \right. }$$
(20)

Note that the number of equations equals the number of images; in some cases the system has a unique solution with just two images although, in general, three images are necessary to select a unique solution. For example, starting from the three images of the Beethoven bust (Fig. 6) we can compute the solution of the system and recover the 3D reconstruction of the surface (Fig. 7) (see [15] for more details).
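With three or more images, the Lambertian photometric stereo system can also be solved pixelwise by linear least squares in the scaled normal ρn, from which (p, q) are read off via Eq. (2). The following sketch is our own illustration of this classical construction, not the scheme of [15]; the function name is hypothetical.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Recover the gradient field (p, q) and the albedo from k >= 3 images
    taken under known unit light directions, assuming the orthographic
    Lambertian model underlying system (20).

    images: array (k, ny, nx); lights: array (k, 3) of unit vectors.
    Pixelwise least squares solves omega^i . (rho * n) = I_i for g = rho * n;
    then, by Eq. (2), n ~ (-p, -q, 1), so p = -g_x/g_z and q = -g_y/g_z.
    """
    k, ny, nx = images.shape
    L = np.asarray(lights, dtype=float)           # (k, 3) light matrix
    b = images.reshape(k, -1)                     # (k, ny*nx) brightness
    g, *_ = np.linalg.lstsq(L, b, rcond=None)     # (3, ny*nx), g = rho * n
    nz = np.where(np.abs(g[2]) > 1e-12, g[2], 1e-12)  # guard against g_z = 0
    p = (-g[0] / nz).reshape(ny, nx)
    q = (-g[1] / nz).reshape(ny, nx)
    albedo = np.linalg.norm(g, axis=0).reshape(ny, nx)
    return p, q, albedo
```

Note that this linear route also estimates the albedo, which the single-image models above had to assume known.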

Fig. 6

Photometric stereo: Three real images of Beethoven bust

Fig. 7

The 3D reconstruction of the Beethoven bust starting from the data in Fig. 6

As we mentioned at the beginning, the above techniques allow for rather accurate 3D reconstructions of the objects appearing in a scene. This technology can then be applied in medical surgery, where the image acquisition is obtained via an endoscope and it is important to have a precise description of the area. At a macroscopic level, the same inverse problem appears in space exploration, when a satellite takes several pictures of the same area, allowing for the construction of its altitude map, which can later be used for robot explorations of the area.

Another interesting application of SfS models is related to the preservation of cultural heritage. Advanced techniques for 3D reconstruction are based on laser or optical measurements, but 3D scanners are rather expensive, since specific hardware for data acquisition is required (see for example the movie presentation [21]). An accurate reconstruction based on an SfS model could produce a digitization of a statue using just low-cost cameras in a controlled environment where one can adjust the light sources. This is clearly an advantage, since the method can be applied with low investments.

Another example comes from the digitization of ancient manuscripts. These books are very delicate, and their reproduction requires expensive hardware in order to preserve the volumes. Moreover, a copy of the manuscript taken with a standard camera has a big perspective distortion, due to the curvature of the pages near the center of the book. Typically, the text is not horizontal because of this distortion, and classical OCR (Optical Character Recognition) systems cannot work on it, so automatic classification or analysis of the text is not feasible. Via SfS techniques one can compute the real surface of the pages and then remap the content of each page onto it, correcting the perspective distortion as in Fig. 8 (more details can be found in [6]).

Fig. 8

An example of corrections for book reproduction via SfS models

So computer vision can really help robots take a walk and, finally, the science-fiction scenario imagined by Asimov becomes reality.