
1 Introduction

Intrinsic image decomposition aims at separating an illumination-invariant reflectance image from an input color image. Such a decomposition has numerous applications in color enhancement, image segmentation, pattern recognition, and object tracking [1,2,3]. The separated shading component is used in BRDF estimation and shadow removal methods [4,5,6]. However, while intrinsic images have many applications, recovering them remains a substantial challenge. Estimating the intrinsic components is an ill-conditioned problem: a single image can be decomposed into infinitely many different combinations of reflectance and illumination. Thus, additional constraints or priors are needed to select an appropriate solution. Priors on reflectance (albedo) and shading are usually based on physical principles of light and object interaction, scene geometry, and material properties, as well as on expert knowledge of what intrinsic images should look like. Finally, decomposition into reflectance and illumination components is suitable only for diffuse (Lambertian) objects. According to the dichromatic model introduced by Shafer [7], if glossy (non-Lambertian) objects are present in a scene, a specular term has to be taken into account. Many classical approaches fail when the target scene contains non-Lambertian objects; since specularity depends on the view point, it is hardly possible to estimate it from a single image.

To improve the accuracy of intrinsic images, researchers use additional information, for instance a video sequence instead of a single image, RGB-D imaging sensors, or manual labeling. This information may be incomplete, suffer from sensor noise and calibration errors, or depend on human input. Computing it may be time consuming and require complex experiments or special equipment, which makes it hardly usable in, for example, industrial applications.

In this work, we leverage light fields for intrinsic image decomposition. 4D light fields are widely used in image analysis and computer graphics. The key idea of a light field is to represent a scene not as a traditional 2D image, which contains information about the accumulated intensity at each image point, but as a collection of images of the same scene from slightly different view points, see Fig. 1. The specific structure of the light field allows a wide range of applications. It is used for efficient depth estimation, virtual refocusing, automatic glare reduction, as well as object insertion and removal [8,9,10]. Recently, the inherent structure of the light field was leveraged for shape and BRDF estimation [4, 11].

Fig. 1. The top-left image shows the center view of a light field parametrized by image coordinates x and y. On the bottom and right, the epipolar plane images (EPIs) for the white lines in the center view are shown, where s and t describe view point coordinates. As the camera moves, 3D scene points trace straight lines on the EPIs, whose slope corresponds to disparity. Any assignment of a property of a scene point to rays should be constant along these lines, which can be leveraged for consistent regularization [12].

Contributions. In this paper, we formulate and solve intrinsic light field decomposition by means of an optimization problem for albedo, shading, and specularity. As far as we are aware, this is the first time this problem is addressed for 4D light fields. Based on a detailed review of the state-of-the-art in intrinsic image decomposition, we propose priors for modeling all unknowns based on additional data available in the light field. Epipolar plane image constraints encourage albedo and shading to be constant for projections of the same scene point. By means of a novel term which is specific to light fields, we can also estimate specularity and highlights, and separate them from the shading and albedo components. In experiments, we demonstrate that we outperform state-of-the-art intrinsic image decomposition based on RGB plus depth data [13], as well as an alternative approach to detect and remove light field specularity [10, 14].

2 Related Work

Intrinsic images have been a challenging research topic for many years. First introduced by Barrow and Tenenbaum [15], they divide an observed image into the product of a reflectance and an illumination image. According to Land and McCann [1], large discontinuities in pixel intensities correspond to changes in reflectance, and the remaining variation corresponds to shading. They proposed the Retinex theory, which was successfully extended and implemented for intrinsic image decomposition by Tappen et al. [16], Chung et al. [17], Grosse et al. [18], Finlayson et al. [5, 6], and many others.

Besides the Retinex approach, it is common to include additional regularization terms that describe certain physical properties of the intrinsic components. Barron and Malik [19,20,21] introduce priors on reflectance, shape, and illumination to recover intrinsic images. Shen et al. [22] employ texture information. Finlayson et al. [23] search for an invariant image which is independent of lighting and shading. Gehler and Rother [24] model reflectance values as drawn from a sparse set of basis colors. Bell et al. [25] also assume that reflectance values come from a predefined set which is unique for every image, and then iteratively adjust the reflectance values in this set.

Recently, significant improvements in intrinsic image decomposition have been achieved by using richer types of input data. Having a sequence of images with depth information available allows enforcing albedo and shading consistency between different views, Lee et al. [26]. Depth or disparity information allows incorporating spatial dependencies between pixels to construct a shading prior, Jeon et al. [27]. Chen and Koltun [13] develop a model based on RGB-D information. They separate shading into two components, direct and indirect irradiance, which significantly improves decomposition results. Barron and Malik [21] use depth to extend their SIRFS model [20] such that it is applicable to natural scenes.

Although decomposition algorithms nowadays achieve spectacular results for Lambertian scenes, their performance suffers in the non-Lambertian case in the presence of highlights or specularity. In our paper, we make use of the rich structure of the light field to estimate specularity for non-Lambertian objects. According to the dichromatic model introduced by Shafer [7], diffuse and specular reflections behave differently. Diffuse objects reflect incident light equally in all directions, thus their color is independent of the viewpoint. Specular objects reflect light in a certain direction that depends on surface orientation, thus their color depends on the viewpoint, the light source color, and physical material properties. Blake and Bülthoff [28] provide an extensive analysis of specular reflections and propose a strategy for recovering 3D structure using specularity. Swaminathan et al. [29] study photometric properties of specular pixels and model their motion depending on the surface geometry. Adato et al. [30] model specular flow with non-linear partial differential equations. Tao et al. [10, 14] introduced depth estimation for glossy surfaces. They leverage the light field structure to cluster pixels into specular and specular-free groups, then remove the specular components from the input light field.

3 Intrinsic Light Field Model

Light Field Structure. We briefly describe the light field structure and review notation. For more detailed information, we refer to [12, 31]. A light field is defined on the 4D ray space \({\mathcal R} = \varPi \times \varOmega \), which parametrizes rays \({\varvec{r}} = (x,y,s,t)\) by their intersection coordinates with two planes \(\varPi \) and \(\varOmega \). The intersection with the focal plane \(\varPi \) gives the view point coordinates \((s,t)\), while the image plane \(\varOmega \) provides the image coordinates \((x,y)\), see Fig. 1. A 4D light field is now a map \(L: \mathcal {R} \rightarrow \mathbb {R}^{n}\) on ray space. It can be scalar or vector-valued for grey scale or color images, respectively.
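As a concrete illustration of this parametrization, the following sketch (not from the paper; the array layout and names are our assumptions) stores a light field as a NumPy array indexed by view and image coordinates and slices out the epipolar plane images shown in Fig. 1.

```python
import numpy as np

# A light field stored as L[s, t, y, x, c]: view point coordinates (s, t),
# image coordinates (y, x), RGB channel c. Layout and names are assumptions.

def extract_epi_xs(L, t0, y0):
    """Horizontal EPI: the (s, x) slice for a fixed view row t0 and image row y0."""
    return L[:, t0, y0, :, :]              # shape (n_views_s, width, 3)

def extract_epi_yt(L, s0, x0):
    """Vertical EPI: the (t, y) slice for a fixed view column s0 and image column x0."""
    return L[s0, :, :, x0, :]              # shape (n_views_t, height, 3)

# Example: random data standing in for a 9x9-view light field of 256x256 views.
L = np.random.rand(9, 9, 256, 256, 3)
epi = extract_epi_xs(L, t0=4, y0=128)      # scene points trace lines with slope ~ disparity
```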

Light Field Decomposition. We model an intrinsic light field as a function

$$\begin{aligned} L({\varvec{r}}) = A({\varvec{r}})S({\varvec{r}}) + H({\varvec{r}}), \end{aligned}$$
(1)

where the radiance L of every ray \({\varvec{r}}\) is decomposed into albedo A, shading S, and specular component H. The functions \(L,A,S,H : \mathcal {R} \rightarrow \mathbb {R}^{3}\) map ray space to RGB values. Albedo represents the color of an object independent of illumination and camera position. Shading describes intensity changes due to illumination, inter-reflections, and object geometry. Finally, the specular component represents highlights that occur for non-Lambertian objects; it depends on illumination, object geometry, and camera position.

The common assumption in the literature related to intrinsic image decomposition is to model the shading component as mono-chromatic [16, 18, 24]. However, in case of multiple light sources or non-Planckian light, this modeling assumption is not sufficient. Thus, we further decompose shading into mono-chromatic shading s and trichromatic light source color C,

$$\begin{aligned} S({\varvec{r}}) = s({\varvec{r}}) C({\varvec{r}}). \end{aligned}$$
(2)

We directly compute the illumination component C in a pre-processing step with the illuminant estimation algorithm developed by Yang et al. [32] applied to the center view, assuming that it will be similar across views. After illumination color is computed, we exclude it from the original light field by switching to the new decomposition model

$$\begin{aligned} \frac{L({\varvec{r}})}{C({\varvec{r}})} = A({\varvec{r}})s({\varvec{r}}) + \frac{H({\varvec{r}})}{C({\varvec{r}})} \end{aligned}$$
(3)

which is illumination color free. Vector division is to be understood component-wise. As a further simplification, we obtain System (3) in linear form

$$\begin{aligned} L^{log}({\varvec{r}}) = A^{log}({\varvec{r}}) + {\varvec{1}}s^{log}({\varvec{r}}) + H^{log}({\varvec{r}}, A,s,H) \end{aligned}$$
(4)

by applying the logarithm. We now want to solve (4) with respect to albedo, shading, and specularity.
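A minimal sketch of the pre-processing and log-linearization in Eqs. (3) and (4), assuming the light field and the illuminant estimate are given as NumPy arrays. The paper obtains C with the method of Yang et al. [32] on the center view; the near-white illuminant used here is only a placeholder.

```python
import numpy as np

# Hedged sketch of Eqs. (3)-(4): divide out a per-ray illuminant color estimate C
# and move to the log domain, where the product A*s becomes a sum.
def to_log_model(L, C, eps=1e-6):
    """L, C: (..., 3) arrays of RGB radiance and illuminant color."""
    L_white = L / np.maximum(C, eps)         # Eq. (3): component-wise division
    return np.log(np.maximum(L_white, eps))  # Eq. (4): L^log = A^log + 1*s^log + H^log

# Example with placeholder data: the unknowns then satisfy, per ray r,
#   L_log[r] ≈ A_log[r] + s_log[r] * ones(3) + H_log[r].
L = np.random.rand(9, 9, 64, 64, 3)
C = np.full_like(L, 0.9)                     # assumed (nearly white) illuminant estimate
L_log = to_log_model(L, C)
```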

System (4) is ill-posed, since the number of unknowns is three times larger than the number of equations. To select a solution that agrees with the physical meaning of the intrinsic components, we pose it as an inverse problem and introduce a number of constraints or regularization terms for albedo, shading, and specularity. As usual, the dependence of \(H^{log}\) on all arguments except \({\varvec{r}}\) is ignored during optimization, and it is estimated as another independent component. We thus solve a global energy minimization problem where we weight the residual of (4) with different priors and regularization terms,

$$\begin{aligned} \mathop {{{\mathrm{arg\,min}}}}\limits _{(A^{log},s^{log},H^{log})} \Bigl \{\;&{||L^{log}({\varvec{r}}) - A^{log}({\varvec{r}}) - \varvec{1}s^{log}({\varvec{r}}) - H^{log}({\varvec{r}}) ||}_2^2 \\&\quad +\, P_\text {albedo}(A^{log}) \,+\, P_\text {shading}(s^{log}) \,+\, P_\text {spec}(H^{log}) \,+\, J( A^{log}, s^{log}) \;\Bigr \}. \end{aligned}$$
(5)

The priors \(P_\text {albedo}\) and \(P_\text {shading}\) for albedo and shading essentially apply the key ideas in intrinsic image decomposition to every subaperture image. They are defined in Sect. 4. The specularity prior \(P_\text {spec}\) is specific to light fields, and a main contribution of our work. It is described in detail in Sect. 5. Finally, the smoothing prior J across ray space encourages spatial smoothness and in particular consistency across different subaperture images. It relies on disparity, and is described together with the optimization framework in Sect. 6.

4 Albedo and Shading Priors

We start by describing the priors, which are key for obtaining an accurate solution for intrinsic light field decomposition from the variational model (5). In this section, we introduce the priors \(P_\text {albedo}\) and \(P_\text {shading}\) for albedo and shading, respectively.

Albedo. To model albedo, we combine ideas of Retinex theory, which is widely used to decompose an image into shading and reflectance components [16, 17, 33], with the idea that pixels with equal chromaticity are likely to have similar albedo [13, 26, 34]. Thus, the prior for albedo is the sum of two energies, \(P_\text {albedo}(A^{log}) = E_\text {retinex}(A^{log}) + E_\text {chroma}(A^{log})\), corresponding to these two models.

Under the simplifying assumption that image derivatives in the log-domain are caused either by shading or by reflectance, we classify the derivative at every ray as caused by shading or by albedo. The idea is to compute a modified gradient field \(\hat{g}\) which assigns a zero value to all derivatives that are caused by shading. The derivative classification is done with an approach similar to the Color Retinex used in [17, 18]. A partial spatial derivative \(L_x\) of the light field is classified as albedo if the neighbouring RGB vectors in the direction of differentiation are not parallel, or if it is above a certain magnitude. Thus, the modified derivative is

$$\begin{aligned} \hat{g}_x = {\left\{ \begin{array}{ll} L_x &{} \text { if } \varvec{c}_{x+1,y} \cdot \varvec{c}_{x,y} < \tau _{col} \text { or } |L_x| > \tau _{grad},\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(6)

Above, \(\varvec{c} = (r,g,b)^T\), the constant \(\tau _{col}>0\) is a threshold above which two vectors are assumed to be parallel, and \(\tau _{grad} >0\) is another user-defined constant. In a similar way, we estimate the modified partial derivative \(\hat{g}_y\) in the second spatial direction.
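The classification in Eq. (6) can be sketched as follows for a single subaperture view; normalizing the color vectors before the dot product and measuring \(|L_x|\) as the channel-wise norm are our assumptions, not prescribed by the text.

```python
import numpy as np

# Hedged sketch of Eq. (6) for one subaperture view I of shape (H, W, 3).
# tau_col, tau_grad are the user constants from the text.
def modified_gradient_x(I, tau_col=0.98, tau_grad=0.1, eps=1e-8):
    Lx = np.diff(I, axis=1, append=I[:, -1:, :])        # forward difference in x
    c = I / (np.linalg.norm(I, axis=2, keepdims=True) + eps)
    c_next = np.roll(c, -1, axis=1)                     # color vector at (x+1, y)
    dot = np.sum(c * c_next, axis=2)                    # cosine of the color angle
    albedo_edge = (dot < tau_col) | (np.linalg.norm(Lx, axis=2) > tau_grad)
    g_hat = np.where(albedo_edge[..., None], Lx, 0.0)   # zero out shading-caused edges
    return g_hat
```

The second spatial direction is handled the same way by differencing along the rows.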

The gradient of the albedo should be equal to the Retinex-modified gradient field, thus we finally obtain the Retinex energy

$$\begin{aligned} E_{retinex}(A^{log}) = \lambda _{retinex}\int _{\mathcal {R}} {||\partial _x A^{log}({\varvec{r}}) - \hat{g}_x({\varvec{r}}) ||}^2 \,+\, {||\partial _y A^{log}({\varvec{r}}) - \hat{g}_y({\varvec{r}}) ||}^2 \, d{\varvec{r}}. \end{aligned}$$
(7)

The second regularization term is based on chromaticity similarities between adjacent rays. The basic idea is that if two neighboring rays of the same view have close chromaticity values, they have the same albedo. We use the chromaticity measure described by Chen and Koltun [13], which gives a weight \(\alpha _{{\varvec{r}},\varvec{q}}\) for how likely it is that two rays \({\varvec{r}}\) and \(\varvec{q}\) have the same albedo,

$$\begin{aligned} {\begin{matrix} \alpha _{{\varvec{r}},{\varvec{q}}} = \Big ( 1 - \frac{ {||L^{ch}({\varvec{r}}) - L^{ch}(\varvec{q})||} }{\max \limits _{{\varvec{r}}^\prime \in \varOmega ,\,{\varvec{q}}^\prime \in N_A({\varvec{r}}^\prime )} { ||L^{ch}({\varvec{r}}^\prime ) - L^{ch}({\varvec{q}}^\prime )||}} \Big ) \sqrt{L^{lum}({\varvec{r}}) L^{lum}(\varvec{q})}, \end{matrix}} \end{aligned}$$
(8)

where \(N_A({\varvec{r}})\) is a neighborhood of \({\varvec{r}}\), and \(L^{ch}\) and \(L^{lum}\) are chromaticity and luminance. The chromaticity energy

$$\begin{aligned} E_{chroma}(A^{log}) = \lambda _{chroma}\int _{{\mathcal R}} \sum _{{\varvec{q}} \in N_A({\varvec{r}})} \alpha _{{\varvec{r}},{\varvec{q}}} \, {||A^{log}( {\varvec{r}} ) - A^{log}( \varvec{q} ) ||}^2 \; d {\varvec{r}} \end{aligned}$$
(9)

now penalizes dissimilarity of albedos that have chromaticity measure \(\alpha _{{\varvec{r}},\varvec{q}}\) close to one. Note that we use a mixed continuous/discrete notation for \({\varvec{r}}\) and \({\varvec{q}}\), as our choice of neighbourhood is inherently discrete, while we require a variational rayspace model in the optimization framework, see Sect. 6.

To construct the neighborhoods \(N_A({\varvec{r}})\) for every ray \({\varvec{r}}\in {\mathcal R}\), we impose the assumption that spatially close points in \(\mathbb {R}^{3}\) are likely to have similar albedo. We select the \(k_A\) nearest neighbors in \(\mathbb {R}^{3}\) of the point P on the scene surface intersected by \({\varvec{r}}\), and choose \(m_A\) out of these \(k_A\) neighbors at random. This connectivity strategy has two advantages over fully random connectivity: by restricting to nearby points we increase the chance of finding points with similar chromaticity, and by connecting randomly within this neighborhood we avoid disconnected chromaticity clusters.
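A sketch of the chromaticity weight of Eq. (8) and of this neighborhood sampling, assuming per-ray 3D positions are available from the disparity map; the kd-tree and the random generator are implementation choices, and max_chroma_dist stands for the maximum chromaticity distance in the denominator of Eq. (8).

```python
import numpy as np
from scipy.spatial import cKDTree

# Hedged sketch: pairwise chromaticity weight (Eq. 8) and the k_A / m_A
# neighborhood sampling described in the text.
def chroma_weight(L_ch_r, L_ch_q, L_lum_r, L_lum_q, max_chroma_dist):
    d = np.linalg.norm(L_ch_r - L_ch_q)
    return (1.0 - d / max_chroma_dist) * np.sqrt(L_lum_r * L_lum_q)

def sample_albedo_neighbors(points_3d, k_A=20, m_A=5, seed=0):
    """For every scene point, pick m_A random neighbors among its k_A nearest."""
    rng = np.random.default_rng(seed)
    tree = cKDTree(points_3d)
    _, idx = tree.query(points_3d, k=k_A + 1)   # +1: the first neighbor is the point itself
    neighbors = idx[:, 1:]
    return rng.permuted(neighbors, axis=1)[:, :m_A]
```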

Shading. The shading prior is also the sum of two components, \(P_\text {shading} = E_\text {normal} + E_\text {spatial}\). To model the first component, we adopt the well-known assumption [13, 26, 35] that scene points which are spatially close to each other and share the same orientation are likely to have similar shading. To facilitate this, we construct the six-dimensional set

$$\varGamma := \left\{ \bigl ( \,P({\varvec{r}}),\; {\varvec{n}}( P({\varvec{r}}) ) \,\bigr ) \;:\; {\varvec{r}}\in {\mathcal R} \right\} ,$$

where \(P({\varvec{r}})\) is again the point of the scene surface intersected by \({\varvec{r}}\), and \({\varvec{n}}(P({\varvec{r}}))\) the corresponding outer normal. The set of neighbours \(N_S({\varvec{r}})\) now consists of the \(k_N\)-nearest neighbours of \({\varvec{r}} \in {\mathcal R}\) in the six-dimensional space \(\varGamma \). The regularization term

$$\begin{aligned} E_{normal}(s^{log}) = \lambda _{normal}\int _{{\mathcal R}} \sum _{{\varvec{q}} \in N_S({\varvec{r}})} ( s^{log}({\varvec{r}}) - s^{log}({\varvec{q}}) )^2 \; d {\varvec{r}} \end{aligned}$$
(10)

thus encourages shading components to be similar if the corresponding 3D points are spatially close to each other and their outer normals have similar orientations.

To account for indirect shading, which is caused by inter-reflections between objects in a scene, we also include a purely spatial regularization term

$$\begin{aligned} E_{spatial}(s^{log}) = \lambda _{space}\int _{{\mathcal R}} \sum _{{\varvec{q}}\in N_D({\varvec{r}})} ( s^{log}({\varvec{r}}) - s^{log}({\varvec{q}}) )^2 \; d {\varvec{r}}, \end{aligned}$$
(11)

where the neighborhood \(N_D({\varvec{r}})\) denotes the \(k_D\) nearest neighbors of the 3D scene point first intersected by \({\varvec{r}}\).
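Both neighborhoods can be built with standard nearest-neighbor queries; the sketch below assumes per-ray 3D positions and normals and uses a hypothetical weight w_n to balance position against normal coordinates in \(\varGamma \).

```python
import numpy as np
from scipy.spatial import cKDTree

# Hedged sketch of the neighborhoods behind Eqs. (10)-(11): N_S from the 6D set
# of (position, normal) pairs, N_D from 3D positions only.
def shading_neighbors(points_3d, normals, k_N=10, k_D=10, w_n=1.0):
    gamma = np.hstack([points_3d, w_n * normals])          # the set Γ from the text
    _, idx_normal = cKDTree(gamma).query(gamma, k=k_N + 1)
    _, idx_spatial = cKDTree(points_3d).query(points_3d, k=k_D + 1)
    return idx_normal[:, 1:], idx_spatial[:, 1:]           # N_S(r) and N_D(r)

def e_normal(s_log, idx_normal, lam=1.0):
    """Discrete version of Eq. (10): squared shading differences over N_S."""
    diffs = s_log[:, None] - s_log[idx_normal]
    return lam * np.sum(diffs ** 2)
```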

5 Prior for the Specular Component

In this section, we describe the specular prior in the variational energy (5). We first discuss the modeling assumptions, then show how to compute a mask for candidate specular pixels based on these assumptions, and finally construct the prior \(P_\text {spec}\).

Fig. 2. The left image shows the center view of a light field captured with a Lytro Illum camera. The right image shows the specular mask obtained by our method.

Modeling Assumptions. We combine several approaches to model specularity [2, 10, 14, 28,29,30, 36]. According to the specular motion model [28, 29], specularity changes depend on surface geometry. For instance, regions of low curvature on a specular object create color intensity changes across different views, while specular regions of high curvature result in high pixel intensities in all subaperture views. Thus, curvature information can be useful for estimating specularity. In practice, however, curvature estimation turns out to be very sensitive to inaccuracies of the 3D model of the scene: imperfect disparity maps lead to a certain amount of noise in the estimated spatial coordinates, so curvature information becomes highly unreliable. Instead of using curvature directly, we therefore propose a heuristic approach that estimates candidate regions where specularity or highlights can occur. Our main modeling assumptions are:

  • S1. Specularity is view dependent.

  • S2. If a projected 3D point has high pixel intensities and its color is constant across all subaperture views, then the point may be part of a specular surface.

  • S3. If a projected 3D point has high variation in pixel intensities, and the color of the corresponding rays changes across subaperture views, then the point may belong to a specular surface.

  • S4. If a point is classified as specular, then it is part of a specular surface, and its local neighborhood in \(\mathbb {R}^{3}\) may produce specular pixels from certain viewing angles.

  • S5. The distribution of specularity is sparse.

Potential specular objects are identified based on magnitude and variation of pixel values over different views. We compute a specular mask for the center view, and propagate it to the remaining views according to disparity.

Computing the Specular Mask. Our proposed algorithm proceeds in 4 steps:

  1.

    Let \(\varOmega _c\) be the image plane for the center view, and \(V = \{(s_1,t_1), ..., (s_N,t_N)\}\) the set of remaining N view points.

    For every \(\varvec{p} \in \varOmega _c\), we compute the vector \({\varvec{\omega }}_{\varvec{p}}\) of color intensity changes with respect to V according to

    $$\begin{aligned} \omega ^i_{\varvec{p}} = L_i(\varvec{p} + \varvec{v_i}d(\varvec{p})), \quad i = 1, \dots , N, \end{aligned}$$
    (12)

    where \(\varvec{v_i} = (s_c - s_i, t_c - t_i)\) is the view point displacement and \(d(\varvec{p})\) the estimated scalar disparity of \(\varvec{p}\).

  2.

    Identify pixels whose color and intensity vary across subaperture views, in three steps according to assumptions (S1) and (S3):

    • Filter out the percentage \(\% n_{var}\) of pixels with the lowest luminance variation \(\sigma (\varvec{\omega _p})\); we denote the set of remaining pixels by \(\varOmega ^*_c\).

    • Exclude occlusion boundaries from \(\varOmega ^*_c\). To find occlusion boundaries, we compute the k-nearest neighbors in the image domain and the corresponding spatial coordinates in \(\mathbb {R}^{3}\). If neighboring pixels in \(\varOmega ^*_c\) are far apart in \(\mathbb {R}^{3}\), with distances larger than \(d_{occ}\), we classify them as occlusion boundaries.

    • From the remaining pixels, finally exclude the percentage \(\%n_{conf}\) with the lowest confidence scores similar to the approach proposed by Tao et al. [14].

      To compute confidence, we cluster the corresponding values of \(\varvec{\omega _p}\) in two groups using K-means. Let \(\varvec{m}(\varvec{p})\) be the cluster centroid with the larger mean \(\mu (\varvec{m})\). The confidence is computed as

      $$\begin{aligned} c(\varvec{p}) = \exp \left( -\frac{1}{\sigma _{spec}^2} \left( \frac{\beta _0}{ \mu (\varvec{ m}) } + \frac{\beta _1}{ \xi (\varvec{m}) } \right) \right) , \end{aligned}$$
      (13)

      where \(\xi (\varvec{m})\) denotes the sum of all distances within the cluster.

      The confidence score grows with the mean intensity and the variation within the brightest cluster. Thus, we obtain pixels whose values vary across subaperture views. Above, \(\beta _0\) and \(\beta _1\) are user-defined parameters that control the exponential decay of the brightness and distance terms, and \(\sigma _{spec}\) scales the confidence function. We fix \(\beta _0 = 0.5, \, \beta _1 = 10^{-3}, \sigma _{spec} = 2\).

  3.

    Identify pixels where the intensity is high and the color does not change across all subaperture views, according to assumption (S2). According to Tian and Clark [36], regions with high unnormalized Wiener entropy, defined as the product of the RGB values over all pixels, are likely to be specular. We adopt their approach and identify those regions as well.

  4.

    Combine pixels found in steps 2 and 3 into the specular mask

    $$\begin{aligned} h_{mask} = {\left\{ \begin{array}{ll} 1, \, \text {specular}\\ 0, \, \text {non-specular}, \end{array}\right. } \end{aligned}$$
    (14)

    which is then grown according to assumption (S4) to include the \(k_{spec}\) nearest neighbors of each specular pixel in the initial mask.

An example specular mask for a Lytro dataset is shown in Fig. 2.
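The mask computation can be summarized in the following sketch, which operates on the per-pixel view samples \(\varvec{\omega }_{\varvec{p}}\) of Eq. (12). The occlusion-boundary test of step 2b is omitted for brevity, and the brightness/variation thresholds standing in for the Wiener-entropy criterion of step 3 are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Hedged sketch of the four-step specular mask (luminance only). omega has shape
# (P, N): for each center-view pixel p, the luminance sampled in the N other
# views along its disparity, Eq. (12).
def specular_mask(omega, n_var=0.7, n_conf=0.5, beta0=0.5, beta1=1e-3, sigma_spec=2.0):
    P, _ = omega.shape
    keep = np.argsort(omega.std(axis=1))[int(n_var * P):]   # step 2a: high variation
    conf = np.zeros(P)
    for p in keep:                                           # step 2c: confidence, Eq. (13)
        centroids, labels = kmeans2(omega[p, :, None], 2, minit='points')
        bright = int(np.argmax(centroids[:, 0]))
        members = omega[p][labels == bright]
        if members.size == 0:
            continue
        mu = members.mean() + 1e-8
        xi = np.abs(members - centroids[bright, 0]).sum() + 1e-8
        conf[p] = np.exp(-(beta0 / mu + beta1 / xi) / sigma_spec ** 2)
    varying = keep[conf[keep] >= np.quantile(conf[keep], n_conf)]
    # step 3 (S2), simplified proxy: bright and nearly constant across views
    bright_const = np.where((omega.mean(axis=1) > 0.9) & (omega.std(axis=1) < 0.05))[0]
    mask = np.zeros(P, dtype=bool)                           # step 4: combine 2 and 3
    mask[varying] = True
    mask[bright_const] = True
    return mask
```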

Final Prior on Specularity. The specular component should be non-zero only within the candidate specular region given by the mask \(h_\text {mask}\) defined above. We therefore strongly penalize non-zero values outside this region by defining the final sparsity prior as

$$\begin{aligned} P_\text {spec}(H^{log}) \;=\; \lambda _{spec} \int _{\mathcal R} \gamma _w ( 1 - h_\text {mask} ) ||H^{log}(\varvec{r})||^2 \; d {\varvec{r}} \;+\; \lambda _{sparse}|| H^{log} ||_1, \end{aligned}$$
(15)

where \(\gamma _w\gg 0\) is a constant. We include an additional sparsity norm on \(H^{log}\) to account for assumption (S5).
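For completeness, a direct discrete evaluation of Eq. (15) could look as follows (the parameter values are placeholders, not the ones used in the paper).

```python
import numpy as np

# Hedged sketch of Eq. (15): outside the candidate region (h_mask == 0) the
# specular component is strongly penalized quadratically, and an L1 term keeps
# it sparse everywhere. H_log has shape (..., 3), h_mask has shape (...).
def p_spec(H_log, h_mask, lam_spec=1.0, lam_sparse=0.1, gamma_w=1e3):
    quad = gamma_w * (1.0 - h_mask)[..., None] * H_log ** 2
    return lam_spec * quad.sum() + lam_sparse * np.abs(H_log).sum()
```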

Fig. 3. Estimated disparity maps for scenes captured with a Lytro Illum camera. From left to right: an outdoor scene with the ceramic owl, a tinfoil swan, and an indoor scene with the same owl and a candle. Disparities range between \(-1.5\) and 1.5.

6 Ray Space Regularization and Optimization

We summarize the previously defined terms of the variational energy (5) in a functional F, so that to obtain the light field decomposition we have to solve

$$\begin{aligned} \begin{aligned} \mathop {{{\mathrm{arg\,min}}}}\limits _{(A^{log},s^{log},H^{log})} \Bigl \{\; F( A^{log}, s^{log}, H^{log} ) \,+\, J( A^{log}, s^{log}) \;\Bigr \}. \end{aligned} \end{aligned}$$
(16)

As is typical in intrinsic image decomposition, the overall optimization problem is rather complex. However, a detailed look at the individual terms shows that the objective F is convex. Furthermore, our intention is to define the global smoothness term J on ray space in a way that it enforces spatial smoothness within the views as well as consistency with the disparity-induced structure on the epipolar plane images. Thus, the complete objective function exactly fits the light field optimization framework for inverse problems on ray space proposed by Goldluecke and Wanner [12]. The key advantage of this framework is its computational efficiency, since it allows solving subproblems for each epipolar plane image and each view independently. It is also generic in the sense that we just need to provide a way to compute F and the related proximity operators. We thus adopt their method to solve our problem.

Table 1. Main parameters for intrinsic image decomposition used in the implementation.

In [12], the light field regularizer J in (16) is a sum of several contributions. First, there are individual regularizers \(J_{xs}\) and \(J_{yt}\) for each epipolar plane image, which depend on the disparity map and employ an anisotropic total variation to enforce consistency of the fields in their arguments with the linear patterns on the epipolar plane images, see Fig. 1. Second, for each view, there is a regularizer \(J_{st}\); as in the basic framework in [12], we use a simple total variation term for efficiency. In future work, we intend to move to something more sophisticated here.

Fig. 4. Center view images showing the light field decomposition. The first row depicts a decomposition with our approach into albedo, shading, and specularity. The second row illustrates Chen and Koltun's algorithm [13], where the center view image is decomposed into albedo and shading with the additional input of our generated depth map. The third row illustrates the original image and the diffuse and specular images obtained by Tao et al.'s method [10]. Due to the EPI constraints and the specularity term, our shading component does not include specular highlights, unlike Chen and Koltun's result. We also removed most cast shadows from the albedo image, while the smoothness priors prevent albedo and shading discontinuities. Tao et al. detect fewer of the specular regions compared to ours: their algorithm identifies mostly the boundaries of specular regions and removes only those boundaries from the diffuse image. Disparity errors create intensity variations across subaperture views, which are erroneously classified as specularity on occlusion boundaries. Our algorithm detects the complete specular regions, since it is more robust to inaccurate disparity estimation.

Fig. 5. Center view of the outdoor scene with an origami swan made from aluminum foil. Comparing the shading and albedo images, we conclude that our algorithm detects more cast shadows than Chen and Koltun's algorithm. Our specular component contains more correctly classified glossy regions than the one produced by the algorithm of Tao et al. We observe that their approach predominantly detects the boundaries of specular regions, thus only these are removed in the generated specular-free image.

Albedo and shading are independent of the view point, thus their values should not vary between views. We therefore want \(A^{log}\) and \(s^{log}\) to be constant along the direction induced by the disparity on the epipolar plane images, except at disparity discontinuities. We also regularize both components within each individual view, as noted above. The complete regularizer can thus be written as

$$\begin{aligned} J(A^{log}, s^{log}) = \mu J_{xs}(A^{log}, s^{log}) + \mu J_{yt}(A^{log}, s^{log}) + \lambda J_{st}(A^{log}, s^{log}), \end{aligned}$$
(17)

where \(\lambda , \mu > 0\) are user-defined constants which control the amount of smoothing on the separate views and on the EPIs, respectively. The objective is convex, so we achieve global optimality. For details and the actual optimization algorithm we refer to [12].
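To illustrate the structure of Eq. (17) without reproducing the full framework of [12], the sketch below replaces the anisotropic EPI total variation by a simple directional finite difference along the disparity-induced line direction \((d,1)\) in the \((x,s)\) plane, which vanishes exactly when a field is constant along the EPI lines. This is a simplified stand-in, not the regularizer actually used.

```python
import numpy as np

# Hedged sketch of the composition of Eq. (17).
def epi_line_penalty(U_xs, d_xs):
    """U_xs: field on one EPI, shape (S, X); d_xs: disparity on that EPI."""
    dUdx = np.gradient(U_xs, axis=1)
    dUds = np.gradient(U_xs, axis=0)
    return np.sum(np.abs(d_xs * dUdx + dUds))   # small if U is constant along EPI lines

def tv_view(U):
    """Isotropic total variation of a field on a single view, shape (H, W)."""
    gy, gx = np.gradient(U)
    return np.sum(np.sqrt(gx ** 2 + gy ** 2))

def J(U_epis_xs, d_epis_xs, U_views, mu=1.0, lam=1.0):
    return (mu * sum(epi_line_penalty(u, d) for u, d in zip(U_epis_xs, d_epis_xs))
            + lam * sum(tv_view(u) for u in U_views))
```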

Fig. 6. Center view of an indoor scene with several light sources and non-trivial chromaticity. We observe a difference in the lighting of the albedo images: in our approach, the albedo image is free of illumination color, in contrast to the albedo image produced with Chen and Koltun's algorithm. A likely reason is that we first compute the illumination color and exclude it from the optimization model, while in Chen and Koltun's approach the illumination color is part of the optimization. Both shading components are specular-free, while the albedo component of Chen and Koltun's algorithm contains specularity. The specular component detected with Tao et al.'s algorithm is close to zero, which can be explained by the poor initial disparity estimation that causes erroneous pixel classification. Our approach outperforms Tao et al., since our specular detection algorithm is occlusion aware and also analyses regions where pixels have high intensities.

Fig. 7. Potential locations for specularity detected with our algorithm. The left image shows candidate specular regions for the origami swan, the right image depicts the specular mask for the owl and candle scene.

7 Results

We validate our decomposition method on light fields captured with a Lytro Illum plenoptic camera, as well as on synthetic and gantry data sets provided by Wanner et al. [37]. In the paper, we present selected results for real-world indoor and outdoor scenes; the rest is shown in the supplementary material. While benchmark datasets for evaluating intrinsic image decomposition are presented in [18, 25, 38], those data sets are designed for algorithm evaluation on single RGB or RGB+D images, or on optical flow; they are not applicable to our light field based method. Since no ground truth intrinsic light fields are available so far, we evaluate our method visually and with qualitative comparisons, deferring the rendering of a novel benchmark to future work.

To recover a 3D model and estimate normals, we perform disparity estimation with the multi-view stereo method described in [9], with an improved, more occlusion-aware data term, refined with further smoothing using a generalized total variation regularizer; see the estimated disparity maps in Fig. 3. The main algorithm parameters and their values are presented in Table 1. Our method is implemented in Matlab R2015b, with run-times measured on a PC with an Intel(R) Core i7-4790 CPU at 3.60 GHz and an NVIDIA GeForce GTX 980.

Evaluation Results. Since there are no intrinsic image decomposition algorithms that consider specularity, we compare the results of our specular term against a recent algorithm for depth estimation and specular removal developed for light field cameras by Tao et al. [10]. To compare the albedo and shading terms, we investigated recently published algorithms that employ 3D information. There are several papers where depth information is used for intrinsic image decomposition [13, 21, 26, 27]. We selected the algorithm developed by Chen and Koltun [13] to compare against, since it outperforms other algorithms that use 3D information. For both comparisons, we use the authors' implementations with default parameter settings. Figures 4, 5 and 6 show the original image together with the results of our proposed decomposition method, Chen and Koltun [13], and Tao et al. [10]. For all images, the contrast was enhanced using the Matlab function imadjust for better visualization. Figure 7 illustrates the specular masks for the origami swan and the owl-with-candle light fields.

We also compared the runtimes of the algorithms. The Chen and Koltun algorithm converges in 20–30 min for a single image, and the method by Tao et al. (including depth estimation) takes 60 min. Our approach, evaluated on a cross-hair shaped subset of 17 views from a light field with \(9 \times 9\) views in total, converges in 30–40 min, which amounts to 1.7–2.4 min per frame.

8 Conclusions

In this work, we propose the first approach towards solving the intrinsic 4D light field decomposition problem while leveraging the disparity-induced structure on the epipolar plane images. In contrast to existing intrinsic image algorithms, the dense collection of views in a light field allows us to define an additional specular term in the decomposition model, so that we can optimize over the specular component as well as albedo and shading by minimizing a single variational functional. As the inverse decomposition problem is embedded in a recent framework for light field labeling [12], we can ensure that albedo and shading estimates are consistent across all views and make use of information from all of them. Experiments demonstrate that, on challenging non-Lambertian scenes, we outperform both a state-of-the-art intrinsic image decomposition method employing additional depth information [13] and a light field based method for specular removal [10, 14].