1 Introduction

In many computer vision applications, steps such as segmentation, object detection or recognition, three-dimensional modeling, and albedo and shading analysis are adversely affected by lighting conditions. In particular, specularity can disrupt a variety of visual processing steps.

Intrinsic image decomposition (IID) is one such step: it aims to decompose an image into reflectance, shading, and illumination components, making downstream processes simpler and more robust. We find that, like other vision processes, IID algorithms are disrupted by the kinds of specularities seen in practical applications such as video surveillance. If IID is to be useful across a wide spectrum of practical applications, we require methods that are robust to specularities. We therefore develop a method for handling specularity prior to decomposing an image into reflectance and shading with IID.

Many algorithms for IID have been proposed. Land and McCann’s classic retinex method [1] was among the first to decompose images into illumination and reflectance components and remains the basic approach underlying many of today’s methods. Rother et al. [2] address the problem of decoupling material-dependent properties (reflectance, albedo, and diffuse reflectance) from light-dependent properties (specular reflectance, shading, and inter-object reflection) in a single image using a prior on the reflectance component. Similarly, Barron et al. [3] decompose images into albedo and illumination components. Barron and Malik [4] recover shape, surface color, and illumination color from a single image. Laffont et al. [5] propose a method that decomposes images of a scene captured from multiple viewpoints into intrinsic images, without a priori knowledge of scene geometry. The method uses multi-view stereo to reconstruct a 3D point cloud; although the cloud is sparse and incomplete, it provides the necessary information about illumination at each point. The authors optimize the point cloud and estimate sun visibility, decomposing illumination into sun, sky, and indirect illumination layers. Shi et al. [6] present an approach that decomposes input RGBD images into reflectance and shading images. The method assumes piecewise constant reflectance, allowing superpixel clustering of the reflectance image; pixels lying in the same superpixel are assumed to have identical reflectance, and any color variation is attributed to shading. Kang et al. [7] perform IID on hyperspectral images and extract features from the image set thus obtained, using the method of Shen et al. [8] for the IID step.

Despite the richness of this literature, most IID methods assume Lambertian surfaces and diffuse lighting and are thus not robust under more adverse conditions, especially specular reflections common in practical applications with less-than-ideal illumination, such as video surveillance.

Therefore, in this paper, we propose to handle specularities while decomposing an image into reflectance and shading components. Our method comprises two steps. In the first step, we decompose the image into a specular component and a diffuse component with a new method that improves upon the state of the art. In the second step, the resulting diffuse image is decomposed into specularity-free reflectance and shading components. In future work, we will add shadow handling and compare our pipeline with that of Fan et al. [9]. We evaluate our method quantitatively on synthetic images and qualitatively on real images. The results show that our approach succeeds in producing specularity-free reflectance and shading components. Our method may help computer vision applications such as object detection, segmentation, and tracking under adverse lighting conditions.

In the next section, we present the method in detail. In Sect. 3, we discuss our experimental setup and results, and in Sect. 4, we conclude and discuss future work.

2 Proposed Model

A schematic of our method is shown in Fig. 1. It consists of two components: specularity removal and IID. We describe each component separately in Sects. 2.1 and 2.2.

Fig. 1. Proposed model for specularity-aware intrinsic image decomposition (IID).

2.1 Specularity Removal

In the specularity removal component, we begin with the well-known dichromatic reflection model for image formation [10], which has been used in much prior work [11,12,13,14]. According to this model, the RGB color \(\varvec{V}(p)\) of an observed pixel with index p is a combination of diffuse and specular reflection

$$\begin{aligned} \varvec{V}(p) = \alpha (p)\varvec{V_d}(p) + \beta (p)\varvec{V_s}(p), \end{aligned}$$
(1)

where \(\varvec{V_d}(p)\) is the (unknown) underlying diffuse color value of pixel p, \(\varvec{V_s}(p)\) is the (unknown) underlying specular color value of pixel p, and \(\alpha (p)\) and \(\beta (p)\) represent the (unknown) contributions of diffuse and specular reflection to the observed pixel p, respectively. We use row vectors to denote RGB intensities for a pixel.

Clearly, the problem of finding \(\alpha (p)\), \(\varvec{V_d}(p)\), \(\beta (p)\), and \(\varvec{V_s}(p)\) consistent with \(\varvec{V}(p)\) is underconstrained. Our method extends that of H.L. Shen et al. [14]. First, following H.L. Shen et al., we categorize each pixel p as specular if all three of its color components exceed a threshold and as diffuse otherwise. Next, let p be the diffuse pixel with the highest value in any of its R, G, or B channels; we set \(\varvec{V_d}(p) = \varvec{V}(p)\) and \(\varvec{V_s}(p) = [0,0,0]\). Then, for each diffuse pixel q whose chromaticity (normalized RGB vector) is within a threshold distance of that of p, the method calculates \(\alpha (q)\) and \(\beta (q)\) as

$$\begin{aligned} \begin{bmatrix} \alpha (q)\\ \beta (q) \end{bmatrix} = \begin{bmatrix} \varvec{V_d}(p)^T \quad \varvec{L}^T \end{bmatrix}^{+} \varvec{V}(q)^T, \end{aligned}$$
(2)

where \(\varvec{L} = \begin{bmatrix} 255&255&255 \end{bmatrix}\) and \((\cdot )^{+}\) denotes the pseudoinverse. By assuming \(\varvec{V_d}(q) = \varvec{V_d}(p)\) for each pixel q similar to p, Eq. (2) becomes an overdetermined linear system in the two unknowns \(\alpha (q)\) and \(\beta (q)\), whose least-squares solution yields the specular contribution \(\beta (q)\varvec{L}\) directly. The process is repeated with another p not yet considered until every diffuse pixel is resolved.
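To make the solve concrete, the following is a minimal sketch, assuming NumPy, of composing an observation per Eq. (1) and recovering \(\alpha (q)\) and \(\beta (q)\) via the pseudoinverse of Eq. (2). It reflects our reading of the method; all names and numeric values are illustrative, not the authors' implementation.

```python
import numpy as np

# Assumed pure-white specular color, as in Eq. (2) of the text.
L = np.array([255.0, 255.0, 255.0])

def solve_alpha_beta(V_d_p, V_q):
    """Solve Eq. (2): [alpha, beta]^T = pinv([V_d(p)^T  L^T]) V(q)^T."""
    A = np.column_stack([V_d_p, L])        # 3x2 matrix [V_d(p)^T  L^T]
    alpha, beta = np.linalg.pinv(A) @ V_q  # least-squares solution
    return alpha, beta

# Synthetic check following Eq. (1): compose an observation, then recover.
V_d_p = np.array([120.0, 80.0, 40.0])   # hypothetical diffuse color of p
alpha0, beta0 = 0.9, 0.1                # ground-truth mixing coefficients
V_q = alpha0 * V_d_p + beta0 * L        # observed color of a similar pixel q
alpha, beta = solve_alpha_beta(V_d_p, V_q)
specular_contrib = beta * L             # recovered specular contribution
print(alpha, beta)                      # approximately 0.9 and 0.1
```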

In H.L. Shen et al.’s method, after each diffuse pixel q has been assigned values for \(\varvec{V_d}(q)\) and \(\varvec{V_s}(q)\), a similar procedure is applied to the specular pixels, assuming that the diffuse component of each specular pixel q is the same as that of the diffuse pixel p with the most similar chromaticity.

This method works well in some cases, but it fails on large, bright specularities, which are common in real-world situations. A bright specularity dominates the diffuse component of the light to the point that the original diffuse chromaticity cannot be recovered, and the assumption that a specular pixel’s diffuse component should be the same as that of the diffuse pixel with the most similar chromaticity introduces artifacts into the estimated diffuse image. An example is shown in Fig. 2.

Fig. 2. Diffuse component of an image with large, bright specularities using H.L. Shen et al.’s method [14] and our method.

To handle this issue, we propose a simple interpolation method that accurately reconstructs the diffuse component of bright specular regions. Our method uses barycentric interpolation of the RGB components of the diffuse image based on the boundary of the specular region, as determined in the first step. An example is shown in Fig. 2. The diffuse value \(\varvec{V_d}(q)\) of a pixel q within a specular region is reconstructed as

$$\begin{aligned} \varvec{V}_{\varvec{d}}(q) = \sum _{j \in B(q)}\left[ \omega _{jq} \varvec{V}_{\varvec{d}}(j)\right] , \end{aligned}$$
(3)

where B(q) represents the set of pixels on the boundary of the specular region containing pixel q, \(\varvec{V_d}(j)\) represents the RGB diffuse component of boundary pixel j, and \(\omega _{jq}\) is a barycentric weight calculated as

$$\begin{aligned} \omega _{jq} = \frac{\exp \left( -\gamma \Vert {j - q}\Vert \right) }{\sum _{j \in B(q)} \exp \left( -\gamma \Vert {j - q}\Vert \right) }. \end{aligned}$$
(4)

Here \(\Vert j - q \Vert \) represents the spatial Euclidean distance between specular pixel q and boundary pixel j, and \(\gamma > 0\) is a decay constant (the denominator of Eq. (4) performs the normalization). Large values of \(\gamma \) give high relative weights to the closest boundary points, and lower values of \(\gamma \) give more uniform weights over the boundary points. It would be possible to make \(\gamma \) adaptive to the size of the specular region in question, but we find that \(\gamma = 0.25\) gives good weight distributions for most specular region sizes. We pass the diffuse component of the input image to the next step, intrinsic image decomposition.
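Before moving on, a minimal sketch of Eqs. (3) and (4) under our reading, with illustrative names of our own choosing, might look as follows; it assumes the boundary pixels of the specular region and their diffuse colors have already been extracted.

```python
import numpy as np

def interpolate_diffuse(q, boundary_px, boundary_colors, gamma=0.25):
    """Reconstruct V_d(q) per Eqs. (3)-(4).

    q               -- (row, col) of a pixel inside the specular region
    boundary_px     -- (N, 2) array of boundary pixel coordinates B(q)
    boundary_colors -- (N, 3) array of diffuse RGB values V_d(j) on B(q)
    """
    d = np.linalg.norm(boundary_px - np.asarray(q), axis=1)  # ||j - q||
    w = np.exp(-gamma * d)
    w /= w.sum()                # normalized weights of Eq. (4)
    return w @ boundary_colors  # weighted sum of Eq. (3)
```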

2.2 Intrinsic Image Decomposition (IID)

After removing specularities, the next step is to decompose the image into reflectance and shading components. We use the well-known intrinsic image decomposition model of Rother et al. [2], which has been used by many other researchers [5, 8, 15, 16]. We adopt their basic approach following J. Shen et al. [8], assuming that after specularity removal,

$$\begin{aligned} \varvec{V_{d}}(p) = \varvec{L}(p)\varvec{R}(p), \end{aligned}$$
(5)

where \(\varvec{V_{d}}(p)\) represents an input diffuse image pixel, \(\varvec{L}(p)\) represents the diffuse illumination (shading), and \(\varvec{R}(p)\) represents the diffuse reflectance of pixel p. J. Shen et al. [8] assume that neighboring pixels with similar chromaticity and intensity will have similar reflectance, leading them to minimize the energy function

$$\begin{aligned} E(\varvec{R}, \varvec{L}) = \sum _{p} \left( \varvec{R}(p) - \sum _{q \in \mathcal{N}(p)} w_{pq} \varvec{R}(q)\right) ^2 \nonumber \\ + \sum _{p}\left( \varvec{V_{d}}(p) / \varvec{L}(p) - \varvec{R}(p)\right) ^2, \end{aligned}$$
(6)

where \(\varvec{R}\) is the reflectance image, \(\varvec{L}\) is the shading image, \(\mathcal{N}(p)\) is a local neighborhood around pixel p (for example a \(3\times 3\) window), and \(w_{pq}\) is a weight indicating the similarity of pixels p and q:

$$\begin{aligned} w_{pq}=e^{-\left[ \langle \varvec{\widetilde{V}_{d}}(p), \varvec{\widetilde{V}_{d}}(q)\rangle ^2/\rho _{pT}^2 + (Y(p) - Y(q))^2/\rho _{pY}^2\right] }, \end{aligned}$$
(7)

where \(\langle \varvec{\widetilde{V}_{d}}(p), \varvec{\widetilde{V}_{d}}(q)\rangle \) is the cosine of the angle between the normalized RGB vectors \(\varvec{\widetilde{V}_{d}}(p)\) and \(\varvec{\widetilde{V}_{d}}(q)\), and Y(p) denotes the intensity or luminosity of pixel p in \(\varvec{V}\). \(\rho ^2_{pT}\) and \(\rho ^2_{pY}\) represent the variances of the angles and intensities, respectively, around pixel p.
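For concreteness, the following is a minimal sketch of the weight in Eq. (7), taken literally as printed, for one pixel pair. The local variances \(\rho ^2_{pT}\) and \(\rho ^2_{pY}\) are passed in as precomputed values, and all names are ours, not the authors'.

```python
import numpy as np

def similarity_weight(V_d_p, V_d_q, Y_p, Y_q, rho_T2, rho_Y2):
    """Compute w_pq per Eq. (7) for neighboring pixels p and q."""
    # Cosine of the angle between the normalized RGB vectors of p and q.
    n_p = V_d_p / np.linalg.norm(V_d_p)
    n_q = V_d_q / np.linalg.norm(V_d_q)
    cos_pq = float(np.clip(n_p @ n_q, -1.0, 1.0))
    # Exponential of the negated chromaticity and intensity terms.
    return np.exp(-(cos_pq**2 / rho_T2 + (Y_p - Y_q)**2 / rho_Y2))
```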

3 Empirical Evaluation

To evaluate the method described in the previous section, we use two datasets. First, we generated a new dataset in which well-known 3D models (monkey, bunny, and teapot) are rendered with specularity using the Blender software package. For quantitative analysis using local mean squared error (LMSE) [17], we also rendered corresponding ground-truth reflectance and shading images without specularity. Second, we use a dataset of images provided by H.L. Shen et al. [14]. No ground truth is available for these images, so the results can only be evaluated qualitatively. We performed two sets of experiments: in Experiment I, we ran dataset 1 through the pipeline of Fig. 1, and in Experiment II, we ran dataset 2 through the same pipeline. We report LMSE and qualitative results for dataset 1, where ground truth is available, and qualitative results only for dataset 2, where it is not.
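For reference, the following is a hedged single-channel sketch of an LMSE-style score in the spirit of [17]: scale-invariant MSE accumulated over overlapping local windows and normalized by the energy of the ground truth. The window size and step are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

def lmse(estimate, truth, win=20, step=10):
    """Scale-invariant MSE summed over overlapping windows (one channel)."""
    total, norm = 0.0, 0.0
    rows, cols = truth.shape
    for r in range(0, rows - win + 1, step):
        for c in range(0, cols - win + 1, step):
            e = estimate[r:r + win, c:c + win].ravel()
            t = truth[r:r + win, c:c + win].ravel()
            # Best per-window scale a minimizing ||a*e - t||^2.
            a = (e @ t) / (e @ e) if (e @ e) > 0 else 0.0
            total += np.sum((a * e - t) ** 2)
            norm += np.sum(t ** 2)
    return total / norm if norm > 0 else 0.0
```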

3.1 Experiment I: Images with Specularity

As dataset 1 (consisting of synthetic images rendered by Blender) has ground truth, we compare the results of specularity removal and IID with the ground-truth reflectance and shading components obtained from Blender. A quantitative comparison is shown in Fig. 3: our proposed framework achieves lower LMSE than the method of J. Shen et al. [8].

Fig. 3. Quantitative analysis of the proposed framework using local mean squared error (LMSE).

Fig. 4. Experiment I & II results. The proposed method produces reflectance images closer to the ground truth than the original J. Shen et al. method.

Qualitative results are shown in Fig. 4. It can be seen that our proposed framework produces better specularity-free reflectance and shading images than those produced by the method of J. Shen et al. [8].

3.2 Experiment II: Images with Specularity from [14]

This experiment uses the real images provided by H.L. Shen et al. [14]. Qualitative results are shown in Fig. 4. By visual inspection, our proposed model produces better specularity-free reflectance and shading images than those produced by the method of J. Shen et al. [8].

4 Conclusion and Future Work

Adverse lighting conditions, particularly specular reflections, negatively affect important computer vision tasks such as feature detection, tracking, image segmentation, and object detection. In this paper, we propose a pipeline for removal of specularity from images prior to decomposition into reflectance and shading components. We find that the proposed framework is capable in many cases of producing accurate specularity-free reflectance and shading images from single input images. In future work, we plan to improve the specularity removal method and extend our pipeline to handle shadows. Finally, we will explore the use of these methods in the context of real-world video processing systems.