Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Multi-view-stereo (MVS) aims to reconstruct the dense geometry of a scene from a set of calibrated images. A large number of MVS methods reconstruct geometry in the form of a depth map that assigns a distance value to each pixel in a given input image. Depth maps are a useful representation of geometry with applications in image-based rendering, 3D modeling and augmented reality, and can be merged to create more complete models of a scene [1].

Fig. 1.
figure 1

Overview of the advantage of our user-assisted multi-view stereo. Standard variational approaches to depth map estimation often have shortcomings when presented with real world scenes, for example the specular highlights on the brochure (a) result in holes in the depth map (b). In our approach, we provide the user with interactive tools (in the form of brush strokes) to correct for such short-comings

Depth reconstruction is a mature area of research, with methods that perform well on scenes that obey certain assumptions. The first assumption is that the scene geometry has a similar local appearance across different views, and that surfaces are sufficiently textured to distinguish between correct and incorrect geometry under projection. The second main assumption is that depth varies smoothly in regions of low image texture and that discontinuities in depth coincide with strong image edges. These are encoded in the data and smoothness terms common to almost all methods, with data terms based on photo-consistency scores and smoothness achieved by global optimization [2] or local filtering [3].

In reality, these assumptions are only partially valid, as many scenes contain objects with large areas of low-texture, specular reflections and colour discontinuities that are unrelated to changes in geometry (Fig. 1).

As a result, fully automatic methods have trouble reconstructing general scenes. Instead, we propose to enable the user to provide intuitive constraints to aid the reconstruction process. In our method we allow the user to apply depth smoothness, depth discontinuity and depth ordering constraints, all of which are specified by the user as simple scribbles. We incorporate the user’s constraints directly into a variational optimization framework that simultaneously optimizes for both a regularized depth map derived from on a photo-consistency cost volume as well as the constraints.

Our work shows how user input can help correct depth reconstruction. In this paper we make the following contributions:

  • We introduce a framework to incorporate user constraints, motivating our choice of algorithm by the need for interactive refinement.

  • We define a set of simple edit operations to correct common failure cases in multi-view-stereo. Our edit operations are intuitive, do not require the user to draw accurate object boundaries and do not rely on fixed sequential operations such as superpixel segmentation.

  • We demonstrate how user constraints may be directly incorporated into a state of the art variational depth reconstruction algorithm and show how this can be used to correct depth reconstruction.

  • We provide a clear derivation of our solver from the energy formulation.

2 Related Work

User-guided monocular shape and depth recovery. In scenarios such as 2D-3D conversion, a user may want to recover geometry given their knowledge of the image contents. Several techniques exist to interpolate empirical depth or disparity values painted onto monocular image or video data, while others use relative constraints on depth [4]. Guttmann et al. [5] train a classifier on user-scribbled disparity values provided on the first and last frames of a shot, using the results to constrain a coarse-to-fine quadratic optimization. Brosch et al. [6] also require users to paint disparity values onto the first and last frame of a sequence. Their technique attempts to maintain consistent depth over time by combining the scribbles with the results of a video over-segmentation algorithm and optical flow. In our approach, rather than directly painting depth values, the user indirectly improves the depth computed by multi-view-stereo by providing constraints.

The Depth Director system of Ward et al. [7] uses superpixel data, along with sparse structure-from-motion information to assign depth to video frames. The system provides a segmentation tool, allowing users to adjust depth variation and orientation within a region. Additionally, users may choose a template shape, such as a car, to fit to a selected region. Liao et al. [8] ask users to provide relative depth input (ordering and equality) at keyframes. The system also uses depth from SFM features where available and applies a temporal perspective depth correction stage based on optical flow. Their user-study suggests that it is more intuitive for users to specify the relative depth constraints than to paint colour-coded empirical depth values.

High level user interactions have been applied to intrinsic image decomposition and geometric modelling. Two recent methods [9, 10] include in their optimization user-provided local constraints: constant-reflectance, constant-illumination and fixed-illumination. In single-image modelling, Zhang et al. [11] optimize for a mesh satisfying user-provided normal, positional and curvature constraints. Toeppe et al. [12] propose a variational framework for single-view single-object modelling using a ballooning energy formulation. The recent work of Chen et al. [13] demonstrates modelling of complex objects with simple inputs, using symmetry constraints and automatic alignment to image structures.

User-guided stereo. The recent system of Zhang et al. [14] aims to improve the quality of disparity maps generated from a stereo video sequence by allowing a range of user edits. The user corrects disparity at keyframes indirectly with several interactive tools which alter unary and pair-wise cost terms in an label-based optimization framework. The system relies on a GrabCut-based object selection tool [15] to segment objects. The user may fit a parametric model, using sparse feature matches within a selected object, a disparity alignment tool, a smoothness brush and a discontinuity brush. The paper also presents a method for propagation of edited disparity through the sequence, with the edited disparity map and disparity costs used soft constraints in non-keyframes. This approach is related to ours in that the user edits also influence energy terms in an optimization, however our constraints do not require the user to first accurately segment object boundaries, and we target the multi-view-stero case rather than (binocular) stereo video.

Photogrammetry. There has been extensive work on image-based modelling using photogrammetry [16], with recent methods [17] providing high quality results from inaccurate user input. In the VideoTrace system [18], a user interactively creates a model from video footage. The interactions are assisted by pre-computed sparse stereo and superpixel information and the user may define symmetries to complete geometry in areas not visible in the footage.

3 Our Approach

The input to our system is a sequence of images \(\lbrace I_1 \cdots I_N \rbrace \) for which the camera motion has been calculated using, for example, structure-from-motion with an off-the-shelf tool. This provides us with a projection matrix \(P_i\) defining the camera pose, and intrinsic parameters such as focal length, at each image (or frame of a video sequence).

Given this calibrated image sequence, the goal of our method is to estimate a dense depth map for one or more of the images. Without loss of generality, we shall discuss calculating the depth for a single image at a time; this image shall be referred to as the source or reference image and denoted \(I_\mathrm{s}\). For the source frame \(I_\mathrm{s}\), a subset of adjacent frames are used to automatically reconstruct the source frame depth \(D_\mathrm{s}\).

In this section we provide details on our approach to interactive depth reconstruction. We begin by providing details of how to build a photo-consistency cost volume that encodes appearance constraints from neighboring views. We then describe the basic energy model that uses this cost volume, along with regularisation enforcing smoothness constraints, to define the most likely depth map. Next, we add user constraints to modify the energy function in a principled manner to improve the quality of the depth map in an interative refinement process with the user in the loop. Finally, we provide details of the optimization scheme used to solve our energy model, including the user constraints, in an efficient manner that allows for interactive editing.

3.1 Photo-Consistency

Our depth estimation process first builds a photo-consistency cost volume. The cost volume acts as a cached data term in our optimization, as in [2] and is computed before any user interactions are introduced. We test a range of candidate depths for each pixel \(x\) in the source image \(I_\mathrm{s}\), spaced linearly in inverse depth (disparity). Our photo-consistency error for the pixel \(x\) taking a candidate depth \(d\) is defined as

$$\begin{aligned} C(x, d) = \frac{1}{K} \sum _{I_j \in \mathcal {N}(I_\mathrm{s})} \left| I_\mathrm{s}(x) - I_j (\pi _{\mathrm{s},j}(x,d)) \right| . \end{aligned}$$
(1)

Here we have used \(\pi _{\mathrm{s},j}(x,d)\) to denote the back-projection from \(I_\mathrm{s}\) to a depth \(d\) along a ray through \(x\), followed by projection into image coordinates in a neighbouring frame \(I_j\), and, \(I_j(x')\) to denote the interpolated colour at pixel \(x'\) in \(I_j\). In addition, \(\mathcal {N}(I_\mathrm{s})\) denotes the local neighbourhood of frames around \(I_\mathrm{s}\), and \(K\) normalizes by the number of frames where the re-projected pixel is inside the image bounds.

3.2 Energy

Taking the lowest cost per pixel in the photo-consistency cost volume, i.e.

$$\begin{aligned} d(x) = \arg \min _{x} C(x, d), \end{aligned}$$
(2)

would provide us with a very noisy estimate of the depth map; instead, [2, 19], we apply spatial regularization using a variational energy formulation that combines the photo-consistency cost with a term that encourages gradient sparsity in the depth map. Formally, we define our energy as

$$\begin{aligned} E\left[ \, d(x) \,\right] = \int _{\varOmega } \lambda \,C(x, d(x)) + g(x) ~\left| \left| \nabla d(x)\right| \right| _{\epsilon } \,\mathrm {d}x \end{aligned}$$
(3)

where we use the photo-consistency cost from (1) and a Total Variation (TV) prior on the gradient of the depth map, \(\nabla d(x)\), and integrate over the image domain \(\varOmega \). The TV term is weighted by the inhomogeneous contrast sensitive term

$$\begin{aligned} g(x) = \exp \left( - \gamma ~\left| \left| \nabla I_\mathrm{s}\right| \right| \right) \end{aligned}$$
(4)

that encourages depth discontinuities to coincide with image intensity discontinuities. We use the Huber norm

$$\begin{aligned} ~\left| \left| \mathbf {s}\right| \right| _\epsilon = {\left\{ \begin{array}{ll} \frac{~\left| \left| \mathbf {s}\right| \right| ^2}{2\epsilon } &{} \text {if } ~\left| \left| \mathbf {s}\right| \right| \le \epsilon \\ ~\left| \left| \mathbf {s}\right| \right| - \frac{\epsilon }{2} &{} \text {if } ~\left| \left| \mathbf {s}\right| \right| > \epsilon \end{array}\right. } \end{aligned}$$
(5)

on the depth map gradient. For clarity, we will drop the dependency on \(x\) in our notation such that \( d(x) \mapsto d\) and \(g(x) \mapsto g\).

The energy in (3) is non-convex and difficult to optimize directly. To overcome this issue we use a quadratic relaxation, similar to [20] and [2]. We introduce an auxiliary depth variable, \(v(x) \mapsto v\), and approximate (3) with the auxiliary energy \(E_\mathrm{aux}\left[ \, d,v \,\right] \) as

$$\begin{aligned} \int _{\varOmega } \, g ~\left| \left| \nabla d\right| \right| _\epsilon + \frac{1}{2\theta }\,( d - v)^2 + \lambda \,C(x, v) \, \,\mathrm {d}x. \end{aligned}$$
(6)

We observe that as \(\theta \rightarrow 0\) we will have \(v \rightarrow d\) and thus (6\(\rightarrow \) (3).

By decoupling the regularization from the data term, we obtain two sub-problems. Fixing \(v = v'\), we have a problem \(\min _{ d} E_\mathrm{aux}\left[ \, d,v' \,\right] \) that is convex in \( d\) as

$$\begin{aligned} \underset{ d}{\min } \int _{\varOmega } g ~\left| \left| \nabla d\right| \right| _\epsilon + \frac{1}{2\theta }\,( d - v')^2 \,\mathrm {d}x \end{aligned}$$
(7)

and by fixing \( d = d'\) we have a problem \(\min _{v} E_\mathrm{aux}\left[ \, d',v \,\right] \) that can be solved point-wise for \(v\) as

$$\begin{aligned} \underset{v}{\min } \int _{\varOmega } \frac{1}{2\theta }( d - v)^2 + \lambda \,C(x, v) \,\mathrm {d}x. \end{aligned}$$
(8)

We will show how to solve this alternation optimization in Sect. 3.4 using a primal-dual saddle point technique; before this, we provide details of how to extend this standard energy model to include user constraints.

3.3 Including User Constraints

We now describe how to extend the basic energy model to include the user in the reconstruction process (user-in-the-loop). This takes to form of the user providing brush strokes from a toolbox of three constraints targeted against specific shortcomings of the standard variational approaches. We now describe each of these three tools in further detail.

Smoothness. There are regions where the photo-consistency values in the cost volume may be noisy or incorrect, e.g. the Lambertian assumption fails in the prescence of specular highlights, or image intensity discontinuities may encourage artifical depth discontinuities. In such regions we can smooth the solution from neighboring regions containing the correct depth by downweighting the photo-consistency term and relying on the gradient regularizer to fill in a smooth surface.

We maintain a brush bitmap \(B_\mathrm{sm}(x) \in [0,1]\) and use it to modulate the weight on the data term in (3); we allow \(\lambda \) to vary across the image and set it to \(\lambda = \alpha (1 - B_\mathrm{sm})\), again dropping the explicit dependency on \(x\) for clarity. We used feathered brush strokes to ensure a smooth transistion. We note that we must modify the sub-problem in (7) since the quadratic relaxation means that the depth is still influenced by \(v\). We therefore include \(\lambda \) in the coupling term as

$$\begin{aligned} \underset{ d}{\min } \int _{\varOmega } g ~\left| \left| \nabla d\right| \right| _\epsilon + \frac{\lambda }{2\theta }\,( d - v')^2 \,\mathrm {d}x \end{aligned}$$
(9)

such that when \(\lambda \rightarrow 0\) the data term is decoupled and the regularization will take over and smooth the resulting depth map.

Boundary Discontinuities. There are the opposite cases where the solution will be too smooth in regions where there should be a discontinuity in the depth; for example, foreground and background objects with similar coloring may result in a low contrast intensity edge disguising a true discontinuity in depth. To tackle this problem we provide a second brush bitmap \(B_\mathrm{dc}(x) \in [0,1]\) that increases the contrast sensitive edge term of (4) as

$$\begin{aligned} g(x) = \exp \big ( - \gamma \, (1 + \mu \, B_\mathrm{dc})~ ~\left| \left| \nabla I_\mathrm{s}\right| \right| \big ). \end{aligned}$$
(10)

We also need to downweight the data term in these regions since the photo-consistency term can lead to the phenomenon of foreground ening [3]. Whilst previous approaches have addressed this with adaptive support weights or including view selection in the optimization [21], this requires changing or re-computing the cost volume and removes the efficiency advantages of pre-caching the photo-consistency costs. Instead, we make use of the discontinuity brush and again downweight the \(\lambda \) term again such that

$$\begin{aligned} \lambda = \alpha \, (1 - B_\mathrm{sm}) \, (1 - B_\mathrm{dc}). \end{aligned}$$
(11)

Ordering Constraints. Errors in the photo-consistency volume can give rise to the an incorrect local minimum where even if smoothness and discontinuity constraints are preserved, a distinct surface may appear in the wrong layer (either too close or too far from the camera). Our third tool makes use of two brush strokes where a user can select two nearby image regions and apply the constraint that one is closer to the camera than the other. Multiple instances of such pairwise constraints can be built up as necessary given the interactive reconstruction feedback available to the user.

We will illustrate with a single pair of brush strokes; we define a foreground brush \(B_\mathrm{fore}(x)\) and corresponding background brush \(B_\mathrm{back}(x)\) with the contraint that all the foreground brush pixels \(\{x_{\mathrm{f},i}\}\in B_\mathrm{fore}\) are closer to the camera than the background brush pixels \(\{x_{\mathrm{b},j}\}\in B_\mathrm{back}\). We proceed by matching each foreground pixel \(x_{\mathrm{f},i}\) to the nearest pixel in the background set \(x_{\mathrm{b},m(i)}\) such that

$$\begin{aligned} m(i) = \underset{j \in B_\mathrm{back}}{\arg \min } ~\left| \left| \, x_{\mathrm{f},i}- x_{\mathrm{b},j}\, \big .\right| \right| . \end{aligned}$$
(12)

This can be performed efficiently using a k-d tree. We then form a set of linear inequality constraints with a minimum threshold distance in depth \(t_{\mathrm{dist}}\) that must separate the two layers which gives us that

$$\begin{aligned} \varPhi \; [\, d\,] + t_{\mathrm{dist}}\mathbf{1} < \mathbf{0} \end{aligned}$$
(13)

where \(\varPhi \; [\, d\,]\) denotes

$$\begin{aligned} {\small \begin{bmatrix} \;\;0&\cdots&-1&\cdots&1&\cdots&0\;\; \\&\vdots&\vdots&\vdots&\end{bmatrix} \begin{bmatrix} \cdot \\ d(x_{\mathrm{f},i}) \\ \cdot \\ d(x_{\mathrm{b},m(i)}) \\ \cdot \end{bmatrix}.} \end{aligned}$$
(14)

We can apply this constraint to the energy model using a set of Lagrangian multipliers \(\mathbf{r}\in {\mathcal {R}}^{|\{x_{\mathrm{f},i}\}|}\) to augment sub-problem (9) to

$$\begin{aligned} \underset{ d}{\min } \; \underset{\mathbf{r}}{\max } \int _{\varOmega } g ~\left| \left| \nabla d\right| \right| _\epsilon + \frac{\lambda }{2\theta }\,( d - v')^2 + \langle \, \mathbf{r}, \left( \varPhi \; [\, d\,] + t_{\mathrm{dist}}\mathbf{1} \right) \, \rangle \,\mathrm {d}x \end{aligned}$$
(15)

and maximizing with respect to \(\mathbf{r}\) such that \(\mathbf{r}\ge 0\).

3.4 Optimization

In Sect. 3.2 we described how to split our energy model into two sub-problems to be solved in alternation. We first consider the sub-problem, with the auxiliary variables \(v\) fixed, of (15); this can be solved with a primal-dual approach [22].

Auxiliary Sub-Problem. Taking (15), we first dualize the regularisation term \(f(d) = g~\left| \left| d\right| \right| _\epsilon \), with \(g >0\). The Legendre-Fenchel transform of \(f(\cdot )\) is given by

$$\begin{aligned} f^\star (p)= & {} g \; \underset{d}{\max }\left\{ g^{-1}\langle d, p \rangle - ~\left| \left| d\right| \right| _\epsilon \right\} \end{aligned}$$
(16)
$$\begin{aligned}= & {} \frac{\epsilon }{2g} ~\left| \left| p\right| \right| ^2 + \delta \left( \frac{p}{g}\right) \end{aligned}$$
(17)

where \(\delta (\cdot )\) is the indicator function

$$\begin{aligned} \delta (p) = \left\{ \begin{array}{lr} 0 &{} \mathrm{if} ~\left| \left| p\right| \right| \le 1\\ \infty &{} \mathrm{otherwise} \end{array} \right. \end{aligned}$$
(18)

and we use the scaling property

$$\begin{aligned} f(x) = a \, h(x) \implies f^\star (p) = a \, h^\star \left( \frac{p}{a}\right) \end{aligned}$$
(19)

for \(a > 0\). We then add a dual variable \(\mathbf {q}(x) \mapsto \mathbf {q}\) and obtain the saddle point problem \(\int _{\varOmega } L \,\mathrm {d}x\) as

$$\begin{aligned} \int _{\varOmega } \langle \mathbf {q}, \nabla d \rangle - \frac{\epsilon }{2g} ~\left| \left| \mathbf {q}\right| \right| ^2 - \delta \left( \frac{\mathbf {q}}{g}\right) + \frac{\lambda }{2\theta }( d - v)^2 + \langle \, \mathbf{r}, \left( \varPhi \; [\, d\,] + t_{\mathrm{dist}}\mathbf{1} \right) \, \rangle \,\mathrm {d}x \end{aligned}$$
(20)

that we minimize with respect to \( d\) and maximise with respect to \(\mathbf {q}\) and \(\mathbf{r}\). Taking partial derivatives we obtain

$$\begin{aligned} \frac{\partial L}{\partial \mathbf {q}}&= \nabla d - \frac{\epsilon }{g}\mathbf {q}\end{aligned}$$
(21)
$$\begin{aligned} \frac{\partial L}{\partial \mathbf{r}}&= \left( \varPhi \; [\, d\,] + t_{\mathrm{dist}}\mathbf{1} \right) \end{aligned}$$
(22)
$$\begin{aligned} \frac{\partial L}{\partial d}&= - \nabla \cdot \mathbf {q} + \frac{\lambda }{\theta }( d - v). \end{aligned}$$
(23)

We then discretize for descent in \( d\) and ascent in \(\mathbf {q}\) and \(\mathbf{r}\) and solve for \(\mathbf {q}^{k+1}\), \(\mathbf{r}^{k+1}\) and \( d^{k+1}\) as

$$\begin{aligned} \mathbf {q}^{k+1}&= g \, \frac{\left( \mathbf {q}^k + \sigma _q \nabla d^k\right) }{\left( g + \sigma _q \epsilon \right) }\end{aligned}$$
(24)
$$\begin{aligned} \mathbf{r}^{k+1}&= \mathbf{r}^{k} + \sigma _{r}\, \left( \varPhi \; [\, d\,] + t_{\mathrm{dist}}\mathbf{1} \right) \end{aligned}$$
(25)
$$\begin{aligned} d^{k+1}&= \frac{\left( \theta \, d^k + \sigma _d \left( \theta \, \nabla \cdot \mathbf {q}^{k+1} + \lambda v \right) \right) }{\left( \theta + \lambda \sigma _d \right) } \end{aligned}$$
(26)

with appropriate step sizes \(\sigma _q\), \(\sigma _{r}\) and \(\sigma _d\), as in [23]; the updates are shown in Algorthm 1. We note that two projection steps are required. The \(\mathbf {q}\) updates require projection into the norm ball, represented by the operation \(\pi [\cdot ]\). The \(\mathbf{r}\) updates require projection into the positive half-plane, respresented by \(I^{+} [\cdot ]\).

figure a

Cost Volume Sub-Problem. The second sub-problem, with the primal variables \( d\) fixed, of (8) may be solved using a simple point-wise search in the cost volume. As noted in [2], this search may be accelerated by keeping track of a search depth range for each pixel over subsequent iterations. We also perform a single Newton step in \(v\) in each iteration, as in [2], to obtain sub-sample accuracy within the cost volume and reduce depth quantization artefacts.

Efficiency. The overall optimization process is given in Algorithm 1. We note that each update step can be performed efficiently and in parallel for each pixel. As demonstrated by the results of [2], this allows for real-time performance when implemented on the GPU. In addition, the cost volume calculation (Eq. 1) may be performed in parallel on the GPU using the texture units to perform efficient image interpolation.

4 Results

We demonstrate the results of our system, showing the effect of the different user interactions on the reconstructed depth. In this paper we show a reconstruction of the flower and lawn dataset from Zhang et al. [24] (in Figs. 3 and 4) and a desk scene (in Fig. 2) that features a number of violations of standard MVS assumptions. In our experiments, we select the 8 closest views to the reference frame and sample 100 depth values to construct the cost volume. For the flower and lawn scenes, we use the camera pose provided by [24] and for the desk scene we use the NUKEX camera tracker [25].

Figure 2 shows the impact of the different interactions compared to the result of the baseline variational method [2]. The desk scene features a glossy non-Lambertian surface with strong image edges that do not coincide with depth discontinuities. The resulting hole in the depth map is corrected with smoothness constraints in our method. The discontinuity edits reduce the depth smearing on the top edge of the cloth in the desk scene due to low image contrast.

Figures 3 and 4 compare our results for the flower and lawn datasets to the depth maps computed by [24] and to the results of baseline variational method [2] without any additional constraints. The user annotations are shown in the last row. In the flower and lawn datasets, the method of [24] produces smooth-looking depth maps but loses a significant amount of detail, while the baseline method [2] is able to recover more detail but suffers from edge fattening artefacts and in large areas of low texture. Our method allows the user to selectively maintain detail, for example in the leaves of the flower, while imposing smoothness in other areas of the image, such as the gravel and the bottom edge of the image. In the lawn dataset, the severe artefacts in the sky and the boundaries of the figure and bench are improved.

Fig. 2.
figure 2

Our results for the desk dataset. We demonstrate that considerable improvement can be made over the basic model by adding the user to the reconstruction loop. We show our full result in (c) for comparison to the basic model result in (b). We have filled in the holes in the reconstruction and improved the quality and ordering of the discontinuities. We show the individual brush strokes in (d)–(f) and results for subsets of the brush strokes in (g)–(i).

Fig. 3.
figure 3

Results on the flower dataset of [24]. By controlling the smoothness and discontinuity terms locally, a user can obtain a smooth result while selectively maintaining fine detail, illustrated here by the detail in the leaves. The depth maps in (e) and (f) were adjusted globally with a colour-curve tool for visualisation.

Fig. 4.
figure 4

Results on the lawn dataset of [24]. Both [24] and [2] fail to recover depth discontinuities in areas of low contrast between the bench and the lawn. The baseline method cannot determine depth for the textureless sky region, while [24] loses detail in the background trees and enlarges foreground objects such as the legs and sides of the bench. These problem areas are improved with user annotations.

5 Conclusion

In this paper we have shown how simple user annotations may significantly improve the quality of depth maps reconstructed from multiple images. Our method builds on state-of-the-art continuous depth recovery algorithms with the annotations modifying terms in the energy formulation, and does not require users to draw precise object boundaries or to paint absolute depth values. We believe this is particularly well-suited to mobile devices, where user input is limited to coarse strokes and image-based-rendering applications such as depth defocus are gaining popularity.