User Directed Multi-view-stereo

Doron, Yotam; Campbell, Neill D. F.; Starck, Jonathan; Kautz, Jan

doi:10.1007/978-3-319-16631-5_23

Yotam Doron¹⁵,
Neill D. F. Campbell¹⁵,
Jonathan Starck¹⁶ &
…
Jan Kautz^15,17

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 9009))

Included in the following conference series:

Asian Conference on Computer Vision

2015 Accesses
1 Citations

Abstract

Depth reconstruction from video footage and image collections is a fundamental part of many modelling and image-based rendering applications. However real-world scenes often contain limited texture information, repeated elements and other ambiguities which remain challenging for fully automatic algorithms. This paper presents a technique that combines intuitive user constraints with dense multi-view stereo reconstruction. By providing annotations in the form of simple paint strokes, a user can guide a multi-view stereo algorithm and avoid common failure cases. We show how smoothness, discontinuity and depth ordering constraints can be incorporated directly into a variational optimization framework for multi-view stereo. Our method avoids the need for heuristic approaches that edit a depth-map in a sequential process, and avoids requiring the user to accurately segment object boundaries or to directly model geometry. We show how with a small amount of intuitive input, a user may create improved depth maps in challenging cases for multi-view-stereo.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Shading-Aware Multi-view Stereo

Pixelwise View Selection for Unstructured Multi-View Stereo

A TV Prior for High-Quality Scalable Multi-View Stereo Reconstruction

Article 08 September 2016

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Multi-view-stereo (MVS) aims to reconstruct the dense geometry of a scene from a set of calibrated images. A large number of MVS methods reconstruct geometry in the form of a depth map that assigns a distance value to each pixel in a given input image. Depth maps are a useful representation of geometry with applications in image-based rendering, 3D modeling and augmented reality, and can be merged to create more complete models of a scene [1].

Depth reconstruction is a mature area of research, with methods that perform well on scenes that obey certain assumptions. The first assumption is that the scene geometry has a similar local appearance across different views, and that surfaces are sufficiently textured to distinguish between correct and incorrect geometry under projection. The second main assumption is that depth varies smoothly in regions of low image texture and that discontinuities in depth coincide with strong image edges. These are encoded in the data and smoothness terms common to almost all methods, with data terms based on photo-consistency scores and smoothness achieved by global optimization [2] or local filtering [3].

In reality, these assumptions are only partially valid, as many scenes contain objects with large areas of low-texture, specular reflections and colour discontinuities that are unrelated to changes in geometry (Fig. 1).

As a result, fully automatic methods have trouble reconstructing general scenes. Instead, we propose to enable the user to provide intuitive constraints to aid the reconstruction process. In our method we allow the user to apply depth smoothness, depth discontinuity and depth ordering constraints, all of which are specified by the user as simple scribbles. We incorporate the user’s constraints directly into a variational optimization framework that simultaneously optimizes for both a regularized depth map derived from on a photo-consistency cost volume as well as the constraints.

Our work shows how user input can help correct depth reconstruction. In this paper we make the following contributions:

We introduce a framework to incorporate user constraints, motivating our choice of algorithm by the need for interactive refinement.
We define a set of simple edit operations to correct common failure cases in multi-view-stereo. Our edit operations are intuitive, do not require the user to draw accurate object boundaries and do not rely on fixed sequential operations such as superpixel segmentation.
We demonstrate how user constraints may be directly incorporated into a state of the art variational depth reconstruction algorithm and show how this can be used to correct depth reconstruction.
We provide a clear derivation of our solver from the energy formulation.

2 Related Work

User-guided monocular shape and depth recovery. In scenarios such as 2D-3D conversion, a user may want to recover geometry given their knowledge of the image contents. Several techniques exist to interpolate empirical depth or disparity values painted onto monocular image or video data, while others use relative constraints on depth [4]. Guttmann et al. [5] train a classifier on user-scribbled disparity values provided on the first and last frames of a shot, using the results to constrain a coarse-to-fine quadratic optimization. Brosch et al. [6] also require users to paint disparity values onto the first and last frame of a sequence. Their technique attempts to maintain consistent depth over time by combining the scribbles with the results of a video over-segmentation algorithm and optical flow. In our approach, rather than directly painting depth values, the user indirectly improves the depth computed by multi-view-stereo by providing constraints.

The Depth Director system of Ward et al. [7] uses superpixel data, along with sparse structure-from-motion information to assign depth to video frames. The system provides a segmentation tool, allowing users to adjust depth variation and orientation within a region. Additionally, users may choose a template shape, such as a car, to fit to a selected region. Liao et al. [8] ask users to provide relative depth input (ordering and equality) at keyframes. The system also uses depth from SFM features where available and applies a temporal perspective depth correction stage based on optical flow. Their user-study suggests that it is more intuitive for users to specify the relative depth constraints than to paint colour-coded empirical depth values.

High level user interactions have been applied to intrinsic image decomposition and geometric modelling. Two recent methods [9, 10] include in their optimization user-provided local constraints: constant-reflectance, constant-illumination and fixed-illumination. In single-image modelling, Zhang et al. [11] optimize for a mesh satisfying user-provided normal, positional and curvature constraints. Toeppe et al. [12] propose a variational framework for single-view single-object modelling using a ballooning energy formulation. The recent work of Chen et al. [13] demonstrates modelling of complex objects with simple inputs, using symmetry constraints and automatic alignment to image structures.

User-guided stereo. The recent system of Zhang et al. [14] aims to improve the quality of disparity maps generated from a stereo video sequence by allowing a range of user edits. The user corrects disparity at keyframes indirectly with several interactive tools which alter unary and pair-wise cost terms in an label-based optimization framework. The system relies on a GrabCut-based object selection tool [15] to segment objects. The user may fit a parametric model, using sparse feature matches within a selected object, a disparity alignment tool, a smoothness brush and a discontinuity brush. The paper also presents a method for propagation of edited disparity through the sequence, with the edited disparity map and disparity costs used soft constraints in non-keyframes. This approach is related to ours in that the user edits also influence energy terms in an optimization, however our constraints do not require the user to first accurately segment object boundaries, and we target the multi-view-stero case rather than (binocular) stereo video.

Photogrammetry. There has been extensive work on image-based modelling using photogrammetry [16], with recent methods [17] providing high quality results from inaccurate user input. In the VideoTrace system [18], a user interactively creates a model from video footage. The interactions are assisted by pre-computed sparse stereo and superpixel information and the user may define symmetries to complete geometry in areas not visible in the footage.

3 Our Approach

The input to our system is a sequence of images $\lbrace I_1 \cdots I_N \rbrace $ for which the camera motion has been calculated using, for example, structure-from-motion with an off-the-shelf tool. This provides us with a projection matrix $P_i$ defining the camera pose, and intrinsic parameters such as focal length, at each image (or frame of a video sequence).

Given this calibrated image sequence, the goal of our method is to estimate a dense depth map for one or more of the images. Without loss of generality, we shall discuss calculating the depth for a single image at a time; this image shall be referred to as the source or reference image and denoted $I_\mathrm{s}$. For the source frame $I_\mathrm{s}$, a subset of adjacent frames are used to automatically reconstruct the source frame depth $D_\mathrm{s}$.

In this section we provide details on our approach to interactive depth reconstruction. We begin by providing details of how to build a photo-consistency cost volume that encodes appearance constraints from neighboring views. We then describe the basic energy model that uses this cost volume, along with regularisation enforcing smoothness constraints, to define the most likely depth map. Next, we add user constraints to modify the energy function in a principled manner to improve the quality of the depth map in an interative refinement process with the user in the loop. Finally, we provide details of the optimization scheme used to solve our energy model, including the user constraints, in an efficient manner that allows for interactive editing.

3.1 Photo-Consistency

Our depth estimation process first builds a photo-consistency cost volume. The cost volume acts as a cached data term in our optimization, as in [2] and is computed before any user interactions are introduced. We test a range of candidate depths for each pixel $x$ in the source image $I_\mathrm{s}$, spaced linearly in inverse depth (disparity). Our photo-consistency error for the pixel $x$ taking a candidate depth $d$ is defined as

$$\begin{aligned} C(x, d) = \frac{1}{K} \sum _{I_j \in \mathcal {N}(I_\mathrm{s})} \left| I_\mathrm{s}(x) - I_j (\pi _{\mathrm{s},j}(x,d)) \right| . \end{aligned}$$

(1)

Here we have used $\pi _{\mathrm{s},j}(x,d)$ to denote the back-projection from $I_\mathrm{s}$ to a depth $d$ along a ray through $x$, followed by projection into image coordinates in a neighbouring frame $I_j$, and, $I_j(x')$ to denote the interpolated colour at pixel $x'$ in $I_j$. In addition, $\mathcal {N}(I_\mathrm{s})$ denotes the local neighbourhood of frames around $I_\mathrm{s}$, and $K$ normalizes by the number of frames where the re-projected pixel is inside the image bounds.

3.2 Energy

Taking the lowest cost per pixel in the photo-consistency cost volume, i.e.

$$\begin{aligned} d(x) = \arg \min _{x} C(x, d), \end{aligned}$$

(2)

would provide us with a very noisy estimate of the depth map; instead, [2, 19], we apply spatial regularization using a variational energy formulation that combines the photo-consistency cost with a term that encourages gradient sparsity in the depth map. Formally, we define our energy as

$$\begin{aligned} E\left[ \, d(x) \,\right] = \int _{\varOmega } \lambda \,C(x, d(x)) + g(x) ~\left| \left| \nabla d(x)\right| \right| _{\epsilon } \,\mathrm {d}x \end{aligned}$$

(3)

where we use the photo-consistency cost from (1) and a Total Variation (TV) prior on the gradient of the depth map, $\nabla d(x)$, and integrate over the image domain $\varOmega $. The TV term is weighted by the inhomogeneous contrast sensitive term

$$\begin{aligned} g(x) = \exp \left( - \gamma ~\left| \left| \nabla I_\mathrm{s}\right| \right| \right) \end{aligned}$$

(4)

that encourages depth discontinuities to coincide with image intensity discontinuities. We use the Huber norm

$$\begin{aligned} ~\left| \left| \mathbf {s}\right| \right| _\epsilon = {\left\{ \begin{array}{ll} \frac{~\left| \left| \mathbf {s}\right| \right| ^2}{2\epsilon } &{} \text {if } ~\left| \left| \mathbf {s}\right| \right| \le \epsilon \\ ~\left| \left| \mathbf {s}\right| \right| - \frac{\epsilon }{2} &{} \text {if } ~\left| \left| \mathbf {s}\right| \right| > \epsilon \end{array}\right. } \end{aligned}$$

(5)

on the depth map gradient. For clarity, we will drop the dependency on $x$ in our notation such that $ d(x) \mapsto d$ and $g(x) \mapsto g$.

The energy in (3) is non-convex and difficult to optimize directly. To overcome this issue we use a quadratic relaxation, similar to [20] and [2]. We introduce an auxiliary depth variable, $v(x) \mapsto v$, and approximate (3) with the auxiliary energy $E_\mathrm{aux}\left[ \, d,v \,\right] $ as

$$\begin{aligned} \int _{\varOmega } \, g ~\left| \left| \nabla d\right| \right| _\epsilon + \frac{1}{2\theta }\,( d - v)^2 + \lambda \,C(x, v) \, \,\mathrm {d}x. \end{aligned}$$

(6)

We observe that as $\theta \rightarrow 0$ we will have $v \rightarrow d$ and thus (6) $\rightarrow $ (3).

By decoupling the regularization from the data term, we obtain two sub-problems. Fixing $v = v'$, we have a problem $\min _{ d} E_\mathrm{aux}\left[ \, d,v' \,\right] $ that is convex in $ d$ as

$$\begin{aligned} \underset{ d}{\min } \int _{\varOmega } g ~\left| \left| \nabla d\right| \right| _\epsilon + \frac{1}{2\theta }\,( d - v')^2 \,\mathrm {d}x \end{aligned}$$

(7)

and by fixing $ d = d'$ we have a problem $\min _{v} E_\mathrm{aux}\left[ \, d',v \,\right] $ that can be solved point-wise for $v$ as

$$\begin{aligned} \underset{v}{\min } \int _{\varOmega } \frac{1}{2\theta }( d - v)^2 + \lambda \,C(x, v) \,\mathrm {d}x. \end{aligned}$$

(8)

We will show how to solve this alternation optimization in Sect. 3.4 using a primal-dual saddle point technique; before this, we provide details of how to extend this standard energy model to include user constraints.

3.3 Including User Constraints

We now describe how to extend the basic energy model to include the user in the reconstruction process (user-in-the-loop). This takes to form of the user providing brush strokes from a toolbox of three constraints targeted against specific shortcomings of the standard variational approaches. We now describe each of these three tools in further detail.

Smoothness. There are regions where the photo-consistency values in the cost volume may be noisy or incorrect, e.g. the Lambertian assumption fails in the prescence of specular highlights, or image intensity discontinuities may encourage artifical depth discontinuities. In such regions we can smooth the solution from neighboring regions containing the correct depth by downweighting the photo-consistency term and relying on the gradient regularizer to fill in a smooth surface.

We maintain a brush bitmap $B_\mathrm{sm}(x) \in [0,1]$ and use it to modulate the weight on the data term in (3); we allow $\lambda $ to vary across the image and set it to $\lambda = \alpha (1 - B_\mathrm{sm})$, again dropping the explicit dependency on $x$ for clarity. We used feathered brush strokes to ensure a smooth transistion. We note that we must modify the sub-problem in (7) since the quadratic relaxation means that the depth is still influenced by $v$. We therefore include $\lambda $ in the coupling term as

$$\begin{aligned} \underset{ d}{\min } \int _{\varOmega } g ~\left| \left| \nabla d\right| \right| _\epsilon + \frac{\lambda }{2\theta }\,( d - v')^2 \,\mathrm {d}x \end{aligned}$$

(9)

such that when $\lambda \rightarrow 0$ the data term is decoupled and the regularization will take over and smooth the resulting depth map.

Boundary Discontinuities. There are the opposite cases where the solution will be too smooth in regions where there should be a discontinuity in the depth; for example, foreground and background objects with similar coloring may result in a low contrast intensity edge disguising a true discontinuity in depth. To tackle this problem we provide a second brush bitmap $B_\mathrm{dc}(x) \in [0,1]$ that increases the contrast sensitive edge term of (4) as

$$\begin{aligned} g(x) = \exp \big ( - \gamma \, (1 + \mu \, B_\mathrm{dc})~ ~\left| \left| \nabla I_\mathrm{s}\right| \right| \big ). \end{aligned}$$

(10)

We also need to downweight the data term in these regions since the photo-consistency term can lead to the phenomenon of foreground ening [3]. Whilst previous approaches have addressed this with adaptive support weights or including view selection in the optimization [21], this requires changing or re-computing the cost volume and removes the efficiency advantages of pre-caching the photo-consistency costs. Instead, we make use of the discontinuity brush and again downweight the $\lambda $ term again such that

$$\begin{aligned} \lambda = \alpha \, (1 - B_\mathrm{sm}) \, (1 - B_\mathrm{dc}). \end{aligned}$$

(11)

Ordering Constraints. Errors in the photo-consistency volume can give rise to the an incorrect local minimum where even if smoothness and discontinuity constraints are preserved, a distinct surface may appear in the wrong layer (either too close or too far from the camera). Our third tool makes use of two brush strokes where a user can select two nearby image regions and apply the constraint that one is closer to the camera than the other. Multiple instances of such pairwise constraints can be built up as necessary given the interactive reconstruction feedback available to the user.

We will illustrate with a single pair of brush strokes; we define a foreground brush $B_\mathrm{fore}(x)$ and corresponding background brush $B_\mathrm{back}(x)$ with the contraint that all the foreground brush pixels $\{x_{\mathrm{f},i}\}\in B_\mathrm{fore}$ are closer to the camera than the background brush pixels $\{x_{\mathrm{b},j}\}\in B_\mathrm{back}$. We proceed by matching each foreground pixel $x_{\mathrm{f},i}$ to the nearest pixel in the background set $x_{\mathrm{b},m(i)}$ such that

$$\begin{aligned} m(i) = \underset{j \in B_\mathrm{back}}{\arg \min } ~\left| \left| \, x_{\mathrm{f},i}- x_{\mathrm{b},j}\, \big .\right| \right| . \end{aligned}$$

(12)

This can be performed efficiently using a k-d tree. We then form a set of linear inequality constraints with a minimum threshold distance in depth $t_{\mathrm{dist}}$ that must separate the two layers which gives us that

$$\begin{aligned} \varPhi \; [\, d\,] + t_{\mathrm{dist}}\mathbf{1} < \mathbf{0} \end{aligned}$$

(13)

where $\varPhi \; [\, d\,]$ denotes

$$\begin{aligned} {\small \begin{bmatrix} \;\;0&\cdots&-1&\cdots&1&\cdots&0\;\; \\&\vdots&\vdots&\vdots&\end{bmatrix} \begin{bmatrix} \cdot \\ d(x_{\mathrm{f},i}) \\ \cdot \\ d(x_{\mathrm{b},m(i)}) \\ \cdot \end{bmatrix}.} \end{aligned}$$

(14)

We can apply this constraint to the energy model using a set of Lagrangian multipliers $\mathbf{r}\in {\mathcal {R}}^{|\{x_{\mathrm{f},i}\}|}$ to augment sub-problem (9) to

$$\begin{aligned} \underset{ d}{\min } \; \underset{\mathbf{r}}{\max } \int _{\varOmega } g ~\left| \left| \nabla d\right| \right| _\epsilon + \frac{\lambda }{2\theta }\,( d - v')^2 + \langle \, \mathbf{r}, \left( \varPhi \; [\, d\,] + t_{\mathrm{dist}}\mathbf{1} \right) \, \rangle \,\mathrm {d}x \end{aligned}$$

(15)

and maximizing with respect to $\mathbf{r}$ such that $\mathbf{r}\ge 0$.

3.4 Optimization

In Sect. 3.2 we described how to split our energy model into two sub-problems to be solved in alternation. We first consider the sub-problem, with the auxiliary variables $v$ fixed, of (15); this can be solved with a primal-dual approach [22].

Auxiliary Sub-Problem. Taking (15), we first dualize the regularisation term $f(d) = g~\left| \left| d\right| \right| _\epsilon $, with $g >0$. The Legendre-Fenchel transform of $f(\cdot )$ is given by

$$\begin{aligned} f^\star (p)= & {} g \; \underset{d}{\max }\left\{ g^{-1}\langle d, p \rangle - ~\left| \left| d\right| \right| _\epsilon \right\} \end{aligned}$$

(16)

$$\begin{aligned}= & {} \frac{\epsilon }{2g} ~\left| \left| p\right| \right| ^2 + \delta \left( \frac{p}{g}\right) \end{aligned}$$

(17)

where $\delta (\cdot )$ is the indicator function

$$\begin{aligned} \delta (p) = \left\{ \begin{array}{lr} 0 &{} \mathrm{if} ~\left| \left| p\right| \right| \le 1\\ \infty &{} \mathrm{otherwise} \end{array} \right. \end{aligned}$$

(18)

and we use the scaling property

$$\begin{aligned} f(x) = a \, h(x) \implies f^\star (p) = a \, h^\star \left( \frac{p}{a}\right) \end{aligned}$$

(19)

for $a > 0$. We then add a dual variable $\mathbf {q}(x) \mapsto \mathbf {q}$ and obtain the saddle point problem $\int _{\varOmega } L \,\mathrm {d}x$ as

$$\begin{aligned} \int _{\varOmega } \langle \mathbf {q}, \nabla d \rangle - \frac{\epsilon }{2g} ~\left| \left| \mathbf {q}\right| \right| ^2 - \delta \left( \frac{\mathbf {q}}{g}\right) + \frac{\lambda }{2\theta }( d - v)^2 + \langle \, \mathbf{r}, \left( \varPhi \; [\, d\,] + t_{\mathrm{dist}}\mathbf{1} \right) \, \rangle \,\mathrm {d}x \end{aligned}$$

(20)

that we minimize with respect to $ d$ and maximise with respect to $\mathbf {q}$ and $\mathbf{r}$. Taking partial derivatives we obtain

$$\begin{aligned} \frac{\partial L}{\partial \mathbf {q}}&= \nabla d - \frac{\epsilon }{g}\mathbf {q}\end{aligned}$$

(21)

$$\begin{aligned} \frac{\partial L}{\partial \mathbf{r}}&= \left( \varPhi \; [\, d\,] + t_{\mathrm{dist}}\mathbf{1} \right) \end{aligned}$$

(22)

$$\begin{aligned} \frac{\partial L}{\partial d}&= - \nabla \cdot \mathbf {q} + \frac{\lambda }{\theta }( d - v). \end{aligned}$$

(23)

We then discretize for descent in $ d$ and ascent in $\mathbf {q}$ and $\mathbf{r}$ and solve for $\mathbf {q}^{k+1}$, $\mathbf{r}^{k+1}$ and $ d^{k+1}$ as

$$\begin{aligned} \mathbf {q}^{k+1}&= g \, \frac{\left( \mathbf {q}^k + \sigma _q \nabla d^k\right) }{\left( g + \sigma _q \epsilon \right) }\end{aligned}$$

(24)

$$\begin{aligned} \mathbf{r}^{k+1}&= \mathbf{r}^{k} + \sigma _{r}\, \left( \varPhi \; [\, d\,] + t_{\mathrm{dist}}\mathbf{1} \right) \end{aligned}$$

(25)

$$\begin{aligned} d^{k+1}&= \frac{\left( \theta \, d^k + \sigma _d \left( \theta \, \nabla \cdot \mathbf {q}^{k+1} + \lambda v \right) \right) }{\left( \theta + \lambda \sigma _d \right) } \end{aligned}$$

(26)

with appropriate step sizes $\sigma _q$, $\sigma _{r}$ and $\sigma _d$, as in [23]; the updates are shown in Algorthm 1. We note that two projection steps are required. The $\mathbf {q}$ updates require projection into the norm ball, represented by the operation $\pi [\cdot ]$. The $\mathbf{r}$ updates require projection into the positive half-plane, respresented by $I^{+} [\cdot ]$.

Cost Volume Sub-Problem. The second sub-problem, with the primal variables $ d$ fixed, of (8) may be solved using a simple point-wise search in the cost volume. As noted in [2], this search may be accelerated by keeping track of a search depth range for each pixel over subsequent iterations. We also perform a single Newton step in $v$ in each iteration, as in [2], to obtain sub-sample accuracy within the cost volume and reduce depth quantization artefacts.

Efficiency. The overall optimization process is given in Algorithm 1. We note that each update step can be performed efficiently and in parallel for each pixel. As demonstrated by the results of [2], this allows for real-time performance when implemented on the GPU. In addition, the cost volume calculation (Eq. 1) may be performed in parallel on the GPU using the texture units to perform efficient image interpolation.

4 Results

We demonstrate the results of our system, showing the effect of the different user interactions on the reconstructed depth. In this paper we show a reconstruction of the flower and lawn dataset from Zhang et al. [24] (in Figs. 3 and 4) and a desk scene (in Fig. 2) that features a number of violations of standard MVS assumptions. In our experiments, we select the 8 closest views to the reference frame and sample 100 depth values to construct the cost volume. For the flower and lawn scenes, we use the camera pose provided by [24] and for the desk scene we use the NUKEX camera tracker [25].

Figure 2 shows the impact of the different interactions compared to the result of the baseline variational method [2]. The desk scene features a glossy non-Lambertian surface with strong image edges that do not coincide with depth discontinuities. The resulting hole in the depth map is corrected with smoothness constraints in our method. The discontinuity edits reduce the depth smearing on the top edge of the cloth in the desk scene due to low image contrast.

Figures 3 and 4 compare our results for the flower and lawn datasets to the depth maps computed by [24] and to the results of baseline variational method [2] without any additional constraints. The user annotations are shown in the last row. In the flower and lawn datasets, the method of [24] produces smooth-looking depth maps but loses a significant amount of detail, while the baseline method [2] is able to recover more detail but suffers from edge fattening artefacts and in large areas of low texture. Our method allows the user to selectively maintain detail, for example in the leaves of the flower, while imposing smoothness in other areas of the image, such as the gravel and the bottom edge of the image. In the lawn dataset, the severe artefacts in the sky and the boundaries of the figure and bench are improved.

5 Conclusion

In this paper we have shown how simple user annotations may significantly improve the quality of depth maps reconstructed from multiple images. Our method builds on state-of-the-art continuous depth recovery algorithms with the annotations modifying terms in the energy formulation, and does not require users to draw precise object boundaries or to paint absolute depth values. We believe this is particularly well-suited to mobile devices, where user input is limited to coarse strokes and image-based-rendering applications such as depth defocus are gaining popularity.

References

Campbell, N.D.F., Vogiatzis, G., Hernández, C., Cipolla, R.: Using multiple hypotheses to improve depth-maps for multi-view stereo. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 766–779. Springer, Heidelberg (2008)
Chapter Google Scholar
Newcombe, R.A., Lovegrove, S.J., Davison, A.J.: DTAM: dense tracking and mapping in real-time. In: ICCV (2013)
Google Scholar
Rhemann, C., Hosni, A., Bleyer, M., Rother, C., Gelautz, M.: Fast cost-volume filtering for visual correspondence and beyond. In: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, pp. 3017–3024. IEEE Computer Society, Washington, DC (2011)
Google Scholar
Sýkora, D., Sedlacek, D., Jinchao, S., Dingliana, J., Collins, S.: Adding depth to cartoons using sparse depth (In)equalities. Comput. Graph. Forum 29, 615–623 (2010)
Article Google Scholar
Guttmann, M., Wolf, L., Cohen-Or, D.: Semi-automatic stereo extraction from video footage. In: ICCV (2009)
Google Scholar
Brosch, N., Rhemann, C., Gelautz, M.: Segmentation-based depth propagation in videos. In: ÖAGM/AAPR (2011)
Google Scholar
Ward, B., Kang, S.B., Bennett, E.P.: Depth director: A system for adding depth to movies. IEEE Comput. Graph. Appl. 31, 36–48 (2011)
Article Google Scholar
Liao, M., Gao, J., Yang, R., Gong, M.: Video stereolization: Combining motion analysis with user interaction. IEEE Trans. Vis. Comput. Graph. 18, 1079–1088 (2012)
Article Google Scholar
Bousseau, A., Paris, S., Durand, F.: User-assisted intrinsic images. In: ACM Transactions on Graphics (TOG), vol. 28, p. 130. ACM (2009)
Google Scholar
Shen, J., Yang, X., Jia, Y., Li, X.: Intrinsic images using optimization. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3481–3487. IEEE (2011)
Google Scholar
Zhang, L.Z.L., Dugas-Phocion, G., Samson, J.S., Seitzt, S.: Single view modeling of free-form scenes. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, vol. 1 (2001)
Google Scholar
Töppe, E., Oswald, M.R., Cremers, D., Rother, C.: Image-based 3d modeling via cheeger sets. In: Kimmel, R., Klette, R., Sugimoto, A. (eds.) ACCV 2010, Part I. LNCS, vol. 6492, pp. 53–64. Springer, Heidelberg (2011)
Chapter Google Scholar
Chen, T., Zhu, Z., Shamir, A., Hu, S.M., Cohen-Or, D.: 3-sweep: extracting editable objects from a single photo. ACM Trans. Graph. (Proceedings of SIGGRAPH Asia 2013) 32, Article 195 (2013)
Google Scholar
Zhang, C., Price, B.: High-quality stereo video matching via user interaction and space-time propagation. In: 3DTV-Conference (2013)
Google Scholar
Rother, C., Kolmogorov, V., Blake, A.: “GrabCut”: Interactive foreground extraction using iterated graph cuts. In: ACM SIGGRAPH 2004 Papers, SIGGRAPH 2004, pp. 309–314. ACM, New York (2004)
Google Scholar
Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach. In: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1996, pp. 11–20. ACM, New York (1996)
Google Scholar
Arikan, M., Schwärzler, M., Flöry, S., Wimmer, M., Maierhofer, S.: O-snap: optimization-based snapping for modeling architecture. ACM Trans. Graph. 32, 6:1–6:15 (2013)
Article Google Scholar
van den Hengel, A., Dick, A., Thormählen, T., Ward, B., Torr, P.H.S.: VideoTrace: rapid interactive scene modelling from video. In: ACM Transactions on Graphics (TOG), vol. 26, p. 86. ACM (2007)
Google Scholar
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Article MATH MathSciNet Google Scholar
Steinbrücker, F., Pock, T., Cremers, D.: Large displacement optical flow computation withoutwarping. In: ICCV (2009)
Google Scholar
Zheng, E., Dunn, E., Jojic, V., Frahm, J.M.: Patchmatch based joint view selection and depthmap estimation (2014)
Google Scholar
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Article MATH MathSciNet Google Scholar
Pock, T., Chambolle, A.: Diagonal preconditioning for first order primal-dual algorithms in convex optimization. In: ICCV (2011)
Google Scholar
Zhang, G., Jia, J., Wong, T.T., Bao, H.: Consistent depth maps recovery from a video sequence. IEEE Trans. Pattern Anal. Mach. Intell. 31, 974–988 (2009)
Article Google Scholar
The Foundry: NUKEX. http://www.thefoundry.co.uk/products/nuke-product-family/nukex/ (2014)

Download references

Acknowledgements

We would like to thank Anastasios Roussos and Fabio Viola for helpful discussions. This work has been supported by the EngD VEIV Centre at UCL, The Foundry and EPSRC grant EP/I031170/1.

Author information

Authors and Affiliations

University College London, London, UK
Yotam Doron, Neill D. F. Campbell & Jan Kautz
The Foundry, London, UK
Jonathan Starck
NVIDIA Research, Guildford, UK
Jan Kautz

Authors

Yotam Doron
View author publications
You can also search for this author in PubMed Google Scholar
Neill D. F. Campbell
View author publications
You can also search for this author in PubMed Google Scholar
Jonathan Starck
View author publications
You can also search for this author in PubMed Google Scholar
Jan Kautz
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yotam Doron .

Editor information

Editors and Affiliations

Center for Visual Information Technology, International Institute of Information Technology, Hyderabad, India
C.V. Jawahar
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Shiguang Shan

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Doron, Y., Campbell, N.D.F., Starck, J., Kautz, J. (2015). User Directed Multi-view-stereo. In: Jawahar, C., Shan, S. (eds) Computer Vision - ACCV 2014 Workshops. ACCV 2014. Lecture Notes in Computer Science(), vol 9009. Springer, Cham. https://doi.org/10.1007/978-3-319-16631-5_23

Download citation

DOI: https://doi.org/10.1007/978-3-319-16631-5_23
Published: 11 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16630-8
Online ISBN: 978-3-319-16631-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

User Directed Multi-view-stereo

Abstract