1 Introduction

The estimation of optical flow has been a core problem in computer vision for decades. Many successful methods for solving this task belong to the class of variational approaches. Based on the minimization of a continuous energy functional consisting of a data and a smoothness term, such methods offer dense and sub-pixel accurate results as well as transparent modelling. Since the pioneering work of Horn and Schunck [14], a lot of progress has been made on both the modelling and the optimization side. On the modelling side, modern smoothness priors allow the estimation of flow fields with both gradual transitions [6, 11] and sharp motion discontinuities [16, 27], while modern data terms cope with noise [4], outliers and varying illumination [11]. On the optimization side, coarse-to-fine schemes [17] have been proposed that make it possible to handle large displacements of relatively large objects. Fast motions of small objects, however, are still hard to capture.

One widely used approach to alleviate this problem is the integration of descriptor matches [8, 21, 25]. While such methods can handle arbitrarily large motion, they heavily rely on the uniqueness of the underlying descriptors. Hence, in the case of weakly textured regions or repetitive patterns, such methods are likely to produce false matches which deteriorate the final optical flow estimate. Recent approaches address this problem by applying a-posteriori regularization to a set of descriptor matches in order to improve its quality [12].

While there are scenarios where the large displacement problem of small objects is intrinsically unsolvable – e.g. in the presence of multiple non-unique instances – a surprisingly large share of large displacement cases can actually be solved even with a-priori regularization, i.e. regularization during the estimation of the matches. In order to understand in which cases large displacements can still be recovered correctly, we have to distinguish two scenarios: (i) One problem is that small objects may not be present at the coarse-to-fine level that is necessary for the estimation of the displacement [8]. This case cannot be handled by standard coarse-to-fine optimization without further data transformations [18]. (ii) Another problem – which, however, has hardly been addressed in the literature so far – is the influence of the balance between data and smoothness term on the estimation of large displacements. For small objects that undergo large displacements, it is typically cheaper to violate the constancy assumptions in the data term (due to the small spatial extent) than to violate the regularity assumptions in the smoothness term (due to the large motion gradient). This particularly holds for large values of the smoothness parameter that are typically required to obtain noise-free flow fields. So even if there is sufficient data at the appropriate coarse-to-fine level, the smoothness term will suppress the estimation of the corresponding large displacement.

In this context, Brox and Malik [8] made the observation that fast motion of high-contrast objects is more likely to be accurately estimated than the motion of low-contrast objects. This is related to the fact that there is an implicit weighting of the constancy assumptions with the corresponding image gradient, as observed in [27]. In view of the data costs, mismatches of high-contrast objects are thus more expensive than those of low-contrast objects. This, in turn, suggests using constraint normalization as in [27] when estimating large displacements.

Contributions. In this work, we address the aforementioned problem that the appropriate smoothness weight may depend on the local motion pattern. By proposing a variational method that jointly estimates and fuses candidate flows with different smoothness weights into a final flow field, we show that many large displacement scenarios can actually be resolved without using additional feature matches. In contrast to related work from the literature, which typically relies on a one-way pipeline based on a discrete fusion of pre-computed flows, we model the entire approach as a single minimization problem based on standard coarse-to-fine optimization. Moreover, we demonstrate the benefit of constraint normalization when estimating large displacements. Please note that we do not focus on designing an overall top-performing method but rather on pushing the limits of pure variational approaches w.r.t. large displacements.

Related Work. To handle large displacements, Brox and Malik [8] proposed to integrate descriptor matches into variational methods by means of a similarity term. While Stoll et al. [21] reduced the sensitivity of this strategy to outliers by restricting the integration of such matches to promising locations, Weinzaepfel et al. [25] investigated the use of improved descriptors. In contrast, Xu et al. [26] refrained from using a similarity term and proposed to enhance the upsampled flow initialization by integrating SIFT matches at each level of the coarse-to-fine optimization. In contrast to our work, all these methods rely on feature descriptors to estimate large displacements.

Tu et al. [23] used a strategy similar to that of [26], but they considered proposals generated by PatchMatch [3] and by varying the smoothness weight of a variational method. Similarly, Lempitsky et al. [15] considered flows obtained by different methods and different parameter sets in a discrete fusion approach. Neither work [15, 23], however, investigated the benefit of varying the smoothness weight for large displacement optical flow.

In all these cases, the computation of candidate matches or flows and their integration into the final result remain separate steps.

2 Baseline Method

Let us start by introducing our baseline optical flow method which is the Complementary Optic Flow method [27]. It is a variational approach where the optical flow \(\mathbf{{w}} = (u,v)^\top \) between two input color images \(\mathbf{{f}}_{{\mathrm {1}}} = (f_{{\mathrm {1}}}^1, f_{{\mathrm {1}}}^2, f_{{\mathrm {1}}}^3)^\top \) and \(\mathbf{{f}}_{{\mathrm {2}}} = (f_{{\mathrm {2}}}^1, f_{{\mathrm {2}}}^2, f_{{\mathrm {2}}}^3)^\top \) is computed as the minimizer of the following energy:

$$\begin{aligned} \mathcal {E}_{base}(\mathbf{{w}})_{\alpha } =\int _\varOmega \mathcal {E}_\mathrm {D}(\mathbf{{w}})+ \alpha \; \mathcal {E}_\mathrm {S}(\mathbf{{w}})\ d \mathbf{{x}} \;. \end{aligned}$$
(1)

Here, \(\mathcal {E}_D\) is the data term, \(\mathcal {E}_S\) is the smoothness term, \(\alpha \!>\!0\) is a balancing weight and \(\mathbf{{x}} = (x,y)^\top \in \varOmega \) is the location within the image domain \(\varOmega \subset \mathbb {R}^2\).

Data Term. The data term relates the two input images via the optical flow and is given by [27]

$$\begin{aligned} \mathcal {E}_\mathrm {D}(\mathbf{w}) = \;&\delta \, \varPsi _\mathrm {D}\!\left(\sum \limits _{c=1}^3 \Big(\sqrt{\theta ^c}\,\big( f^c_{2}(\mathbf{x}+\mathbf{w})-f^c_{1}(\mathbf{x})\big) \Big)^2\right) \\ +\;&\gamma \, \varPsi _\mathrm {D}\!\left(\sum \limits _{c=1}^3 \left|\begin{pmatrix}\sqrt{\theta ^c_x} & 0 \\ 0 & \sqrt{\theta ^c_y}\end{pmatrix} \big( \nabla f^c_{2}(\mathbf{x}+\mathbf{w})-\nabla f^c_{1}(\mathbf{x})\big) \right|^2\right) . \end{aligned}$$
(2)

It comprises the brightness constancy and the gradient constancy assumption in order to allow for illumination-robust flow estimation [7]. Moreover, to reduce the influence of large gradients, constraint normalization [20] is applied via the weights \(\theta ^c = {1}/({|\nabla f_{2}^{c}|^2 + \zeta ^2})\) and \(\theta ^c_* = {1}/({|\nabla f_{2,*}^{c}|^2 + \zeta ^2})\) (with \(* \in \{x,y\}\)), where \(\zeta \) is a regularization parameter that prevents division by zero. Finally, both assumptions are made robust against noise by applying a sub-quadratic penalizer [4] – here given by the Charbonnier function [10] \(\varPsi _D(s^2) = 2\epsilon _D^2 \sqrt{1 + s^2 / \epsilon _D^2}\) with contrast parameter \(\epsilon _\mathrm {D}\). The non-negative weights \(\delta \) and \(\gamma \) serve as balancing factors.
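To make the role of the normalization weights more tangible, the following NumPy sketch evaluates the pointwise data cost of Eq. (2) for a given warped second frame. It is only an illustration: all function and variable names are ours, warping and gradient computation are assumed to be done elsewhere, and the normalization of the gradient constancy term reuses the first-order image gradients instead of the gradients of the derivative channels.

```python
# Illustrative NumPy sketch of the pointwise data cost of Eq. (2); assumes the
# second frame and its gradients have already been warped to x + w.
import numpy as np

def charbonnier(s2, eps=0.01):
    # Psi_D(s^2) = 2*eps^2 * sqrt(1 + s^2/eps^2)
    return 2.0 * eps**2 * np.sqrt(1.0 + s2 / eps**2)

def data_cost(f1, f2w, g1x, g1y, g2xw, g2yw,
              delta=0.5, gamma=0.5, zeta=0.01, eps_d=0.01):
    """f1, f2w: (H, W, 3) images; g1*, g2*w: (H, W, 3) channel derivatives."""
    # constraint normalization weights (simplified: the paper normalizes the
    # gradient constancy term with the gradients of the derivative channels)
    theta   = 1.0 / (g2xw**2 + g2yw**2 + zeta**2)
    theta_x = 1.0 / (g2xw**2 + zeta**2)
    theta_y = 1.0 / (g2yw**2 + zeta**2)
    # brightness constancy, summed over the color channels
    bca = np.sum(theta * (f2w - f1)**2, axis=-1)
    # gradient constancy with the diagonal normalization matrix
    gca = np.sum(theta_x * (g2xw - g1x)**2 + theta_y * (g2yw - g1y)**2, axis=-1)
    return delta * charbonnier(bca, eps_d) + gamma * charbonnier(gca, eps_d)
```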

Smoothness Term. As smoothness term, we consider the anisotropic complementary smoothness term [27]

$$\begin{aligned} \mathcal {E}_\mathrm {S}(\mathbf{{w}})=\sum \limits _{i=1}^2\varPsi _{\mathrm {S}_i}\left(\left|\mathcal {J}\mathbf{{w}}\cdot \mathbf{{r}}_i\right|^2\right), \end{aligned}$$
(3)

that penalizes the directional derivatives of the flow by projecting the Jacobian \(\mathcal {J}\) onto the local directions \(\mathbf{{r}}_1\), \(\mathbf{{r}}_2\) of maximum and minimum information contrast. In this context, the directions \(\mathbf{{r}}_1\) and \(\mathbf{{r}}_2\) are the eigenvectors of the so-called regularization tensor [27] which reads

$$\begin{aligned} R_\rho = K_\rho *\sum \limits _{c=1}^3 \Big [\,\delta \; \nabla f_1^c \, {\nabla f_1^c}^\top + \gamma \; \Big (\nabla f_{1x}^c \, {\nabla f_{1x}^c}^\top + \nabla f_{1y}^c \, {\nabla f_{1y}^c}^\top \Big ) \Big ], \end{aligned}$$
(4)

where \(*\) denotes convolution with a Gaussian \(K_\rho \) of standard deviation \(\rho \).

Following [24], we apply the edge-enhancing Perona-Malik penalizer [5] given by \(\varPsi _S(s^2)=\epsilon _\mathrm {S_1}^2 \log \left( 1+s^2/\epsilon _\mathrm {S_1}^2 \right) \) in \(\mathbf{{r}}_1\)-direction and the edge-preserving Charbonnier penalizer [10] in \(\mathbf{{r}}_2\)-direction; the former with contrast parameter \(\epsilon _\mathrm {S_1}\) and the latter with contrast parameter \(\epsilon _\mathrm {S_2}\).
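The following sketch illustrates how the regularization tensor of Eq. (4) and its eigenvectors could be computed with NumPy/SciPy. It is a simplified reading of the construction, assuming finite-difference derivatives and SciPy's Gaussian filter for the convolution with \(K_\rho\); names and default parameter values are illustrative only.

```python
# Sketch: regularization tensor R_rho (Eq. 4) and its eigenvectors r_1, r_2.
import numpy as np
from scipy.ndimage import gaussian_filter

def regularization_directions(f1, delta=0.5, gamma=0.5, rho=1.0):
    """f1: (H, W, 3) color image; returns per-pixel unit vectors r1, r2."""
    H, W, _ = f1.shape
    R = np.zeros((H, W, 2, 2))
    for c in range(3):
        fy, fx = np.gradient(f1[..., c])        # first-order derivatives
        fxy, fxx = np.gradient(fx)              # gradient of f_x
        fyy, fyx = np.gradient(fy)              # gradient of f_y
        terms = [(np.sqrt(delta) * fx,  np.sqrt(delta) * fy),
                 (np.sqrt(gamma) * fxx, np.sqrt(gamma) * fxy),
                 (np.sqrt(gamma) * fyx, np.sqrt(gamma) * fyy)]
        for gx, gy in terms:                    # accumulate outer products
            R[..., 0, 0] += gx * gx
            R[..., 0, 1] += gx * gy
            R[..., 1, 0] += gx * gy
            R[..., 1, 1] += gy * gy
    for i in range(2):                          # convolution with K_rho
        for j in range(2):
            R[..., i, j] = gaussian_filter(R[..., i, j], rho)
    _, evecs = np.linalg.eigh(R)                # eigenvalues in ascending order
    r2, r1 = evecs[..., :, 0], evecs[..., :, 1] # r1: direction of max contrast
    return r1, r2
```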

3 Joint Estimation and Fusion Model

Having discussed the baseline method in the previous section, we are now in a position to describe our joint estimation and fusion model. Similar to methods from the literature that include descriptor matches [8, 21, 25], we want to estimate an optical flow \(\mathbf{{w}}_f\) using the baseline method \(\mathcal {E}_{base}\) and a similarity term \(\mathcal {E}_{sim}\) that feeds N candidate flows \(\mathbf{{w}} = \{\mathbf{{w}}_1, \ldots , \mathbf{{w}}_N\}\) from the candidate model \(\mathcal {E}_{cand}\) into the solution. To this end, we propose the joint variational model

$$\begin{aligned} \mathcal {E}(\mathbf{{w}}, \mathbf{{w}}_f) = \mathcal {E}_{base}(\mathbf{{w}}_f)_{\alpha _f} + \mathcal {E}_{sim}(\mathbf{{w}}, \mathbf{{w}}_f) + \mathcal {E}_{cand}(\mathbf{{w}}), \end{aligned}$$
(5)

that consists of three terms. On the one hand, as baseline model, we use the approach from the previous section with smoothness weight \(\alpha _f\). On the other hand, as candidate model, we consider multiple instances of the baseline model \(\mathcal {E}_{base}(\mathbf{{w}})_\alpha \) with different smoothness weights \(\alpha _i\) that estimate the corresponding candidate optical flows \(\mathbf{{w}}_i\). It is given by

$$\begin{aligned} \mathcal {E}_{cand}(\mathbf{{w}}) = \lambda _C \cdot \sum \limits _{i=1}^N \mathcal {E}_{base}(\mathbf{{w}}_i)_{\alpha _i} \end{aligned}$$
(6)

Due to the different smoothness weights, the individual instances can capture different levels of motion detail, i.e. displacement scales. Finally, in order to couple the candidate flows \(\mathbf{{w}}_i\) and the final optical flow \(\mathbf{{w}}_f\), we introduce a similarity term \(\mathcal {E}_{C}\) for each of these instances, weighted by a parameter \(\beta _i\). The combined similarity term reads

$$\begin{aligned} \mathcal {E}_{sim}(\mathbf{{w}}, \mathbf{{w}}_f) = \sum \limits _{i=1}^N \beta _i\; \mathcal {E}_C(\mathbf{{w}}, \mathbf{{w}}_f)_i\;, \end{aligned}$$
(7)

where the distinct similarity terms are defined as

$$\begin{aligned} \mathcal {E}_C(\mathbf{{w}}, \mathbf{{w}}_f)_i = \int \limits _\varOmega c_i(\mathbf{{x}}, \mathbf{{w}})\cdot \varPsi _C\left( |\mathbf{{w}}_i - \mathbf{{w}}_f|^2\right) d \mathbf{{x}} \;. \end{aligned}$$
(8)

Here, \(c_i\) is a local confidence function for the candidate flow \(\mathbf{{w}}_i\) and \(\varPsi _C\) is the Charbonnier penalizer [10] that makes the estimation more robust against outliers in the candidate flows. In Sect. 4, we will define appropriate confidence functions \(c_i\) that steer the local influence of each instance flow \(\mathbf{{w}}_i\) on the final flow \(\mathbf{{w}}_f\). The overall weight \(\lambda _C\) balances \(\mathcal {E}_{cand}(\mathbf{{w}})\) and \(\mathcal {E}_{base}(\mathbf{{w}}_f)\) by steering the direction of information flow between the candidate flows and the final flow. The larger it is, the less the estimation of the candidates \(\mathbf{{w}}\) is affected by the similarity term, so that information mainly flows from \(\mathbf{{w}}\) to \(\mathbf{{w}}_f\) via \(\mathcal {E}_{sim}\) while the backward flow of information is suppressed.
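As a small illustration of Eqs. (7) and (8), the following sketch evaluates the combined similarity energy for given candidate flows, confidence maps and weights; the integral is replaced by a sum over the pixel grid, and all names are ours.

```python
# Sketch: discrete evaluation of the combined similarity term (Eqs. 7-8).
import numpy as np

def charbonnier(s2, eps=0.01):
    return 2.0 * eps**2 * np.sqrt(1.0 + s2 / eps**2)

def similarity_energy(w_cands, w_f, confidences, betas, eps_c=0.01):
    """w_cands: list of (H, W, 2) candidate flows, w_f: (H, W, 2) final flow,
       confidences: list of (H, W) maps c_i, betas: list of weights beta_i."""
    energy = 0.0
    for w_i, c_i, beta_i in zip(w_cands, confidences, betas):
        diff2 = np.sum((w_i - w_f)**2, axis=-1)     # |w_i - w_f|^2 per pixel
        energy += beta_i * np.sum(c_i * charbonnier(diff2, eps_c))
    return energy
```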

4 Smoothness Weights and Confidence Functions

Since we desire candidate flows at different smoothness scales, two questions arise: how to choose the global smoothness weights of these flows, and how to decide locally which flow candidate is the most appropriate. Let us discuss these two issues in the following sections.

4.1 Smoothness Weights

First of all, we define a maximum smoothness weight \(\alpha _1\) which is intended to be appropriate at most locations. Moreover, we consider smoothness weights that are significantly smaller in order to be able to capture large displacement motions. Our choice for the smoothness weights \(\alpha _i\) of the flow candidates \(\mathbf{{w}}_i\) is an exponential decrease w.r.t. \(\alpha _1\):

$$\begin{aligned} \alpha _i := \frac{\alpha _1}{2^{i-1}}\;. \end{aligned}$$
(9)

With this choice, we can cover a wide range of different smoothness scales with only a small number of candidate flows. The Tennis sequence [8] depicted in Fig. 1 (top row) illustrates at which smoothness scale the different motion patterns appear: the first, smoothest flow covers the background motion and the overall motion of the tennis player, the second flow captures the motion of the racket and the arm, the third flow captures the motion of the hand and the right foot, while the fifth flow captures the motion of the ball.
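As a quick sanity check of Eq. (9), the schedule for the setting \(\alpha_1 = 2\) and \(N = 7\) used later in Sect. 6.1 can be generated as follows (a trivial sketch):

```python
alpha_1, N = 2.0, 7
alphas = [alpha_1 / 2**(i - 1) for i in range(1, N + 1)]
# -> [2.0, 1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
```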

Fig. 1. Top row: candidate flows with isotropic regularization. Bottom row: normalized visualizations of the local confidence functions \(c_1,\ldots ,c_5\). Right: final flow.

4.2 Assumptions on Local Confidences

Given a set of candidate flows \(\mathbf{{w}}_i\) at different smoothness scales, we take the considerations from the introduction into account and formulate local assumptions on how to integrate these flows into the estimation of the final flow \(\mathbf{{w}}_f\):

  1. A less smooth flow is likely to fulfill the data term better than a smoother flow, independently of whether it is reliable or not. Hence, a less smooth flow shall only have influence if it yields significantly lower data costs than both the next smoother flow candidate and the smoothest flow candidate (similar to considerations in [21]).

  2. The less smooth a flow is, the more texture is necessary in order to obtain meaningful flow vectors (similar to [8]). Otherwise, the estimation is likely to get trapped in the aperture problem.

  3. A less smooth flow should not be considered if the data is unreliable (i.e. in over- or undersaturated regions).

In order to integrate these assumptions into our local confidence functions \(c_i\), we need measures for the data cost and for the local structure. While the data costs are simply given by evaluating the data term, we compute the structure tensor [13] to measure the structureness [8]; both are evaluated on local patches to increase robustness.
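A possible way to compute the structureness measure \(s(\mathbf{x})\) is sketched below: the smaller eigenvalue of the structure tensor of the reference frame, integrated over a \(7 \times 7\) neighborhood. The use of a box filter for the patch integration and of a grayscale version of the reference frame are assumptions of this sketch.

```python
# Sketch: smaller eigenvalue of the 7x7-integrated structure tensor.
import numpy as np
from scipy.ndimage import uniform_filter

def structureness(f1_gray, patch=7):
    fy, fx = np.gradient(f1_gray)
    jxx = uniform_filter(fx * fx, patch)
    jxy = uniform_filter(fx * fy, patch)
    jyy = uniform_filter(fy * fy, patch)
    # closed-form smaller eigenvalue of [[jxx, jxy], [jxy, jyy]]
    trace, det = jxx + jyy, jxx * jyy - jxy**2
    return 0.5 * (trace - np.sqrt(np.maximum(trace**2 - 4.0 * det, 0.0)))
```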

4.3 Composition of the Local Confidence Function \(c_i\)

Following the assumptions from the last section, we model the local confidence function \(c_i\) (where i is the index of the candidate flow) as the product of three weights which will be defined in the following.

Structureness Weight. Let \(s(\mathbf x )\) be the smaller eigenvalue of the structure tensor (integrated over a \(7 \times 7\) neighborhood) of the reference frame \(f_1\), let \(\bar{s}\) be its average value over the whole image and let \(r_{i} = \frac{\alpha _1}{\alpha _i}\). The structureness weight is then defined as

$$\begin{aligned} w^s_{i}(\mathbf{x}) := \left( \frac{s(\mathbf{x})}{\bar{s}}\right)^{\kappa _s \cdot \log (r_i)}\;, \end{aligned}$$
(10)

where the exponent \(\kappa _s\) is a free parameter. Here, the structureness weight is more pronounced for less smooth candidate flows (i.e. if \(r_i\) is bigger).
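A direct transcription of Eq. (10), assuming the structureness map \(s\) from the sketch above, could look as follows:

```python
import numpy as np

def structureness_weight(s, alpha_1, alpha_i, kappa_s=0.3):
    # Eq. (10): (s(x)/s_bar)^(kappa_s * log(r_i)) with r_i = alpha_1/alpha_i;
    # for the smoothest candidate (r_i = 1) the exponent is 0 and the weight 1.
    r_i = alpha_1 / alpha_i
    return (s / np.mean(s)) ** (kappa_s * np.log(r_i))
```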

Cost Reduction Weight. Let \(\mathcal {E}_{\mathrm {D}}\) be the data costs and let \(\rho _{L\times L}(g, \mathbf{x})\) be a functional that averages the function g in an \(L \times L\) neighborhood around \(\mathbf{x}\). The following two functions describe the patch-wise energy improvement of flow \(\mathbf{w}_{i}\) compared to the previous, smoother flow \(\mathbf{w}_{i-1}\) and to the first and smoothest flow \(\mathbf{w}_{1}\), respectively:

$$\begin{aligned} \delta _{\text {prev},L}(\mathbf{x}, \mathbf{w}, i)&:= \rho _{L \times L}(\mathcal {E}_{\mathrm {D}}(\mathbf{w}_{i-1}), \mathbf{x}) - \rho _{L \times L}(\mathcal {E}_{\mathrm {D}}(\mathbf{w}_{i}), \mathbf{x})\;,\\ \delta _{\text {first},L}(\mathbf{x}, \mathbf{w}, i)&:= \rho _{L \times L}(\mathcal {E}_{\mathrm {D}}(\mathbf{w}_{1}), \mathbf{x}) - \rho _{L \times L}(\mathcal {E}_{\mathrm {D}}(\mathbf{w}_{i}), \mathbf{x})\;. \end{aligned}$$

The cost reduction weight is then defined as

$$\begin{aligned} w^d_i(\mathbf{x}) := \log \left( 1 + \mathrm{e}^{\kappa _d \left(\delta _{\text {prev},L}(\mathbf{x}, \mathbf{w}, i) + \delta _{\text {first},L}(\mathbf{x}, \mathbf{w}, i)\right)}\right), \end{aligned}$$
(11)

where \(\kappa _d\) is a free parameter. Please note that this function behaves almost linearly for large arguments of the exponential, while it approaches zero for increasingly negative arguments.
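In code, Eq. (11) is essentially a softplus of the two patch-averaged cost reductions. The sketch below uses a box filter for \(\rho_{L\times L}\) and a numerically stable log-sum-exp formulation, and assumes that the per-candidate data-cost maps have been computed beforehand; the names are ours.

```python
# Sketch: cost reduction weight of Eq. (11).
import numpy as np
from scipy.ndimage import uniform_filter

def cost_reduction_weight(data_costs, i, L=5, kappa_d=5.0):
    """data_costs: list of (H, W) maps E_D(w_1), ..., E_D(w_N), ordered from
       smoothest to least smooth; i >= 1 is the index within this list."""
    avg = lambda e: uniform_filter(e, L)                    # rho_{LxL}
    d_prev  = avg(data_costs[i - 1]) - avg(data_costs[i])   # vs. previous flow
    d_first = avg(data_costs[0])     - avg(data_costs[i])   # vs. smoothest flow
    # log(1 + exp(.)) computed stably via logaddexp
    return np.logaddexp(0.0, kappa_d * (d_prev + d_first))
```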

Data Reliability Weight. We define \(\chi _I(\mathbf x )\) as an indicator function that excludes under- and oversaturated regions. It reads

$$\begin{aligned} \chi _I(\mathbf{x}) = \left\{ \begin{array}{ll} 1 &{} \text {if } \tau< f^c_1(\mathbf{x}) < 255 - \tau \quad \forall c \in \{1,2,3\}, \\ 0 &{} \text {else}, \end{array} \right. \end{aligned}$$
(12)

where \(\tau = 1\) is a robustness threshold.
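Eq. (12) translates into a simple channel-wise mask; the sketch assumes color values in the range [0, 255]:

```python
import numpy as np

def reliability_mask(f1, tau=1.0):
    """f1: (H, W, 3) image in [0, 255]; 1 where no channel is saturated."""
    ok = (f1 > tau) & (f1 < 255.0 - tau)
    return np.all(ok, axis=-1).astype(float)
```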

Overall Confidence Function. The overall confidence functions \(c_1, \ldots , c_N\) are then defined as follows

$$\begin{aligned} \hat{c}_i(\mathbf{x}, \mathbf{w}) := w^d_i(\mathbf{x}) \cdot w^s_i(\mathbf{x}) \cdot \chi _I(\mathbf{x}), \qquad i > 1. \end{aligned}$$
(13)

In order to be numerically robust, they are bounded from above via

$$\begin{aligned} c_i(\mathbf{x}, \mathbf{w}) := \min \left( \hat{c}_i(\mathbf{x}, \mathbf{w}), 1000\right) . \end{aligned}$$
(14)

Since the smoothest flow \(\mathbf{{w}}_1\) serves as reference, it should be used everywhere except for those locations where a less smooth flow could improve the result. Hence, we define the confidence \(c_1\) of the smoothest flow as

$$\begin{aligned} c_1(\mathbf x , \mathbf w ) := 1, \end{aligned}$$
(15)

which corresponds to the confidence of the other flows in averagely structured regions with only a small energy reduction.
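Putting the pieces together, the confidence maps of Eqs. (13)–(15) could be assembled from the previous sketches as follows (again purely illustrative):

```python
import numpy as np

def confidence_maps(w_s, w_d, chi):
    """w_s, w_d: length-N lists of structureness / cost reduction weight maps
       (entry 0 is unused, since c_1 is constant); chi: reliability mask."""
    c = [np.ones_like(chi)]                              # c_1 := 1  (Eq. 15)
    for ws_i, wd_i in zip(w_s[1:], w_d[1:]):
        c.append(np.minimum(wd_i * ws_i * chi, 1000.0))  # Eqs. (13)-(14)
    return c
```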

Exemplary visualizations of these local confidence functions \(c_i\) for the Tennis sequence are shown in Fig. 1 (bottom row) where brighter values indicate higher confidence. As one can see, for each large displacement we have a high confidence in the smoothest candidate flow that is able to capture it.

5 Minimization

The whole variational model is minimized in a standard coarse-to-fine setting with warping and incremental computations [17]. Due to the nonlinearity of the penalizer functions, we additionally apply the lagged nonlinearity method in order to transform the nonlinear subproblems into series of linear systems of equations. These linear systems are then solved using a multicolor variant of the successive overrelaxation (SOR) method [1].

Please note that in Eq. 8 the flow \(\mathbf{{w}}\) appears in both the confidence functions and the coupling term. In order to avoid multiplications of unknowns during the minimization, on each coarse-to-fine level we compute the confidence functions based on the flow from the previous level. This can also be seen as a lagged nonlinearity method regarding the computation of the confidences.
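To give an impression of the solver, the sketch below shows a generic red-black (two-color) SOR sweep for a simple five-point linear system (here a Poisson-like problem). The actual linear systems arising from the lagged-nonlinearity iterations couple both flow components and carry anisotropic, data-dependent weights, so this is only an illustration of the multicolor SOR idea, not the solver used in our implementation.

```python
# Sketch: red-black SOR for a discrete Poisson-like system (Laplace(u) = b).
import numpy as np

def redblack_sor(b, omega=1.9, iters=200):
    H, W = b.shape
    u = np.zeros_like(b)                          # zero Dirichlet boundaries
    for _ in range(iters):
        for parity in (0, 1):                     # update red points, then black
            for y in range(1, H - 1):
                for x in range(1 + (y + parity) % 2, W - 1, 2):
                    gs = 0.25 * (u[y-1, x] + u[y+1, x] +
                                 u[y, x-1] + u[y, x+1] - b[y, x])
                    u[y, x] = (1.0 - omega) * u[y, x] + omega * gs
    return u
```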

6 Evaluation

In order to evaluate the performance of our method, we conducted several experiments: a qualitative comparison against LDOF [8] that investigates the large displacement capabilities of our method, an experiment that analyzes the effect of constraint normalization in this context, an experiment that evaluates the effect of different types of data costs, and a quantitative experiment on the MPI Sintel benchmark [9] that shows improvements compared to the baseline method. In all experiments, we optimized only the following parameters: the number N of candidates, the data weights \(\delta \) and \(\gamma \), and the smoothness weight \(\alpha _1\). To this end, we used the downhill simplex method as implemented in [22]. The remaining parameters were kept fixed throughout all experiments: \(\beta _i = \alpha _f = \alpha _1\), \(L = 5\), \(\lambda _C = 1000\), \(\kappa _s = 0.3\), \(\kappa _d = 5\), \(\epsilon _\mathrm {D} = 0.01\), \(\zeta = 0.01\), \(\epsilon _\mathrm {S_1} = 0.02\), \(\epsilon _\mathrm {S_2} = 0.03\).

Fig. 2. Left to right: Baseball sequence [26], Beanbags sequence [2], Bird sequence, Football sequence [26], Human Eva sequence [19]. Top to bottom: overlayed frames, baseline result, LDOF, our result (isotropic), our result (anisotropic).

6.1 Large Displacement Sequences

In our first experiment, we evaluate the performance of our method in the context of large displacements. To this end, we consider various challenging large displacement sequences from the literature and compare our results to those of the method of Brox and Malik (LDOF) [8], which introduced descriptor matching into variational methods for large displacement optical flow. The parameters for all sequences are \(\delta = \gamma = 0.5\), \(\alpha _1 = 2\) and \(N=7\) candidate flows.

In Figs. 2 and 3 we show the results of both the publicly available implementation of LDOF and our novel variational method for large displacement optical flow. As one can see, our method correctly estimates the large displacements that LDOF is able to estimate – and even some more (see e.g. Tennis sequence 496). This particularly includes the displacements of the tennis balls, which evidently exceed their size. The extremely challenging Bird sequence [26] shows the limitations of both methods, as neither of them could capture the motion of the bird’s head. In order to demonstrate that the correct estimation of large displacements does not depend on the anisotropic regularizer, we also added results for our method with an isotropic smoothness term (which is also used in LDOF).

While we kept the number of candidate flows fixed for all sequences, one may actually improve the results further by choosing it according to the extent of the large displacements. For the Beanbags sequence, a value of \(N=3\) is already sufficient, while a value of \(N=7\) is needed in order to capture the motion of the tennis ball in Tennis sequence 577.

Fig. 3. Left to right: Tennis sequences 496, 502, 538, 577 [8]. Top to bottom: overlayed frames, baseline result, LDOF, our result (isotropic), our result (anisotropic).

Fig. 4. From left to right: no constraint normalization, \(\zeta = 1\), \(\zeta = 0.1\), \(\zeta = 0.001\), \(\zeta = 0.00001\). From top to bottom: Tennis sequences 496 and 577.

6.2 Constraint Normalization

In our second experiment, we show that constraint normalization [27] is helpful in the context of large displacements. To this end, we estimated flow fields without normalization and with normalization for different values of the normalization parameter \(\zeta \). While the general benefits of constraint normalization have already been shown in [27], Fig. 4 shows the results on two large displacement sequences. As one can see, particularly for the tennis balls, both deactivating the constraint normalization and choosing a too large value of \(\zeta \) inhibit the estimation of large displacements. A too small value of \(\zeta \), in contrast, leads to noisier results. Using constraint normalization with a value of \(\zeta \) between 0.001 and 0.01 (our standard value) provides the best results.

6.3 Influence of the Data Constancy Assumptions

In our third experiment, we analyze the two types of data terms used in our model w.r.t. their data costs and their influence on the fusion scheme. While the Brightness Constancy Assumption (BCA) can produce high costs at any part of a mismatched object, the Gradient Constancy Assumption (GCA) can only produce data costs where edges are involved. It is hence a lot sparser (see Fig. 5, top row). As can be seen from the bottom row of Fig. 5, the fusion using only the GCA data term is far inferior to the results obtained with the BCA or with both data terms combined. The data costs of a pure GCA data term for incorrect matches are too low, and hence it cannot compete with the smoothness term, which prevents the motion discontinuity of a large displacement. In contrast, when including the BCA, the denser data costs make the misestimation of large displacements more expensive and thus increase the probability of estimating large displacements correctly. This shows that data costs with dense coverage of mismatched objects are important for our fusion scheme.

Fig. 5. From left to right: Brightness Constancy Assumption (BCA), Gradient Constancy Assumption (GCA), and both combined. From top to bottom: data costs of the baseline flow (brighter grey values indicate larger energies), final result.

6.4 MPI Sintel Benchmark

In our fourth experiment, we compare our strategy with the baseline method (Complementary Optical Flow [27]) on the MPI Sintel benchmark [9]. To this end, we used our method with the first-order complementary regularizer and computed results for both the training and the evaluation data.

Regarding the training data, Table 1 shows a clear improvement over the baseline (\(N = 0\)). The average endpoint error (AEE) decreases from 4.273 down to 3.974 (by 7%). This behavior is confirmed by the results for the evaluation data sets listed on the MPI Sintel webpage, where our method is denoted as ContFusion and the baseline as COF. Here, the error decreases from 6.496 to 6.263 (by 3.6%) for the clean pass and from 8.204 to 7.857 (by 4.2%) for the final pass. This shows that our novel strategy of simultaneous estimation and fusion of motion candidates is also beneficial in a quantitative sense.

Table 1. Quantitative results on the clean training data of the MPI Sintel benchmark.

6.5 Limitations

The behavior at occlusions is a limitation of our method. This can be seen both visually in the large displacement sequences (Figs. 2 and 3) and quantitatively in the unmatched EPE on the MPI Sintel benchmark (which increases compared to the baseline). In addition to regions with mismatched objects, occluded regions potentially produce high data costs. Since our confidence function heavily relies on data costs, correct smooth flows are replaced by less smooth candidate flows that lead to a smaller local data energy but are often meaningless.

7 Conclusion

In this work, we pushed the limits of variational approaches that are minimized using a standard coarse-to-fine scheme a little further w.r.t. large displacements. We have shown that many large displacement cases from the literature can be handled without the need for descriptor matches. The weaknesses of prior variational methods in these cases are not due to weak data representations at coarse resolutions but due to a balancing of data term and smoothness term that is inappropriate for large displacement optical flow estimation. With multiple instances of the baseline model and appropriate choices of weighted similarity terms, we can estimate different scales of motion within a single variational model that simultaneously estimates and fuses candidate flows with different smoothness weights. These findings were confirmed by the evaluation, which showed a good performance for large displacements and an improvement over the baseline method.

Limitations include the behavior at occluded regions where advanced occlusion handling would be necessary. Future work includes the handling of severe illumination changes where the BCA is not applicable at all and the GCA alone cannot help to estimate large displacements correctly, as well as the inclusion of second order smoothness terms for non-fronto-parallel motion patterns.