Abstract
For the last 40 years, optical flow (OF) estimation has been challenging the computer vision community. OF is the apparent motion of the pixels in two consecutive images. Variational models are the most used technique to estimate OF, stating an energy model of the OF estimation error. This energy is minimized with an iterative coarse-to-fine approach. In each iteration, the OF is processed with an intermediate filter to eliminate outliers. In the literature, there are studies related to the number of warpings, the number of considered scales, and their effect on the optical flow estimation. However, there is no work that studies in depth the effect of different intermediate filters on the optical flow estimation performance. That fact motivates us to compare the effect of different filters on optical flow estimation. Taking an OF model that is robust to large displacements and illumination changes, we tested it with four different filters: Bilateral filter (BF), Median filter (MF), Weighted median filter (WMF), and a new filter called the Balanced median filter (BMF). The BMF is a weighted sum of the Median filter and the Bilateral filter using an adaptive weight map that balances the optical flow estimation error due to the Median filter or the Bilateral filter. We assessed these four filters on the state-of-the-art MPI-Sintel training set, which contains realistic synthetic scenes presenting blur, fog, illumination changes, and sequences with significant displacement. Our experimental evaluation shows that the Balanced median filter performs better than the other three filters. When we validated our results on the MPI-Sintel benchmark webpage, we observed that our model performs better than classical methods such as Horn-Schunck, TVL1, and LDOF. Besides, it performs similarly to current optical flow estimation methods such as GeoFlow, CPNFlow, and FLOW_UFO.
3.1 Introduction
Optical Flow estimation is one of the most challenging problems in computer vision. Optical flow is defined as the per-pixel motion between two consecutive digital images. Optical flow has many applications, such as video post-production, particle velocimetry, video compression, control of autonomous vehicles, and many others. In Fig. 3.1, we show an example of these applications. A transparent plastic model full of water containing black plastic tracers is used to estimate the fluid’s velocity.
In Fig. 3.1, we show velocity estimation inside a process showing an oxygen inlet and small black plastic tracers. Oxygen is injected from the left side of the plastic model, causing the fluid to move. Tracer particles move at the same velocity as the fluid. This application aims to determine the location where maximum velocity is reached to predict erosion of internal walls.
In Fig. 3.2, we show the second application of the optical flow. We show two consecutive images. In these images, we show a person riding a bike that moves to the left. We estimated the optical flow between the two images; with this optical flow, we know the displacement for each pixel, so an image in between can be created. We show in Fig. 3.2c an interpolation of two images using optical flow.
3.1.1 Related Works
Since the seminal work of [1], many contributions have been made to improve optical flow estimation. In that work, the authors proposed a variational model to estimate the optical flow: an energy model of the optical flow estimation error, whose minimizing argument is the optical flow of the image sequence. The model uses a quadratic error, which makes it susceptible to outliers and the presence of noise. Zach et al. [2] proposed another model based on the absolute value of the error. All these models minimize the energy error model iteratively. In each iteration, some of them filter the optical flow estimate to eliminate noise or outliers, preventing them from propagating across the iterations.
Bidimensional Empirical Mode Decomposition (BEMD), presented in [3], is a method to decompose a 2D signal into its frequency modes. It was applied to texture extraction and also to 2D signal filtering. The algorithm consists of the extraction of features at multiple scales or spatial frequencies. These features, called intrinsic mode functions, are extracted by sifting. The authors perform this sifting using morphological operators to detect regional maxima and radial basis functions for surface interpolation. They demonstrate the efficiency of their proposal with synthetic and natural images.
3.1.2 Optical Flow Filtering
In [4], the authors integrated a Median filter in the optical flow model. Their proposal formalizes the Median filter with a new model that integrates OF estimation over a local neighborhood. In [5], the Median filter of size \(3 \times 3\) is used to eliminate irregularities of the optical flow and also noisy estimation. In [6], to enhance the OF estimation, a bilateral filter is used. In that work, the authors replaced the anisotropic diffusion of their proposed model with a novel multi-cue-driven bilateral filter that considers the estimated occlusion.
Dérian et al. [7] follow another approach, presenting an optical flow model based on wavelet analysis. The multi-resolution approach used in optical flow estimation is similar to the multi-resolution scheme used in wavelet analysis. The authors constructed a scale-space representation of the optical flow; furthermore, they provide a mechanism to locally approximate the optical flow using high-order polynomials by truncating wavelets at fine scales. This methodology was not evaluated on a contemporaneous dataset but on video sequences of moving fluids.
Motivation
In [8], an optical flow study is presented. This study considers optical flow estimation using an image pyramid and also provides a theoretical analysis of warping, but it does not study intermediate filters. The study in [9] varies the number of warpings, the number of image pyramid levels, and the influence of parameters, but does not take intermediate filters into account either. These facts motivate us to compare the optical flow estimation performance obtained with different intermediate filters.
3.1.3 Contribution of this Work
In this study, we use the optical flow estimation model proposed in [5]. This model is based on the absolute value of the optical flow estimation error (\(TV-L^1\)) and, thanks to other components, is robust to illumination changes and medium displacements. We present, in this work, the following contributions:
- (a) An evaluation of the OF estimation performance considering four intermediate filters: bilateral filter, median filter, weighted median filter, and balanced median filter.
- (b) A proposed adaptive, weighted sum of the bilateral and the median filter.
- (c) A final evaluation on the complete MPI-Sintel test dataset, with the results submitted to the MPI-Sintel benchmark web page [10].
In Sect. 3.2, we explain the principal strategies used to estimate the optical flow. We briefly explain the linearization of the optical flow constraint, image warping, and the image pyramid, in order to make our manuscript self-contained. In Sect. 3.3, we explain the filters considered in this work. In Sect. 3.4, we explain the experiments and the dataset used. In Sect. 3.5, we present our results and a brief discussion of other methodologies. Finally, in Sect. 3.6, we present our conclusions and future work.
3.2 Preliminary
In order to state a model of the optical flow estimation, we consider two consecutive color images \(I_0\) (reference image) and \(I_1\) (target image), where \(I_0\), \(I_1: \varOmega \rightarrow \mathbb {R}^3\) and \(\varOmega \) is a rectangular image domain; let \(\textbf{u}:\varOmega \rightarrow \mathbb {R}^2\) be the optical flow between these two consecutive images (reference and target), where \(\textbf{u}(\textbf{x})=(u_1(\textbf{x}),u_2(\textbf{x}))\) has two components, that is to say, \(u_1, u_2:\varOmega \rightarrow \mathbb {R}\). Optical flow estimation aims to determine a motion field \(\textbf{u}(\textbf{x})\) such that \(I_0(\textbf{x})\) and \(I_1( \textbf{x} + \textbf{u}(\textbf{x}))\) are measures of the same pixel \(\textbf{x}\), i.e.:
\[I_0(\textbf{x}) = I_1(\textbf{x} + \textbf{u}(\textbf{x})). \tag{3.1}\]
Equation 3.1 is called the color constancy constraint in the literature. Equation 3.1 is highly non-linear, so a linearization is applied around a known optical flow \(\textbf{u}_0(\textbf{x})\).
3.2.1 Linearized Color Constancy Constraint
Considering a known value \(\mathbf {u_0}(\textbf{x})\) of the optical flow, a Taylor expansion is valid:
\[I_1(\textbf{x}+\textbf{u}(\textbf{x})) \approx I_1(\textbf{x}+\textbf{u}_0(\textbf{x})) + \langle \nabla I_1(\textbf{x}+\textbf{u}_0(\textbf{x})),\, \textbf{u}(\textbf{x})-\textbf{u}_0(\textbf{x}) \rangle ,\]
with \(\langle , \rangle \) being the scalar product, \(I_1(\textbf{x}+\textbf{u}_0(\textbf{x}))\) the warped image (\(I_1(\textbf{x})\)) by a known optical flow \(\textbf{u}_0(\textbf{x})\), and \(\nabla I_1(\textbf{x} +\textbf{u}_0(\textbf{x}) )\) the gradient vector of \(I_1(\textbf{x}+\textbf{u}_0(\textbf{x}))\).
3.2.2 Image Warping
Image warping is a process of image manipulation such that it distorts any shape contained in the image. Given an image \(I(\textbf{x})\) and an optical flow \({\textbf{u}_0}(\textbf{x})\), we can warp the image \(I(\textbf{x})\) to obtain \(I(\textbf{x}+\mathbf {u_0}(\textbf{x}))\) as we show in Fig. 3.3.
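As an illustration, warping can be sketched in a few lines of Python with bilinear interpolation. This is a minimal sketch under our own conventions (the name `warp_image`, a grayscale image of shape (H, W), and a flow array of shape (H, W, 2) storing (dx, dy)); it is not the implementation used in the chapter.

```python
import numpy as np

def warp_image(image, flow):
    """Bilinearly sample `image` at x + u0(x), i.e., compute I(x + u0(x)).

    image: (H, W) array; flow: (H, W, 2) array with components (dx, dy).
    Out-of-range sample positions are clamped to the image border.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Displaced sampling positions, clamped to the image domain.
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    y0 = np.floor(y).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    # Bilinear blend of the four neighboring pixels.
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bot = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bot
```

With a zero flow the warp is the identity, and a constant flow shifts the image, which matches the intuition of Fig. 3.3.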
3.2.3 Image Pyramid
For large displacements, the Taylor approximation does not hold, and we use a coarse-to-fine strategy. This strategy constructs a multi-scale pyramid, down-sampling the images by a factor of 2. In Fig. 3.4, a scheme of the image pyramid is shown, considering four levels.
Processing begins at the coarsest scale of Fig. 3.4, and each level doubles the image dimensions of the previous one. At each level, the optical flow is filtered with an intermediate filter to eliminate noise and outliers.
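A pyramid of this kind can be sketched in Python as follows; the 3-tap binomial blur before subsampling is a common anti-aliasing choice and an assumption of ours, not a detail taken from the chapter.

```python
import numpy as np

def gaussian_blur3(img):
    # Separable 3-tap binomial blur [1, 2, 1] / 4 with edge replication.
    p = np.pad(img, 1, mode='edge')
    img = (p[:-2, 1:-1] + 2 * p[1:-1, 1:-1] + p[2:, 1:-1]) / 4.0
    p = np.pad(img, 1, mode='edge')
    return (p[1:-1, :-2] + 2 * p[1:-1, 1:-1] + p[1:-1, 2:]) / 4.0

def build_pyramid(image, n_levels):
    """levels[0] is the full resolution; each next level is blurred and
    subsampled by a factor of 2, as in the coarse-to-fine scheme."""
    levels = [image.astype(float)]
    for _ in range(n_levels - 1):
        levels.append(gaussian_blur3(levels[-1])[::2, ::2])
    return levels
```

For a \(16 \times 16\) image and four levels, the pyramid holds images of size 16, 8, 4, and 2 pixels per side, matching the scheme of Fig. 3.4.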
3.3 Intermediate Filters
Traditionally, OF methods perform the computation in an image pyramid. At each resolution, an intermediate filter is applied to eliminate irregularities and noise in the estimate. We explain the scheme using pseudo-code in Algorithm 1, where \(N_\textrm{levels}\) is the number of levels in the pyramid and \(N_\textrm{warpings}\) is the number of warpings at each scale. Once the optical flow \(\textbf{u}(\textbf{x})\) is computed, the intermediate filter is applied; this happens in every warping iteration of Algorithm 1.
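The structure of this scheme can be sketched in Python as below. This is a skeleton only: `solve_increment` stands in for the \(TV-L^1\) inner solver (not reproduced here), the subsampling is crude striding rather than a proper Gaussian pyramid, and all names are our own; the point is where the intermediate filter sits in the loop.

```python
import numpy as np

def coarse_to_fine_flow(I0, I1, n_levels=3, n_warpings=2,
                        solve_increment=None, intermediate_filter=None):
    """Skeleton of the coarse-to-fine scheme of Algorithm 1 (a sketch)."""
    if solve_increment is None:
        solve_increment = lambda a, b, u: u        # placeholder inner solver
    if intermediate_filter is None:
        intermediate_filter = lambda u, a: u       # identity filter
    h, w = I0.shape
    scale = 2 ** (n_levels - 1)
    u = np.zeros((h // scale, w // scale, 2))      # flow at the coarsest level
    for level in range(n_levels - 1, -1, -1):
        s = 2 ** level
        I0s, I1s = I0[::s, ::s], I1[::s, ::s]      # crude subsampling
        for _ in range(n_warpings):
            u = solve_increment(I0s, I1s, u)       # warp + solve increment
            u = intermediate_filter(u, I0s)        # remove noise and outliers
        if level > 0:                              # upsample flow to next level
            u = 2.0 * np.repeat(np.repeat(u, 2, axis=0), 2, axis=1)
    return u
```

Note the factor 2 when upsampling the flow: displacements measured at a coarse scale double in pixel units at the next finer scale.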
3.3.1 Bilateral Filter
The Bilateral filter can be represented as
\[u_i^{BF}(\textbf{x}) = \frac{\displaystyle \sum _{\textbf{y} \in \mathcal {N}(\textbf{x})} w(\textbf{x},\textbf{y})\, u_i(\textbf{y})}{\displaystyle \sum _{\textbf{y} \in \mathcal {N}(\textbf{x})} w(\textbf{x},\textbf{y})},\]
with \(\mathcal {N}(\textbf{x})\) a neighborhood around \(\textbf{x}\), \(w(\textbf{x},\textbf{y})\) the exponential weights, and \(u_i\) (with \(i=1,2\)) the vertical and horizontal components of the optical flow:
\[w(\textbf{x},\textbf{y}) = \phi _S(\textbf{x}-\textbf{y})\, \phi _I(I_0(\textbf{x})-I_0(\textbf{y})),\]
where \(\phi _S\) penalizes the spatial distance between \(\textbf{x}\) and \(\textbf{y}\), and \(\phi _I\) the photometric distance between the pixels \(I_0(\textbf{x})\) and \(I_0(\textbf{y})\). \(\phi _S\) and \(\phi _I\) are Gaussian kernels given by \(\phi _S(\textbf{x})=\displaystyle e^{-\frac{\Vert \textbf{x} \Vert ^2}{2\sigma _S^2}}\) and \(\phi _I(I_0(\textbf{x}))=\displaystyle e^{-\frac{\Vert I_0(\textbf{x}) \Vert ^2}{2\sigma _I^2}}\), where \(\sigma _S>0\), \(\sigma _I>0\), and \(I_0(\textbf{x})\) is the reference image.
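A direct (unoptimized) Python sketch of this filter on one flow component follows; the window radius and the \(\sigma _S\), \(\sigma _I\) defaults are illustrative assumptions, and a grayscale guide image is assumed for simplicity.

```python
import numpy as np

def bilateral_filter_flow(u, I0, radius=2, sigma_s=2.0, sigma_i=10.0):
    """Bilateral filter of one flow component u (H, W), guided by the
    reference image I0: the weight of each neighbor combines the spatial
    kernel phi_S and the photometric kernel phi_I."""
    h, w = u.shape
    out = np.zeros_like(u, dtype=float)
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - radius), min(h, i + radius + 1)
            j0, j1 = max(0, j - radius), min(w, j + radius + 1)
            yy, xx = np.mgrid[i0:i1, j0:j1]
            d2 = (yy - i) ** 2 + (xx - j) ** 2            # spatial distance^2
            di2 = (I0[i0:i1, j0:j1] - I0[i, j]) ** 2       # photometric dist^2
            wgt = (np.exp(-d2 / (2 * sigma_s ** 2)) *
                   np.exp(-di2 / (2 * sigma_i ** 2)))
            out[i, j] = np.sum(wgt * u[i0:i1, j0:j1]) / np.sum(wgt)
    return out
```

Because the weights are normalized, a constant flow field passes through unchanged, while flow edges aligned with image edges are preserved.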
3.3.2 Median Filter
The Median filter is a non-linear filter used to remove noise and outliers from the optical flow estimation. Given a neighborhood \(\mathcal {N}(\textbf{x})\) around \(\textbf{x}\), the filtered flow is
\[u_i^{MF}(\textbf{x}) = \underset{\textbf{y} \in \mathcal {N}(\textbf{x})}{\operatorname {median}}\; u_i(\textbf{y}).\]
We considered square geometries to implement the Median filter, i.e., we filter a \(3 \times 3\) square around each \(\textbf{x}\) point. Each component of the optical flow is filtered with this filter.
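The \(3 \times 3\) median can be sketched compactly in Python; edge replication at the image border is our own choice for the sketch.

```python
import numpy as np

def median_filter3(u):
    """3x3 median filter of one flow component, with edge replication."""
    p = np.pad(u, 1, mode='edge')
    # Stack the 9 shifted views of the padded array and take the
    # per-pixel median over them.
    stack = np.stack([p[di:di + u.shape[0], dj:dj + u.shape[1]]
                      for di in range(3) for dj in range(3)])
    return np.median(stack, axis=0)
```

A single-pixel outlier in an otherwise smooth flow field is removed entirely, which is exactly the behavior wanted from an intermediate filter.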
3.3.3 Weighted Median Filter
The Weighted median filter is a non-linear filter applied to a bidimensional distribution of weights and pixel intensities. Each pixel in a neighborhood \(\mathcal {N}(\textbf{x})\) has a weight. The goal is to sort the pixel intensity values, given the distribution of weights. In this case, for each pixel \(\textbf{x}\), we use the bilateral weights of each neighborhood:
\[w(\textbf{x},\textbf{y}) = \phi _S(\textbf{x}-\textbf{y})\, \phi _I(I_0(\textbf{x})-I_0(\textbf{y})), \quad \textbf{y} \in \mathcal {N}(\textbf{x}).\]
We sorted both the weights \(w\) and the values in \(u(\textbf{x})\). The auxiliary arrays are \(s_w\) and \(s_u\), representing the sorted weight values and optical flow values, respectively. We found in the array \(s_w\) the smallest position \(p^*\) that holds
\[\sum _{k=1}^{p^*} s_w(k) \ge \frac{1}{2} \sum _{k} s_w(k).\]
Finally, using the position \(p^*\), we assign the weighted median filtered value of \(u\):
\[u_i^{WMF}(\textbf{x}) = s_u(p^*).\]
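The selection rule for one neighborhood can be sketched in Python as follows (a sketch of the half-total-weight rule; with uniform weights it reduces to the ordinary median).

```python
import numpy as np

def weighted_median(values, weights):
    """Weighted median: sort the values, carry the matching weights along,
    and return the value at the first position p* where the cumulative
    weight reaches half of the total weight."""
    order = np.argsort(values)
    s_u, s_w = values[order], weights[order]   # sorted values and weights
    csum = np.cumsum(s_w)
    p_star = np.searchsorted(csum, 0.5 * csum[-1])
    return s_u[p_star]
```

Applied to a flow neighborhood with bilateral weights, an outlier value with a tiny weight is skipped over by the cumulative sum and never selected.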
3.3.4 Balanced Median Filter
Our proposal is a weighted combination of the bilateral and the median filters. This linear combination has an adaptive weight \(\alpha (\textbf{x})\) that balances the contribution of the bilateral and the median filter in the intermediate filtering.
Let \(u_{bl}(\textbf{x})\) and \(u_{m}(\textbf{x})\) be the estimated optical flow filtered by the bilateral filter and the median filter, respectively. Following the ideas in [12], we constructed a balance weight
\[\alpha _i(\textbf{x}) = \frac{D_{m}(\textbf{x})}{D_{bl}(\textbf{x}) + D_{m}(\textbf{x})},\]
where \(D_{bl}(\textbf{x})\) is given as
\[D_{bl}(\textbf{x}) = \Vert I_0(\textbf{x}) - I_1(\textbf{x} + \textbf{u}_{bl}(\textbf{x})) \Vert ,\]
and \(D_{m}(\textbf{x})\) is given as
\[D_{m}(\textbf{x}) = \Vert I_0(\textbf{x}) - I_1(\textbf{x} + \textbf{u}_{m}(\textbf{x})) \Vert .\]
These two terms, \(D_{bl}(\textbf{x})\) and \(D_{m}(\textbf{x})\), represent the error of the OF at the point \(\textbf{x}\) when the bilateral filter or the median filter is applied, respectively. The convex combination gives the combined optical flow:
\[u_i^{BMF}(\textbf{x}) = \alpha _i(\textbf{x})\, u_{bl,i}(\textbf{x}) + \left( 1-\alpha _i(\textbf{x})\right) u_{m,i}(\textbf{x}).\]
Depending on the value of \(\alpha _i(\textbf{x})\), the combined flow relies more on the median filter or on the bilateral filter. Equation 3.9 shows that when \(D_{bl}\gg D_{m}\), the value of \(\alpha _i\) is almost 0, and when \(D_{bl}\ll D_{m}\), \(\alpha _i\) is almost 1.
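The combination step can be sketched in Python. Note that the exact form of the balance weight in Eq. 3.9 is not reproduced in this text; the ratio `D_m / (D_bl + D_m)` below is one choice with the stated limiting behavior (\(\alpha \rightarrow 0\) when \(D_{bl}\gg D_{m}\), \(\alpha \rightarrow 1\) when \(D_{bl}\ll D_{m}\)) and should be read as an assumption.

```python
import numpy as np

def balanced_filter(u_bl, u_m, D_bl, D_m, eps=1e-8):
    """Per-pixel convex combination of the bilateral- and median-filtered
    flows. D_bl and D_m are the per-pixel OF errors of each filtered flow;
    the flow with the smaller error dominates the combination."""
    alpha = D_m / (D_bl + D_m + eps)     # assumed form of the balance weight
    return alpha * u_bl + (1.0 - alpha) * u_m
```

When the bilateral error dominates, the output tracks the median-filtered flow, and vice versa, which is the behavior described for the Balanced median filter.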
3.4 Dataset and Experiments
This section presents the performed experiments and the dataset used to evaluate optical flow estimation performance.
3.4.1 Dataset
The MPI-Sintel dataset contains different video sequences that present blur, fog, varying illumination, and many scenes with large and fast displacements. The dataset is divided into two subsets: a training set and a validation set. The training set is also divided into two subsets, one called clean and the other called final. The final set includes the effects mentioned above and is more challenging than the clean one; therefore, we performed the experiments on this set. Figure 3.5 shows examples of video sequences in the MPI-Sintel dataset, where we display the available ground-truth optical flow with arrows and also using a color code. We show the color code used in Fig. 3.5i.
In Table 3.1, we show a numerical description of each sequence of the final MPI-Sintel training set. In the MPI-Sintel dataset, the optical flow ground truth is available, as shown in Fig. 3.5. Thus, this ground truth lets us compute the end-point error (EPE) and the average angular error (AAE); these errors are given by the following expressions:
\[EPE = \frac{1}{|\varOmega |} \sum _{\textbf{x} \in \varOmega } \Vert \textbf{u}(\textbf{x}) - \textbf{u}_{GT}(\textbf{x}) \Vert ,\]
\[AAE = \frac{1}{|\varOmega |} \sum _{\textbf{x} \in \varOmega } \arccos \left( \frac{1 + \langle \textbf{u}(\textbf{x}), \textbf{u}_{GT}(\textbf{x}) \rangle }{\sqrt{1+\Vert \textbf{u}(\textbf{x})\Vert ^2}\; \sqrt{1+\Vert \textbf{u}_{GT}(\textbf{x})\Vert ^2}} \right),\]
where \(\textbf{u}_{GT}\) is the ground-truth optical flow.
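Both metrics can be sketched in a few lines of Python; the AAE uses the standard 3-D formulation in which a constant third component of 1 is appended to each flow vector.

```python
import numpy as np

def epe_aae(u_est, u_gt):
    """End-point error (pixels) and average angular error (degrees)
    between estimated and ground-truth flow fields of shape (H, W, 2)."""
    # EPE: mean Euclidean distance between flow vectors.
    epe = np.mean(np.linalg.norm(u_est - u_gt, axis=-1))
    # AAE: mean angle between (u1, u2, 1) vectors.
    num = 1.0 + np.sum(u_est * u_gt, axis=-1)
    den = (np.sqrt(1.0 + np.sum(u_est ** 2, axis=-1)) *
           np.sqrt(1.0 + np.sum(u_gt ** 2, axis=-1)))
    aae = np.degrees(np.mean(np.arccos(np.clip(num / den, -1.0, 1.0))))
    return epe, aae
```

For example, an estimate off by exactly one pixel horizontally against a zero ground-truth flow yields an EPE of 1 pixel and an AAE of 45 degrees.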
3.4.2 Experiments
In the following, we explain experiments performed using the MPI-Sintel dataset.
- (i) Evaluation using the median filter: we evaluated the complete MPI-Sintel training set in the final version using a median filter of size \(3 \times 3\) as the intermediate filter.
- (ii) Evaluation using the weighted median filter: we evaluated the complete MPI-Sintel training set in the final version using the weighted median filter with \(\sigma _I=200\) and \(\sigma _S=200\).
- (iii) Evaluation using the balanced weighted median filter: we also evaluated our proposed combined filter, which combines the weighted median filter and the bilateral filter.
- (iv) Evaluation using the bilateral filter.
3.5 Results and Discussion
We have evaluated the filters on sequences that contain medium displacements: market, mountain, shaman, sleeping, and temple. On these evaluation sets, we assessed the EPE and AAE for the intermediate filters. In these experiments, we computed the OF on the training set, which contains around 490 images. We present the numerical results obtained in the second and third columns of Table 3.2. In summary, the Median filter obtained \(EPE=4.47\), \(AAE=9.22\), and \(EPE+AAE=13.69\). In the fourth and fifth columns of Table 3.2, we show the results obtained on the MPI-Sintel dataset using the Weighted Median filter: \(EPE=4.47\), \(AAE=9.23\), and \(EPE+AAE=13.70\). These results are slightly worse than those obtained by the Median filter.
The sixth and seventh columns of Table 3.2 show that the Balanced median filter obtains \(EPE+AAE=13.44\). Comparing the results obtained by the Median filter (\(EPE+AAE=13.69\)), the Weighted Median filter (\(EPE+AAE=13.70\)), and the Balanced median filter, we see that they are very similar: there is a difference of 0.26 between the Weighted Median filter and the Balanced median filter. We observe a small difference in the AAE between the Median filter and the Balanced median filter, with the Balanced median filter performing better (\(AAE=8.99\)). This result indicates that the estimated optical flow is better aligned w.r.t. the ground truth.
3.5.1 Comparison with Other Methods
We submitted the results obtained on the MPI-Sintel test set to the MPI-Sintel webpage. Those results were ranked in a benchmark that compares different OF methods. Figure 3.6 shows the performance of our proposal. Our model using the combination of the bilateral filter and the weighted median filter (called TVL1_BWMFilter) outperforms classic methods like Horn-Schunck [1]. Our proposal presents an \(EPE=9.034\), outperforming the classic \(TV-L^1\) formulation [13] and the non-local optical flow in Classic+NL [4]. Our proposal performs similarly to Motion Detail Preserving Optical Flow [12], which considers additional correspondences that give hints to guide the OF estimation. Figure 3.7 shows some examples of the results obtained by our method.
Figure 3.7c and f show the OF obtained for images of the sequences PERTURBED_shaman_1 and tiger1 of the MPI-Sintel test dataset. In the two images of the sequence PERTURBED_shaman_1, we obtained an \(EPE=1.648\) (Fig. 3.7a and b), and in the two images of the sequence tiger1 (Fig. 3.7d and e), we obtained an \(EPE=1.678\).
3.5.2 Bidimensional Empirical Mode Decomposition
We have filtered the estimated optical flow using Bidimensional Empirical Mode Decomposition (BEMD) as a proof of concept. We used a MATLAB implementation available on the web. In Fig. 3.8, we show a video sequence where a dragon runs following a chicken. We estimated the optical flow, and we extracted the BEMD.
We observe in Fig. 3.8d the edges of the estimated optical flow in (c); the subsequent panels show intermediate spatial frequencies and low frequencies, where the edges were not preserved, i.e., the BEMD performs as an isotropic filter. Finally, in (f), we observe the continuous component of the flow, which is very blurred and does not preserve shapes.
The MATLAB implementation of the BEMD method runs four iterations in 1435 seconds, which makes it unsuitable for real-time use.
3.5.3 Processing Time
We measured the total processing time at each scale, which depends on the image size; that is, different scales have different processing times. The measurements were performed on an MSi i7 laptop, running on a single core (please see Table 3.3).
3.5.4 Discussion
Regarding the BEMD method, we observe in Fig. 3.8f that it performs similarly to a Gaussian filter, i.e., it blurs the image. Edge preservation is an essential feature that an optical flow estimation method should have. We think a minor modification of the BEMD decomposition could preserve edges. As future work, we will consider modifying the BEMD method to use anisotropic morphological operators and then evaluate its effect on the optical flow estimation.
Concerning other approaches to optical flow estimation such as [7], the authors did not evaluate their proposal on standard datasets such as MPI-Sintel [10] or [11]. Also, no code is available to compare our results with their proposal. As future work, we will consider implementing the proposal in [7] and assessing it against ours on a standard dataset such as MPI-Sintel.
Concerning the obtained results, we compared the columns in Table 3.2 associated with the Median filter and the Balanced median filter. We observe for the sequence shaman that the average angular error dropped from 7.11 to 6.58, i.e., by 0.53 degrees, and the EPE dropped from 0.37 to 0.35, i.e., by 0.02 pixels. The most significant reduction was in the angular error, which means that the optical flow is better aligned with the ground truth for this sequence. We highlight these results because some of the sequences contain large or minimal displacements, while shaman contains medium displacements; the proposal is better suited for small and medium displacements.
3.6 Conclusions
To perform our intermediate filter study, we have used a model that is robust to illumination changes and can handle large displacements. With this model, we assessed the performance of four filters: Median, Weighted Median, Balanced Median, and Bilateral. We evaluated the performance on the MPI-Sintel dataset and submitted our results to the benchmark webpage. We found that the Balanced Median filter outperforms the other three filters; thus, we processed the MPI-Sintel test set with it. The obtained results show that our proposal outperforms the classical Horn-Schunck method, the TV-L1 model, and other models that are robust to large displacements (LDOF) or non-local (Classic+NL). It also performs similarly to current methods such as GeoFlow [14] and CPNFlow [15]. In future work, we will investigate the use of the BEMD decomposition method with anisotropic operators to preserve edges and shapes, and we will consider implementing the wavelet optical flow to assess that proposal and ours on a standard dataset such as MPI-Sintel to compare performance.
References
Horn, B.K., Schunck, B.H.: Determining optical flow. Artif. Intell. 17, 185–203 (1981)
Zach, C., Pock, T., Bischof, H.: A duality based approach for realtime TV-L1 optical flow. In: Hamprecht, F.A., Schnörr, C., Jähne, B. (eds.) Pattern Recognition. DAGM 2007. Lecture Notes in Computer Science, vol. 4713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74936-3_22
Nunes, J., Bouaoune, Y., Delechelle, E., Niang, O., Bunel, Ph.: Image analysis by bidimensional empirical mode decomposition. Image Vis. Comput. 21(12), 1019–1026 (2003)
Sun, D., Roth, S., Black, M.J.: Secrets of optical flow estimation and their principles. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2432–2439 (2010)
Lazcano, V.: Study of specific location of exhaustive matching in order to improve the optical flow estimation. In: 15th ITNG, April 2018, Las Vegas, Nevada, pp. 603–661 (2018)
Xiao, J., Cheng, H., Sawhney, H., Rao, C., Isnardi, M.: Bilateral Filtering-based flow estimation with occlusion detection. In: Proceedings of the ECCV, pp. 221–224 (2006)
Dérian, P., Héas, P., Herzet, C., Mémin, E.: Wavelets and optical flow motion estimation. Numer. Math. J. Chin. Univ. Nanjing Univ. Press 2013(6), 116–137 (2013)
Brox, T., Bruhn, A., Papenberg, N., Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In: Pajdla, T., Matas, J. (eds.) Computer Vision–ECCV 2004. Lecture Notes in Computer Science, vol. 3024, pp. 25–36. Springer, Berlin (2004)
Meinhardt-Llopis, E., Sánchez-Pérez, J., Kondermann, D.: Horn-Schunck optical flow with a multi-scale strategy. Image Process. Line 3, 151–172 (2013). https://doi.org/10.5201/ipol.2013.20
Butler, D.J., Wulff, J., Stanley, G.B., Black, M.J.: A naturalistic open source movie for optical flow evaluation. In: Fitzgibbon, A., et al. (eds.) European Conference on Computer Vision (ECCV). Part IV, LNCS 7577, Oct, pp. 611–625. Springer (2012)
Baker, S., Scharstein, D., Lewis, J.P., Roth, S., Black, M.J., Szeliski, R.: A database and evaluation methodology for optical flow. IJCV 92(1), 1–31 (2011)
Xu, L., Jia, J., Matsushita, Y.: Motion detail preserving optical flow. In: IEEE CVPR (2010)
Wedel, A., Pock, T., Zach, C., Bischof, H., Cremers, D.: An improved algorithm for TV-L1 optical flow. In: Statistical and Geometrical Approaches to Visual Motion Analysis. LNCS, vol. 5604 (2009)
Mei, L., Chen, Z., Lai, J.: Geodesic-based probability propagation for efficient optical flow. Electron. Lett. 54(12), 758–760 (2018). https://doi.org/10.1049/el.2018.0394
Yang, Y., Soatto, S.: Conditional prior networks for optical flow. In: The European Conference on Computer Vision (ECCV), pp. 271–287 (2018)
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Lazcano, V., Isa-Mohor, C. (2023). Experimental Evaluation of Four Intermediate Filters to Improve the Motion Field Estimation. In: Subrahmanyam, P.V., Vijesh, V.A., Jayaram, B., Veeraraghavan, P. (eds) Synergies in Analysis, Discrete Mathematics, Soft Computing and Modelling. Forum for Interdisciplinary Mathematics. Springer, Singapore. https://doi.org/10.1007/978-981-19-7014-6_3
Print ISBN: 978-981-19-7013-9
Online ISBN: 978-981-19-7014-6