1 Introduction

With growing interest in 3D video, displays such as 3D TVs and the broadcasting technologies needed to watch 3D video have been developed and are rapidly gaining popularity. Despite the ever-growing demand, however, 3D content remains scarce because its production requires considerable time and money [6, 12]. 2D-to-3D conversion, which turns existing 2D images into 3D video, has therefore been drawing attention as a solution. Although there are a number of ways to realize such a technology, 2D-to-3D conversion generally proceeds through the following steps [6].

  • Object Extraction or Segmentation

  • Depth Map Assignment or Generation

  • Rendering or Occlusion

  • Re-touch

Depending on the intended use of the content, the cost, and the required video quality, conversion is carried out by (1) manual conversion, (2) non-real-time automatic software conversion, or (3) real-time automatic hardware conversion [6]. Manual conversion at movie-like quality typically takes 6 months and 300 staffers, so hybrid conversion technologies that combine image-processing-based software conversion with manual work have emerged to reduce the manual effort while meeting quality requirements [6]. To produce a properly classified, scene-based image sequence of high quality, the operator intervenes in the first frame, deliberately extracting individual objects and subtracting the background before a depth map is produced through geometric analysis. Subsequent frames are then converted automatically, within a reasonable time, on the basis of the data from the first frame. Here it is crucial that the contour of an object subtracted from the first frame be tracked correctly in the subsequent frames, and the more irregular the object's shape and the larger its motion, the harder this tracking becomes.

In this study, we propose a method for stably tracking the contour of an object with large motion and irregular shape in an image sequence for 2D-to-3D conversion. Section 2 reviews the snake algorithm as related work, and Section 3 describes how the proposed method tracks object contours using optical flow and an active contour model. Section 4 assesses the performance of the proposed method through experiments, and Section 5 presents the conclusion.

2 Related works

Precise tracking of an object in an image sequence remains a challenge, partly because of complex backgrounds and weak contrast between object and background. Two broad approaches exist: adapting an existing object extraction algorithm to image sequences, and using a dedicated object tracking algorithm [3].

Approaches that apply an object extraction algorithm to image sequences include methods that form an energy function over temporally adjacent nodes, combined with background subtraction and the graph-cut algorithm [9, 10]. These methods often produce errors in regions adjacent to a complex background. Object tracking algorithms, in contrast, extract features such as points, kernels, or silhouettes from previous frames and locate the object in the following frame by matching them [1]. Designed mainly to localize an object, such methods need additional algorithms to extract its shape accurately, and they are vulnerable to occlusion and to objects with large motion [3].

In recent years, extensive studies have examined active contour model-based methods, which effectively capture the deformation of an object and segment it with flexible curves surrounding it [2, 8, 13–15]. In these methods, the contour of the object in the next frame is tracked from the contour energy information obtained in the previous frame's extraction result. In this study, we use the active contour model to continuously track, through subsequent frames, the outermost closed curve of a non-rigid object extracted by the operator from the first frame of the image sequence.

2.1 Snake algorithm

The snake algorithm, the most common active contour model, was first proposed by Kass in 1987 [2]. Initial snake points are placed around the object to be extracted from the input image, and the contour is obtained by iteratively moving the snake points so as to minimize a defined energy function.

The snake energy function is the sum of an internal energy, which determines the shape of the snake contour, and an external energy, which pulls the snake points toward the object contour.

$$ E_{snake}(v) = \sum_{i=0}^{N-1}\left(E_{internal}(v_i) + E_{external}(v_i)\right) $$
(1)

where $v_i = (x_i, y_i)$ denotes the $i$th snake point and $N$ is the number of snake points.

The internal energy consists of a continuity energy, which regulates the spacing between snake points, and a curvature energy, which constrains the bending of the contour as the snake points move.

$$ E_{internal}(v_i) = E_{continuity}(v_i) + E_{curvature}(v_i) $$
(2)

The continuity energy $E_{continuity}(v_i)$ drives the snake points toward equal spacing: the difference between the mean distance $\bar{d}$ over all snake points and the distance between two adjacent points is normalized by dividing by $\bar{d}$.

$$ E_{continuity}(v_i) \approx \frac{\left|\bar{d} - \left|v_{i+1} - v_i\right|\right|}{\bar{d}} $$
(3)
$$ \bar{d} = \frac{1}{N}\sum_{i=0}^{N-1}\left|v_{i+1} - v_i\right| $$
(4)

The curvature energy $E_{curvature}(v_i)$ pulls the snake contour toward the side with the smaller rate of change, moving it toward the object contour. The second difference of the current snake point with its previous point $v_{i-1}$ and next point $v_{i+1}$ is normalized by the largest rate of change in the search area, $c_{max}$.

$$ E_{curvature}(v_i) \approx \frac{\left|v_{i-1} - 2v_i + v_{i+1}\right|}{c_{max}} $$
(5)

The external energy attracts the snake contour to features of the object, most commonly the image gradient, which is large at object boundaries where brightness changes sharply. The external energy is computed at each of the 9 neighboring positions of the current control point (including the point itself), and the point moves to the location with the larger gradient.

$$ E_{external}(v_i) = \frac{-\left|\nabla f(v_i)\right|}{e_{max}} $$
(6)
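To make the energy terms concrete, the sketch below implements one greedy update pass over the snake points following Eqs. (1)–(6). It is a minimal illustration under simplifying assumptions, not the authors' implementation: the weights are illustrative, and the points are assumed to stay at least one pixel from the image border.

```python
import numpy as np

def greedy_snake_step(gray, pts, alpha=0.2, beta=0.3, gamma=0.5):
    """One greedy minimization pass over all snake points (Eqs. 1-6).

    gray: grayscale image as a float 2-D array.
    pts:  (N, 2) integer array of snake points in (x, y) order,
          assumed to lie at least one pixel from the border.
    alpha, beta, gamma: weights for continuity, curvature, and
          external energy (illustrative values).
    """
    gy, gx = np.gradient(gray)
    grad = np.hypot(gx, gy)                        # |grad f| of Eq. (6)
    N = len(pts)
    # Mean spacing of Eq. (4).
    d_mean = np.mean(np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1))
    offsets = np.array([(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)])
    new_pts = pts.copy()
    for i in range(N):
        v_prev, v_next = new_pts[i - 1], pts[(i + 1) % N]
        cands = pts[i] + offsets                   # 9-neighborhood candidates
        # Continuity energy, Eq. (3).
        e_cont = np.abs(d_mean - np.linalg.norm(v_next - cands, axis=1)) / d_mean
        # Curvature energy, Eq. (5), normalized by the local maximum c_max.
        e_curv = np.linalg.norm(v_prev - 2 * cands + v_next, axis=1)
        e_curv = e_curv / (e_curv.max() + 1e-9)
        # External energy, Eq. (6): prefer the largest gradient.
        g = grad[cands[:, 1], cands[:, 0]]
        e_ext = -g / (g.max() + 1e-9)
        new_pts[i] = cands[np.argmin(alpha * e_cont + beta * e_curv + gamma * e_ext)]
    return new_pts
```

In practice the pass is repeated until only a small fraction of the points still move, matching the convergence condition used in Section 4.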

2.2 Problems with snake algorithm

Because it extracts object contours simply and effectively, the snake algorithm is used broadly, and many minimization algorithms have been proposed for it [4]; nevertheless, it has the following problems. The algorithm:

  (1) is highly dependent on the location and shape of the initial snake points;

  (2) cannot extract the contour of an object with a complex shape using the internal energy functions alone;

  (3) takes enormous time because each point can move only within a limited range at a time.

Since this study targets non-real-time 2D-to-3D conversion, problem (3) is not a serious concern. Problem (1), the dependence on the location and shape of the initial snake points, is solved by computing the optical flow from the previous frame and using it to set the initial snake points for tracking in the current frame. Problem (2) is addressed by computing the partial curvature between snake points and inserting new snake points, so that even a complex object contour is extracted effectively.

3 The proposed method

This study proposes a new tracking method to solve the problems of object tracking with the existing snake algorithm. Figure 1 shows how the proposed method works. The goal is to track the exact contour of an object in the (n+1)th frame from the object extraction data of the nth frame.

Fig. 1 Flow of the proposed object contour tracking

First, the optical flow of the object contour feature points of the nth frame is computed in the (n+1)th frame to set the initial snake points. Flow vectors that are wrong because of the object's irregular shape or large motion are filtered out by comparison with a morphologically processed difference edge map between the two frames. The active contour is then converged to the target object in the (n+1)th frame by solving for the active contour with the snake algorithm. To overcome the energy shortage caused by a complex contour of the tracked object, we adopt the method of Lee [5], which adds snake points based on partial curvature.

3.1 Calculation of optical flow

In non-real-time 2D-to-3D conversion, objects are generally classified by the operator in the first frame (n = 0) of the image sequence of the scene to be converted. From the object contour classified in this way, the end points of its horizontal, vertical, and diagonal components are set as the feature points of the object contour.

Next, the optical flow of the feature points is calculated with reference to the (n+1)th frame. Optical flow is the vector field of per-pixel motion induced by the 3D movement of the object or the camera [7]. Among the many methods for tracking pixel motion, this study uses the best-known one, the Lucas-Kanade algorithm [11]. The Lucas-Kanade algorithm rests on three hypotheses: brightness constancy, temporal persistence, and spatial coherence. Brightness constancy means that the brightness of a tracked point does not change between frames; temporal persistence means that motion is slow relative to the frame rate, so an object moves little between frames. Under these two hypotheses, the optical flow between time t and t + dt is expressed as follows:

$$ f(x,t) \equiv I(x(t),t) = I(x(t+dt),\, t+dt) $$
(7)

Applying the chain rule of partial differentiation to Eq. (7) and introducing the velocity components along the x and y axes yields Eq. (8):

$$ \begin{array}{c} I_{x_1}V_x + I_{y_1}V_y = -I_{t_1} \\ I_{x_2}V_x + I_{y_2}V_y = -I_{t_2} \\ \vdots \\ I_{x_n}V_x + I_{y_n}V_y = -I_{t_n} \end{array} $$
(8)

In Eq. (8), $V_x$ and $V_y$ are the velocity components along each axis, and $I_x$, $I_y$, and $I_t$ are the partial derivatives of the pixel brightness with respect to x, y, and t. Finally, spatial coherence means that neighboring points in space are likely to belong to the same object and therefore to share the same motion. From this, Eq. (8) can be written in matrix form as Eq. (9):

$$ \begin{bmatrix} I_{x_1} & I_{y_1} \\ I_{x_2} & I_{y_2} \\ \vdots & \vdots \\ I_{x_n} & I_{y_n} \end{bmatrix} \begin{bmatrix} V_x \\ V_y \end{bmatrix} = \begin{bmatrix} -I_{t_1} \\ -I_{t_2} \\ \vdots \\ -I_{t_n} \end{bmatrix} $$
(9)

Applying the method of least squares to Eq. (9) gives Eq. (10), which yields the velocity components $(V_x, V_y)$:

$$ \begin{bmatrix} V_x \\ V_y \end{bmatrix} = \begin{bmatrix} \sum I_{x_i}^2 & \sum I_{x_i}I_{y_i} \\ \sum I_{x_i}I_{y_i} & \sum I_{y_i}^2 \end{bmatrix}^{-1} \begin{bmatrix} -\sum I_{x_i}I_{t_i} \\ -\sum I_{y_i}I_{t_i} \end{bmatrix} $$
(10)

Figure 2 shows the motion of the feature points after the optical flow is calculated against the (n+1)th frame, once the feature points have been created along the object contour extracted by the operator from the nth frame.

Fig. 2 Creation of feature points and calculation of optical flow from object contour

3.2 Setup of initial snake points

The image sequences used in this study run at 15 frames/s, and, as Fig. 2 shows, for quickly moving irregular objects some of the tracking points produced by the optical flow do not land precisely on the object contour. Because of the snake algorithm's strong dependence on the location and shape of the initial snake points, noted in Section 2.2, these tracking points cannot be used as they are, and the object contour cannot be tracked properly. To solve this problem, a difference edge map between the frames is created for filtering. First, edge maps of the nth and (n+1)th frames are computed; the Sobel operator is used because it suppresses surrounding noise and yields thick gradient data. The difference edge map $D_{edge}(x,y)$ of the two frames is defined as follows:

$$ D_{edge}(x,y) = \begin{cases} S_{n+1}(x,y) - S_n(x,y) & \text{if } S_{n+1}(x,y) - S_n(x,y) > I_{th} \\ 0 & \text{otherwise} \end{cases} $$
(11)

where $S_n(x,y)$ and $S_{n+1}(x,y)$ are the Sobel edge extraction results of the nth and (n+1)th frames, respectively, and the threshold $I_{th}$, set to 10, prevents noise from mixing in. Figure 3 shows the construction of the difference edge map: Fig. 3(a) and (b) are the Sobel edge extraction results for the nth and (n+1)th frames, and Fig. 3(c) is the resulting difference edge map. In a scene where the camera does not move, the fixed background and still objects take values close to 0. Then, to widen the filtering band, dilation, a morphological operation, is applied repeatedly to obtain the final snake point filtering map shown in Fig. 3(d).
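A minimal OpenCV sketch of Eq. (11) and the subsequent dilation is given below; the kernel size and the number of dilation iterations are assumptions, since the paper does not specify them.

```python
import cv2
import numpy as np

def difference_edge_map(frame_n, frame_n1, i_th=10, dilate_iter=3):
    """Build the snake point filtering map of Eq. (11) and Fig. 3(d).

    frame_n, frame_n1: grayscale frames n and n+1 (uint8 arrays).
    i_th:        noise threshold (the paper uses 10).
    dilate_iter: dilation passes widening the band (assumed value).
    """
    def sobel_mag(img):
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
        return cv2.magnitude(gx, gy)

    diff = sobel_mag(frame_n1) - sobel_mag(frame_n)      # S_{n+1} - S_n
    d_edge = np.where(diff > i_th, np.clip(diff, 0, 255), 0).astype(np.uint8)
    kernel = np.ones((3, 3), np.uint8)
    return cv2.dilate(d_edge, kernel, iterations=dilate_iter)
```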

Fig. 3 Creation of difference edge map to set reference snake points

The feature points produced by the optical flow in Section 3.1 are then filtered using this difference edge map. Specifically, any feature point whose optical flow exceeds a set value (here 5, determined through many experiments) is checked against the difference edge map, and points that do not fall on the map are removed. Figure 4 shows that almost all of the feature points not on the object contour have been removed.
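The filtering step can be sketched as follows; the flow threshold of 5 follows the description above, while the array layout of the point and flow data is an assumption.

```python
import numpy as np

def filter_feature_points(points, flows, d_edge_map, flow_th=5.0):
    """Remove tracked points whose large flow is not supported by the map.

    points:     (N, 2) tracked point coordinates (x, y) in frame n+1.
    flows:      (N, 2) optical flow vectors of those points.
    d_edge_map: dilated difference edge map from Eq. (11).
    flow_th:    flow magnitude threshold (the paper uses 5).
    """
    keep = []
    for (x, y), (vx, vy) in zip(points.astype(int), flows):
        large_motion = np.hypot(vx, vy) > flow_th
        # A point with large flow must land on the dilated edge band.
        if not large_motion or d_edge_map[y, x] > 0:
            keep.append((x, y))
    return np.array(keep)
```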

Fig. 4 Filtering of feature points according to difference edge map

3.3 Finding the active contour solution

The feature points that pass the difference edge map filter are set as the reference snake points in the (n+1)th frame and converged to the object contour, with the gradient of the grey-level image serving as the external energy. For an object with irregular shape, we adopt the method of Lee [5], which adds snake points based on partial curvature at each snake point update, so that the contour can be detected in regions where the curvature between snake points changes greatly. The discrete curvature $k_d$ is calculated from three snake points $v_{i-1}$, $v_i$, $v_{i+1}$ as follows:

$$ \overrightarrow{T_{d1}} = v_{i-1} - v_i, \qquad \overrightarrow{T_{d2}} = v_{i+1} - v_i $$
(12)
$$ \cos\theta = \frac{\overrightarrow{T_{d1}} \cdot \overrightarrow{T_{d2}}}{\left\Vert \overrightarrow{T_{d1}} \right\Vert \left\Vert \overrightarrow{T_{d2}} \right\Vert} $$
(13)
$$ k_d = \frac{2\sin\theta}{d} \qquad \text{where } d = \left|v_{i+1} - v_{i-1}\right| $$
(14)

In the equations above, $\cdot$ denotes the inner product of two vectors and $\Vert\cdot\Vert$ the vector norm. A curvature value above the critical value indicates a complex object contour, so two new snake points are inserted at $(v_{i-1} + v_i)/2$ and $(v_i + v_{i+1})/2$.
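A sketch of this refinement follows; the critical curvature value k_th is an assumed parameter, and duplicate midpoints that can arise at consecutive high-curvature points are not removed, for brevity.

```python
import numpy as np

def refine_snake_points(pts, k_th=0.5):
    """Insert midpoints where the discrete curvature of Eqs. (12)-(14)
    exceeds the critical value k_th (assumed threshold).

    pts: (N, 2) float array of closed-contour snake points.
    """
    out = []
    N = len(pts)
    for i in range(N):
        v_prev, v, v_next = pts[i - 1], pts[i], pts[(i + 1) % N]
        t1, t2 = v_prev - v, v_next - v                     # Eq. (12)
        cos_t = np.dot(t1, t2) / (np.linalg.norm(t1) * np.linalg.norm(t2) + 1e-9)
        sin_t = np.sqrt(max(0.0, 1.0 - cos_t ** 2))
        d = np.linalg.norm(v_next - v_prev) + 1e-9
        k_d = 2.0 * sin_t / d                               # Eq. (14)
        if k_d > k_th:
            out.append((v_prev + v) / 2.0)                  # midpoint before v_i
            out.append(v)
            out.append((v + v_next) / 2.0)                  # midpoint after v_i
        else:
            out.append(v)
    return np.array(out)
```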

4 Results and discussion

To test the proposed method, a ballet video (1024×768, 15 frames/s) [16] and a foreman video (352×288, 50 frames/s), both containing an irregularly deforming object, were used. Performance was assessed by the mean error between the actual contour point coordinates and the estimated contour point coordinates, where the actual coordinates were obtained from a manually extracted object contour.
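Since the paper does not state how estimated and ground-truth points are paired, the sketch below computes one plausible reading of the measure: the mean distance from each estimated contour point to its nearest manually extracted point.

```python
import numpy as np

def mean_contour_error(estimated, actual):
    """Mean pixel distance from each estimated contour point to the
    nearest ground-truth contour point (one reading of the metric).
    """
    est = np.asarray(estimated, dtype=float)    # (N, 2) estimated points
    gt = np.asarray(actual, dtype=float)        # (M, 2) manual points
    dists = np.linalg.norm(est[:, None, :] - gt[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())
```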

The first experiment uses the foreman video, which has a sufficient number of frames and smaller object motion. Tracking was conducted in the 159th frame for the object manually extracted from the 158th frame. The weights for the continuity, curvature, and external energies of the snake algorithm were set to 0.2, 0.3, and 0.5, respectively, through many experiments, and the convergence condition was that the rate of change of the snake points fall below 10 %. Figure 5 compares the existing method, which uses the feature points of the previous frame as they are, with the proposed method of setting the initial snake points in the 159th frame. The mean errors were 2.91 and 2.23 pixels, respectively: both methods tracked the actual contour closely, but the proposed method performed better.

Fig. 5 Results of experiments with the foreman video

The second experiment uses the ballet video, which has fewer frames and large motion. Tracking was performed in the 84th frame for the object manually extracted from the 83rd frame. The weights of the continuity, curvature, and external energies were 0.2, 0.15, and 0.65, respectively. Figure 6 shows how the results of using the previous frame's feature points as they are differ from those of the proposed method of setting the initial snake points. The mean errors were 23.33 and 5.61 pixels, respectively, showing that the proposed initialization tracks the object contour far more accurately. In Fig. 6(e), however, some points near the left hand and left foot did not converge properly to the object contour: motion blur appears to have blurred the contour near the left hand, while a shortage of external energy due to the very similar brightness of object and background appears to have caused the problem near the left foot. The boundary of the object was tracked exactly in the remaining areas.

Fig. 6 Results of experiments with the ballet video

5 Conclusion

In this study, we proposed a method for stably tracking the contour of an object with large motion and irregular shape in an image sequence, using optical flow and an active contour model. The optical flow of feature points along the object contour extracted by the operator is used to set the snake reference points for the next frame, and active contour tracking is then performed with snake points added according to partial curvature. Experiments with actual videos show that irregular objects with large motion become easier to track. To apply this method to non-real-time 2D-to-3D conversion, where precise object tracking is important, further research is needed on motion blur and on the energy shortage caused by a small brightness difference from the background. Subsequent studies will also examine applying a background extraction method or a mean-shift algorithm to extract object edges more exactly.