
1 Introduction

On June 10, 2014, the Cisco Visual Networking Index [8] predicted that it would take an individual over 5 million years to watch the amount of video that will cross global IP networks each month in 2018; by then, nearly a million minutes of video content will cross the network every second. Video data is thus generated considerably faster than it can be analyzed. Video-based target tracking has drawn increasing interest for its many applications [9], such as video surveillance, traffic control, machine intelligence, and biomedicine.

In this paper, we aim to design a simple simulation of human vision by combining static information and motion information. Static information refers to the phase information [1] of color pairs and the topological property [4]; motion information refers to PCNN fusion based on optical flow. The following sections detail how each contributes to locating targets.

Figure 1 illustrates the structure of the proposed model. First, three channels (color pair RG, color pair BY, and the topological property) are extracted from the video frames. Phase information is then obtained by applying the inverse Fourier transform to the phase spectrum of the two color pairs and the topological property. Second, PCNN is used to fuse motion features derived from the optical flow direction: pulses propagate from outside to inside according to the directional difference in the video frame until a sufficiently large difference value is reached. Third, the saliency map is computed by smoothing the linear fusion of the phase information, the optical flow magnitude, and the direction fusion.

Fig. 1. Structure of the proposed model

2 Related Work

This section covers PCNN fusion based on optical flow and topological information extraction. Optical flow and the PCNN used in our model are briefly introduced before the computational processes of PCNN fusion and topological information extraction are described in detail.

2.1 Optical Flow

Optical flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. The concept of optical flow was introduced by the American psychologist James J. Gibson [2] in the 1940s to describe the visual stimulus provided to animals moving through the world.

Optical flow methods try to calculate, at every pixel, the motion between two image frames taken at times t and \( t + \delta t \). For the 2-dimensional case, a pixel at location \( (x,y,t) \) with intensity \( I(x,y,t) \) will have moved by \( \delta x \), \( \delta y \) and \( \delta t \) between the two frames, giving the following brightness constancy constraint:

$$ I\left( {x,y,t} \right) = I\left( {x + \delta x,y + \delta y,t + \delta t} \right) $$
(1)

Assuming the movement to be small, the image constraint at \( I(x,y,t) \) can be expanded with a Taylor series to obtain:

$$ I(x + \delta x,y + \delta y,t + \delta t) = I(x,y,t) + \delta x\frac{\partial I}{\partial x} + \delta y\frac{\partial I}{\partial y} + \delta t\frac{\partial I}{\partial t} + e(0) $$
(2)

Equations (1) and (2) result in the following Eq. (3), in which

$$ I_{x} = \frac{\partial I}{\partial x},\quad I_{y} = \frac{\partial I}{\partial y},\quad I_{t} = \frac{\partial I}{\partial t},\quad V_{x} = \frac{\delta x}{\delta t},\quad V_{y} = \frac{\delta y}{\delta t}, \qquad I_{x} V_{x} + I_{y} V_{y} + I_{t} = 0 $$
(3)
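For clarity, Eq. (3) follows from Eqs. (1) and (2) by cancelling \( I(x,y,t) \) on both sides, dropping the higher-order term \( e(0) \), and dividing by \( \delta t \):

$$ \delta x\frac{\partial I}{\partial x} + \delta y\frac{\partial I}{\partial y} + \delta t\frac{\partial I}{\partial t} \approx 0 \quad \Longrightarrow \quad \frac{\delta x}{\delta t}\frac{\partial I}{\partial x} + \frac{\delta y}{\delta t}\frac{\partial I}{\partial y} + \frac{\partial I}{\partial t} = 0 $$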

This single equation with two unknowns is known as the aperture problem of optical flow algorithms and cannot be solved on its own. To find the optical flow, another set of equations is needed, given by some additional constraint. The Horn-Schunck algorithm [10] assumes smoothness in the flow over the whole image: the flow is formulated as a global energy functional, which is then minimized.

$$ E = \iint \left[ (I_{x} V_{x} + I_{y} V_{y} + I_{t})^{2} + \alpha^{2} \left( \left\| \nabla V_{x} \right\|^{2} + \left\| \nabla V_{y} \right\|^{2} \right) \right] dx\, dy $$
(4)

In Eq. (4), the parameter \( \alpha \) is a regularization constant. Larger values of \( \alpha \) lead to a smoother flow. This functional can be minimized by solving the associated multi-dimensional Euler-Lagrange equations:

$$ \alpha^{2} \Delta V_{x} = I_{x}^{2} V_{x} + I_{x} I_{y} V_{y} + I_{t} I_{x}, \qquad \alpha^{2} \Delta V_{y} = I_{x} I_{y} V_{x} + I_{y}^{2} V_{y} + I_{t} I_{y} $$
(5)

where subscripts again denote partial differentiation and \( \Delta = \partial^{2}/\partial x^{2} + \partial^{2}/\partial y^{2} \) denotes the Laplace operator. In practice the Laplacian is approximated numerically using finite differences and may be written \( \Delta V_{x} = \bar{V}_{x} - V_{x} \), \( \Delta V_{y} = \bar{V}_{y} - V_{y} \), where \( \bar{V}_{x} \) and \( \bar{V}_{y} \) are weighted averages of \( V_{x} \) and \( V_{y} \) calculated in a neighborhood around the pixel at location \( (x,y) \). Using this notation, the above equation system may be written:

$$ (I_{x}^{2} + \alpha^{2} )V_{x} + I_{x} I_{y} V_{y} = \alpha^{2} \bar{V}_{x} - I_{t} I_{x} ,I_{x} I_{y} V_{x} + (I_{y}^{2} + \alpha^{2} )V_{y} = \alpha^{2} \bar{V}_{y} - I_{t} I_{y} $$
(6)

However, since the solution depends on the neighboring values of the flow field, it must be repeated once the neighbors have been updated. The following iterative scheme is derived:

$$ V_{x}^{n + 1} = \bar{V}_{x}^{n} - I_{x} \frac{I_{x} \bar{V}_{x}^{n} + I_{y} \bar{V}_{y}^{n} + I_{t}}{\alpha^{2} + I_{x}^{2} + I_{y}^{2}}, \qquad V_{y}^{n + 1} = \bar{V}_{y}^{n} - I_{y} \frac{I_{x} \bar{V}_{x}^{n} + I_{y} \bar{V}_{y}^{n} + I_{t}}{\alpha^{2} + I_{x}^{2} + I_{y}^{2}} $$
(7)

where the superscript n + 1 denotes the next iteration to be calculated and n denotes the last calculated result.
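As an illustration, the iterative scheme of Eq. (7) can be sketched in NumPy/SciPy as below. The derivative kernels, the averaging kernel, and the default values of alpha and n_iter are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(im1, im2, alpha=1.0, n_iter=100):
    """Minimal Horn-Schunck sketch following Eqs. (1)-(7).
    im1, im2: consecutive grayscale frames as 2-D float arrays."""
    im1 = im1.astype(np.float64)
    im2 = im2.astype(np.float64)

    # Spatial and temporal derivatives (simple finite differences).
    Ix = convolve(im1, np.array([[-1.0, 1.0]]))
    Iy = convolve(im1, np.array([[-1.0], [1.0]]))
    It = im2 - im1

    # Kernel giving the neighbourhood average V_bar used in Eq. (7).
    avg = np.array([[0.0, 0.25, 0.0],
                    [0.25, 0.0, 0.25],
                    [0.0, 0.25, 0.0]])

    Vx = np.zeros_like(im1)
    Vy = np.zeros_like(im1)
    for _ in range(n_iter):
        Vx_bar = convolve(Vx, avg)
        Vy_bar = convolve(Vy, avg)
        num = Ix * Vx_bar + Iy * Vy_bar + It   # shared numerator of Eq. (7)
        den = alpha ** 2 + Ix ** 2 + Iy ** 2   # shared denominator of Eq. (7)
        Vx = Vx_bar - Ix * num / den
        Vy = Vy_bar - Iy * num / den
    return Vx, Vy
```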

2.2 Pulse-Coupled Neural Network

Figure 2 illustrates the structure of the unit-linking PCNN, adapted from [3].

Fig. 2. Unit-linking PCNN architecture [3]

The unit-linking PCNN architecture can be described by the following formulas:

$$ F_{j} = dif\_dir_{j}, \qquad L_{j} = \mathrm{step}\left[ \sum_{k \in N(j)} Y_{k}(t) \right] = \begin{cases} 1 & \text{if } \sum_{k \in N(j)} Y_{k}(t) > 0 \\ 0 & \text{otherwise} \end{cases} $$
(8)
$$ U_{j} = F_{j}(1 + \beta_{j} L_{j}), \qquad Y_{j} = \mathrm{step}(U_{j} - \theta_{j}) = \begin{cases} 1 & \text{if } U_{j}(t) \ge \theta_{j}(t) \\ 0 & \text{otherwise} \end{cases} $$
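A minimal sketch of one iteration of Eq. (8) is given below; the linking strength beta and the handling of the threshold theta (kept fixed here) are simplifying assumptions rather than the authors' exact settings.

```python
import numpy as np
from scipy.ndimage import convolve

def unit_linking_pcnn_step(F, Y, theta, beta=0.2):
    """One unit-linking PCNN iteration following Eq. (8).
    F: feeding input per pixel (e.g. dif_dir); Y: previous binary fire map;
    theta: firing threshold (scalar or per-pixel); beta: linking strength."""
    four_neigh = np.array([[0, 1, 0],
                           [1, 0, 1],
                           [0, 1, 0]], dtype=float)
    # L_j = 1 if any 4-neighbour fired in the previous iteration.
    L = (convolve(Y.astype(float), four_neigh, mode='constant') > 0).astype(float)
    # Internal activity and firing decision.
    U = F * (1.0 + beta * L)
    Y_new = (U >= theta).astype(float)
    return Y_new, L, U
```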

2.3 PCNN Fusion Based on Optical Flow

We extract moving targets from the optical flow using both its magnitude and its direction. Since the PCNN has a fusion capability, this section combines the PCNN fusion characteristic with the quantized optical flow direction information obtained after pre-processing. The target and the background, each with its own consistent motion characteristic, are fused separately, realizing the separation of foreground and background and thus segmenting the moving targets in the optical flow. The calculation proceeds in the following three steps; a code sketch follows Step 3.

Step1: Optical flow field pre-processing: for each pixel, the optical flow is offset by a background vector pointing against the direction of the maximum optical flow, with magnitude equal to 1/10 of the maximum optical flow magnitude.

Step2: Direction difference quantization: for a pixel \( \left( {x,y} \right) \) with optical flow value \( \left( {u,v} \right) \), the direction of the optical flow is expressed as \( Ang = {\text{atan2}}(u,v)/\pi \), so that the direction is indicated by a value in the range -1 to 1, distributed as shown in Fig. 3.

Fig. 3. Direction difference quantization

The direction difference mentioned above is computed as in Eq. (9): the result is the absolute value of the difference between the current pixel's direction value and the mean direction of its fired 4-neighbors; applying Eq. (9) then yields a direction difference \( dif\_dir_{i} \) that increases monotonically over the range 0 to 1.

$$ difAng_{i} = \left| Ang_{i} - \mathrm{mean}(Ang_{j}) \right|,\; j \in \Omega; \qquad dif\_dir_{i} = \min(difAng_{i},\, 2 - difAng_{i}) $$
(9)

Step3: Unit-linking PCNN fusing direction features: the input to the F channel is the direction difference \( dif\_dir_{i} \) between the current pixel's direction value \( Ang_{i} \) and the mean of the direction values of its 4-neighbors whose fire state is 1 (Eq. (9)). The L channel collects the fire information of the current pixel's 4-neighborhood.
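The sketch below illustrates one possible reading of Steps 1 and 2 and of Eq. (9); the exact pre-processing and the handling of fired neighbours in the authors' implementation may differ.

```python
import numpy as np

def preprocess_and_quantize(u, v):
    """Step 1-2 sketch: add a background vector opposing the maximum flow
    (1/10 of its magnitude), then quantize directions to [-1, 1]."""
    mag = np.hypot(u, v)
    iy, ix = np.unravel_index(np.argmax(mag), mag.shape)
    if mag[iy, ix] > 0:
        # Background vector against the maximum optical flow direction,
        # with 1/10 of its magnitude.
        u = u - u[iy, ix] / 10.0
        v = v - v[iy, ix] / 10.0
    # Ang = atan2(u, v) / pi, giving values in [-1, 1].
    return np.arctan2(u, v) / np.pi

def direction_difference(ang_i, fired_neighbour_angs):
    """Eq. (9): wrap-around difference to the mean direction of the fired
    4-neighbours, mapped monotonically into [0, 1]."""
    dif_ang = abs(ang_i - np.mean(fired_neighbour_angs))
    return min(dif_ang, 2.0 - dif_ang)
```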

Figure 4 shows the fusion result obtained with the PCNN fusion method.

Fig. 4. PCNN fusion effect

According to [6], the metric most widely used in computer vision to assess the performance of a binary classifier is the percentage of correct classification (PCC), which combines four values: the number of true positives (TP), counting correctly detected foreground pixels; the number of false positives (FP), counting background pixels incorrectly classified as foreground; the number of true negatives (TN), counting correctly classified background pixels; and the number of false negatives (FN), counting foreground pixels incorrectly classified as background.

$$ PCC = \frac{TP + TN}{TP + TN + FP + FN} $$
(10)
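For reference, PCC can be computed directly from a predicted binary mask and its ground truth, as in the short sketch below.

```python
import numpy as np

def pcc(pred, gt):
    """Percentage of correct classification, Eq. (10), for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)       # foreground correctly detected
    tn = np.sum(~pred & ~gt)     # background correctly classified
    fp = np.sum(pred & ~gt)      # background labelled as foreground
    fn = np.sum(~pred & gt)      # foreground labelled as background
    return (tp + tn) / (tp + tn + fp + fn)
```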

As shown in Fig. 5, the classification accuracy of the PCNN fusion of motion information is almost always above 95%.

Fig. 5. The effect of SVM classification

2.4 Topological Property

Chen Lin proposed the theory of topological perception in 1982 [7]. According to this theory, a stimulus is separated into different global wholes (a figure and a background) depending only on global properties, which can be described mathematically as topological properties such as connectivity. Literature [4] applied the connectivity of topological perception to visual attention. We improve the topological algorithm of [4] with the following three steps, to avoid the problem of selecting a segmentation threshold and to better filter out backgrounds containing two or more color tones; a code sketch follows Step 3.

Step1: The grayscale image converted from the color video frame is resized to 64 × 64.

Step2: Unit-linking PCNN extracting topological connectivity: the input to the F channel is the intensity difference \( dif\_I_{i} \) between the current pixel's intensity \( I_{i} \) and the mean of the intensity values of its 4-neighbors whose fire state is 1. The L channel collects the fire information of the current pixel's 4-neighborhood.

$$ F_{j} = dif\_I_{j}, \qquad L_{j} = \mathrm{step}\left[ \sum_{k \in N(j)} Y_{k}(t) \right] = \begin{cases} 1 & \text{if } \sum_{k \in N(j)} Y_{k}(t) > 0 \\ 0 & \text{otherwise} \end{cases} $$
(11)

Step3: The binary image computed by the PCNN filter in Step 2 is the input of the topological channel, which expresses connectivity.
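A rough sketch of Steps 1-3 is given below, reusing the unit_linking_pcnn_step sketch from Sect. 2.2; the threshold, linking strength, iteration count, and the approximation of the fired-neighbour mean by a plain 4-neighbour mean are assumptions for illustration only.

```python
import numpy as np
import cv2
from scipy.ndimage import convolve

def topological_channel(frame_bgr, beta=0.2, theta=0.15, n_iter=10):
    """Improved topological channel sketch (Steps 1-3)."""
    # Step 1: grayscale frame resized to 64 x 64.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    gray = cv2.resize(gray, (64, 64))

    # Step 2: feeding input dif_I, here approximated by the intensity
    # difference to the plain 4-neighbour mean.
    kernel = np.array([[0, 0.25, 0], [0.25, 0, 0.25], [0, 0.25, 0]])
    dif_I = np.abs(gray - convolve(gray, kernel))

    # Step 3: iterate the unit-linking PCNN; the final fire map is the
    # binary image expressing topological connectivity.
    Y = np.zeros_like(gray)
    for _ in range(n_iter):
        Y, _, _ = unit_linking_pcnn_step(dif_I, Y, theta, beta)
    return Y
```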

In Fig. 6, the second and third images are, respectively, the original topological channel and the improved topological channel of the first image. Different tones (grass and road) cannot all be filtered out in the original topological channel, which causes the pedestrians to sink into the grass. By inputting the intensity difference into the F channel of the PCNN, the improved topological channel filters out the grass and road successfully. The improvement yields a more accurate representation of topological connectivity.

Fig. 6. Examples of the improved topological channel

3 Algorithm Structure

The steps of the proposed model are as follows; a code sketch of Steps 3-5 is given after Eq. (13).

Step1: The grayscale image converted from the color video frame is resized to 64 × 64. The optical flow of consecutive grayscale images is calculated using the Horn-Schunck (HS) method, as in Sect. 2.1.

Step2: Optical flow field pre-processing: for each pixel, the optical flow is offset by a background vector pointing against the direction of the maximum optical flow, with magnitude equal to 1/10 of the maximum optical flow magnitude. The PCNN then fuses the pre-processed optical flow as in Sect. 2.3.

Step3: The topological channel T is computed using the PCNN as in Sect. 2.4. The two color pairs [1] are \( RG = R - G \) and \( BY = B - Y \), where \( R = r - \left( {g + b} \right)/2 \), \( G = g - \left( {r + b} \right)/2 \), \( B = b - \left( {r + g} \right)/2 \), \( Y = \left( {r + g} \right)/2 - \left| {r - g} \right|/2 - b \), and r, g, b are the red, green, and blue channels of the color image, respectively.

Step4: The phase spectrum is obtained by normalizing the Fourier transform of T, RG, and BY. Phase information is then obtained from the phase spectrum by the inverse Fourier transform.

$$ p = f^{-1} \{ P[ f(\mathrm{RG}, \mathrm{BY}, 0.4 \cdot \mathrm{T}) ] \} $$
(12)

Step5: The saliency map is computed by smoothing the linear fusion of the phase information p, the optical flow magnitude |OF|, and the direction fusion fus. In Eq. (13), \( \omega_{1} = 1.0,\omega_{2} = 1.2,\omega_{3} = 1.5,\sigma = 8 \).

$$ S\_Map = G(\sigma) * \left[ \omega_{1} \cdot p + \omega_{2} \cdot \left| \mathrm{OF} \right| + \omega_{3} \cdot fus \right]^{2} $$
(13)
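The sketch below strings Steps 3-5 together under stated assumptions: the frame is taken as an OpenCV BGR image, the three channels of Eq. (12) are combined by summing per-channel phase-only reconstructions, and the Gaussian smoothing G(σ) is realized with an OpenCV blur; the paper's exact channel combination and smoothing may differ.

```python
import numpy as np
import cv2

def saliency_map(frame_bgr, T, of_mag, fus, w1=1.0, w2=1.2, w3=1.5, sigma=8):
    """Steps 3-5 sketch: color pairs, phase information (Eq. (12)),
    smoothed fusion (Eq. (13)). T, of_mag, fus are assumed 64 x 64 maps."""
    frame = cv2.resize(frame_bgr.astype(np.float32) / 255.0, (64, 64))
    b, g, r = frame[..., 0], frame[..., 1], frame[..., 2]

    # Step 3: broadly tuned color channels and the two color pairs [1].
    R = r - (g + b) / 2
    G = g - (r + b) / 2
    B = b - (r + g) / 2
    Y = (r + g) / 2 - np.abs(r - g) / 2 - b
    RG, BY = R - G, B - Y

    # Step 4: phase-only reconstruction of each channel (one plausible
    # reading of Eq. (12)).
    p = np.zeros_like(RG)
    for ch in (RG, BY, 0.4 * T):
        F = np.fft.fft2(ch)
        p += np.abs(np.fft.ifft2(np.exp(1j * np.angle(F))))

    # Step 5: smoothed quadratic fusion, Eq. (13).
    s = (w1 * p + w2 * of_mag + w3 * fus) ** 2
    return cv2.GaussianBlur(s, (0, 0), sigma)
```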

4 Experimental Results

The proposed algorithm is implemented on our video database and compared with FT [5], Vibe [6], and PQFT [1].

4.1 Database

See Table 1.

Table 1. Information of database

4.2 Attention Detection Effects

In our experiments, one widely used saliency detection algorithm (FT [5]) and two common target tracking algorithms (Vibe [6], PQFT [1]) are compared with our algorithm. Figure 7 shows the results of these three methods and the proposed method.

Fig. 7. Video frames and their saliency maps

As can be seen in Fig. 7, the proposed detection yields more salient targets and a darker background than FT, Vibe, and PQFT. For the video Parachute, taken with a moving camera, FT focuses more on the bright light through the hole than on the flying parachute because it lacks motion information. Vibe attends to the edges of the flying parachute and the light, because the light is "moving" on the screen. The proposed model solves the problem of target tracking with a moving camera by exploiting the motion direction difference. For videos with a static camera, Vibe's results are sometimes quite good, such as the 84th frame of Walking; however, spurious targets appear frequently, such as the middle pedestrian in the 116th frame of Pedestrians. Although PQFT focuses on the moving targets correctly, some of its results are incomplete and its background is distracting.

4.3 Comparison of Attention Detection Models

To further illustrate the effectiveness of the proposed algorithm, which combines the visual attention model with PCNN and optical flow, we use the commonly used F-Measure to compare the proposed model with FT [5], Vibe [6], and PQFT [1]. Let G denote the ground-truth region and S the detected saliency region:

$$ \mathrm{Precision} = \frac{\left| G \cap S \right|}{\left| S \right|}, \qquad \mathrm{Recall} = \frac{\left| G \cap S \right|}{\left| G \right|}, \qquad \text{F-Measure} = \frac{2 \cdot p_{thr} \cdot r_{thr}}{p_{thr} + r_{thr}} $$
(14)
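F-Measure can be computed from binary masks as in the sketch below; the guard against empty masks is an assumption added for robustness.

```python
import numpy as np

def f_measure(saliency_mask, ground_truth):
    """Precision, Recall and F-Measure for binary masks, Eq. (14)."""
    S = saliency_mask.astype(bool)
    G = ground_truth.astype(bool)
    inter = np.sum(S & G)
    precision = inter / max(np.sum(S), 1)
    recall = inter / max(np.sum(G), 1)
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```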

Table 2 compares the proposed algorithm with FT [5], Vibe [6], and PQFT [1] in terms of F-Measure on different samples. For videos with a moving camera or background disturbance (Parachute and Birdfall), the F-Measure of the proposed model is the highest of the four models. On the videos Pedestrians and Walking, the proposed model is more effective than FT and PQFT, and is comparable with Vibe.

Table 2. F-Measure

5 Conclusion

This paper proposed a moving target tracking algorithm that combines a visual attention model with a pulse-coupled neural network and optical flow, and that achieves better tracking performance than traditional algorithms. Based on the relative motion between moving targets and the background, the target regions and the background are fused separately using the fusion ability of the PCNN. Meanwhile, the improved topological channel improves the filtering of backgrounds with multiple color tones. Experimental results show that the proposed method has a higher detection rate and a better ability to suppress the background.