1 Introduction

Advances in MEMS technology have made it possible to produce tiny wireless multimedia sensor devices. These sensors promise to revolutionize sensing in a wide range of application domains because of their flexibility and low cost. Their self-organization and fault tolerance have led to wireless multimedia sensor networks (WMSN) being widely used for target tracking. There are evident differences between WMSN and traditional wireless sensor networks (WSN). Firstly, multimedia sensors produce high-bit-rate data, which makes existing WSN protocols inefficient, and multimedia data requires compute-intensive processing that consumes considerable energy. Secondly, multimedia data necessitates near real-time delivery of image content to the destination [13]. Packets containing acoustic and image information must reach the destination before a playout deadline; otherwise they are considered obsolete and are simply dropped by the destination.

The fundamental design tradeoff in WMSN is between application-specific Quality of Service (QoS) and energy consumption [9]. Specifically, for target tracking, QoS is determined by the tracking error, which is defined as the average target location estimation error. The lifetime of the sensor network is determined by the network energy consumption, that is, the energy consumption rate of all sensor nodes. Thus, the fundamental design tradeoff can be stated more explicitly as the tradeoff between the tracking error and the network energy consumption. The processing capability of micro-sensor nodes is also limited by energy and cost constraints, which prevents the implementation of complicated signal processing algorithms on sensor nodes. Therefore, a fully distributed, lightweight target localization/tracking algorithm is needed [25].

Much work exists in the literature on target tracking within WSN, and most target detection and localization algorithms require the sensors to work in groups in order to improve efficiency and accuracy. Deployed sensors therefore need to discover each other and form groups so that their coverage is maximized. Considerable work has also been done on localization/tracking algorithms. CPA (Closest Point of Approach) [9] is a data fusion algorithm that relies on raw data gathered by individual sensors. Each sensor monitors the acoustic information from the target with a microphone; specifically, it monitors the signal energy over a given time and confirms the presence of a target once the signal strength exceeds a certain threshold. The CPA algorithm is very simple, and consequently its localization error is large. To enhance tracking accuracy on sensor networks of limited capability, learning theories can be applied instead of using raw measurements directly. Previous reports exist on kernel-based learning [14], maximum likelihood parametric approaches [18], and distributed learning [15]. With the distributed Kalman filter [8], localization accuracy has been improved in low-cost, low-capability sensor networks.

In [4], the authors proposed a model that integrates the human visual characteristics of video motion, in the frequency multi-resolution wavelet domain, with a multi-dimensional fuzzy inference perceptual model. However, because of the fuzzy inference, the model is slow and not suitable for real-time WMSN applications. In [17], the authors introduced a saliency model, together with face detection, to crop the informative parts of images before reducing them to thumbnails. The same model is also used in [3], with the addition of text detection, to crop regions of interest in images for adaptation to small displays. In [23], the authors study the statistics of users’ interaction with images on small displays to determine the regions of maximum user interest.

In [7], the authors proposed a resource allocation scheme based on predictive mobility in mobile wireless environments. In their paper, a directionality probability was introduced to determine which cell the mobile target will visit next. The cell lying in the direction from the previous cell to the current cell is regarded as the most likely to be visited. Their scheme can be used for resource allocation in cellular networks that hold the user’s mobility profile, but it cannot be used in WMSN to track a mobile target whose mobility profile is unknown. In [1], the authors modeled the mobile target as a random walk: because the vehicle kinematics are ignored, the target can take any direction from its current location. As a result, the tracking area includes regions that the target cannot actually reach within the given time. In contrast, since our scheme models the target’s movement based on vehicular kinematics [2], only the area that the target can mechanically reach belongs to the tracking area. We can therefore reduce the number of working sensor nodes in each tracking area, called the minimal contour, and so improve energy efficiency.

In this paper, we propose an algorithm that estimates the target location and reduces the number of working sensors. The rest of this paper is organized as follows. Section 2 introduces the architecture of WMSN for target tracking. Section 3 presents a target localization scheme using acoustic and image sensors. Section 4 proposes a Gauss-Markov (GM) algorithm to predict the target trajectory, together with the method used to form the tracking region. Section 5 reports experiments that demonstrate the validity of our algorithm. Section 6 concludes the paper.

2 Hybrid wireless multimedia sensors networks for target tracking

WMSN is used for target tracking in many fields. Unlike traditional data communication networks, WMSN is usually not address-centric: an individual sensor node generally does not have a globally unique ID in the network. For target tracking applications specifically, data communication is event-centric and location-centric. Event-centric means that network operation and wireless data exchange are triggered by acoustic and image information in the region of interest. Location-centric means that the destination of wireless packets is the set of nodes within a specific region rather than a particular node [5, 6].

Figure 1 shows the architecture of WMSN. A detailed discussion of WMSN hardware, architecture and existing test beds can be found in [16, 22].

Fig. 1 Architecture of WMSN

Only the sensors in the sensing area are used to detect targets and to forward sensed data to the sink; all other sensors go into a sleep state. This is commonly achieved with a sensor wake-up scheduling protocol, under which some sensors stay active to provide sensing services while the others sleep to conserve energy. We consider the sensor scheduling problem of maximizing network lifetime while maintaining both target coverage and network connectivity in WMSN.

Suppose that, for a tracking application within WMSN, an image is captured and must be sampled for storage as well as transmitted over a wireless channel. RF transmission power control is also considered in WMSN. It is used not only to determine which neighboring sensor nodes can receive information, but also to reduce the communication cost among clustered sensor nodes [11, 22]. In addition, as transmission performance is greatly affected by the transmission distance [19,20,21], a shortened transmission radius can also improve transmission performance. Directional antenna technology is used in mobile ad hoc networks, including WMSN, for parallel communication at the MAC protocol level [10, 12]. Our tracking algorithm uses the directional antenna in order to reduce the number of RF receiving sensors.

Based on the above literature, and given the limitations of acoustic sensors, it is difficult to determine the target location accurately with acoustic sensors alone. In this paper, we propose a hybrid wireless multimedia sensor network for target tracking. An acoustic and image information processing scheme is required to achieve the intended target detection and tracking accuracy.

3 Acoustic and image information processing scheme for target localization

In the physical world, a mobile target changes the environmental parameters around it, so the target can be tracked by sensing those parameters.

3.1 Acoustic information processing scheme

The intensity of an acoustic signal emitted omni-directionally from the target and propagating along the ground surface attenuates at a rate that is inversely proportional to the square of the distance from the source, as expressed in Eq. (1). The process of target localization based on acoustic sensors is shown in Fig. 2.

Fig. 2 Process of target localization using acoustic sensors

We assume that the acoustic intensities of the target are linearly superimposed without any interaction among them [7]. The acoustic energy measured by sensor i can then be written as

$$ {E}_i(t)=\frac{\gamma_i}{{\left\Vert {L}_T(t)-{L}_i\right\Vert}^2}{E}_s(t)+{\varepsilon}_i(t) $$
(1)

Where

$$ {E}_s(t)=\frac{1}{N}\sum_{n=0}^{N-1}{s}^2\left( t+\frac{n}{f_s}\right) $$
(2)
$$ {\varepsilon}_i(t)=\frac{1}{N}\sum_{n=0}^{N-1}{v_i}^2\left( t+\frac{n}{f_s}\right) $$
(3)

The target signal strength \( E_s(t) \) at time t is assumed to follow a Gaussian distribution, \( E_s(t)\sim N\left({E}_S,{\sigma_s}^2\right) \), where \( {\sigma_s}^2 \) is the variance. Each \( {v_i}^2\left(t+n/{f}_s\right) \) is a scaled chi-square random variable [7] with mean \( {\sigma_i}^2 \) and variance \( 2{\sigma_i}^4 \). When the number of samples is large enough, the central limit theorem implies that \( {\varepsilon}_i(t) \) approximately follows a Gaussian distribution with mean \( {\sigma_i}^2 \) and variance \( 2{\sigma_i}^4/N \):

$$ {\varepsilon}_i(t)\sim N\left({\sigma_i}^2,\frac{2{\sigma_i}^4}{N}\right) $$
(4)

In practice, for N > 20, \( {\varepsilon}_i(t) \) is approximately Gaussian.
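The Gaussian approximation in Eq. (4) can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original experiments: it draws zero-mean Gaussian noise samples, averages their squares as in Eq. (3), and compares the empirical mean and variance with \( {\sigma_i}^2 \) and \( 2{\sigma_i}^4/N \). The function names, the number of trials and the seed are illustrative assumptions; the values σ = 3 and N = 50 are borrowed from the experiment in Section 5.

```python
import random
import statistics

def noise_energy(sigma, n_samples, rng):
    """Average energy of n_samples zero-mean Gaussian noise samples (Eq. 3)."""
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n_samples)) / n_samples

def noise_energy_stats(sigma, n_samples, trials=2000, seed=1):
    """Empirical mean and variance of the noise energy over many trials.

    trials and seed are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    draws = [noise_energy(sigma, n_samples, rng) for _ in range(trials)]
    return statistics.fmean(draws), statistics.variance(draws)

if __name__ == "__main__":
    sigma, n = 3.0, 50
    mean, var = noise_energy_stats(sigma, n)
    print("empirical mean:", mean, "predicted by Eq. (4):", sigma ** 2)
    print("empirical variance:", var, "predicted by Eq. (4):", 2 * sigma ** 4 / n)
```

With these settings the empirical values should lie close to the predicted mean of 9 and variance of 3.24, up to sampling error.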

In the sensing field, let n be the number of sensor nodes that detect the target, and let S be the set of these nodes. Using the optimal linear combination method [4], the target coordinates \( {\widehat{L}}_T(t) \) can be calculated by Eq. (5):

$$ {\widehat{L}}_T(t)=\sum_{i\in S}{\omega}_i{L}_i $$
(5)

In Eq. (5), the \( {\omega}_i \) (i = 1, 2, …, n) are the weights to be estimated, which satisfy \( \sum {\omega}_i=1 \). According to Eq. (1), the following equation can be obtained,

$$ E\left({d_i}^2\right)= E\left({\left\Vert {L}_T(t)-{L}_i\right\Vert}^2\right)= E\left(\frac{E_s(t){\gamma}_i}{E_i(t)-{\varepsilon}_i(t)}\right) $$
(6)

According to Eq. (4), \( {\varepsilon}_i(t) \) is approximately normally distributed, so we get

$$ E\left({d_i}^2\right)= E\left(\frac{E_s(t){\gamma}_i}{E_i(t)-{\varepsilon}_i(t)}\right)={E}_s(t){\gamma}_i\sqrt{\frac{N}{4{{\pi \sigma}_i}^4}}{\int}_{-\infty}^{+\infty}\frac{1}{E_i(t)-{\sigma_i}^2- x}{e}^{-\frac{x^2 N}{4{\sigma_i}^4}} dx $$
(7)

Following MRC theory [15], the weights are given by

$$ {\omega}_i=\frac{\frac{1}{E\left({d_i}^2\right)}}{\sum_{j=1}^n\frac{1}{E\left({d_j}^2\right)}} $$
(8)

Define

$$ {\mu}_i=\frac{E_s(t)}{E\left({d_i}^2\right)} $$
(9)

According to Eq. (7), Eq. (9) can be expressed as,

$$ {\mu}_i=\frac{1}{\gamma_i\sqrt{\frac{N}{4{{\pi \sigma}_i}^4}}{\int}_{-\infty}^{+\infty}\frac{1}{E_i(t)-{\sigma_i}^2- x}{e}^{-\frac{x^2 N}{4{\sigma_i}^4}} dx} $$
(10)

So Eq. (8) can be written as,

$$ {\omega}_i=\frac{\mu_i}{\sum_{j=1}^n{\mu}_j} $$
(11)

When the nodes receive the acoustic signal of the target, \( {\omega}_i \) is calculated by Eq. (11), and the target location is estimated by Eq. (5).
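The weighting in Eqs. (5) and (11) can be sketched in a few lines. To keep the example short, the integral in Eq. (10) is not evaluated; instead it is replaced by its small-noise-variance limit, which gives \( {\mu}_i\approx \left({E}_i(t)-{\sigma_i}^2\right)/{\gamma}_i \). This simplification, the function name and the argument layout are assumptions made for illustration and are not taken from the paper.

```python
def estimate_location(positions, energies, gains, noise_vars):
    """Weighted-centroid estimate of the target location (Eqs. 5 and 11).

    positions  : list of (x, y) sensor coordinates L_i
    energies   : measured energies E_i(t)
    gains      : sensor gains gamma_i
    noise_vars : noise variances sigma_i^2

    Assumption (ours): mu_i ~ (E_i - sigma_i^2) / gamma_i, i.e. the integral
    in Eq. (10) is replaced by its small-noise-variance limit.
    """
    mus = []
    for e, g, s2 in zip(energies, gains, noise_vars):
        # Clamp at zero so a sensor that hears only noise gets no weight.
        mus.append(max(e - s2, 0.0) / g)
    total = sum(mus)
    if total == 0.0:
        raise ValueError("no sensor measured energy above its noise floor")
    weights = [m / total for m in mus]            # Eq. (11)
    x = sum(w * p[0] for w, p in zip(weights, positions))  # Eq. (5)
    y = sum(w * p[1] for w, p in zip(weights, positions))
    return (x, y), weights

if __name__ == "__main__":
    corners = [(0, 0), (10, 0), (0, 10), (10, 10)]
    print(estimate_location(corners, [5.0] * 4, [1.0] * 4, [1.0] * 4))
```

Sensors that measure more energy above their noise floor pull the estimate toward themselves, which is the behavior the weights in Eq. (8) are designed to produce.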

3.2 Image information processing scheme

The energy consumption of the image sensors is much larger than that of the acoustic sensors, so the acoustic sensors localize the target first. In this way, WMSN obtains a rough localization of the target. After the acoustic sensors find the target, the image sensors start to monitor it to obtain a precise location. The image sensors can rotate through 360 degrees and obtain the angle of the target. The detection process of the image sensors is illustrated in Fig. 3.

Fig. 3 The detection area of image sensors

The challenge in image detection systems is to define the background accurately. Once the background is modeled, the foreground (the moving target) can easily be separated from it. Accurate background modeling is made difficult by changes in the background that are not part of the object of interest, so the background model must be updated constantly to compensate for these effects. Once the background model statistics have been defined, subsequent images are tested to see whether their pixels lie within the high and low range around the average background pixel. Pixels outside that range are classified as foreground. The Visual Studio .NET environment and OpenCV are used to implement the proposed system [10].
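The high/low range test described above can be illustrated with a small pure-Python sketch. It is not the authors' OpenCV implementation: the background statistics chosen here (per-pixel mean and mean absolute frame-to-frame difference), the `scale` and `floor` parameters, and the function names are all assumptions made for illustration.

```python
def build_background(frames):
    """Per-pixel mean and mean absolute frame-to-frame difference.

    frames: list of equally sized 2-D lists of grey levels (at least two).
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    n = len(frames)
    mean = [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]
    diff = [[sum(abs(frames[k][r][c] - frames[k - 1][r][c])
                 for k in range(1, n)) / (n - 1)
             for c in range(cols)] for r in range(rows)]
    return mean, diff

def foreground_mask(frame, mean, diff, scale=3.0, floor=1.0):
    """Mark pixels outside [mean - band, mean + band] as foreground (True).

    band = scale * max(diff, floor); scale and floor are hypothetical tuning
    parameters, not values from the paper.
    """
    mask = []
    for r, row in enumerate(frame):
        out = []
        for c, value in enumerate(row):
            band = scale * max(diff[r][c], floor)
            out.append(abs(value - mean[r][c]) > band)
        mask.append(out)
    return mask

if __name__ == "__main__":
    history = [[[10, 10], [10, 10]], [[11, 10], [10, 9]], [[10, 10], [10, 10]]]
    mean, diff = build_background(history)
    print(foreground_mask([[10, 200], [10, 10]], mean, diff))
```

In a deployed system the mean and difference images would be refreshed continuously so that slow background changes are absorbed into the model.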

After extracting the moving target, we employ a dynamic programming approach with the following procedure. First, motion parameters are extracted from a set of images in the time domain. Then, we build a weighted directed graph. A shortest path algorithm through the graph selects the first optimal trajectory [16]. After this trajectory has been smoothed, the tracking area can be constructed (Fig. 4).
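The shortest-path step can be sketched as a dynamic program over a layered graph. The text does not specify how the graph or its edge weights are built, so the sketch below assumes one layer per frame, one node per candidate detection, and the Euclidean distance between candidates in consecutive frames as the edge weight. These choices and the function name are illustrative only.

```python
import math

def best_trajectory(candidates):
    """Minimum-cost path through per-frame candidate positions.

    candidates[t] is a non-empty list of (x, y) detections in frame t.
    Edges join every candidate in frame t to every candidate in frame t + 1;
    the edge weight (our assumption) is the Euclidean distance between them.
    """
    # cost[j]: cheapest cost of reaching candidate j of the current frame.
    cost = [0.0] * len(candidates[0])
    back = []  # back[t][j]: best predecessor in frame t of candidate j in frame t + 1
    for prev, cur in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for p in cur:
            options = [cost[i] + math.dist(q, p) for i, q in enumerate(prev)]
            best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[best])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path backwards from the last frame.
    j = min(range(len(cost)), key=cost.__getitem__)
    path = [candidates[-1][j]]
    for t in range(len(back) - 1, -1, -1):
        j = back[t][j]
        path.append(candidates[t][j])
    path.reverse()
    return path

if __name__ == "__main__":
    frames = [[(0, 0)], [(1, 0), (5, 5)], [(2, 0)]]
    print(best_trajectory(frames))
```

Because each layer only connects to the next one, the dynamic program visits every edge once, which keeps the cost low enough for a resource-limited node.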

Fig. 4 Process of target location using image sensors

The shortest path resulting from the previous stage has a noisy appearance, so a moving average smoother is applied to the trajectory. With \( y(t) \) being the original data (raw trajectory) at time t and \( {y}_s(t) \) the smoothed data, the difference equation is:

$$ {y}_s(t)=\frac{1}{2 N+1}\left( y\left( t+ N\right)+ y\left( t+ N-1\right)+\dots + y\left( t- N\right)\right) $$
(12)

where \( 2N+1 \) is the length of the span interval.
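A minimal sketch of the moving average smoother of Eq. (12) follows. The equation does not say what happens near the two ends of the trajectory, where a full window is not available; here the window is shrunk symmetrically so that it stays inside the data, which is an assumption of this sketch, as is the function name.

```python
def moving_average(y, half_span):
    """Smooth y with a centred window of 2 * half_span + 1 samples (Eq. 12).

    Near the ends the window is shrunk symmetrically so that it stays inside
    the data; this edge rule is our assumption, not part of Eq. (12).
    """
    n = len(y)
    smoothed = []
    for t in range(n):
        k = min(half_span, t, n - 1 - t)  # usable half-width at position t
        window = y[t - k:t + k + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

if __name__ == "__main__":
    print(moving_average([0, 3, 0, 3, 0], 1))  # [0.0, 1.0, 2.0, 1.0, 0.0]
```

For a two-dimensional trajectory the same function can be applied to the x and y coordinate sequences separately.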

The performance of the algorithm is tested using a moving vehicle. In Fig. 5, the moving target appears in different places over time, and its localization is performed by the image sensors. In Fig. 6, the red line is the actual trajectory and the yellow line is the calculated trajectory [12], smoothed by Eq. (12). The calculated trajectory closely follows the actual one, which indicates that the proposed scheme localizes the target accurately.

Fig. 5 Moving target at different times

Fig. 6 Comparison of calculated trajectory and actual trajectory

4 Mobile target tracking algorithm

Since most sensor nodes stay asleep to save power before the target arrives, the manager node (the node that manages the tracking region) should predict the direction in which the target is moving and activate the right group of sensor nodes, so that they can detect the target and monitor its surrounding area as soon as it approaches.

4.1 Target trajectory prediction

As a mobile target usually moves towards an explicit destination rather than in aimless random motion, its location, moving speed and angle are correlated in the time domain. In our proposed predictive distance-based mobility management scheme, the future location of a target is predicted by the Gauss-Markov model from its location and velocity at the time of the last location update. The prediction is made available to the sensor network. The sensor network then checks the target’s location periodically and performs a location update whenever the target is at least the threshold distance away from the predicted location.
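The distance-based update rule described above amounts to a single comparison. The sketch below is illustrative only; the function name and the use of the Euclidean distance are assumptions.

```python
import math

def needs_location_update(actual, predicted, threshold):
    """True when the target is at least `threshold` away from its predicted location."""
    return math.dist(actual, predicted) >= threshold

if __name__ == "__main__":
    print(needs_location_update((3.0, 4.0), (0.0, 0.0), 5.0))  # True
```

A larger threshold means fewer location updates and less traffic, at the price of a larger tolerated prediction error.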

In systems with correlated-velocity mobility patterns, unlike those with random-walk mobility patterns, the current location of a target is related to where the target last reported. We assume that the target localization interval is δ, so at time \( k\delta \) the target is located at \( \left({x}_{k\delta},{y}_{k\delta}\right) \), and at time \( \left(k+1\right)\delta \) it is located at \( \left({x}_{\left(k+1\right)\delta},{y}_{\left(k+1\right)\delta}\right) \). Based on piecewise linear fitting, the average velocity of the target over the interval from \( k\delta \) to \( \left(k+1\right)\delta \) is:

$$ v=\frac{\sqrt{{\left({x}_{\left( k+1\right)\delta}-{x}_{k\delta}\right)}^2+{\left({y}_{\left( k+1\right)\delta}-{y}_{k\delta}\right)}^2}}{\delta} $$
(13)

And the motion angle is

$$ \tan \theta =\frac{y_{\left( k+1\right)\delta}-{y}_{k\delta}}{x_{\left( k+1\right)\delta}-{x}_{k\delta}} $$
(14)
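Eqs. (13) and (14) translate directly into code. In the sketch below, which is illustrative rather than the authors' implementation, the angle is recovered with `atan2` instead of inverting the tangent directly; this keeps the correct quadrant and avoids a division by zero when the target moves parallel to the Y axis. The function name is an assumption.

```python
import math

def velocity_and_angle(p_prev, p_next, delta):
    """Average speed (Eq. 13) and heading (Eq. 14) between two fixes.

    p_prev, p_next: (x, y) positions at times k*delta and (k+1)*delta.
    delta: localization interval, in the same time unit as the speed.
    """
    dx = p_next[0] - p_prev[0]
    dy = p_next[1] - p_prev[1]
    speed = math.hypot(dx, dy) / delta
    theta = math.atan2(dy, dx)  # quadrant-aware inverse of tan(theta) = dy / dx
    return speed, theta

if __name__ == "__main__":
    print(velocity_and_angle((0.0, 0.0), (3.0, 4.0), 0.5))
```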

In target tracking systems, the Kalman filter is usually used to predict the trajectory of a mobile target [16], but it requires prior knowledge of the target’s motion parameters to make accurate predictions. In most cases, a WMSN tracking system has no knowledge of the target’s motion parameters, especially for a hostile target. We therefore adopt the Gauss-Markov (GM) algorithm for target trajectory prediction, which can achieve better prediction results without knowledge of the target’s motion characteristics.

In the GM model, g is a discrete time series, δ is the time interval, and \( {g}_{\left(k+1\right)\delta} \) can be expressed by the following equation,

$$ {g}_{\left( k+1\right)\delta}=\rho {g}_{k\delta}+\left(1-\rho \right){\mu}_g+ z\sqrt{1-{\rho}^2} $$
(15)

In Eq. (15), \( z\sim N\left(0,{\sigma_g}^2\right) \), where \( {\sigma_g}^2 \) is the variance of g, \( {\mu}_g \) is the mean value of g, and ρ (0 < ρ < 1) is the correlation coefficient of g.

Because z is a zero-mean Gaussian random variable, the linear least squares method gives the maximum likelihood estimate of \( {g}_{\left(k+1\right)\delta} \),

$$ {\widehat{g}}_{\left( k+1\right)\delta}=\rho {g}_{k\delta}+\left(1-\rho \right){\mu}_g $$
(16)

If the length of the sampling sequence is N, then \( {\mu}_g \) and ρ can be calculated by,

$$ {\mu}_g=\frac{1}{N}\sum_{k=1}^N{g}_{k\delta} $$
(17)
$$ \rho =\frac{N}{N-1}\frac{\sum_{k=2}^N{g}_{k\delta}{g}_{\left( k-1\right)\delta}}{\sum_{k=1}^N{g^2}_{k\delta}} $$
(18)

Taking g to be the moving angle θ and the velocity v in turn, and substituting Eqs. (17) and (18) into Eq. (16), we obtain the predictions \( {\widehat{\theta}}_{\left( k+1\right)\delta} \) and \( {\widehat{v}}_{\left( k+1\right)\delta} \) for the following period,

$$ {\widehat{\theta}}_{\left( k+1\right)\delta}=\frac{N}{N-1}\frac{\sum_{k=2}^N{\theta}_{k\delta}{\theta}_{\left( k-1\right)\delta}}{\sum_{k=1}^N{\theta^2}_{k\delta}}{\theta}_{k\delta}+\left(1-\frac{N}{N-1}\frac{\sum_{k=2}^N{\theta}_{k\delta}{\theta}_{\left( k-1\right)\delta}}{\sum_{k=1}^N{\theta^2}_{k\delta}}\right)\frac{1}{N}\sum_{k=1}^N{\theta}_{k\delta} $$
(19)
$$ {\widehat{v}}_{\left( k+1\right)\delta}=\frac{N}{N-1}\frac{\sum_{k=2}^N{v}_{k\delta}{v}_{\left( k-1\right)\delta}}{\sum_{k=1}^N{v^2}_{k\delta}}{v}_{k\delta}+\left(1-\frac{N}{N-1}\frac{\sum_{k=2}^N{v}_{k\delta}{v}_{\left( k-1\right)\delta}}{\sum_{k=1}^N{v^2}_{k\delta}}\right)\frac{1}{N}\sum_{k=1}^N{v}_{k\delta} $$
(20)

The target location in the following period is

$$ \left(\begin{array}{c}\hfill {\widehat{x}}_{\left( k+1\right)\delta}\hfill \\ {}\hfill {\widehat{y}}_{\left( k+1\right)\delta}\hfill \end{array}\right)=\left(\begin{array}{c}\hfill {x}_{k\delta}\hfill \\ {}\hfill {y}_{k\delta}\hfill \end{array}\right)+\delta {\widehat{v}}_{\left( k+1\right)\delta}\left(\begin{array}{c}\hfill \cos {\widehat{\theta}}_{\left( k+1\right)\delta}\hfill \\ {}\hfill \sin {\widehat{\theta}}_{\left( k+1\right)\delta}\hfill \end{array}\right) $$
(21)
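The prediction chain of Eqs. (16) to (21) can be sketched as follows. The code estimates the mean and correlation coefficient of a sampled series with Eqs. (17) and (18), applies the one-step prediction of Eq. (16) to the speed and angle histories, and then advances the position with Eq. (21). The function names and the guard for an all-zero series are assumptions of this sketch. Note also that the estimator of Eq. (18) is used as written and is not clamped to the interval (0, 1).

```python
import math

def gm_parameters(series):
    """Sample mean (Eq. 17) and correlation coefficient (Eq. 18) of a series."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two samples")
    mu = sum(series) / n
    num = sum(series[k] * series[k - 1] for k in range(1, n))
    den = sum(g * g for g in series)
    # Guard (our assumption): an all-zero series carries no correlation information.
    rho = 0.0 if den == 0.0 else (n / (n - 1)) * num / den
    return mu, rho

def gm_predict(series):
    """One-step prediction of Eq. (16) from the latest sample."""
    mu, rho = gm_parameters(series)
    return rho * series[-1] + (1.0 - rho) * mu

def predict_location(position, speeds, angles, delta):
    """Predicted position one interval ahead (Eq. 21).

    speeds and angles are the histories of v and theta (angles in radians).
    """
    v_hat = gm_predict(speeds)          # Eq. (20)
    theta_hat = gm_predict(angles)      # Eq. (19)
    return (position[0] + delta * v_hat * math.cos(theta_hat),
            position[1] + delta * v_hat * math.sin(theta_hat))

if __name__ == "__main__":
    print(predict_location((1.0, 1.0), [2.0, 2.0, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0], 0.5))
```

For a target that has moved at constant speed and heading, the prediction simply extends the straight line by one interval, as expected.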

4.2 Forming tracking region

When a target first enters the tracking region of the sensor networks, nodes that are awake and close to the target can detect it. These nodes construct an initial tracking region by first selecting a node to be the manager node of the tracking region based on a root election algorithm [13]. A node in the tracking region only sends data to its manager node, which can further reduce the redundancy in data transmission. Certainly, some manager nodes may fail. This can be addressed by allowing a node to select another manager node when its current one fails. As the target moves, many nodes in the tracking region may become far away from the manager node, and hence a large amount of energy may be wasted to send their sensing data to the manager node. In this case, a new manager node should be selected to replace the old one, and the tracking region should be reconfigured accordingly.

Assuming that the target moves at maximum velocity, each possible steering angle yields a different trajectory. Together these trajectories form a heart-shaped region in which the target may appear during the following interval ΔT. Figure 7 contrasts two tracking areas. The small inner region, obtained with our method, is composed of the target trajectories, whereas the round outer region used by the OCR algorithm [24] is much larger than the trajectory area.

Fig. 7 Contrast of two tracking regions

Here, we take a four-wheeled vehicle as the object of analysis. The geometric relationship is shown in Fig. 8.

Fig. 8 Geometry relations of vehicle motion

Pf = (xf, yf) denotes the coordinates of the front axle midpoint, Pb = (xb, yb) the coordinates of the back axle midpoint, ϕ the steering angle, and L the distance from the front wheels to the back wheels. According to the vehicle kinematics, Pf = (xf, yf) lies on the circle with radius Rf and Pb = (xb, yb) lies on the circle with radius Rb. We thus obtain the following equations,

$$ R\mathrm{f}=\frac{L}{ \sin \phi} $$
(22)
$$ R\mathrm{b}=\frac{L}{ \tan \phi} $$
(23)
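As a quick numerical illustration of Eqs. (22) and (23), the sketch below computes both turning radii for a given wheelbase and steering angle. The function name and the error raised for straight-line motion are assumptions of this sketch. Since both radii come from the same right triangle, they satisfy Rf² = Rb² + L², which serves as a consistency check.

```python
import math

def turning_radii(wheelbase, steering_angle_deg):
    """Front- and rear-axle turning radii (Eqs. 22 and 23).

    wheelbase: distance L between the front and back axles.
    steering_angle_deg: steering angle phi in degrees.
    Returns (Rf, Rb).
    """
    phi = math.radians(steering_angle_deg)
    if math.isclose(math.sin(phi), 0.0, abs_tol=1e-12):
        raise ValueError("straight-line motion has no finite turning radius")
    return wheelbase / math.sin(phi), wheelbase / math.tan(phi)

if __name__ == "__main__":
    rf, rb = turning_radii(3.0, 25.0)
    print(rf, rb)                      # roughly 7.10 and 6.43
    print(rf ** 2 - (rb ** 2 + 9.0))   # close to zero
```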

We use Pb = (xb, yb) to represent the coordinates of the target, Rb its turning radius, ϕ the wheel steering angle and V the velocity of the target. Clearly, the tracking region changes with the characteristics of the mobile target, and the region is a function of L, ϕmax (maximum steering angle) and Vmax (maximum speed)

$$ A= f\left( L,{\phi}_{\max },{V}_{\max}\right) $$
(24)

In Fig. 9, the target location is the origin of the coordinate system and the direction of V is the positive direction of the Y axis. If the target moves at maximum speed for the same length of time, then whatever the steering angle, every trajectory has the same length, which can be expressed by

$$ S=2\uppi {R}_{\mathrm{b}}=\frac{2\uppi L}{ \tan {\phi}_{\max }} $$
(25)

Fig. 9 Geometry relations of mobile target trajectory

Consider one such trajectory. Because each trajectory corresponds to a different steering angle, each is a circular arc with its own radius R and central angle α. The relationship between R and α can be expressed by

$$ R=\frac{S}{\alpha}=\frac{2\uppi L}{\alpha \tan {\phi}_{\max }} $$
(26)

The border of the tracking region can then be expressed by the following parametric equations

$$ x=\frac{2\uppi L}{\alpha \tan {\phi}_{\max }}\times \sin \alpha $$
(27)
$$ y=\frac{2\uppi L}{\alpha \tan {\phi}_{\max }}\times \left(1- \cos \alpha \right) $$
(28)
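The parametric border of Eqs. (27) and (28) can be sampled numerically. The following sketch evaluates the border for central angles between 0 and 2π, using Eq. (25) for the trajectory length and Eq. (26) for the arc radius. The function names, the number of samples, and the explicit handling of the limit α → 0 (a straight trajectory) are assumptions made for illustration.

```python
import math

def trajectory_length(wheelbase, phi_max_deg):
    """Arc length S of Eq. (25)."""
    return 2.0 * math.pi * wheelbase / math.tan(math.radians(phi_max_deg))

def boundary_point(alpha, wheelbase, phi_max_deg):
    """Border point for central angle alpha in radians (Eqs. 27 and 28)."""
    s = trajectory_length(wheelbase, phi_max_deg)
    if abs(alpha) < 1e-9:
        # Limit alpha -> 0: sin(alpha)/alpha -> 1 and (1 - cos(alpha))/alpha -> 0.
        return s, 0.0
    radius = s / alpha  # Eq. (26)
    return radius * math.sin(alpha), radius * (1.0 - math.cos(alpha))

def boundary(wheelbase, phi_max_deg, steps=64):
    """Sample the border for alpha from 0 to 2*pi (steps is an arbitrary choice)."""
    return [boundary_point(2.0 * math.pi * k / steps, wheelbase, phi_max_deg)
            for k in range(steps + 1)]

if __name__ == "__main__":
    for x, y in boundary(3.0, 25.0, steps=8):
        print(round(x, 3), round(y, 3))
```

At α = 2π the arc radius equals Rb and the trajectory is a full circle, so the border point returns to the origin.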

From these parametric equations, with \( {x}^{\prime }= dx/ d\alpha \), the area of the tracking region is

$$ A=2\times {\int}_0^{2\uppi} y\,{x}^{\prime}\, d\alpha $$
(29)

4.3 Tracking region refresh

The tracking region follows the target’s movement, and its refresh time changes with the target’s speed. The optimal tracking region size is also determined by the average trajectory distance, which is used to optimize the refresh time for a given target speed. We therefore keep the contour shape constant by adjusting the refresh time according to the target’s current speed and the average trajectory distance of the targets observed so far. The refresh time is thus the lifetime of the current tracking region, during which the sensors within it continue to sense the target [24] (Fig. 10).

Fig. 10 Tracking region refresh

Let \( {T}_{\mathrm{refresh}} \) be the refresh time needed to prepare a new tracking region,

$$ {T}_{\mathrm{refresh}}={T}_{\mathrm{trans}}+{T}_{\mathrm{cal}}+{T}_{\mathrm{sen}} $$
(30)

where \( {T}_{\mathrm{trans}} \) is the per-hop RF transmission time for disseminating the minimal contour information (0.2 s), \( {T}_{\mathrm{cal}} \) is the calculation time that a manager node needs to determine the minimal tracking region (0.02 s), and \( {T}_{\mathrm{sen}} \) is the minimum working time that each sensor node needs to sense the mobile target (0.5 s).
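As a worked instance of Eq. (30), assuming that the contour information is disseminated over a single hop so that the quoted per-hop transmission time is counted once, the three values above give

$$ {T}_{\mathrm{refresh}}=0.2\ \mathrm{s}+0.02\ \mathrm{s}+0.5\ \mathrm{s}=0.72\ \mathrm{s} $$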

Before the target leaves the current tracking region, the next tracking region is prepared. The manager node broadcasts the target’s location and movement information to its neighbor sensors. The neighbor sensors determine whether they will participate in tracking region or not. When the current tracking region’s refresh time expires, the sensors turn off their sensing devices except for the sensors that continue to belong to the next contour [26]. The starting time of sensing devices is determined considering the movement information message’s timestamp.

5 Experiments and analysis

To validate the performance of our localization algorithm, we conduct an experiment with 16 nodes. The relative locations of the target and the sensor nodes are shown in Fig. 11, in which hollow dots represent acoustic sensor nodes, solid dots represent image sensor nodes and rectangles represent the target positions. A buzzer is used to simulate a target at five positions.

Fig. 11 Relative locations of target and sensor nodes

As shown in Fig. 11 (a), we first use acoustic sensor nodes only. In this experiment, the mean value \( {\sigma}_i \) of the measurement noise is 3, and following the algorithm in Section 3 we set N = 50. Then, we replace four acoustic nodes with image nodes at the four corners of the detection area, as shown in Fig. 11 (b). Location 3 has the lowest localization error because it is the geometric center of the sensor network: the sensor nodes are uniformly distributed around it and can accurately perceive the target signal strength. The other locations lie on the edge of the monitoring region, where it is hard to obtain precise target information. The localization errors of the two methods are shown in Fig. 12.

Fig. 12 Comparison of two methods

To validate the target tracking algorithm proposed in this paper, we model the WMSN and the vehicle in Matlab and compare our algorithm with OCR [24] and DBA [11]. In the simulation, all nodes are distributed randomly in the assigned area, and the simulation parameters are shown in Table 1.

Table 1 Simulation parameters

When L = 3 m and ϕmax = 25°, the average energy consumption of our algorithm is 22% lower than that of OCR, as shown in Fig. 13.

Fig. 13 Energy consumption of OCR compared with our algorithm

Figure 14 shows the relationship between energy consumption and L. As L increases, the tracking area enlarges and contains more nodes. When L = 3 m and ϕmax = 25°, the average energy consumption is 28% lower than for L = 8 m.

Fig. 14 Energy consumption of different L

Figure 15 shows the relationship between energy consumption and ϕmax. When L = 3 m and ϕmax = 20°, the average energy consumption is lower than for ϕmax = 30°. As ϕmax increases, the tracking region update time shortens, so more energy is needed to re-form the tracking region.

Fig. 15 Energy consumption of different ϕmax

For L = 3 m and ϕmax = 20°, the average localization accuracy of the hybrid WMSN is compared with that of OCR and DBA in Fig. 16. The tracking accuracy of the hybrid WMSN is much higher than that of OCR and DBA. In our hybrid WMSN, the image sensors can accurately identify the moving target and calculate its location and motion parameters. The other two algorithms use acoustic sensors only, which makes high tracking accuracy hard to achieve.

Fig. 16 Localization accuracy of three algorithms

For L = 3 m and ϕmax = 20°, the energy consumption of the hybrid WMSN is compared with that of OCR and DBA in Fig. 17. The energy consumption of the hybrid WMSN, OCR and DBA is approximately equal.

Fig. 17 Energy consumption of three algorithms

6 Conclusion

In this paper, a novel target tracking method based on hybrid WMSN has been proposed. Its contributions are as follows. First, the architecture of hybrid WMSN is introduced. Second, collaboration schemes for acoustic and image sensors are designed to calculate the target trajectory and update the tracking region. Finally, vehicle kinematics is adopted to activate only the tracking region that the mobile target can reach. Extensive simulations have been conducted, and the results verify that the tracking accuracy of hybrid WMSN is much higher than that of OCR and DBA without extra energy consumption.