1 Introduction

Abnormal behavior detection is one of the key issues in intelligent video surveillance. It involves two main tasks: detecting various abnormal behaviors automatically from surveillance videos, and alerting security personnel so that they can deal with these unusual events promptly.

Existing approaches to abnormal behavior detection can be divided into two categories: object tracking based methods [3, 5, 8, 9, 12, 17, 22, 24, 25] and group feature based methods [2, 5, 7, 10, 14–16, 23, 26]. The first category tracks each moving object individually and then uses the motion information obtained for each object to detect abnormal behavior. However, occlusion between objects in complex scenes seriously degrades the performance of such methods. The second category extracts motion information directly from the whole video sequence. In [5, 14], information from both normal and abnormal behaviors is used to train a support vector machine for anomaly detection. In [2], only information from normal examples is used to detect abnormal examples. In [10], normal examples are used to construct a probabilistic principal component analysis model, and behaviors that cannot be represented by the model are considered abnormal. In [7, 15], Latent Dirichlet Allocation is used to detect abnormal behaviors. To detect anomalous behaviors in video sequences, existing methods usually scan every video pixel or every video region exhaustively, which increases the computational cost and reduces the detection accuracy.

To improve the detection accuracy of local abnormal behaviors, a novel method is proposed here. First, bag of words [21] is adopted to describe the optical flow information of each blob, and a semi-parametric model [4] is adopted to detect suspicious abnormal blobs; then, each suspicious abnormal blob is divided equally into rectangular cells, and the maximum optical flow energy method [27] is adopted to locate the cells most likely to be abnormal; finally, the nearest neighbor descriptor and the mixed naive Bayes model [19] are adopted to determine anomalous behaviors.

The rest of the paper is organized as follows. Semi-parametric model based detection of suspicious abnormal blobs is described in Section 2. The detection of anomalous behavior is described in Section 3. Section 4 introduces the experimental datasets and evaluation methods and then reports the experimental results. Section 5 concludes the paper.

2 Abnormal blob detection

Each video image frame is first divided into blobs, and the process of considering a blob as a suspicious abnormal blob is described as follows.

First, x is utilized to describe a group of features of a blob S; then, the blob S can be considered as a suspicious abnormal blob when the following inequality is true:

$$ \lambda (S)=\frac{ \Pr \left(x\Big|{H}_1(S)\right)}{ \Pr \left(x\Big|{H}_0(S)\right)}>\rho $$
(1)

where \( \Pr(x|H_0(S)) \) is the likelihood of x when S is a normal blob, and \( \Pr(x|H_1(S)) \) is the likelihood of x when S is an abnormal blob.

An appropriate model of the likelihood \( \Pr(x|H_i) \) needs to be built to compute the likelihood ratio λ(S). Unlike existing methods, which compute λ(S) from a probability density model with a fixed parametric form, a semi-parametric probability density model is adopted here.

2.1 Semi-parametric probability density model

The modeling process of the semi-parametric probability density function is described as below.

  • First, let \( \mathrm{x}_1 \) and \( \mathrm{x}_2 \) be the feature sets inside and outside the blob S, and let f(x) and g(x) be their corresponding probability density functions:

$$ \left\{\begin{array}{l}{\mathrm{x}}_1={\left({x}_{11},{x}_{12},......,{x_1}_{n_1}\right)}^T\sim f(x)\\ {}{\mathrm{x}}_2={\left({x}_{21},{x}_{22},......,{x_2}_{n_2}\right)}^T\sim g(x)\\ {}\frac{f(x)}{g(x)}= \exp \left(\alpha +{\beta}^Th(x)\right)\end{array}\right. $$
(2)

where \( n_1 \) and \( n_2 \) are the numbers of features inside and outside S, and h(x) is a pre-defined function.

  • Then, the semi-parametric likelihood for S can be formulated as:

$$ L\left(\alpha, \beta, G\right)=\underset{i=1}{\overset{n}{\varPi }}{p}_i\underset{j=1}{\overset{n_1}{\varPi }} \exp \left(\alpha +{\beta}^Th\left({x}_{1j}\right)\right) $$
(3)

where \( n = n_1 + n_2 \), \( p_i = dG(t_i) = \Pr(X = t_i) \), \( t = (t_1, t_2, \dots, t_n)^T = (x_{11}, x_{12}, \dots, x_{1n_1}, x_{21}, \dots, x_{2n_2})^T \), and G(x) is the cumulative distribution function of g(x).

  • Finally, the likelihood ratio λ(S) in Eq. (1) can be computed for the two cases f(x) = g(x) and f(x) ≠ g(x).

    • When α = 0 and β = 0, f(x) = g(x), which means that the feature distributions inside and outside S are the same and λ(S) is 1.

    • When α ≠ 0 and/or β ≠ 0, L in Eq. (3) can be maximized subject to the constraints \( \sum p_i = 1 \) and \( \sum p_i[\omega(t_i) - 1] = 0 \), where \( \omega(t) = \exp(\alpha + \beta^T h(t)) \). The parameter estimation proceeds as follows. First, \( p_i \) is expressed in terms of α and β by Lagrangian relaxation; then, \( p_i \) is substituted into the log-likelihood l = ln L(α, β, G); next, the maximum of L is attained at \( p_i = n_2^{-1}[1 + \rho\,\omega(t_i)]^{-1} \) with ∂l/∂α = 0, where \( \rho = n_1/n_2 \) is the sample size ratio (not the threshold in Eq. (1)), and Eq. (4), which depends only on α and β, is obtained by ignoring constants; finally, α and β are estimated by Newton maximization.

$$ l\left(\alpha, \beta \right)=-{\displaystyle \sum_{i=1}^n \log \left[1+\rho \exp \left(\alpha +{\beta}^Th\left({t}_i\right)\right)\right]}+{\displaystyle \sum_{j=1}^{n_1}\left[\alpha +{\beta}^Th\left({x}_{1j}\right)\right]} $$
(4)

  • Let \( \hat{\alpha} \) and \( \hat{\beta} \) be the maximum likelihood estimates of α and β; then \( l\left(\hat{\alpha},\hat{\beta}\right) \) is the log-likelihood for the case f(x) ≠ g(x), and λ(S) is computed as:

$$ \begin{array}{l}\lambda (S)\equiv -2\left[l\left(0,0\right)-l\left(\hat{\alpha},\hat{\beta}\right)\right]\\ {}\kern4em =-2{\displaystyle \sum_{i=1}^n \log \left[1+\rho \exp \left(\hat{\alpha}+{\hat{\beta}}^Th\left({t}_i\right)\right)\right]}+2{\displaystyle \sum_{j=1}^{n_1}\left[\hat{\alpha}+{\hat{\beta}}^Th\left({x}_{1j}\right)\right]}+2n \log \left(1+\rho \right)\end{array} $$
(5)
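
The estimation step above can be sketched numerically. The following is a minimal illustration rather than the authors' implementation: it assumes scalar features and the choice h(x) = x (the text leaves h(x) open), and the names `log_lik`, `fit_newton` and `lambda_stat` are hypothetical. It maximizes Eq. (4) by Newton iterations and then evaluates λ(S) as in the first line of Eq. (5).

```python
import numpy as np

def log_lik(theta, t, x1, rho):
    """Eq. (4) with theta = (alpha, beta) and the assumed choice h(x) = x."""
    a, b = theta
    return -np.sum(np.log1p(rho * np.exp(a + b * t))) + np.sum(a + b * x1)

def fit_newton(x1, x2, iters=50, tol=1e-10):
    """Newton maximization of Eq. (4); x1 = features inside S, x2 = outside."""
    t = np.concatenate([x1, x2])                  # pooled sample t_1..t_n
    rho = len(x1) / len(x2)                       # rho = n1 / n2
    H = np.column_stack([np.ones_like(t), t])     # rows (1, h(t_i))
    H1 = np.column_stack([np.ones_like(x1), x1])  # rows (1, h(x_1j))
    theta = np.zeros(2)
    for _ in range(iters):
        w = rho * np.exp(H @ theta)
        p = w / (1.0 + w)
        grad = H1.sum(axis=0) - H.T @ p               # gradient of l
        hess = -(H * (p * (1.0 - p))[:, None]).T @ H  # Hessian of l (negative definite)
        step = np.linalg.solve(hess, grad)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta, t, rho

def lambda_stat(x1, x2):
    """lambda(S) = -2 [ l(0, 0) - l(alpha_hat, beta_hat) ]."""
    theta, t, rho = fit_newton(x1, x2)
    return -2.0 * (log_lik(np.zeros(2), t, x1, rho) - log_lik(theta, t, x1, rho))
```

Under these assumptions, a blob whose inner features follow the same distribution as its surroundings gives a value of λ(S) near zero, while a shifted inner distribution gives a much larger value.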

The semi-parametric probability density model has two merits. First, the feature set t around each blob is used to estimate α, β and the corresponding distribution simultaneously, since it is difficult to obtain the distribution model of α and β separately when the blob is too small; second, no specific parametric probabilistic model needs to be assumed, since different forms of h(x) can be selected for different feature distributions.

2.2 Feature description

The process of feature extraction for abnormal behavior detection is described as follows.

  • First, each image is divided equally into blobs, and the optical flow in each blob is extracted.

  • Next, all optical flow features are clustered into several categories and every category is represented as a visual word.

  • Finally, a histogram of visual words is utilized to describe the behavior feature of the whole video sequence, which can compress image information and keep local features of interesting images simultaneously.
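
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the optical flow vectors are taken as already computed (rows of `(u, v)` values), a plain Lloyd k-means stands in for whatever clustering the authors used, and `kmeans` and `word_histogram` are hypothetical names.

```python
import numpy as np

def kmeans(X, K, iters=20, seed=0):
    """Plain Lloyd k-means; returns K cluster centers (the visual words)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every sample to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):          # keep the old center if a cluster empties
                centers[k] = X[labels == k].mean(axis=0)
    return centers

def word_histogram(flows, centers):
    """Count how many flow vectors fall on each visual word."""
    d = ((flows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.bincount(d.argmin(axis=1), minlength=len(centers))
```

Calling `word_histogram` on the flow vectors of one frame and on those of the whole sequence gives two histograms over the same K words.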

The visual word histogram simplifies the computation of the log-likelihood l(α, β). Let \( b = (b_1, b_2, \dots, b_K)^T \) be the visual word histogram in an image frame, where \( b_k \) (k = 1, 2, …, K) is the number of blobs belonging to the k-th visual word. Let \( b^{\omega} = (b_1^{\omega}, b_2^{\omega}, \dots, b_K^{\omega})^T \) be the histogram within the whole video sequence; then Eq. (4) can be rewritten as below:

$$ l\left(\alpha, \beta \right)=-{\displaystyle \sum_{k=1}^K{b}_k^{\omega }} \log \left[1+\rho exp\left(\alpha +{\beta}^Th(k)\right)\right]+{\displaystyle \sum_{k=1}^K{b}_k\left[\alpha +{\beta}^Th(k)\right]} $$
(6)

Similarly, Eq. (5) can be rewritten as below:

$$ \lambda (S)=-2{\displaystyle \sum_{k=1}^K{b}_k^{\omega }} \log \left[1+\rho exp\left(\overset{\wedge }{\alpha }+{\overset{\wedge }{\beta}}^Th(k)\right)\right]+2{\displaystyle \sum_{k=1}^K{b}_k\left[\overset{\wedge }{\alpha }+{\overset{\wedge }{\beta}}^Th(k)\right]}+K \log \left(1+\rho \right) $$
(7)
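
The saving offered by the histogram form can be checked with a short sketch. It assumes a scalar h(k) per visual word (stored in the hypothetical array `h_words`) and treats b as the counts of the inner feature set and \( b^{\omega} \) as the counts of the pooled set; the function names are illustrative. When every feature is a visual word index, the sums over samples in Eq. (4) collapse to K weighted terms as in Eq. (6).

```python
import numpy as np

def loglik_samples(alpha, beta, h_all, h_in, rho):
    """Eq. (4): one term per feature (h values of pooled and inner samples)."""
    eta_all = alpha + beta * h_all
    eta_in = alpha + beta * h_in
    return -np.sum(np.log1p(rho * np.exp(eta_all))) + np.sum(eta_in)

def loglik_hist(alpha, beta, h_words, b_w, b, rho):
    """Eq. (6): one term per visual word, weighted by the histogram counts."""
    eta = alpha + beta * h_words
    return -np.sum(b_w * np.log1p(rho * np.exp(eta))) + np.sum(b * eta)
```

Both functions return the same value for the same data, but the histogram form costs O(K) per evaluation instead of O(n).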

2.3 Abnormal blob detection

Based on the definition in Eq. (1), it can be seen that the value of λ(S) of abnormal behavior regions is larger than that of normal behavior regions. Therefore, if the value of λ(S) is larger than the threshold ρ, the blob S can be considered a suspicious abnormal blob.

However, some normal blobs with interactive motion or complex motion may also have large λ(S). Therefore, suspicious abnormal behavior blobs need to be further confirmed to reduce false alarms.

3 Abnormal behavior detection

3.1 Basic idea

Compared with image blobs of normal behavior, image blobs of abnormal behavior often exhibit higher motion velocity and more disordered motion direction. Therefore, the optical flow energy of an abnormal behavior blob is larger than that of a normal behavior blob. Based on this observation, the basic idea of local abnormal behavior detection in this paper is as follows: the mixed naive Bayes model is first trained on the nearest neighbor descriptors of normal cells, and the trained model is then used to determine whether each test cell is abnormal. The details of abnormal behavior detection are shown in Algorithm 1.

Algorithm 1: Abnormal behavior detection process

• Condition: known abnormal blobs

• Dividing each abnormal blob into cells with same size, and computing the optical flow energy in each cell.

• Searching abnormal cells with the largest optical flow energy according to the mixed naive Bayes model. If one cell is abnormal, then execute the next step; otherwise, the current abnormal blob is considered as a false abnormal blob, and the search process terminates.

• Continuing to search abnormal cells within the 4-neighbors of the current abnormal cell according to the mixed naive Bayes model.

• Marking all abnormal cells with red
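
The search in Algorithm 1 can be sketched as a region-growing procedure. This is an illustration under assumptions: the optical flow energy of a cell is taken here as the sum of squared flow magnitudes (the paper defers the definition to [27]), and the callback `is_abnormal` is a placeholder for the mixed naive Bayes test of Section 3.3. All function names are hypothetical.

```python
from collections import deque
import numpy as np

def cell_energy(flow, cell_h, cell_w):
    """Assumed energy: sum of u^2 + v^2 over each cell of an (H, W, 2) flow field."""
    H, W = flow.shape[:2]
    mag2 = (flow ** 2).sum(axis=2)[: H - H % cell_h, : W - W % cell_w]
    return mag2.reshape(H // cell_h, cell_h, W // cell_w, cell_w).sum(axis=(1, 3))

def grow_abnormal_cells(energy, is_abnormal):
    """Start at the cell with the largest energy and expand over 4-neighbors."""
    rows, cols = energy.shape
    seed = tuple(int(v) for v in np.unravel_index(np.argmax(energy), energy.shape))
    if not is_abnormal(seed):
        return set()                      # the blob is treated as a false alarm
    found = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nb = (r + dr, c + dc)
            inside = 0 <= nb[0] < rows and 0 <= nb[1] < cols
            if inside and nb not in found and is_abnormal(nb):
                found.add(nb)
                queue.append(nb)
    return found
```

The returned set holds the `(row, column)` indices of the cells that would be marked in red; an empty set means the suspicious blob is rejected.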

3.2 Nearest neighbor descriptor

The construction process of the nearest neighbor descriptor is described as below.

  • Firstly, each image is divided into cells with the same size of h × w.

  • Then, the spatio-temporal gradient magnitude \( \nu_{ij} \) of each pixel (i, j), and the variance \( M_2 \), the skewness \( M_3 \), and the kurtosis \( M_4 \) of each cell are computed:

$$ \left\{\begin{array}{l}{M}_r=\left[{m}_{i,j}\right],\kern1.5em i=1,2,\dots, h,\kern0.5em j=1,2,\dots, w\\ {}{m}_{i,j}=\frac{1}{h*w}{{\displaystyle {\sum}_{i,j}\left({\nu}_{ij}\right)}}^r,r=2,3,4\end{array}\right. $$
(8)

  • Next, a vector \( M \in \mathbb{R}^{h\times w} \) is constructed as below:

$$ M=\left[{m}_2\kern0.5em {m}_3\kern0.5em {m}_4\right] $$
(9)

  • Finally, the distance between cells S and S’ is computed with the formula given in [28]:

$$ d\left({V}_s,{V}_{s\hbox{'}}\right)={\displaystyle \sum_{\upsilon }{2}^{-2\upsilon }WA{V}_{\upsilon}\left(\left|{M}_s\right|-\left|{M}_{s\hbox{'}}\right|\right)} $$
(10)

and the distances to the K spatio-temporal nearest neighbors of a cell in the video sequence form the descriptor:

$$ {X}^{sd}={\left[{d}_1,{d}_2,\dots, {d}_k,\dots, {d}_K\right]}^T $$
(11)

where \( d_k \) is the distance between a cell and its k-th nearest neighbor.
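
A simplified sketch of the descriptor construction is given below. Two points are assumptions rather than the paper's method: the moments follow the form of Eq. (8) literally (means of the r-th powers of the gradient magnitude), and a plain Euclidean distance replaces the wavelet-based distance of Eq. (10), whose details are given in [28] and not reproduced here. Function names are hypothetical.

```python
import numpy as np

def gradient_magnitude(volume):
    """Spatio-temporal gradient magnitude of a (T, H, W) grayscale volume."""
    gt, gy, gx = np.gradient(volume.astype(float))
    return np.sqrt(gt ** 2 + gy ** 2 + gx ** 2)

def cell_moments(v_cell):
    """Moments of one h x w cell as in Eq. (8): mean of v^r for r = 2, 3, 4."""
    return np.array([np.mean(v_cell ** r) for r in (2, 3, 4)])

def nn_descriptor(moments, idx, K):
    """Eq. (11): sorted distances from cell `idx` to its K nearest cells.

    Euclidean distance is a stand-in for Eq. (10).
    """
    d = np.linalg.norm(moments - moments[idx], axis=1)
    d = np.delete(d, idx)                 # drop the distance of the cell to itself
    return np.sort(d)[:K]
```

Here `moments` is an array with one row of three moments per cell, collected over neighboring positions and frames.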

3.3 Anomaly detection using naive Bayes model

As shown in Fig. 1, the generative process of the mixed naive Bayes model is as follows.

Fig. 1 Graphical model for the mixed naive Bayes model

  • Select a mixed-membership vector π ~ Dirichlet(ζ).

  • For each feature \( x_j \) of X:

    • Select a component \( z_j = c \sim \mathrm{Discrete}(\pi) \).

    • Select a feature value \( x_j \sim p_{\Psi_j}(x_j|\theta_{jc}) \), where \( \Psi_j \) and \( \theta_{jc} \) jointly determine an exponential family distribution for feature j and component c.
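
The generative process above can be sketched by sampling from it. The sketch assumes Gaussian feature distributions (the choice used for training later in this section), with `mu` and `sigma` as d × k arrays of component means and standard deviations; the function name and argument layout are illustrative.

```python
import numpy as np

def sample_descriptor(zeta, mu, sigma, rng):
    """Draw one d-dimensional sample X from the mixed naive Bayes model."""
    d, k = mu.shape
    pi = rng.dirichlet(zeta)                  # mixed-membership vector
    z = rng.choice(k, size=d, p=pi)           # one component per feature
    rows = np.arange(d)
    x = rng.normal(mu[rows, z], sigma[rows, z])
    return pi, z, x
```

Unlike a standard mixture, each feature draws its own component \( z_j \), so a single sample can mix several components.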

The naive Bayes model is obtained by training on video sequences with normal behaviors, and the detailed training process is described as follows.

  • Given the model parameter φ and a set of Gaussian distributions \( \Omega = (\mu_{jc}, \sigma_{jc}, [j]_1^d, [c]_1^k) \), the probability density function of X within the mixed naive Bayes model can be formulated as below:

$$ p\left(X\left|\varphi, \varOmega \right.\right)={\displaystyle \underset{\pi }{\int }p\left(\pi \left|\alpha \right.\right)\times \left({\displaystyle \prod_{\begin{array}{l}j=1\\ {}\exists {x}_j\end{array}}^d{\displaystyle \sum_{c=1}^kp\left({z}_j=c\left|\pi \right.\right)}\frac{1}{\sqrt{2\pi {\sigma}_{jc}^2}} \exp \left(-\frac{{\left({x}_j-{\mu}_{jc}\right)}^2}{2{\sigma}_{jc}^2}\right)}\right)d\pi } $$
(12)

where \( \mu_{jc} \) and \( \sigma_{jc}^2 \) are the mean and variance of the Gaussian distribution of the j-th feature under the c-th component.

  • Given training sets \( X = [X_1, X_2, \dots, X_L] \), the optimal parameters φ* and Ω* can be obtained by maximizing the likelihood of the whole dataset p(X|φ, Ω):

$$ \left({\varphi}^{*},{\varOmega}^{*}\right)=\underset{\left(\varphi, \varOmega \right)}{ \arg \max }p\left(X\left|\varphi, \varOmega \right.\right) $$
(13)

  • To learn the optimal parameters φ* and Ω*, the variational expectation maximization algorithm discussed in [18] is used to fit the mixed naive Bayes model efficiently.

During the test phase, the learned mixed naive Bayes model is used to compute the log-likelihood l = log p(X|φ, Ω) of each nearest neighbor descriptor, and the behavior in a cell can be considered anomalous if the following inequality holds:

$$ \left|l\right|<T $$
(14)

where T is an appropriate threshold.
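
For illustration, the integral in Eq. (12) can be approximated by Monte Carlo sampling of π. This is a stand-in, not the variational EM procedure of [18] that the paper uses; the Dirichlet parameter is passed as `zeta`, missing features are encoded as NaN, and all names are hypothetical. The decision function applies the rule of Eq. (14) exactly as stated.

```python
import numpy as np

def log_likelihood_mc(x, zeta, mu, sigma, n_samples=2000, seed=0):
    """Monte Carlo estimate of log p(X | phi, Omega) from Eq. (12)."""
    rng = np.random.default_rng(seed)
    pis = rng.dirichlet(zeta, size=n_samples)       # (S, k) draws of pi
    obs = ~np.isnan(x)                              # product runs over observed features only
    xo = x[obs][:, None]                            # (d_obs, 1)
    var = sigma[obs] ** 2                           # (d_obs, k)
    dens = np.exp(-(xo - mu[obs]) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    log_prod = np.log(pis @ dens.T).sum(axis=1)     # log of the product over features
    m = log_prod.max()                              # log-mean-exp for numerical stability
    return float(m + np.log(np.mean(np.exp(log_prod - m))))

def is_anomalous(l, T):
    """Decision rule of Eq. (14): flag the cell when |l| < T."""
    return abs(l) < T
```

The number of samples trades accuracy for speed; a variational bound as in [18] avoids sampling altogether.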

4 Experimental results

Experiments are conducted on two benchmarks to demonstrate the capabilities of the proposed method for abnormal activity detection. In the following, the benchmarks are first described, and then the evaluation criteria are introduced. Finally, experimental results and an extensive evaluation are presented.

4.1 Dataset

UCSD

The UCSD dataset includes the ped1 and ped2 subsets for detecting local abnormal behavior [6, 13]. The ped1 subset consists of 20 training video sequences, 14 validation video sequences and 36 testing video sequences, each of which contains 200 frames. The ped2 subset consists of 10 training video sequences, 6 validation video sequences and 12 testing video sequences, where each video sequence contains 120, 150 or 180 frames. Normal behavior in the UCSD datasets is defined as walking at normal speed. The local abnormal behaviors mainly involve irregular movement, such as skating, biking, and driving. Figures 2 and 3 show several frames of normal and abnormal behaviors from the ped1 and ped2 datasets respectively.

Fig. 2 Examples from the Ped1 dataset. (a) An example of normal walking behavior. (b) An example of abnormal behavior of biking. (c) An example of abnormal behavior of driving

Fig. 3 Examples from the Ped2 dataset. (a) An example of normal walking behavior. (b) An example of abnormal behavior of biking. (c) An example of abnormal behavior of driving

Subway

The subway dataset contains two video sequences, recorded by a camera at the entrance gate and a camera at the exit gate respectively [1, 20]. The first sequence, the entrance gate video, is 96 min long and contains normal behaviors including going down through the turnstiles and entering the platform. It also contains 66 abnormal behaviors, including walking in the wrong direction, irregular interactions between people, sudden stopping, and running fast. The second sequence, the exit gate video, is 43 min long and contains 19 anomalous events, mainly walking in the wrong direction and loitering near the exit gate. Neither the surveillance videos nor groups of frames within them are labeled as training or testing videos. Figure 4 shows several examples from the subway dataset.

Fig. 4 Examples from the subway dataset. (a) An example of normal behavior from the entrance video sequence. (b, c) Examples of abnormal behavior where a person is exiting through the entrance gate. (d) An example of abnormal behavior where two persons are exiting through the entrance gate. (e) An example of normal behavior from the exit video sequence. (f–h) Examples of abnormal behavior where a person is entering through the exit gate

4.2 Evaluation criteria

To better evaluate the performance of the proposed approach, a pixel-level criterion is adopted to compare the detected local abnormal behaviors with the ground-truth anomalies. Specifically, if at least 40 % of the anomalous pixels within a detected anomalous frame are true anomaly pixels, the corresponding frame is considered an abnormal frame.

The pixel-level receiver operating characteristic (abbreviated as ROC) curve is adopted as the performance evaluation criterion; it plots the true positive rate (abbreviated as TPR) against the false positive rate (abbreviated as FPR). The larger the area under the ROC curve, the higher the detection accuracy. Furthermore, the closer the ROC curve lies to the upper-left corner, the higher the TPR and the lower the FPR.
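
As a small worked example of this criterion, the sketch below computes an ROC curve and its area from per-frame anomaly scores and binary ground-truth labels. It assumes that higher scores mean "more anomalous" and that scores are distinct (ties are not merged); the function names are illustrative.

```python
import numpy as np

def roc_curve(scores, labels):
    """Return (FPR, TPR) arrays obtained by sweeping the decision threshold."""
    order = np.argsort(-scores)              # most anomalous first
    sorted_labels = labels[order]
    tp = np.cumsum(sorted_labels)            # true positives at each threshold
    fp = np.cumsum(1 - sorted_labels)        # false positives at each threshold
    tpr = np.concatenate([[0.0], tp / tp[-1]])
    fpr = np.concatenate([[0.0], fp / fp[-1]])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A detector that ranks every abnormal frame above every normal frame reaches an area of 1, and one that ranks them in the opposite order reaches 0.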

4.3 Parameter setting

Several parameters need to be set: K, υ, h, w, ρ and T. The algorithm is not very sensitive to the number of neighbors; in the current implementation, K is set to 9. The wavelet transform scale value υ is set to 8, since no further improvement in performance is obtained with more scales. After cross-validation, the size of each blob h × w is fixed at 60 × 40, and the threshold T in the mixed naive Bayes model is set to 1.25. Furthermore, the threshold ρ in Eq. (1) is set to 1.5.

4.4 Abnormal detection results on UCSD

In our work, the detected local abnormal behaviors are marked with red rectangular boxes. Some examples of anomaly detection on the UCSD and subway datasets are given in Figs. 5, 6 and 7.

Fig. 5 Examples of anomaly behavior detection from the Ped1 dataset. (a) A pedestrian drives a wheelchair. (b) A pedestrian rides a bicycle. (c) A pedestrian drives a car. (d) A pedestrian rides a skateboard

Fig. 6 Examples of anomaly behavior detection from the Ped2 dataset. (a) A pedestrian drives a car. (b) A pedestrian rides a bicycle. (c) Two pedestrians ride bicycles. (d) A pedestrian rides a bicycle and a pedestrian rides a skateboard

Fig. 7 Examples of abnormal behavior detection from the subway dataset. (a, b) Examples of abnormal behavior where a person is exiting through the entrance gate. (c) Examples of abnormal behavior where two persons are exiting through the entrance gate. (d–f) Examples of abnormal behavior where a person is entering through the exit gate

The results show that the proposed method can detect different kinds of local anomalies: a pedestrian driving a wheelchair (Fig. 5a), riding a bicycle (Fig. 5b), driving a car (Fig. 5c) and riding a skateboard (Fig. 5d); a pedestrian driving a car (Fig. 6a), a pedestrian riding a bicycle (Fig. 6b), two pedestrians riding bicycles (Fig. 6c), and a pedestrian riding a bicycle together with a pedestrian riding a skateboard (Fig. 6d); a person exiting through the entrance gate (Fig. 7a), two persons exiting through the entrance gate (Fig. 7b and c), and a person entering through the exit gate (Fig. 7d–f). The results in Figs. 5, 6 and 7 also show that the proposed approach can miss slow abnormal behaviors, such as a pedestrian cycling slowly in Fig. 6d, because slow abnormal behaviors cannot be detected from optical flow information.

The ROC curves of the proposed approach and three other approaches on the Ped1 and Ped2 datasets are given in Figs. 8 and 9 respectively, where the compared approaches are the social force based method [16], the mixed dynamic texture based method [13] (abbreviated as MDT), and the adaptive optical flow filtering based method [11] (abbreviated as AOF). The experimental results in Figs. 8, 9 and 10 show that the proposed approach outperforms the other three approaches.

Fig. 8 Pixel-level receiver operating characteristic of the Ped1 dataset

Fig. 9 Pixel-level receiver operating characteristic of the Ped2 dataset

Fig. 10 Pixel-level receiver operating characteristic of the subway dataset

The area under the ROC curve (abbreviated as AUC) of the four approaches is shown in Table 1. The results support the same conclusion: the proposed approach works better than the other three approaches.

Table 1 The area under receiver operating characteristic of four methods

The processing time per frame, in seconds, of the tested methods on the different datasets is shown in Table 2, where the approximate computational time was obtained on a PC with an Intel E6700 CPU and 2 GB of RAM. Table 2 shows that the proposed method is considerably faster, and it requires much less memory to store the learnt data.

Table 2 The required computational time of four methods

5 Conclusion

This paper proposes a novel approach for local abnormal behavior detection. The proposed approach needs neither precise object detection nor precise target tracking, and it has some degree of robustness. A semi-parametric statistical model and a largest optical flow energy model are used to reduce the search range for abnormal behaviors and improve the search efficiency. The experimental results on the public UCSD and subway datasets demonstrate the effectiveness and superiority of the proposed approach.

There are several directions for future work. One is to use image segmentation to locate abnormal targets and thereby improve detection accuracy. Another is to establish a more comprehensive database that covers behaviors such as fighting and escaping.