1 Introduction

Intelligent video surveillance is a major and challenging research area that is gaining importance because of its widespread applications. Detecting an abnormal activity, a suspicious person, a suspicious object or, most importantly, an unwanted circumstance (such as a road accident or a bomb explosion) is a significant challenge for any surveillance system. Accurate detection of anomalous events by a visual system is difficult, firstly because of the inherent difficulty of defining anomaly criteria and secondly because of the high volume of data generated [1]. Currently, operators in control rooms monitor multiple cameras in and around public buildings and along streets round the clock to detect abnormal activity. In most cases, the recorded videos are examined only after a suspicious event has occurred. Moreover, the number of operators needed grows with the number of camera nodes required for a larger coverage area. Accurate and timely detection of events in real time is therefore one of the key concerns in this area, since timely detection leaves a sufficient time margin for taking action to reduce the impact of those events. There is thus a need for real-time monitoring with smart cameras that can detect, track and analyze suspicious activity and minimize the impact of unwanted circumstances.

An event is defined as ‘any action or incident satisfying the threshold condition of some predefined criteria.’ Depending on the criteria, events may be categorized in different ways, such as accidental events, events related to crowd density, events related to crowd escape behavior [1, 2], and many more. Recently, anomaly detection in crowd behavior has become an interesting and challenging area of research. In [3, 4], a crowd is defined as ‘a collection of individuals’ whose mind and mood may shape the behavior of the crowd. One of the major factors influencing the mind and mood of a crowd is the incidents occurring around it [5], and this is how crowd behavior originates [6]. The definition of an anomaly depends on the user and varies from scene to scene and situation to situation. Here, an anomaly is defined as ‘a sudden dispersion of the crowd or a sudden, abrupt change in the speed of the crowd.’

Several tools and methods are commonly used to detect this type of crowd anomaly. Optical flow is one of them and has been used for crowd motion modeling in many applications. In [7], the authors proposed an algorithm based on the weighted velocity of optical flow vectors and introduced the concept of a divergent center for detecting and localizing crowd escape behavior. The method of [7] improved on the social force model [8] in terms of both accuracy and timely detection. An object-based abstraction and an acceleration feature were proposed in [9]; this method outperformed [8], but its major limitations were unsatisfactory performance for high-density crowds and poor real-time performance. Optical flow was used in a different way in [10]: using the concept of Riemannian manifolds, optical flow manifolds and an optical flow bundle were proposed for detecting various crowd behaviors such as walking, running, dispersing, evacuation, merging and splitting.

In [11], the authors developed a method for learning the motion patterns of crowded scenes using a hierarchical agglomerative clustering algorithm; it achieved a comparatively low accuracy of 78%. Graph modeling and matching based on Delaunay triangulation [12] has been proposed for human crowd behavior analysis. This method outperforms [8, 11] in both accuracy and timely detection but is less accurate than [7]. Delaunay triangulation was also used in [13], where the crowd is represented as an evolving Delaunay graph and a novel set of local mid-level visual descriptors is used for crowd analysis. Social force modeling was introduced in [8] to detect abnormal crowd behavior, but it was not very accurate; the social attribute-aware force model [14] improved on it by adding scene scale estimation together with congestion and disorder attributes. This improvement yielded higher accuracy than the other existing works, although the model may perform unsatisfactorily in complex situations such as grouping and crowd formation. Other methods for crowd modeling and crowd anomaly detection [19] include pedestrian behavior modeling [15], spatiotemporal motion concepts [16], unsupervised k-means clustering with a semi-supervised HMM [17], and unusual event detection using multiple local monitors [18]. Velastin et al. [20, 21] used image processing for crowd movement analysis and automated crowd density estimation. In the era of artificial intelligence, machine learning concepts have played an important role in crowd behavior analysis; some works based on machine learning and deep learning are discussed below.

A fast, low-power field-programmable gate array (FPGA) implementation of crowd anomaly detection is discussed in [22]. The authors proposed a hardware-friendly classifier, KUGDA (k-means with univariate Gaussian discriminant analysis), which is based on outlier rejection. According to the authors, this work is the first dedicated implementation of crowd anomaly detection. As future work, they suggested using the proposed features with deep neural networks. In [23], deep learning architectures were combined with slow feature analysis for video anomaly detection, and a summed squared derivative value was proposed to measure the degree of abnormality. This approach can detect both local and global anomalies, although there is scope for improvement in local anomaly detection. The authors of [24] proposed a deep learning-based anomaly detection system comprising four modules: background estimation, object segmentation, feature extraction and activity recognition. As future work, they aim to reduce the feature dimension while maintaining robustness. Continuing in this direction, the concept of hybrid features using the level set method was introduced in [25], where five image descriptors (color moments, edge histogram descriptor, color layout descriptor, color and edge directivity descriptor, and scalable color descriptor) are extracted from an input video for robust detection of abnormal events. A method based on sparse reconstruction using two dictionaries, a global dictionary and an online local dictionary, is discussed in [26]. A one-class support vector machine (SVM) model with a histogram of oriented gradients-local binary patterns (HOG-LBP) descriptor and a histogram of oriented optical flow (HOF) descriptor was used for abnormal event detection in crowded scenes in [27].

In this work, optical flow [28, 29] is used as the basic tool, followed by correlation analysis of the magnitude matrices [30] of the optical flow vectors. An optical flow vector has four parameters; of these, the magnitude is used here [31]. A threshold on the correlation coefficient is then estimated empirically, and a new approach is proposed to detect anomalies in crowded scenes.
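As a rough illustration of the correlation step, the sketch below computes a correlation coefficient between two optical flow magnitude matrices. It assumes the 2-D Pearson correlation coefficient, which the text does not specify, and it assumes that the horizontal and vertical flow components `u` and `v` have already been produced by some optical flow estimator. The names `flow_magnitude` and `corr2` are hypothetical, and the flow fields are synthetic.

```python
import numpy as np

def flow_magnitude(u, v):
    """Per-pixel magnitude of a flow field with components u and v."""
    return np.hypot(u, v)

def corr2(a, b):
    """2-D Pearson correlation coefficient between two equally sized matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        # Assumption: a constant magnitude field is treated as uncorrelated.
        return 0.0
    return float((a * b).sum() / denom)

# Synthetic 240 x 320 flow fields (not real video data).
rng = np.random.default_rng(0)
shape = (240, 320)
mag_prev = flow_magnitude(rng.normal(size=shape), rng.normal(size=shape))

# A nearly unchanged scene: same magnitudes plus small noise.
mag_similar = mag_prev + 0.05 * rng.normal(size=shape)

# An unrelated scene: independent flow field.
mag_different = flow_magnitude(rng.normal(size=shape), rng.normal(size=shape))

r_similar = corr2(mag_prev, mag_similar)      # close to 1
r_different = corr2(mag_prev, mag_different)  # close to 0
print(r_similar, r_different)
```

In this toy setting, similar consecutive magnitude fields give a coefficient near 1 and unrelated fields give a coefficient near 0, which is the contrast the thresholding relies on.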

2 Methodology

The basic steps of the methodology for anomalous crowd behavior detection are (a) capturing actions using CCTV cameras, (b) checking all threshold conditions, and (c) deciding whether or not an event has occurred (Fig. 1).

Fig. 1 A framework for anomalous crowd behavior detection

2.1 Anomalous crowd behavior detection

In a crowded place, people move very little or only slightly, so the scene changes little, or almost negligibly, from frame to frame. Consecutive frames are therefore highly correlated, and the gradient of consecutive correlation values is very small. Once the crowd starts dispersing suddenly, however, consecutive frames differ considerably from one another; the correlation between consecutive frames decreases, and the gradient of consecutive correlation values increases. The flowchart of the entire process is shown in Fig. 2.

Fig. 2 Flowchart of the entire process of anomalous crowd behavior detection
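The detection idea of this subsection can be sketched on a one-dimensional series of correlation coefficients. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the correlation series is synthetic, the function name `detect_event` is hypothetical, and the decision rule (flag the first contiguous run of values below a threshold) is an assumption. The default of 0.75 is the threshold reported in the results section.

```python
import numpy as np

def detect_event(corr, threshold=0.75):
    """Return (start, end) indices of the first contiguous run where the
    correlation falls below `threshold`, or None if no value does."""
    below = np.asarray(corr, dtype=float) < threshold
    if not below.any():
        return None
    start = int(np.argmax(below))  # first index below the threshold
    end = start
    while end + 1 < len(below) and below[end + 1]:
        end += 1
    return start, end

# Synthetic correlation series: high during normal movement, a dip during
# sudden dispersion, and high again once the scene settles.
corr = np.array([0.95, 0.94, 0.96, 0.93, 0.70, 0.52, 0.48, 0.60, 0.91, 0.95])

# Absolute gradient of consecutive correlation values: small while the
# crowd moves normally, larger when the event starts and ends.
grad = np.abs(np.diff(corr))

event = detect_event(corr)
print("event span (indices):", event)
print("gradient:", np.round(grad, 2))
```

The gradient is shown only to illustrate the behavior described in the text; in this sketch the decision itself uses the threshold on the correlation values.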

3 Proposed algorithm for anomalous event detection

[Figures a and b: algorithm listing]

4 Results and discussion

The proposed methodology was assessed on videos from the University of Minnesota (UMN) and PETS 2009 datasets. The UMN dataset contains 11 videos of three different scenarios. Two of these scenarios are outdoors in broad daylight with perfect illumination, whereas the third is inside a hall where the illumination is somewhat poor. In the PETS 2009 dataset, the videos were taken under perfect illumination conditions. Several video streams (112 combinations) were generated from these datasets by random placement of event locations, giving a total of 84 event locations and 28 non-event locations for the study. All simulations were carried out on an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz with 8.00 GB of RAM.

Test video-1

Test video-1 is 21 s long and contains 625 frames, each of size 320-by-240. The frames were sub-sampled with a gap of four frames, so that 156 of the 625 frames were used for analysis. Frames showing normal movement (people walking at normal speed) and abnormal crowd movement (people running in fear) are shown in Fig. 3.
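The text does not say at which frame the sub-sampling starts. As a small worked check, one indexing convention that reproduces the reported counts is to keep the 4th, 8th, 12th, ... frames (1-based), which yields floor(N/4) frames. The helper `subsampled_indices` is a hypothetical name used only for this illustration.

```python
def subsampled_indices(n_frames, gap=4):
    """1-based frame numbers kept when every `gap`-th frame is retained,
    starting from frame `gap` (an assumed convention)."""
    return list(range(gap, n_frames + 1, gap))

idx_video1 = subsampled_indices(625)  # Test video-1: 625 frames
idx_video3 = subsampled_indices(807)  # Test video-3: 807 frames

print(len(idx_video1), idx_video1[0], idx_video1[-1])
print(len(idx_video3), idx_video3[0], idx_video3[-1])
```

Under this convention the counts are 156 and 201, matching the figures reported for Test video-1 and Test video-3; other conventions (for example, starting at frame 1) would give one frame more.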

Fig. 3 a Normal movement of the crowd (normal walk); b, c abnormal movement of the crowd (running)

Figure 4 plots the correlation coefficient against the frame number. During normal crowd movement the correlation was high; during abrupt movement the correlation values started falling; and once very few people were left and the scene was almost unchanged again, high correlation values were obtained once more. The event started at the 120th frame and ended at the 146th frame.

Fig. 4 Plot of correlation coefficient versus frame number, obtained by correlation analysis of the magnitude parameter of the optical flow of consecutive frames. The non-event region and the region in which the event was detected are shown

Test video-2

This video is extracted from the UMN dataset; the scenario contains a total of six videos. The video is 32 s long and has 766 frames of size 320-by-240. The frames were sub-sampled with a gap of four frames.

Figure 5 shows three frames: the first shows normal crowd movement, and the other two show anomalous crowd behavior. The correlation analysis of the frames is shown in Fig. 6. During normal movement the correlation values are high; as the random movement corresponding to the event begins, the correlation values decrease, and they increase again when the event ends. The event region is identified between the 127th and 148th frames.

Fig. 5 a Normal movement of the crowd; b, c abnormal movement of the crowd

Fig. 6 Plot of correlation coefficient versus frame number. The correlation analysis is done on the optical flow of consecutive frames

Test video-3

This video is also extracted from the UMN dataset. It is 34 s long and has 807 frames of size 320-by-240. The frames were extracted with a gap of four frames; hence, there are 201 frames.

In Fig. 7, the first frame shows normal crowd movement and the other two frames show anomalous crowd behavior; Fig. 8 contains the simulation results for this video. In Fig. 8, the correlation values are initially high, representing normal crowd movement; the dip followed by a rise indicates that the crowd started dispersing and that, after some time, no one was left in the scene. The event was detected between the 179th and 198th frames.

Fig. 7 a Normal movement of the crowd; b, c abnormal movement of the crowd

Fig. 8 Plot of correlation coefficient values of the optical flow of consecutive frames against frame number

The proposed method was assessed on several videos generated from the UMN dataset [32] and the PETS 2009 dataset [33]. In total, 112 videos were generated, of which 84 are categorized as event videos and 28 as non-event videos. The correlation coefficient is used as the anomaly criterion, and its threshold for event detection is set to 0.75 under both proper and improper illumination conditions. The performance of the proposed method is assessed with the receiver operating characteristic (ROC) curve, shown in Fig. 9. The area under the curve (AUC), precision, accuracy and detection rate are presented in Table 1. Precision, also known as positive predictive value, is defined as TP/(TP + FP), and accuracy is defined as (TP + TN)/(TP + FN + FP + TN), where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, respectively. The detection rate, also known as recall or sensitivity, is defined as TP/(TP + FN), i.e., the number of detected events divided by the number of ground-truth events. A brief explanation of these terms can be found in [12]. Timely detection is measured by experimentally determining the frames at which the event started and finished. The results show that the average detection rate of the proposed method is 97.61%, i.e., almost all events are detected correctly.
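The metric definitions above translate directly into code. In the sketch below, the function name `detection_metrics` is hypothetical, and the confusion counts are illustrative assumptions chosen only so that they add up to the 84 event and 28 non-event videos; they are not values taken from Table 1.

```python
def detection_metrics(tp, fp, fn, tn):
    """Precision, accuracy and detection rate (recall) from confusion counts."""
    return {
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "detection_rate": tp / (tp + fn),
    }

# Hypothetical counts (assumption, not from the paper):
# 84 event videos = tp + fn, 28 non-event videos = fp + tn.
metrics = detection_metrics(tp=82, fp=1, fn=2, tn=27)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Keeping the three metrics in one function makes it easy to recompute them when the decision threshold, and therefore the confusion counts, change.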

Fig. 9 ROC curve for the proposed method on the UMN and PETS 2009 datasets
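For readers unfamiliar with how an ROC curve and its AUC are obtained, the sketch below builds the curve by sweeping a decision threshold over per-video anomaly scores and integrates it with the trapezoidal rule. The scores and labels are synthetic, the function names `roc_points` and `auc` are hypothetical, and the choice of score (for example, one minus the lowest correlation coefficient in a video) is an assumption; the paper does not describe how its curve was computed.

```python
import numpy as np

def roc_points(scores, labels):
    """False and true positive rates obtained by sweeping a threshold over
    the scores (a higher score means more anomalous)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    fpr, tpr = [0.0], [0.0]
    for t in np.unique(scores)[::-1]:  # thresholds from high to low
        pred = scores >= t
        tpr.append((pred & labels).sum() / labels.sum())
        fpr.append((pred & ~labels).sum() / (~labels).sum())
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Synthetic anomaly scores and ground-truth labels (1 = event video).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 0]

fpr, tpr = roc_points(scores, labels)
area = auc(fpr, tpr)
print("AUC:", round(area, 4))
```

The AUC equals the fraction of (event, non-event) pairs in which the event video receives the higher score, which gives a quick sanity check on small examples.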

Table 1 Performance analysis of the proposed method for several videos generated from UMN and PETS 2009 datasets

Table 2 compares the performance of the proposed method with that of other existing methods. Different authors have used different metrics to report their performance and demonstrate superiority over the state of the art. The comparison shows that the performance of the proposed method is comparable to or better than that of the existing methods. Timely detection is also used as a metric for comparison: the simulation time and the frames around which the event occurred were assessed, and the results are shown in Table 3. The results show an improvement in these respects over existing works. The same datasets were used for the comparison.

Table 2 Performance comparison of the proposed method with existing state-of-the-art methods
Table 3 Performance comparison of the proposed method with the method of [7, 9, 10] based on the simulation time and frame numbers around which event occurred

From Table 3, it is concluded that the proposed method is superior to the other existing state-of-the-art methods with regard to timely detection. Compared to the other works, the proposed method detected the events earlier, and its results were closer to the actual time at which the event occurred.

Further, the accuracy of event detection was studied for various threshold values, and the results are shown in Table 4.

Table 4 Performance comparison of accuracy of detection of events for proper and improper illumination condition by varying the threshold values

The results show that the optimal threshold value is 0.75, which gives an accuracy of 97.32%. This threshold value also works under both proper and improper illumination conditions.

Table 4 and Fig. 10 show the impact of varying the threshold on the accuracy of event detection. The effect is more pronounced under improper illumination. As Fig. 10 shows, the accuracy varies between 0.76 and 0.98 as the threshold is varied under proper illumination, whereas it varies between 0.19 and 0.95 under improper illumination. The optimal threshold is therefore chosen as 0.75, at which the accuracy is 98% and 95% for proper and improper illumination, respectively. A threshold of 0.75 is thus suitable under either illumination condition. The overall accuracy of the proposed method at this threshold is 97.32%.

Fig. 10 Variation of accuracy values with the threshold value of the correlation coefficient
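An empirical threshold study of this kind can be sketched as a simple sweep. The example below is a hedged illustration, not the procedure used in the paper: the per-video minimum correlation values are synthetic and were constructed so that the sweep has a clear peak, the rule "flag a video as an event if its minimum correlation falls below the threshold" is an assumption, and `accuracy_at` is a hypothetical helper. The numbers do not reproduce Table 4.

```python
import numpy as np

def accuracy_at(threshold, min_corr, is_event):
    """Accuracy when a video is flagged as an event if its minimum
    correlation coefficient is below `threshold` (assumed rule)."""
    predicted = min_corr < threshold
    return float(np.mean(predicted == is_event))

# Synthetic per-video minimum correlation values: 5 event videos followed
# by 4 non-event videos (illustrative only).
min_corr = np.array([0.45, 0.52, 0.60, 0.68, 0.72, 0.78, 0.80, 0.85, 0.90])
is_event = np.array([True] * 5 + [False] * 4)

thresholds = [0.50, 0.60, 0.70, 0.75, 0.80, 0.90]
accuracies = [accuracy_at(t, min_corr, is_event) for t in thresholds]
best_threshold = thresholds[int(np.argmax(accuracies))]

for t, acc in zip(thresholds, accuracies):
    print(f"threshold {t:.2f}: accuracy {acc:.3f}")
print("best threshold:", best_threshold)
```

A threshold that is too low misses events whose correlation dip is shallow, while one that is too high flags normal videos, so the accuracy peaks at an intermediate value.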

5 Conclusion

In this paper, a method based on correlation analysis of the optical flow is proposed for detecting anomalous crowd behavior. Extensive experimental results demonstrate that the proposed method provides more accurate and faster results than the existing methods. The correlation coefficient of the optical flow of consecutive frames exhibits a pattern that separates the non-event region from the event region. The range of correlation values from which a threshold can be chosen was found to be wide under proper illumination and narrow under improper illumination. Based on the empirical study, the optimum threshold for accurate detection under all illumination conditions was found to be 0.75. At this threshold, the proposed method provides an accuracy of 97% on standard datasets.