1 Introduction

The study of crowd scenes is becoming a field of considerable interest to researchers, mainly due to the rising number of popular events and public places that facilitate the mass gathering of people. Such occasions and spaces include markets, subways, religious festivals, sporting events and public demonstrations [32, 43]. A crowd may induce a disastrous event due to fighting, congestion, mass panic or various other reasons [18]. Several crowd disasters have occurred in recent years [1, 6, 14].

In an attempt to prevent such deadly disasters, most public areas, including holy places, campuses, residential areas and airports, are now equipped with closed-circuit television (CCTV) surveillance cameras. The incoming video can be analysed automatically to facilitate the early detection of a possible abnormal event. The automatic detection of panic behaviours is the focus of the present study. Panic manifests as a sudden change in the crowd dynamics. It appears in the video feed as atypical behaviour: pedestrians moving in different directions, sudden increases in speed, collective running, grouping in one region and so on.

Many research projects have been conducted to automatically detect panic behaviours [7, 12, 13, 15, 21, 22, 24, 26, 29, 31, 34, 35, 40,41,42]. Despite their good detection performances, the majority of them propose off-line solutions. Although off-line solutions are useful in many situations, such as police investigations, it is important to detect a panic situation as soon as it occurs using a real-time detection approach.

To the best of our knowledge, few real-time techniques have been proposed in the literature [13, 15, 22, 24, 26, 31, 34, 35, 42]. The common scheme of panic detection approaches is mainly composed of three steps. First, the motion field is estimated, since motion is a crucial characteristic of the crowd dynamics. Second, a feature that characterizes the crowd behavior is extracted; it depends on the way a panic behavior is defined. Third, panic is detected as a deviation of the values taken by the selected feature from those obtained during a non-panic situation. For instance, in [35], panic is identified by the presence of atypical motion patterns in the scene. Motion patterns of a non-panic situation are learned by computing representative motion subspaces on videos of normal behaviors. During the testing phase, the motion field of the considered video is estimated and approximated using the representative subspaces determined in the training stage. If the error between the estimated motion and its approximation exceeds a user-adjusted threshold, the presence of an abnormal behavior is concluded. However, the performance of this technique depends on the amount of training data, and the diversity of human behavior in crowded scenes makes it difficult to enumerate all possible normal behaviors.
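The subspace idea used in [35] can be illustrated with a minimal PCA-based sketch of our own (the function names and parameters are ours, not the authors'; flow fields are flattened to one row per frame):

```python
import numpy as np

def train_subspace(normal_flows, n_components=5):
    # Learn a representative motion subspace from flattened flow fields
    # of normal behavior (one row per training frame).
    X = np.asarray(normal_flows, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]          # mean flow + top components

def reconstruction_error(flow, mean, basis):
    # Error between a test flow and its approximation in the subspace;
    # exceeding a user-adjusted threshold signals abnormal motion.
    z = np.asarray(flow, dtype=float) - mean
    approx = basis.T @ (basis @ z)
    return float(np.linalg.norm(z - approx))
```

A test flow lying in the learned subspace yields a near-zero error, while motion outside it yields a large error, which is the anomaly criterion described above.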

More recently, a real-time detection technique based on texture modelling was proposed in [34]. It associates the occurrence of a panic behavior with a temporal change of the texture. According to the authors [34], this technique achieves an execution time of 18 ms per image. However, its performance may degrade when the spatial-temporal texture patterns are highly heterogeneous during a non-panic situation.

Detecting grouping and running behaviors is the main focus of the work described in [42]. First, the motion is estimated using an optical flow (OF) estimation approach. This estimation is carried out only on Harris corners in order to reduce the computational cost. Second, two parameters are chosen as features to characterize the two behaviors. The first is a crowd distribution index that reflects the gathering level of people in a local area; the second is the velocity. The combination of the two parameters forms the kinetic energy of the crowd. Running or gathering behaviors are detected if the energy or the crowd distribution index exceeds a threshold. This technique achieves a near-real-time execution of 20 images per second.

In [24], panic is defined as an unexpected change in the spatial occupancy of moving objects. An abnormal event is recognized when a high temporal variation of the space occupancy occurs during a given time interval. However, the space occupancy is computed in terms of the number of pixels, and thus does not take into consideration the way in which the space is occupied along the video. In other words, the space can be occupied by approximately the same number of moving pixels, but not with the same spatial distribution. The spatial positions of moving blobs are more likely to change during a panic situation, as a consequence of the reaction of pedestrians when a dangerous event occurs.

By examining the reported real-time techniques, two main limitations can be pointed out. First, although motion information is important in characterizing the crowd dynamics, OF estimation is computationally heavy. Also, tracking moving objects or restricting the motion estimation to some points of interest may fail in the presence of occlusions or a highly dense crowd. To alleviate this problem, we suggest detecting moving pixels by computing the absolute differences between pairs of successive images. Contrary to what one might expect, this step does not affect the robustness of the whole proposed system to noise, occlusions and illumination variations, as demonstrated later in Section 4.

The second limitation is that considering panic as a temporal change in texture may limit the performance of the system when the scene is highly textured. Likewise, defining panic as a temporal change in the number of moving pixels within the crowded area may degrade the detection accuracy when people cannot move outside the area in the presence of panic. As a solution, and motivated by the fact that the occurrence of a panic situation changes the way people behave with respect to each other, panic is viewed, in the present study, as a sudden change in the interactions between people. The same definition of panic was considered recently in [29] and led to high detection performances in crowded scenes of any density level. However, the approach described in [29] cannot be used for real-time detection due to the computational complexity of the OF applied to all pixels of each image. In the present work, we restrict the analysis of the interactions between moving objects to the interactions between moving edges. Thus, the problem of analysing the spatial interactions between pedestrians is formulated as the problem of analyzing how the spatial distribution of moving edges varies in time. A sudden and remarkable temporal variation is associated with the occurrence of a panic situation. Our rationale is to conduct the analysis of the spatial distribution of moving edges in the frequency domain. This is motivated by the fact that the spatial distribution of edges is easily perceived in the frequency domain through coefficients of high values. Furthermore, the transformation into the frequency domain yields a sparse representation of the spatial discontinuities. Any transformation to the frequency domain could be applied; the fast Fourier transform (FFT), the discrete cosine transform (DCT) and the discrete wavelet transform (DWT) are explored in the present work.

To analyze the crowd behavior, a new feature is proposed based on the coefficients obtained through the transformation. A sudden increase in the value of the feature reveals the presence of a panic. In order to temporally locate the panic behavior, the values taken by the proposed feature are classified into two subsets: the first one is related to the normal instances of the video, while the second one corresponds to the panic instances. We perform this classification using two different techniques, as detailed in Section 3. An experimental comparison between the two is presented in Section 4.

Three main contributions are proposed in this work:

  1.

    The first contribution aims to alleviate the heavy computations resulting from applying a motion estimation technique, by considering the absolute differences between pairs of successive images of the video. This solution allows a fast analysis reaching 406 frames per second (fps) as shown in Table 4.

  2.

    Considering a panic situation as a sudden change in the interactions between people, our second contribution consists of associating the spatial distribution of moving edges with people's interactions. Thus, the problem of analyzing people's interactions is formulated as the problem of analyzing the temporal variation of the distribution of moving edges.

  3.

    As a third contribution, we propose a new feature that characterizes the interactions between pedestrians. Our rationale is to sparsely represent moving edges in the frequency domain where they are expressed by coefficients of high values. When panic starts, the spatial distribution of moving edges suddenly changes implying a remarkable change in the values of the coefficients. Hence, the feature we propose is the sum of the coefficient absolute values at each instant.

The rest of the paper is organized as follows. The datasets used in this study are described in Section 2. Then, the proposed system is detailed in Section 3 and experimentally evaluated in Section 4. Next, the results are discussed in Section 5. Finally, some conclusions are drawn in Section 6.

2 Datasets

A variety of datasets are used in order to deal with various scenes. As depicted in Table 1, videos including artificial and real behaviors with different density levels, and different image sizes are analyzed.

Table 1 Characteristics of the datasets

A brief overview of each dataset is given in what follows.

University of Minnesota (UMN) dataset

It is a public dataset produced by the University of Minnesota, USA [23]. It is composed of 11 video sequences representing escape events, and captured in various contexts: Lawn, Indoor and Plaza. People in these videos walk around normally until an abnormal event occurs which makes them run away. A ground truth (GT) of this dataset is available in [12].

Motion Estimation Dataset (MED)

is a public dataset that includes 11 videos of panic behaviors [25]. Typical scenarios are: putting down a suspicious backpack, an earthquake, a hoodlum attack and a sniper attack. The GT of this dataset is annotated and made publicly available by the authors of [25].

Performance Evaluation of Tracking and Surveillance 2009 (PETS2009) dataset

was recorded at the University of Reading, UK [9]. It includes many scenarios, each captured from four different views. Two scenarios are analyzed in the present study. In the first scenario, of 107 images, people start walking from the left side until an abnormal event occurs which makes them run away. The second scenario is composed of 378 images; people start gathering in the middle until an abnormal event occurs which makes them run away in different directions. The videos of this dataset are challenging as they contain frequent illumination variations.

Festival crowd

[38] This video is a real scene of high people density. It records a festival event and shows people who are initially gathered until an abnormal event occurs. This video is challenging as it includes frequent people interactions, obstacles and occlusions.

Bull-running festival

[37] This video records a bull-running festival in Spain. In the beginning, it shows people walking; then they start freeing space for the coming bulls. After that, some of the bulls enter the scene, which causes people to run. Critical occlusions appear in this video.

3 Method

The proposed approach is composed of four main stages, as shown by the block-diagram of Fig. 1. Given the streaming video transmitted by the CCTV camera, the K images of the video are converted to grayscale. The first step of the proposed method consists of computing the absolute differences \(\{D^{(k)}\}_{k}\) between pairs of successive images I(k) and I(k+ 1) (∀k = 1, … , K − 1). This phase locates the moving edges at each instant. Second, the resulting maps \(\{D^{(k)}\}_{k}\) are transformed into a frequency domain. The obtained coefficients of high absolute value correspond to the spatial discontinuities within the map D(k) and reveal the way the moving edges are distributed within a local area. Likewise, locally homogeneous regions, such as non-moving areas, are represented by coefficients of low absolute value. The absolute values of the coefficients of D(k) are summed, giving rise to S(k). Third, the variation of S(k) along time (∀k = 1, … , K − 1) allows identifying whether a remarkable increase, which may be associated with a panic situation, exists. To detect a panic behavior within the set \(\{S^{(k)}\}_{k}\), two alternatives are explored in this study: a clustering-based approach and a statistical approach. Finally, the detection performances are refined by removing false alarms through a postprocessing phase.

Fig. 1
figure 1

Block-diagram of the proposed approach

3.1 Absolute image differences computation

This step aims to detect the moving edges of the objects present in the video with a minimum number of computations. Hence, the absolute difference D(k) between two successive images I(k) and I(k+ 1) of the video is computed as:

$$ D^{(k)}=|I^{(k+1)}-I^{(k)}|, \quad \forall k=1,\ldots,K-1 $$
(1)

where K is the number of images in the video. The resulting matrices \(\{D^{(k)}\}_{k=1,\ldots ,K-1}\) locate the moving edges between successive instants. Furthermore, they reveal the spatial distribution of the moving pixels at each instant. This distribution varies within a certain range during a non-panic situation. When a panic occurs, it varies largely due to a sudden and remarkable change in people's behavior. An illustration is given in Fig. 2, where the first column depicts an image extracted during a non-panic situation along with the corresponding map D(k), and the second column shows an image extracted during a panic situation and its related map D(k). The bright pixels in D(k) indicate high intensity differences between the successive images. It is noticeable that only small variations exist during a non-panic situation, whereas in a panic situation the number of moving pixels increases and the absolute pixel intensity differences between successive images are higher (shown by bright pixels in D(k)), as a consequence of a faster change in the characteristics of the pedestrian movements during panic.

Fig. 2
figure 2

Distributions of moving pixels during a a non-panic and b a panic situation
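The differencing of Eq. (1) amounts to one vectorized subtraction per frame pair. A minimal sketch of our own, using NumPy on a stack of grayscale frames:

```python
import numpy as np

def frame_differences(frames):
    # D(k) = |I(k+1) - I(k)| for k = 1..K-1 (Eq. 1); `frames` is a
    # (K, H, W) stack of grayscale images.
    frames = np.asarray(frames, dtype=np.int16)   # avoid uint8 wrap-around
    return np.abs(np.diff(frames, axis=0)).astype(np.uint8)
```

The cast to a signed type before subtracting is the only subtlety: differencing unsigned 8-bit images directly would wrap around instead of producing absolute differences.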

In order to detect this remarkable spatial-temporal change, we resort to the analysis of the distribution of moving edges in the frequency domain, as explained in the next subsection.

3.2 Proposed frequency-based feature for the characterization of the crowd dynamics

The aim of this step is to analyze the behavior of the pedestrians at each instant by analyzing the spatial distribution of the corresponding moving edges. For this, the transformation of D(k), ∀k = 1, … , K − 1, into a frequency domain is retained for its efficiency in locating discontinuities on the one hand, and for the sparse representation it offers on the other hand. The FFT [3], the DCT [2] and the DWT [19] are explored in the present work. The transformation \(\mathcal {T}_{F_{d}}(D^{(k)})\) of D(k), ∀k = 1, … , K − 1, into a frequency domain Fd ∈{FFT,DCT,DWT} yields a set of coefficients {c(k)} stored in a matrix C(k). The spatial discontinuities in D(k) are transformed into high-magnitude coefficients in the frequency domain, which represent a minority among the whole set {c(k)}. On the contrary, a majority of low-magnitude coefficients correspond to the local spatial homogeneities, as illustrated by the histograms of Fig. 3. Furthermore, the differences in people's interactions between the non-panic and the panic situation are well highlighted. For instance, in this excerpt, with the same pedestrians present in both situations, the magnitude range of the set of coefficients {c(k)} is larger during a panic than in the case of normal behaviors. In addition, the number of high-magnitude coefficients during a panic situation is greater than during a non-panic situation. It is to be noted that the same behavior is observed regardless of the chosen frequency domain.

Fig. 3
figure 3

Distribution of the coefficient magnitudes during a non-panic (first column) and a panic (second column) situation, using FFT (second row), DCT (third row) and DWT (fourth row)

These observations motivated us to propose a new feature S(k) defined for each D(k) by:

$$ S^{(k)}=\left\{ \begin{array}{ll} \sum\limits_{(r,s) \in C^{(k)}} |c^{(k)}(r,s)|, & \text{if } F_{d} \in \{FFT, DCT\} \\ \sum\limits_{j=1}^{J} \sum\limits_{o=1}^{\mathcal{O}} \sum\limits_{(r,s) \in c_{(j,o)}^{(k)}} |c_{(j,o)}^{(k)}(r,s)|, &\text{if } F_{d}=DWT. \end{array} \right. $$
(2)

where J is the number of wavelet decomposition levels, \(\mathcal {O}\) is the number of orientations at each level and \(c_{(j,o)}^{(k)}\) is the wavelet subband at the resolution level j = 1, … , J and the orientation \(o=1,\ldots ,\mathcal {O}\). For a dyadic wavelet, \(\mathcal {O}=3\) and \(c_{(j,1)}^{(k)}\) denotes the horizontal subband (o = 1) at the resolution level j, \(c_{(j,2)}^{(k)}\) denotes the vertical subband (o = 2) and \(c_{(j,3)}^{(k)}\) is the diagonal subband (o = 3). As explained later in this paper (Section 4), several dyadic wavelet transforms are tested with decomposition levels ranging from J = 1 to J = 3. A 1-level decomposition (J = 1) is found to yield the best detection performances.

The feature S(k) quantifies the discontinuities between moving pixels at each instant. Furthermore, it facilitates the distinction between non-panic and panic behaviors.
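For the FFT case, Eq. (2) reduces to summing the coefficient magnitudes of each difference map. A minimal sketch of ours (for the DWT case one would instead sum the magnitudes over the detail subbands, e.g. with the PyWavelets library):

```python
import numpy as np

def feature_fft(D):
    # S(k): sum of the absolute 2-D FFT coefficients of the difference
    # map D(k) (Eq. 2, FFT case).
    return float(np.abs(np.fft.fft2(np.asarray(D, dtype=float))).sum())
```

For a perfectly homogeneous map the energy concentrates in the single DC coefficient, whereas a map rich in moving edges spreads high magnitudes across many coefficients, which is what the feature captures.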

The examination of the temporal variation of S(k) reveals a sudden change in its values when a panic occurs. An illustration of this behavior is depicted in Fig. 4 where the temporal variation of S(k) along the video 9 of the UMN dataset is displayed.

Fig. 4
figure 4

Temporal variation of the proposed feature along video 9 of the UMN dataset using a FFT, b DCT and c DWT

As can be noticed, the values of S(k) vary within the same range until the 551st image, where a remarkable jump occurs due to the occurrence of a panic behavior and lasts for about 120 images. Then, the curve drops when people leave the scene. Another jump is also noticed at images 301 and 302 when the DWT is applied. However, this peak does not correspond to a panic, given its very short duration, and it is automatically eliminated by the processing described in Section 3.4.

The next step consists of automatically detecting the high values of S(k) as they reveal the presence of a panic situation.

3.3 Panic detection

Two relevant and distinguishable behaviors are present in a video containing a panic situation: non-panic and panic-related behaviors. They are reflected by the presence of two classes of values in the set \(\mathcal {S}=\{S^{(k)}\}_{k}\), respectively low and high values. In this study, we propose to formulate the problem of detecting the high values using two different formulations. The first one considers classifying the set \(\mathcal {S}\) into 2 classes using a clustering technique [16]. The second formulation, proposed in [29], considers the high values as atypical observations that statistically deviate from the distribution followed by the low values. Besides, without loss of generality, the high values are assumed to be a minority within the set \(\mathcal {S}\), and hence are considered as outliers, detected thanks to the use of a statistical test for outlier detection [27]. We investigate the two formulations and compare them in terms of detection performances and execution time.

3.3.1 Clustering based detection

The objective of this step is to differentiate the data observations in \(\mathcal {S}\) that correspond to a panic behavior from those related to a normal behavior, using a clustering technique. The idea is to build clusters of data by grouping in each cluster the data points that are as close as possible to each other with respect to a given distance, on the one hand; on the other hand, the distance between clusters is required to be as large as possible. To detect the values of \(\mathcal {S}\) that correspond to a panic, two clusters have to be identified. The first cluster Snp corresponds to the values obtained during a non-panic situation, while the second cluster Sp includes the high values that are related to a panic situation.

Several clustering techniques have been proposed in the literature [10, 11, 20, 28]. The comparison of their performances in detecting panic is conducted in Section 4.
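As an illustration of the clustering formulation, a plain two-class 1-D k-means pass over the feature values can separate the high, panic-related cluster. This is a sketch of our own, not one of the specific methods compared in Section 4:

```python
import numpy as np

def cluster_detect(s, iters=20):
    # Split the feature values into a low (non-panic) and a high (panic)
    # cluster with 1-D 2-means; returns a boolean mask, True where panic.
    s = np.asarray(s, dtype=float)
    centers = np.array([s.min(), s.max()])
    for _ in range(iters):
        labels = np.abs(s[:, None] - centers[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = s[labels == k].mean()
    return labels == centers.argmax()
```

Initializing the two centers at the minimum and maximum feature values makes the split deterministic, which suits the expected bimodal structure of \(\mathcal {S}\).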

3.3.2 Statistical detection

The aim is to partition \(\mathcal {S}\) into two homogeneous subsets: a subset Snp containing the majority of observations, related to a non-panic situation, and another subset Sp containing a minority of observations with remarkably higher values, related to a panic situation. Motivated by the characteristics of each subset and the differences between them, we emphasize the possibility of identifying them through hypothesis testing. More precisely, the observations in Sp are considered to deviate from the statistical distribution followed by those in Snp. Their detection can therefore be performed in two phases. The first phase aims to estimate the mean and variance of Snp by analyzing \(\mathcal {S}\) in a way that is robust to the presence of the other category of observations (those belonging to Sp). To this aim, the Minimum Covariance Determinant (MCD) estimator is retained for its efficiency and relatively low computational complexity [27]. The second phase aims to deduce the set Sp, given the estimated parameters of the distribution of Snp.

  1.

    Parameter estimation: The key idea of MCD for estimating the mean and the variance of \(\mathcal {S}\) robustly to the presence of the observations of Sp is to look for the most concentrated subset of \(\mathcal {S}\), of size h = (1 − α)(K − 1), given a confidence level 0 < α < 1. Hence, the observations \(s_{i} \in \mathcal {S}\) are first sorted in increasing order. Then, contiguous h-subsets Hi are built as Hi = {s(i), … , s(i+h− 1)}. For each subset, the mean and the variance are computed. The most concentrated subset is the one whose variance \({\sigma ^{2}_{c}}\) is the minimum among the variances of all the subsets Hi. Its mean is denoted by μc.

  2.

    Detection of panic-related observations: As outlined before, panic-related observations have distinguishable values compared to the non-panic-related ones, and hence are considered as outliers. An observation si of \(\mathcal {S}\) is considered an outlier if its distance d(si,μc,σc) from the mean μc relative to σc exceeds a tabulated threshold T derived with respect to a confidence level α. This distance is defined by:

    $$ d(s_{i},\mu_{c},\sigma_{c})=\frac{|s_{i}-\mu_{c}|}{\sigma_{c}}, \quad \forall i=1,\ldots,K-1. $$
    (3)

Hence, the two subsets Snp and Sp of \(\mathcal {S}\) related respectively to non panic and panic situations are deduced by:

$$ \begin{array}{@{}rcl@{}} S_{\text{np}}&=&\{s_{i} \in \mathcal{S}; d(s_{i},\mu_{c},\sigma_{c})<T \},\\ S_{\mathrm{p}}&=&\{s_{i} \in \mathcal{S}; d(s_{i},\mu_{c},\sigma_{c})\geq T \}. \end{array} $$
(4)

The MCD source code is part of the LIBRA package which is available at https://wis.kuleuven.be/stat/robust/LIBRA/LIBRA-home.
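The two phases above can be sketched for the univariate case as follows. This is our own simplified reimplementation for illustration only; the actual experiments use the LIBRA package:

```python
import numpy as np

def mcd_1d(s, alpha=0.01):
    # Phase 1: most concentrated contiguous h-subset of the sorted
    # observations, with h = (1 - alpha)(K - 1); returns its mean/std.
    s = np.sort(np.asarray(s, dtype=float))
    h = max(2, min(len(s), int(round((1 - alpha) * len(s)))))
    best_var, best_mean = np.inf, s.mean()
    for i in range(len(s) - h + 1):
        sub = s[i:i + h]
        if sub.var() < best_var:
            best_var, best_mean = sub.var(), sub.mean()
    return best_mean, np.sqrt(best_var)

def split_sets(s, mu, sigma, T=3.0):
    # Phase 2 (Eqs. 3-4): flag s_i as panic-related when its normalized
    # distance from the robust estimates reaches the threshold T.
    s = np.asarray(s, dtype=float)
    return np.abs(s - mu) / sigma >= T
```

Because the subsets are contiguous in the sorted sequence, only K − h candidate windows need to be scanned, which keeps the search inexpensive.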

Figure 5 shows the detection result of the statistical outlier test when applied to the set \(\mathcal {S}\) of Fig. 4c. As expected, images 301 and 302 are flagged as panic images. Apart from these images, the panic event is detected only one image early. To improve the detection performances, we propose a postprocessing step that aims to reduce the false detections.

Fig. 5
figure 5

Detection using MCD before and after postprocessing

3.4 Postprocessing

The proposed detection technique yields some false detections that should be reduced. To this aim and without loss of generality, the following assumptions are stated:

  • A panic behavior cannot last less than one second.

  • A panic behavior occurs once within a processed video.

The first assumption means that if n successive images are detected as containing a panic behavior and n is less than the number N of images per second (equivalently, the frame rate of the video), then those images are considered as false detections and are discarded from the set of detections. According to the second assumption, it is then possible to identify the sequential number of the image at which panic starts: it is the first image for which all subsequent images are also identified as anomalous. As depicted in Fig. 5, applying this processing effectively eliminates the false detections that lie apart from the sequence of panic images.
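The first rule can be sketched as a run-length filter over the per-frame detection flags (our own illustration; `fps` stands for the frame rate N):

```python
def postprocess(flags, fps):
    # Discard detection runs shorter than one second (< fps frames);
    # what survives the cleaning obeys the assumptions stated above.
    cleaned = [False] * len(flags)
    i = 0
    while i < len(flags):
        if flags[i]:
            j = i
            while j < len(flags) and flags[j]:
                j += 1                     # [i, j) is one detection run
            if j - i >= fps:
                cleaned[i:j] = [True] * (j - i)
            i = j
        else:
            i += 1
    return cleaned
```

Under the second assumption, the panic start is then simply the index of the first `True` entry in the cleaned sequence.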

4 Results

To evaluate the performances of the proposed technique, four rounds of tests are conducted. The aim of the first round is to select the wavelet parameters that yield the best performances. In the second round, we seek to retain the panic detection method that yields the most accurate results; for this, common clustering techniques as well as the MCD test are compared. The selection of the most suitable frequency domain is carried out in the third round. Finally, after retaining the appropriate parameters of the system, the detection performances are evaluated against some highly accurate offline techniques from the literature, and some real-time techniques. In order to quantify the performances of the tested techniques, the correct detection rate Pc (which is the same as the accuracy), the false detection rate Pf, the precision and the recall are computed. They are respectively defined in terms of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) by:

$$ P_{c} = \frac{TP+TN}{K}, P_{f} = \frac{FP+FN}{K} $$
(5)
$$ Recall = \frac{TP}{TP+FN}, \quad Precision= \frac{TP}{TP+FP} $$
(6)
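Assuming K = TP + TN + FP + FN (every frame falls in exactly one category), Eqs. (5) and (6) can be computed with a small helper of ours:

```python
def detection_metrics(tp, tn, fp, fn):
    # Correct/false detection rates, recall and precision (Eqs. 5-6).
    K = tp + tn + fp + fn
    return {
        "Pc": (tp + tn) / K,
        "Pf": (fp + fn) / K,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }
```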

The proposed technique is also evaluated in terms of execution time (in number of frames per second (fps)) when it runs on a PC with a 64 bit Core(TM) i7 2.80 GHz CPU, 16 GB RAM and Windows 10. MATLAB 2017 and the WAVELAB library [33] are used for the implementation.

4.1 Wavelet parameters

In this category of tests, we aim to select the most suitable wavelet function and the optimal number of decomposition levels.

Selection of the wavelet function

Several wavelet functions exist in the literature [4, 5, 8, 17, 30, 36, 39]. The performances of the proposed approach are evaluated with several wavelets in order to retain the most suitable one: Haar [36], Beylkin [4], Vaidyanathan [39], Coiflet [8] of order 1, Daubechies [17] of order 20, Symmlet [5] of order 10 and Battle [30] of order 5. Figure 6 shows the detection rates when the MCD method is applied. Each point of a curve reflects the Pf value (along the x-axis) and the Pc value (along the y-axis) of a specific video in the UMN dataset. The performances are almost the same for all the wavelets except for video 3, where they degrade when using the Symmlet, Beylkin and Battle wavelets. According to these results, the Coiflet wavelet yields the best performances and is retained for the remaining tests.

Fig. 6
figure 6

Detection performances using different wavelet functions and the MCD test

Selection of the number of resolution levels

Tests are conducted in order to determine the optimal number of resolution levels that ensures high detection performances. 1-level, 2-level and 3-level wavelet decompositions are investigated on all the videos of the UMN dataset. The temporal variations of S(k) along video 9 of the UMN dataset, when J = 1, J = 2 and J = 3, are illustrated in Fig. 7a, b and c, respectively. The panic starting at the 551st image according to the ground truth (GT) is visible when J = 1 and J = 3, and two classes of values can be clearly identified in the set \(\mathcal {S}\), unlike when J = 2.

Fig. 7
figure 7

Temporal variation of the proposed feature according to the number of resolution levels: a J = 1, b J = 2, c J = 3. Application to video 9 of the UMN dataset

By applying the same test to all the videos, the detection rates depicted in Fig. 8 show that 1-level and 3-level decompositions yield close performances. Therefore, a 1-level wavelet decomposition is retained, as it requires fewer computations than the 3-level decomposition.

Fig. 8
figure 8

Comparison of the detection performances when J = 1, J = 2 and J = 3. Application to all the videos of the UMN dataset

4.2 Selection of the detection technique

In order to select the most appropriate detection technique, different clustering methods such as k-means [11], the Partitioning Around Medoids (PAM) method [16] and skinny-dip [20] are investigated and compared to the MCD statistical test for outlier detection [27]. Furthermore, different values of the confidence level α (0.01, 0.05, 0.1) are considered to evaluate the performances of the system when the MCD test is used. Figure 9 as well as Table 2 show that the MCD test with α = 0.01 outperforms the other methods, with an average detection rate of 0.98.

Fig. 9
figure 9

Comparison between the detection performances obtained with the PAM, skinny-dip and k-means clustering algorithms and the MCD test. Application to all the videos of the UMN dataset

Table 2 Comparison between MCD, PAM, k-means and skinny-dip methods based on DWT. Application to the UMN dataset

4.3 Comparison between the frequency domains

Using the retained parameters of the system, namely the MCD test with α = 0.01, this round of tests aims to select the most appropriate frequency domain in terms of detection performances and execution time. Therefore, the FFT, the DCT and the DWT, using the Coiflet wavelet function with one level of decomposition, are explored.

Regarding the execution time, Table 3 shows that the FFT yields the fastest execution, followed by the DCT and then the DWT. For all the datasets, the detection rates obtained using each of the three frequency domains are very close, except for the PETS2009 and MED datasets, where the DWT outperforms the DCT and FFT. The execution times indicate that the technique operates in real-time even when the image dimensions of the video are larger than in the other datasets. It is worth pointing out that the use of the DWT requires the dimensions of the images to be of the form \(2^{n}\) with \(n \in \mathbb {N}^{*}\); if this condition is not satisfied, the images are zero-padded. This partly explains the slower computation when the DWT is used compared to the DCT and FFT.

Table 3 Panic detection results based on DWT, DCT and FFT using MCD

4.4 Performances evaluation compared to the state-of-the-art techniques

The performances evaluation of the proposed system compared to the state-of-the-art techniques is carried out in two stages. As it is important to maintain a high detection accuracy while operating in real-time, the objective of the first stage is to compare the accuracy of the system with some offline techniques [7, 12, 21, 29, 40, 41].

In the second stage, comparisons with real-time techniques [13, 22, 24, 26, 31, 34, 35, 42] are performed in terms of accuracy and execution time.

Table 4 shows that, on average, the proposed technique outperforms the technique in [12] and is slightly less accurate than the technique in [29] when tests are performed on the UMN dataset, with an average accuracy of 0.986 against 0.99. These results can be considered excellent, since the proposed system operates in real-time with an average computational speed of 358 fps.

Table 4 Performance comparison between our approach and the offline approaches in [12] and [29] on the UMN dataset

Similarly, Table 5 reports the performance obtained on the PETS2009 dataset. On average, the proposed technique performs better than [7, 21, 40] for both scenarios, outperforms the technique in [41] for the second scenario and is slightly less accurate than [41] for the first scenario.

Table 5 Performance comparison between our approach and offline approaches on the PETS2009 dataset

Moreover, Table 6 shows that the proposed technique outperforms the technique in [25] on the MED dataset, for each of the frequency transforms.

Table 6 Detection performance on the MED dataset: comparison between the proposed approach and [25]

Real-world videos are also tested; Table 7 reports the results alongside those of the technique in [29]. The proposed system performs well, although it is less accurate than [29]. In the second stage, the proposed system is evaluated on the UMN dataset against related real-time techniques. The results, reported in Table 8, show that the proposed system outperforms them in terms of both accuracy and execution time, for all three frequency domains.

Table 7 Performance comparison between our approach and the offline approach in [29]
Table 8 Comparison in terms of accuracy and execution time between the reported real-time detection techniques and the proposed technique. Application to the UMN dataset

5 Discussion

The present study describes a new real-time approach for the detection of panic behaviors in crowded scenes. Three main contributions are proposed, whose efficiency, accuracy and high speed are experimentally demonstrated. The first contribution aims to alleviate the heavy computations of motion estimation techniques by considering the differences between successive images of the video. This solution allows moving edges to be located with a fast execution. Furthermore, panic is defined as a sudden change in the interactions between people. This is reflected by a change in the spatial distribution of the moving edges, in addition to an increase in the number of moving pixels as a consequence of the rapid change in people's behavior. In order to characterize the distribution of moving pixels during panic and normal situations, our second contribution consists of representing the moving edges in a frequency domain, allowing a sparse representation of the spatial discontinuities. The FFT, the DCT and the DWT are explored in the present study. Tests conducted on several challenging videos, with different pedestrian density levels, show the high performance and high speed of the proposed system, as depicted in Tables 3, 4, 5, 6, 7 and 8. The experimental comparison between the three frequency domains shows that all of them perform well and that their detection rates are close. In terms of execution time, the FFT-based system yields the highest speed, followed by the DCT, then the DWT.
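The first two contributions, frame differencing followed by a frequency-domain representation of the moving edges, can be sketched as below. This is an illustrative approximation only: the difference threshold, the FFT-based descriptor and the choice of keeping the k × k low-frequency coefficients are our own assumptions, not the exact feature construction of the paper.

```python
import numpy as np

def moving_edges(prev_frame, curr_frame, threshold=15):
    """Locate moving edges as pixels whose grey-level difference
    between two successive frames exceeds a threshold (assumed value)."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.float64)

def frequency_feature(edge_map, k=32):
    """Sparse frequency-domain descriptor: magnitudes of the k x k
    low-frequency FFT coefficients of the moving-edge map, together with
    the total number of moving pixels (which grows during panic)."""
    spectrum = np.abs(np.fft.fft2(edge_map))
    low_freq = spectrum[:k, :k].ravel()
    return np.concatenate([low_freq, [edge_map.sum()]])

rng = np.random.default_rng(0)
f0 = rng.integers(0, 256, (128, 128), dtype=np.uint8)
f1 = f0.copy()
f1[40:60, 40:60] += 50                 # simulate a moving region
feat = frequency_feature(moving_edges(f0, f1))
print(feat.shape)                      # (1025,): 32*32 coefficients + 1 count
```

The DCT or the DWT would replace `np.fft.fft2` in the same pipeline, with the zero-padding constraint discussed in Section 4 applying to the DWT case.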

Our third contribution concerns the detection of panic-related data by exploring two formulations. The first formulation distinguishes between normal-related data and panic-related data using a clustering technique, whereas the second formulation is based on hypothesis testing, in which panic-related data are considered aberrant compared to the data resulting from a normal situation. Figure 9 and Table 2 illustrate the good performance of the system for both formulations, with the second formulation performing slightly better.
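The second formulation can be sketched as a Mahalanobis-distance outlier test against a model of normal-situation features, with the rejection threshold taken from the chi-square distribution at level α = 0.01 as in Section 4. For brevity this sketch uses the plain sample mean and covariance; the paper's MCD test would instead rely on a robust estimator of these quantities (e.g. scikit-learn's `MinCovDet`), and the synthetic data below are purely illustrative.

```python
import numpy as np
from scipy.stats import chi2

def fit_normal_model(X):
    """Estimate mean and inverse covariance of normal-situation features.
    (A robust MCD estimator would be used in practice; the plain sample
    covariance is shown here for brevity.)"""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    return mu, np.linalg.inv(cov)

def is_panic(x, mu, cov_inv, alpha=0.01):
    """Flag a feature vector as panic-related when its squared Mahalanobis
    distance exceeds the chi-square quantile at level alpha."""
    d2 = (x - mu) @ cov_inv @ (x - mu)
    return bool(d2 > chi2.ppf(1 - alpha, df=len(x)))

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(500, 3))   # synthetic normal-situation features
mu, cov_inv = fit_normal_model(normal)
print(is_panic(np.zeros(3), mu, cov_inv))      # False: typical point
print(is_panic(np.full(3, 8.0), mu, cov_inv))  # True: aberrant point
```

A frame is thus declared panic-related as soon as its feature vector deviates significantly from the normal-situation distribution, which matches the hypothesis-testing view of the second formulation.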

The proposed system is evaluated against both offline and real-time detection techniques, and the results confirm its high performance.

6 Conclusion and future work

A new panic detection approach is proposed in this study. The aim is to analyze the crowd dynamics and detect a possible panic behavior in real time, without requiring prior knowledge of the video under consideration. To this end, a new feature is proposed, based on the computation of image differences and the analysis of the moving edges in frequency domains. Two formulations of the panic detection problem are then explored and compared in terms of accuracy and execution time. The approach is evaluated on several datasets and demonstrates high performance.

In future work, we will study the effectiveness of other solutions for detecting moving edges, such as foreground extraction, and their impact on the system performance.