1 Introduction

Divergence detection in high-density crowds is a difficult task because of several challenges inherent in high-density crowd videos, e.g., few pixels available per head, extreme occlusion, clutter and noise, and perspective distortion. If crowd divergence is not detected early, at its development stage, it may lead to larger disasters such as a stampede. Figure 1 shows an example of high-density crowd divergence behavior (Love Parade 2010 music festival [1]) where divergence eventually led to a disastrous stampede. Figure 1a, b shows normal crowd behavior following paths N1 and N2, whereas Fig. 1c, d shows a critical situation in which the incoming crowd is blocked by a stationary crowd and diverges through paths D1 and D2. Such divergence situations are common in mass gatherings where the whole crowd is marching towards a common destination and, as density increases, ends up as half-stationary, half-moving crowd segments that produce divergence behavior.

Fig. 1 Demonstration of crowd divergence at Love Parade 2010: (a) crowd walking under normal conditions with low density; (b) crowd walking paths N1, N2 under normal conditions; (c) high-density crowd within the same region; (d) crowd diverging through paths D1 and D2

Previous divergence detection methods [2, 3] learn hand-crafted motion features (location, direction, magnitude, etc.) for every individual in the crowd from optical flow (OPF). An inherent problem with such methods is that, as crowd density increases, it becomes almost impossible to capture individual-level motion information, and one must learn global crowd features instead. Several methods were later developed to capture global crowd motion information, e.g., optical flow with pathline trajectories [4,5,6,7], pathlines with Lagrangian particle analysis [8], and streakflow [9,10,11,12,13]. These methods performed well in capturing global crowd motion information, but only for scenes with normal behavior. Unfortunately, no results have been reported in the literature for abnormal behavior detection at very high crowd densities.

In this work, we address divergence detection in high-density crowds by directly capturing global crowd motion in the form of images and learning normal and divergent crowd motion shapes with a neural network that predicts crowd behavior for unseen scenes. We also propose a novel divergence localization algorithm that pinpoints the divergence location with a bounding box. Finding the source of divergence can help deploy crowd management staff efficiently at the critical locations.

2 Related Work

Motion is one of the key ingredients in crowd scene analysis, and the success of a behavior prediction scheme relies greatly on the efficiency of the motion estimation (ME) method. Therefore, we provide a comprehensive review of ME techniques and the corresponding abnormal behavior detection methods, with emphasis on their capabilities in high-density crowded scenarios. OPF is considered one of the most fundamental motion flow models [14,15,16,17] and has been widely employed for motion estimation [18, 19], crowd flow segmentation [20], behavior understanding [21,22,23] and tracking in crowds [24]. However, OPF methods suffer from various problems such as motion discontinuities, lack of spatial and temporal motion representation, variations in illumination conditions, and severe clutter and occlusion.

To overcome the problems of OPF-based ME, researchers have brought particle advection concepts from fluid dynamics into the computer vision domain [8] to obtain long-term “motion trajectories” under the influence of the OPF field. Wu et al. [7] employ chaotic invariants on Lagrangian trajectories to determine whether the behavior of the crowd is normal or not. They also localize the anomaly by determining its source and size. Unfortunately, no results were reported for high-density crowds. Similarly, Ali et al. [8] obtain Lagrangian Coherent Structures (LCS) from particle trajectories by integrating the trajectories over a finite interval of time, which yields the Finite-Time Lyapunov Exponent (FTLE) field. LCS appear as ridges and valleys in the FTLE field at locations where different segments of the crowd behave differently. The authors perform crowd segmentation and instability detection in high-density crowds using LCS in the FTLE field; however, detection of actual high-density crowd anomalies such as crowd divergence and escape behavior is not performed. Similarly, the authors in [10, 11] obtain particle trajectories using a high-accuracy variational model for crowd flow and perform crowd segmentation tasks only. Mehran et al. [9] obtain streakflow by spatial integration of streaklines that are extracted from particle trajectories. For anomaly detection, they decompose the streakflow field into curl-free and divergence-free components using the Helmholtz decomposition theorem and observe variations in the potential and stream functions, which are used with an SVM to detect anomalies such as crowd divergence/convergence and escape behavior. However, results are reported for anomaly detection and segmentation in low-density crowds, and the efficacy for anomalies in high-density crowds remains questionable. Eduardo et al. [25] obtain long-range motion trajectories by applying a farthest point seeding method called streamline diffusion to streamlines instead of spatial integration. Behavior analysis is performed by linking short streamlines using a Markov Random Field (MRF). However, only normal behavior detection and crowd segmentation results are reported.

Although the particle flow methods discussed above are better candidates for ME of high-density crowds, they are rarely employed for abnormal behavior detection in high-density crowded scenes. Figure 2 compares ME methods for a high-density crowd performing Tawaf around Kabbah. Conventional object-tracking-based ME methods [26, 27] (Fig. 2b, c) work best at low crowd density but fail completely at high crowd density. The OPF method of Brox et al. [15] can estimate motion at high density, but the motion information is short-term. The SFM method [28] provides better motion estimation in low-density crowd areas, but its performance also degrades at high density. The streakflow method [9] performs similarly to SFM for high-density crowds. Unfortunately, none of these methods provides a clean motion-shape for the crowd. The FTLE method [8] (Fig. 2g) produces clear ridges at crowd boundaries and is best suited to describing high-density crowd motion. Therefore, in this work, we use the FTLE method to obtain the crowd motion-shape and translate it into a single-channel greyscale image (Fig. 2h) for both normal and abnormal behavior analysis.

Our framework for divergence detection is shown in Fig. 3 (top portion). It consists of two main phases. Phase 1: low-level FTLE feature extraction and conversion into a greyscale motion-shape image. Phase 2: behavior classification using a CNN. Motion-shape images are also used for the divergence localization process.

Fig. 2 High-density crowd motion estimation by state-of-the-art methods

Fig. 3 Framework for divergence behavior detection in high-density crowd

3 Divergence Detection with Motion Shape and Deep Convolution Neural Network

3.1 Data Preparation

Because no very high-density crowd dataset with divergence behavior is available, we generate synthetic data by simulating crowds in the MassMotion software [29]. We model two crowd scenarios: the stampede at Love Parade 2010 and Tawaf around Kabbah. Example snapshots of normal and divergent crowd behaviors are shown in Figs. 4 and 5.

Fig. 4 Synthetic crowd data for Love Parade 2010 disaster: (a, b) top-view camera; (c, d, e) same crowd in perspective views

Fig. 5 Synthetic crowd data for Kabbah Tawaf: normal behavior and divergent crowd

3.2 Global Motion Estimation and Shape Extraction

In this work, high-density crowd motion is computed by the Finite-Time Lyapunov Exponent (FTLE) method [8, 30]. Lagrangian Coherent Structures (LCS) appear as ridges in the FTLE field where two crowd segments behave differently. We extract the LCS from the FTLE field using the FTLE field-strength adaptive thresholding (FFSAT) scheme and convert them into a greyscale image. At every integration step in the FTLE pipeline, the maximum Eulerian distance (dmax) between the absolute LCS peak value and the average FTLE field strength is calculated, and a threshold (ffsat_thr) is set as a percentage of dmax (65% in our work). LCS values exceeding ffsat_thr are extracted and converted into a single-channel greyscale image. The FFSAT algorithm ensures that only strong-magnitude LCS values from the FTLE field are extracted and that noise is filtered out.
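The thresholding step described above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: dmax is taken as the difference between the peak absolute FTLE value and the mean absolute FTLE value, ffsat_thr is placed at the mean plus 65% of dmax, and retained values are scaled linearly to the 0-255 greyscale range. The function name `ffsat_to_greyscale` and the list-of-lists field representation are hypothetical.

```python
def ffsat_to_greyscale(ftle, fraction=0.65):
    """Threshold one FTLE field (2-D list of floats) into a greyscale image.

    Assumed reading of FFSAT: keep only values whose magnitude exceeds
    mean + fraction * d_max, where d_max = peak - mean.
    """
    magnitudes = [abs(v) for row in ftle for v in row]
    mean = sum(magnitudes) / len(magnitudes)   # average field strength
    peak = max(magnitudes)                     # absolute LCS peak value
    d_max = peak - mean
    ffsat_thr = mean + fraction * d_max

    image = []
    for row in ftle:
        out_row = []
        for v in row:
            m = abs(v)
            if d_max > 0 and m >= ffsat_thr:
                # Strong ridge value: map linearly so the peak becomes 255.
                out_row.append(round(255 * (m - mean) / d_max))
            else:
                # Weak value: treated as noise and suppressed.
                out_row.append(0)
        image.append(out_row)
    return image


field = [[0.0, 0.1, 0.1],
         [0.1, 1.0, 0.9],
         [0.0, 0.1, 0.1]]
shape_image = ffsat_to_greyscale(field)
```

In this toy field only the two strong ridge values survive, so the resulting image contains the ridge and zeros elsewhere. A different definition of dmax or a different greyscale mapping would change the numbers but not the overall idea.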

3.3 Deep Network for Divergence Detection

The deep CNN developed to classify normal and divergent behavior in high-density crowds is shown in Fig. 6.

Fig. 6 Deep CNN for divergence behavior detection

The greyscale image is first rescaled to 50 × 50 pixels at the input layer. A convolution layer with 24 filters and ReLU activation follows. A large number of convolution filters is used to ensure that all important receptive fields of the CNN are excited by a given motion-shape. ReLU is adopted as the activation function because of its good performance in CNNs [31], and max pooling is applied over each 2 × 2 region. Finally, two fully connected layers are followed by a softmax layer that classifies the behavior as normal or divergent.
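To make the layer sizes concrete, the following sketch traces tensor shapes through a network of this form. Only the 50 × 50 input, the 24 filters, the 2 × 2 pooling and the two output classes come from the text. The 3 × 3 kernel, unit stride, absence of padding and the 64-unit hidden layer are illustrative assumptions, and the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size=50, filters=24, kernel=3, pool=2,
                 hidden_units=64, classes=2):
    """Return (layer name, output shape) pairs for the assumed layout."""
    shapes = [("input", (input_size, input_size, 1))]

    conv = conv_output_size(input_size, kernel)     # assumed 3x3, no padding
    shapes.append(("conv + ReLU", (conv, conv, filters)))

    pooled = conv // pool                           # 2x2 max pooling
    shapes.append(("max pool", (pooled, pooled, filters)))

    shapes.append(("flatten", (pooled * pooled * filters,)))
    shapes.append(("fully connected 1", (hidden_units,)))      # assumed width
    shapes.append(("fully connected 2 + softmax", (classes,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:30s} {shape}")
```

Under these assumptions the convolution produces 48 × 48 × 24 feature maps, pooling halves them to 24 × 24 × 24, and the first fully connected layer receives 13,824 inputs. Other kernel sizes or padding choices would give different intermediate sizes.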

3.4 Divergence Localization Algorithm

We propose a novel divergence localization algorithm that analyzes changes in the motion-shape blob to search for the region of divergence. We observed that the motion-shape also exhibits undesired local variations (Fig. 7, top row) that could lead to false detection of divergence regions. These changes are caused by the to-and-fro motion experienced by crowds at high densities [32]. As these oscillatory motions propagate and reach the crowd boundary, the shape does not remain consistent from frame to frame. However, the initial occurrence of divergence also appears as a small shape change that progressively increases in size (Fig. 7, bottom row). To suppress undesired local shape changes, we implement the blob processing pipeline shown in Fig. 8.

Fig. 7 Top row: undesired motion-shape variations due to crowd oscillatory motion; bottom row: real shape change due to divergence

Fig. 8 Baseline blob extraction pipeline for normal and divergence behaviors

The baseline blob extraction pipeline extracts a baseline blob from the normal and divergence motion-shapes, which serves as the input to the divergence localization algorithm shown in Table 1. The divergence localization algorithm marks the divergence location(s) with a bounding box.

Table 1 Algorithm for divergence localization with bounding box
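The steps of Table 1 are not reproduced here, so the following is only an illustrative sketch of one simple way a bounding box could be derived from a baseline blob and a later motion-shape. It is not the algorithm of Table 1. It assumes both shapes are binary masks of equal size and simply boxes the pixels that are present in the current shape but absent from the baseline. The function name `divergence_bbox` is hypothetical.

```python
def divergence_bbox(baseline, current):
    """Bounding box of pixels that appear in `current` but not in `baseline`.

    Both arguments are 2-D lists of 0/1 values with the same dimensions.
    Returns (x_min, y_min, x_max, y_max) in inclusive pixel coordinates,
    or None when the current shape adds nothing to the baseline.
    """
    xs, ys = [], []
    for y, (base_row, cur_row) in enumerate(zip(baseline, current)):
        for x, (b, c) in enumerate(zip(base_row, cur_row)):
            if c and not b:          # shape grew at this pixel
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


baseline_blob = [[0, 0, 0, 0],
                 [1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0]]
diverging_shape = [[0, 0, 0, 0],
                   [1, 1, 1, 1],
                   [1, 1, 1, 0],
                   [0, 0, 0, 0]]
box = divergence_bbox(baseline_blob, diverging_shape)
```

A practical pipeline would also need to reject the small oscillatory shape changes discussed above, for example by requiring the changed region to persist or grow over several frames; that filtering is omitted from this sketch.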

4 Experimentation Results

We evaluate the proposed methods for crowd divergence behavior detection and divergence localization using the Love Parade and Kabbah crowd datasets (data preparation details in Sect. 3.1). A detailed qualitative and quantitative analysis is provided for both methods on the two selected scenarios. We also compare our methods with the OPF method of Brox et al. [15] by converting the OPF field into binary images.

4.1 Divergence Behavior Detection

For divergence behavior detection in the two scenarios, the crowd is simulated to diverge from 25 different locations in each scenario, and 1000 motion-shape images are captured per location (25 × 1000 = 25,000 divergence images for each scenario). One thousand images are generated for each divergence location so that the CNN is trained with the minor local motion changes contributed by crowd oscillatory motion. Similarly, 2500 images are generated for normal crowd behavior. The dataset for each scenario is split into two parts: data from 20 randomly selected divergence locations (20 × 1000 = 20,000 images) are used for training/validation, whereas data from the remaining 5 divergence locations (completely unseen by the CNN) are used for prediction. Figure 9 shows the confusion matrices of divergence behavior detection for both scenarios, with performance compared against the OPF method.
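The location-wise split can be sketched as follows. The key point is that whole divergence locations are held out, so no test image comes from a location seen during training. The function name, the fixed seed and the representation of an image as a (location, index) pair are assumptions made for illustration.

```python
import random


def split_by_location(n_locations=25, n_test_locations=5,
                      images_per_location=1000, seed=0):
    """Hold out entire divergence locations for testing.

    Returns two lists of (location_id, image_index) pairs.
    """
    rng = random.Random(seed)            # fixed seed for reproducibility
    locations = list(range(n_locations))
    rng.shuffle(locations)

    test_locations = locations[:n_test_locations]
    train_locations = locations[n_test_locations:]

    train = [(loc, i) for loc in train_locations
             for i in range(images_per_location)]
    test = [(loc, i) for loc in test_locations
            for i in range(images_per_location)]
    return train, test


train_set, test_set = split_by_location()
print(len(train_set), len(test_set))
```

With the default arguments this yields 20,000 training/validation images and 5,000 held-out images per scenario, matching the counts given above. The normal-behavior images are not covered by this sketch.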

Fig. 9 Confusion matrices for divergence behavior detection. Love Parade scenario: (a) proposed method, (b) OPF method; Kabbah scenario: (c) proposed method, (d) OPF method

For both the Love Parade and Kabbah scenarios, our method achieves 100% accuracy. In contrast, OPF detects only approx. 50% of divergence behaviors in both scenarios. The motion-shapes obtained through the OPF method are not as smooth and consistent as those produced by our method, which explains the degradation in OPF performance.

4.2 Divergence Localization

The performance of the divergence localization algorithm is evaluated by calculating the Intersection over Union (IoU) of the predicted bounding box and the ground truth bounding box for each divergence region. Ground truth bounding boxes are obtained by hand-labeling the divergence regions in each abnormal frame. IoU is calculated using Eq. (1).

$$\text{IoU}=\frac{\text{Area of overlap}}{\text{Area of union}} \tag{1}$$

Generally, an IoU score greater than 0.5 (50% overlap) is considered a good prediction for a bounding box (b.box) detection algorithm [33]. In this work, the IoU score is calculated for N post i_t frames. IoU scores of six selected frames (out of N = 50 post i_t frames) for the Love Parade scenario are shown in Fig. 10. The green b.box represents the ground truth and the red b.box represents the prediction by our algorithm. The final IoU score is obtained by averaging the IoU scores of the N frames. The average IoU score of our algorithm for the Love Parade scenario is 0.501 (50% overlap). We also perform divergence region detection using OPF motion images. The average IoU score with the OPF method is 0.15 (15% overlap), which shows that our method performs better than OPF for divergence region localization. Similarly, the average IoU score for the Kabbah scene is 0.63 (63% overlap) with our algorithm and 0.18 (18% overlap) with the OPF method.
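For reference, Eq. (1) and the per-frame averaging can be written as a short sketch. Boxes are assumed to be axis-aligned and given as (x_min, y_min, x_max, y_max) edge coordinates; the function names and the example boxes are illustrative and are not taken from the experiments.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (Eq. 1).

    Each box is (x_min, y_min, x_max, y_max) with x_max > x_min, y_max > y_min.
    """
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    overlap = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0


def mean_iou(predicted_boxes, ground_truth_boxes):
    """Average IoU over paired per-frame boxes."""
    scores = [iou(p, g) for p, g in zip(predicted_boxes, ground_truth_boxes)]
    return sum(scores) / len(scores)


# Two hypothetical frames: one perfect match and one miss.
predicted = [(0, 0, 2, 2), (0, 0, 2, 2)]
ground_truth = [(0, 0, 2, 2), (5, 5, 6, 6)]
average = mean_iou(predicted, ground_truth)
```

Averaging over frames in this way means a few poorly localized frames can pull the final score down substantially, which is worth keeping in mind when reading averages close to the 0.5 threshold.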

Fig. 10 Divergence region localization with our proposed method. IoU scores shown for six post i_t frames

5 Conclusion

In this work, we propose a deep CNN-based divergence behavior detection framework that extracts high-density crowd motion shapes in the form of images to train a deep CNN. Experimental results show that the proposed method can achieve close to 100% accuracy for divergence detection in the challenging Love Parade and Kabbah crowd scenarios. Similarly, a novel divergence region detection algorithm detects divergence regions with an IoU of more than 50%. However, our methodology of converting crowd motion into images using the FTLE method has a few limitations. First, motion-shape analysis is ineffective when a high-density crowd segment becomes stationary for any reason. Since there is no movement in the stationary crowd segment, FTLE cannot produce a crowd motion-shape for the static crowd portions, which results in incomplete or broken motion-shapes. Therefore, for our framework to work effectively, the crowd needs to keep moving (for a consistent motion-shape), which is not always the case. Second, in the FTLE method, LCS ridges appear only at crowd boundaries; if an anomaly takes place in the interior of the crowd (far from the crowd boundaries, towards the center), FTLE provides no information there. Therefore, in future work, we shall improve our method by incorporating spatial and temporal crowd density variations to capture static crowd behavior and to predict crowd behavior in all segments of the crowd, whether the crowd is stationary or in motion.