1 Introduction

Large-scale gatherings are vulnerable to adverse incidents when they are confined to limited space and few exit routes. In booming urban centers, crowd safety and analysis become major concerns, especially in public places or at events where the likelihood of catastrophic incidents increases and might lead to serious injury or death. The study of human crowds is an ongoing, interdisciplinary research area that involves disciplines such as technology, mathematics, computational science, and psychology [1]. Several factors affect the behaviors and movement patterns (such as walking strategy) of a crowd. Previous research shows that evacuation drills and simulations have been conducted to achieve a more comprehensive understanding of crowd behaviors. However, evacuation drills are cost-intensive and do not reflect the natural behavior of people in emergencies. Crowd behavior simulations [2] can overcome these shortcomings because they are time-efficient, cost-efficient, and repeatable. However, it is still a challenge to match a simulation with the behavior actually observed in scenarios where human behavior is affected by external circumstances.

Despite several years of progress in crowd research, simulations still map poorly onto reality because individual behavioral variations are insufficiently considered. Moreover, people make their own choices, so their behaviors cannot be accurately described by equations or by the laws of physics [3]. Artificial intelligence (AI) techniques such as fuzzy logic [4], neural networks (NNs) [5], and game theory [6] have been incorporated into crowd simulation to address individual perception and behaviors. However, these AI methods are rule-based (not learning agents) and need extensive justification when applied to real scenarios. Notably, attention to the role of data, where models are validated against empirical data from humans, is increasing, but this area remains largely unexplored [7]. Recently, with the evolution of big data and techniques for automatic sensing of crowd behaviors [8], machine learning (ML) has become a hotspot for crowd analysis due to its ability to learn, adapt, and perform robust classification and detection [9]. ML plays a vital role in crowd analysis because it enables researchers to explore real data of crowd assemblies [10] or to adopt agents with virtual behavior that interact with the environment.

The role of machine learning in crowd analysis has not been significantly explored in previous studies. Accordingly, this review attempts to address this gap to enable further development in crowd analysis. Several surveys related to crowd analysis have been published recently [11]. Zhan et al. [12] and Jacques et al. [13] reviewed crowd analysis methods covering the techniques of crowd recognition, tracking, and density estimation, while Sjarif et al. [14] surveyed crowd analysis with an emphasis on abnormality detection in crowded scenes. Swathi et al. [15] reviewed studies on crowd behavior analysis that involve density estimation, motion detection, tracking, and behavior recognition. However, these surveys focused predominantly on the computer vision approach. Ibrahim et al. [16] briefly surveyed prediction techniques for crowd disasters along with evacuation models that provide efficient evacuation routes. Bi et al. [17] reviewed evacuation algorithms in crowded sites, while Zhou et al. [18] surveyed the approach of guided crowd evacuation. Sodemann et al. [19] surveyed anomaly detection approaches in automated surveillance and briefly introduced machine learning techniques for anomaly detection, such as support vector machines, clustering, and decision trees. Recently, Sreenu et al. [20] surveyed the applications of deep learning techniques in crowd counting. However, none of these reviews has fully addressed the applications of ML techniques to crowd evacuation, abnormality detection, and crowd prediction. Table 1 summarizes the topics covered in previous surveys on crowd analysis.

Table 1 Previous surveys in crowd analysis

This article aims to answer the following research questions: Q1. How do machine learning-based systems impact crowd analysis? Q2. Which machine learning techniques achieve better performance, and what types of problems do they seek to address? Moreover, this article aims to derive a clear picture of machine learning-based crowd analysis systems, which are categorized into crowd evacuation, abnormality detection, and prediction. As such, crowd counting and density estimation are excluded here. The main objective of this article is to review machine learning-based crowd analysis systems in terms of the following key aspects: (I) relevant machine learning techniques in crowd analysis, (II) related crowd evacuation studies that provide efficient evacuation routes, (III) pertinent studies of abnormality detection and crowd prediction that could detect or foresee the occurrence of possible disasters and predict pedestrian route choices, and (IV) relevant platforms for crowd simulation.

The remainder of this article is organized as follows: Sect. 2 discusses the common and widely applied ML techniques for crowd analysis, Sect. 3 elaborates on the approaches and types of machine learning-based crowd analysis systems, Sect. 4 outlines the relevant software and platforms used in the surveyed papers, Sect. 5 addresses the current challenges of crowd analysis, and the subsequent section concludes this article.

2 Machine learning techniques in crowd analysis

Crowd analysis research has largely focused on machine learning models due to their outstanding capability to learn, adapt, and perform detection and classification. Machine learning methods are generally categorized into supervised learning, unsupervised learning, and reinforcement learning. Currently, unsupervised learning has been overshadowed by the successes of supervised learning [21]. However, in the long term, unsupervised learning is expected to become more significant because humans explore the world through autonomous observation (unsupervised learning) rather than by being informed. This section discusses the ML models most widely used in crowd analysis because of their potential, viz. traditional models of machine learning, reinforcement learning, deep reinforcement learning, convolutional neural networks, and recurrent neural networks.

2.1 Traditional models of machine learning

Learning methods such as support vector machine (SVM), decision tree (DT), k-nearest neighbor (k-NN), and artificial neural network (ANN) have proved useful in more complex scenarios, such as anomaly detection [19] and the categorization of crowd movement patterns under emergency evacuation [22]. Kuang et al. [23] developed a data-driven path planning model for crowds in which SVMs were used for route choice classification. Each learning method demonstrates distinctive features and advantages in specific contexts. However, these ML techniques have limited ability to process raw data and require engineering skill and domain expertise to design a feature extractor that transforms the raw data into a feature vector from which the learning method can detect or classify patterns in the input [21].
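As a minimal sketch of how a traditional classifier operates on hand-engineered features, the following k-NN example classifies an exit choice from two assumed features (the distances to two exits). The feature definition, the toy data, and the value of k are illustrative assumptions and are not taken from [22] or [23].

```python
from collections import Counter
import math

# Hypothetical hand-engineered features: (distance to exit A, distance to exit B) in metres.
TRAIN = [
    ((2.0, 9.0), "A"),
    ((3.0, 8.0), "A"),
    ((4.0, 7.5), "A"),
    ((9.0, 2.5), "B"),
    ((8.0, 3.0), "B"),
    ((7.0, 4.0), "B"),
]

def knn_predict(features, train=TRAIN, k=3):
    """Return the majority label among the k nearest training samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

prediction = knn_predict((2.5, 8.5))  # a pedestrian standing close to exit A
```

The classifier never sees raw video or sensor data; everything depends on the quality of the feature vector, which is the limitation noted above.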

2.2 Reinforcement learning (RL)

Reinforcement learning (RL) uses a trial-and-error learning process to achieve certain goals, which can enable progress toward building a human-level agent [24]. RL has been a hotspot of intensive research in diverse fields such as industrial automation, robotics, and control. It is also an effective learning approach for studying crowd evacuation. RL supports multi-agent systems and enables goal-driven learning in which individuals are treated as intelligent, independent agents that learn the main features of crowd motion. Traditional multi-agent methods are useful for capturing the dynamics of a human crowd. However, in these traditional approaches, evacuation routes and agents’ behaviors are predefined by the model [25]. RL can improve the understanding of human behavior and handle situations that frequently occur in real emergencies. Many studies have used RL as a mechanism for human evacuation [26] in dynamic environments. RL solves sequential decision-making problems, and an optimal RL policy can generate anticipatory behaviors, which is a desirable characteristic in crowd simulation [27], intelligent control, analysis, and prediction [28]. In pedestrian simulation, multi-agent RL has several advantages: low computational cost for computing the agents’ behavior; variability and heterogeneity of individual behaviors, which affect crowd flow; a model-free design in which neither the agent behavior nor the environment is assumed; and collective behaviors that emerge as consequences of individual decisions [29]. Multi-agent RL faces two key challenges. The first is the dimensionality dilemma caused by the growing number of agents in different situations. The second is the level of complexity, which depends on how agents interact and explore the environment.
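The trial-and-error process described above can be illustrated with a minimal tabular Q-learning sketch. The one-dimensional corridor, the reward values, and the hyperparameters are assumptions chosen for illustration; the surveyed studies use far richer environments and multiple agents.

```python
import random

N_CELLS = 6          # corridor cells 0..5; cell 5 is the exit (assumed layout)
ACTIONS = (-1, +1)   # step away from or toward the exit
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # illustrative hyperparameters

def step(state, action):
    """Move within the corridor; reward 1 at the exit, a small cost otherwise."""
    nxt = min(max(state + action, 0), N_CELLS - 1)
    done = nxt == N_CELLS - 1
    return nxt, (1.0 if done else -0.01), done

def train(episodes=300, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection: explore sometimes, exploit otherwise.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[state][i])
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update toward the bootstrapped target.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
    return q

q_table = train()
greedy_policy = [ACTIONS[max((0, 1), key=lambda i: row[i])] for row in q_table[:-1]]
```

After training, the greedy policy moves toward the exit from every cell. With several agents, the joint state space grows rapidly, which is the dimensionality challenge mentioned above.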

2.3 Deep reinforcement learning (DRL)

Deep learning (DL) has emerged as the prevailing trend in machine learning due to its sophisticated pattern recognition and classification: raw sensory data is fed directly into deep neural networks, which learn features automatically and provide more accurate classification than traditional ML techniques, whose capability to process raw data directly is limited. Several DL frameworks have been widely adopted for crowd analysis, as discussed here and in the following two sub-sections.

Deep reinforcement learning (DRL) integrates RL with deep neural networks (DNNs). It has shown remarkable results on several challenges in sequential decision-making [30, 31]. DRL approaches have been adopted in crowd simulation [32,33,34,35,36] because they can deal with a dynamic environment that affects the decisions of agents [27]. The advantages of DRL are more obvious in scenarios in which the environment is represented by a large number of states. Mnih et al. [30] developed a virtual agent via a deep Q-network (DQN) that learns effective policies directly from high-dimensional inputs. DRL performs better than RL for evacuation path planning due to the strong nonlinear approximation and generalization ability of neural networks [24, 37]. To some extent, DRL contributes toward solving the dimensionality dilemma. However, this dilemma is still an open question [24].
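For orientation, the relation between tabular Q-learning and a DQN can be written compactly. The first expression is the standard Q-learning update; the second is the squared temporal-difference loss commonly minimized when the table is replaced by a network with parameters \theta. The use of a separate target network with parameters \theta^{-} is the commonly used formulation and is an assumption here, since the surveyed text does not spell it out.

```latex
% Tabular Q-learning update (learning rate \alpha, discount factor \gamma)
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

% DQN loss with network parameters \theta and target-network parameters \theta^{-}
L(\theta) = \mathbb{E}_{(s, a, r, s')}
  \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```

Because the network generalizes across similar states, it does not need one table entry per state, which is why DRL copes better with large state spaces.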

2.4 Convolutional neural network (CNN)

CNN models have been widely adopted for crowd behavior categorization and for the detection of abnormal activities or emotions. CNNs have significant advantages in vision-based tasks, specifically in crowd counting [38], because they automatically extract high-level features from large-scale images containing a single object [39] or multiple objects. A CNN includes convolutional layers, activation functions, and pooling layers that reduce the dimension of the input data while retaining the most important information [40]. CNN approaches have been shown to be effective in crowd counting. Existing CNN applications remain challenging because of their large number of parameters and their need for large storage space. Nevertheless, visual inputs combined with the power of CNNs make precise control possible [27].
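The three building blocks named above (convolution, activation, pooling) can be sketched in a few lines of plain Python. The 5x5 toy image and the hand-written edge kernel are assumptions for illustration; in a real CNN the kernel weights are learned from data.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the operation used in CNN convolutional layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

# Toy 5x5 "image" with a vertical edge, and an assumed vertical-edge kernel.
image = [[0, 0, 1, 1, 1]] * 5
kernel = [[-1, 1], [-1, 1]]
features = max_pool2x2(relu(conv2d_valid(image, kernel)))
```

Pooling shrinks the 4x4 feature map to 2x2 while keeping the strong edge response, which is the dimension reduction described in the text.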

2.5 Recurrent neural network (RNN)

Several works have implemented RNNs for the prediction of pedestrian trajectories. RNNs can process sequential information and carry information from previous steps forward into the current state. However, it is difficult for them to store information over long durations [41]: in long-sequence learning, RNNs suffer from the vanishing gradient problem. Long Short-Term Memory (LSTM) [42] and the Gated Recurrent Unit (GRU) [43] were therefore introduced to overcome this problem. They have achieved outstanding performance due to their ability to capture long-term dependencies. GRU is similar to LSTM but has fewer parameters. Hence, RNNs with LSTMs have been widely applied in long-sequence processing [40]. LSTM has proved more effective than conventional RNNs [44]. Table 2 summarizes the machine learning techniques applied in the surveyed articles on crowd analysis. It also shows that DRL techniques have advantages in path planning and evacuation, while RNNs are significant in crowd prediction applications. CNN is considered a breakthrough in visual tasks related to crowd detection.
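To make the gating idea concrete, the sketch below implements one GRU update for scalar inputs and states. The weights are arbitrary placeholder values, and the convention for which term the update gate multiplies differs between references; the form used here is one common choice, stated as an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One scalar GRU update; p holds hypothetical weights (w_*, u_*, b_*)."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h + p["b_z"])   # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h + p["b_r"])   # reset gate
    h_tilde = math.tanh(p["w_h"] * x + p["u_h"] * (r * h) + p["b_h"])  # candidate state
    # Blend the old state and the candidate; z controls how much is overwritten.
    return (1.0 - z) * h + z * h_tilde

NAMES = ("w_z", "u_z", "b_z", "w_r", "u_r", "b_r", "w_h", "u_h", "b_h")
params = {name: 0.5 for name in NAMES}  # placeholder values, not trained weights

h = 0.0
for x in [0.1, 0.4, -0.2, 0.3]:  # a short input sequence
    h = gru_step(x, h, params)
```

When the update gate stays near zero, the old state passes through almost unchanged, which is how gated units preserve information over long sequences.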

Table 2 Common ML techniques for crowd analysis

3 Crowd analysis

Crowd analysis has been flourishing and attracting the attention of many researchers. It involves diverse research areas, including visual surveillance, artificial intelligence, crowd dynamics, and pattern recognition. Based on their objective and purpose, this article categorizes crowd analysis studies into crowd evacuation, abnormality detection, and prediction (see Fig. 1).

Fig. 1 Schematic representation of the crowd analysis categories in this survey

3.1 Crowd evacuation

Evacuation constitutes one of the main concerns in studying emergency crowd situations where time is very crucial for human safety. Providing an optimal route for evacuation continues to remain a big challenge which requires optimization and quick handling of high-dimensional data. As such, the pedestrian or agent in the simulation must have the learning capability to control its velocity and avoid collision with obstacles and other pedestrians. For better elaboration, this article categorizes crowd evacuation into data-driven methods [48] and goal-driven learning methods [26].

3.1.1 Data-driven methods

Traditional crowd evacuation methods strive to improve evacuation efficiency using several assumptions that make the crowd simulation less realistic. Correspondingly, researchers have employed data-driven methods to improve the visual realism of crowd simulation. Several optimization algorithms [62,63,64] have been used to tune the parameters of the social force model (SFM) [65] and reciprocal velocity obstacles (RVO) [66] in order to generate plausible simulations whose trajectories resemble those observed in a video. Kim et al. [67] proposed an adaptive data-driven algorithm and extracted trajectory behaviors from videos to obtain feasible simulations. However, these methods consider only social attributes, which are insufficient to describe the cohesive characteristics of crowds. Yao et al. [48] proposed an RL-based data-driven algorithm for crowd evacuation in dynamic environments in which position and velocity were extracted from videos to describe crowd cohesiveness. A crowd cohesiveness-based K-means algorithm merged the trajectories of individuals into groups to improve the efficiency of path computation. A two-layer path planning mechanism was then proposed: in the top layer, the Q-learning algorithm trained the control policy using the trajectories, and the bottom layer obtained individual routes and avoided collisions via the RVO model. This method produced a more realistic crowd evacuation simulation. Wang et al. [9] proposed evacuation path planning that combines an improved multi-agent RL algorithm with an improved SFM to support the authorities in managing a large-scale crowd. The SFM has been continuously refined since its introduction, and in this work visual parameters were added to it. The intersections of pedestrian trajectories were extracted from video and used as the state space for a Q-learning algorithm employing a two-layer mechanism. The upper layer processed the leaders’ decisions to select a route based on the improved multi-agent RL, and the improved SFM was applied in the bottom layer for the evacuation of individuals. The leaders exchanged information to attain the optimal policy. This method addressed the dimensionality problem of RL and improved the convergence speed. Li et al. [54] proposed a novel pedestrian detection method that combined SFM with a region-based CNN (R-CNN). The pedestrian positions were acquired by the R-CNN from a video and converted into actual coordinates, after which the SFM was applied to simulate evacuation based on these coordinates. The results showed a more logical and plausible simulation than the pure SFM.
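The grouping step that precedes the two-layer planning can be sketched with plain k-means over position and velocity. This is ordinary Lloyd's k-means with a naive initialization, not the cohesiveness-based variant of [48]; the pedestrian data and the choice of k are hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical pedestrians: (x, y, vx, vy). Two groups heading in opposite directions.
pedestrians = [
    (1.0, 1.0, 1.0, 0.0), (9.0, 5.0, -1.0, 0.1), (1.5, 1.2, 0.9, 0.1),
    (1.2, 0.8, 1.1, 0.0), (9.5, 5.2, -0.9, 0.0), (8.8, 4.9, -1.1, 0.0),
]
labels, centroids = kmeans(pedestrians, k=2)
```

Planning a route once per group instead of once per pedestrian is what reduces the path computation cost in such two-layer schemes.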

Deep learning techniques were used for crowd detection in order to overcome the evident weakness of the SFM in handling individual variations and unplanned events. In turn, the SFM was adopted to enhance the computational accuracy of deep learning for a dense crowd. In addition, data-driven methods aim to imitate the trajectories of crowds in real videos so as to describe social attributes sufficiently, which is an efficient way to improve the realism of crowd simulation. However, these methods lack consideration of scenario variation and dynamic changes.

3.1.2 Goal-driven learning methods

Simulations of traditional crowd evacuation behaviors often ignore individual differences and lack local details. Reinforcement learning is considered one of the most effective methods for learning or planning human evacuations. Humans are mostly driven by goals (desires) and led by self-beliefs. RL can also be used as a goal-driven learning method that enables agents to interact with their surrounding environment and make decisions independently. Lee et al. [27] proposed an agent-based deep reinforcement learning (DRL) approach for navigation. A wide variety of scenarios (such as Obstacle, Hallway, Crossway, and Circle) were successfully simulated without the need to define complex rules or tune parameters for each scenario. However, the resultant trajectories in some cases did not coincide with the shortest and fastest route. Lee [26] proposed an RL algorithm with an embedded look-ahead crowdedness estimation to find the shortest and less crowded path to the exit, avoiding the crowdedness that might pose a serious concern during evacuation. Notably, the RL algorithm with look-ahead crowdedness estimation performed well, although the pure RL algorithm generated evacuation routes faster. Sharma et al. [50] proposed an environment affected by fire, uncertainty, and bottlenecks for the evacuation training of agents using tabular Q-learning and a deep Q-network (DQN). The main intuition behind pre-training the DQN model with Q-learning was to provide prior updates that increase stability and convergence. In comparisons, this method outperformed RL algorithms such as SARSA and actor-critic methods. Tian et al. [51] used RL to study the time required for safe evacuation from an immersed tunnel under fire with multiple exit routes, in order to measure the true evacuation speed of pedestrians. Congestion caused by a crowd rushing toward the exits may cause serious harm. Therefore, including the crowdedness factor in evacuation modeling is essential in the evacuation control framework. Fragkos et al. [52] developed the ESCAPE service, based on RL and game theory, to promote effective evacuation. The agents’ behavior was modeled as a non-cooperative minority game. An evacuation route was recommended based on RL, and each agent then used the concept of minority games to make the final decision on whether or not to follow the initially selected and recommended route. The decision function was based on past decision experience and updates from the disaster zone. The ESCAPE service was evaluated under various scenarios.

Zheng et al. [49] proposed a collaborative evacuation approach based on DRL and a two-layer mechanism to reduce the complexity of training and achieve global path planning. A multi-agent deep deterministic policy gradient (MADDPG) was used to enhance the collaborative path planning performance of the leader agents. The RVO algorithm [66] and the k-means algorithm [68] were implemented for collision avoidance and crowd clustering, respectively. Moreover, Wang et al. [22] adopted conventional machine learning models to examine their potential for studying crowd behavior and extracting the rules that govern pedestrians’ exit finding in multi-exit places. Traditional ML approaches such as decision tree, support vector machine, and k-nearest neighbor were applied to classify movement patterns, while principal component analysis (PCA) was used as a data-reduction tool. These techniques showed comparable prediction accuracy. However, the approach becomes insufficient with high-dimensional data. Wan et al. [53] proposed a robot-assisted evacuation system for pedestrian regulation based on DRL, in which a CNN was applied to extract features from high-dimensional images and Q-learning was used to control a robot, based on specific features, so as to maximize the outflow of pedestrians. An SFM with human–robot interaction (HRI) forces was used to simulate pedestrian motion. DRL has shown good progress in robotics. However, applications in which robots influence human behavior have not attracted much attention. One significant aspect of goal-driven methods is the learning speed of pedestrians or agents. Martinez et al. [32] addressed this issue by proposing iterative and incremental learning strategies based on the combination of vector quantization with Q-learning (VQQL) in pedestrian simulation to speed up learning. The iterative VQQL deals with a fixed number of agents in every learning iteration, whereas the incremental VQQL starts with one agent in the first iteration and adds one agent per iteration until the last, so that the interactions among the agents are learned gradually. Table 3 summarizes the reviewed articles on crowd analysis, indicating the applied ML techniques, scenarios, platforms, and number of agents. It also shows that crowd evacuation research has received the most attention, owing to the significant impact of the DRL approach in evacuation, followed by crowd abnormality detection and then crowd prediction.
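The idea of combining vector quantization with Q-learning can be sketched as follows: a continuous state is mapped to its nearest prototype in a codebook, and the Q-table has one row per prototype. The codebook here is hand-picked and the state features are hypothetical; in practice the codebook would be learned from sampled states, and this sketch is not the implementation of [32].

```python
import math

# Hypothetical codebook of prototype states: (distance to goal in m, speed in m/s).
CODEBOOK = [(1.0, 0.5), (1.0, 1.5), (5.0, 0.5), (5.0, 1.5)]
N_ACTIONS = 3

def quantize(state, codebook=CODEBOOK):
    """Map a continuous state to the index of its nearest prototype."""
    return min(range(len(codebook)), key=lambda i: math.dist(state, codebook[i]))

q_table = [[0.0] * N_ACTIONS for _ in CODEBOOK]  # one row per prototype

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Standard Q-learning update applied to the quantized states."""
    s, s2 = quantize(state), quantize(next_state)
    td_target = reward + gamma * max(q_table[s2])
    q_table[s][action] += alpha * (td_target - q_table[s][action])

q_update(state=(4.6, 1.4), action=2, reward=1.0, next_state=(1.2, 1.3))
```

Quantization keeps the table small regardless of how finely the continuous state varies, at the cost of treating all states near one prototype as identical.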

Table 3 Summary of surveyed papers in crowd analysis

3.2 Abnormality detection (anomaly)

Crowd behavior analysis is still a challenging problem [69]. Although much research has been conducted on vision-based analysis of individual behavior [70], crowd abnormality detection remains a significant aspect of crowd analysis. Various methods have been applied for anomaly detection, such as the SFM [71], spatio-temporal motion pattern models [72, 73], high-frequency and spatio-temporal features [74], and the local pressure model [75]. However, the anomaly detection problem remains wide open, and research efforts are dispersed across approaches, interpretations of the problem, assumptions, and objectives [19]. Based on the available research on crowd abnormality and detection methods, abnormality can be detected globally or locally [76]. Machine learning has been widely adopted to address the problems associated with abnormal pedestrian behaviors.

3.2.1 Local abnormality detection

Alsalat et al. [57] proposed a real-time system for detecting abnormal crowd behavior during the Hajj pilgrimage via the Internet of Things (IoT). RNNs, particularly LSTM and GRU, were applied for classification and detection. Wearable wristbands containing heart rate and motion sensors, GPS, and a wireless connection were used to detect abnormal behavior from heart rate and motion. Local abnormality detection methods are clearly important for individuals in a crowd, but they do not reflect the dynamics of the crowd distribution, and as such, little research has been conducted on the detection of local abnormality.

3.2.2 Global abnormality detection

This sub-section focuses on methods for detecting abnormal behavior of groups, that is, global abnormal crowd behavior. Jiang et al. [58] adopted the LSTM model [42] for short-term and long-term prediction of crowd distribution in a specific zone at a particular time. The method was demonstrated using a high-dimensional dataset with spatial and temporal crowd dynamics collected in Japan during the 2016 Kumamoto earthquake disaster. The results indicated that LSTM is practical and achieves higher accuracy in estimating crowd dynamics than traditional regressive models. However, performance can be enhanced by deploying a convolutional LSTM [77], in which convolutional layers act as filters that identify more sophisticated spatial structures to feed into the LSTM layers for temporal prediction. Direkoglu et al. [45] detected abnormality in crowd behavior based on the angle difference between the optical flow vectors at each location. An SVM was used and showed competitive results in learning normal behavior. The UMN [78] and PETS2009 [79] datasets were used for evaluation. Iwahashi et al. [46] presented a disaster detection algorithm based on the Emergency Rescue Support System (ERESS). Acceleration and angular velocity sensors were used to detect disasters by analyzing crowd behavior via support vector domain description. Varghese et al. [47] employed a 3D CNN to extract spatio-temporal features of crowd emotions. A multiclass SVM was introduced to categorize and predict heterogeneous crowd behaviors as normal, abnormal, fight, panic, cheerful, or congested based on emotions such as happy, excited, angry, scared, sad, and neutral. The effectiveness of the approach was evaluated and validated with three benchmark datasets. In addition, personality traits have been observed to be significant in behavior detection for bridging the gap between high-level crowd behaviors and low-level features [80]. The SVM was successful in predicting six different behavioral classes. Studies on crowd abnormality detection via machine learning techniques have shown good performance and promising results for future progress.
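The angle-difference feature mentioned for optical flow can be sketched as below. It is assumed here that the difference is taken between the flow vectors of two consecutive frames at the same location; the flow fields are toy values, and the subsequent classifier (an SVM in [45]) is not shown.

```python
import math

def angle_difference(v1, v2):
    """Smallest absolute angle (radians) between two 2D flow vectors."""
    a1 = math.atan2(v1[1], v1[0])
    a2 = math.atan2(v2[1], v2[0])
    diff = abs(a1 - a2) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)

def angle_difference_map(flow_prev, flow_curr):
    """Per-location angle difference between two flow fields of equal shape."""
    return [
        [angle_difference(p, c) for p, c in zip(row_p, row_c)]
        for row_p, row_c in zip(flow_prev, flow_curr)
    ]

# Toy 1x2 flow fields: the second location reverses direction between frames.
flow_prev = [[(1.0, 0.0), (1.0, 0.0)]]
flow_curr = [[(1.0, 0.0), (-1.0, 0.0)]]
diff_map = angle_difference_map(flow_prev, flow_curr)
```

Large values in the map mark locations where motion direction changes abruptly, which is the kind of cue a classifier trained on normal behavior can flag.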

3.3 Crowd prediction

Prediction is a very significant factor in crowd safety. However, it remains a challenging task, especially at large-scale events such as religious gatherings, festivals, entertainment, and sports events, which are happening more frequently. Recently, machine learning techniques have been adapted for crowd forecasting. More specifically, critical crowd conditions can be identified by features such as crowd density, speed, and flow [81]. This article classifies crowd prediction into crowd disaster prediction [16] and pedestrian trajectory prediction.

3.3.1 Crowd disaster prediction

Studying the critical conditions of a crowd is very important for predicting a crowd disaster as early as possible in order to prevent it. Nagananthini et al. [55] developed a CNN-based algorithm for crowd density estimation that sends early, automatic warning messages to the authorities to avert a disaster. The proposed method is observed to work well in environments with varying illumination [82]. Datasets were used for evaluation and the results indicated good performance, but the scale was very small: a warning message was displayed when the crowd count exceeded 25 people.
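The warning rule itself reduces to a threshold on the estimated count. The sketch below assumes per-frame counts are already available from some estimator; only the threshold of 25 people comes from the text, and the function and variable names are hypothetical.

```python
WARNING_THRESHOLD = 25  # people; the threshold reported for [55]

def crowd_warnings(counts_per_frame, threshold=WARNING_THRESHOLD):
    """Return the indices of frames whose estimated count exceeds the threshold."""
    return [i for i, count in enumerate(counts_per_frame) if count > threshold]

# Hypothetical per-frame counts produced by a density estimator.
alerts = crowd_warnings([12, 24, 25, 26, 31])
```

A deployment at the scale of a mass gathering would need a much higher, site-specific threshold, which is the limitation noted above.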

3.3.2 Pedestrian trajectory prediction

Predicting a pedestrian’s location continuously is very insightful for crowd safety and management. Trajectory prediction is an important task and has shown potential in several applications, such as behavioral prediction [83, 84], traffic flow clustering [85], crowd motion analysis [86], crowd segmentation [87], and abnormality detection [88]. However, pedestrian trajectory prediction remains challenging due to the nature of human behavior. Duives et al. [59] developed a real-time forecasting method [89] for predicting the next movement of a pedestrian in a large-scale crowd. The available historical sequence was processed by an RNN with a Gated Recurrent Unit (GRU). The dataset was collected at the Mysteryland 2017 music festival in the Netherlands, where the movement of pedestrians was translated into a sequence of cells. Prediction of the next cell was slightly better at the end of a sequence than at the beginning. Social LSTM [90] was also proposed for trajectory prediction. A social pooling layer was designed to capture dependencies among multiple correlated sequences and interactions that might occur in the future. However, it ignored the importance of differences between pedestrians. Lee et al. [60] utilized an RNN to capture the motion history, the scene (roads and crosswalks), and agents’ interactions (pedestrians and cars) for predicting the future trajectories of an agent in dynamic scenes. This method showed some improvement in prediction accuracy. Xu et al. [61] proposed a crowd interaction DNN for displacement prediction that considers the motion and interactions of all pedestrians. LSTM was used to model pedestrian motion. Bi et al. [91] employed a Markov decision process (MDP), a basic formalism of reinforcement learning, for predicting the optimal evacuation route. The method demonstrated good efficacy when evaluated using meteorological data. The main objective was to provide a safe evacuation route for people during a disaster. Yi et al. [56] proposed a Behavior-CNN that models pedestrians’ behaviors in dense crowds and captures the route history. The CNN was used to model pedestrians’ interactions. However, this model ignored the pedestrians in the far future. Clearly, several works have addressed crowd prediction, which plays a significant role in human safety. However, a significant gap remains to be filled. The increase in mega events and the fast growth of cities will attract much interest to the field of crowd prediction.
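Once movement is expressed as a sequence of cells, next-cell prediction becomes a sequence classification task. The sketch below is a simple first-order transition-count baseline, swapped in for illustration; it is not the GRU model of [59], and the cell identifiers and trajectories are hypothetical.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Count observed cell-to-cell transitions across all trajectories."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts

def predict_next(counts, cell):
    """Most frequent successor of `cell`, or None if the cell was never seen."""
    if cell not in counts:
        return None
    return counts[cell].most_common(1)[0][0]

# Hypothetical trajectories expressed as sequences of grid-cell IDs.
trajectories = [
    ["A1", "A2", "B2", "B3"],
    ["A1", "A2", "B2", "C2"],
    ["A2", "B2", "B3"],
]
model = fit_transitions(trajectories)
next_cell = predict_next(model, "B2")
```

Such a baseline looks only at the current cell. A recurrent model conditions on the whole history, which is consistent with predictions improving toward the end of a sequence.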

4 Simulation platforms

Crowd simulation research has been gaining more attention recently due to advances in Graphics Processing Units (GPUs) and machine learning, as GPUs accelerate agents’ training and simulation. Commercial crowd simulation platforms such as AnyLogic, FDS, EXODUS, and SIMULEX, which are based on multi-agent methods, have been found useful for capturing human crowd dynamics. However, they are rule-based and target-based: most of the evacuation routes and agent behaviors are predefined by models and simulations, so the sudden or unplanned situations that frequently occur in crowds are ignored in these commercial platforms.

Researchers have developed simulation models using programming languages such as C++ and Python or using software packages such as Visual Studio. Most of the surveyed articles did not provide any details on the software used in their works. TensorFlow [92] and Torch [93] are open source libraries for machine learning models containing a wide range of deep neural networks. Keras is a high-level API for TensorFlow.

Lee et al. [27] used a C++ platform for the agents’ simulation, Python for learning a policy, and TensorFlow for the DNN process. A PC with an i7-6850K CPU was used for computations. Wang et al. [9] carried out the crowd evacuation simulation experiment using Visual Studio 2012 and Open Scene Graph. Wan et al. [53] used one CPU (12-core i7-6800K) and two GPUs (NVIDIA TITAN Xp) for training and testing. The DNN was implemented in Python, whereas PyCUDA was used to implement the pedestrian motion model. The evaluation was conducted in a 3D environment built with the Unity engine. Zheng et al. [49] built a continuous environment in OpenAI Gym to train the optimal path of the leader agents. MATLAB, Microsoft XNA Game Studio 4.0, Microsoft Visual Studio 2012, and Open Scene Graph were adopted for the development of the crowd evacuation simulation platform. X. Li et al. [54] conducted the experiment on Ubuntu 16.04, using Scikit-learn 19.1 and Redis 4.08. Jiang et al. [58] implemented an LSTM model using the Keras framework with TensorFlow [94]. Alsalat et al. [57] used an Ubuntu environment, Keras [35], and Python. A CPU (Intel(R) Core(TM) i7-3630QM) and a GPU (NVIDIA Quadro K2000M) were used for model training and classification. Nagananthini et al. [55] employed MATLAB 2013a for evaluation with datasets. Lee et al. [60] utilized TensorFlow for all model implementation and an NVIDIA Tesla K80 GPU for end-to-end training. Xu et al. [61] used a GPU (NVIDIA GeForce TITAN) and a CPU (Intel(R) Xeon(R)) for model training. Sharma et al. [50] built a complex building environment containing 91 rooms in OpenAI Gym [95] for model testing. The models were developed in Python with TensorFlow and Keras. An NVIDIA GTX 1050Ti GPU was used to train the agents, each for 500 episodes.

Clearly, TensorFlow has been widely used in the papers surveyed here. It uses graph representations and allows the developer to visualize NN models. Several libraries specialize in training RL-based agents, such as OpenAI Gym [95] and Unity [96]. OpenAI Gym provides a Python interface and is compatible with TensorFlow, Keras, and Scikit-learn. However, no real learning environment has been reported yet for emergency evacuation.
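
To make the notion of an RL learning environment concrete, the following minimal sketch mimics the classic Gym-style convention in which `reset()` returns an observation and `step(action)` returns an observation, a reward, a done flag, and an info dictionary. It is a toy illustration only and is not taken from any of the surveyed works: it does not import OpenAI Gym, and the class name `GridEvacuationEnv`, the grid layout, the reward values, and the helper `run_random_episode` are all hypothetical.

```python
import random


class GridEvacuationEnv:
    """Toy grid world following the reset()/step() convention of Gym-style environments.

    Hypothetical illustration only: one agent, one exit, no obstacles.
    """

    # Action index -> (row delta, column delta): up, down, left, right.
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=5, exit_cell=(0, 4), max_steps=50, seed=None):
        self.size = size
        self.exit_cell = exit_cell
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.position = None
        self.steps = 0

    def reset(self):
        """Place the agent on a random non-exit cell and return its position."""
        while True:
            cell = (self.rng.randrange(self.size), self.rng.randrange(self.size))
            if cell != self.exit_cell:
                break
        self.position = cell
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply one move and return (observation, reward, done, info)."""
        d_row, d_col = self.ACTIONS[action]
        # Clamp the move so the agent stays inside the outer walls.
        row = min(max(self.position[0] + d_row, 0), self.size - 1)
        col = min(max(self.position[1] + d_col, 0), self.size - 1)
        self.position = (row, col)
        self.steps += 1
        reached_exit = self.position == self.exit_cell
        # Assumed reward shaping: a cost per step and a bonus for reaching the exit.
        reward = 10.0 if reached_exit else -1.0
        done = reached_exit or self.steps >= self.max_steps
        return self.position, reward, done, {"steps": self.steps}


def run_random_episode(env):
    """Roll out one episode with a uniformly random policy; return the total reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        action = env.rng.randrange(len(env.ACTIONS))
        _, reward, done, _ = env.step(action)
        total += reward
    return total
```

A learning agent would replace the random policy in `run_random_episode` with one that is updated from the observed rewards. A realistic evacuation environment would additionally need obstacles, multiple interacting agents, and hazards, none of which are modeled in this sketch.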

5 Current challenges of crowd analysis

Some issues remain as challenges in crowd simulation and demand further research and development. One is agent behavioral characteristics: human decisions and mobility are usually affected by human relations and social attachment, as well as by individual characteristics such as gender, age, speed, level of familiarity and knowledge, and leadership traits [97]. There is also an evident lack of real evacuation data [98]. Waqar [99] surveyed the available datasets on crowd dynamics; the survey indicated a lack of datasets on entry and exit surveillance events, which play a crucial role in critical behavior. In addition, monitoring evacuees’ behavior during a real evacuation is very challenging. Moreover, setting up a real evacuation experiment is very costly, and such a scenario can raise ethical and safety concerns. High-dimensional data is another challenge; researchers have addressed it with data reduction methods such as auto-encoders and PCA, which transform correlated features into a smaller set of independent features while retaining the most important information.
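
As an illustration of the data reduction idea mentioned above, the sketch below applies PCA via a singular value decomposition in NumPy. The data are synthetic (three strongly correlated measurements per pedestrian generated from a random walking speed), not real crowd data, and the function name `pca_reduce` is hypothetical. Strictly speaking, the components PCA produces are uncorrelated, which is a weaker property than statistical independence.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project the rows of `features` onto the top principal components.

    Returns the projected data and the fraction of total variance
    explained by each kept component.
    """
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projected = centered @ components.T
    variance = singular_values ** 2
    explained = variance[:n_components] / variance.sum()
    return projected, explained


# Synthetic example: three correlated features derived from one latent speed value.
rng = np.random.default_rng(0)
speed = rng.normal(1.3, 0.3, size=200)
features = np.column_stack([
    speed,
    2.0 * speed + rng.normal(0, 0.05, size=200),
    -speed + rng.normal(0, 0.05, size=200),
])
reduced, explained = pca_reduce(features, n_components=1)
```

Because the three columns are driven by a single latent variable, one component captures nearly all of the variance here; real crowd features would typically need more components, and the number to keep is a modeling choice.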

RL algorithms for multi-agent systems still suffer from high computational complexity, which makes it important to address situations that require pedestrians to communicate and cooperate [100] in order to solve complicated missions [101]. Several novel algorithms have been developed to overcome this issue, such as sparse interaction-based RL, which allows agents to act aggressively, and negotiation-based MARL with sparse interactions, which allows agents to select a strategy for their joint actions [102]. RNNs and CNNs have shown good performance in various crowd applications, such as the prediction of trajectories, destinations, and disasters, and the detection of pedestrian personality and abnormal events. Existing works in crowd prediction and abnormality detection have been validated by extensive experiments on publicly available datasets. However, these works do not consider significant factors that can ensure crowd safety, such as crowd pressure, crowd flow, and the effects of uncertainty and threat. Accounting for these factors could considerably help in detecting abnormal events precisely and effectively, or in predicting the location or time at which an incident or disaster is likely to occur.

There is increasing attention to the significant role of data in the study of crowd dynamics [103]. However, crowd research still faces difficulties with validation due to the lack of sufficient available data [104]. Data acquired by crowd monitoring systems and related to past crowd incidents, such as the Love Parade disaster, could enhance research on machine learning-based crowd analysis systems by providing valuable sources for validating models and theory. IoT tracking systems, together with machine learning techniques, will play a significant role in measuring, analyzing, modeling, and optimizing crowd behaviors [5].

6 Conclusion

Research on crowd analysis is undoubtedly flourishing, and it includes diverse research aspects such as visual surveillance, artificial intelligence, and pattern recognition. Crowd analysis has been successfully used for evacuation, abnormality detection, and prediction. DRL has demonstrated prominent results in sequential decision-making problems and allows agents to navigate in diverse scenarios, while CNNs enable more realistic crowd simulation with visual inputs. This article reveals that few studies on pedestrian trajectory prediction have been conducted in real time at large-scale events. Furthermore, CNNs have brought about a breakthrough in the processing and analysis of images and videos of crowd scenes, whereas RNNs have shed light on sequential data applications such as predicting crowd movements. This review concludes that CNNs and RNNs have shown superiority in abnormality detection and prediction, whereas DRL can develop human-level reasoning capacities. These methods contribute to the modeling and understanding of human behavior and will enhance further development to ensure human safety.

The main contributions of this work can be summarized as follows:

  • A review of machine learning-based crowd analysis systems as a combination of three categories: crowd evacuation, crowd abnormality detection, and crowd prediction.

  • A brief review of the machine learning techniques implemented in the surveyed works. Moreover, the article also presents a related analysis of crowd behaviors and activities.

  • State-of-the-art of crowd evacuation with an overview of data-driven methods and goal-driven learning methods.

  • An up-to-date review on the crowd studies of abnormality detection and prediction.

  • Useful tables are presented indicating current progress, advantages, and an overview of the aspects of crowd analysis.

Potential directions for future research are expected in vision-based crowd analysis systems that combine CNNs with RNNs and utilize RL to determine where, when, and what to monitor. As stated before, much development is still needed to ensure effective crowd management and human safety.