1 Introduction

The World Health Organization (WHO) reports that the average annual global human fatality rate due to accidents is 18 per 100,000 persons (Juárez and Gayet 2014). This figure varies across countries and excludes major and minor injuries. A road accident not only causes the loss of precious lives but also damages vehicles and property. Additionally, depending upon the severity of the accident, passengers can be injured so severely that recovery may take years or the injury may result in a permanent disability. An accident is normally thought of as involving only a single car; however, it can involve the collision of two or more vehicles. At times, the collision of a vehicle with a building or a pedestrian can also result in an accident (Akin and Akbas 2010). Many countries maintain records of traffic accidents through their transportation departments, traffic police or hospitals (Imkamon et al. 2008; Suriyawongpaisal and Kanchanasut 2003). These records show that the number of casualties and injuries due to road accidents is huge. Figure 1 shows the fatalities caused by road accidents in Great Britain alone for the period from 2000 to 2011 (Andersson and Chapman 2011). It is worth mentioning that the number of fatalities among car occupants is far higher than in the other groups. Based on Fig. 1, for every pedestrian fatality there are 2 car occupant fatalities, for every motorcyclist fatality there are 2 car occupant fatalities, and for every pedal cyclist fatality there are 11 car occupant fatalities. From 2000 to 2011 there has been a gradual decrease in the number of fatalities; however, the numbers are still high. The results presented in Fig. 1 are those of a developed country; unfortunately, the situation in developing and underdeveloped countries is even worse (Hu et al. 2011).

Given the aforementioned statistics on the loss of precious lives and property, road accidents have been a major and growing concern worldwide (Hu et al. 2011). There are multiple approaches to preventing road accidents, including traffic rule awareness, a better transportation system, effective traffic policing and precautionary systems in vehicles. Road accidents have a number of causes, for example, using a cell phone while driving, playing music, high acceleration, sudden braking, responding to a request or congested roads (Dixon et al. 2005; Imkamon et al. 2008). Accident prediction can have a significant influence on traffic flow and safety. Many existing approaches detect unsafe driving patterns for accident prediction. Some of these are based on biometric detection and some use facial movement for driving fatigue assessment (Ji et al. 2004; Orazio et al. 2004). However, there is no fixed list of actions or data on the basis of which accidents can be predicted accurately, which makes it difficult to consider all causes at once. For this reason, a number of techniques have been proposed that use different methods and diverse datasets for the detection of unsafe driving patterns (Dixon et al. 2005; Imkamon et al. 2008).

Fig. 1 Road accident fatalities in Great Britain (2000–2011)

The choice of a particular technique for the detection of unsafe driving patterns depends mainly on the type, size, and format of the data recorded for this purpose. Figure 2 shows the different types, sizes, and formats of data that can be gathered from both the vehicle and the person driving it. As shown in Fig. 2, data can be collected from the vehicle, the road conditions/infrastructure, the driver and the weather conditions.

Fig. 2 Data produced by a vehicle and/or traffic system

Previous studies show that, among AI-based techniques, neural networks are used most often for the detection of unsafe patterns and crash prediction (Akin and Akbas 2010; Yuejing et al. 2010; Moghaddam et al. 2010; Wahab et al. 2009). Other studies are based on statistical techniques such as the conditional logit model (Abdel-Aty et al. 2004, 2005; Zheng et al. 2010; Abdel-Aty and Rajashekar 2006; Xu et al. 2012a; Abdel-Aty et al. 2012). Genetic programming, an evolutionary technique, has also been used in crash prediction systems. Dixon et al. (2005) use both a learning algorithm and a genetic algorithm in their experiments. In Xu et al. (2012b), a genetic programming model is developed for real-time crash prediction on freeways. There is also a wide literature on crash prediction based on statistical approaches (Zhou et al. 2007; Singh and Dongre 2012; Ning et al. 2009, 2008; Jeong et al. 2004; Wang et al. 2012; Bruns et al. 2005). Similarly, in Rygula (2009), the speed profile is analyzed to identify the driving style.

Studies based on vehicle data and/or driving features can be used in diverse ways and for many useful objectives. The key problems that can be addressed in this domain are as follows:

  • Driver identification: The data from the vehicle and also of the driver can be recorded and later matched with the pre-stored datasets. This can be used in minimizing car thefts and for parental control (Miyajima et al. 2007).

  • Profiling (grouping) drivers based on their driving features: Drivers can be profiled by recording only their driving features. A signature is created for each driver from his/her recorded data. This signature can then be used to identify the driver and to assign him/her to a particular profile. Profiling can be done conveniently via clustering (Wahab et al. 2009; Kalsoom and Halim 2013).

  • Accident prediction: Accident prediction is a critical aspect of road safety, which can be addressed from four perspectives. An accident can be predicted using a) vehicle data, b) driver’s data, c) road or traffic condition/history, and d) weather conditions (Yuejing et al. 2010). Combining these perspectives can further increase the accuracy of an accident prediction system.

  • Early warning generation: An early warning generation system for vehicles may seem similar to accident prediction; however, the two are different. An accident prediction system predicts an anticipated accident with a margin of error, whereas an early warning system detects unsafe driving states/patterns and alerts the driver to be careful. An early warning system depends mainly on the individual driver’s style of driving (Jabon et al. 2011).

  • Modeling individual driver’s features: Modeling an AI-based controller to replicate a driver can be done by training the controller using the driver’s patterns. This can have many applications like auto drive mode and autonomous vehicles (Ali et al. 2013).

  • Accident identification: Accident identification systems are activated once an accident has actually occurred. These systems usually identify an accident based on the jerk caused by the collision and may use additional information recorded from the vehicle sensors. Such post-accident systems are used to alert emergency services (Lv et al. 2009).

  • Prediction of driver’s suitability for driving: AI techniques can be used to evaluate a driver and propose his/her suitability for the driving of a particular vehicle. This can be done by recording the driver’s data generated during a test drive and matching it with one of the predefined clusters. Such techniques can be very useful for traffic police departments before issuing a driving license (Singh and Dongre 2012).

  • Postmortem analysis: Study of the reasons for an accident comes under the postmortem analysis. The study can benefit from AI and machine learning (ML) techniques to identify correlation between different events that could have triggered the accident (Akin and Akbas 2010).

There can be many other problem domains, such as study of traffic flow, effects of weather on road condition or appropriate path finding. However, these domains do not strictly take into account the individual driver’s driving features, and thus are out of the scope of this study.

Although a rich literature exists on different AI techniques used to predict accidents and unsafe driving behaviours, a comprehensive survey of the existing methods is lacking. This paper aims to bridge this gap by consolidating the literature on AI-based techniques for accident prediction, unsafe driving pattern identification and driver profiling, listing the key datasets/sources and identifying future trends for road safety based on AI.

The rest of the paper is organized as follows. Section 2 lists the previous work on accident prediction/unsafe driving pattern analysis and comparative analysis of various AI techniques. Section 3 explains the available datasets and simulators for road safety studies. Section 4 covers the discussion, Sect. 5 lists the open questions and finally Sect. 6 concludes the paper.

2 AI techniques for accident prediction and unsafe driving pattern analysis

This section covers the key AI techniques used in the literature for accident prediction and for studying drivers’ driving behavior. The techniques covered can be categorized into five subdomains, namely search heuristics, supervised learning, dimensionality reduction, hybrid approaches and reinforcement learning. Figure 3 lists the categories and approaches covered in this work. There are many other techniques, such as unsupervised learning, association rule learning, deep learning and hierarchical clustering; however, this section covers only the techniques that have previously been used for accident prediction or for unsafe driving pattern analysis.

Fig. 3 AI techniques used in literature for accident prediction

2.1 Evolutionary algorithms (EA)

Genetic algorithm (GA) (Koza 1998) is a heuristic-based evolutionary search approach inspired by Darwin’s theory of evolution; it applies the genetic operations of mutation and crossover to find the best solutions. A GA starts with a set of candidate solutions known as the population. The candidate solutions are usually represented as one-dimensional arrays. The initial population is randomly generated using the rules of the problem domain. The next population, also known as the next generation, is generated by selecting solutions from the previous population. To promote individuals from one population to the next, a fitness function is used to gauge the fitness of the candidate solutions. It is assumed that the new population will perform better than the previous one, and individuals with better fitness have a higher chance of being selected for the reproduction process. Crossover and mutation are the two reproduction operators used in GAs.
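
The generational loop described above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from any of the cited works: the toy fitness function `one_max` (the number of 1s in a bit array), the use of tournament selection, and all parameter values are assumptions chosen for brevity.

```python
import random

def one_max(bits):
    # Toy fitness (assumed problem): the number of 1s in the array.
    return sum(bits)

def tournament(population, fitness, rng, k=3):
    # Fitter individuals are more likely to be selected for reproduction.
    return max(rng.sample(population, k), key=fitness)

def crossover(a, b, rng):
    # Single-point crossover on one-dimensional arrays.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate=0.02):
    # Flip each bit independently with a small probability.
    return [1 - b if rng.random() < rate else b for b in bits]

def run_ga(length=20, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    # Random initial population.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(population, key=one_max)
    for _ in range(generations):
        # Build the next generation from selected parents.
        population = [
            mutate(crossover(tournament(population, one_max, rng),
                             tournament(population, one_max, rng), rng), rng)
            for _ in range(pop_size)
        ]
        best = max(population + [best], key=one_max)
    return best

if __name__ == "__main__":
    solution = run_ga()
    print(one_max(solution), solution)
```

In a crash prediction setting the bit array would instead encode, for example, the parameters of a learning method, and the fitness function would measure prediction error on recorded data.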

Genetic programming (GP), on the other hand, genetically breeds computer programs to solve a problem (Koza 1998). Although GP works like a GA, the candidate solutions in GP are represented as tree-like structures. Breeding in GP is also based on a fitness function, and the reproduction operators are the same as in GAs. The main goal of GP is to find the program in the search space that satisfies the fitness function and gives the best approximation to the objective. GP has two major advantages over earlier techniques. First, a GP model can find a good functional solution to a problem without prior assumptions about the form of the solution. Second, GP can remove the black box effect caused by AI models (Xu et al. 2012b). In general, a GP model is based on evolution theory and works on a population of mathematical models. In each generation a mathematical model is selected based on the fitness function of the GP model. The GP model stops when the predetermined number of generations has been executed or the best fitness value has been achieved. Equation (1) is a fitness function based on the error between the values predicted by the GP model and the actual data (Abdel-Aty et al. 2004), where \(F({B_j})\) is the fitness value of the jth model \(B_j \) in the population, \(B_j ({x_i})\) is the value of the jth model for the ith observation \(x_i\), \(y_i\) is the corresponding actual value, and \(C({B_j({x_i})})\) is a scaling function that converts the value calculated by the model \(B_j \) into either 1 or 0. \(C({B_j({x_i})})\) can take any functional form in which values greater than a threshold are converted to 1 and others to 0.

$$\begin{aligned} F\left( {B_j } \right) =\mathop {\sum }\nolimits _{i=1}^n \left( {\beta ^{y_i}\times \left| {y_i -C\left( {B_j \left( {x_i } \right) } \right) } \right| +\left( {y_i -B_j \left( {x_i } \right) } \right) ^{2}} \right) \end{aligned}$$
(1)
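
As a worked illustration, Eq. (1) can be evaluated directly for a candidate model. The sketch below assumes that \(y_i\) is a binary crash indicator, that the exponent of \(\beta \) is \(y_i\), and that the scaling function is a step at a threshold of 0.5; the value of `beta`, the threshold, and the two toy models are hypothetical and are not taken from the cited works.

```python
def scale_to_binary(value, threshold=0.5):
    # C(.): values above the threshold map to 1, all others to 0 (threshold assumed).
    return 1 if value > threshold else 0

def fitness(model, xs, ys, beta=2.0, threshold=0.5):
    # Eq. (1): sum over samples of
    #   beta^y_i * |y_i - C(B_j(x_i))| + (y_i - B_j(x_i))^2
    total = 0.0
    for x, y in zip(xs, ys):
        out = model(x)
        miss = abs(y - scale_to_binary(out, threshold))   # 1 if the thresholded output is wrong
        total += beta ** y * miss + (y - out) ** 2        # weighted miss plus squared error
    return total

if __name__ == "__main__":
    inputs = [1, 9, 8, 2]          # hypothetical one-dimensional observations
    labels = [0, 1, 1, 0]          # 1 = crash, 0 = non-crash
    print(fitness(lambda x: 0.1 * x, inputs, labels))   # model that tracks the labels
    print(fitness(lambda x: 0.0, inputs, labels))       # model that never predicts a crash
```

Under this reading, a missed crash case (\(y_i = 1\)) is penalized \(\beta \) times as heavily as a false alarm, and smaller values of \(F\) indicate a better model.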

A GP-based model is proposed in Xu et al. (2012b) to predict crashes in real time. The system is designed for freeways and is based on traffic, weather, and crash data obtained from a local authority. The authors study two types of traffic conditions: uncongested and congested. A separate GP model is proposed for each traffic condition, and the random forest (RF) technique is employed to select the variables from the dataset that influence a crash the most. The candidate solutions in Xu et al. (2012b) are initialized using the ramped half-and-half method (Koza 1992), with tree depth restricted to six levels. Later, during the evolutionary process, the trees are reported to reach a maximum depth of 30 levels. For reproduction, the models with the best fitness are selected and priority is given to models with less depth. The fitness function used in Xu et al. (2012b) is based on the number of hits and the squared errors, as shown in Eq. (1). Each traffic state is analyzed using the GP model, and the results show that the traffic flow characteristics that may lead to a crash differ between congested and uncongested states. The authors also compare the GP model with a binary logit model (Abdel-Aty and Rajashekar 2006) using the receiver operating characteristic (ROC), which shows that GP performs better than the binary logit model. The population size for the experiments in Xu et al. (2012b) is set to 1000, and the stopping condition for the GP is a maximum of 100 generations. This number of generations seems to be on the low side, and it would have been interesting to see the results of the experiments for 1000 or more generations. A more convincing stopping criterion would be to run until the solution converges or until no further improvement in the best solution found is achieved for many consecutive generations.

The RF modeling technique, which consists of an ensemble of randomized classification and regression trees (Breiman 2001), is used in Xu et al. (2012b) to select the factors contributing to crash risk. The candidate variables selected by the RF model are used to generate the GP model, which in turn is used to detect the hazardous conditions that lead to a crash under each traffic state. Prediction performance is compared with a binary logit model developed for the same dataset.
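
The ramped half-and-half initialization mentioned above can be sketched as follows. This is an illustrative sketch only: the function set, the terminal names, the early-stop probability of the grow method and the convention that depth is counted in edges are all assumptions, and the code is not the initialization used in Xu et al. (2012b).

```python
import random

# Hypothetical primitive sets; a real model would use the traffic variables selected beforehand.
FUNCTIONS = {"add": 2, "sub": 2, "mul": 2}          # function name -> arity
TERMINALS = ["speed", "volume", "occupancy", 1.0]

def build_tree(max_depth, method, rng):
    # 'full' expands every branch down to max_depth;
    # 'grow' may end a branch early with a terminal (probability 0.3 is assumed).
    if max_depth == 0 or (method == "grow" and rng.random() < 0.3):
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    children = tuple(build_tree(max_depth - 1, method, rng) for _ in range(FUNCTIONS[name]))
    return (name,) + children

def depth(tree):
    # Depth counted in edges: a lone terminal has depth 0.
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])

def ramped_half_and_half(pop_size, min_depth=2, max_depth=6, seed=0):
    rng = random.Random(seed)
    depths = list(range(min_depth, max_depth + 1))
    population = []
    for i in range(pop_size):
        limit = depths[(i // 2) % len(depths)]      # ramp through the depth limits
        method = "full" if i % 2 == 0 else "grow"   # half of the trees per method
        population.append(build_tree(limit, method, rng))
    return population

if __name__ == "__main__":
    for tree in ramped_half_and_half(4):
        print(depth(tree), tree)
```

Mixing depth limits and construction methods in this way gives the initial population a variety of tree shapes and sizes.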

The work reported in Dixon et al. (2005), Damousis et al. (2007), and Yang (2012) addresses the problem of modeling human recognition, accident prevention and accident prediction, respectively. However, they do not use GAs directly to solve the problem. Instead, the GAs are used to train the actual learning method for better performance. Authors in Dixon et al. (2005) have used GAs to minimize the error between the converged samples and ground-truth labels for prediction. GA is used to train the parameters of the fuzzy expert system in Damousis et al. (2007) where the chromosome is a coded GA to avoid loss of accuracy. Similarly, the work in Yang (2012) utilizes GAs to train the parameters of support vector regression (SVR).

2.2 Conditional random field (CRF)

Labeling of sequential data is a problem that arises in many fields. A conditional random field (CRF) is an undirected probabilistic graphical model used for labeling and segmentation of sequential data (Lafferty et al. 2001). A simple classifier predicts the label of a sample without considering neighboring samples, whereas a CRF takes all samples under observation in a linear chain into account and can thus use the whole context to predict the label. Since CRFs use an undirected graph, they are not biased towards states with fewer successor states. A CRF is a discriminatively trained undirected graphical model which uses the Markov property in hidden modes for observation and has characteristics of both discriminative and generative models. To maximize the conditional probability of the label given the observation, features are combined using exponential functions. The linear chain CRF feature function for a node \(y_i \) given an input sequence X (Zhou et al. 2007) is shown in Eq. (2), where \(f_k \) is the kth feature, \(\lambda _k\) is its weight and i is the node number.

$$\begin{aligned} F_i \left( {Y,X} \right) =\mathop {\sum }\limits _{k} \lambda _{k}\cdot f_k \left( {y_i ,y_{i-1} ,X} \right) \end{aligned}$$
(2)
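
To make Eq. (2) concrete, the following sketch scores every possible labeling of a short sequence by summing the node scores \(F_i\) and normalizing their exponentials. The three feature functions, their weights, the brake-pressure interpretation of X and the extra position argument `i` passed to each feature are hypothetical; exhaustive enumeration is used only because the example is tiny, whereas practical CRFs rely on dynamic programming.

```python
import itertools
import math

LABELS = ("safe", "unsafe")

def f_hard_brake(y, y_prev, X, i):
    # Observation feature: strong braking at step i together with the label 'unsafe'.
    return 1.0 if y == "unsafe" and X[i] > 0.7 else 0.0

def f_soft_brake(y, y_prev, X, i):
    # Observation feature: mild braking at step i together with the label 'safe'.
    return 1.0 if y == "safe" and X[i] <= 0.7 else 0.0

def f_same_label(y, y_prev, X, i):
    # Transition feature: neighbouring samples tend to share a label.
    return 1.0 if y == y_prev else 0.0

FEATURES = (f_hard_brake, f_soft_brake, f_same_label)
WEIGHTS = (2.0, 2.0, 1.0)   # lambda_k, assumed values (normally learned in training)

def node_score(y, y_prev, X, i):
    # Eq. (2): weighted sum of the feature functions at node i.
    return sum(w * f(y, y_prev, X, i) for w, f in zip(WEIGHTS, FEATURES))

def sequence_score(Y, X):
    return sum(node_score(Y[i], Y[i - 1] if i > 0 else None, X, i) for i in range(len(X)))

def conditional_probability(Y, X):
    # P(Y | X) is proportional to exp(sum_i F_i(Y, X)).
    z = sum(math.exp(sequence_score(c, X)) for c in itertools.product(LABELS, repeat=len(X)))
    return math.exp(sequence_score(Y, X)) / z

def best_labeling(X):
    return max(itertools.product(LABELS, repeat=len(X)), key=lambda Y: sequence_score(Y, X))

if __name__ == "__main__":
    brake = [0.1, 0.8, 0.9, 0.2]   # hypothetical normalized brake pressure
    print(best_labeling(brake), conditional_probability(best_labeling(brake), brake))
```

The transition feature is what distinguishes the chain model from a per-sample classifier: the score of a label depends on the label of its neighbour as well as on the observation.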

Multichannel sequential data is temporally correlated and therefore cannot be applied directly to a discriminative classifier such as a support vector machine (SVM); for this reason, CRF is used as the inference model in Zhou et al. (2007). The work in Zhou et al. (2007) aims at combining the data received from multiple channels for the detection of unsafe driving patterns. The dataset is generated using the STISIM car driving simulator (Zhou et al. 2007; Ning et al. 2009, 2008; Jabon et al. 2011), and a total of nine channels are used. The channels record throttle, brake, steering wheel, position, speed, acceleration, lane position, distance to the vehicle in the same lane, and distance to the incoming vehicle. The data is recorded at a sampling frequency of 30 Hz. For each channel, the authors in Zhou et al. (2007) calculate the minimum, maximum, mean, variance and first-order derivative and use them to estimate Gaussian mixture models (GMMs) for the safe and the unsafe driving patterns, respectively. The fusion of multiple channels in Zhou et al. (2007) avoids the need to label every sample in the training data, since both labeled and unlabeled data are used by the semi-supervised learning algorithm. Figure 4 depicts the working of the system. The data from multiple channels is recorded, and features are extracted and forwarded to the CRF. The feature vector is passed to the inference model to estimate the current driving state; the CRF classifies a stream as safe or unsafe, and warnings are generated according to the desired threshold. During training, the CRF finds the best inference model parameters. Training the CRF with labels for the complete dataset is a time-consuming and expensive process, because a pattern labeled unsafe does not necessarily lead to an accident. Since it is also difficult to label the data manually, semi-supervised learning is used in Zhou et al. (2007).
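
The per-channel statistics described above can be computed as in the following sketch. The 1 s window length, the use of the mean successive difference as the first-order derivative, and the two channel names in the example are assumptions made for illustration; the GMM estimation step is not shown.

```python
from statistics import fmean, pvariance

FEATURE_NAMES = ("min", "max", "mean", "variance", "derivative")

def window_features(samples, rate_hz=30.0):
    # First-order derivative approximated by the mean successive difference per second.
    diffs = [(b - a) * rate_hz for a, b in zip(samples, samples[1:])]
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": fmean(samples),
        "variance": pvariance(samples),
        "derivative": fmean(diffs) if diffs else 0.0,
    }

def extract(channels, rate_hz=30.0, window_s=1.0):
    # channels: dict mapping a channel name to its list of samples.
    size = int(rate_hz * window_s)
    n = min(len(values) for values in channels.values())
    vectors = []
    for start in range(0, n - size + 1, size):      # non-overlapping windows
        row = []
        for name in sorted(channels):               # fixed channel order
            feats = window_features(channels[name][start:start + size], rate_hz)
            row.extend(feats[key] for key in FEATURE_NAMES)
        vectors.append(row)
    return vectors

if __name__ == "__main__":
    demo = {"speed": [float(i) for i in range(60)], "brake": [0.0] * 60}
    for vector in extract(demo):
        print(vector)
```

With nine channels and five statistics per channel, each window would yield a 45-dimensional feature vector under these assumptions.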

Results in Yuejing et al. (2010) show that CRF outperforms SVM and the hidden Markov model (HMM) in the classification of both labeled and unlabeled data. The work in Wang et al. (2010a) presents an approach to predict the driving danger level based on data from multiple sensors. The data considered comprises vehicle dynamic parameters, physiological data of the driver and driver-dependent features of the vehicle while driving. CRF is used in Wang et al. (2010a) to model the temporal patterns that can be used to predict unsafe driving states. As in Zhou et al. (2007), the data in Wang et al. (2010a) is generated using the STISIM car driving simulator. To generate the danger level source, a history of 1 min is scanned for the CRF. The authors in Wang et al. (2010a) compare the performance of CRF with HMM and reinforcement learning, where the latter seems to outperform the other approaches. Similarly, an approach for a dangerous driving warning system is introduced in Wang et al. (2010b) that uses a sparsely labeled dataset for training. Nine vehicle dynamic parameters are recorded using the STISIM car driving simulator (Wang et al. 2010b), and a semi-supervised learning approach is used for prediction, which outperforms CRF.

Fig. 4 Flow of activities to detect an unsafe driving pattern

2.3 Artificial neural network (ANN)

Modern neural networks (NN) are non-linear statistical data modeling tools that are used to model complex relationships between inputs and outputs and to find patterns in data. ANN analysis is considered an alternative approach for the investigation of non-linear relationships in engineering problems (Akin and Akbas 2010). An ANN model is developed in three phases: modeling, training and testing. A typical neural network structure is shown in Fig. 5. The modeling phase covers the rules, the input parameters and the gathering of data. The training phase covers the preparation of the data and the adaptation of the learning laws. In the testing phase, accuracy and performance are evaluated by calculating the error between the estimated and the actual outputs.
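
The three phases can be illustrated with the smallest possible network, a single sigmoid neuron. The sketch below is an assumption-laden toy: the four-sample dataset (a logical AND of two hypothetical risk indicators), the learning rate and the number of epochs are invented for illustration, and the same samples are reused for testing only to keep the example short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    # Forward pass: weighted sum of the inputs followed by a non-linear activation.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(samples, epochs=2000, lr=0.5):
    # Training phase: per-sample gradient descent on the cross-entropy loss.
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = predict(weights, bias, x) - y     # gradient with respect to the weighted sum
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def mean_squared_error(weights, bias, samples):
    # Testing phase: error between the estimated and the actual outputs.
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in samples) / len(samples)

if __name__ == "__main__":
    # Modeling phase: choose the inputs and gather the data (toy data, assumed).
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w, b = train(data)
    print(w, b, mean_squared_error(w, b, data))
```

A practical model would add hidden layers, as in Fig. 5, and would evaluate the error on data held out from training.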

Fig. 5 A typical neural network structure

During the past decade, a number of ANN-based methods have been presented for the detection of accidents and their severity using different datasets and conditions. In Chong (2004), students of Oklahoma State University presented a model for the severity of injury caused by traffic accidents. The work used decision trees and ANNs trained using backpropagation with different numbers of iterations. To minimize the error, conjugate gradient descent with 500 epochs was implemented. The analysis in Chong (2004) shows that the driver’s seat belt usage, the light condition of the roadway and the driver’s alcohol usage were the most critical features in fatal injuries. The dataset used in the study was taken from the National Automotive Sampling System (NASS) General Estimates System (GES) and consists of traffic accident records from 1995 to 2000 with a total of 417,670 instances. In another work (Tambouratzis et al. 2010) on the prediction of accident severity (light, serious or fatal), a fused methodology of probabilistic neural network (PNN) and decision tree is applied to data collected by the Republic of Cyprus Police. The results show that the fusion enhances the classification accuracy (Tambouratzis et al. 2010). The studies in both Chong (2004) and Tambouratzis et al. (2010) use historical data for the prediction of an accident. Although such an approach can be useful, it may be inaccurate when tested in real time, since data gathered in real time has few attributes and covers a limited history spanning only a few minutes.

Many parameters affect the occurrence of accidents. Generally, these parameters are related to traffic flow, road section length, infrastructure geometric characteristics, pavement surface conditions, lighting, weather and driver behavior. Mitigating the parameters that lead to accidents can reduce both the occurrence and the severity of accidents, and several accident prediction models using observational data have been created for this purpose. Many supervised and unsupervised techniques have been implemented for accident prediction using a combination of neural networks, support vector machines and decision trees. Some of these can be seen in Akin and Akbas (2010), Yuejing et al. (2010), Moghaddam et al. (2010), Wahab et al. (2009), Lv et al. (2009), Qu et al. (2012) and Li et al. (2008).

2.4 Principal component analysis (PCA) and hidden Markov model (HMM)

Principal component analysis (PCA) is an unsupervised feature extraction technique mostly used to derive a smaller number of artificial variables when there is a large number of observed attributes (Zhao and Karypis 2005). These artificial variables are called principal components; they explain most of the variance in the data and retain its characteristics with minimal loss of information. HMM is a statistical model based on the Markov process (Rabiner 1989). In a Markov process, the states are directly visible to the observer. In a hidden Markov model, the states are hidden, but the outputs that depend on the states are visible.
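
The idea of replacing correlated attributes with a principal component can be sketched without external libraries. The example below finds only the first component, using power iteration on the sample covariance matrix; the two-attribute dataset is invented, and power iteration is a simplification chosen for brevity (it assumes the start vector is not orthogonal to the dominant direction), not the procedure of any cited work.

```python
import math

def center(rows):
    # Subtract the mean of every attribute.
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    # Sample covariance matrix of already centered data.
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]

def first_component(cov, iterations=200):
    # Power iteration converges to the direction of largest variance.
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(len(v))) for a in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(len(v))) for a in range(len(v)))
    return v, eigenvalue

def explained_ratio(cov, eigenvalue):
    # Share of the total variance captured by the component.
    return eigenvalue / sum(cov[a][a] for a in range(len(cov)))

if __name__ == "__main__":
    rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]   # two perfectly correlated attributes
    cov = covariance(center(rows))
    direction, variance = first_component(cov)
    print(direction, variance, explained_ratio(cov, variance))
```

When the explained ratio of the leading components is close to 1, the remaining components can be dropped with little loss of information.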

A mobile application called crash prediction is reported in Singh and Dongre (2012), which takes as input various attributes such as age, gender, disability (if any), vision, date of license expiry and driving experience. This data is treated as a single variable and PCA is applied to it. Afterwards, it is converted into driver-dependent variables, to which the HMM technique is applied to determine whether a driver is fit, unfit or partially fit for driving. For the purpose of prediction and classification, PCA serves as a preprocessing technique to select a minimal set of key attributes. Road accident prediction is performed in Ali and Bakheit (2011) using an ANN, and PCA is used to select the input variables for the ANN. PCA transforms interrelated variables into independent components, which are linear combinations of the original items in the data. The approach in Farah et al. (2007) is based on creating an infrastructure coefficient for freeways to predict accidents. The coefficient is based on the freeway and its geometric features. PCA is used to calibrate the coefficient, which consists of eleven infrastructure characteristics. For the purpose of automated traffic safety analysis, the work in Yu and Abdel-Aty (2013) presents an approach to identify unsafe driving maneuvers at intersections using data from video sensors. HMM is used in Yu and Abdel-Aty (2013) along with k-means clustering for the grouping of vehicle trajectories.

2.5 Fuzzy logic

Fuzzy logic represents possible decisions in natural language instead of in quantitative terms (Mohan 2011). With fuzzy logic, a decision can be expressed in words instead of numbers, and a solution provided by fuzzy logic is essentially a reflection of the solution a human would predict. It enables computerized devices to reason in a human-like manner. A fuzzy inference system can map a nonlinear function using fuzzy rules and can easily map a vector input to a scalar output. It has four major components: a fuzzifier, an inference engine, a rule base, and a defuzzifier. The fuzzifier maps the inputs to fuzzy membership values, while the rule base contains the rules provided by a human expert. The inference engine converts the fuzzy inputs to fuzzy outputs using the rule base. The defuzzifier converts the fuzzy output to a crisp numeric output, as shown in Fig. 6.

Fig. 6 Fuzzy system overview

A detailed study is conducted in Wahab et al. (2009) to extract the features that are effective for profiling a driver. A feature extraction technique based on GMM is used to extract features from the accelerator and brake pedal pressure. The extracted features are then used as input to a fuzzy neural network (FNN) to identify the profile. The results in Wahab et al. (2009) show that the FNN performs better than a simple multi-layered perceptron (MLP). An accident prediction framework is presented in Hu et al. (2004), which obtains moving trajectories using 3D model-based vehicle tracking and also records images in sequence. To learn the vehicle trajectories, a new fuzzy self-learning algorithm based on fuzzy sets is proposed in Hu et al. (2004). An accident is predicted by computing the matching degree and monitoring the activity of two moving vehicles. The Gaussian field approach (Ning et al. 2008) uses semi-supervised learning and defines a weighted graph based on a similarity function using both labeled and unlabeled data. All data are sampled using a window size of 1 s, and statistics such as the minimum, maximum, mean, variance and first-order derivative of each sample are calculated. These statistics are compared with the GMM to compute the features. CRF features are more expressive and can capture the dependency between the hidden state and the observations.

In Imkamon et al. (2008), unsafe driving behavior is treated as a subjective quantity, and three perspectives are used to detect it: the passenger’s point of view, the driver’s point of view, and the vehicle status. For the passenger’s point of view, the authors detect heavy jolts caused by sudden turns or braking using a 3-axis accelerometer mounted on the passenger’s seat. For the driver’s point of view, a video camera is mounted on the car’s console to emulate the driver’s vision and focus on the road. For the vehicle status, the velocity and the engine speed are read directly from the engine control unit (ECU) through the on-board diagnostics II (OBD-II) protocol. The results in Imkamon et al. (2008) show that the output of the proposed system agrees with human opinions. The following rules are used in the fuzzy inference system (Imkamon et al. 2008), whose output expresses the probability of hazardous driving behavior.

  (i) Rules to minimize the impact to passengers

    • If acc_X or acc_Y or acc_Z is HIGH then output = HIGH

  (ii) Common rules for driving safety

    • If turn rate is HIGH then output = HIGH

    • If speed and density of car is HIGH then output = HIGH

  (iii) Rules for car protection

    • If Speed is HIGH then output = HIGH

The first group of rules discourages high acceleration, while the second infers the need for low speed in critical situations. The third group controls fuel consumption.
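
The rules listed above can be evaluated with a very small fuzzy inference sketch. Everything numeric here is an assumption: the ramp-shaped membership function for HIGH, the ranges and units of each input, and the common choice of max for "or" and min for "and". Because all listed rules share the consequent HIGH, the sketch simply returns the strongest rule activation instead of running a full defuzzifier; it is not the system of Imkamon et al. (2008).

```python
def high(value, low, full):
    # Fuzzifier: ramp membership for HIGH (0 below `low`, 1 above `full`).
    if value <= low:
        return 0.0
    if value >= full:
        return 1.0
    return (value - low) / (full - low)

def hazard_level(acc_x, acc_y, acc_z, turn_rate, speed, density):
    # Membership degrees; all ranges below are hypothetical.
    mu = {
        "acc_X": high(abs(acc_x), 0.25, 0.75),          # acceleration in g
        "acc_Y": high(abs(acc_y), 0.25, 0.75),
        "acc_Z": high(abs(acc_z), 0.25, 0.75),
        "turn_rate": high(abs(turn_rate), 10.0, 30.0),  # degrees per second
        "speed": high(speed, 60.0, 120.0),              # km/h
        "density": high(density, 0.3, 0.8),             # normalized traffic density
    }
    # Rule base and inference: OR is modelled as max, AND as min.
    activations = [
        max(mu["acc_X"], mu["acc_Y"], mu["acc_Z"]),     # (i) impact on passengers
        mu["turn_rate"],                                # (ii) turn rate
        min(mu["speed"], mu["density"]),                # (ii) speed and density
        mu["speed"],                                    # (iii) car protection
    ]
    # All rules conclude HIGH, so the strongest activation is used as the output.
    return max(activations)

if __name__ == "__main__":
    print(hazard_level(0.5, 0.0, 0.1, 5.0, 90.0, 0.9))
```

A fuller system would attach output membership functions to the rules and apply a defuzzifier, such as the centroid method, to obtain the crisp output.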

2.6 Temporal difference (TD) learning

Temporal difference (TD) learning is a prediction technique used mostly in reinforcement learning to estimate the expected future reward (Doherty et al. 2003). Its main goal is to predict a quantity that depends on future values. To do so, it uses the difference between predictions at successive time steps, which is why it is called temporal difference learning.
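
The core of the technique is a one-line update that moves the prediction for the current state towards the reward plus the discounted prediction for the next state. The sketch below shows this TD(0) update on a hypothetical three-state driving episode; the state names, the penalty of -1 at the crash, the learning rate and the discount factor are assumptions made for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    # Temporal difference: the gap between successive predictions.
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error

def learn_danger_levels(episodes=500):
    # Hypothetical episode: cruise -> tailgate -> crash, with a penalty only at the crash.
    values = {"cruise": 0.0, "tailgate": 0.0, "crash": 0.0}
    for _ in range(episodes):
        td0_update(values, "cruise", 0.0, "tailgate")
        td0_update(values, "tailgate", -1.0, "crash")
    return values

if __name__ == "__main__":
    print(learn_danger_levels())
```

Although only the crash itself is penalized, the negative value propagates backwards to the states that precede it, which is what allows a danger level to be learned without labeling every sample.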

TD learning has previously been used to find unsafe driving patterns; however, it is still difficult to enumerate the actions that contribute to unsafe driving. A generic framework is proposed in Ning et al. (2008) to detect unsafe driving patterns at runtime using multiple sensor readings. A danger level function is learned in Ning et al. (2008), and TD is used to approximate the expected future reward. The dataset used in Ning et al. (2008) is recorded on the STISIM car driving simulator with 36 subjects, where a sample of 20 min is recorded for each subject. A signature is generated for each subject using the mean, maximum, minimum and variance, and this signature is used in the experiments instead of the raw data. Since the main challenge is the labeling of data for training, the authors in Ning et al. (2008) use TD learning as a reinforcement learning method to assign the expected future reward to the danger level. TD learning gives an approximation of the penalty observable at the time of a crash. The proposed TD-based approach in Ning et al. (2008) is compared with a two-category classifier using logistic regression and with a linear regressor. The authors of Wang et al. (2012) also use TD learning to avoid the labeling issue during training by approximating the risk at crash time.

2.7 Support vector machine (SVM)

Support vector machine (SVM) is a classifier based on decision boundaries and hyperplanes. SVMs are learning machines that map training vectors into a high-dimensional space. As described in Burges (1998), an SVM classifies data by finding support vectors, and the resulting hyperplane separates the data of different classes. If the data selected for classification is not sufficiently informative, the SVM cannot determine the hyperplanes correctly, which degrades its performance. SVM is inherently a binary classifier; however, it can be extended to multiple classes using transformation on the input data.

In Li et al. (2008), a prediction model based on statistical theory is proposed which uses SVM. The model is analyzed on rural frontage road data from Texas. The SVM results are compared with those of a negative binomial (NB) regression model, and SVM predicts with better accuracy than the NB model. A previous work of the authors was based on the backpropagation neural network (BPNN), and the results showed that SVM is much faster than BPNN. Likewise, the work in Lv et al. (2009) applies SVM to study the vehicle patterns that result in an accident and those that do not. This makes it a binary classification problem, for which SVM is well suited. The data used in Lv et al. (2009) represent a real-time traffic scenario generated with a traffic simulation software tool. The RBF kernel function is used with SVM in the experiments. The study uses six variables as candidate accident precursors. These variables are the mean and the standard deviation of traffic, headway speed, headway time, precursor values recorded for 5 min duration 50 min prior to an accident and for 5 min duration right before an accident, respectively. The study reports that, for hazardous condition prediction, SVM must be supplied with more than one traffic variable.

In order to evaluate real-time crash prediction the study in Yu and Abdel-Aty (2013) uses SVM. For the purpose of selecting key attributes from the data, classification and regression models have been used. The data are divided into two categories: training and scoring. Two crash datasets consisting of 265 crashes and 1017 non-crash instances from a 15 mile mountainous freeway are used for the experiment. A number of kernels have been evaluated and RBF is found to be performing better as compared to the rest.

2.8 Other techniques

Image processing analyzes and manipulates data by taking images or video as input and applying operations such as compression, segmentation, and thinning. An Android-based application, CarSafe, is reported in You et al. (2012) for driver safety; it uses the dual cameras and other embedded sensors of a smartphone and fuses all the collected information. The front camera monitors the driver’s head pose and other activity to alert the driver in case of drowsiness or fatigue, while the back camera monitors the distance between vehicles and lane change actions to warn the driver if the vehicle is too close to another vehicle. The whole process is performed in four steps: camera switching, frame dispatching, image preprocessing and driver status implication (You et al. 2012).

Model predictive control (MPC) uses a model of the behavior of a dynamic system to control a process (García et al. 1989). MPC predicts the change in the dependent variables caused by changes in the independent variables using a dynamic process model. It is mostly used in industry for process control, to predict and control the effect of changes in inputs. A key advantage of MPC is that it optimizes the current time step while taking future events into account. In Rygula (2009) and Murphey et al. (2009), driver style identification was analyzed using speed graphs and jerk analysis within a specific time window. Observation of the speed graphs shows that each driver has a unique speed graph, which can be used as a behavioral biometric; deeper analysis can also be used to investigate the driver’s psychophysical state (Rygula 2009).

Multiple linear regression (MLR) is a statistical technique used to find the relationship between a dependent variable and several independent variables. It uses the least squares method to minimize the error between the observed and predicted values. It takes a group of random variables as input and finds the mathematical relationship between them as a linear function that approximates all data points. MLR is used, for example, in dendroclimatology for developing models to reconstruct climate variables from tree-ring series. MLR is poorly suited to time series problems because, in time series data, most of the observations depend on each other. Linear prediction is used mostly for speech processing and analysis of speech signals. It is a linear mathematical operation that estimates the future value of a time signal from previously sampled values. MLR is used in Abdel-Aty and Radwan (2000) for modeling traffic accident occurrence.
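The least squares fit that underlies MLR can be sketched in a few lines of plain Python by solving the normal equations. This is only an illustration under stated assumptions: the function name `fit_mlr` and the numbers in the example are made up, and the sketch does not reproduce the model of Abdel-Aty and Radwan (2000).

```python
def fit_mlr(X, y):
    """Ordinary least squares fit; returns [intercept, b1, ..., bk]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y)
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("predictors are collinear")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [M[i][k] for i in range(k)]


# Made-up data generated from y = 2 + 3*x1 - 1*x2 (no noise).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [3, 7, 7, 11, 12]
coefficients = fit_mlr(X, y)
```

Because the example data are noise-free, the fitted coefficients recover the generating intercept and slopes up to floating point error; with real observations the fit would only minimize the squared residuals.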

Driver’s style (DS) classification (Murphey et al. 2009) is an algorithm proposed for online driver style identification. It calculates the jerk within a specific time window and uses the average jerk for identification of a driver. It also predicts the road type and the traffic state, i.e., whether it is congested or not. Experimental results in Murphey et al. (2009) show that DS classification performs better than acceleration-based classification. Halim et al. (2016) address the problem of profiling drivers based on their driving features only. A dataset is recorded using 50 subjects and is then profiled using clustering techniques; k-means, fuzzy c-means and model-based clustering are used. Results in Halim et al. (2016) show that average speed, maximum speed, number of times brakes were applied, and number of times the horn was used provide information about drivers’ driving behavior that is useful for clustering. The work later trains multiple classifiers for the prediction of drivers’ profiles. Canale and Stefano (2002) present an analysis to decide whether it is possible to determine a driver’s behavior. The work also investigates which signals are suitable for identifying the driver’s style and which parameters can be used to describe it. An assignment procedure is defined to classify a driver’s behavior within the stop and go task. A system based on dynamic time warping (DTW) and smartphone sensor fusion is presented in Johnson and Trivedi (2011) to classify a driver into two categories: non-aggressive and aggressive. The system recognizes and records various actions without external processing. The work in Ly et al. (2013) explores the possibility of using the vehicle’s inertial sensors from the CAN bus to build a driver’s profile. This helps in reducing dangerous car maneuvers by providing appropriate feedback. Results show that braking and turning events contribute more towards characterizing an individual. Shi et al. (2015) present an approach to quantitatively evaluate driving styles by normalizing driving behavior based on personalized driver modeling. A personalized driver model is established for each driver and evaluated by a neural network. The driver model is then used to simulate a standard driving cycle test. An aggressiveness index is also proposed, based on energy spectral density analysis of the normalized behavior. The index is useful for identifying abnormal driving behaviors.
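The windowed jerk feature mentioned above for DS classification can be illustrated as follows. This is a hedged sketch, not the algorithm of Murphey et al. (2009): the function name `average_jerk`, the use of the mean absolute jerk, and the sample values are assumptions.

```python
def average_jerk(accel, dt, window):
    """Mean absolute jerk over the most recent `window` acceleration samples.

    accel:  acceleration samples in m/s^2, equally spaced in time
    dt:     sampling interval in seconds
    window: number of most recent samples to consider
    """
    recent = accel[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two samples to estimate jerk")
    # Jerk is the time derivative of acceleration (finite difference).
    jerks = [(b - a) / dt for a, b in zip(recent, recent[1:])]
    return sum(abs(j) for j in jerks) / len(jerks)


# Hypothetical acceleration traces for a calm and an abrupt driver.
calm = average_jerk([0.0, 0.1, 0.2, 0.3], dt=1.0, window=4)
abrupt = average_jerk([0.0, 2.0, -1.0, 1.5], dt=0.5, window=4)
```

A larger average jerk indicates more abrupt changes in acceleration within the window, which is the kind of signal a style classifier could threshold or feed into a learned model.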

Transportation professionals are paying increasing attention to the development of real-time crash prediction models for freeways. In Veeraraghavan et al. (2005) physical data such as speed and turn signal are used for experiments. To measure the posture of the driver, a pressure sensitive chair and an ultrasonic six-degree-of-freedom head tracking system were used, and the driving videos were manually labeled according to a list of situations. The main objective is to predict the time series of human-recognized situations using the various sensory inputs. The Sandia cognitive framework (SCF) (Dixon et al. 2005) is used to integrate the information on vehicle state and driver posture, yielding a nonlinear dynamical pattern recognition system. To minimize the error between the ground truth and the parameters estimated using SCF, the authors in Dixon et al. (2005) use a gradient descent approach and GAs. With gradient descent, the rules are updated iteratively to minimize the error, and the procedure is guaranteed to converge and terminate at a local minimum. For the GA, the DAKOTA (design analysis kit for optimization and terascale applications) optimization package (Bruns et al. 2005) is used.

It is a common observation that every driver operates the vehicle in his/her own style; some drive slowly while others prefer aggressive driving. Vehicle state normally depends on the speed, steering angle, acceleration, brake usage frequency, inter-car distance and the driver’s mental state. The authors in Rygula (2009) analyze the driver speed profile to identify the driver style. The driver speed profile can also be used to obtain other information, such as the psycho-physiological state, sleepiness and tiredness of the driver. In Rygula (2009), the speed profile is defined as the number of points at which the derivative of speed with respect to time changes its sign. Equations (3) and (4) are used in Rygula (2009) to estimate the maxima and minima of the speed function, respectively, as

$$\begin{aligned} \frac{dV}{dt}= & {} 0\quad \hbox {and}\quad \frac{d^{2}V}{dt^{2}}<0 \end{aligned}$$
(3)
$$\begin{aligned} \frac{dV}{dt}= & {} 0\quad \hbox {and}\quad \frac{d^{2}V}{dt^{2}}>0 \end{aligned}$$
(4)

where V is the velocity, t is the time, and \(\frac{dV}{dt}\) is the change in velocity with respect to time, i.e., the acceleration. Equation (3) identifies the points at which the speed peaks with respect to time, and Equation (4) the points at which it reaches a local minimum. The number of such points is called the intensity of change of the speed profile. Equations (3) and (4) thus estimate the extrema of the speed function, taken as the points at which the acceleration changes its sign.
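On sampled speed data, the conditions in Equations (3) and (4) can be approximated by looking for sign changes in the discrete acceleration. The sketch below is an assumption-laden illustration rather than the procedure of Rygula (2009): the function name `speed_extrema` and the sample values are hypothetical, and flat segments (zero difference) are simply ignored.

```python
def speed_extrema(speeds):
    """Return (maxima, minima) as lists of sample indices.

    A maximum is a sample where the discrete acceleration changes from
    positive to negative; a minimum is where it changes from negative
    to positive.
    """
    maxima, minima = [], []
    for i in range(1, len(speeds) - 1):
        before = speeds[i] - speeds[i - 1]   # acceleration before sample i
        after = speeds[i + 1] - speeds[i]    # acceleration after sample i
        if before > 0 and after < 0:
            maxima.append(i)
        elif before < 0 and after > 0:
            minima.append(i)
    return maxima, minima


# Hypothetical speed samples in km/h at equal time steps.
speeds = [0, 10, 20, 15, 10, 18, 25, 22]
maxima, minima = speed_extrema(speeds)
intensity = len(maxima) + len(minima)  # number of extrema in the profile
```

Counting the extrema per unit of time or distance gives a simple scalar that can be compared across drivers.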

Table 1 lists the aforementioned related studies on accident prediction models that are based on different AI techniques. These techniques can be divided into three groups, which are supervised or unsupervised learning, statistics-based learning, and other techniques. Table 1 shows that ANN, SVM, GAs and statistics-based techniques are frequently used AI techniques for the prediction of an accident/unsafe driving patterns analysis during the past 10 years.

Various AI-based techniques for accident prediction and analysis of unsafe driving patterns have been covered in this section. To provide a comparative view, Table 2 lists eight important features of these techniques: (1) learning technique/domain, (2) whether or not the technique is population-based, (3) best reported accuracy of the technique for accident prediction, (4) whether or not the technique is suitable for real-time deployment, (5) whether or not the technique supports multi-format data, (6) state-of-the-art work using the technique, (7) key strength and (8) notable weakness of the technique. The values of these eight features are listed with accident prediction and the AI domain in view, and may differ for other problem domains. Based on Table 2, the best reported accuracy, 95.9307 %, is that of PNN and decision trees in Tambouratzis et al. (2010). However, this accuracy depends upon the type of data, the amount of data and the features taken into consideration. A comparison with a new or different technique is only valid if that technique is tested on the same sample as the baseline approach.

Table 1 Different AI techniques used in literature for driving safety and vehicle crash prediction
Table 2 Comparison of various AI techniques for accident prediction

3 Datasets and simulators

An important issue in crash prediction systems and road safety research is the availability of datasets. Studies in the literature have used diverse datasets for unsafe driving analysis and accident prediction. Some of these are publicly available over the web, such as the CIAR (Wahab et al. 2009), STISIM (Jabon et al. 2011) and SPD (Lafferty et al. 2001) datasets, while other studies have collected data from local departments in the countries concerned. Table 3 lists the datasets used in related studies, including their size, instances and number of attributes.

In Dixon et al. (2005), data were collected from five human subjects who were instructed to drive on urban roads for about 200 km, and 24 h of data were sampled at a rate of 4 Hz. For classification using supervised learning, the following eight situations were chosen as ground truth: (1) leaving an intersection, (2) entering an on-ramp, (3) high speed roadway, (4) being overtaken, (5) high acceleration, (6) approaching a slow vehicle, (7) preparing to change lanes, and (8) changing lanes. Similarly, the approach proposed in Zhou et al. (2007) is evaluated using the car driving simulator STISIM. The work in Zhou et al. (2007) uses nine channels for driving performance: throttle, brake, steering wheel, position, speed, acceleration, lane position, distance to same line, and distance to incoming vehicle. In Imkamon et al. (2008) a Hitachi H48c accelerometer is used to measure acceleration, where a high value of acceleration in the X-axis and Y-axis indicates heavy jolts to the passenger seat, and a high value in the Z-axis indicates the jumps and road roughness experienced by the vehicle. By ignoring the polarity of the axes, the work in Imkamon et al. (2008) considers only the magnitude of the forces imposed on a passenger, not their direction. The DC offset is eliminated by passing the signal through a high pass filter. All driving events of a moving car can be measured by the change in acceleration.
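The two preprocessing steps described for the accelerometer data, removing the DC offset with a high pass filter and discarding the polarity of the axes, can be sketched as follows. This is a generic illustration, not the pipeline of Imkamon et al. (2008): the first-order filter, its coefficient `alpha`, the convention of starting the output at zero, and the function names are assumptions.

```python
import math


def high_pass(samples, alpha=0.9):
    """First-order high pass filter: y[i] = alpha * (y[i-1] + x[i] - x[i-1]).

    The output starts at 0.0, so a constant (DC) input yields an all-zero output.
    """
    if not samples:
        return []
    out = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        out.append(alpha * (out[-1] + cur - prev))
    return out


def jolt_magnitude(ax, ay):
    """Combine X and Y acceleration into a direction-free magnitude."""
    return [math.hypot(x, y) for x, y in zip(ax, ay)]


# Hypothetical readings: a constant offset, then a step change.
flat = high_pass([1.0] * 50)
step = high_pass([0.0, 0.0, 1.0, 1.0, 1.0])
magnitudes = jolt_magnitude([3.0, -3.0], [4.0, -4.0])
```

The constant input is suppressed entirely, the step produces a response that decays over time, and opposite-signed readings map to the same magnitude.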

In Xu et al. (2012b), for the experiments with the GP model, data were obtained from a 21 mile section of the I-880N freeway in the United States using 40 loop detectors and 3 weather stations. The crash data for the selected freeway were obtained from the statewide integrated traffic records system (SWITRS), and traffic data were obtained from the highway performance measurement system (PeMS) maintained by Caltrans. Traffic data were collected from the nearest upstream and downstream stations for each crash location. For speed graph analysis, the tachograph disk is easily accessible and provides 24 h of continuous driver data by recording driving parameters, which include the speed graph and the road graph. A detailed description of tachograph data is available in Mitas (2007) and Ryguła and Mitas (2007). On a tachograph disk, the speed graph is generated by the up and down movement of the scriber. Tachograph disk information can be converted into digital form by finding its extreme points and converting polar coordinates to the value of instantaneous speed and the proper position on the time axis (Rygula 2009). Table 3 shows the datasets used in the literature. The format of the data and the recorded attributes depend upon the type of problem being solved.

Using a simulator to acquire driving data is often a good choice, since it reduces the time consumed and avoids injuries during experiments. A number of simulators are available for this purpose. STISIM is a programmable driving simulator with support for 3D graphics. It comes with a number of ready-to-run simulations and is capable of recording user-specific driving features. SCANeR is a driving simulation engine for training and safety awareness that covers five types of vehicles: cars, trucks, buses, armored vehicles and emergency vehicles. SCANeR is also supported by a complete development kit and 3D graphics. The Carnetsoft driving simulator is a low cost simulator aimed at the training and education of drivers; it also has a software module for studying driver behavior and collecting related data. VANET is an open source driving simulator with a key focus on security; it supports simulation of real traffic scenarios by providing the ability to import maps. VANET also supports micro-simulation, where each vehicle can be simulated individually and may take autonomous decisions. PTV Vissim is a specialized software tool developed to analyze public transport and the effect of signaling. It also provides an application programming interface (API) for integrating external applications. Aimsun is another traffic modeling software package, with capabilities for mesoscopic, microscopic and hybrid simulation. Quadstone Paramics is traffic simulation software used for planning and design of transportation systems; it also supports incorporating pedestrians into the study of the transportation infrastructure. SUMO (Simulation of Urban Mobility) is a free simulator developed for evaluating infrastructure and policy changes. TraNS is a simulator for vehicular ad hoc networks to study the mobility of vehicles. TraNS provides a visual interface integrating SUMO (a traffic network simulator) and ns2. OpenEnergySim is an online simulator used to visualize microscopic traffic and the CO\(_2\) emissions of vehicles. iTETRIS provides a flexible simulation platform to analyze traffic at city level. Table 4 shows the key features of the aforementioned simulators.

Table 3 List of datasets used in literature
Table 4 Simulators for vehicle and traffic analysis

4 Discussion

This paper surveys studies based on AI techniques for accident prediction and identification of unsafe driving states. However, many open issues remain that either have not been comprehensively examined in the literature or stand in the way of an efficient and accurate accident prediction system. These open issues provide opportunities for further research. This section lists the shortcomings of previous work and discusses how they can be investigated further. The work in Dixon et al. (2005) used a labeled dataset for classification of driving situations; however, it does not clearly state how many physical states and how many video sensor inputs were used in the experiment. Stating this information explicitly would help others replicate the results and make comparisons. The study in Dixon et al. (2005) used human subjects to create the ground truth, which is a tedious task given that each subject had to drive for around 200 km. Although the results are promising, a modern tool such as the STISIM car driving simulator could be used in future for recording the dataset. The authors in Dixon et al. (2005) employed a GA, which is an evolutionary algorithm and needs more time to converge. Their results show that the GA has a much higher computation time than the gradient descent approach, which makes it less efficient and less suitable for a real-time system. Since a GA is a search-based heuristic, multi-population GAs can be employed in future to obtain better and quicker results. A multi-population GA optimizes multiple candidate solutions simultaneously, approaching the global best solution through local search within each population. The efficiency of multi-population GAs can be enhanced further using parallel GAs (Abu-Lebdeh et al. 2014), where each population of the GA is assigned to a separate processor. For the purpose of identifying unsafe driving patterns or predicting an accident, GAs can be utilized to train a classifier using both labeled and unlabeled examples. For labeled examples, the GA can use the class labels from the training dataset to assign fitness to each candidate chromosome, whereas for unlabeled training examples a fitness function based on maximizing or minimizing a reward can be utilized. Although GAs can also be used directly to predict an accident or unsafe driving pattern, instead of being used to train a classifier, their offline use is recommended. The reason is the real-time performance requirement of the prediction system and the inherently slow nature of a GA. GP and other evolutionary search heuristics share this performance issue with GAs. The acronyms and notations used in this review are listed in Table 5.

Table 5 Acronyms and notations used in this review (unless stated otherwise)

In view of the detailed study in Zhou et al. (2007), CRF seems to be a better approach than HMM. HMM assumes that the observations are independent given the hidden variables, and this assumption is too strong in many cases. Since CRF labels the data using a conditional probability distribution over a sequence of labels, it avoids this assumption and performs better than HMM. In addition to the study reported in Zhou et al. (2007), CRF has also performed better than HMM in other studies related to bioinformatics and pattern recognition. A dynamic programming based approach can be used to compute maximum-likelihood values in CRF. Traditional CRFs are limited in representing nonlinear dependencies in each frame (Maaten et al. 2011). To overcome this, hidden-unit CRFs seem potentially useful, as the hidden units try to learn the latent distributed structures in the underlying data for better classification accuracy. HMM, on the other hand, can still be useful: provided it is well tuned, it offers better compression than a simple Markov model, which allows more sequences to be found.

Investigating the prediction performance and accuracy of various ML techniques provides a deeper understanding of the current state of prediction methods, enabling future research to take the next step. Although different datasets are used in the previous studies, it may be worth examining the relationship between the selected datasets, the prediction accuracy, and the number of features used in those experiments. In the literature, a variety of measures are used to assess prediction performance. This survey uses the accuracy rate to compare previous approaches, for two key reasons. Firstly, it is reported in nearly all related studies; secondly, for accident prediction/unsafe driving pattern identification it is of extreme importance, as failure of the prediction approach may be fatal. The results of Dixon et al. (2005) show that the gradient descent algorithm predicts the driving situations correctly over 95 % of the time, while the GA predicts with an accuracy of 85 % on the test data. Additionally, gradient descent converges far faster than the GA: for the evaluation and training on the 18.3 h driving dataset, the gradient descent approach requires 1 h and the genetic algorithm 1611 h (Dixon et al. 2005). However, the results and their comparison depend upon the problem being addressed and the dataset used, in addition to the particular AI technique employed. The comparison of the prediction performance of the GP and binary logit models in Xu et al. (2012b) shows that the GP model increases the accuracy of crash prediction by an average of 8.2 % under uncongested traffic conditions and by 4.9 % under congested conditions. The fuzzy logic system proposed in Imkamon et al. (2008) is tested against ground truth based on a questionnaire prepared by three passengers. The system is tested in four rounds, and the average error of each round is 0.225.

In Wahab et al. (2009) the authors compare the performance of GMSS, MLP, ANFIS (adaptive network-based fuzzy inference system) and EFuNN (evolving fuzzy neural network) using different combinations of the driver static features, dynamic accelerator and brake pedal pressure. The GMM-based extracted features outperform the wavelet coefficients. Results in Wahab et al. (2009) indicate that EFuNN has a 100 % gender identification rate and GMSS has a low error for cross validation of drivers. The combination of PNN and DT proposed in Tambouratzis et al. (2010) shows better accuracy than naive Bayes and BNN. In the work presented in Zhou et al. (2007), Ning et al. (2009) and Ning et al. (2008), the authors handle the labeling issue and the training on labeled and unlabeled data by using CRF and TD learning, respectively. Experimental results show the effectiveness of these techniques. Among the other statistical techniques, the analysis of different drivers’ speed graphs (Rygula 2009) shows that each driver has a unique speed graph and that long route driving causes an increase in acceleration due to the tiredness of the driver. The DS classification algorithm (Murphey et al. 2009) effectively classifies the driver style in experiments conducted in the PSAT simulation environment. For driver identification based on the speed profile in Rygula (2009), performance measurement experiments were carried out by analyzing 10 tachograph disks of 2 drivers. Results in Rygula (2009) show that both drivers have a significant change in their speed profile characteristic. The “Appendix” lists the complete details of the literature covered in this paper as a comprehensive table.

5 Open issues

Based on a careful examination of the literature and on the basis of the analysis presented in Table 2 and “Appendix”, this section lists the current open issues for future research in accident prediction and detection of unsafe driving patterns. In general, the open problems in the subject domain are as follows:

5.1 Performance markers

Performance markers have been developed for many problem domains, where they are used to compare various techniques. For instance, clustering is a subjective task, and its performance markers include the Davies-Bouldin index (DBI), the Dunn index (DI) and the Silhouette coefficient (SC). When DBI is used as the performance marker, the best clustering is the one with the lowest index value. For DI, the best has the highest index value, and for SC the value closest to 1 is considered the best. However, most of the related studies in the domain of accident prediction and unsafe driving pattern analysis have used different datasets for method evaluation; thus it is not feasible to compare these models directly. Performance markers need to be developed for these studies, and they should take into consideration the size of the data, the type of the data, the data format and the performance of the prediction technique. This will not only help in comparing results, but will also accelerate the pace of research on accident prediction, as comparison with state-of-the-art approaches will become easier and more logical. As an example from the domain of automated generation of game contents, a set of entertainment metrics is proposed in Halim et al. (2014) to measure the entertainment that the contents of a game may carry. The same metrics can be used to compare different techniques for the generation of games. A performance marker like that of Halim et al. (2014) is also required for research on traffic accident prediction and road safety approaches. Once such a marker is introduced, it will also help in conducting studies that further increase the number of attributes in the dataset. Moreover, the effect of multiple-channel data on the performance of accident prediction can be studied.
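To make the notion of a performance marker concrete, the Davies-Bouldin index mentioned above can be computed in a few lines. The sketch below follows the standard definition (mean within-cluster distance to the centroid, divided by the distance between centroids, maximized over the other clusters and averaged); the function names and the toy points are assumptions made for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """Davies-Bouldin index of a clustering; lower values are better.

    clusters: list of clusters, each a non-empty list of points.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance of the members of a cluster to its centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


# Toy examples: two compact, distant clusters versus two spread, close ones.
tight = davies_bouldin([[(0, 0), (0, 1)], [(10, 0), (10, 1)]])
loose = davies_bouldin([[(0, 0), (0, 4)], [(3, 0), (3, 4)]])
```

The compact, well separated clustering receives the lower index, in line with the rule that the lowest DBI value is the best.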

5.2 Benchmarking datasets

Benchmark datasets are missing in the accident prediction and unsafe driving pattern analysis domain. Such datasets need to be developed so that algorithms can be tested on them. Currently the UCI machine learning repository is the major source of benchmark datasets, from which researchers can download datasets from a variety of domains to test the results of a proposed approach against the ground truth. Unfortunately, no dataset consisting of vehicle data for accident prediction has yet been submitted as a benchmark. Since an accident can be predicted from a variety of data, including vehicle information, weather data, road/traffic conditions, and driver demographics, datasets need to be created for all these perspectives. Although a benchmark tool for recording road/traffic data, the STISIM car driving simulator, is available and can be used to record datasets, its cost is high; benchmark datasets can therefore play a vital role in promoting research in road safety using AI. To start with, the data sources listed in Table 3 can be combined at a central repository. A very rich, publicly available source of data, the strategic highway research program (SHRP 2), can be useful for benchmarking. SHRP 2 is authorized by the US Congress to address key issues in highway systems. The four key research areas covered by SHRP 2 are: (a) understanding the interaction among the various factors involved in highway crashes (driver, vehicle, and infrastructure), (b) renewal, (c) reliability and (d) capacity.

5.3 Driver identification

Identification of the driver is also an interesting research area that needs to be addressed in future (Wu and Ye 2009a). An individual can be identified based upon his/her driving patterns. Such approaches can be utilized in protecting vehicles from theft and for parental control. Many existing techniques use biometrics, image processing and voice signals (Wu and Ye 2009b) to identify a driver. However, all of these techniques are visible to the driver and thus implicitly alert an unauthorized driver who is driving the vehicle without permission or stealing it. A driver identification system therefore needs to be developed that takes into account the driving patterns of the individual driving the vehicle. In addition to other advantages, from the viewpoint of road safety, driver identification can help avoid crashes due to rash driving of a stolen car and can also be used to indicate drunk driving. Such a system, when integrated with the vehicle security features, can prove very useful. An interesting work in this direction is Miyajima et al. (2006), where spectral analysis is used to identify driving behaviors from signals such as gas and brake pedal operation while accelerating or decelerating. The algorithm learning based neural network, which integrates feature selection and classification, is a promising approach introduced in Yoon et al. (2013); it could be applied to driver identification for better accuracy.

5.4 Clustering drivers and/or their features

Clustering is an unsupervised learning technique that groups similar objects using a distance function. Clustering can be applied to a set of drivers based on their driving features or on their demographics. The resulting clusters can be useful in many domains, such as customizing a car design for a specific group of people, advertisement and road safety analysis. Since clustering is an unsupervised approach, a descriptive analysis of the formed clusters will be required, which may involve a domain expert. The clusters can also be useful in linking a group of drivers with a particular physiological profile, which can further be used in a recommender system for a specific vehicle type, driving care or assistive technology suited to that cluster. The results of clustering are highly dependent on the size, type and features of the dataset, and they vary across different types of clustering algorithms. As a proposed example, the driving features that may be taken into consideration for clustering include: number of times footbrakes are applied, use of horns, maximum speed achieved while driving, driver’s average speed, ratio of left turns to use of left indicators, ratio of right turns to use of right indicators, maximum gear used by the driver, driver’s average gear and number of times a vehicle gets into reverse gear. However, the accuracy of the recorded data is influenced by the data recording tools used; this can be overcome by using a benchmark simulator like STISIM. Some work in this direction can be seen in Ellison et al. (2012), Kalsoom and Halim (2013) and Halim et al. (2016).
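As a minimal illustration of clustering drivers by their driving features, the sketch below runs a plain k-means on two of the features listed above (average speed and number of brake applications). It is not the procedure of any cited study: the data are made up, the first k points are used as initial centroids purely to keep the example deterministic, and the features are left unscaled, whereas a real analysis would normally normalize them first.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive deterministic initialisation
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


# Hypothetical drivers: (average speed in km/h, brake applications per trip).
drivers = [(40, 5), (90, 30), (42, 6), (95, 28), (38, 4), (88, 33)]
labels, centroids = kmeans(drivers, k=2)
```

Here the slow, light-braking drivers and the fast, heavy-braking drivers end up in separate clusters; interpreting what each cluster means would still require the descriptive analysis discussed above.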

5.5 Additional considerations

Accident prediction, a major focus of this paper, is an important aspect of road safety. Algorithms still need to be built that can predict an accident with better accuracy and less prediction time. Real-time analysis of accident prediction techniques is worth considering. Currently, most studies use simulation data or simulators for testing their approaches. However, many factors, such as environment, road conditions, weather and the number of passengers in the vehicle, can influence the occurrence of an accident. These factors can cause a gap between the expected and actual performance of an accident prediction technique. The same accident prediction systems can also serve to generate early warnings of unsafe driving states.

Accident prediction techniques mostly use a classifier; the same classifier can also be used to model an individual driver’s features. The modeling of a driver thus becomes a controller training problem. A trained controller can be used in fully autonomous or semi-autonomous vehicles (Göhring et al. 2013). Accident prediction systems can also be integrated with post-accident systems and services to trigger rescue and emergency activities. False alarm generation is an important issue for these studies and can be minimized by increasing the accuracy of the prediction technique. The secondary problems in post-accident systems include finding the shortest path, finding the path that consumes the least fuel and finding the nearest hospital (Chen et al. 2012; Islam and Rahman 2014); these problems can make use of the GPS of either the vehicle or the driver’s cell phone (Mathkour 2011).
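
As a rough illustration of the nearest-hospital problem, the sketch below ranks candidate hospitals by great-circle (haversine) distance from a GPS fix. It is a simplification and not the approach of the cited works: straight-line distance ignores the road network, so a deployed system would more plausibly use road-based shortest path or minimum-fuel routing. The coordinates, hospital names and function names are invented for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_hospital(crash_site, hospitals):
    """Return the (name, lat, lon) entry closest to the crash site (lat, lon)."""
    lat, lon = crash_site
    return min(hospitals, key=lambda h: haversine_km(lat, lon, h[1], h[2]))

# Hypothetical GPS fix reported by the vehicle or the driver's phone.
crash_site = (33.70, 73.05)

# Hypothetical hospital locations as (name, latitude, longitude).
hospitals = [
    ("Hospital A", 33.71, 73.06),
    ("Hospital B", 33.60, 73.00),
    ("Hospital C", 34.00, 73.50),
]

closest = nearest_hospital(crash_site, hospitals)
print(closest[0], round(haversine_km(*crash_site, closest[1], closest[2]), 2), "km")
```

The same ranking function could serve as a first filter before a road-network search is run on the few closest candidates.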

Predicting a driver's suitability for driving is also an open problem. It can benefit from clustering approaches that group drivers into homogeneous clusters, and it can also use prediction systems that map a driver into one of a set of predefined classes. Such systems can be used by traffic police departments when issuing or renewing driving licenses. From the point of view of human machine interaction (HMI), appropriate interface development is vital for accident prediction and unsafe driving pattern analysis systems. The driver needs to concentrate on driving, and the accident prediction and avoidance systems should only assist the driver toward a safe journey. The interface through which messages are conveyed to the driver needs to be explored from the HMI perspective (Peschel and Murphy 2013), since visual cues may distract the driver. To resolve this issue, sonification of the signals (Halim et al. 2016) can be explored. Other interesting directions for road and passenger safety include anger management, speed management and stress management.

6 Conclusion

This paper presented a comprehensive survey of the artificial intelligence techniques used in the literature to predict accidents and to study unsafe driving patterns. The literature covered is from the past 10 years, i.e., 2004–2014, and the survey is motivated by the lack of such a survey in the literature. The review reveals that artificial neural networks, support vector machines and genetic algorithms have been the most frequently used artificial intelligence techniques for accident prediction and unsafe driving pattern analysis during the last ten years. The best reported accuracy for predicting an accident, 95.9307%, was achieved by probabilistic neural networks and decision trees. However, this accuracy depends on the type of data, the amount of data and the features taken into consideration by the prediction technique. The paper has also listed the available datasets and simulators that can be used to conduct research on accident prediction and unsafe driving patterns. Accident prediction can be performed using rich data formats such as images and videos, or using data formats that consume less memory and processing power, such as vehicle information read from the engine. Richer data formats may require expensive recording devices to be installed in the vehicle, whereas engine-related information can be captured using the on-board diagnostics (OBD) protocol without any additional expense. Nevertheless, irrespective of the data format and the data capturing mechanism, the accuracy of accident prediction is the most critical aspect of road safety to which artificial intelligence has already contributed, and it needs to be improved further.

The open issues in the subject domain have also been discussed to provide directions for future work. Setting performance markers is an important aspect in this regard; addressing it in the future would support various accident prediction projects and make it possible to compare results and rank techniques. Benchmark datasets are also needed to help the research community test new accident prediction algorithms using simulations. Driver identification and driver profiling form an interesting domain that can be explored further. From the production point of view, solutions for accident prediction and unsafe driving pattern identification also need to take the financial aspect into account, so that vehicles with such intelligent features are affordable for most individuals.