Abstract
Objective: To analyze road crashes in Cartagena (Colombia) and the factors associated with collisions and their severity. The aim is to establish a set of rules for defining countermeasures to improve road safety. Methods: Data mining and machine learning techniques were applied to 7,894 traffic accidents from 2016 to 2017. Severity was classified as low (84%) or high (16%). Five classification algorithms were applied with the WEKA software (Waikato Environment for Knowledge Analysis) to predict accident severity: Decision Tree (DT-J48), Rule Induction (PART), Support Vector Machines (SVMs), Naïve Bayes (NB), and Multilayer Perceptron (MLP). The performance of each algorithm was evaluated using 10-fold cross-validation, and decision rules were defined from the results of the different methods. Results: The methods applied produce consistent and similar overall results in precision, accuracy, recall, and area under the ROC curve. Conclusions: Twelve decision rules were defined from the methods applied. The rules identify motorcyclists, cyclists, and pedestrians as the most vulnerable road users. Male and female motorcyclists aged 20–39 are prone to high-severity accidents. When neither a motorcycle nor a bicycle is involved in the accident, the probable severity is low.
1 Introduction
According to the World Health Organization (WHO) [1], approximately 1.4 million people died in traffic accidents in 2016, and it is estimated that more than 50 million people suffered severe injuries. In 2016, traffic accidents were the eighth leading cause of death overall and the leading cause of death for people aged 15 to 29. Vulnerable road users bear a large share of this burden: more than half of road deaths occur among motorcyclists (28%), pedestrians (23%), and cyclists (3%). Mortality rates in low-income countries are three times higher than in high-income countries; although these countries have only 1% of the world's motor vehicles, they account for 13% of deaths [1]. In 2016, Colombia had a rate of 18.5 fatalities per 100,000 inhabitants, close to the global average (18.2) and to the average for middle-income countries (18.8). Between 2012 and 2018, motorcyclists accounted for 50% of road traffic victims in Colombia, pedestrians for 24%, vehicle occupants for 17%, and cyclists for 5%.

The objective of this research was to analyze road crashes in Cartagena (Colombia) and the factors associated with collisions and their severity. Cartagena has 1.2 million inhabitants and more than 120,000 motor vehicles. In the last two years (2017–2018), Cartagena has remained among the capital cities with the most fatal accidents, and over the last 8 years it has been considered the fifth most dangerous city in Colombia for road safety, after Medellin, Cali, Bogota, and Barranquilla.
2 Method
The method relies on official records from the traffic control entities, to which data mining and machine learning techniques were applied. Five classification methods were used in WEKA: Decision Tree (DT-J48, WEKA's implementation of C4.5), Rule Induction (PART), Support Vector Machines (SVM), Naïve Bayes (NB), and Multilayer Perceptron (MLP). A decision tree builds a classification model in the form of a tree. Rule induction with PART is an iterative process that combines the divide-and-conquer strategy of decision trees with the separate-and-conquer strategy of rule learning. Naïve Bayes is a probabilistic classifier based on Bayes' theorem that assumes the attributes are conditionally independent given the class. A multilayer perceptron is a feed-forward artificial neural network. Support vector machines are learning algorithms for classification and regression analysis. Previous studies on road accident prediction have used regression models [2], neural networks [3], genetic algorithms [4], decision trees [5], Bayesian networks [6], SVMs [7], and combined methods. The aim is to establish a set of decision rules for defining countermeasures to improve road safety. The performance of each algorithm was evaluated using 10-fold cross-validation. The methodological process comprises four steps: (a) pre-processing of the accident dataset; (b) application of data mining techniques in WEKA; (c) analysis of results and metrics; and (d) definition of decision rules and analysis of associated factors.
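To make the evaluation protocol concrete, here is a minimal pure-Python sketch, not WEKA's implementation, of one of the five methods (a categorical Naïve Bayes with add-one smoothing) evaluated with stratified 10-fold cross-validation. It assumes every attribute is nominal, as the 27 categorical variables are; the names `stratified_folds`, `CategoricalNB`, and `cross_val_accuracy` and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Return a fold index (0..k-1) per instance, keeping class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    fold_of = [0] * len(labels)
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin; continuing the offset keeps fold sizes balanced.
        for j, i in enumerate(indices):
            fold_of[i] = (offset + j) % k
        offset += len(indices)
    return fold_of


class CategoricalNB:
    """Naive Bayes for nominal attributes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.n = len(y)
        self.class_counts = Counter(y)
        # Distinct observed values per attribute; one extra slot is reserved for unseen values.
        self.n_values = [len(set(column)) for column in zip(*X)]
        self.counts = Counter()  # (attribute, value, class) -> frequency
        for row, label in zip(X, y):
            for a, value in enumerate(row):
                self.counts[(a, value, label)] += 1
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            score = math.log(n_c / self.n)  # log prior P(class)
            for a, value in enumerate(row):
                # log P(value | class), smoothed so an unseen value never yields log(0)
                score += math.log((self.counts[(a, value, label)] + 1) / (n_c + self.n_values[a] + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_val_accuracy(X, y, k=10, seed=1):
    """Pooled accuracy of CategoricalNB over k stratified folds."""
    fold_of = stratified_folds(y, k, seed)
    correct = 0
    for f in range(k):
        train = [i for i in range(len(y)) if fold_of[i] != f]
        test = [i for i in range(len(y)) if fold_of[i] == f]
        model = CategoricalNB().fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(model.predict(X[i]) == y[i] for i in test)
    return correct / len(y)


# Toy data (invented): severity follows motorcycle involvement; the second attribute is noise.
X = [("yes" if i % 2 == 0 else "no", "a" if i % 3 == 0 else "b") for i in range(100)]
y = ["high" if row[0] == "yes" else "low" for row in X]
print(cross_val_accuracy(X, y))
```

Stratifying the folds matters with this dataset because high severity is a small minority class; an unstratified split could leave some folds with very few high-severity instances.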
2.1 Data Sources
The traffic accident records come from the Cartagena database for January 2016 to December 2017, compiled from reports by the Administrative Department of Traffic and Transport (DATT). In total, 10,053 traffic accidents were reported by traffic agents and police. The records include information on time, road users, gender, and age. For pre-processing, 27 categorical variables were defined (see Table 1) and grouped into four categories: (1) road actors involved in the crash, (2) individuals involved, (3) weather conditions and timing, and (4) accident characteristics. Injury severity was classified as low (material damage, or minor, non-incapacitating injury) or high (injured victims and fatalities).
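Deriving the binary target from the recorded injury level amounts to a lookup. The following is a minimal sketch under the assumption that each record carries a free-text injury level; the level strings and the name `severity_class` are hypothetical, since the DATT codes are not given in the paper.

```python
# Hypothetical raw injury levels; the real DATT codes are not given in the paper.
SEVERITY_CLASS = {
    "material damage": "low",
    "minor injury": "low",
    "non-incapacitating injury": "low",
    "injured victim": "high",
    "fatality": "high",
}


def severity_class(raw_level):
    """Map a recorded injury level to the binary low/high target."""
    key = raw_level.strip().lower()
    if key not in SEVERITY_CLASS:
        # Fail loudly rather than silently assigning a class to an unexpected code.
        raise ValueError(f"unmapped injury level: {raw_level!r}")
    return SEVERITY_CLASS[key]


print(severity_class("Fatality"))  # high
```

Raising on an unmapped level is a deliberate choice in this sketch: with 27 hand-defined categorical variables, an unexpected code is more likely a data-entry problem than a new severity level.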
3 Results
WEKA was used to clean the data by applying the RemoveMisclassified filter and removing duplicate instances, which reduced the dataset to 7,894 instances. Table 2 summarizes the variables used and their relationship to severity. Of the analyzed accidents, 84% had a low and 16% a high level of injury. In the descriptive analysis, most accidents occurred between cars (45%), followed by crashes between cars and heavy vehicles (28%) and between cars and motorcycles (14%). Accidents between private and public vehicles are prevalent (44%). Accidents involving motorcyclists (76%) and bicycles (88%) are more severe. The most frequent crash type is the collision (99%), and the most severe are run-over crashes (100%) and falls from the vehicle (93%).
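The duplicate-removal part of this cleaning step can be sketched in a few lines. This illustrates the idea only and is not WEKA's filter; RemoveMisclassified is not reproduced because it depends on a trained classifier's predictions, and the record layout (a tuple of attribute values ending with the class) and the toy values are hypothetical.

```python
from collections import Counter


def remove_duplicates(rows):
    """Keep the first occurrence of each identical record (attribute values + class)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def class_distribution(labels):
    """Share of each class label, e.g. to check the low/high split after cleaning."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Toy records (invented): (vehicles involved, weather, severity)
rows = [
    ("car-car", "dry", "low"),
    ("car-car", "dry", "low"),  # exact duplicate, dropped
    ("car-motorcycle", "rain", "high"),
    ("car-heavy", "dry", "low"),
]
clean = remove_duplicates(rows)
print(len(clean), class_distribution([r[-1] for r in clean]))
```

Because every attribute is categorical, two genuinely different crashes can share the same coded values, and exact-duplicate removal drops those as well; this is worth keeping in mind when interpreting the reduced instance count.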
Following the descriptive analysis, an inferential and correlational analysis of the candidate predictors of severity was carried out using Spearman's rank correlation and Friedman's ANOVA. Table 3 summarizes the results.
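Spearman's rank correlation can be computed as the Pearson correlation of average ranks, which remains valid with the heavy ties typical of categorical codes (e.g., 0/1 for motorcycle involvement versus 0/1 for high severity). A minimal pure-Python sketch; the function names and toy data are hypothetical, and Friedman's ANOVA is not sketched here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for p in range(i, j + 1):
            ranks[order[p]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (valid with ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return math.nan  # undefined when one variable is constant
    return cov / (sx * sy)


# Toy 0/1 codes (invented): motorcycle involved vs. high severity.
moto = [1, 1, 0, 0, 1, 0]
high = [1, 1, 0, 0, 0, 0]
print(spearman(moto, high))
```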
The variables NOT, YB, YHV, YC, GHV, DW, TD, and TCD show no statistically significant association with accident severity (p > 0.05). The variables with a statistically significant association (p < 0.05) are represented in the definition of the rules.
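The screening logic (keep a variable only if p < 0.05) can be illustrated as follows. As a swapped-in technique, not the Spearman/Friedman procedure used in the paper, this sketch applies Pearson's chi-square test of independence to the 2×2 table of a binary indicator against binary severity; with one degree of freedom the p-value equals erfc(√(χ²/2)). The names `chi2_2x2` and `screen_variables` and the toy data are hypothetical.

```python
import math


def chi2_2x2(x, y):
    """Pearson chi-square (no continuity correction) for two 0/1 lists; returns (chi2, p)."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a constant row or column gives no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def screen_variables(features, severity, alpha=0.05):
    """Keep the names of binary indicators whose association with severity has p < alpha."""
    return [name for name, column in features.items() if chi2_2x2(column, severity)[1] < alpha]


# Toy data (invented): 1 = high severity; "motorcycle" is associated, "weekend" is not.
severity = [1] * 25 + [0] * 5 + [1] * 5 + [0] * 65
features = {
    "motorcycle": [1] * 30 + [0] * 70,
    "weekend": [1, 0] * 50,
}
print(screen_variables(features, severity))
```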
After pre-processing, the selected data mining techniques were applied and parameterized (see Table 4) using 10-fold cross-validation. The results were compared using precision, accuracy, recall, and area under the ROC curve (see Table 5). The techniques show highly consistent and similar values across these prediction metrics.
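The four metrics can be sketched for a chosen positive class (here "high"); WEKA's output additionally reports per-class values and a class-weighted average, which this sketch does not reproduce. The AUC is computed as the probability that a randomly chosen positive instance scores above a randomly chosen negative one (an O(n²) pairwise version for clarity; rank-based versions scale better). Function names are hypothetical.

```python
def confusion_metrics(y_true, y_pred, positive="high"):
    """Precision and recall for the positive class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # 0.0 when nothing is predicted positive
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


def roc_auc(y_true, scores, positive="high"):
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Always predicting the majority class on an 84/16 split looks accurate
# but never identifies a high-severity crash.
y_true = ["low"] * 84 + ["high"] * 16
print(confusion_metrics(y_true, ["low"] * 100))  # accuracy 0.84, precision and recall 0.0
```

The majority-class example shows why accuracy alone is not sufficient with the 84/16 split reported for this dataset, and why recall and the ROC area are worth comparing alongside it.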
From the best results of each technique, 12 priority decision rules for road safety were defined (see Table 6).
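Rules of this kind can be applied as an ordered decision list, the form PART produces: the first rule whose condition holds assigns the class. A minimal sketch; the four conditions are hypothetical encodings paraphrasing findings described in the Discussion, not the exact rules in Table 6, and the attribute names and default class are assumptions.

```python
# Hypothetical encodings paraphrasing findings in the Discussion; NOT the exact Table 6 rules.
RULES = [
    ("run-over", lambda r: r["crash_type"] == "run-over", "high"),
    ("two motorcycles", lambda r: r["motorcycles"] >= 2, "high"),
    ("motorcyclist aged 20-39",
     lambda r: r["motorcycles"] >= 1 and r.get("rider_age") is not None and 20 <= r["rider_age"] <= 39,
     "high"),
    ("no motorcycle or bicycle", lambda r: r["motorcycles"] == 0 and r["bicycles"] == 0, "low"),
]
DEFAULT_CLASS = "low"  # assumed fallback: the majority class of the cleaned data


def classify(record, rules=RULES, default=DEFAULT_CLASS):
    """Apply an ordered decision list: the first rule whose condition holds decides the class."""
    for name, condition, label in rules:
        if condition(record):
            return label, name
    return default, None


print(classify({"crash_type": "run-over", "motorcycles": 0, "bicycles": 0}))
```

Order matters in a decision list: if the "no motorcycle or bicycle" rule came first, a pedestrian run over by a car would be classified as low severity.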
4 Discussion
The results show cyclists and motorcyclists as the most vulnerable road users. Male and female motorcyclists aged 20 to 39 are predictive of high-severity accidents. When no motorcycles or bicycles are involved in the accident, the probable severity is low. A collision between two motorcycles is also of high severity. If the crash is a run-over, the severity is high, and the victim can be inferred to be a pedestrian. Finally, if the crash involves vehicles with rigid protection systems, such as cars, buses, or carriages, the severity of the accident tends to be lower.
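A "probable" severity corresponds to a rule whose confidence is below 1. A minimal sketch of two common rule-quality measures, coverage (share of records matching the condition) and confidence (share of covered records with the predicted class); the records and attribute names are invented for illustration and do not come from the Cartagena dataset.

```python
def rule_quality(records, condition, label, target="severity"):
    """Return (coverage, confidence) of the rule `condition -> label` on the records."""
    covered = [r for r in records if condition(r)]
    if not covered:
        return 0.0, 0.0
    hits = sum(1 for r in covered if r[target] == label)
    return len(covered) / len(records), hits / len(covered)


def no_two_wheeler(r):
    """Hypothetical condition: neither a motorcycle nor a bicycle is involved."""
    return r["motorcycles"] == 0 and r["bicycles"] == 0


# Toy records (invented) for the rule "no motorcycle and no bicycle -> low".
records = [
    {"motorcycles": 0, "bicycles": 0, "severity": "low"},
    {"motorcycles": 0, "bicycles": 0, "severity": "low"},
    {"motorcycles": 0, "bicycles": 0, "severity": "high"},  # e.g. a pedestrian run-over
    {"motorcycles": 1, "bicycles": 0, "severity": "high"},
]
print(rule_quality(records, no_two_wheeler, "low"))
```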
This investigation analyzed the accident records of Cartagena (Colombia) using data mining techniques. The rules contribute to defining strategies to reduce the severity of accidents. The presence of vulnerable road users (motorcyclists, cyclists, and pedestrians) was predictive of accident severity, consistent with the results of [4, 8].
Rules 1, 2, 3, 4, 6, 8, and 12 involve motorcyclists, so more than half of the rules (7 of 12) relate to motorcycle users. Motorcycle use is growing significantly because of its role in mobility, transportation, sports, and other economic activities [9,10,11]. Some countermeasures suggested by the defined rules are outlined below; they draw on findings from the United States and the European Union on vulnerable users that can be replicated to reduce and eliminate road accidents. Policies, strategies, and countermeasures to improve motorcyclist road safety, including the recommendations of the WHO [12], are: promote road safety culture and education [13]; analyze and monitor accident reports [14]; run road safety campaigns aimed at the most vulnerable users, aged 15 to 30 [15]; promote the use of protective equipment [16]; restrict and penalize driving under the influence of drugs and alcohol [17]; control speed according to road type [18]; improve road quality or design exclusive lanes [19]; improve mechanical condition and maintenance [20]; improve motorcyclist visibility [21]; improve road safety conditions such as lighting and infrastructure [22]; prohibit carrying children on motorcycles and exposing them to risk [23]; penalize violations and risky behaviors [24]; and restrict the use of electronic devices while riding [25].
In addition, some countermeasures based on the rules are the following. Define road safety control plans according to the season: Rules 3 and 4 show that some months demand more road control than others. Define speed limits conditioned on rainfall intensity: Rules 6 and 12 show that moderate and heavy rain increase the likelihood of accidents. Reduce the interaction of motorcyclists with other road users, for example through exclusive lanes: Rules 1, 3, 4, 6, 8, and 12 describe a motorcycle interacting with another vehicle (e.g., trucks and cars). Define licensing schemes according to age: Rules 7 and 8 confirm that the motorcyclist's age influences the severity of the accident; in Europe, for example, there are age-dependent restrictions on circulation, speed, engine displacement, and violations. Rule 11 is closely related to the high severity of run-over crashes. To define effective countermeasures from these rules, new variables should be included, focusing on aspects such as accident location, intersections, road location, road type, signage, lighting, and time.
5 Conclusion
In this study, young adult motorcyclists were associated with high severity (injury or fatality). Global records for 2016 placed Colombia tenth worldwide, third in the region, and second in South America in motorcyclist accidents. In 2018, Colombia had 8.3 million registered motorcycles. Over the last 7 years (2012–2018), motorcyclists accounted for close to 50% of road users killed or injured. Future research analyzing the crash rates of motorcyclists in Colombia and their causes is essential to improve road safety. The main limitation of the study is underreporting by traffic control entities. Including more causal variables (for example, road state and conditions) could yield more strategic rules for road safety.
Applying data mining techniques to the accident records of Cartagena (Colombia) produced rules that support the definition of countermeasures focused on vulnerable users to reduce accident severity. Defining rules through data mining is more effective than a simple descriptive statistical analysis: because the information is analyzed correlationally, the results are easier to understand and apply. By contrast, techniques such as multivariate analysis or black-box models require additional steps to interpret the information.
References
[1] World Health Organization (WHO): Global status report on road safety 2018 (2018). https://apps.who.int/iris/bitstream/handle/10665/276462/9789241565684-eng.pdf?ua=1
[2] Savolainen, P., Mannering, F.: Probabilistic models of motorcyclists’ injury severities in single- and multi-vehicle crashes. Accid. Anal. Prev. 39(5), 955–963 (2007)
[3] Abdelwahab, H., Abdel-Aty, M.: Development of artificial neural network models to predict driver injury severity in traffic accidents at signalized intersections. Transp. Res. Rec.: J. Transp. Res. Board 1746, 6–13 (2001)
[4] Hashmienejad, S.H.-A., Hasheminejad, S.M.H.: Traffic accident severity prediction using a novel multi-objective genetic algorithm. Int. J. Crashworthiness 22(4), 425–440 (2017)
[5] Sohn, S., Shin, H.: Data mining for road traffic accident type classification. Ergonomics 44, 107–117 (2001)
[6] Huang, H., Abdel-Aty, M.: Multilevel data and Bayesian analysis in traffic safety. Accid. Anal. Prev. 42(6), 1556–1565 (2010)
[7] Li, Z., Liu, P., Wang, W., Xu, C.: Using support vector machine models for crash injury severity analysis. Accid. Anal. Prev. 45, 478–486 (2012)
[8] Delen, D., Tomak, L., Topuz, K., Eryarsoy, E.: Investigating injury severity risk factors in automobile crashes with predictive analytics and sensitivity analysis methods. J. Transp. Health 4, 118–131 (2017)
[9] Balasubramanian, V., Jagannath, M.: Detecting motorcycle rider local physical fatigue and discomfort using surface electromyography and seat interface pressure. Transp. Res. Part F 22, 150–158 (2014)
[10] Shafiei, U.K.M., Karuppiah, K., Tmrin, S.B.M., Meng, G.Y., Rasdi, I., Alias, A.N.: The effectiveness of new model of motorcycle seat with built-in lumbar support. Jurnal Teknologi 77(27), 97–103 (2015)
[11] Ospina-Mateus, H., Quintana Jiménez, L.A.: Understanding the impact of physical fatigue and postural comfort experienced during motorcycling: a systematic review. J. Transp. Health 12, 290–318 (2019)
[12] World Health Organization (WHO): Seguridad de los vehículos de motor de dos y tres ruedas: manual de seguridad vial para decisores y profesionales [Safety of two- and three-wheeled motor vehicles: road safety manual for decision-makers and practitioners] (2017). https://apps.who.int/iris/bitstream/handle/10665/272757/9789243511924-spa.pdf?sequence=1&isAllowed=y
[13] Segui-Gomez, M., Lopez-Valdes, F.J.: Recognizing the importance of injury in other policy forums: the case of motorcycle licensing policy in Spain. Inj. Prev. Short Surv. 13(6), 429–430 (2007)
[14] Schneider IV, W.H., Savolainen, P.T., Van Boxel, D., Beverley, R.: Examination of factors determining fault in two-vehicle motorcycle crashes. Accid. Anal. Prev. 45, 669–676 (2012)
[15] Ivers, R.Q., et al.: Does an on-road motorcycle coaching program reduce crashes in novice riders? A randomised control trial. Accid. Anal. Prev. 86, 40–46 (2016)
[16] Donate-López, C., Espigares-Rodríguez, E., Jiménez-Moleón, J.J., de Dios Luna-del-Castillo, J., Bueno-Cavanillas, A., Lardelli-Claret, P.: The association of age, sex and helmet use with the risk of death for occupants of two-wheeled motor vehicles involved in traffic crashes in Spain. Accid. Anal. Prev. 42(1), 297–306 (2010)
[17] Albalate, D., Fernández-Villadangos, L.: Motorcycle injury severity in Barcelona: the role of vehicle type and congestion. Traffic Inj. Prev. 11(6), 623–631 (2010)
[18] Clabaux, N., Brenac, T., Perrin, C., Magnin, J., Canu, B., Van Elslande, P.: Motorcyclists’ speed and “looked-but-failed-to-see” accidents. Accid. Anal. Prev. 49, 73–77 (2012)
[19] Sager, B., Yanko, M.R., Spalek, T.M., Froc, D.J., Bernstein, D.M., Dastur, F.N.: Motorcyclist’s lane position as a factor in right-of-way violation collisions: a driving simulator study. Accid. Anal. Prev. 72, 325–329 (2014)
[20] Rizzi, M., Strandroth, J., Holst, J., Tingvall, C.: Does the improved stability offered by motorcycle antilock brakes (ABS) make sliding crashes less common? In-depth analysis of fatal crashes involving motorcycles fitted with ABS. Traffic Inj. Prev. 17(6), 625–632 (2016)
[21] Clarke, D.D., Ward, P., Bartle, C., Truman, W.: The role of motorcyclist and other driver behaviour in two types of serious accident in the UK. Accid. Anal. Prev. 39(5), 974–981 (2007)
[22] López-Valdés, F.J., García, D., Pedrero, D., Moreno, J.L.: Accidents of motorcyclists against roadside infrastructure. In: IUTAM Symposium on Impact Biomechanics: From Fundamental Insights to Applications, vol. 124, pp. 163–170, Dublin (2005)
[23] Brown, J., Schonstein, L., Ivers, R., Keay, L.: Children and motorcycles: a systematic review of risk factors and interventions. Inj. Prev. 24(2), 166–175 (2018)
[24] Elliott, M.A., Baughan, C.J., Sexton, B.F.: Errors and violations in relation to motorcyclists’ crash risk. Accid. Anal. Prev. 39(3), 491–499 (2007)
[25] Truong, L.T., Nguyen, H.T., De Gruyter, C.: Mobile phone use while riding a motorcycle and crashes among university students. Traffic Inj. Prev. 20, 1–7 (2019)
Acknowledgements
Funding for the first author was provided by CEIBA, Gobernación de Bolívar (Colombia). We thank the Administrative Department of Traffic and Transportation (DATT) for its support and for providing the information required for this investigation.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ospina-Mateus, H., Quintana Jiménez, L.A., López-Valdés, F.J., Morales-Londoño, N., Salas-Navarro, K. (2019). Using Data-Mining Techniques for the Prediction of the Severity of Road Crashes in Cartagena, Colombia. In: Figueroa-García, J., Duarte-González, M., Jaramillo-Isaza, S., Orjuela-Cañon, A., Díaz-Gutierrez, Y. (eds) Applied Computer Sciences in Engineering. WEA 2019. Communications in Computer and Information Science, vol 1052. Springer, Cham. https://doi.org/10.1007/978-3-030-31019-6_27
DOI: https://doi.org/10.1007/978-3-030-31019-6_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31018-9
Online ISBN: 978-3-030-31019-6
eBook Packages: Computer Science, Computer Science (R0)