Introduction

The number of traffic accidents is increasing globally in relation to an increase in vehicles, and therefore, road safety is an important issue for all countries (WHO 2009, 2015). It is a particularly important issue in Turkey, where the road traffic density is very high due to the use of highways as the main access roads, and traffic accidents often occur as a result of road infrastructure problems and traffic violations when we examined traffic accidents are mostly caused by these two problematic sources (TURKSTAT 2015). In 2004, 537,5352 traffic accidents occurred in Turkey, and of this total, 4427 people were died and 136,437 people were get injured (TURKSTAT 2015). In 2014, a total of 1,199,010 traffic accidents occurred; 168,512 events occurred within these accidents, and the number of accidents includes both deaths and injuries. Furthermore, in the traffic accidents that took place during the year, 285,059 people were get injured, and 3524 people died (GDH 2014). It is therefore evident that the number of accidents per year more than doubled between 2004 and 2014, and although the number of accident fatalities had decreased by 2014, the number of injured people exhibits a growing trend. In Turkey, precautions and investments have been made to reduce the number of traffic accidents. This involves determining accident black spots with the aim of performing preventative activities in such places. An accident black spot is a segment of a road with increased accident density due to a specific cause by during range of the study time period of the data. The time period taken may vary on a periodic basis of the study such as annualy, monthly, yearly for the study.

The literature emphasizes that an area should be examined when four or more of the same type of traffic accident occur within 1 year (Hamburger and Kell 1981). To ensure road safety, it is necessary to examine the spatial and temporal statistics relating to traffic accidents (Morin 1967; Norden et al. 1956; Rudy 1962). While Tamburri and Smith (1970) introduced the notion of the safety index which is actually a combined criterion of accident number and accident severity (Tamburri and Smith 1970), Jorgensen (1972) introduced a new method that the identification of black spots is based on the difference between the expected number and the observed number of accidents (Jorgensen 1972). On the other hand, depending on the developing technologies, new methods have been revealed especially including the frequency and the intensity of the accidents, the comparison of the actual amount of accidents and the expected amount of accidents, minimizing inaccuracy of actual accident rate etc. from past to present (Higle and Witkowski 1988; Jorgensen 1972; Mcguigan 1981; Mcguigan 1982; Tamburri and Smith 1970; Taylor and Thompson 1977). By analyzing the spatial structure of accident occurrence, it is possible to determine where precautions should be taken on roads in a particular region with some statistical methods based GIS technology (Yu et al. 2016).

Geographic Information System (GIS) technology is an important tool to use in determining road accident hotspots on highways, and it has been widely used in road safety research (Faghri and Raman 1995; Hilton et al. 2011; Loo and Yao 2013). The spatial relations between traffic accidents and the geometric characteristics of the road segments have been modeled (Lupton et al. 1999; Ma et al. 2014). Some studies have examined traffic accidents only, otherhand examined traffic accident spatially using GIS technology to determine hot spots (accident black spots) (Codur and Tortum 2015; Gundogdu 2010; Gundogdu 2011; Steenberghen et al. 2004). Furthermore, spatial clustering and statistical analysis studies have been used to determine black spots with hot-spot GIS analysis (Anderson 2007; Polat and Durduran 2011; Prasannakumar et al. 2011; Songchitruksa and Zeng 2010; Steenberghen et al. 2004). In addition, differing statistical and geostatistical analysis methods have been used to determine hot spots. While some studies have used point-based methods in spatial clustering, spatial autocorrelation, and kernel density analysis (Atsuyuki Okabe et al. 2009; Erdogan et al. 2008; Ghouti 2016; Mohaymany et al. 2013; Xia and Yan 2008; Yamada and Thill 2004), other studies have used mapping clusters, hot spot analysis, Getis-Ord Gi* methods (Anderson 2007; Gundogdu 2011; Songchitruksa and Zeng 2010).

In this study, GIS-based analysis is used to determine hot spot regions over 45 km of main routes in Rize Province, Turkey, with the aim of understanding the processes involved in traffic accidents in which people were get injured or died between 2010 and 2014. To achieve this, a network dataset is created with a road network analysis layer. Using this dataset, network spatial weights are created and spatial relations are modeled. The Getis-Ord Gi* method is used in hot spot analysis, and a mapping cluster is provided. Ultimately, hot spot points are determined to improve road safety on main routes in Rize. Then, using the same data, kernel density method is applied to determine traffic accident black spots. Advantages and disadvantages of two methods are compared to each other.

Methods

When road accident hotspots are found, clues may be obtained to investigate the causes of the accidents. Cluster detection enables a visual presentation of traffic accidents and is therefore extremely useful. The process steps in accordance with the main purpose of this article: (1) to determine traffic accident data (included the number of injury and death), (2) using traffic data to create GIS geodatabase, (3) generated the Network dataset and Spatial Weights for the road data, (5) hotspot analysis using network spatial weights, (6) to realize finally geo-spatial analysis, (7) applied kernel density analysis, and (8) compared methods results.

Study area

This study is performed in Rize, a northeastern province of Turkey, which is surrounded by high mountains that extend parallel to the sea and are in close proximity to the beach. For this reason, highways are built along stream valleys to connect areas to the sea, and these routes provide connections in an east-west direction and north-south direction. The Eastern Black Sea Coastal Highways provide connections between Trabzon and Artvin Provinces and traverse the province of Rize for approximately 90 km. The road is a dual carriageway and is divided into 2 lanes in each direction. This study focuses on 45 km of coastal highways contain within the points of traffic accidents that run between the borders of Iyidere and Cayeli Merdivenli in Rize (Fig. 1).

Fig. 1
figure 1

Study area

Traffic accident data

In Turkey, traffic accidents data can be found in a spatially based database system. Existing records are kept in paper form, and in this study, locations are determined by measuring the coordinates of the accident locations using a GPS receiver. To map traffic accidents, it is initially necessary to use recorded data and to organize it in a database prior to digitize. It is also necessary to integrate the information in the spatial information system by taking advantage of the relative coordinate information. In this study, traffic accident data for the past 5 years (records for the year 2010–2014) are obtained from the Rize Police Department. These data are recorded as in accident reports and include information about accident occurrences and parameters. Accident parameters cover the date, hour, location, highway information, weather, road conditions, the age and sex of the driver and whether alcohol had been consumed. This study uses data on traffic accidents in which approximately 441 people were totally either died or get injured in the study area between 2010 and 2014. Traffic accident information records, found in paper form, are arranged for transfer to a digital form in a spatial database using the MS Office 2011 Excel programme. According to accident records, between the years 2010 and 2014, a total of 449 accidents occurred. When the accident records were transferred to the map according to the coordinate information, it was identified that the coordinate values of 60 accidents had been entered incorrectly into the accident reports. Therefore, within 13.4% of all the data, the recorded coordinate information is incompatible with the data. This is due to errors in the accident reports filled by attendants and to X and Y coordinate data being entered incorrectly (especially interchangeably) in the accident reports. Coordinate information, measured by hand-held GPS, of the X and Y values that are entered incorrectly in the database have been corrected. Furthermore, road accident point positions that are not located on roads have been corrected based on the crime scene photos and descriptive information on the locations. Additionally, 8 accident data points with insufficient information in the accident reports have been excluded from this study. The most challenging aspect of this study is the transference of accident records from paper to a spatial database. In addition, the coordinate information found in the records is sometimes very different from the specified location. In such cases, the written description of the observed accident location is used, and then the location is displayed on the map.

Creating GIS database

GIS layers created in this study use data relating to traffic accidents, road networks, and administrative units. In addition, to visualize the results, the basemap function of the ArcGIS 10.4 software is used. Traffic accident data are obtained in paper form, ready to be used with the GIS software, and all application data are converted to the common coordinate system as 30 UTM-ED50 data. Administrative units and positional data for the road network are obtained in a shapefile format at Karadeniz Technical University GISLab. X and Y coordinate information based on accident locations (found in the accident registration information) and a spatial representation of all accidents are represented as data points in the ArcGIS 10.4 software. After creating the table in Microsoft Excel according to the coordinates, the data are then transformed into a representation of spatial features using the Data Management tools and the Make XY Event Layer tools in ArcCatalog. The accident data are constructed to be used in a shapefile format within the GIS software. Thus, the shapefile is formed in accordance with the accident data features. However, in addition to traffic accident data, the positional information of the road network is needed to determine traffic accident black spots. Thus, a network dataset is created to assess the effects of traffic accidents at existing junctions located at the intersections of roads in the road network.

Generating the network dataset and spatial weights for the road data

A network dataset is required to represent the distances between different accident locations before network spatial weights can be generated. To generate network spatial weights, a point feature class is needed to represent both feature origins and feature destinations. In briefly, the accident locations are used as the feature origins and feature destinations. The generate network spatial weights function first allocate the accident on the highway network and then use the travel distance to calculate the weight between each and every other accidents locations. A simple line feature file is used to create a network highway dataset, whereby the line feature is used to create a new network database. It is important to establish a network dataset so that the rotations and path lengths can be determined and a weight matrix can be established. With the developed network dataset, shapefile files can be obtained that represent junctions on the road network and sets of network data. The distance between two point values can then be calculated based on the network instead of calculating the distance using a straight line. Then, the weight matrix to be created for the determined route network is studied. It is important when creating a weight matrix to determine a point-feature class that indicates the source. The output of the generate network spatial weights function is a spatial weights matrix file which contains the spatial relationships among all objects. Weight represents spatial relationship between the from feature and the to features. Generate spatial network weights construct a spatial weights matrix file (.swm) from a Network dataset which defines spatial relationships among all features in terms of an underlying network structure. This tool does not honor the environment output coordinate system. All feature geometry is projected to match the spatial reference associated with the Network Dataset prior to analysis. The resultant spatial weights matrix file created by this tool will reflect spatial relationships defined using the Network Dataset spatial reference (Zhang 2010).

In this respect, the travel lengths may be determined by separating the accidents on the road network and using the weight values between the places where the accidents occurred. The results generate a weight matrix function that represents a spatial weight matrix relation with other objects. The obtained weights represent a spatial relation between two attribute objects, and this obtained weight matrix file is produced with a .swm file extension and is used to reflect the relations between the accident data obtained in the hot spot analysis.

Hot spot analysis (Getis-Ord Gi*)

A hot spot analysis is performed using the generated spatial weights. Hot spot analysis is commonly used to identify the effect and density of a set of traffic accidents. The consequents (called z score and p values) spatially show where feature high or low values cluster. This method considers neighboring features. The feature has a high value and be surrounded by other features, hot spot regions can be identified statistically. This method and tool is an important for some academic studies such as traffic accident black point determination studies. Because this method gives quite important results compared to other methods, while determining black spot points where traffic accidents are intense. Especially using this method, the Getis-Ord Gi* statistic is used to calculate each value of an accident. Since each value of the accident data is calculated, it is important to use it in the analysis of accident data. In a study by Zhang (2010), Getis-Ord Gi* statistics use a longer clearance time for the cluster, but in this study, only equivalent property damage and only crash frequency are used for clustering (Zhang 2010). Instead of creating clusters according to the frequency counts of traffic accidents, clustering was performed according to accident weights. In the literature, accident weights are created based on only equivalent property damage or crash frequency. However, the calculation of the weight values depends on the proportion of accidents in which there were fatalities, injuries, and property damage, with respect to the country’s level of development. In this study, these ratios are essentially based on the information provided in Kahramangil and Şenkal (Kahramangil and Şenkal 1999). In such calculations, fatal crashes have a weight factor equal to 9, crashes in which injuries were sustained have a weight factor equal to 3, and those in which property damage only (PDO) was sustained have a weight factor equal to 1. The weights are determined according to accident severity using the following formula (Kahramangil and Şenkal 1999).

$$ EPDO\ \mathrm{weights}= 9\ \mathrm{Fatal}\ \mathrm{crashes}+ 3\ \mathrm{Injury}\ \mathrm{crashes}+ PDO\ \mathrm{crashes} $$
(1)

Getis-Ord Gi* statistics can be applied to the hotspot analysis tool in ArcGIS, and the z-values can then be calculated, which indicate whether features with high or low values are clustered together at a particular location. With hot spot analysis, spatial weights are calculated when creating the network dataset of the road network. Using the network spatial weights and performing hot spot analysis, the z values are found, and accident black spots can be identified as the locations of clusters where the z value is low or high. The following mathematical equation is used to efficiently determine the statistical density in Eq. 1:

This method calculate the Getis-Ord Gi* statistic for each feature in a dataset. The Getis-Ord statistic is given as:

$$ {G}_i^{\ast }=\frac{\sum_{j=1}^n{w}_{i,j}{x}_i-{X}^{\prime }{\sum}_{j=1}^n{w}_{i,j}}{S\sqrt{\frac{\left[n{\sum}_{j=1}^n{w}_{i,j}^2-{\left({\sum}_{j=1}^n{w}_{i,j}\right)}^2\right]}{n-1}}} $$
(2)

where xj is the attirubute value for feature j, wi, j is the spatial weight between n feature i and j, n is equal to the total number of features and:

$$ {\mathrm{X}}^{'}=\frac{\sum_{j=1}^n{x}_j}{n} $$
(3)
$$ S=\sqrt{\frac{\sum_{j=1}^n{x}_j^2}{n}}-{\left({X}^{\prime}\right)}^2 $$
(4)

where Xj is the attribute value for feature j, n is the sample size, and Wi, j denotes the spatial weights between features i and j (Zhang 2010).

The outcome of the Gi* statistic is a z value for each feature. Higher z values indicate clusters of accidents that occur over a longer period, and lower z values indicate a large number of accidents that occur over a shorter period around an area. In this study, there is no spatial correlation between traffic accidents. The lack of spatial correlation indicates that traffic accidents follow a normal distribution. In addition, while the z value is used in deciding the acceptance of a proposed idea, the p value is used in providing information on whether the transaction is correct or not. At the end points of the normal distribution, the z value is either high or low, and the p value is small. This means that the proposed concept has a very high probability.

In the tails of the normal distribution, the z value is very high or very low, and the p value is small. This means that the idea has a very high probability. From the result of hot spot analysis, z and p values are calculated for each accident data point. If an accident takes place at a location where the z value is very high and the p value is lower, the designated area is defined as an area where the accident trend is relatively high. According to the results of hot spot analysis, areas where accidents are very dense are determined using the obtained z and p-values. In this study, it is used Hot Spot Analysis (Getis-Ord Gi* statistics) with integrated Spatial Weight matrix for the road data. First of all, after generating the network dataset, a spatial weights matrix file (.swm) from a network dataset which defines spatial relationships among all features in terms of an underlying network structure is obtained. Then, hot spot analysis are carried out by using integrated spatial weights for the creation of injured and deadly hot spot regions. In short, the generated spatial weight matrix is used as a kind of input data in obtaining injured and deadly traffic accident black points.

Kernel density method

This technique transforms a sample of observations, recorded as geographically referenced point data, into a continuous surface that indicate the intensity of individual observation over space (Bailey and Gatrell 1995, 1996). The method is known as KDE because around each point at which the indicator is observed a circular area (the kernel) of defined bandwidth is created. This takes the value of the indicator at that point spread into it according to some appropriate function. Summing all of these values at all places, including those at which no incidences of the indicator variable were recorded, gives a surface of density estimates. Radius of the circular neighborhood affects the resulting density map. If the radius is increased, there is a possibility that the circular neighborhood would include more feature points which results in a smoother density surface. The common form of the KD function is as follows:

$$ \mathrm{{\mathchar'26\mkern-9mu \lambda}}(s)=\kern0.5em {\sum}_{i=1}^n\frac{1}{\tau^2}k\left(\frac{\left(S-{S}_i\right)}{\tau}\right) $$
(5)

where μ(s) is the estimate of the intensity of the spatial point pattern measurement at location s; si is the observed ith event, k represents the kernel weighting function; and s is the bandwidth (Borruso and Schoier 2004; Kloog et al. 2009). Compared with other studies of point patterns, KD smoothing provides significant advantages. In particular, it makes it possible to estimate the intensity of a point pattern and to represent it by means of a smoothed continuous surface, thus helping highlight the presence of clusters or regularities in the parameter’s distribution (Galrell 1996). From the perspective of past studies, there are many kinds of kernel functions (uniform, triangle, epanechnikov, quartic, Gaussia) in ArcGIS software. But especially using point features to determine black hot spot regions, it is identified that quartic kernel function described in Silverman (1986) is often used in this applications (Silverman 1986). This kernel density function method for point features is working principle like a smoothly curved surface fitted over each point. Because of this, in this study quartic is chosen as the function of the kernel analysis to determine black accident point regions. After determining black point regions with hot spot analysis, in the second step of this study, accident points are used to determine block spots with kernel density method and show in the figure. When the kernel density is applied, we choose the bandwidth 250 m because this is an appropriate bandwith for kernel density estimation of traffic accident (Xie and Yan 2008). So in this study, we determine kernel density values with this situation. Therefore, this application is made to compare two methods (kernel density and hotspot analysis using network spatial weights) that give results which method is more effective in the traffic accident application.

Results

In this study, in the network dataset, depending on the road network, the weights are calculated using the Generate Network Spatial Weights procedure for the relevant location according to the given mathematical equations, and hot spot analysis (Getis Ord Gi*) is performed using data related to road traffic accidents that occurred between 2010 and 2014 in the study area (areas of road covering 45 km in Rize) and the z and p values are calculated, in which people had been either dead or injured. Three maps of the study area are then obtained using the results of the hot spot analysis are produced: a traffic accident hotspot density map; a hotspot density map of traffic accidents in which people had been dead; and an integrated map of accidents in which people had been either injured or died (Figs. 2, 3, 4).

Fig. 2
figure 2

Hotspot Map of Deadly Traffic Accidents/Rize

Fig. 3
figure 3

Hotspot Map of Injured Traffic Accidents/Rize

Fig. 4
figure 4

Hotspot Map of Traffic Accidents/Rize

Then, kernel density analysis is performed in these same data and the resultant maps are produced. To compile these maps, analyses are performed according to a method in which equivalent weights are determined, which uses an evaluation of vehicles involved in accidents and casualties. For each location investigated, the number of people injured or died as a result of an accident, or the damage to the vehicle, is taken into account to determine the total intensity of the examined location. It has been determined in the literature that 1 fatal accident is equivalent to 9 property-damage accidents, and 1 injury accident is equivalent to 3 property-damage accidents, depending on the country’s level of development (Kahramangil and Şenkal 1999). Maps are produced using the total values for the accident data that complied with the values in the database. The results obtained in the hotspot analysis are shown in Figs. 5, 6, and 7.

Fig. 5
figure 5

Kernel Density Map of Deadly Traffic Accidents/Rize

Fig. 6
figure 6

Kernel Density Map of Injured Traffic Accidents/Rize

Fig. 7
figure 7

Kernel Density Map of Deadly-Injured Traffic Accidents/Rize

Conclusion

Accident black spots can be determined by mapping traffic accidents, and factors causing such accidents at these points can then be investigated with the aim of reducing accidents by altering these factors. Although a large number of studies using methods such as statistical and spatial analysis have been proposed, this study differs in that accident black spots are identified using hotspot analysis. The main reason for the use of hotspot analysis is that it allows for the identification of accident black spots by considering both the spatial weights of the road geometry and the weight of each accident. In this respect, hotspot analysis gives improved results, as our method is based on the frequency of accidents as a kernel density function.

This study area is a 45 km section of the Eastern Black Sea Coastal Highway in Rize Province, Turkey, which extends from the center of Trabzon. The study mapped accidents occurring over the past 5 years in which people had either been died or injured. Firstly, accident records are transferred to a geodatabase according to the location of the accident. However, in Turkey, traffic accident records are not stored in a spatial-based information system, and therefore, the most difficult part of this study is to obtain accident record data and transfer them to a spatial database.

To determine accident hot spots, ArcGIS 10.4 software is used and the Hot Spot Analysis: Getis-Ord Gi* statistic tool ang Kernel Density tool is used. When looking at the result values and maps, hot spot analysis scores are better than kernel density method results. Because in the hot spot analysis method, we use network spatial weights to determine traffic accident black spots. On the other hand, in the kernel method, we can use kernel tool which can points out the area only around accidents point, the spots are where accident frequently happen. So, hotspot analysis is based on locations where accident happened. Finally, this case is a more accurate and usable approach that the other method and this study could lead to further work to evaluate other factors involved in accidents.