
1 Introduction

The Gran Area Metropolitana (GAM) of Costa Rica, the principal urban and industrial region in the country, has severe traffic congestion and public and private transportation problems [9]. The reasons for this situation include a significant lag in traffic infrastructure development and an extremely large number of vehicles concentrated in a small region. Congestion worsens during rush hours and affects much of the population. It also leads to several problems affecting the environment and the quality of life of the population [19], including air and noise pollution, increased fuel consumption, delays in commutes and emergency response, and traffic accidents.

An Intelligent Transportation System (ITS) [22] is traditionally used to help solve pressing traffic problems. An ITS is a tool that gathers several types of traffic and social data from different sources, such as traffic sensors, GPS, video detectors, and autonomous and connected vehicles, for analysis and decision making on temporary or permanent solutions. Transportation authorities in Costa Rica have not implemented an ITS yet. Therefore, we must resort to computer traffic simulations to understand traffic patterns and propose alternatives that alleviate critical problems. A traffic simulation tool builds a virtual environment with the goal of analyzing current traffic conditions and exploring potential changes to transport infrastructure or driving rules. Although promising, traffic simulations require a substantial calibration effort for all system parameters. Otherwise, it may not be possible to obtain an accurate and adequate simulation. Parameter adjustment is achieved with methods to optimize simulations, mostly calibration, validation and verification [6, 15].

In this paper, we present a method to calibrate traffic simulations using Global Positioning System (GPS) navigation records collected from Waze, a commercial mobile application widely popular among Costa Rican drivers. Data was obtained from previous studies performed by the State of the Nation Program (PEN) [5, 7, 19]. These navigation records only report traffic speed in road segments. Therefore, our proposed method must solve an inverse problem: traffic flows in all possible routes in a sector have to be calibrated to reproduce the observed traffic speed. The whole calibration process requires selecting locations for study, preprocessing the GPS navigation data, processing and adjusting the network files of the virtual roads for the simulation, executing the calibration method as a software tool to perform experiments, evaluating results, and proposing solutions for each location.

2 Background

2.1 SUMO and Traffic Simulations

Computer simulations help emulate real traffic situations, show weak points in the road network and predict traffic conditions. Simulations are useful when no real traffic flow information is available and traffic behavior is complex. The ultimate aim of traffic simulation is to create a virtual environment to understand traffic patterns that are difficult to capture and analyze in real scenarios. Advances in data analysis and hardware have intensified the use of traffic applications to improve driver safety, energy efficiency, the user experience of navigation systems, and road infrastructure planning. Many of these problems are complex and can scale significantly. Therefore, we should rely on traffic simulation models that can be more accurate and dynamic than traditional analytical methods to suggest solutions for the aforementioned problems [6].

SUMO (Simulation of Urban MObility) [13] is traffic simulation software created to simulate traffic conditions in a city. Specifically, its goal is to unveil the underlying model that produces traffic patterns, and it can be compared with other models on features like simulation speed or the capacity to represent reality. Created in the Centre for Applied Informatics at Cologne, Germany, it has been a popular tool to simulate and study traffic flow models, with additional tools to simplify processes like converting between formats and creating routes to describe city transportation environments.

Past developments and applications of SUMO are presented in [11]. SUMO simulates vehicular communication to: i) study the effect of vehicle-to-vehicle and vehicle-to-infrastructure communication where a combined simulation of traffic and communication is necessary; ii) optimize traffic light algorithms to make traffic lights capable of adapting to current traffic situations; iii) evaluate traffic surveillance systems to develop surveillance technology, and iv) use image processing of simulated areas to predict weather that could trigger critical traffic situations. A particularly interesting work is the calibration of the car model in real sectors of American roads. In that research, the authors calibrated travel times against data from simulated traffic detectors and managed to reduce the simulation error from 40% to 15% [12].

Taking advantage of the fact that SUMO simulates route choice and dynamic navigation, the work of the authors in [21] concentrates on traffic flow generation using an origin-destination (O-D) matrix with data from induction loop measurements available from traffic authorities. It then uses DFROUTER (a SUMO tool to reroute vehicles), along with a heuristic, to generate an O-D matrix for traffic that resembles the real traffic distribution. Simulation results are validated against real data.

Similar work was done by Celick and Karadeniz [3], using the SUMO simulator as we did in this paper. They aimed at optimizing the traffic flow depending on the traffic density. Their approach takes an intersection, examines all the lanes on each side and processes traffic based on the lane where the longest queue is located. Then, it develops a real-time traffic light optimization system to set a new light configuration, creating a smart intersection system. The most important conclusion is that, for traffic light optimization, the real-time analysis and change method gives better results than a fixed-time and green wave method. The latter method is based on the principle that most cars that pass on a green light encounter a green light again at the traffic lights of the next intersection.

Flow [10] is another work where researchers used reinforcement learning with SUMO to analyze traffic dynamics and perform optimization. Flow provides users with the ability to easily implement, through TraCI’s Python API, hand-designed controllers for any components of the traffic environment such as calibrated models of human dynamics or smart traffic light controllers. Together with the dynamics built into SUMO, Flow allows users to design rich environments with complex dynamics. A central focus in the design of Flow is the ease of modifying road networks, vehicle characteristics, and infrastructure within an experiment, along with an emphasis on enabling reinforcement learning control over not just vehicles, but traffic infrastructure as well.

In addition to the studies using SUMO, other authors have summarized several considerations for performing traffic simulations. Li et al. [14] describe the process of traffic optimization and organization; both processes were implemented by combining static channelization of road junctions with signal optimization. For signal optimization, they used a method called simultaneous perturbation stochastic approximation, an approach that approximates the gradient of the objective function through finite differences. This method achieves an important reduction in computational cost compared to traditional stochastic approximation methods. The article presents a case study that analyzes one sector of 1.68 km with 7 signalized intersections during morning rush hours, when the average speed of vehicles was between 10 and 15 km/h. They used data from a historical database of the traffic flow on the road. The final results show that the signal optimization method for the simulation model and its implementation were effective and feasible, increasing the average car speed after the optimization.

Finally, works like Paternina et al. [17] proposed the use of artificial intelligence-based techniques, such as reinforcement learning and artificial neural networks, to design a global optimization approach that is coupled with discrete-event computer simulation models to efficiently solve practical problems.

2.2 Simulation Optimization Techniques

A simulation model is based on a mathematical model which is used to implement a computer program. By running the simulation model with specific values for the input variables, we can examine the behavior of a system. We can define a simulation experiment as one or several tests in which meaningful changes are made to the input variables of a simulation model to observe and identify the reasons for changes in the output. We selected random (heuristic) search methods for our calibration process. These methods belong to the family of direct search methods, and they organize the search process to avoid searching over all possible solutions. Instead, they perform a low-cost search that is likely to discover a good or near-optimal solution, providing an efficient global search strategy because they balance exploration with exploitation [1]. As rules of thumb, however, heuristics do not guarantee convergence or optimality, which makes heuristic methods vulnerable to falling into local optima.

Simulated annealing (SA) is a stochastic search method commonly used to solve deterministic optimization problems and combinatorial problems in traffic assignment [16]. The concept of annealing comes from thermodynamics; the method emulates how a fluid slowly cools down into a solid to produce a stronger, more stable product. Simulated annealing has been used as an optimization tool in different fields for several decades. The method is a variation of conventional iterative improvement methods, which begin with an initial feasible solution, repeatedly generate and consider changes in the current configuration, and accept only those that improve the objective function. In simulated annealing this improvement mechanism has a probabilistic factor: non-improving moves are occasionally made, which offers a way to avoid getting stuck in local optima while keeping track of the best overall solution, with the expectation of arriving at a global optimum [20].

To avoid the undesired convergence to a local optimum that characterizes deterministic local heuristic methods, simulated annealing methods probabilistically accept configurations that temporarily deteriorate the quality of the system being optimized. An acceptance probability is computed, based on the change in the objective function and a temperature parameter. As the temperature is appropriately reduced (this is called an annealing schedule or a cooling schedule), fewer non-improving moves are accepted. Thus, a coarse global search evolves into a fine local search for optimality, and the probabilistic jumps provide avenues to avoid sinking into non-global optima [2]. Implementation of simulated annealing requires choosing parameters of the initial and final temperatures, the cooling schedule, and a number of function evaluations at each temperature.
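The acceptance rule and the cooling schedule described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the text does not fix a specific acceptance function, so the sketch assumes the commonly used Metropolis-style rule, and the geometric cooling schedule and all function names are assumptions made for the example.

```python
import math
import random


def acceptance_probability(delta, temperature):
    """Assumed Metropolis-style rule for a minimization problem.

    delta is (candidate objective - current objective). Improvements are
    always accepted; a worse move is accepted with probability
    exp(-delta / temperature), which shrinks as the temperature drops.
    """
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)


def cooling_schedule(t_initial, t_final, cooling_factor):
    """Assumed geometric cooling: yield temperatures until t_final is reached."""
    temperature = t_initial
    while temperature > t_final:
        yield temperature
        temperature *= cooling_factor


def accept(delta, temperature, rng=random):
    """Draw a uniform random number and apply the acceptance probability."""
    return rng.random() < acceptance_probability(delta, temperature)
```

At a high temperature almost any move is accepted (coarse global search); as the temperature falls, the same worsening move becomes increasingly unlikely to be accepted (fine local search).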

3 Solution Overview

The principal goal of this work is to create a tool to calibrate a traffic simulation in SUMO so that it represents the real-world traffic situation on specific road sectors in a given period. This calibration is done with GPS records from the Waze application. The data is used to compare the reported speed with the results from the simulation. Using an optimization algorithm, we adjust the traffic flow in all routes to represent real traffic conditions as closely as possible.

3.1 Scenario Selection

Rush hour in Costa Rica is problematic. Several locations of the GAM show heavy traffic and congestion affecting a substantial number of people daily. Selecting a road sector in which to analyze traffic flow is not an easy task. The government is already working on some sections of the road network, for example Circunvalación, the principal bypass in the capital city of San José. Currently, this ring is being completed with a brand new north segment, and some secondary roads are undergoing changes to create more lanes and new signaling. Considering that, we focus on sectors that represent a problematic case and, at the moment, may not be receiving full attention from the corresponding authorities.

We then chose four road sectors from different areas, considering the importance of the location, the impact of current traffic congestion, and the feasibility of creating new infrastructure. From the study called Congestion of Vehicular Flow of GAM by CFIA [4], we considered the reasons for traffic congestion, together with some ways to address it, to select the scenarios. These reasons include the traffic demand in rush hours, the limited road space, the elevated cost of road infrastructure, the negative impact of heavily congested junctions, the traffic light coordination timing and the priority given to public transportation. We describe the four chosen sectors next.

Fig. 1. Maps of sectors in the study along with their corresponding segment overlay (Color figure online)

Figure 1 shows the map of segments (in purple) per sector. Each segment is 100 m long or less, depending on how each road length is split. The red numbers mark the main segments we chose to evaluate the traffic flow. These segments are the most relevant according to the observed traffic congestion and the vehicle flow moving to the focal point in the sector.

Sector 1 is a main junction in Plaza Mayor, Rohrmoser, visible in Fig. 1(a). It presents traffic congestion in all directions, especially from north to south. The north road comes from an uphill road and a bridge and stops at a traffic light, which aggravates the jam on the north side of the intersection. The traffic light is necessary because it is a junction with 5 routes and 8 turns. In this sector, we look into reducing that north-south jam and show the impact of some simple changes. The districts at this location are Uruca and Pavas. This information is important because it is part of the filtering process to extract the involved road segments. Sector 2 (see Fig. 1(b)) presents a three-way junction without traffic lights, which is overly problematic. Sector 3 (see Fig. 1(c)) and Sector 4 (see Fig. 1(d)) are similar. Each is located on a main road that has heavy traffic flow mainly in one direction and has several secondary roads that add more complexity and vehicles to the existing traffic congestion.

3.2 Data Preparation from Waze

The data used to calibrate the simulation are GPS navigation records from the commercial mobile application Waze. A collaboration among the State of the Nation Program (PEN), the Ministry of Public Transportation of Costa Rica (MOPT) and Waze allows us to use data coming from jams and incidents reported since 2018. Even though the raw data is not available for this study, we used different data layers provided by a previous team of researchers that worked on the processing of this raw data [5, 7, 9]. They cleaned and organized the records in data frames of the R programming language and saved them in an RDS file. Using this data, we applied our filters to select the GPS records from the appropriate time frame and location. The new data contains records with the following variables: city, length, speed, anno, hour, delay, line, startNode, month, dayWeek, endNode, roadType, street, day. We only need the speed and line information for each record, but the filters are executed considering other variables.

For this work, we have data from all of 2018 and part of 2019. The first filter selects records from weekdays during the evening rush hours. After that, it selects two variables: line and speed. The first one contains the geometric information required to intersect with spatial information from SUMO and other sources. The second one is the average reported speed of the jams at that moment. Then, we took only one hour of records during the rush hour on work days, specifically the \(17^{th}\) h (5 p.m.). We focused the study on that specific hour to avoid errors caused by the unstable conditions at the beginning and the end of the rush hour. We assume this is the steady state of the system.
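The filtering step can be illustrated with a short Python sketch. It is only an illustration of the logic: the original preprocessing works on R data frames, the records here are plain dictionaries, and the encoding of `dayWeek` (1 = Monday through 7 = Sunday) is an assumption, since the text does not state how that variable is coded.

```python
RUSH_HOUR = 17                 # hour 17 (5 p.m.), as stated in the text
WORK_DAYS = {1, 2, 3, 4, 5}    # ASSUMPTION: dayWeek is coded 1=Monday ... 7=Sunday


def filter_rush_hour(records, hour=RUSH_HOUR, work_days=WORK_DAYS):
    """Keep weekday records for one hour and project onto line and speed."""
    selected = []
    for record in records:
        if record["hour"] == hour and record["dayWeek"] in work_days:
            selected.append({"line": record["line"], "speed": record["speed"]})
    return selected


# Hypothetical records, only to show the shape of the data.
example_records = [
    {"hour": 17, "dayWeek": 2, "line": "LINESTRING(...)", "speed": 3.1},
    {"hour": 17, "dayWeek": 7, "line": "LINESTRING(...)", "speed": 9.8},
    {"hour": 8, "dayWeek": 2, "line": "LINESTRING(...)", "speed": 4.4},
]
rush_hour_records = filter_rush_hour(example_records)
```

Only the first example record survives: the second falls on a weekend under the assumed coding, and the third is outside the selected hour.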

The next step converts those records into spatial data. The spatial data structure includes points, lines, polygons and grids, each of them with or without attribute data [18]. Then, using RedVial, a dataset prepared in the previous studies of Gómez-Campos and Cubero [5, 9], the algorithm takes the district IDs to extract the 100-m road segments for each district. Finally, it intersects the GPS records with the road segments to reduce the amount of data once again and keep only the required information.

At this stage, the workflow aggregates the data and gets the statistics of speed for each segment, resulting in a new data frame with the segments of road organized by id and their respective average speed reported for one hour, specifically at hour 17 (5 p.m.). This information is saved as a csv (comma-separated values) file that is used during the calibration process to compare with the speeds resulting from the simulation.
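As an illustration of this aggregation step, the following Python sketch averages the reported speeds per segment and writes them as CSV. It assumes the spatial intersection has already produced (segment id, speed) pairs; the function names and the column names `segment_id` and `mean_speed` are hypothetical, not taken from the original tool.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_speed_by_segment(matches):
    """Average the speeds of all GPS records matched to each road segment.

    matches is an iterable of (segment_id, speed) pairs.
    """
    speeds = defaultdict(list)
    for segment_id, speed in matches:
        speeds[segment_id].append(speed)
    return {segment_id: mean(values) for segment_id, values in sorted(speeds.items())}


def write_segment_speeds(averages, handle):
    """Write one row per segment; column names are illustrative only."""
    writer = csv.writer(handle)
    writer.writerow(["segment_id", "mean_speed"])
    for segment_id, speed in averages.items():
        writer.writerow([segment_id, speed])


# Hypothetical matches for two segments.
averages = mean_speed_by_segment([(1, 10.0), (1, 20.0), (2, 5.0)])
buffer = io.StringIO()
write_segment_speeds(averages, buffer)
```

Writing to an in-memory buffer keeps the example self-contained; passing an open file handle instead would produce the file used later for the comparison with the simulated speeds.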

3.3 Calibration Algorithm Implementation

We developed the solution in Python and R. Python was chosen to ease programming and to take advantage of the Python tools and interfaces that SUMO provides, such as TraCI's Python API. R was chosen to reuse the existing code written by Cubero et al. [5] as a base for the spatial data processing, specifically the parallel code execution to intersect road networks.

Fig. 2. General workflow of the solution

Figure 2 shows an overview diagram of our simulation optimization solution. The virtual environment component starts by preparing the input parameters, which include the time period for vehicle insertion in specific routes of the sector. SUMO runs the simulation and generates an output with the data aggregated per road segment and time interval. This output is compared with the GPS records to calculate the measure of performance (MoP), which starts the calibration algorithm to find new parameters and simulate again. In each iteration the algorithm performs the same procedure: generating new parameters, executing the simulation and calculating the MoP to verify and validate the alternative solution. The process stops when the indicated number of iterations has been performed. The final step is a statistical test to validate the best solution found and decide whether it is truly useful.
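To make the "aggregated output" step more concrete, the sketch below reads per-segment mean speeds from an edge-based output file of the kind SUMO can produce. The XML layout used here (a `meandata` root with `interval` elements containing `edge` elements that carry `id` and `speed` attributes) is our understanding of that output and should be checked against the SUMO version in use; the function name and the sample ids are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_edge_speeds(xml_text, interval_index=0):
    """Return {edge_id: mean speed in m/s} for one aggregation interval.

    ASSUMPTION: the layout is <meandata><interval><edge id=... speed=.../>.
    Edges without a speed attribute (no vehicles observed) are skipped.
    """
    root = ET.fromstring(xml_text)
    interval = root.findall("interval")[interval_index]
    speeds = {}
    for edge in interval.findall("edge"):
        speed = edge.get("speed")
        if speed is not None:
            speeds[edge.get("id")] = float(speed)
    return speeds


# Hypothetical output fragment, only to exercise the parser.
SAMPLE = """<meandata>
  <interval begin="0.00" end="3600.00" id="hypothetical_dump">
    <edge id="seg_1" sampledSeconds="120.00" speed="4.20"/>
    <edge id="seg_2" sampledSeconds="0.00"/>
  </interval>
</meandata>"""

simulated_speeds = read_edge_speeds(SAMPLE)
```

The resulting dictionary plays the role of the simulated speed vector that is compared with the Waze speeds when the MoP is computed.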

3.4 Calibration Algorithm

The input parameters of the traffic simulation are the vehicle flows created manually in a routes file. Each flow (or route) contains an initial node (from) and a final node (end). Those nodes are set manually according to the study and visualization of the real conditions of the sector where more traffic flow is created. Attributes like departLane, departPos and departSpeed are set to random values to include variability. The most important attribute is period, which is the spawning time in seconds between vehicles in the respective route. If those values are very small, the simulation generates a high traffic flow, but it may cause deadlocks on the roads.
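A routes file of this kind can be generated programmatically. The sketch below is an assumption-laden illustration rather than the tool's actual code: it uses the flow attributes `from`, `to`, `begin`, `end` and `period` as we understand SUMO's route-file format, sets the three depart attributes to `random`, and uses hypothetical edge ids and a hypothetical one-hour insertion window.

```python
import xml.etree.ElementTree as ET


def build_routes(flows, begin=0, end=3600):
    """Build a <routes> element with one <flow> per (from, to, period) triple.

    period is the spawning time in seconds between vehicles of that flow.
    ASSUMPTION: attribute names follow SUMO's flow definition; the ids
    'flow_0', 'flow_1', ... are made up for this example.
    """
    root = ET.Element("routes")
    for index, (from_edge, to_edge, period) in enumerate(flows):
        ET.SubElement(root, "flow", {
            "id": f"flow_{index}",
            "begin": str(begin),
            "end": str(end),
            "from": from_edge,
            "to": to_edge,
            "period": str(period),
            "departLane": "random",
            "departPos": "random",
            "departSpeed": "random",
        })
    return root


# Two hypothetical flows with spawning periods of 25 s and 22.5 s.
routes = build_routes([("edge_north", "edge_south", 25), ("edge_west", "edge_east", 22.5)])
routes_xml = ET.tostring(routes, encoding="unicode")
```

During calibration only the `period` values change between iterations, so regenerating the file from a list of periods keeps each candidate solution reproducible.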

The algorithm chosen as a base for the design of the solution was Simulated Annealing (SA). This algorithm takes an initial set of parameters and runs a simulation once to create the initial solution. These initial parameters are the spawning periods for the initial set of routes indicated in the routes file. The first solution is set as the best solution and works as the starting point for comparison. Then, the algorithm chooses a combination of parameters based on the initial input to generate an alternative solution (also called a neighbor), which may be worse or better. This new solution is stored in a routes file. Its value is compared with the previous best solution, and the better of the two is set as the new best solution. On each iteration, the algorithm attempts to generate and select a better solution.

In our traffic simulations, the solution is a set of values describing the average speed by segment. We represent the two sets of values (from GPS records and from simulation) as two vectors (\(V_{waze}\) and \(V_{sim}\) respectively), on which we compute a distance function. The vectors must be close enough for us to affirm that the simulation represents the real traffic congestion in the studied sector. To compare those two sets of values and obtain a measure of the solution, we first calculate the MoP as the mean relative error (MRE), \(MRE = \frac{1}{n} \sum _{i=1}^{n} \frac{|V_{sim,i} - V_{waze,i}|}{V_{waze,i}}\), where \(n = size(V_{waze})\) and \(i\) indexes the road segments. This gives us the distance between the vectors, so we know how close they are and can assign a single number to each solution. The final aim of the calibration method is to minimize that distance, making the values as similar as possible and the MRE as low as possible. At the end of the process, a paired-sample T-test is calculated to determine whether the best solution we found is statistically relevant, and thus we verify whether the calibration algorithm succeeded.
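The MoP can be computed directly from the two speed vectors. The sketch below implements the MRE as defined above and, as a companion, the test statistic of a paired T-test; the function names and the example speeds are hypothetical. Only the statistic is computed here, because turning it into a p-value requires the Student t distribution, which is outside the Python standard library.

```python
import math
from statistics import mean, stdev


def mean_relative_error(v_sim, v_waze):
    """MRE: average of |v_sim - v_waze| / v_waze over all segments."""
    if len(v_sim) != len(v_waze) or not v_waze:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(abs(s - w) / w for s, w in zip(v_sim, v_waze)) / len(v_waze)


def paired_t_statistic(v_sim, v_waze):
    """t = mean(d) / (stdev(d) / sqrt(n)) for the paired differences d."""
    differences = [s - w for s, w in zip(v_sim, v_waze)]
    if len(differences) < 2:
        raise ValueError("at least two pairs are required")
    return mean(differences) / (stdev(differences) / math.sqrt(len(differences)))


# Hypothetical speeds in m/s for two segments.
example_mre = mean_relative_error([10.0, 20.0], [10.0, 10.0])
```

In the example, the first segment matches exactly and the second is off by 100%, so the two relative errors average to 0.5.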

Algorithm 1 shows the core steps. In summary, the first part evaluates the initial parameter configuration, whose values are drawn at random from a specific range to generate some traffic flow. This initial evaluation involves processing the simulation output, intersecting the resulting road sectors with the Waze data to calculate the MoP, and setting that value as the first point of comparison. After that, the iterative algorithm starts. Based on the parameters of the last solution, SA generates a number of neighbors that is given as a parameter. For each neighbor, the program runs a simulation, intersects the results with the GPS records and obtains an MRE value. Each new solution is compared with the best solution found so far and replaces it if it is better. Neighbors are chosen using the temperature property of SA: a temperature variable resembles the temperature in the original process in metallurgy. This variable starts at a high value and is reduced at each iteration, similar to the cooling process. The higher the temperature, the more the input parameters, that is, the periods of the flows in the routes, change in the simulation. On each iteration we expect to get closer to the best solution, always selecting the lowest MRE. The cooling factor reduces the temperature each time, so the change in the input parameters becomes less vigorous. Cooler temperatures also reduce the variety of neighbors, so the search converges toward a global optimum. The cooling factors can be set manually in the configuration file before starting the program.

Algorithm 1.
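The core loop described in the text can be sketched as follows. This is a simplified reading of the description, not a reproduction of the original listing: the temperature scales how far the spawning periods move, the best solution is replaced only when a neighbor has a lower MRE, and the `evaluate` callback stands in for "run SUMO and compute the MRE". The initial temperature, cooling factor, minimum period and the toy objective are all assumptions made for the example.

```python
import random


def neighbor(periods, temperature, n_changes, rng, min_period=1.0):
    """Perturb n_changes randomly chosen periods by up to +/- temperature seconds."""
    candidate = list(periods)
    for index in rng.sample(range(len(candidate)), n_changes):
        step = rng.uniform(-temperature, temperature)
        candidate[index] = max(min_period, candidate[index] + step)
    return candidate


def calibrate(initial, evaluate, levels=5, neighbors=50, n_changes=3,
              temperature=10.0, cooling=0.5, seed=0):
    """Search for periods with a lower MRE over a fixed number of temperature levels."""
    rng = random.Random(seed)
    best, best_mre = list(initial), evaluate(initial)
    for _ in range(levels):
        for _ in range(neighbors):
            candidate = neighbor(best, temperature, n_changes, rng)
            candidate_mre = evaluate(candidate)
            if candidate_mre < best_mre:          # keep the lowest MRE seen so far
                best, best_mre = candidate, candidate_mre
        temperature *= cooling                    # cooler: smaller parameter changes
    return best, best_mre


# Toy stand-in for the simulation: relative distance to hypothetical target periods.
TARGET = [12.0, 18.0, 25.0, 9.0, 30.0]


def toy_evaluate(periods):
    return sum(abs(p - t) / t for p, t in zip(periods, TARGET)) / len(TARGET)


start = [25.0] * len(TARGET)
best_periods, best_value = calibrate(start, toy_evaluate)
```

With a real simulator, `evaluate` would write the candidate periods to the routes file, run the simulation, and return the MRE against the Waze speeds; the rest of the loop stays the same.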

In the calibration process, we calculated the initial MRE for each sector with the default parameter configuration; the algorithm then searched for a lower MRE on each iteration until it found a candidate solution. We calibrated each sector using a significance level of 0.5 for the statistical test, which checks whether there is statistical evidence that the final simulated speed vector and the observed speed vector are the same. In the results, we describe each sector in more detail to explain the experimental setup and the obtained results.

4 Results and Analysis

4.1 Calibration Results

Sector 1 (see Fig. 1(a)) has 15 routes which create heavy traffic in the main junction and on the north-south road. Those 15 parameters are the focus of our calibration. In every iteration, the algorithm calculates 50 neighbors from each solution, changing three values at a time and iterating over five temperature levels.

The initial value of the period of each input parameter is a random number from 20 to 30 s, with which we obtained an initial MRE of 4.842. Once the calibration algorithm ended, we obtained a final MRE of 0.478, substantially smaller than the initial 4.842. The final paired T-test gives a p-value of 0.717; with \(\alpha \) equal to 0.5, \(H_0\) cannot be rejected. We therefore accept \(H_0\), having statistical evidence that the two speed vectors are similar enough, which supports and validates the calibration algorithm results.

Table 1 summarizes the simulation results of the initial MRE, the final MRE once the calibration is run and the final p-value of the statistical test.

Table 1. Measure of Performance and p-value calculation of the studied sectors

Sector 2 (see Fig. 1(b)) has only 8 routes. Three routes come from the north side of the highway: one going south on the same road, one going east to segment 15 and the third going southeast to segment 4. One route on the highway goes from south to north and increases traffic in the junction in segments 7 and 6 (not in image). Two more routes go from segment 15 to the north and south, and the last two routes go from the south (segment 4) to the east and north.

The algorithm runs with 5 levels of temperature, selecting 40 neighbors for each solution found and changing 3 of those 8 parameters at a time to choose a new combination of parameters. The initial value of the MRE with periods from 15 to 30 s is 2.297. The calibration algorithm ran until we obtained a final MRE of 0.265, a large difference compared with the initial value. To validate the results, we ran the T-test, which gives a p-value of 0.583. Again, we accept \(H_0\) with \(\alpha \) equal to 0.5, indicating that both resulting vectors are statistically similar.

For the last studied sectors, we present two important cases of heavy traffic in the most complex conditions. A main road moves a heavy traffic flow in a specific direction while other routes compete for right of way, hindering the constant flow and reducing the speed in the majority of the segments. In addition, this generates a chain of situations that negatively impact the general traffic conditions. Sector 3 (see Fig. 1(c)) is a clear example of that. In rush hour at 5 p.m. the main traffic flow moves from west to east. This sector has 3 traffic lights that complicate matters even more. We created 15 routes that generate most of the traffic flow. In summary, there are three routes coming from the west, four coming from the east to different directions, three routes from the north, two coming from point 13 and two more from point 12. Details can be found in the configuration file. We focused on the routes that create heavy traffic; even so, initial results show that we need to calibrate their spawning times. With periods from 15 to 25 s, we obtained an initial MRE of 1.360. The simulation runs with five levels of temperature, selecting 60 neighbors for each solution and changing 3 values at a time. The final MRE was 0.286, an adequate result. The statistical test gives a p-value of 0.5, enough to accept \(H_0\) and indicate that both result vectors are similar. With these results, the calibration process on these complex scenarios is satisfactory.

Lastly, in Sector 4 buses have the most negative impact on traffic conditions. There are 16 routes created manually using observations of the common paths used by vehicles. Bus routes, however, have only one direction, and there are several bus destinations that need to share the main route (route 126 in the map). The east and west sides each have two spawn points, for a total of four spawning points. This sector clearly illustrates the worst traffic conditions during rush hour in the country. An initial MRE of 1.066 was obtained with initial periods from 20 to 25 s. The simulation ran with 5 levels of temperature, selecting 50 neighbors per solution and changing 3 values each time. With that, the calibration algorithm results in a final MRE of 0.176, with a p-value of 0.396, which rejects \(H_0\) at \(\alpha \) of 0.5. This case requires more study because several runs did not calibrate the sector with the expected precision.
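The four calibration outcomes reported in the text can be put side by side with a short script. The numbers are the ones stated above; the relative MRE reduction is a derived quantity added here for comparison, and the decision rule (accept \(H_0\) when the p-value is at least \(\alpha \)) is our reading of how the text treats Sector 3, where a p-value of 0.5 is considered sufficient.

```python
ALPHA = 0.5  # significance level used throughout the text

# sector: (initial MRE, final MRE, p-value), as reported in the text
REPORTED = {
    1: (4.842, 0.478, 0.717),
    2: (2.297, 0.265, 0.583),
    3: (1.360, 0.286, 0.5),
    4: (1.066, 0.176, 0.396),
}


def summarize(results, alpha=ALPHA):
    """Return (sector, MRE reduction in percent, H0 accepted?) per sector."""
    rows = []
    for sector, (initial, final, p_value) in sorted(results.items()):
        reduction = 100.0 * (1.0 - final / initial)
        rows.append((sector, round(reduction, 1), p_value >= alpha))
    return rows


for sector, reduction, accepted in summarize(REPORTED):
    print(f"Sector {sector}: MRE reduced by {reduction}%, H0 accepted: {accepted}")
```

The comparison highlights that Sector 4 reaches the lowest final MRE of the four and yet is the only one that fails the statistical test, which is consistent with the remark that this case requires more study.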

4.2 Evaluation of Proposed Traffic Solutions

To evaluate the proposed traffic solutions and test whether they produce a positive impact on traffic conditions, we ran a new simulation using the initial parameters of the optimized simulation together with the changes applied to the road network and/or traffic rules. It is important to clarify that the solutions were proposed by expert opinion on the main cause of the traffic congestion in each sector. We took the resulting speeds for each involved road segment and compared them with the results of the optimized simulation, showing the difference in speed per segment and calculating the speed increment rate to quantify the impact of the proposals. Figure 3 illustrates the proposed solutions.

Fig. 3. SUMO visualization of proposed solutions for each sector. Black lines represent roads. Red sections are junctions; inside them, lines of different colors represent the direction of turning. In Sector 3 a small icon indicates the traffic light in the junction. In Sector 4 the wide blue line represents an off-line bus stop. (Color figure online)

Beginning with Sector 1, we focused on the 11 segments marked in Fig. 1(a), chosen strategically to test specific traffic conditions and have better control of the situation. The traffic flow in segments 5, 6, 7 and 8 heading south signals the need for a solution. Because of this, we created a new lane for turning right toward the west, trying to reduce the number of vehicles going south and east. Figure 4(a) shows the speed values in the selected segments for the optimized simulation and the proposed solution. For Segment 5 the difference is not substantial because it is the segment closest to the junction, where cars need to stop at the traffic light. However, the other three segments show a difference of more than 14 m/s (50 km/h) on average. This means that cars coming from the north travel at good speed and cars that need to turn right have a completely free right-of-way to do it. The speedups in segments 5, 6, 7, and 8 are 1.78, 13.98, 12.07 and 9.04, respectively.
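For reference, the quantities used in this comparison can be computed as in the sketch below. It assumes that "speedup" means the ratio between the speed under the proposed solution and the speed in the optimized (calibrated) simulation, which the text does not state explicitly; the function names and the example speeds are hypothetical.

```python
def speedup(speed_optimized, speed_proposed):
    """ASSUMED definition: proposed speed divided by calibrated speed."""
    return speed_proposed / speed_optimized


def ms_to_kmh(speed_ms):
    """Convert metres per second to kilometres per hour."""
    return speed_ms * 3.6


def compare_segments(optimized, proposed):
    """Per-segment (difference in m/s, speedup) for segments present in both runs."""
    return {
        segment: (proposed[segment] - optimized[segment],
                  speedup(optimized[segment], proposed[segment]))
        for segment in optimized
        if segment in proposed
    }


# Hypothetical speeds in m/s for two segments.
comparison = compare_segments({5: 2.0, 6: 1.0}, {5: 4.0, 6: 12.0})
```

A difference of 14 m/s corresponds to 14 × 3.6 = 50.4 km/h, which matches the rounded figure quoted for Sector 1.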

Fig. 4. Comparison of original speed for all sectors in the study against the speed obtained after applying the proposed alternatives. Each bar shows the average speed in m/s for vehicles in the respective road segment.

Sector 2 shows an increase in speed in 8 of the 15 segments, and a small slowdown in 4 segments. Segments 2, 3, 4 and 5 present a slowdown because we are allowing more traffic to cross from north to east. Along that route, segments 7, 8, 9, 12, 13, 14 and particularly 15 show a significant acceleration.

Sector 3 is more complicated (see Fig. 1(c)). It has many routes and several limitations on the structural road changes that we can propose. The simplest solution is to adjust the timing of the existing traffic lights at the east of the map (traffic lights shown in Fig. 1(c)). That solution involves setting a longer duration for routes on the main road, meaning the way from west to east will have more time to go through. The time for secondary turns is reduced or adjusted according to the real conditions at the moment. In Fig. 4(c) we see the values of segments 1 and 2, the road segments closest to the traffic lights, where the average speed barely changes. However, this effect can be caused by the natural stopping of vehicles approaching the junctions. In the other segments the speed increases, with the most important differences from segment 4 to 11. The majority of segments show approximately double the speed. This behavior is clear evidence that the simulation shows a benefit in the speed of vehicles in the segments approaching the traffic lights, implying an improvement of the traffic jam.

Finally, in Sector 4 (Fig. 4(d)) all 9 segments showed a speed increase, making this the most successful solution to the traffic congestion. The acceleration in every segment was close to 2x. The proposed solution added a bus bay for the bus stop along the main road. Currently, buses stop on the road and delay the entire traffic at every stop.
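The speedups quoted in this section can be read as ratios of average segment speeds. The following is a minimal sketch, assuming that the speedup of a segment is its average speed under the proposed solution divided by its average speed in the calibrated simulation (the text does not spell out the definition). The function names and the speed values are hypothetical placeholders, not results from the study.

```python
from typing import Dict


def ms_to_kmh(speed_ms: float) -> float:
    """Convert a speed from metres per second to kilometres per hour."""
    return speed_ms * 3.6


def segment_speedups(baseline: Dict[int, float],
                     proposed: Dict[int, float]) -> Dict[int, float]:
    """Return proposed/baseline average speed for every segment id.

    Assumption: speedup = speed with the proposed change divided by
    the speed in the calibrated (baseline) simulation.
    """
    result = {}
    for segment, base_speed in baseline.items():
        if base_speed <= 0:
            raise ValueError(f"segment {segment}: baseline speed must be positive")
        result[segment] = proposed[segment] / base_speed
    return result


# Hypothetical average speeds in m/s, for illustration only.
baseline_speeds = {5: 4.0, 6: 1.0}
proposed_speeds = {5: 6.0, 6: 15.0}

speedups = segment_speedups(baseline_speeds, proposed_speeds)
for segment in sorted(speedups):
    gain_kmh = ms_to_kmh(proposed_speeds[segment] - baseline_speeds[segment])
    print(f"segment {segment}: speedup {speedups[segment]:.2f}x, gain {gain_kmh:.1f} km/h")
```

A ratio makes segments with very low baseline speed stand out: a congested segment moving at about 1 m/s shows a double-digit speedup once it flows freely, even though the absolute gain is similar to that of neighbouring segments.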

4.3 Sensitivity Analysis

The objective of the sensitivity analysis is to detect the routes that are most sensitive to segment speed variations, that is, the routes whose spawning period, when perturbed, changes the best MRE obtained with the optimal configuration of spawning periods. We report the route ids and the number of seconds added to the optimal value that change the MRE by more than 1, making the final solution no longer optimal. We verify this by running the same two-paired T-test we used in the calibration process to evaluate the statistical relevance of the solution.
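The perturbation procedure described above can be sketched as follows. This is an illustrative outline under several assumptions, not the tool used in the study: the MRE is taken to be a mean relative error expressed in percent, `simulate` is a hypothetical stand-in for a run of the traffic simulator that returns average segment speeds, and the route names, offsets and threshold are placeholders. The statistical test is omitted.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def mean_relative_error(simulated: Sequence[float],
                        observed: Sequence[float]) -> float:
    """Mean relative error in percent (assumed definition of MRE)."""
    total = sum(abs(s - o) / o for s, o in zip(simulated, observed))
    return 100.0 * total / len(observed)


def sensitive_offsets(optimal_periods: Dict[str, float],
                      observed: Sequence[float],
                      simulate: Callable[[Dict[str, float]], Sequence[float]],
                      offsets: Iterable[int] = range(-5, 6),
                      threshold: float = 1.0) -> Dict[str, List[int]]:
    """For each route, list the offsets (in seconds) that move the MRE
    away from its best value by more than `threshold`."""
    offsets = list(offsets)
    best = mean_relative_error(simulate(optimal_periods), observed)
    flagged: Dict[str, List[int]] = {}
    for route, period in optimal_periods.items():
        flagged[route] = []
        for delta in offsets:
            if delta == 0 or period + delta <= 0:
                continue  # skip the optimum itself and invalid periods
            trial = dict(optimal_periods)
            trial[route] = period + delta  # perturb one route at a time
            error = mean_relative_error(simulate(trial), observed)
            if abs(error - best) > threshold:
                flagged[route].append(delta)
    return flagged


def toy_simulate(periods: Dict[str, float]) -> List[float]:
    """Hypothetical simulator: two segment speeds, the first strongly
    and the second weakly dependent on a route's spawning period."""
    return [2.0 * periods["west_to_north"],
            10.0 + 0.01 * periods["south_to_east"]]


optimal = {"west_to_north": 5.0, "south_to_east": 5.0}
observed_speeds = toy_simulate(optimal)  # pretend the optimum fits exactly
report = sensitive_offsets(optimal, observed_speeds, toy_simulate)
for route, deltas in report.items():
    print(route, deltas)
```

Perturbing one route at a time keeps the attribution simple: any offset that appears in the report can be traced to a single spawning period, which matches the per-route results discussed for each sector.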

Starting with Sector 1, the minimum change is 1 s: two routes, going from west to north and from west to east respectively, affect the solution when their period changes by −1 s, and a south-to-east route affects the solution when 1 s is added. Those routes are therefore extremely sensitive: with minimal changes the traffic flow in the sector is affected and the simulation cannot represent the scenario observed in the GPS records. This suggests that those three routes can be the main cause of traffic congestion in the sector. It is also worth noting that the route coming from the south highway, entering the main road and going north, starts changing the solution when we add 3 s or more. That delay in spawning time causes a low traffic flow, which is reflected in the average speed in the road segments and, again, does not represent the real traffic conditions.

The same happens in Sector 2. The north-east to north route is the most sensitive, reacting when we change the spawning time by −4, −3, and −2 s. The north-east to south-east route is sensitive when we start adding seconds, indicating that those two routes are crucial for the traffic congestion.

Sector 3 is peculiar, with only one route, going from west to north, that affects the optimal solution, and only after adding more than 6 s. That means this route is the one with a real impact on the traffic flow, generating the largest number of vehicles going from the west to the east. This matches what happens in real life during rush hours, because it is one of the principal roads to leave that sector. A future approach to calibrate and propose a solution for the real traffic problems around that place must focus on this specific road.

Finally, Sector 4 is chaotic: almost every route is susceptible to changes. Even the minimum change of 1 s alters the final solution, and 10 routes reflect this behavior. Thus, we could instead search for the route that is least affected. However, in the shorter range from −2 to 2 s there is no route that is clearly less affected. This could be a reason why the calibration algorithm was more complicated to set up for this sector.

5 Conclusions and Future Work

We successfully created a pipeline of computational tools that implements a method to calibrate traffic simulations for all studied sectors. The pipeline is adaptable to other locations and can also scale in size to model bigger sectors. We verified that the results from the simulations and the GPS records are statistically similar, supporting the successful calibration of the simulation. With these calibrated simulations we were able to analyze the impact of proposed solutions to traffic problems. We created new simulations to measure the speed in the involved road segments, and we observed improvements in multiple segments in each sector.

For data processing we recommend exploring and using a faster and more reliable file format for GPS records. A file format that is not optimized can increase the data processing time in R. Using a more optimized format such as Parquet [8], we can reduce data wrangling time and speed up the testing and implementation of experiments.

Future work should focus on the simulation of public transportation and of changes in traffic rules and their acceptance by drivers. This can be a crucial aspect of designing new rules for buses, the most important means of public transport in Costa Rica. Such work could provide new guidelines for creating bus bays, define rules for their right-of-way, and measure the efficiency of those solutions.