1 Introduction

Railway transport can influence the economic and social growth of regions: passenger transport may foster the development of activities such as tourism and culture, and freight transport can reduce road usage (Dolinayova et al. 2018). However, planning new railway alignments (i.e., the sequence of straight lines connected by curves) is complex. It usually involves an iterative process carried out by a multidisciplinary team of specialists and rarely leads to near-optimal costs without computational assistance.

According to Li et al. (2016), the railway alignment optimization (RAO) problem is defined as the task of finding the sequence of horizontal and vertical curves connected by straight lines while minimizing a mathematical function of total cost given the geometric and operational constraints of the transportation mode.

OECD (1973) and Chew et al. (1989) classify the costs of building highway or railway alignments into infrastructure, earthwork, tunnels and bridges, expropriation, and drainage, where the first three account for 75% of the overall construction costs. Additionally, Schonfeld et al. (2007) separate the infrastructure cost into building costs and operational costs to users and to the operator.

According to Jha et al. (2006), an appropriate alignment optimization model must: compute the most significant cost items, such as earthwork and infrastructure (tunnels and bridges); respect the geometric constraints related to the mode of transport and its rolling stock; optimize three-dimensional alignments simultaneously; search within a continuous solution space; yield a realistic alignment; rely on a solution algorithm that is efficient in terms of memory requirements and computing time; be compatible with a Geographic Information System (GIS); and avoid inaccessible regions.

Besides the geological, hydrological, land use and topographical conditions provided by georeferenced databases, the safety and riding comfort standards of trains are relevant aspects to be considered when optimizing railway alignments. However, given these georeferenced datasets, the computational burden to solve the RAO problem requires significant efforts to find good quality solutions.

In this paper, a parallel Genetic Algorithm running on a high performance computing environment is proposed to solve the RAO problem while minimizing the costs of new railway alignments constrained by the geometric parameters required to run trains with different average speeds on intercity connections. The framework was applied to different connections between Brazilian cities considering High Performance Trains (henceforth HPTs) with average speeds of 200 km/h and High Speed Trains (henceforth HSTs) running at 300 km/h on average. The application assesses the accuracy of the framework in estimating the costs of intercity railway alignments and its benefits in terms of processing times to achieve good solutions.

While the approach is based on previous work, it is not merely an adaptation of the existing solutions to the RAO problem since: (i) a parallel approach over a set of high performance computers is proposed to estimate the costs of new railway alignments; (ii) different geometric constraints and types of trains distinguished by their average speeds are taken into account to estimate the alignments; and (iii) the framework is applied to estimate the costs of intercity alignments in different topographical and land use conditions in Brazil.

The remainder of this paper is organized as follows. Section 2 presents a literature review of previous research on RAO and solution strategies to solve the problem. Section 3 describes a Genetic Algorithm (GA) to solve the model, and the proposed parallel computing framework, followed by Sect. 4 where the results of its application to different connections in Brazil are shown and compared with values from the literature based on international practice. Finally, Sect. 5 summarizes the conclusions of the research.

2 Literature Review

The RAO is distinguished from the highway alignment optimization problem by the objective function to be minimized. However, both mathematical models address the horizontal and vertical curves as continuous differentiable functions, constrained by minimum and maximum radii and slopes at their respective derivatives in successive points of the alignment (Jha et al. 2006).

The mathematical models and solution methods to these alignment optimization problems emerged in the 1970s and were applied to different contexts in order to assist the decision makers planning new transport infrastructures. However, the solution to these problems can hardly be achieved to optimality by an exact method since the terrain configuration usually cannot be represented as a continuous surface. Moreover, the design parameters regarding the minimum values of the horizontal and vertical alignments also affect these estimations (Hodas 2014).

Several mathematical models and heuristic algorithms were proposed to solve the horizontal, vertical, and three-dimensional alignment optimization problems (Li et al. 2016, 2017): calculus of variations (Howard et al. 1968); enumeration (Easa 1988); dynamic programming (Li et al. 2013); genetic algorithms (Jha 2003; Kang et al. 2012); neighborhood search heuristics with mixed integer programming (Cheng and Lee 2006); mixed integer programming (Easa and Mehmood 2008); particle swarm optimization (Maji 2017); and distance transform (De Smith 2006; Li et al. 2016, 2017).

The benefits and shortcomings of these methods are addressed by Jha et al. (2006) to optimize highway alignments, which may be extended to the RAO problem. However, despite being the most promising approach to solve the problem in reasonable computational time, Genetic Algorithms (GA) still require large computational resources to assess the large-scale datasets containing information on the topographic and land use conditions.

Jha and Schonfeld (2000) used a Geographic Information System (GIS) to assess the land use costs to build a new highway infrastructure, and Jha et al. (2001) and Jha (2003) proposed a decision support system that enables the alteration of an alignment given the surrounding infrastructure. Jha et al. (2007) applied the GA to the RAO using the method proposed by Jha et al. (2006), where the objective function comprised the operator costs (track construction, stations, earthwork, land use and operational costs) and the costs to users (access, egress, and travel time).

Li et al. (2016) proposed a two-phase methodology to solve the RAO problem, where promising paths are generated in the first phase, followed by curve refinements. The approach is validated through a real-world case study in a mountainous area where the natural terrain gradient is nearly triple the maximum allowed design gradient.

Samanta and Jha (2011) proposed a model to plan a rail transit line in which the optimal alignment is obtained by microscopic analysis, followed by the solution of a station location problem by a Genetic Algorithm which minimizes the total system cost per person, the total user cost per person and maximizes the total ridership.

Lai and Schonfeld (2016) and Pu et al. (2018) applied a distance-transform algorithm to solve the railway alignment and station location problem concurrently in the urban context and in mountainous terrain, respectively. The concurrency relates to the simultaneous alignment optimization and station location.

Kim et al. (2005) considered dividing a studied area into smaller regions to deal with the computational burden of solving the highway optimization problem through the so-called “Stepwise Genetic Algorithm”. However, the statistical hypothesis tests demonstrated its effectiveness only for a small theoretical grid area of 200 × 200 ft.

The parallel computing approach has been applied to solve combinatorial optimization problems with different methods, such as Variable Neighborhood Search and the Bee Colony Algorithm (Gupta and Deep 2009; Crainic et al. 2012). The adaptive Genetic Algorithm based on a multi-population parallel approach proposed by Chen et al. (2011) is capable of estimating the costs of highway alignments while avoiding premature convergence compared to a single-population evolutionary algorithm. Kazemi and Shafahi (2013) solved the highway alignment optimization problem with a parallel processing particle swarm optimization algorithm.

To the best of our knowledge, solving the alignment optimization problem through a parallel Genetic Algorithm running on a set of high performance computers has not been explored in the literature, despite the computational burden of estimating new railway alignments over wide areas, such as intercity connections in large countries like Brazil. Nowadays, data can be easily stored and processed by interconnected computers in data warehouses. In this setting, parallel computing relies on a physical infrastructure of remotely accessed computers (Virtual Machines, VMs) and an interface that enables the exchange of information among servers and clients through the Internet.

3 The Parallel Genetic Algorithm and High Performance Computing Environment

This section details the parallel computing framework considered to obtain near-optimal railway alignments. The literature review presented in the previous section shows that the Genetic Algorithm has been extensively applied to solve both the highway and railway alignment optimization problems. One of the main contributions in this field is by Jha et al. (2006), who detailed the mathematical formulation and procedures to estimate the overall costs of building linear transport infrastructures, with applications to road design.

The steps of the three-dimensional Genetic Algorithm implemented in this paper to estimate new railway alignments are described in Fig. 1, taking into account the procedures described by Jha et al. (2006).

Fig. 1 Steps of the genetic algorithm implemented in this paper to estimate railway alignments. Source: adapted from Jha et al. (2006)

Initially, the algorithm sets the coordinates of the start (S) and end (E) points of the alignment, the values of track parameters (minimum horizontal radius, and minimum and maximum slope), and the GA population size (number of alignments), the number of generations (iterations of the algorithm) and the unit cost to estimate the fitness function of each individual (i.e., the cost of each alignment).

Next, the individuals of the population are created, each one representing an alignment containing Horizontal Intersection Points (HIPs) and their respective elevations, defined in one of five ways: (i) points over the straight line between S and E, with random elevations; (ii) points in equidistant perpendicular planes over the straight line between S and E, with random elevations; (iii) points in equidistant perpendicular planes over the straight line between S and E, with ground elevations based on the Digital Elevation Model (DEM); (iv) random points with ground elevations based on the DEM; or (v) random points with random elevations.
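To illustrate one of these initialization strategies (points in equidistant perpendicular planes with random elevations), the following minimal sketch generates the HIPs of one individual. It is not the authors' implementation: the function name, the `max_offset` bound on the lateral displacement and the `elevation_range` argument are hypothetical, introduced only for illustration.

```python
import math
import random


def init_hips(start, end, n_hips, max_offset, elevation_range, rng):
    """Return n_hips (x, y, z) points on equidistant lines perpendicular to S-E."""
    (xs, ys), (xe, ye) = start, end
    dx, dy = xe - xs, ye - ys
    length = math.hypot(dx, dy)
    # Unit vector normal to the S-E direction (trace of each perpendicular plane).
    nx, ny = -dy / length, dx / length
    z_min, z_max = elevation_range
    hips = []
    for i in range(1, n_hips + 1):
        t = i / (n_hips + 1)  # equidistant stations between S and E
        offset = rng.uniform(-max_offset, max_offset)  # lateral displacement
        x = xs + t * dx + offset * nx
        y = ys + t * dy + offset * ny
        z = rng.uniform(z_min, z_max)  # random elevation (hypothetical bounds)
        hips.append((x, y, z))
    return hips


# Four HIPs between S = (0, 0) and E = (100, 0), at most 10 units off the axis.
hips = init_hips((0.0, 0.0), (100.0, 0.0), 4, 10.0, (500.0, 900.0), random.Random(42))
```

Replacing the random elevation with a lookup in the DEM would give the variant based on ground elevations.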

For each individual, the algorithm executes a SUB-ROUTINE where excessive horizontal curves are eliminated, the radii of the horizontal circular curves constrained by minimum values as a function of the railway technology to be operated are set, and the attributes of the tangent and curvature points of each horizontal circular curve are calculated.
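The attributes of a horizontal circular curve can be sketched with standard curve geometry: for a deflection angle Δ between the incoming and outgoing tangents and a radius R, the tangent length is R·tan(Δ/2) and the arc length is R·Δ. The code below is an illustrative sketch assuming simple circular curves without transition spirals; the function and key names are hypothetical and are not taken from the paper.

```python
import math


def _unit(p, q):
    """Unit vector pointing from p to q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    return dx / d, dy / d


def circular_curve(prev_pt, hip, next_pt, radius):
    """Tangent length, arc length and tangent points of a circular curve at a HIP."""
    ux, uy = _unit(hip, prev_pt)  # from the HIP back along the incoming tangent
    vx, vy = _unit(hip, next_pt)  # from the HIP along the outgoing tangent
    cos_interior = max(-1.0, min(1.0, ux * vx + uy * vy))
    deflection = math.pi - math.acos(cos_interior)  # change of direction (rad)
    tangent_len = radius * math.tan(deflection / 2.0)
    return {
        "deflection": deflection,
        "tangent_length": tangent_len,
        "curve_length": radius * deflection,
        "curve_start": (hip[0] + tangent_len * ux, hip[1] + tangent_len * uy),
        "curve_end": (hip[0] + tangent_len * vx, hip[1] + tangent_len * vy),
    }


# A 90-degree turn at (1000, 0) fitted with a 500 m radius.
curve = circular_curve((0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), radius=500.0)
```

A curve would be infeasible, and a candidate for elimination, when its tangent length exceeds the available distance to the neighbouring HIPs.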

The Vertical Intersection Points (VIPs) are defined at the same positions as the HIPs, so that three geometric elements may arise at these locations given the slope constrained by minimum and maximum parameters: (i) a horizontal circular curve and a vertical parabolic curve result in a three-dimensional curvature; (ii) a horizontal curve stands on flat or sloping terrain; or (iii) a tangent section lies on a vertical curve.

Once the altitudes of the Vertical Intersection Points are defined based on the values of their respective HIPs, the length of the parabolic curves and their attributes are calculated, followed by the identification of the position and altitude of equally spaced track points along the three-dimensional alignment.
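The elevation of track points along a parabolic vertical curve can be sketched as follows, assuming a symmetric parabola centred on the VIP with incoming grade g1 and outgoing grade g2 (as decimals). The symmetry assumption and all names are ours and serve only as an illustration.

```python
def vertical_curve_elevation(z_vip, g1, g2, curve_length, x):
    """Elevation at distance x from the start of a symmetric parabolic curve."""
    if not 0.0 <= x <= curve_length:
        raise ValueError("x must lie within the curve")
    z_start = z_vip - g1 * curve_length / 2.0  # curve starts L/2 before the VIP
    return z_start + g1 * x + (g2 - g1) * x ** 2 / (2.0 * curve_length)


# Crest curve: +2% to -2% over 400 m, VIP at elevation 100 m,
# sampled at equally spaced track points every 100 m.
profile = [vertical_curve_elevation(100.0, 0.02, -0.02, 400.0, float(x))
           for x in range(0, 401, 100)]
```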

The fitness function of each individual, i.e., the overall construction cost of each alignment (CC), is estimated as the sum of the track related cost (TRC), the land use cost (LUC), the earthwork cost (EWC) and the costs to build tunnels and bridges (TBC). The TRC refers to the track elements (rails, sleepers, electrification etc.) and is calculated as a function of the total length of the alignment and an average unit cost per kilometer.
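In compact form, the fitness function described above can be written as follows, where the symbols c_t (average unit cost of the track per kilometer) and L (total length of the alignment) are introduced here only for illustration:

$$CC = TRC + LUC + EWC + TBC, \qquad TRC = c_{t} \cdot L$$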

The land use cost (LUC) is the expropriation cost over an area surrounding the alignment, given an average unit cost that depends on the land use provided by a georeferenced dataset (IBGE 2014), which classifies the studied region into urban and rural areas as illustrated in Fig. 2 (left). The earthwork cost (EWC) is calculated from the cut and embankment volumes over successive track points, given their cross-sectional areas and the elevations provided by the DEM illustrated in Fig. 2 (right) (U.S. Geological Survey 2014).
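One common way to turn cross-sectional areas at successive track points into volumes is the average end area method. The sketch below uses it with rectangular cross-sections (no side slopes), which is a simplification of ours; the paper does not state which volume formula was used, and the unit costs are hypothetical.

```python
def earthwork_volumes(ground, track, spacing, width):
    """Cut and fill volumes along equally spaced track points."""
    cut_area = [width * max(g - t, 0.0) for g, t in zip(ground, track)]
    fill_area = [width * max(t - g, 0.0) for g, t in zip(ground, track)]
    # Average end area: mean of two successive sections times their distance.
    cut = sum((a + b) / 2.0 * spacing for a, b in zip(cut_area, cut_area[1:]))
    fill = sum((a + b) / 2.0 * spacing for a, b in zip(fill_area, fill_area[1:]))
    return cut, fill


# Three track points 100 m apart, 10 m wide platform, track at elevation 10 m.
cut, fill = earthwork_volumes([10.0, 12.0, 8.0], [10.0, 10.0, 10.0],
                              spacing=100.0, width=10.0)
ewc = 20.0 * cut + 15.0 * fill  # hypothetical unit costs per cubic metre
```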

Fig. 2 Georeferenced datasets of land use (left) and DEM (right) of the Southeastern Region of Brazil

Finally, the costs to build tunnels and bridges (TBC) are estimated based on unit monetary values per kilometer along the sequence of track points where those structures are more economical than earthworks. An economic break-even point between earthwork cost and construction cost of bridges or tunnels is determined by the difference between the ground elevation and the altitude of the equally spaced track points of the vertical alignment.
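The break-even rule can be sketched as a classification of each track point by the difference between ground elevation and track elevation. The threshold depths and unit costs below are hypothetical placeholders, not the values used in the paper.

```python
def classify_track_points(ground, track, tunnel_depth, bridge_height):
    """Label each track point as 'tunnel', 'bridge' or 'earthwork'."""
    labels = []
    for g, t in zip(ground, track):
        if g - t > tunnel_depth:      # deep cut: a tunnel is assumed cheaper
            labels.append("tunnel")
        elif t - g > bridge_height:   # high fill: a bridge is assumed cheaper
            labels.append("bridge")
        else:
            labels.append("earthwork")
    return labels


def structure_cost(labels, spacing_km, tunnel_cost_km, bridge_cost_km):
    """Tunnel and bridge cost from the length covered by each structure."""
    tunnel_km = labels.count("tunnel") * spacing_km
    bridge_km = labels.count("bridge") * spacing_km
    return tunnel_km * tunnel_cost_km + bridge_km * bridge_cost_km


labels = classify_track_points([100.0, 130.0, 100.0, 70.0], [100.0] * 4,
                               tunnel_depth=20.0, bridge_height=15.0)
tbc = structure_cost(labels, spacing_km=0.1, tunnel_cost_km=100.0, bridge_cost_km=50.0)
```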

At the end of the SUB-ROUTINE, all the individuals are sorted in ascending order of their fitness, and the probability of changing their genes is calculated based on a uniform distribution. Four types of crossover operators (simple, two-point, arithmetic and heuristic) and four types of mutation operators (uniform, straight, and non-uniform mutation applied to one or to all Horizontal Intersection Points) addressed by Jha et al. (2006) are randomly applied to 30–50% of the population. After each crossover or mutation, the new individuals are submitted to the SUB-ROUTINE to calculate their fitness.
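For reference, three of the crossover operators can be sketched in their textbook forms, with each individual encoded as a list of (x, y, z) intersection points. The operators in Jha et al. (2006) may differ in detail; the encoding and function names here are assumptions.

```python
def simple_crossover(parent_a, parent_b, cut):
    """One-point crossover: swap the tails of the parents after position cut."""
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def two_point_crossover(parent_a, parent_b, i, j):
    """Swap the segment between positions i and j."""
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def arithmetic_crossover(parent_a, parent_b, w):
    """Children are convex combinations of the parents' coordinates."""
    def blend(p, q, weight):
        return tuple(weight * a + (1.0 - weight) * b for a, b in zip(p, q))
    child_a = [blend(p, q, w) for p, q in zip(parent_a, parent_b)]
    child_b = [blend(q, p, w) for p, q in zip(parent_a, parent_b)]
    return child_a, child_b


a = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (2.0, 0.0, 5.0), (3.0, 0.0, 5.0)]
b = [(0.0, 4.0, 9.0), (1.0, 4.0, 9.0), (2.0, 4.0, 9.0), (3.0, 4.0, 9.0)]
mid_a, mid_b = arithmetic_crossover(a, b, w=0.5)
```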

The replacement of individuals of the old population with new ones has been implemented based on Jha et al. (2006). A random value between zero (0) and one (1) is drawn from a uniform probability distribution and assigned to the new individual obtained by crossover or mutation. This value is then compared with the calculated probability of excluding the kth individual of the existing population, given by Eq. (1). If the assigned random value is between p_k and p_{k+1}, then the old individual in the kth position of the old population, sorted in descending order of fitness, is replaced by the new one; otherwise the new individual is excluded. This replacement procedure occurs after performing all the crossovers and mutations.

$$p_{k} = \frac{q \cdot (1 - q)^{k - 1}}{1 - (1 - q)^{n_{p}}} \tag{1}$$

where p_k = choice probability of the kth individual sorted in descending order of the fitness function; n_p = population size; and q = exchange parameter, set to 0.25 as recommended by Jha et al. (2006).
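Equation (1) defines a truncated geometric distribution over the ranks k = 1, ..., n_p, so the probabilities sum to one. The sketch below evaluates it and maps a uniform random value to a rank through the cumulative probabilities. Reading the comparison "between p_k and p_{k+1}" as a lookup in the cumulative distribution is our assumption, and the function names are hypothetical.

```python
import bisect
import itertools


def exclusion_probabilities(n_p, q=0.25):
    """Probabilities of Eq. (1) for ranks k = 1..n_p."""
    norm = 1.0 - (1.0 - q) ** n_p
    return [q * (1.0 - q) ** (k - 1) / norm for k in range(1, n_p + 1)]


def rank_for_random_value(r, probs):
    """Rank k whose cumulative interval contains the random value r in [0, 1)."""
    cumulative = list(itertools.accumulate(probs))
    k = bisect.bisect_left(cumulative, r)
    return min(k, len(probs) - 1) + 1  # guard against rounding in the last bin


probs = exclusion_probabilities(n_p=4)
selected = rank_for_random_value(0.5, probs)
```

For n_p = 4 and q = 0.25 the first rank receives 64/175 ≈ 0.366 of the probability mass, and each following rank receives 75% of the previous one.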

The new population is sorted once again in ascending order of fitness and the first individual, which represents the alignment of lowest total cost, is identified. Finally, a stop criterion is checked: if the value of the fitness function of the best individual of the population has not changed with respect to the previous iteration, the parameter NUMBER_GENERATIONS is increased by one unit; otherwise its value is reset to zero. While NUMBER_GENERATIONS differs from MAXIMUM_GENERATIONS, the crossover and mutation probabilities are re-calculated and these operators and the SUB-ROUTINE are executed again.
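The stop criterion amounts to a stagnation counter. The following sketch shows the control flow only, with a callable standing in for one full generation (operators plus SUB-ROUTINE) that returns the best cost found. It assumes that the best cost never worsens between generations; the function name is hypothetical.

```python
def run_until_stagnation(next_best_cost, maximum_generations):
    """Iterate until the best cost is unchanged for maximum_generations in a row."""
    best = next_best_cost()
    number_generations = 0  # consecutive generations without improvement
    total = 1               # generations executed so far
    while number_generations != maximum_generations:
        candidate = next_best_cost()
        total += 1
        if candidate < best:
            best = candidate
            number_generations = 0  # improvement: reset the counter
        else:
            number_generations += 1
    return best, total


# Toy sequence of best costs per generation; stops after three stagnant ones.
history = iter([10.0, 9.0, 9.0, 9.0, 8.0, 8.0, 8.0, 8.0, 7.0])
best, total = run_until_stagnation(lambda: next(history), maximum_generations=3)
```

In this toy run the search stops at a cost of 8.0 after eight generations, so the later improvement to 7.0 is never reached. This is the usual trade-off of stagnation-based stopping rules.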

3.1 Parallel Computing Framework

Despite the effectiveness of the Genetic Algorithm previously described in solving the RAO problem, its application is usually constrained to small areas given the computational burden of data processing and recurrent access to the DEM and land use georeferenced files. Additionally, when applied to wide areas, the algorithm requires large-scale datasets to be processed at each generation after executing the crossover and mutation operators. Parallel programming is a suitable approach to deal with these computational issues: by assigning tasks to multiple computers simultaneously, it considerably reduces the running times to achieve good solutions to the problem.

The computational experiments presented in this paper were performed using the high performance computing resources of the University of São Paulo’s Advanced Scientific Computing Laboratory (LCCA), more specifically a cluster of physical servers (accessed as virtual machines) with Intel(R) Xeon(R) CPU E7-2870 @ 2.40 GHz and 32 GB of RAM. The georeferenced DEM and land use datasets were stored in one of these machines and accessed through a relational database management system (MySQL 2014). The fitness function of the individuals was assessed simultaneously in different virtual machines, subject to their availability as previously defined by the user.

A virtual machine (master) containing the core of the GA, coded in Java, is connected to several machines (nodes) through the “Java Parallel Processing Framework” library (JPPF 2014), each node running the SUB-ROUTINE (task) for a specific individual.
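The master/node pattern can be illustrated with a small stand-in. The sketch below does not use JPPF or Java. It uses Python's standard `concurrent.futures` thread pool on a single machine purely to show how the fitness of each individual becomes an independent task whose results are gathered by the master. The toy fitness function is a placeholder for the SUB-ROUTINE.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_population(population, fitness, n_workers):
    """Evaluate every individual as an independent task; results keep input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(fitness, population))


def toy_fitness(alignment):
    """Placeholder cost: the real SUB-ROUTINE would compute TRC + LUC + EWC + TBC."""
    return sum(alignment)


population = [[1, 2], [3, 4], [5, 6]]
costs = evaluate_population(population, toy_fitness, n_workers=2)
best_index = min(range(len(costs)), key=costs.__getitem__)
```

Because each task needs only its own individual plus read access to the DEM and land use data, the evaluations are independent of one another, which is what makes this step suitable for distribution across nodes.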

The flowchart of Fig. 3 represents the communication among the master and the nodes of the high performance computing environment to illustrate the proposed parallel Genetic Algorithm framework to solve the RAO problem.

Fig. 3 Parallel genetic algorithm framework to solve the RAO problem

4 Model Application

This section presents the results of the proposed parallel GA applied to the high performance computing environment in different railway connections between Brazilian cities. The algorithm was applied to three pairs of cities in the Southeastern Region of the country, varying the number of Horizontal Intersection Points as a function of the length between them and the number of virtual machines (nodes) available to run the SUB-ROUTINE. The total cost, processing time, length of the alignments and the average cost per kilometer have been assessed between each city given the specified type of train (HPT and HST).

Figure 4 illustrates the location of cities in the Brazilian Southeastern Region chosen to be connected by new railway alignments, defined as “Rio de Janeiro-Juiz de Fora”, “Campinas-Poços de Caldas” and “Araraquara-Ribeirão Preto”. Since these cities already have a railway infrastructure in the urban areas as a consequence of their historical development, the new alignments resulting from the parallel GA were estimated only in the rural areas between them.

Fig. 4 Cities between which new alignments have been estimated

4.1 Parameters

For each railway technology (HPT and HST) and each pair of cities, the number of Horizontal Intersection Points has been set according to an average density of points per kilometer (i.e., HIPs separated by 5, 10, 15, 20 or 25 km on average), given the straight-line distance between the start and end points of the alignment. In addition, the number of VMs available in the high performance computing environment to process the tasks in every iteration of the SUB-ROUTINE was set to 1 (equivalent to executing the algorithm on a single computer), 5, 10, 25, or 50.

Finally, in order to assess the variability of the total estimated costs, the GA was executed five times per city connection, type of train, average density of HIPs, and number of available VMs. Thus, 125 executions of the GA (5 HIP densities × 5 VM configurations × 5 repetitions) have been performed for each type of train in each studied intercity connection. The values of the parameters used to estimate the costs of the alignments are shown in Table 1, given the standard section of the railway alignment illustrated in Fig. 5, which is the same regardless of the type of train.
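The count of 125 executions follows from the full factorial combination of the settings listed above, as the short enumeration below confirms. The variable names are ours.

```python
import itertools

spacings_km = [5, 10, 15, 20, 25]   # average distance between HIPs
vm_counts = [1, 5, 10, 25, 50]      # available virtual machines
repetitions = range(1, 6)           # five executions per configuration

runs = list(itertools.product(spacings_km, vm_counts, repetitions))
n_runs = len(runs)  # 5 * 5 * 5
```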

Table 1 Parameters to estimate the railway alignments through the GA

Fig. 5 Standard railway section to estimate the alignment for HPT and HST

4.2 Case 1: Rio de Janeiro-Juiz de Fora

Rio de Janeiro is a coastal city in the State of Rio de Janeiro, and Juiz de Fora is located in the State of Minas Gerais; the two cities are separated by 128 km of mountainous terrain. The total costs and average costs in Brazilian monetary units (R$ and R$ per km, respectively), the processing times (seconds), and the total length (km) of the estimated alignments to operate HPTs, obtained by applying the proposed parallel GA framework in the high performance computing environment, are presented in Fig. 6.

Fig. 6 Average values of the estimated alignments for HPT between Rio de Janeiro and Juiz de Fora

Additionally, Figs. 7 and 8 illustrate the alignment over the land use and the DEM, and its longitudinal profile, respectively, for the lowest total cost solution among the 125 replications of the algorithm. Similarly, Figs. 9, 10 and 11 present the results of the GA for the alignments suitable to operate HSTs.

Fig. 7 Estimated lowest cost alignment over land use (left) and DEM (right) to operate HPT between Rio de Janeiro and Juiz de Fora

Fig. 8 Longitudinal section of the estimated lowest cost alignment to operate HPT between Rio de Janeiro and Juiz de Fora

Fig. 9 Average values of the estimated alignments for HST between Rio de Janeiro and Juiz de Fora

Fig. 10 Estimated lowest cost alignment over land use (left) and DEM (right) to operate HST between Rio de Janeiro and Juiz de Fora

Fig. 11 Longitudinal section of the estimated lowest cost alignment to operate HST between Rio de Janeiro and Juiz de Fora

For both types of trains (HPT and HST), the total costs vary with the number of VMs and increase when the average distance between intersection points increases. They vary within practically the same range for both technologies, with an average value of R$8.20 × 10^9 (standard deviation of R$1.12 × 10^9) for HPTs with an average distance between HIPs of 5 km (19 intersection points) and 50 VMs, and R$8.50 × 10^9 (standard deviation of R$1.12 × 10^9) for HSTs with the same number of Horizontal Intersection Points and 25 available VMs.

The elapsed times to reach the solutions are not proportional to the average distance between intersection points. There is a remarkable variation when the number of available nodes changes, as the running times reduce significantly when five VMs are used instead of only one. For instance, when the intersection points are spaced by 5 km and one virtual machine runs the GA, the average processing time is 184 min (standard deviation of 70 min) to estimate alignments for HPTs, and 180 min (standard deviation of 101 min) for HSTs. With five VMs running, these values are 36 min (standard deviation of 24 min) and 85 min (standard deviation of 42 min) for HPT and HST, respectively.

However, when the number of nodes is increased further, the running times do not decrease at the same rate and become higher in some cases (e.g., when 25 VMs are configured to run the code with intersection points spaced by 5 km on average for the HPT alignments).
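The scaling reported above can be quantified with the usual definitions of speedup (single-node time divided by parallel time) and parallel efficiency (speedup divided by the number of nodes). The sketch below applies them to the average times quoted for the 5 km spacing in this case. The function name is hypothetical and, given the large standard deviations, the resulting figures are only indicative.

```python
def speedup_and_efficiency(t_single, t_parallel, n_nodes):
    """Speedup and parallel efficiency from average running times."""
    speedup = t_single / t_parallel
    return speedup, speedup / n_nodes


# Average times in minutes with 1 VM and with 5 VMs (HIPs spaced by 5 km).
hpt_speedup, hpt_efficiency = speedup_and_efficiency(184.0, 36.0, 5)
hst_speedup, hst_efficiency = speedup_and_efficiency(180.0, 85.0, 5)
```

With these averages the HPT runs show a speedup of about 5.1 on five nodes, whereas the HST runs reach about 2.1, which illustrates how unevenly the benefit of additional nodes is distributed across experiments.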

The average distance between intersection points and the number of running nodes apparently do not affect the alignment length. Therefore, the cost per km increases when the number of intersection points decreases, since the total costs are higher when the average spacing between points is greater; the lower costs at closer spacings can be explained by more refined estimations of earthwork volumes. In the worst case, the highest average cost for HPT is R$10.18 × 10^7/km (standard deviation of R$0.87 × 10^7/km) when running 5 VMs with an average distance between intersection points of 25 km, and the value for HST is R$9.67 × 10^7/km (standard deviation of R$0.91 × 10^7/km) with 25 virtual machines and the same number of points. The average costs per kilometer to build new alignments in this case are similar for both technologies due to the mountainous terrain, which requires the construction of a set of tunnels near the coastal area of Rio de Janeiro.

4.3 Case 2: Campinas-Poços de Caldas

Campinas is 90 km away from the capital of the State of São Paulo, and Poços de Caldas is a tourist town in the State of Minas Gerais. They are separated by terrain that is flat close to Campinas and becomes mountainous near Poços de Caldas, with small cities along the path. The total costs, running times, total lengths and costs per kilometer obtained by the parallel GA when estimating new railway alignments between them are depicted in Figs. 12 and 15 for alignments suitable to operate HPT and HST, respectively. Figures 13, 14, 16 and 17 illustrate the horizontal and vertical profiles of the lowest cost alignments for the respective technologies.

Fig. 12 Average values of the estimated alignments for HPT between Campinas and Poços de Caldas

Fig. 13 Estimated lowest cost alignment over land use (left) and DEM (right) to operate HPT between Campinas and Poços de Caldas

Fig. 14 Longitudinal section of the estimated lowest cost alignment to operate HPT between Campinas and Poços de Caldas

Fig. 15 Average values of the estimated alignments for HST between Campinas and Poços de Caldas

Fig. 16 Estimated lowest cost alignment over land use (left) and DEM (right) to operate HST between Campinas and Poços de Caldas

Fig. 17 Longitudinal section of the estimated lowest cost alignment to operate HST between Campinas and Poços de Caldas

As can be seen, the variation of total costs is small across the average distances between intersection points and the numbers of nodes for both technologies. The minimum average cost for HPT is R$7.22 × 10^9 (standard deviation of R$0.67 × 10^9) when the intersection points are separated by 5 km on average (24 intersection points) and 10 VMs are available to run the GA. For HST, the minimum cost is R$7.78 × 10^9 (standard deviation of R$0.75 × 10^9) with the same average distance between HIPs and 10 available VMs. The estimations tend to be similar to the previous case, as the total costs increase when the average distance between intersection points rises but do not vary significantly across the number of available nodes.

There is a remarkable reduction in running times when more than one VM is used to execute the Genetic Algorithm, although the running times do not change significantly when more than five nodes are used. For instance, the average elapsed time to solve the RAO problem for HPT when the intersection points are spaced by 25 km on average is 157 min (standard deviation of 50 min) with one VM running, whereas convergence is reached in 28 min on average (standard deviation of 14 min) if 5 nodes are running in the high performance computing environment. For HST, the largest difference is observed when the intersection points are separated by 5 km, with an elapsed time of 193 min on average (standard deviation of 83 min) for a single running node, and 57 min (standard deviation of 35 min) and 63 min (standard deviation of 46 min) with 5 and 50 VMs, respectively.

The length of the estimated alignments gradually increases when the number of intersection points reduces, i.e., when the average distance between them increases. Despite the variation of the estimated distances when running the GA with a different number of nodes, these values do not affect the average cost per kilometer as they are relatively small compared to the total estimated costs.

The obtained costs per kilometer are lower than in the previous case: the minimum average cost per km is R$4.96 × 10^7/km (standard deviation of R$0.71 × 10^7/km) for HPT and R$5.24 × 10^7/km (standard deviation of R$0.63 × 10^7/km) for HST, both for replications with intersection points spaced by 5 km on average and 5 VMs.

The estimated costs among the studied cases may differ due to the terrain configuration between Campinas and Poços de Caldas, with reduced earthwork costs. The length of tunnels and bridges is small in all the replications and, thus, their contribution to the total costs is smaller.

4.4 Case 3: Araraquara-Ribeirão Preto

The last case studied presents the solution to the RAO problem to estimate new alignments between Ribeirão Preto and Araraquara, two medium size cities in the State of São Paulo. Figures 18 and 21 show the results of total costs, the elapsed times to achieve these solutions, the total length and the cost per kilometer of the alignments suitable to operate HPT and HST respectively. Figures 19 and 20 illustrate the estimated lowest cost alignment to operate HPT over the land use and DEM database, and its longitudinal section, respectively. Similarly, Figs. 22 and 23 show the lowest cost alignment suitable to operate HSTs.

Fig. 18 Average values of the estimated alignments for HPT between Araraquara and Ribeirão Preto

Fig. 19 Estimated lowest cost alignment over land use (left) and DEM (right) to operate HPT between Araraquara and Ribeirão Preto

Fig. 20 Longitudinal section of the estimated lowest cost alignment to operate HPT between Araraquara and Ribeirão Preto

Fig. 21 Average values of the estimated alignments for HST between Araraquara and Ribeirão Preto

Fig. 22 Estimated lowest cost alignment over land use (left) and DEM (right) to operate HST between Araraquara and Ribeirão Preto

Fig. 23 Longitudinal section of the estimated lowest cost alignment to operate HST between Araraquara and Ribeirão Preto

Despite the same trends compared to the other connections, the variance of the total and average costs per km of the alignments obtained between Ribeirão Preto and Araraquara is closer to that of the results obtained between Campinas and Poços de Caldas. The minimum average total cost for HPT is R$2.28 × 10^9 with a standard deviation of R$0.91 × 10^9 when the distance between intersection points is 5 km on average and 50 VMs are available. For HST these values are respectively R$2.38 × 10^9 and R$0.31 × 10^9 with an average distance between intersection points of 5 km and one running VM.

However, the elapsed times to retrieve the solutions are considerably larger when one virtual machine is running than when five or more nodes are used, while the results with a higher number of VMs deviate less from the average than in the other studied cases. For HPT alignments, the minimum average processing time with one node is 142 min, with a standard deviation of 108 min, when the intersection points are spaced by 5 km on average. When four nodes are added to solve the problem, the average running time reduces to 44 min (standard deviation of 18 min) given the same average distance between intersection points.

Besides, the minimum average computational time to obtain alignments for HST is 162 min (standard deviation of 57 min) when the intersection points are separated by 5 km, and one VM is used. Furthermore, the processing time to achieve the solutions using 5 nodes is reduced to 50 min on average with a standard deviation of 19 min.

The average length of the optimal solutions for HPT is 95.6 km (standard deviation of 11.6 km) when running one virtual machine with average distance between intersection points of 5 km. The obtained alignments for HST lead to an average length of 93.4 km with a standard deviation of 8.2 km under the same conditions.

The average cost per kilometer to build HPT alignments is R$2.67 × 10⁷/km with a standard deviation of R$0.67 × 10⁷/km, obtained by running the GA on 50 nodes with an average distance of 5 km between HIPs, which is significantly smaller than the results obtained in the previous cases. On the other hand, the value obtained for HST equals R$2.52 × 10⁷/km on average with a standard deviation of R$0.87 × 10⁷/km given an average distance between HIPs of 5 km and 5 VMs.

4.5 Comparative Analysis

This section compares the results of the parallel Genetic Algorithm applied to the intercity connections previously described. Table 2 summarizes the results of the estimated alignments of minimum total cost in each studied case.

Table 2 Estimated results of the lowest cost alignments regarding each intercity connection and type of train

The results show that the lowest cost solutions to the RAO problem are obtained when the average distance between intersection points is 5 km, as this spacing provides the most refined estimates of earthwork volumes and, thus, lower values for the most representative item of the overall estimated alignment cost. The number of Genetic Algorithm generations needed to reach the solutions is not influenced by the railway technology, and the running times are proportional to the number of iterations.

The minimum total costs are close to each other when the technologies are compared within the same case study. However, the costs fall for both technologies from the Rio de Janeiro-Juiz de Fora and Campinas-Poços de Caldas connections to Araraquara-Ribeirão Preto, since the terrain between the cities is flatter in Case 3.

The total estimated cost is not proportional to the distance. The alignments between Rio de Janeiro and Juiz de Fora are only about 18% (HPT) and 21% (HST) longer than in the Araraquara-Ribeirão Preto case, yet they cost more than four times as much. While the respective lengths for HPT and HST in the first case are 111.1 km and 106.7 km, with total costs of R$7.00 × 10⁹ and R$7.67 × 10⁹, the third case resulted in 94.5 km and 87.9 km for the respective technologies, with total costs of R$1.51 × 10⁹ and R$1.87 × 10⁹. The terrain configuration considerably impacts the total cost to build the alignment regardless of the railway technology, since the tunnel costs represent around 64% of the total costs for both technologies in Case 1, and 37.2% and 27.6% in Case 3 for HPT and HST, respectively.

These results directly affect the average cost per kilometer of the alignments. While Case 1 resulted in R$62.98 × 10⁶/km and R$71.87 × 10⁶/km for alignments suitable for HPT and HST, respectively, the respective values are R$42.33 × 10⁶/km and R$41.75 × 10⁶/km in Case 2, and R$15.94 × 10⁶/km and R$21.22 × 10⁶/km in Case 3. Thus, the average cost per km in Case 3 is only 25.3% of that in Case 1 for HPT, and 29.5% for HST, that is, a reduction of roughly 75% and 70%, respectively.
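The per-kilometer comparison between Case 1 and Case 3 can be checked with a few lines of arithmetic. The sketch below uses only the lengths, total costs and per-kilometer values quoted in the text; the dictionary layout and function names are illustrative assumptions, not part of the paper's framework.

```python
# Values reported in the text for the minimum-cost alignments.
# Each entry: (length in km, total cost in R$, reported average cost in R$/km).
reported = {
    ("Case 1", "HPT"): (111.1, 7.00e9, 62.98e6),
    ("Case 1", "HST"): (106.7, 7.67e9, 71.87e6),
    ("Case 3", "HPT"): (94.5, 1.51e9, 15.94e6),
    ("Case 3", "HST"): (87.9, 1.87e9, 21.22e6),
}


def cost_per_km(total_cost: float, length_km: float) -> float:
    """Average construction cost per kilometer of alignment."""
    return total_cost / length_km


def relative_to_case1(train: str) -> float:
    """Case 3 cost per km expressed as a fraction of the Case 1 value."""
    return reported[("Case 3", train)][2] / reported[("Case 1", train)][2]


for (case, train), (length_km, total, per_km) in reported.items():
    recomputed = cost_per_km(total, length_km)
    # Rounded inputs reproduce the reported figure only approximately.
    print(f"{case} {train}: recomputed {recomputed / 1e6:.2f}, "
          f"reported {per_km / 1e6:.2f} (R$ million/km)")

for train in ("HPT", "HST"):
    share = relative_to_case1(train)
    print(f"{train}: Case 3 is {share:.1%} of Case 1, a {1 - share:.1%} reduction")
```

Dividing the rounded totals by the rounded lengths reproduces the reported per-kilometer values to within about 0.3%. The Case 3 values come out at 25.3% (HPT) and 29.5% (HST) of the Case 1 values.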

Given the estimated costs to build alignments appropriate for HPT and HST, the geometric parameters required to operate these technologies may not be the most relevant cost driver, since earthwork, tunnels and bridges are the dominant cost items for a given terrain configuration.

In order to compare the estimated values, Fig. 24 presents the range of infrastructure construction costs required to operate High Speed Trains across different countries worldwide. Considering an average exchange rate of R$3.10 per euro (€) between 05/19/2014 and 01/19/2015 (BCB 2015), the minimum and maximum values observed correspond to R$14.6 × 10⁶/km (France) and R$204.3 × 10⁶/km (Italy), respectively.
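For readers who want to relate these figures back to euros, the conversion is a single division by the quoted exchange rate. This is a minimal sketch; the constant and function names are illustrative, and a single fixed rate is assumed, as in the text.

```python
BRL_PER_EUR = 3.10  # average exchange rate quoted in the text (BCB 2015)


def brl_to_eur(amount_brl: float, rate: float = BRL_PER_EUR) -> float:
    """Convert an amount in Brazilian reais to euros at a fixed rate."""
    return amount_brl / rate


# Range of HST construction costs per km quoted in the text, in R$/km.
quoted_range_brl = {"France (minimum)": 14.6e6, "Italy (maximum)": 204.3e6}

for label, brl in quoted_range_brl.items():
    print(f"{label}: about EUR {brl_to_eur(brl) / 1e6:.1f} million per km")
```

At this rate the quoted range corresponds to roughly €4.7 million to €65.9 million per kilometer.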

Fig. 24

Source Adapted from De Rus et al. (2009)

Average costs per kilometer for construction of new HST rail alignments.

Despite the specific features of the Brazilian terrain and the premises of the proposed parallel Genetic Algorithm framework, Fig. 24 shows that the average cost values to build the railway alignments obtained in this paper are close to international practice (De Rus et al. 2009). While the average construction cost per kilometer of HST alignments worldwide is R$78.4 × 10⁶/km, the average estimated values in Brazil are R$65.9 × 10⁶/km for HST and R$63.6 × 10⁶/km for HPT.

5 Conclusions

This paper provides a solution to the railway alignment optimization problem through a parallel Genetic Algorithm implemented on a high performance computing environment to estimate intercity alignments in Brazil under distinct land use and terrain configurations. The proposed approach was able to compute the costs to build new infrastructure to operate trains of 200 and 300 km/h, respectively defined in this paper as High Performance Trains (HPTs) and High Speed Trains (HSTs).

Although the problem can be solved with a single virtual machine running in the high performance computing environment, the elapsed times to reach the solutions decrease significantly as the number of machines available to compute the fitness function of the individuals increases. However, scaling the computing infrastructure beyond five nodes is not worthwhile, since the running times do not decrease significantly when more virtual machines are available. Thus, investing in such a large number of computers may not be the most adequate strategy to solve the problem.
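The idea of distributing the fitness evaluation over several workers can be illustrated with a small sketch. It is not the framework used in this paper: the cost function is a toy placeholder, the encoding of an individual is hypothetical, and a thread pool from the Python standard library stands in for the virtual machines. In a real CPU-bound setting a process pool or a cluster scheduler would take its place.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical encoding: one offset value per intersection point.
Individual = Sequence[float]


def toy_cost(individual: Individual) -> float:
    """Placeholder fitness: sum of squared offsets (not a real alignment cost)."""
    return sum(x * x for x in individual)


def evaluate_population(
    population: List[Individual],
    fitness: Callable[[Individual], float],
    workers: int,
) -> List[float]:
    """Evaluate every individual, spreading the calls over `workers` workers.

    Executor.map returns results in input order, so the output does not depend
    on how many workers are used or on which worker finishes first.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))


population = [[0.0, 1.0, 2.0], [3.0, 0.5, 1.0], [1.0, 1.0, 1.0], [2.0, 2.0, 0.0]]

costs_one_worker = evaluate_population(population, toy_cost, workers=1)
costs_five_workers = evaluate_population(population, toy_cost, workers=5)

# The selected individual is the same regardless of the worker count.
best = min(range(len(population)), key=costs_five_workers.__getitem__)
print(costs_one_worker == costs_five_workers, population[best])
```

Since each fitness value depends only on the individual being evaluated, the number of workers changes the wall-clock time but not the costs or the individual selected.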

While the length of the alignments remains almost the same when the average distance between intersection points is higher (because of slight differences in the number of horizontal and vertical curves), the total cost decreases with a higher density of points (i.e., a lower average distance between intersection points), since tunnels and bridges are replaced by cuts and embankments.

Through the application of the model to three real-world cases, we showed that the alignment costs are not affected by the number of nodes available to run the Genetic Algorithm and that the high performance computing infrastructure affects only the running times and not the problem solution itself.

As expected, the alignments adequate to operate HPT and HST have fewer curves in flat terrain, such as in the studied cases of Campinas-Poços de Caldas and Araraquara-Ribeirão Preto. However, in these scenarios the shape of the alignments differs between train technologies. On the other hand, the estimated alignments between Rio de Janeiro and Juiz de Fora are similar across train technologies because the mountainous terrain between the cities requires more tunnels and bridges regardless of the type of train.

Despite the capabilities of the parallel GA framework to solve the RAO problem, it can still benefit from technical improvements. For instance, the total estimated costs could include operating costs through a two-level approach in which RAO solutions are followed by a train performance simulation. Moreover, methodological research could be carried out to investigate the performance of a concurrent computing approach compared to the parallel environment described in this paper.