
1 Introduction

The vehicle routing problem with time windows (VRPTW) is a two-objective NP-hard discrete optimization problem. Its primary objective is to determine the minimal number of homogeneous vehicles required to serve customers dispersed across the map; in addition, the total travel distance is to be minimized. A solution of the VRPTW is feasible if (i) all customers are visited within their time windows and (ii) the capacities of the vehicles are not exceeded.

The applications of the VRPTW cover a wide range, including food, cash and parcel delivery, school bus and airline fleet routing, rail distribution and more. Thus, a number of exact and approximate algorithms to solve the VRPTW have been proposed over the years. The exact approaches incorporate dynamic programming, branch-and-bound and other techniques [5]. Due to the NP-hardness of the VRPTW, numerous approximate algorithms have emerged to solve it in acceptable time. Heuristic algorithms improving an initial solution (improvement heuristics) and constructing a feasible solution from scratch (construction heuristics) have been explored [4, 6, 11, 12]. More recently, metaheuristics, which incorporate search-space exploration and exploitation mechanisms and allow for temporary deterioration of a solution, have emerged, including sequential and parallel evolutionary, genetic and memetic algorithms (GAs and MAs) [8, 13, 14]. MAs are population-based methods that combine GAs, which explore the solution space, with local search algorithms that exploit already found solutions.

Here we study co-operation schemes for our two-stage parallel memetic algorithm for solving the VRPTW [1, 2, 9, 10]. We present the experimental results obtained for selected tests from the Gehring and Homberger benchmark. The paper is organized as follows. Section 2 formulates the VRPTW. The parallel memetic algorithm and the co-operation schemes are described in Sect. 3. The experimental study is reported in Sect. 4. Section 5 concludes the paper.

2 Problem Formulation

Let \(G=(V,E)\) be a directed graph with a set \(V\) of \(C+1\) vertices and a set of edges \(E=\{(v_i,v_j)\,|\,v_i,v_j\in V, v_i \ne v_j\}\). The VRPTW is defined on the graph \(G\), where each customer \(v_i\), \(i\in \{1,2,...,C\}\) (and the depot \(v_0\)) is given as a vertex, and each edge represents a travel connection with the cost \(c_{i,j}\), \(i,j \in \{0,1,...,C\}\), \(i \ne j\). Customers define their demands \(d_i\), \(i \in \{0,1,...,C\}\), \(d_i \ge 0\), \(d_0=0\), time windows \([e_i, l_i]\), \(i \in \{0,1,...,C\}\), and service times \(s_i\), \(i \in \{1,2,...,C\}\). Let \(Q\) be a constant vehicle capacity and \(K\) denote the number of vehicles. Each route \(r\), \(r=(v_0,v_1,...,v_{n+1})\), starts and finishes at the depot, thus \(v_0 = v_{n+1}\). A solution \(\sigma \) (with \(K\) routes) is feasible if (i) the vehicle capacity \(Q\) is not exceeded for any vehicle, (ii) the service of every customer starts within its time window, (iii) every customer is served in exactly one route, and (iv) every vehicle leaves and returns to the depot within the time window \([e_0, l_0]\).
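To make the feasibility conditions concrete, the following sketch checks a single route against constraints (i), (ii) and (iv). It is an illustration only, not the paper's implementation: all names are hypothetical, and travel times are assumed equal to the costs \(c_{i,j}\).

```python
# Sketch of a single-route feasibility check for the VRPTW. A route is a
# list of vertex indices starting and ending at the depot (index 0).
# All names (route_feasible, travel, ...) are illustrative, not from the paper.

def route_feasible(route, demand, e, l, s, travel, Q):
    """Return True iff the route respects capacity Q and all time windows.

    route  -- [0, v1, ..., vn, 0]
    demand -- demand d_i per vertex (d_0 = 0)
    e, l   -- earliest/latest service start per vertex
    s      -- service time per vertex (s_0 = 0)
    travel -- travel(i, j) gives the cost/time c_{i,j}
    """
    # (i) the vehicle capacity must not be exceeded
    if sum(demand[v] for v in route) > Q:
        return False
    # (ii), (iv) service must start within [e_i, l_i]; waiting is allowed
    t = e[route[0]]                      # leave the depot at its opening time
    for prev, cur in zip(route, route[1:]):
        t = max(t + s[prev] + travel(prev, cur), e[cur])  # arrive, wait if early
        if t > l[cur]:                   # too late to start service
            return False
    return True
```

Condition (iii), that every customer appears in exactly one route, is a property of the whole solution and must be checked across routes.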

The primary objective of the VRPTW is to minimize the number of routes \(K\) (note that \(K \ge \left\lceil D/Q\right\rceil \), where \(D=\sum _{i=1}^C{d_i}\)). In addition, the total travel distance \(T=\sum _{i=1}^{K} T_i\) is to be minimized, where \(T_i\) is the travel distance of the \(i\)-th route. Let \(\sigma _A\) and \(\sigma _B\) be two feasible solutions. Then, \(\sigma _A\) is of higher quality if (\(K(\sigma _A) < K(\sigma _B)\)) or (\(K(\sigma _A) = K(\sigma _B)\) and \(T(\sigma _A) < T(\sigma _B)\)).
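These two criteria can be captured in a few lines. The sketch below (hypothetical names; solutions abstracted to \((K, T)\) pairs) implements the lexicographic quality comparison and the trivial demand-based lower bound on \(K\):

```python
import math

def better(sol_a, sol_b):
    """Return True iff sol_a is of higher quality: fewer routes K,
    or an equal K and a shorter total travel distance T.
    Solutions are abstracted to (K, T) pairs."""
    (Ka, Ta), (Kb, Tb) = sol_a, sol_b
    return Ka < Kb or (Ka == Kb and Ta < Tb)

def route_lower_bound(demands, Q):
    """Trivial lower bound K >= ceil(D / Q) on the number of routes,
    where D is the total customer demand and Q the vehicle capacity."""
    return math.ceil(sum(demands) / Q)
```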

3 Parallel Memetic Algorithm and Co-operation Schemes

We previously proposed, and later improved [1, 3, 9], a two-stage parallel memetic algorithm (PMA) in which the number of routes is minimized by a parallel heuristic algorithm based on the approach suggested by Nagata and Bräysy [7]. The total travel distance is minimized using a parallel memetic algorithm; the underlying sequential algorithm was proposed by Nagata et al. [8] and was subsequently improved by us [9, 10]. In this section we give an overview of the PMA and describe the proposed co-operation schemes, which are then experimentally evaluated.

3.1 Parallel Memetic Algorithm Outline

The PMA consists of two stages in which the number of vehicles and the total travel distance are minimized independently (Algorithm 1). First, the number of routes is minimized by the parallel heuristic algorithm executed by \(N\) processes (line 1). Then, an initial population of \(N_p\) feasible solutions, each consisting of \(K\) routes, is generated for each parallel process (lines 2–4). A solution with the minimal total travel distance is found using a parallel memetic algorithm (lines 5–24).

Algorithm 1. The two-stage parallel memetic algorithm (PMA).

In the first stage (Algorithm 2), an initial solution in which every customer is served in a separate route is subjected to the route minimization (RM). At each step, a random route \(r\) is selected in \(\sigma \), its customers are inserted into the ejection pool (EP) (line 5), and the penalty counters \(p\), indicating the re-insertion difficulty, are initialized (line 6). Then, a customer \(v\) is popped from the EP (line 8). If there exist several feasible insertion positions for \(v\), then a random one is chosen (lines 9 and 10), where \(S(v,\sigma )\) is the set of feasible insertions of \(v\). Otherwise \(v\) is inserted infeasibly, and \(\sigma \) is squeezed to restore its feasibility (line 12). If the squeezing fails, then additional customer ejections minimizing the sum of their penalty counters are tested (lines 15–17) and \(\sigma \) is perturbed by the local search moves (LSM) for search diversification (line 18). Let \(\mathcal {N}(v,\sigma )\) be the neighborhood of \(v\) in \(\sigma \), obtained by applying the following operators (moves): 2-opt*, OR-opt, out-relocate, in-relocate and exchange [1]. In order to decrease the search space, only the \(N_N=50\) customers nearest to \(v\) are considered. Then, the best feasible solution \(\sigma '\), \(\sigma '\in \mathcal {N}(v,\sigma )\), replaces \(\sigma \), and the processes co-operate (line 20).
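The ejection test of lines 15–17 of Algorithm 2 can be illustrated as follows. This is a simplified enumeration, not the pruned search used in the actual RM heuristic; `makes_feasible` is a hypothetical predicate standing in for the check that ejecting the subset allows a feasible insertion of \(v\):

```python
from itertools import combinations

def best_ejection(route_customers, penalty, max_eject, makes_feasible):
    """Pick up to max_eject customers whose combined penalty counters are
    minimal, among ejections that make the pending insertion feasible.
    A real implementation prunes this search; here we enumerate for clarity."""
    best, best_cost = None, float("inf")
    for k in range(1, max_eject + 1):
        for subset in combinations(route_customers, k):
            if not makes_feasible(subset):
                continue                       # ejection does not help
            cost = sum(penalty[v] for v in subset)
            if cost < best_cost:               # smaller penalty sum wins
                best, best_cost = subset, cost
    return best, best_cost
```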

Once the minimal number of routes \(K\) is found, a population of \(N_{p}\) feasible solutions is created for each process \(P_{i}\) (Algorithm 1, lines 2–4). Then, according to a pre-selection scheme [10], \(N_{p}\) pairs of chromosomes (\(\sigma _A,\sigma _B\)) are determined (line 8), and for each pair \(N_{c}\) children \(\sigma _{c}\) are generated using the edge assembly crossover (EAX) operator (line 12). If \(\sigma _{c}\) is infeasible, then it is repaired (if possible) and enhanced by the LSM (line 13). The best feasible child \(\sigma ^b_c\), i.e. the one with the shortest total travel distance, is found for each pair (\(\sigma _A,\sigma _B\)) (lines 14–16). The next population is formed (line 19) according to the post-selection scheme [10], and the processes co-operate (line 20). Finally, the best individual is returned (line 24). The detailed descriptions of the algorithms can be found in [1, 10].
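One generation of the distance-minimization stage can be sketched roughly as below. The helpers `eax` and `repair_and_ls`, and the simplistic neighboring-pairs pre-selection, are placeholders for the EAX operator, the repair/local-search step and the pre- and post-selection schemes of [10]:

```python
def distance_stage_generation(population, eax, repair_and_ls, distance, n_children):
    """One generation of the distance-minimization stage (a rough sketch).

    eax(a, b)        -- crossover producing a (possibly infeasible) child
    repair_and_ls(c) -- repair + local search; returns None on failure
    distance(sigma)  -- total travel distance T of a solution
    """
    next_pop = []
    # Simplistic pre-selection: pair each solution with its neighbor.
    pairs = zip(population, population[1:] + population[:1])
    for sa, sb in pairs:
        best_child = sa                        # fall back to the parent
        for _ in range(n_children):            # generate N_c children per pair
            child = repair_and_ls(eax(sa, sb))
            if child is not None and distance(child) < distance(best_child):
                best_child = child             # keep the shortest feasible child
        next_pop.append(best_child)            # stand-in for the post-selection
    return next_pop
```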

Algorithm 2. The route minimization (RM) heuristic.

3.2 Co-operation Schemes

The \(N\) processes in the PMA co-operate periodically in order to exchange the best solutions found so far and to guide the search towards solutions of better quality. Let \(\delta _{R}\) and \(\delta _{D}\) denote the periods of process co-operations during the route-number and total-travel-distance minimization stages, respectively. The proposed co-operation schemes explored and experimentally evaluated in this work are characterized as follows:

1. Rigid. Each process \(P_{i}\) runs both optimization stages independently. After each stage all solutions are compared and the best one is chosen. This approach can be considered as the independent multi-search method, since the processes do not co-operate during the PMA execution.

2. Knowledge synchronization (KS). In the first stage, the processes send their best solutions to the master process \(P_{1}\), which determines the best one \(\sigma _{b}\) and broadcasts it back to each process. In the second stage, each process \(P_{i}\) sends its best solution \(\sigma ^{b}_{i}\) to the master, which sorts the \(N\) received solutions and randomly selects \(N_b\), \(1\le N_b<N\), of them for each process \(P_{i}\). Finally, every process \(P_{i}\) replaces its \(N_b\) worst solutions with those received from the master.

3. Ring. Let \((P_1,P_2,\dots ,P_N)\) be the order in which the processes co-operate. Then, the ring co-operation scheme is given as \(P_{1}\rightarrow P_{2} \rightarrow \cdots \rightarrow P_{N} \rightarrow P_{1}\). Process \(P_{1}\) sends its best solution (in both stages) to \(P_{2}\), which replaces its own best solution if the received one is of higher quality, and then sends its best solution to \(P_{3}\), and so on. As the scheme constitutes a ring (\(P_{N}\) finally co-operates with \(P_{1}\)), it is ensured that the best solution found so far is kept by the master process \(P_{1}\).

4. Randomized EAX (R-EAX). The scheme is similar to the ring scheme; however, if the solution received by process \(P_{i}\) is of lower quality than the current best one of \(P_{i}\), then the EAX operator is applied to these two solutions to generate a child \(\sigma _{c}\). If \(\sigma _{c}\) is not feasible, then its feasibility is restored by the repairing moves. Finally, if \(\sigma _{c}\) is feasible, then it replaces the best solution kept by \(P_{i}\) (in both stages). The order of co-operation is randomly determined by the master process \(P_1\) at the beginning of each co-operation phase.

5. Pool. In the first stage, process \(P_1\) maintains a pool of size \(s\), \(1\le s\le N\), of solutions with the currently minimal number of routes \(K_P\). If \(\sigma \) is the best solution kept by process \(P_{i}\) and \(K(\sigma )>K_P\), then it is replaced by a random pool solution \(\sigma _P\). If \(K(\sigma )=K_P\), then \(\sigma _P\) replaces \(\sigma \) with a probability \(p\), \(0<p<1\). If process \(P_{i}\) finds a feasible solution \(\sigma \) with \(K(\sigma )<K_P\), then the pool is emptied and \(\sigma \) is sent to \(P_{1}\) to form a new pool of size 1. Each process \(P_{i}\) sends its best solution \(\sigma \) with \(K_P\) routes to \(P_1\) to replace a random pool solution, but only if \(\sigma \) is not already in the pool. In the second stage, each process \(P_{i}\) sends to \(P_{1}\) the best \(\sigma \) in its population (of size \(N_{p}\)), and it is inserted into the pool as in the first stage. Process \(P_1\) then determines the \(\eta N\), \(0<\eta <1\), \(\eta N\le N_{p}\), best pool solutions, which replace \(\eta N\) random ones in the population of each \(P_{i}\).

6. Pool with EAX (P-EAX). The regular pool scheme is enhanced by applying the EAX operator. In the first stage, if the solution \(\sigma _{P}\) received by process \(P_i\) contains the same number of routes as the best solution \(\sigma \) kept by \(P_i\), and \(\sigma _{P}\ne \sigma \), then the EAX operator is applied to generate a child \(\sigma _c\), whose feasibility is restored if necessary. At most \(N_c\) children are generated, until a feasible one \(\sigma _c\) is obtained; then \(\sigma _c\) replaces \(\sigma \). In the second stage, the EAX operator is employed while replacing the \(\eta N\) solutions received by process \(P_i\): a solution \(\sigma _a\) that is being replaced is crossed-over with the replacing solution \(\sigma _b\), \(\sigma _a \ne \sigma _b\), to generate \(\sigma _c\). If \(\sigma _c\) is feasible, then it is inserted into the population instead of \(\sigma _b\).
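To make the schemes more concrete, the ring scheme (scheme 3) can be simulated sequentially as below. In the PMA the hand-offs are realized with asynchronous MPI messages, which this sketch deliberately omits; all names are illustrative:

```python
def ring_cooperation(best_solutions, better):
    """Sequential simulation of one ring co-operation phase.

    best_solutions[i] -- best solution held by process P_{i+1}
    better(a, b)      -- True iff solution a is of higher quality than b
    """
    n = len(best_solutions)
    msg = best_solutions[0]                    # P_1 starts the hand-off
    for i in list(range(1, n)) + [0]:          # P_2, ..., P_N, and back to P_1
        if better(msg, best_solutions[i]):
            best_solutions[i] = msg            # adopt the received solution
        else:
            msg = best_solutions[i]            # forward own, better solution
    return best_solutions
```

After the phase every process holds a solution at least as good as its own, and the master \(P_1\) ends up with the best solution found so far, as the scheme guarantees.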

The time complexities of the considered co-operation schemes (i.e. of a single co-operation phase) are presented in Table 1 (it can be shown [1] that the EAX operator takes \(O(C^2)\) time, where \(C\) is the number of customers in each parent solution). It is worth noting that data are transferred asynchronously between the processes. This means that the processes proceed with the algorithm execution while the communication progresses, i.e. computation and communication overlap. Furthermore, a two-step data passing is applied: the complete solution is transferred only if its quality has improved since the previous co-operation.
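The two-step data passing can be sketched as a small wrapper. Here `send_header` and `send_body` stand in for the asynchronous MPI calls, and lower quality values are assumed to be better (a shorter total travel distance); the class and its names are hypothetical:

```python
class TwoStepSender:
    """Sketch of the two-step data passing: a lightweight quality header is
    always exchanged, but the complete solution is transferred only if its
    quality improved since the previous co-operation."""

    def __init__(self, send_header, send_body):
        self.send_header = send_header
        self.send_body = send_body
        self.last_quality = None               # nothing sent yet

    def cooperate(self, quality, solution):
        self.send_header(quality)              # small message, always sent
        improved = self.last_quality is None or quality < self.last_quality
        if improved:
            self.send_body(solution)           # full solution only on improvement
            self.last_quality = quality
        return improved
```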

Table 1. Time complexities of a single co-operation phase for the first (\(\mathcal {T}_1\)) and the second (\( \mathcal {T}_2\)) optimization stages; \(N\) – number of processes, \(C\) – number of customers.

4 Experimental Results

In this section we present the settings of the PMA, describe the Gehring and Homberger’s (GH) tests solved by the PMA, and discuss the experimental results obtained for each proposed co-operation scheme. The analyses of the quality of solutions, search convergence and execution time of the PMA are also provided.

4.1 Settings

The proposed co-operation schemes were tested in the PMA on the 400-customer GH tests. We also ran the PMA on some larger tests with 600 and 800 customers; however, to draw meaningful conclusions, it is necessary to investigate all instances in a subclass, thus we focus on the analysis of the results obtained for the 400-customer problems. The GH tests are divided into six subclasses: C1, C2, R1, R2, RC1 and RC2. Customers are: (i) clustered (C class), (ii) randomly dispersed on the map (R class), or (iii) mixed, both clustered and random (RC class). The subclasses C1, R1 and RC1 have smaller vehicle capacities and shorter time windows than C2, R2 and RC2. There are 10 problem instances in each subclass and five problem sizes (from 200 to 1000 customers), resulting in 300 GH instances in total. Tests can be distinguished by their unique names, \(\alpha \_\beta \_\gamma \), where \(\alpha \) denotes the subclass (C1, C2, R1, R2, RC1, RC2), \(\beta \) relates to the number of customers (2 for 200 customers, 4 for 400, and so forth) and \(\gamma \) is the instance number (\(\gamma =1,2,\dots ,10\)).

The PMA was implemented in C++ using the Message Passing Interface (MPI) library and the experiments were performed on the Galera supercomputer. Let \(t_{T}=t_{R}+t_{P}+t_{D}\), \(t_{T}\le t_{max}\), be the total execution time of the PMA, where \(t_{R}\) is the time necessary to find the minimum number of routes (Algorithm 2), \(t_{P}\) is the time of generating a population of solutions of size \(N_{p}\) (Algorithm 1, lines 2–4), and \(t_{D}\) is the time of the total distance minimization (Algorithm 1, lines 5–24). The PMA was run on 32 cores (4 nodes equipped with 8 cores each), each core running a single process, with \(t_{max}=240\,\mathrm{min}\). The PMA parameters were tuned experimentally to the following values: \(N_{p}=60\), \(N_{c}=10\), \(\eta N_p=N_b=4\) (\(\eta \approx 0.07\)), \(p=0.5\), \(\delta _{R}=200000\), \(\delta _{D}=40\). The values of the other PMA parameters, along with a discussion of their influence on the final results, can be found in [1, 9].

4.2 Analysis and Discussion

In order to verify the speed of search convergence and the quality of solutions obtained using the PMA, each 400-customer GH test (60 tests in total) was run \(n\) times, \(7\le n\le 10\), with each co-operation scheme. As mentioned earlier, the first objective of the VRPTW is to minimize the number of routes. Thus, the number of routes \(K\) obtained from the first stage of the PMA was compared with the currently best-known one, \(K_b\). The minimal number of routes \(K_b\) was achieved for all 400-customer tests using the PMA. However, in the case of the independent multi-search method, i.e. the rigid co-operation, the PMA did not converge to the best \(K_b\) within time \(t_{max}\), and \(K\) was larger by one route for several tests from the C1, C2 and RC2 subclasses. This indicates that the co-operation of processes is crucial for converging to the highest-quality solutions.

For each GH test, the average best total travel distance obtained using each proposed co-operation scheme was calculated; it allows for evaluating the quality of solutions containing the same number of routes \(K\). Then, these averages were used to calculate the average total distance for each GH subclass. The averaged best distances achieved for all 400-customer GH tests (tests for which \(K>K_b\) are omitted for the rigid co-operation) are presented in Table 2. The experiments showed that the ring and KS co-operations outperformed the other schemes and gave the best average total travel distances. The percentages of the average total travel distances (%Best) shown in Table 2 are calculated with respect to the currently best-known ones. If, within a certain subclass, a solution better (i.e. with a shorter total travel distance) than the one obtained using the PMA exists, then the percentage is larger than 100.00 %. The R1 and RC1 subclasses turned out to be the most difficult, with %Best equal to 101.07 and 100.69, respectively. The detailed results for the R1 subclass, given in Table 3, show that the ring, KS and R-EAX schemes outperformed the other co-operation schemes.

Table 2. Average total travel distances for all 400-customer GH tests (the best average total travel distance within a subclass is marked in boldface).
Table 3. Average total travel distances for the R1 tests (the best average total travel distance for each test is marked in boldface).
Fig. 1. Average execution time \(t\) in seconds and average generation \(g\).

A number of factors, e.g. the population size or the number of ejected customers \(M_e\) (Algorithm 2, line 17), can make the execution time of the PMA arbitrarily large. Here we compare, for each GH subclass, the average execution time \(t\) after which the best total travel distance (among the \(N\) processes) was not further improved and the search converged (Fig. 1(a–c)). The total average execution time \(t\) of the PMA is given in Fig. 1(d). We also present the average number of generations \(g\). Intuitively, the PMA with the KS scheme requires the shortest execution time to converge, since the search is guided towards the most promising parts of the solution space, which are intensively exploited. This, however, leads to getting stuck in local minima (KS was outperformed by the ring scheme for most GH tests, see Table 2). The best solutions propagate in the case of the ring and R-EAX schemes, but the search is diversified, since only the best individuals are updated. Applying the EAX operator in the latter scheme increases the execution time of the PMA. However, it is easy to see that \(t\) is smaller for the P-EAX scheme than for the regular pool. Thus, the quality of the initial populations and the guiding of the search strongly influence the convergence of the total distance minimization stage. In the PMA with the rigid co-operation, each process minimizes the travel distance independently. Clearly, its average number of generations is significantly larger than in the case of the co-operative methods (Fig. 1).

Table 4. New world’s best results for Gehring and Homberger’s tests; in expression \(x/y\), \(x\) denotes the number of routes and \(y\) the total travel distance.

The study indicated that the GH tests with wider time windows and larger vehicle capacities (C2, R2 and RC2) can be solved faster than those with shorter time windows and smaller capacities (C1, R1 and RC1). The tests with random customers (R1, R2) are more difficult to solve to high quality than the clustered-customer tests (C1, C2). It is worth noting that applying the R-EAX scheme resulted in improving the world's best total travel distances \(T_b\) for two GH tests and decreasing the best-known number of routes for two others, whereas the ring scheme decreased \(T_b\) for one GH test (Table 4).

5 Conclusions and Future Work

In this work we proposed new co-operation schemes for the parallel memetic algorithm solving the NP-hard vehicle routing problem with time windows. We showed how the solution quality, execution time and number of generations are influenced by the choice of the scheme. The experiments performed on the Gehring and Homberger benchmark showed the ring and knowledge synchronization schemes to be the best in terms of the search convergence speed and the quality of final solutions. However, it is the randomized EAX scheme that allowed for finding four new world's best solutions (one more was found using the ring scheme). We showed that the independent multi-search method gave the worst results, which indicates that the search should be guided during the algorithm execution.

Our ongoing research includes performing the full set of GH tests, investigating the influence of the co-operation frequency on the solution quality and execution time, and conducting a sensitivity analysis for its automatic tuning. Finally, our aim is to further improve the rigid, KS and R-EAX co-operation schemes.