1 Introduction

The maximum diversity problem (MDP) consists of finding a subset of elements with maximum diversity according to a function that defines the diversity between any two elements of a set. An example of this problem is finding the subset of students in a classroom that has maximum diversity among its members. The diversity between any two students might be stored in a matrix and defined according to parameters such as gender, age, grades, and nationality (Sect. 2).

The contributions of this paper are (i) a memetic self-adaptive evolution strategy (MSES) (Sect. 3), (ii) an analysis of the benchmark instances and the problem difficulty (Sect. 4), and (iii) a comparison of results between the MSES and the best methods available in the literature (Sect. 5). In Sect. 6, we draw conclusions from the comparison of the algorithms.

2 Maximum diversity problem

The MDP consists of finding a subset \(M\) (\(|M| = m\)) of a set \(N\) (\(|N| = n\)) such that the sum of the diversities amongst the \(m\) chosen elements is maximized. Many diversity relations can be used to define the values \(d_{ij}\), according to the practical application of the MDP. The problem is concisely described [11] by formulation (1), where \(x_i = 1\) if element \(i\) is in the subset \(M\).

$$\begin{aligned} \mathrm{Maximize\ }&\sum \limits _{i=1}^{n-1} \sum \limits _{j=i+1}^{n} d_{ij} x_{i} x_{j}, \quad \mathrm{subject\ to\ } \sum \limits _{i=1}^{n} x_i = m \\ \mathrm{where\ }&x_i \in \{0,1\} \, \forall i=1,\ldots ,n \end{aligned}$$
(1)
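To make the formulation concrete, the following minimal Python sketch, assuming a symmetric diversity matrix with zero diagonal, evaluates the objective of formulation (1) for a 0/1 incidence vector; it is illustrative only, and the toy instance data are made up.

```python
import numpy as np

def mdp_objective(d, x):
    """Objective of formulation (1): sum of d_ij over pairs i < j selected by x."""
    x = np.asarray(x, dtype=float)
    # d is symmetric with a zero diagonal, so x^T d x counts each pair twice.
    return 0.5 * x @ d @ x

# Toy instance with n = 4 and m = 2: the best pair is elements 1 and 4
# (indices 0 and 3), with diversity 9.
d = np.array([[0, 4, 2, 9],
              [4, 0, 5, 1],
              [2, 5, 0, 3],
              [9, 1, 3, 0]], dtype=float)
print(mdp_objective(d, [1, 0, 0, 1]))  # prints 9.0
```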

Applications of the MDP [14] include facility location, environmental systems, medical treatments, genetics, and design. The clique problem can be reduced to the MDP [11], which makes the problem NP-hard and justifies the interest in metaheuristics for obtaining solutions in reasonable time. Methods for the problem include Greedy Randomized Adaptive Search Procedures (GRASP) [8, 16, 18, 19], Variable Neighborhood Search (VNS) [3, 6, 9, 18], Simulated Annealing [1], the Lotfi–Cerveny–Weitz heuristic [22], Tabu Search [7, 15, 20], an Iterated Greedy Algorithm [12], branch-and-bound [2, 13], Hopfield Networks [21], and Scatter Search [17].

In an extensive comparison of methods for the MDP [14], even simple heuristics achieved good results. However, for high-quality solutions, a VNS [6] and an Iterated Tabu Search (ITS) [15] were the best methods on the benchmark instances. In a more recent paper, a Learnable Tabu Search (LTS) [20] appears as the best approach.

3 Memetic self-adaptive evolution strategies

Evolution strategies (ES) are a class of evolutionary algorithms that rely primarily on mutation [4]. The notation ES\((\mu ,\lambda )\) stands for an ES with \(\mu \) parents and \(\lambda \) children, each of these individuals being a candidate solution to the problem.

The algorithm is based on generations that repeat until a halting criterion is met. In a generation, the \(\mu \) parents are mutated to produce \(\lambda \) children. Afterwards, the best \(\mu \) individuals become the new parents. A common adaptation of ES is to include a step size, or mutation strength, variable \(\sigma _i\) associated with each individual \(i\) [4]. This variable is often adjusted through self-adaptation, that is, by also applying secondary mutation operators to the associated \(\sigma _i\) values and expecting the best configurations to survive over time. An ES with this adjustment is called a self-adaptive ES.
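As an illustration of this generational scheme, the sketch below shows a generic self-adaptive \((\mu + \lambda )\) loop in Python. It is a minimal outline, not the method proposed in this paper: `init`, `mutate`, `fitness`, and `done` are hypothetical placeholders for the operators detailed in the rest of this section.

```python
import random

def es_plus(init, mutate, fitness, mu, lam, done):
    """Generic self-adaptive (mu + lambda)-ES skeleton (illustrative only).

    Each individual is a (solution, sigma) pair; `mutate` returns a new
    pair, so the step size sigma evolves together with its solution.
    """
    parents = [init() for _ in range(mu)]
    while not done():
        children = [mutate(random.choice(parents)) for _ in range(lam)]
        # Plus selection: the best mu of parents and children survive.
        parents = sorted(parents + children,
                         key=lambda ind: fitness(ind[0]),
                         reverse=True)[:mu]
    return max(parents, key=lambda ind: fitness(ind[0]))
```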

We propose a memetic self-adaptive evolution strategy (MSES), presented in Fig. 1, that includes local searches [15, 20], self-adaptation of the \(\sigma _i\) mutation parameter [4, 6, 15], and a low-probability crossover operator. The source code of the algorithm is available online from the authors, and the next paragraphs explain all details of this pseudocode, which is adaptable to other problems.

Fig. 1 Memetic self-adaptive evolution strategy, MSES\((\mu +\lambda )\)

To save computational cost, line 1 defines indices that mark which individuals are the parents and which are the children; thus, when \(\mu \) new individuals become parents, only the indices need to change. In line 2, the crossover probability \(c_p\), the number of parents per crossover \(\rho \) (set to \(2\)), and the range \(\alpha \) of the perturbation used in the strong local search are initialized.

The initial \(\mu \) parent individuals are generated in line 3 as uniformly distributed random solutions. Each individual represents a solution to the problem as a vector of \(n\) elements \(x_i\) with \(\sum _{i=1}^n x_i = m\), as in Eq. 1. All initial individuals already undergo a weak local search in line 4. Given that a neighbor of a solution is reached by exchanging one element of \(M\) for one element of \(N \backslash M\), the weak local search is a first-improvement local search [10] that tests the neighbors in random order and moves to any improving neighbor, until no neighbor is able to improve the current solution.

For a more efficient local search, only the benefit \(\Delta \) of each neighbor over the current solution is analyzed [15]. The benefit \(\Delta _{qr}\) of removing the element \(q\) and including the element \(r\) is \(d_{r.} - d_{q.} - d_{qr}\), where \(d_{x.} = \sum _{i \in M} d_{xi}\). The values \(d_{i.}\) of all elements need to be calculated only once per iteration.
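The sketch below is a schematic Python reimplementation of the weak local search under the definitions above; it is illustrative rather than the implementation used in the experiments (Sect. 5), and all names are our own.

```python
import random

def weak_local_search(d, M):
    """First-improvement swap search: exchange one element of M for one of N\\M.

    With d_q. = sum_{i in M} d[q][i], swapping q in M for r outside M
    changes the objective by d_r. - d_q. - d[q][r].
    """
    n = len(d)
    M = set(M)
    improved = True
    while improved:
        improved = False
        dot = [sum(d[x][i] for i in M) for x in range(n)]  # d_x. for every x
        moves = [(q, r) for q in M for r in range(n) if r not in M]
        random.shuffle(moves)                  # visit the neighbors in random order
        for q, r in moves:
            if dot[r] - dot[q] - d[q][r] > 0:  # first improving neighbor found
                M.remove(q)
                M.add(r)
                improved = True
                break                          # the d_x. values are now stale
    return M
```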

The initial solutions \(i\) receive their \(\sigma _i\) in line 5. The range of possible \(\sigma \) values goes from \(\sigma _{\min } = 1\) to \(\sigma _{\max } = \min (m, n-m)\), but all individuals start with \(\sigma _{\min }\) at this point. These values are later self-adapted to determine the mutation strength. The iterative evolution begins in line 6 and repeats until the halting criteria are met; in this work, the only criterion considered is elapsed time.

At each generation, a second loop beginning in line 7 generates each of the \(\lambda \) new children. Line 8 defines whether the new child is created from a mutation alone, applied to one parent, with probability \(1 - c_p\), or from a crossover of \(\rho \) parents followed by mutation, with probability \(c_p\). If the new child is produced through crossover, a new process starts in line 9: \(\rho = 2\) parents are randomly chosen with uniform distribution and their indices are stored in \(\mathbf z \). The child \(i\) is then generated through crossover on the \(\rho \) parents in \(\mathbf z \).

In the crossover operator, given that each parent \(j\) represents a subset \(M_j\), the child's \(M_i\) is initially formed by the intersection \(\cap M_{\rho }\) of all \(M_j\) with \(j \in \mathbf z \). At this point, if \(|\cap M_{\rho }| = |M_i| = m\), the child is returned. Otherwise, if \(|\cap M_{\rho }| = |M_i| < m\), the operator selects \(m - |M_i|\) random elements from \(\cup M_{\rho } \backslash \cap M_{\rho }\) to include in \(M_i\), where \(\cup M_{\rho }\) is the union of all \(M_j\) with \(j \in \mathbf z \).
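A compact Python sketch of this operator follows, assuming each parent is represented by its set \(M_j\); the fill step draws uniformly from the union minus the intersection, as described above.

```python
import random

def crossover(parent_sets, m):
    """Child = intersection of the parents' subsets, filled up to m
    elements drawn at random from (union \\ intersection)."""
    inter = set.intersection(*parent_sets)
    union = set.union(*parent_sets)
    child = set(inter)
    if len(child) < m:
        child |= set(random.sample(sorted(union - inter), m - len(child)))
    return child

# Example with m = 3: the child keeps {1, 2} and gains one of {3, 5}.
print(crossover([{1, 2, 3}, {1, 2, 5}], 3))
```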

After crossover, the \(\sigma _i\) of the new child \(i\) is also inherited from the \(\sigma \) of its parents in line 12: the new \(\sigma _i\) is the mean \(\sigma \) of the parents rounded to the nearest integer. On the other hand, if only mutation is used to generate the new individual, a different process begins in line 14: a single individual \(z\) is chosen as the parent, and the child \(i\) starts as a copy of \(z\). Its \(\sigma _i\) is likewise a copy of \(\sigma _z\).

Regardless of whether crossover was applied, all generated individuals undergo mutation in line 18. The mutation operator randomly exchanges \(\sigma _i\) elements of \(M_i\) for elements of \(N \backslash M_i\). Each generated individual then undergoes the weak local search in line 19, which avoids underestimating the potential of solutions mutated with a high \(\sigma \). From this point, the \(\sigma \) values also have to be adjusted. If crossover was applied and the child is better than any of its parents, its \(\sigma _i\) returns to the minimum value \(\sigma _{\min }\) in line 22. If crossover was applied and there was no improvement, nothing happens and \(\sigma _i\) keeps the mean \(\sigma \) of its parents.

On the other hand, if only mutation was applied and the child is better than its parent, its \(\sigma _i\) also returns to \(\sigma _{\min }\) in line 26. However, if there is no improvement over the parent, as in line 28, \(\sigma _i\) is increased to \(\sigma _i + 1\) for the next generation, or reset to \(\sigma _{\min }\) if \(\sigma _i\) was already equal to \(\sigma _{\max }\).
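The following Python fragments sketch the mutation operator and this \(\sigma \) rule for mutation-only children, under the set representation used above; the names are ours, and crossover-generated children keep their inherited \(\sigma \) when they do not improve, as just described.

```python
import random

def mutate(M, sigma, n):
    """Exchange sigma random elements of M for elements of N \\ M.

    Valid because sigma <= sigma_max = min(m, n - m), so both samples exist.
    """
    M = set(M)
    out = random.sample(sorted(M), sigma)
    into = random.sample(sorted(set(range(n)) - M), sigma)
    return (M - set(out)) | set(into)

def adapt_sigma(sigma, improved, sigma_min, sigma_max):
    """Sigma rule for mutation-only children: reset to sigma_min on
    improvement; otherwise grow by one, wrapping around at sigma_max."""
    if improved or sigma >= sigma_max:
        return sigma_min
    return sigma + 1
```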

In any case, a child that improves on the best known solution undergoes the fast strong local search [20] in line 31. The strong local search used at this point is a Tabu Search, a local search algorithm that, unlike the weak local search of line 19, accepts movements to worse solutions as long as they are not in the Tabu list. This Tabu Search includes or removes a single element of \(M\) to find neighbor solutions, also based on \(\Delta \) values. The contribution \(\Delta _j\) of each neighbor \(j\) in this algorithm is \(\Delta _j = \sum _{i \in M} d_{ji}\) if \(j \notin M\), or \(\Delta _j = -\sum _{i \in M} d_{ji}\) if \(j \in M\).

This Tabu Search has tenure \(t = \sqrt{m}\): once a movement to a neighbor is applied, this movement is marked as Tabu for \(t\) iterations and cannot be performed unless it leads to a better objective value than the best feasible solution known so far. The algorithm is limited to \(n\) iterations. At every iteration, one movement towards the best neighbor is performed. If the current solution has fewer than \(m\) elements in \(M\), only neighbors with more elements are considered; if it has more than \(m\) elements, only neighbors with fewer elements are considered; if it has exactly \(m\) elements, the solution is feasible and all neighbors are considered.
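The sketch below follows this description in Python. It assumes that a move is identified by the element it adds or removes, so the Tabu status attaches to that element, which is one natural reading of the text; `obj` is assumed to evaluate a set of selected elements by formulation (1).

```python
import math

def fast_strong_ls(d, M, m, obj):
    """Add/remove tabu search with tenure sqrt(m) and a limit of n iterations."""
    n = len(d)
    cur = set(M)
    best, best_val = set(M), obj(M)
    tenure = int(math.sqrt(m))                 # t = sqrt(m), rounded down
    tabu = {}                                  # element -> iteration it is freed
    for it in range(n):
        if len(cur) < m:                       # infeasible: consider only additions
            cand = [j for j in range(n) if j not in cur]
        elif len(cur) > m:                     # infeasible: consider only removals
            cand = list(cur)
        else:                                  # feasible: all moves are considered
            cand = list(range(n))
        best_j, best_delta = None, None
        for j in cand:
            s = sum(d[j][i] for i in cur)
            delta = -s if j in cur else s      # Delta_j as defined in the text
            if tabu.get(j, 0) > it:
                # Aspiration: a tabu move is allowed only if it produces a
                # feasible solution better than the best one known so far.
                new = cur ^ {j}
                if len(new) != m or obj(new) <= best_val:
                    continue
            if best_delta is None or delta > best_delta:
                best_j, best_delta = j, delta
        if best_j is None:
            continue                           # every candidate move was tabu
        cur ^= {best_j}                        # toggle the chosen element
        tabu[best_j] = it + tenure             # forbid it for `tenure` iterations
        if len(cur) == m and obj(cur) > best_val:
            best, best_val = set(cur), obj(cur)
    return best
```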

After generating all children, the indices of the parents are set to the \(\mu \) best unique individuals in line 33. In lines 34–36, a randomly chosen parent is perturbed and undergoes an intensive strong local search [15]. The perturbation alters \(\alpha \) elements of the solution; each of those \(\alpha \) elements is chosen from a list of the 5 best neighbor candidates. The neighbor structure and the \(\Delta \) measure are the same as in the weak local search.
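A sketch of the perturbation under these definitions, assuming that each altered element corresponds to one swap move drawn at random from the 5 best candidates:

```python
import random

def perturb(d, M, alpha):
    """Apply alpha swap moves; each is drawn at random from the 5 best
    (q, r) candidates ranked by the weak-search delta."""
    n = len(d)
    M = set(M)
    for _ in range(alpha):
        dot = [sum(d[x][i] for i in M) for x in range(n)]  # d_x. values
        cand = sorted(((dot[r] - dot[q] - d[q][r], q, r)
                       for q in M for r in range(n) if r not in M),
                      reverse=True)[:5]        # the 5 best neighbor candidates
        _, q, r = random.choice(cand)
        M.remove(q)
        M.add(r)
    return M
```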

Afterwards, this perturbed solution undergoes a second strong local search in line 36. This strong local search is a Tabu Search with a halting criterion of \(\max (10,\!000, 1,\!000n)\) tested neighbors and tenure \(t = n/4\). It uses the same neighbor structure and \(\Delta \) calculation as the weak local search. The neighbors are explored in random order, and every time a better solution is found it becomes the current solution and the algorithm proceeds to the next iteration. Solutions in the Tabu list are only considered if they improve on the best known solution. If the best known solution has been improved by the end of the iteration, the new current solution undergoes the weak local search with a best-improvement strategy instead of the ordinary first-improvement algorithm.

4 MDP instances

The 315 instances from the MDPLib [14] were used to test the proposed algorithm. The maintenance of the MDPLib is a valuable contribution from OPTSICOM. A contingency table with the configuration of all the instances in the MDPLib is available online from the authors.

The instances in the MDPLib have \(10<n<3,\!000\), \(0.08<m/n<0.8\), and values \(d_{ij}\) that are not defined as a function of \(n\) or \(m\). This can make instances with a high value of \(n\) become easier, as the individual \(d_{ij}\) become small in relation to the total objective, similarly to what happens in other optimization problems [5]. To demonstrate this, Fig. 2 shows the average gaps between the objective function values found by the constructive heuristic KLD [19] and the best known values for the 20 instances of the SOM-b library [14]. The greater \(n\) is and the closer \(m/n\) is to \(1/2\), the smaller the gap obtained for those instances.

Increasing the value of \(n\), however, does not necessarily increase the probability of finding the optimal solution. The number of solutions in the search space is \(C_m^n = \frac{n!}{m!(n-m)!}\). Thus, the number of possible solutions increases when \(n\) increases or when \(m\) gets closer to \(n/2\), so it does not make sense to define the complexity of the problem only in terms of \(n\).
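For a sense of scale, the short computation below evaluates \(C_m^n\) for two of the instance configurations discussed in this section; the magnitudes given in the comments are approximate.

```python
from math import comb

# Search-space sizes C(n, m) for two MDPLib-like configurations:
print(comb(500, 200))   # n = 500,  m/n = 0.4: on the order of 10**144
print(comb(3000, 600))  # n = 3000, m/n = 0.2: a number with roughly 650 digits
```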

Fig. 2 Performance of the algorithms

Since for all instances in the MDPLib the values \(d_{ij}\) are not defined as a function of \(n\) or \(m\), and given that the search space increases with \(n\) and decreases with \(|m-n/2|\), the most complex instances in the MDPLib are the ones with \(n=3,\!000\), \(m/n=0.2\); \(n=2,\!000\), \(m/n=0.1\); and \(n=500\), \(m/n=0.4\), which represent 46 (14%) of the 315 instances. All other instances have either a smaller \(n\) or an \(m/n\) (or \(m\)) more distant from \(0.5\) (or \(n/2\)).

Thus, an increase in \(n\) may lead to a higher probability of finding a solution with a small gap from the optimum, while making the global optimum harder to reach in the larger search space. Simple algorithms are therefore likely to show good results if the objective gap is used as reference, as shown in Fig. 2. However, even when it is easy to find solutions with quality close to the optimal, the size of the search space remains \(C_m^n\), and the difficulty of finding the global optimum does not necessarily change.

5 Experiments

In an extensive comparison among the best methods for the MDP [14], the VNS [6] had the best results for the problem except on the largest instances, where the ITS [15] had the best rank. The LTS [20] is a more recent approach that showed better results on 50 instances of size \(2,\!000<n<5,\!000\).

  • LTS [20]: An evolutionary strategy in which a k-means clustering algorithm over probability vectors identifies points likely to be good initial solutions for a Tabu Search. At each iteration, the Tabu Search updates an individual in the population, and the probability vectors are also updated.

  • VNS [6]: Perturbations on the best solution generate new solutions. Those new solutions undergo a local search and are compared to the best solution. Perturbation strength increases when better solutions are found.

  • ITS [15]: A Tabu Search is applied to an initial solution until the limit of evaluated neighbors is reached. Then, the current solution undergoes a change operator of random intensity and the Tabu Search restarts.

Three versions of the MSES with minor adjustments are considered in the comparison. Taking the algorithm in Fig. 1 as reference, MSES1 has the intensive strong local search of lines 34–36 removed; MSES2 is exactly as described in Fig. 1; and MSES3 applies a supplementary fast strong local search, such as the one in line 31, to the parent resulting from the intensive strong local search of line 36 if it improves on the best known solution.

In order to obtain relevant results, the performance of all the mentioned algorithms is compared for all 300 time limits from 1 to 300 s and all MDPLib instances. Besides, in order to remove any factor that could influence the performance beyond the algorithms' concepts themselves, (i) the order of all replicates of all experiments was randomized, (ii) the tests were run on computers with the same configuration for the 300 different time limits, and (iii) all the algorithms were implemented in the same language.

The computers used were Intel\(^{\circledR }\) Core i5-650 (4M cache, 3.20 GHz) machines with 4 GB of 1333 MHz DDR3 RAM and a 500 GB (7200 RPM, SATA 3.0 Gb/s) hard drive with 16 MB DataBurst cache, running Windows\(^{\circledR }\) 7 SP1 Professional (32-bit). For a fair comparison, all the algorithms were implemented in MATLAB and the source codes were made available on our website.

Combinations of MSES(\(\mu \) + \(\lambda \)) with \(\mu = 1, 10, 30, 50\) and \(\lambda /\mu = 1, 2, 5, 10\) were tested for time limits of \(1, 2, 3, \ldots , 300\) seconds. For each time limit, the efficiency of the algorithms on the MDPLib is compared with a Friedman test, which is appropriate to compare different treatments (16 combinations for each MSES version + 3 algorithms from the literature) on blocks (315 instances) by ranking data within blocks. Here, \(p\)-values close to 0 indicate that the treatments do not have the same performance, and the ranks indicate the relative performance of each method. As usual, we declare a result significant if the \(p\)-value \(< 0.05\).
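As an illustration of this test setup, the sketch below runs a Friedman test with SciPy on made-up data of the same shape: 315 blocks and 51 treatments (16 combinations for each of the three MSES versions plus the 3 methods from the literature). The data and names are purely illustrative.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# results[i, j]: objective value reached by treatment j on instance (block) i.
rng = np.random.default_rng(0)
results = rng.random((315, 51))           # toy data in place of real results

stat, p = friedmanchisquare(*results.T)   # one 315-long sample per treatment
ranks = rankdata(-results, axis=1)        # rank within each block (1 = best)
print(p, ranks.mean(axis=0))              # p-value and mean rank of each method
```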

For all time limits, the maximum \(p\)-value obtained is \(7.02\times 10^{-108} < 0.05\), indicating with confidence that the algorithms do not have the same performance for any time limit. The configurations of MSES with the best rank for most time limits are MSES1(1 + 1), MSES1(10 + 20), MSES2(10 + 100), MSES2(30 + 150), MSES2(30 + 300), MSES3(10 + 100), and MSES3(30 + 150). Figures comparing the MSES configurations were made available online by the authors. Figure 2 shows the performance of each algorithm, measured by its mean rank in the Friedman test, for all time limits from 1 to 300 s.

The two lines with the best ranks are MSES1(1 + 1) when the time limit is less than 154 s and MSES1(10 + 20) when the time limit is greater than 153 s. The other versions of MSES are represented by dotted lines. The best ranks among the other algorithms belong to VNS, LTS, and ITS, respectively. The range of possible mean ranks in this graph goes from 10.91 to 34.80, and the confidence intervals between the methods, which keep the \(p\)-values \(<\) 0.05, lie in the range from 12.61 to 14.55.

We also use the data to analyze the methods in relation to goals, that is, for a given goal, which algorithm takes the least time to achieve it. In this case, in each Friedman test, the treatments are the algorithms and the blocks are the times spent to achieve a certain goal. Figure 2c shows the mean ranks of the algorithms for each goal; the lower the mean rank, the less time was spent to achieve the goal. Most methods have similar performance for goals below 70 %. For goals under 95 %, MSES2(10 + 100) has the best performance. For goals over 95 %, LTS becomes the best algorithm. However, for the goals closest to the best known values, MSES1(\(1+1\)) is again the best algorithm.

In addition to those comparisons in terms of goal achievement and absolute performance, Fig. 2d shows the gap between the solutions found by the algorithms and the best known solutions. Each line represents a method and each column represents an instance. The figure shows how most solutions are close to the best known solutions, and it also demonstrates the importance of using ranks to perceive the differences between the algorithms. As supplementary material, comparisons of the algorithms on different subsets, the absolute results of all tests, the best known solutions, and the gaps between the best solutions and the results on the instances are also available online.

6 Conclusion and future work

All the heuristics considered in previous works [14, 20], VNS, LTS, and ITS, are smart approaches that exploit specificities of the MDP. Based on their results, we propose here an MSES algorithm for the MDP, with three variations of its local search and different settings of the parameters \(\mu \) and \(\lambda \).

MSES1(1 + 1) and MSES1(10 + 20) have the best results for time limits below 154 s and above 153 s, respectively. MSES1(1 + 1) presented the best time to achieve the best known results. For easier goals, LTS and MSES2(10 + 100) have the best results. The fact that the best algorithms for easier goals do not have good results in the first test can be explained by the discussion in Sect. 4, where we show why simple heuristics may be preferred for the MDP when the goal is not to achieve optimal results.

As extensions of this work, we propose (i) comparing other instances in which \(d_{ij}\) is defined as a function of \(n\) and \(m\) is closer to \(n/2\), making the problems in fact more difficult; (ii) more tests on the adjustment of \(\lambda \) and \(\mu \); (iii) testing MSES(\(\mu ,\lambda \)), where all parents would be replaced by their children (as opposed to MSES(\(\mu +\lambda \))); and (iv) taking advantage of the population-based algorithm to evolve the individuals in parallel and accelerate convergence.