Introduction

Consider n objects (such as points in the plane or nodes of a network) with a given distance between every pair of objects. We wish to find a cluster of m objects which minimizes the total distance between all pairs of objects in the cluster. This cluster can be interpreted as the “tightest” cluster of m objects. The problem is similar to the one facility version of the max-cover problem (for a network or discrete formulation, see Daskin 1995; Current et al. 2002; for planar models, see Drezner 1981 for one facility, and Watson-Gandy 1982 for several facilities), in which we wish to locate a facility so as to cover the maximum number of points within a given distance.

Examples of applications include the selection of a group of m people out of n available people. The distance between a pair of persons is a measure of compatibility or of the ability to work together. The ideal group has the most compatibility among its members and the least potential for conflict. Another application is selecting a subset of m points out of a given set of n points in the plane to be connected by links, where we wish to minimize the total length of all links. The grey pattern problem in the context of the Quadratic Assignment Problem (Taillard 1995) can be formulated in an identical way. It is based on a rectangle of dimensions \(n_1\) by \(n_2\). A grey pattern of m black points is selected from the \(n = n_1 \times n_2\) points in the rectangle while the rest of the points remain white, forming a “grey pattern” of density \(m/n\). The objective is a grey pattern in which the black points are distributed as uniformly as possible. This objective is achieved by defining a distance between pairs of points according to some rule. For more details see Taillard (1995).

All computational experiments were performed on a 2.8 GHz Pentium IV desktop computer. Programs were coded in FORTRAN and compiled by Microsoft FORTRAN PowerStation 4.0.

Formulation

Let

\(n\): the number of points,

\(m\): the number of points to be selected for the cluster,

\(d_{ij}\): the distance between points i and j (\(d_{ij} = d_{ji}\), \(d_{ii} = 0\)).

Let M of cardinality m be the subset of selected points. The objective function is to minimize

$$F(M) = \sum_{i \in M} \sum_{j \in M} d_{ij}$$
(1)
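As a concrete illustration, objective (1) can be evaluated directly from the distance matrix. The minimal sketch below assumes the distances are held in a symmetric NumPy array d with a zero diagonal and that M is any collection of the m selected indices (the variable names are ours, not part of the paper):

```python
import numpy as np

def cluster_cost(d: np.ndarray, M) -> float:
    """Objective (1): sum of d[i, j] over all ordered pairs (i, j) in M.
    Since d is symmetric with a zero diagonal, this equals twice the
    sum of the distances over the unordered pairs in the cluster."""
    idx = np.fromiter(M, dtype=int)
    return float(d[np.ix_(idx, idx)].sum())
```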

This problem can be formulated as a quadratic assignment problem (Rendl 2002). The QAP is a combinatorial optimization problem stated for the first time by Koopmans and Beckmann (1957). The problem is defined as follows. A set of n possible sites is given and n facilities are to be located on these sites, one facility at a site. Let \(c_{ij}\) be the cost per unit distance between facilities i and j and \(d_{ij}\) be the distance between sites i and j. The cost f, calculated for an assignment of facility i to site p(i) for i = 1, ..., n, is minimized over all possible permutations:

$$f = \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} d_{p(i)p(j)}$$
(2)

The quadratic assignment problem is a very difficult optimization problem; only recently have problems of up to n=36 points been solved optimally (see Drezner et al. 2005).

We define the weight matrix \(\{c_{ij}\}\) by \(c_{ij} = 1\) for \(i, j \le m\) and \(c_{ij} = 0\) otherwise, and the distance matrix is defined by the given distances \(\{d_{ij}\}\). Every permutation of the points defines the selected group as the first m points of the permutation, and the value of the objective function is the sum of all the distances among the selected group members. This is the formulation used by Taillard (1995), and two problems of this type of grey pattern are available at QAPLIB (http://www.seas.upenn.edu/qaplib) under the names Tai64c and Tai256c. Tai64c is a grey pattern problem in a square of 8 by 8 points (n=64) with m=13 black points. Tai256c is a grey pattern problem in a square of dimensions 16 by 16 (n=256) with m=92 black points. Taillard and Gambardella (1997) define 126 grey pattern problems similar to Tai256c for n=256 and 3≤m≤128. Since this quadratic assignment formulation has a special structure, it is easier to solve, as pointed out in Taillard (1995). In this paper, we use formulation (1) rather than the general QAP formulation, with successful results.
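The weight matrix of this special QAP structure is easy to construct. A minimal sketch, assuming 0-based indexing (so the selected group occupies positions 0, ..., m−1):

```python
import numpy as np

def grey_pattern_weights(n: int, m: int) -> np.ndarray:
    """Weight matrix {c_ij} of the grey pattern QAP: c_ij = 1 when both
    i and j index the first m positions of the permutation (the selected
    group) and 0 otherwise.  The diagonal entries are harmless because
    d_ii = 0 in objective (2)."""
    c = np.zeros((n, n))
    c[:m, :m] = 1.0
    return c
```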

The branch and bound algorithm

Consider the matrix of symmetric distances \(d_{ij}\) of size n by n with a zero diagonal (see Fig. 1). Suppose that m indices \(p_1, p_2, \ldots, p_m\) are selected for the cluster. Half the value of the objective function can be calculated by summing, in each column \(p_j\), the distances in the rows belonging to selected columns with a smaller index. These distances all lie in the upper triangle of the matrix. All the combinations of selecting m columns out of n are implicitly scanned. A lower bound is constructed for every partial selection of columns: the first column is selected and the lower bound calculated, then the second column is selected and the lower bound calculated, and so on. When the lower bound of the last selected column is greater than or equal to the best found solution, the last selected column is advanced by one place. Suppose \(k \le m\) columns have been selected. If the last column is column number \(n-m+k\), no further advancement is possible; the next to last selected column (column k−1) is advanced one place, the selection of column k is discarded, and the process continues until the first selected column reaches column number \(n-m+1\).
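The implicit enumeration can be sketched as a simple recursion over lexicographically increasing column selections. In the sketch below (0-based indices), lower_bound and evaluate are placeholders of our own naming for any of the bounds developed in the next section and for the objective evaluation:

```python
def branch_and_bound(n: int, m: int, lower_bound, evaluate) -> float:
    """Implicitly scan all C(n, m) selections of m columns in
    lexicographic order, pruning a partial selection whenever its
    lower bound reaches the best found solution."""
    best = float("inf")
    selected = []                             # partial selection p_1 < ... < p_k

    def extend(start: int) -> None:
        nonlocal best
        k = len(selected)
        if k == m:
            best = min(best, evaluate(selected))
            return
        # the (k+1)-th selected column can be advanced up to index n-m+k
        for col in range(start, n - m + k + 1):
            selected.append(col)
            if lower_bound(selected) < best:  # otherwise prune this branch
                extend(col + 1)
            selected.pop()

    extend(0)
    return best
```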

Fig. 1 The distance matrix and zones 1, 2, and 3

Calculating the lower bounds

The three lower bounds

We first detail the calculation of several values which are used in the proposed lower bounds. Suppose that \(k \le m\) columns \(p_1 < p_2 < \cdots < p_k\) are selected and we wish to find a lower bound over all possible selections of the remaining \(m-k\) columns (all beyond column \(p_k\)). The objective function is the sum of the “relevant” distances in zones 1, 2, and 3 (see Fig. 1).

Preliminary values

Let the first r−1 distances in column r (the values “above” the zero diagonal) be sorted in increasing order, yielding the vector \(\delta_{jr}\) for j=1,...,r−1. Define \(D_s\) as a lower bound on the sum of the distances contributed by any column if it is the (s+1)-th selected column (i.e., s distances are selected from that column). For example, the first selected column, \(p_1\), contributes no distances to the value of the objective function because all relevant distances are below the diagonal; the second selected column, \(p_2\), contributes only one distance, the one in row \(p_1\); and so on. The value of \(D_s\) is:

$$D_{s} = \min_{s+1 \le r \le n-m+s} \left\{ \sum_{j=1}^{s} \delta_{jr} \right\}$$
(3)
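The values \(D_s\) can be precomputed once before the branch and bound starts. A sketch under the same assumptions as before (distances in a NumPy array d; the names are ours), translating the 1-based column index r of the text to 0-based array offsets:

```python
import numpy as np

def precompute_D(d: np.ndarray, m: int) -> list:
    """D[s] for s = 1, ..., m-1 per Eq. (3): the smallest possible sum
    of s above-diagonal distances taken from one column r, minimized
    over the columns that may serve as the (s+1)-th selected column."""
    n = d.shape[0]
    # prefix[r][s] = sum of the s smallest entries above the diagonal in column r
    prefix = [None] * (n + 1)
    for r in range(1, n + 1):
        col = np.sort(d[: r - 1, r - 1])       # the first r-1 entries of column r
        prefix[r] = np.concatenate(([0.0], np.cumsum(col)))
    D = [0.0] * m                              # D_0 = 0: the first column adds nothing
    for s in range(1, m):
        D[s] = min(prefix[r][s] for r in range(s + 1, n - m + s + 1))
    return D
```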

The value of the objective function is the sum of the selected distances in zones 1, 2, and 3 (see Fig. 1). We denote these sums by \(s_1\), \(s_2\), and \(s_3\), respectively.

For a given partial selection \(p_1 < p_2 < \cdots < p_k\) we have the following bounds.

Bounds independent of the partial selection

Zone 1: \(s_{1} = \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} d_{p_i p_j}\). This value is the same for all possible selections of the remaining \(m-k\) columns.

Zones 2&3: \(s_{2} + s_{3} \ge B_{23}(k) = \sum_{i=k}^{m-1} D_i\). The values of \(B_{23}(k)\) for every \(1 \le k \le m\) can be calculated before the branch and bound procedure and need not be recalculated for any partial selection.

Zone 3: \(s_{3} \ge B_{3}(k) = \sum_{i=1}^{m-k-1} D_i\). The value of \(B_3(k)\) can also be calculated for every k, and its value is independent of the partial selection.
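Both selection-independent bounds are simple sums over the precomputed \(D_i\). A small sketch (using precompute_D from the previous snippet):

```python
def static_bounds(D: list, m: int):
    """B23[k] = D[k] + ... + D[m-1] and B3[k] = D[1] + ... + D[m-k-1],
    computed once before the branch and bound begins."""
    B23 = [0.0] * (m + 1)
    for k in range(m - 1, 0, -1):
        B23[k] = B23[k + 1] + D[k]
    B3 = [sum(D[1 : m - k]) for k in range(m + 1)]  # empty slice sums to zero
    return B23, B3
```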

Bounds dependent on the partial selection

These bounds need to be recalculated for each partial selection.

Zone 2: For every \(r > p_k\) calculate \(V_r = \sum_{i=1}^{k} d_{p_i r}\). Sort the values \(\{V_r\}\) in increasing order and let \(\{u_i\}\) be the sorted vector. Then, \(s_2 \ge B_2(k) = \sum_{i=1}^{m-k} u_i\). This bound depends on the partial selection and needs to be recalculated for every partial selection.
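In code, \(B_2(k)\) amounts to sorting the zone-2 column sums. A sketch with 0-based indices, where selected holds the current partial selection \(p_1 < \cdots < p_k\):

```python
import numpy as np

def bound_B2(d: np.ndarray, selected: list, m: int) -> float:
    """B2(k): the sum of the m-k smallest zone-2 column sums V_r over
    the columns r beyond the last selected column p_k."""
    k, p_k = len(selected), selected[-1]
    V = [float(d[selected, r].sum()) for r in range(p_k + 1, d.shape[0])]
    V.sort()
    return sum(V[: m - k])
```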

Zones 2&3: For every \(r > p_k\) sort the distances in column r starting at row \(p_k + 1\) and ending at row r (including the zero diagonal value). The sorted vector is \(\{w_{ir}\}\) for \(i = 1, \ldots, r - p_k\). Let \(D^{(1)}_{i} = \min_{p_k + i \le r \le n} \left\{ V_r + \sum_{j=1}^{i} w_{jr} \right\}\).

Then, \(s_2 + s_3 \ge B^{(1)}_{23}(k) = \sum_{i=1}^{m-k} D^{(1)}_{i}\).
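A sketch of \(B^{(1)}_{23}(k)\) under the same conventions (0-based indices; selected holds \(p_1 < \cdots < p_k\)):

```python
import numpy as np

def bound_B23_1(d: np.ndarray, selected: list, m: int) -> float:
    """B23^(1)(k): the i-th added column contributes its zone-2 sum V_r
    plus at least the i smallest entries of column r below row p_k
    (one of which is the free zero diagonal entry)."""
    n = d.shape[0]
    k, p_k = len(selected), selected[-1]
    V = {r: float(d[selected, r].sum()) for r in range(p_k + 1, n)}
    w = {r: np.sort(d[p_k + 1 : r + 1, r]) for r in range(p_k + 1, n)}
    total = 0.0
    for i in range(1, m - k + 1):
        total += min(V[r] + float(w[r][:i].sum()) for r in range(p_k + i, n))
    return total
```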

The suggested lower bounds

LB1: \(LB_1 = s_1 + B_{23}(k)\). This lower bound is calculated quickly because \(B_{23}(k)\) need not be recalculated for every partial selection. However, this bound may not be very tight.

LB2: \(LB_2 = s_1 + B^{(1)}_{23}(k)\). This lower bound is tighter than LB1 but takes longer to calculate because \(B^{(1)}_{23}(k)\) needs to be recalculated for every partial selection. One possible weakness of LB2 is that a column in Zone 2 with relatively low values may be selected repeatedly in calculating the components of \(B^{(1)}_{23}(k)\).

LB3: \(LB_3 = s_1 + B_2(k) + B_3(k)\). This lower bound is also tighter than LB1 but takes longer to calculate because \(B_2(k)\) needs to be recalculated for every partial selection. An advantage of LB3 is that \(B_2(k)\) is indeed the lowest possible value of \(s_2\). However, the bound on \(s_3\) is not necessarily obtained from the same columns and may not be as tight. For larger values of k the contribution of Zone 3 to the objective function diminishes; therefore LB3 is effective for larger k.

Note that \(s_1\) needs to be calculated for each lower bound. \(B_{23}(k)\) is calculated once before the branch and bound procedure for every k, so LB1 is the quickest to calculate. Therefore, when applying LB2 or LB3, LB1 is obtained at no extra effort, and if it is greater than or equal to the best found solution there is no need to calculate LB2 or LB3.

Optimal solutions to grey pattern problems

We first experimented with the grey pattern problem Tai64c. In Table 1 we report the run times required to prove that the best known value of 1,855,928 is indeed optimal, by total enumeration and by branch and bound using each of the three bounds. The shortest run time was achieved by the branch and bound procedure applying the lower bound LB3.

Table 1 Times for finding the optimal solution to Tai64c

We then used the branch and bound algorithm to solve grey pattern problems with n=256 and small values of m between 3 and 8. The run times are depicted in Table 2. All best known solutions for these problems are proven optimal. The lowest run time was required for LB1.

Table 2 The optimal solution to n=256 grey pattern problems

Heuristic algorithms

We propose five heuristic solution procedures: Greedy, Steepest Descent, Tabu Search, Simulated Annealing, and an Evolutionary Algorithm. The first two algorithms require very short run times, while the three metaheuristics tend to provide higher-quality solutions.

A greedy algorithm

The greedy algorithm starts with a point (the “kernel” point) and builds the cluster by adding nearby points one at a time. Every point is tested as the kernel point and the best cluster found is the result of the greedy algorithm.

The following is repeated n times, once for each point being the “kernel” point.

1. The selected cluster consists of the kernel point.

2. Go over all the points which are not in the cluster and evaluate the sum of the distances between each such point and all the points in the cluster.

3. Add to the cluster the point with the minimum sum of distances to all cluster points.

4. Go to Step 2 unless the cluster contains m points.

5. If the cluster contains m points, stop.

The best cluster obtained over all n possible kernel points is the result of the greedy algorithm, which requires \(O(m^2 n^2)\) time.
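A sketch of the greedy algorithm follows. It maintains, for each point, a running sum of its distances to the current cluster (the same idea as the short cut of Section 4.2.1), which makes each addition O(n); a direct implementation that re-sums the distances gives the \(O(m^2 n^2)\) bound quoted above:

```python
import numpy as np

def greedy_cluster(d: np.ndarray, m: int) -> list:
    """Try every point as the kernel, repeatedly add the point with the
    smallest total distance to the current cluster, and keep the best
    of the n resulting clusters."""
    n = d.shape[0]
    best_cluster, best_value = None, float("inf")
    for kernel in range(n):
        cluster = [kernel]
        S = d[kernel].copy()                # S[i] = total distance from i to the cluster
        for _ in range(m - 1):
            candidates = S.copy()
            candidates[cluster] = np.inf    # never re-add a cluster member
            nxt = int(np.argmin(candidates))
            cluster.append(nxt)
            S += d[nxt]
        value = float(d[np.ix_(cluster, cluster)].sum())
        if value < best_value:
            best_cluster, best_value = cluster, value
    return best_cluster
```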

A steepest descent algorithm

The steepest descent algorithm is very similar to the algorithm proposed by Teitz and Bart (1968) for the heuristic solution of the p-median problem.

1. Select a starting cluster M of m points.

2. Evaluate the change in the value of the objective function for all possible exchanges between a point in the cluster and a point not in the cluster.

3. If there is an improving exchange, perform the best improving exchange and go to Step 2; otherwise go to Step 4.

4. Stop the algorithm with the final cluster as the solution.

The short cut

Note that evaluating the difference in the value of the objective function can be expedited by the following approach. Define \(S_{i}(M) = \sum_{j=1}^{m} d_{i p_j}\). The n values of \(S_i(M)\) are calculated once at the beginning of the descent algorithm for the starting solution M. This requires O(mn) time and is performed once. In each iteration:

1. Evaluate the change in the value of the objective function from removing a point \(k \in M\) from the cluster and adding a point \(l \notin M\) to it. The change in the value of the objective function is \(S_l(M) - S_k(M) - d_{kl}\). This takes O(1) time and is evaluated m(n−m) times, once for each possible exchange.

2. Suppose that the selected move is to remove k from the cluster and add l, forming cluster M′. Then \(S_i(M') = S_i(M) + d_{il} - d_{ik}\). The calculation of all the \(S_i(M')\) requires O(n) time.

3. An iteration thus requires O(m(n−m)) time.

The short cut is more efficient than evaluating the m(n−m) exchanges directly. The direct calculation of an exchange requires O(m) time, for a total of \(O(m^2(n-m))\) per iteration.
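Putting the descent steps and the short cut together gives the following sketch (cluster is the starting selection of m indices; the names are ours):

```python
import numpy as np

def descent_with_shortcut(d: np.ndarray, cluster: list) -> list:
    """Steepest descent with the O(1)-per-exchange short cut:
    S[i] = sum of d[i, j] over j in the cluster, so removing k and
    adding l changes the (half-)objective by S[l] - S[k] - d[k, l]."""
    n = d.shape[0]
    in_cluster = np.zeros(n, dtype=bool)
    in_cluster[cluster] = True
    S = d[:, cluster].sum(axis=1)               # O(mn), computed once
    while True:
        best_delta, best_move = 0.0, None
        for k in np.flatnonzero(in_cluster):
            for l in np.flatnonzero(~in_cluster):
                delta = S[l] - S[k] - d[k, l]
                if delta < best_delta:
                    best_delta, best_move = delta, (k, l)
        if best_move is None:                    # no improving exchange: stop
            return np.flatnonzero(in_cluster).tolist()
        k, l = best_move
        in_cluster[k], in_cluster[l] = False, True
        S += d[l] - d[k]                         # O(n) update of all S[i]
```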

A tabu search

Tabu search is a commonly used metaheuristic (Glover 1986; Glover and Laguna 1997; Salhi 1998). The parameters of the tabu search applied in this paper are: the definition of the tabu list as the set of points recently removed from the cluster; the tabu tenure TT, randomly generated each iteration in \([T_{\min}, T_{\max}]\); and the number of iterations N. We also employ the stipulation that if the best found solution is improved in an iteration, the tabu list is emptied. The details of the tabu search are:

1. Select a starting solution. Its value of the objective function is the best found solution.

2. Empty the tabu list.

3. If the number of iterations exceeds N, stop with the best found solution as the result of the algorithm.

4. Randomly generate the tabu tenure TT in \([T_{\min}, T_{\max}]\).

5. Evaluate all m(n−m) possible exchanges.

6. If an exchange leads to a solution better than the best found solution, perform the best exchange, update the best found solution, and go to Step 2; otherwise go to Step 7.

7. • Perform the best exchange (whether improving or not), excluding exchanges whose entering point was removed from the cluster fewer than TT iterations ago (i.e., is still tabu).

   • Enter the point removed from the cluster into the tabu list.

   • Go to Step 3.

Since all m(n−m) possible exchanges are evaluated in every iteration in Step 5, the short cut used for the steepest descent algorithm (Section 4.2.1) is also used for the tabu search.
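A sketch of the tabu search, again using the short-cut sums (the data structures and names are ours; tabu_since maps a removed point to the iteration of its removal, so a point is tabu while its age is below the tenure TT):

```python
import random
import numpy as np

def tabu_search(d: np.ndarray, cluster: list, N: int, T_min: float, T_max: float):
    """Tabu search per Steps 1-7 above: evaluate all m(n-m) exchanges,
    take any exchange that improves the best found solution (emptying
    the tabu list), and otherwise take the best non-tabu exchange."""
    n = d.shape[0]
    in_cluster = np.zeros(n, dtype=bool)
    in_cluster[cluster] = True
    S = d[:, cluster].sum(axis=1)
    value = float(d[np.ix_(cluster, cluster)].sum()) / 2.0   # half-objective
    best_value, best = value, list(cluster)
    tabu_since = {}
    for it in range(N):
        TT = random.uniform(T_min, T_max)
        best_any, best_ok = (np.inf, None), (np.inf, None)
        for k in np.flatnonzero(in_cluster):
            for l in np.flatnonzero(~in_cluster):
                delta = S[l] - S[k] - d[k, l]
                if delta < best_any[0]:
                    best_any = (delta, (k, l))
                tabu = l in tabu_since and it - tabu_since[l] < TT
                if not tabu and delta < best_ok[0]:
                    best_ok = (delta, (k, l))
        if best_any[1] is not None and value + best_any[0] < best_value:
            delta, move = best_any           # Step 6: improves the best found
            tabu_since.clear()               # solution, so empty the tabu list
        elif best_ok[1] is not None:
            delta, move = best_ok            # Step 7: best allowed exchange
            tabu_since[move[0]] = it         # the removed point becomes tabu
        else:
            continue                         # every exchange is tabu this iteration
        k, l = move
        in_cluster[k], in_cluster[l] = False, True
        S += d[l] - d[k]
        value += delta
        if value < best_value:
            best_value, best = value, np.flatnonzero(in_cluster).tolist()
    return best, best_value
```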

A simulated annealing algorithm

Simulated annealing is also a commonly used metaheuristic. It was suggested by Kirkpatrick et al. (1983) and simulates the annealing of metals from a high temperature liquid to a low temperature solid (see also Glover and Laguna 1997; Salhi 1998). The parameters for the simulated annealing are: the starting temperature \(T_0\), the cooling factor α, and the number of iterations N. The variant used in this paper is:

1. Set the temperature \(T = T_0\). Generate a starting cluster.

2. Randomly select a point in the cluster to be removed and a point not in the cluster to be added to the cluster.

3. Calculate the change in the value of the objective function ΔF.

4. If ΔF≤0, perform the exchange and go to Step 6.

5. Calculate \(\delta = \frac{\Delta F}{T}\). Perform the exchange with probability \(e^{-\delta}\); otherwise retain the current cluster.

6. Multiply the temperature T by α and, if the number of iterations does not exceed N, go to Step 2.

7. Stop the procedure with the best cluster found during the process as the result of the algorithm.

The simulated annealing procedure requires O(mN) time. However, the calculation of \(e^{-\delta}\) and of the random number associated with the acceptance probability consumes quite a large proportion of the computer time. Since for δ>10 the probability of accepting a move is negligible (at most 4.54×10−5), it is assumed to be zero and the probability is not calculated.
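A sketch of the simulated annealing variant, including the shortcut of skipping the exponential when δ > 10 (the names are ours; clarity is favored over the bookkeeping of a tuned O(mN) implementation):

```python
import math
import random
import numpy as np

def simulated_annealing(d: np.ndarray, cluster: list, T0: float, alpha: float, N: int):
    """Steps 1-7 above: random single-point exchanges, accepted when
    improving or with probability e^(-delta_F / T); T is multiplied
    by alpha every iteration."""
    n = d.shape[0]
    in_cluster = np.zeros(n, dtype=bool)
    in_cluster[cluster] = True
    S = d[:, cluster].sum(axis=1)                # short cut of Section 4.2.1
    value = float(d[np.ix_(cluster, cluster)].sum()) / 2.0
    best_value, best = value, list(cluster)
    T = T0
    for _ in range(N):
        k = random.choice(np.flatnonzero(in_cluster).tolist())
        l = random.choice(np.flatnonzero(~in_cluster).tolist())
        delta = S[l] - S[k] - d[k, l]
        accept = delta <= 0
        if not accept:
            ratio = delta / T
            # for ratio > 10 the acceptance probability is below 4.54e-5
            accept = ratio <= 10 and random.random() < math.exp(-ratio)
        if accept:
            in_cluster[k], in_cluster[l] = False, True
            S += d[l] - d[k]
            value += delta
            if value < best_value:
                best_value, best = value, np.flatnonzero(in_cluster).tolist()
        T *= alpha
    return best, best_value
```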

Evolutionary algorithms

A population of P member solutions is maintained. Each population member M is represented by a set of m points.

1. Randomly generate a population of P solutions.

2. Repeat the following G times (generations):

   • Randomly select two members of the population and merge them to produce an offspring M′.

   • If F(M′) is not better than the worst population member, do not change the population and start the next generation.

   • If F(M′) is better than the worst population member, then:

     • If the offspring is identical to an existing population member, do not change the population and start the next generation.

     • Otherwise, replace the worst population member with M′ and start the next generation.

3. The best population member is the resulting solution.
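The skeleton of the evolutionary algorithm can be sketched as follows; merge stands for either of the two merging processes described next, and evaluate computes objective (1) (both are placeholders of our naming):

```python
import random

def evolutionary_algorithm(n: int, m: int, P: int, G: int, merge, evaluate):
    """Steps 1-3 above: keep a population of P clusters; each
    generation merges two random members and replaces the worst
    member when the offspring is better and not a duplicate."""
    population = [frozenset(random.sample(range(n), m)) for _ in range(P)]
    values = [evaluate(M) for M in population]
    for _ in range(G):
        M1, M2 = random.sample(population, 2)
        child = frozenset(merge(M1, M2))
        worst = max(range(P), key=values.__getitem__)
        v = evaluate(child)
        if v >= values[worst] or child in population:
            continue                     # keep the population unchanged
        population[worst], values[worst] = child, v
    best = min(range(P), key=values.__getitem__)
    return population[best], values[best]
```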

The most important part of an evolutionary algorithm is the merging process applied to produce an offspring. We tested two merging processes with a parameter K: the descent merging process and the tabu merging process.

The descent merging process

The descent merging process is similar to the merging process suggested in Berman and Drezner (2005).

1. The two parents are \(M_1\) and \(M_2\), each represented by a set of m points.

2. The intersection of \(M_1\) and \(M_2\) is \(M_I = M_1 \cap M_2\). The cardinality of the intersection is \(m_I\).

3. The union of \(M_1\) and \(M_2\) is \(M_U = M_1 \cup M_2\). The cardinality of \(M_U\) is \(2m - m_I\).

4. K different points not in \(M_U\) are selected to form \(M_K\) (if \(2m - m_I + K > n\) then only \(n - 2m + m_I\) points are selected).

5. All the points in \(M_U\) which are not in \(M_I\) define \(M_E\). The cardinality of \(M_E\) is \(2m - 2m_I\).

6. Define \(M_D = M_E \cup M_K\). The cardinality of \(M_D\) is \(m_D = \min\{n - m_I,\; 2m - 2m_I + K\}\).

7. A starting offspring M′ is created by randomly adding \(m - m_I\) points from \(M_D\) to \(M_I\).

8. A restricted descent process is performed on M′ by adding or removing only points in \(M_D\), keeping the points in \(M_I\) fixed.

9. The result of the restricted descent process is the offspring.

Note that the shortcut described in Section 4.2.1 can be applied to the restricted descent process.
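Steps 1-9 can be sketched as follows; restricted_descent is a placeholder (our name and signature) for the descent of Section 4.2 modified to exchange only points of \(M_D\) while keeping \(M_I\) fixed:

```python
import random

def descent_merge(d, M1: frozenset, M2: frozenset, n: int, m: int, K: int,
                  restricted_descent):
    """Descent merging: fix the common points M_I, form the candidate
    pool M_D from the symmetric difference plus K random outside
    points, complete a random starting offspring, and improve it by a
    descent restricted to M_D."""
    M_I = M1 & M2
    M_U = M1 | M2
    outside = [p for p in range(n) if p not in M_U]
    M_K = set(random.sample(outside, min(K, len(outside))))  # cap when 2m-m_I+K > n
    M_D = (M_U - M_I) | M_K
    start = set(M_I) | set(random.sample(sorted(M_D), m - len(M_I)))
    return restricted_descent(d, start, fixed=M_I, candidates=M_D)
```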

The tabu merging process

We also experimented with a tabu extension of the restricted descent search. Let h be the number of iterations performed by the restricted descent algorithm. A restricted tabu search of 5h iterations is performed following the approach outlined in Section 4.3. We need to select \(m - m_I\) points out of \(m_D\) points. If \(m_D - (m - m_I) \le 5\), the tabu search is not performed and the result of the descent algorithm is used. Note that the shortcut described in Section 4.2.1 can be applied to the restricted tabu search.

These merging processes resemble neither the standard crossover operator nor the hybrid genetic algorithm approach. However, they combine elements of both, and the procedure is therefore termed “an evolutionary algorithm” because it does not conform to the standard genetic or hybrid genetic frameworks.

Computational experiments with heuristic algorithms

We first solved the 126 grey pattern problems with n=256 (Taillard and Gambardella 1997). Optimal solutions were obtained for problems with 3≤m≤8 (see Table 2), and the rest were solved by heuristic algorithms. We then used the 40 problems suggested by Beasley (1990) for the p-median problem on a network, and solved them as cluster problems.

Grey pattern problems

We present the results in non-chronological order because values presented in later tables require the best known value (BKV) depicted in Table 3. All BKVs were obtained by the evolutionary algorithm with the tabu merging process and K=3 added points. The original best known values are reported in Taillard and Gambardella (1997). Misevicius (2003a,b, 2004, 2005) improved some best known values. Eight new improved BKVs are reported in Table 3.

Table 3 Best known results for the grey pattern problems

The greedy heuristic solved all 126 problems in less than 2 min. It found the BKV for 11 problems (m=3, 4, 8, 16, 112, 120, 124–128). On average, its solution was 1.437% over the BKV.

The descent approach was replicated 1,000 times for each problem. Solving all 126 problems took about 4 min. It found the BKV at least once in 1,000 trials for 25 of the problems (m=3–17, 20, 21, 23, 24, 31, 33, 123, 125, 126, 128). The minimum value in 1,000 trials was, on the average, 0.250% above the BKV.

We tried many parameter settings for the simulated annealing approach but could not find good ones. We therefore report only results obtained by the tabu search and the evolutionary algorithm. Since problems with m≤21 and m≥113 are easily solved by all metaheuristic approaches, in the rest of this section we report results of experiments with problems of 22≤m≤112 dots.

Experiments with tabu search

We tried various values for \(T_{\min}\) and \(T_{\max}\) and, following the experience in Drezner and Marcoulides (2005) and our own extensive experiments, we successfully applied a wide range for the tabu tenure with \(T_{\min} = 0.02(n-m)\) and \(T_{\max} = 0.2(n-m)\). We tested the tabu search with N=2,000n and N=10,000n iterations. Each problem was solved 100 times. The computational results are depicted in Table 4 (run times are given in Table 7). There seem to be three distinct “regions” of m. Problems with m≤64 and problems with m≥95 are, with a few exceptions, easily solved by the tabu search. Problems with 65≤m≤94, with a few exceptions, seem to be difficult for tabu search. Overall, tabu search with N=10,000n iterations failed to find the BKV for 12 problems but found the BKV in all 100 replications for 41 problems (see Table 8). Increasing the number of iterations from 2,000n to 10,000n improved the performance of the algorithm. However, we believe that diversification approaches may further improve the performance of the algorithm without adding run time.

Table 4 Results for the grey pattern problems by tabu search

Experiments with the evolutionary algorithm

We tested the evolutionary algorithm with both suggested merging processes. Following extensive experiments we set the population size at 300 (slightly higher than n=256) and the number of generations to 1,000n=256,000. In order to determine the preferred value of the parameter K, we solved the 91 problems (22≤m≤112) ten times for various values of K. The results are summarized in Table 5. For the descent merging process, the performance seems to improve as the value of K increases; however, run time increases as well. By inspection of the table we selected K=7 for further experiments. We also depict in Table 5 experiments with various values of K for the tabu merging process. There seems to be a drop in performance when K is increased from K=3 to K=5. To confirm this phenomenon, we also report the performance of the algorithm for problems with 72≤m≤94; this range includes all new BKVs and seems to contain the problems most difficult for this approach. Consequently, we selected K=3 for further experiments with the tabu merging process. The results of 100 replications for each problem are reported in Table 6 (run times are reported in Table 7). The results are much better for the tabu merging process, but its run times are about eight times longer. The descent merging algorithm missed the BKV for 23 problems, while the tabu merging algorithm found it at least once for every problem. The average rate of finding the BKV increased from about 24% to 61% (see Table 8). The average solution was 0.067% over the BKV for the descent merging algorithm and only 0.016% for the tabu merging algorithm.

Table 5 Summary of experiments with ten replications per problem
Table 6 Results for the grey pattern problems by the evolutionary algorithm
Table 7 Run times (min/run) for the grey pattern problems
Table 8 Summary of grey pattern problems (22≤m≤112)

Cluster problems

We tested the cluster problems on the 40 problems given in Beasley (1990) for the p-median problem. These problems range between n=100 and n=900 points, and the cluster sizes range between m=5 and m=200 points. We were unsuccessful in finding good parameters for the tabu search, so in Table 9 we report only results for the branch and bound with LB3 (which was the best variant), the greedy algorithm, and the steepest descent approach. In Table 10, we report the results for simulated annealing (with the following parameters: \(T_0 = 1{,}000\), N=50,000n, and \(\alpha = 1 - \frac{10}{N}\)) and for the evolutionary algorithm using the tabu merging process with a population size of P=n and 1,000n generations. The branch and bound procedure was terminated after 1 h if it did not finish, the descent algorithm was replicated 1,000 times, and the simulated annealing and the evolutionary algorithm were replicated ten times each.

Table 9 Cluster solutions for branch and bound, greedy, and descent
Table 10 Cluster solutions for simulated annealing and evolutionary algorithm

Examining Tables 9 and 10, we conclude that 21 of the 40 problems were solved to optimality within 1 h of computer time, and twenty-nine problems terminated with the best known solution. Overall, the final solution of the branch and bound algorithm was 0.072% above the best known solution. The greedy algorithm found the best known solution for 25 of the 40 problems; its solution was, on average, 0.084% over the best known solution. The descent approach and the simulated annealing found the best known solution at least once for all 40 problems. The evolutionary algorithm failed to find the BKV for one problem. The run time for 1,000 replications of the descent approach was shorter than the run time for ten replications of the simulated annealing or the evolutionary algorithm by a factor of about 40, which means that a “fair” comparison would require running the descent algorithm 40,000 times. The descent algorithm found the best known solution in about 38% of the replications, simulated annealing found it in almost 99% of the cases, and the evolutionary algorithm found it in about 94% of the cases. We conjecture that all best known solutions are optimal: 21 problems were solved to optimality, and for the remaining 19 problems the descent algorithm found the best known solution at least 99 times out of 1,000 replications while the simulated annealing algorithm found it at least nine times out of 10. The recommended procedure is the descent algorithm, which performs very well in an extremely short computer time (less than half a second for the largest problem). Simulated annealing requires much longer run times (about 2 min for the largest problem) but finds the BKV in a large proportion of the cases. The evolutionary algorithm, while performing quite well, is inferior to the simulated annealing algorithm in both solution quality and run time.

Conclusions

We presented the cluster selection problem and proposed a branch and bound algorithm, based on three lower bounds, and five heuristic algorithms for its solution. The same algorithms were also applied to the grey pattern problem, which is usually formulated as a quadratic assignment problem. The computational results demonstrate the efficiency and effectiveness of the proposed solution methods.

As future research, we suggest examining possible improvements to the lower bounds. We experimented with bounds that depend on the values of k and \(p_k\) (the last column in Zone 1), which can be calculated once before the branch and bound process. Such bounds for Zone 2 and Zones 2&3 may improve the performance of the branch and bound; however, in our experiments we did not observe significantly improved performance. We also suggest further investigating metaheuristic algorithms for the solution of these problems, including a diversification strategy for the tabu search.