1 Introduction

The Set Covering Problem (SCP) is a combinatorial problem that can be described as follows: given an m-row, n-column zero-one matrix \(a_{ij}\), find a subset of columns that covers all the rows at minimal cost. The SCP can be formulated as follows:

$$\begin{aligned} \text {Minimize} \quad Z = \sum _{j=1}^n c_j x_j \end{aligned}$$
(1)

Subject to:

$$\begin{aligned} \sum _{j=1}^n a_{ij} x_j \ge 1, \quad \forall i\in \{ 1, 2, 3, \ldots , m \} \end{aligned}$$
(2)
$$\begin{aligned} x_j \in \{0,1\}, \quad \forall j\in \{ 1, 2, 3, \ldots , n \}, \end{aligned}$$
(3)

where \(c_{j}\) is the cost of column j and \(x_j = 1\) if and only if column j is selected. The SCP is an NP-hard problem [9] that has been used to model many problems, such as scheduling, manufacturing, service planning, and information retrieval [1, 7]. Several algorithms have been developed for solving SCP instances. Exact algorithms [6] can reach the global optimum, but they require substantial time on large instances. Greedy algorithms [8] scale well to large instances, but rarely generate good solutions because of their myopic and deterministic nature. Probabilistic greedy algorithms [10, 13] often generate better-quality solutions than their deterministic counterparts. Metaheuristics are commonly the best way to solve large SCP instances; examples include Genetic Algorithms [3, 18], Neural Network algorithms [16], Simulated Annealing [11], and Ant Colony Optimization [14], among many others.
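To make formulation (1)-(3) concrete, the following Python sketch evaluates a candidate solution on a tiny instance. The instance data and the helper names `is_cover` and `cost` are hypothetical and chosen only for illustration; they are not taken from the benchmark set or from our C++ implementation.

```python
# Hypothetical 4-row, 5-column instance (illustrative data only).
costs = [3, 2, 4, 1, 2]          # c_j
a = [                            # a[i][j] = 1 if column j covers row i
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
]


def is_cover(a, x):
    """Constraint (2): every row is covered by at least one selected column."""
    return all(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) >= 1 for row in a)


def cost(costs, x):
    """Objective (1): total cost of the selected columns."""
    return sum(c_j * x_j for c_j, x_j in zip(costs, x))


x = [1, 0, 0, 1, 1]              # select columns 0, 3 and 4
print(is_cover(a, x), cost(costs, x))
```

Here `x` satisfies constraint (2) with a total cost of 6, whereas the vector `[0, 1, 0, 0, 1]` is infeasible because it leaves the first row uncovered.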

In this work, we propose an algorithm for solving the SCP that is based on the well-known beam-search algorithm, which has been used in many optimization problems [4, 5, 12, 19]. Beam search is a fast, approximate branch-and-bound method that operates in a limited search space to find good solutions for optimization problems. It constructs a search tree in breadth-first order, but keeps only the most promising nodes at each level according to some rule. Our implementation selects these nodes using a simple greedy algorithm that can be seen as a depth-first search: the greedy algorithm completes a solution from the node and returns its fitness, which is then used to select and discard nodes from the search tree.

This paper is organized as follows: Sect. 2 describes our beam-search implementation for the SCP, Sect. 3 shows the results that we obtained on a well-known set of SCP benchmark instances, and Sect. 4 presents conclusions and future work.

2 Beam Search

Beam Search [15] is a deterministic heuristic algorithm that constructs a search tree. It begins with an empty solution at the root node and gradually constructs solution candidates, level by level. At each level of the tree, two procedures are applied: PromisingChildren and SelectBest. The first expands each node into its \(n_p\) most promising children according to some criterion; the second chooses the \(n_s\) most promising nodes from the current level. Thus, the tree has one node at level 0 and \(n_s\) nodes at level 1; from level 2 onward, the algorithm selects \(n_s\) nodes from a pool of at most \(n_s \cdot n_p\) nodes. Beam Search is not complete, because the optimal solution may be pruned during the search process. Algorithm 1 corresponds to the classic beam search described above.

figure a
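As a rough illustration of the classic scheme, the following Python sketch implements a generic beam search in which lower scores are better. The function signature, the toy instance and the node evaluation (cost so far plus a penalty of 10 per uncovered row) are assumptions made for this example only; they are not a transcription of Algorithm 1.

```python
def beam_search(root, expand, score, is_solution, n_p, n_s):
    """Classic beam search (lower score is better)."""
    level, best = [root], None
    while level:
        pool = []
        for node in level:
            # PromisingChildren: keep the n_p best children of each node.
            pool.extend(sorted(expand(node), key=score)[:n_p])
        # SelectBest: keep the n_s best nodes of the whole level.
        level = sorted(pool, key=score)[:n_s]
        for node in level:
            if is_solution(node) and (best is None or score(node) < score(best)):
                best = node
        if best is not None:
            return best          # stop at the first level containing a solution
    return best                  # None: every node was discarded


# Toy SCP instance (hypothetical data): a node is a tuple of selected columns.
costs = [3, 2, 4, 1, 2]
rows_of = [{0, 1}, {1, 2}, {0, 3}, {2}, {3}]   # rows covered by each column
ALL_ROWS = {0, 1, 2, 3}


def covered(node):
    return set().union(*(rows_of[j] for j in node))


def expand(node):
    return [node + (j,) for j in range(len(costs)) if j not in node]


def score(node):
    # Assumed evaluation: cost so far plus 10 per still-uncovered row.
    return sum(costs[j] for j in node) + 10 * len(ALL_ROWS - covered(node))


def is_solution(node):
    return covered(node) == ALL_ROWS


best = beam_search((), expand, score, is_solution, n_p=2, n_s=2)
print(best, score(best))
```

On this instance the search returns the cover formed by columns 1 and 2 at the second level of the tree.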

2.1 Our Implementation

To adapt this algorithm to the SCP, we proceed as follows. PromisingChildren determines the \(n_p\) most promising children of the current node. This is achieved by computing, for each non-instantiated variable, a value using one of the following functions: \(c_j/k_j\), \(c_j/k_j^2\), \(c_j/(k_j\log (1+k_j))\), \(c_j^{1/2}/k_j\), \(c_j/k_j^{1/2}\) and \(c_j/ \log (k_j+1)\) [8, 13]. Here \(k_j\) is the number of currently uncovered rows that column j would cover. The function is selected at random and is used for all the nodes of the current level. Then, the \(n_p\) variables with the lowest values are instantiated. After that, we run a greedy algorithm on each of the new candidate nodes using the procedure Greedy-SelectBest. This greedy algorithm constructs a branch (one node per level), using the same function selected in PromisingChildren, until a solution is reached. At the end of this process, each node has an associated solution. The procedure then selects the \(n_s\) nodes with the best objective function values. The best solution found so far in the search is used to discard nodes with a worse objective function value.
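The two building blocks described above can be sketched in Python as follows. This is a minimal sketch and not our C++ code: the names `promising_children` and `greedy_complete`, the toy data, and the decision to skip columns with \(k_j = 0\) (which avoids a division by zero) are assumptions of the example. The function is fixed to \(c_j/k_j\) here so that the output is reproducible, whereas the algorithm draws one of the six functions at random per level.

```python
import math

# The six priority functions; c is the cost c_j and k is k_j (k >= 1).
PRIORITY_FUNCTIONS = [
    lambda c, k: c / k,
    lambda c, k: c / k**2,
    lambda c, k: c / (k * math.log(1 + k)),
    lambda c, k: math.sqrt(c) / k,
    lambda c, k: c / math.sqrt(k),
    lambda c, k: c / math.log(k + 1),
]


def promising_children(costs, rows_of, selected, n_p, f):
    """Return the n_p unselected columns with the lowest priority value."""
    covered = set().union(*(rows_of[j] for j in selected))
    scored = []
    for j in range(len(costs)):
        k = len(rows_of[j] - covered)
        if j not in selected and k > 0:   # assumption: ignore columns with k_j = 0
            scored.append((f(costs[j], k), j))
    return [j for _, j in sorted(scored)[:n_p]]


def greedy_complete(costs, rows_of, selected, n_rows, f):
    """Add one column at a time (lowest priority value) until all rows are covered."""
    solution = list(selected)
    while len(set().union(*(rows_of[j] for j in solution))) < n_rows:
        nxt = promising_children(costs, rows_of, solution, 1, f)
        if not nxt:
            raise ValueError("instance is infeasible")
        solution.append(nxt[0])
    return solution, sum(costs[j] for j in solution)


# Toy instance (hypothetical data).
costs = [3, 2, 4, 1, 2]
rows_of = [{0, 1}, {1, 2}, {0, 3}, {2}, {3}]
f = PRIORITY_FUNCTIONS[0]                 # c_j / k_j, fixed for reproducibility

children = promising_children(costs, rows_of, [], 2, f)
print(children)
for j in children:
    print(greedy_complete(costs, rows_of, [j], 4, f))
```

With these data, the two most promising children of the root are columns 1 and 3, and the greedy completion of either child yields a cover of cost 6.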

Unlike the classic algorithm, the search does not stop when a solution is found or when all nodes are discarded; instead, we fix the number of nodes to be generated (see Algorithm 2).

figure b
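One possible reading of the adapted search, with a node budget as the stopping rule, is sketched below in Python. Several details are assumptions of this sketch rather than statements about Algorithm 2: the budget is checked once per level, a node is discarded when its partial cost already reaches the cost of the best solution found, duplicate nodes are not merged, and the instance is assumed to be feasible. All function names are hypothetical.

```python
import math
import random


def covered_rows(rows_of, cols):
    return set().union(*(rows_of[j] for j in cols))


def ranked_columns(costs, rows_of, cols, f):
    """Unselected columns that cover at least one new row, best priority first."""
    done = covered_rows(rows_of, cols)
    scored = [(f(costs[j], len(rows_of[j] - done)), j)
              for j in range(len(costs))
              if j not in cols and rows_of[j] - done]
    return [j for _, j in sorted(scored)]


def greedy_value(costs, rows_of, cols, n_rows, f):
    """Greedily complete a partial solution; return (cost, columns)."""
    cols = list(cols)
    while len(covered_rows(rows_of, cols)) < n_rows:
        cols.append(ranked_columns(costs, rows_of, cols, f)[0])
    return sum(costs[j] for j in cols), cols


def beam_search_scp(costs, rows_of, n_rows, n_p, n_s, max_nodes, functions, rng):
    level, generated = [()], 0
    best_cost, best_cols = math.inf, None
    while level and generated < max_nodes:
        f = rng.choice(functions)              # one function per tree level
        pool = []
        for node in level:
            for j in ranked_columns(costs, rows_of, node, f)[:n_p]:
                child = node + (j,)
                generated += 1
                value, cols = greedy_value(costs, rows_of, child, n_rows, f)
                if value < best_cost:
                    best_cost, best_cols = value, cols
                pool.append((value, child))
        # Keep the n_s nodes with the best greedy value, after dropping nodes
        # whose partial cost cannot improve on the best solution found.
        pool = [(v, c) for v, c in sorted(pool)
                if sum(costs[j] for j in c) < best_cost]
        level = [c for _, c in pool[:n_s]]
    return best_cost, sorted(best_cols)


# Toy instance (hypothetical data); a single function keeps the run deterministic.
costs = [3, 2, 4, 1, 2]
rows_of = [{0, 1}, {1, 2}, {0, 3}, {2}, {3}]
result = beam_search_scp(costs, rows_of, 4, n_p=2, n_s=2, max_nodes=20,
                         functions=[lambda c, k: c / k], rng=random.Random(0))
print(result)
```

On this toy instance the search generates ten nodes before every remaining node is discarded, and it returns the cover formed by columns 1 and 2 with cost 6.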

2.2 Preprocessing

Preprocessing is a popular way to speed up the algorithm. A number of preprocessing methods have been proposed for the SCP [2]. In our implementation, we used the most effective ones:

  • Column domination: Any column j whose rows \(I_j\) can be covered by other columns at a total cost lower than \(c_j\) can be deleted from the problem. However, checking this condition is itself an NP-complete problem [9], so we instead used the rule described in [17].

  • Column inclusion: If a row is covered by only one column after the domination step above, this column must be included in the optimal solution.
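The two reductions can be illustrated with the Python sketch below. Note that the domination test used here is a simplified pairwise check (a column is removed when a single other column covers at least the same rows at no greater cost); it is an assumption of this example and is not the rule of [17]. The function names and the data are hypothetical.

```python
def remove_dominated(costs, rows_of):
    """Indices of the columns kept by a simple pairwise domination test."""
    keep = []
    for j, rows_j in enumerate(rows_of):
        dominated = any(
            h != j
            and rows_j <= rows_of[h]                   # h covers every row of j
            and (costs[h] < costs[j]
                 or (costs[h] == costs[j]
                     # ties: strictly larger coverage wins, then lower index
                     and (rows_j < rows_of[h] or h < j)))
            for h in range(len(costs))
        )
        if not dominated:
            keep.append(j)
    return keep


def forced_columns(rows_of, keep, n_rows):
    """Column inclusion: columns that are the only cover of some row."""
    forced = set()
    for i in range(n_rows):
        covering = [j for j in keep if i in rows_of[j]]
        if len(covering) == 1:
            forced.add(covering[0])
    return forced


# Toy instance (hypothetical data): 4 rows, 4 columns.
costs = [2, 3, 1, 4]
rows_of = [{0, 1}, {1}, {2}, {1, 2, 3}]

keep = remove_dominated(costs, rows_of)
forced = forced_columns(rows_of, keep, n_rows=4)
print(keep, forced)
```

In this example column 1 is removed because column 0 covers the same row at a lower cost, and columns 0 and 3 are forced into the solution because they are the only columns covering rows 0 and 3, respectively.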

3 Experiments

Our approach has been implemented in C++ and run on a computer with a 2.4 GHz Intel Core i7-4700MQ CPU and 8 GB of RAM, running Ubuntu 14.04 LTS x86_64. To test it, we used 45 SCP instances from the OR-Library (footnote 1), which are described in Table 1. Optimal solutions are known for all of these instances.

Table 1 Detail of the test instances
Table 2 Experiments using \(n_p=20\) and \(n_s=10\)
Fig. 1 Convergence plots for the (a) scp41, (b) scp42 and (c) scp43 instances

The parameters of our algorithm were configured before performing the search. Each of these instances was executed 20 times with several values of \(n_p\) and \(n_s\). The best results (in terms of the average value) were obtained with \(n_p=20\) and \(n_s=10\). As the stopping criterion, we set a maximum of 1000 nodes in the search tree; beyond this value, the algorithm did not show a significant improvement in the solutions. Table 2 shows the results obtained with this configuration.

The column Optima gives the optimal (lowest possible) objective function value for each instance. Min-value and Max-value are the lowest and the highest objective function values, respectively, obtained by our proposal in 20 executions. The mean value of these 20 executions is shown in the column Avg. The column RPD gives the Relative Percent Difference, which is defined as follows:

$$\begin{aligned} RPD = \frac{(\text {Min-value}-\text {Optima})}{\text {Optima}}\times 100. \end{aligned}$$
(4)
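As a small worked example of Eq. (4), the Python snippet below computes the RPD for made-up values. The numbers are hypothetical and are not taken from Table 2.

```python
def rpd(min_value, optimum):
    """Relative Percent Difference of Eq. (4)."""
    return (min_value - optimum) / optimum * 100


# Hypothetical values: best value found 404, known optimum 400.
example = rpd(404, 400)
print(example)
```

For these values the best solution found is 1% above the optimum, and an RPD of 0 means that the optimum was reached in at least one of the executions.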

Convergence plots can be seen in Fig. 1.

4 Conclusion and Future Work

In this work we have presented a beam-search approach with a greedy algorithm to solve the SCP. Our approach applies a greedy algorithm at each node to find solutions, using a set of simple functions that choose promising variables. Experiments show very promising results, considering that the technique is not yet fully exploited. In future work we plan to perform a more guided search by using a nogood-like learning strategy (footnote 2), which should reduce the size of the search tree. We also plan to adapt this technique to the bi-objective SCP formulation.