1 Introduction

One of the most fundamental and well-studied \({\mathcal {NP}}\)-hard problems in the field of machine scheduling is makespan minimization on identical parallel machines, where a set of n independent jobs \(\mathcal {J}=\{J_1,\ldots ,J_n\}\) with positive processing times \(t_1,\ldots ,t_n\) has to be assigned to m identical parallel machines \(\mathcal {M} =\{M_1,\ldots ,M_m\}\) in order to minimize the latest machine completion time. A related, in some sense dual, but far less well-studied \({\mathcal {NP}}\)-hard problem is obtained when the objective is changed from minimizing the makespan \(C_{\max }\) to maximizing the minimum completion time \(C_{\min } = \min \{C_1,\ldots ,C_m\}\) without introducing idle times, where \(C_i\) is the sum of processing times of all jobs assigned to \(M_i\). While the former problem (denoted by \(P||C_{\max }\) in the three-field notation of Graham et al. 1979) is a kind of packing problem, the latter problem (abbreviated as \(P||C_{\min }\)), which is the subject of this paper, belongs to the class of covering problems and is therefore also referred to as the machine covering problem. It was first described by Friesen and Deuermeyer (1981) in the context of assigning spare parts to machines that undergo repeated repair. As another application of \(P||C_{\min }\), Haouari and Jemmali (2008) mentioned the (fair) regional allocation of investments.

Although both problems basically aim to balance the workload among a given set of resources, \(P||C_{\min }\) has received far less attention than its much more prominent counterpart \(P||C_{\max }\). To the best of our knowledge, the rather scarce literature on \(P||C_{\min }\) is limited to a few studies on approximation algorithms and their worst-case ratios; up to now, only one exact solution procedure has been proposed. For the well-known longest processing time rule (LPT), Deuermeyer et al. (1982) showed that the minimum completion time of the LPT-schedule \(C_{\min }^\mathrm{LPT}\) is never less than 3/4 times the optimal minimum completion time \(C_{\min }^*\). Ten years later, Csirik et al. (1992) tightened this performance bound by proving that \(C_{\min }^\mathrm{LPT}/C_{\min }^*\ge (3m-1)/(4m-2)\) holds for any fixed m. In this context, Walter (2013) recently examined the performance relationship between the LPT-rule and a restricted version of it (known as RLPT) and proved that \(C_{\min }^\mathrm{LPT}\ge C_{\min }^\mathrm{RLPT}\). Further publications are concerned with a polynomial-time approximation scheme (PTAS) (cf. Woeginger 1997) and with on-line as well as semi-on-line versions of \(P||C_{\min }\) (cf., e.g., Azar and Epstein 1998; He and Tan 2002; Luo and Sun 2005; Ebenlendr et al. 2006; Cai 2007; Tan and Wu 2007; Epstein et al. 2011). The sole publication devoted to exact solution procedures is due to Haouari and Jemmali (2008). The main features of their branch-and-bound algorithm are tight lower and upper bounds and a symmetry-breaking solution structure. However, computational results revealed that, except for small-sized instances, their algorithm fails to (quickly) solve instances where the ratio of n to m ranges between two and about three.

To overcome this drawback, we approach the machine covering problem from a perspective similar to the one taken by Walter and Lawrinenko (2014) for the dual problem \(P||C_{\max }\). In their recent contribution, the authors present structural properties of (potentially) makespan-optimal schedules. These properties are then transformed into problem-specific dominance criteria and implemented in a tailored branch-and-bound algorithm that performs very well on instances with small n / m-values.

Motivated by Walter and Lawrinenko’s results, in this paper we propose a tailored branch-and-bound algorithm for problem \(P||C_{\min }\). Our contribution differs substantially from the contribution by Walter and Lawrinenko (2014) in the following respects: (i) we show that their central result on makespan-optimal schedules also applies to problem \(P||C_{\min }\), (ii) we extend their basic dominance criterion by several novel \(P||C_{\min }\)-specific dominance criteria derived from properties of optimal \(P||C_{\min }\)-schedules, and (iii) we develop new upper bounds on \(C_{\min }^*\). Some of these bounds are derived from the solution structure, while others exploit the coherence between \(P||C_{\min }\) and the bin covering problem (BCP). The latter problem consists in packing a set of indivisible items into a maximum number of bins so that the total weight of the items in each bin is at least C. To the best of our knowledge, this coherence has not been mentioned before in the literature.

The paper is organized as follows. In the technical part we describe properties of \(C_{\min }\)-optimal schedules (cf. Sect. 2) and translate them into novel \(P||C_{\min }\)-specific dominance criteria (cf. Sect. 3). In the algorithmic part, we provide a concise description of the proposed branch-and-bound algorithm (Sect. 4) and we evaluate its performance on different sets of benchmark instances (cf. Sect. 5). The paper concludes with a short summary and some interesting ideas for future research in Sect. 6.

For economy of notation, we usually identify both machines and jobs by their indices. Moreover, w.l.o.g., we assume the jobs to be indexed so that \(t_1\ge \cdots \ge t_n\). In addition, to avoid trivial instances, we presuppose \(n > m\ge 2\).

2 Theoretical background

In this section, we derive structural properties of \(C_{\min }\)-optimal schedules, which will be transformed later into problem-specific dominance criteria.

2.1 Solution representation and illustration

Throughout this paper, we assume schedules to be non-permuted. According to Walter and Lawrinenko (2014), a schedule \(S\in \{1,2,\ldots ,m\}^n\)—where \(S(j) = i\) means that job j is assigned to machine i—is said to be non-permuted if S fulfills the following two conditions:

  (i) \(S(1) = 1\) and

  (ii) \(S(j)\in \Big \{1,\ldots ,\min \{m,1 + \max _{1\le k\le j-1} S(k)\}\Big \}\) for all \(j = 2,\ldots ,n\).

Note that due to this representation, symmetric reflections obtained by a simple renumbering of the machines are avoided.

Furthermore, instead of using Gantt charts we adopt the illustration of schedules as sets of paths which has also been introduced by Walter and Lawrinenko (2014). More precisely, a schedule S is represented by \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) paths \(P_S^{(i_1,i_2)}\)—one for each pair \((i_1,i_2)\) of machines \((1\le i_1 < i_2\le m)\). Each path is a string of length \(n+1\) where the j-th entry (\(j=1,\ldots ,n\)) represents the difference between the number of jobs assigned to \(i_1\) and \(i_2\) in S after the assignment of the j longest jobs. Additionally, to allow for initially empty machines, we set \(P_S^{(i_1,i_2)}(0) = 0\) for all pairs. In a graphical illustration, the entries of a path are linearly connected (see Example 2.1).

Example 2.1

Let \(n=4\), \(m=3\), and consider the non-permuted schedule \(S = (1,2,2,3)\). The corresponding paths are \(P_S^{(1,2)} = (0,1,0,-1,-1)\), \(P_S^{(1,3)} = (0,1,1,1,0)\), and \(P_S^{(2,3)} = (0,0,1,2,1)\).

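To make the path representation concrete, the following Python sketch (our illustration, not part of the paper's algorithm) computes all \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) paths of a given schedule and reproduces the three paths of Example 2.1.

```python
from itertools import combinations

def paths(schedule, m):
    """Compute the path P_S^(i1,i2) for every machine pair i1 < i2.

    schedule[j-1] is the machine (1..m) processing job j, jobs being
    indexed by non-increasing processing time; entry 0 of a path is 0.
    """
    P = {}
    for i1, i2 in combinations(range(1, m + 1), 2):
        path, diff = [0], 0
        for machine in schedule:
            diff += (machine == i1) - (machine == i2)
            path.append(diff)
        P[(i1, i2)] = path
    return P

# Example 2.1: S = (1,2,2,3) with m = 3
for pair, path in paths([1, 2, 2, 3], 3).items():
    print(pair, path)
# (1, 2) [0, 1, 0, -1, -1]
# (1, 3) [0, 1, 1, 1, 0]
# (2, 3) [0, 0, 1, 2, 1]
```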

2.2 Potential optimality

The concept of potential optimality has recently been proposed by Walter and Lawrinenko (2014) and originates from the question: “Are there certain general patterns in the structure of schedules that cannot lead to optimal solutions?” Admittedly, at first glance, this approach may appear a bit unusual, as our actual aim is to derive properties of optimal solutions. However, by identifying properties of solutions that can never become (uniquely) optimal, we also implicitly gain insights into the structure of solutions that have the potential to become (uniquely) optimal; such solutions are therefore called potentially (unique) optimal.

At this point, we want to mention that the concept of potential optimality is to some extent related to the concept of inverse optimization (for a review, see Ahuja and Orlin 2001, as well as Heuberger 2004), where unknown exact values of some adjustable parameters (e.g., processing times) are to be determined within given boundaries in such a way that a pre-specified solution becomes optimal and the deviation between the determined and the given parameter values is minimal. Although inverse optimization has attracted many researchers in different areas of combinatorial optimization during the last two decades, applications to scheduling problems (e.g., see Koulamas 2005, as well as Brucker and Shakhlevich 2009, 2011) are still rather rare. Our approach constitutes another contribution to this field; it differs slightly from the basic idea of inverse optimization in that we intend to identify a preferably large set of solutions for which no choice of processing times makes any of these solutions uniquely \(P||C_{\min }\)-optimal.

As will be seen next (cf. Theorem 2.2), solutions contained in the set \(\mathcal {S}\) play a crucial role in the context of potentially unique optimal \(P||C_{\min }\)-schedules. The set \(\mathcal {S}\) is formally defined as

$$\begin{aligned} \mathcal {S} =&\Big \{S\,:\,\text {for each pair }(i_1,i_2)\text { there exists either a }\\&j\in \{3,\ldots ,n\} \text { so that }P_S^{(i_1,i_2)}(j) < 0,\\&\text { or }0<j_1\le j_2<n\text { so that }P_S^{(i_1,i_2)}(j) = 1\text { for }\\&\quad j=j_1,\ldots ,j_2\text { and }P_S^{(i_1,i_2)}(j) = 0\text { else }\Big \} \end{aligned}$$

and contains all schedules in which each machine processes at least one job and, for each pair of machines that together process more than two jobs, the corresponding path has at least one negative entry. We say that schedules in \(\mathcal {S}\) fulfill the path-conditions or, equivalently, that all \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) paths fulfill their respective path-condition.

Revisiting the schedule \(S=(1,2,2,3)\) introduced in Example 2.1, we readily see that S is not an element of \(\mathcal {S}\), although no machine remains empty and the paths \(P_S^{(1,2)}\) and \(P_S^{(1,3)}\) fulfill their path-conditions. The path \(P_S^{(2,3)}\), however, does not fulfill its path-condition: it has no negative entry, although the two machines process more than two jobs in total.
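The path-condition translates directly into code. The following sketch (ours; all names are our own) tests membership in \(\mathcal {S}\) for a complete schedule and confirms that the schedule of Example 2.1 does not belong to \(\mathcal {S}\).

```python
def fulfills_path_condition(path):
    """Path-condition of Sect. 2.2 for one complete path of length n+1."""
    n = len(path) - 1
    # Case 1: some entry P(j) < 0 with j in {3,...,n}
    if any(path[j] < 0 for j in range(3, n + 1)):
        return True
    # Case 2: P(j) = 1 exactly on a window 0 < j1 <= j2 < n, 0 elsewhere
    # (the pair of machines processes exactly two jobs in total)
    ones = [j for j, v in enumerate(path) if v == 1]
    if (ones and all(v in (0, 1) for v in path)
            and ones == list(range(ones[0], ones[-1] + 1))
            and ones[0] > 0 and ones[-1] < n):
        return True
    return False

def in_S(schedule, m):
    """True iff every pair's path fulfills its path-condition."""
    for i1 in range(1, m + 1):
        for i2 in range(i1 + 1, m + 1):
            diff, path = 0, [0]
            for machine in schedule:
                diff += (machine == i1) - (machine == i2)
                path.append(diff)
            if not fulfills_path_condition(path):
                return False
    return True

print(in_S([1, 2, 2, 3], 3))  # False: pair (2,3) violates its condition
```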

Our main theorem on potential optimality reads as follows.

Theorem 2.2

Let S be a schedule that is not an element of \(\mathcal {S}\). Then, S is not a potentially unique \(C_{\min }\)-optimal solution.

Proof

Consider an arbitrary schedule S that is not an element of \(\mathcal {S}\). Then, there exists a pair of machines, say \((i_1, i_2)\), whose corresponding path does not fulfill the path-condition. These two machines and the set of jobs currently assigned to them constitute a solution to the machine covering problem on two identical parallel machines. Since machine covering and makespan minimization on two identical parallel machines are equivalent, we can apply a result by Walter and Lawrinenko (2014). They proved that their so-called two-machine path-modification turns a two-machine schedule that does not fulfill its path-condition into a schedule whose path does fulfill it, without increasing the maximum completion time of \(i_1\) and \(i_2\). The latter is equivalent to the fact that the minimum completion time of \(i_1\) and \(i_2\) does not decrease during the modification.

As the two-machine path-modification does not affect any jobs on the remaining \(m-2\) machines, the minimum completion time of the transformed schedule cannot be smaller than the minimum completion time of S. Hence, S cannot be potentially unique \(C_{\min }\)-optimal. \(\square \)

Summarizing the result of Theorem 2.2, we know that every instance of the problem \(P||C_{\min }\) has an optimal solution where all corresponding \(\left( {\begin{array}{c}m\\ 2\end{array}}\right) \) paths fulfill their path-condition.

In what follows, we will make use of the previous result and deduce several \(P||C_{\min }\)-specific dominance criteria—subsumed under the term path-related dominance criteria—which will later on prove to be effective in guiding the search of a tailored branch-and-bound algorithm toward schedules which are elements of \(\mathcal {S}\).

3 Dominance criteria based on potential optimality

For a better understanding, we start with a brief review of the basic dominance criterion developed by Walter and Lawrinenko (2014). All other dominance criteria presented in this section are novel and \(P||C_{\min }\)-specific.

3.1 The basic criterion

Given a partial solution \(\tilde{S}\) where the k longest jobs have already been assigned, the basic criterion is readily obtained from the characterization of potentially unique optimal schedules (cf. Theorem 2.2). Recalling that each pair \((i_1,i_2)\) of machines \(1\le i_1<i_2\le m\) has to fulfill its path-condition, for each \(i_2 \in \{m,m-1,\ldots ,2\}\) we simply have to count the minimum number of jobs \(v_{i_2}^k\) that still have to be assigned to \(i_2\) so that all paths fulfill their path-condition. According to Walter and Lawrinenko (2014), \(v_{i_2}^k\) can be computed as

$$\begin{aligned} v_{i_2}^k = \max _{\begin{array}{c} i_1=1,\ldots ,i_2-1 \\ PF_{\tilde{S}}^{(i_1,i_2)}(k)=0 \end{array}}\big \{P_{\tilde{S}}^{(i_1,i_2)} (k)\big \}+\gamma _{i_2}^k, \end{aligned}$$
(1)

where \(PF_{\tilde{S}}^{(i_1,i_2)}(k) = 0\) indicates that the pair \((i_1,i_2)\) does not currently fulfill its path-condition, and \(\gamma _{i_2}^k\) is a correction term that equals 0 iff all machines \(i_1<i_2\) process at most one of the first k jobs, and 1 iff at least one machine \(i_1 < i_2\) processes more than one job. Clearly, \(v_{i_2}^k\) is set to 0 if all pairs \((i_1,i_2)\) currently fulfill their path-condition. The basic criterion then reads as follows.

Criterion 3.1

If

$$\begin{aligned} \sum _{i=1}^m v_i^k > n-k \end{aligned}$$
(2)

for some \(k<n\), then the current partial solution can be fathomed.

With regard to the next section, we define \(v_1^k = 0\) for all \(k>1\).

3.2 Further improvements

Up to now, the basic dominance Criterion 3.1 does not consider any machine completion times and therefore offers the potential for some improvements. Clearly, in a new incumbent solution, each machine completion time has to be at least as large as \(L+1\) where L is the best known minimum completion time so far. Based on this information, at first we will show how some of the demands \(v_i^k\) can be tightened and secondly we will check whether the current partial solution admits the possibility to become the new incumbent.

Recalling that we consider a partial solution where the k longest jobs have already been assigned, we let \(\hbox {LOW}^k\) denote the set of machines with current completion time \(C_i^k\) at most L, i.e., \(\hbox {LOW}^k = \{i : C_i^k \le L\}\), and \(\delta _i^k = L+1-C_i^k\) for \(i\in \hbox {LOW}^k\) denotes the gap between \(L+1\) and \(C_i^k\) (see Fig. 1).

Fig. 1 Illustration of \(\delta _i\)

In other words, to improve on the currently best solution (i.e., the incumbent), machine i \(\in \hbox {LOW}^k\) has to run at least \(\delta _i^k\) units of time longer than now. Thus, at least

$$\begin{aligned} l_i^k = \min \Big \{\beta \in \{1,\ldots ,n-k\} : \sum _{j=1}^\beta t_{k+j}\ge \delta _i^k\Big \} \end{aligned}$$
(3)

jobs still have to be assigned to \(i\in \hbox {LOW}^k\) in order to finally yield \(C_i>L\). If no such \(\beta \) exists, then it is impossible to complete the current partial solution in such a way that a new incumbent solution is obtained. In this case, we set \(l_i^k = \infty \). Eventually, the number of required jobs can be updated:

$$\begin{aligned} \bar{v}_i^k = {\left\{ \begin{array}{ll} \max \{v_i^k,l_i^k\} &{} \text {if } i \in \hbox {LOW}^k \\ v_i^k &{} \text {else.} \end{array}\right. } \end{aligned}$$
(4)

Hence, a tighter version of Criterion 3.1 reads as follows.

Criterion 3.2

If

$$\begin{aligned} \sum _{i=1}^m \bar{v}_i^k > n-k \end{aligned}$$
(5)

for some \(k<n\), then the current partial solution can be fathomed.
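A compact sketch of Criterion 3.2 may be helpful. The following Python fragment (our illustration; machine indices are 0-based, and the demands \(v_i^k\) of the basic criterion are assumed to be given) computes \(l_i^k\) via Eq. (3) and the updated demands \(\bar{v}_i^k\) via Eq. (4); the usage line employs the data of Example 3.8 below, with the demand vector matching Table 2 there.

```python
import math

def updated_demands(t, k, C, L, v):
    """Eqs. (3)-(4): tighten the demands v_i^k via the gaps delta_i^k.

    t: processing times, non-increasing; C: machine completion times
    after the k longest jobs; v: demands from Criterion 3.1.
    """
    n = len(t)
    v_bar = list(v)
    for i in range(len(C)):
        if C[i] <= L:                      # machine i lies in LOW^k
            delta, total, l_i = L + 1 - C[i], 0, math.inf
            for beta in range(1, n - k + 1):
                total += t[k + beta - 1]   # t_{k+1}, t_{k+2}, ...
                if total >= delta:         # Eq. (3)
                    l_i = beta
                    break
            v_bar[i] = max(v[i], l_i)      # Eq. (4); inf means: fathom
    return v_bar

# Data of Example 3.8: the tightened demands sum to 4 = n - k,
# so test (5) does not fathom this partial solution.
t = [95, 93, 87, 86, 66, 63, 62, 55, 45, 25, 10, 6]
print(updated_demands(t, 8, [95, 93, 149, 141, 129], 130, [0, 0, 0, 1, 0]))
# [1, 1, 0, 1, 1]
```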

After having updated the number of required jobs, we will incorporate the processing times of the remaining jobs and derive further criteria. To this end, we let \(T_\mathrm{rem}^k = \sum _{j=k+1}^n t_j\) denote the total remaining processing time and define \(\varDelta ^k = \sum _{i\in \hbox {LOW}^k} \delta _i^k\). It is then easy to see that there is no need to further consider a partial solution if \(T_\mathrm{rem}^k < \varDelta ^k\): in this case, it is impossible to assign the remaining jobs to the machines in \(\hbox {LOW}^k\) so that finally \(C_i>L\) holds for all \(i\in \hbox {LOW}^k\), i.e., no matter how we complete the current partial solution, we cannot obtain a new incumbent solution.

In what follows, we are concerned with tightening the inequality \(T_\mathrm{rem}^k < \varDelta ^k\), i.e., we want to identify (at low computational costs) a feasible value \(R^k>0\) so that the current solution can already be excluded from further searching whenever

$$\begin{aligned} T_\mathrm{rem}^k - R^k < \varDelta ^k \end{aligned}$$
(6)

is fulfilled. For this purpose, in a first step we take a look at those machines that already run longer than L but require at least one more job to fulfill the path-conditions. We summarize these machines in the set \(\mathrm{PATH}^k = \{i : C_i^k>L \text { and } v_i^k > 0\}\) and define

$$\begin{aligned} V_\mathrm{PATH}^k = \sum _{i\in \mathrm{PATH}^k} v_i^k \end{aligned}$$
(7)

which gives the number of required jobs added over all \(i\in \mathrm{PATH}^k\). Then, we readily observe that at least \(V_\mathrm{PATH}^k\) of the remaining \(n-k\) jobs cannot be used to fill the gaps on the machines in \(\hbox {LOW}^k\) or, equivalently, not more than \(n-k-V_\mathrm{PATH}^k\) jobs are available for being assigned to the machines in \(\hbox {LOW}^k\). Hence, we can conclude that \(R^k\) is at least as large as the sum of the \(V_\mathrm{PATH}^k\) shortest processing times, i.e., \(R^k=\sum _{j=1}^{V_\mathrm{PATH}^k} t_{n-j+1}\), and (6) appears as follows.

Criterion 3.3

If

$$\begin{aligned} T_\mathrm{rem}^k - \sum _{j=1}^{V_\mathrm{PATH}^k} t_{n-j+1} < \varDelta ^k \end{aligned}$$
(8)

for some \(k<n\), then the current partial solution can be fathomed.

To shorten notation, we set \(n':=n-V_\mathrm{PATH}^k\) in the remainder of this section. Moreover, for the sake of readability, and since we always consider a partial solution after the assignment of the k longest jobs, we will omit the superscript k.

We will now show that there is still some potential to further increase the value of R. So far, we have not adequately taken into account that the jobs are indivisible: since preemption is not allowed, the processing of a job cannot be interrupted and continued on another machine. Therefore, in contrast to what our current criteria implicitly assume, if no selection of the remaining \(n'-k\) jobs lets a machine \(i\in \hbox {LOW}\) finish exactly at time \(L+1\), then the time difference (or surplus) \(C_i - (L+1)\) on machine i cannot be used to increase the completion time of other machines in \(\hbox {LOW}\). Recalling that each machine \(i\in \hbox {LOW}\) requires at least \(\bar{v}_i\) more jobs to fulfill its path-conditions and to allow for a new incumbent, we simply check whether the sum of the \(\bar{v}_i\) shortest available jobs, i.e., \(n',\ldots ,n'-\bar{v}_i+1\), already exceeds the gap \(\delta _i\) on that machine. If this is the case, then R can be further increased as follows. For each \(i\in \hbox {LOW}\), we compute

$$\begin{aligned} s_i = \max \left\{ C_i + \sum _{j=1}^{\bar{v}_i}{t_{n'-j+1}} - (L+1),0\right\} \end{aligned}$$
(9)

which we call the individual surplus on machine i as well as

$$\begin{aligned} S = \sum _{i\in \hbox {LOW}} s_i \end{aligned}$$
(10)

which represents a lower bound on the cumulative surplus caused by the machines \(i\in \hbox {LOW}\). Then, it is obvious that in any complete solution, R will be at least as large as \(\sum _{j=1}^{V_\mathrm{PATH}} t_{n-j+1} + S\).

At this point, we remark that (9) and (10) still assume the \(V_\mathrm{PATH}\) overall shortest jobs to be assigned to the machines in \(\mathrm{PATH}\). This assumption is justified by the following (trivial) lemma, which we state without proof.

Lemma 3.4

If it is possible to fill all gaps on the machines in \(\hbox {LOW}\) using at most \(z\le n'-k\) of the jobs \(k+1,\ldots ,n\), then it is possible to fill all gaps on these machines using the z longest jobs \(k+1,\ldots ,k+z\).

As mentioned above, the calculation of the individual surplus \(s_i\) for each \(i\in \hbox {LOW}\) (cf. Eq. 9) is a basic approach: the shortest available jobs are treated as if they were assignable to each machine in \(\hbox {LOW}\) simultaneously. Clearly, this constitutes an essential simplification, since in a feasible schedule each job has to be assigned to exactly one machine. Taking this into account, for the subset of machines \(\hbox {LOW}1 := \{i\in \hbox {LOW} : l_i = 1\}\), we now propose an alternative approach to calculate a lower bound on the cumulative surplus caused by these machines. As will be seen, the restriction to the subset \(\hbox {LOW}1\) of \(\hbox {LOW}\) keeps the alternative cumulative surplus easily and quickly computable. The approach builds on the fact that in total (at least) \(|\hbox {LOW}1|\) jobs have to be assigned to the machines in \(\hbox {LOW}1\) and, since \(l_i=1\) for all \(i\in \hbox {LOW}1\), a single job per machine \(i\in \hbox {LOW}1\) might suffice to fill the gaps. So, we can take the \(|\hbox {LOW}1|\) shortest available jobs, i.e., \(n',\ldots ,n'-|\hbox {LOW}1|+1\), and, assuming that each of them is assigned to exactly one machine in \(\hbox {LOW}1\), calculate a lower bound on the cumulative surplus caused by the machines in \(\hbox {LOW}1\) as follows. Let

$$\begin{aligned} s'_i(j) = \max \{C_i+t_j-(L+1),0\} \text { for } i\in \hbox {LOW}1 \end{aligned}$$
(11)

denote the (single) surplus generated by assigning job j to machine \(i\in \hbox {LOW}1\) (see also Fig. 2). Then, the following lemma is readily verified.

Fig. 2 Illustration of \(s'_i(j)\)

Lemma 3.5

Consider two jobs \(j_1, j_2\) with \(t_{j_1}\ge t_{j_2}\) and two machines \(i_1, i_2\in \hbox {LOW}1\) with \(C_{i_1}\ge C_{i_2}\). Then, \(s'_{i_1}(j_2) + s'_{i_2}(j_1)\le s'_{i_1}(j_1) + s'_{i_2}(j_2)\).

Proof

If \(s'_{i_1}(j_2) = 0\), then \(s'_{i_1}(j_2) + s'_{i_2}(j_1) = s'_{i_2}(j_1)\le s'_{i_1}(j_1)\le s'_{i_1}(j_1) + s'_{i_2}(j_2)\) since \(C_{i_2}+t_{j_1} - (L+1) \le C_{i_1}+t_{j_1} - (L+1)\).

If \(s'_{i_2}(j_1) = 0\), then \(s'_{i_1}(j_2) + s'_{i_2}(j_1) = s'_{i_1}(j_2)\le s'_{i_1}(j_1)\le s'_{i_1}(j_1) + s'_{i_2}(j_2)\) since \(C_{i_1}+t_{j_2} - (L+1) \le C_{i_1}+t_{j_1} - (L+1)\).

Now, assume that \(C_{i_1}+t_{j_2} > L+1\) and \(C_{i_2}+t_{j_1} > L+1\). Then,

$$\begin{aligned} s'_{i_1}(j_2) + s'_{i_2}(j_1)&= C_{i_1}+t_{j_2}-(L+1) + C_{i_2}+t_{j_1}-(L+1) \\&= C_{i_1}+t_{j_1}-(L+1) + C_{i_2}+t_{j_2}-(L+1)\\&\le s'_{i_1}(j_1) + s'_{i_2}(j_2). \end{aligned}$$

\(\square \)

According to Lemma 3.5, assigning the longer job to the machine with the larger gap in \(\hbox {LOW}1\) and the shorter job to the machine with the smaller gap results in a smaller cumulative surplus (on these two machines) than the other way round. Clearly, this pairwise consideration can easily be extended to all machines in \(\hbox {LOW}1\) as summarized in the following corollary.

Corollary 3.6

Assume the machines in \(\hbox {LOW}1\) to be sorted according to non-decreasing gaps (i.e., non-increasing completion times). Then, the smallest cumulative surplus caused by the machines in \(\hbox {LOW}1\) is obtained by assigning the \(|\hbox {LOW}1|\) shortest available jobs in non-decreasing order of processing times to the machines in \(\hbox {LOW}1\).

Proof

The corollary can easily be proved by contradiction. First, observe that considering any \(|\hbox {LOW}1|\) jobs out of \(k+1,\ldots ,n'\) cannot yield a smaller cumulative surplus than selecting the \(|\hbox {LOW}1|\) shortest ones, because the surpluses are non-decreasing in the processing times.

Now, assume that the smallest cumulative surplus is not achieved by the assignment described in Corollary 3.6. Then, there must exist a pair of machines (and jobs) where the machine with the larger gap is assigned the job with the shorter processing time and the machine with the smaller gap is assigned the job with the longer processing time. According to Lemma 3.5, swapping these two jobs results in a smaller cumulative surplus, which is a contradiction. \(\square \)

Formally, we can summarize Corollary 3.6 as follows: Let \(\pi \) be a permutation of the machines in \(\hbox {LOW}1\) so that \(C_{\pi (1)}\ge C_{\pi (2)}\ge \cdots \ge C_{\pi (|\hbox {LOW}1|)}\), then the smallest cumulative surplus \(S1'\) is obtained by assigning job \(n'-i+1\) to machine \(\pi (i)\) for \(i=1,\ldots ,|\hbox {LOW}1|\), i.e.,

$$\begin{aligned} S1' = \sum _{i=1}^{|\hbox {LOW}1|} s'_{\pi (i)}(n'-i+1). \end{aligned}$$
(12)

Note that again Lemma 3.4 is taken for granted in the computation of the cumulative surplus \(S1'\) on the machines in \(\hbox {LOW}1\).

Altogether, we have developed two different approaches to calculate a lower bound on the cumulative surplus caused by the machines in \(\hbox {LOW}1\), namely \(S1:=\sum _{i\in \hbox {LOW}1} s_i\) and \(S1'\). It is readily verified that neither does S1 dominate \(S1'\) (consider cases where \(\bar{v}_i = l_i = 1\) for all \(i\in \hbox {LOW}1\): then S1 reuses the single shortest job for every machine, whereas \(S1'\) uses distinct, longer jobs, so \(S1'\ge S1\)) nor does \(S1'\) dominate S1. Hence, a lower bound on the cumulative surplus caused by all machines in \(\hbox {LOW}\) is given by

$$\begin{aligned} \bar{S} = \sum _{i\in \mathrm{LOW}\setminus \hbox {LOW}1} s_i + \max \{S1, S1'\} = S - S1 + \max \{S1, S1'\}. \end{aligned}$$
(13)

Consequently, we can further increase R by \(\bar{S}\) and (6) results in the following criterion that is tighter than Criterion 3.3.

Criterion 3.7

If

$$\begin{aligned} T_\mathrm{rem} - \sum _{j=1}^{V_\mathrm{PATH}} t_{n'+j} - \bar{S} < \varDelta , \end{aligned}$$
(14)

then the current partial solution can be fathomed.

Summarizing the results of this subsection: given a partial solution where the k longest jobs have already been assigned, at least \(R = \sum _{j=1}^{V_\mathrm{PATH}} t_{n'+j} + \bar{S}\) units of the total remaining processing time \(T_\mathrm{rem}\) cannot be used effectively to fill the gaps on the machines in \(\hbox {LOW}\) in any new incumbent solution that fulfills the path-conditions, no matter how the remaining \(n-k\) jobs are assigned.

Example 3.8

The following example briefly illustrates the functionality of the four proposed dominance criteria. Assume \(n=12\), \(m=5\), and the vector of processing times \(T=\left( 95,93,87,86,66,63,62,55,45,25,10,6\right) \) for which a lower and an upper bound value of \(L=130\) and \(U=136\) can be determined (cf. Sect. 4), respectively. Furthermore, let \(\tilde{S}=(1,2,3,4,5,5,3,4)\) be the current partial schedule where the \(k=8\) longest jobs have already been assigned and \(T_\mathrm{rem}=\sum ^{12}_{j=9}t_j=45+25+10+6=86\).

Table 1 contains the current machine completion times \(C_i\) and the gaps \(\delta _i\). The number of required jobs according to Criteria 3.1 and 3.2 is given in Table 2. Since \(n-k=4\) jobs still have to be assigned, the current partial schedule cannot be fathomed on the basis of the first two criteria.

From Tables 1 and 2, we get \(\hbox {LOW}=\hbox {LOW}1=\{1,2,5\}\), \(\mathrm{PATH}=\{4\}\), and \(V_\mathrm{PATH}=1\) so that in a first step R equals \(t_{12}=6\). Since \(T_\mathrm{rem} - R = 86 - 6\) is not smaller than \(\varDelta = 36+38+2 = 76\), we cannot fathom the current partial solution yet (cf. Criterion 3.3).

Table 3 contains the results of the two alternative surplus computations for the machines in \(\hbox {LOW}1\). According to Criterion 3.7, the current partial solution can be fathomed because \(T_\mathrm{rem}-6-\bar{S}=86-6-15=65<76\).

Table 1 Completion times and gaps

i             1    2     3     4     5
\(C_i\)        95   93   149   141   129
\(\delta _i\)  36   38    –     –     2

Table 2 Number of required jobs

i              1    2    3    4    5
\(v_i\)         0    0    0    1    0
\(\bar{v}_i\)   1    1    0    1    1

Table 3 Surpluses

i             1    2    5
\(s_i\)        0    0    8    (\(S1 = 8\))
\(s'_i\)       0    7    8    (\(S1' = 15\))
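The computations of this subsection are easy to cross-check in code. The sketch below (ours; it takes the tightened demands \(\bar{v}_i\), the values \(l_i\) for the machines in LOW, and \(V_\mathrm{PATH}\) as inputs, with 0-based machine indices) implements Eqs. (9)-(14) and reproduces the numbers of Example 3.8.

```python
def criterion_3_7(t, k, C, L, v_bar, l, v_path_total):
    """Eqs. (8)-(14): surplus-based fathoming test (Criterion 3.7).

    t non-increasing; C completion times after the k longest jobs;
    v_bar tightened demands; l maps each machine in LOW to l_i;
    v_path_total = V_PATH. Returns True if the node can be fathomed.
    """
    n = len(t)
    n_prime = n - v_path_total                    # n' = n - V_PATH
    T_rem = sum(t[k:])
    low = [i for i in range(len(C)) if C[i] <= L]
    Delta = sum(L + 1 - C[i] for i in low)
    # Eq. (9): individual surplus, reusing the v_bar[i] shortest jobs
    s = {i: max(C[i] + sum(t[n_prime - v_bar[i]:n_prime]) - (L + 1), 0)
         for i in low}
    # Eqs. (11)-(12): LOW1 sorted by non-increasing completion time,
    # the shortest available jobs assigned one per machine
    low1 = sorted((i for i in low if l[i] == 1), key=lambda i: -C[i])
    S1 = sum(s[i] for i in low1)
    S1p = sum(max(C[i] + t[n_prime - r] - (L + 1), 0)
              for r, i in enumerate(low1, start=1))
    S_bar = sum(s.values()) - S1 + max(S1, S1p)   # Eq. (13)
    R = sum(t[n_prime:]) + S_bar
    return T_rem - R < Delta                      # Eq. (14)

t = [95, 93, 87, 86, 66, 63, 62, 55, 45, 25, 10, 6]
print(criterion_3_7(t, k=8, C=[95, 93, 149, 141, 129], L=130,
                    v_bar=[1, 1, 0, 1, 1], l={0: 1, 1: 1, 4: 1},
                    v_path_total=1))  # True: 86 - 6 - 15 = 65 < 76
```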

4 A branch-and-bound algorithm

This section provides details on the developed branch-and-bound algorithm.

4.1 Upper bounds

4.1.1 A trivial bound and its worst-case ratio

Let \(T_\mathrm{sum} = \sum _{j=1}^n t_j\), then \(U_0 = \left\lfloor \frac{T_\mathrm{sum}}{m}\right\rfloor \) represents the simplest upper bound on the optimal minimum machine completion time \(C_{\min }^*\) (cf. Haouari and Jemmali 2008). It is readily verified that the worst-case performance of \(U_0\) can be arbitrarily bad. For instance, assume \(n = m + 1\) and consider the processing times \(t_1 = K\gg 1\) and \(t_j = 1\) for \(j=2,\ldots ,n\). Then, \(U_0 = \lfloor K/m\rfloor +1\), whereas \(C_{\min }^*= 1\) so that the ratio \(U_0/C_{\min }^*\) approaches infinity as K grows.

However, based on the following two observations, the performance of \(U_0\) can be drastically improved. Firstly, note that in case \(n = m + k\) (\(k\in \{1,\ldots ,m-1\}\)) at least \(m-k\) machines will process exactly one job in any optimal solution. Thus, the minimum of \(t_{m-k}\) and \(\lfloor \frac{1}{k}\sum _{j=m-k+1}^n t_j\rfloor \) is a valid upper bound on \(C_{\min }^*\). Secondly, note that any job whose processing time is greater than or equal to \(U_0\) can be eliminated so that \(\lfloor \frac{1}{m-|\mathcal {J'}|}\sum _{j\in \mathcal {J}\setminus \mathcal {J'}} t_j\rfloor \) where \(\mathcal {J'} = \{j:t_j \ge U_0\}\) constitutes a valid upper bound on \(C_{\min }^*\). Note that \(|\mathcal {J'}|\le m-1\). Moreover, note that the aforementioned elimination (or reduction) can possibly be repeated up to \(m-1\) times in an iterative manner.

As a result of the previous observations, we can reduce any given instance I to \(\tilde{I}\) in such a way that (i) the number of remaining jobs is at least twice the number of remaining machines and (ii) the longest (remaining) processing time is smaller than \(U_0(\tilde{I})\). Then, we can prove that \(U_0(\tilde{I})\) is never more than \(2-1/m\) times the optimal minimum completion time.
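A minimal sketch of the reduction (ours; it implements only the iterative job elimination, not the \(n=m+k\) special case) is given below; on the worst-case instance above with \(K=1000\) and \(m=5\) it returns 1 instead of \(U_0=201\).

```python
def reduced_u0(t, m):
    """U0 = floor(T_sum/m) with iterative elimination of jobs whose
    processing time reaches U0 (each such job can occupy a machine of
    its own); t is assumed non-increasing."""
    jobs, machines = list(t), m
    u0 = sum(jobs) // machines
    while machines > 1:
        long_jobs = [x for x in jobs if x >= u0]
        if not long_jobs or len(long_jobs) >= machines:
            break
        jobs = [x for x in jobs if x < u0]
        machines -= len(long_jobs)
        u0 = sum(jobs) // machines
    return u0

print(reduced_u0([1000, 1, 1, 1, 1, 1], 5))  # 1 (plain U0 would be 201)
```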

Theorem 4.1

Let I be an instance of \(P||C_{\min }\) which fulfills both \(n\ge 2m\) and \(t_1<U_0(I)\). Then,

$$\begin{aligned} \frac{U_0(I)}{C_{\min }^*(I)} \le 2-\frac{1}{m} \end{aligned}$$
(15)

and this bound is asymptotically tight for any fixed \(m\ge 2\).

Proof

We prove the theorem by contradiction. Assume that \(U_0(I)/C_{\min }^*(I)>2-1/m\). This is equivalent to \(C_{\min }^*(I) < U_0(I)/(2-1/m)\le T_\mathrm{sum}/(2m-1)\). So, in an optimal schedule, the completion time of at least one machine, say \(i_1\), is less than \(T_\mathrm{sum}/(2m-1)\). Consequently, there exists at least one other machine, say \(i_2\), whose completion time \(C_{i_2}\) is at least \((T_\mathrm{sum} - T_\mathrm{sum}/(2m-1))/(m-1) = 2T_\mathrm{sum}/(2m-1) > T_\mathrm{sum}/m > t_1\). Thus, \(i_2\) processes at least two jobs, the shortest of which has processing time at most \(C_{i_2}/2\). Since \(C_{i_2}/2\ge T_\mathrm{sum}/(2m-1)\), shifting this job from \(i_2\) to \(i_1\) leaves \(C_{i_2}\) above \(C_{\min }^*(I)\) while strictly increasing \(C_{i_1}\), i.e., it yields a better schedule, which is a contradiction.

To verify that the bound is asymptotically tight for any fixed \(m\ge 2\), consider \(n = 2m\) jobs with processing times \(t_1 = \cdots = t_{n-1} = K\gg 1\) and \(t_n = 1\). Then, \(C_{\min }^*= K + 1\), whereas \(U_0 = (2-1/m)K+1/m\). Hence, the ratio of \(U_0\) to \(C_{\min }^*\) approaches \(2-1/m\) as K grows. \(\square \)

4.1.2 Improvements derived from \(P||C_{\max }\)

Following Haouari and Jemmali (2008), an upper bound on \(C_{\min }^*\) can be derived from a known lower bound \(L_{C_{\max }}\) on the optimal makespan by computation of

$$\begin{aligned} U_1 = \left\lfloor \frac{T_\mathrm{sum}-L_{C_{\max }}}{m-1}\right\rfloor . \end{aligned}$$
(16)

To obtain a good lower bound on the optimal makespan, we implemented an enhanced version (due to Haouari and Jemmali 2008) of the bound \(L_3\) by Dell’Amico and Martello (1995). For further details, we refer the reader to the literature.

4.1.3 Lifting procedure and further enhancement

Haouari and Jemmali (2008) proposed two procedures to tighten upper bounds for \(P||C_{\min }\). The first one is the so-called lifting procedure, which is based on the fact that in any feasible schedule there exists a set of l machines \((1 \le l \le m)\) that together process at most

$$\begin{aligned} \mu _l(n) = l\lfloor n/m\rfloor + \max \{0,n-m(\lfloor n/m\rfloor +1)+l\} \end{aligned}$$
(17)

jobs (cf. Haouari and Jemmali 2008). Then, a lifted bound can be obtained by applying an upper bound procedure on the partial instance restricted to l machines and the \(\mu _l(n)\) longest jobs.
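For illustration, Eq. (17) is straightforward to evaluate; the following one-liner (ours) prints \(\mu _l(12)\) for \(m=5\) and \(l=1,\ldots ,5\), i.e., the values relevant for the instance of Example 3.8.

```python
def mu(l, n, m):
    """Eq. (17): in any feasible schedule, some set of l machines
    together processes at most mu_l(n) jobs."""
    q = n // m
    return l * q + max(0, n - m * (q + 1) + l)

print([mu(l, 12, 5) for l in range(1, 6)])  # [2, 4, 6, 9, 12]
```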

The second procedure (cf. Haouari and Jemmali 2008) aims at enhancing a given upper bound value U by solving a specific subset sum problem (SSP) that checks whether there exists a subset of \(\mathcal {J}\) whose processing times sum up to exactly U. If no such subset exists, the largest realizable sum (denoted by \(U_{SSP}\)) of processing times that does not exceed U constitutes an upper bound.

4.1.4 Improvements derived from bin covering

To the best of our knowledge, we are the first to describe the coherence between \(P||C_{\min }\) and the bin covering problem (BCP) in order to improve upper bounds on \(C_{\min }^*\). The idea is to transform a given \(P||C_{\min }\)-instance into a BCP-instance where (i) jobs and processing times correspond to items and weights, respectively, and (ii) the capacity C of the bins is set to the best known upper bound on the minimum completion time. Then, a procedure is applied to determine an upper bound on the maximum number of bins that can be covered. If this number is at most \(m-1\), then the optimal minimum completion time of the corresponding \(P||C_{\min }\)-instance is at most \(C-1\).
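The surrounding logic can be sketched as follows (ours). Note that the bin bound shown is only the trivial \(\lfloor \sum w_j/C\rfloor \) placeholder, which by itself never improves on \(U_0\); in the actual implementation it must be replaced by the stronger bounds of Peeters and Degraeve (2006) and Labbé et al. (1995) cited below.

```python
def bcp_upper_bound(weights, C):
    # Placeholder: trivial bound on the number of bins of capacity C
    # that can be covered; substitute the bounds of Peeters and
    # Degraeve (2006) and Labbe et al. (1995) for real use.
    return sum(weights) // C

def tighten_by_bin_covering(t, m, U):
    """If at most m-1 bins of capacity U can be covered, no schedule
    reaches C_min >= U, so the upper bound drops to U - 1; iterate."""
    while U > 1 and bcp_upper_bound(t, U) <= m - 1:
        U -= 1
    return U
```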

In our implementation, we used four BCP-upper bounds (\(U_0\) from Peeters and Degraeve 2006, as well as \(U_1\), \(U_2\), and \(U_3\) from Labbé et al. 1995, including their reduction Criteria 1 and 2) and an improvement procedure (see Theorem 5 in Labbé et al. 1995). Again, we refrain from reporting any further details on these bounding techniques but refer the interested reader to the literature.

4.1.5 Bounds derived from the solution structure

Assume that a lower bound L on \(C_{\min }^*\) is given (cf. Sect. 4.2). Then, in order to generate a new incumbent solution, i.e., \(C_{\min } > L\), we can deduce that at least

$$\begin{aligned} j_{\min } = \min \{k: t_1 + \cdots + t_k > L\} \end{aligned}$$
(18)

and at most

$$\begin{aligned} j_{\max } = \max \{k: t_{n-k+1} + \cdots + t_n \le \bar{C}\} \end{aligned}$$
(19)

jobs have to be assigned to each machine, where

$$\begin{aligned} \bar{C} = \max \Big \{C: \Big \lfloor \frac{T_\mathrm{sum}-C}{m-1}\Big \rfloor > L\Big \}. \end{aligned}$$
(20)

Note that if a machine’s completion time is greater than \(\bar{C}\), it is impossible that each of the remaining \(m-1\) machines runs longer than L.
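The quantities (18)-(20) are cheap to compute. A small sketch (ours; it assumes t sorted non-increasingly) follows; the printed triple refers to the data of Example 3.8 with \(L=130\).

```python
def structural_job_counts(t, m, L):
    """Eqs. (18)-(20): per-machine job-count bounds in any schedule
    with C_min > L; t must be sorted non-increasingly."""
    n, T_sum = len(t), sum(t)
    # Eq. (20): largest completion time one machine may have while the
    # other m-1 machines can still all finish later than L
    C_bar = T_sum - (m - 1) * (L + 1)
    # Eq. (18): fewest longest jobs whose total exceeds L
    total, j_min = 0, None
    for k in range(1, n + 1):
        total += t[k - 1]
        if total > L:
            j_min = k
            break
    # Eq. (19): most shortest jobs whose total stays within C_bar
    total, j_max = 0, 0
    for k in range(1, n + 1):
        total += t[n - k]
        if total > C_bar:
            break
        j_max = k
    return j_min, j_max, C_bar

t = [95, 93, 87, 86, 66, 63, 62, 55, 45, 25, 10, 6]
print(structural_job_counts(t, m=5, L=130))  # (2, 5, 169)
```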

When the special case \(j_{\max } = j_{\min }+1\) occurs, an immediate upper bound is obtained after determining the number of machines \(m_{\min }\) (\(m_{\max }\)) on which exactly \(j_{\min }\) (\(j_{\max }\)) jobs are processed each. It is readily verified that \(m_{\min } = j_{\max }\cdot m - n\) and \(m_{\max } = n - j_{\min }\cdot m\). The resulting upper bound is

$$\begin{aligned} U(L) = \min \left\{ \left\lfloor \frac{1}{m_{\min }}\sum _{j=1}^{j_{\min }\cdot m_{\min }} t_j \right\rfloor , \left\lfloor \frac{1}{m_{\max }}\sum _{j=1}^{j_{\max }\cdot m_{\max }} t_j \right\rfloor \right\} . \end{aligned}$$
(21)

In case \(j_{\max } = 2\), note that an optimal solution is readily obtained by assigning job j (\(j = 1,\ldots ,m\)) to machine j and job \(m+k\) (\(k=1,\ldots ,n-m\)) to machine \(m-k+1\).
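In code, the special case reads as follows (our sketch; it presupposes \(j_{\max } = j_{\min }+1\) with \(m_{\min },m_{\max }\ge 1\)).

```python
def u_from_structure(t, m, j_min, j_max):
    """Eq. (21): upper bound for the special case j_max = j_min + 1;
    t non-increasing, and both machine counts are assumed positive."""
    n = len(t)
    m_min = j_max * m - n        # machines processing j_min jobs each
    m_max = n - j_min * m        # machines processing j_max jobs each
    return min(sum(t[:j_min * m_min]) // m_min,
               sum(t[:j_max * m_max]) // m_max)
```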

If \(j_{\max } > j_{\min }+1\), we propose the following strategy. Firstly, if \(\lfloor (T_\mathrm{sum}-t_1-\cdots -t_{j_{\min }})/(m-1)\rfloor \le L\), it is impossible to obtain a new incumbent by assigning the \(j_{\min }\) longest jobs to the same machine. In general, a new incumbent can only be obtained if none of the machines runs longer than \(\tilde{C}:=\max \{C>L:\lfloor \frac{T_\mathrm{sum}-C}{m-1}\rfloor >L\}\). Thus, if there exists no \(j_{\min }\)-element subset of \(\mathcal {J}\) whose cumulative processing time falls into the interval \([L+1,\tilde{C}]\), we can increase \(j_{\min }\).

Instead of checking each subset individually, we solve the following binary program (requiring pseudo-polynomial time) for a possible increase of \(j_{\min }\):

$$\begin{aligned}&\begin{array}{c} \text {Minimize } \tilde{j}_{\min } = \sum _{j=1}^n x_j\\ \\ \hbox {subject to} \end{array}\end{aligned}$$
(22)
$$\begin{aligned}&L+1\le \sum _{j=1}^n{t_j x_j} \le \tilde{C}\end{aligned}$$
(23)
$$\begin{aligned}&x_j \in \left\{ 0,1\right\} \quad \forall j=1,\ldots ,n. \end{aligned}$$
(24)

In case that now \(j_{\max } = \tilde{j}_{\min }+1\), an immediate upper bound can be determined as described above.
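Program (22)-(24) is a cardinality-minimizing subset-sum problem, so a standard pseudo-polynomial DP suffices. The sketch below (ours) returns \(\tilde{j}_{\min }\), or None if no subset sum falls into \([L+1,\tilde{C}]\); the printed value refers to the data of Example 3.8 with \(L=130\) and \(\tilde{C}=169\).

```python
def min_subset_size(t, L, C_tilde):
    """Pseudo-polynomial DP for (22)-(24): the fewest jobs whose total
    processing time lies in [L+1, C_tilde]; None if no such subset."""
    INF = float("inf")
    best = [INF] * (C_tilde + 1)        # best[s] = min #jobs summing to s
    best[0] = 0
    for tj in t:
        for s in range(C_tilde, tj - 1, -1):
            best[s] = min(best[s], best[s - tj] + 1)
    candidates = [b for b in best[L + 1:] if b < INF]
    return min(candidates) if candidates else None

t = [95, 93, 87, 86, 66, 63, 62, 55, 45, 25, 10, 6]
print(min_subset_size(t, L=130, C_tilde=169))  # 2 (e.g., 95 + 66 = 161)
```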

Secondly, we consider restricted instances with l \((l = m-1,m-2,\ldots ,2)\) machines and the longest \(\mu _{l}\) out of all n jobs where \(\mu _{l}\) is determined according to Eq. (17). Then, for each l, we compute a (restricted) \(C_{\min }\)-lower bound \(L_l\) by application of the LPT-rule and we take the maximum of L and \(L_l\) (note that the optimal restricted \(C_{\min }\)-value is at least as large as L) to determine \(j_{\min }(l)\) (cf. Eq. 18) as well as \(j_{\max }(l)\) (cf. Eq. 19). In case \(j_{\max }(l) = j_{\min }(l) + 1\), we obtain an upper bound \(U(L_l)\) according to Eq. (21).

4.2 Lower bounds

We implemented three construction procedures as well as an improvement procedure. The first constructive algorithm is the prominent longest processing time (LPT)-rule due to Graham (1969). As a second procedure, we implemented a randomized LPT-version due to Haouari and Jemmali (2008) that randomly decides in each iteration whether the longest or the second longest unassigned job is assigned to the next machine available. Our third procedure is an adaptation of a construction heuristic—referred to as Multi-Subset (MS) (cf. Alvim et al. 2004)—and consists of two phases. In the first phase, the machines are considered one by one. For each machine, a subset of the yet unassigned jobs is determined (by solving a subset sum problem) so that the longest unassigned job is contained and the sum of the respective processing times is closest to—without exceeding—a given target value T. If not all jobs are assigned after phase 1, the second phase completes the partial solution by assigning the remaining jobs according to the LPT-rule. We used two different values for T, namely \(T=UB_\mathrm{best}\) and \(T=LB_\mathrm{best}+1\) where \(UB_\mathrm{best}\) and \(LB_\mathrm{best}\) are the best known upper and lower bound values so far, respectively. The better of the two solutions produced by MS is chosen as the MS-solution.
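A sketch of MS under our reading of the description above (all names and the dictionary-based subset-sum routine are ours) is given below; phase 1 packs the machines one by one against the target T, and phase 2 distributes leftover jobs by the LPT-rule.

```python
def multi_subset(t, m, T):
    """Sketch of the MS heuristic (after Alvim et al. 2004): for each
    machine, pick the longest unassigned job plus a subset of the other
    unassigned jobs with total load closest to, but not above, the
    target T; leftovers go by LPT. t non-increasing; returns the loads."""
    unassigned = list(range(len(t)))
    loads = []
    for _ in range(m):
        if not unassigned:
            break
        first, rest = unassigned[0], unassigned[1:]
        cap = max(T - t[first], 0)
        reach = {0: []}                  # subset-sum DP with reconstruction
        for j in rest:
            for s, sel in list(reach.items()):
                if s + t[j] <= cap and s + t[j] not in reach:
                    reach[s + t[j]] = sel + [j]
        chosen = [first] + reach[max(reach)]
        loads.append(sum(t[j] for j in chosen))
        unassigned = [j for j in unassigned if j not in chosen]
    loads += [0] * (m - len(loads))
    for j in unassigned:                 # phase 2: LPT for leftover jobs
        loads[loads.index(min(loads))] += t[j]
    return loads
```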

The implemented improvement heuristic—referred to as Multi-Start Local Search (MSLS)—is also due to Haouari and Jemmali (2008). Starting with an initial solution, the procedure attempts to balance the workloads of the machines by iteratively solving a sequence of specific \(P2||C_{\min }\)-instances. For further details, we refer to Haouari and Jemmali (2008).

4.3 Application of the bounds

At the root node, the bounds are computed in the following order (as long as no optimal solution has been verified). Presupposing a (possibly preprocessed) instance where \(n\ge 2m\), at first we determine \(U_0\) and we apply the LPT-rule yielding a global lower bound value \(L_G\). Secondly, we make use of the lifting and enhancement procedure (see Sect. 4.1.3), i.e., for \(l=m,m-1,\ldots ,2\) we compute \(U_1\) restricted to l machines and the \(\mu _l\) longest jobs and afterward we solve for each l the respective SSP. Thirdly, we iteratively apply the bounding techniques derived from bin covering (see Sect. 4.1.4) as long as at least one of them leads to an improved global upper bound \(U_G\). In a fourth step, we try to further improve \(L_G\) by application of (i) MS and (ii) MSLS (see Sect. 4.2). The latter is applied to the LPT-solution, the MS-solution as well as to 25 randomized LPT-solutions. Lastly, we apply our upper bounds derived from the solution structure as developed within Sect. 4.1.5.

At each node in the tree, we pursue the following two ideas to obtain local upper bounds. Firstly, we adopt the rationale behind \(U_0\) as follows. In the current partial solution, we replace each currently unassigned job j \((j=k+1,\ldots ,n)\) by \(t_j\) jobs of length 1 and apply the LPT-rule to assign the \(T_\mathrm{rem}^k\) jobs of length 1 to the current partial solution. Clearly, the resulting minimum completion time constitutes a local upper bound which we denote by \(U_0^\mathrm{mod}\).
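Since all replacement jobs have unit length, the LPT-rule here simply feeds one unit at a time to the least loaded machine. A direct simulation (our sketch) is given below; applied to the loads of Example 3.8, it yields 134.

```python
import heapq

def u0_mod(C, T_rem):
    """Local bound U0^mod: distribute T_rem unit jobs, always to the
    least loaded machine; the resulting minimum load upper-bounds C_min
    for every completion of the partial solution."""
    heap = list(C)
    heapq.heapify(heap)
    for _ in range(T_rem):
        heapq.heappush(heap, heapq.heappop(heap) + 1)
    return heap[0]

print(u0_mod([95, 93, 149, 141, 129], 86))  # 134
```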

Secondly, we partition \(\mathcal {M}\) into two subsets \(M_{UB^-}=\{M_i\in \mathcal {M}: C_i^k<UB_{loc}\}\) and \(M_{{UB^+}}=\mathcal {M}\setminus {M_{{UB^-}}}\), where \(UB_{loc}\) denotes the best known local upper bound for the considered node. Note that \(UB_{loc}\) is the minimum of \(U_0^\mathrm{mod}\) and the parent node's upper bound value. Then, we compute the following modified variant \(U_1^\mathrm{mod}\) of \(U_1\):

$$\begin{aligned} U_1^\mathrm{mod} = \left\lfloor \frac{T_\mathrm{sum}-\sum _{i\in M_{UB^+}} C_i^k - L_3}{|M_{UB^-}|-1} \right\rfloor . \end{aligned}$$
(25)

Here \(L_3\) is computed for a transformed instance restricted to (i) the machines in \(M_{UB^-}\), (ii) the \(n-k\) remaining jobs, and (iii) \(|M_{UB^-}|\) fictitious jobs having processing times \(C_i^k\;(i\in M_{UB^-}\)).

To possibly improve on the lower bound, we apply the LPT-rule to partial solutions for which at least m jobs have already been assigned and no more than 2m jobs remain unassigned.

4.4 The branching scheme

We implemented a depth-first branching scheme originally proposed by Dell’Amico and Martello (1995) in the context of solving \(P||C_{\max }\). We decided on this scheme as it allows for a straightforward incorporation of the path-related dominance criteria. It differs from the one developed by Haouari and Jemmali (2008) and works as follows. At level k, the current node generates at most m child nodes by sequentially assigning job k, i.e., the longest unassigned job, to each machine \(M_i\) that fulfills both \(C_i^{k-1}<UB_{loc}\) and \(|M_i| < j_{\max }\). The corresponding machines are selected in order of increasing current completion times.

Clearly, each new incumbent solution updates the global lower bound. As soon as an optimal solution has been identified, the branching process stops immediately.

4.5 Dominance criteria

If a current partial solution at level k cannot be fathomed by means of the local upper bounds and not every pair of machines currently fulfills its path-condition, we apply the path-related dominance criteria in the same order as they are introduced in Sect. 3.

Furthermore, we apply the following criterion derived from Sect. 4.1.5.

Criterion 4.2

If \(|M_i|+\bar{v}_i^k>j_{\max }\) for some \(i\in \{1,\ldots ,m\}\), then the current partial solution can be fathomed.

The branching process itself can be limited according to four dominance criteria that have originally been introduced by Dell’Amico and Martello (1995). In Walter and Lawrinenko (2014) it is shown how these four criteria have to be modified in order to be compatible with the path-related criteria. For further details, we refer to the literature.

5 Computational study

We have coded the algorithm described within Sect. 4 in C++ using the Visual C++ 2010 compiler and experimentally tested its performance on (i) various difficult sets of \(P||C_{\min }\)-instances as reported by Haouari and Jemmali (2008) and (ii) a large set of instances from the literature. Our computational experiments were performed on a personal computer with an Intel Core i7-2600 processor and 8GB RAM while running Windows 7 Professional SP 1 (64-bit). The maximal computation time per instance was set to 600 s.

5.1 Performance on Dell’Amico and Martello’s instances

In a first experiment, we have run our algorithm (denoted by WWL) on the following five problem classes originally proposed by Dell’Amico and Martello (1995):

  • Class 1: discrete uniform distribution on \(\left[ 1,100\right] \)

  • Class 2: discrete uniform distribution on \(\left[ 20,100\right] \)

  • Class 3: discrete uniform distribution on \(\left[ 50,100\right] \)

  • Class 4: cut-off normal distribution with \(\mu =100\) and \(\sigma =50\)

  • Class 5: cut-off normal distribution with \(\mu =100\) and \(\sigma =20\)

According to the results stated in the paper by Haouari and Jemmali (2008), with their algorithm (denoted by HJ) branching was required for only 166 out of 2050 instances. Since our global bounds are in general at least as strong, there is no need to reconsider all of their investigated constellations. Instead, we restricted our study to the difficult (nm)-constellations (10, 3), (10, 5), (25, 10), (25, 15), (50, 15) where branching was required by HJ. Like Haouari and Jemmali (2008), for each constellation we randomly generated 10 independent instances resulting in a total of 250 instances. A summary of the results is provided in Table 4. In this table we document the mean CPU time in seconds (labeled as “Time”) as well as the mean number of explored nodes (“NN”). Moreover, the number of unsolved instances if greater than 0 is given in brackets. The results of algorithm HJ are taken from Haouari and Jemmali (2008). Their algorithm was coded in Visual C++ 6.0 and ran on a Pentium IV 3.2GHz with 3GB RAM. The time limit for HJ was set to 800 s.

Additionally, in Table 4, we report on the performance of the new upper and lower bound techniques (cf. columns labeled as “\(\hbox {UB}_{\mathrm{impr}}\)” and “\(\hbox {LB}_{\mathrm{MS}}\)”) introduced within the Sects. 4.1.4, 4.1.5, and 4.2. Concerning the new upper bounds, the column labeled as “#” gives the number of instances where the new upper bounds improved on the best upper bound obtained by the existing procedures (cf. Sects. 4.1.1–4.1.3). Numbers in brackets indicate for how many instances (if less than 10) the application of at least one of the two new upper bounding techniques has been needed. The columns labeled as “avg” and “max” give the average and maximum relative deviation (in %) between the best existing upper bound and the best new upper bound. Concerning the new lower bound, the column labeled as “#” gives the number of instances where Multi-Subset (MS) generates the best lower bound value. Numbers in brackets indicate for how many instances (if less than 10) the application of MS has been needed. For those cases, the columns labeled as “avg” and “max” give the average and maximum relative deviation (in %) between the best lower bound value and the value produced by MS.

Table 4 Results on difficult instances generated according to Dell’Amico and Martello (1995)

As can be seen from the entries in Table 4, except for constellation (50, 15) where algorithm HJ performed generally better than WWL, the opposite is true for each of the four other constellations. Here, our algorithm is strictly superior to algorithm HJ in terms of NN. Moreover, WWL was able to quickly solve all 200 instances, while algorithm HJ failed to solve two of them. The dominance of WWL is particularly impressive for the constellations (25, 10) and (25, 15) where the average number of generated nodes and the computation time were reduced to fractions of up to 1/14,500,000 and 1/8650, respectively. However, the benefit of the dominance criteria (cf. Sect. 3) seems to be limited to constellations where \(n/m \le 2.5\). These observations comply with the findings in Walter and Lawrinenko (2014) for problem \(P||C_{\max }\).

The performance of the new bounding techniques can be summarized as follows. Starting with the upper bounds, application of the new techniques has been required for 206 out of 250 instances and improvements were achieved for 63 of them (mostly due to the bounds derived from BCP). While the overall average relative improvement is about \(1.153\,\%\), maximum relative improvements of up to almost \(19\,\%\) were obtained (cf. constellation (10, 5), Class 5). The new upper bounding techniques performed remarkably well at constellation (10, 5) where upper bounds could be tightened for 40 (of 49) instances. With regard to the new lower bound, application of the MS-heuristic has been required for 171 out of 250 instances (note that for the remaining 79 instances the LPT-solution had already been identified as an optimal solution). For 27 of the 171 instances, MS generated the best lower bound or, in other words, the MSLS-heuristic was not able to improve the MS-solution. All in all, our proposed construction heuristic MS performed quite well yielding a deviation from the MSLS-value of less than 3.5 % on average.

5.2 Performance on Haouari and Jemmali’s instances

In the second experiment, we have run our algorithm on a class of problems where the processing times are drawn from a discrete uniform distribution on [1, n] as proposed by Haouari and Jemmali (2008). Again, we restricted our study to those (nm)-constellations where not all of the 10 generated instances in Haouari and Jemmali (2008) had been solved at the root node by their algorithm. This time, these are the three constellations (10, 5), (25, 10), and (25, 15). For each of them we randomly generated 10 instances. Table 5 provides a comparison of our results with the ones reported in Haouari and Jemmali (2008).

Table 5 Results on instances generated according to Haouari and Jemmali (2008)
Table 6 Results on benchmark instances

Obviously, our findings of the first experiment also apply to the results obtained for the second experiment (see Table 5) which reveal a clear dominance of our algorithm for the constellations (25, 10) and (25, 15). Compared to algorithm HJ, our approach reduced the average number of generated nodes and the computation time to fractions of up to 1/820,000 and 1/5270, respectively.

5.3 Performance on benchmark instances

In a third experiment, we investigated WWL’s performance on 780 benchmark instances originally proposed by França et al. (1994) as well as Frangioni et al. (2004) for problem \(P||C_{\max }\). While the former set consists of 390 uniform instances, the latter contains 390 non-uniform instances. For a detailed description, we refer to the literature. All instances can be downloaded from http://www.or.deis.unibo.it/research.html. To the best of our knowledge, we are the first to use this established benchmark set in the context of \(P||C_{\min }\).

Table 6 reports on the results obtained by our algorithm. In addition to the mean CPU time and the mean number of nodes, this time we also document the average as well as maximum relative deviation in \(\%\) (labeled as “Gap” and “maxGap,” respectively) between the best upper bound value and the best lower bound value computed at the root node. Averages are taken over all 10 instances per triple (nmInterval).

The results indicate that WWL’s performance on the uniform instances is better than that on the set of non-uniform instances. While 347 out of the 390 uniform instances have been solved optimally, only 273 out of the 390 non-uniform instances have been solved optimally leading to a total number of 620 solved benchmark instances within the time limit of 600 s per instance. The unsolved uniform instances belong to the three (nm)-constellations (50, 10), (50, 25), and (100, 25) and the intervals \([1,10^3]\) and \([1,10^4]\). In contrast, all but one of the unsolved instances of the non-uniform set belong to the four (nm)-constellations (50, 5), (50, 10), (100, 10), and (100, 25) and all three intervals. Taking a look at the entries in the columns Gap and maxGap of the unsolved instances, we record considerably smaller values for the uniform than for the non-uniform instances. For the latter, we observed average and maximum deviations of up to 21.1 and 23.7%, respectively. This indicates that non-uniform instances seem to be more difficult to solve than uniform instances.

To allow for future comparisons, we document detailed results on the best found objective function value as well as the best found upper bound value for each of the 780 instances in the “Appendix”.

6 Conclusions

This paper addressed the machine covering problem \(P||C_{\min }\) and its exact solution. We identified structural properties of optimal schedules from which we deduced the so-called path-related dominance criteria. These novel criteria, the key ingredient of the proposed branch-and-bound algorithm, proved to be effective in limiting the search space, particularly in the case of rather small ratios of n to m. For those constellations, our approach is superior to the one presented in Haouari and Jemmali (2008).

For future research on \(P||C_{\min }\), we suggest three interesting directions covering exact as well as heuristic procedures. Firstly, a quite promising advancement of our branch-and-bound algorithm might be a more sophisticated branching scheme that exploits the structural properties of potentially (unique) optimal solutions even more directly and thus enforces the generation of respective (partial) solutions. Secondly, we encourage the development of a dynamic programming (DP) formulation of \(P||C_{\min }\) that uses the structural properties identified in this paper to trim the state space of the DP, aiming at a possible improvement on the time complexity or the space requirements of the sole existing PTAS. Thirdly, due to the exponential nature of exactly solving the machine covering problem, efficient (meta-)heuristic approaches such as tabu search or population-based algorithms are still needed to tackle large-sized instances. Here, it is conceivable to penalize the non-fulfillment of the path-conditions by adequately reducing the fitness value of such solutions.