1 Introduction

We consider the problem of scheduling n jobs on 2 identical parallel machines \(M_1\) and \(M_2\) to minimize the makespan \(C_{\max }\). Each job j is defined by a processing time \(p_j\) and must be executed by exactly one of the machines. We denote by \(c_j\) the completion time of job j in a given schedule, so that \(C_{\max } = \max _{1 \le j \le n} c_j\). Using the standard three-field notation (Graham et al. 1979), this problem is denoted by \(P2 || C_{\max }\). It is NP-hard in the ordinary sense.

We tackle problem \(P2 || C_{\max }\) from the approximation point of view. For any problem instance, we denote by \(C^*_{\max }\) the optimal makespan and by \(C^{X}_{\max }\) the makespan provided by a generic algorithm X. We say that algorithm X has approximation ratio \(\rho \ge 1\) if \(C^{X}_{\max } \le \rho \cdot C^*_{\max }\) for every instance of \(P2 || C_{\max }\).

A pioneering approximation algorithm for this problem is the Longest Processing Time (LPT) rule proposed by Graham (1969) for the more general \(P || C_{\max }\) problem with m machines. It sorts the jobs in non-ascending order of their processing times \(p_j ~(j=1,\dots ,n)\) and then assigns one job at a time to the machine whose load is smallest so far. This assignment of jobs to machines is also known as List Scheduling (LS). LPT has time complexity \(O(n \log n)\) due to the initial sorting of the jobs. Several properties have been established for LPT over the last decades (Blocher and Sevastyanov 2015; Chen 1993; Coffman and Sethi 1976; Graham 1969). In practice, LPT generally performs much better than its theoretical approximation ratio suggests, especially as the number of jobs grows. Due to its simplicity and practical effectiveness, LPT became a cornerstone for the design of more involved exact or heuristic algorithms for problem \(P || C_{\max }\). The LPT rule was recently revisited in Della Croce and Scatamacchia (2018).
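As an illustration, here is a minimal Python sketch of LPT on two machines (our own illustrative code, not taken from the cited works):

```python
def lpt_two_machines(p):
    """LPT for P2||Cmax: sort the jobs non-ascending, then list-schedule
    each job on the currently least loaded machine; return the makespan."""
    loads = [0.0, 0.0]
    for q in sorted(p, reverse=True):
        m = 0 if loads[0] <= loads[1] else 1   # the LS step: pick the smaller load
        loads[m] += q
    return max(loads)

print(lpt_two_machines([3, 3, 2, 2, 2]))  # 7, while the optimum is 6 ({3,3} vs {2,2,2}):
                                          # Graham's tight 7/6 example for two machines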

We mention other popular approximation algorithms which exploit connections of \(P|| C_{\max }\) with bin packing: Multifit (Coffman et al. 1978), Combine (Lee and Massey 1988) and Listfit (Gupta and Ruiz-Torres 2001). Such algorithms provide better worst-case performance than LPT, but at the cost of higher running times. Polynomial Time Approximation Schemes (PTASs) were also derived for the problem. The first PTAS was given by Hochbaum and Shmoys (1987); PTASs with improved running times were then provided in Alon et al. (1998), Hochbaum (1997), Jansen (2010) and Jansen et al. (2017). In Sahni (1976), a Fully Polynomial Time Approximation Scheme (FPTAS) was devised for \(P2 || C_{\max }\) (and for the more general \(Pm || C_{\max }\) with a fixed number of machines) which solves the problem with accuracy \((1 + \epsilon )\) in \(O(n^2/\epsilon )\) time. The best-performing algorithm for \(P2 || C_{\max }\) whose polynomial running time is independent of the accuracy was presented in He et al. (2000) and provides a constant approximation ratio of \(\frac{12}{11}\); an approximation algorithm with ratio \(\frac{9}{8}\) was given in Della Croce and Scatamacchia (2018).

In this paper, we propose an approximation algorithm with a constant ratio that first applies LPT to a subset of 2k jobs, then performs a single step of local search on the obtained subschedule, and finally applies list scheduling to the remaining jobs. For \(k=5\), this algorithm reaches a tight \(\frac{13}{12}\)-approximation ratio, improving on the ratio of \(\frac{12}{11}\) established in He et al. (2000). We use Mathematical Programming (MP) to analyze the approximation ratio of our approach. In a sense, the proposed approach resembles the reasoning employed in Mireault et al. (1997), where several LPs were used to determine the worst-case approximation ratio of the LPT rule on two uniform machines, and that of Della Croce and Scatamacchia (2018), where theoretical results were derived by means of Mathematical Programming techniques. Recently, growing attention has been given to Mathematical Programming as an alternative to mainstream proof systems based on analytical derivation [see, e.g., Abolhassani et al. (2016), Chimani and Wiedera (2016), Della Croce et al. (2018)]. Finally, in Sect. 3, we show how to improve the running time of the FPTAS of Sahni (1976) under mild restrictions.

2 A linear time \(\frac{13}{12}\)-approximation algorithm

2.1 Preliminaries

We assume that the jobs are sorted by non-ascending processing times, i.e. \(p_j \ge p_{j+1}\), \(j=1,\dots ,n-1\). We call critical the job that determines the makespan of a given schedule. The following proposition holds.

Proposition 1

Consider a given algorithm H that assigns jobs \(1,\ldots ,2k\) to the machines according to some policy and then applies LS to the remaining jobs \(2k+1,\ldots ,n\). If the critical job j is such that \(j \ge 2k+1\), then

$$\begin{aligned} \frac{C^{H}_{\max }}{C^*_{\max }} \le 1 + \frac{1}{2(k+1)} = \rho . \end{aligned}$$

Proof

Assume w.l.o.g. that j is assigned to machine \(M_1\) and denote by \(t_2\) the completion time of machine \(M_2\) before processing jobs \(j,\ldots ,n\). Then, as j is scheduled according to LS on the least loaded machine, we have \(C^{H}_{\max } - p_j \le t_2\) and \(C^{H}_{\max } + t_2= \sum _{i=1}^{j}p_i \le \sum _{i=1}^{n}p_i \le 2C^*_{\max }\). Correspondingly, we have \(2C^{H}_{\max } -p_j \le C^{H}_{\max } +t_2 \le 2C^*_{\max }\), that is \(C^{H}_{\max } \le C^*_{\max } + \frac{p_j}{2}\). Hence, we have \(\frac{C^{H}_{\max }}{C^*_{\max }} \le \frac{C^*_{\max } + \frac{p_j}{2}}{C^*_{\max }} = 1 + \frac{p_j}{2C^*_{\max }}\). Besides, as \(j \ge 2k +1\), at least \(2k+1\) jobs have processing time at least \(p_j\); hence, in the optimal solution, one of the machines must process at least \((k+1)\) such jobs, that is \(C^*_{\max } \ge (k+1)p_j\). But then

$$\begin{aligned} \frac{C^{H}_{\max }}{C^*_{\max }} \le 1 + \frac{p_j}{2C^*_{\max }} \le 1 + \frac{{p_j}}{2(k+1){p_j}} = 1 + \frac{1}{2(k+1)} = \rho . \end{aligned}$$

\(\square \)

The following corollary is immediate.

Corollary 1

Given a problem \(P_1\) with n jobs, consider the subproblem \(P_{red}\) with the first 2k jobs only. If problem \(P_{red}\) is solved by an algorithm with approximation ratio \(1 + \frac{1}{2(k+1)}\), then the same approximation ratio holds for \(P_{1}\) by applying LS to the remaining subset of jobs.

Proof

Indeed, if the critical job of \(P_1\) is in \(\{1,\ldots ,2k\}\), the approximation ratio cannot exceed \(1 + \frac{1}{2(k+1)}\). If instead the critical job is in \(\{2k+1,\ldots ,n\}\), the result of Proposition 1 applies. \(\square \)

Remark 1

In Graham (1969), a somewhat similar result was proved (and generalized to m machines) stating that, if problem \(P_{red}\) is solved to optimality, then the approximation ratio \(1 + \frac{1}{2(k+1)}\) holds for \(P_{1}\) by applying LS to the remaining subset of jobs. We remark, however, that requiring to solve problem \(P_{red}\) to optimality makes such an algorithm impractical as soon as k becomes non-negligible. For this reason, it has always been considered of interest, well after the publication of the findings in Graham (1969), to determine low-complexity polynomial time algorithms providing a constant approximation ratio [see, for instance, the work in He et al. (2000)].

2.2 The approximation algorithm

We now turn to the presentation of a \(\frac{13}{12}\)-approximation algorithm (Algorithm \(A_1\)) which combines the LPT rule with a simple local search.

[Algorithm \(A_1\) pseudocode figure: Step 1 selects the 2k largest jobs, forming instance \(I'\); Step 2 applies LPT to \(I'\); Steps 3–5 compute the best swaps \(SW^1_{i,j}\), \(SW^2_{i,j,k}\) and \(SW^3_{i,j,k}\); Step 6 applies the best improving swap; Step 7 applies LS to the remaining jobs.]

In practice, Algorithm \(A_1\) first applies LPT to the reduced instance \(I'\) composed of the 2k largest jobs, yielding subschedule \(S'\). Then, a single step of local search involving pairs or triples of jobs is applied to \(I'\), yielding subschedule \(S''\). Finally, starting from \(S''\), LS is applied to the remaining \((n-2k)\) jobs.
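To make the structure concrete, the following Python sketch renders Steps 1–7 under our own naming (the functions `list_schedule`, `one_local_search_step` and `algorithm_a1` are illustrative, not the authors' code); for clarity, the swap search below is brute force, whereas Proposition 2 describes a faster \(O(k^2 \log k)\) implementation:

```python
from itertools import combinations

def list_schedule(jobs, loads, assign):
    """LS: give each job in turn to the machine with the smaller load.
    With `jobs` sorted non-ascending this is exactly the LPT rule."""
    for q in jobs:
        m = 0 if loads[0] <= loads[1] else 1
        loads[m] += q
        assign[m].append(q)
    return loads, assign

def one_local_search_step(loads, assign):
    """Apply the single best swap among SW^1 (one-for-one), SW^2
    (one-for-two) and SW^3 (two-for-one) between the critical and
    non-critical machine, if it improves the makespan (Steps 3-6).
    Brute force O(k^3) for clarity; Proposition 2 gives O(k^2 log k)."""
    c = 0 if loads[0] >= loads[1] else 1          # critical machine
    delta = loads[c] - loads[1 - c]
    crit, other = assign[c], assign[1 - c]
    moves = [((i,), (j,)) for i in crit for j in other]                   # SW^1
    moves += [((i,), jk) for i in crit for jk in combinations(other, 2)]  # SW^2
    moves += [(ij, (j,)) for ij in combinations(crit, 2) for j in other]  # SW^3
    best_gain, best_move = 0.0, None
    for out_js, in_js in moves:
        d = sum(out_js) - sum(in_js)     # net load leaving the critical machine
        gain = min(d, delta - d)         # resulting makespan reduction
        if gain > best_gain:
            best_gain, best_move = gain, (out_js, in_js, d)
    if best_move:
        out_js, in_js, d = best_move
        for x in out_js:
            crit.remove(x); other.append(x)
        for x in in_js:
            other.remove(x); crit.append(x)
        loads[c] -= d
        loads[1 - c] += d
    return loads, assign

def algorithm_a1(p, k=5):
    """Sketch of Algorithm A_1; returns the makespan."""
    p = sorted(p, reverse=True) + [0.0] * max(0, 2 * k - len(p))  # dummy jobs if n < 2k
    # (the paper extracts the 2k largest jobs in O(n) by selection;
    #  we simply sort, which costs O(n log n))
    loads, assign = list_schedule(p[:2 * k], [0.0, 0.0], [[], []])  # Steps 1-2: LPT on I'
    loads, assign = one_local_search_step(loads, assign)            # Steps 3-6
    loads, assign = list_schedule(p[2 * k:], loads, assign)         # Step 7: LS on the rest
    return max(loads)
```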

Proposition 2

Algorithm \(A_1\) runs with complexity \(O(k^2 \log k + n)\).

Proof

We proceed by analyzing the execution time of each step of the algorithm. It is well known that finding the k-th largest element in a vector of n elements can be done in linear time by adapting the median algorithm in Blum et al. (1973). Correspondingly, determining the 2k largest processing times can be done in O(n) time. Step 2 then requires \(O(n + k \log k)\) time due to the additional application of LPT to instance \(I'\). Computing the best swap between two jobs in Step 3 can be done in O(k) by means of the following procedure. Denote by \(\delta \) the difference between the completion times of the critical and non-critical machine and by \(d_{i,j}\) the difference in processing time produced by \(SW^1_{i,j}\) for any jobs i and j. Notice also that jobs are ordered by non-ascending processing time on both machines due to the application of LPT at Step 2. We first seek swaps with \(d_{i,j}\) as close as possible to \(\frac{\delta }{2}\) by coupling the first job \(i~(=1)\) on the critical machine with jobs \(j=1,2,\dots \) on the non-critical machine as long as \(0 \le d_{i,j} \le \frac{\delta }{2}\). This determines the best job j, say job \(j^\prime \), for the first job i. Then, we analyze the next possible swap by considering the second largest job on the critical machine (\(i=2\)) and jobs \(j \ge j^\prime + 1\) on the non-critical machine. After processing all jobs on the non-critical machine, we re-apply the same procedure where, for a given candidate i, we look for the first job j such that \(d_{i,j} > \frac{\delta }{2}\). Overall, the best swap \(SW^1_{i,j}\) is found in O(k). In Step 4, we first compute and sort by non-ascending total processing time all pairs (j, k) of jobs on the non-critical machine in \(O(k^2 \log k)\). Then, we can apply the same procedure as in Step 3 by comparing jobs on the critical machine with the array of \(O(k^2)\) processing time entries of the ordered pairs (j, k). Therefore, the execution time of Step 4 is \(O(k^2 \log k)\). A similar reasoning applies to Step 5. In Step 7, LS runs with complexity O(n). The overall time complexity is hence \(O(k^2 \log k + n)\). \(\square \)
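One possible realization of the O(k) search for the best swap \(SW^1_{i,j}\) is via two forward-only scans, sketched below (our own code; `crit` and `other` hold the processing times on the critical and non-critical machine, both sorted non-ascending, so that \(d_{i,j}\) grows with j and shrinks with i):

```python
def best_sw1_gain(crit, other, delta):
    """Best makespan reduction achievable by a single-job swap SW^1.
    Scan 1 finds, for each i, the largest d_ij <= delta/2 (gain d_ij);
    scan 2 finds the smallest d_ij > delta/2 (gain delta - d_ij).
    Both pointers only move forward, hence O(k) overall."""
    if not crit or not other:
        return 0.0
    best = 0.0
    j = 0
    for i in range(len(crit)):            # scan 1: approach delta/2 from below
        while j + 1 < len(other) and crit[i] - other[j + 1] <= delta / 2:
            j += 1
        d = crit[i] - other[j]
        if 0 <= d <= delta / 2:
            best = max(best, d)
    j = 0
    for i in range(len(crit)):            # scan 2: approach delta/2 from above
        while j < len(other) and crit[i] - other[j] <= delta / 2:
            j += 1
        if j < len(other):
            best = max(best, delta - (crit[i] - other[j]))
    return best
```

For Step 4, the same scans can be run against the \(O(k^2)\) sorted pair sums, giving the stated \(O(k^2 \log k)\) bound.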

Theorem 1

Algorithm \(A_1\) applied to the largest jobs \(1, 2, \ldots , 10\) (where dummy jobs with null processing times are added if \(n < 10\)), i.e. with parameter \(k=5\), reaches a tight \(\frac{13}{12}\)-approximation ratio.

Proof

If \(n < 10\), it is immediate to see that an equivalent instance with 10 jobs exists by adding \(10-n\) dummy jobs with null processing times. If \(n \ge 10\), then, due to Corollary 1, it is sufficient to show that Steps 2–6 of Algorithm \(A_1\) applied to the largest \(2k=10\) jobs yield an approximation ratio not exceeding \(1+\frac{1}{2(k+1)} = \frac{13}{12}\). To this end, we rely on Mathematical Programming to evaluate the worst-case performance ratio of Algorithm \(A_1\) on any instance with up to 10 jobs. We propose a complete enumeration approach that considers all possible LPT sequences, denoted by \(S_i^{LPT}\), where by construction job 1 is assigned to machine \(M_1\), while jobs 2 and 3 are assigned to \(M_2\). Correspondingly, the LPT rule may generate \(2^7=128\) different \(S_i^{LPT}\) sequences depending on the processing time values, where the makespan may be either on \(M_1\) or on \(M_2\). Similarly, we consider all possible optimal sequences \(S_j^{OPT}\) where, without loss of generality, we assume job 1 is assigned to \(M_1\). Correspondingly, \(2^9=512\) different sequences may be optimal. This value can actually be reduced to 260 by eliminating dominated job assignments. For instance, the assignment of jobs \(1,\dots ,5\) to \(M_1\) and jobs \(6,\dots ,10\) to \(M_2\) is always dominated by the assignment of jobs \(1,\dots ,4,6\) to \(M_1\) and jobs \(5,7,\dots ,10\) to \(M_2\): in both cases the makespan is on machine \(M_1\), and \(p_1+p_2+p_3+p_4+p_5 \ge p_1+p_2+p_3+p_4+p_6\).

For every pair \(S_i^{LPT}\), \(S_j^{OPT}\) of candidate solutions (260\(\times \)128 pairs in total), we generate two LP models (according to whether the LPT makespan is on \(M_1\) or on \(M_2\)) that search for the processing times maximizing the makespan returned by Algorithm \(A_1\), subject to the optimal makespan not exceeding a given constant \(C^*\). Without loss of generality, we may set \(C^* =1\): any other choice \(C^*=\alpha \) would simply scale the processing times, and hence the objective function value, by a factor \(\alpha \) without affecting the approximation ratio.

More precisely, we consider an MP formulation with non-negative variables \(p_j\) (\(j=1,\dots ,10\)) denoting the processing times, non-negative variables \(C_{\max }^{M_1}\) and \(C_{\max }^{M_2}\) representing the completion times of \(M_1\) and \(M_2\), respectively, in the LPT schedule, and a non-negative variable \(\delta \) representing the difference between these completion times. Finally, we introduce a non-negative variable \(\hat{\delta }\) representing the maximum improvement reachable by the best possible swaps \(SW^1_{i,j}\), \(SW^2_{i,j,k}\) and \(SW^3_{i,j,k}\), respectively, determined in Steps 3–5 of Algorithm \(A_1\).

The processing times must satisfy the List Scheduling constraints of the LPT solution and the requirement that the optimal makespan cannot exceed the constant parameter. Further constraints connecting variables \(p_j\) and \(\hat{\delta }\) are induced by the swaps considered in Steps 3–5 of Algorithm \(A_1\). We consider here the case where LPT gives the makespan on \(M_1\); hence, the objective function, to be maximized as we search for the worst case, is the difference \((C_{\max }^{M_1} - \hat{\delta })\). Let us denote by \(w_{k,\ell }\) a 0/1 coefficient indicating whether job \(\ell \) is assigned to machine \(M_k\) in sequence \(S_i^{LPT}\), and by \(w_{k,\ell }^*\) a 0/1 coefficient indicating whether job \(\ell \) is assigned to machine \(M_k\) in the optimal sequence \(S_j^{OPT}\). The following model is implied:

$$\begin{aligned}&\text {Maximize} \; (C_{\max }^{M_1} \; - \; \hat{\delta }) \end{aligned}$$
(1)
$$\begin{aligned}&p_j \ge p_{j+1} \qquad j=1, \dots , 9 \end{aligned}$$
(2)
$$\begin{aligned}&\sum _{j=1}^{\ell -1} w_{1,j} p_j \le \sum _{j=1}^{\ell -1} w_{2,j}p_j, \; \forall 2 \le \ell \le 10 \; | \; w_{1,\ell }=1 \end{aligned}$$
(3)
$$\begin{aligned}&\sum _{j=1}^{\ell -1} w_{2,j} p_j \le \sum _{j=1}^{\ell -1} w_{1,j}p_j, \; \forall 2 \le \ell \le 10 \; | \; w_{2,\ell }=1 \end{aligned}$$
(4)
$$\begin{aligned}&\sum _{j=1}^{10} w_{1,j}^* p_j \le C^* \end{aligned}$$
(5)
$$\begin{aligned}&\sum _{j=1}^{10} w_{2,j}^* p_j \le C^* \end{aligned}$$
(6)
$$\begin{aligned}&C_{\max }^{M_1} = \sum _{j=1}^{10} w_{1,j} p_j \end{aligned}$$
(7)
$$\begin{aligned}&C_{\max }^{M_2} = \sum _{j=1}^{10} w_{2,j} p_j \end{aligned}$$
(8)
$$\begin{aligned}&\delta = C_{\max }^{M_1} - C_{\max }^{M_2} \end{aligned}$$
(9)
$$\begin{aligned}&\hat{\delta } \ge \min \{p_i-p_j, \delta - p_i + p_j \},\; \; \forall i < j \; | \; w_{1,i}= w_{2,j}=1 \; \end{aligned}$$
(10)
$$\begin{aligned}&\hat{\delta } \ge \min \{p_i-p_j-p_k, \delta - p_i + p_j+p_k \},\; \; \forall i,\; j < k \; | \; w_{1,i}= w_{2,j}= w_{2,k}=1 \end{aligned}$$
(11)
$$\begin{aligned}&\hat{\delta } \ge \min \{p_i+p_j-p_k, \delta - p_i - p_j+p_k \},\; \; \forall i < j,\; k \; | \; w_{1,i}= w_{1,j}= w_{2,k}=1 \end{aligned}$$
(12)
$$\begin{aligned}&p_j \ge 0 \qquad j=1, \dots , 10 \end{aligned}$$
(13)
$$\begin{aligned}&\delta , \hat{\delta }, C_{\max }^{M_1}, C_{\max }^{M_2} \ge 0 \end{aligned}$$
(14)

Here constraints (2) ensure that jobs are ordered by non-increasing processing times. Constraints (3) and (4) impose on the \(p_j\) variables the fulfillment of the List Scheduling requirement. Constraints (5) and (6) require that the completion times of both machines in the optimal solution do not exceed the constant \(C^*\). Constraint (7) states that the completion time of the critical machine equals the sum of the processing times of the jobs assigned to it, and constraint (8) does the same for the non-critical machine. Constraint (9) defines \(\delta \) as the difference between the completion times of the two machines. Constraints (10)–(12) state that \(\hat{\delta }\) must be at least the largest improvement reachable by the best possible swaps \(SW^1_{i,j}\), \(SW^2_{i,j,k}\) and \(SW^3_{i,j,k}\), respectively. Notice that for conciseness we kept in constraints (10)–(12) a non-linear min notation that can easily be transformed into sets of linear constraints by means of big-M coefficients and dedicated 0/1 variables. We report in “Appendix” the expansion of constraint (10); constraints (11)–(12) are linearized analogously and their expansion is omitted for the sake of conciseness.
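For illustration, a standard big-M expansion of constraint (10) reads as follows (this is our own sketch and may differ in details from the “Appendix” version): for each pair (i, j) with \(w_{1,i} = w_{2,j} = 1\), introduce a 0/1 variable \(y_{i,j}\) and a sufficiently large constant M (here \(M = 2C^*\) suffices, since every term involved is bounded by \(2C^*\)):

$$\begin{aligned}&\hat{\delta } \ge (p_i - p_j) - M y_{i,j}, \\&\hat{\delta } \ge (\delta - p_i + p_j) - M (1 - y_{i,j}), \\&y_{i,j} \in \{0,1\}. \end{aligned}$$

Since objective (1) pushes \(\hat{\delta }\) downwards, the solver sets each \(y_{i,j}\) so that only the smaller of the two terms remains binding, thereby recovering the min operator.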

Finally, constraints (13) and (14) indicate that all variables are non-negative. Thus, the MP model to be solved turns out to be a MILP formulation.

Then, by solving the MILP model for all possible pairs \(S_i^{LPT}\), \(S_j^{OPT}\) (twice per pair, to handle the LPT makespan falling either on \(M_1\) or on \(M_2\)) and taking the maximum objective value, we obtain the worst-case instance with up to 10 jobs.

After solving \(2 \times 260 \times 128 = 66560\) MILP models, we found that the worst case is reached by the following example with processing time vector \(\{7/12,5/12,1/6,1/6,1/6,1/6,1/6,1/6,0,0\}\). An LPT solution assigns jobs 1, 4, 6, 8 to \(M_1\) and jobs 2, 3, 5, 7, 9, 10 to \(M_2\) and has makespan 13/12. No swap of the type considered in Steps 3–5 of \(A_1\) leads to an improvement. The optimal solution assigns jobs 1 and 2 to one machine and jobs \(3,\ldots ,10\) to the other, reaching makespan 1. Correspondingly, the approximation ratio is \(\frac{13}{12}\). \(\square \)
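The worst-case instance is easy to check numerically with the sketch from Sect. 2.2 (ties in our `list_schedule` go to \(M_1\), reproducing the stated LPT assignment):

```python
p = [7/12, 5/12] + [1/6] * 6 + [0, 0]
loads, assign = list_schedule(p, [0.0, 0.0], [[], []])   # LPT on the 10 jobs
print(max(loads))                                        # 1.0833... = 13/12
loads, assign = one_local_search_step(loads, assign)     # no improving swap exists
print(max(loads))                                        # still 13/12
```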

We can also state the following side result for algorithm \(A_1\) applied to problem \(P2 || \sum _{i=1}^{2} C_{M_i}^2\), where \(C_{M_i}\) denotes the completion time of the last job processed on \(M_i\) and the goal is to minimize the sum of the squared machine completion times.

Corollary 2

For any \(P2 || \sum _{i=1}^{2} C_{M_i}^2\) instance, algorithm \(A_1\) has a tight \(\frac{145}{144}\)-approximation ratio.

Proof

As indicated in Walter (2017), problems \(P2 || C_{\max }\) and \(P2 || \sum _{i=1}^{2} C_{M_i}^2\) are equivalent. Correspondingly, the tight instance provided above for problem \(P2 || C_{\max }\) also constitutes a worst-case instance for problem \(P2 || \sum _{i=1}^{2} C_{M_i}^2\). Since for this instance \(C_{M_1} = \frac{13}{12}\) and \(C_{M_2} = \frac{11}{12}\) in the LPT schedule, while \(C_{M_1} = C_{M_2} = 1\) in the optimal solution, the approximation ratio of Algorithm \(A_1\) is \(\frac{(\frac{13}{12})^2 + (\frac{11}{12})^2}{1 + 1} = \frac{290/144}{2} = \frac{145}{144}\). \(\square \)

We remark that the result of Corollary 2 improves upon the approximation ratio of \(\frac{50}{49}\) derived in Koulamas and Kyparisis (2008).

3 Improving the FPTAS of Sahni

By exploiting Proposition 1 and Corollary 1, it is possible to improve upon the \(O(n^2/\epsilon )\) time complexity of the FPTAS proposed by Sahni (1976). Consider a simple procedure that first runs the FPTAS of Sahni (1976) on the subinstance composed of the 2k largest jobs only, with \(k=\lceil \frac{1}{2\epsilon } - 1\rceil \), and then applies LS to the remaining jobs. The following proposition holds.
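A minimal sketch of this combination follows; the `fptas` argument is a placeholder for Sahni's FPTAS, which we do not reimplement, and all names are ours:

```python
import math

def fast_scheme(p, eps, fptas):
    """(1+eps)-approximation in O(1/eps^3 + n) time (cf. Proposition 3).
    `fptas(jobs, eps)` is assumed to return (loads, assign) for a
    (1+eps)-approximate two-machine schedule of `jobs`."""
    k = math.ceil(1 / (2 * eps) - 1)           # guarantees 1 + 1/(2(k+1)) <= 1 + eps
    p = sorted(p, reverse=True)                # O(n) selection suffices in the paper
    loads, assign = fptas(p[:2 * k], eps)      # FPTAS on the 2k largest: O((2k)^2/eps)
    for q in p[2 * k:]:                        # LS on the remaining jobs: O(n)
        m = 0 if loads[0] <= loads[1] else 1
        loads[m] += q
        assign[m].append(q)
    return max(loads)
```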

Proposition 3

Given a \(P2 || C_{\max }\) instance with n jobs, an approximation ratio \((1+\epsilon )\) can be reached with complexity \(O(\frac{1}{\epsilon ^3}+n)\) for all \(n > 1/\epsilon \).

Proof

As the proposed procedure sets \(k=\lceil \frac{1}{2\epsilon } - 1\rceil \), the results of Proposition 1 and Corollary 1 guarantee an approximation ratio \(1 + \frac{1}{2(k+1)} \le 1 + \epsilon \). To bound the running time, notice that the FPTAS is applied to a subset of 2k jobs, thus yielding a time complexity of \(O(\frac{(2k)^2}{\epsilon }) = O(\frac{1}{\epsilon ^3})\). The additional linear contribution of n to the time complexity is due to the application of LS. \(\square \)

We remark that the difference in complexity between the proposed procedure and the FPTAS in Sahni (1976) can be extremely large when \(n \gg \frac{1}{\epsilon }\). For instance, with \(n=10000\) and \(\epsilon = 0.01\), the FPTAS in Sahni (1976) runs in \(O(\frac{n^2}{\epsilon })=O(10^{10})\), whereas the proposed procedure runs in \(O(\frac{1}{\epsilon ^3}+n)=O(10^6+10^4)\), a difference of about four orders of magnitude.

4 Conclusions

We considered problem \(P2 || C_{\max }\) and showed that the well-known LPT rule followed by a single step of local search reaches in linear time a tight \(\frac{13}{12}\)-approximation ratio. As a byproduct, we also showed that for any \(n > 1/\epsilon \) an approximation ratio \((1 + \epsilon )\) can be reached by means of an algorithm running with complexity \(O(n + \frac{1}{\epsilon ^3})\).

In our analysis we deployed an approach relying on Mixed Integer Linear Programming modeling. The proposed MILP reasoning can be considered a valid alternative to techniques based on analytical derivation; an attempt in this direction has recently been made in Della Croce et al. (2018) for a multiperiod variant of the knapsack problem. Due to the implications of Proposition 1, a generalization of the proposed approach to larger values of k, possibly combining LPT or other basic greedy rules such as Multifit with a single step of local search, may yield further improvements of the current result and is therefore worthy of future investigation.