1 Introduction

Multiprocessor scheduling, first introduced by Graham (1966), is one of the most important combinatorial optimization problems in both theory and applications, and it has many generalizations, variants and special cases (Chenier et al. 1995; Leutenegger and Vernon 1990; Zhong et al. 2017). The multiprocessor scheduling problem with rejection is one of its important generalizations: each job is either accepted and processed on one of the machines, or rejected at the cost of a rejection penalty. The objective is to minimize the sum of the makespan of the accepted jobs and the total rejection penalty of the rejected jobs. For this problem, Bartal et al. (2000) presented a 2-approximation algorithm. The parallel-machine scheduling problem with release dates and rejection is another such generalization, in which a job is either accepted and processed on one of the machines after its release date, or rejected at the cost of a rejection penalty. For this problem, Zhang and Lu (2016) presented a 2-approximation algorithm. For more information, see the surveys by Shabtay et al. (2013) and Sotskov and Egorova (2019).

For a given finite set U, a set function \(f: 2^U\rightarrow \mathbb {R}_{\ge 0}\) is called submodular, if

$$\begin{aligned} f(X\cup Y)+f(X\cap Y)\le f(X)+f(Y) \end{aligned}$$

for any \(X, Y\in 2^{U}\). A set function is called modular if the inequality always holds with equality. Submodular functions exhibit diminishing marginal returns and are recognized as fundamental tools in combinatorial optimization, game theory, machine learning and various other fields (Fujishige 2005). A set function is submodular if and only if its Lovász extension is convex (Lovász 1983). Submodular functions play a role in combinatorial optimization somewhat similar to that played by convex/concave functions in continuous optimization.
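
To make this definition concrete, the following Python sketch checks submodularity by brute force on a tiny ground set. It is an illustration only (exponential in |U|), and the helper names (powerset, is_submodular) and example functions are ours, not part of the formal development.

```python
from itertools import combinations

def powerset(U):
    """All subsets of the ground set U, as frozensets."""
    return [frozenset(c) for r in range(len(U) + 1)
            for c in combinations(U, r)]

def is_submodular(f, U):
    """Brute-force check of f(X | Y) + f(X & Y) <= f(X) + f(Y) for all
    pairs X, Y of subsets of U. For tiny illustrative examples only."""
    subsets_of_U = powerset(U)
    return all(f[X | Y] + f[X & Y] <= f[X] + f[Y] + 1e-9
               for X in subsets_of_U for Y in subsets_of_U)

# Example: f(X) = sqrt(|X|) has diminishing marginal returns, so it is
# submodular; f(X) = |X|^2 is not.
U = frozenset({1, 2, 3})
print(is_submodular({X: len(X) ** 0.5 for X in powerset(U)}, U))  # True
print(is_submodular({X: len(X) ** 2 for X in powerset(U)}, U))    # False
```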

In many real situations, the rejection penalty depends on the set of rejected objects submodularly rather than linearly (Liu and Li 2021). Consequently, classical combinatorial optimization problems with submodular penalties, in which the rejection penalty is determined by a submodular function, have received increasing attention. Xu et al. (2016) introduced and studied the submodular vertex cover problems with linear/submodular penalties using the primal-dual technique, giving approximation algorithms with ratios 2 and 4, respectively. More recently, Zhang et al. (2018) proposed a 3-approximation algorithm for precedence-constrained scheduling with submodular rejection on parallel machines; Liu and Li (2020) proposed a 2-approximation algorithm for single-machine scheduling with release dates and submodular rejection penalties; and Liu and Li (2021) proposed a \(\frac{3+\sqrt{5}}{2}\)-approximation algorithm for multiprocessor scheduling with submodular rejection penalties.

Motivated by the above work, in this paper we focus on the parallel-machine scheduling problem with release dates and submodular rejection penalties, and we present a 2-approximation algorithm based on the primal-dual framework.

This paper is organized as follows: In Sect. 2, we recall some basic definitions and provide a formal problem statement. In Sect. 3, we design an approximation algorithm for the problem. In Sect. 4, we analyze the approximation ratio of our algorithm.

2 Preliminaries

In the parallel-machine scheduling problem with release dates and submodular rejection penalties, we are given m identical parallel machines, \(M=\{M_1, M_2, \ldots , M_m\}\), and n jobs, \(J=\{J_1, J_2, \ldots , J_n\}\). Each job \(J_j\) has a processing time \(p_j\) and a release date \(r_j\), where \(p_j\) and \(r_j\) are nonnegative real numbers. The rejection cost of a subset of jobs is determined by a submodular function \(\pi (\cdot ): 2^{J}\rightarrow \mathbb {R}_{\ge 0}\). The objective is to choose a subset \(R\subseteq J\) of rejected jobs and schedule the remaining jobs in \(A = J\setminus R\) on the machines such that the sum of the makespan of the accepted jobs and the penalty cost \(\pi (R)\) of the rejected jobs is minimized. Using the standard notation for scheduling problems, the problem is denoted by \(P|r_j, reject|C_{\max }+\pi (R)\), where “P” denotes parallel machines, “\(r_j\)” denotes the release dates, “\(C_{\max }\)” denotes the makespan (namely, the latest completion time) and “reject” indicates that job rejection is allowed.

Clearly, if \(\pi (\cdot )\) is linear, that is \(\pi (R)=\sum _{J_j:J_j\in R}\pi (\{J_j\})\) for all \(R\subseteq J\), then the problem \(P|r_j, reject|C_{\max }+\pi (R)\) is exactly the problem \(P|r_j, reject|C_{\max }+\sum _{J_j\in R}\omega _{j}\) considered in Zhang and Lu (2016). If \(m=1\), then the problem \(P|r_j, reject|C_{\max }+\pi (R)\) is exactly the problem \(1|r_j, reject|C_{\max }+\pi (R)\) considered in Liu and Li (2020).
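
For illustration, a minimal sketch of such a linear (modular) penalty, with hypothetical per-job weights \(\omega _j\):

```python
# A linear (modular) rejection penalty: pi(R) = sum of per-job weights,
# the special case studied by Zhang and Lu (2016). Weights are hypothetical.
omega = {"J1": 3.0, "J2": 1.5, "J3": 2.0}

def pi_linear(R):
    return sum(omega[j] for j in R)

print(pi_linear({"J1", "J3"}))  # 5.0
```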

3 Algorithm

In this section, we design a 2-approximation algorithm for the problem \(P|r_j, reject|C_{\max }+\pi (R)\) based on the primal-dual framework. Recall that in this problem, we are given m identical parallel machines, \(M=\{M_1, M_2, \ldots , M_m\}\), and n jobs, \(J=\{J_1, J_2, \ldots , J_n\}\). Each job \(J_j\) has a processing time \(p_j\) and a release date \(r_j\). \(\pi (\cdot ): 2^{J}\rightarrow \mathbb {R}_{\ge 0}\) is the submodular penalty function.

We begin with the observation that if rejection is not allowed, then the problem \(P|r_j, reject|C_{\max }+\pi (R)\) is exactly the problem \(P|r_j|C_{\max }\). Lawler (1973) showed that the problem \(1|r_j|C_{\max }\) can be solved using the earliest release date rule (ERD-rule for short). That is, whenever some machine is idle and some job is available, process the unscheduled job with the earliest release date. Thus we have the following lemma.
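
For concreteness, here is a minimal Python sketch of the ERD-rule on m identical machines, implemented as list scheduling in ERD order (jobs given as \((r_j, p_j)\) pairs; the function name is ours):

```python
import heapq

def erd_schedule(jobs, m):
    """ERD-rule on m identical machines: take jobs in order of release date
    and start each one, no earlier than its release date, on the machine
    that becomes available first. Returns the makespan."""
    free = [0.0] * m                  # min-heap of machine available times
    heapq.heapify(free)
    makespan = 0.0
    for r, p in sorted(jobs):         # earliest release date first
        start = max(heapq.heappop(free), r)
        heapq.heappush(free, start + p)
        makespan = max(makespan, start + p)
    return makespan

print(erd_schedule([(0.0, 2.0), (1.0, 2.0), (5.0, 4.0)], m=2))  # 9.0
```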

Lemma 1

(Lawler 1973; Zhang and Lu 2016) For the problem \(P|r_j, reject|C_{\max }+\pi (R)\), there exists an optimal schedule such that the accepted jobs in \(A = J\setminus R\) are processed according to the ERD-rule on each machine.

We re-label the jobs such that \(r_1+p_1\le r_2+p_2\le \ldots \le r_n+p_n\). For convenience, let \(r_0=0\) and \(p_0=0\).

For each \(e\in \{r_j+p_j | j=0, 1, \ldots , n\}\), let

$$\begin{aligned} D(e)=\{J_j | r_j+p_j>e\} \end{aligned}$$

be the set of jobs whose sum of release date and processing time is greater than e. Let \(D_e\) be a set of jobs such that

$$\begin{aligned} D_e\supseteq D(e)\quad \text {and}\quad \pi (D_e)\,\, \text {is minimized}. \end{aligned}$$

Obviously, we have

$$\begin{aligned} D_e=\arg \big (\min _ {X\subseteq J\setminus D(e)}\pi (X\cup D(e))\big )\cup D(e). \end{aligned}$$
(1)

Construct an auxiliary function \(\nu _e(\cdot )\) as follows: for any subset \(X\subseteq J\), define

$$\begin{aligned} \nu _e(X)=\pi (X\cup D(e))-\pi (D(e)). \end{aligned}$$
(2)
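
The following Python sketch makes (1) and (2) concrete; the subset enumeration stands in for polynomial-time submodular function minimization (cf. Iwata et al. 2001 in Lemma 3 below), so it is only suited to tiny instances, and all helper names are ours.

```python
from itertools import combinations

def subsets(S):
    """All subsets of S, as frozensets (brute force)."""
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1)
            for c in combinations(S, r)]

def D_of(e, jobs):
    """D(e): the jobs with r_j + p_j > e. `jobs` maps a name to (r_j, p_j)."""
    return frozenset(j for j, (r, p) in jobs.items() if r + p > e)

def D_e(e, jobs, pi):
    """A superset of D(e) minimizing pi, as in (1). Enumeration stands in
    for polynomial-time submodular function minimization."""
    De = D_of(e, jobs)
    best = min(subsets(set(jobs) - De), key=lambda X: pi(X | De))
    return best | De

def nu_e(X, e, jobs, pi):
    """The auxiliary function (2): nu_e(X) = pi(X | D(e)) - pi(D(e))."""
    De = D_of(e, jobs)
    return pi(frozenset(X) | De) - pi(De)
```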

Lemma 2

For every \(e\in \{r_j+p_j | j=0, 1, \ldots , n\}\), the function \(\nu _e(\cdot )\) from (2) is submodular.

Proof

For any two subsets \(X, Y\subseteq J\), we have

$$\begin{aligned}&\nu _e(X)+\nu _e(Y)\\&\quad =\pi (X\cup D(e))-\pi (D(e))+\pi (Y\cup D(e))-\pi (D(e))\\&\quad \ge \pi ((X\cup D(e))\cup (Y\cup D(e)))+\pi ((X\cup D(e))\cap (Y\cup D(e)))-2\pi (D(e))\\&\quad =\pi ((X\cup Y)\cup D(e))-\pi (D(e))+\pi ((X\cap Y)\cup D(e))-\pi (D(e))\\&\quad =\nu _e(X\cup Y)+\nu _e(X\cap Y), \end{aligned}$$

where the inequality follows from the submodularity of \(\pi (\cdot )\). Therefore, \(\nu _e(\cdot )\) is a submodular function. \(\square \)
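
As a numeric spot-check of Lemma 2, we can combine the sketches above on a hypothetical instance:

```python
# Spot-check of Lemma 2, reusing is_submodular, subsets and nu_e from the
# sketches above. The instance and penalty pi(R) = sqrt(|R|) are hypothetical.
jobs = {"J1": (0.0, 2.0), "J2": (1.0, 2.0), "J3": (5.0, 4.0)}
pi = lambda R: len(R) ** 0.5
for e in (0.0, 2.0, 3.0, 9.0):
    f = {X: nu_e(X, e, jobs, pi) for X in subsets(jobs)}
    print(e, is_submodular(f, frozenset(jobs)))  # True for every e
```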

We now present Algorithm 1, which finds a schedule \(\delta _e\) for each \(e\in \{r_j+p_j | j=0, 1, \ldots , n\}\) and will be used as a subroutine in the final algorithm. We refer to it as A(e).

Algorithm 1

(Algorithm A(e))

Input: An instance of \(P|r_j, reject|C_{\max }+\pi (R)\) and a fixed number \(e\in \{r_j+p_j | j=0, 1, \ldots , n\}\).

Output: A schedule \(\delta _e\) and its objective value \(Z_e\).

Step 1. Introduce a time variable t and an auxiliary “dual” variable \(\beta _{e,j}\) for every job \(J_j\in J\). At the start of the algorithm, set \(\beta _{e,j}= 0\) for each job \(J_j\in J\) and set \(t=0\). We say that job \(J_j\) is frozen if \(\beta _{e,j}\) no longer increases. Let \(R_e\) denote the set of rejected jobs and \(F_e\) the set of frozen jobs. Initialize \(R_e:=\emptyset \) and \(F_e:=\emptyset \).

Step 2. Compute the job set \(D_e\) using (1). Reject every job in \(D_e\) by setting \(R_e:=D_e\), and freeze every job in \(D_e\) by setting \(F_e:=D_e\).

Step 3. Increase the values of the dual variables \(\beta _{e,j}\) for all unfrozen jobs in \(J\setminus F_e\) uniformly at unit rate with time t until one of the following two events occurs:

Event 1. There exists a set \(S_e\subseteq J\) with \(S_e\setminus F_e\ne \emptyset \) such that

$$\begin{aligned} \sum _{j:J_j\in S_e\setminus F_e}t+\sum _{j:J_j\in S_e\cap F_e}\beta _{e,j}=\nu _e(S_e). \end{aligned}$$

Then set \(\beta _{e,j}=t\) for each job \(J_j\) in \(S_e\setminus F_e\) and freeze those unfrozen jobs in \(S_e\) by setting \(F_e:=F_e\cup S_e\). Reject all jobs in set \(S_e\) by setting \(R_e:=R_e\cup S_e\).

Event 2. There exists a job \(J_j\in J\setminus F_e\) such that \(t=\frac{p_j}{m}\). Then set \(\beta _{e,j}=t\) and freeze job \(J_j\) by setting \(F_e:=F_e\cup \{J_j\}\).

If multiple events occur at the same time, the algorithm executes them in an arbitrary order. Repeat the above process until all jobs in J are frozen.

Step 4. Reject the jobs in \(R_e\) and schedule all jobs in \(A_e(= J\setminus R_e)\) according to the ERD-rule on the machines. The resulting schedule is denoted by \(\delta _e\), and its objective value by \(Z_e\).

Remark 1

For any \(e\in \{r_j+p_j | j=0, 1, \ldots , n\}\), Step 4 of Algorithm A(e) schedules all jobs in \(A_e(= J\setminus R_e)\) according to the ERD-rule. This rule yields a sequence of the accepted jobs that minimizes the makespan. Assume that job \(J_l\) is the first job to be scheduled; it starts at its release date \(r_l\) and has processing time \(p_l\). Assume that job \(J_k\) is the last job to be scheduled; it starts at time \(S_k\) and has processing time \(p_k\). Hence all jobs in \(A_e(= J\setminus R_e)\) are processed in the time interval \([r_l, S_k+p_k]\) on the machines.
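To fix ideas, the following brute-force Python sketch of Algorithm A(e) reuses the helpers defined above. The subset enumeration in Event 1 stands in for the polynomial-time ratio minimization justified in Lemma 3 below, so the sketch is only practical for tiny instances.

```python
def algorithm_A(e, jobs, pi, m):
    """Brute-force sketch of Algorithm A(e): returns (R_e, Z_e). Event 1 is
    detected by enumerating all subsets; Lemma 3 replaces this enumeration
    with polynomial-time submodular/modular ratio minimization."""
    De = D_e(e, jobs, pi)
    R, F = set(De), set(De)               # Step 2: reject and freeze D_e
    beta = {j: 0.0 for j in jobs}         # dual variables beta_{e,j}
    while F != set(jobs):                 # Step 3
        # Next time Event 2 fires: some unfrozen job reaches beta = p_j / m.
        t2 = min(jobs[j][1] / m for j in set(jobs) - F)
        # Next time Event 1 fires: a set S with S \ F nonempty becomes tight.
        t1, S1 = float("inf"), None
        for S in subsets(jobs):
            unfrozen = S - F
            if unfrozen:
                t = (nu_e(S, e, jobs, pi)
                     - sum(beta[j] for j in S & F)) / len(unfrozen)
                if t < t1:
                    t1, S1 = t, S
        if t1 <= t2:                      # Event 1: freeze and reject S1
            for j in S1 - F:
                beta[j] = t1
            F |= S1
            R |= S1
        else:                             # Event 2: freeze one accepted job
            j = min(set(jobs) - F, key=lambda i: jobs[i][1] / m)
            beta[j] = t2
            F.add(j)
    accepted = [jobs[j] for j in set(jobs) - R]
    Cmax = erd_schedule(accepted, m) if accepted else 0.0   # Step 4
    return R, Cmax + pi(frozenset(R))
```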

We conclude this section with the final algorithm.

Algorithm 2

Input: An instance of \(P|r_j, reject|C_{\max }+\pi (R)\).

Output: A schedule.

Step 1. Run Algorithm A(e) for every \(e\in \{r_j+p_j | j=0, 1, \ldots , n\}\) to obtain the schedule \(\delta _e\) and its objective value \(Z_e\).

Step 2. Among the schedules \(\{\delta _e | e\in \{r_j+p_j | j=0, 1, \ldots , n\}\}\), output the one with the smallest objective value.
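
A sketch of Algorithm 2 as an outer loop over the candidate values of e, reusing algorithm_A from the previous sketch (the example instance and penalty are hypothetical):

```python
def algorithm_2(jobs, pi, m):
    """Sketch of Algorithm 2: run A(e) for every e in {r_j + p_j | j=0,...,n}
    (r_0 = p_0 = 0, so e = 0 is included) and keep the best objective value."""
    candidates = {0.0} | {r + p for (r, p) in jobs.values()}
    return min((algorithm_A(e, jobs, pi, m) for e in candidates),
               key=lambda res: res[1])    # res = (rejected set, objective)

# Hypothetical example: two machines, three jobs, pi(R) = sqrt(|R|).
jobs = {"J1": (0.0, 2.0), "J2": (1.0, 2.0), "J3": (5.0, 4.0)}
R, Z = algorithm_2(jobs, lambda R: len(R) ** 0.5, m=2)
print(R, Z)
```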

4 Theoretical analysis

In this section, we analyze the approximation ratio of Algorithm 2.

Lemma 3

For any \(e\in \{r_j+p_j | j=0, 1, \ldots , n\}\), Algorithm A(e) can be implemented in polynomial time.

Proof

By Lemma 2, \(\nu _e(\cdot )\) is a submodular function, which implies that

$$\min _{X\subseteq J\setminus D(e)}\nu _e(X)$$

can be computed within polynomial time, using the method in Iwata et al. (2001). Since \(\pi (D(e))\) is a constant,

$$\min _{X\subseteq J\setminus D(e)}\pi (X\cup D(e))=\min _{X\subseteq J\setminus D(e)}\nu _e(X)+\pi (D(e))$$

can be computed within polynomial time. By (1), the set \(D_e\) can be computed in polynomial time, and hence Step 2 can be implemented in polynomial time.

Consider any time t during the implementation of Step 3. Let \(F_e(t)\) denote the subset of frozen jobs at this time. If we can find, in polynomial time, the next time \(t'\) at which either Event 1 or Event 2 occurs, then Step 3 can be implemented in polynomial time, and hence the lemma is proved.

If \(t'\) is incurred by Event 2, we obtain \(t'\) by computing \(\min _{j:J_j\in J\setminus F_{e}(t)}\frac{p_j}{m}\), which can be implemented in polynomial time.

If \(t'\) is incurred by Event 1, then it can be computed by solving the following optimization problem:

$$\begin{aligned} t'=\min _{\begin{array}{c} S\subseteq J\\ S\setminus F_{e}(t)\ne \emptyset \end{array}}\frac{\nu _e(S)-\sum _{j:J_j\in S\cap F_{e}(t)}\beta _{e,j}}{|S\setminus F_{e}(t)|}=\min _{\begin{array}{c} S\subseteq J\\ S\setminus F_{e}(t)\ne \emptyset \end{array}}\frac{\nu _e(S)+\beta (S)}{\kappa (S)}, \end{aligned}$$
(3)

where \(\beta _{e,j}\) is the value of the dual variable of job \(J_j\) at the time \(J_j\) was frozen, and \(\kappa (\cdot )\) and \(\beta (\cdot )\) are two set functions, respectively, defined by

$$\begin{aligned} \kappa (S)=|S\setminus F_{e}(t)|\quad \text {and}\quad \beta (S)=\sum _{j:J_j\in S\cap F_{e}(t)}(-\beta _{e,j}) \end{aligned}$$

for any subset S with \(S\setminus F_{e}(t)\ne \emptyset \). For any subsets \(S_1, S_2\subseteq J\) satisfying \(S_1\setminus F_{e}(t)\ne \emptyset \) and \(S_2\setminus F_{e}(t)\ne \emptyset \), we have

$$\begin{aligned} \beta (S_1)+\beta (S_2)= & {} \sum _{j:J_j\in S_1\cap F_{e}(t)}(-\beta _{e,j})+\sum _{j:J_j\in S_2\cap F_{e}(t)}(-\beta _{e,j})\\= & {} \sum _{j:J_j\in (S_1\cap F_{e}(t))\cup (S_2\cap F_{e}(t))}(-\beta _{e,j})+\sum _{j:J_j\in (S_1\cap F_{e}(t))\cap (S_2\cap F_{e}(t))}(-\beta _{e,j})\\= & {} \sum _{j:J_j\in (S_1\cup S_2)\cap F_{e}(t)}(-\beta _{e,j})+\sum _{j:J_j\in (S_1\cap S_2)\cap F_{e}(t)}(-\beta _{e,j})\\= & {} \beta (S_1\cup S_2)+\beta (S_1\cap S_2) \end{aligned}$$

and

$$\begin{aligned} \kappa (S_1)+\kappa (S_2)= & {} |S_1\setminus F_{e}(t)|+|S_2\setminus F_{e}(t)|\\= & {} |(S_1\setminus F_{e}(t))\cup (S_2\setminus F_{e}(t))|+|(S_1\setminus F_{e}(t))\cap (S_2\setminus F_{e}(t))|\\= & {} |(S_1\cup S_2)\setminus F_{e}(t)|+|(S_1\cap S_2)\setminus F_{e}(t)|\\= & {} \kappa (S_1\cup S_2)+\kappa (S_1\cap S_2). \end{aligned}$$

Therefore, \(\kappa (\cdot )\) and \(\beta (\cdot )\) are modular functions. Together with the fact that \(\nu _e(\cdot )\) is submodular, (3) is the problem of minimizing the ratio of a submodular function to a modular function, which can be solved in polynomial time (Fleischer and Iwata 2003). \(\square \)

For any \(e\in \{r_j+p_j | j=0, 1, \ldots , n\}\) and any \(1\le j\le n\), let \(\beta _{e,j}(t)\) be the value of the dual variable of job \(J_j\) at time t, which increases with t until job \(J_j\) is frozen.

Lemma 4

For any \(e\in \{r_j+p_j | j=0, 1, \ldots , n\}\), at any time t during the implementation of Algorithm A(e), the set \(R_e\) always satisfies

$$\begin{aligned} \pi (R_e)=\sum _{j:J_j\in R_e}\beta _{e,j}(t)+\pi (D_e). \end{aligned}$$
(4)

Proof

Suppose that Event 1 occurs at time points \(t_1\) and \(t_2\) with \(t_1\le t_2\), and that the corresponding selected job subsets are \(S_{e}(t_1)\) and \(S_{e}(t_2)\), respectively. By Event 1, we have

$$\begin{aligned} \nu _e(S_{e}(t_1))=\sum _{j:J_j\in S_{e}(t_1)\setminus F_{e}(t_1)}t_1+\sum _{j:J_j\in S_{e}(t_1)\cap F_e(t_1)}\beta _{e,j}=\sum _{j:J_j\in S_{e}(t_1)}\beta _{e,j}(t_1), \end{aligned}$$
(5)
$$\begin{aligned} \nu _e(S_{e}(t_2))=\sum _{j:J_j\in S_{e}(t_2)\setminus F_{e}(t_2)}t_2+\sum _{j:J_j\in S_{e}(t_2)\cap F_e(t_2)}\beta _{e,j}=\sum _{j:J_j\in S_{e}(t_2)}\beta _{e,j}(t_2). \end{aligned}$$
(6)

Observe that all jobs in \(S_{e}(t_1)\) are frozen at time \(t_1\) or earlier. Hence \(\beta _{e,j}(t_1)=\beta _{e,j}(t_2)\) for every job \(J_j\in S_{e}(t_1)\).

Now we consider

$$\begin{aligned} \sum _{j:J_j\in S_{e}(t_1)\cup S_{e}(t_2)}\beta _{e,j}(t_2)+\sum _{j:J_j\in S_{e}(t_1)\cap S_{e}(t_2)}\beta _{e,j}(t_2) \end{aligned}$$

and for simplicity, we denote it by \(\triangle _{S_{e}(t_1), S_{e}(t_2)}(t_2)\). On the one hand, by (5), (6) and the observation following (6), we have

$$\begin{aligned}&\triangle _{S_{e}(t_1), S_{e}(t_2)}(t_2)\\&\quad =\sum _{j:J_j\in S_{e}(t_1)}\beta _{e,j}(t_2)+\sum _{j:J_j\in S_{e}(t_2)}\beta _{e,j}(t_2)\\&\quad =\sum _{j:J_j\in S_{e}(t_1)}\beta _{e,j}(t_1)+\sum _{j:J_j\in S_{e}(t_2)}\beta _{e,j}(t_2)\\&\quad =\nu _e(S_{e}(t_1))+\nu _e(S_{e}(t_2)). \end{aligned}$$

Since \(\nu _e(\cdot )\) is submodular, we obtain

$$\begin{aligned} \triangle _{S_{e}(t_1), S_{e}(t_2)}(t_2)\ge \nu _e(S_{e}(t_1)\cup S_{e}(t_2))+\nu _e(S_{e}(t_1)\cap S_{e}(t_2)). \end{aligned}$$
(7)

On the other hand, by the rule of Event 1, the dual variables are never raised beyond tightness, so at any time t every subset \(S\subseteq J\) satisfies \(\sum _{j:J_j\in S}\beta _{e,j}(t)\le \nu _e(S)\). In particular, we have

$$\begin{aligned} \sum _{j:J_j\in S_{e}(t_1)\cap S_{e}(t_2)}\beta _{e,j}(t_2)\le \nu _e(S_{e}(t_1)\cap S_{e}(t_2)) \end{aligned}$$
(8)

and

$$\begin{aligned} \sum _{j:J_j\in S_{e}(t_1)\cup S_{e}(t_2)}\beta _{e,j}(t_2)\le \nu _e(S_{e}(t_1)\cup S_{e}(t_2)). \end{aligned}$$
(9)

From (8) and (9), we obtain

$$\begin{aligned} \triangle _{S_{e}(t_1), S_{e}(t_2)}(t_2)\le \nu _e(S_{e}(t_1)\cup S_{e}(t_2))+\nu _e(S_{e}(t_1)\cap S_{e}(t_2)). \end{aligned}$$
(10)

Combining (7) with (10) shows that \(\triangle _{S_{e}(t_1), S_{e}(t_2)}(t_2)=\nu _e(S_{e}(t_1)\cup S_{e}(t_2))+\nu _e(S_{e}(t_1)\cap S_{e}(t_2))\); hence (8) and (9) must both hold with equality. In particular, we get

$$\begin{aligned} \sum _{j:J_j\in S_{e}(t_1)\cup S_{e}(t_2)}\beta _{e,j}(t_2)=\nu _e(S_{e}(t_1)\cup S_{e}(t_2)). \end{aligned}$$
(11)

Thus the result follows from (2) and (11). \(\square \)

Lemma 5

For any \(e\in \{r_j+p_j | j=0, 1, \ldots , n\}\), let \(C_{\max }(\delta _{e})\) be the makespan of the schedule \(\delta _{e}\) output by Algorithm A(e). Let \(J_k\) be the job in \(\delta _{e}\) such that its completion time is \(C_{\max }(\delta _{e})\). Furthermore, let \(S_k\) be the starting time of \(J_k\) in \(\delta _{e}\). Thus we have

$$\begin{aligned} S_k\ge r_k \quad \text {and}\quad C_{\max }(\delta _{e})=S_k+p_k. \end{aligned}$$
(12)

Moreover, if \(S_k> r_k\), then all machines are busy in the time interval \([r_k,S_k)\).

Proof

The proof of (12) is routine. For the second claim, suppose for contradiction that some machine \(M_i\) is idle at a time t with \(r_k\le t< S_k\). Note that \(J_k\) is released and unscheduled at time t in \(\delta _{e}\). Thus, by the ERD-rule in Step 4 of Algorithm A(e), some job (\(J_k\) or another released and unscheduled job) would be processed at time t on \(M_i\). This contradicts the fact that \(M_i\) is idle at time t. The result holds. \(\square \)

Let \(\delta ^{*}\) be an optimal schedule for the problem \(P|r_j, reject|C_{\max }+\pi (R)\). Let \(A^{*}\) and \(R^{*}\) be the sets of the accepted jobs and the rejected jobs in \(\delta ^{*}\), respectively. Let \(e^{*}=\max \{r_j+p_j|J_j\in A^{*}\}\) if \(A^{*}\ne \emptyset \) and let \(e^{*} = 0\) if \(A^{*}=\emptyset \).

Lemma 6

For any optimal schedule \(\delta ^{*}\), we have \(\pi (R^*)=\pi (R^{*}\cup D_{e^*})\).

Proof

Let \(\delta ^{*}\) be any optimal schedule. If \(D_{e^{*}}\setminus R^{*}=\emptyset \), then \(D_{e^{*}}\subseteq R^{*}\) and hence the lemma holds. Otherwise, let \(\delta '\) be the schedule in which the jobs in \(R'=R^{*}\cup D_{e^{*}}\) are rejected and the jobs in \(A'=J\setminus (R^{*}\cup D_{e^{*}})\) are accepted and processed according to the ERD-rule. Clearly, we have \(A'\subset A^{*}=J\setminus R^{*}\). By Lemma 1, the ERD-rule schedules the jobs in \(A'\) optimally, and removing jobs from a schedule cannot increase its makespan. Thus,

$$\begin{aligned} C_{\max }(\delta ')\le C_{\max }(\delta ^{*}), \end{aligned}$$
(13)

where \(C_{\max }(\delta ^{*})\) (resp. \(C_{\max }(\delta ')\)) is the makespan of \(\delta ^{*}\) (resp. \(\delta '\)).

By the definition of \(e^{*}\), we have \(D(e^{*})=\{J_j\in J|r_j+p_j>e^{*}\}\subseteq R^{*}\). Note that \(D(e^{*})\subseteq D_{e^{*}}\). We obtain

$$\begin{aligned} D(e^{*})\subseteq R^{*}\cap D_{e^{*}}. \end{aligned}$$

By the definition of \(D_{e^{*}}\), we have \(\pi (D_{e^{*}})\le \pi (S)\) for any \(S\supseteq D(e^{*})\). Thus,

$$\begin{aligned} \pi (D_{e^{*}})\le \pi (R^{*}\cap D_{e^{*}}). \end{aligned}$$
(14)

Since \(\pi (\cdot )\) is a submodular function, we have

$$\begin{aligned} \pi (R^{*})+\pi (D_{e^{*}})\ge \pi (R^{*}\cup D_{e^{*}})+\pi (R^{*}\cap D_{e^{*}})=\pi (R')+\pi (R^{*}\cap D_{e^{*}}). \end{aligned}$$

Together with (14), we get

$$\begin{aligned} \pi (R')\le \pi (R^{*}). \end{aligned}$$
(15)

By (13) and (15), the objective value of \(\delta '\) is no more than that of \(\delta ^{*}\). Thus, \(\delta '\) is an optimal schedule with \(R'=R^{*}\cup D_{e^{*}}\) as the rejected set, which implies that

$$\begin{aligned} C_{\max }(\delta ')+\pi (R')=C_{\max }(\delta ^{*})+\pi (R^{*}). \end{aligned}$$
(16)

Combining (13), (15) with (16), we obtain \(\pi (R^{*})=\pi (R')=\pi (R^{*}\cup D_{e^{*}})\). The lemma holds. \(\square \)

Theorem 1

Algorithm 2 is a 2-approximation algorithm for the problem \(P|r_j,\) \(reject|C_{\max }+\pi (R)\).

Proof

Let \(\delta ^{*}\) be an optimal schedule and let \(Z^{*}\) be the objective value of \(\delta ^{*}\). If the optimal schedule \(\delta ^{*}\) rejects all the jobs, then \(\delta ^{*}\) is the schedule output by Algorithm A(e) with \(e=0\), which implies that the theorem holds. Now we assume that \(A^*\ne \emptyset \). By the definition of \(e^{*}\), we have

$$\begin{aligned} Z^{*}\ge e^{*}. \end{aligned}$$
(17)

Let \(C_{\max }(\delta ^{*})\) be the makespan of \(\delta ^{*}\). Note that the total processing time of the jobs in \(A^{*}\) is exactly \(\sum _{J_j\in A^{*}}p_j\). By the pigeonhole principle, we have

$$\begin{aligned} C_{\max }(\delta ^{*})\ge \frac{\sum _{J_j\in A^{*}}p_j}{m}=\sum _{J_j\in A^{*}}\frac{p_j}{m}. \end{aligned}$$

Now consider Algorithm \(A(e^*)\). By Event 2, we have \(\beta _{e^{*},j}\le \frac{p_j}{m}\) for every job \(J_j\in J\) and \(\beta _{e^{*},j}= \frac{p_j}{m}\) for every job \(J_j\in A_{e^{*}}\), where \(\beta _{e^{*},j}\) is the frozen dual variable of job \(J_j\) and \(A_{e^{*}}\) is the set of accepted jobs produced by Algorithm \(A(e^*)\). By Event 1, no subset ever becomes over-tight; applying this to the set \(R^{*}\cup D_{e^{*}}\) and using \(\pi (D_{e^{*}})\le \pi (D(e^{*}))\), we have \(\pi (R^{*}\cup D_{e^{*}})\ge \sum _{j:J_j\in R^{*}}\beta _{e^{*},j}+\pi (D_{e^{*}})\). Thus, we have

$$\begin{aligned} \begin{array}{rcl} Z^{*}&{}=&{}C_{\max }(\delta ^{*})+\pi (R^{*})\\ &{}=&{}C_{\max }(\delta ^{*})+\pi (R^{*}\cup D_{e^{*}})\\ &{}\ge &{}\sum _{J_j\in A^{*}}\frac{p_j}{m}+\pi (R^{*}\cup D_{e^{*}})\\ &{}\ge &{}\sum _{J_j\in A^{*}}\frac{p_j}{m}+\sum _{j:J_j\in R^{*}}\beta _{e^{*},j}+\pi (D_{e^{*}})\\ &{}\ge &{}\sum _{J_j\in A^{*}}\beta _{e^{*},j}+\sum _{j:J_j\in R^{*}}\beta _{e^{*},j}+\pi (D_{e^{*}})\\ &{}=&{}\sum _{j:J_j\in J}\beta _{e^{*},j}+\pi (D_{e^{*}}), \end{array} \end{aligned}$$
(18)

where the second equality follows from Lemma 6.

Let \(C_{\max }(\delta _{e^{*}})\) be the makespan of the schedule \(\delta _{e^{*}}\) output by Algorithm \(A(e^*)\). Let \(J_k\) be the job in \(\delta _{e^{*}}\) such that its completion time is \(C_{\max }(\delta _{e^{*}})\). Furthermore, let \(S_k\) be the starting time of \(J_k\) in \(\delta _{e^{*}}\). Then by Lemma 5, we have

$$\begin{aligned} C_{\max }(\delta _{e^{*}})=S_k+p_k\quad \text {and}\quad S_k- r_k\le \frac{\sum _{J_j\in A_{e^{*}}}p_j}{m}=\sum _{J_j\in A_{e^{*}}}\frac{p_j}{m}. \end{aligned}$$

Note further that \(r_k+p_k\le e^{*}\) since \(J_k\in A_{e^{*}}\).

Let \(\delta \) be the schedule output by Algorithm 2. Let Z and \(Z_{e^{*}}\) be the objective values of the schedule \(\delta \) and the schedule \(\delta _{e^{*}}\), respectively. Thus, we have

$$\begin{aligned} Z\le & {} Z_{e^{*}}\\= & {} C_{\max }(\delta _{e^{*}})+\pi (R_{e^{*}})\\= & {} S_k+p_k+\sum _{j:J_j\in R_{e^{*}}}\beta _{e^{*},j}+\pi (D_{e^{*}})\\= & {} r_k+(S_k-r_k)+p_k+\sum _{j:J_j\in R_{e^{*}}}\beta _{e^{*},j}+\pi (D_{e^{*}})\\\le & {} e^{*}+ \sum _{J_j\in A_{e^{*}}}\frac{p_j}{m}+\sum _{j:J_j\in R_{e^{*}}}\beta _{e^{*},j}+\pi (D_{e^{*}})\\= & {} e^{*}+\sum _{j:J_j\in J}\beta _{e^{*},j}+\pi (D_{e^{*}})\\\le & {} 2Z^{*}, \end{aligned}$$

where the first inequality follows from the choice made in Algorithm 2, the second equality follows from Lemmas 4 and 5, the second inequality follows from \(r_k+p_k\le e^{*}\) and Lemma 5, and the last inequality follows from (17) and (18). \(\square \)

5 Conclusion

In this paper, we proposed the parallel-machine scheduling problem with release dates and submodular rejection penalties, which generalizes both the single-machine scheduling problem with release dates and submodular rejection penalties (Liu and Li 2020) and the parallel-machine scheduling problem with release dates and rejection (Zhang and Lu 2016). We designed a 2-approximation algorithm for this problem based on the primal-dual framework.