1 Introduction

The classical scheduling problem is to find a schedule of minimum makespan in which all jobs are processed on the machines (Graham 1966). In real life, manufacturers often outsource jobs with lower processing returns to obtain greater earnings, where the outsourcing cost of a job is regarded as the penalty for rejecting this job. This problem is called the scheduling problem with rejection (Bartal et al. 2000): each job can be rejected at the cost of a rejection penalty, and the objective is to minimize the makespan of the accepted jobs plus the total penalty of the rejected jobs.

The relationship between the rejected jobs and their total penalty tends to be submodular rather than linear (Zhang et al. 2018; Liu and Li 2020), where a submodular function is a set function with the property of decreasing marginal returns, i.e.,

$$\begin{aligned} \pi (S)+\pi (T)\ge \pi (S\cup T)+\pi (S\cap T),~\forall ~S, T \subseteq J. \end{aligned}$$
(1)
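To make property (1) concrete, the following minimal sketch (ours, not part of the original formulation) checks the inequality by brute force for a small penalty function given as a table over subsets of J.

```python
# A minimal sketch (ours): brute-force check of inequality (1) for a small
# penalty function stored as a dictionary keyed by frozensets.
from itertools import combinations

def is_submodular(pi, ground):
    """Return True iff pi(S) + pi(T) >= pi(S | T) + pi(S & T) for all S, T."""
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    return all(pi[S] + pi[T] >= pi[S | T] + pi[S & T]
               for S in subsets for T in subsets)

# Example with decreasing marginal returns on J = {1, 2}.
J = {1, 2}
pi = {frozenset(): 0, frozenset({1}): 3, frozenset({2}): 2, frozenset({1, 2}): 4}
print(is_submodular(pi, J))  # True
```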

Liu and Li (2020) considered the single machine scheduling problem with release dates and submodular penalties, in which each job has a release date and the rejection penalty is determined by a submodular function.

In the above problems, the information of all jobs is known in advance. However, with the advent of the big data era, manufacturers must often immediately and irrevocably decide whether to process or reject a job when it arrives; this setting is called on-line. For example, under the cloud computing framework, massive computing tasks generated by users are uploaded to the cloud for computing. To ensure the quality of service, the cloud must immediately decide whether to compute a task when it arrives, and a rejection penalty must be paid for each task that is not computed. From the perspective of profit maximization, the cloud provider hopes to compute as many of the user-generated tasks as possible. Thus, in this paper, we consider the single machine scheduling problem with release dates and submodular penalties in both the off-line case and the on-line case, where each job is either rejected or accepted and processed on the machine. The objective is to minimize the makespan of the accepted jobs plus the rejection penalty of the rejected jobs, which is determined by a submodular function.

1.1 Results and outline of the paper

In Sect. 2, we consider the off-line single machine scheduling problem with release dates and submodular penalties and present a simple 2-approximation algorithm.

In Sect. 3, we consider the on-line single machine scheduling problem with release dates and submodular penalties. First, we prove that there is no on-line algorithm with a constant competitive ratio if the penalty submodular function is not monotone. Second, we present an on-line algorithm with a competitive ratio of 3 if the penalty submodular function is monotone. Finally, we consider a special case of the on-line problem in which all jobs have the same release date. We prove that there is no on-line algorithm with a competitive ratio less than \(\frac{\sqrt{5}+1}{2}\approx 1.618\), and show that the on-line algorithm we present achieves a competitive ratio of 2 in this case. In Sect. 4, we conduct a simple simulation to evaluate the performance of the on-line algorithm.

We conclude the paper and suggest some possible future research in the last section.

1.2 Related work

For the off-line scheduling problem, we are given n jobs and m machines, where each job has a processing time. The objective is to minimize the makespan of all jobs. This problem is strongly NP-hard. Graham (1966) presented the classical \((2-1/m)\)-approximation list scheduling (LS, for short) algorithm. Then, Graham (1969) presented a 4/3-approximation algorithm. Hochbaum and Shmoys (1987) presented the first polynomial time approximation scheme (PTAS). To the best of our knowledge, Jansen et al. (2020) presented the best efficient polynomial time approximation scheme (EPTAS). Jansen and Porkolab (2001) considered this problem when the number of machines is fixed, and presented a fully polynomial-time approximation scheme (FPTAS). Li et al. (2012) considered this problem on two machines, in which each job and each machine is labeled with a grade of service level, and presented an EPTAS.

For the off-line scheduling problem with rejection, each job has a processing time and a rejection penalty. The objective is to minimize the makespan of the accepted jobs plus the total penalty of the rejected jobs. Bartal et al. (2000) proposed a 2-approximation algorithm and a PTAS. Ou et al. (2015) proposed a (\(3/2+\epsilon \))-approximation algorithm, where \(\epsilon \) is a small given positive constant. Li et al. (2015) considered this problem under a rejection constraint, where the objective is to minimize the makespan of the accepted jobs subject to the total penalty of the rejected jobs being no more than a given bound. They presented an FPTAS.

For the off-line scheduling problem with rejection and release dates, each job has a release date, and jobs cannot be processed before their corresponding release dates. Zhang and Lu (2016) proposed a 2-approximation algorithm. Then, Zhong and Ou (2017) proposed a PTAS. When there are exactly two machines, Zhong et al. (2017) proposed a (3/2+\(\epsilon \))-approximation algorithm. When the number of machines is one, Zhang et al. (2009) proved that the problem is NP-hard and presented a 2-approximation algorithm, which was independently improved by He et al. (2016) and Ou et al. (2016).

For the off-line scheduling problem with submodular penalties, the penalty is determined by a submodular function. Liu and Li (2021) considered the scheduling problem with submodular penalties and presented a \((2-1/m)\)-approximation algorithm. Zhang et al. (2018) considered the precedence-constrained scheduling problem with submodular penalties on parallel machines and proposed a 3-approximation algorithm. Zheng et al. (2022) and Wang and Liu (2022) independently considered the parallel-machine scheduling problem with release dates and submodular penalties and presented a 2-approximation algorithm. Liu and Li (2020) considered the single machine scheduling problem with release dates and submodular penalties and proposed a 2-approximation algorithm.

For the on-line scheduling problem, jobs arrive one by one, and each job has to be assigned immediately and irrevocably based only on the jobs that have already arrived. The performance of an on-line algorithm is measured by its competitive ratio. For the on-line scheduling problem, Graham (1966) presented the LS algorithm with a competitive ratio of \(2-1/m\). Several algorithms have since been published with a better competitive ratio than the LS algorithm. The best known competitive ratio for the on-line scheduling problem is approximately 1.92 (Albers 1999; Fleischer and Wahl 2000), whereas the best known lower bound for that problem is 1.88 (Rudin 2001). For the on-line scheduling problem with rejection, Bartal et al. (2000) presented an on-line algorithm with the best-possible competitive ratio of \((\sqrt{5}+3)/2\approx 2.618\). For the on-line scheduling problem with rejection and release dates, Lu et al. (2011) provided an on-line algorithm with the best-possible competitive ratio of 2 when the number of machines is one.

2 A simple algorithm for the off-line problem

In this section, we present a simple algorithm for the off-line single machine scheduling problem with release dates and submodular rejection penalties, inspired by the idea in Zhang et al. (2009).

In the off-line single machine scheduling problem with release dates and submodular rejection penalties, we are given an instance \((J,\pi )\), where \(J = \{1, 2,\ldots , n\}\) is a job set and \(\pi (\cdot )\) is a submodular penalty function. Each job j is described by a processing time \(p_j\) and a release date \(r_j\), and is either accepted and processed on the machine or rejected. The off-line problem is to choose a set \(A_{J}\subseteq J\) of jobs to accept and process on the machine and to reject the remaining jobs. The objective is to minimize the sum of the makespan of the accepted jobs and the penalty cost \(\pi (\overline{A_{J}})\), where

$$\begin{aligned} \overline{A_{J}}=J\setminus A_{J} \mathrm{~ is ~the~ complement ~of~} A_J, \end{aligned}$$

and we assume that \(\pi (\cdot )\) can be computed in polynomial time for any subset \(S\subseteq J\).

If rejection is not allowed, Lawler (1973) showed that this problem can be solved using the earliest release date rule. Thus, we have the following lemma.

Lemma 2.1

(Lawler 1973) There exists an optimal solution such that the accepted jobs are processed using the earliest release date rule.

Inspired by the idea in Zhang et al. (2009), our algorithm finds a job subset \(A_{J,t}\subseteq J\) for each \(t\in \{0,1,2,\dots ,n\}\), and outputs the job subset with the minimum objective value, where t represents the maximum index of the jobs to be processed in the job subset. Thus, for any \(t\in \{0,1,2,\dots ,n\}\), the jobs in \(B_t=\{ j\in J|{j} > t \}\) are all rejected, and only the jobs in \(J{\setminus } B_t\) need to be considered for being either processed or rejected. To determine \(A_{J,t}\), an auxiliary function \(p\pi _t(\cdot )\) defined on all subsets of \(J\setminus B_t\) is constructed as follows:

$$\begin{aligned} p\pi _t(S)=p( \overline{S\cup B_t})+\pi (S\cup B_t), ~\forall ~S\subseteq J\setminus B_t, \end{aligned}$$

where we define

$$\begin{aligned} p(J')=\sum _{j:j\in J'}p_j,~\forall J'\subseteq J. \end{aligned}$$
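The following is a minimal sketch (our naming, with the penalty \(\pi \) assumed to be available as a Python callable on frozensets) of how \(p\pi _t(\cdot )\) can be evaluated.

```python
# A sketch of the auxiliary function p_pi_t (our naming).
# J: iterable of job indices 1..n; p: dict of processing times;
# pi: callable on frozensets returning the rejection penalty.
def p_pi_t(S, t, J, p, pi):
    B_t = frozenset(j for j in J if j > t)   # jobs forced to be rejected
    rejected = frozenset(S) | B_t            # S union B_t
    accepted = frozenset(J) - rejected       # complement of S union B_t
    return sum(p[j] for j in accepted) + pi(rejected)
```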

Lemma 2.2

\(p\pi _t(\cdot )\) is a submodular function.

Proof

Since \(B_t=\{ j\in J|{j} > t \}\), for any two job sets \(S_1,S_2\subseteq J\setminus B_t\), we have

$$\begin{aligned}{} & {} p\pi _t(S_1)+p\pi _t(S_2)\\= & {} p(\overline{S_1\cup B_t})+\pi (S_1\cup B_t)+p(\overline{S_2\cup B_t})+\pi (S_2\cup B_t)\\= & {} p(\overline{(S_1\cup S_2)\cup B_t})+p(\overline{(S_1\cap S_2)\cup B_t})+\pi (S_1\cup B_t)+\pi (S_2\cup B_t)\\\ge & {} p(\overline{(S_1\cup S_2)\cup B_t})+p(\overline{(S_1\cap S_2)\cup B_t})+\pi ((S_1\cup S_2)\cup B_t)+\pi ((S_1\cap S_2)\cup B_t)\\= & {} p\pi _t(S_1\cup S_2)+p\pi _t(S_1\cap S_2), \end{aligned}$$

where the inequality follows from the fact that \(\pi (\cdot )\) is a submodular function. Thus, \(p\pi _t(\cdot )\) is a submodular function. \(\square \)

Based on Lemma 2.2 and the algorithm designed in Iwata and Orlin (2009),

$$\begin{aligned} S_{t}:=\arg \min _{S: S\subseteq J \setminus B_t}p\pi _t(S) \end{aligned}$$

i.e.,

$$\begin{aligned} p\pi _t(S_{t})\le p\pi _t(S),~\forall S\subseteq J\setminus B_t. \end{aligned}$$
(2)

can be found in polynomial time. Then, we define \(A_{J,t}=J\setminus (S_t\cup B_t)=\overline{S_t\cup B_t}\), whose objective value is \(Z_{J,t}=r_t+p(A_{J,t})+\pi (\overline{A_{J,t}})\), where \(r_0=0\). The algorithm outputs \(A_J=\arg \min _{A_{J,t}}Z_{J,t}\). We provide the details of the simple off-line algorithm in Algorithm 1.

Algorithm 1 The simple off-line algorithm
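The following Python sketch (ours) mirrors Algorithm 1 for small instances; the submodular minimization step, which Algorithm 1 performs with the polynomial-time minimizer of Iwata and Orlin (2009), is replaced here by brute-force enumeration, so the sketch is illustrative rather than polynomial-time.

```python
from itertools import combinations

def offline_algorithm(J, r, p, pi):
    """Sketch of Algorithm 1; jobs in J are assumed to be indexed 1..n in
    nondecreasing order of release date. The inner loop is a brute-force
    stand-in for the submodular minimizer of Iwata and Orlin (2009)."""
    n = len(J)
    best_val, best_A = float("inf"), frozenset()
    for t in range(n + 1):                        # t = 0, 1, ..., n
        B_t = frozenset(j for j in J if j > t)    # jobs forced to be rejected
        rest = sorted(set(J) - B_t)
        S_t, val_t = frozenset(), float("inf")    # S_t := argmin of p_pi_t(S)
        for k in range(len(rest) + 1):
            for S in map(frozenset, combinations(rest, k)):
                rejected = S | B_t
                accepted = frozenset(J) - rejected
                v = sum(p[j] for j in accepted) + pi(rejected)
                if v < val_t:
                    S_t, val_t = S, v
        A_t = frozenset(J) - (S_t | B_t)          # A_{J,t}
        r_t = 0 if t == 0 else r[t]               # r_0 = 0 by convention
        Z_t = r_t + val_t                         # Z_{J,t}
        if Z_t < best_val:
            best_val, best_A = Z_t, A_t
    return best_A, best_val
```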

Let \(Z^*\) be the optimal value. We have the following theorem.

Theorem 2.3

\(Z_J\le 2Z^*\).

Proof

Let \(A^*\) be an optimal solution with objective value \(Z^*\). If the optimal solution rejects all the jobs, i.e., \(A^*=\emptyset \), then \(A_{J,0}\) is exactly the optimal solution, which implies that the theorem holds; otherwise, let \(t_{A^*}=\max \{ j|j \in A^* \}\) be the maximum index of the jobs in \(A^*\), and

$$\begin{aligned} r_{t_{A^*}}+\pi (\overline{A^*})\le C(A^*)+\pi (\overline{A^*}) =Z^*, \end{aligned}$$
(3)

where \(C(A^*)\) is the makespan of the jobs in \(A^*\) using the earliest release date rule.

Since \(A_{J,t_{A^*}}=\overline{S_{t_{A^*}}\cup B_{t_{A^*}}}\), we have

$$\begin{aligned} p(A_{J,t_{A^*}})+\pi (\overline{A_{J,t_{A^*}}})= & {} p(\overline{S_{t_{A^*}}\cup B_{t_{A^*}}})+\pi (S_{t_{A^*}}\cup B_{t_{A^*}})\nonumber \\= & {} p\pi _{t_{A^*}}(S_{t_{A^*}})\nonumber \\\le & {} p\pi _{t_{A^*}}(\overline{A^*}\setminus B_{t_{A^*}})\nonumber \\= & {} p(\overline{(\overline{A^*}\setminus B_{t_{A^*}})\cup B_{t_{A^*}}})+\pi ((\overline{A^*}\setminus B_{t_{A^*}})\cup B_{t_{A^*}})\nonumber \\= & {} p(A^*)+\pi (\overline{A^*})\nonumber \\\le & {} Z^*, \end{aligned}$$
(4)

where the fourth equality follows from \(B_{t_{A^*}}=\{j\in J| j>t_{A^*}\}\subseteq \overline{A^*}\), the first inequality follows from inequality (2), and the second inequality follows from \(p(A^*)\le C(A^*)\).

Thus, the objective value of \(A_J\) generated by Algorithm 1 is

$$\begin{aligned} Z_J\le & {} Z_{J,t_{A^*}}\\= & {} r_{t_{A^*}}+p(A_{J,t_{A^*}})+\pi (\overline{A_{J,t_{A^*}}})\\\le & {} 2Z^*, \end{aligned}$$

where the first inequality follows from \(A_{J}=\arg \min _{A_{J,t}}Z_{J,t}\), and the second inequality follows from inequalities (3) and (4). \(\square \)

Next, we use an example to illustrate our algorithm and to show that the analysis of Theorem 2.3 is tight:

Example 1

We are given an instance \((J,\pi )\), where \(J=\{1,2,\ldots ,n\}\); \(r_j=0\) and \(p_j=1\) for \(j\in \{1,2,\ldots ,n-1\}\); \(r_{n}=n-1\) and \(p_{n}=1\); and the submodular function \(\pi (\cdot )\) is defined as follows:

$$\begin{aligned} \left\{ \begin{aligned}&\pi (\emptyset )=0;\\&\pi (\{n\})=2n-1;\\&\pi (S)=\sum _{j:j\in S}(1+\frac{1}{n}),~\mathrm{if~}n\notin S;\\&\pi (S)=2n-1,~\textrm{otherwise}. \end{aligned}\right. \end{aligned}$$

It is easy to verify that the optimal solution is to process all jobs, and the optimal value is \(Z^*=n\).

Let \(A_{J}\) be the output job set generated by Algorithm 1, and \(A_{J}=\arg \min _{A_{J,t}}Z_{J,t}\), where \(Z_{J,t}\) is the objective value of \(A_{J,t}\). When \(t\in \{0,1,2,\ldots ,n-1\}\), job n is in \(B_t\), so every candidate rejected set contains job n and incurs the same penalty \(2n-1\); hence \(A_{J,t}\) is the empty set and its objective value is \(Z_{J,t}=r_{t}+p(A_{J,t})+\pi (\overline{A_{J,t}})=0+p(\emptyset )+\pi (J)=2n-1\). When \(t= n\), \(A_{J,t}\) contains all jobs in J and the objective value of \(A_{J,t}\) is \(Z_{J,t}=r_{n}+p(A_{J,t})+\pi (\overline{A_{J,t}})=n-1+p(J)+\pi (\emptyset )=2n-1\). Hence every candidate set attains the value \(2n-1\), and the algorithm may output \(A_{J}=A_{J,n}=J\) with objective value \(Z= 2n-1\). Thus, we have

$$\begin{aligned} \frac{Z}{Z^*}=\frac{2n-1}{n}\longrightarrow 2,\mathrm{~when~} n\rightarrow \infty . \end{aligned}$$
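A small numeric check of Example 1 (our sketch, using the definition of \(\pi \) above) confirms that the candidate sets considered by Algorithm 1 attain the value \(2n-1\), while the optimal value is n.

```python
# A small numeric check of Example 1 (ours).
def pi_example1(S, n):
    if not S:
        return 0
    if n in S:
        return 2 * n - 1
    return len(S) * (1 + 1 / n)

n = 1000
J = frozenset(range(1, n + 1))
Z_opt = n                                  # process all jobs: makespan n, no penalty
Z_reject_all = pi_example1(J, n)           # candidate A_{J,t} = empty set for t < n
Z_accept_all = (n - 1) + n                 # candidate A_{J,n} = J: r_n + p(J) + pi(empty set)
Z_alg = min(Z_reject_all, Z_accept_all)    # value output by Algorithm 1
print(Z_alg / Z_opt)                       # about 2 for large n
```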

3 The on-line problem

In this section, we consider the on-line single machine scheduling problem with release dates and submodular rejection penalties. First, we provide a formal problem statement and prove a lower bound for this problem. Second, we present an on-line algorithm. Finally, we consider a special case of this problem.

3.1 Lower bound

We are given a single machine and a sequence of n jobs, \(J=\{1,2,\ldots ,n\}\), arriving on-line, and a submodular penalty function \(\pi (\cdot )\), which is given as a value oracle. Each job j in J is described by a processing time \(p_j\) and a release date \(r_j\), where we assume that

$$\begin{aligned} r_{j}\le r_{j'}~ \mathrm{for~any } ~1\le j<j'\le n. \end{aligned}$$

Each job must be either scheduled or rejected immediately and irrevocably at the time of its arrival. The on-line problem is to find a job subset \(A^{J}\subseteq J\) of accepted jobs. The objective is to minimize the sum of the makespan of the accepted jobs and the penalty cost \(\pi (\overline{A^{J}})\), where

$$\begin{aligned} \overline{A^{J}}=J\setminus A^{J} \mathrm{~ is ~the~ complement ~of~} A^{J}. \end{aligned}$$

Theorem 3.1

There is no on-line algorithm with a constant competitive ratio for the on-line problem if the penalty submodular function \(\pi (\cdot )\) is not monotone, where a monotone function satisfies \(\pi ( S) \le \pi ( T) \) for any \( S\subseteq T\subseteq J\).

Proof

We are given a sequence of two jobs, \(J=\{1,2\}\), arriving on-line, and a penalty submodular function \(\pi (\cdot )\) defined as follows,

$$\begin{aligned} \left\{ \begin{aligned}&\pi (\emptyset )=0;\\&\pi (\{1\})=P^2+1;\\&\pi (\{2\})=1;\\&\pi (\{1,2\})=1. \end{aligned}\right. \end{aligned}$$

Job 1 arrives, where \(r_1=0\) and \(p_1=P\). For any algorithm \({\mathcal {A}}\), if \({\mathcal {A}}\) rejects job 1, then no further job arrives, and we have \(Z_{{\mathcal {A}}}=\pi (\{1\})=P^2+1\) and \(Z^*=r_1+p_1=P\), where \(Z_{{\mathcal {A}}}\) is the objective value of the job set generated by \({\mathcal {A}}\) and \(Z^*\) is the optimal value. This implies that

$$\begin{aligned} \frac{Z_{{\mathcal {A}}}}{Z^*}=\frac{P^2+1}{P}> P. \end{aligned}$$

If \({\mathcal {A}}\) processes job 1 on the machine, then job 2 arrives, where \(r_2=1\) and \(p_2=P\). The optimal solution is to reject all jobs, and its objective value is \(Z^*=\pi (\{1,2\})=1\). If \({\mathcal {A}}\) rejects job 2, we have \(Z_{{\mathcal {A}}}=0+P+\pi (\{2\})=P+1\); otherwise, \({\mathcal {A}}\) processes job 2 on the machine, and we have \(Z_{{\mathcal {A}}}=0+P+P=2P\). These statements imply that

$$\begin{aligned} \frac{Z_{{\mathcal {A}}}}{Z^*}\ge \frac{P+1}{1}> P. \end{aligned}$$

Since P can be arbitrarily large, the theorem holds. \(\square \)

Therefore, in the following part of this section, we assume that the penalty submodular function is monotone.

3.2 An algorithm for the on-line problem

We now present an on-line algorithm based on the off-line algorithm. For any \(l\in \{1,2,\ldots ,n\}\), let \(J_l=\{1,\ldots ,l\}\) be the set of the first l jobs in J. When job l arrives, all information of the jobs in \(J_l\) is revealed. Thus, we can find a job set \(A_{J_l}\) for instance \((J_{l},\pi )\) using Algorithm 1.

By further analyzing the relationship between \(A_{J_l}\) and job l, we obtain the following lemmas. Let \(t_{A_{J_l}}=\max \{j|j\in A_{J_l}\}\).

Lemma 3.2

For any \(t\in \{0,1,2,\ldots ,l\}\) and any \(S\subseteq J_l\) with \(B_t=\{j\in J_l|j>t\}\subseteq S\), we have

$$\begin{aligned} r_{t_{A_{J_l}}}+p(A_{J_l})+\pi (\overline{A_{J_l}})\le r_{t}+p({\overline{S}})+\pi (S). \end{aligned}$$

In particular, for any \(S\subseteq J_l\), we have

$$\begin{aligned} r_{t_{A_{J_l}}}+p(A_{J_l})+\pi (\overline{A_{J_l}})\le r_{l}+p({\overline{S}})+\pi (S). \end{aligned}$$

Proof

Given an integer \(t\in \{0,1,2,\ldots ,l\}\), for any \(S\subseteq J_l\) with \(B_t\subseteq S\), since \(A_{J_l,t}=\overline{S_{t}\cup B_{t}}\), we have

$$\begin{aligned} p(A_{J_l,t})+\pi (\overline{A_{J_l,t}})= & {} p(\overline{S_{t}\cup B_{t}})+\pi (S_{t}\cup B_{t})\\= & {} p\pi _t(S_t)\\\le & {} p\pi _t(S\setminus B_t)\\= & {} p(\overline{({S}\setminus B_{t})\cup B_{t}})+\pi (({S}\setminus B_{t})\cup B_{t})\\= & {} p({\overline{S}})+\pi (S), \end{aligned}$$

where \(S_{t}\) and \(A_{J_l,t}\) are the job sets generated by the off-line algorithm for instance \((J_{l},\pi )\), the last equality follows from \(B_t\subseteq S\), and the inequality follows from inequality (2). Since \(A_{J_l}=\arg \min _{A_{J_l,t}}Z_{J_l,t}\), the objective value of \(A_{J_l}\) is

$$\begin{aligned} r_{t_{A_{J_l}}}+p(A_{J_l})+\pi (\overline{A_{J_l}})\le r_t+ p(A_{J_l,t})+\pi (\overline{A_{J_l,t}})\le r_t+ p({\overline{S}})+\pi (S), \end{aligned}$$

for any set S with \(B_t\subseteq S\). In particular, taking \(t=l\), since \(B_l=\emptyset \), the condition \(B_l\subseteq S\) holds for every \(S\subseteq J_l\), and hence

$$\begin{aligned} r_{t_{A_{J_l}}}+p(A_{J_l})+\pi (\overline{A_{J_l}})\le r_l+ p({\overline{S}})+\pi (S), ~\forall S\subseteq J_l. \end{aligned}$$

\(\square \)

Lemma 3.3

There exists an optimal solution \(A_{J_l}^*\) satisfying \(l\in A_{J_l}^*\) when \(l\in A_{J_l}\).

Proof

Assume, to the contrary, that every optimal solution rejects job l, and let \(A_{J_l}^*\) be an optimal solution, so \(l\notin A_{J_l}^*\). Since \(l\in A_{J_l}\), the set \(A_{J_l}^*\cup A_{J_l}\) is not an optimal solution, and

$$\begin{aligned} Z_{J_l}^*= & {} C(A_{J_l}^*)+\pi (\overline{A_{J_l}^*})\nonumber \\< & {} C(A_{J_l}^*\cup A_{J_l})+\pi (\overline{A_{J_l}^*\cup A_{J_l}})\nonumber \\\le & {} \max \{C(A_{J_l}^*),r_{l}\}+p(A_{J_l}\setminus A_{J_l}^* )+\pi (\overline{A_{J_l}^*\cup A_{J_l}}), \end{aligned}$$
(5)

where \(Z_{J_l}^*\) is the optimal value for instance \((J_l,\pi )\), and \(C(A_{J_l}^*)\) is the makespan of the jobs in \(A_{J_l}^*\) using the earliest release date rule.

Case 1. If \(r_l\le C(A_{J_l}^*)\), then by rearranging inequality (5), we have

$$\begin{aligned} p(A_{J_l}\setminus A_{J_l}^* )> & {} Z_{J_l}^*-C(A_{J_l}^*)-\pi (\overline{A_{J_l}^*\cup A_{J_l}})\nonumber \\= & {} \pi (\overline{A_{J_l}^*})-\pi (\overline{A_{J_l}^*\cup A_{J_l}})\nonumber \\\ge & {} \pi (\overline{A_{J_l}^*\cap A_{J_l}})-\pi (\overline{A_{J_l}}). \end{aligned}$$
(6)

where the equality follows from \(Z_{J_l}^*=C(A_{J_l}^*)+\pi (\overline{A_{J_l}^*})\), and the last inequality follows from inequality (1). Then,

$$\begin{aligned} p(A_{J_l})+\pi (\overline{A_{J_l}})= & {} p(A_{J_l}\cap A_{J_l}^*)+p(A_{J_l}\setminus A_{J_l}^*)+\pi (\overline{A_{J_l}})\nonumber \\> & {} p(A_{J_l}\cap A_{J_l}^*)+\pi (\overline{A_{J_l}^*\cap A_{J_l}}), \end{aligned}$$

where the inequality follows from inequality (6). This statement and \(r_l\ge r_{t_{A^*_{J_l}}}\) imply that

$$\begin{aligned} r_l+p(A_{J_l})+\pi (\overline{A_{J_l}})>r_{t_{A^*_{J_l}}}+p(A_{J_l}\cap A_{J_l}^*)+\pi (\overline{A_{J_l}^*\cap A_{J_l}}), \end{aligned}$$

which contradicts Lemma 3.2, since \(B_{t_{A^*_{J_l}}}\subseteq \overline{A_{J_l}^*\cap A_{J_l}}\) and \(t_{A_{J_l}}=\max \{j|j\in A_{J_l}\}=l\), where \(t_{A^*_{J_l}}=\max \{j|j\in A^*_{J_l}\}\).

Case 2. If \(r_l> C(A_{J_l}^*)\), let \(t_{A^*_{J_l}}=\max \{j|j\in A^*_{J_l}\}\); then

$$\begin{aligned} r_{t_{A^*_{J_l}}}\le C(A_{J_l}^*)< r_l \end{aligned}$$

and

$$\begin{aligned} Z_{J_l}^*\ge r_{t_{A^*_{J_l}}}+\pi (\overline{A^*_{J_l}}) ~\mathrm{~and~}~ Z_{J_l}^*< r_l+p(A_{J_l}\setminus A_{J_l}^* )+\pi (\overline{A_{J_l}^*\cup A_{J_l}}), \end{aligned}$$
(7)

where the first inequality follows from inequality (3), and the second inequality follows from inequality (5). Thus, we have

$$\begin{aligned} r_l-r_{t_{A^*_{J_l}}}> & {} Z_{J_l}^*-p(A_{J_l}\setminus A_{J_l}^* )-\pi (\overline{A_{J_l}^*\cup A_{J_l}})-\Big (Z_{J_l}^*-\pi (\overline{A^*_{J_l}}) \Big )\\= & {} \pi (\overline{A^*_{J_l}})-\pi (\overline{A_{J_l}^*\cup A_{J_l}})-p(A_{J_l}\setminus A_{J_l}^* )\\\ge & {} \pi (\overline{A^*_{J_l}\cap A_{J_l}})-\pi (\overline{A_{J_l}})-p(A_{J_l}\setminus A_{J_l}^* )\\= & {} \pi (\overline{A^*_{J_l}\cap A_{J_l}})-\pi (\overline{A_{J_l}})-(p(A_{J_l} )-p(A_{J_l}\cap A_{J_l}^* ))\\= & {} p(A_{J_l}\cap A_{J_l}^*) +\pi (\overline{A^*_{J_l}\cap A_{J_l}})-(p(A_{J_l} )+\pi (\overline{A_{J_l}})), \end{aligned}$$

where the first inequality follows from inequality (7), and the second inequality follows from inequality (1). This implies that \(r_l+p(A_{J_l} )+\pi (\overline{A_{J_l}})>r_{t_{A^*_{J_l}}} +p(A_{J_l}\cap A_{J_l}^*) +\pi (\overline{A^*_{J_l}\cap A_{J_l}}),\) which contradicts Lemma 3.2.

Thus, the lemma holds. \(\square \)

We now describe the on-line algorithm. When a new job j arrives, the algorithm assigns job j based on the job set \(A_{J_j}\) generated by Algorithm 1 for instance \((J_j,\pi )\): if \(j\in A_{J_j}\), job j is processed on the machine; otherwise, job j is rejected. We provide the details of the on-line algorithm in Algorithm 2.

Algorithm 2 The on-line algorithm
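A self-contained Python sketch of Algorithm 2 (ours) is given below; for small instances it replaces the call to Algorithm 1 by a brute-force equivalent, so it is illustrative rather than polynomial-time.

```python
from itertools import combinations

def brute_force_offline(J, r, p, pi):
    """Brute-force stand-in for Algorithm 1 on instance (J, pi): minimize
    r_t + p(accepted) + pi(rejected) over all t and all subsets S of J
    that avoid B_t."""
    best_val, best_A = float("inf"), frozenset()
    for t in range(len(J) + 1):
        B_t = frozenset(j for j in J if j > t)
        rest = sorted(set(J) - B_t)
        for k in range(len(rest) + 1):
            for S in map(frozenset, combinations(rest, k)):
                A = frozenset(J) - (S | B_t)
                r_t = 0 if t == 0 else r[t]
                Z = r_t + sum(p[j] for j in A) + pi(frozenset(J) - A)
                if Z < best_val:
                    best_val, best_A = Z, A
    return best_A

def online_algorithm(n, r, p, pi):
    """Sketch of Algorithm 2: accept job j iff j lies in the set A_{J_j}
    computed for the jobs revealed so far; rejections are irrevocable."""
    accepted, makespan = set(), 0.0
    for j in range(1, n + 1):
        A_Jj = brute_force_offline(list(range(1, j + 1)), r, p, pi)
        if j in A_Jj:
            makespan = max(makespan, r[j]) + p[j]
            accepted.add(j)
    rejected = frozenset(range(1, n + 1)) - frozenset(accepted)
    return accepted, makespan + pi(rejected)
```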

For any \(j\in \{1,2,\ldots ,n\}\), let \(Z^*_{J_j}\) be the optimal value for instance \((J_j,\pi )\) of the off-line case, and let \(C^j\) be the makespan when job j is assigned by Algorithm 2.

Lemma 3.4

\(C^j\le Z^*_{J_j},~\forall j\in \{1,2,\ldots , n\}.\)

Proof

Our proof is by mathematical induction. When \(j=1\), if \(A_{J_1}=\{1\}\), then job 1 is processed on the machine by Algorithm 2, and \(C^1=r_1+p_1=Z_{J_1}^*\) by Lemma 3.3; otherwise, job 1 is rejected by Algorithm 2, and \(C^1=0\le Z^*_{J_1}\). Therefore, we have \(C^1\le Z^*_{J_1}\).

We now prove that \(C^l\le Z^*_{J_{l}}\) whenever \(C^{l-1}\le Z^*_{J_{l-1}}\) for any integer \(l> 1\), by considering the following two cases.

Case 1. If \(l\notin A_{J_l}\), then job l is rejected by Algorithm 2, and

$$\begin{aligned} C^l=C^{l-1}\le Z^*_{J_{l-1}}\le Z^*_{J_{l}}. \end{aligned}$$

Case 2. If \(l\in A_{J_l}\), then job l is processed on the machine by Algorithm 2, and

$$\begin{aligned} C^l=\max \{C^{l-1},r_l \}+p_l. \end{aligned}$$

Based on Lemma 3.3, there exists an optimal solution \(A_{J_l}^*\) for instance \((J_l,\pi )\) satisfying \(l\in A_{J_l}^*\), and the objective value of \(A_{J_l}^*\) is

$$\begin{aligned} Z^*_{J_l}= & {} C(A_{J_l}^{*}) + \pi (\overline{A_{J_l}^{*}})\nonumber \\= & {} \max \{C(A_{J_l}^{*}\setminus \{l\}),r_l \}+p_l + \pi (\overline{A_{J_l}^{*}})\nonumber \\= & {} \max \{C(A_{J_l}^{*}\setminus \{l\})+ \pi (\overline{A_{J_l}^{*}}),r_l + \pi (\overline{A_{J_l}^{*}})\}+p_l \nonumber \\\ge & {} \max \{Z^*_{J_{l-1}},r_l+ \pi (\overline{A_{J_l}^{*}}) \}+p_l \nonumber \\\ge & {} \max \{C^{l-1},r_l\}+p_l \nonumber \\= & {} C^{l}, \end{aligned}$$
(8)

where \(C(A_{J_l}^{*})\) is the makespan of job set \(A_{J_l}^{*}\) using the earliest release date rule; the first inequality follows from the fact that \(A_{J_l}^{*}\setminus \{l\}\) is a feasible job set for instance \((J_{l-1},\pi )\) and \(Z^*_{J_{l-1}}\) is the optimal value for instance \((J_{l-1},\pi )\); and the second inequality follows from \(C^{l-1}\le Z^*_{J_{l-1}}\) and \(\pi (S)\ge 0\) for any \(S\subseteq J\). \(\square \)

For any job j, let \(A^j\) be the set of jobs that have been processed by Algorithm 2 after job j is assigned. For convenience, we define

$$\begin{aligned} R^j=\overline{A^j}. \end{aligned}$$

Lemma 3.5

\(\pi (R^j)\le 2 Z^*_{J_j},~\forall j\in \{1,2,\ldots , n\}.\)

Proof

For instance \((J_j,\pi )\), let \(A_{J_j}\) be the job set generated by Algorithm 1; its objective value satisfies

$$\begin{aligned} r_{t_{A_{J_j}}}+p(A_{J_j})+\pi (\overline{A_{J_j}})\le 2Z^*_{J_j}, \end{aligned}$$
(9)

where \(t_{A_{J_j}}=\max \{j|j\in A_{J_j}\}\) and the inequality follows from Theorem 2.3.

Case 1. If \(R^j\cap A_{J_j}=\emptyset \), then we have \(R^j\subseteq \overline{A_{J_j}} \) and

$$\begin{aligned} \pi (R^j)\le \pi (\overline{A_{J_j}})\le 2Z^*_{J_j}, \end{aligned}$$

where the first inequality follows from the monotonicity of \(\pi (\cdot )\), and the second inequality follows from inequality (9).

Case 2. If \(R^j\cap A_{J_j}\ne \emptyset \), let l be the job with the maximum index in \(R^j\cap A_{J_j}\). Then, we have

$$\begin{aligned} r_l\le r_{t_{A_{J_j}}} \end{aligned}$$
(10)

since \(t_{A_{J_j}}=\max \{j|j\in A_{J_j}\}\). Let \(A_{J_l}\) be the job set generated by Algorithm 1 for instance \((J_l,\pi )\). We next prove the following inequalities for instance \((J_l,\pi )\), and use

$$\begin{aligned} R_{J_j}=J_j\setminus A_{J_j} \mathrm{~instead ~of ~the~} \overline{A_{J_j}} \mathrm{~we ~used~ before}. \end{aligned}$$

In the following proof, we define

$$\begin{aligned} {\overline{S}}= J_l\setminus S \mathrm{~ for~ any~} S\subseteq J_l. \end{aligned}$$

Since job \(l\in R^j\) is rejected by Algorithm 2, we have \(l\notin A_{J_l}\). Based on Lemma 3.2, we have \(r_{t_{A_{J_l}}}+p(A_{J_l})+ \pi (\overline{A_{J_l}})\le r_l+p({\overline{S}})+ \pi (S)\) for any \(S\subseteq J_l\), where \(t_{A_{J_l}}=\max \{j|j\in A_{J_l}\}\). By rearranging the terms and letting \(S=\overline{A_{J_l}}\cap R_{J_j}\), we have

$$\begin{aligned} r_l-r_{t_{A_{J_l}}}\ge & {} p(A_{J_l})+ \pi (\overline{A_{J_l}})-\Big (p(\overline{\overline{A_{J_l}}\cap R_{J_j}})+ \pi (\overline{A_{J_l}}\cap R_{J_j})\Big ) \nonumber \\= & {} p(A_{J_l})-p(A_{J_l}\cup (J_l\setminus R_{J_j})) + \pi (\overline{A_{J_l}})-\pi (\overline{A_{J_l}}\cap R_{J_j}) \nonumber \\= & {} -p((J_l\setminus R_{J_j})\setminus A_{J_l}) + \pi (\overline{A_{J_l}})-\pi (\overline{A_{J_l}}\cap R_{J_j}) \nonumber \\\ge & {} -p(A_{J_j}\setminus A_{J_l})+\pi (\overline{A_{J_l}}\cup R_{J_j})-\pi (R_{J_j}) \end{aligned}$$

where the last inequality follows from inequality (1) and \(J_l\setminus R_{J_j}\subseteq J_j\setminus R_{J_j} =A_{J_j}\). By rearranging the terms, we have

$$\begin{aligned} \pi (\overline{A_{J_l}}\cup R_{J_j})\le & {} p(A_{J_j}\setminus A_{J_l})+\pi (R_{J_j}) +r_l-r_{t_{A_{J_l}}}\nonumber \\\le & {} r_l+ p(A_{J_j}\setminus A_{J_l})+\pi (R_{J_j}). \end{aligned}$$
(11)

If \(R^j\cap (A_{J_j}\setminus \overline{A_{J_l}})=\emptyset \), then we have

$$\begin{aligned} R^j\subseteq J_j\setminus (A_{J_j}\setminus \overline{A_{J_l}})=R_{J_j}\cup (A_{J_j}\cap \overline{A_{J_l}})\subseteq \overline{A_{J_l}}\cup R_{J_j}, \end{aligned}$$

and

$$\begin{aligned} \pi (R^j)\le \pi (\overline{A_{J_l}}\cup R_{J_j})\le r_l+ p(A_{J_j}\setminus A_{J_l})+\pi (R_{J_j})\le r_{t_{A_{J_j}}}+p(A_{J_j})+\pi (\overline{A_{J_j}})\le 2 Z^*_{J_j}, \end{aligned}$$

where the first inequality follows from the monotonicity of \(\pi (\cdot )\), the second inequality follows from inequality (11), the third inequality follows from inequality (10), and the last inequality follows from inequality (9). Otherwise, replace \(A_{J_j}\) by \(A_{J_j}\setminus \overline{A_{J_l}}\) and repeat Case 2; after at most n repetitions, the condition \(R^j\cap (A_{J_j}\setminus \overline{A_{J_l}})=\emptyset \) holds. \(\square \)

By Lemmas 3.4 and 3.5 applied with \(j=n\), the objective value of the job set output by Algorithm 2 is \(C^n+\pi (R^n)\le Z^*_{J_n}+2Z^*_{J_n}=3Z^*_{J_n}\), which yields the following theorem.

Theorem 3.6

The competitive ratio of Algorithm 2 for the on-line problem is 3.

We complement this result with an example showing that the analysis of Theorem 3.6 is tight. Consider the following instance with n jobs, where \(r_j=0\) and \(p_j=1\) for \(j\in \{1,2,\ldots ,n-1\}\), and \(r_{n}=n-1\) and \(p_{n}=1\). The submodular function \(\pi (\cdot )\) is given as a value oracle and is defined as follows (the same as in Example 1):

$$\begin{aligned} \left\{ \begin{aligned}&\pi (\emptyset )=0;\\&\pi (\{n\})=2n-1;\\&\pi (S)=\sum _{j:j\in S}(1+\frac{1}{n}),~\mathrm{if~}n\notin S;\\&\pi (S)=2n-1,~\textrm{otherwise}. \end{aligned}\right. \end{aligned}$$

As in Example 1, the optimal solution is to process all jobs, and the optimal value is \(Z^*=n\).

For any \(S\subseteq \{1,2,\ldots ,n-1\}\), we have \(p(S)<\pi (S)\). For \(j\in \{1,2,\ldots ,n-1\}\), since \(r_j=0\), the job set output by Algorithm 1 for instance \((J_j,\pi )\) contains all jobs, i.e., \(A_{J_j}=J_j\). This means that Algorithm 2 processes the first \(n-1\) jobs on the machine.

For instance \((J_{n},\pi )\), rejecting all jobs also attains the minimum candidate value \(2n-1\) (see Example 1), so Algorithm 1 may return a set \(A_{J_{n}}\) that does not contain job n. In this case, the job set output by Algorithm 2 does not contain job n, i.e., the objective value of \(A^n\) is \(Z=C(J_{n-1})+\pi (\{n\})= n-1+2n-1=3n-2\). Thus, we have

$$\begin{aligned} \frac{Z}{Z^*}=\frac{3n-2}{n}\longrightarrow 3,\mathrm{~when~} n\rightarrow \infty . \end{aligned}$$

3.3 A special case with the same release date

We now consider a special case of the on-line problem, in which all jobs have the same release date.

Theorem 3.7

There is no on-line algorithm with a competitive ratio less than \(\frac{\sqrt{5}+1}{2}\), even when \(r_j=0\) for any \(j\in J\).

Proof

We are given a sequence of two jobs, \(J=\{1,2\}\), arriving on-line, and a penalty submodular function \(\pi (\cdot )\) defined as follows,

$$\begin{aligned} \left\{ \begin{aligned}&\pi (\emptyset )=0;\\&\pi (\{1\})=\pi (\{2\})=\pi (\{1,2\})=\frac{\sqrt{5}+1}{2}. \end{aligned}\right. \end{aligned}$$

Job 1 arrives, where \(r_1=0\) and \(p_1=1\). For any algorithm \({\mathcal {A}}\), if job 1 is rejected by \({\mathcal {A}}\), then no further job arrives, and \(Z_{{\mathcal {A}}}=\pi (\{1\})=\frac{\sqrt{5}+1}{2}\) and \(Z^*=r_1+p_1=1\), where \(Z_{{\mathcal {A}}}\) is the objective value of the job set generated by \({\mathcal {A}}\) and \(Z^*\) is the optimal value. This implies that

$$\begin{aligned} \frac{Z_{{\mathcal {A}}}}{Z^*}\ge \frac{\sqrt{5}+1}{2}. \end{aligned}$$

If job 1 is processed on the machine by \({\mathcal {A}}\), then job 2 arrives, where \(r_2=0\) and \(p_2=10\). The optimal solution is to reject all jobs, and the optimal value is \(Z^*=\pi (\{1,2\})=\frac{\sqrt{5}+1}{2}\). If job 2 is rejected by \({\mathcal {A}}\), then \(Z_{{\mathcal {A}}}=0+1+\pi (\{2\})=1+\frac{\sqrt{5}+1}{2}=\frac{\sqrt{5}+3}{2}\). Otherwise, job 2 is processed on the machine, and \(Z_{{\mathcal {A}}}=p_1+p_2=11\). These statements imply that

$$\begin{aligned} \frac{Z_{{\mathcal {A}}}}{Z^*}\ge \frac{\frac{\sqrt{5}+3}{2}}{\frac{\sqrt{5}+1}{2}}=\frac{\sqrt{5}+1}{2}. \end{aligned}$$

Therefore, the theorem holds. \(\square \)

Theorem 3.8

When all jobs have the same release date, Algorithm 2 is a 2-competitive on-line algorithm and the bound is tight.

Proof

For any \(j\in \{1,2,\ldots , n\}\), let \(A_{J_j}^*\) be an optimal solution for instance \((J_j,\pi )\), and let \(Z^*_{J_j}\) be the optimal value. If \(A_{J_j}^*=\emptyset \), since \(A_{J_j,0}=\emptyset \) and \(A_{J_j}=\arg \min _{A_{J_j,t}}Z_{J_j,t}\), we have \(p(A_{J_j})+\pi (\overline{A_{J_j}})\le Z^*_{J_j}\); otherwise, since all jobs have the same release date, we can relabel the jobs so that \(j\in A_{J_j}^*\). Similar to inequality (4), we have \(r_{t_{A_{J_j}}}+p(A_{J_j})+\pi (\overline{A_{J_j}})\le Z^*_{J_j}\), where \(t_{A_{J_j}}=\max \{j|j\in A_{J_j}\}\). Combining this with Lemma 3.4, the objective value of the job set generated by Algorithm 2 is

$$\begin{aligned} C^j+\pi (\overline{A^{j}})\le Z^*_{J_j}+r_{t_{A_{J_j}}}+ p(A_{J_j})+\pi (\overline{A_{J_j}})\le 2Z^*_{J_j}. \end{aligned}$$

To show that the bound is tight, we consider the following instance with n jobs, where \(r_j=0\) for \(j\in \{1,2,\ldots ,n\}\), \(p_j=1\) for \(j\in \{1,2,\ldots ,n-1\}\), and \(p_{n}=3\). The submodular function \(\pi (\cdot )\) is defined as follows:

$$\begin{aligned} \left\{ \begin{aligned}&\pi (\emptyset )=0;\\&\pi (\{n\})=n;\\&\pi (S)=\sum _{j:j\in S}(1+\frac{1}{n}),~\mathrm{if~}n\notin S;\\&\pi (S)=n+1,~\textrm{otherwise}. \end{aligned}\right. \end{aligned}$$

It can be verified that the optimal solution is to reject all jobs, and the optimal value is \(Z^*=n+1\). However, the job set output by Algorithm 2 contains the first \(n-1\) jobs, and its objective value is \(Z=2n-1\). Thus, we have

$$\begin{aligned} \frac{Z}{Z^*}=\frac{2n-1}{n}\longrightarrow 2,\mathrm{~when~} n\rightarrow \infty . \end{aligned}$$

\(\square \)
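A small numeric check of this tight instance (our sketch, using the definition of \(\pi \) above) illustrates the ratio approaching 2.

```python
# A small numeric check of the tight instance above (ours).
def pi_tight(S, n):
    S = set(S)
    if not S:
        return 0
    if S == {n}:
        return n
    if n in S:
        return n + 1
    return len(S) * (1 + 1 / n)

n = 1000
Z_opt = n + 1                        # reject all jobs: pi(J) = n + 1
Z_alg = (n - 1) + pi_tight({n}, n)   # Algorithm 2 accepts jobs 1..n-1, rejects job n
print(Z_alg / Z_opt)                 # about 2 for large n
```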

Fig. 1 Solution quality for n=10

Fig. 2 Performances of Algorithm 2 against the greedy algorithm

4 Numerical experiments

The experimental study aims to evaluate the practical performance of Algorithm 2 and to compare it against a baseline method.

Datasets. The processing time and the release date of each job in the datasets are generated uniformly at random over a given range.

Submodular penalty function. The penalty function required by the experiments must be non-negative, monotone, and submodular. We use the following submodular function,

$$\begin{aligned} \pi (S)=\sum _{J_j\in S}\pi (J_j)-\theta \cdot (|S|^2-|S|),~\forall S(\ne \emptyset )\subseteq J, \end{aligned}$$

where \(\theta \ge 0\) is a parameter, chosen small enough that \(\pi (\cdot )\) remains non-negative and monotone.
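A minimal sketch (ours) of this penalty function is given below; the per-job penalties base[j] are hypothetical stand-ins for \(\pi (J_j)\), and \(\theta \) is assumed small relative to them so that \(\pi \) stays non-negative and monotone.

```python
# A sketch of the experimental penalty function (our naming; base[j] plays
# the role of pi(J_j)). theta must be small enough to keep pi non-negative
# and monotone, which is an assumption of this sketch.
import random

def make_penalty(base, theta):
    def pi(S):
        if not S:
            return 0.0
        return sum(base[j] for j in S) - theta * (len(S) ** 2 - len(S))
    return pi

random.seed(0)
base = {j: random.uniform(5, 10) for j in range(1, 11)}  # hypothetical per-job penalties
pi = make_penalty(base, 0.1)
print(pi(frozenset({1, 2, 3})))
```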

Experiment procedures. We use IBM's CPLEX solver to obtain the optimal solution for the single machine scheduling problem with release dates and submodular penalties. If we do not obtain the optimal solution within 5 min, we stop CPLEX. For comparison, we also run the greedy algorithm of Lu et al. (2011), which is an on-line algorithm for the problem with linear penalties. The first experiment compares Algorithm 2 against the optimal solution generated by CPLEX and against the greedy algorithm on small datasets using different values of the parameter \(\theta \). The second experiment compares Algorithm 2 against the greedy algorithm on datasets with different numbers of jobs.

Implementation details. The whole experiment is implemented on a single process with an Intel(R) Core(TM) i5-9300H CPU at 2.40 GHz and 8 GB RAM.

Numerical results. Figure 1 displays the objective values of Algorithm 2, CPLEX, and the greedy algorithm on small datasets. Figure 2 displays the performance of Algorithm 2 and the greedy algorithm on datasets with different numbers of jobs. The results first show that the theoretical analysis indeed matches the practical performance. In addition, the gap between the objective values generated by Algorithm 2 and the greedy algorithm increases as either \(\theta \) or n increases, which is expected.

5 Conclusion

In this paper, we study the single machine scheduling problem with release dates and submodular rejection penalties in both the off-line and on-line settings. For the off-line problem, we present a simple 2-approximation algorithm. For the on-line problem, we prove that there is no on-line algorithm with a constant competitive ratio if the penalty submodular function is not monotone, and present an on-line algorithm with a competitive ratio of 3 when the penalty submodular function is monotone. In particular, we consider the special case of this problem in which all jobs have the same release date. We prove that there is no on-line algorithm with a competitive ratio less than \(\frac{\sqrt{5}+1}{2}\approx 1.618\) even when the release date of each job is 0, and show that our on-line algorithm achieves a competitive ratio of 2 in this case.

It remains challenging to either establish a larger lower bound or design an on-line algorithm with a better competitive ratio. Moreover, in the on-line-over-time setting, the information of jobs with the same release date is revealed only when these jobs arrive, and this case of the problem is also worth considering.