1 Modeling Equations

We first briefly recall the production network model from [1, 6] and then, following [4], present its stochastic extension to a load-dependent production model with machine failures. To keep the notation simple, we consider a production network consisting of a single processor with a queue. The processor is represented by an interval \((a,b) \subset \mathbb {R}\), i.e., with length L = b − a, where ρ(x, t) describes the density of production goods at x ∈ (a, b) and time t ≥ 0. The dynamics of the density, and consequently of the production, is given by the following nonlinear hyperbolic partial differential equation

$$\displaystyle \begin{aligned} \partial_t \rho(x,t) + \partial_x \min\{v \rho(x,t),c\}=0, {} \end{aligned} $$
(1)

where c ≥ 0 is the production capacity and v > 0 the constant production velocity. In front of the processor, a storage, also called queue, is assumed. For an externally given time-dependent inflow \(G_{\text{in}}(t)\) into the production, the queue length q follows the ordinary differential equation

$$\displaystyle \begin{aligned} \partial_t q(t) = G_{\text{in}}(t)-g_{\text{out}}(t), {} \end{aligned} $$
(2)

with

$$\displaystyle \begin{aligned} g_{\text{out}}(t) = \begin{cases} \min\{G_{\text{in}}(t),c\}, &\text{ if } q(t) = 0,\\ c, &\text{ if } q(t)>0. \end{cases} \end{aligned}$$
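In code, this outflow rule is just a case distinction; the following minimal Python sketch is our own illustration (the function name is ours, not part of [1, 6]):

```python
def g_out(q, G_in, c):
    """Queue outflow: pass the (capacity-capped) inflow through while the
    queue is empty, otherwise release goods at full capacity c."""
    return min(G_in, c) if q == 0 else c
```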

The processor is coupled to the queue by the boundary condition \(\rho (a,t) = \frac {g_{\text{out}}(t)}{v}\), and initial conditions \(\rho(x,0) = \rho_0(x) \in L^1((a,b))\), \(q(0) = q_0 \in \mathbb {R}_{\geq 0}\) are prescribed. This deterministic model is well-defined, see, e.g., [1]. The theory of piecewise deterministic Markov processes, see, e.g., [2, 7], has been used in [4] to define a production model with stochastic machine failures, where the probabilities of machine failures depend on the current workload of the processor. Since this construction only allows for a dependence on the current workload, we cannot use the amount of goods produced since the last machine failure as a measure for the next failure. Our new idea is to add a variable w that tracks the workload since the last repair. To do so, we use the time-dependent variable r(t) ∈ {0, 1} and set the capacity as μ(t) = r(t)c for a maximal capacity c > 0. That is, r(t) = 0 (so μ(t) = 0) corresponds to a down processor and r(t) = 1 (so μ(t) = c) to a working processor at time t, and we define

$$\displaystyle \begin{aligned} \operatorname{WIP}(t_0,t_1) = \int_{t_0}^{t_1} \int_a^b \rho(x,t)dxdt \end{aligned}$$

as the cumulative work-in-progress of the processor between times \(t_0\) and \(t_1\). The variable w should therefore satisfy

$$\displaystyle \begin{aligned} \partial_t w(t) = r(t)\int_a^b \rho(x,t)dx, \quad w(t_0) = w_0 = \int_a^b \rho(x,t_0)dx.{} \end{aligned} $$
(3)

Altogether, we define the state space

$$\displaystyle \begin{aligned} E = \mathbb{R}_{\geq 0}\times \{0,1\} \times \mathbb{R}_{\geq 0} \times L^1((a,b)), \end{aligned}$$

which is a measurable space together with the σ-algebra \(\mathscr {E}\) generated by the open sets induced by the metric

$$\displaystyle \begin{aligned}d((w,r,q,\rho),(\tilde{w},\tilde{r},\tilde{q},\tilde{\rho})) = |w-\tilde{w}|+|r-\tilde{r}|+|q-\tilde{q}|+\|\rho-\tilde{\rho}\|_{L^1((a,b))}.\end{aligned}$$

Since we construct a piecewise deterministic Markov process, we define the deterministic dynamics between jump times as

$$\displaystyle \begin{aligned} \varPhi_{s,t} \colon E \to E,\quad (w_0,r_0,q_0,\rho_0) \mapsto (w(t),r(t),q(t),\rho(t)), \end{aligned} $$

i.e., \(\varPhi_{s,t}\) is the solution to Eqs. (1), (2), (3), and \(r(t) = r_0\), with initial conditions \((w_0, r_0, q_0, \rho_0) \in E\) prescribed at time s. Between the jump times, at which the capacity changes, the capacity is given by \(c r_0\) and is independent of time. This allows us to apply the theory of the deterministic model (1)–(2) to obtain continuity properties of Φ. To characterize the stochastic part, we introduce

$$\displaystyle \begin{aligned} \psi(t,y) = \lambda_{r,r}(t,w),\quad \eta(t,y,B) = \frac{\lambda_{r,(1-r)}(t,w)}{\psi(t,y)} \epsilon_{(rw,(1-r),q,\rho)}(B) \end{aligned} $$

for every y = (w, r, q, ρ) ∈ E and \(B \in \mathscr {E}\), where \(\lambda_{i,j}(t,w)\) describes the transition rate from capacity i to j at time t and current workload w, i, j ∈ {0, 1}, and \(\epsilon_x\) is the Dirac measure with unit mass in x. The function ψ is the total intensity determining whether a jump occurs or not, and the function η describes the probability distribution of the system's jump, given that the system changes at time t. For example, given the state y = (w, 1, q, ρ) at the time of a jump, the system jumps to (w, 0, q, ρ) and, vice versa, given the state y = (w, 0, q, ρ), the system jumps to (0, 1, q, ρ), i.e., the workload has been “reset”. The open question is whether this model can be represented by a piecewise deterministic Markov process. Following [4], it is straightforward to show

Theorem 1

Let \(\lambda _{i,j} \colon [0,T] \times \mathbb {R}_{\geq 0} \to \mathbb {R}_{\geq 0}\) be uniformly bounded, continuous and satisfy \(\lambda_{i,i} = \lambda_{i,1-i}\) for i ∈ {0, 1}. Then for all initial data \(x_0 \in E\) there exists a Markov process

$$\displaystyle \begin{aligned}X = ((w(t),r(t),q(t),\rho(t)),\; t \in [0,T]) \subset E\end{aligned}$$

on some probability space \({(\varOmega ,\mathscr {A},P)}\), satisfying

  1.

    \(X(0) = x_0\) P-almost surely,

  2.

    for every t ∈ (0, T), (w, r, q, ρ) ∈ E and j ∈ {0, 1}, it holds that

  3.

    there exists a P-null set \(\mathscr {N} \in \mathscr {A}\) such that for every \(\omega \in \varOmega \setminus \mathscr {N}\), there exist times \(T_0 = 0 \leq T_1 \leq \cdots \leq T_M = T\) such that for every k = 0, …, M − 1, \(X(t) = \varPhi _{T_k,t}(X(T_k))\) for \(t \in [T_k, T_{k+1})\) with capacity \(\mu(r(T_k, \omega))\), i.e., X behaves deterministically between jump times.

The main and new ingredient is the mapping \(t \mapsto w(t)\), which is continuous since \(t \mapsto \rho(\cdot,t)\) is continuous.
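To make the jump described by the Dirac measure in η explicit, the following minimal Python sketch may help; it is our own illustration, and the state layout and function name are ours:

```python
def jump(state):
    """Jump target of eta: flip the capacity indicator r; the workload w is
    kept after a failure (r: 1 -> 0) and reset after a repair (r: 0 -> 1)."""
    w, r, q, rho = state
    return (r * w, 1 - r, q, rho)
```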

2 Computational Results

Since solutions to (1) propagate with non-negative velocity only, we can use the first-order left-sided upwind scheme for a numerical approximation of the density ρ. Furthermore, we use the explicit Euler scheme to approximate the queue length q given by (2) and w given by (3), where we use a rectangle rule for the integration. This yields an approximation of the deterministic dynamics between the jump times. The simulation of the jump times is done with the thinning algorithm presented in [4]. Its basic idea is to use the uniform bound on the rate functions to generate exponentially distributed times with high intensity, representing candidate times between jumps, and to thin these times during the numerical simulation of the whole system with an appropriate acceptance-rejection procedure.
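The following Python sketch illustrates how these ingredients can be combined for a single sample path. It is our own minimal illustration, not the implementation from [4]; all function and variable names (simulate_path, lam_01, ...) are ours, the parameter defaults anticipate the values used in the examples below, and accepted jumps are applied at the next grid time, a simplification of the exact thinning procedure.

```python
import numpy as np

def simulate_path(T=50.0, dx=0.1, v=1.0, c=2.0, G_in=0.5,
                  lam_01=2.0, lam_10_min=0.1, lam_10_max=2.0,
                  theta1=0.1, theta2=5.0, rng=None):
    """One sample path: upwind for rho, explicit Euler for q and w, thinning for jumps."""
    rng = np.random.default_rng() if rng is None else rng
    dt = dx / v                        # CFL condition: dt <= dx / v
    n_cells = round(1.0 / dx)          # processor (a, b) = (0, 1)
    rho = np.zeros(n_cells)            # initial density rho_0 = 0
    q, w, r = 0.0, 0.0, 1              # empty queue, no workload, working machine
    lam_bar = max(lam_01, lam_10_max)  # uniform bound on the rate functions
    t = 0.0
    t_prop = rng.exponential(1.0 / lam_bar)      # first proposed jump time
    for _ in range(round(T / dt)):
        # thinning: proposals arrive with intensity lam_bar and are accepted
        # with probability rate / lam_bar; accepted jumps take effect at the
        # current grid time
        while t_prop <= t + dt:
            if r == 1:
                rate = lam_10_min + (lam_10_max - lam_10_min) * (
                    1.0 - np.exp(-(theta1 * w) ** theta2))
            else:
                rate = lam_01
            if rng.uniform() <= rate / lam_bar:
                r = 1 - r
                if r == 1:             # repair resets the accumulated workload
                    w = 0.0
            t_prop += rng.exponential(1.0 / lam_bar)
        # deterministic step between jumps
        mu = r * c                                    # current capacity
        gout = min(G_in, mu) if q == 0 else mu        # queue outflow
        flux = np.minimum(v * rho, mu)                # flux min(v rho, mu)
        inflow = np.concatenate(([gout], flux[:-1]))  # left boundary: v rho(a,t) = gout
        rho += dt / dx * (inflow - flux)              # left-sided upwind step
        q = max(q + dt * (G_in - gout), 0.0)          # explicit Euler for (2), clamped at 0
        w += dt * r * rho.sum() * dx                  # Euler + rectangle rule for (3)
        t += dt
    return q, w, r, rho
```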

The choice of the rate functions \(\lambda_{i,j}(t,w)\) is a crucial point for the numerical examples. Here, we make use of the choice in [9] and, for \(\theta_1, \theta_2 > 0\), set the rate function as

$$\displaystyle \begin{aligned} \lambda_{1,0}(t,w) = \lambda_{1,0}^{\text{min}}+(\lambda_{1,0}^{\text{max}}-\lambda_{1,0}^{\text{min}})(1-e^{-(\theta_1 w)^{\theta_2}}), \end{aligned} $$

which is a scaled version of the cumulative distribution function of a Weibull distribution, i.e., \(F(t) = 1-e^{-(\theta _1 t)^{\theta _2}}\). The classical interpretation of t in the latter expression is the lifetime of a machine and F(t) is the probability that a failure has happened by time t, see, e.g., [8]. In our case we use the variable w, which measures the amount of goods produced since the last repair. Therefore, if w = 0, then \(\lambda _{1,0}(t,0) = \lambda _{1,0}^{\text{min}}\), which corresponds to the minimal failure rate, and \(\lim _{w \to \infty }\lambda _{1,0}(t,w) = \lambda _{1,0}^{\text{max}}\). The function \(\lambda_{1,0}(t,w)\) is monotonically increasing in w and incorporates the idea of a failure rate that increases with past workloads. On the other hand, we assume \(\lambda_{0,1}(t,w) = \lambda_{0,1}\) because repair times do not depend on the amount of goods produced.
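A direct transcription of this rate function reads as follows; this is our sketch, with the function name chosen by us and the default parameters taken from the numerical example below:

```python
import numpy as np

def lam_10(w, lam_min=0.1, lam_max=2.0, theta1=0.1, theta2=5.0):
    """Load-dependent failure rate: a scaled Weibull CDF in the accumulated
    workload w, increasing from lam_min at w = 0 towards lam_max as w grows."""
    return lam_min + (lam_max - lam_min) * (1.0 - np.exp(-(theta1 * w) ** theta2))

# lambda_{0,1} is simply a constant and needs no separate function.
```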

In the following, we examine the presented model using numerical examples. Here, we assume a production velocity of v = 1, the interval (a, b) = (0, 1), and the capacity is given as μ(t) = 2r(t). We use a spatial discretization with step-size \(\varDelta x = 10^{-1}\) and a temporal step-size that satisfies the Courant-Friedrichs-Lewy condition, which reads \(\varDelta t \leq \varDelta x\) for the chosen parameters. The simulation results are based on samples of the stochastic process X, and we use the classical Monte-Carlo estimator to evaluate moments or probabilities from the samples. We use a sample size of \(10^5\) for all of the following results.
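Assuming the simulate_path sketch above, the Monte-Carlo estimation could, for instance, look as follows (with a reduced sample size for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# estimate E[q(50)] for the constant inflow G_in = 0.5 from i.i.d. sample paths
queue_samples = [simulate_path(T=50.0, G_in=0.5, rng=rng)[0] for _ in range(1000)]
print("estimated E[q(50)]:", np.mean(queue_samples))
```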

We analyze the expected queue length, capacity and the distribution of the number of repairs within a time horizon [0, 50] for two different constant inflow profiles. We denote the two inflow profiles by \(G^{1}_{\text{in}}(t) \equiv 0.5\) and \(G^2_{\text{in}}(t) \equiv 1.5\) and use the parameters

$$\displaystyle \begin{aligned} \lambda_{0,1}(t,w) = \frac{1}{0.5},\quad \lambda_{1,0}^{\text{min}} = \frac{1}{10}, \quad \lambda_{1,0}^{\text{max}} = \frac{1}{0.5}, \quad \theta_1 = \frac{1}{10}, \quad \theta_2 = 5. \end{aligned} $$

In Fig. 1, first order moment estimates are shown. In detail, Fig. 1a shows the expected value of the variable w, Fig. 1b the expected capacity, Fig. 1c the expected queue length and Fig. 1d the expected density at the end of the processor. The dynamics is quite interesting: for the second inflow, the expected capacity decreases approximately until time t = 6, then increases and decreases again. Indeed, the mean time to failure is given by \(\varGamma (1+\frac {1}{\theta _2})\theta _1^{-1}\), see, e.g., [8]. If w corresponds to the lifetime in our model, we see that an intact system with constant inflow \(G_{\text{in}}\) is most likely to fail around time \(\varGamma (1+\frac {1}{\theta _2})(\theta _1 G_{\text{in}})^{-1}\). In our case, this leads to time 18.4 for the first and time 6.1 for the second inflow profile, which is close to the times at which the shape of the expected capacity changes. We observe these characteristic times also in the other graphs in Fig. 1. In contrast to the models presented in [3,4,5], where quantities converge monotonically, we obtain an oscillatory behavior of the quantities for constant inputs. The oscillatory effects are natural and caused by the history we incorporate in w. This means that the first machine failures are likely around time 18.4 (resp. 6.1), the second around 36.8 (resp. 12.2), and so on. At the same time, the failures that occur between these likely times smooth this effect out as time evolves, and the quantities converge.
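For completeness, with \(\theta_1 = \tfrac{1}{10}\), \(\theta_2 = 5\) and \(\varGamma(1.2) \approx 0.918\), these characteristic times follow from

$$\displaystyle \begin{aligned} \varGamma\big(1+\tfrac{1}{\theta_2}\big)(\theta_1 G^1_{\text{in}})^{-1} \approx \frac{0.918}{0.1\cdot 0.5} \approx 18.4, \qquad \varGamma\big(1+\tfrac{1}{\theta_2}\big)(\theta_1 G^2_{\text{in}})^{-1} \approx \frac{0.918}{0.1\cdot 1.5} \approx 6.1. \end{aligned} $$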

Fig. 1 First order moments of w, the capacity, queue-length and density. (a) Expected w. (b) Expected capacity μ(t). (c) Expected queue-length q(t). (d) Expected density at x = 1

Figure 2 shows the distribution of the number of repairs within the time horizon [0, 50] and emphasizes the impact of the chosen inflow on the reliability of the processor. Figure 2a shows the case of \(G^1_{\text{in}}\), where mostly 5–9 repairs occur. The situation for inflow profile \(G^2_{\text{in}}\) is different: here, 9–14 repairs during the time horizon are more likely.

Fig. 2 Distribution of the number of repairs within [0, 50]. (a) Inflow \(G^1_{\text{in}}\). (b) Inflow \(G^2_{\text{in}}\)

To conclude, we derived a production model with random machine failures whose failure probabilities depend on the workload of the machine since the last repair. The extension of the model to complex production networks is straightforward, see, e.g., [4]. The simulation results showed a considerable impact of the workload history on the expected workload, capacity, queue length and density. These effects are not negligible for production planning and control and must be taken into account.