
1 Introduction

Equipment failures in manufacturing processes are a major concern for industry because they can lead to severe issues regarding human safety, environmental impact, reliability, and production costs. The stochastic nature of equipment degradation and the uncertainty about future breakdowns significantly affect experts' decision making in the context of a Condition-Based Maintenance (CBM) strategy (Elwany and Gebraeel 2008; Lorén and de Maré 2015). To this end, mathematical modelling methods can be embedded in information systems capable of processing and analyzing real-time big data, with the aim of facilitating decision making ahead of time and optimizing maintenance and spare parts inventory operations. These methods can contribute to the Decide phase of the “Detect-Predict-Decide-Act” proactive principle. The e-maintenance concept can significantly enhance proactive decision making in maintenance-driven operations management. However, despite the increasing capabilities of e-technologies, maximizing the benefits of e-maintenance for overall maintenance efficiency requires more than technology (Guillén et al. 2016). Models and methods are needed that can be embedded in real-time systems and triggered by real-time prognostic information in an event processing, streaming computational environment. Maintenance and inventory management are strongly interconnected and should be considered simultaneously when optimizing a company’s operations (Van Horenbeek et al. 2013). Deciding on proactive maintenance actions requires balancing the cost of premature action against the cost of unexpected breakdown, while the ordering time and stocking quantities of spare parts need to be planned so that holding costs are minimized while stock-outs are avoided (Elwany and Gebraeel 2008; Bohlin and Wärja 2015).

Real-time data processing for proactive decision making poses several challenges regarding the efficiency and scalability of the associated information systems. Currently, most such models and methods can only be run offline or on batches of data collected at specific sampling times. Although some research works deal with extracting insights about the current and future situations of business processes, decision making on the basis of real-time, event-driven predictive analytics is still an unexplored area. More specifically, joint maintenance and logistics decision models are rarely real-time and event-driven, and they usually provide recommendations about a pre-defined maintenance action (assuming perfect maintenance) with its associated pre-defined order of spare parts.

We propose a proactive event-driven decision model for joint maintenance and logistics optimization in the frame of the Industrial Internet of Things (IIoT). The proposed model is triggered by prognostic information in an event processing computational environment on the basis of sensor-generated real-time data. Unlike other approaches, the proposed model incorporates multiple alternative maintenance actions, covering both perfect and various degrees of imperfect maintenance, each mapped to its associated order of spare parts. It incorporates a Markov Decision Process (MDP) model capable of being embedded in event-driven information systems for scalable proactive decision making regarding maintenance and logistics actions.

The rest of the paper is organized as follows. Section 2 reviews the literature on sensor-generated real-time data and proactive event-driven computing, as well as on joint optimization of maintenance and logistics. Section 3 describes the proposed proactive decision model for joint maintenance and logistics optimization in the frame of the Industrial Internet of Things. Section 4 presents the deployment of an information system incorporating this model, while Sect. 5 presents the evaluation results. Finally, Sect. 6 concludes the paper and discusses future work.

2 Literature Review

2.1 Industrial Internet of Things and Proactive Event-Driven Computing in Manufacturing Enterprises

Sensor deployment for measuring temperature, vibration, pressure, etc., is continuously increasing in industry, since it enhances monitoring capabilities by integrating various devices equipped with sensing, identification, processing, communication, and networking capabilities (Bi et al. 2014). In this context, the Industrial Internet of Things (IIoT) paradigm has emerged. IIoT requires IT architectures and infrastructure that support the real-time, scalable handling, processing, and storage of ever-growing amounts of sensor data gathered from various heterogeneous sources (Bousdekis et al. 2015a). Sensing devices can detect state changes of objects or conditions (e.g., degradation of manufacturing equipment) and create events, which can then be processed by a system or service to provide meaningful insights. Consequently, real-time, event-driven information systems have been enabled by the development of sensor technology, the expansion of broadband connectivity, and the emergence of predictive analytics (Lee et al. 2014) using a web-service communication paradigm (Theorin et al. 2016).

Event-driven architectures are able to close the business–ICT gap by delivering appropriate business functionality and enabling interconnectivity at an object level (Potocnik and Juric 2014; Zimmermann et al. 2015). Unlike previous paradigms that send requests and wait for responses (responsive computing) (Etzion and Niblett 2010), or that require all system components to be updated each time a sensor measurement is gathered (near real-time computing) (Elwany and Gebraeel 2008), event-driven systems, which are reactive in nature, trigger processing on the basis of events, enabling scalable and efficient real-time big data processing. The development of appropriate models and methodologies contributed significantly to the expansion of reactive event-driven systems; a similar contribution is necessary for the expansion of proactive computing. Proactivity refers to the ability to avoid or eliminate the impact of undesired future events, or to exploit future opportunities, by applying predictive models combined with real-time sensor data and automated decision-making technologies (Engel et al. 2012). A crucial concept in proactive computing is the proactive Event Processing Network, which consists of various types of processing elements, called proactive Event Processing Agents, aiming to support the processing of predicted events as well as actions and actuators as part of the model. Proactive event-driven computing refers to the use of event-driven applications for real-time predictions and decisions ahead of time, before a critical event occurs, according to the “Detect-Predict-Decide-Act” principle (Engel et al. 2012; Feldman et al. 2013; Sejdovic et al. 2016). Each phase is implemented as an Event Processing Agent. A large variety of methods, algorithms, and information systems deal with the Detect and Predict phases; however, the Decide and Act phases have not been extensively explored (Bousdekis et al. 2015b).
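A minimal sketch of the agent chain described above, assuming a purely linear Detect, Predict, Decide, Act network in which each Event Processing Agent is a Python function that either emits a new event or stays silent. The `Event` type, the agent functions, the 80 °C threshold, and the payload fields are hypothetical illustrations, not part of any specific event processing engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Event:
    kind: str       # e.g. "sensor", "detection", "prediction", "recommendation"
    payload: dict


# A hypothetical Event Processing Agent: consumes one event, emits one or none.
Agent = Callable[[Event], Optional[Event]]


def detect(e: Event) -> Optional[Event]:
    # Detect: raise a detection event only for an abnormal reading (illustrative threshold).
    if e.payload["oil_temp"] > 80.0:
        return Event("detection", {"asset": e.payload["asset"]})
    return None


def predict(e: Event) -> Optional[Event]:
    # Predict: attach a placeholder failure-rate estimate to the detection.
    return Event("prediction", {"asset": e.payload["asset"], "rate": 0.045})


def decide(e: Event) -> Optional[Event]:
    # Decide: a real agent would optimise expected losses; here a fixed placeholder.
    return Event("recommendation", {"asset": e.payload["asset"], "action": "lubrication"})


def act(e: Event) -> Optional[Event]:
    # Act: forward the recommendation and request sensor-enabled feedback.
    return Event("feedback", {"asset": e.payload["asset"], "applied": e.payload["action"]})


def run_network(agents: List[Agent], event: Event) -> List[Event]:
    """Push an event through a linear chain of agents and collect every emitted event.

    The chain stops as soon as an agent emits nothing, i.e. the incoming event
    did not trigger the next phase.
    """
    emitted: List[Event] = []
    current = event
    for agent in agents:
        nxt = agent(current)
        if nxt is None:
            break
        emitted.append(nxt)
        current = nxt
    return emitted


events = run_network([detect, predict, decide, act],
                     Event("sensor", {"asset": "gearbox", "oil_temp": 85.0}))
print([e.kind for e in events])  # ['detection', 'prediction', 'recommendation', 'feedback']
```

A normal reading (e.g. `oil_temp` of 60.0) stops the chain at the Detect agent, so no downstream processing is triggered, which is the reactive, event-triggered behaviour the paragraph contrasts with continuous updating.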

2.2 Joint Maintenance and Logistics Optimization

Companies keep inventories of spare parts so that they are available when maintenance is needed. The amount of spare parts in inventory depends on the demand, i.e., on the corrective, preventive, and predictive maintenance actions requiring them. Therefore, maintenance and inventory management are strongly interconnected and should be considered simultaneously when optimizing a company’s operations (Van Horenbeek et al. 2013). For this reason, the field of joint maintenance and logistics optimization has attracted research interest aiming to account for the trade-off between CBM implementation and spare parts ordering policies (Van Horenbeek et al. 2013; Basten et al. 2015), although relatively few contributions exist to date (Keizer et al. 2017).

Since CBM is a proactive maintenance strategy, it can be implemented according to the “Detect-Predict-Decide-Act” proactive principle for event-driven computing (Bousdekis et al. 2015b), through a digital transformation that adapts manufacturing operations driven by the maintenance function. To this end, the e-maintenance concept can exploit the IIoT paradigm to enable proactive decision making in the context of CBM. E-maintenance refers to the emergence of technologies that are able to optimize maintenance-related workflows and integrate business performance, enabling openness and interoperation of e-maintenance with other components of the e-enterprise (Guillén et al. 2016). However, apart from e-technologies, models and methods are needed to make e-maintenance a key element in satisfying operational requirements, improving production system performance, and supporting inventory and operation guidance beyond simple notifications and warnings (Muller et al. 2008; Pistofidis et al. 2012; Bousdekis et al. 2015a; Guillén et al. 2016).

The problem of joint maintenance and logistics optimization has attracted considerable interest from different perspectives, and several decision methods have been used and formulated according to the problem at hand. Several research works have used simulation models (e.g., Monte Carlo simulation and discrete event simulation) (Sarker and Haque 2000; Hu et al. 2008; Wang 2012), while multi-objective models have also been considered (Nosoohi and Hejazi 2011). In recent years, the use of condition monitoring has been widely investigated. Consequently, methods based on continuous-review (s, S) ordering policies (Xie and Wang 2008; Wang et al. 2012; Keizer et al. 2017) as well as sensor-driven cost risk-based models (Wu et al. 2007; Elwany and Gebraeel 2008) have been developed. However, the decision models in the literature are subject to the limitations presented in Sect. 2.3.

2.3 Existing Limitations and beyond the State of the Art

Existing research works in joint maintenance and logistics optimization use reliability distributions given by manufacturer specifications, derived from experimental setups, or collected in real time through sensors in laboratory environments. Moreover, almost all published papers in this domain consider the implementation of CBM by taking into account the current level of degradation, but not predictions about future degradation, future failures, or other prognostic information. As a consequence of a general lack of methods for the Decide phase of the proactive principle, joint maintenance and spare parts decision models have not been coupled with algorithms for real-time, event-driven prognostics. Such an approach could help manufacturing companies minimize their major costs, since a decrease in spare parts inventory cost is among the most significant indirect benefits provided by CBM (Van Horenbeek et al. 2013): with real-time prognostics available, CBM actions can be recommended and spare parts can be ordered Just-In-Time. On the other hand, equipment downtime may be affected by logistics-related delays, while the time needed to complete the appropriate maintenance actions is rarely known accurately (Van Horenbeek et al. 2013). Finally, the vast majority of published papers assume that equipment parts are perfectly maintained after a pre-defined action is implemented, or do not state any assumption regarding the degree of restoration (Van Horenbeek et al. 2013).

To overcome the aforementioned limitations, the proposed model is triggered by prognostic information in an event processing computational environment on the basis of sensor-generated real-time data. Unlike other approaches, it incorporates multiple alternative proactive (perfect and imperfect) maintenance actions and spare parts orders. Moreover, it incorporates an MDP model whose transition probabilities are distribution functions of time and in which costs expressed as functions of action implementation time take the place of state rewards. Consequently, its output is an action-time policy instead of an action-state policy. Overall, the proposed model takes advantage of proactive event-driven computing and can be embedded in event-driven information systems for scalable proactive decision making regarding maintenance and logistics actions.

3 The Proposed Model

3.1 Overview

We contribute to the Decide phase of the proactive principle by proposing a proactive event-driven decision model for joint maintenance and logistics optimization supported by the IIoT technology. The proposed model can be embedded in a real-time, event processing information system in order to: (1) be configured (at design time) by the user with the aim to insert the required domain knowledge; and (2) be triggered (at runtime) by real-time prognostic information in the form of a prediction event. Its output is a set of recommendations about the optimal mitigating (perfect or imperfect) maintenance action (out of a list of alternative actions) along with its implementation time and the optimal order of spare parts that are related to this action along with the optimal ordering time.

At design time, the decision maker inserts domain knowledge in order to define and configure the various parameters of the proactive decision model. The domain knowledge is entered into the model through equipment instances, each corresponding to the specific part of equipment to which the predicted failure refers. The domain knowledge entered by users corresponds to the model’s input parameters and includes the cost of equipment failure (e.g., breakdown), the alternative actions along with their cost parameters, the new lifetime after each action is implemented (i.e., how much each action prolongs the lifetime of the equipment), and the decision horizon (e.g., the next planned maintenance). The decision horizon is defined by the end of the decision epoch, i.e., the time after which the effect of the predicted undesired event fades and the probability of its occurrence returns to normal (Engel et al. 2012). The action-related cost parameters involve two factors: the cost of action implementation and the cost of the action effect (after implementation). Both factors apply to maintenance and inventory alike and are expressed as functions of implementation time, because actions often affect operation until some specific future time (e.g., taking machinery down for maintenance and losing the rest of the working week). In this sense, the cost is a decreasing function of the activation time.
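The design-time domain knowledge listed above could be captured per equipment instance along the following lines. This is a minimal sketch assuming a hypothetical schema: the class and field names, the `end_of_week_cost` helper, and all numbers are illustrative and do not come from the deployed system. The helper shows one way an implementation cost can be a decreasing function of the activation time, as in the working-week example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MaintenanceAction:
    """One alternative mitigating action configured at design time (hypothetical schema)."""
    name: str
    implementation_cost: Callable[[float], float]  # C_{a_i}(t), t = implementation time
    effect_cost: Callable[[float], float]          # C_{e_i}(.), cost of the action effect
    duration: float                                # estimated implementation time, Delta t
    new_lifetime: float                            # time-to-failure after the action
    spare_parts_order: str                         # order o_i mapped to this action


@dataclass
class EquipmentInstance:
    """Domain knowledge attached to one monitored part of equipment."""
    name: str
    failure_cost: float       # C_f
    decision_horizon: float   # T, end of the decision epoch
    actions: List[MaintenanceAction] = field(default_factory=list)


def end_of_week_cost(base: float, rate: float, week_end: float) -> Callable[[float], float]:
    """Cost decreasing in the activation time t: stopping at t loses the
    remaining (week_end - t) of the working week, charged at `rate`."""
    return lambda t: base + rate * max(week_end - t, 0.0)


# All values below are hypothetical (time measured in days).
gearbox = EquipmentInstance(
    name="gearbox",
    failure_cost=10_000.0,
    decision_horizon=7.0,
    actions=[
        MaintenanceAction(
            name="lubrication",
            implementation_cost=end_of_week_cost(base=200.0, rate=50.0, week_end=7.0),
            effect_cost=lambda remaining: 0.0,
            duration=0.5,
            new_lifetime=30.0,
            spare_parts_order="lube oil",
        )
    ],
)

cost = gearbox.actions[0].implementation_cost
print(cost(1.0), cost(5.0), cost(7.0))  # 500.0 300.0 200.0
```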

The real-time prognostic information is received in the form of a prediction event from the Predict phase and includes the probability distribution function (PDF) of the failure occurrence along with its associated parameters. The proposed decision model takes advantage of the basic model for proactive event-driven computing (Engel et al. 2012) and extends it in order to address the joint optimization of maintenance and spare parts ordering in a proactive way when there are multiple alternative maintenance actions and associated spare parts orders. To this end, an MDP model is used and is formulated accordingly.

3.2 The Mathematical Formulation for the Proposed Decision Model

The proposed proactive event-driven decision model for joint maintenance and logistics optimization is formulated according to the proactive approach (Engel et al. 2012). Therefore, the output of the MDP is not a policy of action-state pairs but a policy of action-time pairs, and the Bellman equation is structured accordingly. The proposed model can recommend when to take which action, given that the cost of taking the action and/or the cost of the action effect changes over time. To this end, it uses transition probability distributions expressed as functions of implementation time, and the state rewards of the MDP correspond to costs expressed as functions of implementation time. Consequently, the result is the action with the minimum expected loss (instead of the maximum utility) and the optimal time of applying it. The expected loss function of each action is estimated using the backward induction algorithm for finite-horizon problems (Watkins and Dayan 1992), and the Bellman equation is minimized with respect to time. The proactive MDP formulation is solved for both maintenance and logistics so that the resulting expected loss functions are jointly optimized.
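To make the action-time policy concrete: once an expected loss function EL(t) is available for each alternative action, the recommendation is the (action, time) pair with the smallest value. The sketch below uses a plain grid search over [0, T] as a simple stand-in for the minimization (the paper itself uses Brent's method, discussed in Sect. 3.2.3); the quadratic loss functions in the usage lines are toy placeholders, not outputs of the model.

```python
from typing import Callable, Dict, Optional, Tuple


def action_time_policy(
    losses: Dict[str, Callable[[float], float]],
    horizon: float,
    steps: int = 1000,
) -> Tuple[str, float, float]:
    """Return (action, time, expected loss) minimising EL over all actions and
    over an evenly spaced grid of implementation times in [0, horizon]."""
    best: Optional[Tuple[str, float, float]] = None
    for action, el in losses.items():
        for k in range(steps + 1):
            t = horizon * k / steps
            value = el(t)
            if best is None or value < best[2]:
                best = (action, t, value)
    if best is None:
        raise ValueError("losses must contain at least one action")
    return best


# Toy expected-loss curves for two hypothetical actions.
policy = action_time_policy(
    {"a1": lambda t: (t - 3.0) ** 2 + 10.0, "a2": lambda t: (t - 5.0) ** 2 + 8.0},
    horizon=10.0,
)
print(policy)  # ('a2', 5.0, 8.0)
```

Unlike an action-state policy, the result names both the action and when to apply it.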

Figure 1 shows an example of the proactive MDP formulation for joint maintenance and logistics optimization with three alternative actions. On the basis of this formulation, the equations of the joint decision model are derived for an arbitrary number of actions, i.e., the maintenance equation (for each mitigating maintenance action) and the spare parts ordering equation (for each order associated with the respective mitigating maintenance action). Both are derived with respect to the predicted failure, but with different transition probability functions and state rewards in the same formulation, as depicted in Fig. 1. The state rewards correspond to the maintenance costs (i.e., cost of failure, cost of action implementation, and cost of action effect) for each alternative action and to the inventory costs (i.e., shortage cost and holding cost) associated with each maintenance action, along with their lead times. Table 1 explains the proposed decision model’s variables.

Fig. 1 An example of the proactive MDP formulation for joint maintenance and logistics optimization

Table 1 Explanation of the proposed model’s variables

3.2.1 Maintenance Expected Loss Function

For the maintenance equation, based on the aforementioned MDP formulation, there is no cost (or benefit) of being in state \( {S}_n \), hence \( EL\left({S}_n\right)=0 \). In state \( {S}_f \), there is a penalty of \( {C}_f \) (i.e., the cost of failure), hence \( EL\left({S}_f\right)={C}_f \). In state \( {S}_{e_i} \) we incur a penalty of \( {C}_{e_i}\left({t}_{e_i}\right) \) (i.e., the cost function of the action effect) and, given the probability to move to state \( {S}_f \), the policy evaluation gives:

$$ EL\left({S}_{e_i}\right)={C}_{e_i}\left({t}_{e_i}\right)+P\left({S}_{e_i},{S}_f\right)\ast EL\left({S}_f\right)={C}_{e_i}\left({t}_{e_i}\right)+P\left({S}_{e_i},{S}_f\right)\ast {C}_f $$

In state \( {S}_{a_i} \), there is a penalty of \( {C}_{a_i}\left({t}_{a_i}\right) \) (i.e., the cost function of the action implementation) and, given the probabilities to move to states \( {S}_f \) and \( {S}_{e_i} \), the policy evaluation gives:

$$ {\displaystyle \begin{array}{c} EL\left({S}_{a_i}\right)={C}_{a_i}\left({t}_{a_i}\right)+P\left({S}_{a_i},{S}_f\right)\ast EL\left({S}_f\right)+P\left({S}_{a_i},{S}_{e_i}\right)\ast EL\left({S}_{e_i}\right)\\ {}={C}_{a_i}\left({t}_{a_i}\right)+P\left({S}_{a_i},{S}_f\right)\ast {C}_f+P\left({S}_{a_i},{S}_{e_i}\right)\ast \left[{C}_{e_i}\left({t}_{e_i}\right)+P\left({S}_{e_i},{S}_f\right)\ast {C}_f\right]\end{array}} $$

Finally, state \( {S}_d \) incurs no penalty itself. Therefore, its expected loss is computed as follows:

$$ \begin{array}{lll} EL\left({S}_d\right)&=&P\left({S}_d,{S}_{a_i}\right)\ast EL\left({S}_{a_i}\right)+P\left({S}_d,{S}_f\right)\ast EL\left({S}_f\right)\\ &=&P\left({S}_d,{S}_{a_i}\right)\ast \left\{{C}_{a_i}\left({t}_{a_i}\right)+P\left({S}_{a_i},{S}_f\right)\ast {C}_f\right.\\ &&\left.+P\left({S}_{a_i},{S}_{e_i}\right)\ast \left[{C}_{e_i}\left({t}_{e_i}\right)+P\left({S}_{e_i},{S}_f\right)\ast {C}_f\right]\right\}+P\left({S}_d,{S}_f\right)\ast {C}_f \end{array} $$

Consequently, the expected loss function for each mitigating maintenance action is given by Eq. (1):

$$ \begin{array}{lll} {EL}^{a_i}&=&P\left({S}_d,{S}_{a_i}\right)\ast \left\{{C}_{a_i}\left({t}_{a_i}\right)+P\left({S}_{a_i},{S}_f\right)\ast {C}_f\right.\\ &&\left.+P\left({S}_{a_i},{S}_{e_i}\right)\ast \left[{C}_{e_i}\left({t}_{e_i}\right)+P\left({S}_{e_i},{S}_f\right)\ast {C}_f\right]\right\}+P\left({S}_d,{S}_f\right)\ast {C}_f \end{array} $$
(1)
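Equation (1) is policy evaluation by backward induction over the acyclic state graph of Fig. 1, restricted to one action. The following minimal sketch, with hypothetical transition probabilities and costs, evaluates EL(S) = penalty(S) + Σ P(S, S′)·EL(S′) recursively and checks the result against the closed form of Eq. (1). The state labels and numbers are illustrative assumptions.

```python
from functools import lru_cache
from typing import Dict


def expected_loss(penalties: Dict[str, float],
                  transitions: Dict[str, Dict[str, float]],
                  state: str) -> float:
    """Backward induction on an acyclic (finite-horizon) state graph:
    EL(s) = penalty(s) + sum over s' of P(s, s') * EL(s')."""

    @lru_cache(maxsize=None)
    def el(s: str) -> float:
        successors = transitions.get(s, {})
        return penalties.get(s, 0.0) + sum(p * el(nxt) for nxt, p in successors.items())

    return el(state)


# Hypothetical values for a single action a_i (states as in Fig. 1).
C_f, C_a, C_e = 1000.0, 100.0, 50.0
penalties = {"d": 0.0, "a": C_a, "e": C_e, "f": C_f, "n": 0.0}
transitions = {
    "d": {"a": 0.9, "f": 0.1},    # P(S_d, S_a), P(S_d, S_f)
    "a": {"f": 0.05, "e": 0.95},  # P(S_a, S_f), P(S_a, S_e)
    "e": {"f": 0.02, "n": 0.98},  # P(S_e, S_f), P(S_e, S_n)
}

# Closed form of Eq. (1) with the same numbers.
eq1 = 0.9 * (C_a + 0.05 * C_f + 0.95 * (C_e + 0.02 * C_f)) + 0.1 * C_f
print(round(expected_loss(penalties, transitions, "d"), 6), round(eq1, 6))  # 294.85 294.85
```

The no-action policy is the same computation with the single transition pair from \( S_d \) to \( S_f \) and \( S_n \), which gives \( P^0(S_d,S_f)\ast C_f \).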

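Let \( {EL}^0 \) denote the expected loss of taking no action. Backward induction for this policy gives:

$$ {EL}^0\left({S}_d\right)={P}^0\left({S}_d,{S}_n\right)\ast {EL}^0\left({S}_n\right)+{P}^0\left({S}_d,{S}_f\right)\ast {EL}^0\left({S}_f\right)={P}^0\left({S}_d,{S}_f\right)\ast {C}_f $$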

The transition probabilities from \( {S}_d \) to \( {S}_f \) or \( {S}_{a_i} \) are:

$$ P\left({S}_d,{S}_f\right)={P}^f\left({t}_0,{t}_{a_i}\right) $$
$$ P\left({S}_d,{S}_{a_i}\right)=1-{P}^f\left({t}_0,{t}_{a_i}\right) $$

The transition probabilities from \( {S}_{a_i} \) to \( {S}_f \) or \( {S}_{e_i} \) are given by:

$$ P\left({S}_{a_i},{S}_f\right)={P}^f\left({t}_{a_i},{t}_{e_i}\right) $$
$$ P\left({S}_{a_i},{S}_{e_i}\right)=1-{P}^f\left({t}_{a_i},{t}_{e_i}\right) $$

that is, we move to \( {S}_{e_i} \) if f does not occur between the time the action is applied and the time it takes effect; the transition from \( {S}_{a_i} \) to \( {S}_f \) occurs with the complementary probability.

Finally, in state \( {S}_{e_i} \), the probability of the event occurring before the end of the decision horizon is given by the post-action distribution:

$$ P\left({S}_{e_i},{S}_f\right)={P}_{a_i}^f\left({t}_{e_i},T\right) $$

\( T \) indicates the decision horizon, i.e., the end of the decision epoch. If no action is taken, the probability of moving to state \( {S}_f \) is the probability of the event occurring over the entire interval:

$$ {P}^0\left({S}_d,{S}_f\right)={P}^f\left({t}_0,T\right) $$

and \( {P}^0\left({S}_d,{S}_n\right)=1-{P}^0\left({S}_d,{S}_f\right) \) is the complementary probability.

Substituting these transition probabilities into Eq. (1) yields Eq. (2):

$$ \begin{array}{lll} {EL}^{a_i}&=&\left[1-{P}^f\left({t}_0,{t}_{a_i}\right)\right]\ast \left\{{C}_{a_i}\left({t}_{a_i}\right)+{P}^f\left({t}_{a_i},{t}_{e_i}\right)\ast {C}_f\right.\\ &&\left.+\left[1-{P}^f\left({t}_{a_i},{t}_{e_i}\right)\right]\ast \left[{C}_{e_i}\left({t}_{e_i}\right)+{P}_{a_i}^f\left({t}_{e_i},T\right)\ast {C}_f\right]\right\}+{P}^f\left({t}_0,{t}_{a_i}\right)\ast {C}_f \end{array} $$
(2)

Equation (2) expresses the expected loss of each mitigating maintenance action. Minimizing the expected loss functions of all the alternative actions with respect to implementation time yields a recommendation about the optimal action (the action with the global minimum) and the optimal time for its implementation (the time at which its expected loss attains that minimum). Equation (2) contains the cost function of the action implementation \( {C}_{a_i}\left({t}_{a_i}\right) \) (i.e., how much the process of implementing the action costs, e.g., cost of spare parts, technician pay rate, etc.) and the cost function of the action effect \( {C}_{e_i}\left({t}_{e_i}\right) \) (i.e., how much the result of the action costs, e.g., the cost of operating at reduced equipment load). Provided that an estimate \( \Delta t \) of the duration of action implementation is known, \( {t}_{a_i}=t \) and \( {t}_{e_i}=t+\Delta t \), where \( t \) is the time of action implementation. Both the polynomial of the implementation cost function and the initial estimate of the implementation duration can be continuously updated through the Sensor-Enabled Feedback (SEF) mechanism of the Act phase. In addition, \( {t}_0 \) is set to 0. Consequently, Eq. (2) is transformed to Eq. (3):

$$ \begin{array}{lll} {EL}^{a_i}(t)&=&\left[1-{P}^f\left({t}_0,t\right)\right]\ast \left\{{C}_{a_i}(t)+{P}^f\left(t,t+\Delta t\right)\ast {C}_f\right.\\ &&\left.+\left[1-{P}^f\left(t,t+\Delta t\right)\right]\ast \left[{C}_{e_i}\left(t+\Delta t\right)+{P}_{a_i}^f\left(t+\Delta t,T\right)\ast {C}_f\right]\right\}+{P}^f\left({t}_0,t\right)\ast {C}_f \end{array} $$
(3)

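Assuming a fixed (time-independent) cost of action implementation \( {C}_{a_i} \), and expressing the cost of the action effect over the time period to which it corresponds, i.e., the remaining period \( T-t-\Delta t \) after the action takes effect, Eq. (3) is transformed to: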

$$ \begin{array}{lll} {EL}^{a_i}(t)&=&\left[1-{P}^f\left({t}_0,t\right)\right]\ast \left\{{C}_{a_i}+{P}^f\left(t,t+\Delta t\right)\ast {C}_f\right.\\ &&\left.+\left[1-{P}^f\left(t,t+\Delta t\right)\right]\ast \left[{C}_{e_i}\left(T-t-\Delta t\right)+{P}_{a_i}^f\left(t+\Delta t,T\right)\ast {C}_f\right]\right\}+{P}^f\left({t}_0,t\right)\ast {C}_f \end{array} $$
(4)
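Equation (4) translates directly into a function of the implementation time t. The sketch below assumes that the conditional probabilities \( P^f \) and \( P^f_{a_i} \) of Eqs. (9) and (10) are supplied as callables and that the action-effect cost is given as a function of the remaining period \( T-t-\Delta t \); the parameter names are illustrative. The two sanity checks follow from the equation itself: with no failure risk the loss reduces to \( C_{a_i}+C_{e_i}(T-t-\Delta t) \), and with certain failure before t it equals \( C_f \).

```python
from typing import Callable

# P(t1, t2): probability of failure in (t1, t2], conditioned on survival to t1.
Prob = Callable[[float, float], float]


def maintenance_expected_loss(
    t: float,
    p_f: Prob,                       # P^f, Eq. (9)
    p_f_after: Prob,                 # P^f_{a_i}, Eq. (10)
    c_a: float,                      # fixed implementation cost C_{a_i}
    c_e: Callable[[float], float],   # effect cost over the remaining period
    c_f: float,                      # failure cost C_f
    dt: float,                       # implementation duration Delta t
    horizon: float,                  # decision horizon T
    t0: float = 0.0,
) -> float:
    """Expected loss of implementing action a_i at time t, following Eq. (4)."""
    p_before = p_f(t0, t)                 # failure before the action starts
    p_during = p_f(t, t + dt)             # failure while the action is carried out
    p_after = p_f_after(t + dt, horizon)  # failure after the action takes effect
    after_effect = c_e(horizon - t - dt) + p_after * c_f
    return ((1.0 - p_before) * (c_a + p_during * c_f + (1.0 - p_during) * after_effect)
            + p_before * c_f)


def never(t1: float, t2: float) -> float:   # toy distribution: failure impossible
    return 0.0


def always(t1: float, t2: float) -> float:  # toy distribution: failure certain
    return 1.0


def effect(remaining: float) -> float:      # hypothetical cost rate of 10 per time unit
    return 10.0 * remaining


print(maintenance_expected_loss(2.0, never, never, 100.0, effect, 5000.0, 1.0, 10.0))   # 170.0
print(maintenance_expected_loss(2.0, always, never, 100.0, effect, 5000.0, 1.0, 10.0))  # 5000.0
```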

3.2.2 Logistics Expected Loss Function

Similarly to the previous calculations, the logistics-related equation (dealing with spare parts ordering) for each alternative maintenance action is derived with the backward induction algorithm on the basis of the same MDP formulation. In this case, a shortage cost function \( {C}_s(t) \) appears explicitly in the following equations, while the holding cost is taken into account indirectly through the complementary probabilities. In addition, there is a cost \( {C}_{sp} \) of buying the spare parts. The negative state rewards represent the inventory-related costs, and the action states represent the order of spare parts that is mapped to each action, as defined in the configuration of the equipment instance. Since spare parts ordering is driven by maintenance, the MDP formulation remains the same, but each state has a different reward, corresponding to the spare parts ordering costs. Backward induction then gives:

$$ EL\left({S}_n\right)=0 $$
$$ EL\left({S}_f\right)={C}_s\left({t}_f\right)={C}_s\left(T-T\right)=0 $$
$$ EL\left({S}_{e_i}\right)=0+P\left({S}_{e_i},{S}_f\right)\ast EL\left(S{\hbox{'}}_f\right)=P\left({S}_{e_i},{S}_f\right)\ast {C}_s\left({t}_{e_i}\right) $$
$$ \begin{array}{lll} EL\left({S}_{a_i}\right)&=&{C}_{sp}+P\left({S}_{a_i},{S}_f\right)\ast EL\left(S{\hbox{'}}_f\right)+P\left({S}_{a_i},{S}_{e_i}\right)\ast EL\left({S}_{e_i}\right)\\ &=&{C}_{sp}+P\left({S}_{a_i},{S}_f\right)\ast {C}_s\left({t}_{a_i}\right)+P\left({S}_{a_i},{S}_{e_i}\right)\ast P\left({S}_{e_i},{S}_f\right)\ast {C}_s\left({t}_{e_i}\right) \end{array} $$
$$ \begin{array}{lll} EL\left({S}_d\right)&=&P\left({S}_d,{S}_{a_i}\right)\ast EL\left({S}_{a_i}\right)+P\left({S}_d,{S}_f\right)\ast EL\left(S{\hbox{'}}_f\right)\\ &=&P\left({S}_d,{S}_{a_i}\right)\ast \left[{C}_{sp}+P\left({S}_{a_i},{S}_f\right)\ast {C}_s\left({t}_{a_i}\right)\right.\\ &&\left.+P\left({S}_{a_i},{S}_{e_i}\right)\ast P\left({S}_{e_i},{S}_f\right)\ast {C}_s\left({t}_{e_i}\right)\right]+P\left({S}_d,{S}_f\right)\ast {C}_s\left({t}_d\right) \end{array} $$

Therefore, the expected loss function of the spare parts order \( {o}_i \) associated with each action is given by:

$$\begin{array}{lll}{EL}^{o_i}&=&P\left({S}_d,{S}_{a_i}\right)\ast \left[{C}_{sp}+P\left({S}_{a_i},{S}_f\right)\ast {C}_s\left({t}_{a_i}\right)\right.\nonumber\\ &&\left.+P\left({S}_{a_i},{S}_{e_i}\right)\ast P\left({S}_{e_i},{S}_f\right)\ast {C}_s\left({t}_{e_i}\right)\right]+P\left({S}_d,{S}_f\right)\ast {C}_s\left({t}_d\right) \end{array}$$
(5)

Let \( {EL}^0 \) denote the expected loss of taking no action. Backward induction for this policy gives:

$$ {EL}^0\left({S}_d\right)={P}^0\left({S}_d,{S}_n\right)\ast {EL}^0\left({S}_n\right)+{P}^0\left({S}_d,{S}_f\right)\ast {EL}^0\left({S}_f\right)={P}^0\left({S}_d,{S}_f\right)\ast {C}_s\left({t}_d\right) $$

Finally, substituting the transition probabilities into Eq. (5), the expected loss function of ordering the spare parts associated with each action is given by:

$$\begin{array}{lll} {EL}^{o_i}(t)&=&\left[1-{P}^f\left({t}_0,{t}_{a_i}\right)\right]\ast \left\{{C}_{sp}+{P}^f\left({t}_{a_i},{t}_{e_i}\right)\ast {C}_s\left({t}_{a_i}\right) \right.\nonumber\\ &&\left.+\left[1-{P}^f\left({t}_{a_i},{t}_{e_i}\right)\right]\ast {P}_{a_i}^f\left({t}_{e_i},T\right)\ast {C}_s\left({t}_{e_i}\right)\right\}\nonumber\\ &&+{P}^f\left({t}_0,{t}_{a_i}\right)\ast {C}_s\left({t}_d\right) \end{array}$$
(6)

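Taking into account the lead time \( L \) of each spare parts order, so that \( {t}_{a_i}=t+L \) and \( {t}_{e_i}=t+L+\Delta t \), where \( t \) now denotes the ordering time, Eq. (6) is transformed to: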

$$ \begin{array}{lll} {EL}^{o_i}(t)&=&\left[1-{P}^f\left({t}_0,t+L\right)\right]\ast \left\{{C}_{sp}+{P}^f\left(t+L,t+L+\Delta t\right)\ast {C}_s\left(t+L\right)\right.\\ &&\left.+\left[1-{P}^f\left(t+L,t+L+\Delta t\right)\right]\ast {P}_{a_i}^f\left(t+L+\Delta t,T\right)\ast {C}_s\left(t+L+\Delta t\right)\right\}\\ &&+{P}^f\left({t}_0,t+L\right)\ast {C}_s(T) \end{array} $$
(7)

Similarly, expressing the shortage cost over the time periods to which it corresponds, Eq. (7) is transformed to:

$$ \begin{array}{lll} {EL}^{o_i}(t)&=&\left[1-{P}^f\left({t}_0,t+L\right)\right]\ast \left\{{C}_{sp}+{P}^f\left(t+L,t+L+\Delta t\right)\ast {C}_s\left(T-t-L\right)\right.\\ &&\left.+\left[1-{P}^f\left(t+L,t+L+\Delta t\right)\right]\ast {P}_{a_i}^f\left(t+L+\Delta t,T\right)\ast {C}_s\left(T-t-L-\Delta t\right)\right\}\\ &&+{P}^f\left({t}_0,t+L\right)\ast {C}_s(T) \end{array} $$
(8)
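Analogously, Eq. (8) can be written as a function of the ordering time t, with the lead time L shifting the start of the action to t + L. As before, this is a sketch assuming callables for the conditional probabilities and a shortage cost expressed over the remaining period; all names and numbers are illustrative. With no failure risk only the purchase cost \( C_{sp} \) remains, and with certain failure before the parts arrive the loss is \( C_s(T) \).

```python
from typing import Callable

# P(t1, t2): probability of failure in (t1, t2], conditioned on survival to t1.
Prob = Callable[[float, float], float]


def ordering_expected_loss(
    t: float,                        # ordering time
    p_f: Prob,                       # P^f, Eq. (9)
    p_f_after: Prob,                 # P^f_{a_i}, Eq. (10)
    c_sp: float,                     # purchase cost of the spare parts C_sp
    c_s: Callable[[float], float],   # shortage cost over the remaining period
    lead_time: float,                # L
    dt: float,                       # implementation duration Delta t
    horizon: float,                  # decision horizon T
    t0: float = 0.0,
) -> float:
    """Expected loss of placing order o_i at time t, following Eq. (8)."""
    arrival = t + lead_time                       # parts arrive, action a_i can start
    p_before = p_f(t0, arrival)                   # failure before the parts arrive
    p_during = p_f(arrival, arrival + dt)         # failure during implementation
    p_after = p_f_after(arrival + dt, horizon)    # failure after the action takes effect
    return ((1.0 - p_before) * (c_sp
                                + p_during * c_s(horizon - arrival)
                                + (1.0 - p_during) * p_after * c_s(horizon - arrival - dt))
            + p_before * c_s(horizon))


def never(t1: float, t2: float) -> float:   # toy distribution: failure impossible
    return 0.0


def always(t1: float, t2: float) -> float:  # toy distribution: failure certain
    return 1.0


def shortage(remaining: float) -> float:    # hypothetical shortage cost rate of 20
    return 20.0 * remaining


print(ordering_expected_loss(1.0, never, never, 300.0, shortage, 2.0, 1.0, 10.0))   # 300.0
print(ordering_expected_loss(1.0, always, never, 300.0, shortage, 2.0, 1.0, 10.0))  # 200.0
```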

3.2.3 Joint Optimization of Maintenance and Logistics

Equations (3) and (7) constitute the generic proactive decision model for joint maintenance and logistics optimization, triggered by a prediction event that contains the PDF of the failure of the equipment under consideration. Since the PDF depends on the degradation modelling until breakdown, it will usually follow a distribution belonging to the exponential family (e.g., exponential, Weibull, and gamma) (Kapur and Pecht 2014) and will therefore fulfill the Markov property. Otherwise, it should be filtered and processed by other decision methods, e.g., Elwany and Gebraeel (2008). Before the equations of the proposed decision model are optimized, the probabilities they contain should be calculated according to reliability theory, as shown in Eqs. (9) and (10).

$$ {P}^f\left({t}_1,{t}_2\right)=\frac{G^f\left({t}_2\right)-{G}^f\left({t}_1\right)}{1-{G}^f\left({t}_1\right)} $$
(9)
$$ {P}_{a_i}^f\left({t}_1,{t}_2\right)=\frac{G_{a_i}^f\left({t}_2\right)-{G}_{a_i}^f\left({t}_1\right)}{1-{G}^f\left({t}_1\right)} $$
(10)
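Eqs. (9) and (10) can be computed from any pair of cumulative distribution functions. The sketch below, with illustrative function names, implements both and checks them for exponential distributions, where Eq. (9) reduces to \( 1-e^{-\lambda(t_2-t_1)} \) (memorylessness) and Eq. (10) to \( e^{t_1(\lambda-\lambda')}-e^{-\lambda' t_2+\lambda t_1} \), the forms used in Sect. 4. The value \( \lambda = 0.045 \) is taken from Sect. 4; the post-action rate \( \lambda' = 0.01 \) and the interval are hypothetical.

```python
import math
from typing import Callable

Cdf = Callable[[float], float]


def conditional_failure_prob(g: Cdf, t1: float, t2: float) -> float:
    """Eq. (9): P^f(t1, t2), failure in (t1, t2] given no failure up to t1."""
    return (g(t2) - g(t1)) / (1.0 - g(t1))


def conditional_failure_prob_after_action(g: Cdf, g_a: Cdf, t1: float, t2: float) -> float:
    """Eq. (10): post-action CDF g_a in the numerator; the denominator keeps g,
    because g was the distribution in place until the action at t1."""
    return (g_a(t2) - g_a(t1)) / (1.0 - g(t1))


def exponential_cdf(rate: float) -> Cdf:
    """G(t) = 1 - exp(-rate * t)."""
    return lambda t: 1.0 - math.exp(-rate * t)


lam, lam_a = 0.045, 0.01   # lam from the prediction event in Sect. 4; lam_a hypothetical
G, G_a = exponential_cdf(lam), exponential_cdf(lam_a)
t1, t2 = 5.0, 8.0

# Memorylessness: Eq. (9) depends only on t2 - t1 for an exponential distribution.
print(math.isclose(conditional_failure_prob(G, t1, t2), 1.0 - math.exp(-lam * (t2 - t1))))  # True
# Closed form of Eq. (10): exp(t1 (lam - lam_a)) - exp(-lam_a t2 + lam t1).
print(math.isclose(conditional_failure_prob_after_action(G, G_a, t1, t2),
                   math.exp(t1 * (lam - lam_a)) - math.exp(-lam_a * t2 + lam * t1)))  # True
```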

\( {P}^f\left({t}_1,{t}_2\right) \) denotes the probability of the occurrence of the undesired event in the time interval \( \left({t}_1,{t}_2\right) \), conditioned on its not occurring until time \( {t}_1 \), while \( {P}_{a_i}^f\left({t}_1,{t}_2\right) \) denotes the probability of its occurrence in the same interval, conditioned on its not occurring until time \( {t}_1 \) and assuming that action \( {a}_i \) has been implemented exactly at time \( {t}_1 \). The event density function of f, denoted by \( {g}^f(t) \), indicates the probability that f will occur at time t, and its cumulative distribution function is denoted by \( {G}^f(t) \). \( {G}^f(t) \) indicates the probability that f will occur between time zero and time t, while \( {\overline{G}}^f(t)=1-{G}^f(t) \) denotes the probability that the undesired event has not occurred by time t. When an action a is applied to reduce the probability of an undesired event, a is associated with a new event density function \( {g}_a^f(t) \), which is the probability that f occurs at time t given that a has been applied before t. This is needed because the implementation of action a does not prevent f with certainty. In Eq. (10), the conditioning (denominator) takes into account the fact that the distribution in place until the action occurrence at \( {t}_1 \) was \( {G}^f \).

The joint optimization of the maintenance and logistics equations is conducted using Brent’s method, a root-finding algorithm that combines the bisection method, the secant method, and inverse quadratic interpolation (Brent 1971; Gegenfurtner 1992).
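As a dependency-free illustration of shrinking a bracket around the optimal implementation time, the sketch below uses golden-section search. This is a simpler substitute, not Brent's method itself: Brent-type minimizers accelerate the same bracketing with parabolic interpolation steps, and Brent root finders add secant and inverse quadratic interpolation steps. It assumes the expected loss is unimodal on the search interval, and the quadratic test function is a toy placeholder. In practice, SciPy's `scipy.optimize.minimize_scalar` (methods `"brent"` and `"bounded"`) and `scipy.optimize.brentq` provide Brent-type minimization and root finding.

```python
import math
from typing import Callable, Tuple


def golden_section_minimize(f: Callable[[float], float], lo: float, hi: float,
                            tol: float = 1e-8) -> Tuple[float, float]:
    """Minimise a unimodal f on [lo, hi] by golden-section search; returns (x, f(x))."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)               # interior probe points, c < d
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:                         # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                               # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    x = (a + b) / 2.0
    return x, f(x)


x, fx = golden_section_minimize(lambda t: (t - 2.5) ** 2 + 1.0, 0.0, 10.0)
print(round(x, 6), round(fx, 6))  # 2.5 1.0
```

Applied to each action's expected loss over \( [0, T-\Delta t] \), the smallest of the per-action minima gives the recommended action and time.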

4 Information System Deployment in Industrial Environment

We validated our proposed approach in a real industrial environment in the oil and gas industry. Although comparable industries such as automotive and aviation have recently started exploiting big data by analyzing and processing them in suitable information systems, the oil drilling industry has not yet reached that level. We embedded our approach in an event-driven information system and integrated it, in a real-time event streaming computational environment, with a system addressing the Detect phase (Riemer et al. 2015), one addressing the Predict phase (Stopar 2015), and one addressing the Act phase (Bousdekis et al. 2015a) of the proactive principle. The oil drilling company aims to move from a time-based to a CBM strategy by exploiting IoT capabilities through sensors and an event-driven infrastructure, and by aligning its logistics operations.

For the gearbox equipment instance of the machine, the “Detect-Predict-Decide-Act” principle first deals with the detection of friction losses (Detect) using complex event patterns over lube oil temperature and RPM events, characterized by an abnormal oil temperature rise measured over a percentage of the drilling period during which the drilling RPM exceeds a threshold. This pattern, learned during the offline phase, indicates that the gearbox may be in a dangerous state. A detection event is therefore sent to the Predict phase, where a prognostic model estimates the reliability distribution function of the gearbox. This prediction triggers the Decide phase, which provides a proactive recommendation about the optimal maintenance action and the optimal time of applying it, as well as the optimal order of spare parts along with the optimal time for placing it. Finally, the Act phase includes a Sensor-Enabled Feedback (SEF) mechanism supporting continuous monitoring.
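The friction-loss pattern of the Detect phase can be sketched as a check over a window of samples. This is a simplified, hypothetical rendering: the `Sample` fields, the per-sample temperature-rise measure, and all thresholds are illustrative assumptions, whereas the deployed pattern was learned offline and expressed as complex event patterns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    rpm: float            # drilling RPM
    oil_temp_rise: float  # lube oil temperature rise since the previous sample (deg C)


def friction_loss_detected(window: List[Sample], rpm_threshold: float,
                           rise_threshold: float, min_fraction: float) -> bool:
    """True if, among samples where RPM exceeds rpm_threshold, an abnormal oil
    temperature rise occurs in at least min_fraction of them."""
    drilling = [s for s in window if s.rpm > rpm_threshold]
    if not drilling:
        return False
    abnormal = sum(1 for s in drilling if s.oil_temp_rise > rise_threshold)
    return abnormal / len(drilling) >= min_fraction


# Hypothetical window: three high-RPM samples (two abnormal) and one idle sample.
window = [Sample(120.0, 0.8), Sample(130.0, 0.9), Sample(125.0, 0.1), Sample(40.0, 2.0)]
print(friction_loss_detected(window, rpm_threshold=100.0, rise_threshold=0.5, min_fraction=0.6))  # True
print(friction_loss_detected(window, rpm_threshold=100.0, rise_threshold=0.5, min_fraction=0.7))  # False
```

The idle sample is ignored even though its temperature rise is high, because the pattern only considers the portion of the drilling period above the RPM threshold.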

At design time, the user interacts with the GUI of the web-based application to insert the required domain knowledge for each equipment instance. In the current scenario, there are four alternative maintenance actions (lubrication of metal parts, operation at reduced equipment load, offshore maintenance, and full onshore maintenance) with different degrees of restoration, and their associated spare parts orders (lube oil, no ordering, gearbox, and Derrick Drilling Machine (DDM), respectively), as shown in Table 2. The time-to-failure after the implementation of a maintenance action indicates its degree of restoration. Actions a1, a2, and a3 are implemented on the oil rig (offshore), while onshore maintenance, which corresponds to perfect (“good-as-new”) maintenance, requires moving the equipment onshore.
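The domain knowledge entered per equipment instance can be pictured as a small configuration structure. The sketch below encodes only what the text states (actions, associated orders, offshore versus onshore, perfect versus imperfect restoration); the class and field names, the id `a4` for onshore maintenance, and treating a1 to a3 as imperfect (since only onshore maintenance is described as perfect) are our assumptions. The post-action time-to-failure values of Table 2 are deliberately left unset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class MaintenanceAction:
    """Domain knowledge entered at design time for one alternative action (hypothetical schema)."""
    description: str
    spare_part: Optional[str]          # None: no spare parts order is needed
    offshore: bool                     # True if implemented on the oil rig
    perfect: bool                      # True for "good-as-new" restoration
    # Degree of restoration, expressed through the post-action time-to-failure;
    # left unset because Table 2's values are not reproduced here.
    post_action_ttf: Optional[float] = None

GEARBOX_ACTIONS: Dict[str, MaintenanceAction] = {
    "a1": MaintenanceAction("lubrication of metal parts", "lube oil", True, False),
    "a2": MaintenanceAction("operate at reduced equipment load", None, True, False),
    "a3": MaintenanceAction("offshore maintenance", "gearbox", True, False),
    "a4": MaintenanceAction("full onshore maintenance", "DDM", False, True),  # id assumed
}

def actions_requiring_orders(actions: Dict[str, MaintenanceAction]) -> List[str]:
    """Return the sorted ids of actions that come with a spare parts order."""
    return sorted(k for k, a in actions.items() if a.spare_part is not None)
```

Pairing each action with its order in one record mirrors how the model evaluates an action and its associated spare parts order together.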

Table 2 The domain knowledge inserted during user configuration

At some point, a prediction event carrying an exponential distribution of the failure occurrence with parameter λ = 0.045 triggers the decision algorithm. Eqs. (4) and (8) of the joint maintenance and logistics proactive decision model are then formulated as follows:

$$
\begin{aligned}
{EL}^{a_i}(t) = {} & \left[1-\left(1-e^{-\lambda t}\right)\right]\ast \Big\{ C_{a_i}+\left(1-e^{-\lambda \Delta t}\right)\ast C_f \\
& +\left[1-\left(1-e^{-\lambda \Delta t}\right)\right]\ast \Big[ C_{e_i}\left(T-t-\Delta t\right) \\
& +\left(e^{\left(t+\Delta t\right)\left(\lambda-\lambda'\right)}-e^{-\lambda' T+\lambda\left(t+\Delta t\right)}\right)\ast C_f \Big]\Big\} \\
& +\left(1-e^{-\lambda t}\right)\ast C_f
\end{aligned}
$$
$$
\begin{aligned}
{EL}^{o_i}(t) = {} & \left[1-\left(1-e^{-\lambda (t+L)}\right)\right]\ast \Big\{ C_{sp}+\left(1-e^{-\lambda \Delta t}\right)\ast C_s\left(T-t-L\right) \\
& +\left[1-\left(1-e^{-\lambda \Delta t}\right)\right]\ast \left(e^{\left(t+L+\Delta t\right)\left(\lambda-\lambda'\right)}-e^{-\lambda' T+\lambda\left(t+L+\Delta t\right)}\right) \\
& \ast C_s\left(T-t-L-\Delta t\right)\Big\} +\left(1-e^{-\lambda (t+L)}\right)\ast C_s(T)
\end{aligned}
$$
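A direct transcription of the two expected loss functions into Python can help when checking them numerically. This is a sketch under stated assumptions: `lam2` stands for λ′, which we read as the failure rate once the action is in place; \( C_{e_i} \) and \( C_s \) are treated as cost rates multiplied by the durations in parentheses; and the grid search is a simple stand-in for the Brent-based optimization described earlier. Apart from λ = 0.045, every parameter value (including T = 240, borrowed from the decision horizon in the sensitivity analysis) is hypothetical, so the resulting times are not expected to reproduce the recommendation reported for the scenario.

```python
import math

def _survival(lam, x):
    """Probability of no failure within x time units under an exponential rate lam."""
    return math.exp(-lam * x)

def expected_loss_action(t, lam, lam2, dt, T, C_a, C_f, C_e):
    """EL^{a_i}(t): start action a_i at time t; it takes dt to complete.
    Assumptions: lam2 is lambda'; C_e is a cost rate times (T - t - dt)."""
    post = (math.exp((t + dt) * (lam - lam2))
            - math.exp(-lam2 * T + lam * (t + dt)))   # post-action failure term
    s_t, s_dt = _survival(lam, t), _survival(lam, dt)
    return (s_t * (C_a + (1 - s_dt) * C_f
                   + s_dt * (C_e * (T - t - dt) + post * C_f))
            + (1 - s_t) * C_f)

def expected_loss_order(t, lam, lam2, dt, L, T, C_sp, C_s):
    """EL^{o_i}(t): order the spare part at time t; it arrives after lead time L.
    Assumption: C_s is a shortage cost rate times the stated durations."""
    post = (math.exp((t + L + dt) * (lam - lam2))
            - math.exp(-lam2 * T + lam * (t + L + dt)))
    s_tl, s_dt = _survival(lam, t + L), _survival(lam, dt)
    return (s_tl * (C_sp + (1 - s_dt) * C_s * (T - t - L)
                    + s_dt * post * C_s * (T - t - L - dt))
            + (1 - s_tl) * C_s * T)

def argmin_on_grid(f, lo, hi, n=2001):
    """Coarse grid search over [lo, hi]; not the paper's Brent-based search."""
    grid = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    best = min(grid, key=f)
    return best, f(best)

# Hypothetical parameters: only lam = 0.045 comes from the scenario.
params = dict(lam=0.045, lam2=0.005, dt=6.0, T=240.0)
t_a, el_a = argmin_on_grid(
    lambda t: expected_loss_action(t, C_a=2_000, C_f=20_000, C_e=5.0, **params),
    0.0, params["T"] - params["dt"])
t_o, el_o = argmin_on_grid(
    lambda t: expected_loss_order(t, L=24.0, C_sp=1_000, C_s=50.0, **params),
    0.0, params["T"] - 24.0 - params["dt"])
```

A quick sanity check on the transcription: with `lam2 = lam` and zero action and extension costs, the action changes nothing, and \( EL^{a_i}(t) \) collapses to \( C_f\,(1-e^{-\lambda T}) \) for every t.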

Although the parameter λ provides an indication of the expected time-to-failure, the exponential distribution carries high uncertainty, so relying on a single deterministic value is risky. Handling the full PDF instead can lead to more accurate and reliable results. The expected loss functions are shown in Fig. 2, and their optimization yields the recommendation: “Conduct offshore maintenance for gearbox replacement in 85.47 h and order the gearbox in 42.36 h.” These recommendations are presented to the user through the GUI.
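The spread of the exponential distribution makes the uncertainty argument concrete. For the scenario value λ = 0.045, the density is highest at t = 0, the standard deviation equals the mean 1/λ ≈ 22.2, the median is ln 2/λ ≈ 15.4, and the central 90% interval runs from about 1.1 to about 66.6, all in the time unit of λ (not stated here; presumably hours). The small helper below, whose name is our own, computes these quantiles from the inverse of \( G^f(t) = 1 - e^{-\lambda t} \).

```python
import math

LAM = 0.045  # rate of the predicted exponential distribution (scenario value)

def ttf_quantile(q: float, lam: float = LAM) -> float:
    """Time by which the failure has occurred with probability q, i.e. G^{-1}(q)."""
    return -math.log(1.0 - q) / lam

mean_ttf = 1.0 / LAM                 # expected time-to-failure; the std is identical
median_ttf = ttf_quantile(0.5)
interval_90 = (ttf_quantile(0.05), ttf_quantile(0.95))
```

A point estimate such as the mean therefore says little about when the failure will actually occur, which is why the model works with the whole distribution.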

Fig. 2

The expected loss functions for (a) maintenance, and (b) logistics (ordering of spare parts)

5 Evaluation Results

5.1 Comparative Analysis

We compared the results of the proposed decision model for the aforementioned scenario with three cases: (1) no prediction is available, so corrective maintenance and inventory-related actions are applied (reactive approach); (2) a preventive policy with time-based maintenance and scheduled ordering is applied; and (3) a prediction is available but no proactive recommendation, so a preventive action is applied immediately when the prediction is provided (myopic approach). In the first case, corrective maintenance actions last longer than planned ones due to the lack of knowledge about root causes, while emergency, unplanned ordering of spare parts involves a longer lead time along with a cost penalty due to the unplanned distribution. In the second case, there is the cost of time-based maintenance along with the risk of an unexpected failure between maintenance intervals. In the third case, spare parts are ordered immediately upon the failure prediction and preventive maintenance actions are implemented after the required lead time; however, a failure may occur before the spare parts arrive. The cost values for the comparative analysis were derived from expert knowledge combined with historical data analysis. The results are shown in Table 3.
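The stock-out risk of the myopic approach can be quantified with a simplified calculation, assuming the exponential failure model of the scenario and an order placed exactly at the prediction time. The probability that the failure occurs during the lead time L is then \( G^f(L) = 1 - e^{-\lambda L} \); by memorylessness, it does not depend on how long the equipment has already run. This is not the cost model behind Table 3, and the lead times below are hypothetical.

```python
import math

def prob_failure_before_arrival(lam: float, lead_time: float) -> float:
    """Probability that the equipment fails during the spare parts lead time,
    for an order placed at the prediction time (myopic policy), under an
    exponential failure distribution with rate lam: 1 - exp(-lam * lead_time)."""
    return 1.0 - math.exp(-lam * lead_time)

LAM = 0.045                          # scenario value
lead_times = [12.0, 24.0, 48.0]      # hypothetical lead times, same unit as 1/LAM
risks = {L: prob_failure_before_arrival(LAM, L) for L in lead_times}
```

Because this risk grows quickly with the lead time, ordering immediately and then waiting does not protect against a failure in the meantime, which the proactive model accounts for by jointly optimizing both times.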

Table 3 Results of comparative analysis for the aforementioned scenario

Moreover, we simulated prediction events for 5 real case studies, based on the configuration of 5 associated equipment instances by the users in the oil drilling company. For each scenario, we simulated 100 executions by sending prediction events. In all scenarios, the expected loss of the proposed approach is significantly lower than that of the reactive, preventive, and myopic approaches, leading to improved business performance, as shown in Table 4. In the case of the myopic policy, actions may be applied at a time chosen according to domain knowledge, which is not quantifiable and is constrained by the subjectivity of the human decision-making process.
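The evaluation loop (repeated prediction events per scenario, with each policy's expected loss averaged) can be outlined as a small harness. This is a hypothetical sketch: `simulate`, the uniform distribution of predicted rates, and the two toy policies are our assumptions and do not reproduce the policies or numbers behind Table 4.

```python
import math
import random
from statistics import mean
from typing import Callable, Dict, List

# A policy maps a predicted failure rate to the expected loss it incurs.
Policy = Callable[[float], float]

def simulate(policies: Dict[str, Policy],
             draw_rate: Callable[[random.Random], float],
             n_runs: int = 100, seed: int = 0) -> Dict[str, float]:
    """Send n_runs simulated prediction events; return each policy's mean expected loss."""
    rng = random.Random(seed)                  # seeded for reproducibility
    losses: Dict[str, List[float]] = {name: [] for name in policies}
    for _ in range(n_runs):
        lam = draw_rate(rng)                   # one simulated prediction event
        for name, policy in policies.items():
            losses[name].append(policy(lam))
    return {name: mean(values) for name, values in losses.items()}

# Toy, hypothetical policies: "reactive" always pays the failure cost,
# "proactive" pays it only with probability 1 - exp(-10 * lam).
C_F = 100.0
toy_policies = {
    "reactive": lambda lam: C_F,
    "proactive": lambda lam: C_F * (1 - math.exp(-10 * lam)),
}
summary = simulate(toy_policies, draw_rate=lambda rng: rng.uniform(0.01, 0.1))
```

Evaluating all policies on the same sequence of prediction events keeps the comparison paired, so differences in mean loss reflect the policies rather than the sampled events.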

Table 4 Results of comparative analysis for several executions in five scenarios

5.2 Sensitivity Analysis

5.2.1 Results of Sensitivity Analysis with Respect to the Prediction Events

As part of the sensitivity analysis, we simulated several prediction events to investigate the resulting recommendations and the associated expected loss. Table 5 shows some indicative results. Since the decision horizon ends 240 h after the prediction event is triggered, a recommended time of 240 h means that the action should be performed as originally planned. The results show that the recommendations can change significantly depending on the prediction events. In addition, the earlier a failure is predicted and the proactive decision model is triggered, the lower the expected loss, and the more time the decision maker has to prepare and align other manufacturing operations. This also implies the need for reliable and accurate predictive algorithms that minimize false positives and false negatives, so that upcoming undesired events (e.g., equipment failures) are predicted early. In this way, proactive decision models will be able to provide recommendations that lead to better business performance.

Table 5 Results of sensitivity analysis with respect to the prediction events

5.2.2 Results of Sensitivity Analysis with Respect to the Costs

To conduct a sensitivity analysis of the proactive decision model for joint maintenance and logistics optimization, we simulated four cost structures for a given prediction, varying the relation between the action cost and the failure cost, as well as between the shortage cost and the spare parts cost. Figures 3 and 4 show two indicative plots for the maintenance and logistics expected loss functions, respectively (for one maintenance action and one spare parts order), while Tables 6 and 7 present the resulting optimal expected loss and the optimal implementation time for the specific action. Similarly to other proactive decision algorithms (Engel et al. 2012), the proposed model is sensitive to its cost-related input parameters: changing them changes the expected loss functions and can therefore lead to different recommendations.
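A cost-structure sweep of this kind can be sketched for the maintenance side by re-optimizing \( EL^{a_i}(t) \) under different ratios of failure cost to action cost. The four ratios, λ′ (`lam2`), Δt, T, and the cost rate `C_e` are hypothetical, and the grid search stands in for the paper's optimizer; the actual structures behind Tables 6 and 7 are not reproduced here.

```python
import math

def el_action(t, C_a, C_f, lam=0.045, lam2=0.005, dt=6.0, T=240.0, C_e=5.0):
    """Maintenance expected loss EL^{a_i}(t). Only lam comes from the scenario;
    lam2 (read as lambda'), dt, T and the cost rate C_e are hypothetical."""
    s_t, s_dt = math.exp(-lam * t), math.exp(-lam * dt)
    post = math.exp((t + dt) * (lam - lam2)) - math.exp(-lam2 * T + lam * (t + dt))
    return (s_t * (C_a + (1 - s_dt) * C_f + s_dt * (C_e * (T - t - dt) + post * C_f))
            + (1 - s_t) * C_f)

def optimal_action_time(C_a, C_f, dt=6.0, T=240.0, n=2341):
    """Grid search (step 0.1 with the defaults) for the best action time on [0, T - dt]."""
    grid = [(T - dt) * k / (n - 1) for k in range(n)]
    t_best = min(grid, key=lambda t: el_action(t, C_a, C_f, dt=dt, T=T))
    return t_best, el_action(t_best, C_a, C_f, dt=dt, T=T)

# Four hypothetical cost structures: (action cost C_a, failure cost C_f)
COST_STRUCTURES = {
    "C_f = 2 x C_a": (5_000.0, 10_000.0),
    "C_f = 5 x C_a": (2_000.0, 10_000.0),
    "C_f = 10 x C_a": (1_000.0, 10_000.0),
    "C_f = 20 x C_a": (500.0, 10_000.0),
}
results = {name: optimal_action_time(C_a, C_f)
           for name, (C_a, C_f) in COST_STRUCTURES.items()}
```

Since \( EL^{a_i}(t) \) is linear in \( C_{a_i} \) and \( C_f \) with non-negative coefficients on this range, raising either cost can only raise the optimal expected loss, while the optimal time may shift; the logistics function can be swept the same way over \( C_s \) and \( C_{sp} \).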

Fig. 3

Four cost structures for the maintenance expected loss function

Fig. 4

Four cost structures for the spare parts ordering expected loss function

Table 6 Results of the cost structures for the maintenance expected loss function
Table 7 Results of the cost structures for the spare parts ordering expected loss function

6 Conclusions and Future Work

We presented a proactive event-driven decision model for joint maintenance and logistics optimization in an IIoT-based industrial environment. The proposed model addresses the Decide phase of the “Detect-Predict-Decide-Act” proactive principle. Unlike previous approaches, our model can be embedded in an EDA in a scalable and efficient way. Moreover, it provides recommendations in the form of action-time pairs when there are multiple alternative (imperfect and perfect) maintenance and logistics actions. Our approach was tested in a real industrial environment in the oil and gas industry and further evaluated through comparative and sensitivity analyses. The results showed that the proposed model can significantly reduce the expected losses caused by maintenance and logistics actions. Moreover, the time at which a prediction event is received and the accuracy of the cost-related inputs are crucial for the reliability and business value of the recommendations. In future work, we will develop a context-aware model that accounts for the context affecting the proactive decision model (i.e., its input parameters and, thus, the recommendations themselves). We also aim to integrate the proposed decision model with a Sensor-Enabled Feedback mechanism and a portfolio optimization approach for supplier selection, since prices may fluctuate.