1 Introduction

The integration of Artificial Intelligence (AI) into Cyber-Physical Systems (CPS) is used to establish autonomous and self-driven machining processes [1,2,3]. Machining processes usually exhibit non-linear behavior and stochastic degradation, which makes the life span of the tool difficult to predict, especially when dealing with difficult-to-cut materials [4, 5]. There have been many attempts in the literature to monitor tool wear and detect tool failure. To achieve the highest possible material removal rates in machining, previous studies focused on offline optimization to schedule the machine feed rate, assuming ideal machining conditions [6]. Despite these attempts, an intelligent online system that can monitor and optimize the tool’s performance in real time is still needed. This paper fills this gap by offering a new approach that provides an intelligent extension of the tool Time to Failure (T2F) while maintaining an acceptable material removal rate level.

Offline mathematical optimizations were applied to the CNC machine processes to find the static machining parameters that maximize productivity [7]. Feed rate scheduling optimizations were developed to have a dynamic online feed rate setting [5, 8, 9]. One of the limitations of these approaches is the usage of empirical equations that assume ideal machining conditions. Adaptive control (AC) techniques take into consideration the environmental and sensor variations by mathematically estimating the forces at each time step, and then comparing the estimated values with the actual sensor measurements [5]. As such, the CNC machine controller’s parameters are changed to achieve the offline optimized feed rate schedule. The estimation of online forces requires large computational time. Stemmler et al. [10] developed a Model Predictive Controller (MPC) to minimize the production time online for CNC milling machines. MPC is a model-based controller that predicts the values of the forces and adjusts the feed rate online accordingly to minimize the machining time. The MPC online optimization causes a processing delay, and thus, an additional signal processing synchronization is added to the machine controller. Both MPC and AC require mathematical modeling to estimate the forces. These models assumed a new tool at the beginning of cutting and ideal tooling conditions.

Shaban et al. studied tool wear monitoring for CNC machines to develop a failure alarm solution that could avoid producing defective pieces [11,12,13]. The authors applied Logical Analysis of Data (LAD) to detect the tool wear (VB) failure while monitoring the data of the machining forces. Sadek et al. [14] developed an adaptive mechanism that linked the tool wear monitoring to AC for a drilling machine. This mechanism is limited to two speeds and two feed rate adjustment levels. Shaban et al. used the Time to Failure (T2F) and the Proportional Hazard Modeling (PHM) to obtain the optimal replacement time for the CNC machine tool [15]. The authors defined the tool replacement time at different machine settings using two types of analyses: tool availability and cost. The machining parameters were static settings, and they were adjusted before running the machine.

Taha et al. [16] developed a self-healing mechanism for a CNC milling machine. This mechanism dealt with the CNC machine under fault and applied a self-healing mechanism to the machine. The authors used pattern-recognition machine learning to define the recovery patterns, and each pattern is bounded by corrective settings. The self-healing mechanism selects the recovery pattern according to its distance from the machine’s current faulty settings. From the selected pattern, the corrective actions are randomly selected through a uniform distribution that is bounded by the selected pattern.

Reinforcement Learning (RL) is a model-free approach that is able to find the best value of a variable over its dimensional space, according to the optimal trained policy [17]. It has been successfully adapted to optimize the tool orientation [18] and to manage the energy of a turning process under different cutting conditions [19]. Autonomous actions to improve tool life and productivity still need to be addressed in the pre-failure zone. Table 1 provides a summary of many attempts in terms of what was achieved and what needs to be addressed. To fill the research gaps presented in Table 1, the main objective of this paper is to develop an autonomous pre-failure mechanism that interacts with the CNC machine in the P-F interval to extend the tool’s useful life. Figure 1 depicts the P-F curve, which is a conceptual curve of the degradation of any physical asset. The P-F curve has two main points that give it its name: the potential failure point (P) and the functional failure point (F) [20]. In practice, the degradation process is a stochastic phenomenon, and each tool has a unique P-F curve [3, 4].

Table 1 Research Gaps

Fig. 1 P-F Conceptual Curve

This section presents the LAD algorithm to generate the patterns, which are used to develop the self-healing actions in module 3, and to classify and detect the out-of-specification in module 2.

The proposed Pre-Failure approach has the following features:

  1. Model-free adjustment mechanism for the CNC machine.
  2. Continuous feed rate adjustment.
  3. Time to failure extension and degradation rate slowdown.
  4. Lower online computation efforts.
  5. Applicable for wide ranges of machining parameters.

In this work, a model-free Deep Reinforcement Learning (DRL) approach is proposed for continuous online feed rate adjustment. This approach adds a tuning mechanism that optimizes the tool performance and productivity in the P-F zone of the machine’s tool. The approach is able to achieve the highest possible material removal rate while maintaining an acceptable tool wear level. It can be implemented in any machining process with less computational effort.

This paper is organized as follows: Section 2 describes the system layout and Pre-Failure mechanism procedure. Section 3 contains the physical experiment and data review. Section 4 presents the proposed methodology. Section 5 describes the results of the implementation and provides a discussion. Lastly, Sect. 6 concludes the paper.

2 System description

2.1 System layout

Figure 2 shows the system layout of the online autonomic closed loop for the CNC machine’s pre-failure mechanism. The pre-failure mechanism has two main phases and four modules. Phase 1 is the offline step for machine learning with Logical Analysis of Data (LAD), which is indicated by module 1. The software used is cbmLAD, which was developed for condition-based maintenance applications [21]. cbmLAD is used to generate explanatory patterns that define the online P-F zones in module 2. The tool data is labeled according to the tool wear level VB as (a) new tool \({VB < VB_p}\), (b) Potential Failure (P-F) of the tool \({VB_p \le VB<VB_F}\), and (c) Failure of the tool \({VB \ge VB_F}\). The time-series force data are ingested into cbmLAD to extract patterns that characterize the P-F zone. The tool’s data is labeled as a failure when the tool wear level is more than, or equal to, a predefined value \({VB_F}\). The data in the potential failure zone is labeled in the same way, as shown in Sect. 4.1.

Fig. 2 Autonomic closed loop to achieve Pre-failure Mechanism

P-F zone monitoring in module 2 is based on online rules extracted from the patterns generated by cbmLAD. This module monitors the tool performance and detects the instants of potential failure and tool failure. Module 3 represents the CNC machine in the digital environment. The developed CNC Digital Twin (DT) is proposed to work online and in parallel with the physical machine. The DT model is supported by an artificial Neural Network (NN) that reads the CNC machine settings of cutting speed v and feed rate f. It estimates the machine’s force measurements \({[F_x, F_y, F_z]}\) at each time step (t+1), based on the sensor’s readings of the forces at time (t).

Module 4 is a Deep Reinforcement Learning (DRL) Pre-Failure agent that generates action \({a_{t+1}}\) to adjust the feed rate \({f_{t+1}}\) according to the optimal policy that the agent learned in the training phase. In the online mode, module 2 reads the CNC machine force sensor measurements at time (t) and enables the Pre-Failure agent in module 4 once the P-F zone is detected. The Pre-Failure agent reads the CNC machine measurements of the radial force \({F_x}\), the feed force \({F_y}\), the cutting force \({F_z}\), and the cutting speed v at each time step. Accordingly, the proposed DRL agent adjusts the feed rate \({f_{t+1}}\) to slow down the tool degradation rate.

2.2 Pre-failure intelligent mechanism procedure

This section presents the design steps to achieve the research objective of having a tool Pre-Failure mechanism for autonomous CNC machines. The proposed Pre-Failure mechanism has four main steps, as follows:

  1. CNC machine experimental data: collecting the material-tool pair Time to Failure (T2F) data is an essential step in improving the tool performance in the Pre-Failure stage. In this paper, the raw T2F data is analyzed to monitor and detect the tool performance degradation in the P-F zone. The developed CNC machine’s Digital Twin (DT) model is validated and tested with the physical raw data. These data were collected for the process of turning Titanium Metal Matrix Composite (TiMMC) material. The raw data collection experiment was fully described in [13]. The data is presented in Sect. 3.

  2. Tool P-F zone monitoring: the tool performance degradation is studied by building the tool P-F curve, following the general one in Fig. 1. The P-F curve shows the performance degradation versus the lifetime of the tool. Section 4.1 presents the data analysis of tool performance degradation and the proposed algorithm to define the tool potential failure instant. It also includes a Logical Analysis of Data (LAD) and the online generated patterns that monitor the P-F zone. By the end of Sect. 4.1, module 2 in Fig. 2 is achieved.

  3. Deep Reinforcement Learning (DRL) Pre-Failure agent: this is the step of designing the DRL agent (module 4 in Fig. 2). Section 4.2 explains the DRL for continuous feed rate adjustment, and it defines the pre-failure agent objective and the state vector that describes the CNC machine status from the perspective of DRL. Section 4.2 includes a description of the agent’s architecture, training algorithm, and communication links to modules 2 and 3. The added value and machining improvement for the tool’s performance are discussed and verified in the results in Sect. 5.

  4. CNC machine Digital Twin (DT): the DRL’s environment is an essential part of Pre-Failure agent learning. A Digital Twin model is developed to interact with the DRL agent. The developed model relies on the collected experimental data, and it is validated against the physical machine tool’s degradation. By the end of Sect. 4.3, the CNC machine’s digital module 3 in Fig. 2 is accomplished.

3 Review of the experimental data

The proposed Pre-Failure algorithm will be implemented on a CNC machine during the turning of Titanium Metal Matrix Composites (TiMMC), and all experimental data used in this study is based on [13]. In the data collection phase, the experimental data was recorded under different static machining parameters on a 5-axis Boehringer NG 200 CNC turning center [13]. The tool diameter was 1.6 mm, and the tool wear was measured with an Olympus SZ-X12 microscope. In [13], two design variables are included: feed rate f (mm/rev) and cutting speed v (m/min). In terms of the machining outputs and experiment response, the forces and flank wear VB (mm) are recorded. The experiment consisted of five runs, with at least five replications for each run.

For high-speed turning machines, ISO 3685:1993 states that the maximum allowable VB before tool failure is 0.3 mm [22]. This criterion is rarely used in the aerospace industry, where a lower limit is preferred to avoid surface damage caused by tool degradation [23]. In this paper, the experiments were conducted on a CNC machine turning TiMMC material for aerospace applications, and the tool wear level is recommended to be lower than 0.2 mm [13]. Therefore, VB was measured every 2 min until its value exceeded the failure level of 0.2 mm.

Full factorial Design of Experiment (DoE) is the most conservative DoE type, as it aims to cover all combinations of machining parameters [24, 25]. In this paper, the experimental method was a 2-factor, 2-level full factorial DoE. The levels of the DoE were the maximum and minimum operation settings that were recommended by the tool’s supplier: speed \({(v = 40;~80~\text{m/min})}\) and feed rate \({(f = 0.15;~0.35~\text{mm/rev})}\). One more level, \({v = 60~\text{m/min}}\) and \({f = 0.25~\text{mm/rev}}\), was added to the full factorial DoE in order to address the non-linearity of the tool degradation and increase the prediction accuracy. The raw experimental data consists of 247 observations. A sample of the experimental data is given in Table 2.
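
As an illustration of the design described above, the following minimal sketch enumerates the four corner points of the 2-factor, 2-level full factorial design and appends the additional mid-level point, giving five runs. The function name `design_points` and the ordering of the runs are assumptions made for illustration; the run numbering used in [13] is not reproduced here.

```python
from itertools import product


def design_points(speeds=(40, 80), feeds=(0.15, 0.35), centre=(60, 0.25)):
    """Return (cutting speed in m/min, feed rate in mm/rev) pairs.

    The four corners come from the 2-level full factorial design; the
    extra mid-level point is appended last. The ordering is illustrative
    only and does not follow the run numbers of the original experiment.
    """
    corners = list(product(speeds, feeds))
    return corners + [centre]


for v, f in design_points():
    print(f"v = {v} m/min, f = {f} mm/rev")
```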

Table 2 A sample of experimental Raw Data

For example, Fig. 3 shows the radial forces of the experimental data provided in Table 2 for a cutting speed of 80 m/min. Figure 3 contains five replications of run 2 and run 4. It can be seen from Fig. 3 that increasing the feed rate at the same cutting speed leads to higher radial forces. Accordingly, the T2F becomes shorter when the feed rate is increased.

Fig. 3 Experimental radial forces \({F_x}\) at cutting speed of 80 m/min and different feed rates for different replications

4 Materials and methods

4.1 Tool degradation monitoring on PF curve

4.1.1 Tool potential failure point (P)

The P-F curve presents the tool performance’s degradation against its operating time, and it defines the potential failure point (P) and the functional failure point (F) [20]. The tool has functionally failed when it exceeds the tool wear VB level that is recommended by the tool’s manufacturer. The potential failure point (P) is the point at which the tool’s failure propagation starts to increase, and it could be detected. In the P-F zone, the tool’s performance has a significant deviation from its normal behavior when the tool is first installed. This performance is observed by the machine’s sensor measurements of forces [13, 15]. P-F zone is an important mode as maintenance activities take place in this time interval [13, 15, 20].

In this paper, the knee point detection algorithm in [26] is adapted to define the potential failure point (P) for tool performance degradation according to the experimental tool wear data in Table 2. To plot the tool performance degradation P-F curve, an index that takes values in the interval [0,1] is developed. The Normalized Tool performance degradation Index NTPI is given by \({NTPI=1-5\times VB}\), with VB in mm. It equals one for a new tool and zero at the tool failure limit of \({VB=0.2}\) mm. Figure 4 shows an example of the P-F curve for a tool operated at 40 m/min cutting speed and a feed rate of 0.35 mm/rev. The potential failure point (P) is detected at 560 s, and the wear is 0.073 mm for this replication.
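
The index can be sketched in a few lines. The function below is a minimal illustration, assuming the failure limit of 0.2 mm stated above; the name `ntpi` and the clamping of the result to [0, 1] are assumptions added for the sketch and are not taken from the original work.

```python
def ntpi(vb_mm: float, vb_failure_mm: float = 0.2) -> float:
    """Normalized tool performance index for a flank wear value in mm.

    With vb_failure_mm = 0.2 this reduces to NTPI = 1 - 5 * VB.
    """
    value = 1.0 - vb_mm / vb_failure_mm
    # Assumption: clamp to [0, 1] so wear beyond the limit maps to zero.
    return min(1.0, max(0.0, value))


# Wear reported at the detected P point of the replication in Fig. 4.
print(round(ntpi(0.073), 3))
```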

Fig. 4 P-F curve of one tool replication under 40 m/min cutting speed and 0.35 mm/rev feed rate

The knee detection algorithm [26] calculates the Euclidean displacement between each point on the NTPI graph and its perpendicular projection on an imaginary reference line. This straight line links the maximum and the minimum points of the tool performance, and it is given by the dashed line in Fig. 5. The potential failure point is the point on the NTPI curve that has the maximum positive Euclidean distance. In Fig. 5, the red curve is the Euclidean displacement between the NTPI and the reference line. Figure 6 depicts the Normalized Tool performance degradation Index NTPI and the potential failure (P) points for all the runs and replications of the experimental data given in Table 2. The potential failure points are indicated by dashed lines. As the tool degradation is a stochastic process, the detected potential points are not the same for all of the runs and replications. Table 3 summarizes the potential failure levels of the tool wear VB for each run and replication for the cutting speeds \({v_1 = 40~\text{m/min}}\) and \({v_2 = 80~\text{m/min}}\) and the feed rates \({f_1 = 0.15~\text{mm/rev}}\) and \({f_2 = 0.35~\text{mm/rev}}\). The average potential (P) tool wear over the collected experimental data is 0.135 mm. The tool P-F zone is the pre-failure zone in which the correction mechanism is needed.
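
The distance-to-reference-line idea described above can be sketched as follows. This is a simplified illustration and not the full algorithm of [26]: the time axis is rescaled to [0, 1] so that both axes are comparable (an assumption), the reference line joins the points of maximum and minimum NTPI, and the sample values are hypothetical.

```python
import math


def knee_point(times, ntpi_values):
    """Return (index of P point, signed distances to the reference line)."""
    t0, t1 = times[0], times[-1]
    x = [(t - t0) / (t1 - t0) for t in times]  # assumed rescaling to [0, 1]
    y = list(ntpi_values)

    # Reference line through the maximum and minimum performance points.
    i_max = max(range(len(y)), key=y.__getitem__)
    i_min = min(range(len(y)), key=y.__getitem__)
    x1, y1, x2, y2 = x[i_max], y[i_max], x[i_min], y[i_min]
    length = math.hypot(x2 - x1, y2 - y1)

    # Signed perpendicular distance; positive when the curve lies above
    # the line (valid when the maximum occurs before the minimum).
    dist = [((x2 - x1) * (yi - y1) - (y2 - y1) * (xi - x1)) / length
            for xi, yi in zip(x, y)]
    knee = max(range(len(dist)), key=dist.__getitem__)
    return knee, dist


# Hypothetical NTPI samples, for illustration only.
times = [0, 120, 240, 360, 480, 600]
values = [1.0, 0.95, 0.9, 0.8, 0.5, 0.0]
index, _ = knee_point(times, values)
print(times[index])
```

In this sketch the P point is simply the sample with the largest positive distance, so its resolution is limited by the sampling interval of the wear measurements.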

Fig. 5 P-F curve and Euclidean distance curve for P point detection

Fig. 6 TiMMC Normalized Tool performance degradation Index NTPI for different runs and replications of the experimental data in Table 2

Table 3 Potential failure points of the CNC tool experimental data

4.1.2 Tool P-F zone online monitoring and detection

Logical Analysis of Data (LAD) is a non-statistical supervised data mining method. It uses Boolean logic functions and combinatorial optimization for classification [21, 27]. The advantage of LAD over other classification methods is to generate explanatory patterns for each class, which maintains comparative performance in knowledge extraction for supervised and semi-supervised classification problems. The patterns divide the multidimensional space of features into zones that characterize the classes.

cbmLAD solves Mixed-Integer Linear Programming (MILP) optimization problems iteratively to find the logical relationships among the input data features by generating patterns that characterize each class of the tool’s life [28]. In each pattern, each feature is bounded by a specific range of values. For a new data point, the pattern is satisfied if the values of the measured features lie in their bounded ranges. In the one versus all (OVA) classification technique, cbmLAD generates patterns to distinguish a specific class from the other classes. From the tool P-F curve, the tool life consists of three zones, as shown in Fig. 7: (a) New tool class, (b) Pre-failure class, and (c) Failure class. To detect the tool degradation state, cbmLAD divides the classification problem into three sub-problems and finds each class’s discrimination function \({\Delta _1, \Delta _2, \Delta _3}\).

Fig. 7 OVA Technique for the Tool Degradation Performance Classes in Two-Dimensional Space

For a new observation O, the OVA cbmLAD multi-class’s discrimination function \({\Delta (O)}\) is given in Eq. 1 as described in [28].

$$\begin{aligned} \Delta (O)&=\arg \max _{i}\,[\Delta _{i}(O)]\\ \Delta _{i}(O)&=\sum _{P_{j}}{w_{j}\alpha _{P_{j}}(O)},\; \forall i=1,2,3 \end{aligned}$$
(1)

where \({P_j}\) is the \({j^{th}}\) pattern in the set of patterns that belongs to class i, and j is the index of the pattern. \({w_j}\) is the pattern coverage weight, and \({\alpha _{P_j}(O)}\) is a binary index, which is 1 when the observation O is covered by pattern \({P_j}\), and zero otherwise.
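
A minimal sketch of Eq. 1 is given below, assuming each pattern is stored as a set of inclusive feature bounds together with its class label and coverage weight. The `Pattern` class, the feature names, and all numerical bounds and weights are hypothetical and serve only to illustrate the scoring; they are not the patterns generated by cbmLAD.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    label: str     # class i that the pattern characterizes
    weight: float  # coverage weight w_j
    bounds: dict   # feature name -> (low, high), inclusive

    def covers(self, obs: dict) -> bool:
        """Binary index alpha: True when every bounded feature is in range."""
        return all(lo <= obs[name] <= hi
                   for name, (lo, hi) in self.bounds.items())


def classify(obs: dict, patterns: list):
    """Return the class with the largest discriminant and all class scores."""
    scores = {}
    for p in patterns:
        scores.setdefault(p.label, 0.0)
        if p.covers(obs):
            scores[p.label] += p.weight
    return max(scores, key=scores.get), scores


# Hypothetical patterns and observation, for illustration only.
INF = float("inf")
patterns = [
    Pattern("new", 0.6, {"Fx": (0, 300)}),
    Pattern("pre-failure", 0.5, {"Fx": (250, 450), "t": (400, INF)}),
    Pattern("pre-failure", 0.3, {"Fz": (500, 900)}),
    Pattern("failure", 0.7, {"Fx": (450, INF)}),
]
obs = {"t": 600, "Fx": 280, "Fy": 200, "Fz": 600}
print(classify(obs, patterns)[0])
```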

In this paper, the cbmLAD one versus all (OVA) technique is applied to solve a multi-classification problem in order to find the tool’s state of degradation. In online mode at each time step (t), the P-F monitoring and detection module 2 in Fig. 2 monitors the time-stamped machine’s sensors \({[t, F_x, F_y, F_z]}\) and checks whether the measured observation is covered by any of the patterns that represent the pre-failure or failure zones. The Pre-failure zone’s detection signals are used to activate or deactivate the Reinforcement Learning RL module 4 in Fig. 2.

From Table 3, the potential failure VB level is 0.135 mm on average. The P-F zone is defined as the interval in which the tool wear is \({0.135 \le VB < 0.2}\), and the failure level is \({VB \ge 0.2}\). The experimental data is categorized into three main classes: (a) \({VB < 0.135}\), (b) \({0.135 \le VB < 0.2}\), and (c) \({VB \ge 0.2}\) mm. According to the developed tool P-F curve, the data in Table 2 is labeled with these classes and ingested into OVA cbmLAD to generate the tool life’s patterns. Table 4 presents the generated patterns that characterize the P-F and the failure classes.
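
The labeling rule can be written directly from the thresholds above. The sketch below assumes the average P level of 0.135 mm and the failure level of 0.2 mm; the function name and the class label strings are illustrative choices.

```python
def wear_class(vb_mm: float, vb_p: float = 0.135, vb_f: float = 0.2) -> str:
    """Label a flank wear measurement (mm) with its tool life class."""
    if vb_mm < vb_p:
        return "new"          # class (a): VB < 0.135
    if vb_mm < vb_f:
        return "pre-failure"  # class (b): 0.135 <= VB < 0.2
    return "failure"          # class (c): VB >= 0.2


print([wear_class(vb) for vb in (0.05, 0.135, 0.18, 0.2)])
```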

The online P-F monitoring module 2 in Fig. 2 performs two main functions, (a) the monitoring of the potential failure interval with ten patterns, and (b) the detection of failure with five patterns. Each pattern in Table 4 is represented in a multidimensional zone in the features space. At each time step (t), once the measured observation lies in a pattern zone, a signal is sent to activate or deactivate the Pre-Failure agent module 4 in Fig. 2. The P-F monitoring and detection module’s scanning cycle is synchronized with the machine module 3 in Fig. 2 and with the pre-failure agent.

Table 4 Generated patterns of P-F and failure zones for the data of the time-stamped Force

4.2 Deep Reinforcement Learning (DRL) model

The standard RL is formalized as an agent that interacts with a system’s environment, receiving the current system state \({s_t}\) and an instant reward \({r_t}\) at time t [17, 29]. The RL goal is to find the optimal policy \({\pi ^{*}}\) that maximizes the return from the state, \({R_{t}=\sum _{i=t}^{\infty }{\gamma ^{i-t} r_{i}(s_{i},a_{i})}}\), where \({\gamma \in [0,1]}\) is the discount factor for future rewards and t is the instant of the return [29, 30]. The expected return of taking action \({a_t}\) in state \({s_t}\) under a policy \({\pi }\) is called the Q-function, and it is equal to \({Q^{\pi }(s_t,a_t)=E_{r_{t},s_{t+1}}[r_t(s_t,a_t)+\gamma E_{a_{t+1}}[Q^{\pi }(s_{t+1},a_{t+1})]]}\). The optimal Q-value \({Q^*(s_t,a_t)=\max _{\pi }Q^{\pi }(s_t,a_t)}\) is the maximum return \({\forall s_t \in S}\) and \({\forall a_t \in A}\), where S is the state space, and the action space A is limited to discrete actions [29, 30]. The optimal policy is obtained from the optimal Q-value by selecting the action that maximizes it, that is, \({\mu (s_t)=\arg \max _{a_t}Q(s_t,a_t)}\) [29, 30]. Q-learning is an off-policy algorithm that uses a greedy policy. It learns the optimal policy \({\pi ^*}\) by approximating the Q-function with a Q-network with parameters \({\theta ^Q}\). The optimal Q-function \({Q^*(s_t,a_t)}\) is achieved by obtaining the optimal parameters \({{\theta ^Q}^*}\) at which the training loss \({L(\theta ^Q)=E_{s_t,a_t,r_t}[(Q(s_t,a_t|\theta ^Q)-y_t)^2]}\) is minimal. \({y_t}\) is the target value computed from the next state \({s_{t+1}}\), and it is calculated as \({y_t=r(s_t,a_t)+\gamma Q(s_{t+1},\mu (s_{t+1})|\theta ^Q)}\). For Q-learning stability, the target \({y_t}\) is calculated by another identical Q-network [2, 29]. In practice, it is difficult to apply Q-learning to a continuous action space, because the greedy maximization cannot be carried out over an infinite number of actions at each time step.
The actor-critic approach is used to solve this problem with the Deterministic Policy Gradient (DPG) algorithm [2, 29]. The critic is an action-value function \({Q(s,a|\theta ^Q)}\) used to calculate the temporal difference (TD) error that criticizes the actions made by the actor, and it is updated based on the Q-function. The actor is a deterministic policy function \({\mu (s|\theta ^{\mu })}\) that chooses action \({a_t}\) given state \({s_t}\) [2, 29]. The actor’s network parameters \({\theta ^\mu }\) are updated to maximize the action-value \({Q(s,a|\theta ^Q)}\) by gradient ascent along the policy gradient \({\nabla _{\theta ^\mu } J\approx E_{s_t}[\nabla _a Q(s,a|\theta ^Q)|_{s=s_t,a=\mu (s_t)}\nabla _{\theta ^\mu }\mu (s|\theta ^\mu )|_{s=s_t}]}\). Deep Deterministic Policy Gradient (DDPG) is an algorithm that implements the deep Q-network on the DPG algorithm. DDPG approximates the Q-function and enables RL in systems that have continuous actions and a large-dimension state [2, 29]. DDPG is a model-free RL algorithm that uses a replay buffer memory to store the system’s states, actions, and rewards during agent training [17, 29]. In this paper, the pre-failure agent is an adapted DDPG algorithm that achieves optimal proactive and autonomous feed rate adjustment.
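
Two of the quantities used above, the discounted return and the TD target of the critic, can be illustrated with plain arithmetic. The sketch below assumes the standard discounted form of the return; the `done` flag for terminal transitions and the function names are assumptions added for the example, and no neural network is involved.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_{t+k} over a finite reward sequence."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def td_target(reward, next_q, gamma, done=False):
    """Target y_t = r_t + gamma * Q(s_{t+1}, mu(s_{t+1})).

    next_q stands for the target critic evaluated at the next state and
    the target actor's action. The bootstrap term is dropped on terminal
    transitions (an assumption of this sketch).
    """
    return reward + (0.0 if done else gamma * next_q)


def squared_td_error(q_value, target):
    """Per-sample critic loss (Q(s_t, a_t) - y_t) ** 2."""
    return (q_value - target) ** 2


print(discounted_return([1, 0, 1], 0.5))
```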

4.2.1 Pre-failure agent for autonomous CNC machine

In the current work, the CNC turning machine pre-failure agent action \({a_t}\) is performed on the feed rate \({f_t}\) (mm/rev). At each time step of 1 s, the agent reads the machine sensor data \({[F_x, F_y, F_z]}\) and scans the P-F monitoring module signal. The agent generates actions to decrease the tool’s degradation rate while keeping a reasonable productivity level in the P-F zone, where productivity is measured by Eq. 2.

In the training phase, the pre-failure agent interacts with the DT model of the CNC turning machine, and it is rewarded for each action \({a_t \rightarrow f_t}\) with the reward function \({r_t}\). The rewarded value depends on the tool degradation rate and productivity at each time step. The machine’s productivity is represented by the Material Removal Rate \({MRR ~(mm^3/min)}\) in Eq. 2.

$$\begin{aligned} MRR= f\times v \times d \end{aligned}$$
(2)

where f is the feed rate in mm/rev, v is the cutting speed in m/min, and d is the cutting depth in mm. The proposed pre-failure agent interacts with the CNC machine at different cutting speeds, which vary from 25 m/min to 80 m/min. The agent’s actions cover a wide range of feed rate adjustments, from 0.025 mm/rev to 0.35 mm/rev.
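
As a worked example of Eq. 2, the sketch below evaluates the MRR at the two ends of the feed rate range. Two points are assumptions of this sketch and are not stated in the text: the cutting speed is converted from m/min to mm/min (factor 1000) so that the product is expressed in \({mm^3/min}\), and the depth of cut of 1 mm is a placeholder value chosen only for illustration.

```python
def material_removal_rate(feed_mm_per_rev, speed_m_per_min, depth_mm):
    """MRR = f * v * d, returned in mm^3/min.

    Assumption: v is converted from m/min to mm/min before multiplying.
    """
    speed_mm_per_min = speed_m_per_min * 1000.0
    return feed_mm_per_rev * speed_mm_per_min * depth_mm


DEPTH_MM = 1.0  # placeholder depth of cut, not taken from the experiment
for feed in (0.025, 0.35):
    mrr = material_removal_rate(feed, 80, DEPTH_MM)
    print(f"f = {feed} mm/rev -> MRR = {mrr:.0f} mm^3/min")
```

Because the MRR is linear in f at a fixed speed and depth, any reduction of the feed rate by the agent reduces productivity proportionally, which is the trade-off the reward design has to balance.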

Definition of the state

The classical RL algorithm is an extension of a Markov Decision Process (MDP) and its assumption of time-independent states [30]. At each time step (t), the RL agent receives state \({s_t}\) and takes an action \({a_t \rightarrow f_t}\) according to its learned policy [29]. To learn the optimal policy, the state is assumed to be time-independent, and it describes the system status regardless of the system’s historical behavior. In many industrial applications, a time-independent state cannot fully describe the system and is therefore of little use in learning the optimal policy [29]. For example, in CNC machining, the instant sensor measurement \({[F_x, F_y, F_z]_t}\) at time (t) cannot capture the tool wear stage, as described in Sect. 4.1. Therefore, it is difficult to make a maintenance decision according to a time-independent instant value of the forces. To ensure that the RL agent has the full set of features needed to describe the tool wear status, the RL state \({s_t}\) is extended to include the cutting speed v (m/min), the force measurements at the instant of potential failure detection \({[F_x,F_y,F_z]_p}\), the deviation of the sensor measurement at time t from its value when the tool is at the P point \({[E_{xyz}]_t}\), and the negative rate of change of the forces \({[\Delta F_{xyz}]_t}\) over the sampling time T. The Pre-Failure agent’s state \({s_t}\) is given by Eqs. 3, 4, and 5.

$$\begin{aligned}{}[E_{xyz}]_t=[F_x,F_y,F_z]_p-[F_x,F_y,F_z]_t \end{aligned}$$
(3)
$$\begin{aligned}{}[\Delta F_{xyz}]_t=\frac{[F_x,F_y,F_z]_t-[F_x,F_y,F_z]_{t+1}}{T} \end{aligned}$$
(4)
$$\begin{aligned} s_t=(v,[F_x,F_y,F_z]_p,[E_{xyz}]_t,[\Delta F_{xyz}]_t) \end{aligned}$$
(5)
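
Equations 3 to 5 can be assembled into a flat state vector as sketched below. The function name, the flattening order, and the force values are hypothetical; the sampling time defaults to 1 s as an assumption based on the time step mentioned earlier.

```python
def build_state(v, f_p, f_t, f_next, sampling_time=1.0):
    """Return s_t = (v, [F]_p, [E]_t, [dF]_t) as a flat list of 10 values.

    f_p, f_t and f_next are [Fx, Fy, Fz] at the P point, at time t and
    at time t+1, respectively.
    """
    # Eq. 3: deviation from the forces recorded at the P point.
    e_t = [p - c for p, c in zip(f_p, f_t)]
    # Eq. 4: negative rate of change of the forces over the sampling time.
    d_t = [(c - n) / sampling_time for c, n in zip(f_t, f_next)]
    # Eq. 5: concatenate speed, P-point forces, deviation and rate.
    return [v, *f_p, *e_t, *d_t]


# Hypothetical force readings in newtons, for illustration only.
state = build_state(80, [300, 200, 500], [320, 210, 530], [325, 212, 540])
print(state)
```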

RL agent action

The feed rate optimization is the key factor in optimizing the CNC machine tool’s performance, as indicated in Sect. 3. At the same cutting speed v (m/min), the tool’s degradation rate decreases when the feed rate f (mm/rev) is decreased, but this also decreases productivity. The pre-failure RL agent is designed to generate an optimal and continuous action \({a_t}\) that adjusts the CNC machine feed rate at each time step (t). This action aims at decreasing the tool degradation rate while keeping the productivity within an acceptable limit. In practice, the adjustable feed rate range depends on the tool-material pair, and for products in composite materials, it could be changed from 0.025 mm/rev to 0.35 mm/rev.

Reward function

In RL, the reward function \({r_t(s_t,f_t,s_{t+1})}\) acts as the objective function in mathematical programming. At each time t, the RL agent explores the action space A to find the optimal action \({a_t}\) that maximizes its reward according to the given state \({s_t}\). The pre-failure agent reward function is designed to minimize the tool degradation rate and to keep the productivity level within acceptable limits. To maximize the tool Time to Failure (T2F), the Pre-Failure agent is designed with a positive reward function \({r_t(s_t,f_t,s_{t+1})}\) given by Eq. 6. To keep the productivity level relatively high, the agent is rewarded only if its action \({a_t}\) minimizes the forces deviation \({|E_{xyz}|_t}\), which is the absolute difference between the measured forces \({[F_{xyz}]}\) in the P-F zone and the detected potential failure forces \({[F_{xyz}]_p}\) at the point \({t=t_p}\). The tool degradation rate increases when the deviation decreases. Equation 6 indicates the instant reward \({r_t(s_t,f_t,s_{t+1})}\) calculation at each time step t.

$$\begin{aligned} r_t(s_t,f_t,s_{t+1})=\left\{ \begin{matrix} 1&{} \text {if } |E_{xyz}|_{t+1} \le |E_{xyz}|_{t} \\ 0&{} \text {otherwise} \end{matrix}\right. \end{aligned}$$
(6)
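
A minimal sketch of Eq. 6 follows. The text does not specify how the three-component deviation is reduced to the scalar \({|E_{xyz}|}\); the Euclidean norm used here is an assumption, as are the function names and the sample force values.

```python
import math


def deviation_norm(f_p, f_now):
    """Scalar |E_xyz|: Euclidean norm of [F]_p - [F] (assumed reduction)."""
    return math.sqrt(sum((p - c) ** 2 for p, c in zip(f_p, f_now)))


def reward(f_p, f_t, f_next):
    """Eq. 6: 1 if the deviation did not grow from t to t+1, else 0."""
    return 1.0 if deviation_norm(f_p, f_next) <= deviation_norm(f_p, f_t) else 0.0


# Hypothetical forces in newtons, for illustration only.
f_p = [300, 200, 500]
f_t = [320, 210, 530]
print(reward(f_p, f_t, [310, 205, 515]))  # forces moved back toward the P point
print(reward(f_p, f_t, [330, 215, 545]))  # forces moved further away
```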

4.2.2 Pre-failure agent training

The Pre-Failure agent is an adapted DDPG algorithm, and its structure consists of the actor and the Q-function/critic deep NNs. The actor adjusts the CNC machine feed rate according to the input RL state \({s_t}\) and the learned policy, which is criticized by the Q-function network. To improve the DDPG agent training performance, a random noise \({N_t \sim N(0, std)}\) is added to the actor action [31]. The feed rate \({f_t}\) to be adjusted at time (t) equals \({a_t=\mu (s_t|\theta ^\mu )+N_t}\), where \({\mu (s_t|\theta ^\mu )}\) is the output of the actor-network and \({\theta ^\mu }\) denotes the parameters of the actor-network. In the training phase, the hidden layers’ parameters of the Pre-Failure agent are updated to minimize the loss function of the critic and to minimize the loss of the actor-network, which is the negative of the action-value. To improve the learning stability, the target networks’ parameters are updated with soft updates [31]; in other words, \({\theta ^{'} \leftarrow \tau \theta + (1-\tau )\theta ^{'}}\), where the soft-update rate \({\tau }\) is less than one.
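
The exploration noise and the soft update can be illustrated without a deep learning framework. In the sketch below, network parameters are represented as plain lists of floats, and the noisy action is clipped to the feed rate range of 0.025 to 0.35 mm/rev given earlier; the clipping step and the function names are assumptions of the sketch and are not taken from Algorithm 1.

```python
import random


def noisy_feed_rate(actor_output, std, f_min=0.025, f_max=0.35, rng=random):
    """a_t = mu(s_t) + N_t with N_t ~ N(0, std), clipped to the feed range."""
    action = actor_output + rng.gauss(0.0, std)
    return min(f_max, max(f_min, action))  # assumed clipping step


def soft_update(target_params, online_params, tau):
    """theta' <- tau * theta + (1 - tau) * theta' for each parameter."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


target = [0.0, 1.0]
online = [1.0, 1.0]
print(soft_update(target, online, tau=0.1))
```

With a small \({\tau }\), the target parameters move only a fraction of the way toward the online parameters at each step, which is what keeps the TD target slowly varying.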

The target networks’ parameters are \({\theta ^{Q'}}\) and \({\theta ^{\mu '}}\) for the critic and the actor, respectively. The pre-failure agent’s full training algorithm is given in Algorithm 1, and it is built and trained in the PyTorch deep learning framework.

Algorithm 1 Pre-failure agent training algorithm

In DRL for continuous control applications, the selection of the agent’s hyperparameters is correlated with the complexity of the environment, including the numbers of inputs and outputs [29]. Agent complexity increases as the hyperparameters are increased, which leads to high training accuracy, model overfitting, and low online accuracy. This issue is known as the bias-variance tradeoff [32]. In this paper, the environment is the CNC turning machine, the pre-failure agent’s action is the feed adjustment, and the machine provides three force sensor signals \({F_x, F_y, F_z}\). The developed Pre-Failure agent architecture and hyperparameters are given in Appendix 1. These hyperparameters are adopted from the best-performing DRL model in the literature of the same context. The adopted model’s architecture performed a continuous control action on a computer game environment that has the same input-output dimensions as the CNC turning machine [29].

Training of pre-failure DRL agent requires a large number of data observations and replications [33]. It is impractical to have massive machining experiments to train the DRL, as the TiMMC and its tool are costly [13]. Therefore, training of the DRL agent in a Digital Twin (DT) environment is proposed to address these limitations. DT is a potential candidate to eliminate or reduce the machining experiments’ cost.

4.3 Digital Twin (DT) for CNC turning machine

Recently, the development of the Industrial IoT (IIoT), simulation modeling, and Artificial Intelligence (AI) has enabled the digitalization of machines, and the Digital Twin (DT) has emerged as a new concept in Cyber-Physical Systems (CPS) [3, 34]. A Digital Twin is a model that emulates the physical CNC machine in the cyber/digital environment, and it is capable of interacting with the real machine in the physical environment [34,35,36]. DT has been developed at the system level, and in the case of a single machine, it is developed at the component level. Figure 8 shows the implementation of DT on machine tool management. The digital environment contains physical data storage, data preprocessing, digital simulation models, and artificial intelligence agents. The digital environment has three main objectives: (1) monitoring the machine’s force data, (2) analyzing this data to extract the health status of the tool, and (3) taking action to improve the tool’s performance. There are three methods to model a Digital Twin: multiphysics modeling using Finite Element Analysis (FEA), mathematical model-based modeling, and/or data-driven modeling [31]. In this paper, the experimental data in Table 2 is used to build the machine DT, and a deep artificial Neural Network (NN) is developed to act as a digital twin that emulates the CNC turning machine in the digital environment. The DT’s outputs are the estimated radial force \({F_x}\), feed force \({F_y}\), and cutting force \({F_z}\) measurements at each time step t, and the model inputs are the cutting speed v (m/min), the feed rate f (mm/rev), and the time step t (min).

Fig. 8
figure 8

DT implementation on a machine’s tool management

To minimize the overfitting of the model, a rule of thumb for the NN architecture design is used [37, 38]. The number of hidden layers is limited by the number of inputs \({N_i}\) and the number of outputs \({N_o}\), while the number of hidden neurons \({N_h}\) for \({N_s}\) data observations is given by Eq. 7. \({\beta }\) is a scaling factor that controls the degree of overfitting prevention in the NN model, and it takes a value from 2 to 10 [37, 38].

$$\begin{aligned} N_h=\frac{N_s}{\beta (N_i+N_o)} \end{aligned}$$
(7)

The developed digital model has three inputs \({[t, v, f]}\) and three outputs \({F_x}\), \({F_y}\), and \({ F_z}\). For 247 data observations, Eq. 7 gives a number of hidden neurons that varies from 4 to 20 for \({\beta \in [2,10]}\). The final architecture is selected from more than 170 candidate architectures. The candidates range from single-layer to four-layer models, with the number of neurons in each layer varying from 4 to 20, and additional five-layer models were also included in the comparison. The best model is the one with the lowest Mean Square Error (MSE) on the unseen testing data. Table 5 lists, for each number of layers, the network architecture with the lowest MSE. The best model architecture of \({[14-15-18-15]}\) is selected. To build the CNC machine DT, this model is trained and tested in the TensorFlow deep learning environment [39]. During the testing of the DT model, the testing Mean Absolute Error (MAE) was 12.8 (e.g., digital \({F_x}\) = physical \({F_x}\) ± 12.8). Figure 9 shows the DT model’s estimated forces versus the physical CNC turning machine sensors’ readings for the unseen testing data of \({F_x}\), \({F_y}\), and \({F_z}\).
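The neuron range quoted above can be checked by evaluating Eq. 7 at the two ends of the \({\beta }\) interval. The short sketch below does this for the 247 observations, three inputs, and three outputs reported in this section; the function name is hypothetical.

```python
def hidden_neurons(n_samples, n_inputs, n_outputs, beta):
    """Eq. 7: N_h = N_s / (beta * (N_i + N_o))."""
    return n_samples / (beta * (n_inputs + n_outputs))


# beta = 2 gives the upper bound, beta = 10 the lower bound.
upper = hidden_neurons(247, 3, 3, beta=2)    # 247 / 12, about 20.58
lower = hidden_neurons(247, 3, 3, beta=10)   # 247 / 60, about 4.12

print(int(lower), int(upper))  # 4 20
```

Rounding both bounds down reproduces the 4 to 20 neurons per layer used in the architecture search.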

Table 5 Lowest MSE NN models and their hidden layers and neurons
Fig. 9
figure 9

Sensor data of a \({F_x}\), b \({F_y}\), and c \({F_z}\) with the CNC cyber model vs. experimental physical testing data

5 Analysis of the results

This section analyzes the effects of the Pre-Failure agent on the tool performance and the tool Time to Failure (T2F) compared to the standalone CNC machine at different cutting speeds. The performance of the proposed Pre-Failure mechanism is measured by two key indexes: the tool T2F and the achieved MRR. The P-F monitoring module activates the Pre-Failure agent in the P-F zone, and the agent is deactivated at the instant of tool failure. The closed-loop autonomy enables the Pre-Failure agent to set the optimal feed rate according to the estimated machine forces at time (t + 1). In online mode, the Pre-Failure agent interacts with the CNC machine every 1 s, and its sampling time T is selected as T = 60 s.

The DRL Pre-Failure agent learns the optimal policy \({\pi ^*}\) in the training phase; that is, it learns the mapping between the machine’s sensor measurements, which are represented in the state \({s_t}\), and the best action \({a_t \rightarrow f_t}\) at each time t. In online mode, the input state \({s_t}\) changes as the tool degrades. Accordingly, the Pre-Failure agent responds to this degradation in the P-F zone, and a new feed rate setting \({f_t}\) is applied as the best action at time t based on the learned optimal policy \({\pi ^*}\).

The trained Pre-Failure agent is validated with the CNC turning machine DT at the different cutting speed settings given in Table 6. In each run, the autonomic CNC machine is simulated until the tool failure is detected at VB = 0.2 mm. The machine’s tool starts with a maximum feed rate of 0.35 mm/rev, and the Pre-Failure agent enters the machining process when the P-F monitoring module 2 detects the potential failure class and the instant of the P point.

Table 6 Different speeds to validate the trained Pre-Failure agent

In Run I, the Pre-Failure agent increased the tool T2F by about 27% over the standalone CNC machine. Figure 10 illustrates the force measurements for the standalone CNC machine in solid lines, for which the tool Time to Failure (T2F) is 31.433 min (1886 s). With the implementation of the Pre-Failure agent on the CNC machine, the degradation rate of the machine forces decreases, as indicated by the dash-dot (\({-\cdot -}\)) lines in Fig. 10. The tool’s T2F increases to 40 min (2400 s).
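As a check on the Run I figures, the relative T2F extension follows directly from the two reported times, writing \({T2F_{st}}\) for the standalone machine’s T2F and \({T2F_{PF}}\) for the T2F with the Pre-Failure agent:

$$\begin{aligned} \frac{T2F_{PF}-T2F_{st}}{T2F_{st}}=\frac{40-31.433}{31.433}=\frac{8.567}{31.433}\approx 0.273 \end{aligned}$$

That is, the agent adds about 8.6 min, or roughly 27% of the standalone tool life.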

Fig. 10
figure 10

3D Forces \({F_{xyz}}\) (N) at 5000 RPM of Run I for the standalone machine and the Pre-Failure machine

In the P-F zone, the Pre-Failure agent generates a continuous feed rate adjustment according to the optimal trained policy in Sect. 4.2. At a spindle speed of 5000 RPM, the adjusted feed rate at each time step (1 s) is shown in blue in Fig. 11, while its Cumulative Moving Average (CMA) is shown in orange. At t = 21.733 min (1304 s), the tool’s potential failure point (P) is detected, and the feed rate is then adjusted according to the learned optimal policy. The CMA at each time step is the average of the Pre-Failure agent’s actions up to the current time step. The Pre-Failure agent generates a variable feed rate every second to maximize the T2F within the P-F zone.
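The CMA described above can be computed incrementally with a running sum. The sketch below is illustrative only: the function name and the three sample feed rates are hypothetical, and it assumes the average starts from the first action passed in.

```python
def cumulative_moving_average(values):
    """Return the running mean of `values` up to and including each step."""
    cma = []
    total = 0.0
    for n, value in enumerate(values, start=1):
        total += value
        cma.append(total / n)
    return cma


# Hypothetical feed rates (mm/rev) at three consecutive 1 s steps.
feeds = [0.35, 0.25, 0.30]
cma = cumulative_moving_average(feeds)
```

Because every earlier action stays in the average, the CMA curve is much smoother than the second-by-second feed rate.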

Fig. 11
figure 11

Pre-Failure feed rate adjustment at a spindle speed of 5000 RPM in Run I

The Pre-Failure agent keeps the productivity of the CNC machine at an acceptable limit, as the minimum CMA feed rate is kept at 0.2665 mm/rev. The machine’s productivity index is the Material Removal Rate (MRR, \({mm^3/min}\)) in Eq. 2. For a 0.2 mm depth of cut and a 25120 mm/min cutting speed, the change of the MRR with the Pre-Failure agent is indicated by the green shaded area in Fig. 12. \({MRR\%}\), given by Eq. 8, is the Pre-Failure agent’s overall productivity \({MRR_{PF}}\) relative to that of the standalone machine at the maximum feed rate, \({MRR_{max feed}}\). \({T2F_{st}}\) is the standalone machine’s T2F, and \({T2F_{PF}}\) is the T2F with the Pre-Failure agent.

$$\begin{aligned} MRR\%=\frac{\sum _{t=0}^{T2F_{PF}}{MRR_{PF}}}{\sum _{t=0}^{T2F_{st}}{MRR_{max feed}}}=\frac{\sum _{t=0}^{T2F_{PF}}{f(t)}}{f_{max}\times T2F_{st}} \end{aligned}$$
(8)

In Run I, the Pre-Failure machine productivity is almost the same as the standalone machine with the maximum feed rate, and the \({MRR\%}\) equals 99.3%. The extension in T2F enables the machine to produce more within the added time. Figure 12 shows that the Pre-Failure added-value MRR recovers the lost MRR.
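The right-hand side of Eq. 8 reduces to a ratio of summed feed rates, which can be sketched in a few lines. The example uses a toy feed-rate trace, not the Run I data; the function name and all numbers except the 0.35 mm/rev maximum feed rate are hypothetical, and T2F is counted in time steps.

```python
def mrr_percent(feed_pf, f_max, t2f_standalone_steps):
    """Eq. 8: sum of f(t) with the agent over f_max * T2F_st, in percent."""
    return 100.0 * sum(feed_pf) / (f_max * t2f_standalone_steps)


# Toy case: the standalone tool fails after 10 steps at f_max = 0.35 mm/rev.
# With the agent, the tool runs 6 steps at f_max and 6 more at 0.25 mm/rev.
feed_with_agent = [0.35] * 6 + [0.25] * 6
ratio = mrr_percent(feed_with_agent, f_max=0.35, t2f_standalone_steps=10)
# ratio is about 102.9: the two extra steps outweigh the reduced feed rate.
```

A value above 100% means the material removed during the added lifetime exceeds the material lost by lowering the feed rate; a value below 100% means it does not.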

Fig. 12
figure 12

Pre-Failure agent tool MRR vs. standalone machine

Figure 13 summarizes the extension of the tool’s Time to Failure (T2F) by the Pre-Failure agent over the standalone machine. The tool’s added lifetime is high at relatively low spindle speeds and small at high speeds. The lowest T2F added time is 1.4 min (37%) for the tool operating at 12500 RPM, and the highest added time is 10.9 min (50%) at 7500 RPM. At higher cutting speeds, the tool degrades faster, the P-F interval is shorter, and the Pre-Failure agent has less time to interact with the CNC machine environment. Meanwhile, the T2F added time contributes more valuable MRR at high speeds, as shown in Fig. 14. The Pre-Failure agent keeps the level of productivity high, as shown in Fig. 14, while the tool’s deviation from the potential failure level is taken into account, as discussed in Sect. 4.2. The lowest \({MRR\%}\) of the Pre-Failure agent is 79%, which occurs at a spindle speed of 10000 RPM, as shown in Fig. 14. At 10000 RPM, the tool T2F increases by 2.133 min (128 s), and the added-value MRR with the Pre-Failure agent is lower than the lost MRR because of the decrease in the feed rate value. Figure 15 depicts the T2F and \({MRR\%}\) of the standalone machine at different static feed rates and a spindle speed of 10000 RPM. The best standalone static feed rate setting, which balances MRR and T2F, is 0.1745 mm/rev; the achieved MRR is 59% of that at \({f_{max}}\), and the tool T2F is 19.12 min. The proposed Pre-Failure mechanism outperforms the standalone machine with static settings, and the online agent’s optimal policy achieves a greater T2F extension and a higher \({MRR\%}\), as shown in Figs. 13 and 14.

Fig. 13
figure 13

Tool T2F of the Pre-Failure and the Standalone machine for different RPMs

Fig. 14
figure 14

\({MRR\%}\) for different spindle speeds

Fig. 15
figure 15

T2F and MRR% for a standalone machine at a spindle speed of 10000 RPM and different feed rates

Implementation of the developed Pre-Failure agent improves the tool’s performance in the P-F zone. The continuous online adjustment of the optimal feed rate adds an average of 5 min to the tool T2F and 5% to the MRR over the classical machining system. The detailed experimental results for each run in Table 6 are provided in Appendix 2.

6 Conclusion

In this paper, the developed Pre-Failure approach improves the tool’s performance in the Pre-Failure zone based on Deep Reinforcement Learning (DRL) during machining processes. The proposed Pre-Failure agent increases the tool Time to Failure (T2F) while maintaining the Material Removal Rate (MRR) at an acceptable limit. The machine tool’s P-F curves and Logical Analysis of Data (LAD) are implemented to monitor and detect the potential failure level of the machine’s tool. In the P-F zone, the model-free Pre-Failure agent interacts with the CNC machine and adjusts its feed rate according to the estimated machine forces at time (t+1). This method decreases the tool’s degradation rate in the P-F zone before the tool is worn out at VB = 0.2 mm. The Pre-Failure mechanism also keeps the forces at a relatively high level. To train the Pre-Failure agent, a machine Digital Twin (DT) was developed and validated with the physical machine data. The Pre-Failure agent is validated at different spindle speeds, from 5000 RPM to 15,000 RPM. By implementing the proposed Pre-Failure approach, the tool T2F increases over the classical machining approach. The value-added time is high at relatively low spindle speeds. It was found that the maximum added time is 10.9 min, which is achieved at 7500 RPM. Meanwhile, at 15,000 RPM, the tool T2F equals 4.55 min, which is almost double that of the standalone machine. In the P-F zone, the Pre-Failure agent adds MRR that recovers the MRR lost by lowering the adjusted feed rate below its maximum value. At high speeds, the added MRR is higher than the lost MRR. The Pre-Failure agent’s MRR reaches 138.04% of that achieved with a static maximum feed rate at a spindle speed of 15,000 RPM. At 10,000 RPM, the Pre-Failure agent has its lowest MRR% of 79%, relative to the standalone machine, and it adds 12% to the tool T2F. However, the added time is not enough to recover the lost MRR.
The developed dynamic Pre-Failure agent outperforms the best static adjustment for standalone machine runs at 10,000 RPM from the perspective of tool life and productivity. The current work can be extended in the future by including electrical power consumption, the material type, and other machining quality characteristics (e.g., surface roughness and residual stresses).