Introduction

Composite materials have special properties that make them the backbone of industries such as aerospace, sporting goods, automotive, and aircraft structures (Rahman et al. 1999). Carbon fiber reinforced polymer (CFRP) has a very high modulus of elasticity, high tensile strength, low density, and high chemical stability. Most studies of CFRP are restricted to material properties and theoretical mechanics. Nowadays, economic impact is an important consideration in manufacturing; therefore, it is important to study process control for the machining of CFRP because it affects the production process (Ferreira et al. 1999). The machining of composite materials is more difficult than the machining of metals because of their non-homogeneous composition and the abrasive nature of the reinforcing fibers. The cutting tool confronts fibers and matrix whose responses to the machining process can be completely different (Teti 2002). The complicated reaction of CFRP to machining, the defects which are consequently introduced into the workpiece, and the special specifications required of the machined part are the main reasons for the search for new techniques for process control.

Milling is used frequently in manufacturing to produce composite parts with high accuracy and high surface quality (Teti 2002). Quality is assessed in terms of characteristics such as delamination, surface roughness, and machined part dimensions (Davim and Reis 2005). Research has been conducted on milling process control using artificial intelligence learning techniques. For example, Zuperl et al. (2012) used ANN and fuzzy logic to control the cutting force in ball-end milling and to maintain constant roughness. Huang (2014) developed an intelligent neural-fuzzy model for a surface roughness monitoring system in milling operations: a decision-making system that analyzed the cutting forces and then responded with an accurate output. He concluded that his system could be used in the future for adaptive control of the machining parameters in smart Computer Numerical Control (CNC) machines. Zhang et al. (2007) used ANN to develop surface roughness adaptive control in turning. They used data from controllable cutting parameters, such as feed rate, cutting speed, and depth of cut, as well as uncontrollable monitored parameters, such as vibration signals, to develop a neural-network-based surface roughness adaptive control system. Other researchers used other techniques; for example, Coker and Shin (1996) used ultrasonic sensing to control surface roughness during machining, and Wang and Huang (2006) used the concept of an Equivalent Fixture Error (EFE) to improve machining process control, illustrating their concept with simulated data. Du et al. (2012) developed a robust approach for root cause identification in machining processes using a hybrid learning algorithm and engineering-driven rules.
In order to judge whether the process is in a normal or abnormal condition, they used off-line pattern match relationships and on-line time series measurements, and they validated the approach using data from real-world engine cylinder head machining processes. Due to the nonlinearity and complexity of the milling process, traditional approaches fail to develop an appropriate model to control it (Haber et al. 2002). Landers et al. (2002) concluded that the future of milling process monitoring and control requires techniques that can determine threshold values and characteristic patterns, which can be used to autonomously control and tune the controllable machine conditions (feed, cutting speed, etc.), both on-line and off-line, in order to improve part accuracy.

In this paper, we present a pattern-based machine learning technique called logical analysis of data (LAD). We use this technique to discover and understand the hidden correlations between the machining variables of CFRP. Information is extracted from experimental results and presented in the form of characteristic patterns. These are hidden rules that characterize the temporal evolution of the machining process; they are subsequently used in machining process control. In section “Experiment description”, the experimental procedure and results are presented. The LAD approach is presented in section “Logical analysis of data (LAD)”, and a numerical example is introduced. In section “Performance comparison”, the learning process based on the obtained experimental data is introduced, and a comparison between LAD and ANN is presented. In section “Process control system”, a simulated machining process control is used to build an online decision-making procedure using LAD. Concluding remarks are given in section “Conclusions”.

Experiment description

The tested CFRP composite is a quasi-isotropic laminate comprising 35 plies of 8-harness satin woven graphite epoxy prepreg with a final cured thickness of \(6.35 \pm 0.02\,\hbox {mm}\). The tool is a 6.35 mm, four-flute, solid carbide end mill. The equipment is a Makino A88\(\upvarepsilon \) machining center. In order to reach a spindle speed of up to 40,000 rpm, an IBAG spindle speed attachment with 1 kW power is used. The routing tests are performed using four values of spindle speed (rpm): 10,000, 20,000, 30,000, and 40,000; three values of feed (mm/min): 250, 500, and 1,000; and three tool overhang lengths (\(\hbox {TL}\)): \(\hbox {TL}1 =38\,\hbox {mm},\,\hbox {TL}2=31\,\hbox {mm}\), and \(\hbox {TL}3=24\, \hbox {mm}\). Measurements are repeated every 32 mm of cutting distance, three times; as such, we have three values of cutting distance (\(\hbox {C}\)): 32, 64, and 96 mm. In total, we have three feed rates (\(f\)), four cutting speeds (\(v\)), three overhang lengths (\(\hbox {TL}\)), and three cutting distances (\(\hbox {C}\)); therefore, the total number of observations (experiments) is 108. This is a full factorial design of experiments. During slotting, the cutting forces are measured using a Kistler 9255B dynamometer, and the temperatures are measured using a FLIR ThermoVision A20M infrared camera. For example, trends for the feed force \((\hbox {Fx})\), transverse force \((\hbox {Fy})\), and axial force \((\hbox {Fz})\) for different speeds, feeds, and the tool overhang length \(\hbox {TL}1=38\,\hbox {mm}\) are shown in Fig. 1. The machined slots were characterized in terms of surface roughness and delamination. The conforming specifications for these qualities are as follows:

  • Exit and entry delamination \(\le 1\) %.

  • Slot surface roughness right and left \(\le 1.2{\upmu }\hbox {m}\).

Fig. 1

Trends for the feed force (Fx), transverse force (Fy), and axial force (Fz) for different speeds, feeds and tool overhang length (TL1 = 38 mm) (Meshreki et al. 2012)

A schematic of the experimental setup is shown in Fig. 2. A sample of the collected data is presented in Table 1. The observations that satisfy (do not satisfy) any of these specifications are identified by 1 (0) in Table 1.
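The full factorial structure of the experiment above can be sketched as follows (a minimal illustration; the variable names are ours, not from the original setup):

```python
from itertools import product

# Full factorial design: every combination of spindle speed, feed,
# tool overhang length, and cutting distance is one observation.
speeds = [10000, 20000, 30000, 40000]   # spindle speed, rpm
feeds = [250, 500, 1000]                # feed, mm/min
overhangs = [38, 31, 24]                # tool overhang length TL1-TL3, mm
distances = [32, 64, 96]                # cutting distance C, mm

design = list(product(speeds, feeds, overhangs, distances))
# 4 x 3 x 3 x 3 = 108 observations, matching the experiment size
```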

Fig. 2

Schematic of the experimental setup

Table 1 A sample of the experimental results

Logical analysis of data (LAD)

The methodology

LAD is a knowledge discovery approach that classifies phenomena based on knowledge extraction and pattern recognition. It is applied in two consecutive phases: the training or learning phase, in which part of the database is used to extract special features or patterns of some phenomenon, and the testing or theory formation phase, in which the rest of the database is used to test the accuracy of the previously learned knowledge. LAD is a supervised learning technique; the historical database contains the variables and their corresponding outcomes or classes. For example, in Table 1, columns 2–9 are the variables, and columns 12 and 15 are the classes. In this paper, we use a two-class LAD technique; a multi-class LAD technique can be found in Mortada et al. (2011, 2013). After the two phases mentioned above, new observations are introduced to LAD in order to be classified. This classification allows us to predict the quality outcome. The main advantages of LAD are: (1) LAD has explanatory power and identifies causality, which can be very useful in addressing machining process problems: the user can trace any result back to its possible causes. This property is particularly valuable when LAD is compared to ANN, which is characterized by the difficulty of determining the network structure and the number of nodes, and the difficulty of interpreting the classification process; ANN is a “black box” technique, which classifies new points without any explanation. (2) Unlike statistical techniques, which depend on assumptions about distributions and independence among variables, LAD is a non-statistical, non-approximate technique; it does not assume that the data belong to any specific statistical distribution. (3) Unlike rules based on expert systems and expert knowledge, LAD extracts the knowledge hidden in the data. It then accumulates and preserves this knowledge, which can be used at any time, even if the human expertise is no longer available. (4) There is no restriction on the type of data that LAD can handle: it processes nominal and numerical, discrete and continuous data simultaneously.

LAD was first proposed at the Rutgers University Center for Operations Research (RUTCOR) (Hammer 1986). The main steps of LAD are the binarization of the data, pattern generation, and theory formation. The objective of data binarization is to transform a database of any type into a Boolean database by using the cut-point technique. Researchers have presented different binarization techniques (Mayoraz and Moreira 1999); in this paper, we use the technique presented in Boros et al. (2000). The technique starts by ranking, in ascending order, all the distinct values, \(u\), of a variable; a cut-point \(\upvarepsilon \) is then inserted between each two consecutive values that belong to different classes. The cut-point is calculated as the average of the two values. A binary attribute is then formed from each cut-point such that:

$$\begin{aligned} b=\left\{ {{\begin{array}{ll} 1&{}if\,u\ge \upvarepsilon \\ 0&{}if\,u<\upvarepsilon \\ \end{array} }} \right. \end{aligned}$$

The number of cut-points is equal to the number of transitions between consecutive distinct values belonging to different classes; this number determines the total number of binary attributes that replace a numerical variable.
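The cut-point procedure above can be sketched as follows (a minimal illustration; function names are ours, and ties, where the same value appears in both classes, are ignored):

```python
def cut_points(values, classes):
    """Return the cut-points for one numerical variable: midpoints between
    consecutive distinct values that belong to different classes."""
    pairs = sorted(set(zip(values, classes)))
    cps = []
    for (u1, c1), (u2, c2) in zip(pairs, pairs[1:]):
        if u1 != u2 and c1 != c2:          # class change between distinct values
            cps.append((u1 + u2) / 2.0)    # cut-point = average of the two values
    return cps

def binarize(u, cps):
    """Map one numerical value to its binary attributes: b = 1 iff u >= eps."""
    return [1 if u >= eps else 0 for eps in cps]
```

Each numerical variable thus contributes one binary attribute per cut-point, as stated above.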

The objective of pattern generation is to find the characteristic patterns that differentiate between the two classes, commonly called positive and negative. The positive (negative) class is the set, \(\uppi ^{+}(\uppi ^{-})\), of observations that belong to it. Many techniques have been proposed for pattern generation, such as heuristics (Hammer 1986; Hammer and Bonates 2006), enumeration (Boros et al. 2000), column generation (Hansen and Meyer 2011), and linear programming (Ryoo and Jang 2009). In this paper, we follow the pattern generation technique proposed in Ryoo and Jang (2009), in which the pattern generation problem is converted to a set covering problem and solved as a mixed integer linear program (MILP) without any assumptions. Each positive observation \(\hbox {i}\in \uppi ^{+}\) is represented as a Boolean observation vector \(\hbox {a}_\mathrm{i} =(\hbox {a}_{\mathrm{i},1} ,\ldots \hbox {a}_{\mathrm{i},\hbox {q}} ,\hbox {a}_{\mathrm{i},\hbox {q}+1} \ldots \hbox {a}_{\mathrm{i},2\hbox {q}})\). Each generated pattern \(\hbox {p}\) is associated with a Boolean pattern vector \(\hbox {W}=(\hbox {w}_1 ,\hbox {w}_2 ,\ldots \hbox {w}_\mathrm{q} ,\hbox {w}_{\mathrm{q}+1} ,\hbox {w}_{\mathrm{q}+2} ,\ldots \hbox {w}_{2\mathrm{q}} )\) of size \(\hbox {n}\), where \(\hbox {n}=2\hbox {q}\) and \(\hbox {q}\) is the number of binary attributes.

A literal is a Boolean variable \(\hbox {x}\) or its negation \({\bar{\hbox {x}}}\) (Boros et al. 2000). A pattern \(\hbox {p}\) cannot include both the literal \(\hbox {x}_\mathrm{j} \) and its negation \({\bar{\hbox {x}}}_\mathrm{j} \) at the same time; thus the constraint \(\hbox {w}_\mathrm{j} +\hbox {w}_{\mathrm{j}+\hbox {q}} \le 1\quad \forall \hbox {j}=1,2,\ldots \hbox {q}\) must be respected. The number of literals that define a pattern is called the degree of the pattern, \(\hbox {d}\); a pattern \(\hbox {p}\) of degree \(\hbox {d}\) is a conjunction of \(\hbox {d}\) literals. The pattern \(\hbox {p}\) is therefore found by obtaining the Boolean pattern vector \(\hbox {W}\), which is the solution of the set-covering problem. For the generation of a positive pattern p+, that is, a pattern that covers observations belonging to the positive class, \(\hbox {Y}=(\hbox {y}_1 ,\hbox {y}_2 ,\ldots \hbox {y}_{\mathrm{D}^{+}})\) is the Boolean coverage vector whose number of elements equals the number of positive observations, \(\hbox {D}^{+}\), and where \(\hbox {y}_{\mathrm{i}}\) is equal to 0 if the pattern p+ covers positive observation i, and 1 otherwise. Minimizing the sum of the elements of Y therefore means finding a positive pattern that covers the maximum number of positive observations. This pattern is subsequently used to characterize the positive class; it is an indication of the unknown outcome or class. In this optimization problem, the decision variables are the pattern vector \(\hbox {W}\), the degree \(\hbox {d}\), and the coverage vector \(\hbox {Y}\).
By definition, a positive pattern cannot cover any negative observation, so the dot product of the pattern vector \(\hbox {W}\) and any observation \(\hbox {i}\in \Pi ^{-}\) must be less than the degree \(\hbox {d}\) of the pattern \(\hbox {p}\); for that reason, the constraint \(\sum \nolimits _{\mathrm{j}=1}^{2\mathrm{q}} \hbox {a}_{\mathrm{i},\mathrm{j}} \hbox {w}_\mathrm{j} \le \hbox {d}-1\, \forall \hbox {i}\in \Pi ^{-}\) must be satisfied. Since the generated pattern does not have to cover all the observations in \(\uppi ^{+}\), the following constraint must be satisfied: \(\sum \nolimits _{\mathrm{j}=1}^{2\mathrm{q}} \hbox {a}_{\mathrm{i},\mathrm{j}} \hbox {w}_\mathrm{j} +\hbox {qy}_\mathrm{i} \ge \hbox {d}\,\forall \hbox {i}\in \Pi ^{+}\). The set covering problem is solved repeatedly until all observations in the class are covered by the set of generated patterns, such that each observation is covered by at least one pattern. In order to speed up the pattern generation procedure, a newly generated pattern must not repeat any pattern that has already been generated. Every generated pattern vector \(\hbox {W}\) is stored as a vector \(\hbox {v}\) in the set \(\hbox {V}\) containing the pattern vectors of all previously generated patterns. This condition can be formulated as:

$$\begin{aligned} \sum \limits _{\mathrm{j}=1}^{2\mathrm{q}} \hbox {v}_{\mathrm{k},\mathrm{j}} \hbox {w}_\mathrm{j} \le \hbox {d}_\mathrm{k} -1\,\forall \hbox {v}_\mathrm{k} \in \hbox {V}. \end{aligned}$$

In addition to the previously mentioned constraints, the problem can be summarized as follows:

$$\begin{aligned}&\mathop {\min }\limits _{\mathrm{W},\mathrm{Y},\mathrm{d}} \mathop \sum \limits _{\mathrm{i}\in \Pi ^{+}} \hbox {y}_\mathrm{i}\nonumber \\&\hbox {s}.\hbox {t}.\left\{ {{\begin{array}{l} {{\begin{array}{l} {\sum \nolimits _{\mathrm{j}=1}^{2\mathrm{q}} \hbox {w}_\mathrm{j} =\hbox {d}} \\ \end{array} }} \\ {1\le \hbox {d}\le \hbox {q}} \\ {{\begin{array}{l} {\hbox {W}\in \left\{ {0,1} \right\} ^{2\hbox {q}}} \\ {\hbox {Y}\in \left\{ {0,1} \right\} ^{\mathrm{D}^{+}.}} \\ \end{array} }} \\ \end{array} }} \right. \end{aligned}$$
(1)

After generating the strongest pattern, which is the pattern that covers the maximum number of observations in the positive class, a looping mechanism is used to generate an entire set of patterns that cover all the positive observations at least once. The same process is then repeated to obtain the negative patterns by using the set \(\uppi ^{-}\) of negative observations. A theory is then formed and a decision model is obtained.
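The pattern-generation loop can be sketched as follows. Note the assumptions: this is a brute-force enumeration over low-degree conjunctions rather than the MILP of Eq. (1), covered observations are removed between iterations as in the numerical example that follows, and the function names and the `max_degree` bound are ours:

```python
from itertools import combinations

def covers(literals, obs, q):
    """A literal j < q tests attribute j == 1; a literal j >= q tests
    attribute j - q == 0 (the negation)."""
    return all(obs[j] == 1 if j < q else obs[j - q] == 0 for j in literals)

def generate_patterns(pos, neg, q, max_degree=3):
    """Repeatedly pick the conjunction that covers the most still-uncovered
    positive observations while covering no negative observation."""
    uncovered, patterns = list(pos), []
    while uncovered:
        best = None
        for d in range(1, max_degree + 1):
            for lits in combinations(range(2 * q), d):
                if any(j < q and j + q in lits for j in lits):
                    continue                      # x and its negation together
                if any(covers(lits, o, q) for o in neg):
                    continue                      # must cover no negatives
                c = sum(covers(lits, o, q) for o in uncovered)
                if c and (best is None or c > best[1]):
                    best = (lits, c)
        if best is None:
            break                                 # no separating pattern exists
        patterns.append(best[0])
        uncovered = [o for o in uncovered if not covers(best[0], o, q)]
    return patterns
```

On the seven observations of the numerical example below (three positive, and the three negative observations recoverable from the MILP constraints), this sketch reproduces the patterns \(\hbox {x}_2 \bar{\hbox {x}}_5\) and \(\bar{\hbox {x}}_1 \hbox {x}_2\).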

The theory formation is the final step in LAD. A discriminant function, such as the one given in Eq. (2), is formulated in order to calculate a score ranging between \(-1\) and 1. A positive (negative) value of the discriminant function means that the tested observation belongs to the positive (negative) class. A value of zero means that there is not enough evidence to decide to which class the observation belongs (Mortada et al. 2011).

$$\begin{aligned} \Delta (O)=\sum \limits _{i=1}^{N^{+}} \alpha _i^+ P_i^+ \left( O \right) -\mathop \sum \limits _{i=1}^{N^{-}} \alpha _i^- P_i^- \left( O \right) \end{aligned}$$
(2)

where \(\hbox {N}^{+}(\hbox {N}^{-})\) is the number of generated positive (negative) patterns, \(\hbox {P}_\mathrm{i}^+ \left( \hbox {O} \right) \left( {\hbox {P}_\mathrm{i}^- \left( \hbox {O} \right) } \right) \) is equal to 1 if pattern \(\left( \hbox {i} \right) \) covers observation O and to zero otherwise, and \(\upalpha _\mathrm{i}^+ \left( {\upalpha _\mathrm{i}^- } \right) \) is the weight of the positive (negative) pattern \(\hbox {p}_\mathrm{i}^+ \left( {\hbox {p}_\mathrm{i}^- } \right) \). Each weight is the proportion of observations covered by the corresponding pattern and represents the power of that pattern: the strongest pattern is the one that covers the highest number of observations.
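A minimal sketch of Eq. (2), with each pattern represented as a hypothetical (coverage-test, weight) pair:

```python
from fractions import Fraction

def discriminant(obs, pos_patterns, neg_patterns):
    """Eq. (2): weighted sum of positive patterns covering obs, minus the
    weighted sum of negative patterns covering obs.
    pos_patterns / neg_patterns: lists of (covers_fn, weight) pairs."""
    score = sum(w for cov, w in pos_patterns if cov(obs))
    score -= sum(w for cov, w in neg_patterns if cov(obs))
    return score
```

The sign of the returned score gives the predicted class; zero means the evidence is insufficient.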

Numerical example

In order to explain the LAD methodology, we introduce the following numerical example. We assume that we have seven observations and the corresponding measured qualities, such as the surface roughness or the delamination. The quality takes the label or class 1 (0) to represent conforming (non-conforming) specification. We assume that the machining condition measurements have already been converted to binary attributes b1 to b5 by using the procedure presented in section “The methodology”. We search for the combinations of machining conditions, that is, the characteristic patterns, which differentiate between parts that conform and parts that do not conform to specifications. The seven observations are shown in columns 2–6 of Table 2. Each observation \(\hbox {i}=1\) to 7 is associated with a Boolean observation vector \(\hbox {a}_\mathrm{i} =(\hbox {a}_{\mathrm{i},1} ,\hbox {a}_{\mathrm{i},2} ,\ldots \hbox {a}_{\mathrm{i},\hbox {q}} ,\hbox {a}_{\mathrm{i},\hbox {q}+1} ,\hbox {a}_{\mathrm{i},\hbox {q}+2} ,\ldots \hbox {a}_{\mathrm{i},2\hbox {q}} )\), where q = 5. These are the literals of the observations, as in columns 2–6, and their negations. The Boolean observation vectors are shown in Table 2.

Table 2 Collected Boolean observation vectors and their classes

\(\hbox {Y}=(\hbox {y}_1 ,\hbox {y}_2 ,\hbox {y}_3 )\) is the Boolean coverage vector whose number of elements equals the number of positive observations, and where \(\hbox {y}_{\mathrm{i}}\) is equal to 0 if a pattern p covers positive observation i, and 1 otherwise. Minimizing the sum of the elements of \(\hbox {Y}\) means finding the positive pattern that covers the maximum number of positive observations, that is, the strongest pattern.

Accordingly, the MILP for the pattern generation problem is formulated as follows:

$$\begin{aligned}&\hbox {Minimum}\,\hbox {y}_1 +\hbox {y}_2 +\hbox {y}_3 \\&\hbox {S.t.}\\&\hbox {w}_1 +\hbox {w}_{6} \le 1,\hbox {w}_2 +\hbox {w}_7 \le 1,\hbox {w}_3 +\hbox {w}_8 \le 1,\hbox {w}_4 +\hbox {w}_{9} \le 1,\\&\qquad \hbox {w}_5 +\hbox {w}_{10} \le 1\\&\hbox {w}_2 +\hbox {w}_4 +\hbox {w}_5 +\hbox {w}_{6} +\hbox {w}_8 +5\hbox {y}_1 \ge \hbox {d},\hbox {w}_1 +\hbox {w}_2 +\hbox {w}_4\\&\qquad +\,\hbox {w}_8 +\hbox {w}_{10} +5\hbox {y}_2 \ge \hbox {d}\\&\hbox {w}_1 +\hbox {w}_2 +\hbox {w}_3 +\hbox {w}_9 +\hbox {w}_{10} +5\hbox {y}_3 \ge \hbox {d},\hbox {w}_1 +\hbox {w}_3 +\hbox {w}_7\\&\qquad +\,\hbox {w}_9 +\hbox {w}_{10} \le \hbox {d}-1\\&\hbox {w}_4 +\hbox {w}_5 +\hbox {w}_6 +\hbox {w}_7 +\hbox {w}_8 \le \hbox {d}-1,\hbox {w}_1 +\hbox {w}_2 +\hbox {w}_4\\&\qquad +\,\hbox {w}_5 +\hbox {w}_8 \le \hbox {d}-1\\&\hbox {w}_5 +\hbox {w}_6 +\hbox {w}_7 +\hbox {w}_9 +\hbox {w}_{10} \le \hbox {d}-1,\hbox {w}_1 +\hbox {w}_2 +\hbox {w}_3\\&\qquad +\,\hbox {w}_4 +\hbox {w}_5 +\hbox {w}_6 +\hbox {w}_7 +\hbox {w}_8 +\hbox {w}_9 +\hbox {w}_{10} =\hbox {d}\\&1\le \hbox {d}\le 5,\hbox {w}_\mathrm{j} \in \left\{ {0,1} \right\} \,\forall \hbox {j}=1,\ldots ,10,\hbox {y}_1 ,\hbox {y}_2 ,\hbox {y}_3 \in \left\{ {0,1} \right\} , \end{aligned}$$

This MILP problem has three sets of decision variables \((\hbox {Y},\hbox {d},\hbox {W})\), and it can be solved by any MILP solver (Linderoth and Lodi 2011). The strongest pattern obtained is \(\hbox {W}\) \(=(0,1,0,0,0,0,0,0,0,1)\), which means that \(\hbox {p}_1^+ =\hbox {x}_2 \bar{\hbox {x}}_5\); therefore the attributes' values must be equal to (1, 0) at attributes \((\hbox {b}_2 ,\hbox {b}_5 )\) in order for an observation to be covered by this pattern. The pattern is of degree \(\hbox {d}=2\), and \(Y=(1,0,0)\), which means that one positive observation (\(\hbox {y}_1 =1\)) is not yet covered. In this small example, it is easy to see that of the three positive observations 1, 2, and 3, observations 2 and 3 are covered by the pattern found, while observation 1 is not. The pattern generation process is repeated in order to find a pattern that covers observation 1.

In order to generate the \(\hbox {p}_2^+ \) pattern, the observations which have been covered by \(\hbox {p}_1^+ \) are removed. The remaining data set is given in Table 3.

Table 3 The remaining dataset after finding the first positive pattern

Let \(\hbox {Y}=(\hbox {y}_1 )\), where Y is the Boolean coverage vector whose number of elements equals the number of remaining positive observations. The MILP is as follows:

$$\begin{aligned}&\hbox {Minimum}\,\hbox {y}_1\\&\hbox {S.t.}\\&\hbox {w}_1 +\hbox {w}_{6} \le 1,\hbox {w}_2 +\hbox {w}_7 \le 1,\hbox {w}_3 +\hbox {w}_8 \le 1,\hbox {w}_4 +\hbox {w}_{9} \le 1,\\&\qquad \hbox {w}_5 +\hbox {w}_{10} \le 1\\&\hbox {w}_2 +\hbox {w}_4 +\hbox {w}_5 +\hbox {w}_{6} +\hbox {w}_8 +5\hbox {y}_1 \ge \hbox {d},\hbox {w}_1 +\hbox {w}_3 +\hbox {w}_7\\&\qquad +\,\hbox {w}_9 +\hbox {w}_{10} \le \hbox {d}-1\\&\hbox {w}_4 +\hbox {w}_5 +\hbox {w}_6 +\hbox {w}_7 +\hbox {w}_8 \le \hbox {d}-1,\hbox {w}_1 +\hbox {w}_2 +\hbox {w}_4\\&\qquad +\,\hbox {w}_5 +\hbox {w}_8 \le \hbox {d}-1\\&\hbox {w}_5 +\,\hbox {w}_6 +\hbox {w}_7 +\hbox {w}_9 +\hbox {w}_{10} \le \hbox {d}-1,\hbox {w}_1 +\hbox {w}_2 +\hbox {w}_3\\&\qquad +\,\hbox {w}_4 +\hbox {w}_5 + \hbox {w}_6 +\hbox {w}_7 +\hbox {w}_8 +\hbox {w}_9 +\hbox {w}_{10} =\hbox {d}\\&1\le \hbox {d}\le 5,\hbox {w}_\mathrm{j} \in \left\{ {0,1} \right\} \,\forall \hbox {j}=1,\ldots ,10,\hbox {y}_1 \in \left\{ {0,1} \right\} , \end{aligned}$$

By solving the MILP for the second iteration, the strongest pattern is \(\hbox {W}=(0,1,0,0,0,1,0,0,0,0)\), which means that \(\hbox {p}_2^+ =\bar{\hbox {x}}_1 \hbox {x}_2\); therefore the attributes' values are (0, 1) at attributes \((\hbox {b}_1 ,\hbox {b}_2 )\). The pattern is of degree \(\hbox {d}=2\), and \(\hbox {Y}=[0]\), which means that all the positive observations are covered. Since every positive observation is covered by at least one pattern, the pattern generation procedure stops. The same procedure is repeated in order to generate the negative patterns. Finally, the generated patterns are:

$$\begin{aligned} \hbox {Positive patterns}: \hbox {p}_1^+&= \hbox {x}_2 \bar{\hbox {x}}_5\, \hbox {with weight}\, {\upalpha }_1^+ =2/3\,\hbox {and}\, \hbox {p}_2^+\\&= \bar{\hbox {x}}_1 \hbox {x}_2\,\hbox {with weight}\, {\upalpha }_2^+ =1/3\\ \hbox {Negative patterns}: \hbox {p}_1^-&= \bar{\hbox {x}}_2 \,\hbox {with weight}\, {\upalpha }_1^- =3/4\, \hbox {and}\, \hbox {p}_2^-\\&= \hbox {x}_1\,\hbox {x}_5 \, \hbox {with weight}\, {\upalpha }_2^- =1/4 \end{aligned}$$

The interpretability power of LAD is obvious from the fact that any user can now go back to the collected observations and check the existence of these patterns and their coverage, as well as their signs and their meanings. The hidden knowledge discovery property is also obvious: even in this small example, a human mind would not easily discover these patterns. This pattern discovery process was performed using the software cbmLAD (Software 2012; Bennane and Yacout 2012); it took less than one second. Finally, we note that the MILP is a procedure for pattern generation and discovery only; LAD does not assume any mathematical model of the relations between the variables.

The discriminant function that generates a score ranging between \(-1\) and 1 is as follows:

$$\begin{aligned} \Delta \left( \hbox {O} \right)&= \sum \limits _{\mathrm{i}=1}^{\mathrm{N}^{+}} {\upalpha }_\mathrm{i}^+ \hbox {P}_\mathrm{i}^+ \left( \hbox {O} \right) -\sum \limits _{\mathrm{i}=1}^{\mathrm{N}^{-}} {\upalpha }_\mathrm{i}^- \hbox {P}_\mathrm{i}^- \left( \hbox {O} \right) \\&= \left( {\frac{2}{3}\hbox {p}_1^+ +\frac{1}{3}\hbox {p}_2^+ } \right) -\left( \frac{3}{4}\hbox {p}_1^- +\frac{1}{4}\hbox {p}_2^-\right) \end{aligned}$$

For example, for a new observation (1, 0, 0, 1, 0), the discriminant function gives \(\Delta \left( \hbox {O} \right) =\left( {\frac{2}{3}(0)+\frac{1}{3}(0)} \right) -\left( {\frac{3}{4}\left( 1 \right) +\frac{1}{4}\left( 0 \right) } \right) =-0.75\).

The classification decision for this new observation is predicted to be the negative class.

Performance comparison

The ANN technique

ANN is one of the most widely known machine learning techniques. It adapts and learns efficiently; for these reasons, it is widely used as a modeling tool for machining processes (Benardos and Vosniakos 2002; Çaydaş and Ekici 2012). An ANN is generally composed of three types of layers: an input layer, which accepts the input attributes and has a number of neurons equal to the number of attributes; hidden layers, each with its own number of neurons; and an output layer with one neuron. The number of hidden layers and their neurons depends on the nonlinearity of the model. All neurons in any layer are interconnected with the neurons of the preceding and following layers through weighted links (Sharma et al. 2008).

The input variables, which are the controllable and the monitored uncontrollable variables, and the quality outcomes, which are the delamination and the surface roughness, are shown in Fig. 3(A, B). We use four models. Model (A-1) has the controllable variables as inputs, namely the cutting speed, feed, tool overhang length, and cutting distance; the output is the delamination, which can be conforming or non-conforming to specifications. Model (A-2) has the same controllable inputs; the output is the surface roughness, which can be conforming or non-conforming. Model (B-1) has the monitored uncontrollable variables as inputs, namely the forces in the three coordinates and the mean temperature; the output is the delamination. Model (B-2) has the same monitored uncontrollable inputs; the output is the surface roughness.

Fig. 3

ANN models: a controllable variables model, b monitored uncontrollable variables model

Unlike the LAD approach, the ANN is subject to overfitting. In order to find a good model, we tried several and retained the best (Russell et al. 1995); by the best, we mean the network architecture that gives the highest prediction accuracy in the validation test. This is discussed in detail in section “Process control system”. In this paper, we use the Weka data mining software (Hall et al. 2009). For the delamination analysis, the proportion of conforming to non-conforming observations is 12:96, an obvious imbalance between the minority and majority classes. The Synthetic Minority Over-sampling Technique (SMOTE) is applied to rebalance the class distribution (Witten et al. 2011); it adjusted the relative frequency between the two classes to 48:96. The same technique is applied for the surface roughness analysis to adjust the relative frequency from 6:102 to 24:102. For further reading about SMOTE, we refer the reader to Chawla et al. (2011). Table 4 shows the best obtained network architectures.
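The idea behind the oversampling step can be sketched as follows. This is a minimal SMOTE-style illustration only, not Weka's implementation; the function name and parameters are ours:

```python
import random

def smote_like(minority, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples, each interpolated between
    a minority observation and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest minority neighbours of a (squared Euclidean distance)
        nbrs = sorted((m for m in minority if m is not a),
                      key=lambda m: sum((x - y) ** 2 for x, y in zip(a, m)))[:k]
        b = rng.choice(nbrs)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic
```

For the delamination data, rebalancing 12:96 to 48:96 corresponds to generating 36 such synthetic minority observations.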

Table 4 ANN architectures for the four models

The learning rate parameter takes a value in [0, 1] and determines the step size, and hence how quickly the search converges. If it is too large, the search may overshoot and miss the minimum entirely; if it is too small, progress toward convergence is slow. The momentum parameter also takes a value in [0, 1]; it updates a new weight by adding a small proportion of the previous weight change, which smooths the search process. The confusion matrix is \(\left( {{\begin{array}{ll} \hbox {G}&{} {\hbox {D}^{+}-\hbox {G}} \\ {\hbox {D}^{-}-\hbox {H}}&{} \hbox {H} \\ \end{array} }} \right) \), where \(\hbox {G}\) is the total number of correctly classified positive observations, H is the total number of correctly classified negative observations, and \(\hbox {D}^{+}(\hbox {D}^{-})\) is the number of positive (negative) observations.

Validation and comparison

The validation and comparison of different techniques often represent a challenge for machine learning researchers (Wolpert 1996). Usually, two different learning techniques are applied to the same problem and their results are compared in order to decide which technique is better. The technique with the higher accuracy, obtained from cross-validation with several repetitions, is retained; this procedure is quite sufficient for comparison in many practical applications (Witten et al. 2011). In (Hammer and Bonates 2006; Yacout 2010), the LAD methodology was compared to the best reported results obtained by machine learning techniques. The comparison was performed on a number of well-known problems that are kept in repositories for use by researchers, and it was favorable to the LAD technique (Mortada et al. 2009). In this paper, two qualities, namely the delamination and the surface roughness, are considered. For each of the two qualities, the given specifications divide the outcome space into two distinct spaces: the space of conforming products (positive) and the space of non-conforming products (negative). We also divide the set O of the n observations into a training set, L, and a testing set, T. We present the results obtained when the training set is composed of (n\(-\)1) observations and the testing set is formed of the remaining observation. To calculate the classification accuracy, we repeated the training–testing process n times, where each observation was chosen exactly once to constitute the testing set. This training and testing procedure is known as leave-one-out (LOOC) cross validation, which is considered by many machine learning references as the best validation procedure when the amount of data for training and testing is limited (Witten and Frank 2011).
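The leave-one-out procedure can be sketched as follows (function names are ours; any classifier can be plugged in through `train_fn` and `predict_fn`):

```python
def loocv_accuracy(observations, labels, train_fn, predict_fn):
    """Hold out each observation once, train on the remaining n-1, and
    return the fraction of held-out observations classified correctly."""
    correct = 0
    n = len(observations)
    for i in range(n):
        train_x = observations[:i] + observations[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train_fn(train_x, train_y)
        correct += predict_fn(model, observations[i]) == labels[i]
    return correct / n
```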
LOOC is a special case of K-fold cross validation with K = n, where n is the total number of observations. The procedure is attractive for two reasons. First, the greatest possible amount of data is used for training, which presumably increases the chance that the classifier is accurate. Second, the procedure is deterministic: no random sampling is performed. By contrast, if we divided the data into two equal parts, 50 % for learning and 50 % for testing, we would omit 50 % of a limited number of observations from the learning process, which would affect it negatively; moreover, we would need a sampling strategy to choose the 50 % of observations. Two measures of accuracy, ACCURACY and the quality of classification, \(({\upnu })\), are used, where

$$\begin{aligned} {\upnu }=\frac{a+b}{2}+\frac{e+g}{4} \end{aligned}$$

The values (a) and (b) represent the proportions of positive and negative observations, respectively, which are correctly classified. The values (e) and (g) represent the proportions of positive and negative observations, respectively, which are not classified. Another measure is \(\hbox {ACCURACY} =\frac{\hbox {G}+\hbox {H}}{\hbox {Nt}}\), where \(\hbox {G}\) is the total number of correctly classified positive observations, \(\hbox {H}\) is the total number of correctly classified negative observations, and \(\hbox {Nt}\) is the total number of observations in the testing set. Table 5 shows the accuracy of the four models and compares the accuracy of the ANN and LAD techniques.
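The two measures can be written directly from their definitions. The numeric values below are illustrative only; the variable names follow the text:

```python
# The two accuracy measures defined in the text. a, b: proportions of
# correctly classified positive / negative observations; e, g: proportions
# of unclassified positive / negative observations.

def quality_of_classification(a, b, e, g):
    """Quality of classification, nu = (a + b)/2 + (e + g)/4."""
    return (a + b) / 2 + (e + g) / 4

def accuracy(G, H, Nt):
    """ACCURACY = (G + H) / Nt: correctly classified positives G plus
    correctly classified negatives H over the testing-set size Nt."""
    return (G + H) / Nt
```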

Table 5 Accuracy of the ANN and the LAD techniques

In general, all statistical models are biased in one way or another; therefore, comparisons between learning algorithms that use different priors are meaningless (Wolpert 1996). Here, we compare two different techniques, ANN and LAD. The LAD methodology has been compared to the most popular machine learning techniques, such as the Support Vector Machine (SVM) and ANN (Hammer and Bonates 2006; Yacout 2010). In general, if the comparison shows that one of the algorithms has substantially higher accuracy than the other, that algorithm should be used (Wolpert 1996). It can be seen that the accuracy of LAD compares favorably with that of ANN.

Process control system

Our objective is to use the data presented in Table 1 in order to train LAD to detect automatically, and without human interference, the threshold values and characteristic patterns for the zones of machining conditions that lead to acceptable quality, and those that lead to unacceptable quality. Although LAD generates positive and negative patterns for each of the four problems, in the following machining process control we use only the positive patterns of Models (A-1) or (A-2), and only the negative patterns generated for Models (B-1) or (B-2). To reach this objective, the software cbmLAD (Software 2012) is trained by using the data obtained from the experimental results shown in Table 1. Table 6 shows the positive characteristic patterns for Models (A-1) and (A-2), and the negative characteristic patterns for Models (B-1) and (B-2), which are found by the software.

Table 6 The positive patterns obtained by LAD for Models (A-1) and (A-2), and the negative patterns obtained for Models (B-1) and (B-2)

These generated patterns are used in the machining process control. The generated positive patterns illustrate the threshold boundaries for the controllable conditions that always lead to conforming (positive) parts. In our machining process control, the negative patterns that are formed with the uncontrollable variables are used to give an alarm indicating that the machining process is beginning to produce unacceptable products. For example, the generated negative pattern (1) for Model (B-2) is \(\hbox {Fx} >24.745\). This means that as long as \(\hbox {Fx}\) is higher than 24.745, the machined part will be non-conforming to the required specification of surface roughness. The same can be said for the negative pattern (5), which is \(\hbox {Fx}<14.91\). Together, these two constraints illustrate the boundaries of the zones of \(\hbox {Fx}\) which should be avoided during the machining process. As we have explained, cbmLAD identifies and characterizes these regions precisely and by using the lowest possible number of variables. To avoid the zones defined by the negative patterns, a simulated adaptive control loop is developed, as shown in Fig. 4. The generated patterns are incorporated in the machining process control shown in Fig. 4.
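The alarm condition implied by the two negative patterns quoted above can be sketched as a simple interval test. The thresholds are the ones given in the text for Model (B-2); the function name is illustrative:

```python
# Sketch of the alarm check for the two Fx negative patterns of Model (B-2)
# quoted in the text: pattern (1) Fx > 24.745 and pattern (5) Fx < 14.91.
# A monitored reading inside either zone indicates a non-conforming part.

FX_UPPER = 24.745  # negative pattern (1): Fx above this must be avoided
FX_LOWER = 14.91   # negative pattern (5): Fx below this must be avoided

def fx_alarm(fx):
    """Return True if the monitored Fx falls inside a negative pattern's zone."""
    return fx > FX_UPPER or fx < FX_LOWER
```

Readings between the two thresholds (14.91 to 24.745) lie outside both negative zones and therefore raise no alarm.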

Fig. 4
figure 4

The machining process control

The machining process control is an adaptive control loop with automatic adjustment of the machining parameters, in our case the feed and the speed, in order to improve operation productivity and part quality (Liang et al. 2004). Due to machine design constraints and the complexity of finding the monitoring parameters' constraints and thresholds, a process control loop is not commonly available in CNCs. Nevertheless, it attracts many researchers due to its potential to significantly improve operation productivity and part quality (Liang et al. 2004). In this paper, we assume that the machining process is monitored through sensors. The sensors' measurements are analyzed by the software cbmLAD in order to detect and identify the characteristic patterns; the patterns are obtained from the experimental data. They are then used to build the adaptive control loop. Figure 4 shows a schematic diagram of the machining process control. It starts with an off-line pattern generation using cbmLAD. The generated patterns are transmitted to a “LAD On-line Decision Making” unit. The on-line loop starts by monitoring the uncontrollable variables. Every second, by comparing the uncontrollable variables’ values to the negative characteristic patterns stored in the “LAD On-line Decision Making” unit, a decision is made whether to change the values of the controllable variables or to keep the current values. In the former case, the information is sent to the “Process Controller” unit in order to adjust the controllable variables to the nearest positive patterns’ zones. The adjusted variables are the inputs to the actuator and the spindle drive.

To give a simulated example of the machining process control for the delamination quality, a simulated machining process control system is developed using the LabVIEW 8.5 software (Elliott et al. 2007). As an example, Fig. 5 shows the front panels of Models (B-1) and (A-1). We use the negative patterns for the uncontrollable variables of Model (B-1), and the positive patterns for the controllable variables of Model (A-1), as shown in Table 6. The uncontrollable variables, which are the forces in three coordinates (\(\hbox {Fx}, \hbox {Fy},\hbox {Fz})\) and the mean temperature Tmean, are monitored and their values are sent to the “LAD On-line Decision Making” unit every second in order to compare them to the stored negative patterns. A decision is then taken to either change the values of the controllable operating conditions in order to avoid the negative patterns’ zones, or to keep them as they are. The “LAD On-line Decision Making” unit gives an alarm if the uncontrollable variables comply with one of the negative patterns in Model (B-1). If the alarm is given, the “Process Controller” selects one of the positive patterns in Model (A-1). The selection of a positive pattern is guided by the dynamics of the machining process. The new values of the controllable variables are found in the selected positive pattern, and are the inputs to the actuator and the spindle drive. The adaptive control loop repeats every second until the “LAD On-line Decision Making” alarm is off. Figure 6 shows the flowchart of the process control.

To test the procedure described in the previous paragraph, a simulation model of the process control is developed. We assume that the correlation between the controllable variables (speed \((v)\), feed (\(f\)), tool overhang length (\(\hbox {TL})\), and cut distance (\(\hbox {C}))\) and the monitored variables (forces in three coordinates (\(\hbox {Fx},\,\hbox {Fy},\,\hbox {Fz})\) and mean temperature Tmean) for milling the CFRP composite material is represented by a simple multiple linear regression with a sample size (n) of 108. This assumption is only used to generate the values of the uncontrollable forces; in real life these values are generated by the milling process itself and captured by the sensors. The equations, obtained using the Weka data mining software, are as follows:

$$\begin{aligned} \hbox {Fx}&= -0.0012\,v+0.0402\,f+0.4033\,\hbox {TL}+0.1913\,\hbox {C}\\&\quad +\,20.1253\\ \hbox {Fy}&= -0.0014\,v+0.0493\,f+0.5711\,\hbox {TL}+0.0719\,\hbox {C}\\&\quad +\,14.373\\ \hbox {Fz}&= -0.0006\,v+0.0235\,f+14.9966\\ \hbox {Tmean}&= 0.1952\,f+7.4782\,\hbox {TL}+1.0048\,\hbox {C}-45.2817 \end{aligned}$$
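The regression equations above translate directly into a small simulator of the uncontrollable variables. Variable units follow the experimental setup of the paper; the function name is ours:

```python
# The four regression equations above, used only to simulate the
# uncontrollable variables (Fx, Fy, Fz, Tmean) from the controllable
# ones: speed v, feed f, tool overhang length TL, and cut distance C.

def simulate_uncontrollables(v, f, TL, C):
    """Return (Fx, Fy, Fz, Tmean) from the fitted linear regressions."""
    Fx = -0.0012 * v + 0.0402 * f + 0.4033 * TL + 0.1913 * C + 20.1253
    Fy = -0.0014 * v + 0.0493 * f + 0.5711 * TL + 0.0719 * C + 14.373
    Fz = -0.0006 * v + 0.0235 * f + 14.9966
    Tmean = 0.1952 * f + 7.4782 * TL + 1.0048 * C - 45.2817
    return Fx, Fy, Fz, Tmean
```

In the real control loop, these simulated values are replaced by the sensors' readings from the milling process itself.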
Fig. 5
figure 5

On-line machining process control for delamination quality using LabVIEW

Fig. 6
figure 6

Flow chart of process control

Since the tool overhang length cannot be changed on-line, it was predefined and fixed before the simulation. According to the positive patterns 2–6 in Model (A-1), the tool overhang length was restricted to less than 27.5 mm. We chose overhang lengths of \(\hbox {TL}=24, 25, 26,\,\hbox {and}\,27\,\hbox {mm}\) for our simulated example. This means that only these five positive patterns of Model (A-1) are available to the “Process Controller” in order to control the machining process, since the first pattern can only be satisfied with TL higher than 27 mm, as long as it is less than 34.5 mm. The cutting distance is also a predefined input which is set by the user before starting the simulation, and has a predefined value in the range of \(\hbox {C}\le 96\,\hbox {mm}\) during the simulation runs. To test the simulated process control, simulation runs at \(\hbox {C}=24,27,30,\ldots ,87,90,93,\,\hbox {and}\,96\,\hbox {mm}\) were performed. The total number of simulated runs is thus equal to 100. As an example, Table 7 shows how the iterations terminate by selecting one of the five positive patterns of Model (A-1). The elapsed time to find the positive pattern depends on the initial conditions, the inertia of the CNC machine, and the number of positive patterns generated off-line; in this example, five positive patterns are available. Run No. 1 terminates after 13 seconds by finding positive pattern number (5) in Model (A-1), and run No. 2 terminates in 4 seconds, finding pattern number (4). In this work, we considered the iteration step to be one second.
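The count of 100 simulated runs follows from the grid of predefined inputs: four overhang lengths crossed with the 25 cutting distances from 24 mm to 96 mm in steps of 3 mm. A quick check, with illustrative variable names:

```python
# Check of the simulated run grid described above: TL in {24, 25, 26, 27} mm
# and C = 24, 27, ..., 96 mm (step 3 mm) give 4 x 25 = 100 runs.

tl_values = [24, 25, 26, 27]            # fixed before each simulation
c_values = list(range(24, 97, 3))       # 24, 27, ..., 93, 96
runs = [(tl, c) for tl in tl_values for c in c_values]
```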

Table 7 Two runs of the simulated process control using LabVIEW

Conclusions

In this paper, LAD is applied to the high speed routing of CFRP, and the characteristic patterns that lead to conforming products and those that lead to nonconforming products are found by exploiting the results obtained experimentally from a routing process of CFRP. The accuracy of LAD is compared to that of ANN. An on-line machining process control is developed by using the patterns that were found off-line. A simulated machining process control is implemented by using the experimental results and the LabVIEW software. The simulation model shows how LAD is used to control the routing process by tuning the routing conditions autonomously, in order to always return to the machining zones defined by the positive patterns.

As for areas of further research, we are presently working on incorporating the machining process control in a real computer numerical control (CNC) machine. The learning phase will be done off-line by cbmLAD, based on data obtained from sensors mounted on the CNC machine. At each unit of time, a new sensors’ reading is transmitted to the “LAD On-line Decision Making” unit. The latter works on-line in order to give an alarm each time a negative pattern of the uncontrollable variables is detected. The “Process Controller” unit searches on-line for a positive pattern of the controllable variables; then a decision to change the values of the controllable variables or to keep the current values is taken. In the former case, the actuator and the spindle execute the “Process Controller’s” command. We are also working on studying the effects of initiating the alarm based on the discriminant function of the new observation, instead of only on the appearance of a negative pattern.