Introduction

As a key tool of statistical process control (SPC), the control chart plays an important role in manufacturing quality control, where it is widely used to monitor whether a machining process is in control. A manufacturing process is considered natural or normal if only random causes affect its operation (Zan et al. 2010). Therefore, unnatural control chart patterns (CCPs) displayed on control charts can be associated with specific causes that adversely affect the manufacturing process (Western Electric Company 1958). Nelson (1984) provided possible explanations and corrective actions for unnatural patterns. For instance, he stated that trend patterns might point to wear or thermal deformation of key parts of a machine tool, while changes in operators, materials, or equipment might result in shift patterns; furthermore, cyclic patterns might be related to periodic variation in the power supply (Hachicha and Ghorbel 2012).

When some points exceed the control limits, or when the control chart displays an unnatural pattern, the monitored manufacturing process is out of control (Montgomery 2007). The former can easily be recognized by quality practitioners, whereas identifying the latter requires specific methods. For this reason, many scholars have formulated supplementary rules for detecting unnatural patterns, such as the Geometric Moving Averages test (Roberts 1959), the runs rules (Duncan 1986), and the zone rules (Nelson 1985). However, Cheng (1997) later pointed out that there is no one-to-one mapping between a supplementary rule and an unnatural pattern; moreover, applying too many rules may lead to too many false alarms. Manual experience and knowledge are still needed to judge whether an alarm is genuine, which places an additional burden on quality control (Ranaee and Ebrahimzadeh 2011). Accordingly, many studies have pointed out that supplementary rules are often ineffective in recognizing CCPs (Davis and Woodall 1988; Yang et al. 2015). Because of these deficiencies and the constantly increasing requirements of intelligent manufacturing, interest in developing accurate and automatic control chart pattern recognition (CCPR) algorithms has increased significantly. Consequently, many CCPR methods have been developed recently; they can be divided into two categories: expert system methods and machine learning methods.

Owing to the limitations of expert system methods, comparatively little research has been conducted in this field. Swift (1987) designed the first expert system for CCPR. On this basis, many other studies have been conducted (Cheng and Hubele 1992; Kuo and Mital 1993; He et al. 2013). For example, Bag (2012) designed and developed an expert system for on-line detection of various control chart patterns, enabling quality control practitioners to initiate prompt corrective actions for an out-of-control manufacturing process. The algorithms typically used in these systems are mainly statistical tests and heuristic algorithms. The performance of these expert system methods is limited because their judgment rules are flawed (Zhou et al. 2018); nevertheless, their engineering application value is worth affirming.

On the other hand, machine learning methods have been widely applied to CCPR with excellent results. These methods can be classified into two categories, namely artificial neural networks (ANNs) and support vector machines (SVMs). ANN-based methods include supervised algorithms such as the multi-layer perceptron (MLP) (Cheng 1997; Pham and Wani 1997; Al-Assaf 2004; Ranaee and Ebrahimzadeh 2013), learning vector quantization (LVQ) (Guh 2008; Gauri 2010; Yang and Zhou 2015), the probabilistic neural network (PNN) (Cheng and Ma 2008), and the radial basis function (RBF) neural network (Addeh et al. 2018). Additionally, a fuzzy adaptive resonance theory map (ARTMAP) neural network with incremental learning ability was proposed to recognize CCPs (Zan et al. 2010), in which an unsupervised algorithm was used to cluster the CCPs. This method was reported to be superior to the traditional MLP method; however, the number of clusters exceeded the number of actual pattern types, which made the method inconvenient to use. Awadalla and Sadek (2012) proposed a spiking neural network (SNN) architecture for CCPR. SNNs represent the third generation of ANNs, which treat time as an important feature for information representation and processing; in that work, the learning algorithm was also improved to provide better learning rules. However, ANNs have certain shortcomings that limit their universality and practicability, such as difficulty in convergence, a tendency to become trapped in local extrema, and difficulty in determining the most suitable network structure. As a newer generation of machine learning methods, the SVM has been widely applied to CCPR with good results. Zhou et al. (2018) proposed a novel CCPR method integrating a fuzzy SVM (FSVM) with a hybrid kernel function and a genetic algorithm (GA); their simulation results demonstrated that the method outperformed other approaches, such as LVQ, MLP, PNN, fuzzy clustering, and the SVM, in terms of recognition accuracy. Zhao et al. (2017) proposed a CCPR method based on improved supervised locally linear embedding and an SVM, in which the embedding was used to reduce the dimensionality of a high-dimensional feature set. The results showed that the dimension of the feature set had a great influence on classification accuracy and that the proposed method recognized the patterns correctly.

Another important problem in pattern recognition is the form of the data used as classifier input, i.e., the input data representation. The first form is raw data (Pham and Oztemel 1994; Hassan et al. 2003; Cheng and Ma 2008). However, using unprocessed CCP data directly for CCPR causes problems such as high input dimensionality: high-dimensional input data usually result in an excessively large classifier, which reduces accuracy and efficiency for complex recognition problems (Ranaee and Ebrahimzadeh 2011). The second form is a feature set consisting of shape features (Pham and Wani 1997; Gauri and Chakraborty 2006, 2009; Pelegrina et al. 2016), statistical features (Pelegrina et al. 2016; Addeh et al. 2018), or wavelet analysis features (Jin and Shi 2001; Ranaee and Ebrahimzadeh 2011). Most related studies have shown that CCP classifiers using a feature set as input achieve significantly better performance than classifiers using raw data (Hachicha and Ghorbel 2012). It is also well known that choosing the most suitable feature set is key to improving recognition accuracy. Although many feature extraction methods have been proposed for CCPR, with features designed by experts for the required tasks, the potential of the feature-based approach has not yet been fully exploited, because the discarded raw data still contain much important information. Thus, the most suitable feature set for CCPR remains unclear, and a more comprehensive and effective feature extraction approach, such as feature learning, is needed.

Feature learning refers to the collection of techniques that learn a transformation, or a sequence of transformations, of raw data so that the data are optimally represented for a required task (Janssens et al. 2016). In recent years, deep neural networks (DNNs) have been widely applied to feature learning. Compared with traditional ANNs, the special structure of DNNs makes it possible to extract features directly from raw data (Ajm and Hulzebosch 1996). As a representative type of DNN, convolutional neural networks (CNNs) have achieved remarkable results in image recognition. CNNs transform the input data through alternating convolution and pooling layers and optimize the convolution kernels with the backpropagation (BP) algorithm to minimize a cost (or loss) function, so that by the classification step the input data have been transformed optimally, i.e., features optimal for the classification task have been learned. With the popularity of deep learning, researchers have begun to use CNNs for fault diagnosis (Liao et al. 2017; Xia et al. 2018; Xie and Zhang 2017). For example, Janssens et al. (2016) used CNNs to extract vibration signal features for rotating machinery fault diagnosis and obtained excellent results.

In this paper, a one-dimensional convolutional neural network (1D-CNN), which is sensitive to time sequences, is proposed to extract CCP features and perform CCPR. To the best of our knowledge, only one article presenting CNN-based CCPR has been published (Miao and Yang 2019). In that work, statistical and shape features were first extracted from the raw data using mathematical methods, and a CNN was then applied to the extracted features. Our work differs in that no features are extracted from the CCPs beforehand: the CNN-based model is applied directly to the raw data so that the network can learn optimal features for CCPR. Feature learning not only simplifies the model and reduces time consumption but can also yield better features than those extracted manually. In addition, the proposed method can help raise the level of automation and intelligence of quality management in the manufacturing process.

The rest of the paper is organized as follows. “Methodology” Section explains the simulation method of the CCPs and presents the recognition algorithm. “Proposed method” Section introduces the proposed method in detail. “Experiments” Section presents the verification test and gives the results. Subsequently, the results are discussed in “Discussion” Section. Lastly, “Conclusion” Section concludes the paper.

Methodology

Simulation method of CCPs

As shown in Fig. 1, there are generally six typical CCPs in the production process, namely the normal (NOR) pattern, the upward- and downward-shift (US and DS) patterns, the upward- and downward-trend (UT and DT) patterns, and the cycle (CYC) pattern. All patterns other than NOR are unnatural and correspond to specific abnormal changes in the monitored manufacturing process.

Fig. 1

Typical CCPs in the production process

Monte Carlo simulation is commonly used to generate the large number of CCPs needed by recognition algorithms. Following Zan et al. (2010), each data point is composed of the process mean and two further components, a random noise and a special disturbance:

$$ y\left( t \right) = \mu + x\left( t \right) + d\left( t \right) $$
(1)

where y(t) is the value of the sample collected at sampling time t; μ is the process mean when the process is in control; x(t) is the random noise at time t, which follows a normal distribution, x(t) ~ N(0, σ²), with σ the in-control standard deviation; and d(t) is the special disturbance at time t caused by specific factors in the manufacturing process.

Based on Eq. (1), the simulation method of various typical CCPs is as follows.

The NOR pattern is given by:

$$ d\left( t \right) = 0 $$
(2)

The US and DS patterns are given by:

$$ d\left( t \right) = \pm v \times s $$
(3)

where v is a parameter determining the shift position, equal to 0 before the shift and 1 after it; s is the shift magnitude; the sign “+” is used for the US pattern and “−” for the DS pattern.

The UT and DT patterns are given by:

$$ d\left( t \right) = \pm v \times d \times t $$
(4)

where v is a parameter determining the trend position, equal to 0 before the trend starts and 1 after it; d is the slope of the trend (not to be confused with the disturbance d(t)); the sign “+” is used for the UT pattern and “−” for the DT pattern.

The CYC pattern is given by:

$$ d\left( t \right) = v \times a \times \sin \left( {2\pi t/\omega } \right) $$
(5)

where v is defined as for the shift and trend patterns (0 before the cycle starts and 1 after it), a is the amplitude of the cycle, and ω is the period of the cycle.
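A minimal NumPy sketch of Eqs. (1)-(5) is given below as an illustration, not as the authors' code. The function name `simulate_ccp`, the 1-based sampling times, and the `onset` argument (the first time at which v = 1) are assumptions; Eq. (4) is followed literally, so the trend term grows with the absolute time t.

```python
import numpy as np

def simulate_ccp(pattern, n=25, mu=30.0, sigma=0.05, onset=1,
                 s=0.0, d=0.0, a=0.0, omega=8, rng=None):
    """Generate one CCP y(t) = mu + x(t) + d(t) following Eqs. (1)-(5).

    pattern: one of "NOR", "US", "DS", "UT", "DT", "CYC".
    onset:   first sampling time with v = 1 (assumed convention; v = 0 before it).
    s, d, a, omega: shift magnitude, trend slope, cycle amplitude, cycle period.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(1, n + 1)                 # sampling times t = 1..n (assumed 1-based)
    v = (t >= onset).astype(float)          # 0 before the pattern starts, 1 after
    x = rng.normal(0.0, sigma, size=n)      # random noise x(t) ~ N(0, sigma^2)
    if pattern == "NOR":
        dist = np.zeros(n)                            # Eq. (2)
    elif pattern in ("US", "DS"):
        sign = 1.0 if pattern == "US" else -1.0
        dist = sign * v * s                           # Eq. (3)
    elif pattern in ("UT", "DT"):
        sign = 1.0 if pattern == "UT" else -1.0
        dist = sign * v * d * t                       # Eq. (4), absolute time t
    elif pattern == "CYC":
        dist = v * a * np.sin(2 * np.pi * t / omega)  # Eq. (5)
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    return mu + x + dist
```

For example, `simulate_ccp("US", onset=5, s=0.1)` returns a 25-point series whose mean rises by 0.1 from the fifth sample onward.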

CNN model

Deep learning has achieved outstanding results in pattern recognition. Compared with traditional machine learning methods, it has several advantages: (1) data pre-processing can be omitted, and raw data can be used directly for model training and testing; (2) a multi-layer neural network can learn deeper knowledge, making it suitable for more complex tasks; and (3) features suited to the classification task are learned automatically. Deep learning and traditional machine learning methods are compared in Fig. 2.

Fig. 2

Comparison of deep learning and traditional machine learning

A CNN is a type of deep neural network. As shown in Fig. 3, a CNN is made up of three types of layers: convolutional layers, subsampling (pooling) layers, and fully connected layers, with a cost function at the output. A typical CNN structure can therefore be divided into two parts: the convolutional and pooling layers work as a feature extractor, and the fully connected layers work as a classifier (Xie and Zhang 2017).

Fig. 3

Typical CNN structure

The convolutional layer is the most important component of a CNN. The weights and biases of a convolutional layer are organized into a series of convolutional kernels (filters), and applying different filters yields a set of output feature maps. Each output feature map is obtained by convolving every input feature map with its corresponding kernel, summing the results, adding a bias, and applying an activation function:

$$ \varvec{x}_{j}^{l} = f\left( {\mathop \sum \limits_{i = 1}^{{D_{l - 1} }} \varvec{x}_{i}^{l - 1} *\varvec{\omega}_{ij}^{l} + \varvec{b}_{j}^{l} } \right),\quad j = 1, 2, \ldots , D_{l} $$
(6)

where * denotes the convolution operation; l is the index of the current network layer; \( D_{l - 1} \) and \( D_{l} \) are the numbers of input and output feature maps; \( \varvec{\omega}_{ij}^{l} \) is the convolutional kernel connecting the ith feature map of layer l−1 to the jth feature map of layer l, with size \( r \times c \), where \( r \) is its height and \( c \) its width; \( \varvec{x}_{j}^{l} \) is the jth output feature map; \( \varvec{b}_{j}^{l} \) is the additive bias of that output map; and f is an activation function.
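For the 1D case (C = c = 1, stride 1, no padding), Eq. (6) can be sketched in NumPy as follows. The function name and the array layout `(D_prev, D_l, r)` for the kernels are assumptions made for illustration. Like most deep learning libraries, the sketch computes a cross-correlation (`np.correlate`, no kernel flip) rather than a flipped convolution; because the kernels are learned, the two are equivalent in practice.

```python
import numpy as np

def conv1d_layer(x_prev, w, b, f):
    """Eq. (6) for a 1D-CNN (C = c = 1, stride 1, no padding).

    x_prev: (D_prev, R_prev) input feature maps x_i^{l-1}
    w:      (D_prev, D_l, r) kernels w_ij^l (assumed layout)
    b:      (D_l,) biases b_j^l
    f:      element-wise activation function
    Returns the (D_l, R_prev - r + 1) output feature maps x_j^l.
    """
    d_prev = x_prev.shape[0]
    d_l = w.shape[1]
    maps = []
    for j in range(d_l):
        # sum over all input maps i of (x_i^{l-1} * w_ij^l), then add the bias b_j^l
        acc = sum(np.correlate(x_prev[i], w[i, j], mode="valid") for i in range(d_prev))
        maps.append(acc + b[j])
    return f(np.stack(maps))
```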

The most commonly used nonlinear activation functions are the Sigmoid function and the rectified linear unit (ReLU) function, given respectively by:

$$ f\left( x \right) = 1/\left( {1 + e^{ - x} } \right) $$
(7)
$$ f\left( x \right) = \max \left( {0, x} \right) $$
(8)
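Both activation functions are one-liners in NumPy; the sketch below applies them element-wise to arrays. The direct form of Eq. (7) is kept for readability and can emit an overflow warning for very negative inputs (below about −709), where it still returns 0.

```python
import numpy as np

def sigmoid(x):
    """Eq. (7): f(x) = 1 / (1 + e^(-x)), element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Eq. (8): f(x) = max(0, x), element-wise."""
    return np.maximum(0.0, x)
```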

The output feature map of the lth convolutional layer has size \( R_{l} \times C_{l} \), where \( R_{l} \) is its height and \( C_{l} \) its width, calculated as:

$$ R_{l} \times C_{l} = \left[ {\left( {R_{l - 1} - r} \right)/s + 1} \right] \times \left[ {\left( {C_{l - 1} - c} \right)/s + 1} \right] $$
(9)

where s is the moving stride of a convolution kernel, and in this work, s is set to 1.
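For the 1D-CNN, C = c = 1, so Eq. (9) reduces to R_l = (R_{l−1} − r)/s + 1. With R_1 = 25, a kernel height r = 4, and s = 1, for instance, the first convolution yields (25 − 4)/1 + 1 = 22 points. The kernel height here is an arbitrary example, not the value selected in Table 3, and the helper name is hypothetical; integer division drops a final partial window when (R_{l−1} − r) is not divisible by s.

```python
def conv_output_length(r_prev, r, stride=1):
    """Eq. (9) with C = c = 1: R_l = (R_{l-1} - r) / s + 1 (no padding).

    Integer division drops a final partial window when (r_prev - r)
    is not a multiple of the stride.
    """
    return (r_prev - r) // stride + 1
```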

The subsampling layer downsamples the feature maps, rapidly reducing their dimension. A subsampling layer is mathematically represented by:

$$ \varvec{x}_{j}^{l} = down\left( {\varvec{x}_{j}^{l - 1} } \right),\quad j = 1, 2, \ldots , D_{l} $$
(10)

where l is the index of the current network layer, \( D_{l} \) is the number of feature maps (pooling leaves this number unchanged), \( \varvec{x}_{j}^{l} \) is the jth output subsample map, and down(·) denotes a pooling function.

The commonly used subsampling strategies are max pooling and average pooling. Max pooling takes the maximum value of each subsampling region as the new feature, whereas average pooling takes its mean value. Max pooling is generally considered to capture the most prominent feature of the subsampling region, while average pooling gives a smoother result (Xie and Zhang 2017).

The size of the output subsample map of the lth subsampling layer is \( R_{l} \times C_{l} \), and it is calculated by:

$$ R_{l} \times C_{l} = \left( {R_{l - 1} / u} \right) \times \left( {C_{l - 1} /u} \right) $$
(11)

where u is the step size of pooling operation. In this work, u is set to 2.
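A sketch of non-overlapping 1D pooling with window and step u = 2, covering both strategies and the output size of Eq. (11), is shown below. The function name is hypothetical. Eq. (11) presumes R_{l−1} is divisible by u; when it is not, this sketch drops the trailing samples, which is also the default floor behavior of pooling layers in common frameworks.

```python
import numpy as np

def pool1d(x, u=2, mode="max"):
    """Non-overlapping 1D pooling with window and step u, Eqs. (10)-(11).

    x: (D, R) feature maps. Output: (D, R // u); trailing samples that do not
    fill a whole window are dropped (assumption for R not divisible by u).
    """
    d, r = x.shape
    r_out = r // u
    windows = x[:, :r_out * u].reshape(d, r_out, u)
    if mode == "max":
        return windows.max(axis=2)     # most prominent response in each region
    if mode == "avg":
        return windows.mean(axis=2)    # smoother summary of each region
    raise ValueError(f"unknown mode: {mode}")
```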

As shown in Fig. 3, the feature maps are flattened and concatenated in the first fully connected layer. This layer has M neurons, representing the M features extracted by the CNN, where M is calculated by:

$$ M = R_{l - 1} \times C_{l - 1} \times D_{l - 1} $$
(12)
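Chaining Eqs. (9), (11), and (12) gives the number of learned features M for a conv-pool-conv-pool stack. The sketch below uses hypothetical values r = 3 and D = 8 for both convolutional layers (Table 3 lists the values actually selected): the length goes 25 → 23 → 11 → 9 → 4, so M = 4 × 1 × 8 = 32.

```python
def feature_count(r_in=25, kernels=(3, 3), maps=(8, 8), u=2, stride=1):
    """Trace R through conv -> pool -> conv -> pool and return M from Eq. (12).

    kernels, maps: hypothetical (r, D) values for the two convolutional layers.
    """
    r = r_in
    for k in kernels:
        r = (r - k) // stride + 1   # convolution, Eq. (9) with C = c = 1
        r = r // u                  # max pooling, Eq. (11)
    c = 1                           # feature-map width is 1 in a 1D-CNN
    return r * c * maps[-1]         # M = R_{l-1} * C_{l-1} * D_{l-1}
```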

Neurons in a fully connected layer are connected to all neurons in the previous layer, as in a regular neural network. Hence, their outputs can be computed with a matrix multiplication followed by a bias offset:

$$ \varvec{O} = f\left( {\varvec{\omega}_{o} \varvec{f}_{v} + \varvec{b}_{o} } \right) $$
(13)

where \( \varvec{f}_{v} \) is the input vector of a fully connected layer, \( \varvec{b}_{o} \) is the bias vector, and \( \varvec{\omega}_{o} \) is the weight matrix.
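Eq. (13) is a single matrix-vector product in NumPy. In the sketch below the shapes are assumptions chosen to match the text: `f_v` holds the M flattened features, and `w_o` has one row per output neuron.

```python
import numpy as np

def fully_connected(f_v, w_o, b_o, f):
    """Eq. (13): O = f(w_o @ f_v + b_o).

    f_v: (M,) flattened feature vector; w_o: (N, M) weight matrix;
    b_o: (N,) bias vector; f: activation function.
    """
    return f(w_o @ f_v + b_o)
```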

The last layer of a CNN is the output layer, which contains N neurons, where N is the number of pattern types to be identified. The activation function of the output layer is usually the Sigmoid function, Eq. (7), or the Softmax function, Eq. (14). In the training phase, the BP algorithm is used to optimize the weights and biases of the CNN (\( \varvec{\omega}_{ij}^{l} \), \( \varvec{\omega}_{o} \), \( \varvec{b}_{j}^{l} \), and \( \varvec{b}_{o} \)) so as to minimize the value of the cost function.

$$ f\left( {x_{i} } \right) = e^{{x_{i} }} /\mathop \sum \limits_{j = 1}^{N} e^{{x_{j} }} $$
(14)
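Eq. (14) can be sketched as follows. Subtracting the maximum input before exponentiating is a standard numerical safeguard (not mentioned in the source); it leaves the result mathematically unchanged but prevents overflow for large activations.

```python
import numpy as np

def softmax(x):
    """Eq. (14): f(x_i) = exp(x_i) / sum_j exp(x_j) over the N output neurons.

    Subtracting max(x) first leaves the result unchanged but avoids overflow.
    """
    z = np.exp(x - np.max(x))
    return z / z.sum()
```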

In this paper, a 1D-CNN, which differs slightly from a typical CNN, is used to perform CCPR. The structure of the 1D-CNN is described in more detail in the following sections.

Proposed method

Manual feature extraction from raw CCP data can improve recognition accuracy, but it also increases the workload and complexity of quality control. Therefore, this paper proposes a 1D-CNN for CCPR that extracts CCP features through feature learning.

Compared with traditional machine learning methods, the advantage of a CNN is that it enables end-to-end recognition or diagnosis: the input of the network is raw data, and the output is the pattern type. Feature extraction, feature selection, and feature optimization are performed by the convolution and pooling layers. The weights and biases of the CNN are adjusted by the BP algorithm to minimize the cost function, so a feature set optimized for the classification task is obtained through learning. This saves considerable manual effort, since the complex feature-engineering work is done by the network. The structure of the proposed 1D-CNN differs slightly from a common CNN structure: its feature maps are vectors rather than matrices, which makes the 1D-CNN particularly sensitive to time sequences such as CCPs.

The structure of the proposed CCPR method is shown in Fig. 4. The proposed CNN consists of two convolutional layers, two pooling layers, and a fully connected layer; the convolutional and pooling layers perform feature extraction, and the fully connected layer performs classification. Since the proposed network is a 1D-CNN, the structural parameters C and c (the widths of the feature maps and convolutional kernels) are equal to 1. The raw CCP data have a dimension of 25, and there are six patterns, as mentioned earlier; thus, R1 = 25 and N = 6. The other structural parameters of the 1D-CNN, such as the number of feature maps (D) and the height of the convolutional kernel (r), are discussed and optimized in the next section. The subsampling layers use max pooling.
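To make the layer sequence concrete, here is a forward-pass-only NumPy sketch of a conv-pool-conv-pool-dense network with R1 = 25 and N = 6. It is not the authors' implementation: r = 3 and D = 8 are placeholders for the values in Table 3, the weights are random, and the ReLU/Softmax pairing follows scheme 2 of the activation-function experiment. Training by BP is omitted; in practice a deep learning framework with automatic differentiation would handle it.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def conv1d(x, w, b):
    """Valid 1D cross-correlation, stride 1: (D_in, R) -> (D_out, R - r + 1)."""
    d_in, d_out, _ = w.shape
    return np.stack([
        sum(np.correlate(x[i], w[i, j], mode="valid") for i in range(d_in)) + b[j]
        for j in range(d_out)
    ])

def maxpool1d(x, u=2):
    """Non-overlapping max pooling with step u (trailing samples dropped)."""
    r_out = x.shape[1] // u
    return x[:, :r_out * u].reshape(x.shape[0], r_out, u).max(axis=2)

def init_params(r=3, d1=8, d2=8, r_in=25, n_classes=6, seed=0):
    """Random weights; r, d1, d2 are hypothetical placeholders for Table 3."""
    rng = np.random.default_rng(seed)
    r_out = (((r_in - r + 1) // 2) - r + 1) // 2   # length after conv-pool-conv-pool
    m = r_out * d2                                  # Eq. (12) with C = 1
    return {
        "w1": rng.normal(0, 0.1, (1, d1, r)), "b1": np.zeros(d1),
        "w2": rng.normal(0, 0.1, (d1, d2, r)), "b2": np.zeros(d2),
        "wo": rng.normal(0, 0.1, (n_classes, m)), "bo": np.zeros(n_classes),
    }

def forward(y, p):
    """Raw CCP of length 25 in, probabilities for the six patterns out."""
    x = y.reshape(1, -1)                                  # a single input map
    x = maxpool1d(relu(conv1d(x, p["w1"], p["b1"])))      # conv + pool, layer 1
    x = maxpool1d(relu(conv1d(x, p["w2"], p["b2"])))      # conv + pool, layer 2
    f_v = x.reshape(-1)                                   # flatten to M features
    return softmax(p["wo"] @ f_v + p["bo"])               # Eqs. (13)-(14)
```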

Fig. 4

The structure of the proposed CCPR method

Since the proposed CCPR method uses deep learning, its main steps are simple:

  • Step 1 The training and test sets containing the raw CCP data are generated by the method introduced in “Methodology” Section.

  • Step 2 The training set is used to train the 1D-CNN and optimize the network weights and biases.

  • Step 3 The optimized 1D-CNN is validated by the test set.

Experiments

A series of simulations was conducted to verify the feasibility and effectiveness of the proposed method. The 1D-CNN was implemented in Python and run on a personal computer with a 2.20-GHz CPU and 2 GB of RAM. The correct recognition ratio (CRR), defined as the ratio of the number of correctly classified patterns to the total number of test patterns, was used as the evaluation criterion, and confusion matrices were used to present the evaluation results more intuitively. Finally, the validity of the 1D-CNN was evaluated by comparison with existing methods of the same type.
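The CRR and a percentage confusion matrix can be computed as in the sketch below; the function names and integer class labels 0 to 5 are assumptions for illustration. Each row of the matrix is a true CCP class and each column a predicted class, normalized so that a row sums to 100%. When every class has the same number of test samples, the mean of the diagonal equals 100 × CRR.

```python
import numpy as np

def confusion_matrix_pct(y_true, y_pred, n_classes=6):
    """Row-normalized confusion matrix in percent (row = true CCP, column = predicted).

    Assumes every class occurs at least once in y_true.
    """
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

def crr(y_true, y_pred):
    """Correct recognition ratio: correctly classified / total test patterns."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))
```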

CCP parameters

Six common CCPs were considered, namely NOR, UT, DT, US, DS, and CYC. The Monte-Carlo simulation algorithm introduced in “Methodology” Section was used to generate datasets for training and testing of the 1D-CNN. The parameters of the six CCPs are shown in Table 1.

Table 1 The parameters of six CCPs

In an actual manufacturing process, the behavior of a monitored object is very complicated. To simulate this situation more realistically, the CCP parameters were drawn randomly from uniform distributions over given ranges. The mean μ was set to 30, and the standard deviation σ was set to 0.05. The slope d varied in the range [0.1σ, 0.3σ], the shift magnitude s in [1.5σ, 3σ], and the amplitude a of cyclic patterns in [1.5σ, 4σ]. The cycle period ω was set to 4, 5, 6, 7, or 8. The starting position of each unnatural pattern was in the range [4, 9], and each CCP had a length of 25. Comparing the recognition rates for different sample sizes showed that increasing the number of samples beyond 2000 per pattern did not significantly improve the recognition rate, while the computational cost of the experiment increased substantially. Taking these factors into account, the generated dataset contained 2000 samples per CCP, i.e., 12,000 samples in total. The dataset was randomly divided into two parts: 9600 samples were used to train the 1D-CNN, and the remaining 2400 were used for testing.
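The dataset described above can be generated along the lines of the following sketch, which draws each parameter from the stated ranges and performs the random 9600/2400 split. The helper names, the class order in `PATTERNS`, continuous uniform draws for s, d, and a, an integer onset uniform on [4, 9] (the first 1-based time with v = 1), and the fixed seed are all assumptions.

```python
import numpy as np

PATTERNS = ["NOR", "US", "DS", "UT", "DT", "CYC"]   # label k = index in this list (assumed)

def sample_ccp(pattern, rng, n=25, mu=30.0, sigma=0.05):
    """One CCP per Eqs. (1)-(5) with parameters drawn from the ranges above."""
    t = np.arange(1, n + 1)
    v = (t >= rng.integers(4, 10)).astype(float)       # onset uniform on [4, 9] (assumed)
    if pattern == "NOR":
        dist = np.zeros(n)
    elif pattern in ("US", "DS"):
        s = rng.uniform(1.5 * sigma, 3.0 * sigma)      # shift magnitude
        dist = (1 if pattern == "US" else -1) * v * s
    elif pattern in ("UT", "DT"):
        d = rng.uniform(0.1 * sigma, 0.3 * sigma)      # trend slope
        dist = (1 if pattern == "UT" else -1) * v * d * t
    else:  # "CYC"
        a = rng.uniform(1.5 * sigma, 4.0 * sigma)      # cycle amplitude
        omega = rng.choice([4, 5, 6, 7, 8])            # cycle period
        dist = v * a * np.sin(2 * np.pi * t / omega)
    return mu + rng.normal(0.0, sigma, n) + dist

def make_dataset(per_class=2000, n_train=9600, seed=0):
    """Return X_train, y_train, X_test, y_test after a random split."""
    rng = np.random.default_rng(seed)
    X = np.array([sample_ccp(p, rng) for p in PATTERNS for _ in range(per_class)])
    y = np.repeat(np.arange(len(PATTERNS)), per_class)
    idx = rng.permutation(len(y))                      # random train/test split
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], y[tr], X[te], y[te]
```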

1D-CNN performance dependence on structural parameters

To determine an optimal 1D-CNN structure, the performance of the 1D-CNN was compared for different structural parameters. Two sets of experiments were carried out with the same dataset, and the Sigmoid function was used as the activation function of every 1D-CNN layer. The first experiment optimized the size of the convolutional kernel (r), and the second optimized the number of feature maps (D); the results of both are given in Table 2, and the corresponding mean square error (MSE) curves are shown in Fig. 5.

Table 2 The 1D-CNN performance at different structural parameters
Fig. 5

The MSE of the experiments. The numbers in the figure legend correspond to the numbers in Table 2

The network was trained with each combination of structural parameters, and the CRR, the training time per epoch, and the MSE were recorded. As can be seen in Table 2 and Fig. 5, the first combination of structural parameters achieved the highest CRR, a shorter training time, and the fastest convergence. It was therefore selected as the best-performing 1D-CNN structure and used in the subsequent experiments. The resulting network parameters are summarized in Table 3.

Table 3 Structural parameters of the 1D-CNN

Influence of activation function on recognition performance

To select the most suitable activation function, we performed a set of simulations whose results are shown in Table 4; the corresponding MSE is shown in Fig. 6. In scheme 1, the activation function of every layer in the 1D-CNN was the Sigmoid function; in scheme 2, the convolutional layers used the ReLU function and the output layer used the Softmax function. As Table 4 shows, the 1D-CNN with ReLU and Softmax activations performed better than that with the Sigmoid function: both the CRR and the convergence speed were significantly better.

Table 4 Comparison of 1D-CNNs with different activation functions
Fig. 6

The MSE for different activation functions

Performance comparison between MLP and 1D-CNN

To evaluate the performance of the proposed method further, a comparative experiment involving the 1D-CNN, a CNN, and an MLP was conducted. As already mentioned, the MLP is a machine learning method with excellent performance that has been widely used in CCPR with good results (Pham and Wani 1997; Al-Assaf 2004; Ranaee and Ebrahimzadeh 2013), which is why it was chosen for this comparison. The input of the 1D-CNN was raw data; the input of the CNN was control chart image data of 60 × 60 pixels \( (R_{1} = C_{1} = 60) \); and the input of the MLP was either raw data or a feature set. The feature set included statistical features (mean, standard deviation, skewness, and kurtosis) and shape features (S, NC1, NC2, APML, and APSL), which have been shown to be very effective in CCPR (Pham and Wani 1997; Hassan et al. 2003; Gauri and Chakraborty 2009; Pelegrina et al. 2016). A detailed description of these features can be found in Pham and Wani (1997) and Hassan et al. (2003). The MLP used in the experiment was a typical three-layer neural network with 25 input neurons for raw CCP data or 9 input neurons for the feature set, 15 neurons in the hidden layer, and 6 neurons in the output layer corresponding to the six typical CCPs. The activation function of all layers was the Sigmoid function. The structural parameters of the CNN were \( r_{2} = c_{2} = r_{4} = c_{4} = 5 \), \( u_{3} = 4 \), and \( u_{5} = 2 \); its other parameters were the same as those of the 1D-CNN. To evaluate classifier performance more intuitively, confusion matrices were used. The values on the diagonal of a confusion matrix are the percentages of correctly recognized patterns, and the other values are the percentages of misclassifications (Sokolova and Lapalme 2009). The CRR equals the average of the diagonal elements. The confusion matrices are shown in Fig. 7, where it can be clearly seen that the 1D-CNN outperformed the other methods, achieving higher recognition accuracy and a lower error rate.
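For comparison with the learned features, a sketch of the nine hand-crafted MLP input features is given below. The statistical features follow their standard definitions (population standard deviation and non-excess kurtosis are assumed). The five shape-feature formulas are approximate readings of the feature names only, not the exact definitions of Pham and Wani (1997) or Hassan et al. (2003), which may include normalizations omitted here.

```python
import numpy as np

def ccp_features(y):
    """Four statistical and five shape features for one CCP.

    Shape-feature formulas are approximations based on the feature names;
    see Pham and Wani (1997) and Hassan et al. (2003) for exact definitions.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(1, len(y) + 1)
    mean, std = y.mean(), y.std()
    z = (y - mean) / std
    skewness = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4)                 # non-excess kurtosis assumed
    slope, intercept = np.polyfit(t, y, 1)     # S: least-squares slope
    ls_line = slope * t + intercept

    def crossings(dev):
        # number of strict sign changes in a deviation sequence
        sgn = np.sign(dev)
        return int(np.sum(sgn[:-1] * sgn[1:] < 0))

    nc1 = crossings(y - mean)                  # NC1: crossings of the mean line
    nc2 = crossings(y - ls_line)               # NC2: crossings of the LS line
    apml = np.sum(np.abs(y - mean))            # APML: area to mean line (approx.)
    apsl = np.sum(np.abs(y - ls_line))         # APSL: area to LS line (approx.)
    return np.array([mean, std, skewness, kurtosis, slope, nc1, nc2, apml, apsl])
```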

Fig. 7

The CCPR confusion matrices for (a) the MLP with raw data, (b) the MLP with the feature set, (c) the 1D-CNN with raw data, and (d) the CNN with image data

In addition, the performance of the three methods was compared for different numbers of samples per pattern, namely 200, 2000, and 20,000. The recognition rate and the time per epoch were recorded; the results in Table 5 show that the proposed method was superior to the traditional MLP method and to the CNN with image input in terms of both recognition accuracy and time consumption.

Table 5 Comparison results for different numbers of samples

Comparison of 1D-CNN and other methods

To further evaluate the effectiveness of the 1D-CNN, the proposed method was compared with methods reported in the related literature. As shown in Table 6, many classifiers, including the MLP, PNN, fuzzy ARTMAP, SVM, and RBF network, have been used for CCPR, with either raw data or a feature set as input. The results in Table 6 show that using features as input was generally more effective than using raw data; in particular, performance was most prominent when image and statistical features were combined. When the input of the 1D-CNN is a raw signal, the final output of its convolution and pooling layers can be understood as an optimal feature set extracted by the network through learning. Box-plots of some of these features are shown in Fig. 8; the total number of features M was 24, but owing to limited space only six are displayed. Similar box-plots of shape and statistical features were given in Zhou et al. (2018) and Ranaee and Ebrahimzadeh (2013). In Fig. 8, feature values of the same CCP type are close to each other and separated from those of other types in the feature space, which indicates that the features the 1D-CNN learned from the raw data are highly discriminative.

Table 6 Performance comparison of different methods
Fig. 8

Box-plots of some features for different CCP types

The results showed that the recognition rate of the proposed method was 98.33%, which is among the higher results but not the highest. This may be due to the following reasons. First, the datasets used to train and test the classifiers differed between studies. Second, in some studies the parameter settings used to generate simulation data lacked randomness, for example, by selecting only a few fixed values rather than sampling uniformly over a range, or the parameter range was too narrow, which could deviate from actual CCPs. To make the simulation data closer to real patterns, the parameters in Table 1 were adopted, drawn from uniform distributions over wide ranges. This inevitably made CCPR more difficult and lowered the recognition rate: because of the large randomness of the parameters and the random noise x(t) [see Eq. (1)], the CCP dataset generated by the simulation algorithm contained some dirty data. Fig. 9a shows a pattern generated by Eqs. (1) and (4); its category label in the dataset is UT, but the pattern looks more like a US pattern, and the 1D-CNN recognized it as a US pattern. Manual identification of this CCP gave the same result as the 1D-CNN; in other words, the 1D-CNN's classification was effectively correct. Similarly, the pattern in Fig. 9b is labeled US but should actually be a UT pattern. Therefore, some of the misclassifications in Fig. 7c were caused by dirty data in the dataset rather than by genuine errors. As shown in the last row of Table 6, after the dirty data were removed from the dataset, the CRR of the 1D-CNN exceeded 99.30%.

Fig. 9

Examples of dirty data in the dataset: (a) a control chart with a UT label, and (b) a control chart with a US label

In addition, most studies in the literature used more sampling points per control chart sample, about 60, which ensures a larger difference between CCP types; moreover, trend patterns become more evident over time, which helps achieve a high recognition rate. (With about 60 sampling points, the recognition rate of the proposed method exceeded 99.73%.) In this work, 25 sampling points per control chart sample were used so that abnormalities in the manufacturing process can be detected in a more timely manner, reducing losses to manufacturing enterprises. However, with fewer points the patterns can look very similar, especially trend and shift patterns, which makes recognition more difficult.

Application in production environment

To further illustrate the proposed methodology, a control chart diagnostic system combining Nelson rules and the 1D-CNN was developed and applied to monitor a real dataset from a production environment. The part diameter, measured with a coordinate measuring machine, is regarded as the key quality characteristic.

As shown in Fig. 10, analysis of the control charts with Nelson rules found only a few violations of the second rule (nine consecutive points on the same side of the center line), and the process was essentially stable and in control. In parallel, the 1D-CNN was applied to a moving window over the control charts; as with the number of sampling points in the simulation experiments, the window size was set to 25. In contrast to the Nelson rules, the 1D-CNN identified all five unnatural CCPs, as shown in Fig. 11.
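
The two monitoring steps above can be sketched in Python as follows. `nelson_rule2` flags the points at which nine consecutive points have fallen on the same side of the center line, and `moving_windows` produces the 25-point segments that would be passed to the trained 1D-CNN. The center line value is assumed to be estimated from in-control data; the one-point window step and the convention that a point exactly on the center line breaks the run are our assumptions, and the classifier call itself is not shown.

```python
import numpy as np

def nelson_rule2(y, center, run=9):
    """Return indices of points at which at least `run` consecutive points
    lie on the same side of the center line (Nelson rule 2).
    A point exactly on the center line resets the run (assumed convention)."""
    side = np.sign(np.asarray(y, dtype=float) - center)
    alarms, count, prev = [], 0, 0.0
    for i, s in enumerate(side):
        if s != 0 and s == prev:
            count += 1                     # run continues on the same side
        else:
            count = 1 if s != 0 else 0     # new run starts (or none on the line)
        prev = s
        if count >= run:
            alarms.append(i)
    return alarms

def moving_windows(y, size=25, step=1):
    """Yield (start_index, window) pairs for a moving-window CCPR scan."""
    y = np.asarray(y, dtype=float)
    for start in range(0, len(y) - size + 1, step):
        yield start, y[start:start + size]
```

In a combined system, each window would be classified by the 1D-CNN while `nelson_rule2` runs over the full chart, and an alarm from either source would be reported.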

Fig. 10 CCPR result for real data based on Nelson rules

Fig. 11 Unnatural CCPs detected from real data by the 1D-CNN

Discussions with the company's engineers indicated that tool condition and equipment changes may cause these unnatural CCPs in the machining process. The 1D-CNN accurately identified unnatural CCPs that Nelson rules could not recognize, which means that the 1D-CNN trained on the simulated dataset still performed well on real data from the production environment. In addition, the results of Nelson rules and the 1D-CNN are complementary, and using both makes the recognition system more complete.

Discussion

The network structural parameters used in the comparative experiment are listed in Table 2. After many training epochs, the final recognition rate varied only a little, and the training time per epoch differed slightly. Varying the number of feature maps showed that when it was too small, the recognition rate dropped significantly, indicating that the 1D-CNN did not extract enough features. Conversely, once the number of feature maps D exceeded a certain value, increasing it further did not noticeably change the recognition rate, indicating that the extracted features were sufficient to describe the raw data; the training time, however, increased significantly, reducing efficiency. Under different structural parameters, the MSE converged at different rates, but the final converged values were close. As shown in Fig. 5, test No. 1 converged by the 20th epoch, whereas tests No. 9 and No. 12 converged by the 40th epoch; networks with a smaller kernel in the first convolutional layer converged faster. In summary, the structural parameters of the 1D-CNN had little effect on the final recognition rate but a significant impact on the training time. Therefore, to achieve a high recognition rate, a sufficient number of feature maps should be used.
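
To show what the number of feature maps D and the kernel size control, the following NumPy sketch implements the forward pass of one convolution, ReLU, and max-pooling stage on a raw control chart sample, together with the layer's parameter count. The kernel size of 5 and pooling size of 2 in the usage lines are hypothetical and not taken from Table 2, and training (backpropagation) is omitted.

```python
import numpy as np

def conv1d_relu_pool(x, weights, bias, pool=2):
    """One convolution + ReLU + max-pooling stage (forward pass only).

    x:       (C_in, L) input, e.g. (1, 25) for one raw control chart sample
    weights: (D, C_in, K) kernels, D = number of feature maps
    bias:    (D,)
    Returns an array of shape (D, (L - K + 1) // pool).
    """
    d, c_in, k = weights.shape
    length = x.shape[1] - k + 1                      # 'valid' cross-correlation
    out = np.empty((d, length))
    for j in range(length):
        patch = x[:, j:j + k]                        # (C_in, K) slice of the input
        out[:, j] = np.tensordot(weights, patch, axes=([1, 2], [0, 1])) + bias
    out = np.maximum(out, 0.0)                       # ReLU
    trimmed = out[:, : (length // pool) * pool]      # drop the tail that does not fill a pool
    return trimmed.reshape(d, -1, pool).max(axis=2)  # non-overlapping max pooling

def conv_params(d, c_in, k):
    """Trainable parameters of one conv layer: D kernels of size C_in*K plus D biases."""
    return d * (c_in * k + 1)

rng = np.random.default_rng(0)
sample = rng.standard_normal((1, 25))               # one 25-point control chart sample
for d in (4, 16, 64):
    w = 0.1 * rng.standard_normal((d, 1, 5))         # D hypothetical kernels of size 5
    feats = conv1d_relu_pool(sample, w, np.zeros(d))
    print(d, feats.shape, conv_params(d, 1, 5))      # shape (D, 10), D * 6 parameters
```

The parameter count D(C_in K + 1) of the first stage grows linearly with D. A following convolutional layer that takes these D maps as input and also has D maps has D(DK + 1) parameters, so its cost grows roughly quadratically with D, which is consistent with the longer training times for large numbers of feature maps.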

According to Table 4 and Fig. 6, the 1D-CNN with ReLU and Softmax functions clearly performed better than with the Sigmoid function: both the CRR and the convergence speed improved significantly. Thus, the ReLU activation function is more suitable for the 1D-CNN because it effectively mitigates the vanishing gradient problem in deep learning.
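
The gradient argument can be illustrated numerically. In the sketch below, the derivative of the sigmoid never exceeds 0.25, so a gradient passing back through many sigmoid layers is multiplied by a factor of at most 0.25 per layer (ignoring the weights), whereas the ReLU derivative is exactly 1 for active units. After 10 layers the sigmoid's best-case factor is 0.25^10 ≈ 9.5 × 10^-7. This is a standard textbook illustration, not a reproduction of the experiment in Table 4; `softmax` is included only to show the output-layer function that turns the final scores into class probabilities.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)             # at most 0.25, reached at z = 0

def relu_grad(z):
    return (z > 0).astype(float)     # 1 for active units, 0 otherwise

def softmax(z):
    e = np.exp(z - np.max(z))        # subtract the max for numerical stability
    return e / e.sum()

# Best-case factor applied to a gradient after passing back through
# `depth` activation functions (weights ignored).
for depth in (2, 5, 10):
    print(depth, sigmoid_grad(0.0) ** depth, relu_grad(np.array([1.0]))[0] ** depth)
```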

Fig. 7 shows that the two recognition algorithms produced different numbers of misclassifications, and the 1D-CNN performed significantly better than the MLP. With the MLP and raw data, the recognition results tended to confuse NOR patterns with CYC patterns, and there was also severe confusion between the UT and US patterns and between the DT and DS patterns. About 10.8% of NOR patterns were misclassified as CYC, and about 18.8% of CYC patterns as NOR. This is a very poor result: it means that both the Type I errors (false alarms) and the Type II errors (missed disturbances) of this CCPR system were severe. When the feature set was used as the MLP input, the confusion between NOR and CYC patterns was effectively reduced, which demonstrates the effectiveness of the feature-based method, but the CRR was still unsatisfactory. In contrast, the 1D-CNN recognized the NOR and CYC patterns accurately and achieved a CRR of 98.33%, and the other CCP types also had lower error rates. More importantly, no unnatural pattern was confused with a NOR pattern or vice versa; in other words, this CCPR system produced neither Type I nor Type II errors. Table 5 shows that, compared with the traditional MLP, the performance of the 1D-CNN decreased significantly when the number of samples was small, which is typical of deep learning. Even so, the recognition rate of the 1D-CNN remained higher than that of the MLP. In addition, the iteration time of the proposed method was much shorter than that of the MLP regardless of the number of samples, so the 1D-CNN retained good training efficiency even when the number of samples was large.
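
The CRR and the two error types discussed above can be computed directly from a confusion matrix. In the sketch below, `cm[i, j]` counts samples of true class i predicted as class j, and the NOR class is assumed to be at index `normal`. The usage lines use a made-up three-class matrix, not the results in Fig. 7.

```python
import numpy as np

def ccpr_metrics(cm, normal=0):
    """Return (CRR, Type I rate, Type II rate) from a confusion matrix.

    cm[i, j] = number of samples of true class i predicted as class j;
    row/column `normal` is the NOR class (assumed ordering)."""
    cm = np.asarray(cm, dtype=float)
    crr = np.trace(cm) / cm.sum()                        # correct recognition rate
    # Type I error (false alarm): a NOR sample classified as any unnatural pattern
    type1 = (cm[normal].sum() - cm[normal, normal]) / cm[normal].sum()
    # Type II error (missed disturbance): an unnatural sample classified as NOR
    unnatural = [i for i in range(cm.shape[0]) if i != normal]
    type2 = cm[unnatural, normal].sum() / cm[unnatural].sum()
    return crr, type1, type2

cm = [[90, 10, 0],    # true NOR  (illustrative counts only)
      [5, 95, 0],     # true CYC
      [0, 0, 100]]    # true UT
crr, t1, t2 = ccpr_metrics(cm)
print(f"CRR={crr:.3f}, Type I={t1:.3f}, Type II={t2:.3f}")  # CRR=0.950, Type I=0.100, Type II=0.025
```

Note that confusions among unnatural patterns (for example UT versus US) lower the CRR but count as neither Type I nor Type II errors under these definitions.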

In the CCPR process, extracting a large number of features improves the recognition accuracy of the CCPs, but it also increases the quality control workload, and the extracted features are becoming increasingly complicated. Although such features are designed by experts specifically for CCPR, there is no guarantee that they are optimal. Moreover, because shape features are tailored to CCPR, they transfer poorly to other fields. Feature learning avoids these problems. The raw CCP data are successively transformed by the convolution and pooling layers into a feature set, and the network weights and biases are optimized by the learning algorithm to minimize the cost function. Although the features extracted by the neural network lack the apparent physical meaning of shape and statistical features, they are optimized directly for the recognition task. The results presented in Fig. 8 support this: the method based on the 1D-CNN and raw data achieved a higher recognition rate than the method based on the MLP and the traditional feature set, owing to the high-quality feature set learned by the 1D-CNN. In addition, feeding the raw data directly to the network input not only reduces the workload but also gives the approach strong generality in other fields.
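
For contrast with raw-data input, the sketch below computes a few hand-crafted statistical and shape features of the kind used as MLP inputs in feature-based CCPR: mean, standard deviation, skewness, excess kurtosis, least-squares slope, and mean-crossing rate. This selection is illustrative; the feature set actually used for the MLP baseline in this study is defined elsewhere and may differ.

```python
import numpy as np

def handcrafted_features(y):
    """Illustrative statistical and shape features of one control chart sample
    (not the feature set used in this study)."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    mean, std = y.mean(), y.std()
    z = (y - mean) / std if std > 0 else np.zeros_like(y)
    skew = np.mean(z ** 3)                       # skewness
    kurt = np.mean(z ** 4) - 3.0                 # excess kurtosis
    slope = np.polyfit(t, y, 1)[0]               # least-squares trend slope
    # Mean-crossing rate: frequent crossings suggest noise or cycles
    crossings = np.count_nonzero(np.diff(np.sign(y - mean)) != 0) / (len(y) - 1)
    return np.array([mean, std, skew, kurt, slope, crossings])

rng = np.random.default_rng(1)
sample = rng.standard_normal(25) + 0.1 * np.arange(25)   # noisy upward trend
print(handcrafted_features(sample))
```

Each such feature must be designed, computed, and validated by hand, whereas the 1D-CNN takes the 25 raw values of `sample` directly as its input.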

Conclusion

In this paper, a 1D-CNN is proposed for feature learning and end-to-end CCPR. Based on the obtained results, the following conclusions can be drawn. (1) Although the network has many structural parameters, selecting them was not complicated: when the number of feature maps was sufficient, good recognition accuracy was obtained regardless of the convolutional kernel size. (2) The ReLU activation function was more suitable for deep learning and significantly improved the recognition accuracy and convergence speed of the 1D-CNN. (3) The feature set learned by the proposed 1D-CNN was of higher quality than the manually extracted one. With raw CCP data as input, the proposed 1D-CNN recognized the patterns with an accuracy of 98.33%. The recognition accuracy, convergence speed, and iteration time of the proposed model were significantly better than those of the traditional MLP model.

In summary, the proposed 1D-CNN avoids the need to extract many kinds of complex features, which makes it better suited to practical quality control and helps raise the level of automation and intelligence in enterprise quality management. Future work will study an online control chart prediction and diagnosis system based on a variety of deep learning algorithms.