1 Introduction

With the emergence of Industry 4.0 and “Internet + manufacturing,” higher requirements are being placed on cutting systems. As an essential element of machining processes such as turning and milling, the tool inevitably wears during actual machining [1]. The tool wear state strongly influences machining quality, machining efficiency, and cutting stability. If severe tool wear is not detected in time, it may interrupt the cutting process and even cause machine tool failure, which seriously reduces machining efficiency and increases machining cost [2, 3]. Relevant statistics show that an accurate and reliable online monitoring system can increase the cutting speed by 10 to 50% and reduce the machining cost by 10 to 40% [4]. Therefore, accurate online identification of the tool wear state during cutting is an effective way, and an inevitable trend, to improve machining quality and efficiency and to ensure the efficient and stable operation of the manufacturing system.

The tool wear status changes irregularly during machining, so an accurate, reliable, efficient, and stable monitoring system is needed to identify it. As a key technology in the machining field, tool wear monitoring is mainly divided into direct and indirect methods [5]. Because direct monitoring is constrained by the machining conditions and is therefore inefficient, indirect monitoring of tool wear is widely used at present.

In recent years, to better realize online monitoring of tool wear status, many scholars have used the time, frequency, and time–frequency domain features of monitoring signals such as cutting force, vibration, and sound to establish identification models such as neural networks, hidden Markov models, and support vector machines (SVM), and have conducted extensive research on tool wear status monitoring.

Xu et al. [6] established a wear classification model based on a BP neural network using the frequency domain features of acoustic emission and vibration signals, and the results showed that the model can effectively monitor the tool wear status. Pandiyan et al. [7] developed a classification prediction model combining a genetic algorithm (GA) and SVM, using the time and frequency domain characteristics of force, vibration, and acoustic emission signals in grinding, and found that the model has high prediction accuracy. Liao et al. [8] proposed a tool wear state recognition method based on an SVM optimized by the gray wolf algorithm, using the time, frequency, and time–frequency domain characteristics of the cutting force signal, and found that the classification accuracy is high. Li and Liu [9] presented a tool wear state prediction method using an improved hidden Markov model based on the time and time–frequency domain features of cutting force signals. Kong et al. [10] established a wear state recognition model combining SVM and the whale optimization algorithm (WOA) by using the three domain features of the force signal.

SVM has the advantages of simple structure, strong generalization ability, and fast running speed, and it is widely used in linear and nonlinear classification and recognition [11]. Identifying different tool wear states is in essence a nonlinear classification and recognition problem. Therefore, this study uses a nonlinear SVM classifier for the subsequent wear classification and identification. However, the classification and recognition performance of SVM depends mainly on two hyperparameters, the penalty parameter c and the kernel parameter σ [12]. In traditional methods, these two parameters were mainly selected based on subjective experience, which makes it difficult to obtain optimal model parameters. To overcome this deficiency, swarm intelligence optimization algorithms (gray wolf optimization (GWO), artificial bee colony (ABC), genetic algorithm (GA), etc.) can be used to optimize the SVM parameters to a certain extent. However, some commonly used algorithms may become trapped in local optima, overfit, or run inefficiently during optimization [13]. Thus, the WOA [14, 15], which has strong global and local search capabilities, is used in this study to optimize the hyperparameters of the SVM classification model.

Because milling is an intermittent cutting process, the multiple signals generated during machining are unstable and fluctuate greatly. Time and frequency domain features are sometimes affected by cutting conditions and unstable signals, so they cannot always provide accurate and comprehensive information [16]. By contrast, wavelet analysis in the time–frequency domain has obvious advantages in depicting nonstationary signals, and singularity analysis based on wavelet analysis has good application prospects for wear state identification [17, 18]. Singularity analysis was first applied to boundary detection in images; owing to its stability in characterizing nonstationary signal changes, it was then gradually applied to state monitoring [19, 20].

Mohanraj et al. [21] proposed a wear condition monitoring method that combines the Hölder exponent (HE) index of end milling vibration signals with machine learning, and found that the recognition accuracy was high. Zhou et al. [22,23,24] used singularity analysis of single force, vibration, and sound signals, respectively, to monitor tool wear. The results showed that the method was effective and could improve manufacturing sustainability. Zhu et al. [25] estimated the tool state based on the probability density function of the singularity HE of the micro-milling cutting force signal, and found that the proposed method is strongly robust. Tien et al. [26] established an online wear monitoring model combined with wavelet signal single point analysis of the vibration signal HE index, and experiments verified the accuracy of the model.

These studies have laid a foundation for tool wear state monitoring based on signal singularity analysis, but they mainly use the singularity features of a single sensor signal to monitor the wear state, which cannot be guaranteed to provide accurate and comprehensive feature information. Meanwhile, research on wear condition monitoring based on the fusion of singularity features from different signals is relatively rare.

In actual machining, combining information from several cutting signals reflects the tool wear state more comprehensively. Therefore, this study proposes a method based on WOA-SVM with statistical feature fusion of multi-signal singularity to monitor the tool wear state. Based on the statistical characteristics of the HE index of multi-sensor signal singularity, the random forest algorithm was used to screen the multi-signal features that were relatively important to the change of tool wear. The kernel principal component analysis (KPCA) feature fusion algorithm was used to reduce the dimension of the features, and the fused statistical features were used as the input of the WOA-SVM wear classification model to identify the tool wear status. The effectiveness and feasibility of the proposed method were verified by wear experiments. The proposed framework for the online recognition of tool wear status is depicted in Fig. 1.

Fig. 1 Framework for online recognition method of tool wear status

2 Method proposed

Tool wear status changes as machining progresses. Relevant studies have shown that changes in the tool wear state are closely related to changes in the signal waveforms. Cutting force and vibration signals are considered the signals most sensitive to tool wear changes [27, 28]. They characterize the different tool wear states effectively and are therefore well suited to online recognition and monitoring of tool wear. The relationship between the cutting force and vibration signal waveforms and the wear status is shown in Fig. 2. It can be seen that the signal waveform exhibits singularity or disorder during machining, which can be quantitatively estimated by the Lipschitz index (i.e., HE index) [23, 25].

Fig. 2 Relationship between the signal waveform and tool wear status

2.1 Estimation of HE index of multi-signal singularity

The HE index is a useful index for evaluating signal singularity, and the method of calculating HE using the wavelet transform modulus maxima is gradually used in machine fault diagnosis and condition monitoring [26, 29]. The theoretical calculation process is as follows:

A function f(t) is said to have HE α (α ≥ 0) at a point v ∈ R if there exist a constant A > 0 and a polynomial \(p_v\) of degree m = ⌊α⌋ such that:

$$\left|f(t)-{p}_{v}(t)\right|\le A{\left|t-v\right|}^{\alpha } \forall t\in R$$
(1)

where the upper bound is determined by the index α. Parameter α provides the HE of the function f(t) at t = v. If the HE index \(\alpha_0\) of f(t) satisfies \(n < \alpha_0 < n + 1\), then f(t) is n times differentiable at v, but its nth derivative \(f^{(n)}(t)\) is singular at v, and \(\alpha_0\) characterizes this singularity.

According to relevant research, the HE index can be obtained by estimating the maximum wavelet transformation module (MWTM) and the decay on the time scale plane. By setting the partial derivative of wavelet transform of signal f(t) at u to zero, the local extreme value along the scale is obtained [22, 26].

$$\frac{\partial WTf(u,s)}{\partial u}=0$$
(2)

Along the modulus maximum line, the wavelet coefficients obey the following scaling behavior near the point v [26]:

$$\left|WTf(u,s)\right|\le A{s}^{(\alpha +\frac{1}{2})}$$
(3)

where α represents the HE. By taking the discrete scale \(s = 2^{j}\) along the modulus maximum line, the wavelet coefficients can be expressed as Eq. 4.

$${\mathrm{log}}_{2}\left|W{T}_{{2}^{j}}f(u,s)\right|\le {\mathrm{log}}_{2}A+j(\alpha +\frac{1}{2})$$
(4)

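A and α can be estimated from Eq. 4: plotting the base-2 logarithm of the MWTM against the wavelet scale index j gives a line whose slope is α + 1/2 and whose intercept is \(\log_2 A\). Eq. 4 thus links the wavelet scale j and the HE through the relationship between the MWTM and j.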
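
As a minimal sketch of how Eq. 4 can be used in practice, the Python snippet below treats the bound as an equality along one maxima line and fits a straight line to the base-2 logarithm of the modulus maxima against the scale index j. The function name `estimate_he` and the synthetic maxima line (A = 2, α = 0.3) are illustrative assumptions, not the implementation or data of this study.

```python
import math

def estimate_he(log2_maxima):
    """Least-squares fit of log2|W| = log2(A) + j * (alpha + 1/2).

    log2_maxima[k] is log2 of the modulus maximum at scale index j = k + 1.
    At least two scales are required. Returns (alpha, A).
    """
    js = list(range(1, len(log2_maxima) + 1))
    n = len(js)
    mean_j = sum(js) / n
    mean_y = sum(log2_maxima) / n
    sxx = sum((j - mean_j) ** 2 for j in js)
    sxy = sum((j - mean_j) * (y - mean_y) for j, y in zip(js, log2_maxima))
    slope = sxy / sxx                      # estimates alpha + 1/2
    intercept = mean_y - slope * mean_j    # estimates log2(A)
    return slope - 0.5, 2.0 ** intercept

# Synthetic maxima line that meets the bound with equality (assumed values)
true_alpha, true_a = 0.3, 2.0
line = [math.log2(true_a) + j * (true_alpha + 0.5) for j in range(1, 6)]
alpha, a = estimate_he(line)
```

On real signals the maxima only satisfy the inequality, so the fitted slope should be read as an estimate of the local HE rather than an exact value.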

2.2 WOA

The whale optimization algorithm (WOA) is an emerging swarm intelligence optimization algorithm proposed based on the hunting behavior of humpback whales, mainly including three parts: surrounding prey, bubble attack, and random search for prey [15, 30]. In the stage of surrounding the prey, the prey position is the target position, and the individual whale will move to the target position. The mathematical expression is shown in Eqs. (5)–(8).

$$\overrightarrow{D}=\left|\overrightarrow{C}\cdot \overrightarrow{{X}^{*}}\left(t\right)-\overrightarrow{X}\left(t\right)\right|$$
(5)
$$\overrightarrow{X}\left(t+1\right)=\overrightarrow{{X}^{*}}\left(t\right)-\overrightarrow{A}\cdot \overrightarrow{D}$$
(6)
$$\overrightarrow{A}=2\overrightarrow{a}\cdot \overrightarrow{r}-\overrightarrow{a}$$
(7)
$$\overrightarrow{C}=2\cdot \overrightarrow{r}$$
(8)

where t is the current iteration number, and \(\overrightarrow{A}\) and \(\overrightarrow{C}\) are coefficient vectors. \(\overrightarrow{{X}^{*}}\) represents the position vector of the prey, i.e., the best solution found so far, \(\overrightarrow{X}\) is the position vector of a search agent, \(\overrightarrow{a}\) is a coefficient vector whose components decrease linearly from 2 to 0 over the iterations, and \(\overrightarrow{r}\) is a random vector with components in [0, 1].

Bubble attack behavior establishes a spiral equation between the position of whale and prey to simulate the predator–prey mechanism of whale bubble net. The expression is shown in Eqs. (9) and (10).

$$\overrightarrow{X}\left(t+1\right)=\overrightarrow{{D}^{^{\prime}}}\cdot {e}^{bl}\cdot \mathrm{cos}\left(2\pi l\right)+\overrightarrow{{X}^{*}}\left(t\right)$$
(9)
$$\overrightarrow{{D}^{^{\prime}}}=\left|\overrightarrow{{X}^{*}}\left(t\right)-\overrightarrow{X}\left(t\right)\right|$$
(10)

where b is the logarithmic spiral shape constant, \(\overrightarrow{{D}^{^{\prime}}}\) represents the distance between the current whale and prey, and l is a random number between [− 1, 1].

To decide which of the two mechanisms (shrinking encirclement or spiral movement) a whale uses to approach the prey during the bubble attack, a random number p is drawn from [0, 1] and compared with 0.5, so that each mechanism is selected with a probability of 50%. The mathematical expression is shown in Eq. (11).

$$\overrightarrow{X}\left(t+1\right)=\left\{\begin{array}{ll}\overrightarrow{{X}^{*}}\left(t\right)-\overrightarrow{A}\cdot \overrightarrow{D}& p<0.5\\ \overrightarrow{{D}^{^{\prime}}}\cdot {e}^{bl}\cdot \mathrm{cos}\left(2\pi l\right)+\overrightarrow{{X}^{*}}\left(t\right)& p\ge 0.5\end{array}\right.$$
(11)

The WOA enters the stage of random search for prey when the coefficient vector satisfies |A| > 1. In this stage, whales perform a random global search based on each other’s locations rather than on the prey location. The mathematical modeling of this behavior is shown in Eqs. (12) and (13).

$$\overrightarrow{D}=\left|\overrightarrow{C}\cdot \overrightarrow{{X}_{rand}\left(t\right)}-\overrightarrow{X\left(t\right)}\right|$$
(12)
$$\overrightarrow{X}\left(t+1\right)=\overrightarrow{{X}_{rand}\left(t\right)}-\overrightarrow{A}\cdot \overrightarrow{D}$$
(13)

where \(\overrightarrow{{X}_{rand}}\) is the position vector of a randomly selected whale, and \(\overrightarrow{D}\) represents the distance between that randomly selected whale and the current whale.
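
The three stages in Eqs. (5)–(13) can be combined into one loop, as in the following minimal Python sketch. It is written as a minimizer on a toy sphere function; the function names, the box clipping, the scalar A and C per whale, the greedy update of the best whale, and all default parameter values are assumptions made for illustration and are not the configuration used in this study.

```python
import math
import random

def woa_minimize(f, bounds, n_agents=20, n_iter=100, b=1.0, seed=0):
    """Minimize f over box bounds with a basic whale optimization loop."""
    rng = random.Random(seed)
    agents = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    best = list(min(agents, key=f))
    best_val = f(best)
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter              # a shrinks linearly from 2 toward 0
        for i in range(n_agents):
            x = agents[i]
            A = 2.0 * a * rng.random() - a      # Eq. (7), one scalar per whale
            C = 2.0 * rng.random()              # Eq. (8)
            p = rng.random()
            ell = rng.uniform(-1.0, 1.0)        # the random number l in Eq. (9)
            if p < 0.5:
                # |A| < 1: encircle the best whale, Eqs. (5)-(6);
                # otherwise explore around a random whale, Eqs. (12)-(13)
                ref = best if abs(A) < 1.0 else agents[rng.randrange(n_agents)]
                new = [r - A * abs(C * r - xd) for r, xd in zip(ref, x)]
            else:
                # Spiral (bubble attack) update, Eqs. (9)-(10)
                spiral = math.exp(b * ell) * math.cos(2.0 * math.pi * ell)
                new = [abs(bd - xd) * spiral + bd for bd, xd in zip(best, x)]
            # Keep the whale inside the search box
            new = [min(max(v, lo), hi) for v, (lo, hi) in zip(new, bounds)]
            agents[i] = new
            val = f(new)
            if val < best_val:                  # greedy update of the best whale
                best, best_val = list(new), val
    return best, best_val

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)

bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best, best_val = woa_minimize(sphere, bounds)
```

To maximize a fitness such as classification accuracy, the same loop can be applied to the negated fitness.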

2.3 SVM classification algorithm

The basic idea of SVM classification is to map the low-dimensional data in the original space nonlinearly into a high-dimensional feature space and to find an optimal hyperplane there that separates the data samples. Suppose that the classification sample set is T = {(x1, y1), (x2, y2), …, (xN, yN)}, where \(x_i \in R^{a}\) is a real vector and \(y_i \in \{-1, 1\}\) is a category label, i = 1, 2, …, N.

The original problem to be solved by support vector machine is shown in Eq. (14).

$$\left\{\begin{array}{c}\underset{\omega ,b,{\xi }_{i}}{\mathrm{min}}\left[\frac{1}{2}{\Vert \omega \Vert }^{2}+c\sum_{i=1}^{N}{\xi }_{i}\right]\\ {\text{s}}.{\text{t}}.\left\{\begin{array}{c}{y}_{i}\left(\omega \cdot \phi ({x}_{i})+b\right)\ge 1-{\xi }_{i}\\ {\xi }_{i}\ge 0; \, c\ge 0; \, i=1,2,\cdots ,N\end{array}\right.\end{array}\right.$$
(14)

where ω and b are the weight vector and bias of the hyperplane, c is the penalty parameter, \({\xi }_{i}\) is a slack variable, and \(\phi ({x}_{i})\) is the nonlinear mapping.

The Lagrange function is introduced into Eq. (14), and converted into dual form, as shown in Eq. (15) [10].

$$\left\{\begin{array}{c}\underset{{\alpha }_{i}}{\mathrm{max}}\left[\sum_{i=1}^{N}{\alpha }_{i}-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}{\alpha }_{i}{\alpha }_{j}{y}_{i}{y}_{j}K({x}_{i},{x}_{j})\right]\\ {\text{s}}.{\text{t}}.\sum_{i=1}^{N}{\alpha }_{i}{y}_{i}=0 \, (0\le {\alpha }_{i}\le c;i=1,2,\cdots ,N)\end{array}\right.$$
(15)

where K(xi, xj) represents the kernel function. \({\alpha }_{i}\) is a Lagrange multiplier.

Assuming that \({{\alpha }^{*}}_{i}\) is the optimal solution of the dual problem, the optimal classification function of SVM under nonlinear conditions can be obtained, as shown in Eq. (16).

$$f(x)=\mathrm{sign}\left[\sum_{i=1}^{N}{{\alpha }^{*}}_{i}{y}_{i}K({x}_{i},x)+{b}^{*}\right]$$
(16)

2.4 WOA-SVM classification model

When WOA is used to optimize the SVM, the model parameters (c, σ) together constitute a search particle. The parameters of the WOA-SVM classification and recognition model were trained and optimized on the training dataset using K-fold cross validation. The average recognition accuracy over the K validation folds was taken as the fitness of each search particle. To reflect the advantages of the WOA-SVM classification model in parameter optimization, the maximum number of iterations was used as the termination criterion in this study. When the iterations end, the search particle (c*, σ*) with the maximum fitness value is taken as optimal, and the WOA-SVM classification and recognition model is established with it on the training data. Since the radial basis function has clear advantages in simplicity and generalization, this study adopted it as the SVM kernel function, as shown in Eq. (17) [12, 31].

$$K\left({x}_{i},{x}_{j}\right)=\mathrm{exp}\left(-\frac{1}{2}{\left(\frac{\Vert {x}_{i}-{x}_{j}\Vert }{\sigma }\right)}^{2}\right)$$
(17)
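
For illustration, the kernel of Eq. (17) and a decision function of the form in Eq. (16) can be evaluated as in the Python sketch below. The support vectors, labels, multipliers, and bias are hand-picked hypothetical values rather than the result of training, and the function names are assumptions.

```python
import math

def rbf_kernel(x_i, x_j, sigma):
    """Radial basis function kernel of Eq. (17)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(x_i, x_j))
    return math.exp(-0.5 * sq_dist / sigma ** 2)

def svm_decision(x, support_vectors, labels, alphas, b, sigma):
    """Sign of the kernel expansion in Eq. (16) for one sample x."""
    s = sum(a * y * rbf_kernel(sv, x, sigma)
            for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if s + b >= 0 else -1

# Hypothetical two-vector model: one support vector per class
svs = [[0.0, 0.0], [2.0, 0.0]]
ys = [-1, 1]
alphas = [1.0, 1.0]
near_positive = svm_decision([1.8, 0.0], svs, ys, alphas, b=0.0, sigma=1.0)
near_negative = svm_decision([0.1, 0.0], svs, ys, alphas, b=0.0, sigma=1.0)
```

Libraries that parameterize the RBF kernel as exp(−γ‖x_i − x_j‖²), such as scikit-learn and LIBSVM, correspond to γ = 1/(2σ²) in the notation of Eq. (17).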

To demonstrate the accuracy and effectiveness of the WOA-SVM classification and recognition model based on signal singularity feature fusion, both the complete statistical features of the HE index of the multi-signals and the fusion features obtained by different dimension reduction fusion techniques were used as inputs of the WOA-SVM classification model to identify different tool wear states. The model was also compared with SVM recognition methods optimized by other commonly used algorithms (gray wolf optimization (GWO), artificial bee colony (ABC), genetic algorithm (GA), etc.). Because parameter optimization by swarm intelligence algorithms is random, the classification and recognition programs of the SVM optimized by each algorithm were run 20 times so that the proposed wear recognition method could be evaluated reliably and quantitatively. The average classification accuracy and the training time of the model were used as evaluation indicators.
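
The fitness evaluation described in this section (mean K-fold validation accuracy for one search particle) can be sketched in Python as follows. The `fit_predict` callable is a hypothetical placeholder for training an SVM with the parameters (c, σ) and predicting the held-out fold; a 1-nearest-neighbor rule that ignores the parameters stands in for it here so that the example runs without external libraries. Contiguous, unshuffled folds are a further simplifying assumption.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_fitness(params, X, y, k, fit_predict):
    """Mean validation accuracy over k folds for one search particle."""
    accuracies = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        preds = fit_predict(params,
                            [X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(preds, fold))
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k

def nearest_neighbor(params, X_train, y_train, X_val):
    """Hypothetical stand-in for SVM training; it ignores params."""
    preds = []
    for x in X_val:
        dists = [sum((p - q) ** 2 for p, q in zip(x, xt)) for xt in X_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds

# Toy one-dimensional data with two well-separated classes
X = [[0.0], [1.0], [10.0], [11.0], [0.5], [10.5]]
y = [1, 1, 2, 2, 1, 2]
fitness = cv_fitness((1.0, 1.0), X, y, k=3, fit_predict=nearest_neighbor)
```

In an optimizer loop, this value would be computed once per particle and per iteration, which is why the number of folds and the population size dominate the training time.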

3 Milling experiment

3.1 Experimental condition

To verify the accuracy and effectiveness of the proposed method, wear experimental data of ball end carbide milling cutters obtained on a high-speed CNC machine tool under dry milling conditions were used [32, 33]. The experimental acquisition platform is shown in Fig. 3. The cutting parameters are shown in Table 1. A Kistler dynamometer was installed between the table and the workpiece to measure the cutting force signal. Meanwhile, Kistler piezoelectric accelerometers were mounted on the workpiece to acquire the vibration signal. Cutting force and vibration signals in different directions were collected by an NI DAQ data acquisition card at a continuous sampling frequency of 50 kHz.

Fig. 3 Experimental acquisition platform

Table 1 Cutting parameters in machining

During the experiment, the flank wear was measured with a LEICA MZ12 microscope. Each experiment comprised 315 cuts, and tool wear was measured offline after each cut. Since the C1, C4, and C6 milling cutters had complete wear data for the entire cutting process, the C1, C4, and C6 datasets were selected to validate the method proposed in this study.

4 Results and discussion

4.1 Tool wear clustering

The average flank wear corresponding to different cutting times of the C1, C4, and C6 milling cutters was obtained in the experiment, as shown in Fig. 4a. The overall trend of the wear curve is basically the same for each milling cutter and roughly goes through three stages: initial wear, normal wear, and severe wear. The traditional method of delimiting the tool wear stages based on experience is strongly subjective. To overcome this shortcoming, the K-means clustering algorithm [34] was adopted in this study. With tool wear as the observed value and the category as the hidden variable, unsupervised cluster analysis was carried out for the different tool wear stages. The number of clusters was 3. Because the initial center points of the clustering algorithm are selected randomly, the clustering results may differ slightly from run to run. To reduce accidental errors, clustering was repeated on the wear samples, and the average of the multiple clustering results was used as the final clustering result. The wear clustering results of the different milling cutters are shown in Fig. 4b–d.
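
A one-dimensional K-means with three clusters, as applied here to the flank wear values, can be sketched in Python as follows. For reproducibility the sketch uses a deterministic initialization from evenly spaced order statistics instead of the random initialization with repeated runs described above, and the wear values are synthetic; both are assumptions made for illustration.

```python
def kmeans_1d(values, k=3, n_iter=100):
    """Cluster scalar values into k groups; returns (labels, centers)."""
    ordered = sorted(values)
    # Deterministic start: evenly spaced order statistics (assumption)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest center for every value
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center becomes the mean of its members
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Synthetic wear values (arbitrary units), ordered by cutting time
wear = [40, 45, 50, 80, 85, 90, 95, 140, 150, 160]
labels, centers = kmeans_1d(wear)
```

Because the centers start in ascending order, labels 0, 1, and 2 here correspond to the initial, normal, and severe wear stages, respectively.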

Fig. 4 Clustering results of different wear stages of milling cutters

4.2 Multi-signal denoising and HE index calculation

Owing to interference from the machining environment and machine tool vibration, the original multi-signals collected in the experiment are inevitably contaminated by noise. When the signal singularity is characterized quantitatively, the HE index generated by noise makes the calculated signal HE index inaccurate. Therefore, noise reduction preprocessing is an important foundation for obtaining a relatively accurate quantitative characterization of the multi-signal waveforms. Some commonly used signal denoising methods (wavelet default threshold filtering, bandpass filtering, etc.) can remove noise effectively, but they may also remove important information from the original signal. Therefore, this paper adopted a noise reduction algorithm based on MWTM to denoise the signal. Points whose MWTM value decreases significantly with increasing scale are treated as noise-dominated and set to 0, and the effective signal is reconstructed using Mallat’s method [26, 29].

The type of wavelet basis function has an important influence on effective signal denoising, so the wavelet basis needs to be selected according to the characteristics of the multi-signals. A reasonable selection of the wavelet basis helps to retain the useful information in the original signal. The selection of the wavelet basis is closely related to the vanishing moment. At present, functions with continuous differentiability, vanishing moments, and symmetry or antisymmetry are mainly selected as wavelet bases [24, 29].

Combined with the research results of relevant literatures and repeated experiments on signal wavelet basis selection, the Gaussian function was selected as the signal wavelet basis in this study. Since its N-order derivative function has the characteristics of N-order vanishing moment, it can evaluate the influence of different quantities of vanishing moments on the noise reduction of signal MWTM effectively. Generally, the number of MWTM increases with the number of vanishing moments at a given scale. In consideration of computational efficiency, it is particularly important to select wavelet bases with appropriate vanishing moments for signal denoising [35].

On the basis of experiments, the vibration and cutting force signals of the 220th cutting cycle of the milling cutter were denoised based on the MWTM noise reduction algorithm. At the same time, to verify the effect of denoising, the frequency spectrum of the denoised samples was analyzed, as shown in Figs. 5 and 6, respectively.

Fig. 5 Different noise reduction methods for Vx signal

Fig. 6 Different noise reduction methods for Fx signal

By comparing Fig. 5a–d, it can be found that the signal waveform denoised by the wavelet default threshold has changed significantly compared with the original vibration signal. Compared with the denoising method based on MWTM, more useful information of the original signal is lost. The MWTM denoising algorithm based on the second-derivative Gaussian wavelet, which has two vanishing moments, gives a good noise reduction effect and a smooth signal. Comparing Fig. 5d and e, it can be observed that the smoothness of the signal denoised with a wavelet basis of three vanishing moments is basically the same as that obtained with two vanishing moments. Therefore, the second derivative of the Gaussian function is suitable as the wavelet basis for MWTM denoising of the vibration signal.

From Fig. 6c–e, it can be found that when the cutting force signal is denoised with wavelet bases of 1, 2, and 3 vanishing moments, the resulting signal smoothness is almost the same. Therefore, the first derivative of the Gaussian function was used as the wavelet basis for MWTM denoising of the cutting force signal. According to the spectrum analysis results in Figs. 5 and 6, the energy of the vibration and cutting force signals is mainly concentrated near the tool tooth passing frequency (TPF) of 520 Hz and its integer multiples [26]. The TPF is given by Eq. (18).

$$TPF=\frac{Nn}{60}$$
(18)

where N represents the spindle speed, N = 10,400 r/min, and n is the tooth number of the tool, n = 3.
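
As a quick check of Eq. (18) with the stated values, the following Python lines compute the tooth passing frequency and its first harmonics. The function name is an assumption; the spindle speed and tooth number are taken from the text.

```python
def tooth_passing_frequency(spindle_speed_rpm, n_teeth):
    """Eq. (18): TPF in Hz from spindle speed (r/min) and number of teeth."""
    return spindle_speed_rpm * n_teeth / 60.0

tpf = tooth_passing_frequency(10400, 3)        # 520.0 Hz
harmonics = [k * tpf for k in range(1, 4)]     # first three integer multiples
```

These are the frequencies around which the spectra in Figs. 5 and 6 are reported to concentrate, and they lie far below the 25 kHz Nyquist limit implied by the 50 kHz sampling frequency.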

Since both HE index estimation and signal denoising require the MWTM, the MWTM noise reduction algorithm is integrated with the HE index calculation process [23, 26]. As the number of MWTM produced by noise decreases with increasing scale, a relatively large scale is usually used in the noise reduction algorithm to highlight useful signals, and the scale value is usually 4–5 [22, 24]. A threshold T is set on the MWTM at the largest scale to exclude the MWTM generated by noise: MWTM smaller than T are eliminated. The threshold T is given by Eq. (19).

$$T=M{\mathrm{log}}_{2}\left(\frac{1+2\sqrt{{P}_{N}}}{Z+j}\right)$$
(19)

where \(P_N\) is the size of the noise, M is the maximum value of the MWTM, Z is a constant set to 2 [25], and j is the largest index of the dyadic scales \(s = 2^{j}\) (j = 0, 1, 2, …).

4.3 Statistical feature extraction

Based on the wear experiment and clustering results, the vibration signal Vx and cutting force Fx over five rotation cycles in different wear stages (the 50th, 150th, and 300th cuts) were selected to calculate the HE index, and the probability density function of the HE index was then computed. The results are depicted in Figs. 7 and 8, respectively.

Fig. 7 HE index estimation of vibration and cutting force signals

Fig. 8 Probability density of the HE index of the vibration and cutting force signals

From Figs. 7 and 8, it can be found that the probability density distribution of the HE index of Vx and Fx at different wear stages basically follows a unimodal or bimodal normal distribution. The probability density distribution also changes gradually with the tool wear state, and the cutting force and vibration signals in the other directions behave in the same way. Because the probability density functions of different tool wear states overlap considerably, the mean and standard deviation (Std) of the Gaussian distribution alone are not enough to characterize the probability distribution information of different tool states fully and effectively. Since the probability density distributions of different tool wear states differ in shape and range, we also extracted the maximum (Max), minimum (Min), skewness (Ske), kurtosis (Kur), variance (Var), range (Rng), and standard error (SE) of the HE to represent the tool state comprehensively. For the same number of samples, the number of signal singular points changes with the tool wear state. Thus, the HE number (HENo.) can also characterize the change of tool wear. In this article, 10 features were extracted from each milling force and vibration sensor signal, giving a total of 60 statistical features over the six signal channels.
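
The ten statistical features listed above can be computed from the HE values of one signal segment as in the Python sketch below. The exact estimators are not specified in the text, so the choices here (sample variance with n − 1, moment-based skewness and non-excess kurtosis, SE as Std divided by the square root of n) and the sample HE values are assumptions.

```python
import math
import statistics

def he_features(he_values):
    """Ten statistical features of the HE values of one signal segment."""
    n = len(he_values)
    mean = statistics.fmean(he_values)
    var = statistics.variance(he_values)                  # sample variance (n - 1)
    std = math.sqrt(var)
    # Central moments (population form) for the shape descriptors
    m2 = sum((v - mean) ** 2 for v in he_values) / n
    m3 = sum((v - mean) ** 3 for v in he_values) / n
    m4 = sum((v - mean) ** 4 for v in he_values) / n
    return {
        "Mean": mean,
        "Std": std,
        "Max": max(he_values),
        "Min": min(he_values),
        "Ske": m3 / m2 ** 1.5,
        "Kur": m4 / m2 ** 2,            # non-excess kurtosis (3 for a Gaussian)
        "Var": var,
        "Rng": max(he_values) - min(he_values),
        "SE": std / math.sqrt(n),
        "HENo": n,                      # number of singular points found
    }

# Hypothetical HE values for one segment
features = he_features([0.2, 0.4, 0.6, 0.8, 1.0])
```

Applying the function to each of the six force and vibration channels yields the 60-dimensional feature vector described in the text.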

Taking the C1 dataset as an example, the Min–Max normalization method was adopted to normalize the different features of the multi-signals, both to avoid the influence of magnitude differences between features and to lay a foundation for the feature fusion below. The results are shown in Fig. 9. It can be found that, as machining progresses, the features of the cutting signals follow clearly different trends in the different stages of tool wear. Taking the mean feature of the Vx signal in Fig. 9a as an example, the mean shows an overall increasing trend as tool wear becomes severe, while some other features show different trends.
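
Min–Max normalization maps each feature to [0, 1] through (x − min)/(max − min). A minimal Python sketch follows; returning zeros for a constant feature is an assumption made here to avoid division by zero.

```python
def min_max_normalize(values):
    """Scale a feature sequence linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to rescale (assumed convention)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

normalized = min_max_normalize([2.0, 4.0, 6.0])
```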

Fig. 9 Normalization results of different features of multi-signals in C1

4.4 Feature screening and fusion

Some of the statistical features of signal singularity are irrelevant to the tool wear state or redundant. If the wear identification model is established directly without feature screening and fusion, problems such as overfitting and the curse of dimensionality can easily arise and degrade the wear classification. Feature selection and dimension reduction fusion remove information that is useless for tool wear and retain the more relevant information. This reduces the complexity of building the model and characterizes the tool wear state effectively [36]. Based on these considerations, the random forest algorithm [37] was used to rank the importance of the signal features in order to screen out the features that are relatively important to the change of tool wear state. The number of trees was 300, and fivefold cross validation was used. Taking the Vx and Fx signal features as examples, the importance ranking of each feature is shown in Fig. 10.

Fig. 10 Importance rank of signal features

To improve the accuracy of tool wear status recognition while limiting the amount of calculation, the three highest-ranked statistical features of each signal’s HE were selected for dimension reduction fusion. In this study, KPCA was used for feature dimension reduction fusion, and the fused features were used as the input of the classification model to identify the tool wear status. It was also compared with some commonly used dimension reduction fusion algorithms (LLE and ISOMAP). Singular value decomposition was performed on the screened features, and the number of principal component features was estimated from the cumulative contribution rate of the eigenvalues of the covariance matrix, as shown in Fig. 11. It can be found that the first seven principal component features retain more than 85% of the contribution rate. A cumulative contribution rate above 80% is usually considered sufficient. Therefore, the first seven principal component features were selected to characterize tool wear in this study. The LLE and ISOMAP algorithms were likewise used to reduce the dimension of the selected features, and the fused features were used for tool wear recognition.
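
The selection rule based on the cumulative contribution rate can be sketched in Python as follows: the eigenvalues are sorted in descending order and the smallest number of components whose cumulative share reaches the threshold is returned. The eigenvalues in the example are made up for illustration and are not those behind Fig. 11.

```python
def n_components_for(eigenvalues, threshold=0.85):
    """Smallest number of leading components reaching the cumulative share."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)

# Hypothetical eigenvalues: cumulative shares are 0.4, 0.7, 0.9, 1.0
n_keep = n_components_for([4.0, 3.0, 2.0, 1.0], threshold=0.85)
```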

Fig. 11 Dimension reduction by feature decomposition

4.5 Recognition results and analysis

According to the tool wear clustering results in Sect. 4.1, the labels “1,” “2,” and “3” were used as outputs for the initial, normal, and severe wear stages of the tool. The wear datasets of any two of the C1, C4, and C6 milling cutters were selected as the training set, and tenfold cross validation was carried out. Then, 20 groups of initial wear, 60 groups of normal wear, and 60 groups of severe wear data samples were randomly selected from the corresponding wear stages of the third milling cutter’s dataset as the testing set. The combinations of training and testing sets formed from the different milling cutter datasets are described in Table 2.
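
The data split described above (train on two cutters, test on a stratified random sample from the third) can be sketched in Python as follows. The per-stage sample counts come from the text; the stage labels of the 315 cuts in the example are hypothetical, and no assumption is made about which combination corresponds to D1, D2, or D3 in Table 2.

```python
import random

CUTTERS = ["C1", "C4", "C6"]
TEST_COUNTS = {1: 20, 2: 60, 3: 60}   # wear label -> number of test samples

def cutter_splits(cutters):
    """All (training cutters, held-out test cutter) combinations."""
    return [([c for c in cutters if c != held_out], held_out) for held_out in cutters]

def sample_test_indices(labels, counts, seed=0):
    """Randomly pick the requested number of samples from each wear stage."""
    rng = random.Random(seed)
    chosen = []
    for label, count in counts.items():
        pool = [i for i, lab in enumerate(labels) if lab == label]
        chosen.extend(rng.sample(pool, count))
    return sorted(chosen)

splits = cutter_splits(CUTTERS)
# Hypothetical stage labels for the 315 cuts of the held-out cutter
stage_labels = [1] * 30 + [2] * 185 + [3] * 100
test_idx = sample_test_indices(stage_labels, TEST_COUNTS)
```

Holding out a whole cutter, rather than mixing samples from all cutters, tests whether the model generalizes to a tool it has not seen during training.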

Table 2 Data combination description of training set and testing set

The initial ranges of the SVM parameters c and σ were set as \(c \in [10^{-5}, 10^{5}]\) and \(\sigma \in [10^{-5}, 10^{5}]\). The parameters of the swarm intelligence optimization algorithms are listed in Table 3.

Table 3 Relevant parameter information of swarm intelligence optimization algorithm

All programs in this section were run on an AMD R9-5900HX 3.3-GHz CPU with 16.0 GB of RAM. Taking the D1 data combination as an example, the WOA-SVM classification and recognition results for the different tool wear states based on the multi-signal complete feature set (CFS) and on the fusion features are shown in Fig. 12. After repeated runs, the classification accuracy and modeling run time for the wear states of each data combination were obtained for the different methods, as shown in Fig. 13.

Fig. 12 Classification and identification results of wear state in D1 data combination

Fig. 13 Classification accuracy and modeling run time of wear states of each data combination based on different methods

From Fig. 12, it can be found that among all the methods for classifying and identifying the wear state of the D1 data combination, the KPCA + WOA-SVM method based on the signal singularity features has the fewest misclassified samples and the best classification effect. In contrast, the CFS + WOA-SVM method has the largest number of misclassified samples and the poorest classification effect.
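The two quantities compared here, the overall classification accuracy and the number of misclassified samples per wear stage, can be computed as in the following sketch. The function name `evaluate` and the predicted labels are assumptions for illustration; the predictions are invented and are not results from this study.

```python
from collections import Counter

STAGE_NAMES = {1: "initial", 2: "normal", 3: "severe"}


def evaluate(y_true, y_pred):
    """Return overall accuracy and the misclassified count per true wear stage."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Count misclassified samples, grouped by their true label.
    missed = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    accuracy = 1.0 - sum(missed.values()) / len(y_true)
    per_stage = {STAGE_NAMES[k]: missed.get(k, 0) for k in STAGE_NAMES}
    return accuracy, per_stage


# Invented predictions for a 20/60/60 testing set (not results from the paper).
y_true = [1] * 20 + [2] * 60 + [3] * 60
y_pred = [1] * 18 + [2] * 2 + [2] * 57 + [3] * 3 + [3] * 60
accuracy, per_stage = evaluate(y_true, y_pred)
```

In this invented case, 5 of the 140 testing samples are misclassified (2 initial and 3 normal), giving an accuracy of 135/140.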

According to Fig. 13, both the classification accuracy and the modeling run time of the WOA-SVM wear classification method using fusion features are better than those of the CFS + WOA-SVM method in all data combinations (D1, D2, and D3). The higher classification accuracy and shorter modeling run time reflect the advantages and importance of feature fusion in wear state classification and recognition. The main reason is that the complete feature set contains irrelevant and redundant features, which increase the difficulty and complexity of modeling and lead to a long modeling time. Such features may also cause overfitting during WOA-SVM modeling, which reduces the classification performance of the model. Feature screening and dimension reduction fusion remove some useless features and reduce the feature dimension. To a certain extent, this reduces the difficulty of modeling and helps avoid overfitting during model training, which shortens the modeling run time and improves the accuracy of classification and recognition.

Comparing the classification accuracy of the three fusion feature methods for each data combination in Fig. 13a, it can be found that KPCA + WOA-SVM achieves higher classification accuracy than the ISOMAP + WOA-SVM and LLE + WOA-SVM methods. The KPCA dimension reduction fusion features used in this study therefore perform better than those of ISOMAP and LLE, which demonstrates the effectiveness and feasibility of the proposed KPCA + WOA-SVM method. It can also be found that, for the same method, the classification accuracy differs among D1, D2, and D3. The main reason may be that the data quality of the training set and testing set differs among the data combinations. In addition, because the milling cutter datasets contain different numbers of samples in the same wear stage, the distributions of the training set and testing set differ.

By classifying and identifying the wear states of D1, D2, and D3 data combinations, the performance of WOA-SVM classification model based on KPCA fusion features was compared with that of SVM classification model optimized by other common optimization algorithms. The classification accuracy and modeling run time of different classification models are shown in Table 4. It can be found that compared with other forms of SVM classification models, the WOA-SVM classification model has the highest classification accuracy in D1, D2, and D3.

Table 4 Performance evaluation of SVM classification methods optimized by different swarm intelligence algorithms

It is also found that the modeling run time of the WOA-SVM classification model is shorter than that of the SVM classification models optimized by other swarm intelligence optimization algorithms, but longer than that of the plain SVM classification model. The main reason is that the plain SVM model does not optimize its parameters during classification and recognition; its modeling run time is therefore the shortest, but its classification performance is also the worst. The results show that the WOA algorithm has strong parameter optimization ability and high optimization efficiency, yielding a WOA-SVM model with good classification performance. The model has strong generalization ability and can fully mine the relevant information between the fusion features and tool wear.

Considering both the accuracy of wear classification and the modeling run time, the KPCA + WOA-SVM method based on signal singularity proposed in this article shows clear advantages and feasibility in tool wear classification and recognition. The research results are of great significance for accurate identification of the tool wear state, timely replacement of tools, improvement of machining quality, and safe and stable operation of the manufacturing system.

5 Conclusions

Accurate, reliable, efficient, and stable tool wear condition monitoring is essential for evaluating tool status, improving machining quality and efficiency, and ensuring the stability of the manufacturing system. Based on the correlation between the singularity statistical characteristics of cutting force and vibration signals and tool wear, this paper proposed a new tool wear state monitoring method based on WOA-SVM with statistical feature fusion of multi-signal singularity. The effectiveness and feasibility of the proposed method were verified by milling experiments. The main conclusions are as follows:

  1. A tool wear condition monitoring method based on WOA-SVM with multi-signal singularity feature fusion was proposed, which can fully mine the useful information between multi-signal singularity feature fusion and tool wear. Based on the limited experimental data, the tool wear status can be identified efficiently and accurately.

  2. The K-means unsupervised clustering algorithm was used to replace the traditional experience-based calibration of the tool wear stages, which is highly subjective. Random forest and KPCA algorithms were used for multi-signal singularity feature screening and dimension reduction fusion. To some extent, this helps avoid overfitting during model training, and both the model performance and the classification accuracy were improved.

  3. The classification performance of different methods was evaluated based on the tool wear experimental datasets of the high-speed CNC machine tool. In the same data combination, the KPCA + WOA-SVM method had the best classification accuracy compared with the WOA-SVM method based on the complete feature set and on the other dimension reduction fusion features. The results showed that the KPCA dimension reduction fusion features had the best performance. It was also noticed that the same method had different classification accuracy in different data combinations, which may be caused by differences in data quality and distribution between the training set and testing set.

  4. Compared with SVM classification models optimized by other commonly used optimization algorithms, the WOA-SVM classification model had the highest classification accuracy in all data combinations. Its modeling run time was shorter than that of the SVM models optimized by the other algorithms and longer only than that of the SVM model without parameter optimization. The results showed that the WOA algorithm had strong parameter optimization ability and high optimization efficiency, yielding a WOA-SVM model with strong classification performance and generalization ability.

The tool wear condition monitoring method proposed in this paper provides an effective way to support timely tool replacement and early condition warning in actual machining, and it has potential application value. However, the method was only studied under fixed cutting conditions in milling. In future work, the method will be applied to different machining methods and variable cutting conditions to demonstrate its practicality. In addition, the swarm intelligence optimization algorithm and the SVM structure will be improved to make the method more effective in subsequent implementations.