1 Introduction

The wear of micro-grinding tools is a crucial factor affecting the processing efficiency, quality, and dimensional accuracy of micro-structure workpieces [1]. Tool wear prediction can effectively evaluate the remaining tool life and thus guide tool replacement strategies to reduce cost and control quality [2, 3]. According to previous reports, the state signals generated during the machining process, such as cutting force, vibration, and acoustic emission (AE) signals, or the static geometric and morphological characteristics of the tool, such as tool diameter and cutting edge radius, can be used to predict tool wear through theoretical physical models or empirical models built on deterministic factor relationships [4]. However, it remains challenging to predict micro-grinding tool wear online by integrating processing state signals and static tool characteristics whose relationships to wear are indeterminate. These processing state signals and static tool characteristics constitute multi-source heterogeneous data, and collecting signal data during machining is costly [5, 6]. Therefore, with the development of artificial intelligence technology, it is all the more significant to predict tool wear online using small-sample, multi-source heterogeneous data.

The characterization of tool wear is essentially a problem of pattern recognition: acquiring feature signals during the machining process, reducing their dimensionality, extracting feature information that has a mapping relationship with the tool wear state, and forming feature vectors [7]. Characterization approaches can be broadly divided into direct and indirect ways. The direct way is to measure and observe the geometric shape, size, and morphological characteristics of the tool using machine vision or optical sensors. Banda et al. [8] illustrated the application and significance of machine vision systems in tool wear monitoring and tool performance optimization. Zhang et al. [9] established an in-situ monitoring system based on machine vision to detect tool wear behavior. The indirect way is to detect status information generated in the machining process, such as force, acceleration, and AE signals [10]. Zhou et al. [11] demonstrated the basic principles and key technologies of AE-based monitoring in sawing. Wang et al. [12] proposed an effective AE signal processing method to monitor the damage mechanisms of monocrystalline silicon in ultra-precision grinding. Zhou et al. [13] presented a monitoring method that predicts tool wear from force and acceleration signals, which proved effective with a maximum error of around 8 μm.

The proposed tool wear prediction methods can be divided into two categories: physics-model-driven methods and data-driven methods [14, 15]. The physics-model-driven method builds an explicit mathematical model for tool wear prediction from physical laws [16]. Li et al. [17] presented a new physics-informed meta-learning framework to predict tool wear under varying wear rates and validated its effectiveness through a milling experiment. Qiang et al. [18] proposed a physics-informed transfer learning (PITL) framework to predict tool wear under variable working conditions, and the experimental results showed high prediction accuracy under those conditions. The data-driven method collects tool wear status information through a variety of sensors and then uses machine learning or deep learning to build a prediction model from the processed data [19]. Tool wear prediction is an important research field in intelligent manufacturing, and the data-driven method is recognized as an effective means of achieving accurate prediction, which provides a significant research direction for online tool wear monitoring [20, 21]. Wei et al. [22] established a genetic-algorithm-optimized back propagation (GA-BP) neural network model to predict tool wear during high-speed milling, with a relative error within 5%. Chen et al. [23] proposed a tool wear prediction method based on a back propagation neural network and compared the results with a long short-term memory (LSTM) model; the errors were reduced by 29% and 25%, respectively, as the number of input features increased.

The bidirectional long short-term memory neural network (BiLSTM) is one such data-driven tool wear prediction method. As a derivative of the recurrent neural network (RNN), it inherits the RNN's ability to represent the characteristics of time series data while avoiding the vanishing gradient problem to which RNNs are prone [24]. Tool wear accumulates with processing time, so mining the deep temporal characteristics of the data is necessary for wear prediction. Wu et al. [25] proposed a tool wear prediction model based on singular value decomposition and BiLSTM using cutting force signals. Ma et al. [26] established wear prediction models based on a convolutional BiLSTM and a convolutional bidirectional gated recurrent unit using cutting force signals.

It is challenging to integrate multi-source heterogeneous data to characterize and predict micro-grinding tool wear, owing to difficulties such as multi-source heterogeneous data fusion, mixed prediction model construction, and prediction from small, costly samples. To address this challenge, a novel prediction method based on BiLSTM is proposed to characterize and predict the wear by combining AE signals and visual images. Micro-channel grinding experiments are conducted with an electroplated diamond micro-grinding tool. The prediction results of the BiLSTM model are compared with the experimental measurements to verify its feasibility. Two derived models, GA-BiLSTM and LSTM, are compared to verify the accuracy of the BiLSTM model, and two machine learning models, BP and GA-BP, are compared to verify its stability and superiority. The abbreviations and notations used in the paper are listed in Table 1.

Table 1 List of used abbreviations and notations

2 Tool wear prediction model

2.1 Principle of signal characterization

The state of cutting tools in the actual machining process is influenced by various factors. Machine vision technology, which offers high accuracy and convenience, is now widely used to observe the morphology and features of micro-grinding tools [27]. The morphology of the micro-grinding tool is shown in Fig. 1a. According to the wear forms of micro-grinding tools and the corresponding changes in morphology, the wear can be directly characterized by the loss of the cross-sectional area of the micro-grinding head, that is, the difference between the cross-sectional area of the grinding head before and after grinding, as shown in Fig. 1b. The calculation is given by Eq. (1).

$${\varphi}_i={C}_0-{C}_i$$
(1)

where C0 is the cross-sectional area enclosed by the edge contour before micro-grinding, Ci is the cross-sectional area enclosed by the edge contour after machining the i-th micro-groove, and φi is the resulting cross-sectional area loss of the grinding head. All areas are expressed in μm2.
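As an illustration of Eq. (1), the following minimal Python sketch computes the area loss from two binary masks segmented from the camera images; the function names and the simple pixel-counting approach are illustrative assumptions, not the exact image-processing pipeline used in this work.

```python
import numpy as np

def section_area_um2(mask: np.ndarray, um_per_px: float) -> float:
    """Cross-sectional area enclosed by the tool edge contour.

    mask      : 2-D boolean array, True where a pixel belongs to the
                grinding-head cross section (segmented from the camera image).
    um_per_px : lateral calibration of the vision system, micrometres per pixel.
    """
    return mask.sum() * um_per_px ** 2

def area_loss_um2(mask_before: np.ndarray, mask_after: np.ndarray,
                  um_per_px: float) -> float:
    """Eq. (1): phi_i = C_0 - C_i, the area lost after machining i micro-grooves."""
    c0 = section_area_um2(mask_before, um_per_px)
    ci = section_area_um2(mask_after, um_per_px)
    return c0 - ci
```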

Fig. 1
figure 1

Wear characterization method for micro-grinding tool. a The morphology of micro-grinding tool. b The measuring method for the loss of the cross-sectional area

Another way to characterize the wear of a micro-grinding tool is through AE signals. The AE signals during grinding originate from the interaction between abrasive particles, the workpiece material, and its crystal structure during material removal; the rapid release of energy generates transient elastic waves that manifest as AE signals [28]. Owing to the differences among AE sources, time- and frequency-domain analysis of the signal can distinguish different material removal behaviors. The process of AE detection is shown in Fig. 2. Firstly, the AE sensor is attached to the unmachined surface to convert the AE waves generated during processing into a voltage signal. Secondly, the voltage signal is amplified, filtered, and A/D converted through a preamplifier. Finally, the AE signal is quantitatively analyzed by the data acquisition system to infer the actual processing status from the relevant feature parameters. It has been found that a mapping relationship exists between the characteristics of AE signals, the external processing environment, and the inherent properties of the workpiece itself [29]. However, the collected AE signals are influenced by the external environment and the inherent characteristics of the machine tool, and spectral analysis shows that the signal energy fluctuates [30]. Therefore, it is more reasonable to characterize the wear of the micro-grinding tool using the energy ratio instead of the absolute energy value.

Fig. 2
figure 2

The process of AE signal detection [30]

The experimental group with a micro-groove depth of 200 μm is taken as an example. The time-domain feature of the signal is extracted by the root mean square method, and the frequency bands are divided by 4-layer wavelet packet decomposition and fast Fourier transform. The signal in each sampling interval is reconstructed through these processing steps, and the energy proportion of each frequency band is calculated and plotted, as shown in Fig. 3a. The signal energy in the sampling interval mainly comes from the low and medium frequency bands, and the two show a negative correlation. As the grinding length accumulates and the total amount of material removed increases, the proportion of medium-frequency band energy continuously increases and then stabilizes, while the low-frequency band energy continuously decreases and stabilizes. The proportion of medium-frequency band energy is compared with the corresponding radial wear of the micro-grinding tool in Fig. 3b; the two are positively correlated. Therefore, the proportion of medium-frequency band energy is the most suitable AE-based characterization of micro-grinding tool wear.

Fig. 3
figure 3

Signal energy analysis in the sampling interval. a The energy proportion diagram for each frequency band. b The proportion of medium frequency band energy with the corresponding radial wear of the micro-grinding tool
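The band-energy calculation described above can be sketched with PyWavelets as follows; the wavelet basis ("db4") and the indices taken to represent the medium-frequency bands are assumptions for illustration, not the exact settings used in the experiment.

```python
import numpy as np
import pywt  # PyWavelets

def band_energy_ratios(signal: np.ndarray, wavelet: str = "db4", level: int = 4):
    """Energy proportion of each frequency band after a 4-layer wavelet
    packet decomposition (2**level bands, ordered low -> high frequency)."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    nodes = wp.get_level(level, order="freq")        # frequency-ordered leaves
    energies = np.array([np.sum(n.data ** 2) for n in nodes])
    return energies / energies.sum()

def rms(signal: np.ndarray) -> float:
    """Root-mean-square value used as the time-domain feature."""
    return float(np.sqrt(np.mean(signal ** 2)))

# Example: share of the "medium" bands (band indices 4-7 are an assumption).
# ratios = band_energy_ratios(ae_segment)
# medium_share = ratios[4:8].sum()
```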

2.2 Model design

The long short-term memory neural network (LSTM) is an improved RNN model with the same input and output structure as the RNN [31]. LSTM can extract temporal features and unfold them along the time dimension [32], as shown in Eq. (2).

$${h}^{(t)}=f\left({h}^{\left(t-1\right)},{X}^{(t)};\boldsymbol{\omega} \right)=f\left(f\left({h}^{\left(t-2\right)},{X}^{\left(t-1\right)};\boldsymbol{\omega} \right),{X}^{(t)};\boldsymbol{\omega} \right)=\cdots ={g}^{(t)}\left({X}^{(t)},{X}^{\left(t-1\right)},{X}^{\left(t-2\right)},\cdots, {X}^{(2)},{X}^{(1)}\right)$$
(2)

where h(t) represents the output of the network at time t, corresponding to the signal features extracted at time t; X(t) represents the input of the network at time t, corresponding to the signal data collected at that time; ω represents the network node parameters; and g(t) expresses the output at time t as a function of the entire history of input signals.

As shown in Fig. 4, a single LSTM unit consists of a forget gate, an input gate, and an output gate. The unit's inputs are the input at the current time and the output of the unit at the previous time, and its output serves as an input of the next unit. The specific calculation process is shown in Eq. (3).

Fig. 4
figure 4

Schematic diagram of LSTM unit structure

$$\left\{\begin{array}{l}f_t=\sigma\left({\boldsymbol W}_f\left(h_{t-1},X^{(t)}\right)+{\boldsymbol b}_f\right)\\i_t=\sigma\left({\boldsymbol W}_i\left(h_{t-1},X^{(t)}\right)+{\boldsymbol b}_i\right)\\{\overline C}_t=\tanh\left({\boldsymbol W}_C\left(h_{t-1},X^{(t)}\right)+{\boldsymbol b}_C\right)\\C^{(t)}=f_t\odot C^{\left(t-1\right)}+i_t\odot{\overline C}_t\\O_t=\sigma\left({\boldsymbol W}_O\left(h_{t-1},X^{(t)}\right)+{\boldsymbol b}_O\right)\\h^{(t)}=O_t\odot\tanh\left(C^{(t)}\right)\end{array}\right.$$
(3)

where the unit state C(t) at time t is obtained by updating the forget gate ft, the input gate it, the output gate Ot, the previous hidden state h(t − 1), and the previous unit state C(t − 1). The hidden state h(t) is updated from the signal data input at time t and the unit state C(t). The parameters Wf, Wi, WC, WO and bf, bi, bC, bO are obtained through model training and shared across all time steps. The n denotes the number of hidden neurons and T denotes the time step, i.e., the length of the real-time sequential data. The symbol "⊙" denotes element-wise multiplication, σ is the sigmoid activation function, and tanh is the hyperbolic tangent activation function.
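For clarity, a minimal NumPy sketch of one LSTM unit update following Eq. (3) is given below; it is a didactic restatement of the equations rather than the trained implementation used in this work.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM unit update following Eq. (3).

    x_t    : input at time t, shape (d,)
    h_prev : previous hidden state, shape (n,)
    c_prev : previous cell state, shape (n,)
    W      : dict of weight matrices W_f, W_i, W_C, W_O, each of shape (n, n + d)
    b      : dict of bias vectors b_f, b_i, b_C, b_O, each of shape (n,)
    """
    z = np.concatenate([h_prev, x_t])        # concatenation (h_{t-1}, X^{(t)})
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_bar = np.tanh(W["C"] @ z + b["C"])     # candidate state
    c_t = f_t * c_prev + i_t * c_bar         # element-wise update of the cell state
    o_t = sigmoid(W["O"] @ z + b["O"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # new hidden state
    return h_t, c_t
```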

To address the LSTM's limited memory of early features and its difficulty in deeply mining the temporal features of the whole sequence, the BiLSTM is adopted to enhance the representation of data features and improve wear prediction accuracy. The BiLSTM consists of two independent LSTM layers that process the temporal feature data in the forward and reverse directions, respectively; the two output feature vectors are concatenated as the final feature representation. The parameters of the two LSTM layers are independent of each other, and the network structure is shown in Fig. 5. In the prediction model, N = 90 is the sample size. Xk denotes the input of group k, a 4-dimensional feature vector composed of the processing variables (micro-groove depth in μm and micro-groove length in mm), the AE feature (proportion of medium-frequency band energy in %), and the visual feature (cross-sectional area of the micro-grinding head in μm2). Yk denotes the output of group k, the loss of cross-sectional area of the micro-grinding head (μm2).

Fig. 5
figure 5

Schematic diagram of BiLSTM unit structure
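A minimal PyTorch sketch of such a BiLSTM regressor is shown below, assuming a 4-dimensional input feature vector per time step and a single scalar wear output; the hidden size, layer count, and learning rate are placeholders rather than the optimal hyperparameters of Table 2.

```python
import torch
import torch.nn as nn

class BiLSTMWearModel(nn.Module):
    """Minimal BiLSTM regressor: 4-dim feature sequence -> wear value.
    Hidden size and layer count are placeholders, not the Table 2 values."""
    def __init__(self, in_dim=4, hidden=32, layers=1):
        super().__init__()
        self.bilstm = nn.LSTM(input_size=in_dim, hidden_size=hidden,
                              num_layers=layers, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)   # forward + backward features

    def forward(self, x):                      # x: (batch, time_steps, 4)
        out, _ = self.bilstm(x)
        return self.head(out[:, -1, :])        # predict from the last time step

# model = BiLSTMWearModel()
# optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, as in the paper
# loss_fn = nn.MSELoss()                                     # Eq. (4)
```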

In order to reduce the interference of singular sample data on training, accelerate the training and convergence of the model, and improve its accuracy, normalization is applied to limit the data to the range [0, 1]. The prediction model uses the Adam algorithm to minimize the loss function in Eq. (4).

$$E_{loss}=\frac1n\sum \limits_{k=1}^n\left(y_k^{pre}-y_k\right)^2$$
(4)

In Eq. (4), \({y}_k^{pre}\) is the predicted wear value, yk is the actual wear value, n is the number of training samples, and Eloss is the loss function. The performance of the model is evaluated with the root mean squared error (RMSE) and the coefficient of determination R2, as shown in Eqs. (5) and (6). The hyperparameters are selected according to the RMSE after training, and the final optimal hyperparameters are listed in Table 2.

$$RMSE=\sqrt{\frac1n\sum \limits_{k=1}^n{(y_k^{pre}-y_k)}^2}$$
(5)
$${R}^2=1-\sum_{k=1}^n{\left({y}_k-{y}_k^{pre}\right)}^2/\sum_{k=1}^n{\left({y}_k-\overline{y}\right)}^2$$
(6)
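The normalization and evaluation steps can be sketched in NumPy as follows; this is an illustrative implementation of the [0, 1] scaling, Eq. (5), and Eq. (6), not the exact code used in this study.

```python
import numpy as np

def minmax_scale(x, lo=None, hi=None):
    """Scale each feature column to [0, 1]; reuse the training-set lo/hi
    when transforming the test data."""
    lo = x.min(axis=0) if lo is None else lo
    hi = x.max(axis=0) if hi is None else hi
    return (x - lo) / (hi - lo + 1e-12), lo, hi

def rmse(y_pred, y_true):
    """Root mean squared error, Eq. (5)."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def r2(y_pred, y_true):
    """Coefficient of determination, Eq. (6)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```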
Table 2 The final optimal hyperparameters of BiLSTM

2.3 Comparative model

The prediction performance of the micro-grinding wear prediction model depends on the fit between the depth of the model architecture and the sample data volume [33]. To assess the accuracy of the BiLSTM model by comparing models of different architectural depth, two derived models, GA-BiLSTM and LSTM, are constructed. The GA-BiLSTM introduces additional algorithmic structure to deepen the architecture and increase its capacity for feature extraction and representation, whereas the LSTM uses a shallower network structure to simplify the architecture and better suit smaller sample data. Their prediction results are compared with those of the BiLSTM model.

In data-driven tool wear prediction, traditional machine learning methods often outperform deep learning methods on small-sample data [34]. Among traditional machine learning algorithms, the BP neural network has good adaptive feature extraction ability and nonlinear mapping ability [35]. Therefore, two wear prediction models based on BP and GA-BP are constructed for comparison. The BP structure is shown in Fig. 6, where ωij is the weight connecting an input layer unit to a hidden layer unit and νj is the weight connecting a hidden layer unit to the output layer unit. The specific calculation process is shown in Eq. (7).

Fig. 6
figure 6

Schematic diagram of BP structure

$$\left\{\begin{array}{l}H_j(k)=f\left(\sum_{i=1}^m\omega_{ij}x_i(k)-a_j\right),\quad j=1,2,\cdots,h\\Y(k)=f\left(\sum_{j=1}^h v_jH_j(k)-b\right)\\\varepsilon(k)=Y(k)\left(1-Y(k)\right)\left(Y(k)-T(k)\right)\\\delta_j(k)=H_j(k)\left(1-H_j(k)\right)\varepsilon(k)v_j\end{array}\right.$$
(7)

where aj is the threshold of the j-th hidden layer unit, b is the threshold of the output layer unit, xi(k) is the i-th input of the model for sample k, Y(k) is the predicted output value, and T(k) is the expected output value; ε(k) and δj(k) are the output-layer and hidden-layer error terms used to update the weights during back propagation.
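A minimal NumPy sketch of the forward pass in Eq. (7) is given below; the array shapes and function names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_forward(x, w_ih, a, v, b):
    """Forward pass of the single-hidden-layer BP network in Eq. (7).

    x    : input feature vector, shape (m,)
    w_ih : input-to-hidden weights omega_ij, shape (h, m)
    a    : hidden-layer thresholds a_j, shape (h,)
    v    : hidden-to-output weights v_j, shape (h,)
    b    : output-layer threshold (scalar)
    """
    H = sigmoid(w_ih @ x - a)     # hidden-layer outputs H_j(k)
    Y = sigmoid(v @ H - b)        # predicted output Y(k)
    return H, Y
```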

The schematic flowchart of the GA-BP structure is shown in Fig. 7. After the connection weights of the model are initialized, the genetic algorithm performs fitness calculation, selection, crossover, and mutation. When the iteratively updated weights satisfy the stopping criterion, the optimal weights are assigned back to the BP model to complete the optimization.

Fig. 7
figure 7

Schematic flowchart of GA-BP structure
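The GA-BP flow can be sketched as a simple real-coded genetic algorithm that searches for good initial BP weights, with the fitness defined as the prediction RMSE of the forward pass sketched for Eq. (7); the population size, crossover and mutation settings, and operators below are illustrative assumptions rather than the configuration used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(chrom, m, h):
    """Split a flat chromosome into the BP parameters (w_ih, a, v, b)."""
    i = 0
    w_ih = chrom[i:i + h * m].reshape(h, m); i += h * m
    a = chrom[i:i + h]; i += h
    v = chrom[i:i + h]; i += h
    b = chrom[i]
    return w_ih, a, v, b

def fitness(chrom, X, y, m, h):
    """Fitness = RMSE of bp_forward (the Eq. (7) sketch) with the encoded
    weights; lower is better."""
    preds = np.array([bp_forward(x, *decode(chrom, m, h))[1] for x in X])
    return np.sqrt(np.mean((preds - y) ** 2))

def ga_optimise(X, y, h=8, pop=30, gens=50, pc=0.8, pm=0.05):
    """Toy real-coded GA: tournament selection, arithmetic crossover, Gaussian
    mutation. The best chromosome seeds the BP network before gradient training."""
    m = X.shape[1]
    n_genes = h * m + h + h + 1
    population = rng.normal(0, 1, size=(pop, n_genes))
    for _ in range(gens):
        scores = np.array([fitness(c, X, y, m, h) for c in population])
        new_pop = [population[scores.argmin()].copy()]       # elitism
        while len(new_pop) < pop:
            i, j = rng.integers(0, pop, 2)                   # tournament 1
            parent1 = population[i] if scores[i] < scores[j] else population[j]
            i, j = rng.integers(0, pop, 2)                   # tournament 2
            parent2 = population[i] if scores[i] < scores[j] else population[j]
            child = parent1.copy()
            if rng.random() < pc:                            # arithmetic crossover
                alpha = rng.random()
                child = alpha * parent1 + (1 - alpha) * parent2
            mask = rng.random(n_genes) < pm                  # Gaussian mutation
            child[mask] += rng.normal(0, 0.3, mask.sum())
            new_pop.append(child)
        population = np.array(new_pop)
    scores = np.array([fitness(c, X, y, m, h) for c in population])
    return decode(population[scores.argmin()], m, h)         # best initial weights
```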

3 Experiment verification

3.1 Design of micro-grinding experiment

The micro-grinding experiment is conducted with an electroplated diamond micro-grinding tool to verify the feasibility and accuracy of the proposed models. The micro-grinding platform is modified from an MK2945C CNC coordinate grinder and is equipped with an online micro-grinding tool detection system and an AE detection system, as shown in Fig. 8. The online tool detection system includes a 20-megapixel Dahua A3B00MG000 CMOS camera, a WWK40-100-111 fixed-magnification, fixed-focal-length telephoto lens, and a CF-200-W LED backlight board. The AE detection system includes a high-sensitivity resonant AE sensor (model R6α), a 2/4/6 preamplifier with an amplification gain of 40 dB, and a PCI-2 AE acquisition system.

Fig. 8
figure 8

The micro-grinding platform

Reasonable processing parameters can effectively reduce the risk during machining, which is conducive to detecting feature signals and studying their characterization relationships. The parameters used in the micro-grinding experiment are listed in Table 3. A total of six micro-grooves are machined, with three sampling intervals set at the beginning, middle, and end of each micro-groove, as shown in Fig. 9; these positions correspond to grinding lengths of 2 mm, 4 mm, and 6 mm from the head of the micro-groove, respectively. Each experimental group with the same micro-groove depth thus corresponds to 18 sampling intervals, i.e., 18 sample data groups, and the five experimental groups contain a total of 90 sampling intervals, i.e., 90 sample data groups. The first 80% of the 90 data groups are used as the training set, and the last 20% are used as the test set to evaluate the predictive performance of the model.

Table 3 The parameters in experiment
Fig. 9
figure 9

The schematic diagram of sampling intervals for AE signals
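The chronological 80/20 split described above can be sketched as follows; the placeholder arrays stand in for the measured feature vectors and wear values.

```python
import numpy as np

# 5 depth groups x 18 sampling intervals = 90 samples. Each sample is the
# 4-dim feature vector (groove depth, grinding length, medium-band energy
# ratio, head cross-sectional area) with the area loss as the target.
# Placeholder arrays stand in for the measured values.
features = np.zeros((90, 4))
wear = np.zeros(90)

n_train = int(0.8 * len(features))                     # first 72 samples
X_train, y_train = features[:n_train], wear[:n_train]
X_test, y_test = features[n_train:], wear[n_train:]    # last 18 samples
```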

3.2 Results of prediction models

The test data are fed into the BiLSTM, GA-BiLSTM, LSTM, BP, and GA-BP models, and the test results are shown in Fig. 10a–e, respectively. In Fig. 10a, the values predicted by the BiLSTM model agree well with the real wear values, and most data groups show high prediction accuracy. For nearly half of the groups, the relative prediction error is within 5%, and only group 14 reaches a relative error of about 20%. Averaging the relative errors of all groups gives an average relative error of 7.92%, i.e., a prediction accuracy of 92.08% for the BiLSTM model. In Fig. 10b, the prediction trend of the GA-BiLSTM model agrees well with the actual trend, but several groups show significant deviations: the relative errors of the 1st, 5th, 10th, 11th, and 15th groups exceed 20%. The average relative error is 12.8%, corresponding to a prediction accuracy of 87.2%. In Fig. 10c, the average relative error of the LSTM model is 11.42%, corresponding to a prediction accuracy of 88.58%.

Fig. 10
figure 10

The test results of prediction models. a BiLSTM; b GA-BiLSTM; c LSTM; d BP; e GA-BP

In Fig. 10d, the prediction trend of the BP model agrees well with the actual trend. The predicted and actual wear values agree well when the actual wear is large, but significant errors appear when the actual wear is low. For example, the relative deviations of the 4th, 5th, 10th, and 12th groups exceed 30%, and the relative error of the 10th group reaches 60.25%. The average relative error over all samples is 15.63%, corresponding to a prediction accuracy of 84.37%. In Fig. 10e, the prediction trend of the GA-BP model also agrees well with the actual trend, but some groups show significant deviations: the relative errors of the 4th, 10th, and 12th groups exceed 30%, and the error of the 10th group reaches 74.03%, the same groups on which the BP network shows its largest errors. The average relative error of the GA-BP model is 14.2%, corresponding to a prediction accuracy of 85.8%.
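The reported accuracies follow from the mean relative error over the test groups, which can be computed as in the short sketch below (variable names are illustrative).

```python
import numpy as np

def mean_relative_error(y_pred, y_true):
    """Average of the per-group relative errors |pred - true| / |true|."""
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))

# Prediction accuracy as reported above: 100 % minus the mean relative error.
# accuracy = 100.0 * (1.0 - mean_relative_error(y_pred_test, y_true_test))
```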

3.3 Comparison and analysis

To compare the predictive performance of the models (BiLSTM, GA-BiLSTM, LSTM, BP, and GA-BP) more intuitively, the relative errors of the five models are plotted together. The prediction results are grouped four at a time, giving four relative error comparison plots (Fig. 11a–d).

Fig. 11
figure 11

The comparison figures of relative error for prediction models. a 1st–4th groups; b 5th–8th groups; c 9th–12th groups; d 13th–16th groups

The BiLSTM model has a lower relative error for most samples, and its error fluctuation is smaller than that of the other two deep learning models. The GA-BiLSTM and LSTM models show relatively high relative errors for the 1st, 5th, 10th, 11th, 12th, and 15th groups, while the BiLSTM model shows a relatively high relative error only for the 14th group. The relatively large prediction error for the 1st group (Fig. 11a) arises from the memory structure of the recurrent models: the output of each unit affects the output of the next unit, but the 1st group has no preceding sample, so no temporal features can be mined for it and a certain error results. The larger prediction errors of the remaining samples are attributed to the inherent randomness of deep learning models. In addition, the prediction errors of the BP and GA-BP networks fluctuate greatly, showing very high prediction accuracy for some samples but large errors for others; in particular, the relative error of the 10th group (Fig. 11c) approaches 80%.

Although the BiLSTM model does not achieve extremely high prediction accuracy for every individual sample, its overall relative error fluctuation is small and stable, and its average prediction accuracy is higher than that of the other four models. In conclusion, the BiLSTM model fits the small-sample, multi-source heterogeneous data best, and the results demonstrate the accuracy, stability, and superiority of the BiLSTM-based wear prediction model for micro-grinding tools.

4 Conclusions

In this work, a BiLSTM prediction model was constructed to integrate small-sample, multi-source heterogeneous data and to characterize and predict micro-grinding tool wear.

The calculated and experimental results show that the prediction accuracy of the BiLSTM model is 92.08%, which proves the feasibility of the prediction model. Among the GA-BiLSTM, LSTM, and BiLSTM models, the BiLSTM model has the highest prediction accuracy, indicating that its architectural depth fits the sample data volume best. Among the BP, GA-BP, and BiLSTM models, the BiLSTM model has the highest prediction accuracy and the smallest fluctuation in relative error, which proves its stability and superiority over the machine learning methods.

The BiLSTM model proposed in this paper has important scientific significance for the wear characterization and prediction of micro-grinding tools. On the one hand, it can guide tool replacement strategies, ensuring processing quality and reducing waste. On the other hand, it offers additional solutions for sustainable manufacturing and provides a theoretical basis for autonomous decision-making in precision intelligent manufacturing.