1 Introduction

In most manufacturing industries, machining is one of the indispensable processes. It involves removing unwanted material from a workpiece to give it the desired shape while meeting the requirements of close dimensional accuracy and tolerance, and satisfactory surface quality. It is a subtractive manufacturing operation that employs cutting tools, discs, abrasive wheels and similar tools, which make direct contact with the workpiece to remove excess material from it [1]. Depending on the type of cutting tool and the shape of the workpiece to be generated, machining processes are of various types, like turning, milling, drilling, sawing, lapping, broaching etc. Among them, turning, milling and drilling are the principal material removal processes, employed in almost every manufactured product. It has been observed that on a typical manufacturing shop-floor, turning accounts for 40–50% of the total workload, followed by drilling (30–40%) and milling (10–15%) [2]. With the introduction of CNC technology, all the movements, speeds and tooling changes have been automated, making these processes more productive, consistent and precise.

Each of these machining processes has several input parameters, like cutting speed, feed rate, depth of cut (DOC), tool geometry, type and material of the cutting tool, cutting environment etc., which can be controlled or set depending on the machine specifications and the requirements of the process engineers. They have direct relationships with the product characteristics and machining performance, in terms of material removal rate (MRR), surface roughness (SR), geometrical deviations, cutting force, power consumption etc. Experiments have shown that during turning, increasing the cutting speed, feed rate and DOC may enhance productivity through higher MRR, but has detrimental effects on the surface quality of the machined components [3]. Similarly, during drilling, higher spindle speed, feed rate and DOC may have adverse effects on the quality of the holes generated, although MRR may increase [4]. Due to the complex material removal mechanism and the involvement of numerous input parameters, their relationships with the conflicting process outputs (responses) are often nonlinear in nature and difficult to model. Process engineers are therefore always in search of appropriate predictive models that help them to envisage the possible outcomes for given sets of input parameters, and thus to have an idea about the tentative values of the responses under consideration for varying combinations of the input parameters.

Predictive modeling is a data mining technique which analyzes past data to identify trends or patterns and then generates a corresponding model to help predict future outcomes. Thus, in predictive modeling, data is collected, a model is formulated, predictions are made, and the model is validated based on additional data. Regression analysis and neural networks are the two most widely adopted predictive modeling techniques. Regression analysis is limited in its ability to depict accurately the nonlinear behavior of a system in many of the real-time applications. On the other hand, neural network-based predictive models, used in emerging fields of artificial intelligence such as machine learning and deep learning, are able to extract nonlinear relationships between the input and output variables which would be impossible for human analysts to identify. Machine learning typically deals with structured data, such as spreadsheets or experimental results. In contrast, deep learning can take into account unstructured data, like video, audio, social media posts or images, which do not involve numbers or metric readings.

Artificial neural networks (ANNs) are computational networks which attempt to mimic the network of neurons of a human brain so that computers can understand a system's behaviour and make decisions in a human-like manner. Similar to the human brain, ANNs have neurons linked to each other in various layers of the network. A typical ANN consists of an input layer, one or more hidden layers and an output layer. Each layer also has several nodes (artificial neurons), depending on the ANN architecture and the complexity of the problem. Each node connects to other nodes, and has an associated weight and threshold. If the output of any individual node exceeds the specified threshold value, that node is activated, sending data to the next layer of the network. ANNs have several advantages, like learning ability and self-adaptability, ability to model nonlinear relationships, fault tolerance, parallel processing, and generalization ability. Overfitting, limited interpretability, high computational cost, large data requirements and sensitivity to noise are their major disadvantages.

Acknowledging the immense potential of ANNs in effectively understanding the complex material removal mechanism, and in modeling nonlinear relationships between various machining parameters and responses, past researchers have relied on them to predict the achievable responses for given sets of input parameters. In earlier days, the optimal combination of input parameters to attain the target response values was mainly determined by trial and error, by seeking expert opinions or by consulting machining data handbooks. Application of ANNs with an appropriate architecture thus assists the process engineers in effectively predicting the responses of the considered machining operations for various combinations of the input parameters. Those ANNs are usually trained on randomly chosen real-time experimental trials, and their prediction performance is later validated with suitable testing datasets using different statistical measures. Past researchers have already surveyed the ANN applications in different machining operations and demonstrated their competency as efficient predictive tools.

Pontes et al. [5] reviewed 45 articles published during 2000–2009 on application of ANNs for modeling of SR in turning, milling, grinding, electrical discharge machining (EDM), abrasive flow machining, electrochemical machining (ECM), micro-end milling and water jet machining (WJM) processes. The common approaches adopted by the past researchers for the said purpose were identified, along with model elaboration, fitting and validation. Chaudhari and Gohil [6] performed a literature survey on the applicability of ANNs to model SR during turning operations, and demonstrated their superiority over the conventional prediction models in providing an accurate relationship between the turning parameters and SR. Through a literature review, Ranganath et al. [7] demonstrated the potential of ANNs in accurately and reliably predicting SR values during turning operations with cutting speed, feed rate and DOC as the input parameters. Garg et al. [8] reviewed the applications of regression analysis, ANNs, fuzzy logic and support vector machines (SVMs) for modeling of turning processes, and recommended the use of ANNs and SVMs due to their non-dependency on statistical assumptions and ability to capture the complexity of the turning operation. Dureja et al. [9] evaluated the applicability and relative performance of several modeling and optimization techniques, like linear, polynomial and fuzzy modeling, ANNs, Taguchi methodology, response surface methodology (RSM) and genetic algorithm (GA) for hard turning applications. It was concluded that when regression analysis fails to yield suitable models, ANNs could be applied for predicting cutting force, tool wear and residual stress during the said machining operations. Jegan et al. [10] reviewed 13 articles dealing with application of ANNs in conventional (turning and milling) and non-conventional machining processes (ECM, EDM, WJM and abrasive jet machining), and advocated their use for prediction of the best results. It was stressed that the performance of ANNs would entirely depend on the availability and accuracy of the training data. du Preez and Oosthuizen [11] emphasized the application of machine learning techniques in cutting processes, which could lead to cost and time savings, enhanced quality and waste reduction, resulting in deployment of a sustainable manufacturing environment.

Based on a review of 49 articles on ANN application in milling processes, Al-Zubaidi et al. [12] concluded that (a) back-propagation neural networks (BPNNs) had been primarily adopted by the past researchers for modeling of milling processes, (b) they had shown higher predictive accuracy than the traditional statistical approaches, (c) GA could be coupled with BPNN for optimization of the milling processes, (d) adaptive neural controllers could be integrated with ANNs for online monitoring and control of SR, tool wear and cutting forces through proper adjustment of the considered milling parameters, and (e) SR had been treated as the most important response directly related to the surface quality of the machined components. Senthil Kumar and Ezilarasan [13] explored the contents of 33 research articles on applications of RSM, ANNs and fuzzy logic for modeling of the drilling operations on glass fiber-reinforced plastics, and identified thrust force and torque as the two important input parameters affecting delamination of the drilled holes. In a recent paper, acknowledging the analytical and predictive capabilities of ANNs, Mumali [14] reviewed 99 multi-disciplinary articles published during 2011–2021 in the manufacturing sector, and identified product and process design, performance evaluation and predictive maintenance as the key areas for ANN adoption. Integration of ANNs with fuzzy logic and GA was highly recommended to overcome their slow convergence during training.

Based on the above literature review, it can be noticed that past researchers have already accepted ANNs as one of the effective modeling techniques to identify the nonlinear relationships between the input and output variables in many of the machining processes. However, the existing literature reviews are dated and not exhaustive, and concentrate only on some distinct applications of ANNs in different machining operations. A thorough analysis of the above-cited literature reviews raises the following research questions (RQs):

  • RQ1: What would be the representative set of input parameters to model the behavior of the three main machining processes, i.e. turning, milling and drilling?

  • RQ2: What would be the optimal set of responses to highlight performance of those processes?

  • RQ3: Among various ANN techniques, which would be best suited for modeling the nonlinear relationships between the input and output variables for those processes?

  • RQ4: What would be the best architecture of the adopted ANN models and how to achieve it?

  • RQ5: What would be the most suitable training algorithm and activation (transfer) function?

  • RQ6: What would be the most appropriate statistical measures to validate prediction performance of the ANN models?

  • RQ7: How are the training and test datasets collected?

Based on a systematic and content-wise analysis of a considerable number of research articles, available in the popular Scopus, ScienceDirect and Google Scholar databases, on application of ANNs for predictive modeling of turning, milling and drilling processes, this review paper endeavors to answer the above-identified RQs. It would assist process engineers or software developers in identifying the best representative sets of input and output parameters for the considered machining processes, selecting the appropriate ANN along with its optimal architecture, choosing the most apposite training algorithm and activation function, singling out the best statistical metric for evaluating prediction performance of the developed ANNs, and choosing the optimal training data based on real-time experiments. This paper would thus act as a data repository to help process engineers and future researchers in effectively understanding the complex material removal mechanism of any of the machining processes while extracting the nonlinear relationships between the input and output parameters, and envisaging the tentative responses for given combinations of the machining parameters without conducting real-time experiments. It would ultimately help in achieving better product quality, higher process economy, and reduced tool wear and energy consumption, leading to a sustainable and green machining environment. The organization of this paper is as follows: Sect. 2 provides a brief introduction to ANNs, and the statistical metrics considered for their performance analysis are presented in Sect. 3. Applications of ANNs in turning, milling and drilling processes are provided in Sect. 4 through succinct tables. Results derived from the literature are analyzed in Sect. 5, and Sect. 6 concludes the paper along with future research directions.

2 Artificial neural networks

ANNs are a fundamental concept in the field of machine learning and artificial intelligence, based on a nonlinear mapping system inspired by the structure and functioning of the human brain [15]. They are a subset of machine learning algorithms designed to recognize patterns, make predictions or perform tasks by learning from data [16]. ANNs consist of interconnected units called neurons, which are organized into a minimum of three layers, i.e. an input layer, one or more hidden layers and an output layer, as shown in Fig. 1. The input layer receives raw data and sends it to the hidden layer(s); the hidden layer(s) then extract information from the data received and pass it to the output layer, which ultimately produces the final results [17]. The number and size of the hidden layers determine the ANN’s complexity and capacity to capture intricate patterns. Neurons are connected to each other via links, each of which carries a weight. A neuron multiplies each of its inputs by the corresponding weight and sums the products; this sum is then added to a bias and transformed into an output using an activation function, as presented in Fig. 1.

Fig. 1 Architecture of a typical ANN
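The neuron computation described above (weighted sum, plus bias, passed through an activation function) can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed works: the function names, the choice of a logistic activation in the hidden layer and a linear output, and all numerical values are assumptions made purely for illustration.

```python
import math


def logistic(z):
    """Logistic (sigmoid) activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def identity(z):
    """Linear (pass-through) activation."""
    return z


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, transformed by an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


def layer_output(inputs, weight_rows, biases, activation):
    """Outputs of all neurons in one layer (one weight row per neuron)."""
    return [neuron_output(inputs, row, b, activation)
            for row, b in zip(weight_rows, biases)]


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    """Forward pass through one hidden layer (logistic) and a linear output layer."""
    hidden = layer_output(inputs, hidden_weights, hidden_biases, logistic)
    return layer_output(hidden, output_weights, output_biases, identity)


# Hypothetical values: three scaled inputs (e.g. cutting speed, feed rate, DOC),
# two hidden neurons and one output. The weights are arbitrary, not trained.
scaled_inputs = [0.6, 0.3, 0.5]
hidden_weights = [[0.2, -0.4, 0.1], [0.7, 0.5, -0.3]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.5, -0.8]]
output_biases = [0.05]
print(forward(scaled_inputs, hidden_weights, hidden_biases,
              output_weights, output_biases))
```

In practice the weights and biases would be obtained by training rather than set by hand, and the inputs would normally be scaled to a common range before being fed to the network.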

There are several types of ANN, each designed for specific tasks. The most commonly used ANNs in machine learning are [18]:

  a. Feed forward neural networks (FFNNs): These are the basic type of ANN, where data flows in one direction from the input to the output layer, without any feedback loop. They are mostly employed for tasks like classification and regression.

  b. Convolutional neural networks (CNNs): CNNs work best on unstructured data, and are well-suited for image and video analysis. They utilize convolutional layers that automatically learn and extract features from visual data. They are commonly used for image classification, segmentation and detection.

  c. Recurrent neural networks (RNNs): RNNs have connections that loop back on themselves, allowing them to process sequences of data. They are considered for tasks involving sequential data, like language translation, speech recognition and time series analysis.

  d. Long short-term memory (LSTM): LSTMs, which are a specialized type of RNN, are designed to better deal with long-range dependencies in sequential data. They are particularly suitable when memory of past information is important, such as in language modeling and sentiment analysis.

  e. Radial basis function neural networks (RBFs): RBF networks are a commonly used type of ANN for function approximation problems and SVM-based classifications. They are distinguished from other ANNs by their fixed three-layer architecture, universal approximation and faster learning speed.
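The fixed three-layer structure of an RBF network mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: a Gaussian basis function is assumed (other basis functions are possible), and the centers, widths and output weights are hypothetical values rather than trained ones.

```python
import math


def gaussian_rbf(x, center, width):
    """Gaussian basis: equals 1 at the center and decays with squared distance."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))


def rbf_network(x, centers, widths, out_weights, out_bias):
    """Input layer -> one hidden layer of RBF units -> linear output neuron."""
    hidden = [gaussian_rbf(x, c, s) for c, s in zip(centers, widths)]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias


# Hypothetical network with two hidden RBF units in a two-dimensional input space.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
out_weights = [2.0, -1.0]
out_bias = 0.1
print(rbf_network([0.2, 0.1], centers, widths, out_weights, out_bias))
```

Because each hidden unit responds only to inputs near its center, the output layer reduces to a linear combination of local responses, which is one reason such networks are often quick to train.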

Activation (transfer) functions play a crucial role in proper functioning of ANNs. They introduce nonlinearity to the network, determine the output of a neuron based on its input, and greatly influence an ANN’s ability to learn and generalize. The choice of activation function depends on the problem at hand, architecture of the network and empirical experimentation. They can generally be divided into two classes, i.e. linear activation function and non-linear activation function [19].

  a. Linear activation function: The linear activation function, also known as ‘Identity function’ or ‘Straight-line function’, is applied when the activation is directly proportional to the input. The most commonly used linear function is the pure linear activation (Purelin) function. It does not transform the weighted sum of the inputs and simply returns the value it is given. Its main problem is that its output is not confined to a specific range.

  b. Non-linear activation function: The problem with a linear activation function can be effectively overcome using non-linear functions. This type of activation is normally defined within a specific range, which makes it easier for ANNs to adapt to a variety of data and differentiate between the possible outcomes. These functions are mainly categorized based on their ranges. Table 1 provides the expressions of all the commonly employed activation functions along with their ranges and advantages/disadvantages.

Table 1 Different activation functions used in ANNs
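As an illustration of the two classes, the following Python sketch implements one linear and three widely used non-linear activation functions. The selection and the function names (which follow common toolbox naming) are assumptions made for this example and are not necessarily the set listed in Table 1.

```python
import math


def purelin(z):
    """Pure linear (identity) activation; output range is unbounded."""
    return z


def logsig(z):
    """Log-sigmoid activation; output lies in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def tansig(z):
    """Hyperbolic tangent sigmoid activation; output lies in (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Rectified linear unit; output lies in [0, +inf)."""
    return max(0.0, z)


# Compare the functions on a few sample weighted sums.
for z in (-2.0, 0.0, 2.0):
    print(z, purelin(z), logsig(z), tansig(z), relu(z))
```

The bounded ranges of the sigmoid-type functions are what distinguish them from the pure linear function, whose output grows without limit as its input grows.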

The neurons in ANNs learn by updating their weights and biases iteratively to obtain the desired output. For learning to take place, the network is first trained based on a predefined set of rules, known as a training algorithm. Training algorithms are crucial in enabling ANNs to perform various tasks, such as classification, regression, and other complex tasks, like image and speech recognition. The frequently employed training algorithms include [20]:

  a. Gradient descent (GD): GD is the most straightforward algorithm for ANNs, recommended for massive neural networks with many thousands of parameters. It continues to adjust the network parameters, so as to yield the smallest possible error, until the error function is close to or equal to zero.

  b. Levenberg–Marquardt (LM): The LM algorithm is specifically designed to work with loss functions that take the form of a sum of squared errors. It is a combination of the GD and Gauss–Newton methods. It is the fastest back-propagation algorithm and is highly recommended, although it requires more memory than the other training algorithms.

  c. Scaled Conjugate Gradient (SCG): Based on conjugate directions, it is a fully automated algorithm with no critical user-dependent parameters and, unlike other conjugate gradient algorithms, it avoids a time-consuming line search.

  d. Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton: BFGS overcomes some of the limitations of the GD algorithm by approximating second-order derivative information. It necessitates complex computation and high storage, for which reason it is mostly recommended for networks having a small number of weights.

  e. Resilient Propagation (RP): It is very similar to common back-propagation except for the weight update routine. It does not take into account the magnitude of the error gradient, but considers only its sign to indicate the direction of the weight update, making it faster than other back-propagation training methods.

  f. Bayesian Regularization (BR): It incorporates Bayesian principles into the training and regularization of ANNs. During training, it seeks the posterior distribution of the weights given the training data and a prior distribution. It prevents overfitting, allows ANNs to make more reliable predictions, and provides a way to quantify uncertainty in the prediction process.

  g. Nature-inspired metaheuristics: To overcome the problems of the conventional training algorithms with respect to being trapped in local minima and overfitting of the training data, several nature-inspired metaheuristics, like GA, artificial bee colony, particle swarm optimization, cuckoo search algorithm etc., have also been proposed by researchers [21] for training of the developed ANNs, with the aim of enhanced convergence speed and higher prediction accuracy.
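To make the idea of iterative weight and bias updates concrete, the following Python sketch applies plain batch gradient descent to a single linear neuron with a mean squared error loss. It is a deliberately simplified illustration, not the training procedure of any reviewed work: the function name, learning rate, number of epochs and the toy data are all assumptions.

```python
def train_gradient_descent(samples, targets, learning_rate=0.1, epochs=2000):
    """Fit a single linear neuron by batch gradient descent on the MSE loss."""
    n = len(samples)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, targets):
            prediction = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = prediction - y
            # Partial derivatives of the mean squared error w.r.t. weights and bias.
            for j in range(n_features):
                grad_w[j] += 2.0 * error * x[j] / n
            grad_b += 2.0 * error / n
        # Move each parameter a small step against its gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Toy data generated from y = 2x + 1 (hypothetical, noise-free).
toy_samples = [[0.0], [0.5], [1.0]]
toy_targets = [1.0, 2.0, 3.0]
fitted_weights, fitted_bias = train_gradient_descent(toy_samples, toy_targets)
print(fitted_weights, fitted_bias)
```

The other algorithms in the list differ mainly in how the update step is computed, for example by using curvature information (LM, BFGS) or only the sign of the gradient (RP), while the overall loop of predicting, measuring error and updating parameters remains the same.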

3 Statistical metrics

It has already been stated that a typical ANN is usually fed with an appropriate set of training data, the corresponding model is then formulated, predictions are subsequently made, and the developed model is validated using additional (testing) data [22]. To validate the prediction performance of ANNs, the following statistical metrics are usually adopted [23]:

$$ \text{Coefficient of determination } \left( R^{2} \right) = 1 - \frac{\sum\nolimits_{i = 1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}{\sum\nolimits_{i = 1}^{n} \left( y_{i} - \overline{y} \right)^{2}} $$
(1)

where \(y_{i}\) is the actual value of the ith observation, \(\hat{y}_{i}\) is the predicted value of the ith observation, \(\overline{y}\) is the mean of all the observations and n is the number of observations. As it signifies the proportion of variation in the dependent (output) variables that is predictable from the independent (input) variables, its higher value is always preferred.

$$ \text{Root mean square error } \left( \text{RMSE} \right) = \sqrt{\frac{\sum\nolimits_{i = 1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}{n}} $$
(2)
$$ \text{Mean squared error } \left( \text{MSE} \right) = \frac{\sum\nolimits_{i = 1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}{n} $$
(3)
$$ \text{Mean absolute percentage error } \left( \text{MAPE} \right) = \frac{1}{n} \sum\nolimits_{i = 1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \times 100\% $$
(4)
$$ \text{Mean percentage error } \left( \text{MPE} \right) = \frac{1}{n} \sum\nolimits_{i = 1}^{n} \frac{y_{i} - \hat{y}_{i}}{y_{i}} \times 100\% $$
(5)
$$ \text{Mean absolute error } \left( \text{MAE} \right) = \frac{\sum\nolimits_{i = 1}^{n} \left| y_{i} - \hat{y}_{i} \right|}{n} $$
(6)
$$ \text{Relative error } \left( \text{RE} \right) = \sum\nolimits_{i = 1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| $$
(7)
$$ \text{Percent absolute error } \left( \text{PAE} \right) = \sum\nolimits_{i = 1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \times 100\% $$
(8)
$$ \text{Mean error } \left( \text{ME} \right) = \frac{\sum\nolimits_{i = 1}^{n} \left( y_{i} - \hat{y}_{i} \right)}{n} $$
(9)
$$ \text{Relative absolute error } \left( \text{RAE} \right) = \frac{\sum\nolimits_{i = 1}^{n} \left| y_{i} - \hat{y}_{i} \right|}{\sum\nolimits_{i = 1}^{n} \left| y_{i} - \overline{y} \right|} $$
(10)
$$ \text{Root relative squared error } \left( \text{RRSE} \right) = \sqrt{\frac{\sum\nolimits_{i = 1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}{\sum\nolimits_{i = 1}^{n} \left( y_{i} - \overline{y} \right)^{2}}} $$
(11)

All these errors, i.e. RMSE, MSE, MAPE, MPE, MAE, RE, PAE, ME, RAE and RRSE, measure deviations between the values predicted by the adopted ANNs and the values actually observed during real-time machining operations. As a well-designed ANN model should be able to predict the values of the dependent variables almost exactly from the given set of independent variables, the values of these error measures should be as close to zero as possible (their minimum values are thus preferred).

$$ \text{Pearson's correlation coefficient } (r) = \frac{\sum\nolimits_{i = 1}^{n} \left( x_{i} - \overline{x} \right)\left( y_{i} - \overline{y} \right)}{\sqrt{\sum\nolimits_{i = 1}^{n} \left( x_{i} - \overline{x} \right)^{2}} \sqrt{\sum\nolimits_{i = 1}^{n} \left( y_{i} - \overline{y} \right)^{2}}} $$
(12)

where \(x_{i}\) is the value of the ith input variable, \(y_{i}\) is the value of the ith output variable, and \(\overline{x}\) and \(\overline{y}\) are the mean values of all x and y variables respectively. Its value ranges between − 1 and + 1, where + 1 indicates a perfectly positive correlation between the considered variables, whereas − 1 signifies that they are perfectly negatively correlated. Thus, a value close to + 1 is desired, showing how strongly the predicted values are correlated with the actual ones.
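Several of the above metrics can be computed with a few lines of Python. The sketch below is illustrative only: the function names and the sample actual/predicted values are hypothetical, and the percentage-based metric assumes that no actual value is zero.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))


# Hypothetical measured and ANN-predicted response values (e.g. SR in micrometres).
measured = [2.0, 4.0, 6.0, 8.0]
estimated = [2.5, 3.5, 6.5, 7.5]
for metric in (mse, rmse, mae, mape, r_squared, pearson_r):
    print(metric.__name__, metric(measured, estimated))
```

Reporting an error measure together with a goodness-of-fit measure such as the coefficient of determination gives a fuller picture than either alone, since a high correlation does not by itself guarantee small absolute errors.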

4 ANN applications in machining processes

4.1 Turning

In turning, a wedge-shaped cutting tool having linear motion is pressed firmly against a rotating cylindrical workpiece, and material is removed from its outer surface by shear deformation. The cutting tool may have movements along all the three directions, making this process capable of producing precise diameters and depths [24]. Besides decreasing the diameter of the workpiece, it can also perform other operations, like parting, grooving, knurling, threading, taper turning etc. This process has several advantages, like interchangeable work materials, excellent dimensional tolerance, short lead time, higher MRR and no need for a highly skilled operator. However, it only permits machining of rotatable components, often requires subsequent operations, generates a substantial amount of scrap and causes excessive tool wear. As turning is the main machining operation in the manufacturing industries, it is to be expected that past researchers would attempt to model it using ANNs, validate the performance of the adopted ANNs and predict the unknown response values for varying combinations of the turning parameters. Table 2 lists the ANN applications in turning operations and reveals that almost all the authors have adopted FFNNs for the said purpose.

Table 2 Applications of ANNs in turning operations

Considering cutting speed, DOC, feed rate and average grey level of the surface image of the machined component (grabbed using a computer vision system) as the turning parameters, and SR as the response, Natarajan et al. [30] compared the prediction performance of an FFNN, a differential evolution algorithm (DEA)-based ANN and an adaptive neuro-fuzzy inference system (ANFIS) during CNC turning of steel alloys. It was noticed that although all the adopted techniques were capable of envisaging the target SR responses with satisfactory MSE values, the convergence speed of ANN-DEA was higher than that of the FFNN and ANFIS models. A similar work was also performed by Radha Krishnan et al. [68]. Besides considering the primary turning parameters (like cutting speed, feed rate and DOC), Radha Krishnan et al. [68] employed Fourier transformation to extract the relevant features from the workpiece image (average grey level, major peak frequency and principal component magnitude squared value) to achieve SR prediction accuracy above 95% and MSE below 5%. In an attempt to predict tool wear during turning of EN9 and EN24 steel alloys, Baig et al. [75] developed appropriate ANN models considering types of the work material and tool insert, number of cuts, cutting speed, feed rate, DOC, machining time and vibration amplitude as the input parameters. It was claimed that the developed model (having an R² value of 0.9964) would be able to accurately predict tool wear without performing any real-time experiment, thereby avoiding catastrophic tool failure. In a similar study, Rajeev et al. [58] developed an ANN model for tool wear prediction during hard turning of AISI 4140 steel, considering cutting speed, feed, DOC, mean value of the forces in x, y and z directions, power spectral density of vibration and machining length as the inputs to the proposed model. Its application for online tool wear monitoring was highly recommended. Nouioua and Bouhalais [79] explored the practicality of using root mean square values and the spectral centroid indicator of vibration signals as suitable inputs to the ANNs for online monitoring of tool wear and SR during turning of AISI 1045 steel.

Lee et al. [80] developed an innovative RNN model for flank wear and SR prediction during AISI 1040 steel turning operation with cutting speed, feed rate, DOC and homogeneity extracted from the surface texture images based on grey level co-occurrence matrix as the input variables. It was shown that the adopted RNN model could achieve excellent prediction accuracy of 97.05% and 96.58% for flank wear and SR, respectively. A deep learning-based ANN model was proposed by Patil et al. [84] for tool condition monitoring during turning operation with respect to five distinct tool faults, and a comparative study was later performed against other machine learning-based classifiers to prove robustness of the proposed model which had shown a prediction accuracy of 93.33%.

An analysis of the information provided in Table 2 reveals that while modeling turning processes and predicting responses using ANNs, feed rate, cutting speed and DOC have been treated as the most representative turning parameters, as shown in Fig. 2a. In this figure, the ‘Others’ category includes axial deflections, sound, number of cuts, position of the cutting piece, machining length etc. On the other hand, Fig. 2b singles out SR as the most favored response, followed by cutting force and tool wear, to characterize the performance of the turning operations.

Fig. 2 Input parameters and responses considered during ANN modeling of turning operations

4.2 Milling

Milling employs a rotating multi-point cutting tool (cutter) to shape the workpiece by advancing it into the cutter. Although there are several variations of this process, end milling and face milling are the most popular choices in the manufacturing industries. End milling uses a cylindrical cutter having multiple edges on its periphery and tip, permitting both end cutting and peripheral cutting. On the contrary, face milling performs horizontal cutting using the circular shape and edges of the cutter along its circumference [86]. Capability to generate complex shape geometries with high precision, flexibility, versatility, low downtime, increased productivity and reduced waste are the major advantages of a milling process. On the other hand, it suffers from some disadvantages, such as increased setup time, higher space requirement, noisy working environment, higher cost and requirement of skilled manpower. Modeling of milling processes and prediction of the corresponding responses using suitably structured ANNs has also caught the attention of past researchers, as noticed in Table 3.

Table 3 Applications of ANNs in milling

After face milling of Al alloy 7075-T7351, Muñoz-Escalona and Maropoulos [98] developed three ANN models, i.e. radial basis NN (RBNN), FFNN and generalized regression NN (GRNN), for prediction of SR values. Based on the calculated MSE values, it was noticed that FFNN had the best prediction performance. It was also observed that among the considered milling parameters (cutting speed, feed per tooth, axial DOC, chip width and chip thickness), the measured SR values were most strongly correlated with chip thickness, followed by cutting speed. Brecher et al. [102] proposed a solution for SR monitoring using global user data, based on the numerical control kernel and human–machine interface. Several input parameters were taken to develop the corresponding ANN models, which would help in online SR measurement and provide optimized parameters to the machine operators.

Kothuru et al. [124] explored the applicability of deep learning techniques for tool condition monitoring based on the spectrogram features of the audible sound acquired during end milling operations and employed a deep visualization approach to have valuable knowledge with respect to the inner workings of the deep learning models for tool wear prediction. Finally, CNN models were developed for tool wear monitoring and hyper-parameter tuning for increased prediction accuracy. Using audible sound signals during milling operations, Kothuru et al. [119] also employed SVM and CNN models for prediction of tool wear and hardness variation of the machined workpieces.

Ong et al. [125] applied wavelet neural network (WNN), a variant of ANN, to monitor tool wear during CNC end milling operation of grade SS41 mild steel blocks. After each milling experiment, the tool wear images were processed and the corresponding descriptor of the wear zone was extracted. It was noticed that the WNN model with cutting speed, feed rate, DOC, machining time and descriptor of wear zone as inputs would provide the most accurate prediction of tool wear. A deep convolutional neural network (DCNN) was proposed by Huang et al. [129] for monitoring of tool wear condition during high-speed CNC operation under dry condition, considering three-dimensional cutting force and vibration as the tool health indicators. Its prediction accuracy was noticed to be significantly better than that of the other ANN models. Sener et al. [134] also employed DCNN for chatter detection during CNC milling operation and concluded that when cutting speed and DOC were considered as the input parameters, the developed model could achieve an average prediction accuracy of 99.88%.

Figure 3 depicts various input parameters and responses considered by the past researchers for ANN modeling of milling processes. It is revealed that cutting speed, feed rate and DOC have been the most favored milling parameters, while SR has been the most important response. In Fig. 3a, ‘Others’ parameters contain width of cut, type of insert, sound pressure level, cutting section, length of cut, number of teeth, milling orientation, extension length, maximum chip thickness, chip load, machined surface area, machining time etc.

Fig. 3 Input parameters and responses considered for ANN modeling of milling operations

4.3 Drilling

Drilling utilizes a multi-point cutting tool, in the form of a drill bit, to generate cylindrical holes in a solid material. In this process, the rotating drill bit is fed perpendicular to the plane of the workpiece's surface, making vertically-aligned holes with diameters equal to that of the drill bit. The drill bit has a pointed end which assists in easily cutting a hole in the workpiece, and its typical double-helix structure allows the debris material (chips) to fall away from the workpiece [137]. Besides making holes, other operations, like reaming, boring, counter boring, counter sinking, tapping, trepanning etc. can also be performed employing a drilling setup. A typical drilling process has several advantages, like higher MRR, extreme adaptability, low maintenance cost, ease of use etc. However, limited size of the workpiece, generation of rough holes, clogging of chips, drill breakage, use of coolant etc. are some of its demerits. The performance of a drilling process is often characterized with respect to surface quality, delamination factor, geometrical deviations (cylindricity, circularity and perpendicularity), torque, thrust force etc., which are noticed to be influenced by various input parameters, like spindle speed, feed rate, DOC, drill diameter, drill material, cutting environment etc. Table 4 exhibits the works carried out by the earlier researchers on applications of ANNs in drilling processes.

Table 4 Applications of ANNs in drilling

Efkolidis et al. [156] integrated ANN with GA to determine the optimal ANN architecture while predicting thrust force and torque, treating cutting velocity, drill diameter and feed rate as the major drilling parameters. It was revealed that GA-ANN would perform more efficiently than the ANN whose network architecture was selected by the trial and error method. Rao and Rodrigues [152] performed a comparative study among five different learning algorithms, namely BFGS quasi-Newton, SCG, conjugate gradient with Powell-Beale restarts (CGPB), conjugate gradient with Polak-Ribière updates and LM, and revealed the superiority of the LM algorithm in accurately predicting the corresponding response values during drilling of glass fiber reinforced polymer composites. Ramalingam et al. [170] proposed an FFNN model to predict thrust force, torque, exit delamination, hole diameter, cylindricity and SR while conducting drilling operations on quartz cyanate ester polymeric composite materials, and concluded that an optimal network architecture of 3-45-15-10-6 would result in the minimum MSE value of 0.0105. The developed network also had excellent prediction accuracy (maximum error was 7.17%). Using ANNs, Kolesnyk et al. [169] investigated the influences of number of holes, cutting speed, feed rate, time delay and hole depth measuring point on drilling temperature, hole diameter and circularity during drilling of carbon fiber-reinforced plastic/titanium alloy stacks. It was concluded that ANNs could be deployed to extract the nonlinear relationships between the input parameters and quality of the drilled holes.

In Fig. 4, the drilling parameters and responses considered by the past researchers for ANN-based modeling of drilling operations are provided. It can be seen from Fig. 4a that spindle speed, feed rate and drill diameter have been the most favored input parameters. In Fig. 4a, ‘Others’ parameters consist of thrust force, torque, number of pecking cycles, hardness of the work material, time delay and hole depth measuring point. On the other hand, Fig. 4b shows that SR, followed by thrust force, torque, delamination and hole diameter, has been mainly chosen by the researchers to represent performance of the drilling operations on different materials.

Fig. 4 Input parameters and responses considered for ANN modeling of drilling operations

5 Analysis of the obtained results from the literature

Acknowledging that application of ANN models to different machining processes is a challenging task requiring a broad range of domain knowledge, this paper critically and systematically reviews a considerable number of research articles, available in some of the most popular scholarly databases and spanning more than the last 20 years, that focus on ANN-based modeling and response prediction of turning, milling and drilling processes. The extract of this review has already been provided in Tables 2, 3 and 4, with respect to the input parameters, responses, types of learning algorithm and activation function, network architecture, statistical measures and training datasets considered. It is noticed from Fig. 5a that turning (42.07%) occupies the maximum share of ANN applications, followed by milling (34.48%) and drilling (23.45%) processes. In statistics, analysis of variance helps in identifying the most significant input parameters affecting the responses, which may not be the same for all the responses. Since each of the input parameters is vital during machining operations, there is a need to formulate an ANN-like model which can predict the corresponding responses based on varying combinations of the input parameters. This literature review reveals that cutting velocity, feed rate and DOC have been the most predominant parameters for both the turning and milling operations, whereas spindle speed, feed rate and drill diameter have been the most preferred during ANN modeling of drilling operations. It is also revealed that SR and cutting (thrust) force have been the two main responses representing the performance of turning, milling and drilling operations. In some cases, the researchers have captured images of the machined surface and recorded vibration/sound signals for online surface texture analysis and tool wear monitoring with the help of CNN, RNN and DCNN models.

Fig. 5 Machining processes, learning algorithms, activation functions and statistical measures considered by the past researchers

Selection of an appropriate training algorithm and activation function plays a crucial role in achieving the desired prediction accuracy of the adopted ANN models with minimum computational effort. Figure 5b and c show the distributions of learning algorithm and activation function as considered by the past researchers with respect to ANN modeling of turning, milling and drilling operations. The most popular learning algorithm has been LM (58.3%), followed by GD (34.6%). The wide application of LM as an effective learning algorithm may be attributed to its robustness, faster convergence speed and ability to deal with ill-structured data. On the other hand, Sigmoid (31.6%) and Tansig (27.8%) have appeared to be the most favored activation functions. Both of them are nonlinear activation functions whose outputs are confined to fixed ranges, (0, 1) for Sigmoid and (−1, 1) for Tansig, which minimizes the chances of the activations blowing up. To evaluate accuracy of the adopted ANN models, the past researchers have employed different statistical metrics, as shown in Fig. 5d, which basically measure the goodness of fit, deviations between the actual and predicted responses, and correlation between them. It is revealed from Fig. 5d that MSE (47.2%) and R² (11.8%) have been the two most popular measures considered by the researchers. The MSE measures the mean of the squared deviations between the actual and predicted values. A smaller value is always preferred. Because the deviations are squared, large errors are weighted more heavily, so a low MSE indicates that the trained ANN model has few outlier predictions with large errors. On the other hand, a higher R² value (typically between 0 and 1) indicates how well the nonlinear relationship between the machining parameters and responses has been captured.
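As a minimal sketch of the activation functions and accuracy measures named above, the following Python snippet evaluates the logistic sigmoid, the hyperbolic tangent sigmoid (Tansig), MSE and R² using only the standard library. The function names and the sample SR values are illustrative assumptions and are not taken from any of the reviewed articles.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tansig(z):
    """Hyperbolic tangent sigmoid: maps any real input into the range (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * z)) - 1.0

def mse(actual, predicted):
    """Mean of the squared deviations between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted surface roughness values (µm)
sr_actual = [1.0, 2.0, 3.0, 4.0]
sr_predicted = [1.1, 1.9, 3.2, 3.8]

print("MSE:", mse(sr_actual, sr_predicted))
print("R2 :", r_squared(sr_actual, sr_predicted))
```

For these illustrative values, the MSE is about 0.025 and R² is about 0.98. Note that R² computed in this way can fall below 0 when a model fits worse than simply predicting the mean of the observed responses.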

Determination of the optimal architecture of an ANN is a challenging task, yet it is essential for achieving better prediction accuracy during machining operations. A typical ANN contains one input layer having number of nodes equal to the number of machining parameters, one or more hidden layers and one output layer with number of nodes equal to the number of responses to be predicted. It is noticed that to estimate the optimal number of nodes in the hidden layer(s), the past researchers have mainly relied on the trial and error method. The architecture with a given number of nodes in the hidden layer(s) providing the minimum MSE value has been considered as the best choice. It thus indicates that some skill is required to select the optimal architecture of an ANN for faster training and better accuracy. Selection of appropriate training and testing data also has significant impact on the performance of ANN models. The past researchers have conducted machining operations using different design of experiments (DOE) plans or Taguchi’s orthogonal arrays. From the experimental dataset, 70% has been utilized for training of the ANN models, and the remainder has been used for validation and prediction purposes. Those experimental data may often contain noise and outliers which may adversely affect accuracy of the ANNs. Use of scatter plots, histograms, box plots or various statistical tests can identify outliers or noisy samples, thereby supporting proper training of the ANN models. In some cases, the experimental data has been simulated to provide larger datasets while keeping the response values within their achieved minimum and maximum observations.
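The data preparation steps mentioned above can be illustrated with a short Python sketch. It flags outliers using the box-plot (interquartile range) rule and then shuffles and partitions a dataset with 70% used for training. The function names, the whisker factor of 1.5, the equal 15%/15% division of the remainder between validation and testing, and the sample SR values are all assumptions made here for illustration.

```python
import random
import statistics

def flag_outliers_iqr(values, k=1.5):
    """Return indices of values lying outside the box-plot whiskers
    [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the samples and split them into training, validation
    and testing subsets (the testing subset receives the remainder)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

# Hypothetical SR measurements (µm); the value 9.00 mimics a faulty reading
sr = [1.20, 1.30, 1.10, 1.25, 1.15, 1.35, 1.05, 1.28, 1.22, 9.00]
outliers = flag_outliers_iqr(sr)
clean = [v for i, v in enumerate(sr) if i not in outliers]
print("Outlier indices:", outliers)

# Hypothetical experiment identifiers 0..19 standing in for 20 machining trials
train, val, test = split_dataset(range(20))
print(len(train), len(val), len(test))
```

A flagged sample is only a candidate for removal; whether it reflects measurement noise or a genuine process effect would still need to be judged by the experimenter.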

5.1 Selection of the optimal ANN architecture

While training an ANN for modeling of any of the machining processes and prediction of the corresponding responses, there are a number of hyperparameters to choose, including the number of hidden layers, number of nodes in each of the hidden layers, type of the learning algorithm and activation function, learning rate etc. Determining the optimal combination of those hyperparameters is thus a challenging task. Therefore, a question always arises for the process engineers and ANN developers: how can the optimal architecture of an ANN be achieved? An ANN architecture is simply defined by the number of input nodes, number of hidden layers along with the number of nodes in each layer, and number of output nodes. Fixing the number of hidden layers and nodes in advance removes them from the hyperparameter optimization search space, resulting in fewer hyperparameters to be optimized. For modeling any of the machining processes, the number of nodes in the input layer should be equal to the number of input variables considered, whereas the number of output nodes would correspond to the number of responses to be predicted. As there is no generic way to determine a priori the optimal number of hidden layers for a given ANN, the trial and error method is still a viable option for the said purpose (the optimal number of hidden layers would provide the minimum MSE value). In an ANN, if the optimal number of hidden layers/nodes is used, better prediction accuracy can be achieved with less time complexity. On the other hand, if the number of hidden layers is increased, accuracy may improve to a considerable extent, but the ANN architecture would become more complex. Selection of an appropriate learning algorithm would depend on several factors, like interpretability, volume of training data and its format, data linearity, training and prediction time, and memory requirements. The considered activation function should be monotonic, differentiable and quickly converging with respect to weights for the optimization purpose. In machine learning, different optimization techniques, like stochastic gradient descent, Adam, RMSprop etc. are employed to adjust the ANN’s parameters during training to minimize the corresponding loss function. They enable ANNs to learn from the training data by iteratively updating weights and biases.
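The trial and error procedure described above can be sketched in Python as follows. The snippet trains a small single-hidden-layer FFNN for each candidate number of hidden nodes and keeps the one giving the minimum validation MSE. Everything specific here is an assumption made for illustration: the synthetic dataset with three normalized inputs (standing in for cutting speed, feed rate and DOC) and one response, the tanh hidden activation, the per-sample gradient descent training (used instead of LM to keep the code self-contained), and the candidate node counts.

```python
import math
import random

def train_ffnn(X, y, n_hidden, epochs=300, lr=0.05, seed=0):
    """Train a network with one tanh hidden layer and a linear output node
    by per-sample gradient descent; return a prediction function."""
    rng = random.Random(seed)
    n_in = len(X[0])  # input nodes = number of machining parameters
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W1, b1)]
        return h, sum(w * hj for w, hj in zip(W2, h)) + b2

    for _ in range(epochs):
        for x, target in zip(X, y):
            h, out = forward(x)
            err = out - target  # derivative of 0.5 * squared error w.r.t. output
            for j in range(n_hidden):
                grad_h = err * W2[j] * (1.0 - h[j] ** 2)  # backpropagate through tanh
                W2[j] -= lr * err * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * grad_h * x[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err
    return lambda x: forward(x)[1]

def mse(model, X, y):
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)

# Synthetic, normalized data: a hypothetical response, not measured values
rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(40)]
y = [0.4 * s + 0.8 * f ** 2 + 0.2 * d for s, f, d in X]
X_train, y_train, X_val, y_val = X[:28], y[:28], X[28:], y[28:]  # 70% for training

results = {}
for n_hidden in (2, 4, 8):  # candidate architectures 3-2-1, 3-4-1, 3-8-1
    model = train_ffnn(X_train, y_train, n_hidden)
    results[n_hidden] = mse(model, X_val, y_val)

best = min(results, key=results.get)
print("Validation MSE per hidden-node count:", results)
print("Selected architecture: 3-%d-1" % best)
```

Comparing the candidates on held-out data rather than on the training data is a deliberate choice in this sketch, since a larger network can always lower its training MSE without predicting unseen parameter combinations any better.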

Thus, while choosing the optimal ANN architecture, the following factors need to be considered, i.e. (a) type of the data (like structured data, image data, sequential data etc.), (b) complexity of the task (binary classification, image or speech recognition, natural language processing etc.), (c) availability of the labeled data (data with specific information, such as categories or labels), (d) volume of training data (a complex ANN trained using a small dataset may often lead to overfitting, where the model fits too closely to the training data, showing poor performance on new, unseen data), (e) requirement for transfer learning (it can significantly reduce volume of training data as well as time complexity while improving overall prediction accuracy), (f) evaluating the importance of sequential data (using CNNs or RNNs), (g) consideration of the importance of layers (mainly number of hidden layers and nodes in each layer, while making a trade-off between performance and complexity), (h) existence of benchmark models, and (i) selection of the appropriate statistical metrics for evaluating the ANN’s prediction performance.

5.2 Critical analysis of the literature

Keeping in mind the potential of ANNs in effectively exploring the nonlinear relationships between the input and output parameters, and predicting the response values, the past researchers have successfully deployed them in many of the conventional machining processes (turning, milling and drilling). For the said purpose, they have mainly relied on real-time experimental datasets which are occasionally small in size, leading to overfitting of the developed models and poorer prediction performance. Although there are several learning algorithms, activation functions and statistical metrics, in most of the cases, those have been arbitrarily chosen without any valid justification. It is also noticed that due to availability of structured experimental datasets, the earlier researchers have mostly focused on the application of FFNNs only, although CNNs and DCNNs, developed based on vibration or acoustic signals, may result in better prediction of surface texture and tool wear. Use of simulated data [172], development of dimension-reduced ANNs [173], access to advanced computational resources, integration of ANNs with metaheuristics [174,175,176] and seeking experts’ opinions for selection of appropriate learning algorithm, activation function and network architecture may help overcome the limitations and challenges encountered by the earlier researchers.

6 Conclusions and future scopes

In this paper, a systematic literature review of a considerable number of research articles published in the top peer-reviewed journals (written in English and publication status ‘Final’) available in some of the popular scholarly databases is conducted on ANN applications in three of the major machining processes (turning, milling and drilling). It is noticed that among those machining operations, ANNs have found maximum application in turning operations for their modeling and optimization. The researchers have mainly preferred to model those processes using FFNNs due to ready availability of structured experimental data and their ability to provide higher prediction accuracy than the traditional statistical approaches. In a few cases, CNNs and DCNNs have also been adopted for on-line surface texture and tool wear monitoring. While modeling the considered machining operations using ANNs, cutting speed, feed rate and DOC have been treated as the most representative input parameters for turning and milling, and spindle speed, feed rate and drill diameter for drilling. With respect to output parameters, most of the researchers have concentrated on prediction of SR, followed by cutting force and tool wear. Prediction of SR is important for achieving better surface integrity of the machined components to minimize frictional and energy losses, while achieving minimum values of cutting force and tool wear would help in attaining an economical and sustainable machining environment. As there is no strong mathematical foundation for deriving the optimal ANN architecture, the researchers have relied on the trial and error method for the said purpose. It is revealed that LM, Sigmoid and MSE have mostly been employed as the learning algorithm, activation function and statistical measure, respectively. In the reviewed articles on ANN applications in machining processes, there is almost no mention of any specific optimization algorithm considered to adjust the ANN’s parameters during training to minimize the corresponding loss function. Very few authors have acknowledged application of the Adam algorithm for this purpose. For training and testing of ANNs, the databases have mainly been generated by recording real-time experimental observations based on DOEs and Taguchi’s orthogonal arrays, and a 70–15–15% rule has been followed for training, validation and testing of the ANNs. It is also noticed that most of the authors have considered an MSE value of 0.0001 and 5000 epochs during ANN training. Thus, it is concluded that any of the machining processes can be effectively modeled with the help of suitably developed ANNs and the important responses can be predicted for varying combinations of input parameters without conducting real-time experiments, thereby saving machining time and cost. Therefore, the machining processes under consideration can be optimized with respect to higher productivity and process economy, better product quality, reduced tool wear and energy consumption, resulting in a sustainable and green machining environment.
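To make the role of the optimization algorithm and the training settings concrete, the following Python sketch applies the Adam update rule to a deliberately simple model (a straight line fitted by minimizing MSE) and stops when either an MSE of 0.0001 is reached or 5000 epochs have elapsed. Treating the two reported values as stopping criteria is an interpretation made here, and the function name, learning rate, decay constants and data are illustrative assumptions rather than settings reported in the reviewed articles.

```python
import math

def adam_fit_line(xs, ys, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
                  mse_goal=1e-4, max_epochs=5000):
    """Fit y = w*x + b by minimizing MSE with Adam.
    Training stops at the MSE goal or at the epoch limit, whichever comes first."""
    params = [0.0, 0.0]   # [w, b]
    m = [0.0, 0.0]        # first-moment estimates of the gradients
    v = [0.0, 0.0]        # second-moment estimates of the gradients
    n = len(xs)

    def loss_and_errors():
        errs = [params[0] * x + params[1] - y for x, y in zip(xs, ys)]
        return sum(e * e for e in errs) / n, errs

    for epoch in range(1, max_epochs + 1):
        loss, errs = loss_and_errors()
        if loss <= mse_goal:
            return params[0], params[1], loss, epoch - 1
        # Gradients of the MSE with respect to w and b
        grads = [2.0 * sum(e * x for e, x in zip(errs, xs)) / n,
                 2.0 * sum(errs) / n]
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** epoch)   # bias-corrected first moment
            v_hat = v[i] / (1.0 - beta2 ** epoch)   # bias-corrected second moment
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

    loss, _ = loss_and_errors()
    return params[0], params[1], loss, max_epochs

# Hypothetical noise-free data generated from y = 2x + 1
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b, final_mse, epochs_used = adam_fit_line(xs, ys)
print("w = %.3f, b = %.3f, MSE = %.6f, epochs = %d" % (w, b, final_mse, epochs_used))
```

The same pair of stopping conditions carries over unchanged when the straight line is replaced by an ANN; only the computation of the loss and its gradients differs.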

This review paper also proposes multiple future research directions. It is highly recommended to adopt CNNs and DCNNs for surface texture or online tool wear monitoring through analysis of the captured images of the machined components or vibration/sound signals during real-time machining operations. For this purpose, a suitable adaptive neural controller supported by ANNs may be developed. It would effectively lead to cost and time savings, enhanced product quality and waste reduction. Instead of black-box models, like ANNs, use of decision trees or fuzzy logic is encouraged to understand the inherent relations between the machining parameters and responses. Instead of traditional techniques, use of gene expression programming is highly desired for empirical modeling of the machining processes. It does not require any assumption with respect to model structure, automatically evolving the optimal model structure and related parameters. To overcome the problems of slow convergence speed and overfitting of training data, the ANN architecture may be optimized with the help of different metaheuristic algorithms. An adaptive neuro-fuzzy inference system, based on the Takagi–Sugeno fuzzy system, may be used for faster data extraction and process behavior realization. The developed ANNs should be reusable based on uniformly distributed training and testing datasets. Performance of the ANNs can be enriched through establishment of standardized databases and data-sharing platforms. Moreover, to deal with small volumes of training data, transfer learning and data augmentation techniques may be utilized. Future works may be more deeply directed towards application of neurocomputing concepts, network optimization, validation of results based on firmer statistical techniques and finally, visualization of the derived results. This literature review may also be extended to include ANN applications in other manufacturing processes, like casting, grinding and welding, and many of the non-traditional material removal processes.

Due to paucity of space, this review paper has some limitations, like relying on only three scholarly databases for sourcing the published research articles, consideration of articles published only in top peer-reviewed journals and written in English, taking into account only three major machining processes, not depicting the achieved values of the predicted responses and corresponding statistical measures etc.