1 Introduction

Magnesium (Mg)-based alloys are characteristically lightweight and therefore exhibit high specific strengths [1, 2]. In the automotive and aerospace industries, the drive to increase fuel efficiency, cut vehicle emissions, and mitigate global warming has spurred researchers to turn to Mg-based materials [3, 4]. Mg-based components are manufactured by two basic processing routes: powder metallurgy (PM) and melting techniques. PM is a near-net-shape and relatively simple fabrication process that involves pressing Mg-based powders into the desired compact shape and applying heat and/or pressure (a process called sintering) to densify the component and impart the necessary properties. However, many powders cannot be used without further size reduction, which facilitates pressing into the desired shapes and enhances diffusion during sintering. For instance, fine-sized, free-flowing powders are required for metal powder injection molding and additive manufacturing processes [5], while some sintering studies have shown that powder particles smaller than 100 µm readily absorb microwaves at 2.45 GHz, which results in rapid microwave sintering and densification [1, 6].

Hence, high-energy mechanical milling/alloying (MA) is a useful method for producing powders with refined and reduced particle sizes. In this process, metal powders and milling media (e.g., balls or blobs) are loaded into a suitable container and subjected to grinding action. The tumbling action of the mill causes repetitive welding and fracturing of the metal powder. Thus, MA has also been applied for homogeneously mixing different powders and extending the limit of solid solubility [5], which is difficult to achieve using the melt processing technique. Besides, MA is also employed in the production of advanced materials including dispersion-strengthened superalloys [7, 8] and nanocrystalline composite materials [9, 10]. However, the mechanisms of MA have not been fully unraveled due to the high randomness of the process and the complex interrelationship between many dependent and independent variables [11]. For instance, MA of ductile powders involves repetitive plastic deformation and fracturing of the powders. These processes are greatly influenced by several parameters, such as charge ratio (milling ball-to-metal powder mass ratio), types and sizes of milling balls, milling atmosphere, process control agent, and mill type [12]. A wrong combination of these processing variables results in an undesirable final product, such as particle agglomeration due to excessive cold welding [13].

Therefore, it is imperative to precisely control this process with the aim of optimizing it. One approach is to develop models that furnish a better understanding of the process and thus provide information on the general trends of the evolution of particle sizes. Besides, successful models help in the identification of critical processing parameters and reduction of expensive and time-intensive experimentations [14]. Some researchers utilized empirical/phenomenological models to describe the chain of events during MA [14]. Furthermore, a system dynamic model was used to optimize ball size, milling speed, and milling time to realize optimum size reduction in a metal-matrix nanocomposite [9]. Despite the insights revealed by these models, they are not easily and accurately replicated in reality. This is attributed to the stochastic nature of the MA process, the complexity of the models, use of many oversimplified assumptions required to formulate the models, and the varied process parameters involved [15].

Consequently, the suitability of the artificial neural network (ANN) as a powerful and versatile modeling approach has been investigated by a number of scholars. ANN is a soft computing method that imitates the way the human brain processes information. Similar to the human brain, an ANN is made up of interconnected neurons. One important feature of ANN is its ability to integrate all processing parameters into a single model [16]. Besides, ANN is characterized by high parallelism, nonlinearity, and extensive learning and generalization capabilities. Security features to guard against data theft and infiltration are key requirements for deploying ANN [17]. Nonetheless, some encouraging results on applying ANN for particle size prediction in MA have been reported. Hamzaoui et al. [18] apply ANN to predict the magnetic properties of nanocrystalline Fe–Ni alloy from milling process parameters. Similarly, ANN has been employed to predict the morphological characteristics of nanocomposite WC–MgO powders from high-energy planetary ball milling parameters [15]. Likewise, Lemine and Louly [10] utilize ANN to correlate the processing conditions of the planetary ball mill to the crystallite size of ZnO. The effect of processing parameters and the densification behavior of Al-based/B4C composite powders have also been investigated using ANN [19, 20].

Recently, Akhlagi et al. develop an ANN model that describes the effect of some process parameters on the powder particle size characteristics of Al and B4C powders during MA [12]. Many of these works report very encouraging results because ANN predictions are generally more accurate than other models. However, these ANN-based models have not been applied to Mg-based powders and do not present any equation that can relate the input with the output parameters. Therefore, this work develops an ANN-based mathematical equation for predicting the particle size of AZ61 powder during mechanical alloying.

2 Brief description of ANN

ANN models are created by varying the connection of neurons that form a network. The feed-forward (FF) ANN is a typical network architecture in which the neurons are organized into different layers (multilayer perceptron, MLP) with unidirectional interconnections. FF-ANN is considered to be extremely nonlinear and capable of solving complex problems in different scientific fields including estimation of circuit parameters, prediction of microbial growth curves, and prediction of blast-induced ground vibrations in quarries [16, 21, 22]. An MLP consists of three layers of interconnected neurons: input, hidden, and output layers. The function of the hidden neurons is to process the information received from the input layer and relay it to the output layer.

For the network to function effectively, various learning algorithms have been proposed. A learning process involves updating network architecture by changing the connection weights that are derived from the available training patterns. The performance of the network improves by iteratively updating the weights over time [22]. One of the most popular algorithms is the backpropagation (BP) ANN. BP-ANN is also an MLP comprising an input layer that takes the input variables, one or more hidden layers to capture the nonlinearity in the data, and an output layer with nodes representing the dependent variables [16].

Each of the input variables (\(x_{i}\)) in the input layer is transmitted to the neurons in the hidden layer by multiplying with a corresponding connection weight (\(w_{ji}\)). For each neuron, all the input signals are summed and added to a threshold value or bias (\(\theta_{j}\)). The corresponding output (\(Y_{j}\)) at a node in the output layer is determined by subjecting the summed input signals to a transfer function (f), such as a linear or nonlinear function. In BP, the predicted output is compared with the target to obtain an error value. This error is then propagated backwards from the output layer through the hidden layer to the input layer to adjust the connection weight. This process is repeated until the error falls within a pre-specified tolerance. A BP-ANN process can be summarized by the following equation:

$$Y_{j} = f\left( {\theta_{j} + \mathop \sum \limits_{i = 1}^{n} w_{ji} x_{i} } \right)$$
(1)

where \(Y_{j}\) represents the output variable; \(\theta_{j}\) the bias of the hidden layer; n the number of input variables summed at the neuron; \(w_{ji}\) the connection weight between the input layer and the hidden layer; \(x_{i}\) the input variable; and f the transfer function.
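As a minimal sketch of Eq. 1, the output of a single neuron can be computed as the transfer function applied to the bias plus the weighted sum of the inputs. The function name `neuron_output` and the numerical values below are hypothetical and chosen only for illustration; they are not taken from the trained network.

```python
import math


def neuron_output(inputs, weights, bias, transfer=math.tanh):
    """Evaluate Eq. 1: Y_j = f(theta_j + sum_i w_ji * x_i)."""
    weighted_sum = bias + sum(w * x for w, x in zip(weights, inputs))
    return transfer(weighted_sum)


# Hypothetical inputs, weights, and bias (illustration only).
x = [0.5, -1.0, 0.25]
w = [0.2, 0.4, -0.8]
theta = 0.1

y = neuron_output(x, w, theta)  # tanh(0.1 + 0.1 - 0.4 - 0.2) = tanh(-0.4)
print(y)
```

Passing a different callable as `transfer` (for example, `lambda v: v` for a linear output) gives the other transfer functions mentioned in the text.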

3 Data and model development

3.1 Materials and methods

Water-atomized AZ61 magnesium alloy powder was sourced from Tangshan Weihao Magnesium Powder Co., Ltd, China. Particle size of the as-received powder varied between 120 and 300 µm. A planetary ball mill (QM-3SP2, Nanjing University Instrument Plant) was used for the mechanical alloying (MA) process. MA was conducted at 300, 350, and 400 rpm for 5 to 15 h. Charge ratios (ratio of the mass of milling ball-to-metal powder, BPR) were 5:1, 10:1, and 20:1. To minimize cold welding, cyclohexane at 30 vol.% was used as a process control agent. After charging the powder and milling balls, the vial was evacuated by means of a vacuum pump for 20 min to reduce powder oxidation. The size of the milled powder was determined by analyzing the SEM images obtained for each combination of processing parameters, i.e., rotation speed, charge ratio, and milling time. A minimum of three images were analyzed for each combination of the processing variables.

3.2 Development of BP-ANN model

A three-layer BP-ANN model was developed to predict the particle sizes of AZ61 powder subjected to MA. The model had three neurons in the input layer, viz., rotation speed (S), charge ratio (C), and milling time (t), and a single neuron (i.e., particle size) in the output layer. The optimal number of neurons in the hidden layer was determined by trial and error as there is no established rule for determining it in the literature [16]. Eventually, a model with five neurons in the hidden layer was selected. The transfer function in the input layer was tan-sigmoid, while purelin was used in the output layer. Twenty-seven data points were generated from the experimental data to build the model. The experimentally generated datasets were adequate to build a suitable ANN model, as some researchers, such as Monjezi et al. [23] and Zhao et al. [24], used fewer datasets to develop suitable and acceptable ANN models. The range of the data is given in Table 1.

Table 1 Range of data used for model development

The BP-ANN model was developed using the built-in neural network toolbox of MATLAB®. The data were randomly divided into three sets comprising training (70%), testing (15%), and validation (15%). Prior to modeling, the data were scaled within the range of -1 to 1 by means of Eq. 2:

$$x_{i}^{*} = \lambda_{1} + \left( {\lambda_{2} - \lambda_{1} } \right)\frac{{\left( {Z_{i } - Z^{\min } } \right)}}{{\left( {Z^{\max } {-} Z^{\min } } \right)}}$$
(2)

where \(x_{i}^{*}\) is the scaled parameter, \(\lambda_{1} \;{\text{and}}\;\lambda_{2}\) represent the normalization range, \(Z_{i }\) the data to be scaled, and \(Z^{\max }\) and \(Z^{\min }\) are the max and min of \(Z_{i }\) in the dataset.
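The scaling in Eq. 2 can be sketched as follows. The function names `scale` and `unscale` are hypothetical, and the example assumes that the minimum and maximum rotation speeds in the dataset are the 300 and 400 rpm levels listed in Section 3.1; the actual limits are those reported in Table 1.

```python
def scale(z, z_min, z_max, lam1=-1.0, lam2=1.0):
    """Eq. 2: map z from [z_min, z_max] onto [lam1, lam2]."""
    return lam1 + (lam2 - lam1) * (z - z_min) / (z_max - z_min)


def unscale(x_star, z_min, z_max, lam1=-1.0, lam2=1.0):
    """Invert Eq. 2 to recover the original value from a scaled one."""
    return z_min + (x_star - lam1) * (z_max - z_min) / (lam2 - lam1)


# Assumed limits for rotation speed (rpm), taken from the levels in Section 3.1.
S_MIN, S_MAX = 300.0, 400.0

for speed in (300.0, 350.0, 400.0):
    print(speed, scale(speed, S_MIN, S_MAX))  # -1.0, 0.0, 1.0
```

With \(\lambda_{1} = -1\) and \(\lambda_{2} = 1\), the lowest value maps to -1, the highest to 1, and the midpoint to 0.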

Scaling is useful for avoiding under- and overfitting, preventing a large number from over-riding a smaller number, and rendering the data compatible with the adopted tan-sigmoid transfer function [16].

4 Results and discussion

4.1 Effect of charge ratio on the average particle size of AZ61 powder during MA

Figure 1 shows the effect of charge ratio (BPR) on the average particle size of AZ61 after 15 h of MA at 350 rpm. It is apparent that the higher the charge ratio, the greater the reduction in particle size. For instance, at a BPR of 5:1, the average particle size measures 161.11 µm, which is further reduced to 107.33 µm at 10:1 BPR. Moreover, when the BPR rises to 20:1, finer particles are obtained (34.62 µm). Several authors have shown that the BPR is an indication of the milling energy supplied to the MA process [5, 13]. The higher the BPR, the more collisions occur per unit time, and consequently, more energy is transferred to the powder. This explains the greater reduction in particle size realized at 20:1 BPR, as shown in Fig. 1. Similar particle size reduction at a higher BPR has been reported for Al-Zr powder [25] and Fe-NbC composite [26].

Fig. 1 Plot of mean particle size against charge ratio at 350 rpm

4.2 Effect of milling time and milling speed on the average particle size of AZ61 powder during MA

The data presented in Fig. 2 show that there is a correlation between milling time, milling speed, and the resulting average particle size. Generally, the higher the speed, the greater the reduction in particle size. Milling speed is particularly important as it contributes to the energy required for particle size refinement [13]. Therefore, the faster the rotation of the mill, the more energy is transferred to the powder. This is why, after 10 h of milling, the average particle size is 163.11, 142.59, and 19.62 µm at 300, 350, and 400 rpm, respectively. This is in agreement with earlier studies on the MA of Ti powder [27] and gas atomized Al88Ce8Fe4 powders [28]. However, there is a critical speed above which the milling balls will be pinned to the walls of the milling vial and comminution will not occur [5]. From the results presented in Fig. 2, it can be deduced that this critical point has not been exceeded at 400 rpm, the maximum vial rotation speed utilized in this study.

Fig. 2 Plot of mean particle size against milling time for different vial rotation speeds

Likewise, increasing milling time results in greater particle size reduction, as indicated by the plot in Fig. 2. Keeping the milling speed constant at 300 rpm, the particle size reduces from 173.18 µm, through 163.11 µm, to 95.00 µm after 5, 10, and 15 h, respectively. A similar trend is recorded at 350 and 400 rpm. For a commercially pure Ti powder subjected to MA, a longer milling time has likewise been shown to result in finer powder particles [27].

4.3 Neural network architecture

The proposed BP-ANN is shown in Fig. 3. It is a 3–5–1 three-layer BP-ANN architecture consisting of three neurons in the input layer, five neurons in the hidden layer, and a single neuron in the output layer. The input layer employs a tan-sigmoid transfer function, while the output layer uses a purelin transfer function. Data limitation is a general problem in many aspects of materials science. The availability of samples and equipment, as well as cost, are usually the major reasons behind data limitations. No general rule established in the literature specifies the minimum or the maximum number of datasets required in an ANN model, and several studies have used fewer than the twenty-seven data points used in this study to develop various ANN models. Monjezi et al. [23] use 20 datasets to predict blast-induced ground vibration, while Zhao et al. [24] apply only 16 datasets to develop a BP neural network for predicting the flexural strength of open-porous Cu-Sn-Ti composites. Therefore, the results of the proposed model presented here are considered reliable and applicable for the prediction of the particle size of milled AZ61 powder. Figure 4 shows the regression plots for the training, validation, and test data. The R values are above 0.93, and the graphs show that there is a good fit between the predicted normalized particle size and the normalized target particle size. This indicates that the network model can accurately predict the particle size of the AZ61 powder after MA.

Fig. 3 Three-layer 3–5–1 BP-ANN architecture

Fig. 4 Regression graphs for training, validation, test, and whole datasets

4.4 Formulation of a mathematical equation for predicting particle size during MA

The mathematical representation of a trained ANN can be derived from the weights, biases, and transfer functions used in formulating the network. This equation can be generalized as

$$Y = f_{o} \left\{ {{\varvec{W}}_{{\varvec{o}}} \cdot \left[ {f_{i} \left( {{\varvec{W}}_{{\varvec{i}}} \cdot {\varvec{x}} + {\varvec{B}}_{{\varvec{i}}} } \right)} \right] + {\varvec{B}}_{{\varvec{o}}} } \right\}$$
(3)

where \({\varvec{W}}_{{\varvec{o}}}\) and \({\varvec{W}}_{{\varvec{i}}}\) are the weight vectors of the output and input layers, respectively; \({\varvec{B}}_{{\varvec{o}}}\) and \({\varvec{B}}_{{\varvec{i}}}\) are the bias vectors of the output and input layers, respectively; \({\varvec{x}}\) is the normalized input vector; and \(f_{o }\) and \(f_{i}\) are the transfer functions used at the output and input layers, respectively. In the current work, \(f_{o}\) is the purelin function, while \(f_{i}\) is the tan-sigmoid function.

By referring to Table 2, Eq. 3 can be transformed to

$$P_{{{\text{norm}}}} = - 0.73161 + \phi_{1} + \phi_{2} + \phi_{3} + \phi_{4} + \phi_{5}$$
(4)

where \(P_{{{\text{norm}}}}\) is the normalized particle size.

Table 2 Weights and biases derived from the ANN simulation

The unknown variables in Eq. 4 can be obtained from the parameters contained in Table 2 as

$$\phi_{1} = 0.72119 \cdot {\text{tanh}}\beta_{1}$$
(5)
$$\phi_{2} = 1.30223 \cdot {\text{tanh}}\beta_{2}$$
(6)
$$\phi_{3} = 1.38506 \cdot {\text{tanh}}\beta_{3}$$
(7)
$$\phi_{4} = - 0.24613 \cdot {\text{tanh}}\beta_{4}$$
(8)
$$\phi_{5} = - 2.35384 \cdot {\text{tanh}}\beta_{5}$$
(9)

The unknown variables in Eqs. 5–9 can be transformed into Eqs. 10–14 by using the values presented in Table 2:

$$\beta_{1} = 2.34007S - 4.05438C + 0.87849t - 2.08853$$
(10)
$$\beta_{2} = 3.36672S + 1.06733C + 3.51236t + 1.60816$$
(11)
$$\beta_{3} = 2.34007S - 4.05438C + 0.87849t + 5.29558$$
(12)
$$\beta_{4} = - 1.99231S - 1.91939C + 0.04739t - 2.21190$$
(13)
$$\beta_{5} = 1.24444S - 0.02016C + 1.39798t + 0.67019$$
(14)

To formulate the ANN-derived mathematical equation for predicting particle size, \(P_{{{\text{norm}}}}\) in Eq. 4 is denormalized to \(P_{{{\text{denorm}}}}\) as shown below

$$P_{{{\text{denorm}}}} = 178.4893P_{{{\text{norm}}}} + 104.9082$$
(15)

Equation 15 can be used to predict the particle size of AZ61 powder subjected to MA. This equation is validated by comparing its output with that of the ANN-predicted values. The result is presented in Fig. 5. It can be seen that the correlation coefficient is exactly 1, which implies that the derived equation is an exact replica of the predicted outputs of the ANN model.
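A minimal sketch of how Eqs. 4–15 chain together is given below, using only the coefficients printed in those equations. It assumes that S, C, and t enter Eqs. 10–14 in normalized form (scaled to the range -1 to 1 as in Eq. 2), consistent with the normalized input vector in Eq. 3. The names `p_norm`, `p_denorm`, and `predict_particle_size` are hypothetical.

```python
import math

B_OUT = -0.73161  # output bias, Eq. 4
W_OUT = [0.72119, 1.30223, 1.38506, -0.24613, -2.35384]  # Eqs. 5-9

# One row per hidden neuron: (weight on S, weight on C, weight on t, bias), Eqs. 10-14.
HIDDEN = [
    (2.34007, -4.05438, 0.87849, -2.08853),
    (3.36672, 1.06733, 3.51236, 1.60816),
    (2.34007, -4.05438, 0.87849, 5.29558),
    (-1.99231, -1.91939, 0.04739, -2.21190),
    (1.24444, -0.02016, 1.39798, 0.67019),
]


def p_norm(s, c, t):
    """Eq. 4: normalized particle size from normalized S, C, and t."""
    total = B_OUT
    for w_out, (w_s, w_c, w_t, bias) in zip(W_OUT, HIDDEN):
        beta = w_s * s + w_c * c + w_t * t + bias  # Eqs. 10-14
        total += w_out * math.tanh(beta)           # Eqs. 5-9
    return total


def p_denorm(p_normalized):
    """Eq. 15: convert the normalized output to particle size in micrometres."""
    return 178.4893 * p_normalized + 104.9082


def predict_particle_size(s, c, t):
    """Chain Eqs. 4-15 for one set of normalized inputs."""
    return p_denorm(p_norm(s, c, t))


# Example: the midpoint of each normalized input range (assumed, illustration only).
print(predict_particle_size(0.0, 0.0, 0.0))
```

Raw process settings would first have to be scaled with Eq. 2 using the limits in Table 1 before being passed to `predict_particle_size`.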

Fig. 5 Plot comparing the particle size of BP-ANN-derived equation with the predicted particle size of BP-ANN

4.5 Comparison of ANN model with a multilinear regression model

Multilinear regression (MLR) analysis describes a linear relationship between one dependent variable and multiple independent variables. The relationship is derived by minimizing the sum of the squares of the differences between the observed and predicted values of the dependent variable. Generally, MLR can be represented as

$$Y = b_{o} + b_{1} x_{1} + b_{2} x_{2} + \cdots + b_{n} x_{n} + \lambda$$
(16)

where Y and x are the dependent and independent variables, respectively; \(b_{o}\) is the intercept; \(b_{1}\) to \(b_{n}\) are the coefficients of the independent variables; and \(\lambda\) is the error of the predictor.

Compared with multi-nonlinear regression (MNLR), several past studies have shown that there is no significant difference between the performances of both regression types [29,30,31,32,33], and in some cases, MLR outperforms the MNLR model [32, 33]. Analogous to the works of Enayatollahi et al. [34], Mehrdanesh et al. [35], and Akhlagi et al. [12], the performance of the ANN model in this work is compared with an MLR model.

Similar to the ANN model, the MLR model uses three independent variables [rotation speed (S), charge ratio (C), and milling time (t)] and one dependent variable, i.e., particle size (P). MLR analysis is performed with the aid of Microsoft Excel®, and the resulting model equation is presented as follows:

$$P = 304.6818 - 0.2489S - 3.7896C - 4.56855t$$
(17)

In Eq. 17, P is the particle size (µm), S is the rotation speed (rpm), C is the charge ratio, and t is the milling time (h).
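As a worked illustration, Eq. 17 can be evaluated directly. The function name `mlr_particle_size` is hypothetical, and the sketch assumes that a charge ratio of 10:1 enters the equation as C = 10.

```python
def mlr_particle_size(speed_rpm, charge_ratio, time_h):
    """Eq. 17: MLR estimate of particle size in micrometres."""
    return 304.6818 - 0.2489 * speed_rpm - 3.7896 * charge_ratio - 4.56855 * time_h


# 350 rpm, 10:1 charge ratio (assumed to enter as C = 10), 15 h of milling.
p = mlr_particle_size(350, 10, 15)
print(round(p, 2))  # 111.14
```

For comparison, the measured average particle size reported in Section 4.1 for 350 rpm, 10:1 BPR, and 15 h is 107.33 µm.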

The BP-ANN model is compared with the MLR model by means of two error measures, root-mean-square error (RMSE) and mean absolute error (MAE), according to Eqs. 18 and 19 below:

$${\text{RMSE}} = \sqrt {\left[ {\frac{1}{n}\mathop \sum \limits_{j = 1}^{n} \left( {P_{j} - P_{j}^{*} } \right)^{2} } \right]}$$
(18)
$${\text{MAE}} = \frac{1}{n}\mathop \sum \limits_{j = 1}^{n} \left| {P_{j} - P_{j}^{*} } \right|$$
(19)

where \(P_{j}\) and \(P_{j}^{*}\) are the measured and predicted values of particle size after MA.
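Eqs. 18 and 19 can be sketched as two short functions. The names `rmse` and `mae` and the sample values are hypothetical; the numbers are not data from this study.

```python
import math


def rmse(measured, predicted):
    """Eq. 18: root-mean-square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(measured, predicted)) / n)


def mae(measured, predicted):
    """Eq. 19: mean absolute error between measured and predicted values."""
    n = len(measured)
    return sum(abs(p - q) for p, q in zip(measured, predicted)) / n


# Hypothetical particle sizes in micrometres (illustration only).
measured = [100.0, 50.0, 20.0]
predicted = [97.0, 54.0, 20.0]

print(rmse(measured, predicted))  # sqrt((9 + 16 + 0) / 3)
print(mae(measured, predicted))   # (3 + 4 + 0) / 3
```

Because the errors are squared before averaging, RMSE penalizes large individual deviations more heavily than MAE, so RMSE is never smaller than MAE for the same data.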

The results show that the RMSE of the MLR model is 34.62 µm, which is considerably higher than that of the ANN model (8.4748 µm). A lower RMSE indicates a better fit. Similarly, the MAE of the ANN model is 4.19 µm, well below the 29.66 µm of the MLR model. Therefore, the ANN model outperforms the MLR model on both error measures.

Furthermore, the predictive abilities of the two models are presented in Fig. 6. The coefficient of correlation of the ANN model is 0.97, compared with 0.46 for the MLR model. This is also an indication that the ANN model is superior and more reliable compared with the MLR model. This agrees with the conclusions of earlier studies that ANN models outperform MLR models in predicting the particle size distribution of metallic powders [12] and rock mass fragmentation properties [35].

Fig. 6 Comparison of the predictive ability of the (a) BP-ANN and (b) MLR models

4.6 Sensitivity analysis

Sensitivity analysis is used to quantify how the output values of a model are influenced by changes in the input values of the model. In this work, the cosine amplitude technique (CAT) is used to measure the sensitivity of each input parameter in the form of strength values. The higher the strength value, the more influence an input parameter has on the output parameter. The expression for CAT is depicted in the following equation [36]:

$$r_{ij} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} Y_{i} } \right)}}{{\sqrt {\mathop \sum \nolimits_{i = 1}^{n} x_{i}^{2} \mathop \sum \nolimits_{i = 1}^{n} Y_{i}^{2} } }}$$
(20)

In Eq. 20, \(x_{i}\) is the input parameter, \(Y_{i}\) is the output parameter, and n is the number of observations for the parameters.
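The strength value in Eq. 20 is the cosine of the angle between the vector of observations of one input parameter and the vector of corresponding outputs. A minimal sketch is given below; the function name `cosine_amplitude` and the sample vectors are hypothetical and are not data from this study.

```python
import math


def cosine_amplitude(x, y):
    """Eq. 20: strength of relation between input observations x and outputs y."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Hypothetical observation vectors (illustration only).
print(cosine_amplitude([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # proportional vectors, close to 1
print(cosine_amplitude([1.0, 0.0], [0.0, 1.0]))            # orthogonal vectors, 0.0
```

In practice the function would be called once per input parameter (speed, charge ratio, time) against the particle size, and the resulting strength values compared.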

Results of the CAT analysis are depicted in Fig. 7. The highest strength value is exhibited by rotation speed, followed by charge ratio and milling time, in that order. This indicates that rotation speed is the most significant factor influencing the average particle size of AZ61 during MA, in good agreement with the results discussed for Fig. 2: a faster rotation of the mill transfers more energy to the powder and therefore yields greater particle size refinement [13].

Fig. 7 Strength of relationship (\(r_{ij}\)) between average particle size and each input parameter

5 Conclusions

An artificial neural network (ANN)-based model has been successfully developed for predicting the particle size of AZ61 magnesium alloy powder after the mechanical alloying process. The ANN model utilizes three input parameters (rotation speed, charge ratio, and milling time) to predict the average particle size. The developed model has an R value above 0.93, which indicates high predictive accuracy. Furthermore, a mathematical equation for predicting the particle size of the powder has been derived from the ANN model. This can help in optimizing the process, eliminating expensive and time-intensive experimentation, and minimizing the stochasticity of the process. A comparison of the ANN-based model with a multilinear regression model reveals the superiority and high predictability of the ANN model. Finally, sensitivity analysis reveals that rotation speed is the most significant parameter influencing particle size during MA. One limitation of this study is that the proposed model has not been applied to other metal/reinforcement systems due to limited data in the literature. In addition, like any other soft computing-based model, the proposed model is expected to perform well only for datasets that lie within the range of those used in this study and are obtained under similar experimental conditions; its predictions for data outside this range or from markedly different experimental conditions may not be reliable.