1 Introduction

Tribology is the science and technology of interacting surfaces in relative motion, covering friction, wear, and lubrication (Britain 1966; Dowson et al. 1998). Wear is the progressive loss of material from a solid surface, which reduces the original dimensions of the component. Since wear is a predominant failure mechanism for materials in sliding contact, knowing the wear rate under various operating conditions is crucial (Ashby et al. 1990). In steel, carbon is typically the element responsible for changing the material's characteristics: strength and hardness improve as the carbon content increases (Gupta 2013). Compared with conventional steels, hypereutectoid steels excel in strength, hardness, and wear resistance. Because of these mechanical characteristics, hypereutectoid steels have attracted considerable interest in several engineering fields, especially railway applications (Qiao et al. 2022; Hosmani et al. 2017; Wang et al. 1999; Liu et al. 2011).

Many researchers have studied wear in metals, non-metals, composites, and hybrid composites under different working environments. Wadsworth (1999) reports that ultrahigh-carbon steels (UHCS), hypereutectoid steels with 0.8 to 2% C, were developed at Stanford University. Their high carbon content is similar to that of Damascus steels and other ancient steels. Sato et al. (2007) found pro-eutectoid carbides in UHCS containing 1–2 wt.% C. Although brittle, these strong steels offer exceptional wear resistance. If the pro-eutectoid carbides at the grain boundaries are broken up and spheroidized, the steel becomes strong, ductile, and wear-resistant. Sasaki et al. (2006) described the relevance of the cementite volume fraction: steel characteristics can be altered by changing the shape, size, and content of cementite. Cementite is the hardest constituent and increases the wear resistance of steel. Luzginova et al. (2008) studied hypereutectoid steels with varying chromium concentrations. Gunduza et al. (2008) studied the dry sliding wear behavior of forging steels with varied microstructures and found that the microstructure affects the wear resistance of the steels.

Many tests are required to assess the wear resistance of tribological materials under various operating conditions, and these trials can be time-consuming. Thus, to reduce the number of tests and the expense of experimental investigations, there is increasing scope for ML algorithms that use experimental data to predict material wear behavior.

Machine learning (ML) methods have been studied extensively in recent years, and their impact is visible across a wide variety of disciplines. Shi et al. (2016) examined how the ML approach can be used in materials engineering as well as in the functional sector. Furthermore, numerous common themes emerge in the application of ML techniques, such as the prediction of material characteristics (Hulipalled et al. 2022), crystal structures (Tsutsui et al. 2019), and phase diagrams (Qiao et al. 2021), which greatly speed up the discovery of new materials through a data-driven research approach. Given sufficient data, ML methods typically perform well when modeling challenging multidimensional relationships between input and output variables.

For example, ML methods have been used to forecast the wear rate of a coated ferroalloy under different conditions of use; compared with linear regression and support vector machine, Gaussian process regression was found to be the most effective (Altay et al. 2020). Using an artificial neural network, Capitanu et al. (2019) investigated the wear behavior of plastic material reinforced with short glass fiber sliding against a steel surface.

Based on the literature survey, no studies were found on the wear performance of hypereutectoid steels using the ML approach. Predicting the wear rate is essential for anticipating wear-induced failure of steel rails in railway tracks. Hence, the current study aims to predict the wear rate of hypereutectoid steels (0.92 wt.% carbon and 1.57 wt.% carbon) under dry sliding conditions using machine learning methods, namely linear regression (LR), K-nearest neighbor (KNN), support vector machine (SVM), Gaussian process regression (GPR), random forest (RF), AdaBoost (AdaB), and gradient boosting (GB).

2 Methodology

This section describes the data collection methodology and the design of the ML regression models used to predict the wear performance of hypereutectoid steels. Both topics are discussed in detail in the following subsections.

2.1 Data acquisition

Two samples of hypereutectoid steels were used in the present study, one with 0.92 weight percent carbon (C) and the other with 1.57 weight percent (C), as indicated in Table 1. The methodology of sample preparation, wear test experimental details, and microstructures are published in previous work (Sharanabasappa et al. 2014; 2015a). The two datasets pertaining to the wear loss of hypereutectoid steels were compiled by conducting 48 wear test trials with a variety of input parameters, as detailed in Table 2.

Table 1 Hypereutectoid steel composition (wt. percent)
Table 2 Parameters of the wear test

2.2 Machine learning algorithm

Regression analysis is one of the most important ideas in ML. It comprises a group of techniques that predict the value of a continuous output parameter from the input parameters. Before implementing the ML models, the datasets were pre-processed using the StandardScaler class from the sklearn.preprocessing module, which standardizes each feature to zero mean and unit standard deviation. The supervised ML models described below were used for regression analysis to predict the wear rate of hypereutectoid steels. The ML algorithms were trained on the experimental results for the two hypereutectoid steels (0.92 percent C and 1.57 percent C), with sliding distance, normal pressure, and sliding speed as input factors and wear rate as the output parameter. The ML algorithms were scripted in Python, and the dataset was divided into training and test sets in a 70:30 ratio (Specht et al. 1991; Scherbela et al. 2018).
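The pre-processing and splitting steps described above can be sketched as follows. This is a minimal illustration, not the authors' script: the feature values are synthetic placeholders rather than the measured wear data, the random seed is arbitrary, and fitting the scaler on the training portion only is an assumption, since the text does not state whether scaling was applied before or after the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data (NOT the measured wear data): 48 trials with
# three inputs: sliding distance (m), normal pressure (MPa), sliding speed (m/s).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(1000.0, 10000.0, 48),
    rng.uniform(0.1249, 0.8743, 48),
    rng.uniform(1.0, 7.0, 48),
])
y = rng.uniform(0.0, 1.0, 48)  # placeholder wear rate values

# 70:30 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Standardize each feature to zero mean and unit standard deviation.
# Assumption: statistics are estimated on the training set only and then
# reused for the test set.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

print(X_train_std.shape, X_test_std.shape)
```

Estimating the scaling statistics on the training portion alone keeps information from the test set out of the model inputs; scaling the full dataset first is the other common reading of the description.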

2.2.1 Linear regression

The linear regression method is the most basic type of regression algorithm for determining the relation between the dependent variable(s) and the independent variable(s) (Algur et al. 2021).

2.2.2 K-Nearest Neighbor (KNN)

K-Nearest Neighbors (KNN) is a non-parametric method that makes feature-based predictions by computing the Euclidean distance (\({d}_{i}\)) between each point of the specified test dataset and the points of the training dataset (Wan et al. 2021).
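As an illustration of the distance-based idea, the following sketch predicts a value by averaging the targets of the k nearest training points. The function names, the toy data, and the unweighted averaging rule are assumptions made for illustration; the weighting scheme used in the study is not stated.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance d_i between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    distances = sorted(
        (euclidean_distance(row, query), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / k

# Toy example with made-up numbers (not wear data).
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 10.0]
prediction = knn_predict(train_X, train_y, query=(0.1, 0.1), k=3)
print(prediction)
```

Because the distance sums squared differences across features, features on very different scales should be standardized first so that no single feature dominates.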

2.2.3 Support vector machine (SVM)

To determine the relationship between the output and the input data, the SVM algorithm uses a regression technique (Gunn 1997). With a linear kernel function, the algorithm finds the best-fitting hyperplane through the scattered data points.

2.2.4 Gaussian Process Regression (GPR)

Prediction values and uncertainty estimates can be calculated using a regression model based on Gaussian processes. Kernel functions allow for the incorporation of pre-existing knowledge regarding the shape of functions (Wang and Tao 2015; Daemi et al. 2019).

2.2.5 Random Forest (RF)

The supervised learning method known as RF employs a large number of decision trees. The foundational idea behind RF is to combine the outputs of several decision trees to determine the final output, rather than relying on the result of an individual tree. Ensemble learning, that is, combining numerous learners to address a complicated problem and boost the model's performance, is the foundation of this approach (Aye and Heyns 2017; Breiman 2001).

2.2.6 AdaBoost (AdaB)

Adaptive boosting (AdaBoost) is a well-known machine learning algorithm introduced by Freund and Schapire that works on the concept of boosting. It is an ensemble learning algorithm (James et al. 2013); ensemble methods fall into two classes, bagging-based and boosting-based, which mainly reduce variance and bias, respectively. The boosting step is repeated iteratively until all the samples are classified correctly.

The performance of the stump is calculated by the equation,

$$\mathrm{Performance }= \frac{1}{2}{\mathrm{log}}_{\mathrm{e}}\left(\frac{1-\mathrm{TE}}{\mathrm{TE}}\right)$$
(1)

where TE is the total error, i.e. the sum of the sample weights of all incorrectly classified samples.
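A short numerical sketch of Eq. (1) shows how the performance (the say of a stump in the final vote) varies with the total error. The function name and the small clipping constant, added only to avoid division by zero at TE = 0 or TE = 1, are assumptions for illustration.

```python
import math

def stump_performance(total_error, eps=1e-10):
    """Eq. (1): performance = 0.5 * ln((1 - TE) / TE).

    The clipping with eps is an illustrative safeguard, not part of Eq. (1).
    """
    te = min(max(total_error, eps), 1.0 - eps)
    return 0.5 * math.log((1.0 - te) / te)

for te in (0.1, 0.25, 0.5, 0.75):
    print(te, round(stump_performance(te), 4))
```

A stump with TE below 0.5 receives a positive performance that grows as the error shrinks, a stump no better than chance (TE = 0.5) receives zero, and a stump with TE above 0.5 receives a negative value.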

2.2.7 Gradient Boost (GB)

The Gradient Boosting Machine, developed by Friedman, trains many models in a gradual, additive, and sequential manner (Zhao et al. 2019). Its goal is to reduce the model's loss by adding weak learners using a gradient descent-like procedure. To forecast the outcome, it employs several additive functions:

$$\overline{{Z }_{i}}={{Z}_{i}}^{0}+\eta \sum_{c=1}^{m}{f}_{c}\left({X}_{i}\right)$$
(2)

where \(\overline{{Z }_{i}}\) is the prediction for the ith trial, \({X}_{i}\) is the feature vector, \(m\) is the number of estimators, each estimator \({f}_{c}\) corresponds to an independent tree structure, \({{Z}_{i}}^{0}\) is the initial prediction, taken as the mean of the measured values in the training set, and \(\eta\) is the learning rate (Lee et al. 2020).
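The additive structure of Eq. (2) can be made concrete with a deliberately small sketch. It assumes a squared-error loss, a single input feature, and one-split trees (stumps) as the estimators; none of these choices is taken from the study, and the data are made-up numbers.

```python
def fit_stump(xs, residuals):
    """Least-squares stump on one feature; needs >= 2 distinct x values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_gradient_boost(xs, zs, m=100, eta=0.1):
    """Return Z^0 (training mean) and the m estimators f_c of Eq. (2)."""
    z0 = sum(zs) / len(zs)
    trees = []
    preds = [z0] * len(xs)
    for _ in range(m):
        # For squared-error loss the negative gradient is the residual.
        residuals = [z - p for z, p in zip(zs, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + eta * tree(x) for p, x in zip(preds, xs)]
    return z0, trees

def predict(x, z0, trees, eta=0.1):
    """Eq. (2): Z_i = Z^0 + eta * sum_c f_c(X_i)."""
    return z0 + eta * sum(tree(x) for tree in trees)

# Made-up one-feature example (not wear data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
zs = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
z0, trees = fit_gradient_boost(xs, zs, m=100, eta=0.1)
mse_mean = sum((z - z0) ** 2 for z in zs) / len(zs)
mse_boost = sum((z - predict(x, z0, trees)) ** 2 for x, z in zip(xs, zs)) / len(zs)
print(mse_mean, mse_boost)
```

Each round fits a stump to the current residuals and adds a fraction \(\eta\) of its output, so a smaller learning rate needs more estimators to reach the same training error.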

Finally, effective ML models were constructed using a k-fold cross-validation method with a fold value of 10 and three assessment criteria: R-squared (R2), mean absolute error (MAE), and root mean square error (RMSE). Table 3 provides the parameter settings used to optimize the ML model performances.
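A minimal sketch of such an evaluation with scikit-learn is given below. The data are synthetic placeholders, the gradient boosting model uses default settings rather than the settings of Table 3, and the shuffling and random seeds are assumptions; only the 10 folds and the three metrics follow the description above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data (NOT the measured wear data).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(1000.0, 10000.0, 48),  # sliding distance (m)
    rng.uniform(0.1249, 0.8743, 48),   # normal pressure (MPa)
    rng.uniform(1.0, 7.0, 48),         # sliding speed (m/s)
])
y = 0.5 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(0.0, 0.02, 48)

# Scaling sits inside the pipeline so it is refitted on each training fold.
model = make_pipeline(StandardScaler(), GradientBoostingRegressor(random_state=0))
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scoring = ("r2", "neg_mean_absolute_error", "neg_root_mean_squared_error")
scores = cross_validate(model, X, y, cv=cv, scoring=scoring)

# scikit-learn reports errors as negated scores, so the sign is flipped back.
r2 = scores["test_r2"].mean()
mae = -scores["test_neg_mean_absolute_error"].mean()
rmse = -scores["test_neg_root_mean_squared_error"].mean()
print(f"R2={r2:.3f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
```

With only 48 samples each fold holds four or five test points, so per-fold R2 values can fluctuate considerably; the fold averages are the more meaningful quantities.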

Table 3 Parameter settings of ML models

3 Results and discussion

3.1 Wear rate

Initial wear tests were conducted at normal pressures from 0.1249 to 0.8743 MPa (in 0.2498 MPa increments) and sliding speeds from 1 to 7 m/s (in 2 m/s increments) over a total sliding distance of 10,000 m. The wear rate (mm3/m) was determined from the volume lost during sliding per unit sliding distance (Algur et al. 2017). The wear rate increased with normal pressure for all specimens (Fig. 1a, b). Increasing the sliding speed from 1 to 3 m/s reduces the wear rate, while speeds above 3 m/s increase it at all normal pressures. At a sliding speed of 3 m/s, the wear rate remains essentially constant across all sliding distances as the normal pressure increases. The wear rate is greatest under the most severe operating condition of 7 m/s and 0.8743 MPa. The critical sliding speed for hypereutectoid steels is thus determined to be 3 m/s.
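For orientation, the pressure and speed levels implied by the stated ranges and increments can be enumerated as below. The variable names are illustrative. The sliding distance levels are not listed in this section and are therefore not enumerated.

```python
from itertools import product

# Levels implied by the stated ranges and increments.
pressures_mpa = [round(0.1249 + i * 0.2498, 4) for i in range(4)]
speeds_m_per_s = [1 + 2 * i for i in range(4)]

# Every pressure/speed combination.
conditions = list(product(pressures_mpa, speeds_m_per_s))

print(pressures_mpa)
print(speeds_m_per_s)
print(len(conditions))
```

This gives four pressure levels and four speed levels, i.e. 16 pressure and speed combinations per steel.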

Fig. 1 Wear rates of hypereutectoid steels: (a) 0.92 wt.% C, (b) 1.57 wt.% C

3.2 Worn surface morphology

For ferrous materials, the formation of the oxide layer usually starts at 3 m/s, and work hardening of the wearing surface begins as a result of plastic deformation. These oxide layers and hardened surfaces usually prevent intimate contact and adhesion during sliding, giving a transition from severe to mild wear. In such situations, free wear debris particles gather in the grooves of the contact surface.

Smearing of the worn surface is important because it aids this transition. The SEM micrograph in Fig. 2a shows the worn surface of sample 1 at a normal pressure of 0.8743 MPa and a sliding speed of 3 m/s. In this instance, oxide forms on the surface, and the surface is flattened by the high normal pressure. Spalling occurs when cracks reach a certain size, separating considerable amounts of material. Figure 2b shows the typical size and shape of the debris, and the corresponding EDS is shown in Fig. 2c to illustrate the damage to the contacting surfaces.

Fig. 2 Hypereutectoid steel with 0.92 wt.% C at a sliding speed of 3 m/s and a normal pressure of 0.8743 MPa: (a) SEM micrograph, (b) wear debris, (c) EDS

Two-body abrasive wear occurs when an asperity welds to the sliding disc and acts as an abrasive particle on the worn surface. Figure 3a shows a groove on the wearing surface caused by the welded asperity, and Fig. 3b shows fractured wear debris produced as the asperity cuts the specimen. Figure 3c shows the corresponding EDS.

Fig. 3 Hypereutectoid steel with 1.57 wt.% C at a sliding speed of 3 m/s and a normal pressure of 0.8743 MPa: (a) SEM micrograph, (b) wear debris, (c) EDS

3.3 Wear rate prediction by machine learning

Forty-eight experimental trials were conducted on each of the 0.92% C and 1.57% C hypereutectoid steel samples, with three varying input parameters, and the results were used to train the ML algorithms. To visualize and analyze the data for all the ML algorithms considered, graphs were constructed for the 0.92% C and 1.57% C hypereutectoid steels (panels a and b of Figs. 4, 5, 6, 7, 8, 9 and 10, respectively). In these graphs, the training and test data are shown as red and blue scatter points, the predicted training and test data are shown as red and blue dashed lines, and the black solid line indicates the fit line of the model obtained from the underlying ML algorithm.

Fig. 4 Linear Regression to predict the wear rate of (a) 0.92% C and (b) 1.57% C hypereutectoid steels, (c) residual of experimental data and prediction, (d) R-squared value of training and test data

Fig. 5 K-Nearest Neighbor to predict the wear rate of (a) 0.92% C and (b) 1.57% C hypereutectoid steels, (c) residual of experimental data and prediction, (d) R-squared value of training and test data, (e) R-squared values with varying neighbor values (k)

Fig. 6 SVM to predict the wear rate of (a) 0.92% C and (b) 1.57% C hypereutectoid steels, (c) residual of experimental data and prediction, (d) R-squared value of training and test data

Fig. 7 GPR to predict the wear rate of (a) 0.92% C and (b) 1.57% C hypereutectoid steels, (c) residual of experimental data and prediction, (d) R-squared value of training and test data

Fig. 8 Random Forest to predict the wear rate of (a) 0.92% C and (b) 1.57% C hypereutectoid steels, (c) residual of experimental data and prediction, (d) R-squared value of training and test data

Fig. 9 AdaBoost to predict the wear rate of (a) 0.92% C and (b) 1.57% C hypereutectoid steels, (c) residual of experimental data and prediction, (d) R-squared value of training and test data

Fig. 10 Gradient Boost to predict the wear rate of (a) 0.92% C and (b) 1.57% C hypereutectoid steels, (c) residual of experimental data and prediction, (d) R-squared value of training and test data

Scatter plots were constructed to visualize and compare the residuals obtained for the hypereutectoid steels with the various ML algorithms, where green data points indicate the residuals for 0.92% C and red data points those for 1.57% C (panel c of Figs. 4, 5, 6, 7, 8, 9 and 10). Bar charts were also constructed to compare the R-squared values of the training and test datasets for both hypereutectoid steels across all the constructed models (panel d of Figs. 4, 5, 6, 7, 8, 9 and 10).

Linear regression fits a linear relationship between the input features and the output for both hypereutectoid steels; the fits and their prediction accuracy are shown in Fig. 4a, b. For both the 0.92% C and 1.57% C hypereutectoid steels, most of the training data points deviate noticeably from the corresponding training predictions. Figure 4c shows that the residuals follow a normal distribution and are randomly distributed on either side of the zero line.

For datasets with linear relationships, the LR technique can produce reasonable estimates at minimal computational cost (Kong et al. 2018). The R-squared value for the 0.92% C hypereutectoid steel is lower than that for the 1.57% C steel (Fig. 4). This could be attributed to the rise in carbon content, which raises the material's hardness (Sharanabasappa et al. 2015b).

Figure 5a, b shows the predicted wear rate against the experimentally measured data for both hypereutectoid steels using the KNN algorithm. Changes in the K value alter the predictive performance of the KNN method, since different neighbors contribute to each prediction. Figure 5e illustrates the variation in the training and test R2 values for both hypereutectoid steels with the K value. The KNN performance is particularly sensitive to the choice of K, and a larger K value does not always imply a higher prediction accuracy; for example, in the case of 0.92 percent C, the prediction accuracy appears to become worse. This means that the best K value will vary depending on the material and the size of the dataset (Ahmad et al. 2017).

The SVM kernel function was used to estimate the wear rate of hypereutectoid steels, and the estimates are compared with the measured values in Fig. 6a, b. Figure 6d shows that the accuracy of the SVM algorithm is higher for the 1.57% C hypereutectoid steel when the polynomial kernel function is used. This could be attributed to the increase in carbon content, which increases the hardness of the material.

Wear rates predicted by the GPR algorithm and the experimental data for the two hypereutectoid steels are shown in Fig. 7a, b, respectively. Compared with the LR, KNN, and SVM algorithms (Figs. 4, 5, and 6), GPR produces better predictions, possibly because of the nonparametric nature of the GPR method.

Figure 8a, b shows the predicted wear rate of each hypereutectoid steel using the RF algorithm. These results reveal that the Random Forest algorithm outperforms the LR, KNN, SVM, GPR, and AdaB algorithms in terms of prediction accuracy against the experiments (Figs. 4, 5, 6, 7, and 9).

Figure 9a, b shows the predicted wear rate of each hypereutectoid steel using the AdaB algorithm. According to these results, the AdaBoost algorithm achieves higher prediction accuracy than the LR, KNN, SVM, and GPR algorithms (Figs. 4, 5, 6, and 7).

Figure 10a, b shows the predicted wear rate of hypereutectoid steels obtained with the Gradient Boost algorithm. The wear rate predicted by the Gradient Boost algorithm is the best fit for both hypereutectoid steels among the methods used in this investigation (Figs. 4, 5, 6, 7, 8, 9 and 10).

Based on the data from Figs. 4, 5, 6, 7, 8, 9 and 10, Table 4 compares the accuracy of the seven ML algorithms in predicting wear rates. The LR algorithm has traditionally been used to determine whether or not an input–output relationship is linear. For the LR model, the MAE values for 0.92% C/1.57% C are 0.528/0.508 and the RMSE values are 0.661/0.599 (Table 4).

Table 4 Performance of ML algorithms

The wear rate prediction accuracy of the KNN method is greater than that of the LR approach (Fig. 5 and Table 4). In contrast to the LR method, the KNN method makes no assumptions about the input data and is less susceptible to outliers, resulting in higher prediction accuracy.

The SVM approach is superior to the LR and KNN models for predicting quasi-data, as it can handle more than two predictor variables (Table 4). With the SVM algorithm, there is a relatively low chance that some data may be omitted from the test set (Krishnan et al. 2018).

The GPR model's prediction of the wear rate is better than that of LR, KNN, and SVM because the training datasets are assumed to follow a normal distribution with a known mean and standard deviation (Smola et al. 2004). In addition, the GPR method is a nonparametric process unaffected by constraints in the structure of the dataset, and hence provides reliable predictions of wear rate for the existing datasets.

The AdaBoost algorithm predicts the wear rate better than the LR, KNN, GPR, and SVM models because the weights of the instances are assigned based on the error of the most recent prediction.

The RF algorithm has the highest prediction accuracy for the wear rate of hypereutectoid steels among the LR, KNN, SVM, GPR, and AdaB algorithms, as illustrated in Fig. 8 and Table 4. This is attributed to its reduced tendency to overfit, since random subsets of the features are used to build a diverse set of trees.

The GB algorithm has the highest prediction accuracy for the wear rate of the 0.92% C and 1.57% C hypereutectoid steels among all seven ML algorithms, as illustrated in Fig. 10 and Table 4. The outcome of the GB approach is determined not by a single decision tree, but by a collection of trees.

4 Conclusion

In the present work, seven ML algorithms were applied to the dataset of experimentally measured wear rates of hypereutectoid steels (0.92% C and 1.57% C), with sliding speed, normal pressure, and sliding distance as input parameters. The effectiveness of the algorithms was discussed and ranked in order of prediction accuracy: GB, RF, AdaB, GPR, SVM, KNN, and LR. The Gradient Boost algorithm predicts the wear rate of hypereutectoid steels with the highest accuracy, with test R2 values of 0.972 and 0.991 for 0.92% C and 1.57% C, respectively. GB accomplishes this by gradually, additively, and sequentially training many models, adding weak learners in a gradient descent procedure to reduce the loss function of the model. These results also indicate that ML models with suitable hyperparameter settings can work well on small datasets with high prediction accuracy. Worn surfaces and debris show an oxidative wear mechanism for specimen 1 (0.92 wt.% C) and an abrasive wear mechanism for specimen 2 (1.57 wt.% C). These findings could aid future studies in predicting the wear rate of hypereutectoid steel materials, perhaps speeding up the development of new functional steels with controlled wear behavior.