1 Introduction

Increasing automation has made modern industrial systems more complex and raised the standards for precision, which in turn has increased demand for efficiently run mechanical equipment. Health monitoring of complicated machinery is therefore an essential but costly endeavour. Rolling element bearings (REBs) are an integral component of widely used industrial equipment and largely determine the equipment's service life. Over its lifespan, equipment loses durability and performance while its failure risk increases. A condition-based maintenance scheme should therefore be an integral part of any preventative maintenance plan [1]. Excessive vibration is the most common reason for system failure, and the most common source of vibration is the presence of a fault. Local and widespread defects in the machine may lead to catastrophic failure. Bearing failure in heavy rotating machines or assembly lines can result in a shutdown, affecting the overall cost and quality of the product. Recently, substantial research has been conducted on defect classification that does not disrupt the machine's normal operation. Such classification facilitates maintenance planning and mitigates the effects of REB failures.

Over the last few decades, numerous research articles have been published on detecting and diagnosing rotor-bearing system faults; some of the essential findings are summarized below. Fan et al. [2] implemented a support vector machine (SVM) built with fusion and particle swarm optimization techniques. Empirical mode decomposition (EMD) characteristics are combined with the excellent generalization properties of the multi-kernel least square support vector machine (MK-LS-SVM). The results demonstrate its efficacy in detecting and classifying bearing faults. Kedadouche et al. [3] presented a hybrid technique to improve the categorization and diagnosis of rolling bearing degradation. The approach combines wavelet packet transform, singular value decomposition, and SVM. According to the test findings, the suggested methodology accurately classifies rolling bearing defects. Guo et al. [4] studied an envelope spectrum analysis based on the Hilbert transform that derives fault features from rolling bearing vibration signals. An SVM then analyzes the vibration data from faulty rolling bearings (ball fault, inner race fault and outer race fault). The study's findings demonstrate that the method offers precise diagnosis and a high level of diagnostic resolution. Xu et al. [5] explored a fault identification method that combines a permutation entropy-based technique, support vector machines, and particle swarm optimization. According to the experimental findings, the compound multiscale permutation entropy model has a greater classification accuracy than multiscale permutation entropy. Lin et al. [6] developed a collection of fault diagnostic models based on several signal processing approaches. The support vector machine was trained and tested using the collected eigenvalues and ball-bearing status categories. Finally, the artificial fish-swarm technique was used to find the best classifier settings, improving the accuracy of fault classification. Kankar et al. [7] studied artificial neural network (ANN) and support vector machine (SVM) methods for automatic ball bearing fault diagnosis. Gangsar and Tiwari [8] predicted induction motor faults with a support vector machine by monitoring current and vibration signals.

Goyal et al. employed the wavelet transform, the Mahalanobis distance method, and support vector machine [9] and KNN [10] classifiers for vibration signal analysis, feature selection, and classification of bearing defects. Lei et al. [11] studied weighted KNN (WKNN) for fault identification in rotating bearings to overcome a flaw in KNN. The results showed good accuracy in identifying bearing faults together with their severity. Kumar [12] used empirical mode decomposition (EMD) to decompose vibration signals into intrinsic mode functions (IMFs). Statistical features extracted from the raw signal were given as input to the classifier, and both KNN and weighted k-nearest neighbor (WKNN) classifiers were used for fault identification and classification. Rathore and Harsha [13] evaluated the remaining operational life of a bearing and compared it with the experimentally obtained actual life. Wan et al. [14] developed a fault diagnosis system using Teager energy entropy and improved fuzzy C-means clustering. The mean-shift approach establishes the initial clustering centers to improve fuzzy C-means clustering performance and reduce misclassification. Yao et al. [15] implemented a stacked inverted residual convolution neural network (SIRCNN) because of its inherent advantages and better accuracy over conventional machine learning methods. Udmale et al. [16] generated a kurtogram from a vibration signal and implemented a robust learning machine algorithm.

Liang et al. [16] implemented WT-GAN-CNN, which combines the wavelet transform (WT), a generative adversarial network (GAN), and a convolutional neural network (CNN), for fault diagnosis in rotating machinery. The proposed approach shows better accuracy than other machine learning approaches.

Jalan and Mohanty [17] noted that rotor unbalance and shaft misalignment are the primary vibration sources in rotating machines. The authors studied a model-based technique for fault diagnosis of the rotor-bearing system. Comparing the residual forces with the equivalent theoretical forces successfully identifies the fault location and condition.

Li et al. [18] studied a method in which the time-domain signal is split using a Hankel matrix and reconstructed using the spectrum fusion method. Tiwari [19] found that higher harmonics and related sidebands occur as bearing defects grow in size. The response has very high-frequency components due to micro-irregularities on the bearing contact surface.

Shinde and Desavale [20, 21] applied a dimensional analysis (DA) method to model rotor-bearing vibration, and SVM was used for fault classification of the rotor-bearing system. Kumbhar et al. [22,23,24,25] highlighted the importance of Buckingham's pi theorem of DA, an experimental approach, and an adaptive neuro-fuzzy inference system (ANFIS) for identifying rotor-bearing vibrations. Patil and Jadhav [26] reported an experimental investigation with response surface methodology (RSM) and DA that demonstrates the methodology's efficacy for condition monitoring of rotating machinery in industry.

The earlier findings show that most research works extract statistical features from a vibration signal and post-process them using various machine learning techniques. Considering extensive, overlapping statistical features and redundant data reduces fault classification accuracy. Hence, the current work considers bearing characteristic frequencies as the feature set for KNN. This consideration may improve fault identification accuracy for low-speed applications. This paper aims to identify and classify the faults present in the rotor-bearing system using KNN based on extracted bearing characteristic frequency sets.

The paper is structured as follows. Section 1 contains the introduction, with a summary of the literature's findings and a description of the gaps. Section 2 describes the KNN classifier methodology with feature selection and extraction. Section 3 presents the experimental setup and experimentation. Section 4 describes the KNN implementation with bearing characteristic features, and Sect. 5 concludes the paper.

2 KNN classifier methodology

Figure 1 shows the flowchart of the proposed method for bearing fault diagnosis. The KNN algorithm can solve both classification and regression problems.

Fig. 1 KNN methodology

KNN is a supervised machine learning algorithm. Classification divides the population of data points into groups, called classes, so that data points in the same group have more features in common than data points in other groups. In this study, the KNN algorithm uses the Euclidean, cosine, and city block distance metrics to determine the closest data class. The main advantage of the KNN algorithm is its simplicity and ease of comprehension. The detailed flowchart of the KNN implementation is shown in Fig. 2.

Fig. 2 KNN implementation

KNN performs well when a large dataset is used. The algorithm can classify the items in a data collection and produce excellent results. The proposed approach classifies new data points by comparing their similarity and proximity to the points in the training dataset.

The K nearest neighbors of a new data point are found in the training dataset, and the class held by the majority of those neighbors is assigned to the new point. Figure 3 visualizes the characteristic amplitudes of various faulty bearings used for KNN classifier training with known fault classes.
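The neighbor search and majority vote described above can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB implementation: the function names, the two-feature amplitude values, and the class labels are hypothetical, and k = 3 is chosen only to mirror the three-neighbor visualization.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority class among the k training rows nearest to query."""
    # Pair every training row's distance to the query with its class label.
    ranked = sorted(
        (euclidean(row, query), label) for row, label in zip(train_x, train_y)
    )
    # Keep the k closest rows and count how often each class appears.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature amplitudes (mm/s); not taken from the paper's tables.
train_x = [[0.2, 0.1], [0.3, 0.2], [2.1, 1.9], [2.3, 2.0], [1.0, 3.0]]
train_y = ["healthy", "healthy", "inner race", "inner race", "outer race"]

print(knn_predict(train_x, train_y, [2.0, 2.1], k=3))
```

With these toy values, two of the three nearest rows belong to the same class, so that class is returned even though the third neighbor disagrees.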

  • Application of distance metrics: It is essential to select the distance metric best suited to the available dataset so that the algorithm can function optimally. The current study implements the Euclidean, cosine and city block distance metrics. Of these, the Euclidean distance metric is the most widely used.

Fig. 3 Three nearest neighbors visualization

The details of all three metrics are listed below.

2.1 Euclidean distance method

The straight-line distance between two data points is known as the Euclidean distance, as shown in Fig. 4. When the dataset is organized into distinct groups, Euclidean distance produces good classification results. When the features of the data points are not on a uniform scale, normalization is required.

Fig. 4 Euclidean distance method

For \(i = 1 \text{ to } n\)

$${\text{d}}\left( {x, y} \right) = \sqrt {\mathop \sum \limits_{i = 1}^{n} \left( {x_{i} - y_{i} } \right)^{2} }$$
(1)

where \(x\) and \(y\) are the feature vectors (rows) of a training sample and a test sample, and \(n\) is the number of features.
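As a small sketch of Eq. (1), the distance can be computed directly from two feature rows. The min-max rescaling helper is an assumption added to illustrate the normalization remark; the paper does not state which normalization scheme was used, and both function names are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Eq. (1): square root of the summed squared feature differences."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.

    Assumed normalization scheme, shown only as one common option.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(min_max_scale([[1, 100], [2, 300], [3, 200]]))
```

Rescaling first keeps a feature with a large numeric range from dominating the squared differences.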

2.2 Cosine distance method

The cosine distance method is appropriate when dataset groups are oriented in a particular direction. The primary purpose of this distance measure is to determine how similar two vectors are to one another. The cosine of the angle between two vectors is used to determine whether or not they point in the same direction, as shown in Fig. 5.

$$\cos \theta = \frac{{\vec{\text{A}} \cdot \vec{\text{B}}}}{{\left\| {\vec{\text{A}}} \right\|\left\| {\vec{\text{B}}} \right\|}}$$
(2)

Fig. 5 Cosine distance method

This formula yields the cosine similarity between the two vectors, whereas \(1 - \cos \theta\) gives their cosine distance. Cosine similarity is useful when two data objects are far apart in magnitude but separated by a small angle. In a multi-dimensional space, cosine similarity captures the angle of orientation, not the magnitude. The larger the angle, the lower the similarity.
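The relationship between cosine similarity and cosine distance can be illustrated with a short sketch. The function names and the example vectors are hypothetical; the zero-vector check is an added safeguard, since the angle is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot product over norm product."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined as one minus the cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Same direction, very different magnitude: distance is close to zero.
print(cosine_distance([1.0, 2.0], [10.0, 20.0]))
# Perpendicular vectors: similarity 0, distance 1.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

The first pair shows why the metric ignores magnitude: scaling a vector by ten leaves the angle, and therefore the distance, essentially unchanged.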

2.3 City block method

In the city block method, the distance between two points is the sum of the absolute differences of their Cartesian coordinates, as shown in Fig. 6.

Fig. 6 City block method

For \(i = 1 \text{ to } n\)

$$d \left( {x, y} \right) = \mathop \sum \limits_{i = 1}^{n} \left| {x_{i} - y_{i} } \right|$$
(3)

where \(x\) and \(y\) are the feature vectors (rows) of a training sample and a test sample, and \(n\) is the number of features.

For high-dimensional datasets, the city block distance metric is often favored over the Euclidean distance metric because of the 'curse of dimensionality'.
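Equation (3) translates directly into code. The sketch below uses a hypothetical function name and example points, chosen so that the result can be compared with the Euclidean distance between the same pair.

```python
def city_block_distance(x, y):
    """Eq. (3): sum of absolute feature differences (Manhattan distance)."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


# For the points (0, 0) and (3, 4) the Euclidean distance is 5,
# while the city block distance adds the two axis offsets.
print(city_block_distance([0.0, 0.0], [3.0, 4.0]))
```

Because no differences are squared, a single large coordinate offset weighs less heavily here than it does in the Euclidean metric.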

2.4 Feature selection

Excessive redundant information and noise reduce the capability of machine learning algorithms to process the data. Several features have been extracted to improve classification accuracy and are supplied as input to the machine learning algorithm.

2.4.1 Bearing characteristics amplitude extraction

The amplitudes of bearing characteristic frequencies captured from vibration signals have physical significance for the degradation of bearing components. Statistical features such as kurtosis, skewness, mean, and standard deviation have been extracted from time- and frequency-domain vibration signals, and various bearing faults can be identified from such feature values. Bearing fundamental frequencies are calculated from the bearing dimensions and shaft speed. The characteristic amplitudes of the filtered signal are extracted from the envelope spectrum at these fundamental frequencies and depict the bearing condition. The present study compares the accuracy obtained with the two types of features.

The following equations are used to calculate the fundamental frequencies:

Outer race ball pass frequency (BPFO) = \(\frac{Z}{2}\frac{{N_{{\text{s}}} }}{60}\left( {1 - \frac{{D_{{\text{b}}} }}{{d_{m} }}\cos \alpha } \right)\)

Inner race ball pass frequency (BPFI) = \(\frac{Z}{2}\frac{{N_{{\text{s}}} }}{60}\left( {1 + \frac{{D_{{\text{b}}} }}{{d_{m} }}\cos \alpha } \right)\)

Rolling element defect frequency or ball spin frequency (BSF) = \(\frac{{d_{m} }}{{D_{{\text{b}}} }}\frac{{N_{{\text{s}}} }}{60}\left\{ {1 - \left( {\frac{{D_{{\text{b}}} }}{{d_{m} }}} \right)^{2} \cos^{2} \alpha } \right\}\)

Shaft spin frequency (SSF) = \(\frac{{N_{{\text{s}}} }}{60}\)

where \(D_{{\text{b}}}\) = ball diameter, \(d_{m}\) = pitch diameter of the bearing, \(N_{{\text{s}}}\) = shaft speed (rpm), Z = number of rolling elements, and α = contact angle between ball and race.
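The four expressions above can be evaluated with a short helper. This is a sketch under stated assumptions: the function name is hypothetical, and the geometry in the example (9 balls, 12 mm ball diameter, 60 mm pitch diameter) consists of round placeholder values, not the 6209 bearing data of Table 2. The BSF line follows the formula exactly as written in this section; many references define the ball spin frequency with the factor \(d_m / (2 D_b)\) instead, in which case the form used here corresponds to the rolling element defect frequency.

```python
import math


def bearing_frequencies(z, shaft_rpm, ball_dia, pitch_dia, contact_angle_deg=0.0):
    """Return the characteristic frequencies in Hz for the given geometry.

    z: number of rolling elements; ball_dia and pitch_dia share one length
    unit; shaft_rpm is the shaft speed in revolutions per minute.
    """
    fs = shaft_rpm / 60.0                       # shaft spin frequency (SSF)
    ratio = ball_dia / pitch_dia                # D_b / d_m
    c = math.cos(math.radians(contact_angle_deg))
    return {
        "SSF": fs,
        "BPFO": z / 2.0 * fs * (1.0 - ratio * c),
        "BPFI": z / 2.0 * fs * (1.0 + ratio * c),
        # As written in this section (no factor of 1/2 in front).
        "BSF": (pitch_dia / ball_dia) * fs * (1.0 - (ratio * c) ** 2),
    }


# Placeholder geometry for illustration only.
freqs = bearing_frequencies(z=9, shaft_rpm=900, ball_dia=12.0, pitch_dia=60.0)
for name, value in freqs.items():
    print(f"{name}: {value:.2f} Hz")
```

A useful sanity check on any implementation is that BPFO and BPFI always sum to Z times the shaft frequency, since the cosine terms cancel.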

Table 1 shows the input bearing characteristic features used to train and test the fault classes. The first three columns give the ball pass inner race frequency (BPFI) and its sidebands for the various fault classes. The subsequent columns give the ball spin frequency (BSF) and its sidebands, and the shaft spin frequency (SSF) and its sidebands, for all fault classes.

Table 1 Bearing characteristic feature vector for KNN

3 Experimental setup

The experimental test rig consists of two 6209 K deep groove ball bearings supporting a rotating shaft that carries an unbalanced disc, as shown in Fig. 7. The shaft is connected to a DC motor through a flexible coupling. The MS shaft used for the experimentation has a diameter of 38 mm and a length of 600 mm. Table 2 provides the bearing specifications. Vibration responses are collected by an accelerometer mounted on the test-bearing pedestal in the vertical, horizontal and axial directions. A four-channel Adash 4400 VA4 Pro Ver. 2.21 FFT analyzer with a sensitivity of 100 mV/g is used to process the collected signals.

Fig. 7 Experimental setup

Table 2 Bearing specifications

Five fault classes are considered for experimentation: artificially created surface faults on the inner race and outer race (Figs. 8, 9 and 10), clearance between ball and inner race, unbalance in the shaft, and parallel misalignment between rotor and shaft. The artificial surface faults of different sizes are created using electro-discharge machining.

Fig. 8 Outer race fault

Fig. 9 Inner race fault

Fig. 10 Compound fault

The dynamic vibration responses collected in the frequency domain are provided to the developed KNN algorithm for fault classification and identification. MATLAB is employed to train and test the algorithm. The variables considered for experimentation, with their coded and real factor levels, are listed in Table 3. The bearing vibration response is recorded at various rotor speeds and defect conditions in the vertical, horizontal and axial directions of the accelerometer, as shown in Table 4.

Table 3 Levels of experimental design
Table 4 Vibratory response of bearing under various defects

Figure 11 shows the vibration response at 900 rpm with 30 g rotor unbalance, with a significant peak amplitude of 1.02 mm/s at the first harmonic of the shaft frequency. This peak indicates the presence of unbalance in the system. All other signatures lie below a considerable amplitude level, so unbalance is the predominant fault in the rotor-bearing system.

Fig. 11 Frequency response at 900 rpm, 30 g unbalance

Figure 12 shows the frequency response at 700 rpm with 0.4 mm of misalignment, which has a significant peak at the second harmonic of the shaft frequency. The peak amplitude of 1.80 mm/s indicates the presence of misalignment in the system. As the misalignment of the rotating shaft increases, the vibration amplitude increases substantially, and a similar trend is observed for rotor unbalance.

Fig. 12 Frequency response at 700 rpm, 0.4 mm misalignment

Figure 13 shows the frequency response at 700 rpm shaft speed with a 0.2 mm circular inner race surface defect. The recorded signal shows a peak amplitude of 2.30 mm/s at 1.12 times BPFI. The existence of peak amplitudes at multiples of BPFI reveals the defective condition of the bearing's inner race.

Fig. 13 Frequency response at 700 rpm, 0.2 mm inner race surface defect

Figure 14 shows the frequency response at 1100 rpm shaft speed with a 0.2 mm circular outer race surface defect. A peak amplitude of 2.64 mm/s is observed at 1.04 times BPFO, reflecting the outer race defect in the test bearing. It is also observed that the vibration amplitude associated with inner race surface defects is higher than that of outer race defects at similar operating conditions.

Fig. 14 Frequency response at 1100 rpm, 0.2 mm outer race surface defect

Figure 15 depicts the system response under the combined fault condition, showing peak amplitudes of 3.68 mm/s at 1.05 times BPFI and 3.297 mm/s at 1.07 times BPFO. This indicates the presence of multiple fault conditions.

Fig. 15 Frequency response of different fault combinations at 1100 rpm

4 KNN implementation

Figure 16 shows a scatter plot of the shaft spin frequency (SSF) and its sidebands, which vary significantly among the fault classes.

Fig. 16 Scatter plot for all responses

KNN is one of the most successful intelligent systems used in fault diagnostics because of its promising performance, such as its capacity to predict even with small amounts of data and its faster, more efficient training compared with other systems.

Novel bearing characteristic features were extracted from the vibration response, giving a dataset of 66 instances and ten bearing characteristic features. Figure 17 presents the characteristic amplitude for each extracted feature. The shaft spin frequency (SSF) of the bearing and both of its sidebands are shown in Fig. 17a–c. Each fault has its inherent characteristics, so each feature shows a different range for each fault class. Figure 17 shows that the misalignment and outer race fault classes have higher amplitudes than the other fault classes. In Fig. 17b, the outer race fault and inner race fault classes have higher amplitudes than all other fault classes. The ball pass outer race frequency (BPFO) feature, shown in Fig. 17d, has a higher amplitude for the inner race defect than for all other fault classes. Figure 17e–g shows the BPFI characteristics with sidebands, which deviate significantly between fault classes.

Fig. 17 Characteristic amplitudes by class: a SSF, b SSF sideband (\(f_{b} - f_{t}\)), c SSF sideband (\(f_{b} + f_{t}\)), d BPFO, e BPFI, f BPFI sideband (\(f_{b} + f_{t}\)), g BPFI sideband (\(f_{b} - f_{t}\))

Five-fold cross-validation is used to check the accuracy of the KNN model. In addition, KNN classifiers with different settings of the distance metric, number of neighbors, distance weighting, and data normalization are used to optimize the KNN model; the parameters are listed in Table 5.
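The five-fold procedure can be sketched as follows. This is an illustrative outline rather than the MATLAB workflow used in the study: the helper names, the random seed, the one-nearest-neighbor stand-in classifier, and the two-cluster toy data are all assumptions. Only the fold count and the 66-instance dataset size come from the text.

```python
import math
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and deal them into n_folds nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def nearest_neighbor_predict(train_x, train_y, query):
    """1-NN with Euclidean distance, used only as a stand-in classifier."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def cross_val_accuracy(xs, ys, predict, n_folds=5, seed=0):
    """Average held-out accuracy: each fold is tested once, trained on the rest."""
    scores = []
    for held_out in k_fold_indices(len(xs), n_folds, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        correct = sum(predict(train_x, train_y, xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# With 66 instances, five folds hold 14, 13, 13, 13 and 13 samples.
print([len(fold) for fold in k_fold_indices(66)])

# Toy data: two well-separated clusters with hypothetical class names.
xs = [[i * 0.1, 0.0] for i in range(10)] + [[5 + i * 0.1, 5.0] for i in range(10)]
ys = ["A"] * 10 + ["B"] * 10
print(cross_val_accuracy(xs, ys, nearest_neighbor_predict))
```

Every instance is held out exactly once, so the averaged score reflects performance on data the classifier did not see during training.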

Table 5 KNN parameters

The accuracy plots for the city block, Euclidean and cosine distance metrics are shown in Figs. 18, 19 and 20, respectively. Equal, inverse and squared inverse distance weights are used when evaluating accuracy. Compared with the other methods, the city block distance metric gives the highest accuracy. On the other hand, equal weighting is less accurate than the distance-dependent weightings for all distance metrics. The case study summary and the optimized parameters selected for the models are listed in Table 6.
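The three weighting schemes differ only in how much each neighbor's vote counts. The sketch below assumes the usual definitions (weight 1, 1/d, and 1/d² for equal, inverse, and squared inverse weighting); the function name, the small `eps` guard against division by zero, and the example neighbors are hypothetical.

```python
from collections import defaultdict


def weighted_vote(neighbors, weighting="equal", eps=1e-12):
    """Pick a class from (distance, label) pairs of the k nearest rows."""
    totals = defaultdict(float)
    for distance, label in neighbors:
        if weighting == "equal":
            weight = 1.0
        elif weighting == "inverse":
            weight = 1.0 / (distance + eps)
        elif weighting == "squared_inverse":
            weight = 1.0 / (distance ** 2 + eps)
        else:
            raise ValueError(f"unknown weighting: {weighting}")
        totals[label] += weight
    # The class with the largest accumulated weight wins.
    return max(totals, key=totals.get)


# One very close neighbor of one class, two more distant ones of another.
neighbors = [(0.1, "outer race"), (0.9, "inner race"), (1.0, "inner race")]
for scheme in ("equal", "inverse", "squared_inverse"):
    print(scheme, "->", weighted_vote(neighbors, scheme))
```

In this example equal weighting follows the two-to-one majority, while both distance-dependent schemes side with the single much closer neighbor, which illustrates how the choice of weighting can change the predicted class.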

Fig. 18 KNN city block model accuracy

Fig. 19 KNN Euclidean model accuracy

Fig. 20 KNN cosine model accuracy

Table 6 Optimized KNN parameters

Artificial neural network (ANN), support vector machine (SVM), and K-nearest neighbors (KNN) classifiers are used to identify faults of the different classes. The accuracy of each classification algorithm is compared in Fig. 21.

Fig. 21 Classwise accuracy

The ANN classification model shows 81.18% accuracy and correctly classifies 54 of 66 instances. The SVM model performs better than the ANN model, with 89.39% fault identification accuracy. The proposed model with its feature extraction shows 98.5% accuracy, the highest among the classification models, with 65 of the 66 instances predicted correctly. The details of true positive rate (TP rate), false negatives (FN), F-measure score, and sensitivity are tabulated in Table 7.
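The relation between instance counts and percentage accuracy is simple arithmetic, sketched below with a hypothetical helper name. The counts are those quoted in the text; 65 of 66 rounds to the reported 98.5%, while 54 of 66 evaluates to about 81.8%, slightly different from the 81.18% quoted for the ANN, and the text does not indicate which of the two figures is intended.

```python
def accuracy_percent(correct, total):
    """Share of correctly classified instances, expressed in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts quoted in the text for a 66-instance dataset.
for name, correct in [("ANN", 54), ("KNN", 65)]:
    print(f"{name}: {accuracy_percent(correct, 66):.2f}%")
```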

Table 7 Detailed training by class for KNN

Fan et al. [2] implemented an SVM-SRPSO algorithm with permutation entropy and energy entropy as fault classification characteristics and reported an accuracy of 99.72%. Goyal et al. [9] used statistical features with KNN and weighted KNN algorithms to classify inner race, outer race, and ball defects with 95.8% and 94.7% accuracy, respectively. Similarly, Table 8 shows that most researchers build feature matrices from statistical parameters such as range, mean value, standard deviation, skewness, kurtosis, crest factor, and entropy, with various levels of classification accuracy. In contrast, the proposed method demonstrates successful classification of bearing faults using bearing characteristic amplitudes as features. Running the bearing at higher shaft speed and load gives more confident feature extraction, which leads to more efficient feature classification. Hence, the current work considers bearing characteristic frequencies as the feature set for KNN. This consideration improves fault identification accuracy.

Table 8 Comparison of bearing fault classification by different researchers

5 Conclusion

The current study presents bearing characteristic frequencies, rather than statistical features from the vibration response, as novel feature sets for rolling element bearing fault diagnosis. The KNN machine learning algorithm is used for fault classification across all datasets.

The experimentation is carried out on a rotor-bearing setup developed in the laboratory. Many trials are conducted for five different fault classes, and the responses are recorded. The KNN machine learning algorithm is used to train and test the datasets, and the same datasets have been tested using SVM and ANN. The proposed KNN algorithm with novel feature extraction shows better accuracy than the other algorithms.

The results obtained are summarized as follows:

  • KNN classification provides excellent results at the defect level with the appropriate feature selection approach.

  • The novel extracted features give KNN better accuracy than the support vector machine (SVM) and artificial neural network (ANN).

  • The ANN shows 81.18% accuracy and the SVM shows 89.39% accuracy for bearing fault detection.

  • The KNN classification algorithm correctly classifies 65 of the 66 instances.

  • The proposed KNN classification algorithm with the city block method shows 98.5% accuracy, higher than with the Euclidean and cosine distance methods.

The bearing characteristic features extracted for the different fault classes can be effectively used to predict faults inside the rotor-bearing system. The current work could be extended to identify and classify multiple bearing defects using deep convolutional neural networks (DCNN) and long short-term memory (LSTM) networks, and a comparison could be made with the current approach.