Introduction

Rotary machines remain the mainstay of many industries, including construction machinery, rail transit equipment, instrumentation, and precision machine tools. Small failures in roller bearings occur frequently and may lead to catastrophic consequences [1]. Fault location, fault type, and fault severity can all be determined from a vibration signal, which indicates when a bearing's inner ring (IR), ball element (BA), or outer ring (OR) has failed [2]. Abnormal bearing vibration can be detected with various sensors, including vibration sensors, accelerometers, and laser displacement sensors. These sensors measure the vibration of the bearing and provide a signal that can be analyzed to determine its cause [3]. By optimizing computational resources and minimizing memory requirements, deep learning techniques for processing vibration signals can be adapted to small, portable, and energy-efficient embedded platforms, which makes them well suited for deployment in challenging industrial operating environments [4]. Even when vibration levels fall within acceptable limits, ongoing monitoring remains essential to catch deterioration over time. With recent advances in complex and intelligent industrial systems, early detection of issues has become increasingly important [5]. Some industrial and manufacturing machinery must run continuously in challenging environments, which puts critical components such as bearings, a prevalent element in industrial equipment, at risk of failure [6]. Rolling bearings play a critical role in the overall performance of machinery. Failure to promptly address issues with rolling bearings can lead to complete equipment breakdown, causing substantial financial losses and environmental damage and posing risks to worker safety [7].
Hence, combined monitoring and supervision is vital to manage faults proactively and ensure uninterrupted, normal equipment operation. Intelligent approaches, for instance predictive maintenance based on process conditions, have proven effective in fault diagnosis research with machine learning at its core [8]. Such diagnosis involves two key stages: signal feature extraction and fault classification [9]. To ensure facility safety and prevent major breakdowns, a method is needed that detects and predicts failures proactively, so that necessary repairs can be made without interrupting device operation. Relying solely on human experience for fault detection often proves inadequate, especially in critical and complex scenarios. Ongoing research seeks to establish new predictive maintenance mechanisms and tools that detect faults before they manifest [10]. One common practice is to monitor the state of rotating machines continuously by collecting diverse data, among which vibration signals are particularly prominent. Consequently, vibration analysis has emerged as a vital technique for real-time system monitoring during operation, and in practice it enables accurate identification of faults in bearing elements [11]. Deep learning and machine learning techniques play a pivotal role in identifying production and operational issues. Various tools are employed to analyze faults in bearings and other machine components, including K-nearest neighbor (KNN), support vector machines (SVM), artificial neural networks (ANN), and fuzzy techniques [12]. In recent years, convolutional neural networks (CNNs) have significantly enhanced fault identification accuracy by leveraging different feature types in the diagnostic process.
Employing a CNN as a backend trainer and classifier, together with statistical information derived from raw vibration signals of single faults, achieves high-precision bearing defect diagnosis [13]. The general process for identifying bearing faults involves two key steps: feature extraction (data processing) and fault recognition [14]. Although bearing vibration signals are nonlinear and non-stationary, they often contain sufficient defect information, so extracting signal characteristics becomes a critical step. Time-frequency analysis techniques, such as the short-time Fourier transform (STFT) and the continuous wavelet transform (CWT), are commonly used for preliminary automatic feature extraction. These techniques transform the original 1-D time-domain data into 2-D time-frequency images [15]. Many deep CNN architectures, including AlexNet [16], GoogLeNet [17], and VGG [18], have been used for intelligent fault detection in rolling bearings.

In this paper, we employ a CNN-AlexNet network to predict various fault types occurring in bearings, leveraging data obtained in the time domain to achieve optimal accuracy within a relatively short timeframe. Our proposed model is evaluated on two established benchmark bearing datasets: the real-time bearing dataset, a run-to-failure raw bearing dataset collected by the instrumentation and control engineering laboratory at Vishwakarma Institute of Technology (VIT) in Pune, India, and the Case Western Reserve University (CWRU) bearing dataset from the Bearing Data Centre at Case Western Reserve University, USA. This investigation introduces an approach for fault diagnosis in rolling element bearings that combines sequential CNN analysis, image classification, and force detection. Sequence maps are calculated from the internal vibration signal of rolling element bearings. These sequential maps exhibit consistent patterns across various operational conditions, displaying fluctuations attributable to faults while keeping a stable structure as operational parameters change. Their resilience to changes in operating conditions, namely speed and load, therefore positions them as invariant vibration patterns. A subset of the force maps computed at distinct operational states is used to train the CNN model. The trained CNN model is subsequently employed for fault diagnosis across diverse operating conditions, encompassing variations in speed and load. The results demonstrate that our proposed method consistently achieves an average accuracy that surpasses the values reported in state-of-the-art studies, even in a shorter time interval, across both training and test datasets. Additionally, we present the detailed architecture and the step-by-step methodology followed to attain high accuracy in our fault detection model.

Theoretical Framework

Continuous Wavelet Transform (CWT)

The continuous wavelet transform (CWT) is similar to the Fourier transform, which decomposes a signal into sinusoids of different frequencies, but the wavelets used in the CWT are localized in time, permitting a more detailed analysis of the signal's time-varying frequency content. The CWT is well suited to signals that require multiple resolutions and time-frequency localization because it maps a one-dimensional time signal into a two-dimensional time-frequency domain.

The CWT is defined as:

$$\text{CWT}\left(a,\tau \right)=\frac{1}{\sqrt{a}}{\int }_{-\infty }^{+\infty }s(t){\varphi }^{*}\left(\frac{t-\tau }{a}\right)\text{d}t$$
(1)

where \(a\) is the wavelet scale, \(\tau\) is the wavelet time localization, the factor \(\frac{1}{\sqrt{a}}\) keeps the wavelet energy constant across scales, \({\varphi }^{*}\) is the complex conjugate of the mother wavelet \(\varphi\), and \(s(t)\) is the original one-dimensional signal. Applying the CWT to a signal produces a scalogram, a 2D time-frequency representation of the CWT coefficients. In this research, the raw vibration signals captured by the sensors are divided into time series of 1024 sampling points each, and the CWT is computed for each series [19, 20].
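As a rough illustration of Eq. (1), the sketch below approximates the integral by a Riemann sum over a sampled signal. The mother wavelet is not specified in the text, so a complex Morlet wavelet with centre frequency parameter `w0 = 6` is assumed here, and the test tone, sampling rate, and scales are hypothetical values chosen only to keep the example small. It is a dependency-free sketch, not the implementation used in this work.

```python
import cmath
import math


def morlet(u, w0=6.0):
    """Complex Morlet mother wavelet (an assumed choice; constant factor omitted)."""
    return cmath.exp(1j * w0 * u) * math.exp(-0.5 * u * u)


def cwt(signal, scales, dt, w0=6.0):
    """Approximate Eq. (1) by a Riemann sum over the sampled signal.

    Returns one row per scale a; entry k of a row approximates
    CWT(a, tau) at tau = k * dt.
    """
    n = len(signal)
    rows = []
    for a in scales:
        # The Gaussian envelope is negligible beyond 5 standard deviations.
        half = int(math.ceil(5.0 * a / dt))
        row = []
        for k in range(n):
            acc = 0j
            for j in range(max(0, k - half), min(n, k + half + 1)):
                u = (j - k) * dt / a  # (t - tau) / a
                acc += signal[j] * morlet(u, w0).conjugate()
            row.append(acc * dt / math.sqrt(a))
        rows.append(row)
    return rows


def scalogram(coeffs):
    """Magnitude of the CWT coefficients, one row per scale."""
    return [[abs(c) for c in row] for row in coeffs]


# Hypothetical example: a 100 Hz tone sampled at 1 kHz, analysed at three
# scales whose Morlet centre frequencies are 50, 100 and 200 Hz
# (a = w0 / (2 * pi * f)).
fs = 1000.0
tone = [math.sin(2 * math.pi * 100.0 * k / fs) for k in range(200)]
scales = [6.0 / (2 * math.pi * f) for f in (50.0, 100.0, 200.0)]
image = scalogram(cwt(tone, scales, 1.0 / fs))
```

In this toy case the row of `image` belonging to the 100 Hz scale carries the largest magnitudes, which is the kind of localized energy a scalogram makes visible. For real 1024-point segments a vectorized library routine would be far faster than these nested loops.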

Convolutional Neural Network (CNN)

The CNN is the most well-known and widely used deep learning algorithm. Its main advantage over its predecessors is that it identifies the relevant features automatically, without human involvement [21]. It is made up of multiple layers that analyze images to find visual patterns. A standard CNN design consists of an input image, a feature extraction block of convolution and activation layers followed by pooling layers, and finally fully connected layers and a classification layer. The CNN architecture comes in various forms, including AlexNet, ResNet, GoogLeNet, and LeNet [22]. AlexNet is a deep learning structure consisting of five convolutional layers, two normalization layers, three max pooling layers, two fully connected layers, and one softmax layer. The convolutional layers are interleaved with the pooling layers. Dropout (the random deactivation of neurons) is applied to the first two fully connected layers to prevent overfitting. The final layer is a softmax classifier for image classification [23].

Across different parameter initialization approaches, using ReLU as the activation function in AlexNet simplifies computation and model training. To control model complexity, AlexNet uses a dropout strategy, whereas LeNet uses only weight decay. Signal analysis by machine learning involves three stages: signal collection, feature extraction/selection, and model training [24]. Three layer types typically make up a CNN: the convolutional layer (CL), the sub-sampling layer (SL), and the fully connected layer (FCL). This section outlines the mathematical model for each layer in a CNN's design [25]. Convolutional layers are crucial to the success of a CNN's architecture. They consist of several convolutional filters, or kernels. The output feature map is produced by convolving the kernels with the input image, expressed as an N-dimensional matrix. The result of the convolution is passed through an activation function to obtain the output. Recently, the Rectified Linear Unit (ReLU) activation function has been increasingly adopted due to its quick training times and low computational requirements. The following equations describe the relationship between the 1-D input \(x\) and the mapping value \(y\):

$$y = \text{conv}\left(x, \omega, \text{'valid'}\right) = \left(y(1), \ldots, y(t), \ldots, y(n - m + 1)\right) \in R^{n - m + 1}$$
(2)
$$y(t) = \sum_{i = 1}^{m} x(t + i - 1)\,\omega(i), \quad t = 1, 2, 3, \ldots, n - m + 1 \quad (m < n)$$
(3)

where \(\text{conv}\) denotes the convolution operation in \(\text{valid}\) mode, the integer \(m\) is the number of weights in the filter, \(n\) is the length of the signal \(x\), \(\omega (i)\) is the \(i\)-th weight of the filter, and \(y(t)\) is the \(t\)-th mapping value. For two-dimensional input data of dimensions \(A\) and \(B\), the convolutional layer's input is \(X \in R^{A \times B}\). The convolutional layer is followed by a pooling layer, which reduces the computational load, the spatial dimension, and the risk of overfitting [26]. Common pooling methods include average pooling, norm pooling, max pooling, and logarithmic pooling; the mathematical model of max pooling is:

$$P_{\text{cn}} = \underset{S^{M\times N}}{\max}\left(Y_{\text{cn}}\right)$$
(4)

where the pooling layer output \({P}_{\text{cn}}\) takes the maximum value of the convolutional layer output \({Y}_{\text{cn}}\), and \(M\) and \(N\) are the dimensions of the scale matrix \(S\). During pooling, the maximum value is taken from each \(M \times N\) region of \({Y}_{\text{cn}}\) as \({S}^{M\times N}\) sweeps the entire \({Y}_{\text{cn}}\) with a defined stride. After passing alternately through convolutional and pooling layers, the image features are fed into the fully connected layer. The fully connected layer gathers the deep features that distinguish the categories and builds a mapping between the extracted features and the sample types. The mathematical formula for a fully connected layer is:

$${y}^{k}=f({\omega }^{k}{x}^{k-1}+{b}^{k})$$
(5)

where \({y}^{k}\) is the output of the fully connected layer, \({\omega }^{k}\) is the weight coefficient, \({x}^{k-1}\) is the fully connected layer input, \({b}^{k}\) is the network bias, \(f\) is the activation function, and \(k\) denotes the \(k\)-th layer of the network. In general, a CNN uses a variety of convolution and pooling layer combinations. Several fully connected layers are then added one after the other, and they flatten the feature matrices into a row or column vector.
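The three layer models in Eqs. (3) to (5) can be made concrete with a small sketch. The code below is a plain-Python illustration with 0-based indexing and toy inputs. The function names are hypothetical, and ReLU is assumed for the activation \(f\) in Eq. (5). It is meant only to show what each formula computes, not how a deep learning framework implements it.

```python
def conv_valid(x, w):
    """Eq. (3): y(t) = sum_{i=1..m} x(t+i-1) * w(i), for t = 1..n-m+1."""
    n, m = len(x), len(w)
    if m >= n:
        raise ValueError("filter length m must be smaller than signal length n")
    return [sum(x[t + i] * w[i] for i in range(m)) for t in range(n - m + 1)]


def relu(v):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, e) for e in v]


def max_pool_2d(Y, M, N, stride):
    """Eq. (4): slide an M x N window over Y and keep each patch maximum."""
    rows, cols = len(Y), len(Y[0])
    return [
        [
            max(Y[r + i][c + j] for i in range(M) for j in range(N))
            for c in range(0, cols - N + 1, stride)
        ]
        for r in range(0, rows - M + 1, stride)
    ]


def fully_connected(W, x, b, f=relu):
    """Eq. (5): y^k = f(W^k x^(k-1) + b^k)."""
    z = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    return f(z)


# Toy inputs (hypothetical values).
y = conv_valid([1, 2, 3, 4, 5], [1, 0, -1])           # n = 5, m = 3
p = max_pool_2d([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12],
                 [13, 14, 15, 16]], M=2, N=2, stride=2)
out = fully_connected([[1, -1], [0.5, 0.5]], [2, 3], [0, -1])
```

Here `y` has n - m + 1 = 3 entries, `p` is a 2 × 2 matrix of patch maxima, and `out` shows ReLU clipping the negative pre-activation to zero.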

Proposed Method

The prevalent neural network architecture for image processing is the 2D convolutional neural network, capable of handling grayscale or RGB images as input. For an accurate diagnosis of failures, the raw fault signals need transformation through continuous wavelet transform (CWT) into 2D time-frequency images enriched with fault information. This paper proposes a CNN-based technique for identifying failures in rolling bearings, and the flowchart of the proposed method is depicted in Fig. 1.

1. Initially, vibration signals from bearings are collected using two accelerometer sensors, one mounted horizontally and the other vertically.

2. Subsequently, the gathered vibration signals are transformed into RGB images using CWT, and the images are then divided into two parts (training and testing).

3. Each set of 1024 signal sampling points from the raw vibration signals collected by the sensors is treated as a time series, and CWT is applied to this series.

4. The AlexNet network is loaded, and the model is fed with samples. The fully connected layer is then used to extract features from both test and training images.

5. To assess the accuracy of the diagnosis, test samples are input into the trained model [27].
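The segmentation and splitting in steps 2 and 3 can be sketched as follows. The text does not state whether the 1024-point windows overlap or what train/test ratio is used, so non-overlapping windows and an 80/20 split are assumptions made here for illustration, and the function names are hypothetical.

```python
import random


def segment(signal, length=1024):
    """Cut a 1-D signal into consecutive non-overlapping windows.

    Samples left over at the end that do not fill a whole window are dropped.
    """
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, length)]


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle reproducibly and split; the 0.8 fraction is an assumed value."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for a recorded vibration signal of 5000 samples.
raw = [float(k) for k in range(5000)]
windows = segment(raw)                      # four windows of 1024 samples
train, test = train_test_split(list(range(10)))
```

Each window would then be passed through the CWT to give one image, so the split can equally be applied to the resulting images rather than to the raw windows.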

Fig. 1 Detailed flowchart for identifying failures in bearings

The structure of AlexNet is organized as follows. The input is an image of 227 × 227 × 3, where 227 × 227 is the height and width and the 3 channels correspond to an RGB image. The first convolutional layer applies 96 filters of size 11 × 11 with a stride of 4, which reduces the input to 55 × 55 × 96. A max pooling layer with a 3 × 3 window and a stride of 2 then reduces this to 27 × 27 × 96, and the second convolutional layer, with 256 filters, produces an output of 27 × 27 × 256. The third and fourth convolutional layers employ 384 filters each with a stride of 1, and the fifth convolutional layer employs 256 filters of the same size and is followed by a pooling layer.

Finally, AlexNet incorporates three fully connected layers. The first two each have 4096 nodes, while the third fully connected layer consists of 1000 units. In the model's concluding stage, a softmax layer and a classification output layer provide probabilities for each label.
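The spatial sizes quoted for AlexNet follow from the usual output-size rule for a convolution or pooling window, (W - F + 2P) / S + 1. The sketch below traces that arithmetic through the feature extraction stack. The kernel sizes, strides, and paddings listed are those of the commonly published AlexNet configuration and are an assumption here, since the text does not state all of them.

```python
def out_size(size, kernel, stride, padding=0):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, output channels or None for pooling).
# Assumed standard AlexNet hyperparameters, not taken from this paper.
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace(size=227, channels=3):
    """Return (name, height, width, channels) after every layer."""
    shapes = []
    for name, kernel, stride, padding, out_channels in LAYERS:
        size = out_size(size, kernel, stride, padding)
        if out_channels is not None:   # pooling keeps the channel count
            channels = out_channels
        shapes.append((name, size, size, channels))
    return shapes


shapes = trace()
for name, h, w, c in shapes:
    print(f"{name}: {h} x {w} x {c}")
```

Under these assumptions the first convolution gives (227 - 11) / 4 + 1 = 55 and the first pooling gives (55 - 3) / 2 + 1 = 27, matching the 55 × 55 × 96 and 27 × 27 sizes in the text, and the last pooling output of 6 × 6 × 256 is what gets flattened for the first fully connected layer.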

Experimental Setup

In this study, we evaluate the effectiveness of the proposed approach on two benchmark bearing datasets: the Case Western Reserve University (CWRU) data and real bearing data obtained from the Vishwakarma Institute of Technology (VIT) laboratory. The data encompass various bearing fault scenarios. As shown in Fig. 2, the presented method includes three major steps. The first step is to acquire the vibration signals, which carry valuable information about machinery health conditions, and transform them into RGB images by CWT. The second step is to import the obtained images into the CNN-AlexNet network structure to obtain an optimized deep learning model for bearing fault diagnosis. The final step is to employ the optimized CNN-AlexNet model to diagnose the bearing faults and detect the fault severity.

Fig. 2 General architecture of the CNN-AlexNet-based approach for bearing fault diagnosis [27]

Experimental Verification

Many studies have employed CNN models such as VGGNet, AlexNet, GoogLeNet, LeNet-5, and ResNet for bearing fault diagnosis. This paper adopts the AlexNet model, with the primary goal of achieving superior image recognition performance and the highest accuracy compared to other methods. The experiment applies AlexNet to two motor bearing datasets with varying speeds and loads, obtained from the Vishwakarma Institute of Technology (VIT). Subsequently, a comparison is conducted with a dataset from Case Western Reserve University (CWRU).

Bearing Data Centre (BDC)

The Dataset Pertaining to Bearings from Case Western Reserve University

The laboratory setup at Case Western Reserve University [28] is commonly employed in research on bearing vibration fault diagnostics. The setup primarily includes a motor, test bearings, an accelerometer, and a loading motor, which provide experimental vibration data for rolling bearings. The experiment covered four bearing conditions (normal, inner race fault, ball fault, and outer race fault), with single-point faults of 0.007, 0.014, 0.021, and 0.028 inches introduced into the 6205-2RS JEM SKF bearing. The tests were conducted at various motor speeds (1772, 1797, 1755, and 1730 rpm). A sampling frequency of 12 kHz was employed, and an accelerometer was placed near the drive end to capture the vibration signals.

Results and Discussion

Figure 3 illustrates the continuous wavelet transform (CWT) results for vibration signals under four conditions: Healthy Bearing (HB), Ball Fault (BF), Inner Race Fault (IRF), and Outer Race Fault (ORF). While raw vibration signals in the time domain make it challenging to distinguish between fault types, CWT reveals differences in the vibration images in the time-frequency domain, making them suitable as input to the CNN for feature extraction [29]. The vibration images are then input into the CNN-AlexNet for training.

Fig. 3 Transformation of a vibration signal to images for different bearing conditions (a), (b), (c), and (d)

In Fig. 3, the images obtained from the vibration signals show noticeable differences between bearing conditions. During training, the model converges and the training and validation curves become stable, highlighting the efficiency of the model. The training process was repeated multiple times under the same conditions to ensure maximum accuracy, with the learning rate set at 0.0001. The accuracy attained by the CNN-AlexNet model is 90.91%, 82.14%, 90.91%, and 100% for (a) Healthy bearing, (b) Inner race fault, (c) Outer race fault, and (d) Ball fault, respectively. These results illustrate rapid and accurate diagnosis using the suggested method, as depicted in Fig. 4. The classifier's prediction accuracy and the proportion of stage length overlapping with other stages for all bearings are presented in Fig. 4. Fig. 4a–d plots the loss and accuracy (y-axis) against the number of training epochs (x-axis), where one epoch is one cycle through the full dataset. Upon close observation of the accuracy graph, it is noteworthy that the validation accuracy initially surpasses the training accuracy for several epochs. As the number of epochs increases, both the validation and training accuracy curves trend upward. The loss value drops below 0.001 after approximately 30 epochs, indicating the suitability of the model architecture and the efficiency of the training process. Compared with the healthy, inner race fault, and outer race fault conditions, the trained classifier achieves significantly higher accuracy for the ball fault.

Fig. 4 Performance comparison of loss and accuracy for the CWRU dataset under various bearing conditions at different motor speeds (1772, 1797, 1775, and 1730 rpm): (a) Healthy bearing, (b) Inner race fault, (c) Outer race fault, and (d) Ball fault

Real-Time Dataset of Bearing Faults

VIT College Dataset

This study considered three bearing health conditions: normal, inner race fault, and outer race fault. Bearings under each condition were operated at rotating speeds of 950, 1250, and 1950 rpm. The faulty bearings are depicted in Fig. 5. During the experiments, a bearing in each condition was mounted at the end of the motor to test and observe its performance. Two accelerometers were installed in the horizontal and vertical directions on the bearing housings for vibration data acquisition, and two loads were applied at the motor ends (1.25 kg and the gearbox). Ball bearings of type 608-2RSH&SKF were used for the tests. Under each bearing condition, three sets of data were collected, one at each rotational speed, as outlined in Table 2. The data collection for each set spans approximately 6 min, at a sampling rate of 12 kHz. The data were acquired using the NI DAQ data acquisition system and saved as .mat files using MATLAB 2021b.

Fig. 5 Bearing fault conditions: (a) Healthy bearing, (b) Inner race fault, and (c) Outer race fault

Table 1 provides the specifications of the bearing used for experimental purposes.

Table 1 Specification of bearings

The experimental setup, designed to study the effects of bearing faults in a motor and depicted in Fig. 6, includes the National Instruments USBX-series data acquisition (DAQ) system interfaced with a computer graphical user interface (GUI). The GUI is used to oversee the experiment, manage its operations, and gather the resulting data. Data on the bearing conditions are collected while two loads are attached to the end of the motor: a 1.25 kg weight and the gearbox. The motor is tested at rotational speeds of 950, 1250, and 1950 revolutions per minute (rpm), as specified in Table 2. The readings are acquired using piezoelectric accelerometer sensors for three bearing conditions: healthy bearing, outer race fault, and inner race fault. The vibration signals are acquired from two accelerometers, one on the X-axis and one on the Y-axis.

Fig. 6 Experimental setup with labeled components: (1) Computer GUI, (2) Gearbox, (3) Motor with bearing faults, (4) Accelerometer sensor (X, Y-axis), (5) Speed sensor, (6) Speed controller, and (7) DAQ

Table 2 Statistics of dataset (VIT College)

Results and Discussion

The data acquired in this experiment included three types of bearings: healthy, outer race fault, and inner race fault. The sample time was 6 s. The data gathered by the accelerometer sensor, which was attached to the motor, was then processed in a MATLAB 2021a environment, and all the data files were stored in MATLAB (.mat) format.

Figure 7a–c displays the scalogram images obtained with our method from the vibration signals collected by the vibration sensors under different bearing conditions. Compared with plots of the original signal data, most of these images exhibit simpler patterns that are easier for humans to classify. The heightened CWT spikes in these images are the features that mark the occurrence of bearing faults. In contrast, the normal condition is characterized by a lack of significantly elevated CWT magnitudes, as indicated by the blue color. The CNN model extracts features from these images without losing the critical information necessary for bearing state classification, and it efficiently identifies the healthy bearing (HB), outer race fault (ORF), and inner race fault (IRF). The Deep Network Designer in MATLAB was used to download the pre-trained network (AlexNet), replace layers, assess the architecture, load the data, and start the training. This work used AlexNet to classify the images produced by CWT. The images were resized to the input size required by the model (227 × 227 × 3) and divided into training, validation, and test sets, and the CNN-AlexNet model was then trained for feature extraction and classification.
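The step of turning a scalogram into a 227 × 227 × 3 network input can be sketched as below. This is only an illustration under stated assumptions: nearest-neighbour resampling and a simple two-colour ramp from blue (low magnitude) to red (high magnitude) are hypothetical choices, since the interpolation method and colormap used in this work are not specified, and `to_rgb_image` is not a MATLAB or library function.

```python
def to_rgb_image(scalogram, out_h=227, out_w=227):
    """Resize a 2-D magnitude matrix and map it to 8-bit RGB triples.

    Nearest-neighbour resampling; blue = lowest magnitude, red = highest.
    Both choices are assumptions made for this sketch.
    """
    in_h, in_w = len(scalogram), len(scalogram[0])
    lo = min(min(row) for row in scalogram)
    hi = max(max(row) for row in scalogram)
    span = (hi - lo) or 1.0          # avoid division by zero for flat input
    image = []
    for r in range(out_h):
        src_r = r * in_h // out_h    # nearest source row
        row = []
        for c in range(out_w):
            src_c = c * in_w // out_w
            v = (scalogram[src_r][src_c] - lo) / span   # normalise to [0, 1]
            row.append((int(round(255 * v)), 0, int(round(255 * (1 - v)))))
        image.append(row)
    return image


# Toy 2 x 2 "scalogram" (hypothetical magnitudes).
img = to_rgb_image([[0.0, 1.0], [2.0, 4.0]])
```

With such a ramp, a healthy bearing without strongly elevated CWT magnitudes would render mostly blue, in line with the description of Fig. 7.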

Fig. 7 Transformation of vibration signals collected at different speeds (950, 1250, and 1950 rpm) into images for the VIT dataset: (a) Healthy bearing with load and without load, (b) Inner race fault with load and without load, and (c) Outer race fault with load and without load

Four performance metrics, namely accuracy, precision, recall, and F1-score, are calculated to evaluate the performance of the classifiers. The formulas for these metrics are given in Eqs. (6)–(9).

$$\text{accuracy}=\frac{\text{TN}+\text{TP}}{\text{TN}+\text{FN}+\text{FP}+\text{TP}}$$
(6)
$$\text{precision}=\frac{\text{TP}}{\text{TP}+\text{FP}}$$
(7)
$$\text{recall}=\frac{\text{TP}}{\text{TP}+\text{FN}}$$
(8)
$$\text{F1-score}=2\cdot \frac{\text{precision}\cdot \text{recall}}{\text{precision}+\text{recall}}$$
(9)

\(\text{TP}\) is the number of true positive classifications, \(\text{TN}\) the number of true negative classifications, \(\text{FP}\) the number of false positive classifications (false alarms), and \(\text{FN}\) the number of false negative classifications (missed faults). These metrics were selected because they directly reflect the requirements of condition monitoring for predictive maintenance (PdM). When a PdM system raises an alarm upon detecting a suspected fault, it is more advantageous to report all potential faults, even at the cost of some false alarms. In essence, triggering more alarms presents more of the actual faults to the user (higher recall), but is accompanied by more false alarms (lower precision). Conversely, if alarms are raised only when a fault is certain, no false alarms occur but some faults may go undetected. An ideal classifier balances both aspects, sounding an alarm only for genuine faults, with no omissions or false alarms. This balance is directly reflected in the F1-score. Additionally, accuracy is favored for its straightforward interpretation as the ratio of correctly classified samples to the total sample count. Table 3 presents the findings. As can be observed, the outcomes improve for each metric.
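Eqs. (6) to (9) can be computed directly from the four confusion counts, as in the short sketch below. The counts in the example are hypothetical and are not results from this study. Returning zero when a denominator is zero is a convention chosen here, not something the text prescribes.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 40 faults caught, 50 healthy samples passed,
# 5 false alarms, 5 missed faults.
metrics = classification_metrics(tp=40, tn=50, fp=5, fn=5)
```

With these counts the accuracy is 0.9, while precision and recall are both 40/45, so the F1-score equals the same value. Raising more alarms would typically trade precision for recall, which is the balance the F1-score summarizes.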

Table 3 Performance results of CNN-AlexNet

The high accuracy of the classification results indicates the effectiveness of the deep signal processing method in analyzing vibration signals for bearing fault analysis. The loss and accuracy of the three fault analysis models are depicted in Fig. 8a–c. In Fig. 8a, there is a notable increase in the accuracy of identifying a healthy bearing, reaching 95.24%, and the loss value drops below 0.001 after approximately 40 epochs.

Fig. 8 Performance comparison of accuracy for the VIT dataset under various bearing conditions at different motor speeds (950, 1250, and 1950 rpm): (a) Healthy bearing (HB), (b) Inner race fault (IRF), and (c) Outer race fault (ORF)

Figure 8b illustrates 100% accuracy in detecting inner race defects in the bearing, with the loss value dropping below 0.001 after approximately 10 epochs. Figure 8c shows an accuracy of 98.43% in detecting outer race defects, with the loss value dropping below 0.001 after approximately 40 epochs. Examining the training accuracy and loss curves over the epochs, we observe that the model converges and that the training and validation curves become stable, indicating the efficiency of the model. Although accuracy generally improves as epochs are added, there are significant fluctuations. With the Adam optimizer, however, the initial training stage achieves higher accuracy and lower training loss, the training loss curve descends more smoothly, and the associated accuracy curve converges more rapidly. The optimal accuracy is achieved when the epoch reaches 10. Based on these evaluation findings, the suggested CNN-AlexNet fault diagnosis technique proves to be more resilient to signal noise in bearing vibration signals at different speeds.

The classification results displayed in Table 4 show that the deep learning method performs well in diagnosing bearing faults. The fault diagnosis model based on CNN-AlexNet can extract the features of the bearing's vibration signals and efficiently separate the various fault types. The diagnostic model can self-learn the hidden properties of the vibration signals collected from the bearing under all conditions, as shown by the 95.24% recognition accuracy for the normal condition. Because the vibration signal characteristics of inner race and outer race failures are similar, misclassification between them may occur easily; nevertheless, the corresponding recognition accuracies reach 100% and 98.43%, respectively. To further analyze the findings of this work, we compared the proposed model with the VGGNet and GoogLeNet models under the same conditions using the CWRU and VIT vibration datasets. Table 5 shows the comparison results. The proposed AlexNet model performed somewhat better than the GoogLeNet and VGGNet models, attaining the highest accuracy of 100%, while the VGGNet and GoogLeNet models show lower accuracy.

Table 4 AlexNet classification accuracy for different bearing conditions with training time.
Table 5 Performance comparison between proposed model AlexNet and other models (VGGNet and GoogleNet) on CWRU and VIT datasets.

Figure 9a and b displays the diagnostic accuracy of the compared methods. Owing to its superior ability to detect temporal shifts in vibration signals, CWT, used in conjunction with the AlexNet model, emerged as a leading candidate for feature extraction. Furthermore, convolutional neural networks (CNNs) excel in pattern recognition, automated feature extraction, and the acquisition of robust features. Transforming vibration signals into images is a suitable technique for analyzing features in two dimensions. In this study, the best outcomes on the Case Western Reserve University (CWRU) and Vishwakarma Institute of Technology (VIT) datasets were attained with the proposed CNN architecture based on AlexNet featuring two fully connected (FC) layers. Specifically, the CNN-AlexNet model emerged as the most effective, significantly surpassing the alternative methods.

Fig. 9 Performance comparison between AlexNet and other models (VGGNet and GoogLeNet) in terms of accuracy for the (a) VIT dataset and (b) CWRU dataset

Conclusions

Intelligent techniques for identifying and categorizing machine faults are the focus of significant scientific inquiry in the field of Industry 4.0 and smart manufacturing. Bearing defects have been identified and categorized using a variety of signal processing and machine learning (ML) based methods. However, deep learning (DL) based methods are favored because of the many drawbacks of conventional signal processing and ML-based approaches. This study presents an intelligent technique for bearing fault identification and classification on a real-time dataset collected by the authors at VIT and on one of the most widely used public benchmarks, the CWRU dataset. For various bearing conditions, the suggested system achieved testing accuracy between 95.24% and 100%. The findings demonstrate that the suggested method achieves state-of-the-art accuracy when scalogram images are used as input. The CNN-AlexNet method outperformed the other methods compared in this study on the CWRU and VIT datasets, and the pre-processing step of converting the signal to an image by CWT can speed up the diagnosis of bearing problems. To increase the prediction accuracy, we would like to put our fault diagnosis method through more rigorous testing on a wider range of data, including real-world fault scenarios. The detection of motor faults using signals other than vibration data, such as torque, motor current, or temperature in the rotating machinery, is another promising area for further research, because these signals are frequently collected directly from the programmable logic controller without installing additional sensors.