Introduction

Wind turbines are often installed in remote locations to extract kinetic energy from the wind. As a result, they are exposed to very harsh environmental conditions. A wind turbine is designed to last for 20 years. However, reports have shown that turbines tend to break down within the design lifetime [1]. Studies have revealed that gearbox failures are the most commonly reported wind turbine failures. The high-speed shaft gear and planet gears tend to fail before the gears in the low-speed stage. Scheduled maintenance usually covers minor problems such as misalignment and lubricant contamination. Preventive maintenance may not always be a viable option, as it tends to be expensive and, in most cases, prevents the full design life of the component from being used. Hence, a condition-based maintenance plan would contribute greatly towards utilizing the design life of the component and minimizing the chance of unplanned downtime.

Presently, vibration monitoring systems adopted for machines with gearboxes can detect abnormal vibration patterns and report them to the user. Multiple researchers have reported promising techniques to analyze vibration signatures of different machine components under stationary and non-stationary conditions. Moosavian et al. used the continuous wavelet transform and Short Time Fourier Transform to extract features which were used to identify the piston health of an IC engine [2]. Sugumaran et al. used multiclass support vector machines to identify bearing faults under stationary speeds [3]. While many researchers have put forward various techniques to deal with stationary signals, most vibration signals extracted from a machine are highly non-stationary. Moreover, multiple components can fail, which can make the analysis very complicated. For a stationary signal, a simple frequency spectrum (FFT) provides valuable information. An FFT plot of a non-stationary signal, however, provides little information, as the frequency spectrum can be affected by both the equipment’s speed and load. The Short Time Fourier Transform (STFT) can provide valuable insights into a signal, as both the time and frequency information are made available for correlation. This method also helps in identifying the sequence of events or the fault characteristics order (FCO) [4, 5]. However, STFT suffers from one major drawback: the window length used for extracting the coefficients has to be constant, which means there is always a trade-off between time resolution and frequency resolution.
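The fixed-window trade-off can be illustrated with simple arithmetic. The following is a minimal sketch, assuming a sampling rate of 22.05 kHz and three hypothetical window lengths. The function name `stft_resolution` is introduced here for illustration only, and "frequency resolution" is taken to mean the spacing between adjacent FFT bins.

```python
def stft_resolution(fs_hz, window_samples):
    """Return (time resolution in s, frequency resolution in Hz) of a fixed window."""
    time_res_s = window_samples / fs_hz   # duration spanned by one window
    freq_res_hz = fs_hz / window_samples  # spacing between adjacent FFT bins
    return time_res_s, freq_res_hz


FS_HZ = 22050.0                     # assumed sampling rate
WINDOW_LENGTHS = (256, 1024, 4096)  # hypothetical window lengths, in samples

for n in WINDOW_LENGTHS:
    dt, df = stft_resolution(FS_HZ, n)
    # dt * df is always 1: shrinking one resolution cell enlarges the other
    print(f"window={n:5d}  time={dt * 1e3:7.2f} ms  frequency={df:6.2f} Hz")
```

Because the product of the two resolutions is fixed, a window short enough to localize a transient in time cannot also separate closely spaced frequencies.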

Many past studies have proposed methods for dealing with non-stationary speeds. Barszcz et al. proposed using spectral kurtosis to detect tooth cracks within a planetary gearbox [6]. Vamsi et al. proposed using an integrated condition monitoring scheme to detect bearing failures in a gearbox [7]. Hizarci et al. proposed using image processing to estimate the health of two distinct gearboxes by analyzing the vibration region [8]. Gunerkar et al. implemented sensor fusion techniques for both vibration and acoustic data and used K-Nearest Neighbor to classify ball bearing faults [9]. Xie et al. demonstrated a speech recognition process using a variance-based approach and multiple machine learning techniques by extracting spectrogram features from the signal [10]. Praveen et al. implemented STFT and decision trees to detect and classify the faults in a planetary gearbox [11]. Cocconcelli et al. reported an STFT-based approach for ball bearing fault detection for direct drive motors operating at varying speeds [12]. Huang et al. proposed using Resonance-Based Sparse Signal Decomposition to extract features for rolling element bearing fault diagnosis [13]. Radhika et al. extracted motor current signatures to predict the fault in a 3-phase AC induction motor using wavelets [14]. Zhang et al. proposed implementing Principal Component Analysis (PCA) and Support Vector Data Description to predict the health of a slew bearing [15]. Gómez et al. proposed using wavelet packet transform energy for health detection of wind turbine gearboxes and generators [16]. Han et al. proposed using multi-level wavelet packets and neural networks to predict component faults within a planetary gearbox under varying speed [17].

Wavelets are presently one of the common tools for processing non-stationary signals. As can be observed, multiple studies have used STFT, wavelets or other means to extract features from the raw data. From previous studies, it could be observed that most methods for stationary analysis depended on decision trees, whereas non-stationary methods depended on neural networks. While Discrete Wavelet Transforms (DWT) can be deployed for a condition monitoring problem [7, 18], the present study focused on deploying a Continuous Wavelet Transform (CWT), as it features a scale factor which allows the wavelet to accommodate a wide range of frequencies. The scale factor allows the wavelet to stretch to accommodate very low frequencies and also to compress in order to capture sudden transient changes in the signal. This is in contrast to DWT, where a signal is decomposed into its low-pass approximation and high-pass detail signals and further decomposition applies only to the approximation components. Hence, DWT suffers from insufficient treatment of the high-frequency components where the bearing fault impulses exist [19].

Feature extraction plays an important role in the overall result. When a machine learning algorithm is involved, preparing the extracted features to train the algorithm is crucial. As highlighted before, a CWT deployed to extract features from a signal requires a particular scale. This scale can be used to generate a feature vector which consists of a set of wavelet coefficients computed from the raw signal. The scales can be used to target specified frequency ranges, which allows multiple frequency bands to be monitored within the same data set. The scale function also allows the signal to be sampled at different frequencies. While this method would excel in diagnosing faults in a stationary signal, using just one feature vector would be insufficient to draw a definite conclusion from a non-stationary signal. Moreover, multi-component faults can pose a challenge when using a single scale. Since the scale is used to capture an abrupt change in the signal (an event), the frequency of the event has to be within the scale’s limit. However, this frequency can vary based on the component generating it. A single component such as a gear can have a distinct signature based on its operating configuration. A component with multiple objects, such as a bearing, can have multiple frequencies, which may not all be within a single scale’s limit. Previous studies have overcome this by manually changing the scale to accommodate these frequencies individually [20]. While the trend is to adopt advanced algorithms in order to process complex signals, such networks can be extremely demanding on computation resources, and these demands tend to grow exponentially as the data-set grows. Portable or remote condition monitoring devices may not have the computation resources required for advanced algorithms to produce an output within an acceptable time frame. Moreover, a large data-set would not be feasible, as memory on portable devices is also at a premium.
CWT is very versatile, as a mother wavelet which best matches the features being extracted can be chosen or designed. However, the number of CWT coefficients extracted at each scale will be equal to the length of the input signal. A large signal would therefore generate a large data-set of CWT coefficients, which would prove difficult to adapt to portable devices. As such, the present study focused on a real-time gearbox health monitoring system for portable devices. The approach was based on CWT, a mature signal processing technique used for non-stationary signal analysis. Handling a non-stationary signal generally calls for more advanced techniques such as auto-encoders and Convolutional Neural Networks. However, deployment of such systems, especially on embedded systems, would be computationally expensive. Here, the authors propose a simplified feature extraction and reduction technique which works with simpler algorithms such as decision trees. Since decision trees are ‘if-else-if’ statements, deploying them onto simpler devices is relatively quick when compared to Neural Networks. While decision trees are simple, they are also crude, which means that the data-set supplied to the decision tree must be properly generated with the necessary features. Due to the non-stationarity of the signal, a multi-scale wavelet approach was chosen. As discussed before, a single-scale CWT would not suffice as the signal was non-stationary. Here, the scale of the wavelet was adjusted to accommodate all possible frequencies generated from the gearbox for all sensor channels. This would generate a large feature set, which would then be abstracted by means of descriptive statistics. The statistical features were tagged with the different gearbox conditions and used to train two decision trees and one pattern recognition algorithm. An application was built to test the capability of the proposed approach. The application using the proposed approach was capable of classifying the faults within the wind turbine gearbox located in a remote location.

Experimental Analysis

Experiment Setup

A customized miniature wind turbine planetary gearbox with an overall gear ratio of 1:100 was designed and constructed (see Fig. 1). The gearbox consisted of three stages: two planetary stages and one parallel stage. Each planetary stage featured a gear ratio of 1:5 and the parallel stage featured a gear ratio of 1:4. A variable frequency drive (VFD) was used to vary the motor speed to simulate the random changes in wind speed that a regular wind turbine would experience. The speed of the low-speed shaft was varied from 8 to 18 RPM. The inputs to the VFD were random analogue signals generated from a computer using a random number generator. This induced non-stationary loading on the wind turbine gearbox. Two tri-axis piezoelectric accelerometers were used to capture the signal from two distinct points of the gearbox (Fig. 2).
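The stage ratios quoted above multiply to the stated overall ratio. The sketch below checks this and estimates the corresponding high-speed shaft range, assuming ideal (slip-free) speed multiplication from the low-speed shaft to the high-speed shaft. The names `STAGE_RATIOS`, `overall_ratio` and `hss_speed_rpm` are illustrative only.

```python
STAGE_RATIOS = (5, 5, 4)  # two planetary stages (1:5 each) and one parallel stage (1:4)


def overall_ratio(ratios):
    """Multiply the individual stage ratios into the overall gearbox ratio."""
    total = 1
    for r in ratios:
        total *= r
    return total


def hss_speed_rpm(lss_rpm, ratios=STAGE_RATIOS):
    """Ideal high-speed shaft speed for a given low-speed shaft speed (assumption)."""
    return lss_rpm * overall_ratio(ratios)


print(overall_ratio(STAGE_RATIOS))          # overall step-up factor
print(hss_speed_rpm(8), hss_speed_rpm(18))  # ideal HSS speed range in RPM
```

Under this assumption, the 8 to 18 RPM range of the low-speed shaft corresponds to roughly 800 to 1800 RPM at the high-speed shaft.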

Fig. 1 Wind turbine planetary gearbox

Fig. 2 Sensor position on wind turbine planetary gearbox

The accelerometers were mounted using an acrylic-based adhesive. Since all gears were straight-cut gears, the Z axis (axial direction) did not carry much information compared with the X and Y axes and was neglected. As the study focused on developing a methodology for a real-time condition monitoring system, an NI 9134 controller with an NI-USB 4432 ADC was used for data acquisition. The axes were designated as Intermediate Speed Stage (ISS X-axis, ISS Y-axis) and Low Speed Stage (LSS X-axis, LSS Y-axis). The sampling rate was set to 22.05 kHz and the sample length per file was set to 131,072 data-points, which translates to approximately 6 s of vibration data per file. This was done to ensure that at least one full rotation of the low-speed shaft would be captured in a file irrespective of the gearbox operating speed. A total of 15 such files were collected for each fault. This summed up to approximately 90 s of vibration data per fault.
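The record lengths quoted above follow directly from the sampling rate and the sample count. A minimal sketch of the arithmetic (the constant and function names are illustrative):

```python
FS_HZ = 22050              # sampling rate stated in the text
SAMPLES_PER_FILE = 131072  # equals 2**17
FILES_PER_FAULT = 15


def file_duration_s(samples=SAMPLES_PER_FILE, fs_hz=FS_HZ):
    """Duration of one recorded file in seconds."""
    return samples / fs_hz


def total_duration_s(files=FILES_PER_FAULT):
    """Total recorded time per fault condition in seconds."""
    return files * file_duration_s()


print(f"{file_duration_s():.2f} s per file, {total_duration_s():.1f} s per fault")
```

The per-file duration comes out slightly under 6 s, so the 15 files amount to slightly under 90 s per fault.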

Fault Simulation and Data Acquisition

A series of single and multi-component faults were simulated on the gears and bearings (see Table 1). All faults were created using wire-cut EDM. The diameter of the wire was 0.2 mm and the depth of each fault was maintained at 2 mm. A single-component fault was defined as a component in operation with only one induced fault at a time. A multi-component fault was defined as independent components in operation having faults simultaneously, or a single component in operation having multiple faults. The bearing chosen was an SKF SYJ20TF. The gear was a 24-tooth, 1.5 mm module gear made of mild steel. Figure 3 shows the bearing with a simulated fault (IO). Figure 4 shows the high-speed shaft pinion with a simulated fault (HRC). Figures 5 and 6 show the time history (90 s) of non-stationary vibration data acquired for the healthy and IOHR cases respectively. Figures 5 and 6 were generated by appending the individual files, which summed up to 90 s of run-time. This was done to ensure that the signal time history was non-stationary. The non-stationary nature of the signal was confirmed in both the time and frequency domains. In the time domain, it was confirmed by observing the fluctuations in amplitude across six segments of 15 s each. It can be observed that the amplitude varies considerably between segments. In the frequency domain, the signal was first divided into segments, where each segment contained one full rotation of the LSS input shaft. The segmentation was done by means of the speed sensor attached to the gearbox, which also doubled as a simple indexing sensor. A few segments were extracted from the main signal, which spanned a total of 90 s. In order to visually confirm the frequencies, the signals were first passed through a filter which removed the LSS and ISS component frequencies, thereby passing only the High Speed Stage (HSS) component frequencies. Figure 7 shows the raw waveform of each segment and the FFT of the respective segments. Here, the signal analyzed belonged to the Outer Race & High Speed Pinion Root Crack (ORHR) class. It can be observed that the peak frequencies of the segments shift considerably. Hence, the signals were categorized as non-stationary.

Table 1 Seeded faults information

Fig. 3 Bearing with seeded fault

Fig. 4 Gear with seeded fault

Fig. 5 Healthy waveform

Fig. 6 HRC waveform

Fig. 7 FFT of ORHR waveform

Data Pre-Processing

Wavelet Feature Extraction

The raw vibration data can hardly be used to distinguish between a healthy and a faulty signal. As such, relevant distinguishable features need to be extracted from the healthy and faulty vibration data collected. Since all signals were non-stationary, CWT was used to extract the wavelet coefficients. Figure 8 shows the methodology followed.

Fig. 8 Methodology

CWT excels in time-frequency analysis and in filtering time-localized events, which correspond to any potential vibration signature emitted by a component. The extracted wavelet features can be used to distinguish a faulty vibration signal from a healthy signal. Multiple scales of the same wavelet were used to extract information from the raw signal. This ensured that all relevant component signatures would be captured irrespective of the speed. As the CWT scales could be used to target specified frequency ranges, they allowed the signal to be sampled at different frequencies. This eliminated the need for downsampling and denoising, as only the frequency information falling within the scale’s range would be captured, thereby filtering out the frequency information outside the scale’s range. A moving average filter was initially used to denoise the signal before computing the CWT coefficients. However, it was found that the denoising did not have any significant effect on the classification performance, and it was dropped so as to reduce computation time. The formula used to calculate the CWT coefficients is given in Eq. 1. Here, W is the wavelet coefficient, S the scale, T the translation, Sg the signal, Wc the wavelet conjugate and t the time [21].

$$W\left(S,T\right)=\frac{1}{\sqrt{S}}{\int }_{-\infty }^{\infty }Sg\left(t\right)\,Wc\left(\frac{t-T}{S}\right)dt$$
(1)
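Eq. 1 can be approximated numerically by replacing the integral with a sum over samples. The sketch below is one such discretization, not the implementation used in the study. It assumes a real-valued Mexican hat mother wavelet (so the conjugate equals the wavelet itself), a unit sample spacing by default, and truncation of the wavelet beyond five scale widths. The function names are illustrative.

```python
import math


def mexican_hat(x):
    """Real-valued Mexican hat (Ricker) mother wavelet."""
    c = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return c * (1.0 - x * x) * math.exp(-0.5 * x * x)


def cwt_row(signal, scale, dt=1.0):
    """Discretized Eq. 1 for one scale S: W(S, T) at every sample position T."""
    n = len(signal)
    half = int(math.ceil(5 * scale / dt))  # wavelet is negligible beyond |x| > 5
    row = []
    for k in range(n):                     # translation T = k * dt
        acc = 0.0
        for j in range(max(0, k - half), min(n, k + half + 1)):  # t = j * dt
            acc += signal[j] * mexican_hat((j - k) * dt / scale)
        row.append(acc * dt / math.sqrt(scale))
    return row


def cwt(signal, scales, dt=1.0):
    """One row of coefficients per scale; each row is as long as the signal."""
    return [cwt_row(signal, s, dt) for s in scales]


signal = [0.0] * 101
signal[50] = 1.0                           # a single transient "event"
coeffs = cwt(signal, scales=(1, 2, 4, 8))  # hypothetical scale set
print(len(coeffs), len(coeffs[0]))
```

Each scale yields as many coefficients as there are input samples, which is why a multi-scale analysis of a long record quickly produces a very large coefficient matrix.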

Wavelet Selection

To identify the best mother wavelet from the family of wavelets available, wavelet selection was performed. The minimum Shannon entropy criterion was used to narrow down the wavelet selection [19]. The Shannon entropy was calculated using Eqs. 2 and 3 [21, 22].

$$\text{Shannon entropy}=-{\sum }_{i=1}^{m}{p}_{i}\,{\log}_{2}\,{p}_{i}$$
(2)

Here \({p}_{i}\) stands for the probability distribution of energy for wavelet coefficients which can be defined as follows

$${p}_{i}=\frac{{\left|{C}_{n,i}\right|}^{2}}{E\left(n\right)}$$
(3)
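Eqs. 2 and 3 can be evaluated in a few lines. The sketch below assumes that E(n) is the total energy of the coefficients at scale n, so that the values of p_i sum to one, and that terms with p_i = 0 contribute nothing to the sum. The function names are illustrative.

```python
import math


def energy_distribution(coeffs):
    """Eq. 3: share of the total energy E(n) carried by each coefficient."""
    energies = [c * c for c in coeffs]
    total = sum(energies)  # E(n), assumed to be the total energy at this scale
    if total == 0.0:
        raise ValueError("all coefficients are zero; distribution undefined")
    return [e / total for e in energies]


def shannon_entropy(coeffs):
    """Eq. 2: entropy in bits of the coefficient energy distribution."""
    return -sum(p * math.log2(p) for p in energy_distribution(coeffs) if p > 0.0)


print(shannon_entropy([1.0, 1.0, 1.0, 1.0]))  # energy spread evenly
print(shannon_entropy([0.0, 0.0, 3.0, 0.0]))  # energy in a single coefficient
```

A wavelet that concentrates the signal energy in few coefficients gives a low entropy, which is the rationale for preferring the wavelets with the minimum value.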

A total of six wavelets, namely Discrete Meyer, Mexican hat, Morlet, db1, Haar and coif1, were analyzed to select the mother wavelet best suited to the data at hand. The healthy signal shown in Fig. 5 was used as the raw signal to extract entropy information, as this was considered to be the default state of the gearbox. Figure 9 shows the Shannon entropy values for different scales for all examined wavelets. The Discrete Meyer, Mexican hat and Morlet wavelets were chosen as they had the lowest Shannon entropy values. The scale factor of a wavelet is used to stretch or compress the wavelet before it is translated along the signal. When the scale factor is low, the wavelet is compressed, resulting in a detailed representation of the signal. However, a wavelet with a low scale factor may not last for the entire duration of an ‘event’. On the other hand, a higher scale factor stretches the wavelet, so the resulting representation contains less detail but may last for the whole event. Since the signal is non-stationary, the length of an event can vary significantly. Hence, using a single wavelet scale alone may not be sufficient. The scale must be varied to match the signal in order to extract the maximum possible information from it. Thus, the CWT coefficients were extracted from the signals for all fault conditions analyzed using multiple scales.
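The link between scale and frequency can be made concrete with the usual pseudo-frequency relation, in which the frequency targeted by a scale is the wavelet's centre frequency multiplied by the sampling rate and divided by the scale. The sketch below assumes a normalized centre frequency of 0.25 for the Mexican hat wavelet and a 22.05 kHz sampling rate. Both are passed in as parameters, and the centre frequency in particular is an assumed value rather than one reported in the study.

```python
def pseudo_frequency_hz(scale, fs_hz, centre_freq):
    """Approximate frequency (Hz) targeted by a wavelet at the given scale."""
    return centre_freq * fs_hz / scale


FS_HZ = 22050.0            # assumed sampling rate
MEXH_CENTRE_FREQ = 0.25    # assumed normalized centre frequency (cycles/sample)

for scale in (1, 10, 100, 200):
    f = pseudo_frequency_hz(scale, FS_HZ, MEXH_CENTRE_FREQ)
    print(f"scale={scale:3d}  pseudo-frequency={f:9.2f} Hz")
```

Low scales target the high-frequency, short-lived content, while high scales target slower components that last longer.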

Fig. 9 Shannon entropy of all chosen wavelets for all chosen scales

However, the wavelet coefficients generated using multiple scales made the sample space very large for the proposed machine learning model to work with. Moreover, the computation requirements were also very high. As such, the CWT coefficients extracted from the acquired signals were ‘compressed’ using descriptive statistics before being passed as input to the machine learning algorithms. This reduced the size of the data-set from a matrix of 96,000 × 131,072 to 96,000 × 14. The final data-set consisted of 96,000 instances in total (15 files × 4 axes × 8 conditions × 200 scales). The following attributes were provided for each instance: Mean, Max, Min, Median, Mode, Standard Deviation, Sample Variance, Kurtosis, Skewness, Range, RMS, Sum, Axis ID (range of 1 to 4) and condition.
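The compression step can be sketched as follows: each row of CWT coefficients (one file, one axis, one scale) is reduced to twelve descriptive statistics, to which the axis ID and the condition label are appended. This is a sketch under stated assumptions: kurtosis is computed as excess kurtosis from population moments, and the mode is taken as the most frequent value. The study does not state which conventions were used, and the function names are illustrative.

```python
import math
import statistics
from collections import Counter


def describe(coeffs):
    """Reduce one row of CWT coefficients to twelve descriptive statistics."""
    n = len(coeffs)
    mean = statistics.mean(coeffs)
    variance = statistics.variance(coeffs)  # sample variance (n - 1 denominator)
    # Central moments (n denominator) used for skewness and kurtosis.
    m2 = sum((c - mean) ** 2 for c in coeffs) / n
    m3 = sum((c - mean) ** 3 for c in coeffs) / n
    m4 = sum((c - mean) ** 4 for c in coeffs) / n
    return {
        "mean": mean,
        "max": max(coeffs),
        "min": min(coeffs),
        "median": statistics.median(coeffs),
        "mode": Counter(coeffs).most_common(1)[0][0],
        "std": math.sqrt(variance),
        "variance": variance,
        "kurtosis": m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0,  # excess kurtosis
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "range": max(coeffs) - min(coeffs),
        "rms": math.sqrt(sum(c * c for c in coeffs) / n),
        "sum": sum(coeffs),
    }


def make_instance(coeffs, axis_id, condition):
    """Build one 14-attribute instance: 12 statistics, axis ID and condition."""
    row = describe(coeffs)
    row["axis_id"] = axis_id
    row["condition"] = condition
    return row


N_INSTANCES = 15 * 4 * 8 * 200  # files x axes x conditions x scales
print(N_INSTANCES, len(make_instance([1.0, 2.0, 3.0, 4.0, 5.0], 1, "Healthy")))
```

Whatever the length of the coefficient row, the resulting instance always has fourteen attributes, which is what brings the matrix width down from 131,072 to 14.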

Results and Discussion

Wavelet Analysis and Classification performance

Since the signals were non-stationary, a single scale may not be sufficient to extract useful information. To test this, both algorithm types were provided with signals from all healthy and faulty classes extracted with different scales. The algorithms were supplied with a single feature vector (signals extracted with the CWT scale individually set to 1, 2, 3, 4 etc.) and with a group of feature vectors (scales banded together, such as scales 1–5, 1–10 etc.). As stated before, the scale information was compressed using descriptive statistics to reduce its size. Both classes of algorithms, ANN and DT, were employed in this phase of the study. The ANN model (Fig. 10) used for the study was a shallow feed-forward neural network with a total of three layers. The input layer had fourteen neurons, which corresponded to the number of attributes of the data-set. The hidden layer had ‘N’ neurons, where ‘N’ was set to 10, 100 and 1000. The hidden layer was activated using a sigmoid function. The output layer had eight neurons, one for each class (IO, IOHR etc.) in Table 1. The data-set was divided into a training set, a testing set and a validation set at a ratio of 70:15:15. The maximum number of epochs was set to 1000 and the maximum number of validation checks was set to six.
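To give a sense of the size of the networks described above, the sketch below counts the trainable parameters of a 14-N-8 network and the sizes of a 70:15:15 split. It assumes fully connected layers with one bias per neuron and, for the split, the full 96,000-instance data-set. Neither detail is stated explicitly in the text, and the function names are illustrative.

```python
def mlp_parameter_count(n_inputs, n_hidden, n_outputs):
    """Weights plus biases of a fully connected network with one hidden layer."""
    hidden = (n_inputs + 1) * n_hidden   # one bias per hidden neuron
    output = (n_hidden + 1) * n_outputs  # one bias per output neuron
    return hidden + output


def split_sizes(n_instances, percentages=(70, 15, 15)):
    """Sizes of the three subsets; the last one absorbs any rounding remainder."""
    first = n_instances * percentages[0] // 100
    second = n_instances * percentages[1] // 100
    return first, second, n_instances - first - second


for n_hidden in (10, 100, 1000):
    print(n_hidden, mlp_parameter_count(14, n_hidden, 8))
print(split_sizes(96000))
```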

Fig. 10 ANN model layout

Under the Decision Tree (DT) class, two classification algorithms, C4.5 and random tree, were chosen such that their basic operations were fundamentally different. The C4.5 [23] (J48) algorithm builds a decision tree from a set of training data using the concept of information entropy and prunes the final tree, whereas the random tree [24] builds a tree that considers a set of randomly chosen attributes at each node and does not prune the tree. Results from both algorithms were cross-checked to ensure that the probability of over-fitting was minimal. Five-fold cross-validation was used to further reduce the chance of over-fitting. WEKA (Waikato Environment for Knowledge Analysis), an open-source machine learning tool, was used to perform this process. Both algorithms, C4.5 and random tree, were run 3 times by shuffling the data-set and enforcing five-fold cross-validation during every run. This was done to minimize the possibility of over-fitting and to obtain a more realistic estimate of the classification capability. Each algorithm generated a total of 6000 results (400 scales [200 single scales and 200 scale bands] × five-fold cross-validation × 3 runs) per wavelet, which amounts to a total of 18,000 results for all 3 considered wavelets.
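The shuffle-then-cross-validate procedure and the resulting number of runs can be sketched as follows. This is a plain (unstratified) five-fold partition written for illustration. It does not attempt to reproduce WEKA's internal fold assignment, and the function name and seed values are hypothetical.

```python
import random


def k_fold_splits(n_instances, k=5, seed=0):
    """Shuffle instance indices and return k (train, test) index pairs."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)       # a new seed gives a new shuffle
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    splits = []
    for i, test_fold in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test_fold))
    return splits


# 400 scale settings x 5 folds x 3 shuffled runs, for each of 3 wavelets
RESULTS_PER_WAVELET = 400 * 5 * 3
TOTAL_RESULTS = RESULTS_PER_WAVELET * 3

for run_seed in (0, 1, 2):                     # three shuffled runs
    for train, test in k_fold_splits(20, k=5, seed=run_seed):
        assert not set(train) & set(test)      # no instance is in both subsets
print(RESULTS_PER_WAVELET, TOTAL_RESULTS)
```

Every instance is used for testing exactly once per run, so the reported accuracy is never measured on data the tree was trained on in that fold.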

Unfortunately, the compression, which reduced the matrix size from 96,000 × 131,072 to 96,000 × 14, negatively affected the ANN. Training accuracies were very low, in the order of 15% (N = 10) to 22% (N = 1000), and testing accuracies were even lower. This outcome is most likely attributable to the low number of input nodes for such a large data-set. However, the DT algorithms responded positively to the abstraction and were hence pursued further. Figure 11 shows the classification accuracy of both DT algorithms for the Mexican hat wavelet. The Mexican hat wavelet was selected as both classification algorithms scored their highest classification accuracy with it as the pre-processor (Table 2). It can be clearly observed that as the number of scales was increased, the classification capability improved drastically (see Fig. 11). However, as the number of scales grouped together increased, the computation time required increased drastically without a proportional impact on classification accuracy.

Fig. 11 Classification accuracy for multiple scales

Table 2 Classification accuracy for multiple wavelets with different algorithms

Hence, in the present investigation, the number of scales was set to 200 as a trade-off between classification accuracy and computation time. The signals for the healthy and multi-component fault conditions were observed in detail. All signals analyzed were 6 s long. Figure 12 shows the scalogram, which indicates the energy levels of the CWT coefficients for scales up to 100 for the healthy signal. It was found that scales above 100 did not carry much visual information. The wavelet coefficient energy appeared to be dominant up to scale 56, after which it began to fade. It can be noted that the energy contained in the faulty component signal (Fig. 13) was significantly higher than that of the healthy signal. This can be observed from the energy associated with the wavelet coefficients, which is dominant up to scale 90 before it begins to fade, compared with scale 56 in the healthy signal. It can also be noted that the energy towards the lower frequency ranges has increased. This can be confirmed by the increased plot density towards the higher scales. Although scales above 100 did not carry much significance from a visual interpretation, scales between 1 and 200 were used to train and test the machine learning model, as the DT algorithms clearly benefited from the large number of feature vectors (Fig. 11).

Fig. 12 Scalogram for Healthy condition

Fig. 13 Scalogram for fault: IO

Another point to note was that the random tree algorithm occasionally dips in classification accuracy, unlike the J48 decision tree algorithm. Tables 3 and 4 tabulate the J48 algorithm’s performance while classifying the ‘Mexican hat 1–200 scale band’. Table 3 shows the detailed accuracy by class, which can be used to identify the classes (Table 1) that reduce the classification accuracy through misclassification within their own class or into other classes. The TP rate, or true positive rate, should approach 1 and the FP rate, or false positive rate, should approach 0. From Table 3, it can be noted that the TP and FP rates varied only marginally across classes, which indicates that the faults did not interfere with one another. Precision, recall and F-measure are performance parameters of the algorithm. Precision is the ratio of relevant retrieved instances to all retrieved instances, whereas recall is the ratio of relevant retrieved instances to the total number of relevant instances. Matthews’s Correlation Coefficient (MCC) is a measure of the quality of the classification. It ranges between −1 and +1. A coefficient of +1 represents a perfectly correct classification, 0 represents a classification no better than random, and −1 indicates total disagreement between classification and observation. A ROC (Receiver Operating Characteristic) curve is a plot that depicts the diagnostic ability of the classifier system. The ROC curve is obtained by plotting the true positive rate against the false positive rate, whose corresponding values are shown in Table 3. Both the ROC and PRC (precision-recall curve) scores must be above 0.5 to ensure that the classification process is not occurring randomly. Table 4 shows the confusion matrix generated by the J48 algorithm. The diagonal elements represent the correctly classified instances. A quick check of the ‘TP rate’ in Table 3 reveals that all classes were well above 0.8, which yielded a good classification accuracy. The ROC score of 0.9 is also high, which indicates that the classifier is not classifying the instances randomly. While analyzing the classes, a notable observation could be made: the ‘Healthy’ class had the lowest ‘TP rate’. This was fairly consistent for a few observations. However, due to the large number of results being analyzed, only the final classification accuracy was taken.
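The per-class measures discussed above can all be derived from a confusion matrix by treating one class as positive and all others as negative. The sketch below uses a small hypothetical three-class matrix (rows are actual classes, columns are predicted classes), not the values of Table 4, and the function names are illustrative.

```python
import math


def _ratio(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def class_metrics(cm, k):
    """One-vs-rest metrics for class k of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]                        # class k correctly classified
    fn = sum(cm[k]) - tp                 # class k assigned to another class
    fp = sum(row[k] for row in cm) - tp  # other classes assigned to class k
    tn = total - tp - fn - fp
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)
    return {
        "tp_rate": recall,
        "fp_rate": _ratio(fp, fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        "mcc": _ratio(tp * tn - fp * fn,
                      math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


EXAMPLE_CM = [[8, 1, 1],  # hypothetical counts for illustration only
              [2, 7, 1],
              [0, 1, 9]]

for k in range(len(EXAMPLE_CM)):
    print(k, class_metrics(EXAMPLE_CM, k))
```

A class with a low TP rate shows up as small diagonal entries relative to the row sum, which makes it easy to see which other classes absorb its instances.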

Table 3 Detailed accuracy by class
Table 4 Confusion matrix

Development of an Automated system

To automate the process, an application was built in the MATLAB environment. A decision tree model was trained with the same data-set and integrated into the application. To put the proposed method to the test, the application was set up to receive the raw signals from a remotely located wind turbine gearbox setup over a network connection. Figure 14 shows the operation of the application.

Fig. 14 Application operation

The NI 9134 controller gathers raw vibration data from the gearbox and sends it to the file server using File Transfer Protocol (FTP), Secure File Transfer Protocol (SFTP) or Common Internet File System (CIFS). The data processor (any system with the application) downloads the files from the file server to local storage. Once all files are downloaded, the raw signals are pre-processed to extract the CWT coefficients, following the methodology used in the study (Fig. 8). After the CWT coefficients are extracted for all four axes, the coefficients are ‘compressed’ using descriptive statistics and then written to a test file for the machine learning algorithm to validate. The validation process begins once the test file is populated with the required number of rows. The trained tree had a total of 16,075 nodes. Figure 15 shows a sample of the structure of the trained tree. The root node used the standard deviation attribute. Figure 16 shows the frequency of attributes in the full tree. Here, it can be observed that the attributes Mode, Sample Variance and RMS did not carry any useful information for the algorithm to work with and were hence not selected. The pre-processing and processing, i.e., feature extraction and feature reduction, can take up most of the computation time. It was observed that the testing/validation time was always under 1 s for any number of files chosen (the file limit was 1–5). The application was run in a Linux environment on a Q6600 with 8 GB of RAM and a 100 Mbps network link.

Fig. 15 Sample output of the trained decision tree

Fig. 16 Attributes vs occurrences

The application was deployed as a standalone executable. This allows the application to be ported to any Linux-capable device. Figure 17 shows the GUI of the application operating in a Linux environment. The application performs all the steps mentioned above to test the signal. However, before declaring the fault, it splits the probability of the fault across the classes and displays it in the lower right corner of the application window. The ‘probability’ shown in the application is a similarity check which tells how similar the signal is to the predefined faults provided to the algorithm during training. The screenshot (Fig. 17) is the result of a healthy signal tested, with the highest match of 26.5%. However, if the test signal were for an HR case, there would be a very high chance of misclassification, as the difference between the scores of HR and OR is very small. Here, more files (file limit > 1) should be used instead of a single file to increase the separation of the classes. This can reduce the chance of a false positive at the expense of computation time. The standalone application developed is capable of automatically distinguishing the healthy and faulty vibration signatures obtained from a remotely located wind turbine test rig with minimum human intervention using the proposed method.
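The text does not state how the displayed ‘probability’ is computed. One plausible reading, sketched below purely as an assumption, is that every instance row of the test file is classified individually and the share of rows assigned to each class is reported. The function names and the example labels are hypothetical.

```python
from collections import Counter


def vote_shares(predicted_labels):
    """Fraction of classified instance rows assigned to each class (assumption)."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: n / total for label, n in counts.items()}


def declare(predicted_labels):
    """Return the leading class, its share, and its margin over the runner-up."""
    ranked = sorted(vote_shares(predicted_labels).items(),
                    key=lambda item: item[1], reverse=True)
    best_label, best_share = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_label, best_share, best_share - runner_up


# Hypothetical per-row predictions pooled from one or more files
labels = ["Healthy", "Healthy", "OR", "HR"]
print(declare(labels))
```

Under this reading, pooling the rows of several files before computing the shares is what widens the margin between the leading class and the runner-up.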

Fig. 17 Application screenshot

Conclusion

The present work discusses the development of an automated, reliable method to monitor the health of a remotely located wind turbine planetary gearbox. This was achieved using multiple scales of the continuous wavelet transform as a feature extractor. Descriptive statistics were then used to abstract the features, reducing the size of the data-set and thereby the computation requirements. The method was able to accurately predict the multi-component faults of the gearbox under non-stationary loading conditions. From this study, the following notable points have emerged.

A) For a machine learning based approach to multi-component fault diagnosis, a single feature vector is generally insufficient to distinguish between fault signatures effectively, as fewer correlations can be drawn. In the present study, the information from a single wavelet scale (feature vector) could not classify the multi-component fault signatures accurately. However, grouping the scales together in bands increased the number of feature vectors, thereby providing more information and generating more instances for the algorithm to work with, which in turn increased the chances of a correctly classified instance.

B) The scale bands are also effective when the signal is non-stationary. This can be attributed to the fact that at least one scale would have the optimal size to capture a specific event of a component in a non-stationary signal, thereby increasing the chances of a correctly classified signature.

C) The present study found that the use of an Artificial Neural Network resulted in very poor performance for the proposed method. This can be attributed to the fact that the number of input nodes was lower than required to ensure a reliable model. Changing the number of hidden neurons did not yield much improvement.

D) The present study found that the use of a decision tree algorithm resulted in minimal training and testing time. Decision tree algorithms are also less demanding on computation resources, which allows applications featuring this method to be deployed on portable devices with limited processing power.

Taking the above observations into consideration, an application was built in MATLAB to automate the process and test the method. A trained model of the decision tree was supplied to the application. The raw vibration signals were processed and classified accurately without any human intervention.