Introduction

Rotating machines are widely employed in modern industry. However, they are frequently subject to a wide range of conditions, such as frequent load changes and high speeds (Qian et al. 2019), which result in performance degradation and mechanical failures (Li et al. 2020b). Consequently, a key industrial concern is to ensure system effectiveness and reliability through accurate fault diagnosis (Yu et al. 2019), which allows unexpected failures and unscheduled downtime to be minimized, avoiding unnecessary extra costs.

Applications involving fast and intelligent fault diagnosis methods are of significant interest, as can be seen in works such as (Li et al. 2019; Wang et al. 2020; Martins et al. 2021). A variety of sensors have also been employed to measure dynamic responses (Goyal et al. 2019). A possible non-invasive solution for effectively measuring the different levels of degradation is the analysis of vibration signals. Note that failure recognition and detection from mechanical vibration analysis enables proper maintenance measures at early stages (Glowacz 2018). The most frequent failures affecting the useful life of rotating machines are imbalance and misalignment (Bai et al. 2019; Guan et al. 2017).

Misalignment is usually due to improper installation, thermal variation, asymmetric loads, amongst others (Hujare and Karnik 2018). These result in increased loads on bearings and couplings, the parts connected to the shaft. Misalignment usually worsens with continuous operation and requires periodical monitoring in order to be corrected (Verma et al. 2014). One possible strategy for determining misalignment is to employ vibration spectrum analysis. This is a reliable method that also enables the identification of imbalance faults. Various methodologies have been applied in the literature addressing this issue, such as (Klausen et al. 2018; Djagarov et al. 2019). For instance, (Yamamoto et al. 2016) proposed using an intelligent algorithm embedded in a Field Programmable Gate Array (FPGA) to correct imbalance faults. The work (Djagarov et al. 2019) designed a Supervisory Control and Data Acquisition (SCADA) system for monitoring electric motor failures in ships.

Other references, such as (William and Hoffman 2011; Yu 2019), successfully applied signal processing methods to fault detection. Recently, many authors, such as (Srinivas et al. 2019; Dekhane et al. 2020), have addressed the problem of measuring, identifying, and quantifying combined faults in rotating machines. Machine learning and statistical techniques also exist for tackling these issues, namely (Yang et al. 2019; Zhang et al. 2020a).

A review of data-driven fault severity assessment in rolling bearings was presented in (Cerrada 2018). The work mentions a series of techniques that can be employed to assess the state of an electric engine based on digital signal processing and intelligent algorithms, namely: artificial neural networks, support vector machines, clustering, Markov models, fuzzy logic, linear discriminant analysis, Gaussian mixture models, and probabilistic approaches. One possible way of developing prognostic systems is to consider the remaining useful life of an asset, which can be estimated by fault classification techniques (Si et al. 2011). Fault classification can be divided into model-based approaches (Srinivas et al. 2019; Wang and Jiang 2018) and data-driven methods (Dekhane et al. 2020; Li et al. 2017), the latter being the focus of this paper. Typically, statistical data-driven approaches for fault classification comprise stages such as (i) data acquisition; (ii) feature extraction; (iii) fault identification; and (iv) fault severity estimation (Martins et al. 2019). Machine learning techniques are prone to overfitting, especially when learning from rare events (Oh and Jeong 2020). Data augmentation schemes can be employed to mitigate this issue (Li et al. 2020c).

In (Jin et al. 2021), a deep learning technique is presented to identify vibration signals composed of simple and combined failures related to bearing faults. The dataset used in that work comprises eight classes: three simple failures, four combined failures, and one class corresponding to normal operating conditions. The algorithm employs active learning in order to overcome a lack of labeled instances. The article also proposes an automatic way of extracting features to reduce the intervention of a specialist in the initial choice of the feature set. The authors additionally apply a feature selection technique to choose the most relevant features and thus reduce the number of input signals to the classifier. The algorithm achieved 100% accuracy, outperforming convolutional neural networks and long short-term memory algorithms.

In (Xiao et al. 2021) a system was designed based on deep learning using a denoising autoencoder to solve the problem of noisy domain shift in failure identification. This work made use of two datasets consisting of acoustic signals, one referring to gear faults and the other to motor faults. Noisy data was generated through additive white Gaussian noise (AWGN) and binary masking. Classification-wise, the proposed algorithm performed well even in the face of contaminated signals with high noise levels. The training time of the proposed algorithm was also lower when compared to other deep learning algorithms.

In (Shao et al. 2017), the authors propose an Auxiliary Classifier Generative Adversarial Network (ACGAN) to create new and realistic synthetic observations directly from sensor data. The method is applied to fault detection and classification in rotating machines. The authors made use of a rotor kit with one accelerometer for data gathering. Six conditions were simulated: normal, stator winding defect, imbalanced rotor, bearing defect, broken bar, and bowed rotor. The minority class had 100 samples, while the rest of the classes had 200 instances. Different combinations of real and generated training data were used to produce 12 different scenarios. The baseline scenario employed 200 samples of real data alongside zero instances of generated data and achieved an accuracy of \(99.80\%\). When 200 real samples were used in conjunction with 200 generated samples, the system produced \(99.93\%\) accuracy. Classification accuracy reached \(100\%\) when 200 real samples were used alongside 600 generated ones.

In the work of (Rashid and Louis 2019), AWGN was used to augment positioning and movement data collected from GPS and gyroscope devices. The sensors were installed in heavy-duty vehicles to evaluate the optimal usage of civil construction equipment through deep learning methods, with the goal of reducing costs in civil construction. The idea of creating a new dataset using data augmentation techniques can also be found in (Rochac et al. 2019). The authors applied AWGN to generate additional training data from an original limited set consisting of infrared camera images and further train different deep learning models. The authors gave special attention to the signal-to-noise ratio (SNR), experimenting with ten different SNR values to demonstrate their influence on accuracy. These results were then compared to those obtained using the Synthetic Minority Oversampling Technique (SMOTE) (Chawla et al. 2002). In the latter, the authors performed experiments using SMOTE to enlarge the minority class after having undersampled the majority classes in order to analyze performance in the ROC space. The experiments were performed with three different classifier algorithms.

In (Arslan et al. 2019), a dataset of humidity, temperature, light intensity, and air quality measurements was preprocessed with the AWGN and SMOTE data augmentation techniques and then used to train a classifier. The results suggested better accuracy when using SMOTE than AWGN for this configuration. The work (Fernández et al. 2018) presents a literature review covering the most relevant aspects of the SMOTE technique. In (Wang 2008), the authors successfully increased classification accuracy by combining SMOTE and a biased SVM applied to four imbalanced datasets available at the UC Irvine (UCI) machine learning repository. The results suggested that classifier sensitivity to minority classes was improved by the SMOTE algorithm. It is also possible to create variations of the SMOTE technique, as proposed by (Li et al. 2011). Instead of selecting among the K-nearest neighbors (K-NN), the authors selected three real random samples to form a triangle. The triangle is then filled with a defined quantity of lines, and each of these lines finally contains a defined amount of synthetic data points. This process was entitled Random-SMOTE, whose objective is to pursue a more uniform distribution of synthetic items throughout the minority class space. In (Ali et al. 2019), the influence on model accuracy was analyzed after SMOTE was applied to enlarge the minority class of a vibration dataset. The results were comparable to those of previous works that used AWGN as the augmentation approach. The authors used a multilayer perceptron (MLP) to classify the rotating machine faults.

The variational autoencoder is an additional data augmentation technique based on deep learning. The method allows for the reconstruction of the created examples in the data space. However, the approach is known for producing distorted reconstructions when the signal is noisy (Burks et al. 2019). The method is also difficult to train due to the required hyperparameter tuning and the high computational cost of execution (Asadi et al. 2009; Shorten and Khoshgoftaar 2019), which requires the use of clusters and/or GPUs.

The generative adversarial network is another deep learning method that has been used for data augmentation in several areas. However, the technique has some limitations, namely: (i) it requires a large amount of original data for training (Yu et al. 2021), which is not always available, as is the case in this research; (ii) it is subject to instability and non-convergence in cases where the generator produces large outputs; and (iii) it may generate examples that are not consistent with the physical nature of the real data (Shorten and Khoshgoftaar 2019; Mikołajczyk and Grochowski 2018).

As mentioned in the previous paragraphs, the performance of deep learning techniques suffers from the lack of training examples concerning failure conditions. Therefore, it is pertinent to propose a data augmentation method for those classes whose instances are lacking, one that: (i) is stable under the parameter adjustment methodology; and (ii) does not require high-performance computational resources. The main contributions of this paper are summarized below:

  1.

    Most of the research focusing on fault diagnosis in rotating machines only considers the identification of single faults. However, in this work, the objective is to identify and differentiate single failures from combined failures. These are situations that can occur in industrial environments. Furthermore, this task is more complex than the identification of isolated faults.

  2.

    Compute the influence on classifier performance of preprocessing approaches such as feature normalization, undersampling, and data augmentation using white noise and SMOTE.

  3.

    Develop a novel hybrid data augmentation method using SMOTE and AWGN to increase the number of minority-class instances with the objective of improving classifier performance.

Fig. 1

ABVT Experimental bench

This paper is structured as follows. Section 2 presents a description of the proposed methodology, detailing the dataset as well as the feature extraction process. The theoretical foundations of the main concepts treated in this research are briefly explained in Sect. 3. Section 4 evaluates the effectiveness of the proposed method. Concluding remarks are reported in Sect. 5.

Case study

Industrial rotating machines are usually involved in production processes. Production stoppages might cause significant financial losses and even damage the equipment, which makes it unfeasible to induce failures in these apparatuses for study purposes. An adequate study of the problems affecting this type of machine requires a large dataset covering different types and severities of breakdowns. Creating such a dataset can be very time-consuming and even impossible for the most critical operating conditions.

In this sense, two approaches can be taken, namely: (i) place the rotating machine on a test bench for the purpose of inserting faults and recording the corresponding vibration signatures; and (ii) employ bench simulators of rotating machines. The former is impractical given the potentially high cost of the machine and the long execution time associated with preparing and assembling the failures, which makes laboratory tests more expensive. The second approach enables the insertion of failures in a more convenient way, resulting in time and execution savings (Villa et al. 2012).

As a result, the Alignment Balance Vibration Trainer (ABVT) experimental bench was employed in this study to produce simple and combined faults. This experimental bench is composed of a 0.25 hp DC motor, two rolling bearings, a thin shaft, a sliding surface, a rigid coupling, and an inertia disc positioned in the center-hung configuration (between the rolling bearings), as shown in Fig. 1. The simulation bench was used in an environment with a controlled temperature in the range of \(22^{\,\circ }\)C to \(27^{\,\circ }\)C. Before the signals were recorded and monitored, the motor was run for 10 minutes to ensure stable operating conditions. Signals presenting vibration values outside the expected range were discarded and replaced with a new recording. The module used to record the vibration and tachometer signals was the NI 9234 signal acquisition module, manufactured by National Instruments. This module converts the analog signals from the sensors into digital voltage or current signals. Its main features are 24-bit resolution, a maximum sampling frequency of 51.2 kHz, 102 dB dynamic range, an anti-aliasing filter, an operating temperature range of [\(-\) 40, 70] \({}^\circ \)C, and signal conditioning for piezoelectric sensors. The \(\hbox {Labview}^\mathrm{TM}\) software was used to implement the interface between the acquisition module and the computer. This interface enables viewing the signals of each channel during the acquisition step to avoid recording errors.

The scenarios studied in this research are: (i) normal behavior; (ii) imbalanced rotor; (iii) imbalanced rotor with added horizontal misalignment; and (iv) imbalanced rotor with added vertical misalignment. Imbalance is provoked in the ABVT by fixing screws on the inertia disc. Vertical misalignment is produced by adding metal plates at the base of the DC motor. Horizontal misalignment is inserted by shifting the base of the motor, with the rotational speed measured using a digital tachometer, as shown in Fig. 2.

Fig. 2

Fault insertion

Table 1 Dataset description

The vibration signals were acquired and stored. Because the acceleration signals are quite noisy, which can negatively affect the fault diagnosis stage, they were filtered by a bandpass filter designed with a Hamming window, with cutoff frequencies of 10 Hz and 1000 Hz. Subsequently, the discriminative characteristics of the signals were extracted as a means of reducing the amount of input information to be presented to the classifiers. The last step was to compare the classification performance of four algorithms. This allowed us to better grasp the effectiveness of the proposed hybrid data augmentation method against AWGN and SMOTE.
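As an illustrative sketch (not the authors' implementation), a bandpass FIR filter with a Hamming window and the stated 10–1000 Hz cutoffs can be built with a windowed-sinc design; the tap count and test tones below are assumptions:

```python
import numpy as np

def bandpass_fir(num_taps, f_low, f_high, fs):
    """Windowed-sinc bandpass FIR: difference of two low-pass filters,
    shaped by a Hamming window."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = (2 * f_high / fs) * np.sinc(2 * f_high / fs * n) \
      - (2 * f_low / fs) * np.sinc(2 * f_low / fs * n)
    return h * np.hamming(num_taps)

fs = 50_000                                     # sampling rate used in the paper
taps = bandpass_fir(501, f_low=10.0, f_high=1000.0, fs=fs)

t = np.arange(0, 0.5, 1 / fs)
x = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 5000 * t)  # in-band + out-of-band
y = np.convolve(x, taps, mode="same")           # 5 kHz component is strongly attenuated
```

In practice one would validate the frequency response (e.g., with `scipy.signal.freqz`) before applying the filter to the recorded signals.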

Dataset

Table 1 presents the details of the dataset produced, which consists of 238 signals. These were recorded by changing the motor rotational speed in 2 Hz steps over the range \(f \in [16,60] \) Hz. The maximum frequency employed is due to the operating limits of the simulation bench. The imbalance values listed in Table 1 indicate the masses (in grams (g)) that were placed on the inertia disc. The horizontal and vertical misalignment measures are in millimeters (mm) and correspond to the displacement of the motor base from its initial position. The ‘Label’ column indicates the class that describes each scenario.

The vibration signals were measured at the internal bearing, which is closer to the DC motor. Digital data was acquired at a sampling frequency of 50 kHz for 3 s. Three uniaxial piezoelectric accelerometers, manufactured by IMI Sensors, were employed to obtain vibration signals in perpendicular directions: axial, horizontal, and vertical. The main characteristics of this sensor are: sensitivity of 100 mV/g (\(\pm \)20%); frequency range of [0.27, 1000] Hz; and acceleration measurement range of [\(-\) 50, 50] g, where g is approximately 9.8 m/s\(^2\). The rotational speed of the motor shaft was measured with the MT-190 tachometer, produced by Monarch Instrument.

Table 2 Extracted features at time and frequency domains, with \(\alpha _K \triangleq \frac{\sqrt{K(K-1)}}{K-2}\)

Feature extraction

One of the main preprocessing steps in fault diagnosis is feature extraction from the vibration signals (Razavi-Far et al. 2017; Xu et al. 2019). The fault signature can be understood as a set of symptoms associated with a defect, and these are directly related to certain features of the vibration signals (Cerrada 2018). Feature extraction also reduces the amount of information to be used as input to the classifier. For context, if this preprocessing step were not used in this research, the classifier would receive 150,000 samples from each of the sensors used. This would unnecessarily increase the computational cost of the classification task and impair its accuracy due to the excess of information (Bramer 2007). In this work, features in the time and frequency domains are used (Pandya et al. 2013; Dhamande and Chaudhari 2018), as shown in Table 2, where:

  • x(n) is the time domain vibration signal;

  • N is the length of the time domain vibration signal;

  • \({\mathbb {E}}\) denotes the expected value operator;

  • \(p(z_n)\) corresponds to the probability of x(n) being equal to the possible values of sequence \(z_n\);

  • s(k) is the vibration signal spectrum obtained by the application of Fast Fourier transform (FFT) in x(n);

  • K is the number of samples of s(k);

  • \(p(z_k)\) corresponds to the probability of s(k) being equal to the possible values of sequence \(z_k\);

  • \(R_f\) is the rotational speed frequency obtained by the FFT of the tachometer;

  • \({A_m}(R_f)\) denotes the maximum value of s(k) at the \(R_f\) of the rotating machine;

  • N/A stands for not applicable;

with the exception of the \(R_f\) indicator, which represents a single feature, each of the remaining indicators in Table 2 is calculated for the axial, horizontal, and vertical directions. This results in 48 time-domain and 60 frequency-domain features which, together with \(R_f\), produce a feature vector with 109 elements.
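As a hedged illustration of this stage (the full indicator set is in Table 2; the indicators below are a small assumed subset), per-axis time-domain indicators can be concatenated into one feature vector per three-axis signal:

```python
import numpy as np

def time_domain_features(x):
    """A few common time-domain indicators (an assumed subset of Table 2)."""
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    mu, sigma = np.mean(x), np.std(x)
    kurt = np.mean((x - mu) ** 4) / sigma ** 4   # non-excess kurtosis
    return [rms, peak, peak / rms, kurt]         # crest factor = peak / rms

# One feature vector per signal: concatenate the per-axis indicators.
rng = np.random.default_rng(0)
signal_3axis = rng.normal(size=(3, 150_000))     # axial, horizontal, vertical (3 s at 50 kHz)
features = np.concatenate([time_domain_features(axis) for axis in signal_3axis])
```

With four indicators and three axes this toy vector has 12 elements; the paper's full set yields 109.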

Features normalization

In statistical studies, normalization is used to standardize data and to optimize data processing (Suarez-Alvarez et al. 2012). In machine learning, normalization plays a significant role when attributes can hinder data processing (e.g., redundant or extreme values). Normalization is a way to standardize and minimize problems that originate from such dispersions or redundancies. The process allows for (Walpole and Myers 2012): (i) effective data processing; and (ii) ignoring inconsistent data. Normalization can improve the performance of classifiers such as SVM, K-NN, and RF (Canbaz and Polat 2019; Sikder et al. 2019).

Preliminary simulations on the dataset employed in this work show that Minimum-Maximum (min-max) normalization performs better than Z normalization. Thus, only min-max normalization was applied in the simulations. This technique, presented in Equation (1), normalizes the values through their minimum and maximum, mapping them to a fixed interval to provide more effective processing (Polat 2020).

$$\begin{aligned} {\mathbf {f}}{\mathbf {e}}_{\text {norm}} = \frac{{\mathbf {f}}{\mathbf {e}}- \min {\left( {\mathbf {f}}{\mathbf {e}}\right) }}{\max {\left( {\mathbf {f}}{\mathbf {e}}\right) } - \min {\left( {\mathbf {f}}{\mathbf {e}}\right) }}, \end{aligned}$$
(1)

where \({\mathbf {f}}{\mathbf {e}}\) is the original feature vector, \(\text {min}{\left( {\mathbf {f}}{\mathbf {e}}\right) } \) is the lowest value of vector \({\mathbf {f}}{\mathbf {e}}\), \(\max {\left( {\mathbf {f}}{\mathbf {e}}\right) }\) is the highest value of \({\mathbf {f}}{\mathbf {e}}\) and \({\mathbf {f}}{\mathbf {e}}_{\text {norm}}\) is the normalized \({\mathbf {f}}{\mathbf {e}}\) vector.
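A minimal sketch of Equation (1), applied here to a single toy vector:

```python
import numpy as np

def min_max_normalize(fe):
    """Equation (1): rescale a feature vector to the [0, 1] interval."""
    fe = np.asarray(fe, dtype=float)
    lo, hi = fe.min(), fe.max()
    return (fe - lo) / (hi - lo)

fe = np.array([2.0, 5.0, 11.0])
fe_norm = min_max_normalize(fe)   # → [0.0, 1/3, 1.0]
```

In practice the scaling is usually applied per feature (column-wise), with the minimum and maximum computed on the training set only to avoid information leakage; the paper does not spell out this detail, so treat it as standard practice rather than the authors' exact procedure.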

Theoretical foundations

This section presents the theoretical background for the development of the hybrid approach, namely: Sect. 3.1 presents an explanation of rotating systems and a respective dynamic model; Sect. 3.2 describes the imbalance whilst Sect. 3.3 details the misalignment effects; Sect. 3.4 presents the data augmentation methodology and Sect. 3.5 elaborates on the classification methods employed.

Mechanical model of rotating machines

In general, a rotor-coupling-bearing system is represented by a second-order differential equation as described by (Desouki et al. 2020):

$$\begin{aligned} \mathbf {M}\ddot{\mathbf {q}}+\mathbf {C}\dot{\mathbf {q}}+\mathbf {K}\mathbf {q}=\mathbf {f}(t), \end{aligned}$$
(2)

where \(\mathbf {M}\) is the mass matrix, \(\mathbf {C}\) is the damping matrix, and \(\mathbf {K}\) is the stiffness matrix. The vector of generalized coordinates is given by \(\mathbf {q}\), with its first and second derivatives with respect to time t given by \({\dot{\mathbf{q}}}\) and \({\ddot{\mathbf{q}}}\), respectively. The external forces are represented by the vector \(\mathbf {f}(t)\).
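To make Eq. (2) concrete, the sketch below integrates a single-degree-of-freedom analogue with illustrative parameter values (not identified from the ABVT rig) and compares the steady-state response with the analytic amplitude of a harmonically forced damped oscillator:

```python
import numpy as np

# Single-degree-of-freedom analogue of Eq. (2): m*q'' + c*q' + k*q = f(t),
# integrated with the semi-implicit Euler scheme.
m, c, k = 1.0, 0.8, 400.0            # mass, damping, stiffness (illustrative)
omega = 10.0                         # forcing frequency (rad/s)
f = lambda t: np.cos(omega * t)      # harmonic excitation force

dt, T = 1e-4, 10.0
q, v = 0.0, 0.0
history = []
for n in range(int(T / dt)):
    a = (f(n * dt) - c * v - k * q) / m   # acceleration from the equation of motion
    v += a * dt                           # semi-implicit Euler update
    q += v * dt
    history.append(q)

# Steady-state amplitude over the last forcing period vs. the analytic value
amp = max(abs(x) for x in history[-int(2 * np.pi / omega / dt):])
amp_exact = 1.0 / np.sqrt((k - m * omega ** 2) ** 2 + (c * omega) ** 2)
```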

Imbalance and misalignment are the main sources of vibration in rotating machinery. The vibration caused by these phenomena may destroy critical parts of the machine, depending on its amplitude. Considering these phenomena to be responsible for the excitation forces perceived in the coupling of the driver and driven shafts, the vector of external forces is given by (Desouki et al. 2020):

$$\begin{aligned} \mathbf {f}(t)=\mathbf {f}_{\text {imb}}(t)+\mathbf {f}_{\text {mis}}(t), \end{aligned}$$
(3)

where \(\mathbf {f}_{\text {imb}}(t)\) is the component due to imbalance and \(\mathbf {f}_{\text {mis}}(t)\) is the component caused by parallel or angular misalignment, or even a composition of them, and t is time (Wang and Jiang 2018; Xu and Marangoni 1994; Wang and Gong 2019).

Fig. 3

AWGN signal addition scheme

Imbalance in rotating machines

According to (Desouki et al. 2020), imbalance occurs when the center of mass of a rotating assembly does not coincide with the center of rotation. ISO 21940-1:2016 defines imbalance as the condition in which vibratory force or motion is imparted to the bearings as a result of centrifugal forces (ISO 2016). The issue is usually attributed to deformations, asymmetries, imperfections in the raw material, and assembly errors caused by an eccentric concentrated mass. The imbalance force is described by:

$$\begin{aligned} \mathbf {f}_{\text {imb}}(t)=mr{\omega }^2, \end{aligned}$$
(4)

where m is the unbalance mass, r is the distance from the center of gravity of the mass to the rotation axis, and \(\omega \) is the angular velocity. Imbalance in rotating machines can be identified by applying signal processing techniques. This fault presents an amplitude at the fundamental frequency of the rotational speed that is much higher than the amplitudes of other harmonics in the radial direction. This issue provokes high vibration amplitudes, which cause stresses in structural supports and can eventually lead to their complete failure (Bloch and Geitner 2005).
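For illustration, Eq. (4) in magnitude form; the mass and radius values below are hypothetical (Table 1 lists the masses used on the disc, but the mounting radius is not stated here):

```python
import math

def imbalance_force(mass_kg, radius_m, speed_hz):
    """Magnitude of Eq. (4): f = m * r * omega^2, with omega = 2*pi*speed."""
    omega = 2 * math.pi * speed_hz
    return mass_kg * radius_m * omega ** 2

# e.g. a hypothetical 20 g screw at 35 mm from the rotation axis, shaft at 60 Hz
f = imbalance_force(0.020, 0.035, 60.0)   # ≈ 99.5 N
```

Note the quadratic growth with speed: doubling the rotational speed quadruples the centrifugal force, which is why imbalance becomes critical at high speeds.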

Misalignment in rotating machines

The alignment condition on rotating machines is given by the relative position of the connected shafts. If their centerlines are coincident, forming a straight line, the rotating machine is considered aligned. Otherwise, there is misalignment, which is usually classified as parallel or offset misalignment, angular misalignment, or more commonly, a combination of both (Hujare and Karnik 2018). The misalignment produces forces and moments, inducing radial and axial vibrations in the system, which can be represented by:

$$\begin{aligned} \mathbf {f}_{\text {mis}}(t)= \mathbf {K}_{\text {c}}\varvec{\Delta e}, \end{aligned}$$
(5)

where \(\mathbf {K}_{\text {c}}\) is the coupling stiffness matrix and \(\varvec{\Delta e}\) is the vector of misalignments, composed of parallel and angular displacements (Wang and Jiang 2018; Wang and Gong 2019). It should be said that the study of rotor misalignment has been limited to a qualitative understanding of the phenomenon. This has been mostly based on experiments, with scarcely successful attempts to develop an effective mathematical model that allows for a quantitative evaluation of this defect (Desouki et al. 2020; Sinha et al. 2004; Lal and Tiwari 2018).

Data augmentation

A common issue when working with supervised data is learning from imbalanced data. This usually happens due to the underrepresentation of a set of classes, i.e., when an uneven number of instances is used to train the machine learning algorithm (Fernández et al. 2018). These are called minority classes. This situation leads to biased models whose accuracy decreases as the imbalance ratio increases. In real-world conditions, more instances representing normal conditions are to be expected than instances deemed abnormal or defective (Chawla et al. 2002). Learning from imbalanced data has thus become an integral part of machine learning (Fernández et al. 2018).

In (Fernández et al. 2018), resampling methods covering undersampling and oversampling were presented. Undersampling techniques refer to the random elimination of samples from the majority classes to make them comparable in size to the smallest ones. However, this approach leads to some problems, since: (i) important instances may be discarded, resulting in a lack of data affecting class characterization; (ii) the higher the imbalance ratio, the more samples must be discarded, which may reduce the ability to generalize; and (iii) the reduction of the training set increases the variance of the classifier (Chawla et al. 2002; Dal Pozzolo et al. 2015). In contrast, oversampling methods rely on increasing the instances of minority classes in order to make them comparable in size to the largest ones. The candidate samples are replicated based on some weight criteria.
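A minimal sketch of random undersampling (standard library only; not the paper's exact procedure), trimming every class to the minority class size:

```python
import random
from collections import Counter

def undersample(X, y, seed=0):
    """Randomly drop majority-class samples until every class matches the minority size."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(v) for v in by_class.values())
    Xb, yb = [], []
    for label, items in by_class.items():
        for xi in rng.sample(items, n_min):   # sample without replacement
            Xb.append(xi)
            yb.append(label)
    return Xb, yb

X = list(range(10))
y = ["A"] * 7 + ["B"] * 3            # 7 majority vs 3 minority samples
Xb, yb = undersample(X, y)
counts = Counter(yb)                 # both classes reduced to 3 samples
```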

More elaborate techniques are commonly referred to as data augmentation techniques (Fernández et al. 2018; Chawla et al. 2002), and these will be the focus of the following sections. Namely: “Additive white gaussian noise technique” section presents the AWGN method; “Synthetic minority oversampling technique” section describes the SMOTE approach; the details for the hybrid data augmentation method proposed in this work can be found in section “Proposed hybrid data augmentation method”.

Additive white gaussian noise technique

AWGN can be used in the data augmentation process; it is applied in the data space instead of the feature space, as opposed to SMOTE (Fernández et al. 2018). Figure 3 represents the AWGN method, where a zero-mean Gaussian noise is added to the input vibration signal (McClaning and Vito 2000; de Lima et al. 2013) to create a new vibration signal. The signal-to-noise ratio (SNR), presented in Equation (6), reflects the relation between the input signal average power (\(P_{\text {signal}}\)) and the average noise power (\(P_{\text {noise}}\)) in dB:

$$\begin{aligned} {\text {SNR}}_{{{\text {dB}}}} = 10\log \left( {\frac{{P_{{{\text {signal}}}} }}{{P_{{{\text {noise}}}} }}} \right) . \end{aligned}$$
(6)

Due to the random character of the added noise (Diniz et al. 2010), the original input signal can be transformed as many times as needed to make the augmented minority class comparable in size to the larger classes. This can be performed by adding independent random noise to each new copy of the vibration signal. In this research, we employed \(\text {SNR}_{\text {dB}}=15\) to create the noisy signal versions.
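A sketch of this augmentation step, scaling the noise variance so that Eq. (6) yields the requested SNR (here the paper's 15 dB); the test tone is an assumption:

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Add zero-mean Gaussian noise so that Eq. (6) gives the requested SNR in dB."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

rng = np.random.default_rng(42)
x = np.sin(2 * np.pi * 30 * np.arange(0, 1, 1 / 5000))   # clean 30 Hz tone
x_aug = add_awgn(x, snr_db=15, rng=rng)                  # one augmented copy
```

Calling `add_awgn` repeatedly with the same input yields a different copy each time, which is how the minority class can be grown to the desired size.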

Synthetic minority oversampling technique

SMOTE was initially proposed in (Chawla et al. 2002) as an option to increase the proportion of minority classes in datasets. Its approach consists of creating fictitious, or synthetic, observations between two real observations. As commented in (Fernández et al. 2018; Chawla et al. 2002), this process is applied in the feature space instead of the data space, as occurs with other oversampling methods.

Fig. 4

Minority class feature space represented in a simplified two-dimensional scheme. The blue circles correspond to the real observations, the orange circles are the synthetic observations, the blue arrows are the real feature vectors, and the orange arrows are the synthetic feature vectors (Colour figure online)

Figure 4 presents a two-dimensional representation of the creation of the synthetic observations and the respective feature vectors. This technique can be applied to a multidimensional feature space. It is possible to create as many synthetic points as needed to make the minority class dataset comparable or equal in size to the larger ones. A synthetic observation might be created between: (i) two real observations; (ii) a real observation and a synthetic one; and (iii) two previously created synthetic observations.

According to (Chawla et al. 2002), a synthetic observation can be constructed as follows. A real feature vector, \(\text {sample}_i\), is randomly taken from the minority class dataset, and one of its K nearest neighbors is randomly chosen. Subsequently, the difference \(D_i\) between each respective feature of both vectors is calculated, and the new synthetic vector is created by adding, to each feature c of \(\text {sample}_i\), the corresponding product \(D_i \cdot G\), where G is a factor randomly drawn from the interval \((0, 1)\) for each feature c. This results in the construction of a synthetic vector between a sample and its neighbor. The aforementioned process is detailed in Algorithm 1.
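The per-feature interpolation of Algorithm 1 can be sketched as follows (a simplified single-sample version, not the authors' code; the dataset sizes follow the paper's 41 minority instances and 109 features):

```python
import numpy as np

def smote_sample(X_min, k, rng):
    """Create one synthetic sample per Chawla et al. (2002): pick a random
    minority sample, one of its k nearest neighbours, and interpolate
    feature-wise with gaps G drawn from (0, 1)."""
    i = rng.integers(len(X_min))
    sample = X_min[i]
    # k nearest neighbours of `sample` among the other minority samples
    d = np.linalg.norm(X_min - sample, axis=1)
    d[i] = np.inf                               # exclude the sample itself
    neighbour = X_min[rng.choice(np.argsort(d)[:k])]
    gap = rng.random(sample.shape)              # one G per feature c
    return sample + gap * (neighbour - sample)  # sample + D_i * G

rng = np.random.default_rng(7)
X_min = rng.normal(size=(41, 109))   # minority class: 41 signals, 109 features
synthetic = smote_sample(X_min, k=5, rng=rng)
```

Each call produces one new synthetic observation lying between the chosen sample and its neighbour; repeating the call grows the minority class to the desired size.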

Algorithm 1
Fig. 5

Proposed hybrid method version 1

Fig. 6

Proposed hybrid method version 2

Proposed hybrid data augmentation method

Used in isolation to create additional instances of the minority classes, the SMOTE and AWGN techniques can increase classifier performance. However, these methods can also increase overfitting (Zur et al. 2004; Santos et al. 2018), which is not desirable. Furthermore, SMOTE also has the potential to disseminate noisy information when new instances are created in unwanted positions (Cheng et al. 2019).

In order to increase the number of vibration signals and avoid overfitting, we propose a hybrid method combining SMOTE and AWGN. The purpose of this method is to create a set of artificial signals with higher randomness than those produced by either technique in isolation. This translates into a more robust and more general classification model, thus decreasing the bias when compared with either of the two data augmentation techniques alone. Two versions of the hybrid method can be devised. The first (version 1) consists of expanding only the number of instances of the minority classes without making changes to the majority classes. This procedure is illustrated in Fig. 5, where:

  • \(M_a\) represents the number of majority class instances;

  • \(M_{i_1}\) represents the number of minority class instances obtained by feature extraction without using data augmentation techniques;

  • \(M_{i_2}\) represents the number of minority class instances obtained from applying SMOTE;

  • \(M_{i_3}\) represents the number of minority class instances obtained from applying AWGN.

In the first approach, \(M_a\), \(M_{i_1}\), \(M_{i_2}\) and \(M_{i_3}\) contain, respectively, 115, 41, 37 and 37 instances. Also, note that the number of \(M_a\) instances is equal to the sum of \(M_{i_1}\), \(M_{i_2}\) and \(M_{i_3}\).

The second version of the method (version 2), presented in Fig. 6, increases the number of minority class instances by x units using the AWGN technique and also modifies x signals of the majority class by adding Gaussian white noise. This way, the insertion of white noise does not become a discriminating feature between minority and majority classes. Figure 6 presents the overall details of the second approach where:

  • \(M_{a_1}\) represents the number of majority class instances;

  • \(M_{a_2}\) represents the number of instances modified by AWGN;

  • \(M_{i_1}\) represents the number of minority class instances without using data augmentation techniques;

  • \(M_{i_2}\) represents the number of minority class instances resulting from applying SMOTE;

  • \(M_{i_3}\) represents the number of minority class instances resulting from applying AWGN.

In the second approach, \(M_{a_1}\), \(M_{a_2}\), \(M_{i_1}\), \(M_{i_2}\) and \(M_{i_3}\) contain, respectively, 78, 37, 41, 37 and 37 instances. Also, the sum of \(M_{a_1}\) and \(M_{a_2}\) is equal to the sum of \(M_{i_1}\), \(M_{i_2}\) and \(M_{i_3}\).
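A rough sketch of the version 2 bookkeeping might look as follows, assuming hypothetical raw signals and an arbitrary 20 dB SNR for the AWGN step (the paper does not fix a noise level here). The SMOTE portion (\(M_{i_2}\)), which operates on extracted features, is omitted:

```python
import numpy as np

def add_awgn(signals, snr_db=20.0, rng=None):
    # the 20 dB SNR is an assumption; the paper does not specify a noise level
    rng = np.random.default_rng(rng)
    power = np.mean(signals ** 2, axis=1, keepdims=True)
    noise_power = power / (10.0 ** (snr_db / 10.0))
    return signals + rng.normal(0.0, np.sqrt(noise_power), signals.shape)

rng = np.random.default_rng(0)
majority = rng.normal(size=(115, 256))    # hypothetical vibration signals
minority = rng.normal(size=(41, 256))
x = 37                                    # instances added per technique

# version 2: corrupt x majority signals in place instead of enlarging M_a
idx = rng.choice(len(majority), size=x, replace=False)
majority_v2 = majority.copy()
majority_v2[idx] = add_awgn(majority[idx], rng=1)            # M_a2 portion

# minority side: keep the originals (M_i1) and add x AWGN copies (M_i3);
# the x SMOTE instances (M_i2) would be generated analogously from features
src = minority[rng.choice(len(minority), size=x)]
minority_aug = np.vstack([minority, add_awgn(src, rng=2)])
```

This way, white noise appears in both classes and cannot become a discriminating feature by itself.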

Classification methods

This paper compares four machine learning classification methods, namely Support Vector Machines (SVM), K-Nearest Neighbors (K-NN), Random Forest (RF) and Stacked Sparse Autoencoder (SSAE). These are, respectively, briefly described in Sects. 3.5.1, 3.5.2, 3.5.3 and 3.5.4.

Support vector machines

Support Vector Machines (SVM) is a machine learning method based on a set of linear indicator functions that divide the feature space into two regions (Vapnik 2013; Ziani et al. 2017). The method maps the original data into a feature space of higher dimension than the original one using the training dataset. A hyperplane with better discriminatory capacity is then constructed. This capacity depends on the kernel function employed, the most common ones being the sigmoid, radial basis, and linear functions (Choubin et al. 2019). Usually, the radial basis function kernel tends to match the performance of the linear one (Chang et al. 2010). However, in the exploratory experiments performed in this work, the linear kernel delivered the best results and was therefore chosen for the remaining evaluations. The linear-kernel SVM also exhibits good results in the works presented in (Elangovan et al. 2011; Ruiz-Gonzalez et al. 2014).
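A minimal usage sketch with scikit-learn's `SVC` on toy two-class data (in this work the actual inputs would be the extracted vibration features, and C is tuned over the grid described in Sect. 4):

```python
import numpy as np
from sklearn.svm import SVC

# hypothetical two-class data; real inputs would be the extracted features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (30, 2)), rng.normal(2.0, 1.0, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

# linear kernel, as selected in the exploratory experiments; C would be
# tuned over {2^-5, 2^-3, ..., 2^15}
clf = SVC(kernel="linear", C=1.0).fit(X, y)
acc = clf.score(X, y)
```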

K-nearest neighbors

K-Nearest Neighbors (K-NN) is one of the most widely used non-parametric methods (Yoon and Friel 2013), essentially due to its simplicity of implementation. It classifies and clusters data vectors by proximity, measured with some defined metric, the most common being the Euclidean distance (also used in this work). K-NN assigns a test example to the majority class among its K closest neighbors (Xing and Bei 2020).
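The majority-vote rule can be written directly; the toy points, labels, and value of k below are illustrative only:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest (Euclidean) neighbours."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
pred = knn_predict(X, y, np.array([0.05, 0.0]), k=3)  # two class-0 points nearby
```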

Random forest

Random Forest (RF) is a method of ensemble learning inspired by decision tree learning (Breiman 2001). The method combines different decision tree predictors (with each one being statistically independent of the remaining ones) and outputs the most common predicted class. The method uses a variety of binary-ruled decisions to indicate a split in each tree (Görgens et al. 2015). Feature bagging is performed for each tree, where a random subset of the features is selected in the learning process. RF is ranked as one of the best classification methods (Fernández-Delgado et al. 2014), and its popularity growth is associated with the automation and simplicity of the algorithmic training procedure. As a result, system developers with little experience in machine learning can build classification systems with good discriminatory capacity (Fletcher and Reddy 2016).
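A brief sketch with scikit-learn's `RandomForestClassifier` on synthetic two-class data; the hyperparameter values mirror the tuning choices reported in Sect. 4, but the data are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# hypothetical features for two well-separated classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 8)), rng.normal(4.0, 1.0, (40, 8))])
y = np.array([0] * 40 + [1] * 40)

# 50 trees with Gini splits and one observation per leaf; each tree is grown
# on a bootstrap sample with feature bagging, and the forest takes the
# majority vote of the individual tree predictions
forest = RandomForestClassifier(n_estimators=50, criterion="gini",
                                min_samples_leaf=1, random_state=0).fit(X, y)
acc = forest.score(X, y)
```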

Stacked sparse autoencoder

An Autoencoder (AE) is a deep learning algorithm consisting of neural networks whose objective is to reconstruct, with the smallest possible error, the input itself at the output. It consists of two parts: an encoder and a decoder. The encoder is responsible for compressing the original data space into a new representation space, called the latent space. The function of the decoder is to reconstruct the input data from its representation in the latent space (Shao et al. 2017). The training step of the AE is unsupervised because data labels are not provided (Li et al. 2020a). The AE can be used in several manners, namely: (i) to perform feature reduction; (ii) to denoise data; (iii) to perform data augmentation; or (iv) to classify data, as is the case in this paper (Fu et al. 2019).

A Stacked AE is a more complex structure composed of a series of concatenated layers, with the output of each layer connected as input to the next. In this structure, each layer is trained as an AE with the objective of reducing the reconstruction error. After all layers are trained, a fine-tuning step is performed. For the classification step, the decoder layer is removed and a softmax layer is added. Due to the large number of neurons in the hidden layers, a sparsity constraint is used to capture high-level representations of the data, hence the name Stacked Sparse Autoencoder (SSAE) (Aouedi et al. 2020).
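The encode/decode/reconstruct cycle of a single AE layer can be sketched in plain NumPy. This toy version uses tied weights and plain gradient descent, omits the sparsity penalty and the stacking/fine-tuning steps, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))              # hypothetical feature vectors

# one tied-weight layer: encode 10 -> 4 latent units, decode back to 10
W = rng.normal(scale=0.1, size=(10, 4))
lr, losses = 0.1, []
for _ in range(300):
    H = np.tanh(X @ W)                      # encoder: latent representation
    X_hat = H @ W.T                         # decoder: reconstruction
    E = X_hat - X
    losses.append(np.mean(E ** 2))          # reconstruction (MSE) loss
    dXhat = 2.0 * E / E.size                # gradient of the loss wrt X_hat
    dW = dXhat.T @ H                        # decoder path
    dW += X.T @ ((dXhat @ W) * (1.0 - H ** 2))  # encoder path (tanh')
    W -= lr * dW                            # gradient descent step
```

A full SSAE would add an L1/KL sparsity term to the loss, train several such layers in sequence, then replace the decoder with a softmax layer and fine-tune.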

Results and discussion

The main goal of this work is to identify the four classes described in Table 1, namely, No (Normal), I (Imbalance), IHM (Imbalance + Horizontal Misalignment) and IVM (Imbalance + Vertical Misalignment). In this section, the results of applying four types of classifiers are compared: SVM, K-NN, RF and SSAE in 14 different cases, which are described in Table 3.

Table 3 Cases description

As the dataset used in this research has low cardinality, it is not recommended to use the holdout technique, which separates the data into a single training and test split. In these circumstances, classifier training can result in overfitting issues, causing bias in the result (Aggarwal et al. 2018). We therefore opted to apply 5-fold cross-validation, which randomly partitions the data into folds used for training and testing. This results in a more robust and accurate prediction model (Dinov 2018). The procedure iteratively uses each fold as the test set while training on the remaining folds, evaluating the respective performance. This procedure is illustrated in Fig. 7.
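The fold bookkeeping can be sketched as follows (the dataset size of 230 is hypothetical); each sample appears exactly once in a test fold:

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index pairs; each fold serves exactly once as test."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

n = 230                       # hypothetical dataset size
tested = []
for train, test in kfold_indices(n, k=5):
    tested.extend(int(t) for t in test)
coverage = sorted(tested)     # every index tested exactly once
```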

Fig. 7

K-fold representation

The classifiers have adjustable parameters whose selection was guided by maximizing the average of intraclass relative hits. This is calculated as the sum of the relative hits on the main diagonal of the confusion matrix divided by the number of classes. The hyperparameter tuning for each classifier was performed as follows. SVM training tested different values of the regularization term \(C\in \{2^{-5}, 2^{-3},2^{-1},...,2^{13}, 2^{15}\}\) using the linear kernel function. RF training consisted of tuning the number of trees, which was varied from 1 to 50; the division rule used to form the tree nodes was the Gini diversity criterion, and the minimum number of observations per leaf was 1. For the K-NN classifier, the number of neighbors was varied from 1 to 100 using the Euclidean distance to select the best value of K. Based on (Zhang et al. 2020b), the following hyperparameters were used to train the SSAE with softmax classification: (i) three hidden layers with, respectively, 100, 50, and 20 neurons; (ii) a weight decay coefficient of 0.0001; (iii) a sparsity penalty coefficient of 0.001; and (iv) a sparsity factor of 0.2. Seven metrics were used to measure classifier performance: classification time for one example (T), precision (P), recall (R), specificity (S), F1-score (F1), accuracy (A), and standard deviation (SD) (Rehman et al. 2020; Kankar et al. 2011).
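One plausible reading of this selection metric, the average of the per-class relative hits on the confusion-matrix diagonal, can be computed as follows; the confusion matrix shown is invented for illustration:

```python
import numpy as np

def intraclass_hit_average(cm):
    """Mean of the main-diagonal relative hits: each diagonal entry is divided
    by its true-class row total, and the per-class rates are averaged."""
    cm = np.asarray(cm, dtype=float)
    return float(np.mean(np.diag(cm) / cm.sum(axis=1)))

# hypothetical 4-class confusion matrix (rows = true class, cols = predicted)
cm = [[28, 1, 0, 1],
      [ 2, 26, 2, 0],
      [ 0, 0, 30, 0],
      [ 1, 0, 0, 29]]
score = intraclass_hit_average(cm)   # (28/30 + 26/30 + 30/30 + 29/30) / 4
```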

The following sections are organized as follows: Sect. 4.1 presents the results for the SVM classifier; Sect. 4.2 details the performance of the K-NN method; Sect. 4.3 describes the data obtained for the RF algorithm; and Sect. 4.4 lists the results for the SSAE approach.

SVM results

Table 4 presents the SVM results for the dataset without normalization. The data show that using the undersampling technique (\(C_2\)) worsens SVM performance when compared against the baseline \(C_1\). The SMOTE data augmentation technique (\(C_3\)) causes a decrease in accuracy when compared to that of \(C_{1}\); however, the other evaluated metrics are improved. The application of AWGN to the minority classes (\(C_4\)) and to all classes (\(C_5\)) improves precision, recall, specificity, and F1-score when compared to \(C_1\). On the other hand, these techniques worsen processing time, accuracy, and standard deviation. The application of the proposed hybrid method, version 1 (\(C_6\)) and version 2 (\(C_7\)), improves SVM performance in all evaluated items except processing time when compared against \(C_1\).

Table 4 SVM applied to the dataset without features normalization

Table 5 presents the results with feature normalization and showcases a significant improvement in SVM performance when compared with the results described in Table 4. Namely, all applied data augmentation techniques improved classifier performance when compared with the baseline results of \(C_{8}\). Among the results presented, the best performing one is \(C_{14}\), which refers to the application of the second version of the hybrid method.

Table 5 SVM applied to the dataset with features normalization

K-NN results

Table 6 reports the K-NN classifier results without using feature normalization. The application of undersampling (\(C_{2}\)) improves K-NN performance in what concerns precision, recall, F1-score, and standard deviation. On the other hand, accuracy and specificity results are reduced when compared to the baseline (\(C_{1}\)). In addition, the application of oversampling techniques (\(C_3, C_4, C_5, C_6, C_7\)) caused an improvement when compared to: (i) the baseline results (\(C_1\)); and (ii) the undersampling approach (\(C_2\)). The techniques that exhibited the best results made use of AWGN (\(C_4\) and \(C_5\)).

Table 6 K-NN applied to the dataset without features normalization

Table 7 reports K-NN results for normalized features. As can be verified, applying feature normalization improved performance in all evaluated cases when compared to the results without normalization shown in Table 6. The application of undersampling (\(C_9\)) improved precision, recall, F1-score, and standard deviation when compared against \(C_{8}\).

The application of data augmentation techniques (\(C_{10}, C_{11}, C_{12}, C_{13}, C_{14}\)) improved K-NN performance. The technique that presented the best result was the second version of the hybrid method (\(C_{14}\)), which resulted in an improvement of \(20.92\%\) in precision, \(21.19\%\) in recall, \(5.35\%\) in specificity, \(15.26\%\) in accuracy, and \(21.06\%\) in F1-score, and a reduction of \(6.21\%\) in standard deviation, without requiring an increase in processing time against the baseline (\(C_{8}\)).

Table 7 K-NN applied to the dataset with features normalization

RF results

Table 8 presents RF results without feature normalization. The use of undersampling (\(C_2\)) increases performance when compared against \(C_1\). Application of oversampling causes an improvement in performance (\(C_3, C_4, C_5, C_6, C_7\)). The best performance was derived from AWGN application in all classes (\(C_5\)) and the second version of the hybrid proposal (\(C_7\)). The latter achieved the best results, producing an improvement of \(7.71\%\) in precision, \(11.46\%\) in recall, \(2.81\%\) in specificity, \(9.63\%\) in F1-score, \(8.27\%\) in accuracy, \(2.93\%\) reduction in standard deviation, and a processing time of 0.07 s when compared against \(C_1\).

Table 8 RF applied to the dataset without features normalization

Table 9 shows RF results using feature normalization. The data demonstrate an improvement in RF performance for \(C_8\) and \(C_{10}\) when compared with, respectively, \(C_1\) and \(C_3\) of Table 8. However, RF performance for \(C_9, C_{11},\) \(C_{12}, C_{13}, C_{14}\) was reduced when compared against, respectively, \(C_2, C_4, C_5, C_6,\) \(C_7\) of Table 8. The results of Table 9 also show that the application of data augmentation techniques (\(C_{10}, C_{11}, C_{12}, C_{13}, C_{14}\)) improved RF performance when compared to \(C_8\). The most effective techniques were: (i) SMOTE (\(C_{10}\)); and (ii) AWGN applied to all classes (\(C_{12}\)) using normalized features.

Table 9 RF applied to the dataset with features normalization

SSAE results

Table 10 reports SSAE results without feature normalization. The use of undersampling (\(C_2\)) reduced the specificity, F1-score, and accuracy when compared against \(C_1\). Application of oversampling (\(C_3, C_4, C_5, C_6, C_7\)) caused an improvement in precision, recall, F1-score, and standard deviation. The best performance was derived from the AWGN application in minority classes (\(C_4\)).

Table 10 SSAE applied to the dataset without features normalization

Table 11 presents the results with feature normalization and showcases a significant improvement in SSAE performance when compared with the results described in Table 10. The use of undersampling (\(C_9\)) reduced performance when compared against \(C_8\). The application of data augmentation techniques (\(C_{10}, C_{11}, C_{12}, C_{13}, C_{14}\)) improved SSAE performance. The technique that presented the best result was the second version of the hybrid method (\(C_{14}\)), which resulted in an improvement of \(4.42\%\) in precision, \(5.12\%\) in recall, \(1.07\%\) in specificity, \(4.77\%\) in accuracy, and \(3.53\%\) in F1-score, a reduction of \(3.10\%\) in standard deviation, and a processing time reduced by 0.45 seconds against the baseline (\(C_{8}\)).

Table 11 SSAE applied to the dataset with features normalization
Fig. 8

Radar plot of the best classifiers: SVM using hybrid method version 2 applied to normalized features; K-NN using hybrid method version 2 applied to normalized features; RF using hybrid method version 2 applied to non-normalized features; SSAE using hybrid method version 2 applied to normalized features

Discussion

The results also demonstrate that feature normalization is a relevant step for the K-NN, SVM, and SSAE methods, as these methods are sensitive to differing feature scales. Without normalization, features with low values compared to the remaining ones have little influence on the decision of these classifiers. The application of the AWGN and SMOTE techniques improves the results of the four analyzed classifiers when compared to the baseline results. This is due to the small number of examples of faulty classes available in the original datasets, which hinders the individual training stages. The scarcity of machine failure signals is a frequent occurrence in real industrial environments, making the case for data augmentation approaches.

By analyzing the results it is possible to conclude that the SVM classifier achieved the best behavior when using the original dataset for both the normalized and non-normalized approaches (\(C_1\) and \(C_8\)). Application of undersampling increased the performance of (i) K-NN when applied to normalized features; and (ii) RF when non-normalized features were used. RF performance through normalization only improved when the original dataset was employed (\(C_8\)) and when using SMOTE (\(C_{10}\)). The K-NN technique was able to deliver the fastest classification times. Overall:

  • SVM exhibited the best results when using the second version of the hybrid method applied to the normalized features;

  • K-NN exhibited the best results when using the second version of the hybrid method applied to normalized features;

  • RF exhibited the best results when using the second version of the hybrid method applied to non-normalized features;

  • SSAE exhibited the best results when using the second version of the hybrid method applied to the normalized features.

Version 2 of the proposed hybrid method is the data augmentation technique that resulted in the best performance, surpassing the application of the AWGN and SMOTE techniques individually. This shows the effectiveness of the approach when identifying combined failures in rotating machines. The hybrid proposal was able to produce new data examples with greater randomness than when using only AWGN or SMOTE. Consequently, the models generated from the hybrid approach are more generalist, resulting in an improvement in classifier performance. Figure 8 presents a radar plot comparing the performance of these classifiers.

Overall, the SSAE classifier stood out, outperforming the other ones except for classification time, where K-NN performed better. As a result, in the context of this research, the SSAE with feature normalization alongside the second version of the hybrid data augmentation proposal exhibits the best performance. It is also important to emphasize that the less time a classifier takes to identify a test example, the less complex the generated classifier model will be (Qin et al. 2021). Classification time can be a determining factor for online fault diagnosis when deploying a classifier in an industrial setting. The data obtained show that, in such scenarios, the K-NN algorithm is recommended due to its processing speed and its good performance among the four classifiers examined.

Conclusions

In this paper, a hybrid data augmentation method based on the AWGN and SMOTE techniques was proposed to diagnose combined faults in rotating machines, a more complex task than identifying isolated failures. In industrial rotating machines, little fault data is available when compared to normal operation, which leads to an imbalanced dataset. Consequently, it is necessary to use data augmentation techniques to increase the number of minority-class examples and thus improve classifier performance.

To validate the generalization and effectiveness of the proposed method, a comparison with 4 classifiers was performed considering 14 different cases. Each one of these tested a specific configuration such as using the original dataset, undersampling the majority class, applying feature normalization, utilizing AWGN, employing SMOTE and our hybrid proposal. The results obtained show that the latter surpassed the other approaches used in this paper. This resulted in more generalist classifier models, which improved their performance.

The best result was achieved by combining the hybrid data augmentation with the SSAE algorithm using normalized features. This method achieved a processing time of 0.13 seconds whilst attaining \(100\%\) accuracy. However, if the classifier is to be deployed in industrial applications where execution time is crucial, then the K-NN classifier is a good option due to its compromise between high processing speed (0.04 seconds) and elevated accuracy (\(99.46\%\)). Overall, the proposed hybrid data augmentation method is effective in improving classifier performance.

For future work, it is our intention to: (i) add the classes of horizontal and vertical misalignment separately; and (ii) add the combined failure of horizontal misalignment associated with vertical misalignment. The addition of these classes will require a reevaluation of classifier performance. We also intend to use techniques such as genetic algorithms and minimum-redundancy maximum-relevance to select the best features in order to perform dimensionality reduction. This procedure has the potential to improve classifier performance and avoid overfitting.