1 Introduction

Gearboxes are mechanical components of rotating machines that enable torque transmission. Their configuration is complex; they comprise mainly gears, bearings, and shafts. Faults in gearboxes can arise from several factors, in particular strong plastic deformation, lack of lubrication, wear, and corrosion; design errors are a further cause of fault development. Consequently, considerable research effort has been devoted to developing methods that detect faults in rotating machines with high accuracy and reliability [7, 61]. Detection and diagnosis of faults in gearboxes are usually based on the acquisition and analysis of vibration signals [30]. The standard method performs fault classification using machine learning techniques [42, 45]. This task requires extracting useful features from the recorded signals using classical signal analysis methods in the time domain [25], in the frequency domain, or with time-frequency representation algorithms [1, 28, 57].

Data-driven fault detection and classification are performed by using the extracted features as inputs to high-performance machine learning techniques [36]. For instance, Li et al. [29] addressed fault classification in gearboxes and bearings from vibration measurements using a Gaussian-Bernoulli deep Boltzmann machine fed with a set of statistical features. This feature set is estimated from different representations of the vibration signal, including the original time-domain representation, its frequency representation, and time-frequency representations obtained using the Fourier transform or the wavelet transform. He et al. [16] reported the diagnosis of faults in a gearbox transmission chain using an unsupervised approach known as deep belief networks, where the structural parameters are optimized using a genetic algorithm; the approach shows improvements over other machine learning techniques for classifying faults in gearboxes and bearings. Cerrada et al. [8] developed a system for diagnosing multi-class faults in spur gears; they considered a selected set of features extracted from the vibration signal using frequency- and time-based analysis. Ranking these features with genetic algorithms enables fault classification with a random forest algorithm. Sanchez et al. [45] reported a methodological framework for multi-fault classification in rotating machinery using K-nearest neighbors and random forest; the ten most relevant features are selected using ranking techniques including chi-square, the ReliefF algorithm, and the information gain algorithm. This method is suitable for online implementation because the features are extracted in the time domain. Praveenkumar et al. [42] used support vector machines (SVM) to classify faults in an automobile gearbox; they extracted a set of statistical features from the time series representation of the vibration signal and used it for fault classification. More recently, Medina et al. [32] also applied SVM with a different approach: Poincaré plots are the source of features for classifying faults in gearboxes with an error correcting output codes (ECOC)-based multi-class SVM [12, 56]. That application was inspired by the work of Brennan et al. [4] and Piskorski and Guzik [40]. A generalization of the Poincaré plot is considered in which the lag between samples takes an arbitrary value; this has the advantage of providing a time delay embedding useful for modeling chaotic time series [55].

The underlying assumption when using classical feature extraction methods is that the acquired signals originate from a linear system satisfying stationarity and periodicity properties [27]. Unfortunately, vibration signals in rotating machinery include sub-harmonics, super-harmonics, and chaotic features [9] due to the non-linear properties of the mechanical system. Some works in the literature do use non-linear models; for instance, the authors of [35] reported a non-linear dynamic model for three bevel gears.

In rotating machinery where non-linear models are the most accurate, non-linear dynamics methods and chaos theory are useful tools for the analysis of vibration signals [48, 59]. Previous research in this context has incorporated chaos and fractal approaches for analyzing the recorded signals. As an example, Sun [51] studied the evolution of faults in non-linear rotary mechanical systems by using the largest Lyapunov exponent (LLE). Janjarasjitt et al. [20] used the correlation dimension (CD), computed from vibration signals, to show the statistical differences between a brand new roller element and a roller element close to failure. More recently, Soleimani and Khadem [48] addressed the use of phase space for analyzing the status of gearboxes and ball bearings; the authors also proposed several features for damage detection, namely the largest Lyapunov exponent, the correlation dimension, and the approximate entropy.

A non-linear dynamical system can be analyzed by studying the symbolic dynamics obtained from topological partitions of the phase space [5, 14]. In particular, symbolic signal processing transforms the recorded signals into sequences of symbols to enhance the features while discarding unnecessary details. Gupta and Ray [15] applied symbolic dynamics techniques to anomaly detection; they proposed a wavelet-based partitioning of the phase space for extracting symbol sequences from the recorded time series. They performed the wavelet analysis after a careful selection of the necessary parameters. The resultant wavelet coefficients are represented as a one-dimensional signal at selected time and shift position. This scale series is used for generating the symbol sequence using a maximum entropy partition approach. The method is known as symbolic dynamic filtering (SDF), and one of its applications is the early detection and prognosis of anomalies in several systems. Rao et al. [43] compared SDF with other methods for anomaly detection; in that research, the analytic signal is processed for extracting the symbol sequence, and the space partitioning scheme is extended to bi-dimensional data in order to generate symbolic representations of images. Space partitioning of the analytic signal relies on the application of the Hilbert transform. A D-Markov machine, operating as a finite state automaton, is used for representing the statistics of the symbol sequence. The 2D methodology for pattern detection using SDF is also applied to 2D wavelet coefficient datasets for anomaly detection. Further details of the approach using SDF are reported in [49]. An additional application of SDF was reported by Subbu et al. [50], where the evolution of fatigue damage is detected and quantified by using 2D SDF on images representing the surface of test specimens made of a polycrystalline alloy; they compared these results with a similar analysis performed on 1D ultrasound signals simultaneously recorded from the same specimen, showing close agreement. Akintayo and Sarkar [2] reported a modification of the SDF method, an algorithm for hierarchical symbolic dynamic filtering (HSDF) that is useful for modeling non-stationary time series.

Applications of symbolic dynamics have been reported in different research domains, as reviewed by Daw et al. [11], including astrophysics, chemistry, geophysics, mechanical systems, biology, and telecommunications. For instance, Sanjith et al. [46] addressed fault detection of bearings using vibration signals; the vibration time series is converted into a symbolic sequence, and a dictionary is then constructed from the generated symbolic data. The vibration signal for the no-load case is used as a reference for deciding whether the bearing is in healthy or faulty condition. The comparison with the reference data is estimated using the Common Signal Index. The classification accuracy varies from 91.5 to 97.1%. de Paula et al. [39] used symbolic dynamics for identifying flow patterns in a bank of tubes; the symbolic dynamics of the turbulent cross flow over tube banks can be useful for identifying patterns contained within the data. The symbolic dynamics approach was applied to the experimental time series by using histograms according to a binary alphabet. The turbulent cross flow over a tube bank of five rows of circular cylinders placed in an aerodynamic channel was studied for validating the proposed approach. Considering more complex systems, Xu and Beck [58] investigated share price returns using a simple symbolic description. The approach considers two and four symbols for coding the signals. The sequence of symbols is analyzed using the spectrum of Rényi entropies. The share price returns showed a non-Markovian evolution in time.

A considerable number of applications of symbolic analysis have been reported in the neuroscience and cardiovascular physiology literature [41]. Kurths et al. [26] used symbolic dynamics to study the variability of heart rate intervals extracted from electrocardiographic signals; the authors used symbolic dynamics for quantifying the risk of ventricular arrhythmia and, eventually, sudden cardiac death. The symbolic description uses four symbols for representing the amplitude of beat-to-beat intervals. The symbol series are analyzed using the generalized Rényi entropy [44]. Valencia et al. [54] studied whether symbolic transformations of time series extracted from electrocardiographic signals (RR and QT time series) could be useful for separating ischemic dilated cardiomyopathy (IDC) patients from control (HC) subjects. The method attained an accuracy of 80% in separating the two groups. Kim et al. [23] used the subdivision of the Poincaré plot in the context of electrocardiogram signal processing aimed at the detection of cardiac abnormalities; in this application of symbolic dynamics, the goal is the detection of neurocardiogenic syncope from heart rate signals extracted from electrocardiograms acquired during the head-up tilt test. The Poincaré plot is subdivided using lines parallel to the identity line y = x. The subdivision considers six regions, and each region is assigned a symbol. The generated symbol sequences are analyzed using the approximate entropy, showing their usefulness for abnormality detection.

In this work, a gearbox vibration signal dataset is acquired in a test rig under different conditions of load and motor speed, covering the normal class and nine different fault configurations that involve tooth breakage, wear, crack, and chafing. Some faults were configured in the pinion, others in the gear, and one in both. Additionally, the data are balanced, that is, the dataset has the same number of samples for the different loads and motor speeds under normal or fault condition. These data are then analyzed with the goal of attaining the multi-class classification of such faults. Two methods for extracting features from gearbox vibration signals, the SDA and the PSDA, both applied on the Poincaré plot, are proposed. The extracted features are used for gearbox fault classification with a multi-class SVM [12, 56]. Both methods exploit the non-linear nature of the gearbox mechanical system [35] by representing its symbolic dynamics for fault classification. The Poincaré surface of section [5, 14] is obtained by plotting the measured vibration signal in a phase space of lagged coordinates. We have considered a modification of the method reported by Kim et al. [23]; in particular, the symbol sequence is obtained by partitioning the phase space of lagged coordinates into a set of angular regions. Each region is identified with a symbol taken from the available alphabet. The probability distribution of the data over the symbols is used for attaining the classification with the ECOC-SVM algorithm. The proposed methods are simple and highly accurate, attaining classification accuracies higher than 99.0% for faults in gearboxes.

2 Theoretical background

2.1 Poincaré plot

The Poincaré plot is formally the intersection of a periodic orbit of a continuous dynamical system, in the state space, with a subspace of lower dimension [53]. Such a Poincaré section is the projection of the data onto the plane spanned by the vectors [x(t),x(t + τ)], which yields the Poincaré map or plot of the dynamical system [9]. In vibration signal analysis, the Poincaré plot is obtained by plotting each sample versus the next. The Poincaré plot can be generalized by setting an arbitrary lag τ between samples. The Poincaré plot has mainly been used as a graphical tool for describing the chaotic mechanical dynamics of gearboxes and bearings, either in normal operation or in faulty conditions [10, 37, 38]. However, in applications concerning the detection of heart diseases based on electrocardiographic signal analysis, the Poincaré plot has been used for extracting features useful for abnormality detection [24].

2.2 Support vector machines

The usual approach for data-driven fault classification involves three stages. The first stage is devoted to feature extraction. The second stage, feature selection, is optional; some features have low sensitivity to fault evolution and can therefore be irrelevant or redundant for fault detection and diagnosis, in which case feature selection becomes necessary [34]. The third stage is fault classification using machine learning algorithms such as SVM [21, 60] and artificial neural networks (ANN) [36].

SVMs are machine learning algorithms useful for binary classification, where an optimal hyper-plane between classes is estimated [31]. Multi-class versions of SVMs are also available to construct the decision functions simultaneously for all classes [56]. An alternative solution consists in subdividing the multi-class classification problem into several simple binary classifications. Combining such binary classifiers using the ECOC approach [12] enables multi-class classification. Discrimination of one class against the rest corresponds to the one-versus-all ECOC strategy. However, other strategies are also available. In particular, the one-versus-one method improves the performance of classification in problems with several classes.
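As an illustration of how binary classifiers are combined under ECOC, the following minimal sketch builds a one-versus-one coding matrix and decodes a set of binary decisions. It is not the implementation used in this work: the function names are hypothetical, each binary learner is assumed to return a hard decision of +1 or -1, and decoding simply counts disagreements rather than using a loss-based rule.

```python
from itertools import combinations


def one_vs_one_coding(n_classes):
    """Build a coding matrix with one row per class and one column per class pair.

    An entry of +1 / -1 marks the positive / negative class of a binary
    learner; 0 means that the learner ignores the class.
    """
    pairs = list(combinations(range(n_classes), 2))
    matrix = [[0] * len(pairs) for _ in range(n_classes)]
    for col, (pos, neg) in enumerate(pairs):
        matrix[pos][col] = 1
        matrix[neg][col] = -1
    return matrix


def decode(matrix, outputs):
    """Return the class whose code word disagrees least with the binary outputs.

    `outputs` holds one +1/-1 decision per binary learner. Columns where the
    code word is 0 are skipped (assumed simplification of ECOC decoding).
    """
    def disagreements(row):
        return sum(1 for code, out in zip(row, outputs)
                   if code != 0 and code != out)

    return min(range(len(matrix)), key=lambda k: disagreements(matrix[k]))


coding = one_vs_one_coding(10)  # ten gearbox conditions
print(len(coding[0]))           # number of binary learners required
```

For ten classes, the one-versus-one design requires 45 binary learners, whereas one-versus-all would require only ten; the trade-off is that each one-versus-one learner solves a simpler two-class problem.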

2.3 Symbolic dynamics

Reconstruction of chaotic systems from noisy signals using a symbolic approach is a feasible method, as shown in [52]. According to this theory, a dynamical system denoted f : Q → Q with time evolution (x0,x1,...) can be described by a sequence of symbols. This type of dynamical description requires introducing a partition p = {P1,P2,...,Pr} subdividing the state space Q into r disjoint sets. Each subset is represented by a symbol si ∈{1,2,...,r} = S. An orbit in the state space is represented by the sequence of visited symbols s = {s0,s1,...}; this is shown in Fig. 1. The symbol sequence is considered a transform of the input data that is able to retain important temporal information [11]. Symbolic dynamics is related to Poincaré maps, where a multi-dimensional phase-space trajectory is intercepted with a plane to obtain the simple Poincaré map [17]. The heuristic idea behind the symbolic representation is to introduce a coarse re-sampling of the Poincaré map. Any interception of the phase-space trajectory located within a given sub-region of the Poincaré map, known as “a cell,” is therefore assigned the same symbol; this is illustrated in Fig. 2 where in (a), the phase-space trajectories for the Lorenz attractor [13] intercept a plane for constructing the Poincaré plot. In (b), the Poincaré plot is shown. As the plane can be arbitrarily discretized into several “cells,” the points of the Poincaré plot located in each “cell” can be represented by the same symbol.
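The mapping from an orbit to a symbol sequence can be sketched in a few lines. The example below is only an illustration under stated assumptions: the state space is one-dimensional, the partition cells P1,...,Pr are consecutive intervals, and the logistic map is used as a toy dynamical system that does not appear in this work; the function names are hypothetical.

```python
from bisect import bisect_right


def symbolize(orbit, edges):
    """Map each state to the 1-based index of the partition cell containing it.

    `edges` are the interior boundaries of the cells P1..Pr of a 1D state
    space, sorted in increasing order, so that r = len(edges) + 1.
    """
    return [bisect_right(edges, x) + 1 for x in orbit]


def logistic_orbit(x0, n):
    """Iterate the logistic map f(x) = 4x(1 - x); used here only as a toy system."""
    orbit = [x0]
    for _ in range(n - 1):
        orbit.append(4.0 * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


orbit = logistic_orbit(0.2, 8)
symbols = symbolize(orbit, edges=[0.25, 0.5, 0.75])  # r = 4 cells on [0, 1]
print(symbols)
```

Every state that falls in the same cell receives the same symbol, which is the coarse re-sampling described above.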

Fig. 1 Graphical representation of the phase space partitioning showing the obtained symbolic dynamics. Given the partitioning p = {P1,P2,P3,P4}, the trajectory (x0,x1,x2,x3,x4,...) is represented by the symbol sequence (s0,s1,s2,s3,s4,...) = (1,2,2,4,3,...)

Fig. 2 a The phase-space trajectories for the Lorenz attractor are intersected with a plane for obtaining the Poincaré map. b The intersection points between the phase-space trajectories and the plane are represented by the discretized Poincaré plot

Although symbolic dynamics has a well-established theoretical basis, in practice, generating partitions of the Poincaré plot such that a trajectory in phase space is associated with a unique symbol sequence is only feasible for certain systems [52]. Consequently, practical applications of symbolization have been mainly heuristic and empirical.

3 Methodology

The test rig for signal acquisition is located at the Rotating Machinery Laboratory of Universidad Politécnica Salesiana in Cuenca, Ecuador, and is shown in Fig. 3. An induction motor model Siemens 1LA7 090-4YA60, 2 Hp for 1.1 kW, is powered by a three-phase electric line of 220 V at 60 Hz. This motor provides the rotational motion. The nominal speed of this motor has been set to 1650 rpm. The mechanical motion produced by the motor is transferred to a one-stage gearbox where several gear faults are incorporated. A magnetic brake system connected to the output of the gearbox through a pulley is used for loading the motor. A diagram showing the components of the test rig used in this research is presented in Fig. 4.

Fig. 3 Vibration analysis laboratory at Universidad Politécnica Salesiana, Cuenca, Ecuador

Fig. 4 Schematic of the experimental test bed for simulation of faults in rotating machinery

For this research, different brake loads were implemented using the magnetic brake. These loads were 0.83 Nm (L1), 4.19 Nm (L2), and 7.54 Nm (L3). In addition, a set of three variable speeds (720–1080 rpm, 300–720 rpm, and 480–900 rpm) and three different constant speeds (480, 720, and 900 rpm) were generated using a variable-frequency drive. Each signal was recorded for 10 s at a sampling frequency of 50 kSamples/second. The acquisition is performed with 24-bit resolution after processing the signals with an anti-alias filter. Ten different modes, nine fault modes plus the healthy condition, were configured in the gearbox components as described in Table 1. Each fault mode included up to a pair of faulty elements. The configured faults for the gearbox elements are shown in Fig. 5. The healthy (normal) condition is denoted as P1. Incipient faults are denoted as P2 and P3. A fault is considered incipient when symptoms of malfunction are just starting to appear; the diagnosis of incipient faults is very important in industry for reducing costs and risks for human operators. Four moderate faults are denoted as P4, P5, P7, and P9. Two severe faults are also included, P6 and P8, and finally P10 is a multi-fault. Several combinations of test-rig operation parameters were considered for acquiring 900 vibration signals in total. Each condition includes 90 signals, where each load is represented by 30 vibration signals.

Table 1 Gear fault conditions

Fig. 5 Simulated faults. a 25% tooth breakage (P4), b 50% tooth breakage (P5), c 100% tooth breakage (P6), d 25% gear crack (P7), e 100% gear crack (P8), f tooth wear (P3), g tooth chafing (P2), h 50% tooth chafing (P9), and i gear set with simulated faults

Fig. 6 Phase maps. a Vibration signal from a gearbox of the healthy class P1. b Phase map for the vibration signal from class P1 obtained by plotting dx(t)/dt versus x(t). c Vibration signal from a gearbox of the faulty class P10. d Phase map for the vibration signal from class P10

3.1 Feature extraction

3.1.1 Phase map and lagged Poincaré plot

In non-linear dynamical systems, the intersection of the state space trajectories with the Poincaré section is known as the phase map [9]. In the case of the test rig described previously, the observed measurement is the vibration signal, represented by the time series x(t), which measures the instantaneous displacement in the horizontal direction. A fragment of the vibration signal for a healthy gearbox (P1) is shown in Fig. 6a. The phase map is a plot of the time derivative dx(t)/dt as a function of x(t). Because the gearbox performs a quasi-periodic motion, the phase map forms a closed curve as time evolves. Such a closed curve is shown in Fig. 6b for the signal fragment shown in Fig. 6a. In Fig. 6c, a signal fragment from a faulty gearbox of class P10, with faulty pinion Z1 and faulty gear Z2, is shown. In Fig. 6d, the phase map for this signal of the faulty class P10 is shown.

In several applications, the Poincaré plot is obtained by plotting the sample x(t + 1) versus x(t) [4, 22, 40]. In this research, a generalization of the Poincaré plot is considered; rather than plotting two consecutive samples of x(t), we plot the sample x(t + lag) as a function of x(t), where lag ≥ 1 is a lag in samples (known as the lagged Poincaré plot [24]). In this application, the parameter lag enables a non-linear warp of the phase map. This is shown in Fig. 7a, where the phase map shown in Fig. 6b appears rotated and deformed using a lag value lag = 11 samples. As we are interested only in the location of points within the Poincaré plot, the trajectory of points can be ignored, and we obtain the Poincaré plots shown in Fig. 7b and d, also for lag = 11. The Poincaré plot for the healthy class P1 differs from that of class P10. This suggests that the shape and location of points in the Poincaré plot (Figs. 6 and 7) can be useful for fault detection. Selection of the lag parameter for optimal separation between fault classes has been previously reported [32]. The lag parameter should be greater than 10 samples, and it can be chosen as the lag of the first zero-crossing of the auto-correlation sequence of x(t). In particular, for this gearbox dataset, the optimal value for this parameter is lag = 24.
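The construction of the lagged Poincaré plot and the zero-crossing rule for the lag can be sketched as follows. This is a minimal illustration rather than the procedure of [32]: the function names are hypothetical, the auto-correlation is the plain (unnormalized) sample estimate of the mean-removed signal, and the input is a synthetic cosine standing in for a vibration record.

```python
import math


def lagged_poincare(x, lag):
    """Return the points (x[t], x[t + lag]) of the lagged Poincaré plot."""
    return [(x[t], x[t + lag]) for t in range(len(x) - lag)]


def first_zero_crossing_lag(x, max_lag):
    """Return the smallest lag at which the sample auto-correlation is <= 0.

    Returns None if the auto-correlation stays positive up to max_lag.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    for lag in range(1, max_lag + 1):
        acf = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        if acf <= 0.0:
            return lag
    return None


# Synthetic stand-in for a vibration record: a cosine with a 96-sample period,
# whose auto-correlation first crosses zero at a quarter period.
signal = [math.cos(2.0 * math.pi * t / 96.0) for t in range(960)]
lag = first_zero_crossing_lag(signal, max_lag=200)
points = lagged_poincare(signal, lag)
```

For a purely sinusoidal signal, the points trace a closed curve; a real gearbox record would instead produce a cloud of points whose shape depends on the machine condition.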

Fig. 7 Lagged Poincaré plots. a Poincaré plot obtained by plotting x(t + lag) versus x(t) for a lag value set as lag = 11. The plot represents a vibration signal from a gearbox of the healthy class P1. The plot also shows the trajectories followed by points as time evolves. b Poincaré plot showing only the location of points. c Lagged Poincaré plot for the vibration signal extracted from class P10 showing the temporal trajectories of points. d Lagged Poincaré plot for signal from class P10 showing only the location of points

3.1.2 Peak detection

One of the stages of the peaks symbolic dynamics algorithm is peak detection, as shown in Fig. 8. The goal is to detect peaks separated by at least lagpeak samples. To this end, the following steps are performed:

  1. The absolute value of the vibration signal time series |x(t)| is analyzed, and the peaks separated by at least lagpeak samples are retained. This stage generates a sequence of samples, denoted as the peaks signal vp(t), that is shorter than the original signal. The sequences of peaks for signals of the healthy class P1 and the faulty class P10 are shown in Fig. 9a and c. The peaks signal is shown overlaid on a portion of 1000 samples of the original signal. The peaks of the original signal that are not selected do not satisfy the requirement of being separated from the previous peak by at least lagpeak samples. This stage compresses the signal by reducing its length to that of a vector including only the detected peaks. The parameter lagpeak defines the minimum distance between peaks extracted from the vibration signal. This parameter can be set as the lag at which the first zero-crossing of the auto-correlation sequence is located. The selected value lagpeak = 24 is used for extracting the sequence of peaks.

  2. Construction of the Poincaré plot using the time series vp(t). The Poincaré plot is constructed with the set of points defined by each pair of consecutive detected peaks. The Poincaré plot for class P1 is shown in Fig. 9b, and the Poincaré plot for the faulty class P10 is shown in Fig. 9d. The lag used for these Poincaré plots is lag = 1. The shape and distribution of points in the 2D space of the Poincaré plot differ between the two classes.
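The two steps above can be sketched as follows. The sketch rests on assumptions that the text does not fix: a peak is taken to be a local maximum of |x(t)|, peaks are scanned from left to right and retained only if they lie at least lagpeak samples after the previously retained peak, and the peaks signal stores the values of |x(t)|. The function names are hypothetical.

```python
def detect_peaks(x, lag_peak):
    """Return (indices, values) of local maxima of |x| at least lag_peak apart.

    Peaks are scanned left to right; a local maximum is kept only if it lies
    at least lag_peak samples after the previously kept peak (assumed rule).
    """
    mag = [abs(v) for v in x]
    indices = []
    for t in range(1, len(mag) - 1):
        is_local_max = mag[t - 1] < mag[t] >= mag[t + 1]
        far_enough = not indices or t - indices[-1] >= lag_peak
        if is_local_max and far_enough:
            indices.append(t)
    return indices, [mag[t] for t in indices]


def peak_poincare(peaks):
    """Pair each peak with the next one (lag = 1) to form the Poincaré plot points."""
    return list(zip(peaks[:-1], peaks[1:]))


x = [0, 1, 0, -2, 0, 3, 0, -1, 0, 2, 0]
indices, peaks = detect_peaks(x, lag_peak=4)
points = peak_poincare(peaks)
```

In this toy record, every second local maximum is discarded because it lies only two samples after the previously retained peak.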

Fig. 8 Two proposed algorithms are shown. The upper part shows the peaks symbolic dynamics algorithm. The lower part in solid line shows the symbolic dynamics algorithm. The PSDA includes an additional stage corresponding to the peaks detection

Fig. 9 Peaks extraction and Poincaré plots for the peaks sequences. a Peaks sequence overlaid on the original vibration signal extracted from healthy class P1. b Poincaré plot for the peaks sequence extracted from a signal of the class P1. The lag used for the Poincaré plot is lag2map = 1. c Peaks sequence overlaid on the original vibration signal extracted from faulty class P10. d Poincaré plot for the peaks sequence extracted from a signal of the class P10. The lag used for the Poincaré plot is lag2map = 1

3.1.3 Symbolic dynamics algorithm

A simple method known as symbolic dynamics algorithm (SDA) for feature estimation is proposed (see Fig. 8). The algorithm includes the following steps:

  1. The first step consists in plotting the Poincaré map using a lag value of 24 as shown in [32].

  2. The phase-space spanned by the Poincaré plot is subdivided into 12 angular regions as shown in Fig. 10.

  3. Each region is assigned a symbol represented by integer numbers {0,1,2,3,...,11} as shown in Fig. 11.

  4. The symbolic representation of the vibration signal is processed for obtaining the histogram. This is simply done by counting the number of points on each region. An example of the symbol histogram for a vibration signal is shown in Fig. 12.

  5. All 12 elements in the histogram are considered as features for fault classification.
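A minimal sketch of these five steps is given below. The exact geometry of the angular regions is defined by Figs. 10 and 11, which are not reproduced here; the sketch therefore assumes twelve equal sectors of 30° centered at the origin, numbered counter-clockwise from the positive horizontal axis. The function names are hypothetical, and the histogram holds raw counts, which can be divided by their sum if a probability distribution is preferred.

```python
import math


def angular_symbol(px, py, n_regions=12):
    """Return the index (0 .. n_regions - 1) of the angular sector containing (px, py).

    Sectors are assumed to be equal, centered at the origin, and numbered
    counter-clockwise starting at the positive horizontal axis.
    """
    angle = math.atan2(py, px) % (2.0 * math.pi)   # angle in [0, 2*pi]
    sector = int(angle / (2.0 * math.pi / n_regions))
    return min(sector, n_regions - 1)              # guard against rounding at 2*pi


def sda_features(x, lag=24, n_regions=12):
    """Histogram of angular symbols over the lagged Poincaré plot of x."""
    histogram = [0] * n_regions
    for t in range(len(x) - lag):
        histogram[angular_symbol(x[t], x[t + lag], n_regions)] += 1
    return histogram


# Synthetic stand-in for a vibration record.
signal = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(500)]
features = sda_features(signal)
```

Each signal is thus reduced to a vector of 12 numbers regardless of its length, which is what makes the histogram convenient as a classifier input.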

Fig. 10 Poincaré plot. The phase-space is subdivided into 12 angular regions

Fig. 11 Discretization of the phase-space. Each of the angular regions is assigned a symbol between 0 and 11

Fig. 12 Symbol histogram for a gearbox vibration signal

3.1.4 Peak symbolic dynamics algorithm

A variant of this algorithm is also implemented based on the Poincaré plot of the sequence of peaks extracted from the vibration signal. The algorithm is known as peak symbolic dynamics algorithm (PSDA) (see Fig. 8). After the peak detection stage, the algorithm follows the same steps as the SDA.

3.2 Gearbox faults classification

The classification was performed in Matlab using multi-class ECOC-SVM. The features considered for each signal are the 12 elements of the histogram obtained using either the SDA or the PSDA. The SVM parameters were selected empirically. First, the kernel type was selected as the one that provided the highest classification accuracy with the remaining parameters left at their default values. The Gaussian (RBF) kernel was selected, and this type of kernel requires a scale parameter (σ). For selecting this parameter, an interval centered at the default scale parameter calculated by the Matlab software was considered. The scale parameter was varied within this interval, and the value providing the maximum classification accuracy was retained. The parameter C, also known as Box Constraint, is a regularization coefficient that controls the maximum penalty imposed on margin-violating observations and helps prevent overfitting. The default value for this parameter is 1.0; several other values were considered, but none improved the classification accuracy with respect to the default value. The Standardize option was set to true such that the software function centers and scales each column of the predictor data.
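The empirical search for the scale parameter can be sketched as a scan over an interval centered at the default value. The sketch below is an assumption-laden illustration and not the Matlab procedure itself: the grid is linear, its half-width and number of steps are arbitrary, the function names are hypothetical, and `evaluate` is a caller-supplied function standing in for training the SVM and returning a cross-validated accuracy. The kernel is written with the predictors divided by σ before the Gaussian is applied, which is assumed here to match the convention of the software.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel after dividing both feature vectors by the scale sigma."""
    return math.exp(-sum(((p - q) / sigma) ** 2 for p, q in zip(a, b)))


def select_scale(default_sigma, evaluate, half_width=0.5, steps=11):
    """Scan an interval centered at default_sigma and keep the best-scoring value.

    `evaluate(sigma)` must return a score to maximize, for example a
    cross-validated classification accuracy.
    """
    low = default_sigma - half_width
    step = 2.0 * half_width / (steps - 1)
    candidates = [low + i * step for i in range(steps)]
    candidates = [s for s in candidates if s > 0.0]   # the scale must be positive
    return max(candidates, key=evaluate)


# Toy score function standing in for the cross-validated accuracy of the SVM.
best = select_scale(1.0, evaluate=lambda s: -abs(s - 0.9))
```

A logarithmic grid is a common alternative when the plausible range of σ spans several orders of magnitude.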

The k-fold cross-validation procedure was used for training and testing the SVM-based model. This procedure is known to yield less biased performance estimates and, consequently, models that are more robust in terms of generalization capability than other methods such as a simple train/test split.

Two experiments for faults classification were performed:

  1. Experiment 1: the first experiment was performed considering all vibration signals available in the dataset. The group of 900 vibration signals was randomly partitioned into 10 equal-sized subsets (10-fold), each including 90 signals. One of the 10 subsets is used as the test set for validating the trained model, and the training set is constructed with the remaining subsets. This training and validation process is repeated 10 times, each time validating with a different subset. The multi-class SVM was trained using a Gaussian kernel with σ = 0.87.

  2. Experiment 2: in the second experiment, the classification was performed separately for the signals acquired at each constant load: L1, L2, and L3. In this case, the dataset at constant load included 300 vibration signals, with 30 signals for each class. A 10-fold cross-validation was performed.
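The partitioning used in both experiments can be sketched as follows. This is a generic illustration of 10-fold cross-validation: the function names and the random seed are hypothetical, and the split is purely random, since the text does not state whether the folds were stratified by class.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Randomly split range(n_samples) into k folds of (nearly) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


folds = k_fold_indices(900, 10)          # Experiment 1: 10 folds of 90 signals
splits = list(train_test_splits(folds))  # 10 (train, test) pairs of sizes 810 and 90
```

For Experiment 2, the same functions apply with `k_fold_indices(300, 10)`, which gives folds of 30 signals for each load.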

The classification ability of the trained classifier was evaluated using several metrics [18]. These metrics are based on the confusion matrix [47]; in particular, the accuracy (or, equivalently, the error rate) is the metric most commonly used in practice for evaluating the generalization capability of a classifier. In the case of binary classification, the correctness of a classification is evaluated by counting the instances correctly assigned to the class (true positives (TP)), the instances that do not belong to the class and were correctly recognized as such (true negatives (TN)), the instances incorrectly assigned to the class (false positives (FP)), and the instances belonging to the class that were not recognized (false negatives (FN)). These four quantities are the elements of the confusion matrix for the binary classifier. In this research, however, we are dealing with multi-class classifiers. The confusion matrix for the multi-class classifier is a generalization of the binary case [47]. In the case of M classes, each class is denoted as Ci, where i ∈{1,2,...,M}, and has associated true positive tpi, false positive fpi, false negative fni, and true negative tni counts. Details of the calculation are presented in [47]. In particular, several measures are calculated for each class from the multi-class confusion matrix. The set of calculated metrics is presented in Table 2. Average metrics over all classes can also be calculated as shown in [18, 47].
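The per-class counts tpi, fpi, fni, and tni can be read from the multi-class confusion matrix as sketched below. The sketch assumes that rows index the true class and columns the predicted class; the function names and the small 3-class matrix are hypothetical, and precision and recall are shown only as two common measures derived from these counts, which may not coincide with the full set listed in Table 2.

```python
def per_class_counts(confusion):
    """Return a list of (tp, fp, fn, tn) tuples, one per class.

    confusion[i][j] counts the samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    counts = []
    for i in range(len(confusion)):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                 # class i predicted as another class
        fp = sum(row[i] for row in confusion) - tp  # other classes predicted as class i
        tn = total - tp - fn - fp
        counts.append((tp, fp, fn, tn))
    return counts


def precision_recall(tp, fp, fn):
    """Return (precision, recall) for one class; 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical 3-class confusion matrix used only to exercise the functions.
confusion = [[5, 1, 0],
             [2, 6, 0],
             [0, 1, 7]]
counts = per_class_counts(confusion)
overall_accuracy = sum(confusion[i][i] for i in range(3)) / 22
```

Averaging the per-class measures gives macro-averaged metrics, whereas summing the counts over classes before forming the ratios gives micro-averaged ones.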

Table 2 Metrics calculated from the multi-class confusion matrix

The receiver operating characteristic (ROC) curve has been used for the evaluation of machine learning algorithms [3, 19]. In particular, the area under the ROC curve, denoted AUC, is commonly used as a measure of classifier performance. Huang and Ling [19] showed that the AUC is a better metric than accuracy. The ROC was therefore also considered for analyzing the performance of the SVM as a classifier.
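For a binary problem, the AUC equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it directly from this pairwise definition; the function name and the toy scores are hypothetical, and applying it to a multi-class classifier would require an additional convention, such as treating each class against the rest, which is an assumption not detailed in the text.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons of classifier scores.

    `labels` holds 1 for positive and 0 for negative instances. Each
    (positive, negative) pair counts 1 if the positive scores higher,
    0.5 for a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy scores: one of the four (positive, negative) pairs is ordered incorrectly.
value = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of instances, which is acceptable for datasets of a few hundred signals; a rank-based formula gives the same value more efficiently for larger sets.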

4 Results

4.1 Cluster structure for the symbolic dynamic algorithm

The cluster structure generated by the features extracted from the symbolic representation of the input signal was analyzed using the SDA, considering all vibration signals together and also each recorded load separately. The best results were obtained when each load was considered separately, in particular for loads L2 and L3. Load L3 was chosen for analyzing the cluster structure. The cluster structure is visualized by selecting only three features from the set of 12. Choosing three elements from n = 12 gives 220 possible combinations, and testing all of them is very time consuming. However, as the primary goal is to provide clues about the generated cluster structure, only a few combinations were tested. Features 4, 5, and 8 were selected, and scatter plots were used for comparing each of the faulty classes P2 to P10 with the healthy class P1. The faulty classes are represented using dark dots, while class P1 is represented using light dots. The comparison between classes P1 and P2 is shown in Fig. 13a; the cluster structure shows that class P2 can be separated from the normal class in the space spanned by only three features. The separation could be attained with a single surface. Similarly, Fig. 13b shows the comparison between classes P1 and P3; class P3 can clearly be separated from the normal class P1. The comparison of class P1 with classes P4 and P5 is shown in Fig. 13c and d. The separation of clusters in the feature space is apparent for both faulty classes, although it is better between class P5 and the normal class P1.
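The number of candidate feature triples follows from the binomial coefficient C(12, 3) = (12 · 11 · 10)/(3 · 2 · 1) = 220. A short Python sketch that enumerates them is given below; the feature numbering from 1 to 12 and the variable names are assumptions used only for illustration.

```python
from itertools import combinations
from math import comb

n_features = 12

# All unordered selections of three feature indices (numbered 1..12).
triples = list(combinations(range(1, n_features + 1), 3))

# The enumeration agrees with the closed-form binomial coefficient.
n_triples = comb(n_features, 3)

# The triple used for the scatter plots in this section.
selected = (4, 5, 8)
is_valid_choice = selected in triples
```

Each triple would require one scatter plot per faulty class, which is why only a few combinations were inspected visually.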

Fig. 13 Cluster structures comparison for load L3 of the gearbox dataset. Class P1 (the healthy class) is shown in light dots and faulty class is represented using dark dots. a Cluster structures for classes P1 and P2. b Cluster structures for classes P1 and P3. c Cluster structures for classes P1 and P4. d Cluster structures for classes P1 and P5

Figure 14a and b show the comparison between class P1 and classes P6 and P7. The cluster of the normal class is well separated from the clusters of the faulty classes. Similarly, the comparison between the normal class P1 and classes P8 and P9 is shown in Fig. 14c and d. The separation of clusters P8 and P9 from the normal cluster P1 is evident. Finally, the comparison of the combined fault corresponding to class P10 with the normal class P1 is shown in Fig. 15. A single surface could be useful for separating the faulty cluster P10 from the normal class P1.

Fig. 14 Comparison of cluster structures for load L3 of the gearbox dataset. The class P1 (healthy class) is shown in light dots and the faulty class is shown in dark dots. a Cluster structures for classes P1 and P6. b Cluster structures for classes P1 and P7. c Cluster structures for classes P1 and P8. d Cluster structures for classes P1 and P9

Fig. 15 Comparison between classes P1 and P10 using three features considering load L3 of the gearbox dataset

4.2 Results of experiment 1 with SDA

The first fault classification experiment was performed considering all vibration signals. The average accuracy obtained during cross-validation was 99.2% using a Gaussian kernel with σ = 0.87. The resulting confusion matrix for this experiment, considering all signals in the dataset, is presented in Fig. 16. Table 3 shows the values of the metrics measuring the classification performance associated with each class. The formulas for calculating these metrics were presented in Table 2. Column 1 reports the false negative rate (fnr); classes P3, P4, P5, P6, and P9 attain the highest values of fnr, which are in all cases lower than 3.7%. The false positive rate (fpr) is shown in column 2; classes P5, P6, and P9 attain the highest values for this metric, all of which are lower than 3.3%. The sensitivity (sn) is shown in the third column; class P6 attained the lowest sensitivity, 96.7%, and the highest sensitivity is 100%. The specificity (sp) is also shown; class P5 attained the lowest specificity, 99.6%.
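For reference, a Gaussian kernel with width σ can be sketched in Python as shown below. The scaling of the exponent by 2σ² is an assumed (though common) convention; some libraries instead write the kernel as exp(−γ‖x − y‖²), in which case γ = 1/(2σ²) under this convention. The function names and the two 12-element feature vectors are hypothetical and are not taken from the gearbox dataset.

```python
import math


def gaussian_kernel(x, y, sigma=0.87):
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    The 2 * sigma^2 scaling is an assumed convention, not confirmed
    by the text; only the value sigma = 0.87 comes from the experiment.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigma_to_gamma(sigma):
    """Equivalent gamma for kernels written as exp(-gamma * ||x - y||^2)."""
    return 1.0 / (2.0 * sigma ** 2)


# Two hypothetical 12-bin histogram feature vectors (each sums to 1).
u = [1 / 12] * 12
v = [0.25, 0.05] + [0.07] * 10

k_uu = gaussian_kernel(u, u)   # identical vectors give the maximum value 1
k_uv = gaussian_kernel(u, v)   # dissimilar vectors give a value below 1
gamma = sigma_to_gamma(0.87)
```

The kernel value decays with the squared distance between feature vectors, so σ controls how quickly two symbol histograms are treated as dissimilar by the SVM.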

Fig. 16 Confusion matrix representing the results of faults classification using SDA considering all signals in the dataset

Table 3 Confusion matrix results considering all signals and SDA for feature extraction

The ROC curve, in which sensitivity is plotted against 1 − specificity, is useful for visualizing classifier performance [3]. The ROC curve for the experiment using SDA with all vibration signals in the dataset is shown in Fig. 17. As this is a multi-class classification problem and the classification accuracy is very high and similar for the different classes, the curves overlap. In this plot, the accuracy attained by class P5 is 96.7%. The accuracy attained by classes P3, P4, P6, and P9 is 98.9%, and their curves overlap. The remaining classes P1, P2, P7, P8, and P10 attain a perfect classification accuracy of 100%, and their curves also overlap. In general, the accuracy attained for all classes is high, and the lowest performance is attained by class P5.

Fig. 17 ROC curve for SDA considering all signals in the dataset

4.3 Results of experiment 2 with SDA

In the second experiment, faults were classified using SDA on vibration signals acquired at a particular load. The classification accuracy was 97.7% for load L1, 100.0% for load L2, and 100.0% for load L3. The best classification accuracy is therefore attained for loads L2 and L3. A list of quantitative metrics for classification with SDA is presented in Table 5; this algorithm shows excellent accuracy in both Experiment 1 and Experiment 2.

4.4 Results of experiment 1 with PSDA

The PSDA was used for fault classification considering all vibration signals in the dataset. The average accuracy obtained during the validation was 99.8% using a Gaussian kernel with σ = 0.87. Figure 18 shows the resulting confusion matrix considering all signals in the dataset. The performance metrics associated with each class are shown in Table 4. The false negative rate (fnr) is presented in the first column. Classes P3 and P10 attained the highest values for this metric, which are lower than 1.2%. The false positive rate (fpr) is presented in the second column. Classes P1 and P2 attained the highest value, 1.1%. The sensitivity (sn), or recall, is presented in the third column. In this experiment, classes P1 and P2 attained the lowest sensitivity, 98.9%, while the remaining classes attained a value of 100%. The specificity (sp) is presented in the fourth column; classes P3 and P10 attained the lowest value, 99.9%.

Fig. 18 Confusion matrix representing the results of the experiment for faults classification using the symbol sequence extracted using PSDA considering all signals in the gearbox dataset

Table 4 Confusion matrix results considering all the signals and PSDA for feature extraction

The ROC curve for PSDA, considering all signals in the gearbox dataset, is shown in Fig. 19. Because the classification accuracies in this multi-class problem are very high and similar for the different classes, the curves overlap. Classes P3 and P10 attained the lowest performance in terms of accuracy, 98.9%, and their curves overlap. The remaining curves, for P1, P2, P4, P5, P6, P7, P8, and P9, attain a perfect classification accuracy of 100% and also overlap in this plot. In general, the PSDA is highly accurate for gearbox fault classification.

Fig. 19 ROC curve for gearbox faults classification using PSDA considering all signals in the gearbox dataset

4.5 Results of experiment 2 with PSDA

In this experiment, the features used for classification were extracted with PSDA from the sequence of peaks obtained from the vibration signal, and the classification accuracy was calculated for each particular load. The accuracy was 98.7% for load L1, 100.0% for load L2, and 99.7% for load L3. Classification of signals acquired at load L2 therefore provided the highest accuracy. A list of quantitative metrics for the classification results with PSDA is presented in Table 5, which shows that this algorithm is highly accurate.

Table 5 Gearbox classification accuracy for the proposed algorithms (in percent)

4.6 Results comparison

4.6.1 Comparison of algorithms proposed

Both algorithms (SDA and PSDA) are highly accurate for fault classification in gearboxes. For the tests performed using all vibration signals in the dataset, the highest classification accuracy is 99.78%, attained with PSDA, compared with 99.2% for SDA. An accuracy of 100% can be obtained when working at a particular load, such as load L2 or L3. The results of the fault classification experiments are presented in Table 5. These results show that the combination of the lagged Poincaré plot with symbolic dynamics is very efficient for extracting features representing the mechanical status of the gearbox.
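The general idea of combining a lagged Poincaré plot with a symbol histogram can be sketched in Python as follows. This is only an illustrative sketch under strong assumptions: the angular partition of the plane into 12 sectors around the centroid, the peak definition, the lag value, the function names, and the synthetic test signal are all hypothetical choices made here, and they should not be read as the partition or parameters used by the SDA and PSDA algorithms described in this paper.

```python
import math


def lagged_poincare_points(sequence, lag=1):
    """Pairs (x[n], x[n + lag]) forming a lagged Poincare plot."""
    return [(sequence[i], sequence[i + lag]) for i in range(len(sequence) - lag)]


def local_peaks(signal):
    """Values of simple local maxima (hypothetical peak definition)."""
    return [signal[i] for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def symbol_histogram(points, n_symbols=12):
    """Normalized histogram of symbols assigned to Poincare plot points.

    ASSUMPTION: each point receives a symbol according to the angular
    sector (around the centroid of the plot) in which it falls.
    """
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    counts = [0] * n_symbols
    for x, y in points:
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        # Clamp guards against the angle rounding to exactly 2*pi.
        symbol = min(int(angle / (2 * math.pi) * n_symbols), n_symbols - 1)
        counts[symbol] += 1
    return [c / len(points) for c in counts]


# Synthetic signal standing in for a vibration recording (not real data).
signal = [math.sin(0.3 * n) + 0.5 * math.sin(1.7 * n) for n in range(400)]

# Histogram from the time samples (SDA-like) and from the peak sequence (PSDA-like).
features_samples = symbol_histogram(lagged_poincare_points(signal, lag=1))
features_peaks = symbol_histogram(lagged_poincare_points(local_peaks(signal), lag=1))
```

Either variant yields a 12-element feature vector per signal segment, which is the kind of compact input that a kernel SVM can classify at low computational cost.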

4.6.2 Comparison with respect to other methods using the same dataset

A comparison with other classification methods applied to the same gearbox dataset shows that both proposed algorithms attain higher accuracy. The results of the comparison are shown in Table 6, which presents the maximum accuracy attained by each algorithm. A method using the same gearbox dataset was reported in [6]. Its feature set is composed of time-domain statistical features, as well as frequency-domain and time-frequency-domain features extracted from the vibration signals. A hierarchical unsupervised algorithm is used for extracting representative features useful for fault classification. In particular, the classification was performed based on 12% of 811 features using several machine learning techniques, and the algorithm attained a classification accuracy of 98% using random forest classifiers. The same dataset was also analyzed in [30] using multimodal deep support vector classification (MDSVC). In that work, nine features were calculated from the time representation of the vibration signal; the same features were also calculated from its frequency representation. Similarly, a set of features was extracted from the wavelet packet representation. An MDSVC composed of three Gaussian-Bernoulli deep Boltzmann machines (GDBMs) was trained for classifying the faults of the gearbox dataset. The algorithm attained a classification rate of 97.08%. Another study using this dataset [32] performed the classification using only three features extracted from the Poincaré plot as inputs to support vector machines. The highest classification accuracy was 95.3%, considering vibration signals at constant load. Dictionary-based sparse representations of vibration signals were also used for extracting features for fault classification in this gearbox dataset [33]. In that case, the best classification accuracy was 95.1%, obtained at constant load.

Further results using this gearbox dataset were reported in [45]. A total of 30 features are extracted from the time-domain representation of the vibration signal. The extracted features are ranked using the ReliefF, chi-square, and information gain methods. After feature ranking, ten features were selected for classification. The machine learning algorithms used were K-nearest neighbors (KNN) and random forests. The highest classification accuracy obtained was 99.3%. In contrast, the maximum classification accuracy attained by the proposed symbolic dynamics algorithms is 99.8%. Additionally, the proposed algorithms consider only 12 features, which are easily extracted from the time-domain representation of the vibration signal.

Table 6 Comparison with respect to other methods using the same dataset

5 Conclusions

The problem of fault classification in gearboxes is addressed using two symbolic dynamics algorithms. The SDA extracts the symbol histogram directly from the Poincaré plot constructed from the time samples of the vibration signal, and the PSDA extracts the symbol histogram from a Poincaré plot constructed from a sequence of peaks obtained from the vibration signal. The feature analysis showed a cluster structure that could be useful for the classification of faults in gearboxes.

The overall accuracy was 99.2% when the features were extracted directly from the vibration signal. When the symbol histogram was instead extracted from the sequence of peaks of the vibration signal, the accuracy was 99.8%. In both cases, all the signals in the dataset were considered.

Classifying faults using only signals acquired at a particular load slightly reduces the classification accuracy for load L1 with both algorithms (97.7% with SDA and 98.7% with PSDA). However, for load L2, the classification accuracy increases to 100%.

The analysis of the generated cluster structures shows that even reducing the number of features to only three elements of the histogram produces a configuration of instances in the feature space that could be separated using an appropriate machine learning technique.

The experiments also showed that high accuracy is attained when the classification considers features from vibration signals acquired at a constant load. The proposed feature extraction approach has the advantage of being simple, because only 12 features are extracted from the vibration signal, and their calculation has a low computational cost. Additionally, the proposed fault classification algorithms are more accurate than previously reported methods using the same dataset. The approach is also general, as it could be useful for extracting features for fault classification from other types of signals recorded from gearboxes or roller bearings.

The proposed method is fast and can be adapted for real-time condition monitoring. A trained SVM could classify a recorded portion of a vibration signal with low delay, as only 12 features are necessary and both their calculation and the classification have a low computational cost. No other variant of the SVM algorithm was considered, as the obtained performance was adequate. Future work could address the comparison between different SVM algorithms in the particular context of the number of features and samples available in this case study.

The gearbox vibration signal dataset used in this research includes a number of signals that enables the machine learning technique to solve a balanced multi-class classification problem, since the number of instances in each class is similar. Further research is still necessary for evaluating the efficacy of the feature extraction algorithms in situations where the considered classes are imbalanced. Furthermore, although the test rig includes classes comprising combined faults in the pinion and gear, there are more complex combinations of faults that would need to be studied using the proposed algorithms.