Introduction

Work that requires repetitive action of the upper limbs is common. It usually involves simultaneous manipulations performed with the upper limbs, commonly at a fixed repetition rate. This kind of upper limb activity, called cyclic work or a repetitive task, is composed of sequences of tasks. The period of the cycle and the relative external force in each cycle period determine the external workload while repetitive tasks are performed [1]. It has been demonstrated that repetitive tasks make work hazardous [2–4]. The relationship between the external workload characterised by these parameters and musculoskeletal load or fatigue has been evidenced in numerous studies, in which individual authors discussed various indices of fatigue of the musculoskeletal system [4–10].

The existence of fatigue can be determined by analyzing the EMG signal. Muscle fatigue can be defined as a decrease in the force-generating capacity of a muscle. This change in mechanical performance capacity results in EMG changes: the effect of fatigue is observable in the recorded EMG signal as a change in the values of selected EMG parameters. These parameters are obtained by processing the EMG signal in the time and frequency domains. Muscle fatigue causes a shift of the EMG power spectrum towards lower frequencies [4, 11]. Successful analysis of transient surface electromyogram (SEMG) signals for fatigue analysis requires proper spectral estimation techniques. The short-time Fourier transform (STFT) sidesteps the question of stationarity by restricting the computation to a short time interval. However, the use of the Fourier transform is limited by its time–frequency resolution [5]. In particular, the signal must have time-invariant (stationary) or periodic frequency content within the analysis window; otherwise, the resulting spectrum makes little physical sense. Unfortunately, the EMG is a time-varying signal, in particular for contraction levels higher than 50% of maximum voluntary contraction (MVC) [12]. Hence time–frequency methods such as the STFT, the Wigner–Ville distribution (WVD) and the continuous wavelet transform (CWT) can be used for time-varying (non-stationary) signal analysis. In this paper, we propose a method based on time–frequency representations and evaluate how well the resulting variables quantify the EMG manifestations of muscle fatigue. These methods overcome the problems of deriving conventional Fourier spectral variables by avoiding conventional spectral estimation. Since they do not require quasi-stationarity or linearity assumptions, time–frequency methods are suitable for non-stationary signals. They supply a compact and physically meaningful representation of the EMG signal [6].

In the analysis of an upper limb repetitive task, it is imperative to express muscular fatigue quantitatively. This makes it possible to distinguish differences in muscle fatigue depending on the external load, both for continuous static loads and for repetitive ones. As a consequence, it is crucial to consider how the time after which fatigue occurs should be determined. A load-sustaining time imposed by the experimental conditions does not take into consideration the individual capabilities of participants, and it can be either too short or too long with respect to the process of muscle fatigue. As a result, two issues are of particular importance in assessing muscle fatigue: the changes in the value of a particular EMG signal parameter over time, and the time after which muscle fatigue takes place. Thus, muscle fatigue should be assessed with an indicator that enables its quantitative assessment [4].

The aim of this study was to detect fatigue of upper limb muscles from a function describing changes in the EMG signal parameters in time and frequency, using artificial neural networks (ANN). The proposed methodology consists of three steps. In the first step, the EMG signal is preprocessed using time–frequency methods to extract features. In the second step, the dimension of the extracted features is reduced using independent component analysis (ICA). In the third step, an unknown EMG signal is classified as fresh or fatigued using a multi-layer perceptron neural network (MLPNN) trained with the Levenberg–Marquardt (L-M) and gradient descent (GDA) algorithms.

Material and methods

Experimental design

Fourteen healthy, right-hand dominant volunteers (mean age 23.3 years, SD 1.6 years) participated in the experiment. The height of the subjects varied from 167 to 183 cm (mean 176.5 cm, SD 4.5 cm) and mass varied from 53 to 80 kg (mean 71 kg, SD 7.7 kg). Subjects were recruited from the university setting through requests for volunteers and represented a population of recreationally active young adults, none of whom were professional athletes. All subjects gave their informed consent to participate in the study. No subject had known symptoms of neuromuscular disorders. EMG signals were obtained from the right biceps brachii in an isometric constant force experiment. The participants performed cyclic contractions of the dominant arm, holding a 4 kg dumbbell corresponding to 15% of their MVC (1 s raising the weight and 1 s lowering it). The duration of the test was limited to 2 min, but subjects were allowed to stop earlier when exhausted. Thirteen subjects performed the task for 2 min. Only one subject had to stop the contraction earlier due to exhaustion. Bipolar surface EMG (BIOPAC MP150 system, EMG100C, gain: 1,000, band-pass: 1–500 Hz; sampling rate 1,000 Hz) was recorded using surface Ag–AgCl electrodes placed over the biceps brachii muscle of the dominant arm. The electrode site was first cleansed with sterile alcohol pads, applying sufficient abrasive action to lessen the resistance of the skin and consequently improve the signal-to-noise ratio (SNR). All software implementations were done in MATLAB (The MathWorks Inc., Natick, MA).

Time–frequency representations of signals

Exact simultaneous localisation in time and frequency can never be achieved because of the uncertainty principle. Hence, Fourier transform methods are not usually a suitable approach for the analysis of signals with transient components [13]. To capture both time and frequency information, a special type of signal representation is needed: a function of two variables, time and frequency, that captures transient effects in the signal. A more comprehensive representation of the signal can be obtained by using such time–frequency representations, which have turned out to be a powerful alternative for the analysis of transient signals. Some of these representations have variable resolution over the time–frequency plane, providing good time resolution at high frequencies and good frequency resolution at low frequencies [14].

Short-time Fourier transform: The spectrogram

The short-time Fourier transform (STFT) is a popular method of analysing non-stationary signals since it is simple and computationally efficient and yields reliable time–frequency plots for slowly varying signals. Its major drawback is the compromise between the time and frequency resolution of the decomposition. For signals containing transient features it may not be possible to handle this compromise efficiently [15].

The basic equation for the STFT in the continuous domain is:

$$X_{\text{STFT}}\left( \tau ,f \right) = \int\limits_{-\infty}^{\infty} x\left( t \right)\,w\left( \tau - t \right)\,e^{-j2\pi ft}\,\mathrm{d}t$$
(1)

where \(w(\tau - t)\) is the window function and \(\tau\) is the variable that slides the window across the waveform, \(x(t)\). There are two major problems with the spectrogram: (1) selecting an optimal window length for data segments that include numerous different features may not be possible, and (2) the time–frequency tradeoff: shortening the data length, \(N\), to get better time resolution decreases the frequency resolution, which is approximately \(1/(NT_{s})\), where \(T_{s}\) is the sampling interval. Shortening the data segment can also result in the loss of low frequencies that are no longer fully included in the segment. Consequently, if the window is made smaller to improve the time resolution, the frequency resolution is degraded, and vice versa. This time–frequency tradeoff has been equated to an uncertainty principle in which the product of frequency resolution (expressed as bandwidth, \(B\)) and time, \(T\), must be greater than some minimum. Specifically:

$$BT \geqslant \frac{1}{{4\pi }}$$
(2)

The trade-off between time and frequency resolution inherent in the STFT, or spectrogram, has motivated a number of other time–frequency methods as well as the time-scale approaches. In spite of these types of limitations, the STFT has been used effectively in a wide range of problems, mainly those where only high frequency components are of interest and frequency resolution is not crucial [16].
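The trade-off can be made concrete with a small numerical sketch. It is not the MATLAB implementation used in this study: the window length (100 samples), the hop size, the Hann window and the two-tone test signal are all assumptions chosen for illustration, and only the 1,000 Hz sampling rate is taken from the recording setup. With N = 100 and T_s = 1 ms the frequency resolution is 1/(NT_s) = 10 Hz, while each spectrum describes a 100 ms stretch of signal.

```python
import cmath
import math


def hann(n_points):
    """Symmetric Hann window of length n_points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n_points - 1))
            for i in range(n_points)]


def stft_power(x, win_len, hop):
    """Spectrogram: |DFT|^2 of Hann-windowed segments, one row per segment."""
    w = hann(win_len)
    rows = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + i] * w[i] for i in range(win_len)]
        row = []
        for k in range(win_len // 2 + 1):      # non-negative frequencies only
            acc = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / win_len)
                      for i in range(win_len))
            row.append(abs(acc) ** 2)
        rows.append(row)
    return rows


fs = 1000.0                      # sampling rate in Hz, as in the recording setup
win_len = 100                    # hypothetical window length N (samples)
delta_f = fs / win_len           # frequency resolution 1/(N*Ts) in Hz
# Synthetic test signal: 50 Hz for the first half second, 120 Hz afterwards.
x = [math.sin(2 * math.pi * (50 if n < 500 else 120) * n / fs)
     for n in range(1000)]
spec = stft_power(x, win_len, hop=50)
# Frequency of the strongest bin in each segment.
peak_hz = [row.index(max(row)) * delta_f for row in spec]
```

Halving `win_len` would double the number of independent time positions but widen each frequency bin to 20 Hz, which is the compromise expressed by Eq. (2).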

The Wigner–Ville distribution

Different kinds of approaches have been developed to eliminate some of the limitations of the spectrogram. The first of these was the Wigner–Ville distribution (WVD), which is also one of the most studied and best understood of the many time–frequency methods. The approach was originally developed by Wigner for use in physics and later applied to signal processing by Ville, hence the dual name. The Wigner–Ville distribution is a special case of a wide variety of similar transformations known under the heading of Cohen’s class of distributions. For an extensive summary of these distributions see [16, 17].

The Wigner–Ville distribution uses an approach that harkens back to the early use of the autocorrelation function for calculating the power spectrum. Hence the Wigner–Ville distribution can be defined from the time domain representation of the signal:

$$WVD_{x}\left( t,f \right) = \int\limits_{-\infty}^{\infty} x\left( t + \frac{\tau}{2} \right)\,x^{*}\left( t - \frac{\tau}{2} \right)\,e^{-j2\pi f\tau}\,\mathrm{d}\tau$$
(3)

where \(R_{x}\left( t,\tau \right) = x\left( t + \frac{\tau}{2} \right)\,x^{*}\left( t - \frac{\tau}{2} \right)\) is the instantaneous autocorrelation function and * indicates complex conjugation [16]. This distribution satisfies a large number of desirable mathematical properties. In particular, the WVD is always real valued, preserves time and frequency shifts, and satisfies the marginal properties. In recent decades, much research has been directed at efficiently suppressing cross-terms and enhancing the frequency resolution while maintaining the required properties of a time–frequency energy distribution. Although the Wigner–Ville distribution has the merit of high resolution in both time and frequency, the interference terms produced by the interaction of two signal components make the result hard to interpret. Hence it is desirable to reduce the number and the amplitude of the interference terms. If a time smoothing window \(h(t)\) and a frequency smoothing window \(g(f)\) are applied, the WVD becomes the smoothed-pseudo-Wigner–Ville distribution (SPWVD), which is written as

$$SPWVD_{x}\left( t,f \right) = \iint h\left( t - \tau \right)\,g\left( f - \xi \right)\,W_{x}\left( \tau ,\xi \right)\,\mathrm{d}\tau\,\mathrm{d}\xi$$
(4)

where \(W_{x}\) is the Wigner–Ville distribution. The compromise of the spectrogram between time and frequency resolution is now replaced by a compromise between the joint time–frequency resolution and the level of the interference terms. That is, increasing the smoothing in time and/or frequency gives poorer resolution in time and/or frequency [15–20].

The WVD has a number of limitations. The most serious of these is the production of cross products: the appearance of energy at time–frequency values where none exists. These phantom energies have been the most important motivator for the development of other distributions that apply various filters to the instantaneous autocorrelation function to mitigate the damage done by the cross products. In addition, the Wigner–Ville distribution can have negative regions that have no physical meaning. The Wigner–Ville distribution also has poor noise properties. In essence, the noise is distributed across all times and frequencies, including the cross products of the noise, although in some cases the influence of the cross products and noise can be reduced by using a window. In such situations, the window function is applied to the lag dimension of the instantaneous autocorrelation function, similar to the way it was applied to the time function [16].
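The cross-product problem can be reproduced with a minimal discrete sketch of Eq. (3). The function name `wvd_at`, the two component frequencies, the lag limit and the use of complex exponentials in place of the analytic signal of a real recording are all assumptions made for illustration; this is not the implementation used in the study. The lag is sampled as τ = 2m/f_s so that both x(t + τ/2) and x(t − τ/2) fall on sample points.

```python
import cmath
import math


def wvd_at(x, n, freq_hz, fs, max_lag):
    """Discrete Wigner-Ville value at sample n and frequency freq_hz.

    Sums x[n+m] * conj(x[n-m]) * exp(-j*2*pi*f*tau) over the lag tau = 2*m/fs.
    """
    lag = min(max_lag, n, len(x) - 1 - n)     # stay inside the record
    total = 0j
    for m in range(-lag, lag + 1):
        kernel = x[n + m] * x[n - m].conjugate()   # instantaneous autocorrelation
        total += kernel * cmath.exp(-2j * math.pi * freq_hz * 2 * m / fs)
    return total


fs = 1000.0
f1, f2 = 100.0, 200.0            # two hypothetical components (Hz)
# Complex exponentials stand in for the analytic signal of a real recording.
x = [cmath.exp(2j * math.pi * f1 * n / fs) + cmath.exp(2j * math.pi * f2 * n / fs)
     for n in range(200)]

auto = wvd_at(x, 50, f1, fs, max_lag=22)                 # at a true component
cross_a = wvd_at(x, 50, (f1 + f2) / 2, fs, max_lag=22)   # midway, no real energy
cross_b = wvd_at(x, 55, (f1 + f2) / 2, fs, max_lag=22)   # same frequency, 5 ms later
```

In this example the value midway between the two components is larger than the value at a true component and changes sign within 5 ms, which illustrates both the phantom energy and the negative regions discussed above. The imaginary parts vanish up to rounding, in line with the real-valued property of the WVD.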

The continuous wavelet transform

Wavelets handle the time–frequency resolution compromise in a different way to the STFT. Rather than having uniform time and frequency resolution across the time–frequency plane, wavelets present good time resolution at high frequencies and good frequency resolution at low frequencies. This property can be extremely valuable in the detection of short-time transients, such as high-frequency waves. In other cases a multi-resolution decomposition is not suitable for representing a signal, particularly when good frequency (or time) resolution is required across the frequency range [16].

The basic idea underlying wavelet analysis consists of expressing a signal as a linear combination of a particular set of functions, obtained by shifting and dilating one single function called a mother wavelet. Then the correlation between the resulting wavelet shape and the signal is calculated. This coefficient is a measure of how much of the wavelet at that particular scale and time point is included in the signal. The translated and dilated wavelet is defined in terms of the mother wavelet as

$$\psi _{{a,b}} {\left( t \right)} = \frac{1}{{{\sqrt {{\left| a \right|}} }}}\psi {\left( {\frac{{t - b}}{a}} \right)}$$
(5)

for positive a. Conventionally, a is 1 for the mother wavelet, and increasing a above 1 dilates the wavelet, expanding the interval over which it takes non-zero values. The factor \(\frac{1}{{{\sqrt {{\left| a \right|}} }}}\) ensures that all of the scaled wavelets have a squared norm of 1, so the wavelet transforms at different scales are directly comparable. When the wavelet is dilated to large scales, its scale content becomes more localised but the time period it covers becomes less localised. This is why wavelets have good time resolution and poor scale resolution at low scales, but poor time resolution and good scale resolution at high scales. The wavelet operates as a smoothing function in the time and scale domains in the same way as windows smooth the STFT. The continuous wavelet transform (CWT) of a signal x(t) is formally defined at translation b and scale a by the formula

$$W{\left( {a,b} \right)} = {\int\limits_{ - \infty }^\infty {x{\left( t \right)}\frac{1}{{{\sqrt a }}}\psi {\left( {\frac{{t - b}}{a}} \right)}} }\operatorname{d} t,$$
(6)

where b acts to translate the function across x(t) just as τ does in Eq. (1) [15]. The wavelet coefficients, W(a, b), express the correlation between the waveform and the wavelet at a variety of translations and scales: the similarity between the waveform and the wavelet at a given combination of scale and position, a, b. In other words, the coefficients give the amplitudes of a series of wavelets, over a range of scales and translations, that should be added together to reconstruct the original signal. From this point of view, wavelet analysis can be thought of as a search over the waveform of interest for activity that most closely approximates the shape of the wavelet. This search is carried out over a range of wavelet sizes: although its shape remains the same, the time span of the wavelet varies. Since the net area of a wavelet is always zero by design, a waveform that is constant over the length of the wavelet gives rise to zero coefficients. Wavelet coefficients respond to changes in the waveform, more strongly to changes on the same scale as the wavelet, and most strongly to changes that are similar to the wavelet. Even though the CWT is a redundant transformation, it is often easier to analyze or recognize patterns using it [16, 21].
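The zero-area property and the sensitivity to changes can be checked with a short sketch that approximates Eq. (6) by a Riemann sum. The Mexican-hat mother wavelet (with its normalising constant omitted), the scale of 0.02 s and the synthetic flat and step signals are assumptions chosen for illustration; the source does not state which mother wavelet was used.

```python
import math


def mexican_hat(t):
    """Mexican-hat mother wavelet (zero net area); normalising constant omitted."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt_coeff(x, fs, scale, shift):
    """Riemann-sum approximation of W(a, b) in Eq. (6) for a sampled signal."""
    total = 0.0
    for n, sample in enumerate(x):
        total += sample * mexican_hat((n / fs - shift) / scale)
    return total / (fs * math.sqrt(scale))   # dt = 1/fs and the 1/sqrt(a) factor


fs = 1000.0
scale = 0.02                                  # hypothetical scale a, in seconds
flat = [1.0] * 1000                           # constant waveform
step = [0.0] * 500 + [1.0] * 500              # abrupt change at t = 0.5 s

flat_coeff = cwt_coeff(flat, fs, scale, shift=0.5)    # constant input
far_coeff = cwt_coeff(step, fs, scale, shift=0.2)     # far from the change
near_coeff = cwt_coeff(step, fs, scale, shift=0.48)   # one scale before the change
```

The coefficients for the constant waveform and for the position far from the step are numerically zero, whereas the coefficient one scale length before the step is clearly non-zero.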

Independent component analysis (ICA)

Makeig et al. [22] presented the first applications of ICA to biomedical time series analysis. The significance of using ICA to distinguish a source, \(x_{i}\), from mixtures, \(y_{i}\), is that the activity of each source is statistically independent of the other sources [23]; i.e., the mutual information between any two sources, \(x_{i}\) and \(x_{j}\), is zero. The general ICA model can be written as

$$Y = AX,$$
(7)

where A is an unknown full-rank matrix, called the mixing matrix, and X is the independent component (IC) data matrix, and Y is the measured variable data matrix. The crucial problem of ICA is to estimate the independent component matrix X or to estimate the mixing matrix A from the measured data matrix Y without any knowledge of X or A. The practical problem of ICA is to calculate a separating matrix W so that the components of the reconstructed data matrix \(\widehat{X}\) become as independent of each other as possible, given as

$$\widehat{X} = WY$$
(8)

The requirement of independence of the components of \(\widehat{X}\) is equivalent to the assumption that they have non-Gaussian distributions. As a result, the independent components are estimated by optimizing some measure of the non-Gaussianity of the vector WY. In this work, a fast fixed-point algorithm [24, 25] is used to minimize or maximize the fourth-order cumulant to perform ICA (available in the FastICA MATLAB Package [26]). Since statistical independence is more restrictive than uncorrelatedness, the measured variables are first transformed into uncorrelated variables with unit variances. This pretreatment is called sphering or prewhitening [27, 28].
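As an illustration of the non-Gaussianity measure, the following sketch computes the fourth-order cumulant (kurtosis) of a single standardised channel. It is not the FastICA package and performs no source separation; the function names and the two synthetic sequences are assumptions. Standardisation to zero mean and unit variance is used here as the one-dimensional counterpart of sphering.

```python
import math


def standardise(values):
    """Centre to zero mean and scale to unit variance."""
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    var = sum(c * c for c in centred) / len(centred)
    return [c / math.sqrt(var) for c in centred]


def kurtosis(values):
    """Fourth-order cumulant E[y^4] - 3 (E[y^2])^2 of the standardised data."""
    y = standardise(values)
    m2 = sum(v ** 2 for v in y) / len(y)
    m4 = sum(v ** 4 for v in y) / len(y)
    return m4 - 3.0 * m2 ** 2


ramp = [i / 999 for i in range(1000)]      # evenly spread values (sub-Gaussian)
spiky = ([0.0] * 99 + [1.0]) * 10          # rare spikes (super-Gaussian)

k_ramp = kurtosis(ramp)
k_spiky = kurtosis(spiky)
```

A Gaussian variable has zero kurtosis, so values far from zero in either direction indicate non-Gaussianity, which is why the fixed-point algorithm may either minimize or maximize this quantity.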

Artificial neural network models

Artificial neural networks (ANNs) are computing systems made up of a large number of densely interconnected adaptive processing elements (neurons) that are able to perform massively parallel computations for data processing and knowledge representation. ANNs can be trained to recognize patterns, and the nonlinear models developed during training allow neural networks to generalize their conclusions and apply them to patterns not previously encountered. There are many different types and architectures of neural networks, varying fundamentally in the way they learn; the details are well documented in the literature [29–31]. In this paper, the neural network relevant to the application being considered, namely the MLPNN, is employed for designing classifiers. The architecture of an MLPNN may contain two or more layers. A simple two-layer ANN consists only of an input layer containing the input variables of the problem and an output layer containing the solution of the problem. This type of network is a satisfactory approximator for linear problems. However, for approximating nonlinear systems, additional intermediate (hidden) processing layers are employed to handle the nonlinearity and complexity of the problem. Although it depends on the complexity of the function or process being modeled, one hidden layer may be sufficient to map an arbitrary function to any degree of accuracy. Hence, a three-layer ANN architecture was adopted for the present study.

To solve the pattern classification problem, MLPNNs employing the Levenberg–Marquardt (L-M) and gradient descent (GDA) training algorithms were used. The L-M algorithm combines the best features of the Gauss–Newton technique and the steepest-descent algorithm, but avoids many of their limitations. In particular, it generally does not suffer from the problem of slow convergence [32]. An effective training algorithm and well-understood system behavior are the advantages of this type of neural network. The selection of network input parameters and the performance of the classifier are important in muscle fatigue detection. The efficiency of this technique can be assessed from the experimental results [33]. This paper demonstrates that our method is applicable for detecting muscle fatigue. The strengths of the method are that it is simple to apply and does not require high computational power. The method can be used as a standalone tool, but it can also be implemented as a building block for computer-assisted EMG fatigue detection.

Results and discussion

In this study, we propose the use of time–frequency methods with ICA as an alternative approach for classifying the surface electromyography signal in order to study local muscle fatigue during sustained isometric constant force muscle contractions. The time-varying character of these methods makes it possible to accommodate non-stationary EMG data from higher-level contractions. The reliability and accuracy of the time–frequency methods are compared in terms of their robustness in determining muscle fatigue. The results suggest that time–frequency methods are a better choice for the spectral estimation process commonly used to quantify surface EMG manifestations of muscle fatigue. Although much more sophisticated classification methods for muscle fatigue detection exist, according to the results obtained we considered our choice adequate for the given problem. The efficiency of this technique can be assessed from the experimental results. In this study, EMG recordings were preprocessed using the STFT, SPWVD and CWT (Figs. 1 and 2), and ICA was used for dimension reduction. The resulting features were then applied to the MLPNN.

Fig. 1 a Fresh EMG signal, b STFT, c SPWVD, d CWT of the signal

Fig. 2 a Fatigued EMG signal, b STFT, c SPWVD, d CWT of the signal

Development of ANN model

The adequate functioning of the MLPNN depends on the sizes of the training set and test set. During training, the input and desired data are repeatedly presented to the network. When using a neural network, a decision must be taken on how to divide the data into a training set and a test set. In this study, about 55% of the overall data were used for training and the rest (about 45%) were used for testing. The objective of the modelling phase in this application was to develop classifiers that are able to identify any input combination as belonging to one of two classes: fresh or fatigued. For developing the neural network classifiers, 600 examples were randomly taken from the 1,100 examples and used for training the neural networks. The remaining 500 examples were set aside and used for testing the validity of the developed models. The class distribution of the samples in the training, validation and test data sets is summarized in Table 1.
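The random division into 600 training and 500 test examples can be sketched as follows. The function name `split_examples` and the seed are assumptions, and the sketch operates on example indices only; how the authors actually drew the random sample is not stated.

```python
import random


def split_examples(n_total, n_train, seed):
    """Randomly split example indices into disjoint training and test sets."""
    indices = list(range(n_total))
    rng = random.Random(seed)        # fixed seed so the split is reproducible
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 1,100 examples in total, 600 for training and the remaining 500 for testing.
train_idx, test_idx = split_examples(1100, 600, seed=0)
train_fraction = len(train_idx) / 1100
```

With these counts the training fraction is 600/1,100, or roughly 55%.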

Table 1 Class distribution of the samples in the training and test data sets

The MLPNN was designed with different features of the EMG signal in the input layer, and the output layer consisted of one node representing whether fatigue was detected or not. A value of “0” was used when the experimental investigation indicated a fresh muscle and “1” for a fatigued muscle. Depending on the nature of the method, we used a different number of nodes in the input layer for each method (STFT, SPWVD and CWT): 150 nodes for STFT, 160 for SPWVD and 175 for CWT. The preliminary architecture of the network was examined using one and two hidden layers with a variable number of hidden nodes in each. It was found that one hidden layer is adequate for the problem at hand. Thus the network contains three layers of nodes. The training procedure started with one hidden node in the hidden layer, followed by training on the training data, and then by testing on the validation data to examine the network’s prediction performance on cases never used in its development. The same procedure was then repeated, each time expanding the network by adding one more node to the hidden layer, until the best architecture and set of connection weights were obtained. Using the backpropagation (L-M and GDA) algorithms for training, a learning rate of 0.001 and a momentum coefficient of 0.9 were found to be optimal for training the network with various topologies. The selection of the optimal network was based on monitoring the variation of the error and some accuracy parameters as the hidden layer size was increased and for each training cycle. The sum of squared errors, i.e. the sum of squared deviations of the MLPNN output from the true (target) values for both the training and test sets, was used for selecting the optimal network. The optimum network configuration found had 65 neurons in the hidden layer. A sigmoidal function was used in the hidden layer and a linear function in the output layer.
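The forward pass of such a three-layer network, with a sigmoidal hidden layer and a linear output node, can be sketched as below. The weights are random placeholders rather than trained values, the feature vector is synthetic, and the 0.5 decision threshold on the output is an assumption, since the source does not state how the continuous output was mapped to “0” or “1”. Training with L-M or GDA is not shown.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used in the hidden layer."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(features, w_hidden, b_hidden, w_out, b_out):
    """Three-layer MLP: sigmoidal hidden layer followed by one linear output node."""
    hidden = [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


def classify(output, threshold=0.5):
    """Map the output to a class label; the 0.5 threshold is an assumption."""
    return "fatigued" if output >= threshold else "fresh"


rng = random.Random(0)                       # arbitrary seed, placeholder weights
n_in, n_hidden = 150, 65                     # STFT input size and hidden size from the text
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
features = [rng.random() for _ in range(n_in)]   # stand-in for one feature vector
label = classify(mlp_forward(features, w_hidden, b_hidden, w_out, 0.0))
```

Growing the hidden layer one node at a time, as described above, would amount to repeating the training and evaluation for `n_hidden = 1, 2, …` and keeping the size with the lowest sum of squared errors.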

Evaluation of performance

Because the problem involves classification into two classes, accuracy, sensitivity and specificity were used as performance measures. These parameters were obtained separately for the training and testing sets each time a new network topology was examined. Computer programs that we wrote for the training algorithms based on backpropagation of error (L-M and GDA) were used to develop the MLPNNs. The test performance of the MLPNN was determined by computing the following statistical parameters:

Specificity

Number of correctly classified non-fatigued subjects/number of total non-fatigued subjects.

Sensitivity

Number of correctly classified fatigued subjects/number of total fatigued subjects.

Total classification accuracy

Number of correctly classified subjects/number of total subjects.
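The three measures defined above can be computed from target and predicted labels as in the following sketch, with “1” denoting fatigued and “0” denoting fresh as in the network output coding. The function name and the ten example labels are hypothetical and do not correspond to the study data.

```python
def performance(targets, predictions):
    """Sensitivity, specificity and total accuracy for labels 1 = fatigued, 0 = fresh."""
    pairs = list(zip(targets, predictions))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # fatigued, detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # fatigued, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # fresh, recognised
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # fresh, flagged as fatigued
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels: four fatigued and six fresh examples.
targets = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
scores = performance(targets, predictions)
```

For these labels three of the four fatigued examples and five of the six fresh examples are classified correctly, so the sensitivity is 0.75, the specificity is 5/6 and the total accuracy is 0.8.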

Experimental results

In this paper, different techniques based on time–frequency methods and neural networks are proposed for detecting muscle fatigue. Conventional techniques related to frequency and amplitude analysis do not work in this case. The signal amplitude for all subjects falls near the end of the test, showing that the subjects are exerting less force. This shows that, while fatigue is obviously present according to the subjects’ own experience, it cannot be detected by these techniques because the constant-force condition does not hold. Time–frequency based decomposition is used to isolate muscle activity related to the same phasic voluntary movements in the fourteen volunteer subjects. The proposed method for detecting fatigue by observing the behaviour of the time–frequency spectrum with ICA can be automated using neural networks. An MLPNN has been used to visualize the variation of the time–frequency methods and aid the detection of fatigue. The input dataset for the MLPNN was created by forming vectors whose elements are extracted using the STFT, SPWVD and CWT.

The training set provided to the MLPNN was representative of the whole space of concern, so that the trained MLPNN had the ability to generalize. After training the MLPNN, it was determined that the network adequately classified the data. We achieved a classification rate higher than 90% by using an artificial neural network as a classifier. Depending on whether the output neuron had a value of “0” or “1”, the EMG recording was classified as fresh or fatigued. Tables 2 (for the L-M algorithm) and 3 (for the GDA algorithm) show a summary of the performance measures. It is evident from Table 2 that, with the L-M algorithm, the highest success rate was obtained for the non-fatigued group (specificity) using SPWVD with ICA (93%). Table 2 also shows that the wavelet transform method has the best classification accuracy. The average classification success rate was 91% for CWT, 90% for SPWVD and 88.5% for STFT. With the GDA algorithm, the highest success rate was again obtained for the non-fatigued group (specificity) using SPWVD with ICA (92%). The average classification success rate was 89% for both CWT and SPWVD and 87.5% for STFT.

Table 2 Comparison of time–frequency methods for muscle fatigue detection using L-M algorithm
Table 3 Comparison of time–frequency methods for muscle fatigue detection using GDA algorithm

Discussion

The aim of this study is to validate the use of wavelets in detecting muscle fatigue in comparison with the SPWVD and STFT. In order to examine the effect of the time–frequency methods on classification efficiency, tests were carried out using ICA for dimensionality reduction. The average efficiency was obtained for each method when the EMG signals were classified using MLPNN structures.

In recent years, a number of studies [8–12] have used time–frequency methods to process non-stationary EMG signals. The Wigner–Ville distribution has a number of advantages over the STFT, but also has a number of limitations. It produces an extremely good picture of the time–frequency structure [14]. It also has favourable marginals and conditional moments. The marginals relate the summation over time or frequency to the signal energy at that time or frequency [16]. Unlike the STFT, the CWT does not assume signal stationarity and uses windows (wavelets) of different lengths to accomplish a multiresolution analysis. The CWT estimates the amount of power at each possible frequency for each time instant, resulting in a time–frequency representation of the signal. The CWT presents good resolution in both time and frequency, thus making it useful for non-stationary signal processing [8, 34, 35].

These findings suggest that the STFT, SPWVD and CWT methods give similar information regarding the physiological mechanisms underlying fatigue during dynamic muscle actions. Consequently, time–frequency methods are acceptable for determining muscle fatigue from the EMG. Although there were small differences between the results, the testing performance of the neural network classification systems was found to be satisfactory for both the L-M and GDA algorithms. Hence, we think that this system can be used for muscle fatigue detection in the future after further development. This application brings objectivity to the evaluation of EMG signals, and its automated nature makes it easy to use in practice.

Conclusions

Determination of muscle fatigue is a difficult task that requires examination of the muscle by means of an EMG. An artificial neural network that classifies subjects as having fresh or fatigued muscle provides a valuable diagnostic decision support tool for muscle fatigue detection. In this study, time–frequency methods have been used for feature extraction and ICA has been used to reduce the dimension of the feature vectors. The extracted features of the EMG signals have then been used as input to an MLPNN that could be used to detect muscle fatigue. This process is realized with an online data acquisition system. Based on these methods, classifiers have been developed and trained. The time–frequency methods have been shown to detect fatigue successfully under dynamic conditions. The proposed techniques can be used to detect the presence of fatigue and act against it at an early stage, possibly by adjusting comfort-related parameters that can affect the presence of fatigue. With specificity and sensitivity values both above 90%, the MLPNN classification with the L-M algorithm may be used as an important decision support system in muscle fatigue detection.