1 Introduction

In recent decades, the surface electromyography (EMG) signal has been widely investigated for neuromuscular disorder diagnosis, rehabilitation, the control of prosthetic devices, and man–machine interfaces targeting individuals with amputations or congenitally deficient limbs [2, 8, 17, 24, 26, 37, 41]. This is because the EMG signal provides a highly useful characterization of the neuromuscular system: many pathological processes, whether arising in the nervous system or in the muscles, manifest themselves as alterations in the signal properties.

The analysis and processing of EMG signals, mainly for the purpose of pattern classification, is typically organized into two interdependent modules [15, 26]: (1) feature extraction and (2) classification. Feature extraction is especially helpful when the pattern to be represented is a sequence of values taken as a function of time, say x(t), such as the EMG signal. In general, there are four classes of feature extraction approaches for representing 1D signals, namely those based on time, frequency, time–frequency, and nonlinear dynamics.

It has been shown that biomedical signals, such as the EMG, are inherently nonlinear in nature, exhibiting well-defined properties such as scale invariance, scaling range, power law scaling, and self-similarity [14, 38]. Self-similarity, in particular, whereby the small-scale structure of an object resembles its large-scale structure, has been exploited both to characterize different biomedical signals and to identify different patterns within them [25, 32]. In fact, EMG signals usually show noticeable traces of self-similarity that can be captured by fractal dimension (FD) measures [22], providing a way to extract discriminative features directly from these signals [13]. Roughly speaking, the FD amounts to a non-integer, fractional dimension of a geometric object [4, 44].

In [33], among the nonlinear methods investigated for representing EMG signals, fractal dimension was found to be especially interesting for its sensitivity to the magnitude and rate of the generated muscle force. In the work of Hu et al. [22], the FD was calculated from filtered surface EMG signals in order to discriminate between forearm supination (FS) and forearm pronation (FP) movements. The authors reported that the FD values of filtered FS and filtered FP surface EMG signals fall into two distinct regions, demonstrating the usefulness of the FD in capturing different motion patterns of surface EMG signals. More recently, Phinyomark et al. [34] investigated the specific case of low-level EMG signal classification through a single-channel system, which is a difficult pattern classification task. The authors concluded that detrended fluctuation analysis (DFA), an advanced fractal analysis method suited to the identification of low-level muscle activations, performs better than other conventional features in the classification of EMG signals from bifunctional movements, such as flexion–extension. In a different vein, Ancillao et al. [3] conducted an experimental study investigating the correlation between the fractal dimension of the surface EMG signal recorded over the main extensor muscle of the human leg, viz. the rectus femoris muscle, during a vertical jump and the height reached in that jump. The authors concluded that the FD properly characterizes the EMG signal, and a linear regression analysis showed a very high correlation coefficient between the fractal dimension and the jump height achieved by the 20 healthy subjects recruited.

Regarding the classification stage, it can be briefly defined as the process of assigning one out of C discrete labels (classes) to a given input vector \(\varvec{x}\) [5]. The classification of EMG signals, in particular, is a hard pattern recognition task, since the EMG signal is usually contaminated by interference and fluctuations [21]. Numerous empirical studies have investigated the use of different types of classifiers operating on different types of features extracted from the EMG signal. These classifiers include artificial neural networks (ANN) [9], linear and quadratic discriminant analysis [6, 35], Bayesian classifiers [16], fuzzy classifiers [7], and support vector machines (SVM) [12, 28, 31, 45]. In a recent work [46], Yousefi and Hamilton-Wright conducted a critical review of classification methodologies used in EMG characterization and also presented the state-of-the-art accomplishments in this field, with emphasis on neuromuscular pathology.

Most of the aforementioned classifiers are based on the idea of solely minimizing the training error, usually called the empirical risk. However, the combination of limited amounts of training data and the quest for high classification accuracy over these data often leads to overfitting [5]. In addition, the accuracy levels exhibited by these classifiers are usually very sensitive to the feature dimension of the given pattern set. Since they do not suffer from these deficiencies, SVM appear as the method of choice for highly complex classification problems, such as those involving biomedical signals.

Relevance vector machines (RVM) were introduced by Tipping [42] as a Bayesian variant of SVM, which means that they also do not suffer from the aforementioned drawbacks. The RVM yields a probabilistic sparse model identical in functional form to the SVM, representing an approach to pattern classification that has recently attracted a great deal of interest. In many problems, RVM classifiers have produced results competitive with other kernel-based classifiers, and they have recently been thoroughly investigated in the context of electroencephalogram (EEG) signal classification for epilepsy diagnosis [29, 30].

The RVM formulation has recently been adapted to deal directly with multiclass classification problems [36]. A straightforward multiclass adaptation of the RVM is problematic because the maximization of the marginal likelihood scales badly with the number of classes [10] and with the dimensionality of the Hessian required for the Laplace approximation [5]. In [36], Psorakis et al. conceived an approach to circumvent these difficulties, bringing about two multiclass multikernel RVM methods (hereafter referred to as mRVM) that address multikernel learning while producing both sample-wise and kernel-wise sparse solutions.

In this paper, we investigate the conjoint use of RVM and FD for tackling the task of EMG signal classification. For this purpose, besides the standard RVM formulation, two types of mRVM, namely constructive mRVM and top-down mRVM, as well as different methods for calculating the FD of an EMG signal, were considered. As far as the authors are aware, this is the first work providing a thorough assessment of the potential of combining RVM and FD into a single EMG signal classification framework. Several experiments have been conducted on a dataset involving seven distinct types of limb motions, and the performance of distinct configurations of the RVM+FD approach is reported.

The rest of the paper is organized as follows. In Sects. 2 and 3, we present four methods for estimating the FD of a 1D signal and the mathematical formulations behind the RVM and mRVM models, respectively. In Sect. 4, we characterize the EMG dataset used in the experiments and outline the procedures adopted for data preprocessing. We then present and discuss the results achieved by different configurations of the RVM+FD approach, taking as reference the performance delivered by SVM models. Finally, Sect. 5 concludes the paper with remarks on future work.

2 Fractal dimension

In a nutshell, the fractal dimension is a statistical index of complexity, indicating how the details of a given physical pattern (or object) change with the scale at which they are measured [1, 4]. The value of this index is usually a non-integer, fractional number, hence the designation fractal dimension. There are many notions of FD, and various algorithms have been proposed to compute them [44]. None of these methods, however, should be considered universal, which justifies an empirical comparison of their abilities as feature extractors for EMG signals. In the following subsections, we outline the four methods adopted in our experiments.

2.1 Box-counting method

The idea behind the box-counting (BC) method is to apply successive hypercube grid coverings over a curve (e.g., a 1D signal), yielding a value that is usually very close to the Hausdorff dimension, another standard measure of the FD [4]. Since each iteration of the BC method applies a finer covering, the method is said to perform a progressively finer analysis of the fractal. The resulting FD measure is usually referred to as the box-counting dimension.

To compute the BC dimension, the successive coverings generated by the method are plotted on a log–log curve (a.k.a. the BC curve), whose points relate the shrinking of the hypercubes to their occupation rates. The straight line that best fits the BC curve captures the behavior of the observations from the signal under analysis, and its slope (i.e., the exponent of the underlying power law) gives the BC dimension of the fractal.

Formally speaking, the calculation of the BC dimension (D) is given by [4]:

$$\begin{aligned} D = \lim _{n \rightarrow \infty }\frac{\log (N_n(\varLambda ))}{\log (2^n)}, \end{aligned}$$

where \(\varLambda \in {\mathfrak{H}}({\mathfrak{R}}^m)\) is an attractor in the Euclidean metric space whose points are compact subsets of \({\mathfrak{R}}^m\); \(N_n(\varLambda )\) is the number of boxes intersecting the attractor; and n denotes the nth iteration of the process. Simply put, the BC method covers \({\mathfrak{R}}^m\) with a grid of boxes of lateral size \(1/2^n\).
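For illustration, a minimal NumPy sketch of this covering scheme for a 1D signal is given below. Rescaling the signal graph into the unit square, the number of iterations, and the least-squares fit of the log–log points are our own implementation choices, and only boxes containing sampled points are counted (an approximation of the continuous curve).

```python
import numpy as np

def box_counting_fd(signal, max_iter=8):
    """Hedged sketch of the box-counting dimension of a 1D signal.

    At iteration n the unit square is covered with boxes of side 1/2**n;
    the boxes hit by the (sampled) curve are counted, and the FD is the
    slope of log N_n versus log 2**n.
    """
    signal = np.asarray(signal, dtype=float)
    x = np.linspace(0.0, 1.0, len(signal))
    y = (signal - signal.min()) / (signal.max() - signal.min() + 1e-12)

    log_counts, log_scales = [], []
    for n in range(1, max_iter + 1):
        n_boxes = 2 ** n
        # Grid cell (col, row) that each sample falls into.
        cols = np.minimum((x * n_boxes).astype(int), n_boxes - 1)
        rows = np.minimum((y * n_boxes).astype(int), n_boxes - 1)
        occupied = len(set(zip(cols.tolist(), rows.tolist())))
        log_counts.append(np.log(occupied))
        log_scales.append(np.log(n_boxes))

    slope, _ = np.polyfit(log_scales, log_counts, 1)   # slope of the log-log fit
    return slope
```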

2.2 Higuchi’s method

Like the former, Higuchi's method [19, 44] is iterative in nature; however, it is especially suited to handling waveforms. Consider \(s=\{s(1),s(2),\ldots ,s(N)\}\) as an epoch of the time series to be analyzed. Then, construct k new time series (a.k.a. sub-epochs) \(s_m^k\), each defined as [44]

$$\begin{aligned} s_m^k= \left\{ s(m),s(m+k),s(m+2k),\ldots ,s \left( m+\left\lfloor \frac{(N-m)}{k} \right\rfloor k \right) \right\} , \end{aligned}$$

where N is the total length of the data sequence s; \(m=1,2,3,\ldots ,k\) indicates the initial time value; k indicates the discrete time interval between points (delay); and \(\lfloor \cdot \rfloor\) means the floor operator.

For each of the sub-epochs \(s_m^k\), the average length \(L_m(k)\) is computed as

$$\begin{aligned} L_m(k)= \frac{1}{k}\left\{ \frac{(N-1)}{\left\lfloor \frac{(N-m)}{k} \right\rfloor k}\sum _{i=1}^{\left\lfloor \frac{(N-m)}{k} \right\rfloor } \left| s(m+ik)-s(m+(i-1)k) \right| \right\} , \end{aligned}$$

where \((N-1)/\lfloor (N-m)/k \rfloor k\) is a normalization factor.

Then, the length of the epoch L(k) for the time interval k is computed as the mean of the k values \(L_m(k)\), \(m=1,2,\ldots ,k\), as given in Eq. (1). This procedure is repeated for each k, ranging from 1 to \(k_\mathrm{max}\) (\(k_\mathrm{max}=5\) in our experiments).

$$\begin{aligned} L(k)= \frac{1}{k}\sum _{m=1}^k L_m(k). \end{aligned}$$
(1)

The total average length L(k), for scale k, is proportional to \(k^{-D}\), where D is the FD of the curve describing the shape of the epoch as calculated by Higuchi's method. Equivalently, if L(k) is plotted against k on a double-logarithmic scale, the magnitude of the slope of the linear regression of this plot can be taken as an estimate of the FD of the epoch [19].
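A minimal sketch of this procedure, assuming \(k_\mathrm{max}=5\) as in our experiments and a least-squares fit of the log–log points, could look as follows; the 0-based indexing and the sign convention for the slope are our own choices.

```python
import numpy as np

def higuchi_fd(s, k_max=5):
    """Hedged sketch of Higuchi's method.

    For each delay k the average curve length L(k) is computed over the k
    sub-epochs; the FD is the magnitude of the slope of log L(k) vs log k.
    """
    s = np.asarray(s, dtype=float)
    N = len(s)
    log_k, log_L = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(1, k + 1):
            n_steps = (N - m) // k
            if n_steps < 1:
                continue
            idx = m - 1 + np.arange(n_steps + 1) * k       # s(m), s(m+k), ..., 0-based
            diffs = np.abs(np.diff(s[idx])).sum()
            norm = (N - 1) / (n_steps * k)                  # normalization factor
            lengths.append(diffs * norm / k)                # L_m(k)
        log_k.append(np.log(k))
        log_L.append(np.log(np.mean(lengths)))              # L(k) as the mean of L_m(k)
    slope, _ = np.polyfit(log_k, log_L, 1)
    return -slope                                            # L(k) ~ k**(-D)
```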

2.3 Katz’s method

Consider \(s(i)=(x_i,y_i )\), \(i=1,2,\ldots ,N\), where \(x_i\) are values of the abscissa and \(y_i\) are values of the ordinate. If the points s(i) and s(j) are represented as \((x_i,y_i)\) and \((x_j,y_j)\), respectively, the Euclidean distance between the points is computed as:

$$\begin{aligned} {\text{dist}}(s(i),s(j))=\sqrt{(x_i-x_j)^2+(y_i-y_j)^2}. \end{aligned}$$

According to the Katz’s method, the FD of the curve representing a time series can be defined as [27]:

$$\begin{aligned} D = \frac{\log (L)}{\log (d)}, \end{aligned}$$
(2)

where L is the total length of the curve or the sum of the Euclidean distances between successive points in the same curve, and d is the diameter estimated as

$$\begin{aligned} d = \max ({\text{dist}}(s(i),s(j))),\quad i,j=1,\ldots ,N. \end{aligned}$$

If the curve does not intersect itself, i can be fixed at 1, so that d is estimated as the maximum distance between the first sample and the farthest of all subsequent samples \(s(i), i = 2,\ldots ,N\).

Obviously, d and L must be dimensionless numbers for the logarithms in Eq. (2) to be computed, which is not always the case. Katz [27] proposed normalizing d and L by the length of the average step, defined as \(L/N_l\). In this way, Eq. (2) becomes

$$\begin{aligned} D = \frac{\log (N_l)}{\log (N_l)+\log (d/L)}, \end{aligned}$$
(3)

where \(N_l = N-1\).
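As an illustration, a short sketch of Katz's estimator in Eq. (3) is given below; taking the sample index as the abscissa is our own assumption.

```python
import numpy as np

def katz_fd(y):
    """Hedged sketch of Katz's estimator, Eq. (3):
    D = log(N_l) / (log(N_l) + log(d / L))."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)                 # abscissa taken as sample index
    dx, dy = np.diff(x), np.diff(y)
    L = np.sqrt(dx ** 2 + dy ** 2).sum()               # total curve length
    d = np.sqrt((x - x[0]) ** 2 + (y - y[0]) ** 2)[1:].max()   # planar extent (diameter)
    n_l = len(y) - 1                                   # number of steps
    return np.log(n_l) / (np.log(n_l) + np.log(d / L))
```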

2.4 Sevcik’s method

Let \(y_i, i = 1, \ldots , N\) be a set of values sampled from a signal between time zero and \(t_\mathrm{max}\) with sampling period \(\delta\). Suppose also that the waveform is submitted to a double-linear transformation that maps it into a unit square. Then, the normalized abscissa \(x_i^*\) and the normalized ordinate \(y_i^*\) of the square can be defined, respectively, as [40]

$$\begin{aligned} x_i^*= & {} \frac{x_i}{x_\mathrm{max}},\\ y_i^*= & {} \frac{y_i-y_\mathrm{min}}{y_\mathrm{max}-y_\mathrm{min}}, \end{aligned}$$

where \(x_\mathrm{max}\) (\(y_\mathrm{max}\)) denotes the maximum value of \(x_i\) (\(y_i\)), and \(y_\mathrm{min}\) is the minimum value of \(y_i\). Thus, the FD of the waveform can be approximated by [40]

$$\begin{aligned} D = 1+ \frac{\ln (L)}{\ln (2N_l)} \end{aligned}$$

where \(\ln\) is the natural logarithm, L is the length of the curve in the unit square and \(N_l = N-1\).
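A corresponding sketch of Sevcik's approximation, under the same assumption that the abscissa is the (normalized) sample index, could be:

```python
import numpy as np

def sevcik_fd(y):
    """Hedged sketch of Sevcik's estimator: the waveform is mapped into the
    unit square and D is approximated as 1 + ln(L) / ln(2 * N_l)."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    x_star = np.arange(N) / (N - 1)                            # x_i / x_max
    y_star = (y - y.min()) / (y.max() - y.min() + 1e-12)       # (y_i - y_min) / (y_max - y_min)
    L = np.sqrt(np.diff(x_star) ** 2 + np.diff(y_star) ** 2).sum()   # length in the unit square
    return 1.0 + np.log(L) / np.log(2 * (N - 1))
```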

3 Relevance vector machines and their multiclass versions

As mentioned before, RVM can be regarded as a Bayesian variant of SVM, aimed at overcoming some of the SVM limitations [5, 30, 42]. In this section, we present the basic formulation underlying standard RVM classifiers and also the recently proposed multiclass versions [18, 36].

3.1 Relevance vector machines

The standard formulation of the RVM assumes, for a given input \(\varvec{x}_n\), that the error between the classifier output, given by \(f(\varvec{x}_n;\varvec{w})\), and the desired output \(t_n\), where \(t_n \in \left\{ 0,1\right\}\), has a normal distribution with zero mean and variance \(\sigma ^2\). It also assumes that the samples \(\{\varvec{x}_i, t_i \}^N_{i=1}\) are independently generated, so that the likelihood of the observed dataset can be written as [42]:

$$\begin{aligned} p(\varvec{t} | \varvec{w},\sigma ^2 )=(2 \pi \sigma ^2)^{-N/2}\exp \left\{ -\frac{1}{2 \sigma ^2} || \varvec{t} - \varvec{\varPhi }\varvec{w}||^2 \right\} , \end{aligned}$$

where \(\varvec{t}=[t_1,\ldots ,t_N]^T\), \(\varvec{w}=[w_0,\ldots ,w_N]^T\), and \(\varvec{\varPhi } = [\varvec{\phi }(\varvec{x}_1), \ldots , \varvec{\phi }(\varvec{x}_N)]^T\), with \(\varvec{\phi }(\varvec{x}_i)=[1, K(\varvec{x}_i,\varvec{x}_1), \ldots , K(\varvec{x}_i,\varvec{x}_N)]^T\). The function \(K(\cdot ,\cdot )\) denotes a kernel function defined on a (high-dimensional) dot product space [39], whereas the final decision function is given by \(f(\varvec{x}_n;\varvec{w}) = w_0 + \sum _{i=1}^N w_iK(\varvec{x}_n,\varvec{x}_i)\).

The RVM places an a priori probability over the model parameters (weights) controlled by a set of hyper-parameters. Each weight becomes associated with a hyper-parameter, and the most likely values for the weights are estimated iteratively from the training data [42]. In a Bayesian perspective, the model parameters \(\varvec{w}\) and \(\sigma ^2\) can be estimated initially from an a priori distribution and then reestimated by calculating a posterior distribution using the observed data likelihood. Tipping [42] proposed the following a priori distribution for each model parameter:

$$\begin{aligned} p(w_j | \alpha _j ,\sigma ^2 )= \sqrt{\frac{\alpha _j}{2 \pi }} \exp \left\{ - \frac{\alpha _j w^2_j}{2} \right\} = {\mathcal {N}}(0,\alpha _j^{-1}), \end{aligned}$$

where \(j=0,\ldots ,N\) and \(\varvec{\alpha }=[\alpha _0,\ldots ,\alpha _N]^T\) is the hyper-parameter vector, which is estimated iteratively from the training data.

Given an a priori distribution, the Bayes rule can be used to determine the posterior distribution of the model parameters through \(p(\varvec{w},\varvec{\alpha },\sigma ^2| \varvec{t}) = p(\varvec{w} | \varvec{t}, \varvec{\alpha },\sigma ^2)p(\varvec{\alpha },\sigma ^2 | \varvec{t})\).

Moreover, for a new sample \(\varvec{x}_n\), the prediction of the corresponding label \(t_n\) can be provided by

$$\begin{aligned} p(t_n | \varvec{t}) = \int p(t_n | \varvec{w}, \varvec{\alpha },\sigma ^2)p(\varvec{w},\varvec{\alpha },\sigma ^2 | \varvec{t})d\varvec{w} d\varvec{\alpha } d\sigma ^2. \end{aligned}$$

However, an analytical expression for the posterior distribution of the model parameters is still not available. In order to solve this problem, it is necessary to adopt an effective approximation. The posterior distribution of the parameters can be decomposed into two components according to

$$\begin{aligned} p(\varvec{w},\varvec{\alpha },\sigma ^2 | \varvec{t}) = p(\varvec{w} | \varvec{t}, \varvec{\alpha },\sigma ^2)p(\varvec{\alpha },\sigma ^2 | \varvec{t}) . \end{aligned}$$
(4)

The first term of the right-hand side of Eq. (4) is the posterior probability of the weights \(\varvec{w}\) given \(\sigma ^2\) and \(\varvec{\alpha }\). The computation of these probabilities is well detailed in [42].

Once the weights have been obtained, the hyper-parameters \(\alpha _i\) are updated according to \(\alpha _i = \frac{\lambda _i}{w^2_i}\), where \(w^2_i\) is the square of the posterior mean of the ith weight, \(\lambda _i\) is defined as \(\lambda _i = 1 - \alpha _i \varSigma _{ii}\), and \(\varSigma _{ii}\) is the ith element of the main diagonal of the posterior covariance matrix, which may be interpreted as a measure of how well the parameter \(w_i\) is determined by the data. The optimization of the hyper-parameters continues until a pre-defined threshold is achieved or until a certain number of iterations has been performed.
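To make the re-estimation loop concrete, the sketch below assumes the Gaussian-likelihood model stated above and treats the binary targets as real-valued outputs, a common simplification; the full classification case replaces the closed-form posterior with a Laplace approximation [42]. The variable names and the pruning threshold are our own choices.

```python
import numpy as np

def rvm_reestimate(Phi, t, n_iter=100, sigma2=0.1, alpha_cap=1e9):
    """Hedged sketch of the iterative hyper-parameter re-estimation.

    Phi: (n_samples, n_basis) design matrix; t: target vector.
    Returns the posterior mean weights, the hyper-parameters, and a mask
    of the basis functions retained as relevance vectors.
    """
    n_basis = Phi.shape[1]
    alpha = np.ones(n_basis)
    for _ in range(n_iter):
        # Posterior over the weights given the current hyper-parameters.
        Sigma = np.linalg.inv(np.diag(alpha) + Phi.T @ Phi / sigma2)
        mu = Sigma @ Phi.T @ t / sigma2
        # lambda_i measures how well each weight is determined by the data.
        lam = 1.0 - alpha * np.diag(Sigma)
        alpha = lam / (mu ** 2 + 1e-12)
        alpha = np.minimum(alpha, alpha_cap)   # alphas driven toward "infinity" prune basis functions
    relevant = alpha < alpha_cap
    return mu, alpha, relevant
```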

Sparsity emerges when most of the \(\alpha _i\) go to infinity, effectively removing the corresponding basis functions; the remaining basis functions are called the relevance vectors (RV) [42]. For large-scale problems, the number of RV can still be high and the testing complexity may become prohibitive, namely \(O(N_\mathrm{ts}N_\mathrm{RV})\), where \(N_\mathrm{ts}\) is the number of samples in the test set and \(N_\mathrm{RV}\) is the number of relevance vectors.

Standard RVM models can handle classification problems with multiple classes by decomposing the problem into several binary classification tasks, each solved by a separate RVM model. The simplest scheme, known as the one-versus-one approach, decomposes a problem with C classes into \(\frac{C(C-1)}{2}\) binary problems. A binary classifier is built to discriminate between each pair of classes, discarding the samples of the remaining classes. When a new sample is tested, a vote is taken among the classifiers and the class receiving the most votes is deemed the outcome (see the sketch below).
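A minimal sketch of this voting scheme is given below; the `binary_models` mapping and its `predict` interface are hypothetical and stand for any fitted binary RVM implementation.

```python
import numpy as np
from itertools import combinations

def ovo_predict(binary_models, X, n_classes):
    """Hedged sketch of one-versus-one voting with C(C-1)/2 binary classifiers.

    binary_models: dict mapping a class pair (a, b) to a fitted binary
    classifier whose .predict(X) returns 0 (class a) or 1 (class b).
    """
    X = np.asarray(X)
    votes = np.zeros((len(X), n_classes), dtype=int)
    for (a, b) in combinations(range(n_classes), 2):
        pred = np.asarray(binary_models[(a, b)].predict(X))
        winners = np.where(pred == 0, a, b)          # 0 -> class a, 1 -> class b
        votes[np.arange(len(X)), winners] += 1
    return votes.argmax(axis=1)                      # class with most votes wins
```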

3.2 Multiclass relevance vector machines

Two different types of mRVM were proposed in [18, 36], namely the constructive type (referred to as mRVM1) and the top-down type (mRVM2). The idea behind both is not to train multiple RVM classifiers but to train a single model that deals directly with multiclass problems. While mRVM1 achieves sparsity by starting with an empty model and adding training samples based on their contribution to the model, mRVM2 follows a top-down strategy, loading the whole training kernel into memory and iteratively removing non-relevant samples.

The training phase of mRVM2 is similar to that of mRVM1, both being based on the expectation maximization (EM) algorithm. The main difference is that mRVM2 does not adopt the marginal likelihood maximization as mRVM1 does [see Eq. (5)] but rather employs an extra E-step for the updates of the hyper-parameters [18]. Moreover, mRVM2 is relatively more expensive than mRVM1 because each sample i has a different scale \(\alpha _{ic}\) for each class. However, once mRVM2 prunes a sample, that sample cannot be reintroduced into the model. In what follows, we present the main equations underlying the formulation of mRVM1. The reader is referred to [36] for more detailed explanations.

Consider a training set \(\left\{ \varvec{x}_n,t_n\right\} _{n=1}^N\), where \(\varvec{x}_n \in {\mathfrak{R}}^m\) and \(t_n \in \left\{ 1,\ldots ,C\right\}\). Let \(\varvec{k}_n\) be the nth row of the kernel matrix \(\varvec{K}\) (\(\varvec{K} \in {\mathfrak{R}}^{N \times N}\)), expressing how the nth sample correlates with the others from the training set. The learning process involves the inference of the model parameters \(\varvec{W} \in {\mathfrak{R}}^{N\times C}\) in such a way that the quantity \(\varvec{W}^T\varvec{K}\) acts as a sort of voting system expressing which data relationships are important to capture for increasing the model’s discriminative properties.

Moreover, let \(\varvec{Y} = \left\{ y_{11},\ldots ,y_{1N};\ldots ;y_{c1},\ldots ,y_{cN};\ldots ;y_{C1},\ldots ,y_{CN}\right\} \in {\mathfrak{R}}^{C \times N}\) denote a matrix of auxiliary variables introduced for the purpose of multiple class discrimination, acting as targets for \(\varvec{W}^T\varvec{K}\). The variables \(y_{cn}\) are assumed to obey a standardized noise model, i.e., \(y_{cn} | \varvec{w}_c\), \(\varvec{k}_n \sim {\mathcal {N}}_{y_{cn}}(\varvec{w}_c^T\varvec{k}_n,1)\), whereas the model parameters \(w_{nc}\) follow a standard zero-mean Gaussian distribution, namely \(w_{nc} \sim {\mathcal {N}}(0,1/ \alpha _{nc})\), where \(\alpha _{nc}\) belongs to the scaling matrix \(\varvec{A} = (\varvec{\alpha }_1, \ldots , \varvec{\alpha }_N)^T \in {\mathfrak{R}}^{N\times C}\).

The formulation of mRVM1 adopts as objective the maximization of the marginal likelihood \(p(\varvec{Y} | \varvec{K}, \varvec{A} ) = \int p(\varvec{Y} | \varvec{K}, \varvec{W})p(\varvec{W} | \varvec{A}) d\varvec{W}\). In order to differentiate this likelihood, Psorakis et al. [36] followed the assumption that each sample n has a common scale \(\alpha _n\) shared across all classes. So, for mRVM1, the vector of hyper-parameters \(\varvec{\alpha }_n\) associated with a sample turns out to be a simple scalar \(\alpha _n\). The maximization of the marginal likelihood results in a criterion to either add a sample n, delete it, or update its associated \(\alpha _n\). So, the model can start with a single sample and then proceed in a constructive manner.

In order to achieve this goal, the log of the marginal likelihood is decomposed into contributing terms based on each sample, that is,

$$\begin{aligned} {\mathfrak{L}}(\varvec{A})= & {} \log p(\varvec{Y} | \varvec{K}, \varvec{A})\nonumber \\= & {} \sum _{c=1}^C -\frac{1}{2} \left[ N \log 2 \pi + \log |\varvec{{\mathcal {C}}}| + \varvec{y}_c^T \varvec{{\mathcal {C}}}^{-1}\varvec{y}_c\right] , \end{aligned}$$
(5)

where \(\varvec{{\mathcal {C}}} = \varvec{I} + \varvec{K}\varvec{A}^{-1}\varvec{K}^T\), whose determinant and inverse were derived by Tipping and Faul [43] as a function of \(\varvec{{\mathcal {C}}}_{-i}\), that is, the value of \(\varvec{{\mathcal {C}}}\) with the ith sample removed. The determinant of \(\varvec{{\mathcal {C}}}\) is given by

$$\begin{aligned} | \varvec{{\mathcal {C}}} | = |\varvec{{\mathcal {C}}}_{-i}| | 1 + \alpha _i^{-1}\varvec{k}_i^T \varvec{{\mathcal {C}}}_{-i}^{-1} \varvec{k}_i |, \end{aligned}$$

whereas the inverse of \(\varvec{{\mathcal {C}}}\) is given by

$$\begin{aligned} \varvec{{\mathcal {C}}}^{-1} = \varvec{{\mathcal {C}}}_{-i}^{-1} - \frac{\varvec{{\mathcal {C}}}_{-i}^{-1} \varvec{k}_i \varvec{k}_i^T \varvec{{\mathcal {C}}}_{-i}^{-1}}{\alpha _i + \varvec{k}_i^T \varvec{{\mathcal {C}}}_{-i}^{-1} \varvec{k}_i}. \end{aligned}$$
(6)

Equipped with these results, Eq. (5) can be rewritten as:

$$\begin{aligned} {\mathfrak{L}}(\varvec{A}) = {\mathfrak{L}}(\varvec{A}_{-i}) + \sum _{c=1}^C -\frac{1}{2} \left[ \log \alpha _{i} - \log (\alpha _i + s_i) + \frac{q^2_{ci}}{\alpha _i + s_i}\right] , \end{aligned}$$

where \(s_i\) and \(q_{ci}\) are called the sparsity factor and the quality factor, respectively, defined as \(s_i = \varvec{k}^T_i \varvec{{\mathcal {C}}}_{-i}^{-1} \varvec{k}_i\) and \(q_{ci} = \varvec{k}^T_i \varvec{{\mathcal {C}}}_{-i}^{-1} \varvec{y}_c\). The sparsity factor can be seen as a measure of how much of the descriptive information of the ith sample is already captured by the existing samples, whereas the quality factor measures how well the ith sample helps to describe a specific class [36].

By setting the derivative \(\partial {\mathfrak{L}}(\varvec{A})/ \partial \alpha _i = 0\), one obtains

$$\begin{aligned} \alpha _i&= \frac{Cs^2_i}{\sum ^C_{c=1}q_{ci}^2-Cs_i}, \quad \text{ if } \sum \nolimits ^C_{c=1}q_{ci}^2>Cs_i \end{aligned}$$
(7a)
$$\begin{aligned} \alpha _i&= \infty , \quad \text{ if } \sum \nolimits ^C_{c=1}q_{ci}^2\le Cs_i. \end{aligned}$$
(7b)

The quantity \(\theta _i = \sum ^C_{c=1}q_{ci}^2-Cs_i\) captures the contribution of the ith sample to the marginal likelihood in terms of how much additional descriptive information it provides to the model. By resorting to this quantity, it is possible to establish rules for including or excluding a given sample, or updating its hyper-parameter (a code sketch of these rules follows the list) [18]:

  • IF \(\theta _i>0\) and \(\alpha _i<\infty\) THEN set/update \(\alpha _i\) with (7a);

  • IF \(\theta _i\le 0\) and \(\alpha _i<\infty\) THEN set \(\alpha _i\) with (7b).
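The following sketch merely restates these rules in code form, assuming that \(s_i\) and the vector of \(q_{ci}\) values have already been computed:

```python
import numpy as np

def update_alpha_i(s_i, q_ci):
    """Hedged sketch of the inclusion/exclusion rule above (Eqs. 7a/7b)."""
    q_ci = np.asarray(q_ci, dtype=float)
    C = len(q_ci)
    quality = np.sum(q_ci ** 2)
    if quality > C * s_i:                              # Eq. (7a): keep/update the sample
        return C * s_i ** 2 / (quality - C * s_i)
    return np.inf                                      # Eq. (7b): prune the sample
```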

Then, the M-step and E-step of EM are used to estimate \(\varvec{W}\) and the posterior expectations of the auxiliary variables \(\varvec{Y}\), respectively. The weights are estimated as:

$$\begin{aligned} \hat{\varvec{w}}_c = (\varvec{K}\varvec{K}^T + \varvec{A}_c)^{-1}\varvec{K}\tilde{\varvec{y}}_c^T. \end{aligned}$$

Assuming a given class i, the E-step calculates the expected value of \(y_{in}\) as

$$\begin{aligned} \tilde{y}_{in} = \hat{\varvec{w}}_i^T \varvec{k}_n - \sum _{j\ne i} \left( \tilde{y}_{jn} - \hat{\varvec{w}}_j^T \varvec{k}_n \right) , \end{aligned}$$

whereas \(\forall c \ne i\), the E-step yields

$$\begin{aligned} \tilde{y}_{cn} \leftarrow \hat{\varvec{w}}_c^T\varvec{k}_n - \frac{{\mathcal {E}}_{p(u)}\left\{ {\mathcal {N}}_u(\hat{\varvec{w}}_c^T\varvec{k}_n - \hat{\varvec{w}}_i^T\varvec{k}_n,1)\varPhi _u^{n,i,c}\right\} }{{\mathcal {E}}_{p(u)}\left\{ \varvec{\varPhi }_u(u+\hat{\varvec{w}}_i^T\varvec{k}_n - \hat{\varvec{w}}_c^T\varvec{k}_n)\varPhi _u^{n,i,c}\right\} }, \end{aligned}$$

where \(u \sim {\mathcal {N}}(0,1)\) and \(\varvec{\varPhi }\) denotes the Gaussian cumulative distribution function.

In the classification phase, the test sample \(\varvec{x}_n\) is assigned to the class i whose auxiliary variable \(y_{in}\), \(1 \le i \le C\), is maximum, i.e., \(t_n = \arg \max _i (y_{in})\).

4 Computational experiments

In what follows, we provide details about the dataset used in the experiments and how the experiments were set up. We then present the accuracy results achieved by the RVM and mRVM models, considering the different methods for calculating the fractal dimension. For each model, we also report the optimized kernel parameter value and the associated number of relevance vectors, so as to measure the complexity of the induced models. In this paper, the one-versus-one approach was adopted when using the standard RVM.

4.1 Description of the dataset

The EMG signal dataset used in our experiments was originally collected by Chan and collaborators [6, 17]. The authors used eight channels of surface EMG to collect signals from the right arm of 30 normally limbed subjects. Each subject underwent four sessions, with one to two days of separation between sessions. Each session consisted of six trials. EMG signals were collected from seven sites on the forearm and one site on the biceps. An electrode was placed on the wrist to provide a common ground reference. These signals were amplified with a gain of 1000 and a bandwidth of 1 Hz to 1 kHz. Signals were sampled at 3 kHz using an analog-to-digital converter.

Seven distinct limb motions (classes) were performed: hand open, hand close, supination, pronation, wrist flexion, wrist extension, and rest. In each trial, the subject repeated each limb motion four times, holding each motion for 3 s each time. The order of these limb motions was randomized. Chan and Green [6] used only session four in their experiments: data from the first two trials were used as training data, and data from the remaining four trials were used as testing data. In this paper, we also make use of data from session four, but the investigated models were assessed separately on each trial using \(5\times 2\) cross-validation.

4.2 Experimental setup

The main purpose of this paper is to empirically assess the performance of RVM models in the task of EMG signal classification. In the experiments, we considered only the radial basis function kernel [39], which has an associated hyper-parameter, the radius \(\sigma\), to be calibrated beforehand. Although several heuristics exist for selecting hyper-parameter values, we opted to choose \(\sigma\) from the set \(\{2^i, i = -3, -2, -1, 0, 1, 2, 3, 4, 5\}\). For each of the nine values in this set, a \(5\times 2\)-fold cross-validation run per trial was performed in order to measure the average performance of the methods.

In what concerns data preprocessing, samples were extracted from the EMG signals using a sliding window of 256 ms in length, spaced 32 ms apart [15]. Then, the FD values, calculated by the different methods described in Sect. 2, were used to build the feature vectors. Each transformed sample (i.e., feature vector) comprised eight features, since there were eight channels and one FD value was calculated per channel. The class distribution for each trial is presented in Table 1.
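A sketch of this windowing scheme is shown below, assuming the raw recording is available as an array with one column per channel and converting the window and step lengths into samples at the 3 kHz sampling rate; the function name and interface are our own.

```python
import numpy as np

def extract_fd_features(emg, fd_fun, fs=3000, win_ms=256, step_ms=32):
    """Hedged sketch of the sliding-window FD feature extraction.

    emg:    (n_samples, 8) array holding the eight EMG channels.
    fd_fun: any FD estimator taking a 1D segment, e.g., the katz_fd sketch
            from Sect. 2.3.
    Each window is mapped onto an 8-dimensional feature vector.
    """
    win = int(fs * win_ms / 1000)      # 768 samples at 3 kHz
    step = int(fs * step_ms / 1000)    # 96 samples at 3 kHz
    features = []
    for start in range(0, emg.shape[0] - win + 1, step):
        segment = emg[start:start + win, :]
        features.append([fd_fun(segment[:, ch]) for ch in range(segment.shape[1])])
    return np.asarray(features)

# Usage (hypothetical): feats = extract_fd_features(raw_emg, fd_fun=katz_fd)
```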

Table 1 Class distribution for each trial
Table 2 Best results in terms of cross-validation error achieved for each feature type and kernel machine

4.3 Simulation results

In Table 2, we report the best accuracy results achieved by the different kernel machines, including SVM, considering the four types of FD features. The results are given in terms of the average and standard deviation of the generalization error computed over the \(5\times 2\)-fold cross-validation process. In this table, we highlight the best calibrated kernel parameter value for each kernel machine and also present the number of relevance vectors or support vectors associated with each model. The accuracy results are complemented by those reported in Table 3, which relates to the application of the two-sided Wilcoxon rank sum test to the cross-validation errors [11]. The Wilcoxon rank sum test is a nonparametric statistical procedure that helps answer the following question: do two independent samples, say \(\mathbf {x}\) and \(\mathbf {y}\), represent two different populations? The null hypothesis is that the data in \(\mathbf {x}\) and \(\mathbf {y}\) are samples from continuous distributions with equal medians. Assuming a 5% significance level, a p-value lower than 0.05 indicates that the test rejects the null hypothesis, and thus that the difference in performance between the given kernel machines is statistically significant [20]. In our case, the test is applied per trial, and one of the samples always corresponds to the kernel machine with the lowest average cross-validation error for the given trial.
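For reference, a minimal sketch of how such a per-trial comparison could be carried out with SciPy is given below; the `cv_errors` structure is hypothetical and merely stands for the ten cross-validation error estimates collected for each kernel machine in one trial.

```python
from scipy.stats import ranksums

def compare_to_best(cv_errors, significance=0.05):
    """Hedged sketch: two-sided Wilcoxon rank sum test of every machine
    against the one with the lowest average cross-validation error.

    cv_errors: dict mapping a machine name to its list of 5x2 CV errors.
    """
    best = min(cv_errors, key=lambda m: sum(cv_errors[m]) / len(cv_errors[m]))
    results = {}
    for machine, errs in cv_errors.items():
        if machine == best:
            continue
        _, p = ranksums(cv_errors[best], errs)          # two-sided rank sum test
        results[machine] = (p, p < significance)        # significant if p < 0.05
    return best, results
```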

On the other hand, Tables 4 and 5 show the specificity and sensitivity values delivered by the best calibrated kernel machines, as reported in Table 2, for each combination of FD method and trial. Each of the last 14 columns in these tables refers to either a specificity or a sensitivity result for a certain class. Sensitivity (also called the true positive rate) measures the proportion of actual positives of a class which are correctly identified as such, whereas specificity (aka the true negative rate) measures the proportion of negatives of a class which are correctly identified as such.

The features were normalized to zero mean and unit standard deviation. Since the accuracy results produced with the feature values extracted by Katz's method were significantly better than those obtained with the other FD methods, we decided to inspect in more detail the effect of the kernel parameter calibration for the cases where Katz's method was employed. Thus, Figs. 1, 2, 3 and 4 show how the accuracy rate (i.e., one minus the error rate) obtained by the different kernel machines varied as a function of the kernel parameter value. Figures 5, 6, 7, and 8 do the same for the sensitivity. The choice of trial #1 was arbitrary, since the purpose here is only to contrast the profiles produced by the different machines. The bars in Figs. 1, 2, 3, and 4 represent the variance in accuracy rate per class (one standard deviation from the mean) for each value of \(\sigma\) considered.

Finally, in Tables 6 and 7, we provide the average processing time elapsed during the training and testing phases for each combination of classifier model, fractal dimension estimation method, and experimental trial.

4.4 Discussion

From the results presented in Tables 2 and 3, it is possible to conclude that, in general, the accuracy rates displayed by SVM and RVM were rather similar to each other, prevailing in the majority of the cases over those produced by mRVM2. The performance of mRVM1, on the other hand, varies with the feature extractor adopted. Considering specifically the BC and Sevcik's methods, SVM and RVM usually outperformed the others, as evidenced by the low p-values associated with mRVM1 and mRVM2. For Higuchi's method, SVM performed consistently better than mRVM1 and mRVM2, but was comparable to RVM in most cases. Irrespective of the type of kernel machine, the accuracy rates obtained with the aforementioned FD methods were significantly worse than those achieved with Katz's method. For this feature type, the performance levels delivered by SVM, RVM, and mRVM1 were rather comparable, since the null hypothesis could not be rejected in five out of six trials. In half of the trials, mRVM1 provided the best average results, whereas in all cases mRVM2 was outperformed by the best kernel machine. It is also worth mentioning that the standard deviations of the error rates obtained with Katz's method were usually smaller for all machines, evidencing the robustness of the induced models to the variability of training/test data in the cross-validation process.

In what concerns the efficiency of FD-based RVM and its variations in terms of computational time, the results shown in Tables 6 and 7 reveal that the training of these models is usually more expensive than that of SVM. In the testing phase, however, the average time dropped from 2 s for SVM to circa 1.5 s for RVM and to about 0.4 s for mRVM1 and mRVM2. This suggests that the FD-based multiclass RVM can yield sparser solutions, which means a better data reduction ability. In any case, regardless of the FD estimation technique used, the time taken to obtain the final classification outputs from the induced RVM models is usually small, which supports their practical deployment in real-world settings.

Table 3 Results of the Wilcoxon rank sum test over the cross-validation errors
Fig. 1
figure 1

Variation of accuracy rate per class as a function of the kernel parameter value for SVM using trial #1 and feature vector extracted through the Katz’s method

Fig. 2
figure 2

Variation of accuracy rate per class as a function of the kernel parameter value for RVM using trial #1 and feature vector extracted through the Katz’s method

Fig. 3
figure 3

Variation of accuracy rate per class as a function of the kernel parameter value for mRVM1 using trial #1 and feature vector extracted through the Katz’s method

Fig. 4
figure 4

Variation of accuracy rate per class as a function of the kernel parameter value for mRVM2 using trial #1 and feature vector extracted through the Katz’s method

Fig. 5
figure 5

Variation of sensitivity per class as a function of the kernel parameter for SVM using trial #1 and feature vector extracted through the Katz’s method

Fig. 6
figure 6

Variation of sensitivity per class as a function of the kernel parameter for RVM using trial #1 and feature vector extracted through the Katz’s method

Fig. 7
figure 7

Variation of sensitivity per class as a function of the kernel parameter for mRVM1 using trial #1 and feature vector extracted through the Katz’s method

Fig. 8
figure 8

Variation of sensitivity per class as a function of the kernel parameter for mRVM2 using trial #1 and feature vector extracted through the Katz’s method

Table 4 Best specificity (Spec) results achieved by models for each class and FD method using trial #1
Table 5 Best sensitivity (Sens) results achieved by models for each class and FD method using trial #1
Table 6 Average CPU time (in seconds) spent in the training phase for each combination of FD estimation method, classifier type, and experimental trial
Table 7 Average CPU time (in seconds) spent in the test phase for each combination of FD estimation method, classifier type, and experimental trial

By looking at the values shown in Tables 4 and 5, one can perceive that the use of Katz's method as feature extractor endowed all classifiers with a good balance between specificity and sensitivity across the classes. In fact, for all FD methods but Katz's, the specificity values were usually significantly lower than the sensitivity values. Besides, as evidenced in Figs. 5, 6, 7, and 8, very high sensitivity values could be obtained for all seven classes, irrespective of the value used for the kernel parameter. This behavior could not be reproduced by the other FD methods.

The choice of the kernel parameter value was not a crucial factor in distinguishing between the overall best error rates exhibited by the models, even though, for each kernel machine, some values of \(\sigma\) appear more frequently in Table 2, such as \(\sigma =8\) for SVM and \(\sigma =\{2,4\}\) for RVM. As depicted in Figs. 1, 2, 3, and 4, there is usually a range of values for the kernel parameter yielding quite interesting results, although no single value yields 100% correct classification for all classes. Interestingly, the best value of \(\sigma\) for the combination of mRVM1 and Katz's method was always the same, namely \(\sigma =32\), the highest value of the studied range. Higher values of this parameter might therefore yield even better results for mRVM1. In terms of stability, RVM models were usually more robust to the choice of \(\sigma\), considering the mean accuracy over all classes.

Finally, in what regards the complexity of the induced models, the number of support vectors and relevance vectors of the best calibrated SVM and RVM models was usually significantly higher than the number of RV associated with the mRVM models (refer to Table 2). An exception occurs for the combination of mRVM1 and Katz's method, in which case the number of RV was much higher than those obtained with the other methods for calculating the FD. The models induced by mRVM2, on the other hand, were always the least complex ones, regardless of the FD method. So, when the sparsity of the induced model is a key aspect to take into account, the use of mRVM2 is highly recommended.

5 Concluding remarks

In this paper, we investigated the potential of using relevance vector machines (both in the standard and multiclass formulations) to cope with the task of EMG signal classification. In this study, we considered different methods for calculating the fractal dimension of 1D signals as feature extractors.

Through experiments conducted on a publicly available dataset involving different types of limb movements (seven classes in total), we empirically confirmed that kernel machines equipped with FD feature values can achieve good levels of classification performance. In particular, the combination of SVM, RVM, and mRVM1 with Katz's method was the best across the different experimental trials in terms of accuracy and generalization. In what concerns model complexity, however, mRVM2 consistently produced sparser models, implying higher efficiency when classifying large batches of novel samples.

As ongoing work, we are currently extending the scope of investigation by considering other nonlinear dynamics methods to extract the hidden information in EMG signals, such as the Lyapunov exponent and the Hurst exponent [2]. As future work, we plan to investigate the impact of using EMG sub-segments of different sizes and of applying different feature selection methods, since feature selection is a preprocessing step that can bring gains in classifier accuracy [23, 45]. Finally, the combination of different kernel machines in heterogeneous committee machines will also be researched in the context of EMG signal classification.