1 Introduction

Transcranial magnetic stimulation (TMS) is a non-invasive stimulation technique that delivers a brief (< 1 ms), strong magnetic pulse of about 2 T to depolarize neurons in the cerebral cortex [1]. Motor-evoked potentials (MEPs), a biphasic activity detected by electromyography, can be recorded after TMS of the primary motor cortex to assess some properties of the central nervous system [2]. For example, the time of central and spinal conduction can be approximated by the latency of the MEP, which is defined as the time between the onset of a TMS pulse and the appearance of the MEP in peripheral recordings [1]. MEP amplitude, measured as the peak-to-peak difference between the positive and negative deflections of the MEP trace, can be used to assess cortical excitability and inhibition and to map the somatosensory cortex [3].

One application of TMS lies within the field of stroke research. Stroke is one of the leading causes of disability worldwide and results from the rupture (hemorrhagic) or obstruction (ischemic) of blood vessels within the brain [4]. One of the most incapacitating sequelae of stroke is hemiparesis, a partial paralysis of the side of the body contralateral to the stroke lesion [5]. As reviewed by McDonnel et al., who identified 844 studies reporting TMS measurements of the motor cortex in adult stroke patients, TMS-elicited MEPs are of significant importance in stroke-related research [6]. In particular, MEP features have been used in studies that aim to describe the effects of interventions during upper limb motor rehabilitation after stroke. This is important because the estimated corticospinal tract excitability of stroke patients may be used as a biomarker of upper limb recovery [7].

Although the extraction of MEP features from electromyography (EMG) is useful for stroke-related research, MEPs may not be elicited in every recorded trial of a TMS study. This is especially true in the affected hemisphere (AH) of stroke patients, where cortical excitability is lower than in the unaffected hemisphere (UH) because of the neurological lesion, which increases the variability of the extracted features [8]. To eliminate trials without MEPs, or with a poor signal-to-noise ratio, several studies have relied on manual visual inspection of the EMG trace from each trial, either during MEP acquisition or in subsequent offline processing [9,10,11,12,13,14,15,16]. Furthermore, although the International Federation of Clinical Neurophysiology (IFCN) has issued guidelines on MEP acquisition and signal processing in TMS studies, the methodology employed for MEP signal processing is heterogeneous throughout the literature [3]. To the authors’ knowledge, only Rábago et al. have proposed a methodology for the automatic extraction of MEP features [17]. However, although their algorithm showed good reliability, it cannot automatically select EMG trials with a good signal-to-noise ratio [17].

To provide a new methodology for the automatic identification of reliable MEP trials in EMG recordings, and their feature extraction, this work presents a computational tool for the analysis of TMS-elicited MEP parameters in stroke patients. The proposed methodology, based on a frequency-domain analysis of MEP traces across time windows, was tested with a database of stroke patients and compared to the manual inspection performed by three experts.

2 Methods

2.1 Stroke patients

Data from 10 stroke patients were included in the present study. These were acquired as part of a previous, larger study that tested the effects of a brain-computer interface (BCI) for stroke rehabilitation [18], performed under the approval of the Research and Ethics Committees of the institution where this study was performed (approval number 36/15). All patients read and signed a written informed consent form. All patients were diagnosed with ischemic stroke, confirmed through neuroimaging by a neurologist. Patients were in the subacute phase, with 4.6 ± 2.7 months from stroke onset to inclusion in the study; all were right-handed before the stroke and had no previous history of psychiatric or neurological disorders. The patients’ demographic and clinical information can be found in Table 1.

Table 1 Clinical and demographic information of stroke patients included in the present study

2.2 TMS procedure

Each patient participated in three TMS evaluations (S1, S2, and S3). S1 was performed at the beginning of their participation in the study; S2, 30 days after the patients underwent an upper limb focused intervention; and S3, after another 30 days of therapy. In each session, a Rapid2 (The Magstim Company Ltd., Whitland, UK) stimulator with a figure-of-eight coil was used to deliver single TMS pulses to both cerebral hemispheres (AH and UH) of the subjects. During sessions, all patients were seated in a comfortable armchair. The MEP hotspot was located in each hemisphere by stereotaxic neuronavigation with a Visor2 system (eemagine Medical Imaging Solutions GmbH, Berlin, Germany) and a 3D template constructed from the brain’s coronal plane with magnetic resonance imaging (MRI) data of each patient. The coil was placed on the primary motor cortex (M1) contralateral to the examined limb. For stimulation, the coil was oriented 45° towards the contralateral forehead so that the current flowed perpendicular to the central sulcus. Patients’ resting-state motor threshold (RMT) was approximated using the relative frequency method [19]. Beginning with a stimulus intensity of 35% of the maximal stimulator output (MSO), and with the coil placed over the hotspot, power was gradually increased in steps of 5% of the MSO until TMS consistently elicited MEPs with peak-to-peak amplitudes higher than 50 μV. Thereafter, stimulus strength was gradually lowered in steps of 1% of the MSO until fewer than 10 out of 20 trials showed a visible MEP [19]. After the hotspot and the RMT were assessed, 30 MEP trials were acquired. This procedure was performed first for the UH and then for the AH.
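
The threshold search described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the software used in the study: `count_meps` is a hypothetical callback that delivers `n_trials` pulses at a given intensity and returns how many of them produced an MEP above 50 μV, "consistently" is assumed to mean a response in every trial, and the RMT is assumed to be the lowest intensity that still yields a response in at least half of the trials.

```python
def estimate_rmt(count_meps, start=35, up_step=5, down_step=1,
                 n_trials=20, max_mso=100):
    """Sketch of a relative-frequency RMT search (intensities in % of MSO).

    count_meps(intensity, n_trials) is a hypothetical callback returning the
    number of trials, out of n_trials, with an MEP above 50 uV.
    """
    intensity = start

    # Ascending phase: raise the intensity in 5% steps until MEPs are
    # elicited consistently (assumption: in every trial).
    while count_meps(intensity, n_trials) < n_trials:
        if intensity + up_step > max_mso:
            return None  # no consistent response within the stimulator range
        intensity += up_step

    # Descending phase: lower the intensity in 1% steps until fewer than
    # half of the trials (10 out of 20) show an MEP.
    rmt = intensity
    while intensity - down_step > 0:
        intensity -= down_step
        if count_meps(intensity, n_trials) < n_trials / 2:
            break
        rmt = intensity  # assumption: RMT = last intensity meeting the criterion
    return rmt


def simulated_subject(intensity, n_trials):
    """Deterministic stand-in for a real recording session (illustration only)."""
    if intensity >= 47:
        return n_trials
    return 12 if intensity >= 43 else 4


rmt = estimate_rmt(simulated_subject)
```

In a real session the callback would wrap the stimulator and the EMG amplifier; the deterministic stand-in above exists only so that the control flow can be followed.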

2.3 EMG recordings

Electromyographic activity was recorded with the Rapid2 8-bit MEP Pod amplifier (The Magstim Company Ltd., Whitland, UK). Two passive surface electrodes were placed over the first dorsal interosseous (FDI) muscle for bipolar recording, with a ground electrode located above the olecranon of each arm. A bandpass filter of 20 Hz to 10 kHz and a sampling frequency of 1500 Hz were used to record trials lasting 200 ms after the TMS pulse. Each trial was exported to a CSV file after acquisition.

2.4 MEP processing algorithm

The algorithm was designed to compute MEP amplitude and latency and to select trials for a session of 30 trials per hemisphere. The method is described as follows:

1) An 8th order IIR Butterworth forward-backward filter was applied for notch filtering of the 60 Hz noise.

2) A portion of the filtered signal was extracted from 4.6 to 50 ms of the total trial duration of 200 ms, as this is usually the time interval where MEPs occur [20]. This fragment was segmented into sliding windows of 10.7 ms, since this was the average duration of a previously analyzed subsample of stroke patients’ MEPs [21], with an overlap of 10 ms. Afterward, the power spectrum of each window, computed using the discrete Fourier transform, was summed from 0 to 234 Hz. This frequency band was selected because most of the power of a subsample of MEPs fell within this range. Each window’s summed power was used to compute a new representation of the trial, defined as POWi(j), where i is the index of each trial and j is the index of the sliding window within trial i.

3) To identify the position of the MEP within the time-domain signal, the algorithm selected the j-th sliding window with the highest summed power (WMaxi). The MEP latency was then taken as the time of the first sample of WMaxi. MEP amplitude was computed as the difference between the maximum and minimum peaks of the EMG signal within the window WMaxi. This procedure was performed for each of the 30 trials, per session, per patient.

4) Once all MEP amplitudes and latencies were computed for all trials of one session and one hemisphere, the algorithm discarded 10 trials out of 30 [22] using the following procedure: for each session and trial i, the WMaxi of the corresponding trial was selected. Afterward, the median of the summed power of all the windows in trial i was computed, and the summed power of WMaxi was divided by this median. The result was a fitness criterion for each trial, which was lower if the WMaxi power was similar to the computed median. Then, the algorithm discarded the 10 trials with the lowest fitness. The motivation for this selection came from the hypothesis that the summed power of a window of a signal not containing a valid MEP would be similar to the power of other segments within the same signal, unlike a window that does contain a valid MEP, whose power should be higher.
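
Steps 2 and 3 can be illustrated with a short Python sketch. It is not the authors' MATLAB implementation, and several details are assumptions: at 1500 Hz a 10.7 ms window is taken to be 16 samples and a 10 ms overlap 15 samples, so the window advances one sample at a time; a 16-point DFT then has bins 93.75 Hz apart, so the 0 to 234 Hz band covers the bins at 0, 93.75, and 187.5 Hz. The function names `band_power` and `mep_features` are hypothetical, and the trial is assumed to be already notch filtered (step 1).

```python
import cmath
import math

FS_HZ = 1500.0                  # sampling frequency reported in Section 2.3
WIN_SAMPLES = 16                # assumption: 10.7 ms at 1500 Hz ~ 16 samples
STEP_SAMPLES = 1                # assumption: 10 ms overlap = 15 samples
START_S, END_S = 0.0046, 0.050  # analysed segment after the TMS pulse
BAND_MAX_HZ = 234.0             # upper limit of the summed band


def band_power(window, fs=FS_HZ, f_max=BAND_MAX_HZ):
    """Sum the DFT power of one window over the bins from 0 Hz up to f_max."""
    n = len(window)
    total = 0.0
    for k in range(n // 2 + 1):
        if k * fs / n > f_max:
            break
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))
        total += abs(coeff) ** 2
    return total


def mep_features(trial, fs=FS_HZ):
    """Return (latency_ms, peak_to_peak, window_powers) for one filtered trial."""
    first = round(START_S * fs)
    last = round(END_S * fs)
    starts = range(first, last - WIN_SAMPLES + 1, STEP_SAMPLES)

    # POWi(j): summed band power of every sliding window of the trial.
    powers = [band_power(trial[s:s + WIN_SAMPLES]) for s in starts]

    # WMaxi: the window with the highest summed power.
    best = max(range(len(powers)), key=powers.__getitem__)
    start = starts[best]
    wmax = trial[start:start + WIN_SAMPLES]

    latency_ms = 1000.0 * start / fs   # time of the first sample of WMaxi
    amplitude = max(wmax) - min(wmax)  # peak-to-peak within WMaxi
    return latency_ms, amplitude, powers


# Synthetic trial (illustration only): a flat trace with one biphasic
# deflection starting at sample 30, i.e. 20 ms after the pulse.
trial = [0.0] * 300
for i in range(16):
    trial[30 + i] = 100.0 * math.sin(2 * math.pi * (i + 0.5) / 16)

latency_ms, amplitude, powers = mep_features(trial)
```

For this synthetic trace the window aligned with the deflection has the highest band power, so the sketch reports a latency of 20 ms. Note that taking the first sample of the winning window as the latency ties the estimate to the window grid rather than to the exact MEP onset.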

The algorithm was implemented in MATLAB® 2017b (MathWorks Inc., MA, USA), and its pseudocode is shown in Table 2 so that it can be written in any programming language. The main function MEP_Features must be executed once for every analyzed hemisphere and session.
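
The trial-selection rule of step 4 can likewise be sketched in Python. This is an illustrative sketch and not a transcription of Table 2: the names `fitness` and `select_trials` are hypothetical, the input is assumed to be one list of window powers per trial, and the handling of a zero median is an assumption added only to avoid a division by zero.

```python
from statistics import median


def fitness(window_powers):
    """Ratio between the highest window power (WMax) and the median power."""
    peak = max(window_powers)
    med = median(window_powers)
    if med == 0:
        # Assumption: a flat background with a clear peak is the best case,
        # and an entirely flat trace is the worst case.
        return float("inf") if peak > 0 else 0.0
    return peak / med


def select_trials(powers_per_trial, n_discard=10):
    """Return the sorted indexes of the trials kept for one hemisphere/session."""
    scores = [fitness(p) for p in powers_per_trial]
    # Rank trials from lowest to highest fitness and drop the first n_discard.
    ranked = sorted(range(len(scores)), key=scores.__getitem__)
    return sorted(ranked[n_discard:])


# Illustration only: 10 trials without a clear peak, then 20 trials with one.
flat = [1.0, 1.0, 1.0, 1.0, 1.0]
peaked = [1.0, 1.0, 50.0, 1.0, 1.0]
kept = select_trials([flat] * 10 + [peaked] * 20)
```

Because the criterion is a ratio within each trial, it does not depend on the absolute EMG scale, which may matter when amplitudes differ between the affected and unaffected hemispheres.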

Table 2 Pseudocode implementation

2.5 Algorithm validation

The algorithm for automatic MEP selection and feature extraction proposed in this work was compared with the outcomes obtained by three different experts using manual measurements. These observers had at least 1 year of experience in identifying TMS-elicited MEPs. The observers, like the automatic method, discarded the 10 out of 30 trials that were least likely to contain a valid MEP waveform. Indexes of selected trials, amplitudes, and latencies were compared among the experts’ estimations to assess variability among observers. The automated method was also contrasted with each observer’s manual MEP selection and feature extraction. This was done by comparing the percentage of trials that were equally selected or equally discarded by the two methods (%Coincidence). Additionally, the relative errors of the amplitudes and latencies between the automatic measurements and those made by the experts were calculated.
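
The two agreement measures can be written compactly, as in the Python sketch below. The function names are hypothetical, and the relative error is assumed to take the expert's value as the reference, which the text does not state explicitly at this point.

```python
def percent_coincidence(selected_a, selected_b, n_trials=30):
    """Percentage of trials that two methods both selected or both discarded."""
    a, b = set(selected_a), set(selected_b)
    agree = sum((i in a) == (i in b) for i in range(n_trials))
    return 100.0 * agree / n_trials


def relative_error_percent(automatic, expert):
    """Relative error (%) of an automatic value, with the expert as reference."""
    return 100.0 * abs(automatic - expert) / abs(expert)


# Illustration only: each method keeps 20 of 30 trials.
method_a = range(0, 20)   # keeps trials 0..19
method_b = range(5, 25)   # keeps trials 5..24
coincidence = percent_coincidence(method_a, method_b)
error = relative_error_percent(105.0, 100.0)
```

In this example the methods agree on 15 kept trials and 5 discarded trials, so 20 of the 30 trials coincide.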

2.6 Statistical analysis

A Lilliefors test was employed to evaluate whether MEP amplitudes and latencies followed a Gaussian distribution. Friedman’s test was applied to the features obtained by the three observers and the automatic method to assess whether amplitude and latency medians differed significantly between methods. Amplitudes and latencies were compared for the UH, the AH, and both hemispheres (BH). Afterward, post hoc analysis with Wilcoxon signed-rank tests, using Bonferroni correction for multiple comparisons, was applied to determine which methods differed significantly from each other. Six comparisons were performed: observer 1 vs. observer 2, observer 1 vs. observer 3, observer 2 vs. observer 3, observer 1 vs. automatic, observer 2 vs. automatic, and observer 3 vs. automatic. Inter-rater reliability was assessed using Cronbach’s alpha coefficient. The evaluation involved the amplitude and latency measurements in the UH and AH, first for the three observers together with the automatic method and then for the three observers alone.
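
Two of these quantities are simple enough to sketch in plain Python: the set of pairwise comparisons with its Bonferroni-corrected threshold, and Cronbach's alpha. The sketch assumes a nominal significance level of 0.05, which the text does not state, and assumes that each rater (an observer or the automatic method) is treated as an item with one measurement per trial. The function names are hypothetical.

```python
from itertools import combinations
from statistics import variance

RATERS = ["observer 1", "observer 2", "observer 3", "automatic"]

# All pairwise post hoc comparisons between the four methods.
comparisons = list(combinations(RATERS, 2))


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of per-rater measurement lists.

    ratings[r][t] is the value given by rater r for trial t; all raters
    must have measured the same trials.
    """
    k = len(ratings)
    item_variances = sum(variance(r) for r in ratings)
    totals = [sum(values) for values in zip(*ratings)]
    return k / (k - 1) * (1 - item_variances / variance(totals))


# Assumption: nominal alpha of 0.05 (not reported in the text).
threshold = bonferroni_threshold(0.05, len(comparisons))

# Illustration only: three raters in perfect agreement give alpha = 1.
alpha_perfect = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
```

Enumerating the pairs shows where the six comparisons come from: four methods taken two at a time.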

3 Results

Medians of amplitude and latency, together with the results of the Friedman and post hoc Wilcoxon signed-rank tests, are shown in Fig. 1. Amplitude measurements were higher for the UH than for the AH for the observers and the automatic method. Median latency measurements were lower for the UH than for the AH, except for the automatic method, for which median latencies were similar between hemispheres. Median differences of BH amplitudes and latencies were most pronounced between the automatic method and observer 2. The observers’ measurements in all hemisphere comparisons were within the range of those computed with the automatic method. Friedman tests showed significant differences between the values of MEP amplitude and latency measured by the observers and the automated method, regardless of the assessed hemisphere. Post hoc tests revealed that the amplitude and latency measurements of observer 2 were significantly different from those of the other observers and the automated method. Among the observers, observer 1 and observer 3 presented the most similarities. For amplitude, the automated method was most similar to observer 3, since no significant difference was found for the AH and BH. For latency, the automated method most closely resembled the measurements of observer 1, as shown by the lack of significant differences for the AH, UH, and BH.

Fig. 1 Amplitude and latency of MEPs evaluated by each observer and with the automatic method for the affected (AH), unaffected (UH), and both hemispheres (BH). Measurements that did not have a statistically significant difference are also shown

The amplitude measurements in the UH and the AH performed by the three observers and the automatic method had inter-rater reliability, determined by Cronbach’s alpha, of 0.929 and 0.798, respectively. The measurements performed only by the three observers had a lower Cronbach’s alpha of 0.911 and 0.726, respectively. The latency measurements in the UH and the AH performed by the three observers and the automatic method had inter-rater reliability, determined by Cronbach’s alpha, of 0.615 and 0.555, respectively. The measurements performed only by the three observers had a lower Cronbach’s alpha of 0.598 and 0.405, respectively.

Table 3 shows %Coincidence and relative median errors between the proposed method and the manual procedures. Trial selection coincidence varied between 58.3% and 80% among observers and the automatic method. Median trial selection coincidence was lower in the UH than in the AH only for the comparison between observer 1 and observer 2. Median MEP amplitude errors were below 6%, with quartiles between 0 and 16%. Median latency error was below 23%, with quartiles from 12 to 38%.

Table 3 Relative error medians and %Coincidence between observers and the automatic method

4 Discussion

Median MEP amplitude measurements of the observers and the automatic method were within the range of those reported by Pellegrini et al. of 250 ± 300 μV in a sample of healthy subjects [23]. MEP amplitudes were also consistent with those reported by Acler et al. of 470 ± 200 μV in the AH and 890 ± 600 μV in the UH of acute stroke patients [24]. Observers’ median latency measurements were lower than those reported by Van Doren et al. of 27 ± 2.7 ms in healthy subjects [25]. However, the measurement range of the automatic method was within the values of Van Doren et al., implying that the automatic method’s latency measurements were more similar to those reported in the literature.

The criteria of the automatic method for discriminating trials that are likely to contain a valid MEP from those that are not seem to produce the same level of discrepancy as that observed between manual measurements, regardless of the assessed hemisphere. This allows us to infer that the proposed trial selection method, based on features extracted from the frequency domain of MEP traces, can be suitable for automating MEP trial selection in studies of stroke patients.

For MEP amplitude and latency, differences were found between observers, regardless of the assessed hemisphere, and between observers and the automatic method. This could be explained by the effects of disrupted cortical excitability in both hemispheres of stroke patients [6, 26], which could have made it difficult to locate MEPs within EMG traces. It may also have been caused by variability between trials; however, 20 trials have been reported to be adequate to compensate for the intrinsic variability of MEPs [22], so this is a less likely cause of the lack of similarity between measurements. Another explanation could be the effect of subjective criteria that may have influenced the observers’ measurements. This is reinforced by the differences between observer 2 and both the other observers and the automatic method, which raises the possibility that even experts can provide outlying measurements. Rábago et al. have already raised the issue of observer-dependent variability among MEP measurements [17]. On the other hand, the automatic method coincided with the MEP amplitudes and latencies of two of the observers. This suggests that the performance of the automated method can be considered at least similar to that of an expert, provided that the expert does not produce outlying data.

The amplitude of MEPs computed with the automated method had a median relative error of approximately 5%, and of at most 15%, if the experts’ manual calculations are regarded as the standard. Furthermore, amplitude measurements among observers and the automatic method had good to excellent inter-rater reliability in both hemispheres. This suggests that the proposed automatic methodology could be used instead of the manual method if a likely difference in MEP amplitude of 5% can be regarded as acceptable. However, median latency errors were of up to 23% (up to 18% if the discrepant measurements of observer 2 are not considered). This was probably caused by different segments of the EMG trace being regarded as MEPs, which led to poor inter-rater reliability of latency measurements. This implies that latencies computed with the automatic method may differ from those measured by an expert. However, even among experts, latencies were different and showed poor inter-rater reliability, which increased when the automated measurements were added to the manual ones. Therefore, latency measured with the automatic method could provide a more reliable and objective assessment.

Compared to the methodology presented by Rábago et al. [17], the present algorithm does not require fine-tuning of model hyperparameters. Also, it is not necessary to manually inspect EMG recordings for noisy trials, since they are automatically identified. Finally, it was evaluated with stroke patients, who present a clinical challenge due to their compromised corticospinal tract integrity, and compared with manual measurements of experts. However, the algorithm of Rábago et al. allows the identification of the cortical silent period, which can provide additional information for clinical purposes and cannot currently be computed with the presented algorithm.

The present study has limitations that need to be taken into account. First, the proposed method was tested in a small sample of stroke patients, though this was partially compensated by the large number of trials performed per patient. Second, MEP acquisition was not performed with a fixed maximum stimulator output, and this has been reported to affect MEP amplitude and latency [23]. In addition, there is no valid gold standard against which to compare the proposed algorithm’s trial selection and its computation of amplitude and latency, since manual measurements differed even among experts. However, to the authors’ knowledge, this is the first study that describes an automatic methodology for MEP selection and feature extraction and its evaluation with stroke patients’ data. Therefore, the proposed automatic methodology can provide a tool that enables neuroscience research groups to eliminate observer-dependent subjectivity during the computation of MEP features and could increase the possibility of reaching more meaningful conclusions in stroke-related studies.

5 Conclusion

The proposed methodology performed similarly to manual measurements from experts regarding the selection of trials with MEPs and the computation of MEP amplitude and latency. The main advantages of the proposed methodology are that it can decrease the time required for extraction and analysis of MEP features and that it performs computations without observer-dependent subjectivity.