1 Introduction

Electrocardiogram (ECG) signals contain rich information about cardiac electrical activity and are widely used as a clinical and diagnostic tool [1, 2]. Women and men differ in their hormonal, anatomical, physiological, biochemical, and biomedical responses [3,4,5]. Therefore, gender information plays an important role in ECG interpretation in certain physiological, psychological, or pathological states. In addition, it is well established that other factors, such as the subject’s age, influence ECG signals [6]. To evaluate gender- or age-based differences, most previous investigations concentrated on statistical and morphological features of the ECG time series [6, 7]. It has also been shown that frequency domain features can provide information about individual differences in different states of disease and health. Despite their valuable and encouraging results, these techniques have shortcomings. A major one is that frequency domain-based methods give no information about where in time a frequency component occurs. To overcome this limitation, wavelet-based procedures have been proposed for examining non-stationary signals. Despite their strong reputation in bio-signal interpretation, wavelet-based methods are themselves hampered by some major limitations [8].

Owing to the non-stationary, complex, and chaotic nature of ECG, many recent investigations have emphasized the potential of nonlinear bio-signal processing methods. A growing number of methodologies based on dynamic and nonlinear features have been presented to examine gender-based ECG characteristics in various fields [9,10,11,12]. Some investigations have also dealt with age-related ECG differences based on nonlinear dynamics [10, 13, 14]. Although some nonlinear algorithms, such as the Lyapunov exponent, can deliver global information about the structure of the reconstructed ECG phase space and its trajectories, they cannot describe the detailed shape of the ECG trajectories. The geometric pattern of data points positioned on Poincare surfaces has served as useful information in the study of nonlinear bio-signals [15,16,17,18,19]. These techniques can also be used to examine the detailed shape information contained in the reconstructed phase space.

Following the idea of employing Poincare sections to capture chaotic cardiac behavior within ECG time series, we attempt to design a dynamic scheme to investigate gender- and age-based differences. Overall, the proposed framework covers data selection, preprocessing, feature extraction, and classification, which are described in detail in the following sections. An overview of the proposed procedure is provided in Fig. 1.

Fig. 1 Suggested methodology

2 Materials and methods

2.1 Data

The data were selected from the ECG-ID database available at PhysioBank [20]. The database contains a total of 310 lead I ECG segments of 20 s each, acquired from 90 participants at a sampling rate of 500 Hz. Data filtering included baseline drift removal, AC power line noise elimination (using a band-stop filter), exclusion of high-frequency distortions, and signal smoothing [20]. In this study, the ECGs of 79 subjects were used, including 37 males (age: 31.24 ± 13.92 years) and 42 females (age: 25.81 ± 10.8 years).

2.2 Preprocessing

The ECG segments (X) were normalized as follows:

$$ {\text{norm}}(X) = 2\,\frac{X - \min (X)}{\max (X) - \min (X)} - 1 $$
(1)

Further processing was performed in windows of 0.8 s, corresponding to the duration of a normal ECG cycle [21].
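The preprocessing step can be illustrated with a minimal sketch. It assumes that each 20-s segment is normalized as a whole with Eq. (1) and then cut into consecutive, non-overlapping 0.8-s windows; the non-overlapping layout and the function names `normalize` and `split_windows` are assumptions made for illustration and are not specified in the text.

```python
def normalize(x):
    """Rescale a sequence to the range [-1, 1] following Eq. (1)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("a constant segment cannot be normalized")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in x]


def split_windows(x, fs=500, window_s=0.8):
    """Cut a signal into consecutive windows of window_s seconds.

    Assumption: windows do not overlap and are not aligned to R peaks;
    a trailing remainder shorter than one window is dropped.
    """
    n = int(round(fs * window_s))  # 400 samples at 500 Hz
    return [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
```

Under these assumptions, a 20-s segment sampled at 500 Hz (10,000 samples) yields 25 windows of 400 samples each.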

2.3 Phase space

First, the trajectory is defined in a d-dimensional space by plotting the set of:

$$ [x_{k} ,x_{k + \tau } ,x_{k + 2\tau } , \ldots ,x_{k + (d - 1)\tau } ] = X(k)\quad {\text{for}}\quad k = 1,2, \ldots ,N - (d - 1)\tau $$
(2)

for a scalar time series \( x_{i} \) (i = 1, 2, …, N). In this equation, τ and d are the lag and the embedding dimension, respectively, and X(k) is the delay vector in the phase space. The phase space reconstruction is crucially affected by the lag: a small τ generates strongly correlated coordinates, whereas a large τ generates nearly uncorrelated ones. We examined τ = 2–7 in the reconstruction of the ECG phase space.
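As an illustration of Eq. (2), the following sketch builds the delay vectors for a given lag and embedding dimension. The function name `delay_embed` is hypothetical, and the code uses 0-based indexing, so k runs from 0 to N − (d − 1)τ − 1 instead of from 1 to N − (d − 1)τ.

```python
def delay_embed(x, tau, d=2):
    """Return the delay vectors X(k) of Eq. (2) as a list of tuples.

    x   : scalar time series (sequence of numbers)
    tau : lag in samples
    d   : embedding dimension (d = 2 gives a phase plane)
    """
    if tau < 1 or d < 1:
        raise ValueError("tau and d must be positive integers")
    n_vectors = len(x) - (d - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this tau and d")
    return [tuple(x[k + j * tau] for j in range(d)) for k in range(n_vectors)]


# Example: the lags examined in the text, applied to a toy series
series = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
phase_planes = {tau: delay_embed(series, tau) for tau in range(2, 8)}
```

For d = 2, each vector is a point (x_k, x_{k+τ}) in the phase plane, so plotting the vectors in order traces the trajectory.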

2.4 Poincare section

The Poincare section is used to describe the trajectory configuration and to specify the attractor type. It is defined by first selecting a Poincare hyperplane and then specifying the crossing points (also called intersections) of the hyperplane and the trajectory. In a 2D space, the Poincare section is a line that shows the system status (Eq. 3).

$$ y = \tan (\theta )x + b $$
(3)

where tan(θ) is the slope and b is the y-intercept. In this study, b was set to zero. Figure 2 shows the trajectory of an ECG cycle in the phase plane (black curve). The Poincare sections are shown in gray, and a crossing point of the trajectory with a Poincare section is marked with a blue circle.

Fig. 2 The trajectory of an ECG cycle in the phase plane (black curve). The Poincare sections are shown in gray. An example of a crossing point is marked with a blue circle (color figure online)

The choice of the angle θ is very influential, since an inaccurate θ can result in incorrect basin features. We examined θ in the range of 0°–360° with a step size of 15° (Fig. 2). A line equation was calculated for each pair of successive points of the ECG trajectory (F(xy)), and the crossing point of this line with Eq. (3) was computed (Eq. 4):

$$ \left\{ \begin{array}{l} x_{\text{Crossing point}} = \dfrac{y_{n} - m x_{n} - b}{\tan (\theta ) - m} \\ y_{\text{Crossing point}} = \tan (\theta )\, x_{\text{Crossing point}} + b \end{array} \right. $$
(4)

in which \( x_{n} \le x_{\text{Crossing point}} \le x_{n + 1} \) and \( m = (y_{n + 1} - y_{n})/(x_{n + 1} - x_{n}) \). Here, m denotes the slope of the line passing through two successive points of the trajectory F(xy), and \( x_{n} \) and \( y_{n} \) are the trajectory coordinates.
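The crossing-point search of Eqs. (3) and (4) can be sketched as follows. The sketch takes each segment between successive trajectory points n and n + 1, intersects its supporting line with the section, and keeps the intersection only if it lies on the segment. Several details are assumptions made for illustration: the section is treated as a full line rather than a half-line, vertical and parallel segments are handled as special cases, and the name `crossing_points` is hypothetical.

```python
import math


def crossing_points(xs, ys, theta_deg, b=0.0, eps=1e-12):
    """Intersections of a 2D trajectory with the line y = tan(theta) x + b."""
    slope = math.tan(math.radians(theta_deg))
    points = []
    for n in range(len(xs) - 1):
        x0, y0, x1, y1 = xs[n], ys[n], xs[n + 1], ys[n + 1]
        dx = x1 - x0
        if abs(dx) < eps:
            # Vertical segment: m is undefined, so intersect at x = x0 directly.
            yc = slope * x0 + b
            if min(y0, y1) <= yc <= max(y0, y1):
                points.append((x0, yc))
            continue
        m = (y1 - y0) / dx  # slope of the segment between points n and n + 1
        if abs(slope - m) < eps:
            continue  # segment parallel to the section: no single crossing
        xc = (y0 - m * x0 - b) / (slope - m)  # Eq. (4), first line
        if min(x0, x1) <= xc <= max(x0, x1):  # crossing lies on the segment
            points.append((xc, slope * xc + b))  # Eq. (4), second line
    return points


# Sections every 15 degrees, as examined in the text
angles = list(range(0, 360, 15))
```

Because the interval test is inclusive at both ends, a crossing that falls exactly on a shared trajectory point is reported once for each adjacent segment; whether such points should be counted once or twice is not stated in the text. The length of the returned list corresponds to the number of crossing points for one section.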

Finally, the following indices were extracted: the number of crossing points (F1) and the smallest (F2), largest (F3), and mean (F4) basin area of the ECG trajectory over all ECG cycles. To calculate F2–F4, the basin area was first computed for every ECG cycle, and then the minimum, maximum, and mean of this measure were extracted. In addition, the average standard deviation (SD) of the distribution of the given series in the horizontal (F5) and vertical (F6) coordinates, the average third moment in the horizontal (F7) and vertical (F8) coordinates, and the average fourth moment in the horizontal (F9) and vertical (F10) coordinates were extracted.
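The moment-based indices F5–F10 can be illustrated with a short sketch. The text does not state whether the moments are raw, central, or standardized, nor over which sets they are averaged, so the sketch assumes unnormalized central moments and the population SD, computed per set of 2D points and then averaged over sets. The function names are hypothetical, and the basin-area indices F2–F4 are not covered.

```python
import math


def central_moment(values, order):
    """Central moment of the given order (assumption: not standardized)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def distribution_features(points):
    """F5-F10 for one set of 2D points, given as (x, y) tuples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "F5": math.sqrt(central_moment(xs, 2)),  # SD, horizontal
        "F6": math.sqrt(central_moment(ys, 2)),  # SD, vertical
        "F7": central_moment(xs, 3),             # third moment, horizontal
        "F8": central_moment(ys, 3),             # third moment, vertical
        "F9": central_moment(xs, 4),             # fourth moment, horizontal
        "F10": central_moment(ys, 4),            # fourth moment, vertical
    }


def mean_features(per_set):
    """Average each index over a list of per-set feature dictionaries."""
    return {k: sum(f[k] for f in per_set) / len(per_set) for k in per_set[0]}
```

Other definitions, such as the sample SD or skewness and kurtosis, would change the numerical values but not the structure of the computation.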

2.5 Classification

Three binary classification strategies were considered: (1) separating the two gender categories, male (M) and female (F); (2) separating two age-groups, younger adults (A1 ≤ 23 years) and older adults (A2 > 23 years); and (3) considering age and gender information concurrently. For the third strategy, four classes were defined: younger male adults (MA1), younger female adults (FA1), older male adults (MA2), and older female adults (FA2), and a one-versus-all scheme was adopted.

The features were normalized before being fed to the classifier. Fivefold cross-validation was repeated 10 times, and accuracy, sensitivity, and specificity were calculated to evaluate the classifier performance.
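The evaluation protocol can be sketched as follows. The sketch shows one way to draw fivefold splits and to compute accuracy, sensitivity, and specificity from binary predictions. Shuffling with a fixed seed, the absence of stratification, and the function names `kfold_indices` and `binary_rates` are assumptions made for illustration.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Split sample indices into k disjoint test folds after shuffling."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def binary_rates(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    accuracy = (tp + tn) / len(pairs)
    # A rate is undefined when its class is absent from the test fold.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Ten repetitions of fivefold cross-validation on 79 subjects
repeated_folds = [kfold_indices(79, k=5, seed=run) for run in range(10)]
```

In each repetition, every fold serves once as the test set while the remaining four are used for training, and the three rates are then averaged over folds and repetitions.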

The popular support vector machine (SVM) algorithm was used for classification. This technique is known to be valuable in bio-signal classification [22]. It usually employs a nonlinear kernel function to map the input data into a high-dimensional space, in which the data are more easily separable than in the original input space. Based on the input measures, an iterative learning procedure constructs an optimal hyperplane that has the largest margin between the categories. The maximum-margin hyperplane then defines the decision boundary between classes. In general, the larger the distance between the hyperplane and the data points, the higher the classification rates. In this study, radial basis function (RBF), polynomial, and quadratic kernels were tested.
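For reference, the three kernel families can be written in their standard textbook forms, as in the sketch below. The kernel parameters used in the study are not reported, so `gamma`, `degree`, and `coef0` and their default values are assumptions, and the quadratic kernel is taken to be a polynomial kernel of degree 2.

```python
import math


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def polynomial_kernel(u, v, degree=3, coef0=1.0):
    """Polynomial kernel: (u . v + coef0) ** degree."""
    return (dot(u, v) + coef0) ** degree


def quadratic_kernel(u, v, coef0=1.0):
    """Quadratic kernel, taken here as a degree-2 polynomial kernel."""
    return polynomial_kernel(u, v, degree=2, coef0=coef0)
```

An SVM replaces every inner product between feature vectors with one of these functions, which is what allows a linear maximum-margin separator in the mapped space to act as a nonlinear boundary in the original feature space.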

3 Results

After preprocessing the data, the 2D phase space of ECG segments was reconstructed for lags 2–7 (Fig. 3).

Fig. 3 a The ECG phase space of a subject in different lags. b The boundaries of the trajectories in different lags

As shown in Fig. 3, the trajectory pattern differed between lags. As the lag increased, the area of the phase space grew, and its pattern changed from oval to circular. Then, Poincare sections at different angles were formed, and 10 geometrical indices were extracted from the crossing points of the Poincare sections in different states. The mean, maximum, and minimum values of the ECG features at different lags are shown in Fig. 4 for the male and female groups.

Fig. 4 The ECG Poincare section-based indices in different lags, a female and b male. The mean values are shown in blue. The minimum and maximum of the features are shown as black dots. The horizontal axis indicates the lag number, and the vertical axis shows the values of the different indices (F1–F10) (color figure online)

Both gender and lag affected the values of the indices (Fig. 4). For example, the maximum number of Poincare crossing points (F1) was higher in females than in males. The mean and SD of the parameters in the two age-groups are reported in Table 1.

Table 1 Mean ± SD of 10 extracted features (F1–F10) with different delays in two age ranges

Both lag and age affected the values of the indices (Table 1). For example, lower F1 and F2 values were observed for A2 than for A1, whereas F3 values were higher in A2. The mean area (F4) grew with increasing delay, especially in the first age-group. For the other features, there were differences between the two age-groups and among the various delays, although these changes did not follow a specific pattern.

The performance of the features in age and gender classification was evaluated using SVM (Fig. 5).

Fig. 5 Mean SVM accuracy, sensitivity, and specificity rates over 10 runs, using RBF, polynomial, and quadratic kernel functions for a age and b gender classification

The highest mean rates for age categorization were achieved at lag 6 using the quadratic kernel (Fig. 5a). In this case, the mean accuracy, sensitivity, and specificity rates were 83.33, 95, and 97.14%, respectively. The second best rates were obtained with the polynomial kernel, with corresponding rates of 81.33, 90, and 94.29%. For age classification, the highest accuracy, sensitivity, and specificity were 93.33, 87.5, and 100%, respectively, using the quadratic kernel. The highest mean classification rates for gender separation were achieved at lag 5 using the quadratic kernel (Fig. 5b), where the mean accuracy, sensitivity, and specificity rates were 83.33, 94.29, and 95%, respectively. The second best rates were obtained with RBF, with corresponding rates of 83.33, 88.57, and 100%. For gender classification, the highest accuracy, sensitivity, and specificity were 93.33, 100, and 100%, respectively, using the RBF and polynomial functions.

It can be concluded that the best results for the separation of gender and age classes were obtained with delays of 5 and 6, respectively. Therefore, to take into account the effect of both gender and age, we used only these two delays. The mean ± SD of F1–F10 in the four groups is provided in Table 2.

Table 2 Mean ± SD of 10 extracted features (F1–F10) with delays 5 and 6 in four different groups

At both lags, the ECG parameters differed between the two age-groups and between the two genders (Table 2). Considering age and gender groups concurrently, the mean performances for lag 5 and lag 6 are reported in Table 3.

Table 3 Mean SVM accuracy, sensitivity, and specificity rates over 10 runs, using RBF, polynomial, and quadratic kernel functions at lags 5 and 6

The proposed methodology achieved high performance (Table 3). FA2 obtained the highest mean rates with all kernel functions and at both lags. This class was separated with the highest mean rate of 94.66% using the polynomial kernel at lag 5. The second best results were obtained for MA1, which was recognized with the highest mean rate of 90% using RBF at lag 6. Considering all classes, the mean accuracy rates were in the range of 80–85%, and the highest mean accuracies were obtained using RBF. The sensitivity and specificity rates were also promising: the mean sensitivity rates were in the range of 92–98%, and the mean specificity rates were in the range of 87–96%. The second best accuracy was obtained with the polynomial kernel function, with mean rates of 83.33 and 82.33% for lag 5 and lag 6, respectively.

4 Discussion

Many factors affect ECG interpretation, including heart size, torso morphology, ECG lead placement, environmental artifacts, and the person’s height, weight, age, gender, race, and genetic background. Therefore, it is both important and challenging to have ECG algorithms that are robust across different clinical conditions. The main contribution of this study was to evaluate subject differences in terms of age and gender using ECG. We employed the ECGs of 79 subjects to examine the effect of age and gender on the reconstructed ECG phase space, and defined 10 features to quantify the intersections of the Poincare sections with the ECG phase space trajectory. Our results revealed that ECG dynamics differed between the two age ranges and between the two genders (Table 2). These results are consistent with previous findings. An earlier analysis [6] showed that some global ECG indices differ significantly between females and males. Another study [23] emphasized that, because ECG characteristics vary with gender and age, diagnostic ECG criteria should be age and sex specific. Spectral and sample entropy indices were employed to examine the impact of age and gender in paced breathing [10]; fluctuations in cardio-respiratory coupling were noticeable only in middle-aged male subjects. Beckers et al. [13] reported that nonlinear measures of heart rate variability (HRV) declined with age, but found no clear gender-based differences in these indices. ECG differences between females and males in response to sad stimuli were investigated in [9], where nonlinear indices were reported to be effective in revealing gender-wise ECG differences. In another study, sleep deprivation was shown to affect the ECG of males, compared with females, during affective stimuli [24].

Only a limited number of studies have applied nonlinear methods to investigate ECG dynamics in terms of the age and sex of subjects, and most of them provided only global information about the ECG trajectory. In contrast to these investigations, our proposed framework attempts to track the local information embedded in the ECG basin.

We obtained a highest accuracy of 93.33% for both the gender- and the age-based classification strategies. By combining age and gender information, the maximum rate was 94.66%. Few previous studies have addressed gender and age classification using physiological signals. For gender classification, some frequency- and time-based HRV features were used [25], and a maximum accuracy of 84% was reported using SVM. A review article on gender classification showed that SVM has been a common classifier in this field [26]. Other classifiers have also been used in age or gender categorization [27]; for age classification, the highest area under the ROC curve (86.25%) was reported for a Bayesian network. Although that study used some nonlinear features, the detailed properties of the ECG phase space were not evaluated.

Although in this study we focused on the ECG Poincare section indices of healthy subjects based on the gender and age differences, in future investigations, the impact of these two factors on ECG dynamics of patients with heart ailments should be carefully studied by means of the proposed algorithm.

5 Conclusions

This manuscript presented a novel age and gender classification approach using Poincare section indices. Promising results were achieved with these phase space-based nonlinear features, and further improvements were obtained by incorporating age and gender information concurrently. Overall, our findings provide better insight into age- and gender-based discrimination using ECG characteristics delivered by the Poincare section. In addition, given the simplicity and rich information of the suggested technique, which is based on the chaotic nature of ECG, the algorithm can be applied as an efficient method for ECG waveform analysis in different states of disease and health, as well as for prediction and diagnosis purposes.