1 Introduction

Human activity recognition for monitoring purposes has become a tool of great importance in recent years. In the medical field, it enables clinicians to obtain objective and valuable information on the state and evolution of several pathologies, with the aim of improving therapeutic strategies [1]. From a technical perspective, activity recognition through inertial-based wearable devices has generated a large number of publications in recent years. With the help of these devices, which are increasingly small and portable, it is possible to objectively monitor users during their activities of daily living. In the case of pathologies affecting human movement, these devices further enable physical assessment without requiring the presence of clinical observers. For example, in stroke patients, quantifying the amount of movement performed throughout a day has proved useful for analysing the evolution of rehabilitation [2]. Similarly, in patients with Parkinson’s disease, such devices have proved useful for monitoring different activities of daily living [3], gait analysis [4] and posture transitions [5]. Finally, another relevant example is the use of these inertial systems in patients with Alzheimer’s disease for localisation and energy expenditure quantification [6, 7].

The common approach to using the signals provided by these inertial-based wearable devices consists in analysing them by means of machine learning techniques and biomechanical algorithms in order to determine the activity, posture transitions, gait parameters and other mobility-related parameters [5, 8, 9]. In previous works, the most used inertial sensor is the accelerometer, followed by the gyroscope [10]. However, other sensors have recently been employed to contextualise and enrich inertial sensor measurements. One of these is the atmospheric pressure (AP) sensor, or barometer. Its measurements are based on the current absolute AP, which is subject to natural atmospheric events such as storms, during which AP is usually low, and anticyclones, during which AP is high. These events affect the AP measurement and must be taken into account in activity recognition.

Current barometers are able to detect altitude changes of a few centimetres [11,12,13]. Thus, in the field of activity recognition, barometers may become a useful sensing modality complementary to inertial sensors for detecting movements that involve a change in altitude, for example falls, using stairs, or postural transitions (PTs) such as sit to stand (SiSt) or stand to sit (StSi). However, the physical quantity they measure also makes false positives a challenge. A small change in temperature or pressure caused, for example, by a door or window opening within a room can considerably alter the barometer measurement and create a signal peak that resembles a change in altitude, which would result in false positives in the algorithm outcomes.

On the other hand, regarding SiSt and StSi PT recognition, some algorithms have been shown to provide acceptable results, although they use considerable memory resources due to the complexity of the classifier [14]. Given that acceleration may be similar in these two PTs [5], such algorithms are built with complex structures in order to achieve acceptable results. Classifiers with such complex structures require many computational resources, which increases the energy consumption and thus reduces the autonomy of the wearable device. However, the energy consumption of inertial units employed for long-term monitoring must be kept low to enable a long battery life. Only then is the resulting device highly usable, especially when its users may be frail individuals or patients with motor impairment.

In this work, we explore the addition of barometers to inertial sensors in the task of monitoring human activities with a single device. More concretely, we address the activities that have proved most complex, or have needed the most computing resources, to identify with a single inertial sensor: postural transitions and falls, since these are the activities in which AP sensors would represent an advantage. In addition, we also include in our study other common activities, such as walking or using stairs, among others. To analyse the benefits of using barometers, we explore several feature extraction processes and supervised learning classifiers applied to the accelerometer and barometer signals obtained from 14 users, and compare the results and complexity of using only accelerometers against using both sensors. Our results show that AP sensors yield higher accuracies while also leading to simpler classifiers in terms of computational complexity and memory resources, which reduces the energy consumption of wearable devices.

2 Related work

Activity recognition with inertial systems, such as accelerometers, has been widely studied, and it has been employed in many different fields of applications, for example in health [15], sports [16] and video games [17].

Within the activity recognition field, three kinds of movements can be distinguished: isolated movements performed once, movements that are repeated over time and, finally, the absence of movement. The first kind is mainly composed of posture transitions and falls, that is, movements executed briefly within a short time period, such as SiSt or StSi PTs. The second kind comprises repeated movements such as walking, running, or going up or down stairs. Finally, the last kind consists of remaining still in a posture, such as standing, sitting, or lying.

All these movements can be characterised with inertial sensors, the accelerometer being the most used [9, 10, 18, 19]. This sensor measures accelerations and is able to provide the orientation with respect to the gravity axis. Other sensors such as gyroscopes or magnetometers have also been employed. Najafi et al. [20] used a gyroscope to detect SiSt and StSi transitions by means of wavelet feature extraction. In addition, gyroscopes have also been used to analyse gait [21] or falls [22]. Gyroscopes have the disadvantage of only providing measurements when there is some movement, whereas accelerometers provide the orientation with respect to gravity even if there is no movement. Magnetometers, in turn, provide the relative orientation with respect to magnetic North. These devices have also been used to identify activities by means of data fusion techniques with accelerometers [23]. However, ferromagnetic materials such as electric devices, building structures, or street lighting, among others, distort the surrounding magnetic field and severely affect the magnetometer measurements in real environments. Thus, magnetometers are only reliable in outdoor environments (out of the city) and in controlled conditions. Accelerometers are, therefore, the most employed sensors to recognise activities so far, but they can be complemented with other sensors that provide context information, such as GPS, barometers, and non-wearable sensors (presence, temperature and humidity sensors) [24,25,26].

The use of barometers is currently not very widespread. The main reason lies in the trade-off that exists between sensitivity and specificity. A sensitive barometer is able to detect altitude changes as small as 10 cm; however, a small change in pressure, a window or door opening and closing, or a small air stream might modify the measurement, leading to an increase in false positives. Nevertheless, Massé et al. [27,28,29] found that barometers may detect activities such as SiSt and StSi posture transitions. They developed an algorithm to detect activities with an accelerometer and a barometer, proving that barometers could enhance the performance of some algorithms for activities that involve a change in altitude.

In that work, Massé et al. employed three barometers (BMP085, MS5611-BA01 and MPL115). Although they could perceive SiSt and StSi transitions, the signal of the BMP085 (the best of the three barometers) presented a noise of 0.03 mbar (10 cm approximately according to the BMP085 datasheet), whereas the current BMP280 only presents a noise of ± 0.0013 mbar (1.7 cm according to the BMP280 datasheet) [13].

Moncada-Torres et al. [30] also tested different body locations and different sensors to build an algorithm. They showed that barometers could enhance activity detection with classical inertial sensors, whereas gyroscopes did not provide any enhancement to their algorithms. Surprisingly, the wrist was considered the optimal location to place a sensor to detect different activities, in contrast to Gjoreski et al. [31], who found that the wrist is not a suitable location due to the excessive random hand movements in daily living activities. According to Moncada-Torres et al., barometers could enhance the detection of some activities by 20% compared to the algorithm tested only with an accelerometer. In that work, although falls were not tested, the accuracy on sit, stand and lie down was 90%, 82% and 76%, respectively, obtained with a k-NN classifier.

Falls have also been studied with barometers due to the altitude change they involve. Tolkiehn et al. [32] showed an enhancement of 5% in accuracy with an algorithm using a barometer and an accelerometer against an algorithm that only used an accelerometer. In another work, Bianchi et al. [33] showed a very significant improvement in sensitivity (up to 20%) and specificity (up to 5%) when adding a barometer to the accelerometer-based algorithm. However, they did not investigate the effect of barometers on other activities such as posture transitions.

In this work, we describe the effect of the barometer in two algorithms. The first one classifies StSi from SiSt, and the second is a 3-output classifier that determines SiSt, StSi and falls. A previous classifier is also included in order to discard other activities. In addition to this, we analyse the computational resources used by the devised algorithms, given that they may be embedded into wearable devices, and reducing the resources would result in an extended battery life and a higher usability.

Unlike previous works, which are focused on detecting falls or postural transitions, we present a method that demonstrates and justifies the features selected to detect specific movements from a database where several activities are performed. In this way, we aim to maximise sensitivity in detecting SiSt, StSi and falls and also to maximise specificity while users perform other activities that could lead to false positives. Thus, we use several temporal and frequency features and employ a method to select the best features in a database of several activities performed by 14 users. Whereas other papers present specific features and focus on concrete classifiers, we present a method to select the features as well as the optimal classifier.

3 Methodology

In this section, we present the methodology used to analyse the effects of including a barometer in a human activity recognition system devoted to identifying posture transitions, one of the most challenging movements to detect with a single inertial device.

We propose an approach in which, on the one hand, only an accelerometer is employed and, on the other hand, the combined information from this accelerometer and a barometer is employed to analyse the activity of a person. In both cases, a feature extraction process is applied in which signals are represented by several characteristics that capture different aspects related to posture transitions and falls. Then, the features are ranked by a feature selection algorithm, and afterwards, redundant features are removed by means of a correlation coefficient analysis. Finally, several supervised learning classifiers are trained by using the first M resulting features, where M varies from 1 to 30. The optimal number of features N is determined from the performance results as a function of the number of input features. This process is carried out first for the accelerometer alone and then a second time by also using the barometer. In this way, by comparing the accuracy and complexity of the classifiers, we determine the advantages of adding a barometer to inertial sensors for monitoring purposes.

In addition, to obtain statistically more meaningful results, we divided the data from users into a training-validation set and a test set, so that cross-validation is used in the former and results are obtained from the latter, as detailed in Sect. 3.3. This data division is performed randomly 10 times, and the described process is carried out the same number of times. Results are reported in Sect. 4 as the average of the test results.

The methodology used is depicted in Fig. 1. In the following subsections, each step of the methodology is detailed.

Fig. 1 Methodology of the proposed algorithm

3.1 Data collection

The experiments were performed at the Neápolis building (which has 4 floors, stairs and two elevators), in Vilanova i la Geltrú (Barcelona, Spain). More concretely, they took place at the CETpD facilities and on different floors of the building. The CETpD laboratory was equipped with a mattress (30 cm high) located on the ground for the participants to fall and sit on. Next to it, a bed with another mattress was placed to initiate falls from it. (The bed was placed 53 cm higher than the ground mattress.) A pair of chairs, one with wheels (51 cm high) and another one without (44 cm high), were also employed.

The test protocol was divided into three parts. The first part of the protocol was devoted to perform different types of falls; the second part of the protocol was focused on executing SiSt and StSi transitions in different chairs. Finally, the third part consisted of executing other postures or activities different than falls and posture transitions, which could lead algorithms to some false positives, such as lying, standing, walking, going up/down stairs and going up/down the elevator. A total of 14 participants took part in this protocol (Table 1).

Table 1 Users’ baseline data

The test protocol consisted of several movements and repetitions; every repetition and movement took place after 5 s from the previous one; during this time, users remained standing, lying or sitting. The test protocol approximately took 10 min to be completed, varying from 9 to 13 min. Figure 2 shows the executions and repetitions that volunteers performed.

Fig. 2 Test protocol scheme

All the tests were video-recorded in order to establish a gold standard. Video recordings were first synchronised with the sensor signals as described in [34], and videos were then labelled according to the activities performed. In this way, the resulting labels were extrapolated to the signals.

3.2 Signal conditioning and windowing

Data were obtained by means of the 9 × 3 sensor [35], an inertial measurement unit that measures 99 × 53 × 19 mm³ and weighs 57 g (83 g with battery). This device is composed of 1 microcontroller, 3 accelerometers, 1 gyroscope, 1 magnetometer and 3 barometers. It also has a microSD card socket for storing captured data. The system is located at the left side of the waist as depicted in Fig. 3, which includes the orientation of the device.

Fig. 3 Location and orientation of the inertial device

The system captures inertial data at 40 Hz, which is enough to analyse human movement during daily life activities [36]. Among them, the activity that produces harmonics in the highest part of the frequency spectrum is gait. According to Antonsson and Mann, the frequency content of gait coming from an inertial sensor is below 20 Hz [37]. In addition to this, gait is rather a simple activity to detect; hence, 40 Hz is considered sufficient as a sampling frequency to detect movements in daily living activities.

The signal, given by x1, …, xM, y1, …, yM, and z1, …, zM and discretised at times t1, …, tM, is then filtered through a second-order Butterworth filter that removes high-frequency noise. Once filtered, the signal is segmented into windows of 128 consecutive samples, resulting in a window length of 3.2 s, which is enough to capture posture transitions [14, 38]. These windows overlap by 50% in order to avoid losing information between windows [39], so that a new window begins every 64 samples.
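The windowing step described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `sliding_windows` and the choice to drop a trailing partial window are assumptions, while the 128-sample size, 50% overlap and 40 Hz sampling rate come from the text.

```python
def sliding_windows(samples, size=128, overlap=0.5):
    """Split a sequence into fixed-size windows with fractional overlap.

    Hypothetical helper: only full windows are kept (assumption), so a
    trailing partial window is dropped.
    """
    step = int(size * (1 - overlap))  # 64 samples for 128-sample windows at 50%
    if step <= 0:
        raise ValueError("overlap must be below 1")
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


FS_HZ = 40                    # sampling frequency stated in the text
signal = list(range(256))     # 6.4 s of dummy samples, for illustration only
windows = sliding_windows(signal)
window_seconds = 128 / FS_HZ  # duration of one window in seconds
```

With 256 input samples the windows start at samples 0, 64 and 128, so consecutive windows share half of their content.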

Since it provides temperature-compensated measurements, the BMP280 was selected among other commercial barometers. Its low noise level, short dynamic response, good resolution and sampling frequency were also taken into account. In our problem, the barometer signal is processed similarly to the accelerometer signal. First, our process considers that the sampling frequency offered by the sensor is 26.3 Hz, although measurements are delivered at 40 Hz, which implies that some samples are repeated. The provided signal b1, …, bM, discretised at times t1, t1 + τ, t1 + 2τ, …, t1 + (M − 1)τ, where τ = 1/40 s, is also filtered with a second-order Butterworth filter, as is the accelerometer signal. However, while the accelerometer cut-off frequency is set to 15 Hz in order to retain walking patterns and other activities, the cut-off frequency of the barometer is set to 0.68 Hz. The rationale for this cut-off frequency is that harmonics in the (0–0.68] Hz band are related to postural transitions [14]. In this way, we remove high-frequency components that are not related to postural transitions, such as the effect of opening and closing a door or window, which provokes sudden peaks in the signal. After the filtering process, the barometer signal is treated in the same way as the accelerometer signal.
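To make the two filter settings concrete, the following sketch derives second-order Butterworth low-pass coefficients with the bilinear transform and applies them in a single causal pass. It is an illustration under stated assumptions: the text does not say how the filter was designed or whether zero-phase filtering was used, and the helper names `butter2_lowpass` and `lfilter` are hypothetical. Only the filter order, the 40 Hz rate and the 15 Hz and 0.68 Hz cut-offs are taken from the text.

```python
import math


def butter2_lowpass(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass coefficients (bilinear transform)."""
    k = math.tan(math.pi * cutoff_hz / fs_hz)  # pre-warped cut-off
    q = 1 / math.sqrt(2)                       # Butterworth quality factor
    norm = 1 / (1 + k / q + k * k)
    b0 = k * k * norm
    b = (b0, 2 * b0, b0)
    a = (1.0, 2 * (k * k - 1) * norm, (1 - k / q + k * k) * norm)
    return b, a


def lfilter(b, a, x):
    """Direct-form I difference equation with zero initial conditions."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


FS_HZ = 40.0
acc_b, acc_a = butter2_lowpass(15.0, FS_HZ)   # accelerometer cut-off
bar_b, bar_a = butter2_lowpass(0.68, FS_HZ)   # barometer cut-off

# Response of the barometer filter to a unit step lasting 10 s (dummy input).
bar_step = lfilter(bar_b, bar_a, [1.0] * 400)
```

The barometer filter lets slow altitude changes through (the step response settles at 1) while a single-sample spike is strongly attenuated, which is the behaviour the text relies on to suppress door and window artefacts.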

3.3 Feature extraction and selection

The whole data set is then divided into a training-validation set and a testing set. The training-validation set is formed with the data from 11 users randomly selected, and this data set will be used to (I) extract features, (II) rank them, (III) remove redundant features and (IV) train the different classifiers and tune the hyper-parameters by means of a tenfold cross-validation. The models obtained with each classifier are finally tested over the three remaining users in order to assess the performance for each model. This process is executed randomly ten times to prevent specific training/test divisions from biasing the results.
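The user-wise division can be sketched as follows. This is a hedged illustration: the function `split_users`, the fixed seed and the integer user identifiers are assumptions introduced for reproducibility of the example; the 11/3 split of 14 users and the ten repetitions come from the text.

```python
import random


def split_users(user_ids, n_train=11, n_repeats=10, seed=0):
    """Return n_repeats random (training-validation, test) splits by user."""
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatability
    splits = []
    for _ in range(n_repeats):
        shuffled = list(user_ids)
        rng.shuffle(shuffled)
        # Whole users go to one side or the other, never individual windows.
        splits.append((sorted(shuffled[:n_train]), sorted(shuffled[n_train:])))
    return splits


users = list(range(1, 15))  # 14 participants, hypothetical numeric ids
splits = split_users(users)
```

Splitting by user rather than by window keeps all data from a test participant out of training, so the reported test results reflect performance on unseen people.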

Two monitoring tasks are addressed by this approach. The first one consists in determining whether a SiSt transition, a StSi transition or another activity has been performed. The rationale, as previously described, is that these transitions are among the most difficult events to identify with a single accelerometer. The second classification task also includes falls in the detection problem, which the literature shows to be a challenging one. Consequently, the methods described below have been applied to the training set twice, once to distinguish between StSi and SiSt and once to also detect falls.

Feature extraction characterises the signal contained in a window (see the end of Sect. 3.2 for details) according to two specific sets of features. The first one is composed only of accelerometer features, and in the second set, the barometer features are added. A total of 80 and 89 features (80 for the accelerometer and 9 for the barometer) are evaluated, respectively. Tables 2 and 3 report the complete list of these features.

Table 2 Accelerometer feature set
Table 3 Barometer feature set, which is added to the accelerometer feature set (see Table 2)

The first accelerometer features described in Table 2, related to the (0–0.68] Hz band, were identified by the authors in a previous paper [14]: during a PT, the power spectrum of the accelerometer signals is concentrated in this band. However, other activities such as walking also increase the power in this band, although their harmonics spread to higher frequencies. To identify them, a complementary spectral band, (0.68–3] Hz, is also included, and the relation between the two bands is also added. On the other hand, as previously described and shown in Fig. 3, the position and orientation of the wearable sensor were fixed, and consequently, SiSt and StSi PTs are expected to be observed in the frontal and vertical axes. In addition, falls are also expected to change measurements in one or two axes. Hence, the maximum harmonic from each axis and the different statistics computed, such as skewness, kurtosis, entropy and mean, characterise whether a PT or a different activity has been produced. The axis integral and the signal magnitude area indicate the energy of the movement, which also helps to distinguish whether a posture transition or a more energetic activity was produced. Finally, similarities among axes are computed, based on differences among means and correlation coefficients, since differences may be expected in posture transitions given the different movements observed in each axis.

One of the most important barometer features described in Table 3 is the average value of the signal, which is related to the altitude. The remaining features aim to represent the changes in altitude during the time window, which is useful to identify PTs and falls. Another relevant feature is the linear regression slope, which identifies the tendency of the altitude change. The frequency at which the maximum harmonic is found indicates whether changes in the barometer signal are driven by noise (medium–high frequencies) or by a human movement (low frequency). The range and the maximum and minimum values also make it possible to identify altitude changes, and combining them with the signal slope can help to identify noise.
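A few of the barometer features named above can be computed per window as in the sketch below. It is an illustration only: the complete feature list is given in Table 3, the function name `barometer_features` and the dictionary keys are hypothetical, the pressure unit (hPa) is assumed, and the frequency of the maximum harmonic is omitted for brevity.

```python
def barometer_features(window, fs_hz=40.0):
    """Mean, extrema, range and least-squares slope of one pressure window."""
    n = len(window)
    if n < 2:
        raise ValueError("need at least two samples to fit a slope")
    mean = sum(window) / n
    t = [i / fs_hz for i in range(n)]  # time axis in seconds
    t_mean = sum(t) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (bi - mean) for ti, bi in zip(t, window))
    return {
        "mean": mean,                        # related to absolute altitude
        "max": max(window),
        "min": min(window),
        "range": max(window) - min(window),  # size of the pressure change
        "slope": sxy / sxx,                  # pressure units per second
    }


# Dummy window: pressure falling steadily, as when the sensor moves upwards.
ramp = [1013.25 - 0.02 * (i / 40.0) for i in range(128)]
features = barometer_features(ramp)
```

A negative slope with a matching range suggests a sustained rise of the sensor, whereas a large range with a near-zero slope is more typical of a transient peak.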

Given that the barometer signal is prone to sudden changes due to small pressure variations, such as those caused by closing a door or window or by going from one room to another, we do not consider that a barometer can be used without an accelerometer. In this paper, the barometer is used to contextualise the information from the accelerometer, helping to classify some posture transitions and, given the nature of its operation, to reduce the computational burden.

Once feature extraction has been executed, a feature selection algorithm is used to rank the features. The rationale is that several features will not provide important information: some will provide noise and some will be redundant. Thus, we propose to use ReliefF [41] to estimate the quality of each feature. ReliefF iteratively selects a random instance from the training set and compares it with its k-nearest neighbours from the same class and from the other classes. The quality estimate of a feature is penalised when its value differs between the selected instance and its same-class neighbours, and it is rewarded when its value differs between the selected instance and the neighbours from the remaining classes. In this work, the quality estimate provided by ReliefF is used to rank the set of features from higher quality to lower quality.
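The ranking idea can be illustrated with a simplified ReliefF variant. This sketch is not the implementation used in the paper: it visits every instance once instead of sampling at random, uses a Manhattan distance on range-normalised features, and the function name `relieff` and the toy data are assumptions made for the example.

```python
def relieff(X, y, k=3):
    """Simplified ReliefF weights (every instance visited once, no sampling)."""
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]  # avoid division by zero

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    w = [0.0] * d
    for i in range(n):
        for c in classes:
            # k nearest neighbours of instance i inside class c (excluding i)
            idx = sorted((m for m in range(n) if m != i and y[m] == c),
                         key=lambda m: dist(X[i], X[m]))[:k]
            if not idx:
                continue
            if c == y[i]:
                scale = -1.0                          # hits: penalise differences
            else:
                scale = prior[c] / (1 - prior[y[i]])  # misses: reward differences
            for j in range(d):
                w[j] += scale * sum(diff(j, X[i], X[m]) for m in idx) / (len(idx) * n)
    return w


# Toy data: feature 0 separates the two classes, feature 1 is uninformative.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff(X, y, k=2)
ranking = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
```

On this toy data the discriminative feature receives the larger weight and is ranked first; sorting the weights in descending order gives the ranked feature list described in the text.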

After using the ReliefF algorithm, a table of sorted features is obtained, ranked from the feature that best characterises the desired output to the least important one. However, ReliefF does not analyse feature redundancy. Consequently, the list may contain several features that provide the same or very similar information; therefore, redundant features must be removed to prevent classifiers from using repeated information. In this context, we use the absolute value of the Pearson correlation coefficient as a score that reflects the redundancy between two features. Whenever two features have an absolute correlation above 0.8, the lower-ranked one is removed from the analysis and the higher-ranked one remains in the data set. Finally, a set of non-redundant features is obtained, each with a score indicating its importance regarding the main goal of the algorithm.
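The redundancy filter can be sketched as a greedy pass over the ranked list. This is one plausible reading of the procedure, stated as an assumption: a feature is kept only if its absolute Pearson correlation with every already-kept (higher-ranked) feature does not exceed 0.8. The names `pearson` and `drop_redundant` and the toy columns are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0  # constant feature: treated as uncorrelated (assumption)
    return cov / math.sqrt(va * vb)


def drop_redundant(columns, ranking, threshold=0.8):
    """Walk the ranking best-first, keeping features not redundant with kept ones."""
    kept = []
    for j in ranking:
        if all(abs(pearson(columns[j], columns[i])) <= threshold for i in kept):
            kept.append(j)
    return kept


# Toy feature columns: feature 1 is almost a scaled copy of feature 0.
columns = {
    0: [1.0, 2.0, 3.0, 4.0, 5.0],
    1: [2.0, 4.0, 6.0, 8.0, 10.5],
    2: [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept_a = drop_redundant(columns, ranking=[0, 1, 2])
kept_b = drop_redundant(columns, ranking=[1, 0, 2])
```

The two calls show that the outcome depends on the ranking: of the two correlated features, whichever is ranked higher survives.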

3.4 Classifiers

Once features are extracted, ranked and those redundant are removed, they are used as the input of a supervised learning classifier with the aim of obtaining a robust detector. Note that 4 experiments are performed:

  • Classifier for sit to stand/stand to sit with accelerometer features (C2outA)

  • Classifier for sit to stand/stand to sit with accelerometer and barometer features (C2outAB)

  • Classifier for sit to stand/stand to sit/falls with accelerometer features (C3outA)

  • Classifier for sit to stand/stand to sit/falls with accelerometer and barometer features (C3outAB)

Furthermore, each classifier is trained 30 times, yielding 30 models built with 1–30 features, respectively, with the aim of analysing both the performance and the computational burden of each classifier. For example, the first model is built with the best ranked feature; then, the second one is trained with the first and the second best features. Performance is expected to increase as the classifier employs more features. However, the performance of the algorithm might reach a plateau or barely grow after a certain number of features; beyond this point, the classifier is receiving non-significant information.

With the aim of testing different classifiers and analysing which one is more appropriate to the classification task, several classifiers have been employed. Table 4 reports the list of used algorithms and the hyper-parameters optimised by tenfold cross-validation.

Table 4 List of classifiers and hyper-parameters tested

Given that the algorithm needs several features to provide a significant performance, we considered different standard classifiers that are supported by WEKA [42]. We have not tested k-NN, decision trees or Naïve Bayes since, according to many previous works, they do not provide the highest performance results. Thus, support vector machines (SVM), logistic regression, multilayer perceptron and random forest have been considered the most suitable candidates to achieve the highest performance in detecting falls and posture transitions. Hyper-parameter optimisation consists in testing different values and identifying the most suitable combination by means of cross-validation. The values tested for the SVM hyper-parameters cover different powers of 10 in a grid search, which is common practice for optimising SVM hyper-parameters. Random forests are evaluated with different numbers of trees up to 100, which suits the number of patterns in the database obtained, as the results show. Finally, neural networks are tested with up to two layers and 10 neurons, which empirically provides enough capacity to learn the patterns, i.e. a small training error. In the case of support vector machines, optimal values for the cost and gamma parameters are found before training each of the 10 models used to evaluate the testing data. The optimal values are those, among the tested ones, that maximise the accuracy of a tenfold cross-validation over the training set; gamma is tuned only in the case of the RBF kernel. The trained model is built with the corresponding number of best ranked features. At the end of this process, we obtain a group of optimal values (cost and gamma) for each of the four classifiers (C2outA, C2outAB, C3outA, and C3outAB).

In order to evaluate the complexity of the resulting models, we also analyse the number of support vectors required to classify a new pattern. This value is directly proportional to the computational resources that a wearable device would require to classify a new signal window in real time. Thus, reducing this number leads to a longer battery life.

Finally, to assess each model as a function of both the number of features used in training and its accuracy, we use the geometric mean between sensitivity and specificity as the criterion for selecting an optimal model.

Given the maximum geometric mean achieved for each of the four classifiers, we select two models. The first one is the model with the fewest features that surpasses 95% of the maximum geometric mean obtained in any of the 30 models. The second one is obtained with the same method but selecting the model that surpasses 98% of the maximum geometric mean value. The four selected models are then evaluated on the remaining 3 participants of the data collection.
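The selection rule can be written compactly. The sketch below is illustrative: the helper names are hypothetical, the validation scores are made-up numbers rather than results from this study, and reaching the threshold exactly is treated as sufficient (an assumption about how "surpasses" is applied).

```python
import math


def geometric_mean(sensitivity, specificity):
    """Geometric mean between sensitivity and specificity."""
    return math.sqrt(sensitivity * specificity)


def smallest_model_reaching(gm_by_n_features, fraction):
    """Fewest features whose GM reaches `fraction` of the best GM observed."""
    target = fraction * max(gm_by_n_features.values())
    return min(n for n, gm in gm_by_n_features.items() if gm >= target)


# Hypothetical validation GM per number of features (illustration only).
gm_scores = {1: 0.70, 2: 0.80, 3: 0.88, 4: 0.90, 5: 0.92, 6: 0.915}
n95 = smallest_model_reaching(gm_scores, 0.95)
n98 = smallest_model_reaching(gm_scores, 0.98)
```

With these made-up scores the 95% criterion is met with fewer features than the 98% criterion, which is the trade-off between model size and performance that the two selected models are meant to expose.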

The experiments have been performed in MATLAB®, including WEKA libraries so that all the different classifiers are computed under the same conditions [42].

4 Results

This section reports the results obtained in the algorithms as well as some intermediate results that are achieved while extracting and selecting features.

4.1 Classifier C2outA and C2outAB results

From the 80 features obtained from the accelerometer signals and the 9 from the barometer signal, i.e. an 89-feature set mixing barometer and accelerometer features, we remove those that are highly correlated (Pearson coefficient > 0.8) in order to avoid redundancy. In this way, 41 features are removed from the accelerometer-only set and 51 from the combined accelerometer and barometer set. The most relevant non-redundant features among the original set of 89 features are reported in Table 5.

Table 5 List of the most valuable features from the accelerometer and barometer

In Table 5, feature ranking shows that 4 out of 10 features come from the barometer signal, and moreover, among the 4 most valuable features to classify sit-to-stand and stand-to-sit transitions, we find 3 features that belong to the barometer signal.

The tenfold cross-validation applied to the classifiers in Table 4 yields sensitivity and specificity values. In the case of SVM, we also record the number of support vectors and the memory used by the model. Figure 4 depicts these values as a function of the number of features. It is observed that using the barometer increases both sensitivity and specificity for the same number of features; at the same time, it also reduces the number of support vectors.

Fig. 4 Results obtained for the 30 models of the C2outA and C2outAB classifiers. The top left plot shows the performance: sensitivity, specificity and geometric mean of the C2outA outcomes are plotted with dashed lines, and those of C2outAB with continuous lines. The top right plot shows the number of support vectors (dashed line for the C2outA classifier, continuous line for the C2outAB classifier). The bottom plot depicts the memory employed, which is the number of support vectors multiplied by the number of features (dashed line for the C2outA classifier, continuous line for the C2outAB classifier)

More concretely, Fig. 4 shows that C2outA needs more than 15 features to reach a geometric mean between specificity and sensitivity (GM) of 0.85. By contrast, when barometric features are employed, only three features are needed to achieve a GM over 0.9. It is also important to note that the number of support vectors is reduced: while the C2outA classifier needs 125 support vectors, the C2outAB classifier only needs 100 support vectors with 5 features.

Similar trends were observed with the other models. For some algorithms, accelerometer features alone yield a performance similar to that of the C2outAB classifier, but only when many features are used.

Table 6 reports the average GM obtained from the tenfold cross-validation of the 30 models performed 10 times over the training data set. An enhancement between the classifiers trained with accelerometer features and combining accelerometer and barometer features is observed.

Table 6 Validation average of the geometric mean along the 10 training repetitions with different random training sets and over the different number of features (C2out classifiers)

These results do not show the best model, since they are the average of several results comprising different numbers of features. However, this table illustrates the enhancement achieved by using barometric features. There is an increment of between 7 and 11% in the GM of sensitivity and specificity, showing that the barometer signal provides useful information and that combining it with accelerometer features enhances the detection of posture transitions.

Once the different models have been obtained, we evaluate them over the three remaining participants, i.e. testing performances are obtained. As reported in Sect. 3.3, we propose two different models. The first is the model that achieves 95% of the maximum validation value obtained with the 30 models. The second is the model that reaches 98% of the maximum validation value. A further constraint is the use of the same number of features in the C2outA and C2outAB models. Thus, both models must surpass the corresponding percentage of the maximum performance achieved in validation.

In this way, with 15 features, both the C2outA and C2outAB models reach 95% of the maximum performance achieved. With 21 features, both models reach 98% of the maximum performance achieved with the 30 models.
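This selection rule can be sketched as follows. The sketch assumes that each classifier is compared against its own maximum validation GM and that the smallest number of features satisfying both constraints is chosen; the text does not spell out either detail. The function name and the validation curves are hypothetical.

```python
def smallest_feature_count(gm_a, gm_ab, fraction):
    """Smallest number of features at which both validation curves
    reach `fraction` of their own maximum.

    gm_a[i] and gm_ab[i] are the validation GM of the accelerometer-only
    and accelerometer+barometer models trained with i + 1 features.
    Returns None if no feature count satisfies both constraints.
    """
    target_a = fraction * max(gm_a)
    target_ab = fraction * max(gm_ab)
    for i, (a, ab) in enumerate(zip(gm_a, gm_ab)):
        if a >= target_a and ab >= target_ab:
            return i + 1
    return None


# Hypothetical validation curves for 1..6 features (illustration only).
gm_a = [0.60, 0.70, 0.78, 0.84, 0.86, 0.88]
gm_ab = [0.80, 0.88, 0.90, 0.91, 0.92, 0.92]

n_95 = smallest_feature_count(gm_a, gm_ab, 0.95)
n_98 = smallest_feature_count(gm_a, gm_ab, 0.98)
```

With these toy curves the accelerometer-only model is the limiting one, which mirrors the behaviour described for Fig. 4, where the barometer model reaches a high GM with far fewer features.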

In the evaluation phase, two previous classifiers have been computed as presented in [43]. The first one (C1) detects a posture transition or fall and rejects any other movement (walking, going upstairs, sitting, etc.). This classifier achieved an accuracy of 0.975, which makes it reliable enough to be used in this problem. The second classifier (C2) determines whether the movement is a fall or a SiSt/StSi transition. This classifier obtained an accuracy of 0.951.

Table 7 reports the average performance obtained with all the classifiers over the testing set, evaluated 10 times.

Table 7 Average results of the performance obtained with all classifiers over the testing set

The results show a slight improvement for the C2outAB classifier; however, they do not show a significant improvement with regard to the number of features employed: similar results are obtained with 15 and with 21 features. Regarding the type of classifier, results vary depending on the evaluated set, ranging from 80 to 97% for some SVMs with any kernel, MLP and random forest. However, random forest with 1 tree and logistic regression tend to present poor results, not surpassing 80% on any occasion. There is no evidence that some classifiers work better than others, and no conclusion can be established in this regard. Nevertheless, for the same number of features, results obtained with barometer features have always been better, both in the training phase and in the evaluation phase.

4.2 Classifier C3outA and C3outAB results

This subsection reports the results obtained using the C3outA and C3outAB classifiers. In this case, we have three classes: sit to stand, stand to sit and falls. Unlike the previous case, which was a binary classification problem, we propose to use the same classifiers but to train them by means of a “one vs all” approach. Thus, three results are obtained, and we report their average in order to simplify the description.
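As an illustration of how three “one vs all” results can be reduced to a single figure, the sketch below computes the GM of sensitivity and specificity for each class against the other two and averages the three values. Averaging the per-class GM (rather than sensitivity and specificity separately) is an assumption, and the labels and predictions are hypothetical.

```python
import math

CLASSES = ("SiSt", "StSi", "fall")


def one_vs_all_gm(y_true, y_pred, positive):
    """GM of sensitivity and specificity with `positive` as the positive
    class and every other class merged into the negative class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return math.sqrt(tp / (tp + fn) * tn / (tn + fp))


def average_gm(y_true, y_pred):
    """Mean of the three one-vs-all GM values."""
    return sum(one_vs_all_gm(y_true, y_pred, c) for c in CLASSES) / len(CLASSES)


# Hypothetical labels: one SiSt transition is mistaken for StSi.
y_true = ["SiSt", "SiSt", "StSi", "StSi", "fall", "fall"]
y_pred = ["SiSt", "StSi", "StSi", "StSi", "fall", "fall"]

avg = average_gm(y_true, y_pred)
```

In this toy case the single error lowers the sensitivity of the SiSt problem and the specificity of the StSi problem, while the fall problem is unaffected.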

In the feature selection process, we remove up to 42 redundant features for the C3outA classifier and 49 features for the C3outAB classifier. As in the previous section, the performance of several algorithms is analysed by varying the number of features from 1 to 30. It is also interesting to examine which accelerometer and barometer features rank highest. Table 8 reports the 10 most significant non-redundant features according to the ReliefF algorithm.

Table 8 List of the most valuable non-redundant features from the accelerometer and barometer

As in the previous subsection, 3 of the 10 most significant features come from the barometer; moreover, 2 of the 3 most valuable features belong to the barometer signal. Although the barometer seems to be less important than in the previous problem, it is still crucial to include it in a problem involving a change of altitude.

We then find the optimal values of cost and gamma for the SVM algorithms. In this case, we obtain the following values: cost = 10,000 and gamma = 0.001.

The classifiers are trained 30 times over 10 different random training sets. The behaviour is very similar to that shown in Fig. 4. In this case too, the performance with barometric features improves on that obtained using accelerometer signals only. The number of features needed to obtain high-performance results is also lower when barometer features are included.

Table 9 shows the results obtained by averaging over the 30 models (one per number of features) and the 10 repetitions performed with different training data sets.

As expected, the results are worse than those obtained with the 2-output classifier from the previous subsection. However, although the highest average GM is 0.8828, a GM over 0.95 has been obtained with models trained with more than 20 features. The reduction in performance is explained by the need to include more features in order to characterise the 3-output problem, which is more complex than the one addressed by the C2outA and C2outAB classifiers.

Regarding the increment in performance between C3outA and C3outAB, a significant improvement between the 2 models is found: the combination of barometer and accelerometer features provides a lower error. However, given the complexity of the problem, as seen in Table 9, the barometer is not as crucial as it is in Sect. 4.1.

Table 9 Validation average of the geometric mean obtained with 10 different random training sets and with the 30 models regarding the number of features (C3out classifiers)

After obtaining the models, we evaluate the classifiers with the testing data, i.e. with unseen data from the remaining three users. We select two different models for each of the two classifiers following the methods described in Sect. 3.3. In this case, with 17 features we reach 95% of the maximum performance value. The second model uses 25 features in order to achieve 98% of the maximum performance.

To evaluate the model, we compute the results from the outcomes of the trained classifier but, first, we always execute classifier C1 [43] with the aim of removing those movements that are not falls or SiSt/StSi transitions. Table 10 shows the average results of all the classifiers evaluated over 10 different evaluation sets.
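The cascade described above can be sketched as follows. The sketch only illustrates the control flow: `toy_c1` and `toy_c3` are hypothetical stand-ins with made-up thresholds and feature names (`acc_peak`, `pressure_delta`); they are not the trained C1 and C3 models, and the `"other"` label for rejected movements is likewise an assumption.

```python
def cascade_predict(window, c1, c3):
    """Run C1 first; only windows it accepts reach the 3-output model."""
    if not c1(window):
        return "other"  # rejected: walking, going upstairs, etc.
    return c3(window)   # "SiSt", "StSi" or "fall"


# Toy stand-ins with made-up thresholds; they are NOT the trained models.
def toy_c1(window):
    return window["acc_peak"] > 1.5


def toy_c3(window):
    if window["acc_peak"] > 3.0:
        return "fall"
    # Pressure rises when altitude drops, e.g. when sitting down.
    return "StSi" if window["pressure_delta"] > 0 else "SiSt"


# Hypothetical feature windows, for illustration only.
windows = [
    {"acc_peak": 1.0, "pressure_delta": 0.00},
    {"acc_peak": 2.0, "pressure_delta": 0.04},
    {"acc_peak": 2.0, "pressure_delta": -0.04},
    {"acc_peak": 4.0, "pressure_delta": 0.08},
]
labels = [cascade_predict(w, toy_c1, toy_c3) for w in windows]
```

Placing C1 in front means that errors of the first stage propagate: a transition rejected by C1 can never be recovered by the second classifier, so the reported testing performance reflects the whole chain.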

Table 10 Average results of the performance obtained with all classifiers over the testing set

Results show that there is no improvement when selecting the 98% model instead of the 95% model, meaning that with 17 features we obtain results similar to those obtained with 25 features. Results improve when barometer features are used in addition to accelerometer features. The improvements are not large, but they indicate that barometer features always enhance the results of the algorithm. Moreover, we find some classifiers (SVM and random forest) that surpass a geometric mean of 90% in the validation results, while only one case surpasses this threshold using accelerometer features alone. These results suggest that the enhancement obtained with barometers is evident and consistent.

4.3 Comparison to previous works

The results presented in the previous subsections show an improvement in performance of up to 11% when using the barometer. In the previous works described in Sect. 2, an improvement of up to 20% was reported as a result of using barometers. However, these previous studies used only threshold-based classifiers and comprised only falls and postural transitions; i.e. other activities were not included in the training database. In contrast, our current work includes further activities in the analysis that may provoke false positives and reduce the resulting specificities and accuracies. To this end, machine learning classifiers were employed in this work. In summary, it is observed that barometers indeed improve the detection of posture transitions and falls, although to a lesser extent than in some previous works, given the harsher conditions we have employed.

Regarding the complexity of the detection algorithms, we could not find any previous work analysing the computational resources used. This is a contribution of the present work, which shows that the use of barometers also reduces the computational burden and thus makes it possible to extend the battery life of wearable devices.

5 Conclusions

The use of small portable devices in medical applications provides valuable information to clinicians for monitoring the evolution of movement disorders, the symptoms of a disease or the progress of a rehabilitation in real-life environments, without the need for visits to the doctor’s office. The economic impact of this monitoring on health systems could be noteworthy, and the enhancement of algorithms to detect daily living activities or symptoms of a disease is crucial to achieving reliable clinical monitoring. These small and portable devices based on inertial systems provide information on the patient’s movement, which is useful for diseases such as Parkinson’s disease, Alzheimer’s disease or epilepsy, among others.

Barometers are starting to be employed as a tool to detect minimal changes in altitude. This sensor has been incorporated into human movement recognition only gradually, since it can produce false positives due to temperature and pressure changes caused, for example, by door or window openings. Consequently, a detection algorithm should not depend solely on barometer features, since they are also sensitive to small pressure changes that do not involve a change of altitude. For this reason, barometric features must be understood as a complement that improves algorithms based on other sensors. In this way, their use with classical devices, such as accelerometers, can considerably enhance the detection of postures or activities, as demonstrated in this research study.

In this paper, we have presented an algorithm that combines accelerometer features with different barometer features. We compare two versions of an algorithm to classify posture transitions and falls, one with accelerometer features only and one with accelerometer and barometer features. Throughout all the phases of the method, we have tested and shown that the contextualisation provided by the barometer significantly improves the outcomes of the human activity recognition classifier. In the evaluation phase as well, we have obtained better results with several classifiers when using barometer features as inputs of the proposed classifier.

The evidence is very clear, but larger databases and specific problems have to be tested, for example in rehabilitation, chronic disease monitoring or frailty. The use of barometers opens up the possibility of greatly enhancing the performance of several current algorithms that are based on complex structures. As reported in Fig. 4, barometer features also reduce the computational burden, allowing faster and better algorithms. Finally, new approaches may be investigated, such as the use of deep learning or of new kernels, in order to enhance the performance of the classifier [44].