1 Introduction

In the 1990s, cheap motion sensors together with low-power, compact processing and wireless communication capabilities started to become available. This led to the idea of using such sensors for in-field (also called “in the wild”) capture of human motion in terms of 3D kinematics [Pic17] and for recognition of general human activity [Sch99, Bao04]. Driven by continuous reductions in cost, size, and power consumption, as well as integration into accessories and smart textiles [Zhe14], this technology has found widespread use in many areas where “in vivo” information gathering is important, ranging from healthcare and sports [Shu14, Won15, Che16, Men16, Ios16] through industrial ergonomics and workflow analysis [Vig13, Won15, Ble15a] to human-computer interaction and robotics, e.g., [Tag14], to name some prominent examples.

On an abstract level, two general approaches can be identified for deriving different types of information from wearable sensor networks [Lop16]. The first approach focuses on the estimation of 3D joint kinematics, which in essence amounts to capturing the poses (orientations, positions), (angular) velocities, and (angular) accelerations of each relevant body part (body segment or joint). Here, the goal is to enable personalized biomechanical analyses outside the lab and at relatively low cost, but with accuracy comparable to laboratory-based gold standard systems (e.g., [Sut02]). This approach generally relies on inertial measurement units (IMUs) in combination with model-based sensor fusion algorithms, e.g., [Mie16]. It uses physical and biomechanical models and is independent of training data. State-of-the-art methods typically require one IMU on each body segment that is to be captured. In other words, the price for accurate motion estimation (suitable for biomechanical analyses) is a potentially large number of sensors and possibly strict placement and attachment constraints.

The second approach abstracts from the capture of exact body motions and focuses on using machine learning techniques to build statistical models of relevant activities based on signals from fewer sensors, in particular sensors placed at fewer (often just one) body locations. Here, lower accuracy and level of detail, dependence on training data, and the “black box” statistical character of the model are the price paid for a less obtrusive, easier-to-deploy system. While IMUs also play an important role in this approach, they are often complemented by other sensors, ranging from microphones, textile stretch sensors, capacitive body sensors, pressure sensors, and ultrasonic sensors to eye trackers and wearable cameras [Shu14, Won15, Ble15a, Pap17].

The following sections describe the individual working principles, challenges, and potential applications of these two approaches, followed by their existing and potential synergies at the method and application levels. Finally, different aspects of how the technology can be beneficially used in the context of support systems are summarized.

2 IMU Based 3D Kinematics Estimation

2.1 Working Principles

In the area of IMU based kinematics estimation, the motions of a person are approximated through the motion of a known biomechanical model that is driven by noisy and biased IMU measurements (angular velocities, accelerations, and in most cases also magnetic fields) through a stochastic sensor fusion algorithm. This is in contrast to optical gold standard systems, e.g., [Sut02], where the 3D positions of reflective markers precisely placed on anatomical landmarks are measured directly, and joint centers and angles are geometrically derived from these [Lea07].

The biomechanical model typically consists of rigid segments, approximating the human bones. These are connected through joints that can optionally be constrained regarding their degrees of freedom (DoF). Besides a personalized biomechanical model (e.g., in terms of segment lengths), the reconstruction of biomechanically valuable joint kinematics data requires knowledge about the relative transformations between IMU and segment coordinate systems, the so-called IMU-to-segment calibrations. Figure 1 illustrates the above-mentioned aspects.

Fig. 1

Lower body 3D kinematics estimation. Left: setup with seven IMUs on the feet, lower and upper legs, and pelvis, as well as reflective markers according to Leardini et al. [Lea07]. Right: biomechanical model of the lower body with connected segments (magenta lines), joint centers (red spheres), four contact points on each foot (green spheres), IMU placement, and involved coordinate frames. A technical coordinate system is associated with each IMU (I). The segment coordinate systems (S) are drawn at the proximal ends of the segments. The six degrees of freedom (DoF) transformations, each in terms of an orientation (quaternion) qSI and a translation tSI, between the IMU coordinate frames and the associated segment coordinate frames are called IMU-to-segment calibrations. One such calibration is shown at the right thigh. The symbol G denotes the global coordinate system. The figures are taken from [Ble17, Mie17]

A sensor fusion algorithm here denotes the combination of a set of stochastic equations describing the estimation problem, often called a state-space model, and an estimation method for solving this problem. The state-space model defines (1) the variables (states) of interest, i.e., the segment kinematics or joint angles, (2) the evolution of these variables over time (motion models), i.e., difference equations based on assumptions on how the human body moves, and (3) how the measurements relate to these variables (measurement models), i.e., forward kinematic equations that relate the motion of the biomechanical model to the IMU measurements. This information is often combined with further constraints from the biomechanical model, such as limited joint DoFs and ranges of motion, to restrict the solution space. For IMU based kinematics estimation the resulting estimation problem is nonlinear. Methods to solve this problem (based on noisy data and uncertain assumptions) typically utilize Bayesian inference, where a nonlinear maximum a posteriori estimate can be found in multiple ways [Thr05, Gus12], e.g., via an extended Kalman filter (EKF), which works based on a predictor-corrector scheme, or via sliding-window/moving-horizon (nonlinear weighted least squares) optimization, to name two online-capable approaches. Figure 2 illustrates these two approaches.

Fig. 2

Illustration of an extended Kalman filter (upper row) and a sliding-window optimization (lower row) based solution to IMU based kinematics estimation of an upper limb. On the right side, a general state-space formulation with motion and measurement models for the prediction and correction steps of a recursive filter, as well as a weighted least squares cost function with hard constraints for batchwise numerical optimization, are indicated
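
To make the predictor-corrector idea concrete, the following is a minimal Python sketch of an orientation filter for a single segment: the gyroscope drives the prediction step, and the accelerometer's gravity measurement drives the correction step. It is not the method of any work cited above; for brevity, the Kalman gain and covariance bookkeeping are replaced by a fixed blending gain (making this strictly a complementary filter with the same predict/correct structure), and all variable names, sampling rates, and gains are illustrative assumptions.

```python
import numpy as np

def quat_mult(q, r):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q."""
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mult(quat_mult(q, np.concatenate(([0.0], v))), q_conj)[1:]

def predict(q, gyro, dt):
    """Prediction step: integrate the body-frame angular rate (rad/s)."""
    dq = np.concatenate(([1.0], 0.5 * gyro * dt))   # small-angle quaternion
    q = quat_mult(q, dq)
    return q / np.linalg.norm(q)

def correct(q, acc, gain=0.02):
    """Correction step: nudge the estimate so that the predicted gravity
    direction matches the accelerometer reading (valid only when the
    external acceleration is small)."""
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    g_pred = rotate(q_conj, np.array([0.0, 0.0, 1.0]))  # gravity in sensor frame
    g_meas = acc / np.linalg.norm(acc)
    err = np.cross(g_meas, g_pred)                  # small rotation error
    dq = np.concatenate(([1.0], 0.5 * gain * err))
    q = quat_mult(q, dq)
    return q / np.linalg.norm(q)

# Hypothetical usage with a 100 Hz stream of (gyro, acc) samples:
# q = np.array([1.0, 0.0, 0.0, 0.0])
# for gyro, acc in imu_stream:
#     q = correct(predict(q, gyro, dt=0.01), acc)
```

Note that without a magnetometer (or the constraints discussed below), the heading component of such an estimate remains unobservable and drifts.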

2.2 Challenges and Solution Approaches

IMU based pose estimation typically suffers from integration drift. This is caused by integrating the noisy and biased gyroscope measurements to obtain orientation changes and by using the result to gravity-compensate the accelerometer measurements, which are then double integrated to obtain position changes [Kok16]. Orientation drift is often compensated for by additionally using magnetometers. These provide valuable orientation information as long as the magnetic field is homogeneous. However, this assumption is often violated, particularly in indoor environments [Lig16]. Therefore, recent research has addressed the development of new sensor fusion methods that can work without magnetometer information; e.g., Miezal et al. [Mie16] showed that a redundant biomechanical model definition combined with biomechanical and kinematic constraints, accounted for in an optimization-based state estimation method, yields lower orientation drift and higher tolerance to biomechanical model errors compared to a more classical kinematic chain and EKF based sensor fusion method. It has also been shown that pairwise kinematic constraints can reduce drift at the joints, e.g., [Wen15, Fas17]. To obtain long-term stable global heading orientation estimation (i.e., transversal plane rotation), additional sensors (e.g., cameras [Ble09]) or scenario-dependent assumptions (e.g., a reset pose or walking on a straight line) can be exploited.

Translation drift can be reduced through so-called zero velocity updates at stationary points on the biomechanical model, a well-known concept from the field of pedestrian dead-reckoning [Har13]; e.g., in [Mie17] a probabilistic kinematics-based ground contact estimation method using four contact points on an anatomically motivated foot model (see Fig. 1) was proposed, and in [Ble17] it was integrated with different sensor fusion methods. The results show significant drift reduction for different types of locomotion, such as walking, running, and jumping. To obtain long-term stable global translation estimation, again, additional sensors, such as cameras [Ble09], ultrawideband, or the global positioning system (GPS) [Hol11, Kok15], can be used. Another area of research addresses methods for obtaining valid and reliable IMU-to-segment calibration parameters (see Fig. 1). State-of-the-art procedures are based on the user performing predefined static poses or functional movements, which makes such systems harder to use and can favor human-induced errors (cf. [Bou15]). An emerging field of research is self-calibration methods, which determine the calibration parameters from sensor measurements without prior knowledge or assumptions about the performed movements; e.g., [See14] proposes a method for two linked segments, and [Tae16] presents a promising proof of principle for an online-capable calibration correction and segment kinematics estimation method for the lower limbs.
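
As an illustration of the zero-velocity-update idea, the following sketch implements the generic pedestrian dead-reckoning variant of the technique (not the probabilistic contact model of [Mie17]): stance phases are detected from simple angular-rate and acceleration-magnitude conditions, and the integrated velocity is reset there. The thresholds and the assumption that the acceleration has already been rotated into the world frame by an orientation filter are illustrative.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, 9.81])  # m/s^2, world z axis pointing up

def strapdown_with_zupt(acc_world, gyro, dt=0.01,
                        acc_tol=0.3, gyro_tol=0.5):
    """Double-integrate world-frame acceleration to position and apply
    zero-velocity updates (ZUPTs) during detected stance phases.

    acc_world: (N, 3) accelerometer data rotated into the world frame
               (still containing gravity), gyro: (N, 3) in rad/s.
    """
    vel = np.zeros(3)
    pos = np.zeros(3)
    positions = []
    for a_w, w in zip(acc_world, gyro):
        lin_acc = a_w - GRAVITY                 # remove gravity
        vel = vel + lin_acc * dt                # first integration
        # Stance detection: low angular rate and near-1g total acceleration
        stationary = (np.linalg.norm(w) < gyro_tol and
                      abs(np.linalg.norm(a_w) - 9.81) < acc_tol)
        if stationary:
            vel = np.zeros(3)                   # zero-velocity update
        pos = pos + vel * dt                    # second integration
        positions.append(pos.copy())
    return np.array(positions)
```

Without the ZUPT branch, the position error of such a double integration grows rapidly; resetting the velocity at every detected stance bounds the error per stride.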

2.3 Potential Applications

IMU based 3D kinematics estimation enables the in-field reconstruction of individual movement patterns and biomechanically interpretable data (such as joint ranges of motion, trajectories of joints or other anatomical landmarks, segment orientations, and spatiotemporal locomotion parameters). This ability is useful for a range of application areas. Some popular examples, together with the parameters of interest, are summarized in the following:

  • Clinical movement analysis [Che16, Ble17]. In numerous areas of medicine, quantitative movement analysis, e.g., gait analysis, has proven effective in supporting assessments, diagnoses, and therapies. Here, a lightweight and easy-to-use wearable measurement system could support functional diagnostics and valid follow-up and documentation in everyday clinical practice.

  • Rehabilitation exercises [Ble13, Lam15]. Exploiting its online processing capabilities, IMU based 3D kinematics estimation can also be used to promote and support self-training by providing patients with direct feedback on movement quality. This is often combined with game-based features to increase motivation (e.g., [Gor17, Ste17]).

  • Sports [Won15, Men16]. In-field capturing capabilities are of particular interest in the area of sports. Here, IMU based 3D kinematics estimation can be used, e.g., for analyzing and improving athletic performance (cf. [Ree16] for an analysis of running kinematics during a marathon) or for investigating and treating sport-specific injuries (e.g., assessing leg axis stability after an anterior cruciate ligament injury [Che16]).

  • Workflow analysis and assistance. In this area, IMU based 3D kinematics estimation can be used for different purposes, e.g., for: (1) designing and raising awareness of ergonomically safe workflows (see [Vig13] for an IMU based system providing real-time ergonomic feedback on hazardous postures), (2) providing user monitoring as an ingredient for building an intelligent workflow assistance system [Ble15a], (3) providing kinematic information to control wearable assistive devices, e.g., exoskeletons to support overhead work.

Other application areas for IMU based 3D kinematics estimation, which are only briefly mentioned here, are entertainment (e.g., gesture-based game control, animation of virtual characters) and robotics (e.g., teleoperation).

3 Human Activity Recognition Based on Multimodal Body Sensor Networks

3.1 Motivation

Moving away from the notion of having the exact trajectory of each body part as the starting point for activity analysis is justified by three considerations. First, many activities have a distinct motion signature that can be detected even by a single sensor at various body locations. As an example, consider step detection. On the one hand, tracking the trajectories of at least the upper and lower legs is needed for exact step analysis. On the other hand, the up-and-down motion associated with each step and the shock of the foot hitting the ground produce a distinct acceleration signature that can easily be detected at nearly all body locations (this is how commercial step counters work). Second, for many activities there are important sources of information beyond the tracking of body part kinematics. As an example, consider grasping an object and putting it on a table. The grasping motion of the fingers, including an estimate of the force, can be derived from the activity of the muscles in the lower arm, which in turn can be sensed using electromyography (EMG), capacitive sensing, or textile pressure sensing matrices. In addition, putting the object on the table often produces a characteristic sound which can be detected with a body-worn microphone. Given these additional sensing modalities, a rough estimate of the overall arm motion derived from a single wrist-worn IMU may be sufficient in terms of motion information. Third, there are many activities for which body motions (at least in terms of limb trajectories) are more or less irrelevant. Good examples are cognition-dominated activities such as reading, watching a movie, or having a conversation. In addition to obvious sources of information such as audio and first-person video, head motion patterns and eye tracking have been shown to be key sources of information for distinguishing such activities [Ish14].

3.2 Abstract Motion Signatures

As an example of an abstract motion signature, the acceleration signal produced by a sensor in a trouser side pocket while the user walks up and down stairs is shown in Fig. 3. Close inspection reveals an obvious structure in the data which can be mapped onto features of human steps: one part corresponds to the leg being put forward and one to the leg being pulled from behind. In the walking-downstairs case there are sharp peaks corresponding to the impact of the foot on the lower step. In the walking-up signal, soft peaks caused by the lifting and straightening of the leg can be seen. There are three main approaches for the automatic analysis of such signals.

Fig. 3

Signals from an acceleration sensor worn by a person in a trouser side pocket while taking two steps up (left) and down (right) the stairs

First, abstract features such as mean, variance, root mean square (RMS), and frequency distribution can be computed on a sliding window with a length corresponding to the time scale of the underlying activity; in the case of step analysis this is around one second (the typical step duration). The features are then fed into statistical classifiers such as Support Vector Machines (SVMs) or neural networks for the recognition of the underlying activities. For simple tasks such as recognizing modes of locomotion (walking, running, walking up/down stairs, etc.) such a simple approach is often sufficient.
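
A minimal sketch of this classical window-feature-classifier pipeline, using scikit-learn, is given below. The window length, feature set, and the labelled data structures (`recordings`, `new_recording`) are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def window_features(signal, fs=50, win_s=1.0):
    """Cut an (N, 3) acceleration signal into non-overlapping windows of
    roughly one step duration and compute simple per-axis features."""
    win = int(fs * win_s)
    feats = []
    for start in range(0, len(signal) - win + 1, win):
        w = signal[start:start + win]
        feats.append(np.concatenate([w.mean(axis=0),                  # mean
                                     w.var(axis=0),                   # variance
                                     np.sqrt((w**2).mean(axis=0))]))  # RMS
    return np.array(feats)

# 'recordings' is a hypothetical dict {activity_label: (N, 3) array}.
X_parts, y_parts = [], []
for label, sig in recordings.items():
    f = window_features(sig)
    X_parts.append(f)
    y_parts.extend([label] * len(f))
clf = SVC(kernel="rbf").fit(np.vstack(X_parts), y_parts)

# Classify the windows of a new, unlabelled recording:
predicted = clf.predict(window_features(new_recording))
```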

Second, probabilistic time series modeling methods can be applied to better capture the temporal characteristics of the activity in question. Traditionally, Hidden Markov Models (HMMs) [Moj12, Dav16] have been widely used for wearable activity recognition. Related methods are Conditional Random Fields and various other variants of Dynamic Bayesian Networks (of which HMMs are just a special case). In most cases a separate model is trained for each activity. For recognition, each of the trained models is applied to the signal and the one producing the highest probability is selected (i.e., the corresponding class is recognized). Such models have been used for the recognition of manipulative gestures (picking objects up, operating tools, eating, drinking, etc.) from wrist/arm-worn motion sensors, which are in general harder to separate than modes of locomotion [Jun08].
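
The per-class training and highest-likelihood selection could look as follows, sketched with the hmmlearn library; the number of hidden states and the data layout are illustrative assumptions, not taken from the cited works.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_per_class_hmms(data_by_class, n_states=5):
    """Fit one Gaussian HMM per activity class.

    data_by_class: {label: list of (T_i, d) feature sequences}.
    """
    models = {}
    for label, seqs in data_by_class.items():
        X = np.vstack(seqs)                      # concatenated sequences
        lengths = [len(s) for s in seqs]         # sequence boundaries
        m = GaussianHMM(n_components=n_states, covariance_type="diag")
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, seq):
    """Return the class whose model assigns the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(seq))
```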

Third, template matching methods such as Dynamic Time Warping (DTW) can be applied to detect characteristic signal parts [Pha10]. To this end, an “average” signal is computed from a large number of examples of each class. For recognition, the class whose template best matches the signal is selected.
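
For reference, a minimal textbook DTW distance and nearest-template classifier in plain NumPy (the quadratic dynamic-programming variant, without the pruning or averaging refinements a practical system would use):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance between
    two sequences (1-D or (T, d)) using Euclidean local costs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.atleast_1d(a[i - 1] - b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

def classify(templates, query):
    """templates: {label: template sequence}; pick the best-matching class."""
    return min(templates, key=lambda lbl: dtw_distance(templates[lbl], query))
```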

3.3 Combination of Abstract Motion and Other Information

When abstract motion signatures fail to provide sufficient discriminative power, additional sensing modalities can help. For many activities, sound is a very rich source of information. A good example is the use of tools such as a hammer, screwdriver, saw, or drill in a wood workshop task [War06]. The respective motions can be quite subtle and difficult to detect in a continuous stream of data. However, the activities have very distinct sounds associated with them. In general, frequency transformations on windows of anywhere between 100 ms and 1 s are used, followed by either linear discriminant analysis (LDA), principal component analysis (PCA), or the computation of standard frequency domain features such as frequency centroid, bandwidth, spectral rolloff frequency, band energy ratio, or cepstral coefficients (a minimal sketch of such a feature extraction is given after the list below). As shown in Fig. 4, differential sound intensity analysis can also be used to localize a sound’s source with respect to the user or even to different body parts. Examples of other relevant sensors that have been used in multimodal activity recognition are:

Fig. 4

Augmenting the recognition of wood workshop activities through sound processing [War06]. Top: using the sound intensity difference between a microphone on the wrist and one on the chest to identify sounds that originate close to the hand (e.g., from a machine that the hand is operating). Bottom left: using linear discriminant analysis (LDA) of the frequency distribution to discriminate between the sounds of different tools. Bottom right: the overall recognition architecture

  • Muscle activity monitoring [Ogr07, Amf06, Che12] using force sensitive resistors, textile pressure mats, or capacitive sensors. The basic idea is that muscle activity leads to shape changes on the surface of the corresponding body part. At the same time, looking at muscle activity can provide information that may be difficult to access using direct motion sensors such as IMUs. For example, the motion of the fingers and the palm is driven by muscles in the lower arm, where sensors can be mounted much more easily than on the fingers themselves.

  • Body sound. Many processes inside the human body create sounds that can be detected with appropriate wearable microphones. Examples range from muscle and joint motion through breathing, heartbeat, and coughing to chewing and swallowing. For example, an ear-worn microphone can reliably detect chewing, including the distinction between different types of food [Amf05].

  • Ultrasonic hand tracking [Ogr12]. The idea is that the location of the hand relative to the object on which the user is working is an important piece of information about the user’s activity. An alternative approach is to use a lower arm mounted radio-frequency identification (RFID) reader and RFID tags placed on objects.

  • Magnetic disturbances. When working with IMUs, magnetic fields and ferromagnetic objects in the environment are often seen as a problem, as they disturb the signal (cf. Sect. 2.2). However, such disturbances can also be a source of useful information: in general, different appliances and machines have unique magnetic signatures which can be used to recognize when a body-worn IMU is near them [Bah10].

  • Air pressure. While absolute air pressure depends on the weather and is not useful for activity recognition, fine air pressure variations can be used to detect changes in a sensor’s altitude corresponding to activities such as walking up or down stairs or even sitting down and standing up.

  • Furthermore, high-level background information such as location, credit card transactions, data from autonomous devices such as smart home components, power consumption in buildings, and the like can be used to enhance activity recognition.
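
Returning to the sound-based features mentioned before Fig. 4, the following is a minimal sketch of short-window spectral feature extraction (centroid, bandwidth, 95% rolloff) in plain NumPy; the sampling rate and window length are illustrative assumptions within the 100 ms to 1 s range named above.

```python
import numpy as np

def spectral_features(audio, fs=16000, win_s=0.25):
    """Compute simple frequency-domain features on short windows of a
    mono audio signal: spectral centroid, bandwidth, and 95% rolloff."""
    win = int(fs * win_s)
    feats = []
    for start in range(0, len(audio) - win + 1, win):
        frame = audio[start:start + win] * np.hanning(win)  # taper window
        mag = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(win, d=1.0 / fs)
        p = mag**2 / (np.sum(mag**2) + 1e-12)   # normalized power spectrum
        centroid = np.sum(freqs * p)            # spectral "center of mass"
        bandwidth = np.sqrt(np.sum((freqs - centroid)**2 * p))
        rolloff = freqs[np.searchsorted(np.cumsum(p), 0.95)]
        feats.append([centroid, bandwidth, rolloff])
    return np.array(feats)
```

Feature vectors of this kind can then be fed into the same classifiers as the motion features above, or into a fusion stage as sketched below.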

Since different sensors produce very different types of signals, feeding them into a single classifier seldom produces good results. Instead, hierarchical, multi-stage recognition architectures have been employed, where different sensors are used to classify different activity components and the final recognition is done with appropriate sensor fusion (see Fig. 4). Recently, deep learning systems have been shown to be able to replace such hand-crafted architectures with the automatic extraction of intermediate feature hierarchies [Ord16].
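
As one simple instance of such multi-stage fusion, per-modality classifiers can each output class probabilities that are then combined, e.g., by weighted averaging ("late fusion"). The sketch below illustrates only this final stage; the class names, probability values, and weights are purely hypothetical.

```python
import numpy as np

def late_fusion(prob_by_modality, weights, classes):
    """Fuse per-modality class probabilities by weighted averaging.

    prob_by_modality: {modality: (n_classes,) probability vector}
    weights:          {modality: scalar trust weight}
    """
    fused = sum(weights[m] * prob_by_modality[m] for m in prob_by_modality)
    fused = fused / sum(weights[m] for m in prob_by_modality)
    return classes[int(np.argmax(fused))]

# Hypothetical outputs of a motion and a sound classifier for one window:
classes = ["hammering", "sawing", "drilling"]
p_motion = np.array([0.5, 0.3, 0.2])
p_sound = np.array([0.2, 0.7, 0.1])
print(late_fusion({"motion": p_motion, "sound": p_sound},
                  {"motion": 0.4, "sound": 0.6}, classes))  # -> "sawing"
```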

Note that, as already mentioned in Sect. 1, the central differences between this approach and the previous one (IMU based 3D kinematics estimation) lie both in the level of detail with which information is reconstructed (recognized activities versus exact body motions) and in the type of methods and models used (black-box statistical models and machine learning algorithms, which depend on training data, versus model-based sensor fusion methods, which are generally applicable and provide biomechanically interpretable data, but typically impose stricter placement and attachment constraints).

4 Existing and Potential Synergies

While the two approaches (IMU based 3D kinematics estimation and general human activity recognition) have been presented separately in the above sections, there are indeed many existing and potential synergies, both at the method and at the application level. Some of these are briefly outlined in the following.

  • Sensor reduction. Consider the case where a full-body 3D kinematics reconstruction is needed, but mounting sensors on all body segments (e.g., 17 IMUs in commercially available systems [Xse17]) is infeasible. Here, large datasets of precisely captured motion (e.g., recorded with a full IMU setup or a gold standard capture system) have been combined with machine learning algorithms to reconstruct the full-body kinematics from a reduced number of IMUs; e.g., Tautges et al. [Tau11] use four accelerometers and Wouda et al. [Wou16] use five IMUs. In [Mar17], visually pleasing results were obtained with six IMUs using an offline global optimization approach together with kinematic constraints. Obviously, such approaches introduce a dataset dependence and come at the cost of reduced accuracy, which can, however, be sufficient for specific applications.

  • Automatic sensor assignment to body locations. Setting up a system with multiple IMUs can be error-prone regarding the correct placement of the IMUs on the different body segments. Here, machine learning algorithms [Kun05, Kun07, Kun14, Wee13, Zim18], as well as hierarchical construction-based methods [Gra16], have been applied to obtain an automatic assignment during a predefined movement (e.g., walking). This has been used as a pre-processing step for both IMU based 3D kinematics estimation and general activity recognition.

  • Reduction of soft tissue and clothing artifacts. Soft tissue [Lea05] and clothing artifacts affect all body-mounted measurement systems. They constitute a major source of error in both IMU based 3D kinematics estimation and activity recognition [Moh17]. Recent literature provides initial model-based [Kok14] and data-driven [Ols17] approaches to address soft tissue artifacts. Based on studies with optical markers attached to the skin [Cam12], Olsson and Halvorsen [Ols17] argue that a linear model is sufficient to compensate for soft tissue artifacts. In [Men15], soft tissue artifacts are compensated for in IMU based 3D kinematics estimation of the upper limbs via a linear regression approach that considers the total mass, fat mass, lean mass, and fat percentage of the person's body and arm; the regression is then used to obtain corrected estimates for planar arm movements. Integrating sensors into comfortable (i.e., not very tight) clothing (smart textiles) makes it more feasible to wear multiple sensors over a longer period of time. However, this also results in more severe artifacts. In [Moh17], a deep learning approach is proposed for increasing the signal-to-noise ratio for the case of IMUs integrated into a training suit.

  • Automatic segmentation of repetitive motions (e.g., rehabilitation exercises). In [Ble15b], a machine learning approach is used to segment the motion data obtained from IMU based 3D kinematics estimation in order to count and evaluate single exercise repetitions. Performing the segmentation on the level of reconstructed 3D kinematic data instead of on raw sensor signals has several advantages, e.g., biomechanically interpretable features and greater independence from the sensing hardware (cf. also [Ble15a]). A fusion with complementary sensors, such as pressure insoles or mobile force sensors, could enhance both biomechanical analysis (moving from kinematics to kinetics) and activity segmentation.

  • Locomotion analysis. In locomotion analysis, both kinematic (e.g., joint angles and segment orientations) and spatiotemporal parameters are of importance (cf. Sect. 2.3). While IMU based 3D kinematics estimation can immediately deliver the former, detection of the critical locomotion events (e.g., initial and terminal contact) is required for deducing the latter from the kinematics data; this is typically based on machine learning algorithms [Che16] (a minimal event detection sketch is given after this list). Note that in well-defined scenarios, such as walking straight on flat ground, spatiotemporal parameters have also been extracted solely with machine learning algorithms, e.g., from two shoe-mounted IMUs using deep convolutional neural networks [Han17].

  • Long-term context-sensitive biomechanical analysis, e.g., for the purpose of (medical) movement analysis in everyday life (instead of in a specific assessment situation) or for ergonomic feedback throughout the workday, could be a further scenario for a potential synergy. Here, classifications based on few sensors (to reduce energy consumption) could be used to trigger detailed biomechanical analyses with additional sensors only when relevant activities have been detected (e.g., normal walking, standing up/sitting down, or handling heavy weights).
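
As the event detection sketch referenced in the locomotion analysis item above: a simple threshold-based peak picker (rather than a learned model) can already illustrate how initial-contact events and step times are derived from a vertical foot acceleration signal. The thresholds, sampling rate, and signal conventions are illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def initial_contacts(foot_acc_vert, fs=100, min_height=20.0, min_step_s=0.4):
    """Detect initial-contact events as sharp peaks in the vertical foot
    acceleration (m/s^2) and derive step times from them.

    Returns event times in seconds and the intervals between them,
    a basic spatiotemporal locomotion parameter.
    """
    peaks, _ = find_peaks(foot_acc_vert,
                          height=min_height,              # impact magnitude
                          distance=int(min_step_s * fs))  # refractory period
    event_times = peaks / fs
    step_times = np.diff(event_times)
    return event_times, step_times
```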

5 Conclusion

Human motion capture and activity recognition using wearable sensor networks are enabling technologies that can enhance support/assistance systems in many application areas (e.g., healthcare and sports, workflow analysis, human-computer interaction, robotics, and entertainment, as exemplified in the above sections). Knowledge about a user's motion, activity, and possibly environment allows assistance systems and assistive devices to adapt to the user and his or her context. This can improve usability and usefulness; think, e.g., of Augmented Reality manuals that provide step-by-step guidance for manual workflows, exoskeletons for overhead work that regulate the amount of support based on the wearer's motions and activities, or leg prostheses that adapt their settings to best support the wearer's intended activity (e.g., standing up, climbing stairs). Another aspect concerns the improvement of human-machine interaction; e.g., in social robotics, knowledge about the human partner's motions or activities is essential for natural interactions. The provision of (online) feedback concerning a person's motion or activity is a further beneficial aspect; e.g., in motor learning in the context of rehabilitation or sports, feedback is both effective and motivating. An obvious aspect concerns the ability to provide in-field monitoring (either for online feedback or for documentation), e.g., of activities of daily living and specific movement patterns, but also of sleep, nutrition, or other body functions. These are all relevant in the health context, e.g., in telemedicine, home-based rehabilitation, or early diagnosis. Another area is ergonomics, where in-field biomechanical analyses can help to design ergonomically safe workflows and raise awareness of them. In all of these real-world examples, the respective support/assistance systems require, or at least benefit from, mobile (i.e., location-independent rather than tied to stationary hardware) information gathering, which wearable sensor networks can provide.