Abstract
This paper presents a human gait data collection for analysis and activity recognition consisting of continuous recordings of combined activities, such as walking, running, taking the stairs up and down, sitting down, and so on; the recorded data are segmented and annotated. Data were collected from a body sensor network consisting of six wearable inertial sensors (accelerometer and gyroscope) located on the right and left thighs, shins, and feet. Additionally, two electromyography sensors were placed on the quadriceps (front thigh) to measure muscle activity. This database can be used not only for activity recognition but also for studying how activities are performed and how the parts of the legs move relative to each other. The data can therefore be used (a) in health-care-related studies, such as walking rehabilitation or Parkinson’s disease recognition, (b) in virtual reality and gaming for simulating humanoid motion, or (c) in humanoid robotics to model humanoid walking. This dataset is the first of its kind to provide human gait data in such detail. The database is available free of charge at https://github.com/romanchereshnev/HuGaDB.
1 Introduction
The increasing availability of wearable body sensors has led to novel scientific studies and industrial applications [1]. The main areas include gesture recognition, human activity recognition, and human gait analysis. Several databases have been released for benchmarking; however, owing to the wide variety of sensor types and the complexity of activities, these databases are rather distinct. We now review these areas and the corresponding databases in a taxonomic manner.
Gesture recognition (GR) mainly focuses on recognizing hand-drawn gestures in the air. Patterns to be recognized may include numbers, circles, boxes, or Latin alphabet letters. Prediction is usually made on data obtained from smartphone sensors or some special gloves equipped with kinematic sensors, such as 3-axis accelerometers, 3-axis gyroscopes, and occasionally electromyography (EMG) sensors, to measure the electrical potential on the human skin during muscular activities [2]. A database for gesture recognition is available in [3].
Human activity recognition (HAR), on the other hand, aims at recognizing daily lifestyle activities. For instance, an interesting research topic is recognizing activities in or around the kitchen, such as cooking; loading the dishwasher or washing machine; preparing brownies or salads; scrambling eggs; light cleaning; opening or closing drawers, the fridge, or doors; and so on. Often these activities can be interrupted by, for example, answering the phone. Databases on this topic include the MIT Place dataset [4, 5], the Darmstadt Daily Routine dataset [6], Ambient Kitchen [7], the CMU Multi-Modal Activity Database (CMU-MMAC) [8], and the Opportunity dataset [9, 10]. In this topic, on-body inertial sensors are usually worn on the wrist, back, or ankle; however, additional sensors are also used, such as temperature, proximity, water-consumption, and heart-rate sensors. For instance, CMU-MMAC includes videos, audio, RFID tags, a motion capture system based on on-body markers, and physiological sensors such as galvanic skin response (GSR) and skin temperature, located on both forearms and upper arms, the left and right calves and thighs, the abdomen, and the wrists.
Other types of HAR usually focus on walking-related activities, such as walking, jogging, turning left or right, jumping, lying down, going up or down the stairs, and so on. Data on this topic can be found in the WARD dataset [11], PAMAP2 dataset [12, 13], HASC challenge [14,15,16], USC-HAD [17, 18], and MAREA [19]. For data collection, on-body sensors are often placed on the participant’s wrist, waist, ankles, and back.
In some databases, exceptional efforts are taken to provide a reliable benchmark. The Body Sensor Networks Conference (BSNC) (http://bsncontest.org) [20], for instance, has carried out a contest in which the organizers provided three different datasets from different research groups. The databases differ in the sensor types used and the activities recorded. Another team, Evaluating Ambient Assisted Living Systems through Competitive Benchmarking – Activity Recognition (EvAAL-AR), provides a service to evaluate HAR systems live on the same activity scenarios performed by an actor [21]. In this contest, each team brings its own activity recognition system, and the evaluation criteria attempt to capture practical usability: recognition accuracy, user acceptance, recognition delay, installation complexity, and interoperability with ambient-assisted living systems.
Gait analysis focuses not only on recognizing the activities observed but also on how the activities are performed. This can be useful in health-care systems for monitoring patients recovering after surgery, for fall detection, or for diagnosing the state of, for example, Parkinson’s disease [22, 23]. For instance, the Daphnet Gait dataset (DG) [24] consists of recordings of 10 participants affected by Parkinson’s disease who were instructed to carry out activities that are likely to be difficult to perform, such as walking. The objective is to detect these incidents from accelerometer data recorded from above the ankle, above the knee, and on the trunk. Bovi et al., on the other hand, provide a gait dataset collected from 40 healthy people of various ages as a reference dataset [25]. In the aforementioned BSNC, the third database (ID:IC) contains gait data recorded before knee surgery and 1, 3, 6, 12, and 24 weeks after it.
2 Motivation and Design Goals
The main purpose of this dataset is to provide detailed gait data to study how the parts of the legs move individually and relative to each other during activities such as walking, running, standing up, and so on. A summary of the activities can be found in Table 1. This dataset contains continuous recordings of combinations of activities, and the data are segmented and annotated with the label of the activity currently performed. Thus, this dataset is also suitable for analyzing human gait and the transitions between activities.
Mainly inertial sensors were used for data acquisition. We chose inertial sensors because they are inexpensive, simple to use both indoors and outdoors, and widely available compared with other systems. Video-based motion capture systems, for instance, require expensive video cameras and a special full bodysuit with markers on it; in addition, they are restricted to the installed test area, are sensitive to lighting, and suffer from the lost-marker phenomenon.
In total, six inertial sensors were placed on the right and left thighs, shins, and feet, and data were collected from 18 healthy participants, providing a total of 10 h of recording. This allows one to investigate how the parts of the legs move individually and relative to each other within and between activities. Our dataset could be used as control data, for instance, in health-care-related studies such as walking rehabilitation or Parkinson’s disease recognition. In virtual reality or gaming, our dataset can be used to model virtual human movements by reconstructing the leg movements from the accelerometer data through integration. In fact, it is not limited to virtual environments: it could also be used to train humanoid robots to walk and move in a more humanlike way and thus help cope with the uncanny valley.
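Reconstructing movement from acceleration amounts to integrating twice: once to obtain velocity and once more for displacement. A minimal sketch of this idea is given below using trapezoidal cumulative integration; note that in practice sensor bias makes naive double integration drift, so real applications need drift correction, which this sketch omits.

```python
import numpy as np

def integrate_twice(acc, fs):
    """Double-integrate an acceleration signal (m/s^2) sampled at fs Hz
    into displacement (m), using cumulative trapezoidal integration and
    assuming zero initial velocity and position."""
    dt = 1.0 / fs
    vel = np.concatenate(([0.0], np.cumsum((acc[1:] + acc[:-1]) * 0.5 * dt)))
    pos = np.concatenate(([0.0], np.cumsum((vel[1:] + vel[:-1]) * 0.5 * dt)))
    return pos

# Sanity check: constant 2 m/s^2 for 1 s gives displacement a*t^2/2 = 1 m.
fs = 1000
acc = np.full(fs + 1, 2.0)
print(round(integrate_twice(acc, fs)[-1], 3))  # → 1.0
```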
This dataset is unique in the sense that it is the first to provide human gait data in great detail mainly from inertial sensors and contains segmented annotations for studying the transition between different activities.
3 Data Collection and Sensor Network Topology
In data collection, we used MPU9250 inertial sensors and electromyography (EMG) sensors. Each EMG sensor has a voltage gain of about 5000 and a band-pass filter whose bandwidth corresponds to the power spectrum of EMG (10–500 Hz). The sample rate of each EMG channel is 1.0 kHz, the ADC resolution is 8 bits, and the input voltage range is 0–5 V. Each inertial sensor consists of a 3-axis accelerometer and a 3-axis gyroscope integrated into a single chip. Data were collected with the accelerometer’s range set to \({\pm }2\) g with sensitivity 16.384 LSB/g and the gyroscope’s range set to \(\pm 2000^{\circ }/\)s with sensitivity 16.4 LSB\(/^{\circ }/\)s. All sensors are powered from a battery, which helps minimize electrical grid noise.
Accelerometer and gyroscope signals were stored in int16 format, and EMG signals in uint8. Accelerometer data can therefore be converted to m/s\(^2\) by dividing the raw value by 32768 and multiplying it by 2 g. Raw gyroscope data can be converted to \( ^{\circ }/\)s by multiplying by 2000/32768. Raw EMG data can be converted to volts by multiplying by 0.001/255. We kept the raw data in our data collection in case one prefers other normalization techniques.
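The conversions above can be sketched as small helpers; the function names are hypothetical, and the constants are taken directly from the sensor ranges and scaling factors stated above.

```python
G = 9.81  # standard gravity, m/s^2

def acc_to_ms2(raw):
    """int16 accelerometer sample -> m/s^2 (±2 g full scale)."""
    return raw / 32768.0 * 2 * G

def gyro_to_dps(raw):
    """int16 gyroscope sample -> degrees per second (±2000 °/s full scale)."""
    return raw * 2000.0 / 32768.0

def emg_to_volts(raw):
    """uint8 EMG sample -> volts, using the paper's 0.001/255 factor."""
    return raw * 0.001 / 255.0

print(acc_to_ms2(16384))   # half of full scale -> 1 g -> 9.81
print(gyro_to_dps(16384))  # half of full scale -> 1000.0
```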
In total, three pairs of inertial sensors and one pair of EMG sensors were installed symmetrically on the right and left legs with elastic bands. One pair of inertial sensors was installed over the rectus femoris muscle, 5 cm above the knee; another pair around the middle of the shinbone, at the level where the calf ends; and a third pair on the feet, over the metatarsal bones. The two EMG sensors were placed on the vastus lateralis and attached to the skin with three electrodes each. The locations of the sensors are shown in Fig. 1. In total, 38 signals were collected: 36 from the inertial sensors and 2 from the EMG sensors.
The sensors were connected by wires to each other and to a microcontroller box, which contained an Arduino electronics platform with a Bluetooth module. The microcontroller collected 56.3500 samples per second on average, with a standard deviation (std) of 3.2057, and transmitted them to a laptop over a Bluetooth connection.
The data were collected from 18 participants. These participants were healthy young adults: 4 females and 14 males, with an average age of 23.67 (std: 3.69) years, an average height of 179.06 (std: 9.85) cm, and an average weight of 73.44 (std: 16.67) kg.
The participants performed combinations of activities at a normal pace in a casual manner, and no obstacles were placed in their way. For instance, a participant was instructed to perform the following sequence, starting from a sitting position: sitting - standing up - walking - going up the stairs - walking - sitting down. The experimenter recorded the data continuously on a laptop and annotated them with the activities performed. This provided us with long, continuous sequences of segmented data annotated with activities. We developed our own data-collection program. In total, 2,111,962 samples were collected from the 18 participants, providing a total of 10 h of data.
Data acquisition was carried out mainly inside a building; however, activities such as running, bicycling, and sitting in a car were performed outside. We also collected data in a moving elevator and vehicle. In these scenarios, the activities performed were simply standing or sitting; however, the vehicle motion exerts forces on the accelerometer sensors, and in certain applications it may be important to take this into account. Note that we did not collect data on a treadmill.
4 Data Format
Data obtained from the sensors were stored in flat text files. We decided to store the data in flat files because they have one of the most universal formats, and they can be easily preprocessed in all programming languages on every system. One data file contains one recording, which is either a single activity (e.g., walking) or a series of activities. Every file name was created according to the template HGD_vX_ACT_PR_CNT.txt. HGD is a prefix that means human gait data and vX means the version of the data files, currently v1. ACT is a variable, and it denotes the activity ID that was performed. If a file contains a series of different types of activities, then it is indicated as VARIOUS. PR indicates the ID of the person who performed the activity. Data recording was repeated a few times, and CNT is a counter for this. For example, a file named HGD_v1_walking_17_02.txt contains data from participant 17 while he was walking for the second time. The file naming convention is summarized in Table 2.
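The HGD_vX_ACT_PR_CNT.txt naming scheme described above can be parsed mechanically; a sketch with a hypothetical helper is given below.

```python
import re

# Regular expression for the file-name template HGD_vX_ACT_PR_CNT.txt
# described above (ACT may itself be VARIOUS for mixed recordings).
PATTERN = re.compile(
    r"HGD_v(?P<version>\d+)_(?P<activity>.+)_(?P<participant>\d+)_(?P<count>\d+)\.txt"
)

def parse_name(fname):
    """Split a HuGaDB file name into its version, activity, participant
    ID, and repetition counter."""
    m = PATTERN.fullmatch(fname)
    if m is None:
        raise ValueError(f"not a HuGaDB file name: {fname}")
    return m.groupdict()

# The example from the text: participant 17 walking for the second time.
print(parse_name("HGD_v1_walking_17_02.txt"))
```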
The main body of the data files contains tab-delimited raw, unnormalized data obtained from the sensors directly. Each data file starts with a header, which contains metainformation. It summarizes the list of activities, the IDs of the activities recorded, and the time and date of the recording. This is summarized in Table 3.
The main data body of every file has 39 columns. Each column corresponds to a sensor, and each row corresponds to a sample. The order of the columns is fixed. The first 36 columns correspond to the inertial sensors, the next 2 columns correspond to the EMG sensors, and the last column contains the activity ID. The activities are coded as shown in Table 1. The inertial sensors are listed in the following order: right foot (RF), right shin (RS), right thigh (RT), left foot (LF), left shin (LS), and left thigh (LT), followed by right EMG (R) and left EMG (L). Each inertial sensor produces three acceleration signals along the x, y, and z axes and three gyroscope signals along the x, y, and z axes. For instance, the column named ‘RT_acc_z’ contains data obtained from the z-axis of the accelerometer located on the right thigh.
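The fixed column order can be reconstructed programmatically; the sketch below assumes the per-channel suffixes follow the ‘RT_acc_z’ pattern from the text, and the names chosen for the EMG and activity columns are illustrative assumptions.

```python
# Sensor order and per-sensor channels as described in the text.
sensors = ["RF", "RS", "RT", "LF", "LS", "LT"]
channels = ["acc_x", "acc_y", "acc_z", "gyro_x", "gyro_y", "gyro_z"]

# 36 inertial columns, then the two EMG columns and the activity ID
# (the last three names are assumptions for illustration).
columns = [f"{s}_{c}" for s in sensors for c in channels] + ["EMG_R", "EMG_L", "act"]

print(len(columns))               # → 39
print(columns.index("RT_acc_z"))  # → 14 (index of the right-thigh z-axis accelerometer)
```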
Sample data with respect to the activities are visualized through a heat map representation in Fig. 2.
A screenshot of part of a data file can be seen in Fig. 3.
The data files can be loaded easily in most popular programming languages. For instance, they can be loaded in Python using the NumPy library, or in Matlab with a one-line load command.
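A minimal Python loading sketch is given below. The number of header lines to skip is an assumption here (four, matching the metainformation block described above); adjust it to the actual header length of the files. The demo builds a tiny synthetic file so the sketch is self-contained.

```python
import numpy as np
import os
import tempfile

def load_hugadb(path, header_lines=4):
    """Load the tab-delimited data body of a HuGaDB file as a NumPy array,
    skipping the metainformation header (header size is an assumption)."""
    return np.loadtxt(path, delimiter="\t", skiprows=header_lines)

# Demo with a synthetic file: 4 fake header lines, two 39-column samples.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("#header\n" * 4)
    for _ in range(2):
        f.write("\t".join(["0"] * 39) + "\n")
    path = f.name

data = load_hugadb(path)
print(data.shape)  # → (2, 39)
os.remove(path)
```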
We have prepared a script to load the data into SQLite database, which is available at the database’s website: https://github.com/romanchereshnev/HuGaDB/blob/master/Scripts/create_db.py.
5 Discussion on Data Variance
We were interested in the variance of the collected data, in particular (A) within a single user and (B) between several users. For this reason, Fig. 4 plots the x-axis acceleration data from the thigh recorded during a short two-to-three-step walk. Panel A shows data from several recordings performed by the same user. The variance at any single frame is quite low, suggesting that a person performs activities in a very similar way each time. Panel B, on the other hand, shows data obtained from six different, randomly chosen users. Here, a much higher variance can be seen in the same frames compared with the previous case. The increased variance may arise from several factors, including differences in gait, differences in leg shape, sensors mounted in slightly different positions, etc. We reached similar conclusions for data obtained from different sensors during different activities. We note that even higher variance was observed in the EMG data, resulting from differences in the electrical conduction characteristics of the skin, skin thickness, etc.
Given the high data variance between different users, we emphasize the importance of properly evaluating machine learning methods developed for human activity recognition. We therefore propose using the supervised cross-validation approach for constructing training and test sets [26]. In this approach, all the data from one designated user are held out for testing only, and the data from the other 17 participants are used for training. This provides a reliable estimate of how an activity recognition system would perform on a new user whose data it has never seen.
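The leave-one-subject-out scheme described above can be sketched as a small split generator (the function name is hypothetical):

```python
import numpy as np

def loso_splits(subject_ids):
    """Yield (train_indices, test_indices) pairs where each split holds
    out all samples of one subject for testing and trains on the rest."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test_mask = subject_ids == s
        yield np.where(~test_mask)[0], np.where(test_mask)[0]

# Toy example: five samples from three participants.
ids = [1, 1, 2, 2, 3]
for train_idx, test_idx in loso_splits(ids):
    print(list(train_idx), list(test_idx))
```

Running this on the toy IDs prints three splits, one per participant; with HuGaDB's 18 participants the same generator yields 18 train/test splits.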
Variance can also arise from using different brands of sensors. Unfortunately, we did not have the capacity to collect data from several sensor brands. We expect the measurement noise to be small in general and that different sensors can be calibrated to be compatible with one another.
6 Availability
The database is available free of charge at https://github.com/romanchereshnev/HuGaDB (455 MB).
7 Summary
The HuGaDB dataset contains detailed kinematic data for human gait analysis and activity recognition. It differs from previously published datasets in that it provides human gait data in great detail, mainly from inertial sensors, and contains segmented annotations for studying the transitions between different activities. Data were obtained from 18 participants, providing around 10 h of recording in total. This dataset can be used in health-care-related studies, such as walking rehabilitation, or for modeling human movements in virtual reality or humanoid robotics. The dataset will be updated with data from new participants in the future.
References
Aggarwal, C.C.: Managing and Mining Sensor Data. Springer Science & Business Media, New York (2013). https://doi.org/10.1007/978-1-4614-6309-2
Amma, C., Georgi, M., Schultz, T.: Airwriting: a wearable handwriting recognition system. Pers. Ubiquit. Comput. 18(1), 191–203 (2014)
Georgi, M., Amma, C., Schultz, T.: Recognizing hand and finger gestures with IMU based motion and EMG based muscle activity sensing. In: Proceedings of the International Conference on Bio-inspired Systems and Signal Processing, pp. 99–108 (2015)
Tapia, E.M., Intille, S.S., Lopez, L., Larson, K.: The design of a portable kit of wireless sensors for naturalistic data collection. In: Fishkin, K.P., Schiele, B., Nixon, P., Quigley, A. (eds.) Pervasive 2006. LNCS, vol. 3968, pp. 117–134. Springer, Heidelberg (2006). https://doi.org/10.1007/11748625_8
Intille, S.S., Larson, K., Beaudin, J., Nawyn, J., Tapia, E.M., Kaushik, P.: A living laboratory for the design and evaluation of ubiquitous computing technologies. In: CHI 2005 Extended Abstracts on Human Factors in Computing Systems, pp. 1941–1944. ACM (2005)
Huynh, T., Fritz, M., Schiele, B.: Discovery of activity patterns using topic models. In: Proceedings of the 10th International Conference on Ubiquitous Computing, pp. 10–19. ACM (2008)
Pham, C., Olivier, P.: Slice&Dice: recognizing food preparation activities using embedded accelerometers. In: Tscheligi, M., et al. (eds.) European Conference on Ambient Intelligence, AmI 2009. LNCS, vol. 5859, pp. 34–43. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-05408-2_4
De la Torre, F., Hodgins, J., Bargteil, A., Martin, X., Macey, J., Collado, A., Beltran, P.: Guide to the Carnegie Mellon University multimodal activity (CMU-MMAC) database, p. 135. Robotics Institute (2008)
Chavarriaga, R., Sagha, H., Calatroni, A., Digumarti, S.T., Tröster, G., del R. Millán, J., Roggen, D.: The opportunity challenge: a benchmark database for on-body sensor-based activity recognition. Pattern Recogn. Lett. (2013)
Sagha, H., Digumarti, S.T., Millán, J.d.R., Chavarriaga, R., Calatroni, A., Roggen, D., Tröster, G.: Benchmarking classification techniques using the opportunity human activity dataset. In: 2011 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 36–40. IEEE (2011)
Yang, A.Y., Kuryloski, P., Bajcsy, R.: WARD: a wearable action recognition database (2009)
Reiss, A., Stricker, D.: Introducing a new benchmarked dataset for activity monitoring. In: 2012 16th International Symposium on Wearable Computers, pp. 108–109. IEEE (2012)
Reiss, A., Stricker, D.: Creating and benchmarking a new dataset for physical activity monitoring. In: Proceedings of the 5th International Conference on Pervasive Technologies Related to Assistive Environments, 40 pages. ACM (2012)
Kawaguchi, N., Ogawa, N., Iwasaki, Y., Kaji, K., Terada, T., Murao, K., Inoue, S., Kawahara, Y., Sumi, Y., Nishio, N.: HASC challenge: gathering large scale human activity corpus for the real-world activity understandings. In: Proceedings of the 2nd Augmented Human International Conference, 27 pages. ACM (2011)
Kawaguchi, N., Watanabe, H., Yang, T., Ogawa, N., Iwasaki, Y., Kaji, K., Terada, T., Murao, K., Hada, H., Inoue, S., et al.: HASC2012corpus: large scale human activity corpus and its application. In: Proceedings of the IPSN, vol. 12 (2012)
Kawaguchi, N., Yang, Y., Yang, T., Ogawa, N., Iwasaki, Y., Kaji, K., Terada, T., Murao, K., Inoue, S., Kawahara, Y., et al.: HASC2011corpus: towards the common ground of human activity recognition. In: Proceedings of the 13th International Conference on Ubiquitous Computing, pp. 571–572. ACM (2011)
Zhang, M., Sawchuk, A.A.: USC-HAD: a daily activity dataset for ubiquitous activity recognition using wearable sensors. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pp. 1036–1043. ACM (2012)
Zhang, M., Sawchuk, A.A.: Human daily activity recognition with sparse representation using wearable sensors. IEEE J. Biomed. Health Inform. 17(3), 553–560 (2013)
Khandelwal, S., Wickström, N.: Evaluation of the performance of accelerometer-based gait event detection algorithms in different real-world scenarios using the Marea gait database. Gait Posture 51, 84–90 (2017)
Giuberti, M., Ferrari, G.: Simple and robust BSN-based activity classification: winning the first BSN contest. In: Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies, 34 pages. ACM (2011)
Gjoreski, H., Kozina, S., Gams, M., Lustrek, M., Álvarez-García, J.A., Hong, J.H., Dey, A.K., Bocca, M., Patwari, N.: Competitive live evaluations of activity-recognition systems. IEEE Pervasive Comput. 14(1), 70–77 (2015)
Sant’Anna, A., Salarian, A., Wickstrom, N.: A new measure of movement symmetry in early Parkinson’s disease patients using symbolic processing of inertial sensor data. IEEE Trans. Biomed. Eng. 58(7), 2127–2135 (2011)
Sant’Anna, A.: A symbolic approach to human motion analysis using inertial sensors: framework and gait analysis study. Ph.D. thesis, Halmstad University (2012)
Bachlin, M., Roggen, D., Troster, G., Plotnik, M., Inbar, N., Meidan, I., Herman, T., Brozgol, M., Shaviv, E., Giladi, N., et al.: Potentials of enhanced context awareness in wearable assistants for Parkinson’s disease patients with the freezing of gait syndrome. In: 2009 International Symposium on Wearable Computers, pp. 123–130. IEEE (2009)
Bovi, G., Rabuffetti, M., Mazzoleni, P., Ferrarin, M.: A multiple-task gait analysis approach: kinematic, kinetic and emg reference data for healthy young and adult subjects. Gait Posture 33(1), 6–13 (2011)
Kertész-Farkas, A., Dhir, S., Sonego, P., Pacurar, M., Netoteia, S., Nijveen, H., Kuzniar, A., Leunissen, J.A., Kocsor, A., Pongor, S.: Benchmarking protein classification algorithms via supervised cross-validation. J. Biochem. Biophys. Methods 70(6), 1215–1223 (2008)
© 2018 Springer International Publishing AG
Chereshnev, R., Kertész-Farkas, A. (2018). HuGaDB: Human Gait Database for Activity Recognition from Wearable Inertial Sensor Networks. In: van der Aalst, W., et al. Analysis of Images, Social Networks and Texts. AIST 2017. Lecture Notes in Computer Science, vol. 10716. Springer, Cham. https://doi.org/10.1007/978-3-319-73013-4_12
Print ISBN: 978-3-319-73012-7
Online ISBN: 978-3-319-73013-4