
1 Introduction

Currently, there are four main types of technology for achieving Human Activity Recognition (HAR): wearable sensors [16], radar [28], RF signals [30] and cameras [2]. Due to cost constraints, most applications mainly use the first two, sensor and radar hardware, to capture and identify human movements. Their principles differ: radar obtains the distance, speed and angle of an object by transmitting electromagnetic waves and receiving the echoes, whereas wearable sensors collect physical quantities of human movement, such as angular velocity [4], linear velocity relative to the ground [13], acceleration [3], vibration [29] and contact-based orientation [12].

These two technologies have their own advantages and disadvantages. In general, wearable sensors are low cost [11], easy to use [22] and less limited by usage scenarios [27]; they are now essential hardware for motion recognition on mobile portable devices. However, their range and accuracy are not as good as radar, because their performance is constrained by other hardware such as batteries and microprocessors. Radar can detect objects with higher precision through the principle of the Doppler shift [19]. Moreover, it is not constrained by battery power, so its hardware performance can be exploited more fully. However, radar is generally used in a fixed scenario and cannot be moved around quickly, which means the capture of the object's signal is easily affected by factors such as occlusion and a limited viewing angle.

Therefore, a new solution has been proposed that integrates the sensor and radar into a cooperative human motion perception system, in which the two modalities complement each other to achieve more stable and reliable human activity recognition [18]. After adopting the fusion approach, the sensor and radar acquire different perception information that complements each other well. However, contradictions may also arise under some special conditions. For example, the two individual signals sometimes yield conflicting activity recognition results because the acquired information they feed back is incompatible, and such conflicting information may affect the final recognition judgment [23]. Therefore, for the perception system to produce consistent and unambiguous activity results, the sensor and radar data must be fused by a suitable computation.

The basic principle of data fusion between sensor and radar signals is similar to the way the human brain integrates information from multiple organs such as the eyes, nose and ears. It integrates the data and information obtained by multiple pieces of hardware, computes the redundant and complementary information of the different data types in space or time, and then obtains a consistent description of the activity features. This extends early human activity recognition research on the fusion of data from multiple sensors [37]. On this basis, we propose a data fusion method for sensor and radar signals and process the fused data with neuromorphic computing to build a perception system for human activity recognition.

In recent years, multi-hardware data fusion technology has been applied more and more widely, in fields such as industrial control [5], robotics [21], object detection [20], traffic control [26], inertial navigation [31], remote sensing [32], medical diagnosis [38], image processing [33] and pattern recognition [15]. Compared with a single-sensor system, multi-hardware data fusion helps solve problems of detection, tracking and recognition. It can improve reliability and robustness, enhance the credibility and accuracy of the data, extend the temporal and spatial coverage of the entire system, and increase the real-time performance and utilisation of the data.

Among traditional machine learning methods [34], Muhammad et al. [24] proposed an ensemble data fusion scheme with a Random Forest algorithm to make predictions from multiple sensors. The sensor streams are integrated in a fog computing environment and processed in a decentralised manner, and after data fusion they are fed into a global classifier, achieving more than 90% average accuracy. Li et al. [17] used the Sequential Forward Selection method to fuse IMU and radar information into time series data, which serves as features to train Support Vector Machine (SVM) [36] and Artificial Neural Network (ANN) classifiers, improving accuracy by about 6% over a single type of data. In view of the uneven data quality of different hardware [35], Huang et al. [14] proposed three sparsity-invariant operations to exploit multi-scale features effectively. Their hierarchical multi-scale encoder-decoder network uses these sparsity-invariant operations to process sparse inputs and sparse feature maps from multiple hardware sources, so that the features of multiple sensors can be fused to further improve deep learning performance. However, multi-hardware data fusion requires the devices to run synchronously so that the time axes are unified in a common coordinate system. This places higher requirements on multi-modal data acquisition and is not conducive to the system's future expandability.

This paper explores a new multi-hardware data fusion method that uses IMU sensors and a Universal Software Radio Peripheral (USRP) for human motion recognition. Our approach constructs a feature matrix that fuses the information from different hardware into unified data. Signals from different hardware are difficult to match directly because their signal shapes differ for the same object; the mismatch in location and time axis is the main obstacle to such data fusion. To overcome this limitation, we construct matrices from feature vectors obtained by principal component analysis to combine the IMU and USRP signals, which facilitates multi-hardware data fusion and ultimately achieves better classification and recognition performance than traditional data fusion approaches.

The organization of this paper is as follows: In Sect. 2, we introduce the signal processing of the IMU and USRP hardware for capturing human motion and the details of the features used for data fusion. In Sect. 3, we present the feature matrix for data fusion based on the PCA algorithm. In Sect. 4, we quantitatively evaluate the recognition accuracy, discuss the potentials and limitations of the proposed data fusion method, and compare the results with related work. Finally, Sect. 5 summarizes this work and describes our future directions for data fusion across different hardware.

2 Materials and Methods

We collect human activity signals through an Inertial Measurement Unit (IMU) sensor [9] and a Universal Software Radio Peripheral (USRP) [10]. The IMU sensor is worn on the volunteer's wrist, while the USRP is fixed at a distance of 2 meters from the person to collect the electromagnetic signals. We tested a total of 20 volunteers performing two activities, sitting down and standing up, with three repetitions each. The raw signal data of the sensor and the USRP are shown in Fig. 1.

Fig. 1. The raw activity signal data from the IMU sensor and USRP.

2.1 IMU State Modeling

An Inertial Measurement Unit (IMU) is a device for measuring the attitude angle of an object. Its main components are a gyroscope, an accelerometer and a magnetometer. The gyroscope detects the angular velocity signals about the three degrees of freedom (X, Y and Z) in the navigation coordinate system, and the accelerometer measures the acceleration signals along the three independent axes (X, Y and Z) of the carrier coordinate system. The magnetometer obtains the surrounding magnetic field information; from the geomagnetic vector it can calculate the angle between the module and the north direction, which helps correct the angular velocity parameters of the gyroscope. The real-time three-dimensional angular velocity, acceleration and magnetic field outputs are then used to calculate the object's current posture.

The gyroscope measures angular velocity rather than angle directly, obtaining it through the Coriolis force. The Coriolis force arises from the inertia of a moving object: it describes the apparent deflection of a particle moving in a straight line within a rotating reference system. In the sensor, a proof mass keeps vibrating in one direction; when the device rotates, the angular velocity produces a Coriolis force in the perpendicular direction, which changes the capacitance difference. This change in capacitance is proportional to the rotation's angular velocity, so the rotation angle can be recovered from the corresponding capacitance. The principle of the accelerometer is more straightforward. It measures acceleration through the specific force, i.e. the non-gravitational force acting on a unit mass. A proof mass moves under the applied acceleration, and the capacitors on both sides measure the mass block's position, from which the magnitude of the acceleration is calculated. The magnetometer uses the Hall effect to measure the strength of the magnetic field: under the magnetic field's action, electrons are deflected in the perpendicular direction and generate an electric field on the side, so the magnetic strength can be measured indirectly from the strength and sign of this electric field. In summary, the IMU hardware's primary task is to obtain accurate activity information from the data of these three sensors.
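
As a hedged illustration of how the gyroscope and accelerometer outputs can be combined into a posture estimate, the following is a minimal complementary-filter sketch in Python; the function name, sampling interval and weighting factor are illustrative assumptions rather than the actual IMU firmware.

import numpy as np

def complementary_filter(gyro_rate, accel, prev_angle, dt=0.01, alpha=0.98):
    """Blend gyroscope and accelerometer readings into a pitch-angle estimate (rad)."""
    # Integrate the gyroscope angular velocity: accurate short-term, drifts long-term.
    angle_gyro = prev_angle + gyro_rate * dt
    # Derive pitch from the gravity direction: noisy short-term, stable long-term.
    ax, ay, az = accel
    angle_acc = np.arctan2(ax, np.sqrt(ay ** 2 + az ** 2))
    # Weighted blend of the two estimates.
    return alpha * angle_gyro + (1 - alpha) * angle_acc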

2.2 USRP State Modeling

The USRP is a device that enables radio frequency communication between a transmitter and a receiver antenna and allows a wide range of parameters to be defined in software. These devices are commonly used in research labs and universities. The device connects to a PC, where software is used to control the USRP hardware for transmitting and receiving RF signals.

In this paper the USRP is set up to communicate using Orthogonal Frequency-Division Multiplexing (OFDM). Channel estimation is an important feature of OFDM, as it monitors the state of the channel in order to improve performance. It does this by using a specified set of symbols known as pilot symbols. These symbols are included in the transmitted data, and once the receiver antenna receives the data, the received pilot symbols are compared with the expected pilot symbols, which provides details of the state of the channel.
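
To make the pilot comparison concrete, the following is a minimal least-squares sketch; the pilot values, pilot positions and 64-subcarrier layout are assumptions for illustration, not the exact USRP configuration used here.

import numpy as np

# Pilot symbols known to both sides (assumed BPSK pilots at four subcarriers).
tx_pilots = np.array([1, -1, 1, 1], dtype=complex)
pilot_idx = np.array([0, 21, 42, 63])

# Pilot symbols extracted from the received OFDM symbol after the FFT (example values).
rx_pilots = np.array([0.92 + 0.10j, -1.05 + 0.04j, 0.97 - 0.02j, 1.01 + 0.07j])

# Least-squares channel estimate at the pilot subcarriers.
h_pilots = rx_pilots / tx_pilots

# Interpolate across all 64 subcarriers to approximate the full channel response.
subcarriers = np.arange(64)
h_full = (np.interp(subcarriers, pilot_idx, h_pilots.real)
          + 1j * np.interp(subcarriers, pilot_idx, h_pilots.imag))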

This paper observes the channel state information while the activities of sitting, standing and no activity take place between the transmitter and receiver antennas. Several samples are collected, which can then be used in machine learning applications to see whether the patterns can be recognised. Each sample in this paper contains 64 subcarriers. An average of each subcarrier is taken, and this represents the signal propagation while an action takes place between the transmitter and receiver. Each activity takes place while the USRP is communicating between the transmitter and receiver for 5 s. The data are then stored in CSV format, and all the activities are compiled into a single dataset for processing in machine learning.
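
A rough sketch of how one such sample could be reduced and appended to the dataset is given below; the file names, array shape and label are illustrative assumptions.

import numpy as np
import pandas as pd

# csi: shape (n_time_steps, 64), one CSI amplitude per subcarrier per time step,
# recorded over the 5 s window of a single activity (file name is hypothetical).
csi = np.abs(np.load("sit_down_sample.npy"))

# Average each subcarrier over the window: 64 values summarising the propagation.
per_subcarrier_mean = csi.mean(axis=0)

# Append the averaged sample with its activity label to the combined CSV dataset.
row = pd.DataFrame([list(per_subcarrier_mean) + ["sit_down"]])
row.to_csv("activities.csv", mode="a", header=False, index=False)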

3 The Proposed Structure Matrix for Data Fusion

Data fusion exploits the comprehensive and complete information about the object and its environment obtained by multiple hardware devices, and its effectiveness is mainly determined by the data fusion algorithm. Therefore, the core of signal processing in a multi-hardware system is to construct a suitable data fusion algorithm. For multi-type sensor hardware, the acquired information is diverse and complex. The basic requirements for data fusion methods are robustness and parallel processing capability; there are also demands on the speed and accuracy of the method, on interface compatibility with the preceding preprocessing calculations and the subsequent recognition algorithms so that different technologies and methods can be coordinated, and on reducing the number of information samples required. In general, data fusion methods are based on nonlinear mathematical computing and can achieve fault tolerance, adaptability, associative memory and parallel processing.

The raw data for the two activities are obtained separately by each piece of hardware, so how to fuse the signal information from the two devices becomes the key question. The calculation cannot be performed directly on the raw signals because the different sampling rates make their time axes incompatible in a common coordinate system. Therefore, the raw data need their time dimension reduced before extracting activity features. The Principal Component Analysis (PCA) [1] algorithm, used for data dimension reduction, can produce time-independent activity features. Furthermore, after analysing the two types of signal, we find that the sensor data have more dimensions than the USRP data. Therefore, we designed a big sub-matrix to represent the sensor signal after dimension reduction and a small sub-matrix to represent the USRP signal after dimension reduction, as sketched below.
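
A minimal sketch of this per-modality reduction, assuming scikit-learn's PCA and illustrative component counts (the exact sub-matrix sizes are not specified here):

import numpy as np
from sklearn.decomposition import PCA

def build_submatrices(imu_data, usrp_data, k_imu=8, k_usrp=4):
    """Reduce each modality with PCA: a larger IMU sub-matrix and a smaller USRP one.

    imu_data  : array (n_imu_samples, n_imu_dims) of windowed IMU readings
    usrp_data : array (n_usrp_samples, 64) of averaged subcarrier traces
    """
    imu_sub = PCA(n_components=k_imu).fit_transform(imu_data)
    usrp_sub = PCA(n_components=k_usrp).fit_transform(usrp_data)
    return imu_sub, usrp_sub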

Fig. 2. The entire calculation workflow from feature extraction to human activity recognition.

Meanwhile, this design also facilitates subsequent machine learning, which requires normalising the fused data to obtain a standard feature map pattern [25] for each activity; the feature map template is then loaded into the neural network algorithm for training, which yields the combined matrix of the feature map pattern. The activity feature matrix of the sensor and the USRP is combined by directly arranging the rows of the previous sub-matrices. Finally, based on this training the neural network obtains the classification function for the two activities and acts as the classifier that outputs the recognition result. Figure 2 shows the entire calculation workflow from feature extraction to recognition.
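
To make the workflow in Fig. 2 concrete, the following sketch fuses the two sub-matrices and trains a two-layer ANN, assuming scikit-learn's MLPClassifier; the zero-padding, layer sizes and flattening step are our own assumptions, since the exact row-arrangement scheme is not spelled out above.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

def fuse_and_train(imu_subs, usrp_subs, labels):
    """imu_subs / usrp_subs: per-sample sub-matrices; labels: activity names."""
    features = []
    for imu_m, usrp_m in zip(imu_subs, usrp_subs):
        # Pad the narrower USRP sub-matrix to the IMU width, stack the rows of the
        # two sub-matrices, then flatten into a single fused feature vector.
        pad = imu_m.shape[1] - usrp_m.shape[1]
        usrp_padded = np.pad(usrp_m, ((0, 0), (0, pad)))
        features.append(np.vstack([imu_m, usrp_padded]).ravel())
    X = StandardScaler().fit_transform(np.array(features))   # normalise the fused data
    clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000)  # two-layer ANN
    clf.fit(X, labels)
    return clf, X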

3.1 Principal Component Analysis for Feature Extraction

The PCA algorithm for data dimensionality reduction computes the covariance matrix of the sample data and then solves for the eigenvalues and the corresponding eigenvectors of that covariance matrix. Arranging these eigenvectors in order of their eigenvalues, from large to small, forms a new projection matrix, which serves as the feature vector pattern after the sample data transformation. It maps n-dimensional features into a k-dimensional space of brand new orthogonal features, the principal components, which are k-dimensional features re-constructed from the original n dimensions. Keeping only the first k eigenvectors retains the essential features of the high-dimensional data while removing noise and unimportant interference factors, thereby improving the quality of the data information.
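
In code, assuming a samples-by-features data matrix, this procedure can be sketched with NumPy as follows:

import numpy as np

def pca_reduce(X, k):
    """Project n-dimensional samples onto their top-k principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)      # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues/eigenvectors (ascending)
    order = np.argsort(eigvals)[::-1]           # sort from large to small eigenvalue
    projection = eigvecs[:, order[:k]]          # projection matrix of top-k eigenvectors
    return X_centered @ projection              # re-constructed k-dimensional features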

4 Experimental Evaluation

Fig. 3. The ANN classification confusion matrices for the IMU sensor, the USRP, and the fused IMU-USRP data.

After PCA feature extraction of the hardware signals, the training and testing performance is evaluated with an Artificial Neural Network (ANN). Figure 3 illustrates the recognition accuracy obtained when the data are applied to the machine learning algorithm, evaluating the performance of each single modality and of the fused data in activity recognition. It shows the confusion matrices produced by a two-layer ANN classifier for the Sit Down and Stand Up activities. Following feature extraction with the PCA algorithm, the ANN reports the classification accuracy for each piece of hardware (IMU sensor and USRP) processed directly and for the fused features after data fusion.
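
A brief sketch of how such a confusion matrix can be produced, assuming the fused feature matrix X and label vector y built as in the Sect. 3 sketch and the same scikit-learn tooling (the split ratio is an assumption):

from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# X: fused, normalised feature vectors; y: activity labels ("sit_down" / "stand_up").
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y)

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print(confusion_matrix(y_test, y_pred, labels=["sit_down", "stand_up"]))
print("accuracy:", accuracy_score(y_test, y_pred))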

Table 1. Comparison table with other data fusion methods.

Comparing the classification performance of single-hardware data with that of the data fusion method under the same neural network algorithm, the designed fusion method increases the activity classification accuracy from 90.5% for the IMU sensor signal and 81.8% for the USRP signal alone to 99.2% after fusing the IMU and USRP data. This evaluation shows that constructing the matrix helps fuse data between different hardware and that the fused data yield higher accuracy. When the devices are used individually, the IMU sensor is more suitable for measuring human activity. Combining the IMU sensor and the USRP enriches the single signal type with more angles and dimensions for feature extraction.

Table 1 compares the accuracy against traditional machine learning algorithms and shows that the proposed data fusion method achieves better results. For example, Chen et al. worked on an IMU sensor with accelerometer and gyroscope and used traditional machine learning algorithms (K-Nearest Neighbour (KNN), Random Forest (RF) and Support Vector Machine (SVM)) to classify human activities. Chung et al. improved the data fusion method to suit a 9-axis IMU sensor (magnetometer, accelerometer and gyroscope) and obtained their results with an LSTM network. On a multi-hardware platform of Kinect, IMU and EMG, Calvo et al. implemented a Hidden Markov Model classifier to recognise human activity signals, and Zou et al. designed a Deep Neural Network (DNN) framework that ensembles C3D and Convolutional Neural Network (CNN) models to process fused data from WiFi-enabled IoT devices and a camera. Comparing accuracies, our implementation classifies more accurately than these approaches. We believe these recognition findings are preferable and demonstrate that the PCA-based fusion of multi-hardware signal features effectively recognises human behaviour. Furthermore, our proposed workflow has greater robustness and can adapt to more types of matrix data.

5 Conclusion

This paper proposes a data fusion design that simply and quickly fuses signal data from different hardware by constructing a matrix from the extracted features. From the experimental results, the accuracy of the fused data is on average about 13% higher than that of a single hardware signal under the same classification algorithm, finally achieving 99.2% classification accuracy on multi-hardware, multi-activity signals. This shows that the method can effectively fuse data between different hardware and help the different data types obtain more dimensional features without harming classification accuracy. These results demonstrate the potential of multi-sensor fusion in human activity recognition. For future work, a more intelligent algorithm will be deployed, with the aim of building a more flexible multi-hardware framework for different environments.