1 Introduction

Location-Based Services (LBS) applications have recently attracted growing attention in the research community. These services aim to assist customers and improve their satisfaction through better conceptualization and analysis of the actions and behaviours of both staff and customers [1]. LBS provide useful information about the current position of a person or object and can be categorized into outdoor and indoor systems. Among these applications, GPS-based positioning and navigation services have been deployed in outdoor environments. However, GPS requires a line of sight to GPS satellites. Because of this requirement, GPS fails in indoor environments, where walls and the indoor setup in general block the line of sight and cause multipath effects [2,3,4]. Hence, indoor wireless positioning is gaining momentum as a way to locate objects within a confined area. Localization techniques such as Wi-Fi [5, 6], Bluetooth [7, 8], and vision-based techniques [9] are used indoors. These techniques, in turn, come with their own pros and cons with regard to accuracy, energy consumption, and positioning delay [10].

Positioning methods are categorized as trilateration and fingerprinting. The former uses Time of Arrival (TOA), Angle of Arrival (AOA), or the Received Signal Strength Indicator (RSSI), whereas the latter uses RSSI to create a radio map and estimates the location by matching observed RSSI against it. The fingerprinting method uses a matching algorithm such as K-Nearest Neighbour (KNN), Decision Trees, or Naïve Bayes, amongst others, to estimate the location. Nevertheless, RSSI fluctuates because of multipath effects, interference, and shadowing [11], and without RSSI correction the localisation error of the matching algorithm is too high to meet the needs of indoor LBS.

In this study, we propose a model to correct localisation error based on the fingerprinting algorithm. Our technique computes the mean and standard deviation to correct the measured RSSI values. We provide comparative performance metrics of machine learning classification algorithms based on corrected RSSI and measured RSSI. Our results show an improvement in localization accuracy, with the misclassification rate dropping from 31.1% to 5.5% for the Kernel Naïve Bayes classifier and from 30.4% to 4.9% for the Fine Decision Tree. The results confirm that the proposed model is effective in reducing the localisation error, and that enhancing localization performance requires a filter that reduces RSSI measurement noise during the calibration phase of the fingerprinting algorithm.

The remainder of this paper is structured as follows: In Sect. 2, we present related work. In Sect. 3, we present the methodology followed to improve indoor location estimation. In Sect. 4, we present testing and results. Finally, we provide the conclusion and future work in Sect. 5.

2 Related Work

Over the past few years, several location tracking and monitoring models have been proposed to mitigate multipath effects, and acceptable results have been achieved to a certain extent. However, research continues into the multipath issue, which still affects localisation accuracy. The researchers in [9] proposed a computer vision application that detects and tracks humans in view of Closed-Circuit Television (CCTV) using face recognition. The study developed a MATLAB 2015b application and examined the distance at which a human being can be detected by the camera, the influence of light, and the number of faces that can be detected at a time. The authors in [9] found that the application could detect and recognise only one face at a time, in any lighting condition, and up to a distance of 300 cm from the CCTV camera. With limited coverage and shadowing effects as the number of faces increases, the multipath issue remains a concern, which our research seeks to address.

The authors in [12] used a Wi-Fi integrated Device-free localization (Dfl) system for intrusion detection of the human body. The system relies on the RSSI patterns caused when a human body in motion passes through the system. The Dfl system stores the RSSI changes, and the raw RSSI data is filtered using the Alpha Trim Mean Filter. The filter removes both the extremely high and the extremely low RSSI values, which are considered outliers. This approach reduces the noise; however, another study [13] showed that extremely high RSSI values are much closer to the correct RSSI value. Our proposed model differs from the models in [12] and [13] because we neither eliminate extremely high RSSI values nor consider only them. Instead, we eliminate RSSI values that fall below the mean and standard deviation.
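To make the contrast with trimming concrete, the following is a minimal sketch of a generic alpha-trimmed mean, not the implementation used in [12]. The function name, the `alpha` parameter, and the sample RSSI window are illustrative assumptions.

```python
def alpha_trimmed_mean(samples, alpha=0.2):
    """Average the samples after trimming both tails.

    alpha is the total fraction of samples to discard (assumed parameter);
    half is removed from the low end and half from the high end.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 <= alpha < 1:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(samples)
    k = int(len(ordered) * alpha / 2)      # samples trimmed from each end
    kept = ordered[k:len(ordered) - k]     # drop the k lowest and k highest
    return sum(kept) / len(kept)


# Hypothetical RSSI window in dBm with one very weak and one very strong reading.
rssi_window = [-90, -62, -60, -61, -59, -63, -60, -61, -62, -30]
trimmed = alpha_trimmed_mean(rssi_window, alpha=0.2)
```

With these values one reading is trimmed from each tail, so both the weakest and the strongest sample are discarded, which is exactly the behaviour the proposed model avoids for strong readings.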

In [14], the authors proposed a Wi-Fi-based location estimation technique based on RSSI measurements from existing access points. They used a fingerprinting algorithm with additional visual access points and adopted a Kalman Filter and a Particle Filter to improve localization accuracy. Their results showed a high-occurrence distance error of 4.49 m, because the authors of [14] focused more on filtering in the online phase than on improving the calibration phase. A similar technique was proposed in [15], where the authors employed Bluetooth technology to estimate the location and to enhance localization accuracy. The authors in [15] proposed a propagation model to determine the distance using the RSSI measurements and a weighted centroid. Affinity Propagation Clustering is used to reduce the size of the fingerprint by selecting the Required Points (RP) with the largest RSSI, as proposed in [16]. To reduce RSSI noise, the authors in [15] used the exponential averaging method, which produced errors of 1.05 m and 1.38 m in a corridor and a furnished computer lab, respectively. However, that solution differs from our proposed model because the exponential average gives more weight to recent RSSI readings and filters them in the online phase.
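The remark that exponential averaging favours recent readings can be illustrated with a short sketch. This is a generic exponential average under assumed names (`exponential_average`, `weight`) and made-up sample values; it is not the filter configuration reported in [15].

```python
def exponential_average(samples, weight=0.5):
    """Return the running exponential average of a stream of RSSI samples.

    weight (assumed parameter) is the share given to the newest sample;
    the remainder goes to the previous smoothed value.
    """
    if not 0 < weight <= 1:
        raise ValueError("weight must be in (0, 1]")
    smoothed = []
    current = None
    for sample in samples:
        if current is None:
            current = sample               # first sample seeds the average
        else:
            current = weight * sample + (1 - weight) * current
        smoothed.append(current)
    return smoothed


# Hypothetical online-phase RSSI stream in dBm.
smoothed = exponential_average([-60, -70, -60, -80], weight=0.5)
```

Because each output depends on the order of arrival, this kind of filter suits streaming data in the online phase, whereas a mean and SD filter can be applied to a complete calibration vector at once.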

The work presented in [17] differs from those in [14] and [15] in that it uses Artificial Intelligence approaches to reduce localization error by deploying the Particle Swarm Optimisation (PSO) algorithm. The fingerprinting algorithm was employed with Wi-Fi RSSI measurements, and the Weighted Fuzzy Matching algorithm was used to estimate indoor location. The Weighted Fuzzy Matching algorithm is also used in [18] to compare the effectiveness of Li-Fi with that of Wi-Fi. In [17], the PSO algorithm was additionally applied to improve accuracy, reducing the localisation error from 2 m to 1.2 m. In Table 1, we present an overview of existing localisation techniques and the gaps to be considered in future work.

Table 1. Indoor localization models.

In most of the models listed in Table 1, we found that multipath effects and signal strength attenuation caused by brick walls [26], human beings, and the indoor environment in general are a major concern. These challenges cause significant errors in RSSI measurements, which are classified as follows [13]:

  • Fading - which is the error caused by multipath effects of the walls and other indoor structures. Indoors, stronger RSSI values are usually affected only by fading.

  • Shadowing - which is caused by the presence of people. The closer a person is to either a transmitter or a receiver, the more the signal strength is affected.

  • Interference - which is caused by the presence of other devices that share the same radio channels as Wi-Fi, such as Bluetooth devices and microwave ovens. If these devices co-exist, they use the same frequency band, and RSSI measurements are affected.

Therefore, future location positioning methods must consider all of these factors that affect signal measurements.

3 Methodology

In this Section, we present our proposed model to correct the RSSI measurements. In Fig. 1, our proposed system architecture is represented based on the Fingerprinting algorithm.

Fig. 1. EMPsys system architecture

Our EMPsys employs the fingerprinting algorithm, which consists of an offline phase and an online phase. Database generation is performed during the offline phase by scanning for RSSI at various known required points, or points of interest, and populating the database with the readings. Our mobile application collects RSSI along with the corresponding SSID, MAC address, and timestamp from Wi-Fi-Direct-enabled devices or any other alliance device such as Wi-Fi hotspots, and sends them to a Firebase real-time database. The RSSI measurements are used to localise the devices, the SSID and MAC address are used to identify the device, and the timestamp specifies the time at which the device information was observed. During the online phase, the smartphone scans for RSSI from an unknown location. A machine learning classification algorithm is then used to estimate the target device's location by matching the scan against the radio map, or database, generated during the offline phase.
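The matching step of the online phase can be sketched as follows. This is a minimal illustration that assumes KNN as the matcher (one of the candidate classifiers considered in this paper, not a selected one), a hypothetical radio map of three access points, and made-up RSSI values; the function name `knn_locate` is also an assumption.

```python
import math
from collections import Counter


def knn_locate(radio_map, observed, k=3):
    """Estimate a location label by majority vote among the k nearest fingerprints.

    radio_map: list of (rssi_vector, location_label) pairs from the offline phase.
    observed:  RSSI vector scanned at the unknown location (same AP order).
    """
    # Euclidean distance in signal space between each fingerprint and the scan.
    distances = sorted(
        (math.dist(vector, observed), label) for vector, label in radio_map
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical radio map: RSSI in dBm from three access points at three RPs.
radio_map = [
    ([-40, -70, -80], "RP1"),
    ([-42, -68, -79], "RP1"),
    ([-70, -45, -75], "RP2"),
    ([-72, -43, -77], "RP2"),
    ([-80, -75, -41], "RP3"),
    ([-78, -77, -44], "RP3"),
]
estimate = knn_locate(radio_map, observed=[-41, -69, -81], k=3)
```

The quality of such an estimate depends directly on how clean the stored fingerprints are, which is why this study concentrates on correcting RSSI in the offline phase.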

Fig. 2. Offline phase database generation

3.1 Offline Phase RSSI Correction

In this study, we focus only on the offline phase, sometimes called the calibration phase, in order to select a suitable algorithm for our online EMPsys model. We developed our proposed offline EMPsys model in Android Studio; the flow diagram for the offline phase is shown in Fig. 2. The SSID, MAC address, RSSI, and timestamp at each RP are populated into a database.

The WifiManager API was used to check whether the smartphone's Wi-Fi is turned on. If it is not, the user presses the Calibrate button, which turns it on by calling the setWifiEnabled() method, and calibration starts. The target devices are placed at each Required Point (RP) during the offline phase. The N required points \(RP_1, \dots , RP_N\), represented in Eq. (1), are known locations. At each RP, M RSSI measurements are collected by calling the getScanResults() method. The RSSI measurements are stored in an N \(\times \) M table, as given in Eq. (2).

$$\begin{aligned} RP_N=\{RP_1,RP_2,\dots ,RP_N\} \end{aligned}$$
(1)
$$\begin{aligned} RSSI_{NM}= \begin{bmatrix} RSSI_{11} &{} RSSI_{12} &{} \dots &{} RSSI_{1M} \\ RSSI_{21} &{} RSSI_{22} &{} \dots &{} RSSI_{2M} \\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ RSSI_{N1} &{} RSSI_{N2} &{} \dots &{} RSSI_{NM} \end{bmatrix} \end{aligned}$$
(2)

We then computed the mean and the standard deviation (SD) of the RSSI readings from each RP, using Eqs. (3) and (5), to detect outliers. The means and SDs of all RPs are collected into the vectors given in Eqs. (4) and (6), respectively.

$$\begin{aligned} mean_{{RP}_{N}} = \overline{RSSI_N} = \frac{ \sum _{i=1}^{M} RSSI_{Ni} }{M} \end{aligned}$$
(3)
$$\begin{aligned} \overline{RSSI}= \begin{bmatrix} \overline{RSSI_1}\\ \overline{RSSI_2}\\ \vdots \\ \overline{RSSI_N} \end{bmatrix} \end{aligned}$$
(4)
$$\begin{aligned} SD_{{RP}_{N}}= \sqrt{ \frac{ \sum _{i=1}^{M}(RSSI_{Ni} - \overline{RSSI_N})^2 }{M} } \end{aligned}$$
(5)
$$\begin{aligned} SD= \begin{bmatrix} SD_1\\ SD_2\\ \vdots \\ SD_N \end{bmatrix} \end{aligned}$$
(6)

For each vector of RSSI values measured at \(RP_N\), we apply the mean and standard deviation as a filter to correct signal attenuation caused by indoor obstacles. All RSSI readings detected as outliers are filtered out and replaced with the mean, so that poor RSSI readings are eliminated and only the stronger signal strengths relative to the mean are considered.

3.2 Raw Dataset and Properties

The dataset used in our study to simulate the effectiveness of our proposed model was originally collected in the SHiB study [27] for a smart home environment. The environment in which the SHiB dataset was collected is divided into locations, recorded in a variable named localization, with 10 calibrations. The dataset is publicly available on GitHub [28]. The dataset variables are as follows [27, 28]:

  • Timestamp: The date and time at which the data was collected.

  • RSSI: Numerical signal strengths from each gateway.

  • s1x, s1y, s1z, ..., s5x, s5y, and s5z: Categorical tri-axial accelerometer readings X, Y, and Z, which help to identify the activity the user is performing. The X-axis is horizontal and points right (x+) and left (x−), the Y-axis is vertical and points up (y+) and down (y−), and the Z-axis points outside (z+) and inside (z−) of the wearable device [29].

  • Gateway: Categorical location at which the RSSI signal strength is read and where the Raspberry Pi devices are installed.

  • Localization: Categorical location where the user's wrist-worn wearable device is located.

  • Activity: Categorical activity, used to determine the localization.

3.3 Classification

We used the SHiB dataset with 10 calibrations collected from 10 users. We combined all 10 calibrations into a single dataset with 56099 rows and then split it in MATLAB into a training dataset (80%) and a testing dataset (20%). We computed the mean and standard deviation for each gateway and filtered the RSSI measurements. The RSSI measurements considered outliers were corrected, that is, replaced with the mean RSSI at each gateway. In the next stage, we simulated the Naïve Bayes, Decision Tree, and KNN classification algorithms using the combined dataset as in [28]. We then compared the classification algorithms in terms of accuracy, precision, and recall, following the model in Fig. 3, to determine the best-performing classifier and select the most suitable machine learning algorithm for the online phase of our EMPsys.
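For readers who want to reproduce the partitioning outside MATLAB, a shuffled 80/20 hold-out split can be sketched as below. This is an illustrative stand-in rather than the MATLAB procedure used in the study: the function name, the fixed seed, and the use of row indices in place of the real records are assumptions, and MATLAB's own partitioning may round the subset sizes differently.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows reproducibly and cut them into training and testing parts."""
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Row indices standing in for the 56099 rows of the combined dataset.
rows = list(range(56099))
train, test = train_test_split(rows)
```

Shuffling before cutting matters here because the combined dataset is a concatenation of 10 calibrations, so an unshuffled cut would leave whole calibrations out of the training set.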

Fig. 3. Localization diagram

4 Testing and Results

The combined dataset with 56099 rows was loaded into MATLAB. The data was split into 80% training and 20% testing. The performance of each machine learning algorithm is evaluated using a confusion matrix. The performance metrics are given by Eqs. (7), (8) and (9).

$$\begin{aligned} Accuracy = \frac{(TN+TP)}{(TP+TN+FP+FN)} \end{aligned}$$
(7)
$$\begin{aligned} Precision = \frac{(TP)}{(TP+FP)} \end{aligned}$$
(8)
$$\begin{aligned} Recall = \frac{(TP)}{(TP+FN)} \end{aligned}$$
(9)

Here, TN denotes True Negatives, TP True Positives, FP False Positives, and FN False Negatives.
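As a quick check of how the three ratios in Eqs. (7) to (9) behave, the sketch below computes accuracy, precision, and recall, the last being TP/(TP+FN), from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the experiments; the sketch assumes non-zero denominators.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)             # share of predicted positives that are correct
    recall = tp / (tp + fn)                # share of actual positives that are found
    return accuracy, precision, recall


# Hypothetical counts for one class treated as "positive".
accuracy, precision, recall = classification_metrics(tp=90, tn=85, fp=10, fn=15)
```

For a multi-class problem such as gateway classification, these ratios are typically computed per class from the confusion matrix and then averaged.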

4.1 Localization Classification Results

First, we used our combined dataset to predict the location of the user wearing the device. The predictors used are the accelerometer coordinates, activity, and gateway. The results, summarised in Table 2, show that the decision tree emerged as the best-performing algorithm with 100% accuracy, while Naïve Bayes achieved 94.3% accuracy.

Table 2. Localization classification results
Fig. 4. Unfiltered RSSI and filtered RSSI

Next, we added the unfiltered RSSI readings as another predictor alongside the accelerometer coordinates, activity, and gateway, to check whether this affects the results in Table 2 for predicting the location of the user wearing the device. The results were not affected and remained the same. With accuracy above 94% for all classifiers, we conclude that our combined dataset is effective and that RSSI readings do not affect the prediction of localization (one of the variables in the dataset) at this stage.

4.2 Gateway Classification Results

In the second step, we performed the gateway classification. We split the combined dataset according to each localization and used only RSSI as a predictor in this stage. We used our algorithm presented in Fig. 2 to compute the mean and standard deviation and to filter and correct the RSSI, whereby all weak signal strengths are discarded and replaced by the mean at each gateway. The results for the unfiltered and filtered RSSI are presented in Fig. 4(a) and (b), respectively.

The comparison of unfiltered and filtered RSSI is summarised in Tables 3, 4 and 5. The results show improvements in accuracy, precision, and recall when using the filtered RSSI feature (mean and SD). The Kernel Naïve Bayes classifier reduced its misclassification error from 31.1% to 5.5%, giving a new accuracy of 94.5%, whereas the Fine Decision Tree reduced its misclassification error rate from 30.4% to 4.9%, giving a new accuracy of 95.1%. In general, the performance metrics show that the proposed mean and standard deviation approach has a positive effect, with the Fine Decision Tree, Coarse Decision Tree, and Kernel Naïve Bayes achieving an overall precision of 95.5%, 95.4%, and 95%, respectively, and an overall recall of 95.1%, 94.8%, and 94.5%, respectively.
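The relation between the reported misclassification rates and accuracies can be written out explicitly. The lines below only restate the figures given in the text, using accuracy as the complement of the misclassification rate; the unfiltered accuracies are derived here by that same relation rather than quoted from the tables.

$$\begin{aligned} \text {Kernel Na\"ive Bayes:}\quad & 100\% - 31.1\% = 68.9\% \;\longrightarrow \; 100\% - 5.5\% = 94.5\% \\ \text {Fine Decision Tree:}\quad & 100\% - 30.4\% = 69.6\% \;\longrightarrow \; 100\% - 4.9\% = 95.1\% \\ \text {Error reduction:}\quad & 31.1 - 5.5 = 25.6 \text { points}, \qquad 30.4 - 4.9 = 25.5 \text { points} \end{aligned}$$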

Table 3. Gateway classification accuracy
Table 4. Gateway classification precision
Table 5. Gateway classification recall

The results listed in Tables 3, 4 and 5 show a significant improvement: all algorithms that performed poorly with unfiltered RSSI reduced their misclassification significantly with the filtered and corrected RSSI feature. Applying our proposed filter reduced the standard deviation and increased the stability of the calibration RSSI measurements from each gateway. In addition, the machine learning algorithms adopted in our study achieved higher accuracy than the model proposed in [27], which used the same dataset and produced 92% accuracy at room level. Thus, we conclude that our technique is effective and competitive in reducing the location error.

5 Conclusion

In this paper, we presented a model capable of filtering and correcting poorly received signal strength during the calibration phase of the fingerprinting algorithm. We presented the challenges that affect RSSI measurements in indoor environments, proposed a mean and SD RSSI filtering approach, and compared the unfiltered and filtered RSSI. In the first stage, we selected and combined the 10 SHiB calibration datasets to predict the location of the user wearing the device. We experimented with several machine learning algorithms, using the accelerometer coordinates, activity, and gateway variables as predictors for classifying the localization variable; all produced over 94% accuracy, which shows that our combined dataset is effective. We then added the RSSI variable as another predictor, and the localization prediction results remained the same, which shows that the RSSI measurements have no effect in the first stage. In the second stage, we applied our proposed model to filter and correct the RSSI measurements for gateway classification, and the results show a significant improvement in accuracy, precision, and recall with filtered RSSI. The experimental results show the need to de-noise RSSI and that the mean and SD filter is important for correcting and stabilizing RSSI during the calibration phase of the fingerprinting algorithm. The RSSI filtering approach is effective and competitive in reducing location estimation error, with improved classification accuracy. In future, we intend to select a suitable machine learning algorithm for the online phase of our EMPsys and test the system by localizing smartphones in real time.