Keywords

1 Introduction

Location-based Service (LBS) has become one of the development and competition in the field of robots, smart home and large shopping malls. To provide more efficient services for people’s life and travel, the location information and service demand data can be fed back to the server, and the original data can be further calculated through emerging technologies such as cloud computing and big data [1, 2].

Because of the importance and role of indoor positioning, the domestic and international business circles and academic circles have carried out on the theory and application of indoor positioning. Livetti developed Active Badge applied to indoor positioning system based on infrared technology [3]. iBeacon is the first system of Apple achieving accurate indoor positioning by using Bluetooth technology [4]. Tsingoal Technology developed indoor positioning system by the new technology UWB [5]. In 2000, Microsoft put forward and developed a set of RADAR system based on Wi-Fi positioning, which can realize the continuous tracking of the position [1]. University of California put forward a new Bayesian probability algorithm of positioning model [6]. Haeberlen used the Gaussian method to model the position space of Wi-Fi signal [7].

Although indoor positioning has been researched in research and industry, there are still some problems to be solved. There has not a very stable and reliable product used in our real life. Wi-Fi is the most widely used technology in indoor wireless network, which brings great convenience to the research of indoor positioning.

2 Wi-Fi Position Estimation Algorithm

Near neighbor estimation algorithm based on signal intensity is an algorithm that can perceive near objects. When signals emitted by the surrounding wireless AP points are measured by mobile devices, the signal intensity RSSI and the physical address MAC are recorded. By comparing RSSI and MAC of different wireless signals with offline database, several fingerprint points which are the nearest to the unknown point are calculated. Then specific location is obtained though the algorithm. The basic principle is that the signal intensities of real-time signals and matched fingerprint points are calculated, and K fingerprint points which are the nearest to the unknown point are selected. The detected signal intensities are recorded as vector [rssi 1 ,rssi 2 ,…,rssi m ], where 1 ≤ m ≤ n. The fingerprint signal intensity database is established as shown in formula as followed [8].

$$ {\text{RSSI}} = \left[ {\begin{array}{*{20}c} {\left( {x_{1} ,y_{1} } \right) rssi_{11} rssi_{12 } \ldots rssi_{1n } } \\ {\left( {x_{2} ,y_{2} } \right) rssi_{21} rssi_{22} \ldots rssi_{2n } } \\ \ldots \\ {\left( {x_{p} ,y_{p} } \right) rssi_{p1} rssi_{p2} \ldots rssi_{pn } } \\ \end{array} } \right] $$
(1)

In the formula, each row vector represents a fingerprint point, so there are p fingerprint points. Each point contains the signal intensities of n different AP points.

The distance of signal intensity is calculated as shown as [5].

$$ D_{m} = \left( {\sum\nolimits_{j = 1}^{m} {\left| {rssi_{j} - RSSI_{pm} } \right|^{q} } } \right)^{{\frac{1}{q}}} ,\;j = 1,2, \ldots ,m,1 \le m \le n $$
(2)

where

  • rssi pm represents fingerprint signal intensity in fingerprint database;

  • rssi j represents measured signal intensity;

  • D m represents the distance of signal intensity;

  • q represents a natural number which is usually chosen as 2.

The Euclidean distance between the measured signal intensity rss ij and the fingerprint point in database rssi pn is calculated. The calculated distance D m is rearranged. The minimum K distances away from D m is chosen as the nearest K fingerprint points. Because the Wi-Fi signal intensity in the indoor space has the characteristic of large floating, so the selected fingerprint points may be the inaccurate fingerprint points generated by the change of the signal, which will cause a larger error in the location estimation. Therefore, the positions are distinguished according to the proportion of different fingerprints, which is shown as [9],

$$ \left( {{\hat{\text{x}}},{\hat{\text{y}}}} \right) = \sum\nolimits_{i = 1}^{K} {\frac{{\frac{1}{{D_{i} + \varepsilon }}}}{{\mathop \sum \nolimits_{i = 1}^{K} \frac{1}{{D_{i} + \varepsilon }}}} \times \left( {x_{i} ,y_{i} } \right)} $$
(3)

where

  • D i represents the Euclidean distance,

  • ε is generally taken as 0.000001 to avoid the special case D i  + ε = 0,

  • (x i ,y i ) represents the coordinates of the K selected fingerprint points.

3 Improvement of Fingerprint Database

To divide fingerprint points with similar Wi-Fi signal intensities into one class and maximize the difference between adjacent classes, K-mean clustering algorithm [10, 11] is used to classify Wi-Fi signal intensities. The characteristic function of similarity has a great influence on the clustering effect. Euclidean distance function is chosen as the standard to measure the similarity of Wi-Fi signals, as (4),

$$ {\text{d}}\left( {rssi_{i} } \right) = \sqrt[2]{{\sum\nolimits_{i = 1}^{n} {\left( {rssi_{i} - rssi_{j} } \right)^{2} } }} $$
(4)

The specific steps of K-mean clustering algorithm applied to Wi-Fi signal classification are as follows.

  1. (1)

    The target sample data that are the classified fingerprint point data n is selected as 14.

  2. (2)

    The number of divided cluster subsets K is determined. The samples selected are two laboratories and a part of the corridor in the indoor environment, so K is selected as 3.

  3. (3)

    The initial sample clustering center is selected. The center is selected randomly in the subset. To conform to the actual indoor environment, Wi-Fi signal data from each laboratory and the corridor are selected randomly as the initial cluster head rssi i .

  4. (4)

    The cluster head rssi i is taken as the center, and all the data of fingerprint points except other cluster head is traversed. Formula (4) is used to calculate the distances between all fingerprint points data and the three cluster heads. Then the points nearing the cluster head are divided into one class, which is used for the initial classification, as shown in Fig. 1.

  5. (5)

    The formula (4) is used to calculate in each classification subset until the cluster head position does not change. Then the average of all objects within the subset is updated to the new cluster head. Convergence condition of iterative calculation is shown as followed,

$$ {\text{E}} = \sum\nolimits_{i = 1}^{j} {\sum\nolimits_{{rssi_{i} \in C_{k} }} {\left| {rssi_{i} - RSSI_{i} } \right|^{2} } } $$
(5)

where

  • RSSI i represents each Wi-Fi signal intensity of fingerprint point,

  • rssi i represents cluster head value in divided subset,

  • j represents the number of Wi-Fi in fingerprint points.

    Fig. 1.
    figure 1

    Division of all fingerprint points

Since each fingerprint point has 3–6 pieces of Wi-Fi signal information, the new cluster head will vary due to different Wi-Fi signals. The Wi-Fi signal fingerprint library should have the following characteristics after the division: the difference between the fingerprint points in the classified subset is the smallest, and the difference between fingerprints in different classified subsets is the largest. This method can not only reduce the complexity of algorithm to improve operational efficiency, but also improve the positioning accuracy when fingerprint points are selected in real-time positioning.

3.1 Establishment of Fingerprint Database Based on K-Mean Clustering

After the collected Wi-Fi signals were filtered by Gaussian filter and average value of each signal was calculated, original fingerprint database is established. After the preliminary classification, the type of signal within the class is determined. Six Wi-Fi signals are contained in the class. The formed fingerprint library is shown as Table 1.

Table 1. Original fingerprint database

K-mean clustering algorithm is used to calculate the selected original signal fingerprint database. The classification number K is selected as 3. The initial seed cluster heads are selected in each cluster subset. The initial cluster heads are (24, 3) in subset 1, (28, 3) in subset 2 and (26, 9) in subset 3. By MATLAB program, the Euclidean distances between the signal intensities of the remaining points and the signal intensities of the 3 cluster heads are calculated, and the results of the Euclidean distances are shown in Table 2.

Table 2. Euclidean distances

According to the Euclidean distances, the fingerprints are reclassified. The minimum Euclidean distance of each row shows the highest similarity to the corresponding cluster head. The fingerprint point and the cluster head are divided into one class. It can be seen from Table 2 that the fingerprints after reclassification are matched with actual indoor environment, and the classification results are shown in Table 3.

Table 3. Cluster fingerprints

The Fingerprint library is divided into three subsets. The new cluster heads in each subset are calculated and the new cluster head signal value is updated to the new cluster head fingerprint point by MATLAB program. The new cluster head is represented by vector M, as shown in formula (6).

$$ {\text{M}} = \left[ {\begin{array}{*{20}c} { - 55.2233} & { - 54.4533} & { - 61.3267} & { - 56.5467} & { - 54.9033} & { - 54.4133} \\ { - 50.0567} & { - 59.9987} & { - 62.9183} & { - 62.0609} & { - 53.2154} & { - 62.0667} \\ { - 52.5740} & { - 52.8020} & { - 68.3940} & { - 62.3140} & { - 51.4940} & { - 61.0280} \\ \end{array} } \right] $$
(6)

Through the K-mean clustering algorithm, the clustering fingerprint database is established, so that the positioning system does not need to traverse all the fingerprint points in the fingerprint database and only need to calculate the Euclidean distances between fingerprints and cluster head signal intensities to selected fingerprint points from subsets and estimate distances. The clustering fingerprint database can reduce the operation time of the algorithm in real-time positioning, so that the location can be quickly obtained.

3.2 Positioning Experiment Analysis of Different Fingerprint Databases

Twenty five locations were selected randomly and the Wi-Fi signals collected at the locations are made into the test sequence. The traditional location fingerprint database, probability estimation fingerprint database and improved clustering algorithm fingerprint database are used in simulation experiments. The positioning results are shown in Fig. 2.

Fig. 2.
figure 2

Clustering positioning experiment

As can be seen from the simulation map, the positioning accuracy of the traditional location fingerprint database is lower than that of the Bayesian-Gaussian probability algorithm fingerprint database and the clustering algorithm fingerprint database. Bayesian-Gaussian probability fingerprint database is not suitable to be applied in indoor positioning because the fingerprint database needs more samples. From the positioning simulation in Fig. 2, clustering effect of K-mean clustering algorithm is obvious. In the sequence of the locations, the clustering effect can be achieved, and the positioning accuracy is high. Comparison of three kinds of fingerprint database positioning error is shown in Fig. 3. The average errors of three kinds of fingerprint positioning are 2.5355 m, 2.1423 m and 1.5821 m. It can be obtained that the traditional fingerprint location has the worst results.

Fig. 3.
figure 3

Comparison of positioning erros

Therefore, it can be found that the accuracy of Bayes-Gaussian probability fingerprint database is better, and the accuracy of the improved average clustering fingerprint database positioning is improved. Through the comparison of the positioning error sequence, the maximum error of the K-mean clustering fingerprint database is 3.28 m, and the maximum error of the traditional fingerprint database and the Bayes-Gaussian probability fingerprint database is 6.27 m and 5.51 m. The improved clustering algorithm fingerprint database based on traditional fingerprint database can greatly improve the positioning accuracy.

4 Experiment

In a real experimental environment, the Wi-Fi real-time positioning system is tested, and the performance of the positioning algorithm is analyzed. The impact of different terminals on clustering algorithm is investigated. Three kinds of Android mobile phone MiPhone, Samsung and ZTE are taken as the positioning terminals. The laboratories and the corridor are selected as experimental environment. Real-time positioning scene is shown in Fig. 4. Figure 5 shows the details of indoor positioning interface. The black cross is a symbol of positioning terminal, and left digitals represent Euclidean distance. The last column numbers indicate the selected region fingerprint database. Because of different terminal screen size, origin calibration is required at the initial stage of positioning. A laboratory corner is selected as the origin in this experiment.

Fig. 4.
figure 4

Real-time positioning sence

Fig. 5.
figure 5

The indoor positioning interface

In the real-time positioning stage, a point in the fingerprint database is selected randomly. Three different mobile terminals are selected for testing, and the actual location information is recorded. The value of WKNN_X and WKNN_Y is extracted from APP, and the probability of accumulative error is obtained by calculating the positioning error. The accumulative error of three kinds of different terminals is shown in Fig. 6. It can be seen that the fingerprint database established by using MAC classification and signal intensity clustering keeps different terminals from producing large deviation in the actual experiment.

Fig. 6.
figure 6

The accumulative error of three kinds of different terminals

From Table 4, when different terminals are used to locate, the cluster fingerprint database can ensure that the locations are in the clustering subset to reduce the positioning range and improve the positioning accuracy. By matching of the collected Wi-Fi signal intensities and the clustering fingerprint database, the location clustering subsets are obtained and the most similar fingerprint points are selected. WKNN positioning algorithm is used to calculate the real-time positioning distances which are displayed on the UI interface. Experimental results show that the K-mean clustering fingerprint database has a lower positioning error in the positioning experiment, and can achieve the indoor positioning function.

Table 4. Accuracy of clustering subset

5 Conclusion

Wi-Fi indoor positioning technology based on location fingerprint and cluster analysis is studied. Specific locations are calculated by using RSSI nearest neighbor estimation method. The positioning accuracies of different terminals are compared. It is found that the average value of the signals of mobile phones is smaller than that of the notebook. The RSSI signal intensity is used to make clustering process for the fingerprint database. The noise signal in the fingerprint database is filtered. The traditional location fingerprint database, probability estimation fingerprint database and improved clustering algorithm fingerprint database are established. By comparing the positioning error of the test data in three different fingerprint databases, the accuracy of indoor positioning is improved. Finally, the Wi-Fi data receiving module, the positioning server module and the positioning display module of positioning terminal are established, and the positioning APP is tested in the actual environment.