1 Introduction

In recent years, with the advent of the mobile Internet and the rapid spread of smartphones, people's lifestyles and habits have changed considerably. People now routinely use LBS (Location Based Service) to find restaurants, banks, takeout and so on. Positioning technology has therefore become increasingly important.

At present, GPS (Global Positioning System) is the most widely used positioning system. However, in complex indoor environments, factors such as weak signals, environmental noise and multipath interference cause GPS positioning accuracy to deteriorate rapidly, and positioning may fail altogether [1]. To solve the positioning problem in complex indoor environments, most solutions rely on local wireless sensor networks [2]. Most smartphones, tablets and laptops now ship with Wi-Fi adapters, which provide a ready-made hardware platform for indoor positioning. As a result, Wi-Fi fingerprint positioning has become the most widely used indoor fingerprint positioning method [3].

1.1 Problem Definition

Wi-Fi fingerprint positioning, as shown in Fig. 1, is roughly divided into two phases: the offline phase and the online phase. In the offline phase, the Received Signal Strength (RSS) of the Access Points (APs) is recorded at the Reference Points (RPs) to build a fingerprint database for the positioning area, and the positioning model is trained on it. In the online phase, the positioning model extracts features from the real-time RSS of the APs to estimate the current position of the device. The positioning model generally adopts machine learning methods, which can be divided into classification methods, which yield a discrete result, and regression methods, which yield continuous position coordinates. To simplify the problem, in this paper we adopt a classification positioning model. We propose a deep-learning positioning model that is trained in the offline phase and performs well on a publicly available dataset.

Fig. 1. Technology of Wi-Fi fingerprint positioning
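The two phases can be illustrated with a minimal sketch. It assumes a toy fingerprint database and uses a nearest-neighbor lookup as a stand-in positioning model, which is not the model proposed in this paper. The names `build_fingerprint_db`, `distance` and `locate`, and the `MISSING_RSS` fill value, are hypothetical.

```python
import math

MISSING_RSS = -105.0  # assumed fill value for APs that a scan did not hear


def build_fingerprint_db(surveys):
    """Offline phase: store (RP label, {AP id: RSS}) records collected at RPs."""
    return [(rp, dict(scan)) for rp, scan in surveys]


def distance(scan_a, scan_b):
    """Euclidean distance between two scans over the union of their APs."""
    aps = set(scan_a) | set(scan_b)
    return math.sqrt(sum(
        (scan_a.get(ap, MISSING_RSS) - scan_b.get(ap, MISSING_RSS)) ** 2
        for ap in aps
    ))


def locate(db, scan):
    """Online phase: return the RP whose stored fingerprint is closest."""
    rp, _ = min(db, key=lambda record: distance(record[1], scan))
    return rp


# Toy survey data (invented for illustration only).
db = build_fingerprint_db([
    ("RP1", {"ap_a": -40, "ap_b": -70}),
    ("RP2", {"ap_a": -75, "ap_b": -45}),
])
estimated_rp = locate(db, {"ap_a": -42, "ap_b": -68})
```

In a classification setting such as the one adopted here, the RP labels play the role of the discrete classes.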

2 Related Work

In recent years, some scholars have introduced machine learning into Wi-Fi fingerprint positioning, using methods such as KNN [4], WKNN [5] and SVM [6]. These methods train a machine-learning classification model to learn the mapping between Wi-Fi RSS and device location. Wang et al. [7] used a four-layer FCN (fully connected network) for fingerprint positioning. The network is pre-trained with stacked autoencoders and then fine-tuned globally by backpropagation. Features are extracted automatically from the fluctuating wireless signal and linearly transformed to calculate the position of the target. The FCN model achieves higher positioning accuracy and improves system robustness, but a multi-layer FCN has many parameters and therefore needs a large amount of training data. More importantly, an FCN does not take advantage of the positional information between RSS values.

If we sort the AP-RSS pairs by RSS value, each pair is closely related to its neighboring pairs, much like adjacent words in a sentence. Although deep learning models such as CNN, LSTM and attention models perform well at sequence modeling in other fields, they have not yet been widely applied to Wi-Fi fingerprint positioning. Therefore, in this paper, we propose a fingerprint positioning method based on Bi-LSTM and an attention model.

3 Wi-Fi Attention Network

The overall architecture of the Wi-Fi Attention Network is shown in Fig. 2; it follows Yang’s Hierarchical Attention architecture [8]. It consists of two parts: a Wi-Fi sequence encoder and a Wi-Fi attention layer. We introduce the details in the following sections.

Fig. 2. The architecture of Wi-Fi Attention Network

3.1 Wi-Fi Encoder

Each Wi-Fi sequence contains several APs and their RSS values. Drawing on the idea of text classification, we treat each AP-RSS pair as a Wi-Fi word and each Wi-Fi sequence as a Wi-Fi sentence. Assume that a Wi-Fi sentence contains \( {\text{n}} \) Wi-Fi words, where \( w_{i} \), with \( i \in \left[ {1, n} \right] \), represents the \( i \)-th Wi-Fi word in the Wi-Fi sentence. We first embed the Wi-Fi words into vectors through an embedding matrix \( W_{e} \), \( x_{i} = W_{e} *w_{i} \). Then, we use a bidirectional LSTM [9] to obtain a representation vector for each Wi-Fi word by summarizing information from both directions, which incorporates the contextual information of the Wi-Fi sentence. The bidirectional LSTM contains a forward LSTM, which reads the Wi-Fi sentence \( {\text{s}} \) from \( x_{1} \) to \( x_{n} \), and a backward LSTM, which reads it from \( x_{n} \) to \( x_{1} \).

$$ x_{i} = W_{e} *w_{i} $$
(1)
$$ \overrightarrow {{h_{i} }} = \overrightarrow {LSTM} \left( {x_{i} } \right) $$
(2)
$$ \overleftarrow {{h_{i} }} = \overleftarrow {LSTM} \left( {x_{i} } \right) $$
(3)

We obtain a complete representation vector for a given word \( w_{i} \) by concatenating the forward hidden state \( \overrightarrow {{h_{i} }} \) and the backward hidden state \( \overleftarrow {{h_{i} }} \). In this way, \( h_{i} \) summarizes the information of the whole Wi-Fi sentence centered around \( w_{i} \).

$$ h_{i} = \left[ {\overrightarrow {{h_{i} }} , \overleftarrow {{h_{i} }} } \right] $$
(4)
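The encoder in Eqs. (2) to (4) can be sketched in pure Python as follows. The sketch assumes the Wi-Fi words have already been embedded as in Eq. (1). The `LSTMCell` class, its random untrained weights and the toy dimensions are illustrative assumptions, not the implementation used in the experiments.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Standard LSTM cell with randomly initialized (untrained) weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                   for _ in range(hidden_dim)]
            for gate in "ifog"
        }
        self.biases = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def _affine(self, gate, vec):
        return [
            sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(self.weights[gate], self.biases[gate])
        ]

    def step(self, x, h, c):
        vec = list(x) + list(h)  # concatenate input and previous hidden state
        i = [sigmoid(z) for z in self._affine("i", vec)]
        f = [sigmoid(z) for z in self._affine("f", vec)]
        o = [sigmoid(z) for z in self._affine("o", vec)]
        g = [math.tanh(z) for z in self._affine("g", vec)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell, xs):
    """Return the hidden state produced after each element of xs."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def bilstm_encode(forward_cell, backward_cell, xs):
    """Eqs. (2)-(4): concatenate forward and backward states per position."""
    forward = run_lstm(forward_cell, xs)
    backward = run_lstm(backward_cell, xs[::-1])[::-1]  # realign to positions
    return [hf + hb for hf, hb in zip(forward, backward)]


rng = random.Random(0)
xs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # 5 embedded words
fwd_cell, bwd_cell = LSTMCell(4, 3, rng), LSTMCell(4, 3, rng)
hs = bilstm_encode(fwd_cell, bwd_cell, xs)  # 5 vectors of dimension 3 + 3
```

Each `hs[i]` has twice the hidden dimension, because the forward and backward states are concatenated.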

3.2 Wi-Fi Attention

Not all Wi-Fi words contribute equally to the representation of the Wi-Fi sentence. Therefore, we introduce an attention mechanism to identify the Wi-Fi words that are important to the representation of the Wi-Fi sentence, and we aggregate the representations of the Wi-Fi words according to their significance to form a Wi-Fi sentence vector.

$$ u_{i} = { \tanh }\left( {W_{w} h_{i} + b_{w} } \right) $$
(5)
$$ \alpha_{i} = \frac{{{ \exp }\left( {u_{i}^{T} u_{w} } \right)}}{{\mathop \sum \nolimits_{j} { \exp }\left( {u_{j}^{T} u_{w} } \right)}} $$
(6)
$$ {\text{s}} = \mathop \sum \nolimits_{i} \alpha_{i} h_{i} $$
(7)

First, we use a single-layer MLP to obtain \( u_{i} \) as a hidden representation of \( h_{i} \). Then, we measure the importance of each Wi-Fi word as the similarity between \( u_{i} \) and a context vector \( u_{w} \), and obtain a normalized weight \( \alpha_{i} \) through a softmax function. Finally, we calculate the Wi-Fi sentence representation vector \( {\text{s}} \) as the weighted sum of the Wi-Fi word representations \( h_{i} \). The context vector \( u_{w} \) is a high-level representation of the Wi-Fi sentence; it is randomly initialized and learned jointly with the other parameters during training.
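Eqs. (5) to (7) can be traced with a small pure-Python sketch. The function name `attention_pool` and the toy values of `W_w`, `b_w` and `u_w` are illustrative assumptions; in the model these parameters are learned during training.

```python
import math


def attention_pool(hs, W_w, b_w, u_w):
    """Eqs. (5)-(7): return the attention weights and the sentence vector s."""
    # Eq. (5): u_i = tanh(W_w h_i + b_w)
    us = [
        [math.tanh(sum(w * h for w, h in zip(row, h_i)) + bias)
         for row, bias in zip(W_w, b_w)]
        for h_i in hs
    ]
    # Eq. (6): softmax over the similarity scores u_i^T u_w
    scores = [sum(a * b for a, b in zip(u_i, u_w)) for u_i in us]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Eq. (7): s = sum_i alpha_i h_i
    dim = len(hs[0])
    s = [sum(alpha * h_i[k] for alpha, h_i in zip(alphas, hs)) for k in range(dim)]
    return alphas, s


# Toy inputs: three word representations of dimension 2.
hs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_w = [[1.0, 0.0], [0.0, 1.0]]
b_w = [0.0, 0.0]
u_w = [1.0, 0.0]
alphas, s = attention_pool(hs, W_w, b_w, u_w)
```

With this toy context vector, the first word receives the largest weight because its hidden representation is most similar to `u_w`.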

3.3 Wi-Fi Classification

The Wi-Fi sentence representation vector \( {\text{s}} \) is a high-level representation of this Wi-Fi sequence and can be used as features for the Wi-Fi classification:

$$ {\text{p}} = {\text{softmax}}\left( {W_{s} {\text{s }} + b_{s} } \right) $$
(8)

We then choose the class with the largest probability in \( {\text{p}} \) as the classification result for this Wi-Fi sequence.
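A minimal sketch of Eq. (8) and the arg-max decision follows. The function names and the toy weights `W_s` and `b_s` are hypothetical; each class index stands for one candidate location.

```python
import math


def softmax(zs):
    top = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def classify(s, W_s, b_s):
    """Eq. (8) followed by the arg-max decision over the class probabilities."""
    logits = [sum(w * v for w, v in zip(row, s)) + b for row, b in zip(W_s, b_s)]
    p = softmax(logits)
    predicted = max(range(len(p)), key=p.__getitem__)
    return predicted, p


# Toy sentence vector and a three-class output layer.
s = [0.2, -0.1]
W_s = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_s = [0.0, 0.0, 0.0]
predicted, p = classify(s, W_s, b_s)
```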

4 Experiments

4.1 Dataset

We evaluate the effectiveness of our Wi-Fi Attention Network model on the public dataset of BDCI 2017 Task 1, whose goal is to accurately identify the store in the mall where the user is currently located [10]. The dataset has 1.1 million training samples and 48.4 thousand test samples. It includes three kinds of information. The first is user information, including the user id and the user’s historical consumption records. The second is shop information, such as the average consumption price of the shop. The last is consumption information, including Wi-Fi sequence information and longitude and latitude information. To evaluate the effectiveness of our Wi-Fi Attention Network model, we use only the Wi-Fi sequence information. Each Wi-Fi sequence is a list of AP records, and each AP record has three items: id, signal and flag. The id is the unique identifier of the AP and distinguishes it from other APs. The signal is the RSS of the AP, which ranges from −104 to 0, where 0 is the maximum RSS value. The flag indicates whether the device is connected to this AP. We use 80% of the dataset for training, 10% for validation and the remaining 10% for testing.
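The 80/10/10 split can be sketched as below. The random shuffle, the fixed seed and the function name `split_dataset` are assumptions; the text does not state how the samples were assigned to each subset.

```python
import random


def split_dataset(samples, train_pct=80, val_pct=10, seed=0):
    """Shuffle the samples and split them into train/validation/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remaining samples
    return train, val, test


train, val, test = split_dataset(range(1000))
```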

4.2 Preprocessing

We split each Wi-Fi sequence into Wi-Fi words as shown in Fig. 3. The original Wi-Fi sequence contains ten AP records, each with three parts: id, signal and flag. To improve performance, we sort the AP records in descending order of RSS value, which yields the preprocessed Wi-Fi sequence.

Fig. 3. The preprocessing of Wi-Fi sequence
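The preprocessing step might look like the following sketch. The raw format (`id|signal|flag` records separated by `;`), the decision to drop the flag, and the `id_signal` token used as a Wi-Fi word are all assumptions made for illustration; the exact encoding of a Wi-Fi word is not specified above.

```python
def parse_wifi_sequence(raw):
    """Split a raw Wi-Fi sequence into (id, signal, flag) records."""
    records = []
    for item in raw.split(";"):
        ap_id, signal, flag = item.split("|")
        records.append((ap_id, int(signal), flag == "true"))
    return records


def to_wifi_sentence(raw):
    """Sort AP records by descending RSS and emit one Wi-Fi word per AP."""
    records = sorted(parse_wifi_sequence(raw), key=lambda r: r[1], reverse=True)
    # Hypothetical word format: "<id>_<signal>"; the flag is dropped here.
    return [f"{ap_id}_{signal}" for ap_id, signal, _ in records]


# Invented example with three APs instead of ten.
raw = "ap_1|-67|false;ap_2|-45|true;ap_3|-90|false"
sentence = to_wifi_sentence(raw)
```

Here the strongest AP comes first, so neighboring Wi-Fi words have similar RSS values.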

4.3 Evaluation

We train our Wi-Fi Attention Network on the preprocessed dataset and compare it with other models. The hyperparameters, shown in Table 1, are tuned on the validation set. In our model, we set the LSTM dimension to 100, so the bidirectional LSTM produces a 200-dimensional vector. We set the Wi-Fi word embedding dimension to 100. For training, we use the RMSProp optimizer with a batch size of 128, 20 training epochs and a learning rate of 0.001.

Table 1. Hyperparameters of WAN.
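For reference, the reported settings can be collected in a small configuration, together with a sketch of one RMSProp update. The dictionary keys, the `rmsprop_step` helper, and the `decay` and `eps` defaults are assumptions; only the values in `HYPERPARAMS` come from the text above.

```python
import math

HYPERPARAMS = {
    "lstm_dim": 100,        # per direction; the Bi-LSTM output has 200 dimensions
    "embedding_dim": 100,
    "batch_size": 128,
    "epochs": 20,
    "learning_rate": 0.001,
}


def rmsprop_step(params, grads, cache, lr, decay=0.9, eps=1e-8):
    """Apply one RMSProp update in place and return the updated parameters."""
    for k, g in enumerate(grads):
        # Running average of squared gradients (decay and eps are assumed values).
        cache[k] = decay * cache[k] + (1.0 - decay) * g * g
        params[k] -= lr * g / (math.sqrt(cache[k]) + eps)
    return params


# Toy parameters and gradients, invented for illustration.
params, cache = [1.0, -2.0], [0.0, 0.0]
rmsprop_step(params, [0.5, -0.5], cache, HYPERPARAMS["learning_rate"])
```

Each parameter moves against its gradient by a step scaled by the running root-mean-square of recent gradients.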

To demonstrate the effectiveness of our model, we compare it with several methods, including traditional approaches such as KNN (Bahdanau et al. 2014) and SVM (Zhu et al. 2013), and other neural network models without an attention structure, namely LSTM and GRU (Bahdanau et al. 2014). The experimental results are shown in Table 2, where we refer to our model as WAN (Wi-Fi Attention Network). The results show that WAN gives the best performance on this dataset in precision, recall and F1 score. From Table 2 we can see that WAN outperforms the traditional KNN and SVM models by 5.0% and 4.2% in precision, respectively. More importantly, WAN improves over LSTM by 1.5% and over GRU by 1.4% in precision.

Table 2. Results of the different architectures.
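Precision, recall and F1 score for a multi-class task can be computed as in the sketch below. Macro-averaging over classes is an assumption; the averaging scheme used for Table 2 is not stated. The function name and the toy labels are hypothetical.

```python
def macro_precision_recall_f1(y_true, y_pred):
    """Return macro-averaged precision, recall and F1 over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy labels: two shops, four samples.
y_true = ["s1", "s1", "s2", "s2"]
y_pred = ["s1", "s2", "s2", "s2"]
precision, recall, f1 = macro_precision_recall_f1(y_true, y_pred)
```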

This result supports our hypothesis that better performance can be obtained by incorporating knowledge of the Wi-Fi fingerprint sequence structure into the WAN architecture. The intuition underlying WAN is that not all parts of a Wi-Fi fingerprint sequence are equally relevant for indoor positioning, and that determining the significant parts involves modeling the interactions between the Wi-Fi words, not just their presence in isolation. That is to say, if we sort the APs by RSS value, each AP is closely related to its neighboring APs. This relationship helps improve the performance of Wi-Fi fingerprint positioning.

5 Conclusion

In this paper, we proposed Wi-Fi Attention Networks (WAN) for Wi-Fi fingerprint indoor positioning. In WAN, a bidirectional LSTM is used to obtain a representation vector for each Wi-Fi word by summarizing its contextual information. Then, an attention mechanism is used to identify the Wi-Fi words that are important to the representation of the Wi-Fi sequence and to produce a high-level representation vector. Finally, a fully connected network is used for classification. Experimental results demonstrate that WAN performs better than the other models on the publicly available dataset.