1 Introduction

Positioning, or localization, is one of the most actively researched domains in recent years. GPS-based navigation (Ishikawa et al. 2008) performs exceptionally well for outdoor localization, where satellites are used to calculate the geographic position of the device. However, experiments have shown that GPS signal strength decreases by about 10–12 decibels as the device enters an indoor environment. Hence, researchers have proposed various localization models to position devices in indoor environments. Sensors are used practically everywhere; a smartphone alone contains numerous sensing sub-devices. It is crucial to develop modeling approaches for the positioning and location identification of such devices to serve various application-specific needs. Indoor localization systems have a wide range of applications; some of the most important domains are robotics (Montemerlo and Thrun 2007), augmented reality (Paucher and Matthew 2010), navigation systems (Indoor), tourism, smart homes, disaster rescue operations, and many more.

The technologies primarily used in the development of indoor localization are WiFi-based, Bluetooth-based, vision-based, and LoRa techniques, among others. Each technology has its own advantages and disadvantages depending on the environment. WiFi-based approaches mostly have a maximum range of 40–100 m but are prone to noise. Bluetooth, on the other hand, has a range of 100 m with lower localization accuracy. Vision-based approaches require heavy processing and are mainly used for surveillance (Mao 2009). LoRa is a spread-spectrum modulation technique used for low-power, long-range wireless transmission (Mroue et al. 2018). Its main advantages are a large reception range and low energy consumption, but its disadvantage is the same as that of WiFi-based approaches. No single approach generalizes across locations; the deployment environment plays a key role in the selection of a technique. Inertial-based localization is gaining importance, and researchers are working with Inertial Measurement Units (IMU) (Skog 2006) to achieve indoor positioning. The basic IMU components are the accelerometer, gyroscope, and magnetometer, which are widely used in many real-world applications, but their precision is limited by cumulative errors caused by IMU drift.

It has been estimated that the indoor localization market will be worth more than 20 billion dollars in the near future (El-Sheimy and Li 2021). However, localization in an indoor environment comes with its share of challenges. Very few research papers (Harle 2013) provide a survey on inertial sensing through the pedestrian dead reckoning approach. An extensive study of the RSS fingerprinting-based approach with a focus on inertial heading estimation is given by Davidson and Piché (2016). Another highly cited work is presented by Yang et al. (2015), discussing the intricate hardware components and the types of sensors used. Their work also focuses on how the mobility of smartphones aids localization. The majority of past surveys focus on listing works on human activity recognition (Chen et al. 2017; Mimouna and Khalifa 2021). Buke et al. (2015) presented a survey covering the various works on healthcare monitoring using inertial sensors. We have also observed that surveys focused on smartphone IMU-based indoor positioning are scarce, although numerous surveys focus only on Pedestrian Dead Reckoning (PDR) (Yuan et al. 2019). Thus, our motive behind this survey is to provide the reader with a systematic overview of smartphone-based localization approaches, discussing the challenges of smartphone IMUs and how current research tackles them by combining various smartphone sensors. The present article briefly describes the working methodology and techniques, and presents a comparative study of various works addressing indoor localization, with a major focus on smartphone IMU-based approaches. The overview of the paper and our key contributions are as follows:

  • The paper provides a brief overview of the inertial measurement approaches in indoor positioning, focusing on their working and how they aid in localizing.

  • The work discusses the various issues and challenges in smartphone IMU and highlights the work done using Machine Learning-based approaches in tackling the challenges.

  • The paper agglomerates various research on IMU approaches and presents a comparative study of the discussed approaches. Open issues are also discussed.

The paper is organized as follows. Section 2 presents a literature survey in the domain of Indoor Localization (IL) using the commonly used technologies. Section 3 gives an overview of the use of inertial sensors in localization and discusses the issues and challenges. Section 4 lists the machine learning-based approaches that address the challenges of smartphone IMUs. An agglomeration of the last 10 years of work in the indoor positioning domain is covered in Sect. 5. Finally, the work is concluded in Sect. 6.

2 Literature review

Indoor positioning techniques have become increasingly essential and are used in many industries where positioning is required. Common positioning technologies include Bluetooth Low Energy (BLE), WiFi, UWB, and LoRa, on which techniques such as RSS-based fingerprinting and TDoA-based approaches are carried out for localization.

2.1 RSS based approaches

2.1.1 WiFi fingerprinting

In indoor environments, WiFi is one of the most ubiquitous technologies. In this approach, the first objective is to collect the Received Signal Strength (RSS) fingerprints. An RSS fingerprint is the set of received WiFi signal strengths from various Access Points (APs) in a particular area. The RSS recorded from the different WiFi APs at a specific location is stored in a database. A simple example of a WiFi fingerprint is depicted in Table 1, where the signal values in decibels are recorded against each AP for a particular location. There is much literature (Panja et al. 2021; Yuanchao et al. 2015) on WiFi fingerprinting-based localization, but the major problem with this procedure is performing the site survey and creating the Radio Map (RM). Another significant difficulty in this approach is the calibration time.

Table 1 Fingerprint records from multiple APs

A calibration-less indoor positioning approach is given by Ficco (2014). The sole objective of the approach is to reduce the manual effort involved in the site survey. A pre-processing step for radio map modeling is carried out by drawing a virtual rectangular grid. For each cell of the grid, the WiFi fingerprint is calculated (Fig. 1).

Fig. 1 WiFi Fingerprinting

A workflow overview for the radio map construction can be found in (Ficco 2014). For every cell \(c_j\), the distance from the center of the cell to the sensor or any mobile device is computed. A function that computes the objects in the line of sight from the centre to the device is also evaluated. Taking these parameters into consideration, an average RSS value for the cell \(c_j\) is calculated and stored as fingerprint data for that cell in a vector \(F_{i}=\{f_{1}, f_{2},\dots ,f_{n}\}\), where \(F_{i}\) is the WiFi fingerprint for cell i from the n APs. As a person traverses from one cell to another, the position of the device is estimated by computing the Euclidean distance (Eq. 1) between the measured RSS values and the recorded RSS fingerprint for each cell, mathematically denoted by:

$$\begin{aligned} d=\sqrt{\sum _{k=1}^{n} (ss_k-f_{i,k})^2} \end{aligned}$$
(1)

where \(ss_k\) is the measured RSS value from AP k and \(f_{i,k}\) is the recorded RSS value for cell i from AP k. The authors' motivation was to compare the accuracy of their predictions against models that require calibration. They claim to have achieved an accuracy of 1.5 m.
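The matching step of Eq. 1 can be sketched as a nearest-fingerprint lookup. This is a minimal illustration rather than the cited implementation: the function names, the cell identifiers, and the RSS values in `radio_map` are all hypothetical.

```python
import math

def fingerprint_distance(measured, fingerprint):
    """Euclidean distance of Eq. 1 between a measured RSS vector and a stored fingerprint."""
    if len(measured) != len(fingerprint):
        raise ValueError("both vectors must list the same APs in the same order")
    return math.sqrt(sum((s - f) ** 2 for s, f in zip(measured, fingerprint)))

def locate(measured, radio_map):
    """Return the cell whose stored fingerprint is closest to the measurement."""
    return min(radio_map, key=lambda cell: fingerprint_distance(measured, radio_map[cell]))

# Hypothetical radio map: cell id -> RSS (dBm) recorded from three APs.
radio_map = {
    "c1": [-40.0, -70.0, -80.0],
    "c2": [-60.0, -50.0, -75.0],
    "c3": [-80.0, -65.0, -45.0],
}

# A new measurement is assigned to the cell with the smallest distance d.
print(locate([-58.0, -52.0, -77.0], radio_map))
```

The lookup assumes every fingerprint lists the same APs in the same order; a missing AP would need a placeholder value before the distance is meaningful.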

Fig. 2 Received Signal Strength (RSS) variation with movement

One of the significant difficulties in designing an indoor localization system using WiFi is the variation in RSS values due to different environmental features. Factors such as temperature, humidity, and mobile hot spots attenuate signals. Furthermore, the variation of RSS with distance is not linear. The relationship between the RSSI and distance (Botta and Simek 2013) is given by Eq. 2.

$$\begin{aligned} d= 10^{(P_0-F_m-P_r-10 \times n \times \log _{10} f+30 \times n-32.55)/(10 \times n)} \end{aligned}$$
(2)

Here \(F_m\) is the fade margin, n is the path-loss exponent, \(P_r\) is the signal power in dBm, and f is the signal frequency in Hz. A simple variation is displayed in Fig. 2. The signal strength decreases abruptly as the device moves away from the AP but, at a certain distance, it is found to increase again; hence, location prediction accuracy also varies. Fig. 2 shows that as the user moves away from the AP, there is a no-wall region after a certain distance. The attenuation caused by the wall is therefore reduced, leading to an increase in the RSS value.
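To make the non-linearity of Eq. 2 concrete, the following sketch evaluates the equation exactly as printed. The function name and all numeric inputs are placeholders chosen for illustration, and the frequency is passed through in whatever unit the cited formula expects.

```python
import math

def distance_from_rss(p0, fade_margin, p_r, freq, n, const=32.55):
    """Evaluate Eq. 2 as printed: distance from received power p_r (dBm)."""
    exponent = (p0 - fade_margin - p_r
                - 10 * n * math.log10(freq) + 30 * n - const) / (10 * n)
    return 10 ** exponent

# Placeholder inputs; only p_r differs between the two calls.
near = distance_from_rss(p0=20.0, fade_margin=10.0, p_r=-50.0, freq=2400.0, n=2.0)
far = distance_from_rss(p0=20.0, fade_margin=10.0, p_r=-70.0, freq=2400.0, n=2.0)

# A drop of 10*n dB in received power multiplies the estimated distance by ten.
print(near, far, far / near)
```

Because the received power sits in the exponent, a small RSS fluctuation at long range changes the distance estimate far more than the same fluctuation close to the AP.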

Roy et al. (2019) have proposed a smartphone-based localization approach that gathers WiFi fingerprints that vary across the spatial and temporal domains and are device-dependent as well. A WiFi data collector is used to record the RSS values at each grid location. Virtual grids are constructed for each floor of a building, where each grid cell has an area of 1 \(\times \) 1 m. The recorded data is pre-processed and assigned a class with respect to a location. The data is divided into training and test examples and fed to a supervised machine learning classifier. The authors have worked with classifiers such as the J48 decision tree, KNN, K*, Bayes Net, and SVM, with accuracy and error in meters as the evaluation metrics. The experiments were carried out using various mobile devices. The training examples were recorded over a few weeks, after which the test data were recorded for prediction analysis. Finally, the authors modeled a kNN-based conditional ensemble approach that considers each test data set and compares the prediction results of the individual classifiers. For example, if most of the classifiers have assigned the RSS data to a particular cell \(c_i\), then the output is taken as \(c_i\). With the integration of the ensemble approach, the authors claim to have reached a prediction accuracy of 91%. Kumar et al. (2022) proposed a feature-based training pipeline focusing on reducing the number of APs and proposing a feature-based ensemble approach. The proposed work, as claimed by the authors, is capable of giving appreciable accuracy for any dynamic floorplan, with a mean absolute error of 2.68 m. Another novel ensemble-based approach, proposed in (Roy et al. 2021), is based on the Dempster-Shafer (Shafer 1992) belief theory. The authors have tested the performance of their model on the JUIndoorLoc dataset, using different training and testing contexts and devices. The tested accuracy is shown to be more than 95%.
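The idea of taking the cell that most classifiers agree on can be illustrated with plain majority voting. This is a simplification and not the kNN-based conditional ensemble of Roy et al. (2019); the function name and the example predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequently predicted cell; ties go to the label seen first."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of five classifiers for one RSS sample.
predictions = ["c4", "c7", "c4", "c4", "c2"]
print(majority_vote(predictions))
```

A conditional ensemble would additionally decide, per test sample, which classifiers to trust, which plain voting does not capture.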

Félix et al. (2016) have proposed a similar localization approach using Deep Learning. The radio maps of each floor location are recorded and stored on a server. When a user requests a location from an unknown position x, the signal values from all APs are recorded and sent as a vector to the server, which calculates the unknown location x. It has been observed that in an environment subject to persistent changes, either in the objects present or in the WiFi source itself, it becomes challenging for a classifier to maintain high prediction performance. Deep learning algorithms, on the other hand, can identify high-level features and learn from them. The authors have considered a 40 \(\times \) 15 m floor with 16 rooms and 6 APs. In the same manner as (Roy et al. 2019), virtual grids are formed, and 80 reference points spaced 2 m apart are considered. The authors have executed the experiments using a Deep Belief Network (DBN), a Deep Neural Network (DNN), and a Gaussian-Bernoulli DBN and presented a comparative study of the prediction outputs. They observed that the DNN achieves a localization error of 1.00598 m, while the other two achieve 2 m.

2.1.2 Bluetooth low energy

Another RSS-based alternative is Bluetooth Low Energy (Gomez et al. 2012). Apple Inc. has revolutionized smartphone-based indoor positioning with a BLE technology called iBeacon. With the advent of BLE technology, energy consumption is vastly reduced, and hence the system can operate efficiently without relying on external power supplies. BLE operates in the license-free 2.4 GHz band. Messages are sent in BLE in a concise, flexible manner. The main motive behind using BLE beacons rather than WiFi is their easy deployment owing to their size, and they allow suitable signal geometries for radio positioning. BLE tags are used for indoor localization with the utilization of RSS and AoA.

A simple BLE-based approach is given by Kalbandhe and Patil (2016), where the RSS from BLE tags is utilized for indoor positioning. They have used the BLE CY8CKIT-042 (Kalbandhe and Patil 2016), whose RSS is measured. The BLE tags are placed at fixed locations on the floor and transmit signals at fixed intervals. A mobile device captures these signals, and an Android-based application records the RSS values from the BLE tags. Positioning is done by estimating a distance parameter from the measured RSS values of the BLE tags in an area; this distance parameter indicates the position of the mobile device on the floor. The accuracy is evaluated with respect to signal attenuation and noise that increases with signal strength.

Another RSS-based approach is proposed by Huh and Seo (2017), where the signal strength of Bluetooth tags is used with a range average algorithm to estimate the position. The authors divided the area into multiple unit spaces of hexagonal structure. Bluetooth beacons are placed at each corner of a hexagon, and another beacon is placed in the middle. The authors modeled the path loss as a function of the distance between the transmitter and receiver, as given in Eq. 3, where \(T_x\) is the transmission signal strength, n is the path-loss exponent, and d is the distance between the transmitter and the receiver. The trilateration (Thomas and Ros 2005; Yang et al. 2010) procedure is used to estimate the position in a hexagon. An Android application records RSSI values from the various Bluetooth beacons and passes the data to an indoor location-based server. The server estimates the location using trilateration by selecting the target points as depicted in (Huh and Seo 2017). After estimating the position, a coordinate management module displays the mobile device location in a GUI.

$$\begin{aligned} RSSI= -(10n\log _{10}d-T_{x}) \end{aligned}$$
(3)
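Inverting Eq. 3 gives a distance per beacon, and three such distances can be combined by trilateration. The sketch below assumes ideal, noise-free ranges and three non-collinear beacons; the function names, beacon coordinates, and the values of `tx_power` and `n` are hypothetical and not taken from Huh and Seo (2017).

```python
import math

def rssi_to_distance(rssi, tx_power, n):
    """Invert Eq. 3: RSSI = T_x - 10 n log10(d)  =>  d = 10^((T_x - RSSI) / (10 n))."""
    return 10 ** ((tx_power - rssi) / (10 * n))

def trilaterate(b1, b2, b3, d1, d2, d3):
    """Solve for (x, y) by subtracting the circle equations of beacons 2 and 3 from beacon 1."""
    (x1, y1), (x2, y2), (x3, y3) = b1, b2, b3
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    r1 = d1**2 - d2**2 - x1**2 + x2**2 - y1**2 + y2**2
    r2 = d1**2 - d3**2 - x1**2 + x3**2 - y1**2 + y3**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("beacons are collinear; position is not unique")
    return (r1 * a22 - a12 * r2) / det, (a11 * r2 - a21 * r1) / det

# Hypothetical setup: three beacons and a device at (1, 2).
beacons = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
true_position = (1.0, 2.0)
tx_power, n = -59.0, 2.0

# Simulate the RSSI each beacon would produce under Eq. 3, then invert it.
rssi = [tx_power - 10 * n * math.log10(math.dist(true_position, b)) for b in beacons]
distances = [rssi_to_distance(r, tx_power, n) for r in rssi]
print(trilaterate(*beacons, *distances))
```

With real measurements the three circles rarely meet in a single point, which is why range averaging or a least squares fit over more beacons is normally used.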

2.2 Time difference based approaches

Ultra Wide Band (UWB) (Siwiak 2001) based approaches are gaining the limelight in the recent technological era. For devices within short range, UWB uses low energy with high bandwidth for communication. Hence, a large amount of data can be transferred over a wide spectrum of frequency bands. UWB broadcasts digital signals that are coordinated on a carrier signal across a vast spectrum at the same time. The transmitter and receiver must be coordinated to send and receive pulses accordingly. Time Difference of Arrival (TDoA) (Cong and Zhuang 2002) is a technique in which the time of arrival of a signal is measured at receiving stations that are physically positioned at different locations and share a synchronized time reference. TDoA-based approaches provide better accuracy for range-based technologies such as UWB. Although TDoA works better with range-based methods, synchronization is required between the transmitter and receiver before data transmission occurs. Figure 3 depicts synchronization between the transceivers as \(R_x+T_x\).

Fig. 3 Synchronization in TDOA

A time-hopping impulse radio-based (Bergel et al. 2002) localization approach has been proposed by Zhang et al. (Zhang and Ahao 2005), who applied TDoA to multiple antennas. The anchor antennas are physically placed at different locations. A signal from the object’s position is measured at more than one receiver. The time difference between receivers is converted into hyperboloids (H1, H2, H3), as depicted in Fig. 4, each corresponding to a constant difference in distance to a pair of receivers. The receivers’ clocks must be synchronized to estimate the position.

Fig. 4 Hyperboloid representation of TDoA based Positioning
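The hyperbolic constraint can be illustrated in 2D: each TDoA fixes the difference in distance to a pair of anchors, and the emitter lies where these curves intersect. The sketch below finds that intersection by brute-force grid search, which is an illustrative stand-in and not the method of the cited works; the anchor layout, room size, and function names are assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def range_difference(point, anchor, reference):
    """Difference in distance (m) from `point` to `anchor` and to `reference`."""
    return math.dist(point, anchor) - math.dist(point, reference)

def locate_by_tdoa(anchors, tdoas_s, size_m=10.0, step_m=0.1):
    """Grid search for the point whose range differences best match the TDoAs.

    `tdoas_s[k]` is the arrival time at anchors[k + 1] minus that at anchors[0].
    """
    reference, others = anchors[0], anchors[1:]
    targets = [C * t for t in tdoas_s]  # convert seconds to metres
    cells = int(round(size_m / step_m)) + 1
    best_point, best_error = None, float("inf")
    for i in range(cells):
        for j in range(cells):
            point = (i * step_m, j * step_m)
            error = sum((range_difference(point, a, reference) - t) ** 2
                        for a, t in zip(others, targets))
            if error < best_error:
                best_point, best_error = point, error
    return best_point

# Hypothetical 10 m x 10 m room with synchronized anchors in the corners.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_position = (3.0, 7.0)
tdoas = [range_difference(true_position, a, anchors[0]) / C for a in anchors[1:]]
estimate = locate_by_tdoa(anchors, tdoas)
print(estimate)
```

The time differences involved are on the order of nanoseconds, which is why clock synchronization between the receivers matters so much in practice.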

Gentner and Jost (2013) proposed an indoor positioning approach using multipath propagation, with a focus on localizing when the number of receivers is insufficient. Virtual transmitters/receivers are considered, which are physically separated from each other. The authors utilized the Simultaneous Localization and Mapping (SLAM) (Gamini et al. 2001) algorithm, where the virtual transmitters are treated as landmarks; hence, the positions of the receiver and the landmarks are found at the same time. The algorithm first estimates various parameters of multipath propagation, such as the angle of arrival, amplitude, and transmission delay. TDoA (Friedman et al. 1989) is used to calculate the time difference between the receiver and the object in the multipath components. A Kalman Filter (Welch and Bishop 1995), which is a linear quadratic estimator (LQE), is used to track the time-variant behaviour of the multipath parameters. A time difference estimate between the object and the transmitter/receiver defines a hyperbola, as depicted in Fig. 4, on which the object is located, with foci at the transmitters. The SLAM approach calculates the positions of the virtual transmitters.
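The role of the Kalman Filter in tracking a slowly varying parameter can be shown with a scalar example. This is a generic one-dimensional filter with a random-walk state model, offered as an illustration only; it is not the multipath tracker of Gentner and Jost (2013), and the function name, noise variances, and measurements are assumed values.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter with a random-walk model; returns the filtered estimates."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var        # predict: uncertainty grows between samples
        k = p / (p + meas_var)     # Kalman gain: how much to trust the new sample
        x = x + k * (z - x)        # update the estimate with the innovation
        p = (1.0 - k) * p          # update the estimate variance
        estimates.append(x)
    return estimates

# Hypothetical noisy observations of a parameter whose true value is about 5.
measurements = [5.3, 4.8, 5.1, 4.9, 5.2, 5.0, 4.7, 5.1]
print(kalman_1d(measurements, process_var=1e-4, meas_var=0.1))
```

A small `process_var` tells the filter that the parameter changes slowly, so the estimates settle and individual noisy samples have less influence over time.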

Xue et al. (2018) proposed a TDoA model that considers asynchronous UWB signals, i.e., a time difference-based approach without any synchronization. A one-way ranging model is proposed that makes use of a reference node. The deployed sensor nodes are anchor nodes, reference nodes, and target nodes whose positions are to be predicted, as depicted in Fig. 5.

Fig. 5 TDOA without Synchronization

The anchor nodes receive signals from both the target and the reference nodes. The positions of the anchor nodes and the reference nodes are known; only the target nodes have to be localized on the floor. The anchor nodes, as depicted in Fig. 5, record the time stamps of UWB signals coming from the reference nodes and target nodes and send this information to a server, which determines the interval of arrival between the two signals. Interpolation is applied to determine the mapping values without requiring clock synchronization of the anchor nodes. The time difference between the signals received at the anchor nodes from the target and the reference nodes is based on these mapping values. The authors in (Xue et al. 2018) estimated the positions of the target nodes by applying the least squares approach to the time differences.

2.3 RTT and AoA based approaches

WiFi Round Trip Time (RTT) is one of the most popular approaches for estimating position in an indoor environment. A round trip comprises the sum of the time required for the data packets to reach the destination and the time for the acknowledgement to be received at the source. With a multilateration algorithm, the distance from an AP, and hence the location of a mobile device, can be identified for a particular floorplan. Arrue et al. (2010) present a localization approach using Impulse Radio Ultra Wide Band (IR-UWB) in which distances are estimated using the RTT mechanism. The methodology follows two steps: in the first step, the distances from the fixed transceivers or APs are estimated using the RTT ranging method; in the second step, positioning is carried out using the Least Squares method (Chen et al. 2005). The authors claim to have achieved appreciable accuracy in a room setup. Cao et al. (2020) proposed an RTT-based approach that addresses the 3D positioning problem for a given floormap. The authors investigated approaches such as Weighted Centroid (Wang et al. 2011) and Least Squares methods and compared them with their proposed metaheuristic approach. A combination of RSS-based and ranging-based approaches is given by Hashem et al. (2020a). The proposed approach is able to function without any clock synchronization in the RTT mechanism, and the reported localization error is 0.86 m. The same authors have published another RTT-based positioning approach that applies Deep Learning methods, named DeepNar (Hashem et al. 2020b). This approach claims to address the multipath interference and attenuation problems with a sub-meter accuracy of 0.75 m.
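The two steps, ranging from the round-trip time and then positioning, can be sketched as follows. The ranging formula follows directly from the definition of a round trip; the positioning step uses an inverse-distance weighted centroid, which is only one common weighting and is assumed here rather than taken from the cited papers. The function names, AP layout, and the processing-delay parameter are hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s

def rtt_to_distance(rtt_s, processing_delay_s=0.0):
    """One-way distance (m): half the round-trip time, after removing the responder delay."""
    time_of_flight = (rtt_s - processing_delay_s) / 2.0
    if time_of_flight < 0:
        raise ValueError("processing delay exceeds the measured round-trip time")
    return C * time_of_flight

def weighted_centroid(ap_positions, distances):
    """Average the AP positions, giving nearer APs (smaller distance) more weight."""
    weights = [1.0 / max(d, 1e-6) for d in distances]
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, ap_positions)) / total
    y = sum(w * p[1] for w, p in zip(weights, ap_positions)) / total
    return x, y

# Hypothetical APs in the corners of a 4 m x 4 m room and their measured RTTs.
aps = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
rtts = [2 * d / C for d in (1.5, 3.0, 3.0, 4.2)]
distances = [rtt_to_distance(t) for t in rtts]
print(distances, weighted_centroid(aps, distances))
```

Since light covers roughly 0.3 m per nanosecond, a timing error of a few nanoseconds already shifts the range by about a metre, which is why the responder delay has to be known or calibrated.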

Angle of Arrival (AoA) is another popular positioning method in which the direction of signal reception is estimated. BniLam et al. (2017) proposed positioning an IoT transceiver on a given floorplan using adaptive beamforming AoA estimation. The experiment was conducted in a 6.45 m \(\times \) 9.47 m room, where the authors obtained appreciable accuracy in the middle of the room, which gradually degraded near the walls. A fusion of RTT- and AoA-based approaches is given by Dakkak et al. (2011). The authors applied RTT to address the problem of time synchronization. The mobile stations are localized using a coordinate clustering mechanism with respect to the base stations deployed for a particular floormap.

3 Inertial sensor based localization

RSS, TDoA, Round Trip Time (RTT), AoA, etc. are some of the major localization approaches used in indoor positioning. An article by Melamed (2016) of IBM labs covers the challenges faced by the above-mentioned approaches in indoor positioning. Some of the strengths and weaknesses of the positioning techniques are covered in Table 2. The core part of any localization approach is the sensors. In the present era, smartphone-based approaches are gaining importance, as smartphones are bundled with fundamental sensors such as accelerometers, gyroscopes, magnetometers, and WiFi transceivers. Approaches like RSS, RTT, and TDoA are mainly designed for systems built on radio frequency (RF), optical, or acoustic transceivers. These frameworks are highly dependent on the base stations or access points. The range and power of the stations are other restricting factors in the development process. Furthermore, the RF- or optical-based approaches mainly focus on localizing an individual or device on a particular floormap. To navigate on the selected floormap, inertial measurement unit (IMU) sensors and their associated approaches need to be explored. IMU-based approaches can also be combined with existing technologies such as WiFi and Bluetooth to produce fusion-based approaches with better accuracy.

Table 2 Overview of strengths and weaknesses of some wireless techniques

3.1 Framework of inertial sensing for ILS

IMUs (Skog 2006) are widely used in numerous applications throughout the globe. The fundamental sensors that make up an IMU are the accelerometer, gyroscope, and magnetometer. Bosch Sensortec manufactures a wide range of IMU sensors for advanced consumer electronics applications in smartphones, watches, etc. Some of the products are the BMI270 and BMI088. These sensors provide data of moderate to good quality, which can be used for further analysis. For indoor localization systems, one major focus lies in gathering data from groups of people on a particular floorplan, also termed crowd sensing (Chenshu et al. 2014). With the advent of smartphones, researchers have created various procedures that utilize the mobility of the smartphone for the purpose of localization. Smartphones come with IMU components that can be used to gather data to provide localization, create trajectories, etc.

IMU sensors are used both in localization and in indoor-outdoor detection (Ubiquitous 2021). The fundamental area of positioning, i.e. the floor on which localization has to be done, is also called the floorplan. Floorplan development can be broadly classified as manual or automatic. Developing a manual floorplan requires a great amount of active human interaction for estimating the floor. Researchers are, on the other hand, working on various automatic floorplan approaches in which technologies are utilized to estimate a floor structure. IMU-based approaches are widely used in developing floorplans (Shin et al. 2011; Peng et al. 2018) using smartphones. The users’ trajectories can be analyzed by reading the IMU data. Such floor layout construction using real-time IMU data is called building tomography (Tan et al. 2017). A simple overview of floor construction using IMU data is given in Fig. 6. Here, the mobility of a person is used to gather data about the floor. A change in direction might indicate a corner, while a long stretch without any turning can indicate a corridor, and so on. The change in direction is estimated using the gyroscope angular velocity measured along the three axes. In the figure, a sharp downward change in the gyroscope value along the Z-axis indicates a right turn; here Z is the dominant axis. The same change can be observed along the other axes (X and Y), but with smaller amplitude. When IMU data comprising accelerometer, gyroscope, and magnetometer readings are considered, the positioning has to be carried out in the generated floorplan itself. It is well known that accelerometer data is noisy, which influences the output, so errors may increase rapidly. These errors have to be filtered and the drift has to be corrected. The procedure by which the position of a device is calculated in a floor plan, using the current position and velocity to determine the next position, is called Dead Reckoning (Steinhoff and Schiele 2010). A generalized IMU-based positioning framework is given in Fig. 7.
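Turn detection from the Z-axis gyroscope can be sketched by integrating the angular velocity over a short window and thresholding the accumulated angle. The sign convention (negative Z rotation meaning a right turn, matching the downward change described above), the 45 degree threshold, the sampling interval, and the function names are all assumptions made for illustration.

```python
def heading_change_deg(gyro_z_dps, dt):
    """Integrate Z-axis angular velocity samples (deg/s), taken every dt seconds."""
    return sum(w * dt for w in gyro_z_dps)

def classify_turn(gyro_z_dps, dt, threshold_deg=45.0):
    """Label a window as a right turn, a left turn, or straight walking."""
    change = heading_change_deg(gyro_z_dps, dt)
    if change <= -threshold_deg:
        return "right"
    if change >= threshold_deg:
        return "left"
    return "straight"

# Hypothetical 1 s window sampled at 50 Hz: a steady -90 deg/s rotation.
turn_window = [-90.0] * 50
# Hypothetical corridor segment: small jitter that cancels out.
corridor_window = [1.0, -1.0] * 25

print(classify_turn(turn_window, dt=0.02), classify_turn(corridor_window, dt=0.02))
```

A sequence of such labels, combined with the distance walked between turns, is the kind of trace from which corners and corridors of a floor layout can be inferred.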

Fig. 6 Floor estimation using Inertial Sensors

Fig. 7 Inertial Measurement Based Positioning Framework

3.2 Pedestrian dead reckoning

Pedestrian Dead Reckoning (PDR) (Beauregard and Haas 2006) is a fundamental localization procedure for positioning pedestrians on a particular floor plan using IMU data. Any simple Dead Reckoning system consists of the following procedures: step detection, step length estimation, and direction estimation. Dead Reckoning starts from a known position in the floor plan. Stride or step detection is usually carried out with the help of accelerometer data, and the direction is estimated using a gyroscope, a magnetometer, or a combination of both. Nowadays, almost every individual has a smartphone. Smartphones are equipped with inertial sensors in the form of Microelectromechanical systems (MEMS) (Farbod and de Silva 2012; Ashraf et al. 2020) sensors. With the need for efficient motion tracking and positioning, the MEMS sensors inside smartphones can be utilized for the development of various applications.

When modeling a localization technique using the inertial measurement unit in a smartphone, one of the prime factors is step length detection. When a user walks, an impulse is noted along the X, Y, and Z axes of the accelerometer according to the orientation. A simple step length estimate considers the difference between adjacent accelerometer readings on the axes. Continuous variation in the accelerometer readings is a firm indicator of a step. The raw accelerometer measurements \(Acc_x,Acc_y,Acc_z\) across the three axes are gathered, and the magnitude of the accelerometer data, given in Eq. 4, is usually used for processing.

$$\begin{aligned} Mag_{Acc}=\sqrt{{Acc_x}^2+{Acc_y}^2+{Acc_z}^2} \end{aligned}$$
(4)

As a person moves, a step is detected by monitoring the crests and troughs of the accelerometer signal. A simple step detection procedure is given by Abadleh et al. (2017). The magnitude of the acceleration is calculated using Eq. 4. A running average \(Mean_{Acc}\) is calculated over the magnitude. A net acceleration vector is formed by subtracting \(Mean_{Acc}\) from the \(Mag_{Acc}\) vector, i.e. \(Net_{Acc}=Mag_{Acc}-Mean_{Acc}\). This procedure is a simple average filtering performed to detect the peaks in the accelerometer readings. A vector \(\alpha \) (Eq. 5) is filled with one of three values per sample to distinguish the real peaks from the fake peaks.

$$\begin{aligned} \alpha _i= {\left\{ \begin{array}{ll} 0.5,&{} \text {if } Mag_{Acc(i)}<Mean_{Acc}\\ 1.0,&{} \text {if } Mag_{Acc(i)}=Mean_{Acc}\\ 1.5,&{} \text {if } Mag_{Acc(i)}>Mean_{Acc}\\ \end{array}\right. } \end{aligned}$$
(5)

A vector step (Eq. 6) is calculated by parsing through the \(\alpha \) vector; it holds binary data: 1 if a step is detected, 0 otherwise.

$$\begin{aligned} step_i= {\left\{ \begin{array}{ll} 1,&{} \text {if } \alpha _i=\alpha _{i+1}=\alpha _{i+2}=\alpha _{i+3}=1.5\\ 0, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(6)

Here, a peak, which indicates that a step has been taken, is detected only if four consecutive values in the \(\alpha \) vector, i.e., \(\alpha _i\), \(\alpha _{i+1}\), \(\alpha _{i+2}\), \(\alpha _{i+3}\), are equal to 1.5. This condition may be varied as required. Figure 8 illustrates the step detection process, where the magnitude vector \(Mag_{Acc}\) and the step vector are plotted.
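Equations 4 to 6 can be put together in a short sketch. Two simplifications are assumed here: the mean over the whole recording stands in for the running average, and a helper `count_steps` (not part of the described procedure) counts each run of consecutive 1s in the step vector as one step. The synthetic accelerometer samples are hypothetical.

```python
import math

def magnitude(samples):
    """Eq. 4: magnitude of each (Acc_x, Acc_y, Acc_z) sample."""
    return [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples]

def alpha_vector(mag, mean_acc):
    """Eq. 5: 0.5 below the mean, 1.0 at the mean, 1.5 above the mean."""
    return [0.5 if m < mean_acc else 1.5 if m > mean_acc else 1.0 for m in mag]

def step_vector(alpha, run=4):
    """Eq. 6: 1 where `run` consecutive alpha values starting at i all equal 1.5."""
    steps = []
    for i in range(len(alpha)):
        window = alpha[i:i + run]
        steps.append(1 if len(window) == run and all(a == 1.5 for a in window) else 0)
    return steps

def count_steps(steps):
    """Count rising edges (0 -> 1) so that one long peak counts as a single step."""
    return sum(1 for prev, cur in zip([0] + steps, steps) if prev == 0 and cur == 1)

# Synthetic data: two real peaks (5 samples each) and one fake peak (2 samples).
low, high = (0.0, 0.0, 9.0), (0.0, 0.0, 11.0)
samples = [low] * 6 + [high] * 5 + [low] * 6 + [high] * 2 + [low] * 6 + [high] * 5 + [low] * 6

mag = magnitude(samples)
mean_acc = sum(mag) / len(mag)
alpha = alpha_vector(mag, mean_acc)
steps = step_vector(alpha)
print(count_steps(steps))
```

The two-sample spike exceeds the mean but never produces four consecutive 1.5 values, so it is rejected as a fake peak.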

Fig. 8 Step detection

In Fig. 8, the natural peaks can be distinguished from the fake peaks. Effective automatic step detection procedures based on peak detection in accelerometer readings are proposed by Ying et al. (2007) and Kim et al. (2004). Step length estimation can be classified as fixed or variable. If the step length is considered fixed, the complexity is lower than that of dynamic step length evaluation. The PDR approach estimates the position by combining the current position estimate with a step length (SL) and a heading angle. Eq. 7 gives a simple 2D estimate of the next step position.

$$\begin{aligned} \begin{bmatrix} x_i\\ y_i \end{bmatrix} =\begin{bmatrix} x_{i-1}\\ y_{i-1} \end{bmatrix}+SL_i \cdot \begin{bmatrix} \cos (h_i)\\ \sin (h_i) \end{bmatrix} \end{aligned}$$
(7)

Here \(h_i\) is the heading angle, \(SL_i\) is the step length, and x and y are the location variables. The heading angle \(h_i\) is calculated from the gyroscope reading. A turn is detected, as depicted in Fig. 6, where a visible crest or trough can be observed. More insight into heading estimation can be found in (Fischer et al. 2012). An overview of the PDR process is given as a block diagram in Fig. 9.
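The position update of Eq. 7 can be sketched as a simple accumulation over detected steps. A fixed step length of 0.7 m and headings given in radians are assumed for the example; the function names and the walk itself are hypothetical.

```python
import math

def pdr_step(position, step_length, heading_rad):
    """Eq. 7: advance the previous position by one step along the heading."""
    x, y = position
    return (x + step_length * math.cos(heading_rad),
            y + step_length * math.sin(heading_rad))

def pdr_track(start, steps):
    """Apply Eq. 7 repeatedly; `steps` is a list of (step_length, heading_rad) pairs."""
    track = [start]
    for step_length, heading_rad in steps:
        track.append(pdr_step(track[-1], step_length, heading_rad))
    return track

# Hypothetical walk: three steps along the x-axis, then two steps along the y-axis.
walk = [(0.7, 0.0)] * 3 + [(0.7, math.pi / 2)] * 2
track = pdr_track((0.0, 0.0), walk)
print(track[-1])
```

Because each position depends on the previous one, any error in step length or heading is carried forward, which is the accumulation problem that drift correction has to address.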

Fig. 9
figure 9

PDR Procedure Overview

3.3 Issues and challenges of smartphone IMU

Inertial Measurement Units come with their share of problems that must be dealt with before they can be used for navigation or localization purposes. The issues can be broadly classified as follows:

  • External Forces and Drifts: A simple IMU in a stationary position still measures forces in the inertial frame due to the earth’s gravity, which causes a drift in the IMU. Furthermore, the centrifugal force caused by the rotation of the earth causes a position error of 0.5 (Fischer et al. 2012). A slight bias drift of 0.1 deg/s is included to negate the rotation effect in many present MEMS devices. A collective error from alignments and linearities also causes a calibration error in the gyroscope reading, hence inducing a change in the gathered data.

  • Ferromagnetic Effect: Interference in magnetometer devices is caused by the presence of ferromagnetic substances in the walls and objects of the environment. It is usually classified as hard iron interference and soft iron interference.

  • Effects of Temperature: With temperature fluctuation, a bias is induced in the IMU that modifies the orientation reading.

  • Noise: The electronic MEMS devices inside smartphones are subject to random flickering or noise that makes the gyroscope wander over time. The variation of the noise affects the MEMS at low frequency.

The IMU sensors are not perfect. The measurements are corrupted by a constant bias, and integrating them produces a drift away from the actual reading that grows linearly with time. A bias can be defined simply as the difference between the input and the output value. When the temperature increases and the sensor overheats, the bias value changes in turn. Dead reckoning with an IMU double integrates the accelerometer data to obtain position and integrates the gyroscope data to obtain orientation. The magnetometer in the IMU measures the magnetic field at a particular location, and its data are fused with the gyroscope data to estimate the absolute orientation. When building a localization scheme using the IMU inside a smartphone, several challenges have to be dealt with. Some of the crucial ones are activity tracking, Zero Velocity Update (ZUPT), gait analysis, step estimation, heading and orientation, and device heterogeneity.
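The effect of a constant bias under integration can be illustrated numerically. The sketch below assumes a stationary device whose accelerometer reports only a constant bias; the bias value, sampling interval, and function name are made up for illustration. After one integration the velocity error grows linearly with time, and after the second integration the position error grows roughly quadratically.

```python
def integrate_bias(bias, dt, n_samples):
    """Integrate a constant accelerometer bias twice (simple Euler scheme).

    Returns a list of (velocity_error, position_error) pairs, one per sample.
    """
    velocity = 0.0
    position = 0.0
    history = []
    for _ in range(n_samples):
        velocity += bias * dt       # first integration: acceleration -> velocity
        position += velocity * dt   # second integration: velocity -> position
        history.append((velocity, position))
    return history


if __name__ == "__main__":
    # Hypothetical bias of 0.1 m/s^2 sampled at 100 Hz for 10 s.
    errors = integrate_bias(bias=0.1, dt=0.01, n_samples=1000)
    for seconds in (1, 5, 10):
        v, p = errors[seconds * 100 - 1]
        print(f"t={seconds:2d} s  velocity error={v:.3f} m/s  position error={p:.3f} m")
```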

3.3.1 Activity tracking and ZUPT

One of the significant challenges to overcome during localization with a smartphone IMU is activity tracking. The simplest form of activity detection is the step detection process itself. However, many more activity parameters affect localization than step detection alone. A person might be standing, running, jogging, or walking, and the gait changes accordingly, which affects the step length. One of the significant problems with inertial sensors is the detection of Zero Velocity (ZV) (Skog et al. 2010). Zero velocity occurs when the horizontal acceleration of a person is zero. This occurs both when standing still and when walking. During the walking phase, when one leg carries the whole body weight and the other leg swings to its next step position, a Zero Velocity occurs, which has to be corrected and updated in the gathered accelerometer data. This update is called the Zero Velocity Update (ZUPT) (Fischer et al. 2012). ZV can be corrected with the help of a Kalman Filter. Researchers have also proposed various approaches that gather knowledge about the human walking pattern (Fischer et al. 2012) to detect the stance ZV phase during walking.
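The idea behind ZUPT can be sketched with a simple threshold detector. This is not the Kalman Filter based correction mentioned above; it is a deliberately simplified stand-in in which a sample is flagged as zero velocity when the acceleration magnitude stays close to gravity for a short window, and the integrated velocity is reset to zero at flagged samples. The threshold, window size, and function names are assumptions chosen for illustration.

```python
GRAVITY = 9.81  # m/s^2


def zero_velocity_flags(acc_magnitude, threshold=0.3, window=3):
    """Flag samples where the last `window` magnitudes are all near gravity."""
    flags = []
    for i in range(len(acc_magnitude)):
        start = max(0, i - window + 1)
        recent = acc_magnitude[start:i + 1]
        still = len(recent) == window and all(
            abs(a - GRAVITY) < threshold for a in recent
        )
        flags.append(still)
    return flags


def integrate_with_zupt(forward_acc, flags, dt):
    """Integrate acceleration to velocity, resetting to zero at flagged samples."""
    velocity = 0.0
    velocities = []
    for a, still in zip(forward_acc, flags):
        velocity = 0.0 if still else velocity + a * dt
        velocities.append(velocity)
    return velocities


if __name__ == "__main__":
    magnitude = [9.8, 9.8, 9.8, 12.0, 9.8]
    print(zero_velocity_flags(magnitude))
```

Resetting the velocity at each detected stance phase prevents the velocity error from accumulating across steps.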

3.3.2 Gait analysis and step length estimation

The walking of an individual is not static. It changes with each step. Furthermore, the walking pattern or gait change is different for every individual. A simple overview of gait change can be observed in Fig. 10, where the step lengths of an adult man, a woman, and an older adult are given. Step length estimation is crucial to calculate the next step position in the localization process. The problem with a fixed step length is that the same predetermined length is assigned to every detected step regardless of how the person is actually walking. Hence, the localization accuracy decreases. Dynamic step length estimation (Shin et al. 2007) is preferred, as it takes into account various parameters such as gait change. Here, the step length is evaluated based on the pedestrian’s speed and current state.

Fig. 10 Varying gait change

3.3.3 Heading and orientation

In the case of smartphones, as individuals move, there are dominant axes regarding the smartphone’s orientation. The phone can have any orientation when held in hand, but it is mostly classified into three generalized modes: texting mode, swinging mode, and running mode, as depicted in Fig. 11. Identifying the orientation of the smartphone, or rather of the MEMS device, is a significant challenge that has to be dealt with before proceeding with localization. The three canonical orientations are \(\phi \), \(\theta \), \(\psi \) (roll, pitch, yaw), which are estimated with the help of gyroscope data.

According to the smartphone’s orientation, some amount of yaw, pitch, and roll is applied, which is used to predict the angle of rotation. Researchers have opted for fusing the data with the help of a Kalman Filter (Bayesian filter) (Barker Allen et al. 1995) or a Particle Filter (Gustafsson 2010) to better estimate the orientation and direction.

Fig. 11 Smartphone orientation

3.3.4 Device heterogeneity

The MEMS devices inside smartphones come from various manufacturers. The readings of the inertial sensors may vary from model to model. Hence, the bias value and the error correction also vary accordingly. Device heterogeneity is a significant challenge, so localization algorithms should be developed to work in a constraint-free manner. A survey of combinations of various approaches that address the challenge of device heterogeneity in smartphones is presented by Baldini and Steri (2017).

4 Application of machine learning in addressing the challenges in IMU

The interpretation and time integration are complex in the measurement process of an IMU, as the work has to be carried out in a moving coordinate system. Thus, as discussed in Sect. 3.3, the challenges pertaining to the measurement process have to be appropriately handled before localization can be carried out.

In this section, we discuss several machine learning-based approaches that address some of the generalized difficulties faced by smartphone IMU sensors. Machine learning methods have emerged as a promising new direction in the IMU domain. Stride length is one of the essential factors that control the drift error in IMU sensors. One of the significant challenges lies in separating the actual detected steps from the non-step signal readings. Sometimes, the body acceleration in the rest frame can be misread as a step, which can cause drift in the system.

Ngo et al. (2014) proposed collecting data of positive and negative steps and classifying it with a machine learning model. Features such as mean, standard deviation, and energy are evaluated from the three axes of the accelerometer reading without removing the gravity factor. The features are labeled with 12 locomotion classes comprising various activities, out of which 3 are positive locomotion, i.e., steps. The authors tested their collected data using SVM and Decision Tree classifiers and reported appreciable accuracy. Zhang et al. (2018) proposed a threshold-based peak detection and zero-crossing detection. The proposed approach adapts to different device orientations by using an Extreme Learning Machine (ELM), which aids in understanding the heading direction. The proposed PDR work is able to bring the error deviation down to the range of 1.17–1.73 m on their selected floor map. Yao et al. (2020) proposed a robust step estimation using the Random Forest classifier. The authors estimated features such as mean absolute error, kurtosis, mean square frequency, correlation, and so on from accelerometer and magnetometer data.
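The kind of per-axis window features mentioned above (mean, standard deviation, and energy) can be sketched as follows. This is an illustrative sketch and not the feature pipeline of any of the cited works: the function name is hypothetical, the energy is assumed here to be the mean of the squared samples, and gravity is left in the signal as in the description above. The resulting feature dictionary would then be passed to a classifier of choice.

```python
import math


def window_features(samples):
    """Compute mean, standard deviation and energy per accelerometer axis.

    samples: non-empty list of (ax, ay, az) tuples forming one window.
    Energy is taken as the mean of squared samples (an assumption).
    """
    n = len(samples)
    features = {}
    for axis, name in enumerate("xyz"):
        values = [s[axis] for s in samples]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        features[f"mean_{name}"] = mean
        features[f"std_{name}"] = math.sqrt(variance)
        features[f"energy_{name}"] = sum(v * v for v in values) / n
    return features


if __name__ == "__main__":
    window = [(0.1, 0.0, 9.8), (0.4, -0.1, 10.6), (0.2, 0.1, 9.1), (0.0, 0.0, 9.8)]
    for key, value in window_features(window).items():
        print(f"{key}: {value:.3f}")
```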

Understanding user walking patterns is a challenging affair. A hybrid deep learning approach combining a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN) is used in (Kang et al. 2018) to detect walking patterns and estimate the dynamic change in velocity. The training was carried out using 9 different devices to make the system more adaptable to device sensitivity. Non-linear features were extracted from the inertial data using multiple 1D CNN layers and then fed into multiple RNNs to estimate the velocity and identify the signal pattern. Chan et al. (2017) proposed a fusion-based approach combining fingerprinting with a pedometer to calibrate the PDR direction and step count. The authors used autoencoders, a typical neural network-based approach capable of learning efficient codings of unlabelled information. It is crucial to estimate the context of the smartphone device and approximate the degree of turn pertaining to the 3-axis (Feigl et al. 2018). An LSTM based network proposed by Wang et al. (2019) extracts temporal features from accelerometer and gyroscope sensors, which are passed to the learning model for stride estimation. The stride length error rate is 4.59% at the 80% confidence level. The authors also proposed a heading estimation with training data collected in four different smartphone orientations: holding, swinging, calling, and inside pocket. Principal Component Analysis (PCA) combined with an angle deviation approach is used to detect the heading. Table 3 gives an overview of some of the works that use a machine learning-based approach to handle the challenges in smartphone IMU.

Table 3 Machine learning based smartphone IMU correction

5 Insight into smartphone IMU based indoor positioning

This section discusses the current research scenario and some research works carried out in indoor positioning using smartphone IMU. We have classified the inertial-based approaches into the following two parts.

  • Non fusion based approaches: Here, the focus lies on works in the field of localization that utilize only the inertial sensors inside smartphones.

  • Fusion based approaches: Here, we give an in-depth survey of localization schemes that combine smartphone inertial measurements with WiFi, Bluetooth, UWB, light sensors, etc. This part also provides an overview of the working methodology adopted by the researchers.

5.1 Non fusion based approaches

Kang and Han (2014) proposed a smartphone-based localization approach using PDR. It involves step length estimation, step detection, and finally, direction estimation. A detailed insight into the PDR approach can be found in (Kang and Han 2014). Sun et al. (2015) proposed an indoor positioning approach that relies solely on the IMU. Such techniques are required in areas where there is no WiFi connectivity or any other RF source. The authors divided the entire walking procedure into numerous segments. As the user walks, the change in locomotion causes the accelerometer data to change suddenly, and sudden spikes arise. The authors fitted the accelerometer data using a standard sine wave to cope with the gait changes. The rotation angle and velocity data are collected from the magnetometer and are fused together using an average filter. One such combination of approaches can be found in (Abyarjoo et al. 2015). A Kalman filter is utilized to remove the drift in the inertial reading. Four features, namely the time of the peak, the value at the peak, the time of the trough, and the value at the trough, are extracted from every step in the sensor data. A dynamic step length is considered for step estimation. The step length (SL) is estimated with Eq. 8:

$$\begin{aligned} SL=\sqrt[2.7]{\frac{\sum _{i=1}^{N} a_i}{N} \cdot \sqrt{\frac{m}{\sqrt{\delta k \times a_d}}}} \end{aligned}$$
(8)

\(\delta k\) is the interval of time between the peak and the trough, N is the number of samples taken, \(a_i\) is the \(i\)-th acceleration sample, and \(a_d\) is the drop in value between the peak and the trough. The authors also account for gait change with an empirical parameter m, which differs between male and female users: m is taken to be 750 for male and 630 for female users. A 2D dead reckoning is performed for location estimation (Equation  ). The heading angle estimate is calculated by combining gyroscope and magnetometer data. The authors claim to have reached an accuracy of 1.96 m.
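Eq. 8 translates directly into code. The sketch below is only a transcription of the formula as printed; the function name and the sample values are hypothetical, and the units expected for the inputs (and hence of the result) are not stated in the text, so the numbers are purely illustrative.

```python
import math


def step_length_eq8(acc_samples, delta_k, a_drop, m):
    """Evaluate Eq. 8.

    acc_samples: acceleration samples a_1..a_N within the step
    delta_k:     time interval between peak and trough
    a_drop:      drop in value between peak and trough (a_d)
    m:           empirical gait parameter (750 for male, 630 for female in the text)
    """
    mean_acc = sum(acc_samples) / len(acc_samples)
    inner = math.sqrt(m / math.sqrt(delta_k * a_drop))
    return (mean_acc * inner) ** (1.0 / 2.7)


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0]  # made-up values
    print(step_length_eq8(samples, delta_k=0.25, a_drop=4.0, m=750))
    print(step_length_eq8(samples, delta_k=0.25, a_drop=4.0, m=630))
```

With all other inputs equal, the larger value of m yields a longer estimated step.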

In (Li et al. 2012), a novel end-to-end infrastructure-less indoor positioning approach is proposed by Zhao et al. As the inertial measurement unit suffers from the drift problem, the authors estimated the step length using a direction estimator. The information from the accelerometer, gyroscope, and magnetometer is combined using a particle filter to calculate the position. A motion API is utilized to calculate the direction of the smartphone (microsoft xxxx). In (Li et al. 2012), the authors claim an accuracy of 2.9 m when the smartphone is placed in the pocket and 1.5 m when the smartphone is in the user’s hand.

Heading estimation on a smartphone is difficult as it involves drifts and various errors. Qian et al. (2013) proposed an approach where Principal Component Analysis (PCA) is utilized for heading offset determination. The offset is gathered from the yaw angle of the smartphone, and a particle filter is used to tackle the drift problem of the inertial reading.
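To give a feel for how PCA can yield a direction, the sketch below computes the principal axis of a set of 2D horizontal acceleration samples using the closed-form orientation of a 2×2 covariance matrix. It is a generic illustration and not the method of Qian et al. (2013); the function name and sample data are hypothetical. Note that a principal axis has a 180° ambiguity, so a real system needs additional information to choose the forward direction.

```python
import math


def principal_axis_angle(points):
    """Return the angle (radians, from the x axis) of the first principal axis.

    points: list of (x, y) samples, e.g. horizontal acceleration components.
    Uses theta = 0.5 * atan2(2 * s_xy, s_xx - s_yy) for a 2x2 covariance matrix.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    s_xx = sum((p[0] - mean_x) ** 2 for p in points) / n
    s_yy = sum((p[1] - mean_y) ** 2 for p in points) / n
    s_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return 0.5 * math.atan2(2.0 * s_xy, s_xx - s_yy)


if __name__ == "__main__":
    # Samples scattered roughly along the diagonal.
    samples = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
    print(math.degrees(principal_axis_angle(samples)))
```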

5.2 Approaches based on fusion of different signal modalities

Researchers have worked on different fusion based approaches that combine inertial based approaches with conventional Location-based services (LBS) implemented using WiFi, Bluetooth, etc. In (Wang et al. 2016), Wang et al. proposed a positioning scheme using WiFi RSS and PDR. The authors developed a landmark based positioning using WiFi fingerprints received from numerous sites. The proposed algorithm starts with step detection and direction estimation. For step detection, the step length is calculated in a manner similar to the one described in (Abyarjoo et al. 2015), with some changes as shown in Eq. 9. This equation is a modification of the approach proposed by Weinberg (2002):

$$\begin{aligned} SL = k \times \frac{\frac{\sum _{i=1}^{N} a_i}{N}-a_{min}}{a_{max}-a_{min}} \end{aligned}$$
(9)

In Eq. 9, \(a_{max}\) and \(a_{min}\) are the maximum and minimum values of acceleration during a stride. The landmark of each area is recorded by RSS fingerprinting from WiFi sources. Landmarks are special locations on a floor that define the overall structure or the floor plan. A few of the landmarks that the authors in (Wang et al. 2016) considered are corners, straight paths, and corridors. The user’s trajectory and direction are measured and compared with the corner positions and the landmark database to position the device and hence the user. Every landmark consists of an Access Point (AP), that is, the WiFi source. To form the landmark database for every AP, the peak of the RSS value is determined. The authors compared a simple PDR based approach with their fusion based approach and found a significant increase in the accuracy of location estimation in an indoor environment. They state in (Wang et al. 2016) that the PDR-only approach has a deviation of 4.158 m from the exact position, while their fusion based approach has a deviation of 1.343 m. A similar approach can also be found in Chen et al. (2015), where the authors considered a Kalman filter for the fusion process.
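Eq. 9 can be sketched in the same way. The function name and sample values are hypothetical, and k is treated here as a calibration constant supplied by the caller, since the text does not state how it is chosen.

```python
def step_length_eq9(acc_samples, k):
    """Evaluate Eq. 9: k times the mean acceleration normalised to [a_min, a_max]."""
    a_max = max(acc_samples)
    a_min = min(acc_samples)
    if a_max == a_min:
        raise ValueError("acceleration is constant over the stride; Eq. 9 is undefined")
    mean_acc = sum(acc_samples) / len(acc_samples)
    return k * (mean_acc - a_min) / (a_max - a_min)


if __name__ == "__main__":
    stride = [9.2, 10.4, 11.8, 10.1, 8.9]  # made-up acceleration magnitudes
    print(step_length_eq9(stride, k=1.4))
```

Because the mean always lies between the minimum and the maximum, the normalised term is between 0 and 1, so k acts as an upper bound on the estimated step length.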

In Li et al. (2015) and Bird and Arden (2011), the authors used a fusion based approach combining WiFi fingerprinting and magnetometer data. In (Li et al. 2015), Li et al. compared positioning based on simple RSS with positioning based on RSS fused with magnetometer data. For the simple RSS based approach, the WiFi fingerprints from various sites are recorded from all the access points as depicted in Table 1. Along with the WiFi fingerprints, magnetometer readings in the indoor environment are also recorded from the site; these are referred to as ‘Heat Maps’. The authors took the measurements and made predictions with respect to different orientations. They claim that positioning with simple RSS data gives an accuracy of up to 3 m, while magnetic field aided RSS gives an accuracy of 2 m.

Kalman Filtering (KF) is widely utilized in fusion approaches. Researchers have modified the KF algorithm and extended it to nonlinear systems (Wan and Der Merwe 2000). The Unscented Kalman Filter (UKF) (Wan and Der Merwe 2001) and the Extended Kalman Filter (EKF) (Isabel 2004) are the variants of the KF approach used for nonlinear systems. Device heterogeneity is one of the major challenges, along with activity tracking, during the localization process using a smartphone. In a novel approach proposed by Jianguo et al. (2019), WiFi and PDR based approaches are combined using two proposed UKF algorithms. One of the algorithms is modelled for positioning and the other for heading estimation, which aids positioning. The proposed UKF approach is said to give robust positioning and orientation that works for unconstrained smartphones. The authors compared their approach experimentally with EKF, KF, simple PDR, and WiFi localization, and they claim that it produces a better result, with an accuracy of 0.76 m, than the rest of the methods.
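The predict and update cycle underlying these filters can be sketched for the simplest case, a one-dimensional linear Kalman filter. This is the plain KF, not the UKF of Jianguo et al. (2019) or an EKF. In this illustration the PDR displacement plays the role of the control input and a WiFi position fix plays the role of the measurement; the function name, variable names, and noise values are assumptions.

```python
def kalman_step(x, p, u, q, z, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p: previous position estimate and its variance
    u, q: PDR displacement for this step and its process noise variance
    z, r: position measurement (e.g. from WiFi) and its noise variance
    """
    # Predict: move the estimate by the PDR displacement, uncertainty grows.
    x_pred = x + u
    p_pred = p + q
    # Update: blend prediction and measurement according to their variances.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    x, p = 0.0, 1.0
    pdr_steps = [0.7, 0.7, 0.7]
    wifi_fixes = [0.9, 1.2, 2.3]
    for u, z in zip(pdr_steps, wifi_fixes):
        x, p = kalman_step(x, p, u, q=0.05, z=z, r=4.0)
        print(f"estimate={x:.3f} variance={p:.3f}")
```

A large measurement variance r relative to the predicted variance makes the filter trust the PDR prediction more, and vice versa.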

In (Chen et al. 2016), Chen et al. proposed an indoor positioning approach using inertial sensors combined with the BLE technique. As inertial based localization suffers from a drift problem as one progresses from one location to another, the authors adopted a drift correction mechanism that uses Apple’s Bluetooth iBeacon. The authors used a quaternion to estimate the parameters in Pedestrian Dead Reckoning (PDR). The PDR approach is formulated with Eq. 10:

$$\begin{aligned} P_{k+1} = P_k+SL\begin{bmatrix} sin \theta _k \\ cos \theta _k \end{bmatrix} \end{aligned}$$
(10)

Here \(P_{k+1}\) is the next step location, \(P_{k}\) is the current location, \(\theta _k\) is the stride direction, and SL is the length of the stride. In order to detect the steps, a consistent 3D acceleration is necessary. Step detection is performed in the same manner as before, by identifying the peaks in the accelerometer reading. The authors also considered a variable step length. The length estimation is performed by considering the difference between the maximum and minimum vertical acceleration during one stride. The step length (SL) is given by Eq. 11:

$$\begin{aligned} SL = \beta (a_{max}-a_{min})^{1/4} \end{aligned}$$
(11)

\(a_{max}\) and \(a_{min}\) are the maximum and minimum values of acceleration during a single step. The \(\beta \) parameter is varied to account for gait change, as in (Chen et al. 2016). A fusion based approach using a Kalman filter is considered, where the magnetometer and gyroscope values are combined to give the direction estimation. The iBeacons are deployed to assist the PDR based approach while performing localization in an indoor environment. The authors considered a sparse deployment of the iBeacons. The distance between the mobile device and a particular \(j^{th}\) iBeacon is calculated using a path-loss evaluation between the reference RSS and the RSS of the \(j^{th}\) iBeacon. The drift in the PDR based approach is corrected with the help of the iBeacons. A modified positioning system is given by Eq. 12.

$$\begin{aligned} P_{k+1} = P_k+SL\begin{bmatrix} sin \theta _k \\ cos \theta _k \end{bmatrix}+R \end{aligned}$$
(12)

Here R varies with the RSS values received from the iBeacons and is constructed using a path-loss model in (Chen et al. 2016). The authors claim that the proposed approach achieves a localization accuracy of 1.28 m, against a normal PDR based approach whose accuracy varies between 3 and 5 m.
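The pieces of Eqs. 10 to 12 can be put together in a short sketch. Two points are assumptions rather than details from (Chen et al. 2016): the distance to a beacon is computed with the common log-distance path-loss model, and the correction R is simply passed in as a 2D vector, since its exact construction is not given here. All function names and numeric values are hypothetical.

```python
import math


def step_length_eq11(beta, a_max, a_min):
    """Eq. 11: step length from the acceleration range within one step."""
    return beta * (a_max - a_min) ** 0.25


def pathloss_distance(rss, rss_ref, exponent, d_ref=1.0):
    """Log-distance path-loss model (assumed form).

    rss_ref is the RSS in dBm measured at the reference distance d_ref (metres).
    """
    return d_ref * 10.0 ** ((rss_ref - rss) / (10.0 * exponent))


def pdr_step_with_correction(position, step_length, theta, correction=(0.0, 0.0)):
    """Eq. 12 (Eq. 10 when correction is zero): P_k + SL * [sin, cos] + R."""
    x, y = position
    rx, ry = correction
    return (
        x + step_length * math.sin(theta) + rx,
        y + step_length * math.cos(theta) + ry,
    )


if __name__ == "__main__":
    sl = step_length_eq11(beta=0.5, a_max=12.5, a_min=8.0)
    print("step length:", sl)
    print("beacon distance:", pathloss_distance(rss=-75.0, rss_ref=-59.0, exponent=2.0))
    print("position:", pdr_step_with_correction((0.0, 0.0), sl, math.radians(30)))
```

Note that Eqs. 10 and 12 place the sine in the first component, so \(\theta _k\) is measured from the second axis, unlike the convention of Eq. 7.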

A prominent fusion based approach combining inertial sensors and a light intensity sensor for accurate positioning can be found in (Xu et al. 2015). The variation in light intensity is detected as the user moves around the floor. Locating the position of the lights is crucial. To identify the luminaires, the authors used a head mounted camera. The illumination information is fused with the PDR data. The light intensity assisted displacement estimation is performed using an adaptive filter, followed by dead reckoning based position estimation. A head mounted GoPro camera is used for the floor mapping, which is synchronized with the IMU and light sensors for localization. The positioning prediction accuracy is 96%, with the error ranging from 0.38 to 0.74 m.

Ultra-wideband (UWB) fused with inertial measurements also provides an efficient way of positioning in indoor locations. One such approach is proposed by Kok et al. (2015). Inertial sensors are placed on the body of a user, along with three or more UWB transmitters placed on the user’s head and feet, to calculate the Time of Arrival at the UWB receivers. The only problem with the UWB based approach is the clock synchronization between the receivers. The objective of the authors is to estimate the 6D pose of the user, that is, the 6 degrees of freedom, namely up/down, left/right, forward/back, yaw, pitch, and roll. An extended Kalman filter is used for combining the UWB with the inertial approach. In order to carry out localization, the positions of the receivers have to be known. The authors developed a procedure to estimate the locations of the receivers without manually surveying the site. Trilateration and multilateration procedures (Savvides et al. 2003) are utilized to estimate the position.
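Trilateration itself can be illustrated with a small 2D example. The sketch below is not the procedure of Kok et al. (2015); it assumes three anchors with known coordinates and noise-free ranges, and the function name is hypothetical. Subtracting the first circle equation from the other two removes the quadratic terms and leaves a 2×2 linear system.

```python
import math


def trilaterate_2d(anchors, distances):
    """Estimate (x, y) from three anchor positions and the ranges to them."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    # Linear system obtained by subtracting circle 1 from circles 2 and 3.
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = d1 ** 2 - d2 ** 2 + x2 ** 2 + y2 ** 2 - x1 ** 2 - y1 ** 2
    b2 = d1 ** 2 - d3 ** 2 + x3 ** 2 + y3 ** 2 - x1 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is not unique")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    target = (3.0, 4.0)
    ranges = [math.dist(target, a) for a in anchors]
    print(trilaterate_2d(anchors, ranges))
```

With more than three anchors or noisy ranges, the same linearised equations are typically solved in a least-squares sense, which is the multilateration case.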

5.3 Comparative study among the approaches

The sole objective of a Location-based service is to provide continuous and seamless localization. RSS and Bluetooth based approaches have high throughput and reception range. BLE, on the other hand, has lower energy consumption, while ultra-wideband techniques are immune to interference and provide high accuracy. Visible light based approaches use relatively higher power. An overview of the power consumption, range, and throughput of the various technologies used for localization is given in Table 4.

Table 4 Throughput and range of conventional localization approaches

Fig. 12 Mapping challenges with solution

Sect. 3.3 described the problems associated with IMU based approaches. Fig. 12 links the research works discussed in the previous section to the specific challenges tackled in each work. It has been observed that combining inertial sensors with existing technologies such as WiFi and Bluetooth provides better positioning.

Table 5 summarizes the discussed works. The Kalman Filter has been widely used in the fusion based approaches, and fusing IMU based step length calculation with techniques such as BLE and light intensity provides better localization accuracy. It can be noticed from Table 5 that PDR combined with light intensity and UWB provides better accuracy when the area of the floor is small; light sources are ambient and are found almost everywhere. However, the power consumption and the infrastructure cost of these approaches are higher. RSS (WiFi or BLE) combined with IMU provides more than satisfactory accuracy compared to using only PDR or RSS for positioning.

Table 5 Comparative overview of inertial based positioning approaches

6 Conclusion

In this survey, we have reviewed the works carried out using inertial measurement units in the domain of smartphone-based indoor positioning. The article lists the challenges faced by inertial measurement units, highlights the notable ones, and maps them to past publications. A section bringing together present machine learning-based approaches for tackling the problems of smartphone IMU is presented. We have observed from past research that authors have considered the varying context of smartphones and have accounted for device sensitivity by using multiple smartphone devices during the training process. Deep learning-based approaches have also been explored to tackle the challenges faced by smartphone IMU, both for stride length analysis and for heading and orientation, leading the path towards a robust solution in the domain of indoor localization. The article discusses the significant perspectives in smartphone IMU-based indoor localization and serves as a guide to better understanding this research domain.