1 Introduction

Self-localization and position tracking of mobile robots is one of the most important tasks in applications such as visual navigation and visual servoing, which are indispensable terminal applications for the future Internet of Things. A common way to implement this task is to estimate the robot's pose from inertial sensors. This approach cannot cope well with wheel slipping, so accumulated errors greatly degrade the estimation accuracy [4, 19, 21, 22]. GPS is another commonly used technique, but it cannot be used in some conditions, such as indoor environments [25, 28]. Magazine [17] proposed motion estimation based on non-cooperative feature points, implemented by extracting special satellite feature points that are non-cooperative for navigation; the drawback of this method is that it is limited to special environments. Graph-based methods [10] have been utilized for robot localization and navigation, but it is difficult for them to achieve high accuracy.

The technique of using vision sensors to estimate the camera's ego-motion is called visual odometry. It is a promising technique for improving the ability of a mobile robot to estimate its motion in indoor environments [8, 11, 18, 24]. Extensive research has been done to make visual odometry practical, and several methods based on monocular or stereo image sequences have been proposed; however, estimating the robot's motion effectively remains difficult because of inaccurate feature matching and outliers [3, 14, 23]. One objective of this paper is to develop a method that integrates distinguished features of the indoor environment with a wireless sensor network to improve visual odometry. Since the method is mainly applied to indoor environments, the robot is assumed to travel on a flat floor.

The second problem that needs to be solved is robustness. Vision sensors are easily affected by many factors, such as the degree of image distortion correction and the tracking error of feature points [1, 2, 26]. A series of filter algorithms is commonly used to improve the robustness of vision sensors, and among them the particle filter has been particularly successful [6, 7, 12, 15]. However, a difficult problem in the particle filter is that the particles degenerate over time in terms of accuracy, which happens when the diversity of the particles disappears. The main factor contributing to this loss of diversity lies in the resampling process, where low-weight particles are thrown away and only high-weight particles survive. In recent years, many algorithms [5, 20] have been proposed to address particle depletion. Several kinds of particle filters utilize compensation techniques to deal with the depletion problem; similarly, a new resampling technique was proposed to restrain particle depletion using the geometric relation between particles. Swarm intelligence has also been integrated into the motion estimation process, but a unified framework is difficult to construct [9, 13, 16, 27]. These methods perform better than the original particle filter; however, they do not treat the particle set as a whole and merely try to maintain diversity.
We propose an improved particle filter method to solve the particle depletion problem, which utilizes curve fitting to predict the trend of the particle weights in the resampling phase. The specific steps are as follows: first, the discrete particles with different weights are taken as a whole, and curve fitting is used to obtain the pattern of the weight distribution; then the resampling process is carried out on this weight curve.

This paper is organized as follows: Section 2 describes the main methods, including robot motion estimation and an enhanced particle filter for improving estimation robustness; in Section 3, various experiments are presented to verify the proposed method and several difficult problems of the indoor environment are discussed; Section 4 gives the conclusions.

2 Methods

2.1 Motion estimation based on feature points of vision sensor

2.1.1 The design of robot motion estimation based on feature points

Obtaining accurate visual features is the most fundamental problem in robot localization and motion control. Artificial features are often used for robust feature extraction, but they are unsuitable for flexible robot applications because they require complex manual arrangement beforehand. Natural features are easy to obtain, but their robustness is difficult to guarantee. Therefore, distinguished features that match the real environment, such as points and lines, are the primary choice for describing the environment through vision sensors. Considering that a better result can be obtained if prior knowledge of the environment is utilized in the feature extraction algorithm, a thorough comparison was carried out among various feature detection algorithms. In a real indoor environment, the straight lines in images are all parallel or vertical, and their parameter ranges can be determined from the results of the previous frame; the extraction of line features is therefore of great significance for understanding the environment. To improve feature robustness, the point features obtained from the intersections of lines are appropriate for robot positioning. Finally, the standard Canny algorithm is selected for edge detection, and the Hough transform is selected for line feature detection. The detailed process is as follows (a code sketch follows the list):

  • The edge points are extracted in the whole indoor environment with the Canny algorithm when the robot runs for the first time.

  • Lines are detected and their parameters are obtained by applying the Hough transform to the edge contours.

  • Based on the fact that the straight lines in images of the indoor environment are all parallel or vertical, the line parameters are optimized with the least squares method.

  • Finally, the point features formed by the intersections of the lines are chosen as the best features for robot positioning.
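For concreteness, a minimal sketch of this feature extraction pipeline using OpenCV is given below. The Canny thresholds, the Hough accumulator threshold, and the angle tolerance for keeping near-perpendicular line pairs are illustrative assumptions, not the exact settings used in our system.

```python
import cv2
import numpy as np

def detect_point_features(gray):
    """Sketch: Canny edges -> Hough lines -> pairwise line intersections."""
    # Edge detection on the whole image (thresholds are illustrative).
    edges = cv2.Canny(gray, 50, 150)

    # Standard Hough transform returns lines as (rho, theta) pairs.
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 120)
    if lines is None:
        return []

    points = []
    lines = lines[:, 0, :]                      # shape (N, 2): rho, theta
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            r1, t1 = lines[i]
            r2, t2 = lines[j]
            # Keep only near-perpendicular pairs (parallel/vertical scene lines).
            if abs(abs(t1 - t2) - np.pi / 2) > 0.1:
                continue
            # Intersection of x*cos(t) + y*sin(t) = rho for the two lines.
            A = np.array([[np.cos(t1), np.sin(t1)],
                          [np.cos(t2), np.sin(t2)]])
            b = np.array([r1, r2])
            if abs(np.linalg.det(A)) > 1e-6:
                points.append(np.linalg.solve(A, b))
    return points
```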

The Hough transform has the advantage that it is not easily affected by illumination changes, so the point features can be obtained despite strong interference from lighting. To further improve feature robustness, the homography matrix H, computed from the corresponding features between the last frame and the current frame, is utilized to eliminate outliers. The complete process of homography matrix calculation is described as follows.

Two steps are applied to reject outliers. First, an initial estimate of the homography matrix H is obtained with the RANSAC algorithm. Second, the mismatched features are eliminated according to the homography matrix H. The concrete calculation of the homography matrix H is given by the following equations. Assuming that the corresponding feature points are Pi(xi, yi) and Vi(ui, vi), Pi and Vi should satisfy Eq. 1:

$$ k\left[\begin{array}{c}{u}_i\\ {}{v}_i\\ {}1\end{array}\right]=\mathbf{H}\cdotp \left[\begin{array}{c}{x}_i\\ {}{y}_i\\ {}1\end{array}\right],\kern1em \mathbf{H}=\left[\begin{array}{ccc}{m}_1& {m}_2& {m}_3\\ {}{m}_4& {m}_5& {m}_6\\ {}{m}_7& {m}_8& {m}_9\end{array}\right] $$
(1)

Equation 2 can be obtained by eliminating k from Eq. 1:

$$ \mathbf{K}\cdotp \mathbf{M}={m}_9\cdotp \mathbf{U} $$
(2)

where K, M and U are defined as follows:

$$ \mathbf{K}=\left[\begin{array}{cccccccc}{x}_i& {y}_i& 1& 0& 0& 0& -{x}_i{u}_i& -{y}_i{u}_i\\ {}0& 0& 0& {x}_i& {y}_i& 1& -{x}_i{v}_i& -{y}_i{v}_i\\ {}& & & & & \dots & & \\ {}& & & & & \dots & & \end{array}\right] $$
(3)
$$ \mathbf{M}={\left({\mathbf{K}}^T\mathbf{K}\right)}^{-1}{\mathbf{K}}^TU $$
(4)
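In practice, the two-step rejection described above can be realized with OpenCV's RANSAC homography estimator, as in the following sketch. The reprojection threshold is an assumed value and the function name is ours.

```python
import cv2
import numpy as np

def reject_outliers(prev_pts, curr_pts, reproj_thresh=3.0):
    """prev_pts, curr_pts: (N, 2) float arrays of matched features P_i -> V_i."""
    # Step 1: initial homography H from RANSAC (Eq. 1).
    H, _ = cv2.findHomography(prev_pts, curr_pts, cv2.RANSAC, reproj_thresh)

    # Step 2: drop matches that do not fit H well.
    proj = cv2.perspectiveTransform(prev_pts.reshape(-1, 1, 2).astype(np.float64), H)
    err = np.linalg.norm(proj.reshape(-1, 2) - curr_pts, axis=1)
    inliers = err < reproj_thresh
    return H, prev_pts[inliers], curr_pts[inliers]
```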

The relative motion between two consecutive images is very small; therefore, the motion speed can be obtained from the correspondences of the features in the images. Under the assumption of a planar ground, the world coordinates of points can be calculated from their coordinates projected in the image.

Point P is assumed to be a point on the ceiling, and its coordinates on the image plane are p(u, v). The point coordinates can be converted to homogeneous coordinates according to the linear camera model. The relation between the points can be represented as Eq. 5:

$$ \lambda \left[\begin{array}{c}u\\ {}v\\ {}1\end{array}\right]=\mathbf{M}\left[\begin{array}{c}X\\ {}Y\\ {}Z\\ {}1\end{array}\right] $$
(5)

where M is the 3 × 4 projection matrix, which can be obtained through camera calibration. Because the point is on the ceiling, the Z coordinate of the point is set to 0, and the equation can be rewritten as follows:

$$ {\displaystyle \begin{array}{c}\lambda \left[\begin{array}{c}u\\ {}v\\ {}1\end{array}\right]=\mathbf{M}\left[\begin{array}{c}X\\ {}Y\\ {}Z\\ {}1\end{array}\right]=\left[\begin{array}{cccc}{m}_{11}& {m}_{12}& {m}_{13}& {m}_{14}\\ {}{m}_{21}& {m}_{22}& {m}_{23}& {m}_{24}\\ {}{m}_{31}& {m}_{32}& {m}_{33}& {m}_{34}\end{array}\right]\left[\begin{array}{c}X\\ {}Y\\ {}0\\ {}1\end{array}\right]\\ {}=\left[\begin{array}{ccc}{m}_{11}& {m}_{12}& {m}_{14}\\ {}{m}_{21}& {m}_{22}& {m}_{24}\\ {}{m}_{31}& {m}_{32}& {m}_{34}\end{array}\right]\left[\begin{array}{c}X\\ {}Y\\ {}1\end{array}\right]=\mathbf{m}\left[\begin{array}{c}X\\ {}Y\\ {}1\end{array}\right]\end{array}} $$
(6)

Here m is the invertible 3 × 3 matrix that remains after the third column of the matrix M is deleted. The formula above shows that there is a one-to-one correspondence between point P and point p under the condition that the ceiling is flat, which means that the world coordinates can be derived from Eq. 7 when the image coordinates of a point are given.

$$ s\left[\begin{array}{c}X\\ {}Y\\ {}1\end{array}\right]={\mathbf{m}}^{-1}\left[\begin{array}{c}u\\ {}v\\ {}1\end{array}\right] $$
(7)

Therefore, as long as the coordinates of the corresponding feature points are obtained in two adjacent frames, it is easy to estimate the robot motion, which can be represented as the motion of the camera's optical center.
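A minimal sketch of this back-projection (Eqs. 5-7) is given below, assuming the 3 × 4 projection matrix M comes from a prior camera calibration.

```python
import numpy as np

def image_to_world(u, v, M):
    """Back-project an image point onto the Z = 0 (ceiling) plane, Eqs. 5-7.

    M: 3x4 projection matrix from camera calibration.
    Returns the world coordinates (X, Y) of the point.
    """
    # Delete the third column of M (Z = 0), keeping the invertible 3x3 matrix m.
    m = M[:, [0, 1, 3]]
    XY1 = np.linalg.inv(m) @ np.array([u, v, 1.0])
    XY1 /= XY1[2]                      # normalize the homogeneous scale s
    return XY1[0], XY1[1]
```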

The process can be summarized as follows (a combined sketch follows the list):

  • Capture the current frame of the video and rectify the image to remove distortion.

  • Find the corresponding features in the current frame according to the features in the last frame.

  • Compute the corresponding world coordinates of the feature points in the two frames.

  • Infer the motion parameters from the distances between the world coordinates of the corresponding feature points in the two frames.

  • Capture the next frame and repeat this process.
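Putting the steps together, the per-frame loop might look roughly like the following sketch. detect_point_features, reject_outliers, and image_to_world refer to the illustrative helpers sketched earlier in this section, estimate_rigid_motion is sketched in Section 2.1.2 below, and match_features is a hypothetical placeholder for the feature matching step.

```python
import cv2
import numpy as np

def run_visual_odometry(capture, camera_matrix, dist_coeffs, M):
    """Illustrative per-frame loop; the helpers are the sketches in this section."""
    _, frame = capture.read()
    prev = cv2.undistort(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), camera_matrix, dist_coeffs)
    prev_pts = np.array(detect_point_features(prev), dtype=np.float64)

    while True:
        ok, frame = capture.read()
        if not ok:
            break
        curr = cv2.undistort(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), camera_matrix, dist_coeffs)

        # Find correspondences to the previous frame (matching itself not shown here).
        curr_pts = match_features(prev, curr, prev_pts)            # hypothetical helper
        _, prev_in, curr_in = reject_outliers(prev_pts, curr_pts)

        # World coordinates of the matched points in both frames, then motion parameters.
        pw = np.array([image_to_world(u, v, M) for u, v in prev_in])
        cw = np.array([image_to_world(u, v, M) for u, v in curr_in])
        theta, R, T = estimate_rigid_motion(pw, cw)

        prev, prev_pts = curr, curr_pts
```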

2.1.2 The calculation of motion parameter

Based on spatial analytic geometry, at least 3 pairs of non-collinear matched points are needed to determine the three-dimensional motion of a rigid body. Owing to the special constraints of the indoor environment, the robot motion is restricted to a two-dimensional plane, so only two points are needed to determine the robot motion. In practical applications, in order to reduce the measurement error, the number of selected feature points is far greater than 2. A simple method for calculating the motion parameters is presented below.

{mi = (xi, yi)T | i = 1, …, n} is assumed to be a set of n (n ≥ 2) feature points, where the Zi coordinate is zero and is therefore dropped. R is the 2 × 2 rotation matrix and T is the 2 × 1 translation vector; {mi′ = (xi′, yi′)T | i = 1, …, n} is the set of points matched to {mi} after the motion. If the data contain no noise, Eq. 8 holds:

$$ {{\mathbf{m}}_i}^{\prime }={\mathbf{Rm}}_i+\mathbf{T},i=1,\dots n $$
(8)

\( \overline{\mathbf{m}^{\prime }} \) and \( \overline{\mathbf{m}} \) are the centroids of the two point sets, respectively:

$$ \overline{\mathbf{m}^{\prime }}=\frac{\sum \limits_{i=1}^n{{\mathbf{m}}_i}^{\prime }}{n}=\left[\begin{array}{c}\overline{x^{\prime }}\\ {}\overline{y^{\prime }}\end{array}\right],\kern1em \overline{\mathbf{m}}=\frac{\sum \limits_{i=1}^n{\mathbf{m}}_i}{n}=\left[\begin{array}{c}\overline{x}\\ {}\overline{y}\end{array}\right] $$
(9)

Averaging Eq. 8 over all points and substituting the centroids of Eq. 9 gives:

$$ \overline{\mathbf{m}^{\prime }}=\mathbf{R}\overline{\mathbf{m}}+\mathbf{T} $$
(10)

Subtracting Eq. 10 from Eq. 8 yields:

$$ {{\mathbf{m}}_i}^{\prime }-\overline{\mathbf{m}^{\prime }}=\mathbf{R}\left({\mathbf{m}}_i-\overline{\mathbf{m}}\right)\kern1em i=1,\dots n $$
(11)

R is the rotation matrix; it can be solved by the least squares method, and the translation vector can then be obtained from the following equation:

$$ \mathbf{T}=\overline{\mathbf{m}^{\prime }}-\mathbf{R}\overline{\mathbf{m}} $$
(12)

This method is simple, but it has a serious problem: the matrix obtained is not necessarily a rotation matrix, because the constraints on the rotation matrix are not taken into account, and the result is not satisfactory when the data contain noise.

Considering the constraint conditions of the rotation matrix R, we use the Euler angle to represent the rotation matrix.

$$ \mathbf{R}=\left[\begin{array}{cc}\cos \theta & -\sin \theta \\ {}\sin \theta & \cos \theta \end{array}\right],0\le \theta <2\pi $$
(13)

In order to calculate the Euler angle θ, the minimum of the following function, based on Eq. 13, should be found:

$$ f\left(\theta \right)=\sum \limits_{i=1}^n{\left\Vert {{\mathbf{m}}_i}^{\prime }-\overline{\mathbf{m}\prime }-\mathbf{R}\left({\mathbf{m}}_i-\overline{\mathbf{m}}\right)\right\Vert}^2 $$
(14)

Setting the first derivative of this function to zero yields:

$$ \tan \theta =\frac{\sum \limits_{i=1}^n\left(\left({x}_i-\overline{x}\right)\left({y_i}^{\prime }-\overline{y^{\prime }}\right)-\left({y}_i-\overline{y}\right)\left({x_i}^{\prime }-\overline{x^{\prime }}\right)\right)}{\sum \limits_{i=1}^n\left(\left({x}_i-\overline{x}\right)\left({x_i}^{\prime }-\overline{x^{\prime }}\right)+\left({y}_i-\overline{y}\right)\left({y_i}^{\prime }-\overline{y^{\prime }}\right)\right)} $$
(15)

The translation vector can then be obtained from Eq. 12. After the motion parameters (R, T) are obtained, they can be used to remove outliers. Considering the measurement error, Eq. 8 should be rewritten as:

$$ {m_i}^{\prime }={Rm}_i+T+{\eta}_i $$
(16)

where ηi represents the error between the real position and the calculated position of the feature point. The n feature points are substituted into the formula above in turn, and if the modulus of ηi is larger than a certain threshold, the point is considered an outlier. After removing the outliers, the above formulas can be reused to obtain more accurate motion parameters.
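A compact numerical sketch of the whole procedure (Eqs. 9, 12, 15 and 16) is shown below; the residual threshold used to flag outliers is an assumed value.

```python
import numpy as np

def estimate_rigid_motion(m, m_prime, outlier_thresh=0.05):
    """m, m_prime: (n, 2) matched points before/after motion. Returns (theta, R, T)."""
    # Centroids of the two point sets (Eq. 9).
    c, c_p = m.mean(axis=0), m_prime.mean(axis=0)
    d, d_p = m - c, m_prime - c_p

    # Euler angle from Eq. 15.
    num = np.sum(d[:, 0] * d_p[:, 1] - d[:, 1] * d_p[:, 0])
    den = np.sum(d[:, 0] * d_p[:, 0] + d[:, 1] * d_p[:, 1])
    theta = np.arctan2(num, den)

    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    T = c_p - R @ c                                   # Eq. 12

    # Outlier rejection based on the residual eta_i (Eq. 16), then re-estimation.
    eta = m_prime - (m @ R.T + T)
    inliers = np.linalg.norm(eta, axis=1) < outlier_thresh
    if 2 <= inliers.sum() < len(m):
        return estimate_rigid_motion(m[inliers], m_prime[inliers], outlier_thresh)
    return theta, R, T
```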

2.2 Particle filter for robust estimation

2.2.1 Particle filter with enhanced resampling

The general formulation of robot motion estimation is that we estimate the state of a feature point at a specific time from a sequence of camera images, and then express it as a posterior probability distribution of the target state. It is assumed that the evolution of the target state p(xk | xk − 1) is a Markov process and that the observed values {Z1, Z2, …, Zk} are independent of each other. Given the state transition prior p(Xk | Xk − 1), where k denotes the time step, and the observation likelihood p(Zk | Xk), the posterior distribution can be obtained from Bayesian theory:

$$ p\left(\left.{X}_k\right|{Z}_{1:k}\right)=\frac{p\left(\left.{Z}_k\right|{X}_k\right)\int p\left(\left.{X}_k\right|{X}_{k-1}\right)p\left(\left.{X}_{k-1}\right|{Z}_{1:k-1}\right){dX}_{k-1}}{p\left(\left.{Z}_k\right|{X}_k\right)\int p\left(\left.{X}_k\right|{X}_{k-1}\right)p\left(\left.{X}_{k-1}\right|{Z}_{1:k-1}\right){dX}_{k-1}{dX}_k} $$
(17)

Because the calculation of the state posterior probability contains complex and intractable integral operations, it is difficult to obtain an analytical solution. Monte Carlo techniques are therefore used to approximate the posterior state distribution with a set of weighted samples.

The particle filter, also known as the Bayesian bootstrap filter or Monte Carlo filter, has been developed in recent years to deal with state estimation in nonlinear and non-Gaussian dynamic systems. It applies sequential importance sampling within the Bayesian framework and is based on Monte Carlo stochastic simulation theory. The system posterior p(Xk|Z1 : k), where k denotes the time step, is represented by a set of weighted random samples \( {\left\{{x}_k^i,{w}_k^i\right\}}_{i=1}^N \), and the new distribution is produced by Bayesian iterative evolution from these random samples. In practice it is impossible to sample particles directly from the posterior distribution, so a proposal distribution q(Xk|Xk − 1, Z1 : k) is used to approximate the posterior and to draw a set of particles from it. From time k − 1 to time k, the weight of the i-th particle is updated according to Eq. 18:

$$ {w}_k^i={w}_{k-1}^i\frac{p\left({Z}_k|{X}_k^i\right)p\left({X}_k^i|{X}_{k-1}^i\right)}{q\left({X}_k^i|{X}_{k-1}^i,{Z}_{1:k}\right)} $$
(18)

The posterior distribution p(Xk|Z1 : k) can then be approximated as:

$$ p\left({X}_k\left|{Z}_{1:k}\right.\right)\approx {\sum}_{i=1}^N{w}_k^i\delta \left({X}_k-{X}_k^i\right) $$
(19)

The expectation of a state function f(Xk) can be approximated as:

$$ E\left[f\left({X}_k\right)\right]\approx {\sum}_{i=1}^N{w}_k^if\left({X}_k^i\right) $$
(20)
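For reference before describing our modification, a minimal bootstrap (sequential importance resampling) update is sketched below. It uses the transition prior as the proposal distribution, so the weight update of Eq. 18 reduces to multiplying by the observation likelihood; the additive Gaussian motion and observation models are illustrative assumptions.

```python
import numpy as np

def pf_predict_update(particles, weights, control, observation, motion_noise, obs_sigma):
    """One step of a bootstrap particle filter (proposal = transition prior)."""
    # Propagate each particle through the motion model (state transition prior).
    particles = particles + control + np.random.randn(*particles.shape) * motion_noise

    # Eq. 18 with q = p(X_k | X_{k-1}): weights are scaled by the likelihood p(Z_k | X_k).
    sq_err = np.sum((particles - observation) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * sq_err / obs_sigma**2)
    weights /= np.sum(weights)                     # normalize (Eq. 19)

    # Eq. 20: state estimate as the weighted mean of the particles.
    estimate = np.sum(weights[:, None] * particles, axis=0)
    return particles, weights, estimate
```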

In the original particle filter algorithm, the resampling process simply replicates the high-weight particles, so it easily suffers from particle depletion and the particles degenerate over time in terms of accuracy. We therefore designed a new method for the resampling process of the particle filter, in which least squares curve fitting is used to increase the particle diversity. The curve fitting used in resampling is described as follows:

(xi, fi), (i = 1, 2, ⋯, m) are assumed to be the given particle states and their corresponding weights before resampling. A function must be found in the set Φ = Span{φ0, φ1, ⋯, φn} that satisfies Eq. 21:

$$ {S}^{\ast }(x)=\sum \limits_{k=0}^n{a}_k^{\ast }{\varphi}_k^{\ast }(x)\left(n<m\right) $$
(21)

The errors can be represented as Eq. 22:

$$ {\delta}_i={S}^{\ast}\left({x}_i\right)-{f}_i\left(i=1,2,\cdots, m\right) $$
(22)

S(x) can be obtained to satisfy Eq. 23:

$$ \sum \limits_{i=1}^m{\delta}_i^2=\sum \limits_{i=1}^m\omega \left({x}_i\right){\left[{S}^{\ast}\left({x}_i\right)-{f}_i\right]}^2=\underset{S(x)\in \Phi}{\min}\sum \limits_{i=1}^m\omega \left({x}_i\right){\left[S\left({x}_i\right)-{f}_i\right]}^2 $$
(23)

where ω(x) ≥ 0 is a given weight function on the interval [a, b]. The above method of finding the approximating function S(x) is the least squares method of curve fitting, and the function S∗(x) that satisfies Eq. 23 is the least squares solution. For the given (xi, fi), a unique function \( {S}^{\ast }(x)=\sum \limits_{k=0}^n{a}_k^{\ast }{\varphi}_k^{\ast }(x) \) exists in the set Φ = Span{φ0, φ1, ⋯, φn} that satisfies Eq. 23. The coefficients \( {a}_0^{\ast },{a}_1^{\ast },\cdots, {a}_n^{\ast } \) of the least squares solution can be obtained from Eq. 24:

$$ \sum \limits_{k=0}^n\left({\varphi}_k,{\varphi}_j\right){a}_k=\left(f,{\varphi}_j\right)\kern1em \left(j=0,1,2,\cdots, n\right) $$
(24)

In our solution, algebraic polynomials are selected as the basis functions, i.e., \( \left\{{\varphi}_0,{\varphi}_1,\cdots, {\varphi}_n\right\}=\left\{1,x,{x}^2,\cdots, {x}^n\right\} \), so Eq. 24 can be rewritten as:

$$ \left[\begin{array}{cccc}\sum {\omega}_i& \sum {\omega}_i{x}_i& \cdots & \sum {\omega}_i{x}_i^n\\ {}\sum {\omega}_i{x}_i& \sum {\omega}_i{x}_i^2& \cdots & \sum {\omega}_i{x}_i^{n+1}\\ {}\vdots & \vdots & \ddots & \vdots \\ {}\sum {\omega}_i{x}_i^n& \sum {\omega}_i{x}_i^{n+1}& \cdots & \sum {\omega}_i{x}_i^{2n}\end{array}\right]\left[\begin{array}{c}{a}_0\\ {}{a}_1\\ {}\vdots \\ {}{a}_n\end{array}\right]=\left[\begin{array}{c}\sum {\omega}_i{f}_i\\ {}\sum {\omega}_i{x}_i{f}_i\\ {}\vdots \\ {}\sum {\omega}_i{x}_i^n{f}_i\end{array}\right] $$
(25)

where \( \sum \limits_{i=1}^m \) is abbreviated as ∑ and ωi = ω(xi). \( {S}^{\ast }(x)=\sum \limits_{k=0}^n{a}_k^{\ast }{x}^k \) is the required polynomial.
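The normal equations of Eq. 25 can be assembled and solved directly, for example as in the following sketch (the function name is ours; numpy.polyfit with square-root weights would give an equivalent result).

```python
import numpy as np

def weighted_poly_fit(x, f, omega, degree):
    """Solve the normal equations of Eq. 25 for the coefficients a_0, ..., a_n."""
    # Vandermonde matrix: column k holds x_i^k.
    V = np.vander(x, degree + 1, increasing=True)
    A = V.T @ (omega[:, None] * V)                 # left-hand matrix of Eq. 25
    b = V.T @ (omega * f)                          # right-hand vector of Eq. 25
    return np.linalg.solve(A, b)

# S*(x) = sum_k a_k x^k can then be evaluated with np.polynomial.polynomial.polyval.
```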

The concrete procedure is as follows: first, curve fitting is used to fit the trend of the particles with different weights; second, a fixed number of particles is resampled from the fitted curve. The illustrative diagram is shown in Fig. 1. Our method has two advantages: the first is that the curve reflects the trend of the particle weights; the second is that new particles can easily be resampled from the fitted curve, which alleviates the problem of particle depletion. The whole process can be summarized as follows (a code sketch follows the list):

  • Particles are used to simulate the robot's real state and are assumed to be distributed uniformly in the state space.

  • Each particle obtains a predicted state according to the state transition equation.

  • The predicted states of the particles are evaluated and their weights updated according to the observation.

  • Least squares curve fitting is carried out according to the states and weights of the particles, and the new resampled particles obtain their weights from the fitted curve.
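A simplified sketch of this curve-fitting resampling step is given below, reusing the weighted_poly_fit helper sketched above. The one-dimensional state, the polynomial degree, and the uniform placement of the new particles over the state range are illustrative assumptions.

```python
import numpy as np

def curve_fit_resample(states, weights, num_particles, degree=3):
    """Resample particles from a polynomial fitted to the weight distribution."""
    # Fit the trend of the weights over the (1-D) particle states.
    coeffs = weighted_poly_fit(states, weights, np.ones_like(weights), degree)

    # Place new particles across the state range and read weights off the curve.
    new_states = np.linspace(states.min(), states.max(), num_particles)
    new_weights = np.polynomial.polynomial.polyval(new_states, coeffs)

    # Clip negative fitted values and renormalize.
    new_weights = np.clip(new_weights, 1e-12, None)
    new_weights /= new_weights.sum()
    return new_states, new_weights
```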

Fig. 1
figure 1

The illustrative diagram for particle resampling: the blue circle points denote the original particles and the red points denote the new resampled particles

2.2.2 Observation model of feature points

Many factors affect the errors of visual motion estimation, such as illumination changes and running distance. For robust motion estimation, an error model is indispensable. Such models are usually built as vision imaging models according to computer vision theory, but they are difficult to adapt to the specific circumstances of an indoor environment.

The robot observation model is shown in Fig. 2, where the pentagram denotes the real robot position, while the observation results, shown as asterisks, contain some errors. Considering efficiency, a direct method is used to build the vision error model through analysis of the experimental data. Our vision error model is given by Eq. 26:

$$ {z}_i={x}_i+{v}_i $$
(26)
Fig. 2
figure 2

The illustrative graph of the robot observation error: the pentagram denotes the real robot position and the asterisks denote the observed robot positions. The unit is cm

The observation noise vi is assumed to be Gaussian noise with zero mean, and the covariance matrix Ri is diagonal, with the following diagonal elements:

$$ {R}_{11}={\sigma}_{rx},\kern1em {R}_{22}={\sigma}_{ry},\kern1em {R}_{33}={\sigma}_{\theta } $$
(27)

According to the measurement results, the variances of the position distribution, σrx and σry, were obtained by curve fitting as functions of the distance r from the current point to the feature point. Curve fitting is used because the variance becomes very large in blind areas without feature points. The change of the angle variance σθ with distance is not obvious.
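The observation model of Eqs. 26 and 27 can be evaluated as a Gaussian likelihood with a distance-dependent diagonal covariance, as in the following sketch; treating σrx and σry as equal and describing their growth with the distance r by a fitted polynomial are assumptions for illustration.

```python
import numpy as np

def observation_likelihood(z, x, r, sigma_theta, var_coeffs):
    """Likelihood of observation z = [x, y, theta] given state x (Eqs. 26-27).

    var_coeffs: fitted polynomial giving the position standard deviation as a
    function of the distance r to the feature point (hypothetical coefficients).
    """
    sigma_r = np.polynomial.polynomial.polyval(r, var_coeffs)   # sigma_rx = sigma_ry here
    R_diag = np.array([sigma_r**2, sigma_r**2, sigma_theta**2]) # diagonal of R_i
    err = z - x
    return np.exp(-0.5 * np.sum(err**2 / R_diag)) / np.sqrt((2 * np.pi)**3 * np.prod(R_diag))
```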

3 Results

The Institute of Automation of the Chinese Academy of Sciences has independently developed a versatile autonomous mobile robot platform called AIM based on its existing technology. The theories and algorithms described in this paper are tested on this platform. The vision sensor is placed on top of the mobile robot.

In order to obtain the observation model of the vision sensor, a set of measurement values and errors was collected in actual experiments, where the real positions of the feature points were obtained by manual measurement. The measured points are distributed evenly on three sets of concentric circles, and the feature point to be observed is located at the center of the circles. The measurement error varies with the distance to the feature point: the farther from the feature point, the larger the measurement error, which is caused by image distortion. Although camera calibration and image correction have been implemented, absolutely accurate camera calibration is difficult to achieve. Based on this phenomenon, our observation model is set up with more consideration of the actual environment. The experiment environment is an indoor room of 5 m by 5 m, shown in Fig. 3. The pentagrams represent the feature points given by the intersections of vertical lines extracted by the Hough transform. The center of each red ellipse represents a detected feature point, and the area of the ellipse represents the error variance of that feature point: the larger the area, the larger the error variance. The robot path is scheduled as follows: the robot, represented by the purple triangle, starts from the corner and runs around the rectangular wall. When the robot detects a feature point, it estimates the feature position and the corresponding error variance; simultaneously, it estimates its own position according to the detected feature points. The motion speed is determined from its positions at different times. Because of the errors in the estimation of the feature points, the estimate of the robot's own position has errors too. The estimated position of the robot is represented by the red triangle.

Fig. 3
figure 3

Result of the UKF algorithm: the X and Y axes denote the room length and width respectively, and the unit is 5 cm

The UKF (Unscented Kalman Filter) [13] is selected for comparison of the experimental results. In order to explain the results in more detail, the original particle filter [16] is also selected. Because our proposed algorithm combines the original particle filter with curve fitting, it is denoted PFCF (Particle Filter with Curve Fitting). The results of the UKF and the original particle filter are shown in Figs. 3 and 4, respectively, and Fig. 5 shows the result of PFCF. Apparently, the error of landmark estimation is smaller in our proposed algorithm than in the original particle filter and the UKF. The standard deviations of the estimation errors for the different algorithms are summarized in Table 1; it is worth noting that each value in this table is averaged over the results for different numbers of particles. The standard deviations of the original particle filter are smaller than those of the UKF, and the standard deviations of PFCF are smaller than those of the original particle filter.

Fig. 4
figure 4

Result of the original particle filter experiment: the X and Y axes denote the room length and width respectively, and the unit is 5 cm

Fig. 5
figure 5

Experiment result of the proposed algorithm PFCF: the X and Y axes denote the room length and width respectively, and the unit is 5 cm

Table 1 Standard deviation of particle position

Different numbers of particles have an important impact on the feature error and position error estimates of the different algorithms. For a better comparison, we analyze the results of the different algorithms from different angles. The average feature errors for the different algorithms are shown in Fig. 6. The average feature error becomes smaller as the number of particles increases. Among the three algorithms, the worst is the UKF, which has the largest feature estimation errors, while the best is the proposed PFCF, which has the smallest feature estimation errors; the comparison is carried out for the same particle number. The average position errors for the different algorithms and different numbers of particles are shown in Fig. 7. The average position errors of the three algorithms decrease as the number of particles increases. The worst algorithm is again the UKF, which has the largest average position error, and the best algorithm is still the proposed PFCF, which has the smallest position estimation error for the same particle number.

Fig. 6
figure 6

The average values of the feature error with different numbers of particles: the X axis denotes the number of particles, and the Y axis denotes the average feature error (m)

Fig. 7
figure 7

The average values of the position error with different numbers of particles: the X axis denotes the number of particles, and the Y axis denotes the average position error (m)

Figure 8 shows the impact of different numbers of particles on the maximum feature position error, and Fig. 9 shows the corresponding result for the robot position. In these conditions, PFCF gives the best results for each number of particles, and the particle filter is better than the UKF. The experimental results show that the conventional particle filter degenerates over time in terms of accuracy due to particle depletion in the resampling process. The curve fitting method utilized in PFCF, which represents the trend of the particle weights, addresses the particle depletion problem. The results show that it improves the performance of the particle filter, contributing to better estimation results. Besides, many factors affect the algorithm, such as the degree of image distortion correction, the jitter of the camera while the robot is running, the tracking error of the feature points, and the pitching of the robot during emergency stops and acceleration. Some of these factors, such as camera jitter during operation and robot pitching, have a great impact on the results of the algorithm. Future work will focus on these problems, and depth information acquisition and matching for the real indoor environment will be taken into account to obtain more robust estimation results.

Fig. 8
figure 8

The maximum values of the feature error with different numbers of particles: the X axis denotes the number of particles, and the Y axis denotes the maximum feature error (m)

Fig. 9
figure 9

The maximum values of the position error with different numbers of particles: the X axis denotes the number of particles, and the Y axis denotes the maximum position error (m)

4 Conclusions

The future Internet of Things relies on various robot services, and robot localization and motion estimation are key factors in implementing different robot applications. Methods using traditional inertial sensors are easily affected by wheel slipping, and the accumulated error is a serious problem when the robot moves for a long time. Vision sensors, with their huge amount of information, have become one of the most important sensors in the various applications of mobile robots. The indoor environment is a semi-structured environment, so an algorithm that exploits the characteristics of the indoor environment brings considerable convenience and helps to improve the accuracy, real-time performance, and reliability of robot localization and motion estimation. An effective robot motion estimation method is proposed under the assumption that the mobile robot runs on flat ground. Because the visual features are easily affected by changes in lighting, an enhanced particle filter method integrated with least squares curve fitting is designed to improve robustness. Experiments on the wheeled robot verified that the algorithm performs reliably for robot motion and position estimation in the indoor environment. The results also show that our proposed enhanced particle filter performs better than the original particle filter. Future work will integrate depth information of the real indoor environment to implement more robust motion estimation.