1 Introduction

Benefiting from the rapid development of virtual reality (VR) and augmented reality (AR) devices and applications, real-time simultaneous localization and mapping (SLAM) has gained increasing popularity as an essential part of AR and VR research [1,2,3] over the last two decades. SLAM techniques can be divided into different classes according to the sensors they use, such as lasers, sonar or cameras. With a monocular camera, the cheapest and smallest sensor module, visual monocular SLAM algorithms [4,5,6,7,8,9,10,11,12,13,14] have made significant progress, and feature-based techniques have been consolidated and prevalent in the past decades. Recently, direct approaches have been drawing more and more attention. In contrast to feature-based approaches, which extract and triangulate features on the images, direct approaches track camera motion and reconstruct the environment directly from pixel intensities over the whole image. This provides substantially more information about the environment, which can be invaluable for robotics or augmented reality applications.

Despite this potential, direct approaches still place many restrictions on the camera motion. In particular, under rotation-only camera motion, existing direct semi-dense SLAM systems can hardly estimate and update the depth map, which eventually causes tracking to fail.

In this paper, we build on the main framework of LSD-SLAM [8] and the probabilistic depth map model [15] to design a semi-dense monocular SLAM system suitable for rotational motion. More specifically, we model the depth of every pixel as a distribution that mixes a good measurement (normally distributed around the true depth) and an unknown measurement (uniformly distributed in an interval which is assumed to contain the depth range). As new frames arrive, we regard them as new observations of the depth of their reference keyframe, compute the Bayesian estimate of the true depth of the keyframe, and estimate the probability that each observation is a good one. With this model, the depth map can still be maintained during rotation-only camera motion. Both general and rotation-only camera motions can be tracked, and a semi-dense map can finally be reconstructed, as shown in Fig. 1.

The remainder of this paper is organized as follows: Section 2 presents related work. Section 3 introduces the probabilistic depth map, including the Bayesian model of the map as well as the map update, propagation and regularization steps. Section 4 describes how new frames are tracked against the depth map. Experimental results are illustrated in Sect. 5, and Sect. 6 draws a conclusion.

Fig. 1 The reconstructed semi-dense map and estimated keyframe poses for sequence 1, produced by our system. Both general and rotation-only motions are tracked successfully

A previous version of our work was presented in [16]. In this paper, we add more detailed descriptions and perform additional experiments.

2 Related work

A large variety of SLAM systems have been proposed in the past decades, which can be intuitively divided into two classes: feature-based and direct approaches. A feature-based approach estimates camera poses and reconstructs maps by extracting and tracking a sparse set of image features from successive frames, while a direct approach minimizes the photometric error directly over pixel intensities and performs dense or semi-dense reconstruction. Works belonging to the former and the latter class include [4,5,6] and [7, 8, 13, 17, 18], respectively.

Early feature-based monocular SLAM systems were mostly based on filtering methods. Davison et al. [4] first presented a real-time monocular SLAM system called MonoSLAM, employing EKF-based probabilistic estimation to compute camera poses and build a sparse map of features. Modern feature-based approaches [5, 6] are based on keyframes [19], so that optimization methods such as bundle adjustment (BA) [20] can be applied. Klein and Murray [5] proposed a widely popular framework, parallel tracking and mapping (PTAM), which splits camera tracking and mapping into two parallel threads and performs optimization over selected frames using BA. Mur-Artal et al. [6] designed a novel monocular SLAM system, ORB-SLAM. Built on the main ideas of PTAM, it addresses many of its limitations, such as the lack of loop closing and relocalization, and has become one of the most representative feature-based SLAM techniques.

Recently, as computer hardware has improved tremendously, several kinds of direct approaches have been put forward. Newcombe et al. [13] presented a dense SLAM system which generates smooth depth estimates through a non-convex optimization process; it requires a GPU for sufficient processing power. The first large-scale direct monocular SLAM method is LSD-SLAM, a real-time direct monocular SLAM framework proposed by Engel et al. [7, 8]. The system employs direct tracking towards keyframes and a probabilistic filtering solution to build large-scale semi-dense maps. Notably, this system runs in real time on a CPU without GPU acceleration. Caruso et al. [17, 18] later extended this framework to an omnidirectional camera model and a stereo camera model, respectively.

There are also systems that combine feature-based methods and direct approaches, such as SVO [14], proposed by Forster et al., which uses direct methods to establish feature correspondences and feature-based methods to refine camera poses.

All the aforementioned SLAM systems strive for robust real-time performance; however, tracking usually fails in several situations, such as pure rotation. Handling rotation-only camera motion has always been a severe challenge for SLAM. Several algorithms have been proposed to explicitly address this problem.

Gauglitz et al. [9] presented a keyframe-based real-time approach which differentiates between general and rotation-only camera motions between keyframe pairs. Pirchheim et al. [10] proposed a scheme whose basic idea is to combine 6DOF and panoramic SLAM, registering regional panorama maps in a global 3D map to handle pure rotation camera movements. Herrera et al. [11] presented a real-time visual SLAM system that tracks features locally and incrementally and delays triangulation of the matched 2D features between keyframes until a sufficient baseline has been reached.

In theory, treating translational and rotational motions separately should work. In practice, however, the two kinds of camera motion can be inextricably linked. Moreover, few direct SLAM approaches have been proposed to handle this degenerate rotation-only camera motion, which is the major motivation of this paper.

In this work, we propose a direct monocular SLAM system that combines a probabilistic depth map model based on Bayesian estimation with the main framework of LSD-SLAM, in order to handle not only general camera motions but also rotation-only motions.

Fig. 2 Bayesian estimation of the depth. A sequence of estimated depths is regarded as independent observations of the true depth

3 Probabilistic depth map based on Bayesian estimation

In LSD-SLAM [8], the system uses an extended Kalman filter to refine the depth map. More specifically, when a frame is chosen to be a keyframe, the system estimates the depth of all pixels with a non-negligible image gradient in subsequent images; each estimate is represented as a Gaussian probability distribution and used to refine the depth map of the keyframe. Implicitly, every estimate is treated as a good measurement.

In practice, however, numerous bad measurements are inevitable. If bad measurements could be separated from good ones, the depth estimate would become more accurate within a limited number of iterations. This matters in particular for a real-time keyframe-based SLAM system: reference keyframes may be generated frequently to maintain effective tracking when severe camera motion occurs, which restricts the number of observations per depth estimate while introducing more noisy ones.

Considering this situation, a depth model that is less affected by outliers is better suited to our system. Therefore, following [15], we model the estimated depth \({d_k}\) of each pixel with a distribution that mixes a good measurement (normally distributed around the true depth \(\hat{d}\)) and an unknown measurement (uniformly distributed in an interval \([d_{\min },d_{\max }]\)):

$$\begin{aligned} p(d_k {\mid } \pi ) = \pi \mathcal {N}\left( d_k {\mid } \hat{d},\tau ^2_k\right) +(1-\pi )\mathcal {U}(d_k {\mid } d_{\min },d_{\max }) \end{aligned}$$
(1)

where \(\pi \) and \(\tau ^2_k\) are the probability and the variance of a good measurement in the k-th frame. Note that we use d to denote the inverse depth, which differs from [15]. This model is illustrated in Fig. 2.

As derived in [15], the posterior of the Bayesian estimation for d can be approximated by the product of a Gaussian distribution over the depth and a Beta distribution over the probability of a good measurement:

$$\begin{aligned}&q\left( \pi ,\hat{d} {\mid } a_k,b_k,\mu _k,\sigma ^2_k\right) \nonumber \\&\quad = \hbox {Beta}\left( \pi \mid a_k,b_k\right) \mathcal {N}\left( \hat{d} {\mid } \mu _k,\sigma ^2_k\right) \end{aligned}$$
(2)

where \(a_k\) and \(b_k\) are the parameters of the Beta distribution.

More details of this model are given in [15]; a similar model is also used in SVO [14].
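For illustration, the per-pixel state required by this model is small; the following sketch (the class and field names are ours, not part of the original implementation) shows the parametrization of Eq. (2) that the rest of this section operates on:

```python
from dataclasses import dataclass

@dataclass
class DepthHypothesis:
    """Per-pixel state of the Gaussian x Beta approximation in Eq. (2)."""
    a: float       # Beta parameter: evidence for good (inlier) measurements
    b: float       # Beta parameter: evidence for unknown (outlier) measurements
    mu: float      # mean of the inverse depth estimate
    sigma2: float  # variance of the inverse depth estimate

    @property
    def e_pi(self) -> float:
        # expected probability that an observation of this pixel is a good measurement
        return self.a / (self.a + self.b)
```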

Similar to LSD-SLAM, the mapping process in our system consists of four parts: depth map initialization, depth map update, depth map propagation and depth map regularization, with all steps modified to incorporate the applied depth map model. The initialization step initializes the depth map randomly with a large variance. The update step computes the depth observations in each frame and updates the depth map of the current keyframe. The propagation step creates a new keyframe when the current frame has moved too far away from the existing depth map and propagates the depth map from the old keyframe into the new one. The regularization step is executed after the update step; it computes the smoothed depth used for stereo searching and tracking. An overview of the mapping process is shown in Fig. 3.

Fig. 3 Mapping process of our approach. In the first keyframe, a depth map is randomly initialized. As new frames are captured, depth observations are calculated by stereo matching, and the depth map is updated at the same time. After the depth map update step has finished, the regularization step is carried out to obtain a smoothed depth for tracking and stereo searching. When the camera moves far away from the current keyframe, a new keyframe is created and the depth map of the previous keyframe is propagated into it. In the new depth map, valid pixels with no assigned depth measurement are initialized randomly. The new depth map is then updated and regularized iteratively again

3.1 Depth map initialization

Instead of estimating the relative pose between two or more frames to triangulate an initial map, as traditional monocular visual SLAM systems do, LSD-SLAM [8] initializes the first keyframe with a random depth map of large variance. In practice, most of these initial depths are outliers. Since our system adopts the Bayesian model [15], which naturally separates good measurements from unknown measurements, we can make full use of this property in the depth map initialization step.

Fig. 4 With several input image frames, a randomly initialized depth map can be upgraded to a correct depth configuration. In the first few frames (col. a–c), the inverse depth map converges to correct values and the expectation map keeps rising. With more input frames (col. c–e), the inverse depth map remains steady and the expectation map rises to a high level

In detail, as visualized in Fig. 4, we initialize each pixel in the depth map with a high expected probability of being an unknown measurement, a random depth and a large variance. After several subsequent frames, the depth map can be upgraded to a correct depth configuration using the Bayesian estimation described above.
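A minimal sketch of this initialization (using the DepthHypothesis structure from the previous sketch) could look as follows; the concrete numbers, a = 1 and b = 9 (i.e. \(E_\pi = 0.1\)) and the initial variance, are placeholders of our own choosing rather than values from the paper:

```python
import numpy as np

def init_depth_hypothesis(d_min, d_max, rng):
    """Initialize one pixel: random inverse depth, large variance,
    high expected probability of being an unknown measurement."""
    mu = rng.uniform(d_min, d_max)            # random inverse depth in the admissible range
    sigma2 = ((d_max - d_min) / 2.0) ** 2     # large variance: the prior is almost uninformative
    return DepthHypothesis(a=1.0, b=9.0, mu=mu, sigma2=sigma2)

# e.g. rng = np.random.default_rng(0); h = init_depth_hypothesis(0.1, 10.0, rng)
```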

Fig. 5 Keyframe sequences during the period of rotation-only motion in our system and LSD-SLAM. These keyframes are selected from image sequence 1, images 650 to 900. Both systems work well under general motion (col. a). Then, pure rotational motion begins. The number of depth points in LSD-SLAM keeps decreasing (col. b–d), and the system eventually fails in tracking. Our system, in contrast, can handle new pixels; they retain a low \(E_\pi \) during the rotational motion (col. b–d) because of the lack of parallax, but are sufficient for tracking. When the pure rotation ends, the depth map of our system is quickly updated to a correct depth configuration (col. f–g)

Fig. 6 Keyframe sequences in the period of rotation-only motion of our system and LSD-SLAM. The sequence is selected from image sequence 2, images 1080 to 1600

3.2 Depth map update

When the camera pose of a new frame has been estimated, the depth map update step is used to update the depth map of the reference keyframe. For every pixel with non-negligible gradient in the keyframe, a search that matches the pixel's intensity along the epipolar line in the current frame is performed. To improve the search efficiency, the search interval \(d \pm l(E_{\pi },\sigma _d)\) is limited by the prior information of the pixel, where \(l(E_{\pi },\sigma _d)\) is defined as

$$\begin{aligned} l(E_{\pi },\sigma _d) = 2E_{\pi }\sigma _d + (1-E_{\pi })\sigma _{\max } \end{aligned}$$
(3)

where \(E_{\pi }\) is the expected value of the probability of a good measurement, governed by the Beta distribution in Eq. (2); in other words, it is determined by the pixel parameters a and b as \(E_{\pi } = \frac{a}{a+b}\). The constant \(\sigma _{\max }\) is chosen such that 99% of the probability mass of the Gaussian over the inverse depth lies in the range \([d_{\min }, d_{\max }]\). Parameter \(\sigma _d\) is the inverse depth uncertainty (standard deviation) of the pixel, estimated from previous observations (Figs. 5, 6).
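Written out, the interval restriction amounts to a single expression; this helper simply evaluates Eq. (3):

```python
def epipolar_search_halfwidth(e_pi, sigma_d, sigma_max):
    # Eq. (3): a confident pixel (e_pi near 1) is searched within about two standard
    # deviations of its current inverse depth; an uncertain pixel (e_pi near 0) is
    # searched over (almost) the whole admissible inverse depth range.
    return 2.0 * e_pi * sigma_d + (1.0 - e_pi) * sigma_max
```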

Then, we need to estimate the uncertainty of the inverse depth. The method of [7, 8], which considers the photometric disparity error, the geometric disparity error and the pixel-to-inverse-depth ratio, is used to determine the accuracy of this stereo observation. Although these three factors were designed under a small-camera-rotation assumption in [7, 8], we note that if reference keyframes are generated frequently enough when rotation-only camera motion occurs, this estimation method remains reliable. We refer to the original work [7] for more details.

Once the inverse depth of the pixel and its uncertainty in the current observation have been estimated, they are incorporated into the Bayesian estimation described above. The parameters a, b, \(\mu \) and \(\sigma \) associated with the depth measurement of the pixel are then updated and used for subsequent tracking and mapping.
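The parametric update itself follows [15]; the sketch below mirrors the moment-matching recursion popularized by the SVO depth filter and operates on the DepthHypothesis structure introduced above. It is an illustration of the update rule under the assumption that the uniform component covers \([d_{\min }, d_{\max }]\), not our exact implementation:

```python
import numpy as np

def update_hypothesis(h, x, tau2, d_min, d_max):
    """Fuse one new inverse-depth observation x (variance tau2) into (a, b, mu, sigma2)."""
    # Gaussian fusion of the new observation with the current estimate
    s2 = 1.0 / (1.0 / h.sigma2 + 1.0 / tau2)
    m = s2 * (h.mu / h.sigma2 + x / tau2)
    # responsibilities of the inlier (Gaussian) and outlier (uniform) components
    var = h.sigma2 + tau2
    c1 = h.e_pi * np.exp(-0.5 * (x - h.mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
    c2 = (1.0 - h.e_pi) / (d_max - d_min)
    c1, c2 = c1 / (c1 + c2), c2 / (c1 + c2)
    # first and second moments of the inlier probability under the new posterior
    f = c1 * (h.a + 1.0) / (h.a + h.b + 1.0) + c2 * h.a / (h.a + h.b + 1.0)
    e = (c1 * (h.a + 1.0) * (h.a + 2.0) + c2 * h.a * (h.a + 1.0)) \
        / ((h.a + h.b + 1.0) * (h.a + h.b + 2.0))
    # new Gaussian parameters
    mu_new = c1 * m + c2 * h.mu
    sigma2_new = c1 * (s2 + m * m) + c2 * (h.sigma2 + h.mu * h.mu) - mu_new * mu_new
    # new Beta parameters by matching the first two moments
    a_new = (e - f) / (f - e / f)
    b_new = a_new * (1.0 - f) / f
    return DepthHypothesis(a=a_new, b=b_new, mu=mu_new, sigma2=sigma2_new)
```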

3.3 Depth map propagation

Incoming frames are evaluated to determine whether they should be added as keyframes. Since the search that matches pixels with non-negligible gradient along the epipolar line in the current frame against the reference keyframe has already been performed, the system can determine which parts of the new frame are covered by the depth map. To create a new keyframe, the following main conditions should be met:

1. A given number of pixels of the new frame have not been tracked.

2. A given distance or a given angle between the current frame and the reference keyframe is reached.

Note that due to the second condition, keyframes can be generated frequently in our system, which differs from LSD-SLAM.

If the camera moves far away from the current keyframe, or the rotation increases sharply, a new keyframe is created from the most recently tracked frame. Based on the estimated camera motion between the two frames, the depth map of the last keyframe is projected into the new keyframe. The new inverse depth is calculated by

$$\begin{aligned} \mu _k(x_k) = Tk^{-1}(x_0,\mu _0(x_0)){\mid }_Z \end{aligned}$$
(4)

where \(x_k\) is the corresponding pixel position in the new keyframe:

$$\begin{aligned} x_k = kTk^{-1}(x_0,\mu _0(x_0)) \end{aligned}$$
(5)

Since keyframes can be generated frequently when pure rotation occurs, the camera rotation between two consecutive keyframes can be assumed to be small for both general and rotation-only camera motions. The new variance can then be approximated by

$$\begin{aligned} \sigma ^2_{\mu _k} = {\left( \frac{\mu _k}{\mu _0}\right) }^4\sigma ^2_{\mu _0} + \sigma ^2_c \end{aligned}$$
(6)

where \(\sigma ^2_c\) is a constant which approximately accounts for the camera motion uncertainty. The parameters a and b, which encode the probability of a good measurement, are simply copied from the previous keyframe.
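As a sketch, propagating one hypothesis into the new keyframe then reduces to re-evaluating the mean via the reprojection of Eqs. (4)–(5) and rescaling the variance via Eq. (6); mu_new below is assumed to have been obtained from that reprojection:

```python
def propagate_hypothesis(h, mu_new, sigma2_c):
    """Carry a hypothesis from the old keyframe into the new one.
    mu_new: inverse depth after reprojection (Eqs. 4-5); sigma2_c: motion uncertainty."""
    sigma2_new = (mu_new / h.mu) ** 4 * h.sigma2 + sigma2_c   # Eq. (6)
    # the inlier/outlier evidence a and b is simply carried over unchanged
    return DepthHypothesis(a=h.a, b=h.b, mu=mu_new, sigma2=sigma2_new)
```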

Each propagated depth measurement is then assigned to the closest integer pixel position. For every pixel in the new keyframe, at most one depth measurement is allowed, so if two depth measurements are generated for the same pixel, the collision must be resolved. Let \(\eta _\mathrm{th}\) be the threshold on the expectation of a good measurement. There are three cases (see the sketch after this list):

(a) If both pixels satisfy \(E_\pi >\eta _\mathrm{th}\): if \({|\mu _1-\mu _2|}\le {\sigma _1+\sigma _2}\), they are considered two independent estimates of one pixel and are fused; otherwise, the point that is closer to the camera is retained, and the farther one is considered occluded and removed.

(b) If only one pixel satisfies \(E_\pi >\eta _\mathrm{th}\), that one is kept.

(c) If neither pixel satisfies the threshold, one of them is chosen at random.
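A sketch of this collision handling is given below; the fusion in case (a) is shown as a precision-weighted Gaussian fusion with the Beta evidence summed, which is one plausible realization rather than the exact rule used in our implementation:

```python
import numpy as np

def resolve_collision(h1, h2, eta_th):
    """Keep at most one hypothesis per pixel of the new keyframe (cases a-c)."""
    good1, good2 = h1.e_pi > eta_th, h2.e_pi > eta_th
    if good1 and good2:                                        # case (a)
        if abs(h1.mu - h2.mu) <= np.sqrt(h1.sigma2) + np.sqrt(h2.sigma2):
            # compatible: fuse as two independent estimates of the same pixel
            s2 = 1.0 / (1.0 / h1.sigma2 + 1.0 / h2.sigma2)
            mu = s2 * (h1.mu / h1.sigma2 + h2.mu / h2.sigma2)
            return DepthHypothesis(a=h1.a + h2.a, b=h1.b + h2.b, mu=mu, sigma2=s2)
        # incompatible: keep the closer point (larger inverse depth), drop the occluded one
        return h1 if h1.mu > h2.mu else h2
    if good1 != good2:                                         # case (b)
        return h1 if good1 else h2
    return h1 if np.random.rand() < 0.5 else h2                # case (c): pick one at random
```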

If a pixel with non-negligible gradient has no assigned depth measurement, it is initialized with a high expected probability of an unknown measurement, a random depth and a large variance. Then, as new observations are added to the Bayesian estimation, the depth measurement converges efficiently to the true value.

3.4 Depth map regularization

After the keyframe has been updated by subsequent frames, one regularization iteration is performed to smooth the inverse depth values. In detail, we average the surrounding inverse depths weighted by their probability of being good measurements and their inverse variance. In order to preserve sharp edges, only pixels with similar depths are included. The regularization function is defined as:

$$\begin{aligned} \mu _\mathrm{smooth}(x) = \frac{\sum _{{x}'\in {\varOmega _x}}{\alpha }g(E_\pi ({x}'),\sigma ({x}'))\mu _\mathrm{raw}({x}')}{\sum _{{x}'\in {\varOmega _x}}{\alpha }g(E_\pi ({x}'),\sigma ({x}'))} \end{aligned}$$
(7)

where \(\varOmega _x\) is the set of valid pixels in the \(3\times 3\) neighborhood around pixel x, and \(g(\pi ,\sigma )\) is the weighting function which will be introduced in Sect. 4. The parameter \(\alpha \) is used to preserve sharp edges and is defined as

$$\begin{aligned} {\alpha }(\mu ,\sigma ,{\mu }',{\sigma }') = {\left\{ \begin{array}{ll} 0 &{}\quad {\Vert {\mu -{\mu }'}\Vert _1 > 2\Vert {\sigma -{\sigma }'}\Vert _1}\\ 1 &{}\quad \text {else} \end{array}\right. } \end{aligned}$$
(8)

The smoothed depth will be utilized to restrict the stereo search range (Sect. 3.2) and track new frames (Sect. 4).
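The following sketch applies Eqs. (7)–(8) to one pixel; `weights` is assumed to hold the values of \(g(E_\pi ,\sigma )\) from Eq. (12) for the neighbors, and the window handling is left to the caller:

```python
import numpy as np

def smooth_inverse_depth(center, neighbours, weights):
    """One regularization step (Eqs. 7-8) for a single pixel of the 3x3 window."""
    num, den = 0.0, 0.0
    for h, w in zip(neighbours, weights):
        # edge-preserving gate alpha (Eq. 8): skip neighbours whose inverse depth differs
        # from the centre by more than twice the difference of the standard deviations
        if abs(center.mu - h.mu) > 2.0 * abs(np.sqrt(center.sigma2) - np.sqrt(h.sigma2)):
            continue
        num += w * h.mu
        den += w
    return num / den if den > 0.0 else center.mu
```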

Fig. 7 Grayscale frames and reconstruction results of the two image sequences. a Image sequence 1 captures the indoor scene of a laboratory, b image sequence 2 captures the outdoor scene of a museum

4 Dense tracking based on the probabilistic depth map

The camera pose of a new frame is estimated by dense image alignment against the depth map of the reference keyframe. As successfully applied in [7, 21], the photometric error of a pixel is defined as

$$\begin{aligned} r_I = I_2(kTk^{-1}(x,\mu (x)))-I_1(x) \end{aligned}$$
(9)

where k is the camera projection matrix and \(k^{-1}\) its inverse. \(T\in \hbox {SE}(3)\) is a transformation matrix representing the camera motion from the reference frame to the current frame. Since T has twelve parameters while the camera motion has only six degrees of freedom, we use the Lie algebra element \(\xi \in {\hbox {se}(3)}\) associated with the group \(\hbox {SE}(3)\); the transformation matrix T is recovered from \(\xi \) via the exponential map \(T=\exp (\xi )\). \(I_1(x)\) is the intensity of the pixel in the reference frame, and \(I_2(x)\) the intensity in the current frame.

In order to enhance robustness, we add an additional weighting term which is calculated by the probability and variance of good measurement for each valid point. The camera motion \(\xi ^*\) is calculated by minimizing the energy function:

$$\begin{aligned} \xi ^* = \mathop {\arg \min }_{\xi }\sum _{x\in \varOmega }g(E_\pi (x),\sigma _i(x))\Vert {r_i(\xi ,x)}\Vert _\varepsilon \end{aligned}$$
(10)

where \(\Vert {r_i(\xi ,x)}\Vert _\varepsilon \) is the Huber norm, used to penalize outliers and increase robustness:

$$\begin{aligned} \Vert {r_i(\xi ,x)}\Vert _\varepsilon = {\left\{ \begin{array}{ll} \frac{\Vert {r_i(\xi ,x)}\Vert ^2_2}{2\varepsilon } &{}\quad \text {if}\;{\Vert {r_i(\xi ,x)}\Vert _2\le \varepsilon }\\ \Vert {r_i(\xi ,x)}\Vert _1-\frac{\varepsilon }{2} &{}\quad \text {otherwise} \end{array}\right. } \end{aligned}$$
(11)

\(g(\pi ,\sigma )\) is the weighting function, represented as

$$\begin{aligned} g(\pi ,\sigma ) = \pi \frac{\sigma ^2_{\max }}{\sigma ^2}+(1-\pi )\frac{\sigma ^2_{\max }}{\lambda } \end{aligned}$$
(12)

where \(\lambda \) is a constant controlling the weight of the unknown-measurement term. Obviously, the weight of an unknown measurement should be much smaller than that of a good measurement, so \(\lambda \) should be at least larger than \(\sigma ^2_{\max }\). In our experiments, \(\lambda \) is set to \(4\sigma ^2_{\max }\).

When a point has a high expected value \(E_\pi \) for the probability of a reliable measurement, its weight is mainly controlled by the measurement variance \(\sigma ^2_i\): the smaller the variance, the larger the weight of this point in the minimization of the energy function. Conversely, the weight decreases with a small expectation \(E_\pi \) or a higher variance \(\sigma ^2_i\).
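Both scalar building blocks of the energy in Eq. (10) are simple to state in code; the helpers below evaluate the Huber norm of Eq. (11) and the weight of Eq. (12), with \(\lambda = 4\sigma ^2_{\max }\) as the default used in our experiments:

```python
def huber_norm(r, eps):
    """Huber norm of Eq. (11) for a scalar photometric residual r."""
    return r * r / (2.0 * eps) if abs(r) <= eps else abs(r) - eps / 2.0

def g_weight(e_pi, sigma2, sigma2_max, lam=None):
    """Per-pixel weight of Eq. (12): reliable, low-variance pixels dominate the energy."""
    if lam is None:
        lam = 4.0 * sigma2_max          # default used in our experiments
    return e_pi * sigma2_max / sigma2 + (1.0 - e_pi) * sigma2_max / lam
```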

The minimization problem is solved iteratively with a reweighted Gauss–Newton algorithm. A coarse-to-fine approach is employed to handle larger inter-frame motions: each new frame is first tracked on a low-resolution image and depth map, and the tracked pose is then used as initialization at the next higher resolution. Depth maps are downsampled by factors of two, using a weighted average of the inverse depth and inverse variance:

$$\begin{aligned} \mu _{l+1}(x) = \frac{\sum _{{x}'\in \varOmega _x}g({x}')\mu _l({x}')}{\sum _{{x}'\in \varOmega _x}g({x}')} \end{aligned}$$
(13)
$$\begin{aligned} \sigma _{l+1}(x) = \frac{\sum _{{x}'\in \varOmega _x}g({x}')\sigma _l({x}')}{\sum _{{x}'\in \varOmega _x}g({x}')} \end{aligned}$$
(14)

where l is the pyramid level and \(\varOmega _x\) is the set of valid pixels at the higher resolution that are contained in pixel x.
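A sketch of one pyramid-level reduction implementing Eqs. (13)–(14) is given below, operating on dense arrays where invalid pixels carry weight zero; the array layout and border handling are our own choices for illustration:

```python
import numpy as np

def downsample_level(mu, sigma, g):
    """Weighted 2x2 pooling of inverse depth and its uncertainty (Eqs. 13-14).
    mu, sigma and g are HxW arrays; invalid pixels should have weight g = 0."""
    H, W = mu.shape[0] // 2 * 2, mu.shape[1] // 2 * 2   # crop to an even size
    def blocks(a):
        # the four high-resolution pixels contained in each low-resolution pixel
        a = a[:H, :W]
        return np.stack([a[0::2, 0::2], a[0::2, 1::2], a[1::2, 0::2], a[1::2, 1::2]])
    gb, mub, sigb = blocks(g), blocks(mu), blocks(sigma)
    wsum = gb.sum(axis=0) + 1e-12                       # avoid division by zero
    return (gb * mub).sum(axis=0) / wsum, (gb * sigb).sum(axis=0) / wsum
```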

5 Experimental results and discussion

The implementation of our system was extended from the main framework of LSD-SLAM. We recorded two image sequences which contain both general motion and rotation-only motion to demonstrate the additional capabilities of our system.

Fig. 8 Statistical results for image sequence 1 of our method and LSD-SLAM. a Tracking status of LSD-SLAM, b tracking status of our method, c average expectation \(E_\pi \) of the keyframe, which can be used to evaluate the quality of the camera motion

Image sequence 1 (Fig. 7a) captures a room-sized indoor scene and was recorded by an iPad Air 2 with a fish-eye lens. Sequence 2 (Fig. 7b) was recorded by a micro aerial vehicle (MAV), a DJI Phantom 3, and captures the outdoor scene of a museum from the air. We processed both image sequences with our method and with LSD-SLAM. The experiments were performed on a computer equipped with a quad-core 3.5 GHz CPU and 8 GB of RAM. Figure 7 also shows the point clouds and camera trajectories produced by our method, while LSD-SLAM fails to create complete maps.

We collected tracking and mapping statistics for the two image sequences. While LSD-SLAM can only create submaps during separate periods of regular camera motion, our approach can merge these submaps, otherwise separated by rotation-only camera motion, into a single map and thus provide more constraints for loop-closure optimization; consequently, we can reconstruct a larger and denser semi-dense map. In Fig. 8, the tracking and mapping statistics for image sequence 1 are presented. In Fig. 8a, b, we observe that our method tracks all of the frames, while LSD-SLAM fails six times (the system was manually reset after each failure). All tracking failures of LSD-SLAM are caused by pure rotational camera motions, which our method handles. In Fig. 8c, we show how the average expectation \(E_\pi \) changes with the type of camera motion. During general motion, the average expectation \(E_\pi \) remains at a high level and is affected by the quality of the camera motion; fast motion and frequent keyframe changes lead to relatively low values. During rotation-only motion, the average expectation \(E_\pi \) drops to a low level because not enough parallax is observed, but the rotation-only motion can still be tracked against the map. When the camera returns to general motion, the average \(E_\pi \) returns to its former high level as well.

In Fig. 9, the statistics for image sequence 2 are presented. LSD-SLAM fails twice in tracking, while our method tracks all of the frames. We also note that the curve of the average expectation \(E_\pi \) in Fig. 9c is smoother than that in Fig. 8c, because the camera motion in image sequence 2 is much smoother than in image sequence 1.

In Figs. 5 and 6, keyframe sequences of both algorithms during periods of rotation-only camera motion are presented to show intuitively how our approach handles rotation-only motion. Figure 5 corresponds to frames 650–900 of image sequence 1, and Fig. 6 corresponds to frames 1080–1600 of image sequence 2. Both algorithms work well under general camera motion (col. a in Figs. 5, 6).

When pure rotation occurs, LSD-SLAM cannot create new depth points due to the lack of parallax and only propagates old depth points to the new keyframe. The number of valid depth points therefore keeps decreasing (col. b–d in Figs. 5, 6) until the system fails in tracking. Our method instead initializes new pixels with non-negligible gradient with a low expectation \(E_\pi \) and a random depth. These new pixels retain a low expectation \(E_\pi \) during the rotational motion (col. b–d in Figs. 5, 6) because of the lack of parallax, but they are sufficient to track the rotation-only motion. When the pure rotation ends, the depth map of our method is quickly updated to a correct depth configuration (col. e–g in Figs. 5, 6).

Fig. 9 Statistical results for image sequence 2 of our method and LSD-SLAM. a Tracking status of LSD-SLAM, b tracking status of our method, c average expectation \(E_\pi \) of the keyframe

Fig. 10 Reconstruction result for the City of Sights dataset [22]. The top shows two frames of the dataset CS_RA_L0_BirdsView (left to right: frame 1589 and frame 1816). The bottom shows the semi-dense point cloud reconstruction

Fig. 11 Reconstruction result of sequence fr3/near [23]. The top shows two frames in the dataset (left to right: frame 235 and frame 830). The bottom shows the semi-dense point cloud reconstruction

To demonstrate that our system not only handles pure rotational motions but also performs well under normal conditions, we evaluated the proposed approach on two widely used datasets, the City of Sights stage set [22] and the TUM RGB-D dataset [23]. Figures 10 and 11 depict selected frames from different views and the reconstruction results for these two datasets, composed of colored semi-dense 3D points. Table 1 reports the RMSE on four sequences of the TUM RGB-D dataset [23] compared with LSD-SLAM [8]; the results are very close, since these sequences contain few rotation-only motions.

Table 1 Comparison of RMSE (cm) on TUM RGB-D dataset [23]

6 Conclusion

In this paper, we have proposed a real-time direct (featureless) monocular SLAM system which combines a probabilistic depth map model based on Bayesian estimation with the main framework of LSD-SLAM. The system is able to handle rotation-only camera motion, which remains a severe challenge for current direct SLAM approaches.

The probabilistic depth map models the depth of every pixel as a mixture of a good measurement and an unknown measurement, and the resulting Bayesian estimation allows both general and rotation-only camera motions to be handled.

Experimental results demonstrate that the proposed system keeps tracking through rotation-only motions where LSD-SLAM fails, while achieving comparable accuracy under general motion.

Like other direct methods, however, our system still faces considerable challenges in the presence of geometric noise or fast motion, owing to the inherent limitations of direct approaches. In future work, we would like to incorporate feature-based algorithms or IMU measurements to alleviate these problems.