
1 Introduction

The classical challenge of image-based large-scale reconstruction is witnessing renewed interest with the emergence of large-scale internet photo collections [2]. The computational bottleneck of 3D reconstruction and structure-from-motion methods is the problem of large-scale bundle adjustment (BA): given a set of measured image feature locations and correspondences, BA aims to jointly estimate the 3D landmark positions and camera parameters by minimizing a non-linear least squares reprojection error. More specifically, the most time-consuming step is the solution of the normal equation in the popular Levenberg-Marquardt (LM) algorithm, which is typically solved by Preconditioned Conjugate Gradients (PCG).

In this paper, we propose a new iterative solver for the normal equation that relies on the decomposable structure of the competitive block Jacobi preconditioner. Inspired by related approaches in the domain-decomposition literature, we exploit the specific structure of the Schur complement matrix to enlarge the search-space of the traditional PCG approach, leading to what we call Multidirectional Conjugate Gradients (MCG). In particular, our contributions are as follows:

Fig. 1. (a) Optimized 3D reconstruction of a final BAL dataset with 1936 poses and more than five million observations. For this problem MCG is 39% faster than PCG and the overall BA resolution is 16% faster. (b) Optimized 3D reconstruction of the Alamo dataset from 1dSfM with 571 poses and 900,000 observations. For this problem MCG is 56% faster than PCG and the overall BA resolution is 22% faster.

  • We design an extension of the popular PCG by using local contributions of the poses to augment the space in which a solution is sought.

  • We experimentally demonstrate the robustness of MCG with respect to the relevant hyper-parameters.

  • We evaluate MCG on a multitude of BA problems from BAL [1] and 1dSfM [20] datasets with different sizes and show that it is a promising alternative to PCG.

  • We experimentally confirm that the performance gain of our method increases with the density of the Schur complement matrix leading to a speedup for solving the normal equation of up to 61% (Fig. 1).

2 Related Work

Since we propose a way to solve medium- to large-scale BA using a new iterative solver that enlarges the search-space of the traditional PCG, in the following we review both scalable BA and recent CG literature.

Scalable Bundle Adjustment

A detailed survey of the theory and methods in the BA literature can be found in [17]. The sparsity of the BA problem is commonly exploited with the Schur complement matrix [5]. As the performance of BA methods is closely linked to the resolution of the normal equations, speeding up the solve step is a challenging task. Traditional direct solvers such as sparse or dense Cholesky factorization [11] have been outperformed by inexact solvers as the problem size increases and are therefore frequently replaced by Conjugate Gradients (CG) based methods [1, 6, 18]. As its convergence rate depends on the condition number of the linear system, a preconditioner is used to correct ill-conditioned BA problems [14]. Several works tackle the design of performant preconditioners for BA: [9] proposed the band block diagonals of the Schur complement matrix, [10] exploited the strength of the coupling between two poses to construct cluster-Jacobi and block-tridiagonal preconditioners, and [7] built on the combinatorial structure of BA. However, despite these advances in the design of preconditioners, the iterative solver itself has rarely been challenged.

(Multi-preconditioned) Conjugate Gradients

Although CG has been a popular iterative solver for decades [8], there are some interesting recent innovations, e.g. flexible methods with a preconditioner that changes throughout the iterations [13]. The case of a preconditioner that can be decomposed into a sum of preconditioners has been exploited by using Multi-Preconditioned Conjugate Gradients (MPCG) [4]. Unfortunately, with increasing system size MPCG rapidly becomes inefficient. As a remedy, Adaptive Multi-Preconditioned Conjugate Gradients have recently been proposed [3, 15]. This approach is particularly well adapted to domain-decomposable problems [12]. While the decomposition of the reduced camera system in BA has already been tackled, e.g. with stochastic clusters in [19], to our knowledge the decomposition inside the iterative solver has never been explored. As we will show in the following, this modification gives rise to a significant boost in performance.

3 Bundle Adjustment and Multidirectional Conjugate Gradients

We consider the general form of bundle adjustment with \(n_{p}\) poses and \(n_{l}\) landmarks. Let x be the state vector containing all the optimization variables. It is divided into a pose part \(x_{p}\) of length \(d_{p}n_{p}\) containing extrinsic and possibly intrinsic camera parameters for all poses (generally \(d_{p}=6\) if only extrinsic parameters are unknown and \(d_{p}=9\) if intrinsic parameters also need to be estimated) and a landmark part \(x_{l}\) of length \(3n_{l}\) containing the 3D coordinates of all landmarks. Let \(r\left( x\right) =[r_{1}\left( x\right) ,...,r_{k}\left( x\right) ]\) be the vector of residuals of the 3D reconstruction. The objective is to minimize the sum of squared residuals

$$\begin{aligned} F\left( x\right) =\Vert r\left( x\right) \Vert ^{2} = \sum _i \Vert r_i(x) \Vert ^{2} \end{aligned}$$
(1)

3.1 Least Squares Problem and Schur Complement

This minimization problem is usually solved with the Levenberg-Marquardt algorithm, which is based on the first-order Taylor approximation of \(r\left( x\right) \) around the current state estimate \(x^{0}=\left( x_{p}^{0},x_{l}^{0}\right) \):

$$\begin{aligned} r\left( x\right)&\approx r^{0}+J\varDelta x \end{aligned}$$
(2)

where

$$\begin{aligned} r^{0}&=r\left( x^{0}\right) ,\end{aligned}$$
(3)
$$\begin{aligned} \varDelta x&= x-x^{0},\end{aligned}$$
(4)
$$\begin{aligned} J&=\frac{\partial r}{\partial x}\mid _{x=x^{0}} \end{aligned}$$
(5)

and J is the Jacobian of r that is decomposed into a pose part \(J_{p}\) and a landmark part \(J_{l}\). An added regularization term that improves convergence gives the damped linear least squares problem

$$\begin{aligned} \underset{\varDelta x_{p},\varDelta x_{l}}{\text {min}}&\left( \Vert r^{0}+\left( \begin{array}{cc} J_{p}&J_{l}\end{array}\right) \left( \begin{array}{c} \varDelta x_{p}\\ \varDelta x_{l} \end{array}\right) \Vert ^{2}+\lambda \Vert \left( \begin{array}{cc} D_{p}&D_{l}\end{array}\right) \left( \begin{array}{c} \varDelta x_{p}\\ \varDelta x_{l} \end{array}\right) \Vert ^{2}\right) \end{aligned}$$
(6)

with \(\lambda \) a damping coefficient and \(D_{p}\) and \(D_{l}\) diagonal damping matrices for pose and landmark variables. This damped problem leads to the corresponding normal equation

$$\begin{aligned} H\left( \begin{array}{c} \varDelta x_{p}\\ \varDelta x_{l} \end{array}\right)&=-\left( \begin{array}{c} b_{p}\\ b_{l} \end{array}\right) \end{aligned}$$
(7)

where

$$\begin{aligned} H&= \left( \begin{array}{cc} U_{\lambda } &{} W\\ W^{\top } &{} V_{\lambda } \end{array}\right) ,\end{aligned}$$
(8)
$$\begin{aligned} U_{\lambda }&=J_{p}^{\top }J_{p}+\lambda D_{p}^{\top }D_{p},\end{aligned}$$
(9)
$$\begin{aligned} V_{\lambda }&=J_{l}^{\top }J_{l}+\lambda D_{l}^{\top }D_{l},\end{aligned}$$
(10)
$$\begin{aligned} W&=J_{p}^{\top }J_{l},\text { }b_{p}=J_{p}^{\top }r^{0},\end{aligned}$$
(11)
$$\begin{aligned} b_{l}&=J_{l}^{\top }r^{0} \end{aligned}$$
(12)
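For reference, and writing the damping so that it acts separately on pose and landmark variables, (7)–(12) are simply the stationarity condition of the damped problem (6):

$$\begin{aligned} \left( J^{\top }J+\lambda D^{\top }D\right) \left( \begin{array}{c} \varDelta x_{p}\\ \varDelta x_{l} \end{array}\right) =-J^{\top }r^{0},\qquad J=\left( \begin{array}{cc} J_{p}&J_{l}\end{array}\right) ,\quad D=\left( \begin{array}{cc} D_{p} &{} 0\\ 0 &{} D_{l} \end{array}\right) \end{aligned}$$

Expanding the blocks of \(J^{\top }J+\lambda D^{\top }D\) and of \(J^{\top }r^{0}\) yields exactly the expressions (8)–(12).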

As the system matrix H is of size \(\left( d_{p}n_{p}+3n_{l}\right) \times \left( d_{p}n_{p}+3n_{l}\right) \) and solving (7) directly tends to be excessively costly for large-scale problems [1], it is common to reduce the system by using the Schur complement trick and forming the reduced camera system

$$\begin{aligned} S\varDelta x_{p}&=-\widetilde{b} \end{aligned}$$
(13)

with

$$\begin{aligned} S&=U_{\lambda }-WV_{\lambda }^{-1}W^{\top },\end{aligned}$$
(14)
$$\begin{aligned} \widetilde{b}&=b_{p}-WV_{\lambda }^{-1}b_{l} \end{aligned}$$
(15)

and then solving (13) for \(\varDelta x_{p}\) and backsubstituting \(\varDelta x_{p}\) in

$$\begin{aligned} \varDelta x_{l}&=-V_{\lambda }^{-1}\left( b_{l}+W^{\top }\varDelta x_{p}\right) \end{aligned}$$
(16)
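To make the reduction concrete, here is a minimal dense Eigen sketch of (14)–(16). It is illustrative only and the function names are ours; in an actual BA implementation \(U_{\lambda }\), \(V_{\lambda }\) and W are block-sparse and \(V_{\lambda }\) is inverted block-wise per landmark.

```cpp
#include <Eigen/Dense>

// Reduced camera system (14)-(15): S = U - W V^{-1} W^T, b_tilde = b_p - W V^{-1} b_l.
struct ReducedSystem {
  Eigen::MatrixXd S;
  Eigen::VectorXd b_tilde;
};

ReducedSystem buildReducedSystem(const Eigen::MatrixXd& U, const Eigen::MatrixXd& V,
                                 const Eigen::MatrixXd& W, const Eigen::VectorXd& b_p,
                                 const Eigen::VectorXd& b_l) {
  Eigen::MatrixXd Vinv = V.inverse();  // block-diagonal in real BA, hence cheap to invert
  Eigen::MatrixXd S = U - W * Vinv * W.transpose();
  Eigen::VectorXd b_tilde = b_p - W * Vinv * b_l;
  return {S, b_tilde};
}

// Back-substitution (16): recover the landmark update from the pose update.
Eigen::VectorXd backSubstitute(const Eigen::MatrixXd& V, const Eigen::MatrixXd& W,
                               const Eigen::VectorXd& b_l, const Eigen::VectorXd& dx_p) {
  return -V.ldlt().solve(b_l + W.transpose() * dx_p);
}
```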

3.2 Multidirectional Conjugate Gradients

Direct methods such as Cholesky decomposition [17] have been studied for solving (13) for small-size problems, but this approach incurs a high computational cost as problems grow large.

A very popular iterative solver for large symmetric positive-definite systems is the CG algorithm [16]. Since its convergence rate depends on the distribution of the eigenvalues of S, it is common to replace (13) by a preconditioned system. Given a preconditioner M, the preconditioned linear system associated to

$$\begin{aligned} S\varDelta x_{p}=-\widetilde{b} \end{aligned}$$
(17)

is

$$\begin{aligned} M^{-1}S\varDelta x_{p}=-M^{-1}\widetilde{b} \end{aligned}$$
(18)

and the resulting algorithm is called Preconditioned Conjugate Gradients (PCG) (see Algorithm 1). For block-structured matrices such as S, a competitive preconditioner is the block-diagonal matrix \(D\left( S\right) \), also called the block Jacobi preconditioner [1]. It is composed of the diagonal blocks of S. Since the block \(S_{mj}\) of S is nonzero if and only if cameras m and j observe at least one common point, each diagonal block depends on a unique pose and is applied to the part of the conjugate gradients residual \(r_{i}^{j}\) associated with this pose. The motivation of this section is to enlarge the conjugate gradients search-space by using several local contributions instead of a single global contribution.
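To illustrate how such a block Jacobi preconditioner is applied, the following sketch performs one small \(d_{p}\times d_{p}\) solve per pose. The names and data layout (one dense diagonal block of S stored per pose) are assumptions made for exposition, not a description of our actual implementation.

```cpp
#include <vector>
#include <Eigen/Dense>

// Apply D(S)^{-1} to a residual r by solving one d_p x d_p system per pose.
// diag_blocks[k] is assumed to hold the k-th diagonal block S_kk of S.
Eigen::VectorXd applyBlockJacobi(const std::vector<Eigen::MatrixXd>& diag_blocks,
                                 const Eigen::VectorXd& r, int d_p) {
  Eigen::VectorXd z(r.size());
  for (int k = 0; k < static_cast<int>(diag_blocks.size()); ++k) {
    // Each diagonal block only touches the residual entries of its own pose.
    z.segment(k * d_p, d_p) = diag_blocks[k].ldlt().solve(r.segment(k * d_p, d_p));
  }
  return z;
}
```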

Algorithm 1. Preconditioned Conjugate Gradients (PCG).

Adaptive Multidirections

Fig. 2. (a) The block-Jacobi preconditioner \(D\left( S\right) \) is divided into N submatrices \(D_{p}\left( S\right) \) and each of them is directly applied to the associated block-row \(r^{p}\) of the CG residual. (b) Depending on a \(\tau \)-test the search-space is enlarged: each iteration provides N times more search-directions than PCG.

Local Preconditioners. We propose to decompose the set of poses into N subsets of sizes \(l_{1},\ldots , l_{N}\) and to consider the block-diagonal matrix \(D_{p}\left( S\right) \) of the block-Jacobi preconditioner and the associated residual \(r^{p}\) that correspond to the \(l_{p}\) poses of subset p (see Fig. 2(a)). All direct solves are performed inside these subsets rather than on the global set. Each local solve is treated as a separate preconditioned equation and provides its own search-direction. Consequently the conjugate vectors \(Z_{i+1}\in \mathbb {R}^{d_{p}n_{p}}\) in preconditioned conjugate gradients (line 10 in Algorithm 1) are replaced by conjugate matrices \(Z_{i+1}\in \mathbb {R}^{d_{p}n_{p}\times N}\), each column of which corresponds to a local preconditioned solve. The search-space is thus significantly enlarged: N search-directions are generated at each inner iteration instead of only one. An important drawback is that matrix-vector products are replaced by matrix-matrix products, which can incur a significant additional cost, so a trade-off between convergence improvement and computational cost must be struck.
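A minimal sketch of these local solves, under the same assumed block storage as above, could look as follows; each subset fills only the rows of its own poses, so column p of \(Z_{i+1}\) is zero outside subset p.

```cpp
#include <vector>
#include <Eigen/Dense>

// Build the conjugate matrix whose p-th column is the local preconditioned solve
// of subset p applied to its block-row of the residual (cf. Fig. 2(a)).
// subset_of_pose maps pose k to its subset index in [0, N).
Eigen::MatrixXd localPreconditionedDirections(
    const std::vector<Eigen::MatrixXd>& diag_blocks,
    const std::vector<int>& subset_of_pose,
    const Eigen::VectorXd& r, int d_p, int N) {
  Eigen::MatrixXd Z = Eigen::MatrixXd::Zero(r.size(), N);
  for (int k = 0; k < static_cast<int>(diag_blocks.size()); ++k) {
    Z.col(subset_of_pose[k]).segment(k * d_p, d_p) =
        diag_blocks[k].ldlt().solve(r.segment(k * d_p, d_p));
  }
  return Z;
}
```

Note that summing the columns of this matrix recovers the global block Jacobi solve \(D\left( S\right) ^{-1}r\), which is why the local and global preconditioners can share the same blocks (see the implementation remarks below).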

Algorithm 2. Multidirectional Conjugate Gradients (MCG).

Adaptive \(\tau \)-Test. Following an approach similar to [15], we propose an adaptive multidirectional conjugate gradients algorithm (MCG, see Algorithm 2) that adapts automatically if convergence is too slow. Given a threshold \(\tau \in \mathbb {R}^{+}\) chosen by the user, a \(\tau \)-test determines whether the algorithm sufficiently reduces the error (case \(t_{i}>\tau \)) or not (case \(t_{i}<\tau \)). In the first case a global block Jacobi preconditioner is used and the algorithm performs a step of PCG; in the second case local block Jacobi preconditioners are used and the search-space is enlarged (see Fig. 2(b)).
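Schematically, and reusing the two sketches above, the adaptive switch can be written as follows; the computation of the indicator \(t_{i}\) itself follows Algorithm 2 and is not reproduced here.

```cpp
// Adaptive tau-test (sketch): one global block Jacobi direction when the error
// reduction t_i is sufficient (a plain PCG step), N local directions otherwise.
// applyBlockJacobi and localPreconditionedDirections are the sketches above.
Eigen::MatrixXd adaptiveDirections(double t_i, double tau,
                                   const std::vector<Eigen::MatrixXd>& diag_blocks,
                                   const std::vector<int>& subset_of_pose,
                                   const Eigen::VectorXd& r, int d_p, int N) {
  if (t_i > tau)
    return applyBlockJacobi(diag_blocks, r, d_p);  // single column: global preconditioner
  return localPreconditionedDirections(diag_blocks, subset_of_pose, r, d_p, N);
}
```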

Optimized Implementation. Besides the matrix-matrix products, two other changes appear. Firstly, an \(N\times N\) matrix \(\varDelta _{i}\) must be inverted (or pseudo-inverted if \(\varDelta _{i}\) is not full-rank) each time \(t_{i}<\tau \) (line 4 in Algorithm 2). Secondly, a full reorthogonalization is now necessary (line 16 in Algorithm 2) because of numerical errors, whereas in PCG \(\beta _{i,j}=0\) for \(i\ne j\) by construction.
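For the small \(N\times N\) matrix \(\varDelta _{i}\) this can be done, e.g., with Eigen's complete orthogonal decomposition, which provides both a rank test and a pseudo-inverse; the following sketch is illustrative and the helper name is ours.

```cpp
#include <Eigen/Dense>

// Invert the small N x N matrix Delta_i, falling back to the Moore-Penrose
// pseudo-inverse when it is rank-deficient.
Eigen::MatrixXd invertDelta(const Eigen::MatrixXd& Delta) {
  Eigen::CompleteOrthogonalDecomposition<Eigen::MatrixXd> cod(Delta);
  if (cod.rank() == Delta.rows())
    return Delta.inverse();     // full-rank: plain inverse
  return cod.pseudoInverse();   // rank-deficient: pseudo-inverse
}
```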

To improve the efficiency of MCG we do not apply S to \(P_{i}\) directly (line 3 in Algorithm 2) when the search-space is enlarged. By construction the block \(S_{kj}\) is nonzero if and only if cameras k and j observe at least one common point. The trick is to exploit the construction of \(Z_{i}\) and to apply only the non-zero blocks \(S_{jk}\), i.e. to consider only poses j observing a common point with pose k, to the column of \(Z_{i}\) associated with the subset containing pose k, and then to compute

$$\begin{aligned} Q_{i}&=SZ_{i}-\mathop {\sum }_{j=0}^{i-1}Q_{j}\beta _{i,j} \end{aligned}$$
(19)

To get \(t_{i}\) we need a global solve (line 10 in Algorithm 2). As the local block Jacobi preconditioners \(\{D_{p}(S)\}_{p=1,...,N}\) and the global block Jacobi preconditioner \(D\left( S\right) \) share the same blocks, it is not necessary to perform all local solves to construct the conjugate matrix (line 12 in Algorithm 2); instead it is more efficient to fill this matrix with the block-row elements of the preconditioned residual \(D\left( S\right) ^{-1}r_{i+1}\).

As the behaviour of the CG residuals is a priori unknown, the best decomposition is not obvious. We decompose the set of poses into \(N-1\) subsets of equal size and fill the last subset with the few remaining poses. This structure has the practical advantage of being very easy to construct, and the parallelizable block operations are well balanced.
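A sketch of this simple partition is given below; it is purely illustrative and the helper name is ours.

```cpp
#include <algorithm>
#include <vector>

// Assign each pose to one of N subsets: N-1 subsets of equal size, the last
// subset collects the remaining poses. Assumes n_p >= N - 1.
std::vector<int> partitionPoses(int n_p, int N) {
  std::vector<int> subset_of_pose(n_p);
  const int size = std::max(1, n_p / (N - 1));  // poses per regular subset
  for (int k = 0; k < n_p; ++k)
    subset_of_pose[k] = std::min(k / size, N - 1);  // overflow goes to the last subset
  return subset_of_pose;
}
```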

4 Experimental Evaluations

4.1 Algorithm and Datasets

Levenberg-Marquardt (LM) Loop. Starting with damping parameter \(10^{-4}\), we update \(\lambda \) according to the success or failure of the LM loop. Our implementation runs for at most 25 iterations, terminating early if a relative function tolerance of \(10^{-6}\) is reached. Our evaluation is built on the LM loop implemented in [19] and we also estimate intrinsic parameters for each pose.

Iterative Solver Step. For a direct performance comparison we implement our own MCG and PCG solvers in C++ using the Eigen 3.3 library. All row-major sparse matrix-vector and matrix-matrix products are multi-threaded using 4 cores. The tolerance \(\epsilon \) and the maximum number of iterations are set to \(10^{-6}\) and 1000, respectively. Pseudo-inversion is computed with the pseudo-inverse function from Eigen.

Datasets. For our evaluation we use 9 problems with different sizes and heterogeneous Schur complement matrix densities d from the BAL [1] and 1dSfM [20] datasets (see Table 1). The values of N and \(\tau \) are chosen arbitrarily and the robustness of our algorithm to these parameters is discussed in the next subsection.

Table 1. Details of the problems from BAL (prefixed as: F for final, L for Ladybug) and 1dSfM used in our experiments. d is the density of the associated Schur complement matrix, N is the number of subsets, \(\tau \) is the adaptive threshold that enlarges the search-direction space.

We run experiments on macOS 11.2 with an Intel Core i5 with 4 cores at 2 GHz.

4.2 Sensitivity to \(\tau \) and N

In this subsection we are interested in the solver runtime ratio, defined as \(\frac{t_{MCG}}{t_{PCG}}\) where \(t_{MCG}\) (resp. \(t_{PCG}\)) is the total runtime to solve all the linear systems (13) with MCG (resp. PCG) until a given BA problem converges. We investigate the influence of \(\tau \) and N on this ratio.

Sensitivity to \(\tau \). We solve the BA problems for different values of \(\tau \) and for the fixed number of subsets N given in Table 1. For each problem a wide range of values supplies a good trade-off between the augmented search-space and the additional computational cost (see Fig. 3). Although the choice of \(\tau \) is crucial, it does not require high accuracy. This confirms the robustness of our solver to \(\tau \).

Fig. 3. Robustness to \(\tau \). The plots show the performance ratio as a function of \(\tau \) for the number of subsets given in Table 1. The wide range of values giving similar performance confirms the robustness of MCG to \(\tau \).

Sensitivity to N. Similarly, we solve the BA problems for different values of N and for the fixed \(\tau \) given in Table 1. For each problem a wide range of values supplies a good trade-off between the augmented search-space and the additional computational cost (see Fig. 4). This confirms the robustness of our solver to N.

Fig. 4. Robustness to the number of subsets N. The plots show the performance ratio as a function of N for the \(\tau \) given in Table 1. The wide range of values giving similar performance confirms the robustness of MCG to N.

Fig. 5. Density effect on the relative performance. Each point represents a BA problem from Table 1 and d is the density of the Schur complement matrix. Our solver competes with PCG for sparse Schur matrices and leads to a significant speed-up for dense Schur matrices.

4.3 Density Effect

As the performance of PCG and MCG depends on matrix-vector products and matrix-matrix products, respectively, we expect a correlation with the density of the Schur matrix. Figure 5 confirms this intuition: MCG greatly outperforms PCG for dense Schur matrices and is competitive for sparse ones.

4.4 Global Performance

Figures 6 and 7 present the total runtime with respect to the number of BA iterations for each problem and the convergence plots of the total BA cost for the F-1936 and Alamo datasets, respectively. MCG and PCG give the same error at each BA iteration but the former is more efficient in terms of runtime. Table 2 summarizes our results and highlights the strong performance of MCG for dense Schur matrices. In the best case the BA resolution is more than \(20\%\) faster than with PCG. Even for sparser matrices MCG competes with PCG: in the worst case MCG presents results similar to PCG. If we restrict the comparison to the linear system solve steps our relative results are even better: MCG is up to \(60\%\) faster than PCG and presents results similar to PCG in the worst case.

Fig. 6. Global runtime to solve the BA problems. The plots represent the total time with respect to the number of BA iterations. For almost all problems the BA resolution with MCG (orange) is significantly faster than with PCG (blue). (Color figure online)

Fig. 7. Convergence plots of (a) Final-1936 from the BAL dataset and (b) Alamo from the 1dSfM dataset. The y-axes show the total BA cost.

Table 2. Relative performance of MCG w.r.t. PCG. d is the density of the associated Schur complement matrix. MCG greatly outperforms PCG (up to 61% faster) for dense Schur matrices and competes with PCG for sparse ones. The global BA resolution is up to 22% faster.

5 Conclusion

We propose a novel iterative solver that accelerates the solution of the normal equation for large-scale BA problems. The proposed approach generalizes the traditional preconditioned conjugate gradients algorithm by enlarging its search-space, leading to convergence in far fewer iterations. Experimental validation on a multitude of large-scale BA problems confirms a significant speedup in solving the normal equation of up to 61%, especially for dense Schur matrices where baseline techniques notoriously struggle. Moreover, detailed ablation studies demonstrate robustness to variations in the hyper-parameters and an increasing speedup as a function of problem density.