
1 Introduction

In this chapter range cameras and especially the data they provide are investigated. Range cameras can be considered to provide a 3D point cloud, i.e. a set of points in 3D, for each frame. The overall aim of this chapter is to describe techniques that will provide point clouds that:

  • Are “free” of systematic errors (definition to follow); and

  • Are registered together from multiple images or an image stream into one superior coordinate system.

In order to achieve this goal, the quality of range camera data has to be analyzed. The error sources need to be studied and grouped into random and systematic errors. Systematic errors are errors that can be reproduced under the same measurement conditions, e.g. a range measurement may be systematically affected by the brightness of the scene. Random errors are independent of each other and of the scene. Their influence can be reduced by averaging. Systematic errors can be reduced if additional aspects of the measurement instrument or the scene are exploited to extend the model describing the relation of the quantities of interest, i.e. the point coordinates, and the raw measurements. This calibration approach is called data-driven, and needs to be distinguished from those that aim to modify the camera physically.

As range cameras have a field of view on the order of 45° by 45°, one frame is often not sufficient to record an entire object. This holds true for building interiors, cultural heritage artefacts, statues, urban objects (city furniture, trees, etc.), and many other objects. Thus frames from different positions and with different angular attitudes need to be recorded in order to cover the whole object. This task is called registration and usually requires finding the orientation of the camera.

In the following sections, the basic models used in data acquisition and orientation are presented first. The orientation of 3D camera frames is the topic of the subsequent section. Error sources are then examined and split into random and systematic errors, and their mitigation is discussed. The last section presents the calibration of range camera data.

Range imaging is a field of technology that is developing rapidly. We concentrate on TOF cameras here, i.e., cameras that measure range directly and not by triangulation. Furthermore, we restrict ourselves to commercially available cameras that can reliably provide range information over the entire field of view.

2 Geometric Models

Photogrammetry is the scientific discipline concerned with the reconstruction of real-world objects from imagery of those objects. It is natural to extend the geometric modelling approaches developed and adopted by photogrammetrists for passive cameras to TOF range cameras. The well-accepted basis for that modelling is the pinhole camera model in which the compound lens is replaced (mathematically) by the point of intersection of the collected bundle of rays, the perspective centre (PC). The collinearity condition,

$$ {\mathbf{r}}_{\rm{i}} = {\mathbf{r}}_{\rm{j}}^{\rm{c}} + \lambda_{\rm{ij}} {\mathbf{R}}_{\rm{j}}^{\rm{T}} {\mathbf{p}}_{\rm{ij}} $$
(1)

that a point on the object of interest, \( {\mathbf{r}}_{\rm{i}} = \left( {\begin{array}{*{20}c} {\rm X} & {\rm Y} & {\rm Z} \\ \end{array} } \right)_{\rm{i}}^{\rm{T}} \), its homologous point in the positive image plane, \( {\mathbf{p}}_{\rm{ij}} = \left( {\begin{array}{*{20}c} {{\rm x}_{\rm{ij}} - {\rm x}_{{\rm p}_{\rm j}}} & {{\rm y}_{\rm{ij}} - {\rm y}_{{\rm p}_{\rm j}}} & { - {\rm c}_{\rm j}} \\ \end{array} } \right)^{\rm{T}} \), and the camera’s perspective centre, \( {\mathbf{r}}_{\rm{j}}^{\rm{c}} = \left( {\begin{array}{*{20}c} {{\rm X}^{\rm{c}}} & {{\rm Y}^{\rm{c}}} & {{\rm Z}^{\rm{c}}} \\ \end{array} } \right)_{\rm{j}}^{\rm{T}} \), lie on a straight line holds true if the incoming light rays are undistorted (Fig. 1).

Fig. 1 TOF range camera geometric model

Two basic sets of parameters are needed to model the central perspective imaging geometry. The first is the exterior orientation (extrinsic) parameter (EOP) set that models the camera pose, more specifically the position and angular orientation of the image space relative to object space. The EOP set thus comprises the three-element position vector of the PC, \( {\mathbf{r}}_{\rm{j}}^{\rm{c}} \), and three independent angular parameters, often Euler angles \( \left( {\upomega_{\rm{j}} ,\upphi_{\rm{j}} ,\upkappa_{\rm{j}} } \right) \). However parameterized, the angular elements are encapsulated in a 3 × 3 rotation matrix, e.g.

$$ {\mathbf{R}}_{\rm{j}} = {\mathbf{R}}_{3} \left( {\upkappa_{\rm{j}} } \right){\mathbf{R}}_{2} \left( {\upphi_{\rm{j}} } \right){\mathbf{R}}_{1} \left( {\upomega_{\rm{j}} } \right) $$
(2)

The determination of the EOPs for a TOF range camera is the subject of Sect. 3.
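As an illustration, the following minimal Python sketch builds the rotation matrix of Eq. 2 from the three Euler angles. The sign convention of the elementary rotations is only one common choice and is an assumption here; other conventions differ in the signs of the off-diagonal sines.

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """R_j = R3(kappa) R2(phi) R1(omega), Eq. 2; angles in radians.

    One common sign convention is assumed for the elementary rotations.
    """
    co, so = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    R1 = np.array([[1, 0, 0], [0, co, so], [0, -so, co]])   # rotation about the X axis
    R2 = np.array([[cp, 0, -sp], [0, 1, 0], [sp, 0, cp]])   # rotation about the Y axis
    R3 = np.array([[ck, sk, 0], [-sk, ck, 0], [0, 0, 1]])   # rotation about the Z axis
    return R3 @ R2 @ R1
```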

The second is the interior orientation (intrinsic) parameter (IOP) set that models the basic geometry inside the camera and comprises three elements. The first two are the co-ordinates of the principal point, \( {\rm{\left( {x_{{p_{j} }} ,y_{{p_{j} }} } \right)}} \), the point of intersection of the normal to the image plane passing through the perspective centre. The third is the principal distance, \( {\rm{c_{j}}} \), the orthogonal distance between the image plane and the PC, which is not necessarily equal to the focal length.

In passive-camera photogrammetry the unique scale factor \( \lambda_{\rm{{ij}}} \) is unknown and, as a result, 3D co-ordinates cannot be estimated from a single view without additional constraints. A TOF range camera allows extension of the collinearity model with the condition that the length of the line between the PC and an object point is equal to the measured range. Thus 3D object space co-ordinates can be uniquely determined from a single range camera view

$$ {\mathbf{r}}_{{\rm i}} = {\mathbf{r}}_{{\rm j}}^{{\rm c}} + \frac{{\uprho_{{\rm{ij}}} }}{{\left\| {{\mathbf{p}}_{{\rm{ij}}} } \right\|}}{\mathbf{R}}_{{\rm j}}^{\rm{T}} {\mathbf{p}}_{{\rm {ij}}} $$
(3)
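The direct computation of Eq. 3 is straightforward to implement. The following Python sketch (the function name is ours, not taken from any camera SDK) converts a single pixel observation and its measured range into object space coordinates, assuming calibrated interior orientation parameters and a known exterior orientation.

```python
import numpy as np

def point_from_range(x, y, rho, xp, yp, c, R, r_c):
    """Object point r_i from one pixel and its measured range (Eq. 3).

    x, y   : image co-ordinates of the pixel (same units as c)
    rho    : measured range from the perspective centre (PC) to the object point
    xp, yp : principal point co-ordinates; c : principal distance
    R      : 3x3 rotation matrix of the frame; r_c : PC position in object space
    """
    p = np.array([x - xp, y - yp, -c])          # image vector p_ij
    direction = R.T @ p / np.linalg.norm(p)     # unit ray direction in object space
    return np.asarray(r_c, dtype=float) + rho * direction
```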

Following the usual convention in photogrammetry that is well suited to Gauss-Markov model formulation and least-squares estimation techniques, the extended collinearity condition can be recast from the direct form of Eq. 3 into observation equations of image point location, \( \left( {{\rm{x_{ij}}} ,{\rm{y_{ij}}}} \right) \)

$$ {\rm{x_{ij}}} + \varepsilon_{{\rm{x_{ij}}}} = {\rm{x_{{p_{j}}} - c_{j} \frac{{U_{ij} }}{{W_{ij} }} }}+ \Updelta {\rm{x_{ij}}} $$
(4)
$$ {\rm{y_{ij}}} + \varepsilon_{\rm{{y_{ij}}}} = {\rm{y_{{p_{j} }} - c_{j} \frac{{V_{ij} }}{{W_{ij}}}}} + \Updelta {\rm{y_{ij}}} $$
(5)

where

$$ \left( {\begin{array}{*{20}c} {\rm U} & {\rm V} & {\rm W} \\ \end{array} } \right)_{\rm{ij}}^{\rm{T}} = {\mathbf{R}}_{\rm{j}} \left( {\mathbf{r}}_{\rm{i}} - {\mathbf{r}}_{\rm{j}}^{\rm{c}} \right) $$
(6)

and range, \( \uprho_{{\rm{ij}}} \)

$$ \uprho_{\rm{ij}} + \varepsilon_{\uprho_{\rm{ij}}} = \left\| {\mathbf{r}}_{\rm{i}} - {\mathbf{r}}_{\rm{j}}^{\rm{c}} \right\| + \Updelta \uprho_{\rm{ij}} $$
(7)

These equations have been augmented with systematic error terms \( \left( {\Updelta {\rm{x}}_{\rm{ij}} ,\Updelta {\rm{y}}_{\rm{ij}} ,\Updelta \uprho_{\rm{ij}} } \right) \) and random error terms \( \left( {\upvarepsilon_{\rm{{x_{ij} }}} ,\upvarepsilon_{\rm{{y_{ij} }}} ,\upvarepsilon_{\rm{{\uprho_{ij} }}} } \right) \) that account for imperfections in the imaging system, which are described in Sect. 4.

3 Orientation

Commonly several separate stations are required to entirely capture the geometry of a scene with a range camera. This might be necessary because of the limited field of view of the sensor as mentioned, its limited range, the extent of the object, self-occlusion of the object or occlusions caused by other objects. Before the data can be passed down the processing pipeline to successive steps, such as meshing and modelling, the alignment of the range measurements into a common reference frame has to be performed, often referred to as registration. Mathematically we seek the rigid body transformation defining the six parameters of translation and rotation which transforms points from the sensor coordinate system to the common or global coordinate system. As we assume the camera to be calibrated with the procedures described in Sect. 5, the scale parameter is known. Since the common coordinate system for a range camera is seldom a geodetic coordinate system, we do not specifically consider the issue of georeferencing.

There exist various approaches to solve the problem of orientation. The approaches differ in their prerequisites on the scene, prior knowledge, extra sensor measurements and the level of automation and robustness. The approaches can be categorized into (1) marker-based approaches, which require the placement of markers (e.g. planar targets, spheres, etc.) in the scene, either as control or tie points, (2) sensor-based approaches, which require additional sensors (e.g. IMU, external trackers, etc.) to be attached to the camera in order to directly determine the position and orientation of the sensor and (3) data-driven approaches, which use the geometry and other properties of the acquired data to determine the transformations between scans.

The orientation of range measurements acquired with a 3D camera shares many properties with orientation of other range data, e.g. acquired with terrestrial laser scanners. Therefore many of the known solutions [1] can be applied to the orientation of 3D cameras. One property that makes 3D cameras distinctly different is the higher frame rate which offers additional processing possibilities.

3.1 Marker-Based Orientation

Marker-based registration is expected to achieve the highest accuracy. It utilises artificial targets which must be inserted into the scene. Artificial targets are well known from photogrammetry and classical surveying. They represent either control points, which are linked to some reference coordinate system, or tie points. Since most TOF cameras provide both an intensity and a depth channel, both two-dimensional flat targets and three-dimensional shapes can be used as markers. The dominant shape for two-dimensional targets is white circles on a contrasting background, although checker-board patterns are also used. The dominant three-dimensional shape is the sphere. Using artificial targets has the advantage of enabling measurements on ‘cooperative’ surfaces, i.e. surfaces of chosen reflectance properties. This removes any measurement errors due to disadvantageous material properties. Due to the limited pixel count of many TOF cameras it can be a problem to provide markers in sufficient numbers and with sufficient size in the image domain. This problem occurs specifically in calibration. Reference [2] used infrared LEDs as markers, which are small in image space and yet deliver precise measurements. The software tools for extracting markers from the image data can be directly adopted from close-range photogrammetric software (for two-dimensional markers) or from terrestrial laser scanning (for three-dimensional markers).

However accurate, marker-based approaches require extra effort for the placement and measurement of the targets, and the placement of such targets may be prohibited for certain objects. For these reasons marker-less approaches are of high interest both from a practical and from an algorithmic point of view.

3.2 Sensor-Based Orientation

Well-known in aerial photogrammetry, sensor-based orientation involves the integration of additional sensors to measure the pose of the camera. Typically a GNSS sensor and an inertial measurement unit (IMU) are integrated for estimation of position and orientation parameters. However as most TOF cameras are used indoors we rarely find this integration. Rather we see sensor-assisted orientation, where an IMU is used to stabilize the estimation of the orientation parameters. This approach is common in robotics [3].

3.3 Data-Driven Orientation

Data-driven registration attempts to find the transformation parameters between the camera stations from the sensed point cloud and intensity data itself. Some approaches reduce the complexity of the dataset by using feature extraction. Since both intensity and range data can be used, several feature operators are available, most of them well known from purely image-based approaches. Reference [4] gives a comparison of some standard feature operators on TOF camera images and reports that SIFT provides the best results. Recent work in robotics has produced novel local feature descriptors which use the 3D information only, such as the Point Feature Histogram [5] and the Radius-based Surface Descriptor [6].
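As a sketch of such a feature-based step, the following Python fragment matches SIFT keypoints between the amplitude images of two camera stations using OpenCV; the ratio-test threshold is illustrative, and the matched pixels would subsequently be combined with the corresponding range observations to form 3D tie points.

```python
import cv2

def match_amplitude_images(img1, img2, ratio=0.75):
    """Match SIFT keypoints between two 8-bit TOF amplitude images (ratio is illustrative)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    # Lowe's ratio test rejects ambiguous matches
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    return kp1, kp2, good
```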

The family of algorithms that uses the full point cloud geometry is the Iterative Closest Point (ICP) algorithm. The ICP was originally proposed by Besl and McKay [7] for the registration of digitized data to an idealized geometric model, a typical task in quality inspection. The ICP can also be used on a sparse cloud of feature points. In its most basic form the ICP is restricted to pair-wise registration of fully overlapping datasets. Several extensions and improvements to the original algorithm have been proposed since its original publication. An overview is given by Eggert et al. [8] (we also recommend the extended version of the article, which is available online) and by Rusinkiewicz and Levoy [9]. It should be noted that the ICP has become the dominant registration algorithm for multi-station point cloud alignment.
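To make the principle concrete, the following Python sketch implements the most basic point-to-point ICP variant: nearest-neighbour correspondences, a closed-form rigid transformation via SVD, and iteration until the mean residual converges. It is a didactic sketch under these simplifying assumptions, not a replacement for the robust and accelerated variants discussed in [8, 9].

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (SVD-based)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def icp(source, target, iterations=30, tol=1e-6):
    """Pair-wise point-to-point ICP; source and target are Nx3 and Mx3 arrays."""
    tree = cKDTree(target)                          # nearest-neighbour search structure
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(iterations):
        dist, idx = tree.query(src)                 # closest-point correspondences
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t                         # apply the incremental transformation
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dist.mean()
        if abs(prev_err - err) < tol:               # stop when the mean residual converges
            break
        prev_err = err
    return R_total, t_total                         # maps source into the target frame
```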

4 Error Sources

Three types of errors can be distinguished and used to characterize the behaviour of instruments including TOF cameras. These are random errors, systematic errors, and gross errors.

The random errors are independent of each other (see also [10]). This applies, on the one hand, to the errors of all pixels within a frame, but on the other hand also to the errors in each pixel through the frames. Random errors in range cameras have their cause in shot noise and dark current noise. Repeating an experiment with a TOF camera will result in slightly different ranges being recorded, which are caused by the random errors. By averaging, their influence can be reduced. One way of averaging is to perform the (complex) averaging of frames if the exterior orientation as well as the scene is stable and the warm-up period of the camera has passed. In this case, averaging is performed in the time domain. Another way of averaging, performed in the spatial domain, is the modeling of the scene using geometric primitives, for example. This requires prior knowledge of the suitability and existence of these primitives within the scene. However, a group of measurements belonging to one primitive visible in one frame will have independent random errors. By applying an optimization technique, parameters of the primitives can be estimated such that the errors between measurement and scene model are minimized. This results in a scene model of better precision than the original measurements. By increasing the number of measurements, either spatially or temporally in the averaging process, the precision improves.
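A minimal Python sketch of temporal averaging in the complex domain is given below. It assumes a static camera and scene, a completed warm-up period, and per-pixel amplitude and range values from a stack of frames, with the range converted to a phase angle via the maximum unambiguous range U.

```python
import numpy as np

def average_frames(amplitudes, ranges, U):
    """Average a stack of TOF frames in the complex domain.

    amplitudes, ranges : arrays of shape (n_frames, rows, cols)
    U                  : maximum unambiguous range (half the modulation wavelength)
    Assumes a static camera and scene and a completed warm-up period.
    """
    phase = 2.0 * np.pi * ranges / U                # convert range to phase angle
    z = amplitudes * np.exp(1j * phase)             # complex phasor per pixel and frame
    z_mean = z.mean(axis=0)                         # temporal averaging reduces random errors
    mean_range = (np.angle(z_mean) % (2.0 * np.pi)) * U / (2.0 * np.pi)
    return np.abs(z_mean), mean_range
```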

However, averaging does not necessarily lead to more and more accurate values, especially because of the existence of systematic errors. Those errors may stay constant during the repetition of an experiment or they may vary slowly, e.g., because of the temperature of the chip. In the first instance, quantifying these errors is of interest, because it describes how much the measurement may deviate from the “true” value even if the random errors were eliminated. However, those errors can also be modeled, which is in fact an extension of the models described in Sect. 2. Among those errors are lens distortion and range errors caused by electronic cross talk. While the causes and suitable modeling strategies for these effects are known, reproducible errors for which neither the origin nor an appropriate modelling approach is known are also encountered. In this context, a distinction will be made between the physical parameters and the empirical parameters used to extend the basic models of Sect. 2.

Finally, gross errors, also called blunders, are defined as errors which do not fit to the measurement process at all. They have to be identified with suitable mathematical approaches, typically robust methods of parameter estimation, and eliminated. Those errors will not be further discussed here, but their detection and elimination is an area of ongoing research [11].

The systematic errors described in the following are camera internal errors, errors related to the operation of the camera, or errors related to the scene structure.

  • Lens distortions: camera internal errors modelled with physical parameters

  • Range finder offset, range periodic error and signal propagation delay: camera internal errors modelled with physical parameters

  • Range error due to position in the sensor and range errors related to the recorded amplitude: camera internal error, modelled empirically

  • Internal scattering: camera internal error modelled empirically and physically, respectively, by different authors

  • Fixed pattern noise: camera internal error, not modelled

  • Integration time errors: related to the operation of the camera, resulting in different sets of error models (e.g. for range) for different integration times

  • Camera warm up errors and temperature effects during measurement: related to the operation of the camera, quantified, not modelled

  • Range wrapping errors: related to the distances in the scene, modelled physically

  • Scene multi-path errors: related to the scene structure, quantified by some experiments.

Motion blur, caused by the movement of the TOF camera or by movements in the scene, is a further effect. If the range to a target within a pixel’s instantaneous field of view is not constant during the acquisition of a frame, then the recorded range corresponds to an average distance. Likewise, multiple objects at different distances may be within a pixel’s instantaneous field of view, which will lead to an averaged distance (the mixed pixels effect). It is not appropriate to term these effects errors of the measurement, because the measurement itself is performed as an integral over the entire pixel. However, the recorded range does not necessarily correspond to a distance from the sensor to a target which can actually be found in the scene. Therefore, these measurements should be treated as gross errors.

4.1 Lens Distortions

Radial lens distortion, or simply distortion, is one of the five Seidel aberrations and is due to non-linear variation in magnification. The systematic effect of radial lens distortion is isotropic and is zero at the principal point. Many TOF range cameras exhibit severe (i.e. tens of pixels at the edge of the image format) negative or barrel distortion. The mathematical model for radial lens distortion, ∆rrad, is an odd-powered polynomial as a function of radial distance, r, the Gaussian distortion profile

$$ \Updelta {\rm r}_{\rm rad} = {\rm k}_{1} {\rm r}^{3} + {\rm k}_{2} {\rm r}^{5} + {\rm k}_{3} {\rm r}^{7} $$
(8)

where k1, k2, k3 are the radial lens distortion model coefficients and

$$ {\rm r} = \sqrt {\left( {{\rm x} - {\rm x}_{\rm p} } \right)^{2} + \left( {{\rm y} - {\rm y}_{\rm p} } \right)^{2} } $$
(9)

The correction terms for the image point co-ordinates are easily derived from similar triangle relationships

$$ \Updelta {\rm x}_{\rm{rad}} = \left( {{\rm x} - {\rm x}_{\rm p} } \right)\left( {{\rm k}_{1} {\rm r}^{2} + {\rm k}_{2} {\rm r}^{4} + {\rm k}_{3} {\rm r}^{6} } \right) $$
(10)
$$ \Updelta {\rm y}_{\rm rad} = \left( {{\rm y} - {\rm y}_{\rm p} } \right)\left( {{\rm k}_{1} {\rm r}^{2} + {\rm k}_{2} {\rm r}^{4} + {\rm k}_{3} {\rm r}^{6} } \right) $$
(11)

Often only one or two coefficients are required to accurately model the distortion.

Decentring lens distortion arises due to imperfect assembly of a compound lens in which the centres of curvature of each lens element are not collinear due to lateral and/or angular offsets. It can be caused by inaccurate alignment of the sensor relative to the lens mount, i.e., the optical axis is not orthogonal to the detector array [12]. The effect is asymmetric having both radial and tangential components. Conrady’s model for decentring distortion, which is expressed in terms of the radial and tangential terms, can be recast into Cartesian components

$$ \Updelta {\rm x_{dec} }= {\rm p_{1}} \left( {{\rm r^{2}} + 2\left( {{\rm x} - {\rm x_{p} }} \right)^{2} } \right) + 2{\rm p}_{2} \left( {{\rm x - x}_{\rm {p}} } \right)\left( {{\rm y - y}_{\rm {p}} } \right) $$
(12)
$$ \Updelta {\rm y_{dec} }= {\rm p_{2}} \left( {{\rm r^{2}} + 2\left( {{\rm y} - {\rm y_{p} }} \right)^{2} } \right) + 2{\rm p}_{1} \left( {{\rm x - x}_{\rm {p}} } \right)\left( {{\rm y - y}_{\rm {p}} } \right) $$
(13)

The effect of decentring distortion is typically an order of magnitude lower than that of radial lens distortion.
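The correction terms of Eqs. 10–13 are simple to evaluate. The following Python sketch computes the radial and decentring corrections for given image coordinates; the coefficient values are camera-specific and are assumed to come from a calibration.

```python
import numpy as np

def distortion_corrections(x, y, xp, yp, k, p):
    """Radial (Eqs. 10, 11) and decentring (Eqs. 12, 13) corrections for image co-ordinates.

    k = (k1, k2, k3) and p = (p1, p2) are camera-specific calibration coefficients.
    """
    xb, yb = x - xp, y - yp
    r2 = xb**2 + yb**2                              # squared radial distance (Eq. 9)
    radial = k[0]*r2 + k[1]*r2**2 + k[2]*r2**3      # k1 r^2 + k2 r^4 + k3 r^6
    dx = xb*radial + p[0]*(r2 + 2*xb**2) + 2*p[1]*xb*yb
    dy = yb*radial + p[1]*(r2 + 2*yb**2) + 2*p[0]*xb*yb
    return dx, dy
```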

4.2 Scene-Independent Range Errors Modelled Physically

Three error sources in the range measurements are discussed in this subsection: the rangefinder offset (d0), periodic errors (d2 to d7) and signal propagation delay errors (e1 and e2). The common thread among these is that they are instrumental errors that are independent of the scene structure. This is in contrast to the scattering range error (Sect. 4.4) that is also an instrumental source but it is very strongly dependent on the nature of the imaged scene.

$$ \Updelta \uprho = {\rm d}_{0} + \sum\limits_{{\rm m} = 1}^{3} {\left[ {{\rm d}_{{\left( {2{\rm m}} \right)}} {\rm\sin} \left( {\frac{{2^{{\rm m}} \uppi }}{{\rm U}}\uprho } \right) + {\rm d}_{{\left( {2{\rm m} + 1} \right)}} {\rm\cos} \left( {\frac{{2^{{\rm m}} \uppi }}{{\rm U}}\uprho } \right)} \right]} + {\rm e}_{1} \left( {{\rm x} - {\rm x}_{{\rm p}} } \right) + {\rm e}_{2} \left( {{\rm y} - {\rm y}_{{\rm p}} } \right) $$
(14)

In other rangefinding technologies, such as tacheometric equipment, the offset parameter d0 models the offset of the range measurement origin from the instrument’s vertical axis. It can also be a lumped parameter that models internal signal propagation delays and its value may be temperature dependent [13]. In the context of a TOF range camera, the offset represents the difference between the range measurement origin and the PC of the pinhole camera model. The first approximation is that the rangefinder offset d0 is constant, but deviations from this may be modelled with a pixel-wise look-up table or a position-dependent “surface” model.

The periodic range errors are caused by odd-harmonic multiples of the fundamental frequency contaminating the modulating envelope, which results in a slightly square waveform. The physical cause of this is the non-ideal response of the illuminating LEDs [14]. The errors have wavelengths equal to fractions of the unit length, U (half the modulation wavelength). Pattinson [15] gives the mathematical explanation for the existence of the U/4-wavelength terms. The origins of the U/4- and U-wavelength errors that have been observed experimentally are not completely clear. Some (e.g. [16]) favour non-sinusoidal bases to model the periodic errors such as B-splines or algebraic polynomials (e.g., [17]).

The e1 and e2 terms are the signal propagation delay errors [18], also known as the clock-skew errors [19]. They are caused by the serial readout from the detector array. Their effect is a linearly-dependent range bias, a function of the column and row location, respectively.
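A sketch of the evaluation of Eq. 14 in Python is given below; the coefficients (d0, d2, …, d7, e1, e2) are assumed to have been estimated in a calibration of the individual camera, and U is the unit length.

```python
import numpy as np

def range_error(rho, x, y, xp, yp, d, e1, e2, U):
    """Scene-independent range error of Eq. 14.

    d      : (d0, d2, d3, d4, d5, d6, d7) rangefinder offset and periodic-error coefficients
    e1, e2 : signal propagation delay (clock-skew) coefficients
    U      : unit length (half the modulation wavelength)
    """
    delta = d[0]
    for m in (1, 2, 3):
        w = 2**m * np.pi / U                      # term with wavelength U / 2**(m - 1)
        delta += d[2*m - 1] * np.sin(w * rho) + d[2*m] * np.cos(w * rho)
    return delta + e1 * (x - xp) + e2 * (y - yp)
```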

4.3 Range Errors Depending on Position in the Focal Plane and Range Errors Related to the Recorded Amplitude

Systematic range errors depending on the recorded amplitude and on the position in the focal plane are reported in [20–23]. While a physical explanation has been put forward relating the recorded amplitude to a range error of an individual diode (rise and fall times of IR diodes, [21]), the assembly of many diodes for illumination of the scene prevents physical modeling.

The TOF cameras investigated with respect to the dependence of the range error on the position in the image plane feature a notable light fall-off from the centre to the image borders. As no physical cause is given for this error, the relation between the two error sources, amplitude and position in the image plane, remains unresolved.

The recorded amplitude a is a function of the distance from the sensor to the illuminated footprint on the object and of the object brightness itself. Furthermore, objects which are neither perfect retro-reflectors nor isotropic scatterers show a dependence of the reflected energy on the incidence angle. Thus, object brightness (at the wavelength of the diodes) and angle of incidence may appear to influence the range error, but primarily this is an influence on the backscattered energy.

Reference [20] reports range errors of 40 cm for the SR3000 at low amplitudes. Additional maximum errors of 25 cm depending on the position in the image plane are shown; however, these image-plane errors are strongly concentrated in the corners.

Since no physical basis exists, the models describing these systematic offsets need to be developed empirically. The functions are typically chosen to have as few parameters as possible in order to keep the model simple. On the other hand, the errors remaining after subtraction of the modelled systematic behaviour should only be random. In [20] a hyperbolic function with parameters h1, h2, h3 is chosen. This model fits the general observation that range errors are positive and large for low amplitudes, rapidly become smaller for larger amplitudes and do not change much for higher amplitudes. The equation relating the range error to the observed amplitude is:

$$ \Updelta \rho = - \frac{{{\rm{h}}_{2} }}{{{\rm{h}}_{3} }}{\rm{a}} + \sqrt {\frac{{{\rm{h}}_{2}^{2} }}{{{\rm{h}}_{3}^{2} }}a^{2} - 2\frac{{{\rm{h}}_{1} }}{{{\rm{h}}_{3} }}{\rm{a}} + \frac{1}{{{\rm{h}}_{3} }}} $$
(15)

It has to be noted that such a description should only be used for the individual range camera for which it was derived, as it is not based on a physical principle applicable to all range cameras. See Fig. 2 for an example.

Fig. 2 Systematic range errors as a function of the observed amplitude. The dashed line represents the differences between the observed and the reference distance. The solid line is the model that describes the systematic range error [20]
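A minimal Python sketch of the evaluation of the hyperbolic model of Eq. 15 is given below; the parameters h1, h2 and h3 are assumed to have been estimated for the individual camera.

```python
import numpy as np

def amplitude_range_error(a, h1, h2, h3):
    """Empirical amplitude-dependent range error, Eq. 15 (camera-specific parameters)."""
    return -(h2 / h3) * a + np.sqrt((h2**2 / h3**2) * a**2 - 2.0 * (h1 / h3) * a + 1.0 / h3)

# Illustrative use: correct an observed range for a given amplitude reading
# rho_corrected = rho_observed - amplitude_range_error(a, h1, h2, h3)
```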

4.4 Internal Scattering

The echo of the emitted optical signal is, in the geometric model of the camera, focused by the lens onto the individual pixels. In reality, however, the signal focused onto one pixel is partially scattered within the camera and distributed over the sensor area. This is caused by multiple reflections within the camera: between the lens, the optical filter, and the sensor. Therefore, the signal observed at an individual pixel is a mixture of the focused light, returned from the geometrically corresponding pixel footprint on the object, and the scattered light reflected at other pixels and thus corresponding to other parts of the object. This is illustrated in Fig. 3. Three targets at different distances, and therefore with different phase angles, are imaged by the range camera. Because target 1 is a strong reflector and close to the sensor, a notable portion of the focused light is scattered and influences the focused light backscattered from targets 2 and 3. Because target 3 in the given example features a low amplitude due to the larger distance, the scattered light from target 1 will influence its derived range more strongly than that of the “brighter” target 2. The impact on the observed amplitudes may be small, but phase angle measurements and derived distances are strongly affected in images with high amplitude and depth contrast. Such high contrasts are typical for systems with active illumination.

Fig. 3 Targets at different distances and having different amplitudes of the geometrically recorded signal. Left: the amplitude and phase angle are shown as a complex number. Right: scattering from target 1 to targets 2 and 3 is illustrated [24]

Scattering can be described by the convolution of a point spread function (PSF) with the “unscattered” image. This assumes that scattering in TOF cameras can be described as a linear phenomenon, which has been verified experimentally by different studies [25].
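Under this linearity assumption, scattering compensation amounts to a deconvolution of the complex image. The following Python sketch applies a simple Wiener-style inverse filter in the frequency domain; it assumes the scattering PSF has been determined beforehand (e.g. by calibration) and is given at full image size, and the regularization constant is illustrative.

```python
import numpy as np

def remove_scattering(amplitude, phase, psf, eps=1e-3):
    """Wiener-style deconvolution of the complex TOF image with a known scattering PSF.

    psf must have the same shape as the image and sum to one; eps regularizes
    frequencies where the PSF response is weak. Both are illustrative assumptions.
    """
    z = amplitude * np.exp(1j * phase)               # observed (scattered) complex image
    Z = np.fft.fft2(z)
    H = np.fft.fft2(np.fft.ifftshift(psf))           # transfer function of the scattering
    Z0 = Z * np.conj(H) / (np.abs(H)**2 + eps)       # regularized inverse filter
    z0 = np.fft.ifft2(Z0)
    return np.abs(z0), np.angle(z0)
```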

Reference [26] conducted experiments with an SR3000 and images with a high contrast in depth (0.73 to 1.46 m) and target reflectivity (retro-reflective foil). Scenes with and without the target in the foreground were subtracted in the complex domain. Maximum distortions in the distance were found to be 40 cm. Reference [27] performed experiments with an SR3000 and an SR4000 using two planar objects, with the foreground object covering half the scene. The signal scattered from the foreground object onto the pixels which (geometrically) image the background resulted in “errors” as large as the distance between the two objects. For the SR4000, scattering was more than one order of magnitude smaller. In contrast, [28] could not detect scattering effects in their experiments.

4.5 Fixed Pattern Noise

Each pixel can be considered to contribute an individual systematic range error. This is called fixed pattern noise. Piatti [29] attributes it to the “imperfect manufacturing process and different material properties in each sensor gate, which yields a pixel-individual fixed measurement offset”. In [30] it is argued that modeling this error for the PMD camera does not lead to an increase in precision.

4.6 Errors Dependent on Integration Time

In the focal plane of the TOF camera the backscatter of the emitted light is collected. In order to achieve a high signal-to-noise ratio (SNR) for a precise measurement at each pixel, the measurement period should be as long as possible. On the one hand, this can lead to saturation effects; on the other hand, moving objects in the scene or movement of the camera may motivate a short integration time in order to avoid motion blur. Thus, the integration time can typically be set by the user and is adjusted to the observed scene. The SR4000, e.g., allows setting the integration time approximately between 0.3 and 25 μs, whereas the PMD CamCube restricts it to approximately between 0.01 and 50 μs.

Different authors have reported that the systematic errors described above depend on the integration time selected by the user [20, 23]. The influence of the chosen integration time on the range errors is shown in Fig. 4.

Fig. 4 Observed distance vs. range error for different integration times of the SR3000, between 0.4 and 200 μs. The periodic range measurement errors (see Sect. 4.2) are clearly visible, as is the impact of the integration time on the phase of the periodic errors. The deviations from a harmonic originate from other error sources (recorded intensity), which also varied in the acquired data. The period of these range errors is 1.75 m, which is an eighth of the modulation wavelength of 15 m [20]

4.7 Camera Warm Up Errors

Controlled tests conducted by [23, 31] show that range measurements between a stationary camera and a stationary target exhibit a significant transient error response as a function of time after the camera is powered up due to strong temperature dependence. The magnitude of this drift in the older SR3000 model of the SwissRanger camera is reported to be on the order of several centimetres [23]. They also show that this effect can be reduced to the centimetre level by introducing an optical reference into the camera. The known internal path of light passing through an optical fibre allows correction of measured ranges for the temperature-caused drifts. The warm-up transient effect in the newer SR4000 is reported by [31] to be smaller, i.e. on the order of several millimetres, but takes tens of minutes to decay. They suggest that a warm-up period of 40 min be observed prior to camera use. An example of the warm-up effect is given in Fig. 5.

Fig. 5 Range error, ∆d, due to the camera warm-up effects in SR4000 data collected every 30 ms for 5 h. A diffusely-reflecting, planar target was imaged at normal incidence from a range of 2.1 m. The range of the 60 s moving-average trend is 2.6 mm. In this example the transient dies out after about 60 min but the long-term stability thereafter is very good

The temperature of the camera does not only vary after the camera is switched on; it also depends on the work load of the camera, as shown in Fig. 6 [15]. Three distinct phases are depicted, separated by vertical bars: a first phase of low frame rate, then a phase of high frame rate (up to 10 fps), and then again a phase of low frame rate. The distance measured to fixed targets varies by several centimetres depending on the frame rate (and consequently the temperature of the camera) at which the camera is driven.

Fig. 6 Range measurements to multiple targets at varying frame rate [15]

4.8 Ambiguity Interval

Ranges determined by the phase-difference method are inherently ambiguous since the phase measurements are restricted to [0, 2π). For a single-tone system, the maximum unambiguous range or ambiguity interval, U, is determined by the modulation frequency, fmod,

$$ {\rm U} = \frac{{\rm c}}{{2{\rm f}_{\rm mod} }} $$
(16)

For the SR3000 the nominal modulation frequency and maximum unambiguous range are 20 MHz and 7.5 m, respectively. The SR4000 features the ability to set the frequency to one of several pre-defined values; the default is 30 MHz, for which U is 5 m.

The range to a target beyond the ambiguity interval is mapped onto [0, U), resulting in a discontinuous wrapped range image or wrapped phase map (Fig. 7). Phase unwrapping must be performed to estimate the integer ambiguity (the number of whole cycles between the camera and target) at each pixel location, thereby removing the discontinuities. Jutzi [32] has demonstrated that it is possible to unwrap the phase of a range camera image with 2D phase-unwrapping algorithms. He suggests making use of the measurement-confidence value available in some cameras’ output to guide the unwrapping.

Fig. 7 Left: SR4000 amplitude image captured in a hallway. Right: corresponding wrapped range image showing two discontinuities
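The relation of Eq. 16 and the wrapping of ranges beyond U can be illustrated with a few lines of Python; the numerical example assumes the SR4000 default modulation frequency of 30 MHz.

```python
C = 299_792_458.0   # speed of light in m/s

def ambiguity_interval(f_mod):
    """Maximum unambiguous range U = c / (2 f_mod), Eq. 16; f_mod in Hz."""
    return C / (2.0 * f_mod)

def wrap_range(true_range, U):
    """Range recorded by a single-tone camera: the true range mapped onto [0, U)."""
    return true_range % U

U = ambiguity_interval(30e6)          # approx. 5.0 m for the SR4000 default frequency
print(U, wrap_range(7.2, U))          # a target at 7.2 m is recorded at roughly 2.2 m
```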

4.9 Multi-Path Effects in Object Space

Similar to the scattering of the signal inside the camera (Sect. 4.4), parts of the emitted signal may be scattered in object space by closer objects, which cast the signal further onto other objects and from there back to the corresponding pixel in the focal plane. Thus, the footprint of the pixel on the object is illuminated twice: once directly, and once via a longer object multi-path. Again, the strength of the multi-path signal relative to the direct illumination signal determines the size of the range error. The final range and amplitude are found by the complex addition of all direct and multi-path signals incident on each pixel. Descriptions of this phenomenon are found in [33, 34].

5 System Calibration

5.1 Purpose of Calibration

The geometric positioning models given by Eqs. 4, 5 and 7 are mathematical simplifications of the imaging process. In reality the many influencing variables described in Sect. 4 cause perturbations from the idealized geometric conditions, which if ignored can degrade the accuracy of a captured 3D point cloud. In calibration one seeks to estimate the coefficients of the models for the instrumental systematic errors. In doing so the influence of other error sources, such as the ambient atmospheric conditions and scene-dependent artefacts like multi-path and scattering, must be eliminated as best as possible to prevent biases. Currently the estimable parameter set includes the lens distortions, rangefinder offset, periodic range errors, clock skew errors as well as amplitude-dependent range errors.

Förstner [35] breaks down the camera calibration (and orientation) problem into four central tasks. The first is sensor modelling, treated in Sects. 2 and 4, in which an understanding of the physics of image formation, possibly guided by model identification procedures, is transformed into the mathematical models. The second is geometric network design where the goal is to maximize the accuracy of coefficient estimation through the judicious choice of a configuration of sensor and calibration primitive locations. This subject is touched upon briefly in this section; greater detail can be found in the cited references. The third and fourth tasks are error handling (i.e. procedures for outlier identification) and automation, respectively, neither of which is treated here. However, there is ongoing research to address these problems also in neighbouring disciplines [11].

5.2 Calibration Approaches

Two basic approaches to TOF range camera calibration can be considered. The first is laboratory calibration, the name of which implies that it is conducted in a controlled setting. Specialized facilities (e.g. a geodetic calibration track) are used to very accurately determine (usually) a subset of calibration model parameters. Multiple calibration procedures (e.g. range error calibration over a baseline and lens distortion calibration over a target field) are generally required for complete system characterization. The principal advantage of this approach is that the network design is tightly controlled by the specialized observation conditions, so no parameter correlation problems are explicitly encountered.

The rather restrictive requirement for special facilities has been a driving force behind the development of self-calibration methods. Though several variants (described below) exist for TOF range cameras, they share a common underlying premise: a strong network of geometric primitives (structured point targets and/or planar surfaces) is imaged and all model variables (the IOPs augmented with systematic error model terms, the EOPs and the object primitive parameters) are simultaneously estimated from the redundant set of observations according to some optimality criterion (e.g. the weighted least-squares principle). It is a very flexible approach in that all information sources (system observations and external constraints) can be incorporated and a holistic method in which both individual component errors and system assembly errors are included in the models. Absolute knowledge of the object space primitive parameters is not required, though it can be readily included in the solution. Since the facility requirements are minimal, self-calibration may be performed in a laboratory for pre- (or post-) calibration or at a job site if the stability of the camera’s interior geometry is in question.

5.3 Self-Calibration Methods

The available self-calibration methods are first categorized as being range-camera-only methods or joint-setup (with a passive camera) methods. Three variants of the former are first described: the two-step independent method; the two-step dependent method; and the one-step integrated method. Data acquisition for either category may comprise still images or video sequences, which allow for greater random error mitigation via image averaging and greater redundancy.

In the two-step independent method, the camera-lens and range-error calibrations are performed as separate processes using separate facilities. First, an established procedure is used for the camera-lens calibration from x and y observations of targets in a network of convergent images [36]. Convergent imaging is needed to lower functional dependencies between the EOPs and IOPs; see [37]. Then, a planar target is imaged at normal incidence to determine the range-error parameters. Kahlmann and Ingensand [23] use a small, planar target moved along an interferometric calibration track whereas [31] use an extended planar target positioned with parallel tape measures. The orientation can also be performed by space resection of the camera from independently-surveyed targets on the plane. Regardless of the orientation method used, reference ranges between each image’s PC and the target plane, described by the unit surface normal vector n and distance parameter d, are computed as follows

$$ \uprho^{\rm{ref}} = \rm{\frac{{d - {\mathbf{n}}^{T} {\mathbf{r}}_{j}^{c} }}{{{\mathbf{n}}^{T} {\mathbf{R}}_{j}^{T} {\mathbf{p}}_{ij} }}\left\| {{\mathbf{p}}_{ij} } \right\| }$$
(17)

The reference ranges are compared with the observed ranges to derive a dense set of range differences from which the range-error parameters are estimated by least-squares.
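A sketch of the reference-range computation of Eq. 17 in Python is given below; n and d describe the calibration plane in object space, and the interior and exterior orientation are assumed known from the first calibration step.

```python
import numpy as np

def reference_range(x, y, xp, yp, c, R, r_c, n, d):
    """Reference range from the PC to the plane n^T r = d along the ray of pixel (x, y), Eq. 17."""
    p = np.array([x - xp, y - yp, -c])                # image vector p_ij
    return (d - np.dot(n, r_c)) / np.dot(n, R.T @ p) * np.linalg.norm(p)
```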

A common facility is used for both calibration processes in the two-step dependent method. The camera-lens calibration is first performed with an established procedure using the x and y observations of targets on a planar surface observed in a network of both convergent and orthogonal images. The camera-plane orientation is established by the camera calibration so there is no need for an independent orientation means. The reference ranges can then be computed from the orthogonal camera stations to points on the plane [38, 39] or to the target centers [40] and used for the range-error calibration as described above.

The third approach is the one-step integrated method in which both sets of calibration parameters (camera-lens and range-error) are estimated simultaneously [27]. A planar target field is imaged from both convergent and orthogonal camera locations. To prevent scattering errors from biasing the solution, the range observations from the convergent stations are excluded from the bundle adjustment. In this approach the camera orientation is performed concurrently and there is no explicit computation of reference ranges.

In the joint-setup methods a high-resolution, pre-calibrated passive digital camera is used to aid the range camera calibration. The two cameras are rigidly mounted together in a frame such that there is a high degree of overlap of their fields-of-view and their relative orientation is invariant. Reference [16] presents a two-step (i.e. camera-lens calibration followed by range calibration) joint-setup method while [17] proposes an integrated joint-setup approach that incorporates both convergent and normal images in the network.

The principal advantage of the range-camera-only methods is that no auxiliary equipment is required to perform the self-calibration. Though in principle more rigorous, the one-step integrated approach suffers from high correlation between the rangefinder offset and the PC position brought about by the use of normal-incidence imaging of the planar target field. This correlation exists in the two-step methods as well, just not explicitly. However, [41] have demonstrated that the precision and accuracy differences between the three range-camera-only methods are of no practical significance. The advantage of the joint-setup approach is the decoupling of the IOPs and EOPs. Furthermore, it is a logical procedure to pursue if one wishes to colourize point clouds captured with the range camera using imagery from the passive digital camera.

5.4 Model Formulation and Solution

Regardless of the exact self-calibration procedure, the first step in the numerical model solution is formulation of the deterministic part of the linearized Gauss-Markov model:

$$ {\mathbf{Ax}} = {\mathbf{b}} + {\mathbf{v}} $$
(18)

where x denotes the vector of unknown parameters; b is the vector of observations; A is the Jacobian of the observations with respect to the parameters; and v is the vector of error estimates (residuals). If, for example, one considers the point-based, one-step self-calibration procedure, then this system can be partitioned row-wise according to observation group (x,y,ρ) and column-wise according to parameter group (e: EOPs; i: IOPs; o: object point co-ordinates)

$$ \left( \begin{array}{ccc} {\mathbf{A}}_{\rm{xe}} & {\mathbf{A}}_{\rm{xi}} & {\mathbf{A}}_{\rm{xo}}\\ {\mathbf{A}}_{\rm{ye}} & {\mathbf{A}}_{\rm{yi}} & {\mathbf{A}}_{\rm{yo}}\\ {\mathbf{A}}_{\uprho \rm{e}} & {\mathbf{A}}_{\uprho \rm{i}} & {\mathbf{A}}_{\uprho \rm{o}}\\ \end{array} \right)\left( \begin{array}{c} {\mathbf{x}}_{\rm{e}} \\ {\mathbf{x}}_{\rm{i}} \\ {\mathbf{x}}_{\rm{o}} \\ \end{array} \right) = \left( \begin{array}{c} {{\mathbf{b}}_{\rm{x}} } \\ {\mathbf{b}}_{\rm{y}} \\ {\mathbf{b}}_{\uprho } \\ \end{array} \right) + \left( \begin{array}{c} {\mathbf{v}}_{\rm{x}} \\ {\mathbf{v}}_{\rm{y}}\\ {\mathbf{v}}_{\uprho} \\ \end{array} \right) $$
(19)

The stochastic model is defined by

$$ {\rm E}\left\{ {\mathbf{v}} \right\} = {\mathbf{0}} $$
(20)

and

$$ {\rm E}\left\{ {{\mathbf{vv}}^{{\rm T}} } \right\} = {\mathbf{C}} $$
(21)

where C is symmetric, positive-definite and diagonal if uncorrelated observational errors are assumed.

The system of Eq. (19) must be subjected to a set of minimum object space datum constraints such as the inner constraints, represented by the design matrix G, imposed on object points only (e.g. [42]).

$$ {\mathbf{G}}_{o}^{T} {\mathbf{x}}_{o} = {\mathbf{0}} $$
(22)

The least-squares solution of this system of equations is performed iteratively in order to obtain optimal parameter estimates. Their covariance matrix quantifying parameter solution quality is obtained directly from the solution. Further details about the least-squares solution and quality measures can be found in Kuang [43], for example.
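The following Python sketch shows one iteration of the constrained least-squares solution for the linearized system of Eqs. 18–22; A, b and C are assumed to have been assembled from the current parameter approximations, and G is the inner-constraint matrix padded with zero columns for the EOP and IOP parameters.

```python
import numpy as np

def constrained_least_squares(A, b, C, G):
    """One Gauss-Markov iteration with datum constraints G^T x = 0 (Eqs. 18-22).

    A : Jacobian, b : observation vector, C : observation covariance matrix,
    G : constraint design matrix padded with zeros for the EOP/IOP columns.
    """
    W = np.linalg.inv(C)                    # weight matrix (diagonal if errors uncorrelated)
    N = A.T @ W @ A                         # normal-equation matrix
    u = A.T @ W @ b
    k = G.shape[1]
    # Bordered system enforcing the inner (minimum-datum) constraints via Lagrange multipliers
    M = np.block([[N, G], [G.T, np.zeros((k, k))]])
    rhs = np.concatenate([u, np.zeros(k)])
    sol = np.linalg.solve(M, rhs)
    x = sol[:A.shape[1]]                    # parameter corrections for this iteration
    v = A @ x - b                           # residual estimates (Eq. 18)
    return x, v
```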

6 Summary

This chapter summarized approaches to estimate the orientation of TOF cameras and model the imaging process. The geometric principles of the imaging process are based on the collinearity of the object point, the camera perspective centre, and the image point. This is augmented by the direct range observation performed by TOF cameras, providing the distance from the perspective centre to the object point.

However, systematic deviations from this model exist. Thus, range cameras need to be calibrated. Depending on the range camera used and the experimental design, errors may be larger than 10 cm. If possible, the physical cause of these systematic errors should be found and modelled, as was shown for the periodic range errors. In other cases, an empirical modelling approach must be chosen because the cause of a reproducible systematic error is not known or too complicated to model (e.g. the amplitude-related error). Other causes of error are range wrapping and scattering. Range wrapping can be corrected with weak assumptions on the scene. Scattering, however, needs to be treated differently: its removal by deconvolution techniques is the first step in processing range data, i.e. before orientation and calibration.

Orientation and calibration are not independent of each other. If the stability of a range camera does not allow determining calibration parameters once for a longer time period (e.g. one year), self-calibration by exploitation of project data is necessary. In such a case, the orientation and calibration are solved simultaneously.

With the on-going development of TOF camera technology, the set of systematic errors becomes smaller and smaller. Still calibration remains important because it is the appropriate means of quantifying both random and systematic errors.

On-going improvements in TOF camera resolution and in the noise level of the range and amplitude observations will also increase the accuracy of the estimated orientation.