
1 Introduction

Semantically rich, high-quality 3D building models can be widely used in urban planning, simulation of automated driving, localization, the metaverse, and building accessibility for people with disabilities. 3D building models are a cross-sectional technology for further applications, contributing to people and the environment by increasing flexibility, efficiency, sustainability and safety. For building such applications, a semantic description of the building's outer shell is necessary. According to Groeger et al., this corresponds to Level of Detail 3 (LOD3) (Groeger et al. 2008). A review of the real-world LOD3 reconstructions performed so far reveals that their number is small (see Related Work). In addition, companies offer LOD3 building reconstructions (Voxelgrid 2023). In the reconstruction process, the ambivalence of data acquisition and processing is the challenging issue regarding scalability, cost-benefit and automation level. Most approaches have used raw data from existing data acquisition systems. The acquisition systems are stationary, mobile or flying and mainly use LiDAR, photogrammetry, inertial measurement units (IMU), Global Navigation Satellite System (GNSS) and Real-Time Kinematics (RTK) as sensors. A gap has been identified in acquisition systems for use in city areas and detailed mapping. The problem is that most large-scale mobile mapping systems are mounted on cars. Thus, it is difficult to take detailed images while driving or to access restricted areas. Alternatively, drones, trolley- and backpack-based systems are used to capture details. However, flying in city centers is often only allowed with exceptional permits. Backpack systems with LiDARs and panoramic cameras are mainly designed for indoor use. They have a limited line of sight due to their low LiDAR position above ground. Based on these circumstances, this paper presents an alternative approach for data acquisition. In this approach, a height-adjustable, high-precision gimbal is mounted on a handcart. The system combines stationary and mobile data acquisition and relies primarily on photogrammetric reconstruction assisted by LiDAR, IMU and GNSS RTK data. The gimbal is equipped with two cameras with different lenses, resulting in a pan-tilt-zoom (PTZ) system for data acquisition.

1.1 Related Work

A. Gueren et al. propose different approaches for LOD3 modelling from multiple data sources (Gueren et al. 2020). The approach considers raw data acquisition by light detection and ranging (LiDAR) and structure from motion/multi-view stereo (SfM/MVS) photogrammetry. Furthermore, the paper describes the use of raw data from mobile ground and flying vehicles. Thus, the paper covers a wide variety of raw data possibilities.

Wysocki et al. (2022) provide an open-source semantic city model overview of real and artificial CityGML reconstructions at all levels of detail. According to this overview, three LOD3 city models are available: Espoo, London and Ingolstadt. For these three LOD3 city models, the processing and raw data acquisition methods are explained in detail below:

Espoo, Finland, was reconstructed in LOD3. The LOD3 data does not provide semantic information like windows and doors but rather adds texturing (City of Espoo 2023). The reconstruction and data acquisition processes are not described.

AccuCities Ltd. provides reconstructed city models of the UK in LOD3 quality (AccuCities Ltd 2023). They describe data acquisition by aerial imaging and a LOD3 reconstruction accuracy of 15 cm. The data acquisition is done by flying parallel lines above the city. Because only aerial images are used, the provided LOD3 models only consider details of the rooftop structure.

In the government-funded projects SAVe and SAVeNoW in Ingolstadt, Germany, the project partners reconstructed one street in LOD3 (Schwab and Wysocki 2021). The LOD3 model reconstruction is based on point clouds recorded by a mobile laser scanning (MLS) system from 3D Mapping Solutions GmbH. The MLS is driven through the area on roads and generates a point cloud with an accuracy of 1–3 cm for reconstruction. The LOD3 reconstruction shows that the ground-based view of MLS makes it hard to reconstruct rooftop structures, rain gutters and chimneys.

Voxelgrid GmbH offers LOD3 reconstructions for single buildings (Voxelgrid 2023). The data acquisition is done by drone flights with LiDAR, multispectral cameras and structure from motion (SfM) photogrammetry. The flying strategy is to record the building surfaces in parallel equidistant lines. The façade elements are distinguished by multispectral reflectance. Information regarding the accuracy is not provided. The data acquisition requires permits from authorities and local residents.

NavVis GmbH provides LOD3 reconstructions for single buildings (NavVis 2023). They developed a backpack with LiDAR and a panorama camera system for data acquisition. During the acquisition process, a person has to walk around the building. Their white paper provides a validation of their sensor setup based on a 20 m high façade. The system achieves an average accuracy of less than 12.2 mm.

All related data collection methods differ in their combination of sensors, mechanical integration and acquisition strategy. The proposed acquisition system combines several of the mentioned approaches and tries to compensate for their disadvantages.

1.2 Objectives

The general challenges of raw data acquisition for LOD3 reconstruction can be summarized as follows:

  1. The different architectures of the data acquisition systems in sensors, mechanics, and strategy limit their field of application. Thus, all real-world LOD3 data acquisition approaches are limited in scalability for large, semantically enriched LOD3 reconstructions.

  2. The counterpart of scalability is the cost-benefit ratio of the LOD3 reconstruction. The costs are largely caused by data collection and processing expenses. A solution is to improve the efficiency of the methodology.

  3. In most countries, permissions for data acquisition in public spaces are subject to challenging legislation and privacy regulations. These regulations make it difficult to collect data in the appropriate quantity and quality.

To overcome these challenges, we propose a novel human-based mapping approach. This approach is mainly based on photogrammetry and considers zoom for detailed pictures. Furthermore, the capturing system is mounted on a height-adjustable, high-precision 360\(^{\circ }\) gimbal on a ground-based handcart. Thus, the system combines stationary and mobile acquisition. The handcart design allows accessibility to cities and is regulation-compliant because its point of view is like that of a truck driver. The acquisition strategy considers LOD2 models and a camera-behavior model for accuracy. Additionally, the system can be used for testing future object-based, visual localization approaches.

2 Handcart Data Acquisition Approach

The starting point was research on localization for autonomous ground vehicles, from which the idea arose to perform human-like localization via visual photogrammetry on object level. The idea for the acquisition system came from the observation that only a few LOD3 models were available, all reconstructed from point clouds. The SAVe model, compared with real-world images, showed many differences. Hence, the models were not useful for photogrammetric object-based localization approaches, and the decision was made to create a photogrammetry-based mapping approach.

2.1 LOD3 Reconstruction Process Pipeline Overview

The LOD3 reconstruction process focuses on a photogrammetric SfM approach. Figure 1 shows the four steps: data acquisition, instance segmentation, 3D model reconstruction, and standardized output format.

Fig. 1: Photogrammetry SfM LOD3 reconstruction process

The data acquisition includes the recording of images with additional ground truth data for reconstruction validation. In the second step, the images are decomposed into façade elements via instance segmentation. The object classes of façade elements are inspired by the CityGML standard (Groeger et al. 2008). From the segmented façade elements, the contours are derived. Next, contours from different points of view are matched. Matched contour points are reconstructed three-dimensionally via SfM triangulation into a 3D model. A camera behavior model and the LOD2 data are used as support. Finally, the model is transformed into a compliant standard like CityGML. Due to the complexity of the presented process pipeline, this paper covers only the data acquisition.

2.2 Data Acquisition Strategy

The data acquisition strategy distinguishes between stationary and mobile acquisition. Figure 2 shows the stationary data acquisition from a bird's-eye view. The stationary acquisition strategy is further split into orthogonal XZ and YZ planar capturing: YZ for façade element positions on the surface and XZ for depth estimation. By overlapping XZ and YZ, the three-dimensional XYZ position accuracy should increase. Images for the YZ plane are therefore taken from the house opposite (I, II). Depth capturing of façade elements on the XZ plane is done with the view direction parallel to the façade plane (II, III). LOD2 models are used to calculate capturing positions, as sketched below. During capturing, the gimbal is moved to defined positions, and the 6DoF of gimbal and vehicle position (IMU, motor), date, time and GNSS RTK are logged. The gimbal is additionally equipped with a LiDAR rangefinder for tachymetric measurements. These points are used as ground truth data for photogrammetric reconstruction validation. The rangefinder can be controlled in closed loop by using camera and gimbal to position its laser point. By determining salient points on the façade, a relative 6DoF estimation via LiDAR can be done.
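A minimal sketch of how capture stations could be derived from a LOD2 footprint edge; the helper and its standoff and spacing parameters are illustrative assumptions, not the exact planning algorithm:

```python
import numpy as np

def capture_positions(p0, p1, standoff, spacing):
    """Return handcart stations on a line parallel to a LOD2 façade edge
    (p0 -> p1, ground-plane coordinates) at a given standoff distance.
    Hypothetical helper; a real planner would also check line of sight."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    edge = p1 - p0
    length = np.linalg.norm(edge)
    t = edge / length                    # unit vector along the façade
    n = np.array([-t[1], t[0]])          # façade normal (assumes CCW footprint)
    steps = np.arange(0.0, length + 1e-9, spacing)
    return np.array([p0 + s * t + standoff * n for s in steps])

# Example: stations for a 20 m façade, 10 m standoff, one station every 5 m
print(capture_positions((0.0, 0.0), (20.0, 0.0), standoff=10.0, spacing=5.0))
```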

Fig. 2: Static data acquisition strategy with orthogonal planar XZ and YZ approach

The mobile acquisition process is distinguished into low and fast speed. At walking speed, the system operates as in stationary acquisition. At faster speeds, the gimbal view direction is positioned to the front side of the handcart. The data acquisition is done iteratively by driving along the buildings. During each pass, the yaw and pitch angles of the gimbal are systematically changed. Ground truth data acquisition like rangefinder measurements is not considered further.

2.3 System Design and Building

This subsection describes only the opto-electromechanical system design and setup. System accuracy and precision are covered in the results, Sect. 3. Figure 3 shows the handcart functionality of axes and sensors, and Fig. 4 shows the prototype. The following explanation of design and build is correlated by numbers with Figs. 3 and 4. The handcart is built from off-the-shelf components for a cost-efficient setup. Its skeleton is built of 20 and 40 mm aluminum profiles and 3D-printed parts (1). The vehicle runs on four wheels and is pulled via a handlebar. This design enables its application in rough terrain, on the road and on footpaths. The elevator (2) on top is modeled on large real elevators with a counterweight. The elevator can move in a range of 1.2 m, starting 1 m above the ground. The moving part of the elevator is equipped with a linear axis for precision. On the elevator itself, a 3D-printed 360\(^{\circ }\) gimbal is mounted (3). The gimbal is limited by wiring to a range of 340\(^{\circ }\) in yaw and 100\(^{\circ }\) in pitch. This construction yields a five-DoF acquisition system (roll is missing). All movable axes are driven by 1.8\(^{\circ }\) stepper motors. Every axis is geared down by a fixed ratio. Each motor is controlled via a separate CAN gateway with a TMC2130 stepper driver (TRINAMIC Motion Control GmbH & Co. KG 2023) (4). Movable parts are homed by one-sided end switches to a reference position. The pitch axis is equipped with two Sony IMX477 (Sony 2018) Raspberry Pi cameras, a Bosch BNO055 (Bosch Sensortec 2021) IMU, a green laser diode and a Leica Disto D510 (Leica Geosystems 2023) rangefinder (5). Beside it, the system controller, an NVIDIA Jetson Xavier NX, is mounted (6). The camera heads are connected to the Jetson via CSI-2 cables and the rangefinder via Bluetooth. The GNSS RTK u-blox ZED-F9P (u-blox 2023) module is mounted on top of the elevator pole (7) and connected via USB. All sensors, DIOs, IMU and drivers are connected to CAN-operated nodes. The nodes are bound via a CAN gateway to the Jetson. The system is powered by a 120 Ah 12 V car battery. Material costs were around 4000 €.

Fig. 3: Functional representation of the handcart including sensors and DoF

Fig. 4: Handcart prototype

2.4 3D Sensor Pose Reconstruction Methods

The 3D reconstruction process consists of two stages: first, the absolute or relative 6DoF positioning of the vehicle cameras; second, the reconstruction of the three-dimensional key point positions of façade elements. Each stage can be performed with different component combinations. The 6DoF poses are described by a Cartesian vector \(\overrightarrow{v}\) and Euler angles.

  1. Absolute position determination with GNSS, IMU and gimbal data. Due to the lack of IMU yaw accuracy, the yaw was reconstructed by camera using pole objects. The individual absolute camera focal point \(P_{focal}\) is calculated from the averaged GNSS position \(P_{pos,GNSS,avg}\) and the offset vector \(\overrightarrow{v}_{gimbal}\). The offset vector depends on motor positions and offsets and is anchored to absolute coordinates by the rotation matrix R.

    \(P_{pos,GNSS,avg} = \frac{1}{n}\sum _{i=1}^{n} P_{i,GNSS} \;\big |\; P_{i,GNSS} \in \{\text {fixed RTK}\}\)

    \( P_{focal} = P_{pos,GNSS,avg} + R(\varDelta \alpha _{imu}, \varDelta \beta _{imu}, \varDelta \gamma _{cam}) \, \overrightarrow{v}_{gimbal}(z_{m1}, \gamma _{m2}, \alpha _{m3}) \)

    The pose is calculated from the gimbal positions and their IMU \(\varDelta \) angles to absolute coordinates, in addition to the basic offsets:

    \( \left( \begin{array}{r} \alpha _{abs} \\ \beta _{abs} \\ \gamma _{abs}\end{array}\right) = \left( \begin{array}{r} \alpha _{m3} \\ 0 \\ \gamma _{m2}\end{array}\right) + \left( \begin{array}{r} \varDelta \alpha _{imu}(\alpha _{m3},\gamma _{m2}) \\ \varDelta \beta _{imu}(\alpha _{m3},\gamma _{m2}) \\ \varDelta \gamma _{cam}(\alpha _{m3},\gamma _{m2})\end{array}\right) + \left( \begin{array}{r} \alpha _{offset} \\ \beta _{offset} \\ \gamma _{offset}\end{array}\right) \)

    Afterwards, the vector \(\overrightarrow{v}_{focal,P_1P_2}\) is calculated for SfM reconstruction:

    \( \overrightarrow{v}_{focal,P_1P_2} = {P}_{2,focal} - {P}_{1,focal}\)

  2. Relative positioning and selective point measurements with the LiDAR rangefinder for ground truth data acquisition. For 6DoF pose estimation, a minimum set of three spatio-temporally constant points in the environment is necessary. An individual environment point \({P}_{env}\) is described by the gimbal spherical coordinates \(\gamma _{m2}, \alpha _{m3}\) and the laser-measured distance d. The relative LiDAR position \({P}_{LiDAR}\) is triangulated from the environmental points \(\{P_{env,i}\}_{i=1}^{n}\). From the environmental points, the roll angle \(\beta _{env}\) is derived. The camera focal point \({P}_{focal}\) is calculated from \({P}_{LiDAR}\) and the pose-corrected offset vector \(\overrightarrow{v}_{cam,offset}\).

    \( \left( \begin{array}{r} \alpha _{abs} \\ \beta _{abs} \\ \gamma _{abs}\end{array}\right) = \left( \begin{array}{r} \alpha _{m3} \\ \beta _{env} \\ \gamma _{m2}\end{array}\right) \)

    \({P}_{focal} = \overrightarrow{v}_{cam,offset}(\gamma _{m2}, \alpha _{m3}, \beta _{env}) + {P}_{LiDAR} \)

  3. Relative 6DoF pose estimation by camera is done in four steps. First, a set of corresponding key points is extracted from two images. Then the fundamental matrix is calculated from the key points with the 8-point algorithm. The fundamental matrix is upgraded to the essential matrix using the camera calibration matrix. From this, the pose is recovered by solving the perspective-n-point problem. Afterwards, the translation vector \(\overrightarrow{t}\) is scaled by the length of \( \overrightarrow{v}_{focal,P_1P_2}\); see the sketch after this list.
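The following is a minimal sketch of these four steps using OpenCV; the function calls are standard OpenCV API, but the exact matching and scaling details of the implementation are not specified in the text, so this is an illustrative reconstruction:

```python
import cv2
import numpy as np

def relative_pose(pts1, pts2, K, baseline_length):
    """pts1, pts2: Nx2 arrays of matched key points; K: intrinsic matrix A
    from the chessboard calibration; baseline_length: |v_focal,P1P2| from GNSS."""
    # Step 2: fundamental matrix via the 8-point algorithm
    F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)
    # Step 3: upgrade to the essential matrix with the camera matrix
    E = K.T @ F @ K
    # Step 4: recover rotation R and unit-length translation t from E
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    # Scale the translation with the GNSS-derived baseline length
    return R, t * baseline_length
```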

3 Experiments and Results

This section contains the selected components, their configuration and their theoretical accuracy. The second subsection contains the calibration methods and the determined real-world accuracies of the components. The third to fifth subsections contain the real-world test environment, the test description and the 3D reconstruction results of different system combinations. The last subsection covers the image annotation process and results.

3.1 Sensor and Motor Driver Setup—Theoretical System Accuracy

The cameras are set up for mono camera data acquisition with zoom, so LOD3 reconstruction is done from different positions via photogrammetric SfM. One camera is equipped with a 16 mm telephoto lens (10 MP) and the left one with a 6 mm wide-angle lens (3 MP). The image sensor pixel size is 1.55 \(\upmu \)m (Sony 2018). A pixel of the sensor plane can be projected via the pinhole model onto a parallel object plane (Sturm 2016). The 16 mm lens has a resolution of 96.88 \(\frac{\upmu \textrm{m}}{\textrm{px}}\) per meter of distance from the focal point to a parallel object plane. Under this ideal assumption, one pixel diagonal covers one cm on the object plane at a distance of 72.99 m.
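Both figures follow directly from the pinhole model; a short check with the values from the text:

```python
import math

PIXEL_SIZE = 1.55e-6   # m, Sony IMX477 pixel pitch
FOCAL = 16e-3          # m, telephoto lens

# Object-plane footprint of one pixel per meter of distance
res_per_m = PIXEL_SIZE / FOCAL
print(f"{res_per_m * 1e6:.2f} um/px per m")    # -> 96.88

# Distance at which one pixel diagonal covers 1 cm on the object plane
d_limit = 0.01 * FOCAL / (PIXEL_SIZE * math.sqrt(2))
print(f"{d_limit:.2f} m")                      # -> 72.99
```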

The motors are driven with 256 microsteps. The theoretical accuracies were calculated by multiplying the reductions, lengths per step and step widths. The elevator achieves an accuracy of 7.752 \(\upmu \)m. The yaw axis achieves 0.253 \({}^{\prime \prime }\) and the pitch axis 2.195 \({}^{\prime \prime }\).
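As a sketch of this calculation: a 1.8\(^{\circ }\) full step divided into 256 microsteps gives 25.3125\({}^{\prime \prime }\) at the motor shaft, which the gearing divides further. The reduction ratios below are back-calculated from the stated accuracies and are not given in the text:

```python
FULL_STEP_DEG = 1.8
MICROSTEPS = 256
motor_res_arcsec = FULL_STEP_DEG / MICROSTEPS * 3600   # 25.3125" per microstep

def axis_resolution(gear_reduction):
    """Theoretical angular resolution per microstep after the gearbox."""
    return motor_res_arcsec / gear_reduction

print(axis_resolution(100.0))    # ~0.253" -> matches the stated yaw figure
print(axis_resolution(11.53))    # ~2.195" -> matches the stated pitch figure
```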

The Bosch BNO055 IMU (Bosch Sensortec 2021) integrates a triaxial 16 bit gyroscope, a 14 bit accelerometer and a geomagnetic sensor. Thus, with 14 to 16 bit resolution, the IMU theoretically achieves an accuracy of 79.102 \({}^{\prime \prime }\) to 19.775 \({}^{\prime \prime }\). The magnetometer tolerance is max ±2.5\(^{\circ }\), the accelerometer sensitivity tolerance max ±4.0% and the gyroscope sensitivity tolerance ±3.0%. It is operated in nine-degrees-of-freedom fusion mode for absolute Euler angle orientation.
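The stated resolutions follow from spreading 360\(^{\circ }\) over the register width:

\( \frac{360^{\circ }}{2^{14}} \approx 0.02197^{\circ } \approx 79.102{}^{\prime \prime } \qquad \frac{360^{\circ }}{2^{16}} \approx 0.00549^{\circ } \approx 19.775{}^{\prime \prime } \)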

The GNSS RTK system consists of a u-blox ZED-F9P (u-blox 2023) rover and a NetwakeVision RoyalBase (2023). Both systems use the Radio Technical Commission for Maritime Services (RTCM) 3.3 protocol for RTK correction. In fixed RTK mode, an accuracy of 14 mm is achieved.

3.2 Calibration Tests and Real Accuracy Determination

The handcart combines electromechanical and optical components for data acquisition. Thus, there are four separate calibration processes, which together result in the overall system accuracy.

The height-adjustable gimbal was calibrated by repeatability tests. These repeatability tests were executed stationary and individually for each axis. The test procedure was to move 100 times from each approach direction to the same position. For measurement, an image of the green laser dot was taken on orthogonal millimeter paper 2 m away. Figure 5 shows an example of reaching a position. Afterward, the standard deviations were calculated based on the images. Table 1 contains the gimbal accuracy results. The results are distinguished per axis into negative and positive approach and the backlash between approach directions. Negative and positive approach accuracies are close together and much lower than the backlash. Consequently, the strategy to pose the gimbal with high precision is to approach positions from one direction only, as sketched after Table 1.

Fig. 5: Picture of millimeter paper for accuracy determination

Table 1 Gimbal accuracy results of repeatability test
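A minimal sketch of this one-way approach strategy, assuming a hypothetical axis interface (`position` and `move_to` are illustrative names, not the real motor API):

```python
def move_one_way(axis, target, overshoot=1.0):
    """Always make the final approach from the positive direction so the
    gear backlash is taken up on the same flank for every position."""
    if axis.position() < target + overshoot:
        axis.move_to(target + overshoot)  # drive past the target first
    axis.move_to(target)                  # final move is always negative

class Axis:
    """Tiny stand-in for a stepper axis, for demonstration only."""
    def __init__(self): self._pos = 0.0
    def position(self): return self._pos
    def move_to(self, pos): self._pos = pos

axis = Axis()
move_one_way(axis, target=90.0)   # reached from above regardless of start
```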

The component XYZ offsets were measured with a meter rule to millimeter accuracy.

The BNO055 IMU is used during the whole capturing process for absolute Euler angle determination. Due to its tolerance, an averaging filter is applied to achieve higher accuracy. For determining the IMU roll and pitch accuracy, the handcart was tilted in roll or pitch using an underlay. The real plumb vector was measured with a 1.2 m plumb line and a meter rule. During testing, the gimbal was tilted in yaw and pitch with negative approach to 100 positions. Afterward, for each test the IMU plumb vector was calculated with absolute gimbal positions. The calculated and measured plumb vectors differ by less than 1 mm. Accordingly, the accuracy is \(\le \)2.865 \({}^{\prime }\). The yaw was tested by pulling the cart around in 360\(^{\circ }\) circles to the same position or by tilting the gimbal yaw. Pulling around resulted in a difference of 33.125\(^{\circ }\). This may be caused by vibrations of the undamped cart construction. However, stationary relative positioning with gimbal yaw tilting achieved an accuracy of 0.688\(^{\circ }\).
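The averaging filter can be as simple as a mean over repeated readings; a sketch, assuming the samples stay far from the ±180\(^{\circ }\) wrap so no angle unwrapping is needed:

```python
import numpy as np

def averaged_euler(samples):
    """Mean over repeated BNO055 Euler readings (deg, shape Nx3: yaw,
    pitch, roll) to suppress sensor noise and tolerance effects."""
    return np.asarray(samples, float).mean(axis=0)

readings = [[120.1, 4.9, 0.2], [119.8, 5.1, 0.1], [120.0, 5.0, 0.3]]
print(averaged_euler(readings))   # -> [119.9667  5.  0.2]
```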

The camera with the 10 MP 16 mm lens is focused to a depth of 15 m and the one with the 6 mm lens to a depth of 7 m. Afterward, the intrinsic calibration matrix A is determined via a chessboard using OpenCV (2011). The calibrated focal length was 16.227 mm, so the resolution per pixel increased.
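A minimal OpenCV chessboard calibration sketch; the board size and image paths are placeholders, not the actual setup used here:

```python
import glob
import cv2
import numpy as np

CORNERS = (9, 6)  # inner chessboard corners (placeholder)
objp = np.zeros((CORNERS[0] * CORNERS[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CORNERS[0], 0:CORNERS[1]].T.reshape(-1, 2)

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    ok, corners = cv2.findChessboardCorners(gray, CORNERS)
    if ok:
        obj_pts.append(objp)
        img_pts.append(corners)
        size = gray.shape[::-1]

# A: intrinsic matrix (focal lengths in px, principal point); dist: distortion
ret, A, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
print(A)
```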

Leica certifies the Disto D510 according to ISO 16331-1 with an accuracy of ±1.0 mm under favorable and ±2.0 mm under unfavorable conditions.

Table 2 contains the theoretical and practical component accuracy results. In comparison, except for the camera, the practical accuracies differ from the theoretical ones.

Table 2 Comparison of theoretical and practical component accuracies

3.3 Outdoor Test Environment

Outdoor testing should challenge camera and LiDAR. Dark, matte surfaces absorb light, and uniform colors make instance segmentation and edge detection more difficult. Furthermore, LiDAR is challenged by light energy absorption. Glass stresses LiDAR and camera with light scattering effects. Building depth offsets, like overhangs, generate shadows and disturb the line of sight. This makes edge detection and segmentation tasks difficult. Irregular positions of shutter systems make façade element interpretation complicated. Pitched roofs require greater distances for ground-based data acquisition. The RITZ building at campus Fallenbrunnen in Friedrichshafen, Germany, fulfills most of these challenges, except for pitched roofs and dormers; the surrounding buildings have pitched roofs and dormers. Furthermore, ground truth data is available as CAD, LOD2 and total station measurements.

3.4 Outdoor Test Description

The system was tested in stationary mode at the east side of the RITZ building. The handcart was positioned at three points according to the stationary acquisition strategy. The test addressed the accuracy of the system as well as the feasibility of façade element segmentation on the captured images. Sunny to partly cloudy weather conditions were chosen for uniform illumination. Additionally, the building was tagged with markers for reproducible accuracy testing.

The acquisition process consists of data acquisition at the three positions. For validation, the handcart's 6DoF positions were referenced via total station. During data acquisition, pitch and yaw were stepped equidistantly in 5\(^{\circ }\) increments. Thus, data of 352 poses was taken at a stationary position.

3.5 3D Sensor Pose Reconstruction Results

The maximum deviations of the three 6DoF pose estimation methods are shown in Table 3. The maximum deviations were referenced to total station measurements. Due to the different positions of camera and LiDAR, only the GNSS and camera Euler angles could be compared directly. The table shows that the LiDAR pose estimation method had the best accuracy. The camera pose estimation method showed the worst results in rotation angle reconstruction.

Table 3 Maximum deviations results of pose estimation methods

3.6 3D Façade Reconstruction Results

Façade point 3D reconstruction is similar to the sensor pose estimation. The façade points can be measured by the LiDAR method or reconstructed by camera ray triangulation. Ray triangulation is based on the previous 6DoF sensor pose estimation, so the sensor pose Euler angles affect the 3D reconstruction. After testing, the best results were achieved with the GNSS sensor pose estimation method. Point matching in the images was done manually to ensure exact matches. Table 4 shows the object reconstruction results. The first and second rows contain baseline orthogonal point reconstruction on the XZ and YZ planes of the building. The third and fourth rows show results for the building dimensions. The first column shows the reconstruction aim, followed by the position pairs and the base distance \(d_{stereo,GNSS}\). Additionally, the minimum \(d_{P,min}\) and maximum \(d_{P,max}\) point distances from the positions are listed. Point reconstruction was done as proposed in the acquisition strategy. From positions I and II, the parallel points on the YZ plane had an accuracy of 2 mm in the plane and 3 mm in depth. The results of the orthogonal XZ plane (II, III) for depth estimation match the YZ plane. For building length estimation, combinations using all positions were made. The RITZ reference east side length is 54.9867 m; the reconstructed length was 55.051 m. The error of 64 mm is caused mainly by the calculation of the opposite south-east edge point from positions I and III. The visible rooftop edge of the RITZ building is 10.802 m; the reconstructed length was 10.799 m.
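The camera ray triangulation used here can be sketched with OpenCV as follows; the pose inputs stand for the GNSS-derived poses of Sect. 2.4, and the helper name is illustrative:

```python
import cv2
import numpy as np

def triangulate_points(K, R1, t1, R2, t2, pts1, pts2):
    """Triangulate manually matched façade key points (Nx2 pixel
    coordinates) from two calibrated camera poses."""
    P1 = K @ np.hstack([R1, t1.reshape(3, 1)])   # 3x4 projection, pose 1
    P2 = K @ np.hstack([R2, t2.reshape(3, 1)])   # 3x4 projection, pose 2
    X = cv2.triangulatePoints(P1, P2, pts1.T.astype(float), pts2.T.astype(float))
    return (X[:3] / X[3]).T                      # Nx3 euclidean points
```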

Table 4 Maximum deviations of three-dimensional object point reconstruction based on photogrammetry

3.7 Results of Manual Image Annotation

Image annotation, as a precursor of machine learning, is the process of labeling and segmenting objects. In segmentation, the contour is determined by human vision and perception. Following a scientific approach, a set of possible contour and edge detection problems was proposed: image quality, blur, illumination, contrast, line of sight, light scattering and contextual reference. An example subset of affected images is shown in Fig. 6. All images were taken during sunset. In Fig. 6a, illumination and blur make detecting the round sealing of the upper corner hard. In Fig. 6b, increasing distance shows the effect of edge vanishing on vertical façade elements. Figure 6c is an example of exceeding the dynamic range of the camera, resulting in overexposure. Figure 6d shows light scattering effects in the transition from mirroring to looking through glass. This leads to challenges with object contextual reference. In summary, all these challenges can be examined with different filters, edge detection or pattern recognition. However, the solution for most of these issues is images from other positions and poses. Additionally, environmental conditions should be considered in the acquisition strategy.

Fig. 6: Examples of annotation challenges

4 Discussion

The achieved system results of 3D building façade reconstruction from the previous section are discussed in relation to competing system architectures. The discussion is divided into 3D reconstruction and image annotation.

The handcart has a redundant system architecture for 3D reconstruction. The LiDAR subsystem can be used for relative positioning and measurements in closed-loop mode with camera and gimbal. The camera pose can be precisely controlled with the gimbal for taking detailed pictures. Absolute 6DoF estimation via GNSS RTK and IMU needs to be supported by visual odometry. Good absolute position results were only achieved in fixed RTK mode. This makes the system dependent on the GNSS signal and RTK communication. Far depth reconstruction with stereo vision over 40 m with a small baseline is not useful, but this can be solved by additional handcart positions or LiDAR measurements. The handcart architecture makes reconstructions in the millimeter to low centimeter range possible.

Based on the practically validated reconstruction, the system is compared to competing systems; for this comparison, Table 5 was created. Representatives of competing systems were selected from popular manufacturers. Due to the comparable sensor setup of camera and LiDAR, hand-held and trolley-based systems can be represented by a backpack system (Pantoja-Rosero et al. 2022; Nüchter et al. 2015; Blaser et al. 2020). The compared backpack is the evolution of a former trolley-based system (NavVis 2021).

Table 5 Comparison of competing data acquisition systems

Data acquisition is distinguished into stationary and mobile. The backpack and the vehicle data acquisition are based on movement. Total station and handcart have a 360\(^{\circ }\) FoV for stationary acquisition. Furthermore, the handcart is designed for both modes.

All systems use different localization methods but are in the same accuracy range, except the total station with autonomous precise point positioning (PPP) GNSS. The mobile mapping vehicle (MMV) and the handcart need an active connection to a reference station. Simultaneous localization and mapping (SLAM) of the backpack is a relative method, so the system must be initially referenced to absolute position points.

All systems use cameras and LiDAR. The total station uses a deflection unit and the handcart a gimbal for a 360\(^{\circ }\) field of view (FoV), so they follow the same approach. The backpack and MMV have fixed mounted vision systems with a defined FoV that is varied by moving. Thus, the accuracies of mobile and stationary systems differ. The handcart and total station are nearly in the same LiDAR accuracy range. Backpack and MMV accuracy is lower because of their rotating multi-line LiDAR.

The handcart is constructed for selective point reconstruction of façade elements by camera. The 16 mm lens is capable of keeping up with the handcart LiDAR accuracy due to its angular resolution. The handcart has camera closed-loop LiDAR measurement. A camera comparison to the other systems is not useful because the necessary parameters are not available.

Data acquisition times are only available for the total station and the handcart. Here, the time of the handcart is significantly lower than that of the total station. This is caused by comparing a 12 MP camera to a \(1.1 \frac{\text{Mpts}}{\text{s}}\) sampling rate. However, the handcart needs data from various positions for reconstruction.

The handcart's camera-based approach needs good weather and illumination conditions. The 12 bit high dynamic range (HDR) camera has problems with direct views into sunlight and dark shadows. This is a disadvantage compared to the other systems, which use camera images only for point cloud coloring. The LiDAR laser beams can also be disturbed by rain and snowflakes, so all systems are partially illumination- and weather-dependent. The systems' acquisition area restrictions show that the MMV can only be driven on roads. Total station and backpack have no restrictions, but their use in road traffic is difficult. Stairs and steep slopes are restricted areas for the handcart. With this, the handcart can operate in many urban areas, on sidewalks and, as a bicycle trailer, in road traffic.

All point cloud accuracies listed in the comparison refer to different environmental data acquisitions of buildings. Thus, the data are in the same context but cannot be compared directly. Handcart and total station point cloud deviations are close together, but the handcart generates the points by photogrammetry and all other systems by LiDAR. MMV and backpack have a significantly higher deviation.

The handcart and the MMV have segmentation approaches. The handcart segmentation approach is based on images, the MMV's on LiDAR. Based on this, the handcart is not designed for high-density point cloud generation. It is designed to reconstruct the necessary key points of façade element shapes for LOD3 modelling. Surface areas can be reconstructed three-dimensionally by photogrammetry, but is this three-dimensional information necessary? Usually, façade elements are extruded, simple, regular shapes like rectangles and circles. Highly detailed objects like stucco therefore bring the handcart to its reconstruction limits, because many images with good illumination must be taken to reconstruct a high-density point cloud. Here, the LiDAR-based approaches are at an advantage.

In summary, the systems are a trade-off between time and accuracy. The handcart design balances both for subsequent LOD3 reconstruction based on camera photogrammetry.

5 Conclusion

In this paper, we propose an alternative data acquisition approach for LOD3 reconstruction based on a camera, a high-precision gimbal and a handcart design. This design was derived from an overarching camera SfM-based LOD3 reconstruction pipeline. A prototype was built and a data acquisition strategy was developed. It was tested, evaluated and practically validated in a test environment on component, subsystem and system level. The test results were discussed in relation to competing system techniques. The primary contributions can be summarized as follows:

  1. The system is a multifunctional data acquisition system for façade reconstruction. The gimbal enables a 360\(^{\circ }\) surround view for the camera and the LiDAR rangefinder. Additionally, the rangefinder is controlled in closed-loop measurement by camera and gimbal. This enables data recording for photogrammetric reconstruction and LiDAR-based ground truth in one step.

  2. The photogrammetric reconstruction was evaluated in a test environment. The resulting accuracies were in the millimeter to low centimeter range.

  3. The system is designed for the reconstruction of extruded, simple, regular shapes. Highly detailed reconstructions like stucco are conditionally possible.

  4. The handcart application area is limited by steep slopes and barriers. Its dual use, stationary and mobile, allows data acquisition in road traffic, on sidewalks and in rough terrain. This is an enabler of scalability for LOD3 reconstructions.

  5. In summary, the system is an alternative photogrammetric LOD3 reconstruction approach competing with point cloud-based ones. Due to its accuracy, it is able to keep up with competing systems.

In the future, we will focus on: (1) image instance segmentation, (2) automated building reconstruction supported by LOD2 data, (3) mobile data acquisition, and (4) object-based autonomous self-localization.