1 Introduction

Efficiently monitoring the structural health of large-scale tunnel infrastructure is a socially important challenge. Tunnel infrastructure is commonly located in urban areas and provides essential functions to society such as transport, electricity and communications. As urban populations grow, tunnels are often operated beyond their original design specifications in both function and lifespan, and hence face a growing risk of structural failure.

A critical requirement for effective structural health monitoring is the early detection of any visual changes in the tunnel surface, such as leakages, cracks and corrosion. Early detection allows early intervention, keeping the cost of remedial measures low and reducing the risk of unexpected failure. Detecting such changes is often the work of human inspectors, but given the sometimes adverse working environments and the extensive areas to be covered, this is a costly and time-consuming process that is subject to human error. More recently, digital camera and laser technologies have been used to capture data to improve the efficiency of visual inspection [16, 20], but such systems remain expensive and routine inspections are typically conducted only once every few years.

We present a vision-based change detection system which is a step towards low-cost, high-frequency monitoring. The system’s goal is to automatically detect, localise, cluster and rank visual changes on tunnel surfaces in newly acquired images. The system automates the time-consuming process of visual defect discovery, reducing the workload and increasing the effectiveness of expert human inspectors.

This article is an extended version of the conference paper [17]. Section 2 describes the main contributions of the paper in the context of related literature. Section 3 details the theory behind the change detection framework. Section 4 provides an extended description of the complete system and its constituent components. Section 5 describes practical experiments performed, including new results on pixel-level accuracy and on the clustering and ranking of changes. Section 6 concludes with a description of the system’s limitations and areas for future work.

2 Related work

The main contributions of this work are twofold. First, we describe a low-cost means of collecting and organising large-scale visual datasets of tunnel linings. Second, we devise a framework for change detection on newly captured, unregistered images. These two contributions are described below in the context of the related literature.

Data acquisition and reconstruction. Existing automated approaches for tunnel surface inspection tend to make use of more expensive or bespoke visual capture systems such as laser scanners [16] or calibrated laser/camera hybrids [10]. While the use of high-precision depth sensors can enable more accurate and robust geometry estimation than can be achieved from images alone, the increased cost of the sensors makes the systems less economical in situations where such high precision is not necessary.

Alternative approaches using only CCD sensors, such as [19, 20, 25], avoid dealing with geometric information by assuming an annular 2D world. Such systems rely on accurate camera positioning to maintain a constant distance to the tunnel surface. The approach in [6] describes a means of overcoming this by inferring geometric information from the images, but the use of this information is limited to quantifying the scale of cracks rather than facilitating comparison with previous images.

We opt for a fixed but otherwise unconstrained array of synchronised, overlapping, consumer-grade digital cameras (Fig. 1a). The low cost of the capture device allows for the possibility of assigning one or several devices to monitor individual tunnels continuously, rather than using a single expensive device to monitor many tunnels sporadically as is common. From the captured image set, we use Structure from Motion (SfM) techniques [15] to recover 3D geometry, and model the tunnel surface by locally fitting quadric surfaces to the resulting point cloud [2]. A 3D wire-frame surface model, texture-mapped with captured images, is shown in Fig. 1b.

Fig. 1

Illustration of the system. a Hardware for data capture; b changes detected in new images by localising within a reconstructed reference model; c sample output: detected changes are clustered by appearance and ranked within each cluster according to a user-defined importance measure

Change detection. Change detection in 2D images is a well-studied problem, particularly in the fields of remote sensing, video surveillance and medical imaging [9, 12]. Systems exist for similar applications to ours, such as pattern matching for concrete crack detection or road surface condition monitoring [16, 20]. However, the detection of general changes on tunnel surfaces does not seem well explored, despite its importance. We identify the three main challenges and related work from the computer vision literature:

  1. Query image registration. Accurate registration is an essential prerequisite for changes to be detected without a large number of false positives. In remote sensing, the standard means of registration is by coarse localisation via GPS, then feature matching and homography estimation (assuming a planar world geometry). In the tunnel environment, GPS is unavailable and the presence of 3D relief necessitates a geometric model. Recent techniques have adopted voxel-based [11, 18] and mesh-based [4] geometric models for 3D change detection in cluttered scenes with many occlusions. The geometric change detection system in [14] adopts a probabilistic rather than deterministic geometric representation, citing the difficulty of producing sufficiently accurate deterministic models in many realistic scenarios, e.g. due to the limited variety of camera poses or insufficient texture. The tunnel environment in our scenario is in general uncluttered and well-defined, and as we are specifically interested in detecting visual changes on its surface, we opt for a simple, scalable, local quadric surface model. The benefit of such an approach is that surfaces can still be recovered with sufficient accuracy for fine-grained change detection even in areas with little texture, given a suitable model.

  2. Nuisance variability. Figure 2 illustrates some typical sources of nuisance variability in the tunnel environment. One source of false changes between the registered images is image parallax from unmodelled geometry such as textureless cables and poorly lit panel anchor holes. This can be avoided by explicitly modelling all geometry [4, 18]; however, this is challenging in areas of poor texture or limited visibility. Instead, since the tunnel surface is our main concern, we circumvent the problem using a nuisance mask, in the style of [13], which downweights regions of the change image depending on their adherence to the fitted surface model. A further source of nuisance variability is illumination, amplified by the enclosed and poorly lit nature of the tunnel environment. In surveillance applications, background modelling is used to mitigate this variability, but this is not feasible with the limited temporal information available here. We investigate single-image colour-normalisation and colour-constancy techniques such as Multi-Scale Retinex (MSR) [7] to counter both high- and low-frequency illumination variability.

  3. Many modes of relevant changes. Many existing systems for large-scale infrastructure monitoring focus on pattern matching to detect specific features such as cracks in concrete [6, 20, 24, 25]. Our main concern is to capture all visual outliers which are not accounted for by understood modes of nuisance variability. Unlike all of the mentioned approaches, we aim to do so by statistical comparison against previous images rather than by creating a set of heuristics for detecting an explicit type of change such as a crack. Clustering the detected outliers based on their appearance establishes groups of features such as cracks or leaks, as illustrated in Fig. 1c, but without enforcing any prior knowledge of what types of changes we detect.

Fig. 2

Sources of nuisance variability. From left to right: new query image with no relevant change; warped matching image from the database; absolute difference image, with brighter areas indicating larger changes. All differences observed in this final image are caused by nuisance variability rather than relevant change. Such nuisances include low-frequency contrast changes across the tunnel lining, hard shadows around off-lining geometry and glare from specular surfaces. They are caused here by changes in camera position and lighting

3 Theory

We denote the query image by \(I^{q}\), a function \(D^{q}\rightarrow \mathbb {R}^3\) which maps a pixel location \(\mathbf {x}\) in the query image domain \(D^{q}\) to an RGB value \(I^{q}(\mathbf {x})\). The set of matching images is given by \(\{I^{m}_\textit{i}\}_{\textit{i}=1,\cdots ,M}\). This is the set of images taken at a previous time instance which have non-empty overlap with the query image. The matching images are registered to the query camera viewpoint such that \(D^{m}_\textit{i}\subset D^{q}\) for \(\textit{i}= 1,\cdots ,M\).

We are interested in obtaining a change map \(\textit{C}: D^{q}\rightarrow \{0,1\}\), which maps a location to \(1\) in case of a change and \(0\) otherwise. The goal is to achieve invariance to nuisance variability as described above and return a change map of only the relevant changes for pixel-level or image-level classification.

3.1 Change detection

We first consider the case of estimating a single change map \(\textit{C}_\textit{i}\) from the query image \(I^{q}\) and one of the matching images \(I^{m}_{\textit{i}}\) from the database:

$$\begin{aligned} p( \textit{C}_\textit{i}\, | \, I^{q}, I^{m}_{\textit{i}})&= \frac{1}{Z} \mathcal {L}(I^{q}\, | \, I^{m}_{\textit{i}}, \textit{C}_\textit{i}) \; p(I^{m}_{\textit{i}}\, | \, \textit{C}_\textit{i}) \; p(\textit{C}_\textit{i}) \end{aligned}$$
(1)
$$\begin{aligned}&\propto \mathcal {L}(I^{q}\, | \, I^{m}_{\textit{i}}, \textit{C}_\textit{i}) \; p(\textit{C}_\textit{i}) , \end{aligned}$$
(2)

where the normalising constant \(Z\) and the matching image prior \(p(I^{m}_{\textit{i}}\, | \, \textit{C}_\textit{i})\) are disregarded, as they are assumed not to depend on the change map \(\textit{C}_\textit{i}\). This leaves a likelihood term \(\mathcal {L}(I^{q}\, | \, I^{m}_{\textit{i}}, \textit{C}_\textit{i})\) and a prior term \(p(\textit{C}_\textit{i})\) for the probability of change at any given pixel. In our experiments, we set this prior to a constant value, but in a working system it might be varied depending on the location of the image pixel within the tunnel. This would allow a user to bias the system to detect changes with more sensitivity in areas of structural importance (such as where the tunnel passes near other critical infrastructure).

We define a distance function \(d\) between the query and the matching image, and the likelihood term is then expressed as the distribution of values of \(d\) given whether or not a change has occurred at a particular location:

$$\begin{aligned} \mathcal {L}_d(I^{q}| I^{m}_{\textit{i}}, \textit{C}_\textit{i}) \! = \! {\left\{ \begin{array}{ll} \exp \left( -d(I^{q}|I^{m}_{\textit{i}}) / \sigma ^2 \right) &{} \quad \mathrm {if } \; \textit{C}_\textit{i}\! = \! 0 \\ {\mathcal {U}}(d) &{} \quad \mathrm {if } \; \textit{C}_\textit{i}\! = \!1, \end{array}\right. } \end{aligned}$$
(3)

where \({\mathcal {U}}(d)\) is a uniform distribution over the range of values of \(d\). The smoothing constant \(\sigma \) is set as the mean value of \(d\) over the whole query image set. Each matching image, \(I^{m}_{\textit{i}}\), provides information for changes to be identified in its areas of overlap with \(I^{q}\).
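For illustration, the per-pixel posterior implied by Eqs. (2) and (3) can be computed as in the following minimal sketch (Python/NumPy; the uniform likelihood is represented by \(1/d_{\max }\) over the observed range of \(d\), and the constant prior is left as a parameter):

```python
import numpy as np

def change_posterior(d, sigma, d_max, prior_change=0.5):
    """Per-pixel posterior p(C_i = 1 | I_q, I_m_i) from Eqs. (2)-(3).

    d            : HxW array of distance values d(I_q | I_m_i)
    sigma        : smoothing constant (mean of d over the query image set)
    d_max        : range of d, so that the uniform likelihood U(d) = 1 / d_max
    prior_change : constant prior p(C_i = 1)
    """
    lik_no_change = np.exp(-d / sigma**2)      # C_i = 0 branch of Eq. (3)
    lik_change = 1.0 / d_max                   # C_i = 1 branch (uniform)
    num = lik_change * prior_change
    den = num + lik_no_change * (1.0 - prior_change)
    return num / den
```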

3.2 Choices for distance function

The distance function, \(d\), maps corresponding query and matching image pixels into a feature space and returns a value, \(d(I^{q}| I^{m}_{\textit{i}})\), using some distance metric. A good choice of function is one that detects relevant changes, yet is invariant to changes due to nuisance variables. Table 1 details a subset of the functions that we examined. These include: colour-normalisation techniques such as chromaticity and gray-world; colour-constancy techniques such as multi-scale retinex (MSR); a spatial histogram-of-gradients technique in the form of the dense scale-invariant feature transform (DSIFT); a measure of textural similarity in the form of grayscale normalised cross-correlation (NCC); and a measure of violation of the smooth relationship, \(g\), between query and matching image intensities, fitted in local windows using polynomial regression.

Table 1 Subset of distance functions examined
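For illustration, the following sketch implements two of these functions: gray-world colour normalisation followed by a per-pixel absolute difference, and a windowed grayscale NCC distance (a NumPy/SciPy sketch; the \(5\times 5\) window matches the setting used in Sect. 5, and the exact normalisation details may differ from those in Table 1):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def grayworld_distance(q_rgb, m_rgb, eps=1e-6):
    """Gray-world distance: rescale each channel to a common mean, then
    take the mean absolute per-channel difference (float images)."""
    def normalise(img):
        means = img.reshape(-1, 3).mean(axis=0)
        return img * (means.mean() / (means + eps))
    return np.abs(normalise(q_rgb) - normalise(m_rgb)).mean(axis=2)

def ncc_distance(q_gray, m_gray, win=5, eps=1e-6):
    """Grayscale NCC distance: 1 - local normalised cross-correlation,
    computed with box filters over a win x win neighbourhood."""
    mq, mm = uniform_filter(q_gray, win), uniform_filter(m_gray, win)
    cov = uniform_filter(q_gray * m_gray, win) - mq * mm
    var_q = uniform_filter(q_gray**2, win) - mq**2
    var_m = uniform_filter(m_gray**2, win) - mm**2
    ncc = cov / (np.sqrt(np.maximum(var_q * var_m, 0.0)) + eps)
    return 1.0 - ncc
```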

3.3 Combination of multiple change maps

In many cases, the query image contains regions which are visible in multiple matching images. In these areas, we can combine the outputs of the individual change maps using a probabilistic OR function:

$$\begin{aligned} p(\textit{C}(\mathbf {x})) = 1 - \prod _{\{ \textit{i}: \mathbf {x}\in D^{m}_{\textit{i}} \}} \left( 1 - p(\textit{C}_{\textit{i}}(\mathbf {x})) \right) , \end{aligned}$$
(4)

where \( D^{m}_{\textit{i}}\) is the domain of the matching image \(I^{m}_{\textit{i}}\) and the dependencies on \(I^{q}\) and \(I^{m}_{\textit{i}}\) are dropped for clarity.
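A direct implementation of Eq. (4) is straightforward; the sketch below assumes each per-matching-image change map is accompanied by a binary mask of its domain \(D^{m}_{\textit{i}}\):

```python
import numpy as np

def combine_change_maps(prob_maps, domain_masks):
    """Probabilistic OR of Eq. (4). Pixels outside a matching image's
    domain contribute a factor of 1, i.e. no evidence of change."""
    no_change = np.ones_like(prob_maps[0])
    for p, mask in zip(prob_maps, domain_masks):
        no_change *= np.where(mask, 1.0 - p, 1.0)
    return 1.0 - no_change
```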

3.4 Geometric prior

We use the information available to us from our SfM reconstruction to form a geometric prior, \(p( \textit{C}| {\mathcal {G}})\), included as follows:

$$\begin{aligned} p( \textit{C}_\textit{i}|I^{q}, I^{m}_{\textit{i}}, {\mathcal {G}}) = p( \textit{C}_\textit{i}| I^{q}, I^{m}_{\textit{i}}) \; p( \textit{C}_\textit{i}| {\mathcal {G}}). \end{aligned}$$
(5)

The prior makes use of the recovered scene geometry, \({\mathcal {G}}\), which maps image locations to corresponding 3D points: \(D^{m}\subset D^{q}\rightarrow \mathbb {R}^3\). The objective of the prior is to mask out nuisance changes caused by geometry or poorly reconstructed features. It can thus also be thought of as an inverse ‘nuisance map’ [13].

To construct the prior, we first group the image interest points into an inlier (on-surface) and outlier (off-surface) set, based on the distance of their corresponding 3D points to the nearest point on the locally fitted surface. Given the relatively sparse nature of \({\mathcal {G}}\) (Fig. 3b), we next apply mean-shift segmentation to the query image [3]. This delineates the image into pixel groups of similar colour and texture (Fig. 3c).
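A coarse sketch of this segmentation step is given below, clustering pixels in a joint colour-position feature space with scikit-learn's MeanShift; this only approximates the mean-shift segmentation of [3], and the parameter values are illustrative:

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

def meanshift_segments(img, spatial_weight=0.5, subsample=50):
    """Return an HxW label image of pixel groups by clustering pixels in a
    joint (R, G, B, x, y) space. The model is fitted on a subsample of the
    pixels for speed and then used to label every pixel."""
    h, w, _ = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    feats = np.column_stack([img.reshape(-1, 3).astype(float),
                             spatial_weight * xx.ravel(),
                             spatial_weight * yy.ravel()])
    bandwidth = estimate_bandwidth(feats[::subsample], quantile=0.1)
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(feats[::subsample])
    return ms.predict(feats).reshape(h, w)
```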

Fig. 3

Geometric prior. a Query image. b Distribution of reconstructed SIFT features (green). c Mean-shift segmented image with colour-coded segments. d Final geometric prior (black areas indicate off-surface or uncertain geometry)

Inliers and outliers contained within a pixel group vote towards its overall classification. Pixel groups containing only outliers are classified as off-surface and assigned a prior probability of zero, i.e. changes in those regions are considered to be nuisance variability and are ignored. Pixel groups containing more inliers than outliers are classified as on-surface and assigned a prior probability of one. For pixels lying in groups which contain no points, or which contain some inliers but no more inliers than outliers, the prior depends on the distance of the pixel to the nearest inlier. The prior is, therefore, expressed as:

$$\begin{aligned} p( \textit{C}(\mathbf {x}) | {\mathcal {G}}) \!=\! {\left\{ \begin{array}{ll} 1, &{} \quad \text {for on-surface groups}.\\ 0, &{} \quad \text {for off-surface groups}.\\ \exp \left( \frac{-||\mathbf {x}- \mathbf {x}_{in}||}{\sigma ^2_{{\mathcal {G}}}} \right) , &{} \quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(6)

where \(\sigma _{{\mathcal {G}}}\) controls smoothness in uncertain regions and \(\mathbf {x}_{in}\) signifies the nearest inlying 2D SIFT feature.

The geometric prior thus downweights changes where the geometry is either known to be off-surface or known to be unreliable. The latter is important in the tunnel environment, where off-surface features such as cables and boxes tend to have matte, featureless surfaces and are therefore reconstructed poorly using feature-based SfM. Figure 3d shows an example of a geometric prior mask, downweighting changes along the yellow cable and in the panel anchor (surface hole).
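Given such a label image and the 2D locations of the inlying and outlying features, the prior of Eq. (6) can be assembled as in the following sketch (a NumPy/SciPy sketch, assuming contiguous integer segment labels):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def geometric_prior(labels, inlier_xy, outlier_xy, sigma_g):
    """Per-pixel geometric prior of Eq. (6).

    labels     : HxW segment label image (integer labels 0..K-1)
    inlier_xy  : (N, 2) integer (row, col) locations of on-surface features
    outlier_xy : (M, 2) integer (row, col) locations of off-surface features
    """
    n_seg = labels.max() + 1
    n_in = np.bincount(labels[tuple(inlier_xy.T)], minlength=n_seg)
    n_out = np.bincount(labels[tuple(outlier_xy.T)], minlength=n_seg)

    # Distance from every pixel to the nearest inlying feature
    inlier_mask = np.zeros(labels.shape, bool)
    inlier_mask[tuple(inlier_xy.T)] = True
    dist = distance_transform_edt(~inlier_mask)

    prior = np.exp(-dist / sigma_g**2)                  # uncertain groups
    prior[(n_in > n_out)[labels]] = 1.0                 # on-surface groups
    prior[((n_out > 0) & (n_in == 0))[labels]] = 0.0    # off-surface groups
    return prior
```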

4 System description

A flowchart illustrating the main processes of our system is shown in Fig. 4. We now describe each process in turn.

Fig. 4

Flowchart of main system processes

Reference image acquisition. At time \(t_0\), we use a prototype capture system consisting of five cameras with synchronised shutters and flash units, arranged in a semi-circular array as shown in Fig. 1a, to capture a stream of reference images \(\{I^{r}_\textit{i}\}_{\textit{i}=1,\cdots ,R}\). Each reference image overlaps with its immediate neighbours by 50 %, both radially and longitudinally along the tunnel.

Structure from motion (SfM). SIFT feature descriptors [8] are extracted from each reference image using a GPU implementation [22]. The image set is split into smaller overlapping subsets and reconstructed in parallel using standard SfM [15, 23]. Local reconstructions are stitched into a global coordinate frame, centred at the first image in the sequence, using overlapping feature correspondences. The resulting reconstruction is illustrated in Fig. 5. Note that the global geometry recovered by the reconstruction process suffers from drift, as loop closure is not possible in linear tunnels. However, such drift is of little consequence to the change detection system, which relies only on local geometry.
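To illustrate the stitching step, the sketch below aligns the 3D positions of features reconstructed in two overlapping subsets with a closed-form similarity transform (Umeyama-style); it assumes the correspondences have already been established by feature matching, and the exact stitching procedure used in the system may differ:

```python
import numpy as np

def align_similarity(src, dst):
    """Closed-form similarity transform (scale s, rotation R, translation t)
    mapping src 3D points onto dst, i.e. dst ~ s * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                                  # guard against reflection
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t
```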

Fig. 5

Tunnel surface reconstruction from overlapping image subsets. Above: overlapping reconstructed subsets are shown in different colours; below: the same reconstruction after texture mapping

Atlas (3D database) builder. Following SfM, the pose of each reference image is known in a global coordinate system. Each reference image is stored along with its intrinsic and extrinsic parameters, the set of descriptors for its \(N\) largest SIFT features over scale and its corresponding subset number in an “atlas” database. This is used later for query image localisation.

Geometric primitive (surface) fitting. The tunnel surface is modelled locally with quadrics, fitted to each point cloud using robust non-linear optimisation with outlier removal. As in [2], we find that in the tunnels we consider, a piecewise cylindrical representation is sufficient, though the system can be trivially extended to any extruded shape (e.g. square or rectangular tunnels). This representation does not limit the system to straight tunnels, but can be used for any curved tunnel provided its gradients are smooth and shallow enough for the tunnel to be accurately approximated as locally straight.
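A minimal sketch of the local cylinder fit is given below, using a robust loss in place of explicit outlier removal; the initial axis, point on the axis and radius are assumed to come from the reconstruction (e.g. from the camera trajectory and an approximate tunnel radius):

```python
import numpy as np
from scipy.optimize import least_squares

def fit_local_cylinder(points, p0_init, axis_init, r_init):
    """Fit a cylinder (point on axis p0, unit axis a, radius r) to a local
    point-cloud patch by minimising robustified distances to the surface."""
    def residuals(params):
        p0, theta, phi, r = params[:3], params[3], params[4], params[5]
        a = np.array([np.sin(theta) * np.cos(phi),
                      np.sin(theta) * np.sin(phi),
                      np.cos(theta)])                   # unit axis direction
        v = points - p0
        radial = v - np.outer(v @ a, a)                 # remove axial component
        return np.linalg.norm(radial, axis=1) - r       # signed distance to surface

    axis_init = axis_init / np.linalg.norm(axis_init)
    theta0 = np.arccos(np.clip(axis_init[2], -1.0, 1.0))
    phi0 = np.arctan2(axis_init[1], axis_init[0])
    x0 = np.concatenate([p0_init, [theta0, phi0, r_init]])
    sol = least_squares(residuals, x0, loss='soft_l1')  # robust to off-surface points
    p0, theta, phi, r = sol.x[:3], sol.x[3], sol.x[4], sol.x[5]
    axis = np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi), np.cos(theta)])
    return p0, axis, r
```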

Query image(s) acquisition. Query images, \(I^{q}_\textit{i}\), are acquired at time \(t_1\ne t_0\) using either the same capture device or a new device such as a human inspector’s camera. In the former case, the query image data will be dense and overlapping and hence can be used as a new reference dataset for \(t_1\); in the latter case, the query image data will be sparse and unordered. In our evaluation, we assumed the latter, making localisation more challenging.

Query image localisation/camera resectioning. Approximate \(k\)-nearest neighbours matching using a \(k\)-d tree is used to match the descriptors of the \(N\) largest 2D SIFT features in the query image to the atlas database. In our experiments, we set \(k = 5\) and \(N = 300\). Each match is a weighted vote and votes are aggregated to find the highest scoring reference image subset. Within the subset, RANSAC-based registration is performed over all SIFT features and the query image camera is accurately resectioned with radial distortion estimation. This method of localisation was found to be sufficiently discriminative on concrete tunnel surfaces to correctly register all of our query image dataset.
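The voting stage can be sketched as follows; SciPy's cKDTree stands in for the approximate \(k\)-d tree used in practice, and the weighting of each vote by inverse descriptor distance is an assumption rather than the exact scheme used in the system:

```python
import numpy as np
from scipy.spatial import cKDTree

def vote_for_subset(query_desc, atlas_desc, atlas_subset_id, k=5):
    """Return the reference image subset with the highest weighted vote.

    query_desc      : (N, 128) descriptors of the query's N largest SIFT features
    atlas_desc      : (A, 128) descriptors stored in the atlas
    atlas_subset_id : (A,) subset index of each atlas descriptor
    """
    tree = cKDTree(atlas_desc)
    dist, idx = tree.query(query_desc, k=k)
    weights = 1.0 / (dist.ravel() + 1e-6)       # closer matches vote more strongly
    votes = np.bincount(atlas_subset_id[idx].ravel(), weights=weights)
    return int(np.argmax(votes))
```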

Change detection. Images from the reference dataset which overlap with the query image are then back-projected onto the recovered tunnel surface and re-projected into the query image. This provides a set of matching images, \(\{I^{m}_\textit{i}\}_{\textit{i}=1,\cdots ,M}\), for change detection, as described in Sect. 3.
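For the cylindrical case, the re-projection step amounts to intersecting each query pixel ray with the fitted cylinder and sampling the reference image at the projection of the intersection point. The sketch below assumes calibrated cameras with poses \([R\,|\,t]\) expressed in a local frame whose \(z\)-axis lies along the cylinder axis:

```python
import numpy as np
import cv2

def warp_reference_to_query(ref_img, K_q, R_q, t_q, K_r, R_r, t_r, radius, shape):
    """Synthesise a matching image by back-projecting a reference image onto
    the cylinder x^2 + y^2 = radius^2 and re-projecting it into the query view."""
    h, w = shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).astype(np.float64)

    # Query pixel rays in the surface frame: X = C_q + s * d
    C_q = -R_q.T @ t_q
    d = R_q.T @ np.linalg.inv(K_q) @ pix.T              # 3 x N ray directions

    # Intersect each ray with the cylinder (camera assumed inside the tunnel)
    a = d[0]**2 + d[1]**2
    b = 2.0 * (C_q[0] * d[0] + C_q[1] * d[1])
    c = C_q[0]**2 + C_q[1]**2 - radius**2
    s = (-b + np.sqrt(np.maximum(b**2 - 4 * a * c, 0.0))) / (2 * a)
    X = C_q[:, None] + s * d                            # 3 x N surface points

    # Project the surface points into the reference camera and sample it;
    # pixels falling outside the reference image are left to remap's border handling
    x_r = K_r @ (R_r @ X + t_r[:, None])
    map_x = (x_r[0] / x_r[2]).reshape(h, w).astype(np.float32)
    map_y = (x_r[1] / x_r[2]).reshape(h, w).astype(np.float32)
    return cv2.remap(ref_img, map_x, map_y, cv2.INTER_LINEAR)
```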

Unsupervised clustering and ranking of changes. To present the detected changes to the user in an efficient manner, we employ an unsupervised clustering and ranking approach. The benefit of clustering, even if the number of clusters is large, is that it can remove the need for the user to address each image change individually. This is especially useful as entire groups of real but unimportant changes can be quickly disregarded (e.g. the addition/removal of a cable along the complete length of the tunnel or the addition of a yellow chainage marker on every panel along the tunnel, which would appear as a change in many images).

The change probability maps are first thresholded with hysteresis to give a discrete set of connected changes. Each connected change is then represented as a 6D point in a simple colour and shape-based feature space:

  • mean colour of (MSR-corrected) change as it appears in the query image (3D),

  • perimeter to area ratio (1D),

  • ratio of principal axes (1D),

  • morphological Euler number (1D)—the number of unchanged connected components surrounded by the change.

The feature space is normalised and mean-shift clustering is used with an adaptive bandwidth to over-cluster the changes. In our experiment, we set the number of clusters at 100, far greater than the \(\sim 10\) types of changes applied. The motivation for over-clustering is to ensure that clusters remain homogeneous to avoid grouping together different types of changes.

Changes within each cluster are then ranked by a user-defined importance measure. In our experiment, we choose this measure as the sum of the pixels within the change weighted by their change probabilities. Large, high probability changes therefore appear before small, low probability changes.
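A sketch of this clustering and ranking stage, using scikit-image region properties for the shape features and scikit-learn's MeanShift for the over-clustering, is given below; `query_rgb` stands for the (MSR-corrected) query image, and the bandwidth value is illustrative rather than the one used in the experiments:

```python
import numpy as np
from skimage.measure import label, regionprops
from sklearn.cluster import MeanShift

def cluster_and_rank(change_mask, change_prob, query_rgb, bandwidth=0.3):
    """Group connected changes in the 6D colour/shape feature space and rank
    each cluster's members by their summed change probability."""
    regions = regionprops(label(change_mask))
    feats, scores = [], []
    for r in regions:
        rr, cc = r.coords[:, 0], r.coords[:, 1]
        mean_rgb = query_rgb[rr, cc].mean(axis=0)              # mean colour (3D)
        per_area = r.perimeter / max(r.area, 1)                # perimeter/area (1D)
        axis_ratio = r.minor_axis_length / (r.major_axis_length + 1e-6)
        feats.append(np.concatenate([mean_rgb,
                                     [per_area, axis_ratio, r.euler_number]]))
        scores.append(change_prob[rr, cc].sum())               # importance measure
    X = np.asarray(feats)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-6)          # normalise feature space
    cluster_id = MeanShift(bandwidth=bandwidth).fit_predict(X)

    ranking = {}
    for c in np.unique(cluster_id):
        members = np.flatnonzero(cluster_id == c)
        ranking[c] = members[np.argsort([-scores[i] for i in members])]
    return regions, cluster_id, ranking
```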

5 Experiments

We captured data covering \(180^{\circ }\) of a 3-m-diameter tunnel section of 100 m length. This comprised 1,000 images at a resolution of \(3,888 \times 2,592\) pixels. Next, artificial changes were applied to the concrete tunnel surface to simulate the visual changes that might be observed in a real environment, such as leaking, cracking and spalling. A query set of 232 images was taken, of which 131 contained relevant changes. All 232 images were labelled with ground truth for the presence or absence of change, 60 of which were labelled at the pixel level.

5.1 Qualitative results

Figure 6 shows three sample queries as well as different distance functions, the geometric prior mask and final change detection results. Relevant changes in seq. 1 and 2 include leaking, fine chalk markings, discolouration and objects attached to the surface. The three illustrated distance functions—gray-world, regression and NCC—pick out changes with different degrees of success. Gray-world tends to amplify changes and has good resolution, comparing each pixel individually without taking into account its neighbourhood, but the model we use is a global one and hence illumination effects are also undesirably amplified. Polynomial regression, implemented here as cubic regression with a \(9\times 9\) window, picks out fine changes such as cracks which disrupt the smooth relationship between query and matching image intensities, but predictably fails to detect larger changes such as water leaks where the entire window (and therefore relationship) is transformed. NCC, implemented here with a \(5\times 5\) window, reaches something of a compromise, highlighting both fine and coarser changes by taking into account intensity and spatial information, but at the cost of reduced resolution of the resulting distance image. All methods falsely detect changes from the lighting units and cabling. This is especially evident in seq. 3, where there is significant parallax and specularity in the scene.

Fig. 6

Illustrative results for three cases. Two sequences (1 and 2) feature relevant change: water leakage, chalk marks, added features; all three sequences feature nuisance change: lighting change (significantly in 1 and 3) and cables, fixtures and other off-surface geometry (significantly in 2 and 3)

The geometric prior in all three cases correctly identifies and removes most of the nuisance change caused by off-surface features. The final column shows a probabilistic output change mask, formed by a combination of gray-world and NCC features, multiplied by the geometric prior as per Eq. (5). In seq. 1 and 2, its performance is close to ground truth. Seq. 3 illustrates a failure case, caused by the unusual presence of some thread on the normally featureless red cable. The large downweighted area in the geometric prior of seq. 3 corresponds to an area of unknown geometry, as the query image is at the edge of the reconstructed area.

5.2 Quantitative results

5.2.1 Pixel-level performance

The pixel-level detection performance of various combinations of features is compared in the ROC curves of Fig. 7a. Distance functions which explicitly take into account local spatial information (NCC and DSIFT) performed better than methods which compare individual query pixels against individual reference pixels (MSR and gray-world) or against a locally fitted relationship (regression). Grayscale NCC returned the best performance, detecting 98 % of positive change pixels at a 20 % false positive rate. NCC was also tested on MSR and gray-world normalised images, although no significant difference in performance was observed.

Fig. 7

ROC comparison. a Pixel-level performance of various features; b image-level performance with and without geometric prior; c image-level performance of various features with mean-shift geometric prior. Area-based methods gave the best performance and image-level performance was significantly improved with the introduction of a discriminative geometric prior

5.2.2 Effect of geometric prior

We compared the image-level detection performance of two feature combinations, gray-world and gray-world NCC, in three scenarios: without a prior; using the mean-shift-based geometric prior described in Sect. 3.4; and using an alternative SLIC superpixel-based geometric prior [1]. The SLIC superpixel prior is calculated in the same manner as described in Sect. 3.4, but replacing the mean-shift algorithm with SLIC superpixelisation in the segmentation stage. ROC curves are shown in Fig. 7b. Classification performance increases substantially after the introduction of the geometric priors. With no prior, gray-world is initially far more discriminative than NCC, which is more prone to detect changes across nuisance areas of the image space. When a prior is introduced, however, nuisance regions are masked out and NCC can safely exploit local spatial information solely in the regions of interest (i.e. the tunnel surface), allowing it to outperform gray-world. Finally, the quantitative performance of our proposed mean-shift prior improves on that of the more local, SLIC-based prior. We tried several parameter settings for each and found that, qualitatively, mean-shift performed better than SLIC. Despite its larger computational expense, it was able to capture both irregularly shaped thin structures (e.g. cables) and large flat structures (the tunnel surface) of non-uniform size at the same parameter setting, thus returning a more semantically meaningful and useful segmentation. SLIC, in comparison, could not capture such different structures with a given region size and regularisation parameter.

5.2.3 Image-level performance

The image-level detection performance of various combinations of features is shown in the ROC curves of Fig. 7c. Consistent with our pixel-level results in Sect. 5.2.1, NCC performed best, detecting 81 % of true positives at 20 % false positive rate. Furthermore, running NCC on MSR and gray-world normalised images returned no significant quantitative difference in performance.

One difference of note when comparing Figs. 7a and c is that while gray-world and regression have worse pixel-level performance than MSR, their image-level performance is notably better. In the case of regression, this can be attributed to its failure to detect large areas of change such as the centre of the white circle in seq. 2 of Fig. 6. This reduces pixel-level performance, but it still detects the boundaries of such areas accurately (where there is overlap with an unchanged area) and therefore correctly flags the image as containing change. Gray-world returns a higher rate of false positive pixels because it offers little invariance to local lighting changes, but there are relatively few images in which this is a problem so image-level performance is not significantly degraded. Conversely, MSR resolves lighting change using local rather than global image statistics, and so has improved pixel-level performance but is more susceptible to artefacts at sharp local boundaries, e.g. between gray concrete and brightly coloured cables. This results in a higher number of false positive images.

5.3 Clustering and ranking results

The top-ranked changes in a subset of the clusters returned after the unsupervised clustering and ranking stage described in Sect. 4 are shown in Fig. 8. The method showed good qualitative performance in picking out groups of similar changes, although it was found that, due to the variable visual nature of the false positives, larger clusters with more variation would often contain some contamination (e.g. two instances of yellow cable in cluster 3). Smaller clusters such as clusters 5–9 were generally more homogeneous, although they offer less benefit in terms of reducing the workload for the human inspector. It should be noted that in a real system, adding location to the feature space should enhance the results of the method, by allowing, for example, all crack-like changes in the crown of the tunnel or all leakages in a particular tunnel segment to be grouped together.

Fig. 8

Top-ranked changes in a subset of the clusters returned after unsupervised clustering and ranking

6 Discussion

We have presented a system which is suitable for the automated monitoring and detection of general visual changes on smooth, unpainted, concrete tunnel surfaces. Our system is inexpensive to implement and reduces the workload for visual inspection, enabling higher frequency, more effective tunnel inspections and better use of visual inspection data. The change detection framework we present is broadly applicable to any situation where an accurate geometric model can be recovered of the area of interest, such that reference images may be accurately synthesised from the viewpoint of a query image.

6.1 Limitations

A key limitation of the proposed system is that there must be sufficient texture on the tunnel surface to allow reconstruction at a single time instance, and sufficient stable texture to allow registration of images between time instances. Our experience is that concrete and cast iron tunnels are sufficiently textured for both, provided they have not been painted or panelled. However, we have thus far tested only in relatively static (utility) tunnels, not in more dynamic environments such as road or subway tunnels, where the build-up of dust and dirt over time might mask the image texture used for localisation. Similar problems may occur between wet and dry environments or at tunnel extremities. The variability introduced by such factors could be mitigated to some extent by using odometry and/or 3D information from the reconstruction to improve query image registration. However, further tests are needed.

A second limitation is that we assume the tunnel geometry has a locally uniform cross-sectional shape, which can be retrieved from the 3D reconstruction. Our experience is that this is a fair assumption in modern pre-cast concrete tunnels which are precisely fabricated, but does not hold true of all tunnels. A significantly varying tunnel geometry would require a more precise approach to surface reconstruction.

Finally, despite the inclusion of the geometric prior, many false positives are still detected around areas such as cables. Performance might be further improved by adding domain-specific knowledge such as segmenting out all cable-like structures which appear as a certain colour in the images. A more generally applicable method would be to segment out cable structures by first reconstructing them using a model-based approach.

6.2 Future work

We are currently developing our capture device to acquire much larger volumes of data automatically. In the future, we plan to test our system in an active tunnel environment to detect real changes on a larger scale and to make a more direct comparison of our system against existing manual inspection techniques.

We also plan to further explore nuisance invariant features and extend our system to more complex tunnel geometries. Another interesting avenue for research is designing the system to scale efficiently in time as well as space, so that historical data may be stored and used efficiently.