1 Introduction

Crack detection is very essential for the health monitoring and assessment of civil infrastructures such as bridges, tunnels and elevated tracks. For most of the crack detection methods based on machine vision, the main point is to extract the cracks accurately from survey images, which is not an easy work since concrete cracks are usually very thin, irregularly shaped and submerged by stains, speckles, and shadows. Various efforts have been devoted to extracting cracks from noisy images. If the texture of concrete surface is fine grained, quite good results can be obtained by some local methods such as statistical filter [1], morphological filter [2], Hessian matrix-based filter [3] and wavelets [4, 5]. However, as pointed out in literature [6, 7], if the texture of concrete surface is coarse grained and the cracks are not continuous enough, the performance of the local methods will significantly deteriorate. To extract such inconspicuous cracks embedded in rough background, a number of global methods have been proposed. Among them, the minimal path-based methods [6, 812], which appear to be most prevalent in recent literature, consider a crack as a path of minimal length in a certain metric and then convert crack extraction to finding the minimal path.

Similarly, the crack with some branches is considered as a minimum spanning tree in [13]. Though these methods are very powerful for tackling cluttered background, their drawbacks are also obvious. First, the minimal path or minimum spanning tree can only approximate the centerlines of cracks, so the extraction result does not contain any information on crack width, which is crucial for the health assessment of beams, bridges, and the like. Second, it is very difficult to cope with the topology changes of cracks such as disconnections and closed substructures. Among the minimal path-based methods mentioned above, only [6, 10] are able to extract cracks with arbitrary topology. The former requires at least one manual input point, and the latter involves a series of complicated local preprocessing steps to generate crack seeds. It is noteworthy that incorporating the minimal path into traditional line filters may lead to new methods without the above two drawbacks. For example, the free-form anisotropy (FFA) method [7] calculates intensity statistics along four minimal paths in a relatively large neighborhood for each pixel, and then generates the anisotropic feature to enhance the crack pixels. This method is insensitive to crack topology and able to preserve crack width information. However, it has the effect of blurring crack boundaries, which will become a serious problem for measuring the widths of very thin cracks. Figure 1 gives a glimpse of the characteristics of three methods mentioned above and the proposed method.

Fig. 1
figure 1

A simple comparison of three existing methods and the proposed method. a A real crack image, b the local filter method in [3], c the minimal path method in [12], d local filter \(+\) minimal path in [7], e the method proposed in this paper

There are also some distinctive methods such as percolation [14] and 3D reconstruction [15]. In the percolation method, a pixel is estimated by the shape of the region formed by simulating the natural phenomenon of liquid permeation. If the percolated region is omnidirectional, the pixel will be determined to be a background pixel. Otherwise, if the percolated region is linear, the pixel will be considered as a crack pixel. This method can be efficiently implemented by adding the termination and skip procedures. In the 3D reconstruction-based method, the authors first recover the 3D structure of an object using several images captured from different views, and then they segment the images using a morphological filter. An interesting point is that the structural element size of the morphological operator can be adaptively adjusted according to some parameters computed from the 3D model and the desired crack thickness. However, this method is not appropriate for the common single-view images.

In this paper, our aim is to extract inconspicuous cracks automatically with crack width information preserved. A new method called anisotropic clustering on surfaces (ACOS) is proposed as a solution. First, we treat each gray-level image as a two-dimensional parametric surface embedded in \(R^{3}\), so that we can analyze the original images by exploiting some properties from differential geometry, which are usually hidden from the traditional point of view. Though this geometric representation has been adopted in a number of general purpose image processing methods [16, 17], we have found its application to crack detection only in [12], which focuses on detecting the presence and location of cracks and takes no consideration of crack width, as shown in Fig. 1c. Subsequently, given the parametric surface and the candidate points obtained by the segmentation method introduced later on, we focus our efforts on developing a new anisotropic clustering algorithm based on geodesic distance to assign the points to clusters of cracks and clusters of other objects. Then, the crack clusters can be easily sifted out according to their elongated shapes. This new clustering algorithm has the ability of linking the crack fragments separated by small gaps while preventing them from agglomerating with other surrounding objects, which is very important for our application. After this, the remaining questions are how to obtain the appropriate candidate points and determine the key parameters of the clustering method. Our strategy is to integrate the clustering with the globally convex segmentation (GCS) model [18]. The final extraction result can be obtained by repeating the following steps for several times. First, minimize the energy functional of GCS and then determine the candidate points and parameters for the clustering algorithm. Second, cluster the candidate points and remove the non-elongated clusters. Third, construct a spatial prior from the clustering result and incorporate it into the energy functional of the GCS model. In addition, the \(k\)-nearest neighbor mean filter is used to improve the performance.

The rest of the paper is organized as follows. Section 2 describes the proposed anisotropic clustering criterion and algorithm. Section 3 gives details on the integration of the clustering method with the globally convex segmentation. In Sect. 4, we present experimental results of the proposed method on real images and compare it with some existing methods. Section 5 concludes this paper.

2 Anisotropic clustering on surfaces

In this section, we focus on developing a clustering method to assign the pixel points to clusters of cracks and clusters of other objects. Once these clusters are available, crack clusters can be easily sifted out according to their elongated shapes. Not all of the points have to be labeled directly by the clustering method, since we can use some other faster method to exclude many overly bright points, which are unlikely to be crack points, and only take account of the points with lower intensities, called candidate points. Obviously, the set of candidate points is a mixture of cracks and many other dark objects such as gravel, stains, and sand grains. In this paper, we refer to the small spots as speckles, and the relatively large spots as lumps. We assume here that the candidate points have been given by the segmentation method that will be introduced in Sect. 3. Figure 2a shows the candidate points of Fig. 1a as an example.

Fig. 2
figure 2

Candidate points (a) and clustering results of DBSCAN with different parameters b Eps \(=\) 5, MinPts \(=\) 10, c Eps \(=\) 10, MinPts \(=\) 10

For our application, an effective and promising clustering algorithm should be able to discover clusters of arbitrary shapes and adjust the number of clusters rather than depending on a prescribed fixed number. In addition, it should have the ability to filter out the outliers generated by speckles. The density-based clustering methods [1922] satisfy the above requirements and thus allow us to share their basic philosophy, that is, clusters can be viewed as regions of high density that separated from each other by regions of low density. However, the above requirements are far from sufficient to guarantee good performance. In real crack images, as shown in Figs. 1a and 2a, a crack is usually surrounded by noises and disconnected by a number of gaps. This gives rise to another important requirement that the clustering method should be able to separate cracks from other surrounding objects and group the crack fragments into the same cluster simultaneously. Otherwise, taking the DBSCAN method [19] for example, the obtained crack clusters will either break up into small pieces that are hard to sift out from other objects in the subsequent procedure, see Fig. 2b, or agglomerate with many noise points, see Fig. 2c. To solve this problem, the clustering method must be aware of whether a point lies in the extension line of an elongated cluster or on one side of it, which means that the clustering criterion should be anisotropic. As far as we know, the existing density-based clustering methods do not meet the last requirement, since such position information is not taken into consideration. Therefore, it is necessary to devise a new clustering mechanism. Our idea is to adopt a geometric representation and then discover the clusters based on geodesic distances. Details will be introduced in the following two subsections.

2.1 Geometric representation of crack images

For every gray-level image \(I(x,y)\), \(x,y\in {}\Omega {}\subset {}{R}^{2}\), we can treat it as a two-dimensional parametric surface represented by the map:

$$\begin{aligned} X:\Omega \rightarrow {R}^3, X(x,y)=\{x,y,I(x,y)\}. \end{aligned}$$
(1)

For example, Fig. 3 shows a parametric surface generated by a synthetic crack image that is very simple but contains the essential ingredients of real crack images. It can be seen that cracks, lumps and speckles in real images will appear on such surfaces as valleys, basins and pits, respectively. Then, intuitively, cracks can be separated from other objects by some geometric properties on the surface.

Fig. 3
figure 3

A synthetic crack image and its parametric surface

One of the most important concepts in differential geometry is the first fundamental form, which determines all the intrinsic geometric properties, and its coefficients matrix for the surfaces described by Eq. 1 is given by:

$$\begin{aligned} g= \left( \begin{array}{l@{\quad }l} 1+I^{2}_{x} &{} I_x I_y \\ I_x I_y &{} 1+I^{2}_{y} \\ \end{array} \right) . \end{aligned}$$
(2)

Consequently, the length of a curve \(\Gamma :[0, 1]\rightarrow X(x(t),y(t))\), with \(\Gamma (0)={{\mathbf {p}}_{1}}\), \(\Gamma (1)={{\mathbf {p}}_{2}}\), can be calculated by:

$$\begin{aligned} L(\Gamma )=\int _{0}^{1}{\sqrt{{{g}_{11}}{{{\dot{x}}}^{2}}+2{{g}_{12}}\dot{x}\dot{y}+{{g}_{22}}{{{\dot{y}}}^{2}}}\mathrm{d}t}. \end{aligned}$$
(3)

And then the geodesic distance between two points \({{\mathbf {p}}_{1}}\), \({{\mathbf {p}}_{2}}\) on the surface \(X\) is defined as:

$$\begin{aligned}&{{d}_{X}}({{\mathbf {p}}_{1}},{{\mathbf {p}}_{2}}) \nonumber \\&\quad = \min _{\Gamma }\,\{L(\Gamma )\ | \ \Gamma :[0, 1]\rightarrow X,\Gamma (0)={{\mathbf {p}}_{1}},\Gamma (1)={{\mathbf {p}}_{2}}\}.\nonumber \\ \end{aligned}$$
(4)

The minimizer of the functional \(L(\Gamma )\) is called a minimal geodesic or a geodesic path accordingly. Practically, with the fast marching methods [2325], geodesic distances on parametric surfaces can be computed efficiently by solving the Eikonal equation:

$$\begin{aligned} ||{{\nabla }_{X}}d(\mathbf {p})||=1 \end{aligned}$$
(5)

where \(\mathbf {p}\) is a point on the surface \(X\), and \(d\) is a distance map that assigns each point its geodesic distance from a set of source points. And then, using the back propagation procedure [26], the geodesic paths can be obtained immediately without additional computation. The geodesic distance will be used as a powerful tool to analyze the crack images in the following part of this paper.

2.2 Clustering criterion

Figure 4 presents a zoomed-in portion of the parametric surface generated by the synthetic crack image. The candidate points are indicated by small circles, and most of them are lying at the bottoms of valleys, basins and pits. Note that \({{\mathbf {p}}_{0}}\), \({{\mathbf {p}}_{1}}\), \({{\mathbf {p}}_{2}}\) are crack points, \({{\mathbf {p}}_{3}}\) and \({{\mathbf {p}}_{4}}\) belong to speckles, and \({{\mathbf {p}}_{5}}\) lies in a lump object.

Fig. 4
figure 4

Candidate points (small circles) and geodesic paths (thick curves)

We denote the geodesic path and the geodesic distance between two points \({{\mathbf {p}}_{i}}\) and \({{\mathbf {p}}_{j}}\) by \({{\gamma }_{i,j}}\) and \({{d}_{i,j}}\), respectively. The geodesic paths between some pairs of these points are plotted by green thick curves, which illustrate the corresponding geodesic distances in an intuitive way. Analogous to DBSCAN [19], we estimate the density around a point \({{\mathbf {p}}_{i}}\) as the number of candidate points contained in its neighborhood. Based on geodesic distances, it is natural to define the \(\varepsilon \)-neighborhood of \({{\mathbf {p}}_{i}}\) as \({{N}_{\varepsilon }}({{\mathbf {p}}_{i}})=\{{{\mathbf {p}}_{j}}\in X|{{d}_{i,j}}<\varepsilon \}\), where \(\varepsilon \) is a parameter determining the size of the \(\varepsilon \)-neighborhood. Then the density around \({{\mathbf {p}}_{i}}\) is given by \(\rho ({{\mathbf {p}}_{i}})=|{{N}_{\varepsilon }}({{\mathbf {p}}_{i}})\cap {{C}_{X}}|\), where \({{C}_{X}}\) is the set of candidate points on the surface \(X\). Clearly, given an appropriate \(\varepsilon \), for example \(\varepsilon ={{d}_{0,1}}\), the densities around \({{\mathbf {p}}_{0}}\), \({{\mathbf {p}}_{1}}\), \({{\mathbf {p}}_{2}}\) and \({{\mathbf {p}}_{5}}\) will be much higher than the densities around \({{\mathbf {p}}_{3}}\) and \({{\mathbf {p}}_{4}}\), which illustrates that the regions of cracks and lumps can be distinguished from the regions of speckles by density. The points with a density higher than a certain threshold \({{T}_{\rho }}\) are defined as Core points. Since the regions of cracks are denser than the regions of speckles, we can find an appropriate \({{T}_{\rho }}\) such that no point in the speckles will be accepted as a Core point whereas each crack will contain at least one Core point. Therefore, if the clustering criterion stipulates that a cluster must originate from a Core point, the speckles will not generate any new cluster. The Core points can be sought by scanning the candidate points unassigned to any cluster and computing their densities. The implementation details will be given in Sect. 2.3. On the other hand, the geodesic distance between two points belonging to different objects is usually larger than that between two points in the same object. For example, it can be seen that \({{\gamma }_{0,5}}\) is significantly longer than \({{\gamma }_{0,1}}\), although the Euclidean distance between \({{\mathbf {p}}_{0}}\) and \({{\mathbf {p}}_{5}}\) is much smaller than that between \({{\mathbf {p}}_{0}}\) and \({{\mathbf {p}}_{1}}\). It is therefore not difficult to find a proper \(\varepsilon \) such that \({{\mathbf {p}}_{0}}\) and \({{\mathbf {p}}_{1}}\) will fall in the \(\varepsilon \)-neighborhood of each other but \({{\mathbf {p}}_{0}}\) and \({{\mathbf {p}}_{5}}\) will not. This means that two adjacent objects in crack images can be separated from each other efficiently because they are not so close to each other in the sense of geodesic distance. In Sect. 3, we shall see that \({{T}_{\rho }}\) and \(\varepsilon \) can be determined automatically.

Based on these grounds, we can temporally specify our clustering criterion as follows. A cluster must originate from a Core point, and then it grows by absorbing the candidate points covered by any one of the \(\varepsilon \)-neighborhoods of the points existing in this cluster. As discussed above, the speckles such as \({{\mathbf {p}}_{3}}\) and \({{\mathbf {p}}_{4}}\) are not Core points and they are far away from other objects in the sense of geodesic distance. Therefore, complying with this criterion, the speckles will neither generate any new cluster nor be absorbed by any existing cluster, which means that the speckles will disappear in the clustering result. Moreover, complying with this criterion, the cracks and lumps will be properly assigned to different clusters since they can be separated by the geodesic distance. Then the crack clusters can be sifted out according to their elongated shapes.

However, it is not allowed by the provisional criterion to assign the fragments of a crack to one cluster, since two adjacent fragments are also not close enough to each other in the sense of geodesic distance, see the geodesic path between \({{\mathbf {p}}_{1}}\) and \({{\mathbf {p}}_{2}}\). If the fragments of a crack are assigned to several small clusters, these crack clusters will be difficult to distinguish from the lump clusters by their shapes. It is worth noticing that, quite different from DBSCAN and its variants, not all the points of a cluster are required to fall in the \(\varepsilon \)-neighborhoods of Core points in our provisional criterion. This allows us to devise more flexible mechanisms to fix the above problem.

Before proceeding, we return to the image domain \(\Omega \). Since the points in \(\Omega \) and the points on the parametric surface have a one-to-one correspondence, we denote \({{\mathbf {{p}_{i}'}}}\) the corresponding point of \({{\mathbf {p}}_{i}}\), where \({{\mathbf {{p}_{i}'}}}\in \Omega \) and \({{\mathbf {p}}_{i}}\in X\). Then, the projection in \(\Omega \) of the \(\varepsilon \)-neighborhood of \({{\mathbf {p}}_{i}}\) is given by:

$$\begin{aligned} {{{N}'}_{\varepsilon }}({{\mathbf {p}}_{i}})=\{{{\mathbf {{p}_{j}'}}}\in \Omega |{{\mathbf {p}}_{j}}\in {{N}_{\varepsilon }}({{\mathbf {p}}_{i}})\}. \end{aligned}$$
(6)

Figure 5 presents the projection in image domain of the \(\varepsilon \)-neighborhood of \({{\mathbf {p}}_{0}}\) with \(\varepsilon ={{d}_{0,1}}\). It can be seen that the projection \({{{N}'}_{\varepsilon }}({{\mathbf {p}}_{0}})\) nearly coincides with the crack fragment and has an elongated shape as well. Consequently, in this projection, the points with longer Euclidean distances from \({{\mathbf {{p}_{0}'}}}\) will have stronger tendencies to grow the cluster in the extension direction of the crack. To quantify the tendency and estimate the extension direction, we introduce the elongation vector \(\mathbf {v}({{\mathbf {p}}_{i}})\), which is defined as the projection of the vector from \({{\mathbf {p}}_{i}}\) to its direct predecessor node \({{\mathbf {p}}_{k}}\), i.e., \(\mathbf {v}({{\mathbf {p}}_{i}})={{\mathbf {{p}_{i}'}}}-{{\mathbf {{p}_{k}'}}}\). For instance, if \({{\mathbf {p}}_{1}}\) has just been assigned to the cluster containing \({{\mathbf {p}}_{0}}\) because it falls in the \(\varepsilon \)-neighborhood of \({{\mathbf {p}}_{0}}\), then \({{\mathbf {p}}_{0}}\) is the direct predecessor node of \({{\mathbf {p}}_{1}}\) and \(\mathbf {v}({{\mathbf {p}}_{1}})\) equals \({{\mathbf {{p}_{1}'}}}-{{\mathbf {{p}_{0}'}}}\).

Fig. 5
figure 5

Projection in image domain of the \(\varepsilon \)-neighborhood of \({{\mathbf {p}}_{0}}\)

Based on the elongation vectors, we can make the \(\varepsilon \)-neighborhood of some crack point protrude approximately in the direction of crack extension so that it can cover the points of another crack fragment separated by a gap while keeping noise points outside. Recalling the provisional criterion, this means that the two tasks of separating adjacent objects and grouping the fragments of a crack into the same cluster can be completed at the same time. Now, we can propose an anisotropic neighborhood for each point, given by:

$$\begin{aligned} {{N}_{\alpha }}({{\mathbf {p}}_{i}})=\{{{\mathbf {p}}_{j}}\in X|{{d}_{i,j}}<\alpha ({{\mathbf {p}}_{i}},{{\mathbf {p}}_{j}})\} \end{aligned}$$
(7)
$$\begin{aligned}&\alpha ({{\mathbf {p}}_{i}},{{\mathbf {p}}_{j}})\nonumber \\&\quad =\left\{ \begin{array}{ll} (\varepsilon \!+\!\left| \mathbf {v}({{\mathbf {p}}_{i}}) \right| ){{\left( \frac{\langle \mathbf {v}({{\mathbf {p}}_{i}}),{{{\mathbf {{p}_{j}'}}}}-{{{\mathbf {{p}_{i}'}}}} \rangle }{\left| \mathbf {v}({{\mathbf {p}}_{i}}) \right| | {{{\mathbf {{p}_{j}'}}}}-{{{\mathbf {{p}_{i}'}}}} |}\right) }^{2}} &{}\quad \mathrm{if} \, \langle \mathbf {v}({{\mathbf {p}}_{i}}),{{{\mathbf {{p}_{j}'}}}}-{{{\mathbf {{p}_{i}'}}}} \rangle \!>\!0 \\ 0 &{}\quad \text {otherwise.} \\ \end{array} \right. \nonumber \\ \end{aligned}$$
(8)

where \(\varepsilon \) is a positive parameter. We shall see in Sect. 3 that \(\varepsilon \) can be determined automatically. In Fig. 6a, two blue shadows indicate the projections of \({{N}_{\alpha }}({{\mathbf {p}}_{1}})\) and \({{N}_{\alpha }}({{\mathbf {p}}_{6}})\). In this case, we assume \({{\mathbf {p}}_{0}}\) is the direct predecessor node of \({{\mathbf {p}}_{1}}\) and \({{\mathbf {p}}_{6}}\). It can be seen that the geodesic distances \({{d}_{1,2}}\) and \({{d}_{5,6}}\) are almost identical. However, \({{N}_{\alpha }}({{\mathbf {p}}_{1}})\) covers the crack point \({{\mathbf {p}}_{2}}\) but \({{N}_{\alpha }}({{\mathbf {p}}_{6}})\) keeps the noise point \({{\mathbf {p}}_{5}}\) outside. To illustrate the relationship between an anisotropic neighborhood and its corresponding direct predecessor node more intuitively, we plot some anisotropic neighborhoods on a flat surface, i.e., a plane, as shown in Fig. 6b, where the isolated yellow point is the direct predecessor node of other points.

Fig. 6
figure 6

Anisotropic neighborhoods. a Projections in image domain of the anisotropic neighborhoods, b anisotropic neighborhoods on planes

Finally, we specify the anisotropic clustering criterion as follows. A cluster must originate from a Core point, and then it grows by absorbing the candidate points covered by any one of the anisotropic neighborhoods of the points existing in this cluster. Note that the anisotropy of this clustering criterion stems from two aspects. Besides the anisotropic neighborhoods on surface, the projection in image domain of the \(\varepsilon \)-neighborhood is also anisotropic, as shown in Fig. 5. Here is a brief summary of the roles of the key elements involved in the anisotropic clustering criterion. First, the \(\varepsilon \)-neighborhood measures the density around a candidate point in an anisotropic way. Second, the Core point is an indication of a high-density region and thus determines where a new cluster should be created. Third, the anisotropic neighborhood makes an crack cluster elongate approximately in the direction of crack extension.

2.3 Clustering algorithm

The proposed clustering algorithm is described in pseudocode in Algorithm 1. Generally, the section from step 6–19 is the procedure for finding a Core point and generating a new cluster, and the section in the while loop is the procedure of absorbing unassigned points to this cluster. In the result, the points with a CLASSID of 0 are corresponding to speckle noises, and the other points are assigned to a number of clusters. There are some steps needing further explanations. First, in steps 8 and 25, the points should be added to a vector in order of geodesic distance from the current point so that a subsequent absorbed point can have the farthest possible direct predecessor node. This will be conducive to connecting crack fragments. For instance, in Fig. 7, we assume \({{\mathbf {p}}_{0}}\) is the direct predecessor node of \({{\mathbf {p}}_{6}}\) and \({{\mathbf {p}}_{7}}\), and both their anisotropic neighborhoods cover the unsigned point \({{\mathbf {p}}_{1}}\). If we first pick out \({{\mathbf {p}}_{7}}\), the elongation vector of \({{\mathbf {p}}_{1}}\) will be so short that the anisotropic neighborhood of \({{\mathbf {p}}_{1}}\) cannot cover \({{\mathbf {p}}_{2}}\). It is therefore reasonable to pick out \({{\mathbf {p}}_{6}}\) first. Actually, it is unnecessary to rearrange these points specially because of the monotonicity of the fast marching method in computing the distance map. Second, for the sake of concise description, we borrow the member function vector::push_back() in C\(++\) Standard Template Library. It means adding a new element at the end of the vector, after its current last element. Third, due to the anisotropy, the neighborhood of a point that belongs to a newly created cluster may cover some points in an existing cluster. According to our clustering criterion, all the points in these two clusters should be grouped into one cluster. Therefore, we merge these two clusters by the procedure from steps 17 to 19.

figure d
Fig. 7
figure 7

Illustration of the effect of sequence variation on the elongation vector

Most of the computational complexity is contributed by the procedures of finding anisotropic neighborhoods using the fast marching method. If a neighborhood contains at most K grid points on surface, the complexity will be less than \(O(nK\ln (K))\), where \(n\) is the number of candidate points. The running times on some real crack images will be presented in Sect. 4.

Figure 8 shows the clustering process. Clusters are labeled with different colors. Comparing the candidate points (Fig. 8a) and the clustering result (Fig. 8h), we can see that most of the speckles are removed and the crack consisting of several fragments has emerged from the noisy back ground. Due to the elongated shape, the crack cluster can be easily sifted out from the other clusters corresponding to the lumps. The shape of a cluster can be measured by the elongatedness defined as \(\mathrm{el}=A/{{R}^{2}}\), where \(A\) is the cluster area and \(R\) is the radius of the minimum enclosing circle of the cluster. The smaller value of el implies the more elongated shape. Therefore, the clusters with an elongatedness smaller than a threshold \(\phi \) should be accepted as crack clusters while the other clusters should be removed. Figure 9 illustrates the procedure of sifting out the crack clusters from Fig. 8h. Here, the threshold \(\phi \) is set to 0.5. As shown in Fig. 9c, most of the lumps have been removed. It is worth emphasizing that the effectiveness of sifting procedure is highly dependent on the clustering result. If the crack clusters break up into small pieces or agglomerate with speckles or lumps, the sifting procedure will become invalid. Therefore, the sifting procedure should be regarded as a small additional step of the clustering procedure.

Fig. 8
figure 8

Clustering process. a Candidate points, bg intermediate stages, h clustering result

Fig. 9
figure 9

The procedure of sifting out the crack clusters from the other clusters. a Elongatedness of each cluster, b clusters labeled with numbers, c result of the sifting procedure

3 Integration with globally convex segmentation

This section addresses the problem of determining the candidate points and key parameters of the clustering method. As mentioned in Sect. 2, we can obtain the candidate points consisting of cracks and other dark objects by some image segmentation method based on pixel intensity. As a representative of region-based active contours, the Chan–Vese model [27] has been extensively applied to image segmentation. The GCS [18] adopted in our work can be viewed as a convex version of the Chan–Vese model. It avoids the drawback of being prone to getting stuck in local minima, which means that it is independent of the choice of initial contour. Moreover, it can be implemented much more efficiently via convex minimization schemes such as the dual formulation [28] and the split Bregman method [29]. The GCS model is defined as the following minimization problem:

$$\begin{aligned}&\underset{0\le u\le 1}{\mathop {\min }}\left\{ E(u)\right. =\int _{\Omega }{|\nabla u|\mathrm{d}\mathbf {x}}\!+\!\mu \nonumber \\&\qquad \qquad \left. \int _{\Omega }{({{(I(\mathbf {x})\!-\!{{c}_{1}})}^{2}}\!-\!{{(I(\mathbf {x})-{{c}_{2}})}^{2}})u\mathrm{d}\mathbf {x}}\right\} \end{aligned}$$
(9)

where \(u\) is the level set function, \(\mu \) is a positive parameter that controls the boundary smoothness of the segmented region defined as \({{\Omega }_{S}}=\{\mathbf {x}|u(\mathbf {x})\ge 0.5,\mathbf {x}\in \Omega \}\), and thus the level of detail to be segmented, and the two constants \({{c}_{1}}\), \({{c}_{2}}\) represent the mean intensities inside and outside \({{\Omega }_{S}}\). The energy \(E(u)\) is minimized using an alternating scheme where at the first step \({{c}_{1}}\) and \({{c}_{2}}\) are updated and at the second step \(u\) is evolved. Consequently, the points in the final \({{\Omega }_{S}}\) can be accepted as the candidate points.

Returning to the surface in Fig. 4, we can see that the depth of a valley or a basin can be estimated by \({{c}_{2}}-{{c}_{1}}\). It is therefore natural to set \(\varepsilon ={{c}_{2}}-{{c}_{1}}\) in Eq. 8 to ensure that adjacent objects can be separated and simultaneously a crack section can be prolonged as much as possible. Besides, since the \(\varepsilon \)-neighborhood of a crack point is expected to have at least \(\varepsilon \) candidate points (see Fig. 5), it is also reasonable to set the density threshold \({{T}_{\rho }}\) to \({{c}_{2}}-{{c}_{1}}\).

Note that, in practice, it is unrealistic to expect GCS to provide good enough candidate points at the first time. Therefore, we adopt an iterative scheme in which a spatial prior constructed from the last clustering result is incorporated into the energy functional, and then we iterate the loop of segmentation–clustering for several times.

To improve the performance, we smooth the gray-level image using the \(k\)-nearest neighbor mean filter defined on its parametric surface, given by:

$$\begin{aligned} {{I}_{s}}(\mathbf {x})=\mathrm{mean}\{I(\mathbf {y})|\mathbf {p}(\mathbf {y})\in {{N}_{k}}(\mathbf {p}(\mathbf {x}))\subset X\} \end{aligned}$$
(10)

where \({{N}_{k}}(\mathbf {p}(\mathbf {x}))\) is the set of the \(k\)-nearest neighbors of \(\mathbf {p}(\mathbf {x})\) in the sense of geodesic distance. Figure 10 illustrates the \(k\)-nearest neighbors of a crack point and a speckle. The former one tends to coincide with the crack and the latter one tends to overflow from the speckle. As a result, this filter is able to remove speckles efficiently without blurring cracks and, more importantly, flatten the bottoms of crack valleys, which is conducive to grow the crack clusters along cracks. For reasons of speed, we only smooth the region \({{\Omega }_{S}}\). Besides, as suggested in [3], we remove the shadows by subtracting the original image from its median filtered image in preprocessing.

Fig. 10
figure 10

The \(k\)-nearest neighbors of a crack point and a speckle in the sense of geodesic distance

We can now define the integrated energy functional as:

$$\begin{aligned} {{E}_{C}}(u)&= \int _{\Omega }{|\nabla u|\mathrm{d}\mathbf {x}}\nonumber \\&\quad +\mu \int _{\Omega }{({{({{I}_{s}}(\mathbf {x})-{{c}_{1}})}^{2}}-{{({{I}_{s}}(\mathbf {x})-{{c}_{2}})}^{2}})u\mathrm{d}\mathbf {x}} \nonumber \\&\quad +\int _{\Omega }{\exp (\mathrm{dist}(\mathbf {x},{{\Omega }_{C}})-\beta )u\mathrm{d}\mathbf {x}} \end{aligned}$$
(11)

where \({{\Omega }_{C}}\) is the crack region extracted in the last clustering procedure, \(\mathrm{dist}(\mathbf {x},{{\Omega }_{C}})\) is the Euclidean distance from \(\mathbf {x}\) to \({{\Omega }_{C}}\), and \(\beta \) is a nonnegative parameter that controls the penalty effect of spatial constraint. Generally, the candidate points will be restricted in the region \({{\Omega }_{\beta }}=\{\mathbf {x}|\mathrm{dist}(\mathbf {x},{{\Omega }_{C}})\le \beta \}\). We adopt the split Bregman method [29] to minimize the energy functional \({{E}_{C}}(u)\) with respect to \(u\) motivated by its high convergence speed. In this way, the above minimization problem is converted to the following sequence of optimization problems:

$$\begin{aligned} \left( {{u}^{k+1}},{{\mathbf {d}}^{k+1}}\right)= & {} \arg \ \underset{0\le u\le 1,\mathbf {d}}{\mathop {\min }}\,\left\{ \int _{\Omega }{|\mathbf {d}|\mathrm{d}\mathbf {x}}+\mu \int _{\Omega }{ru\mathrm{d}\mathbf {x}} \right. \nonumber \\&+\int _{\Omega }{\exp (\mathrm{dist}(\mathbf {x},{{\Omega }_{C}})-\beta )u\mathrm{d}\mathbf {x}}\nonumber \\&+\left. \frac{\lambda }{2}\int _{\Omega }{|\mathbf {d}-\nabla u-{{\mathbf {b}}^{k}}{{|}^{2}}\mathrm{d}\mathbf {x}}\right\} \end{aligned}$$
(12)
$$\begin{aligned} {{\mathbf {b}}^{k+1}}= & {} {{\mathbf {b}}^{k}}+\nabla {{u}^{k+1}}-{{\mathbf {d}}^{k+1}} \end{aligned}$$
(13)

where \(r={{I}_{s}}(\mathbf {x})-{{c}_{1}}{{)}^{2}}-{{({{I}_{s}}(\mathbf {x})-{{c}_{2}})}^{2}}\), \(\mathbf {b}\) and \(\mathbf {d}\) are two auxiliary vectors, and \(\lambda \) is a positive parameter whose recommended value is 0.5. Next, \({{u}^{k+1}}\) can be obtained by solving the Euler–Lagrange equation of Eq. 12 element wisely:

$$\begin{aligned} \Delta u=\frac{\mu }{\lambda }r+\frac{1}{\lambda }\exp (\mathrm{dist}(\mathbf {x},{{\Omega }_{C}})-\beta )+\nabla \cdot (\mathbf {d}-\mathbf {b}). \end{aligned}$$
(14)

And then \({{\mathbf {d}}^{k+1}}\) is computed by:

$$\begin{aligned} {{\mathbf {d}}^{k+1}}=\max \{|{{\mathbf {b}}^{k}}+\nabla {{u}^{k+1}}|-\lambda ,0\}\frac{{{\mathbf {b}}^{k}}+\nabla {{u}^{k+1}}}{|{{\mathbf {b}}^{k}}+\nabla {{u}^{k+1}}|}. \end{aligned}$$
(15)

Finally, the level set function \(u\) will converge after only a few iterations and become very close to a binary function.

At the end of this section, we summarize in Algorithm 2 the main steps of the integration of clustering with segmentation. Figure 11d shows the final extraction result of the crack in Fig. 1a, which is obtained after five iterations of the segmentation–clustering loop.

figure e
Fig. 11
figure 11

Crack extraction results of Fig. 1a after the 2nd–5th iterations of the segmentation–clustering loop. ad are corresponding to 2nd–5th iterations in order. Recall that the result after the 1st iteration is shown in Fig. 9c

4 Experimental results

In this section, we demonstrate the performance of the proposed ACOS method on a set of 80 real crack images acquired from concrete track beams. All the test images have a size of 200 \(\times \) 200 pixels. The crack widths range from 1 to 3 pixels and the backgrounds are coarse grained. Besides the crack image in Fig. 1, another six examples of the test images are presented in Fig. 12. The ground truths shown in the first column of Fig. 13 were obtained by tracing the cracks manually on the original images. We compare the performance of our method with the FFA method [7], since it is also width-aware, automatic, robust to noise, and insensitive to topology changes as mentioned in the Sect. 1. The parameters \(\mu \), \(\beta \) in Eq. 11 , \(k\) in Eq. 10 and the elongatedness threshold \(\phi \) are set to 400, 5, 15 and 0.5, respectively. In FFA, the length of minimal paths is set to 25, which is believed to be an optimal value according to preliminary experiments. Besides, to validate the new clustering criterion, we also test the performance of DBSCAN \(+\) GCS, which is obtained by substituting the anisotropic clustering in step 8 of Algorithm 2 with the basic version of DBSCAN. The neighborhood radius and density threshold of DBSCAN are tuned to 5 and 10 to achieve its best performance. Extraction results of these three methods are presented in Fig. 13. All the programs were implemented in C\(++\) in single-thread mode and run on a PC with an Intel Celeron 2.7 GHz CPU.

Fig. 12
figure 12

Concrete crack images

Fig. 13
figure 13

Ground truth cracks and extraction results. First column ground truth, second column FFA, third column DBSCAN \(+\) GCS, last column ACOS

4.1 Performance evaluation

From the extraction results, we can see that both ACOS and FFA achieve good performance since the extracted cracks are similar to the ground truths. However, our method preserves more details. As regards DBSCAN \(+\) GCS, its results are much worse than ACOS: many more noise points are mislabeled (see Fig. 13c–f) and the cracks are prone to break up into more fragments (see Fig. 13b), which makes the crack clusters more difficult to sift out from irrelevant objects. For quantitative evaluation, we adopt the precision, recall and \(F\)-measure, which are widely used in evaluating the performance of object extraction methods. In crack extraction, precision is defined as the fraction of extracted crack pixels that are true crack pixels, and recall the fraction of true crack pixels that are extracted. \(F\)-measure is then defined as \(2\times \frac{\text {precision}\,\times \, \text {recall}}{\text {precision}\,+\,\text {recall}}\). Nevertheless, from a practical point of view, these measures are not perfect because they are oversensitive to the localization of crack boundary. For instance, consider a crack with a width of two pixels. If the extracted crack is exact in shape and size but merely deviates from the true crack by one pixel in the transversal direction, then all the above three measures will drop to only 0.5, though this deviation can be totally ignored in practice. On the other hand, location deviations in the manually labeled ground truth images are also unavoidable, especially for the vague and tortuous cracks. Therefore, as in [11, 13], a little location tolerance must be allowed when counting the correctly labeled pixels. Then, we define the precision, recall and \(F\)-measure with tolerance \(\sigma \) as follows:

$$\begin{aligned}&\text {precision}(\sigma )=\frac{|{{\Omega }_{C}}\cap D({{\Omega }_{\mathrm{true}}},\sigma )|}{|{{\Omega }_{C}}|} \nonumber \\&\text {recall}(\sigma )=\frac{|D({{\Omega }_{C}},\sigma )\cap {{\Omega }_{\mathrm{true}}}|}{|{{\Omega }_{\mathrm{true}}}|} \nonumber \\&F\text {-measure}(\sigma )=2\times \frac{\text {precision(}\sigma \text {)}\times \text {recall(}\sigma \text {)}}{\text {precision(}\sigma \text {)}+\text {recall(}\sigma \text {)}} \end{aligned}$$
(16)

where \(|\cdot |\) denotes the number of pixels in a certain region, and \(D(A,\sigma )=\{\mathbf {x}|\mathbf {x}\in \Omega ,\mathrm{dist}(\mathbf {x},A)\le \sigma \}\).

Figure 14 plots the average precision, recall and \(F\)-measure versus the location tolerance . We can see that the three curves of ACOS are above those of FFA, respectively, which indicates that ACOS outperforms FFA on extraction accuracy. When the tolerance is smaller than 3, the curves of FFA drop more sharply than the curves of ACOS. This is because the extracted cracks of FFA contain more discontinuities and burrs than those of ACOS. On the other hand, when the tolerance is larger than 3, the precision of FFA becomes almost flatten whereas that of ACOS still ascends slowly. This fact implies that the wrongly labeled points of FFA scatter more randomly whereas those of ACOS mainly distribute near cracks. Therefore, our method can locate the cracks more accurately. The curves of DBSCAN \(+\) GCS are significantly lower than those of ACOS. This verifies the effectiveness of our new clustering criterion.

Fig. 14
figure 14

Curves of precision (a), recall (b) and \(F\)-measure (c) versus location tolerance

Crack width is very crucial for the health assessment of concrete structures, therefore, a successful crack extraction method should be able to provide dependable width information. To compute the average width of an extracted crack, we adopt the following classical method, which is still used in very recent literature [30]. Because crack length is far lager than crack width, half of the perimeter of crack region can approximate the crack length. Then the average crack width can be estimated by the ratio of crack area to length. Notice that the extraction results of FFA and DBSCAN \(+\) GCS contain many noise points, so the above method for computing crack width is not applicable to them. However, according to visual examination, many cracks extracted by FFA are more than two times the widths of the ground truth cracks. In Table 1, we list the average widths computed from the ground truth cracks and the ones from the extraction results of ACOS. Compared to the ground truth widths, the mean absolute percentage error of ACOS on all the test images is 29 %. Considering that the denominator, i.e., real crack width, is very small, this result is acceptable. The width error is mainly caused by the stones adhering to cracks and the vague crack boundaries.

Table 1 Crack widths (pixel)

In Table 2, we record the running times of the three methods for each example image. The average running time of ACOS on the test images is roughly twice that of DBSCAN \(+\) GCS. The FFA method is much slower because it has to compute four minimal paths for each pixel, which is very time-consuming. The computational complexity of ACOS is mainly contributed by the procedure of anisotropic clustering. As mentioned in Sect. 2.3, the complexity depends on the number of candidate points and the sizes of anisotropic neighborhoods, which are changing from point to point. Therefore, we can only approximate its upper bound by \(O(mnK\ln (K))\), where \(m\) is the iteration number of the clustering procedure, \(n\) is the number of candidate points, and \(K\) is the maximum size of anisotropic neighborhoods. Generally, \(m\) is less than 10 and \(K\) less than 100.

Table 2 Running times (s)

4.2 Choices of parameters

There are four user-determined parameters in the proposed method. The parameters \(\mu \) and \(\beta \) in Eq. 11 controls the segmentation detail level and the penalty effect of spatial constraint, respectively. The integer \(k\) in Eq. 10 controls the sizes of speckles that will appear in the candidate points by smoothing out the speckles whose sizes are much smaller than \(k\). Besides, it also controls the smoothness of the crack bottoms. The elongatedness threshold \(\phi \) directly determines which clusters will be sifted out as crack clusters. The optimal values for these parameters are \(\mu =400\), \(\beta =5\), \(k=15\) and \(\phi =0.5\). We test the parametric sensibility by varying one parameter at a time while fixing the other parameters. Consequently, as shown in Fig. 15, the curves of precision, recall and \(F\)-measure at a certain tolerance versus the varying parameter can be used to evaluate the parametric sensibility quantitatively. In this case, we set the tolerance \(\sigma \) to one pixel. We can see that all the four parameters can vary in a relatively wide range without changing the \(F\)-measure significantly. However, \(\mu \) and \(\phi \) can reduce the precision and increase the recall significantly as they become large. This is because larger \(\mu \) will provide more candidate points and larger \(\phi \) will reserve more clusters. Moreover, in defect detection, recall is usually more important than precision. If two values result in approximately equal \(F\)-measures, we should choose the value corresponding to the higher recall.

Fig. 15
figure 15

Curves of precision, recall and \(F\)-measure versus varying parameter

5 Conclusions

In this paper, we developed an anisotropic clustering method for extracting small cracks from noisy concrete background. It treats crack images as parametric surfaces and then groups candidate points into arbitrarily shaped clusters by exploiting the geometric properties on these surfaces. The clusters corresponding to cracks can be easily sifted out according to their elongated shapes. By virtue of the integration of the clustering method with the globally convex segmentation, candidate points and some key parameters can be determined automatically. Experimental results demonstrate that the cracks extracted by our method are very similar to manually traced ground truth cracks. The crack widths computed from the extracted cracks are also acceptable. In future work, we will further explore geometric properties to remove the stones adhering to cracks from the extraction results. Besides, we intend to accelerate the clustering procedure of this method by reducing the number of points for which the neighborhood computation is required.