1 Introduction

Hardness, an important characteristic of solid materials, can be determined with the Vickers hardness test. A pyramidal indenter is pressed into the material with a defined force and causes an indentation. To determine the Vickers hardness [1], which depends on the applied force and the diagonal length, the size (diagonal length) of the approximately square indentation must be measured. As the manual measurement of indentation images is not only expensive but also interpretive and subjective, a robust and accurate automated measurement method is highly beneficial. Segmentation algorithms are used to obtain the positions of the four vertices of the indentation. Once the four vertices are known, the diagonal lengths can be calculated easily.
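For reference, the relation between the measured quantities and the hardness value is the standard Vickers definition (quoted here as a reminder; it is not restated in the cited text):

$$\begin{aligned} \mathrm{HV} = \frac{2F\sin(136^\circ/2)}{d^2} \approx 1.8544\,\frac{F}{d^2}, \end{aligned}$$

where \(F\) is the applied force in kilograms-force and \(d\) is the mean of the two measured diagonal lengths in millimetres.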

There are several proposals for image segmentation of Vickers indentations. One group of algorithms relies on wavelet analysis [2, 3]. These methods assume that object borders are perfectly straight lines, which is not always true. Another approach [4] is based on edge detection followed by a Hough transform and a least squares approximation of lines. Edge finding techniques rely on the assumption that high differences between neighboring pixels imply that these pixels are part of the border; especially in the case of noisy images, this assumption does not hold. The method introduced in [5] applies thresholding followed by a Hough transform. If thresholding is applied, a threshold value that depends on the current image must be calculated, because a fixed threshold surely does not lead to good results. Often a separation of the object from the background with thresholding is not possible at all, since the assumption of different gray values between object and background is not valid. Other suggested methods also binarize the image using thresholding [6, 7], followed by morphological closing. Another approach is based on axis projection and the Hough transform [8]; it assumes that the objects are perfectly aligned (diagonals vertical/horizontal). Methods relying on template matching [9, 10] are quite robust to noise, because large templates suppress noise as large regions are summed up. The template-matching approach introduced in [10] provides robust as well as precise results, but requires an accurate alignment (diagonals horizontal/vertical). A high degree of accuracy is achieved by applying four corner templates instead of one complete square template [9] (square templates of different sizes and rotations are matched with the image). The results of the approach using complete square templates [9] are robust but only serve as approximations; a refinement strategy [11] adds accuracy.

In this work, we first investigate active contour approaches with reference to Vickers indentation images. The aim of these approaches is that a contour with a defined initialization converges to the real object boundaries. In experiments, different energy functionals, which depend on pixel values of the image on the one hand and homogeneity criteria of the contour on the other hand, are utilized. We found that this technique suffers from poor initializations and we illustrate the reason for this behavior. To achieve appropriate initializations, a flexible Shape Prior approach, which produces highly robust but not accurate results, is investigated [12]. In the two-stage segmentation approach, these approximative but robust results serve as initializations for a subsequent active contour. Moreover, active contours as well as the Shape Prior approach are combined with the existing Shape from Focus approach [13], which extracts additional information from a series of images (of the same indentation). This increases the robustness of segmentation, especially for low quality images. Furthermore, a gradual enhancement approach [14] based on different unfocused images is investigated, in order to decrease the overall runtime while maintaining the robustness. With new image data, different combinations of the mentioned approaches are evaluated in a uniform way in order to compare the segmentation accuracy.

All these approaches, which are explained in the consecutive sections, are summarized in Fig. 1. The proposed variable two-stage segmentation approach defines different initialization stages (1A–1C) and enhancement stages (2A–2C). One initialization and one enhancement stage must be chosen, and lastly the local Hough transform has to be applied. For high quality images, we propose methods 1A and 2A. For lower quality images, the segmentation robustness can be increased if method 1B (Shape from Focus) is used in the first stage. Very low quality images additionally benefit if method 2B is used. To decrease the execution runtime, method 1C can be used for all images (followed by 2A).

Fig. 1 Block diagram: two-stage approach with different methods for each stage

We found that whereas a certain degree of inaccuracy in the first, approximative stage does not affect the overall segmentation accuracy, outliers cannot be refined in the precise stage 2. In the following, robustness means that the outlier ratio is low. Experiments showed that a vertex position can be refined without a loss of accuracy up to a detection error of about 50 pixels (in stage 1). Consequently, we set the outlier threshold to 50 pixels.

In Sect. 2, different active contour approaches (stage 2, especially 2A and 2C) are investigated with reference to Vickers indentation images. In Sect. 3, the new approximative Shape Prior segmentation approach (stage 1, especially 1A) is introduced to achieve good initializations. In Sect. 4, the Shape from Focus approach [15] is combined with traditional segmentation methods (stages 1B, 2B) in order to improve robustness. In Sect. 5, a new gradual enhancement approach is introduced (stage 1C). Experiments are presented in Sect. 6. Section 7 concludes this paper.

1.1 Vickers images

The images to segment approximately fit the following description:

  • Square geometry of object to segment

  • Dark object, bright background

  • Diagonals are horizontally and vertically aligned

  • Object is situated close to the center.

Figure 2a shows quite perfect images. The images in Fig. 2b show different kinds of noise. Some bright images lack contrast (Fig. 2c); in particular, the gray value of the diagonals is the same as that of the background. Another inconvenience is that in such images, possible noise is often darker than the imprint. In Fig. 2d, the diagonals are not aligned horizontally and vertically. In Fig. 2e, a concave and a convex curvature are shown. The images in Fig. 2f represent the smallest and largest imprints in our databases.

Fig. 2 Different categories of Vickers indentation images

2 Active contours approaches

2.1 State of the art methods

The traditional active contours (snake) model has been introduced in [16]. A snake is a closed curve which iteratively converges to the object’s borders by means of gradient descent of an energy functional. The energy is computed from image pixel values (e.g. gradients) and the homogeneity of the contour. The curve is represented parametrically by a sequence of x and y coordinate pairs connected with straight line segments.

The level set method introduced in [17] is an alternative to the traditional snake model. Apart from other inconveniences, traditional active contours suffer from an explicit parametrization of the contour (by frontier points). In the level set formulation, the contour is given by the zero level set \(\Gamma \) of a function \(\phi \):

$$\begin{aligned} \Gamma = \{ (x,y) | \phi (x,y) = 0 \}. \end{aligned}$$
(1)

\(\phi (x,y)\) is a function which is \(1\) inside, \(-1\) outside of the region and exactly \(0\) at the frontier of the evolved shape.

Evolution of the frontier happens by moving the level set \(\Gamma \) in the normal direction \(\frac{\nabla \phi }{||\nabla \phi ||}\) with a specified speed \(v\). There exist many different ways of calculating the speed function \(v\), which influences the behavior of the evolving level set.
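As a purely illustrative sketch (not code from the paper), one explicit evolution step of \(\phi \) with a precomputed pixel-wise speed field `v` could look as follows; the sign convention depends on whether \(\phi \) is positive inside or outside the region:

```python
import numpy as np

def level_set_step(phi, v, dt=0.5):
    """One explicit update of the level set function: the zero level set
    moves in its normal direction with pixel-wise speed v."""
    gy, gx = np.gradient(phi)               # central differences
    grad_norm = np.sqrt(gx ** 2 + gy ** 2)
    return phi + dt * v * grad_norm         # phi_t = v * |grad(phi)|
```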

As with the snake approach, edge-based level set approaches [18] require the propagation of edges to increase the capture range (the region from which the contour converges to the real boundary). To bypass these issues, a region-based approach has been introduced in [19], where the force of the contour is not based on image gradients. This method is based on the assumption that the object’s surface as well as the surface outside of the object is homogeneous as far as its gray value is concerned:

$$\begin{aligned} E_\mathrm{CV}&= \int \limits _{\Gamma _\mathrm{in}} (I(v) - \bar{I_\mathrm{in}})^2 \mathrm{d}v + \int \limits _{\Gamma _\mathrm{out}} (I(v) - \bar{I_\mathrm{out}})^2 \mathrm{d}v \nonumber \\&+\, \lambda \int \limits _{\Gamma } ||\nabla \phi (v)|| \mathrm{d}v . \end{aligned}$$
(2)

\(\Gamma \) is the image domain, \(I\) is the image gray value, \(\bar{I_\mathrm{in}}\) (\(\bar{I_\mathrm{out}}\)) is the average image value inside (outside) of the contour, \(\Gamma _\mathrm{in}\) (\(\Gamma _\mathrm{out}\)) is the region inside (outside) of the contour, \(\nabla \) is the gradient operator and \(\lambda \) is the curvature weighting term.
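A minimal numerical sketch of this energy (our own illustration of Eq. (2), not the authors' implementation; the contour length term is approximated by the total variation of the inside indicator) could read:

```python
import numpy as np

def chan_vese_energy(img, phi, lam=0.1):
    """Region-based energy of Eq. (2): gray value variance inside and
    outside the contour plus a weighted contour length term."""
    inside = phi > 0                                    # convention: phi > 0 inside
    fit_in = ((img[inside] - img[inside].mean()) ** 2).sum()
    fit_out = ((img[~inside] - img[~inside].mean()) ** 2).sum()
    # approximate the length term via the total variation of the indicator
    gy, gx = np.gradient(inside.astype(float))
    length = np.sqrt(gx ** 2 + gy ** 2).sum()
    return fit_in + fit_out + lam * length
```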

The region-based approach assumes that images do not necessarily have strong gradients at the object boundaries, but that the regions inside and outside of the contour are homogeneous as far as the gray value is concerned. This assumption usually is quite appropriate; however, it is inappropriate for some kinds of images (e.g. the image in Fig. 3: the background consists of regions which are darker than the object).

Fig. 3 Indentation image with inhomogeneous background

Consequently, a region-based model would state that such dark noise pixels are more likely to be part of the object than part of the background (as the average background color is brighter). Obviously, this does not match reality. To overcome this inconvenience, a statistical approach has been introduced [20]:

$$\begin{aligned} E_\mathrm{stat}&= - \int \limits _{\Gamma _\mathrm{in}} \log p_\mathrm{in}(f(v)) \mathrm{d}v\nonumber \\&-\int \limits _{\Gamma _\mathrm{out}} \log p_\mathrm{out}(f(v))\mathrm{d}v + \alpha |C|. \end{aligned}$$
(3)

The regularization term \(\alpha |C|\) prevents the contour from developing zigzag patterns. \(p_\mathrm{in}\) and \(p_\mathrm{out}\) are the probabilities of the feature vectors \(f\) inside and outside of the contour. Intuitively, the energy is low, if both regions are homogeneous with respect to the feature vectors \(f\).

This formulation is not restricted to the gray value as feature: any feature which can be defined for a specific pixel may be included in an arbitrary feature vector. In our experiments, the following feature vectors are used:

$$\begin{aligned} f(v)&= (I(v),|| \nabla I(v) ||) ;\end{aligned}$$
(4)
$$\begin{aligned} f(v)&= \left(I(v),\frac{\mathrm{d}I(v)}{\mathrm{d}x},\frac{\mathrm{d}I(v)}{\mathrm{d}y}\right) . \end{aligned}$$
(5)
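The following sketch (our own illustration, not the authors' implementation) evaluates the statistical energy of Eq. (3) for a scalar feature image, estimating \(p_\mathrm{in}\) and \(p_\mathrm{out}\) with simple histograms; for the vector-valued features of Eqs. (4) and (5), a joint histogram or a kernel density estimate would be used instead, and the regularization term \(\alpha |C|\) is omitted for brevity:

```python
import numpy as np

def statistical_energy(feature, inside_mask, bins=64, eps=1e-8):
    """Negative log-likelihood energy of Eq. (3) for a scalar feature image."""
    f_in, f_out = feature[inside_mask], feature[~inside_mask]
    edges = np.linspace(feature.min(), feature.max(), bins + 1)
    p_in, _ = np.histogram(f_in, bins=edges, density=True)    # estimate p_in
    p_out, _ = np.histogram(f_out, bins=edges, density=True)  # estimate p_out
    idx_in = np.clip(np.digitize(f_in, edges) - 1, 0, bins - 1)
    idx_out = np.clip(np.digitize(f_out, edges) - 1, 0, bins - 1)
    return (-np.log(p_in[idx_in] + eps).sum()
            - np.log(p_out[idx_out] + eps).sum())
```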

2.2 Vulnerability to initialization

First, we discuss the edge-based approaches which are based on image gradients. That means the contour moves in the direction in which the gradients increase. Close to object boundaries, this actually is appropriate. However, the approximate location of the object is not known in advance. In Fig. 4 (left), the gradient image and a possible initial contour are shown. A gradient descent of the contour would surely never be successful here, as the capture range is too small. The segmentation of noisy images suffers especially, because the gradient image shows many regions with low image energy (Fig. 4, right) to which the contour potentially converges.

Fig. 4 Problems: small capture range (left), noisy gradient image (right)

To increase the capture range and decrease the effect of noise, large edge operators instead of small ones (Sobel \(3 \times 3\)) could be utilized. This propagates the edge information and thus makes a gradient descent of the contour possible. However, the size of the gradient operator is limited, as small objects are blurred too much and, moreover, the computational costs become high. The effect of differently sized edge operators is shown in Fig. 5. However, although a large gradient operator is used in the right image, the segmentation fails.

Fig. 5 Differently sized gradient operators and the impact on the capture range

In [21, 22], strategies to increase the capture range are proposed. One big problem of these methods is that not only edge information is propagated but also noise. Noisy edge images affect the active contours as shown in Fig. 6.

Fig. 6 Segmentation failed because of noisy edge image

Although region-based approaches (and the statistical approach), which do not rely on gradients, are less vulnerable to the initial configuration, starting with a general level set is often not successful either. On the one hand, even region-based algorithms do not succeed in converging to the desired boundary if the contour is too far away from the indentation. The contour often does not shrink to converge at the boundaries, but grows instead, because the average gray value of the background might be darker than the average gray value within the contour (shown in Fig. 7). On the other hand, if such long distances must be covered, the computational costs are tremendous.

Fig. 7 Region-based approach fails (left: initialization, right: wrong segmentation)

3 An approximative Shape Prior method

The methods mentioned so far suffer from converging to local minima if the initialization is inappropriate. In existing Shape Prior level set approaches [23, 24], a weighted shape term is added to the energy function. This increases the robustness, but even these methods are inappropriate for a standalone segmentation; the same problem as shown in Fig. 7 arises. We propose a quite different way of robustly segmenting shapes which are known a priori [12].

Our approach requires a parametric description of the prior shape. The segmented object will have exactly the prior shape, as not a contour (parametrized by points or a level set) but the object description parameters are directly evolved by means of gradient descent. Whereas traditional active contours as well as level set algorithms allow arbitrary deformations of the initial contour, our approach only allows the evolution of the following four parameters (their effect on the contour is shown in Fig. 8):

  1. \(r_0\): scaling

  2. \(x_0\): translation x axis

  3. \(y_0\): translation y axis

  4. \(\alpha _0\): rotation

Fig. 8 The four degrees of freedom

The contour of the square is given by the points \((x,y)\) with the distance \(d = r_0\) to a center \((x_0,y_0)\). \(d\) is calculated in the following way to ensure that the evolved contour has a square shape:

$$\begin{aligned} d&= |(x-x_0) \cdot \cos (\alpha ) + (y-y_0) \cdot \sin (\alpha )| \nonumber \\&+|(x-x_0) \cdot \sin (\alpha ) - (y-y_0) \cdot \cos (\alpha )|. \end{aligned}$$
(6)

Of course, this algorithm will not be able to segment Vickers images perfectly, as the shape of a Vickers indentation often cannot be described by a perfect square. However, this is not our objective; we aim at a good approximative segmentation of a very high rate of images. These approximations serve as the initializations for a precise segmentation method.

In order to reduce the computational effort, we first downscale the indentation images by a factor of 10. Although this causes a further loss of accuracy, the results after the precise segmentation stage are not affected.

The regions in- and outside of the square are given by \(\Gamma _\mathrm{in}\) and \(\Gamma _\mathrm{out}\):

$$\begin{aligned}&\Gamma _\mathrm{in} = \{(x,y) : d < r_0 \};\end{aligned}$$
(7)
$$\begin{aligned}&\Gamma _\mathrm{out} = \{(x,y) : d > r_0 \}. \end{aligned}$$
(8)
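A direct translation of Eqs. (6)–(8) into a small helper (hypothetical, for illustration only) computes \(d\) for every pixel of a grid and derives the two regions:

```python
import numpy as np

def square_regions(shape, x0, y0, r0, alpha):
    """Split the pixel grid into the regions inside/outside the square
    parametrized by (x0, y0, r0, alpha), following Eqs. (6)-(8)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    d = (np.abs((xs - x0) * np.cos(alpha) + (ys - y0) * np.sin(alpha)) +
         np.abs((xs - x0) * np.sin(alpha) - (ys - y0) * np.cos(alpha)))
    return d < r0, d > r0   # Gamma_in, Gamma_out
```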

As with the level set approach, we define an energy criterion which is minimized by gradient descent. We investigated different energy functions (edge based, region based). Tests showed that the following statistical criterion, which is derived from the statistical level set approach proposed in [20], is the best choice:

$$\begin{aligned} E = - \int \limits _{\Gamma _\mathrm{in}} \log (p_{\Gamma _\mathrm{in}}(f(v))) \mathrm{d}v - \int \limits _{\Gamma _\mathrm{out}} \log ( p_{\Gamma _\mathrm{out}}(f(v))) \mathrm{d}v. \end{aligned}$$
(9)

\(f(v)\) is an arbitrary feature of the point \(v\). We investigate the feature vectors in Eqs. (4) and (5). The evolved parameters are collected in the vectors \(s_i =(x_0, y_0, r_0, \alpha )\). The vector \(s_0\) is the initialization. \(s_{n+1}\) is defined recursively:

$$\begin{aligned} s_{n+1} = s_{n} + \lambda ( \nabla E). \end{aligned}$$
(10)

\(\lambda \) is defined to be the signum function:

$$\begin{aligned}&\lambda ((x_1,\ldots ,x_n)^\mathrm{T}) = (\mathrm{sign}(x_1),\ldots , \mathrm{sign}(x_n))^\mathrm{T};\end{aligned}$$
(11)
$$\begin{aligned}&\nabla E = \left(\frac{\mathrm{d}E}{\mathrm{d}x}, \frac{\mathrm{d}E}{\mathrm{d}y}, \frac{\mathrm{d}E}{\mathrm{d}r}, \frac{\mathrm{d}E}{\mathrm{d}\alpha }\right)^\mathrm{T}. \end{aligned}$$
(12)

For example, the partial derivative with respect to the \(x\) dimension is calculated as:

$$\begin{aligned} \frac{\mathrm{d}E}{\mathrm{d}x}((x,y,r,\alpha )^\mathrm{T})&= E((x+1,y,r,\alpha )^\mathrm{T}) \nonumber \\&-E((x-1,y,r,\alpha )^\mathrm{T}). \end{aligned}$$
(13)

Although the introduced approach is already able to deal with local minima caused by noise, we still have not achieved total invariance to the initialization \(s_0\). Local minima still prevent a proper localization of the indentation in several cases. The balloon approach [25], introduced for active contours, deals with this problem by adding an energy term that forces the contour to become smaller or larger. Our approach allows applying a kind of balloon force in an easy but effective way. Instead of calculating the radius \(r_0\) by gradient descent, \(r_0\) is simply decreased by one in each iteration of the gradient descent. If the contour starts at the image boundaries (\(r_0\) is large), it necessarily has to cross the object’s boundaries when getting smaller and smaller.

Unlike unforced gradient descent, the proposed balloon method does not stop before \(r_0\) becomes zero (or a defined minimum). In a second step, the history of the gradient descent has to be analyzed to get the best fitting vector \(s_\mathrm{res}\) from a set of several local minima. In our case, the best results are achieved when using the vector \(s_i\) with the highest response (achieved by convolution) of the image information to the template (parametrized by \(s_i\)) shown in Fig. 9, with a thickness of 3 pixels.
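Putting Eqs. (10)–(13) and the balloon modification together, a possible sketch (our own; `energy` is assumed to evaluate Eq. (9) for a given parameter vector, and the angular step size is an assumption) reads:

```python
import numpy as np

def shape_prior_descent(energy, s0, r_min=5, angle_step=np.deg2rad(1)):
    """Balloon-style descent sketch for the square prior of Sect. 3:
    x0, y0 and alpha follow the sign of a central-difference gradient
    (cf. Eqs. 10-13), r0 is decreased by one per iteration, and the whole
    parameter history is kept so the best-fitting square can be selected
    afterwards (template response of Fig. 9 in the paper)."""
    x0, y0, r0, a0 = s0
    history = []
    while r0 > r_min:
        # finite-difference sign steps for x0, y0 and alpha (Eq. 13)
        dx = np.sign(energy((x0 + 1, y0, r0, a0)) - energy((x0 - 1, y0, r0, a0)))
        dy = np.sign(energy((x0, y0 + 1, r0, a0)) - energy((x0, y0 - 1, r0, a0)))
        da = np.sign(energy((x0, y0, r0, a0 + angle_step))
                     - energy((x0, y0, r0, a0 - angle_step)))
        # step towards lower energy; r0 shrinks unconditionally (balloon force)
        x0, y0, a0 = x0 - dx, y0 - dy, a0 - da * angle_step
        r0 -= 1
        history.append((x0, y0, r0, a0))
    return history
```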

Fig. 9 Directed edge template (thickness 1)

4 Including Shape from Focus

The approaches mentioned so far rely on one single image, which has to be segmented. All the information must be gathered from this single two-dimensional signal. However, the real world is three-dimensional and cannot be fully described in two dimensions. To be able to exploit 3D information as well, we utilize the Shape from Focus approach [15]. This method uses a set of images with different focus setups (as shown in Fig. 11) to generate a depth map.

In order to acquire focused images, Vickers hardness testing facilities rely on autofocus systems. The autofocus system takes pictures, computes the focus metric [26] and moves the camera by one step until the peak of the focus metric (i.e. the focused image) is reached. Consequently, a number of images of the same indentation is already available, which can be utilized to compute the shape.

4.1 Shape computation

Intuitively, the Shape from Focus approach exploits the fact that focus can only be achieved at a specific distance from the camera. That means it is not possible to focus an object in the foreground as well as in the background simultaneously. Consequently, if pictures of a three-dimensional object are taken with different focus setups, information about the third dimension (depth) can be obtained, as different regions are focused in the different images.

First of all, to compute the shape of an object, a series of images \(I_k\) of the object with different focus levels \(k \in L\) must be gathered (\(L\) is the set of focus levels). After that, for each point \(v=(x,y)\) in each image \(I_k\) of the focus series, a focus measure \(F_k(v)\) must be computed. Next, for each point \(v\), the focus level \(k \in L\) with the highest focus measure \(F_{k}(v)\) is determined. Each focus level \(k\) represents a defined depth level \(d\):

$$\begin{aligned} d(v) = k \in L : \forall l \in L : F_k(v) \ge F_l(v). \end{aligned}$$
(14)

Although the depth is not measured absolutely, relative differences are sufficient to determine peaks and valleys of the surface.
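Equation (14) amounts to a per-pixel argmax over the focus levels; a compact sketch (our own, with `focus_measure` standing for any of the operators discussed in Sect. 4.2) is:

```python
import numpy as np

def depth_from_focus(image_stack, focus_measure):
    """image_stack: images of the same scene at focus levels 0..K-1.
    Returns, per pixel, the index of the level with maximal focus
    measure, i.e. the relative depth level of Eq. (14)."""
    responses = np.stack([focus_measure(img) for img in image_stack], axis=0)
    return np.argmax(responses, axis=0)
```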

4.2 Focus measures

As mentioned, a focus measure \(F\) is necessary to determine the focus level \(k\) with the highest response. In [15], the sum-modified-Laplacian (SML) operator \(F_\mathrm{SML}\) is proposed, which is based on the second-order derivative:

$$\begin{aligned} F_\mathrm{SML}(i,j) = \sum _{x=i-N}^{i+N} \sum _{y=j-N}^{j+N} \mathrm{ML}(x,y), \quad \mathrm{if} \; \mathrm{ML}(x,y) \ge T; \end{aligned}$$
(15)
$$\begin{aligned} \mathrm{ML}(x,y) = | 2 \cdot I (x,y) - I(x-s, y) - I(x+s,y) | + | 2\cdot I (x,y) - I(x,y-s) - I(x,y+s) |. \end{aligned}$$
(16)

\(s\) is the step size of the metric, which can be adjusted according to the image properties.

The SML operator consists of more than a simple gradient operator. To increase the robustness, a threshold \(T\) is introduced, which suppresses very small responses. Moreover, neighboring pixel responses are summed up to achieve a more steady output (adjustable with \(N\)).
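A possible implementation of Eqs. (15)–(16) (a sketch under the parameter names used above; the wrap-around border handling of `np.roll` is a simplification) is:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sml(img, s=3, T=7, N=1):
    """Sum-modified-Laplacian focus measure (Eqs. 15-16)."""
    img = img.astype(float)
    # modified Laplacian with step size s (borders wrap around here)
    ml = (np.abs(2 * img - np.roll(img, s, axis=1) - np.roll(img, -s, axis=1)) +
          np.abs(2 * img - np.roll(img, s, axis=0) - np.roll(img, -s, axis=0)))
    ml[ml < T] = 0.0                                   # suppress small responses
    win = 2 * N + 1
    return uniform_filter(ml, size=win) * win * win    # sum over (2N+1)^2 window
```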

Alternatively, we investigate a generalization of the Tenengrad focus measure \(F_\mathrm{T}\) [26], which is based on the first-order derivative and could be used instead of the proposed SML measure:

$$\begin{aligned} F_\mathrm{T}(i,j) = \sum _{x=i-N}^{i+N} \sum _{y=j-N}^{j+N} T(x,y),\quad \mathrm{if} \; T(x,y) \ge T. \end{aligned}$$
(17)

\(T(i,j) = S_x^{*2}(i,j) + S_y^{*2}(i,j)\), where \(S_x^*\) and \(S_y^*\) are the convolutions of the Sobel operators in the x and y direction with the image.

Moreover, we investigate the range metric, which is based on the histogram:

$$\begin{aligned}&F_\mathrm{range}(i,j) = \max (r(i,j)) - \min (r(i,j));\end{aligned}$$
(18)
$$\begin{aligned}&r(i,j) = \{ (x,y) | (|x-i| + |y-j|) \le T \}. \end{aligned}$$
(19)

\(T\) defines the size of the considered region.

4.3 Introduction of Shape from Focus knowledge into segmentation

As shown in Fig. 10, the depth estimation based on the Shape from Focus approach often is not able to appropriately separate the indentation from the background. The depth in the regions marked white cannot be reliably determined (i.e. the region is homogeneous). In the remaining regions, the darker the region, the farther away it is.

Fig. 10 Original images (top) and corresponding depth information (bottom): reliable (left) and unreliable (middle, right) depth information

Especially in the case of high quality images, the shape information often is very unreliable, whereas in the case of (noisy) low quality images, the shape information seems to be more reliable than the original image. Consequently, we combine the active contours and the Shape Prior method with the Shape from Focus approach [13]. We concentrate on the statistical energy criteria [Eqs. (3), (9)], which (unlike the other criteria) allow the inclusion of arbitrary feature vectors.

We investigate the following feature vectors, instead of the traditional ones in Eqs. (4) and (5):

$$\begin{aligned} f(v)&= (I(v), D_I(v)); \end{aligned}$$
(20)
$$\begin{aligned} f(v)&= (I(v), ||\nabla I(v)||, D_I(v)). \end{aligned}$$
(21)

\(D_I\) is the depth image generated with the Shape from Focus approach. The new feature vectors can be used straightforwardly in the statistical Shape Prior method as well as in the statistical active contours method. The only difference is that we use downscaled images in the Shape Prior approach. That means the depth image \(D_I\) also has to be calculated from the downscaled images, which is significantly less computationally expensive.

5 Gradual enhancement with unfocused images

The following two facts motivate us to propose one more approach dealing with differently focused images [14]:

  • Actually, with appropriately unfocused images, the segmentation robustness can even be improved (see Sect. 6). This is because the unfocused images are not just similar to low pass filtered images, but in addition, the indentation is reinforced if the focus plane is chosen appropriately (see Fig. 11, left images).

  • Furthermore, the autofocus system takes a significant amount of time to provide the focused image. To get the focused image, our system starts with a focus plane farther away than the surface and incrementally gets nearer.

Fig. 11 Different focus settings, ranging from \(fl \ll 0\) (left) to \(fl \gg 0\) (right)

Active contours incrementally converge to the object boundaries. That means the image may be changed during the segmentation process. Our intention is to start with unfocused images, which can be segmented robustly. Incrementally, when a “better” image is available, the segmentation approach continues to process the new image in order to refine the results. The focused image is defined to have focus level zero (\(fl = 0\)). If the focus plane is farther away from the camera than the specimen, the focus level is smaller than zero and vice versa. The chosen step size between two focus levels is explained in Sect. 6.

We propose the following three steps based on the Shape Prior gradient descent approach:

  1. The focus starting setting is chosen such that the focus plane is farther away than any part of the specimen (\(fl \ll 0\)).

  2. Start the proposed first stage gradient descent segmentation algorithm (see Sect. 3) on the unfocused image which is taken with the mentioned focus setting. Approximative results are achieved.

  3. Until the end-criterion is reached:

    • Increase the focus level by one step and get the image.

    • Initialize the gradient descent algorithm with the current approximative results and the new image.

    • Increase the initialization variable “radius” by e.g. 2 pixels; as a balloon force is used, the contour could otherwise only shrink.

    • Run the algorithm with only 5 iterations to enhance the approximative results.

    • New approximative results are achieved.

The first image to segment is highly unfocused. Consequently, an exact segmentation surely cannot be achieved. However, the blurred image can be segmented robustly. Whereas the first image is segmented as proposed in Sect. 3, the enhanced images are not: they are initialized according to the current approximative results and only 5 iterations of the gradient descent approach are applied. The proposed policy allows the segmentation to start before the final focused image is available.
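The three steps can be summarized in the following control-flow sketch (our own illustration; `acquire_image`, `shape_prior_descent` and `refine` are hypothetical stand-ins for the camera interface, the stage-1 algorithm of Sect. 3 and its short re-run):

```python
def gradual_enhancement(acquire_image, shape_prior_descent, refine,
                        start_level, end_level, step, radius_growth=2):
    """Gradual enhancement (Sect. 5): segment a strongly unfocused image
    first, then refine the square parameters on progressively better
    focused images; the result history is kept (cf. Sect. 5.1)."""
    fl = start_level                             # fl << 0: focus plane far away
    s = shape_prior_descent(acquire_image(fl))   # full stage-1 run, approximative
    history = [s]
    while fl < end_level:                        # end-criterion, see Sect. 5.1
        fl += step
        img = acquire_image(fl)
        x0, y0, r0, a0 = s
        # re-inflate the radius slightly: with the balloon force the square
        # could otherwise only shrink during the short refinement run
        s = refine(img, (x0, y0, r0 + radius_growth, a0), iterations=5)
        history.append(s)
    return history
```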

5.1 Appropriate end-criterion

The intention is that the results could be enhanced until the focused image is reached. Actually, the best results are achieved when stopping with an image of a focus level below zero. In practice, this is not directly possible, as the focus levels are defined relative to the focused image (which is not known a priori). However, when saving the result history, these results can be recovered.

5.2 Speeding up the initial segmentation

Whereas an enhancement step (step 3) is fast, the initial step 2 takes quite a long time. As the initial contour starts at the boundary of the image, shrinks by one pixel per iteration and has to shrink until it collapses, about \(\frac{h}{2}\) iterations are necessary (\(h\) is the image height).

Whereas a further reduction of the image size affects the segmentation accuracy, increasing the step size of the contour does not, as far as robustness is concerned. Instead of modifying the evolving shape parameters by one per iteration, we propose to increase the step size (i.e. in one iteration, each parameter is adjusted by the positive or negative step size or stays the same). Increasing the step size to e.g. 4, we achieved less accurate results after the initial segmentation step, but after the enhancement steps, the results were exactly the same (the results are shown in Sect. 6).

6 Experiments

6.1 Database

For testing, two different databases with 150 (DB1) and 216 (DB2) images (resolution \(1,\!280 \times 1,\!024\) pixels), provided to us, were used. These focused images have been evaluated manually with respect to identifying the four vertices by four experts independently. The ground truth was determined by taking the mean of all four measurements. Moreover, one more database (DB3), consisting of 25 indentation series of 40 differently focused images each, is used to evaluate the Shape from Focus and the gradual enhancement approach. The quality of these images is considerably lower. The ground truth of these images has been determined by two people.

Our aim is to detect the four vertices of the approximately square Vickers indentations. In the following analyses, the distances between detected vertices and the ground truth are measured. In these figures, for each deviation bin (Euclidean distance in pixels) on the x axis, the number of vertices detected within this deviation is shown on the y axis.
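For clarity, the reported curves can be reproduced with a few lines (our own sketch; `detected` and `ground_truth` are assumed to be arrays of shape images × 4 vertices × 2 coordinates):

```python
import numpy as np

def deviation_counts(detected, ground_truth, bins):
    """Number of vertices whose Euclidean deviation from the ground truth
    lies within each deviation bin (in pixels)."""
    d = np.linalg.norm(detected - ground_truth, axis=-1).ravel()
    return [(d <= b).sum() for b in bins]
```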

6.2 Traditional two-stage approach without Shape from Focus

The following strategy turned out to be competitive as far as segmentation performance is concerned:

  • Stage 1A, Localization: approximative segmentation with the Shape Prior algorithm (Sect. 3) on downscaled images (factor 10).

  • Stage 2A, Refinement: the precise region-based level set method based on the results (as initializations) of the localization stage.

  • For a further improvement of the segmentation accuracy, the local Hough transform [4, 8] is applied in an area of 60 pixels around the corners as a postprocessing strategy.

The approximative Shape Prior localization stage already is highly robust, as a high rate of vertices can be located within a deviation of, e.g., 50 pixels, as can be seen in Fig. 12a and b. The region-based refinement stage based on the Shape Prior results but without a Hough transform is able to improve the accuracy (e.g., more vertices with deviations \(\le \)5 pixels). With the local Hough transform, the accuracy can be improved even further.

Fig. 12 Results of the overall two-stage approach (leftmost line) and interim results (without refinement)

In Fig. 13a and b, different energy criteria of the refinement stage are compared. The region-based criterion is the best choice. The edge-based criterion is very vulnerable even with good initializations, as small edge operators are used. In this case, the traditional active contours approach (snake) is used instead of the level set approach. The edge-based level set approach is more competitive, but not as competitive as the statistical or the region-based level set approach.

Fig. 13 Comparison of different energy criteria in the second stage

In Fig. 14a and b, we compare the proposed two-stage active contours approach with two existing template-matching approaches [10, 11]. In the high accuracy range (0–1 pixels), our two-stage approach is the best choice. As far as deviations of about 3–8 pixels are concerned, the three-stage template-matching approach [11] is slightly more competitive. However, regarding higher deviations, the corner template-matching approach [10] and our proposed method are more reliable. Over the whole range, the proposed two-stage method seems to be the best alternative. The deduced knowledge seems to be reliable, as both databases show a similar behavior.

Fig. 14 Comparison of our two-stage approach with other known approaches in the literature [10, 11]

6.3 Shape from Focus

We compare the proposed traditional two-stage approach without depth information with the versions that use depth information. As the Shape from Focus approach requires series of images with different focus levels, the image database DB3 is chosen for these experiments. We decided to utilize seven images per indentation with the focus plane farther away than the specimen, plus the focused image.

First, we investigate the impact of the depth information on the first approximative stage. The best results are achieved with the two-dimensional feature vector (Eq. 20). The distributions \(p_\mathrm{in}\) and \(p_\mathrm{out}\) are calculated by convolving the empirical distributions with a Gaussian Parzen window (\(\sigma =2\)).

The choice of the focus measure is not very decisive, as the results are quite similar. The achieved segmentation performance with the different measures is shown in Fig. 15a. The Shape from Focus method in the following experiments is based on the SML focus measure (\(T=7, N=1, s=3\)), which is slightly more competitive as far as outliers are concerned. In Fig. 15b, the results of the statistical method with depth information (Eq. 20) are compared with the statistical method without depth information (Eq. 4). Whereas the number of vertices detected quite exactly (0–25 pixels) is similar, the number of outliers (deviation \(>\)50 pixels) can be decreased significantly with the Shape from Focus information. As we concentrate on a low outlier ratio (i.e. a high degree of robustness) in the localization stage, the achieved results seem to be more appropriate as initialization for the exact segmentation stage.

Fig. 15 Comparisons: changes if Shape from Focus information is used only in the first approximative stage

Now we investigate the impact of the shape information on the second stage. First of all, in Fig. 15c, the impact of the initialization on the traditional region-based level set segmentation is shown. If the method is initialized with the results achieved with the Shape Prior method including the depth information, the results are superior. The depth information used by the Shape Prior approach definitely increases the segmentation performance as far as robustness is concerned.

In Fig. 15d, we compare our methods from Fig. 15c with the corner template-matching approach [10] and the three-stage template-matching approach [11]. The corner template-matching approach definitely suffers from the low quality of the images; its robustness declines considerably. The three-stage template-matching approach is slightly more accurate than the two-stage active contours approach applied to the low quality images. However, the active contours approach additionally based on the shape information delivers even better results than the three-stage approach, especially as far as outliers are concerned.

Now we compare different stage 2 methods (statistical level set approach including shape information, traditional region-based level set method [11] and statistical level set approach without shape information). The methods are initialized with the results achieved with the Shape Prior approach including the depth information (Stage 1B). The results are shown in Fig. 16.

Fig. 16 Comparison: changes if Shape from Focus information is used only in the precise second stage

In contrast to the approximative Shape Prior approach, the results of the precise level set segmentation approaches are more similar to each other. The approach including the depth information seems to be slightly more robust than the region-based approach (regarding deviations of, e.g., \(\le \)40 pixels). However, the region-based approach tends to be more accurate (regarding deviations of, e.g., \(\le \)5 pixels). The statistical approach without the depth information tends to lie between the other mentioned approaches.

To understand this behavior, we consider the different depth images provided by the Shape from Focus method (Fig. 10). Images with low noise and high contrast, which can be segmented well without any depth information (middle and right image), often have poor depth information (large regions without reliable depth information, marked white). Consequently, the segmentation of these images suffers from the additional information. In contrast, the depth information of highly noisy images (left image) usually is quite accurate, so the segmentation performance increases. Such an image is also shown in Fig. 17. In this image, the addition of depth information leads to a successful segmentation.

Fig. 17 Achieved corner points with traditional method (\(\oplus \)) and Shape from Focus based approach (\(\otimes \)) (left) and the corresponding depth information (right)

Consequently, we recommend using the second stage Shape from Focus approach only if the image is very hard to segment by the traditional algorithms, as the execution runtime increases considerably (Sect. 6.5) if the second stage is based on shape information. In the first approximative stage, the additional computational costs are low and the performance rises significantly.

6.4 Gradual enhancement with unfocused images

As this approach also requires differently focused images, DB3 with the low quality images is chosen. The step size between two focus levels is chosen in dependence on the optical zoom of the camera (see Table 1). For example, with a magnification of 10, one focus level less means that the camera with a fixed-focus lens moves 10,000 nm towards the specimen.

Table 1 Focus step size dependent on the optical zoom factor

First of all, we investigate the effect of single unfocused images instead of focused ones on the proposed approximative Shape Prior indentation segmentation approach (traditional stage 1A, with unfocused images). Figure 18a and b shows the results of the approximative methods. In Fig. 18c, the focus plane is farther away from the camera compared with the best-focus setting.

Fig. 18 Comparisons: impact of unfocused images used in the approximative stage 1

The robustness of the segmentation (i.e., the number of outliers stays low) not only remains unchanged but can actually even be increased. The segmentation accuracy (e.g., the ratio of vertices with a deviation of at most 20 pixels) slightly decreases. In contrast, if the focus plane is nearer to the camera compared with the best-focus setting (Fig. 18b), the accuracy and the robustness significantly decrease.

Now we regard the proposed gradual enhancement approach (stage 1C): the results with different focus levels as stopping conditions are shown in Fig. 19a. Although the behavior is similar to the behavior with one single unfocused image (best stopping level: \(fl = -10\)), the effect is smaller. The outlier ratio generally is lower than with the single image approach.

Fig. 19 Comparisons: gradual enhancement approach

In Fig. 19b, the gradual enhancement approach with the best stopping focus level (\(fl = -10\)) is compared with the best results achieved with one single (unfocused) image and with the results with the focused image (traditional stage 1A). As far as the approximative stage is concerned, the gradual enhancement approach definitely is more competitive than the best-focus approach and even more robust (fewer outliers) than the single unfocused image approach.

The results (of different stopping criteria) seem to be more similar compared to the single image approach. However, the impact of the different initialization results on the level set algorithm is considerable, as shown in Fig. 19c. Especially the number of outliers can be decreased considerably when stopping earlier (\(fl = -10\)). The slightly lower accuracy of the first stage can be compensated by the second stage. Consequently, we define the stop level \(-10\) to be the best choice. In Fig. 19c, the achieved results of the best configurations (gradual enhancement and single unfocused image) are compared with the traditional approach with focused images. The performance of the methods using unfocused images is definitely higher than the performance of the simple approach dealing with the focused image. The gradual enhancement approach is even slightly more robust (very few outliers) than the single unfocused image approach.

In Fig. 20, the Shape from Focus and the gradual enhancement approach are compared with the traditional method. Only the first stage varies; the second stage is always the same (region-based level set approach). As shown, with the gradual enhancement approach in the first stage, the robustness of the overall segmentation can be improved once again.

Fig. 20 Comparison: whereas in the first stage the different introduced methods are used, the second stage remains the same (2A)

6.5 Runtimes

In Table 2, the average runtimes of the different approaches are shown. The traditional two-stage approach takes about \(4.3\) s per image. If the Shape from Focus approach is included in the first stage, the overall runtime slightly increases, as the shape information must be computed. If the Shape from Focus approach is included in both stages, the runtime rises significantly, as the shape information must then be computed from the originally sized images, which is computationally expensive.

Table 2 Execution runtimes (Architecture: Intel Core 2 Duo T5500 1.66 GHz)

If the gradual enhancement approach based on unfocused images is used, the traditional Shape Prior approach (2.2 s per image) is replaced by the initial step, which takes 1.0 s, and a number of enhancement steps (0.14 s per step). The runtimes cannot be compared directly, as with this approach the segmentation can start earlier.

7 Conclusion

We showed that active contours are a precise segmentation approach, especially if a good localization of the indentation is already available. The introduced Shape Prior approach provides very robust approximative results, which are used as initializations for the active contours. On the high quality databases, the introduced two-stage approach produces better results than existing approaches. When combining the mentioned approaches with Shape from Focus information, the robustness can be improved even further. Especially the localization stage benefits from the additional 3D information. On the low quality database, the traditional two-stage active contours approach is slightly less competitive than an existing high performing template-matching based method. However, with the shape information, the two-stage active contours approach is even more competitive than this existing approach. Similar results are achieved with the gradual enhancement approach. Nevertheless, the additional advantage of this method is that the overall execution runtime can potentially be reduced when incorporating the approximative segmentation into the autofocus procedure.