1 Originality and contribution

In general terms, image segmentation divides an image into related sections or regions, consisting of image pixels having related data feature values. Among the wide variety of image segmentation approaches, Mathematical Morphology is considered to be a powerful tool for this kind of application, particularly for those involving some kind of geometric analysis. As a result, in our paper, we describe a generic parallel architecture to implement what we call the shifted Mathematical Morphology operations. Then, the MGT segmentation scheme is designed and all the basic morphological operations are revised and adapted for working in a real-world environment. After that, the optimal threshold for each image is selected by means of a binary distance and, finally, a set of experiments is completed to show that our segmentation scheme achieves near-optimal results. The minimum distance between the segmented images and the ideal segmentation provides the optimal threshold, which is obtained when MGT is around 70% of the maximum value of the morphological gradient for the 875-image database used in the experiments.

The main advantages of our method include the combination of the local properties of every pixel and an efficient edge detection algorithm to calculate a well-suited threshold. Moreover, the use of what we call shifted morphological operators improves the robustness of image segmentation algorithms. Finally, we must remark that the MGT threshold could be updated in order to consider changing conditions in the environment, such as varying lighting conditions or shading effects, among others, making the method robust enough to be used in unstructured environments.

2 Introduction

Recognizing the world that surrounds us and responding quickly to the stimuli the brain identifies seems to be a simple task. In many cases, this response relies on the efficient processing of previously collected visual information. In other words, if human beings are able to determine what things are so easily, to what extent is it necessary to develop automated systems for analyzing images?

Nevertheless, the last 40 years have shown that there are hundreds (even thousands) of vision applications in which an automated system yields better results than the eyes-plus-brain combination could. For instance, there are applications, such as visual inspection in industrial manufacturing or the implementation of specific benchmarks to compare the performance of different vision algorithms, in which a huge number of measurements from image databases are needed; a massive visual analysis by a human being could therefore become a tedious task and, what is more, it could lead to erroneous conclusions. A well-designed automated system accurately performs these systematic tasks and avoids possible human errors.

In general terms, the first step in a vision processing system is the pre-processing stage, which is usually used for enhancing or restoring images. After that, the two subsequent phases are image segmentation and feature extraction. The task of decomposing an image into distinctly different but homogeneous parts is called image segmentation, and the process of extracting information from these regions that define their unique properties is called feature extraction.

Image segmentation has been the subject of intensive research, and a wide variety of segmentation techniques have been reported in the literature [1–3]. It is an essential issue, since it is the first step of image understanding, and every other step, such as feature extraction and recognition, heavily depends on its results. However, there is still some ambiguity about the definition of image segmentation: whereas for some authors segmentation simply consists of labeling every pixel in an image with a value indicating its correspondence to a region or a class [4, 5], many other researchers consider that it is also necessary to provide a method to represent the topological relations between the different regions into which the original image has been divided [6, 7].

On the other hand, segmentation algorithms are based on two significant criteria: the homogeneity of a region and the discontinuity between adjacent disjoint regions. Although many image segmentation approaches deal with these problems [8–10], it is difficult to satisfy all the properties for the optimal set of segmented regions. The resulting segmented images generally depend on a series of predetermined threshold values, so the algorithm often fails because it either merges regions that should be kept apart or splits connected regions.

One of the main applications in which image segmentation is being applied is vision-based robot navigation. Visual navigation of a mobile robot in a real environment has received considerable attention over the past few years [11–13]. The basic requirements of robot navigation include self-localization, obstacle avoidance and path planning. There are many proposals for global localization [14, 15], which in many cases implies choosing among different sensor technologies. For our purposes, we shall consider global localization by means of vision.

One of the main obstacles to the wide acceptance of vision-based navigation is the huge amount of information about the robot's environment which must be provided by the camera(s). Therefore, the development of techniques which reduce the massive processing of input data and, consequently, the computation time, leads to a subsequent improvement in an autonomous system's recognition of the surrounding world. To this end, finding a reliable segmentation scheme is the first step to effectively accomplish the remaining tasks that navigation involves.

In relation to that, in this work a new technique for segmenting images, which combines edge detection methods with local pixel information algorithms, is proposed. Our method is based on using the so-called shifted Mathematical Morphology operations to find the edges of the input image, which are used to compute what we call the pixel-symbol image. The basis of this algorithm was first introduced in some of our previous works [16]. In the present work, we study the shifted morphological operations in depth; then, we generalize the concept of the pixel-symbol image, showing that the search for the optimal threshold for obtaining a segmented image is a minimization procedure; finally, the experiments are extended to a more robust image database, comprising 875 real-world images. The completed experiments show that our method gives accurate results and that using the estimated optimal threshold allows the system to obtain a segmented image of good quality.

The paper is organized as follows: first, a review of the main current approaches to image segmentation is given in Sect. 3. Then, in Sect. 4, the fundamentals of gray-scale Mathematical Morphology are analyzed and the shifted morphological operators are defined. After that, the Morphological Gradient Threshold (MGT) image segmentation algorithm is described throughout Sect. 5. Section 6 introduces a distance metric to measure how well an MGT segmented image can replace the ideal segmentation, which leads to the estimation of the optimal threshold for the segmented images. Afterwards, Sect. 7 presents the experiments completed to find the optimal MGT value. Finally, we end with some concluding remarks in Sect. 8.

3 Image segmentation fundamentals

In many cases, it is commonly accepted that the central objective of image segmentation is the separation of the objects included in an image from the background pixels. The most popular methods use the intensity levels to measure the degree of similarity among pixels and, as a result, pixels are classified either as objects or as background.

The classical methods for image segmentation are divided into three different categories [17–20]:

  • Methods based on a global (and usually, prior) knowledge about image information: shape of objects, range of intensity levels, etc.

  • Methods based on finding edges.

  • Methods based on finding regions or textures.

When different objects have different gray levels, a good choice is using the histogram, which indicates the number of pixels belonging to each gray level. In this case, the selection of a threshold depends on the calculation of the gray level which optimally divides the image into foreground and background pixels [21, 22]. These threshold techniques, which make decisions based on local pixel information, are effective when the intensity levels in the objects are quite different from the range of levels in the background. Histogram-based threshold methods are very sensitive to noise and non-uniform illumination; for that reason, the image is usually filtered (smoothing filters, median filters, etc.) before the computation of the threshold is completed.
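As an illustration of this family of techniques (not the method proposed in this paper), a standard Otsu-style threshold selection can be sketched in a few lines, assuming an 8-bit grey-level numpy image; the helper name is ours:

```python
import numpy as np

def histogram_threshold(f):
    """Illustrative histogram-based threshold selection (Otsu's
    between-class variance criterion). f: 8-bit grey-level numpy
    array, ideally smoothed beforehand as discussed in the text."""
    hist = np.bincount(f.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * prob[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t  # pixels >= best_t are classified as foreground
```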

On the other hand, edge-based methods are usually utilized for the detection of contours, i.e., it is assumed that there is a correspondence between the intensity discontinuities in an image and the boundaries of objects. In many cases, the extraction of the boundary pixels starts from the computation of the x and y gradients of the intensity levels of the image (i.e., the first derivative) and, afterwards, each pixel is classified as ‘boundary’ or ‘non-boundary’. Assuming that the boundaries of objects generate well-defined intensity changes is usually correct, but in many cases the intensity levels change suddenly due to other factors, such as different textures in the same object, varying lighting conditions or non-uniform surfaces, among others. Therefore, it is rather difficult to extract well-defined contours, and a subsequent algorithm to connect boundary pixels must be used.

Once the edges in the input image are obtained, the existing objects in the scene must be determined. If detection were perfect, each object would be surrounded by a perfect contour. Unfortunately, this ideal situation seldom occurs, since boundaries are usually discontinuous and noise pixels are often classified as boundary pixels and connected to what actually are object contours. To overcome this problem, different approaches have been proposed [23, 24]; to link contours, starting from a pixel included in a contour, most algorithms search for other contour pixels and link them by means of different criteria, such as following a contour perpendicular to the gradient direction [25, 26]. Other proposals take into account the so-called active contours [27, 28], which start with some initial boundary shape (usually represented as spline curves), then the algorithm iteratively modifies the shape by applying several deformation operations according to certain energy function.

Finally, region-based methods usually assume that the objects to be segmented are represented by homogeneous regions, where homogeneity is determined following some criterion, such as finding the similarities among the intensity levels [29, 30]. These segmentation techniques are useful when the system works with noisy images, where edge-based methods have shown not to perform properly in many cases. After splitting the image into different regions, adjacent regions are merged depending on a set of rules, which often involve homogeneity, sharpness or the textures of different region boundaries.

To sum up, although there are a wide variety of proposals which make use of many of the segmentation approaches described above (or some variations of them), there is no universal solution able to overcome all the real condition factors, such as the lack of uniform lighting conditions, the presence of noise or the overlapping problems due to occlusions. This fact has made image segmentation a field of intensive research so far, and hybrid techniques, which use a mix of the methods described above, have become popular in recent years.

4 A revision of mathematical morphology operations

Mathematical Morphology is considered to be a powerful tool for image analysis, in particular for those applications which involve some kind of geometric analysis. Mathematical Morphology has been extensively used in low-level image processing applications, since it allows one to filter and/or enhance only some specific features of objects, depending on their morphological shape (for further details, see [31–36], among others).

The underlying idea of this technique is to analyze the shape of the objects in an image by probing each pixel's neighborhood with a small template (usually a symmetric, regular-shaped template, such as a line, a segment, a circle, a square, etc.), known as the Structuring Element (SE). For our purposes, let us assume that a grey-level image can be represented as a function f(X) defined as the union of a pile of decreasing binary sets X; thus, each of the possible 256 sets (each one representing a grey level, from 0 to 255) is the intersection between the function f and a series of constant level planes λ, where \(\lambda = 0, 1,\ldots, 255{:}\)

$$ X_{f}(\lambda)=\{x\in{\mathbb{R}}^{2},\;f(x)\geq\lambda\}\Leftrightarrow f=\int X_{f}(\lambda){\text{d}}\lambda. $$
(1)

Hereafter, let us denote f(X) ≡ F. Following our revision, the two elementary operations of Mathematical Morphology, erosion \(F\ominus B\) and dilation \(F \oplus B,\) for grayscale images are defined as follows:

$$ (F \ominus B)(x) = \inf_{b \in B} f (x+b) $$
(2)
$$ (F \oplus B)(x) = \sup_{b \in B} f (x+b). $$
(3)

As (2) and (3) state, erosion and dilation operate with pixels in a local way, i.e., by taking into account the intensity values for each pixel and its neighborhood, which is defined by the shape and the size of the SE B. Since almost any other morphological operator, such as opening, closing, top hat, etc., uses erosion or dilation (or a combination of both) to complete their execution, these two basic operations must be properly implemented to satisfy real-time constraints (see [37] for more details). It is clear, though, that the efficiency of the algorithms used to compute morphological operations will depend on the particular hardware architecture in which the vision system is being implemented.
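To make (2) and (3) concrete, here is a deliberately naive sketch (illustrative Python; the paper's own experiments were developed in Matlab, and the helper names are ours). It evaluates the supremum/infimum over the SE neighborhood pixel by pixel, so its cost is exactly the P × Q discussed next:

```python
import numpy as np

def grey_dilate_naive(f, offsets):
    """Naive grey-scale dilation of image f by a flat SE given as a
    list of (dy, dx) offsets b in B. Border pixels keep only the
    in-image samples (one possible convention; the text fixes none)."""
    h, w = f.shape
    out = np.zeros_like(f)
    for y in range(h):
        for x in range(w):
            # sup over f(x + b), as in (3)
            out[y, x] = max(f[y + dy, x + dx]
                            for dy, dx in offsets
                            if 0 <= y + dy < h and 0 <= x + dx < w)
    return out

def grey_erode_naive(f, offsets):
    """Naive grey-scale erosion: same loop, inf instead of sup, as in (2)."""
    h, w = f.shape
    out = np.zeros_like(f)
    for y in range(h):
        for x in range(w):
            out[y, x] = min(f[y + dy, x + dx]
                            for dy, dx in offsets
                            if 0 <= y + dy < h and 0 <= x + dx < w)
    return out

# 3 x 3 square SE: Q = 9 offsets, so the nested loops cost P x Q operations.
square3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
```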

From the definition of erosion and dilation, we can conclude that the computational complexity of morphological operations depends on the number of elements in the morphological operands. As a result, to compute a dilation or an erosion, for each pixel the algorithm looks for the highest (dilation) or the lowest (erosion) intensity level in the neighborhood defined by the structuring element. Let P = M × N be the number of pixels of the input image F, where M is the number of rows in F and N is the number of columns in F; let Q be the number of elements in the structuring element B. The computational complexity \(\mathcal{O}(\cdot)\) of the dilation (erosion) \(F \oplus B\) \((F \ominus B)\) is:

$$ {\mathcal{O}}(F \oplus B) = {\mathcal{O}}(F \ominus B) = P \times Q. $$
(4)

In many cases, the structuring element B is a small template and, consequently, P ≫ Q. As a result, \(\mathcal{O}(F \oplus B) = \mathcal{O}(F \ominus B) \simeq P.\) This computational complexity can be reduced as follows. Let us define some concepts first.

Definition 1

A structuring element B is said to be convex with respect to a given set of morphological operations (e.g., dilation) and a given set of structuring elements \(B_1, B_2, \ldots, B_n\) if it can be expressed as a chain of dilations of the elements \(B_i\), for \(i=1, \ldots, n{:}\)

$$ B= \left( \left( (B_1 \oplus B_2)\oplus B_3 \right) \oplus \ldots \right) \oplus B_n. $$
(5)

Otherwise, B is said to be non-convex with respect to the same set of structuring elements.

Most planar structuring elements can be decomposed as a chain of morphological operations, i.e., they are convex. This property is useful to reduce the computation time of Mathematical Morphology basic operations. The point now is how to implement the decomposition to improve the efficiency of morphological operations.

Ignoring the trivial case in which only one point belongs to the SE, the most basic SEs just consist of two consecutive points. Thus, there are four possible elemental SEs—one for each coordinate direction—which constitute what we call a SE canonical base.

Definition 2

A set of SEs \(\{B_1,B_2, \ldots, B_n\}\) constitute a canonical base \(\Uplambda\) for a morphological operation \(F \bigcirc B\) if any other convex SE can be decomposed and/or reconstructed with \(F \bigcirc B\) using only some or all the SEs included in \(\Uplambda.\)

Thus, the four elements in Fig. 1 constitute a canonical base \(\Uplambda_1\) for the dilation, where black pixels indicate the center of each structuring element:

Fig. 1 An example of a canonical base

One of the most popular structuring elements is the 3 × 3 square. This SE B can be decomposed using the canonical base \(\Uplambda_1\) as follows:

$$ B= \left( (B_1 \oplus B_2)\oplus B_3 \right) \oplus B_4. $$
(6)

Thus, the dilation of an image F using the 3 × 3 square SE can be performed as follows:

$$ F \oplus B= \left( ((F \oplus B_1) \oplus B_2)\oplus B_3 \right) \oplus B_4. $$
(7)

Figure 2 shows how a 5 × 5 square SE can be divided using the canonical base \(\Uplambda_1.\)

Fig. 2 Decomposition of a 5 × 5 square SE with the canonical base \(\Uplambda_1\)

Now, the following proposition can be enunciated.

Proposition 1

Dilating an image F with a k-point linear segment B is equivalent to shifting the image F k-1 times in the direction defined by the center of B and then computing, in parallel:

$$ \sup \{ x_i, s_i \} , \quad x_i \in F,\; s_i \in F_s, \quad {\text{for}} \; i= 1,\ldots,P, $$

where \(F_s\) is the shifted image.

The erosion of an image F using any convex SE can be expressed in a similar way, since the erosion of an object is equivalent to the dilation of the background. Thus, for the erosion, Proposition 1 is also valid, but the infimum must be computed instead of the supremum.

Following with the 3 × 3 square example, a dilation (or an erosion) can be computed using four 1-pixel shifts of the original image F, each one along a different direction of the coordinate axis. After that, for each pixel in the original image F and in the shifted images \(F_{s_j}, \) j = 1, 2, 3, 4, a global supremum (or a global infimum for the erosion) must be calculated.

This technique can be applied to any convex planar structuring element for a canonical base \(\Uplambda.\) The parallelization is now straightforward: if a processing element is used to compute each shift, and each processing element completes its execution in parallel, the computational time of dilation/erosion will be reduced by a factor γ, where γ is the number of elements in the SE canonical base \(\Uplambda.\) As any convex structuring element can be ultimately decomposed using the four elements in the canonical base \(\Uplambda_1,\) this canonical base can be taken as the practical set of SEs to determine the number of required shifts for completing a morphological operation. Let p, q, r, s be the number of times we must perform the 1-pixel shift along each coordinate direction. Then, for a 3 × 3 square SE, p = q = r = s = 1, whereas for a 5 × 5 square SE, p = q = r = s = 2.
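As an illustration, the following sketch (ours; numpy only simulates the parallel architecture described above) implements the 3 × 3 square dilation and erosion through the chained decomposition (7), with border replication as an assumed convention:

```python
import numpy as np

def sample(f, dy, dx):
    """Shifted image F_s(y, x) = F(y + dy, x + dx), with border
    replication (an assumed convention)."""
    h, w = f.shape
    ys = np.clip(np.arange(h) + dy, 0, h - 1)
    xs = np.clip(np.arange(w) + dx, 0, w - 1)
    return f[np.ix_(ys, xs)]

# The canonical base Lambda_1: four 2-point SEs, one per coordinate direction.
LAMBDA_1 = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def dilate_square3(f):
    """Dilation by the 3 x 3 square via the chain (7): each 2-point SE
    amounts to one 1-pixel shift plus a per-pixel supremum. Each shift
    could be assigned to a processing element of the parallel
    architecture discussed in the text."""
    out = f
    for dy, dx in LAMBDA_1:
        out = np.maximum(out, sample(out, dy, dx))
    return out

def erode_square3(f):
    """Erosion by duality: the same shifts with a per-pixel infimum."""
    out = f
    for dy, dx in LAMBDA_1:
        out = np.minimum(out, sample(out, dy, dx))
    return out
```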

Two remarks must be made to conclude this section: (a) as shown before, the computation time of the shifted morphological operations has a direct relation with the size of the image, P (the number of pixels), and the size of the structuring element, Q, since the number of shifts is related to the shape and the size of the structuring elements in the canonical base \(\Uplambda;\) and (b) due to their parallelization properties, the computational cost of the shifted morphological operations will be reduced when a parallel architecture is used by the automated vision system.

5 Developing a morphological segmentation scheme

As previously described, segmentation is one of the major research areas in image analysis and object recognition. Designing a robust segmentation scheme is not an easy task: a tradeoff between the computational cost of visual recognition and the reliability of the information obtained must be considered. Thus, in general terms, the less uncertainty about the interpretation of a scene is desired, the more time is needed to complete the task. Due to real-time constraints, developers often choose low computation time algorithms to implement real segmentation schemes. Therefore, some of the most popular approaches which provide low computation times and good interpretation results are threshold techniques and edge-based methods, as previously shown.

Threshold techniques, which make decisions based on local pixel information, are highly dependent on the contrast of images, i.e., the intensity levels of objects must be quite different from the ones in the background. Threshold algorithms are simple and give very good results when used in controlled environments, such as visual inspection for assembly lines in a factory, but deciding the threshold values is not an easy task. This becomes an especially hard issue for an automated vision system, as the system must update the threshold values without any external help.

On the other hand, edge-based methods mainly use gradient operators, such as the Sobel, Prewitt or Canny operators, to detect the boundaries of the objects in an image. Edge detection also involves some problems, usually when working with noisy images, since it could break edges that should be connected or detect false contours due to spurious noise.

To overcome these problems, we propose a method that combines both threshold and gradient operators: the so-called MGT segmentation, as described in Fig. 6 (see also [16]). It consists of seven steps, where the gradient and the Laplacian are calculated using the Mathematical Morphology operations and the optimal threshold value is selected by measuring the lowest distance between the ideal segmentation and a pre-computed set of MGT segmented images.

In the following sections, a more detailed description of this process is found.

5.1 Pre-computing morphological operations

In almost any digital image there is some amount of white noise. To avoid noise effects, which only waste computation time and modify the original features of the image, an initial filtering process has to be applied. There are many algorithms to accomplish this task; for instance, a Gaussian filter can be used, since it preserves many of the image features while its computational cost can be assumed in a real-time environment [38].

One inherent problem of low-pass filters, such as the Gaussian filter or a mean filter, is that, although they efficiently minimize the presence of noise in images, image contrast is also reduced. This results in blurred object boundaries, which can cause errors when detecting edges. Hence, using a median filter could be a better choice, since it eliminates noise while preserving object boundaries [39]. That is the reason why a median filter has been chosen for the pre-processing stage of our segmentation scheme.

Once the noise is reduced, the point is how to get the optimal segmented image. To do this, let us consider first the computation of the derivative-based operations to perform the edge detection stage, i.e., the gradient and the Laplacian must be calculated.

To detect contour lines, Sobel or Prewitt operators are frequently chosen as practical gradient masks, since they achieve acceptable results without using much computation time, either. In spite of this, most of the time gradient operators are combined with other kinds of post-processing techniques to avoid, as much as possible, practical mistakes (such as classifying noise or background pixels as boundaries). Moreover, gradient-based operators work efficiently when there are sudden intensity changes throughout the image, but they detect gradual intensity variations poorly. To overcome this problem, one of the most popular methods is to compute the zero-crossing points of the second derivative, the Laplacian, of the input image. If first-derivative operators are quite sensitive to noise, this effect is even more pronounced for second-derivative operators. As a consequence, after finishing the edge detection stage, the resulting image may have some isolated pixels to be removed. This post-processing operation can be performed using Mathematical Morphology, as shown next.

Let us recall that the shifted morphological operations are essentially computed by shifting the input image depending on the structuring element and that having defined a SE canonical base \(\Uplambda,\) every structuring element can be decomposed in terms of the γ basic elements included in \(\Uplambda.\) Furthermore, almost any complex morphological operation is a combination of the two basic dual operators, erosion and dilation. Consequently, having pre-calculated the necessary shifts of the input image, the remaining operations to generate the optimal MGT segmented image can be completed in terms of global supremum/infimum (dilation/erosion) operations. Thus, the edge detection can be performed using the morphological gradient and Laplacian and the post-processing removal of the isolated points is efficiently performed by means of the morphological closing.

Thus, the morphological gradient of an image F for a structuring element B, \(\rho_B(F),\) is defined as follows:

$$ \rho_B (F) = \frac{1}{2} (F \oplus B - F \ominus B), $$
(8)

where \(F \oplus B\) and \(F \ominus B\) are the dilation and the erosion of an image F for a structuring element B, as defined in (3) and (2), respectively.

The following step is to calculate the second derivative, the Laplacian. A morphological implementation of the Laplacian has also been chosen, since the previously pre-calculated shifted images for computing erosion and dilation can be used as well. Thus, the morphological Laplacian of an image F for a structuring element B, \(\Upgamma_B (F),\) is defined as:

$$ \Upgamma_B (F) = \frac{1}{2} (F \oplus B + F \ominus B) - F. $$
(9)

Finally, to remove isolated points we have chosen a morphological closing operator. The closing of image F for a structuring element B is defined as:

$$ F \cdot B= (F \oplus B) \ominus B, $$
(10)

i.e., the closing is obtained by dilating the original image first, then eroding the resulting image.
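The whole pre-computation stage fits in a few lines. The sketch below is illustrative Python, with scipy.ndimage as a stand-in for the shifted implementation of Sect. 4 (the helper name morphological_maps is ours); it applies the median pre-filter and then derives (8)–(10) from a single dilation/erosion pair:

```python
import numpy as np
from scipy import ndimage

def morphological_maps(f, size=3):
    """Pre-computation stage: median pre-filter, then the gradient (8),
    Laplacian (9) and closing (10), all derived from one
    dilation/erosion pair of the input image."""
    f = ndimage.median_filter(f.astype(float), size=3)   # pre-processing
    dil = ndimage.grey_dilation(f, size=(size, size))    # F dilated by B
    ero = ndimage.grey_erosion(f, size=(size, size))     # F eroded by B
    gradient = 0.5 * (dil - ero)                         # rho_B(F), eq. (8)
    laplacian = 0.5 * (dil + ero) - f                    # Gamma_B(F), eq. (9)
    closing = ndimage.grey_erosion(dil, size=(size, size))  # F . B, eq. (10)
    return gradient, laplacian, closing
```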

The results for a gray-scale image after these initial steps are shown in Fig. 3, where the SE B is a 3 × 3 square and the canonical base is, therefore, \(\Uplambda_1\) (see Fig. 1).

Fig. 3 An image from the real world: a original image, b \(\rho_B(F)\), c \(\Upgamma_B(F)\)

5.2 The pixel-symbol image

The next task is to build a new image, which we call the pixel-symbol image, that characterizes the pixels for an optimal segmentation. Thus, after having completed all the operations described in the last section, the pixel-symbol image \(\mu(x,y,\Upomega),\) for \(\Upomega= \langle \alpha_1, \alpha_2, \alpha_3 \rangle,\) is obtained as follows:

$$ \mu(x, y,\Upomega) = \left\{ \begin{array}{l} \alpha_1 \quad \hbox{ if }\rho_B(x,y)< MGT, \\ \alpha_2 \quad \hbox{ if }\rho_B(x,y)\geq MGT\hbox{ and }\Upgamma_B(x,y)\geq 0,\\ \alpha_3\quad \hbox{ if }\rho_B(x,y)\geq MGT\hbox{ and }\Upgamma_B(x,y)<0, \end{array} \right. $$
(11)

where MGT is the Morphological Gradient Threshold, \(\Upomega = \langle \alpha_1, \alpha_2, \alpha_3 \rangle\) is a ternary list of symbols such that \({\{\alpha_i \in [0,255], \alpha_i \in \mathbb{Z}^{+}, \alpha_i \neq \alpha_j, \; \forall \; i , j= \{1,2,3\} / i \neq j \},}\) and (x, y) is a pixel in F. Consequently, the resulting pixel-symbol image has three different gray levels, according to whether a pixel belongs to an object, to the background or to the boundary of an object.
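Expression (11) transcribes directly into a few lines (our sketch; the helper name pixel_symbol_image is ours), assuming the gradient and Laplacian maps from the previous stage:

```python
import numpy as np

def pixel_symbol_image(gradient, laplacian, mgt, omega=(0, 128, 255)):
    """Pixel-symbol image mu(x, y, Omega) of eq. (11). gradient and
    laplacian are rho_B(F) and Gamma_B(F) as float arrays, mgt is the
    Morphological Gradient Threshold, omega the ternary symbol list."""
    a1, a2, a3 = omega
    mu = np.full(gradient.shape, a1, dtype=np.uint8)  # rho_B < MGT
    above = gradient >= mgt
    mu[above & (laplacian >= 0)] = a2                 # edge, Gamma_B >= 0
    mu[above & (laplacian < 0)] = a3                  # edge, Gamma_B < 0
    return mu
```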

The choice of the threshold value is a challenging task, since the final result is highly dependent on many factors, such as lighting conditions, the texture of objects or shading effects. Figure 4 shows the results of the construction of the pixel-symbol image for the image in Fig. 3, with several different MGT values and \(\Upomega =\langle 0 , 128, 255 \rangle.\)

Fig. 4 The pixel-symbol image \(\mu(x,y,\Upomega)\) for different MGT values: a gradient mean, b MGT = 0.7 × max(\(\rho_B(F)\)), c MGT = 0.9 × max(\(\rho_B(F)\))

Though many practical systems utilize a manually pre-calculated threshold, in this work we consider the use of an automated thresholding system. This method employs a binary image metric to compare the segmentation results and, afterwards, to establish the quality level of the obtained segmentation, as described in the following section.

6 Finding an optimal MGT value: the \(\Updelta^p\) distance

A main problem in computer vision is being able to compare results using a proper metric, which quantifies the differences between two images; if binary images are used, the method is both easy to implement and of low computational complexity [40]. In our system, we are interested in measuring the distance between image S′ (the segmented image after gradient thresholding) and image S (the manually pre-calculated ideal segmentation). This will lead to the calculation of the optimal MGT value.

Consequently, the pixel-symbol image \(\mu(x, y,\Upomega)\) must be binarized first. We must recall that \(\mu(x, y,\Upomega)\) has only three gray levels, see (11), which are determined by \(\Upomega = \langle \alpha_{1}, \alpha_{2}, \alpha_{3}\rangle.\) For our system to be robust and computationally efficient, let us consider that the threshold is the same as the one used in the construction of \(\mu(x, y,\Upomega),\) i.e., the gradient threshold MGT. The results of this process are shown in Fig. 5, for \(\Upomega = \langle 0 , 128, 255 \rangle.\)

Fig. 5 Binarization with different MGT values: a gradient mean, b MGT = 0.7 × max(\(\rho_B(F)\)), c MGT = 0.9 × max(\(\rho_B(F)\))

Next, a reliable measure to compare the obtained segmented image with the ideal segmentation must be selected.

6.1 An error metric for MGT segmentation

Error measures are often used in the design of algorithms for segmentation and classification [41–43] and edge detection for computer vision [44–46]. General principles for error measurement were enunciated in [44], in the context of edge detection. A good edge filter should exhibit:

  • Good detection. Low probability of failing to detect an edge, and low probability of incorrectly labeling a background pixel as an edge.

  • Good localization. Points identified as edge pixels should be as close as possible to the center of the true edge.

  • Unicity. There should be only one response to a single edge.

Optimal edge filtering involves a tradeoff between good detection and good localization. Error measures for detection and localization were surveyed in [47–49]. Our main interest focuses on a direct comparison between the ideal segmentation S and the resulting pixel-symbol image S′.

Definition 3

Let S be the ideal binary segmented image and let S′ be the segmented image obtained from some segmentation threshold MGT and some SE canonical base \(\Uplambda.\) Then, d(S, S′) is a distance metric iff the following four properties hold:

  (i) \(d(S,S^{\prime})\geq 0.\)

  (ii) \(d(S,S^{\prime})= 0 \Longleftrightarrow S=S^{\prime}.\)

  (iii) \(d(S,S^{\prime})= d(S^{\prime},S).\)

  (iv) \(d(S,S^{\prime})\leq d(S,S^{\prime\prime})+d(S^{\prime\prime},S^{\prime}),\)

where \(S^{\prime\prime}\) is another MGT segmented image.

Since our segmentation scheme is based on Mathematical Morphology, one of the error metrics related to this kind of operation is the Hausdorff distance, which has been used, for instance, to interpolate noisy images by means of morphological operators [50, 51].

The Hausdorff distance is a technique for comparing sets of points; let us assume, thus, that a binary image S can be represented as a set \(\Upphi,\) so that \(\Upphi=\{s \in S: \Upphi(s)=1\}.\) Consequently, for each pixel s in a model image S, the Hausdorff distance computes its proximity to any pixel \(s^{\prime}\) in a new binary image \(S^{\prime}\) and vice versa. This metric is employed in pattern recognition to determine the degree of resemblance between two objects (see [51]) and, as a result, it can be used to evaluate how similar two binary segmented images, S and S′, are.

Definition 4

If \(\delta(s,s^{\prime})\) is the distance between two pixels s and \(s^{\prime},\) then the shortest distance between a pixel \(s \in \Upphi\) and the set \(\Upphi^{\prime}\) is defined by:

$$ d(s,\Upphi^{\prime}) = \inf \left\{ \delta(s,s^{\prime}): s^{\prime} \in \Upphi^{\prime} \right \}, $$
(12)

where \(\Upphi^{\prime}=\{s^{\prime} \in S^{\prime}: \Upphi^{\prime}(s^{\prime})=1\}.\)

The distance function has a Lipschitz property:

$$ d(s,\Upphi^{\prime}) \leq d(t,\Upphi^{\prime}) + \delta(s,t) $$
(13)

for any pixels \(s,t.\) In particular, \(d(\cdot,\Upphi^{\prime})\) is uniformly continuous. Closed sets are characterized by their distance functions: \(d(\cdot,\Upphi) \equiv d(\cdot,\Upphi^{\prime})\) iff \(\Upphi=\Upphi^{\prime}.\)

Definition 5

Let \(\Upphi=\{s_{1},\ldots,s_{m}\}\) be the set that represents the ideal segmentation S of image F and let \(\Upphi^{\prime}=\{s^{\prime}_{1},\ldots,s^{\prime}_{n}\}\) be the set that represents the MGT segmented image. The Hausdorff distance \(H(\Upphi,\Upphi^{\prime})\) can be defined as:

$$ H(\Upphi,\Upphi^{\prime}) = \max \left \{ \sup_{s \in \Upphi} d(s,\Upphi^{\prime}), \; \sup_{s^{\prime} \in \Upphi^{\prime}} d(s^{\prime},\Upphi) \right \}. $$
(14)

As a result, the Hausdorff distance gives an interesting measure of the mutual proximity of two sets of points, by indicating the maximum distance from any pixel of one set to the other set. Unfortunately, this metric involves an inherent problem: the presence of outlier points, mainly coming from noise effects. That is, the Hausdorff distance is very sensitive to noise: a single outlier pixel can cause \(H(\Upphi,\Upphi^{\prime})\) to take its maximum possible value, since it uses a supremum of distance values.
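For reference, a minimal sketch of (14) in Python (our illustration, using scipy's directed_hausdorff for the one-sided suprema), assuming both foreground sets are non-empty:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff(phi, phi_prime):
    """Hausdorff distance (14) between two binary images (boolean numpy
    arrays), each with a non-empty foreground. A single outlier pixel
    dominates the result, which is the sensitivity noted above."""
    u = np.argwhere(phi).astype(float)        # pixels with Phi(s) = 1
    v = np.argwhere(phi_prime).astype(float)
    return max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0])
```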

As shown before, one of the main tools to remove noise in image pre-processing is some kind of image smoothing, such as a median filter or a Gaussian filter. Since the Hausdorff distance is highly sensitive to noise, in order to remove the noise effects, let us replace the supremum in (14) by a mean or pth-order mean. This is shown next:

Definition 6

Let \(\Updelta^p(\Upphi, \Upphi^{\prime})\) be a distance metric, defined as the pth-order mean difference of the distance functions of two sets: \(\Upphi\) (the ideal binary segmentation) and \(\Upphi^{\prime}\) (the binary pixel-symbol image).

Then, for \(1\leq p <\infty\) we define:

$$\Updelta^p(\Upphi, \Upphi^{\prime}) = \left[\frac{1}{N}\sum_{s} \left| d(s,\Upphi)-d(s,\Upphi^{\prime}) \right|^p \right]^{1/p}, $$
(15)

where the sum runs over all the pixels s of the image raster and N is the total number of pixels in the image.

The result is topologically equivalent to \(H(\Upphi,\Upphi^{\prime}),\) since distance functions \(d(s,\Upphi)\) are continuous Lipschitz functions by (13) (see [49]).

Properties (i) and (iii) of Definition 3 are directly satisfied by (15); from (13), property (iv) also holds; and, finally, as mentioned before, closed sets are characterized by their distance functions (\(d(\cdot,\Upphi) \equiv d(\cdot,\Upphi^{\prime})\) iff \(\Upphi=\Upphi^{\prime}\)), so (15) also satisfies property (ii) of Definition 3. As a consequence, \(\Updelta^p(\Upphi, \Upphi^{\prime})\) is a metric.

Intuitively, \(\Updelta^p(\Upphi, \Upphi^{\prime})\) measures the suitability of an estimated image to be used instead of the real one. The parameter p determines the relative importance of large localization errors. For large values of p, the effect is similar to the Hausdorff metric, since, as p grows, the pth-order mean tends to the supremum:

$$\Updelta^{\infty}(\Upphi, \Upphi^{\prime}) = H(\Upphi,\Upphi^{\prime}). $$
(16)
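Under the raster-sum reading of (15), \(\Updelta^p\) can be computed with two Euclidean distance transforms. A minimal sketch (our code, not the paper's implementation; the helper name delta_p is ours), assuming boolean numpy arrays with at least one foreground pixel each:

```python
import numpy as np
from scipy import ndimage

def delta_p(phi, phi_prime, p=2):
    """Delta^p distance of (15) between two binary segmentations,
    using Euclidean distance transforms so that d(s, .) is available
    at every raster pixel."""
    d1 = ndimage.distance_transform_edt(~phi)        # d(s, Phi)
    d2 = ndimage.distance_transform_edt(~phi_prime)  # d(s, Phi')
    return float((np.abs(d1 - d2) ** p).mean() ** (1.0 / p))
```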

6.2 Obtaining the optimal threshold

At this point, we have defined a distance metric to compare the ideal segmentation (represented by the set \(\Upphi\)) with the images segmented using the MGT segmentation scheme, which are represented by the sets \(\Upphi^{\prime}_i\), for \(i=1,\ldots,M\), defined by a set of M different MGT values.

Therefore, we seek the MGT threshold that minimizes the distance function \(\Updelta^p(\Upphi, \Upphi^{\prime}_i),\) since this minimum distance indicates the MGT segmented image closest to the ideal segmentation. That is to say:

$$ MGT_{opt}= \arg \min_{MGT} \Updelta^p(\Upphi, \Upphi^{\prime}_i) $$
(17)

for \(i=1,\ldots,M.\)

The main steps of our algorithm are shown in Fig. 6.

Fig. 6 Morphological Gradient Threshold segmentation scheme

Hence, the method for obtaining the optimal MGT threshold can be summarized as follows:

  • Select a set of M different MGT values.

  • From (11), define a set of M pixel-symbol maps, \(\mu_i(x, y,\Upomega)\), for \(i=1,\ldots,M. \)

  • Binarize the pixel-symbol maps and obtain the sets \(\Upphi^{\prime}_i.\)

  • Compute \(MGT_{opt}\) by minimizing \(\Updelta^p(\Upphi, \Upphi^{\prime}_i).\)
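Putting the pieces together, the search (17) can be sketched as follows. This is illustrative Python (the paper's experiments were done in Matlab), reusing the hypothetical helpers morphological_maps, pixel_symbol_image and delta_p sketched above, and reading the binarization step as thresholding μ with the same MGT value:

```python
import numpy as np

def find_mgt_opt(f, phi_ideal, m=9, p=2):
    """Sketch of the MGT_opt search (17). The candidate set mirrors
    the experiments: M equispaced fractions of the gradient maximum
    plus the gradient mean."""
    gradient, laplacian, _ = morphological_maps(f)
    candidates = list(np.linspace(0.1, 0.9, m) * gradient.max())
    candidates.append(gradient.mean())
    best_mgt, best_dist = None, np.inf
    for mgt in candidates:
        mu = pixel_symbol_image(gradient, laplacian, mgt)
        # Binarize with the same threshold (one reading of Sect. 6):
        # with Omega = <0, 128, 255>, this keeps the two edge classes.
        phi = mu >= mgt
        dist = delta_p(phi_ideal, phi, p)
        if dist < best_dist:
            best_mgt, best_dist = mgt, dist
    return best_mgt, best_dist
```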

As stated in Sect. 5.1, almost all the morphological operations can be defined in terms of the basic erosion and dilation operations; thus, since the morphological erosion and dilation were defined as a set of shifts in Sect. 4, we can pre-calculate the necessary shifts of the input images for any application (according to the selected structuring element), then compute the global infimum/supremum (erosion/dilation) as necessary. For our experiments, we have used a 3 × 3 square SE, so the number of 1-pixel shifts for each coordinate direction is p = q = r = s = 1, i.e., 4 shifts in total.

As a result, the optimal MGT segmented image can be obtained following the aforementioned procedure. According to (8)–(10), which are the expressions involving Mathematical Morphology operations and are necessary to obtain the pixel-symbol image, there are a total of three erosions and three dilations for obtaining \(\mu(x, y,\Upomega),\) i.e., we shall need two basic operations for each one of the M defined pixel-symbol maps; if the shifts are pre-computed, this process will involve only two global infimum/supremum computations, one for the erosion and one for the dilation.

Having defined the distance metric and the minimization process, let us evaluate the performance of our segmentation scheme using a set of real-world images.

7 Experiments

Let us now show the results of some experiments completed for our model. We will first focus on obtaining the minimum distance between the ideal segmentation and the MGT segmented images. After that, our method is compared with others in order to highlight the advantages of our approach.

7.1 Finding \(MGT_{opt}\)

The tests have been performed with a series of real-world images, whose pixel-symbol images have been calculated for different MGT values. Then, after applying the binarization process, the distance \(\Updelta^p(\Upphi, \Upphi^{\prime})\) has been computed. To complete the tests, M = 9 equispaced MGT values were chosen; we also included the gradient mean as an MGT value.

In all our tests, what we call the ideal segmentation is computed using a semi-automated procedure; thus, we first applied the well-known Sobel algorithm for detecting edges and, afterwards, each input image was binarized using a threshold determined by the average and standard deviation of its grayscale values. Then, in order to obtain the true edges of the images, isolated points were removed through a morphological closing operation. All these steps were supervised by a human so as to achieve the best results for the ideal segmentation.
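A sketch of such a semi-automated reference pipeline (our reconstruction from the description above; the unit weighting of the standard deviation is an assumption, and in practice every result was additionally human-supervised):

```python
import numpy as np
from scipy import ndimage

def reference_segmentation(f):
    """Reference ('ideal') pipeline sketch: Sobel edges, a threshold
    from the mean and standard deviation of the gradient magnitude,
    and a closing to remove isolated points, as described in the text."""
    f = f.astype(float)
    mag = np.hypot(ndimage.sobel(f, axis=0), ndimage.sobel(f, axis=1))
    edges = mag >= mag.mean() + mag.std()
    # closing = dilation followed by erosion on the binary edge map
    return ndimage.binary_erosion(ndimage.binary_dilation(edges))
```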

Table 1 and Fig. 7 show the distance results, with different MGT values, for the image in Fig. 3a, for p = 2.

Table 1 Distance results after the MGT segmentation processing for the first image in Fig. 3
Fig. 7 Distance distribution for different MGT values for the image in Fig. 3a

In this case, the lowest distance is obtained when \(MGT_{opt} = 0.7 \times \max(\rho_B(F)).\)

Figure 8 compares the ideal segmentation and the MGT segmentation with the lowest \(\Updelta^p(\Upphi, \Upphi^{\prime}).\) As shown, the segmentation obtained with the selected MGT value is rather similar to the ideal segmentation.

Fig. 8 Segmentation results for the image in Fig. 3a: a original image, b ideal segmentation, c MGT segmentation

Our segmentation scheme has been applied to an empirically collected database of 875 real-world images; to do this, after selecting a wide range of MGT values, the distance between the ideal segmentation and each MGT segmented image has been computed. Figure 9 shows some of the images included in the database.

Fig. 9 Some images in the database

Let us now consider the MGT segmentation results for the first image in Fig. 9. Table 2 and Fig. 10 show the distance results for different MGT values and p = 2. Then, Fig. 11 compares the resulting segmented image for the optimal MGT value (\(MGT_{opt} = 0.8 \times \max(\rho_B(F))\)) with the ideal segmentation.

Table 2 Distance results after the MGT segmentation processing for the first image in Fig. 9
Fig. 10 Distance distribution for different MGT values for the first image in Fig. 9

Fig. 11 Segmentation results for the first image in Fig. 9: a original image, b ideal segmentation, c MGT segmentation

Finally, let us show the results obtained after applying our segmentation scheme to all the images in the database. Table 3 and Fig. 12 show the mean distance results for our database.

Table 3 Mean distance results after the MGT segmentation processing for the image database
Fig. 12 Mean distance distribution for different MGT values

The mean minimum distance, \(\overline{\Updelta^p}(\Upphi, \Upphi^{\prime})= 0.2874,\) is obtained when \(MGT_{opt} = 0.7 \times \max(\rho_B(F))\) and, as a consequence, we find that this MGT value is the optimal one for our database, since in this case our segmentation scheme is very close to the ideal segmentation. Moreover, over all the performed experiments, the mean MGT value yielding the minimum distance was \(\overline{MGT}_{\min(\Updelta)} = 0.7263 \times \max(\rho_B(F)),\) which again shows that this choice of \(MGT_{opt}\) is a sound way to achieve good segmentation results.

As a final remark, we must point out that the threshold could be adaptively updated in order to dynamically consider the real conditions (such as varying lighting conditions or shading effects, among others) in which each image may have been captured by the camera of the vision system.

7.2 Comparison with other methods

In image processing, Markov Random Field (MRF) modeling has received a great deal of attention in the past 15–20 years, since it provides a powerful framework for integrating visual information. This model has been applied to vision problems such as image restoration and segmentation [52–55]. However, due to the great number of pixels over which the MRF needs to be defined, it usually takes considerable computation time to obtain optimal labels, which makes it difficult to apply the MRF model to real scene domains. In such cases, multiscale techniques can be applied to reduce the computational complexity.

In some of our previous works (see [9, 56]), we developed multiscale MRF models. Before applying our MRF image segmentation scheme, the optimal scale was determined; then a preliminary segmentation was made using a modified segmentation algorithm based on the partition mode test (MPMT) [57], and finally the MRF model was defined on the segmented regions. To find the best segmentation results we used the well-known Simulated Annealing algorithm [58, 59]; in our experiments, the best results were obtained for scale 5.

Let us compare the results of the MGT segmentation algorithm with the ones obtained from the Simulated Annealing approach, for the same database. In Figs. 13 and 14 the results of some of the segmented images for the optimal MGT value and for scale 5 in the Simulated Annealing (SA) algorithm can be observed.

Fig. 13 Optimal results for the image in Fig. 3a for the MGT and SA methods: a MGT segmentation, b SA segmentation

Fig. 14 Optimal results for the first image in Fig. 9 for the MGT and SA methods: a MGT segmentation, b SA segmentation

From these experiments, the visual results for the stochastic method are very similar to those obtained with our gradient approach. Let us now consider the distances from the ideal segmentation to the optimal results for the MGT and SA methods (see Table 4), for the images in Figs. 13 and 14. We have also included the mean distance results for all the images in the database and, in Table 5, the computation time required for obtaining the optimal segmentation in each case. The experiments were run on an Intel® Core™ 2 CPU with 2,048 MB of RAM, and were developed using the Image Processing Toolbox of Matlab.

Table 4 Distance results for the MGT and SA algorithms
Table 5 Computation time results for the MGT and SA algorithms

From these results, we must remark that the distances between the ideal segmentation and the MGT and SA segmentation results are comparable, although the distance is always lower for the MGT approach. Moreover, regarding the running time of the algorithms, MGT can run as much as nine times faster than Simulated Annealing with scale 5, which makes the MGT algorithm robust enough to: (a) achieve a segmented image of good quality and (b) be implemented where real-time constraints appear.

8 Conclusions

To sum up, in this paper we have described a novel approach to image segmentation, based on the selection of what we called the optimal MGT. To do this, the shifted morphological operators were first described, and a parallel architecture for improving the speed of these operators when working in a real-time system was proposed.

After that, the shifted morphological operators were used to compute the gradient and the Laplacian and, as a result, the pixel-symbol image was obtained. Then, this image was binarized and the distance \(\Updelta^p\) between the ideal segmentation and the MGT segmentation was computed. Finally, the MGT value with the lowest \(\Updelta^p\) distance was selected as the optimal threshold value.

Throughout our paper, the advantages of using a model that combines global knowledge about image information with an efficient edge detection algorithm have been shown. Experimental results (performed on an 875-image database) proved that our model is robust enough that it could be applied to real-time imaging. The current work has therefore demonstrated the efficiency and robustness of the MGT segmentation algorithm.

As future work, the results of our research could be extended to object classification and recognition. It would also be interesting to consider new experiments in different environments, such as image sequences obtained from a camera mounted on a robot platform, where real-time constraints have a great influence on the final recognition results.