Abstract
Segmentation is the first step in many of the tasks that pattern recognition and computer vision must deal with. The main goal of this paper is therefore to describe a new method for image segmentation that combines Mathematical Morphology operations with an adaptively updated threshold, which we call the Morphological Gradient Threshold (MGT), to obtain an optimal segmentation. The key factor in our work is the calculation of the distance between the segmented image and the ideal segmentation. Experimental results show that the optimal threshold is obtained when the MGT is around 70% of the maximum value of the gradient. This threshold can be computed, for any new image captured by the vision system, using a properly designed binary metric.
1 Originality and contribution
In general terms, image segmentation divides an image into related sections or regions, consisting of image pixels having related data feature values. Among the wide variety of image segmentation approaches, Mathematical Morphology is considered a powerful tool for this kind of application, particularly for applications involving some kind of geometric analysis. As a result, in this paper we describe a generic parallel architecture to implement what we call the shifted Mathematical Morphology operations. Then, the MGT segmentation scheme is designed and all the basic morphological operations are revised and adapted to work in a real-world environment. After that, the optimal threshold for each image is selected by means of a binary distance and, finally, a set of experiments is completed to show that our segmentation scheme achieves near-optimal results. The minimum distance between the segmented images and the ideal segmentation provides the optimal threshold, which is obtained when the MGT is around 70% of the maximum value of the morphological gradient for the 875-image database used in the experiments.
The main advantages of our method include the combination of the local properties of every pixel with an efficient edge detection algorithm to calculate a well-suited threshold. Moreover, the use of what we call shifted morphological operators improves the robustness of image segmentation algorithms. Finally, we must remark that the MGT threshold can be updated to account for changing conditions in the environment, such as varying lighting conditions or shading effects, making the method robust enough to be used in unstructured environments.
2 Introduction
Recognizing the surrounding world and responding quickly to the stimuli the brain identifies seems a simple task for a human. In many cases, this response relies on efficient processing of previously collected visual information. In other words, if human beings can determine what things are so easily, to what extent is it necessary to develop automated systems for analyzing images?
Nevertheless, the last 40 years have shown that there are hundreds (even thousands) of vision applications in which an automated system brings a more appropriate result than the combination of eyes and brain could. For instance, in some applications, such as visual inspection in industrial manufacturing or the implementation of specific benchmarks to compare the performance of different vision algorithms, a huge number of measurements from image databases are needed; a massive visual analysis by a human being would therefore become tedious work and, what is more, could lead to erroneous conclusions. A well-designed automated system accurately performs these systematic tasks and avoids possible human errors.
In general terms, the first step in a vision processing system is the pre-processing stage, which is usually used for enhancing or restoring images. After that, the two subsequent phases are image segmentation and feature extraction. The task of decomposing an image into distinctly different but homogeneous parts is called image segmentation, and the process of extracting information from these regions that define their unique properties is called feature extraction.
Image segmentation has been the subject of intensive research and a wide variety of segmentation techniques have been reported in the literature [1–3]. It is an essential issue, since it is the first step toward image understanding, and every subsequent step, such as feature extraction and recognition, heavily depends on its results. However, there is still some ambiguity about the definition of image segmentation: whereas for some authors segmentation simply consists of labeling every pixel in an image with a value indicating its correspondence to a region or a class [4, 5], many other researchers consider it also necessary to provide a method to represent the topological relations between the different regions into which the original image has been divided [6, 7].
On the other hand, segmentation algorithms are based on two significant criteria: the homogeneity of a region and the discontinuity between adjacent disjoint regions. Although many image segmentation approaches deal with these problems [8–10], it is difficult to satisfy all the properties required for the optimal set of segmented regions. The resulting segmented images generally depend on a series of predetermined threshold values, so an algorithm often fails because it either merges regions that should be divided or separates regions that should remain connected.
One of the main applications in which image segmentation is being applied is vision-based robot navigation. Visual navigation of a mobile robot in a real environment has received considerable attention over the past few years [11–13]. The basic requirements of robot navigation include self-localization, obstacle avoidance and path planning. There are a great number of proposals for global localization [14, 15], which in many cases implies choosing among different sensor technologies. For our purposes, we shall consider global localization by means of vision.
One of the main obstacles to the wide acceptance of vision-based navigation is the huge amount of information about the robot's environment that must be provided by the camera(s). Therefore, developing techniques that reduce the massive processing of input data and, consequently, the computation time leads to an improvement in the recognition rates of an autonomous system's surrounding world. To do this, finding a reliable segmentation scheme is the first step to effectively accomplish the remaining tasks that navigation involves.
In relation to that, in this work a new technique for segmenting images is proposed, which combines edge detection methods with local pixel information algorithms. Our method is based on using the so-called shifted Mathematical Morphology operations to find the edges of the input image, which are then used to compute what we call the pixel-symbol image. The basis of this algorithm was first introduced in some of our previous works [16]. In the present work, we study the shifted morphological operations in depth; then, we generalize the concept of pixel-symbol image, showing that the search for the optimal threshold for obtaining a segmented image is a minimization procedure; and, finally, the experiments are extended to a more robust image database, comprising 875 real-world images. The completed experiments show that our method gives accurate results and that using the estimated optimal threshold allows the system to obtain a segmented image of good quality.
The paper is organized as follows: first, a review of the main current approaches to image segmentation is found in Sect. 3. Then, in Sect. 4, the fundamentals of gray-scale Mathematical Morphology are analyzed and the shifted morphological operators are defined. After that, the Morphological Gradient Threshold (MGT) image segmentation algorithm is described in Sect. 5. Section 6 presents a distance metric to measure how well an MGT-segmented image can substitute for the ideal segmentation, which leads to the estimation of the optimal threshold for the segmented images. Afterwards, the experiments completed to find the optimal MGT value are presented in Sect. 7. Finally, we end with some concluding remarks in Sect. 8.
3 Image segmentation fundamentals
In many cases, it is commonly accepted that the central objective of image segmentation is the separation of the objects included in an image from the background pixels. The most popular methods use intensity levels to measure the degree of similarity among pixels and, as a result, pixels are classified either as objects or as background.
The classical methods for image segmentation are divided into three different categories [17–20]:
-
Methods based on a global (and usually, prior) knowledge about image information: shape of objects, range of intensity levels, etc.
-
Methods based on finding edges.
-
Methods based on finding regions or textures.
When different objects have different gray levels, a good choice is using the histogram, which indicates the number of pixels belonging to each gray level. In this case, the selection of a threshold depends on the calculation of the gray level which optimally divides the image into foreground and background pixels [21, 22]. These threshold techniques, which make decisions based on local pixel information, are effective when the intensity levels in the objects are quite different from the range of levels in the background. Histogram-based threshold methods are very sensitive to noise and non-uniform illumination; for that reason, the image is usually filtered (smoothing filters, median filters, etc.) before the computation of the threshold is completed.
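As an illustration of histogram-based threshold selection (not part of the method proposed in this paper), the following sketch picks the gray level that maximizes the between-class variance of the histogram, in the style of Otsu's method; the function and variable names are ours:

```python
import numpy as np

def otsu_threshold(image):
    """Pick the gray level that maximizes the between-class variance
    of the histogram (Otsu's criterion)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    total = hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = hist[:t].sum(), hist[t:].sum()
        if w0 == 0 or w1 == 0:
            continue  # all pixels on one side: not a valid split
        mu0 = (levels[:t] * hist[:t]).sum() / w0
        mu1 = (levels[t:] * hist[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2 / total ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On a strongly bimodal histogram the returned level falls between the two modes; on low-contrast or noisy images the split degrades, which is exactly the limitation discussed above.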
On the other hand, edge-based methods are usually utilized for the detection of contours, i.e., it is assumed that there is a correspondence between the intensity discontinuities in an image and the boundaries of objects. In many cases, the extraction of the boundary pixels starts from the computation of the x and y gradients of the intensity levels of the image (i.e., the first derivative) and, afterwards, a pixel is classified as ‘boundary’ or ‘non-boundary’. Assuming that the boundary of the objects would generate well-defined intensity changes is usually correct; but in many cases the intensity levels suddenly change due to other factors, such as different textures in the same object, varying lighting conditions or non-uniform surfaces, among others. Therefore, it is rather difficult to extract well-defined contours and a subsequent algorithm to connect boundary pixels must be used.
Once the edges in the input image are obtained, the existing objects in the scene must be determined. If detection were perfect, each object would be surrounded by a perfect contour. Unfortunately, this ideal situation seldom occurs, since boundaries are usually discontinuous and noise pixels are often classified as boundary pixels and connected to what actually are object contours. To overcome this problem, different approaches have been proposed [23, 24]; to link contours, starting from a pixel included in a contour, most algorithms search for other contour pixels and link them by means of different criteria, such as following a contour perpendicular to the gradient direction [25, 26]. Other proposals take into account the so-called active contours [27, 28], which start with some initial boundary shape (usually represented as spline curves), then the algorithm iteratively modifies the shape by applying several deformation operations according to certain energy function.
Finally, region-based methods usually assume that the objects to be segmented are represented by homogeneous regions, where homogeneity is determined following some criterion, such as finding the similarities among the intensity levels [29, 30]. These segmentation techniques are useful when the system works with noisy images, where edge-based methods have shown not to perform properly in many cases. After splitting the image into different regions, adjacent regions are merged depending on a set of rules, which often involve homogeneity, sharpness or the textures of different region boundaries.
To sum up, although there are a wide variety of proposals which make use of many of the segmentation approaches described above (or some variations of them), there is no universal solution able to overcome all the real condition factors, such as the lack of uniform lighting conditions, the presence of noise or the overlapping problems due to occlusions. This fact has made image segmentation a field of intensive research so far, and hybrid techniques, which use a mix of the methods described above, have become popular in recent years.
4 A revision of mathematical morphology operations
Mathematical Morphology is considered a powerful tool for image analysis, in particular for applications that involve some kind of geometric analysis. It has been extensively used in low-level image processing, since it allows filtering and/or enhancing only some specific features of objects, depending on their morphological shape (for further details see [31–36], among others).
The underlying idea of this technique is to analyze the shape of the objects in an image by convolving each pixel with a small template (usually a symmetric, regular-shaped template, such as a line, a segment, a circle or a square), known as the Structuring Element (SE). For our purposes, let us assume that a gray-level image can be represented as a function f(X) defined as the union of a pile of decreasing binary sets X; thus, each of the 256 possible sets (each one representing a gray level, from 0 to 255) is the intersection between the function f and a series of constant level planes λ, where \(\lambda = 0, 1,\ldots, 255{:}\)
Hereafter, let us denote f(X) ≡ F. Following our revision, the two elementary operations of Mathematical Morphology, erosion \(F\ominus B\) and dilation \(F \oplus B,\) for grayscale images are defined as follows:
As (2) and (3) state, erosion and dilation operate with pixels in a local way, i.e., by taking into account the intensity values for each pixel and its neighborhood, which is defined by the shape and the size of the SE B. Since almost any other morphological operator, such as opening, closing, top hat, etc., uses erosion or dilation (or a combination of both) to complete their execution, these two basic operations must be properly implemented to satisfy real-time constraints (see [37] for more details). It is clear, though, that the efficiency of the algorithms used to compute morphological operations will depend on the particular hardware architecture in which the vision system is being implemented.
From the definition of erosion and dilation, we can conclude that the computational complexity of morphological operations depends on the number of elements in the morphological operands. As a result, to compute a dilation or an erosion, for each pixel the algorithm looks for the highest (dilation) or the lowest (erosion) intensity level in the neighborhood defined by the structuring element. Let P = M × N be the number of pixels of the input image F, where M is the number of rows in F and N is the number of columns in F; let Q be the number of elements in the structuring element B. The computational complexity \(\mathcal{O}(\cdot)\) of the dilation (erosion) \(F \oplus B\) \((F \ominus B)\) is:
In many cases, the structuring element B is a small template and, consequently, P ≫ Q. As a result, \(\mathcal{O}(F \oplus B) = \mathcal{O}(F \ominus B) \simeq P.\) This computational complexity can be reduced as follows. Let us define some concepts first.
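Before introducing those concepts, the naive O(P · Q) computation just described can be sketched as follows (an illustrative NumPy implementation; function and variable names are ours), where each of the P pixels inspects the Q points of the structuring element:

```python
import numpy as np

def naive_dilation(F, B_offsets):
    """Grayscale dilation of F by the SE given as a list of (di, dj)
    offsets: each output pixel is the supremum over its neighborhood.
    Offsets falling outside the image are simply ignored, so the cost
    is O(P * Q): P pixels times Q structuring-element points."""
    M, N = F.shape
    out = np.empty_like(F)
    for i in range(M):
        for j in range(N):
            out[i, j] = max(F[i + di, j + dj]
                            for di, dj in B_offsets
                            if 0 <= i + di < M and 0 <= j + dj < N)
    return out
```

The erosion is the dual sketch with `min` in place of `max`.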
Definition 1
A structuring element B is said to be convex with respect to a given set of morphological operations (e.g., dilation) with a given set of structuring elements \(B_1, B_2, \ldots, B_n\) if it can be expressed as a chain of dilations of the elements \(B_i\), for \(i=1, \ldots, n\):
Otherwise, B is said to be non-convex with respect to the same set of structuring elements.
Most planar structuring elements can be decomposed as a chain of morphological operations, i.e., they are convex. This property is useful to reduce the computation time of Mathematical Morphology basic operations. The point now is how to implement the decomposition to improve the efficiency of morphological operations.
Ignoring the trivial case in which only one point belongs to the SE, the most basic SEs just consist of two consecutive points. Thus, there are four possible elemental SEs—one for each coordinate direction—which constitute what we call a SE canonical base.
Definition 2
A set of SEs \(\{B_1,B_2, \ldots, B_n\}\) constitutes a canonical base \(\Uplambda\) for a morphological operation \(F \bigcirc B\) if any other convex SE can be decomposed and/or reconstructed with \(F \bigcirc B\) using only some or all of the SEs included in \(\Uplambda.\)
Thus, the four elements in Fig. 1 constitute a canonical base \(\Uplambda_1\) for the dilation, where black pixels indicate the center of each structuring element:
One of the most popular structuring elements is the 3 × 3 square. This SE B can be decomposed using the canonical base \(\Uplambda_1\) as follows:
Thus, the dilation of an image F using the 3 × 3 square SE can be performed as follows:
Figure 2 shows how a 5 × 5 square SE can be divided using the canonical base \(\Uplambda_1.\)
Now, the following proposition can be enunciated.
Proposition 1
Dilating an image F with a k-point linear segment B is equivalent to shifting image F k−1 times in the direction defined by the center of B and then computing in parallel:

where \(F_s\) is the shifted image.
The erosion of an image F using any convex SE can be expressed in a similar way, since the erosion of an object is equivalent to the dilation of the background. Thus, for the erosion, Proposition 1 is also valid, but the infimum must be computed instead of the supremum.
Following with the 3 × 3 square example, a dilation (or an erosion) can be computed using four 1-pixel shifts of the original image F, each one along a different direction of the coordinate axis. After that, for each pixel in the original image F and in the shifted images \(F_{s_j}, \) j = 1, 2, 3, 4, a global supremum (or a global infimum for the erosion) must be calculated.
This technique can be applied to any convex planar structuring element for a canonical base \(\Uplambda.\) The parallelization is now straightforward: if a processing element is used to compute each shift, and every processing element completes its execution in parallel, the computation time of dilation/erosion is reduced by a factor γ, where γ is the number of elements in the SE canonical base \(\Uplambda.\) Since any convex structuring element can ultimately be decomposed using the four elements in the canonical base \(\Uplambda_1,\) this canonical base can be taken as the practical set of SEs to determine the number of shifts required to complete a morphological operation. Let p, q, r, s be the number of times the 1-pixel shift must be performed along each coordinate direction. Then, for a 3 × 3 square SE, p = q = r = s = 1, whereas for a 5 × 5 square SE, p = q = r = s = 2.
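As a sketch of this shifted scheme (an illustrative NumPy version, with our own names and a minimum-value padding policy of our choosing), a 3 × 3 square dilation can be computed by chaining the four elemental 2-point dilations of the canonical base, each one a 1-pixel shift followed by a pixelwise supremum:

```python
import numpy as np

def shift(F, di, dj):
    """Return F shifted by (di, dj); vacated border pixels are padded
    with F.min() so they never win the supremum."""
    M, N = F.shape
    out = np.full_like(F, F.min())
    out[max(di, 0):M + min(di, 0), max(dj, 0):N + min(dj, 0)] = \
        F[max(-di, 0):M + min(-di, 0), max(-dj, 0):N + min(-dj, 0)]
    return out

def dilate_3x3(F):
    """Dilation by a 3x3 square: chain the four elemental 2-point
    dilations (one per coordinate direction), each computed as a
    1-pixel shift followed by a pixelwise supremum."""
    out = F
    for di, dj in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        out = np.maximum(out, shift(out, di, dj))
    return out
```

The erosion is obtained by swapping the supremum for an infimum (`np.minimum`, padding with `F.max()`). On parallel hardware each shift-and-compare step is an independent, data-parallel operation.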
Two remarks must be made to conclude this section: (a) as shown before, the computation time of the shifted morphological operations has a direct relation with the size of the image, P (number of rows times columns), and the size of the structuring element, Q, since the number of shifts is related to the shape and size of the structuring elements in the canonical base \(\Uplambda;\) and (b) due to their parallelization properties, the computational cost of the shifted morphological operations is reduced when the automated vision system uses a parallel architecture.
5 Developing a morphological segmentation scheme
As previously described, segmentation is one of the major research areas in image analysis and object recognition. Designing a robust segmentation scheme is not an easy task: a tradeoff between the computational cost of visual recognition and the reliability of the obtained information must be considered. In general terms, the less uncertainty desired in the interpretation of a scene, the more time is needed to complete the task. Due to real-time constraints, developers often choose low-computation-time algorithms to implement real segmentation schemes. Therefore, some of the most popular approaches that provide low computation times and good interpretation results are threshold techniques and edge-based methods, as previously shown.
Threshold techniques, which make decisions based on local pixel information, are highly dependent on the contrast of images, i.e., the intensity levels of objects must be quite different from those in the background. Threshold algorithms are simple and give very good results when used in controlled environments, such as visual inspection for assembly lines in a factory, but deciding the threshold values is not an easy task. In particular, this becomes a hard issue for an automated vision system, as the system must update the threshold values without any external help.
On the other hand, edge-based methods mainly use gradient operators, such as the Sobel, Prewitt or Canny operators, to detect the boundaries of the objects in an image. Edge detection also involves some problems, usually when working with noisy images, since it can break edges that should be connected or detect false contours due to spurious noise.
To overcome these problems, we propose a method that combines both threshold and gradient operators: the so-called MGT segmentation, as described in Fig. 6 (see also [16]). It consists of seven steps, where the gradient and the Laplacian are calculated using the Mathematical Morphology operations and the optimal threshold value is selected by measuring the lowest distance between the ideal segmentation and a pre-computed set of MGT segmented images.
In the following sections, a more detailed description of this process is found.
5.1 Pre-computing morphological operations
In almost any digital image there is some amount of white noise. To avoid noise effects, which only waste computation time and modify the original features of the image, an initial filtering process has to be applied. There are many algorithms to accomplish this task; for instance, a Gaussian filter can be used, since it preserves many of the image features while its computational cost is acceptable in a real-time environment [38].
One inherent problem of low-pass filters, such as the Gaussian or mean filters, is that, although they efficiently reduce the presence of noise in images, they also reduce image contrast. This results in blurred object boundaries, which can cause errors when detecting edges. Hence, using a median filter can be a better choice, since it eliminates noise while preserving object boundaries [39]. That is the reason why a median filter has been chosen for the pre-processing stage of our segmentation scheme.
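A minimal sketch of this pre-processing step (an illustrative NumPy version; the names and the border-replication policy are ours):

```python
import numpy as np

def median3x3(F):
    """3x3 median filter: removes impulse noise while preserving object
    boundaries better than mean or Gaussian smoothing."""
    Fp = np.pad(F, 1, mode='edge')            # replicate the border
    M, N = F.shape
    # nine shifted views of the padded image, one per neighborhood cell
    stack = np.stack([Fp[i:i + M, j:j + N]
                      for i in range(3) for j in range(3)])
    return np.median(stack, axis=0).astype(F.dtype)
```

A single impulse-noise pixel in an otherwise flat region is replaced by the neighborhood median, while a step edge passes through unchanged.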
Once the noise is reduced, the point is how to get the optimal segmented image. To do this, let us consider first the computation of the derivative-based operations to perform the edge detection stage, i.e., the gradient and the Laplacian must be calculated.
To detect contour lines, the Sobel or Prewitt operators are frequently chosen as practical gradient masks, since they achieve acceptable results without using much computation time. In spite of this, gradient operators are most of the time combined with some kind of post-processing technique to avoid practical mistakes (such as classifying noise or background pixels as boundaries) as much as possible. Moreover, gradient-based operators work efficiently when there are sudden intensity changes throughout the image, but they detect gradual intensity variations poorly. To overcome this problem, one of the most popular methods is to compute the zero-crossing points of the second derivative, the Laplacian, of the input image. If first-derivative operators are quite sensitive to noise, this effect is far more pronounced for second-derivative operators. As a consequence, after the edge detection stage, the resulting image may contain some isolated pixels to be removed. This post-processing operation can be performed using Mathematical Morphology, as shown next.
Let us recall that the shifted morphological operations are essentially computed by shifting the input image depending on the structuring element and that having defined a SE canonical base \(\Uplambda,\) every structuring element can be decomposed in terms of the γ basic elements included in \(\Uplambda.\) Furthermore, almost any complex morphological operation is a combination of the two basic dual operators, erosion and dilation. Consequently, having pre-calculated the necessary shifts of the input image, the remaining operations to generate the optimal MGT segmented image can be completed in terms of global supremum/infimum (dilation/erosion) operations. Thus, the edge detection can be performed using the morphological gradient and Laplacian and the post-processing removal of the isolated points is efficiently performed by means of the morphological closing.
Thus, the morphological gradient of an image F for a structuring element B, ρ B (F), is defined as follows:
where \(F \oplus B\) and \(F \ominus B\) are, respectively, the dilation and the erosion of an image F for a structuring element B, as defined in (2) and (3).
The following step is to calculate the second derivative, the Laplacian. A morphological implementation of the Laplacian has also been chosen, since the shifted images pre-calculated for computing erosion and dilation can be reused. Thus, the morphological Laplacian of an image F for a structuring element B, \(\Upgamma_B (F),\) is defined as:
Finally, to remove isolated points we have chosen a morphological closing operator. The closing of image F for a structuring element B is defined as:
i.e., the closing is obtained by dilating the original image first, then eroding the resulting image.
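Assuming a 3 × 3 square SE with replicated borders, these three operations can be sketched as follows (an illustrative NumPy version with our own names; the Laplacian below uses the common form \((F \oplus B) + (F \ominus B) - 2F\), an assumption on our part, which may differ in sign convention from the paper's own equation):

```python
import numpy as np

def _stack3x3(F):
    """The nine 3x3-neighborhood views of F (borders replicated)."""
    Fp = np.pad(F, 1, mode='edge')
    M, N = F.shape
    return np.stack([Fp[i:i + M, j:j + N]
                     for i in range(3) for j in range(3)])

def dilate(F):
    return _stack3x3(F).max(axis=0)          # F (+) B

def erode(F):
    return _stack3x3(F).min(axis=0)          # F (-) B

def morph_gradient(F):
    """rho_B(F) = (F (+) B) - (F (-) B)."""
    return dilate(F) - erode(F)

def morph_laplacian(F):
    """Assumed common form: (F (+) B) + (F (-) B) - 2F."""
    return dilate(F) + erode(F) - 2 * F

def morph_closing(F):
    """Closing: dilation followed by erosion; removes isolated points."""
    return erode(dilate(F))
```

On a step edge the gradient responds on both sides of the discontinuity and is zero over flat regions, while the closing leaves the step unchanged.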
The results for a gray-scale image after these initial steps are shown in Fig. 3, where the SE B is a 3 × 3 square and the canonical base is, therefore, \(\Uplambda_1\) (see Fig. 1).
5.2 The pixel-symbol image
The next task is building a new image, which we call the pixel-symbol image, that characterizes pixels for an optimal segmentation. Thus, after completing all the operations described in the last section, the pixel-symbol image \(\mu(x,y,\Upomega),\) for \(\Upomega= \langle \alpha_1, \alpha_2, \alpha_3 \rangle,\) is obtained as follows:
where MGT is the Morphological Gradient Threshold, \(\Upomega = \langle \alpha_1, \alpha_2, \alpha_3 \rangle\) is a ternary list of symbols such that \({\{\alpha_i \in [0,255], \alpha_i \in \mathbb{Z}^{+}, \alpha_i \neq \alpha_j, \; \forall \; i , j= \{1,2,3\} / i \neq j \},}\) and (x, y) is a pixel in F. Consequently, the resulting pixel-symbol image has three different gray levels, according to whether a pixel belongs to an object, to the background or to the boundary of an object.
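Since the displayed equation for \(\mu(x,y,\Upomega)\) is not reproduced here, the following sketch implements one plausible reading of the rule (an assumption on our part, with our own names): pixels whose morphological gradient reaches the MGT receive the boundary symbol, and the remaining pixels are split into object/background by the sign of the morphological Laplacian:

```python
import numpy as np

def pixel_symbol(gradient, laplacian, mgt, omega=(0, 128, 255)):
    """Three-level pixel-symbol image mu(x, y, Omega) under the assumed
    rule: gradient >= MGT -> boundary symbol omega[2]; otherwise the
    Laplacian's sign splits object (omega[0]) from background (omega[1])."""
    a1, a2, a3 = omega
    mu = np.where(gradient >= mgt, a3,
                  np.where(laplacian >= 0, a1, a2))
    return mu.astype(np.uint8)
```

Varying `mgt` over a range of candidate values produces the family of pixel-symbol images among which the optimal one is later selected.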
The choice of the threshold value is a challenging task, since the final result is highly dependent on many factors, such as lighting conditions, the texture of objects or shading effects. Figure 4 shows the results of the construction of the pixel-symbol image for the image in Fig. 3, with several different MGT values and \(\Upomega =\langle 0 , 128, 255 \rangle.\)
Though many practical systems use a manually pre-calculated threshold, in this work we consider an automated thresholding system. This method uses a binary image metric to compare the segmentation results and, afterwards, to establish the quality level of the obtained segmentation, as described in the following section.
6 Finding an optimal MGT value: the Δp distance
A main problem in computer vision is being able to compare results using a proper metric, which quantifies the differences between two images; if binary images are used, the method can be both easy to implement and computationally inexpensive [40]. In our system, we are interested in measuring the distance between image S′ (the segmented image after gradient thresholding) and image S (the manually pre-calculated ideal segmentation). This leads to the calculation of the optimal MGT value.
Consequently, the pixel-symbol image \(\mu(x, y,\Upomega)\) must be binarized first. We must recall that \(\mu(x, y,\Upomega)\) has only three gray-levels (11) which are determined by \(\Upomega = \langle \alpha_{1}, \alpha_{2}, \alpha_{3}\rangle.\) For our system to be robust and computationally efficient, let us consider that the threshold is the same as the one used in the construction of \(\mu(x, y,\Upomega),\) i.e., the gradient threshold MGT. The results of this process are shown in Fig. 5, for \(\Upomega = \langle 0 , 128, 255 \rangle.\)
Next, a reliable measure to compare the obtained segmented image with the ideal segmentation must be selected.
6.1 An error metric for MGT segmentation
Error measures are often used in the designing of algorithms for segmentation and classification [41–43] and edge detection for computer vision [44–46]. General principles for error measurement were enunciated in [44], in the context of edge detection. A good edge filter should exhibit:
-
Good detection. Low probability of failing to detect an edge, and low probability of incorrectly labeling a background pixel as an edge.
-
Good localization. Points identified as edge pixels should be as close as possible to the center of the true edge.
-
Unicity. There should be only one response to a single edge.
Optimal edge filtering involves a tradeoff between good detection and good localization. Error measures for detection and localization were surveyed in [47–49]. Our main interest focuses on a direct comparison between the ideal segmentation S and the resulting pixel-symbol image S′.
Definition 3
Let S be the ideal binary segmented image and let S′ be the segmented image obtained from some segmentation threshold MGT and some SE canonical base \(\Uplambda.\) Then, d(S, S′) is a distance metric iff the following four properties hold:
-
(i)
\(d(S,S^{\prime})\geq 0.\)
-
(ii)
\(d(S,S^{\prime})= 0 \Longleftrightarrow S=S^{\prime}.\)
-
(iii)
\(d(S,S^{\prime})= d(S^{\prime},S).\)
-
(iv)
\(d(S,S^{\prime})\leq d(S,S^{\prime\prime})+d(S^{\prime\prime},S^{\prime}), \)
where \(S^{\prime\prime}\) is another MGT segmented image.
Since our segmentation scheme is based on Mathematical Morphology, one of the error metrics related to this kind of operation is the Hausdorff distance, which has been used, for instance, to interpolate noisy images by means of morphological operators [50, 51].
The Hausdorff distance is a technique for comparing sets of points; let us assume, thus, that a binary image S can be represented as a set \(\Upphi,\) so that \(\Upphi=\{s \in S: \Upphi(s)=1\}.\) Consequently, for each pixel s in a model image S, the Hausdorff distance computes its proximity to any pixel \(s^{\prime}\) in a new binary image \(S^{\prime},\) and vice versa. This metric is employed in pattern recognition to determine the degree of resemblance between two objects (see [51]) and, as a result, it can be used to evaluate how similar two binary segmented images S and S′ are.
Definition 4
If \(\delta(s,s^{\prime})\) is the distance between two pixels s and \(s^{\prime},\) then the shortest distance between a pixel \(s \in \Upphi\) and the set \(\Upphi^{\prime}\) is defined by:
\[ d(s,\Upphi^{\prime})=\min_{s^{\prime} \in \Upphi^{\prime}} \delta(s,s^{\prime}), \qquad (12) \]
where \(\Upphi^{\prime}=\{s^{\prime} \in S^{\prime}: \Upphi^{\prime}(s^{\prime})=1\}.\)
The distance function has a Lipschitz property:
\[ \left|d(s,\Upphi^{\prime})-d(t,\Upphi^{\prime})\right| \leq \delta(s,t) \qquad (13) \]
for any \(s,t \in \Upphi.\) In particular, \(d(\cdot,S^{\prime})\) is uniformly continuous. Closed sets are characterized by their distance functions: \(d(\cdot,S) \equiv d(\cdot,S^{\prime})\) iff \(S=S^{\prime}.\)
Definition 5
Let \(\Upphi=\{s_{1},\ldots,s_{m}\}\) be the set that represents the ideal segmentation S of image F and let \(\Upphi^{\prime}=\{s^{\prime}_{1},\ldots,s^{\prime}_{n}\}\) be the set that represents the MGT segmented image. The Hausdorff distance \(H(\Upphi,\Upphi^{\prime})\) is defined as:
\[ H(\Upphi,\Upphi^{\prime}) = \max\left\{\max_{s \in \Upphi} d(s,\Upphi^{\prime}),\ \max_{s^{\prime} \in \Upphi^{\prime}} d(s^{\prime},\Upphi)\right\}. \qquad (14) \]
As a result, the Hausdorff distance gives an interesting measure of the mutual proximity of two sets of points, by indicating the maximum distance from any pixel of one set to the other set. Unfortunately, this metric has an inherent problem: the presence of outlier points, mainly caused by noise. That is, the Hausdorff distance is very sensitive to noise: since it takes a supremum of distance values, a single outlier pixel can drive \(H(\Upphi,\Upphi^{\prime})\) to its maximum possible value.
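This outlier sensitivity is easy to reproduce. Below is a minimal sketch (not the authors' code) that computes \(H(\Upphi,\Upphi^{\prime})\) with SciPy's Euclidean distance transform, taking δ as the Euclidean pixel distance; a single outlier pixel makes the supremum jump:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def hausdorff(phi, phi_prime):
    """Hausdorff distance H(Phi, Phi') between two binary edge maps (14)."""
    # distance_transform_edt(mask) gives, at each pixel, the Euclidean
    # distance to the nearest zero of `mask`; with mask = ~phi the zeros
    # are exactly the foreground pixels of phi.
    d_to_prime = distance_transform_edt(~phi_prime)  # d(., Phi') everywhere
    d_to_phi = distance_transform_edt(~phi)          # d(., Phi)  everywhere
    return float(max(d_to_prime[phi].max(), d_to_phi[phi_prime].max()))

# A horizontal edge segment, and the same segment plus one noisy pixel.
line = np.zeros((32, 32), dtype=bool)
line[10, 5:25] = True
noisy = line.copy()
noisy[30, 30] = True  # a single outlier pixel dominates H
```

Here `hausdorff(line, line)` is 0, while the lone extra pixel pushes `hausdorff(line, noisy)` above 20.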
As shown before, one of the main tools to remove noise during image pre-processing is some kind of smoothing, such as a median or a Gaussian filter. Since the Hausdorff distance is highly sensitive to noise, let us suppress the noise effects by replacing the supremum in (14) with a mean, or pth-order mean. This is shown next:
Definition 6
Let \(\Updelta^p(\Upphi, \Upphi^{\prime})\) be a distance metric, defined as the pth-order mean difference of the distance functions of two sets: \(\Upphi\) (the ideal binary segmentation) and \(\Upphi^{\prime}\) (the binary pixel-symbol image).
Then, for \(1\leq p <\infty\) we define:
\[ \Updelta^p(\Upphi, \Upphi^{\prime}) = \left[\frac{1}{N}\sum_{s}\left|d(s,\Upphi)-d(s,\Upphi^{\prime})\right|^{p}\right]^{1/p}, \qquad (15) \]
where the sum runs over all the pixels s of the image raster and N is the total number of such pixels.
The result is topologically equivalent to \(H(\Upphi,\Upphi^{\prime}),\) since distance functions \(d(s,\Upphi)\) are continuous Lipschitz functions by (13) (see [49]).
\(\Updelta^p(\Upphi, \Upphi^{\prime})\) is indeed a metric: properties (i) and (iii) of Definition 3 are directly satisfied by (15); property (iv) follows from (13); and, since closed sets are characterized by their distance functions (\(d(\cdot,\Upphi) \equiv d(\cdot,\Upphi^{\prime})\) iff \(\Upphi=\Upphi^{\prime}\)), (15) also satisfies property (ii) of Definition 3.
Intuitively, \(\Updelta^p(\Upphi, \Upphi^{\prime})\) measures how suitable an estimated image is as a substitute for the real one. The parameter p determines the relative importance of large localization errors. For large values of p, the effect is similar to the Hausdorff metric, since the pth-order mean tends to the supremum:
\[ \lim_{p\to\infty} \left[\frac{1}{N}\sum_{s}\left|d(s,\Upphi)-d(s,\Upphi^{\prime})\right|^{p}\right]^{1/p} = \max_{s}\left|d(s,\Upphi)-d(s,\Upphi^{\prime})\right|. \]
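A sketch of (15) follows, again using SciPy's Euclidean distance transform; the form below, a pth-order mean of \(|d(s,\Upphi)-d(s,\Upphi^{\prime})|\) over the whole raster in the spirit of Baddeley's error metric [49], is our reading of the definition, not the authors' code:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def delta_p(phi, phi_prime, p=2):
    """pth-order mean distance (15): a robust alternative to the Hausdorff sup."""
    d_phi = distance_transform_edt(~phi)          # d(s, Phi)  for every pixel s
    d_prime = distance_transform_edt(~phi_prime)  # d(s, Phi') for every pixel s
    # .mean() divides by N, the total number of raster pixels
    return float((np.abs(d_phi - d_prime) ** p).mean() ** (1.0 / p))

# Same one-outlier example as before: the mean tames the outlier.
line = np.zeros((32, 32), dtype=bool)
line[10, 5:25] = True
noisy = line.copy()
noisy[30, 30] = True
```

For this pair, `delta_p(line, noisy)` stays far below the Hausdorff value of roughly 21, and the metric is symmetric by construction.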
6.2 Obtaining the optimal threshold
At this point, we have defined a distance metric to compare the ideal segmentation (represented by the set \(\Upphi\)) with the images segmented by the MGT scheme, which are represented by the sets \(\Upphi^{\prime}_i,\) for \(i=1,\ldots,M,\) corresponding to a set of M different MGT values.
Therefore, we seek the MGT threshold that minimizes the distance function \(\Updelta^p(\Upphi, \Upphi^{\prime}_i),\) since this minimum indicates the MGT segmented image closest to the ideal segmentation. That is to say:
\[ MGT_{\rm opt} = \arg\min_{i}\ \Updelta^p(\Upphi, \Upphi^{\prime}_i), \quad i=1,\ldots,M. \]
The main steps of our algorithm are shown in Fig. 6.
Hence, the method for obtaining the optimal MGT threshold can be summarized as follows:
- Select a set of M different MGT values.
- From (11), define a set of M pixel-symbol maps, \(\mu_i(x, y,\Upomega),\) for \(i=1,\ldots,M.\)
- Binarize the pixel-symbol maps and obtain the sets \(\Upphi^{\prime}_i.\)
- Compute MGT opt by minimizing \(\Updelta^p(\Upphi, \Upphi^{\prime}_i).\)
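The four steps above can be sketched as follows; `segment_mgt` and the pixel-disagreement distance are hypothetical stand-ins for the pixel-symbol computation, binarization and \(\Updelta^p,\) used here only to make the search loop concrete:

```python
import numpy as np

def find_mgt_opt(gradient, phi_ideal, segment_mgt, distance, M=9, p=2):
    """Search M equispaced MGT thresholds for the one minimizing the distance."""
    # Candidate thresholds as fractions of the maximum gradient value.
    thresholds = np.linspace(0.1, 0.9, M) * gradient.max()
    dists = [distance(phi_ideal, segment_mgt(gradient, t), p) for t in thresholds]
    i = int(np.argmin(dists))
    return float(thresholds[i]), float(dists[i])

# Toy usage with stand-ins: a plain threshold segmenter and a
# pixel-disagreement distance (assumptions, for illustration only).
grad = np.arange(100, dtype=float).reshape(10, 10)
ideal = grad >= 0.7 * grad.max()          # "ideal" map sits at 70% of max
seg = lambda g, t: g >= t
dist = lambda a, b, p=2: float(np.mean(a ^ b))
t_opt, d_opt = find_mgt_opt(grad, ideal, seg, dist)
```

In this toy setting the search recovers the 0.7 × max(gradient) threshold exactly, with zero distance to the ideal map.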
As stated in Sect. 5.1, almost all morphological operations can be defined in terms of the basic erosion and dilation. Since erosion and dilation were defined as a set of shifts in Sect. 4, we can pre-calculate the necessary shifts of the input images for any application (according to the selected structuring element) and then compute the global infimum/supremum (erosion/dilation) as needed. For our experiments we used a 3 × 3 square SE, so the number of 1-pixel shifts in each coordinate direction is p = q = r = s = 1, i.e., 4 shifts in total.
As a result, the optimal MGT segmented image can be obtained following the aforementioned procedure. According to (8)–(10), which are the expressions involving Mathematical Morphology operations needed to obtain the pixel-symbol image, a total of three erosions and three dilations are required to compute \(\mu(x, y,\Upomega);\) that is, only the two basic operations are needed for each of the M pixel-symbol maps and, if the shifts are pre-computed, the process reduces to two global infimum/supremum computations, one for the erosion and one for the dilation.
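As an illustration of the shift-based formulation, the sketch below realizes the 3 × 3 erosion and dilation as infima/suprema of 1-pixel shifts, using a separable row/column decomposition (two passes over the four shifts); the wrap-around border of `np.roll` is a simplification, not the paper's border handling:

```python
import numpy as np

def erode3x3(f):
    """Erosion by a 3x3 flat SE as a global infimum of 1-pixel shifts."""
    # Horizontal pass: infimum of left shift, identity, right shift.
    h = np.minimum(np.minimum(np.roll(f, 1, axis=1), f), np.roll(f, -1, axis=1))
    # Vertical pass completes the 3x3 neighborhood.
    return np.minimum(np.minimum(np.roll(h, 1, axis=0), h), np.roll(h, -1, axis=0))

def dilate3x3(f):
    """Dilation by a 3x3 flat SE as a global supremum of 1-pixel shifts."""
    h = np.maximum(np.maximum(np.roll(f, 1, axis=1), f), np.roll(f, -1, axis=1))
    return np.maximum(np.maximum(np.roll(h, 1, axis=0), h), np.roll(h, -1, axis=0))

def morph_gradient(f):
    """Morphological gradient: dilation minus erosion."""
    return dilate3x3(f) - erode3x3(f)

# A single bright pixel: erosion removes it, dilation grows it to a 3x3 block.
f = np.zeros((7, 7))
f[3, 3] = 1.0
```

The separable decomposition is what makes the pre-computed-shift strategy cheap: each pass reuses the same 1-pixel shifted copies of the input.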
After defining the distance metric and the minimization process, let us evaluate the performance of our segmentation scheme on a set of real-world images.
7 Experiments
Let us now show the results of some experiments completed with our model. We first focus on obtaining the minimum distance between the ideal segmentation and the MGT segmented images. After that, our method is compared with others to highlight the advantages of our approach.
7.1 Finding MGT opt
The tests were performed on a series of real-world images, whose pixel-symbol images were calculated for different MGT values. Then, after the binarization process, the distance \(\Updelta^p(\Upphi, \Upphi^{\prime})\) was computed. For the tests, M = 9 equispaced MGT values were chosen; we also included the gradient mean as an MGT value.
In all our tests, what we call the ideal segmentation was computed using a semi-automated procedure: we first applied the well-known Sobel algorithm to detect edges and then binarized each input image using a threshold determined by the mean and standard deviation of its grayscale values. Next, in order to obtain the true edges of the images, isolated points were removed through a morphological closing operation. All these steps were supervised by a human so as to achieve the best possible ideal segmentation.
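A rough, unsupervised sketch of that pipeline is shown below; the exact mean/std combination used for the threshold is an assumption, and the SciPy calls stand in for the human-supervised steps:

```python
import numpy as np
from scipy.ndimage import sobel, binary_closing

def ideal_segmentation(image):
    """Sobel edge magnitude, mean+std threshold (assumed form), then closing."""
    gx = sobel(image, axis=1)
    gy = sobel(image, axis=0)
    mag = np.hypot(gx, gy)
    t = mag.mean() + mag.std()  # assumed combination of mean and std deviation
    return binary_closing(mag > t)

# A vertical step edge: the pipeline should mark the transition columns.
img = np.zeros((20, 20))
img[:, 10:] = 1.0
edges = ideal_segmentation(img)
```

On this synthetic step, the detected edge pixels cluster along the true transition, while flat regions stay empty.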
Table 1 and Fig. 7 show the distance results, with different MGT values, for the image in Fig. 3a, for p = 2.
In this case, the lowest distance is obtained when MGT opt = 0.7 × max(ρ B (F)).
Figure 8 compares the ideal segmentation with the MGT segmentation having the lowest \(\Updelta^p(\Upphi, \Upphi^{\prime}).\) As shown, the segmentation for the selected MGT value is quite similar to the ideal one.
Our segmentation scheme was applied to an empirically collected database of 875 real-world images; to do this, after selecting a wide range of MGT values, the distance between the ideal segmentation and each MGT segmented image was computed. Figure 9 shows some of the images included in the database.
Let us consider now the MGT segmentation results for the first image in Fig. 9. Table 2 and Fig. 10 show the distance results, for different MGT values and p = 2. Then, Fig. 11 compares the resulting segmented image for the optimal MGT value (MGT opt = 0.8 × max(ρ B (F))) with the ideal segmentation.
Finally, let us show the results obtained after applying our segmentation scheme to all the images in the database. Table 3 and Fig. 12 show the mean distance results for our database.
The mean minimum distance, \(\overline{\Updelta^p}(\Upphi, \Upphi^{\prime})= 0.2874,\) is obtained when MGT opt = 0.7 × max(ρ B (F)); as a consequence, this MGT value is the optimal one for our database, since in this case our segmentation scheme comes very close to the ideal segmentation. Moreover, over all the performed experiments, the mean MGT value yielding the minimum distance was \(\overline{MGT}_{\hbox{min}(\Updelta) } = 0.7263 \times \hbox{max}(\rho_B(F)),\) which again shows that choosing this value for MGT opt is a sound way to achieve good segmentation results.
As a final remark, we must point out that the threshold could be adaptively updated in order to dynamically consider the real conditions (such as varying lighting conditions or shading effects, among others) in which each image may have been captured by the camera of the vision system.
7.2 Comparison with other methods
In image processing, Markov Random Field (MRF) modeling has received a great deal of attention over the past 15–20 years, owing to its powerful capacity to integrate contextual visual information. The model has been applied to several vision problems, such as image restoration and segmentation [52–55]. However, due to the great number of pixels over which the MRF needs to be defined, obtaining optimal labels is usually computationally expensive, which makes it difficult to apply the MRF model to real scene domains. In such cases, multiscale techniques can be applied to reduce the computational complexity.
In some of our previous works (see [9, 56]), we developed multiscale MRF models. Before applying our MRF image segmentation scheme, the optimal scale is determined; then a preliminary segmentation is made using a modified segmentation algorithm based on the partition mode test (MPMT) [57] and, finally, the MRF model is defined on the segmented regions. To find the best segmentation results we used the well-known Simulated Annealing algorithm [58, 59]; in our experiments, the best results were obtained for scale 5.
Let us compare the results of the MGT segmentation algorithm with the ones obtained from the Simulated Annealing approach, for the same database. In Figs. 13 and 14 the results of some of the segmented images for the optimal MGT value and for scale 5 in the Simulated Annealing (SA) algorithm can be observed.
From these experiments, the visual results for the stochastic method are very similar to those obtained with our gradient approach. Let us now consider the distances from the ideal segmentation to the optimal results of the MGT and SA methods (see Table 4), for the images in Figs. 13 and 14. We have also included the mean distance results for all the images in the database and, in Table 5, the computation time required to obtain the optimal segmentation in each case. The experiments were run on an Intel Core 2 CPU with 2,048 MB of RAM and were developed using the Image Processing Toolbox of Matlab.
From these results, we must remark that the distances between the ideal segmentation and the MGT and SA segmentations are comparable, although the distance is always lower for the MGT approach. Moreover, regarding running time, MGT can run up to nine times faster than Simulated Annealing with scale 5, which makes the MGT algorithm suitable for: (a) achieving a segmented image of good quality and (b) being deployed where real-time constraints apply.
8 Conclusions
To sum up, in this paper we have described a novel approach to image segmentation, based on the selection of what we called the optimal MGT. To do this, the shifted morphological operators were first described, and a parallel architecture was proposed to improve their speed when working in a real-time system.
After that, the shifted morphological operators were used to compute the gradient and the Laplacian and, as a result, the pixel-symbol image was obtained. This image was then binarized and the distance \(\Updelta^p\) between the ideal segmentation and the MGT segmentation was computed. Finally, the MGT value with the lowest \(\Updelta^p\) distance was selected as the optimal threshold value.
Throughout our paper, the advantages of using a model that combines global knowledge about image information with an efficient edge detection algorithm have been shown. Experimental results (performed on an 875-image database) proved that our model is robust and could be applied to real-time imaging. The current work has therefore demonstrated the efficiency and robustness of the MGT segmentation algorithm.
As future work, the results of our research could be extended to object classification and recognition. It would also be interesting to consider new experiments in different environments, such as image sequences obtained from a camera placed on a robot platform, where real-time constraints have a great influence on the final recognition results.
References
Weszka JS (1978) A survey of threshold selection techniques. Comput Graphics Image Process 7(2):259–265
Zhang YJ (1996) A survey on evaluation methods for image segmentation. Pattern Recogn 29(8):1335–1346
Freixenet J, Muñoz X, Raba D, Marti J, Cufi X (2002) Yet another survey on image segmentation: region and boundary information integration. In: Proceedings of the 7th European conference on computer vision ECCV 2002. Lecture notes in computer science, vol 2352. Springer, Berlin, pp 408–422
Paglieroni DW (2004) Design considerations for image segmentation quality assessment measures. Pattern Recogn 37(8):1607–1617
Ruiz-Del-Solar J, Verschae R (2004) Robust skin segmentation using neighborhood information. In: Proceedings of the 2004 international conference on image processing ICIP 2004, IEEE, vol 1, pp 207–210
Couprie M, Bezerra FN, Bertrand G (2001) Topological operators for grayscale image processing. J Electr Imaging 10(4):1003–1015
Dokladal P, Bloch I, Couprie M, Ruijters D, Urtasun R, Garnero L (2003) Segmentation of 3D head MR images using morphological reconstruction under constraints and automatic selection of markers. Pattern Recogn 36(10):2463–2478
Sclaroff S, Liu L (2001) Deformable shape detection and description via model-based region grouping. IEEE Trans Pattern Anal Mach Intell 23(5):475–489
Arques P, Compañ P, Molina R, Pujol M, Rizo R (2003) Minimization of an energy function with robust features for image segmentation. Kybernetes 32(9/10):1481–1491
Martin D, Fowlkes C, Malik J (2004) Learning to detect natural image boundaries using local brightness, color and texture cues. IEEE Trans Pattern Anal Mach Intell 26(5):530–549
Ohya A, Kosaka A, Kak A (1998) Vision-based navigation by a mobile robot with obstacle avoidance using single-camera vision and ultrasonic sensing. IEEE Trans Robot Autom 14:969–978
Winters N, Santos-Victor J (2002) Information sampling for vision-based robot navigation. Robot Autonom Syst 41:145–159
Yuen DCK, MacDonald BA (2005) Vision-based localization algorithm based on landmark matching, triangulation, reconstruction, and comparison. IEEE Trans Robot Autom 21(2):217–226
Wolf J, Burgard W, Burkhardt H (2002) Using an image retrieval system for vision-based mobile robot localization. In: Lew MS, Sebe N, Eakins JP (eds) International Conference on image and video retrieval (CIVR 2002). Lecture notes in computer science, vol 2383. Springer, Berlin, pp 108–119
Se S, Lowe DG, Little JJ (2005) Vision-based global localization and mapping for mobile robots. IEEE Trans Robot Autom 21(3):364–375
Pujol FA, García JM, Pujol M, Rizo R, Pujol MJ (2004) Selection of an automated morphological gradient threshold for image segmentation. In: Sanfeliu A, Martinez JF, Carrasco JA (eds) Progress in pattern recognition, image analysis and applications: 9th Iberoamerican congress on pattern recognition, CIARP 2004. Lecture notes in computer science, vol 3287. Springer, Berlin, pp 92–99
Lucchese L, Mitra SK (2001) Color image segmentation: a state-of-the-art survey. In: Proceedings of the Indian National Science Academy (INSA-A), vol 67, no 2, pp 207–221
Zouagui T, Benoit-Cattin H, Odet C (2004) Image segmentation functional model. Pattern Recogn 37(9):1785–1795
Brink AD (1989) Gray-level thresholding of images using a correlation criterion. Pattern Recogn Lett 9(5):335–341
Maddah M, Zou KH, Wells WM, Kikinis R, Warfield SK (2004) Automatic optimization of segmentation algorithms through simultaneous truth and performance level estimation (STAPLE). In: Barillot C, Haynor DR, Hellier P (eds) Medical image computing and computer-assisted intervention MICCAI 2004: 7th international conference, Saint-Malo, France. Lecture notes in computer science, vol 3216. Springer, Berlin, pp 274–282
Puzicha J, Hofmann T, Buhmann J (1999) Histogram clustering for unsupervised image segmentation. In: Proceedings of the IEEE Computer Society conference on computer vision and pattern recognition, CVPR’99, vol 2, pp 602–608
Saha PK, Udupa JK (2001) Optimum image thresholding via class uncertainty and region homogeneity. IEEE Trans Pattern Anal Mach Intell 23(7):689–706
Donoho DL (1995) De-noising by soft-thresholding. IEEE Trans Inf Theory 41(3):613–627
Pollak I, Willsky AS, Krim H (2000) Image segmentation and edge enhancement with stabilized inverse diffusion equations. IEEE Trans Image Process 9(2):256–266
Chen J, Sato Y, Tamura S (2000) Orientation space filtering for multiple orientation line segmentation. IEEE Trans Pattern Anal Mach Intell 22(5):417–429
Colombo C, Comanducci D, Del Bimbo A (2006) Camera calibration with two arbitrary coaxial circles. In: Proceedings of the 9th European conference on computer vision ECCV 2006. Lecture notes in computer science, vol 3951. Springer, Berlin, pp 265–276
Lobregt S, Viergever M (1995) A discrete dynamic contour model. IEEE Trans Med Imaging 14(1):12–24
Kass M, Witkin A, Terzopoulos D (1988) Snakes: active contour models. Int J Comput Vis 1(4):321–331
Chu CC, Aggarwal JK (1993) The integration of image segmentation maps using region and edge information. IEEE Trans Pattern Anal Mach Intell 15(12):1241–1252
Bhalerao A, Wilson R (2001) Unsupervised image segmentation combining region and boundary estimation. Image Vis Comput 19(6):353–368
Serra J (1982) Image analysis and mathematical morphology. Academic Press, New York
Serra J (1988) Image analysis and mathematical morphology: theoretical advances, vol. 2. Academic Press, London
Goutsias J, Vincent L, Bloomberg DS (2000) Mathematical morphology and its applications to image and signal processing. Kluwer Academic Press, Boston
Soille P (2003) Morphological image analysis: principles and applications. Springer, New York
d’Ornellas MC, van den Boomgaard R (2003) The state of art and future development of morphological software, towards generic algorithms. Int J Pattern Recogn Artif Intell 17(2):231–255
Pina P, Ribeiro L, Muge F (2001) A mathematical morphology contribution to study some aspects of hydrogeological systems. Comput Geosci 27(9):1061–1070
Pujol FA, García JM, Fuster A, Pujol M, Rizo R (2002) Use of mathematical morphology in real-time path planning. Kybernetes 31(1):115–123
Basu M, Su M (2001) Image smoothing with exponential functions. Int J Pattern Recogn Artif Intell 15:735–752
Ben Hamza A, Luque PL, Martínez J, Román R (1999) Removing noise and preserving details with relaxed median filters. J Math Imaging Vis 11:161–177
Cardoso JS, Corte-Real L (2005) Toward a generic evaluation of image segmentation. IEEE Trans Image Process 14(11):1773–1782
Shepp L, Vardi Y (1982) Maximum likelihood reconstruction for emission tomography. IEEE Trans Med Imaging 1(2):113–122
Besag J (1986) On the statistical analysis of dirty pictures. J R Stat Soc B 48:259–302
Oyama S, Tanaka K (2006) Learning a distance metric for object identification without human supervision. In: Knowledge discovery in databases: 10th European conference on principles and practice of knowledge discovery in databases, PKDD 2006. Lecture notes in computer science, vol 4213. Springer, Berlin, pp 609–616
Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8(6):679–698
Grimson WEL (1990) Object recognition by computer—the role of geometric constraints. MIT Press, Cambridge
Tagare HD, de Figueiredo RJP (1990) On the localization performance measure and optimal edge detection. IEEE Trans Pattern Anal Mach Intell 12(12):1186–1190
Peli T, Malah D (1982) A study on edge detection algorithms. Comput Graphics Image Process 20:1–21
van Vliet LJ, Young IT, Beckers GL (1989) A nonlinear laplace operator as edge detector in noisy images. Comput Vis Graphics Image Process 45:167–195
Baddeley AJ (1992) An error metric for binary images. In: Forstner W, Ruwiedel S (eds) Robust computer vision: quality of vision algorithms. In: Proceedings, international workshop on robust computer vision. Wichmann, Karlsruhe, pp 59–78
Dougherty ER (1991) Application of the Hausdorff metric in gray-scale Mathematical Morphology via truncated umbrae. J Vis Commun Image Represent 2:177–187
Huttenlocher DP, Klanderman GA, Rucklidge WJ (1993) Comparing images using the Hausdorff distance. IEEE Trans Pattern Anal Mach Intell 15(9):850–863
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741
Geman D (1991) Random fields and inverse problems in imaging. Lecture notes in mathematics, vol 1427. Springer, Berlin, pp 113–193
Li SZ (2001) Markov random field modeling in image analysis. Springer, New York
Modestino JW, Zhang J (1992) A Markov random field model-based approach to image interpretation. IEEE Trans Pattern Anal Mach Intell 14(6):606–615
Arques P, Compañ P, Molina R, Pujol M, Rizo R (2002) A cybernetic approach to the multiscale minimization of energy function: grey level image segmentation. Kybernetes 31(3–4):596–608
Suk M, Chung SM (1983) A new image segmentation technique based on partition mode test. Pattern Recogn 16(5):469–480
Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680
Yalabik N, Yalabik C, Goktepe M, Atalay V (1998) Unsupervised texture based image segmentation by simulated annealing using Markov random field and Potts models. In: Proceedings of the 14th international conference on pattern recognition (ICPR’98), vol 1. IEEE Computer Society, Washington, DC, pp 820–822
Pujol, F.A., Pujol, M., Rizo, R. et al. On searching for an optimal threshold for morphological image segmentation. Pattern Anal Applic 14, 235–250 (2011). https://doi.org/10.1007/s10044-011-0215-0