INTRODUCTION

Image segmentation consists in partitioning a digital image, represented as a region, into a set of nonoverlapping connected subregions (segments) the elements of which are similar with respect to some characteristic and differ from the elements of adjacent regions [14, 15]. This raises the problem of choosing the parameters of image segmentation algorithms, which are set so as to obtain the best image partition. The quality of a partition can be evaluated by an expert (visual estimation) or based on a quantitative index.

Survey [33] provides a rather complete classification of methods for evaluating image segmentation quality. Supervised and unsupervised methods can be distinguished as the main groups of these methods. A separate group includes methods that use classifiers trained on certain sets of features extracted from input images; these classifiers are capable of predicting image segmentation results.

With supervised methods, the segmentation result is usually compared with an image segmented by a specialist and taken as a ground truth segmentation [2]. There can be several ground truth segmentations provided by different experts. The quality of image segmentation can be characterized by various measures that describe edge detection error, consistency of segments, overlap of segments (Szymkiewicz–Simpson coefficient), etc. In [2, 20], the precision and recall measures were used to compare segment boundaries. In [19], Martin et al. proposed global and local consistency errors as measures for comparing segments in the output and ground truth segmentations. Other measures for evaluating segmentation quality were discussed in [6, 33]. If the image segmentation operation is considered as a process of pixel clustering, then set-theoretical, statistical, and information-theoretical measures [33] are used to compare data clustering results. The most popular measures are the Chi-square, the Rand index [26] and its modifications [31], the Jaccard index [33], the Fowlkes–Mallows measure [10], mutual information and normalized mutual information [30], and variation of information [21]. These measures allow one to compare different variants of image segmentation.
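To make the clustering-comparison measures concrete, the Rand index [26] and the variation of information [21] can both be computed from the contingency table of two labelings of the same pixel grid. The following NumPy sketch is purely illustrative (the function names are ours, not from the cited works):

```python
import numpy as np

def contingency(a, b):
    """Joint counts n_ij of segment labels assigned to the same pixels."""
    ra = np.unique(np.ravel(a), return_inverse=True)[1]
    rb = np.unique(np.ravel(b), return_inverse=True)[1]
    table = np.zeros((ra.max() + 1, rb.max() + 1))
    np.add.at(table, (ra, rb), 1.0)
    return table

def rand_index(a, b):
    """Rand index [26]: fraction of pixel pairs on which a and b agree."""
    t = contingency(a, b)
    n = t.sum()
    pairs = n * (n - 1) / 2.0
    same = (t ** 2).sum() - 0.5 * ((t.sum(1) ** 2).sum() + (t.sum(0) ** 2).sum())
    return 1.0 + same / pairs

def variation_of_information(a, b):
    """VI(a, b) = 2 H(a, b) - H(a) - H(b), a metric on partitions [21] (nats)."""
    p = contingency(a, b)
    p /= p.sum()
    h = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    return 2.0 * h(p.ravel()) - h(p.sum(1)) - h(p.sum(0))
```

For identical labelings the Rand index is 1 and the variation of information is 0, and renaming segment labels changes neither value, which is what makes such measures suitable for comparing different variants of image segmentation.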

Based on similarity/dissimilarity measures, a number of automated image segmentation methods were developed. In [7, 18], empirical quality functionals were proposed that take into account the number of image segments and variation of color characteristics within the segments of output images.

In [27], a two-step algorithm for segmentation of MR and CT images was proposed; it represents the image segmentation process by using models of direct and reversed information channels. At the first step, the image is structured into quasi-homogeneous regions, based on the condition of maximum mutual information between the input image and the resulting partition. At the second step, the intensity levels of the input image histogram are clustered based on the minimum loss of mutual information between the clustering result and the partition obtained at the first step. Since the dependences of the mutual information at the input and output of the direct and reversed channels do not have extreme properties, the authors used additional conditions to obtain the best segmentation result (the number of segments, the probability of error, and the ratio between the mutual information values of the reversed and direct channels).

In [32], an information-theoretic method for evaluation of image segmentation results was developed. The entropy-based quality measure takes into account the heterogeneity of pixel characteristics within segments and the complexity of partitioning the image into segments. According to the authors, the best image segmentation results correspond to the local minima of the entropy-based measure. This measure allows one to compare the qualities of different segmentations obtained with the same algorithm at different parameter values, as well as compare the results provided by different algorithms.

A heuristic method based on the iterative application of the mean shift algorithm [8] was described in [29]. As a criterion for selecting the best partition, the relative rate of decrease in the entropy of the image obtained at the filtering stage between iterations of the mean shift algorithm with different parameter values is used. The threshold value for this criterion is chosen empirically. A disadvantage of this method is the lack of universality as it is oriented to a particular image segmentation algorithm.

An alternative approach is based on fusion of segmented images, rather than on finding an image partition that optimizes a selected quality measure. In [22], an iterative method was proposed that fuses a set of coarsely segmented images obtained in different color spaces for different values of parameters of a clustering algorithm. This idea was successfully implemented to fuse ensembles of data partitions obtained by clustering [12, 30]. For evaluation of image segmentation quality, the criterion that characterizes the mean variation of information between a fused image and each coarse segmentation was used. In [16], a generalization of the method [22] was proposed, in which the problem of fusing coarsely segmented images is solved as a multicriterion optimization problem. To evaluate the result of image segment fusion, the minimum mean variation of information is computed; to evaluate the accuracy of segment boundaries, the F-measure is used. The multicriterion approach improves segmentation of images from the Berkeley Segmentation Dataset (BSDS500) as compared to the single-criterion method.

The group of methods considered above consists mainly of heuristic methods, which provide good results on test sets of images.

A number of publications were devoted to automatic image segmentation based on classifiers, which are used to predict the quality of image segmentation. In [17], a method for segmentation error evaluation based on a regression algorithm was proposed. The method consists of the following stages: computing the features that characterize the image segmentation result and training the regression algorithm on a set of images with available ground truth segmentations to predict the error. The method can be used to set the parameters of the image segmentation algorithm or to select the best result when the input image is analyzed by several segmentation algorithms running in parallel. In [13], the parameters of the algorithm were set by estimating the similarity between the segmentation result and the original image. As a similarity measure, the weighted uncertainty index was used; it is computed from the values of normalized mutual information for the corresponding color channels of the input and segmented images [11]. Based on expert estimates of segmentation results for a series of images obtained at different parameter values, the classifier identifies regions of undersegmentation, oversegmentation, and optimal segmentation on the parameter plane. In the process of image segmentation, the parameter of the algorithm at each point of the processed image is set using an iterative procedure based on the graph-cut algorithm [9]. In recent years, methods for predicting the quality of image segmentation based on deep neural networks have appeared [28]; in this case, the Dice similarity coefficient is used as a measure.

The disadvantages of the classifier-based approach are the subjectivity of expert estimates and the fact that the trained algorithm provides acceptable results only for those classes of images that were used in the process of training.

In [4], an information-theoretical model of the human visual system was proposed. The model is based on Barlow’s hypothesis [5] about minimization of information redundancy at the initial stages of signal processing in the human visual system. It was assumed that, at the initial stages of visual perception, the redundancy of the information coming from the retina through the optic nerve is reduced. In [24, 25], using the principle of redundancy minimization [4, 5], the criterion based on the minimum of the information redundancy measure was used to find the best image partition in a set of partitions obtained at different values of the segmentation parameter. This paper investigates this criterion of image segmentation quality. To analyze the properties of the criterion, a simplified mathematical model of the image segmentation process is proposed. It is shown that there exists a minimum of the redundancy measure for the proposed model. To confirm the validity of the proposed model, a computational experiment is carried out on images from BSDS500 [2]. The results of image segmentation based on the criterion of minimum redundancy and the entropy criterion described in [29] are compared.

1 MODEL OF THE IMAGE SEGMENTATION SYSTEM

To investigate the properties of the image segmentation system, it is required to construct its mathematical model. In this section, we propose and analyze a simplified information model that is not oriented to any particular segmentation algorithm.

1.1 Image Segmentation Problem

The image segmentation operation can be represented by the following model [24]:

$$V = F(U,t),$$
(1)

where \(U:{{\mathbb{Z}}^{2}} \to \mathbb{Z}\) is the input image, \(V:{{\mathbb{Z}}^{2}} \to \mathbb{Z}\) is the segmented image, \(F\) is the operator that describes the segmentation algorithm, and \(t \in \mathbb{R}\) is the segmentation parameter. The image segmentation problem can be formulated as follows. Given input image \(U\), segmentation algorithm (1) yields a set \(\mathfrak{V} = \{ {{V}_{1}},{{V}_{2}}, \ldots ,{{V}_{q}}, \ldots ,{{V}_{Q}}\} \) of \(Q\) images for different values of parameter \(t\). It is required to select image \({{V}_{{q\min }}}\) that minimizes measure \(M(U,{{V}_{q}})\):

$${{q}_{{\min }}} = \mathop {\arg \min }\limits_q \left[ {M\left( {U,{{V}_{q}}} \right)} \right],\quad q = 1,2, \ldots ,Q.$$
(2)

In the next section, we use the information criterion of image segmentation quality proposed in [24, 25].

1.2 Information Criterion of Image Segmentation Quality

In [4], the criterion of minimum information redundancy was used as a basis for the information-theoretical model of the human visual system. In [24], the criterion of minimum information redundancy was implemented in an image segmentation algorithm. To apply the information-theoretical approach, a probabilistic model of the relationship between the original and segmented images is required.

For simplicity, we consider the process of grayscale image segmentation. Suppose that the original and segmented images are the input and output of a stochastic information system. The gray levels of the images are described by discrete random variables \(U\) and \(V\) with values \(u\) and \({v}\). Variable \(U\) has \(L\) gray levels, while \(V\) has \(l\) gray levels, \(1 \leqslant l \leqslant L\).

The image segmentation operation is represented by the following information channel model, which is similar to the model used in [23]:

$$V = F(U + \eta ),$$
(3)

where \(U\) is the signal at the input of the channel, \(V\) is the output of the channel, \(F\) is a transformation function, and \(\eta \) is channel noise. Variables \(U\) and \(\eta \) are assumed to be independent.

We use the redundancy measure [4] as a measure of image segmentation quality:

$$R = 1 - \frac{{I(U;V)}}{{C(V)}},$$
(4)

where \(I(U;V)\) is mutual information and \(C(V)\) is channel capacity. Suppose that \(C(V) = H(V)\), where \(H(V)\) is the entropy of the output. Then, taking into account that \(I(U;V) = H(V) - H(V\,|\,U)\), expression (4) takes the following form:

$$R = \frac{{H(V\,|\,U)}}{{H(V)}},$$
(5)

where \(H(V\,|\,U)\) is the conditional entropy of the channel output given input \(U\). The criterion of image segmentation quality is the minimum channel redundancy.
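As an illustration, redundancy (5) can be estimated from the joint gray-level histogram of the input and segmented images and then plugged into selection rule (2). The NumPy sketch below uses our own naming and assumes that \(V\) is encoded by gray levels (e.g., segment mean intensities) and is not constant, so that \(H(V) > 0\):

```python
import numpy as np

def redundancy(u, v, levels=256):
    """Estimate R = H(V|U) / H(V), measure (5), from the joint histogram."""
    joint = np.zeros((levels, levels))
    np.add.at(joint, (np.ravel(u), np.ravel(v)), 1.0)
    p = joint / joint.sum()
    h = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))
    h_uv, h_u, h_v = h(p.ravel()), h(p.sum(axis=1)), h(p.sum(axis=0))
    return (h_uv - h_u) / h_v          # H(V|U) = H(U,V) - H(U)

def select_best(u, candidates):
    """Selection rule (2): the candidate segmentation minimizing R(U, V_q)."""
    scores = [redundancy(u, v) for v in candidates]
    return int(np.argmin(scores)), scores

# Toy check: a deterministic quantization of U has H(V|U) = 0, hence R = 0,
# while a noisy quantization has R > 0.
rng = np.random.default_rng(0)
u = rng.integers(0, 256, size=(64, 64))
v_clean = u // 32 * 32
v_noisy = (u + rng.integers(-40, 41, size=u.shape)) % 256 // 32 * 32
best, scores = select_best(u, [v_noisy, v_clean])
```

The toy example only demonstrates the mechanics of the measure; in the experiments of Section 2 the candidates \({{V}_{q}}\) are produced by an actual segmentation algorithm.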

In the following section, we propose a simplified qualitative model of grayscale image segmentation and show that this model provides the minimum of the redundancy measure.

1.3 Information Model of Image Segmentation

To investigate the qualitative properties of the redundancy measure, we need to construct a model of joint two-dimensional discrete distribution of gray levels for the input and output of the segmentation system. This model allows us to analyze the dynamics of information measures (entropies) that characterize the process of image segmentation.

To construct the qualitative model, it is required to consider the dynamics of the joint two-dimensional discrete distribution of gray levels for the input and output of the segmentation system depending on the number of segments \(K\) in image \(V\). Suppose that grayscale image \(U\) shown in Fig. 1a is input to the image segmentation algorithm. Figures 1b–1d show two-dimensional discrete distributions for three values of \(K\) and number of grayscale levels \(L = 16\) in image \(U\). The figures illustrate the distributions of gray tones in the image segments. Each segment has a significant number of pixels of the dominant gray level, which form the peaks of the distribution, and some pixels with other gray levels. When the number of image segments is large, the distribution of gray levels within segments has sharp peaks. With enlargement of the segments and reduction in their number, the peaks are smoothed out (see Fig. 1).

Fig. 1.

Joint discrete distribution of gray levels for the input and output of the image segmentation system at three different values of \(K\) (number of image segments): (a) grayscale version of the input image (35010.jpg). Joint discrete distribution of gray levels for \(U\) and \(V\) at K = (b) 32, (c) 57, and (d) 75.

Suppose that the joint distribution of gray levels for images \(U\) and \(V\) can be represented by \(K\) components corresponding to segments of image \(V\). For simplicity, we assume that all components are of the same type and consist of \(L\) elements that correspond to the frequency of occurrence of pixels with gray level \(l\) in the \(k\)th segment of \(V\). Each of the components has a peak corresponding to the dominant gray level.

Suppose that the components have different dominant gray levels. Suppose also that \(P({{u}_{l}},{{{v}}_{k}})\) = P if \(l = k\). The relationship between the probabilities of gray levels \(l\) in the components is determined by coefficient \(\alpha \), \(0 < \alpha \leqslant 1\). For instance, in the segment of V encoded by level \({{{v}}_{1}}\), we have

$$\begin{gathered} P({{u}_{2}},{{{v}}_{1}}) = P({{u}_{3}},{{{v}}_{1}}) = \ldots = P({{u}_{L}},{{{v}}_{1}}) \\ = \alpha P({{u}_{1}},{{{v}}_{1}}) = \alpha P, \\ \end{gathered} $$
(6)

where \(\alpha = \alpha (K)\) depends on the number of segments in image \(V\). The model of the two-dimensional discrete distribution described above is shown in Fig. 2. For this model, the following relations hold: \(P({{{v}}_{k}}) = (L - 1)\alpha (K)P + P\) and \(K\left[ {\left( {L - 1} \right)\alpha (K) + 1} \right]P = 1\), which imply that \(P = 1{\text{/}}\{ K[(L - 1)\alpha + 1]\} \). Then, taking into account model (6) of the joint discrete distribution of gray levels for the input and output of the image segmentation system, we express the entropies included in formula (5) as follows:

$$\begin{gathered} H(U,V) = - \sum\limits_{l = 1}^L {\sum\limits_{k = 1}^K {P({{u}_{l}},{{{v}}_{k}})\log P({{u}_{l}},{{{v}}_{k}})} } \\ = \log K + \log \left[ {\alpha \left( {L - 1} \right) + 1} \right] - \frac{{\left( {L - 1} \right)\alpha \log \alpha }}{{\alpha \left( {L - 1} \right) + 1}} \\ \end{gathered} $$
(7)
$$\begin{gathered} H(U) = - \sum\limits_{l = 1}^L {\left\{ {\left[ {\sum\limits_{k = 1}^K {P({{u}_{l}},{{{v}}_{k}})} } \right]\log \left[ {\sum\limits_{k = 1}^K {P({{u}_{l}},{{{v}}_{k}})} } \right]} \right\}} \\ = - L{{P}_{0}}\left[ {\left( {{{K}_{0}} - 1} \right){{\alpha }_{0}} + 1} \right]\log \left[ {\left( {{{K}_{0}} - 1} \right){{\alpha }_{0}}{{P}_{0}} + {{P}_{0}}} \right]; \\ \end{gathered} $$
(8)
$$\begin{gathered} H(V) = - \sum\limits_{k = 1}^K {\left\{ {\left[ {\sum\limits_{l = 1}^L {P({{u}_{l}},{{{v}}_{k}})} } \right]\log \left[ {\sum\limits_{l = 1}^L {P({{u}_{l}},{{{v}}_{k}})} } \right]} \right\}} \\ = \log K, \\ \end{gathered} $$
(9)
$$\begin{gathered} H(V\,|\,U) = H(U,V) - H(U) \\ = \log K + \log \left[ {\alpha \left( {L - 1} \right) + 1} \right] - \frac{{\left( {L - 1} \right)\alpha \log \alpha }}{{\alpha \left( {L - 1} \right) + 1}} \\ + \;L{{P}_{0}}\left[ {\left( {{{K}_{0}} - 1} \right){{\alpha }_{0}} + 1} \right]\log \left[ {\left( {{{K}_{0}} - 1} \right){{\alpha }_{0}}{{P}_{0}} + {{P}_{0}}} \right], \\ \end{gathered} $$
(10)

where \({{P}_{0}}\), \({{K}_{0}}\), and \({{\alpha }_{0}}\) are quantities that correspond to image \(U\).
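The closed forms (7) and (9) can be verified numerically by building the model distribution (6) directly. The NumPy sketch below uses our own naming and assumes \(K \leqslant L\) with the dominant level of segment \(k\) placed at \({{u}_{k}}\):

```python
import numpy as np

def model_joint(K, L, alpha):
    """Model (6): P(u_l, v_k) = P for l = k and alpha * P otherwise,
    with P = 1 / (K * ((L - 1) * alpha + 1)) fixed by normalization."""
    P = 1.0 / (K * ((L - 1) * alpha + 1.0))
    joint = np.full((L, K), alpha * P)
    joint[np.arange(K), np.arange(K)] = P   # peak at the dominant gray level
    return joint

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

K, L, alpha = 32, 256, 0.05
joint = model_joint(K, L, alpha)
s = (L - 1) * alpha + 1.0
h_uv_closed = np.log(K) + np.log(s) - (L - 1) * alpha * np.log(alpha) / s  # (7)
h_v_closed = np.log(K)                                                     # (9)
```

The marginal over \(l\) is uniform over the \(K\) segments, which is why \(H(V) = \log K\) in (9).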

Fig. 2.

Model of the joint discrete distribution of gray levels for the input and output images (\(U\) and \(V\)).

By substituting expressions (7)–(10) into (5), we obtain

$$R = 1 + \frac{{\log \left[ {\alpha \left( {L - 1} \right) + 1} \right] - \frac{{\left( {L - 1} \right)\alpha \log \alpha }}{{\alpha \left( {L - 1} \right) + 1}} + L{{P}_{0}}\left[ {\left( {{{K}_{0}} - 1} \right){{\alpha }_{0}} + 1} \right]\log \left[ {\left( {{{K}_{0}} - 1} \right){{\alpha }_{0}}{{P}_{0}} + {{P}_{0}}} \right]}}{{\log K}}.$$
(11)

It should be noted that, with decreasing number of segments \(K\) in image V, the distribution (see Fig. 1) changes. In the model shown in Fig. 2, this transformation of the distribution corresponds to an increase in coefficient \(\alpha \). This dependence can be represented by a monotonic function:

$$\alpha \left( K \right) = \frac{a}{{1 + {{e}^{{c(K - b)}}}}} + d,$$
(12)

where \(a\), \(b\), \(c\), and \(d\) are parameters (\(b > 1\) and \(d = 1 - a\)). The behavior of dependence \(\alpha \left( K \right)\) for different values of parameter \(c\) is shown in Fig. 3.

Fig. 3.

Function \(\alpha \left( K \right)\) at different values of parameter \(c\).

Let us show that redundancy measure \(R\) defined by expressions (11), (7)–(10), and (12) has a minimum. Variable \(K\) takes values from a set of integers. To analyze function \(R\), in formulas (7)–(12) we replace integer variable \(K\) with real variable \(z\). For simplicity, we assume that \(\log (x)\) in expression (11) is a natural logarithm. Then, derivative \(dR{\text{/}}dz\) has the following form:

$$\begin{gathered} R{\kern 1pt} '{\kern 1pt} (z) = - \frac{{\log \left[ {\alpha \left( {L - 1} \right) + 1} \right]}}{{z{{{\log }}^{2}}z}} + \frac{{\alpha \left( {L - 1} \right)\log \alpha }}{{z\left[ {\alpha \left( {L - 1} \right) + 1} \right]{{{\log }}^{2}}z}} \\ + \;\frac{{ac{{e}^{{c\left( {z - b} \right)}}}\left( {L - 1} \right)\log \alpha }}{{{{{[\alpha (L - 1) + 1]}}^{2}}{{{[{{e}^{{c\left( {z - b} \right)}}} + 1]}}^{2}}\log z}} + \frac{{H(U)}}{{z{{{\log }}^{2}}z}}. \\ \end{gathered} $$
(13)

Let us introduce the following designations:

$$\begin{gathered} R_{1}^{'}\left( z \right) = - \frac{{\log \left[ {\alpha \left( {L - 1} \right) + 1} \right]}}{{z{{{\log }}^{2}}z}}; \\ \\ R_{2}^{'}\left( z \right) = \frac{{\alpha \left( {L - 1} \right)\log \alpha }}{{z\left[ {\alpha \left( {L - 1} \right) + 1} \right]{{{\log }}^{2}}z}}; \\ \\ R_{3}^{'}\left( z \right) = \frac{{ac{{e}^{{c\left( {z - b} \right)}}}\left( {L - 1} \right)\log \alpha }}{{{{{[\alpha \left( {L - 1} \right) + 1]}}^{2}}{{{[{{e}^{{c\left( {z - b} \right)}}} + 1]}}^{2}}\log z}}; \\ \\ R_{4}^{'}\left( z \right) = \frac{{H(U)}}{{z{{{\log }}^{2}}z}}. \\ \end{gathered} $$
(14)

We consider the behavior of function \(R'\left( z \right)\) when changing \(z\) and, therefore, \(\alpha \left( z \right)\). Suppose that \({{K}_{0}} = L\); then, \(H\left( U \right)\) = \(\log \left( {{{K}_{0}}} \right)\) = \(\log \left( L \right)\). For \(z = {{K}_{0}}\), \({{\alpha }_{0}} = \alpha \left( {{{K}_{0}}} \right) = d\), \(d \ll a\). Function \(\alpha \left( z \right)\) described by expression (12) decreases with increasing \(z\) (see Fig. 3).

For small \(z\) and \(\alpha \approx 1\), the derivative of the redundancy is close to zero, \(R{\kern 1pt} '{\kern 1pt} (z) \approx 0\).

Suppose that \(z = b\) and \(\alpha (b) = \alpha (0){\text{/}}2 = 0.5\). Expression (13) implies

$$\begin{gathered} R{\kern 1pt} '{\kern 1pt} (z) = \frac{{\left( {\alpha L - \alpha + 1} \right)\log \frac{L}{{\alpha L - \alpha + 1}} + \alpha \left( {L - 1} \right)\log \alpha }}{{\left( {\alpha L - \alpha + 1} \right)z{{{\log }}^{2}}z}} + \frac{{ac{{e}^{{c\left( {z - b} \right)}}}\left( {L - 1} \right)\log \alpha }}{{{{{\left[ {\alpha \left( {L - 1} \right) + 1} \right]}}^{2}}{{{[{{e}^{{c\left( {z - b} \right)}}} + 1]}}^{2}}\log z}} \\ = \frac{{4{{{\left[ {\alpha \left( {L - 1} \right) + 1} \right]}}^{2}}\log \frac{L}{{\alpha L - \alpha + 1}} + 4[{{\alpha }^{2}}{{{\left( {L - 1} \right)}}^{2}} + \alpha \left( {L - 1} \right)]\log \alpha + acb\left( {L - 1} \right)\log b\log \alpha }}{{4{{{\left[ {\alpha \left( {L - 1} \right) + 1} \right]}}^{2}}b{{{\log }}^{2}}b}} \\ = \frac{{{{{\left( {L + 1} \right)}}^{2}}\log \frac{{2L}}{{L + 1}} - ({{L}^{2}} - 1)\log 2 - acb\left( {L - 1} \right)\log b\log 2}}{{{{{\left( {L + 1} \right)}}^{2}}b{{{\log }}^{2}}b}}. \\ \end{gathered} $$

Inequality \(R{\kern 1pt} '{\kern 1pt} (z) < 0\) holds if

$$\begin{gathered} {{\left( {L + 1} \right)}^{2}}\log \frac{{2L}}{{L + 1}} - ({{L}^{2}} - 1)\log 2 \\ - \;acb\left( {L - 1} \right)\log b\log 2 < 0, \\ \end{gathered} $$

or

$$\begin{gathered} acb\left( {L - 1} \right)\log b\log 2 \\ > {{\left( {L + 1} \right)}^{2}}\log \frac{{2L}}{{L + 1}} - ({{L}^{2}} - 1)\log 2. \\ \end{gathered} $$
(15)

Let us strengthen condition (15):

$$acb\left( {L - 1} \right)\log b\log 2 > {{\left( {L + 1} \right)}^{2}}\log 2 - ({{L}^{2}} - 1)\log 2.$$

In this case, inequality \(R{\kern 1pt} '{\kern 1pt} (z) < 0\) holds if

$$acb\log b > 2\frac{{L + 1}}{{L - 1}}.$$
(16)

For instance, at \(L = 256\), \(a = 0.99\), and \(c = 0.1\), condition (16) and, therefore, condition (15) hold if \(b \geqslant 10\); at \(c = 0.5\), condition (16) holds if \(b \geqslant 6\).

Suppose that \(z = {{K}_{0}} = L\) and \(\alpha = d\). Assume also that \(d = 1{\text{/}}L\). Let us find conditions under which \(R{\kern 1pt} '{\kern 1pt} (z) > 0\). Based on the assumptions made above, (13) and (14) imply

$$\begin{gathered} R{\kern 1pt} '{\kern 1pt} (z) = \frac{{\left( {2L - 1} \right)\log \frac{{{{L}^{2}}}}{{\left( {2L - 1} \right)}} - \left( {L - 1} \right)\log L}}{{L\left( {2L - 1} \right){{{\log }}^{2}}L}} \\ - \;\frac{{ac{{e}^{{c\left( {L - b} \right)}}}\left( {L - 1} \right){{L}^{2}}}}{{{{{\left( {2L - 1} \right)}}^{2}}{{{[{{e}^{{c\left( {L - b} \right)}}} + 1]}}^{2}}}}. \\ \end{gathered} $$
(17)

For \(R{\kern 1pt} '{\kern 1pt} (z) > 0\) to hold, the following inequality must hold:

$$\begin{gathered} \frac{{\left( {2L - 1} \right)\log \frac{{{{L}^{2}}}}{{\left( {2L - 1} \right)}} - \left( {L - 1} \right)\log L}}{{L\left( {2L - 1} \right){{{\log }}^{2}}L}} \\ - \;\frac{{ac{{e}^{{c\left( {L - b} \right)}}}\left( {L - 1} \right){{L}^{2}}}}{{{{{\left( {2L - 1} \right)}}^{2}}{{{[{{e}^{{c\left( {L - b} \right)}}} + 1]}}^{2}}}} > 0 \\ \end{gathered} $$
(18)

Let us find a relationship among the parameters of model (6)–(12) such that inequality (18) holds.

Let us strengthen inequality (18):

$$\begin{gathered} \frac{{\left( {2L - 1} \right)\log \frac{{{{L}^{2}}}}{{2L - 1}} - \left( {L - 1} \right)\log L}}{{L\left( {2L - 1} \right){{{\log }}^{2}}L}} \\ - \;\frac{{ac\left( {L - 1} \right){{L}^{2}}}}{{{{{\left( {2L - 1} \right)}}^{2}}[{{e}^{{c\left( {L - b} \right)}}} + 1]}} > 0. \\ \end{gathered} $$

By transformation, we obtain

$$\frac{{\left( {2L - 1} \right)\left[ {\left( {2L - 1} \right)\log \frac{{{{L}^{2}}}}{{\left( {2L - 1} \right)}} - \left( {L - 1} \right)\log L} \right][{{e}^{{c\left( {L - b} \right)}}} + 1] - ac\left( {L - 1} \right){{L}^{3}}{{{\log }}^{2}}L}}{{L{{{\left( {2L - 1} \right)}}^{2}}[{{e}^{{c\left( {L - b} \right)}}} + 1]{{{\log }}^{2}}L}} > 0.$$
(19)

Inequality (19) holds if

$$\left( {2L - 1} \right)\left[ {\left( {2L - 1} \right)\log \frac{{{{L}^{2}}}}{{\left( {2L - 1} \right)}} - \left( {L - 1} \right)\log L} \right][{{e}^{{c\left( {L - b} \right)}}} + 1] - ac\left( {L - 1} \right){{L}^{3}}{{\log }^{2}}L > 0,$$

which leads to the condition

$$b < L - \frac{1}{c}\log \left\{ {\frac{{ac\left( {L - 1} \right){{L}^{3}}{{{\log }}^{2}}L}}{{\left( {2L - 1} \right)\left[ {\left( {2L - 1} \right)\log \frac{{{{L}^{2}}}}{{\left( {2L - 1} \right)}} - \left( {L - 1} \right)\log L} \right]}} - 1} \right\}.$$
(20)

For instance, at \(L = 256\), \(a = 0.99\), and \(c = 0.1\), condition (18) holds if \(b < 155\); at \(c = 0.5\), condition (18) holds if \(b < 232\).
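The numeric examples for conditions (16) and (20) are easy to recheck (natural logarithms throughout, as assumed above; the NumPy sketch uses our own naming):

```python
import numpy as np

L, a = 256.0, 0.99

def cond16_holds(b, c):
    """Sufficient condition (16) for R'(z) < 0 at z = b."""
    return a * c * b * np.log(b) > 2.0 * (L + 1.0) / (L - 1.0)

def bound20(c):
    """Right-hand side of (20); R'(z) > 0 at z = K0 = L requires b below it."""
    denom = (2*L - 1) * ((2*L - 1) * np.log(L**2 / (2*L - 1)) - (L - 1) * np.log(L))
    inner = a * c * (L - 1) * L**3 * np.log(L) ** 2 / denom - 1.0
    return L - np.log(inner) / c
```

At \(c = 0.1\), condition (16) holds for \(b \geqslant 10\) (and fails at \(b = 9\)), while bound (20) evaluates to about 155; at \(c = 0.5\), the bound is about 232, in agreement with the examples above.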

Thus, in a fairly wide range of the parameters of model (6)–(12), the value of \(R{\kern 1pt} '{\kern 1pt} (z)\) is negative at relatively small \(z\) and \(\alpha < 1\), and it is positive at large \(z\) and \(\alpha \approx d\). Therefore, there is at least one point \(z\) at which \(R{\kern 1pt} '{\kern 1pt} (z) = 0\) with the second derivative at this point being positive, \(R{\kern 1pt} ''{\kern 1pt} (z) > 0\). This means that function \(R\left( z \right)\) has a minimum.

This result can be formulated as follows.

Statement. Suppose that the image segmentation system is described by model (1)–(3), (5), (6), (11), and (12). Then, for the model parameters determined by conditions (16) and (20), information redundancy measure (5) has a minimum.
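The statement can also be checked numerically: evaluating \(R\) from (11) with \(\alpha (K)\) from (12) at \(L = 256\), \(a = 0.99\), \(d = 1 - a\), \(b = 50\), and \(c = 0.1\) (the parameter values used in Figs. 4 and 5) yields an interior minimum. A NumPy sketch with our own naming, which uses \(H(U) = \log L\) (i.e., \({{K}_{0}} = L\)):

```python
import numpy as np

L, a, c, b = 256, 0.99, 0.1, 50.0
d = 1.0 - a                      # alpha(K) -> d as K grows

def alpha(K):
    """Monotonic dependence (12)."""
    return a / (1.0 + np.exp(c * (K - b))) + d

def R(K):
    """Redundancy (11), natural logs, with the H(U) term equal to -log L."""
    al = alpha(K)
    s = (L - 1) * al + 1.0
    num = np.log(s) - (L - 1) * al * np.log(al) / s - np.log(L)
    return 1.0 + num / np.log(K)

Ks = np.arange(2, L + 1, dtype=float)
vals = R(Ks)
K_min = Ks[np.argmin(vals)]
```

With these parameters, \(R(K)\) is close to 1 for very coarse partitions, dips to a minimum at an interior value of \(K\), and rises again toward \(K = L\), matching the behavior in Fig. 5a.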

The graphs of functions \(R_{1}^{'}\left( z \right) + R_{2}^{'}\left( z \right) + R_{4}^{'}\left( z \right)\), \(R_{3}^{'}\left( z \right)\), and \(R{\kern 1pt} '{\kern 1pt} (z)\) (see (14)) at \(b = 50\), \(c = 0.1\), and \(L = 256\) (see Figs. 4a and 4b) confirm this statement.

Fig. 4.

Function \(R'\left( z \right)\) and its components at \(b = 50\), \(c = 0.1\), and \(L = 256\): (a) functions \(R_{1}^{'}\left( z \right)\) + \(R_{2}^{'}\left( z \right)\) + \(R_{4}^{'}\left( z \right)\) and \(R_{3}^{'}\left( z \right)\) and (b) function \(R'\left( z \right)\).

The graphs of function \(R\left( K \right)\), given by expression (11), at various values of parameter \(c\) for dependence \(\alpha \left( K \right)\), given by expression (12), are shown in Fig. 5a. It can be seen from Fig. 5a that function \(R\left( K \right)\) has a minimum. Variation of entropies (7)–(10), which determine the value of information redundancy (11), is shown in Fig. 5b versus the number of image segments.

Fig. 5.

Dependence of redundancy measure \(R\left( K \right)\) and entropies for the input and segmented images: (a) function \(R\left( K \right)\) at different values of parameter \(c\) and (b) entropies that determine the value of information redundancy \(R\) vs. number of image segments \(K\) at \(c = 0.1\).

Figures 6a and 6b show variations in the information redundancy and entropies that determine information redundancy (5) versus the number of image segments for the image depicted in Fig. 1a. Figures 4a and 6a, as well as 4b and 6b, demonstrate the qualitative similarity in the dynamics of the information redundancy and entropy values depending on the number of image segments when partitioning the hypothetical and real images.

Fig. 6.

Dependences of the information characteristics for the image from Fig. 1a: (a) function \(R\left( K \right)\) and (b) entropies and average mutual information \(I\left( {U;V} \right)\) that determine \(R\left( K \right)\).

Now, let us show that segmented image \(V\left( {{{K}_{{\min }}}} \right)\), which corresponds to minimum redundancy \(R\left( {{{K}_{{\min }}}} \right)\), has a sufficiently small informational difference from original image \(U\).

As a measure of difference, we use variation of information, which is a metric proposed in [21]; it has the properties necessary to compare data clustering results. In our case, the variation of information characterizes the difference (distance) between the original and segmented images:

$$VI(U,V) = H(U,V) - I(U;V),$$
(21)

where \(VI(U,V)\) is the variation of information, \(H(U,V)\) is the joint entropy of the compared images \(U\) and \(V\), and \(I(U;V)\) is their mutual information.

To measure the difference between the original (\(U\)) and segmented (\(V\)) images, we use the normalized variation of information:

$$V{{I}_{n}}\left( {U,V} \right) = \frac{{VI\left( {U,V} \right)}}{{H\left( {U,V} \right)}},$$
(22)

where \(V{{I}_{n}}\left( {U,V} \right)\) is the normalized variation of information. Let us evaluate \(V{{I}_{n}}\left( {U,V} \right)\) for segmented image \(V\left( {{{K}_{{\min }}}} \right)\). For this purpose, using formulas (5) and (21), as well as relation \(H\left( {U,V} \right)\) = \(H(U)\) + \(H(V\,|\,U)\), we express \(V{{I}_{n}}\left( {U,V} \right)\) in terms of information redundancy \(R\left( K \right)\):

$$V{{I}_{n}}\left( {U,V} \right) = \frac{{VI\left( {U,V} \right)}}{{H\left( {U,V} \right)}} = 1 - \frac{{\left[ {1 - R\left( K \right)} \right]}}{{\frac{{H\left( U \right)}}{{H\left( V \right)}} + R\left( K \right)}}.$$
(23)
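Identity (23) is straightforward to confirm numerically, e.g., on the model distribution (6) of Section 1.3 (NumPy sketch, our own naming):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Model distribution (6) with K = 32 segments, L = 256 levels, alpha = 0.05.
K, L, alpha = 32, 256, 0.05
P = 1.0 / (K * ((L - 1) * alpha + 1.0))
joint = np.full((L, K), alpha * P)
joint[np.arange(K), np.arange(K)] = P

h_uv = entropy(joint.ravel())
h_u = entropy(joint.sum(axis=1))
h_v = entropy(joint.sum(axis=0))
mi = h_u + h_v - h_uv                              # I(U;V)
vin_direct = (h_uv - mi) / h_uv                    # definitions (21), (22)
R = (h_uv - h_u) / h_v                             # redundancy (5)
vin_via_R = 1.0 - (1.0 - R) / (h_u / h_v + R)      # identity (23)
```

Both routes give the same value because \(H(U,V) = H(U) + R\,H(V)\) and \(I(U;V) = (1 - R)H(V)\) by definition (5).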

As above, we replace integer variable \(K\) with real variable \(z\) and compute the derivative of \(V{{I}_{n}}\left( {U,V} \right)\) with respect to \(z\):

$$\begin{gathered} VI_{n}^{'}\left( {U,V} \right) = \frac{{\left[ {R\left( z \right) - 1} \right]H'\left( V \right)H\left( U \right)}}{{{{{\left[ {H\left( U \right) + R\left( z \right)H\left( V \right)} \right]}}^{2}}}} \\ + \;\frac{{R'\left( z \right)H\left( V \right)\left[ {H\left( U \right) + H\left( V \right)} \right]}}{{{{{\left[ {H\left( U \right) + R\left( z \right)H\left( V \right)} \right]}}^{2}}}}, \\ \end{gathered} $$
(24)

where \(R'\left( z \right)\) and \(H'\left( V \right)\) are the derivatives with respect to \(z\) of the redundancy measure and entropy of the system output. With a small number of segments, \(R\left( z \right) \approx 1\) and \(H(V) \ll H\left( U \right)\). In this case, (23) implies that \(V{{I}_{n}}\left( {U,V} \right) \approx 1\). For \(z < {{z}_{{\min }}}\), both terms in expression (24) are negative. Hence, the value of the normalized variation of information decreases. Then, \(\left| {1 - R\left( z \right)} \right|\) < \(\left| {1 - R\left( {{{z}_{{\min }}}} \right)} \right|\), \(\frac{{H\left( U \right)}}{{H\left( {V\left( z \right)} \right)}}\) + \(R\left( z \right)\) > \(\frac{{H\left( U \right)}}{{H\left( {V\left( {{{z}_{{\min }}}} \right)} \right)}}\) + \(R\left( {{{z}_{{\min }}}} \right)\), and

$$\left| {\frac{{\left[ {1 - R\left( z \right)} \right]}}{{\frac{{H\left( U \right)}}{{H\left( {V\left( z \right)} \right)}} + R\left( z \right)}}} \right| < \left| {\frac{{\left[ {1 - R\left( {{{z}_{{\min }}}} \right)} \right]}}{{\frac{{H\left( U \right)}}{{H\left( {V\left( {{{z}_{{\min }}}} \right)} \right)}} + R\left( {{{z}_{{\min }}}} \right)}}} \right|,$$

which implies that \(V{{I}_{n}}\left( {U,V\left( {{{z}_{{\min }}}} \right)} \right) < V{{I}_{n}}\left( {U,V\left( z \right)} \right)\) for \(z < {{z}_{{\min }}}\).

At the point corresponding to \(R\left( {{{z}_{{\min }}}} \right)\), taking into account (9), expression (24) implies

$$\begin{gathered} VI_{n}^{'}\left( {U,V} \right) = \frac{{\left[ {R\left( {{{z}_{{\min }}}} \right) - 1} \right]H{\kern 1pt} '\left( {V\left( {{{z}_{{\min }}}} \right)} \right)H\left( U \right)}}{{{{{\left[ {H\left( U \right) + H\left( {V\left( {{{z}_{{\min }}}} \right)} \right)R\left( {{{z}_{{\min }}}} \right)} \right]}}^{2}}}} \\ = \frac{{\left[ {R\left( {{{z}_{{\min }}}} \right) - 1} \right]H\left( U \right)}}{{{{z}_{{\min }}}{{{\left[ {H\left( U \right) + R\left( {{{z}_{{\min }}}} \right)\log {{z}_{{\min }}}} \right]}}^{2}}}} < 0. \\ \end{gathered} $$

For \(z > {{z}_{{\min }}}\), \(R\left( z \right) > R\left( {{{z}_{{\min }}}} \right)\) and the first term in expression (24) satisfies the following inequality:

$$\begin{gathered} \left| {\frac{{\left[ {R\left( z \right) - 1} \right]H\left( U \right)}}{{z{{{\left[ {H\left( U \right) + R\left( z \right)\log z} \right]}}^{2}}}}} \right| \\ < \;\left| {\frac{{\left[ {R\left( {{{z}_{{\min }}}} \right) - 1} \right]H\left( U \right)}}{{{{z}_{{\min }}}{{{\left[ {H\left( U \right) + R\left( {{{z}_{{\min }}}} \right)\log {{z}_{{\min }}}} \right]}}^{2}}}}} \right|. \\ \end{gathered} $$

In addition, at \(z > {{z}_{{\min }}}\), the second term on the right-hand side of expression (24) is positive and increases with increasing \(z\), whereas the absolute value of the first term decreases. Therefore, the rate of decrease in \(V{{I}_{n}}\left( {U,V} \right)\) drops at \(z \geqslant {{z}_{{\min }}}\). This indicates that the segmented image corresponding to the minimum of information redundancy differs, in terms of measure (22), only slightly in informational content from the image input to segmentation system (3). Graphs characterizing the dependence of measure \(V{{I}_{n}}\left( {U,V} \right)\) on the number of segments of image \(V\) at different values of parameter \(c\) in function (12) of the segmentation model are shown in Fig. 7.

Fig. 7.

Measure of difference \(V{{I}_{n}}\left( {U,V} \right)\) vs. the number of segments for image \(V\) at different values of parameter \(c\).

The effectiveness of the information redundancy criterion in image segmentation is confirmed by the computational experiment described in the next section.

2 COMPUTATIONAL EXPERIMENT

The computational experiment carried out to estimate the effectiveness of the criterion consists of two stages. At the first stage, the criterion of minimum information redundancy is used in combination with the simple linear iterative clustering (SLIC) algorithm [1] for segmenting images from the BSDS500 dataset [2]. At the second stage, the results of image segmentation based on the criterion of minimum information redundancy and on the entropy criterion proposed in [29] are compared.

2.1 Using the Criterion of Minimum Information Redundancy in Combination with the SLIC Algorithm

In this work, to demonstrate the validity of the proposed image segmentation model, we use images from BSDS500 [2]. Each of the analyzed images is segmented using a modified SLIC algorithm [1] at different values of the postprocessing parameter \({{\Delta }_{1}}\) [24]. As a result of segmenting image \(U\), we obtain a set of \(Q\) images \(\mathfrak{V} = \{ {{V}_{1}},{{V}_{2}},...,{{V}_{Q}}\} \). For image \(U\) and each of the images \({{V}_{q}}\), \(q = 1,2,...,Q\), obtained at different values of the segmentation parameter, redundancy measure \(R\) is evaluated, and the image \({{V}_{{R\min }}}\) corresponding to the global minimum of \(R\) is found among the images \({{V}_{q}}\). To take into account all the color channels of the image, a weighted redundancy measure is used:

$${{R}_{w}}(U,{{V}_{q}}) = \frac{{{{R}_{L}}{{H}_{L}}(U) + {{R}_{a}}{{H}_{a}}(U) + {{R}_{b}}{{H}_{b}}(U)}}{{{{H}_{L}}(U) + {{H}_{a}}(U) + {{H}_{b}}(U)}},$$
(27)

where \({{R}_{i}}\) is the redundancy measure computed for channel \(i \in \{ L,a,b\} \) of the CIELAB color space for images \(U\) and \({{V}_{q}}\), while \({{H}_{i}}\) is the entropy of color channel \(i\) of image \(U\). We select the image \({{V}_{{q\min }}}\) that minimizes measure \({{R}_{w}}\): \({{R}_{w}}(U,{{V}_{{q\min }}})\) = \({{R}_{{\min }}}\). The results of this stage of the experiment are illustrated using the color version of the image (35010.jpg) shown in Fig. 1a. The values of the weighted redundancy measure for this image are represented by the solid line in Fig. 8. The minimum of the redundancy measure is reached at \(K = 87\).
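As an illustration, the computation of the weighted measure (27) and the selection of the minimizing image can be sketched as follows. This is a minimal Python/NumPy sketch, not the authors' implementation: the per-channel redundancies \(R_L\), \(R_a\), \(R_b\) are assumed to have been computed beforehand (as defined in the earlier sections of the paper) and are passed in as plain numbers, and the 256-bin histogram entropy estimate is likewise an assumption.

```python
import numpy as np

def channel_entropy(x, bins=256):
    """Shannon entropy (in bits) of one color channel, estimated from its histogram."""
    hist, _ = np.histogram(x, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def weighted_redundancy(R_Lab, U_Lab):
    """Eq. (27): entropy-weighted average of the per-channel redundancies.

    R_Lab -- (R_L, R_a, R_b), redundancy of (U, V_q) per CIELAB channel
             (assumed to be computed as defined earlier in the paper)
    U_Lab -- (L, a, b) channel arrays of the input image U
    """
    H = [channel_entropy(ch) for ch in U_Lab]
    return sum(R * h for R, h in zip(R_Lab, H)) / sum(H)

def select_min_redundancy(candidate_R, U_Lab):
    """Pick the index q_min of the candidate segmentation with the smallest R_w."""
    values = [weighted_redundancy(R_Lab, U_Lab) for R_Lab in candidate_R]
    return int(np.argmin(values)), float(min(values))
```

Here `candidate_R` is a hypothetical list holding one \((R_L, R_a, R_b)\) triple per segmented image \({{V}_{q}}\).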

Fig. 8.

Weighted redundancy measure \({{R}_{w}}\) and normalized variation of information \(V{{I}_{w}}(U,{{V}_{q}})\) vs. the number of segments for the color version of the image from Fig. 1a.

Then, segmented images \({{V}_{q}}\), \(q = 1,2,...,Q\), are compared with the original image \(U\) and the ground truth segmentations \(V_{s}^{{GT}}\), \(s = 1,2,...,S\), from the BSDS500 database. For this purpose, we use the weighted normalized variation of information [3, 22]:

$$V{{I}_{w}}(U,{{V}_{q}}) = \frac{{V{{I}_{{nL}}}{{H}_{L}}(U) + V{{I}_{{na}}}{{H}_{a}}(U) + V{{I}_{{nb}}}{{H}_{b}}(U)}}{{{{H}_{L}}(U) + {{H}_{a}}(U) + {{H}_{b}}(U)}},$$
(28)
$$V{{I}_{{ni}}}(U,{{V}_{q}}) = \frac{{{{H}_{i}}(U) + {{H}_{i}}({{V}_{q}}) - 2{{I}_{i}}(U{\text{;}}{{V}_{q}})}}{{{{H}_{i}}(U,{{V}_{q}})}},$$
(29)

where \(V{{I}_{w}}(U,{{V}_{q}})\) is the weighted variation of information, \(V{{I}_{{ni}}}\) is the normalized variation of information between color channels \(i\) of images \(U\) and \({{V}_{q}}\), \({{I}_{i}}(U;{{V}_{q}})\) is the mutual information between color channels of the images, and \({{H}_{i}}(U,{{V}_{q}})\) is the joint entropy in the ith color channel.
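Measures (28) and (29) can be evaluated directly from pixel histograms. The sketch below (Python with NumPy; the 256-bin discretization of the channel values is our assumption, not part of the paper's procedure) estimates the marginal entropies, mutual information, and joint entropy from a joint histogram of the two channels:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def normalized_vi(u, v, bins=256):
    """Eq. (29): normalized variation of information between channels u and v."""
    joint, _, _ = np.histogram2d(u.ravel(), v.ravel(), bins=bins)
    p = joint / joint.sum()
    Hu = entropy_bits(p.sum(axis=1))   # H_i(U)
    Hv = entropy_bits(p.sum(axis=0))   # H_i(V_q)
    Huv = entropy_bits(p.ravel())      # joint entropy H_i(U, V_q)
    I = Hu + Hv - Huv                  # mutual information I_i(U; V_q)
    return (Hu + Hv - 2 * I) / Huv

def weighted_vi(U_Lab, V_Lab, bins=256):
    """Eq. (28): per-channel VI_n weighted by the channel entropies of U."""
    H = [entropy_bits(np.histogram(ch, bins=bins)[0] / ch.size) for ch in U_Lab]
    VI = [normalized_vi(u, v, bins) for u, v in zip(U_Lab, V_Lab)]
    return sum(vi * h for vi, h in zip(VI, H)) / sum(H)
```

As a sanity check, `normalized_vi(u, u)` returns 0 (up to floating-point error) for identical channels and values close to 1 for statistically independent ones.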

Based on the results of comparing the original image (35010.jpg, denoted by \(U\)) with images \({{V}_{q}}\) obtained at \(0 \leqslant {{\Delta }_{1}} \leqslant 3.6\), Fig. 8 plots the variation of information \(V{{I}_{w}}(U,{{V}_{q}})\) (dashed line) against the number of segments \(K\).

The next task is to compare five ground truth segmentations of the image under analysis (corresponding numbers of image segments \({{K}_{{GT}}}\) are 30, 26, 31, 33, and 23) with a series of segmented images \({{V}_{q}}\) obtained by the SLIC algorithm [1] with the post-processing procedure [24] for parameter values \(0 \leqslant {{\Delta }_{1}} \leqslant 3.6\). The result of the comparison is shown in Fig. 9. Three of the five ground truth segmentations have the minimum distance, in terms of measure (28)–(29) (which implies the maximum similarity), to the image with 87 segments obtained at \({{\Delta }_{1}} = 2\), for which the minimum of redundancy measure \({{R}_{w}}(U,{{V}_{q}})\) computed by formula (27) (see Fig. 8) is reached.

Fig. 9.

Normalized variation of information \(V{{I}_{w}}(V_{s}^{{GT}},{{V}_{q}})\), \(s = 1,2,...,5\), vs. the number of segments when comparing segmented images \({{V}_{q}}\) with ground truth segmentations \(V_{s}^{{GT}}\) from BSDS500.

To evaluate the quality of image segmentation, we use the relative difference

$$\Delta {{K}_{{{\text{rel}}}}} = \frac{{{{K}_{{\min }}} - K_{{\min }}^{{GT}}}}{{{{K}_{{\max }}}}},$$

where \({{K}_{{\min }}}\) is the number of image segments that provides the minimum of redundancy criterion \({{R}_{w}}(U,{{V}_{{q\min }}}) = {{R}_{{\min }}}\), \(K_{{\min }}^{{GT}}\) is the number of segments in image \({{V}_{q}}\) that provides the minimum variation of information \(V{{I}_{w}}(V_{s}^{{GT}},{{V}_{q}})\) as compared to ground truth segmentation \(V_{s}^{{GT}}\), and \({{K}_{{\max }}}\) is the maximum number of segments in images \({{V}_{1}},{{V}_{2}},...,{{V}_{Q}}\). A histogram for \(\Delta {{K}_{{{\text{rel}}}}}\), constructed based on the results of segmenting 54 images from BSDS500 and comparing them with 270 ground truth segmentations, is shown in Fig. 10. It can be seen that there is a sufficiently large group of images for which the results of segmentation by the criterion of minimum information redundancy demonstrate high informational similarity to the ground truth segmentations.
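For reference, the relative difference above is straightforward to compute. The values in the hypothetical call below are illustrative only: the paper reports \({{K}_{{\min }}} = 87\) for image 35010.jpg, but the corresponding \(K_{{\min }}^{{GT}}\) and \({{K}_{{\max }}}\) are not given here.

```python
def delta_k_rel(k_min, k_min_gt, k_max):
    """Relative difference between the segment count selected by the
    redundancy criterion (k_min) and the segment count of the candidate
    closest, in VI_w terms, to a ground truth segmentation (k_min_gt),
    normalized by the largest candidate segment count (k_max)."""
    return (k_min - k_min_gt) / k_max
```

For example, `delta_k_rel(87, 83, 200)` gives 0.02 (with hypothetical values 83 and 200 for \(K_{{\min }}^{{GT}}\) and \({{K}_{{\max }}}\)).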

Fig. 10.

Histogram for \(\Delta {{K}_{{{\text{rel}}}}}\) computed when segmenting images from BSDS500, where \(\nu \) is the frequency of occurrence of a certain value of \(\Delta {{K}_{{{\text{rel}}}}}\).

2.2 Comparison of the Proposed Method with the Method Based on the Entropy Criterion

We compared the results of segmenting the images from BSDS500 by the criterion of minimum information redundancy and by the entropy criterion proposed in [29]. The comparison results are presented for the image shown in Fig. 11 (118035.jpg).

Fig. 11.

Test image: 118035.jpg.

For segmentation, we use the mean shift algorithm [8] implemented in the EDISON system. In accordance with the algorithm proposed in [29], using the EDISON system, we obtain a set of filtered (\(V_{q}^{{{\text{filt}}}}\)) and segmented (\({{V}_{q}}\), \(q = 1,2,...,Q\)) images with different parameter values. Spatial resolution \({{h}_{s}}\) varies from 8 to 32, while range resolution \({{h}_{r}}\) varies from 4 to 16. The smallest significant feature size is \(M = 30\). For each of the filtered images, the mean entropy is computed:

$${{H}_{{{\text{mean}}}}}(V_{q}^{{{\text{filt}}}}) = \frac{1}{3}\sum\limits_i {{{H}_{i}}(V_{q}^{{{\text{filt}}}}),} $$

where \({{H}_{i}}\) is the entropy of color channel \(i\) of image \(V_{q}^{{{\text{filt}}}}\). Figure 12 shows the mean entropies for \({{h}_{r}} = 15\), \({{h}_{r}} = 16\), and \(8 \leqslant {{h}_{s}} \leqslant 32\). Using the entropy criterion with the requirement that the difference between the entropies \({{H}_{{{\text{mean}}}}}\) of the EDISON-filtered images not exceed the threshold \(edsEnt = 0.005\) recommended in [29], we obtain an image with \(K = 36\) segments at \({{h}_{r}} = 16\) and \({{h}_{s}} = 22\). The normalized variation of information between the input image and the image obtained by the entropy criterion (\({{V}_{{q\min }}}\)) is \(V{{I}_{w}} = 0.7897\). For \(edsEnt = 0.012\), we obtain image \({{V}_{{q\min }}}\) with \(K = 30\) segments at \({{h}_{r}} = 16\) and \({{h}_{s}} = 20\), for which \(V{{I}_{w}} = 0.7894\).
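This selection step can be sketched as follows (Python/NumPy). Note that the stopping rule of [29] is paraphrased here as "stop at the first image whose mean entropy differs from that of the previous one by at most \(edsEnt\)"; this is our reading, and the exact rule should be checked against [29].

```python
import numpy as np

def channel_entropy(x, bins=256):
    """Shannon entropy (in bits) of one color channel, from its histogram."""
    hist, _ = np.histogram(x, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def mean_entropy(channels):
    """H_mean: entropy averaged over the three color channels of a filtered image."""
    return sum(channel_entropy(c) for c in channels) / 3.0

def select_by_entropy_criterion(filtered_seq, eds_ent=0.005):
    """Walk through the filtered images (ordered by increasing smoothing) and
    return the index of the first image whose mean entropy differs from the
    previous one by no more than eds_ent; fall back to the last image."""
    prev = mean_entropy(filtered_seq[0])
    for k, channels in enumerate(filtered_seq[1:], start=1):
        cur = mean_entropy(channels)
        if abs(prev - cur) <= eds_ent:
            return k
        prev = cur
    return len(filtered_seq) - 1
```

Each element of `filtered_seq` is a triple of channel arrays for one filtered image \(V_{q}^{{{\text{filt}}}}\).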

Fig. 12.

Mean entropy \({{H}_{{{\text{mean}}}}}(V_{q}^{{{\text{filt}}}})\) of the filtered image from Fig. 11 at EDISON parameters \({{h}_{r}} = 15\), \({{h}_{r}} = 16\), and \(8 \leqslant {{h}_{s}} \leqslant 32\).

The minimum value of the redundancy measure for image \({{V}_{{q\min }}}\) from the obtained set of segmented images \({{V}_{q}}\), \(q = 1,2,...,Q\), is \({{R}_{{w\min }}} = 0.2923\) for number of segments \(K = 35\), \({{h}_{r}} = 15\), and \({{h}_{s}} = 16\). In this case, the value of the weighted normalized variation of information (28) between the original and segmented images is \(V{{I}_{w}}(U,{{V}_{{q\min }}}) = 0.7863\). Thus, the segmented image obtained by the condition of minimum redundancy is more similar to the input image than the image obtained with the entropy criterion proposed in [29]. The images segmented by different criteria are shown in Fig. 13. The segmentation results are visually similar. In the image corresponding to the information redundancy minimum, the bell is better distinguished (see Fig. 13a), while in the images obtained by the entropy criterion, some of the small bell tower details are slightly better distinguished (see Figs. 13b, 13c).

Fig. 13.

Results of segmenting the image from Fig. 11: (a) based on the criterion of minimum information redundancy, (b) based on the entropy criterion [29] with threshold edsEnt = 0.005, and (c) based on the entropy criterion [29] with edsEnt = 0.012.

For a quantitative comparison of the results, we estimate the information difference \(V{{I}_{w}}(V_{s}^{{GT}},{{V}_{{q\min }}})\), \(s = 1,2,...,5\), (28)–(29) for the five ground truth segmentations \(V_{s}^{{GT}}\) from BSDS500 and the segmented images \({{V}_{{q\min }}}\) obtained using the criteria discussed above. The comparison results are shown in Table 1. It can be seen that the segmented image obtained by the criterion of minimum information redundancy differs less from the ground truth segmentations than the segmented image obtained by the entropy criterion in three out of five cases at threshold \(edsEnt = 0.012\) and in four out of five cases at \(edsEnt = 0.005\). It should be noted that, in general, the segmentation results that minimize these criteria have quite similar variation of information with respect to the ground truth segmentations from BSDS500.

Table 1. Weighted normalized variation of information \(V{{I}_{w}}(V_{s}^{{GT}},{{V}_{{q\min }}})\), \(s = 1,2,...,5\) between the ground truth segmentations of the image from BSDS500 and the segmentations obtained using the EDISON algorithm with different criteria

CONCLUSIONS

In this paper, we have described an iterative information-theoretical method for evaluating the quality of digital image segmentation. We have considered a system that includes a segmentation algorithm with a parameter determining the number of image segments and an iterative procedure for setting the value of this parameter that provides the minimum of the quality functional. The image segmentation algorithm has been regarded as an information channel, and a simplified mathematical model of the digital image segmentation algorithm has been proposed. Based on Barlow's hypothesis [5] and the principle of minimum redundancy in the visual perception model [4], the information redundancy measure has been selected as the segmentation quality index. It has been shown that, for the proposed model, there exists a minimum of the redundancy measure, which corresponds to the best image partition, and the relationships among the model parameters for which this minimum is reached have been derived. The validity of the model has been confirmed by a computational experiment on images from the Berkeley Segmentation Dataset (BSDS500). It has been found that, for a sufficiently large group of test images, the segmentation results obtained by the condition of minimum redundancy have the minimum variation of information with respect to the ground truth segmentations from BSDS500. Since the ground truth images were manually segmented by specialists, it can be concluded that the segmentations obtained based on the condition of minimum information redundancy are the best segmentations in terms of visual perception. This does not contradict Barlow's hypothesis [5] that information redundancy is minimized at the initial stages of signal processing in the human visual system.

We compared the results of image segmentation by the EDISON system based on the criterion of minimum information redundancy and the entropy criterion. The segmented image obtained based on the condition of minimum redundancy is more similar to the input image than the image obtained by the entropy criterion, and, in most cases, it is more similar to the ground truth segmentations.

In future works, we intend to carry out a deeper investigation of the developed model and properties of the segmentation quality index based on the redundancy measure.