1 Introduction

Image segmentation is a fundamental problem in computer vision. It refers to splitting an image into uniform and homogeneous regions that correspond to meaningful parts of the image. Its applications range from the identification of objects in remote sensing data [7, 21, 22] to the detection of cancerous cells. For example, it can be used to support diagnosis in medical imaging [27], or to extract interest points that identify the local features of an image [43]. Many image segmentation algorithms have been proposed in the literature. They can be classified into two main categories: edge detection and region-based approaches. Edge detection methods rely on discontinuities to detect edges for segmenting an image. Several methods [3, 16] have been proposed in this category, most of which are based on abrupt changes in image intensity or color. Region-based approaches, the other popular category of segmentation methods, achieve the segmentation iteratively, until some uniformity criteria are satisfied. The principal methods in this category are based on thresholding [30], region growing [39] and graphs [6]. Graph-based techniques represent the image components as mathematically sound structures, which makes the segmentation problem easier and the computation faster and more efficient. Graph-based image segmentation methods partition the graph into several sub-graphs, such that each sub-graph represents a meaningful object of interest in the image. The idea of using graphs for segmentation was introduced by Wu et al. [41]. Since then, the study of optimization techniques on graphs has attracted much research attention [1, 15, 17, 20, 34]. Our work is in this line. In this paper, we propose a framework that uses a weighted region adjacency graph to represent an image as a network, where regions correspond to nodes in the network. An edge between two nodes is created if the corresponding regions are adjacent, and its weight is computed from image features (color and texture).

Networks are growing exponentially in size, variety, and complexity, and new types of networks keep emerging, such as social networks and biological networks. Most of them are structured into communities, which are groups of nodes that are densely connected among themselves but sparsely connected with nodes in other groups [21]. Given the importance of community detection, it is not surprising that many community detection methods have been developed using techniques from different disciplines. It is therefore natural to consider these methods for the identification of objects in an image. More specifically, an image can be mapped to a graph, and community detection approaches can be used to identify the regions, which correspond to communities in networks. However, using them alone has practical limitations.

Motivated by graph-based methods and the application of community detection algorithms to graphs, we propose a general framework for image segmentation based on community detection algorithms. In many existing works, complex networks are formulated as undirected, unweighted networks to simplify the analysis and the computation. While such networks simplify computation and analysis, they lose important information, which degrades performance. Unlike these works, the proposed general framework starts with an initial segmentation of the image in order to build a Region Adjacency Graph (RAG) [36]. The nodes are the initial regions, and an edge exists between two nodes if they are adjacent. The edges are then weighted according to the similarity between meaningful visual features of the regions (texture, color). A community detection algorithm is applied next on the Weighted RAG (WRAG) in order to partition the network into a set of communities. These communities are used to group similar adjacent regions in the image: all the nodes that belong to the same community are considered to belong to the same region and are merged into a single region in the image. The process is iterated until there is no difference between the community structures uncovered in two successive iterations.

The main contributions of this work are the following:

  • First, we propose to model an image with a Weighted RAG that takes advantage of the topological and visual properties of images (texture, color).

  • Second, we use community detection algorithms developed in the complex networks paradigm in order to solve the segmentation problem.

  • Third, an iterative process is proposed in order to avoid over-segmentation issues.

The rest of the paper is organized as follows. Section 2 presents previous investigations related to the proposed general framework. In Section 3, we review the necessary background on the methods and measures used in the proposed framework. Section 4 introduces the community detection concept. The details of each step of the proposed framework are presented in Section 5. Section 6 gives a description of the data and methods used for the experiments. We report our experimental results in Section 7. Finally, in Section 8, we present our conclusions.

2 Related work

Several graph-based techniques have been used to segment images [15, 17, 41], many of which are based on the optimization of a cost function. In [41], the authors considered a minimum cut criterion, which seeks to partition a graph into a number of subgraphs by minimizing the sum of the edge weights of the cut.

Felzenszwalb et al. [15] proposed an approach based on pairwise region comparison. Using a graph representation of the image and an iterative strategy, it defines a criterion to decide whether an edge exists between two regions and obtains the final segmentation. The method runs in almost linear time in the number of graph edges and can be employed to segment large images.

Shi et al. [17] used a technique called normalized cut. The authors represent the points to be clustered by an undirected graph, where the nodes are the points to be segmented and each edge weight represents the similarity between two points, computed from some image quantity (e.g., color, intensity) of the pixels connected by that edge. The graph cut is measured by the total weight of the connections from the vertices in a set A to all the vertices in the graph. The Ncut measure simultaneously minimizes the cut and penalizes partitions in which one set of nodes is only loosely connected to the graph at large.

Li et al. [34] proposed an image segmentation algorithm based on graph modularity optimization. They start with an initial segmentation using the superpixels method, which over-segments the image into a set of small segments. Each segment corresponds to a region (node) in the graph, and the graph of regions is then constructed using two image features to compute the similarity between regions; this similarity is assigned as an edge weight in the network. Finally, a community detection algorithm based on modularity optimization is applied to the region graph.

Abin et al. [1] start with an initial segmentation using Meanshift and then construct the network using the similarity between regions based only on color information. This similarity is used as the weight of the edges of the region network. Finally, they apply a community detection algorithm to obtain the segmented image. However, the authors do not use an iterative process to avoid the over-segmentation problem; instead, they use a post-processing step that merges regions whose area is smaller than a predefined threshold with other regions. If a region area is smaller than the threshold t, it is merged with the most similar adjacent region in the network.

Linares et al. [20] also proposed an algorithm based on community detection. They first perform an initial segmentation using the superpixels method, and then consider the superpixels as nodes of the graph. To construct the network, they use the CIELAB feature to compute the similarity between superpixels; a connection between two nodes is created only if the corresponding weight is smaller than a threshold t. Finally, once the superpixel graph is constructed, they apply the fast greedy algorithm, a hierarchical agglomerative algorithm for community detection. However, the approach in [20] uses an unweighted network instead of a weighted one, which means that some important information about the image is lost, affecting performance.

The cited graph-based segmentation approaches are generally sensitive to noise, and use either color or texture alone to compute the similarity between image regions, which leads to over-segmentation by neglecting the regularities inside the image. Moreover, most of these methods rely on community detection algorithms that have a high computational cost. To overcome these limitations, the proposed framework starts with an initial segmentation that splits the image into regions which are coherent and preserve most of the information necessary for segmentation. Then, a RAG is used to represent the image, where each region corresponds to a node in the graph and an edge is created between two regions if they are adjacent. In order to weight the RAG, a combination of texture and color features is employed to measure the similarity between nodes. Finally, using efficient community detection algorithms that strike the best balance between computational cost and segmentation performance, we extract communities that represent regions in the image. The process is repeated iteratively until the optimal segmentation is achieved.

3 Background

As mentioned previously, the proposed framework starts with an initial segmentation that splits the image into small regions; a RAG is then constructed using the adjacency relationship. This RAG is weighted using the similarity between regions based on image features (color, texture). Finally, community detection algorithms are employed to partition the weighted RAG into communities and obtain the optimal segmentation of the image. In this section, we describe the methods and measures used in the proposed framework.

3.1 Initial Segmentation algorithms

The goal of the initial segmentation is to split the image into homogeneous, possibly small regions. Several low-level segmentation methods can be used in this step, such as super-pixel, Meanshift, levelset, and watershed.

3.1.1 Super-pixels algorithm

The super-pixels algorithm splits a single perceptual region into a multitude of smaller regions. It is usually used as an initial segmentation step to reduce the number of pixels and the computational complexity of subsequent tasks. Several methods exist for extracting super-pixels. In [9], the authors propose an efficient technique that yields quasi-uniform super-pixels at low computational cost; their results show its efficiency in terms of computational cost, compactness of segments and over-segmentation errors. The authors of [10] use a connected K-means algorithm with a convexity constraint to extract super-pixels. Given the number of regions desired by the user, the image is split into rectangular segments using a regular grid. The pixels are then reassigned by minimizing the cost function:

$$ C_{x,y}(i)= \lambda_{1}\,|I(x,y)-I_{i}|+\lambda_{2}\,\left|(x-C_{x}^{i})^{2}+(y-C_{y}^{i})^{2}\right| $$

where x and y are the positions of the pixel tested among the different segments; λ1 and λ2 weight, respectively, the intensity similarity and the convexity constraint; Ii denotes the mean intensity of the ith segment, and \({C_{x}^{i}}\) and \({C_{y}^{i}}\) are the center positions of the ith segment. The super-pixels algorithm tests pixels at the segment boundaries and reassigns them to new segments. In the proposed framework, a publicly available code [24] is used to obtain the superpixel initialization.
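For concreteness, the snippet below sketches how such an over-segmentation can be obtained with the SLIC implementation of scikit-image; this is only an illustrative stand-in for the connected K-means method of [10] and the code of [24], and the file name and parameter values are assumptions.

```python
# Illustrative over-segmentation with scikit-image's SLIC (a stand-in for
# the super-pixel code of [24]; n_segments and compactness are assumed).
import numpy as np
from skimage import io, segmentation

image = io.imread("example.jpg")            # any RGB test image (placeholder)
labels = segmentation.slic(
    image,
    n_segments=400,       # desired number of initial regions
    compactness=10.0,     # trade-off between color similarity and spatial proximity
    start_label=1,
)
print("number of initial regions:", len(np.unique(labels)))
```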

3.1.2 Meanshift algorithm

The Meanshift algorithm is a non-parametric iterative algorithm [13] that can be used for many purposes, such as finding modes or clustering. One advantage of Meanshift over other pre-segmentation techniques is that the number of segments (clusters) does not have to be specified, because the algorithm itself finds the most suitable number of clusters for the image. To run Meanshift on a set of data points X (the pixels of the image), we need:

  • A function N(x) which determines the neighbors of a point x ∈ X. The neighboring points are the points within a certain distance; the distance metric is usually the Euclidean distance.

  • A kernel function K(d) to use in Meanshift. K is usually a Gaussian Kernel, and d is the distance between two data points.

With the above functions, Meanshift processes the set of data points X through the following steps (a minimal code sketch is given after the list):

  1. For each data point x ∈ X, find the neighboring points N(x) of x.

  2. For each data point x ∈ X, calculate the mean shift m(x) from this equation:

    $$ m(x)=\frac{\sum_{x_{i}\in N(x)}K(x_{i}-x)\,x_{i}}{\sum_{x_{i}\in N(x)}K(x_{i}-x)} $$

  3. For each data point x ∈ X, update x ← m(x).

  4. Repeat steps 1-3 for n iterations or until the points have (almost) stopped moving.
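The following NumPy sketch implements the four steps above with a Gaussian kernel and a fixed neighborhood radius; the radius, bandwidth and iteration count are illustrative assumptions, not the parameters used in the paper.

```python
import numpy as np

def mean_shift(X, radius=1.0, bandwidth=0.5, n_iter=20, tol=1e-4):
    """Shift every point of X (n_points, n_features) towards a density mode."""
    points = X.astype(float).copy()
    for _ in range(n_iter):
        new_points = np.empty_like(points)
        for i, x in enumerate(points):
            d = np.linalg.norm(X - x, axis=1)        # distances to all data points
            neighbors = X[d < radius]                # N(x)
            if len(neighbors) == 0:                  # isolated point: leave it
                new_points[i] = x
                continue
            w = np.exp(-np.linalg.norm(neighbors - x, axis=1) ** 2
                       / (2 * bandwidth ** 2))       # Gaussian kernel K
            new_points[i] = (w[:, None] * neighbors).sum(0) / w.sum()   # m(x)
        if np.linalg.norm(new_points - points) < tol:  # points barely moved
            break
        points = new_points
    return points
```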

3.2 Similarity measures

Several measures can be employed to compute the similarity between regions based on the features used in the framework.

3.2.1 Color Similarity

Color is an important and straightforward feature for segmentation. Each pixel in a color image is represented by a three-dimensional vector. We assume that the pixel intensity values of a given region follow a Gaussian distribution. Therefore, the distribution of a region Ri is given by Ri ∼ N(μi; vari), where μi is the mean vector of the pixel intensities computed in the three-dimensional color space of region Ri, and vari denotes the variance of Ri.

To measure the similarity between two distributions, various distance measures are proposed in the literature such as the Earth Mover’s Distance (EMD) [31], Kullback-Leibler (KL) Divergence [18], Mean Distance (MD), etc. We choose to use the Mean Distance (MD) because it is generally a good approximation of the Earth Mover’s Distance with a lower complexity. MD can be defined by the formula below:

$$ \text{D}_{\text{MD}}(\mathrm{R}_{\mathrm{i}};\mathrm{R}_{\mathrm{j}})= (\mu_{i}-\mu_{j})^{\mathrm{T}} (\mu_{i}-\mu_{j}) $$
(1)

To transform the color feature distribution distance to a similarity measure, we use a radial basis function kernel:

$$ {\mathrm{c}_{\text{ij}}= \exp \left( \frac{- \mathrm{D}_{\text{MD}}(\mathrm{R}_{i};\mathrm{R}_{\mathrm{j}})}{2\sigma^{2}} \right)} $$
(2)

where σ is a parameter defined by the user.

Various color spaces can be used, such as RGB, YUV, L*a*b* and HSV. Choosing an appropriate color space for segmenting a color image is a crucial step towards good segmentation performance. We choose the LAB color space because of its accordance with the human visual system [22]. It is a three-axis color-opponent space with dimension L for lightness and A and B for the color-opponent dimensions.
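A minimal sketch of this color similarity is given below: the mean LAB color of each region is computed from a label image, the mean distance of (1) is formed, and the RBF kernel of (2) turns it into a similarity. The function names and the value of σ are illustrative assumptions.

```python
import numpy as np
from skimage import color

def color_similarity(image_rgb, labels, sigma=10.0):
    """Return a function c_ij(ri, rj) implementing equations (1)-(2)."""
    lab = color.rgb2lab(image_rgb)
    means = {r: lab[labels == r].mean(axis=0) for r in np.unique(labels)}  # mu_i
    def c_ij(ri, rj):
        diff = means[ri] - means[rj]
        d_md = float(diff @ diff)                # (mu_i - mu_j)^T (mu_i - mu_j)
        return np.exp(-d_md / (2 * sigma ** 2))  # RBF kernel of equation (2)
    return c_ij
```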

3.2.2 Texture similarity

Using the color feature alone cannot achieve a good segmentation result, because within some homogeneous objects the color feature decomposes image regularities into different segments. We therefore use a texture feature to remedy this problem. Many recent approaches use wavelets as features [38]; other methods, such as [4], learn dictionaries of local structures from training images. In this work, we use the Histogram of Oriented Gradients (HOG), a feature that is well known in image processing and computer vision for object detection. It counts the occurrences of gradient orientations in localized parts of the image. The Histogram of Oriented Gradients is constructed through the following steps:

  • First, the horizontal and vertical gradients are computed, since the histogram of gradients of an image region is built from them.

  • In the second step, the image region is split into small cells of C × C pixels (C = 8). For each cell, the histogram of gradient directions is computed; this histogram is essentially a vector of 9 bins.

  • In the third step, a procedure called block normalization groups individual cells into blocks and normalizes them to ensure invariance to illumination changes. A block consists of 2 × 2 cells, so that each block covers 2C × 2C pixels (4 histograms).

  • Finally, the feature vector of the entire region Ri is obtained by grouping the histogram-of-gradient vectors hc of the blocks into a single HOG feature vector Hi:

    $$ H_{i}=[h_{1},...,h_{c}] $$
    (3)

where hc denotes the histogram-of-gradients vector of a block and c is the number of blocks inside region Ri. To compute the similarity between two regions Ri and Rj, we use the cosine similarity measure defined by the formula below:

$$ {\mathrm{t}_{\text{ij}}= \text{cos}(\mathrm{H}_{\mathrm{i}},\mathrm{H}_{\mathrm{j}})= \frac{{\mathrm{H}_{\mathrm{i}}^{\mathrm{T}}}\mathrm{H}_{\mathrm{j}}}{\|\mathrm{H}_{\mathrm{i}}\|.\|\mathrm{H}_{\mathrm{j}}\|} } $$
(4)

where ||.|| denotes the L2 norm, and Hi and Hj are the HOG vectors of the regions Ri and Rj, respectively.
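The sketch below computes a HOG descriptor per region (on its bounding-box crop) with scikit-image and the cosine similarity of (4). Averaging the block histograms into a fixed-length descriptor is an assumption of this sketch, made so that regions of different sizes remain comparable; it is not stated in the text.

```python
import numpy as np
from skimage import color
from skimage.feature import hog

def region_hog(image_rgb, labels, region_id):
    """HOG descriptor of one region, computed on its bounding-box crop."""
    gray = color.rgb2gray(image_rgb)
    ys, xs = np.nonzero(labels == region_id)
    crop = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h = hog(crop, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2), feature_vector=True)
    # Average the 36-dimensional block histograms (2x2 cells x 9 bins) so that
    # regions with different numbers of blocks stay comparable (sketch assumption).
    return h.reshape(-1, 36).mean(axis=0) if h.size else np.zeros(36)

def texture_similarity(h_i, h_j):
    """Cosine similarity of equation (4)."""
    return float(h_i @ h_j / (np.linalg.norm(h_i) * np.linalg.norm(h_j) + 1e-12))
```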

4 Community detection

Community structure is a property of complex networks; it can be described as the gathering of nodes into communities such that the density of edges is higher within communities than between them [12]. The identification of communities is quite useful because nodes belonging to the same community are more likely to share properties. Many algorithms have been proposed for extracting the community structure of networks. Newman [26] defined a measure called modularity, which is widely used for evaluating a partition of a network into communities:

$$ Q= \sum_{i}\left(e_{ii}-a_{i}^{2}\right) $$
(5)

where eii denotes the fraction of network edges that fall within community i, and \({a_{i}^{2}}\) denotes the fraction expected if edges were placed at random. The modularity value Q is between 0 and 1; a high value means a strong community structure of the network. Another quality measure, the stability Qs, was introduced in [14] based on the clustered auto-covariance of a dynamic Markov process; it also measures the quality of a partition as a community structure. In order to choose appropriate community detection algorithms for extracting communities from the built weighted region adjacency graph, survey papers [19, 28, 29] are used to identify and assess candidate algorithms. According to them, the algorithms proposed by Ronhovde and Nussinov [32], Infomap [33], the fast greedy modularity optimization algorithm [11] and Louvain [5] deliver a reasonable estimate of the number of communities for networks of different sizes and outperform the other state-of-the-art algorithms for detecting communities. Here, we present a brief description of each algorithm.
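As an illustration of how a partition is scored, the snippet below evaluates the modularity of (5) on a small weighted toy graph with NetworkX; the graph and partition are made up for the example.

```python
import networkx as nx
from networkx.algorithms import community

# Toy weighted graph with two obvious groups {1,2,3} and {4,5,6}.
G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 0.9), (2, 3, 0.8), (1, 3, 0.7),
                           (3, 4, 0.1),
                           (4, 5, 0.9), (5, 6, 0.8), (4, 6, 0.7)])
partition = [{1, 2, 3}, {4, 5, 6}]
print("Q =", community.modularity(G, partition, weight="weight"))
```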

4.1 Fast multi-scale community detection algorithm using the criterion from Ronhovde and Nussinov (FMCDRN)

The algorithm is an improvement of the method in [32], which is based on the minimization of the Hamiltonian of a Potts-like spin model, where the spin state denotes the community membership of a node. To cover multiple community scales, from very small to very large, a resolution parameter is used. To identify relevant scales, the algorithm checks, for each value of the resolution parameter, the stability of the obtained partitions. This is done by computing, for the same resolution parameter, the similarity of the partitions obtained from different initial conditions; peaks in the similarity spectrum indicate relevant partitions. The algorithm is rather fast and its computational complexity is slightly superlinear in the number of edges of the graph. We refer to the method as FMCDRN in the next sections. The algorithm aims at speed efficiency; to this end, it uses a greedy approach that exploits all the available information (i.e., the input data and the information computed as the algorithm runs).

4.2 Infomap

The algorithm [33] is based on a compression technique that describes the information flow on networks. It performs random walks of a given length with a given probability of jumping to a random node. Walks are encoded as sequences of steps inside a community followed by jumps, using a two-level nomenclature based on Huffman coding: one level identifies nodes within a community, the other identifies communities within the network. Each node codeword is derived from the visit frequency of an infinitely long random walk, which leads to a compact representation of the walks. The authors showed that the optimal partitioning problem can be treated as finding the minimum description length of all the walks.

4.3 Fast greedy modularity optimization algorithm (FGMDO)

The algorithm [11] is a fast version of a previous method proposed by Newman. It starts from a set of initially isolated nodes and iteratively adds the edges of the original graph, producing at each step the greatest possible increase of the Newman-Girvan modularity. The algorithm starts with N communities, each containing a single node; communities are then repeatedly merged, at each step choosing the pair whose merger yields the largest increase (or smallest decrease) in modularity. The algorithm runs far more quickly than the original method. For sparse networks with a hierarchical community structure at many scales, its running time is essentially linear. This is not only an advanced technique, but one with substantial practical implications, as it allows the study of networks with a large number of nodes.

4.4 Louvain

The algorithm finds partitions of large graphs with high modularity in a short time and unfolds a complete hierarchical community structure of the graph [5]. It is divided into two steps that are iteratively repeated. First, the algorithm starts with a weighted graph of N nodes, and each node is assigned to its own community. For each node i, the algorithm considers the neighbors j of i and evaluates the gain of modularity obtained by moving i into the community of j. Node i is then moved to the community for which this gain is maximal, but only if the gain is positive; otherwise, i stays in its current community. This grouping process is applied iteratively to all nodes until no further gain can be achieved. In the second step, a new graph is built whose nodes are the communities found during the first step. Louvain offers a fair balance between the accuracy of the estimate of the modularity maximum and computational complexity. The output of the algorithm therefore provides several partitions: the partition found after the first step contains many small communities, and larger communities are found at subsequent steps due to the iterative grouping mechanism. This process naturally leads to a hierarchical decomposition of the graph.
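For reference, a weighted graph can be partitioned with the Louvain method as sketched below, using the implementation shipped with NetworkX (available since version 2.8); this is not the authors' implementation, and the resolution value is an assumption.

```python
from networkx.algorithms.community import louvain_communities

def detect_communities(wrag, resolution=1.0, seed=0):
    """Return a list of node sets found by Louvain on a weighted RAG."""
    return louvain_communities(wrag, weight="weight",
                               resolution=resolution, seed=seed)
```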

5 The proposed framework

Due to the inherent properties of images, the segmentation and community detection problems are different. Using community detection algorithms alone to segment an image, by considering pixels as nodes of the graph, can lead to low performance. The failure of such a method can be explained by several reasons. First, when we segment an image, pixels can have different properties, for example different colors, whereas in community detection, nodes of the same community tend to share similar features. Second, a single pixel cannot capture the regularities and information of homogeneous segments. Third, unlike general networks, images carry spatial information; for example, two adjacent regions probably belong to the same community. To address these problems, the proposed framework takes advantage of the inherent properties of the image as well as of the efficient optimization of modularity/stability provided by community detection algorithms. In this section, we describe the steps of the proposed framework so that the reader has a global picture of the entire framework before delving into the details. We refer to Fig. 1 for an illustration of the steps of the proposed general framework. The details of each step and some technical points are explained in the next sections.

Fig. 1 Flow chart of the proposed framework for one iteration

5.1 Initial segmentation

The initial segmentation process is very important to the functioning of the proposed framework. A single pixel cannot capture any information about texture, whereas regions capture the details shared among pixels (e.g., texture) and prevent local variations from being lost. It has been shown that regions have the advantage of adapting themselves to the image structure, being larger where the color remains similar over a large area. Moreover, regions drastically decrease the number of nodes in the graph, from millions to thousands, and hence the computational complexity, without affecting the segmentation performance. For all these reasons we use an initial segmentation to provide very small regions of pixels that contain information and regularities, which will be used to compute the similarity between regions. In addition, the proposed framework allows the use of various methods to produce this over-segmentation.

5.2 Region adjacency graph construction

Unlike conventional networks such as social or citation networks, images contain spatial a priori information: adjacent regions in the image are more likely to form a single image segment than regions that are far apart. We therefore construct the RAG using this spatial a priori information. Let G = (V,E) be an undirected graph, where vi ∈ V is a node corresponding to image region Ri and E is the set of edges connecting pairs of neighboring nodes. In other words, an edge is created between two nodes if their corresponding regions are adjacent in the image. As shown in Fig. 2, the RAG is built after the initial segmentation, where each region in the image becomes a node of the network.

Fig. 2 The construction process of the RAG from an initially segmented image; each region Ri in the image corresponds to a node of the RAG
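A minimal sketch of this construction is given below: starting from the label image of the initial segmentation, two regions become adjacent nodes whenever their labels touch horizontally or vertically (4-connectivity is an assumption of this sketch).

```python
import numpy as np
import networkx as nx

def build_rag(labels):
    """Build the region adjacency graph of a 2-D label image."""
    rag = nx.Graph()
    rag.add_nodes_from(np.unique(labels))
    # label pairs of horizontally and vertically adjacent pixels
    horiz = np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1)
    vert = np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1)
    pairs = np.vstack([horiz, vert])
    pairs = pairs[pairs[:, 0] != pairs[:, 1]]     # keep only region boundaries
    rag.add_edges_from(map(tuple, pairs))
    return rag
```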

5.3 Weighting the RAG

In this work, the edge weights are the similarities between regions, computed from the color and texture features of the two adjacent regions in the RAG. To compute the similarity matrix W (the weighted RAG), we use (2) and (4) to measure the similarity between each pair of adjacent regions, and assign the result as the weight of the corresponding edge. Unlike the approach in [20], where the authors build a graph by connecting super-pixels according to a weight function based only on color, in this work the similarity is computed from a combination of the LAB and HOG features. We refer to [35] for the construction of the similarity matrix (weighted RAG); the authors of [35] use a hybrid model that combines two features. Here we choose the texture (HOG) and color (LAB) features, as defined in the equation below:

$$ W = w_{ij}= a\times \sqrt{t_{ij}\times c_{ij}}+(1-a)\times c_{ij}, \qquad i,j = 1,\dots,n $$
(6)

where n denotes the number of regions and a is a balancing parameter.
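A minimal sketch of this weighting step is shown below; it reuses the color and texture similarity functions sketched in Section 3.2 (their names are assumptions of these sketches, not the authors' code), and the default a = 0.4 anticipates the value selected experimentally in Section 7.

```python
import numpy as np

def weight_rag(rag, c_ij, t_ij, a=0.4):
    """Assign the weight of equation (6) to every edge of the RAG.

    c_ij, t_ij: callables returning the color / texture similarity of two regions.
    """
    for ri, rj in rag.edges():
        c = c_ij(ri, rj)
        t = t_ij(ri, rj)
        rag[ri][rj]["weight"] = a * np.sqrt(t * c) + (1 - a) * c   # equation (6)
    return rag
```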

5.4 Extracting communities from the network

In this step, we extract communities from the weighted RAG using community detection algorithms. Several algorithms have been proposed in the literature to find the partition of a network that maximizes modularity or stability. Unlike the approaches of Li et al. [34] and Abin et al. [1], which use community detection algorithms with a high computational cost that do not always produce the best segmentation (such as the Newman fast algorithm and modularity optimization), the proposed framework uses efficient community detection algorithms [5, 11, 32, 33] which strike the best balance between computational cost and segmentation performance. Moreover, the proposed framework allows any future community detection algorithm to be used in this step.

Algorithm 1 The proposed segmentation framework

5.5 Merging process

In this step, an iterative process is used to construct the similarity matrix and to recalculate the weights between regions at each iteration according to (6). When community detection algorithms are used, regions keep expanding at each iteration, so the similarity measures computed at the previous iteration may not be suitable for the current one. Updating the weighted RAG at each iteration therefore re-evaluates the weights between regions. This process avoids leaving in the image many small regions that, from the perspective of the human visual system, should be merged into the same community.

The algorithm of the proposed framework can be summarized in Algorithm 1.
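A high-level sketch of this iterative loop is given below, chaining the helpers sketched in the previous sections (initial_segmentation, similarity_fns, build_rag, weight_rag and detect_communities are hypothetical helpers of these sketches, not the authors' code); stopping when the number of communities no longer changes is a simplification of the stability criterion described above.

```python
import numpy as np

def segment_image(image, initial_segmentation, similarity_fns, max_iter=10):
    """Sketch of the proposed framework: over-segment, build and weight the RAG,
    detect communities, merge regions, and iterate until the structure is stable."""
    labels = initial_segmentation(image)           # e.g. Meanshift or super-pixels
    previous_n = None
    for _ in range(max_iter):
        rag = build_rag(labels)
        c_ij, t_ij = similarity_fns(image, labels) # color / texture similarities
        rag = weight_rag(rag, c_ij, t_ij)
        communities = detect_communities(rag)
        if previous_n is not None and len(communities) == previous_n:
            break                                  # community structure is stable
        previous_n = len(communities)
        merged = labels.copy()
        for new_id, nodes in enumerate(communities, start=1):
            merged[np.isin(labels, list(nodes))] = new_id   # merge one community
        labels = merged
    return labels
```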

6 Data and methods

6.1 Data

In order to check the performance of the proposed framework in comparison with other methods, a database with manual segmentation labels is required. We use the publicly available Berkeley Segmentation Data Set 500 (BSDS500) [2], which contains 500 natural images of size 481×321: 200 images for training, 200 for testing and 100 for validation. Boundaries of each image are labeled by several workers and averaged to form the ground truth. Figure 3 shows BSDS500 images from different categories with their ground-truth segmentations. For the evaluation of segmentation performance, we use several metrics: the Probabilistic Rand Index (PRI), the Variation of Information (VOI), Precision, Recall and F-measure. These metrics have different characteristics; for example, the PRI tends to favor over-segmentation, while the VOI encourages under-segmentation. Therefore, an overall consideration of these metrics is necessary and reasonable. Note that a good segmentation corresponds to high PRI, Precision, Recall and F-measure values, but a low VOI value. All algorithms are implemented in Matlab and run on a machine with 4 GB of RAM and a 2.60 GHz processor.

Fig. 3 BSDS500 images from different categories. For each category, line 1: original images; line 2: ground-truth segmentations

6.2 Methods

To investigate the efficiency of the proposed framework, we first study the influence of some parameters used in computing the similarity between regions. We run initial experiments on 100 images of BSDS500 with several empirically chosen values of the balancing parameter a to find the value that achieves the best segmentation results.

Second, a comparison between image features (color and texture) is carried out. We examine the performance of the framework quantitatively as well as qualitatively when texture alone, color alone, and the combination of color and texture are used in the segmentation process.

Third, in order to choose the appropriate initial segmentation algorithm for the framework, we run experiments on 100 images of BSDS500 and compare the average values of PRI, VOI, Precision, and Recall for each algorithm. We then choose the algorithm that ensures the best segmentation results.

Next, we study the influence of the community detection algorithms. We classify all images of BSDS500 into four categories, namely people, urban scenery, animals and natural scenery. For each category we take five randomly chosen images with their ground truths, and we compare the results of the four efficient algorithms qualitatively as well as quantitatively using the average values of the PRI, VOI, Precision and Recall metrics for each category.

Finally, with the appropriate parameters and methods determined above, we perform a qualitative and quantitative comparison of the framework with alternative methods: Li et al. [34], Abin et al. [1], Lossy Compression (LC) [42] and EDISON [8]. For Li et al. [34] and Abin et al. [1], we keep the same parameters used by the authors. The EDISON [8] method is based on the mean shift implementation for both boundary extraction and noise filtering; its main parameter is the minimal region size, which we set to 1000 to avoid the creation of small regions. For Lossy Compression (LC) [42], we use a Gaussian Mixture Model to fit the image textures and, to find the optimal segmentation, we employ the Minimum Description Length principle, which produces the minimum coding length under a given distortion ratio. We use the distortion rate ε = 0.2.

6.3 Evaluation metrics

For the quantitative evaluation, we use the Probabilistic Rand Index (PRI) [37] and the Variation of Information (VOI) [23], which are well-known evaluation metrics for segmentation. The PRI measures the probability that a segmentation and its ground truth have matching labels in the two partitions. The larger the value, the greater the similarity between the two segmentations. The PRI range is [0,1].

The VOI metric measures the sum of the information gain and information loss between two segmentations. It is nonnegative; the lower its value, the greater the similarity. It is defined by the formula below:

$$ \text{VOI}(C,C^{\prime}) = H(C) + H(C^{\prime}) - 2\,I(C,C^{\prime}) $$
(7)

where H(C) and H(C') denote the entropies of the two segmentations C and C', respectively, and I(C,C') denotes the mutual information of C and C'. The metric range is [0, ∞), and the smaller the value, the more similar the two segmentations.
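The VOI of (7) can be computed directly from two label maps, as in the sketch below; the entropies and mutual information are estimated from the joint label histogram (natural logarithm and small non-negative integer labels are assumptions of this sketch).

```python
import numpy as np

def variation_of_information(seg_a, seg_b):
    """VOI between two integer label maps of the same shape (equation (7))."""
    a, b = seg_a.ravel(), seg_b.ravel()
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)                     # joint label histogram
    p = joint / joint.sum()
    pa, pb = p.sum(axis=1), p.sum(axis=0)
    h_a = -np.sum(pa[pa > 0] * np.log(pa[pa > 0]))  # H(C)
    h_b = -np.sum(pb[pb > 0] * np.log(pb[pb > 0]))  # H(C')
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / (pa[:, None] * pb[None, :])[nz]))  # I(C,C')
    return h_a + h_b - 2 * mi
```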

We also evaluate the performance of the proposed framework from two further aspects: Precision and Recall. These two measures are attractive as measures of segmentation quality because they are sensitive to under- and over-segmentation: under-segmentation leads to low Recall scores, while over-segmentation leads to low Precision scores.

The Precision measures the fraction of detected boundary pixels that match the ground-truth boundaries and is defined as:

$$ \text{Precision}= \frac{|S_{\text{test}} \cap S_{\text{gt}}|}{|S_{\text{test}}|} $$
(8)

where Sgt is the ground-truth segmentation, Stest is the segmentation under test, and |S| denotes the number of boundary pixels in segmentation S.

The Recall computes the percentage of ground-truth boundary pixels that are detected and is defined as:

$$ \text{Recall}= \frac{|S_{\text{test}} \cap S_{\text{gt}}|}{|S_{\text{gt}}|} $$
(9)

The Fα-measure is a quality measure based only on Recall and Precision, a weighted harmonic mean of the two, defined as:

$$ \text{F-measure}= \frac{\text{Precision}\cdot\text{Recall}}{(1-\alpha)\,\text{Recall}+ \alpha\,\text{Precision}} $$
(10)

In all the following experiments we set α = 0.5.
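For illustration, the sketch below computes boundary Precision, Recall and Fα from two binary boundary maps using exact pixel matching; the tolerance-based matching used by the Berkeley benchmark is omitted here for brevity.

```python
import numpy as np

def boundary_prf(boundary_test, boundary_gt, alpha=0.5):
    """Precision, Recall and F-measure of equations (8)-(10) for binary maps."""
    test = boundary_test.astype(bool)
    gt = boundary_gt.astype(bool)
    matched = np.logical_and(test, gt).sum()
    precision = matched / max(test.sum(), 1)            # equation (8)
    recall = matched / max(gt.sum(), 1)                 # equation (9)
    denom = (1 - alpha) * recall + alpha * precision
    f_measure = precision * recall / denom if denom else 0.0   # equation (10)
    return precision, recall, f_measure
```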

7 Experimental results

We have described the proposed framework but have not yet shown any results. For the sake of completeness and illustration, in this section the performance of the proposed framework is assessed qualitatively as well as quantitatively through a series of experiments.

7.1 Influence of similarity measures

7.1.1 Adjusting the balancing parameter a

As mentioned in Section 5.3, during each iteration we use a combination of the LAB color feature and the HOG texture feature to compute the similarity between regions using (6), where a is a balancing parameter. If a = 0, the texture information is not considered. We can observe in Fig. 4 that the wheel of the vehicle and its upper part are well encoded into the similarity. As the value of a increases, more such information patterns are encoded into the similarity, which better preserves the regularities. However, if a is too large (a = 0.8 in our case), the wheel and the upper part of the vehicle are merged into one segment. To determine the best value of the parameter a, we vary a from 0 to 1 in steps of 0.2, since steps of 0.1 or 0.05 do not produce any observable change in the segmentation result. The results show that a = 0.4 gives the best performance in terms of both PRI and VOI. In all the following experiments, we use a = 0.4.

Fig. 4 Image segmentation by FMCDRN for various values of the balancing parameter a

7.1.2 Comparison between features of similarity

An important issue for the proposed framework is to evaluate the influence of the color (LAB) and texture (HOG) information in the segmentation process. We compare the performance of the proposed framework when HOG alone, LAB alone, and the combination HOG+LAB are used in the segmentation process. The experiments were conducted on 100 images of the Berkeley segmentation dataset, using Meanshift as the initial segmentation and FMCDRN as the community detection algorithm. The balance between texture and color is controlled by the weight wij in (6); to obtain texture-only and color-only segmentations, we override the balancing parameter a manually (i.e., a = 0 for color alone and a = 1 for texture alone). Since BSDS500 provides multiple ground-truth segmentation maps (5 per image), in our experiments each metric is averaged over all the ground-truth maps of each image. As illustrated in Table 1, which presents the average values of PRI, VOI, Precision and Recall over the 100 images, the texture-only and color-only results are generally inferior to those obtained with the combination of texture and color for all metrics.

Table 1 Quantitative comparison between texture and color features

Figure 5 shows sample visual results for the segmentation of three BSDS500 images. Using only one of the proposed features leads to severe performance degradation. For example, in the first image, when only color or only texture is used to compute the similarity, the visually coherent pyramid and the desert are broken apart, and the same happens for the human face. Even with a properly chosen distance parameter, using only one feature, color or texture, breaks the regularities inside the object and leads to over-segmentation. In contrast, the combination of the two features yields better visual performance and preserves these regularities well.

Fig. 5 Comparison of segmentation results: a) original image; b) HOG feature; c) LAB feature; d) HOG with LAB

7.2 Influence of the initial segmentation algorithms

To choose the initial segmentation that ensures the best segmentation results, experiments have been conducted using two initial segmentation algorithms. The parameters of the Superpixels algorithm were chosen to give a number of regions in the same range as that produced by the Meanshift algorithm. For each image we compute the values of PRI, VOI, Precision, and Recall, and then average these metrics over 100 images of BSDS500. Table 2 reports the average values of PRI, VOI, Precision, and Recall over all images for the Meanshift and Superpixels algorithms. The Meanshift approach is more appropriate than the super-pixels for all four metrics. Since the limitations of the cited approaches motivate us to take time complexity into account, we also compare the run time of each algorithm: Meanshift runs about 2.5 times faster than Superpixels. For all these reasons, Meanshift is chosen as the initial segmentation for the next experiments.

Table 2 Quantitative comparison between Meanshift and Superpixels

7.3 Influence of community detection algorithms

In this section, we compare the proposed community detection algorithms, to choose the best of them for the next comparison with alternative methods.

In the first, qualitative evaluation, shown in Figs. 6, 7, 8 and 9, we test the proposed framework with each community detection algorithm on the Animals, People, Natural Scenery and Urban Scenery categories. The results produce sizable regions and give much better results for all selected images of each category, for example the human face in the people category, the animals in the animals category, the castle in the urban scenery category and the mountains in the natural scenery category. As observed in the figures, the Infomap algorithm does not always give the best segmentation and underestimates the number of communities, even though it is faster than the other community detection algorithms. In addition, we can observe from the figures that the FMCDRN algorithm gives the best segmentation of the image.

Fig. 6 Segmentation results of the framework with the proposed community detection algorithms, for images from the animals category. Line 1: original image; Line 2: FMCDRN; Line 3: FGMDO; Line 4: Louvain; Line 5: Infomap

Fig. 7 Segmentation results of the framework with the proposed community detection algorithms, for images from the people category. Line 1: original image; Line 2: FMCDRN; Line 3: FGMDO; Line 4: Louvain; Line 5: Infomap

Fig. 8 Segmentation results of the framework with the proposed community detection algorithms, for images from the natural scenery category. Line 1: original image; Line 2: FMCDRN; Line 3: FGMDO; Line 4: Louvain; Line 5: Infomap

Fig. 9 Segmentation results of the framework with the proposed community detection algorithms, for images from the urban scenery category. Line 1: original image; Line 2: FMCDRN; Line 3: FGMDO; Line 4: Louvain; Line 5: Infomap

We also assess the performance of the proposed framework with the four community detection algorithms quantitatively. Tables 3, 4, 5 and 6 present the average values of PRI, VOI, Precision and Recall for each category. Again, the results of the FMCDRN algorithm show its efficiency for the image segmentation task in terms of PRI/VOI/Precision/Recall/F-measure.

Table 3 Quantitative comparison between the community detection algorithms used in the proposed framework on the animals category (bold indicates the best value of each metric)
Table 4 Quantitative comparison between the community detection algorithms used in the proposed framework on the people category (bold indicates the best value of each metric)
Table 5 Quantitative comparison between the community detection algorithms used in the proposed framework on the natural scenery category (bold indicates the best value of each metric)
Table 6 Quantitative comparison between the community detection algorithms used in the proposed framework on the urban scenery category (bold indicates the best value of each metric)

7.4 Comparison with the alternative methods

First, we perform a qualitative comparison of the framework with some well-known state-of-the-art segmentation methods: Li et al. [34], Abin et al. [1], Lossy Compression (LC) [42] and EDISON [8]. We use the FMCDRN algorithm within the proposed framework for this comparison, since it gives the best image segmentation, as shown previously. Figure 10 shows that LC and EDISON exhibit different extents of over-segmentation, producing many small regions and breaking the information and regularities in some homogeneous regions of the image. In contrast, the proposed framework preserves the information and regularities in the segmented image and produces sizable homogeneous regions; it also performs at least as well as Li et al. [34] and Abin et al. [1]. These results indicate the superiority of the proposed framework over the other methods. Next, we compute the average values of PRI, VOI, Precision and Recall over all images. As shown in Table 7, the framework gives higher values and better results for the image segmentation task than the well-known segmentation algorithms EDISON, LC, Li et al. [34] and Abin et al. [1] in terms of PRI/VOI, and its performance is close to human visual perception with PRI = 0.828 and VOI = 1.695. The Precision, Recall and F-measure of the proposed framework with the FMCDRN algorithm also reach the highest values, with Precision = 0.788, Recall = 0.621 and F-measure = 0.694, which indicates that most of our segmentations have labels consistent with the ground-truth segmentations of BSDS500. In conclusion, the proposed framework achieves better performance in terms of Precision, Recall, and F-measure than the other state-of-the-art algorithms.

Fig. 10 Comparison of the segmentation results of all algorithms. Line 1: original image; Line 2: EDISON; Line 3: LC; Line 4: Abin et al. [1]; Line 5: Li et al. [34]; Line 6: our framework + FMCDRN

Table 7 Quantitative comparison between different algorithms on all Berkeley dataset images (bold indicates the best value of each metric)

7.5 Computational Time

We compare the running time of our proposed framework with the FMCDRN algorithm against Li et al. [34] and Abin et al. [1]. Each algorithm is tested on the 100 validation images of the Berkeley dataset, and the mean running time is computed for each step (initial segmentation, graph generation, and community detection). As shown in Table 8, the proposed framework runs consistently faster than Li et al. [34] and Abin et al. [1]: about 2.5 times faster than Abin et al. [1] and 5 times faster than Li et al. [34]. In conclusion, the proposed framework yields better results with a shorter processing time than the other state-of-the-art algorithms.

Table 8 Computational time of the segmentation algorithms (in seconds) (bold indicates the best value)

8 Conclusion

This paper proposed a framework for image segmentation that takes advantage of the inherent properties of images and of the optimization of modularity/stability. Efficient community detection algorithms such as FMCDRN, FGMDO, and Louvain are used to optimize modularity/stability; all of them can automatically detect the number of regions in the image. Using both the Histogram of Oriented Gradients (HOG) texture feature and the color feature, the similarity matrix is constructed adaptively between the different regions, and adjacent regions are merged iteratively by optimizing the modularity/stability. When the community structure no longer changes between two successive iterations, the optimal segmentation is reached. Our experiments have shown that the proposed framework gives the best qualitative segmentation results, as illustrated in the figures, and achieves the best quantitative performance compared with the state-of-the-art methods in terms of PRI, VOI, Precision, and Recall. Since the general framework is based on efficient community detection algorithms, it avoids the problem of having many small regions in the image and preserves the information and regularities of the objects. In addition, it has a good time complexity and runs consistently faster than the state-of-the-art algorithms [25, 40].