Abstract
In recent years, superpixels have become a prevailing tool in computer vision and many methods have been proposed. However, due to the problems such as high time complexity, low object boundary adherence and irregular shape, only a few methods are widely used. To improve these issues, we propose a novel general superpixel segmentation method called minstpixel, which relies on energy functional minimization. Minstpixel introduces an energy functional based on minimal spanning tree and designs a strategy to gain the global optimum. It never needs sophisticated optimization scheme, complicated mathematical deduction or fussy iteration process. At the same time, the time complexity of minstpixel is approximately linear with respect to the number of image pixels. The benchmark on Berkeley segmentation database shows that minstpixel could rival state-of-the-art in every aspect.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
Superpixels are usually defined as perceptually meaningful atomic regions [1] that can not only effectively capture image features but also greatly reduce the number of entities to be processed in subsequent image processing tasks, such as image segmentation, saliency detection [2, 3], contour closure [4], sketches extraction [5], object location [6], object tracking [7], 3D reconstruction [8], and many others [9].
According to previous work [10,11,12], we sum up five important properties of superpixel.
-
1.
Connectivity: Each superpixel is an individual connected region, all of the superpixels constitute the whole image, and superpixels do not overlap each other.
-
2.
Homogeneity: Pixels in the same superpixel should present same or similar visual features, especially in color and texture.
-
3.
Adhesion: The boundary of superpixels should strongly adhere to the boundary of objects in an image. In other words, the set of objects’ boundary is a proper subset of the set of superpixels’ boundary.
-
4.
Regularity: The regular arrangement, similar size, and the same shape are necessary, which provides more convenience for subsequent analysis. In the past, regularity is always ignored and devalued.
-
5.
Complexity: Algorithm complexity includes time complexity and space complexity. As an image preprocessing technique, linear time complexity and linear space complexity are necessary.
Among the above properties, connectivity is the essential qualification for image segmentation; homogeneity and adhesion are the premise and foundation for the accuracy of subsequent image analysis tasks; regularity is a convenience to application developers and helps reduce the trouble of development. Unfortunately, adhesion and regularity are a pair of contradictions, while homogeneity and efficiency are another pair of contradictions. Most superpixel segmentation methods attempt to strike a balance among these contradictions. Besides these four properties, computation efficiency is also an important qualification obviously. Superpixels are typically used to accelerate complex image understanding and analysis process. Therefore, the superpixel generation algorithm itself should be simple, straightforward to implement and of linear complexity, otherwise it will not be useful.
Each superpixel segmentation method is a compromise, and no method is perfect. For example, ERS is good at adhesion but weak in regularity, while SLIC is good at regularity but weak in adhesion [13,14,15]. Minstpixel devotes itself to strike a balance among the five properties. To meet this goal, we regard the superpixel segmentation problem as an energy minimization problem then design an energy functional based on the minimal spanning tree, and finally develop a greedy optimization strategy to optimize it. Unlike previous work, minstpixel never needs sophisticated optimization schemes, complicated mathematical deductions or a fussy iteration process. In benchmark, we implement minstpixel with approximately linear time complexity, which has achieved state-of-the-art performance on homogeneity, adhesion, regularity, and efficiency. Additionally, the Appendix mathematically proves that the greedy optimization strategy can exactly achieve the optimal value of the proposed energy functional. The drawback of minstpixel we have noted is that an individual pixel has the potential to be an independent superpixel, especially when there are too many superpixels. To overcome the drawback minstpixel merges the individual pixel into neighboring superpixels in a post-processing. Both theoretical analysis and experimental results demonstrate that minstpixel is effective and practical (Fig. 1).
2 Related works
Scientists have developed more than twenty distinctive superpixel generation methods. We selectively pick out seven methods for the contrastive experiments, and each chosen method is recognized as an excellent one [10]. They are SLIC(2012) [10], Ncut(2000) [16], ERS(2012) [17], Quick shift(2008) [18], Watersheds(1991) [19], Waterpixel(2015) [20], SEEDS(2012) [21], LSC(2017) [22], Manifold SLIC(2016) [1] and FH(2004) [23]. We will make a comparison between minstpixel and these seven methods in the experiment section of this article. Watershed transformation [19] proposed by Vincent and Soille is a pioneer of superpixel segmentation algorithms. It automatically seeks minimal gradient pixels as seed points and gradually extends these points to superpixels. The latest improved watershed transformation is waterpixel which achieves a tunable tradeoff between the superpixel regularity and the adherence to object boundary. Simple linear iterative clustering (SLIC) clusters pixels of input image to k superpixels using modified K-means algorithm. Due to its simplicity and high performance, SLIC is widely used. LSC and MSLIC are the last improved scheme of SLIC. MSLIC extends SLIC to content sensitive superpixels which means small superpixels are in dense regions and large superpixels are in sparse regions [1]. LSC further enhances the performance of SLIC using the conclusion that the objective functions of the weighted K-means and the normalized cuts share the same optimum points by mapping each point to specific feature space [22]. To generate superpixels, Ncut [16] recursively partitions a given graph by minimizing a cost function defined on the edges between partition boundary [10]. Mean-shift [24] and Quick-shift [18] are mode-seeking algorithms that generate superpixels by recursively moving to the kernel smoothed centroid for every data point in the pixel feature space [10]. SEEDS [21] extracts superpixels by minimizing a cost function defined by color distribution term and boundary term using hill-climbing method. Except the chosen methods above, ERGC [25], ETPS [26], Vcell [27], VCCS [28], PF [29], TPS [30], DASP [31], superpixel lattice [12] and Turbopixels [32] are meaningful methods as well.
3 Minstpixel
This section provides a detailed description of minstpixel. In this section, we first show a method to represent a superpixel via minimal spanning tree (MST), next propose an approach to construct energy functional based on MST, then introduce a strategy to get the global optimal solution of the energy functional, and finally analyze the time complexity.
3.1 Requisite knowledge
Graph representation
An undirected graph is a tuple \(\mathrm {G}(\mathcal {V},\mathcal {E})\), where \(\mathcal {V}\) and \(\mathcal {E}\) are vertex set and edge set, respectively. In edge set \(\mathcal {E}\), each edge e(vi, vj) consists of a pair of vertices vi, vj and a weight w(vi, vj).
Connected component
A connected component is a subgraph in which any two vertices are connected to each other by a path which only consists of vertices in this subgraph. Besides, vertices in two distinct subgraphs are not connected. Any undirected graph consists of one connected component at least.
Minimal spanning tree
A spanning tree T of an undirected graph G is a tree that includes all of the vertices of G. A minimal spanning tree (MST) of a graph is a spanning tree whose weight (the sum of the weights of its edges) is no larger than any other spanning tree. Figure 2 shows an example of minimal spanning tree. Figure 2a and c are undirected connected graphs where each vertex has a pixel value and each edge has a weight representing the difference between two adjacent vertices. Figure 2 and d are the minimal spanning trees of Fig. 2a and c respectively. As we can see, the weight of minimal spanning tree in Fig. 2c is smaller than that in Fig. 2d. This means vertices in Fig. 2a are more homogeneous than these in Fig. 2c.
Graph partition
A graph partion \(\mathcal {S}\) refers to a division of the vertex set \(\mathcal {V}\) into disjoint connected components \(\mathcal {S}=\{S_{1},S_{2},\cdots ,S_{k}\}\) such that Si ∩ Sj = ∅ for i≠j and \(\bigcup _{i} S_{i} = \mathcal {V}\). In other words, the graph partion aims to turn a graph \(\mathrm {G}(\mathcal {V},\mathcal {E})\) into a new graph \(\mathrm {G}(\mathcal {V},\mathcal {A})\) which consists of k connected components by removing some edges from edge set \(\mathcal {E}\). It can be restated as a subset selection problem. The goal of subset selection problem is to select a subset of edges \(\mathcal {A} \in \mathcal {E}\) such that the resulting graph \((\mathcal {V},\mathcal {A})\) consists of k connected components.
3.2 Construct graph
We now associate a weighted undirected graph \((\mathcal {V},\mathcal {E})\) with an image I. Each vertex \(v_{i} \in \mathcal {V}\) corresponds to one pixel I(xi, yi) in image I. If pixel I(xi, yi) and pixel I(xj, yj) are neighbors in image I, there is an edge \((v_{i},v_{j}) \in \mathcal {E}\) between vertice vi and vertice vj. Weight w(vi, vj) is a non-negative measure of the distance between pixel I(xi, yi) and I(xj, yj), usually based on the differences in intensity, color, texture or some other local attributes. In our experiment, distance between pixels is determined in CIElab color space which is widely considered as perceptually uniform for small color distances [10], as shown in (1). From the viewpoint of graph theory, superpixels generation problem can be restated as the graph partition problem. In graph partition problem, people manage to divide graph \((\mathcal {V},\mathcal {E})\) into a group of smaller separated connected components, all the connected components form a new graph \((\mathcal {V},\mathcal {A})\). The graph \((\mathcal {V},\mathcal {A})\) corresponds to a segmentation \(\mathcal {S}\), and each connected component ci corresponds to a superpixel si in segmentation \(\mathcal {S}\). There are various criteria to describe the quality of a segmentation but almost all criteria support that the pixels in the same superpixel should share as many of the same features as possible, while the pixels in the distinct superpixels should share as less of the same features as possible. This means that the lower the weight of edges in \(\mathcal {A}\) and the higher weight of edges in \(\mathcal {E}-\mathcal {A}\), the better the segmentation result. Any partition of a graph can be determined by the edge set \(\mathcal {A}\) or \(\mathcal {E}-\mathcal {A}\). So, there are two ways to get the partion. One method, such as Ncut, is to manage to get the set \(\mathcal {E}-\mathcal {A}\), that is to say, some chosen edges will be removed from the graph \((\mathcal {V},\mathcal {E})\) [33]. The other method, such as minstpixel, is to manage to get the edge set \(\mathcal {A}\).
3.3 Energy functional
From the viewpoint of graph theory, minstpixel regards a superpixel as an individual connected component and defines the internal similarity measure as the weight of minimal spanning tree. It is
In principle, this measure is problematic since it depends on the size of ci. It leads to that D(ci) can not compare to each other directly because their size may be unequal. However, we have found this measure works quite well in practice because minstpixel tends to generate superpixels with little differences in size. Although some definitions are more robust to outliers, they make the optimization strategy NP hard.
For a graph \(\mathrm {G}(\mathcal {V}, \mathcal {A})\) (or corresponding superpixel segmentation \(\mathcal {S}\)), we define the similarity measure as the sum of all superpixels. It is,
3.4 Constraint condition
Energy functional (3) has at least two obvious defects.
-
It can not provide any way to control the number of superpixels explicitly.
-
It can not allow us to explicitly adjust the distribution of superpixels.
In order to ameliorate these defects, we define a seed point set \(\mathcal {P}\) containing k seed points. Meanwhile, we impose that each superpixel must have one and only one seed point. If seed points are evenly distributed across the whole image, the superpixels will be distributed across the whole image evenly. More formally,
Obviously, the minimal value of (4) corresponds to the optimal superpixel segmentation.
3.5 Initial seed points placement
In minstpixel each superpixel grows up from a seed pixel, thus it is important to start from a good initial seed point set. SLIC [10], Turbo Pixels [32] and some other methods use regular grid points as seed points, but minstpixel considers that the center of each homogeneous region is the optimum position for seed. This seed point set makes the initial energy lower than regular grid points, so minstpixel can achieve better segmentation result.
To locate these points heuristically, minstpixel employs (6) to describe edge distribution because it performs well when searching for edges of the image [34].
Eq.5 describes the calculation of the normalized gradient, where Gσ, ∥∇I∥ and γ are the Gaussian function with standard deviation σ, the gradient magnitude of color image I and the parameter to reduce the noise disturbance respectively. In (6), the \(\overline {\mathrm {E}(x,y)}\) is the mean of E(x, y), and the Z is the regularized factor making sure that the integral of D(x, y) is 1. D(x, y) can be regarded as a probability that (x, y) lies on edges of objects.
Furthermore, we suppose that point set \(\mathcal {P}^{\prime } = \{ (x^{\prime },y^{\prime })\}\) contains k grid points which have the same span \(\sqrt {m/k}\), where m is the total pixel number of the image and k is the number of superpixels. Our initialization method can be represented as a map \(\mathcal {P^{\prime }}\rightarrow \mathcal {P}\). Equation (7a, 7b) quantitatively describes the relationship between \(\mathcal {P}^{\prime }\) and \(\mathcal {P}\).
From (7a), xi and y are in one-to-one relationship, then (7a) is a curve extending in direction y. Similarly, (7b7b) is a curve extending in direction x. The intersection of the two curves is the seed point (xi, yi). If there are multiple points of intersection, minstpixel will choose the one with the lowest D(x, y).
3.6 Energy minimization
Mathematically, the problem that we are facing is a typical combination optimal problem. It’s almost impossible to find any analytical solution with current technologies. Instead, we craft a greedy method to gain reasonable superpixel segmentation. Our method begins by sampling k seed pixels and then gradually extend to all pixels based on a greedy rule. Then, we have proved that the optimum point can be reached by this method using mathematical induction in the Appendix.
We now have established a segmentation S obeying energy functional (4). Figure 3 shows the complete process of algorithm 1. It is inescapably clear that the result graph includes m vertices, m − k edges and k connected components, and each component corresponds to a superpixel. In the sequel we show that algorithm 1 converges exactly to the global optimal solution of (4), although it makes greedy decisions in each iteration.
3.7 Implementation and complexity
Algorithm 1 maintains the segmentation \(\mathcal {S}\) using a disjoint set with path compression and union by rank. This disjoint set can finish n merge operations and m search operations in O(mα(n)) time, where mα(n) is inverse Ackerman’s function which is an approximate linear function [35]. Step 1 sorts m weights into nondecreasing order. In general, it can consume O(m log m) time using Quick Sort [35]. Step 2 and Step 3 find and merge eligible components. There are m − k merge operations and up to m search operations, so it consumes O((m − k)α(m)) time. All in all, the time complexity of algorithm 1 is O(m log m + (m − k)α(m)).
4 Experiments
We compare minstpixel with ten state-of-the-art algorithms, namely SLIC(2012) [10], Ncut(2000) [16], ERS(2012) [17], Quick shift(2008) [18], Watersheds(1991) [19], Waterpixel(2015) [20], SEEDS(2012) [21], LSC(2017) [22], Manifold SLIC(2016) [1] and FH(2004) [23], using publicly available source codes on the Berkeley Segmentation Database [36]. All benchmarks are conducted on a desktop PC equipped with two intel xeon E5-2620 v4 processors and 16GB of memory.
4.1 Adhesion
The adhesion refers to the ability to adhere object boundary. Superpixels generally serve as the foundation of object detection, so adhesion is a key evaluation criterion. In order to evaluate adhesion, predecessors propose boundary recall against the number of superpixels (BR). BR is the fraction of ground truth boundary that fall in the superpixel boundary as (8). A ground truth boundary pixel is counted if it falls within strictly less than 2 pixels from superpixel boundary.
Where TP (True Positives) is the number of boundary pixels that fall in the superpixel boundary, TN is the number of boundary pixels that fail to fall in the superpixel boundary. A high BR indicates that the majority of true boundary are preserved. However some previous algorithms also achieved high BR in benchmark, because these algorithms tend to generate tortuous boundary which result in high boundary density. For example, boundary density of FH significantly exceeds minstpixel, so FH has more chances to hit true boundary than minstpixel, as shown in Fig. 6. It is unfair to the method tending to generate smooth boundary. Machairas and Kalinin have noticed this problem and propose boundary density against boundary recall to evaluate adhesion of superpixels [11]. However, boundary density against boundary recall still ignores the influence of the quantity of superpixels. In order to take both the quantity of superpixels and the density of boundary into account, we define normalization boundary recall as NBR = BR/C, by introducing the penalty factor C which is proportional to boundary density. Figure 4a demonstrates the results of benchmark.
4.2 Homogeneity
Homogeneity refers to the similarity of the pixels from the same superpixel. Superior homogeneity means that a superpixel can only cover one object. The homogeneity benchmark consists of two common evaluation criteria: under segmentation error (USE) and achievable segmentation accuracy (ASA). USE measures what extent superpixels stretch over the ground truth segment boundary. A lower USE indicates that fewer superpixels cross the boundary of objects. USE is defined as \(\left [ {\sum }_{S} \left ({\sum }_{P:P \cup S \neq \emptyset } \min (|P_{in}|,| P_{out} | ) \right ) \right ]/N\), where N is the number of pixels in image, |Pin| is the number of pixels in part Pin and |Pout| is the number of pixels in part Pout. A factitious case of USE is shown in Fig. 5f, in this case N is (|Aout| + |Bin| + |Cin|)/|S|. ASA measures the upper bound of accuracy when image objects are reconstructed using a given set of superpixels as units. Given a ground truth segmentation G and a superpixel segmentation S, achievable segmentation accuracy is defined as \(\left ({\sum }_{k} \max _{i} \mid S_{k} \cap G_{i} \mid \right ) / {\sum }_{i} \mid G_{i} \mid \).
4.3 Regularity
The regularity is always devalued but it is an important indicator. On the one hand, irregular boundary will contribute to the false high performance in benchmark. On the other hand, irregular shape will cause trouble for subsequent image analysis (Fig. 6).
The regularity contains two items [20]: (1) the smoothness of boundary (2) the shape and size similarity of superpixels. The smoothness is evaluated by the boundary density which is defined as a ratio of the number of superpixel boundary pixels to the total number of pixels in the image. The shape and size similarity is evaluated by average standard deviation of size. Figure 7 shows the benchmark result in regularity.
4.4 Efficiency
Obviously, time complexity is also a crucial evaluation criterion for superpixel segmentation algorithm. Table 1 and Fig. 8 shows the theoretical time complexity and actual run time on the Berkeley database.
4.5 Application
In this section, we display one typical computer vision application that benefit from superpixels: object segmentation. Minstpixel reduces the number of entities to manipulate and improve the result of existing algorithm. The task is to recognize 21 classes of objects from MSRC database using Ada-Boost classifiers and the appearance features are based on the work of Gould et al [9]. For each superpixel si, we construct an 83 dimensions feature vector ϕ(si) to describe region’s size, location, color, shape and texture feature.
It specifically includes:
-
Size and location (1 + 2)
-
The ratio of area to circumference and inertia moment (1 + 2)
-
The mean, standard deviation, skewness and kurtosis statistics of RGB and Lab color space components (4 × (3 + 3))
-
The mean, standard deviation, skewness and kurtosis statistics of texture feature drawn from 13 filters responses (4 × 13)
In addition, we add the weighted mean of adjacent superpixels’ feature vector to each superpixel as (9),
where \(\mathcal {N} (\mathbf {s}_{i}) = \left \{ S_{j} | (\mathbf {s}_{i},S_{j}) \in \epsilon (\mathcal {I}) \right \}\) is the set of superpixels which are adjacent to si and ∣Sj∣ is the area of superpixel Sj. We train a one-vs-all Ada-Boost classifier \(\sigma _{c^{\prime }}\) (c′ is the class of object) for each object and then normalize over all classifiers to get the probability \(P(c_{i}=c^{\prime } \mid \mathbf {s}_{i}, \mathcal {I} )\) as (10)
where σc is the Ada-Boost classifier for class c. Finally, let \(c_{i} = \mathop {\arg \max }\limits _{c^{\prime }} P(c_{i} = c^{\prime } \mid \mathbf {s}_{i} , \mathcal {I} )\), then ci is the class of superpixel si.
Compared with the original method, the accuracy is increased by 1.8% and the running time is reduced by 99%, as show in Table 2.
4.6 Discussion and analysis
We can draw many meaningful conclusions from experiment results. Firstly, none of superpixel segmentation algorithms can really defeat all other algorithms comprehensively. Each algorithm is a compromise between performance and time consumption. The increase in performance is usually accompanied by decrease in efficiency or other properties. When performance reaches a certain level, time consumption will increase exponentially. For instance, ERS leads minstpixel by 1% in performance but ERS spends 10 times than minstpixel. Secondly, whether a superpixel is good or bad should be judged based on a specific scene. For example, ERS is good at dealing with the demand for high accuracy, SLIC is good at dealing with the demand for regular shape and WS is skilled at dealing with real time scenes. Finally, all things considered, minstpixel has reached the level of state-of-art algorithms, although minstpixel can not comprehensively defect all other algorithms.
5 Conclusions
We have presented a novel superpixel segmentation algorithm called minstpixel, which produces compact and regular shaped superpixels with high accuracy, low time complexity and simple implementation. To generate high quality superpixels, minstpixel firstly proposes an energy functional based on minimal spanning tree and the initialization strategy. Then, minstpixel optimizes the energy functional by picking out the specific edge at each iteration. Additionally, we prove that the optimization algorithm can produce exactly the global optimum of the energy functional. Both theory and practice suggest that minstpixel is reasonable and practicable. Certainly, minstpixel is not perfect and may be improved by multi-scale mutual information [37,38,39].
-
Similarity measure for single superpixel D(si) in energy functional (4) depends on the size of superpixel. It causes the superpixel size to be nonuniform. Usually it isn’t a serious implication, since the constraint condition that each superpixel has one and only one seed point to fill this gap to a certain extent.
-
Energy functional (4) has multiple optimal solutions but algorithm 1 can only find one. It means that there may be some better solutions which are neglected by algorithm 1.
References
Liu Y-J, Cheng-Chi Y, Min-Jing Y, He Y (2016) Manifold SLIC: a fast method to compute content-sensitive superpixels. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 651–659
Cheng M-M, Zhang G-X, Mitra NJ, Huang X, Shi-Min H (2011) Global contrast based salient region detection. In: CVPR 2011, vol 37. IEEE, pp 409–416
Achanta R, Hemami S, Estrada F, Susstrunk S (2009) Frequency-tuned salient region detection. In: 2009 IEEE conference on computer vision and pattern recognition, number Ic. IEEE, pp 1597–1604
Levinshtein A, Sminchisescu C, Dickinson S (2010) Optimal contour closure by superpixel grouping. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), volume 6312 LNCS. Springer, Berlin, pp 480–493
Yu C-C, Liu Y-J, Wu MT, Li K-Y, Fu X (2014) A global energy optimization framework for 2.1D sketch extraction from monocular images. Graph Models 76(5):507–521
Fulkerson B, Vedaldi A, Soatto S (2009) Class segmentation and object localization with superpixel neighborhoods. In: 2009 IEEE 12th international conference on computer vision. IEEE, pp 670–677
Shu W, Huchuan L, Fan Y, Ming-Hsuan Y (2011) Superpixel tracking. In: 2011 IEEE international conference on computer vision (ICCV). IEEE, pp 1323–1330
Mičušík B, Košecká J (2010) Multi-view superpixel stereo in urban environments. Int J Comput Vis 89(1):106–119
Gould S, Rodgers J, Cohen D, Elidan G, Koller D (2008) Multi-class segmentation with relative location prior. Int J Comput Vis 80(3):300–316
Achanta R, Shaji A, Smith K, Lucchi A, Fua P, Susstrunk S (2012) SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans Pattern Anal Mach Intell 34(11):2274– 2282
Machairas V, Decenciere E, Walter T (2014) Waterpixels: superpixels based on the watershed transformation. In: 2014 IEEE international conference on image processing (ICIP). IEEE, pp 4343–4347
Moore AP, Prince SJD, Warrell J, Mohammed U, Jones G (2008) Superpixel lattices. In: 2008 IEEE conference on computer vision and pattern recognition. IEEE, pp 1–8
Mathieu B, Crouzil A, Puel JB (2017) Oversegmentation methods: a new evaluation. Pattern Recogn Image Anal 10255:185–193
Wang M, Liu X, Gao Y, Ma X, Soomro NQ (2017) Superpixel segmentation: a benchmark. Signal Process Image Commun 56:28–39
Neubert P, Protzel P (2012) Superpixel benchmark and comparison. In: Forum bildverarbeitung, pp 1–12
Jianbo S, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Liu M-Y, Tuzel O, Ramalingam S, Chellappa R (2011) Entropy rate superpixel segmentation. In: CVPR 2011. IEEE, pp 2097–2104
Vedaldi A, Soatto S (2008) Quick shift and kernel methods for mode seeking. In: Computer vision – ECCV 2008, volume 5305 LNCS. Springer, Berlin, pp 705–718
Vincent L, Soille P (1991) Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans Pattern Anal Mach Intell 13(6):583–598
Machairas V, Faessel M, Cardenas-Pena D, Chabardes T, Walter T, Decenciere E (2015) Waterpixels. IEEE Trans Image Process 24(11):3707–3716
Van den Bergh M, Boix X, Roig G, de Capitani B, Van Gool L (2012) SEEDS: superpixels extracted via energy-driven sampling. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), volume 7578 LNCS, pp 13–26
Chen J, Li Z, Bo H (2017) Linear spectral clustering superpixel. IEEE Trans Image Process, 1–1
Felzenszwalb PF, Huttenlocher DP (2004) Efficient graph-based image segmentation. Int J Comput Vis 59(2):167–181
Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 24(5):603–619
Buyssens P, Gardin I, Ruan S (2014) Eikonal based region growing for superpixels generation: application to semi-supervised real time organ segmentation in CT images. IRBM 35(1):20–26
Yao J, Boben M, Fidler S, Urtasun R (2015) Real-time coarse-to-fine topologically preserving segmentation. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 2947–2955
Wang J, Wang X (2012) VCells: simple and efficient superpixels using edge-weighted centroidal voronoi tessellations. IEEE Trans Pattern Anal Mach Intell 34(6):1241–1247
Papon J, Abramov A, Schoeler M, Worgotter F (2013) Voxel cloud connectivity segmentation - supervoxels for point clouds. In: 2013 IEEE conference on computer vision and pattern recognition. IEEE, pp 2027–2034
Drucker F, MacCormick J (2009) Fast superpixels for video analysis. In: 2009 workshop on motion and video computing (WMVC). IEEE
Tang D, Huazhu F, Cao X (2012) Topology preserved regular superpixel. In: 2012 IEEE international conference on multimedia and expo. IEEE, pp 765–768
Weikersdorfer D (2012) Depth-adaptive superpixels
Levinshtein A, Stere A, Kutulakos KN, Fleet DJ, Dickinson SJ, Siddiqi K (2009) TurboPixels: fast superpixels using geometric flows. IEEE Trans Pattern Anal Mach Intell 31(12):2290–2297
Ren X, Malik J (2003) Learning a classification model for segmentation. Proc Ninth IEEE Int Conf Comput Vis 1(c):10–17, 1
Wang P, Zeng G, Gan R, Wang J, Zha H (2013) Structure-sensitive superpixels via geodesic distance. Int J Comput Vis 103(1):1–21
Cormen T, Rivest R, Leiserson C (1989) Introduction to algorithms. MIT Press
Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In: Proceedings eighth IEEE international conference on computer vision. ICCV 2001, vol 2. IEEE Comput. Soc, pp 416–423
Wei W, Fan X, Song H, Fan X, Yang J (2016) Imperfect information dynamic Stackelberg game based resource allocation using hidden Markov for cloud computing. IEEE Trans Serv Comput, 1–1
Wei W, Fan X, Song H, Wang H (2017) Video tamper detection based on multi-scale mutual information. Multimed Tools Appl, 1–18
Wei W, Yang X-L, Zhou B, Feng J, Shen P-Y (2012) Combined energy minimization for image reconstruction from few views. Math Probl Eng 2012:1–15
Kruskal JB (1956) On the shortest spanning subtree of a graph and the traveling salesman problem. Proc Amer Math Soc 7(1):48
Acknowledgments
This work was supported by the National High Technology Research, Development Program of China (No.2017YFB1401600), the National Natural Science Foundation of China (No.61573235,61703260), the Shanghai Innovation Action Project of Science and Technology (No.16111105802), and the Fundamental Research Funds for the Central Universities.
Author information
Authors and Affiliations
Corresponding author
Appendix
Appendix
This section prepares an proof for the global optimality of the algorithm 1. Proposition 1 and proposition 2 demonstrate that the segmentation \(\mathcal {S}\) generated by algorithm 1 is a valid superpixel segmentation. Proposition 3 demonstrates that algorithm 1 will always converge to the optimum of (4).
Proposition 1
There is one and only one seed point in each connected component produced by algorithm 1.
Proof
On the one hand, the number of seed points in each component is less than 2 (1 or 0) according to the step 3 of algorithm 1. On the other hand, we suppose that in segmentation \(\mathcal {S}\) there exists one connected component ci without seed points, then the number of connected components of segmentation is greater than k. There is a contradiction between assumption and termination condition of algorithm 1. So, the number of seed points in each component cannot be other except 1.□
Proposition 2
Each connected component produced by algorithm 1 is a minimum spanning tree.
Proof
Let \(G(\mathcal {V},\mathcal {E})\) be an input graph, \(G(\mathcal {V},\mathcal {A})\) is the segmentation result produced by algorithm 1, ci(vi, ai) is the ith connected component of graph \((\mathcal {V},\mathcal {A})\) i.e. \(\bigcup _{i} \mathbf {v}_{i}=\mathcal {V}\) and \(\bigcup _{i} \mathbf {a}_{i}\) = \(\mathcal {A}\). Obviously, if we remove an edge e ∉ \(\mathcal {A}\) from graph \((\mathcal {V},\mathcal {E})\), then the segmentation result \((\mathcal {V},\mathcal {A})\) produced by algorithm 1 remains the same. Now, we build a graph \((\mathcal {V}^{\prime },\mathcal {E}^{\prime })\) from graph \((\mathcal {V},\mathcal {E})\) by removing all edges e(v1, v2) satisfying v1 ∈ ci, v2 ∈ cj and ci≠cj. Suppose that \((\mathcal {V}^{\prime },\mathcal {A}^{\prime })\) is the segmentation result produced by algorithm 1 on graph \((\mathcal {V}^{\prime },\mathcal {E}^{\prime })\), \(c^{\prime }_{i}(\mathbf {v}^{\prime }_{i},\mathbf {a}^{\prime }_{i})\) is the ith connected component of graph \((\mathcal {V}^{\prime },\mathcal {A}^{\prime })\) i.e. \(\bigcup _{i} \mathbf {v}^{\prime }_{i}\) = \(\mathcal {V}^{\prime }\) and \(\bigcup _{i} \mathbf {a}^{\prime }_{i}=\mathcal {A}^{\prime }\). Graph \((\mathcal {V},\mathcal {A})\) and graph \((\mathcal {V}^{\prime },\mathcal {A}^{\prime })\) is identical because all removed edges don’t belong to \(\mathcal {A}\), and then each ci is identical to the corresponding \(c^{\prime }_{i}\). On the other side, the process of algorithm 1 for graph \((\mathcal {V}^{\prime },\mathcal {E}^{\prime })\) is perfectly equivalent to the process of Kruskal algorithm for every \(c^{\prime }_{i}\) [40]. So, each connected component \(c^{\prime }_{i}\) is a minimum spanning tree, and then each ci is also a minimum spanning tree.□
Proposition 3
The algorithm 1 can converge to the global minimum point.
Proof
We define proposition P : If \(\mathcal {A}\) is the set of edges chosen at any stage of the algorithm 1, then there are some optimal segmentations that contain \(\mathcal {A}\). Clearly, proposition 3 is equivalent to proposition P. We show that proposition P is true by induction.
-
1.
Clearly, P is true at the beginning, when \(\mathcal {A}\) is empty: any optimal segmentation will do, and there exists one because a weighted connected graph always has some segmentation satisfying the constraint condition of energy functional (4).
-
2.
Now assume that P is true for a non-final edge set \(\mathcal {A}\) and let \(\mathcal {S}\) be an optimal segmentation that contains \(\mathcal {A}\). If the next chosen edge e is also in \(\mathcal {S}\), then P is true for \(\mathcal {A} + e\). Otherwise, \(\mathcal {S} +e\) has a cycle C or leads to a superpixel which contains two seed points p1 and p2.
-
If, \(\mathcal {S} + e\) has a cycle C and there is another edge f that is in C but not \(\mathcal {A}\). (If there were no such edge f , then e could not have been added to \(\mathcal {A}\), since doing so would have created the cycle C.) Then \(\mathcal {S} - f + e\) is a valid superpixel segmentation, and it has the same weight as \(\mathcal {S}\), since \(\mathcal {S}\) has minimum weight and the weight of f cannot be less than the weight of e, otherwise the algorithm 1 would have chosen f instead of e. So \(\mathcal {S} - f + e\) is a minimum spanning tree containing \(\mathcal {A} + e\) and again, P holds.
-
If, \(\mathcal {S} + e\) leads to a superpixel contains two seed points p1 and p2. In other words, there is a path between p1 and p2 and there is another edge f that is in path p1p2 but not \(\mathcal {A}\). Then \(\mathcal {S} - f + e\) is a valid superpixel segmentation, and it has the same weight as \(\mathcal {S}\), since \(\mathcal {S}\) has minimum weight and the weight of f cannot be less than the weight of e, otherwise the algorithm 1 would have chosen f instead of e. So \(\mathcal {S} - f + e\) is a minimum spanning tree containing \(\mathcal {A} + e\) and again, P holds.
-
-
3.
Therefore, by the principle of induction, P holds when \(\mathcal {A}\) has become a valid segmentation, which is possible only if \(\mathcal {A}\) is a optimal superpixel segmentation itself.
□
Rights and permissions
About this article
Cite this article
Wu, X., Liu, X., Chen, Y. et al. A graph based superpixel generation algorithm. Appl Intell 48, 4485–4496 (2018). https://doi.org/10.1007/s10489-018-1223-1
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-018-1223-1