1 Introduction

Superpixels are usually defined as perceptually meaningful atomic regions [1] that can not only effectively capture image features but also greatly reduce the number of entities to be processed in subsequent image processing tasks, such as image segmentation, saliency detection [2, 3], contour closure [4], sketches extraction [5], object location [6], object tracking [7], 3D reconstruction [8], and many others [9].

According to previous work [10,11,12], we sum up five important properties of superpixel.

  1. 1.

    Connectivity: Each superpixel is an individual connected region, all of the superpixels constitute the whole image, and superpixels do not overlap each other.

  2. 2.

    Homogeneity: Pixels in the same superpixel should present same or similar visual features, especially in color and texture.

  3. 3.

    Adhesion: The boundary of superpixels should strongly adhere to the boundary of objects in an image. In other words, the set of objects’ boundary is a proper subset of the set of superpixels’ boundary.

  4. 4.

    Regularity: The regular arrangement, similar size, and the same shape are necessary, which provides more convenience for subsequent analysis. In the past, regularity is always ignored and devalued.

  5. 5.

    Complexity: Algorithm complexity includes time complexity and space complexity. As an image preprocessing technique, linear time complexity and linear space complexity are necessary.

Among the above properties, connectivity is the essential qualification for image segmentation; homogeneity and adhesion are the premise and foundation for the accuracy of subsequent image analysis tasks; regularity is a convenience to application developers and helps reduce the trouble of development. Unfortunately, adhesion and regularity are a pair of contradictions, while homogeneity and efficiency are another pair of contradictions. Most superpixel segmentation methods attempt to strike a balance among these contradictions. Besides these four properties, computation efficiency is also an important qualification obviously. Superpixels are typically used to accelerate complex image understanding and analysis process. Therefore, the superpixel generation algorithm itself should be simple, straightforward to implement and of linear complexity, otherwise it will not be useful.

Each superpixel segmentation method is a compromise, and no method is perfect. For example, ERS is good at adhesion but weak in regularity, while SLIC is good at regularity but weak in adhesion [13,14,15]. Minstpixel devotes itself to strike a balance among the five properties. To meet this goal, we regard the superpixel segmentation problem as an energy minimization problem then design an energy functional based on the minimal spanning tree, and finally develop a greedy optimization strategy to optimize it. Unlike previous work, minstpixel never needs sophisticated optimization schemes, complicated mathematical deductions or a fussy iteration process. In benchmark, we implement minstpixel with approximately linear time complexity, which has achieved state-of-the-art performance on homogeneity, adhesion, regularity, and efficiency. Additionally, the Appendix mathematically proves that the greedy optimization strategy can exactly achieve the optimal value of the proposed energy functional. The drawback of minstpixel we have noted is that an individual pixel has the potential to be an independent superpixel, especially when there are too many superpixels. To overcome the drawback minstpixel merges the individual pixel into neighboring superpixels in a post-processing. Both theoretical analysis and experimental results demonstrate that minstpixel is effective and practical (Fig. 1).

Fig. 1
figure 1

Superpixel illustration. The original image comes from the famous Berkeley segmentation database. From left to right each superpixel contains about 1000, 500, 250 pixels respectively

2 Related works

Scientists have developed more than twenty distinctive superpixel generation methods. We selectively pick out seven methods for the contrastive experiments, and each chosen method is recognized as an excellent one [10]. They are SLIC(2012) [10], Ncut(2000) [16], ERS(2012) [17], Quick shift(2008) [18], Watersheds(1991) [19], Waterpixel(2015) [20], SEEDS(2012) [21], LSC(2017) [22], Manifold SLIC(2016) [1] and FH(2004) [23]. We will make a comparison between minstpixel and these seven methods in the experiment section of this article. Watershed transformation [19] proposed by Vincent and Soille is a pioneer of superpixel segmentation algorithms. It automatically seeks minimal gradient pixels as seed points and gradually extends these points to superpixels. The latest improved watershed transformation is waterpixel which achieves a tunable tradeoff between the superpixel regularity and the adherence to object boundary. Simple linear iterative clustering (SLIC) clusters pixels of input image to k superpixels using modified K-means algorithm. Due to its simplicity and high performance, SLIC is widely used. LSC and MSLIC are the last improved scheme of SLIC. MSLIC extends SLIC to content sensitive superpixels which means small superpixels are in dense regions and large superpixels are in sparse regions [1]. LSC further enhances the performance of SLIC using the conclusion that the objective functions of the weighted K-means and the normalized cuts share the same optimum points by mapping each point to specific feature space [22]. To generate superpixels, Ncut [16] recursively partitions a given graph by minimizing a cost function defined on the edges between partition boundary [10]. Mean-shift [24] and Quick-shift [18] are mode-seeking algorithms that generate superpixels by recursively moving to the kernel smoothed centroid for every data point in the pixel feature space [10]. SEEDS [21] extracts superpixels by minimizing a cost function defined by color distribution term and boundary term using hill-climbing method. Except the chosen methods above, ERGC [25], ETPS [26], Vcell [27], VCCS [28], PF [29], TPS [30], DASP [31], superpixel lattice [12] and Turbopixels [32] are meaningful methods as well.

3 Minstpixel

This section provides a detailed description of minstpixel. In this section, we first show a method to represent a superpixel via minimal spanning tree (MST), next propose an approach to construct energy functional based on MST, then introduce a strategy to get the global optimal solution of the energy functional, and finally analyze the time complexity.

3.1 Requisite knowledge

Graph representation

An undirected graph is a tuple \(\mathrm {G}(\mathcal {V},\mathcal {E})\), where \(\mathcal {V}\) and \(\mathcal {E}\) are vertex set and edge set, respectively. In edge set \(\mathcal {E}\), each edge e(vi, vj) consists of a pair of vertices vi, vj and a weight w(vi, vj).

Connected component

A connected component is a subgraph in which any two vertices are connected to each other by a path which only consists of vertices in this subgraph. Besides, vertices in two distinct subgraphs are not connected. Any undirected graph consists of one connected component at least.

Minimal spanning tree

A spanning tree T of an undirected graph G is a tree that includes all of the vertices of G. A minimal spanning tree (MST) of a graph is a spanning tree whose weight (the sum of the weights of its edges) is no larger than any other spanning tree. Figure 2 shows an example of minimal spanning tree. Figure 2a and c are undirected connected graphs where each vertex has a pixel value and each edge has a weight representing the difference between two adjacent vertices. Figure 2 and d are the minimal spanning trees of Fig. 2a and c respectively. As we can see, the weight of minimal spanning tree in Fig. 2c is smaller than that in Fig. 2d. This means vertices in Fig. 2a are more homogeneous than these in Fig. 2c.

Fig. 2
figure 2

minimal spanning tree

Graph partition

A graph partion \(\mathcal {S}\) refers to a division of the vertex set \(\mathcal {V}\) into disjoint connected components \(\mathcal {S}=\{S_{1},S_{2},\cdots ,S_{k}\}\) such that SiSj = for ij and \(\bigcup _{i} S_{i} = \mathcal {V}\). In other words, the graph partion aims to turn a graph \(\mathrm {G}(\mathcal {V},\mathcal {E})\) into a new graph \(\mathrm {G}(\mathcal {V},\mathcal {A})\) which consists of k connected components by removing some edges from edge set \(\mathcal {E}\). It can be restated as a subset selection problem. The goal of subset selection problem is to select a subset of edges \(\mathcal {A} \in \mathcal {E}\) such that the resulting graph \((\mathcal {V},\mathcal {A})\) consists of k connected components.

3.2 Construct graph

We now associate a weighted undirected graph \((\mathcal {V},\mathcal {E})\) with an image I. Each vertex \(v_{i} \in \mathcal {V}\) corresponds to one pixel I(xi, yi) in image I. If pixel I(xi, yi) and pixel I(xj, yj) are neighbors in image I, there is an edge \((v_{i},v_{j}) \in \mathcal {E}\) between vertice vi and vertice vj. Weight w(vi, vj) is a non-negative measure of the distance between pixel I(xi, yi) and I(xj, yj), usually based on the differences in intensity, color, texture or some other local attributes. In our experiment, distance between pixels is determined in CIElab color space which is widely considered as perceptually uniform for small color distances [10], as shown in (1). From the viewpoint of graph theory, superpixels generation problem can be restated as the graph partition problem. In graph partition problem, people manage to divide graph \((\mathcal {V},\mathcal {E})\) into a group of smaller separated connected components, all the connected components form a new graph \((\mathcal {V},\mathcal {A})\). The graph \((\mathcal {V},\mathcal {A})\) corresponds to a segmentation \(\mathcal {S}\), and each connected component ci corresponds to a superpixel si in segmentation \(\mathcal {S}\). There are various criteria to describe the quality of a segmentation but almost all criteria support that the pixels in the same superpixel should share as many of the same features as possible, while the pixels in the distinct superpixels should share as less of the same features as possible. This means that the lower the weight of edges in \(\mathcal {A}\) and the higher weight of edges in \(\mathcal {E}-\mathcal {A}\), the better the segmentation result. Any partition of a graph can be determined by the edge set \(\mathcal {A}\) or \(\mathcal {E}-\mathcal {A}\). So, there are two ways to get the partion. One method, such as Ncut, is to manage to get the set \(\mathcal {E}-\mathcal {A}\), that is to say, some chosen edges will be removed from the graph \((\mathcal {V},\mathcal {E})\) [33]. The other method, such as minstpixel, is to manage to get the edge set \(\mathcal {A}\).

$$ w(v_{i},v_{j})=\sqrt{(L_{i}-L_{j})^{2}+(a_{i}-a_{j})^{2}+(b_{i}-b_{j})^{2}} $$
(1)

3.3 Energy functional

From the viewpoint of graph theory, minstpixel regards a superpixel as an individual connected component and defines the internal similarity measure as the weight of minimal spanning tree. It is

$$ D(\mathbf{s}_{i}) = \sum\limits_{e \in MST(\mathbf{s}_{i}) } w(e) $$
(2)

In principle, this measure is problematic since it depends on the size of ci. It leads to that D(ci) can not compare to each other directly because their size may be unequal. However, we have found this measure works quite well in practice because minstpixel tends to generate superpixels with little differences in size. Although some definitions are more robust to outliers, they make the optimization strategy NP hard.

For a graph \(\mathrm {G}(\mathcal {V}, \mathcal {A})\) (or corresponding superpixel segmentation \(\mathcal {S}\)), we define the similarity measure as the sum of all superpixels. It is,

$$ E(\mathcal{S}) = \sum\limits_{\mathbf{s}_{i} \in \mathcal{S} } D(\mathbf{s}_{i})=\sum\limits_{\mathbf{s}_{i} \in \mathcal{S}}\sum\limits_{e \in MST(\mathbf{s}_{i}) } w(e) $$
(3)

3.4 Constraint condition

Energy functional (3) has at least two obvious defects.

  • It can not provide any way to control the number of superpixels explicitly.

  • It can not allow us to explicitly adjust the distribution of superpixels.

In order to ameliorate these defects, we define a seed point set \(\mathcal {P}\) containing k seed points. Meanwhile, we impose that each superpixel must have one and only one seed point. If seed points are evenly distributed across the whole image, the superpixels will be distributed across the whole image evenly. More formally,

$$ \begin{array}{lllllll} E(\mathcal{S}) = \sum\limits_{c_{i} \in \mathcal{S}}\sum\limits_{e \in MST(c_{i},E) } w(e)\\ &wst. \\ \qquad\forall i, \qquad \mid \mathbf{s}_{i} \cup \mathcal{P} \mid = 1 \\ \forall i\neq j, \qquad \mathbf{s}_{i} \cup \mathcal{P} \neq S_{j} \cup \mathcal{P} \end{array} $$
(4)

Obviously, the minimal value of (4) corresponds to the optimal superpixel segmentation.

3.5 Initial seed points placement

In minstpixel each superpixel grows up from a seed pixel, thus it is important to start from a good initial seed point set. SLIC [10], Turbo Pixels [32] and some other methods use regular grid points as seed points, but minstpixel considers that the center of each homogeneous region is the optimum position for seed. This seed point set makes the initial energy lower than regular grid points, so minstpixel can achieve better segmentation result.

To locate these points heuristically, minstpixel employs (6) to describe edge distribution because it performs well when searching for edges of the image [34].

$$ \mathrm{E}(x,y)= \frac{\Vert \nabla \mathrm{I}\Vert }{\mathrm{G}_{\sigma_{1}} *\Vert \nabla \mathrm{I}\Vert + \gamma } $$
(5)
$$ \mathrm{D}(x,y)= \frac{1}{Z}(\mathrm{G}_{\sigma_{2}}*(\mathrm{E}(x,y)+\lambda \overline{\mathrm{E}(x,y})) $$
(6)

Eq.5 describes the calculation of the normalized gradient, where Gσ, ∥∇I∥ and γ are the Gaussian function with standard deviation σ, the gradient magnitude of color image I and the parameter to reduce the noise disturbance respectively. In (6), the \(\overline {\mathrm {E}(x,y)}\) is the mean of E(x, y), and the Z is the regularized factor making sure that the integral of D(x, y) is 1. D(x, y) can be regarded as a probability that (x, y) lies on edges of objects.

Furthermore, we suppose that point set \(\mathcal {P}^{\prime } = \{ (x^{\prime },y^{\prime })\}\) contains k grid points which have the same span \(\sqrt {m/k}\), where m is the total pixel number of the image and k is the number of superpixels. Our initialization method can be represented as a map \(\mathcal {P^{\prime }}\rightarrow \mathcal {P}\). Equation (7a, 7b) quantitatively describes the relationship between \(\mathcal {P}^{\prime }\) and \(\mathcal {P}\).

$$ {\int}_{0}^{x_{i}}{\mathrm{D}(x,y)dx} = x_{i}^{\prime} \times \frac{{{\int}_{0}^{w}}{\mathrm{D}(x,y)dx}}{w} $$
(7a)
$$ {\int}_{0}^{y_{i}}{\mathrm{D}(x,y)dy} = y_{i}^{\prime} \times \frac{{{\int}_{0}^{h}}{\mathrm{D}(x,y)dy}}{h} $$
(7b)

From (7a), xi and y are in one-to-one relationship, then (7a) is a curve extending in direction y. Similarly, (7b7b) is a curve extending in direction x. The intersection of the two curves is the seed point (xi, yi). If there are multiple points of intersection, minstpixel will choose the one with the lowest D(x, y).

3.6 Energy minimization

Mathematically, the problem that we are facing is a typical combination optimal problem. It’s almost impossible to find any analytical solution with current technologies. Instead, we craft a greedy method to gain reasonable superpixel segmentation. Our method begins by sampling k seed pixels and then gradually extend to all pixels based on a greedy rule. Then, we have proved that the optimum point can be reached by this method using mathematical induction in the Appendix.

figure f

We now have established a segmentation S obeying energy functional (4). Figure 3 shows the complete process of algorithm 1. It is inescapably clear that the result graph includes m vertices, mk edges and k connected components, and each component corresponds to a superpixel. In the sequel we show that algorithm 1 converges exactly to the global optimal solution of (4), although it makes greedy decisions in each iteration.

Fig. 3
figure 3

An example of algorithm 1. a Each vertex corresponds to a pixel of input image. The figures on the vertex are pixel values. The figures on the edges are the distances between neighboring pixels. Blue vertex and green vertex are both seed points. b After two iterations, two edges are chosen. c After ten iterations, ten edges are chosen. d The graph is divided into two components. Each component is a minimal spanning tree and each minimal spanning tree corresponds to a superpixel

3.7 Implementation and complexity

Algorithm 1 maintains the segmentation \(\mathcal {S}\) using a disjoint set with path compression and union by rank. This disjoint set can finish n merge operations and m search operations in O(mα(n)) time, where mα(n) is inverse Ackerman’s function which is an approximate linear function [35]. Step 1 sorts m weights into nondecreasing order. In general, it can consume O(m log m) time using Quick Sort [35]. Step 2 and Step 3 find and merge eligible components. There are mk merge operations and up to m search operations, so it consumes O((mk)α(m)) time. All in all, the time complexity of algorithm 1 is O(m log m + (mk)α(m)).

4 Experiments

We compare minstpixel with ten state-of-the-art algorithms, namely SLIC(2012) [10], Ncut(2000) [16], ERS(2012) [17], Quick shift(2008) [18], Watersheds(1991) [19], Waterpixel(2015) [20], SEEDS(2012) [21], LSC(2017) [22], Manifold SLIC(2016) [1] and FH(2004) [23], using publicly available source codes on the Berkeley Segmentation Database [36]. All benchmarks are conducted on a desktop PC equipped with two intel xeon E5-2620 v4 processors and 16GB of memory.

4.1 Adhesion

The adhesion refers to the ability to adhere object boundary. Superpixels generally serve as the foundation of object detection, so adhesion is a key evaluation criterion. In order to evaluate adhesion, predecessors propose boundary recall against the number of superpixels (BR). BR is the fraction of ground truth boundary that fall in the superpixel boundary as (8). A ground truth boundary pixel is counted if it falls within strictly less than 2 pixels from superpixel boundary.

$$ BR=\frac{\text{TP}}{\text{TP}+\text{FN}} $$
(8)

Where TP (True Positives) is the number of boundary pixels that fall in the superpixel boundary, TN is the number of boundary pixels that fail to fall in the superpixel boundary. A high BR indicates that the majority of true boundary are preserved. However some previous algorithms also achieved high BR in benchmark, because these algorithms tend to generate tortuous boundary which result in high boundary density. For example, boundary density of FH significantly exceeds minstpixel, so FH has more chances to hit true boundary than minstpixel, as shown in Fig. 6. It is unfair to the method tending to generate smooth boundary. Machairas and Kalinin have noticed this problem and propose boundary density against boundary recall to evaluate adhesion of superpixels [11]. However, boundary density against boundary recall still ignores the influence of the quantity of superpixels. In order to take both the quantity of superpixels and the density of boundary into account, we define normalization boundary recall as NBR = BR/C, by introducing the penalty factor C which is proportional to boundary density. Figure 4a demonstrates the results of benchmark.

Fig. 4
figure 4

Illustrations of boundary recall benchmark. a shows the comparison between minstpixel and eight fine methods, and minstpixel reaches upper-middle level. be demonstrate the boundary missed by minstpixel. b is an image slice from the Berkeley segmentation database in which the color and texture are very similar between foreground and background, so it’s very difficult to accurately distinguish objects from background. c and d are minstpixel and the boundary recalled by minstpixel respectively, and boundaries are marked out by double pixels. e shows groundtruth marked out by single pixel, green parts represent the boundary minstpixel hit and red parts represent the boundary minstpixel miss

4.2 Homogeneity

Homogeneity refers to the similarity of the pixels from the same superpixel. Superior homogeneity means that a superpixel can only cover one object. The homogeneity benchmark consists of two common evaluation criteria: under segmentation error (USE) and achievable segmentation accuracy (ASA). USE measures what extent superpixels stretch over the ground truth segment boundary. A lower USE indicates that fewer superpixels cross the boundary of objects. USE is defined as \(\left [ {\sum }_{S} \left ({\sum }_{P:P \cup S \neq \emptyset } \min (|P_{in}|,| P_{out} | ) \right ) \right ]/N\), where N is the number of pixels in image, |Pin| is the number of pixels in part Pin and |Pout| is the number of pixels in part Pout. A factitious case of USE is shown in Fig. 5f, in this case N is (|Aout| + |Bin| + |Cin|)/|S|. ASA measures the upper bound of accuracy when image objects are reconstructed using a given set of superpixels as units. Given a ground truth segmentation G and a superpixel segmentation S, achievable segmentation accuracy is defined as \(\left ({\sum }_{k} \max _{i} \mid S_{k} \cap G_{i} \mid \right ) / {\sum }_{i} \mid G_{i} \mid \).

Fig. 5
figure 5

Homogeneity illustrations. a Achievable segmentation accuracy. b Under segmentation error. c Original image. d Minstpixel. e Object detection by minstpixel. a and b are line charts smoothed by second order B spline function for the sake of beauty, and it can be seen from these charts that minstpixel reaches the top three in these two tests. ce show an example of object detection by minstpixel. Red represents right detection pixels, blue represents missed detection pixels and green represents over detection pixels

4.3 Regularity

The regularity is always devalued but it is an important indicator. On the one hand, irregular boundary will contribute to the false high performance in benchmark. On the other hand, irregular shape will cause trouble for subsequent image analysis (Fig. 6).

Fig. 6
figure 6

The shape of superpixels. a, e: Origin image. b, f: FH. c, g: ERS. d, h Minstpixel. a, e are both 150 × 150 image slices from BSD500. a comes from background region while (e) comes from foreground region. It can be seen easily that the shape and boundary density of distinct superpixels are various; for instance, the boundary of FH is very twisted, while the boundary of ERS is smooth. Minstpixel has regular shape and well maintains the border of objects in input image

The regularity contains two items [20]: (1) the smoothness of boundary (2) the shape and size similarity of superpixels. The smoothness is evaluated by the boundary density which is defined as a ratio of the number of superpixel boundary pixels to the total number of pixels in the image. The shape and size similarity is evaluated by average standard deviation of size. Figure 7 shows the benchmark result in regularity.

Fig. 7
figure 7

Benchmark: contour density and average standard deviation of size. a: a bar graph of average contour density on Berkeley segmentation database and each method is set to generate 400 superpixels. d: average standard deviation of size against the number of superpixels

4.4 Efficiency

Obviously, time complexity is also a crucial evaluation criterion for superpixel segmentation algorithm. Table 1 and Fig. 8 shows the theoretical time complexity and actual run time on the Berkeley database.

Table 1 Comparison of runtime among state-of-art superpixels
Fig. 8
figure 8

Time consumption per image of eleven superpixel segmentation algorithms against number of superpixels on Berkeley image segmentation database. Minstpixel takes less than 0.5s at any number of superpixels

4.5 Application

In this section, we display one typical computer vision application that benefit from superpixels: object segmentation. Minstpixel reduces the number of entities to manipulate and improve the result of existing algorithm. The task is to recognize 21 classes of objects from MSRC database using Ada-Boost classifiers and the appearance features are based on the work of Gould et al [9]. For each superpixel si, we construct an 83 dimensions feature vector ϕ(si) to describe region’s size, location, color, shape and texture feature.

It specifically includes:

  • Size and location (1 + 2)

  • The ratio of area to circumference and inertia moment (1 + 2)

  • The mean, standard deviation, skewness and kurtosis statistics of RGB and Lab color space components (4 × (3 + 3))

  • The mean, standard deviation, skewness and kurtosis statistics of texture feature drawn from 13 filters responses (4 × 13)

In addition, we add the weighted mean of adjacent superpixels’ feature vector to each superpixel as (9),

$$ \frac{{\sum}_{S_{j} \in \mathcal{N}(S_{j}) } \mid S_{j} \mid \cdot \phi(S_{j}) }{{\sum}_{S_{j} \in \mathcal{N}(\mathbf{s}_{i})} \mid S_{j} \mid } $$
(9)

where \(\mathcal {N} (\mathbf {s}_{i}) = \left \{ S_{j} | (\mathbf {s}_{i},S_{j}) \in \epsilon (\mathcal {I}) \right \}\) is the set of superpixels which are adjacent to si and ∣Sj∣ is the area of superpixel Sj. We train a one-vs-all Ada-Boost classifier \(\sigma _{c^{\prime }}\) (c is the class of object) for each object and then normalize over all classifiers to get the probability \(P(c_{i}=c^{\prime } \mid \mathbf {s}_{i}, \mathcal {I} )\) as (10)

$$ P(c_{i}=c^{\prime} \mid \mathbf{s}_{i}, \mathcal{I} ) = \frac{ \exp\{\sigma_{c^{\prime}}(\mathbf{s}_{i}) \}}{{\sum}_{c} \exp \{\sigma_{c}(\mathbf{s}_{i})\} } $$
(10)

where σc is the Ada-Boost classifier for class c. Finally, let \(c_{i} = \mathop {\arg \max }\limits _{c^{\prime }} P(c_{i} = c^{\prime } \mid \mathbf {s}_{i} , \mathcal {I} )\), then ci is the class of superpixel si.

Compared with the original method, the accuracy is increased by 1.8% and the running time is reduced by 99%, as show in Table 2.

Table 2 Object class recognition for various superpixel methods

4.6 Discussion and analysis

We can draw many meaningful conclusions from experiment results. Firstly, none of superpixel segmentation algorithms can really defeat all other algorithms comprehensively. Each algorithm is a compromise between performance and time consumption. The increase in performance is usually accompanied by decrease in efficiency or other properties. When performance reaches a certain level, time consumption will increase exponentially. For instance, ERS leads minstpixel by 1% in performance but ERS spends 10 times than minstpixel. Secondly, whether a superpixel is good or bad should be judged based on a specific scene. For example, ERS is good at dealing with the demand for high accuracy, SLIC is good at dealing with the demand for regular shape and WS is skilled at dealing with real time scenes. Finally, all things considered, minstpixel has reached the level of state-of-art algorithms, although minstpixel can not comprehensively defect all other algorithms.

5 Conclusions

We have presented a novel superpixel segmentation algorithm called minstpixel, which produces compact and regular shaped superpixels with high accuracy, low time complexity and simple implementation. To generate high quality superpixels, minstpixel firstly proposes an energy functional based on minimal spanning tree and the initialization strategy. Then, minstpixel optimizes the energy functional by picking out the specific edge at each iteration. Additionally, we prove that the optimization algorithm can produce exactly the global optimum of the energy functional. Both theory and practice suggest that minstpixel is reasonable and practicable. Certainly, minstpixel is not perfect and may be improved by multi-scale mutual information [37,38,39].

  • Similarity measure for single superpixel D(si) in energy functional (4) depends on the size of superpixel. It causes the superpixel size to be nonuniform. Usually it isn’t a serious implication, since the constraint condition that each superpixel has one and only one seed point to fill this gap to a certain extent.

  • Energy functional (4) has multiple optimal solutions but algorithm 1 can only find one. It means that there may be some better solutions which are neglected by algorithm 1.