1 Introduction

Most of the classic algorithms in computational geometry are based on the assumption that the locations of input points are known exactly. In practice, that is not always the case. Data stored in computers are imprecise in most cases. The causes of imprecise data are various, for example, the uncertain properties of a moving object (Cheng et al. 2004), measurement error, sampling error, network latency (Sistla et al. 1997; Pfoser and Jensen 1999), location privacy protection (Beresford and Stajano 2003; Cheng et al. 2006; Gedik and Liu 2005), etc. Any or a combination of these could be the causes leading to imprecision of the data.

A continuous region model such as a disc model, a square model and so on can be used to describe an imprecise point in general. There are many studies based on such a model. Löffler et al. studied the largest and the smallest convex hull problems based on the disc model, the square model, and the parallel line segment model (Kreveld and Löffler 2007). They proved the NP-hardness for some problems and proposed polynomial time algorithms for some other problems, ranging in running time from O(nlogn) to O(n 13). Other papers dealing with the problems based on the continuous region models include (Khanban and Edalat 2003; Nagai and Tokura 2000; Boissonnat and Lazard 1996) and so on.

In addition to this, a discrete point set model can be used to describe imprecise points. In the discrete point set model, a set of discrete points is used for the possible positions where an imprecise point may appear. In computational geometry, the discrete point set model is also called the color-spanning set model because the set of positions for an imprecise point is regarded as the positions with the same color. The input of color-spanning set problems are usually n points with m colors.

There has also been a lot of research studying the color-spanning set problems. Zhang et al. proposed a brute force algorithm to solve the minimum diameter color-spanning set (MDCS) problem and the running time of their algorithm is O(n m) (Zhang et al. 2009). Fleischer and Xu showed that the MDCS problem can be solved in polynomial time for the L 1 and L metrics, while it is NP-hard for L p (p=2,3,4,…) metrics (Fleischer 2010). They also proposed an efficient constant factor approximation algorithm for the problem. Abellanas et al. showed that the Farthest Color Voronoi Diagram (FCVD) is of complexity Θ(nm) if mn/2 (Abellanas et al. 2001a, 2001b). They proposed an efficient algorithm to construct FCVD and then they presented efficient algorithms to construct the smallest color-spanning circle, the smallest color-spanning rectangle and the narrowest color-spanning strip of arbitrary orientation with the help of FCVD. In Das et al. (2009) proposed an algorithm for identifying the smallest color-spanning corridor in O(n 2logn) time and O(n) space and an algorithm for identifying the smallest color-spanning rectangle of arbitrary orientation with an O(n 3logm) running time and O(n) space.

In the database community, a similar framework under a different name “uncertain data” has been used. An imprecise point is called an uncertain object and the different positions with the same color are regarded as the different possible instances of an uncertain object. Pei et al. have performed some research that pertains to geometric problems in this framework (Pei et al. 2007; Cheema et al. 2010; Yuen et al. 2010).

In this paper, we study five color-spanning set problems. We select m points with m colors such that

  • The diameter of the m points is maximized. This problem is called the Maximum Diameter Color-spanning Set problem (MaxDCS).

  • The distance between the closest pair of the m points is maximized. This problem is called the Largest Closest Pair Color-spanning Set problem (LCPCS).

  • The length of the planar minimum spanning tree over the m points is minimized. This problem is called the Planar Smallest Minimum Spanning Tree Color-spanning Set problem (PSMSTCS).

  • The length of the planar minimum spanning tree over the m points is maximized. This problem is called the Planar Largest Minimum Spanning Tree Color-spanning Set problem (PLMSTCS).

  • The perimeter of the planar convex hull over the m points is minimized. This problem is called the Planar Smallest Perimeter Convex Hull Color-spanning Set problem (PSPCHCS).

We discuss these five problems in the following five sections and then give conclusions in the last section.

2 An efficient algorithm for MaxDCS

Computing the diameter of a point set has a long history. By a reduction to set disjointness, it can be shown that computing the diameter of n points with the same color in R d requires Ω(nlogn) operations in the algebraic computation tree model (Preparata and Shamos 1985). However, to the best of our knowledge, the maximum diameter color-spanning set problem (see an example in Fig. 1) has not been studied yet.

Fig. 1
figure 1

Illustration of the maximum diameter of color-spanning sets

Let S be a set of n points with m colors. The steps of our algorithm (MaxDCS1) to compute the maximum color-spanning diameter of S are as follows:

  1. 1.

    Compute the maximum distance between two points of S in the plane (ignoring colors) using the algorithm in Preparata and Shamos (1985). Let these two points be p a and p b . If p a and p b have different colors, then the distance d 0 between p a and p b is the maximum color-spanning diameter D of S, and we can exit the algorithm; else, we continue to the next step.

  2. 2.

    Let the subset of points with the same colors as that of points p a and p b be S ab (S ab also includes p a and p b ). Let \(S_{ab}'=S-S_{ab}\). We compute the distance between every pair of points, one in S ab and the other in \(S_{ab}'\), and let the resulting maximum distance be d ab .

  3. 3.

    Let the set of points of \(S_{ab}'\) which are on or to the left of the line p a p b be set S l . Let the set of points of \(S_{ab}'\) which are on the right of the line p a p b be S r . For any point p in S r such that no point in S l has the same color of p, we put p into set \(S^{\prime}_{r}\). Symmetrically, for any point p in S l such that no point in S r has the same color of p, we put p into set \(S^{\prime}_{l}\). Let \(S_{l}^{\prime\prime}=S_{l}\setminus S_{l}'\) and \(S_{r}^{\prime\prime}=S_{r}\setminus S_{r}'\).

  4. 4.

    Let \(S_{1}=S_{l} \cup S^{\prime}_{r}\) and compute the maximum distance d l between two points in S 1 (ignoring colors) using the algorithm in Preparata and Shamos (1985). Similarly, let \(S_{2}=S_{r} \cup S^{\prime}_{l}\) and compute the maximum distance d r between two points in S 2.

  5. 5.

    Compute the diameter D′ of point set \(S_{l}^{\prime\prime}\cup S_{r}^{\prime\prime}\) (considering colors). The details of this step will be given in Algorithm MaxDCS2.

  6. 6.

    Let D=max(d ab ,d l ,d r ,D′).

Next, we prove the correctness and the time complexity of our algorithm.

Lemma 1

For two points p e and p f in set S l (S r ), at least one of the four segments \(\overline{p_{a}p_{e}}\), \(\overline{p_{a}p_{f}}\), \(\overline{p_{b}p_{e}}\), \(\overline{p_{b}p_{f}}\) has a length longer than that of \(\overline{p_{e}p_{f}}\).

Proof

Since p e and p f are on the same side of line p a p b , there are only two cases for the positions of four points p a , p b , p e , p f :

  1. 1.

    Four points p a , p b , p e , p f form a convex quadrangle (see Fig. 2(a)). If the length of segment \(\overline{p_{e}p_{f}}\) is longer than or equal to the lengths of the segments \(\overline{p_{a}p_{e}}\), \(\overline{p_{a}p_{f}}\), \(\overline{p_{b}p_{e}}\), \(\overline{p_{b}p_{f}}\), then the angles ∠p e p f p b and ∠p f p e p a are smaller than π/2. Because the length of the segment \(\overline{p_{a}p_{b}}\) is longer than or equal to all the above segments, the angles ∠p a p b p f and ∠p b p a p e are smaller than π/2. This contradicts to the fact that the sum of four angles of a convex quadrangle is 2π.

    Fig. 2
    figure 2

    The illustration for the proof of Lemma 1

  2. 2.

    The three points p a , p b , p e form a triangle and p f is in △p a p b p e , \(\overline{p_{e}p_{h}}\) is the height of triangle; moreover, the point p h must be on segment \(\overline{p_{a}p_{b}}\) (see Fig. 2(b)). Otherwise, the length of \(\overline{p_{a}p_{e}}\) or \(\overline{p_{b}p_{e}}\) is longer than \(\overline{p_{a}p_{b}}\). Since p f is either in the right triangle △p a p h p e or in the right triangle △p b p e p h , the length of \(\overline{p_{e}p_{f}}\) is shorter than the longer of \(\overline{p_{e}p_{a}}\) and \(\overline{p_{e}p_{b}}\).

Hence the lemma is proven. □

Let p x and p y be the two points corresponding to the maximum color-spanning diameter of S. There are five cases:

  1. 1.

    p x is p a and p y is p b .

  2. 2.

    p x S ab and \(p_{y}\in S_{ab}'\).

  3. 3.

    p x and p y are in set S l , or p x and p y are in set S r .

  4. 4.

    p x is in set S l , while p y is in set \(S_{r}'\), or p x in set \(S^{\prime}_{l}\) and p y in set S r .

  5. 5.

    p x is in set \(S_{l}^{\prime\prime}\) and p y is in set \(S_{r}^{\prime\prime}\).

Cases 1, 2 and 4 are computed at Steps 1, 2 and 4 of the algorithm MaxDCS1. We do not need to compute the diameter in Case 3 according to Lemma 1 and the fact that the color of p a (p b ) is different from those points in S l (S r ). In other words, the maximum diameter in Case 3 is less than d ab in Case 2. Step 1, 3, 4, and 6 can be finished in O(nlogn) time. Step 2 can be finished in O(kn) time where k′ is the size of the set S ab . However, if k′ is larger than O(logn), in order to reduce the time complexity to O(nlogn), we cannot use a brute force method at Step 2. Instead, we can compute the farthest-point Voronoi diagram of set S ab in O(k′logk′) time (Berg et al. 2008). The Voronoi cell where each point p in set \(S_{ab}'\) is located can be found in O(logk′) time, and then we can compute the distance between p and corresponding site in an additional O(1) time. Therefore, Step 2 can be finished in O(nlogn) time.

How about the time complexity of Step 5 (or the algorithm MaxDCS2) (Algorithm 1)? The time consuming parts of MaxDCS2 are two FOR loops. Since those two loops are symmetric, we only analyze the first one. Notice that the difference between the consecutive steps inside the loop are point sets of two colors. In total, each point is inserted and deleted from S′ once, that means only O(n) insertions and deletions. It is shown that the diameter for n points without color after each insertion or deletion can be updated in O(n ε) time where ε could be an arbitrarily small positive constant (Agarwal et al. 1992; Eppstein 1996). Therefore, the running time of MaxDCS2 is O(n 1+ε).

Algorithm 1
figure 3

MaxDCS2

In Steps 4 and 5 of the algorithm MaxDCS1, the two points realizing d l (d r or D′) could be on the same side of p a p b and could be of the same color. However, according to Lemma 1, those d l ,d r ,D′ are all shorter than d ab and we let D=max(d ab ,d l ,d r ,D′). Therefore, those d l ,d r ,D′ cannot be the real diameter D. Our algorithm considers the distances of all points with its farthest point of different color. Hence we have the following theorem.

Theorem 1

Let S be a set of n points of m colors. The maximum color-spanning diameter of S can be computed in O(n 1+ε) time, where ε could be an arbitrarily small positive constant.

3 Hardness of LCPCS

In this section, we show that LCPCS is NP-Complete even in one dimension. To facilitate the reading, we first present the proofs in the plane, under different metrics.

Theorem 2

LCPCS is NP-hard under the L p metric, for 2≤p<∞,

Proof

We prove the hardness of LCPCS by a reduction from 3-SAT. We give the proof for the L 2 metric in two dimensions and then show how to extend it to any L p metric and to higher dimensions.

Let F be a Boolean formula in conjunctive normal form with n variables x 1,…,x n and m clauses c 1,…,c m , each of size at most three. We take the following steps to construct an instance I of LCPCS.

For each Boolean variable x i x i in F, let k i be the maximum number of times that x i and ¬x i appear in F. Then we draw a rectangle V i vertically and separate it into k i −1 small rectangles horizontally. Every small rectangle has length b and height a. The diagonal length of each small rectangle is d (d 2=a 2+b 2,d<2a). We place 2k i points with different colors on the 2k i vertices of small rectangles (see Fig. 3). The 2k i points of different colors are placed on k i rows and two columns. Let A xy (1≤xk i ,1≤y≤2) denote the point at row x and column y. Then we draw another rectangle H i horizontally which is far away from V i and separate it into k i −1 small rectangles vertically. Every small rectangle has length c (c=d+ε) and height dε (see Fig. 3). We place 2k i points with different colors on the 2k i vertices of those small rectangles. Let B xy (1≤x≤2,1≤yk i ) denote the point at row x and column y. When x is an odd number, only A x1 and B 2x have the same distinct color, and only A x2 and B 1x have the same distinct color.Footnote 1 When x is an even number, only A x2 and B 2x have the same distinct color, and only A x1 and B 1x have the same distinct color (see Fig. 3).

Fig. 3
figure 4

Variable gadget

Let P 1 be the set of points A xy where x is odd and y=1 or x is even and y=2. Let P 2 be the set of points A xy where x is even and y=1 or x is odd and y=2. Let Q 1 be the set of points B xy where x=1 and Q 2 be the set of points B xy where x=2. The idea is that if we want to maximize the distance between the closest pair with this configuration, we either choose the point sets P 1 and Q 1, or point sets P 2 and Q 2. The fist case represents the value 1 for this variable, and the second case represents the value 0. The rectangles corresponding to different variables lie far away enough to each other (at least 4d). The points in different variables have totally different colors.

For each clause in F, we add three points and these three points have the same distinct color. We deal with the clauses from left to right. For example, let the i-th clause be (x 1x 2∨¬x 3), and suppose that x 1 appears l 1−1 times, x 2 appears l 2−1 times, ¬x 3 appears l 3−1 times in the previous i−1 clauses. Then we put one point next to rectangle H 1 and it is right below the point \(B_{2l_{1}}\) with distance ε. The second point is next to rectangle H 2 and it is right below \(B_{2l_{2}}\) with distance ε. The third point is next to rectangle H 3 and it is right above \(B_{1l_{3}}\) with distance ε (see Fig. 4). One of these three points has to be selected for this color. For the distance between the closest pair to be maximum (i.e., equal to d), it is only possible when x 1 is assigned 1, or x 2 is assigned 1, or x 3 is assigned 0.

Fig. 4
figure 5

Clause gadget for (x 1x 2∨¬x 3). Different shapes of points also mean different colors

It is easy to see that F is satisfiable if and only if the largest distance between the closest pair is equal to d.

In order to extend the proof to the L p metric, the only requirement is d>a and d>b. When 1≤p<∞, we have d>a and d>b. Only when p=∞, we have d=a or d=b (then the above construction fails). Also, the hardness for two dimensions implies the hardness for higher dimensions. Therefore LCPCS is NP-hard for the L p metric, for 1≤p<∞, in two or higher dimensions. □

We next show that LCPCS is NP-hard even in one dimension.

Theorem 3

LCPCS is NP-Complete in one dimension.

Proof

We prove LCPCS is NP-Complete by a reduction from the 3-SAT problem. Of course, in this case all points lie on a line l. Given a 3-SAT formula F, we make the following construction.

For each Boolean variable x i in F, let k i be the maximum number of times that x i and ¬x i appear in F (see Fig. 5) . We first put 2k i points \(A_{1},\ldots,A_{2k_{i}}\) on the segment l i1 which is a part of the line l and the distance between two adjacent points is d/2. Then we put anther 2k i points \(B_{1},\ldots,B_{2k_{i}}\) on the segment l i2 which is another part of the line l. The distance between two points B j and B j+1 is dε when j is odd, or 2d when j is even. Furthermore, we require that B j+1 and A j have the same distinct color for 1≤j≤2k i −1, and B 1 and \(A_{2k_{i}}\) have the same distinct color.

Fig. 5
figure 6

Variable gadget for variable x i for one dimension

Let P 1 be the set of points A j when j is odd, and P 2 be the set of points A j when j is even. Let Q 1 be the set of points B j when j is odd, and Q 2 be the set of points B j when j is even. If we want to maximize the distance between the closest pair with this configuration, we must select either the point sets P 1 and Q 1 or the point sets P 2 and Q 2. If we select P 1 and Q 1, x i is assigned 0 and if we select P 2 and Q 2, x i is assigned 1. The distance between every two variable gadgets is great enough (not less than 4d). The colors of points in different variable gadgets are totally different.

For every clause in F, we need three additional points P a , P b and P c . Certainly, only P a , P b and P c for the same clause have the same distinct color. We deal with the clauses from left to right. For example, let the i-th clause be (x u x v ∨¬x w ), and suppose that x u appears h 1−1 times, x v appears h 2−1 times, ¬x w appears h 3−1 times in the previous i−1 clauses. Then we put P a on the segment l u2 and it is to the left of the point \(B_{2h_{1}-1}\) with distance ε, P b on l v2 and it is to the left of the point\(B_{2h_{2}-1}\) with distance ε, and P c on the segment l w2 and it is to the right of the point \(B_{2h_{3}}\) (see Fig. 6). The distance between the closest pair become maximum (i.e., equal to d) if and only if at least one of x u or x v or ¬x w is assigned 1.

Fig. 6
figure 7

Clause gadget for (x u x v ∨¬x w ) for one dimension

Therefore, F is satisfiable if and only if the largest possible distance between the closest pair is equal to d. □

Actually we can prove that LCPCS is \((\frac{1}{2}+\varepsilon)\)-APX-hard, which means that it is NP-hard to find any approximation algorithm with an approximation ratio better than \(\frac{1}{2}\).

Theorem 4

LCPCS is \((\frac{1}{2}+\delta)\)-APX-hard \((0<\delta \le \frac{1}{2})\) in one dimension.

Proof

Consider Theorem 3 and Figs. 5 and 6, and let ε=d/2. Let O be the optimal solution value of LCPCS. We need to consider two cases:

  1. (1)

    O =d if and only if the 3-SAT formula F is satisfiable.

  2. (2)

    O d/2 if and only if the 3-SAT formula F is not satisfiable.

Hence, to find an approximation algorithm whose approximation ratio is better than \(\frac{1}{2}\) is equivalent to finding an algorithm for O =d, which is NP-hard. The theorem is proved. □

4 Hardness of PSMSTCS

In this section, we prove the NP-completeness of the Planar Smallest Minimum Spanning Tree Color-Spanning Set (PSMSTCS) problem. First we show that this problem belongs to NP. Given an instance of the problem, we use as a certificate the m different color points chosen from n points. The verification algorithm computes the MST of those m points and check whether the length is at most L. This process can certainly be done in polynomial time.

We then prove that PSMSTCS is NP-hard by a reduction from the 3-SAT problem. The general idea is that for a 3-SAT formula, we put some colored points on the plane such that the given 3-SAT formula is satisfiable if and only if the length of the smallest color-spanning minimum spanning tree equals some given value.

First we put the point O with a distinct color at (0,0). Given a 3-SAT formula ψ, suppose that it has n 1 variables \(x_{1},x_{2},\ldots,x_{n_{1}}\) and m 1 clauses. For each variable x i (1≤in 1), six points \(p_{i}^{1}\), \(p_{i}^{2}\), \(p_{i}^{3}\), \(p_{i}^{4}\), \(p_{i}^{5}\) and \(p_{i}^{6}\) are put at (400i−300,0), (400i−200,0), (400i−100,0), (400i,0), (400i−200,−100) and (400i,−100) respectively.

If the j-th literal, which is in the k-th clause in ψ, is x i x i ), we denote the literal by x i,j,k x i,j,k ). For every literal x i,j,k x i,j,k ), where 1≤in 1, j=1,2,3,… and 1≤km 1, eight additional points are constructed. \(p_{i,j,k}^{7}\) is put at (400i−200,−100), \(p_{i,j,k}^{8}\) at (400i,−100), \(p_{i,j,k}^{9}\) at (0,400j−300), \(p_{i,j,k}^{10}\) at (0,400j−200), \(p_{i,j,k}^{11}\) at (0,400j−100), \(p_{i,j,k}^{12}\) at (0,400j), \(p_{i,j,k}^{13}\) at \((-\frac{1}{3m_{1}},400j-200)\) and \(p_{i,j,k}^{14}\) at \((-\frac{1}{3m_{1}},400j)\). Among those fourteen points, only \(p_{i}^{5}\) and \(p_{i}^{6}\) have the same distinct color, only \(p_{i,j,k}^{7}\) and \(p_{i,j,k}^{13}\) have the same distinct color, only \(p_{i,j,k}^{8}\) and \(p_{i,j,k}^{14}\) have the same distinct color, and every one of the other points has a distinct color. Figure 7(a) shows the gadget for the variable x 1 which is also the first literal and in the first clause.

Fig. 7
figure 8

(a) The gadget for the variable x 1, which is also the first literal and in the first clause, in PSMSTCS. (b) The gadget for the variable x 1, which is also the first literal and in the first clause, in PLMSTCS. Different symbols means different colors. Every solid circle has a distinct color

Then, we have a set P of 6n 1+24m 1+1 points for the formula ψ. 4n 1 points (except O) are put on the x-axis, 2n 1+2×3m 1=2n 1+6m 1 points are put on the line y=−100, 4×3m 1=12m 1 (except O) points are put on the y-axis, 2×3m 1=6m 1 points are put on the line \(x=-\frac{1}{3m_{1}}\), one point is on the point O. Let T P be the planar smallest color-spanning minimum spanning tree over P.

Since only \(p_{i}^{5}\) and \(p_{i}^{6}\) have the same distinct color, only \(p_{i,j,k}^{7}\) and \(p_{i,j,k}^{13}\) have the same distinct color and only \(p_{i,j,k}^{8}\) and \(p_{i,j,k}^{14}\) have the same distinct color, we have to select either \(\{p_{i}^{5},p_{i,j,k}^{7},p_{i,j,k}^{14}\}\) or \(\{p_{i}^{6},p_{i,j,k}^{8},p_{i,j,k}^{13}\}\) to get the planar smallest color-spanning minimum spanning tree T P . Whichever set of points is selected, the length of T P is \(500n_{1}+400\times 3m_{1} +3m_{1}\times \frac{1}{3m_{1}}=500n_{1}+1200m_{1}+1\). If \(\{p_{i}^{5},p_{i,j,k}^{7},p_{i,j,k}^{14}\}\) are selected, x i is assigned 0. If \(\{p_{i}^{6},p_{i,j,k}^{8},p_{i,j,k}^{13}\}\) are selected, x i is assigned 1 (see Fig. 8).

Fig. 8
figure 9

Suppose x 1 is the first literal in the first clause of 3-SAT formula. (a) The planar smallest color-spanning minimum spanning tree when x 1 is assigned 0. (b) The planar smallest color-spanning minimum spanning tree when x 1 is assigned 1

Now, we need three additional points for every clause in the 3-SAT formula, where only the three points in the same clause have the same distinct color. If there is a literal x i1,j1,k1 which is the j1-th literal and in the k1-th clause, we put a point \(p_{i1,\lceil \frac{j1}{3} \rceil}^{\mathit{or}}\) at the position of \(p_{i1,j1,k1}^{13}\). If there is a literal ¬x i1,j1,k1, we put a point \(p_{i1,\lceil \frac{j1}{3} \rceil}^{\mathit{or}}\) at the position of \(p_{i1,j1,k1}^{14}\). Note that the points representing the literals in the same clause should have the same distinct color. For example, we assume that one clause is ¬x i1,j1,k1x i2,j1+1,k2x i3,j1+2,k3. We put three points \(p_{i1,\lceil \frac{j1}{3} \rceil}^{\mathit{or}}\), \(p_{i2,\lceil \frac{j1+1}{3} \rceil}^{\mathit{or}}\) and \(p_{i3,\lceil \frac{j1+2}{3}\rceil}^{\mathit{or}}\) at \(p_{i1,j1,k1}^{14}\), \(p_{i2,j1+1,k2}^{13}\) and \(p_{i3,j1+2,k3}^{13}\) respectively. Note that only \(p_{i1,\lceil \frac{j1}{3} \rceil}^{\mathit{or}}\), \(p_{i2,\lceil \frac{j1+1}{3}\rceil}^{\mathit{or}}\) and \(p_{i3,\lceil \frac{j1+2}{3} \rceil}^{\mathit{or}}\) have the same distinct color.

Figure 9 shows the variable gadget for the first clause (x 1x 2∨¬x 3) and Fig. 10 shows the gadgets for ψ=(x 1x 2∨¬x 3)∧(¬x 1x 3x 4). Let T opt be the planar smallest color-spanning minimum spanning tree over P plus the 3m 1 points for every clause. For literal x i,j,k x i,j,k ), if a variable x i is assigned 1 (0), \(\{p_{i}^{6},p_{i,j,k}^{8},p_{i,j,k}^{13}\}\) (\(\{p_{i}^{5},p_{i,j,k}^{7},p_{i,j,k}^{14}\}\)) are selected. That means the point \(p_{i,\lceil\frac{j}{3}\rceil}^{\mathit{or}}\) can be selected to construct T opt without adding any extra length comparing with T P . Therefore, if at least one literal is assigned 1 in every clause, the length of T opt equals that of T P . If all three literals in at least one clause are assigned 0, all the three points for the clause cannot be covered by points in T P and the length of T opt is greater than that of T P . Therefore, we obtain that a 3-SAT formula ψ with n 1 variables \(x_{1},x_{2},\ldots,x_{n_{1}}\) and m 1 clauses is satisfiable if and only if the length of the corresponding T opt is equal to 500n 1+1200m 1+1. Because the 3-SAT problem is NP-Complete, we obtain the following theorem:

Fig. 9
figure 10

Gadget for the clause (x 1x 2∨¬x 3), assuming it is the first clause

Fig. 10
figure 11

Gadgets for ψ=(x 1x 2∨¬x 3)∧(¬x 1x 3x 4)

Theorem 5

PSMSTCS is NP-Complete.

5 Hardness of PLMSTCS

In this section, we prove the NP-completeness of the Planar Largest Minimum Spanning Tree Color-Spanning Set (PSMSTCS) problem. Again, we first show that this problem belongs to NP. Given an instance of the problem, we use as a certificate the m different color points chosen from n points. The verification algorithm computes the MST of those m points and check whether the length is at most L. This process can certainly be done in polynomial time.

We prove this problem is NP-hard by a reduction from the 3-SAT problem. First we put a point O with a distinct color at (0,0). For a 3-SAT formula ψ, suppose that it has n 1 variables \(x_{1},x_{2},\ldots,x_{n_{1}}\) and m 1 clauses. For each variable x i (1≤in 1), five points \(p_{i}^{1}\), \(p_{i}^{2}\), \(p_{i}^{3}\), \(p_{i}^{4}\) and \(p_{i}^{5}\) are put at (201i−101,0), (201i−1,0), (201i,0), (201i−1,−10), (201i,−10) respectively.

If the j-th literal, which is in the k-th clause in ψ, is x i x i ), we also denote the literal by x i,j,k x i,j,k ). For every literal x i,j,k (or ¬x i,j,k ), ten additional points are put down. \(p_{i,j,k}^{6}\) is put at (201i−1,−9−2k), \(p_{i,j,k}^{7}\) at (201i,−9−2k), \(p_{i,j,k}^{8}\) at (201i−1,−10−2k), \(p_{i,j,k}^{9}\) at (201i,−10−2k), \(p_{i,j,k}^{10}\) at (0,400j−300), \(p_{i,j,k}^{11}\) at (0,400j−200), \(p_{i,j,k}^{12}\) at (0,400j−100), \(p_{i,j,k}^{13}\) at (0,400j), \(p_{i,j,k}^{14}\) at (−0.5,400j−200) and \(p_{i,j,k}^{15}\) at (−0.5,400j). Among those fifteen points, only \(p_{i}^{4}\) and \(p_{i}^{5}\) have the same distinct color, only \(p_{i,j,k}^{6}\) and \(p_{i,j,k}^{14}\) have the same distinct color, only \(p_{i,j,k}^{7}\) and \(p_{i,j,k}^{15}\) have the same distinct color, only \(p_{i,j,k}^{8}\) and \(p_{i,j,k}^{9}\) have the same distinct color, and every one of the other points has a distinct color. Figure 7(b) shows the gadget for the variable x 1 which is the first literal and in the first clause.

Then, we obtain a set P of 5n 1+30m 1+1 points in the plane. Let T P be the planar largest color-spanning minimum spanning tree over P. 3n 1+12m 1+1 points are on the x-axis and the y-axis, and every one of them has a distinct color.

Since only \(p_{i}^{4}\) and \(p_{i}^{5}\) have the same distinct color, only \(p_{i,j,k}^{6}\) and \(p_{i,j,k}^{14}\) have the same distinct color, only \(p_{i,j,k}^{7}\) and \(p_{i,j,k}^{15}\) have the same distinct color, only \(p_{i,j,k}^{8}\) and \(p_{i,j,k}^{9}\) have the same distinct color, we have to select either \(\{p_{i}^{4},p_{i,j,k}^{7},p_{i,j,k}^{8}, p_{i,j,k}^{14}\}\) or \(\{p_{i}^{5},p_{i,j,k}^{6},p_{i,j,k}^{9},p_{i,j,k}^{15}\}\) to obtain the planar largest color-spanning minimum spanning tree T P whose length is \((201+10)n_{1}+3m_{1}(400+0.5+2\sqrt{2})=211n_{1}+1201.5m_{1}+6\sqrt{2}m_{1}\). If \(\{p_{i}^{4},p_{i,j,k}^{7},p_{i,j,k}^{8},p_{i,j,k}^{14}\}\) are selected, x i is assigned 0. If \(\{p_{i}^{5},p_{i,j,k}^{6},p_{i,j,k}^{9},p_{i,j,k}^{15}\}\) are selected, x i is assigned 1 (see Fig. 11).

Fig. 11
figure 12

(a) The planar largest color-spanning minimum spanning tree when x 1 is assigned 0. (b) The planar largest color-spanning minimum spanning tree when x 1 is assigned 1

Now, we need three additional points for every clause in the 3-SAT formula, where only the three points in the same clause have the same distinct color. For every literal x i,j,k (or ¬x i,j,k ), we put one point \(p_{i,\lceil \frac{j}{3} \rceil}^{\mathit{or}}\) at the position of \(p_{i,j,k}^{14}\) (or \(p_{i,j,k}^{15}\)). Figure 12 shows the gadgets for ψ=(x 1x 2∨¬x 3)∧(¬x 1x 3x 4).

Fig. 12
figure 13

Gadgets for ψ=(x 1x 2∨¬x 3)∧(¬x 1x 3x 4)

Let T opt be the planar largest color-spanning minimum spanning tree over P plus the 3m 1 points for every clause. For literal x i,j,k x i,j,k ), if a variable x i is assigned 1 (0), \(\{p_{i}^{5},p_{i,j,k}^{6},p_{i,j,k}^{9},p_{i,j,k}^{15}\}\) (\(\{p_{i}^{4},p_{i,j,k}^{7},p_{i,j,k}^{8},p_{i,j,k}^{14}\}\)) are selected. This means that the point \(p_{i,\lceil\frac{j}{3}\rceil}^{\mathit{or}}\) can be selected to construct T opt . Therefore, if at least one literal is assigned 1 in every clause, the length of T opt equals that of T P plus m 1∗0.5. If all three literals in at least one clause are assigned 0, the length of T opt is less than that of T P plus 3m 1∗0.5. Consequently, we obtain that a 3-SAT formula ψ with n 1 variables \(x_{1},x_{2},\ldots,x_{n_{1}}\) and m 1 clauses is satisfiable if and only if the length of the corresponding T opt is equal to \(211n_{1}+1202m_{1}+6\sqrt{2}m_{1}\). Because the 3-SAT problem is NP-Complete, we obtain the following theorem:

Theorem 6

PLMSTCS is NP-Complete.

6 Hardness of PSPCHCS

Finally in Sect. 6, we prove the NP-completeness of the Planar Smallest Perimeter Convex Hull Color-Spanning Set (PSPCHCS) problem, followed with two simple approximation algorithms.

6.1 NP-completeness for PSPCHCS

First we show that this problem belongs to NP. Given an instance of the problem, we use as a certificate the m different color points chosen from n points. The verification algorithm computes the perimeter of the convex hull of those m points and check whether the perimeter is at most p. This process can certainly be done in polynomial time.

Again, we prove this problem is NP-hard by a reduction from the 3-SAT problem. For a given 3-SAT formula, suppose that it has n 1 variables \(x_{1},x_{2},\ldots,x_{n_{1}}\). First we draw a circle C. For two points a and b on C, let \(\widetilde{ab}\) be the arc of C from a to b and \(\overline{ab}\) be the line segment between a and b. Let the length of \(\overline{ab}\) be \(|\overline{ab}|\).

For each variable x j (j=1,2,…,n 1), we put 10 points \(x_{j}^{1}, x_{j}^{2},\ldots, x_{j}^{10}\) on the circle C in clockwise order (see Fig. 13). Note that in Fig. 13 these points are shown just for the clarity purpose, we can in fact screeze these points so that all the 10n 1 points can be put on the circle C. Only \(x_{j}^{2}\) and \(x_{j}^{3}\) have the same distinct color and only p has a distinct color for every one point p of the other points. \(|\overline{x_{j}^{1}x_{j}^{2}}|=|\overline{x_{j}^{3}x_{j}^{4}}|\). Then we add two additional points \(x_{j}^{11}\) and \(x_{j}^{12}\), where \(x_{j}^{11}\) is on the line segment \(\overline{x_{j}^{1}x_{j}^{2}}\) and \(x_{j}^{12}\) is on the line segment \(\overline{x_{j}^{3}x_{j}^{4}}\). Only \(x_{j}^{11}\) and \(x_{j}^{6}\) have the same distinct color, and only \(x_{j}^{12}\) and \(x_{j}^{9}\) have the same distinct color. Moreover, we have the following equations:

Fig. 13
figure 14

The gadget for x j . The colors of all the empty circles are different from each other and from solid colored points. Solid line segments must appear on CH opt . Dashed line segments are candidates for edges on CH opt

Δp 1≫Δp 2 ensures that either \(\overline{x_{j}^{1}x_{j}^{2}}\cup\overline{x_{j}^{2}x_{j}^{4}}\) or \(\overline{x_{j}^{1}x_{j}^{3}}\cup\overline{x_{j}^{3}x_{j}^{4}}\) is the part of CH opt . Suppose we select \(x_{j}^{1}\), \(x_{j}^{2}\) and \(x_{j}^{4}\) to be the vertices of CH opt . Then we must select \(x_{j}^{11}\), \(x_{j}^{8}\), \(x_{j}^{9}\), \(x_{j}^{10}\), \(x_{j}^{5}\) and \(x_{j}^{7}\) in order to obtain the minimum length.

If we select \(\overline{x_{j}^{1}x_{j}^{2}}\cup\overline{x_{j}^{2}x_{j}^{4}}\cup \overline{x_{j}^{4}x_{j}^{5}} \cup\overline{x_{j}^{5}x_{j}^{7}} \cup \overline{x_{j}^{7}x_{j}^{8}} \cup \overline{x_{j}^{8}x_{j}^{9}}\cup\overline{x_{j}^{9}x_{j}^{10}}\) as a part of CH opt , the Boolean variable x j is assigned 1. If we select \(\overline{x_{j}^{1}x_{j}^{3}}\cup\overline{x_{j}^{3}x_{j}^{4}}\cup \overline{x_{j}^{4}x_{j}^{5}} \cup\overline{x_{j}^{5}x_{j}^{6}} \cup \overline{x_{j}^{6}x_{j}^{7}}\cup\overline{x_{j}^{7}x_{j}^{8}}\cup \overline{x_{j}^{8}x_{j}^{10}}\) as a part of CH opt , x j is assigned 0.

Therefore, if we connect all the n 1 gadgets for the n 1 variables together to construct a convex hull CH, the perimeter of CH is n 1×(Z 1+2Z 2Z 2)+Z 4 where \(Z_{4}=2\sum_{j=1}^{n_{1}} (|\overline{x_{j}^{4}x_{j}^{5}}|+|\overline{x_{j}^{7}x_{j}^{8}}|)+\sum_{j=1}^{n_{1}-1}|\overline{x_{j}^{10}x_{j+1}^{1}}|+|\overline{x_{n_{1}}^{10}x_{1}^{1}}|\).

We need three additional points for every clause and only the three points for the same clause have the same distinct color. For every positive literal x j in one clause, we put a new point at the position of \(x_{j}^{13}\) which is inside the triangle \(\bigtriangleup x_{j}^{5}x_{j}^{6}x_{j}^{7}\) such that \(\Delta Z_{3} =|\overline{x_{j}^{5}x_{j}^{13}}|+|\overline{x_{j}^{13}x_{j}^{7}}|-|\overline{x_{j}^{5}x_{j}^{7}}|\ll\Delta Z_{2}\). For every negative literal ¬x j in one clause, we put a new point at the position of \(x_{j}^{14}\) which is inside the triangle \(\bigtriangleup x_{j}^{8}x_{j}^{9}x_{j}^{10}\) such that \(\Delta Z_{3} =|\overline{x_{j}^{8}x_{j}^{14}}|+|\overline{x_{j}^{14}x_{j}^{10}}|-|\overline{x_{j}^{8}x_{j}^{10}}|\ll\Delta Z_{2}\). For example, if there is a clause (x 1x 2∨¬x 3), we put three points at the position of \(x_{1}^{13}\), \(x_{2}^{13}\) and \(x_{3}^{14}\) respectively. \(x_{1}^{13}\), \(x_{2}^{13}\) and \(x_{3}^{14}\) lie in the triangle \(\bigtriangleup x_{1}^{5}x_{1}^{6}x_{1}^{7}\), \(\bigtriangleup x_{2}^{5}x_{2}^{6}x_{2}^{7}\) and \(\bigtriangleup x_{3}^{8}x_{3}^{9}x_{3}^{10}\) respectively.

We call \(x_{j}^{6}\) and \(x_{j}^{9}\) the apex-points, and call \(x_{j}^{13}\) and \(x_{j}^{14}\) the or-points. If all the three literals in one clause are assigned 0, then the corresponding apex-points will not be selected as the vertices of CH opt . One of the three or-points has to be selected as the vertex of CH opt because only they have the same distinct color. Then the perimeter of CH opt at least equals that of CH plus ΔZ 3. If a given 3-SAT formula with n 1 variables is satisfiable, the perimeter of CH opt equals n 1×(Z 1+2Z 2Z 2)+Z 4. Similarly, at least one literal of every clause is assigned 1 and the given 3-SAT formula is satisfiable if the perimeter of CH opt over the gadgets equals n 1×(Z 1+2Z 2Z 2)+Z 4. Otherwise, the perimeter of the convex hull CH opt n 1×(Z 1+2Z 2Z 2)+Z 4Z 3. Because the 3-SAT problem is NP-Complete, we obtain the following theorem:

Theorem 7

PSPCHCS is NP-Complete.

6.2 Approximation algorithms for PSPCHCS

Here we present two simple approximation algorithms for the PSPCHCS problem.

6.2.1 The π-approximation algorithm

For every point p in the plane, select m−1 points p 1,…,p m−1. The colors of every two points of the m selected points are different. Moreover, p i (1≤im−1) is the closest point from p among all the points which have the same color with p i . Use the m points to construct a convex hull CH p and compute the perimeter of CH p . Thus, we can obtain n perimeters and the smallest one is what we would return.

The running time for constructing a convex hull is O(mlogm) according to Graham (1972) and the running time for finding out the other m−1 points is O(n) when p is selected. Therefore, the total running time of this algorithm is O(n(n+mlogm)).

Assume that the convex hull we obtain is CH app , the length of CH app is per app , the optimal convex hull is CH opt and perimeter of CH opt is per opt . Now we prove that per app πper opt .

Suppose p a and p b are the vertices of CH opt and \(\overline{p_{a}p_{b}}\) is the diameter of CH opt (see Fig. 14). Let \(r=|\overline{p_{a}p_{b}}|\). We draw a circle C with center p a and radius r. When we select p a as the above p and select the m−1 points p 1,…,p m−1 to construct \(\mathit{CH}_{p_{a}}\), \(\mathit{CH}_{p_{a}}\) must be inside C because p i (1≤im−1) is the closest point from p among all the points which have the same color with p i and CH opt is inside C. Then the perimeter of \(\mathit{CH}_{p_{a}}\), denoted by \(\mathit{per}_{p_{a}}\), satisfies \(\mathit{per}_{p_{a}} \le 2\pi r\). Because \(\mathit{per}_{\mathit{app}} \le \mathit{per}_{p_{a}}\) and per opt ≥2r, per app πper opt .

Fig. 14
figure 15

Illustration of the π-approximation algorithm

Theorem 8

There is a π-approximation algorithm for PSPCHCS with a running time O(n 2+nmlogm).

6.2.2 The \(\sqrt{2}\)-approximation algorithm

In Abellanas et al. (2001b) proposed an O(min{n(nm)2,nm(nm)}) time algorithm for computing the smallest perimeter axis-parallel rectangle enclosing at least one point of each color. We use the above algorithm to obtain the smallest perimeter axis-parallel rectangle R. After R is obtained, we construct a convex hull CH app over the points inside R. CH app is the convex hull we need.

Let the perimeter of R be per R and let the diagonal of R be l dia and the length of l dia be d. CH app contains at least one point of each color and CH app is totally inside R (see Fig. 15). Therefore per app per R . Let R′ be the smallest perimeter axis-parallel rectangle enclosing CH opt , per R be the perimeter of R′, l dia be the diagonal of R′ and d′ be the length of l dia . Because R′ is the smallest perimeter axis-parallel rectangle, at least one point with a distinct color lies on every edge of R′. Then per Rper opt ≥2d′. We know \(\mathit{per}_{R'} \le 2\sqrt{2}d'\). Therefore, \(\mathit{per}_{R'}\ge \mathit{per}_{\mathit{opt}} \ge {\sqrt{2}\over 2}\mathit{per}_{R'}\). Hence we have \(\mathit{per}_{\mathit{opt}}\le \mathit{per}_{\mathit{app}}\le \sqrt{2} \mathit{per}_{\mathit{opt}}\), due to that per Rper R per app .

Fig. 15
figure 16

Illustration of the \(\sqrt{2}\)-approximation algorithm

Theorem 9

There is a \(\sqrt{2}\)-approximation algorithm for PSPCHCS with a running time O(min{n(nm)2,nm(nm)}).

7 Conclusions

In this paper we study several geometric problems of color-spanning sets. We propose an O(n 1+ε) time algorithm for MaxDCS and show that LCPCS is NP-Complete for the L p (1≤p<∞) metric. Moreover, we prove that LCPCS is \((\frac{1}{2}+\varepsilon)\)-APX-hard in one dimension, which means that finding an approximation algorithm whose approximation ratio is better than \(\frac{1}{2}\) is NP-hard. Then we prove that PSMSTCS, PLMSTCS and PSPCHCS are NP-Complete and propose two efficient constant factor approximation algorithms for PSPCHCS. For the future work, it will be interesting to investigate whether there exists an \(\frac{1}{2}\)-approximation for LCPCS in one dimension.