Keywords

1 Introduction

Several problems in graph theory and combinatorial optimization involve determining if a given graph G has a subgraph with certain properties. Examples include seeking paths, cycles, trees, cliques, vertex covers, matching, independent sets, bipartite subgraphs, etc. Related optimization problems include finding a maximum clique, a maximum (connected) vertex cover, a maximum independent set, a minimum (connected) dominating set, etc. These well-studied problems have significant theoretical interest and many practical applications.

We study the problem in which we are given a simple connected graph \(G=(V,E)\) whose vertex set V has each node being “” or “” (note, the color assignment might not be a proper 2-coloring of the vertices, i.e., we allow nodes of the same color to be adjacent in G). We seek a maximum-cardinality subset \(V'\subseteq V\) of the nodes such that \(V'\) is , i.e. having same number of red and blue nodes in \(V'\), and such that the induced subgraph H by \(V'\) in G is connected. We refer to this as the problem:

figure a

1.1 Connection with the Graph Motif Problem

Here we establish a connection between the BCS problem and the problem [7, 14, 20]. In the Graph Motif problem, we are given the input as a graph \(G = (V, E)\), a color function \( col : V \rightarrow \mathcal {C}\) on the vertices, and a multiset M of colors of \(\mathcal {C}\); the objective is to find a subset \(V'\subseteq V\) such that the induced subgraph on \(V'\) is connected and \(col(V')=M\). We note that if \(\mathcal {C}=\{\)red, blue\(\}\) and the motif has same number of blues and reds, then the solution of the Graph Motif problem gives a balanced connected subgraph (not necessarily a maximum balanced connected subgraph).

Fellows et al. [14] showed that the Graph Motif problem is NP-complete for trees of maximum degree 3 where the given motif is a colorful set instead of a multiset (that is, no color occurs more than once). They also showed that the Graph Motif problem remains NP-hard for bipartite graphs of maximum degree 4 and the motif contains only two colors. It is easy to observe that a solution to the Graph Motif problem (essentially) gives a solution to the BCS problem, with an impact of a polynomial factor in the running time. On the other hand the NP-hardness result for the BCS problem on a particular graph class implies the NP-hardness result for the Graph Motif problem on the same class. We conclude that BCS problem is a special case of the Graph Motif problem. Note that much of the work on the Graph Motif problem (e.g., [7, 14, 20]) is addressing the parameterized complexity of the Graph Motif problem.

1.2 Motivation and Possible Applications

In [7], the authors mentioned that the vertex-colored graph problems have numerous applications in bioinformatics. See the references of [7] for more specific applications. However, the Graph Motif problem is motivated by the applications in biological network analysis [20]. This problem also has applications in social or technical networks [4, 14] or in the context of mass spectrometry [6, 14].

The BCS problem is closely related to the problem [12, 16]. In the MNWCS problem, we are given a connected graph G(VE), with an integer weight associated with each node in V, and an integer bound B; the objective is to decide whether there exists a subset \(V'\subseteq V\) such that the subgraph induced by \(V'\) is connected and the total weight of the vertices in \(V'\) is at least B. In the MNWCS problem, if the weight of each vertex is either \(+1\) (red) or \(-1\) (blue), and if we ask for a largest connected subgraph whose total weight is zero, then it is equivalent to the BCS problem. The MNWCS problem along with its variations have numerous practical application in various fields (see [12] and the references therein). We believe some of these applications also serve well to motivate the BCS problem.

1.3 Related Work

Bichromatic input points, often referred to as “red-blue” input, has appeared extensively in numerous problems. For a detailed survey on geometric problems with red-blue points see [17]. In [5, 10, 11] colored points have been considered in the context of matching and partitioning problems. In [1], Aichholzer et al. considered the balanced island problem and devised polynomial algorithms for points in the plane. On the combinatorial side, Balanchandran et al. [2] studied the problem of unbiased representatives in a set of bicolorings. Kaneko et al. [18] considered the problem of balancing colored points on a line. Later on, Bereg et al. [3] studied balanced partitions of 3-colored geometric sets in the plane.

Finding a certain type of subgraph in a graph is a fundamental algorithmic question. In [13], Feige et al. studied the dense k-subgraph problem in which we are given a graph G and a parameter k, and the goal is to find a set of k vertices with maximum average degree in the subgraph induced by this set. Crowston et al. [8] considered parameterized algorithms for the balanced subgraph problem. Kierstead et al. [19] studied the problem of finding a colorful induced subgraph in a properly colored graph. In [9], Derhy and Picouleau considered the problem of finding induced trees in both weighted and unweighted graphs and obtained hardness and algorithmic results. They have studied bipartite graphs and triangle-free graphs; moreover, they have considered the case in which the number of prescribed vertices is bounded.

1.4 Our Results

In this paper, we consider the balanced connected subgraph problem on various graph families and present several hardness and algorithmic results.

On the hardness side, in Sect. 2, we prove that the BCS problem is NP-hard on general graphs, even for planar graphs, bipartite graphs (with a general red/blue color assignment, not necessarily a proper 2-coloring), and chordal graphs. Furthermore, we show that the existence of a balanced connected subgraph containing a specific vertex is NP-complete. In addition to that, we prove that finding the maximum balanced path in a graph is NP-hard. Note that, Fellows et al. [14] showed that the Graph Motif problem is NP-complete for bipartite graphs with two colors. However, their reduction does not imply that the BCS problem on bipartite graph is NP-hard since in their reduction the motif is not color-balanced (i.e., does not include the same number of blues and reds).

On the algorithmic side, in Sect. 3, we devise polynomial-time algorithms for trees (in \(O(n^4)\) time), split graphs (in \(O(n^2)\) time), bipartite graphs with a proper 2-coloring (in \(O(n^2)\) time), and graphs with diameter 2 (in \(O(n^2)\) time). Here, n is the number of vertices in the input graphs.

2 Hardness Results

2.1 BCS Problem on Bipartite Graphs

In this section we prove that the BCS problem is NP-hard for bipartite graphs with a general red/blue color assignment, not necessarily a proper 2-coloring. We give a reduction from the problem [15]. In this EC3Set problem, we are given a set U with 3k elements and a collection S of m subsets of U such that each \(s_i\in S\) contains exactly 3 elements. The objective is to find an exact cover for U (if one exists), i.e., a sub-collection \(S'\subseteq S\) such that every element of U occurs in exactly one member of \(S'\). During the reduction, we generate an instance \(G=(R\cup B,E)\) of the BCS problem from an instance X(SU) of the EC3Set problem as follows:

Reduction: For each set \(s_i\in S\), we take a blue vertex \(s_i \in B\). For each element \(u_j\in U\), we take a red vertex \(u_j \in R\). Now consider a set \(s_i \in S\) containing three elements, \(u_\alpha , u_\beta \), and \(u_\gamma \), and add the three edges \((s_i,u_\alpha )\), \((s_i,u_\beta )\), and \((s_i,u_\gamma )\) to the edge set E. Additionally, we consider a path of 5k blue vertices starting and ending with vertices \(b_1\) and \(b_{5k}\), respectively. Similarly, we consider a path of 3k red vertices starting and ending with vertices \(r_1\) and \(r_{3k}\), respectively. We connect these two paths by joining the vertices \(r_{3k}\) and \(b_1\) by an edge. Finally, we add edges connecting each vertex \(s_i\) with \(b_{5k}\). This completes the construction. See Fig. 1 for the complete construction. Clearly, the numbers of vertices and edges in G are polynomial in terms of the numbers of elements and sets in X; hence, the construction can be done in polynomial time. We now prove the following lemma.

Fig. 1.
figure 1

Construction of the instance G of the BCS problem. (Color figure online)

Lemma 1

The instance X of the EC3Set problem has a solution if and only if the instance G of the BCS problem has a connected balanced subgraph T with 12k vertices (6k red and 6k blue).

Proof

Assume that EC3Set problem has a solution. Let \(S^*\) be an optimal solution in it. We choose the corresponding vertices of \(S^*\) in T. Since this solution covers all \(u_j\)’s. So we select all \(u_j\)’s in T. Finally we select all the 5k blue and 3k red vertices in T, resulting in a total of 6k red and 6k blue vertices.

On the other hand, assume that there is a balanced tree T in G with 6k vertices of each color. The solution must pick the 5k blue vertices \(b_1, \ldots , b_{5k}\). Otherwise, it exclude the 3k red vertices \(r_1, \ldots ,r_{3k}\), and reducing the size of the solution. Since the graph G has at most 6k red vertices, at most k vertices can be picked from the set \(s_1, \ldots ,s_m\) and need to cover all the 3k red vertices corresponding to \(u_j\) for \(1\le j\le 3k\). Hence, this k sets give an exact cover.    \(\square \)

It is easy to see that the graph we constructed from the (EC3Set) problem in Fig. 1 is indeed a bipartite graph. Hence we conclude the following theorem.

Theorem 1

The BCS problem is NP-hard for bipartite graphs.

2.2 NP-Hardness: BCS Problem on Special Classes of Graphs

In this section, we show that the BCS problem is NP-hard even if we restrict the graph classes to be planar, or chordal graphs.

Planar Graphs: In this section we prove that BCS problem is NP-hard for planar graphs. We give a reduction from the [15]. In this problem, we are given a planar graph \(G = (V,E)\), a subset \(X \subseteq V\), and a positive integer \(k \in \mathbb {N}\). The objective is to find a tree \(T =(V',E')\) with at most k edges such that \( X \subseteq V'\). Without loss of generality we assume that \(k\ge |X|-1\), otherwise the STPG problem has no solution.

Reduction: We generate an instance \(H=(R \cup B,E(H))\) for the BCS problem from an instance \(G = (V,E)\) of the STPG problem. We color all the vertices, V, in G as blue. We create a set of |X| red vertices as follows: for each vertex \(u_i \in X\), we create a red vertex \(u'_i\) in H, and we connect \(u'_i\) to \(u_i\) via an edge. Additionally, we take a set Z of \((k+1-|X|)\) red vertices in H and the edges \((z_{j},u'_1)\) into E(H), for each \(z_{j} \in Z\). Hence we have, \(B =V\), and \(R= Z \cup \{u'_i ; \ 1 \le i \le |X|\} \). Note that \(|R|<|B|\) and \( |R|=(k+1) \). This completes the construction. For an illustration see Fig. 2. Clearly the number of vertices and edges in H are polynomial in terms of vertices in G. Hence the construction can be done in polynomial time. We now prove the following lemma.

Fig. 2.
figure 2

Schematic construction for planar graphs. (Color figure online)

Lemma 2

The STPG problem has a solution if and only if the instance H of the BCS problem has a balanced connected subgraph with \((k+1)\) vertices each of the two colors.

Proof

Assume that STPG has a solution. Let \(T=(V',E')\) be the resulting Steiner tree, which contains at most k edges and \(X\subseteq V'\). If \( |V'|=(k+1) \) then the subgraph of H induced by \((V' \cup R)\) is connected and balanced with \((k+1)\) vertices of each color. If \( |V'| < (k+1) \) then we take a set Y of \(((k+1)-|V'|)\) many vertices from V such that the subgraph of G induced by \((V' \cup Y)\) is connected. Clearly \( |V' \cup Y|=(k+1) \). Now the subgraph of H induced by \((V' \cup Y \cup R)\) is connected and balanced with \((k+1)\) vertices of each red and blue color.

On the other hand, assume that there is a balanced connected subgraph \(H'\) of H with \((k+1)\) vertices of each color. Note that, except vertex \(u'_1\), in H all the red vertices are of degree 1 and connected to blue vertices. Let \(G'\) be the subgraph of G induced by all blue vertices in \(H'\). Since H is connected and there is no edge between any two red vertices, \(G'\) is connected. Since \(G'\) contains \((k+1)\) vertices, any spanning tree T of \(H'\) contains k edges. So T is a solution of the STPG problem.    \(\square \)

Theorem 2

The BCS problem is NP-hard for planar graphs.

Chordal Graphs: We prove that the BCS problem is NP-hard where the input graph is a chordal graph. The hardness construction is similar to the construction in Sect. 2.1; we modify the construction so that the graph is chordal. In particular, we add edges between \(s_{i}\) and \(s_{j}\) for each \(i \ne j, 1 \le i,j \le m\). For this modified graph, it is easy to see that a lemma identical to Lemma 1 holds. Hence, we conclude that the BCS problem is NP-hard for chordal graphs.

2.3 NP-Hardness: BCS Problem with a Specific Vertex

In this section we prove that the existence of a balanced subgraph containing a specific vertex is NP-complete. We call this problem the problem. The reduction is similar to the reduction used in showing the NP-hardness of the BCS problem; we also use here a reduction from the EC3Set problem (see Sect. 2.1 for the definition).

Reduction: Assume that we are given a EC3Set problem instance \(X=(U,S)\), where set U contains 3k elements and a collection S of m subsets of U such that each \(s_i\in S\) contains exactly 3 elements. We generate an instance G(RBE) of the BCS-existence problem from X as follows. The red vertices R are the elements \(u_j\in U\); i.e., \(R=U\). The blue vertices B are the 3-element sets \(s_i\in S\); i.e., \(B=S\). For each blue vertex \(s_i=\{u_\alpha ,u_\beta ,u_\gamma \} \in S=B\), we add the 3 edges \((s_i,u_\alpha )\), \((s_i,u_\beta )\), and \((s_i,u_\gamma )\) to the set E of edges of G. We instantiate an additional set of 2k blue vertices, \(\{b_1,\ldots ,b_{2k}\}\), and add edges to E to link them into a path \((b_1,b_2,\ldots ,b_{2k})\). Finally, we add an edge from \(b_{2k}\) to each of the blue vertices \(s_i\). Refer to Fig. 3.

Fig. 3.
figure 3

Construction of the instance G of the BCS problem containing \( b_{1}\). (Color figure online)

Clearly, the number of vertices and edges in G are polynomial in terms of number of elements and sets in the EC3Set problem instance X, and hence the construction can be done in polynomial time. We now prove the following lemma.

Lemma 3

The instance X of the EC3Set problem has a solution iff the instance G of the corresponding BCS existence problem has a balanced subgraph T containing the vertex \(b_1\).

Proof

Assume that the EC3Set problem has a solution, and let \(S^*\) be the collection of \(k=|S^*|\) sets of S in the solution. Then, we obtain a balanced subgraph T that contains \(b_1\) as follows: T is the induced subgraph of the 3k red vertices U, together with the k blue vertices \(S^*\) and the 2k blue vertices \(b_1,\ldots ,b_{2k}\). Note that T is balanced and connected and contains \(b_1\).

Conversely, assume there is a balanced connected subgraph T containing \(b_{1}\). Let t be the number of (blue) vertices of S within T. First, note that \(t\le k\). (Since T is balanced and contains at most 3k red vertices, it must contain at most 3k blue vertices, 2k of which must be \(\{b_1,\ldots ,b_{2k}\}\), in order that T is connected.) Next, we claim that, in fact, \(t\ge k\). To see this, note that each of the t blue vertices of T that corresponds to a set in S is connected by edges to 3 red vertices; thus, T has at most 3t red vertices. Now, T has \(2k+t\) blue vertices (since it has t vertices other than the path \((b_1,\ldots ,b_{2k})\)), and T is balanced; thus, T has exactly \(2k+t\) red vertices, and we conclude that \(2k+t\le 3t\), implying \(k\le t\), as claimed. Therefore, we need to select exactly k blue vertices corresponding to the sets S, and these vertices connect to all 3k of the red vertices. The k sets corresponding to these k blue vertices is a solution for the EC3Set problem.    \(\square \)

Clearly, the BCS existence problem is in NP. Hence, we conclude:

Theorem 3

It is NP-complete to decide if there exists a connected balanced subgraph that contains a specific vertex.

2.4 NP-Hardness: Balanced Connected Path Problem

In this section we consider the Problem and prove that it is NP-hard. In this problem instead of finding a balanced connected subgraph, our goal is to find a balanced path with a maximum cardinality of vertices. To prove the BCP problem is NP-hard we give a polynomial time reduction from the problem which is known to be NP-complete [15]. In this problem, we are given an undirected graph Q, and the goal is to find a Hamiltonian path in Q i.e., a path which visits every vertex in Q exactly once. In the reduction we generate an instance G of the BCP problem from an instance Q of the Ham-Path problem as follows:

Reduction: We make a new graph \(Q'\) from Q. Let us assume that the graph Q contains m vertices. If m is even then \(Q'=Q\). If m is odd, then we add a dummy vertex u in Q, connect to every other vertices in Q by edges with u and attach a path of length 2 to u. The resulting graph is our desired \(Q'\). It is easy to observe that, Q has a Hamiltonian path if and only if \(Q'\) has a Hamiltonian path.

Now we have a Ham-Path instance \(Q'\) with even number of vertices, say n. We arbitrary choose any n / 2 vertices in \(Q'\) and color them red and color the remaining n / 2 vertices blue. Let G be the colored graph. This completes the construction. Clearly, this can be done in polynomial time.

Lemma 4

\(Q'\) has a Hamiltonian path T if and only if G has a balanced path P with exactly n vertices.

Proof

Assume that \(Q'\) has a Hamiltonian path T. This implies that, T visits every vertex in \(Q'\). Since by the construction there are exactly half of the vertices in G is red and remaining are blue, the same path T is balanced with n/2 vertices of each color. On the other hand, assume that there is a balanced path P in G with exactly n/2 vertices of each color. Since, G has a total of n vertices, the path P visits every vertex in G. Hence, P is a Hamiltonian path.    \(\square \)

Theorem 4

The BCP problem is NP-hard for general graph.

3 Algorithmic Results

In this section, we consider several graph families and devise polynomial time algorithms for the BCS problem. Notice that, if the graph is a path or cycle, the optimal solution is just a path. Hence, one can do brute-force search to obtain the maximum balanced path. In case of a complete graph \(K_n\), we output a sub-graph H of \(K_{n}\) induced by V, where \(|V|= 2|B|\), \(B \subset V\), and B is the set of all blue vertices in \(K_{n}\) (assuming that, the number of blue vertices is at most the number of red vertices in \(K_{n}\)). Clearly, H is the maximum-cardinality balanced connected subgraph in \(K_{n}\). We consider trees, split graphs, bipartite graphs (properly colored), graphs of diameter 2, and present polynomial algorithms for each of them.

3.1 Trees

In this section we give a polynomial time algorithm for the BCS problem where the input graph is a tree. We first consider the following problem.

Problem 1:

Given a tree \(T=(V,E)\), and a root \(t \in V\) where \(V=V_R\cup V_B\). The vertices in \(V_R\) and \(V_B\) are colored red and blue,respectively. The objective is to find maximum balanced tree with root t.

We now design an algorithm to solve this problem. Let v be a vertex in G. We associate a set \(P_v\) of of the form (rb) to v, where r is the count of red vertices and b is the count of blue vertices. A single pair (rb) associated with vertex v indicates that there is a subtree rooted at v having r red and b blue vertices. Note that r may not be equal to b. Now for any k pairs, the sum is also a pair which is defined as the element-wise sum of these k pairs. Let \(A_1,A_2, \ldots ,A_k\) be k sets. The Minkowski sum \(^M\sum _{i=1}^k A_i\) denotes the set of sums of k elements one from each set \(A_i\) i.e., \(^M\sum _{i=1}^k A_i = A_1 \oplus A_2 \oplus \ldots \oplus A_k\). We use \(\oplus \) to denote Minkowski sum between sets. For example, for the Minkowski sum of the sets A and B, we write \(A \oplus B\) and it means \(A \oplus B=\{a+b :a\in A, b\in B\}\).

Now we are ready to describe the algorithm to solve Problem 1. In Algorithm 1, we describe how to get maximum balanced subtree with root t for a tree T rooted at t.

figure b

In Algorithm 1 we compute a finite set \(P_{t}\) of pairs \(\{(r,b)\}\) at the root t in T. To do so, we recursively calculate the set of pairs from leaf to the root. For an internal vertex v, the set \(P_v\) is calculated as follows: let the color of v is red and it has k children \(u_1,u_2,\ldots ,u_k\). Then, \(P_{v} = \{(0,0)\} \cup \{^M\sum _{i=1}^{k} P_{u_i} \oplus \{(1,0)\}\}\). We now prove the following lemma.

Lemma 5

Let T be rooted tree with t as a root. Then Algorithm 1 produces all possible balanced subtrees rooted at t in \(O(n^6)\) time.

Proof

Notice that in Algorithm 1, at each node \(v\in T\), we store a set \(P_{v}\) of pairs \(\{(r_{i},b_{i})\}\), where each \((r_{i},b_{i})\) indicates that there exists a subtree \(T'\) with root v such that number of red and blue vertices in \(T'\) are \(r_{i}\) and \(b_{i}\), respectively. Note that \(r_{i}\) may not be the same as \(b_{i}\). When we construct the set \(P_v\), all the sets corresponding to its children are already calculated. Finally, in steps 6 and 8 of Algorithm 1 we calculate the set \(P_v\) based on the color of v. Hence, when Algorithm 1 terminates, we get the set \(P_t\) where t is the root of T.

Now we calculate the time taken by Algorithm 1. Clearly, steps 2 and 4 take O(1) time to construct the \(P_v\) when v is a leaf. Note that, the size of \(P_v\), for an internal node v is \(O(n^2)\). Since there are at most n blue and red vertices in the subtree rooted at v. If v has k children then we have to take Minkowski sum of the sets corresponds to the children of v. To get the sum of two sets it takes \( O(n^4) \) time. As there are at most n children of node v, so the time taken by steps 6 and 8 are \(O(n^5)\). Finally, we traverse the tree from bottom to the root. Hence, the total time taken by the algorithm is \(O(n^6)\).    \(\square \)

We can now improve the time complexity by slightly modifying the Algorithm 1. For an internal vertex v, we actually do not need all the pairs to get the maximum balanced subtree. Suppose there are two pairs (ab) and (cd) in \(P_{v}\), where \((b-a)=(d-c)\) and \(a<c\). Then, instead of using the subtree with pair (ab), it is better to use the subtree with pair (cd), since it may help to construct a larger balance subtree. Therefore, in a set \(P_{v}\) if there are k pairs \(\{(a_{i},b_{i}); 1 \le i \le k \}\) such that \((b_{i}-a_{i})=(b_{j}-a_{j})\) whenever \(i \ne j\), \(1 \le i,j \le k\). Then we remove the \((k-1)\) pairs and store only the pair which is largest among all these k pairs. We say \((a_{m},b_{m})\) is largest when \(a_{m} > a_{i}\) and \(b_{m}> b_{i}\) for \(1 \le i \le k, i \ne m\). So we reduce the size of \(P_{v}\) for each vertex \(v \in T\) from \( O(n^2) \) to O(n). Let T(n) be the time to compute red-blue pairset for the root vertex t in the tree T with size n. If r has k children \(u_{1}, u_{2},\ldots ,u_{k}\) with size \(n_{1},n_{2},\ldots ,n_{k}\). Then the recurrence is \(T(n)= T(n_{1})+T({n_{2})+\ldots +T(n_{k})+O(\sum _{i=1}^{k-1} (n_{1}+ n_{2}+ \dots + n_{i})n_{i+1})}\). Now \( \sum _{i=1}^{k-1} (n_{1}+ n_{2}+ \dots n_{i})n_{i+1} \le \sum _{i=1}^{k-1} nn_{i+1} = n \sum _{i=1}^{k-1} n_{i+1} \le n^2\). which gives the solution that \(T(n)= O(n^3)\). Hence, we conclude the following lemma.

Lemma 6

Let T be rooted tree with t as a root. We can produces all possible balanced subtrees rooted at t in \(O(n^3)\) time and \( O(n^2) \) space complexity.

Optimal Solution for BCS Problem in Tree

If there are n nodes in the tree T, then, for each node \(v_{i}, 1 \le i \le n\), we consider T to be a tree rooted at \(v_{i}\). We then apply Algorithm 1 to find maximum-cardinality balanced subtree rooted at \(v_{i}\); let \(T_i\) be the resulting balanced subtree, having \(m_{i}\) vertices of each color. Then, to obtain an optimal solution for the BCST problem in T we choose a balanced subtree that has \(\max \{m_i; 1 \le i \le n\}\) vertices of each color. Now we can state the following theorem.

Theorem 5

Let T be a tree whose n vertices are colored either red or blue. Then, in \(O(n^4)\) time and \(O(n^2)\) space, one can compute a maximum-cardinality balanced subtree of T.

3.2 Split Graphs

A graph \(G = (V, E)\) is defined to be a split graph if there is a partition of V into two sets S and K such that S is an independent set and K is a complete graph. There is no restriction on edges between vertices of S and K. Here we give a polynomial time algorithm for the BCS problem where the input graph \(G=(V,E)\) is a split graph. Let V be partitioned into S and K where S and K induce an independent set and a clique respectively in G. Also, let \(S_B\) and \(S_R\) be the sets of blue and red vertices in S, respectively. Similarly, let \(K_B\) and \(K_R\) be the sets of blue and red vertices in K, respectively. We argue that there exists a balanced connected subgraph in G, having \(\min \{|S_B\cup K_B|,|S_R\cup K_R|\}\) vertices of each color.

Note that if \(|S_{B}\cup K_{B}|=|S_{R}\cup K_{R}|\) then G itself is balanced. Now, w.l.o.g., we can assume that \(|S_{B}\cup K_{B}|<|S_{R}\cup K_{R}|\). We will find a connected balanced subgraph H of G, where the number of vertices in H is exactly \(2|S_{B}\cup K_{B}|\). To do so, we first modify the graph \(G=(V,E)\) to a graph \(G'=(V,E')\). Then, from \(G'\), we will find the desired balanced subgraph with \(|S_B\cup K_B|\) many vertices of each color. Moreover, this process is done in two steps.

Step 1: Construct \(G'=(V,E')\) from \(G=(V,E)\).

For each \(u \in S_{B}\), if u is adjacent to at least a vertex \(u'\) in \(K_{R}\), then remove all adjacent edges with u except the edge \((u,u')\). Similarly, for each \(v \in S_{R}\), if v is adjacent to at least a vertex \(v'\) in \(K_{B}\), then remove all adjacent edges with v except the edge \((v,v')\).

Step 2: Delete \(|S_{R}\cup K_{R}|-|S_{B}\cup K_{B}|\) vertices from \(G'\).

Let \(k= |S_{R}\cup K_{R}|-|S_{B}\cup K_{B}|\). Now we have following cases.

Case 1:\(|S_{R}| \ge k\). We remove k vertices from \(S_{R}\) in \(G'\). Clearly, after this modification, \(G'\) is connected, and we get a balanced subgraph having \(|S_B\cup K_B|\) vertices of each color.

Case 2:\(|S_{R}|< k\). Then we know, \(|K_{R}| > |K_{B}\cup S_{B}|\). Let \(S'_{B} \subseteq S_{B}\) be the set of vertices in \(G'\) such that each vertex of \(S'_{B}\) has exactly one neighbor in \(K_R\). Then, we take a set \( X\subset K_{R} \) with cardinality \(|K_{B} \cup S_{B}|\) such that X contains all adjacent vertices of \(S'_{B}\). Now we take the subgraph H of \(G'\) induced by \((S_{B} \cup K_{B} \cup X )\). H is optimal and balanced.

Running Time: Step 1 takes O(|E|) time to construct \(G'\) from G. Now in step 2, both Case 1 and Case 2 take O(|V|) time to delete \(|S_{R}\cup K_{R}|-|S_{B}\cup K_{B}|\) vertices from \(G'\). Hence, the total time taken is \(O(n^2)\), where n is the number of vertices in G. We conclude in the following theorem.

Theorem 6

Given a split graph G of n vertices, with r red and b blue (\(n=r+b\)) vertices, then, in \(O(n^2)\) time we can find a balanced connected subgraph of G having \(\min \{b,r\}\) vertices of each color.

3.3 Bipartite Graphs, Properly Colored

In this section, we describe a polynomial-time algorithm for the BCS problem where the input graph is a bipartite graph whose nodes are colored red/blue according to proper 2-coloring of vertices in a graph. We show that there is a balanced connected subgraph of G having \(\min \{b,r\}\) vertices of each color where G contains r red vertices and b blue vertices. Note that we earlier showed that the BCS problem is NP-hard in bipartite graphs whose vertices are colored red/blue arbitrarily; here, we insist on the coloring being a proper coloring (the construction in the hardness proof had adjacent pairs of vertices of the same color). We begin with the following lemma.

Lemma 7

Consider a tree T (which is necessarily bipartite) and a proper 2-coloring of its nodes, with r red nodes and b blue nodes. If \(r<b\), then T has at least one blue leaf.

Proof

We prove it by contradiction. Let there is no blue leaf. Now assign any blue node say \(b_{r}\) as a root. Note that it always exists. Now \(b_{r}\) is at level 0 and \(b_{r}\) has degree at least 2. Otherwise, \(b_{r}\) is a leaf with blue color. We put all the adjacent vertices of \(b_{r}\) in level 1. This level consists of only red vertices. In level 2 we put all the adjacent vertices of level 1. So level 2 consists of only blue vertices. This way we traverse all the vertices in T and let that we stop at \(k^{th}\)-level. k cannot be even as all the vertices in even level are blue. So k must be odd. Now for each \(0 \leqslant i \leqslant \frac{k-1}{2}\), in the vertices of (level 2i \(\cup \) level \((2i+1)\)), number of blue vertices is at most the number of red vertices. Which leads to the contradiction that \(r<b\). Hence there exists at least one leaf with blue color.    \(\square \)

Now we describe the algorithm. We first find a spanning tree T in G. If \(r=b\) then T itself is a maximum balanced subtree (subgraph also) of G. Without loss of generality assume that \(r<b\). So by Lemma 7, T has at least 1 blue leaf. Now we remove that blue leaf from T. Using similar reason, we repetitively remove \((b-r)\) blue vertices from T. Finally, T becomes balanced subgraph of G, with r vertices of each color.

Running Time: Finding a spanning tree in G requires \(O(n^2)\) time. To find all the leaves in the tree T requires O(n) time (breadth first search). Hence the total time is needed is \(O(n^2)\).

Now, we state the following theorem.

Theorem 7

Given a bipartite graph G with a proper 2 coloring (r red or b blue vertices), then in \(O(n^2)\) time we can find a balanced connected subgraph in G having \(\min \{b,r\}\) vertices of each color.

3.4 Graphs of Diameter 2

In this section, we give a polynomial time algorithm for the BCS-problem where the input graph has diameter 2. Let G(VE) be such a graph which contains b blue vertex set B and r red vertex set R. We find a balanced connected subgraph H of G having \(\min \{b,r\}\) vertices of each color. Assume that \(b<r\). This can be done in two phases. In phase 1, we generate an induced connected subgraph \(G'\) of G such that (i) \(G'\) contains all the vertices in B, and (ii) the number of vertices in \(G'\) is at most \((2b-1)\). In phase 2, we find H from \(G'\).

Phase 1: To generate \(G'\), we use the following result.

Lemma 8

Let \(G=(V,E)\) be a graph of diameter 2. Then for any pair of non adjacent vertices u and v from G, there always exists a vertex w such that both \((u,w)\in E\) and \((v,w) \in E\).

We first include B in \(G'\). Now we have the following two cases.

Case 1: The induced subgraph G[B] of B is connected. In this case, \(G'\) is G[B].

Case 2: The induced subgraph G[B] of B is not connected. Assume that G[B] has \(k(>1)\) components. Let \(B_{1},B_{2},\ldots , B_{k}\) be k disjoints sets of vertices such that each induced subgraph \(G[B_i]\) of \(B_i\) in G is connected. Now using Lemma 8, any two vertices \(v_i \in B_i\) and \(v_j \in B_j\) are adjacent to a vertex say \(u_\ell \in R\). We repetitively apply Lemma 8 to merge all the k subgraphs into a larger graph. We need at most \((k-1)\) red vertices to merge k subgraph. We take this larger graph as the graph \(G'\).

Phase 2: In this phase, we find the balanced connected subgraph H with b vertices of each color. Note that the graph \(G'\) generated in phase 1 contains b blue and at most \((b-1)\) red vertices. Assume that \(G'\) contains \(b'\) red vertices. We add \((b-b')\) red vertices from \(G \setminus G'\) to \(G'\). This is possible since G in connected.

Running Time: In phase 1, first finding all the blue vertices and it’s induced subgraph takes \(O(n^2)\) time. Now to merge all the k components into a single component which is \(G'\) needs \(O(n^2)\) time. In phase 2, adding \((b-b')\) red vertices to \(G'\) takes \(O(n^2)\) time as well. Hence, total time requirement is \(O(n^2)\).

Theorem 8

Given a graph \(G = (V, E)\) of diameter 2, where the vertices in G are colored either red or blue. If G has b blue and r red vertices then, in \(O(n^2)\) time we can find a balanced connected subgraph in G having \(\min \{b,r\}\) vertices of each color.