Keywords

1 Introduction

In an instance of the classical stable matching problem we are given a (complete) bipartite graph \(G=(A \,\cup \, B, E)\) where, following standard terminology, the nodes in A will be referred to as men, and the nodes in B represent women. Each man \(a \in A\) possesses a (strict, and complete) preference order over women in B, and similarly, all women in B have a preference order over men in A. A matching M in G is called stable if there are no blocking pairs (ab); i.e. there do not exist \((a,b) \not \in M\) where both a and b prefer each other over their current partners in M (if there are any). In their celebrated work  [4], Gale and Shapley proposed an efficient algorithm for finding a stable matching, providing a constructive proof that stable matchings always exist.

Stable matchings have wide-spread applications (e.g., see Manlove  [3]), and many of these are large-scale. Therefore, as McDermid  [15] points out, assuming that preferences are complete and strict is not realistic. Thus, in this paper, we will focus on stable matchings in the setting where preference lists are allowed to be incomplete and contain ties. Here, a woman is allowed to be indifferent between various men, and similarly, a man may be indifferent between several women. In this setting we consider the maximum-cardinality stable matching problem where the goal is to find a stable matching of maximum cardinality.

It is well-known that, in the settings where G is either complete or preferences do not contain ties, all stable matchings have the same cardinality  [5]. Moreover, a straightforward extension of the algorithm in [4] solves our problem in these cases. When ties and incomplete preferences are permitted simultaneously, on the other hand, the problem of finding a maximum-cardinality stable matching is well-known to be NP-hard  [14]. Furthermore, Yanagisawa  [17] showed that it is NP-hard to find a \((33/29 - \varepsilon )\)-approximate, maximum-cardinality stable matching. The same author also showed that assuming the unique games conjecture (UGC) it is hard to achieve performance guarantee of \(4/3 - \varepsilon \).

On the positive side, maximum-cardinality stable matchings with ties and incomplete preferences have attracted significant attention [1, 6,7,8,9,10,11,12, 15, 16]. The best-known approximation algorithms for the problem achieve an approximation ratio of 3/2  [9, 15, 16].

How does the hardness of maximum-cardinality stable matching depend on the maximum allowed size of ties in the given instance? Huang and Kavitha  [6] recently considered the case where the size of any tie is bounded by \(L=2\). The authors proposed an algorithm and showed that its performance guarantee is at most 10/7. Chiang and Pashkovich  [2] later provided an improved analysis for the same algorithm, showing that its real performance ratio is at most 4/3, and this result is tight under the UGC  [17]. Lam and Plaxton  [13] very recently designed a \(1+(1-1/L)^L\)-approximation algorithm for the so-called one-sided special case of our problem, where only preferences of men are allowed to have ties.

1.1 Our Contribution

Our main result is captured in the following theorem. Note that the integrality gap of the natural LP relaxation for the problem is at least \((3L-2)/(2L-1)\)  [8]. Hence, the performance ratio of our algorithm matches the known lower bound on the integrality gap.

Theorem 1

Given an instance of the maximum-cardinality stable matching problem with incomplete preferences, and ties of size at most L; the polynomial-time algorithm described in Sect. 2 finds a stable matching M with

$$ |M|\ge \frac{2L-1}{3L-2} \,|\textsf {OPT}|, $$

where \(\textsf {OPT}\) is an optimal stable matching.

Our algorithm is an extension of that by Huang and Kavitha  [6] for ties of size two: every man has L proposals where each proposal goes to the acceptable women. Women can accept or reject these proposals under the condition that no woman holds more than L proposals at any point during the algorithm. Similar to the algorithm in [6], we use the concept of promotion introduced by Király  [9] to grant men repeat chances in proposing to women. In comparison to [6], the larger number of proposals in our algorithm leads to subtle changes to the forward and rejection mechanisms of women, and to further modifications to the way we obtain the output matching.

Our analysis is inspired by the analyses of both, Chiang and Pashkovich  [2], and Huang and Kavitha  [6], but requires several new ideas to extend it to the setting with larger ties. In both  [6] and  [2], the analyses are based on charging schemes: some objects are first assigned some values, called charges, and then charges are redistributed to nodes by a cost function. After a charging scheme is determined, relations between the generated total charges, and the sizes of output and optimal matchings are established, respectively, that lead to an approximation ratio. The analysis in  [6] employs a complex charging scheme that acts globally, possibly distributing charges over the entire graph. In contrast, the charging scheme in  [2] is local in nature, and exploits only the local structure of the output and optimal matchings, respectively.

We do not know of a direct way to extend the local cost-based analysis of [2] to obtain an approximation algorithm whose performance beats the best known 3/2-approximation for the general case. Indeed we believe that any such improvement must involve a non-trivial change in the charging scheme employed. As a result, we propose a new analysis that combines local and global aspects from [2, 6]. The central technical novelty in the analysis is captured by Lemma 4 that provides an improved lower bound on the cost of components. As we will see later, our new charging scheme allows for a more fine-grained accounting of augmenting paths for the output matching of our algorithm.

2 Algorithm for Two-Sided Ties of Size Up to L

We introduce some notational conventions. Let \(a', a'' \in A\) be on the preference list of \(b \in B\). We write \(a' \simeq _b a''\) if b is indifferent between \(a'\) and \(a''\), and we write \(a'>_b a''\), or \(a' \ge _b a''\) if b strongly, or weakly prefers \(a'\) over \(a''\), respectively. The preferences of men over women are defined analogously. For \(c \in A \,\cup \, B\), we let N(c) denote the set of nodes adjacent to c in G.

2.1 How Men Propose

Each man \(a\in A\) has L proposals \(p_a^1, p_a^2, \ldots , p_a^L\). A man starts out as basic, and later becomes 1-promoted before he is eventually elevated to 2-promoted status. Each man \(a \in A\) has a rejection history R(a) which records the women who rejected a proposal from a during his current promotion status. Initially, we let \(R(a)=\varnothing \), for all \(a \in A\).

Each proposal \(p_a^i\) for \(a\in A\) and \(i = 1,2,\ldots ,L\) goes to a woman in \(N(a)\setminus R(a)\) most preferred by a, and ties are broken arbitrarily. If a proposal \(p_a^i\) for \(a\in A\) and \(i = 1,2,\ldots ,L\) is rejected by a woman \(b\in B\), b is added to the rejection history of a, and subsequently, \(p_a^i\) is sent to a most preferred remaining woman in \(N(a)\setminus R(a)\).

Suppose now that R(a) becomes equal to N(a) for some man \(a \in A\). If a is either basic or 1-promoted then a’s rejection history is cleared, and a is promoted. Otherwise, if a is already 2-promoted, a stops making proposals.

2.2 How Women Decide

Each woman \(b\in B\) can hold up to L proposals, and among these more than one can come from the same man. Whenever she holds less than L proposals, newly received proposals are automatically accepted. Otherwise, b first tries to bounce one of her proposals, and if that fails, she will try to forward one of her proposals. If b can neither bounce nor forward a proposal, then b rejects a proposal.

We continue describing the details. In the following, we let P(b) and A(b) denote the set of proposals held by \(b \in B\) at the current point, and the set of men corresponding to these, respectively. Suppose that \(|P(b)|=L\), and that b receives a new proposal \(p_a^i\) for some \(a\in A\) and \(i=1,\ldots , L\).

Bounce Step. If there is a man \(\alpha \in A(b) \,\cup \, \{a\}\) and a woman \(\beta \in B\setminus \{b\}\) such that \(\beta \simeq _{\alpha } b\), and \(\beta \) currently holds less than L proposals, then we move one of \(\alpha \)’s proposals from b to \(\beta \), and we call the bounce step successful.

Forward Step. If there is a man \(\alpha \in A(b) \cup \{a\}\) and a woman \(\beta \in B \setminus \{b\}\) such that \(\beta \simeq _{\alpha } b\), at least two proposals from \(\alpha \) are present in P(b), no proposal from \(\alpha \) is present in \(P(\beta )\) and \(\beta \) is not in \(R(\alpha )\), then b forwards a proposal \(p_{\alpha }^j \in P(b)\cup \{p^i_a\}\) for some \(j=1,\ldots , L\) to \(\beta \) and the forward step is called successful. As a consequence of a successful forward step, \(\alpha \) makes the proposal \(p_{\alpha }^j\) to \(\beta \).

We point out that bounce and forward steps do not lead to an update to the rejection history of an involved man. To describe the rejection step, we introduce the following notions. For a woman \(b\in B\), a proposal \(p_{a'}^{i'}\) is called more desirable than \(p_{a''}^{i''}\) for \(a',a''\in A\) and \(i', i''=1,\ldots , L\) if b strongly prefers \(a'\) to \(a''\), or if b is indifferent between \(a'\) and \(a''\) and \(a'\) has higher promotion status than \(a''\). A proposal \(p^{i'}_{a'} \in P(b)\) is least desirable in P(b) if \(p^{i'}_{a'}\) is not more desirable than any proposal in P(b). Whenever \(b \in B\) receives a proposal \(p_a^i\), \(|P(b)|=L\), and neither bounce nor forward steps are successful, we execute a rejection step.

Rejection Step. If there is unique least desirable proposal in \(P(b) \cup \{p_a^i\}\), then b rejects that proposal. Otherwise, if there are more than one least desirable proposal in P(b), b rejects a proposal from a man with the largest number of least desirable proposals in \(P(b) \cup \{p_a^i\}\). If there are several such men, then we break ties arbitrarily. Subsequently, b is added to the rejection history of the man whose proposal is rejected.

2.3 The Algorithm

An approximate maximum-cardinality stable matching for a given instance \(G = (A \cup B, E)\) is computed in two stages.

Stage 1. Please see Algorithm 1 for the pseudo code for Stage 1.

Men propose in an arbitrary order and women bounce, forward or reject proposals as described above. The first stage finishes, when for each man \(a \in A\), one of the following two conditions is satisfied: all proposals of a are accepted; R(a) becomes equal to N(a) for the third time.

We represent the outcome of the first stage as a bipartite graph \(G' = (A \cup B, E')\) with the node set \(A \cup B\) and the edge set \(E'\), where each edge \((a, b) \in E'\) denotes a proposal from a held by b at the end of the first stage. Note that \(G'\) may be a multigraph in which an edge of the form (ab) appears with multiplicity equal to the number of proposals that b holds from a. Clearly, each node u in \(G'\) has degree at most L, denoted by \(\deg _{G'}(u) \le L\), since every man has at most L proposals that may be accepted and every woman can hold at most L proposals at any point in the first stage.

figure a

Stage 2. We compute a maximum-cardinality matching M in \(G'\) such that all nodes of degree L in \(G'\) are matched. The existence of such matching is guaranteed by Lemma 1. The result of the second stage is such a matching M, that is the output of the algorithm.

Lemma 1

There exists a matching in the graph \(G'\) such that all nodes of degree L in \(G'\) are matched. Moreover, there is such a matching M, where all nodes of degree L in \(G'\) are matched and we have

$$ |M|\ge |E'|/L\,. $$

Proof

Consider the graph \(G'=(A\cup B, E')\) and the following linear program

$$\begin{aligned} \max ~~&\sum _{e\in E'} x_e \\ \text{ s.t. }~~&\sum _{e\in \delta (u)}x_e \le 1 ~~ (u\in A\cup B) \\&\sum _{e\in \delta (u)}x_e=1 ~~ (u\in A\cup B, \deg _{G'}(u)=L) \\&x\ge 0. \end{aligned}$$

It is well-known that the feasible region of the above LP is an integral polyhedron. Moreover, the above LP is feasible as is easily seen by considering the point that assigns 1/L to each edge in \(E'\). Hence there exists an integral point optimal for this linear program. Notice, that every integral point feasible for this linear program is a characteristic vector of a matching in \(G'\), which matches all nodes of degree L in \(G'\). To finish the proof, notice that the value of the objective function calculated at \(x^\star \) equals \(|E'|/L\). Thus the value of this linear program is at least \(|E'|/L\), finishing the proof.    \(\square \)

2.4 Stability of Output Matching

Let the above algorithm terminate with a matching M. We first argue that it is stable.

Lemma 2

The output matching M is stable in \(G=(A\cup B, E)\).

Proof

Suppose for contradiction that M is not stable, i.e. suppose that there exists an edge \((a,b) \in E\) that blocks M. If b rejected a proposal from a during the algorithm, then b holds L proposals when the algorithm terminates and all these proposals are from men, who are weakly preferred by b over a. Thus the degree of b in \(G'\) is L implying that b is matched in M with a man, who is not less preferred by b than a. We get a contradiction to the statement that (ab) blocks M.

Conversely, if b did not reject any proposal from a during the algorithm, then the algorithm terminates with all L proposals of a being accepted, particularly, by women, who are weakly preferred by a over b. Therefore the degree of a in \(G'\) is L implying that a is matched in M with a woman, who is not less preferred by a than b. Again, we get a contradiction to the statement that (ab) is a blocking pair for M.    \(\square \)

2.5 Running Time

We now show that each stage of the algorithm has polynomial execution time. For the first stage, we illustrate that only a polynomial number of proposals are bounced, forwarded, or rejected during this stage. For the second stage, the proof of Lemma 2 implies that it is sufficient to find an optimal extreme solution for a linear program of polynomial size.

First, we show that proposals are bounced only polynomially many times. For every \(b \in B\), at most L proposals may be bounced to b. Indeed, with each proposal bounced to b, the number of proposals held by b increases; also, the number of proposals held by b never decreases or exceeds L during the algorithm. Hence at most L|B| proposals are bounced during the first stage.

Second, we illustrate that proposals are forwarded only polynomially many times. For each \(a\in A\), promotion status of a, and \(b\in B\) such that \((a, b) \in E\), at most one proposal of a may be forwarded to b. To see this, let \(b'\) be a woman forwarding a proposal of a to b. Notice that b cannot bounce the proposal after b receives it because, otherwise, \(b'\) could bounce it by the transitivity of indifference. Observe also that b may forward a proposal from a only if she holds another proposal from him. Then it follows from the forward step that no woman can forward a proposal of a to b as long as b holds a proposal from him. If b rejects the proposal, then she is added to the rejection history of a, and so b does not receive any proposal from a unless the promotion status of a changes. Hence at most 3|A| |B| proposals are forwarded during the first stage.

Finally, for each \(a\in A\), promotion status of a, and \(b\in B\) such that \((a, b) \in E\), b may reject at most L proposals from a. Indeed, b holds at most L proposals at any point in time, and since b is added to the rejection history of a after she rejected him, b does not receive any proposal from a unless the promotion status of a changes. Hence at most 3L|A| |B| proposals are rejected during the first stage.

3 Tight Analysis

Recall that \(\textsf {OPT}\) is a maximum-cardinality stable matching in G, and let M be the output matching defined above. If \(a \in A\) is matched with \(b \in B\) in \(\textsf {OPT}\), we write and . Similarly, we use the notations and when \(a \in A\) is matched with \(b \in B\) in M. Note that our analysis is based on graph \(G'\) and therefore all graph-related objects will assume \(G'\).

Definition 1

A man \(a \in A\) is called successful if the algorithm terminates with all of his L proposals being accepted. Likewise, a woman b is called successful if she holds L proposals when the algorithm stops. In other words, a person \(c \in A \cup B\) is successful if the degree of c in \(G'\) is L, and unsuccessful otherwise.

Definition 2

A woman is called popular if she rejected a proposal during the algorithm, and unpopular otherwise.

Remarks 1 and 2 below directly follow from the algorithm and are consequences of the bouncing step, and the rejection step, respectively.

Remark 1

Let \(a \in A\) and \(b,b'\in B\) be such that b holds a proposal from a when the algorithm finishes, \(b'\) is unsuccessful, and \(b' \simeq _a b\). Then b is unpopular.

Proof

Suppose for contradiction that b is popular. Then at some point she could not bounce or forward any one of her proposals and so she was to reject a proposal. This implies that after b became popular, whenever she received a new proposal that could be bounced, that proposal would immediately be bounced. But then, when the algorithm terminates, b holds a proposal from a, that could successfully be bounced to \(b'\), a contradiction.    \(\square \)

Remark 2

Let \(a,a' \in A\) and \(b \in B\) be such that b holds at least two proposals from a when the algorithm finishes, b rejected a proposal from \(a'\) at some point, a is basic, and \(a' \simeq _b a\). Then there is an edge \((a', b)\) in \(G'\).

Proof

Suppose for a contradiction that \((a', b) \notin G'\) holds. Let t be the most recent point in time when b rejects a proposal from \(a'\). Then it follows from the algorithm that, at t, \(a'' \ge _{b} a'\) holds for all \(a'' \in A(b)\). The rejection step also implies that, at t, there is no \(a'' \in A\) such that \(a' \simeq _b a''\), \(a''\) is basic, and b holds more than one proposal from \(a''\). Moreover, the algorithm implies that, after t, whenever she receives a new proposal from a man \(a''\) such that \(a'' <_b a'\), she will immediately reject it unless she successfully bounces or forwards it. Now, consider a point in time after t when there is a man \(a''\) such that \(a' \simeq _b a''\), b already holds a proposal from \(a''\), and receives another proposal from \(a''\). Then the rejection step implies that she will reject one of the proposals from \(a''\) unless she successfully bounces or forwards it. But then, when the algorithm terminates, b holds at least two proposals from a, a contradiction.

3.1 Analytical Techniques

In the following, we define inputs, outputs, and costs – notions that are central in the analysis of our charging scheme. Before we take a closer look at these notions and define them formally, let us discuss phenomena captured by them.

We use two different objects, inputs and outputs, to differentiate between two different viewpoints on proposals accepted when the algorithm ends. In particular, inputs are associated with the viewpoint of women on the proposals whereas outputs are associated with the viewpoint of men. The choice of terms “inputs” and “outputs” is due to the analysis in  [6] where the edges of \(G'\) are directed from men to women, and so each proposal becomes an “input” for the woman, and analogously becomes an “output” for the corresponding man.

Now we describe the ideas that motivated our definitions concerning outputs and inputs. Let \(M + \textsf {OPT}\) denote the multiset that contains the edges in M and the edges in \(\textsf {OPT}\). To establish the approximation guarantee of our algorithm, we analyze each connected component in \(M + \textsf {OPT}\). In order to show that M-augmenting paths in \(M + \textsf {OPT}\) do not lead to a large approximation guarantee, we introduce the notions of bad and good inputs as well as bad and good outputs. For example, a certain number of bad inputs and bad outputs are generated by the edges incident to the endpoints of an M-augmenting path in \(M + \textsf {OPT}\). Indeed, as we will see later, if \(a_0-b_0-a_1- \dotsc -a_k-b_k\) is an M-augmenting path in \(M + \textsf {OPT}\) of length \(2k +1\), \(k \ge 2\) where \(a_0 \in A\), then \(b_0\) has at least \(L-2\) bad inputs and \(a_k\) has at least \(L-2\) bad outputs. Then to show the approximation guarantee of \((3L-2)/(2L-1)\), we provide a way to obtain a lower bound on the number of bad inputs and bad outputs of men and women in each M-augmenting path; and later we provide an upper bound on the total number of bad inputs and bad outputs of all men and women.

To implement the above ideas, we use a charging scheme. Our charging scheme associates a cost with each man and each woman. These costs keep track of bad inputs and bad outputs: bad inputs lead to an increase of the corresponding woman’s cost and bad outputs lead to an increase of the corresponding man’s cost. We show that the total cost of all men and women is bounded above by 2L|M|. On the other side, we provide a lower bound on the total cost by giving a lower bound on the cost of each connected component in \(M + \textsf {OPT}\). These upper and lower bounds lead to the desired approximation guarantee of \((3L-2)/(2L-1)\).

3.2 Inputs and Outputs

In our analysis inputs and outputs are fundamental edge-related objects for our charging scheme. Each edge in \(G'\) generates a certain number of charges. For example, as we will see in Sect. 3.3, if an edge (ab) in \(G'\) belongs either to M or to \(\textsf {OPT}\), two charges are generated by (ab) so that one is carried to node a and one is carried to node b by cost function. To define similar charging mechanisms for the remaining types of edges in \(G'\), we first distinguish them as in the following definitions.

Definition 3

Given an edge (ab) in \(G'\), we say that (ab) is an output from \(a \in A\) and an input to \(b \in B\) if (ab) is not in \(M + \textsf {OPT}\).

To illustrate how outputs and inputs are determined, for example, let \((a,b)\in M\), \(a\in A\), \(b\in B\) and \(n_{(a,b)}\) be the number of edges of the form (ab) in the multigraph \(G'\), then the edge (ab) gives rise to the following number \(s_{(a,b)}\) of inputs (and to the same number of outputs)

$$ s_{(a,b)}:={\left\{ \begin{array}{ll} n_{(a,b)}-1 &{}\text {if }(a,b)\not \in \textsf {OPT}\\ 0 &{}\text {if }n_{(a,b)}=1\\ n_{(a,b)}-2 &{}\text {otherwise}.\\ \end{array}\right. } $$

Definition 4

An input (ab) to \(b \in B\) is called a bad input if one of the following is true:

  • b is popular and \(a >_b \textsf {OPT}(b)\).

  • b is popular, \(a \simeq _b \textsf {OPT}(b)\), but \(\textsf {OPT}(b)\) is unsuccessful.

  • b is popular, a is 1-promoted, \(\textsf {OPT}(b)\) is successful and \(M(b) \simeq _b \textsf {OPT}(b) \simeq _b a\).

An input (ab) to \(b \in B\) is a good input if it is not a bad input. In other words, an input (ab) to \(b \in B\) is a good input if one of the following is true:

  • b is unpopular.

  • b is popular and \(\textsf {OPT}(b) >_b a\).

  • b is popular, \(a \simeq _b \textsf {OPT}(b)\), \(\textsf {OPT}(b)\) is successful and a is not 1-promoted.

  • b is popular, \(a \simeq _b \textsf {OPT}(b)\), \(\textsf {OPT}(b)\) is successful, but not \(M(b) \simeq _b \textsf {OPT}(b) \simeq _b a\).

An output (ab) from a man a is called a bad output if one of the following is true:

  • b is unpopular.

  • b is popular, \(b >_a \textsf {OPT}(a)\), a is 1-promoted, but not \(M(b) \simeq _b \textsf {OPT}(b) \simeq _b a\).

  • b is popular, \(b >_a \textsf {OPT}(a)\) and a is basic.

An output from a man a is a good output if that is not a bad output. In other words, an output (ab) from a man \(a \in A\) is a good output if one of the following is true:

  • b is popular and \(\textsf {OPT}(a) \ge _a b\).

  • b is popular, \(b >_a \textsf {OPT}(a)\) and a is 2-promoted.

  • b is popular, \(b >_a \textsf {OPT}(a)\), a is 1-promoted and \(M(b) \simeq _b \textsf {OPT}(b) \simeq _b a\).

Lemma 3

There is no edge which is both a bad input and a bad output.

Proof

Assume that an edge (ab), \(a \in A\), \(b \in B\) is both a bad input to b and a bad output from a. First, consider the first case from the definition of a bad output. It trivially contradicts all the cases from the definition of a bad input. Second, consider the first case from the definition of a bad input and either the second or the third case from the definition of a bad output. Then the case (1) below is implied. Third, consider the second case from the definition of a bad input and either the second or the third case from the definition of a bad output. Then the case (2) below is implied. Finally, consider the third case from the definition of a bad input. It trivially contradicts both the second and the third case from the definition of a bad output. Thus one of the following cases is true:

  1. 1.

    \(a >_b \textsf {OPT}(b)\); \(b >_a \textsf {OPT}(a)\).

  2. 2.

    \(a \simeq _b \textsf {OPT}(b)\), and \(\textsf {OPT}(b)\) is unsuccessful; a is not 2-promoted.

In case (1), the edge (ab) is a blocking pair for \(\textsf {OPT}\), contradicting the stability of \(\textsf {OPT}\).

In case (2), since \(\textsf {OPT}(b)\) is unsuccessful, \(\textsf {OPT}(b)\) was rejected by b as a 2-promoted man. On the other hand, \(a \simeq _b \textsf {OPT}(b)\), a is not 2-promoted, and b holds a proposal from a when the algorithm terminates, contradicting the rejection step.    \(\square \)

Corollary 1

The number of good inputs is at least the number of bad outputs.

Proof

Assume for a contradiction that the number of good inputs is smaller than the number of bad outputs. Then there is an edge in \(G'\) which is a bad output but not a good input. In other words, there is an edge in \(G'\) which is both a bad output and a bad input, contradicting Lemma 3.    \(\square \)

3.3 Cost

In our charging scheme, cost is a function that assigns charges, that originate from the edges, to the nodes. More specifically, the cost of a man a is obtained by counting the edges in \(G'\) incident to a, where bad outputs contribute 2 and all other edges contribute 1. Similarly, the cost of a woman b is obtained by counting the edges in \(G'\) incident to b, to which good inputs contribute 0 and all other edges contribute 1.

In the following, let \(\deg (u)\) be the degree of the node u in \(G'\). For \(a\in A\), we define his cost as follows:

for \(b\in B\), we define her cost as follows:

For a node set \(S \subseteq A \cup B\), \(\textsf {cost}(S)\) is defined as the sum of costs of all the nodes in S.

The above definitions lead to next three remarks.

Remark 3

Let \(b \in B\) be matched in M and have at least k bad inputs. Then \(\textsf {cost}(b) \ge k + 1\).

Proof

Let \(k'\) be the number of good inputs to b. Since b is matched in M, the edge (M(b), b) is contained in \(G'\) and therefore it is not an input to b. Thus \(\deg (b) \ge k + k' + 1\). Hence, by definition of cost, \(\textsf {cost}(b) = \deg (b) - k' \ge k + 1\) holds.    \(\square \)

Remark 4

Let \(b \in B\) be matched in \(\textsf {OPT}\), have at least k bad inputs, and \((\textsf {OPT}(b),b)\in E'\) where \(E'\) is the edge set of \(G'\). Then \(\textsf {cost}(b) \ge k + 1\).

Proof

Let \(k'\) be the number of good inputs to b. Since the edge \((\textsf {OPT}(b), b)\) is in \(G'\), it is not an input to b. Thus \(\deg (b) \ge k + k' + 1\). So, by definition of cost, \(\textsf {cost}(b) = \deg (b) - k' \ge k + 1\) holds.    \(\square \)

Remark 5

Let \(b \in B\) be matched in both \(\textsf {OPT}\) and M, \(\textsf {OPT}(b) \ne M(b)\), and \((\textsf {OPT}(b),b)\in E'\) where \(E'\) is the edge set of \(G'\). Then \(\textsf {cost}(b) \ge 2\).

Proof

Let k and \(k'\) be the numbers of bad inputs and good inputs to b, respectively. Since the edges \((\textsf {OPT}(b), b)\) and (M(b), b) are contained in \(G'\), they are not inputs to b. Thus \(\deg (b) \ge k + k' + 2\). So, by definition of cost, \(\textsf {cost}(b) = \deg (b) - k' \ge k + 2 \ge 2\) holds.    \(\square \)

3.4 The Approximation Ratio

Let \(\mathcal {C}(M + \textsf {OPT})\) denote the set of connected components in a graph induced by the edge set \(M + \textsf {OPT}\). Lemma 4 below bounds the cost of \(M + \textsf {OPT}\). For the proof of this lemma, we refer the reader to the full version of the paper on arXiv

Lemma 4

\(\sum _{C \in \mathcal {C}(M + \textsf {OPT})} \textsf {cost}(C) \ge (L+1)|\textsf {OPT}| + (L-2)(|\textsf {OPT}|-|M|)\).

We are ready to prove our main theorem, and restate it here for completeness.

Theorem 1

Given an instance of the maximum-cardinality stable matching problem with incomplete preferences, and ties of size at most L; the polynomial-time algorithm described in Sect. 2 finds a stable matching M with

$$ |M|\ge \frac{2L-1}{3L-2} \,|\textsf {OPT}|, $$

where \(\textsf {OPT}\) is an optimal stable matching.

Proof

By Lemma 1, we have

$$ |M| \ge \frac{|E'|}{L} = \sum _{u \in A \cup B} \frac{\deg (u)}{2L}\,. $$

By definition of cost and by Corollary 1, we obtain

$$ \sum _{u \in A \cup B} \deg (u) \ge \textsf {cost}(A \cup B)\,. $$

Combining the above inequalities, we get

$$ 2L|M| \ge \sum _{u \in A \cup B} \deg (u) \ge \textsf {cost}(A \cup B) = \sum _{C \in \mathcal {C}(M + \textsf {OPT})}\textsf {cost}(C)\,, $$

By Lemma 4, we obtain

$$ 2L|M| \ge \sum _{C \in \mathcal {C}(M + \textsf {OPT})}\textsf {cost}(C) \ge (L+1)|\textsf {OPT}| + (L-2)(|\textsf {OPT}|-|M|)\,. $$

By rearranging the terms, we obtain

$$ 2L |M| +(L-2) |M|\ge (L+1)|\textsf {OPT}| +(L-2)|\textsf {OPT}|\,, $$

and so we obtain the desired inequality

$$ (3L-2) |M|\ge (2L-1)|\textsf {OPT}|\,. $$

   \(\square \)

3.5 Tightness of the Analysis

The following example shows that the bound in Theorem 1 is tight.

Fig. 1.
figure 1

An instance with ties of size at most L, \(L \ge 2\) for which the algorithm outputs a stable matching M with \(|\textsf {OPT}| / |M| = (3L-2)/(2L-1)\)

Example 1

In Fig. 1, the preference list of each individual is ordered from a most preferred person to a least preferred one, where individuals within parentheses are tied. For example, \(a_1^{\beta }\) is indifferent between all the women in his preference list except \(b_1^{\beta }\), who is less preferred than the others.

It is straightforward to check that there exists a unique maximum-cardinality stable matching, namely \(\textsf {OPT}= \{(a_0, b_0)\} \cup \{(a_i^{j}, b_i^{j}) \mid i = 1,\ldots ,L-1, \ j = \alpha ,\beta , \gamma \}\). We show that there exists an execution of the algorithm which outputs the matching \(M = \{(a_0, b_0)\} \cup \{(a_i^{\alpha }, b_i^{\gamma }) \mid i = 1,\ldots ,L-1 \} \cup \{(a_i^{\beta }, b_i^{\alpha }) \mid i = 1,\ldots ,L-1 \}\), leading to the ratio \(|\textsf {OPT}| / |M| = (3L-2)/(2L-1)\).

Proof

The following is an execution of the algorithm which leads either to the matching M or a matching with the size of M.

  • \(a_0\) makes one proposal to every woman in his list; the women accept.

  • \(a_i^{\alpha }\) for all \(i = 1,\ldots ,L-1\) makes one proposal to every woman in his list; the women accept.

  • \(a_i^{\beta }\) for all \(i = 1,\ldots ,L-1\) makes one proposal to every woman except the last one in his list; the women accept.

  • \(a_i^{\gamma }\) starts to propose \(b_i^{\gamma }\) for all \(i = 1,\ldots ,L-1\), but each time \(a_i^{\gamma }\) makes a proposal, the proposal is rejected; \(a_i^{\gamma }\) gives up.

   \(\square \)