1 Introduction

In this paper, we focus on the Team Formation problem (TF). One of its simplest and most abstract forms consists in forming a team of agents with minimum cost that meets a given set of requirements. The problem is equivalent to the set covering and hitting set problems [1]. We are given a set of agents, where each agent is associated with a set of skills and a deployment/hiring cost. The TF problem consists in finding a team T (i.e., a subset of agents) of minimal overall cost that is efficient, i.e., such that each skill is possessed by at least one agent from T. This problem is well-known to be \(\textbf{NP}\)-hard [1, 2].

In realistic settings, there may be uncertainty about agents, in particular after deployment: agents previously deemed capable may become unable to perform their duties, e.g., due to illness. It is therefore important to consider resilience properties in TF, i.e., to proactively seek to form an efficient team that is robust to unexpected changes.

Robustness [2] and recoverability [3] are complementary notions in TF, each with its own advantages and disadvantages. A team is said to be k-robust if it remains efficient even in the event that any k agents are removed from it [2]. Robust TF consists in finding a k-robust team of minimal deployment cost. Interestingly, computing an optimal robust team is not harder than computing an optimal efficient team without considering robustness [2]. However, the deployment cost of a robust team may be prohibitively high, as providing such strong guarantees is only possible by introducing a high degree of redundancy, i.e., each skill must be covered by at least \(k + 1\) agents.

An alternative to robustness is recoverability: a team is considered k-recoverable if, after losing any k agents, it can be “easily repaired” by hiring new agents at low cost. Demirović et al. [3] empirically showed that the overall cost of a recoverable team (initial deployment plus recovery) is often lower than the initial deployment cost of a robust team, providing a trade-off: the cost of the team may be lower at the expense of allowing the team to be dysfunctional for some period after disruption. However, the problem of computing a recoverable team is \(\mathbf {\Sigma ^P_3}\)-hard [3], making it difficult, if not impossible, to apply in practice. Moreover, recoverability does not provide any coverage guarantee during the disaster phase. As a result, a large number of critical skills may become uncovered for a certain amount of time, during which the system loses most of its functionality. Depending on the application, this may be highly undesirable.

Let us introduce an example illustrating the problem and notions.

Example 1

The organizers of a special exhibition have a budget of 900 to hire professional guides. The attendees are expected to be from China (50%), Japan (40%), and France (10%). To enhance the attendee experience, the organizers plan to hire proficient Chinese, Japanese, and French speakers so that the exhibits can be explained to the attendees in their native language. A number of guides (called agents in the following) are available for hire. Type 1 agents are monolingual. It costs 100 to hire a type 1 Chinese-speaking or Japanese-speaking agent, and 150 for a French-speaking agent. Let us denote by C, J, and F a type 1 Chinese-, Japanese-, and French-speaking agent, respectively. Type 2 agents are bilingual and are denoted by CJ, CF, and FJ according to their language pair. It costs 180 to hire CJ, and 230 for CF or FJ. Agents are paid in advance and should be contacted at least one day ahead of the event; otherwise, they may only attend the afternoon session while charging the same price.

A minimum-cost alternative (Plan I) is to hire the team \(\{C, FJ\}\). By doing so, agent C may help Chinese speakers, whereas agent FJ may help French and Japanese speakers. This plan corresponds to an optimal team for the standard TF problem and costs 330, i.e., it only considers covering the required skills (language proficiency) while minimizing cost, without considering robustness. In the unfortunate case where both of these agents fall ill on the day of the event, two new agents may be hired in an emergency at a cost of 330.

However, under the same circumstances an alternative with a cheaper recovery exists (Plan II), which consists in initially hiring the team \(\{C, F, J\}\) at a cost of 350. In the worst case, where two agents including F are absent on the day of the exhibition, one can hire for the afternoon a type 2 agent possessing the two language skills that have been lost, e.g., if agents F and J are missing from the hired team, one could ask for help from an FJ agent. Plan II is slightly more expensive than Plan I (350 vs. 330), but its recovery cost is 230 instead of 330, making this alternative arguably preferable to the first one. Plan II corresponds to an optimal 2-recoverable team.

However, in both Plans I and II, if we were to lose two agents at the last minute, then the event would be without support for two languages for the whole morning (no support for any language in the case of Plan I).

Assuming we would like to be robust to a loss of two agents, an alternative plan would be to form the team \(\{C, CJ, CF, FJ, FJ\}\) (Plan III): since each language is spoken by at least three agents, losing any two agents would still result in a team that can provide support in all three languages. Plan III corresponds to an optimal 2-robust team. However, Plan III is quite expensive: it costs 970, which exceeds the initial budget of 900, so the organizers cannot afford it.

The organizers would be happy with an alternative that is more affordable than Plan III, while still being “robust” to potential losses. Notably, only 10% of the attendees are expected to speak only French, and French-speaking agents are more costly to hire than the others. Thus a reasonable alternative is to hire the team \(\{CJ, CJ, CJ, F\}\) (Plan IV) at a cost of 690. Losing two agents from the team would still guarantee that Chinese and Japanese are covered by the remaining team members, while losing agent F would only result in a coverage loss of 10% of the attendees. In addition, repairing the team would not cost more than 150 in the worst case (one would need to hire an F agent), so the overall cost of 840 would still remain within the budget.

Our paper aims to introduce the solution concept illustrated above in Plan IV. A team T is said to be \(\langle k, t\rangle\)-partially robust (\(t \in [0, 1]\)) if whenever k agents are removed from team T, some “proportion” (reflected by t) of the overall set of skills remains covered. Partially Robust TF (PR-TF) is the problem of forming an optimal \(\langle k, t\rangle\)-partially robust team. Plan IV in the above example corresponds to an optimal \(\langle 2, 0.9\rangle\)-partially robust team. This notion generalizes (full) robustness: a team is \(\langle k, 1\rangle\)-partially robust if and only if it is k-robust. Computationally speaking, we show that the decision problem related to PR-TF is \(\mathbf {\Sigma ^P_2}\)-complete, so it lies “in-between” robust TF and recoverable TF. We empirically show that forming an optimal partially robust team combines the advantages of robustness and recoverability. On the one hand, a partially robust team may be computed significantly more efficiently than a recoverable one, and by definition it provides a skill coverage guarantee during the disaster phase. On the other hand, the overall cost of a partially robust team may be lower than the initial deployment cost of a “fully” robust team.

We empirically study the benefits of our novel partial robustness notion on a number of existing benchmarks used in [3]. We also introduce a new set of benchmarks related to the facility location problem. This problem consists in deploying a set of facilities (e.g., health centers, antennas, schools, shelters) on a populated map to maximize a certain population coverage while minimizing the overall deployment cost [4]. The notion of partial robustness is of particular importance for this type of problem. When facilities are interpreted as agents and the population as (weighted) skills, the goal is to guarantee that a high percentage of the population can still access a fully functional facility after some of the facilities break down.

This article is an extended and revised version of [5], a paper in which we introduced the notion of partial robustness, reported the computational complexity of the corresponding decision problem, presented a complete algorithm (PR) and reported some empirical results. The present article extends the results reported in [5] in several respects:

  • In addition to our initial algorithm PR, we introduce another complete algorithm (PR_A) that is anytime. While algorithm PR needs to terminate before returning a partially robust solution, algorithm PR_A starts with a sub-optimal partially robust solution (sub-optimal in terms of deployment cost) and improves it over time. Interestingly, algorithm PR_A empirically outperforms algorithm PR in terms of runtime to reach optimality on a large set of instances. Our empirical evaluation shows that we may obtain near-optimal solutions after a small number of iterations. Despite the high theoretical complexity of achieving partial robustness, algorithm PR_A allows one to compute satisfactory partially robust solutions in a very short amount of time on instances of reasonable size. This is particularly interesting for large instances, for which algorithm PR does not compute any solution within 3600 s. Both algorithms are presented and detailed in Sect. 5.

  • We introduce two new “cuts” (named cut and cut+ in this paper) that allow one to learn from counter-examples and prune the search space, improving the efficiency of both algorithms PR and PR_A (cf. Sect. 5.3). These cuts improve over the single cut reported in our conference paper.

  • We provide software to generate facility location instances discussed above, whose structure and size can be tuned by a number of parameters. This was used to generate four sets of benchmarks of different sizes, described briefly in Sect. 6 and in more detail in Appendix B. We made our instance generator, together with the implementation of our algorithms, publicly available [6].

  • We have extended our empirical analysis beyond the conference version. This includes the evaluation of algorithms PR and PR_A, the comparison of our different cuts, and the additional sets of benchmarks. These results further show the advantages of our novel anytime algorithm PR_A.

  • The proofs of our formal propositions are available in Appendix A.

2 Related work

Team formation (TF) has received ever-growing interest from practitioners and researchers over the past two decades; see, e.g., [7] for a recent overview of applications, computational complexity, and resolution methods for TF. In particular, notions of resilience and robustness in multiagent systems, both separately and in combination, have recently attracted attention in a variety of contexts, including the design of new solution concepts, their formalization in the underlying multiagent frameworks, and the computational issues that follow.

From the conceptual perspective, robustness is the ability to withstand adversarial conditions without negative impact on performance [8, 2]. Recoverability, on the other hand, is the capability to restore the functionality of the system after a disturbance [9]. Both notions have been separately formalized in TF and analyzed from the computational viewpoint [3, 2]: the decision problems related to robust TF and recoverable TF have been shown to be respectively \(\textbf{NP}\)-complete and \(\mathbf {\Sigma _3^P}\)-complete. It turns out that recoverability can be seen as a generalization of robustness in TF, as pointed out in [3]. Alternative solution concepts derived from robustness and resilience have recently been considered in TF, including team diversity [10], a key concept for the formation of synergistic teams; settings where skills are replaced by (complex) tasks [11]; and team stabilizability [12]. The notion of robustness has also been imported into some variants of coalition formation, including hedonic games [13] and multi-team formation [14, 15]. Uncertainty issues have also been considered in the redundant multi-agent task allocation problem [16], which involves uncertain agent-task assignment costs, and in cooperative games with other failure models [17,18,19,20,21]. These frameworks depart from TF in both their structure and the underlying goals, which involve notions of stability and fairness, e.g., when some underlying team utility or value is distributed among the agents.

Closely related to TF is the framework of Coalition Structure Generation (CSG) [22], in which a number of solution concepts considering agent failure have also been formalized in recent years. In CSG, we are given a set of agents and a characteristic function that associates every subset of agents with a value representing its utility. The goal is to form a coalition structure, i.e., a partition of the given agents into a set of teams such that the sum of the utilities of all teams is maximized. This problem is known to be \(\textbf{NP}\)-hard in the general case [23] and many algorithms have been proposed for solving it, both in the case where the characteristic function is provided extensionally (i.e., as a table with \(2^n\) entries, n being the number of agents) [24,25,26,27], and when concise representation languages for characteristic functions are considered, such as marginal contribution networks (MC-nets) [28], synergy coalition groups (SCGs) [29], skill-based representations [30], agent-type representations [31], and coalition resource games [32], among others. Robustness and stochastic notions have recently been investigated in CSG. Robustness in CSG was first introduced in [33] and is a notion similar to robust TF: a coalition structure is k-robust if its global performance is kept above a certain threshold value whenever at most k agents are removed from it. It has been proved that the underlying decision problem for robust CSG is \(\mathbf {\Sigma _2^P}\)-complete in the general case [33], and that \(\mathbf {\Sigma _2^P}\)-hardness still holds when the characteristic function is represented by means of a coalition resource game [34]. Forming coalition structures under uncertainty has also been considered in the so-called probabilistic CSG framework [35, 36], where the attendance of agents is of a stochastic nature. In [36] the problem was shown to be \(\textbf{NP}^{\textbf{PP}}\)-hard in the general case, but it remains in \(\textbf{NP}\) for some natural classes of problems where the characteristic function is represented by means of an MC-net. CSG and TF share obvious similarities in that in both cases the goal is to form teams (in the case of TF, a single team) to accomplish a certain (set of) task(s). However, CSG and TF depart from each other in their core structure: in TF, a single team is formed out of a pool of agents and the selection of each agent involves a cost, whereas in CSG a set of agents is already preselected and must be split into different teams, each of which is implicitly associated with a certain task to perform. Indeed, while the standard CSG and TF decision problems can be viewed as equivalent from the computational viewpoint (they are both \(\textbf{NP}\)-complete), the same cannot be stated for their robust counterparts: robust TF is \(\textbf{NP}\)-complete [2], whereas robust CSG is \(\mathbf {\Sigma _2^P}\)-complete [33]. One direct consequence of this observation is that results in robust CSG cannot be directly adapted to TF.

Notably, TF is equivalent to the well-known set cover problem (SCP) [37], which is intrinsically connected to the facility location problem (FLP) [38]. There is a vast literature investigating extensions and variants of this problem (see [39] for a survey), a number of which deal with uncertainties [40,41,42,43,44,45,46,47]. These works consider different uncertainty aspects, such as the reliability of the service provided by a facility, or reliability that decays with the distance from a facility to a customer. All of these considerations are of a stochastic nature, and the main goal is to find a set of facilities that maximizes some expected coverage. Thus, these notions depart from the robustness notions considered in this paper, which are based on worst-case scenarios given k agent losses. In [48] the authors consider a variant of the FLP called the maximal covering location problem (MCLP). In the MCLP, the goal is to maximize the amount of demand covered within an acceptable service distance by locating a given fixed number of facilities. Similar to our consideration of skill coverage, the key assumption in the MCLP is that coverage is binary: each customer is either fully covered, if there is a facility within acceptable distance, or not covered at all. This assumption has been relaxed in subsequent works, leading to the alternative notion of partial covering problem [49,50,51,52]. These notions are intended to maximize a coverage degree in the initial deployment step, and thus do not account for potential damaging events after deployment. More closely related to our notion of partial robustness are the notions introduced for FLPs in [53, 54]. In [54], the authors introduced the r-interdiction covering problem, in which facilities and services can be lost due to natural or man-made disasters. The problem to be solved is stated from the viewpoint of the attacker: given some predefined facility deployment, the goal is to find a subset of r facilities which, when removed, maximizes the resulting drop in coverage. The notion can be viewed as an inverse problem of the standard FLP/SCP; notably, the problem still lies in \(\textbf{NP}\). In [53] the defensive maximal covering problem is introduced. The notion is defined for a specific type of facility location problem in which the coverage of facilities is induced by “weighted links” connecting the candidate facility nodes in a network. The authors then consider a leader-follower formulation of the problem: the leader wants to locate facilities to maximize demand coverage, while the follower’s goal is to disconnect the most damaging links once the facilities are deployed. While the notion bears similarity to our notion of partial robustness, in [53] the potential loss concerns not the facilities themselves but the links between the candidate facility locations. Moreover, the underlying leader-follower model is formulated as a pair of independent problems, which are designed to be solved sequentially; this keeps both problems in \(\textbf{NP}\) and thus departs from our work.

From the algorithmic viewpoint, our approach to computing optimal partially robust teams is similar to Counter-Example-Guided Abstraction Refinement (CEGAR) [55,56,57], one of the most successful approaches for QBF (Quantified Boolean Formulae) solving. Indeed, as will be shown in Sect. 4, the decision problem related to our notion of partial robustness is in \(\mathbf {\Sigma _2^P}\) (cf. Prop. 4), so it is similar in structure to a 2QBF problem \(\exists X \forall Y \varphi\), where \(Var(\varphi ) = X \cup Y\) [58]. Such problems have a natural interpretation as a two-person game between an “existential” player and a “universal” player [59]: the goal of the existential player is to find a valuation of X that cannot be refuted by the universal player through a valuation of Y. In a CEGAR-based algorithm, computing a solution \(\omega _X\) is done by iteratively searching for a solution of an “abstracted”, simplified problem. If such an “abstract” solution is found, one needs to check whether it is an actual solution of the original problem by searching for a certificate \(\omega _Y\) showing that \(\omega _X\) is a counter-example. If no such certificate \(\omega _Y\) is found for the abstract solution \(\omega _X\), then \(\omega _X\) is a solution of the original problem. Otherwise, \(\omega _X\) is blocked by taking advantage of the certificate \(\omega _Y\) to refine the abstracted problem. Our algorithms for computing optimal partially robust teams are similar in essence. Initially a simplified problem is considered, and the algorithm iteratively computes a new candidate team. If a certificate is found showing that the candidate team is a counter-example, i.e., it does not meet the partial robustness constraints, then a new constraint is generated to prune the search space, and the process iterates until an optimal partially robust team is found. In this context, a certificate consists of a set of agents whose removal from the candidate team results in a violation of the partial robustness constraints.

3 Preliminaries

This section recalls some preliminaries about basic notions of computational complexity, the Team Formation (TF) problem, and its extensions to Robust TF [2] and Recoverable TF [3].

3.1 Computational complexity

We assume that the reader is familiar with the complexity class \(\textbf{NP}\) (see [60] for more details). Higher complexity classes are defined using oracles. In particular, \(\mathbf {\Sigma _2^P} = \textbf{NP}^{\textbf{NP}}\) (resp. \(\mathbf {\Sigma _3^P} = \textbf{NP}^{\mathbf {\Sigma _2^P}}\)) corresponds to the class of decision problems that can be solved in polynomial time by a non-deterministic Turing machine using an oracle for \(\textbf{NP}\) (resp. \(\mathbf {\Sigma _2^P}\)).

3.2 Team formation

Let us formalize the (standard) TF problem [2].

Definition 1

(TF Problem Description) A TF problem description is a tuple \(\langle A, S, f, \alpha \rangle\) where \(A = \{a_1, \ldots ,a_n\}\) is a set of agents, \(S = \{s_1, \ldots , s_m\}\) is a set of skills, \(f: A \mapsto {\mathbb {N}}\) is a deployment cost function, and \(\alpha : A \mapsto 2^S\) is an agent-to-skill function.

A team is a subset of agents \(T \subseteq A\). It is possible to extend the cost function f to teams T as \(f(T) = \sum _{a_i \in T}{f(a_i)}\). Likewise, the agent-to-skill function \(\alpha\) is extended to teams T as \(\alpha (T) = \bigcup _{a_i \in T}{\alpha (a_i)}\). A standard expected property in Team Formation is efficiency: a team \(T \subseteq A\) is efficient if all skills from S are covered by T, i.e., when \(\alpha (T) = S\). An optimal team for TF is an efficient team minimizing the cost function. The corresponding decision problem \(\textsf{DP}\)-\(\textsf{TF}\) asks, given as input a TF problem description \(\langle A, S, f, \alpha \rangle\) where f and \(\alpha\) are computed in polynomial time and a threshold \(c \in {\mathbb {N}}\), whether there exists an efficient team T such that \(f(T) \le c\). This problem is equivalent to the well-known set cover problem [1]:

Proposition 1

( [2]) \(\textsf{DP}\)-\(\textsf{TF}\) is \(\textbf{NP}\)-complete.
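As an illustration of these definitions, the following brute-force sketch (hypothetical helper names, exponential in the number of agents, so only meant for tiny instances) recovers Plan I from the introductory example; the data dictionaries encode that toy instance:

```python
from itertools import combinations

# Toy instance from the introductory example (Sect. 1); names are illustrative.
AGENTS = ["C", "F", "J", "CJ", "CF", "FJ"]
SKILLS = {"zh", "fr", "ja"}
COST = {"C": 100, "F": 150, "J": 100, "CJ": 180, "CF": 230, "FJ": 230}
ALPHA = {"C": {"zh"}, "F": {"fr"}, "J": {"ja"},
         "CJ": {"zh", "ja"}, "CF": {"zh", "fr"}, "FJ": {"fr", "ja"}}

def optimal_efficient_team(agents, skills, f, alpha):
    """Exhaustive search for a minimum-cost efficient team."""
    best_team, best_cost = None, float("inf")
    for r in range(len(agents) + 1):
        for team in combinations(agents, r):
            covered = set().union(*(alpha[a] for a in team)) if team else set()
            cost = sum(f[a] for a in team)
            if covered == skills and cost < best_cost:
                best_team, best_cost = set(team), cost
    return best_team, best_cost

print(optimal_efficient_team(AGENTS, SKILLS, COST, ALPHA))
# -> a team of cost 330, e.g. {'C', 'FJ'} (Plan I); ties are broken by enumeration order.
```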

3.3 Robust TF

Definition 2

(Robust Team [2]) Given a TF problem description \(\langle A, S, f, \alpha \rangle\) and \(k \in {\mathbb {N}}\), a team T is said to be k-robust if for every set of agents \(T' \subseteq T\) such that \(|T'| \le k\), the team \(T \setminus T'\) is efficient.

Robustness generalizes efficiency: a team is 0-robust if and only if it is efficient. Interestingly, despite this generalization, computing an optimal k-robust team (for any \(k \ge 0\)) does not lead to a computational shift. Indeed, the decision problem for robustness (labeled \(\textsf{DP}\)-\(\textsf{RobTF}\)) asks, given as input a TF problem description \(\langle A, S, f, \alpha \rangle\) where f and \(\alpha\) are computed in polynomial time and c, k in \({\mathbb {N}}\), if there exists a k-robust team \(T \subseteq A\) such that \(f(T) \le c\). Then:

Proposition 2

( [2]) \(\textsf{DP}\)-\(\textsf{RobTF}\) is \(\textbf{NP}\)-complete.

This problem still lies in \(\textbf{NP}\) because checking whether a given team T is k-robust, despite its combinatorial nature, is equivalent to checking whether each skill from S is possessed by at least \(k + 1\) agents from T; and this task can be performed in polynomial time. The goal of robust TF (RobTF) is to find an optimal k-robust team, i.e., a k-robust team T such that f(T) is minimal.
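This characterization translates directly into a short check; the sketch below reuses the toy dictionaries from the previous snippet, and represents a team as a list so that repeated hires (as in Plan III) can be modeled:

```python
def is_k_robust(team, skills, alpha, k):
    """k-robustness check: every skill must be possessed by at least
    k + 1 members of the team (polynomial time, cf. the discussion above)."""
    return all(sum(1 for a in team if s in alpha[a]) >= k + 1 for s in skills)

# Plan III from the introductory example is 2-robust:
print(is_k_robust(["C", "CJ", "CF", "FJ", "FJ"], SKILLS, ALPHA, 2))   # True
```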

3.4 Recoverable TF

Recoverable TF (RecTF) consists in finding a team that can be repaired after k agents are removed from it. The notion is based on an extension of a TF problem description (cf. Def. 1):

Definition 3

(RecTF Problem Description [3]) A RecTF problem description is a tuple \(\langle A, S, f, \alpha , h\rangle\) where \(\langle A, S, f, \alpha \rangle\) is a TF problem description and \(h: A \mapsto {\mathbb {N}} \cup \{+\infty \}\) is a recovery cost function.

A RecTF problem description considers in addition a recovery cost function h that defines the cost of deploying a “rescue” team. It adds more flexibility to the framework: some agents \(a_i\) may be deployed at a higher cost later in an emergency situation (\(h(a_i) > f(a_i)\)) or not be available at all (\(h(a_i) = +\infty\)). Similar to f, for any team T one sets \(h(T) = \sum _{a_i \in T}{h(a_i)}\). Given a team \(T \subseteq A\) and \(T' \subseteq T\), \(rcS(T, T')\) is defined as the cost of the cheapest team \(T_{rec}\) such that \((T {\setminus } T') \cup T_{rec}\) is efficient:

$$\begin{aligned} rcS(T, T') = \min _{T_{rec} \subseteq A\setminus T, (T\setminus T')\cup T_{rec} \text{ is } \text{ efficient }}{h(T_{rec})}. \end{aligned}$$

The k-recovery cost of T is then defined as the highest value \(rcS(T, T')\) over all sets \(T'\) of size at most k:

$$\begin{aligned} rc(T, k) = \max _{T' \subseteq T, |T'| \le k}{rcS(T, T')}. \end{aligned}$$
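For very small instances, rcS and rc can be evaluated by direct enumeration. The brute-force sketch below uses hypothetical helper names and is exponential (it is only meant to make the definitions concrete, not to reflect the actual \(\mathbf {\Sigma _3^P}\) machinery); h is any recovery cost dictionary, possibly mapping unavailable agents to float('inf'):

```python
from itertools import combinations

def rcS(team, lost, pool, skills, alpha, h):
    """Cheapest recovery: min h(T_rec) over T_rec drawn from pool ∖ team such
    that (team ∖ lost) ∪ T_rec is efficient (float('inf') if no repair exists)."""
    remaining = [a for a in team if a not in lost]
    outside = [a for a in pool if a not in team]
    best = float("inf")
    for r in range(len(outside) + 1):
        for t_rec in combinations(outside, r):
            covered = set().union(*(alpha[a] for a in remaining + list(t_rec)), set())
            if covered >= skills:
                best = min(best, sum(h[a] for a in t_rec))
    return best

def rc(team, k, pool, skills, alpha, h):
    """k-recovery cost: worst-case rcS over all losses of at most k agents."""
    return max(rcS(team, set(lost), pool, skills, alpha, h)
               for r in range(min(k, len(team)) + 1)
               for lost in combinations(team, r))
```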

Definition 4

(Recoverable Team [3]) Given a RecTF problem description \(\langle A, S, f, \alpha , h\rangle\) and non-negative integers k, r, a team T is said to be \(\langle k, r\rangle\)-recoverable if T is efficient and \(rc(T, k) \le r\).

Recoverability generalizes robustness: if \(h(a_i) > 0\) for every \(a_i \in A\), a team T is \(\langle k, 0\rangle\)-recoverable if and only if T is k-robust. The decision problem for recoverability (labeled \(\textsf{DP}\)-\(\textsf{RecTF}\)) asks, given as input a RecTF problem description \(\langle A, S, f, \alpha , h\rangle\) where f, \(\alpha\) and h are computed in polynomial time and c, k, r in \({\mathbb {N}}\), if there exists a \(\langle k, r\rangle\)-recoverable team \(T \subseteq A\) such that \(f(T) \le c\). It turns out that this problem is much harder than its robustness counterpart:

Proposition 3

( [3]) \(\textsf{DP}\)-\(\textsf{RecTF}\) is \(\mathbf {\Sigma _3^P}\)-complete.

The goal of RecTF is, given a fixed non-negative integer k, to compute an optimal k-recoverable team. As opposed to RobTF, optimality in RecTF can be defined in several ways. Indeed, after fixing the robustness parameter k, one is left with two functions to minimize for a team T: its deployment cost f(T) and its k-recovery cost rc(T, k), bounded by the input parameters c and r, respectively, in the decision problem \(\textsf{DP}\)-\(\textsf{RecTF}\). For instance, in [3] the authors defined optimality as a single-objective problem: an optimal k-recoverable team is therein defined as a team T that is \(\langle k, r\rangle\)-recoverable and whose overall cost \(f(T) + rc(T, k)\) is minimal.

4 Partial robustness in TF

We introduce a new solution concept for TF called partial robustness. Intuitively, a team is partially robust if it is efficient, and if after removing a certain number of agents from it, the residual team covers a certain proportion of the set of all skills. Thus, it makes sense to associate each skill with a weight to emphasize its relative importance:

Definition 5

(Weighted TF Problem Description) A weighted TF problem description is a tuple \(\langle A, S, f, w, \alpha \rangle\) where \(\langle A, S, f, \alpha \rangle\) is a TF problem description and \(w: 2^S \mapsto [0,1]\) is a skill weight function such that \(w(S) = 1\) and w is monotone, i.e., \(\forall S_1, S_2 \subseteq S\), \(w(S_1) \le w(S_1 \cup S_2)\).

For every \(s_j \in S\), \(w(\{s_j\})\) is simply denoted by \(w(s_j)\). A natural way to define w is to consider a normalized weighted sum function \(w_\Sigma\), satisfying \(w_\Sigma (S') = \sum _{s_j \in S'}{w_\Sigma (s_j)}\) for each \(S' \subseteq S\), and \(\sum _{s_j \in S}{w_\Sigma (s_j)} = 1\). Such a function satisfies the conditions of Def. 5, and we use it in all benchmarks presented in Sect. 6.

The coverage of a team T, denoted by cov(T), is defined as:

$$\begin{aligned} cov(T) = w(\alpha (T)). \end{aligned}$$

The k-partial coverage of a team T, denoted by pc(T, k), is defined as:

$$\begin{aligned} pc(T, k) = \min _{T' \subseteq T, |T'| \le k}{cov(T \setminus T')}. \end{aligned}$$

We are ready to define formally the notion of partially robust team:

Definition 6

(Partially Robust Team) Given a weighted TF problem description \(\langle A, S, f, w, \alpha \rangle\), \(k \in {\mathbb {N}}\) and a rational number \(t \in [0, 1]\), a team T is said to be \(\langle k, t\rangle\)-partially robust if T is efficient and \(pc(T, k) \ge t\).

A team is \(\langle k, t\rangle\)-partially robust if, whenever k agents are removed from it, the coverage of the residual team is not lower than t. For instance, if one wants to guarantee that 95% of the weighted sum of all skills is covered after a loss of k agents, one simply considers a normalized weighted sum function \(w = w_\Sigma\) and sets \(t = 0.95\). Whereas k-robustness essentially reports only a binary value (either the team is k-robust or it is not), partial robustness provides more fine-grained information regarding the team’s robustness. Noteworthy, partial robustness generalizes robustness since a team is k-robust if and only if it is \(\langle k, 1\rangle\)-partially robust.
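A brute-force evaluation of pc(T, k) makes the definition concrete; the sketch below (hypothetical helper, exponential in |T|, subject to the usual floating-point caveats) confirms that Plan IV from the introductory example is \(\langle 2, 0.9\rangle\)-partially robust:

```python
from itertools import combinations

def pc(team, k, w, alpha):
    """k-partial coverage: worst-case covered weight after removing at most
    k agents (the team is a list, so repeated hires are allowed)."""
    worst = float("inf")
    for r in range(min(k, len(team)) + 1):
        for idx in combinations(range(len(team)), r):
            rest = [a for i, a in enumerate(team) if i not in idx]
            covered = set().union(*(alpha[a] for a in rest)) if rest else set()
            worst = min(worst, sum(w[s] for s in covered))
    return worst

# Plan IV: three CJ agents and one F agent, with skill weights (0.5, 0.4, 0.1).
w = {"zh": 0.5, "ja": 0.4, "fr": 0.1}
alpha = {"CJ": {"zh", "ja"}, "F": {"fr"}}
print(pc(["CJ", "CJ", "CJ", "F"], 2, w, alpha))   # 0.9 -> <2, 0.9>-partially robust
```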

The decision problem for partial robustness (labeled \(\textsf{DP}\)-\(\textsf{PR}\)-\(\textsf{TF}\)) asks, given a weighted TF problem description \(\langle A, S, f, w, \alpha \rangle\) where f, w and \(\alpha\) are computed in polynomial time, non-negative integers c, k, and a rational number \(t \in [0, 1]\), if there exists a \(\langle k, t\rangle\)-partially robust team \(T \subseteq A\) such that \(f(T) \le c\). We show below that the computational complexity of this problem lies “in-between” its robustness and recoverability counterparts:

Proposition 4

\(\textsf{DP}\)-\(\textsf{PR}\)-\(\textsf{TF}\) is \(\mathbf {\Sigma _2^P}\)-complete.

Optimality is defined similarly to the efficient and robust TF cases: T is an optimal \(\langle k, t\rangle\)-partially robust team if T is \(\langle k, t\rangle\)-partially robust and f(T) is minimal.

Before concluding this section, let us illustrate the notions introduced so far on our introductory example of organizing a special exhibition and hiring professional guides (cf. Sect. 1):

Example

(continued) Table 1 summarizes the deployment cost, recovery cost, overall cost and coverage of the teams \(T_1, \ldots , T_4\), which correspond to plans I, ..., IV, respectively, described in the introductory example. The team \(T_1\) is an optimal efficient team: it has the lowest deployment cost \(f(T_1)\) among all possible efficient teams. Likewise, the team \(T_3\) is an optimal 2-robust team and the team \(T_4\) is an optimal \(\langle 2, 0.9\rangle\)-partially robust team. The team \(T_2\) is an optimal 2-recoverable team: it has the lowest overall cost \(f(T_2) + rc(T_2, 2)\). Note that the criterion of optimality in RecTF slightly differs from the other notions since it also takes the recovery cost into account.

The optimal \(\langle 2, 0.9\rangle\)-partially robust team \(T_4\) has the following interesting features compared to the other teams: (i) by definition it provides a 2-partial coverage of 0.9, which is much higher than that of \(T_1\) and \(T_2\); (ii) while covering \(90\%\) of the weighted sum of skills if two agents are lost in the worst case, its deployment cost is only about \(71\%\) (\(690/970\)) of that of the optimal 2-robust team \(T_3\); and (iii) its overall cost \(f(T_4) + rc(T_4, 2)\) remains lower than the deployment cost \(f(T_3)\) of \(T_3\).

Table 1 Comparison in terms of deployment cost, recovery cost, overall cost and coverage of the teams \(T_1, \ldots , T_4\) corresponding to plans I, ..., IV in the introductory example

5 Algorithms

This section provides two procedures to compute an optimal \(\langle k, t\rangle\)-partially robust team, given a weighted TF problem description \(\langle A, S, f, w, \alpha \rangle\), a non-negative integer k and a rational number \(t \in [0, 1]\). Both approaches follow a similar idea as Counter-Example-Guided Abstraction Refinement (CEGAR) [55,56,57], briefly described in the related work section (Sect. 2).

The first method, referred to as PR in the following, iterates over efficient teams in increasing deployment cost order (the “abstract” candidate solutions chosen by an existential player). For each such candidate team T, one tries to “break” it by removing at most k agents from it so that the coverage of the residual team is strictly lower than t. If T cannot be broken, then it is \(\langle k, t\rangle\)-partially robust. Otherwise, one generates a constraint that safely removes a set of teams from the search space (including T), and one proceeds to the next iteration step. In other words, all optimal efficient teams that are potentially \(\langle k, t\rangle\)-partially robust are computed and tested for partial robustness in increasing deployment cost order. The process is iterated until a candidate team that cannot be broken is found. This first procedure is guaranteed to find an optimal partially robust team in a finite number of steps.

With our first method, no \(\langle k, t\rangle\)-partially robust team is provided before the procedure terminates. However, because of time limitations one may be interested in obtaining a sub-optimal \(\langle k, t\rangle\)-partially robust team (a team with a possibly non-minimal deployment cost) in a short amount of time, and in improving it over time. This motivates our second method, an anytime algorithm named PR_A. It first starts with a k-robust team, which can be computed in a single call to an \(\textbf{NP}\) oracle (cf. Proposition 2). Indeed, a k-robust team is \(\langle k, t\rangle\)-partially robust for any value \(t \in [0, 1]\), and can thus be considered as a sub-optimal output. The procedure then searches for a sub-optimal efficient team T, i.e., one such that \(f(T) < c\), where c is the deployment cost of the team computed at the previous step. It tries to “break” this team similarly to the first method, updates c when the break attempt fails, and in that case proceeds to search for another efficient team under the updated threshold c. While PR_A is anytime, it is also guaranteed to eventually find an optimal partially robust team.

5.1 Algorithm PR

The outline of our first method (PR) is given in Algorithm 1. Let us first explain the core of the algorithm (the procedures initConstraints in line 2, solveTF in lines 3 and 9, breakTeam in line 5 and generateConstraint in line 8 are detailed later in this section). Initially, one computes an efficient team \(T_{cur}\) of minimal cost, i.e., an optimal efficient team (cf. procedure solveTF in line 3). In line 5, the procedure breakTeam searches for a set \(T' \subseteq T_{cur}\) such that \(|T'| \le k\) and \(cov(T_{cur} \setminus T') < t\), that is, it seeks to remove at most k agents from \(T_{cur}\) so that the coverage of the residual team is strictly lower than the input threshold t. If no such set \(T'\) exists, it means that \(T_{cur}\) is an optimal \(\langle k, t\rangle\)-partially robust team and the algorithm returns it (line 7). Otherwise, \(T'\) serves as a certificate that \(T_{cur}\) is not \(\langle k, t\rangle\)-partially robust. In line 8, a constraint is generated based on \(T_{cur}\) and \(T'\) to prune a set of non \(\langle k, t\rangle\)-partially robust teams (including \(T_{cur}\)) from future consideration. The procedure then searches for another efficient team of minimal cost under the new set of constraints. The same process is iterated until one of the following conditions occurs: the procedure breakTeam returns null, which means that the team \(T_{cur}\) computed at the corresponding iteration is \(\langle k, t\rangle\)-partially robust and is returned in line 7; or the procedure solveTF in line 9 returns null, which means that there is no \(\langle k, t\rangle\)-partially robust team, and null is returned in line 10.

Algorithm 1 Pseudocode of the PR procedure

Let us now explain in more detail the procedures initConstraints, solveTF, breakTeam, and generateConstraint involved in the algorithm. Our model considers a set \(X = \{x_1, \ldots , x_n\}\) of n binary variables, n being the total number of agents in A. An assignment of values to the variables from X corresponds to a team T where \(a_i \in T\) if and only if \(x_i = 1\).

\(initConstraints(\langle A, S, f, w, \alpha \rangle , k, t)\). This procedure initializes a set of linear constraints C on X characterized by:

$$\begin{aligned} \forall s_j \in S \ \ \sum _{x_i \in X, s_j \in \alpha (a_i)}{x_i} \ge 1 \end{aligned}$$
(1)

This set of constraints precisely encodes the conditions of team efficiency, i.e., an assignment of X satisfies the set C if and only if it corresponds to an efficient team.

solveTF(C). This procedure is called in lines 3 and 9 and simply searches for an assignment of X that minimizes the cost of the corresponding team under the set of constraints C:

$$\begin{aligned} minimize \ \sum _{x_i \in X}{f(a_i) \cdot x_i} \end{aligned}$$
(2)
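The following sketch shows one possible way to express Eqs. 1 and 2 as a 0-1 program, here with the open-source PuLP modeller; the paper does not prescribe a particular solver, and the helper names and the extra_constraints hook (used later for the cuts) are our own:

```python
from pulp import LpBinary, LpMinimize, LpProblem, LpStatus, LpVariable, lpSum

def solve_tf(agents, skills, f, alpha, extra_constraints=()):
    """Minimum-cost efficient team via Eqs. 1 and 2.  Each element of
    extra_constraints is a callable mapping the variable dict x to an
    additional linear constraint (e.g., the cuts of Sect. 5.3)."""
    prob = LpProblem("TF", LpMinimize)
    x = {a: LpVariable(f"x_{a}", cat=LpBinary) for a in agents}
    prob += lpSum(f[a] * x[a] for a in agents)                       # Eq. 2
    for s in skills:                                                 # Eq. 1
        prob += lpSum(x[a] for a in agents if s in alpha[a]) >= 1
    for make_constraint in extra_constraints:
        prob += make_constraint(x)
    prob.solve()
    if LpStatus[prob.status] != "Optimal":
        return None
    return {a for a in agents if x[a].value() > 0.5}
```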

breakTeam(T, k, t). This procedure searches for a set \(T' \subseteq T\), \(|T'| \le k\), such that \(cov(T {\setminus } T') < t\). This is done by generating a new model on a set \(X' = \{x'_i \mid a_i \in T\}\) of |T| binary variables, so that an assignment of values to the variables from \(X'\) represents a subset of agents \(T' \subseteq T\) to be removed from T, i.e., \(a_i \in T'\) if and only if \(x'_i = 1\). The procedure finds an assignment of \(X'\) satisfying a set of constraints characterized by the following pair of equations:

$$\begin{aligned}{} & {} \sum _{x'_i \in X'}{x'_i} \le k \end{aligned}$$
(3)
$$\begin{aligned} w\Bigl (\bigcup _{x'_i \in X', x'_i = 0}{\alpha (a_i)}\Bigr ) < t \end{aligned}$$
(4)

Equation 3 requires that \(|T'| \le k\), and Eq. 4 requires that \(cov(T {\setminus } T') < t\). Accordingly, breakTeam(T, k, t) returns such a subset \(T'\) if it exists, otherwise it returns null.
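For intuition (and for very small teams), the ILP of Eqs. 3 and 4 can be replaced by a direct enumeration. The brute-force stand-in below uses hypothetical names and assumes the normalized weighted sum \(w_\Sigma\) given as a per-skill dictionary:

```python
from itertools import combinations

def break_team(team, k, t, w, alpha):
    """Search for T' ⊆ T with |T'| ≤ k and cov(T ∖ T') < t (cf. Eqs. 3 and 4).
    Returns such a certificate T', or None if the team cannot be broken."""
    team = list(team)
    for r in range(min(k, len(team)) + 1):
        for removed in combinations(team, r):
            rest = [a for a in team if a not in removed]
            covered = set().union(*(alpha[a] for a in rest)) if rest else set()
            if sum(w[s] for s in covered) < t:
                return set(removed)
    return None
```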

The procedures solveTF and breakTeam both involve solving optimization problems that are \(\textbf{NP}\)-hard in the general case, and thus rely on an underlying integer programming optimization solver. Notably, these procedures can be implemented using any state-of-the-art solver, the choice of which may only affect the computational efficiency of the whole procedure.

\(generateConstraint(T, T')\). When this procedure is called, its input teams T, \(T'\) are such that T is efficient, \(|T'| \le k\) and \(cov(T \setminus T') < t\). That is, T is an efficient team and \(T'\) can be viewed as a certificate proving that T is not \(\langle k, t\rangle\)-partially robust. The procedure aims to generate a constraint based on T and \(T'\) that discards a set of teams from the search space, such that the two following conditions are satisfied: (i) at least the team T itself must be discarded; and (ii) all teams from the discarded set must be non \(\langle k, t\rangle\)-partially robust. Condition (i) ensures that Algorithm 1 terminates, and condition (ii) ensures that no potentially \(\langle k, t\rangle\)-partially robust team is discarded, so that the \(\langle k, t\rangle\)-partially robust team eventually returned by the algorithm is optimal. Such a constraint can be specified in a number of ways. We first present its simplest form, which consists in discarding T only.

Doing so, one forbids the previously computed candidate team T from being selected again in any future search iteration. The corresponding constraint states that a candidate team must contain at least one agent that does not belong to T:

$$\begin{aligned} \sum _{x_i \in X, a_i \in A \setminus T}{x_i} \ge 1 \end{aligned}$$
(5)

Since the constraint formalized in Eq. 5 discards T only, it does not take advantage of the additional information potentially provided by the certificate \(T'\). In Sect. 5.3, we will introduce two cuts, i.e., two alternatives that exploit the team \(T'\) and improve computational efficiency.
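Assuming the hypothetical solve_tf sketch given after Eq. 2, the no-good of Eq. 5 can be passed through its extra_constraints hook:

```python
from pulp import lpSum

def blocking_cut(team, agents):
    """Eq. 5: any future candidate must pick at least one agent outside
    the previously found (and broken) team T."""
    return lambda x: lpSum(x[a] for a in agents if a not in team) >= 1

# Example: exclude a broken candidate T before re-solving.
# solve_tf(agents, skills, f, alpha, extra_constraints=[blocking_cut(T, agents)])
```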

To summarize, Algorithm 1 explores a set of efficient teams in increasing deployment cost order, checks for each of them whether it is \(\langle k, t\rangle\)-partially robust, and returns the first such team found. Accordingly, it returns an optimal \(\langle k, t\rangle\)-partially robust team, if one exists.

5.2 Algorithm PR_A

While PR (cf. Algorithm 1) is guaranteed to compute an optimal partially robust team after a finite number of steps, it does not provide any output before it terminates. However, one may be interested in a procedure that first provides a sub-optimal partially robust team and then iteratively searches over time for “better” partially robust candidates with lower deployment costs. This motivates our second method PR_A, an anytime algorithm whose outline is given in Algorithm 2.

Algorithm 2 Pseudocode of the PR_A procedure

This algorithm takes advantage of the procedures initConstraints, breakTeam and generateConstraint also used in Algorithm 1, and uses two new procedures solveRobTF and solveTF-thr whose details are given later in this section.

The initialization phase goes from lines 2 to 8 and is divided into two sub-steps. First, one checks if the team formed of all available agents (\(T_{cur} = A\), line 2) is \(\langle k, t\rangle\)-partially robust (line 3). If not, then obviously enough no \(\langle k, t\rangle\)-partially robust team can exist, and null is returned in line 5. Otherwise, \(T_{cur}\) is a first \(\langle k, t\rangle\)-partially robust candidate that one seeks to improve in terms of deployment cost. Second, as another potential, better candidate, one computes an optimal k-robust team \(T_{new}\) (line 6). If such a team is found and is different from the initial team \(T_{cur} = A\), this team is set to be the new improved candidate (line 8): since it is k-robust, it is necessarily \(\langle k, t\rangle\)-partially robust as well; and since \(T_{new} \ne A\) (cf. line 7), this team is necessarily strictly improved compared to \(T_{cur}\), i.e., \(f(T_{new}) < f(T_{cur})\). Starting from a k-robust team instead of the set of all agents serves as a search space pruning: if a k-robust team exists, there is no need to search for teams with a higher deployment cost. This step is also “almost free” computationally speaking, since computing an optimal k-robust team is not harder than computing an optimal efficient team (cf. Proposition 2).

At this point, one proceeds to iteratively improve the current candidate \(T_{cur}\), with the procedure solveTF-thr now being the main tool. Similar to Algorithm 1, one starts with a set of constraints C that encodes the conditions of team efficiency (line 10). One then searches, using solveTF-thr (line 11), for any efficient team \(T_{new}\) whose deployment cost is strictly lower than that of the current candidate \(T_{cur}\), i.e., such that \(f(T_{new}) < c\) with \(c = f(T_{cur})\). If no such team exists, then no \(\langle k, t\rangle\)-partially robust team can be found with a deployment cost strictly lower than c, i.e., \(T_{cur}\) is an optimal \(\langle k, t\rangle\)-partially robust team and is returned in line 19. Otherwise, similarly to line 3, one checks whether \(T_{new}\) is partially robust by seeking to “break” it (line 13). If \(T_{new}\) is not found to be a counter-example, then it is a new \(\langle k, t\rangle\)-partially robust candidate with a deployment cost improved compared to that of \(T_{cur}\). In this case, \(T_{cur}\) and the new cost threshold are updated accordingly (lines 15 and 16). Otherwise, \(T'\) serves as a certificate that \(T_{new}\) is not \(\langle k, t\rangle\)-partially robust, and a constraint is generated based on \(T_{new}\) and \(T'\) (line 17). In both cases, \(T_{new}\) is removed from the search space and a new efficient candidate is sought (line 18).

The procedures solveRobTF and solveTF-thr are characterized as follows:

\(solveRobTF(\langle A, S, f, w, \alpha \rangle , k)\). This procedure is called in line 6 and computes an optimal k-robust team. It searches for an assignment of X that minimizes the cost of the corresponding team (cf. Eq. 2) under the following set of constraints:

$$\begin{aligned} \forall s_j \in S \ \ \sum _{x_i \in X, s_j \in \alpha (a_i)}{x_i} \ge k + 1 \end{aligned}$$
(6)

That is, solveRobTF consists in calling the procedure solveTF used in Algorithm 1, but under a set of constraints that strengthen the ones given in Eq. 1. This set of constraints precisely encodes the conditions of team robustness, i.e., a team is k-robust if and only if each skill is covered by at least \(k + 1\) agents from the team [2].

solveTF-thr(C, c). This procedure is called in lines 11 and 18. It is a modified version of the procedure solveTF: instead of searching for an efficient team of minimal deployment cost, solveTF-thr(C, c) restricts the search space (currently characterized by the set of constraints C) to teams whose deployment cost is strictly lower than a threshold c. The corresponding constraint is characterized by:

$$\begin{aligned} \sum _{x_i \in X}{f(a_i) \cdot x_i} < c \end{aligned}$$
(7)
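Since the deployment costs are natural numbers (Def. 1), the strict inequality of Eq. 7 can be encoded as \(\le c - 1\) in the 0-1 program; a hypothetical helper in the same style as the sketches above:

```python
from pulp import lpSum

def cost_threshold_cut(f, c):
    """Eq. 7: restrict the search to teams strictly cheaper than c.
    With integer costs, f(T) < c is equivalent to f(T) <= c - 1."""
    return lambda x: lpSum(f[a] * x[a] for a in x) <= c - 1
```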

PR_A is anytime, since one is always given a partially robust team \(T_{cur}\) that is updated whenever a better candidate is found. It is also complete, since it is guaranteed to return an optimal team after a finite number of steps.

5.3 Cut generation

The constraint added by the procedure \(generateConstraint(T, T')\) used in both algorithms can be specified in various ways, as long as it safely discards from the search space a set of teams that are guaranteed not to be \(\langle k, t\rangle\)-partially robust, including T. In Sect. 5.1, we introduced one of its simplest forms, which simply removes T from the search space and thus does not exploit any potentially useful information provided by the certificate \(T'\) (cf. Eq. 5). To fill this gap, we now present two cuts, i.e., two alternative implementations of this procedure that exploit \(T'\) to prune a larger set of spurious teams from the search space.

These cuts are based on the following useful result:

Proposition 5

Given a weighted TF problem description \(\langle A, S, f, w, \alpha \rangle\), \(k \in {\mathbb {N}}\) and a rational number \(t \in [0, 1]\), a team \(T \subseteq A\) is \(\langle k, t\rangle\)-partially robust if and only if it is efficient and for each \(S' \subseteq S\) such that \(w(S \setminus S') < t\), we have that \(|\{a_i \in T \mid \alpha (a_i) \cap S' \ne \emptyset \}| \ge k + 1\).

Proposition 5 above states that a necessary and sufficient condition for a team T to be \(\langle k, t\rangle\)-partially robust is that, for any subset of skills \(S'\) such that the weight of \(S \setminus S'\) is strictly lower than t, T must contain at least \(k + 1\) agents possessing at least one of the skills from \(S'\). Our cuts, simply named cut and cut+ in the following, are based on this result: they both consist in generating a (set of) constraint(s) that exclude(s) teams which do not satisfy this condition, based on the counter-example \(T'\) found in line 5 of Algorithm 1 and line 13 of Algorithm 2.

cut (first version). Taking advantage of a certificate \(T'\), one seeks the smallest (in terms of cardinality) subset of skills \(filter(\alpha (T'))\) (simply denoted by \(S'\)) covered by \(T'\) such that \(w(S \setminus S') < t\). To this end, one sorts the skills of \(\alpha (T')\) in non-increasing order w.r.t. w, and from the resulting ordered list \((s_1, s_2, \ldots )\) with \(w(s_i) \ge w(s_{i+1})\) for each i, one selects the shortest prefix \(S' = (s_1, \ldots , s_j)\) such that \(w(S {\setminus } S') < t\) and \(w((S {\setminus } S') \cup \{s_j\}) \ge t\). Obviously enough, the set \(S'\) can be computed in polynomial time.
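A sketch of this preprocessing step, assuming the normalized weighted sum \(w_\Sigma\) given as a per-skill dictionary (helper name hypothetical):

```python
def filter_skills(skills_of_T_prime, all_skills, w, t):
    """Smallest prefix S' of alpha(T'), sorted by non-increasing weight,
    such that w(S ∖ S') < t.  When T' is a genuine certificate such a
    prefix always exists; None is returned otherwise."""
    ordered = sorted(skills_of_T_prime, key=lambda s: w[s], reverse=True)
    total = sum(w[s] for s in all_skills)   # equals 1 for w = w_sigma
    s_prime, removed = [], 0.0
    for s in ordered:
        s_prime.append(s)
        removed += w[s]
        if total - removed < t:
            return s_prime
    return None
```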

This cut is implemented in the procedure \(generateConstraint(T, T')\) in line 8 of PR (Algorithm 1) and line 17 of PR_A (Algorithm 2), which computes \(S'\) as described above and generates a constraint characterized by the following equation:

$$\begin{aligned} \sum _{x_i \in X, \alpha (a_i) \cap S' \ne \emptyset }{x_i} \ge k + 1 \end{aligned}$$
(8)

Doing so, one prunes only teams that are not \(\langle k, t\rangle\)-partially robust, including the last computed team T. The idea is an improved version of the cut considered in [3] to compute a recoverable team, which consists in excluding teams that do not contain at least one agent possessing a skill from a certain set of skills without which the team could not be recovered (or would otherwise be suboptimal).

Notably, in Eq. 8 one could simply use \(\alpha (T')\) instead of its preprocessed subset of skills \(S' = filter(\alpha (T'))\). However, the constraint generated using \(S'\) is strictly stronger than the one using \(\alpha (T')\) (when \(S' \subsetneq \alpha (T')\)), resulting in improved pruning of the search space.

cut+. We also considered a slightly modified version of the cut introduced above, which generates a set of constraints based on the counter-example \(T'\). The idea is similar to the first version of the cut, i.e., one sorts the skills of \(\alpha (T')\) in non-increasing order w.r.t. w. However, instead of focusing only on the smallest subset \(S' = filter(\alpha (T')) \subseteq \alpha (T')\), one considers a set of such subsets \({{\mathcal {S}}} = \{S'_1, \ldots \}\), recursively computed as follows (a sketch of this construction is given right after the list):

  • \(S'_1 = filter(\alpha (T'))\)

  • for each \(S'_i \in {{\mathcal {S}}}\), \(i > 1\), \(S'_i = filter(\alpha (T') {\setminus } (S'_1 \cup \ldots \cup S'_{i-1}))\),

and such that for each \(S'_i \in {{\mathcal {S}}}\), \(w(S {\setminus } S'_i) < t\).
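A sketch of this recursive construction (hypothetical helper, again assuming the normalized weighted sum \(w_\Sigma\) and ignoring floating-point subtleties):

```python
def cut_plus_subsets(skills_of_T_prime, all_skills, w, t):
    """Family S = {S'_1, S'_2, ...} of pairwise disjoint skill subsets used by
    cut+: S'_1 = filter(alpha(T')), and each S'_i is the filter of the skills
    left over by S'_1, ..., S'_{i-1}, as long as w(S ∖ S'_i) < t."""
    family, remaining = [], list(skills_of_T_prime)
    total = sum(w[s] for s in all_skills)
    while True:
        ordered = sorted(remaining, key=lambda s: w[s], reverse=True)
        s_i, removed = [], 0.0
        for s in ordered:
            s_i.append(s)
            removed += w[s]
            if total - removed < t:
                break
        if total - removed >= t:     # no further subset can drop the coverage below t
            return family
        family.append(s_i)
        remaining = [s for s in remaining if s not in s_i]
```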

This modified cut is implemented in the procedure \(generateConstraint(T, T')\) that now computes \({{\mathcal {S}}}\) as described above and generates a set of constraints characterized by the following set of equations:

$$\begin{aligned} \forall S'_i \in {{\mathcal {S}}} \ \ \ \ \sum _{x_i \in X, \alpha (a_i) \cap S'_i \ne \emptyset }{x_i} \ge k + 1 \end{aligned}$$
(9)

The choice of this modified cut is motivated by the observation that the constraints from the generated set all bear on pairwise disjoint sets of skills \(S'_i\). This potentially increases the diversity of the set of generated constraints given a single certificate \(T'\), i.e., it is intended to increase the pairwise symmetric difference between the sets of agents appearing in these constraints, thus pruning different areas of the search space at once.

In the empirical results section, these two cuts (cut and cut+) will be compared and shown to significantly reduce the runtime of the base procedures.

6 Benchmarks

In this section, we empirically compare our algorithms for finding optimal partially robust teams with approaches based on other solution concepts, namely efficiency, robustness and recoverability, on a wide range of instances. This section also describes the generation protocol of these instances, which are divided into two sets. The first set consists of existing instances found in previous works on TF and set covering [61, 62, 2, 63]. These instances are precisely the ones considered in [3] to compare the computational efficiency of TF, RobTF and RecTF. Since none of these instances considers weighted skills (the notion being relevant only to partially robust TF), we have artificially extended them to weighted TF problem descriptions (see Sect. 6.1 below). The second set consists of original, structured weighted TF instances generated following a protocol we introduce in Sect. 6.2.

6.1 Existing benchmarks (rtf30, cire100, cire150, scp1000)

First, we have considered all the sets of instances (40 instances in total) used in [3]:

  • rtf30 a set of ten small instances (30 agents / 11 skills) which correspond to the robust TF instances given in [2];

  • cire100 / cire150 two sets of ten medium-sized instances (100 agents / 20 skills for the cire100 set and 150 agents / 30 skills for the cire150 set), that are set covering instances used in [62, 63];

  • scp1000 a set of ten large instances (1000 agents / 200 skills), that are classical instances found in the OR Library [61] under the scp4x package, used in set covering works (e.g., [64,65,66]).

All of these instances fully specify a TF problem description as in Definition 1, i.e., no weight was initially associated with the skills. Therefore, for all of these instances we set by default a weight of \(w(s_j) = 1/|S|\) for each skill, and we used a normalized weighted sum function \(w = w_\Sigma\) (cf. Sect. 4), i.e., for every \(S' \subseteq S\), \(w(S') = \sum _{s_j \in S'}{w(s_j)}\).

These instances were randomly generated following different, simple protocols. For example, the protocol used to generate each rtf30 instance [2] consisted in associating with each agent \(a_i \in A\) a random cost \(f(a_i)\) within the range [1000, 2000], and a random set of skills whose size was randomly chosen within the range [1, 10]. Similarly, the cire100 and cire150 instances consist of agents with a random cost and a random set of skills. As for the scp1000 instances, for each cost value c ranging over [1, 100], ten agents were associated with c, and a small randomly chosen set of skills was associated with each agent (four skills per agent on average). Table 2 summarizes some characteristics of these instances.

6.2 New benchmarks (map-r1, map-r2, map-r3, map-r4)

One of the drawbacks of the instances presented in the previous section is that they lack structure and may not reflect real-world situations. In particular, in all of these instances the cost of an agent has no link with the number or rarity of the skills it possesses. As a consequence, nothing prevents these instances from containing agents \(a_i\) dominated by some other agent \(a_j\), i.e., such that \(f(a_j) < f(a_i)\) and \(\alpha (a_i) \subseteq \alpha (a_j)\). In addition, all skills have the same importance since we assigned by default the same weight value to each skill; doing otherwise, i.e., associating each skill with a random value, would not solve the lack-of-structure issue.

To fill this gap, we have developed a software tool for generating structured instances modeling facility deployment problems. Our software is publicly available [6] and allows for the generation of various types of instances according to a set of parameters. The facility deployment problem consists in deploying a set of facilities (e.g., health centers, antennas, schools, shelters) on a populated map so as to maximize a certain population coverage while minimizing the overall deployment cost [4]. The problem is of particular importance, e.g., for mobile phone operators that aim to deploy a set of cell towers in an urban environment. In this context, finding an optimal efficient team corresponds to finding a facility deployment solution of minimal cost that provides service coverage over the whole population: facilities correspond to agents, and the population to be covered in a certain area corresponds to a weighted skill, where the weight depends on the density of the population at that specific location. Indeed, this type of problem is perfectly suited to a partial robustness analysis: when a number of cell towers suddenly become non-functional, one wants to ensure that a high percentage (if not 100%) of the population is still provided with access to the network until recovery.

Table 2 Description of instances for each type

Each populated map was synthesized according to a number of parameters, e.g., its size, its variation in terms of elevation, and its total population. We only provide below a rough description of the generation protocol; the detailed list of parameters and their roles is available in Appendix B.

First, one generates an elevation map made of water parts, lands and mountains. A grid of numbers is created using Perlin noise [67], a procedural texture primitive commonly used by visual effects artists to increase the appearance of realism in computer graphics. The grid is then converted into a hexagonal grid for which each cell is associated with a “type” depending on the range of its value in the grid. A low (resp., mid, high) value is interpreted as a water cell (resp., a land cell, a mountain cell).

Fig. 1 Optimal teams for different solution concepts (efficiency, robustness and partial robustness)

Second, the map is populated by iteratively adding individuals to the grid. Initially, a few individuals are placed in randomly chosen land cells, provided that each such cell is next to a water cell. Then, new individuals are added one at a time, at random, following a probability distribution: the closer a cell is to an already populated cell, the higher its probability of welcoming a new individual. Water cells and cells that already host the maximum number of individuals cannot host a new individual. The process is repeated until the desired total population of the map is reached. Figure 1a represents a populated map generated using this method: blue (resp. brown, white) cells are of water (resp. land, mountain) type. Different shades of brown correspond to different elevation degrees of land, which are only used to tune the probability of adding an individual to a land cell. The gray scale represents the number of individuals in a cell: the darker a cell, the more densely populated it is, a pitch-black cell containing the maximum number of individuals.
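The following Python sketch illustrates this proximity-biased population process, continuing the previous sketch; the seed count, cell capacity and weighting scheme are illustrative assumptions.

import random

def populate(grid, total_population, n_seeds=5, cell_capacity=20, seed=None):
    # Seed a few coastal land cells, then add individuals one by one with a
    # probability that grows with the population of neighbouring cells.
    rng = random.Random(seed)
    pop = {c: 0 for c, (t, _) in grid.items() if t == "land"}

    def neighbours(c):
        q, r = c
        return [(q + dq, r + dr)
                for dq, dr in [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]]

    coastal = [c for c in pop
               if any(grid.get(n, ("", 0))[0] == "water" for n in neighbours(c))]
    for c in rng.sample(coastal, min(n_seeds, len(coastal))):
        pop[c] += 1

    candidates = list(pop)
    while sum(pop.values()) < total_population:
        # the weight of a cell grows with the population already around it;
        # full cells get weight 0 and can no longer host individuals
        weights = [(1 + sum(pop.get(n, 0) for n in neighbours(c)))
                   if pop[c] < cell_capacity else 0
                   for c in candidates]
        if sum(weights) == 0:
            break
        c = rng.choices(candidates, weights=weights, k=1)[0]
        pop[c] += 1
    return pop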

Third, a populated map is translated into a weighted TF problem description \(\langle A, S, f, w_\Sigma , \alpha \rangle\) as follows. We consider different types of agents \(type_1, type_2, \ldots\). Each type \(type_i\) of agent corresponds to a facility that has a deployment cost equal to i and a cover range equal to \(i - 1\). For instance, a cell tower of type \(type_3\) has a deployment cost equal to 3, and when it is deployed in a certain cell C on the grid, it provides the required service to anyone that is in a cell \(C'\) such that the distance between C and \(C'\) is at most 2; the distance between two cells on the grid corresponds to the length of the shortest path between C and \(C'\). So for each type of facility \(type_i\) and each grid cell C that is not of water type, one considers an agent \(a_i^C\) of cost \(f(a_i^C) = i\) which corresponds to a facility of type \(type_i\) to be potentially deployed in the cell C. This defines the set A of agents and the cost function f. The set of skills S and the skill weight function \(w_\Sigma\) (note that we considered a normalized weighted sum function) are simply defined as follows. One associates with each populated grid cell P (i.e., a cell that hosts at least one individual) a skill \(s_P\); and the weight of each skill \(w_\Sigma (s_P)\) is defined as the number of individuals in the grid cell P. Then, the agent-to-skill mapping \(\alpha\) is defined as follows. An agent \(a_i^C\) has the skill \(s_P\) if the grid cell P is within the reach of the facility \(a_i^C\), i.e., if the distance between the grid cells C and P is less than or equal to \(i - 1\). Lastly, we have added for each instance a set of constraints forbidding the joint deployment of two facilities \(a_i^C\), \(a_j^C\) located at the same cell C. Thus, given such an instance \(\langle A, S, f, w_\Sigma , \alpha \rangle\), a team T corresponds to a set of facilities to be deployed on the corresponding map.
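The translation can be sketched as follows in Python, continuing the previous sketches (grid and pop); it is a simplified illustration of the construction above, and the names are ours.

def hex_distance(c1, c2):
    # Distance between two hexagonal cells in axial coordinates.
    (q1, r1), (q2, r2) = c1, c2
    dq, dr = q1 - q2, r1 - r2
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2

def to_weighted_tf(grid, pop, facility_types=(1, 2, 3, 4)):
    # One agent per (facility type, non-water cell), with cost i and cover
    # range i - 1 for type_i; one skill per populated cell, weighted by its
    # population; an agent covers a skill if the cell lies within its range.
    agents, cost, alpha = [], {}, {}
    skills = {c: n for c, n in pop.items() if n > 0}          # skill s_P with weight w(s_P)
    land_cells = [c for c, (t, _) in grid.items() if t != "water"]
    for i in facility_types:
        for c in land_cells:
            a = (i, c)                                        # agent a_i^C
            agents.append(a)
            cost[a] = i
            alpha[a] = {p for p in skills if hex_distance(c, p) <= i - 1}
    forbidden = [((i, c), (j, c)) for c in land_cells
                 for i in facility_types for j in facility_types if i < j]
    return agents, skills, cost, alpha, forbidden             # forbidden: same-cell pairs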

We focused on four sets of 100 map instances of different sizes, obtained by varying their “resolution” r from 1 to 4 and using default values for the remaining parameters. This resulted in the four sets of instances map-r1, map-r2, map-r3 and map-r4. The map depicted in Fig. 1a represents an instance of the map-r3 set. Considering four such sets allowed us to study weighted TF instances of different sizes in terms of the number of agents and skills. For instance, the map-r3 set was formed of instances with an average of 537 agents / 75 skills, while the map-r4 set consisted of instances with an average of 1535 agents / 285 skills. Table 2 summarizes the characteristics of each set.

Figure 1 depicts, for a given map instance from the map-r3 set, an optimal team that is respectively efficient (Fig. 1b), 1-robust (Fig. 1c), \(\langle 1,.99\rangle\)-partially robust (Fig. 1d), \(\langle 1,.95\rangle\)-partially robust (Fig. 1e), and \(\langle 1,.90\rangle\)-partially robust (Fig. 1f). For instance, the optimal efficient team (Fig. 1b) is formed of eight agents corresponding to four facilities of type \(type_4\) and four facilities of type \(type_1\). Each such facility is represented by a label on its grid cell indicating its type / deployment cost. The circle drawn around each deployed facility represents the populated area it covers, i.e., the set of skills possessed by the corresponding agent.

7 Empirical results

We have implemented our algorithms PR and PR_A (cf. Algorithms 1 and 2) to compute partially robust solutions, as well as algorithms to compute solutions for the existing solution concepts of optimal efficiency (denoted by TF for short), (full) robustness (RobTF), and recoverability (RecTF). The notion of optimality for RecTF (i.e., given k, one seeks a \(\langle k, r\rangle\)-recoverable team T whose overall cost \(f(T) + rc(T, k)\) is minimal) and the corresponding encodings were the same as those described and used in [3]; TF was computed using Eq. 2 under the set of constraints given in Eq. 1, and RobTF through Eqs. 2 and 6.
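To make the overall structure of PR concrete, here is a minimal Python sketch of the counterexample-guided loop it implements. The helper functions solve_candidate, worst_case_coverage and add_cut are hypothetical placeholders standing in for the CPLEX encodings (Eqs. 1, 2, 6) and the cut generation of Sect. 5; this is a schematic illustration, not our actual implementation.

def pr_sketch(instance, k, t, solve_candidate, worst_case_coverage, add_cut):
    # solve_candidate(instance, cuts): a minimum-cost team satisfying the
    #   efficiency constraints plus the accumulated cuts, or None if infeasible.
    # worst_case_coverage(team, k): (coverage_ratio, removed_agents) for the
    #   worst-case loss of k agents from the team.
    # add_cut(cuts, team, removed_agents): strengthens the model so that this
    #   counter-example (and, with cut+, related ones) is pruned.
    cuts = []
    while True:
        team = solve_candidate(instance, cuts)
        if team is None:
            return None                      # no <k, t>-partially robust team exists
        coverage, removed = worst_case_coverage(team, k)
        if coverage >= t:
            return team                      # candidate is <k, t>-partially robust
        cuts = add_cut(cuts, team, removed)  # refute the candidate and iterate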

All the solution concepts and algorithms were empirically evaluated and compared on the benchmarks described in the previous section. The evaluation was performed from the viewpoint of computational efficiency (runtime), but also in terms of solution quality: more precisely, we compared for each instance the deployment cost of the computed solution, its skill coverage when k agents are lost, and its recovery cost afterwards. We also analyzed how the parameter t in \(\langle k, t\rangle\)-partial robustness impacts those results, focusing on the values \(t \in \{0.99, 0.95, 0.90\}\).

We used CPLEX as the constraint solver in all our experiments, namely IBM(R) ILOG(R) CPLEX(R) Interactive Optimizer 12.10 with the option set parallel 1. All experiments were conducted on Intel Xeon E5-2643 (3.30 GHz) processors with 64 GB of memory, running Linux CentOS. The time-out was set to 3600 s and the memory-out to 32 GB for each run of each algorithm on each instance.

7.1 Computational efficiency

Table 3 shows the number of instances solved within the time limit of 3600 s for each method, except for TF and RobTF for which all instances were solved. In the particular case of the anytime algorithm PR_A, an instance is considered to be solved when an optimal team is found, i.e., the algorithm terminates without being interrupted. We also measured the time in seconds required for each solution concept and algorithm. Figure 2 shows four cactus plots for different values of k in \(\{1,2,3,4\}\) (the results for PR and PR_A are shown using cut+). Each plot gives for each notion the number of instances solved in a given amount of time. Figures 3, 4 and 5 provide through scatter plots some additional insights on the relative efficiency between (i) an implementation of PR with (our first version of) the cut and without the cut (Fig. 3); (ii) an implementation of PR with our first version of the cut and cut+ (Fig. 4); (iii) PR and PR_A, both using cut+ (Fig. 5).

First, it can be seen in Table 3 that both PR and PR_A prove to be much more efficient than RecTF: for example, no RecTF solution could be found for any of the map-r3 instances, whereas PR and PR_A could find one for a reasonable proportion of these instances using either of our cuts (cut / cut+) presented in Sect. 5. Note also that all map-r2 instances were solved using PR or PR_A for \(k \in \{3, 4\}\), whereas none of them was solved with RecTF.

Table 3 also shows that our cuts play a crucial role in the efficiency of both PR and PR_A, as few instances overall were solved without exploiting them. Figure 3 further shows that the cut also improves runtimes on the instances that could be solved without it. The reason why the cut did not impact the runtime on the scp1000 instances for \(t =.90\) is that, for these instances, optimal efficient teams were all optimal \(\langle k,.90\rangle\)-partially robust teams (see Fig. 8, discussed later): an optimal team was always found at the first iteration step (line 3 of Algorithm 1), so the cut was simply not used.

One can also see from Table 3 that, in the majority of cases, using cut+ instead of the first version of our cut allows slightly more instances to be solved. As shown in Fig. 4, on the instances solved in both settings, using cut+ in PR was more beneficial for \(t =.99\) than for \(t =.90\). For \(t =.90\), the impact was lower: the solver spent more time preprocessing the additional constraints provided by cut+ without gaining significant efficiency in the solving phase. This can be attributed to the fact that these instances were relatively small, so cut+ did not provide much benefit; for instance, instances with approximately 100 agents could be solved almost instantly even without cut+. In contrast, adding more constraints learned from a single counter-example is more beneficial for higher values of t: the closer t is to 1, the more teams a cut prunes. It is actually easy to see that if \(t = 1\), PR only performs two iterations to reach optimality, since Eqs. 6, 9 and 8 then coincide, i.e., they all ensure that each skill is covered by at least \(k + 1\) agents, i.e., that the team is k-robust.
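In other words, for \(t = 1\) the cut collapses to the full-robustness requirement which, written with the notation used for TF descriptions above (a team T over skills S with agent-to-skill mapping \(\alpha\)), amounts to

\[ \forall s \in S, \quad |\{a \in T \mid s \in \alpha(a)\}| \ \ge \ k + 1. \]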

The phenomenon described above can also be seen in the cactus plots in Fig. 2. When \(k = 1\), optimal partially robust teams could be found slightly more efficiently for lower values of t: those teams are close to the optimal efficient ones, and only a few iterations were necessary to compute them, with little need for pruning the search space. In contrast, for higher values of k, optimal partially robust teams could be found more efficiently for \(t =.99\) than for lower values of t: for higher values of k and t, optimal partially robust teams are typically “far” from optimal efficient ones in terms of deployment cost, so more iterations are required to compute them, which increases the impact of the efficient pruning of the search space implemented through cut+.

As to the comparison between PR and PR_A, Table 3 shows that on most of the benchmarks, PR_A could solve more instances than PR. However, Fig. 5 shows that PR is generally more efficient than PR_A on the instances solved by both of them. This is explained by the fact that PR, as opposed to PR_A, iteratively computes the candidate efficient teams in increasing order of cost. Hence, for those partially robust teams whose cost is close to that of an optimal efficient team, PR performs better than PR_A. However, as the number of iterations increases, the chances that PR fails to compute a solution within the time limit increase as well, and in these cases the success rate of PR_A is higher.

Table 3 Number of solved instances with a time out of 3600 s, for each set of instances and each solution concept
Fig. 2 Time results on all instances for \(k \in \{1, 2, 3, 4\}\)

Fig. 3 Comparison between the implementation of PR with the first version of the cut, and without

Fig. 4 Comparison between the use of (the first version of the) cut and cut+ for PR

Fig. 5 Comparison between PR and PR_A in terms of solving time (in seconds), both implemented with cut+

7.2 Solution quality

Let us start with a short analysis of the results for the map instance example provided in Fig. 1. As in our introductory example, one can see on these figures the advantages of forming partially robust teams instead of efficient or (fully) robust teams. For instance, the optimal \(\langle 1,.99\rangle\)-partially robust team has a deployment cost of only 30 and a recovery cost of 4, whereas a 1-robust team has a deployment cost of 41: allowing \(1\%\) of coverage loss in the worst case during a disaster phase of the same scale (i.e., \(k = 1\) in both cases) thus makes the deployed solution \(1 - 30/41 = 27\%\) cheaper in the initial phase, and \(1 - 34/41 = 17\%\) cheaper overall (initial and recovery phases combined). The 1-partial coverage of the optimal efficient team (Fig. 1b) is only \(20\%\), since a single, large facility is deployed to provide the necessary service to a densely populated area; its overall cost is, however, comparable to the overall cost of all three depicted partially robust teams.
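For clarity, the two percentages follow directly from the deployment cost (30) and recovery cost (4) of the partially robust team and the deployment cost (41) of the robust team:

\[ 1 - \frac{30}{41} \approx 0.27, \qquad 1 - \frac{30 + 4}{41} = 1 - \frac{34}{41} \approx 0.17. \]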

Figure 6 shows an instance of the map-r4 set that was not solved by PR and for which a sub-optimal solution was found using PR_A. The difference in terms of deployment cost between an optimal 2-robust team (\(f(T'_2) = 194\), cf. Fig. 6c) and a sub-optimal \(\langle 2,.90\rangle\)-partially robust team found by PR_A (\(f(T'_3) = 95\), cf. Fig. 6d) clearly shows the advantage of our anytime algorithm: even when an optimal solution is not found, its deployment cost appears to be quite satisfactory compared to the (fully) robust deployment solution, while providing a reasonable guarantee of population coverage in the disaster phase.

Fig. 6 Optimal efficient and 2-robust teams and sub-optimal \(\langle 2,.90\rangle\)-partially robust team

The advantages of the concept of partial robustness are made even clearer by Figs. 7 and 8, which show the average deployment and overall costs for the generated facility location instances (map-r2 to map-r4 sets) and for the cire100, cire150 and scp1000 instances, respectively. The results for partially robust teams are those computed by PR_A, either at termination or as the best solution found within the time limit; some of the reported results may therefore correspond to sub-optimal teams. The percentage above each TF bar corresponds to the average k-partial coverage value \(pc(T, k)\). These figures show that the comparative behavior between optimal efficient teams, robust teams and partially robust teams described previously for Figs. 1 and 6 extends, on average, to all benchmarks. On the one hand, the overall costs of \(\langle k, t\rangle\)-partially robust teams remain below the deployment costs of k-robust teams, even for high values of t. As an example, the overall deployment cost of \(\langle 4,.95\rangle\)-partially robust teams for the map-r4 set (cf. Fig. 7) is on average only about 60% of the deployment cost of 4-robust teams; note that for this benchmark and these parameters, all of the solutions found by PR_A are sub-optimal (see Table 3). On the other hand, these costs remain arguably reasonable in comparison with optimal efficient teams, while efficiency alone can only guarantee a very low k-partial coverage, especially for the more structured set of facility location instances (e.g., below 50% on average for \(k \ge 2\)).

Figure 9 also clearly shows the advantage, in terms of deployment cost, of the sub-optimal partially robust teams found with PR_A compared to robust ones. This time, the results are shown instance-wise through a scatter plot, and only for those instances for which PR could not find a solution within the time limit. Lastly, Fig. 10 shows how the deployment cost of the solution currently found by PR_A evolves over execution time, for some randomly selected instances and parameters. We focused on instances for which an optimal solution was found after more than 100 s and more than 3 iterations (lines 12 to 18 in Algorithm 2). This clearly shows that, on these instances, a near-optimal solution can be found quickly in the first iteration steps, and suggests that this conclusion can reasonably be extended to the cases where an optimal solution is not found within a given time limit.

Fig. 7 Deployment and overall cost on average for the generated facility location instances. The percentage above each TF bar corresponds to the k-partial coverage value pc(T) on average

Fig. 8 Deployment and overall cost on average for the cire100, cire150 and scp1000 instances (values are divided by 100). The percentage above each TF bar corresponds to the k-partial coverage value pc(T) on average

Fig. 9 Deployment cost of sub-optimal partially robust teams computed using PR_A compared with k-robust teams. Only instances not solved by PR are shown

Fig. 10 Evolution over execution time of the deployment cost for some randomly selected instances (cire150, scp1000, r3 and r4 instances), using PR_A

8 Conclusion

We introduced the notion of partial robustness in Team Formation (PR-TF): given an integer k and a rational number \(t \in [0, 1]\), a deployed team is \(\langle k, t\rangle\)-partially robust if, whenever k agents are removed from it, the “ratio” of skills covered by the team formed of the remaining agents remains higher than t. We formalized PR-TF and analyzed its computational complexity. We then provided two complete algorithms à la CEGAR [55,56,57], named PR and PR_A, to compute optimal partially robust teams, i.e., partially robust teams of lowest deployment cost. While PR searches for candidate teams in increasing order of deployment cost, PR_A is an anytime algorithm: starting from a sub-optimal candidate, it iteratively seeks subsequent partially robust candidate teams with lower deployment costs. To empirically evaluate this new notion and our algorithms PR and PR_A, we compared them with existing solution concepts and encodings for team efficiency (TF), robustness (RobTF) and recoverability (RecTF), on existing benchmarks as well as on new structured instances. For this purpose, we have developed a publicly available benchmark generation tool [6]. Our software allows one to create instances modeling facility location problems on populated elevation maps, whose structure can be adjusted through various parameters related to, for example, granularity and Perlin noise [67]. This type of instance is more structured than some existing benchmarks previously used in Team Formation [2, 3], since it is closer to real-world situations, and is thus arguably well suited to evaluating teams from the (partial) robustness viewpoint: a deployed team consists of a set of facilities, and the goal is to find the least expensive set of facilities that provides an acceptable degree of population coverage, even when some of them become out of service.

Our contribution reveals that PR-TF exhibits a number of interesting properties from several perspectives when compared to the existing notions. First, we proved that the decision problem corresponding to PR-TF is \(\mathbf {\Sigma ^P_2}\)-complete. On the one hand, the counterpart problems for TF and RobTF are both \(\textbf{NP}\)-complete, which makes the partial robustness issue computationally harder. On the other hand, PR-TF is easier than RecTF, since the latter was proved to be \(\mathbf {\Sigma ^P_3}\)-complete [3]. Given the ever-increasing efficiency of modern constraint-based solvers, and since our algorithms are based on an approach that makes iterative use of such solvers, computing optimal partially robust teams could be done quite efficiently for instances of reasonable size: for example, on one of our facility deployment benchmarks consisting of instances with more than 500 agents and 75 skills on average (map-r3), our algorithms could compute an optimal solution for a majority of instances within one hour, whereas the existing encodings for RecTF could not solve a single instance. Second, PR-TF provides by definition a guarantee of skill coverage in case of agent losses. In comparison, the notions of TF and RecTF do not provide any such guarantee. In particular, we have empirically shown that the coverage loss in a disaster phase can be substantial for optimal efficient teams, even after losing only a few agents.

Third, we have evaluated the deployment cost of optimal partially robust teams and compared it with that of the other solution concepts. For most of the benchmarks, we found a substantial gap between the deployment cost of \(\langle k, t\rangle\)-partially robust teams and that of “fully” robust teams (RobTF), even for high values of t.

PR-TF is thus a new solution concept that can reasonably be seen as an interesting trade-off between the existing notions of team efficiency, robustness and recoverability, in terms of computational efficiency, skill coverage in the case of agent losses, and overall deployment cost.

This work opens a number of perspectives for further research. Considering other heuristics to quickly compute interesting cuts, and determining how many cuts should be added depending on the instance at hand, deserve further investigation.

It could also be interesting to evaluate the practical benefits of taking the nature of the graph representing the problem into account. Because the TF problem is equivalent to the hitting set problem, all the parameterized complexity results that hold for the hitting set problem also hold for the team formation problem. In particular, it is well known that the hitting set problem parameterized by the size of the solution is W[2]-complete [68], which shows the difficulty of computing even a single TF solution. Nevertheless, as shown in [69], it could be possible to investigate other parameters, such as the cyclomatic number of the graph, to obtain efficient algorithms. Because all our algorithms rely on the elementary building block of computing TF solutions, improving this building block would reduce the time needed to obtain solutions for all the other tasks.

To deal with the particular structure of our map instances, we could also consider coarse-grained methods that take regions of different sizes into account. More precisely, it could be possible to divide the maps into squares that can be handled efficiently; the challenging part is then to combine the local solutions into a global one.

Another perspective for further research consists in evaluating other solving paradigms such as SAT [58] or CSP [70]. In this respect, we plan to analyse the results obtained by the state-of-the-art SAT solvers during the last SAT competition on our TF benchmarks [71].

From an application point of view, we also plan to investigate whether our framework can be used in combination with smart city energy models. For instance, we could use the simulator proposed in [72] to evaluate the different team formation notions presented in this paper in a well-controlled environment. Such experiments could help estimate a good compromise between the coverage threshold t and the number k of possibly defective agents.