1 Introduction

Given a classroom containing a fixed number of students and a fixed number of tables that can be of different sizes, as well as a list of preferred classmates to sit with for each student, the team composition problem in a classroom (TCPC) is the problem of finding an assignment of students to tables in such a way that preferences are maximally-satisfied. Our motivation behind this work is to solve a problem posed by the director of studies of a secondary school in the area of Barcelona, though this problem may be found in a wide range of situations and institutions.

In this paper, we formally define the TCPC, prove that it is NP-hard and define a MaxSAT model of the problem. Moreover, we report on the results of an empirical investigation that show that solving the TCPC with MaxSAT solvers is a promising approach.

To tackle the TCPC we use a MaxSAT-based problem solving approach, which is an active area of research in Artificial Intelligence, (see e.g. [2, 5, 7,8,9,10,11,12, 15,16,17, 20, 21] and the references therein for previous and related work). MaxSAT-based problem solving is a generic problem solving approach for optimization problems which consists on first defining a MaxSAT model for instances of the problem to be solved, and then derive solutions to the encoded instances of the problem using an off-the-shelf MaxSAT solver. By a MaxSAT model we mean a representation of the problem using the language of Boolean propositional logic. It is a declarative approach: we only need to define a model and from that model an optimal solution is automatically derived. Furthermore, the method is highly efficient because we may take advantage of the extremely efficient MaxSAT solvers which are publicly available.

It is commonly assumed that designing an algorithm to work directly on the original problem encoding should outperform approaches that require a translation via a generic intermediate formalism, such as a CSP, SAT or MaxSAT. However, this line of reasoning ignores the fact that generic solvers can benefit from many years of development by a broad research community. It is not easy to replicate this kind of effort in other domains.

In the present formulation of the problem, we consider the preferences of the students. Nevertheless, our approach could also be easily adapted to take into account other factors that can be relevant to the performance of a team such as personality, expertise, competence, competitiveness and human formation [4, 6].

The rest of the paper is organized as follows: Sect. 2 defines the TCPC formally and proves that it is NP-hard. Section 3 gives some background on MaxSAT. Section 4 defines a MaxSAT model of the TCPC. Section 5 reports on the empirical investigation conducted. Section 6 gives some conclusions and future work.

2 The Team Composition Problem in a Classroom

Depending on the activity to be performed in a classroom at a given moment, the distribution of the students may need to be different. In the general case, we consider there is a fixed number of students and there is a list of preferred classmates to sit with for each student. Then, the goal is to partition students into teams, which may have different sizes, in such a way that the preferences of the students are maximally-satisfied.

The version of the TCPC that we use as a case study in this paper has the following constraints:

  • The classroom has n students.

  • The classroom has tables of 2 and 3 students with a combined capacity for n students.

  • Each student has provided a list of classmates she would prefer to sit with.

The objective is to find an assignment of students to tables such that preferences are maximally-satisfied. Notice that the first two constraints are hard whereas the last one is soft. We will say that a solution is fully-satisfied if, and only if, all the students in the same table have the rest of the students of the table in their list of preferences. We will say that a solution is maximally-satisfied if, and only if, the number of students who have their preferences satisfied is maximized. Note that a fully-satisfied solution is also a maximally-satisfied solution.

Proposition 1

Given n students, a classroom that has tables of 2 and 3 students with a combined capacity for n students, and a list of preferred classmates to sit with for each student, the problem of deciding if there is a fully-satisfied solution is NP-complete.

Proof

This problem belongs to NP: we can check, in polynomial time, whether or not an assignment of students to tables is a fully-satisfied solution by inspecting the lists of preferences of the students.

We now prove that this problem is NP-hard by reducing the problem of partitioning a graph into triangles to it.

Given a graph \(G=(V,E)\), where V is the set of vertices and E is the set of edges, that verifies that \(|V|=3q\) for some integer q, the partition of V into triangles consists on finding a partition of V formed by \(V_1, \ldots , V_q\), each containing exactly 3 vertices, such that for each \(V_i=\{u_i,v_i.w_i\}\), \(1\le i \le q\), the edges \(\{u_i,v_i\}\), \(\{u_i,w_i\}\) and \(\{v_i,w_i\}\) belong to E. This problem is NP-complete [14].

That problem can be reduced to an instance of our problem without loss of generality by considering a classroom with 3q students, 0 tables of 2 and q tables of 3. For each edge \(\{u,v\}\) on graph V, establish a preference of student u for student v and a preference of student v for student u. Note that this reduction takes polynomial time. Then, the problem of partitioning the vertices of a graph into triangles has a solution if, and only if, all the students in the classroom can be sat in such a way that all students preferences are fully-satisfied.    \(\square \)

Corollary 1

The TCPC is NP-hard.

Proof

This follows from the fact that every fully-satisfied solution is also a maximally-satisfied solution.

3 The MaxSAT Problem

We assume readers have some familiarity with basic concepts of Boolean propositional logic. The most well-know problem of propositional logic is SAT: given a formula \(\phi \) in Conjunctive Normal Form (CNF), decide whether there is a truth assignment that satisfies \(\phi \).

Reminder: a literal is a propositional variable or a negated propositional variable, a clause is a disjunction of literals, a CNF formula is a conjunction of clauses, and a truth assignment is a mapping that assigns 0 (false) or 1 (true) to each propositional variable. A CNF is satisfied by an assignment if it is true under the usual truth-functional interpretation of \(\vee \) and \(\wedge \) and the truth-values assigned to the variables.

An optimization variant of SAT is MaxSAT: given a CNF formula \(\phi \), MaxSAT consists of finding a truth assignment that maximizes the number of satisfied clauses of \(\phi \). However, in this paper we use the term MaxSAT in a broad sense: we allow to distinguish between hard and soft clauses, and allow to associate a weight to soft clauses (formally, hard clauses have an infinite weight). This more general formulation of MaxSAT is technically known as weighted partial MaxSAT [15], which is formally defined in the remaining of this section.

We start by defining a more general notion of clause. A weighted clause is a pair (cw), where c is a clause and w, its weight, is a positive integer or infinity. A clause is hard if its weight is infinity; otherwise it is soft.

A weighted partial MaxSAT instance is a multiset of weighted clauses

$$\begin{aligned} \phi =\{(h_{1}, \infty ),\ldots , (h_{k}, \infty ),(c_{1}, w_{1}), \ldots , (c_{m}, w_{m}) \}, \end{aligned}$$

where the first k clauses are hard and the last m clauses are soft. For simplicity, in what follows, we omit infinity weights, and write \(\phi =\{h_1, \ldots , h_k,\) \( (c_{1}, w_{1}), \ldots , (c_{m}, w_{m}) \}\). A soft clause (cw) is equivalent to having w copies of the clause (c, 1), and \(\{(c, w_{1}),(c, w_{2}) \}\) is equivalent to \((c,w_{1} +w_{2})\).

Weighted partial MaxSAT for an instance \(\phi \) is the problem of finding an assignment that satisfies all the hard clauses and minimizes the sum of the weights of the falsified soft clauses; such an assignment is called optimal assignment.

4 The MaxSAT Encoding

We show how the TCPC can be represented as a weighted partial MaxSAT instance. In other words, we show how to model the TCPC in the weighted partial MaxSAT formalism. To illustrate how to model the problem, we will consider that the classroom has 28 students and there are 8 tables of 2 students and 4 tables of 3 students. This is a typical classroom distribution in secondary schools of the area of Barcelona.

First of all, we define the set of Boolean variables of our encoding:

$$\{x_{ij} | 1 \le i< j \le 28\} \cup \{x_{ijk} | 1 \le i< j <k\le 28\} \cup \{y_{i} | 1 \le i \le 28\}$$

These variables have the following intended meaning: \(x_{ij}\) is true iff students i and j sit together in a table of 2; \(x_{ijk}\) is true iff students ij and k sit together in a table of 3; and \(y_{i}\) is true if student i sits in a table of 2 and is false if student i sits in a table of 3.

Using the previous Boolean variables, we create a Weighted Partial MaxSAT instance that encodes the constraints of the problem. The proposed encoding has the following hard clauses:

  1. 1.

    For each student i, where \(1 \le i\le 28\), the encoding contains a set of hard clauses that encode the following cardinality constraint:

    1. (a)

      If \(i=1\), then

      $$\begin{aligned} \sum _{j=2}^{28} x_{1j} +\sum _{j=2}^{27}\sum _{k=j+1}^{28} x_{1jk}=1 \end{aligned}$$
    2. (b)

      If \(2 \le i \le 27\), then

      $$\begin{aligned} \sum _{j=1}^{i-1} x_{ji} + \sum _{j=i+1}^{28} x_{ij} +\sum _{j=1}^{i-1}\sum _{k=i+1}^{28} x_{jik} + \sum _{j=i+1}^{27}\sum _{k=j+1}^{28} x_{ijk}=1 \end{aligned}$$
    3. (c)

      If \(i=28\), then

      $$\begin{aligned} \sum _{j=1}^{27} x_{j28} +\sum _{j=1}^{26}\sum _{k=j+1}^{27} x_{jk28}=1 \end{aligned}$$

    This cardinality constraint states that student i sits exactly in one table, and the table is either of 2 or of 3.

  2. 2.

    For each variable \(x_{ij}\), the encoding contains the hard clauses \(\lnot x_{ij} \vee y_i\) and \(\lnot x_{ij} \vee y_j\). Note that \((\lnot x_{ij} \vee y_i) \wedge (\lnot x_{ij} \vee y_j)\) is equivalent to \(x_{ij} \rightarrow y_i \wedge y_j\). This clause states that if \(x_{ij}\) is true, then students i and j sit in a table of 2.

  3. 3.

    For each variable \(x_{ijk}\), the encoding contains the hard clauses \(\lnot x_{ijk} \vee \lnot y_i\), \(\lnot x_{ijk} \vee \lnot y_j\) and \(\lnot x_{ijk} \vee \lnot y_k\). Note that \((\lnot x_{ijk} \vee \lnot y_i) \wedge (\lnot x_{ijk} \vee \lnot y_j) \wedge (\lnot x_{ijk} \vee \lnot y_k)\) is equivalent to \(x_{ijk} \rightarrow \lnot y_i \wedge \lnot y_j \wedge \lnot y_k\). This clause states that if \(x_{ijk}\) is true, then students i, j and k sit in a table of 3.

  4. 4.

    The encoding contains a set of hard clauses that encode the following cardinality constraints: \(\sum _{i=1}^{28} y_i= 16\) and \(\sum _{i=1}^{28} \lnot y_i= 12\). These cardinality constraints state that there are 16 students sitting in tables of 2 and 12 students sitting in tables of 3. In practice, it is sufficient to add either the constraint \(\sum _{i=1}^{28} y_i= 16\) or the constraint \(\sum _{i=1}^{28} \lnot y_i= 12\) because if there are exactly 16 (12) variables \(y_{i}\), \(1 \le i \le 28\), that evaluate to true (false), then the remaining 12 (16) variables must evaluate to false.

The encoding of a cardinality constraint of the form \(x_1+ \ldots + x_n=k\) has \(\mathcal{O}(n)\) clauses if one uses the encoding based on counters and defined in [22]. Other efficient encodings of cardinality constraints are described and analyzed in [1]. In our empirical investigation, we encode the previous cardinality constraints using PBLibFootnote 1, which is a C++ tool for efficiently encoding pseudo-Boolean constraints to CNF.

Since we considered two sizes of tables, we just need one variable \(y_i\) for each student. If we consider n different sizes, then we need \(\left\lceil \log _2 n \right\rceil \) variables for each student. For example, for four different sizes, we need two variables (\(y_i, y'_i\)) and each size is represented by one of the following conjunctions: \(y_i \wedge y'_i\), \(\lnot y_i \wedge y'_i\), \(y_i \wedge \lnot y'_i\) and \(\lnot y_i \wedge \lnot y'_i\).

The soft clauses of our encoding are the following weighted unit clauses:

  1. 1.

    For each variable \(x_{ij}\), \(1\le i < j \le 28\), the encoding contains the weighted unit clause \((x_{ij}, w_{ij}).\)

  2. 2.

    For each variable \(x_{ijk}\), \(1\le i< j< k \le 28\), the encoding contains the weighted unit clause \((x_{ijk}, w_{ijk}).\)

Let us explain how weights are assigned to the variables of the form \(x_{ij}\) and \(x_{ijk}\). First of all, we build a directed graph \(G=(V,E)\), where V contains a vertex i for each student i in the classroom, and E contains an edge (ij) if student i wants to seat with student j. The weight associated with each student i in G, denoted by w(i), is the out-degree of the vertex i of G.Footnote 2 The weight associated with the variable \(x_{ij}\), denoted by \(w_{ij}\), is \(2(w(i)\times w(j))\), where w(i) and w(j) are the weights associated with vertices i and j, respectively, in the subgraph of G induced by the set of vertices \(\{i,j\}\) (i.e.; the weight of student i and j in \(G(\{i,j\})\)). The weight associated with the variable \(x_{ijk}\), denoted by \(w_{ijk}\), is \(3(w(i)\times w(j) \times w(k)/8)\), where w(i), w(j) and w(k) are the weights associated with vertices i, j and k, respectively, in \(G(\{i,j,k\})\). The value of \(w(i)\times w(j)\) ranges from 0 to 1 and the value of \(w(i)\times w(j) \times w(k)\) ranges from 0 to 8. This explains the fact that \(w(i)\times w(j) \times w(k)\) is divided by 8. Moreover, we multiply the weights by 2 in the tables of 2 and by 3 in the tables of 3. In this way, we maximize the number of satisfied students.Footnote 3

In the previous encoding, if the weight associated with a variable is 0, then the negation of this variable is added as a unit clause in the hard part. Moreover, an optimal solution corresponds to a fully-satisfied solution iff all the satisfied soft clauses of the form \((x_{ij}, w_{ij})\) and \((x_{ijk}, w_{ijk})\) have weight 2 and 3, respectively.

Observe that, for fully-satisfied instances, if we add to the hard part the negation of \(x_{ij}\) (i.e., the unit hard clause \(\lnot x_{ij}\)) for each variable \(x_{ij}\) whose associated weight is different from 2 and the negation of \(x_{ijk}\) (i.e., the unit hard clause \(\lnot x_{ijk}\)) for each variable \(x_{ijk}\) whose associated weight is different from 3, then we do not need to add any soft clause. Moreover, any satisfying assignment of the hard part allows us to derive a fully-satisfied solution. This case can be solved either with a SAT solver or with a MaxSAT solver fed with a MaxSAT instance that only contains hard clauses.

If there is no fully-satisfied solution, the objective is to find a solution that satisfies students as much as possible. Because of that, in the general case, we add the clauses \((x_{ij}, w_{ij})\) and \((x_{ijk}, w_{ijk})\) such that \(w_{ij}\ne 0\) and \(w_{ijk}\ne 0\) in the soft part of the encoding. In this way, we provide a solution that maximizes the number of satisfied students. In this case, we say that we have a maximally-satisfied solution.

Table 1. Experimental results for the encoding without soft clauses: Students: number of students; Clauses: mean number of clauses per instance; Variables: mean number of variables per instance; and Time: mean time, in seconds, needed to solve an instance. The number of solved instances, within a cutoff time of 3600 s, is shown in parentheses.

Finally, it is worth mentioning that it is possible to define a MaxSAT encoding of the TCPC using the set of propositional variables \(\{x_i^t | 1 \le i \le 28, 1 \le t \le 12\}\), where the intended meaning of \(x_i^t\) is that \(x_i^t\) is true iff student i sits at table t. However, all the experiments performed with encodings using this set of variables did not outperform the experiments performed with the encoding proposed in this section.

5 Experimental Results

We conducted an empirical investigation to assess how the MaxSAT-based approach to the TCPC works in practice on fully-satisfied instances. In the experiments, in order to analyze the scaling behavior, we considered different sizes of classrooms: the rows always have 2 tables of 2 and 1 table of 3, and the numbers of rows ranges from 1 to 20. So, the numbers of students per classroom ranges from 7 to 140. Besides, we assumed that each student gives a list of students she would like to sit with. We generated the preferences at random in such a way that we can guarantee that the generated instances have fully-satisfied solutions. We generated 50 different TCPC instances for each size of classroom, encoded them to weighted partial MaxSAT, and solved the resulting encodings with the exact MaxSAT solver WPM3 [7] using a cutoff time of 1 h. All the experiments were performed in a 2.3 GHz Intel PC with 1 GB RAM. The results obtained are shown in Tables 1 and 2.

Table 1 shows the results for the encoding that only contains hard clauses. Besides the hard clauses of Sect. 4, we add the unit hard clause \(\lnot x_{ij}\) for each variable \(x_{ij}\) whose associated weight is different from 2 and the unit hard clause \(\lnot x_{ijk}\) for each variable \(x_{ijk}\) whose associated weight is different from 3. Table 2 shows the results for the encoding that has the hard clauses of the previous encoding but also the soft clauses of the form \((x_{ij}, 2)\) and \((x_{ijk}, 3)\).

Table 2. Experimental results for the encoding with soft clauses: Students: number of students; Clauses: mean number of clauses per instance; Soft clauses: mean number of soft clauses per instance; Variables: mean number of variables per instance; and Time: mean time, in seconds, needed to solve an instance. The number of solved instances, within a cutoff time of 3600s, is shown in parentheses.

The empirical results show that the encoding without soft constraints finds optimal solutions quickly and scales well in practice. However, the encoding with soft constraints only finds optimal solutions quickly when the number of students is not greater than 35. In summary, the results show that MaxSAT allows one to find fully-satisfied solutions quickly using suitable encodings. For the TCPC, it is decisive to use efficient encodings for cardinality constraints.

6 Concluding Remarks

We have developed a method to encode the TCPC as a weighted partial MaxSAT problem, proved its NP-hardness, and carried out experiments to evaluate our approach using an exact MaxSAT solver. The results show that our method is useful because it does not need a dedicated algorithm; it is declarative, hence all stakeholders can be involved and understand the way the problem is specified; it is flexible because different classroom configurations can be solved with it; and it is efficient because it provides optimal solution in a reasonable amount of time. In the future, we plan to conduct a more exhaustive empirical investigation, model the problem using MinSAT [18, 19] instead of MaxSAT, and explore the possibility of using our method to encode similar team composition problems. In practice, our method could be combined with profiling techniques [13] to solve the group formation problem in Computer Supported Collaborative Learning applications. Our contributions could also be applied to other projects have taken a different approach to solve related problems using other AI techniques (see [3, 4, 6] and the references therein for further details).