
1 Motivation

Collaborative team-formation and staffing/scheduling problems in workforce management are of paramount importance in project deployment and large-scale corporations. Given the intrinsic hardness of multidisciplinary team formation and clustering, it is necessary to develop tools for this task. In this work we focus on a maximum-diversity regrouping assignment of MBA students; nevertheless, the reader can find potential applications in similar clustering problems. Experience shows that student skills and the learning process benefit significantly from highly diverse teams with regard to prior experience, age, gender, major and other features. MBA programs are usually split into four to six terms. Many MBA programs rotate the groups every term so that students train their ability to adapt to different groups, benefit from new points of view and expand their peer network. Creating highly diverse teams while keeping the repetition of peer pairs between terms at a minimum is a very challenging problem faced by program directors at the beginning of every trimester.

The contributions of this paper can be summarized in the following items:

  1. A novel combinatorial optimization problem called Max-Diversity Orthogonal Regrouping (MDOR) is here introduced. The goal is to find as many clusterings as terms, maximizing cluster diversity while keeping at a minimum the repetitions of pairs.

  2. A GRASP/VND methodology combined with Tabu Search is developed.

  3. The effectiveness of our proposal is tested with real-life students from the MBA program offered at IEEM Business School, Universidad de Montevideo, Uruguay.

The document is organized in the following manner. The related work is presented in Sect. 2. A mathematical programming formulation for the MDOR is introduced in Sect. 3. A full GRASP/VND heuristic combined with Tabu Search is presented in Sect. 4. Computational results based on real-life students are presented in Sect. 5. Section 6 contains concluding remarks and trends for future work.

2 Related Work

The works closest to ours in the scientific literature are [2, 3, 7]. A simplified model that closely resembles team formation is presented in [3], which considers the dining philosophers problem for the assignment of students into groups. In [7], the problem is modeled using integer linear programming. This work considers a centroid for each cluster. Two approaches are studied: the min-sum approach tries to minimize the distances with respect to the centroid; the second is a min-max approach whose goal is to minimize the maximum (i.e., the worst) distance.

The case study in [2] consists of the assignment of 235 students to 8 advisors. This work considers integer linear programming, and it is equivalent to the min-sum approach given by [7]. The problem belongs to the \(\mathcal {NP}\)-Hard class, and heuristics are available to tackle it [10]. A hybrid Genetic Algorithm is proposed in [9]. There, the authors suggest Tabu Search combined with strategic oscillations. Independently, [12] proposed an artificial bee-workers approach. In [8], a competitive General Variable Neighborhood Search (GVNS) is also proposed. An extension of this GVNS is offered in [4], with a Skewed VNS combined with a Shaking process to better explore the search space.

The goal in the Orthogonal Regrouping Problem is to partition a given set repeatedly, in such a way that every pair is included only once in some cluster. Well-known instances have been extensively treated, e.g., Kirkman’s Schoolgirl Problem and the Social Golfer Problem (SGP).

Here we introduce the MDOR problem, which is suitable for the assignment of MBA students to teams that are rebuilt in every term. It is worth remarking that our approach has potential applications in other scenarios, such as staffing and scheduling in workforce management [5], team formation models for collaboration [14], and team-formation algorithms for faultline minimization [1], among others.

3 Problem

In this section, we describe the main features of our problem, and then we present a mathematical programming formulation. A brief discussion covers particular cases, which will be considered to address the problem heuristically.

3.1 Problem Description

Our problem formulation requires a definition of distance between any two items. In the context of grouping MBA students, the distance between two students would represent how different they are in terms of a set of criteria (age, type of major, gender, work experience, admission test score, etc.) that the MBA Director chooses. In the case of the real-life sets used in our test, the criteria are:

  • Career (subdivided in percentage of Social Sciences, Natural and Exact Sciences content).

  • Score in the Admission Test.

  • Residence (urban or countryside).

  • Gender.

  • Age.

Career is split into three attributes in [0, 1] which account for the relative levels of Social Sciences, Natural Sciences and Exact Sciences. The score in the Admission Test and the Age are natural numbers, while the remaining attributes are binary. Once the attributes are selected, a distance function \(d_{ij}\) between individuals must be specified. In what follows, the normalized Euclidean distance is considered:

$$\begin{aligned} d_{ij} = d(x^{i},x^{j})=\frac{ \Vert x^{i}-x^{j}\Vert _{2}}{\max _{u\ne v} \Vert u-v\Vert _2}, \end{aligned}$$
(1)

where the distance between each pair of students is found by a numerical assignment to the different attributes (i.e., different coordinates). Observe that this normalization implies that \(0 \le d_{ij}\le 1\) for all the pairs of students i and j with corresponding attributes \(x^{i}\) and \(x^{j}\).
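As an illustration of Eq. (1), the following minimal Python sketch computes the normalized distance matrix. It assumes that the maximum in the denominator is taken over the observed pairs of students, and that every attribute has already been rescaled to [0, 1] by the caller; the paper does not state how raw scores and ages are scaled, so the attribute vectors below are hypothetical.

```python
import math
from itertools import combinations

def normalized_distances(points):
    """Return d[i][j] = ||x_i - x_j||_2 divided by the largest pairwise distance."""
    n = len(points)
    raw = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        raw[i][j] = raw[j][i] = math.dist(points[i], points[j])
    largest = max(max(row) for row in raw)
    if largest == 0.0:  # all students identical: every distance stays 0
        return raw
    return [[v / largest for v in row] for row in raw]

# Hypothetical vectors: (social, natural, exact, score, urban, gender, age),
# each coordinate assumed to be pre-scaled to [0, 1].
students = [
    (1.0, 0.0, 0.0, 0.8, 1, 0, 0.2),
    (0.0, 0.5, 0.5, 0.6, 0, 1, 0.5),
    (0.2, 0.2, 0.6, 0.9, 1, 1, 0.9),
]
d = normalized_distances(students)
```

By construction the matrix is symmetric, has a zero diagonal, and its largest entry equals 1.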

3.2 Problem Formulation

Consider the following parameters:

  • N the number of students.

  • G the number of teams (clusters).

  • K the number of attributes.

  • M the number of students per team: \(M = \frac{N}{G}\) (if integer).

  • S the number of terms (clusterings).

  • \(d_{ij}\) the distance between the students i and j.

  • R the maximum number of terms that any pair of students can share (R = 1 for an SGP instance).

Consider the set of binary decision variables \(x_{igs}\), such that \(x_{igs}=1\) if and only if the student i is assigned to the group g in term s, and \(x_{igs}=0\) otherwise. We introduce the MDOR problem as the following Integer Quadratic Problem:

$$\begin{aligned} \max _{x_{igs}}&\sum _{s=1}^{S} \sum _{g=1}^{G} \sum _{i=1}^{N-1} \sum _{j=i+1}^{N} d_{ij}x_{igs}x_{jgs}, \end{aligned}$$
(2)
$$\begin{aligned} s.t. \sum _{g=1}^{G} x_{igs}&= 1, \, \forall (i,s) \in \{1,\ldots ,N\} \times \{1,\ldots ,S\} \end{aligned}$$
(3)
$$\begin{aligned} \sum _{i=1}^{N} x_{igs}&= M, \, \forall (g,s) \in \{1,\ldots ,G\} \times \{1,\ldots ,S\}\ \end{aligned}$$
(4)
$$\begin{aligned} \sum _{s=1}^{S} \sum _{g=1}^{G} x_{igs}x_{jgs}&\le R, \, \forall (i,j): 1 \le i < j \le N \end{aligned}$$
(5)
$$\begin{aligned} x_{igs} \in \{0,1\}, \,&\forall (i,g,s) \in \{1,\ldots ,N\} \times \{1,\ldots ,G\} \times \{1,\ldots ,S\} \end{aligned}$$
(6)

The goal is to maximize the diversity-sum among all clusters and clusterings, where the intra-cluster diversity is precisely the distance-sum among all the pairs of that cluster. Constraint 3 states that each student is included in a single team. Constraint 4 states that the teams have precisely M students. Constraint 5 limits the number of times any pair of students can meet in different terms. Finally, Constraint 6 defines the binary domain for the decision variables.
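The model can be checked on small instances with a direct evaluation. The sketch below is not the authors' implementation: it assumes a solution is stored as `assign[s][i]`, the group of student i in term s (a hypothetical encoding under which Constraints 3 and 6 hold automatically), and it reads Constraint 5 as a per-pair limit, as described in the text.

```python
from collections import Counter
from itertools import combinations

def mdor_objective(assign, d):
    """Objective (2): sum of d[i][j] over pairs sharing a group, over all terms."""
    total = 0.0
    for term in assign:                      # term[i] = group of student i
        for i, j in combinations(range(len(term)), 2):
            if term[i] == term[j]:
                total += d[i][j]
    return total

def mdor_feasible(assign, G, M, R):
    """Check team sizes (Constraint 4) and the pair-repetition limit (Constraint 5)."""
    meetings = Counter()
    for term in assign:
        sizes = Counter(term)
        if any(sizes[g] != M for g in range(G)):
            return False
        for i, j in combinations(range(len(term)), 2):
            if term[i] == term[j]:
                meetings[(i, j)] += 1        # pair (i, j) shares one more term
    return all(count <= R for count in meetings.values())

# Toy instance: 4 students, 2 teams of 2, 2 terms, illustrative distances.
d = [[abs(i - j) / 3 for j in range(4)] for i in range(4)]
assign = [[0, 0, 1, 1], [0, 1, 0, 1]]
print(mdor_objective(assign, d), mdor_feasible(assign, G=2, M=2, R=1))
```

Repeating the first clustering in both terms would keep the team sizes valid but violate the repetition limit for R = 1.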

3.3 Discussion

Observe that the previous MDOR model is adequate when \(M = \frac{N}{G}\) is an integer. Next we comment on how to overcome this limitation and to minimize the number of repetitions as well.

Number of Students per Group. If \(\frac{N}{G}\) is not an integer, we can replace Constraint 4 with a minimal variation. In fact, consider the Euclidean division: \(N = G\times M+r\) for some remainder \(r: 0 \le r <G\), so that \(M = \lfloor N/G \rfloor \). We can arrange \(M+1\) students in r groups, and M students in the remaining \(G-r\) groups.
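As a quick illustration of this Euclidean division, the following Python sketch (the function name `group_sizes` is ours) returns the team sizes for the two cohort dimensions reported in the computational results, 34 students in 6 teams and 45 students in 8 teams.

```python
def group_sizes(N, G):
    """Split N students into G teams: r teams of M + 1 and G - r teams of M."""
    M, r = divmod(N, G)          # N = G * M + r with 0 <= r < G
    return [M + 1] * r + [M] * (G - r)

print(group_sizes(34, 6))  # 34 = 6 * 5 + 4
print(group_sizes(45, 8))  # 45 = 8 * 5 + 5
```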

As a more general setting, pick two vectors \(\varvec{a}\) and \(\varvec{b}\) representing lower and upper bounds on the number of students per group. Replace Constraint 4 with:

$$\begin{aligned} \sum _{i=1}^{N}x_{igs}&\ge a_g, \, \forall (g,s) \in \{1,\ldots ,G\} \times \{1,\ldots ,S\}\\ \sum _{i=1}^{N}x_{igs}&\le b_g, \, \forall (g,s) \in \{1,\ldots ,G\} \times \{1,\ldots ,S\}. \end{aligned}$$

Avoiding Repetitions. Avoiding repetitions is not always possible, depending on the parameters G, M and S of an MDOR instance. Even when it is possible, no polynomial-complexity algorithm is known for the general case; variations like the SGP-completion problem are known to be NP-complete [6, 13].

Let us consider a certain student, and let \(w_s\) be the number of feasible peer students for him/her during the term s. The sequence \(w_s\) satisfies the following recurrence:

$$\begin{aligned} w_1&= N-1;\\ w_{i+1}&= w_i - (M-1), \end{aligned}$$

since \(M-1\) new students are met during term i. Solving the recurrence directly gives \(w_s = N-1-(s-1)(M-1)\). When the courses are finished we get \(s=S\) and \(w_S = N-1-(S-1)(M-1)\). Hence, if \(N < (S-1)(M-1)+1\), it is impossible to avoid repetitions.
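The recurrence and the resulting test can be sketched in a few lines of Python. The function names are ours, and the numbers (45 students, teams of 6, 5 terms) are only illustrative. The test encodes the condition exactly as stated above; when it returns False, this does not by itself establish that a repetition-free schedule exists.

```python
def feasible_peers(N, M, S):
    """Return [w_1, ..., w_S] using w_1 = N - 1 and w_{s+1} = w_s - (M - 1)."""
    w = [N - 1]
    for _ in range(S - 1):
        w.append(w[-1] - (M - 1))
    return w

def repetitions_unavoidable(N, M, S):
    """Condition from the text: N < (S - 1)(M - 1) + 1."""
    return N < (S - 1) * (M - 1) + 1

w = feasible_peers(45, 6, 5)
# The closed form w_s = N - 1 - (s - 1)(M - 1) agrees with the recurrence.
assert w == [45 - 1 - (s - 1) * (6 - 1) for s in range(1, 6)]
print(w, repetitions_unavoidable(45, 6, 5))
```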

Two possible heuristic approaches arise to cope with the repetition problem. One might build high-diversity solutions while controlling the repetition level. Alternatively, one might generate repetition-free solutions and then choose and/or modify them seeking improved diversity. In this paper we introduce an algorithm that follows the first approach. A parameter \(GLOBAL\_REP\) is set; once solutions including a repetition for a certain pair have been generated more than \(GLOBAL\_REP\) times, the algorithm accepts the repetition.

4 Solution

GRASP and VND are well-known metaheuristics that have been successfully used to solve many hard combinatorial optimization problems. GRASP is a powerful multi-start process which operates in two phases. A feasible solution is built in a first phase, whose neighborhood is then explored in the Local Search Phase. The second phase is usually enriched by means of different variable neighborhood structures. For instance, VND explores several neighborhood structures in a deterministic order. Its success is based on the simple fact that different neighborhood structures do not usually have the same local optimum. Thus, the resulting solution is simultaneously a local optimum under all the neighborhood structures. The reader is invited to consult the comprehensive Handbook of Heuristics for further information [11]. Here, we develop a GRASP/VND methodology.

4.1 GRASP/VND Methodology for the MDOR

We follow a traditional VND scheme, which consists of three local searches:

  • Insert: moves a student to another group.

  • Swap: swaps two students from different groups.

  • 3-Chain: exchanges three students from three different groups.

The local searches are ordered from the simplest to the most complex: Insert, Swap and 3-Chain. A greedy randomized Construction phase precedes them.

To speed-up the evaluation of the objective function, the internal structures in the main algorithm consider two vectors:

  • \(x^c[i]\): current group for student i, and

  • \(sd^c[i][g]\): current sum-diversity between the student i and his/her peers in group g.

Observe that \(sd^c[i][g]=\sum _{j:x^c[j]=g}d_{i,j}\), and if we link the students in a graph with link-weights \(d_{i,j}\), by the Handshaking Lemma we get that the objective is:

$$\begin{aligned} f(x^c) = \frac{1}{2}\sum _{i=1}^{N}sd^c[i][x^c[i]]. \end{aligned}$$
(7)
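The following Python sketch builds the table \(sd^c\) for a single term and evaluates Eq. (7). It is an illustration with hypothetical names (`build_sd`, `objective`) and toy distances, not the authors' C++ code; groups are numbered from 0.

```python
def build_sd(x, d, G):
    """sd[i][g] = sum of d[i][j] over the students j (j != i) currently in group g."""
    n = len(x)
    sd = [[0.0] * G for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sd[i][x[j]] += d[i][j]
    return sd

def objective(x, sd):
    """Eq. (7): each intra-group pair is counted twice, hence the factor 1/2."""
    return 0.5 * sum(sd[i][x[i]] for i in range(len(x)))

# Toy data: 5 students, 2 groups, illustrative symmetric distances.
d = [[abs(i - j) / 4 for j in range(5)] for i in range(5)]
x = [0, 0, 1, 1, 0]
sd = build_sd(x, d, G=2)
print(objective(x, sd))
```

Keeping `sd` up to date after each move lets the local searches evaluate a candidate move in constant time instead of recomputing the whole objective.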

In the following, the details of the construction and local searches are presented, in the respective order.

4.2 Construction Phase

The search space is the set of all student assignments to the groups, where each student belongs to exactly one group. A feasible solution also meets the respective lower and upper bounds \(a_{g}\) and \(b_{g}\). In our Construction phase, students are iteratively inserted into groups until the lower bounds \(a_g\) are met. Then, to reach feasibility, the remaining students are assigned to some group while respecting the upper bounds \(b_g\). Two factors are considered for these group insertions: diversity and repetitions. In this construction phase, priority is given to repetitions. Therefore, a memory of the previous terms is used, and if two assignments have an identical number of repetitions, the assignment with maximum diversity is chosen. During the process, the diversity per group g for some student x is found using the following expression:

$$\begin{aligned} d^{\prime }(x,g)=\sum _{y\in g} \frac{d(x,y)}{|g|}. \end{aligned}$$

Note the normalization by the cardinality |g|; without it, groups with a larger number of students would always be preferred (Fig. 1).

Fig. 1. Construction phase

The following variables are considered during the Construction phase:

  • studentGroup[s]: the group assigned to student \(s \in \{1,\ldots ,N\}\).

  • atrsStandard[i][j]: the value of attribute \(j \in \{1,\ldots ,K\}\) for the student i.

  • groupCount[g]: the number of students in the group \(g \in \{1,\ldots ,G\}\).

The following functions are also considered:

  • assignOneRandomStudentToEachGroup(): assigns to each group one student picked uniformly at random.

  • assignGroupToStudForMinRepetitions(): picks a random student, and assigns him/her to the group that leads to the least number of repetitions. Ties are solved using the maximum diversity.
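The construction described in this section can be sketched as follows. This is a simplified reading, not the authors' code: the name `construct`, the matrix `met` (where `met[i][j]` counts the previous terms in which i and j shared a group) and the random generator argument are our assumptions, and the sketch requires at least G students and bounds with \(\sum _g b_g \ge N\).

```python
import random

def avg_diversity(i, members, d):
    """d'(i, g): average distance from student i to the current members of g."""
    return sum(d[i][y] for y in members) / len(members)

def construct(d, G, a, b, met, rng):
    """Greedy randomized construction: fewest repetitions first, then diversity."""
    n = len(d)
    order = list(range(n))
    rng.shuffle(order)
    groups = [[order[g]] for g in range(G)]       # one random student per group
    for i in order[G:]:
        # Fill groups up to their lower bounds first, then up to the upper bounds.
        open_groups = ([g for g in range(G) if len(groups[g]) < a[g]]
                       or [g for g in range(G) if len(groups[g]) < b[g]])
        best = min(open_groups,
                   key=lambda g: (sum(met[i][y] for y in groups[g]),
                                  -avg_diversity(i, groups[g], d)))
        groups[best].append(i)
    x = [0] * n
    for g, members in enumerate(groups):
        for i in members:
            x[i] = g
    return x

# Toy instance: pairs (0, 1) and (2, 3) already met in a previous term.
d = [[abs(i - j) / 3 for j in range(4)] for i in range(4)]
met = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
x = construct(d, 2, [2, 2], [2, 2], met, random.Random(0))
```

On this toy instance the greedy rule separates both pairs that have already met, whatever the random order.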

4.3 Insertion

In this local search, a student i is moved from its current group \(g_1\) to a different group \(g_2\). We remark that a move is applied only when the resulting solution is both better and feasible. To test feasibility, we just check the lower bound for the old group and the upper bound for the new group. The difference in the objective is the change in the diversity:

$$\begin{aligned} f(x^{n}) - f(x^{c}) = sd^{c}[i][g_{2}] - sd^{c}[i][g_{1}], \end{aligned}$$

being \(x^{n}\) the new solution and \(x^{c}\) the current solution (Fig. 2).

Fig. 2. Local Search I: Insertion
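A minimal Python sketch of the Insertion test is given below. For readability it recomputes the group sums on the fly through a helper `group_sum` (our name) instead of maintaining the cached table \(sd^c\); the bounds `a` and `b` and the toy distances are illustrative assumptions.

```python
def group_sum(i, g, x, d):
    """sd[i][g]: total distance from student i to the members of group g other than i."""
    return sum(d[i][j] for j in range(len(x)) if x[j] == g and j != i)

def insertion_delta(i, g2, x, d):
    """Change in diversity when student i leaves group x[i] and joins g2."""
    return group_sum(i, g2, x, d) - group_sum(i, x[i], x, d)

def insertion_feasible(i, g2, x, a, b):
    """Old group must stay at or above its lower bound, new group within its upper bound."""
    g1 = x[i]
    return g1 != g2 and x.count(g1) - 1 >= a[g1] and x.count(g2) + 1 <= b[g2]

def diversity(x, d):
    """Brute-force objective for one term, used only to check the delta."""
    n = len(x)
    return sum(d[i][j] for i in range(n) for j in range(i + 1, n) if x[i] == x[j])

d = [[0 if i == j else ((i + 1) * (j + 1) % 7) / 7 for j in range(5)] for i in range(5)]
x = [0, 0, 0, 1, 1]
moved = [1, 0, 0, 1, 1]                     # student 0 moved to group 1
assert abs(insertion_delta(0, 1, x, d) - (diversity(moved, d) - diversity(x, d))) < 1e-12
```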

4.4 Swap

In this local search, two students i and j, originally belonging to different groups \(g_i \ne g_j\), are exchanged, and the difference in the objective is:

$$\begin{aligned} f(x^{n}) - f(x^{c}) = (sd^{c}[i][g_{j}] - sd^{c}[i][g_{i}]) + (sd^{c}[j][g_{i}] - sd^{c}[j][g_{j}]) - 2 d_{ij}. \end{aligned}$$

A pseudocode for Swap is presented in Fig. 3.

Fig. 3. Local Search II: Swap
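The Swap variation can be verified numerically. The sketch below computes the gain of i entering \(g_j\), plus the gain of j entering \(g_i\), minus \(2d_{ij}\) (each student's new group sum must exclude the partner that has just left), and compares it with a brute-force recomputation. Helper names and distances are illustrative assumptions.

```python
def group_sum(i, g, x, d):
    """sd[i][g]: total distance from student i to the members of group g other than i."""
    return sum(d[i][j] for j in range(len(x)) if x[j] == g and j != i)

def swap_delta(i, j, x, d):
    """Change in diversity when students i and j (in different groups) trade places."""
    gi, gj = x[i], x[j]
    return (group_sum(i, gj, x, d) - group_sum(i, gi, x, d)
            + group_sum(j, gi, x, d) - group_sum(j, gj, x, d)
            - 2 * d[i][j])

def diversity(x, d):
    """Brute-force objective for one term, used only to check the delta."""
    n = len(x)
    return sum(d[i][j] for i in range(n) for j in range(i + 1, n) if x[i] == x[j])

d = [[0 if i == j else ((i + 1) * (j + 1) % 7) / 7 for j in range(6)] for i in range(6)]
x = [0, 0, 0, 1, 1, 1]
for i in range(6):
    for j in range(6):
        if x[i] != x[j]:
            y = list(x)
            y[i], y[j] = y[j], y[i]
            assert abs(swap_delta(i, j, x, d) - (diversity(y, d) - diversity(x, d))) < 1e-12
```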

4.5 3-Chain

Consider three different students i, j and k belonging to three different groups \(g_i\), \(g_j\) and \(g_k\). Student i is moved to \(g_j\), j is moved to \(g_k\) and k is moved to \(g_i\) (Fig. 4). The difference in the objective is:

$$\begin{aligned} f(x^{n}) - f(x^{c})&= (sd^{c}[i][g_{j}] - sd^{c}[i][g_{i}]) + (sd^{c}[j][g_{k}] - sd^{c}[j][g_{j}]) + (sd^{c}[k][g_{i}] - sd^{c}[k][g_{k}])\\&-( d_{ij} + d_{jk} + d_{ki} ) \end{aligned}$$

Fig. 4. Local Search III: 3-Chain
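The 3-Chain variation can be checked in the same way. In this sketch (illustrative names and toy distances), each student's gain is corrected by the distance to the student who has just left the destination group, which yields the term \(d_{ij}+d_{jk}+d_{ki}\).

```python
def group_sum(i, g, x, d):
    """sd[i][g]: total distance from student i to the members of group g other than i."""
    return sum(d[i][j] for j in range(len(x)) if x[j] == g and j != i)

def chain3_delta(i, j, k, x, d):
    """Change in diversity when i moves to g_j, j moves to g_k and k moves to g_i."""
    gi, gj, gk = x[i], x[j], x[k]
    return (group_sum(i, gj, x, d) - group_sum(i, gi, x, d)
            + group_sum(j, gk, x, d) - group_sum(j, gj, x, d)
            + group_sum(k, gi, x, d) - group_sum(k, gk, x, d)
            - (d[i][j] + d[j][k] + d[k][i]))

def diversity(x, d):
    """Brute-force objective for one term, used only to check the delta."""
    n = len(x)
    return sum(d[p][q] for p in range(n) for q in range(p + 1, n) if x[p] == x[q])

d = [[0 if p == q else ((p + 1) * (q + 1) % 11) / 11 for q in range(6)] for p in range(6)]
x = [0, 0, 1, 1, 2, 2]
i, j, k = 0, 2, 4                      # one student from each group
y = list(x)
y[i], y[j], y[k] = x[j], x[k], x[i]    # rotate the three students
assert abs(chain3_delta(i, j, k, x, d) - (diversity(y, d) - diversity(x, d))) < 1e-12
```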

4.6 Shake

In order to increase the diversity in the search space, a shake process takes place. Consider a k-neighborhood of the Swap operation, that is, an arbitrary application of k swaps. Shake picks a k-neighbor, and the VND phase is restarted with the obtained solution, provided that the Tabu List allows the shake (i.e., controlling the repetitions threshold). Figure 5 presents a full pseudocode for Shake. In the general algorithm, k starts equal to a parameter \(K\_MIN\) and is increased by a second parameter \(K\_STEP\) until the solution is improved or up to a third parameter \(K\_MAX\).

Fig. 5. Perturbation Step: Shake
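A rough Python sketch of the perturbation is shown below. The predicate `allowed` is a hypothetical stand-in for the Tabu List check, whose exact rules are given only in the pseudocode figure; by default every swap is allowed.

```python
import random
from collections import Counter

def shake(x, k, rng, allowed=lambda i, j: True):
    """Return a copy of x after up to k random swaps between different groups."""
    y = list(x)
    for _ in range(k):
        pairs = [(i, j) for i in range(len(y)) for j in range(i + 1, len(y))
                 if y[i] != y[j] and allowed(i, j)]
        if not pairs:              # nothing can be swapped: stop early
            break
        i, j = rng.choice(pairs)
        y[i], y[j] = y[j], y[i]
    return y

x = [0, 0, 0, 1, 1, 1]
y = shake(x, 1, random.Random(0))
assert Counter(y) == Counter(x)    # swaps never change the group sizes
```

Since swaps preserve the group sizes, a shaken solution remains feasible with respect to the size bounds, and only the repetition level has to be controlled.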

4.7 Main Algorithm

The main algorithm iterates over all terms. For each term, it starts by invoking Construction \(MAX\_TRIES\) times (a parameter). The most diverse solution is passed to the following step, where the cycle Shake - Insertion - Swap - 3-Chain is repeated \(T\_MAX\) times (another parameter). The best solution found (the most diverse clustering) is chosen for the term, and the algorithm moves on to the next one.
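The per-term loop can be summarized by the following skeleton. It is a sketch under stated assumptions: `construct`, `shake`, `searches` and `score` are placeholders supplied by the caller, the VND restarts from the first neighborhood after every improvement (the usual scheme), and k is reset to `k_min` after an improvement and capped at `k_max` otherwise, which is one plausible reading of the description.

```python
def vnd(x, searches, score):
    """Apply the local searches in order; go back to the first one after an improvement."""
    k = 0
    while k < len(searches):
        y = searches[k](x)
        if score(y) > score(x):
            x, k = y, 0
        else:
            k += 1
    return x

def solve_term(construct, shake, searches, score,
               max_tries, t_max, k_min, k_step, k_max):
    """Best of max_tries constructions, then t_max rounds of Shake followed by VND."""
    best = max((construct() for _ in range(max_tries)), key=score)
    k = k_min
    for _ in range(t_max):
        candidate = vnd(shake(best, k), searches, score)
        if score(candidate) > score(best):
            best, k = candidate, k_min       # improvement: restart the shake size
        else:
            k = min(k + k_step, k_max)       # no improvement: widen the shake
    return best

# Toy usage on integers: the "solution" is a number and the target is 10.
result = solve_term(construct=lambda: 0, shake=lambda x, k: x + k, searches=[],
                    score=lambda x: -abs(x - 10),
                    max_tries=3, t_max=50, k_min=1, k_step=1, k_max=3)
```

In the actual method, `searches` would hold the Insertion, Swap and 3-Chain moves, and `score` would be the diversity of the clustering for the current term.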

5 Computational Results

We compared the algorithm introduced here with the manual team assignment that was carried out in real life for two IEEM Business School MBA cohorts from 2014 and 2015: “MBA1314” (34 students, 6 teams) and “MBA1415” (45 students, 8 teams).

The algorithm was coded in C++ and executed on a home PC (Intel Core i7 2.2 GHz, 8 GB RAM). One hundred independent iterations were run (since GRASP is a multi-start metaheuristic) and the best solution was finally returned. As a preliminary stage, all the parameters were tuned by running several experiments. \(MAX\_TRIES\) and \(T\_MAX\) were set to 100 and 500 respectively. The Shake parameters were finally set to \(K\_MIN=K\_STEP=1\) and \(K\_MAX=3\). There is a trade-off between diversity and number of repetitions. A larger freezing factor \(GLOBAL\_REP\) in the Tabu List implies a lower level of diversity, as one test with MBA1415 shows in Table 1. All the results reported next were obtained with the Tabu List freezing factor set to 285.000 to keep repetitions at a minimum level.

Table 1. Diversity and repetitions per term, MBA1415: manual vs algorithm.
Table 2. Diversity per term, MBA1314 and MBA1415: manual vs algorithm.

Table 2 compares the diversity achieved by our algorithm vs the manual team assignment for the two cohorts and the five terms that the program spans; Table 3 does a similar comparison for repetitions per term. Our algorithm consistently outperformed the manual assignment when considering diversity and repetitions. It also took less time, since the longest execution took 50 min, while the manual assignment was reported to take more than 4 hours for each cohort.

Table 3. Repetitions per term, MBA1314 and MBA1415: manual vs algorithm.

6 Conclusions and Trends for Future Work

A novel combinatorial optimization problem named Max-Diversity Orthogonal Regrouping (MDOR) is introduced. It was conceived to cope with the problem of partitioning MBA cohorts into high-diversity teams, rotating the teams in every term and keeping the repetitions under a given (low) threshold. Nevertheless, the MDOR has potential applications in workforce management or team formation models for collaboration. The mathematical programming formulation is similar to a quadratic assignment problem, and the MDOR is presumably hard, even though a formal proof is not available in the literature.

A GRASP/VND methodology enriched with Tabu Search is proposed here to address the MDOR. A Shaking process is also included to further explore the search space. The tests presented show that this algorithm produces clusterings faster, with fewer repetitions and higher diversities than the manually built clusters applied to the real-life cohorts of the test cases. Future work includes formally establishing the computational complexity of the MDOR, and comparing our GRASP/VND methodology with alternative heuristics.