1 Introduction

Timetabling problems are eminently relevant in practice. Petrovic and Burke (2004) state that these problems can be found in various fields, including nurse rostering, timetabling of public transport systems, timetabling of sport events and educational timetabling. Schaerf (1999) subdivides educational timetabling into school timetabling, examination timetabling and course timetabling and generally defines the problem as scheduling lectures that involve teachers and students in a prefixed period of time, while taking different constraints into account. Two variants of course timetabling have been formulated for the second international timetabling competition (ITC-2007), including Post Enrollment Course Timetabling (PE-CTT) and Curriculum-based Course Timetabling (CB-CTT). The focus of this study is on the latter one. In PE-CTT, conflicts between courses are specified by the enrollment data, while in the case of CB-CTT courses of the same curriculum must not be scheduled at the same time.

Typically, educational timetabling problems are NP-complete, as demonstrated by Cooper and Kingston (1996) in five different ways. Finding feasible course and exam timetables is NP-complete via a reduction from graph coloring, as described by several authors, e.g., Werra (1985). In this context, lectures are represented by nodes, edges between nodes are introduced if the corresponding lectures belong to the same curriculum, and the colors represent different periods in the timetable.

Educational timetabling has been intensively studied. For an overview of solution techniques applied in this field we refer to the surveys by Schaerf (1999) and Qu et al. (2009). Recent developments in educational timetabling are analyzed by Kristiansen and Stidsen (2013). A survey dealing solely with CB-CTT has recently been conducted by Bettinelli et al. (2015).

With regard to metaheuristics, Lewis (2008) classifies approaches into one-stage optimization algorithms, two-stage optimization algorithms and algorithms that allow relaxations and explains their strengths and weaknesses. According to the author, none of these principles is generally superior. One-stage algorithms tackle hard and soft constraints at once, typically by making use of a weighted sum objective function with sufficiently high penalties for hard constraint violations. Two-stage techniques decompose the problem such that a feasible solution satisfying all hard constraints is found first, while the solution quality with respect to the soft constraints is improved in the second stage. Algorithms that allow relaxations refer to approaches where some features of the problem are relaxed, while violations of the hard constraints remain prohibited. These algorithms are thereby able to improve the solution with regard to the soft constraints. The relaxations, however, need to be removed during the search. This class of algorithms generally refers to two types of relaxations: events that cannot be scheduled in a feasible way are either left temporarily unscheduled or scheduled at artificial extra slots.

Bellio et al. (2012, 2016) proposed one-stage simulated annealing approaches. Müller (2009) used a two-stage hybrid approach. His algorithm is the winner of the ITC-2007 tracks on CB-CTT and exam timetabling and ranked fifth on the PE-CTT track, as stated on the website of the competition. In its improvement phase a hill climbing algorithm, a great deluge algorithm and a simulated annealing approach alternate. The second-ranking algorithm of the CB-CTT track, proposed by Lü and Hao (2010), is a two-stage algorithm that combines an intensification phase and a diversification phase based on iterated local search in the second stage. The two-stage approach by Abdullah et al. (2012) employs a multi-start great deluge algorithm with an electromagnetic-like mechanism. Abdullah and Turabieh (2012) applied a tabu-based memetic approach.

A metaheuristic based on adaptive large neighborhood search (ALNS) is presented in this paper. ALNS, originally proposed by Ropke and Pisinger (2006) for tackling vehicle routing problems, is based on iteratively destroying and subsequently repairing relatively large fractions of an incumbent solution. It is an extension of large neighborhood search (LNS) proposed by Shaw (1998) and closely related to the ruin and recreate method by Schrimpf et al. (2000). The survey by Ahuja and Orlin (2002) discusses techniques that make use of neighborhoods that grow exponentially in problem size or are too large to be searched explicitly in reasonable time; the latter characteristic also applies to LNS. Other authors have also employed large neighborhood structures for university timetabling problems, e.g., Abdullah et al. (2007). In the field of high school timetabling, approaches based on ALNS have been proposed by Sørensen et al. (2012), Sørensen and Stidsen (2012) and Kristiansen et al. (2013).

The contributions of this paper are the following: a state-of-the-art metaheuristic based on ALNS is developed for the ITC-2007 formulation of CB-CTT. The proposed variant of ALNS has the distinctive feature of decreasing the upper bound of destruction gradually over time. It is documented that this feature is favorable in ALNS. The algorithm is able to generate competitive results for the ITC-2007 CB-CTT benchmark instances. Moreover, new best known solutions have been found for five instances. The outline is as follows. Section 2 provides a brief description of the ITC-2007 formulation of CB-CTT, a mixed integer programming model, and results for the ITC-2007 benchmark instances generated by CPLEX. The proposed solution approach is presented in Sect. 3. Computational results for the ITC-2007 benchmark instances are shown in Sect. 4. In Sect. 5 the contribution of the different operators and the effects of the features of the algorithm are discussed, while Sect. 6 concludes.

2 Problem description

2.1 Timetabling competitions

The first international timetabling competition (ITC-2002) had its emphasis on course timetabling. Within the competition, a problem formulation was stated and benchmark instances were released. Since then, the formulation has been used by several other researchers, as noted by McCollum et al. (2010). One of the objectives of the second competition was to reduce the gap between theory and practice by providing more realistic formulations and benchmark instances that were derived from real world problems. Each of the three tracks of the competition focused on different problems of university timetabling, including exam timetabling, PE-CTT and CB-CTT. The third international timetabling competition (ITC-2011) was about high school timetabling. The focus of this study is on CB-CTT. In particular, the formulation of the ITC-2007 is used and the proposed algorithm is tested on the ITC-2007 benchmark instances.

The CB-CTT problem consists of scheduling lectures of courses to periods and rooms, as described by Di Gaspero et al. (2007). The working days of a week are split into periods for which a timetable has to be found. For each course the number of lectures and the number of attending students are known, as well as the teacher holding the course. Furthermore, each course might be part of several curricula. Timetables that satisfy a set of hard constraints are called feasible. In the following, the constraints of the problem are briefly described and the respective abbreviation is given in brackets. The hard constraints include that all lectures have to be scheduled (Lectures), at most one lecture can take place in a room at a time (RoomOccupancy), at most one course of the same curriculum or taught by the same teacher can be held at the same time (Conflicts) and availabilities of teachers have to be respected (Availability). Any unscheduled lecture counts as one violation, as well as each excessive lecture per room and period. Each pair of conflicting lectures counts as one violation too. Finally, whenever a lecture is scheduled at an unavailable period, i.e., when its teacher is not available, one violation is counted.

The quality of a timetable is measured in terms of soft constraint violations. Whenever a lecture takes place in a room with a capacity less than the number of students of the course, each student exceeding the capacity limit counts as one violation (RoomCapacity). Moreover, lectures of the same course should preferably take place in the same room (RoomStability). Each additional room corresponds to one violation. Another soft constraint aims at spreading the lectures of each course over a predefined number of working days (MinWorkingDays). Each day less than the required spread is counted as one violation. Finally, curricula should be as compact as possible (IsolatedLectures). For each curriculum a lecture that is not adjacent to any other lecture of the same curriculum is counted as one violation.
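To make the cost structure concrete, the following minimal Python sketch evaluates the four soft constraints for a complete, feasible assignment. The data layout and helper names (assignment, students, room_capacity, min_days, curricula) are illustrative assumptions, and the default weights correspond to the commonly used ITC-2007 values (1, 1, 5 and 2 for RoomCapacity, RoomStability, MinWorkingDays and IsolatedLectures).

```python
from collections import defaultdict

def soft_penalty(assignment, students, room_capacity, min_days, curricula,
                 w_cap=1, w_stab=1, w_days=5, w_comp=2):
    """Evaluate the four CB-CTT soft constraints for a complete assignment.

    assignment:    {(course, lecture_index): (day, period, room)}
    students:      {course: number of enrolled students}
    room_capacity: {room: capacity}
    min_days:      {course: required spread over working days}
    curricula:     {curriculum: set of courses}
    """
    penalty = 0
    rooms_used = defaultdict(set)    # course -> rooms hosting its lectures
    days_used = defaultdict(set)     # course -> days with at least one lecture
    slot_courses = defaultdict(set)  # (day, period) -> courses taught there

    for (course, _), (day, period, room) in assignment.items():
        # RoomCapacity: one point per student above the room capacity
        penalty += w_cap * max(0, students[course] - room_capacity[room])
        rooms_used[course].add(room)
        days_used[course].add(day)
        slot_courses[(day, period)].add(course)

    for course in students:
        # RoomStability: one point per additional room beyond the first
        penalty += w_stab * max(0, len(rooms_used[course]) - 1)
        # MinWorkingDays: one point per day short of the required spread
        penalty += w_days * max(0, min_days[course] - len(days_used[course]))

    # IsolatedLectures: a curriculum lecture with no lecture of the same
    # curriculum in an adjacent period counts as one violation
    for cu_courses in curricula.values():
        for (day, period), here in slot_courses.items():
            if not here & cu_courses:
                continue
            neighbours = slot_courses.get((day, period - 1), set()) | \
                         slot_courses.get((day, period + 1), set())
            if not neighbours & cu_courses:
                penalty += w_comp
    return penalty
```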

2.2 Mathematical model

The presented model for the CB-CTT problem is based on the integer programming formulation proposed by Lach and Lübbecke (2012). Since they use a model for a decomposition approach, slight adaptations are made for a formulation of the whole problem. A similar model is stated by Burke et al. (2010). The notation used in the model is summarized in Table 1.

Table 1 Notation of the mathematical model

D denotes the set of days for the timetable. Each day is divided into periods, where the set of periods is denoted by P. In each period the same set of rooms R is available. The set of courses is denoted by C. Each course c has \(l_c\) lectures that need to be scheduled. These lectures should be spread over a minimum number of days specified by \(m_c\). Each course belongs to one or multiple curricula and is taught by one teacher. The set of curricula is denoted by CU and the set of courses belonging to a curriculum cu is denoted by \(K_{cu}\). T denotes the set of teachers and \(L_t\) the set of courses taught by teacher t. For each course c the set of the day-period pairs the course must not be assigned to is denoted by \(U_c\). On the other hand, \(A_{(p,d)}\) denotes the set of courses that can be scheduled at period p on day d. The capacity of the rooms and the number of students of the courses are known. Consequently one can compute the capacity shortage \(s_{c,r}\) for assigning course c to room r. The penalties are denoted by \(p^\text {TYPE}\) with the respective superscript.

The timetable is represented by the binary variable \(x_{c,d,p,r}\), which takes the value 1 if a lecture of course c is scheduled at period p on day d in room r. It is defined only for day-period pairs for which course c is available. Therefore, the formulation takes the availability constraint implicitly into account. Additional binary and integer decision variables are needed in order to formulate the soft constraints. The decision variables are defined as follows.

$$\begin{aligned} x_{c,d,p,r}&\in \{0,1\}\quad&\text {1 if a lecture of course }c\text { is held at period }p\text { on day }d\text { in room }r \\ y_{c,r}&\in \{0,1\}\quad&\text {1 if at least one lecture of course }c\text { is held in room }r \\ z_{c,d}&\in \{0,1\}\quad&\text {1 if at least one lecture of course }c\text { is held on day }d \\ w_{c}&\in \mathbb {Z}_{\ge 0}\quad&\text {number of days by which course }c\text { falls short of its required spread }m_c \\ v_{cu,p,d}&\in \{0,1\}\quad&\text {1 if curriculum }cu\text { has an isolated lecture at period }p\text { on day }d \end{aligned}$$

The weighted sum of the soft constraint penalties is minimized by the objective function (1). The sum of the variables \(x_{c,d,p,r}\) weighted by the penalty \(p^\text {CAP}\) times the respective capacity shortage \(s_{c,r}\) for assigning course c to room r yields the capacity penalty term. By employing the decision variable \(y_{c,r}\), indicating whether a lecture of course c is held in room r, one can easily compute the number of rooms used by the course. The deviation of this number from 1 times the penalty coefficient \(p^\text {STAB}\) yields the penalty for violations of the constraint RoomStability. The penalty term for an insufficient spread over working days is computed as the number of days less than the requirement times the respective weight. Similarly, for determining the penalty term for the curriculum compactness, one has to compute the number of isolated lectures weighted by \(p^\text {COMP}\).

$$\begin{aligned}&\min \sum _{d\in D, p\in P, c\in A_{(p,d)}, r\in R}x_{c,d,p,r}\cdot s_{c,r}\cdot p^{\text {CAP}} + \sum _{c\in C}\left( \sum _{r\in R}y_{c,r}-1\right) \cdot p^{\text {STAB}} \nonumber \\&\quad + \sum _{c\in C}w_c \cdot p^{\text {DAYS}} + \sum _{cu\in CU, p\in P, d\in D}v_{cu,p,d}\cdot p^{\text {COMP}} \end{aligned}$$
(1)

The availability constraint is implicitly taken into account. The other hard constraints are formulated as follows.

$$\begin{aligned} \sum _{d\in D,\, p\in P:\, c\in A_{(p,d)}}\;\sum _{r\in R} x_{c,d,p,r} = l_c \qquad \forall \, c\in C \end{aligned}$$
(2)
$$\begin{aligned} \sum _{c\in L_t\cap A_{(p,d)}}\;\sum _{r\in R} x_{c,d,p,r} \le 1 \qquad \forall \, t\in T,\; d\in D,\; p\in P \end{aligned}$$
(3)
$$\begin{aligned} \sum _{c\in K_{cu}\cap A_{(p,d)}}\;\sum _{r\in R} x_{c,d,p,r} \le 1 \qquad \forall \, cu\in CU,\; d\in D,\; p\in P \end{aligned}$$
(4)
$$\begin{aligned} \sum _{c\in A_{(p,d)}} x_{c,d,p,r} \le 1 \qquad \forall \, r\in R,\; d\in D,\; p\in P \end{aligned}$$
(5)

Constraints (2) make sure that all lectures of the courses are scheduled. Due to Constraints (3) each teacher holds at most one lecture at a time. Constraints (4) guarantee that two lectures of the same curriculum are not held in parallel. Finally, Constraints (5) ensure that a room can accommodate at most one lecture at a time. The soft constraints are modeled in the following way.

$$\begin{aligned} z_{c,d} \le \sum _{p\in P:\, c\in A_{(p,d)}}\;\sum _{r\in R} x_{c,d,p,r} \qquad \forall \, c\in C,\; d\in D \end{aligned}$$
(6)
$$\begin{aligned} \sum _{d\in D} z_{c,d} + w_c \ge m_c \qquad \forall \, c\in C \end{aligned}$$
(7)
$$\begin{aligned} \sum _{c\in K_{cu}\cap A_{(p,d)}}\;\sum _{r\in R} x_{c,d,p,r} - \sum _{c\in K_{cu}}\;\sum _{r\in R}\left( x_{c,d,p-1,r}+x_{c,d,p+1,r}\right) \le v_{cu,p,d} \qquad \forall \, cu\in CU,\; d\in D,\; p\in P \end{aligned}$$
(8)
$$\begin{aligned} \sum _{d\in D}\;\sum _{p\in P:\, c\in A_{(p,d)}} x_{c,d,p,r} \le M\cdot y_{c,r} \qquad \forall \, c\in C,\; r\in R \end{aligned}$$
(9)

Constraints (6) link \(z_{c,d}\) with \(x_{c,d,p,r}\) in a way that \(z_{c,d}\) can only be set to 1 if there is at least one lecture of course c held on day d. For each course, \(z_{c,d}\) and \(w_c\) are connected by Constraints (7), such that \(w_c\) counts the number of days less than the required spread. Constraints (8) are used to identify isolated lectures, represented by the variable \(v_{cu,p,d}\). The first term of the inequality constraint takes the value 1 if a lecture of the corresponding curriculum is held at the particular time and 0 otherwise. The second term represents the schedule of the previous and the subsequent periods. If at least one lecture of the same curriculum takes place in an adjacent period, either 1 or 2 is subtracted and \(v_{cu,p,d}\) can take the value 0. Otherwise an isolated lecture with respect to curriculum cu is identified and \(v_{cu,p,d}\) is set to 1. For the first and the last period of each day the second term has to be adapted in a way that the previous and the subsequent period is omitted, respectively. Finally, Constraints (9) link \(x_{c,d,p,r}\) with \(y_{c,r}\), where M denotes a large number. Variable \(y_{c,r}\) is set to 1 if at least one lecture of course c is held in room r.

2.3 Results

We applied CPLEX 12.5 to the mixed integer programs of the ITC-2007 CB-CTT benchmark instances on a Linux PC with an Intel Core i7-4770 CPU running at 3.4 GHz and 16 GB memory. The results are shown in Table 2. Either the optimal solution or the best solution found after 24 hours is reported in column IP. Note that this time limit clearly exceeds the one of the ITC-2007. The column Best refers to the best known solutions, available on the CB-CTT website. Bold numbers indicate optimal solutions. Besides the best known solutions, the website also states the solution techniques used to generate them as well as the best known lower bounds for the instances. Hence, the optimality of the bold values is either proven by the (exact) solution technique itself, or by the match of the lower and the upper bound. CPLEX found optimal solutions for 4 out of 21 instances. In particular, the optimal solutions for the instances comp01, comp04, comp08 and comp11 were found within 35, 2077, 6742 and 8 s, respectively. For some instances, however, the solution found deviates significantly from the best known solution.

Table 2 Results of the integer programs of the ITC-2007 instances

3 Solution approach

For NP-hard problems, to which timetabling problems typically belong, one usually has to resort to (meta-)heuristic methods, particularly when good solutions have to be found in reasonable time. The previously shown results of the exact method underline this statement for the considered problem class. Therefore, a metaheuristic approach for the CB-CTT problem is presented in this section. More precisely, an ALNS approach based on the papers by Ropke and Pisinger (2006) and Pisinger and Ropke (2007) is developed.

3.1 Adaptive large neighborhood search

3.1.1 General description

In LNS an initial solution has to be created first. At each iteration, parts of the incumbent solution are destroyed and subsequently repaired. New solutions are accepted according to a certain criterion to become the new incumbent solution. The algorithm keeps track of the best solution, i.e., the solution with the least soft penalties among the solutions with the smallest number of unscheduled lectures. In the end the best solution found is returned. ALNS extends LNS in a way that several destroy and repair operators are used and their selection probability is biased towards the best-performing ones. The approach is sketched in Algorithm 1, while essential steps are described in detail in the following paragraphs.

Algorithm 1 Overview of the proposed ALNS
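The following Python sketch illustrates the overall loop under simplifying assumptions: the operators, acceptance test and cost function are passed in as callables, the sigma values are placeholders, the "not accepted before" bookkeeping is omitted, and the time normalization of the repair weights (described below) is left out. All names are illustrative rather than the authors' implementation.

```python
import copy
import random
import time

def roulette(weights):
    """Select an operator with probability proportional to its weight."""
    ops, w = zip(*weights.items())
    return random.choices(ops, weights=w, k=1)[0]

def alns(initial, destroy_ops, repair_ops, accept, cost, time_limit,
         segment=100, reaction=0.1, sigma=(10.0, 5.0, 2.0)):
    """Skeleton of the destroy-and-repair loop with adaptive operator weights."""
    current, best = initial, initial
    weights = {op: 1.0 for op in destroy_ops + repair_ops}
    scores = {op: 0.0 for op in weights}
    calls = {op: 0 for op in weights}
    start, iteration = time.time(), 0
    while time.time() - start < time_limit:
        destroy = roulette({op: weights[op] for op in destroy_ops})
        repair = roulette({op: weights[op] for op in repair_ops})
        candidate = repair(destroy(copy.deepcopy(current)))
        calls[destroy] += 1
        calls[repair] += 1
        accepted = accept(candidate, current)
        if cost(candidate) < cost(best):
            best, gain = candidate, sigma[0]
        elif cost(candidate) < cost(current):
            gain = sigma[1]
        elif accepted:
            gain = sigma[2]
        else:
            gain = 0.0
        if accepted:
            current = candidate
        scores[destroy] += gain
        scores[repair] += gain
        iteration += 1
        if iteration % segment == 0:  # end of segment: recompute weights
            for op in weights:
                if calls[op] > 0:
                    weights[op] = (1 - reaction) * weights[op] \
                                  + reaction * scores[op] / calls[op]
                scores[op], calls[op] = 0.0, 0
    return best
```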

The adaptive element of ALNS refers to the dynamic weight adjustment mechanism. Initially, each operator has the same selection probability until weights are recomputed. At each iteration, a destroy operator and a repair operator are selected separately by a roulette wheel mechanism. Given that there are k operators with weights \(w_i, i\in \{1,\dots ,k\}\), operator j is selected with the probability \(w_j/\sum _{i=1}^k w_i\). After applying the operators to the incumbent solution the corresponding scores are updated by adding a value \(\sigma \) depending on the solution quality.

$$\begin{aligned} \sigma = {\left\{ \begin{array}{ll} \sigma _1 &{} \text {if the solution is a new global best}\\ \sigma _2 &{} \text {if the solution is better than the incumbent and not accepted before}\\ \sigma _3 &{} \text {if the solution is worse than the incumbent, accepted and has not been}\\ &{} \text { accepted before} \end{array}\right. } \end{aligned}$$

This scheme encourages operators that find solutions having not been accepted before, in order to direct the algorithm towards unvisited regions of the search space.

The search process of the algorithm is divided into segments of \(s=100\) iterations. Each time the algorithm has performed s iterations, the end of the segment is reached and the weights are recomputed based on the scores achieved in the last segment. In general, the weights of the heuristics are recomputed as a convex combination of the old weight and the average score achieved in the last segment.

It turns out that the repair phase is the most time-consuming part of the algorithm. Moreover, the computation times vary significantly between heuristics. Consequently, it appears to be particularly important to normalize the scores of the repair operators by their computational effort in order to achieve a good trade-off between quality and time, as suggested by Pisinger and Ropke (2007). Therefore, whenever the weight of repair heuristic j is computed, its average computation time \(t_j\) and the average computation time \(t_{2stagebest}\) of the reference repair operator 2-stage best are taken into account. Given the old weight \(w_j^{old}\) of repair operator j, the formula to compute its new weight \(w_j^{new}\) is given by

$$\begin{aligned} w_j^{new} = w_j^{old}\cdot (1-r)+r\cdot \frac{\pi _j}{\phi _j}\cdot \frac{t_{2stagebest}}{t_j} \end{aligned}$$

where \(r\in [0,1]\) denotes the reaction factor, \(\pi _j\) denotes the score achieved during the last segment and \(\phi _j\) denotes the number of times operator j has been called in the last segment. The last fraction is omitted for non-repair operators.

Incorporating the average computation time of a reference operator is motivated by the fact that the average computation time of an operator may change during the search. Omitting the reference computation time in the formula would consequently result in an imbalanced acquisition of scores over the different stages of the search. The selection of the reference operator itself is generally arbitrary, as this choice barely affects the final operator selection probabilities. We decided to pick the 2-stage best operator, as this operator is regularly called in any stage of the search and proves to be essential for the performance of the algorithm, as will be shown in Sect. 5. The operator 2-stage best itself will be described in Sect. 3.3.1.

At the end of each segment, the average computation times of the repair operators, including 2-stage best, are updated by computing

$$\begin{aligned} t_j^{new}=t_j^{old}\cdot (1-r)+r\cdot \frac{\tau _j}{\phi _j} \end{aligned}$$

where \(\tau _j\) denotes the sum of the computation times of operator j in the last segment, \(\phi _j\) denotes the number of calls of operator j in the last segment and \(r\in [0,1]\) denotes the reaction factor.
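The two update formulas can be written compactly as follows; the sketch assumes that the score, the number of calls and the running time of each operator are tracked per segment, and all identifiers are illustrative.

```python
def update_repair_weight(w_old, score, calls, t_avg, t_ref, r):
    """New repair weight: convex combination of the old weight and the average
    segment score, normalized by the operator's average computation time
    relative to the reference operator (2-stage best)."""
    return w_old * (1 - r) + r * (score / calls) * (t_ref / t_avg)

def update_avg_time(t_old, segment_time, calls, r):
    """Smoothed average computation time of a repair operator at segment end."""
    return t_old * (1 - r) + r * (segment_time / calls)
```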

3.1.2 Acceptance scheme

Ropke and Pisinger (2006) suggest embedding this algorithm in a simulated annealing (SA) framework, developed by Kirkpatrick et al. (1983). A new solution \(x'\) is accepted with the probability \(\min \left\{ 1,e^{-(f(x')-f(x))/T_i}\right\} \), where f denotes an evaluation function, x the incumbent solution and \(T_i\) the temperature in iteration i. Consequently, all solutions with an objective value less than that of the incumbent solution are accepted, while solutions with a greater objective value are accepted with a certain probability. The starting temperature is defined implicitly, such that in the beginning a solution that is \(\psi \)-percent worse than the initial solution is accepted with a probability of 50 %. At the end of each iteration the temperature is decreased by a cooling rate, in a way that the temperature reaches a target \(T_\text {end}\) in the last iteration, where \(T_\text {end}\) is passed as a parameter. Similarly to Lewis and Thompson (2015), the cooling rate is calculated for each iteration individually, since a time limit is used as termination criterion. More precisely, the cooling rate \(\rho _i\) of iteration i is computed as \(\rho _i=(T_\text {end}/T_i)^{1/\mu _i}\), where \(\mu _i\) denotes the expected number of iterations before the search terminates. \(T_{i+1}\) is then computed as \(T_{i+1}=T_i\cdot \rho _i\).

In order to predict the expected number of remaining iterations, the average total number of iterations \(\nu \) for a given time limit \(t_\text {end}\) is pre-computed as an average over five runs. The expected number of remaining iterations \(\mu _i\) is then computed as \(\mu _i=\nu \cdot \left( 1-\frac{t_i}{t_\text {end}}\right) \), where \(t_i\) denotes the time consumed until iteration i. The pre-computed average iteration limit is used, since for the proposed algorithm one can hardly make reliable predictions about the number of remaining iterations solely on the basis of already performed iterations. One has to note that, in contrast to our approach, Lewis and Thompson (2015) do not rely on a pre-computed average iteration limit when deciding upon the expected number of remaining iterations.
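A sketch of the acceptance and cooling scheme; it assumes that psi is given as a fraction (e.g., 0.05 for 5 %) and that the pre-computed average number of iterations nu is available, and the function names are illustrative.

```python
import math
import random

def start_temperature(f_init, psi):
    """Implicit start temperature: a solution psi (as a fraction) worse than
    the initial one is accepted with probability 0.5."""
    return -(psi * f_init) / math.log(0.5)

def accept(f_new, f_cur, temperature):
    """Simulated annealing acceptance criterion."""
    return f_new < f_cur or random.random() < math.exp(-(f_new - f_cur) / temperature)

def expected_remaining(nu, elapsed, time_limit):
    """Expected number of remaining iterations, based on the pre-computed nu."""
    return nu * (1.0 - elapsed / time_limit)

def next_temperature(temperature, t_end, remaining_iterations):
    """Cool so that the temperature reaches t_end in the (expected) last iteration."""
    rho = (t_end / temperature) ** (1.0 / max(1.0, remaining_iterations))
    return temperature * rho
```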

3.1.3 Destroy limit

At each iteration, n lectures are removed from the current schedule to be reinserted, while the remaining lectures are fixed. The integer n is randomly drawn from the interval \([1,n^{max}]\), where \(n^{max}\) denotes the destroy limit, i.e., the maximum number of lectures that can be removed. The reference destroy limit \(n_0^{max}\) is set to d percent of the total number of lectures. It turns out that using different percentages depending on the instance size is beneficial, i.e., \(d_s\) is used for small instances with less than 280 lectures and \(d_l\) for larger instances.

The destroy limit is then gradually decreased, in a way that in the last iteration m at most \(\frac{1}{\delta }\) of the reference destroy limit \(n_{0}^{max}\) can be destroyed. The function \(n^{max}(i)\) that gives the destroy limit for any iteration i has the form \(n^{max}(i)=\left\lfloor n_0^{max}-i^y \right\rfloor \), resulting in a steep decrease of the destroy limit in the beginning and a flatter decrease in the end of the search. Since \(n^{max}(m)=\left\lfloor \frac{1}{\delta }\cdot n_0^{max}\right\rfloor \), the value of y can be computed by setting \(n_0^{max}-m^y=\frac{1}{\delta }\cdot n_0^{max}\). This leads to the formula

$$\begin{aligned} n^{max}(i) = \left\lfloor n_{0}^{max} - {i}^{\log _m\left( \frac{\delta -1}{\delta }\cdot n_0^{max}\right) }\right\rfloor \end{aligned}$$

where \(m=\mu _i+i\) denotes the (expected) iteration limit. The expected number of remaining iterations \(\mu _i\) is computed as in the previous subsection.
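The decreasing schedule can be implemented directly from the closed form above; a sketch with illustrative parameter names, where m is the expected iteration limit.

```python
import math

def destroy_limit(i, n0_max, m, delta):
    """Destroy limit at iteration i: decreases from n0_max towards n0_max/delta
    at the expected last iteration m, steeply at first and flatter towards the end."""
    # exponent y chosen such that n0_max - m**y = n0_max / delta
    y = math.log(((delta - 1) / delta) * n0_max, m)
    return max(1, math.floor(n0_max - i ** y))
```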

Decreasing the destroy limit considerably increases the number of iterations within a given time limit, since repairing smaller parts of a solution typically requires less computation time. On the other hand, by destroying fewer lectures one could potentially lose diversification, which in turn could outweigh the gain in performance resulting from the larger number of iterations. Consequently, when setting the decreasing parameter \(\delta \), this trade-off has to be taken into account. A motivation for decreasing the destroy limit based on computational tests is given in Sect. 5.2.

In the fields of resource-constrained project scheduling and lot-sizing with setup times, respectively, Muller (2009) and Muller et al. (2012) suggested to gradually reduce the parameter that controls the degree of destruction. However, in the terminology of this paper, their respective parameters correspond to the actual number of requested removals rather than the destroy limit.

3.1.4 Temperature reheats

The temperature is reheated, whenever h consecutive iterations have not found a new best solution. For that purpose, the temperature is set in the same way as described in Sect. 3.1.2, but with respect to the solution quality of the incumbent solution. More precisely, the new temperature is again defined implicitly, such that a solution that is \(\psi \)-percent worse than the incumbent solution is accepted with 50 % probability. Temperature reheats have been employed by several authors, e.g., Connolly (1992). In addition to temperature reheats the destroy limit is set to its initial level. The combination of the larger destroy limit and the higher acceptance probability of worse solutions helps to escape from local optima. The decreasing speed of the destroy limit and the cooling rate are adjusted to the expected number of remaining iterations.

3.1.5 Infeasible solutions

The algorithm generally allows infeasible solutions and is thereby able to take shortcuts by traversing infeasible regions of the search space. Benefits and drawbacks of permitting infeasible solutions are discussed by Lewis (2008). The repair heuristics try to schedule lectures even if their insertion costs are high. However, when the algorithm has to decide whether an infeasible solution is accepted, the penalty p is added to the objective value for each unassigned lecture. Initially, p is set to \(\max (1,p_{avg})\), where \(p_{avg}\) denotes the average penalty per lecture of the initial solution. The penalty p is then dynamically adjusted during the search. Whenever a feasible solution is accepted, the penalty p is set to \(\max (1,p^{old}/\alpha )\), where \(\alpha \) denotes a parameter and \(p^{old}\) the previous value of p. Conversely, whenever a solution with unassigned lectures is accepted, p is set to \(\min (p^{old}\cdot \alpha ,p_{max})\), where \(p_{max}\) denotes the worst case penalty for scheduling a lecture. \(p_{max}\) is computed as

$$\begin{aligned} p_{max}=p^{\text {STAB}}+\max _{c\in C}\left( \max _{r\in R}\left( p^{\text {CAP}}\cdot s_{c,r}\right) +p^{\text {COMP}}\cdot \left| \{cu\in CU:c\in K_{cu}\}\right| \right) \end{aligned}$$

where the same notation is used as described in Table 1. The penalty that corresponds to MinWorkingDays is omitted, since scheduling a lecture cannot worsen the objective value with respect to this constraint.
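A sketch of the dynamic penalty update, with illustrative names; alpha and p_max follow the definitions above.

```python
def update_unassigned_penalty(p, accepted_solution_is_feasible, alpha, p_max):
    """Adjust the penalty per unassigned lecture: relax it after a feasible
    solution has been accepted, tighten it after an infeasible one."""
    if accepted_solution_is_feasible:
        return max(1.0, p / alpha)
    return min(p * alpha, p_max)
```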

Bellio et al. (2012) use the same adjustment scheme for the penalty that corresponds to violations of the Conflicts constraint, albeit with different bounds. This method has been proposed by Gendreau et al. (1994). In general, the penalty is updated whenever a certain number of consecutive solutions is feasible or infeasible, respectively.

3.2 Destroy operators

3.2.1 Related removal

The related operator is inspired by Shaw (1998) and aims to remove similar lectures. The relatedness measure between lectures of two courses i and j is defined as

$$\begin{aligned} R(i,j):=\beta \cdot \frac{\min (o_i,o_j)}{\max (o_i,o_j)}+\frac{k_{ij}}{g_i+1} \end{aligned}$$

where \(o_i\) denotes the number of students taking course i, \(k_{ij}\) the number of curriculum and teacher conflicts between the courses i and j, \(g_i\) the number of curricula of course i, and \(\beta \) denotes a weight. Consequently, the first term shows the relatedness with respect to the number of students. The rationale behind this is that courses with similar capacity requirements might be easily swapped without causing capacity violations. The second term describes a conflict ratio between the two courses. The reason for adding this term is that moving a course to another period requires the removal of conflicting ones.

The operator is described in Algorithm 2, where the function c(b) in line 8 maps a lecture b to its course c(b). The set B of lectures that are going to be removed is initialized with a random scheduled lecture. As long as the cardinality of this set is less than the requested number of removals, a lecture b is randomly drawn from B. The list of all scheduled lectures that are not in B and do not belong to the same course as b is denoted by A and is sorted in descending order with respect to the relatedness to b. A lecture is drawn from A by computing its index as \(\lfloor |A|\cdot \upsilon ^{\kappa }\rfloor \), where \(\upsilon \) denotes a random number in [0, 1). The probability of selecting very related lectures is controlled by the parameter \(\kappa \). If \(\kappa \) is large it is likely to select the most related lecture. On the other hand, if \(\kappa =1\) each lecture has the same selection probability. The drawn lecture is then added to B and the process is repeated.

Algorithm 2 Related removal
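The relatedness measure and the biased selection from the sorted candidate list can be sketched as follows; the helper names are illustrative and the candidate list is assumed to be sorted in descending order of relatedness.

```python
import random

def relatedness(o_i, o_j, k_ij, g_i, beta):
    """R(i, j): student-size similarity plus conflict ratio between courses i and j."""
    return beta * min(o_i, o_j) / max(o_i, o_j) + k_ij / (g_i + 1)

def biased_pick(candidates_sorted_by_relatedness, kappa):
    """Pick an index as floor(|A| * u**kappa) with u in [0, 1): a large kappa favors
    the most related lecture, kappa = 1 yields a uniform choice."""
    idx = int(len(candidates_sorted_by_relatedness) * random.random() ** kappa)
    return candidates_sorted_by_relatedness[idx]
```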

3.2.2 Random removal

The random destroy operator removes lectures from the schedule at random. Ropke and Pisinger (2006) employ a random removal heuristic as well.

3.2.3 Worst removal

As suggested by Ropke and Pisinger (2006), the worst destroy operator, shown in Algorithm 3, aims at removing highly penalized assignments, since reinserting these events may be beneficial. Due to interdependencies between lectures, it might be hard to associate penalties with single lectures. Consequently, the heuristic operates at the level of courses. The association of penalties with courses is straightforward for violations of RoomCapacity, MinWorkingDays and RoomStability. The penalty for IsolatedLectures is assigned to the lecture that is isolated.

Algorithm 3 Worst removal

The probability of selecting the most penalty-causing course for removal is again controlled by parameter \(\kappa \). The same parameter will also be used for similar destroy operators described in the following subsections. In terms of removals each destroyed course counts as much as its number of scheduled lectures. Since whole courses are destroyed, it is likely that the requested number of removals is occasionally exceeded. In general, the limit might be exceeded whenever destroy operators are applied that remove multiple lectures at once.

3.2.4 Random penalty removal

The random penalty destroy operator randomly selects lectures of courses that cause penalties and removes them from the schedule. In case all of these lectures are removed and the requested number of removals is not reached, other lectures are removed up to the limit at random.

3.2.5 Random period removal

The random period destroy operator repetitively selects a day-period pair at random and removes all its scheduled lectures. Rescheduling the lectures within a particular period allows changing their room assignments without affecting the curriculum compactness and the spread over days. Therefore, the operator might be particularly useful to improve the solution with respect to room-related constraints.

3.2.6 Room day removal

The room day destroy operator repetitively removes all lectures that are assigned to a randomly selected room on a randomly selected day. Removing all lectures from a particular room-day allows them to get reassigned to different periods on that day while preserving the penalty level of the room-related constraints and the spread over days. Thereby the operator focuses on improving the curriculum compactness.

3.2.7 Isolation and capacity removal

The isolation and capacity destroy heuristic is very similar to the worst removal operator. Instead of removing whole courses, individual lectures are selected for being removed, where only their capacity and compactness penalties are considered.

3.2.8 Spread and stability removal

The spread and stability operator is essentially the same as the worst removal operator. The ordering of the removable courses is based only on the penalties for violating the spread over days and the room stability, though.

3.2.9 Curriculum removal

The curriculum operator is similar to the worst removal operator, however curricula are selected instead of courses. The removable curricula are sorted in descending order with respect to their curriculum compactness penalties. This operator aims at reducing the compactness penalties. Furthermore, the operator might help to move lectures to periods that have been previously forbidden due to curriculum conflicts.

3.2.10 Teacher removal

The teacher operator is used to ease restrictions regarding teacher conflicts. Teachers are randomly selected and all of their lectures are removed from the schedule.

3.3 Repair operators

The algorithm makes use of several repair operators. They can be categorized into 2-stage and 1-stage heuristics. The 2-stage heuristics assign lectures to periods first and find a room schedule in the second stage, while the 1-stage heuristics perform period and room assignments at once. In this subsection and the subsequent one, the term period is used instead of day-period pair to facilitate readability.

3.3.1 2-Stage repair operators

The procedure that assigns lectures to periods is summarized in Algorithm 4. The algorithm receives a vector as input, where each element corresponds to a course and indicates its number of lectures that have to be scheduled. Apart from the initial state where all lectures are unscheduled, these unscheduled lectures have either been removed by the previously called destroy operator or remained unscheduled in the previous repair phase. Courses with unscheduled lectures are put in a list which is ordered according to a priority rule. Iteratively, the lectures of the course with the highest priority are scheduled at their best position with respect to an evaluation criterion. In case no feasible insertion position is left, conflicting lectures can be removed from the schedule that is under construction. Consequently, the courses to which these lectures belong have unscheduled lectures again and are therefore reinserted into the list of courses with unscheduled lectures. In order to prohibit cycles, lectures that have once caused the removal of other lectures from a particular day-period pair must not remove lectures from the same pair at a later point. The algorithm generally continues as long as there are unscheduled lectures, but terminates in case the remaining lectures cannot be feasibly scheduled and cannot remove lectures from the schedule any more. In the end, the algorithm returns a schedule, i.e., an assignment of lectures to day-period pairs. In case there are still unscheduled lectures, a list of these lectures is also returned. The schedule is then passed to the room assignment operator of the second stage. Details are described in the following paragraphs.

Algorithm 4 Assignment of lectures to periods

Priority Rules The priority rules employed in Algorithm 4 are either saturation degree (SD), largest degree (LD) or random. The rule selection is based on a roulette wheel principle by using adaptively adjusted weights, as previously described. The LD rule employed by Broder (1964) prioritizes courses with the largest number of conflicts with other courses. The SD rule proposed by Brélaz (1979) arranges courses in ascending order with respect to their number of available periods for scheduling. The random rule mentioned by Carter et al. (1996) orders courses randomly. In order to diversify the outcome, a random number drawn from \([-\nu ,\nu ]\), where \(\nu \) denotes a parameter, is added to the priority value whenever SD or LD are used.

Lecture-period assignment Two mechanisms for evaluating insertion positions are implemented. The 2-stage best heuristic evaluates all conflict-free insertion periods for a lecture of the course that is next in line for being scheduled in the following way. If assigning the lecture to the considered period will add or remove isolations with respect to curricula, the curriculum compactness penalty is added or subtracted accordingly. Whenever the required spread over days of the course is not reached and no other lecture of the same course has been scheduled on the considered day, the respective penalty is subtracted, since the solution will be improved. If some lectures of the course have already been scheduled and none of the rooms where these lectures take place are available in the considered period, the penalty for violating the room stability is added. Finally, the capacity penalty is roughly estimated by assuming that if the lecture has the xth most students of all courses that are assigned to the period in the part of the schedule that is under construction, it will get the xth largest available room in the second stage. The capacity penalty then corresponds to the excess number of students. Ties between the lowest cost insertion positions are broken randomly, as it is done for the other repair operators.

The 2-stage mean heuristic differs only in the treatment of the capacity penalty. A reference utilization of the room capacities u is computed by dividing the sum of the number of students of all lectures that have to be scheduled \(\varSigma _l\) by the sum of the capacities of all available rooms \(\varSigma _r\), i.e., \(u=\frac{\varSigma _l}{\varSigma _r}\). Then a capacity limit is computed for each period individually as \(\eta \cdot u\cdot \varSigma _p\), where \(\varSigma _p\) denotes the sum of the capacities of the available rooms in period p and \(\eta \) denotes a parameter that controls the penalty-free number of students. The capacity penalty added to the insertion cost corresponds to the number of students of the assigned lectures exceeding the capacity limit of the considered period.
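The two capacity estimates could be sketched as follows, with illustrative names; students_in_period holds the student numbers of the lectures already assigned to the considered period, and eta and the room capacities follow the definitions above.

```python
def rank_based_capacity_penalty(students_in_period, room_capacities, new_students):
    """2-stage best: if the new lecture has the x-th most students in the period,
    assume it will receive the x-th largest available room; ties are resolved arbitrarily."""
    sizes = sorted(students_in_period + [new_students], reverse=True)
    rooms = sorted(room_capacities, reverse=True)
    rank = sizes.index(new_students)
    capacity = rooms[rank] if rank < len(rooms) else 0
    return max(0, new_students - capacity)

def mean_capacity_limit(total_students, total_capacity, period_capacity, eta):
    """2-stage mean: penalty-free number of students in a period, derived from the
    overall utilization u = total_students / total_capacity."""
    u = total_students / total_capacity
    return eta * u * period_capacity
```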

For diversification purposes, Ropke and Pisinger (2006) suggest to perturb the insertion cost by adding a random number. Therefore, two additional operators are implemented, 2-stage best noise and 2-stage mean noise, that are based on the previously described heuristics. Each time a period is evaluated a noise value is added to the insertion cost that is drawn randomly from \([-\mu \cdot p_{max},\mu \cdot p_{max}]\), where \(\mu \) denotes a parameter and \(p_{max}\) denotes the worst case insertion cost of one lecture.

Backtracking procedure In case a course is next in line that cannot be scheduled in a conflict-free way, a backtracking mechanism is applied. The implemented backtracking mechanism is similar to the one described by Carter et al. (1996). The aim is to cause the least distortion to the current schedule when removing conflicting lectures. Only lectures that do not belong to the fixed part of the current schedule can be removed. Each insertion position from which the considered course is allowed to remove lectures is evaluated. Positions with the smallest number of conflicting lectures that do not have any alternative conflict-free position left are prioritized. Ties are broken by selecting positions with the smallest number of conflicting lectures in total. In case the procedure is still indifferent, the period with the lowest insertion cost for the considered course is chosen.

The courses of the removed lectures are reinserted in the queue in a way that they are next in line for being scheduled, where courses without any potential conflict-free position are prioritized. However, in case of the SD rule the order is dynamically adjusted. To avoid cycles, a course that has a lecture assigned to a period must not remove events from the same period at a later time. In general, this mechanism does not guarantee finding a feasible solution, though. Hence, there might be unscheduled lectures that have to be passed to the repair operator called next.

Lecture-room assignment In the second stage lectures are assigned to rooms either by the greatest or the match heuristic. The operator selection is based on the previously described adaptive mechanism. The greatest heuristic selects a period randomly. Its lectures are sorted in descending order with respect to their number of students and are scheduled one after another. Each available room in the period is evaluated with respect to the room-related penalties. The lecture is then assigned to the room with the lowest insertion cost. Ties are broken by preferring rooms with the larger capacity. The rationale behind this is to keep the smaller rooms for courses with fewer students, which might be beneficial with regard to the room stability.

The match heuristic processes one period after another in a random order and for each period also the lectures are processed randomly. The evaluation of the available rooms is performed in the same way as for the greatest heuristic. However, ties are broken by selecting the room with the smallest capacity. The reason is that since lectures are scheduled in a random order, there might be lectures that are processed later and require large rooms. The second-stage heuristics stop when all lectures scheduled by the first-stage operator have been assigned to rooms.

3.3.2 1-Stage repair operators

A greedy and a regret heuristic are employed as 1-stage operators, as done by Ropke and Pisinger (2006). For each course, the greedy heuristic evaluates all its potential insertion positions. A lecture of the course with the lowest insertion cost is then scheduled at its best position. In contrast, the regret heuristic decides on the basis of regret values which lecture is scheduled next. The regret value indicates the opportunity cost for not assigning a lecture to its currently best position. The regret value of a k-regret heuristic is computed as the sum of the differences between the cost of the \(i^{\text {th}}\) best insertion position and the cost of the best insertion position, \(i=2,\dots ,k\). A lecture of the course with the largest regret value is then scheduled at its best position.


Each room-period pair is evaluated in the following way. The capacity penalty corresponds to the number of students exceeding the capacity of the room. If inserting a lecture of the considered course would cause or remove compactness violations, the curriculum compactness penalty is added or subtracted accordingly. The room stability penalty is added if lectures of the same course have already been scheduled and none of them takes place in the considered room. Finally, if the required spread over days has not been reached yet and no lecture of the same course has been scheduled on the considered day, the penalty for violating the minimum spread is subtracted. The performance of the heuristics can be slightly improved by further encouraging the spread over days. The penalty for violating MinWorkingDays is added, if another lecture of the same course takes place on the considered day, regardless of the satisfaction of the required spread.

In the greedy heuristic the penalty for unassigned lectures of the current iteration, computed as in Sect. 3.1.5, is added to the insertion cost of the considered position if the number of available periods is greater than the number of lectures to assign of the course. Thereby, courses with no more potential insertion periods than lectures to schedule are prioritized. In case of the regret heuristic, the insertion costs of missing alternatives are set to the unassigned penalty of the current iteration if less than k positions are available. Consequently, setting k sufficiently large takes the lack of alternatives into account. A 5-regret heuristic is used for this study.
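The regret computation, including the treatment of missing alternatives, could look as follows; insertion_costs_by_course and the other names are illustrative assumptions.

```python
def regret_value(insertion_costs, k, unassigned_penalty):
    """k-regret value: sum of the differences between the i-th best and the best
    insertion cost (i = 2, ..., k); missing alternatives are priced at the
    penalty for an unassigned lecture."""
    costs = sorted(insertion_costs)
    costs += [unassigned_penalty] * max(0, k - len(costs))
    return sum(costs[i] - costs[0] for i in range(1, k))

def pick_next_course(insertion_costs_by_course, k, unassigned_penalty):
    """Schedule next the course with the largest regret value."""
    return max(insertion_costs_by_course,
               key=lambda c: regret_value(insertion_costs_by_course[c], k, unassigned_penalty))
```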

As suggested by Ropke and Pisinger (2006) two additional noise operators are implemented, i.e., greedy noise and regret noise. Each time an alternative is evaluated a random number drawn from \([-\mu \cdot p_{max},\mu \cdot p_{max}]\) is added to the insertion cost.

3.4 Initial solution

The initial solution is generated by applying the 2-stage best heuristic in combination with the SD rule and the greatest heuristic. The initial solution is not necessarily feasible; however, for the ITC-2007 instances a feasible one is typically found.

4 Computational experiments

4.1 Instances

Within the ITC-2007, 21 instances have been proposed for the CB-CTT track, classified into early, late, and hidden ones. The late instance set was released two weeks before the deadline of the competition, while the hidden instances were released after the closure and were used to rank the best participants. The benchmark instances are available on the website of the competition and are called comp01,...,comp21. Their characteristics are described by Bonutti et al. (2012) in detail. Best known solutions can be found on the website of the Timetabling Research Group at the University of Udine, where researchers can upload their solutions and lower bounds. Additionally, we apply our algorithm to recently proposed instances available on the same website, i.e., the sets DDS, EasyAcademy and Udine. Bellio et al. (2016) consider them as candidate benchmark sets for future comparisons.

4.2 Parameter tuning

The algorithm incorporates several parameters, whose setting is given in Table 3. Some of the initial parameter values are taken from Ropke and Pisinger (2006), including the control parameters for the weight adjustment mechanism, i.e., \(\sigma _1\), \(\sigma _2\), \(\sigma _3\), r, and the parameter \(\psi \) controlling the start temperature. The remaining initial parameter values have been found during the experiments in the implementation phase. Similarly to Ropke and Pisinger (2006), for setting the parameters appropriately, the change in performance is evaluated when altering one value at a time and keeping the others fixed. This is done for all parameters in parallel, though. Typically, a slightly larger and a slightly smaller value are checked for each parameter. The average penalty over five runs on the instances comp01,...,comp21 is computed. After each parameter value has been evaluated, the parameters are set to the best-performing values. This new setting is the basis for the next round. The procedure is repeated until no significant differences are observable. This basic tuning method has been chosen, as many parameters have to be tuned. In addition, the algorithm does not react very sensitively to changes in the parameter values. One has to note that the participants of the ITC-2007 were not able to tune their algorithms on the instances comp15,...,comp21. Moreover, Bellio et al. (2016) do not use any of the comp instances for tuning, but rather a large set of artificial instances.

Table 3 Parameter setting

4.3 Results

The final results presented in this section are generated on a computer with hardware that was current at the time of the competition, i.e., an AMD Turion X2 Ultra Dual-Core Mobile ZM-82 running at 2.2 GHz, 4 GB memory and an Ubuntu 14.04 64-bit operating system. As specified by the competition rules, only one CPU is used. In order to generate comparable results, the time limit is set according to the benchmarking tool provided on the website of the competition. For the stated equipment, the benchmarking tool suggests a time limit of 480 s.

ALNS is used to generate solutions for the comp instance set, with ten runs on each instance. Moreover, the algorithm keeps track of the selection rates of each operator. The average selection rates over all runs and each instance are then used as an input for LNS. By employing LNS with predefined operator selection probabilities one can omit the adaptive procedure and thus extra iterations may be executed within the same time. Moreover, the algorithm makes use of the adjusted selection probabilities from the very beginning. While for some instances, separate tuning (yielding instance-specific selection probabilities) could give slightly better results, it will be seen that this robust tuning over all instances will provide excellent results on average. The reheat limit is set to 80,000 for LNS, while the other parameter values are those stated in Table 3.

LNS with predefined selection probabilities proves to be superior to ALNS, as shown in Table 4. Average refers to the average penalty over ten runs and best presents the best result out of these runs. The column Best known shows the best known solutions, as stated on the website of the Timetabling Research Group at the University of Udine, where bold numbers indicate proven optimality. The proposed algorithm found new best solutions for the instances comp03, comp05, comp12, comp15 and comp18 during the tuning and experimental phases.

Table 4 ALNS versus LNS with predefined selection probabilities

The results for the instances comp01,...,comp21 are shown in Table 5, where our approach is compared with the algorithms by Abdullah and Turabieh (2012), Bellio et al. (2016), Abdullah et al. (2012) and the two best algorithms of the ITC-2007, i.e., the algorithms by Müller (2009) and Lü and Hao (2010). Their results are either those reported on the website of the competition or taken from the respective papers, as described in the footnotes. One has to note that Bellio et al. (2016) use an iteration limit roughly equivalent to the time limit instead of the actual time limit.

In column LNS the average results of LNS over 10 runs with random seeds are presented. The results of the algorithms of the competition and those by Abdullah et al. (2012) are averages over 10 runs too, while Abdullah and Turabieh (2012) apply 11 runs and Bellio et al. (2016) use 31 runs. Italicized values indicate that the corresponding algorithm performs best compared to the other ones on the respective instance. LNS with predefined selection probabilities is superior to the other approaches on twelve instances and clearly outperforms the best algorithms of the ITC-2007.

Table 5 Average results for the CB-CTT ITC-2007 instances

Our results for the instance sets DDS, EasyAcademy and Udine are presented in Table 6 and are compared solely with the results by Bellio et al. (2016), since few authors have published solutions for these instance sets yet. For this purpose, we retained the parameter setting and the operator selection probabilities resulting from the tuning on the comp instances. The column LNS avg. refers to the average outcome over ten runs, while LNS best presents the best result out of these runs. In column Best known the best known solutions for these instances are shown, where bold numbers indicate proven optimality.

Bellio et al. (2016) propose another set of instances, called Erlangen, which differ significantly from the other instance sets, particularly in their huge problem size. These instances cannot be solved well by our ALNS without retuning.

Table 6 Results for the new instances

4.4 Statistics

Statistics about the intermediate solutions for the comp instances generated by ALNS are presented in Table 7. The numbers correspond to averages over 10 runs. The column It. refers to the number of iterations executed within the time limit. Found shows the iteration, when the best solution was found as percentage of the total number of iterations. Inf. indicates the percentage of generated infeasible solutions. In Accept the number of accepted infeasible solutions is given as a percentage of all infeasible solutions. 2-stage, Greedy and Regret correspond to the percentage of infeasible solutions generated by 2-stage repair operators, greedy heuristics and regret heuristics, respectively, with regard to the total number of solutions produced by the respective group of operators. It is interesting to see that apparently the greedy repair operators tend to generate infeasible solutions more frequently than the other operators. Finally, Reh. shows the number of temperature reheats.

Table 7 Statistics about the intermediate solutions

4.5 Statistical tests

In order to test the statistical significance of the obtained solutions for the comp instances, we conducted Friedman's test, comparing our approach with the other algorithms mentioned in Table 5. The test uses the average results for each instance and yields a p value of \(6.9\times 10^{-9}\), indicating that the results differ significantly at a critical level of \(\alpha = 0.05\). Following Derrac et al. (2011), we then performed a post-hoc analysis by computing adjusted p values for the Friedman test with Holm's and Hochberg's procedures. LNS is considered as the control method being compared to all other algorithms. The average ranks and adjusted p values (Friedman) are stated in Tables 8 and 9, respectively, where the same abbreviations for the authors are used as in Table 5. The results indicate that there is a significant difference between LNS and most of the other algorithms (\(\alpha = 0.05\)), except for the one by Abdullah and Turabieh (2012).
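Such a comparison can be reproduced along the following lines with SciPy, assuming a matrix of average results with instances as rows and algorithms as columns; the post-hoc z statistic follows the average-rank formulation in Derrac et al. (2011), and the numbers reported above stem from the original experiments, not from this sketch.

```python
import numpy as np
from scipy import stats

def friedman_with_holm(results, control=0):
    """Friedman test over algorithms (columns) and instances (rows), followed by a
    Holm-adjusted post-hoc comparison of the control algorithm against the rest."""
    n, k = results.shape
    _, p_friedman = stats.friedmanchisquare(*results.T)
    avg_ranks = np.apply_along_axis(stats.rankdata, 1, results).mean(axis=0)
    se = np.sqrt(k * (k + 1) / (6.0 * n))
    others = [j for j in range(k) if j != control]
    z = np.abs(avg_ranks[control] - avg_ranks[others]) / se
    p_raw = 2.0 * stats.norm.sf(z)                      # unadjusted pairwise p values
    order = np.argsort(p_raw)
    m = len(others)
    p_holm = np.minimum(1.0, np.maximum.accumulate(p_raw[order] * (m - np.arange(m))))
    return p_friedman, dict(zip(np.array(others)[order], p_holm))
```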

Table 8 Average ranks
Table 9 p values

5 Sensitivity analysis

Computational results underlying the analysis presented in this section have been computed on the Vienna Scientific Cluster. Therefore, an iteration limit has been used as termination criterion, computed for each instance as an average over five runs with the given time limit. Since ALNS incorporates randomization, the actual computation time of a single run might deviate from the requested time limit. The results are based on the comp instances with ten runs per instance. Unless stated otherwise, the same parameter values have been used as for the original version.

5.1 Contribution of the different operators

Operator statistics are listed in Table 10 indicating which operators are essential for a good performance of ALNS. The column Selection presents the average selection frequencies of the operators in percent. Deter. shows the average deterioration of the solution quality, given that the respective operator is removed while keeping all other operators and using the same iteration limit as for the original algorithm.

Table 10 Operator statistics

Apparently, all operators are useful, as omitting any of them gives worse results on average. With regard to the destroy operators, isolation and capacity and random period contribute the most. 2-stage best and regret are the most important repair operators. The selection rate of regret may be reduced due to its high computational effort. The room assignment procedures and the priority rules perform similarly.

5.2 Effect of decreasing the destroy limit

Figure 1 shows the effects of destroying different numbers of lectures with regard to accepted solutions and new best solutions. The histograms are based on ALNS without reheating and without decreasing the destroy limit applied to comp06. Similar patterns can be observed for other instances, though. The x-axis of each plot refers to the value that is passed to the destroy operator as the requested number of removals. The y-axis indicates either the average number of accepted solutions or the average number of new best solutions resulting from repairing a partial solution with the respective number of removals. The search is split into segments of one third of the total number of iterations. Histograms are plotted for each segment.

Fig. 1 Benefit of different destroy limits, comp06: a accepted solutions in first third, b accepted solutions in second third, c accepted solutions in last third, d new best solutions in first third, e new best solutions in second third, f new best solutions in last third

While ALNS approaches in the literature typically keep the destroy limit constant, Fig. 1 gives a clear indication that this is not the best choice. Indeed, the figures show that it is rather unlikely that removing a large number of lectures will lead to a new best solution immediately. Moreover, such solutions are rarely accepted in later stages of the search. Therefore, destroying a large number of lectures cannot contribute much to the solution quality as the search proceeds. In addition, repairing partial solutions with many unscheduled lectures is relatively costly in terms of computational effort. Consequently, the destroy limit is reduced over iterations.

Table 11 presents the effect of omitting features on the performance. Column Deterioration shows the deviation of the modified algorithm from ALNS. Since these modifications typically affect the computational effort, the iteration limits have been adjusted. The last two lines of the table refer to the algorithm with uniformly distributed operator selection probabilities, i.e., without the adjustment scheme.

Table 11 Sensitivity analysis w.r.t. particular features, compared to ALNS

The feature of reheating the temperature turns out to be extremely useful. Contrary to Ropke and Pisinger (2006), we find that adding noise to the evaluation function does not seem to be crucial for the performance of ALNS. Even without the use of any noise operator, only a slight deterioration is observable. This indicates that the different operators lead to a sufficient diversification even without employing additional perturbation. It is interesting to see that, by discarding the weakly performing noise operators, the version with uniformly distributed operator selection probabilities performs even slightly better than ALNS. This can be explained by the gain resulting from the extra iterations due to the removal of the adaptive mechanism. If all other parameter values are kept unchanged, the decrease of the destroy limit has a strong effect on the performance. Fixing the destroy limit requires a reduced destroy limit compared to the original setting, though. ALNS with a fixed and adjusted destroy limit still performs worse than the original version. Apparently, omitting any feature of the algorithm will lead to a deterioration. However, when it comes to implementation in practice, some operators or features may be dropped without a significant loss in quality, in order to make the approach less complex.

6 Conclusion

This paper presented an adaptive large neighborhood search approach for solving the curriculum-based course timetabling problem. Implemented features include a reduction of the destroy limit over iterations, reheating the temperature of the simulated annealing acceptance scheme, allowing infeasible solutions, and taking the computation times of the repair operators into account when adjusting their selection probabilities. The algorithm incorporates several destroy and repair operators, including problem-specific operators that tackle the structure of timetabling problems. The performance of the algorithm was slightly enhanced by encouraging the spread over days in the evaluation function of the potential insertion positions of the lectures. Surprisingly, adding perturbation to the evaluation of potential insertion positions did not improve the performance significantly. The proposed approach generated competitive results for the benchmark instances of the second international timetabling competition. In particular, it outperformed the best algorithms of the competition. New best known solutions were found for five instances.