Introduction

Protecting critical infrastructures against terrorism is one of the top priorities in homeland security (Moteff 2005). Among these critical infrastructures, transportation systems, which serve 32 million passengers every day in the United States, are critical for supporting the national security and economic well-being. For decades, public transit systems around the world have been considered as a principal target for terrorist acts (Strandberg 2013; Shvetsov and Shvetsova 2017; Shvetsov et al. 2017). Among these systems, airliners are considered to be hard targets due to implementation of security checkpoints and increased security measures. Over the years, the number of attempted hijackings and bombings has declined gradually (although the public areas of airports still remain vulnerable). Unlike airlines, where security checkpoints screen passengers and luggage, mass transit options like subways, passenger trains, and buses, are designed to be easily accessible and are therefore harder to protect. Ground transportation systems, which often include enclosed spaces packed with people, could prove attractive targets for terrorists. Therefore, such open transit systems are considered to be soft targets for the terrorists. The attacks in Brussels and Istanbul along with many other incidents indicate that terrorists tend to target such large crowds to cause mass human casualties in addition to panic and chaos. These incidents along with many others highlight the importance of protecting such infrastructures. The threat to these infrastructures could be substantially reduced by analyzing the risk associated with attack to each infrastructure component, mitigation planning and designing efficient response policies. This includes assigning security teams and designing efficient patrol schedules to protect vulnerable areas. Patrol scheduling involves the process of constructing optimized work timetables for security staff in order to minimize the potential damage of possible attacks. Designing patrols to protect public transport systems and other soft targets poses unique challenges that have not been properly addressed in the literature of patrol scheduling so far. One of these challenges is the dynamic nature of crowd size inside these systems. Because the adversary’s primary objective is to inflict human casualties, the attacker’s payoff value for each station depends on the number of people residing in the station. These numbers may change over time. Another challenge is to develop schedules that observe the constraints regarding human resources, for example, the generated schedules may be required to include breaks for the security teams and these breaks should not be consecutive. Moreover, efficient methods are needed to design patrols for a general network. In this paper, we address these challenges in a patrolling game setting.

Patrol scheduling has many applications in real life. Police officers patrol cities; security officers patrol terminals at airports and transportation centers; security guards patrol museums and shopping malls. The first studies on the patrolling problems were conducted in 1970s, when several studies focused on allocating patrols to different areas to optimize performance measures such as patrol delays, average waiting time and total response time (Larson 1972; Olson and Wright 1975; Chaiken and Dormont 1978; Chelst 1978). These studies assume that crime frequency in different regions remain fixed and known to the patroller. However, since the adversaries can be strategic in their attacks, they may change the location and timing of their attacks in response to the patroller’s strategy. Therefore, game theoretic analysis of such attacks yields more realistic results. To this end, several researchers developed game theoretic models for the patrolling problem. One of the first models applying game theory in patrol scheduling, named ARMOR (Paruchuri et al. 2005, 2006, 2007, 2008; Pita et al. 2008), casts the patrolling/monitoring problem as a Bayesian Stackelberg game in order to aid the security agent decide where to allocate inspection gates in an airport. ARMOR was successfully deployed at the Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport. Another stream of research focuses on protecting the urban rail systems against fare evasion (Jiang et al. 2012, 2013; Yin et al. 2012; Delle Fave et al. 2014, 2015). For example, Jiang et al. (2012) developed a Stackelberg game to generate patrolling strategies to inspect passenger tickets in an urban rail transportation system. They use this model to randomly deploy security personnel for fare inspection in the Los Angeles Metro Rail System. The obtained strategy seems to yield high levels of revenue by deterring fare evasion. Another model, named IRIS (Intelligent Randomization In Scheduling) is developed by Tsai et al. (2009) to assist Federal Air Marshal Service (FAMS) with randomly scheduling air marshals on flights. FAMS deployed IRIS in limited use since October 2009. There have been some studies on designing patrol strategies for transit systems. Varakantham et al. (2013) study the problem of designing randomized patrols to enhance security of mass rapid transit systems. They formulate this problem as a Stackelberg security game to optimize the patrols in terms of travelled distance. They then apply their model on a real life case of rail network in Singapore. To solve this problem, they apply a two-stage solution algorithm; in the first stage, they obtain coverage probabilities using a Stackelberg security game. Then in the second stage, they use these coverage probabilities to derive an actual patrol. They assume that every station is accessible from every other station; therefore, it is possible to generate feasible patrols for every coverage vector. However, the accessibility assumption may not always be valid; the distance between stations, especially the ones on different lines, can be significantly high. Therefore, some of the coverage vectors may lead to patrols that cannot be conducted within the given time horizon. Delle Fave et al. (2014) develop another model for patrolling transit systems. They propose a Multi-Operation Patrol Scheduling System (MOPSS), a new system to generate patrols for transit system to tackle multiple types of illegal activities including fare evasion, terrorism and crime. However, they do not consider human resource considerations such as break times etc. Moreover, the counter terrorism module in their model assumes that the targets have fixed values. This may not be a realistic assumption due to the dynamic nature of crowd sizes inside transportation systems.

The most relevant paper to our study is conducted by Lau et al. (2016). They study the problem of generating patrolling schedules for security teams to patrol a mass rapid transit rail network of an urban area. Their objective is to deploy patrolling units to the stations in different time units so that some scheduling and security related constraints are satisfied. They develop various mathematical models and apply it to a real rail network. The shortcoming of their model is that, because it is not a game based model, it is not designed to generate randomized schedules. To remedy this, they propose to generate randomized solutions by varying some of the problem parameters such as the start time and break time for each team. However, this may lead to sub-optimal patrol schedules. Moreover, the adversary’s attack probabilities are assumed to be fixed and known. Again, this is not a realistic assumption because terrorists can change the location and timing of their attacks in response to the patrol strategy. Game theoretic models are designed to generate randomized strategies and are more suited for such adversarial settings.

In this paper, we develop a game theoretic model to schedule security teams in order to protect an urban rail network against terrorist attacks. We develop column generation based algorithms to efficiently solve the game under general network structures. The computational results show the efficiency of the proposed algorithms. The rest of this paper is organized as follows. In Section “Problem description”, the problem under consideration is described. The proposed column generation approach is explained in Section “Column generation procedure”. In Section “A heuristic solution approach for the pricing sub-problem”, a heuristic algorithm is presented to efficiently solve the pricing sub-problem. Section “Numerical experiments” presents the numerical results along with an application of the model on a real case. Finally, the main conclusion of the paper and suggestions for future research are discussed in Section “Conclusions and future research”.

Problem description

The patrolling problem considered in this paper involves scheduling a set of security teams I to protect a set of stations J on an urban rail network over a time horizon of T time periods. The time periods can represent the working hours in a day. Figure 1 shows an example of such an urban rail network in Philadelphia.

Fig. 1
figure 1

Urban rail network in Philadelphia

Most of these networks consists of multiple lines that are connected via interchange stations. We model the patrolling problem as a simultaneous game between a defender and a single adversary. The defender controls the security teams and chooses a schedule to minimize the damage from the adversary’s attack, while the adversary chooses the station and the time to attack. A pure strategy for the adversary is represented by a pair (j, t) which indicates the station j and time t to attack. A pure strategy for the defender is a schedule that determines the complete course of actions for all teams throughout the time horizon. These actions include patrolling different stations or taking a break. Each team should have a pre-specified number of breaks and these breaks should not be consecutive or scheduled at the beginning or end of the time horizon. The payoffs to the players are determined by the expected damage to the network. While the defender wants to minimize the expected damage, the adversary wants to maximize it. We denote the value of station j at time t by c jt , this value can represent the number of affected people if a successful attack is launched. If the adversary decides to attack station j at time t and the station is not being patrolled by a security team, the adversary wins a payoff of c jt . On the other hand, if the station is being patrolled by a security team at the time of attack, with some probability d j the attack will be thwarted and with probability 1 − d j the attack will be successful. Therefore, the expected damage is (1 − d j )c jt . We represent the set of all possible schedules by S and index them by s with s = 1, 2, ⋯, |S|.

The players play a zero-sum matrix game where the defender plays as the row player; with the set of all possible schedules constituting the rows of the matrix. The adversary is the column player, with the set of all possible attack pairs (j, t) constituting the columns of the game matrix. The game can be solved by generating all of the possible strategies for both players. However, for the games of large size, the set of all possible strategies is exponentially large for the defender and generating all of them becomes impractical. In the next section, we develop an efficient column generation approach to obtain the Nash equilibrium for this game.

Column generation procedure

In this section, we develop a column generation algorithm to obtain the Nash equilibrium point for the patrolling game described in Section “Problem description”. We can write the linear program (LP) to obtain the Nash equilibrium of this game as:

$$ \mathit{\operatorname{Minimize}}\kern1.25em u $$
(1)
$$ subject\kern0.34em to\kern0.5em u\ge \sum \limits_{s\in \boldsymbol{S}}{c}_{jt}\left(1-{w}_{jt}^S{d}_j\right){x}_s,\forall j,t, $$
(2)
$$ \sum \limits_{s\in \boldsymbol{S}}{x}_s=1, $$
(3)
$$ {x}_s\ge 0,\kern1.25em \forall s\in \boldsymbol{S}. $$
(4)

In this formulation x s is the probability of using schedule s ∈ S in the defender’s mixed strategy. \( {w}_{jt}^s \) is a binary parameter that is equal to 1 if schedule s interrupts an attack strategy (j, t), zero otherwise. In the terminology of column generation, this LP is called the linear programming master problem (LPM). Note that each column in the LPM corresponds to a schedule. In general the set S, may be exponentially large; however, the number of non-zero variables (the basic variables) in the LPM is equal to the number of constraints i.e. the total number of (j, t) pairs: T|J|. Therefore, even though the number of possible schedules S is large, only a small number of them is used in the Nash equilibrium. The column generation algorithm uses this idea to start with a subset S' ⊆ S of columns and generate columns as needed. The starting subset S' could be any set of feasible schedules. Using the restricted set of schedules, S', we obtain the following LP:

$$ \mathit{\operatorname{Minimize}}\kern0.5em u $$
(5)
$$ subject\kern0.34em to\kern0.5em u\ge \sum \limits_{s\in {\boldsymbol{S}}^{\prime }}{c}_{jt}\left(1-{w}_{jt}^S{d}_j\right){x}_s,\forall j,t, $$
(6)
$$ \sum \limits_{s\in {\boldsymbol{S}}^{\prime }}{x}_s=1, $$
(7)
$$ {x}_s\ge 0,\kern1.25em \forall s\in {\boldsymbol{S}}^{\prime }. $$
(8)

This problem is called Restricted LPM (RLPM). Dual of the RLPM is:

$$ \mathit{\operatorname{Maximize}}\kern0.5em v $$
(9)
$$ subject\kern0.34em to\kern0.5em v\ge \sum \limits_{j,t}{c}_{jt}\left(1-{w}_{jt}^S{d}_j\right){q}_{jt},\forall s\in {\boldsymbol{S}}^{\prime }, $$
(10)
$$ \sum \limits_{j,t}{q}_{jt}=1, $$
(11)
$$ {q}_{jt}\ge 0,\kern0.5em \forall j,t. $$
(12)

Where q jt is the dual variable associated with constraint (j, t) in the RLPM. Next step is to find a column (schedule) in S\S' that could improve the current optimal solution of the RLPM. Given the optimal dual solution q jt of the RLPM, the reduced cost of column s ∈ S\S' is \( {\sum}_{j,t}{c}_{jt}\left(1-{w}_{jt}^s{d}_j\right){q}_{jt}-v \). Based on the concept of duality in linear programming, optimality of the RLPM is equivalent to feasibility of the dual. Therefore, patrols that violate the constraint \( v\le {\sum}_{j,t}{c}_{jt}\left(1-{w}_{jt}^s{d}_j\right){q}_{jt} \) can improve the current optimal solution. Therefore, we should look for a column (schedule) s such that: \( {\sum}_{j,t}{c}_{jt}\left(1-{w}_{jt}^s{d}_j\right){q}_{jt}-v<0 \). Note that q jt are fixed, and the problem is to find a schedule s with \( {w}_{jt}^s \) such that: \( {\sum}_{j,t}{c}_{jt}\left(1-{w}_{jt}^s{d}_j\right){q}_{jt}-v<0 \). This problem is called the pricing sub-problem. The pricing sub-problem involves finding a column, i.e. a schedule, with a negative reduced cost. Subsection “Mathematical formulation to solve the pricing sub-problem” develops a mathematical program to solve the pricing sub-problem. Subsection “Overall column generation procedure and a dual bound” presents the overall column generation algorithm and a lower bound on the value of the game.

Mathematical formulation to solve the pricing sub-problem

In this section, we develop a mathematical formulation to solve the pricing sub-problem. Here is a list of parameters and variables used to formulate the pricing sub-problem:

\( {a}_{j{j}^{\prime }} \) :

Binary parameter, is equal to 1 if it is feasible to visit stations j and j' consecutively; 0 otherwise.

M :

A big enough number.

x ijt :

Binary variable, 1 if team i patrols station j at time t; zero otherwise.

y ijt :

Binary variable, 1 if team i takes a break at station j at time t; zero otherwise.

w jt :

Binary variable, 1 if a team patrols station j at time t, zero otherwise.

Using this notation, the pricing sub-problem is formulated as follows:

$$ \mathit{\operatorname{Minimize}}\sum \limits_j\sum \limits_t{c}_{jt}{w}_{jt}{q}_{jt}{d}_j $$
(13)
$$ subject\kern0.34em to\kern0.5em \sum \limits_j\kern0.2em \left({x}_{ijt}+{y}_{ijt}\right)=1,\kern0.5em \forall i,t, $$
(14)
$$ {x}_{ijt}+{y}_{ijt}+{x}_{i{j}^{\prime },t+1}+{y}_{i{j}^{\prime },t+1}\le {a}_{j{j}^{\prime }}+1,\kern0.5em \forall i,j,{j}^{\prime },t, $$
(15)
$$ \sum \limits_i{x}_{ijt}\le M\ {w}_{jt},\kern0.5em \forall j,t, $$
(16)
$$ {w}_{jt}\le \sum \limits_i{x}_{ijt},\kern0.5em \forall j,t, $$
(17)
$$ \sum \limits_{j,t}{y}_{ijt}=2,\kern0.5em \forall i, $$
(18)
$$ \sum \limits_j{y}_{ij t}+\sum \limits_j{y}_{ij,t+1}\le 1,\kern0.5em \forall i,t, $$
(19)
$$ \sum \limits_j{y}_{ijt}=0,\kern0.5em \forall i\in I,t\in \left\{i,T\right\}, $$
(20)
$$ {x}_{ijt},{y}_{ijt},{w}_{jt}\in \left\{0,1\right\}. $$
(21)

In this formulation, Eq. (13) is the objective function, which is minimizing the reduced cost. Note that the reduced cost is equal to \( {\sum}_{j,t}{c}_{jt}\left(1-{w}_{jt}^s{d}_j\right){q}_{jt}-v \) which, after removing the fixed terms, is equivalent to maximizing ∑j ∈ Jt ∈ Tc jt w jt q jt d j . Equation (14) ensures that each team at each time can be assigned to exactly one job. This job can be patrolling a station or taking a break. Equation (15) ensures for each team that the pairs of jobs undertaken consecutively are feasible, for example, the team cannot consecutively patrol two stations that are far apart from each other, or they cannot take two consecutive breaks. Equation (16) ensures that if any team is patrolling station j at time t then w jt is equal to 1. Equation (17) ensures that if no team is patrolling station j at time t then w jt is equal to 0. Constraint (18) ensures that the number of breaks for each team is exactly equal to 2. Constraint (19) ensures that consecutive breaks do not happen. Constraint (20) ensures that the breaks are not scheduled at the beginning or the end of time horizon. Constraint (21) is the integrality constraint for variables w jt , y ijt and x ijt .

Overall column generation procedure and a dual bound

Figure 2 presents the pseudo-code for the overall column generation algorithm. As seen in this figure, the column generation algorithm starts with a randomly generated set of initial columns (schedules). The RLPM is solved using this set of initial columns and the vector of dual values is obtained. Dual values are then used in the pricing sub-problem to generate a new column (schedule). If a new column with a negative reduced cost is obtained, it is added to the RLPM and the process is repeated; otherwise, the procedure terminates.

Fig. 2
figure 2

Pseudo-code for the overall column generation algorithm

During column generation, we have access to a dual bound on value of the game so that we can terminate the algorithm when a desired solution quality is reached. The following lemma offers a lower bound on the value of the game, which can be computed in each iteration of the column generation algorithm.

Lemma 1

(Dual bound) Let v(RLPM) and v(LPM) denote the optimum objective function value of the current RLPM and LPM, respectively. Also, let v(PP) be the minimum reduced cost obtained by solving the pricing sub-problem to optimality. We have: v(LPM) ≥ v(RLPM) + v(PP).

Proof

General form of this result can be found in Lübbecke (2011). Because we know that \( {\sum}_{s\in {\boldsymbol{S}}^{\prime }}{x}_s=1 \) for an optimal solution of the MP, one cannot improve v(RLPM) by more than 1 times the smallest reduced cost v(PP), hence v(LPM) ≤ v(RLPM) + v(PP). ■

Remark 1

Note that in order for the bound in lemma 1 to be valid; the pricing sub-problem should be solved to optimality. In general, the dual bound is not monotone over the iterations, this is called the yo-yo effect.

A heuristic solution approach for the pricing sub-problem

In this section, we develop a dynamic programming based greedy algorithm to obtain an approximate solution to the pricing sub-problem. This heuristic algorithm can be used inside the column generation procedure to obtain an approximate solution for the patrolling game. To solve the pricing sub-problem, we use a greedy algorithm to generate schedules. We define a patrol as a detailed course of action for one team that determines what to do at each time period t. Note that a patrol is different from a schedule. While a schedule determines the complete course of action for all teams, a patrol does so for only one team. The definitions match if we have only one security team. For each security team the greedy algorithm assigns a patrol that maximizes \( {\sum}_{j\in J}{\sum}_{t\in T}{c}_{jt}{q}_{jt}{d}_j{a}_{jt}^p\left(1-{w}_{jt}\right) \), where w jt is a binary variable equal to 1 if station j at time t is already covered. To find such a patrol, we use a dynamic programming algorithm. The details of the dynamic programming approach are presented in Subsection “Dynamic programming procedure”. In Subsection “Overall heuristic procedure”, the overall greedy algorithm is presented and some results on the quality of the proposed algorithm are presented.

Dynamic programming procedure

In this section, we develop a dynamic programming (DP) procedure to solve the pricing sub-problem to find an optimal patrol. The aim is to find a patrol p with \( {a}_{jt}^p \) such that \( {\sum}_{j\in J}{\sum}_{t\in T}{c}_{jt}{q}_{jt}{d}_j{a}_{jt}^p\left(1-{w}_{jt}\right) \) is maximized.

DP is a method for solving complex problems by breaking them down into smaller problems (Larson and Casti 1978). In order to solve a problem using DP, the problem must be divided into smaller problems called stages. The stages are often solved backward, which is the case in the proposed DP procedure. Each stage has a number of states that are generally the information needed to solve the stage. The decision at a stage updates the state for the current stage to the state for the next stage. Given the current state, the optimal decision for the remaining stages is independent of the decisions made in the previous stages. This is the fundamental principle of optimality in DP. It means that the problem can be broken down into smaller problems, which can be solved independently. Finally, a recursive relationship between the values of decision at the current stage and the optimum decisions at previous stages must be identified. In other words, the optimum decision uses the previously found optimum decision values. Elements of the proposed dynamic programming procedure are as follows:

  • t = 1, 2, …, T: Stage variable, each time period is considered a stage.

  • x(t) = [b(t), l(t)]: State variable at stage t, consists of two components: number of breaks until time period t: b(t) and location at time t : l(t).

  • u(x, t): Decision in state x and stage t, take a break at an adjacent station j or patrol an adjacent station j.

  • r(x, t, u): Instant reward in state x at stage t if action u is taken. If the action is to patrol an adjacent station j, then the obtained instant reward is r(x, t, u) = c jt q jt d j (1 − w jt ). If the action is to take a break at an adjacent station j, then the reward is r(x, t, u) = 0.

  • R(x, t): Optimum accumulated reward with state x at stage t.

  • u(x, t): Optimum action with state x at stage t.

  • F(x, u): Transition function, if action u is taken in state x, the state in the next stage will be F(x, u). If the action is to patrol an adjacent station j, then the next state is x(t + 1) = [b(t + 1), l(t + 1)] = [b(t), j]. If the action is to take a break at an adjacent station j, then the next state is x(t + 1) = [b(t + 1), l(t + 1)] = [b(t) + 1, j].

Now using these parameters, a recursive equation can be written for the optimum accumulated reward functions:

$$ R\left(x,t\right)=\underset{u(x)}{\max}\left\{r\left(x,t,u\right)+R\left(F\left(x,u\right),t+1\right)\right\}. $$
(22)

This equation describes an iterative relation for determining R(x, t), for all feasible x and t from the knowledge of R(x, t + 1) for all feasible x and t + 1. R (x, T) can be easily solved by using \( R\left(x,T\right)=\underset{u(x)}{\max}\left\{r\left(x,T,u\right)\right\} \) then R(x, T − 1) can be determined using Eq. (22). Continuing this backward procedure R(x, 1) is determined.

Overall heuristic procedure

Figure 3 presents the pseudo code for the overall greedy approach. As seen in this pseudo code, the algorithm starts with initializing w jt to indicate that no patrols have been assigned to any security team. Then the DP is used to obtain a patrol with maximum \( {\sum}_{j\in J}{\sum}_{t\in T}{c}_{jt}{q}_{jt}{d}_j{a}_{jt}^p\left(1-{w}_{jt}\right) \). Next, we assign the obtained patrol to the security team and update w jt so that is reflects the covered (j, t) pairs. This process is repeated for all available security teams.

Fig. 3
figure 3

Pseudo-code for the greedy algorithm

The following lemma shows that the greedy algorithm achieves an approximation ratio of \( 1-{\left(1-\frac{1}{\left|I\right|}\right)}^{\left|I\right|} \).

Lemma 2

For fixed values of q jt Suppose \( {w}_{jt}^{\ast } \) and \( {w}_{jt}^G \) are the optimal and greedy values of w jt . Then we have: \( {\sum}_{j\in J}{\sum}_{t\in T}{c}_{jt}{q}_{jt}{d}_j{w}_{jt}^G\ge \left(1-{\left(1-\frac{1}{\left|I\right|}\right)}^{\left|I\right|}\right){\sum}_{j\in J}{\sum}_{t\in T}{c}_{jt}{q}_{jt}{d}_j{w}_{jt}^{\ast }>\left(1-\frac{1}{\left|I\right|}\right){\sum}_{j\in J}{\sum}_{t\in T}{c}_{jt}{q}_{jt}{d}_j{w}_{jt}^{\ast } \).

Proof

The pricing sub-problem is in fact a weighted maximum coverage problem with c jt q jt d j acting as weights and |I| as the maximum number of sets to be selected. The result comes from the fact that for the weighted maximum coverage problem, the greedy algorithm achieves an approximation ratio of \( 1-{\left(1-\frac{1}{\left|I\right|}\right)}^{\mid I\mid } \) (Nemhauser et al. 1978). ■

The next lemma shows the impact of using the greedy algorithm in the column generation procedure instead of solving the sub-problem to optimality.

Lemma 3

Let v(RLPM) and v(LPM) denote the optimum objective function value of the current RLPM and LPM respectively. Also, let v(PPG) be the reduced cost obtained by solving the pricing sub-problem using the greedy algorithm. We have:

$$ v(LPM)\ge v(RLPM)+\frac{v\left({PP}^G\right)}{1-{\left(1-\frac{1}{\left|I\right|}\right)}^{\left|I\right|}}>v(RLPM)+\frac{v\left({PP}^G\right)}{1-\frac{1}{e}}. $$
(23)

Proof

Let v(PP) be the optimal solution of pricing sub-problem. From Lemma 2 we have \( V\left({PP}^G\right)\le \left(1-{\left(1-\frac{1}{\left|I\right|}\right)}^{\left|I\right|}\right)v(PP)<\left(1-\frac{1}{\left|I\right|}\right)v(PP) \). Thus:

$$ v(PP)\ge \frac{V\left({PP}^G\right)}{1-{\left(1-\frac{1}{\left|I\right|}\right)}^{\left|I\right|}}>\frac{V\left({PP}^G\right)}{1-\frac{1}{e}}. $$
(24)

Moreover, from Lemma 1 we have:

$$ v(LPM)\ge v(RLPM)+v(PP). $$
(25)

The result follows from inequalities (24) and (25). ■

Remark 2

Note that when |I| = 1, when there is only one security team, the bound in Lemma 2 is tight and the greedy algorithm obtains the optimal solution.

Remark 3

Even-though we cannot prove a tighter bound for the greedy algorithm, the numerical experiments show that the solutions obtained using the greedy algorithm match the optimal solution in every instance.

Numerical experiments

In this section, we perform computational experiments to investigate efficiency of the proposed algorithms and gain insight on some properties of the game. The algorithms are coded in C++ and CPLEX 12.6 solver has been used to solve the LPs and the pricing sub-problems. The computational experiments are performed on a computer with 2.4 GH processor and 4 GB of RAM. For all experiments in this section, we assume that the number of security teams is given as a parameter, represented by ∣I∣. We also assume that these security teams are identical and they cannot be divided into smaller teams. Meaning that each team must be in exactly one station at each time. Moreover, each security team is required to have exactly two breaks in the time horizon.

In our first experiment, we compare the performances of the proposed exact column generation approach (ECG) with the greedy column generation (GCG). Our base set of test instances consists of randomly generated instances with the underlying general graphs. To generate general graphs, the expected edge density (measured as \( \frac{\left|E\right|}{\left|J\right|\left(\left|J\right|-1\right)} \), where E is the set of edges in the graph. We do not consider self-loop edges in calculating edge density) of 60% is used for the graph, and the number of stations, |J|, ranges from 20 to 40. In generating the general graphs, we first started with a random tree and added random edges until the edge density reaches 60%. We generated five instances for each problem size, with different values of T ∈ {10, 11, …, 15} and |I| ∈ {1, 2, …, 5}.

Our results show that the GCG always finds the same expected damage value as the ECG. One possible explanation for this phenomenon is that our model only indirectly considers the lost time to travel from one station to another. We then compare the algorithms in terms of their run times. Tables 1 and 2 show the obtained run times in seconds. In these tables for each instance, the smaller run time is highlighted in bold. As seen in these tables GCG performs better than ECG for all instances of the problem. Moreover, for both algorithms, the run time generally increases as the number of stations, i.e. |J|, and the number of time periods, i.e. |T|, increases.

Table 1 Comparison of exact column generation and greedy column generation run times (seconds)
Table 2 Comparison of exact column generation and greedy column generation run times (seconds)

Figure 4 shows the effect of increasing the number of security teams on the expected damage for different number of stations. As seen in this figure, the expected damage decreases as the number of security teams increases. However, the amount of decrease in expected damage also decreases as the number of security teams increases. This diminishing returns effect is visible for all values of |J|.

Fig. 4
figure 4

The effect of number of security teams on expected damage

Figure 5 shows convergence of the lower and upper bounds of GCG over iterations for an instance of the problem with |I| = 1, T = 10 and different values of |J|. In each iteration the lower bound is computed using Lemma 3 and the upper bound is taken as the current objective function value. As seen in this figure, the lower bound is not monotone over the iterations and the yo-yo effect is visible. Moreover, for most cases, the upper bound value stabilizes way before the algorithm terminates. This means that after the upper bound values stabilize, we can terminate the column generation algorithm without undermining the solution quality drastically.

Fig. 5
figure 5

Convergence of GCG over iterations

Next, we analyze the effect of detection probabilities on the expected damage, here we assume that the detection probabilities are equal to each other. Figure 6 shows the effect of detection probability on the expected damage for different values of number of security teams |I|. As seen in this figure, the expected damage is smaller when there are more security teams. Moreover, as the detection probability increases, the expected damage, generally, decreases and this decrease, roughly speaking, behaves linearly for higher values of detection probability.

Fig. 6
figure 6

The effect of detection probability on expected damage

Next, we consider a real case of an urban rail network with 51 stations. Figure 7 shows the network graph of this case. As seen in this figure, there are two main lines that connect different parts of the city together. There is one free interchange between stations 41 and 16. Occupancy levels in each station are collected based on ridership totals, which are in turn based on turnstile entry and exclude free interchange ridership. A 12-h planning time horizon, starting from 5:00 AM and ending at 5:00 PM, is considered for this problem (i.e. T = 12 is considered). Based on this information, we run the GCG algorithm to obtain the best patrolling strategy for |I| ∈ {1, 2, …, 10}.

Fig. 7
figure 7

Case network

We first study the effect of the defender’s deviation from the Nash equilibrium on the expected damage. Specifically, we consider the case that the defender, in deriving her strategy, mistakenly thinks that the attack probabilities are the same and are uniformly distributed over stations. She then uses a probabilistic approach (PA) to obtain a single schedule. We compare the expected damage in this case with the expected damage in the Nash Equilibrium (NE). Figure 8 shows the results of this comparison. As seen in this figure, Nash strategy results in smaller expected damage values for all instances. Moreover, as the number of teams increases, the expected damage decreases for both NE and PA. The diminishing returns phenomenon is visible for NE, however this effect does not exist for PA.

Fig. 8
figure 8

Comparison of probabilistic approach (PA) and the Nash equilibrium (NE)

We now study the distribution of the expected number of visits in important stations. Figure 9 shows the expected number of visits over a 30-day period for the 5 most visited stations based on time of the day, for the case of 10 security teams. As seen in this figure, majority of the stations have two peak visit times: one starts around 7:00 AM and end around 9:00 AM, the other one starts around 2:00 PM and end around 4:00 PM. Some stations also have peak visit times at the start and end of times horizon.

Fig. 9
figure 9

Expected number of visits for five most visited stations

Next, we study the distribution of the expected damage among stations based on time of the day. Figure 10 shows the distribution of expected damage for 10 stations with the highest expected damage values, for the case of 10 security teams. As seen in this figure, for the majority of the stations there are two peak times for the expected damage: one starts around 7:00 AM and end around 10:00 AM, the other one starts around 1:00 PM and end around 3:00 PM.

Fig. 10
figure 10

Distribution of expected damage for 10 stations with highest expected damage values

Conclusions and future research

In this paper, we propose a game theoretic approach for patrolling urban rail networks. The proposed model generates inherently randomized schedules that observe rostering related constraints. Our proposed model is a non-cooperative simultaneous move game between a defender and an attacker. We developed efficient algorithms to find the Nash equilibrium for this game. We also presented lower bounds on the expected potential damage, which can be used to terminate the column generation algorithm when a desired solution quality is reached. We then run computational experiments to investigate the efficiency of the proposed algorithms and to gain insight about the value of the patrolling game. Our results show the efficiency of the proposed algorithms. Finally, we present results for the case of a real urban rail network.

Our study closes a gap in the literature by extending the existing models so that schedule randomization is done in a strategic way with consideration of possible adversary strategies. However, there is still a need to further study more efficient solution methods to find the Nash equilibrium. Moreover, studying the effect of transparency vs secrecy is an interesting topic for future research in this area. Current model assumes complete secrecy of generated schedule. However, the adversary may perform surveillance to find out the current locations of the patrollers. Transparency could be modelled via a Stackelberg game.

Another way to extend the current research is integrating the current model with long term strategic decisions to obtain a comprehensive model. The current model only studies the operational decision of scheduling the security teams; it assumes that the strategic decisions are fixed and given. For example, the number of available security teams is an input parameter, moreover the detection probabilities are assumed to be fixed and given. However, the number of security teams can be considered as a decision variable, the detection probabilities can be changed by investing on new technologies. Usually the operational and strategic decisions are studied separately, however these decisions affect each other and integrating them to obtain a comprehensive model may lead to significant reductions in the expected potential damage.