
1 Introduction

Software Defined Networking (SDN) is an emergent paradigm that offers a software-oriented network design, simplifying network management by decoupling the control logic from the forwarding devices [10]. The SDN architecture is composed of three planes: management, control, and data. The management plane is responsible for defining the network policies and is connected to the control plane via northbound interfaces. The control plane is the brain of the SDN, interacting with the data plane through southbound interfaces such as OpenFlow [5].

Most of the currently available controllers, like NOX [6] and Beacon [4], are physically centralized. Although a single controller offers a complete network-wide view, it represents a single point of failure and lacks both reliability and scalability [19]. For this reason, multi-controller SDNs were developed [5], allowing the control plane to be physically distributed while remaining logically centralized by synchronizing the network state among controllers [20]. Controllers of this type include OpenDaylight [11] and Kandoo [7]. Multi-controller SDNs solve the main problems found in centralized SDN, but they introduce new challenges, such as network state synchronization and switch-to-controller mapping. Another key problem in multi-controller SDNs is the Controller Placement Problem (CPP) [15]. The problem has been proved to be NP-hard [17] and is one of the hottest topics in multi-controller SDNs [8, 12, 13, 17].

When a switch receives a new packet, it consults the forwarding rules in its flow table in order to determine how to handle the packet. If there is no match in the flow table, the packet is buffered temporarily and the switch sends a packet-in message to the responsible controller. The controller reactively calculates the path for this packet and installs a new rule in the affected switches. Two major factors influence the effectiveness of this process: (i) the load at the controller; (ii) the propagation delay between the switch and the controller. These two factors combine into a single efficiency measure, the flow setup time, which is twice the propagation delay between switch and controller plus the queuing and processing time of the packet at the controller.
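As an illustrative formalization of this measure (the symbols below are used only here and are not part of the model in Sect. 3), for a switch \(s\) mapped to controller \(c\) the flow setup time can be written as

$$\begin{aligned} T_{\text {setup}}(s,c) = 2\, d(s,c) + q_c + p_c \end{aligned}$$

where \(d(s,c)\) is the one-way propagation delay between \(s\) and \(c\), and \(q_c\) and \(p_c\) are the queuing and processing times at controller \(c\), both of which grow with the controller load.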

The CPP aims at deciding the number of required controllers and where to place them [8], partitioning the network into subregions (domains) while considering quality criteria and cost constraints [1, 12]. The CPP model discussed in this article incorporates the previously mentioned flow setup time while providing reliable and scalable solutions.

The remainder of this paper is organized as follows. Section 2 discusses work related to CPP in SDNs. Section 3 presents the mathematical model, whose results are discussed in Sect. 4. Finally, some conclusions are drawn in Sect. 5, together with future work.

2 Related Work

The placement problem was mentioned for the first time in [8]. This problem is similar to the well-known facility location problem, and it is solved in the aforementioned article as a K-center problem to minimize the inter-plane propagation delay. In [17] the problem is extended to incorporate the capacity of the controllers. A new metric called expected percentage of control path loss is proposed in [16] to guarantee a reliable model. Cost, controller type, bandwidth, and other factors are considered in [12], and the expansion problem is considered in [13]. In these articles, the problem is usually modeled as an integer program.

Heuristic methods that incorporate switch migration can be found in [18]. A game model is also proposed in [2]. A comprehensive review of heuristic methods can be found in [9]. QoS-aware CPP is presented in [3] and solved using greedy and network partitioning algorithms. Recently, scalability and reliability issues in large-scale networks have been considered in [1], where clustering and genetic approaches are proposed, but these approaches are prone to sub-optimality.

Compared with other models in the literature, our model determines the optimal placement under different failure scenarios, reducing both latency and controller overload while ensuring scalability.

Table 1. Known information.
Table 2. Required variables.

3 Mathematical Model

In the following discussion the physical topology graph is assumed to be defined by \(\mathcal {G(N,L)}\), where \(\mathcal {N}\) is the set of physical nodes/locations and \(\mathcal {L}\) is the set of physical links. The remaining notation for known information and variables, used throughout this paper, is presented in Tables 1 and 2.

Objective Function: To ensure that the SDN network scales up linearly, the goal is:

$$\begin{aligned} \text {Minimize } \delta + K_1 \frac{\varTheta ^{\text {TOTAL}}}{\varDelta } + K_2 \frac{\varPi ^{\text {TOTAL}}}{\varDelta } \end{aligned}$$
(1)

where \(\varDelta \) is a large constant. The primary goal is to minimize \(\delta \); reducing latency is secondary. The factors \(K_1\) and \(K_2\) should be adapted according to the relative relevance of inter-controller and switch-controller latency. The motivation behind giving \(\delta \) more importance is that the provided controller placement will be used for a relatively long period of time, during which traffic conditions may change. Therefore, the scalability of the solution is considered to be critical.
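As a minimal sketch of how objective (1) can be expressed with an off-the-shelf modeling layer (PuLP and the constant values below are illustrative choices, not the implementation used in this work):

```python
import pulp

# Illustrative weights and large constant; actual values should reflect the
# relative relevance of inter-controller and switch-controller latency.
K1, K2 = 1.0, 1.0
DELTA = 1e6

prob = pulp.LpProblem("controller_placement", pulp.LpMinimize)

# Scalability factor (0 <= delta <= 1) and the two aggregate latency measures,
# which the remaining constraints tie to the placement and mapping variables.
delta = pulp.LpVariable("delta", lowBound=0, upBound=1)
theta_total = pulp.LpVariable("theta_total", lowBound=0)  # switch-controller latency
pi_total = pulp.LpVariable("pi_total", lowBound=0)        # inter-controller link usage

# Objective (1): delta dominates; the latency terms act as tie-breakers
# because they are divided by the large constant DELTA.
prob += delta + (K1 / DELTA) * theta_total + (K2 / DELTA) * pi_total
```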

Constraints: The following additional constraints must be fulfilled.

– Placement of controllers:

$$\begin{aligned} \sum _{\{n \in \mathcal {N}_c\}} \sigma ^c_n = 1, \forall c \in \mathcal {C} \end{aligned}$$
(2)
$$\begin{aligned} \small \sum _{\{l \in {\mathcal {L}\backslash \mathcal {L}_f}: src(l)=n\}} \mu ^{c_i,c_j}_{n_i,n_j,f,l} - \sum _{\{l \in {\mathcal {L}\backslash \mathcal {L}_f}: dst(l)=n\}} \mu ^{c_i,c_j}_{n_i,n_j,f,l} =\nonumber \\ =\Bigg \{\begin{array}{rl} \sigma ^{c_i}_{n_i}, &{} if\ n=n_i \\ - \sigma ^{c_j}_{n_j}, &{} if\ n=n_j, \\ 0, &{} otherwise \end{array} \forall c_i,c_j \in \mathcal {C}, \forall f \in \mathcal {F}, \forall n_i \in \mathcal {N}_{c_i}, \forall n_j \in \mathcal {N}_{c_j}, \forall n \in \mathcal {N} \end{aligned}$$
(3)

Constraints (2) ensure a single location for each controller \(c \in \mathcal {C}\), while constraints (3) ensure that there is a path between every pair of controllers, under any failure scenario, taking their locations into account. These paths are used for state synchronization.
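A minimal sketch of the flow-conservation pattern behind constraints (3), simplified by assuming that the two controller locations are already fixed (so the right-hand side reduces to a constant unit flow) and using an illustrative four-node topology for one failure scenario:

```python
import pulp

# Toy surviving topology for one failure scenario: directed links (src, dst).
nodes = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
src_ctrl, dst_ctrl = "A", "D"  # assumed (already fixed) controller locations

prob = pulp.LpProblem("inter_controller_path", pulp.LpMinimize)
mu = pulp.LpVariable.dicts("mu", links, cat="Binary")  # is link l on the path?

# One unit of flow leaves the source controller, one unit enters the
# destination controller, and flow is conserved at every other node.
for n in nodes:
    outflow = pulp.lpSum(mu[l] for l in links if l[0] == n)
    inflow = pulp.lpSum(mu[l] for l in links if l[1] == n)
    if n == src_ctrl:
        prob += outflow - inflow == 1
    elif n == dst_ctrl:
        prob += outflow - inflow == -1
    else:
        prob += outflow - inflow == 0

prob += pulp.lpSum(mu.values())  # fewest-hop path as a simple objective
prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([l for l in links if mu[l].varValue == 1])  # the links forming the A-to-D path
```

In the complete model the right-hand side depends on the placement variables \(\sigma \), so a path is only enforced between the locations actually chosen for the controllers.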

In [14] it is stated that the controller load can be reduced, achieving load balance among neighboring controllers, if each controller needs to communicate only with its local neighbors. Therefore, the paths from any controller towards all the other controllers should share as many links as possible (leading to a bus-like logical topology). This is ensured by the following constraints.

$$\begin{aligned} \beta ^l_f \ge \mu ^{c_i,c_j}_{n_i,n_j,f,l}, \forall c_i,c_j \in \mathcal {C},\forall n_i,n_j \in \mathcal {N}_{c}, \forall f \in \mathcal {F}, \forall l \in \mathcal {L} \end{aligned}$$
(4)
$$\begin{aligned} \varPi ^{\text {TOTAL}} =\sum _{\forall f \in \mathcal {F}} \sum _{\forall l \in \mathcal {L}} \beta ^l_f \times 1/2 \end{aligned}$$
(5)

where \(\varPi ^{\text {TOTAL}}\), which accounts for the links used in inter-controller communication (and thus for the end-to-end hops of the inter-controller paths), is to be included in the objective function.
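Constraints (4) follow a standard pattern for turning \(\beta ^l_f\) into an indicator that link \(l\) carries at least one inter-controller path under scenario \(f\). A minimal, self-contained illustration of the pattern (the variables below are hypothetical placeholders):

```python
import pulp

prob = pulp.LpProblem("link_sharing", pulp.LpMinimize)

# Hypothetical usage variables of one link by three controller pairs.
mu = [pulp.LpVariable(f"mu_{k}", cat="Binary") for k in range(3)]
beta = pulp.LpVariable("beta", cat="Binary")  # is the link used by any pair?

# Pattern of constraints (4): beta >= mu forces beta to 1 as soon as any pair
# uses the link; since beta contributes to Pi_TOTAL, which is minimized in (1),
# it is not set to 1 unnecessarily.
for m in mu:
    prob += beta >= m

# Pi_TOTAL then adds up the beta indicators over links and failure scenarios,
# rewarding placements whose inter-controller paths share links.
prob += beta  # stands in for the Pi_TOTAL term of the objective
```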

– Switch to controller mapping:

$$\begin{aligned} \sum _{\{c \in \mathcal {C}\}} \gamma ^{s,c}_f = 1, \forall s \in \mathcal {S}, \forall f \in \mathcal {F} \end{aligned}$$
(6)
$$\begin{aligned} \sum _{\{s \in \mathcal {S}\}} \gamma ^{s,c}_f \times p_s \le h_c \times \delta , \forall c \in \mathcal {C}, \forall f \in \mathcal {F} \end{aligned}$$
(7)

Constraints (6) ensure a single mapping for each switch, and constraints (7) avoid the overload of controllers while ensuring scalability regarding future switch migrations (triggered to deal with load fluctuations) through the use of \(\delta \), which is included in the objective function too. Again, the multiple failure scenarios are taken into consideration.
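A minimal sketch of constraints (6) and (7) for a single failure scenario, using illustrative packet-in rates and controller capacities:

```python
import pulp

switches = ["s1", "s2", "s3"]
controllers = ["c1", "c2"]
p = {"s1": 40, "s2": 70, "s3": 50}  # packet-in rates per switch (illustrative)
h = {"c1": 100, "c2": 100}          # controller capacities (illustrative)

prob = pulp.LpProblem("switch_mapping", pulp.LpMinimize)
gamma = pulp.LpVariable.dicts("gamma", (switches, controllers), cat="Binary")
delta = pulp.LpVariable("delta", lowBound=0, upBound=1)

# Constraints (6): each switch is mapped to exactly one controller.
for s in switches:
    prob += pulp.lpSum(gamma[s][c] for c in controllers) == 1

# Constraints (7): the load assigned to a controller may use at most a
# fraction delta of its capacity; minimizing delta keeps headroom for
# future switch migrations triggered by load fluctuations.
for c in controllers:
    prob += pulp.lpSum(p[s] * gamma[s][c] for s in switches) <= h[c] * delta

prob += delta  # stands in for the scalability term of objective (1)
prob.solve(pulp.PULP_CBC_CMD(msg=0))
print(pulp.value(delta))  # 0.9 for this data: loads 90 and 70 on the two controllers
```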

– Switch-controller latency:

$$\begin{aligned} \lambda ^{s,c}_{n,f} \ge \gamma ^{s,c}_f+ \sigma ^c_n -1, \forall c \in \mathcal {C},\forall n \in \mathcal {N}_{c}, \forall f \in \mathcal {F}, \forall s \in \mathcal {S} \end{aligned}$$
(8)
$$\begin{aligned} \small \sum _{\{l \in {\mathcal {L}\backslash \mathcal {L}_f}: src(l)=n\}} \phi ^{s,c,n_i}_{f,l} - \sum _{\{l \in {\mathcal {L}\backslash \mathcal {L}_f}: dst(l)=n\}} \phi ^{s,c,n_i}_{f,l} =\nonumber \\ =\Bigg \{\begin{array}{rl} \lambda ^{s,c}_{n,f}, &{} if\ n=loc(s) \\ - \lambda ^{s,c}_{n,f}, &{} if\ n=n_i, \\ 0, &{} otherwise \end{array} \forall s \in \mathcal {S}, \forall c \in \mathcal {C}, \forall f \in \mathcal {F}, \forall n_i \in \mathcal {N}_{c}, \forall n \in \mathcal {N} \end{aligned}$$
(9)

where \(loc(s)\) is the location of switch \(s\). The total latency is obtained by:

$$\begin{aligned} \varTheta ^{\text {TOTAL}} = \sum _{\{s \in \mathcal {S}\}} \sum _{\{c \in \mathcal {C}\}} \sum _{\{f \in \mathcal {F}\}} \sum _{\{l \in \mathcal {L}\}} \sum _{\{n \in \mathcal {N}_{c}\}} \phi ^{s,c,n}_{f,l} \end{aligned}$$
(10)

\(\varTheta ^{\text {TOTAL}}\) is included in the objective function for latency minimization.
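Constraints (8) use the standard linearization of a product of two binary variables; a self-contained illustration of the pattern (variable names hypothetical) is:

```python
import pulp

prob = pulp.LpProblem("and_linearization", pulp.LpMinimize)

gamma = pulp.LpVariable("gamma", cat="Binary")  # switch s mapped to controller c?
sigma = pulp.LpVariable("sigma", cat="Binary")  # controller c placed at node n?
lam = pulp.LpVariable("lam", cat="Binary")      # do both conditions hold?

# Pattern of constraints (8): lam >= gamma + sigma - 1 forces lam to 1 only
# when both gamma and sigma are 1. Because lam acts as the source/sink of the
# unit flow in (9), whose link usage is minimized through Theta_TOTAL in (1),
# lam is not set to 1 unnecessarily.
prob += lam >= gamma + sigma - 1
```

The resulting unit flow of (9) then connects the switch location \(loc(s)\) to the chosen controller location, and (10) adds up the links it traverses over all switches, controllers and failure scenarios, in effect a hop-count-based measure of switch-controller latency.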

– Variable domains:

$$\begin{aligned} 0 \le \delta \le 1; \sigma ^c_n, \mu ^{c_i,c_j}_{n_i,n_j,f,l}, \beta ^l_f, \gamma ^{s,c}_f,\lambda ^{s,c}_{n,f}, \phi ^{s,c,n_i}_{f,l} \in \{0,1\}; \varTheta ^{\text {TOTAL}}, \varPi ^{\text {TOTAL}} \in \mathfrak {R}^{+}. \end{aligned}$$
(11)

This model assumes that the physical layer cannot become disconnected under a single physical link failure. The CPLEX optimizer has been used to solve the problem instances discussed in the following section.
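As a minimal end-to-end sketch of declaring the variable domains in (11) and handing the model to a solver (the toy objective, the toy constraint, and the fallback to CBC are only for illustration; CPLEX_CMD requires a local CPLEX installation):

```python
import pulp

prob = pulp.LpProblem("cpp_toy_instance", pulp.LpMinimize)

# Variable domains from (11): delta is continuous in [0, 1], placement/routing/
# mapping variables are binary, and the latency aggregates are non-negative reals.
delta = pulp.LpVariable("delta", lowBound=0, upBound=1)
sigma = pulp.LpVariable("sigma", cat="Binary")
theta_total = pulp.LpVariable("theta_total", lowBound=0)

prob += delta + theta_total       # toy objective standing in for (1)
prob += theta_total >= 2 * sigma  # toy constraint tying the variables together

# Use CPLEX when it is available (as in this work), otherwise fall back to
# the CBC solver bundled with PuLP.
solver = pulp.CPLEX_CMD(msg=False) if pulp.CPLEX_CMD().available() else pulp.PULP_CBC_CMD(msg=False)
prob.solve(solver)
print(pulp.LpStatus[prob.status], pulp.value(delta))
```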

4 Results

The values of the input parameters used by the optimizer are displayed in Table 3. Different failure cases were used to evaluate the model under three different topologies (Fig. 1). One case relates to single link failures (no two links fail at the same time in any scenario), while the other relates to multiple link failure scenarios. Two percentages of affected links were tested. More specifically:

Case I: Single link failure scenarios, where \(\mathop {\cup }\nolimits _{f \in \mathcal {F}} \mathcal {L}_f\) affects a total of 5% (a) or 15% (b) of all the links;

Case II: Two or more links failing simultaneously, in each failure scenario, where \(\mathop {\cup }\nolimits _{f \in \mathcal {F}} \mathcal {L}_f\) affects a total of 5% (a) or 15% (b) of all the links.

Fig. 1. Physical topologies used for analysis of results. Large nodes are controllers, medium nodes are switches, and small nodes are unused locations.

Table 3. Input parameters.

That is, in case Ia 5% of the links may fail, but no two links fail at the same time, while in case IIa there is a 0.5 probability for any pair of links to go down at the same time, which may lead to failure scenarios where two or more links fail simultaneously. In cases Ib and IIb, 15% of the links may fail.
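The paper does not detail how the scenario sets were generated, so the following construction, which mirrors the description of Cases I and II, is purely hypothetical (link identifiers, seed, and sizes are illustrative):

```python
import itertools
import random

random.seed(1)
links = [f"l{i}" for i in range(40)]                     # illustrative link identifiers
affected = random.sample(links, int(0.15 * len(links)))  # 15% of links (cases b)

# Case I: one failure scenario per affected link (no simultaneous failures).
scenarios_case_I = [{l} for l in affected]

# Case II: scenarios in which two (or more) links fail simultaneously; here,
# each pair of affected links is kept as a scenario with probability 0.5.
scenarios_case_II = []
for pair in itertools.combinations(affected, 2):
    if random.random() < 0.5:
        scenarios_case_II.append(set(pair))
```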

Fig. 2. Results for Topology I: high connectivity and relatively low number of possible locations for the controllers.

Table 4. Physical topology details.
Fig. 3. Results for Topology II: low connectivity and relatively low number of possible locations for the controllers.

Fig. 4. Results for Topology III: low connectivity and relatively high number of possible locations for the controllers.

Topology I: Results for this topology are shown in Fig. 2. This is the densest topology, with a relatively small number of possible locations for the controllers. The scalability factor \(\delta \) increases linearly as the number of packet-in messages increases.

Topology II: Results for this topology are depicted in Fig. 3. This topology has fewer links than Topology I, but the number of possible locations is kept similar. The main difference in these results is that the switch-controller latency has increased.

Topology III: Results for this topology are shown in Fig. 4. This topology also has fewer links than Topology I, but the number of possible locations for each controller is increased. In this case the total latency decreases significantly. Therefore, the model was able to compensate for the reduced number of links by finding adequate places for the controllers that lead to lower latency.

5 Conclusions

In this paper, a scalable and reliable model for controller placement is introduced. The model is mathematically formulated, and optimal controller placements under different failure scenarios are obtained. Results show that scalability is ensured under different failure scenarios, while latency increases can be compensated through more freedom in the controllers' locations. The model also serves multiple failure scenarios adequately, presenting similar results for more critical and less critical failure scenarios.

In general, results show that the model is able to keep scalability (\(\delta \)) while considering failure scenarios, ensuring load balancing among controllers. The latency may increase when network connectivity decreases, but this can be avoided if more possible locations for the controllers are allowed. Results are similar for Cases Ia, Ib, IIa and IIb, meaning that the model produces a controller placement that adequately serves multiple failure scenarios.