Keywords

1 Introduction

The explosive growth of the Internet of Things (IoT) and mobile devices leads to an explosion of new applications and services, increasing the burden of what today’s Internet could carry [1]. Especially, since the data collected from a lot of devices can generate excessive traffic, several researchers have tried to solve this issue using Edge Computing [2, 3]. To cope with the numerous and diverse IoT devices, the edge computing infrastructure has to support a lot of connected devices and the processing of the massive data collected and complex applications. However, the edge server in edge computing has limited computational and processing resources compared to high-end servers in the cloud server [4]. Therefore, to support high scalability, ultra- low latency, high throughput, and reliable transmission of data, the SDN paradigm is regarded as one of the suitable solutions [5,6,7].

SDN is an innovative technology in the field of computer network that separates data transmission function and control function from each other, allowing the users the flexibility of using the functions of the network in their own devices [8, 9]. SDN has numerous advantages including direct programming, centralized management, fast delivery, and flexibility. In SDN a single controller may be used as a centralized controller. If excessive packets need to be processed, however, its performance is significantly degraded because of limited processing capacity and distance to the switching devices [2]. Meanwhile, the failure of a single controller may lead to the collapse or congestion of entire network. In order to effectively resolve these issues, the distributed controller architecture was proposed [10,11,12]. Distributed controller architectures with more than one controller could be used to address some of the challenges of a single SDN controller such as availability [13]. Furthermore, distributed controllers can reduce the latency or increase the scalability and fault tolerance of the SDN deployment. However, this architecture increases the lookup overhead of communication between switches and distributed controllers. Moreover, this approach is difficult to maintain the consistent state in the overall distributed system [14, 15].

In this paper we propose a novel scheme which allows a switch to select an optimal controller from distributed controllers in SDN for edge computing environments. The proposed scheme decides optimal controller based on the data and weight of the controllers using the improved ABC algorithm based on Apriori algorithm for effective association rule mining between switch and controller. It can achieve controller optimization while keeping the excellent performance of the improved ABC algorithm. In the proposed scheme, the priority of each controller was determined by considering the computing and communication capacity of the controllers using an improved ABC algorithm, which includes Apriori algorithm and FCFS (First-Come-First-Served) policy. The proposed scheme significantly reduces the communication latency between the switch to the controller by selecting an optimal controller compared to the existing schemes. Also, the proposed scheme can solve the consistency problem by employing the meta-heuristic association rule mining algorithm. Furthermore, through the uniformly distributed controller, the proposed scheme increase scalability, connectivity, and flexibility of the network, which increases the communication efficiency and reduces the propagation delay of the link. Computer simulation reveals that the proposed scheme consistently outperforms the scheme employing only ABC and Apriori algorithm separately in terms of response time, arrival rate, number of messages, and accuracy.

The rest of the paper is organized as follows. Section 2 introduces the related work for proposed scheme. Section 3 describes the proposed scheme in a detailed manner. The experimental results of the proposed scheme are explained in Sect. 4. Finally, Sect. 5 concludes the paper and describes future research directions.

2 Related Work

2.1 Software Defined Networking

SDN is an effective networking paradigm which makes it easier for the network manager to control the network [16, 18]. SDN is a conceptual architecture that decouples the control plane and data plane of the network and enables network partitioning [5]. SDN architecture is divided into three categories as shown in Fig. 1: the physical infrastructure layer, the controllable control layer, and the application layer. This lets SDN not only create complex paths that cannot be configured in existing networks but also effectively cope with changing traffic patterns and quickly configure the virtual networks required in cloud environments. The control plane in SDN is decoupled from the data plane by drawing the networking functions from the forwarding devices as shown in Fig. 1. The separation of the control plane and the data plane of the network has the advantage of being able to respond more quickly to a malfunction caused by a problem and increase the flexibility and availability of the network. The control functions are deployed to logically centralized controllers so that they can be implemented on a centralized software platform [17]. Using a single, centralized controller might be efficient since the overloaded switch can migrate to a new controller from the previously connected one.

Fig. 1.
figure 1

The three-layer structure of SDN

2.2 OpenDaylight

OpenDaylight (ODL) is a modular open SDN platform for the networks of any size and scale [19, 20]. By sharing YANG data structure in the common data store and messaging infrastructure, ODL allows for fine-grained services to be created and combined together to solve more complex problems. In the ODL Model Driven Service Abstraction Layer (MD-SAL), any app or function can be bundled into a service that is loaded into the controller. The model-driven approach is being increasingly used in the networking domain to describe the functionality of network devices [21], services [22], policies [23, 24], and network APIs [25]. The protocols of choice are NETCONF and RESTCONF; the modeling language of choice is YANG. NETCONF [26] is an IETF network management protocol that defines configuration and operational conceptual data stores and a set of Create, Retrieve, Update, Delete (CRUD) operations that can be used to access these data stores. RESTCONF is a model that describes the mapping of YANG data to a REST-ful API [27, 28]. It is a REST-based protocol that runs over HTTP and is used to access YANG defined data, using Network Configuration Protocol (NETCONF) defined data stores. The YANG data modeling language is used to define the data sent over NETCONF [29]. It can model both the configuration data as well as the manipulated state data.

OpenDaylight SDN controller has several layers. The top layer consists of business and network logic applications. The middle layer is the framework layer, and the bottom layer consists of physical and virtual devices. The middle layer is the framework in which the SDN abstractions can manifest. This layer hosts north-bound and south-bound APIs. The controller exposes open north-bound APIs which are used by applications. OpenDaylight supports the OSGi framework and bidirectional REST for the northbound API. The business logic resides in the applications above the middle layer. The applications use the controller to gather network intelligence, run algorithms to perform the analytics, and then use the controller to orchestrate new rules, if any, throughout the network. ODL supports multi-controllers composing a cluster. If there is only a single ODL controller, it works individually. The multi-controller structure could avoid the consequence of single controller crash, and controllers directly communicate with each other rather than via data plane.

2.3 Artificial Bee Colony Algorithm

ABC algorithm is one of the more recent swarm intelligence based optimization algorithms for solving multidimensional optimization problems [30]. Figure 2 is the flowchart of ABC algorithm.

Fig. 2.
figure 2

The Flowchart of ABC Algorithm

The intelligent behavior of honey bee colony which search new food sources around their hive was considered to compose the algorithm. In the algorithm, the colony of artificial bees consists of three groups of bees called employed bees, onlookers and scouts. While a half of the colony consists of the employed artificial bees, the other half includes the onlookers. There is only one employed bee for every food source. That is, the number of employed bees is equal to the number of food sources around the hive. The main steps of the algorithm are given below.

First, source initialization is the initial source to a random value.

$${X}_{ij}={X}_{minj}+rand(\mathrm{0,1})({X}_{maxj}-{X}_{minj})$$
(1)

Second, the employed bee searches the neighbor source and estimates the amount of nectar of the source, and informs the onlooker bee of the source of higher fitness.

$${V}_{ik}={X}_{ik}+rand(-\mathrm{1,1})({X}_{ik}-{X}_{jk})$$
(2)

Third, here a source is selected probabilistically by onlooker bee based on the source discovered by employed bee and the estimated amount of nectar. Onlooker bee selects the source by

$${P}_{i}=\frac{f({X}_{i})}{\sum f({X}_{n})}$$
(3)

2.4 Apriori Algorithm

Data Mining is a way of obtaining undetected patterns or facts from massive amount of data in a database. Association rule mining is a major technique in the area of data mining. Association rule mining finds frequent itemsets from a set of transactional databases. Apriori algorithm is one of the earliest algorithms of association rule mining [31, 32]. Apriori employs an iterative approach known as level-wise search. In Apriori, (k + 1) itemsets are generated from k-itemsets. First, scan the database for count of each candidate and compare candidate support count with minimum support count to generate set of frequent 1-itemsets. The set is denoted as L1. Then, L1 is used to find L2, set of frequent 2-itemsets, which is further used to find L3 and so on, until no more frequent k-itemsets can be found [33]. After finding set of frequent k-itemsets, it is easy to generate strong association rules. The process of finding each Lk requires the database to be scanned completely once. To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property, presented is used to reduce the search space. In Apriori property, all nonempty subsets of a frequent itemset must also be frequent. A two-step process is used to find the frequent itemsets: join and prune actions.

  1. 1)

    The join step: To find Lk a set of candidate k-itemsets is generated by joining Lk–1 with itself. This set of candidates is denoted Ck.

  2. 2)

    The prune step: The members of Ck may or may not be frequent, but all of the frequent k -itemsets are included in Ck. A scan of the database to determine the count of each candidate in Ck would result in the determination of Lk (i.e., all candidates having a count no less than the minimum support count are frequent by definition, and therefore belong to Lk). To reduce the size of Ck, the Apriori property is used as follows. Any (k–1)-itemset that is not frequent cannot be a subset of a frequent k-itemset. Hence, if any (k–1)-subset of a candidate k-itemset is not in Lk–1, then the candidate cannot be frequent either and so can be removed from Ck.

3 The Proposed Scheme

3.1 Basic Operation

The proposed scheme gathers data from all the controllers to measure how often each controller is used, and then finds the association rules based on the items run frequently by the controllers. In the SDN distributed controller environment, the controllers are selected by Apriori algorithm according to the frequency of use. For this, the data of all the controllers are initialized to random values, and then the neighbor controllers of each controller are searched regarding the amount of nectar. Using the transaction support of Apriori algorithm, the controller’s goodness of fit is estimated. In order to minimize the time for searching the association rules, the FCFS (First-Come-First-Served) policy is applied. If there exists a priority rule, the rule is selected first. Then the remaining rules are found.

In searching the source only the one of the highest value is selected, while the others are discarded. This is for minimizing the communication cost. Figure 3 is the flowchart of the proposed scheme for selecting an optimal controller based on the data and weight of the controllers.

Fig. 3.
figure 3

The Flowchart of the proposed scheme

3.2 Priority and Weight of Controller

The priority of the controllers is decided using the ABC algorithm as follows. First, the data of the controllers are initialized to random value, and then the distance between a controller and its neighbor one is measured. The fitness of the controllers is then calculated using the transaction support of the Apriori algorithm and weight of them. The weight of each controller is defined to represent the throughput. The performance of a controller is affected by various factors such as distance to communicating controller, bandwidth, transmission delay, load, and packet loss probability, etc. Only the most frequently used controller identified by the proposed approach is selected, while the remaining controllers are excluded. By selecting the most frequently used controller, collision between the controllers and communication load can be reduced. Also, the detailed selection algorithm of controller is shown in Algorithm 1.

The following is to calculate the weight of controller_v, w(v).

$$w\left(v\right)={\omega }_{1}P\left(v\right)+{\omega }_{2}S\left(v\right)$$
(4)

where \({{\varvec{\omega}}}_{1}\) and \({{\varvec{\omega}}}_{2}\) are weight coefficients. P(v) is the operational performance and S(v) is amount nectar of controller_v, respectively.

$$ \begin{aligned} W\left(v, s\right)= & \mathrm{\alpha }\cdot dis\left(v, s\right)+\upbeta \cdot ban\left(v, s\right) \\ & +\upgamma \cdot del\left(v, s\right)+\updelta \cdot load\left(v, s\right) \\ \end{aligned} $$
(5)

In Eq. (5) 0 ≤  α, β, γ, λ ≤ 1, α + β + γ + λ = 1. The parameter of distance, bandwidth, delay, and load should be normalized:

$$\mathrm{dis}\left({w}_{v}, {w}_{s}\right)=\frac{dis\left({w}_{v}, {w}_{s}\right)}{\sum_{i,j\in l(v,s)\&i\ne j}dis({w}_{i},{w}_{j})}$$
$$\mathrm{ban}\left({w}_{v}, {w}_{s}\right)=\frac{ban\left({w}_{v}, {w}_{s}\right)}{{\sum }_{i,j\in l(v,s)\&i\ne j}ban\left({w}_{i},{w}_{j}\right)}$$
$$\mathrm{del}\left({w}_{v}, {w}_{s}\right)=\frac{del\left({w}_{v}, {w}_{s}\right)}{{\sum }_{i,j\in l\left(v,s\right)\&i\ne j}del\left({w}_{i},{w}_{j}\right)}$$
$$\mathrm{load}\left({w}_{v}, {w}_{s}\right)=\frac{load\left({w}_{v}, {w}_{s}\right)}{{\sum }_{i,j\in l\left(v,s\right)\&i\ne j}load\left({w}_{i},{w}_{j}\right)}$$

Then the weight of wv to all other (n − 1) controllers in the network is:

$$All-Weight\left({w}_{v}\right)={\sum }_{\mathrm{s}=1\&\mathrm{s}\ne \mathrm{v}}^{n}weight\left({w}_{v}, {w}_{s}\right)$$
(6)

The weight of wv is calculated by

$$W\left({w}_{v}\right)=\mu \cdot weight\left({w}_{v}\right)+\sigma \cdot All-Weight\left({w}_{v}\right)$$
(7)
figure a

3.3 Transaction

The neighbor source of an assigned source is checked, and the source of more amount of nectar is notified to the onlooker bee. They are also weighted, and the one of low nectar amount is discarded. Next, based on the source searched by employed bee and the estimated nectar amount, the onlooker bee selects the source to search. In this way, the onlooker bee selects the source of the largest amount of nectar among the ones the employed bee found.

Let S and C denote a switch and controller, respectively. Each food source, \({{\varvec{X}}}_{{\varvec{C}}}^{{\varvec{S}}}\), in the population is represented as

$${X}_{C}^{S}=\left\{{X}_{C}^{S},{X}_{C}^{S}, \dots ,{X}_{C}^{S}\right\}, \forall s \in N\forall c \in {P}_{s}$$
(8)

where Ps the set of controllers and N is the number of switches. It estimates the amount of nectar a switch can have from the controllers of identical schedule number. Equation (9) is used for deciding the fitness of a switch connected to a controller, Fcs:

$${F}_{cs}={C}_{cs}+{\theta }_{cs}, \forall cs \in {P}_{s}$$
(9)

where \({C}_{cs}\) indicates the sum of coverages for all schedule numbers with complete coverage, \({\theta }_{cs}\) shows the maximum incomplete coverage and \({F}_{cs}\) represents the fitness value for the cs in the controller.

Next, each new cs in the controller for each switch is generated by only updating.

$${V}_{c}^{s}= \left\{ {\begin{array}{*{20}c} {s,} & {pri>0.5} & {\forall c\in {P}_{s}, \forall c\in {N}_{nc}} \\ {{X}_{c}^{s},} & {otherwise} & {s\in \varphi} \\ \end{array} } \right. $$
(10)

Each switch will select a certain number s in the incomplete schedule vector \(\varphi \) as a priority.

3.4 Selection Manager

The controller of a higher weight is needed to have more networking operation. In the proposed scheme, the controller appropriate for leading the update process is elected according to the weight. The network manager has the privilege of commanding the entire network since it keeps the network view. The network manager has the privilege of commanding the entire network since it keeps the network view. The priority of controller_i, \({P}_{i}\), is defined as

$${P}_{i}= \left\{ {\begin{array}{*{20}c} {\left\lfloor {S*W(c_{i} )} \right\rfloor } & {without\;manager} \\ \infty & {with\;manager} \\ \end{array} } \right. $$

where S is the scale of the network. The controllers broadcast their priority and receive the priority of other controller in a present time. They regard the controller with the highest priority as their header.

The header selection is done following three steps.

  • Step 1. \({P}_{i}\) (i = 1,2,3,…,n) are calculated.

  • Step 2. \({P}_{i}\) is broadcast during \({T}_{b}\) which is broadcast period. It also receives the priority of other controller.

  • Step 3. The controller saves the address and priority of the controller which priority is bigger than itself. And it considers the controller with the highest priority as its header. Once a controller receives the priority which is same with itself, its priority will be subtracted one to avoid the same priority.

4 Performance Evaluation

In this section, the performance of the proposed scheme is evaluated via computer simulation. The simulation is performed on a PC consisting of Intel i5-7500 CPU, Window OS, and 8GB memory, and the scheme was implemented with Python and MATLAB. Also, the performance of the proposed scheme is compared with two other schemes to verify its relative effectiveness, the ABC and Apriori algorithm. The test data set is from Stanford Network Analysis Project (SNAP), which is a general purpose network analysis and graph mining library. The controller appropriate to lead the update process is elected to handle massive network of hundreds of millions of nodes and billions of edges. It is efficient for manipulating large graphs, calculates structural properties, generates regular and random graphs, and supports the attributes of nodes and edges.

To investigate the effectiveness of the proposed scheme, we first evaluate the response time with various sizes of data. The comparison results with ABC, Apriori algorithm, and the proposed scheme is shown in Fig. 4. Figure 4 shows that the proposed scheme displays the smallest response time among the three schemes, while the ABC algorithm is the largest. As a result, the proposed scheme effectively reduces the communication overhead in searching the controllers than existing schemes.

Fig. 4.
figure 4

The comparison of response time between proposed scheme and existing schemes

Arrival rates of the proposed scheme and existing schemes are compared in Fig. 5, which demonstrates that the proposed scheme consistently shows the lowest arrival rate with different sizes of controller data.

Fig. 5.
figure 5

The comparison of arrival rate between proposed scheme and existing schemes

Figure 6 shows the total amount of messages in ABC, Apriori algorithm, and the proposed scheme. Observe from the Fig. 6 that the proposed scheme always requires the smallest number of messages than ABC and Apriori algorithm. When the size of controller data is between 10 and 30, the ABC algorithm generates more messages than Apriori algorithm. However, Apriori algorithm needs slightly more messages than ABC algorithm after 40. This indicates that Apriori algorithm becomes overloaded when the data size grows beyond a certain level.

Fig. 6.
figure 6

The total amount of messages between proposed scheme and existing schemes

Figure 7 shows the comparison results of the accuracy in ABC algorithm, Apriori algorithm, and the proposed scheme. In Fig. 7, it demonstrates that the accuracy of the proposed scheme is about 90%, and it is the highest accuracy compared with the accuracies of the existing schemes. The accuracy of the ABC algorithm is the lowest, however, it continues getting higher accuracy as the controller data gets bigger, and it has the same accuracy as the Apriori scheme while the update data is 50 eventually.

Fig. 7.
figure 7

The comparison of accuracy with the proposed scheme and existing schemes

5 Conclusion

The SDN paradigm shifts control to a centralized omniscient controller. The controller, however, creates a bottleneck due to the enormous amount of message exchanges between the switches and the controller in SDN for edge computing. As a result, inappropriate switch assignment to the controller reduces performance. In this paper, we have proposed a novel scheme which allows a switch to select an optimal controller from distributed controllers in order to reduce communication and propagation latency and improve throughput and reliability in SDN. The optimal controller is selected using the ABC and Apriori algorithms. Also, the priority of each controller in the proposed scheme was determined by considering the computing and communication capacity of the controllers. Moreover, the proposed scheme can solve the consistency problem by employing the meta-heuristic association rule mining algorithm. Computer simulation reveals that the proposed scheme consistently outperforms the ABC and Apriori algorithms in terms of response time, arrival rate, and number of messages exchanged. In the future, we will expand the proposed controller selection scheme by employing more sophisticated scheme such as Gaussian mixture model and artificial intelligence technique. Also, the proposed scheme will also be tested and expanded considering various environments and applications where the requirements on the energy and communication latency are diverse.