1 Introduction

With the inception of the 5G wireless communication system, a fully connected society is envisioned, supporting different types of business structures, domain expertise, and services, each associated with distinct traffic patterns and stringent requirements. Besides, several network models and different types of user equipment (UE) coordinate with capabilities such as ultra-dense network (UDN) deployments, making the future of wireless communications even more demanding.

In the inceptive stage of the 5th generation (5G), this heterogeneous framework places users and devices in an abstract resource pool, shifting away from the conventional cell notion, and presents the network administrator with cooperative resources in time, frequency, and space. A crucial requirement thus arises: acquiring a network-wide view so that decisions can be made optimally.

The abstractions of software-defined networking (SDN) and software-defined radio (SDR), combined into a cross-layer controller (CLC) [1], were exercised in a harmonized and coordinated manner, thereby contributing to optimized bandwidth utilization. However, fairness during resource allocation was not considered. In this work, throughput optimization is addressed by proposing a reinforcement machine learning scheme that stops learning only when convergence is attained.

SDN is distinguished by network programmability and the centralization of 5G control in the controller, with which fine-grained network management (NM) is ensured. Intelligent probing (IPro), built on the knowledge-defined networking (KDN) paradigm and reinforcement learning, was utilized in [2]. With the KDN-based architecture, the probing interval was tuned to maintain an admissible monitoring accuracy (MA) and optimal bandwidth. Despite the accuracy and bandwidth achieved, network optimization factors were not considered. In this work, network optimization is achieved by proposing a mixed integer programming formulation that uses latency and convergence time as optimization factors.

Motivated by the above works, the proposed MI-RLNO (mixed-integer and reinforcement-learned network optimization) method is introduced for SDN monitoring with network optimization. By combining SDN with machine learning techniques, a network-optimized method is introduced in 5G-SDN. The central idea of the method is to divide the network into the data plane, control plane, and application plane.

The data plane handles data transmission, while the control plane manages the input user equipment. With machine learning techniques introduced in the control plane, decision making becomes optimal through global knowledge of network states. The method achieves efficient network management and optimizes network utilization, resulting in lower convergence time and higher throughput.

Our contributions are summarized as follows:

  • To achieve network-wide optimization, the MI-RLNO method is introduced by monitoring the real dataset of sensors.

  • We integrate Mixed Integer Programming and Q Learning to optimize the network in terms of latency, throughput, and convergence time.

  • To verify the feasibility of the MI-RLNO method, communication time and computation time are calculated based on transmission power and bandwidth. Simulations are performed to illustrate convergence time, latency, and throughput.

The remainder of the article is organized as follows. In Sect. 2, related works are reviewed. The MI-RLNO method is described in Sect. 3 with the aid of diagrams and algorithms. In Sect. 4, the simulation setup is provided and a detailed analysis of the results is included. The article is concluded in Sect. 5.

2 Literature review

With rising developments in wireless networks, a large volume of data is generated at a faster rate than servers can handle. A three-parameter Weibull cumulative distribution was presented in [3] through SDN to scrutinize latency in a cyber-physical system. However, with the rapid spectrum shortage caused by the voluminous increase in wireless communication, optimal spectrum utilization was not ensured.

To address the issue related to volume, artificial intelligence-based data analytics was presented in [4] for both feature extraction and dimensionality reduction, thereby contributing to optimal spectrum allocation; however, it failed to consider the spectrum shortage problem. Besides, to ensure secure and optimal resource allocation, a blockchain-empowered AI model was proposed in [5] to minimize cumulative average system utility, but the energy communication cost was high.

In the current era, with the global vehicle count exceeding 1 billion, 5G enabling technologies have been used in recent years to guarantee ubiquitous and reliable communications. A 5G-enabled SDN was proposed in [6] to ensure scalable networking, but the convergence time was not minimized. Yet another software-defined space-air-ground integrated moving-cell design contributing integrity and security was presented in [7]. Despite the improvement in scalability and security, transmission interference was not analyzed. A Euclidean planar graph along with an interference relationship graph was designed in [8], thereby improving the average throughput capacity. However, the end-to-end delay was not reduced.

In 5G communication systems, energy and spectrum resources play a significant role in continuous evolution. In [9], a Software-Defined Energy Harvesting Networking (SD-EHN) was designed to contribute to the energy scheduling process. Besides, a stochastic inventory theory was also designed using Nash bargaining game theory to optimize both energy utilization and energy saving. However, for latency-sensitive businesses like the Internet of Things, it is difficult to converge at an optimal rate and time. To address this issue, an Experience Weighted Attraction (EWA) algorithm was presented in [10], resulting in optimal convergence time; however, the transmission delay was not reduced.

With the overall performance of the network being limited by its lifetime, the Internet of Things is highly amenable to energy harvesting. A software-defined energy harvesting Internet of Things (SEANET) was designed in [11] to enhance communication speed and lessen energy consumption. Though energy harvesting was attained, the power consumption was not optimal. An energy efficiency metric called Ratio for Energy Saving in SDN was proposed in [12] to improve both link saving and traffic proportionality, but supervised and reinforcement machine learning techniques were not considered.

A load-balancing scheme called genetic programming based load balancing (GPLB) was presented in [13], and significant improvements were observed in latency and throughput; however, the overall performance of the network was not improved. A self-adaptive load balancing (SALB) scheme was investigated in [14] that performed load balancing between multiple controllers with a minimum number of packet drops. Though flexibility was offered in SDN, security remains one of the most complicated metrics to achieve. In [15], a recurrent neural network based on a new regularization technique was proposed to enhance network security, but the throughput rate was not increased.

A Q-learning algorithm based on host weight and vulnerability success rate was presented in [16] to reduce complexity and improve security. Though propagation latency was considered, the controller capacity and the load on switches were overlooked.

A Varna-based optimization (VBO) technique was presented in [17] to address issues related to controller placement problems, but it failed to analyze the capacity of large-sized networks. An up-to-date review of security concerns in SDN was presented in [18]; however, it did not discuss DDoS attacks on SDN. Yet another method, called Adaptively Adjusting and Mapping controllers, was investigated in [19] to minimize delay and ensure robustness, but the communication cost for obtaining flow information was not addressed. An integer linear programming model along with a heuristic technique was formulated in [20] to minimize the cost involved in communication between flows, but the computational complexity was not minimized.

To cope with these problems, in this paper we propose MI-RLNO for SDN monitoring, which addresses the above limitations with low latency and convergence time while ensuring high throughput.

3 Materials and methods

From the above review of related literature, it is inferred that although robust models for bringing cognitive structure into networks have been recommended over the past few years, with the shift to 5G wireless communication systems the idea has not yet found its way to large-scale application. With the ever-increasing demand for high data rates and mobility, 5G communication technologies have started revolutionizing current networks.

In this work, a 5G-enabled communication method with dual advantages is presented: mixed integer programming for latency and convergence time optimization, and reinforcement learning for fair resource allocation and throughput improvement.

The integrated framework enhances the overall network by optimizing network-wide objectives such as maximizing network throughput, reducing convergence time, allocating resources fairly, and minimizing latency or hop count. According to these premises, the MI-RLNO method concentrates on programming models as key enabling factors for network optimization. In the remainder of this section, the MI-RLNO method is detailed along with a system model and block diagram.

3.1 System model

Consider an SDN system with a number of switches fixed in a designated area. Assume that switches have diverse traffic transmission demands, with the SDN connected to user equipment (i.e., IoT devices, cloud users, nodes, and so on), and are authorized to select controllers from a distinct set. In this work, the controller placement problem of the network is studied. The number of switches in the SDN system is indicated as ‘\(N\)’ and ‘\(S=\{{S}_{1}, {S}_{2}, \dots ,{S}_{N}\}\)’ is the set of switches, where ‘\({S}_{i}\)’ is the ‘\(i\)th’ switch. To differentiate link status among switches, a duplex identifier ‘\(DID\)’ is provided, where ‘\({DID}_{ij}=1\)’ if ‘\({S}_{i}\)’ is directly linked with ‘\({S}_{j}\)’ in the SDN, and ‘\({DID}_{ij}=0\)’ otherwise.
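For illustration, a minimal Python sketch of this system model is given below; the number of switches and the random link generation are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch of the system model: N switches and a symmetric duplex
# identifier matrix DID, where DID[i][j] = 1 iff switch S_i is directly
# linked with switch S_j (illustrative assumptions throughout).
N = 5                                   # number of switches
rng = np.random.default_rng(seed=42)

upper = np.triu(rng.integers(0, 2, size=(N, N)), k=1)
DID = upper + upper.T                   # symmetric, zero diagonal

def neighbours(i):
    """Switches directly linked with switch S_i according to DID."""
    return [j for j in range(N) if DID[i][j] == 1]

print(neighbours(0))
```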

Let ‘\({C}_{i}\)’ represent the ‘\(i\)th’ controller; it is assumed that ‘\({C}_{i}\)’ is combined with ‘\({S}_{i}\)’, with a maximum of ‘\(M\)’ controllers. Let ‘\(\alpha =\{{\alpha }_{1}, {\alpha }_{2}, \dots ,{\alpha }_{M}\}\)’ represent the controller capacity set, where ‘\({\alpha }_{M}\)’ denotes the ‘\(M\)th’ capacity allocated to one controller. Let us further denote the user equipment (i.e., IoT devices, cloud computing users, mobile devices) as ‘\(UE\)’. In Fig. 1, the proposed 5G-enabled SDN is presented.

Fig. 1 Conceptual diagram of 5G-enabled SDN

As shown in the figure, the data plane forwards flows in the observed SDN through network devices from user equipment ‘\(UE\)’. The user equipment here includes IoT devices, cloud users, mobile devices, and so on. Next, the control plane includes various controllers ‘\(C\)’ that translate the requirements from user equipment ‘\(UE\)’ to the application plane. Figure 2 shows the block diagram of the MI-RLNO method for 5G-enabled SDN optimization.

Fig. 2 Block diagram of mixed-integer and reinforcement learned network optimization

The diagram of MI-RLNO is presented in Fig. 2. The agent is located within the control plane of the SDN. It obtains the key performance indicators (KPIs), i.e., the raw and aggregated spectrum data from a list of sensors, which define the system state ‘\({x}_{T}\)’ at time ‘\(T\)’, and performs a local action ‘\({la}_{T}\)’. The agent then transitions to a new state ‘\({x}_{T+1}\)’ and attains a reward ‘\({r}_{T+1}\)’. The KPIs, in turn, stabilize the control process in the control plane.

The Q-learning (QL) algorithm learns an optimal policy that maps states to control actions. For a continuous state space, we propose MI-RLNO, which combines mixed integer programming with the QL algorithm. In the MI-RLNO method, the controlled system is represented by a communication and computation model. The cooperative learning model uses a global reward that includes the rewards of each learning agent. In MI-RLNO nomenclature, we employ labels ‘\(L\)’ that indicate discrete states and actions ‘\(A\)’ corresponding to MIP rules. Through learning, the agents apply their strategy by feeding a single shared MQ-table. In addition to a fast convergence time, this model improves throughput via learning by the cooperating agents. Figure 3 below shows the MQ-table.

Fig. 3 Sample MQ table
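To make the shared-table idea concrete, a minimal sketch of the cooperative Q-update on a single MQ-table follows; the learning rate, discount factor, and state/action encodings are illustrative assumptions.

```python
from collections import defaultdict

ALPHA, BETA = 0.1, 0.9         # learning rate and discount (assumptions)
mq_table = defaultdict(float)  # single MQ-table shared by all agents

def q_update(label, action, reward, next_label, actions):
    """One cooperative Q-learning update on the shared MQ-table.

    label/next_label -- discrete state labels L
    action           -- an action A corresponding to a MIP rule
    actions          -- the candidate actions available in next_label
    """
    best_next = max(mq_table[(next_label, a)] for a in actions)
    td_error = reward + BETA * best_next - mq_table[(label, action)]
    mq_table[(label, action)] += ALPHA * td_error

# Two agents feeding the same table:
q_update("L1", "A1", reward=1.0, next_label="L2", actions=["A1", "A2"])
q_update("L2", "A2", reward=0.5, next_label="L1", actions=["A1", "A2"])
```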

The components of MI-RLNO are the state, the actions and strategy, and the utility function; they are explained in the following sections. The input state vector to MI-RLNO is obtained from the communication and computation model given below. Let us design a communication model in which the ‘\(UEs\)’ communicate with the controllers ‘\(Cs\)’ using a one-hop network structure.

Owing to interference from controllers using the same channel as the transmitting device, the spectrum efficiency ‘\(SE\)’ for the communication link between the ‘\(j\)th’ controller and the ‘\(i\)th’ user equipment is mathematically formulated as given below.

$$SE_{ij}^{C}=\log_{2}\left(1+\frac{t_{i}\,cg_{ij}}{\sum t_{j}\,cg_{jk}+\delta }\right)$$
(1)

In Eq. (1), ‘\({t}_{i}\)’ represents the transmission power of the ‘\(i\)th’ user equipment, ‘\({cg}_{ij}\)’ and ‘\({cg}_{jk}\)’ represent the channel gains of the links ‘\({UE}_{i}\)–\({C}_{j}\)’ and ‘\({UE}_{j}\)–\({C}_{k}\)’, respectively, and ‘\(\delta\)’ denotes the noise. The spectrum efficiency in Eq. (1) captures the optimized use of the spectrum or bandwidth: the higher it is, the more data can be transmitted with minimal error.
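A direct transcription of Eq. (1) in Python, with illustrative input values (the numbers are assumptions, not measurements from the paper):

```python
import math

def spectrum_efficiency(t_i, cg_ij, interference, delta):
    """Eq. (1): spectrum efficiency of the link between UE i and
    controller j.

    t_i          -- transmission power of UE i
    cg_ij        -- channel gain between UE i and controller j
    interference -- sum of t_j * cg_jk over co-channel links
    delta        -- noise power
    """
    return math.log2(1 + (t_i * cg_ij) / (interference + delta))

print(spectrum_efficiency(t_i=0.2, cg_ij=0.8, interference=0.05, delta=0.01))
```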

Then, the data rate ‘\({DR}_{C,i}^{j}\)’ of ‘\(UE i\)’ distributed by ‘\(C\)’ is mathematically formulated as given below.

$${DR}_{C,i}^{j}=BW\cdot {SE}_{ij}^{C}$$
(2)

In Eq. (2), ‘\(BW\)’ refers to the bandwidth of the available spectrum between ‘\(UE\)’ and ‘\(C\)’, and ‘\({SE}_{ij}^{C}\)’ denotes the spectrum efficiency. For the controller ‘\(C\)’ computing model, the ‘\(i\)th’ UE transfers the computation task ‘\({Task}_{i}\)’ to the controller ‘\(C\)’ through a wireless connection between UEs (i.e., IoT users, mobile users, cloud users) and controllers.

Then, the controllers execute the task for the ‘\(UEs\)’. Hence, the computational model for executing the task includes communication time and computation time. The communication time is measured based on the size of the input data ‘\({D}_{size}\)’ and the data rate ‘\({DR}_{C,i}^{j}\)’ of ‘\(UE\)’ distributed by ‘\(C\)’, and is mathematically formulated as given below.

$${T}_{i,comm}^{C}= \frac{{D}_{size}}{{DR}_{C,i}^{j}}$$
(3)

In Eq. (3), ‘\({T}_{i,comm}^{C}\)’ represents the communication time. Let ‘\({CR}_{C,i}\)’ correspond to the computation resource of ‘\(C\)’ assigned to UE ‘\(i\)’ and ‘\({A}_{i}\)’ to the computation workload of task ‘\({Task}_{i}\)’. Then, the computation time for task ‘\({Task}_{i}\)’ is mathematically formulated as given below.

$${T}_{i,comp}^{C}= \frac{{A}_{i}}{{CR}_{C,i}}$$
(4)

Therefore, the total execution time of the task of UE ‘\(i\)’ distributed by ‘\(C\)’ is mathematically formulated as given below.

$${T}_{i}^{C}={T}_{i,comm}^{C}+{T}_{i,comp}^{C}$$
(5)

From Eq. (5), ‘\({T}_{i}^{C}\)’ denotes the total execution time of each task, where ‘\({T}_{i,comm}^{C}\)’ is the communication time and ‘\({T}_{i,comp}^{C}\)’ is the computation time. Then, the state vector is mathematically formulated as given below.

$${X}_{C}=\left[{SE}_{ij}^{C}, {DR}_{C,i}^{j},{T}_{i}^{C}\right]$$
(6)

From Eq. (6), the input state vector ‘\({X}_{C}\)’ is formed from the spectrum efficiency ‘\({SE}_{ij}^{C}\)’, the data rate ‘\({DR}_{C,i}^{j}\)’, and the total execution time ‘\({T}_{i}^{C}\)’ of each task for the corresponding UE. With the obtained state value, the action is then deduced; the action reduces the number of hops and the convergence time allocated by a controller to the user equipment ‘\(UE\)’.
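Putting Eqs. (2)–(6) together, a small helper that builds the input state vector ‘\({X}_{C}\)’ might look as follows; the argument names and the example values are our assumptions.

```python
def state_vector(se, bw, d_size, a_i, cr):
    """Build the input state X_C = [SE, DR, T] from Eqs. (2)-(6).

    se     -- spectrum efficiency SE_ij^C from Eq. (1)
    bw     -- available bandwidth between UE and C
    d_size -- input data size D_size of the task (bits)
    a_i    -- computation workload A_i of Task_i
    cr     -- computation resource CR_{C,i} assigned to UE i
    """
    dr = bw * se                 # Eq. (2): data rate
    t_comm = d_size / dr         # Eq. (3): communication time
    t_comp = a_i / cr            # Eq. (4): computation time
    t_total = t_comm + t_comp    # Eq. (5): total execution time
    return [se, dr, t_total]     # Eq. (6): state vector X_C

print(state_vector(se=3.2, bw=10e6, d_size=160 * 8, a_i=1e6, cr=2e9))
```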

The optimization problem (i.e., latency optimization and convergence time optimization) is formulated with mixed integer programming as given below. Our first action is to decrease the latency and the second is to decrease the convergence time. The MIP model that minimizes the latency to the controller is mathematically formulated as given below.

$$\begin{aligned} & \text{Minimize } Lat \\ & \text{subject to } {f}_{ij}^{k}=0 \;\text{ if }\; {Dis}_{ij}>{P}_{max} \end{aligned}$$
(7)
$$\sum {f}_{ij}^{k}-\sum {f}_{ji}^{k}= {Dev}_{i}\left[{D}_{out}+{D}_{in}\right]$$
(8)
$${f}_{ij}^{k}=Lat$$
(9)

Equation (7) ensures that if the distance between user equipment ‘\(i\)’ and user equipment ‘\(j\)’ exceeds the maximum transmission range ‘\({P}_{max}\)’, no flow is routed between them; only user equipment within range communicate with each other. Equation (8) is the user equipment flow ‘\(f\)’ conservation constraint and states that, for each device ‘\({Dev}_{i}\)’, the net data flow equals the incoming flow ‘\({D}_{in}\)’ plus the outgoing flow ‘\({D}_{out}\)’. Finally, Eq. (9) is utilized to measure the total latency in the SDN. The second model is used to minimize the convergence time (CT). The MIP formulation for convergence time is given as follows.

$$\begin{aligned} & \text{Minimize } CT \\ & \text{subject to } \sum {f}_{ij}^{k}-\sum {f}_{ji}^{k}= {Dev}_{i}\, n\left[{D}_{out}+{D}_{in}\right] \end{aligned}$$
(10)

Equation (10) is the user equipment flow conservation constraint. Each device generates a data packet ‘\(D\)’ at each round, and during the convergence of learning it generates ‘\(n\)’ such data packets. The strategy ‘\({\pi }_{C}\)’ of a controller ‘\(C\)’ is a mapping from the controller state ‘\({X}_{C}\)’ to the set of possible actions ‘\({A}_{C}\)’, mathematically formulated as given below.

$${\pi }_{C}: {X}_{C}\to {A}_{C}$$
(11)
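As a concrete illustration of the MIP component in Eqs. (7)–(10), the following hedged PuLP sketch solves a toy three-device instance; the distances, demands, and the distance-weighted latency proxy are all our own assumptions, not the paper's formulation.

```python
from itertools import product
import pulp

# Toy instance: 3 devices, pairwise distances, and net flow demands.
devices = [0, 1, 2]
dist = {(0, 1): 5, (1, 0): 5, (0, 2): 12, (2, 0): 12, (1, 2): 6, (2, 1): 6}
P_MAX = 10                     # maximum transmission range (Eq. 7)
demand = {0: 1, 1: 0, 2: -1}   # net flow per device (Eq. 8, simplified)

prob = pulp.LpProblem("latency_min", pulp.LpMinimize)
f = pulp.LpVariable.dicts(
    "f", [(i, j) for i, j in product(devices, devices) if i != j], lowBound=0
)

# Objective: distance-weighted routed flow as a latency proxy (Eq. 9).
prob += pulp.lpSum(dist[i, j] * f[i, j] for (i, j) in f)

for (i, j) in f:
    if dist[i, j] > P_MAX:     # Eq. (7): out-of-range pairs carry no flow
        prob += f[i, j] == 0

for i in devices:              # Eq. (8): flow conservation per device
    prob += (
        pulp.lpSum(f[i, j] for j in devices if j != i)
        - pulp.lpSum(f[j, i] for j in devices if j != i)
    ) == demand[i]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({k: v.value() for k, v in f.items() if v.value()})  # 0 -> 1 -> 2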

With the above-stated optimization problems for latency and convergence time, the utility is finally derived. The controller of the user equipment optimizes a utility function described through the sum of discounted rewards. The optimization problem is expressed as below.

$$\underset{{\pi }_{C}\in {\pi }_{P}}{\mathrm{max}}: {R}_{C}=\left[\sum_{t=1}^{n}{\beta }^{t} {r}_{C}\left({X}_{C,t}{,A}_{C,t}\right)\right]$$
(12)

In Eq. (12), ‘\({\pi }_{P}\)’ is the set of allowable policies for controller ‘\(C\)’, ‘\({r}_{C}\left({X}_{C,t}{,A}_{C,t}\right)\)’ is the instantaneous reward seen by controller ‘\(C\)’ in state ‘\({X}_{C,t}\)’ when taking action ‘\({A}_{C,t}\)’ at time ‘\(t\)’, and ‘\(\beta\)’ refers to a cut-off (discount) factor varying in ‘\(\left[\mathrm{0,1}\right]\)’. When the cut-off factor ‘\(\beta\)’ is smaller, the controller gives less significance to future rewards relative to immediate ones.
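A one-function worked example of the discounted sum in Eq. (12); the reward sequence is illustrative.

```python
def discounted_return(rewards, beta):
    """Eq. (12): sum of discounted instantaneous rewards r_C(X, A);
    a smaller cut-off factor beta down-weights future rewards."""
    return sum(beta ** t * r for t, r in enumerate(rewards, start=1))

# Illustrative reward sequence over three time steps:
print(discounted_return([1.0, 0.5, 0.25], beta=0.9))  # 0.9 + 0.405 + 0.18225
```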

Finally, with throughput optimization, the solution to the maximization problem in Eq. (12) utilizes the action-value function under the policy ‘\(\pi\)’. It is measured as the sum of discounted rewards when starting from state ‘\({X}_{0}=X\)’ at time ‘\({T}_{0}\)’ and is formulated as below.

$${Q}^{\pi }\left(X,A\right)=\left[\sum_{t=0}^{n}{\beta }^{t} r \left({X}_{t},{A}_{t}\right)\Big| {X}_{0}=X, {A}_{0}=A\right]$$
(13)

The pseudo-code representation of mixed integer reinforcement learning is given below.

Algorithm 1 Mixed integer reinforcement learning

As described by the mixed integer reinforcement learning algorithm, the objective of the work is network-wide optimization in terms of fairness, latency, and convergence time. In this work, a mixed integer programming formulation is integrated with reinforcement Q-learning to attain network optimization. First, the input state vector to MI-RLNO is obtained from the communication and computation model, which is designed along with the data rate for SDN to address the QoS requirement. Next, mixed integer programming is used to select the actions and strategies for reducing the latency and convergence time. The utility function is then described by the sum of discounted rewards. Finally, the SDN, with its action-value function, achieves better network utilization via network-wide optimization.
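A hedged Python sketch of one MI-RLNO iteration, reconstructed from the description above (the environment interface, exploration rate, and hyper-parameters are our assumptions, not the original algorithm):

```python
import random

def mi_rlno_step(env, mq_table, mip_actions, alpha=0.1, beta=0.9, eps=0.1):
    """One sketched MI-RLNO iteration (reconstruction, not the original).

    env         -- exposes state() and apply(action) -> (reward, next_state)
    mq_table    -- shared Q-table mapping (state, action) -> value,
                   e.g. collections.defaultdict(float)
    mip_actions -- candidate actions pre-filtered by the MIP constraints
                   of Eqs. (7)-(10)
    """
    state = env.state()                      # X_C built via Eqs. (1)-(6)
    if random.random() < eps:                # epsilon-greedy exploration
        action = random.choice(mip_actions)
    else:                                    # greedy over MIP-feasible actions
        action = max(mip_actions, key=lambda a: mq_table[(state, a)])
    reward, next_state = env.apply(action)   # latency/CT-driven reward
    best_next = max(mq_table[(next_state, a)] for a in mip_actions)
    mq_table[(state, action)] += alpha * (
        reward + beta * best_next - mq_table[(state, action)]
    )
    return next_state
```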

4 Results and validation

In this section, the results of the MI-RLNO method are compared with cross-layer control based on SDN and SDR [1] and IPro [2] for SDN monitoring. Experiments are conducted with data retrieved for a sensor within a specified time window, extracted from https://electrosense.org/api-spec. The web API permits retrieval of raw and aggregated spectrum data for a list of different sensors. Performance is measured in terms of latency, convergence time, and throughput rate.
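For reference, a hedged sketch of how such a retrieval might look in Python; the endpoint path, parameter names, and authentication are assumptions on our part, and the authoritative interface is the API spec at the URL above.

```python
import requests

BASE = "https://electrosense.org/api"   # hypothetical base path

def get_aggregated_spectrum(session, sensor_id, t_begin, t_end):
    """Fetch aggregated spectrum data for one sensor and time window.
    Endpoint and parameter names are assumptions; see the API spec."""
    resp = session.get(
        f"{BASE}/spectrum/aggregated",   # hypothetical endpoint
        params={"sensor": sensor_id, "timeBegin": t_begin, "timeEnd": t_end},
    )
    resp.raise_for_status()
    return resp.json()

# Usage (authentication, if required, would be configured on the session):
# with requests.Session() as s:
#     data = get_aggregated_spectrum(s, sensor_id=1234, t_begin=0, t_end=3600)
```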

Experiments are conducted in Python for network-wide optimization, and a comparative analysis is made between the proposed MI-RLNO for SDN monitoring and the existing cross-layer control based on SDN and SDR [1] and IPro [2]. In the simulation, the SDN scenarios include several SDN switches arbitrarily positioned in a 2000 km × 2000 km square region; in other words, the switch positions follow a uniform random distribution over the simulation region. Besides, it is assumed that a random connection between any two switches exists in the region. The link transmission rate is selected randomly from a set of values. The three link characteristics are provided in Table 1.

Table 1 Link transmission rate

For the simulation, the data packet size ‘\({D}_{in}\)’ is set to 160 bytes, and the arrival rate of packet message requests from user equipment is denoted ‘\(\lambda\)’. To observe the effect of varying ‘\(\lambda\)’, three message request scenarios from user equipment are simulated. Table 2 lists the simulation parameters. Simulation results are averaged over 500 independent runs (Figs. 4, 5, 6, 7, 8).

Table 2 Data packet request
Fig. 4 Construction of ElectroSense data API

Fig. 5 Apply control plane

Fig. 6 Output of control plane

Fig. 7 Apply optimization

Fig. 8 Output of optimization

4.1 Scenario 1: convergence time

Convergence time measures how fast a group of user equipment (issuing data as message requests) and the controllers reach a state of convergence. It is a significant performance indicator for network-wide optimization, where the protocol should converge rapidly and reliably. It is measured as given below.

$$CT= \sum_{i=1}^{n}{C}_{i}*Time \left[{Dev}_{i} n\left[{D}_{out}+{D}_{in}\right]\right]$$
(14)

From Eq. (14), the convergence time ‘\(CT\)’ is calculated from the number of controllers ‘\({C}_{i}\)’ and the time consumed for ‘\(n\)’ packets, and is measured in milliseconds (ms). Table 3 below presents the convergence time results for the MI-RLNO method, cross-layer control based on SDN and SDR [1], and IPro [2].

Table 3 Convergence time results
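A minimal helper corresponding to Eq. (14), with illustrative per-controller timings:

```python
def convergence_time(per_controller_ms):
    """Eq. (14): convergence time CT as the sum over controllers C_i of
    the time consumed to exchange their n packets (D_out + D_in), in ms."""
    return sum(per_controller_ms)

# Illustrative: two controllers taking 1.2 ms and 1.5 ms each.
print(convergence_time([1.2, 1.5]))  # 2.7 ms
```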

Figure 9 shows the convergence time for up to 20 controllers under the three methods. From the figure, it is inferred that for case 1 with 2 controllers, the convergence time decreases; however, as the number of controllers in the network grows toward 20, the convergence time starts to increase. The reason is that adding controllers increases the inter-controller delay and therefore the convergence time. Nevertheless, the proposed method outperforms the existing methods [1, 2], because the algorithms in [1, 2] mainly enhance the monitoring accuracy between controllers and switches while ignoring the convergence time between them. The simulation results also show that 2 controllers offer the best convergence time, while 20 controllers offer the worst. This is because the service request load on switches with 2 controllers is low compared to higher numbers of controllers.

Fig. 9 Convergence time

  • As a result, the proposed MI-RLNO method reduces the convergence time by 27% as compared to [1] and 22% as compared to [2].

4.2 Scenario 2: throughput

Throughput refers to the percentage ratio of data packets successfully received by the controller to the total data packets sent from the user equipment ‘\(UE\)’. The throughput is measured as given below.

$$Tput=\frac{{D}_{C}}{{D}_{UE}}*100$$
(15)

From Eq. (15), throughput ‘\(Tput\)’ is measured from the packets received by the controller ‘\({D}_{C}\)’ and the packets sent by the user equipment ‘\({D}_{UE}\)’, expressed as a percentage (%). Table 4 below presents the throughput results for the MI-RLNO method, cross-layer control based on SDN and SDR [1], and IPro [2].

Table 4 Throughput results
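Eq. (15) as code, reproducing the 15-packet reading reported below for the MI-RLNO method (the 13-of-15 split is our back-calculated assumption):

```python
def throughput_pct(received_by_controller, sent_by_ue):
    """Eq. (15): percentage of data packets successfully received."""
    return received_by_controller / sent_by_ue * 100

# 13 of 15 packets received (back-calculated assumption) ~= 86.66%:
print(round(throughput_pct(13, 15), 2))
```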

Figure 10 illustrates the relationship between the throughput rate and the number of data packets, ranging from 15 to 150, sent by different user equipment to the controller at different time intervals. Ten simulation runs were conducted. With the data packet requests of scenario 2 as the simulation input, an increase in the number of data packets initially decreases the throughput, which starts increasing again from the fifth simulation run. For simulations with 15 data packets, the throughput was observed to be 86.66% using the MI-RLNO method, 80% using [1], and 73.33% using [2]. From this simulation, it is inferred that fairness during resource allocation is improved by applying the MI-RLNO method, thereby increasing the overall system throughput. This is because mixed integer programming supplies the input state to the mixed integer reinforcement learning algorithm, which in turn, based on the communication and computation model, performs message request allocation for the corresponding user equipment.

Fig. 10 Throughput

  • Thus, the throughput of the MI-RLNO method is improved by 7% when compared to [1] and 17% compared to [2].

4.3 Scenario 3: latency

Latency, in terms of network optimization, refers to the time taken for a request (i.e., a data packet request) to travel from the sender (i.e., user equipment) to the controller and for the controller to process that request. In other words, it is the round-trip time from the user equipment to the controller in the SDN.

$$L=\sum_{i=1}^{n}{C}_{i}*Time \left({D}_{UE}\to {D}_{C}\to {D}_{UE}\right)$$
(16)

From Eq. (16), latency ‘\(L\)’ is measured from the round-trip time ‘\(\left({D}_{UE}\to {D}_{C}\to {D}_{UE}\right)\)’ and the number of controllers ‘\({C}_{i}\)’, and is expressed in milliseconds (ms). Table 5 below presents the latency results for the MI-RLNO method, cross-layer control based on SDN and SDR [1], and IPro [2].

Table 5 Latency results
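And Eq. (16) as code, with illustrative round-trip contributions chosen to match the 2-controller reading discussed below:

```python
def latency_ms(round_trip_ms):
    """Eq. (16): latency L as the controller-wise sum of round-trip
    times UE -> C -> UE, in milliseconds."""
    return sum(round_trip_ms)

# Illustrative: two controllers with RTT contributions of 0.12 and 0.13 ms.
print(latency_ms([0.12, 0.13]))  # 0.25 ms
```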

Figure 11 demonstrates the relationship between the number of controllers and the latency. The latency is analyzed on the dataset for the proposed method and the two existing methods using the same input; the proposed work achieves lower latency than the existing methods [1, 2]. Latency in our work is measured based on the round-trip time. From the figure, it is inferred that as the number of controllers increases, the time consumed between sender and receiver increases and therefore the latency also increases. With the simulations conducted for 2 controllers, the latency was observed to be 0.25 ms using the MI-RLNO method, 0.33 ms using [1], and 0.41 ms using [2]. The latency improvement of the proposed method over [1, 2] is due to the mixed integer reinforcement learning algorithm. By applying this algorithm, first, the communication and computation model, together with the data rate, is used as input, improving bandwidth utilization. Next, utilizing MIP, actions and strategies are formed based on the optimization factors. Finally, with the discounted rewards, the allocation is made by the controller.

Fig. 11 Latency

  • Therefore, the latency of the MI-RLNO method is reduced by 12% as compared to [1] and 26% as compared to [2].

5 Conclusions

The MI-RLNO method is proposed to improve the throughput of SDN and to optimize system utility (i.e., latency and convergence time). The controller placement model of SDN is considered in this algorithm, and diverse traffic transmission demands are scrutinized based on the transmission process. The algorithm uses Q-learning to select the user equipment to be allocated in the network for the required request, based on the communication and computation times. Next, mixed integer programming is used to select the action and strategy according to the optimization factors. Finally, an optimal strategy is learned via the sum of discounted rewards to maximize the utility benefits. The simulation results show that the MI-RLNO method provides enhanced performance over the compared methods.