
1 Introduction

SDN aims to separate the control plane and data plane to make network management easier, cut operating costs, and encourage innovation and development [1]. It is a networking design that allows for a dynamic, efficient network configuration that boosts overall network capability while also making networks more agile and adaptable [2]. It enables software applications to govern the network centrally, which allows operators to control the whole network. A controller makes centralized decisions and adaptively configures the packet-switching nodes [3]. The SDN architecture is shown in Fig. 1.

Fig. 1 SDN architecture [4]: the application layer (business applications, cloud orchestration, SDN applications) communicates with the control layer (SDN controller)

1.1 Hybrid SDN

A hybrid SDN is a networking strategy that combines traditional networking with the software-defined networking approach within the same environment [5]. In a hybrid SDN, engineers can operate SDN technologies and conventional switching protocols on the same physical hardware [3]. While traditional distributed networking protocols still steer the bulk of the traffic on the network, a network manager can configure the SDN control plane to discover and govern certain traffic flows.

RL-Routing is a reinforcement learning routing technique that addresses a traffic engineering (TE) problem in an SDN in terms of throughput and delay. Instead of constructing an exact mathematical model, RL-Routing tackles the TE problem through experience. It uses a one-to-many network configuration for routing choices and considers extensive network information for state representation. The reward function, which is based on network throughput and delay, can be tuned to optimize either upstream or downstream throughput. After suitable training, the agent develops a policy that anticipates the future behavior of the underlying network and offers improved routing paths between switches.

1.2 Traffic Engineering

TE is a network management technique that enhances network performance by optimizing traffic routing and by anticipating and controlling the behavior of transmitted data [1]. Using data flows to determine link-state paths within a network balances the load on multiple connections, routers, and switches [6]. This is particularly important in networks with several parallel paths.

1.3 Routing

Routing is the method of choosing the most convenient path through a network so that the router forwards packets to the destination host or hosts [6]. It is the process of creating a traffic path within a network, as well as across multiple networks. The neural network's purpose is to reduce the network time delay while optimizing the packet paths being considered. Finding the shortest path is regarded as the most important issue in any routing method that must run in real time.

1.4 Reinforcement Learning

RL is an ML training strategy that rewards desirable behaviors while penalizing undesirable ones [7]. A reinforcement learning agent perceives and comprehends its environment, acts, and typically learns through trial and error. It is all about determining how to behave optimally in a given scenario to maximize reward. In the reinforcement learning problem, an agent explores an unknown environment to achieve a goal [8]. RL is built on the idea that maximizing the expected cumulative reward can represent any goal. To maximize reward, the agent must learn to sense and perturb the state of the environment through its actions.

Both deep learning and reinforcement learning are self-learning systems. Deep learning involves learning from a training set and then applying that knowledge to fresh data, whereas reinforcement learning involves learning dynamically by adjusting actions based on continuous feedback to maximize a reward. Reinforcement learning and deep learning are not mutually exclusive.
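To make the trial-and-error idea concrete, the following is a minimal sketch of a tabular Q-learning loop on a toy environment. The environment, its dimensions, and the hyperparameters are illustrative assumptions and do not come from any of the surveyed papers.

```python
# Minimal sketch of the RL loop described above: an agent acts by trial and
# error and updates a value estimate from scalar rewards. The toy environment
# (n_states, n_actions, step) is an illustrative assumption.
import random

n_states, n_actions = 4, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.2          # learning rate, discount, exploration
Q = [[0.0] * n_actions for _ in range(n_states)]

def step(state, action):
    """Toy environment: reward 1 for reaching the last state, else 0."""
    next_state = min(state + action, n_states - 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

state = 0
for _ in range(1000):
    # epsilon-greedy: explore occasionally, otherwise exploit the current estimate
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        action = max(range(n_actions), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # Q-learning update toward reward plus discounted best future value
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = 0 if next_state == n_states - 1 else next_state
```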

1.5 Deep Reinforcement Learning

Deep reinforcement learning is a type of ML that permits intelligent agents to learn from their actions in much the same way that people do [9]. The fact that an agent is rewarded or penalized based on its actions is inherent in this type of machine learning [10]. Deep RL is a solution that incorporates deep learning, allowing agents to make decisions based on unstructured input data without manually constructing the state space.

This paper’s primary contributions are summarized as follows:

  • The article shows different approaches to improve network performance using TE strategies with reinforcement learning in SDN and hybrid SDN.

  • The paper presents various routing methods in software-defined networks that improve network efficiency.

  • The paper shows a comparison of various traffic engineering and routing algorithms in SDN and hybrid SDN.

The remainder of this paper is organized as follows: Sect. 2 presents the various TE methods in SDN and hybrid SDN using deep RL. Section 3 describes the different routing approaches in SDN, and Sect. 4 presents the comparison of various traffic engineering and routing algorithms in SDN and hybrid SDN.

2 TE in SDN and Hybrid SDN Using Reinforcement Learning

2.1 TE in Hybrid SDN

Guo et al. [6] proposed a novel method for TE in hybrid SDN. With the rise of software-defined networks (SDNs), network routing is becoming more centralized and flexible, and traffic engineering in hybrid SDNs is attracting wide interest from academia and industry. The authors present an RL-based traffic splitting method that learns to balance dynamically changing traffic through a traffic splitting agent in the hybrid SDN. To generate a routing strategy for new traffic demands quickly and intelligently, a traffic splitting agent is trained offline with the RL algorithm to build a direct mapping between traffic demands and traffic splitting rules. The learned traffic splitting policies are used to set the traffic splitting ratios on SDN switches, which can be updated quickly and scaled rapidly once the traffic splitting agent has been trained [11]. It is recommended to create a suitable simulation environment to avoid routing loops, and the traffic splitting rules are used to satisfy the interaction requirements.

The RL approach for tackling the TE problem in hybrid SDN consists of two stages: offline learning and online routing. In the offline learning stage, a directed acyclic graph (DAG) [12] is constructed from the hybrid SDN architecture and traffic statistics to provide a loop-free routing environment for the RL agent's interaction. A traffic-partitioning agent is then trained with the RL method on the constructed DAG to build a direct mapping between the network environment and the routing policies. When the traffic demand changes, the trained traffic splitting agent can quickly establish an acceptable routing scheme in the online routing stage.
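The following is a minimal sketch of the offline-stage idea: derive a DAG from the topology so that any routing decision the agent explores is loop-free. Orienting each undirected link from the lower-ranked to the higher-ranked node is an illustrative assumption, not the paper's construction; networkx is assumed available.

```python
# Build a loop-free (acyclic) routing graph from an undirected topology.
import networkx as nx

undirected = nx.Graph()
undirected.add_edges_from([("r1", "r2"), ("r2", "r3"), ("r1", "r3"), ("r3", "s1")])

# Orient every link according to a total order on the nodes (assumption).
rank = {node: i for i, node in enumerate(sorted(undirected.nodes()))}
dag = nx.DiGraph()
dag.add_edges_from((u, v) if rank[u] < rank[v] else (v, u) for u, v in undirected.edges())

assert nx.is_directed_acyclic_graph(dag)   # routing over this graph cannot loop
print(list(dag.edges()))
```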

The TE problem's network model is a hybrid SDN with dynamic traffic. The network topology is represented as an undirected graph H = (X, F), where X is the set of forwarding devices, made up of the SDN switches \(X_s\) and the routers \(X_l\), and F is the link set, with C(f) denoting the capacity of link \(f \in F\).

The set of traffic matrices (TMs) is denoted by \(B = \{B_1, B_2, \ldots, B_n\}\), and \(r_i\) is the weight coefficient of TM \(B_i\). The element \(B_i(p, q)\) of \(B_i\) denotes the traffic demand from device \(p\) to device \(q\), with \(p, q \in X\). \(W\) is the link weight configuration under the OSPF protocol, and the variable \(e\) denotes the partitioning of the traffic flows.

Given a collection of TMs \(B\), the purpose of TE in the hybrid SDN is to increase network performance by minimizing the maximum link utilization (MLU) [13] for every single TM \(B_i\). In the hybrid SDN, both the link weight settings \(w\) and the traffic partitioning \(e\) on each SDN switch determine the resulting MLU. The TE problem on the hybrid SDN is defined as

$$ \begin{aligned} & {\text{minimize}}\quad \sum_{i = 1}^n r_i U_i^{\max } \\ & {\text{subject to}}\quad \sum_{i = 1}^n r_i = 1,\quad 0 \le r_i \le 1 \\ & \phantom{{\text{subject to}}\quad} w\left( f \right) \in {\mathbb{N}},\quad \forall f \in F \\ & \phantom{{\text{subject to}}\quad} 0 \le U_i^{\max } \le 1 \\ \end{aligned} $$
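To illustrate the quantity being minimized, the following sketch computes per-link utilization and the resulting MLU for one traffic matrix under a given traffic split. The topology, capacities, demands, and split ratios are all illustrative assumptions.

```python
# Compute the maximum link utilization (MLU) that the objective above minimizes.
links = {("a", "b"): 10.0, ("b", "c"): 10.0, ("a", "c"): 5.0}   # capacities C(f)
traffic_matrix = {("a", "c"): 6.0, ("a", "b"): 2.0}             # demands B_i(p, q)
# Assumed routing: the a->c demand is split 50/50 between two paths.
routing = {
    ("a", "c"): [(0.5, [("a", "b"), ("b", "c")]), (0.5, [("a", "c")])],
    ("a", "b"): [(1.0, [("a", "b")])],
}

load = {f: 0.0 for f in links}
for (p, q), demand in traffic_matrix.items():
    for ratio, path in routing[(p, q)]:
        for link in path:
            load[link] += ratio * demand            # accumulate carried traffic

utilization = {f: load[f] / links[f] for f in links}
mlu = max(utilization.values())
print(f"per-link utilization: {utilization}")
print(f"MLU (U_i^max): {mlu:.2f}")
```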

For a node \(q \in X\) and TM \(B_i \in B\), we calculate node \(q\)'s MLU as follows:

$$ {\text{MLU}}_{B_i} \left( q \right) = \max U_i \left( e \right) $$

The score of node \(q\) across the different TMs is then calculated from \({\text{MLU}}_{B_i} \left( q \right)\) by computing

$$ {\text{score}}\left( q \right) = \sum_{B_i \in B} {\text{MLU}}_{B_i} \left( q \right) $$

The TE goal in the hybrid SDN is to reduce the MLU as much as possible. As a result, the MLU is incorporated in the reward \(r_t\), defined below, to enable the traffic-partitioning agent to effectively learn optimal rules with small MLU:

$$ \begin{aligned} r_t & = \left\{ {\begin{array}{*{20}l} { - e^{2\left( {\frac{1}{a} - 1} \right)} ,} & {a < 1} \\ {0,} & {a = 1} \\ {e^{2\left( {a - 1} \right)} ,} & {a > 1} \\ \end{array} } \right. \\ a & = U_i^{\max } /U_t^{\max } \\ \end{aligned} $$

The index \(a\) represents the level of performance improvement after the traffic-partitioning rules are applied at time \(t\).
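The sketch below implements the piecewise reward above, assuming the three branches correspond to degradation (\(a < 1\)), no change (\(a = 1\)), and improvement (\(a > 1\)), where \(a\) compares the MLU before and after the new traffic-partitioning rules; this branch assignment is an inference from the surrounding text.

```python
# Hedged sketch of the reward r_t: positive when the MLU improves, negative
# when it degrades, zero when unchanged. Branch conditions are assumptions.
import math

def reward(u_before: float, u_after: float) -> float:
    a = u_before / u_after            # a = U_i_max / U_t_max
    if a > 1.0:                       # MLU decreased: positive reward
        return math.exp(2.0 * (a - 1.0))
    if a < 1.0:                       # MLU increased: negative reward
        return -math.exp(2.0 * (1.0 / a - 1.0))
    return 0.0                        # unchanged

print(reward(0.8, 0.6))   # improvement -> positive
print(reward(0.6, 0.8))   # degradation -> negative
```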

The RL approach is compared to other methods such as open shortest path first (OSPF) and WA-SRTE. The RL technique outperforms OSPF in terms of MLU reduction and comes close to the WA-SRTE method [14]. The RL approach demonstrates the ability to map network status to traffic-partitioning rules, assisting in balancing link utilization.

2.2 CFR-RL: TE in SDN Using Critical Flow Rerouting

Zhang et al. [5] proposed a novel method for the TE problem using RL. By rerouting as many flows as possible, traffic engineering approaches are capable of achieving the best performance [13]. One TE method for mitigating the impact of a network outage is to balance link usage by re-forwarding the bulk of the affected traffic matrices. Two examples are ECMP [15] and rerouting a portion of the critical flows in software. Critical flow rerouting with reinforcement learning (CFR-RL) is an RL technique that automatically learns the rules for choosing the required flows for every TM. CFR-RL then formulates and solves a simple linear programming (LP) problem to reroute these critical flows in order to balance network link usage.

CFR-RL is an RL-based technique for balancing network link utilization by learning a critical-flow selection strategy and rerouting the relevant critical flows. CFR-RL uses the REINFORCE [16] technique, with minor tweaks, to train its neural network. The customized RL formulation specifies the state space, action space, and reward of the critical-flow selection policy. The state input is the traffic matrix, the RL problem requires a huge action space when the network has many nodes, and the LP objective value serves as the reward. ECMP routing is used to distribute the remaining traffic.

State: The agent is given state \(c_t = T_t\), where \(T_t\) is the TM at time \(t\), which provides information about each flow's traffic demand.

Action Space: Action spaces are of two types: discrete and continuous. With a discrete action space, the agent chooses which distinct action to take from a finite action set; in a continuous action space, actions are expressed as a real-valued vector. CFR-RL chooses \(P\) critical flows for each state \(c_t\). Given that a network with \(K\) nodes has a total of \(K*\left( {K - 1} \right)\) flows, the RL problem would otherwise require a huge action space of size \(C_{K*\left( {K - 1} \right)}^P\). Instead, the agent is permitted to take \(P\) distinct actions at every time step \(t\) by setting the action space to \(\left( {0,1, \ldots ,K*\left( {K - 1} \right) - 1} \right)\) [17].
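The following is a minimal sketch of this discrete action space: each action index maps back to one source-destination flow pair, and \(P\) distinct indices are sampled per step. \(K\), \(P\), and the uniform sampling are illustrative assumptions; CFR-RL samples from its policy network's output distribution instead.

```python
# Map discrete action indices to flow pairs and sample P critical flows.
import random

K, P = 5, 3
nodes = list(range(K))
flows = [(s, d) for s in nodes for d in nodes if s != d]   # K*(K-1) flows

def index_to_flow(action_index: int):
    """Map an action index in [0, K*(K-1)) to a (src, dst) flow pair."""
    return flows[action_index]

# Sample P distinct actions for one state (uniformly here, for illustration).
critical_indices = random.sample(range(len(flows)), P)
critical_flows = [index_to_flow(i) for i in critical_indices]
print(critical_flows)
```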

Reward: After sampling \(P\) distinct critical flows for state \(c_t\), CFR-RL reroutes these key flows and obtains the maximum link utilization \(U\) by solving the rerouting optimization problem. The reward is set to \(1/U\), which reflects the network performance after the critical traffic is rerouted to balance link usage.

The critical-flow rerouting problem is described as follows. Given a network \(H \left( {X,F} \right)\) with the set of traffic demands \(B_{s,d}\) for the set of critical flows \(f_k\) and the link load \(L_{i,j}\) contributed by the remaining flows using the default routing, the goal is to obtain the optimal routing ratios \(\sigma_{i,j}^{s,d}\) for each critical flow so that the MLU \(U\) is minimized. We formulate the rerouting problem as an optimization that finds all viable under-utilized paths for the selected critical flows

$$ {\bf{minimize}}\;U + \sum \limits_{\left( {i,j} \right) \in F} \sum \limits_{\left( {s,d} \right) \in f_k } \sigma_{i,j}^{s,d} $$

The optimal routing result for the chosen critical flows is derived by solving the aforementioned LP problem with an LP solver. The SDN controller then installs and updates flow entries at the switches in the appropriate order.
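As a hedged illustration, the toy LP below reroutes a single critical flow over two candidate paths and minimizes the MLU \(U\) plus a small penalty on the routing ratios, mirroring the objective above. The topology, background loads, capacities, and the use of scipy's `linprog` are all assumptions; the actual CFR-RL formulation is a multi-commodity LP.

```python
# Toy rerouting LP: variables are (x1, x2, U) = split ratios and MLU.
from scipy.optimize import linprog

demand = 2.0
capacity = {"A": 10.0, "B": 10.0, "C": 5.0}     # path 1 uses links A,B; path 2 uses C
background = {"A": 6.0, "B": 4.0, "C": 3.0}     # load L_{i,j} from non-critical flows

c = [1e-3, 1e-3, 1.0]                            # minimize U + tiny penalty on ratios

# Capacity constraints: background + demand * x_path <= U * capacity,
# rewritten as demand * x_path - capacity * U <= -background.
A_ub = [
    [demand, 0.0, -capacity["A"]],
    [demand, 0.0, -capacity["B"]],
    [0.0, demand, -capacity["C"]],
]
b_ub = [-background["A"], -background["B"], -background["C"]]
A_eq = [[1.0, 1.0, 0.0]]                         # split ratios sum to 1
b_eq = [1.0]
bounds = [(0, 1), (0, 1), (0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
x1, x2, U = res.x
print(f"ratios: path1={x1:.2f}, path2={x2:.2f}, MLU={U:.2f}")
```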

The CFR-RL scheme is presented with the goal of lowering the maximum link utilization in a network while minimizing the network disturbance that causes service disruption. By rerouting just 11–20.3% of the total traffic, CFR-RL delivers near-optimal performance, achieving optimal load balancing performance more than 95% of the time.

2.3 ScaleDRL Scheme for TE in SDN Using Pinning Control

Sun et al. [10] proposed a method for TE in SDN. Deep reinforcement learning and software-defined networking make it possible to develop a model-free TE system. Existing DRL-based TE solutions, on the other hand, all have a scalability issue that prevents them from being used in large networks. A method that combines control theory and DRL technology is proposed to develop an efficient network management approach for TE [18]. The proposed approach, ScaleDRL, employs a concept from pinning control theory to identify and label a subset of network links as critical. Based on the traffic distribution information received by the SDN controller, the DRL algorithm is used to dynamically adjust the weights of these critical links. The forwarding paths of network flows are then dynamically modified using a weighted shortest path algorithm.

ScaleDRL presents a mechanism for evaluating the importance of network links in routing path construction. A control theory-based link selection algorithm is developed on top of this assessment approach [19]. ScaleDRL adapts the DRL method to process communication network traffic distribution statistics and dynamically construct TE policies. ScaleDRL is validated using OMNeT++ in a fine-grained simulation with several network topologies of various sizes.

The link weights are the target of the DRL algorithm in ScaleDRL. As a result, in order to pick a fraction of the data plane links with pinning control, the concept of link centrality (equilibrium) is introduced to define a link's relevance in the network's overall routing patterns. Link centrality, in particular, captures the correlations that exist between links as a result of the routing paths.

For a network \(H\), let \(R\) represent the set of nodes \(r\) and \(F\) represent the set of links \(f\) between nodes, so that \(H = \left( {R,F} \right)\). For a pair of nodes \(r_i ,\,r_j \in R\), there is at least one forwarding path \(p_{i,j} = \left\{ {f_1 ,f_2 , \ldots ,f_{\left| o \right|} } \right\}\) that carries traffic from node \(r_i\) to node \(r_j\); the shortest path between them is denoted \(o_{i,j}^{*}\).

With a weighted shortest path algorithm, the shortest paths on \(H\) are computed, where the weight of link \(f_m\) is denoted \(w_m\) (0 ≤ m ≤ |F|). The indicator \(y_{i,j}^m\) denotes whether link \(f_m\) appears in \(o_{i,j}^{*}\), that is, \(y_{i,j}^m = 1\) if \(o_{i,j}^{*}\) contains \(f_m\) and \(y_{i,j}^m = 0\) otherwise. The equilibrium (centrality) of link \(f_m\) is defined as

$$ \beta \left( {f_m } \right) = \frac{{\sum_{i = 1}^{\left| R \right|} \sum_{j = 1}^{\left| R \right|} y_{i,j}^m }}{\left| R \right| \times \left| R \right|} $$
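The following sketch computes this centrality measure for a small topology: the fraction of all-pairs weighted shortest paths that traverse each link. The topology and weights are illustrative, and networkx is assumed available.

```python
# Link centrality (equilibrium): share of all-pairs shortest paths using a link.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("r1", "r2", 1.0), ("r2", "r3", 1.0), ("r1", "r3", 3.0), ("r3", "r4", 1.0),
])

usage = {tuple(sorted(e)): 0 for e in G.edges()}
nodes = list(G.nodes())
for src in nodes:
    for dst in nodes:
        if src == dst:
            continue
        path = nx.shortest_path(G, src, dst, weight="weight")
        for u, v in zip(path, path[1:]):
            usage[tuple(sorted((u, v)))] += 1      # count y_{i,j}^m over all pairs

n = len(nodes)
centrality = {e: count / (n * n) for e, count in usage.items()}
print(centrality)   # links on many shortest paths score higher
```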

A Markov decision process (MDP) [20] is used to model the DRL algorithm's working process. The DRL algorithm in the MDP interacts with the target environment. The MDP is characterized as follows:

$$ N = \left( {T,A,Q,S,Z} \right), $$

where \(T\) represents the state space \(s\), \(A\) represents the action space \(a\), \(Q\) represents the reward space \(q\), \(S\) represents the set of transition probabilities \(P\), and \(Z\) denotes the discount factor.

The overall training goal of the DRL algorithm is to maximize the cumulative discounted reward

$$ Q = \mathop \sum \limits_{t = 0}^T \gamma^t q_t $$
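As a one-line illustration of this training objective, the sketch below computes the discounted return for an episode of rewards; the reward sequence and discount factor are illustrative.

```python
# Discounted return Q = sum_t gamma^t * q_t that the DRL agent maximizes.
rewards = [1.0, 0.5, 0.8, 1.2]      # q_t observed over one episode
gamma = 0.95                        # discount factor

discounted_return = sum(gamma ** t * q for t, q in enumerate(rewards))
print(f"cumulative discounted reward: {discounted_return:.3f}")
```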

In packet-level simulations over numerous network topologies, ScaleDRL decreases the average end-to-end delay by up to 40% in comparison with the state-of-the-art DRL-based TE scheme. The link centrality-based selection scheme had the best performance of all the schemes studied, which confirms the effectiveness of the link centrality-based selection strategy.

2.4 RL Approach for TE Based on Link Control

Xu et al. [3] proposed a method for the TE problem using RL. Deep reinforcement learning (DRL) permits the use of machine learning to build a model-free TE scheme. Existing DRL-based TE solutions, on the other hand, cannot be used in massive networks. To develop a scalable TE scheme, an approach that combines control theory and DRL is presented. The proposed scheme, ScaleDRL, selects a subset of links in the network and names them critical links using a concept from pinning control theory [21]. A DRL technique is employed to dynamically adjust the weights of the critical links based on traffic distribution statistics. The forwarding paths of the flows can then be dynamically changed using a weighted shortest path algorithm.

The basic idea of pinning control is that driving and controlling all network elements to reach a desired network state consumes too many resources in the control algorithm and is therefore impractical; instead, control signals applied to only a part of the network can achieve the intended synchronization mode. ScaleDRL is implemented on top of SDN. A critical link selection algorithm and a DRL algorithm [22] reside in the ScaleDRL controller. With SDN, the traffic distribution can be collected periodically by the controller, and TE rules can be updated periodically. Based on this technology, ScaleDRL operates in two phases: offline and online. In the offline phase, the link selection algorithm analyzes the topology and selects a group of links as critical links from the pinning-control perspective. In the online phase, the DRL algorithm manages the link weights to steer the network traffic.

In the link selection algorithm [19], for a network \(H\), \(R\) represents the set of nodes \(r\) and \(F\) represents the set of links \(f\) between nodes, so that \(H = \left( {R,F} \right)\). For a pair of nodes \(r_i ,r_j \in R\), there is at least one forwarding path \(p_{i,j} = \{ f_1 ,f_2 , \ldots ,f_{\left| o \right|} \}\) that carries traffic from \(r_i\) to \(r_j\); the shortest path between them is denoted \(o_{i,j}^*\).

With a weighted shortest path algorithm, the shortest paths on \(H\) are computed, where the weight of link \(f_m\) is denoted \(w_m\) (0 ≤ m ≤ |F|). The indicator \(y_{i,j}^m\) denotes whether link \(f_m\) appears in \(o_{i,j}^*\), that is, \(y_{i,j}^m = 1\) if \(o_{i,j}^*\) contains \(f_m\) and \(y_{i,j}^m = 0\) otherwise. The equilibrium \(\beta\) of link \(f_m\) is defined as

$$ \beta \left( {f_m } \right) = \frac{{\sum_{i = 1}^{\left| R \right|} \sum_{j = 1}^{\left| R \right|} y_{i,j}^m }}{\left| R \right| \times \left| R \right|} $$

The DRL algorithm produces an action with its neural network and applies it to the environment at time \(t\) of the MDP, based on the observation of the state \(k_t\). The DRL algorithm obtains a reward \(r_t\) after action \(a_t\) is performed in the environment, which assesses \(a_t\)'s performance in the environment. There are currently several forms of DRL algorithms, with the key distinction being the mechanism used to update the neural network parameters. The DRL framework used here is ACKTR [23].

Packet-level simulations reveal that ScaleDRL decreases the average end-to-end delay by up to 40% in comparison with the state-of-the-art DRL-based TE scheme.

3 Routing in SDN Using Deep Reinforcement Learning

3.1 RL-Routing: An SDN Routing Algorithm Based on DRL

Chen et al. [7] proposed a method for routing in SDN based on RL. Because communication networks have become so complex and dynamic, they are challenging to model and predict. To overcome the traffic engineering challenge of SDN with respect to throughput and latency, a reinforcement learning routing method, RL-Routing, is created. Instead of constructing an exact mathematical model, RL-Routing tackles TE problems through training. It employs a one-to-many network configuration for routing options and uses extensive network information for state representation. The reward function, which uses network throughput and delay, may be adjusted to optimize either upstream or downstream throughput. After suitable training, the algorithm develops a policy that anticipates the future behavior of the underlying network and offers improved routing paths among switches.

RL-Routing sits in the SDN architecture and interconnects with the remaining components as follows. For message exchange, the controller attaches to the switches over the OpenFlow channel. There are two main modules in the RL-Routing application. The network monitoring module obtains network information through passive and active network measurements [24]; this information relates to the state of network elements, such as flow delay and flow throughput. The other module is the action translator module, which converts the algorithm's chosen action into a suitable group of OpenFlow messages that modify the switches' flow tables.

The network is modeled as a directed graph \(H\left( {X,F} \right)\), where \(X = \left\{ {s_1 ,s_2 , \ldots s_n } \right\}\) is the set of switches and \(F \subseteq X \times X\) is the set of links in the network, with |F| = m. The network links are bidirectional, meaning \(f_{i,j}\) and \(f_{j,i}\) are the upward and downward links connected to \(s_i\). The neighbors of switch \(s_i\) are \(N\left( {s_i } \right) = \left\{ {s_j \in X} \right\}\), and \(F\left( {s_i } \right) = \left\{ {f_{i,k} \in F} \right\}\) is the group of edges adjacent to \(s_i\). \(s_{{\text{src}}}\) is the source switch.

Let \(D_{{\text{src}}} \subset X - \left\{ {s_{{\text{src}}} } \right\}\) be the collection of all destination switches for \(s_{{\text{src}}}\). A path \(p_{{\text{src}},{\text{des}}}\) in network \(H\left( {X,F} \right)\) interconnects \(s_{{\text{src}}}\) to \(s_{{\text{des}}}\) via an order of switches \(\left( {s_{{\text{src}}} ,s_i ,s_j ,s_k ,s_{{\text{des}}} } \right)\), where consecutive switches in the order form edges in \(F\) and every switch is visited at most once. \(b_t \left( {f_{i,j} } \right)\) is the bandwidth of link \(f_{i,j}\), which interconnects \(s_i\) to \(s_j\), over the time interval \(\Delta t\). \({\text{delay}}_t \left( {f_{i,j} } \right)\) and \({\text{error}}_t \left( {f_{i,j} } \right)\) represent the delay of the link and an indicator of a fault on \(f_{i,j}\) during the interval \(\Delta t\). The path bandwidth \(b_t \left( {p_{{\text{src}},{\text{des}}} } \right) = \min b_t \left( {f_{i,j} } \right)\) is the minimum link bandwidth along the path over the interval \(\Delta t\). The path delay \({\text{delay}}_t \left( {p_{{\text{src}},{\text{des}}} } \right) = \sum_{f_{i,j} \in p_{{\text{src}},{\text{des}}} } {\text{delay}}_t \left( {f_{i,j} } \right)\) is the sum of the link delays along the path over the interval \(\Delta t\).
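The sketch below computes the two path metrics just defined: path bandwidth as the bottleneck (minimum) link bandwidth and path delay as the sum of link delays. The link statistics are illustrative placeholders for what the monitoring module would measure over an interval \(\Delta t\).

```python
# Path bandwidth (min over links) and path delay (sum over links).
bandwidth = {("s1", "s2"): 100.0, ("s2", "s3"): 40.0, ("s3", "s4"): 80.0}  # Mbps
delay = {("s1", "s2"): 2.0, ("s2", "s3"): 5.0, ("s3", "s4"): 1.5}          # ms

def path_links(path):
    return list(zip(path, path[1:]))

path = ("s1", "s2", "s3", "s4")
path_bandwidth = min(bandwidth[l] for l in path_links(path))   # bottleneck link
path_delay = sum(delay[l] for l in path_links(path))           # additive delay
print(f"b_t(p) = {path_bandwidth} Mbps, delay_t(p) = {path_delay} ms")
```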

The TE problem is stated as follows: given \(H\left( {X,F} \right)\), \(s_{{\text{src}}}\), and \(D_{{\text{src}}}\), find a group of links for forwarding \(s_{{\text{src}}}\)'s subsequent data to the switches in \(D_{{\text{src}}}\). The aim is to increase \(s_{{\text{src}}}\)'s throughput and reduce the communication delay.

The formulation of RL-Routing is given here, followed by a Q-learning algorithm [25] used to solve the traffic engineering problem; a minimal sketch of such a Q-learning loop is given after the list below.

Description of RL-Routing

  1. The routing model is designated as \(O = \left( {L,B,S,U,N} \right)\), where \(L \subset S^z\) represents the state space.

  2. \(B\) represents the action space.

  3. \(S:L \times B \to {\mathbb{R}}\) represents the reward function.

  4. \(U\) represents the transition probability.

  5. \(N \in \left[ {0,1} \right]\) represents the discount rate.
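The sketch below is a hedged illustration of the kind of Q-learning loop this MDP implies: states index a destination, actions index candidate paths, and the reward trades measured throughput against delay. The candidate paths, the placeholder measurement function, and the reward weighting are illustrative assumptions, not RL-Routing's exact design.

```python
# Illustrative Q-learning over candidate paths toward a destination.
import random

candidate_paths = {                      # destination -> candidate paths
    "s4": [("s1", "s2", "s4"), ("s1", "s3", "s4")],
}
Q = {("s4", i): 0.0 for i in range(len(candidate_paths["s4"]))}
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def measure(path):
    """Placeholder for the monitoring module: returns (throughput Mbps, delay ms)."""
    base = {"s2": (80.0, 3.0), "s3": (50.0, 8.0)}[path[1]]   # differs per next hop
    return base[0] + random.uniform(-5, 5), base[1] + random.uniform(-0.5, 0.5)

for _ in range(500):
    dest = "s4"
    if random.random() < epsilon:
        action = random.randrange(len(candidate_paths[dest]))
    else:
        action = max(range(len(candidate_paths[dest])), key=lambda a: Q[(dest, a)])
    throughput, delay = measure(candidate_paths[dest][action])
    reward = throughput - 5.0 * delay    # favor high throughput, low delay (assumed)
    best_next = max(Q[(dest, a)] for a in range(len(candidate_paths[dest])))
    Q[(dest, action)] += alpha * (reward + gamma * best_next - Q[(dest, action)])

print(max(Q, key=Q.get))                 # the preferred (destination, path index) pair
```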

On various network topologies, the simulation output demonstrates that RL-Routing earns a greater reward and allows a host to send a large file more quickly than the OSPF and LL algorithms. On the NSFNet topology, for example, the total reward of RL-Routing is 119.30, whereas those of OSPF and LL are 106.59 and 74.76, respectively. The average transmission time of RL-Routing for a 40 GB file is 25.2 s, while OSPF and LL take 63 and 53.4 s.

3.2 TIDE: Time-Relevant Deep Reinforcement Learning for Routing Optimization

Sun et al. [8] proposed a novel method for routing optimization. TIDE is an intelligent network control architecture based on DRL that can progressively optimize the routing strategy in an SDN network without requiring human interaction. Routing optimization has been studied for a long time in network design, and several optimization methods have been proposed by both academia and industry; however, such systems are either too difficult to apply in practice or perform poorly. With the emergence of SDN and artificial intelligence, AI-based routing methods have been proposed in recent years. TIDE has been fully implemented and tested in a real-world network environment, and the experimental results show that TIDE can dynamically adapt the routing strategy to the network situation and reduce the overall network forwarding delay by about 9% compared with standard algorithms.

To implement the automated routing strategy in SDN, an intelligent network control architecture called TIDE is first built [26]. Three logical planes make up the recommended design for intelligent network control: the data plane, the control plane, and the AI plane. There are three components to the intelligent decision loop: state and reward collection, policy development, and policy deployment. The main contributions are that a "collection-decision-adjustment" loop is presented to execute intelligent routing control of a forwarding network, and an RNN-based DRL system is carefully designed for abstracting traffic properties, which can effectively develop a near-optimal routing plan depending on the changing traffic distribution.

The fundamental method behind TIDE is DDPG [27], a DRL framework for continuous control. Unlike most reinforcement learning models such as DQN, DDPG's output is not restricted to a narrow set of discrete actions. In order to regulate all network traffic flows precisely, routing optimization usually requires adjusting the weight of every link. As a result, the action space of link weights in the network is large, making continuous-control algorithms such as DDPG a good choice for creating routing strategies.
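To show why a continuous-control method fits here, the sketch below defines an actor network that maps a traffic observation to one real-valued weight per link. Only the actor is shown; the critic and the DDPG update rules are omitted, and the dimensions and PyTorch architecture are illustrative assumptions rather than TIDE's actual design.

```python
# Minimal DDPG-style actor producing continuous link weights.
import torch
import torch.nn as nn

class LinkWeightActor(nn.Module):
    def __init__(self, state_dim: int, num_links: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_links), nn.Sigmoid(),   # weights in (0, 1)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

num_links, state_dim = 12, 24            # e.g. per-link load statistics (assumed)
actor = LinkWeightActor(state_dim, num_links)
state = torch.rand(1, state_dim)         # traffic observation from the controller
link_weights = actor(state)              # one continuous weight per link
print(link_weights.shape)                # torch.Size([1, 12])
```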

In RL, the interaction process between the agent and the environment is viewed as an MDP. The element tuple of the MDP is \(O = \left( {V,B,K,D,Z } \right)\), where \(V\) represents the state space \(v\), \(B\) represents the action space \(b\), \(K\) represents the reward space \(k\), \(D\) represents the transition probability function, and \(z \in \left[ {0,1} \right]\) represents the discount factor. An agent selects an action \(b\) in state \(v\) according to a policy, represented as \(\pi (b \mid v)\) for stochastic policies and \(b = \mu \left( v \right)\) for deterministic policies.

Value functions are used to determine whether a policy is beneficial. \(C\) is a prominent value function in reinforcement learning. When choosing action \(b\) in state \(v_t\), the value of the policy \(C\) is defined as [28]

$$ C\left( {v_t ,b} \right) = E\left[ {\sum_{k = 0}^\infty z^k K\left( {v_{t + k} ,b_{t + k} } \right)} \right] $$

TIDE decreases the overall transmission latency of the total traffic by around 9%. This figure is limited by the growing unpredictability of noise traffic, which makes it more difficult for TIDE to categorize network traffic and therefore limits TIDE's capacity to make perfect decisions.

3.3 QR-SDN: Toward Reinforcement Learning States, Actions, and Rewards for Direct Flow Routing in Software-Defined Networks

Rischke et al. [9] proposed a novel method for routing in SDN. QR-SDN is a classical tabular reinforcement learning system that builds and evaluates the routing paths of individual flows directly in its state-action space. The findings are used to produce a model-free reinforcement learning strategy. Owing to the direct representation of flow routes in QR-SDN's state-action space, QR-SDN is the first RL routing technique to enable multiple routing paths between a given source-destination switch pair while preserving flow integrity. In other words, with QR-SDN, packets of a given flow follow a fixed routing path, while flows of the same source-destination switch pair may take a variety of routes. QR-SDN is implemented and evaluated in an SDN testbed.

The representation of the SDN flow routing problem for efficient decision making by an RL agent has not been fully investigated. The design of the states and actions, in particular, must be addressed so as to adequately represent the flow routing problem for processing by the RL algorithm. To determine how successfully an action solves the flow routing problem, the reward is used; the proposed incentive is the sum of the latencies along the paths of the flows.

Assume the network \(H \left( {W,X} \right)\), where \(X\) is the collection of edges that connect the set of vertices \(W\). We focus on unicast communication flows, that is, flows that convey data from a single sender to a single receiver. Flow \(e\) represents the information transfer from a given sender \(s_e\) to a given receiver \(d_e\) for a certain application or transport layer context, such as a given TCP flow. \(E\) denotes the set of all flows. We assume that flow \(e\) injects a certain traffic rate \(Q_e\) into the network from its source host.

A path \(V \in V_{s,d}\) is an ordered sequence of vertices \(V = \left( {p_1 , \ldots ,p_n } \right)\) from the group of all possible paths \(V_{s,d}\) interconnecting \(s\) to \(d\), where the group \(V_{s,d}\) can be determined by a search algorithm such as DFS [29,30,31,32,33].

The RL agent in the SDN controller monitors the environment by observing the required key performance indicators, such as bandwidth, at discrete times \(t = 0,1,2, \ldots\). The observation contains the reward \(S_t \in S\) and the environment's state \(R_t\) from the set of states \(R = \left\{ {R_1 ,R_2 , \ldots } \right\}\).

The state \(R_t\) consists of a table with the currently selected path for every flow \(e\). An action \(B_t \in B\) is chosen based on the state \(R_t\) and its accompanying reward \(S_t\). The set of alternative paths, including the present path, determines the set of actions \(B = \left\{ {B_{t,1} ,B_{t,2} , \ldots } \right\}\). The total latency \(L_e\) along the present routes \(V_{s,d}\) of the flows \(e \in E\) forms the reward \(S_t\).
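The sketch below is a hedged illustration of this tabular state-action space: the state is the per-flow path assignment, an action re-assigns one flow to an alternative path, and the reward is taken here as the negative sum of path latencies (an assumption, since lower latency should yield higher reward). The flows, candidate paths, and latencies are illustrative.

```python
# Tabular Q-learning over per-flow path selections (QR-SDN-style sketch).
import random

paths = {   # flow -> candidate paths with assumed latencies (ms)
    "f1": {"p1": 4.0, "p2": 7.0},
    "f2": {"p1": 6.0, "p3": 3.0},
}
state = {"f1": "p1", "f2": "p1"}                 # currently selected path per flow
Q = {}                                            # (state, action) -> value
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def key(s):                                       # hashable state representation
    return tuple(sorted(s.items()))

def reward(s):                                    # negative sum of path latencies (assumed)
    return -sum(paths[f][p] for f, p in s.items())

actions = [(f, p) for f in paths for p in paths[f]]
for _ in range(500):
    if random.random() < epsilon:
        action = random.choice(actions)           # explore an alternative path
    else:
        action = max(actions, key=lambda a: Q.get((key(state), a), 0.0))
    next_state = dict(state)
    next_state[action[0]] = action[1]             # re-route one flow
    r = reward(next_state)
    best_next = max(Q.get((key(next_state), a), 0.0) for a in actions)
    q_old = Q.get((key(state), action), 0.0)
    Q[(key(state), action)] = q_old + alpha * (r + gamma * best_next - q_old)
    state = next_state

print(state)   # converges toward the per-flow paths with the lowest latencies
```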

For moderate to high loads, the flow-preserving multipath routing of QR-SDN provides much lower latencies than conventional single-path routing systems, according to the tests. Shifts, such as load changes owing to additional flows or flows that end, are successfully accommodated by QR-SDN.

4 Comparison of Various Traffic Engineering and Routing Algorithms in SDN and Hybrid SDN

In this section, the various traffic engineering techniques and routing algorithms used in SDN and hybrid SDN are analyzed and summarized in Table 1.

Table 1 An overview of various traffic engineering techniques and routing algorithms

The mentioned algorithms and techniques (Table 2) are used to improve the performance of the network and to increase the efficiency of routing in software-defined networking. The deep reinforcement learning algorithms and traffic engineering techniques mentioned in the previous approaches optimize the maximum link utilization and improve flow routing in the network. Link selection algorithms are used to optimize distributed estimation and increase network performance. DRL algorithms are used for traffic control and flow rerouting in the network. The RL framework helps to improve the dynamic routing of flows in the network. Therefore, to improve the efficiency of networks and to increase the overall performance of the SDN, traffic engineering schemes and reinforcement learning methods need to be incorporated in the model.

Table 2 Mathematical model/algorithm used

5 Conclusion

The state-of-the-art TE techniques and routing algorithms in SDN and hybrid SDN were analyzed in depth. For each article, the issues addressed and the mathematical model or algorithm used, along with its core classification, are tabulated clearly so that researchers can get an overview of the literature. The study gives an elaborated insight.