Abstract
5G networks adopt an SDN-based architecture because of the efficiency, low cost, ease of management, and scalability that SDN provides. SDN, however, raises the problem of determining routes and optimizing traffic centrally. This problem of traffic optimization and routing in 5G networks has been addressed with deep reinforcement learning algorithms such as the Deep Deterministic Policy Gradient (DDPG). DDPG provides good results but suffers from overestimation bias and runs the risk of becoming unstable. These issues are addressed by an alternative algorithm called the Twin-Delayed Deep Deterministic Policy Gradient (TD3). One of the changes in TD3 is training the agent with two Q-value functions instead of a single Q-value function and taking the minimum of the two values. TD3 also uses delayed policy/target updates and target-policy smoothing. There is no mention of TD3 in the literature for solving the SDN routing problem, so this paper analyzes and compares DDPG (the existing approach) and TD3 (the proposed approach). A simulation environment based on the OMNeT++ discrete event simulator was used to simulate a 5G network with SDN routing. Two simulation runs were performed, one with DDPG and one with TD3. The TD3 approach was demonstrated to provide much better performance with lower latency.
1 Introduction
Wireless communication networks have evolved with advances in technology to their fifth generation, which supports much faster data rates, is ultra-dense, and provides lower latency. 5G networks rely on several emerging technologies and innovations to achieve these capabilities. Software-defined networking (SDN) is one such technology, deployed in 5G networks to reduce both latency and cost. The SDN architecture separates the data plane from the control plane and provides centralized network control. The routers of a traditional network are replaced with high-speed switches that have limited control plane capability. The routing functionality is shifted to the centralized SDN controller, which shares routing information with the switches. Standard protocols such as OpenFlow are used for the communication between the SDN controller and the switches, and routing decisions are communicated to the switches over the same protocol. Centralized control of routing eliminates routing overhead from the network and also allows a centralized policy to be enforced. SDN networks provide much higher reliability and scalability than the traditional model of distributed routing.
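The paper does not name a specific controller framework at this point; as one concrete illustration of how an SDN controller pushes a routing decision to a switch through an OpenFlow flow-mod message, the following sketch uses the open-source Ryu framework (the match fields, output port, and priority are illustrative):

```python
# A minimal sketch of an SDN controller installing a forwarding rule on a
# newly connected switch via OpenFlow 1.3, using the Ryu framework.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

class SimpleRoutingController(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def on_switch_connect(self, ev):
        dp = ev.msg.datapath
        parser = dp.ofproto_parser
        # Illustrative rule: forward IPv4 traffic for 10.0.0.2 out of port 2.
        match = parser.OFPMatch(eth_type=0x0800, ipv4_dst='10.0.0.2')
        actions = [parser.OFPActionOutput(2)]
        inst = [parser.OFPInstructionActions(
            dp.ofproto.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(
            datapath=dp, priority=10, match=match, instructions=inst))
```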
The use of the SDN architecture improves network performance significantly and also provides much more scalability than conventional routing protocols. Rego et al. [1] demonstrated a performance improvement when OSPF routing was used in an SDN network as compared to traditional distributed networks: packet delays and jitter in video streaming applications were significantly lower in the SDN network than in the OSPF routing network. Zhang et al. [2] demonstrated that SDN routing is much more scalable and that routing convergence is faster in large networks with higher link delays. OSPF networks provided better response times than SDN networks for a small, 16-node topology, but SDN response times were 20% faster than conventional OSPF for a large, 120-node topology. Gopi et al. [3] compared the routing convergence times of conventional and SDN networks and demonstrated that, in an 80-node topology, conventional networks took three times longer to converge than SDN networks.
This architecture, however, requires the routing function to be implemented in the SDN controller and merely shifts the routing problem from the routers to the controller.
An alternative to running routing protocols in the control plane is to use machine learning techniques for traffic optimization and for learning routing paths in SDN networks. Machine learning can capture multiple features such as bandwidth, delay, energy efficiency, and QoS in routing decisions, and the overheads of traditional routing algorithms can be eliminated.
2 Existing Work in This Area
The existing literature includes work on using machine learning techniques to learn routing in SDN networks. Reinforcement learning is one of the techniques commonly used for this purpose.
Lin et al. [4] proposed QoS-aware adaptive routing for SDN networks based on reinforcement learning. The specific RL algorithm they used was State-Action-Reward-State-Action (SARSA). SARSA is conservative and learns a near-optimal policy, as opposed to the optimal policy learned by Q-learning.
Tang et al. [5] suggested the use of deep CNN-based learning for automatically learning routing information in SDN networks. The deep learning approach was found to be faster than traditional routing: at a packet generation rate of 480 Mbps, the packet loss rate with deep learning was 50% of that observed with traditional routing. This approach, however, carries the overhead and drawbacks of supervised learning and is not dynamic.
Stampa et al. [6] proposed a deep reinforcement learning approach for learning routing information from the network using the Deep Deterministic Policy Gradient (DDPG) algorithm. The algorithm was demonstrated to provide optimal delays in the network relative to an initial benchmark. DDPG, however, suffers from overestimation bias, which also needs to be addressed.
Yu et al. [7] also experimented with the DDPG algorithm to optimize routing in SDN networks. Minimization of delay was used as a performance metric and the benchmark used for comparison was OSPF routing data. Under a 70% traffic load condition, the delay performance of the proposed algorithm improved by 40.4% as compared to OSPF.
Xu et al. [8] integrated the DDPG learning algorithm into the routing process of SDN to facilitate routing. A comparison was made with the traditional OSPF routing protocol.
Tu et al. [9] also used the DDPG algorithm to make real-time routing decisions in SDN networks. The reward was chosen as a function of bandwidth, delay, jitter, and packet loss. A comparison was made with the OSPF routing protocol, and significant improvement was seen with DDPG-based routing.
Pham et al. [10] proposed a knowledge plane in SDN networks to manage the routing decisions. This knowledge plane used the DDPG algorithm to learn QoS-aware routing decisions. Latency and packet loss rate were considered as the criteria for optimizing routing. Improvement in packet loss rate as well as latency was observed in the case of the DDPG algorithm as compared to traditional routing mechanisms.
Kim et al. [11] implemented a DDPG-based deep reinforcement learning (DRL) agent for routing in SDN networks. The DDPG-based agent demonstrated better performance than naive methods.
Sun et al. [12] proposed TIDE, a time-relevant deep reinforcement learning algorithm based on DDPG for QoS guarantees in SDN networks. TIDE ran faster than shortest-path (SP) algorithms and produced better results.
The DDPG algorithm, used by several of these researchers, suffers from overestimation bias. Any estimation error is propagated through the Bellman equation and can make the algorithm unstable, causing it to miss the target value or settle in a local optimum.
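A toy numerical experiment makes the bias concrete: even when each individual Q estimate is unbiased, taking a maximum over several noisy estimates is biased upward, and the Bellman backup then propagates this optimism. The values below are synthetic, chosen only for illustration:

```python
# Overestimation bias in miniature: five actions with true value 0 and
# unbiased noisy estimates. The mean estimate is ~0, but the mean of the
# per-state maximum is clearly positive.
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(5)                                   # all actions worth 0
noisy_q = true_q + rng.normal(0, 1, size=(10_000, 5))  # unbiased estimates

print(noisy_q.mean())              # ~0.0: each estimate is unbiased
print(noisy_q.max(axis=1).mean())  # ~1.16: the max is biased upward
```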
3 DDPG and TD3 Deep Learning Algorithms
Lillicrap et al. [13] proposed a deep learning algorithm for continuous action spaces and called it the Deep Deterministic Policy Gradient (DDPG). DDPG is an off-policy, model-free, online learning algorithm that uses the Actor-Critic method. The DDPG agent looks for an optimal policy that maximizes the cumulative long-term reward. It uses four function approximators (Actor, Target Actor, Critic, and Target Critic) for learning. The Actor takes the action to maximize the reward, and the Critic returns the expected value of the long-term reward. The target networks are used to improve the stability of the optimization. DDPG uses a replay buffer to randomly sample past transitions.
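A minimal sketch of the replay buffer just mentioned is shown below; the capacity and interface are illustrative choices, not the paper's implementation. Uniform random sampling breaks the temporal correlation between consecutive training samples:

```python
# A minimal replay buffer of the kind DDPG (and TD3) rely on: transitions
# are stored as they occur and sampled uniformly at random for training.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```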
DDPG also uses soft updates of the target networks for both the Actor and the Critic. This means that the weights of the main networks are not copied to the target networks all at once; instead, only a small fraction is blended in at each update to provide stability.
This can be shown mathematically as:

\(\theta ^{\prime} \leftarrow \tau \theta ^{\prime} + (1 - \tau )\theta ,\quad \phi ^{\prime} \leftarrow \tau \phi ^{\prime} + (1 - \tau )\phi\)

where θ and ϕ are the weights of the main Critic and Actor networks, θ′ and ϕ′ are the weights of the corresponding target networks, and τ is the parameter used for the soft update. The value of τ is less than 1 (typically around 0.999).
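Since the agents in this paper were implemented with TensorFlow in Python, the soft update above might be written as follows. This is a minimal sketch assuming Keras-style models; the function name is ours, and τ = 0.999 follows the text, so the target keeps 99.9% of its old weights at each update:

```python
# Soft (Polyak) update of a target network toward its main network, matching
# the convention in the equation above (tau is the fraction retained).
def soft_update(target_model, main_model, tau=0.999):
    new_weights = [tau * t + (1.0 - tau) * m
                   for t, m in zip(target_model.get_weights(),
                                   main_model.get_weights())]
    target_model.set_weights(new_weights)
```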
The DDPG algorithm, however, still suffers from the overestimation bias problem and can become unstable or converge to a local optimum. To avoid this, several changes have been made in an alternative to DDPG known as the Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The changes from DDPG to TD3 are:
A TD3 agent learns two Q values instead of a single value used by DDPG. The minimum of these two values is used by the algorithm for updates.
Unlike DDPG, the policy is not updated after every iteration in TD3; it is updated less frequently (usually once every two or three Q-value updates).
During policy updates, the TD3 agent adds noise to the target action (target-policy smoothing). This prevents the agent from exploiting spuriously high Q-value estimates. The noise used in TD3 is clipped Gaussian noise.
TD3 Algorithm
Initialization:
Two Critic networks \(Q_{{\theta_{1} }}\) and \(Q_{{\theta_{2} }}\) are initialized with random parameters \(\theta_{1}\) and \(\theta_{2}\).
An Actor network \(\pi_{\phi }\) is initialized with random parameters ϕ.
Target networks are initialized: \(\theta ^{\prime}_{1} \leftarrow \theta_{1} ,\theta ^{\prime}_{2} \leftarrow \theta_{2} ,\phi ^{\prime} \leftarrow \phi\).
The replay buffer Ɓ is initialized empty.
for t in range(1, T):
Select an action with exploration noise \(a\sim \pi_{\phi } (s) + \varepsilon ,\varepsilon \sim N(0,\sigma )\).
Observe the reward r and the new state s′.
Save the transition (s, a, r, s′) in the replay buffer Ɓ.
Sample a mini-batch of N random transitions (s, a, r, s′) from Ɓ.
Compute the smoothed target action \(\tilde{a} \leftarrow \pi_{{\phi ^{\prime}}} (s^{\prime}) + \varepsilon ,\varepsilon \sim {\text{clip}}(N(0,\tilde{\sigma }), - c,c)\).
Compute the target value from the smaller of the two target Critics: \(y \leftarrow r + \gamma \mathop {\min }\limits_{i = 1,2} Q_{{\theta ^{\prime}_{i} }} (s^{\prime},\tilde{a})\).
Update both Critics by minimizing \(N^{ - 1} \sum {(y - Q_{{\theta_{i} }} (s,a))^{2} }\).
if t mod d == 0:
Update ϕ using the deterministic policy gradient of \(Q_{{\theta_{1} }}\) with respect to the Actor's action.
Soft-update the target networks: \(\theta ^{\prime}_{i} \leftarrow \tau \theta ^{\prime}_{i} + (1 - \tau )\theta_{i} ,\phi ^{\prime} \leftarrow \tau \phi ^{\prime} + (1 - \tau )\phi\).
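The following Python sketch puts the three TD3 modifications together for the target computation of a single training step. The network objects, action bounds, and hyperparameter values are illustrative and are not the paper's exact settings:

```python
# TD3 target computation: clipped Gaussian target-policy smoothing plus the
# minimum over the twin target Critics (clipped double-Q learning).
import numpy as np

def td3_target(r, s_next, actor_target, critic1_target, critic2_target,
               gamma=0.99, sigma=0.2, c=0.5):
    # Target-policy smoothing: perturb the target action with clipped noise.
    a_next = actor_target(s_next)
    noise = np.clip(np.random.normal(0.0, sigma, size=np.shape(a_next)), -c, c)
    a_next = np.clip(a_next + noise, -1.0, 1.0)  # assumes actions in [-1, 1]
    # Clipped double-Q: take the minimum of the two target Critic estimates.
    q_min = np.minimum(critic1_target(s_next, a_next),
                       critic2_target(s_next, a_next))
    return r + gamma * q_min

# Delayed updates (the third modification): the Actor and all target networks
# are updated only once every d Critic updates, e.g.
#   if step % d == 0: update the Actor, then soft-update the targets.
```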
4 Experiments and Results
The experiments were carried out using the OMNeT++ simulator, a discrete event simulator written in C++ that also provides support for Python. The hardware environment consisted of an 8 × NVIDIA A100 system with 40 GB GPUs (5 GB of GPU memory was allocated for this setup). The network topology simulated using OMNeT++ was the standard 14-node NSFNet topology with 21 full-duplex links, as shown in Fig. 1.
The nodes were simple forwarding switches connected to a central SDN controller. The routing decisions were taken by the SDN controller using the OpenFlow control protocol. The central controller application for routing was an agent based on reinforcement learning (RL agent) that controlled the routing entries on all switches. Two versions of RL agents were implemented using TensorFlow libraries in Python. The first version was based on the DDPG algorithm and the second version was based on the TD3 algorithm (Table 1).
A traffic matrix was created for testing the performance with 1000 different traffic configurations. To check the performance of the RL agents, 100,000 random routing configurations in which all nodes were reachable were used. The same set of routing configurations was used for all the traffic configurations (Fig. 2).
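A sketch of how such random traffic configurations might be generated for the 14-node topology is shown below; the demand distribution and scale are assumptions, since the paper does not specify how its configurations were drawn:

```python
# Generate random traffic matrices for a 14-node topology. Entry (i, j) is
# the offered demand from node i to node j; the diagonal is zeroed out.
import numpy as np

NUM_NODES = 14
rng = np.random.default_rng()

def random_traffic_matrix(max_demand_mbps=100.0):
    tm = rng.uniform(0.0, max_demand_mbps, size=(NUM_NODES, NUM_NODES))
    np.fill_diagonal(tm, 0.0)  # no self-directed traffic
    return tm

traffic_configs = [random_traffic_matrix() for _ in range(1000)]
```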
Latency is an important criterion for QoS as well as for congestion. Since the target is to minimize latency, negative latency was chosen as the reward function to be maximized by the RL agents.
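A minimal sketch of this reward follows, assuming a hypothetical simulator hook `measure_latency` that returns the mean end-to-end delay for the current routing configuration (the API is ours, not OMNeT++'s):

```python
# Reward = negative latency, so that maximizing reward minimizes delay.
def compute_reward(simulator, routing_config):
    latency = simulator.measure_latency(routing_config)  # hypothetical API
    return -latency
```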
The performance results obtained for the two models are shown in Figs. 3 and 4. Figure 3 compares the latency in the network as training progresses. It can be seen that beyond 50 epochs, the latency is higher with DDPG than with TD3, and the trend remains consistent as training progresses. This demonstrates the effectiveness of the TD3 algorithm in learning SDN routing with minimal delay as compared to the DDPG algorithm.
Figure 4 shows the runtime performance of the agents after the training is completed. Again, it can be seen that the performance of TD3 is better as compared to DDPG.
5 Conclusion and Future Directions
As demonstrated through the simulation results, TD3 is more effective and efficient than DDPG at learning SDN routing in 5G networks and removes the overhead of a routing protocol from the network. It is therefore preferable to use an RL agent based on TD3 for SDN routing in 5G networks.
This study has focused on static network topologies, where learning takes place in a fixed network environment; if the topology changes, training may need to be repeated. A subject for future study is transfer learning, so that the trained models can be reused even when the network topology changes.
References
Rego A, Sendra S, Jimenez JM, Lloret J (2017) OSPF routing protocol performance in software defined networks. In: 2017 Fourth International conference on software defined systems (SDS). Valencia, Spain, pp 131–136. https://doi.org/10.1109/SDS.2017.7939153
Afaq A, Haider N, Baig MZ, Khan KS, Imran M, Razzak I (2021) Machine learning for 5G security: architecture, recent advances, and challenges. Ad Hoc Netw 123:102667. https://doi.org/10.1016/j.adhoc.2021.102667
Zhang H, Yan J (2015) Performance of SDN routing in comparison with legacy routing protocols. In: 2015 International conference on cyber-enabled distributed computing and knowledge discovery. Xi’an, China, pp 491–494. https://doi.org/10.1109/CyberC.2015.30
Gopi D, Cheng S, Huck R (2017) Comparative analysis of SDN and conventional networks using routing protocols. In: 2017 International conference on computer, information and telecommunication systems (CITS). Dalian, China, pp 108–112. https://doi.org/10.1109/CITS.2017.8035305
Lin SC, Akyildiz IF, Wang P, Luo M (2016) QoS-aware adaptive routing in multi-layer hierarchical software defined networks: a reinforcement learning approach. In: 2016 IEEE International conference on services computing (SCC). IEEE, pp 25–33
Tang F, Mao B, Fadlullah ZM, Kato N, Akashi O, Inoue T, Mizutani K (2017) On removing routing protocol from future wireless networks: a real-time deep learning approach for intelligent traffic control. IEEE Wirel Commun 25(1):154–160
Stampa G, Arias M, Sánchez-Charles D, Muntés-Mulero V, Cabellos A (2017) A deep-reinforcement learning approach for software-defined networking routing optimization. arXiv preprint arXiv:1709.07080
Yu C, Lan J, Guo Z, Hu Y (2018) DROM: optimizing the routing in software-defined networks with deep reinforcement learning. IEEE Access 6:64533–64539. https://doi.org/10.1109/ACCESS.2018.2877686
Tu Z, Zhou H, Li K, Li G, Shen Q (2019) A routing optimization method for software-defined SGIN based on deep reinforcement learning. In: 2019 IEEE Globecom workshops (GC Wkshps). IEEE, pp 1–6
Pham TAQ, Hadjadj-Aoul Y, Outtagarts A (2019) Deep reinforcement learning based qos-aware routing in knowledge-defined networking. In: Quality, reliability, security and robustness in heterogeneous systems: 14th EAI International conference, Qshine 2018, Ho Chi Minh City, Vietnam, December 3–4, 2018, Proceedings, 14. Springer International Publishing, pp 14–26
Kim G, Kim Y, Lim H (2022) Deep reinforcement learning-based routing on software-defined networks. IEEE Access 10:18121–18133
Sun P, Hu Y, Lan J, Tian L, Chen M (2019) TIDE: time-relevant deep reinforcement learning for routing optimization. Futur Gener Comput Syst 99:401–409
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971