1 Introduction

The rapid growth of the Internet and the extensive use of mobile communication have moved the world towards a sophisticated communication system. Mobile communication networks have reshaped the mobile market across several generations of cellular technology. In 1997, the first IEEE 802.11 technology was developed for WLAN.

WLAN standards continue to be updated, the latest being IEEE 802.11ax, 802.11ay, 802.11az, 802.11ba, and so on. These advancements change the infrastructure of communication networks, which is indispensable for various communication services. With each revision of a generation, the technology of the network keeps changing, with a focus on the transmission of data.

The concept of networking is changing from traditional communication technology to modern techniques such as WSN, SDN, IoT, DTN, and so on. These networks pose various issues such as lifetime, node performance, and load balancing, which make the network more complicated. Apart from these complications, there is another important aspect, known as delay. Both wired and wireless networks suffer from network delay, which causes packet re-transmission and network congestion. Traffic congestion is due to improper load balancing, under-utilization of resources, and an increase in different services. Traffic is random in nature, and sudden delays occur over the media. Delays are generally classified into four categories: transmission, propagation, queuing, and processing delay. The transmission delay is the time taken to push all of a packet's bits onto the communication channel. The propagation delay is the time required for a single bit to travel across the communication link. Some packets are stored in a queue in the switch while the receiver is busy processing other packets; the queuing delay is the time a packet stays in the queue until it is handled by the processing unit. The processing delay is due to the processing of a packet header for route selection. Back-off is another kind of delay, where packets wait in the queue after a collision before re-transmission, to avoid further collisions.
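To make the four components concrete, the sketch below computes a per-hop delay as their sum. All parameter values (packet size, link rate, distance, queuing and processing times) are illustrative assumptions, not figures from this survey:

```python
# Per-hop delay as the sum of the four delay components described above.
# All numeric values are illustrative assumptions.

def transmission_delay(packet_bits: float, link_rate_bps: float) -> float:
    """Time to push all of a packet's bits onto the channel: L / R."""
    return packet_bits / link_rate_bps

def propagation_delay(distance_m: float, speed_mps: float = 2e8) -> float:
    """Time for a single bit to traverse the link: d / s."""
    return distance_m / speed_mps

def nodal_delay(packet_bits, link_rate_bps, distance_m,
                queuing_s=0.0, processing_s=0.0):
    """Total per-hop delay: transmission + propagation + queuing + processing."""
    return (transmission_delay(packet_bits, link_rate_bps)
            + propagation_delay(distance_m)
            + queuing_s + processing_s)

# Example: a 12,000-bit packet on a 10 Mb/s link spanning 1,000 km,
# with 2 ms of queuing delay and 50 microseconds of processing delay.
d = nodal_delay(12_000, 10e6, 1_000_000, queuing_s=2e-3, processing_s=50e-6)
print(f"total per-hop delay: {d * 1e3:.3f} ms")  # 8.250 ms
```

In practice the queuing component is the only one that varies with load, which is why the models surveyed below concentrate on it.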

This paper reviews queuing delay and various delay models such as Little's theorem, M/M/1, M/M/m, and M/G/1, with examples. The effects of different flows, ECN bits, and MPLS are studied. Delay and the different models are reviewed for different networks such as SDN, wireless networks, WSN, IoT, mobile communication, and DTN. In addition, delay-based routing and various fields related to delay are explained at the end. An overview of delay in these different fields is presented in Fig. 1. The abbreviations are listed in Table 1.

Fig. 1

Overview of delay and other related fields

2 Basics of delay in networks

Queuing systems are found in the forwarding devices of a network. Arriving packets are queued in a buffer for processing. Packets transmitted from one end to the other are relayed through intermediate nodes or routers. Packets arriving at a router face queuing delay until they are routed to an outgoing path, which implies the presence of a queuing system, as shown in Fig. 2. The system is characterized by two quantities: the arrival rate, the rate at which packets arrive at the device, and the service rate, the number of packets served per unit of time. There are various disciplines for selecting packets from a queue, such as FIFO, LIFO, priority based, etc. If the numbers of arriving and departing packets are counted, their difference gives the number of packets currently in the system. Some standard queuing models are explained below,

Table 1 Popular abbreviations
Fig. 2

A simple queuing model

Little's theorem Little's theorem states that, in steady state, the average number of packets A in the system equals the arrival rate of packets \(\lambda \) multiplied by the average time Q a packet spends in the system [1], as below.

$$\begin{aligned} A = \lambda \times Q \end{aligned}$$
(1)

Little's theorem is useful in various well-known applications for its simplicity and has been applied to email management systems, factory production, tollbooths, etc. Assume a system where \(\lambda =5000\) packets arrive per second and each packet stays for \(Q=10^{-3}\) seconds in the system; then the average number of packets in the system is,

$$\begin{aligned} A = \lambda \times Q = 5000 \times 10^{-3} = 5 \text{ packets } \end{aligned}$$
(2)
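The worked example can be checked in a few lines; the sketch below simply evaluates Little's law for the numbers given above:

```python
# Little's law: A = lambda * Q. Knowing any two of the three
# quantities, the third follows directly.

def packets_in_system(arrival_rate: float, time_in_system: float) -> float:
    return arrival_rate * time_in_system

A = packets_in_system(arrival_rate=5000, time_in_system=1e-3)
print(A)  # 5.0

# Rearranged: average time in the system, given A and lambda
Q = A / 5000
print(Q)  # 0.001
```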

M/M/1 queuing system It is one of the simplest models for measuring queuing delay in a system. The name of the model consists of three terms describing the parameters and structure of the system: the first M denotes a Markovian (Poisson) arrival process, the second M denotes Markovian (exponentially distributed) service times, and the last parameter, 1, denotes the number of servers or processing units. A single server is available in this model [2].

Consider a situation where packets arriving at a system are served by a single server at a rate of \(\mu \) per second and the arrival rate of packets is \(\lambda \). The arrivals and service times are characterized as Poisson and exponential distributions, respectively. The number of packets in the queue Q and in the system S are used to calculate the average waiting time in the queue \(W_q\) and in the system \(W_s\), respectively. Q and S are calculated from the utilization rate \(\rho \) as below.

$$\begin{aligned} \rho= & {} \frac{\lambda }{\mu } \end{aligned}$$
(3)
$$\begin{aligned} S= & {} \dfrac{\lambda }{\mu - \lambda } = \dfrac{\frac{\lambda }{\mu }}{1 -\frac{\lambda }{\mu }} = \dfrac{\rho }{1 - \rho } \end{aligned}$$
(4)
$$\begin{aligned} Q= & {} \dfrac{\lambda ^2}{\mu (\mu - \lambda )} = \dfrac{\rho ^2}{1 - \rho } = S \times \rho \end{aligned}$$
(5)

Average waiting time of packets in system,

$$\begin{aligned} W_s = \dfrac{S}{\lambda } = \dfrac{\frac{\rho }{(1 - \rho )}}{\lambda } \end{aligned}$$
(6)

Average waiting time of packets in queue,

$$\begin{aligned} W_q = \dfrac{Q}{\lambda } = \dfrac{S \times \rho }{\lambda } \end{aligned}$$
(7)

Take an example of an M/M/1 system, where the arrival rate \(\lambda =1000\) pkts/sec and the service rate \(\mu = 1500\) pkts/sec; then various performance parameters are obtained as below.

$$\begin{aligned} \rho= & {} \dfrac{\lambda }{\mu } = \dfrac{1000}{1500} = 0.67 \end{aligned}$$
(8)
$$\begin{aligned} S= & {} \dfrac{\rho }{1 - \rho } = \dfrac{\lambda }{\mu - \lambda } = \dfrac{1000}{500} = 2 \end{aligned}$$
(9)
$$\begin{aligned} Q= & {} S \times \rho = 2 \times 0.67 \approx 1.33 \end{aligned}$$
(10)
$$\begin{aligned} W_s= & {} \dfrac{S}{\lambda } = \dfrac{2}{1000} = 2 \times 10^{-3} \text{ sec } \end{aligned}$$
(11)
$$\begin{aligned} W_q= & {} \dfrac{Q}{\lambda } = \dfrac{1.33}{1000} = 1.33 \times 10^{-3} \text{ sec } \end{aligned}$$
(12)
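The M/M/1 formulas translate directly into code. The sketch below evaluates Eqs. (3)-(7) for \(\lambda = 1000\), \(\mu = 1500\); exact arithmetic may differ slightly in the last digit from figures obtained by carrying a rounded \(\rho \) through the formulas:

```python
# M/M/1 performance metrics, following Eqs. (3)-(7).

def mm1_metrics(lam: float, mu: float) -> dict:
    """Return utilization, queue/system occupancy and waiting times."""
    if lam >= mu:
        raise ValueError("system is unstable unless lambda < mu")
    rho = lam / mu              # utilization, Eq. (3)
    s = rho / (1 - rho)         # packets in the system, Eq. (4)
    q = s * rho                 # packets in the queue, Eq. (5)
    return {"rho": rho, "S": s, "Q": q,
            "Ws": s / lam,      # waiting time in the system, Eq. (6)
            "Wq": q / lam}      # waiting time in the queue, Eq. (7)

m = mm1_metrics(lam=1000, mu=1500)
for name, value in m.items():
    print(f"{name} = {value:.4g}")
```

The guard `lam >= mu` reflects the stability condition \(\rho < 1\); without it, Eqs. (4)-(5) would return meaningless negative occupancies.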

M/M/m queuing system With the introduction of multiple servers in a queuing system, the impact of delay also changes. In M/M/m, the parameter m is the number of servers that process arriving packets simultaneously. As in M/M/1, the packet arrival process and each server's service times are modeled as Poisson and exponential distributions, respectively.

Take an example of the M/M/m model, in which \(P_0\), i.e., the probability that the system is empty, together with \(\rho \), Q, S, \(W_q\) and \(W_s\), is calculated as below.

Steady-state probability of an empty system,

$$\begin{aligned} P_0 = \left[ \sum _{n=0}^{m-1}\frac{\left( \frac{\lambda }{\mu }\right) ^n}{n!} + \frac{\left( \frac{\lambda }{\mu }\right) ^m}{m!}\times \frac{1}{1-\frac{\lambda }{m\mu }}\right] ^{-1} \end{aligned}$$
(13)

Utilization rate of the system,

$$\begin{aligned} \rho = \frac{\lambda }{m\mu } < 1 \end{aligned}$$
(14)

Using the above two parameters, various queue performance parameters are obtained as below.

The average number of packets present in the queue is obtained by,

$$\begin{aligned} Q = \frac{\left( \frac{\lambda }{\mu }\right) ^m\rho }{m!(1-\rho )^2}P_0 \end{aligned}$$
(15)

Average waiting time of packets in the queue,

$$\begin{aligned} W_q = \frac{Q}{\lambda }, \end{aligned}$$
(16)

Average waiting time of packets in the system,

$$\begin{aligned} W_s = W_q + \frac{1}{\mu } \end{aligned}$$
(17)

Total number of packets in the system,

$$\begin{aligned} S = Q + \frac{\lambda }{\mu } \end{aligned}$$
(18)

Take an example of an M/M/m system, where \(\lambda =1000\) pkts/sec, service rate \(\mu =1100\) pkts/sec and \(m=4\); various performance parameters are obtained as below.

$$\begin{aligned} \rho = \frac{\lambda }{m\mu } = \frac{1000}{4 \times 1100} = 0.227 \end{aligned}$$
(19)

Steady-state probability of an empty system,

$$\begin{aligned} P_0 = \left[ \sum _{n=0}^{4-1}\frac{\left( \frac{1000}{1100}\right) ^n}{n!} + \frac{\left( \frac{1000}{1100}\right) ^4}{4!}\times \frac{1}{1-\frac{1000}{4\times 1100}}\right] ^{-1} { = 0.402} \end{aligned}$$
(20)

Number of packets present in the queue,

$$\begin{aligned} Q = \frac{\left( \frac{1000}{1100}\right) ^4\times 0.227}{4!(1-0.227)^2}\times 0.402 = 0.004 \end{aligned}$$
(21)

Average waiting time of packets in the queue,

$$\begin{aligned} W_q = \frac{Q}{\lambda } = \frac{0.004}{1000} = 4\times 10^{-6} \text{ sec } \end{aligned}$$
(22)

Average waiting time of packets in the system,

$$\begin{aligned} W_s = W_q + \frac{1}{\mu } = 4\times 10^{-6} + \frac{1}{1100} = 9.13\times 10^{-4} \text{ sec } \end{aligned}$$
(23)

Total number of packets present in the system,

$$\begin{aligned} S = Q + \frac{\lambda }{\mu } = 0.004 + \frac{1000}{1100} = 0.913 \end{aligned}$$
(24)
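Equations (13)-(18) can be evaluated the same way. The sketch below reproduces the M/M/4 worked example; small differences from the text's figures come only from rounding:

```python
from math import factorial

# M/M/m performance metrics, following Eqs. (13)-(18).

def mmm_metrics(lam: float, mu: float, m: int) -> dict:
    a = lam / mu                          # offered load, lambda/mu
    rho = lam / (m * mu)                  # utilization, Eq. (14)
    if rho >= 1:
        raise ValueError("system is unstable unless lambda < m * mu")
    p0 = 1.0 / (sum(a**n / factorial(n) for n in range(m))
                + a**m / factorial(m) / (1 - rho))        # Eq. (13)
    q = a**m * rho / (factorial(m) * (1 - rho)**2) * p0   # Eq. (15)
    wq = q / lam                          # Eq. (16)
    ws = wq + 1 / mu                      # Eq. (17)
    s = q + a                             # Eq. (18)
    return {"rho": rho, "P0": p0, "Q": q, "Wq": wq, "Ws": ws, "S": s}

r = mmm_metrics(lam=1000, mu=1100, m=4)
for name, value in r.items():
    print(f"{name} = {value:.4g}")
```

Note how small Q is here: with four servers the utilization drops to about 0.23, so almost no packets wait in the queue.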

M/M/\(\infty \) queuing system This model assumes a system with an infinite number of servers, so the processing capacity for packets is unlimited. No queue forms in this system, as an idle server is always available; hence the number of packets in the queue Q is zero. The arrival process and service times are assumed to follow Poisson and exponential distributions, respectively. The utilization rate \(\rho \) and the probability of no packets in the system, \(P_0\), are obtained by

$$\begin{aligned} \rho= & {} \frac{\lambda }{\mu } \end{aligned}$$
(25)
$$\begin{aligned} P_0= & {} e^{-\rho } \end{aligned}$$
(26)

Using these two parameters, as in the earlier models, various queue parameters can be calculated, such as the number of packets in the queue and in the system, and the waiting times in the queue and in the system.

Number of packets in the system is obtained by,

$$\begin{aligned} S = \rho = \frac{\lambda }{\mu } \end{aligned}$$
(27)

The waiting time in the queue \(W_q\), as well as in the system \(W_s\), is obtained by,

$$\begin{aligned} W_q = \frac{Q}{\lambda } = 0 \end{aligned}$$
(28)

Since every arriving packet is assigned to an idle server immediately, \(W_q = 0\).

$$\begin{aligned} W_s =\frac{S}{\lambda } \end{aligned}$$
(29)

Take an example of an M/M/\(\infty \) system, where \(\lambda = 1000\) pkts/sec, \(\mu = 1100\) pkts/sec and m = \(\infty \); various parameters are obtained as below.

$$\begin{aligned} \rho= & {} \frac{\lambda }{\mu } = \frac{1000}{1100} = 0.909 \end{aligned}$$
(30)
$$\begin{aligned} P_0= & {} e^{-\rho } = e^{-0.909} = 0.403 \end{aligned}$$
(31)

The busy period B of the system is obtained as,

$$\begin{aligned} B = 1 - P_0 = 1 - 0.403 = 59.7 \% \end{aligned}$$
(32)

Number of packets in the system,

$$\begin{aligned} S = \frac{\lambda }{\mu } = \frac{1000}{1100} = 0.909 \end{aligned}$$
(33)

Average waiting time of packets in the system,

$$\begin{aligned} W_s = \frac{S}{\lambda } = \frac{0.909}{1000} = 9.09 \times 10^{-4} \text{ sec. } \end{aligned}$$
(34)
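Since no queue ever forms, the M/M/\(\infty \) metrics reduce to a few one-liners. The sketch below reproduces the worked example:

```python
from math import exp

# M/M/infinity metrics, following Eqs. (25)-(29); Q and Wq are zero
# because an idle server is always available.

def mm_inf_metrics(lam: float, mu: float) -> dict:
    rho = lam / mu              # utilization, Eq. (25)
    p0 = exp(-rho)              # probability of an empty system, Eq. (26)
    return {"rho": rho, "P0": p0,
            "busy": 1 - p0,     # busy period B
            "S": rho,           # packets in the system, Eq. (27)
            "Wq": 0.0,          # Eq. (28)
            "Ws": rho / lam}    # Eq. (29); equals 1/mu, pure service time

r = mm_inf_metrics(lam=1000, mu=1100)
for name, value in r.items():
    print(f"{name} = {value:.4g}")
```

Note that \(W_s = S/\lambda = \rho /\lambda = 1/\mu \): with no waiting, the time in the system is just one service time.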

M/G/1 queuing system The M/G/1 queuing system consists of a single server with a sufficiently large buffer. Packet arrivals into the system follow a Poisson distribution. The system denotes the arrival rate, service time and inter-arrival time as \(\lambda \), S and I respectively. The inter-arrival time is the time between the arrivals of two consecutive packets. The service time is assumed to be smaller than the inter-arrival time [3]. The service time distribution is denoted by D(x) and the Probability Density Function (PDF) of the service time by d(x). The probability that the inter-arrival time of a packet is greater than its service time is shown as,

$$\begin{aligned} Pr(S< I) = \int _{0}^ \infty Pr\left( S< I\mid S= x \right) d(x)dx = \int _{0}^ \infty e^{-\lambda x}d(x)dx = d^*(\lambda ) \end{aligned}$$
(35)

If the service time distribution, D(x) is shown as,

$$\begin{aligned} D(x) = Pr(S < x) \end{aligned}$$
(36)

And the PDF, d(x) is,

$$\begin{aligned} d(x) = \frac{dD(x)}{dx} \end{aligned}$$
(37)

Then applying the Laplace transform to both of them gives,

$$\begin{aligned} D^*(s)= & {} \int _{0}^ \infty e^{-sx}D(x)dx \end{aligned}$$
(38)
$$\begin{aligned} d^*(s)= & {} \int _{0}^ \infty e^{-sx}d(x)dx \end{aligned}$$
(39)

If \(W(\tau )\) is the total number of packets served during an inter-arrival time \(\tau \), then the average number of packets served is shown as,

$$\begin{aligned} E(W(\tau )) = \int _{x=0}^ \infty E(W(\tau )| s=x)d(x)dx \end{aligned}$$
(40)

The Laplace transformation is shown as,

$$\begin{aligned} W^*(s) = \frac{D^*(s)}{1 - sD^*(s)} \end{aligned}$$
(41)
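Equation (35) can be sanity-checked numerically. The sketch below assumes, purely for illustration, an exponential service-time PDF with rate \(\mu \), whose Laplace transform has the known closed form \(d^*(\lambda ) = \mu /(\mu + \lambda )\), and compares it against a trapezoidal approximation of the integral:

```python
from math import exp

# Numerical check of Eq. (35): Pr(S < I) = d*(lambda), the Laplace
# transform of the service-time PDF d(x) evaluated at the arrival rate.
# Exponential service with rate mu is assumed here only as an example
# for which the closed form d*(lambda) = mu / (mu + lambda) is known.

def laplace_of_pdf(pdf, s: float, upper: float, steps: int = 100_000) -> float:
    """Trapezoidal approximation of the integral of e^(-s*x) * pdf(x) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (pdf(0.0) + exp(-s * upper) * pdf(upper))
    for i in range(1, steps):
        x = i * h
        total += exp(-s * x) * pdf(x)
    return total * h

lam, mu = 1000.0, 1100.0
service_pdf = lambda x: mu * exp(-mu * x)       # d(x) for exponential service
numeric = laplace_of_pdf(service_pdf, lam, upper=0.05)
closed_form = mu / (mu + lam)
print(numeric, closed_form)   # both approximately 0.5238
```

The integration is truncated at `upper=0.05` seconds, by which point the integrand has decayed to a negligible value for these rates.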

In [4], another explanation is given based on a queuing model. The authors considered the M/G/1 queuing model with D-policy, again assuming a single server with an unlimited buffer, where arriving packets are served First-Come First-Served. The mean service time of this model is expressed as \( E(B)=1/\mu \), where \(\mu \) is the service rate. The system uses a strategy named the customer utility function: if an arriving packet waiting in the queue is not served within the time limit D, which is assumed to be exponentially distributed, a cost C is applied to the packet. After a packet has been served, a reward R is applied. The net utility after serving a packet is calculated as below,

$$\begin{aligned} U = R - E(CX) \end{aligned}$$
(42)

where U, R and C are the utility value, reward and cost respectively, and E(CX) is the expected cost over the service time X.

There are two situations following the same strategy during the servicing of packets. The first situation arises depending on whether a packet starts service within the time limit D or after D expires. The utility value of the packet is calculated as,

$$\begin{aligned} U = R - E(CI) \end{aligned}$$
(43)

Here R is the reward applied to all packets after completion and I is an indicator parameter whose value, 0 or 1, changes depending on the expiry of the time limit.

The other situation concerns the completion of packet servicing. The utility is computed as,

$$\begin{aligned} U = R - E(CP) \end{aligned}$$
(44)

In the above equation, the parameter P takes the value 0 or 1 and behaves the same as I, following the same pattern (Table 2).

Table 2 Well known basic algorithms managing queuing delay

3 Delay and classification of flows

Apart from protocol-based classification, traffic is classified into two categories based on flow size, namely elephant flows and mice flows. An elephant flow is characterized as a traffic flow with a large volume of data, whereas a mice flow consists of small packets. Elephant flows are not deadline-sensitive but require high throughput, and they account for the majority of the traffic present in the network. Mice flows are delay-sensitive and must meet deadlines during transmission. Compared to elephant flows, mice flows are preferable for providing services to users. The queue length formed by mice flows is low, achieving low queuing delay. A flow scheduling scheme called Freeway has been proposed, which separates links into delay-sensitive links and high-throughput links [5].

The increase in user application activity leads to huge data generation, and data center networks provide a solution for managing this vast data. Such a network consists of a large number of connected servers and experiences both elephant and mice flows passing to and from the servers. Routing schemes give importance to elephant flows and improve throughput by increasing bandwidth, while routers ignore the delay incurred by mice flows. In [6], the authors considered this issue and proposed a routing scheme managing both elephant and mice flows. Their protocol reduces the delay of mice flows and maintains proper throughput for elephant flows.

In [7], the authors considered the importance of routing in handling these two different flow types. They proposed a label-based forwarding and routing method, in which elephant traffic is detected and split into mice flows. This helped provide delay sensitivity for the mice flows, and the authors ensured that mice flows are not affected by elephant flows. In [8], the authors focused on the detection and routing of elephant flows. They proposed a routing solution performing both traffic detection and multipath routing. The protocol routes elephant traffic to a low-delay path and ensures proper access to resources, making network performance efficient. An elephant flow detection mechanism for data center networks and its impact on delay is considered in [9]. In data center networks, elephant flows cause congestion and delay on a link. To tackle this problem, a suitable elephant flow detection mechanism and a scheduling algorithm are proposed; the scheduling algorithm is based on a stable matching between traffic flow tables and switches. Flow detection is also performed in [10] to manage flows efficiently, since otherwise the network buffers fill up with flows, leading to queuing delay.

4 Delay and congestion based header fields

4.1 Explicit congestion notification (ECN)

ECN is an optional notification mechanism for network congestion in the TCP/IP protocol suite. It is a two-bit field in both the TCP and IP headers that lets the sender know the buffer status along the communication path. It is used to adjust the transmission speed to overcome queuing delay, packet overflow and packet re-transmission: the sender reduces the transmission rate to avoid congestion that could occur along the route. ECN is available in the TCP header because TCP provides reliability rather than timely delivery, and the latency incurred in TCP is not acceptable for some applications. ECN helps in the proper functioning of congestion control mechanisms, notification of buffer status, dynamic routing, etc. [11].

The ECN position in the IP header is shown in Fig. 3 and in the TCP header in Fig. 4. In the IP header, two bits are used by the ECN field. The meanings of the four combinations of the two ECN bits are: 00, Non ECN-Capable Transport (Non-ECT); 01, ECN-Capable Transport (ECT(1)); 10, ECN-Capable Transport (ECT(0)); and 11, Congestion Encountered (CE). ECN must first be configured in every switch along the route before transmission in order to avail the facility. Sender and receiver may use either 10 or 01 for ECN-capable transport. If a router experiences congestion, observed from its buffer, it sets the bits to 11 (CE) to mark the notification in the header before packets start dropping [12]. After receiving the congestion notification, the receiver echoes it back to the sender, requesting a slower transmission rate. In the TCP header, two fields of 1 bit each represent ECN: ECN-Echo (ECE) and Congestion Window Reduced (CWR). They are used to acknowledge packets marked as CE and to signal a reduced congestion window, respectively [13].

Fig. 3

ECN bit field in IP header

Fig. 4

ECN bit field in TCP header
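The four codepoints above can be expressed as a small lookup. The sketch below decodes the ECN codepoint from the low two bits of the IP header's TOS/traffic-class octet (the helper name is ours, not from the text):

```python
# Decode the two ECN bits carried in the IP header's TOS / traffic-class
# byte. The codepoint meanings follow the four combinations listed above.

ECN_CODEPOINTS = {
    0b00: "Non-ECT (not ECN-capable)",
    0b01: "ECT(1) (ECN-capable transport)",
    0b10: "ECT(0) (ECN-capable transport)",
    0b11: "CE (congestion encountered)",
}

def ecn_from_tos(tos_byte: int) -> str:
    """The ECN field occupies the two least-significant bits of the byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]

print(ecn_from_tos(0b0000_0011))  # CE (congestion encountered)
print(ecn_from_tos(0b1011_1000))  # Non-ECT (not ECN-capable)
```

The second example shows that the upper six bits (the DSCP field) are ignored when extracting the ECN codepoint.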

4.2 Multi-protocol label switching (MPLS)

MPLS is one of many routing techniques; in MPLS, data are forwarded to relay nodes using labels. Using MPLS, the performance and efficiency of routing can be increased, which is significant for overall network performance [14]. MPLS resides between layer 2 and layer 3 and applies hardware switching for the transmission of packets. It is used for traffic engineering and packet switching. The basic idea of MPLS is to route packets using the label assigned by MPLS instead of traditional network addresses, which smooths and speeds up packet switching. MPLS also helps the network provide QoS services. In MPLS, the router used for routing is termed an LSR, which must be designed for MPLS. An LSR checks the MPLS header of a packet, assigns a label and passes the data to the next router. The label assigned to a packet is recorded in a table called the Label Forwarding Information Base.

IP packets of different sizes routed to the destination through intermediate nodes do not support fast data delivery. With the release of MPLS technology, the transmission delay is reduced to some extent. MPLS uses LSRs to label IP packets and creates a Label Switched Path for faster data transfer using a cut-through mechanism. This also helps in solving traffic engineering problems [15]. Routers involved in MPLS are core routers, and packets flow from the ingress port of the core network to its egress port. Router ports are connected to the output links where data are forwarded. Each port has a queue in which all packets arriving at the port are buffered before being forwarded to the output link [16].

5 Effect of delay on different types of network

5.1 Network delay on SDN

In SDN, the data and control planes are decoupled. A central controller controls the flow rules in the forwarding devices. SDN introduces the concept of a network operating system, and programmers can write applications and plug them into the controller. The performance of SDN is measured by various metrics such as delay and throughput. As the size of an SDN increases, delay generally increases. To measure and analyze the delay and find an equilibrium between delay and network size, network calculus has been proposed [17]. Network calculus focuses on propagation delay and its effects; the authors computed the relationship between size and delay and determined a suitable network size with affordable delay. A switch consists of a TCAM-based lookup table containing all flow rules. Arriving packets are matched against the lookup table and forwarded to the proper egress port. If no matching flow rule is found, the switch sends the packet to the controller in a packet-in event message. As the processing time of a controller increases, maintaining a low end-to-end delay becomes challenging: a single controller handling a large volume of traffic faces long processing times and thus increased end-to-end delay. Generally, separate queues are maintained in the controller and in the switch; all event-handler messages are buffered in the controller's queue, while switches maintain their own buffers for arriving packets. In [18], the authors applied an \(M^X/M/1\) queuing model for packets arriving at a switch and an M/G/1 queuing model for packet-in messages at the controller. The controller and switch have fixed service rates, so traffic is diverted to different paths to balance congestion. With the same aim of avoiding congestion, the authors of [19] proposed an algorithm that relocates arriving packets to different paths after examining the queues. 5G, although a wireless standard with high data rates, also suffers from queuing delay. In [20], the authors proposed SDN with 5G.
TCP packets suffer from expiring timestamps, and UDP traffic is mostly suppressed by TCP, causing more queuing delay. To address this, an algorithm named SDN-based softwarization was proposed, in which the optimal delay path is determined considering heterogeneous traffic and load intensity. The model applied virtualization, end-to-end delay optimization and queuing theory in the OpenFlow switch.

Queuing delay occurs at every switch on a path, and the sum of the delays at the different switches gives the total delay for end-to-end communication. In [21], the authors calculated the end-to-end average queuing delay under QoS constraints such as delay, reliability and low packet drop ratio. They measured the average queuing delay of a whole path instead of a single node; the queuing delay of the whole path describes the delay for all devices involved in the path. Their model controls end-to-end delay for both single-flow and multi-flow TCP. In the delay estimation, processing and propagation delays are neglected. Real-time systems generate data used in real-time scenarios such as online video conferencing, live chatting and video streaming. Real-time data are delay-sensitive and must reach their destination within a time limit.

In [22], to provide a one-way delay bound guarantee, the authors used Commercial Off-The-Shelf hardware. They gave high-priority traffic a low delay. The model provides a path to the destination considering both delay and bandwidth.

In general, SDN has so far been deployed in LANs and is not yet suitable for WANs. A single controller is enough to control a LAN; for a WAN, the whole network must be divided into small domains [23]. An individual controller is required for each domain, which creates the controller placement problem for a large network. Multiple controllers are distributed over the small domains. In the controller placement problem, the optimal placement of a controller depends on the propagation delay from the switches to the controller: among the many candidate locations, the suitable one is that for which the paths between the controller and its switches have minimum propagation delay. In [27], the authors divided the WAN into smaller sections and optimized the inter-controller delay, using metrics such as link delay, load and scalability in terms of the number of switches. The presence of multiple controllers helps reduce delay at the controller; in this connection, the authors of [28] pointed out the benefits of multiple controllers in SDN. In a wireless network, packets propagate from an edge node to the base station by relaying through intermediate nodes, and this intermediate communication causes delay in packet delivery. In [29], the authors discussed the impact of delay in an SDN-based wireless network where edge computing is implemented. They considered the processing of packets at the edge of the network before they reach the base station; in edge computing, packets are stored, processed and transferred. They focused on the controller's response time to the SDN switches. Multi-hopping is allowed between controller and switches when direct communication would span a large distance, and placing multiple controllers at the edge of the network reduces the delay caused by multi-hopping. They proposed linearization and supermodular function techniques for finding the optimal controller locations to better handle delay.

In [24], the authors proposed controller placement algorithms, K-median and K-center, to determine appropriate positions. They claimed that controllers placed at these positions cover the maximum number of switches within a certain delay bound. Some bio-inspired algorithms are also used to solve the controller placement problem. Like K-median, the K-shortest path algorithm is applied in SDN for better forwarding of user flows with low end-to-end delay. SDN switches use a cache to store the flow rules used for forwarding multiple user flows, and this forwarding causes end-to-end delay. To minimize this delay, the authors of [32] proposed a joint optimization problem considering both rule caching and flow forwarding. The problem is divided into three sub-problems: candidate path selection, flow forwarding and rule caching. To solve these sub-problems, a priority-based rule caching algorithm, the Lagrangian dual method and the K-shortest path algorithm are used. In [25], the authors proposed the Salp Swarm Optimization Algorithm to minimize latency and energy and maximize reliability. In [26], the authors used bio-inspired swarm optimization for the same placement problem, applying a metaheuristic approach in a Software-Defined Mobile Network. The implementation of SDN in a medical application in [30] gives priority to critical audio, video and voice data; the authors proposed a routing protocol that assures QoS based on the type of data.

In [31], the authors considered small-scale data centers, known as cloudlets, deployed nearest to mobile nodes to achieve minimum delay. Mobile nodes offload data to their nearest cloudlet instead of distant servers, achieving lower latency and better efficiency. However, proper management and deployment of cloudlets, using SDN switches, is needed to avoid load imbalance (Table 3).

Table 3 Summarizing on various delay management algorithms in SDN

5.2 Delay on wireless network

Delay in a wireless network impacts lifetime, QoS, packet loss, load balance, re-transmission, etc. In a multi-hop network, measuring the end-to-end delay is difficult owing to the hidden and exposed terminal problems. In [33], the authors proposed an analytical model to predict the media access delay at a node in a single-hop or multi-hop network. The model determines the distribution function of the node to compute media access delay, using the state-transition probabilities of a Markov chain to measure queuing and service delay. In [34], the authors used a Markov chain model to calculate channel access delay and the M/G/1 model for queuing; the model was then extended to calculate end-to-end packet delay in multi-hop ad hoc networks. In some situations a wireless node transmits multiple packets to various nodes; this is called parallel transmission and makes maximum usage of network resources. In [35], the authors focused on optimizing parallel end-to-end transmissions to minimize delay, combining two techniques: transmission scheduling and routing. They pointed out that the end-to-end delay is the sum of the time slots required for transmission between the two ends, so the delay is reduced by minimizing the number of slots. For the routing technique, they performed interference cancellation. The obtained results show a reduction in the overall delay. In [36], the authors considered a wireless multi-robot network for delay reduction, where robots perform multi-hopping. They focused on the issue of limited bandwidth, assuming that limited bandwidth results in multiple message re-transmissions. Delay occurs because the robots buffer messages generated at different times and send them at the same time. For the buffering, they assumed arriving packets are handled in FIFO order, which generates a certain type of flow that causes delay. Considering these issues, they implemented a UDP-based multi-router, and the results show a reduction in delay.

MANET is a decentralized, infrastructure-less wireless network. Nodes must flood packets to maintain connectivity, discover the topology and update routes periodically. This causes congestion in the network, which leads to delay and battery depletion. In [37], the authors considered network congestion control and power and delay analysis for another form of MANET, known as a Flying Ad Hoc Network, where interference is another cause of delay. They introduced an asynchronous update algorithm, a delay scale factor, and primal and dual decomposition techniques in their optimization model. In [38], the authors performed a delay analysis of MANET, focusing on delay estimation for a flooding routing protocol; they developed an analytical M/M/1 queuing model that determines the delay. In [39], the authors focused on the impact of delay and throughput in content-centric wireless networks, which follow multi-hopping in a wireless caching network architecture. User activity mainly consists of retrieving data from node caches, and the delay depends on the caching technique. The proposed algorithm is a joint caching and transport scheme in which users retrieve content from the node at the nearest Euclidean distance; the caching strategy was further optimized to minimize the average network delay. In [40], the authors explained the provision of QoS in a control system for both wired and wireless networks. They stated that minimum end-to-end delay and guaranteed reliability help support QoS. As reliability is a challenging task for wireless networks, a QoS framework was proposed for wired/wireless hybrid networks; this model guarantees QoS to all devices both in the control system and in the wireless network.

A base station in a wireless network provides communication between the network and its users. In [41], the authors considered using virtual base stations and analyzed the impact of delay on network virtualization. The virtual base stations are used for collecting and queuing data. The number of virtual base stations equals the number of users, and each virtual base station contains a queue holding the data used to serve its user. In this scenario, a cross-layer stochastic optimization algorithm was used to achieve minimum delay; it maintained an optimal queue length and reduced delay. In [42], the authors pointed out that the switching of users is dynamic and causes a delay called reconfiguration delay. An increase in reconfiguration delay decreases the stability of the queuing system, which results in a single node being served for a long time. They applied 1-lookahead and Correlated-Queue-length-based Biased Max Weight to this issue, and the results showed an improvement in delay.

Network coding is an interesting area in which encoding and decoding of data are performed during transmission. Packets are sent via relays and are either encoded or left uncoded during transmission. Since encoding requires additional time, the authors in [43] proposed a delay-aware network coding method for adversarial traffic. To maintain a balance between transmission cost and delay, they designed a scheduling algorithm that decides whether coding is required for a transmission. The performance of the algorithm is compared with the minimum achievable total cost. In [44], packets are regulated in a multi-hop wireless network to reduce delay. The authors proposed a flow-based scheduling algorithm with dynamic scheduling and congestion control, which minimizes the end-to-end delay while keeping the throughput ratio high (Table 4).

Table 4 Summary of various delay problems in wireless networks

5.3 Delay on WSN and IoT

The characterization of delay is an important parameter in WSN. Sensors sense, collect, and forward: sensed data are forwarded to a base station or a gateway. Sometimes dead sensors disturb performance and increase the delay. In [45], the authors formulated various types of delay analysis in WSN, such as per-hop delay, random source-to-destination delay, and multiple-sources-to-one-sink delay. They considered the node density and the hop count of the route. Sensors have limited energy, storage, and processing power. Duty cycling, a method of periodic wake and sleep modes, is used to cope with limited energy. The duty cycle takes two forms, asynchronous and synchronous. In asynchronous mode, nodes wake up at different times; in synchronous mode, nodes wake up simultaneously with their neighbors. In [46], the authors took up these duty cycles and explained that the synchronous form helps identify child nodes of the same parent and lets them wake up at the same time, so broadcast packets flooded by the child nodes can be received by the parent simultaneously. Their proposed delay-aware synchronous duty-cycle scheduling algorithm helps reduce the waiting delay at the receiver. In [47], the authors focused on the asynchronous duty cycle, assuming packets are multicast in WSN. They claimed that in asynchronous mode, multicasting leads to longer transmission delay: sensors may be in the sleep state while packets arrive, resulting in packet drops and re-transmissions. This problem is known as Minimum Active time slot Augmentation for Delay-bounded Multi-cast. They proposed several optimal algorithms, including a heuristic latency-bounding algorithm, approximation algorithms, and a distributed algorithm.
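The waiting delay that asynchronous duty cycling introduces can be illustrated with a small Monte Carlo sketch (a toy model, not a protocol from the cited works): if a receiver wakes once per period at a phase unknown to the sender, a sender arriving at a random instant waits on average half a period.

```python
import random

def expected_wait(period: float, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the sender's waiting delay when the receiver
    wakes once per `period` at an offset the sender does not know."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        send_time = rng.uniform(0, period)  # sender transmits at a random phase
        total += period - send_time         # must wait until the next wake-up
    return total / trials

# With a 100 ms cycle the average waiting delay approaches period / 2 = 50 ms.
print(round(expected_wait(100.0), 1))
```

This is the baseline delay that synchronous scheduling, as in [46], removes by aligning wake-up times of a parent with its children.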

Nodes in sleep mode cannot transmit data, which causes a delay known as sleep delay in duty cycling. The authors in [48] proposed a sleep-delay-based algorithm named Dynamic Duty Cycle. This algorithm considers nodes in non-hotspot areas and prolongs the active periods of the duty cycle, so nodes stay awake longer, reducing the sleep delay. In [49], the authors studied the sensing, processing, and transmission delays in sensors. They estimated the effective parameters in WSN considering these three unavoidable delays and proposed an algorithm named Distributed Incremental LMS. In [50], the authors focused on maintaining a balance between the delay bound and power consumption during transmission. They proposed a delay-bound-based algorithm that minimizes power consumption and maximizes the network lifetime, using a stochastic approach named network calculus for performance analysis. Data dissemination, a term used in WSN, describes the transmission of signals or other data to end users. Fast data dissemination between sensors and actuators is important for WSN, but resource constraints in nodes make it challenging to meet the delay requirements of a network. In [51], the authors proposed an epidemic-inspired algorithm that addresses this problem. Based on epidemic theory, it models the dissemination of data in a network. The algorithm has no fixed parameters or predefined schedules, resulting in automatic control of node states to reduce delay in the network.

The delay on a route depends on the forwarding nodes, and improper selection of a forwarding node hampers QoS. In [52], the selection of the fittest forwarding node is discussed. The authors focused on the optimal strategy for selecting a forwarding node from a collection of candidates. The optimal forwarding node stays in the wake-up state to avoid packets waiting in queues at different stages. They implemented this strategy in an algorithm named Optimization Duty-Cycle and Size of Forwarding Node Set and showed better results in delay reduction. The clustering technique used in WSN also experiences delay. In this technique, a cluster head node collects data from its neighboring nodes and sends it to the sink node. In [53], the authors aimed to maintain a trade-off between energy consumption and delay. They pointed out that low reliability leads to data re-transmission and causes delay. They proposed Broadcasting Combined with Multi-NACK/ACK, an algorithm for data gathering, which showed efficient delay as well as energy consumption. For a large network, multiple gateways are required to maintain service. In [54], the authors discussed issues relating to multiple gateways in IoT-based WSN. They observed an increase in delay with a rise in the number of gateways and proposed a gateway-to-gateway load-balancing algorithm to maintain a bounded delay. Data fusion in a wireless sensor network collects data from multiple sources and extracts useful, accurate information, but it causes delay overhead and leads to high power consumption. To maintain a trade-off between delay and energy, the authors in [55] proposed a hybrid delay-aware adaptive clustering. A node with multiple sensors has to process and communicate data for different sensed parameters in the network, resulting in high end-to-end delay and high energy consumption.
The same situation is depicted in [56], where the authors proposed a multiple-tree algorithm to improve network performance. In this algorithm, nodes equipped with a single sensor form a tree-based topology, which helps reduce both the end-to-end delay and the energy consumption in WSN (Table 5).

Table 5 Summary of various delay problems in WSN

IoT is a collection of heterogeneous devices working under different architectures; the devices differ in their MAC and physical layer architectures. These devices work together to perform collaborative tasks such as control, sensing, and decision-making. In [57], the authors explained one of the common standard MAC-layer hopping techniques, Time Slotted Channel Hopping, with attention to reducing the end-to-end delay in IoT. They proposed an algorithm named Distributed Stratum Scheduling, which schedules transmissions for a bounded end-to-end delay. In [58], the authors explained that link disconnectivity, weak radio signals, and dynamic topology worsen delay in IoT. They proposed an algorithm named Adaptive Dynamic THreshold queue management and introduced nodal delay as a parameter, defined as the sum of all delays incurred at every step from arrival to departure. The algorithm bounds the nodal delay under dynamic conditions. Apart from this, in [59], the authors focused on routing techniques in FANET, a kind of IoT network that helps different IoT devices communicate. They proposed an adaptive distributed routing protocol to cope with the delay issue in FANET. This protocol routes only local packets among common nodes. They also used an optimization algorithm that minimizes the network delay. Lastly, they proposed a dual decomposition that lets nodes transmit packets using local data and determine the delay before sending. The MAC layer in IoT uses a superframe structure for communicating devices, but under adaptive traffic this structure suffers from high bandwidth wastage and high delay. This issue is considered in [60], where the authors proposed a new superframe structure in which the contention slots are reduced and the superframe structure is fine-tuned.
This results in less delay and better utilization of the link. One important measurement of delay in a multipath radio channel is the root mean square (RMS) delay spread, which is considered in [61] in an IoT scenario. A neural-network-based model is proposed there to characterize the RMS delay (Table 6).
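The nodal delay introduced in [58] is simply the sum of the per-step components a packet incurs at a node. A minimal sketch (the function name and figures are illustrative, not from [58]):

```python
def nodal_delay(processing: float, queuing: float,
                transmission: float, propagation: float) -> float:
    """Per-hop (nodal) delay in seconds: the sum of every delay component
    a packet incurs from arrival at a node until arrival at the next node."""
    return processing + queuing + transmission + propagation

# 1 ms processing + 4 ms queuing
# + (1500 bytes * 8 bits) / 10 Mb/s = 1.2 ms transmission
# + 0.5 ms propagation = 6.7 ms for this hop.
print(nodal_delay(1e-3, 4e-3, 1500 * 8 / 10e6, 0.5e-3))
```

Bounding the nodal delay, as the queue-management algorithm in [58] does, therefore amounts to controlling the queuing term, which is the only component that grows without bound under congestion.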

Table 6 Summary of various delay problems in IoT

5.4 Delay on mobile communication network

A mobile communication network is a cellular network of macrocells. Each macrocell consists of smaller microcells, and each microcell contains a network named a Machine-to-Machine (MM) network. In an MM network, devices are called Machine Type Devices (MTD) and the gateway is called a Machine Type Gateway. All data collected at the gateway are transmitted to the base station in each microcell. In [62], the authors focused on the delay incurred by these MTDs in sending packets to the base station. To minimize the delay, they proposed a packet adaptation scheme that makes proper use of the arriving packets: at the gateway, packets are combined into a large packet, or a large packet is split into smaller ones. Their results showed a reasonable reduction of the average delay in the network. The infrastructure of a mobile network is not sufficient to provide QoS to users; traffic is random in nature, causing congestion in certain areas. Offloading data from a highly congested cellular network to another, less congested network is an alternative way to balance the load. In [63], the authors worked on this type of delay for data offloading via heterogeneous networks. In a delay-based offloading technique, data are offloaded through WiFi, a device-to-device network, or the cellular network itself. If data is delay-sensitive, it is offloaded to a WiFi network; otherwise, it continues through the cellular network. They used a Markov decision process in their hybrid offloading and monotone offloading algorithms. The same offloading approach is introduced in [64]. Unlicensed spectrum is used due to the scarcity of licensed bandwidth, and its increasing hybrid use causes delay. WiFi-based offloading alone cannot sustain the data flow, so LTE WLAN Aggregation addresses this issue.
They proposed an algorithm named Delay-Aware LTE WLAN Aggregation for offloading, striking a balance between cost and delay performance in a cellular network. In [65], the authors examined the effects of link failure on data offloading. The offloading process sometimes pauses due to link failure, and the device must wait for the link to recover. They used an M/M/1 queuing model to track the waiting time of the device, which follows an exponential distribution. If the deadline expires, the offloading process stops; otherwise, it continues regularly.

4G LTE technology in mobile communications offers better response time and lower latency than 3G. In 4G, the steps to follow when sending data to the base station through a random access channel are selection of a base station, then allocation and management of resources for transmission. 4G technology suffers from delay overhead, so the authors in [66] turned to 5G. They proposed a random access model for a single 5G microcell; the results show the optimal number of re-transmissions and optimal resource allocation. In 4G networks, users demand a precise quality of service that guarantees low delay and high throughput. With the same aim, the authors in [67] proposed a scheduling scheme that captures the relation between delay and throughput, providing a throughput guarantee and delay awareness within the scheme in a 4G network. In [68], the authors investigated delay in 5G. They proposed a Multiple-Input-Multiple-Output (MIMO) cascade control system, a round-trip-delay controlling technique with strict control of delay among multiple paths. MIMO is considered somewhat more advanced in controlling delay than the TCP protocol and AQM.

Sensor nodes perform different tasks in a mobile wireless network depending on their type. Mobile nodes move in search of events and then sense data, whereas static nodes directly send their sensed data to the sink node. In [69], the authors explained the concept of faster communication with the sink node. To select a relay node to the sink with a faster response time, Fast Response and Multihop Relay Transmission with Variable Duty Cycle (FRAVD) is proposed, ensuring sufficient energy capacity for transmission. In [70], the authors proposed a 5G model for efficient use of low-delay spectrum. Two algorithms, Non-orthogonal Multiple Access (NOMA) and Efficient Capacity (EC), are proposed. NOMA creates a Virtual Wireless Network using a small spectrum of multiple service providers by forming network slices, and the network slices ensure the delay constraint through resource allocation. The EC method calculates the statistical delay depending on the type of application and determines economical low-latency communication. In [71], the authors explained the requirement of a random access channel for low latency. The delay on the random access channel differs with the type of arrivals; arriving packets are generally modeled with Poisson or beta distributions. They proposed an algorithm named Admission Control based Traffic Agnostic Delay Constrained Random Access (AC/DC-RA) and used the stochastic delay in the random access channel. An increase in the number of subscribed users in a mobile wireless network leads to large amounts of data and energy consumption. Nodes select the minimum-energy path for communication to preserve battery power. Multicasting is a well-known technique for sending data from a single source to multiple destinations, but it also suffers from high power consumption and latency. In [72], the authors distinguished applications with delay-sensitive data in real-time scenarios.
Delay-sensitive data are given a certain threshold time before delivery. Time division, a technique in which time is divided into equal slots, is performed at a node before transmission. They proposed the DeMEM & ConMap algorithm, in which a delay constraint is imposed on multicasting for a certain time slot.

Handoff is a mechanism used in cellular networks when mobile devices move from one network to another. At handoff initialization, a Router Solicitation Message is sent to the access router of the new subnet, and in response the access router replies with a Binding Acknowledgement message. The time taken to perform this mechanism is called the handoff delay. In [73], the authors analyzed the effects of signaling cost, delay, and failure rate on handoffs. They considered a mixed IPv4/IPv6 network supporting node mobility. The incurred delay arises from transmission, servicing, queuing, and propagation. In IPv4 networks, mobile nodes suffer from a handoff delay known as the Triangular Network problem. They considered the handoff mechanism to have failed if the handoff delay crosses a threshold value. The results showed that as the traffic load increases, the average handoff delay also increases. The Cloud Radio Access Network, an important component of 5G networks, requires a strict delay bound in the fronthaul network. The authors in [74] studied the delay bound of a network having mixed backhaul and fronthaul; in this study, the network calculus concept, which derives the worst-case delay bound, is adopted. Nodes in VANET suffer from rehealing delay due to link disconnection. To calculate this delay, the upstream or downstream delivery of a message from the mobile nodes is considered. The authors in [75] analyzed the delay caused during the delivery of downstream messages, using the central limit theorem to derive the end-to-end delay distribution by calculating the rehealing delay in the network (Table 7).

Table 7 Summary of delay management algorithms in mobile communication networks

5.5 Delay tolerant network

A wireless network can be either infrastructure-based or infrastructure-less. An infrastructure network has an access point to which all nodes connect and through which they exchange messages. In an infrastructure-less network, communication occurs directly without any access point. MANET and DTN both fall under infrastructure-less networks, but despite this they differ in their properties. Unlike MANET, in DTN the routing of data through an end-to-end path is not possible. DTN is characterized as a network of unreliable links, high bit error rates, and high or variable delays. In DTN, due to node mobility, an end-to-end connection is hard to maintain, so store, carry, and forward strategies are applied: when a node receives a packet to forward, it keeps the packet in its buffer until a relay node capable of relaying it toward the destination appears. Various protocols are used in DTN for communication, such as Epidemic, PRoPHET, and MaxProp. Among them, the Bundle Protocol is one of the currently used DTN protocols. In [76], the mechanism of the Bundle Protocol in the application layer is described: data are grouped into blocks, and these blocks are transmitted to other nodes. In DTN, delay occurs because packets are stored in buffers, and nodes may not be able to find a proper relay node for communication. In [77], the authors stated that the Bundle Protocol provides an overlay of store-and-forward network communication, helping cope with routing challenges such as the lack of an end-to-end path, high mobility, and dynamic network topology.

The routing of data in DTN differs from the protocols applied in MANET. In DTN, due to high mobility, there is no assurance of a proper path from source to destination. Some of the routing protocols are PRoPHET, MaxProp, and Spray and Focus; they follow the store-and-forward mechanism. In [78], the authors explained the techniques of the store-and-forward mechanism. Devices have a communication range, and any device found within that range can exchange messages. All devices are identified by a unique ID called an Endpoint Identifier (EID). Generally, the same packets are copied and forwarded to the next hops until they reach the destination: when transmitting a packet, the source node copies it multiple times and then transmits, which increases the chance of delivery at the destination. Routing protocols are categorized into two types, direct delivery and epidemic delivery. In [79], the authors explained that in direct delivery, data transmission occurs only when the source and destination are within each other's communication range, which creates delay. In epidemic routing, waiting for the source and destination to be in range is not required: the received data is forwarded to other devices, and from there further forwarded to devices near the receiving device, continuing until it reaches the destination. The successful delivery rate of epidemic routing is higher than that of direct routing.

For routing in DTN, the positions of nodes are required, but with high mobility it is hard to determine their locations at all times. However, mobile nodes follow particular movement patterns, and the chance of revisiting the same location is generally high, making it possible to predict node movement. This type of routing is called probabilistic routing, which uses proximity for data forwarding. In [80], the authors explained probabilistic routing, in which forwarding is performed based on the proximity of destination or relay nodes. The proximity value, ranging over [0, 1], is defined as the probability of delivering a packet to a given destination. When two devices encounter each other, their computed proximity values are exchanged, and a threshold value determines whether a node acts as a forwarding node. Probabilistic routing fails to bound the number of forwarding nodes, which increases the required resources. In [81], the authors proposed spray routing for proper balancing of resources. Two protocols, Spray & Wait and Spray & Focus, come under spray routing. In Spray & Focus, the sender makes a copy of the packet and sprays (sends) it to its neighbor; the neighbor only forwards the received packet to another relay node. Since copying of a packet is performed only at the source, once at a time, this is called a single-copy utility scheme. Each node maintains a timer value, defined as the time elapsed between the nodes' connection and disconnection. A higher timer value gives higher priority in the selection of a forwarding node. Suppose a node X wants to deliver a packet to a destination Y. Node X will forward the copied message to a node, say P, if and only if,

$$\begin{aligned} Z_P(Y) > Z_X(Y) \end{aligned}$$
(45)

Here \(Z_P(Y)\) and \(Z_X(Y)\) denote the timer values of nodes P and X, respectively. Further forwarding of packets to intermediate nodes is performed under the same condition until the destination is reached. Routing is further categorized into pure opportunistic and social-based. In [82, 83] and [84], the authors explained these two categories. In social-based routing, the relay agent is selected using various performance measures, whereas an opportunistic routing protocol selects a relay node under a strategy called metric-based selection.
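The forwarding rule of Eq. (45) can be sketched directly. The class and field names below are illustrative, not taken from [81]; only the comparison \(Z_P(Y) > Z_X(Y)\) mirrors the text:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    # timers[dest] = utility value Z_node(dest); a higher value means the
    # node has a more favorable encounter history toward `dest`.
    timers: dict = field(default_factory=dict)

    def z(self, dest: str) -> float:
        return self.timers.get(dest, 0.0)

def should_forward(carrier: Node, candidate: Node, dest: str) -> bool:
    """Hand the single copy to `candidate` only if Z_P(Y) > Z_X(Y)."""
    return candidate.z(dest) > carrier.z(dest)

x = Node("X", {"Y": 2.0})
p = Node("P", {"Y": 5.0})
q = Node("Q", {"Y": 1.0})
print(should_forward(x, p, "Y"))  # True: P has the higher timer value for Y
print(should_forward(x, q, "Y"))  # False: Q is a worse candidate than X
```

Because the inequality is strict, the single copy only ever moves to a strictly better carrier, so the packet's utility toward Y is monotonically non-decreasing along its path.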

6 Delay in routing protocol

A sink node collects data from sensor nodes and acts as a base station. Sink nodes have limited battery capacity and suffer from quick energy depletion. A mobile sink reduces energy consumption because the nodes located near the sink keep changing, but sink mobility leads to delay along with an increase in collisions. In [85], the authors proposed an energy-efficient routing algorithm named nested routing for dealing with delay under a movable sink. Two kinds of nodes, normal sensor nodes and router nodes, are used in this network. The router nodes form a ring and keep track of the sink position inside it; any change in the sink's location is reported to the routers, and when sensors need to send data, they obtain the updated location from a router. Different algorithms use different techniques for path selection under certain conditions; among them, the random selection technique provides good throughput. In [86], the authors proposed a random selection algorithm for path selection in DTN. They showed that DTN suffers packet loss under multi-copy routing, so proper buffer management is incorporated into the algorithm to increase the throughput. A router at the center of a mesh network is connected to sensors through half-duplex links, making simultaneous two-way transmission slow. Two issues affecting performance are the superframe length and the transmission slot order. Reordering the transmission slots and minimizing the number of slots to reduce the superframe length help improve network performance. In [87], the authors proposed two algorithms, JRS-Multi-DEC and RS-BIP, which perform these techniques, i.e., superframe length reduction and transmission scheduling. JRS-Multi-DEC performs routing with a low superframe length, and RS-BIP is used for transmission scheduling. These algorithms provided low end-to-end delay in comparison to other algorithms, e.g., the NJR and JRS-Shortest algorithms.
In an IoT-based WMN, during routing and load balancing the network experiences delay due to interference, known as interference delay. An efficient routing algorithm is required to protect the network from interference caused by congestion. In [88], the authors proposed a cluster-based routing algorithm that employs this concept of removing congestion. Their algorithm reduces redundant control messages, prevents interference delay, and provides load balancing. Load balancing helps select a path that reduces delay and increases throughput with a minimum level of interference.

Real-time data are delay-sensitive, and networks carrying such data must provide specific QoS, such as low delay and low energy consumption. Multipath routing is a suitable technique for reducing end-to-end network delay, but maintaining multiple paths is difficult due to network dynamicity. In [89], the authors proposed a QoS-oriented Multipath Multimedia Transmission Planning algorithm for managing multiple paths while balancing delay and energy usage. In [90], the authors considered the same issue of delay in multipath routing. They applied a genetic algorithm that forms a minimal-cost multicast tree connecting the source to the destinations. Network performance is affected by node or link failure; in particular, if a link carrying packets for multiple destinations fails, it causes delay, packet loss, and congestion. To avoid this, the allocation of a delay-constrained primary multicast tree and a protecting p-cycle is needed. This problem is termed the Delay-Constrained Survivable Multicast Routing Problem [91]. To solve it, the authors proposed Delay Constrained Disjoint-Paths Protection, Delay Constrained Link-disjoint Tree Protection, and Delay Constrained Span p-Cycle Protection, taking the delay constraint into consideration. These methods obtain backup resources that satisfy the delay constraint and recover the multicast tree.

Routing an arriving packet to any one of the neighboring nodes based on its capability or strength is termed anycast routing. In [92], the authors focused on delay in a WDM network with anycast routing. WDM provides efficient bandwidth and deals effectively with link failures, but transmission delay occurs while switching from one wavelength to another. To reduce this delay, they proposed a delay-aware anycast routing scheme called Delay-Constrained Wavelength Conversions Anycast Routing, which finds the minimal number of wavelength conversions while keeping the total delay under a delay constraint. Sensors in WSN are limited in energy, coverage area, and processing power, which constrains deployment. For communication, sensors use radio links whose strength varies frequently, and these links strongly impact the communication delay. To cope with this delay, the authors in [93] introduced a routing technique named Predicted Remaining Deliveries; according to them, routing protocols control the whole network for efficient performance. Expected Transmission Count (ETX) and Expected Transmission Time (ETT) are two metrics used to analyze the link quality and end-to-end delay of a routing path. Both ETX and ETT help manage the energy consumed by sensors and reduce the delay along a path. In [94], the authors proposed the Sojourn-Time-Backlog (STB) as a delay metric. They used the Back Pressure (BP) routing algorithm to improve throughput by stabilizing the queues. BP needs time to accumulate enough backlog to form a stable queue, which increases the delay; moreover, BP prioritizes the queues of a network by length, so a low-priority queue can wait for an undefined time, creating huge delay. They designed an STB-based back-pressure routing algorithm that improves delay while maintaining optimal throughput. In a wireless network, synchronization is required for receiving real-time data from nodes.
Synchronization balances the delay and delay variance during transmission. For multicast routing, a multicast tree is formed in which the delay and delay variance are bounded; forming such a tree is termed the Steiner tree problem. In [95], the authors focused on synchronization and its impact on delay. They applied Ant Colony Optimization, using the limit on delay variance as the constraint value. The results showed better convergence of the network delay (Table 8).
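The ETX and ETT link metrics mentioned above have standard closed forms: ETX = 1/(d_f · d_r), where d_f and d_r are the forward and reverse delivery ratios, and ETT scales ETX by the time a single transmission attempt takes. A minimal sketch (function names are illustrative):

```python
def etx(df: float, dr: float) -> float:
    """Expected Transmission Count: expected number of (re)transmissions
    needed for one successful delivery plus acknowledgement on a link.
    df, dr are the forward and reverse delivery ratios in (0, 1]."""
    return 1.0 / (df * dr)

def ett(df: float, dr: float, packet_bits: int, bandwidth_bps: float) -> float:
    """Expected Transmission Time: ETX scaled by the duration of one attempt."""
    return etx(df, dr) * packet_bits / bandwidth_bps

# A link with 90% delivery each way needs about 1.23 expected transmissions;
# at 1 Mb/s, a 1000-bit packet then costs about 1.23 ms on that link.
print(round(etx(0.9, 0.9), 2))
print(round(ett(0.9, 0.9, 1000, 1e6) * 1e3, 2))
```

Summing ETT over the links of a route gives the expected end-to-end transmission delay of that path, which is how these metrics feed into delay-aware route selection.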

Table 8 Summary of delay management algorithms in routing algorithms

7 Other areas

Internet Protocol (IP), a mechanism for packet traversal among networks, is the backbone of the TCP/IP protocol suite. An IP packet has a particular header format depending on the version, IPv4 or IPv6; there are 14 fields in the IPv4 header and 8 in the IPv6 header. In [96, 97], the authors explained these IP header formats in detail. The IPv4 header varies from 20 to 60 bytes, whereas the IPv6 header is fixed at 40 bytes. Although the packet size varies, there is a limit on the size of each packet, known as the Maximum Transmission Unit (MTU); a larger packet is fragmented into smaller pieces according to the MTU. In [98], the authors conducted a case study of IPv6 and showed that, at the Autonomous System (AS) level path, the delay and packet loss rate are similar to IPv4.
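MTU-driven fragmentation can be sketched as follows. This toy function (names and the dict layout are illustrative; it builds no real IP headers) splits a payload so that each fragment plus a 20-byte IPv4 header fits the MTU, with per-fragment payloads rounded down to a multiple of 8 bytes because IPv4 fragment offsets are expressed in 8-byte units:

```python
def fragment(payload: bytes, mtu: int, header_len: int = 20) -> list:
    """Split `payload` into IPv4-style fragments fitting within `mtu` bytes.
    Offsets are stored in 8-byte units, as in the real IPv4 header."""
    max_data = (mtu - header_len) // 8 * 8  # payload bytes per fragment
    if max_data <= 0:
        raise ValueError("MTU too small for the header")
    frags, offset = [], 0
    while offset < len(payload):
        chunk = payload[offset:offset + max_data]
        more = offset + max_data < len(payload)  # "more fragments" flag
        frags.append({"offset": offset // 8, "mf": more, "data": chunk})
        offset += max_data
    return frags

# A 4000-byte payload over a 1500-byte MTU: 1480 data bytes per fragment,
# so three fragments of 1480 + 1480 + 1040 bytes.
parts = fragment(b"\x00" * 4000, 1500)
print([len(p["data"]) for p in parts])  # [1480, 1480, 1040]
```

Each fragment traverses the network independently, which is one reason fragmentation adds delay: the destination must hold earlier fragments until the last one arrives before reassembly can complete.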

Packet delay does not directly affect packet loss, but large packet delays create fluctuations in the packet loss rate. They also compared the effect of packet reordering and found that IPv6 has a lower reordering rate. In [99], the authors addressed the related trade-off between delay and link utilization. They observed that if an unresponsive UDP flow exists alongside responsive TCP traffic, the trade-off cannot be satisfied. They proposed Minstrel PIE, an extension of the Proportional Integral controller Enhanced (PIE) algorithm, which improves the trade-off using a reference queue delay.

In a large-scale network, routing is challenging due to inconsistency in link connectivity, and a network must have an alternate way to recover from any link failure. To update the topology, routing protocols flood various packets through the network; this flooding causes a delay known as convergence delay. In [100], the authors focused on convergence delay and its effect in dynamic routing protocols. For this analysis, various dynamic routing protocols were considered: Interior Gateway Routing Protocol (IGRP), Open Shortest Path First (OSPF), Enhanced Interior Gateway Routing Protocol (EIGRP), Intermediate System to Intermediate System (IS-IS), Routing Information Protocol (RIP), and so on. They compared EIGRP and OSPF based on convergence delay, and the results show that EIGRP performs better than OSPF.

IP packets require certain QoS guarantees, e.g., on delay, for proper servicing of an application. An IP flow is either unidirectional or bi-directional; in a bi-directional flow, the QoS value in one direction is not necessarily the same as in the opposite direction. The delay measured along a single direction is called the one-way delay. In [101], the authors discussed the one-way delay incurred in the network. It is measured as the difference between the departure time at the source and the arrival time at the destination: each packet is assigned a timestamp at the source node and another timestamp as it reaches the destination node, and the difference between these timestamps gives the one-way delay.
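The timestamp-difference measurement can be sketched directly; the sample values below are hypothetical, and the usual caveat applies that source and destination clocks must be synchronised (e.g., via NTP or GPS) for the result to be meaningful:

```python
def one_way_delay(t_depart, t_arrive):
    """One-way delay of a packet: the arrival timestamp recorded at the
    destination minus the departure timestamp recorded at the source.
    Assumes the two clocks are synchronised."""
    return t_arrive - t_depart

# hypothetical (source, destination) timestamp pairs in seconds
samples = [(0.000, 0.042), (1.000, 1.038), (2.000, 2.051)]
delays = [one_way_delay(s, d) for s, d in samples]
print(sum(delays) / len(delays))   # mean one-way delay of the flow
```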

In [102], the authors explained that a cognitive network consists of two types of users: primary and secondary. They focused on increasing the throughput of secondary users while the delay requirement of the primary user is satisfied. The authors used a Markov fluid queue model and proposed adaptive Queue Management Policies (QMP). The performance of this algorithm depends on the channel status and the intensity of primary user traffic. Their comparison showed that the algorithm outperforms the hybrid interweave/overlay model.

In [103], a mathematical framework is proposed to deal with heterogeneous traffic flows in a network, from a single source to multiple sinks. The network flows are modeled by their flow dynamics, treated as Lagrangian flows, and grouped into categories such as FIFO. The model expresses the delay incurred at each junction of the network and is solved numerically using a feed-forward algorithm. The authors argue that the model is well posed and can serve as a solution to the underlying dynamic system problem.

In [104], a study is performed on an observable single-server queuing system. Based on observations such as the queue length and the server's status, a customer can decide whether to join the system or balk. The queue may be fully observable or only partially observable, in which case customers are unaware of the exact queue length. These cases are studied, and equilibrium mixed strategies are derived for an M/M/1 queuing model in a random environment. Several examples are given to illustrate the analysis.
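The fully observable case admits a classic threshold rule (Naor's model), which the paper's setting generalizes; a minimal sketch, assuming a service reward R, a waiting cost C per unit time, and service rate mu (parameter names are illustrative):

```python
import math

def naor_threshold(R, C, mu):
    """Observable M/M/1 join/balk rule (Naor): a customer who sees n
    customers in the system expects to spend (n+1)/mu time units, so
    joining pays off iff R - C*(n+1)/mu >= 0. The resulting threshold
    is n_e = floor(R*mu / C): join iff the observed n is below it."""
    return math.floor(R * mu / C)

def joins(n_observed, R, C, mu):
    """True if a customer observing n_observed in the system joins."""
    return n_observed < naor_threshold(R, C, mu)

print(naor_threshold(R=10.0, C=2.0, mu=1.0))  # threshold of 5 customers
```

With these numbers a customer joins on seeing 4 or fewer in the system and balks on seeing 5 or more; the partially observable cases replace this deterministic rule with equilibrium mixed strategies.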

In [105], the authors analyzed the delay characteristics of customers in a discrete-time multi-server queuing system with infinite capacity, where the service time is constant and spans multiple time slots. They first obtained the Probability Generating Function (PGF) of the system content, and from it derived the PGF of the customer delay. From this PGF, various quantities such as the mean delay, the delay variance, and the delay distribution are also derived. They provided numerical examples based on these results.

In [106], the authors studied traffic priority in the queue. They pointed out that the space allocated to high-priority traffic in a router is small, which prevents this traffic from accessing the output link and leads to starvation. The paper presents a model that focuses on extracting only the high-priority traffic, and introduces a queue management framework for determining packet loss and analyzing the impact of the high-priority traffic on the system (Table 9).
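The effect of a small high-priority buffer allocation can be illustrated with a toy slot-based loss model (the paper's framework is analytical; the Bernoulli arrival and service probabilities here are assumptions for illustration):

```python
import random

def drop_rate(buf_size, p_arrival, p_service, n_slots, seed=7):
    """Toy model of a small high-priority buffer: in each slot a packet
    arrives with probability p_arrival (dropped if the buffer is full)
    and one departs with probability p_service. Returns the observed
    fraction of arrivals that were dropped."""
    random.seed(seed)
    q = dropped = arrived = 0
    for _ in range(n_slots):
        if random.random() < p_arrival:
            arrived += 1
            if q < buf_size:
                q += 1
            else:
                dropped += 1
        if q and random.random() < p_service:
            q -= 1
    return dropped / arrived if arrived else 0.0

# small vs. larger allocation under the same overload
print(drop_rate(buf_size=2, p_arrival=0.6, p_service=0.5, n_slots=100_000))
print(drop_rate(buf_size=16, p_arrival=0.6, p_service=0.5, n_slots=100_000))
```

Shrinking the buffer raises the loss rate well beyond the unavoidable overload loss, which is the starvation effect the paper analyzes.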

Table 9 Summarizing on delay management algorithms in other areas

8 Discussion and conclusion

Among the various types of delay, the effect of queuing delay is the most significant in a network. Proper management of delay is possible through proper management of underutilized network resources. From this survey, it is clear that managing the different delay problems in a network requires various delay management tools, such as Little's theorem and the M/M/1, M/M/m, M/G/1, and \(M/M/\infty \) models. These models are based on Markov chains and analyze the characteristics of traffic flow and service rate; the arrival and service processes of the packets are assumed to follow Poisson and exponential distributions respectively. How delay affects network performance varies with the network environment. For example, the amount of delay imposed differs with the type of traffic flow: flows with large packets experience high packet delay and vice versa. Elephant and mice flows illustrate this issue: in an elephant flow the packets are large, whereas in a mice flow they are small. Packets in a mice flow are delay-sensitive in nature and achieve low queuing delay. In this survey, the delay is analyzed across various network environments, including the recent technology of SDN, where the delay experienced by the switch and the controller is taken into account. Estimating and calculating packet delay is an important task in SDN: while control messages are transmitted between switches and controllers, both experience queuing delay as well as propagation delay. To avoid congestion, packets are reallocated in the queue and diverted to less congested paths. Moreover, the deployment of 5G over SDN also suffers from network delay. Apart from that, the controller placement problem in SDN has emerged as a technique to obtain optimal communication delay between switches and controllers.
SDN is difficult to deploy in a large-scale network, so multiple small SDN domains, each covered by a controller, are required for better network performance. Placing controllers in optimal positions is necessary, as this impacts the propagation delay between controller and switch. This placement problem arises in the distributed control environment of SDN. Delays in a wireless network are mainly due to channel access, packet re-transmission, small buffer sizes, unorganized transmission scheduling, and so on. Our focus is on queuing delay, where different Markov chain based models, such as the M/M/1 and M/G/1 queuing models, are used to manage the delay. Like wireless networks, WSNs suffer from poor radio link channels, causing packet drops and repeated re-transmissions. A WSN is also characterized by highly mobile nodes, causing frequent link disconnections, which lead to delay. Routing is another way of dealing with network delay: improper routing causes traffic congestion, high energy consumption at sink nodes, packet drops, and high interference. Apart from that, the effects of delay in other networks such as IoT, mobile communication networks, and DTN are also discussed here.
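The Markov chain based models recurring throughout this survey can be summarized by the standard M/M/1 steady-state formulas, which also embody Little's theorem (L = lambda * W); a minimal sketch:

```python
def mm1_metrics(lam, mu):
    """Steady-state M/M/1 metrics for Poisson arrivals at rate lam and
    exponential service at rate mu (stable only when lam < mu):
      rho = lam/mu            server utilisation
      W   = 1/(mu - lam)      mean time in system (sojourn time)
      L   = rho/(1 - rho)     mean number in system (Little: L = lam*W)
      Wq  = rho/(mu - lam)    mean queuing delay (waiting before service)"""
    assert lam < mu, "queue is unstable when lam >= mu"
    rho = lam / mu
    W = 1.0 / (mu - lam)
    L = rho / (1.0 - rho)
    Wq = rho / (mu - lam)
    return rho, L, W, Wq

# e.g. 8 packets/s arriving at a server draining 10 packets/s
rho, L, W, Wq = mm1_metrics(lam=8.0, mu=10.0)
print(rho, L, W, Wq)
```

As the arrival rate approaches the service rate, W and Wq grow without bound, which is the queuing-delay blow-up that the surveyed management algorithms try to avoid.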

Next-generation networks such as 5G, SDN, and IoT are the future fields of research. The 5G network promises near-zero-delay services to providers, and its device-to-device communication helps achieve low delay in a cellular network; analyzing delay in device-to-device communication can be a good direction for future work. With the introduction of 5G there will be largely uncontrolled traffic, affecting the waiting time in a queue as the heterogeneity of traffic increases. Heterogeneous traffic makes TCP suffer from timer expiry while UDP is suppressed by TCP, causing delay in the queue; addressing this issue is another worthwhile objective. Another important area is DTN, which is significant for satellite communication: long delays and high packet loss characterize satellite links, and DTN provides solutions for them. Delay analysis in the distributed control environment of SDN, delay analysis for M2M communication in IoT, and traffic engineering in SDN are some other open research fields in networking delay (Table 10).

Table 10 Selective research papers and their equations