1 Introduction

Ever-increasing demands for high-speed data communication, coupled with the scarcity of radio resources and the adverse nature of the wireless medium, make it difficult to improve users’ QoS while using the available radio resources efficiently [3]. To overcome these difficulties, Orthogonal Frequency Division Multiple Access (OFDMA) has been adopted for LTE downlink radio access. However, despite its many advantages, a number of research challenges remain. One of the most important is the design of a packet scheduler that manages the assignment of users to the available resources.

The 3GPP standard leaves the design and implementation of radio resource scheduling open. Consequently, several attempts have been made to design packet schedulers that address different network requirements [14, 29, 33, 42]. Conventional wireless systems employ the Open Systems Interconnect (OSI) layered design, in which each layer is designed without considering the parameters of the other layers. However, this kind of architecture does not provide adequate support for the growing demand for multimedia applications or for rapid variations of the channel state. The reason is that the lower layers of the protocol stack (i.e., physical, MAC and network layers) are optimized without explicitly considering the specific characteristics of multimedia applications. Likewise, the techniques available at the application layer, such as streaming algorithms and multimedia compression, do not consider the mechanisms provided by the lower layers for resource management, scheduling and error recovery. To overcome these limitations, Cross Layer Design (CLD) has been introduced to use network resources efficiently by optimizing across the boundaries of strict layering. In other words, CLD provides a flexible architecture that enables interaction between different layers of the protocol stack. Many studies confirm that CLD can improve user satisfaction [6, 8, 15, 25, 26, 31, 40, 41]. In [18], CLD jointly optimizes the application, MAC and physical layers by introducing resource allocation based on MOS prediction. In [11], a CLD approach is presented that includes an adaptive radio link buffer and congestion control by adapting H.264/SVC. The status of the physical and data link layers is taken into consideration in [22] by adopting SVC video coding.

A considerable amount of the literature on LTE scheduling is mainly concerned with QoS metrics (e.g., throughput, packet loss and delay), such as the schedulers proposed in [7, 12, 21, 22, 32, 34]. However, the performance of such schedulers is unclear in terms of end-user perceived quality. Guaranteeing the QoS requirements of a real-time (RT) application does not necessarily lead to high user satisfaction, mainly because of the additional constraints imposed by RT traffic characteristics. Moreover, service demand has recently shifted toward multimedia applications; consequently, the objective of scheduling strategies has also changed from improving QoS to improving Quality of Experience (QoE). QoE is the assessment of the level of customer satisfaction with the provided service, which can be measured in terms of the Mean Opinion Score (MOS) [16]. The MOS method provides a common numerical measure of the user-perceived quality of video applications. Furthermore, these scheduling strategies are usually designed to support a specific type of application. In reality, however, the radio resources are shared among users with different types of applications. In this context, designing a packet scheduler that jointly considers RT and non-real-time (nRT) traffic is crucial given the heterogeneous nature of wireless traffic.

The remainder of this paper is structured as follows. Section 2 reviews related work. Section 3 describes the downlink LTE system model. Section 4 gives a detailed description of the proposed method. The simulation setup and results are discussed in Sect. 5, and the paper is concluded in Sect. 6.

2 Related Works

Since traditional wireless systems employ the Open Systems Interconnect (OSI) layered design, in which each layer is designed without considering the constraints of the other layers, this kind of architecture does not provide adequate support for the growing demand for RT applications, especially video applications. Unlike nRT services, video applications are highly sensitive to delay and transmission rate. Therefore, it is envisaged that QoS and resource utilization efficiency can be improved by taking the constraints of different layers into consideration. This approach is known as CLD. CLD provides a flexible architecture that enables interaction between different layers of the protocol stack. In the following, we discuss different CLD approaches and their characteristics.

A wealth of studies has focused on cross-layer packet scheduling for LTE networks in recent years. The authors of [22] proposed a CLD for multimedia applications to improve the performance of the LTE packet scheduler in terms of throughput and fairness. For this purpose, the CLD takes CQI, queue length and bit error rate into consideration. In another study, [31] proposed an efficient video streaming scheme over LTE networks, in which cross-layer signaling between the MAC layer and the Real-time Transport Protocol (RTP) is used to achieve channel-dependent adaptation in the video server. To exploit the frequency diversity of the channel, scheduling based on channel quality is used at the MAC layer. The author of [15] developed a cross-layer scheme in which the scheduler is aware of the channel rate at the physical layer and the queue state at the MAC layer. The goal of this design was to maximize the UEs’ QoS and achieve a certain level of fairness. A QoS-guaranteed cross-layer scheduling scheme is proposed in [38]. The proposed scheduler employs the MAC and physical layers to ensure QoS while maintaining throughput and fairness among users. Similar to the scheme in [38], the authors of [25] proposed a cross-layer scheduler based on the joint optimization of the MAC and physical layers, aimed at maximizing video quality by reducing video distortion at the application layer.

The aforementioned strategies are driven by QoS parameters without consideration of subjective perceived quality. Because maintaining and improving user satisfaction has become very important to network providers, QoE-driven schedulers have received significant attention in both academia and industry [2, 5, 18, 20, 28, 39, 40]. In [18], CLD jointly optimizes the application, MAC and physical layers by introducing resource allocation based on Mean Opinion Score (MOS) prediction. The scheduler uses the MOS value and remaining playback to make the allocation decisions. The authors of [39] developed an optimization framework by employing parameterized models of the application and link layers. In [40], a QoE-driven cross-layer optimization was proposed in which the application and the lower layers of the protocol stack are jointly optimized. The key objective of that work is to minimize the temporal change of the video quality. The authors of [20] proposed a cross-layer approach using MOS as an application-layer performance metric. In this scheme, the application, MAC and physical layers interact in order to maximize QoE. An experimental survey carried out in [5] aimed at incorporating subjective human perception into the cross-layer design. In that study, the sensitivity of users to the service response time for web browsing and to the data rate was used to estimate the MOS.

Although most of these works adopt CLD as a framework, the majority focus on RT traffic without taking nRT services into account. Another weakness is that most QoE-driven CLD schedulers do not consider the characteristics of multimedia applications (e.g., video frame type, quantization parameters). Moreover, some studies conducted surveys to evaluate user satisfaction through human feedback about a particular service. This technique is time-consuming and expensive, and it does not enable online monitoring.

3 System Model

In this paper, the emphasis is placed on the downlink transmission of an LTE system in which the eNB serves heterogeneous traffic, including RT and nRT services, in a multi-cell scenario. Figure 1 illustrates the scenario. It was assumed that the serving eNB buffers the video streams (pre-coded video sequences) received from the media server through a lossless, high-bandwidth backbone network. The H.264/AVC encoder was used to encode the video sequences generated by the media server, with a motion-compensation technique that uses a single reference frame. Each frame can be partitioned into one or more slices, where each slice header acts as a resynchronization marker [45]. For transmission over the network, the slices are packetized into video packets of fixed length.

Fig. 1 Multi-cell scenarios

At the application layer, generally, the quality of video is measured in the form of the Peak Signal-to-Noise Ratio (PSNR), which can be characterized as follows:

$$PSNR = 10\times \log _{10}\frac{255^2}{MSE}$$
(1)

where the Mean Square Error (MSE) specifies the distortion of the video sequence as the cumulative squared error between the compressed and the original image. This is discussed in more detail in Sect. 3.1. To evaluate the user-perceived video quality, the PSNR value can be translated into a MOS value, which presents the level of user satisfaction on a five-level scale ranging from poor to excellent video quality (Table 1) [18]. In addition to the MOS model, a frame priority scheme is taken into consideration to guarantee that important frames are given higher priority for transmission (a more detailed discussion is provided in Sect. 3.2).

Table 1 PSNR to MOS conversion

At the MAC layer, the delay requirement for user i can be expressed by:

$$P_r\{D_r > D_t\} \leqslant \delta _i$$
(2)

where \(D_r\) is the current delay introduced by the HoL packet delay and \(D_t\) denotes the delay threshold for the user i. Moreover, \(\delta _i\) is the maximum probability of exceeding packet delay. Therefore, the delay function in the timeslot t is defined as:

$$\psi (i)= \frac{-(log~\delta _i)D_r(t)}{ D_t \overline{r_i (t)}}$$
(3)

where \(\overline{r_i(t)}\) is the average channel rate of user i in the corresponding timeslot t which can be calculated as:

$$\overline{r_i(t)} = (1-\beta )\overline{r_i(t-1)}+\beta r_i(t-1)$$
(4)
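As an illustration of Eqs. 3 and 4, the following minimal Python sketch updates the average channel rate and evaluates the delay function for one user. The function names, the numerical values and the use of the natural logarithm in Eq. 3 are assumptions made for this example; \(\beta\) is treated as a smoothing factor between 0 and 1.

```python
import math


def update_average_rate(prev_avg: float, prev_rate: float, beta: float) -> float:
    """Eq. 4: exponentially weighted average of the channel rate."""
    return (1.0 - beta) * prev_avg + beta * prev_rate


def delay_metric(delta: float, hol_delay: float,
                 delay_threshold: float, avg_rate: float) -> float:
    """Eq. 3: psi = -log(delta) * D_r(t) / (D_t * average rate).

    The natural logarithm is an assumption; the text writes only 'log'.
    """
    return -math.log(delta) * hol_delay / (delay_threshold * avg_rate)


# Illustrative values only (not taken from the simulation setup).
avg = update_average_rate(prev_avg=2.0e6, prev_rate=3.0e6, beta=0.1)
psi_low = delay_metric(delta=0.05, hol_delay=0.02, delay_threshold=0.1, avg_rate=avg)
psi_high = delay_metric(delta=0.05, hol_delay=0.08, delay_threshold=0.1, avg_rate=avg)
```

With these inputs, a larger HoL delay yields a larger value of the delay function, which is the behaviour Eq. 3 is meant to capture.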

At the physical layer, OFDMA system using flat Rayleigh ergodic fading channel model was employed. It was assumed that eNB holds the perfect channel information and error-free channel estimation. According to [27], the instantaneous Signal-to-Noise Ratio (SNR) of user i on the subchannel j can be defined as:

$$SNR_{i,j} =\frac{\rho _{i,j} | \varPhi _{i,j}|^2}{\eta ^2}$$
(5)

where \(\rho _{i,j}\) is the power assigned to user i on subchannel j, and \(\varPhi _{i,j}\) is the channel gain, which is obtained by channel estimation. \(\eta ^2\) is the power of the Additive White Gaussian Noise (AWGN), which can be characterized as follows:

$$\eta ^2 =\frac{N_p \times B}{N}$$
(6)

where \(N_p\) denotes the noise power spectral density, B denotes the bandwidth and N denotes the number of subchannels.

Finally, the achievable instantaneous data rate of user i on subchannel j is expressed as:

$${\mathfrak {R}}_{i,j} =\frac{B}{N}~~log_2\left( 1+\frac{SNR_{i,j}}{\varphi }\right)$$
(7)

where \(\varphi = -\ln (5\,BER)/1.5\) and BER is the target bit error rate of the AMC.
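The chain from noise power to achievable rate (Eqs. 5 to 7) can be sketched in Python as follows. This is only an illustrative sketch: the function names and the example values are assumptions, and the SNR gap \(\varphi\) is passed in as a parameter rather than derived from the target BER.

```python
import math


def noise_power(noise_psd: float, bandwidth_hz: float, n_subchannels: int) -> float:
    """Eq. 6: noise power per subchannel, N_p * B / N."""
    return noise_psd * bandwidth_hz / n_subchannels


def snr(power: float, channel_gain: complex, noise: float) -> float:
    """Eq. 5: rho * |Phi|^2 / eta^2 (linear scale, not dB)."""
    return power * abs(channel_gain) ** 2 / noise


def achievable_rate(bandwidth_hz: float, n_subchannels: int,
                    snr_value: float, snr_gap: float) -> float:
    """Eq. 7: (B / N) * log2(1 + SNR / phi), in bit/s."""
    return bandwidth_hz / n_subchannels * math.log2(1.0 + snr_value / snr_gap)


# Illustrative values only: 10 MHz split into 50 subchannels.
eta2 = noise_power(noise_psd=4e-21, bandwidth_hz=10e6, n_subchannels=50)
snr_ij = snr(power=1e-12, channel_gain=3 + 4j, noise=eta2)
rate_ij = achievable_rate(10e6, 50, snr_ij, snr_gap=2.0)
```

A larger SNR gap \(\varphi\) (i.e., a stricter target BER) lowers the achievable rate for the same SNR.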

3.1 Quality of Experience (QoE) Evaluation Model

In order to estimate video distortion at the application layer, PSNR which is a widely used metric of objective measurement for video quality is employed. PSNR is a logarithmic form of Mean Square Error (MSE) [46]. Basically, for the objective of video quality assessment, the MSE can be defined between two monochromatic images, where one image is considered to be an approximation of the other. The MSE can be described as the mean of the square of the differences in the pixel values between the corresponding pixels of the two images. In addition, the MSE can specify distortion of video sequences in the form of the cumulative square between compressed and original image. The MSE could be obtained through calculating \(V_{dis}\), which is the sum of expected \(D_{ecd}\) and \(D_{loss}\),

$$V_{dis} = D_{ecd} + D_{loss}$$
(8)

where \(D_{ecd}\) is the quantization error introduced at the encoder and \(D_{loss}\) is the distortion due to packet loss, caused either by transmission errors or by late arrivals. Because of video compression, \(D_{ecd}\) is spread across the encoded frames and can be formulated as a convex function of the encoding rate [43]. Consequently, \(D_{ecd}\) can be characterized as follows:

$$D_{ecd} =\frac{\varTheta }{\varGamma - \varGamma _0} + \beta$$
(9)

where \(\varGamma\) represents the output rate of the video encoder. \(\varTheta\), \(\varGamma _0\) and \(\beta\) are the parameters of the distortion model which depend on the encoding structure as well as on the encoded video sequence [37]. In contrast, \(D_{loss}\) is defined by the relationship between packet loss rate (\(P_{loss}\)) and decoded video distortion (\(D_{ecd}\)) in a wireless channel. To be more precise, \(D_{loss}\) can be modeled by a linear function as follows:

$$D_{loss} = \sigma P_{loss}$$
(10)

where \(\sigma\) is assumed to be independent of the video encoder rate and depends on parameters related to the compressed video sequence. These parameters include the effectiveness of error concealment and the proportion of intra-coded Macro Blocks (MBs).

Therefore, the video distortion can be estimated through,

$$V_{dis} = D_{ecd} + D_{loss} = \frac{\varTheta }{\varGamma - \varGamma _0} + \beta + \sigma P_{loss}$$
(11)

Finally, PSNR can be expressed as:

$$PSNR = 10\times \log _{10}{\frac{255^2}{\frac{\varTheta }{\varGamma - \varGamma _0} + \beta + \sigma P_{loss}}}$$
(12)
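The distortion model of Eqs. 9 to 12 can be evaluated with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical and the parameter values (\(\varTheta\), \(\varGamma _0\), \(\beta\), \(\sigma\)) are made up for the example, since in practice they depend on the encoded sequence.

```python
import math


def encoder_distortion(rate: float, theta: float, rate0: float, beta: float) -> float:
    """Eq. 9: D_ecd = theta / (rate - rate0) + beta."""
    if rate <= rate0:
        raise ValueError("encoder rate must exceed rate0")
    return theta / (rate - rate0) + beta


def video_distortion(rate: float, p_loss: float, theta: float,
                     rate0: float, beta: float, sigma: float) -> float:
    """Eq. 11: V_dis = D_ecd + sigma * P_loss."""
    return encoder_distortion(rate, theta, rate0, beta) + sigma * p_loss


def psnr_db(distortion: float) -> float:
    """Eq. 12: PSNR = 10 * log10(255^2 / V_dis), for 8-bit pixels."""
    return 10.0 * math.log10(255.0 ** 2 / distortion)


# Hypothetical model parameters, chosen only to make the example concrete.
params = dict(theta=1000.0, rate0=0.0, beta=0.0, sigma=100.0)
psnr_clean = psnr_db(video_distortion(rate=500.0, p_loss=0.0, **params))
psnr_lossy = psnr_db(video_distortion(rate=500.0, p_loss=0.01, **params))
```

As expected from Eq. 12, a non-zero packet loss rate increases the distortion and therefore lowers the PSNR for the same encoder rate.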

3.2 Frame Priority Marking Scheme

As mentioned earlier, at the application layer, frame priority, which represents the contribution of each video packet to the perceived video quality, was taken into consideration along with the MOS value. The I-frame is a key frame because it contains the complete information for an image. Therefore, a missing I-frame may lead to more degradation in perceptual quality than a missing B-frame or P-frame. For instance, Fig. 2 depicts an example of equal packet loss in different video frame types. Packet loss in an I-frame clearly causes more severe degradation. Consequently, it is vital that the scheduler gives higher priority to packets that have more impact on video quality.

Fig. 2 Equal packet loss in different video frame types. a I-frame packet loss, b P-frame packet loss, c B-frame packet loss

Determining the importance of each frame relative to the others is still an open challenge. Many studies investigate this matter by analyzing the distortion [23, 24, 37]. However, these algorithms rely on the number of skipped or intra-coded Macro Blocks and on motion vector statistics, which require the compressed video data to be decoded. Such methods are computationally too expensive for low-power, low-cost network equipment. Consequently, a less complex priority strategy, presented in [44], is adopted here. In [44], it is assumed that one video frame is encapsulated in several data packets with the same priority.

As shown in Fig. 3, the Real-time Transport Protocol (RTP) was employed, where each video packet has an extended RTP header to record its priority (i.e., I-frame > P-frame > B-frame). Specifically, RTP provides an extension mechanism that allows individual implementations to experiment with new payload-format-independent functions that require additional information to be carried in the RTP data packet header [36].

Fig. 3 Extension of RTP header
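To make the marking scheme concrete, the following Python sketch builds an RTP packet whose header extension carries a frame priority and reads it back. The layout of the fixed header and of the extension (a 16-bit profile-defined identifier, a 16-bit length in 32-bit words, then the extension data) follows the RTP specification; the identifier `EXT_PROFILE_ID`, the numeric priority values and the function names are hypothetical and are not taken from [44].

```python
import struct

# Hypothetical priority encoding: higher value means more important frame.
PRIORITY = {"I": 2, "P": 1, "B": 0}
EXT_PROFILE_ID = 0x1001  # hypothetical "defined by profile" identifier


def build_rtp_packet(frame_type: str, seq: int, timestamp: int, ssrc: int,
                     payload: bytes, payload_type: int = 96) -> bytes:
    """Build an RTP packet with a one-word header extension holding the priority."""
    first_byte = (2 << 6) | (1 << 4)      # V=2, P=0, X=1 (extension present), CC=0
    second_byte = payload_type & 0x7F     # M=0, 7-bit payload type
    header = struct.pack("!BBHII", first_byte, second_byte, seq, timestamp, ssrc)
    # Extension header: profile id, length in 32-bit words, then one data word.
    extension = struct.pack("!HHI", EXT_PROFILE_ID, 1, PRIORITY[frame_type])
    return header + extension + payload


def read_priority(packet: bytes):
    """Return the priority stored in the header extension, or None if absent."""
    has_extension = (packet[0] >> 4) & 1
    if not has_extension:
        return None
    csrc_count = packet[0] & 0x0F
    offset = 12 + 4 * csrc_count          # skip fixed header and CSRC list
    (priority,) = struct.unpack_from("!I", packet, offset + 4)
    return priority


packet = build_rtp_packet("I", seq=1, timestamp=0, ssrc=0x1234, payload=b"abc")
```

A scheduler at the eNB could then call `read_priority` on each buffered packet without decoding the video payload, which is the main appeal of header-based marking.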

3.3 Video Traces for Network Performance

There are different methods to characterize encoded video for the research community: video bit streams, video traffic traces and video traffic models. Each has different drawbacks and benefits. As far as complexity, capacity and copyright are concerned, video traces avoid the limitations of the other two methods. A video traffic trace file gives the number of bits used for the encoding of the individual video frames. Specifically, a video trace is an ASCII file with one line per frame. Each line includes the frame index, frame type, time stamp and frame size. Table 2 gives the first ten lines of the trace file of the Foreman video encoded with a target bitrate of 440 kbit/s [13].

Table 2 H.264 trace file sample
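A trace file of this kind can be read with a short parser. The sketch below assumes whitespace-separated columns in the order given above (frame index, frame type, time stamp, frame size in bits); the column order, the comment convention and the sample lines are assumptions made for illustration and do not reproduce Table 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TraceFrame:
    index: int
    frame_type: str    # "I", "P" or "B"
    timestamp_s: float
    size_bits: int


def parse_trace(text: str) -> List[TraceFrame]:
    """Parse one frame per line; blank lines and '#' comments are skipped."""
    frames = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        index, frame_type, timestamp, size = line.split()[:4]
        frames.append(TraceFrame(int(index), frame_type, float(timestamp), int(size)))
    return frames


# Made-up sample lines, NOT the values of the Foreman trace.
SAMPLE = """\
# index type time[s] size[bits]
0 I 0.000 30000
1 P 0.033 8000
2 B 0.067 3000
"""
frames = parse_trace(SAMPLE)
total_bits = sum(frame.size_bits for frame in frames)
```

The parsed frame type can feed the priority marking of Sect. 3.2, while the frame sizes drive the traffic generator.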

4 The Proposed QoE-Oriented Cross Layer Scheduling

In this section, first, the architecture of proposed cross layer scheduler is introduced. It is followed by elaborating the different OSI layers that interact with each other to form the cross layer scheduler.

4.1 Cross Layer Architecture

In this study, a cross-layer scheduler that jointly optimizes the application, MAC and physical layers of the protocol stack is proposed. The objective of this design is to maximize user-perceived video quality while maintaining a degree of fairness among users. As shown in Fig. 4, our framework consists of different modules, including the video application, the Cross Layer Resource Allocator (CLRA), the scheduler and the transmitter.

In this framework, when a user requests a specific application, the respective data packets traverse the backbone network and arrive at the user’s buffer in the eNB. In particular, for video streaming, when a user requests a video application, the media server transmits the video frames to the eNB through the backbone network. The video packets arrive at the user buffer in the eNB and wait for a scheduling opportunity. Before that, the video application module calculates the distortion of the incoming video sequence \(V_d\) at the application layer. Afterward, the channel distortion \(V_{ch}\) is measured in the form of CQI feedback from the UE at the physical layer. Using the values of \(V_d\) and \(V_{ch}\), the CLRA module evaluates the corresponding MOS value. In addition to the MOS value, a video frame priority is considered, where the priority of each frame is determined by the contribution of its video packets to the perceived video quality. Finally, the CLRA module makes the allocation decision depending on the MOS value \(MOS_i\), the Head of Line (HoL) delay \(D_r(t)\) of the video packet in the user’s buffer and the frame priority weight \(F_{px}\).

Upon completion of the aforementioned tasks, the feedback procedures are initiated to update the other layers regarding further operations. In this context, the CLRA provides feedback to the scheduler module, which assigns Resource Blocks (RBs) to each user according to the allocation information. The CLRA also sends feedback to the video application module regarding the appropriate video bit rate. In addition, the CLRA returns feedback to the transmitter module, which codes the video packets based on the MCS information.

Fig. 4 Cross layer design architecture

Meanwhile, for nRT traffic, the CLRA uses the UEs’ CQI feedback to make the scheduling decisions. Here, the metric \(\xi _{nrt}\) is used by the CLRA to achieve fair resource allocation among UEs. The \(\xi _{nrt}\) metric follows the Proportional Fair (PF) scheduling scheme [19]: it allocates approximately the same number of resources to all users and tries, in any given scheduling interval, to allocate the resources to a user whose channel condition is near its peak. The \(\xi _{nrt}\) metric is expressed as:

$$\xi _{nrt} = \frac{{Tput^u_j}}{{ave\_Tput^u_j}}$$
(13)

where \(Tput^u_j\) is the instantaneous achievable throughput and \(ave\_Tput^u_j\) is the past average throughput of user u.

4.2 Cross-Layer Design

In order to exchange information across the individual layers, inter-layer signaling was utilized [35]. This approach allows propagation of signaling message along with packet data flow. In general, two methods are considered for information exchange process; the bottom-up and top-down methods. In this study, the bottom-up method was exploited to design the cross-layer signaling message exchange process.

Figure 5 presents a similar concept of cross-layer scheme with more emphasis on CLRA modules in terms of input and output parameters. From CLRA perspective, it receives source distortion, queue status, channel status and frame priority information from different layers of protocol stack. Then, based on the adaptive scheduling scheme output and AMC scheme, an optimal resource allocation strategy and MCS will be achieved.

Fig. 5 Cross-layer resource allocator

4.3 Adaptive Scheduling Scheme

To jointly consider video characteristics, application QoS constraints and fairness, the presented resource allocation strategy combines different factors for each UE based on its traffic type. For video traffic, video distortion (i.e., MOS), frame priority and delay constraints are considered as scheduling factors. For nRT traffic, the instantaneous and historical average data rates are taken into consideration. Each design factor is obtained by weighting the UEs against all the available RBs in the current TTI.

Regarding RT traffic, notably video applications, in every TTI the video distortion of each data flow can be evaluated according to Eq. 8. Let \(MOS_i\) denote this design factor. Thus, to achieve higher quality, the smaller MOS value for a given data flow is needed. The elements of \(MOS_i\) are normalized against the maximum MOS value of all waiting UEs. Each user i is also associated with a delay constraint (Eq. 3), which is the second design factor, denoted as \(\psi _i\). The elements of \(\psi _i\) are normalized against the maximum delay constraint of all UEs waiting for a scheduling opportunity. Therefore, a smaller delay constraint means higher urgency for the current UE to use RBs for transmission. Furthermore, for each data flow, the contribution of each packet to the video quality can be estimated, denoted as \(f_i\). A higher value of \(f_i\) indicates that the data flow contains more important frames. Additionally, to prevent certain UEs from holding radio resources for too long, another design factor is incorporated to ensure the fairness of resource allocation. The instantaneous data rate and the historical average data rate of a UE are used as the fairness criterion, denoted as \(\xi _i\) (Eq. 14).

Meanwhile, for nRT traffic, the CLRA uses the UEs’ CQI feedback to make the scheduling decisions. Here, \(\xi _i\) is used by the CLRA to achieve fair resource allocation among UEs. The \(\xi _i\) metric follows the Proportional Fair (PF) scheduling scheme [19]: it allocates approximately the same number of resources to all UEs and tries, in any given scheduling interval, to allocate the resources to a UE whose channel condition is near its peak. The \(\xi _i\) metric is expressed as:

$$\xi _i = \frac{{Tput_i}}{{ave\_Tput_i}}$$
(14)

where \(Tput_i\) is the instantaneous achievable throughput and \(ave\_Tput_i\) is the past average throughput of UE i.

As a result, with these design factors, UE i is jointly weighted on all available RBs. It is worth noting that in every TTI, all design factors are constantly updated. Then the overall scheduling decision vector can be expressed as:

$$\begin{aligned} w_{RT}&=\left\{ w_{i,rb},~0<\forall _i \le I\;\;and\;\; 0<\forall _{rb} \le ~I_{RB}^{tot} \right\} \\&=[MOS_i*\nu _i - f_i*\nu _i - \psi _i*\nu _i - \xi _i*\nu _i]\\ \end{aligned}$$
(15)
$$\begin{aligned} w_{nRT}&=\left\{ w_{i,rb},~0<\forall _i \le I\;\;and\;\; 0<\forall _{rb} \le ~I_{RB}^{tot} \right\} \\&=[\xi _i*\varLambda _i] \end{aligned}$$
(16)

where \(\nu _i\) and \(\varLambda _i\) are linear weighting variables of UE i that determine the relative significance of the design factors. In this way, the scheduling decision weight is obtained for each UE on all available RBs. The adaptive scheduling scheme is illustrated in Algorithm 1.

In the presented adaptive scheduling scheme, within a TTI, the aforementioned scheduling metrics are first calculated for each UE according to the current user status. Afterward, the overall weight of each UE i is obtained depending on the UE’s traffic type. Then, the elements of all I UE decision vectors are combined and sorted into one vector in descending order. Based on this order, each RB is assigned to the UE with the highest weight. If two or more waiting UEs have the same weight, they are allocated RBs in a round-robin manner. This process continues until all the RBs of the current subframe are allocated. The scheduling scheme then proceeds to allocate resources for the next subframe.

Algorithm 1 Adaptive scheduling scheme
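The allocation step described above can be sketched in Python as follows. This is one possible reading of the procedure, not a reproduction of Algorithm 1: it assumes that a weight is available for every (UE, RB) pair, assigns each RB to the UE with the highest weight on that RB, and breaks ties with a rotating round-robin pointer. The nRT weight follows Eqs. 14 and 16; the RT weights are taken as given inputs. All function names are hypothetical.

```python
from typing import List


def pf_metric(inst_tput: float, avg_tput: float) -> float:
    """Eq. 14: instantaneous throughput over past average throughput."""
    return inst_tput / avg_tput


def nrt_weight(inst_tput: float, avg_tput: float, lam: float = 1.0) -> float:
    """Eq. 16: nRT weight, the PF metric scaled by the linear variable Lambda_i."""
    return lam * pf_metric(inst_tput, avg_tput)


def allocate_rbs(weights: List[List[float]]) -> List[int]:
    """Return, for each RB, the index of the UE it is assigned to.

    weights[i][rb] is the scheduling weight of UE i on resource block rb.
    Ties are resolved round-robin: the tied UE closest to the rotating
    pointer wins, and the pointer then moves past the chosen UE.
    """
    n_ue = len(weights)
    n_rb = len(weights[0])
    rr_next = 0
    allocation = []
    for rb in range(n_rb):
        best = max(weights[i][rb] for i in range(n_ue))
        tied = [i for i in range(n_ue) if weights[i][rb] == best]
        chosen = min(tied, key=lambda i: (i - rr_next) % n_ue)
        rr_next = (chosen + 1) % n_ue
        allocation.append(chosen)
    return allocation


# Three UEs, three RBs; the weights are illustrative only.
example_weights = [
    [1.0, 2.0, 2.0],
    [1.0, 2.0, 0.5],
    [0.2, 2.0, 2.0],
]
example_allocation = allocate_rbs(example_weights)
```

In a full scheduler, the weights would be recomputed every TTI from the updated design factors before calling `allocate_rbs`.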

4.4 Video Distortion

At the application layer, video distortion can be caused by source coding parameters, error concealment or network variations. More precisely, during decoding, each video frame is represented in block-shaped units of \(16 \times 16\) pixels called Macro Blocks (MBs). Typically, each error-free packet is composed of one or several rows of MBs and is independently decodable. However, when a video packet is lost, temporal replacement is adopted as the error concealment strategy: the missing pixels in the current frame are replaced by pixels from the previous frame based on the estimated motion vector. In this study, MSE is considered as the distortion metric. To obtain the MOS value, Eq. 12 is first used to calculate the PSNR, and the PSNR is then translated to MOS using the mapping model in Table 1.

4.5 Modulation and Coding

At the physical layer, AMC approaches are used to dynamically adjust the transmission parameters to the varying channel condition. The key objective of AMC is to maximize the link-level data rate while minimizing the PLR and transmission delay [25]. In this paper, the AMC method used in [10] was adopted. Table 3 presents the list of available MCS modes, where each mode consists of a pair of modulation scheme and coding rate as in 3GPP [1]. Based on the CQI feedback carried on the Physical Uplink Control Channel (PUCCH), one of the transmission modes is selected to accommodate the time-varying channel condition.

Table 3 AMC mode at the physical layer
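As a sketch of CQI-based mode selection, the Python lookup below maps a reported CQI index to a modulation scheme, code rate and spectral efficiency. The entries are those of the 4-bit CQI table in 3GPP TS 36.213; whether Table 3 lists exactly the same modes is an assumption, and the function name `select_mcs` is hypothetical.

```python
# CQI index -> (modulation, bits per symbol, code rate x 1024),
# following the 4-bit CQI table of 3GPP TS 36.213 (CQI 0 means out of range).
CQI_TABLE = {
    1: ("QPSK", 2, 78), 2: ("QPSK", 2, 120), 3: ("QPSK", 2, 193),
    4: ("QPSK", 2, 308), 5: ("QPSK", 2, 449), 6: ("QPSK", 2, 602),
    7: ("16QAM", 4, 378), 8: ("16QAM", 4, 490), 9: ("16QAM", 4, 616),
    10: ("64QAM", 6, 466), 11: ("64QAM", 6, 567), 12: ("64QAM", 6, 666),
    13: ("64QAM", 6, 772), 14: ("64QAM", 6, 873), 15: ("64QAM", 6, 948),
}


def select_mcs(cqi: int):
    """Return (modulation, code rate, efficiency in bit/symbol) for a CQI index."""
    if cqi not in CQI_TABLE:
        raise ValueError("CQI must be in 1..15 for a usable transmission mode")
    modulation, bits_per_symbol, rate_x1024 = CQI_TABLE[cqi]
    code_rate = rate_x1024 / 1024.0
    return modulation, code_rate, bits_per_symbol * code_rate


mode_low = select_mcs(1)
mode_high = select_mcs(15)
```

A higher CQI selects a higher-order modulation and a higher code rate, so the efficiency grows monotonically with the reported channel quality.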

5 Performance Evaluation

In this section, a brief introduction on the simulator is provided. It is followed by the description of simulation configuration and metrics, which are used to evaluate the proposed architecture.

5.1 Simulation Setup

This study focuses on the downlink transmission of an LTE system in which users simultaneously run different types of applications. In our simulation, the eNB delivers diverse traffic, composed of RT and nRT services, to N UEs. Realistic traffic models are used for video and data, and an unlimited buffer source is assumed for best-effort flows. UEs move according to the random direction mobility model at speeds of 3 and 120 km/h for the pedestrian and vehicular scenarios, respectively. It is assumed that each eNB is equipped with two antennas with a transmission power of 46 dBm. 50 sub-channels are available, and transport block sizes depend on the selected MCS. A normal Cyclic Prefix (CP) and three OFDM symbols for the Physical Downlink Control Channel (PDCCH) are assumed. However, no synchronization signals or physical broadcast channels are considered. The simulation parameters are summarized in Table 4.

An urban-cell simulation scenario with one eNB is emulated by taking into account the path loss, penetration loss, shadowing and multi-path fast fading effects of the wireless channel. The path loss is \(P_L = 128.1 + 37.6 \log _{10}(d)\), in which d is the distance between the eNB and the mobile user. Large-scale shadowing is modeled by a log-normal distribution \(N(0,8\,\text{dB})\). The penetration loss is 10 dB and a Rayleigh fading channel model is used [47]. Periodic CQI reports, assumed to cover the full bandwidth, are produced by each UE every two TTIs, and the corresponding CQI feedback is sent to the eNB [30].
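The large-scale part of this channel model can be sketched in Python as follows. The sketch assumes that d is expressed in kilometres and that 8 dB is the standard deviation of the shadowing term, both of which are common for this path loss expression but are not stated explicitly above; fast fading is omitted and the function names are hypothetical.

```python
import math
import random
from typing import Optional


def path_loss_db(distance_km: float) -> float:
    """P_L = 128.1 + 37.6 * log10(d), with d assumed to be in km."""
    return 128.1 + 37.6 * math.log10(distance_km)


def received_power_dbm(tx_power_dbm: float, distance_km: float,
                       penetration_loss_db: float = 10.0,
                       shadowing_std_db: float = 8.0,
                       rng: Optional[random.Random] = None) -> float:
    """Transmit power minus path loss, penetration loss and shadowing.

    Shadowing is a zero-mean Gaussian in dB (log-normal in linear scale);
    it is set to 0 dB when no random generator is supplied.
    """
    shadowing_db = rng.gauss(0.0, shadowing_std_db) if rng is not None else 0.0
    return (tx_power_dbm - path_loss_db(distance_km)
            - penetration_loss_db - shadowing_db)


# 46 dBm eNB transmit power, UE at 1 km, no shadowing sample drawn.
mean_rx_dbm = received_power_dbm(46.0, 1.0)
# One random shadowing realization with a fixed seed for repeatability.
sample_rx_dbm = received_power_dbm(46.0, 0.5, rng=random.Random(1))
```

Drawing a fresh shadowing sample per UE position, rather than per TTI, keeps the large-scale fading slowly varying, which is the usual intent of a log-normal shadowing model.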

Table 4 Simulation parameters

5.2 Performance Metric

The performance of the proposed framework is evaluated on the basis of network-level metrics and metrics related to objective video quality assessment. Network throughput, packet delay, packet loss ratio and HoL delay are network performance metrics, whereas PSNR and MOS are objective video quality metrics. The considered metrics are listed below.

5.2.1 Throughput

The aggregate throughput is defined as the sum of the sizes of the packets transmitted within the simulation time, divided by the simulation time. Mathematically, it can be expressed as follows:

$$Throughput = \frac{1}{t_s} \sum _{i=1}^{u}\sum _{t=1}^{t_s}ps_i(t)$$
(17)

where \(t_s\) represents the simulation time and \(ps_i\) describes the size of the packet. The number of active users in the system is denoted by u.

5.2.2 HoL Delay

A HoL packet delay is the difference between recent packet serving time and the time when the packet was stamped on its arrival in the service queue. It has the following mathematical expression for user i:

$$HoL_i(t)= T_{curr}(t) - T_{stmp}(t)$$
(18)

where \(T_{stmp}\) represents the time record of the packet since it arrived at the queue and \(T_{curr}\) reflects the current packet processing time.

5.2.3 Packet Delay

The difference between the packet arrival time in queue and the time instant it is transmitted to UE from the service queue is considered as delay:

$$Delay = \frac{1}{t_s} \sum _{t=1}^{t_s}\frac{1}{s_f} \sum _{i=1}^{s_f}HoL_i(t)$$
(19)

where \(t_s\) denotes the simulation time and \(s_f\) is the total number of users in specific service flow.

5.2.4 Packet Loss Ratio (PLR)

The PLR is calculated as the ratio of the packets discarded from the eNB service queue to the transmitted packets over the simulation time. Specifically, packets that cannot meet the delay budget are discarded.

$$PLR =\frac{\sum _{i=1}^{u}\sum _{t=1}^{t_s}p_{dis}(t)}{\sum _{i=1}^{u}\sum _{t=1}^{t_s}ps_i(t)}$$
(20)

where \(ps_i\) is the total number of transmitted packets and \(p_{dis}\) reflects the discarded packets.
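The throughput and PLR definitions (Eqs. 17 and 20) reduce to sums over users and time slots. The following Python sketch assumes per-user, per-slot counters stored as nested lists, which is an illustrative data layout rather than the one used by the simulator; the function names are hypothetical.

```python
from typing import List


def aggregate_throughput(sizes: List[List[float]]) -> float:
    """Eq. 17: sum of ps_i(t) over users i and slots t, divided by t_s.

    sizes[i][t] is the amount of data delivered to user i in slot t;
    t_s is taken as the number of slots.
    """
    n_slots = len(sizes[0])
    return sum(sum(per_user) for per_user in sizes) / n_slots


def packet_loss_ratio(discarded: List[List[float]],
                      transmitted: List[List[float]]) -> float:
    """Eq. 20: total discarded packets over total transmitted packets."""
    total_sent = sum(sum(per_user) for per_user in transmitted)
    total_lost = sum(sum(per_user) for per_user in discarded)
    return total_lost / total_sent if total_sent else 0.0


# Two users, two slots; the numbers are illustrative only.
throughput = aggregate_throughput([[100, 200], [300, 400]])
plr = packet_loss_ratio([[1, 0], [0, 1]], [[10, 10], [10, 10]])
```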

5.2.5 PSNR

PSNR is one of the metrics related to the objective video quality assessment. It is a logarithmic form of MSE, in which, MSE can be described as the mean of the square of the differences in the pixel values between the corresponding pixels of the two images as stated in Eq. (1).

5.2.6 MOS

MOS is used as a unified QoE metric that indicates the user-perceived quality of real-time or multimedia services, notably video applications. The MOS score is classified into five levels, where each level represents the quality of the video from the end-user perspective. MOS scores range from bad to excellent video quality. The relation between MOS and PSNR has been the subject of many studies, which have led to mapping models that translate PSNR to MOS. In this context, the mapping model based on a hyperbolic tangent function proposed in [18] was used. This mapping model was introduced in Table 1.

5.2.7 Fairness Index

The fairness index indicates how well a given level of fairness is guaranteed by the different scheduling strategies. In this study, Jain's fairness index [17] was used to evaluate the level of fairness among BE services. It is defined as:

$$FI = \frac{\left( \sum _{i=1}^{u} F_i \right)^{2}}{u \sum _{i=1}^{u} F_i^{2}}$$
(21)

where u represents the number of active users in the system and \(F_i\) is the throughput of the ith connection.
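A short sketch of the fairness computation is given below. It uses Jain's index in its standard form, in which the squared sum of throughputs is divided by u times the sum of squared throughputs, so that the value lies between 1/u (one user gets everything) and 1 (all users get the same throughput). The function name and the sample throughputs are illustrative, and raising an error when all throughputs are zero is a design choice of this sketch.

```python
def jain_fairness_index(throughputs) -> float:
    """Jain's fairness index over per-user throughputs F_1 ... F_u."""
    u = len(throughputs)
    if u == 0:
        raise ValueError("at least one user is required")
    sum_of_squares = sum(f * f for f in throughputs)
    if sum_of_squares == 0:
        raise ValueError("index is undefined when all throughputs are zero")
    return sum(throughputs) ** 2 / (u * sum_of_squares)


# Hypothetical per-user BE throughputs in kbps.
print(f"equal shares:   {jain_fairness_index([200.0, 200.0, 200.0, 200.0]):.3f}")
print(f"unequal shares: {jain_fairness_index([500.0, 200.0, 80.0, 20.0]):.3f}")
```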

5.2.8 Spectral Efficiency

Spectral efficiency (SE), or bandwidth efficiency [4], measures how efficiently the spectrum is used. In general, SE is defined as the information rate that can be transmitted over a given bandwidth, and it can be expressed as follows:

$$SE = \frac{d_r}{B}\quad \text{[bit/s/Hz]}$$
(22)

where \(d_r\) is the data rate and B is the bandwidth.
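As a small numerical illustration of Eq. (22), the following sketch divides a data rate in bit/s by a bandwidth in Hz, which gives the spectral efficiency in bit/s/Hz. The function name and the example figures are hypothetical and do not correspond to the simulated cell.

```python
def spectral_efficiency(data_rate_bps: float, bandwidth_hz: float) -> float:
    """Spectral efficiency of Eq. (22) in bit/s/Hz."""
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth must be positive")
    return data_rate_bps / bandwidth_hz


# Hypothetical cell: 15 Mbit/s aggregate rate over a 10 MHz carrier.
print(f"SE = {spectral_efficiency(15e6, 10e6):.2f} bit/s/Hz")
```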

5.3 Simulation Results and Discussion

5.3.1 QoE-Level Performance Evaluation Results

Application-level metrics such as MOS play an essential role in reflecting how well the network guarantees a certain level of QoE for end-users. In this section, the results of the simulations run at a UE speed of 3 km/h are analyzed and discussed.

To estimate the end-user perceived video quality, MOS was employed. MOS is widely used as a unified metric that is closely linked with end-user satisfaction. Figure 6 presents the end-user video quality under different channel conditions. As can be seen, a significant performance improvement was achieved by the proposed cross-layer scheme. When the channel quality decreased, the received video quality in terms of MOS decreased accordingly; however, the proposed scheme achieved a higher performance gain in this case. The reason was that CLRA can holistically and dynamically adapt to the varying channel conditions and the video distortion, using optimal resource allocation and a proper MCS to ensure that the best video quality is received. In addition, CLRA placed more weight on I-frames, which are crucial to the reconstructed video quality; it therefore preserved and reconstructed more I-frames, which helped to improve the video quality.

Fig. 6 End-user perceived video quality under different channel conditions

Figure 7 depicts the Cumulative Distribution Function (CDF) of the MOS for CLRA with various numbers of UEs. Most UEs had high video quality when the number of UEs was below 20. However, a further increase in the number of UEs was followed by a dramatic fall in the MOS. The reason was that, with the growing number of concurrent video flows, the probability of discarding packets due to deadline expiration increased accordingly. In addition, because radio resources are limited, increased competition among UEs reduces the transmission opportunities of each UE.

Fig. 7 Cumulative distribution function of MOS with varying number of UEs
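A curve of this kind can be obtained from per-UE MOS samples with an empirical CDF, sketched below. The helper name and the MOS values are hypothetical and serve only to show the computation: the CDF at x is the fraction of UEs whose MOS is less than or equal to x.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function cdf(x) giving the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one sample is required")

    def cdf(x: float) -> float:
        # bisect_right counts how many sorted samples are <= x.
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical per-UE MOS values from one simulation run.
mos_samples = [2.1, 3.4, 3.6, 3.6, 4.0, 4.2, 4.3, 4.5]
cdf = empirical_cdf(mos_samples)
for threshold in (3.0, 3.6, 4.5):
    print(f"P(MOS <= {threshold}) = {cdf(threshold):.3f}")
```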

Figure 8 compares the performance of CLRA, in terms of end-user perceived video quality, with the LOG-Rule and M-LWDF schedulers. The general trend showed that as the number of UEs increased, the MOS decreased gradually, because a larger number of UEs reduces the transmission opportunities of each UE. As can be seen, CLRA outperformed the other examined schedulers and achieved a higher MOS.

Fig. 8 Comparison of CLRA with classic schedulers in terms of varying number of UEs

5.3.2 Network-Level Performance Evaluation Results

Apart from application-level metrics such as MOS, network-level metrics, including packet loss, delay and throughput, also play an essential role in reflecting how well the network guarantees a certain level of QoS for end-users. In this section, the results of the network-level performance evaluation of the proposed cross-layer design are discussed and analyzed.

The PLR of UEs using the video application is illustrated in Fig. 9. For all the examined schedulers, the PLR rises markedly as the number of UEs increases, and this trend holds for both scenarios. The LOG-Rule and M-LWDF schedulers experienced a similar PLR when the UE speed was 3 km/h (Fig. 9a). The PLR of the UEs handled by the PF scheduler grows dramatically as the number of UEs increases, because the PF scheduler does not take QoS constraints into account when scheduling. CLRA, in contrast, presents a lower PLR than the other schedulers. When the UE speed was set to 120 km/h (Fig. 9b), the CLRA, LOG-Rule and M-LWDF schedulers behave alike, showing a PLR of 40% for 50 UEs. As expected, PF presents a higher PLR of around 70% of the forwarded packets, while CLRA achieves results comparable with the other QoS-aware schedulers. Such a high PLR could significantly degrade video quality even for low-bitrate video. Thus, in order to avoid a high PLR, video application users will have to accept a higher network delay.

Fig. 9 Packet loss ratio for video. a 3 km/h, b 120 km/h

The delay experienced by UEs is shown in Fig. 10. In all examined cases, the UEs handled by the LOG-Rule, M-LWDF and CLRA schedulers had a delay lower than the delay threshold, which was set to 0.1 s. In the scenario where the users move at 3 km/h (Fig. 10a), the difference in scheduler performance became visible once the number of UEs exceeded 20. From 30 UEs, the delay experienced by users of the PF scheduler increased exponentially as the number of users increased. After the UE speed was increased to 120 km/h (Fig. 10b), the delay experienced by UEs increased proportionally. In both scenarios, the CLRA scheduler kept pace with the other QoS-aware schedulers and achieved comparable results. It is worth noting that, as pointed out in [9], a delay above 0.2 s may cause re-buffering at the client. Thus, the PF scheduler could not provide adequate video quality for the UEs it handled.

Fig. 10 Video delay for different speed of UEs. a 3 km/h, b 120 km/h

Figure 11 illustrates the video throughput for different numbers of users. The general trend under all schedulers is a dramatic decline in video throughput as the number of UEs increases. In the 3 km/h scenario (Fig. 11a), when fewer than 30 UEs are active within the cell, the average throughput obtained by a single UE is sufficient to play a 440 kbps video. However, as the number of UEs increases further, the throughput experienced by a single UE drops significantly for all schedulers. Quite similar results are obtained when the UE speed is increased to 120 km/h (Fig. 11b), although the PF scheduler achieves a lower throughput than in the 3 km/h case. Therefore, an increase in either the number of UEs or the speed of the users degrades the video quality.

Fig. 11 Video throughput per UE. a 3 km/h, b 120 km/h

The MOS was estimated for different numbers of users, as shown in Fig. 12. As the number of UEs increases, the MOS decreases gradually. When the UEs move at 3 km/h (Fig. 12a), CLRA maintains a higher video quality for UEs than the other schedulers, whereas the PF scheduler presents the worst performance, showing a significant decline in MOS from about 3.6 to 3.54. The M-LWDF and LOG-Rule schedulers present a similar trend. When the UE speed is increased to 120 km/h (Fig. 12b), similar results are obtained, with the MOS decreasing for all schedulers; the video quality of the UEs handled by the PF scheduler is affected most. This degradation in video quality is caused by packet loss, which may be introduced by fading. The PF scheduler was expected to handle video streaming traffic poorly, since it is not a multimedia-friendly scheduling scheme and therefore cannot handle RT services properly. Overall, the CLRA scheduler achieves results comparable with the other schedulers and, in the low-speed case, outperforms them.

Fig. 12 MOS score. a 3 km/h, b 120 km/h

In Fig. 13, the BE service throughput is compared. In both scenarios, the performance of all the schedulers is much the same, and a sharp decline can be observed as the number of UEs rises to 20. When the UE speed is set to 120 km/h, however, the throughput achieved by UEs is clearly lower than at 3 km/h. More precisely, when UEs move at 3 km/h (Fig. 13a), the throughput drops sharply by about 85% and then declines modestly. The CLRA scheduler follows a trend similar to all the other schedulers. Although the PF scheduler keeps pace with the other schedulers when the number of UEs is below 20, a difference between PF and the other schedulers becomes noticeable once the number of UEs exceeds 20. After the UE speed is increased to 120 km/h (Fig. 13b), the schedulers behave as in the previous scenario, except that their performance deteriorates. The reason for such a large degradation in UE throughput is that, as multimedia traffic increases, the BE services are pushed to the background.

Fig. 13 BE throughput per UE. a 3 km/h, b 120 km/h

Figure 14 depicts the PLR of the BE service. The general trend under all schedulers is a dramatic growth in BE PLR as the number of UEs increases. In the low-speed case (Fig. 14a), the PLR is less than 0.2% for all examined schedulers. For a small number of users, the CLRA scheduler keeps pace with the other schedulers until the number of UEs reaches 20; a further increase in the number of UEs widens the gap between CLRA and the other schedulers. LOG-Rule and M-LWDF present a similar trend, and the PF scheduler shows the worst performance in this scenario. The same trend holds for the 120 km/h scenario, but with a higher PLR for all examined schedulers (Fig. 14b). In both scenarios, CLRA achieves a lower packet loss than its counterparts. As mentioned earlier, the high PLR can be explained by the priority given to video traffic: because nRT traffic can tolerate higher packet loss than RT traffic, the scheduler gives higher priority to flows with time-bounded applications. Consequently, the PLR of the nRT services increases.

Fig. 14 Packet loss ratio for best effort flows. a 3 km/h, b 120 km/h

The level of fairness achieved by the examined schedulers is shown in Fig. 15. As the number of UEs increases, the fairness index decreases significantly, mainly because less bandwidth is left free for BE traffic. As mentioned earlier, CLRA handles BE traffic similarly to the PF scheduling scheme. Similar results were obtained for the scenario where the UEs move at 120 km/h, with a difference in the level of fairness. The CLRA scheduler provides a level of fairness comparable to the other schedulers and even a slightly higher level in the low-speed case.

Fig. 15 BE fairness index. a 3 km/h, b 120 km/h

Figure 16 shows the spectral efficiency for the considered scenarios. In both scenarios, a dramatic drop was observed as the number of UEs increased. In the low-speed case (Fig. 16a), the spectral efficiency was severely reduced, by more than 60%, when the number of UEs was less than 20. When the number of UEs exceeded 20, a more gradual decline of around 50% was observed for all schedulers. The same trend held when the speed was set to 120 km/h (Fig. 16b), although the examined schedulers achieved a lower efficiency. As can be seen, in both scenarios the CLRA scheduler kept pace with the other schedulers and achieved comparable results.

Fig. 16 Spectral efficiency. a 3 km/h, b 120 km/h

The simulation results show that the proposed scheme achieves higher performance than the classic schedulers. The reason is that CLRA exploits parameters from different layers, including channel quality information, queue status, the frame priority scheme and video distortion information. Moreover, CLRA places more weight on I-frames, which are crucial to the reconstructed video quality. As a result, CLRA preserves and reconstructs more I-frames, which helps to improve the video quality. Scheduling optimized for an application-layer metric is thus more efficient and effective in enhancing the video quality. For nRT traffic, CLRA provides a high degree of fairness by taking the past average throughput into consideration.
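The role of the past average throughput for nRT traffic can be illustrated with the textbook proportional fair (PF) rule, sketched below. This is not the exact CLRA metric: it only shows the general principle of ranking users by instantaneous rate divided by past average throughput and then updating the averages with an exponential moving average. The function names, the averaging `window` and the rate values are assumptions.

```python
def pf_select(inst_rates, avg_rates, eps: float = 1e-9) -> int:
    """Index of the user with the highest instantaneous-to-average rate ratio."""
    return max(
        range(len(inst_rates)),
        key=lambda i: inst_rates[i] / max(avg_rates[i], eps),
    )


def update_averages(inst_rates, avg_rates, scheduled: int, window: float = 100.0):
    """Exponential moving average of throughput; only the scheduled user gains."""
    beta = 1.0 / window
    return [
        (1.0 - beta) * avg + (beta * rate if i == scheduled else 0.0)
        for i, (rate, avg) in enumerate(zip(inst_rates, avg_rates))
    ]


# Hypothetical TTI: user 0 has the better channel but was served a lot recently.
inst_rates = [10.0, 5.0]  # achievable rates in this TTI (Mbit/s)
avg_rates = [10.0, 1.0]   # past average throughputs (Mbit/s)
winner = pf_select(inst_rates, avg_rates)
avg_rates = update_averages(inst_rates, avg_rates, winner, window=10.0)
print("scheduled user:", winner, "updated averages:", avg_rates)
```

A user that has been starved has a small average, so its ratio grows until it is served, which is what keeps the allocation fair over time.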

6 Conclusion

In this paper, a cross-layer scheduling scheme was introduced to serve heterogeneous traffic types. The presented approach utilized application, MAC and physical layer parameters to improve end-user perceived video quality and to ensure a high degree of fairness among nRT users. In this framework, different modules were employed to handle cross-layer scheduling, including the video application, the Cross-Layer Resource Allocator (CLRA), the scheduler and the transmitter. The video application module at the application layer buffers the incoming video from the backbone and reports the video distortion to the CLRA module. Next, CLRA exploits the video distortion along with the channel distortion from the physical layer to estimate the PSNR value. Subsequently, the PSNR is translated to MOS. Finally, in every TTI, the user with the highest metric, based on the obtained MOS value, frame priority weight, QoS delay constraints and channel status, obtains the scheduling opportunity. For non-real-time (nRT) services, the instantaneous throughput and average throughput are taken into consideration to ensure a high level of fairness among UEs. Simulation results indicate that the proposed QoE-oriented cross-layer framework leads to a remarkable improvement in user-perceived video quality and spectral efficiency while maintaining fairness among nRT users. Future work may include comparing CLRA with the latest QoE-driven schemes, as well as the complexity analysis and optimization of the CLRA scheme. Defining QoE models that derive MOS directly from packet loss and bit rate for different classes of video content is another potential topic for future work.