
1 Introduction

Surveillance has always been important for the security and safety of human beings, and technology plays an important role in this context. In modern times, UAV-based video surveillance is popular and beneficial for preventing crime and other unwanted activity, especially at remote sites. Camera-mounted UAVs not only reduce the cost of surveillance but also protect human lives, since the UAVs operate in place of people to achieve this objective. UAVs equipped with the latest digital equipment stream real-time video across the network while monitoring a surveilled zone [16], and many surveillance systems based on UAVs have been proposed [22, 24, 27, 35, 41, 44, 45, 54]. Since UAV mobility supports surveillance at remote sites, a cellular infrastructure is needed to allow UAVs to fly far afield; remote-controlled UAVs that fly only within a limited range are not suitable for such monitoring. For this reason, 4G-LTE networks are considered one of the best solutions [19, 36, 43, 47].

In this paper, we propose an architecture suitable for 4G-LTE UAV-based surveillance applications. The architecture is designed for monitoring targeted areas both inside buildings, where stationary UAVs capture the video, and outside buildings in open space, where mobile UAVs capture video of vantage points of interest. Each UAV streams its captured video to its respective base station, and from there the videos are streamed to a single command-and-control center where all activities are monitored in real time so that appropriate action can be taken if needed. We investigate the different factors that affect real-time video streaming and degrade the Quality of Experience (QoE) of video viewing in such an architecture. For instance, 4G-LTE supports only hard handover (during flight, a UAV first breaks the connection with its currently attached base station and then establishes a connection with the new one), which causes a sudden drop in bandwidth [32] and results in poor video-viewing QoE. We take such facts into consideration and demonstrate their effects on the quality of the streamed video. To examine video-viewing quality, we calculate two objective metrics, the Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity index (SSIM) [8, 17, 52]. For real-time video streaming we use two well-known codecs, H.264 and H.265, and we run all simulations in NS-3. The simulation results reveal how different factors affect streamed video quality in such an architecture, and they also show that H.265 performs better than H.264 under different circumstances.

2 Related Work

Recent technology plays an important role in every aspect of life [31], and surveillance is a major concern of every modern society [30]. Instead of using fixed cameras for surveillance, it is attractive to use drones, also known as Unmanned Aerial Vehicles (UAVs), for this purpose [26]. UAVs not only capture live video of different vantage points but also stream the captured video to a remote station so that all activities in the monitored zone can be checked, with low operational cost and minimal setup time and effort [9, 39]. Such UAVs can easily fly in an open area and can be re-tasked during flight to capture important events in the monitored zone [13, 21]. Several UAV-based surveillance frameworks have been proposed to enrich this technology [7, 23, 49]. Two interesting survey papers on Flying Ad-hoc Networks (FANETs) [11, 37] explain the fundamental operations of FANETs and their behavior in specific environments; they also review recent challenges in this technology and offer solutions to the major hurdles in communication among UAVs. Mustaqim et al. [38] evaluate communication among UAVs during flight and also examine UAV-to-ground communication in FANETs using antenna arrays. Qazi et al. [42] evaluate the performance of UAVs under different propagation models, especially when the UAVs fly at very low altitude in a surveillance architecture. In another paper, Qazi et al. [43] propose a UAV-based surveillance framework over a 4G-LTE network with a two-tiered architecture that places stationary UAVs inside buildings and flying mobile UAVs outside them; the UAVs transmit the captured video to a remote site. The authors analyze different factors that affect video streaming, including shadowing and fading losses, and also examine delay, throughput, and multi-path propagation loss over the proposed framework.

Fig. 1. Proposed surveillance architecture in an urban area

3 Surveillance Architecture

For the surveillance architecture, consider Fig. 1. The basic topology and all related terminology are taken from the 3GPP R4-092042 standard. The architecture contains several buildings, with monitoring targets both inside and outside them. UAVs placed inside the buildings are referred to as homeUEs, while UAVs flying outside the buildings are known as macroUEs. In our work, homeUEs are stationary, while macroUEs fly continuously in the air outside the buildings. Femto cells cover the interiors of the buildings and macro cells cover the outdoor space, so homeUEs reside in femto cells and macroUEs in macro cells in free space. Every UAV streams its video to its respective base station: homeENBs serve the homeUEs, and macroENBs serve the macroUEs. The base stations receive the video streams from the UAVs and deliver real-time video over a legacy Internet connection to a single command-and-control center located at a remote site. Such a surveillance architecture not only provides real-time monitoring of the targeted areas but also makes it possible to prevent mishaps by taking appropriate action in time.
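To make the two-tier topology concrete, the following minimal Python sketch models its entities. It is purely illustrative: the class names, counts, and attachment rule are our own assumptions and are not part of the proposed system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UAV:
    uav_id: int
    tier: str         # "homeUE" (stationary, indoor) or "macroUE" (mobile, outdoor)
    serving_enb: str  # homeENB for femto cells, macroENB for macro cells

def build_topology(num_home_ues: int, num_macro_ues: int) -> List[UAV]:
    """Attach indoor UAVs to homeENBs and outdoor UAVs to macroENBs
    (hypothetical one-UE-per-homeENB and three-macroENB layout)."""
    home = [UAV(i, "homeUE", f"homeENB-{i}") for i in range(num_home_ues)]
    macro = [UAV(num_home_ues + i, "macroUE", f"macroENB-{i % 3}")
             for i in range(num_macro_ues)]
    return home + macro

# Every UAV streams video uplink to its serving eNB; the eNBs forward the
# streams over the Internet to a single command-and-control center.
for uav in build_topology(num_home_ues=4, num_macro_ues=2):
    print(uav)
```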

4 Measuring Metrics Used for QoE of Video

Image quality can be measured in several ways. Schemes for evaluating image quality fall into two major categories, subjective and objective [10]. Subjective schemes are based on human judgment and operate without reference to explicit criteria [46]. Objective methods are based on comparisons using explicit numerical criteria [12, 40]; several references are possible, such as ground truth or prior knowledge expressed in terms of statistical parameters and tests [15, 33, 48].

PSNR is the ratio between the maximum possible power of a signal and the power of the corrupting noise that modifies it. PSNR is expressed on a logarithmic decibel scale because its range varies widely.

PSNR can be used as a rough approximation of comparative quality when the types of distortion and the video content remain the same and only the level of distortion changes [50]. However, depending on the content of the video and the corruption after reception, the correlation between PSNR and subjective quality can become very small [25]. For this reason, PSNR is considered an inconsistent approach for measuring video QoE across videos with dissimilar content. Despite these facts, PSNR is still widely used as a quality metric; its very low computational complexity is another reason for its popularity [18].

PSNR is derived from the Mean Square Error (MSE) in relation to the maximum possible luminance value (\( 2^8 - 1 = 255 \) for a typical 8-bit representation) as

$$\begin{aligned} MSE = \frac{\sum _{i=1}^{M} \sum _{j=1}^{N} \left[ f(i,j)- F(i,j)\right] ^2}{M \cdot N} \end{aligned}$$
(1)

where f(i, j) is the original signal at pixel (i, j),

F(i, j) is the reconstructed signal, and

\(M \cdot N\) is the picture size.

MSE is the cumulative squared error between the original and the distorted videos.

$$\begin{aligned} PSNR = 20\log _{10}\Bigg [\frac{255}{\sqrt{MSE}}\Bigg ]\text {dB} \end{aligned}$$
(2)

The resulting value is expressed in decibels; typical values range from about 30 dB for medium video quality up to 40 dB for high video QoE [29]. As depicted in (2), PSNR and MSE are inversely related: for the same video, a higher-quality rendition exhibits a higher PSNR and a lower MSE, and vice versa.
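As an illustration of Eqs. (1) and (2), the following minimal Python sketch (using NumPy) computes the PSNR between an original and a distorted 8-bit frame. The synthetic frames are illustrative data only, not frames from our experiments.

```python
import numpy as np

def psnr(original: np.ndarray, received: np.ndarray) -> float:
    """PSNR in dB between two 8-bit frames, following Eqs. (1)-(2)."""
    diff = original.astype(np.float64) - received.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:              # identical frames: PSNR is unbounded
        return float("inf")
    return 20 * np.log10(255.0 / np.sqrt(mse))

# Toy example with a synthetic QCIF-sized luma plane and additive noise.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(144, 176), dtype=np.uint8)
noisy = np.clip(frame + rng.normal(0, 5, frame.shape), 0, 255).astype(np.uint8)
print(f"PSNR = {psnr(frame, noisy):.2f} dB")
```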

SSIM is another objective metric, used to compute the similarity between two video frames [29, 51]. To measure the similarity between two pictures, SSIM treats one picture as error-free and the other as erroneous. The major deficit of PSNR is that it cannot capture irregularities as perceived by human eyes, which is why SSIM has been recommended. SSIM is computed between two windows of equal size; a value of +1 represents perfect similarity, while −1 indicates complete dissimilarity between the frames.

To calculate the quality of a distorted image, correlations in luminance, contrast, and structure are computed locally between the reference and distorted images and averaged over the entire image. The SSIM scheme is inspired by the working of the human visual system (HVS) [14]. To gauge the structural similarity between two signals, let vectors x and y be given as below:

$$\begin{aligned} SSIM(x,y)= \left( \frac{2\mu _x \mu _y+C_1}{\mu ^2_x + \mu ^2_y +C_1}\right) ^\alpha \left( \frac{2\sigma _x \sigma _y+C_2}{\sigma ^2_x + \sigma ^2_y +C_2}\right) ^\beta \left( \frac{\sigma _{xy} +C_3}{\sigma _x \sigma _y +C_3}\right) ^\gamma \end{aligned}$$
(3)

where \( x = (x_i),\ i = 1,2,\ldots ,N\)

\( y = (y_i),\ i = 1,2,\ldots ,N\)

\( \left( \frac{2\mu _x \mu _y+C_1}{\mu ^2_x + \mu ^2_y +C_1}\right) ^\alpha \) compares the luminance of the signals,

\(\left( \frac{2\sigma _x \sigma _y+C_2}{\sigma ^2_x + \sigma ^2_y +C_2}\right) ^\beta \) compares the contrast of the signals, and

\(\left( \frac{\sigma _{xy} +C_3}{\sigma _x \sigma _y +C_3}\right) ^\gamma \) measures the structural correlation of the signals. \(\mu _x, \mu _y\) are the sample means of x and y respectively,

\(\sigma _x, \sigma _y\) are the sample standard deviations of x and y respectively,

\(\sigma _{xy}\) denotes the cross-covariance between x and y, and

\(C_1, C_2, C_3\) are constants used to stabilize the metric, while \(\alpha> 0,\beta> 0, \gamma > 0 \) are parameters used to adjust the relative importance of the three components.

Since \( \alpha ,\beta ,\gamma \) are all positive, SSIM attains its maximum value of 1 only when each of the three factors equals 1, which yields the condition given below:

$$\begin{aligned} \left( \frac{2\mu _x \mu _y+C_1}{\mu ^2_x + \mu ^2_y +C_1}\right) ^\alpha \left( \frac{2\sigma _x \sigma _y+C_2}{\sigma ^2_x + \sigma ^2_y +C_2}\right) ^\beta \left( \frac{\sigma _{xy} +C_3}{\sigma _x \sigma _y +C_3}\right) ^\gamma = 1 \end{aligned}$$

(4)

To satisfy the above condition:

\(\mu _x = \mu _y \rightarrow \) the means of the two videos must be equal, and

\(\sigma _x = \sigma _y,\ \sigma _{xy} = \sigma _x \sigma _y \rightarrow \) the standard deviations of the two videos must be equal and their cross-covariance must equal the product of the standard deviations.

A video with extremely bad quality has an SSIM value of −1. Such a video exhibits a strong negative correlation, and hence a strong deviation, between the frame(s) of interest and the original frame(s).
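For illustration, the sketch below computes a single-window SSIM over a whole frame with \(\alpha = \beta = \gamma = 1\) and \(C_3 = C_2/2\), the usual simplification of Eq. (3); practical implementations slide a small window over the frame and average the local values. The constants follow the common convention for 8-bit video and are an assumption, not values taken from our setup.

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray) -> float:
    """Single-window SSIM, Eq. (3) with alpha = beta = gamma = 1 and
    C3 = C2/2, evaluated over the whole frame instead of local windows."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    C1 = (0.01 * 255) ** 2   # stabilizing constants for 8-bit video
    C2 = (0.03 * 255) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / \
           ((mu_x**2 + mu_y**2 + C1) * (var_x + var_y + C2))

# Toy example on synthetic frames (illustrative data only).
rng = np.random.default_rng(1)
a = rng.integers(0, 256, size=(144, 176)).astype(np.uint8)
b = np.clip(a + rng.normal(0, 10, a.shape), 0, 255).astype(np.uint8)
print(f"SSIM = {ssim_global(a, b):.3f}")
```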

5 Simulation Settings for Streaming Video

We captured real video with a drone at different events of Intellect 2017, the first international conference held on 15–16 Nov 2017 at the Pakistan Air Force, Karachi Institute of Economics & Technology (PAFKIET) in Karachi. The captured video was converted into both H.264 and H.265 in MP4 format. For converting the video between encodings we used FFmpeg [2], which offers strong utilities for video conversion and even for real-time video streaming [28].
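As an example of this conversion step, the following Python sketch invokes FFmpeg to produce H.264 and H.265 MP4 files. The file names and the CRF value are illustrative assumptions; the exact encoder settings used in our experiments are not the focus here.

```python
import subprocess

SRC = "intellect2017_capture.mp4"   # hypothetical input file name

for codec, out in [("libx264", "video_h264.mp4"),   # H.264 / AVC
                   ("libx265", "video_h265.mp4")]:  # H.265 / HEVC
    subprocess.run(
        ["ffmpeg", "-y", "-i", SRC,
         "-c:v", codec,   # software x264 / x265 encoder
         "-crf", "23",    # constant-quality mode (illustrative value)
         "-an",           # the surveillance stream carries no audio
         out],
        check=True)
```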

All simulations in this work are performed with Network Simulator 3 (NS-3), one of the most popular and trusted network simulators in the research community. It is designed mainly for research, especially for simulating the operation of the latest advanced networks. To simulate 4G-LTE we used the lena-dual-stripe example of NS-3; the simulation parameters are shown in Table 1.

For client/server communication over the 4G-LTE network we used EvalVid, an application developed by the GERCOM group [3]. It is designed to simulate video streaming between a client and a server over a network, and it makes it possible to assess the video quality a user observes on reception of the stream. EvalVid streams a trace file derived from the MP4-encoded video. We modified the original EvalVid application in two ways. First, instead of the Random Waypoint mobility model originally used to simulate the flight pattern of UAVs, we selected the Gauss-Markov mobility model, which produces a more realistic flight pattern. Second, we reversed the streaming direction: in the original application the video is transmitted from client to server, whereas in our work the video is streamed from the server (the UAV) to a static remote client (the command-and-control center).

All communication in the EvalVid application is UDP based; we also preferred UDP for our work since it is a suitable protocol for real-time video streaming. To evaluate the PSNR and SSIM of the streamed video, we followed the guidelines available at [5], using the required EvalVid binaries, which can be found at [1].
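For reference, the Gauss-Markov model updates a node's speed and direction at each step as a weighted blend of their previous values, a long-term mean, and a Gaussian random term; this memory is what yields smoother, more realistic trajectories than Random Waypoint. NS-3 provides this model as GaussMarkovMobilityModel. The standalone Python sketch below illustrates a 2-D variant of the update; all parameter values are illustrative and are not the values used in our NS-3 runs.

```python
import math
import random

def gauss_markov_path(steps=100, dt=1.0, alpha=0.8,
                      mean_speed=15.0, mean_dir=0.0,
                      speed_sigma=2.0, dir_sigma=0.3):
    """2-D Gauss-Markov mobility: each step blends the previous speed and
    direction with their long-term means plus Gaussian noise. alpha tunes
    the memory: 1 gives straight-line motion, 0 memoryless random motion."""
    x, y = 0.0, 0.0
    s, d = mean_speed, mean_dir
    w = math.sqrt(1 - alpha ** 2)          # weight of the random component
    path = [(x, y)]
    for _ in range(steps):
        s = alpha * s + (1 - alpha) * mean_speed + w * random.gauss(0, speed_sigma)
        d = alpha * d + (1 - alpha) * mean_dir + w * random.gauss(0, dir_sigma)
        x += s * math.cos(d) * dt
        y += s * math.sin(d) * dt
        path.append((x, y))
    return path

print(gauss_markov_path(steps=3))
```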

Since the video captured by a UAV is in uncompressed format, it is huge and takes a long time to transmit over the network. A good encoding scheme is therefore required that compresses small chunks of the captured video in a very short time before transmission. Nowadays, H.264 is most often preferred for fast and reliable encoding. H.264, also referred to as MPEG-4 AVC, is a general-purpose encoding scheme designed for everything from low-bitrate mobile video applications to high-definition television transmission. H.264 not only covers this vast range of applications but also offers a remarkable improvement in compression efficiency, which has made it the most widely adopted codec in the industry.

High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, is a video compression standard designed as a successor to the widely used AVC (H.264, or MPEG-4 Part 10). HEVC provides better compression than other encoding schemes: it offers from 25 to 50% better compression than AVC at the same level of video quality. Compared with H.264, HEVC has also been reported to be of low complexity and more hardware friendly, even in ad-hoc networks [20], and it provides a low-delay configuration that is particularly useful for architectures such as [53].

In this study, we compare every result for both the H.264 and H.265 video codecs. The objective metrics PSNR and SSIM show that HEVC performs much better than H.264. The only drawback is that HEVC takes more encoding time than H.264 [34]. For this reason, highly configured UAVs that can speed up the encoding process are required in the surveillance architecture.

Fig. 2. Complete simulation platform

The complete simulation platform is shown in Fig. 2. The step-by-step process, numbered 1 to 14 with directed arrows, shows the practical approach of this work. The steps before the simulation are labeled 1 to 5: the camera-mounted UAV captures video of the area of interest in the monitored zone, and the captured video is used in the simulation in the H.264 and H.265 codecs. The captured video is first transformed into a YUV sequence, then into MP4, then into M4V, and finally back into MP4. This MP4 contains hint tracks in the video samples, inserted using MP4Box [4]. mp4trace, a tool offered by EvalVid, then generates from the hinted MP4 file a trace file that is transmitted over UDP in the network. The steps during the simulation are labeled 6 to 11: the flying UAVs transmit the captured real-time video towards a static remote client. A wireless propagation model is applied to emulate the wireless infrastructure, and the Gauss-Markov mobility model is applied to mimic the realistic flight pattern of the UAVs. Because of frequent handovers and channel variations, the remote client receives a corrupted trace file. The steps after the simulation are labeled 12 to 14. The first of these is rebuilding the streamed video as seen by the receiver: at the receiver end, the MP4 and trace files are processed by the etmp4 tool, which produces a possibly corrupt video file from which the lost frames are deleted; this corrupted video is then decoded into a YUV sequence. Finally, the psnr binary offered by EvalVid computes the PSNR and SSIM from the original and corrupt YUV files, which indicates the difference between the original and the corrupted video.
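The pre- and post-processing around the simulation can be scripted end to end. The sketch below strings the steps together: the FFmpeg and MP4Box invocations are standard, but the argument forms of the EvalVid tools (mp4trace, etmp4, psnr) differ between releases, so they are shown here only as placeholder assumptions and should be checked against the guidelines in [5].

```python
import subprocess

def run(cmd, stdout=None):
    """Run one pipeline step, echoing the command first."""
    print(" ".join(cmd))
    subprocess.run(cmd, check=True, stdout=stdout)

# --- Steps before the simulation (1-5) ---
run(["ffmpeg", "-y", "-i", "capture.mp4", "-c:v", "libx264", "a01.m4v"])
run(["MP4Box", "-add", "a01.m4v", "a01.mp4"])  # wrap the stream in an MP4 container
run(["MP4Box", "-hint", "a01.mp4"])            # insert RTP hint tracks [4]
# mp4trace writes the sender trace used by the simulation to stdout
# (placeholder argument form -- check against the EvalVid documentation):
with open("st_a01", "w") as trace:
    run(["mp4trace", "-f", "-s", "192.168.0.2", "12346", "a01.mp4"], stdout=trace)

# --- Steps after the simulation (12-14) ---
# etmp4 rebuilds the possibly corrupted received video from the sender (sd),
# receiver (rd), and sender-trace (st) files produced during the simulation:
run(["etmp4", "-f", "-0", "sd_a01", "rd_a01", "st_a01", "a01.mp4", "a01e"])
# decode both videos to raw YUV and compare them frame by frame:
run(["ffmpeg", "-y", "-i", "a01.mp4", "orig.yuv"])
run(["ffmpeg", "-y", "-i", "a01e.mp4", "recv.yuv"])
run(["psnr", "176", "144", "420", "orig.yuv", "recv.yuv"])
```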

Table 1. Parameters in simulation adopted from 3GPP R4-092042 specification

6 Performance Evaluation

We performed four experiments to analyze the effect of different factors on the QoE of video streamed across the network, measured with the two objective metrics PSNR and SSIM. The experiments vary the Line-of-Sight to Non-Line-of-Sight threshold (LoS2NLoS), the number of macroENB sites, the homeUEs per homeENB ratio, and the internal wall loss.

Effect of Varying LoS2NLoS Threshold. To measure the impact of the Line-of-Sight to Non-Line-of-Sight threshold, we increase it gradually from 200 m up to 300 m, as shown in Fig. 3 and Fig. 4. Both PSNR and SSIM show a rising trend, which is to be expected: within Line-of-Sight range there are no hurdles or obstacles, so minimal losses are observed and the graphs accordingly show higher PSNR and SSIM. For this experiment we keep all parameters as given in Table 1 except the LoS2NLoS value, which increases gradually so that its impact on video QoE can be analyzed.

Fig. 3. Effect of LoS2NLoS threshold on PSNR

Fig. 4. Effect of LoS2NLoS threshold on SSIM

Table 2. Internal Wall Loss of different materials [6]

Effect of Internal Wall Loss of Different Materials. To examine the impact of internal wall loss, we selected different materials in the simulation settings. These materials are listed in Table 2, which shows the thickness and wall loss of each. As the wall thickness and loss increase with the type of material, the QoE of the video declines, as can be seen in Fig. 5 and Fig. 6. The decreasing trends show lower PSNR and SSIM because of the attenuation caused by the different materials. Hence the material of which a building is constructed also affects the QoE of video viewing over the surveillance architecture.

Fig. 5. Effect of internal wall loss on PSNR

Fig. 6. Effect of internal wall loss on SSIM

Effect of Varying macroENB Sites. To measure the impact of varying macroENB sites in the surveillance architecture, we increase the number of macroENB sites from 1 to 7. As the number of macroENB sites increases, the QoE of the video decreases, as depicted by the PSNR and SSIM in Fig. 7 and Fig. 8 respectively. This decline is caused by frequent handovers: as we increase the number of macroENB sites within a limited distance of 500 m, the handover frequency increases. Since 4G-LTE supports only hard handover [32], the bandwidth drops suddenly at each handover and the QoE of the video degrades, which is why the PSNR and SSIM decline in the graphs.

Fig. 7. Effect of macroENB sites on PSNR

Fig. 8. Effect of macroENB sites on SSIM

Effect of Varying homeUEs per homeENB Ratio. In this experiment, we place homeUEs in random rooms of the buildings in order to capture different events at different locations inside them. Such UAVs are so small that they are not easily noticed. We increase the homeUEs to homeENBs ratio gradually from 0.5 to 4 in steps of 0.5. As this ratio increases, we observe a decline in video QoE, as depicted in Fig. 9 and Fig. 10. The decreasing trend in PSNR and SSIM results from the increasing load that the homeUEs place on the homeENBs. Moreover, as this load increases, there is less chance that a homeUE and its homeENB will be in close proximity inside the building, which introduces higher propagation loss and hence lowers the QoE of the streamed video. It is therefore advisable to choose an optimal ratio of homeUEs to homeENBs; otherwise poor video-viewing QoE is to be expected, which is of course not acceptable in a surveillance architecture.

Fig. 9. Effect of homeUEs/homeENBs on PSNR

Fig. 10. Effect of homeUEs/homeENBs on SSIM

7 Conclusion

In this paper, we proposed a UAV-based surveillance architecture over a 4G-LTE network and sought to maximize the QoE of video viewing over it. To examine video-viewing QoE, we used two objective metrics, PSNR and SSIM, and considered the impact of different factors on them, which helps in analyzing the QoE of streamed video. This study is useful for evaluating the performance of video streaming over a UAV-based surveillance architecture. We selected the NS-3 simulator for all simulations in this work and performed several experiments to explore the effects of different factors on the QoE of the streamed video. The experimental results provide useful analysis that can be used to improve the QoE of video monitoring. Two encoding schemes, H.264 and H.265 (HEVC), were used for video streaming, and the comparative analysis shows that H.265 performs better than H.264 across the different scenarios. The only requirement is highly configured UAVs that can carry out the more complex computation of the H.265 scheme quickly enough to minimize encoding delay.