1 Introduction

Cameras have become affordable thanks to technological advancements, which has led to their widespread use for capturing precious moments. Omni-directional cameras capture the whole scene using more than one camera, and the images they capture are stitched together to give a 360\(^\circ \) view of the scene.

Figure 1 portrays the basic workflow of 360\(^\circ \) video. It generally commences with an omni-directional camera capturing 360\(^\circ \) frames. These are stitched together and sent to the encoding phase, where the spherical video is projected onto a 2D plane, followed by frame packing and compression. The two commonly used projection formats, Equirectangular Projection (ERP) and Cubemap Projection (CMP), are shown for a user-generated 360\(^\circ \) video in Figs. 2 and 3, respectively. The encoding phase is followed by the decoding phase, where the video undergoes interactive projection for rendering in concert with the respective input/output technology (such as a Head-Mounted Display, HMD) at the consumer end.

Figure 4 depicts different Fields of View (FoVs) in traditional viewing mode extracted from the equirectangular projection given in Fig. 2. This gives content creators the flexibility to shoot in 360\(^\circ \) and later, in post-processing, select the FoV that matters the most.

This review article

  • is, to the best of our knowledge, the first review on user-generated 360\(^\circ \) video.

  • introduces the various research areas in user-generated 360\(^\circ \) video.

  • investigates recent literature and categorizes it by research area.

  • highlights the pros and cons of each methodology.

Fig. 1

360\(^\circ \) video processing workflow [1]

Fig. 2

Equirectangular projection

Fig. 3

Cubemap projection

Fig. 4

a–f Different FoVs from user-generated 360\(^\circ \) video

Fig. 5

Areas of research in 360\(^\circ \) video

The article is organized as follows. Section 2 surveys research trends in 360\(^\circ \) video production, communication, and analysis. The processing techniques applied to 360\(^\circ \) videos are discussed in Sect. 2.1. Section 2.2 discusses streaming techniques. Video post-production methodologies are discussed in Sect. 2.3. The evaluation of 360\(^\circ \) video quality is reviewed in Sect. 2.4. Observations are listed in Sect. 3, and Sect. 4 concludes this article.

2 Research Trends in 360\(^\circ \) Video

This section briefly surveys each research area in 360\(^\circ \) video. Figure 5 depicts the research trends in 360\(^\circ \) video.

2.1 Processing of 360\(^\circ \) Video

This section discusses the processing techniques that 360\(^\circ \) videos require before transmission or storage. After capture, the frames of a 360\(^\circ \) video need to be stitched and projected into a suitable representation, and the result is then compressed for transmission or storage. The following subsections review the existing methods for processing 360\(^\circ \) video.

2.1.1 Projection

In Sphere Segmented Projection, visual artifacts are caused by the inactive regions [2]. To enhance coding efficiency and minimize these artifacts, Yoon et al. suggest a scheme that pads the inactive regions. For panoramic videos, Huang et al. presented a low-complexity scheme and video stitching mechanism [3]. Hanhart et al. recommended a coding approach based on spherical neighboring relationships and projection format adaptation [4]. Su and Grauman proposed a spherical convolutional network that processes 360\(^\circ \) imagery directly in its equirectangular projection and is translated from a planar Convolutional Neural Network (CNN) [5]. Lin et al. propose a hybrid equiangular cubemap projection that minimizes seam artifacts [6]. Wang et al. experimented with characteristic equirectangular projection formats of sequences in the clip [7].
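To make the projection formats concrete, the following minimal Python sketch (our illustration; the function names and axis convention are assumptions, not taken from the cited works) maps a unit viewing direction to ERP pixel coordinates and back:

```python
import numpy as np

def sphere_to_erp(v, width, height):
    """Map a unit direction vector (x, y, z) to ERP pixel coordinates.
    Convention (one of several in use): longitude measured around the
    y (up) axis, latitude positive toward +y."""
    x, y, z = v
    theta = np.arctan2(x, z)                 # longitude in [-pi, pi]
    phi = np.arcsin(np.clip(y, -1.0, 1.0))   # latitude in [-pi/2, pi/2]
    u = (theta / (2 * np.pi) + 0.5) * width
    return u, (0.5 - phi / np.pi) * height

def erp_to_sphere(u, v, width, height):
    """Inverse mapping: ERP pixel back to a unit direction vector."""
    theta = (u / width - 0.5) * 2 * np.pi
    phi = (0.5 - v / height) * np.pi
    return np.array([np.cos(phi) * np.sin(theta),
                     np.sin(phi),
                     np.cos(phi) * np.cos(theta)])

# The image center maps to the forward viewing direction (0, 0, 1):
print(sphere_to_erp((0.0, 0.0, 1.0), 3840, 1920))   # -> (1920.0, 960.0)
```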

Conventional motion models are ill-suited to spherical content, making it difficult to attain efficient compression for storage and transmission [8]. Hence, Vishwanath et al. recommended a rotational model to capture angular motion on the sphere effectively. In 3D space, a vector A is rotated by an angle \(\alpha \) around an axis given by a unit vector B. The coordinates of vectors A and B are (p, q, r) and (l, m, n), respectively. The coordinates of the rotated vector \(A^{'}\) are

$$\begin{aligned} p^{'}=l(B\cdot A)(1-\cos \alpha )+p\cos \alpha +(-nq+mr)\sin \alpha \end{aligned}$$
(1)
$$\begin{aligned} q^{'}=m(B\cdot A)(1-\cos \alpha )+q\cos \alpha +(np-lr)\sin \alpha \end{aligned}$$
(2)
$$\begin{aligned} r^{'}=n(B\cdot A)(1-\cos \alpha )+r\cos \alpha +(-mp+lq)\sin \alpha \end{aligned}$$
(3)

where \(B\cdot A\) is the dot product. The rotation axis B is the vector perpendicular to the plane defined by the origin, the vector A, and the rotated vector \(A^{'}\). Vector B is computed as follows:

$$\begin{aligned} B=\frac{A\times A^{'}}{| A\times A^{'}|} \end{aligned}$$
(4)

Angle of rotation is given as

$$\begin{aligned} \alpha =\cos ^{-1}(A\cdot A^{'}) \end{aligned}$$
(5)
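
Equations (1)–(3) are the component form of Rodrigues' rotation formula. The following Python sketch (our illustration, not the authors' implementation) evaluates the rotation and recovers the axis and angle per Eqs. (4)–(5), assuming unit-length vectors:

```python
import numpy as np

def rotate(A, B, alpha):
    """Rotate vector A about unit axis B by angle alpha (radians);
    the vector form of Eqs. (1)-(3)."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    return (B * np.dot(B, A) * (1 - np.cos(alpha))
            + A * np.cos(alpha)
            + np.cross(B, A) * np.sin(alpha))

def axis_and_angle(A, A_rot):
    """Recover the rotation axis and angle from unit vectors A and A',
    per Eqs. (4)-(5)."""
    B = np.cross(A, A_rot)
    B = B / np.linalg.norm(B)
    alpha = np.arccos(np.clip(np.dot(A, A_rot), -1.0, 1.0))
    return B, alpha

A = np.array([1.0, 0.0, 0.0])
A_rot = rotate(A, [0.0, 0.0, 1.0], np.pi / 4)   # 45 degrees about z
print(axis_and_angle(A, A_rot))                 # axis ~ (0, 0, 1), angle ~ pi/4
```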

The summary of techniques, highlights, and challenges of 360\(^\circ \) video projections is listed in Table 1.

Table 1 Summary on projection of 360\(^\circ \) video

2.1.2 Distortion

Azevedo et al. provide an extensive analysis of the most common visual distortions that alter 360\(^\circ \) video signals in immersive applications [1]. Aksu et al. present scalable multicast live delivery of 360\(^\circ \) video with distortion analysis [9]. Yoon et al. recommend padding the inactive regions to lessen distortions [2]. A detailed review of the distortions in 360\(^\circ \) video is given in Table 2.

Table 2 Summary on distortion of 360\(^\circ \) video

2.1.3 Compression

Le et al. designed a transcoding system that uses the ARIA block cipher for encryption [10]. To attain more uniform spherical sampling, Lin et al. offer 360\(^\circ \)-specific coding tools [6]. Their mapping function is given as follows:

2D (Cube-Map) to Sphere:

$$\begin{aligned} h_{b}(a,b)=\frac{b}{1+0.4(1-a^{2})(1-b^{2})} \end{aligned}$$
(6)

Sphere to 2D (Cube-Map):

$$\begin{aligned} k_{b}(a,b)= \left\{ \begin{array}{cl} b, &{}\text {if } s=0 \\ \frac{1-\sqrt{1-4s(b-s)}}{2s},&{} \text {otherwise}\end{array}\right. \end{aligned}$$
(7)
$$\begin{aligned} {\text {where }}s=0.4b(a^{2}-1). \end{aligned}$$
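
A small numerical sketch of Eqs. (6)–(7) follows (our illustration; function and variable names are ours). It verifies that the inverse mapping recovers the cube-face coordinate:

```python
import numpy as np

def cube_to_sphere(a, b):
    """Forward mapping of Eq. (6) for face coordinates a, b in [-1, 1]."""
    return b / (1 + 0.4 * (1 - a**2) * (1 - b**2))

def sphere_to_cube(a, b):
    """Inverse mapping of Eq. (7)."""
    s = 0.4 * b * (a**2 - 1)
    if s == 0:
        return b
    return (1 - np.sqrt(1 - 4 * s * (b - s))) / (2 * s)

a, c = 0.0, 0.5
b = cube_to_sphere(a, c)      # ~0.3846
print(sphere_to_cube(a, b))   # recovers 0.5
```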

To improve perception-based video compression and storage, Mazumdar et al. suggested an efficient compression mechanism called Vignette [11]. Xiu et al. reported considerable coding-efficiency gains for paired video categories such as HDR and SDR [12]. Targeting the spherical domain, Wang et al. propose a motion estimation and compensation algorithm based on a spherical motion model [7].

To enhance efficiency and minimize encoding time, Zhang et al. present a compression optimization procedure [13]. Choi et al. offer an inventive video compression approach for high-quality video services, together with HDR video coding schemes [14]. Lin et al. propose a massive-scale subject-labeled database comprising compressed H.265/HEVC videos with miscellaneous Perceivable Encoding Artifacts (PEAs) [15]. For enhanced performance, Le et al. designed a transcoding system that plays a vital role in modifying the bit rates and changing the resolution of 360\(^\circ \) videos [10]. Various 360\(^\circ \) video compression techniques, highlights, and challenges are summarized in Table 3.

Table 3 Summary on compression of 360\(^\circ \) video

2.2 Streaming of 360\(^\circ \) Video

This section presents the mechanisms required for 360\(^\circ \) video streaming. Streaming can be based on the FoV or on tiles. The following subsections describe the techniques involved in FoV-based and tile-based streaming.

2.2.1 FoV-Based Streaming

Duanmu et al. established a two-tier framework to improve bandwidth utilization for 360\(^\circ \) video streaming [16]. Skupin et al. propose an optimal way of streaming based on the FoV [17]. Sun et al. propose a two-tier solution that delivers the entire 360\(^\circ \) span of the video in a low-quality base tier and the viewport in a higher-quality enhancement tier [18]. Jiang et al. recommended Plato, a viewport-adaptive streaming scheme using reinforcement learning [19].
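
The two-tier idea can be sketched as a simple budget split: buy the best affordable full-sphere base tier, then spend the remainder on the predicted viewport. The following Python sketch is our simplified illustration of the concept in [16, 18]; the bitrate ladders are hypothetical:

```python
def allocate_two_tier(budget_kbps, base_ladder, enh_ladder):
    """Pick the highest affordable full-sphere base-tier bitrate, then
    spend the remaining budget on the viewport enhancement tier."""
    base = max((r for r in base_ladder if r <= budget_kbps),
               default=min(base_ladder))
    enh = max((r for r in enh_ladder if r <= budget_kbps - base), default=0)
    return base, enh

# Hypothetical bitrate ladders (kbps) and an 8 Mbps budget:
print(allocate_two_tier(8000, [1000, 2000, 4000], [2000, 4000, 8000]))
# -> (4000, 4000): low-quality sphere plus high-quality viewport
```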

Qian et al. introduce a cellular-friendly streaming methodology that conveys only the 360\(^\circ \) video viewport, based on the prediction of head movements [20]. 360\(^\circ \) video streams have greater bandwidth requirements and need quicker responsiveness to viewers' inputs [21]; in this respect, Zhou et al. analyze Oculus 360\(^\circ \) video streaming. To capture the long-term dependency and nonlinear relation between past and future viewpoints, Yang et al. presented a single-viewpoint prediction model built on a CNN [22]. Corbillon et al. give a viewport-adaptive approach that allows the streamed video to have a lower bit rate in comparison with the original video [23]. Table 4 reviews FoV-based streaming of 360\(^\circ \) video.

Table 4 Summary on FoV-based streaming of 360\(^\circ \) video

2.2.2 Tile-Based Streaming

Sanchez et al. illustrate streaming by means of the tiling tactics followed in the Moving Picture Experts Group (MPEG) OMAF specification [24]. Xie et al. presented a probabilistic tile-based adaptive streaming model referred to as 360ProbDASH [25]. Graf et al. propose adaptive tile-based streaming over HTTP to solve the problems faced by video delivery infrastructures [26].

As the complexity of 360\(^\circ \) video increases, bitrate adaptation for a varying network becomes essential [27]; hence, Le Feuvre and Concolato used the MPEG DASH (Dynamic Adaptive Streaming over HTTP) standard to describe how spatial access can be attained. Kammachi-Sreedhar and Curcio described an optimized streaming technology [28].

Nguyen et al. suggest a flexible method for tile-based viewport streaming [29]. Due to network latency, 360\(^\circ \) video streaming is a difficult task [30]; hence, Mahzari et al. recommended a tile-based caching policy. In real-life cellular networks, tiled video is a probable solution for aggressively minimizing the bandwidth essential for 360\(^\circ \) video transmission [31]; accordingly, Lo et al. report the performance of tile-based streaming over a cellular network. High-quality streaming is limited by power consumption and bandwidth effectiveness [32]; hence, Son et al. offer a tile-based streaming approach. A summary of tile-based streaming of 360\(^\circ \) video is shown in Table 5.
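
Common to these tile-based schemes is a selection step that marks which tiles to fetch in high quality for a predicted viewport. The sketch below is our coarse approximation (the tile grid, FoV, and margin are assumptions), using the angular distance between the view direction and each ERP tile center:

```python
import numpy as np

def select_tiles(yaw_deg, pitch_deg, fov_deg=100.0, cols=8, rows=4):
    """Mark ERP tiles whose centers lie within half the FoV (plus half a
    tile width as margin) of the predicted view direction."""
    def unit(yaw, pitch):
        return np.array([np.cos(pitch) * np.sin(yaw),
                         np.sin(pitch),
                         np.cos(pitch) * np.cos(yaw)])
    view = unit(np.radians(yaw_deg), np.radians(pitch_deg))
    mask = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            t_yaw = ((c + 0.5) / cols - 0.5) * 2 * np.pi    # tile-center longitude
            t_pitch = (0.5 - (r + 0.5) / rows) * np.pi      # tile-center latitude
            ang = np.degrees(np.arccos(np.clip(view @ unit(t_yaw, t_pitch), -1, 1)))
            mask[r, c] = ang <= fov_deg / 2 + 180.0 / cols  # margin: half a tile
    return mask

# Tiles to request in high quality for a viewer looking straight ahead:
print(select_tiles(0.0, 0.0).astype(int))
```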

Table 5 Summary on tile-based streaming of 360\(^\circ \) video

2.3 Post-production of 360\(^\circ \) Video

At the user end, the stored or streamed content is post-processed. Post-production eases comprehension and provides seamless visualization and a better user experience. Several methods for post-production are discussed in the following subsections.

2.3.1 Visualization

In live broadcasting, the broadcaster may not be aware of the users' FoV [33]. In this respect, Takada et al. propose a visualization method based on users' Points of View (PoV) that uses a spherical heat map, allowing the broadcaster to grasp users' FoV easily and exchange information with users smoothly. Azevedo et al. analyze how visual distortions alter 360\(^\circ \) video signals in immersive applications [1]. Existing techniques, highlights, and challenges of visualization in 360\(^\circ \) video are summarized in Table 6.
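
A spherical heat map of the kind used by Takada et al. can be approximated by binning viewing directions on an equirectangular grid. The following is a minimal sketch under that assumption (function name and bin sizes are ours):

```python
import numpy as np

def pov_heatmap(yaws_deg, pitches_deg, cols=72, rows=36):
    """Accumulate viewing directions into an equirectangular heat map
    (one bin per 5 degrees), normalized to a fraction of samples."""
    heat = np.zeros((rows, cols))
    for yaw, pitch in zip(yaws_deg, pitches_deg):
        c = int((yaw % 360.0) / 360.0 * cols) % cols
        r = int((90.0 - np.clip(pitch, -89.9, 89.9)) / 180.0 * rows)
        heat[r, c] += 1
    return heat / max(len(yaws_deg), 1)

# Three viewers looking roughly at the front of the scene:
print(pov_heatmap([0.0, 5.0, 355.0], [0.0, 2.0, -3.0]).max())
```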

Table 6 Summary on visualization of 360\(^\circ \) video

2.3.2 Viewport Prediction

User head movements drive interaction with and modification of the spatial parts of the video, allowing users to view only the essential portions of the video at a given time [9]. To exploit this, Aksu et al. offered a novel adaptable framework for viewport prediction. Heyse et al. offered a contextual-bandit approach based on reinforcement learning [34]. Jiang et al. proposed viewport-adaptive streaming in which the tiles that map to the field of view are provided at high resolution [19]. Hu et al. recommended "deep 360\(^\circ \) pilot", an agent-based deep learning mechanism that pilots viewers through 360\(^\circ \) sports videos automatically, and developed an agent-specific domain with a clear definition of the objects in the video [35].

To analyze visual quality at the viewport under end-to-end delay, Sanchez et al. proposed a viewport-dependent scheme with a gain of 46% compared with a viewport-independent scheme [24]. Foreseeing the future PoV over a long time horizon can save bandwidth for on-demand streaming, in which video stalling is diminished despite noteworthy bandwidth variations in the network [36]. To support this, Li et al. introduced two cluster-based PoV prediction models. Table 7 summarizes viewport prediction for 360\(^\circ \) video.
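
A common baseline against which learned predictors such as [22, 36] are compared is linear extrapolation of recent head orientation. Below is a minimal sketch for the yaw component (our illustration, not the cited models), with yaw unwrapped to handle the \(\pm 180^\circ \) seam:

```python
import numpy as np

def predict_yaw(times, yaws_deg, horizon):
    """Linear-regression extrapolation of yaw: fit a line to recent
    samples and evaluate it `horizon` seconds past the last one."""
    yaws = np.unwrap(np.radians(yaws_deg))          # remove +/-180 deg jumps
    slope, intercept = np.polyfit(times, yaws, 1)
    pred = slope * (times[-1] + horizon) + intercept
    return (np.degrees(pred) + 180.0) % 360.0 - 180.0

# Head panning right at roughly 20 deg/s; predict 1 s ahead:
t = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
print(predict_yaw(t, 20.0 * t, horizon=1.0))        # ~40 degrees
```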

Table 7 Summary on viewport prediction of 360\(^\circ \) video

2.3.3 Designing Interface

Pavel et al. presented a technique based on interactive shot orientation, enabling users to view all the significant content in the film [37]. Poblete et al. proposed a scalable crowdsourced design approach [38]. Tang and Fakourfar supported collaborative viewing and interaction through gaze awareness and gesture techniques for 360\(^\circ \) videos [39]. Interface design for 360\(^\circ \) video is reviewed in Table 8.

Table 8 Summary on designing interface of 360\(^\circ \) video

2.3.4 User Experience

Broeck et al. proposed numerous interaction methodologies [40]. One challenge of watching 360\(^\circ \) videos is endlessly focusing and refocusing on intended targets [41]. To overcome this, Lin et al. studied two focus-guidance approaches: Automatic Piloting (directly taking audiences to the target) and Visual Supervision (indicating the direction of the target). Nasrabadi et al. proposed a taxonomy of 360\(^\circ \) videos, classifying them based on camera and object motion [42]. Existing 360\(^\circ \) video user-experience techniques, highlights, and challenges are reviewed in Table 9.

Table 9 Summary on user experience of 360\(^\circ \) video

2.3.5 Cybersickness

Bala et al. presented an experimental study comparing and combining numerous available methodologies for minimizing cybersickness in 360\(^\circ \) video [43]. Cybersickness in 360\(^\circ \) video is summarized in Table 10.

Table 10 Summary on cybersickness of 360\(^\circ \) video

2.3.6 Summarization

For long 360\(^\circ \) videos, Sung et al. addressed the issue of story-based temporal summarization [44], proposing an innovative memory-network-based model (Past-Future Memory Network). Available techniques, highlights, and challenges of 360\(^\circ \) video summarization are listed in Table 11.

Table 11 Summary on summarization of 360\(^\circ \) video

2.3.7 Subtitle

Brown et al. specify four subtitle behaviors (120-degree, static-follow, lag-follow, appear) and carry out user testing of the 360\(^\circ \) video experience [45]. A detailed review of subtitles for 360\(^\circ \) video is given in Table 12.
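
Of these behaviors, lag-follow can be sketched as exponential smoothing of the subtitle's yaw toward the viewer's head yaw. The following is our illustration of the behavior named in [45]; the smoothing factor is an assumption:

```python
def lag_follow(head_yaws_deg, smoothing=0.15):
    """Sketch of a lag-follow subtitle: each frame, the subtitle yaw
    moves a fraction of the remaining angular distance toward the head
    yaw, so it trails head motion instead of being rigidly locked."""
    sub_yaw, path = head_yaws_deg[0], []
    for head in head_yaws_deg:
        delta = (head - sub_yaw + 180.0) % 360.0 - 180.0   # shortest arc
        sub_yaw = (sub_yaw + smoothing * delta) % 360.0
        path.append(sub_yaw)
    return path

# The viewer snaps 90 degrees right; the subtitle catches up gradually:
print([round(y, 1) for y in lag_follow([0.0] * 3 + [90.0] * 5)])
```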

Table 12 Summary on subtitle of 360\(^\circ \) video

2.4 Quality Evaluation of 360\(^\circ \) Video

This section reviews the literature on assessing the quality of user-generated 360\(^\circ \) videos. Existing works are discussed in the following subsections.

2.4.1 Standardization

Wien et al. addressed the current status of standardization, focusing on the scientific aspects associated with immersive video [46]. Hannuksela et al. give an outline of the first edition of the OMAF standard [47]. Skupin et al. presented the up-to-date status of ongoing standardization efforts [17]. Azevedo et al. offered some standardization techniques [1]. Domanski et al. discussed different kinds of highly immersive visual media [48]. Table 13 describes 360\(^\circ \) video standardization techniques, highlights, and challenges.

Table 13 Summary on standardization of 360\(^\circ \) video

2.4.2 Stabilization

Kopf offers a hybrid 2D-3D procedure for 360\(^\circ \) video stabilization by means of a deformable rotational motion model [49]. Tang et al. introduce an approach for joint stabilization and direction of 360\(^\circ \) videos [50], which includes a carefully designed new motion estimation technique for 360\(^\circ \) videos. Stabilization of 360\(^\circ \) video is summarized in Table 14.
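
The rotation-only core of such stabilizers can be sketched as follows: low-pass filter the measured orientation path and apply, per frame, the rotation that maps the raw path onto the smoothed one. This is our simplified illustration (it assumes per-frame orientations are already estimated), not the cited pipelines:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def stabilizing_rotations(euler_deg, window=9):
    """Per-frame corrective rotations: low-pass filter the camera's
    orientation path and return (smoothed * raw^-1) for each frame."""
    angles = np.unwrap(np.radians(euler_deg), axis=0)    # avoid angle wrap
    kernel = np.ones(window) / window
    smooth = np.column_stack([np.convolve(a, kernel, mode='same')
                              for a in angles.T])
    raw = R.from_euler('yxz', angles)
    target = R.from_euler('yxz', smooth)
    return target * raw.inv()   # maps the shaky path onto the smooth one

# A synthetic shaky orientation path (yaw, pitch, roll per frame, degrees):
shaky = np.cumsum(np.random.randn(100, 3), axis=0) * 0.2
print(stabilizing_rotations(shaky)[0].as_euler('yxz', degrees=True))
```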

Table 14 Summary on stabilization of 360\(^\circ \) video

2.4.3 Assessment

Huang et al. support video quality evaluation and propose a latitude-based visual attention model for 360-degree videos [51]. Hanhart et al. describe the quality evaluation scheme adopted by the JVET of ITU-T VCEG and ISO/IEC MPEG [52]. Zakharchenko et al. discussed the immersive media delivery format and quality assessment process [53]. Tran et al. investigated both subjective and objective quality benchmarks for 360\(^\circ \) videos [54]. For 360\(^\circ \) video communication, Tran et al. help identify suitable objective quality benchmarks [55]. Xie et al. presented a QoE-based optimization framework [25]. Jiang et al. suggested Plato, which outperforms existing strategies on numerous QoE metrics [19]. Corbillon et al. recommended an interactive high-quality mechanism for QoE measurement with Head-Mounted Display audiences under minimal supervision [23]. Table 15 covers quality assessment of 360\(^\circ \) video.
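
One widely used objective metric in the JVET evaluation scheme is WS-PSNR, which weights each ERP pixel by the cosine of its latitude so that oversampled polar rows do not dominate the error. A minimal single-channel Python sketch (our illustration):

```python
import numpy as np

def ws_psnr(ref, dist, max_val=255.0):
    """WS-PSNR for single-channel ERP frames: squared error weighted by
    cos(latitude), per the weighting used for ERP."""
    h, w = ref.shape
    lat = (np.arange(h) + 0.5 - h / 2) * np.pi / h          # row latitude
    weight = np.repeat(np.cos(lat)[:, None], w, axis=1)
    err = (ref.astype(float) - dist.astype(float)) ** 2
    wmse = np.sum(weight * err) / np.sum(weight)
    return 10 * np.log10(max_val ** 2 / wmse)

ref = np.random.randint(0, 256, (960, 1920))
dist = np.clip(ref + np.random.randint(-3, 4, ref.shape), 0, 255)
print(round(ws_psnr(ref, dist), 2))    # high value for mild distortion
```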

Table 15 Summary on assessment of 360\(^\circ \) video

3 Observations

The following are the observations made through this study:

  • 360\(^\circ \) video is gaining interest among consumers due to its simplicity.

  • During projection, visual artifacts (also termed distortions) may occur. Hence, extra care must be taken during projection so that the contents of the clip can be used effectively.

  • Once projected, 360\(^\circ \) videos undergo coding for efficient storage and transmission, in which they are compressed while preserving video quality.

  • As the field of view widens, streaming becomes more difficult. Hence, 360\(^\circ \) video must be streamed efficiently with good visual quality.

  • The highly immersive nature of the video should not lead to motion sickness.

  • 360\(^\circ \) video can be delivered to the user optimally by summarizing all the significant information available in the clip.

  • For a clear understanding of the information in a 360\(^\circ \) clip, the video can be streamed with closed captions (i.e., in text form).

  • To maximize the smoothness of visual quality, the video can be stabilized.

  • At the user end, the quality of the video can be checked using quality metrics.

4 Conclusion

360\(^\circ \) video can offer an immersive experience for users. As the FoV of 360\(^\circ \) video is larger than that of standard videos, it encompasses a huge amount of information. Due to the high resolution, 360\(^\circ \) video processing, transmission, and display have to be done efficiently. This article presented the various techniques, highlights, and challenges involved in processing, transmitting, and displaying 360\(^\circ \) video. At the viewer end, the decoded video has to be checked for standardization, stabilization, and quality of experience, to ensure high standards, increased immersion, and improved QoE, respectively. The techniques involved in standardization, stabilization, and QoE measurement are listed in this survey with their highlights and challenges. The overarching challenges in 360\(^\circ \) video are achieving high compression rates and improving quality-driven viewport prediction.

As for future trends, 360\(^\circ \) video is growing at a fast pace, and the technology will soon experience a huge leap. The major role of 360\(^\circ \) video is storytelling within an immersive environment. Further cost reductions may be possible in the coming years, bringing the immersive experience to more users. Rapid improvement in 360\(^\circ \) technology and inexpensive equipment will make 360\(^\circ \) video spread swiftly across many industries in the near future. In the upcoming years, 360\(^\circ \) technology is expected to provide high-end video capture with High Dynamic Range (HDR).