1 Introduction

In recent years, much research has focused on image quality assessment (IQA), and stereoscopic/3D IQA has become a particularly active topic.

The existing stereoscopic image quality assessment (SIQA) methods can be classified into full-reference (FR) [1,2,3], reduced-reference (RR) [4], and no-reference (NR) methods [5,6,7,8,9]. Our work focuses on no-reference stereoscopic image quality assessment (NR-SIQA), in which no reference image information is available. Akhter et al. [5] propose an NR-SIQA algorithm that extracts segmented local features of artifacts from stereo pairs and the estimated disparity map. Chen et al. [6] first use a binocular fusion model to compute a cyclopean image from the left and right images of a stereo pair; 2D and 3D features are then extracted in the spatial domain using natural scene statistics (NSS). In [7], Su et al. also adopt a binocular combination model to generate a convergent cyclopean image from the left and right images, from which spatial-domain univariate NSS features, wavelet-domain univariate NSS features, and bivariate density and correlation NSS features are extracted. Zhou et al. [8] utilize the complementary local patterns of the binocular energy response and the binocular rivalry response to simulate binocular visual perception; the local patterns of the binocular responses' encoding maps form various quality-predictive features. In [9], Wang et al. construct feature vectors from the binocular energy response and then apply machine learning to train a visual quality prediction model. However, none of these methods simultaneously considers the perceptual factors affecting 2D image quality and 3D stereo perception, and the 3D visual characteristics are only partially simulated. A well-designed NR-SIQA method should therefore exploit quality-relevant features from the individual left and right images, complement them with features from a transformed domain of the cyclopean image, and incorporate further 3D visual perception information such as binocular disparity.

In this paper, we propose a perceptual NR-SIQA algorithm that considers quality-relevant information from the stereo pair, the cyclopean image, and the binocular disparity map. After these are transformed by a steerable pyramid decomposition, features are extracted from the wavelet coefficients using natural scene statistics. Support vector regression (SVR) is then used to learn a regression model that predicts the quality of a stereoscopic image.

The remainder of this paper is organized as follows. The proposed algorithm is described in detail in Sect. 2. Experimental results are analyzed in Sect. 3, and finally conclusions are drawn in Sect. 4.

2 Proposed Algorithm

The flowchart of the proposed NR-SIQA algorithm is shown in Fig. 1.

Fig. 1. Flowchart of the proposed algorithm

2.1 Binocular Disparity Search

The binocular disparity of a point in a 3D image is the distance between its two projected points in the left and right images; estimating the disparity of a point in the left image therefore amounts to finding the corresponding point in the right image.

In this paper, a Gaussian-weighted SSIM-based disparity search algorithm is proposed. The SSIM index [10] measures the similarity between two image patches by

$$ {\text{SSIM}}\left( {l,r} \right) = \frac{{\left( {2\mu_{l} \mu_{r} + c_{1} } \right)\left( {2\sigma_{lr} + c_{2} } \right)}}{{\left( {\mu_{l}^{2} + \mu_{r}^{2} + c_{1} } \right)\left( {\sigma_{l}^{2} + \sigma_{r}^{2} + c_{2} } \right)}} $$
(1)

where \( \mu_{l} \) and \( \mu_{r} \) are the means of the left and right image patches, \( \sigma_{l}^{2} \) and \( \sigma_{r}^{2} \) are their variances, and \( \sigma_{lr} \) is the covariance of \( l \) and \( r \). \( c_{1} \) and \( c_{2} \) are two small constants that keep the expression well defined.

In the process of disparity search, the leftmost part of the left image and the rightmost part of the right image are discarded, since they are not captured by both cameras. For any point in the left image with coordinates \( \left[ {x_{l} ,y_{l} } \right] \), its corresponding point in the right image is searched from \( [x_{l} - {\text{range}},y_{l} ] \) to \( [x_{l} + {\text{range}},y_{l} ] \); in our experiments, the range is 32 pixels. The matched point is the candidate in the search range with the largest similarity to the current point in the left image: the SSIM values between the point in the left block and the candidate points in the right image are computed and merged with a Gaussian-weighted sum. Figure 2 shows a stereo pair and its disparity image. Figures 2(a) and (b) show the right and left views of a high-quality stereo pair, and the estimated disparity image is shown in Fig. 2(c). Figures 2(d) and (e) show a stereo pair with Gaussian noise distortion, whose estimated disparity is shown in Fig. 2(f).
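
To make the search concrete, below is a minimal sketch of such a Gaussian-weighted SSIM disparity search. The window parameter sigma, the wrap-around shift, and the SSIM constants c1 and c2 are illustrative assumptions; the paper only fixes the 32-pixel search range.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ssim_map(left, right, sigma=1.5, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Per-pixel SSIM of Eq. (1) with Gaussian-weighted local statistics."""
    mu_l = gaussian_filter(left, sigma)
    mu_r = gaussian_filter(right, sigma)
    var_l = gaussian_filter(left * left, sigma) - mu_l ** 2
    var_r = gaussian_filter(right * right, sigma) - mu_r ** 2
    cov = gaussian_filter(left * right, sigma) - mu_l * mu_r
    return ((2 * mu_l * mu_r + c1) * (2 * cov + c2)) / (
        (mu_l ** 2 + mu_r ** 2 + c1) * (var_l + var_r + c2))

def disparity_search(left, right, search_range=32):
    """For each left-image pixel, pick the horizontal shift d (|d| <= range)
    maximizing the local SSIM against the shifted right image."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    best = np.full(left.shape, -np.inf)
    disp = np.zeros(left.shape, dtype=np.int32)
    for d in range(-search_range, search_range + 1):
        # shifted[:, x] == right[:, x - d]; np.roll wraps at the border,
        # consistent with discarding the unmatched border strips afterwards.
        s = ssim_map(left, np.roll(right, d, axis=1))
        better = s > best
        best[better] = s[better]
        disp[better] = d
    return disp
```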

Fig. 2. Stereo pairs and disparity image

2.2 Cyclopean Image Generation

A cyclopean image is the single mental image of a scene created by the brain when it combines the two images from the two eyes. This process can be explained by the binocular vision combination characteristic.

In this paper, a recent biological model, the gain-control theory model, is utilized to simulate binocular fusion and explain cyclopean perception. The first step of cyclopean image generation is to choose the base image between the left and right images. In our implementation, the image with the better quality is selected as the base image, while the other serves as the auxiliary image.

If the left image is selected as the base image, the synthesized cyclopean image \( {\text{I}}({\text{x}},{\text{y}}) \) is calculated as follows,

$$ {\text{I}}\left( {x,y} \right) = \omega_{L} \left( {x,y} \right) \cdot I_{L} \left( {x,y} \right) + \omega_{R} \left( {x - D_{L} (x,y),y} \right) \cdot I_{R} \left( {x - D_{L} (x,y),y} \right) $$
(2)

where

$$ \omega_{L} = \frac{{E_{L} (x,y)}}{{E_{L} \left( {x,y} \right) + E_{R} (x - D_{L} \left( {x,y} \right),y)}} ; \ \omega_{R} = \frac{{E_{R} (x - D_{L} \left( {x,y} \right),y)}}{{E_{L} \left( {x,y} \right) + E_{R} (x - D_{L} \left( {x,y} \right),y)}} $$

If instead the right image is selected as the base image, the synthesized cyclopean image \( {\text{I}}({\text{x}},{\text{y}}) \) is calculated as,

$$ {\text{I}}\left( {x,y} \right) = \omega_{L} \left( {x + D_{R} (x,y),y} \right) \cdot I_{L} \left( {x + D_{R} (x,y),y} \right) + \omega_{R} \left( {x,y} \right) \cdot I_{R} \left( {x,y} \right) $$
(3)

where

$$ \omega_{L} = \frac{{E_{L} \left( {x + D_{R} \left( {x,y} \right),y} \right)}}{{E_{L} \left( {x + D_{R} \left( {x,y} \right),y} \right) + E_{R} (x,y)}}; \ \omega_{R} = \frac{{E_{R} (x,y)}}{{E_{L} \left( {x + D_{R} \left( {x,y} \right),y} \right) + E_{R} (x,y)}} $$

In Eqs. (2) and (3), \( D_{L} (x,y) \) and \( D_{R} (x,y) \) are the disparity maps estimated using the left and right image as the base image, respectively. \( \omega_{L} \) and \( \omega_{R} \) are weighting maps, and \( E_{L } \) and \( E_{R} \) denote the sums of the energies of the wavelet coefficients computed with a steerable pyramid. A stereo pair and the synthesized cyclopean image are shown in Fig. 3. Figures 3(a) and (b) show an undistorted stereo pair, and the cyclopean image computed from them is shown in Fig. 3(c). A stereo pair with Gaussian noise distortion is shown in Figs. 3(d) and (e), and Fig. 3(f) shows the corresponding cyclopean image.
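
For illustration, a minimal sketch of Eq. (2) (left image as base) follows, assuming the energy maps \( E_{L} \), \( E_{R} \) and the disparity map \( D_{L} \) are precomputed, and that out-of-range coordinates are clamped at the image border (our assumption; the paper does not specify border handling).

```python
import numpy as np

def cyclopean_left_base(i_l, i_r, e_l, e_r, d_l, eps=1e-12):
    """Synthesize the cyclopean image of Eq. (2) with the left image as base."""
    h, w = i_l.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xr = np.clip(xs - d_l.astype(int), 0, w - 1)  # x - D_L(x, y), clamped
    e_r_m = e_r[ys, xr]                # E_R at the matched right-image point
    w_l = e_l / (e_l + e_r_m + eps)    # omega_L from Eq. (2)
    w_r = e_r_m / (e_l + e_r_m + eps)  # omega_R from Eq. (2)
    return w_l * i_l + w_r * i_r[ys, xr]
```

The right-image-base case of Eq. (3) is symmetric, sampling the left image at \( x + D_{R}(x,y) \) instead.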

Fig. 3. Stereo pairs and cyclopean image

2.3 Features Extraction

The strategy adopted for feature extraction is based on natural scene statistics (NSS), which has proven effective in image quality assessment [11].

To extract features, the estimated disparity image, the synthesized cyclopean image, and the better-quality image of the stereo pair are processed by wavelet decomposition to form oriented band-pass responses. The motivation is that a scale-space-orientation decomposition models well the visual signal processing mechanism of the primary visual cortex in the human visual system (HVS).

In our implementation, we perform the wavelet decomposition over 2 scales and 6 orientations. Thus, 12 sub-bands across scales and orientations, labeled \( S_{\beta }^{\theta } \left( {\upbeta \in \left\{ {1,2} \right\} {\text{and }}\uptheta \in \left\{ {0^{^\circ } ,30^{^\circ } ,60^{^\circ } ,90^{^\circ } ,120^{^\circ } ,150^{^\circ } } \right\}} \right) \), are obtained for each image. A series of statistical features is then extracted from the sub-band coefficients as follows.
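
As a concrete illustration, one way to obtain the 12 oriented sub-bands \( S_{\beta }^{\theta } \) is sketched below, assuming the open-source pyrtools package (a Python port of the steerable pyramid toolbox; our choice of tool, not one named in the paper). In its convention, order = number of orientations minus one.

```python
import pyrtools as pt

def steerable_subbands(image, n_scales=2, n_orients=6):
    """Return {(scale, orientation_index): coefficient array}, where
    orientation_index o corresponds to theta = 30 * o degrees."""
    pyr = pt.pyramids.SteerablePyramidFreq(image, height=n_scales,
                                           order=n_orients - 1)
    # pyr_coeffs is keyed by (scale, band); relabel scales as 1 and 2.
    return {(s + 1, o): pyr.pyr_coeffs[(s, o)]
            for s in range(n_scales) for o in range(n_orients)}
```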

2.3.1 Single Sub-band-Based NSS Feature

To extract single-scale features, it is not necessary to use all the sub-bands, since some of them are correlated with others. In this paper, four sub-bands (i.e., \( S_{1}^{{0^{^\circ } }} ,S_{1}^{{90^{^\circ } }} ,S_{2}^{{0^{^\circ } }} \,{\text{and}}\,S_{2}^{{90^{^\circ } }} \)) are chosen for the single sub-band based features.

The single sub-band coefficient statistics of the cyclopean image in Fig. 3(c) under various distortions are shown in Fig. 4. The probability density distribution of the sub-band coefficients exhibits a Gaussian-like appearance, which can be effectively captured by a generalized Gaussian distribution (GGD). The density function of a GGD with zero mean is given by,

Fig. 4. Single sub-band statistics of the cyclopean image

$$ f(x;\alpha ,\sigma^{2} ) = \frac{\alpha }{{2\beta\Gamma (1/\alpha )}}\exp \left( { - \left( {\frac{|x|}{\beta }} \right)^{\alpha } } \right) $$
(4)

where \( \beta = \sigma \sqrt {\frac{{\Gamma (1/\alpha )}}{{\Gamma (3/\alpha )}}} \) and \( \Gamma ( \cdot ) \) is the gamma function: \( \Gamma (\alpha ) = \int_{0}^{\infty } {t^{\alpha - 1} e^{ - t} dt} ,\;\alpha > 0 \). In the GGD model, the shape parameter \( \upalpha \) controls the shape of the distribution and \( \sigma^{2} \) controls its variance. We use \( \upalpha \) and \( \sigma^{2} \) as quality-relevant features; they can be reliably estimated using the moment-matching based approach in [12]. These features are extracted from the wavelet coefficients of the cyclopean image, the disparity image, and the stereo pair.
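
The moment-matching estimation can be sketched as follows: for a zero-mean GGD, the ratio \( (E|x|)^{2} /E[x^{2} ] = \Gamma (2/\alpha )^{2} /(\Gamma (1/\alpha )\Gamma (3/\alpha )) \) depends only on \( \alpha \), so \( \alpha \) can be recovered by a grid search and \( \sigma^{2} \) taken as the sample variance. This is a sketch in the spirit of [12], not necessarily the authors' exact estimator.

```python
import numpy as np
from scipy.special import gamma

def fit_ggd(coeffs, alphas=np.arange(0.2, 10.0, 0.001)):
    """Return (alpha, sigma2) of a zero-mean GGD fitted to `coeffs`."""
    coeffs = coeffs.ravel()
    sigma2 = np.mean(coeffs ** 2)
    # Sample value of the alpha-identifying moment ratio (E|x|)^2 / E[x^2].
    rho_hat = np.mean(np.abs(coeffs)) ** 2 / sigma2
    # Theoretical ratio as a function of alpha; pick the closest match.
    rho = gamma(2 / alphas) ** 2 / (gamma(1 / alphas) * gamma(3 / alphas))
    alpha = alphas[np.argmin(np.abs(rho - rho_hat))]
    return alpha, sigma2
```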

2.3.2 Spatial Correlation-Based NSS Feature

There is a high correlation between sub-bands at the same scale and different orientations, as well as between sub-bands at different scales and the same orientation.

In Fig. 5(a), we plot the probability density distribution of the coefficients formed by \( [S_{1}^{{0^{^\circ } }} :S_{1}^{{90^{^\circ } }} ] \) to demonstrate the correlation between sub-bands at the same scale and different orientations. To show the correlation across scales at the same orientation, the probability density distribution of the coefficients formed by \( [S_{1}^{{0^{^\circ } }} :S_{2}^{{0^{^\circ } }} ] \) is shown in Fig. 5(b).

Fig. 5. Sub-band statistics of spatial correlation

To extract features, the GGD model is again used to fit these Gaussian-like distributions. The GGD parameters \( (\upalpha,\sigma^{2} ) \) are extracted as features from the wavelet coefficients of the cyclopean image, the disparity image, and the stereo pair.
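
A minimal sketch of this step is shown below, under our own reading of the \( [S_{i} :S_{j} ] \) notation (not stated explicitly in the paper) as pooling the coefficients of the two sub-bands into a single sample set before the GGD fit; fit_ggd is the estimator sketched in Sect. 2.3.1.

```python
import numpy as np

def correlation_feature(subband_a, subband_b):
    # Hypothetical reading of [S_i : S_j]: pool the coefficients of the two
    # correlated sub-bands and fit one GGD to the pooled set.
    pooled = np.concatenate([subband_a.ravel(), subband_b.ravel()])
    return fit_ggd(pooled)  # -> (alpha, sigma2) feature pair
```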

2.3.3 Spatial Difference-Based NSS Feature

In our work, we calculate the difference between two sub-band coefficients at the same scale across different orientations as,

$$ D_{1}^{{(\theta_{i} - \theta_{j} )}} = S_{1}^{{\theta_{i} }} - S_{1}^{{\theta_{j} }} ;\,D_{2}^{{(\theta_{i} - \theta_{j} )}} = S_{2}^{{\theta_{i} }} - S_{2}^{{\theta_{j} }} $$
(5)

where \( \theta_{i} \in \{ 0^{^\circ } \} \) and \( \theta_{j} \in \{ 0^{^\circ } ,30^{^\circ } ,60^{^\circ } ,90^{^\circ } ,120^{^\circ } ,150^{^\circ } \} \).

The probability density distributions of \( D_{1}^{{(0^{^\circ } - 90^{^\circ } )}} \) and \( D_{2}^{{(0^{^\circ } - 90^{^\circ } )}} \) computed from the cyclopean image are shown in Figs. 6(a) and (b). Similarly, the GGD model is used to fit the distributions of the differences, and the parameters \( (\upalpha,\sigma^{2} ) \) computed from the cyclopean image, the disparity image, and the stereo pair are extracted as features.
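
A short sketch of the difference-feature computation of Eq. (5), reusing steerable_subbands (Sect. 2.3) and fit_ggd (Sect. 2.3.1) from the earlier sketches; skipping the trivial \( 0^{^\circ } - 0^{^\circ } \) difference is our assumption.

```python
def difference_features(bands, n_orients=6):
    """`bands` is the {(scale, orientation_index): coefficients} dict
    returned by steerable_subbands; sub-bands at a scale share one shape."""
    feats = []
    for scale in (1, 2):
        ref = bands[(scale, 0)]            # theta_i = 0 degrees
        for o in range(1, n_orients):      # theta_j = 30, ..., 150 degrees
            alpha, sigma2 = fit_ggd(ref - bands[(scale, o)])
            feats.extend([alpha, sigma2])  # GGD pair per difference map
    return feats
```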

Fig. 6. Sub-band statistics of spatial difference

2.4 Quality Prediction

Machine learning is applied to map these features to a quality score. Specifically, in the training phase, the regression module learns the best mapping between the features and the subjective quality scores (MOSs) included in the 3D image databases. In the test phase, the correlation between the objective scores predicted by the SIQA algorithm and the subjective scores is computed. Multiple iterations of this training and testing procedure are performed by varying the split of the data into training and test sets.
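
A minimal sketch of one such train/test iteration follows, assuming scikit-learn's epsilon-SVR with an RBF kernel and an 80/20 random split; the paper specifies SVR but not the kernel, hyper-parameters, or split ratio, so those choices are illustrative.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

def one_split(features, mos, seed):
    """One random train/test iteration: fit SVR, report PLCC and SROCC."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        features, mos, test_size=0.2, random_state=seed)
    model = SVR(kernel='rbf', C=1.0, epsilon=0.1).fit(x_tr, y_tr)
    pred = model.predict(x_te)
    return pearsonr(pred, y_te)[0], spearmanr(pred, y_te)[0]

# Aggregate performance over repeated random splits, as described above:
# plcc, srocc = np.median([one_split(X, y, s) for s in range(1000)], axis=0)
```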

3 Experimental Results and Analysis

3.1 Databases and Evaluation Criteria

The proposed 3D-IQA method is evaluated on two publicly available subject-rated image quality databases: the LIVE 3D IQA Database Phase I [13] and the LIVE 3D IQA Database Phase II [6]. The LIVE 3D IQA Database Phase I consists of 20 reference images and 365 distorted images. Five types of distortions, namely JPEG and JPEG2000 (JP2K) compression, Gaussian blur (GB), white noise (WN), and Rayleigh fast fading (FF), are symmetrically applied to the left and right reference images at various levels. The LIVE 3D IQA Database Phase II consists of 120 symmetrically and 240 asymmetrically distorted images generated from 8 reference images, with the same distortion types as Phase I.

Three criteria, namely the Pearson linear correlation coefficient (PLCC), the Spearman rank order correlation coefficient (SROCC), and the root mean squared error (RMSE), are used to evaluate the SIQA metrics. PLCC and RMSE measure prediction accuracy, and SROCC measures prediction monotonicity. Before computing these criteria, a nonlinear regression analysis is used to map the objective scores to the subjective mean opinion scores (MOSs). For the nonlinear regression, a five-parameter logistic function [14] is used,

$$ f(x) = \beta_{1} \cdot \left( {\frac{1}{2} - \frac{1}{{1 + e^{{\beta_{2} (x - \beta_{3} )}} }}} \right) + \beta_{4} x + \beta_{5} $$
(6)

where \( \beta_{i} ,i = 1,2, \ldots ,5 \), are parameters determined by the best fit between the subjective and objective scores. Higher SROCC and PLCC values and lower RMSE values indicate a better objective SIQA metric.
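
For concreteness, a sketch of this evaluation step using scipy's curve_fit to fit Eq. (6); the initial parameter guesses are heuristics of our own, not values from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic5(x, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping of Eq. (6)."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5

def evaluate(objective, mos):
    """Map objective scores to MOS via Eq. (6), then compute the criteria."""
    p0 = [np.max(mos), 1.0, np.mean(objective), 0.0, np.mean(mos)]  # heuristic
    beta, _ = curve_fit(logistic5, objective, mos, p0=p0, maxfev=10000)
    mapped = logistic5(objective, *beta)
    plcc = pearsonr(mapped, mos)[0]
    srocc = spearmanr(objective, mos)[0]  # rank-based, mapping-invariant
    rmse = np.sqrt(np.mean((mapped - mos) ** 2))
    return plcc, srocc, rmse
```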

3.2 Overall Performance Comparison

The overall performance comparison of the proposed algorithm and other NR-SIQA methods on both the LIVE 3D image database Phase I and Phase II is shown in Table 1. The top metric is highlighted in boldface. Table 1 shows that the proposed algorithm significantly outperforms all other considered metrics on both databases (i.e., its PLCC and SROCC are the highest while its RMSE is the lowest). Meanwhile, the proposed metric and the other NR-SIQA metrics all perform better on the LIVE 3D image quality database Phase I than on Phase II. The reason is that Phase I contains only symmetric distortions, while Phase II contains both symmetric and asymmetric distortions; assessing stereoscopic images with asymmetric distortions is much more difficult because of our limited understanding of how the human visual system responds to asymmetrically distorted stereoscopic images.

Table 1. Overall performance on LIVE Phase I and Phase II

We also evaluate the proposed algorithm on each type of distortion. To make a comprehensive comparison, several state-of-the-art FR-SIQA metrics (i.e., Benoit [1], You [2], and Chen [3]) are also considered. Performances on the LIVE 3D image database Phase I and Phase II are listed in Tables 2 and 3, respectively. The top two metrics are highlighted in boldface, and italicized algorithms are FR-SIQA algorithms. The proposed metric is always among the top two, which means that it predicts stereoscopic image quality consistently across different types of distortions. Among the FR-SIQA metrics, Benoit's and You's methods are based on 2D-IQA algorithms and additionally consider disparity information, and Chen's algorithm [3] uses the SSIM index to assess the quality of the cyclopean image. Even though reference information is available to these FR-SIQA metrics, their performance does not exceed that of our method. Among the NR-SIQA metrics, the scheme in [6] extracts NSS-model-based features of the cyclopean image in the spatial domain, but other 3D visual perception factors are not adequately considered; its performance is therefore worse than that of the proposed algorithm.

Table 2. Performance on each type of distortion on LIVE Phase I
Table 3. Performance on each type of distortion on LIVE Phase II

4 Conclusion

In this paper, we propose a novel no-reference quality assessment algorithm for stereoscopic images using wavelet decomposition and natural scene statistics. We find that the statistics of the wavelet coefficients can be effectively captured by a generalized Gaussian distribution (GGD), so the parameters of the GGD are extracted as quality-relevant features. The features are computed from the stereo pair, the cyclopean image, and the binocular disparity image. Finally, machine learning is applied to map these features to a quality score. Experimental results demonstrate that the proposed algorithm achieves high consistency with subjective assessment on two publicly available 3D image quality assessment databases.