1 Introduction

Stereo imaging is an important direction for the future development of information processing, and stereo image quality evaluation drives progress in stereoscopic imaging technology (Su et al. 2015). Image quality assessment is a key technology and a fundamental research topic in image technology. Research on image quality assessment has made great progress, especially for two-dimensional images, for which mature subjective and objective schemes already exist. By comparison, research on the quality evaluation of stereoscopic images is still in its infancy (Yang et al. 2015; Lee et al. 2013). With the increasing attention to and application of stereo image technology in recent years, the development of stereo image quality assessment has become urgent.

Stereo image technology provides depth information that enhances the stereoscopic sense of an image (Wöpking 1995). In stereoscopic image compression and transmission systems, it is often necessary to evaluate the degree of stereo image compression or to adjust the related parameters in order to achieve better compression or transmission (IJsselsteijn et al. 2000). Quality assessment therefore plays an important role in stereo image compression, processing and communication, and studying quality evaluation methods for stereoscopic images is of great practical value. There are two main approaches to stereo image quality assessment: subjective and objective. In subjective quality evaluation, a large number of testers score the visual effect of the test images according to prescribed evaluation rules (Zhou et al. 2011; Wang et al. 2009). The raw data from all tests are processed to obtain the valid experimental data, and the subjective quality value of each test image is then obtained by statistical analysis (Dumić et al. 2017). In objective quality evaluation, a mathematical model is designed to score the test images automatically, and the reliability of the model is then verified through its correlation with subjective quality assessment (Battisti et al. 2015). The main purpose of an objective method is thus to make the objective evaluation value as consistent as possible with subjective perception. However, because the human visual system is not fully understood, it is difficult to establish an accurate and unified model, which directly limits the accuracy of objective evaluation and prevents it from truly reflecting stereo image quality.

The subjective evaluation of stereoscopic images plays an important role in promoting their objective quality evaluation (Lambooij et al. 2007). Unlike the subjective evaluation of two-dimensional images, the subjective evaluation of stereo images faces additional challenges, such as depth perception and asymmetric stereo compression (Wang et al. 2015; Patterson 2007).

Many researchers have explored the subjective evaluation of stereoscopic image quality and have obtained a number of results. Tam (2007) studied the distortion produced during the encoding and transmission of stereoscopic images and gave the quantitative relationship between subjective perceived quality and distortion. Seuntiens et al. (2006) investigated the relationship between the subjective perceived quality of a stereoscopic image and the quality of its left and right views under the blocking and blur distortions introduced during encoding and transmission. Stelmach and Tam (1998) analysed the factors that influence perceived image quality, and their results showed that the human eye is very sensitive to visual depth and to the quality of the left and right images. Lin and Wu (2014) examined the relationship between the perceived quality of stereo images and the quality of the left and right viewpoint images at different compression rates. Their results supported the idea of Stelmach et al. (2000) that the perceived quality of a stereoscopic image is mainly determined by the better of the left and right views. In other words, during rate allocation the quality of one view can be guaranteed while the rate of the other view is reduced appropriately, so that the overall quality of the stereoscopic image remains unchanged. This finding also provides a new perspective for rate allocation theory.

However, the traditional subjective quality evaluation method relies on human observation over multiple tests (Saygili et al. 2011). It is easily restricted and affected by many factors, and its operation is so complex that it cannot be used for batch processing. In addition, it lacks repeatability and has a high cost, so the evaluation results are often inaccurate and unstable (Qi et al. 2015; Lambooij et al. 2011). These shortcomings are the key obstacles to a complete and unified stereoscopic image quality evaluation scheme, and they are also the focus and the main difficulty of current subjective evaluation research.

Human beings are the ultimate recipients of stereoscopic images, and human eyes have binocular vision properties (Kim and Sohn 2011). The human visual system forms depth perception by fusing the binocular disparity between the left and right views, thus creating a three-dimensional sense and improving stereoscopic discrimination. To predict the quality of stereo images reliably, a subjective quality assessment method should be based on a subjective binocular fusion experiment that takes into account the effect of the observer’s limited stereopsis experience. Moreover, traditional image quality evaluation methods cannot simply be transplanted to stereoscopic images: the evaluation should be carried out under binocular fusion, when stereovision is formed, so that it truly reflects the subjective experience of human stereo vision (Seshadrinathan et al. 2010). A stereo image quality assessment method based on binocular vision fusion should therefore be built on the characteristics of stereo vision, combined with human psychophysics and physiology.

Therefore, starting from human visual perception and the mechanism of subjective selective attention, the present study considers the factors that strongly influence image quality through visual attention, so as to better reflect the processing of human binocular vision and to ensure that the output value is consistent with subjective perception.

Subjective assessment provides technical parameters for objective quality evaluation and can reduce the gap between the objective quality value and direct subjective observation. To overcome the shortcoming that the traditional subjective quality evaluation method is restricted by its subjective rating system, the present study proposes a subjective evaluation method for stereo image quality based on the characteristics of human binocular vision and the psychophysics of stereoscopic fusion.

2 Methodology

In this section, a subjective assessment framework for stereo images based on stereoscopic fusion in binocular vision is proposed. As shown in Fig. 1, the framework consists of three major parts: (1) the visual response of the two eyes and the binocular fusion behaviour of the visual system, (2) the neural processing of visual information in the brain and (3) functions, namely perception and action.

Fig. 1 A simplified illustration of the subjective assessment framework

2.1 The stimuli of stereo image

Preparing the test stereo images is the basic work for a subjective assessment experiment. This study first selected six original 3D stereo video sequences downloaded from (ftp://ftp.ivc.polytech.univ-nantes.fr/NAMA3DS1_COSPAD1/Avi_videos/HRC_00_Reference/). All of these original sequences are in full stereo format, with a frame rate of 25 fps and a length of 16 s (Dumić et al. 2017). Then, to obtain stereoscopic image stimuli similar to the type adopted by Moorthy et al. (2013), this study used the left and right views of the first frame of each sequence to form a stereo image pair.

The test stereo image pairs required for the present experiment were partly generated by compressing the original stereo image pairs. Compression was performed with the H.264/AVC encoder from the JVT JM18.6 reference software package. Stereoscopic image compression can be classified according to the relationship between the quantization parameters of the right-view and left-view images (Lin and Wu 2014). The present study used asymmetric stereo image compression (Fig. 2).

Fig. 2 The type of stereo image compression in the subjective assessment study

The asymmetric compressed stereo images used in the experiment were designed as follows. First, the left and right view images of each original stereoscopic image were compressed at compression degrees of 12%, 18%, 24%, 30%, 36%, 42%, 48%, 54%, 60% and 66%, giving left and right view images at each degree of compression. Then, the left view of each asymmetric compressed stereo image pair was taken to be the left view of the original stereo image pair, and the right view was the right view of the original pair with its middle rectangular area replaced by the corresponding region of the compressed right view at the given compression degree. Each left and right view image measured 8.86 cm × 8.86 cm, and the middle area was a square of 1.59 cm × 1.59 cm.
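The construction of the asymmetric right view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_asymmetric_right_view` is hypothetical, the images are assumed to be NumPy arrays of equal shape, and the size of the central square in pixels is derived from the 1.59 cm / 8.86 cm ratio because the pixel dimensions of the stimuli are not given in the text.

```python
import numpy as np


def make_asymmetric_right_view(original_right, compressed_right,
                               patch_frac=1.59 / 8.86):
    """Return a copy of the original right view whose central square
    is taken from the compressed right view.

    patch_frac is the side of the central square relative to the image
    side (assumed from the 1.59 cm / 8.86 cm sizes in the text).
    """
    if original_right.shape != compressed_right.shape:
        raise ValueError("both views must have the same shape")
    height, width = original_right.shape[:2]
    side = int(round(min(height, width) * patch_frac))
    top = (height - side) // 2
    left = (width - side) // 2
    result = original_right.copy()  # leave the original image untouched
    result[top:top + side, left:left + side] = \
        compressed_right[top:top + side, left:left + side]
    return result
```

The left view of the pair is simply the original left view, so only the right view needs this step; calling the function once per compression degree would yield the ten asymmetric pairs for one source image.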

2.2 Observers

Subjective evaluation of stereo image quality places strict requirements on the test environment (Site 2003; Team 2003) and on the choice of observers. In the present study, the whole subjective evaluation was carried out in a darkroom, which was kept quiet and at a moderate temperature throughout the experiment.

A CRT display was used to present the stereoscopic image pairs. The display had a resolution of 1024 × 768 and a refresh rate of 60 Hz, and the viewing angle of the image was 10.5°. The two images of a stereo pair were displayed side by side on the screen at the same time, and the observers viewed them through a binocular stereoscope. The stereoscope presents the two offset images separately to the left and right eyes of the viewer, and these two-dimensional images are then combined in the brain to give the perception of depth.

A total of 18 observers (eight males and ten females, aged 18 to 28 years) participated in the present study. All observers had good stereoscopic vision and no stereo blindness. In addition, physiological measures such as visual acuity and colour discrimination were normal for all observers after correction.

2.3 Design

The presentation of the stereo image sequences was controlled at two levels: the order of the different experimental groups, and the presentation of the test sequences of stereo images within each group. The order of the experimental groups was determined with a Latin square so that different observers received different testing orders. Latin square designs are efficient for studying the effect of one treatment factor in the presence of other factors, and they require all factors to have the same number of levels. A Latin square of order n is an n × n array in which each of n symbols occurs exactly once in every row and every column; in mathematics, this is an elementary property of the multiplication table of every group and the defining property of quasigroups.
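As an illustration of how a Latin square can assign group orders to observers, the following Python sketch builds a cyclic Latin square. The text does not state which Latin square was used, so the cyclic construction, the function names and the rule of assigning observer k to row k mod 6 are all assumptions made for this example.

```python
def cyclic_latin_square(n):
    """Build an n x n cyclic Latin square: row r is 0..n-1 rotated by r."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def presentation_order(observer_index, n_groups=6):
    """Return the order of image groups for one observer.

    Observers are mapped to rows of the square in turn (an assumed
    rule); with 18 observers and 6 groups each row is used three times.
    """
    square = cyclic_latin_square(n_groups)
    return square[observer_index % n_groups]
```

With this assignment, every image group appears in every serial position equally often across the 18 observers, which is the balancing property the design relies on.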

A total of six groups of the original stereo images used in this study are shown in Fig. 3.

Fig. 3 First frame, left view and right view, from each of the tested sequences

Figure 4 illustrates the testing procedure of the experiment. Each trial started with a fixation cross presented in the centre of the screen for 500 ms, after which the target stimulus (Stimulus I) appeared for 2000 ms. A blank screen (300 ms) followed immediately, and then another stimulus (Stimulus II) was presented for 2000 ms. The observers had to respond as accurately and quickly as possible after all the stimuli of the trial had ended. The trial ended once the observer had responded and received feedback.

Fig. 4 Testing procedure

There were six groups of asymmetric compressed stereo images, and each group involved multiple experimental trials. The stereoscopic image presentation process is shown in Fig. 5. For the asymmetric compressed image at each compression degree, the total playing time of the stereoscopic image stimulus is denoted t, a fixed value of 2000 ms in every trial. As shown in Fig. 5, the presentation is divided into three stages. The first stage, of duration t1, presents the original stereo image pair. The second stage, of duration ∆t2, presents the asymmetric compressed stereo image pair. The third stage lasts as long as the first and again presents the original stereo image pair. The initial value of t1 was 900 ms and the initial value of ∆t2 was 200 ms. During the test, ∆t2 was adjusted dynamically according to the observer’s feedback. Because the total time t is fixed, t1 was adjusted accordingly so that t = t1 + ∆t2 + t1. The purpose of the three-stage presentation is to contrast the original stereoscopic images of the first and third stages with the asymmetric compressed stereoscopic image of the second stage, so that, through the human visual attention mechanism, the observer perceives a “jitter” in the middle region of the compressed stereoscopic image.
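The timing constraint t = t1 + ∆t2 + t1 fixes t1 once ∆t2 is known, since t1 = (t − ∆t2)/2. A minimal Python sketch of this bookkeeping follows; the function name `stage_durations` is hypothetical and only the relation itself comes from the text.

```python
def stage_durations(delta_t2_ms, total_ms=2000.0):
    """Return (t1, delta_t2, t1) in ms so that the three stages sum to total_ms."""
    if not 0 <= delta_t2_ms <= total_ms:
        raise ValueError("delta_t2 must lie between 0 and the total time")
    t1 = (total_ms - delta_t2_ms) / 2.0
    return t1, delta_t2_ms, t1
```

For the initial value ∆t2 = 200 ms this gives t1 = 900 ms, which agrees with the initial values stated above.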

Fig. 5 Schematic diagram of the presentation process

During the test, ∆t2 was adjusted with a three-down one-up adaptive procedure (Levitt 1971): after three consecutive correct judgments the duration of the compressed (unrelated) fragment was decreased, and after a single incorrect judgment it was increased. That is, if the observer perceived the “jitter” of the image and judged correctly three times in a row, the presentation time of the asymmetric compressed stereo image was shortened by one step length; if the observer did not perceive the “jitter” and made one wrong judgment, the presentation time was lengthened by one step length. The initial step length d was ∆t2/2, and the step length d in the same direction changed at a ratio of 0.5 until it decreased to a fixed value, set to 1 ms in this test. In this procedure, an inflection point is a point at which the adjustment curve of ∆t2 changes direction, from decreasing to increasing or vice versa. When the step length d reached 1 ms, the control program ended the quality evaluation test.
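The adaptive rule can be made concrete with a short Python sketch of a three-down one-up staircase. Several details are assumptions made for this illustration rather than the authors' program: the class name `ThreeDownOneUp`, halving the step at each change of direction (one reading of the 0.5 ratio), recording the value at which the direction changes as an inflection point, and never letting ∆t2 fall below the minimum step.

```python
class ThreeDownOneUp:
    """Three-down one-up staircase on the duration delta_t2 (in ms)."""

    def __init__(self, start_ms=200.0, min_step_ms=1.0, ratio=0.5):
        self.value = start_ms
        self.step = start_ms / 2.0   # initial step d = delta_t2 / 2
        self.min_step = min_step_ms
        self.ratio = ratio
        self.correct_run = 0         # consecutive correct responses
        self.last_direction = 0      # -1 = down, +1 = up, 0 = none yet
        self.inflections = []        # values at which direction changed

    def finished(self):
        """The test stops once the step has shrunk to the minimum."""
        return self.step <= self.min_step

    def update(self, correct):
        """Feed one response and return the next delta_t2."""
        if correct:
            self.correct_run += 1
            if self.correct_run < 3:
                return self.value    # no change until three in a row
            self.correct_run = 0
            direction = -1           # three correct: shorten
        else:
            self.correct_run = 0
            direction = +1           # one error: lengthen
        if self.last_direction and direction != self.last_direction:
            # Direction changed: record the inflection point and
            # shrink the step (assumed interpretation of the ratio).
            self.inflections.append(self.value)
            self.step = max(self.min_step, self.step * self.ratio)
        self.last_direction = direction
        # Assumed floor: delta_t2 is kept at or above the minimum step.
        self.value = max(self.min_step, self.value + direction * self.step)
        return self.value
```

A control loop would call `update` after every trial, recompute t1 from the returned ∆t2, and stop when `finished()` is true. A three-down one-up rule of this kind converges on the stimulus level that yields about 79.4% correct responses (Levitt 1971).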

2.4 Procedure

All observers were trained before the formal experiment. The purpose of the training was to enable participants to follow the evaluation instructions correctly and to become familiar with the testing process, so as to ensure the accuracy and reliability of the output data.

The formal experiment was performed in a dark room. Stereoscopic vision was made possible by a stereoscope (Stidwill and Fletcher 2010), so that each image was visible to only one of the participant’s eyes. The observers were seated in front of the stereoscope, and their head position was stabilized with a chin rest and a head rest to maintain a constant viewing distance of 48 cm. During the formal testing, the left and right view images under test were displayed at the same time, arranged horizontally on the display, with their centres 17.71° apart.
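The relation between physical size, viewing distance and visual angle can be checked with a short Python sketch. It uses the standard formula θ = 2·arctan(s / 2D) and assumes a simple direct-viewing geometry; the optics of the stereoscope may change the effective distance, so the figures are indicative only, and the function names are hypothetical.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) subtended by an object of the given size."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_from_angle_cm(angle_deg, distance_cm):
    """Physical extent (cm) that subtends the given visual angle."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)
```

Under these assumptions, `visual_angle_deg(8.86, 48)` is about 10.5°, matching the stated viewing angle of the image, and `size_from_angle_cm(17.71, 48)` gives a centre-to-centre separation of roughly 15 cm on the display.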

As in other subjective evaluation methods, the observers in this study were required to complete the test independently, to reduce the influence of external factors on the results. The observers had to choose, from the two alternatives, the one in which the “jitter” phenomenon appeared in the middle area of the stereoscopic image, as quickly and accurately as possible. Specifically, observers pressed the left mouse button if they perceived the “jitter” phenomenon and the right mouse button otherwise. When the test sequence of a trial had finished, the observer responded immediately, and real-time feedback indicating whether the response was correct was then displayed on the screen. The testing session for each group lasted about 10–15 min. After each session, the observers rested for 2–3 min before continuing with the next session, to avoid visual fatigue and physical discomfort during prolonged testing.

3 Experimental data analysis and results

The experimental data collected in this study are the values of the presentation time ∆t2, that is, the playing time of the asymmetric compressed stereo images at each compression ratio level. The test values of ∆t2 at each compression ratio level include 15 inflection points, and the average of the final six inflection points was taken as the measured threshold at that level. The testing procedure for each group at each compression ratio level was repeated three times, and the mean of the three measured thresholds was used as the final threshold for that compression quality grade.
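The threshold computation described above amounts to two nested averages, sketched here in Python. The function names are hypothetical, and the input is assumed to be the list of ∆t2 values (in ms) recorded at the inflection points of each run, in chronological order.

```python
from statistics import mean


def run_threshold(inflection_values_ms, n_last=6):
    """Threshold of one run: mean of the final n_last inflection points."""
    if len(inflection_values_ms) < n_last:
        raise ValueError("not enough inflection points in this run")
    return mean(inflection_values_ms[-n_last:])


def final_threshold(repeated_runs, n_last=6):
    """Final threshold: mean of the per-run thresholds over repeated runs."""
    return mean(run_threshold(run, n_last) for run in repeated_runs)
```

In the experiment, `repeated_runs` would hold three runs of 15 inflection points each for a given image group and compression level.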

The experimental results obtained with the measurement method described above are shown in Table 1.

Table 1 Experiment results

The thresholds in Table 1 are durations in milliseconds, each being the average result for one stereo image at the corresponding compression degree. Figure 6 shows the group-mean threshold of the 18 participants for each of the six groups of stereo images as a function of compression ratio. The thresholds of the six groups of test sequences at each compression ratio are plotted in six different colours.

Fig. 6 Assessment results

As can be seen in Fig. 6, when the degree of compression of the stereo image was high enough, human observers could easily detect the compressed region within the stereoscopic image and perceived the “jitter” phenomenon.

The duration of the compressed region was initially set to a sufficiently long value, and this duration was the parameter varied in the adaptive procedure. The results indicated that observers could detect the occurrence of “jitter” with the longest ∆t2 ranging from 230.5 ms down to 10.1 ms, and that ∆t2 was shorter when the middle region of the stereo image had a high compression degree than when it had a low one. For the lowest compression degree of 12%, observers could detect the occurrence of “jitter” at compressed-region durations of 228–232 ms. As the compression degree increased from 12% to 18%, 24%, 30%, 36%, 42%, 48%, 54%, 60% and 66%, the average of the longest ∆t2 across the 18 participants gradually decreased from 232.4 to 206.9, 194.3, 169.2, 99.6, 49.2, 39.8, 32.4, 22.6 and 8.7 ms, respectively, and the group-mean duration of ∆t2 decreased from 230.5 to 203.5, 190.5, 167.3, 97.4, 45.3, 36.6, 29.5, 20.3 and 10.1 ms, respectively.

Figure 7 shows the group-mean sensitivity threshold across the 18 observers at each compression ratio. Error bars indicate the standard error of the mean.
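For reference, the group mean and the standard error of the mean shown by such error bars can be computed as follows. This is a generic Python sketch with a hypothetical function name; it uses the sample standard deviation divided by the square root of the number of observers, which is the usual definition but is not spelled out in the text.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(thresholds_ms):
    """Return (mean, standard error of the mean) for per-observer thresholds."""
    n = len(thresholds_ms)
    if n < 2:
        raise ValueError("at least two observers are needed")
    return mean(thresholds_ms), stdev(thresholds_ms) / sqrt(n)
```

Applied to the 18 per-observer thresholds at one compression ratio, this would yield one point and one error bar of the kind plotted in Fig. 7.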

Fig. 7 Group-mean experimental results

The experimental results show that the overall trend of the test values was consistent across the six stereo images: the duration ∆t2 decreased as the compression degree of the stereoscopic image increased. The visual sensitivity measured for the six groups of stereoscopic images was consistent with their actual compression quality, and the test results are in line with human visual characteristics. The proposed subjective quality evaluation scheme is therefore feasible and effective.

The experimental data of a previous study (Liu and Uang 2016) support the view of the present study that stereopsis produces binocular percepts. The results also support the idea that binocular cues, which include stereopsis, take advantage of binocular vision, in which each eye receives a slightly offset view of the same visual scene. Generating the required information with traditional methods to evaluate the motivation of human stereopsis perception, which include surveys, statistical techniques and subjective psychological analysis, is very costly and time-consuming (Bordel and Alcarria 2017). The method of the present study is helpful for the development of robotic visual servo control systems, as practical large-scale deployment of vision-based monitoring applications for assisted living is gaining increasing attention (Sathyanarayana et al. 2015). The results are also helpful for the real-time design and evaluation of safety services for older adults (Mettel et al. 2018).

Stereo image transmission nominally requires a doubling of the bandwidth of a video signal, and additional bandwidth savings can be obtained by exploiting properties of the human visual system. The proposed subjective evaluation method for stereo image compression quality can therefore not only help to improve the efficiency of stereo image transmission but also ensure consistency between the directly output evaluation value and the sensitivity of stereoscopic perception. This method, which exhibits good feasibility and effectiveness, thus provides an effective means of accelerating stereo image transmission in current network environments with limited bandwidth.

4 Conclusion

The quality evaluation of stereoscopic images plays an important role in stereo image transmission. In the present study, a subjective evaluation scheme for stereo image quality based on binocular vision fusion was designed on the basis of human visual characteristics and the particular properties of stereoscopic images. In contrast to existing traditional evaluation methods, most of which are subjective rating systems, this scheme combines the visual attention mechanism of the human visual system with methods of physiological psychology to introduce a framework for the subjective evaluation of stereoscopic image quality based on visual fusion. The proposed method can not only help to improve the efficiency of stereo image transmission but also promote consistency between the directly output evaluation value and the sensitivity of stereoscopic perception. The new method is more intuitive and has better feasibility and effectiveness. Moreover, it greatly reduces the shortcoming that traditional subjective quality evaluation is restricted and influenced by the subjective rating system.