1 Introduction

With the rapid development of internet technology, a wide variety of media formats has entered people's daily lives, and images and videos are shared and transmitted more frequently than ever [1]. However, during capture and transmission, factors such as transmission protocols and signal interference can distort an image, reducing its quality and degrading the user's perception. To keep the perceived quality of images consistent and meet observers' needs, it is valuable to study the degradation introduced at different processing stages [2]. Image quality assessment (IQA) therefore plays an important role in image processing. IQA methods fall into two categories, subjective and objective, which differ in their assessment criteria and application scenarios.

The subjective method assesses image quality directly through human observers and is the most reliable form of assessment. The most commonly used subjective protocols include the double-stimulus impairment scale (DSIS), the double-stimulus continuous quality scale (DSCQS), and the single-stimulus continuous quality scale (SSCQS). However, subjective assessment is frequently costly and time-consuming [3]. To quantify visual quality efficiently, objective methods that correlate strongly with subjective scores must therefore be developed [4].

The objective method uses mathematical models to assess image quality. Objective methods are divided into full-reference (FR), reduced-reference (RR), and no-reference (NR) image quality assessment.

FR-IQA methods make full use of the original image to assess the quality of the distorted image [5]. Wen Sun et al. proposed an FR-IQA method based on a superpixel similarity index, which assesses quality using three measures: pixel gradient similarity, superpixel luminance similarity, and superpixel chrominance similarity [6]. Although it improves accuracy, its computational complexity is high. Junfeng Yang et al. proposed a diffusion speed structure similarity index for FR-IQA; it calculates image similarity by considering both intra-block structures and inter-block textures and derives the quality score from them [7]. Kyohoon Sim et al. introduced the deep and local similarity method, which evaluates the similarity between the original and deformed deep feature maps obtained from convolutional neural networks [8]. The mean and standard deviation of these similarity measures reflect, respectively, the influence of visual saliency and of the distribution of distortions on quality [9]. Its pooling strategy is effective, but its computational complexity is high. Zihan Zhou et al. proposed an FR-IQA method that constructs a kernel dictionary and introduces nonlinear sparse coding into IQA. While this method analyzes various types of distortion in a higher-dimensional feature space, its generalization capability is limited [10]. Keyan Ding et al. proposed a method that combines correlations of spatial averages with correlations of feature maps [11]; it explains human perception scores on texture datasets as well as on traditional image quality datasets. Dong Wu et al. proposed an FR-IQA method based on multi-scale and multi-directional visibility differences. It considers visibility differences and contrast sensitivity functions in the discrete non-separable shearlet transform domain, together with visual masking effects, evaluates all sub-bands of the transform, and combines their perceptual errors into an objective quality metric for the distorted image [12]. This method keeps computational complexity moderate, but its generalization capability needs improvement. Ke Gu et al. introduced a perceptual image quality method that leverages properties of the human visual system (HVS); it efficiently applies convolution at several scales and accounts for gradient magnitude, color-information similarity, and perception-based pooling [13]. The advantage of FR-IQA lies in its higher accuracy: because the distorted image is compared directly with the original, the degree of distortion can be assessed precisely.

The RR-IQA method uses partial information from the original image as a reference for quality assessment. Mengzhu Yu et al. introduced a perceptual hashing method that combines the complementary color wavelet transform (CCWT) with compressed sensing (CS) for RR-IQA: the CCWT decomposes the input color image into sub-bands, and block-based CS extracts features from those sub-bands [14]. Wenhan Zhu et al. proposed an RR-IQA metric inspired by the free-energy principle: the image is decomposed by wavelet transform, free-energy features of the sub-band images are extracted from the coefficient matrices, and support vector regression is used to assess image quality [15].

NR-IQA evaluates image quality solely by analyzing features of the deformed image, without relying on information from the original image [16], and is the direction with the strongest development potential [17]. Xiaohan Yang et al. proposed a transfer learning method for NR-IQA that effectively alleviates overfitting [18]. Lixiong Liu et al. introduced an NR-IQA metric that considers the influence of pre-attention and spatial dependency on the perceived quality of distorted images; the resulting pre-attention and spatial-dependency driven quality predictor incorporates pre-attention theory to simulate early-phase visual perception by enhancing luminance-channel data [19]. Yang Wen et al. proposed an unsupervised deblurring method for blurry images based on multi-adversarial optimization of cycle-consistent generative adversarial networks; it strengthens the structure- and detail-preserving ability of the multi-adversarial network by introducing a perception mechanism [20]. Guanghui Yue et al. proposed an NR-IQA method named TANet, which embeds a texture enhancement module in the shallow layers to evaluate facial images by considering texture artifacts; experiments on the constructed SZU-RFD benchmark dataset show that it achieves high accuracy [21]. To address the lack of fair comparisons when assessing LFI stitching methods, Yueli Cui et al. built the first stitched WLFI dataset and proposed a blind quality metric for stitched WLFIs; compared with other quality metrics, it demonstrates excellent performance [22]. Zhewei Fang et al. proposed a robust blind metric that captures local statistical features to characterize the local texture degradation caused by the DIBR procedure and extracts global features to characterize overall blurriness; it outperforms recently developed 3D-synthesized image metrics [23]. NR-IQA typically relies on machine learning or deep learning to classify or predict quality from the features of distorted images. With the availability of computational power and large sets of labeled training images, much current IQA research employs deep learning [24], and learning-based methods outperform hand-crafted methods in certain fields [25]. In contrast, reduced-reference methods need only partial reference information to assess quality from partial image features. The accuracy of NR-IQA is typically lower than that of FR-IQA because it does not use original image information and can be affected by the type and degree of distortion.

Because FR-IQA results are computed against the original image, they are comparable across different times, locations, and devices, and FR-IQA can assess the degree of image distortion more accurately. Therefore, this paper introduces a dual-space multi-feature fusion based method for FR-IQA.

An image represented in different color spaces exhibits distinct features and suits different application scenarios. Extracting features from two color spaces simultaneously exploits the advantages of both, builds a more comprehensive feature representation, and improves the precision and robustness of image quality assessment. First, the luminance, slope, chroma, gradient, and spatial frequency features of the image are extracted in both the YIQ space and the L*a*b* space. Next, the extracted features are combined into a feature vector. Finally, a Random Forest regression model is used to predict the image quality.

In this paper, a dual-space multi-feature fusion method is proposed for full-reference image quality assessment. The method calculates the similarity of two images by extracting their chroma, luminance, slope, gradient, and spatial frequency features. On one hand, unlike many other color spaces, the YIQ and L*a*b* color spaces separate chroma information from luminance information, so chroma and luminance features can be extracted independently during processing, increasing flexibility in image manipulation. On the other hand, both spaces exhibit high chroma uniformity: chroma changes over the same distance are relatively consistent, which improves the stability and reliability of chroma analysis and feature extraction. Most existing methods extract features in a single space, where the available information is limited, whereas an image reveals different information in different spaces. Inspired by this, this paper improves evaluation accuracy by extracting the relevant features in the dual YIQ and L*a*b* spaces. The extracted features are then fused into a feature vector, which is input into a Random Forest for regression prediction. Both the dual-space feature extraction and the slope feature extraction proposed here are novel. The main contributions of this paper are as follows:

We propose a new dual-space feature extraction method that goes beyond feature extraction in a single space. Because an image carries different information in different spaces, our method can take full advantage of more of the image's information.

We introduce a new image feature, the slope. In remotely sensed terrain images, the slope represents the undulating variation of the surface; in non-topographic images, extracting slope features reflects the texture information of the image.

This paper is structured as follows: Sect. 1 introduces IQA methods and relevant concepts and proposes an FR-IQA method. Section 2 provides an overview of the conceptual structure of the method. Section 3 covers the preprocessing required before feature extraction, together with the feature extraction process and the associated computations. Section 4 summarizes the extracted features, performs feature fusion, and introduces the Random Forest model as the primary prediction tool; it also validates the proposed method on four public image datasets, comparing it with several mainstream and currently popular methods, and presents feature analysis, model performance analysis, and sample size analysis. Section 5 summarizes the paper, elaborating on its innovations and future work.

2 Method model

Because features extracted in a single space often cannot fully describe all the information in an image, and an image exhibits many features across different spaces, this paper proposes a dual-space multi-feature fusion method for FR-IQA to further describe the internal information of images. The model extracts features from two different spaces of the image simultaneously to construct a more comprehensive feature representation. Considering human visual perception, we extract luminance and chroma features to assess color differences and further extract slope features from the luminance features. In addition, distortion often disrupts image structure, so gradient features are extracted to describe structural differences [26], and spatial frequency features are extracted to reflect visual differences. Figure 1 illustrates the overall framework of the proposed method. SY and SL denote the luminance similarity computed in the YIQ space and the L*a*b* space, respectively; likewise, SP1 and SP2 denote the slope similarity, SC1 and SC2 the chroma similarity, and SG1 and SG2 the gradient similarity. SH1, SM1, SL1 and SH2, SM2, SL2 denote the frequency similarities computed in the YIQ space and the L*a*b* space. After these 14 features are extracted, they are combined into a feature vector, which is paired with the subjective scores to generate a dataset. The dataset is then partitioned into training and testing subsets. Decision trees are used as regressors to construct a Random Forest model, the training set is fed into the Random Forest for training, and finally the trained model is used to predict image quality.

Fig. 1

Overall framework of the method in this paper

3 Image feature extraction

3.1 Image preprocessing

In image processing, preprocessing the input image is a common step. Xiao Lin et al. proposed dividing images into blocks and devised an encoding and decoding communication module to capture communication information among all image blocks [27]. Images exhibit different features in different color spaces. As illustrated in Fig. 1, to establish a more comprehensive feature representation, the model in this paper performs feature extraction in both the YIQ and L*a*b* color spaces. Most assessment images are stored in the RGB color space, so before feature extraction both images must be converted into the YIQ and L*a*b* color spaces. Because RGB images cannot be converted directly to the L*a*b* color space, the conversion passes through the XYZ color space. The conversions from RGB to the YIQ and L*a*b* spaces are detailed in formulas (1), (2), and (3):

$$ \left[ {\begin{array}{*{20}c} Y \\ I \\ Q \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {0.299} & {0.587} & {0.114} \\ {0.596} & { - 0.274} & { - 0.322} \\ {0.211} & { - 0.523} & {0.312} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} R \\ G \\ B \\ \end{array} } \right] $$
(1)
$$ \left[ {\begin{array}{*{20}c} X \\ Y \\ Z \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {0.412} & {0.357} & {0.180} \\ {0.212} & {0.715} & {0.072} \\ {0.019} & {0.119} & {0.950} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} R \\ G \\ B \\ \end{array} } \right] $$
(2)
$$ \left[ {\begin{array}{*{20}c} {L^* } \\ {a^* } \\ {b^* } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {3.240} & { - 1.537} & { - 0.498} \\ { - 0.969} & {1.875} & {0.041} \\ {0.055} & { - 0.204} & {1.507} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} X \\ Y \\ Z \\ \end{array} } \right] $$
(3)

where Y and L* represent the luminance channels of the image, while I, Q, a* and b* represent the image's chrominance channels.
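
As an illustration, the color-space conversion can be carried out as in the following Python sketch. The YIQ conversion applies the matrix of Eq. (1) directly; for L*a*b* the sketch relies on scikit-image's rgb2lab, which performs the standard RGB→XYZ→L*a*b* conversion (the use of scikit-image here is an assumption for illustration, not part of the original method).

```python
import numpy as np
from skimage import color  # assumed helper library for the standard L*a*b* conversion

# NTSC RGB -> YIQ matrix, as in Eq. (1)
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])

def rgb_to_yiq(img_rgb):
    """img_rgb: H x W x 3 float array in [0, 1]; returns the Y, I, Q channels."""
    yiq = img_rgb @ RGB2YIQ.T              # apply the 3x3 matrix to every pixel
    return yiq[..., 0], yiq[..., 1], yiq[..., 2]

def rgb_to_lab(img_rgb):
    """Standard RGB -> XYZ -> L*a*b* conversion via scikit-image."""
    lab = color.rgb2lab(img_rgb)
    return lab[..., 0], lab[..., 1], lab[..., 2]
```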

Different color spaces exhibit different features. As shown in Fig. 2, Fig. 2(a) is an RGB image, Fig. 2(b) shows the same image in the YIQ space, and Fig. 2(c) shows it in the L*a*b* space.

Fig. 2

Images in different color spaces

3.2 Luminance feature

The luminance feature is one of the most significant features of an image; it reflects the overall luminance and contrast and is typically represented by the magnitude of pixel values. When an image suffers luminance distortion, the resulting uneven luminance distribution makes certain areas appear darker or brighter, affecting the overall visual impression of the image.

This paper extracts the luminance feature from the image's luminance channel to assess the degree of luminance distortion. In the YIQ and L*a*b* spaces, the Y and L* channels represent the image's luminance, so these channels are used to extract the luminance features of both the deformed image and the reference image. The similarity formula is then applied to compute the luminance similarities SY and SL in the YIQ space and the L*a*b* space, respectively. The luminance similarity formulas are as follows:

$$ SY = \frac{1}{N} \sum \limits_x \frac{2Y_R (x) \cdot Y_D (x) + C_1 }{{Y_R^2 (x) + Y_D^2 (x) + C_1 }} $$
(4)
$$ SL = \frac{1}{N} \sum \limits_x \frac{{2L_R (x){{ \cdot }}L_D (x) + C_1 }}{L_R^2 (x) + L_D^2 (x) + C_1 } $$
(5)

where YR and YD represent the luminance information in the Y channel of the original and distorted images, respectively, and LR and LD represent the luminance information of the two images in the L* channel. The constant C1 is introduced to avoid division by zero. SY and SL represent the luminance similarity between the original and deformed images in the YIQ space and the L*a*b* space, respectively.
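
All similarity terms used in this paper, Eqs. (4), (5), (7) to (13), and (17), share the same form: the mean over pixels of (2·r·d + C)/(r² + d² + C). A minimal Python sketch of this shared computation is given below; the function name is ours, and the default C is the value C1 = 1 reported later in Sect. 4.5.

```python
import numpy as np

def channel_similarity(ref, dist, c=1.0):
    """Mean pixel-wise similarity (2*r*d + c) / (r^2 + d^2 + c), as in Eqs. (4)-(5).
    ref, dist: feature maps of identical shape, e.g. the Y or L* channels."""
    num = 2.0 * ref * dist + c
    den = ref ** 2 + dist ** 2 + c
    return float(np.mean(num / den))

# SY = channel_similarity(Y_ref, Y_dist); SL = channel_similarity(L_ref, L_dist)
```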

By extracting the images' luminance features and computing their similarity, we obtain the degree of resemblance between the two images. As shown in Fig. 3, Fig. 3(a) is the original image, while Fig. 3(b) and (c) show different levels of the same distortion type (levels 1 to 5, where level 1 is the mildest distortion and level 5 the strongest). Figure 3(b) is distorted at level 1, with a luminance similarity to the reference image of 0.3331 in the YIQ space and 0.2623 in the L*a*b* space. Figure 3(c) is distorted at level 5, with a luminance similarity of 0.3268 in the YIQ space and 0.1852 in the L*a*b* space. Therefore, the larger the values of SY and SL, the smaller the luminance difference between the two images, indicating higher luminance similarity and better image quality.

Fig. 3

Comparison of luminance deformed images at different levels

3.3 Slope feature

The slope feature describes local variation in the image, measuring the degree of local change at each pixel. When an image is distorted, its local variations become larger. The slope maps corresponding to the original image and the deformed image are shown in Fig. 4. Accordingly, this paper introduces the concept of slope into non-topographic images and extracts slope features, which improves the accuracy of image quality assessment.

Fig. 4

Slope maps corresponding to the original image and its deformed images

First, after extracting the luminance features, we obtain luminance feature maps and then extract slope features from them. For a pixel of a two-dimensional input image described by the function f(x, y), the slope at point (xi, yi) is obtained from the derivatives f'(xi) in the x-direction and f'(yi) in the y-direction. The angle from the positive x-axis to the slope direction is then calculated with the following formula:

$$ Slope = \arctan \left( \frac{f^{\prime} (y_i )}{f^{\prime} (x_i )} \right) $$
(6)

where arctan() represents the arctangent function. Finally, the slope features extracted from the original image and the deformed image are used to compute the similarity of slopes, with the following formula:

$$ SP_i = \frac{1}{N} \sum \limits_x \frac{{2S_R {{ \cdot }}S_D + C_1 }}{S_R^2 + S_D^2 + C_1 } $$
(7)

where SR and SD respectively represent the slope features of the original image and deformed image. SP1 and SP2 represent the slope similarity of the reference image and distorted image in the YIQ space and L*a*b* space, respectively.
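
A minimal sketch of the slope map of Eq. (6) is shown below, assuming the derivatives are approximated with numpy's finite-difference gradient; arctan2 is used instead of arctan so the angle stays defined where the horizontal derivative is zero (an implementation choice not specified in the paper).

```python
import numpy as np

def slope_map(lum):
    """Per-pixel slope angle of Eq. (6), computed on a luminance channel.
    np.gradient returns the derivatives along rows (y) and columns (x)."""
    dy, dx = np.gradient(lum.astype(np.float64))
    return np.arctan2(dy, dx)  # arctan(f'(y) / f'(x)), robust when f'(x) == 0

# SP_i then reuses the similarity form of Eq. (7):
# SP = channel_similarity(slope_map(lum_ref), slope_map(lum_dist))
```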

Table 1 lists the values of SPi(i = 1,2) for different levels of distortion in an input image in both color spaces. As the distortion level increases, the values of SPi gradually decrease. Therefore, a higher value of SPi denotes a lower degree of image distortion and better image quality.

Table 1 Comparison of slope similarity for different levels of distortion

3.4 Chroma feature

The color of an image is a global feature that describes the surface properties of objects or regions within it and plays a crucial role in human visual perception. This paper therefore converts RGB images to the YIQ and L*a*b* color spaces, which are more perceptually uniform to the human eye, before extracting features.

In the YIQ color space, the image's color information is described through the I and Q channels, and the formula for computing the color similarity between the reference image and the distorted image is as follows:

$$ SI = \frac{1}{N} \sum \limits_x \frac{{2I_R (x){{ \cdot }}I_D (x) + C_1 }}{I_R^2 (x) + I_D^2 (x) + C_1 } $$
(8)
$$ SQ = \frac{1}{N} \sum \limits_x \frac{{2Q_R (x){{ \cdot }}Q_D (x) + C_1 }}{Q_R^2 (x) + Q_D^2 (x) + C_1 } $$
(9)
$$ SC_1 = \frac{1}{N} \sum \limits_x \left( {\frac{{2I_R (x){{ \cdot }}I_D (x) + C_1 }}{I_R^2 (x) + I_D^2 (x) + C_1 }{{ \cdot }}\frac{{2Q_R (x){{ \cdot }}Q_D (x) + C_1 }}{Q_R^2 (x) + Q_D^2 (x) + C_1 }} \right) $$
(10)

where IR, ID and QR, QD represent the original image's and deformed image's color information in the I and Q channel, respectively. The similarity between the original image and deformed image in the I and Q channels is denoted by SI and SQ, respectively. N denotes the number of pixels. C1 is a constant, and SC1 represents the image's color fidelity in the YIQ space.

In the L*a*b* color space, the image's color information is described via the a* and b* channels, and the formula for computing the color similarity between the original image and deformed image is as follows:

$$ Sa = \frac{1}{N} \sum \limits_x \frac{{2a_R (x){{ \cdot }}a_D (x) + C_1 }}{a_R^2 (x) + a_D^2 (x) + C_1 } $$
(11)
$$ Sb = \frac{1}{N} \sum \limits_x \frac{{2b_R (x){{ \cdot }}b_D (x) + C_1 }}{b_R^2 (x) + b_D^2 (x) + C_1 } $$
(12)
$$ SC_2 = \frac{1}{N} \sum \limits_x \left( {\frac{{2a_R (x){{ \cdot }}a_D (x) + C_1 }}{a_R^2 (x) + a_D^2 (x) + C_1 }{{ \cdot }}\frac{{2b_R (x){{ \cdot }}b_D (x) + C_1 }}{b_R^2 (x) + b_D^2 (x) + C_1 }} \right) $$
(13)

where aR, aD and bR, bD refer to the color information of the original image and deformed image in the a* and b* channel, respectively. The similarity between the original image and deformed image in the a* and b* channels is denoted by Sa and Sb. N denotes the number of pixels. C1 is a constant, and SC2 represents the color fidelity of the image in the L*a*b* color space.
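
The chroma fidelity terms SC1 and SC2 of Eqs. (10) and (13) multiply the two chroma-channel similarity maps pixel-wise before averaging. A hedged sketch, with hypothetical function and argument names, is:

```python
import numpy as np

def chroma_similarity(c1_ref, c1_dist, c2_ref, c2_dist, c=1.0):
    """Pixel-wise product of the two chroma-channel similarity maps, then the mean,
    as in Eqs. (10) and (13). Pass (I, Q) for YIQ or (a*, b*) for L*a*b*."""
    s1 = (2.0 * c1_ref * c1_dist + c) / (c1_ref ** 2 + c1_dist ** 2 + c)
    s2 = (2.0 * c2_ref * c2_dist + c) / (c2_ref ** 2 + c2_dist ** 2 + c)
    return float(np.mean(s1 * s2))

# SC1 = chroma_similarity(I_ref, I_dist, Q_ref, Q_dist)
# SC2 = chroma_similarity(a_ref, a_dist, b_ref, b_dist)
```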

Table 2 lists the values of SCi calculated for an input image at distortion levels 1 to 5 in the two color spaces. As the distortion level increases, the values of SCi decrease. Therefore, larger values of SC1 and SC2 denote smaller color differences between the original and deformed images and hence a lesser degree of color distortion.

Table 2 Comparison of chromatic similarity for different distortion levels

3.5 Gradient features

The edges of an image contain rich information, making edge feature extraction crucial. A commonly used approach is gradient-based edge detection, with the Sobel, Prewitt, and Roberts operators among the most common choices. Compared with the Sobel and Roberts operators, the Prewitt operator is better at suppressing noise, making it less susceptible to interference when detecting edges, and it yields relatively good edge localization. Therefore, this paper uses the Prewitt operator to extract gradient features that represent the edge information of the image. Figure 5 shows gradient maps extracted from different color spaces, with some detailed differences annotated.

Fig. 5

Gradient images extracted from different color spaces

First, the Prewitt operator is convolved with the image in the horizontal and vertical directions to obtain its horizontal and vertical gradients. The gradient components in the two directions are as follows:

$$ G_h (x) = \left[ {\begin{array}{*{20}c} { - 1} & { - 1} & { - 1} \\ 0 & 0 & 0 \\ 1 & 1 & 1 \\ \end{array} } \right]*f(x) $$
(14)
$$ G_v (x) = \left[ {\begin{array}{*{20}c} { - 1} & 0 & 1 \\ { - 1} & 0 & 1 \\ { - 1} & 0 & 1 \\ \end{array} } \right]*f(x) $$
(15)

where f(x) represents the test image, and Gh(x) and Gv(x) denote the gradient components in the horizontal and vertical directions, respectively. The image gradient G(x) is then computed as follows:

$$ G(x) = \sqrt {G_h^2 + G_v^2 } $$
(16)

Then, the gradient features of the original image and deformed image are used to calculate the gradient similarity based on the similarity calculation formula, which is expressed as follows:

$$ SG_i = \frac{1}{N} \sum \limits_x \frac{{2G_R (x){{ \cdot }}G_D (x) + C_1 }}{G_R^2 (x) + G_D^2 (x) + C_1 } $$
(17)

where GR(x) represents the gradient feature of the reference image and GD(x) that of the distorted image, and C1 is a constant. The gradient similarity between the two images is denoted by SGi (i = 1, 2), where i indicates the color space.
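
A minimal sketch of Eqs. (14) to (17), assuming scipy is used for the convolution (the boundary handling is an implementation choice):

```python
import numpy as np
from scipy.ndimage import convolve

# Prewitt kernels of Eqs. (14) and (15)
PREWITT_H = np.array([[-1, -1, -1],
                      [ 0,  0,  0],
                      [ 1,  1,  1]], dtype=np.float64)
PREWITT_V = PREWITT_H.T

def gradient_magnitude(lum):
    """Gradient magnitude of Eq. (16) from the two Prewitt responses."""
    gh = convolve(lum.astype(np.float64), PREWITT_H, mode='nearest')
    gv = convolve(lum.astype(np.float64), PREWITT_V, mode='nearest')
    return np.sqrt(gh ** 2 + gv ** 2)

# SG_i = channel_similarity(gradient_magnitude(lum_ref), gradient_magnitude(lum_dist))
```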

Finally, the gradient similarities computed in the YIQ space and L*a*b* space are denoted as SG1 and SG2, respectively. Table 3 lists the values of SGi calculated in the two spaces for input images with different distortion levels. As the distortion level increases, the values of SGi gradually decrease. Therefore, a larger SGi value indicates lower image distortion and better image quality.

Table 3 Comparison of gradient similarity corresponding to different levels of distortion

3.6 Spatial frequency feature

The spatial frequency feature of an image describes how rapidly pixel grayscale values change across the image: grayscale variation can be viewed as spatial information, and spatial frequency measures the rate of that variation. The contrast sensitivity function (CSF) characterizes the sensitivity of the human visual system (HVS) to different spatial frequencies, so the CSF can be simulated by extracting the image's spatial frequency features. In this paper, the CSF of the HVS is simulated using the luminance channels of the image in the YIQ space and the L*a*b* space. First, the spatial frequency content of the image is partitioned into high-frequency, mid-frequency, and low-frequency sub-bands. Then, the energy values of these three sub-bands are calculated within 4 × 4 discrete cosine transform (DCT) blocks of the image.

The formula for calculating the energy in the image's high-frequency region is as follows:

$$ E_H = \sum \limits_{(u,v) \in R_H } P(u,v) $$
(18)

where RH represents the high-frequency region, the normalized DCT coefficient of the image at the frequency domain (u,v) is represented by P(u,v), and EH represents the energy value of the high-frequency region.

Similarly, the formulas for calculating the energy in the mid-frequency and low-frequency regions of the image are as follows:

$$ E_M = \sum \limits_{(u,v) \in R_M } P(u,v) $$
(19)
$$ E_L = \sum \limits_{(u,v) \in R_L } P(u,v) $$
(20)

where RM and RL respectively represent the mid-frequency and low-frequency regions.

Next, the energy similarities of the three frequency bands are calculated using the similarity formula, as follows:

$$ SH_i = \frac{1}{N} \sum \limits_{(u,v) \in R_H } \frac{{2E_{RH} {{ \cdot }}E_{DH} + C_2 }}{{E_{RH}^2 + E_{DH}^2 + C_2 }} $$
(21)
$$ SM_i = \frac{1}{N} \sum \limits_{(u,v) \in R_M } \frac{{2E_{RM} {{ \cdot }}E_{DM} + C_3 }}{{E_{RM}^2 + E_{DM}^2 + C_3 }} $$
(22)
$$ SL_i = \frac{1}{N} \sum \limits_{(u,v) \in R_L } \frac{{2E_{RL} {{ \cdot }}E_{DL} + C_4 }}{{E_{RL}^2 + E_{DL}^2 + C_4 }} $$
(23)

where ERH and EDH, ERM and EDM, and ERL and EDL represent the high-, mid-, and low-frequency energies of the reference and distorted images, respectively. C2, C3, and C4 are constants. The frequency similarities of the image in the high-, mid-, and low-frequency bands of the two spaces are denoted by SHi, SMi, and SLi (i = 1, 2).
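
The sub-band energies of Eqs. (18) to (20) can be sketched as below. The paper does not state the exact boundaries of the high-, mid-, and low-frequency regions within a 4 × 4 DCT block, so the split by u + v used here is an illustrative assumption.

```python
import numpy as np
from scipy.fft import dctn

def band_energies(lum, block=4):
    """Sum the block-DCT coefficients of each sub-band, as in Eqs. (18)-(20).
    Splitting the 4x4 spectrum by u + v (low <= 1, mid 2..4, high >= 5) is an
    assumption; the paper does not specify the band boundaries."""
    u, v = np.meshgrid(range(block), range(block), indexing='ij')
    band = u + v
    h = (lum.shape[0] // block) * block
    w = (lum.shape[1] // block) * block
    e_high = e_mid = e_low = 0.0
    for i in range(0, h, block):
        for j in range(0, w, block):
            coeff = dctn(lum[i:i + block, j:j + block], norm='ortho')
            e_low += coeff[band <= 1].sum()
            e_mid += coeff[(band > 1) & (band <= 4)].sum()
            e_high += coeff[band > 4].sum()
    return e_high, e_mid, e_low

# SH_i, SM_i, SL_i then apply the similarity form of Eqs. (21)-(23) to the
# energies of the reference and distorted images, with constants C2, C3, C4.
```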

Finally, the spatial frequency similarities calculated in the YIQ space and the L*a*b* space are denoted SH1, SM1, SL1 and SH2, SM2, SL2, respectively. As shown in Fig. 6, Fig. 6(a) is the original image, while Fig. 6(b) and (c) depict different levels of the same distortion type (levels 1 to 5, with 1 indicating minimal distortion and 5 maximum distortion). Figure 6(b) is a level 1 deformed image with SH1, SM1, and SL1 values of 0.0026, 0.0026, and 0.0026 in the YIQ space, and SH2, SM2, and SL2 values of 0.0012, 0.0024, and 0.0022 in the L*a*b* space. Figure 6(c) is a level 5 deformed image with SH1, SM1, and SL1 values of 0.0026, 0.0026, and 0.0024 in the YIQ space, and SH2, SM2, and SL2 values of 0.0012, 0.0023, and 0.0019 in the L*a*b* space. Therefore, the values of SHi, SMi, and SLi reflect the degree of image distortion.

Fig. 6

The original image and deformed images of different levels

4 Model and experimental analysis

4.1 Feature fusion

Through the aforementioned process, we can extract 2 luminance features, 2 slope features, 2 chroma features, 2 gradient features, and 6 spatial frequency features from the image in both spaces, yielding a total of 14 features. Subsequently, these 14 features are fused to build a new feature vector F.

F = [SY, SL, SP1, SP2, SC1, SC2, SG1, SG2, SH1, SM1, SL1, SH2, SM2, SL2]
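
A trivial sketch of the fusion step, with a hypothetical helper name, assembles the 14 similarity values in the order given above:

```python
import numpy as np

FEATURE_ORDER = ['SY', 'SL', 'SP1', 'SP2', 'SC1', 'SC2', 'SG1', 'SG2',
                 'SH1', 'SM1', 'SL1', 'SH2', 'SM2', 'SL2']

def fuse_features(sims):
    """Assemble the 14-D feature vector F from a dict keyed by similarity name."""
    return np.array([sims[name] for name in FEATURE_ORDER], dtype=np.float64)
```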

4.2 Random forest regression prediction model

Random Forest (RF) is an ensemble learning method used for both classification and regression. It consists of multiple decision trees, each acting as a classifier or regressor; in regression tasks, the forest predicts by averaging the outputs of its trees. Compared with a single decision tree, a Random Forest offers higher accuracy and robustness and can handle high-dimensional data, large datasets, and complex feature spaces. Therefore, this paper uses a Random Forest regression model to predict image quality. The parameters chosen are 500 decision trees and a minimum leaf node size of 2, i.e., (ntree, mtry) = (500, 2).

First, during the training and testing stages, the feature vector F is merged with the subjective score of the corresponding image to generate a new dataset. This dataset is then fed into the model to train it and predict image quality.
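
A minimal training sketch with scikit-learn is given below. Mapping the paper's (ntree, mtry) = (500, 2) to n_estimators=500 and min_samples_leaf=2 follows the stated "500 decision trees and a minimum leaf node size of 2" and is our assumption; the paper's original implementation is not specified.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def train_quality_model(X, y, seed=0):
    """X: N x 14 matrix of fused feature vectors F; y: subjective scores (MOS/DMOS).
    Fits the Random Forest with the settings reported in the paper."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = RandomForestRegressor(n_estimators=500, min_samples_leaf=2,
                                  oob_score=True, random_state=seed)
    model.fit(X_tr, y_tr)
    return model, model.predict(X_te), y_te
```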

4.3 Feature analysis

During feature fusion, each feature has a different impact on the model's predictive performance. One way to calculate feature importance in Random Forests is to perturb the out-of-bag (OOB) data. The feature importance is computed as follows:

$$ I = \sum {\frac{(errOOB2 - errOOB1)}{N}} $$
(24)

where errOOB1 represents the out-of-bag error calculated for each decision tree, errOOB2 represents the recalculated out-of-bag error after introducing noise to each feature, and N is the number of decision trees.
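
scikit-learn does not expose the per-tree OOB permutation of Eq. (24) directly; as a hedged stand-in, permutation importance on held-out data shuffles each feature in turn and measures the resulting drop in score, which captures the same idea.

```python
from sklearn.inspection import permutation_importance

def feature_importances(model, X_val, y_val, names):
    """Permutation importance on held-out data as a stand-in for Eq. (24).
    `names` is the feature order used when building F (SY, SL, SP1, ..., SL2)."""
    result = permutation_importance(model, X_val, y_val, n_repeats=20, random_state=0)
    return dict(zip(names, result.importances_mean))
```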

Therefore, to evaluate the contribution of each feature to the model, this paper plots the importance of the 14 extracted features in the four datasets. As shown in Fig. 7, the same feature demonstrates different importance in different datasets, with a higher value indicating greater importance of the feature.

Fig. 7

Feature importance on different datasets

Analysis of the four plots reveals that in the LIVE dataset, SL, SL2, and SG2 are the three most significant features; in the CSIQ dataset, SG1, SH2, and SL are the most important; in the TID2008 dataset, SL1, SL2, and SP2 rank first; and in the TID2013 dataset, SL2 takes precedence, followed by SL and SP2. Hence, the newly introduced slope feature SP makes a notable contribution to the accuracy of IQA in this paper.

4.4 Image datasets

We conducted experimental tests on four publicly available image datasets, which are LIVE, CSIQ, TID2008, and TID2013. Each image dataset contains multiple types of distortion and corresponding original images.

LIVE: The LIVE dataset contains 29 high-resolution 24-bit/pixel RGB color reference images and 779 distorted images, obtained by applying five distortion operations to the reference images: Gaussian blur, JPEG compression, JPEG2000 compression, JPEG2000 fast-fading distortion, and white noise. Each type of distortion is present at five different levels.

CSIQ: The CSIQ dataset includes 30 pristine images and 866 distorted images. It covers six types of distortion: Gaussian blur, additive color Gaussian noise, additive white Gaussian noise, global contrast attenuation, JPEG compression, and JPEG2000 compression, each applied at four to five degradation levels.

TID2008: The TID2008 dataset contains 25 reference images and 1700 distorted images. The distortion types in the dataset are: additive Gaussian noise, additive noise in the color components stronger than in the luminance component, spatially correlated noise, masked noise, high-frequency noise, impulse noise, quantization noise, Gaussian blur, image denoising, JPEG compression, JPEG2000 compression, JPEG transmission errors, JPEG2000 transmission errors, non-eccentricity pattern noise, local block-wise distortions of different intensity, intensity mean shift, and contrast change. Each type of distortion is present at four different levels.

TID2013: The TID2013 dataset contains 25 reference images and 3000 distorted images. Compared with TID2008, TID2013 introduces seven additional types of distortion: change of color saturation, multiplicative Gaussian noise, comfort noise, lossy compression, color image quantization, chromatic aberrations, and sparse sampling. Each type of distortion is present at five different levels.

4.5 Experimental analysis

The proposed method was assessed on the public LIVE, CSIQ, TID2008, and TID2013 datasets. The evaluation consisted of 1000 experiments, and performance is reported as the average values of four evaluation metrics: root mean squared error (RMSE), Spearman rank-order correlation coefficient (SROCC), Pearson linear correlation coefficient (PLCC), and Kendall rank-order correlation coefficient (KROCC).
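
The four criteria can be computed as in the following sketch. PLCC is often reported after a nonlinear logistic mapping of the predictions onto the subjective scale; the sketch omits that step for brevity, which is an assumption about the evaluation protocol.

```python
import numpy as np
from scipy import stats

def evaluate(pred, mos):
    """RMSE, SROCC, PLCC and KROCC between predicted and subjective scores."""
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    rmse = float(np.sqrt(np.mean((pred - mos) ** 2)))
    srocc = stats.spearmanr(pred, mos).correlation
    plcc = stats.pearsonr(pred, mos)[0]
    krocc = stats.kendalltau(pred, mos).correlation
    return {'RMSE': rmse, 'SROCC': srocc, 'PLCC': plcc, 'KROCC': krocc}
```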

Before conducting the experiments, the parameters C1, C2, C3, and C4 in the above formulas were set to 1, 1.7, 2000, and 0.6, respectively. In the similarity formulas (4), (5), (7) to (13), and (17), C1 is set to 1. In calculating the mid-frequency similarity, C3 is set to 2000 because the mid frequencies represent details and structures with moderate changes, and this value better matches the human eye's sensitivity to mid-frequency components. Multiple experiments and comparisons with other settings confirmed that the chosen constants achieve optimal results. Five-fold cross-validation is adopted to assess the generalization ability of the Random Forest regression model, determine the optimal hyperparameter combination, and avoid overfitting. Each image dataset is split into 80% for training the model and 20% for evaluating image quality prediction. Furthermore, to confirm the benefits of the proposed method, the four metrics are compared with those of several other methods. As shown in Tables 4, 5, 6, and 7, the comparison includes traditional image quality assessment methods, namely SPSIM [6], DSSIM [7], MaD-DLS [8], KSCM [10], DISTS [11], MMVD [12], PSIM [13], PSNR [28], VIF [29], SSIM [30], FSIM [31], GSM [32], and VSI [33], as well as the deep learning based method LLF-ELM [34], with the best results highlighted in bold.

Table 4 Comparison of RMSE metrics for different methods
Table 5 Comparison of SROCC metrics for different methods
Table 6 Comparison of PLCC metrics for different methods
Table 7 Comparison of KROCC metrics for different methods

According to the comparison results in Tables 4 to 7, the method proposed in this paper outperforms the previous methods. Compared with the deep learning based LLF-ELM method, the proposed method is only slightly inferior in RMSE and SROCC on the CSIQ dataset and in RMSE on the TID2008 dataset; in all other cases its performance is superior to LLF-ELM. Overall, the proposed method predicts image quality with higher accuracy.

Additionally, comparison plots are generated to visually depict the predicted and true values of the proposed method on the test sets of the LIVE, CSIQ, TID2008, and TID2013 datasets. As shown in Fig. 8, the test sample sizes of the four test sets are 785, 600, 1360, and 2400, respectively. Red marks the true values and blue the predicted values; the horizontal axis denotes the samples and the vertical axis the image scores. The figure shows that the predicted value of each image fits its true value well and also conveys the overall prediction error of the model. Smaller errors between the predicted and true values indicate higher accuracy of the proposed method.

Fig. 8

Comparison of predicted values and truth values on the test sets of the four datasets

4.6 Ablation experiments

In this paper, five image features are extracted in the dual space: luminance, slope, chroma, gradient, and spatial frequency. To verify that feature extraction in the dual space yields more accurate quality assessment than in a single space, and to measure the effectiveness of the proposed "slope" feature, we conducted two ablation experiments. In the first, we compared image quality prediction with features extracted in the single YIQ space, the single L*a*b* space, and the dual space; the results are reported in Tables 8 and 9. In the second, we compared prediction with and without the "slope" feature, reported in Tables 10 and 11. Tables 8 to 11 show the SROCC and PLCC values between the predicted results and subjective scores for the single-space, dual-space, and with/without-"slope" settings.

Table 8 Comparison of SROCC metrics for different spaces
Table 9 Comparison of PLCC metrics for different spaces
Table 10 Comparison of SROCC metrics with/without "slope" feature
Table 11 Comparison of PLCC metrics with/without "slope" feature

The best results are highlighted in bold. Tables 8 to 11 show that both the dual-space method and the "slope" feature proposed in this paper contribute to improving the accuracy of image quality prediction.

4.7 Analysis of model performance

In a Random Forest, the training data of each decision tree is drawn by bootstrap sampling, so some samples do not appear in the training set of a given tree. For these out-of-bag samples, performance can be evaluated on the trees that did not sample them, yielding the out-of-bag (OOB) error, calculated as follows:

$$ OOB = \frac{{\sum {\left| {T - P} \right|} }}{N} $$
(25)

where T and P represent the true values and predicted values for each sample respectively, and N represents the number of samples.

To measure the model's generalization performance, i.e., how well it adapts to new data, we plot the model error curve for different numbers of decision trees, which clarifies how the number of trees affects the Random Forest's predictions. As shown in Fig. 9, the error curve declines rapidly over the range of 0–10 trees; as the number of trees increases further, the out-of-bag error gradually decreases and eventually stabilizes at around 0.0030. The model therefore demonstrates strong generalization performance.
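
One way such a curve could be traced with scikit-learn is sketched below: forests of increasing size are fitted and the mean absolute OOB error of Eq. (25) is recorded for each. The exact procedure used to produce Fig. 9 is not specified in the paper, so this is only an illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def oob_error_curve(X, y, tree_counts=(25, 50, 100, 200, 500)):
    """Mean absolute OOB error (cf. Eq. (25)) for increasing forest sizes.
    Very small forests may leave a few samples without any OOB prediction."""
    curve = []
    for n in tree_counts:
        rf = RandomForestRegressor(n_estimators=n, min_samples_leaf=2,
                                   oob_score=True, random_state=0)
        rf.fit(X, y)
        curve.append((n, float(np.mean(np.abs(np.asarray(y) - rf.oob_prediction_)))))
    return curve
```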

Fig. 9

The impact of the number of decision trees on the model's generalization performance

4.8 Analysis of sample quantity

In addition, further experiments were conducted by adjusting the number of training and testing samples on the four datasets. Specifically, the four assessment metrics were calculated at each stage as the training proportion decreased from 80% to 20% in steps of 10%, with the testing set increasing correspondingly by 10%. Tables 12, 13, 14, and 15 display the experimental outcomes.

Table 12 The relationship between sample quantity and prediction performance in the LIVE dataset
Table 13 The relationship between sample quantity and prediction performance in the CSIQ dataset
Table 14 The relationship between sample quantity and prediction performance in the TID2008 dataset
Table 15 The relationship between sample quantity and prediction performance in the TID2013 dataset

Tables 12 to 15 show that reducing the training sample size on the four datasets worsens all four evaluation metrics (RMSE, SROCC, PLCC, and KROCC). The experimental results indicate that as the number of training samples increases, the accuracy of the proposed image quality prediction method gradually improves.

5 Conclusion and outlook

5.1 Conclusion

This paper introduces a dual-space multi-feature fusion method for FR-IQA. The method concurrently extracts luminance, slope, chroma, gradient, and spatial frequency features from both spaces. These features are integrated into a feature vector, which is combined with the average subjective scores to generate a dataset. Finally, a Random Forest model performs regression prediction on the dataset.

The innovations of this method are as follows:

  1. It is not limited to extracting features in a single space, which effectively avoids the insufficient feature extraction caused by spatial limitations. Extracting features from multiple spaces yields richer feature information that better describes the content and structure of the image;

  2. It applies the slope feature of remote sensing images to non-terrain images. In remote sensing images, slope is an important terrain feature that reflects the undulations and changes of the terrain; non-terrain images lack such undulations, so the extracted slope instead reflects the image's texture information.

This paper is the first to propose extracting image features in dual-space and introduces a new metric called "slope". Experimental results on four datasets demonstrate that the proposed method exhibits good predictive performance and generalization ability compared to mainstream methods.

5.2 Outlook

The proposed method is an FR-IQA method, which requires the original image as a reference. However, in many instances the original image is difficult to obtain, which limits the applicability of FR-IQA methods. In subsequent research, it would be beneficial to explore integrating the dual-space approach into NR-IQA. In addition, better feature pooling strategies could further improve the accuracy of model predictions.

Although our method has achieved promising performance and improved the accuracy of IQA, there is still room for improvement. The limitations of our work are as follows. 1. Image feature extraction is a crucial step in IQA; in this paper we extract universal image features in two spaces for quality prediction, and we will study feature extraction in greater depth to extract more useful features. 2. Regarding the feature fusion strategy, we simply concatenate features into a vector based on empirical practice; since the fusion strategy can affect prediction results to a certain extent, in future work we will design a more efficient fusion strategy to enhance prediction accuracy. 3. This paper uses the Random Forest model for regression prediction of image quality scores; with the continuing development of deep learning, we will investigate deep learning methods for more accurate prediction.

5.3 Future work

Unlike traditional two-dimensional images, light field images contain information about the direction and intensity of light propagation in space, typically involving more information dimensions. Light field images possess unique characteristics, such as additional angular and depth information, which may require higher quality standards and more specialized evaluation metrics. When assessing the quality of light field images, their uniqueness may necessitate the use of more targeted evaluation metrics and methods to evaluate their quality, and these metrics may differ from those used for traditional images.

In terms of light field image super-resolution and quality assessment, the development of evaluation methods for light field images is relatively mature at present. For instance, the new evaluation metric designed in [22], as well as the tensor-based light field image quality assessment method proposed in [35], have advanced the field to a certain extent and made significant contributions. Although light field images differ from traditional two-dimensional images, they are still fundamentally based on a color space. Therefore, applying the dual-space method presented in this paper to this field is worth exploring in our future work.