1 Introduction

Ultrasound imaging is a powerful non-invasive diagnostic imaging modality in medicine [23]. However, like all medical imaging modalities, which exhibit various image artifacts, ultrasound is subject to a locally correlated multiplicative noise, called speckle, which degrades image quality and compromises diagnostic confidence [23]. For medical images, quality can be objectively defined in terms of performance in clinically relevant tasks such as lesion detection and classification, where typical tasks are the detection of an abnormality, the estimation of some parameters of interest, or a combination of the two [16]. Most studies to date have assessed equipment performance by testing the diagnostic performance of multiple experts, an approach that suffers from intra- and inter-observer variability. Although this is the most important method of assessing the results of image degradation, few studies have attempted to perform physical measurements of degradation [31].

Image quality is important when evaluating or segmenting atherosclerotic carotid plaques [11, 21] or the intima-media-thickness (IMT) in the carotid artery [20], where speckle obscures subtle details [19] in the image. Speckle, which is a multiplicative noise, is the major performance-limiting factor in visual lesion detection in ultrasound imaging and makes the signal or lesion difficult to detect [19–21, 31]. In a recent study [19], we have shown that speckle reduction improves the visual perception of the expert in the assessment of ultrasound imaging of the carotid artery. Traditionally, suspected plaque formation is confirmed using color blood flow imaging, where the types of plaque are visually identified and the delineations of the plaque and IMT are made manually by medical experts [20, 21].

In order to design accurate and reliable quality metrics, it is necessary to understand what quality means to the expert. An expert’s satisfaction when viewing an image depends on many factors. One of the most important is, of course, image content and material. Research in the area of image quality has shown that this depends on many parameters, such as viewing distance, display size, resolution, brightness, contrast, sharpness, colorfulness, naturalness, and other factors [3].

It is also important to note that there is often a difference between fidelity (the accurate reproduction of the original on the display) and perceived quality. Sharp images with high contrast are usually more appealing to the average expert. Likewise, subjects prefer slightly more colorful and saturated images despite realizing that they look somewhat unnatural [12]. For studying visual quality, some of the definitions above should be related to the human visual system. Unfortunately, because of its inherent subjectivity, subjective quality cannot be described by an exact figure; it can only be described statistically. Even in psychological threshold experiments, where the task of the expert is to give a yes or no answer, there is significant variation between experts’ contrast sensitivity functions and other critical low-level visual parameters. When speckle noise is apparent in the image, the experts’ differing experiences with noise are bound to lead to different weightings of the artifact [3]. Researchers have shown that experts and non-experts, with respect to image quality, examine different critical image characteristics to form their final opinion [9].

The objective of this study was to investigate the usefulness of image quality evaluation based on image quality metrics and visual perception in ultrasound imaging of the carotid artery after normalization and speckle reduction filtering. For this task, we evaluated the quality of ultrasound imaging of the carotid artery on two different ultrasound scanners, the ATL HDI-3000 and the ATL HDI-5000, before (NF) and after (DS) speckle reduction, after image normalization (N), and after image normalization and speckle reduction filtering (NDS). Statistical and texture analysis was carried out on the original and processed images, and these findings were compared with the visual perception evaluation carried out by two experts.

2 Methodology

2.1 Ultrasound imaging scanners

The images used in this study were captured using two different ultrasound scanners, the ATL HDI-3000 and the ATL HDI-5000 (Advanced Technology Laboratories, Seattle, USA).

The ATL HDI-3000 ultrasound scanner is equipped with a 64-element fine pitch high-resolution, 38-mm broadband array, a multi-element ultrasound scan head with an operating frequency range of 4–7 MHz, an acoustic aperture of 10×8 mm and a transmission focal range of 0.8–11 cm [1].

The ATL HDI-5000 ultrasound scanner is equipped with a 256-element fine pitch high-resolution 50-mm linear array, a multi-element ultrasound scan head with an extended operating frequency range of 5–12 MHz and it offers real spatial compound imaging. The scanner increases the image clarity using SonoCT imaging by enhancing the resolution and borders, and interface margins are better displayed. Several tests made by the manufacturer showed that, overall, the ATL HDI-5000 scanner was superior to conventional two-dimensional imaging systems, primarily because of the reduction of speckle, contrast resolution, and tissue differentiation, and the image was visually better [1].

The settings for the two ultrasound scanners were the same during the acquisition of all images in this study. The images were captured with the ultrasound probe positioned at right angles to the adventitia, and the image was magnified or the depth was adjusted so that the plaque would fill a substantial area of the image. Digital images were resolution normalized at 16.66 pixels/mm (see Sect. 2.4). This was carried out because of small variations in the number of pixels per mm of image depth (i.e., for deeply situated carotid arteries, image depth was increased and therefore digital image spatial resolution would have decreased) and in order to maintain uniformity in the digital image spatial resolution [17]. B-mode scan settings were adjusted at 170 dB, so that the maximum dynamic range was used with a linear post-processing curve. In order to ensure that a linear post-processing curve was used, these settings were pre-selected (by selecting the appropriate start-up presets from the software) and were included in the start-up settings of the ultrasound scanner. The position of the probe was adjusted so that the ultrasonic beam was perpendicular to the artery wall. The time gain compensation (TGC) curve was adjusted (gently sloping) to produce uniform intensity of echoes on the screen, but it was vertical in the lumen of the artery, where attenuation in blood was minimal, so that echogenicity of the far wall was the same as that of the near wall. The overall gain was set so that the appearance of the plaque was assessed to be optimal and slight noise appeared within the lumen. It was then decreased so that at least some areas in the lumen appeared to be free of noise (black).

2.2 Materials

A total of 80 symptomatic B-mode longitudinal ultrasound images from identical vessel segments of the carotid artery bifurcation were acquired from each ultrasound scanner. The images were recorded digitally on a magneto-optical drive with a resolution of 768×576 pixels with 256 gray levels.

The images were recorded at the Institute of Neurology and Genetics, in Nicosia, Cyprus, from 32 female and 48 male symptomatic patients aged between 26 and 95 years, with a mean age of 54 years. These subjects were at risk of atherosclerosis and had already developed clinical symptoms, such as a stroke or a transient ischemic attack.

In addition, ten symptomatic ultrasound images of the carotid artery representing different types of atherosclerotic carotid plaque formation with irregular geometry typically found in this blood vessel were acquired from each scanner.

Plaques may be classified into the following types: (1) type I: uniformly echolucent (black), where bright areas occupy less than 15% of the plaque area, (2) type II: predominantly echolucent, where bright echoes occupy 15–50% of the plaque area, (3) type III: predominantly echogenic, where bright echoes occupy 50–85% of the plaque area, (4) type IV: uniformly echogenic, where bright echoes occupy more than 85% of the plaque area, and (5) type V: calcified cap with acoustic shadow so that the rest of the plaque cannot be visualized [11, 21]. In this study, the plaques delineated were of types II, III, and IV, because manual delineation is easier when the fibrous cap, which is the border between blood and plaque, is more easily identified. If the plaque is of type I, its borders are not clearly visible. Plaques of type V produce acoustic shadowing, and the plaque is likewise not clearly visible.

2.3 Speckle reduction

The linear scaling speckle reduction filter [linear scaling mean variance (lsmv)] utilizing the mean and the variance of a pixel neighborhood, first introduced in [18] and implemented by our group, was used in this study. The lsmv filter was also used in other studies for the speckle reduction filtering of ultrasound carotid artery images and forms an output image as follows [2, 18, 19]:

$$f_{{i,j}} = \bar{g} + k_{{i,j}} (g_{{i,j}} - \bar{g})$$
(1)

where \(f_{i,j}\) is the new estimated noise-free pixel value in the moving window, \(g_{i,j}\) is the noisy pixel value in the middle of the moving window, \(\bar{g}\) is the local mean value of an M × N region surrounding and including pixel \(g_{i,j}\), \(k_{i,j}\) is a weighting factor with \(k_{i,j} \in [0, 1]\), and i, j are the absolute pixel coordinates. The factor \(k_{i,j}\) is a function of the local statistics in a moving window and may be derived as [18]:

$$k_{{i,j}} = \frac{{\sigma _{g} ^{2} }}{{\bar{g}^{2} \sigma _{g} ^{2} + \sigma ^{2}_{\text n}}}.$$
(2)

The values \( \sigma _{g} ^{2} \) and \( \sigma _{\text {n}} ^{2} \) represent the variance in the moving window and the variance of the noise in the whole image, respectively. The noise variance, \( \sigma _{\text {n}} ^{2}, \) may be calculated for the logarithmically compressed image by computing the average noise variance over a number of windows with dimensions considerably larger than the filtering window. In each window the noise variance is computed as [19]:

$$\sigma ^{2}_{\text n} = \frac{{\sum_{i = 1}^p {\sigma ^{2}_{p}}}} {{\bar{g}_{p}}}$$
(3)

where \( \sigma _{p} ^{2} \) and \(\bar{g}_{p}\) are the variance and mean of the noise in the selected windows, respectively, and p is the index covering all windows in the whole image [10]. If the value of \(k_{i,j}\) is 1 (in edge areas), the pixel is left unchanged, whereas a value of 0 (in uniform areas) replaces the actual pixel by the local average, \(\bar{g},\) over a small region of interest (see Eq. 1). It was shown that speckle in ultrasound images can be approximated by the Rayleigh distribution [5, 6, 26], which is implicitly contained in \( \sigma _{\text{n}} ^{2} \) in Eqs. 2 and 3. Speckle reduction filtering was applied to the images using the lsmv filter, which was applied four times iteratively using a 7×7 moving pixel window without overlapping, as this produced the best results [19].
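As a rough illustration of Eqs. 1 and 2, the following Python sketch applies one or more lsmv passes to a small image stored as a list of rows. It is not the MATLAB implementation used in this study. The function names, the sliding window that is clipped at the image borders, the clamping of k to [0, 1], and the rule k = 0 when the denominator of Eq. 2 vanishes are all assumptions made for the example, and the noise variance is passed in as a parameter rather than estimated with Eq. 3.

```python
def local_stats(img, i, j, half):
    """Mean and population variance of the window centred on (i, j).

    The window is clipped at the image borders (an assumption of this sketch).
    """
    rows, cols = len(img), len(img[0])
    vals = [img[r][c]
            for r in range(max(0, i - half), min(rows, i + half + 1))
            for c in range(max(0, j - half), min(cols, j + half + 1))]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var


def lsmv_filter(img, noise_var, window=7, iterations=1):
    """Apply f = mean + k * (g - mean) (Eq. 1) with k taken from Eq. 2."""
    half = window // 2
    out = [[float(v) for v in row] for row in img]
    for _ in range(iterations):
        src = out
        out = [row[:] for row in src]
        for i in range(len(src)):
            for j in range(len(src[0])):
                mean, var = local_stats(src, i, j, half)
                denom = mean * mean * var + noise_var  # denominator of Eq. 2
                k = var / denom if denom > 0 else 0.0  # assumption: k = 0 if undefined
                k = min(1.0, max(0.0, k))              # keep k inside [0, 1]
                out[i][j] = mean + k * (src[i][j] - mean)
    return out
```

Calling `lsmv_filter(img, noise_var, window=7, iterations=4)` would mirror the window size and iteration count reported above; a perfectly uniform region gives a local variance of zero, hence k = 0, and is returned unchanged.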

2.4 Image normalization

The need for image standardization or post-processing has been suggested in the past, and normalization using only blood echogenicity as a reference point has been applied to ultrasound images of the carotid artery [11]. Brightness adjustment of the ultrasound images was used in this study, as this has been shown to improve image compatibility by reducing the variability introduced by different gain settings and to facilitate ultrasound tissue comparability [11, 17].

The images were normalized manually by linearly adjusting the image so that the median gray level value of the blood was 0–5, and the median gray level of the adventitia (artery wall) was 180–190. The gray level scale of the images ranged from 0 to 255 [25]. This normalization using blood and adventitia as reference points was necessary in order to extract comparable measurements when processing images obtained by different operators or different equipment [25].
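The linear adjustment described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and its arguments are hypothetical, the blood and adventitia samples are assumed to be lists of gray levels picked by the operator from the two reference regions, and the targets 0 and 190 are simply one choice inside the stated 0–5 and 180–190 intervals.

```python
from statistics import median


def normalize_image(img, blood_pixels, adventitia_pixels,
                    blood_target=0.0, adventitia_target=190.0):
    """Linearly rescale gray levels so that the two reference medians hit their targets."""
    m_blood = median(blood_pixels)
    m_adv = median(adventitia_pixels)
    if m_adv == m_blood:
        raise ValueError("reference regions have the same median gray level")
    span = adventitia_target - blood_target
    out = []
    for row in img:
        new_row = []
        for p in row:
            v = blood_target + (p - m_blood) * span / (m_adv - m_blood)
            new_row.append(int(min(255, max(0, round(v)))))  # clip to the 0-255 scale
        out.append(new_row)
    return out
```

Any gray level that falls outside the 0–255 scale after the mapping is clipped, which is a design choice of this sketch rather than something stated in the text.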

The image normalization procedure performed in this study was implemented in MATLAB software (version 6.1.0.450, release 12.1, May 2001, by The Mathworks, Inc.), and tested on a Pentium III desktop computer, running at 1.9 GHz, with 512 MB of RAM. The same software and computer station were also used for all other methods employed in this study.

2.5 Statistical and texture analysis

Texture provides useful information, which is used by humans for the interpretation and analysis of many types of images. It may provide useful information about object characterization in ultrasound images [8]. The following statistical and texture features were extracted from the original, and the processed images to evaluate their usefulness based on speckle reduction filtering, image normalization, and visual perception evaluation:

Statistical features (SF) [8]

(1) mean, (2) median, (3) variance (\(\sigma^{2}\)), (4) skewness (\(\sigma^{3}\)), (5) kurtosis (\(\sigma^{4}\)), and (6) speckle index \((C = \sigma ^{2} /\bar{g}).\)
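The six statistical features can be computed from a flat list of pixel values as in the sketch below. The function name is hypothetical, the population (1/n) variance is used, and skewness and kurtosis are taken to be the standardized third and fourth central moments, which is an assumption because the text does not spell out their definitions. The speckle index follows the expression given above.

```python
from statistics import median


def statistical_features(pixels):
    """Mean, median, variance, skewness, kurtosis, and speckle index of a pixel list."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    std = var ** 0.5
    m3 = sum((p - mean) ** 3 for p in pixels) / n
    m4 = sum((p - mean) ** 4 for p in pixels) / n
    return {
        "mean": mean,
        "median": median(pixels),
        "variance": var,
        "skewness": m3 / std ** 3 if std > 0 else 0.0,   # assumed definition
        "kurtosis": m4 / var ** 2 if var > 0 else 0.0,   # assumed definition
        "speckle_index": var / mean if mean != 0 else float("inf"),
    }
```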

Spatial Gray Level Dependence Matrix-range values (SGLDM)

These include selected features as proposed by Haralick et al. [14], measured in four directions, namely in the east, west, north, and south directions of a pixel neighborhood. The range of these four values was computed for each of the following features: (1) entropy, (2) contrast, and (3) angular second moment (ASM).
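A minimal Python sketch of the range computation is given below. It assumes a pixel distance of 1, non-symmetric co-occurrence counts, and the natural logarithm for the entropy, none of which is stated in the text, and the function names are hypothetical. With these assumptions the east and west matrices are transposes of each other, so the range is in effect the difference between the horizontal and vertical values.

```python
import math
from collections import Counter


def cooccurrence(img, dr, dc):
    """Normalized co-occurrence probabilities for the pixel offset (dr, dc)."""
    rows, cols = len(img), len(img[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(img[r][c], img[r2][c2])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p):
    """Entropy, contrast, and angular second moment of a co-occurrence table."""
    return {
        "entropy": -sum(v * math.log(v) for v in p.values()),
        "contrast": sum((a - b) ** 2 * v for (a, b), v in p.items()),
        "asm": sum(v * v for v in p.values()),
    }


def sgldm_range(img):
    """Range (max minus min) of each feature over the four directions."""
    offsets = {"east": (0, 1), "west": (0, -1), "north": (-1, 0), "south": (1, 0)}
    feats = [texture_features(cooccurrence(img, dr, dc)) for dr, dc in offsets.values()]
    return {name: max(f[name] for f in feats) - min(f[name] for f in feats)
            for name in feats[0]}
```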

The Wilcoxon matched-pairs signed rank sum test was also used in order to determine if a significant (S) or not significant (NS) difference exists between the results of the visual perception evaluation made by the two experts (see Table 2) and the statistical and texture features (see Table 4) at P<0.05. The test was applied on all the 80 images of the carotid artery for the cases NF, DS, N, and NDS.
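For readers unfamiliar with the test, the sketch below computes the signed rank sums for paired samples and a two-sided P value from the normal approximation. It is an illustration only: the function name is hypothetical, zero differences are dropped, tied absolute differences receive average ranks, and no tie or continuity correction is applied, so the P value may differ slightly from the one a statistics package reports.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, approximate two-sided P) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # average of the 1-based ranks in the tie group
        for t in range(pos, end + 1):
            ranks[order[t]] = avg_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    return w_plus, w_minus, p_value
```

A difference would be labeled significant (S) when the returned P value is below 0.05 and not significant (NS) otherwise.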

2.6 Image quality and evaluation metrics

Differences between the original, \(g_{i,j}\), and the processed, \(f_{i,j}\), images were evaluated using the following image quality and evaluation metrics, which were used as statistical measures. The basic idea is to compute a single number that reflects the quality of the processed image. Processed images with higher metrics have been evaluated to be better [24]. The following measures, which are easy to compute and have clear physical meaning, were computed:

Normalized mean square error (MSE)

The MSE measures the quality change between the original and processed image in an M × N window [7]:

$${\text{MSE}} = \frac{1}{{MN}}{\sum\limits_{i = 1}^M {{\sum\limits_{j = 1}^N {\left(\frac{{g_{{i,j}} - f_{{i,j}} }}{{{\text{lpg}}_{{i,j}} }}\right)^{2} } }} }$$
(4)

where \({\text{lpg}}_{i,j}\) is the low-pass-filtered image of the original image, \(g_{i,j}\). Where \({\text{lpg}}_{i,j}\) is equal to zero, its value is replaced with the smallest gray level value in the image. The MSE has been widely used to quantify image quality but, when used alone, it does not correlate strongly enough with perceptual quality. It should be used, therefore, together with other quality metrics and visual perception [7, 9].

Normalized root mean square error (RMSE)

The RMSE is the square root of the squared error averaged over an M × N array [13]:

$${\text{RMSE}} = {\sqrt {\frac{1}{{MN}}{\sum\limits_{i = 1}^M {{\sum\limits_{j = 1}^N {\left(\frac{{g_{{i,j}} - f_{{i,j}} }}{{{\text{lpg}}_{{i,j}} }}\right)^{2} } }} }} }.$$
(5)

The popularity of RMSE arises mostly from the fact that it is in general the best approximation of the standard error.

Normalized error summation in the form of the Minkowski metric (Err)

The Err is the norm of the dissimilarity between the original and the processed images [7, 30, 31]:

$${\text{Err}} = {\left( {\frac{1}{{MN}}{\sum\limits_{i = 1}^M}{\sum\limits_{j = 1}^N {{\left| {\frac{{g_{{i,j}} - f_{{i,j}} }}{{{\text{lpg}}_{{i,j}} }}} \right|}^{\beta } } }} \right)}^{{1}/{\beta}} $$
(6)

computed for β=3 (Err3) and β=4 (Err4). For β=2, the RMSE is obtained as in Eq. 5, whereas β=1 gives the absolute difference measure and β=∞ gives the maximum difference measure.
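Because Eq. 6 contains Eqs. 4 and 5 as special cases, a single Python function is enough to sketch all three error measures. The function name is hypothetical, the low-pass-filtered image is assumed to be supplied by the caller (the text does not specify the low-pass filter), and the final fallback to 1 when the smallest gray level is itself zero is an assumption added only to avoid a division by zero.

```python
def minkowski_error(g, f, lpg, beta):
    """Normalized Minkowski error of Eq. 6 for images given as lists of rows."""
    smallest = min(min(row) for row in g)  # replacement value where lpg is zero
    total, count = 0.0, 0
    for g_row, f_row, l_row in zip(g, f, lpg):
        for gv, fv, lv in zip(g_row, f_row, l_row):
            denom = lv if lv != 0 else smallest
            if denom == 0:
                denom = 1  # assumption: guard for images whose minimum is zero
            total += abs((gv - fv) / denom) ** beta
            count += 1
    return (total / count) ** (1.0 / beta)


def rmse(g, f, lpg):
    """Normalized RMSE of Eq. 5, i.e. the beta = 2 case."""
    return minkowski_error(g, f, lpg, 2)


def mse(g, f, lpg):
    """Normalized MSE of Eq. 4, i.e. the square of the RMSE."""
    return rmse(g, f, lpg) ** 2
```

Err3 and Err4 correspond to `minkowski_error(g, f, lpg, 3)` and `minkowski_error(g, f, lpg, 4)`; larger values of β give more weight to the largest pixel differences.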

Normalized geometric average error (GAE)

The GAE is a measure that shows whether the transformed image is very bad [33], and it is used to replace or complement the RMSE. It is positive only if every pixel value is different between the original and the transformed image. The GAE approaches zero if there is a very good transformation (small differences) between the original and the transformed image, and is high otherwise. This measure is also used for tele-ultrasound, when transmitting ultrasound images, and is defined as:

$${\text{GAE}} = \left({\prod\limits_{i = 1}^N {{\prod\limits_{j = 1}^M {{\sqrt {\frac{{g_{{i,j}} - f_{{i,j}} }}{{{\text{lpg}}_{{i,j}} }}} }} }} }\right)^{{1}/{NM}}.$$
(7)
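The product form of Eq. 7 means that a single unchanged pixel forces the GAE to zero, which the following Python sketch makes explicit. It is only an illustration under assumptions: the function name is hypothetical, the absolute value of the pixel difference is used so that the square root is defined, the low-pass-filtered image is assumed to contain no zeros, and the product is accumulated in the logarithmic domain to avoid numerical underflow.

```python
import math


def geometric_average_error(g, f, lpg):
    """Normalized geometric average error (Eq. 7) for images given as lists of rows."""
    log_sum, count = 0.0, 0
    for g_row, f_row, l_row in zip(g, f, lpg):
        for gv, fv, lv in zip(g_row, f_row, l_row):
            ratio = abs(gv - fv) / lv  # assumption: absolute difference, lpg nonzero
            if ratio == 0:
                return 0.0  # one identical pixel makes the whole product zero
            log_sum += 0.5 * math.log(ratio)  # log of the square root
            count += 1
    return math.exp(log_sum / count)
```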

Normalized signal-to-noise ratio (SNR)

The SNR [28] is given by:

$${\text{SNR}} = 10\,\log _{{10}}\frac{{{\sum\nolimits_{i=1}^M{{\sum\nolimits_{j=1}^N {\left[ {{\left( {{g^{2}_{{i,j}} + f^{2}_{{i,j}} }} \right)}/{{{\text{lpg}}_{{i,j}} }}} \right]} }} }}}{{{\sum\nolimits_{i = 1}^M {{\sum\nolimits_{j = 1}^N {\left[ {\left( {{g_{{i,j}} - f_{{i,j}} }} \right)} /{{{\text{lpg}}_{{i,j}} }} \right]} ^{2}}}}}}.$$
(8)

The SNR, RMSE, and Err prove to be very sensitive tests for image degradation, but they are completely non-specific. Any small change in image noise, filtering, or transmitting preferences would cause an increase in the above measures.

Normalized peak signal-to-noise ratio (PSNR)

The PSNR [28] is computed by:

$${\text{PSNR}} = - 10\,\log _{{10}} \frac{{{\text{MSE}}}}{{s^{2} }}$$
(9)

where s is the maximum intensity in the original image. The PSNR is higher for a better-transformed image and lower for a poorly transformed image. It measures image fidelity, that is, how closely the transformed image resembles the original image.
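Equations 8 and 9 can be sketched in Python as follows. The function names are hypothetical, the low-pass-filtered image is assumed to be supplied by the caller and to contain no zeros, and returning infinity for identical images is a convention chosen for this example.

```python
import math


def normalized_snr(g, f, lpg):
    """Normalized SNR in dB (Eq. 8) for images given as lists of rows."""
    num = den = 0.0
    for g_row, f_row, l_row in zip(g, f, lpg):
        for gv, fv, lv in zip(g_row, f_row, l_row):
            num += (gv * gv + fv * fv) / lv
            den += ((gv - fv) / lv) ** 2
    return math.inf if den == 0 else 10 * math.log10(num / den)


def normalized_psnr(g, f, lpg):
    """Normalized PSNR in dB (Eq. 9), using the normalized MSE of Eq. 4."""
    total, count = 0.0, 0
    for g_row, f_row, l_row in zip(g, f, lpg):
        for gv, fv, lv in zip(g_row, f_row, l_row):
            total += ((gv - fv) / lv) ** 2
            count += 1
    mse = total / count
    s = max(max(row) for row in g)  # maximum intensity in the original image
    return math.inf if mse == 0 else -10 * math.log10(mse / (s * s))
```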

Mathematically defined universal quality index (Q)

It models any distortion as a combination of three different factors [31], which are loss of correlation, luminance distortion, and contrast distortion, and is derived as:

$$Q = \frac{{\sigma _{{gf}} }}{{\sigma _{f} \sigma _{g} }}\frac{{2\overline{f} \bar{g}}}{{(\bar{f})^{2} + (\bar{g})^{2} }}\frac{{2\sigma _{f} \sigma _{g} }}{{\sigma ^{2}_{f} + \sigma ^{2}_{g} }}, \quad - 1< Q < 1$$
(10)

where \(\bar{g}\) and \(\bar{f}\) represent the means of the original and transformed image values, \(\sigma_{g}\) and \(\sigma_{f}\) are the corresponding standard deviations within the analysis window, and \(\sigma_{gf}\) represents the covariance between the original and transformed images. Q is computed for a sliding window of size 8×8 without overlapping. Its highest value is 1 if \(g_{i,j} = f_{i,j}\), while its lowest value is −1 if \(f_{{i,j}} = 2\bar{g} - g_{{i,j}}.\)

Structural similarity index (SSIN)

The SSIN between two images [30], which is a generalization of Eq. 10, is given by:

$${\text{SSIN}} = \frac{{(2\bar{g}\bar{f} + c_{1})(2\sigma _{{gf}} + c_{2})}}{{(\bar{g}^{2} + \bar{f}^{2} + c_{1})(\sigma ^{2}_{g} + \sigma ^{2}_{f} + c_{2})}}, \quad -1 < {\text{SSIN}} < 1$$
(11)

where \(c_{1} = 0.01\,dr\) and \(c_{2} = 0.03\,dr\), with dr=255 representing the dynamic range of the ultrasound images. The range of values for the SSIN lies between −1 and 1, where −1 stands for a bad similarity between the original and transformed images and 1 stands for a good similarity between them. It is computed similarly to the Q measure for a sliding window of size 8×8 without overlapping.
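Since Eq. 11 reduces to Eq. 10 when both constants are set to zero, one Python helper can illustrate both indices. The sketch below is written under assumptions: the function names are hypothetical, the per-window values are averaged over non-overlapping windows to obtain one number per image pair, incomplete windows at the image edges are ignored, and windows with a zero denominator are skipped. The constants are used exactly as stated above.

```python
def window_index(gw, fw, c1=0.0, c2=0.0):
    """Eq. 11 for one window of pixel values; c1 = c2 = 0 gives Eq. 10."""
    n = len(gw)
    mg, mf = sum(gw) / n, sum(fw) / n
    vg = sum((a - mg) ** 2 for a in gw) / n
    vf = sum((b - mf) ** 2 for b in fw) / n
    cov = sum((a - mg) * (b - mf) for a, b in zip(gw, fw)) / n
    denom = (mg * mg + mf * mf + c1) * (vg + vf + c2)
    if denom == 0:
        return None  # assumption: undefined windows are skipped
    return (2 * mg * mf + c1) * (2 * cov + c2) / denom


def mean_index(g, f, window=8, c1=0.0, c2=0.0):
    """Average of window_index over non-overlapping window x window blocks."""
    values = []
    for r in range(0, len(g) - window + 1, window):
        for c in range(0, len(g[0]) - window + 1, window):
            cells = [(i, j) for i in range(r, r + window) for j in range(c, c + window)]
            v = window_index([g[i][j] for i, j in cells],
                             [f[i][j] for i, j in cells], c1, c2)
            if v is not None:
                values.append(v)
    return sum(values) / len(values) if values else None


def universal_q(g, f, window=8):
    """Universal quality index Q (Eq. 10)."""
    return mean_index(g, f, window)


def ssin(g, f, window=8, dr=255):
    """Structural similarity index SSIN (Eq. 11) with c1 = 0.01*dr, c2 = 0.03*dr."""
    return mean_index(g, f, window, 0.01 * dr, 0.03 * dr)
```

The constants keep the SSIN defined in perfectly flat windows, where Q has a zero denominator.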

2.7 Visual perception evaluation

Visual evaluation can be broadly categorized as the ability of an expert to extract useful anatomical information from an ultrasound image. The visual evaluation varies of course from expert to expert and is subject to the expert’s variability [16]. The visual evaluation, in this study, was carried out according to the ITU-R recommendations with the Double Stimulus Continuous Quality Scale (DSCQS) procedure [33]. All the visual evaluation experiments were carried out at the same workstation under indirect fluorescent lighting typical of an office environment. Two vascular experts evaluated the images. The vascular experts, an angiologist and a neurovascular specialist, were allowed to position themselves comfortably with respect to the viewing monitor, where a typical distance of about 50 cm was kept. Experts in real-life applications employ a variety of conscious and unconscious strategies for image evaluation, and it was our intent to create an application environment as close as possible to the real one. The two vascular experts evaluated 80 ultrasound images recorded from each ultrasound scanner before and after speckle reduction, after image normalization, and after normalization and speckle reduction filtering.

The two vascular experts evaluated the area around the distal common carotid, between 2 and 3 cm, before the bifurcation and after the bifurcation. It is known that measurements taken from the far wall of the carotid artery are more accurate than those taken from the near wall [5, 11]. Furthermore, the experts were examining the image in the lumen area, in order to identify the existence of a plaque or not. The primary interest of the experts was the area around the borders between blood and tissue of the carotid artery, and how much better they can differentiate blood from carotid wall, intima media, or plaque surface.

For each image, each expert was asked to assign a score on a 1–5 scale, corresponding to low and high subjective visual perception criteria. A score of five was given to an image with the best visual perception. Therefore, the maximum score for a procedure is 400 if the expert assigned a score of 5 to all 80 images. For each procedure, the score was divided by 4 to be expressed in percentage format. The experts were allowed to give equal scores to more than one image in each case. For each preprocessing procedure the average score was computed.
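The conversion from raw scores to the percentage format can be written as a short Python function. The function name is hypothetical, and the formula generalizes the division by 4 described above to any number of images.

```python
def perception_score_percent(scores, max_score=5):
    """Total score of one expert for one procedure as a percentage of the maximum."""
    if any(s < 1 or s > max_score for s in scores):
        raise ValueError("scores must lie on the 1 to max_score scale")
    return 100.0 * sum(scores) / (max_score * len(scores))
```

For 80 images the denominator is 400, so the result equals the summed score divided by 4.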

3 Results

Figure 1 illustrates the original (NF), speckle reduction (DS), normalized (N), and normalized speckle reduction (NDS) images for the two ultrasound image scanners. It is shown that the images for the ATL HDI-3000 scanner have greater speckle noise compared to the ATL HDI-5000 images. Moreover, the lumen borders and the IMT are more easily identified with the ATL HDI-5000 on the N and NDS images.

Fig. 1

Ultrasound carotid artery images of the original (NF), speckle reduction (DS), normalized (N), and normalized speckle reduction (NDS) of the ATL HDI-3000 and ATL HDI-5000, shown in the left and right columns, respectively. Vertical lines given in the original image (NF) of the ATL HDI-3000 and the ATL HDI-5000 scanners define the position of the gray-value line profiles plotted in Fig. 2. a Original (NF) 3000; b original (NF) 5000; c speckle reduction (DS) 3000; d speckle reduction (DS) 5000; e normalized (N) 3000; f normalized (N) 5000; g normalized speckle reduction (NDS) 3000; h normalized speckle reduction (NDS) 5000

Figure 2 shows gray-value line profiles, from top to bottom of an ultrasound carotid image (see Fig. 1a, b), for the original (NF), speckle reduction (DS), normalized (N), and normalized speckle reduction (NDS) images for the ATL HDI-3000 and ATL HDI-5000 scanners. Figure 2 also shows that speckle reduction filtering sharpens the edges. The contrast in the ATL HDI-3000 images was decreased after normalization and speckle reduction filtering, whereas the contrast for the ATL HDI-5000 images was increased after normalization.

Fig. 2

Gray-value line profiles of the lines illustrated in Fig. 1a, b for the NF, DS, N, and NDS images, for the ATL HDI-3000 and ATL HDI-5000 scanner, shown in the left and right columns, respectively. The gray-scale value, and the column 240, are shown in the y and x axes. a Original (NF) 3000; b original (NF) 5000; c speckle reduction (DS) 3000; d speckle reduction (DS) 5000; e normalized (N) 3000; f normalized (N) 5000; g normalized speckle reduction (NDS) 3000; h normalized speckle reduction (NDS) 5000

Table 1 shows the results in percentage (%) format for the visual perception evaluation made by the two vascular experts on the two scanners. It is clearly shown that the highest scores are given for the NDS images, followed by the N, DS, and NF images for both scanners from both experts.

Table 1 Visual perception evaluation for the image quality on 80 images processed from each scanner for the original (NF), speckle reduction (DS), normalized (N), and normalized speckle reduction (NDS)

Table 2 presents the results of the Wilcoxon rank sum test for the visual perception evaluation performed between the images, NF–DS, NF–N, NF–NDS, DS–N, DS–NDS, and N–NDS, for the first and second observer on the ATL HDI-3000 and the ATL HDI-5000 scanner, respectively. The results of the Wilcoxon rank sum test in Table 2 for the visual perception evaluation were mostly significantly different (S) showing large intra-observer and inter-observer variability for the different preprocessing procedures (NF–DS, NF–N, NF–NDS, DS–N, DS–NDS, N–NDS) for both scanners. Values not significantly (NS) different were obtained for both scanners, after normalization and speckle reduction filtering, showing that this improves the optical perception evaluation.

Table 2 Wilcoxon rank sum test P value for the ATL HDI-3000 and the ATL HDI-5000 scanner for the visual perception evaluation performed by the experts between the images: NF–DS, NF–N, NF–NDS, DS–N, DS–NDS, and N–NDS

Table 3 presents the results of the statistical and texture features for the 80 images recorded from each image scanner. The upper part of Table 3 shows that the effect of speckle reduction filtering (DS) was similar for both scanners; that is, the mean and the median were preserved, the standard deviation was reduced, the skewness and the kurtosis were reduced, and the speckle index was reduced (see also Fig. 2c, d, g, and h, where it is shown that the gray-value line profiles are smoother and less flattened). Furthermore, Table 3 shows that some statistical measures, such as the skewness, kurtosis, and speckle index, were better than for the original (NF) and speckle reduction (DS) images after normalization (N) for both scanners, and were even better after normalization and speckle reduction (NDS). However, the mean was increased for the N and NDS images for both scanners.

Table 3 Statistical and texture features (mean values for 80 images processed from each scanner) for the original (NF), speckle reduction (DS), normalized (N) and normalized speckle reduction (NDS) images

In the bottom part of Table 3, it is shown that the entropy was increased and the contrast was reduced significantly in the cases of DS and NDS for both scanners. The entropy was slightly increased and the contrast was slightly reduced in the cases of N images for both scanners. The ASM was reduced for the DS images for both scanners and for the NDS images for the ATL HDI-5000 scanner.

Table 4 presents the results of the Wilcoxon rank sum test for the statistical and texture features (see Table 3) performed on the NF–DS, NF–N, NF–NDS, DS–N, DS–NDS, and N–NDS images on the ATL HDI-3000 scanner. No statistically significant difference was found in the first part of Table 4 when performing the non-parametric Wilcoxon rank sum test at P<0.05, between the original (NF) and speckle reduction (DS), the original (NF) and normalized (N), and the original (NF) and normalized speckle reduction (NDS) features for both scanners. Statistically significant differences were mostly obtained in the second part of Table 4 for the ASM, contrast, and entropy.

Table 4 Wilcoxon rank sum test P value for the ATL HDI-3000 scanner for the statistical and texture features between the NF–DS, NF–N, NF–NDS, DS–N, DS–NDS, and N–NDS images

Furthermore, Table 4 shows that the entropy, which is a measure of the information content of the image, was higher for the ATL HDI-5000 in all the cases. The ASM, which is a measure of the inhomogeneity of the image, is lower for the ATL HDI-5000 in the cases of the DS and NDS images. Furthermore, the entropy and the ASM were more influenced by speckle reduction than by normalization, as they reach their best values after speckle reduction filtering.

Table 5 illustrates the image quality evaluation metrics, for the 80 ultrasound images recorded from each image scanner, between the following images: NF–DS, NF–N, NF–NDS, and N–NDS. Best values were obtained for the NF–N images with lower RMSE, Err3, and Err4, higher SNR and PSNR for both scanners. The GAE was 0.00 for all cases, and this can be attributed to the fact that the information between the original and the processed images remains unchanged. Best values for Q and SSIN were obtained for the NF–N images for both scanners, whereas best values for SNR were obtained on the ATL HDI-3000 scanner for the NF–N images.

Table 5 Image quality evaluation metrics between the original and speckle reduction (NF–DS), original and normalized (NF–N), original and normalized speckle reduction (NF–NDS), and the normalized and normalized speckle reduction (N–NDS) images

Table 5 shows that the effect of speckle reduction filtering was more obvious on the ATL HDI-3000 scanner, which shows that the ATL HDI-5000 scanner produces images with lower noise and distortion. Moreover, it was obvious that all quality metrics presented here are equally important for image quality evaluation. Specifically, for most of the quality metrics, better measures were obtained between the NF and N (NF–N) images, followed by the NF–NDS and N–NDS images for both scanners. It is, furthermore, important to note that a higher PSNR (or equivalently, a lower RMSE) does not necessarily imply a higher subjective image quality, although they do provide some measure of relative quality.

Furthermore, the two experts evaluated visually 10 B-mode ultrasound images with different types of plaque [11] (see Fig. 3), by delineating the plaque at the far wall of the carotid artery wall. The visual perception evaluation, and the delineations made by the two experts, showed that the plaque may be better identified on the ATL HDI-5000 scanner after normalization and speckle reduction (NDS), whereas the borders of the plaque and the surrounding tissue may be better visualized on the ATL HDI-5000 when compared with the ATL HDI-3000 scanner.

Fig. 3

Ultrasound carotid plaque images of type II outlined by an expert of the NF, DS, N, and NDS of the ATL HDI-3000 and ATL HDI-5000, shown in the left and right columns, respectively. a Original (NF) 3000; b original (NF) 5000; c speckle reduction (DS) 3000; d speckle reduction (DS) 5000; e normalized (N) 3000; f normalized (N) 5000; g normalized speckle reduction (NDS) 3000; h normalized speckle reduction (NDS) 5000

Table 6 summarizes the image quality evaluation results of this study, for the visual evaluation (Table 1), the statistical and texture analysis (Table 3), and the image quality evaluation metrics (Table 5). A double plus sign in Table 6 indicates very good performance, while a single plus sign indicates good performance. Table 6 can be summarized as follows: (1) the NDS images were rated visually better on both scanners, (2) the NDS images showed better statistical and texture analysis results on both scanners, (3) the NF–N images on both scanners showed better image quality evaluation results, followed by the NF–DS on the ATL HDI-5000 scanner and the NF–DS on the ATL HDI-3000 scanner, (4) the ATL HDI-5000 scanner images have considerably higher entropy than the ATL HDI-3000 and thus more information content. However, based on the visual evaluation by the two experts, both scanners were rated similarly.

Table 6 Summary findings of image quality evaluation in ultrasound imaging of the carotid artery

4 Discussion

Normalization and speckle reduction filtering are very important preprocessing steps in the assessment of atherosclerosis in ultrasound imaging. We have, therefore, investigated the usefulness of image quality evaluation in 80 ultrasound images of the carotid bifurcation, based on image quality metrics and visual perception after normalization and speckle reduction filtering, using two different ultrasound scanners (ATL HDI-3000 and ATL HDI-5000). Specifically, the images were evaluated before and after speckle reduction, after normalization, and after normalization and speckle reduction filtering. The evaluation was based on visual evaluation by two experts, on statistical and texture features, on image normalization and speckle reduction, and on image quality evaluation metrics. To the best of our knowledge, no other studies in the literature evaluate ultrasound image quality based on speckle reduction filtering and normalization performed on carotid artery images acquired by two different ultrasound scanners.

The main findings of this study can be summarized as follows: (1) the NDS images were rated visually better on both scanners, (2) the NDS images showed better statistical and texture analysis results on both scanners, (3) better image quality evaluation results were obtained for the NF–N images for both scanners, followed by the NF–DS images on the ATL HDI-5000 scanner and the NF–DS images on the ATL HDI-3000 scanner, (4) the ATL HDI-5000 scanner images have considerably higher entropy than the ATL HDI-3000 scanner images and thus more information content. However, based on the visual evaluation by the two experts, both scanners were rated similarly.

It was shown that normalization and speckle reduction produce better images. Normalization was also proposed in other studies using blood echogenicity as a reference and applied in carotid artery images [32]. In [11, 17], it was shown that normalization improves image comparability by reducing the variability introduced by different gain settings, different operators, and different equipment. It should be noted that the order of applying these processes (normalization and speckle reduction filtering) affects the final result. Based on unpublished results, we have observed that applying speckle reduction filtering first and then normalization produces distorted edges. The preferred method is to apply normalization first and then speckle reduction filtering.
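The preferred processing order can be illustrated with a minimal sketch. It is not the implementation used in this study: the linear mapping of two reference gray levels (`blood`, `adventitia`) to the targets `lo` and `hi`, the default target values, and the Lee-type local statistics filter standing in for the lsmv filter are all assumptions made for illustration, as are the function names.

```python
from statistics import fmean, pvariance


def normalize(image, blood, adventitia, lo=0.0, hi=190.0):
    """Linearly map gray level `blood` to `lo` and `adventitia` to `hi`.

    The reference points and target values are assumptions for this sketch.
    Results are clipped to the 0..255 gray-level range.
    """
    scale = (hi - lo) / (adventitia - blood)
    return [[min(255.0, max(0.0, lo + (p - blood) * scale)) for p in row]
            for row in image]


def local_statistics_filter(image, noise_var, half=1):
    """Lee-type filter (a stand-in, not necessarily the lsmv filter).

    Each output pixel is m + k * (p - m), where m and v are the mean and
    variance of the local window and k = max(0, v - noise_var) / v.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Window is clamped at the image borders.
            window = [image[i][j]
                      for i in range(max(0, r - half), min(rows, r + half + 1))
                      for j in range(max(0, c - half), min(cols, c + half + 1))]
            m, v = fmean(window), pvariance(window)
            k = max(0.0, v - noise_var) / v if v > 0 else 0.0
            out_row.append(m + k * (image[r][c] - m))
        out.append(out_row)
    return out


def preprocess(image, blood, adventitia, noise_var):
    """Apply normalization first, then speckle reduction filtering."""
    normalized = normalize(image, blood, adventitia)
    return local_statistics_filter(normalized, noise_var)
```

In this sketch the filter sees gray levels that are already on the common normalized scale, so a single `noise_var` setting has the same meaning for images from either scanner; swapping the two calls in `preprocess` gives the reverse order discussed above.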

In two recent studies [20, 21], it was shown that the preprocessing of ultrasound images of the carotid artery with normalization and speckle reduction filtering improves the performance of the automated segmentation of the IMT [20] and plaque [21]. More specifically, it was shown in [20] that a smaller variability in segmentation results was observed when performed on images after normalization and speckle reduction filtering, compared with the manual delineation results attained by two medical experts. Furthermore, in another study [19], we have shown that speckle reduction filtering improves the percentage of correct classification score of symptomatic and asymptomatic images of the carotid. Speckle reduction filtering was also investigated by other researchers on ultrasound images of liver and kidney [4], and on natural scenery [18], using an adaptive two-dimensional filter similar to the lsmv speckle reduction filter used in this study. In these studies [4, 18], speckle reduction filtering was evaluated based only on visual perception evaluation made by the researchers.

Verhoeven et al. [29] applied mean and median filtering in simulated ultrasound images and in ultrasound images with blood vessels. The lesion-signal-to-noise ratio was used in order to quantify the detectability of lesions after filtering. Filtering was applied on images with fixed and adaptive size windows in order to investigate the influence of the filter window size. It was shown that the difference in performance between the filters was small but the choice of the correct window size was important. Kotropoulos et al. [15] applied adaptive speckle reduction filtering in simulated, tissue-mimicking phantom, and liver ultrasound B-mode images, where it was shown that the proposed maximum likelihood estimator filter was superior to the mean filter.

Although speckle has been considered as noise in this study, there are other studies where speckle, approximated by the Rayleigh distribution, was used to support automated segmentation. Specifically, in [5], an automated luminal contour segmentation method based on a statistical approach was introduced, whereas in [26], ultrasound intravascular images were segmented using knowledge-based methods. Furthermore, in [6], a semi-automatic segmentation method for intravascular ultrasound images, based on gray-scale statistics of the image, was proposed, where the lumen, IMT, and the plaque were segmented in parallel by utilizing a fast marching model.

Some statistical measures, shown in the upper part of Table 3, were better after normalization, and others, shown in the bottom part of Table 3, were better after speckle reduction. Table 3 also shows that the contrast was higher for the NF and N images on both scanners and was significantly different (S) after normalization and speckle reduction filtering (see Table 4). All other measures presented in Table 3 were comparable, with better values obtained on the NDS images. Moreover, the entropy, which is a measure of the information content of the image [14], was higher for both scanners in the cases of the NDS and DS images. Significantly different entropy values were obtained mostly after normalization and speckle reduction filtering (see Table 4). Low entropy images have low contrast and large areas of pixels with the same or similar gray level values. An image which is perfectly flat will have zero entropy. On the other hand, high entropy images have high contrast [13]. Because its images have higher entropy, the ATL HDI-5000 scanner therefore produces images with higher information content. Entropy was also used in other studies to classify the best liver ultrasound images [34], where it was shown that the experts rated the images with higher entropy values as better. In [8], entropy and other texture features were used to classify between symptomatic and asymptomatic carotid plaques for assessing the risk of stroke. It was also shown [17] that asymptomatic plaques tend to be brighter, have higher entropy and are more coarse, whereas symptomatic plaques tend to be darker, have lower entropy (i.e. the image intensity in neighboring pixels is more unequal) and are less coarse. Furthermore, it is noted that texture analysis could also be performed on smaller areas of the carotid artery, such as the plaque, after segmentation [20, 21].
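The behavior of entropy described above can be checked with a short sketch. It assumes the first-order Shannon entropy of the gray-level histogram, which may differ from the exact entropy feature used in Table 3; the function name `gray_level_entropy` is hypothetical.

```python
from collections import Counter
from math import log2


def gray_level_entropy(image):
    """Shannon entropy, in bits, of the gray-level histogram of a 2-D image.

    Assumes first-order (histogram) entropy; texture entropy computed from
    co-occurrence matrices would be defined differently.
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)
    counts = Counter(pixels)
    # Sum of p * log2(1 / p) over the gray levels that actually occur.
    return sum((c / n) * log2(n / c) for c in counts.values())


flat = [[128, 128], [128, 128]]      # one gray level only
two_level = [[0, 255], [0, 255]]     # two equally likely gray levels
four_level = [[0, 85], [170, 255]]   # four equally likely gray levels
```

Under this definition the flat image has an entropy of zero, and the entropy grows as the gray levels spread over more values.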

In previous studies [3, 9, 12, 16, 30, 31, 33], researchers evaluated image quality on natural scenery images using either only the visual perception by experts or some of the evaluation metrics presented in Table 5. In this study, MSE and RMSE values were in the range of 0.4–2.0 for all cases, and Err3, Err4, SNR, PSNR, Q, and SSIN were better for the NF–N images for both scanners, showing that normalization increases the values of these measures. In [2], speckle reduction filtering was investigated on ultrasound images of the heart. The MSE values reported after speckle reduction for the adaptive weighted median filtering, wavelet shrinkage-enhanced filter, wavelet shrinkage filter, and non-linear coherence diffusion were 289, 271, 132, and 121, respectively. Loupas et al. [22] applied an adaptive weighted median filter for speckle reduction in ultrasound images of the liver and gallbladder and used the speckle index and the MSE for comparing the filter with a conventional mean filter. It was shown that the filter improves the resolution of small structures in the ultrasound images. It was also documented in [27, 30, 31] that the MSE, RMSE, SNR, and PSNR measures are not objective for image quality evaluation and that they neither correspond to all aspects of visual perception nor correctly reflect artifacts [7, 33].
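For reference, the pixel-wise error measures discussed here can be sketched as follows. The textbook definitions are assumed, with a peak gray level of 255 for the PSNR; the exact formulas behind Table 5 may differ in detail, and the function names are illustrative only.

```python
from math import log10, sqrt


def mse(reference, processed):
    """Mean square error between two equally sized 2-D images."""
    sq = [(g - f) ** 2
          for g_row, f_row in zip(reference, processed)
          for g, f in zip(g_row, f_row)]
    return sum(sq) / len(sq)


def rmse(reference, processed):
    """Root mean square error."""
    return sqrt(mse(reference, processed))


def psnr(reference, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` = 255 assumes 8-bit images."""
    err = rmse(reference, processed)
    return float("inf") if err == 0 else 20.0 * log10(peak / err)
```

Because the PSNR depends on the images only through the RMSE, a lower RMSE always corresponds to a higher PSNR. Both are purely pixel-wise differences, which is consistent with the observation that they need not track visual perception.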

Recently, the Q [31] and SSIN [30] measures for objective image quality evaluation have been proposed. The best values obtained in this study were Q=0.95 and SSIN=0.95, and were obtained for the NF–N images for both scanners. These results were followed by Q=0.73 and SSIN=0.92 in the case of NF–NDS for the ATL HDI-3000 scanner, and Q=0.72 and SSIN=0.94 in the case of NF–DS for the ATL HDI-5000 scanner. In [31], where natural scenery images were distorted by speckle noise, the value reported for Q before contrast stretching was 0.4408, whereas the value for Q after contrast stretching was 0.9372.
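As an illustration of how such an index behaves, the sketch below computes the universal quality index over the whole image in a single window. This is a simplification: the index is usually evaluated in small sliding windows and averaged, and the windowing used for the values reported here is not reproduced. The function name is hypothetical.

```python
from statistics import fmean


def universal_quality_index(x_img, y_img):
    """Global universal quality index Q of two equally sized 2-D images.

    Q = 4 * cov_xy * mean_x * mean_y
        / ((var_x + var_y) * (mean_x**2 + mean_y**2))

    Computed over the whole image as one window (an assumption of this
    sketch). Q lies in [-1, 1] and equals 1 when the images are identical.
    """
    x = [p for row in x_img for p in row]
    y = [p for row in y_img for p in row]
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    denom = (vx + vy) * (mx ** 2 + my ** 2)
    if denom == 0:
        raise ValueError("Q is undefined when the denominator is zero")
    return 4.0 * cov * mx * my / denom


original = [[10, 20], [30, 40]]
stretched = [[20, 40], [60, 80]]  # same structure, doubled gray levels
```

Here `universal_quality_index(original, original)` evaluates to 1, while the pair `original` and `stretched` scores lower even though the two images are perfectly correlated, because the index also penalizes differences in mean luminance and contrast.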

The methodology presented in this study may also be applicable in future studies for the evaluation of new ultrasound and telemedicine systems, in order to compare their performance. It is also important to note that the methodology consists of subjective and objective measures that should be combined for a proper image quality evaluation result [33].

In conclusion, the results of this study showed that normalization and speckle reduction filtering are important processing steps favoring image quality. Additionally, the usefulness of the proposed methodology, based on quality evaluation metrics combined with visual evaluation, in ultrasound systems and in wireless telemedicine systems needs to be further investigated.