1 Introduction

RGB is a simple color scheme image which consists of three primary red (R), green (G), and blue (B) color components. Moreover, there is a total of 24 bits per pixel (8 bits for each color component). It is evident that each color component values range lies in [0, 255]. Hence, there are \({2}^{24}\approx 16.8\) million possible colors. This image type is commonly used for transmission, representation, and storage of color images on both analog and digital devices [1,2,3,4]. Color image quantization is a significant procedure of reducing the huge range of color values of a digital color image into a limited range. This scheme may be used in displaying devices with a limited color range, color image compression or reducing the transfer time of the image in a limited network traffic. Furthermore, a desirable quantization approach considers not only low computational complexity but also low color distortion. In other words, a desired output of quantization algorithm must be gained instantly with a minimum range of color values in a color palette. Additionally, the output must be visually similar to the original image, as far as possible. Although many algorithms have been presented for color quantization, the number of colors in a color palette is a predetermined parameter. Also, there are two types of color palette generating methods: image-independent method and image-dependent method. The former generates the color palette without taking the particular image contents into consideration. In the latter, however, the color palette is generated in regards to the color distribution of images [5].

Color quantization can be divided into two main groups, scalar quantization and vector quantization. In scaler method each color component range is limited to smaller range independently. So, the quantization process is done irrespective of the original color space distribution. It is worth mentioning that a scalar scheme is only ideal if image colors are uniformly distributed in RGB color space. Nevertheless, the distribution of colors in generic color images is by no means uniform. On the other hand, vector quantization considers all color components together, contrary to scalar quantization. Therefore, each pixel is considered as a color vector \(p_{i} = (r_{i} ,g_{i} ,b_{i} )\) which is regarded as a single unit [1]. Consequently, vector quantization likely outperforms scalar quantization. However, due to the large search space, computational time needed for vector quantization is longer.

Vector quantization is categorized into two groups: splitting algorithm [6,7,8,9,10,11,12,13,14] and clustering-based algorithm [15,16,17,18,19,20,21,22,23,24]. In splitting approaches the color space of the given image is split into a set of disjointed regions. In other word, the color cube is divided into many prisms. On the other side, in clustering method the color palette is structured in regards to cluster centroids. Therefore, optimum predefined \(K\) cluster centroids is figured out by means of clustering algorithm such as k-means or C-means. Then each cluster centroid represents a unit color in the color palette.

Median-cut algorithm is a well-known splitting-based method proposed by Heckbert [6]. The RGB cube is repeatedly split into smaller prisms along one of the color components axes. In each step, the sub-prisms with highest pixel count is selected for splitting. The optimal cutting point is selected from a point where both sub prisms contain almost equal number of image pixel. This procedure is repeated until the predefined number of representative colors is touched. Later the mean vector of the contained colors in each prism is selected to represent the image.

The popularity algorithm is also proposed by Heckbert [6]. This method chooses k number of colors to represent the image. For that reason, k regions of image color distribution with high density are selected for the color palette.

The center-cut algorithm [7] makes three efficient modifications on the median-cut scheme in order to achieve a more effective result. At the first step the sub-prism with the longest-dimension is selected for splitting instead of the sub-prism with the largest number of pixels. Next, the prism is cut from the center of the prism instead of cutting from a point where both sub-prisms contain almost equal number of image pixels. Finally, instead of cutting the three significant bits off each color component, three, two and four bits of the red, green and blue component are cut respectively.

On the other hand, there is another approach which is based on median-cut, called variance-based algorithm [10]. In this approach the prism with the highest weighted variances of color distribution is selected for splitting. Moreover, the prism is split along the axis which reduces the variance more than the other two axes. Lastly, the sum of squared errors from prisms are almost equal.

Another alternative splitting-based approach is Octree algorithm [11] which is also like median-cut algorithm. In this method the color cube can be divided into eight sub prisms. Then, the further sub prisms are divided into eight smaller prisms. The dividing operation continues until the \(K\) number of predefined sub prisms are constructed. Later the average of color in sub prisms represents the colors in the color palette.

Moreover, clustering-based quantization algorithms divide pixels into predefined \(K\) clusters with respect to pixel similarity. The K-means clustering scheme is widely applied in color quantization [25,26,27,28,29]. The \(K\) number of cluster centroids is randomly initialized. Afterwards, the initialized centroids are improving iteratively in order to minimize the similarity error. In each iteration pixels are assigned to the closest centroid. Thus, pixels associated with the same centroid represent a cluster. Subsequently, cluster centroids are updated by calculating the mean of all pixel values associated with the same centroid. Next, each cluster centroid represents colors in the color palette. Furthermore, Fuzzy c-means (FCM) also has been well applied in color quantization [30,31,32]. FCM is the extended version of k-means such that each cluster member can belong to many clusters instead of exactly one cluster. There is a membership value between each pixel and cluster centroid and this value indicates the level of the association between that pixel and a cluster centroid [33, 34].

The rest of this paper is arranged as follows: Sect. 2 describes the proposed method; Sect. 3 includes the comprehensive experimental results and discussion. Finally, Sect. 4, provides the conclusions.

2 Proposed color quantization algorithm

The purpose of the color quantization algorithm is to create an image with almost the same quality as the original image with a limited number of colors. In the meantime, important data of the image should be preserved while reducing the color range. In the proposed method there is no predefined parameter for desired number of colors for presenting a color image. This paper presents a vector-based color quantization algorithm using peak-picking strategy from red, green, and blue channel histograms. Moreover, in the proposed approach the color palette is determined by an image-independent scheme.

2.1 Color palette/quantized palette design

Generally, the real word image histograms are multimodal and there are multiple peaks and valleys on image histogram [34]. The histogram curve shows that pixels near the peak are more frequent in occurrence than pixels far from the peak. On the other hand, there are many unreliable peaks and several too close dominating peaks upon the image histogram which makes it hard to find the dominating peaks. To tackle this shortage, it is beneficial to eliminate unreliable peaks upon the image histogram by smoothing the histogram curve. Hence, red, green and blue component histogram curve has been smoothed independently by applying a Gaussian smoothing filter. The original histogram curve, f(x) is convoluted with Gaussian mask, g(x) as follows:

$$h(x) = f \times g(x) = \int\limits_{ - \infty }^{\infty } {f(x - \tau )g(\tau ){d}\tau ,}$$
(1)
$$g(x) = \frac{1}{{\sigma \sqrt {2\pi } }}{\text{e}}^{{ - \frac{{x^{2} }}{{2\sigma^{2} }}}} ,$$
(2)

where \(\sigma\) is variance of mask and h(x) is filtered histogram. The size and variance of the Gaussian filter used to smoothen the histogram are empirically set to 3 and 11.

As it is clear from Fig. 1 all dominated peaks are easily visible. The peak-picking method can be defined as follows:

$$peak_{s} = \;\left( {\left( {i,h_{s} \;(i)} \right)|h_{s} \;(i) > h_{s} \;(i - 1) \wedge h_{s} \;(i) > h_{s} \;(i + 1)} \right),\;s = \left\{ {r,g,b} \right\}$$
(3)
Fig. 1
figure 1

a Original image histogram. b Smoothed histogram

Let \(\vec{r},\vec{g}\) and \(\vec{b}\) be the peaks location for red, green and blue color component, respectively. Let's generate the color palette with the assumption that n, m and k are number of existing peaks in red, green and blue color component, respectively. Consequently, the maximum number of colors in the color palette will be \(n \times m \times k\). Therefore, for n = 2, m = 3 and k = 2 the color palette can be represented as Table 1.

Table 1 Example of color palette

However, the occurrence frequency of \(c_{i}\) in the 3D histogram may be zero. In the other word, some of the colors in the palette may come across a place in color space where there is no pixel. Thus, unreliable colors in the palette will be removed from the palette. As it is clear, the location of color vectors of quantized palette on color space is dependent on the color distribution in RGB color space.

2.2 Pixel mapping

Firstly, since each color vector \(c_{i}\) is a single unit in the color palette, they would be indicated as cluster centers in RGB color space. Next, each pixel \(p_{u,v}\) in the image will be assigned to the cluster which has the least Euclidean distance. The assignment operation is mathematically calculated as follows:

$$S_{i} = \left\{ {\mathop p\nolimits_{u,v} :D_{{\mathop c\nolimits_{i} }}^{{\mathop p\nolimits_{u,v} }} \le D_{{\mathop c\nolimits_{j} }}^{{\mathop p\nolimits_{u,v} }} \forall j,1 \le j \le \left( {n \times m \times k} \right)} \right\}$$
(4)

where \(D_{{\mathop c\nolimits_{i} }}^{{\mathop p\nolimits_{u,v} }} = \sqrt {\left( {\Delta R^{2} + \Delta G^{2} + \Delta B^{2} } \right)}\), \(\Delta R = \left| {p_{{_{u,v} }}^{r} - c_{i}^{r} } \right|\), \(\Delta G = \left| {p_{{_{u,v} }}^{g} - c_{i}^{g} } \right|\), \(\Delta B = \left| {p_{{_{u,v} }}^{b} - c_{i}^{b} } \right|\) and \(p_{u,v} = (r_{u,v} ,g_{u,v} ,b_{u,v} )\).

Next, if no members are assigned to a cluster and it remains empty, that cluster will be deleted. As can be seen, there is no user-defined parameter for the number of colors in the color palette. By this account, homogeneous pixels will be grouped together into clusters. Later, the color palette (Cluster centers) will be updated by the mean color vector of pixels in each cluster. Finally, the new image will be generated by replacing the original pixel value with the associated color vector of cluster center.

The proposed algorithm is summarized by the following seven steps:

  • Computing histogram for all color components.

  • Smoothing all color component histograms.

  • Finding the dominating peaks from smoothed histograms.

  • Generating the color palette (Cluster Centers) by means of histograms peaks.

  • Getting rid of the unreliable color from the palette.

  • Assigning each pixel to the closest value on the color palette

  • Updating pixel values by the mean color vector of pixels in the clusters

Figure 2 helps to better understand the mechanism of the proposed algorithm. Figure 2a shows the original color image where Fig. 2b shows the quantized image by the proposed approach. On the other hand, Fig. 2c illustrates the histogram of each color components. Lastly, Fig. 2d shows the color distribution of the image along with generated color palette using histogram peaks. It is clear from Fig. 2d, the intersection points of the histogram peaks in the color space indicate the unit color vector in the palette.

Fig. 2
figure 2

a Original image, b quantized image, c smoothed RGB histogram with located peaks, d generated color palette and color distribution on color space

3 Experimental results and performance evaluation

Our experiments were implemented using MATLAB R2019 on a Core i7-7700HQ 2.80 GHz CPU, 32 GB RAM running Windows 10. The proposed method and all compared algorithms have been tested over standard publicly accessible Berkeley segmentation dataset (Fig. 3) [35]. To evaluate the ability of the proposed approach, 22 images from this dataset have been randomly selected and tested with proposed method and two well-known fuzzy c-means (FCM) [36] and median cut (MC) quantization algorithms. Both visual and numerical assessments have been used to evaluate the outcome of algorithms. Moreover, algorithm computation time, peak signal to noise ratio (PSNR), MSE and structural-similarity index (SSIM) performance criteria have been considered in the evaluations. General form of PSNR, MSE and SSIM mathematically calculated as follows:

$$\begin{gathered} {\text{PSNR}} = 10 \log_{10} \left( {\frac{{255^{2} }}{{{\text{MSE}}}}} \right), \hfill \\ {\text{MSE}} = \frac{1}{M \times N}\mathop \sum \limits_{x = 0}^{M - 1} \mathop \sum \limits_{y = 0}^{N - 1} \left[ {I\left( {x,y} \right) - \tilde{I}\left( {x,y} \right)} \right]^{2} , \hfill \\ \end{gathered}$$
(5)

where \(I\) and \(\tilde{I}\) are original and quantized images of size M \(\times\)N, respectively.

$${\text{SSIM}}\left( {I,\tilde{I}} \right) = \frac{{\left( {2\mu_{I} \mu_{{\tilde{I}}} + c_{1} } \right)\left( {2\sigma_{{I\,\tilde{I}}} + c_{2} } \right)}}{{\left( {\mu_{I}^{2} \mu_{{\tilde{I}}}^{2} + c_{1} } \right)\left( {\sigma_{I}^{2} + \sigma_{{\tilde{I}}}^{2} + c_{2} } \right)}},$$
(6)

where \(\mu_{I}\) and \(\mu_{{\tilde{I}}}\) are the local sample averages \(I\) and \(\tilde{I}\) respectively,\(\sigma_{I\,}\) and \(\sigma_{{\,\tilde{I}}}\) the local sample standard deviations of \(I\) and \(\tilde{I}\) respectively, \(\sigma_{{I\,\tilde{I}}}\) the local sample correlation coefficient between \(I\) and \(\tilde{I}\), C1, C2 the constants required to balance the calculations. The SSIM can take values in range [− 1, 1]. 1 is the desired value for SSIM. Moreover, higher and lower value are desired in PSNR and MSE, respectively.

Fig. 3
figure 3figure 3figure 3figure 3

Original images and quantized images

As it is mentioned before, the number of the colors in the quantized palette is determined automatically by the algorithm. Therefore, all images are also quantized with the FCM and MC algorithms where the number of colors of the quantized palette is set as compatible with the proposed method. However, MC works if only the number of colors be power of two. Hence, the determined color number (m) by the proposed method will be rounded down to the nearest power of two. Furthermore, since this FCM is a stochastic algorithm that may have different performances at each run. Consequently, FCM runs 30 times over each test image. Therefore, the results will be reported by taking the average over all runs of this algorithm. The maximum number of iterations for FCM is equal to 200.

The visual qualitative assessment of all test images is shown in Fig. 3 while Table 2 reports the numerical qualitative analysis of the results achieved using all three methods tested. First column of Fig. 3 indicates the names of images where the rest of the columns show outcomes for PPCS, FCM and MC. The value below each image is the number of colors in the corresponding image. Besides, Table 2 is constructed by six main columns. First one gives the name of the images, columns 3 reports the number of representative colors in each method. Likewise, the rest of the columns provide the PSNR, MSE, SSIM and computation time of each algorithm.

Table 2 Quantitative evaluation of results

From Fig. 3 it is evident that the proposed approach has the ability to reach a promising quantization. Table 2 reveals that for nearly all test images, PPCS, FCM and MC all yield satisfying outcomes. Major difference is that PPCS requires no number of representative colors to be set in advance. PPCS determines the number of representative colors automatically. FCM could perform strong quantization results if the optimum number of representative colors was known in advance. Moreover, in MC the number of representative colors must set to power of two. Evidently, the number of representative colors is a handicap for both FCM and MC. Therefore, for a fair assessment of the performance, the number of representative colors for FCM and MC is determined by the number of generated cluster centers identified by PPCS.

Considering the results shown in Fig. 2 for ‘239096’ and ‘15004’, it would appear that FCM is inefficient at quantizing the image. Except these two images it is clear the results achieved by all three methods are viable for the rest of the images.

Table 2 shows that with respect to the numerical evaluation, all methods achieve competitive outcomes, such as they surpassed one another in many cases. For example, for ‘351093’ PPCS outperforms other methods in all performance criteria. However, FCM outperforms others for ‘106024’ in all cases except for competition time. On the other hand, MC surpasses others for ‘239096’ regarding all performance criteria. Clearly, the computational effort needed to perform FCM is dependent on the cluster size. Also, the size of the image. is almost same, and computational efforts of PPCS and MC are independent of the image size and therefore the number of clusters does not have noticeable effect. Such that, quantization of images with a high number of representative colors takes considerably long with FCM whereas it takes a shorter time with low number of representative colors. However, the computation time for all images is almost the same in PPCP and MC. Last row of Table 2 indicates the average of each performance criteria over all test images. Therefore, the results can be conveniently interpreted and compared. As regards PSNR and computation time, the proposed algorithm is ranks first where MC and FCM are ranked second and last. Likewise, respective to MSE and SSIM, FCM ranked first when PPCS and MC ranked second and last, correspondingly.

Looking at the overall results, it can be concluded that the PPCS, FCM and MC approaches perform well, and provide satisfactory quantization results for roughly all test images. Evident from the comprehensive investigation of both visual and numerical evaluations, the proposed (PPCS) method presents promising quantization results. This success is owed to the ability of the algorithm in automatic estimation of the appropriate number of representative colors.

Figure 4 shows the average PSNR, MSE, SSIM and computation time of PPCS, FCM and MC algorithms on all 22 color images.

Fig. 4
figure 4

PSNR, MSE, SSIM and computation time comparisons among PPCS, FCM and MC on all test image

4 Conclusion

In this paper, a novel automated clustering of pixels and color quantization algorithm is proposed. The proposed color quantization technique (PPCS) is able to automatically estimate an appropriate number of colors in quantized palette as well as the cluster centroids (Colors of palette), indicating the advantage applying peak-peaking strategy on color spaces. The ideal number of representative colors is unknown beforehand in most practical cases. This issue may in fact be a significant handicap to many methods, since this parameter is a user-defined one. However, proposed PPCS requires no number of representative colors to be set in advance and estimation of this parameter is provided by the algorithm which makes it usable and practical in real-word application. On the other hand, the computation time of the algorithm is not dependent on the size of the image to be quantized. To evaluate the ability of the PPCS, 22 images from Berkeley segmentation dataset have been selected and tested with PPCS, FCM and MC algorithms. The numerical evaluations have been carried out by using computation time, PSNR, MSE and SSIM performance criteria. Both visual and numerical evaluations reveal that the proposed method presents promising quantization results. Such that, PPCS is ranked first, second, first and first respect to PSNR, MSE, SSIM and computation time, respectively.