1 Introduction

In lossy compression, the reconstructed image contains degradation relative to the original [76]. Lossy methods are especially suitable for natural images in applications where a minor loss of fidelity is acceptable in exchange for a substantial reduction in bit rate. Image compression is used in many areas, such as security [27, 28], vehicular networks [15, 72], video applications [73, 74], text detection [71], smart materials [70], and motion estimation [22, 23].

Among the existing lossy compression methods, transform coding is one of the most effective strategies [41]. This method operates in three stages: image transformation, followed by quantization and entropy coding, as shown in Fig. 1.

Fig. 1 Basic model of lossy image compression

Transform-based image compression techniques are mainly based either on the Discrete Cosine Transform (DCT) [50], as in the Joint Photographic Experts Group (JPEG) standard [48], or on the Discrete Wavelet Transform (DWT) [19].

Although DCT-based image compression provides satisfactory compression efficiency and allows a low-memory implementation, the tiling into blocks causes blocking artifacts that degrade performance, especially at low bit rates [49]. One way to resolve this problem is to allow the transform to be performed across different blocks. An example is the DWT, which is applied to the whole image instead of on a block basis. The DWT has been successfully used in image-processing applications ever since Mallat [43] proposed the multiresolution representation of signals based on wavelet decomposition.

The advantage of DWT over other transformations is that it carries out multiresolution analysis with localization in both time and frequency. The DWT is increasingly used for image compression today since it supports features like progressive image transmission [3, 45]. In fact, it is the basis of the JPEG2000 image compression standard [69].

The DWT has traditionally been implemented by convolution, i.e., Finite Impulse Response (FIR) filter-bank structures. However, this approach requires far more computation and storage than is desirable for high-speed or low-power image-processing applications [19, 40, 44].

Hence, the lifting-based DWT [62] was proposed; it has become popular thanks to its lower computational cost, more efficient performance and easier hardware implementation.

Over the years, several methods for coding and quantizing the wavelet coefficients [35] have been developed and implemented. These include the Embedded Zerotree Wavelet (EZW) [58], Set Partitioning in Hierarchical Trees (SPIHT) [56], the Set Partitioned Embedded Block Coder (SPECK) [36] and Embedded Block Coding with Optimized Truncation (EBCOT) [63].

These algorithms are used along with wavelet subband coding in order to preserve progressiveness. In fact, they create an embedded bit stream and allow progressive data transmission, so the image can be reconstructed at different compression ratios.

In the literature, various improvements have been made to the three blocks of image compression to enhance their performance. Since it was shown in [60] that arithmetic coding [68] is more efficient than Huffman coding [33] and is the most commonly used in wavelet-based algorithms, most improvements have therefore focused on the first two blocks, namely transformation and quantization.

Recently, several surveys of wavelet-based image compression have been conducted, such as the works presented in [16, 24, 25, 34, 37, 47, 51, 53]. In [51], the authors proposed an extensive study of the different existing wavelets and encoding algorithms. In [25, 34, 53], the authors focused on the features and applications of the popular image compression algorithms. In [16, 24, 37, 47], a survey of various image compression techniques was given, together with an extensive study of the features of these algorithms.

Unlike [16, 24, 25, 34, 37, 47, 51, 53], we present a detailed study of the recent improvements in the wavelet transform. In particular, we identify the best wavelet transform for image compression. Also, we review and discuss the advantages and disadvantages of different wavelet coding methods. Furthermore, our work focuses on the different improvements made to the EZW and SPIHT algorithms: we suggest a new improved algorithm based on EZW and we shed light on the design of a new efficient wavelet image codec. Our contributions are highlighted as follows:

  • We provide an overview of the wavelet transform as well as of the existing wavelet-based image compression algorithms. We review the recent improvements of the different steps of image compression.

  • We analyze and compare the performance of the enhancement and compression techniques in order to suggest and design an efficient codec for wavelet image compression.

The remaining sections of this paper are organized as follows. In Section 2, we give an outline of the state of the art of the wavelet transform as well as its recent improvements. Section 3 presents a comparative study of different DWTs. The different methods for coding and quantizing the coefficients are given in Section 4; the salient features of each coding scheme are discussed briefly, together with an outline of the recent improvements of the different wavelet-based coding schemes. Section 5 presents a qualitative and quantitative analysis of the different coding schemes. Suggestions for designing an efficient codec for wavelet image compression are given in Section 6. Finally, Section 7 concludes the paper and presents future work.

2 Overview of wavelet transform (WT)

Basically, wavelets are mathematical functions that have an average value of zero and an effectively-limited duration.

The WT represents an image as a sum of wavelet functions that are localized in both time and frequency [19, 57]. Any decomposition of an image into wavelets implies two waveforms. The first one represents the low frequencies or smooth parts of an image (scaling function Φ) and the second one stands for the high frequencies corresponding to the detailed parts of an image (wavelet function Ψ).

2.1 Multi-resolution analysis: First generation wavelets

The WT carries out multiresolution image analysis [44]. The result of multiresolution analysis is a simultaneous image representation on various resolution levels. The resolution is determined by a threshold below which all details are skipped. The difference between two neighboring resolutions represents details. Consequently, an image can be represented by approximation (a low-resolution image) and the details on each higher-resolution level [21].

Let us consider a one-dimensional (1-D) function f(t). At the resolution level j, fj(t) is the approximation of the function f(t). At the next resolution level (j+1), fj+1(t) is the approximation of the function f(t). The details, represented by dj(t), are contained in fj+1(t):

$$ f_{j + 1}(t)=f_{j}(t)+d_{j}(t) $$
(1)

This procedure can be repeated many times, and f(t) can be expressed as:

$$ f(t)=f_{j}(t)+\sum\limits_{k=j}^{n}d_{k}(t) $$
(2)

The space of square-integrable functions \(L^{2}(\mathbb {R})\) can similarly be considered as a composition of the scaling and wavelet subspaces Vj and Wj, such that the details dj(t) of f(t) are in Wj and the approximation fj(t) at resolution j is in Vj.

Vj and Wj are defined in terms of translates and dilates of the wavelet function (Ψ) and the scaling function (Φ):

$$ \left\{ \begin{array}{ll} V_{j}=\{{\Phi}(2^{j}x-k)|k \in \mathbb{Z}\} \\ W_{j}=\{{\Psi}(2^{j}x-k)|k \in \mathbb{Z}\} \end{array} \right. $$
(3)

Vj and Wj are localized in dyadically-scaled frequency by the resolution parameter or scale 2^j and localized spatially by the translation k. Vj must be contained in all subspaces at higher resolutions (Vj ⊂ Vj+1). Wj fills the gap between successive scales: Vj+1 = Vj ⊕ Wj.

First, we start with an approximation at some scale V0. Then, we use the wavelets to fill in the missing details at ever finer scales. The finest resolution level contains all square-integrable functions: \(L^{2}(\mathbb {R})=V_{0} \oplus \bigoplus ^{\infty }_{j = 0} W_{j}\). Since Φ ∈ V0 ⊂ V1, it follows that the scaling function for the multiresolution approximation can be obtained as the solution of a two-scale dilation equation for some appropriate sequence of coefficients (aL(k), L = Low):

$$ {\Phi}(x)=\sum\limits_{k}a_{L}(k){\Phi}(2x-k) $$
(4)

Once Φ has been found, a related mother wavelet is defined by a similar formula:

$$ {\Psi}(x)=\sum\limits_{k}a_{H}(k){\Phi}(2x-k) $$
(5)

Some effort is required to produce appropriate coefficient sequences aL(k) (for low “L” pass filters) and aH(k) (for high “H” pass filters) [19].

One of the major discoveries of wavelet analysis was that perfect-reconstruction filter banks can be formed using the coefficient sequences aL(k) and aH(k), as shown in Fig. 2.

Fig. 2 Two-channel filter bank

The input sequence x is convolved with high-pass filters aH(k) and low-pass filters aL(k) and every result undergoes decimation by the factor of ‘two’, which gives the transform signals xH and xL.

The signal is reconstructed through upsampling and convolution with high synthesis filters (sH(k)) and low synthesis filters (sL(k)). For properly designed filters, the signal is reconstructed exactly (y = x).

The choice of filters determines both the possibility of perfect reconstruction and the form of the wavelet used to perform the analysis. By cascading the analysis filter bank a number of times, a multi-level DWT decomposition can be formed. The inverse operation is called the inverse DWT (used for synthesis). An efficient way to implement this scheme using filters was developed by Mallat [43].
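To make the two-channel bank of Fig. 2 concrete, the following minimal Python sketch (our own illustration, not taken from the cited references) implements one analysis/synthesis stage with the orthonormal Haar filters; for this wavelet, convolution followed by decimation by two reduces to pairwise sums and differences, and the synthesis stage reconstructs the input exactly (y = x):

```python
import numpy as np

def haar_analysis(x):
    """One stage of the two-channel analysis bank of Fig. 2: filter with
    the Haar low-/high-pass pair, then decimate by two.
    Assumes an even-length 1-D signal."""
    x = np.asarray(x, dtype=float)
    xL = (x[0::2] + x[1::2]) / np.sqrt(2)  # low-pass branch
    xH = (x[0::2] - x[1::2]) / np.sqrt(2)  # high-pass branch
    return xL, xH

def haar_synthesis(xL, xH):
    """Synthesis bank: upsample by two, filter with sL/sH, and add."""
    y = np.empty(2 * len(xL))
    y[0::2] = (xL + xH) / np.sqrt(2)
    y[1::2] = (xL - xH) / np.sqrt(2)
    return y

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
xL, xH = haar_analysis(x)
assert np.allclose(haar_synthesis(xL, xH), x)  # perfect reconstruction
```

For longer filters such as those of Table 1, the pairwise operations become genuine convolutions, but the decimate-by-two / upsample-by-two structure is identical.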

A 2-D DWT can be derived from the 1-D DWT. To obtain scaling and wavelet functions for two dimensions, we have to multiply two 1-D functions. The scaling function for 2-D DWT can be obtained by multiplying two 1-D scaling functions: Φ(x,y) = Φ(x)Φ(y).

As for the wavelet functions for the 2-D DWT, they are obtained as products of the 1-D wavelet and scaling functions.

There are three wavelet functions that scan details in horizontal, vertical and diagonal directions:

  • ΨI(x,y) = Φ(x)Ψ(y) (horizontal)

  • ΨII(x,y) = Ψ(x)Φ(y) (vertical)

  • ΨIII(x,y) = Ψ(x)Ψ(y) (diagonal)

This may be represented as a four-channel perfect reconstruction filter bank as shown in Fig. 3. Each filter is 2-D with the subscript representing the type of filter (low or high pass filter) for separable horizontal and vertical components. The resulting four transform components consist of all possible combinations of high and low pass filtering in the two directions.

Fig. 3 One filter stage in 2D-DWT

By using these filters (high and low pass) in one stage, an image can be decomposed into four bands (sub-bands). There are three types of detail images for every resolution: HL (horizontal), LH (vertical), and HH (diagonal), where L = low and H = high. The same operations can be repeated on the LL (low-low) sub-band using a second stage of the identical filter bank. In this way, a 2-D DWT generates the hierarchical pyramidal structure shown in Fig. 4.

Fig. 4 Subband decomposition

The important criteria for the selection of suitable wavelet functions are compact support (to obtain sparse representations), the number of vanishing moments (related to the number of wavelet oscillations), symmetry (useful for avoiding phase distortion), and regularity and degree of smoothness (related to the filter order or length) [26].

The Daubechies Wavelet (DW) family, whose high- and low-pass filter coefficients are given in Table 1, is a family of compactly supported orthogonal wavelets [19]. The 'compactly supported' criterion for wavelets corresponds to FIR filters, so these filters lead to efficient implementations [75]. A major disadvantage of DWs is the asymmetry of their filters, which gives rise to artifacts at the borders of the wavelet subbands. Symmetry can be obtained only by giving up either the compact support or the orthogonality of the wavelet, except for the Haar wavelet, which is the only orthogonal wavelet that is both compactly supported and symmetric [26]. The high- and low-pass filter coefficients of the Haar wavelet are given in Table 2.

Table 1 The filter coefficients of the Daubechies-2 wavelet
Table 2 The filter coefficients of the Haar wavelet

To have both symmetry and compact support, we must relax the orthogonality constraint and allow non-orthogonal wavelet functions. An example of this type of wavelet is the family of biorthogonal wavelets [18], which contains compactly supported and symmetric wavelets [1].

Many biorthogonal wavelet filters are used in the computation of the wavelet transform [18].

Cohen, Daubechies and Feauveau [18, 65] introduced the first biorthogonal wavelets, including CDF9/7, which has a large number of vanishing moments for a relatively short support. It is nearly symmetric and very close to orthogonal. The CDF9/7 wavelet has been widely used in image coding and has become the core of the JPEG2000 standard [69], where it is used for the irreversible (lossy) DWT.

2.2 Lifting-based DWT scheme: Second generation wavelets

The convolution-based discrete wavelet transform (DWT) requires far more computation and storage than is desirable for high-speed or low-power image-processing applications [19, 44]. Sweldens proposed a new mathematical formulation of the wavelet transform based on the spatial construction of biorthogonal wavelets [62]. In [20], a very versatile scheme for its factorization was suggested. This new approach is called the lifting-based DWT scheme. Together with Daubechies, Sweldens constructed second-generation wavelets and showed how traditional wavelets can be realized in lifting form.

The lifting-based DWT scheme can be implemented as shown in Fig. 5 to reduce the computational complexity.

Fig. 5 Conventional lifting scheme

Consider a 2-D signal x[m,n]. Without loss of generality, we assume that this signal is decomposed by a 1-D wavelet transform first in the horizontal direction and then in the vertical direction. Each 1-D wavelet transform can be factored into one or more lifting stages. A typical lifting scheme is composed of four steps in both the decomposition and the reconstruction stages.

As for the decomposition stage, it consists of four steps: Split, Predict, Update and Normalize.

  (a)

    Splitting: x[m,n] is split into even (e1) and odd (o1) sub-arrays, where:

    e1 = {e1[m,n] = x[2m,n]} and o1 = {o1[m,n] = x[2m + 1,n]}

  (b)

    Predict: The odd array o1, whose samples are located at odd positions, is predicted from the neighboring even array e1. The predictor P(.) is a linear combination of the neighboring even samples.

    $$ P({e}^{1})[m,n] =\sum\limits_{i}p_{i}{e}^{1}[m+i,n]\\ $$
    (6)

    pi represents the high-pass filter coefficient. The resulting prediction residual or high subband coefficient is

    $$ o_{\nu}^{1}[m,n] =o^{1}[m,n]- P(e^{1})[m,n]\\ $$
    (7)
  (c)

    Update: This step transforms the even array e1 into a low-pass filtered and sub-sampled version of x[m,n]. This coarse approximation is obtained by an update with a linear combination of the prediction residual \(o_{\nu }^{1}\). The e1[m,n] is replaced with

    $$ e_{\nu}^{1}[m,n] =e^{1}[m,n]+ U(o_{\nu}^{1})[m,n] $$
    (8)

    where U(.) is a linear combination of the neighboring \(o_{\nu }^{1}\) values

    $$ U(o_{\nu}^{1})[m,n] =\sum\limits_{j}u_{j}o_{\nu}^{1}[m+j,n] $$
    (9)

    where uj represents the low-pass filter coefficient.

  (d)

    Normalization: s1 and d1, which represent the smoothed array and the array of details respectively, are obtained by the following operation:

    $$ \left\{ \begin{array}{ll} s^{1}= \sqrt{2}e_{\nu}^{1}\\ d^{1}= o_{\nu}^{1}/\sqrt{2} \end{array} \right. $$
    (10)

    where \(e_{\nu }^{1}\) and \(o_{\nu }^{1}\) represent the new even and odd arrays.

The reconstruction of the signal x from the smoothed array s1 and the array of details d1, by running the steps in reverse order, is one of the most attractive features of the lifting scheme.

The inverse lifting stage is shown in Fig. 5b.

The lifting scheme of the CDF9/7 DWT comprises four steps: two prediction operators and two update operators.

The lifting coefficients of the CDF9/7 wavelet filter are listed in Table 3.

Table 3 Lifting coefficient of CDF9/7 wavelet

In this table, α, γ denote the prediction filter coefficients and β, δ denote the update filter coefficients. The parameter ξ represents the scaling factor.
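The following sketch implements these four lifting steps for a 1-D signal. Table 3 is not reproduced here, so the code uses the commonly quoted values of α, β, γ, δ and ξ from the Daubechies-Sweldens factorization of CDF9/7 (an assumption on our part); boundary samples are handled by simple edge repetition, which keeps every step exactly invertible:

```python
import numpy as np

# Commonly quoted CDF9/7 lifting coefficients (Table 3 is the
# authoritative source in the text; these values are assumed here)
ALPHA, BETA  = -1.586134342, -0.05298011854   # predict 1, update 1
GAMMA, DELTA =  0.8829110762, 0.4435068522    # predict 2, update 2
XI = 1.149604398                              # scaling factor

def cdf97_forward(x):
    """One level of the CDF9/7 lifting scheme (Fig. 5a) for an
    even-length 1-D signal, with edge repetition at the boundaries."""
    s, d = x[0::2].astype(float), x[1::2].astype(float)
    d += ALPHA * (s + np.append(s[1:], s[-1]))   # predict 1
    s += BETA  * (d + np.append(d[0], d[:-1]))   # update 1
    d += GAMMA * (s + np.append(s[1:], s[-1]))   # predict 2
    s += DELTA * (d + np.append(d[0], d[:-1]))   # update 2
    return XI * s, d / XI                        # normalization

def cdf97_inverse(s, d):
    """Reverse the lifting steps in the opposite order (Fig. 5b)."""
    s, d = s / XI, d * XI
    s -= DELTA * (d + np.append(d[0], d[:-1]))
    d -= GAMMA * (s + np.append(s[1:], s[-1]))
    s -= BETA  * (d + np.append(d[0], d[:-1]))
    d -= ALPHA * (s + np.append(s[1:], s[-1]))
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = s, d
    return x

x = np.random.rand(16)
assert np.allclose(cdf97_inverse(*cdf97_forward(x)), x)
```

Because every predict/update step is subtracted back exactly during reconstruction, the scheme is invertible regardless of the filter values, which is the key practical advantage of lifting over convolution.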

2.3 Recent improvements of biorthogonal wavelet transforms: spline-based DWT scheme

Like other wavelet bases, splines have a significant impact on the theory of the wavelet transform [17, 64]. The symmetric compactly supported spline wavelets [18] turn out to be quite popular for coding applications in image compression. Mainly, the symmetry and compact support criteria are very useful for reducing artifacts in the reconstructed images. However, since the introduction of the lifting scheme for the design of wavelet transforms, a new way has opened for the use of splines as a tool for devising wavelet transforms. For the construction of the appropriate filters, authors use polynomial and discrete splines because they provide filters whose properties are useful for signal processing. These filters have flat spectra. Their impulse responses are symmetric and well-localized in the spatial domain. Also, they generate wavelets that can have any number of vanishing moments.

A mathematical formulation for the construction of prediction and update filters using discrete and polynomial splines is given below.

2.3.1 Discrete splines

The discrete B-spline [5, 8] of first order is defined as follows:

$$ \beta_{1,n}(j) = \left\{ \begin{array}{ll} 1 & \text{if } j = 0,\ldots,2n-1, \quad n \in \mathbb{N}, j \in \mathbb{Z}\\ 0 & \text{elsewhere} \end{array} \right. $$
(11)

The higher order B-spline is defined as the discrete convolution by recurrence:

$$ \beta_{l,n}= \beta_{1,n}* \beta_{l-1,n} $$
(12)

The z-transform of the B-spline of order l is then:

$$ \beta_{l,n}(z)= (1+z^{-1}+z^{-2}+...+z^{-2n + 1})^{l}, l = 1,2,... $$
(13)
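For illustration, the recurrence (12) can be evaluated directly by discrete convolution; the short Python sketch below (ours) prints the coefficient sequences whose z-transforms are given by (13):

```python
import numpy as np

def discrete_bspline(l, n):
    """Coefficients of the order-l discrete B-spline beta_{l,n}, built
    by repeated convolution of the first-order spline as in (12);
    equivalently the coefficients of (1 + z^-1 + ... + z^-(2n-1))^l."""
    b1 = np.ones(2 * n)          # first-order spline, eq. (11)
    b = b1.copy()
    for _ in range(l - 1):
        b = np.convolve(b, b1)   # eq. (12)
    return b

print(discrete_bspline(2, 1))    # beta_{2,1}: [1, 2, 1] = (1+z^-1)^2
print(discrete_bspline(4, 1))    # beta_{4,1}: [1, 4, 6, 4, 1]
```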

Only the case where l = 2r, r ∈ ℕ, and n = 1 is used, because odd-order B-splines are not symmetric [7].

The corresponding splines are denoted as βr = β2r,1. In this case, we have βr(z) = (1 + z^{-1})^{2r}.

We define the central B-spline Qr(j) of order 2r as a shift of the B-spline, with z-transform:

$$ \begin{array}{ll} Q_{r}(j)= \beta_{r}(j+r) \\ Q_{r}(z)=z^{r}\beta_{r}(z)=z^{r}(1+z^{-1})^{2r}, \quad j\in \mathbb{Z}. \end{array} $$
(14)

The discrete spline of order 2r is defined as a linear combination, with real-valued coefficients, of shifts of the central B-spline of order 2r:

$$ S_{r}(k)= \sum\limits_{l=-\infty}^{+\infty}c(l)Q_{r}(k-2l) $$
(15)

Let e(k) be a given sequence. The discrete spline Sr is called the interpolatory spline if the following relations hold:

$$ S_{r}(2k)= e(k), k \in Z $$
(16)

The points {2k} are called the nodes of the spline.

The values of the splines in the midpoints between the nodes are denoted as

$$ \sigma(k) =S_{r}(2k + 1),k \in Z $$
(17)

The interpolatory spline which satisfies conditions (16) and (17) is represented as follows:

$$ C(z^{2})= \frac{2e(z^{2})}{z^{r}(1+z^{-1})^{2r}+(-z)^{r}(1-z^{-1})^{2r}} $$
(18)

The z-transform of the interpolatory spline values at the midpoints is:

$$ \begin{array}{ll} \sigma(z^{2})=z U_{d}^{2r}(z)e(z^{2})\\ U_{d}^{2r}(z)= z\frac{(1+z^{-1})^{2r}-(-1)^{r}(1-z^{-1})^{2r}}{(1+z^{-1})^{2r}+(-1)^{r}(1-z^{-1})^{2r}} \end{array} $$
(19)

The transfer functions of these prediction filters \(U_{d}^{2r}\) (except the case r = 1) are rational ones which do not have poles on the unit circle |z| = 1. Thus, filtering can be performed in a recursive manner. Explicit formulas were derived, which enable fast cascade or parallel implementation.

2.3.2 Polynomial splines

The central B-spline of first order is defined as follows:

$$ \beta^{1}(x) = \left\{ \begin{array}{ll} 1 & \text{if } x \in[-1/2,1/2]\\ 0 & \text{elsewhere} \end{array} \right. $$
(20)

The central B-spline of order l is defined via the convolution:

$$ \beta^{l}=\beta^{l-1}*\beta^{1}, \quad l \ge 2 $$
(21)

It is supported on the interval [−l/2, l/2], positive within its support and symmetric around zero. The nodes of even- and odd-order B-splines are located at the points {k} and {k + 1/2} respectively, with k ∈ ℤ.

Denote:

$$ \begin{array}{ll} t^{l}=\{\beta^{l}(k)\}\\ w^{l}=\{\beta^{l}(k + 1/2)\}, k\in Z. \end{array} $$
(22)

The prediction filter, which is obtained for the spline of order l, is represented as follows [9]:

$$ {U_{p}^{l}}=\frac{W^{l}(z)}{T^{l}(z)} $$
(23)

where W^l and T^l represent the z-transforms of the sequences w^l and t^l respectively.

These z-transforms are Laurent polynomials, known as the Euler-Frobenius polynomials [7].

To complete the construction of the transform, we need to define the filter V for the update step, which is likewise derived from discrete and polynomial splines. In order to produce synthesis and analysis filters with similar properties, it is advisable to choose [7]:

$$ V(z)=U(z)/z $$
(24)

where U is the predict filter.
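To illustrate the construction, note that for r = 1 the prediction filter (19) reduces to the FIR filter U(z) = (1 + z²)/2, i.e. each odd sample is predicted by the average of its two even neighbours, and (24) then shifts the same taps for the update. The sketch below is our own transcription of these formulas, not code from [6, 7]; since any predict/update pair yields an invertible lifting step, the reconstruction is exact:

```python
import numpy as np

def spline_lifting_forward(x):
    """One lifting step built from the simplest (r = 1) spline filters:
    predict U(z) = (1 + z^2)/2, update V(z) = U(z)/z as in (24).
    Even-length 1-D input; edges handled by sample repetition."""
    e, o = x[0::2].astype(float), x[1::2].astype(float)
    d = o - 0.5 * (e + np.append(e[1:], e[-1]))  # prediction residual (7)
    s = e + 0.5 * (d + np.append(d[0], d[:-1]))  # update (8)
    return s, d

def spline_lifting_inverse(s, d):
    """Undo the update, then the prediction, and re-interleave."""
    e = s - 0.5 * (d + np.append(d[0], d[:-1]))
    o = d + 0.5 * (e + np.append(e[1:], e[-1]))
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = e, o
    return x

x = np.random.rand(32)
assert np.allclose(spline_lifting_inverse(*spline_lifting_forward(x)), x)
```

Higher-order spline predictors such as those of Tables 4 and 5 replace the two-tap average by longer (possibly recursive) filters, but the predict/update structure is the same.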

In this context, Averbuch et al. [6] propose a new family of biorthogonal symmetric wavelet transforms using discrete and polynomial splines for the design of filters for the predict and update operations in lifting schemes. They explore various combinations of filters for prediction and update steps in the lifting methodology to obtain different transforms as indicated in Table 4.

Table 4 Transfer function of different transforms

Boujelbene et al. [11] present new biorthogonal wavelet transforms using polynomial splines of different orders, as given in Table 5, performed in a lifting manner. They study the influence of the order of the filters on image compression in order to choose the best wavelet transform in terms of reconstructed image quality. In addition, they compare the performance of the proposed transforms, first with the biorthogonal B9/7 transform (we denote by B9/7 the 9/7 Daubechies transform), which is frequently used in image compression, and second with the existing B-spline-based transforms.

Table 5 Transfer function of different transforms

In [12], the authors evaluate the different biorthogonal spline wavelet transforms proposed in [11] within an improved algorithm, using different test images and several performance measures. They thus justify the choice of the optimal transform presented in [11].

3 Comparative study between different DWTs

To assess the performance of the above-mentioned transforms, numerical experiments have been performed on a number of standard images of size 512 × 512 with 8 bits per pixel (bpp). In this paper, only the results related to the Lena and Barbara images, given in Fig. 6, are presented.

Fig. 6 Test images

PSNR, compression ratio and encoding time were used as performance criteria to determine the most efficient transform.
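For reference, the PSNR of an 8-bit image is PSNR = 10 log10(255²/MSE) in dB, and the rate is reported in bits per pixel; a minimal sketch of both measures:

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE)."""
    diff = original.astype(float) - reconstructed.astype(float)
    mse = np.mean(diff ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def bit_rate(num_coded_bits, num_pixels):
    """Bit rate in bits per pixel (bpp); 8 bpp is the uncompressed rate."""
    return num_coded_bits / num_pixels
```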

We describe here a comparative analysis of the PS5 transform, first with the commonly used B9/7 transform [18] and second with the existing spline-based transforms such as P1U1 and P3U3, which have been proven to be the most efficient [6].

The compression ratios and PSNR values are shown in Tables 6 and 7.

Table 6 Performance comparison of the different transforms in PSNR (dB) at various compression ratios(bpp) for the Lena image
Table 7 Performance comparison of the different transforms in PSNR (dB) at various compression ratios(bpp) for the Barbara image

From Tables 6 and 7, it is observed that the transform PS5 [11] generated by polynomial splines of order 5:

  • has a better PSNR for almost all compression ratios and for all test images. In fact, this transform outperforms the biorthogonal B9/7 which is frequently used in image compression and especially in the JPEG2000 standard [69].

  • gives a better PSNR as compared to the two most efficient spline transforms (P1U1 and P3U3) [6]. In fact, the properties of the PS5 transform such as the number of vanishing moments and the regularity of the scaling functions, are better than those for the existing efficient transforms [11, 12].

To highlight the performance of the PS5 transform in terms of computing time and complexity, we will now make a comparison between this scheme using the PS5 transform and the existing approaches using the spline wavelet transform and the biorthogonal B9/7 transform.

The encoding time of the proposed scheme in [12] using PS5 and the existing ones is shown in Fig. 7. One can see that the scheme using PS5 is faster than the existing codecs.

Fig. 7 Encoding time (s) performance of the different codecs using PS5 [12], B9/7 [18], P1U1 [6] and P3U3 [6] at various compression ratios

Finally, we conclude from the results that the PS5 transform, generated from the polynomial spline of order 5 and performed in a lifting scheme, provides the best PSNR values and encoding times for the different test images compared to the other spline wavelet transforms. Additionally, PS5 outperforms the B9/7 transform, which is the most commonly used and efficient transform in image compression.

4 Overview of wavelet-based coding schemes

A variety of wavelet-based image compression schemes have been developed over the past few years, and many attempts to enhance the coders' features and to reduce their limitations have been suggested in the literature. In this section, we discuss different well-known algorithms, namely EZW, SPIHT, EBCOT and SPECK, and describe the different improved versions of these algorithms.

4.1 Embedded zero tree wavelet (EZW)

In 1993, Jerome Shapiro proposed EZW [58], one of the first powerful wavelet-based image compression algorithms. The EZW encoder is based on progressive coding, compressing an image into a bit stream with increasing accuracy. Progressive encoding is also known as embedded encoding, which explains the E in EZW. The basic idea behind EZW is to form a tree structure whose root is located in the lowest-frequency subband after the DWT is applied to the image. During the coding process of EZW, as illustrated in Fig. 8, each coefficient is compared to a predefined threshold T.

Fig. 8 The EZW process of encoding a wavelet coefficient

All the coefficients are scanned in the order shown in Fig. 9. This guarantees that when a node is visited, all its parents have already been scanned. During the dominant and subordinate passes, the process starts at the lowest-frequency sub-band. In the dominant pass, each coefficient visited in the scan is classified as positive significant (POS), negative significant (NEG), isolated zero (IZ), or zero tree root (ZTR). A zero tree root is a coefficient that is insignificant and whose descendants are all insignificant; in this situation, the complete tree can be coded using a single symbol (ZTR) to achieve compression. An isolated zero is a coefficient that is insignificant but has at least one significant descendant. In other words, when a coefficient's magnitude is larger than the threshold, the coefficient is considered significant and is represented by either POS or NEG depending on whether its value is positive or negative. In the subordinate pass, the significant coefficients (POS or NEG) are refined by an additional bit of precision. When all the coefficients have been refined, the threshold T is halved and the coding process is repeated. The encoder stops when the desired bit rate is reached. The encoded symbol stream, which contains a combination of the symbols POS, NEG, IZ, ZTR and the refinement bits ('1' or '0'), is then arithmetically coded for transmission.

Fig. 9 Scanning order
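A minimal sketch of the dominant-pass classification just described (our illustration; the tree traversal, the subordinate pass and the threshold halving are omitted). The initial threshold is commonly taken as the largest power of two not exceeding the largest coefficient magnitude:

```python
import numpy as np

def classify(coeff, descendants, T):
    """Dominant-pass classification of one coefficient (Fig. 8):
    POS/NEG if significant against T; otherwise ZTR when every
    descendant is also insignificant, else IZ."""
    if abs(coeff) >= T:
        return 'POS' if coeff >= 0 else 'NEG'
    if all(abs(c) < T for c in descendants):
        return 'ZTR'
    return 'IZ'

coeffs = np.array([34, -31, 23, 14, -9, 7, 3, -1])
T = 2 ** int(np.floor(np.log2(np.max(np.abs(coeffs)))))  # T = 32
print(T, classify(coeffs[0], coeffs[1:], T))              # 32 POS
```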

4.2 Set-partitioning in hierarchical trees (SPIHT)

The SPIHT encoder proposed by Said and Pearlman [56] is a highly refined version of the EZW algorithm. It is based on the same concepts: progressive coding and the use of hierarchical dependencies between the coefficients of the different sub-bands.

However, a new dependency protocol between the coefficients is defined. The coefficients of the lowest-frequency sub-band (LL2) are grouped into fours. For each group, the offspring are as follows: one of the four coefficients (colored black in Fig. 10) has no descendants, while each of the other three (in blue) has four descendants. In the other sub-bands, as in EZW, each coefficient has four descendants.

Fig. 10 Parent/offspring dependencies in a spatial orientation tree

Three sets of coordinates are used:

  • O(i,j): the set of coordinates of all the offspring of node (i,j) (children only)

  • D(i,j): the set of coordinates of all the descendants of node (i,j)

  • L(i,j): D(i,j) − O(i,j) (all descendants except the offspring).

Unlike the EZW algorithm, which uses only one set describing the significance of the coefficients, SPIHT uses three lists: the list of significant pixels (LSP), the list of insignificant pixels (LIP), and the list of insignificant sets (LIS). SPIHT adopts a set-partitioning approach in which the significance test result is binary instead of the symbols used in EZW. A coefficient is encoded as significant, Sn(i,j) = 1, if its value is larger than or equal to the threshold T; if the coefficient value is smaller than T, it is coded as insignificant, Sn(i,j) = 0.

Similarly to EZW, there are two coding passes in the SPIHT algorithm: the sorting pass and the refinement pass. In the sorting pass, a significance test is performed on the LIP; the elements found to be significant are moved to the LSP. A significance test is then performed on the sets in the LIS: when a set is found to be significant, it is removed from the LIS and partitioned into four single elements and a new subset. The newly-formed subset is added back to the LIS, while the four new elements are moved to the LSP or the LIP according to their significance.

In the refinement pass, every coefficient in the LSP, except those newly added during the sorting pass, is refined by an additional bit of precision. Finally, the threshold T is halved and the coding process is repeated until the target bit rate is met or all the coefficients are coded.
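The following toy sketch (ours) illustrates the two passes on a flat list of coefficients; the set-partitioning machinery (the LIS and the spatial orientation trees of Fig. 10) is deliberately omitted, so this shows the LIP/LSP mechanics rather than a full SPIHT implementation:

```python
import numpy as np

def toy_spiht(coeffs, num_planes):
    """Toy bit-plane coder illustrating SPIHT's sorting and refinement
    passes on a flat coefficient list (LIS / trees omitted)."""
    coeffs = np.asarray(coeffs)
    n = int(np.floor(np.log2(np.max(np.abs(coeffs)))))  # top bit plane
    LIP, LSP, bits = list(range(len(coeffs))), [], []
    for _ in range(num_planes):
        newly = []
        for i in list(LIP):                      # sorting pass
            sig = int(abs(coeffs[i]) >= 2 ** n)  # S_n(i) test
            bits.append(sig)
            if sig:
                bits.append(int(coeffs[i] < 0))  # sign bit
                LIP.remove(i)
                newly.append(i)
        for i in LSP:                            # refinement pass
            bits.append((int(abs(coeffs[i])) >> n) & 1)
        LSP += newly                             # refine these next plane
        n -= 1                                   # halve the threshold
    return bits

print(toy_spiht([34, -31, 23, 14, -9, 7, 3, -1], 3))
```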

4.3 Set partitioned embedded block coder (SPECK)

The SPECK encoding technique [36] differs from EZW and SPIHT in that it does not use trees that span and exploit the similarity across different sub-bands. Rather, it makes use of sets in the form of blocks.

The wavelet coefficients are initially separated into two sets (regions). The set denoted S can be of varying dimensions, depending on the dimensions of the original image and the subband level of the pyramidal structure at which the set lies.

The other type of set, denoted I, is obtained by chopping off a small square region from the top-left portion of a larger region.

Figures 11 and 12 show a partitioning of an image into the sets S and I.

Fig. 11 a Partitioning of image X into sets S and I, b Partitioning of set S

Fig. 12 Partitioning of set I

Two linked lists are maintained: the List of Insignificant Sets (LIS) and the List of Significant Pixels (LSP). The former contains sets of type S of varying sizes that have not yet been found significant against a threshold T, while the LSP contains the pixels that have been found significant against T.

If a set I is significant, the pixels that cause I to be significant most probably lie in its top-left regions. These regions are decomposed into sets of type S and are put next in line for processing.

In this manner, regions that are likely to have significant pixels are grouped into smaller sets and processed first, while the other regions that are likely to have insignificant pixels are grouped into a large set. The decoder uses the same mechanism as the encoder. It reconstructs the image progressively as the algorithm proceeds. The SPECK algorithm usually gives results similar to those obtained by SPIHT [31].
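The two primitives that SPECK repeats, testing a block against the threshold and quadrisecting a significant S set as in Fig. 11b, can be sketched as follows (an illustration under our own naming, not the authors' code):

```python
import numpy as np

def significant(block, T):
    """SPECK significance test of a rectangular set against T."""
    return np.max(np.abs(block)) >= T

def quadrisect(r0, c0, h, w):
    """Partition a significant S set (top-left corner (r0, c0), size
    h x w) into four quadrants, as in Fig. 11b."""
    h2, w2 = h // 2, w // 2
    return [(r0,      c0,      h2,     w2),
            (r0,      c0 + w2, h2,     w - w2),
            (r0 + h2, c0,      h - h2, w2),
            (r0 + h2, c0 + w2, h - h2, w - w2)]
```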

4.4 Embedded block coding with optimized truncation (EBCOT)

The EBCOT encoder [63], adopted for the JPEG2000 standard, does not belong to the family of zero-tree compression schemes.

Its principle is summarized in Fig. 13. The image can be processed as a whole or divided into rectangular 'tiles' (of size 16 × 16, 32 × 32 or 64 × 64), which are then coded independently. The tiles (or the whole image) are decomposed into sub-bands over several levels. Scalar quantization is applied to the different sub-bands to reduce the precision of the coefficients assumed to be non-significant. Each sub-band is then cut into small independent blocks called 'code blocks', which are coded progressively, bit plane by bit plane, by an arithmetic encoder.

Fig. 13 Principle of EBCOT coding

4.5 Recent improvements of wavelet-based coding schemes

In the last few years, several attempts to overcome the coders' shortcomings have been suggested in the literature, for instance [2, 4, 10, 13, 14, 30, 32, 38, 39, 42, 46, 55, 66, 67]. These improvements generally concern the EZW and SPIHT algorithms, given their simplicity of coding compared to the other algorithms.

4.5.1 Modifications in EZW

The EZW algorithm was one of the first powerful wavelet-based image compression algorithms. It is a remarkably effective, fast and computationally simple technique for producing an embedded bit stream. However, it still has several disadvantages, so various improvements to EZW, in terms of PSNR and compression ratio, have been proposed over the years.

Ouafi et al. [46] propose a modification of Shapiro's Embedded Zerotree Wavelet (EZW) algorithm. This new approach, called Modified EZW (MEZW), distributes the entropy among six symbols instead of the four used in EZW, and the generated symbols are grouped in binary form before entropy coding.

The difference between the MEZW and EZW algorithms lies in the significance test applied to the wavelet coefficients and in the coding procedure used for the significance symbols. If a coefficient is tested and found to be significant, its descendants must also be tested. If all the descendants are judged insignificant, the coefficient is coded according to the MEZW coding rules, using the symbol Pt for positive coefficients and Nt for negative ones. The coding process of MEZW is shown in Fig. 14.

Fig. 14 Principle significance test of the coefficients for the MEZW algorithm

Ahmed et al. [2] exploited the main classification rules used in EZW and in the Modified EZW (MEZW) presented in [46]. The new modification, called NMEZW (New Modified EZW), distributes the entropy differently from EZW and MEZW by using eight symbols instead of the four used in EZW and the six used in MEZW. In addition, it optimizes the coding by a binary grouping of elements before coding, an additional pass also implemented in MEZW. The NMEZW approach adds two new symbols compared to MEZW: Zero Pair (Zp) and Zero Triple (Zt), for two and three consecutive adjacent zerotree-classified siblings, children of parents classified as (P), (N) or (Z).

In the NMEZW algorithm, the total number of symbols is reduced before they are regrouped.

Another algorithm, proposed by Che et al. [14], describes a new image compression method based on decreasing the number of coding bits, using the concepts of compression cells and compression operators for runs of consecutive Z or IZ symbols. In this algorithm, the wavelet coefficients are first coded using the symbols P, N and Z (P: positive and greater than or equal to the threshold; N: negative with magnitude greater than or equal to the threshold; Z: magnitude less than the threshold) (see Fig. 15). The symbols are then encoded in an order that reduces the number of coding bits. The symbols P and N are coded with 2 bits.

Fig. 15 Algorithm coding process proposed in [14]

For the symbol Z, the coding is as follows: when the number of consecutive Z symbols is smaller than 5, each Z is coded separately with 2 bits; otherwise, when the number of consecutive Z symbols is greater than 5, the run is transformed into a compression cell, as sketched after Fig. 16. A block diagram of this algorithm is shown in Fig. 16.

Fig. 16 Image compression process
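This bit-saving rule can be sketched as a simple run-length pass over the symbol stream; since the exact bit layout of a compression cell is not reproduced in [14] here, the sketch below (ours) emits a placeholder token for each cell:

```python
def code_symbols(symbols):
    """Sketch of the rule above: P/N and short runs of Z cost 2 bits
    each; a run of more than five consecutive Z symbols is replaced by
    a single compression cell (emitted here as a placeholder token)."""
    out, i = [], 0
    while i < len(symbols):
        if symbols[i] == 'Z':
            j = i
            while j < len(symbols) and symbols[j] == 'Z':
                j += 1
            run = j - i
            if run > 5:
                out.append(('CELL', run))   # one compression cell
            else:
                out.extend(['Z'] * run)     # 2 bits each
            i = j
        else:
            out.append(symbols[i])          # P or N: 2 bits each
            i += 1
    return out

print(code_symbols(['P', 'Z', 'Z', 'N'] + ['Z'] * 8))
# ['P', 'Z', 'Z', 'N', ('CELL', 8)]
```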

Another algorithm, proposed by Bhokare et al. [10], presents a new image compression approach called E-EZW (Enhanced EZW), based on the principle of the EZW algorithm.

The performance of the EZW algorithm is limited in terms of compression gain by the large number of isolated zero (IZ) symbols, whose encoding consumes a large number of bits. To overcome this limitation, the E-EZW algorithm was proposed [10].

This algorithm is based on the novel concept of a sparse-tree (ST) encoding scheme, which provides an efficient encoding of 'IZ' symbols and can give a significant improvement in compression gain.

In the proposed E-EZW algorithm, for each wavelet coefficient at the root level, a spatial hierarchical tree called a 'sparse tree' (ST) is formed, similar to a zerotree. In ST encoding, the 'IZ' symbol is replaced by the symbol 'ST', while the other symbols ('PS', 'NS', 'ZTR') are the same as in EZW encoding. The process of encoding the relative positions of significant coefficients (SCs) in the ST bit stream is shown in Fig. 17.

Fig. 17 Encoding of a subtree in the ST of the E-EZW algorithm proposed in [10]


Hazarathaiah et al. [30] present a new study of image compression based on the principle of Shapiro’s EZW.

In this paper, the EZW algorithm is modified in three ways. First, the HH band of the wavelet decomposition is neglected. Second, the three insignificant bands (HH, HL and LH) are ignored. Third, two more symbols are used in addition to the four symbols used in EZW. These three schemes are used with the biorthogonal wavelet and with the lifting-based wavelet decomposition.

Liu et al. [42] propose a new coding algorithm with weighted sub-block trees and an adaptive coding order, based on the EZW coder, in order to improve performance on edges. First, the authors assign bigger weights to edge sub-block trees. Then, they preferentially scan the neighbors of previously significant wavelet coefficients, and all of these neighbors are refined even when they are not significant.

Brahimi et al. [13] propose an efficient image coding method based on the EZW coder (IMP1EZW). This algorithm uses a new significance-symbol map, represented in a more efficient way, in order to reduce the number of zerotrees, the scanning and symbol redundancy of the original EZW, and to speed up the coding process. Moreover, the authors extend IMP1EZW to embedded colour image coding by taking advantage of the interdependency of the colour components. The proposed algorithms have shown good results for both greyscale and colour image coding at any given bit rate for all test images.

4.5.2 Modifications in SPIHT

The SPIHT algorithm has many advantages: it is a fully embedded codec, provides good image quality and a high PSNR, and is suited to progressive image transmission. Another advantage is that it produces a very compact output bit stream with large bit variation, so no additional entropy coding is required. Nevertheless, it still has several disadvantages, which is why various improvements to SPIHT, in terms of redundancy, quality, compression ratio, etc., have been proposed over the years.

Wang et al. [67] present an algorithm that achieves a better compression ratio and hence a better PSNR. The paper investigates the problem of quantizing the wavelet coefficients of the lowest-frequency subband with a multi-scalar method, and a new quantization implementation of lossy image compression using the SPIHT algorithm is proposed. In the higher bit planes, this algorithm quantizes only the wavelet coefficients of the lowest-frequency subband; it later quantizes the others by a uniform scalar quantizer.

Ke-kun [38] proposes an improved SPIHT algorithm (ASPIHT) to achieve high compression on image edges. This algorithm scans first the coefficients surrounded by more significant coefficients: the coefficients are sorted according to the number of significant surrounding coefficients before being coded, and previously significant coefficients are refined as soon as the sets around which there are significant coefficients have been scanned.

Another algorithm, proposed by Ke-kun [39], is an improved SPIHT algorithm based on a binary tree (TSPIHT). In this work, the four coefficients split from D-type sets are coded by a binary tree. By coding the significance of L-type sets first, the algorithm can determine the significance of the root of the binary tree in advance with high probability, thereby improving the coding efficiency.

Huang et al. [32] propose a scan-based method of binary tree coding with an adaptive scanning order (BTCA), which offers quality, position and resolution scalability. In BTCA, if a block is significant, it is divided into two sub-blocks instead of four. Moreover, an adaptive scanning order is used to scan the neighbors of significant coefficients in advance.

A paper by Wang et al. [66] deals with the redundancy of the output bit stream of the SPIHT algorithm. In order to improve its performance, a new algorithm (MSPIHT-ICA), based on the same principle as SPIHT, is proposed: despite the simplicity and efficiency of SPIHT, the algorithm is modified by adding a new judgment for type sets. MSPIHT-ICA effectively optimizes the compression of the algorithm's output bit stream.

Then, Athmane [4] presents a new image compression algorithm based on the same principle as SPIHT. This algorithm, called MSPIHT (Modified SPIHT), reduces the total number of output bits, so several coefficients can be encoded using a single bit. The modification lies in the insignificance test applied to the direct descendants of the third resolution: if these coefficients are tested and all found insignificant, the bit '0' is added to DO(i,j) (DO: the set of direct descendants of the last resolution) and the four insignificant descendants are coded by a single bit instead of four as in SPIHT. If the coefficients are not all insignificant (one or more significant), the bit '1' is added to DO(i,j) and each coefficient is then coded separately.

Rema et al. [55] propose a modification of the conventional SPIHT for monochrome and color images. A new way of reordering the spatial orientation tree of SPIHT ensures that the algorithm codes more significant information in the initial bits, allowing better performance at very low bit rates. For color images, an altered parent-offspring relationship and an additional DWT decomposition level are used.

5 Performance analysis of different coding schemes

We present in this section a qualitative and quantitative analysis of the coding schemes mentioned in the previous section. We summarize the advantages and shortcomings of each of these algorithms, and we perform a comparative analysis of these techniques in terms of PSNR and compression ratio.

5.1 Qualitative analysis

The success of wavelet-based coding schemes is due to the advent of effective subband coders [19]. EZW was the first to provide remarkable rate-distortion performance while allowing progressive decoding of the image [58]. The zerotree principle takes account of the dependency between the coefficients: once the significant and insignificant coefficients are separated, the remaining coefficients are relatively independent and can be efficiently encoded using quantization techniques and entropy coding [36, 56, 58, 63].

Successive encoders such as SPIHT and SPECK improved on the principle of EZW by proposing a more efficient coding of the significance information, which encodes the positions of the significant coefficients [16, 59].

SPIHT is a fast, computationally simple and yet efficient image compression algorithm. It improves on the performance of the EZW algorithm even without arithmetic coding [29, 52, 61]. Although the EBCOT algorithm gives higher compression efficiency than SPIHT, it is the most computationally intensive part of JPEG2000, accounting for more than 50 percent of its computational complexity, and it requires additional memory allocation [54]. All these wavelet-based image coders allow progressive decoding [37].

Among the progressive encoders, EZW, SPIHT and SPECK assume that the energy of the wavelet coefficients decreases from coarse scales to fine scales and that significant coefficients probably have significant descendants. The coefficients are coded in a scanning order defined by the algorithm and fixed for all images. The EBCOT encoder, however, creates independent progressive bit streams [24]: it can encode each block of 32 × 32 coefficients independently. By recording the truncation points of each bit stream and the corresponding distortions, rate-distortion optimization becomes feasible afterwards.

In order to overcome the coders' shortcomings mentioned above, some improvements to the EZW and SPIHT algorithms have been proposed in the literature, given their simplicity of coding compared to the other algorithms.

The performance of EZW is limited in terms of compression gain by the large number of isolated zero symbols; consequently, a large number of bits are consumed in encoding this symbol. The E-EZW algorithm solves this problem by using the novel concept of sparse-tree encoding [10]. This approach gives an efficient encoding of isolated zero symbols and can provide a significant improvement in compression gain.

Other approaches are based on the use of six or eight symbols instead of the four used in the original EZW, which reduces the total number of symbols. However, the majority of these techniques increase the total number of bits [2, 13, 46].

In order to reduce the total number of bits, the method proposed in [14] decreases the number of coding bits by using the concept of compression cells.

In order to improve the performance of the SPIHT algorithm, many other coding schemes have been proposed to achieve high compression on image edges and to improve coding efficiency. Although these approaches offer quality, position and resolution scalability, no significant objective quality improvement over the original SPIHT algorithm has been reported [32, 38, 39].

Various features and demerits of the different coding techniques are summarized in Table 8.

Table 8 Performance and demerits of various coding techniques

5.2 Quantitative analysis

The compression ratios and PSNR values obtained with the EZW, SPIHT, SPECK and JPEG2000 algorithms on the Lena and Barbara images of size 512 × 512 are shown in Table 9 and Fig. 18.

Table 9 Results of the various algorithms applied to Lena and Barbara
Fig. 18 Comparison of the different coding algorithms applied to the Lena and Barbara images

For the Lena image, in the majority of cases, the SPIHT algorithm outperforms the EZW and SPECK algorithms by an average of 0.5 dB. SPIHT also shows superior performance at all compression ratios compared to JPEG2000.

For the Barbara image, the PSNR values obtained by the SPECK algorithm are better than those obtained by the EZW and SPIHT algorithms, but lower than those obtained by JPEG2000.

Although EZW does not produce higher PSNR values than the other algorithms, as observed in Table 9 and Fig. 18, it can produce perceptually superior images, especially at high compression rates.

The performance of EZW and its improvements in terms of PSNR for the Lena and Barbara images is shown in Table 10 for the practical compression ratios of 0.25, 0.5 and 1 bpp.

Table 10 PSNR performance for the Lena and Barbara images

Table 10 shows that the PSNR results obtained by MEZW, E-EZW and NMEZW for the two test images are better than those obtained by the EZW algorithm in most cases. As can be seen in Fig. 19, which is plotted from the results of Table 10, the MEZW and E-EZW algorithms outperform the EZW algorithm, with a significant PSNR improvement of around 0.5 − 2.0 dB for the Lena and Barbara images.

Fig. 19 Comparison of different compression methods applied to the Lena and Barbara images

For the NMEZW algorithm, the rate-distortion performance shows a consistently higher PSNR, around 2.0 − 4.0 dB at lower bit rates and around 0.5 − 2.0 dB over the medium and high rates of 0.5 − 1.0 bpp, for the Lena and Barbara images.

Then, in Fig. 20, we compare the perceptual quality of the reconstructed Barbara image.

Fig. 20 Results for the Barbara image reconstructed using the algorithms: a EZW, b MEZW, c E-EZW, d NMEZW, for CR = 0.25 bpp

It can be seen that the texture of the tablecloth and the scarf is better revealed with the NMEZW algorithm.

Therefore, E-EZW is the improved algorithm that gives the best trade-off between the different performance criteria for all test images.

Table 11 shows the compression ratios and PSNR values obtained with the different improved SPIHT algorithms for all test images.

Table 11 Results of the various algorithms applied to the Lena and Barbara images

From Table 11, it is observed that the TSPIHT, ASPIHT and BTCA algorithms have slightly better PSNRs than the original SPIHT algorithm for all compression ratios and all test images. Furthermore, Fig. 21 compares the different compression methods applied to the two test images in terms of PSNR. The ASPIHT, TSPIHT and BTCA algorithms outperform the SPIHT algorithm, with a PSNR improvement of around 0.01 − 0.25 dB for the Lena and Barbara images.

Fig. 21 Comparison of the different compression methods applied to the Lena and Barbara images

Then, the perceptual quality of the Lena image reconstructed by the four algorithms (SPIHT, ASPIHT, TSPIHT and BTCA) is shown in Fig. 22. The images reconstructed by these four algorithms are perceptually identical.

Fig. 22 Results for the Lena image reconstructed using the algorithms: a SPIHT, b ASPIHT, c TSPIHT, d BTCA, for CR = 0.125 bpp

Hence, despite the advantages of the SPIHT algorithm, the performance of the new versions of SPIHT, in terms of both subjective and objective criteria, has not improved much compared to the original version.

The comparison in terms of image quality of the different improved algorithms is given by the PSNR curves in Figs. 23 and 24. By comparing the different PSNR values, we clearly see the effectiveness of the majority of the improved EZW algorithms compared to the improved SPIHT algorithms in terms of compressed image quality for all test images.

Fig. 23 Comparison of the different improved compression methods applied to the Lena image

Fig. 24 Comparison of the different improved compression methods applied to the Barbara image

To conclude, the simulation results demonstrate that the improved EZW algorithms outperform the existing EZW and SPIHT algorithms as well as the improved SPIHT algorithms. EZW is therefore the best algorithm on which to base further improvements yielding better performance in terms of PSNR and compression ratio.

6 Suggestions of designing an efficient codec for wavelet image compression

We present here the optimal wavelet image compression scheme yielded by the above study; to the best of our knowledge, we are the first to present this scheme. Our study focused on the transformation and coding steps of image compression and on the recent improvements of each step, and a detailed comparative study of the different improvements was carried out. This allows us to suggest a new powerful wavelet image compression scheme, presented in this section.

We chose the spline wavelet transform as the biorthogonal wavelet transform, given its efficiency compared to the other transforms. Specifically, the biorthogonal transform generated by polynomial splines of fifth order was chosen, because it has good properties and gives good performance in terms of PSNR and encoding time. In fact, the symmetric compactly-supported spline wavelets turn out to be quite popular for coding applications in image compression [6, 7]. Also, the wavelet generated by the polynomial spline of fifth order has six vanishing moments, and its synthesis scaling functions have four continuous derivatives. Hence, as shown in [7, 11], a trade-off between the different properties is the key to achieving better compression performance. All these points justify the choice of the fifth-order polynomial spline wavelet transform [11, 12]. This transform is performed in a lifting manner because the lifting scheme has many advantages: it minimizes the number of operations as well as the memory occupation, and it is a refined system of very short filters that achieves important computational and memory savings.

In order to further improve the performance of the biorthogonal wavelet transform based on the fifth-order polynomial spline, it is possible to propose a modified structure of this optimal transform that provides an efficient representation of edges and textures in various directions.

Furthermore, from the above study of the different wavelet algorithms, we chose the improved version of EZW as the coding scheme, given its simplicity of coding and its performance in terms of visual quality compared to the others. In order to further raise the performance of EZW, an improved algorithm can be proposed in the future. The block diagram of this optimal codec is shown in Fig. 25. It consists of two closely-connected components.

Fig. 25 Block diagram for the optimal codec

7 Conclusion

We presented in this paper an overview of the wavelet transform (WT), covering the advantages of the DWT over other transforms and the advantage of the lifting-based wavelet implementation compared to the convolution-based WT. We also described the applicability of biorthogonal wavelet transforms using polynomial splines of different orders to still image compression.

In addition, we described the functionality of the different wavelet-based image compression techniques such as EZW, SPIHT, SPECK and EBCOT, and provided the advantages, shortcomings and areas of usage of each compression algorithm.

A detailed overview of the recent modifications on the original EZW and SPIHT algorithms was given.

A comparative analysis of the various improvements on different images was carried out based on the following parameters: CR, PSNR, visual quality and encoding time.

From the results, it is concluded that the PS5 transform, generated by the fifth-order polynomial spline, outperforms the other transforms frequently used in image compression. Also, all the improvements made to the EZW algorithm give better PSNRs than the original EZW algorithm; in addition, almost all of these improvements outperform those made to the SPIHT algorithm.

Consequently, in the future, it would be worthwhile to work on the EZW algorithm to further improve its performance, and to implement a modified structure of the optimal PS5 transform based on directional lifting for image coding.