Introduction

An image is a two-dimensional representation of discrete signal samples [1]. The image captured by a camera is initially in analog form and has to be converted into a digital format for efficient processing, storage, and transmission. With recent technological advances, many smart computing devices use a camera as part of their applications to capture an image at any instant and share it directly over the Internet. Transmitting images demands extra bandwidth, as an image contains far more information than a simple text document, and storing large volumes of image data requires correspondingly more space. Therefore, it is critical to minimize an image’s size before storing or transmitting it, without degrading its quality [2]. Extensive research has been carried out in the field of image compression, and several techniques have been proposed to compress images effectively by removing redundant bits.

The basic compression scheme is shown in Fig. 1: the raw image is compressed with a suitable compression algorithm before being stored to disk or memory. The original image can be restored by applying the corresponding decompression technique [3].

Fig. 1 Block diagram of the basic compression scheme

Types of Compression Techniques

Image compression techniques can be broadly divided into two categories, namely (I) lossy image compression and (II) lossless image compression.

Lossy Compression Techniques

Lossy compression, as the name implies, is a technique that does not return the exact original image after decompression. The compressed image loses some pixel information, but the loss is often imperceptible to the viewer. Since the original data cannot be rebuilt through decompression, this approach is not appropriate for textual data, but it can be applied to video, audio, images, etc. Lossy compression methods achieve higher compression ratios than lossless techniques [4]. The most popular lossy compression techniques are discussed in the following.

Transform Coding

Transform coding is the most commonly used lossy compression technique. In this approach, the image is transformed from one domain to another, typically from the spatial domain to the frequency domain, and the transformed coefficients are coded by exploiting the interpixel correlation. The most frequently used transform coding algorithms are discussed in the following.

Discrete Cosine Transform (DCT)

The discrete cosine transform is a widely used transform derived from the discrete Fourier transform (DFT); it represents the image as a sum of cosine functions of varying magnitude and frequency. The DCT forms the basis of the JPEG compression standard. In this technique, the input image is divided into 8 × 8 or 16 × 16 blocks, and the 2D DCT coefficients of each block are computed. These coefficients are then quantized, coded, and transmitted. At the receiver end, decompression is performed by applying the inverse 2D DCT to each block and reassembling the blocks into a single image. Many of the high-frequency coefficients have values near zero and can be discarded without noticeably degrading the quality of the output image [5]. The modified discrete cosine transform (MDCT) is an upgraded version of the DCT used in audio compression; it does not retain phase information of the signal and eliminates the aliasing caused by the overlap of adjacent segments [6].
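As a concrete illustration, the following sketch performs block-wise DCT compression in Python, assuming SciPy is available; the 8 × 8 block size and the number of retained coefficients are illustrative choices, not the exact JPEG quantization procedure.

```python
# Minimal block-wise DCT compression sketch (illustrative, not JPEG-exact).
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block, keep=10):
    """Keep only the `keep` largest-magnitude DCT coefficients of an 8x8 block."""
    coeffs = dctn(block, norm='ortho')          # 2D DCT-II of the block
    thresh = np.sort(np.abs(coeffs).ravel())[-keep]
    coeffs[np.abs(coeffs) < thresh] = 0         # discard the small, near-zero coefficients
    return coeffs

def decompress_block(coeffs):
    return idctn(coeffs, norm='ortho')          # inverse 2D DCT reconstructs the block

# Usage on one 8x8 block of a grayscale image
block = np.random.randint(0, 256, (8, 8)).astype(float)
reconstructed = decompress_block(compress_block(block))
```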

Discrete Wavelet Transform (DWT)

The discrete wavelet transform uses the concept of wavelets, which are irregular signals localized in both time and scale. The input image is decomposed into wavelet coefficients grouped into ‘sub-bands’, and thresholding is applied by comparing each coefficient with a selected threshold value [7]. The output of this stage is then quantized to remove floating-point values and coded using entropy encoding. At the receiver end, the inverse DWT is applied to retrieve the image.
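A minimal thresholding sketch along these lines is shown below, assuming the PyWavelets (pywt) package; the wavelet family, decomposition level, and threshold value are illustrative choices.

```python
# Minimal DWT thresholding sketch using PyWavelets (illustrative parameters).
import numpy as np
import pywt

def dwt_compress(image, wavelet='db2', level=2, threshold=10.0):
    # Decompose the image into approximation and detail sub-bands
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    # Hard-threshold the detail coefficients; small values are set to zero
    thresholded = [coeffs[0]] + [
        tuple(pywt.threshold(c, threshold, mode='hard') for c in detail)
        for detail in coeffs[1:]
    ]
    # Inverse DWT reconstructs an approximation of the original image
    return pywt.waverec2(thresholded, wavelet)

image = np.random.rand(64, 64) * 255
reconstructed = dwt_compress(image)
```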

The coefficients generated by the DWT are generally floating-point values, and rounding them off during image reconstruction degrades image quality. To overcome this disadvantage, the integer wavelet transform (IWT) was proposed; it uses an efficient lifting scheme that maps integers to integers and allows perfect reconstruction of the coefficients. Because not all the sub-bands generated by the DWT are effectively used, the sub-band replacement discrete wavelet transform (SR-DWT) was designed to exploit the unused sub-bands. This technique replaces the horizontal sub-band with the Cb component and the vertical sub-band with the Cr component to obtain a textured image in compressed format [8].

Two notable wavelet-based coders, the embedded zerotree wavelet (EZW) and set partitioning in hierarchical trees (SPIHT), are derived from a tree concept. EZW uses an iteratively decreasing threshold value, which enables the user to view progressively finer details in the compressed image, whereas SPIHT is a popular DWT-based technique with a high PSNR value and is therefore used in the compression of hyperspectral images. It applies a hierarchical quadtree structure to the transformed pixel values, in which each node has either no children or four children [9].

Combining the Haar transform with the traditional DWT gives rise to the Haar wavelet transform (HWT). This transform produces two sets of coefficients, namely approximation coefficients and detail coefficients, generated by low-pass and high-pass filtering, respectively [10]. In this technique, the transformation is carried out along the rows as well as the columns for three iterations, followed by quantization.

Karhunen–Loève Transform (KLT)

The KL transform is best suited for images with a limited number of object types. The spectral coefficients are derived from the statistical correlation between the image pixels: the transform finds the eigenvectors and eigenvalues of the covariance matrix, from which the energy of the coefficients is derived. These coefficients are then reduced, followed by nonlinear quantization. The KLT provides higher de-correlation than other transform techniques, but it depends strongly on the signal statistics. Therefore, the KL transform is usually combined with other transforms for better results.
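The following sketch derives a KLT basis from the covariance of flattened image blocks via eigendecomposition; the 8 × 8 block size and the number of retained eigenvectors are illustrative assumptions.

```python
# Minimal KLT sketch on flattened 8x8 image blocks (illustrative parameters).
import numpy as np

def klt_basis(blocks, keep=16):
    """blocks: (num_blocks, 64) array of flattened 8x8 blocks."""
    centered = blocks - blocks.mean(axis=0)
    cov = np.cov(centered, rowvar=False)         # 64x64 covariance matrix of the blocks
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigendecomposition (ascending eigenvalues)
    order = np.argsort(eigvals)[::-1]            # sort eigenvectors by decreasing energy
    return eigvecs[:, order[:keep]]              # keep the top `keep` eigenvectors

def klt_forward(blocks, basis):
    return (blocks - blocks.mean(axis=0)) @ basis   # project blocks onto the KLT basis

blocks = np.random.rand(500, 64)                 # e.g. flattened 8x8 blocks of an image
basis = klt_basis(blocks)
coeffs = klt_forward(blocks, basis)              # decorrelated coefficients to quantize and encode
```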

Image adaptive transform (IAT) uses a combination of KLT and DCT algorithms, where KLT is used to derive the transform kernels and DCT is employed to provide a statistical grouping of similar image blocks [11].

K-Means Algorithm

K-means clustering is a very simple algorithm applied to color images; it quantizes the colors present in an image, thereby compressing it. The dataset is divided into K clusters, and each cluster is represented by its centroid [12]. Each pixel of a color image is represented by three bytes, one each for red, green, and blue (RGB), with intensity values ranging from 0 to 255; the total number of possible colors is therefore 256 × 256 × 256 = 16,777,216. The human eye cannot distinguish this many colors in an image. The K-means algorithm exploits this and represents the image with only a few colors, thereby reducing its size. A value of K is chosen as the number of colors used to form the clusters. The clusters are initialized using a similarity metric, followed by alternating assignment and update steps that assign each pixel to a cluster and recompute the centroids. The iteration continues until no pixel changes its assigned centroid. As K is always less than 256, the number of colors used to represent the image is reduced, which in turn minimizes the size of the image.
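A minimal color-quantization sketch in this spirit, assuming scikit-learn is available; K = 16 is an illustrative choice.

```python
# Minimal K-means color-quantization sketch (illustrative K and image).
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(rgb_image, k=16):
    h, w, _ = rgb_image.shape
    pixels = rgb_image.reshape(-1, 3).astype(float)   # one row per pixel (R, G, B)
    km = KMeans(n_clusters=k, n_init=10).fit(pixels)
    palette = km.cluster_centers_.astype(np.uint8)    # the K representative colors
    labels = km.labels_                               # cluster index per pixel (needs only log2(K) bits)
    return palette[labels].reshape(h, w, 3)           # image rebuilt from only K colors

image = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
quantized = quantize_colors(image)
```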

Chroma Subsampling

Chroma subsampling is based on the fact that the human eye perceives changes in brightness more acutely than changes in color. Compression is achieved by sampling the color (chroma) components at a lower rate than the brightness (luma) component [13]. The subsampling scheme is usually expressed as a three-part ratio J:a:b, where J denotes the width of the reference region in pixels (usually 4, describing a region four pixels wide and two pixels high), ‘a’ denotes the number of chrominance samples in the first row of J pixels, and ‘b’ denotes the number of changes in chrominance samples between the first and second rows.

4:2:2

In this scheme, two of the four pixels in each row carry chroma information, i.e., sampling is carried out on two pixels in both the top and bottom rows. The bandwidth is reduced by one-third, with only 50 percent of the chroma information retained. It is used in high-end digital video formats including AVC-Intra 100, Digital Betacam, Betacam SX, etc. The 4:2:2 sampling can be further refined to enhance the quality of the reconstructed RGB image by dynamically adjusting the values of the U and V components in the transformed YUV image [14].

4:1:1

In this scheme, one of the four pixels in each row carries chroma information, i.e., sampling is carried out on one pixel in the top row and one in the bottom row. The bandwidth is halved, with only 25 percent of the chroma information retained. It finds application in DVCPRO, NTSC DV, and DVCAM.

4:2:0

In this subsampling technique, only two pixels from the top row carry chroma information, and none from the bottom row do; the pixels of the bottom row therefore reuse the chroma data of the top row. As with 4:1:1, the bandwidth is halved, with only 25 percent of the chroma information retained. This scheme finds application in AVCHD, AVC-Intra 50, and the Apple Intermediate Codec, and it is also commonly used in JPEG/JFIF and MJPEG implementations.
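The following NumPy sketch illustrates 4:2:0 subsampling by keeping one chroma sample per 2 × 2 block; averaging each block (rather than simply dropping samples) and nearest-neighbour upsampling are illustrative choices.

```python
# Minimal 4:2:0 chroma subsampling sketch: full-resolution luma, quarter-resolution chroma.
import numpy as np

def subsample_420(y, cb, cr):
    # Average each 2x2 chroma block down to a single sample
    cb_sub = cb.reshape(cb.shape[0] // 2, 2, cb.shape[1] // 2, 2).mean(axis=(1, 3))
    cr_sub = cr.reshape(cr.shape[0] // 2, 2, cr.shape[1] // 2, 2).mean(axis=(1, 3))
    return y, cb_sub, cr_sub                      # chroma planes now hold 25% of the samples

def upsample_420(y, cb_sub, cr_sub):
    # Nearest-neighbour upsampling restores the original plane size for display
    return y, np.kron(cb_sub, np.ones((2, 2))), np.kron(cr_sub, np.ones((2, 2)))

y = np.random.rand(8, 8); cb = np.random.rand(8, 8); cr = np.random.rand(8, 8)
y2, cb2, cr2 = upsample_420(*subsample_420(y, cb, cr))
```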

K. L. Chung et al. proposed a technique that combines traditional chroma subsampling with a distortion-minimization-based luma modification process to achieve high quality in digital time-delay integration (DTDI) images containing two primary colors and Bayer mosaic images containing one primary color [13].

Fractal Compression

Fractal compression applies the concept of fractals to digital images [15]. This technique is commonly used for natural images and is based on the observation that parts of an image often resemble other parts of the same image. A challenge lies in calculating the compression ratio of a fractal scheme, since the image can be decoded at any scale.

Fast Sparse Fractal Image Compression (FSFIC)

This technique is based on the absolute value of Pearson’s correlation coefficient, which speeds up the encoding, and a sparse searching strategy that improves the quality of the reconstructed image. The input image is partitioned into range blocks and domain blocks, which are shrunk to a uniform size. The shrunken domain blocks are classified using Fisher’s three-class method and sorted against an offline-trained preset block, after which a search algorithm finds the best matching pair, which is finally encoded using an orthogonal matching pursuit (OMP) solution.

Optimization-Based Fractal Compression Techniques

R. Menassel et al. proposed a fractal compression technique based on the wolf pack algorithm (WPA), in which the entire image is considered the search space in which the scouting wolves are placed. These wolves explore the space to find the similarity between small sets of blocks based on certain parameters [16]. The blocks with the best fitness are chosen, and the process ends after a fixed number of iterations or when no improved solution is found. Particle swarm optimization (PSO)- and flower pollination algorithm (FPA)-based fractal compression is used to improve the encoding process and provide optimal search solutions with a better compression ratio and PSNR value [17]. The krill herd algorithm (KHA) is also used in fractal compression to reduce the time taken by the encoding process [18]. The KHA is based on the behavior of krill searching for food and recalling past locations of food.

Lossless Compression Techniques

Lossless compression enables the user to obtain an output image that is an exact replica of the input image after decompression. It compresses the image by encoding all the information in the original image without discarding any of it. The lossless approach is preferred for medical imaging, technical drawings, satellite images, etc.

Run-Length Coding

Run-length coding (RLC) is the simplest and most commonly used lossless compression technique [19]. RLC algorithms are well suited to monochrome images that contain large areas of contiguous pixels of the same color. The algorithm searches the image for runs of pixels with the same value and encodes each run by its length. In some instances, when runs are short, the encoding can actually expand the size of the file. Several graphics formats, such as TIFF, PCX, and BMP, use RLC variants. Run-length encoding replaces a sequence of consecutive identical characters with a single (character, length) pair; for example, the string 111112222333111 is represented as (1, 5) (2, 4) (3, 3) (1, 3).
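A minimal (character, length) run-length codec for the example above might look as follows.

```python
# Minimal run-length encode/decode sketch for a string of symbols.
def rle_encode(data):
    runs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:   # extend the current run
            j += 1
        runs.append((data[i], j - i))                 # store (symbol, run length)
        i = j
    return runs

def rle_decode(runs):
    return ''.join(symbol * length for symbol, length in runs)

print(rle_encode("111112222333111"))                                    # [('1', 5), ('2', 4), ('3', 3), ('1', 3)]
print(rle_decode(rle_encode("111112222333111")) == "111112222333111")   # True: lossless round trip
```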

Improved Versions of RLC

There are various refinements of the original RLC. The improved run-length coding (I-RLC) proposed by S. Anantha Babu et al. converts the input image into a large matrix, which is further divided into non-overlapping small matrix blocks, thereby achieving a high compression ratio [20]. An efficient run-length coding for grayscale images (E-RLC) presented by S. A. Babu et al. delivered high-quality reconstructed images by dividing the image into different block sizes [21]. The size-expansion issue of RLC has been rectified by an extension developed by A. Birajdar et al., in which the size of an image after encoding never exceeds its original size [22].

RLC for Data and Speech Compression

RLC can be used in fields other than image compression. Its use in data compression improves bandwidth utilization: A. Amin et al. integrated the concept of bit stuffing with traditional RLC to reduce the number of bits used to represent the length of each run, which shortens the encoded sequence and improves compression and flexibility [23]. M. Arif et al. applied run-length encoding to speech signals of several languages; because the compression is lossless, the reconstructed speech signal is identical to the original [24].

Entropy Encoding

Entropy encoding is usually performed on the quantized data; it represents the symbols using close to the minimum number of bits, assigning shorter codes to more frequent symbols. The most commonly used entropy encoding techniques are Huffman coding and arithmetic coding.

Huffman Coding

Huffman coding is a lossless compression technique developed by David Huffman in 1952. Huffman encoding is based on the probability of occurrence of each symbol in an image or file and assigns variable-length codes to the input characters, so the length of a code depends on how often the character is repeated [25]. It follows two steps: the first builds a Huffman tree up to its root node, and the second backtracks the tree, assigning ‘0’s and ‘1’s along its paths. For the input sequence ‘AAAABBBCC’, the character ‘A’ is repeated more often than ‘B’ and ‘C’, so ‘A’ is represented with the fewest bits; similarly, the code for ‘B’ is shorter than that for ‘C’. This results in fewer bits in the final compressed file, which reduces the file size considerably.
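A minimal Huffman-coding sketch using a heap-based priority queue is given below; the tie-breaking counter and the printed example are illustrative details.

```python
# Minimal Huffman coding sketch: build the tree from symbol frequencies,
# then prefix '0'/'1' while merging to obtain the codes.
import heapq
from collections import Counter

def huffman_codes(text):
    # Each heap entry: [frequency, tiebreak id, [symbol, code], [symbol, code], ...]
    heap = [[freq, i, [sym, ""]] for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)                  # two least-frequent subtrees
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = '0' + pair[1]               # left branch gets a leading 0
        for pair in hi[2:]:
            pair[1] = '1' + pair[1]               # right branch gets a leading 1
        heapq.heappush(heap, [lo[0] + hi[0], counter] + lo[2:] + hi[2:])
        counter += 1
    return {sym: code for sym, code in heap[0][2:]}

codes = huffman_codes("AAAABBBCC")
print(codes)                                      # e.g. {'A': '0', 'C': '10', 'B': '11'}
print(''.join(codes[c] for c in "AAAABBBCC"))     # the compressed bitstring
```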

Arithmetic Coding

Arithmetic coding is another entropy coding technique; it differs from Huffman coding in that it encodes the whole message as a single number. The basic idea is to assign each symbol an interval within [0, 1) according to its probability of occurrence. Each interval is divided into subintervals, and the subinterval of the current symbol becomes the interval for the next symbol. This step is repeated, narrowing the probability range with each symbol. The final range is sufficient to reconstruct the original sequence [26].
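The interval-narrowing idea can be sketched with exact fractions as follows; deriving the symbol probabilities from the message itself is an illustrative simplification.

```python
# Minimal arithmetic-encoding sketch: narrow [0, 1) symbol by symbol and
# return one number inside the final interval.
from fractions import Fraction
from collections import Counter

def arithmetic_encode(message):
    counts = Counter(message)
    total = len(message)
    # Cumulative probability range [p_low, p_high) for each symbol
    ranges, cum = {}, Fraction(0)
    for sym in sorted(counts):
        p = Fraction(counts[sym], total)
        ranges[sym] = (cum, cum + p)
        cum += p
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        span = high - low
        sym_low, sym_high = ranges[sym]
        high = low + span * sym_high              # shrink the interval to the
        low = low + span * sym_low                # sub-interval of this symbol
    return (low + high) / 2, ranges               # any value in [low, high) identifies the message

value, ranges = arithmetic_encode("AABAC")
print(float(value))                               # a single number encoding the whole string
```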

Other Variants of Entropy Encoding

M. A. Kabir et al. proposed an edge-based transformation and entropy coding (ETEC) scheme for pixelated images to reduce the image size without any loss in content. This involves finding the intensity difference of neighboring pixels in the horizontal or vertical direction. Two matrices with absolute intensity and polarity of differences are formed, and then, Huffman or arithmetic entropy coding is applied on the generated matrices [27].

F. Mentzer et al. presented a practical learned lossless image compression system called L3C, which is an enhanced version of adaptive entropy coding. It uses learned auxiliary representations and requires only three forward passes to predict all pixel probabilities, thereby increasing speed [28].

The encoding of bits to improve entropy coding can be aided by two elegant algorithms, the move-to-front (MTF) transform and the Burrows–Wheeler transform (BWT) [29]. The MTF transform is a reversible transform that maps a sequence of input characters to an array of output numbers. The alphabet ‘a’ to ‘z’ is initially numbered 0 to 25 according to position; each time a character is encountered, its current position is output and the character is moved to the front of the list. The entire string is read in this way to obtain the encoded sequence. If the output contains many small integers such as 0, 1, and 2, it indicates runs of repeated characters in the string. The performance of the MTF transform can be improved by first applying the BWT, a block-sorting algorithm that permutes the order of the characters: all cyclic rotations of the string are formed and sorted lexicographically, and the last column of the sorted rotations is the BWT output.
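Minimal sketches of both transforms are shown below; the ‘$’ end-of-string sentinel used for the BWT is a common convention assumed here.

```python
# Minimal MTF and BWT sketches (the '$' sentinel makes the BWT invertible).
def mtf_encode(text, alphabet=None):
    symbols = list(alphabet or sorted(set(text)))
    out = []
    for ch in text:
        idx = symbols.index(ch)                   # current position of the symbol
        out.append(idx)
        symbols.insert(0, symbols.pop(idx))       # move the symbol to the front
    return out

def bwt(text):
    text += '$'                                   # sentinel marking the original end of string
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return ''.join(rot[-1] for rot in rotations)  # last column of the sorted rotations

transformed = bwt("banana")                       # 'annb$aa': repeated characters are grouped
print(transformed, mtf_encode(transformed))       # MTF then yields many small integers
```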

Dictionary-Based Compression

In dictionary-based compression methods, variable-length strings of symbols are replaced with short, preferably fixed-length codewords [30]. Compression is achieved by replacing long strings with shorter codewords. The dictionary holding the strings of symbols can be static or dynamic: a static dictionary is fixed in advance (at most permitting additions), whereas a dynamic dictionary allows both additions and deletions as new input is read.

Lempel–Ziv Algorithms

The Lempel–Ziv algorithms are a family that evolved from two algorithms proposed by Jacob Ziv and Abraham Lempel in their seminal papers of 1977 and 1978. They are widely used in compression utilities such as gzip, in GIF image compression, and in the V.42bis modem standard.

  • LZ77:

    LZ77 is used when the repetition of words and phrases is likely to occur. LZ77 uses direct pointers to the preceding text. It is a simple technique that contains a dictionary of past encoded sequences. The input sequence is examined through a sliding window for the longest match, and the output is obtained through a reference pointer [31]. There are several variations of the LZ77 scheme available which include LZSS, LZH, and LZB that are used for different applications.

  • LZ78:

    LZ78 uses pointers into a separate dictionary. An identical dictionary is built on both the encoder and decoder sides. When no further match is found after examining the longest match, the corresponding index and symbol pair is added to the dictionary. Hence, the LZ78 algorithm updates and grows the dictionary whenever a new match is found [31]. An extension of LZ78, called LZW, was presented by Terry Welch in 1984. This technique guarantees that a match is always found, since the dictionary is initialized with all possible single-byte entries (ASCII codes 0 to 255).
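A minimal LZW compression sketch along these lines:

```python
# Minimal LZW compression sketch: the dictionary starts with all single
# characters (codes 0-255) and grows as new phrases are encountered.
def lzw_compress(data):
    dictionary = {chr(i): i for i in range(256)}  # initial single-character entries
    next_code = 256
    phrase, output = "", []
    for ch in data:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate                    # keep extending the longest match
        else:
            output.append(dictionary[phrase])     # emit the code of the longest match
            dictionary[candidate] = next_code     # add the new phrase to the dictionary
            next_code += 1
            phrase = ch
    if phrase:
        output.append(dictionary[phrase])
    return output

print(lzw_compress("TOBEORNOTTOBEORTOBEORNOT"))   # repeated phrases map to single codes
```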

Dictionary-based compression methods find their application in text document compression wherein Y. Guo et al. have developed a Bayesian framework with dictionary learning using cost function to encode the restored image [32]. M. Ignatosk et al. used the LZW algorithm for compressing text of eight different languages [33]. It is found that LZW is best suited for long text files with repetitive strings, thereby providing a good compression ratio within a short duration of time utilizing minimum resources.

Deflate

Deflate combines LZ77 and Huffman coding. The deflate compressor provides considerable flexibility: a large chunk of data is broken into ‘blocks,’ and each block can use its own mode of compression. The algorithm searches for repeated strings within the data and replaces them with back-references to reduce the overall size of the file.
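Python’s built-in zlib module implements DEFLATE, so a minimal usage sketch looks as follows; compression level 9 is an illustrative choice trading speed for ratio.

```python
# Minimal DEFLATE usage sketch via the standard-library zlib module.
import zlib

raw = b"ABABABABABABABAB" * 64                    # highly repetitive input compresses well
compressed = zlib.compress(raw, level=9)          # LZ77 matching + Huffman coding
restored = zlib.decompress(compressed)

print(len(raw), "->", len(compressed), "bytes")
assert restored == raw                            # lossless: the original bytes are recovered
```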

The deflate algorithm is used to meet a range of compression requirements. Y. Zu et al. focused on parallelizing the deflate algorithm on GPUs to improve compression speed [34]. The deflate compressor developed by M. Ledwon et al. targets field-programmable gate arrays (FPGAs) for hardware acceleration, to speed up computation-intensive applications [35].

Predictive Coding

A predictor transforms the two-dimensional dependence in the original data by removing the redundancy among pixels. Only the new information in each pixel is extracted and transformed into a form that can be handled by coding techniques for one-dimensional data [36]. The new information of a pixel is computed as the difference between its actual and predicted values. The suitability of a predictive coding scheme is usually judged by its compression ratio and the running time of the algorithm.

Median Edge Detector (MED)

The median edge detector forms the basis of the JPEG-LS scheme. In the MED predictor, the image is scanned in raster order and a predicted value is generated for each pixel; the difference between the original and predicted values (the residual) is then encoded [37]. The prediction is the median of the neighboring values N, W, and W + N − NW (where N denotes the north and W the west neighbor of the current pixel), which makes the predictor both simple and efficient.
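A minimal sketch of the MED prediction rule, which selects min(W, N), max(W, N), or W + N − NW depending on the north-west neighbor and is equivalent to taking the median of the three candidates.

```python
# Minimal MED (median edge detector) predictor sketch over a grayscale image.
import numpy as np

def med_predict(image):
    h, w = image.shape
    img = image.astype(int)
    pred = np.zeros_like(img)
    for i in range(1, h):
        for j in range(1, w):
            W, N, NW = img[i, j - 1], img[i - 1, j], img[i - 1, j - 1]
            if NW >= max(W, N):
                pred[i, j] = min(W, N)            # likely a horizontal/vertical edge
            elif NW <= min(W, N):
                pred[i, j] = max(W, N)
            else:
                pred[i, j] = W + N - NW           # smooth region: planar prediction
    residual = img - pred                         # small residuals are then entropy coded
    return pred, residual

image = np.random.randint(0, 256, (16, 16))
pred, residual = med_predict(image)
```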

Gradient Adjusted Predictor (GAP)

Gradient adjusted predictor works based on the combination of predefined threshold values and gradient estimation of the current local pixel. It can classify edges as soft, hard, and simple, thereby providing different weights to neighboring pixels and removing the redundant part of the image [37].

Gradient Edge Detector (GED)

Gradient edge detector eliminates the disadvantages of MED and GAP prediction schemes [37]. The main difference here is the usage of a single threshold value which can be fixed or variable. The prediction model is based on MED and the estimation of the gradient is based on GAP, thus making the prediction model simple and efficient.

Adaptive Linear Prediction Coding (ALPC)

This technique is based on entropy encoding and pixel prediction. The algorithm scans the input image in all four directions and predicts the intensity value of each pixel based on the weighted sum of its neighboring pixels [38]. The context used is of fixed shape, and only the weights can be updated during the compression process.

Chain Codes

Chain codes are mostly applied to monochrome and bi-level images. A chain code encodes each blob or connected component of an image separately. The coordinates along the boundary of the object are assigned a code digit for the transition from each point to the next, with each code representing the direction in which the next point on the connected line lies. The most popular chain codes used in image compression are described below.

Freeman Chain Code of Eight Directions (FCCE)

The first approach for representing digital boundaries with chain codes was introduced by Freeman in 1961. The scheme is based on the connectivity of segments, which can be 4 or 8, and the code is generated from the direction of movement, represented as numbers. The boundary code generated from these numbers is called the Freeman chain code. FCCE follows the contour in the counterclockwise direction, emitting one code as it moves from one contour pixel to the next [39].
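A minimal sketch that converts a sequence of 8-connected boundary coordinates into Freeman codes; the (row, column) convention and the sample boundary are illustrative assumptions.

```python
# Minimal Freeman 8-direction chain code sketch.
# Direction numbering: 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE (rows grow downward).
DIRECTIONS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
              (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def freeman_chain_code(boundary):
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        codes.append(DIRECTIONS[(r1 - r0, c1 - c0)])   # code of the step between neighbours
    return codes

# A small square contour traversed counterclockwise
boundary = [(2, 2), (2, 3), (1, 3), (1, 2), (2, 2)]
print(freeman_chain_code(boundary))                    # [0, 2, 4, 6]
```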

Vertex Chain Code (VCC)

This boundary chain code is cell-based rather than pixel-based. Its elements are based on the number of cell vertices that touch the bounding contour of a shape, whose cells may be triangular, rectangular, or hexagonal [40]. It uses these three symbols to represent the contour and relates the chain length to the contact perimeter. Y. K. Liu et al. proposed a compressed vertex chain code (CVCC) that is more efficient than traditional VCC without increasing the average bits per code [41].

Three Orthogonal Symbol Chain Code (3OT)

These codes are suitable for compressing any two-dimensional closed binary shape by representing it with a chain code of three symbols, one for each orthogonal change of direction [42]. This low-cost method incurs no loss of information and requires less storage memory.

Unsigned Manhattan Chain Code (UMCC)

This newer chain code uses two code symbols to traverse the geometric shape over all boundary pixels without rotation or mirroring [43]. The technique supports shape magnification and isolates the monotonic parts of each geometric shape by moving along the two coordinate directions x and y using two flags. UMCC can detect the monotonic parts of a geometric shape and performs better than FCCE, VCC, and 3OT.

Chain codes based on biological behavior have also been developed, including the ant colonies chain code (ACCC), which is based on the pheromones released by ants during their search for food. The predator–prey system chain code (PPSCC) is based on the wolf–sheep predation model; here the technical solution is obtained by tracking the movement of a wolf toward its prey (sheep). The agent-based modeling chain code (ABMCC) is based on how beavers build boundaries to protect themselves from intruders.

Observation and Discussion

This paper provides a detailed survey of various lossy and lossless compression techniques. Transform-based lossy compression methods such as DCT, DWT, and KLT follow the pattern of generating coefficients, comparing them with a threshold value, and then quantizing and coding them. DCT performs well at moderate bit rates, whereas DWT performs well at lower bit rates. EZW-based wavelet compression is robust and can be used for medical image compression. SPIHT is a powerful compression algorithm based on partitioning of the decomposed wavelets; it attains a high PSNR value and is hence used for hyperspectral image compression. K-means is a simple clustering algorithm that uses only K colors to represent the image, thereby reducing its size. Chroma subsampling is used for high-definition (HD) and 4K-resolution images by reducing the number of rows carrying chroma information, whereas in fractal compression the challenge lies in choosing the right optimization technique to improve encoding speed.

Regarding the lossless compression techniques, run-length coding is very efficient but applicable only to images or text with repeated patterns. Arithmetic coding surpasses Huffman coding by using a single code to represent the entire string. Deflate performs better than plain dictionary-based coding as it employs multiple search mechanisms; it does not wait for a particular codeword to be found before searching for the next, so computation time is drastically improved. Predictive coding removes redundant information in an image by comparing the actual value of each pixel with its predicted value. Chain codes are a current state-of-the-art compression technique: the codes are generated by moving across the image from one point to another, and the shortest path can be chosen using the biological behavior of ants, bees, etc., thereby reducing the size of the image considerably.

Each technique has its unique property and can be used based on the area of application. Table 1 provides an overall comparison of the above discussed lossy and lossless compression schemes.

Table 1 Comparison summary of lossy and lossless compression techniques

Performance Metrics

Various performance metrics can be used to determine the quality of the decompressed image. The appropriate quality measure varies with the type of image and the technique used. The different performance parameters related to compression schemes are listed below; a short numerical sketch of several of them follows the list.

  • Compression ratio: It refers to the ratio of the original input file size (in bits) to the compressed output file size (in bits).

    Compression Ratio = Original Size/Compressed Size. A compression ratio of 1 (i.e., 100%) indicates that the compressed image is the same size as the original.

  • Compression Time: The time taken to compress and decompress the image must be taken into consideration; depending on the application, the compression time, the decompression time, or both may be critical.

  • Mean Square Error (MSE): The mean square error is the average squared difference between the original and the compressed image.

    $$\text{MSE} = \frac{1}{MN}\sum_{i = 1}^{M} \sum_{j = 1}^{N} \left[ X(i,j) - X'(i,j) \right]^{2}$$

    where \(X(i,j)\) denotes the original image, \(X'(i,j)\) denotes the decompressed image, and M, N represent the size of the image.

  • Peak Signal-to-Noise Ratio (PSNR): The PSNR measures the quality of the compressed image relative to the original and is expressed in decibels. A higher PSNR value indicates superior quality of the decompressed or reconstructed image.

    $$\text{PSNR} = 20\log_{10}\left( \frac{255}{\sqrt{\text{MSE}}} \right)$$
  • Normalized Cross-Correlation (NCC): Cross-correlation is similar to the convolution of two signals. Normalized cross-correlation measures the similarity between two images; its value equals 1 when the two images are identical, and it is invariant to brightness and contrast changes.

    $$\text{NCC} = \frac{\sum_{i = 1}^{M} \sum_{j = 1}^{N} X(i,j)\, X'(i,j)}{\sum_{i = 1}^{M} \sum_{j = 1}^{N} X(i,j)^{2}}.$$
  • Normalized Absolute Error (NAE): It measures the numeric deviation of the reconstructed image from the original, i.e., the normalized difference between the original and the processed image. The NAE of an image ranges from 0 to 1: a value close to 0 indicates high similarity between the original and decompressed images, while a value near 1 indicates low similarity and hence a low-quality image.

    $$\text{NAE} = \frac{\sum_{i = 1}^{M} \sum_{j = 1}^{N} \left| X(i,j) - X'(i,j) \right|}{\sum_{i = 1}^{M} \sum_{j = 1}^{N} \left| X(i,j) \right|}.$$
  • Average Difference (AD): The average difference reflects the noise content of the decompressed image. A lower AD value indicates a cleaner image with less noise, whereas a higher value indicates more noise. The formula for AD is as follows:

    $$\text{AD} = \frac{\sum_{i = 1}^{M} \sum_{j = 1}^{N} \left[ X(i,j) - X'(i,j) \right]}{MN}.$$
  • Maximum Difference (MD): MD is the maximum absolute difference between corresponding pixels of the original and decompressed images. A large MD value indicates poor image quality.

    $$\text{MD} = \max\left( \left| X(i,j) - X'(i,j) \right| \right).$$
  • Structural Content (SC): SC compares the two images using a small patch, derived using a 2D continuous wavelet, that is common to both images. As with AD, SC should be as low as possible to achieve high-quality images.

    $$\text{SC} = \frac{\sum_{i = 1}^{M} \sum_{j = 1}^{N} X(i,j)^{2}}{\sum_{i = 1}^{M} \sum_{j = 1}^{N} X'(i,j)^{2}}.$$
  • Structural Similarity Index (SSIM): The structural similarity index is a metric used to quantify the degradation of image quality caused by compression or transmission. It measures the structural differences between two similar images based on their visible structures.

    \(\text{SSIM}(X,X') = \frac{(2\mu_{X}\mu_{X'} + C_{1})(2\sigma_{XX'} + C_{2})}{(\mu_{X}^{2} + \mu_{X'}^{2} + C_{1})(\sigma_{X}^{2} + \sigma_{X'}^{2} + C_{2})}\), where X, X′ denote the original and decompressed images, \(\mu_{X}, \mu_{X'}\) denote their means, \(\sigma_{X}^{2}, \sigma_{X'}^{2}\) their variances, \(\sigma_{XX'}\) their covariance, and \(C_{1}, C_{2}\) are small constants that stabilize the division.
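As referenced above, the following NumPy sketch computes several of these metrics; X is the original image and Xr the decompressed image, both treated as float arrays.

```python
# Minimal NumPy sketch of the quality metrics defined above.
import numpy as np

def metrics(X, Xr):
    X, Xr = X.astype(float), Xr.astype(float)
    mse = np.mean((X - Xr) ** 2)                                   # mean square error
    psnr = 20 * np.log10(255.0 / np.sqrt(mse)) if mse > 0 else float('inf')
    ncc = np.sum(X * Xr) / np.sum(X ** 2)                          # 1 when the images are identical
    nae = np.sum(np.abs(X - Xr)) / np.sum(np.abs(X))               # close to 0 for a good reconstruction
    ad = np.mean(X - Xr)                                           # average difference
    md = np.max(np.abs(X - Xr))                                    # maximum difference
    sc = np.sum(X ** 2) / np.sum(Xr ** 2)                          # structural content
    return dict(MSE=mse, PSNR=psnr, NCC=ncc, NAE=nae, AD=ad, MD=md, SC=sc)

X = np.random.randint(0, 256, (64, 64))
Xr = np.clip(X + np.random.randint(-3, 4, X.shape), 0, 255)        # a slightly distorted copy
print(metrics(X, Xr))
```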