1 Introduction

Compression is a fundamental requirement for the efficient and effective management of modern signals, images, videos and other related data. It is primarily performed to eliminate unwanted or redundant components from data so as to reduce storage requirements and transmission time while enabling compact representation and optimal bandwidth utilization [1]. The rudiments of data compression are presented in Fig. 1 [2].

Fig. 1 Basic flow of image compression

The encoder applies a discrete transform to the original data to obtain transform coefficients that are subsequently quantized and entropy coded into a code stream (bit stream). The decoder reverses the operation of the encoder [3]. Video or image compression is classified as lossless or lossy. Lossless compression is conceptualized in Fig. 2 with a predictor that interfaces with a lossless encoder before releasing its final output to the entropy encoder. If the entropy-coded stream passes the lossless test, it is passed to the final stage as the compressed image; otherwise it is returned to the lossless encoder for further encoding. Lossless compression encodes all information from a still or motion image using Huffman, run-length, arithmetic, bit-plane, dictionary or any other encoding technique, and it is most suitable for archival, medical and other related images [4, 5].

Fig. 2 Lossless image compression
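
Run-length coding, one of the lossless encoders mentioned above, can be illustrated with a short sketch. The snippet below is a minimal, generic example (not code from any of the reviewed works) showing the lossless round trip on a small list of pixel values.

```python
from itertools import groupby

def rle_encode(symbols):
    """Minimal run-length encoder: one (symbol, run length) pair per run."""
    return [(s, len(list(g))) for s, g in groupby(symbols)]

def rle_decode(pairs):
    """Expand the (symbol, run length) pairs back to the original sequence."""
    return [s for s, n in pairs for _ in range(n)]

data = [0, 0, 0, 255, 255, 7, 7, 7, 7]
encoded = rle_encode(data)           # [(0, 3), (255, 2), (7, 4)]
assert rle_decode(encoded) == data   # lossless: the round trip is exact
```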

The basic flow of lossy compression is presented in Fig. 3. The source encoder encodes the input video and passes its output to the quantizer. The quantized image is subsequently passed to the entropy encoder for transformation into the final output. Lossy compression results in the loss of a small amount of information and is specifically suitable for compressing natural images in applications where a minor loss of fidelity is tolerable in exchange for a sizeable reduction in bit rate [6,7,8].

Fig. 3 Lossy image compression
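
The quantizer is the step at which information is actually discarded. The following minimal sketch (an illustrative example, not taken from any cited scheme) shows uniform scalar quantization of transform coefficients and the small, bounded reconstruction error it introduces.

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer index."""
    return np.round(np.asarray(coeffs) / step).astype(int)

def dequantize(indices, step):
    """Approximate reconstruction of the coefficients from their indices."""
    return np.asarray(indices) * step

coeffs = np.array([52.7, -3.1, 0.4, 14.9])
q = quantize(coeffs, step=4)     # -> [13, -1, 0, 4]
rec = dequantize(q, step=4)      # -> [52, -4, 0, 16]; error bounded by step/2
```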

Compression offers optimized and cost-effective representation, storage and transmission of still or motion images [9]. Historically, video coding standards such as the Joint Photographic Experts Group (JPEG), Moving Picture Experts Group (MPEG) and H.26x families have been widely explored, but the complex nature of their encoders and decoders makes effortless and swift compression and decompression difficult [10]. These conventional coding techniques require reassessment when adopted for video surveillance, telemedicine, space and satellite imaging, live broadcasting and other related applications. The escalating and multidimensional nature of present-day activities produces enormous video data volumes, which has heightened the demand for more flexible and result-oriented compression techniques.

Some of the established video compression techniques include Motion JPEG, Motion JPEG 2000, MPEG-1, MPEG-2, MPEG-4, H.261, H.263 and H.264. Motion JPEG is coded as a string of JPEG pictures displayed over time, with flexibility in terms of quality as well as compression ratio; however, its compression ratio is often lower than that of other video compression techniques. JPEG 2000 offers a slightly better compression ratio than JPEG but with greater complexity. The MPEG-1 video compression standard is based on the same technique as JPEG together with methods for efficient coding and for limiting bandwidth consumption. MPEG-2 is primarily targeted at attaining very high picture quality in television transmission and other applications capable of data rates of 4 Mbps and higher. It is compatible with MPEG-1 and supports interlaced video formats, increased image quality and other features for high-definition video transmission, but it is not suitable for real-time surveillance applications. The MPEG-4 video compression technique is useful for building applications with minimal bandwidth consumption and high picture quality; however, it suffers from high encoding complexity and a heavy computational load for motion estimation. H.261 is a DCT-based video coding standard targeted at video conferencing. It fully encodes certain video frames and encodes only the differences between others, with prediction, block transformation, quantization and entropy coding as its main elements. H.263 is a video coding technique for low bit-rate visual telephony in multimedia terminals; though structurally similar to and backward compatible with H.261, it offers superior picture quality. The H.264 video compression technique delivers optimal bit-rate reduction, tolerates transmission errors over various networks, supports low-latency operation and offers simpler implementation and exact-match decoding [1, 9].

Transform coding is an integral component of contemporary image/video processing applications and relies on the premise that pixels in an image exhibit a certain level of correlation with their neighboring pixels [11]. It is used to map the spatial (correlated) data into transformed (uncorrelated) coefficients [12]. Through transformation, an image/video is changed into a format with highly reduced inter-pixel redundancies based on a reversible, linear mathematical mapping of the pixel values onto a set of coefficients, which are then quantized and encoded [13]. The key success factor of video transformation is that, for most natural images, it produces coefficients of small magnitude that can be quantized with little distortion [14,15,16].
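
As an illustration of the decorrelation premise, the sketch below (a generic example with an assumed smooth test block, not taken from the cited works) applies an orthonormal 2-D DCT to a highly correlated 8x8 block and measures how much of the signal energy ends up in a handful of coefficients.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block, computed as C @ block @ C.T."""
    n = block.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)          # DC row normalization
    return c @ block @ c.T

# A smooth, highly correlated 8x8 block: a gentle horizontal intensity ramp
block = np.tile(np.linspace(100, 140, 8), (8, 1))
coeffs = dct2(block)

energy = (coeffs ** 2).flatten()
top4_share = np.sort(energy)[-4:].sum() / energy.sum()
print(f"energy captured by the 4 largest coefficients: {top4_share:.4f}")
```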

The majority of the existing work on video compression adopts a single-mode approach based on lossless coding, lossy coding, DCT, DWT, Huffman coding, frame skipping, run-length coding or some other individual technique. The single-mode approach suffers from several limitations, which include error of parallax, decrease in quality measurement, failure to handle blocking effects and artifacts, inapplicability to 3D and web-streaming applications, failure on images requiring full mathematical precision, and coarse quantization of the frequency coefficients that yields reconstructed images with poor edge quality. Other limitations include the inability to handle distortions arising from higher compression, abnormality arising from large pixel blocks, and false contouring due to poor quantization of the transform coefficients. The need to address these limitations underscores the importance of this study.

The proposed technique uses discrete transformation to achieve higher-scale compression of videos. The underlying difference between the new technique and existing ones is the mode of operation: while existing techniques adopt a single or serial mode of operation, the new technique adopts an integrative mode based on DCT and DWT. The objective is to establish a video compression technique that takes the storage, processing and transmission of videos to optimized levels. The new technique makes four significant contributions. First, it establishes that DCT and DWT can be combined for video compression. Second, it establishes a platform that outperforms some existing techniques in terms of compression ratio. Third, it establishes a platform for the compression of videos of varying formats with no adverse effect on their key attributes. Fourth, it establishes a video compression technique suitable for optimizing storage requirements as well as higher-speed processing and archiving. The higher compression ratio is attributed to the increase in the bit-per-pixel strength of the video, while the retention of the key attributes is achieved using the segregation and decomposition method. The reduction in storage requirements and the increases in the speed of processing, retrieval and archiving follow from the significant reduction in bit size. The research therefore addresses the hypothesis that non-compressed video is confronted with problems of high processing time, expensive memory requirements, slow query speed and inefficient load distribution, among others.

The dataset used in the study comprises twenty videos downloaded from YouTube between January and April 2020. The selected videos were chosen on the basis of quality, application and format multiplicity, which are key criteria for evaluating the performance of any video compression technique. The general results from the research include a platform for obtaining bit-optimized videos as well as bit-optimized versions of the experimental videos. Notably, the study establishes that the integration of DCT and DWT for video compression provides an advancement over some existing techniques. Section 2 presents the review of relevant literature, and Sects. 3 and 4 focus on the discrete cosine transform and the discrete wavelet transform respectively. The discrete transform technique, the case study of some selected videos and the conclusion drawn from the research are presented in Sects. 5, 6 and 7 respectively, while Sect. 8 presents the future research focus.

2 Synopsis of some related works

In [1], a compressive sensing model is presented for accurate acquisition and reconstruction of signals, images and video sequences. The model uses minimization and orthogonal matching and evaluates its reconstruction schemes with performance parameters such as compression ratio, peak signal-to-noise ratio (PSNR) and the structural similarity index. PSNR, however, depends on pixel intensity and the human eye's perception of similarity, which can lead to error of parallax. The authors in [4] present a DCT and DWT model for video compression. The model effectively reduces the number of bits required for quality video presentation but suffers a decrease in quality measurement owing to its lossy approach. In [17], a lossless run-length coding model for digital image compression is presented. The model supports compression of grayscale and JPEG images, with the data loss exerting minimal effect on image clarity. The implementation of the model demonstrated its usefulness for dividing an image into approximation and detail sub-signals prior to compression and established a scheme for lossless transformation; the model is, however, susceptible to blocking effects and artifacts. The authors in [18] present a DCT and DWT technique for video compression in which lossy compression is adopted for achieving good reconstructed video quality as well as mass screening of video images. The technique is, however, limited by its inability to handle distortions arising from higher compression, abnormality arising from large pixel blocks, and false contouring due to poor quantization of the transform coefficients. A model for lossy compression of poor-contrast images is presented in [19]. The model effectively uses DCT and DWT to suppress redundancies and achieve increased clarity, but it is unfit for the compression of images requiring full mathematical precision.

In [20], a model for video compression using DWT, DCT and Huffman encoding techniques is presented. The model uses the basic ideas behind these techniques to reduce the average number of bits per pixel for better representation and presentation. The model is, however, limited by coarse quantization of its frequency coefficients, which yields reconstructed images with poor edge quality. A vector quantization and k-means clustering model for video compression is presented in [21]. The model computes the difference between two input frames using motion compensation and estimation techniques and generates its compressed image using a DCT-DWT transform and arithmetic encoding, followed by arithmetic decoding for residual reconstruction and decompression. The model achieves an encouraging compression ratio with improved video quality based on the hybridization of DCT, but it is not applicable to 3D and web-streaming applications. The authors in [22] present a hybrid DWT, DCT and encoding model for biomedical video applications. The model relies on DCT to achieve lower computation and higher energy compaction, and it uses DWT and Huffman coding to achieve higher compression ratios and to represent a string of symbols with a smaller number of bits, respectively. The model is, however, susceptible to strong blocking artifacts under heavy compression and to false contouring arising from the distortion of smoothly graded areas.

A sensing model for medical image compression and reconstruction is presented in [23]. The model is anchored on non-linear programming and involves the marginalization of compressive sensing and the recovery of signals or images from far fewer samples than traditionally required; it also involves the selection of a continuous sub-group of coefficients with the largest energy. A model for mobile-application video decoding is presented in [24]. The model uses up-down sampling, frame skipping and DCT for high-scale video compression and web-based streaming with scalable vectors, but its results exhibit some inconsistencies. A rate-distortion-optimized motion estimation and quad-tree model for video compression is presented in [25]. The quad-tree algorithm was used to analyze the natural ecology and estimate the rate-distortion and motion optimization, and a block matching algorithm was applied to the natural ecology protection system to locate the video regions most prone to pollution; the model, however, underperforms in edge detection and compression. The authors in [26] combine some simple functions with DCT to establish a 2-D DCT platform for video compression in which compression proceeds via 8-by-8 transformation and retransformation. Results from the model were promising but showed noticeable differences between the original and reconstructed versions. A run-length encoding video compression model is presented in [27]. The model eliminates redundant data but is of no use for images with a limited number of runs. In [28], a DCT and DWT motion estimation and compensation model for video compression is presented. The model achieves sufficiently high compression ratios but suffers from computational complexity and diminishing performance on poor-quality videos. Table 1 presents the summary of the objectives, methods, advantages and disadvantages of the reviewed works.

Table 1 Summary of the objectives, methods, advantages, and disadvantages of the reviewed works

3 Discrete cosine transform (DCT)

The DCT of a 1-D sequence \({\boldsymbol{\rho}}\) of length \(\varepsilon \) is defined by the function \({\boldsymbol{\delta}}\), presented as follows:

$${\boldsymbol{\delta}}\left(l\right)=\beta \left(l\right){\sum }_{i=0}^{\varepsilon -1}{{\boldsymbol{\rho}}\left(i\right)}\mathrm{cos}\left(\frac{\left(2i+1\right)l\pi }{2\varepsilon }\right), $$
(1)
$${\boldsymbol{\rho}}\left({\boldsymbol{i}}\right)={\sum }_{l=0}^{\varepsilon -1}\beta \left(l\right){\boldsymbol{\delta}}\left({\boldsymbol{l}}\right)\mathrm{cos}\left(\frac{\left(2i+1\right)l\pi }{2\varepsilon }\right), $$
(2)
$$\beta \left(l\right)= \left\{\begin{array}{l}\sqrt{\frac{1}{\varepsilon }},\quad for\,\, l=0\\ \sqrt{\frac{2}{\varepsilon }},\quad for\,\, l=1, 2, \dots , \varepsilon -1\end{array}\right. , $$
(3)

\(\beta \left(l\right)\) is the normalization factor and Eq. (2) gives the inverse transform. The 2-D DCT, represented by \(\delta \), is a direct extension of the 1-D case and is given by:

$$\delta \left(\varphi , l\right)=\beta \left(\varphi \right)\beta \left(l\right)\sum_{s=0}^{\varepsilon -1}\sum_{i=0}^{\varepsilon -1}\rho \left(s,i\right)\mathrm{cos}\left(\frac{\left(2s+1\right)\varphi \pi }{2\varepsilon }\right)\mathrm{cos}\left(\frac{\left(2i+1\right)l\pi }{2\varepsilon }\right),$$
(4)
$$\rho (s,i)=\sum_{\varphi =0}^{\varepsilon -1}\sum_{l=0}^{\varepsilon -1}\beta \left(\varphi \right)\beta \left(l\right)\delta \left(\varphi ,l\right)\mathrm{cos}\left(\frac{\left(2s+1\right)\varphi \pi }{2\varepsilon }\right)\mathrm{cos}\left(\frac{\left(2i+1\right)l\pi }{2\varepsilon }\right),$$
(5)

\(\varphi \), \(l\) = 0, 1, …, \(\varepsilon -1\) and Eq. (5) gives the inverse transform. \(\varepsilon \) is the number of partitions (block size) defined on the input video \(\rho \), indexed by \(s\) and \(i\); \(\rho (s,i)\) is the intensity of the pixel in row \(s\) and column \(i\) of the image, and \(\delta (\varphi ,l)\) gives the DCT coefficient in row \(\varphi \) and column \(l\) of the DCT matrix. The corresponding 2-D DCT function is generated as the product of the horizontally oriented 1-D DCT functions and the vertically oriented set of the same functions. The functions for \(\varepsilon \) = 6 are shown in Fig. 4, which portrays a progressive increase in frequency in both the vertical and horizontal directions. As shown in Fig. 4, the number of horizontal bars increases from 2 in row 1 to 7 in row 6, while the number of vertical bars increases from 3 in column 1 to 8 in column 6.

Fig. 4 Two-dimensional DCT basis functions with \(\varepsilon \) = 6
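
For reference, the sketch below generates such 2-D basis images as outer products of 1-D cosine vectors; it is an illustrative construction (the function and variable names are mine, and the indices follow the 0-based convention used in Eqs. (1)-(5)).

```python
import numpy as np

def dct_basis_2d(n, u, v):
    """2-D DCT basis image (u, v): outer product of two 1-D cosine vectors.

    u controls the vertical frequency and v the horizontal frequency, so the
    basis images grow in frequency along both directions, as in Fig. 4.
    """
    i = np.arange(n)
    vertical = np.cos((2 * i + 1) * u * np.pi / (2 * n))
    horizontal = np.cos((2 * i + 1) * v * np.pi / (2 * n))
    return np.outer(vertical, horizontal)

# All 36 basis images for a 6x6 block (epsilon = 6)
bases = [[dct_basis_2d(6, u, v) for v in range(6)] for u in range(6)]
```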

4 Discrete wavelet transform (DWT)

The discrete wavelet transform is a multi-resolution transform technique used to achieve higher image or video compression ratios. It cascades a series of filtering stages in a manner that compensates for their respective weaknesses [28]. For a 2-D DWT, the input data are passed through a set of low-pass and high-pass filters along the row and column directions and the outputs are down-sampled by a factor of 2 in each direction, in a manner analogous to the 1-D DWT, as shown in Fig. 5. The output is a set of four coefficient sub-bands LL, HL, LH and HH, where the first and second letters refer to the row and column transforms respectively, L denotes a low-pass signal and H a high-pass signal. Thus, LL represents the low-pass signal in both the row and column directions, while HL denotes the high-pass signal in the row direction and the low-pass signal in the column direction.

Fig. 5 Block diagram of the 2-D forward DWT

The LH signal contains the horizontal elements, while the HL and HH signals contain the vertical and diagonal elements, respectively. A multi-resolution representation of the input data is obtained by further splitting the LL sub-band into multiple levels, as shown in Fig. 6. Reconstruction is performed by up-sampling by a factor of 2 and filtering with the high-pass and low-pass synthesis filters along both rows and columns.

Fig. 6 Splitting of the LL coefficient
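
A minimal sketch of a one-level 2-D DWT, using the simple Haar filter pair for brevity (the paper itself works with a 5/3 filter, introduced in Sect. 6), shows how the LL, HL, LH and HH sub-bands arise from row filtering followed by column filtering.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar DWT: returns the LL, HL, LH and HH sub-bands."""
    img = np.asarray(img, dtype=float)
    # Row direction: low-pass (sum) and high-pass (difference), down-sampled by 2
    lo_r = (img[:, 0::2] + img[:, 1::2]) / np.sqrt(2)
    hi_r = (img[:, 0::2] - img[:, 1::2]) / np.sqrt(2)
    # Column direction applied to each row-filtered output, down-sampled by 2
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / np.sqrt(2)   # low row, low column
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / np.sqrt(2)   # low row, high column
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / np.sqrt(2)   # high row, low column
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / np.sqrt(2)   # high row, high column
    return ll, hl, lh, hh

frame = np.random.randint(0, 256, size=(8, 8))
ll, hl, lh, hh = haar_dwt2(frame)   # each sub-band is 4x4; LL may be split again
```

Repeating the decomposition on the LL sub-band yields the multi-level structure sketched in Fig. 6.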

The polyphase matrix of the wavelet filter is factorized into a sequence of alternating upper and lower triangular matrices and a diagonal matrix, so that the filtering reduces to combined matrix multiplications. The DWT scheme comprises some processing set-up, an integer-to-integer wavelet transform (IWT), and symmetric forward and inverse transforms, among others. It is used to split the high-pass and low-pass filters into a cascade of upper and lower triangular matrices and to deliver the filter execution as combined matrix multiplications. A wavelet transform with a small number of taps in the high-pass and low-pass analysis filters, representing the shortest and least complex symmetric bi-orthogonal wavelet, is used to break an image into high- and low-frequency segments. The fundamental lifting-scheme low-pass (l) and high-pass (h) equations are presented as follows:

$$l=\left(2i+1\right)\left(\delta - 2\rho \right) ,$$
(6)
$$h=2i\left(3\delta - \rho \right),$$
(7)

\(i\), \(\delta \) and \(\rho \) represent the pixel index, the DCT coefficient and the pixel intensity, respectively.
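
For context, the widely used lifting implementation of the LeGall 5/3 wavelet (the reversible predict/update form used in JPEG 2000) is sketched below; it is a standard formulation given for illustration and is not claimed to be identical to Eqs. (6) and (7).

```python
import numpy as np

def legall53_forward(x):
    """One level of the reversible LeGall 5/3 lifting transform (1-D).

    Returns (approximation, detail) coefficient arrays. Symmetric boundary
    extension is used, and len(x) is assumed to be even.
    """
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()

    # Predict step: detail = odd - floor((left even + right even) / 2)
    right = np.append(even[1:], even[-1])          # symmetric extension
    d = odd - ((even + right) >> 1)

    # Update step: approx = even + floor((left detail + right detail + 2) / 4)
    left = np.insert(d[:-1], 0, d[0])              # symmetric extension
    s = even + ((left + d + 2) >> 2)
    return s, d

def legall53_inverse(s, d):
    """Invert the lifting steps to recover the original samples exactly."""
    left = np.insert(d[:-1], 0, d[0])
    even = s - ((left + d + 2) >> 2)
    right = np.append(even[1:], even[-1])
    odd = d + ((even + right) >> 1)
    x = np.empty(even.size + odd.size, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

# Round-trip check on a short integer signal
sig = np.array([10, 12, 15, 14, 9, 7, 8, 11])
s, d = legall53_forward(sig)
assert np.array_equal(legall53_inverse(s, d), sig)
```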

5 Discrete transform technique (DTT) for video compression

The methods proposed in the literature for video compression are noted for problems of error of parallax [1], reduction in quality measurement [4], blocking effects and artifacts [17, 22], failure to handle distortions, abnormalities and false contouring [18], coarse quantization of frequency coefficients [20], failure to handle 3D and web-streaming applications [21] and computational expensiveness [23, 28]. The need to address these problems is partly responsible for the formulation of the proposed DTT. The adoption of the new technique is also motivated by the need to answer some questions bordering on the impact of non-optimization of bits on video management vis-a-vis memory requirements, processing time, query servicing and load distribution, among others. As conceptualized in Fig. 7, DTT is a combination of the DCT and DWT techniques presented in Sect. 3 and Sect. 4 respectively. DCT is a fast orthogonal transform that is useful because it concentrates the signal into a matrix with a large number of (near-)zero values; its major limitation is the quantization step required to decide on integer-valued outputs. DWT on its own does not offer specific information on the output but provides a better way to detect any anomaly in the transformation. The motive behind the combination of DCT and DWT is to cancel out the weaknesses of each technique using the strengths of the other.

Fig. 7 Digital video compression techniques

The new technique performs image segregation using the 2-D frame of the model presented in Fig. 6, decomposes each segment with a 2-D DWT, quantizes the coefficients, eliminates the zeros and compresses each sub-band by arithmetic coding, as illustrated in Fig. 8. The quantization process takes the LL coefficients as input and forms a [0,1] element matrix, which is passed to the next stage where the zeros are discarded; the zero-free matrix is then passed to the final stage for arithmetic coding. The new technique also compresses the remaining sub-bands (HL, LH, HH) using T-Matrix coding in the order shown in Fig. 9. The discrete transform technique uses the 2-D DWT to decompose each component into coefficients that are then subjected to an 8-point DCT. For improved compression, the majority of the high-frequency coefficients are subsequently discarded through a JPEG-like quantization in which the higher-frequency components are scaled more coarsely. A 1-D DCT quantization is also used to obtain the array of each sub-band. The DWT is combined with the DCT to achieve a higher video compression ratio: the former reduces false contouring and artifact effects while the latter removes redundancies using frequency-based classification. A minimal sketch of this pipeline is given after Fig. 9.

Fig. 8 Quantization and elimination of zeros from each sub-band

Fig. 9 Sub-band compression matrix coding
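
The sketch below illustrates the DWT-then-DCT stage described above in a highly simplified form; it is illustrative code under stated assumptions (a Haar wavelet via PyWavelets, a single global quantization step, and no arithmetic or T-matrix entropy coder), not the authors' implementation.

```python
import numpy as np
import pywt                    # PyWavelets, assumed available
from scipy.fft import dctn     # multidimensional DCT-II

def dtt_like_compress(frame, q_step=8.0):
    """Illustrative DWT-then-DCT stage: decompose, transform, quantize, drop zeros."""
    # One-level 2-D DWT: approximation (LL) and horizontal/vertical/diagonal details
    ll, (lh, hl, hh) = pywt.dwt2(np.asarray(frame, dtype=float), "haar")
    compressed = {}
    for name, band in (("LL", ll), ("HL", hl), ("LH", lh), ("HH", hh)):
        coeffs = dctn(band, norm="ortho")              # 2-D DCT of the sub-band
        q = np.round(coeffs / q_step).astype(int)      # JPEG-like uniform quantization
        # Keep only the non-zero entries; entropy coding (arithmetic / T-matrix)
        # would be applied to this sparse list in the full technique.
        compressed[name] = [(idx, int(v)) for idx, v in np.ndenumerate(q) if v != 0]
    return compressed

frame = np.random.randint(0, 256, size=(16, 16))
sparse = dtt_like_compress(frame)
```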

6 Case study

A case study of some randomly selected online videos (shown in Fig. 10), downloaded from YouTube between January and April 2020, was carried out on a Pentium 3 CPU with 256 MB RAM and a 128 MB HDD running the Windows XP operating system. Visual Basic.NET was used to implement the programming logic, while an Access database provided the auxiliary storage for the information on the videos and the system. The experimental videos shown in Fig. 10 exhibit varying sizes, formats and attributes. Table 2 presents the pre- and post-compression attributes of all the experimental videos, while Fig. 11 presents a section of the post-compression report covering the video link, the pre- and post-compression sizes and the date of the operation. As shown in Table 2, the accumulated space released by the compression operations on all twenty videos is 1.11196 GB (4326.76 − 3214.80 MB), which is 25.70% of the total pre-compression size. The compression of all the videos also yielded an average bit-per-pixel value of 98 and an average compression time of 58.8 s. These results indicate good performance of the discrete transform technique and establish its ability to successfully compress videos of varying types and formats. They also show that the combination of DCT and DWT for video compression promotes storage conservation and management efficiency. The compressions were based on the low-pass (\(h\)) and high-pass (\(g\)) filter coefficients of the conventional 5/3 filter given as:

Fig. 10 Experimental videos

Table 2 Pre- and post-compression attributes of the experimental videos
Fig. 11 The post-compression report

Fig. 12 Plot of pre- and post-compression PSNR

Fig. 13 Plot of pre- and post-compression MSE

Fig. 14 Plot of pre- and post-compression MD

$$h:\left\{-0.125,\ 0.25,\ 0.75,\ 0.25,\ -0.125\right\}$$
$$g:\left\{-0.25,\ 1,\ -0.25\right\}$$

Based on these coefficients, the filter outputs \({f}_{1}\), \({f}_{2}\) and \({f}_{3}\) were computed from the 1-D input sequence \(x\) of length \(n\) as follows:

$${f}_{1}= -0.25x\left(n-1\right)+ 0.5x\left(n-2\right)- 0.25 x\left(n-3\right),$$
(8)
$${f}_{2}= -0.125x\left(n\right)+ 0.25x\left(n-1\right)+ 0.75x\left(n-2\right)+ 0.25 x\left(n-3\right)- 0.125x\left(n-4\right),$$
(9)
$${f}_{3}= 0.125\left[-x\left(n\right)+2x\left(n-1\right)+6x\left(n-2\right)+2x\left(n-3\right)-x\left(n-4\right)\right].$$
(10)
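
As a rough illustration of how such filter outputs arise, the snippet below convolves a 1-D signal with the quoted 5/3 analysis taps and down-samples by 2; the offsets and boundary handling are simplified choices made for illustration only.

```python
import numpy as np

# Low-pass (h) and high-pass (g) analysis taps of the 5/3 filter quoted above
h = np.array([-0.125, 0.25, 0.75, 0.25, -0.125])
g = np.array([-0.25, 1.0, -0.25])

def analyse_53(x):
    """Filter-bank view of the 5/3 analysis stage: convolve, then down-sample by 2."""
    lo = np.convolve(x, h, mode="same")[0::2]   # approximation (low-pass) coefficients
    hi = np.convolve(x, g, mode="same")[1::2]   # detail (high-pass) coefficients
    return lo, hi

x = np.array([10.0, 12, 15, 14, 9, 7, 8, 11])
lo, hi = analyse_53(x)   # 4 approximation and 4 detail coefficients
```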

Results obtained for some standard metrics, namely the compression ratio (\(\partial \)), bits per pixel (\(\gamma \)) and relative redundancy (\(\varepsilon \)), serve as the basis for the performance claims. \(\partial \) and \(\gamma \) are obtained from:

$$\partial =\frac{{S}_{o}}{{S}_{c}} ,$$
(11)
$$ \gamma =\frac{1}{8\partial },$$
(12)

\({S}_{o}\) and \({S}_{c}\) are the pre- and post-compression bit sizes, respectively. The space conservation index, \(\mu \) is obtained from

$$\mu =\left(1-\frac{{S}_{c}}{{S}_{o}}\right) \times100 .$$
(13)
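
A short sketch of Eqs. (11)-(13) follows; the frame dimensions and sizes used are hypothetical and only illustrate how the three quantities relate.

```python
def size_metrics(pre_bits, post_bits):
    """Compression ratio (Eq. 11), bits-per-pixel index (Eq. 12) and space saving (Eq. 13)."""
    ratio = pre_bits / post_bits                 # Eq. (11): S_o / S_c
    gamma = 1 / (8 * ratio)                      # Eq. (12), as given in the text
    saving = (1 - post_bits / pre_bits) * 100    # Eq. (13), in percent
    return ratio, gamma, saving

# Hypothetical example: an 8-bpp 1920x1080 frame compressed to a quarter of its size
pre = 1920 * 1080 * 8
ratio, gamma, saving = size_metrics(pre, pre // 4)   # ratio = 4.0, saving = 75.0
```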

The frequently used objective measurements, expressed in numerical and statistical terms, for performance evaluation are the peak signal-to-noise ratio (\(\vartheta \)), the mean square error (\(\varnothing \)) and the maximum difference (\(\tau \)). These measurements are defined as follows:

$$\vartheta =10\mathrm{log}\left(\frac{{m}^{2}}{\varnothing }\right) ,$$
(14)
$$\varnothing =\frac{1}{ab}\sum_{x=1}^{a}\sum_{y=1}^{b}{(\mathrm{O}}_{(x,y)}-{{C}_{(x,y)})}^{2}$$
(15)
$$\tau =\mathrm{max}\left\{\left|{O}_{(x,y)}-{C}_{\left(x,y\right)}\right|\right\},$$
(16)

\(m\) is the peak (maximum) pixel value, a and b are the image dimensions, and O(x,y) and C(x,y) are the original and compressed images, respectively. \(\vartheta \) is computed from \(\varnothing \); the higher its value, the closer the compressed video is to the original. \(\varnothing \) represents the cumulative squared error between the compressed and the original image, and the lower its value, the lower the error. \(\tau \) gives the largest pixel-wise deviation between the original and compressed images. The plots of the pre- and post-compression \(\vartheta \), \(\varnothing \) and \(\tau \) values are presented in Figs. 12, 13 and 14. Visual inspection of the plots reveals higher post-compression peak signal-to-noise ratios and maximum differences as well as lower mean square errors in all corresponding cases. These figures show that the integration of DCT and DWT is suitable for achieving size reduction, higher bit density and improved quality of display for the videos.
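
A minimal sketch of Eqs. (14)-(16), assuming 8-bit frames so that the peak value m is 255, is given below with synthetic data standing in for the original and compressed frames.

```python
import numpy as np

def quality_metrics(original, compressed, peak=255.0):
    """MSE (Eq. 15), PSNR in dB (Eq. 14) and maximum difference (Eq. 16)."""
    o = np.asarray(original, dtype=float)
    c = np.asarray(compressed, dtype=float)
    mse = np.mean((o - c) ** 2)
    psnr = 10 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")
    md = float(np.max(np.abs(o - c)))
    return psnr, mse, md

# Synthetic stand-ins: an 8-bit frame and a mildly perturbed "compressed" version
ref = np.random.randint(0, 256, size=(64, 64))
rec = np.clip(ref + np.random.randint(-2, 3, size=ref.shape), 0, 255)
psnr, mse, md = quality_metrics(ref, rec)
```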

A comparative analysis was carried out on the basis of an experimental study of ten other techniques, namely compressive sensing [1], convolutional neural network [7], hybrid DCT-DWT [18], residual vector quantization [21], DCT [25], differential detection and run-length encoding [27], Huffman [29, 30], lossless JPEG-Huffman [31], run-length [32, 33] and motion compensation [34] techniques. These techniques were chosen because of their recency, popularity and applications. The comparative analysis adopted the same environment, engines and dataset used for the new technique, and its purpose is to provide a basis for comparing the new technique with existing ones. The compression ratios obtained from the experimental study of the ten techniques on the twenty research videos are presented in Table 3 alongside those obtained for the new technique. Visual inspection of Table 3 reveals that the higher figures are recorded for the new technique in most cases, indicating that the new technique is preferable to the ten other techniques when optimal bit compression of videos of varying types and formats is required. The few cases in which the older algorithms outperformed the new one are highlighted.

Table 3 Compression ratios for different video compression algorithms

7 Conclusion

Videos are noted for their explosive sizes, which largely account for their download/upload and storage complexities as well as lengthened transmission and processing times. Compression is therefore often required to reduce videos to bit sizes that are more economical to store, transmit and play back. Previous works adopted single-mode approaches based on lossless coding, lossy coding, DCT, DWT, Huffman coding, frame skipping, run-length coding or some other individual technique, with attendant limitations that include error of parallax, decrease in quality measurement, failure to handle blocking effects and artifacts, inapplicability to three-dimensional (3D) and web-streaming applications, failure on images requiring full mathematical precision, and coarse quantization of frequency coefficients resulting in reconstructed images with poor edge quality. Other limitations of some existing techniques include the inability to handle distortions arising from higher compression, abnormality due to large pixel blocks, and false contouring emanating from poor quantization of the transform coefficients. Specifically, the research is based on the hypothesis that non-bit-optimized videos are faced with the challenges of high processing time, expensive memory requirements, slow query speed and inefficient load distribution, among others. The experimental study of the DTT for resolving this hypothesis was based on some online videos of varying sizes, sources and formats. The figures recorded from the experimental study using standard metrics such as the maximum difference, mean square error and peak signal-to-noise ratio for the original and compressed videos formed the basis of the performance analysis. The analysis showed that compression based on the integration of DCT and DWT achieved a significant reduction in bit size for all the videos. It also indicated that the new technique is applicable to 3D and web-streaming applications, handles images requiring full mathematical precision and avoids coarse quantization of the frequency coefficients. Specifically, the post-compression bit figures recorded for the research led to a 25.70% reduction in the aggregate storage requirement for all the experimental videos. Judging from the superior compression ratios presented in Table 3, this cumulative reduction will exceed the equivalent figures that would be obtained using the algorithms proposed in [1, 7, 18, 21, 25, 27] and [29,30,31,32,33,34]. Ultimately, the reduction in storage requirements will lead to lower processing and querying times as well as more efficient load distribution.

8 Future works

Thirty-three (33) of the two hundred (200) compression ratios reported and highlighted in Table 3 for the ten older but related algorithms exceed the values reported for the new technique. This amounts to a 16.5% shortfall, which is attributed to some complexities in the affected videos that the new technique could not handle. Further research will therefore focus on strengthening the new technique with additional compression algorithms, such as Huffman coding, frame skipping and run-length coding, so as to bring this shortfall close to zero relative to other existing and recent techniques. Consideration will also be given to the compression of higher-dimensional images.