High-Speed Wavelet Image Processing Using the Winograd Method

Nagornov, N. N.; Semyonova, N. F.; Abdulsalyamova, A. S.

doi:10.1007/978-3-031-34127-4_36

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 702)

Included in the following conference series:

International Conference on Actual Problems of Applied Mathematics and Computer Science

Abstract

Wavelets are actively used to solve image processing problems in various fields of science and technology. Modern imaging systems have not kept pace with the rapid growth in the amount of digital visual information that needs to be processed, stored, and transmitted. Many approaches are being developed and used to speed up computations in the implementation of various image processing methods. This paper proposes using the Winograd method (WM) to speed up wavelet image processing on modern microelectronic devices. A scheme for wavelet image filtering using WM has been developed. Applying WM reduced the computational complexity of wavelet filtering asymptotically by up to 72.9% compared to the direct implementation. An evaluation based on the unit-gate model showed that WM reduces the device delay by up to 66.9%, 73.6%, and 68.8% for 4-, 6-, and 8-tap wavelets, respectively. It was found that the larger the processed image fragments, the less time is spent on wavelet filtering, but the larger the transformation matrices, the more difficult it is to construct them and to design WM on modern microelectronic devices. The obtained results can be used to improve the performance of wavelet image processing devices for image compression and denoising. Hardware implementation of WM on field-programmable gate arrays and application-specific integrated circuits to accelerate wavelet image processing is a promising direction for further research.

1 Introduction

Wavelets are actively used for solving image processing problems in various fields of science and technology, such as denoising [1], color image processing [2], and video analysis [3]. However, modern imaging systems have not kept pace with the rapid growth in the amount of digital visual information that needs to be processed, stored, and transmitted. Many approaches are being developed and used to speed up computations in the implementation of various image processing methods. The authors of [4] focus on the evolution and application of various hardware architectures. Fast decomposition algorithms based on a different representation, called the product-convolution expansion, have been proposed in [5]. This decomposition can be efficiently estimated by assuming that multiple operator impulse responses are available. A new simple method, the prime coset sum, is developed in [6] for constructing multidimensional wavelets. This method provides a systematic way to build multidimensional non-separable wavelet filter banks from two 1D low-pass filters, one of which is interpolating, in order to increase image processing speed. The authors of [7] describe an asymmetric 2D Haar transform and extend it to wavelet packets containing an exponentially large number of bases. A basis selection algorithm is also proposed for finding the optimal basis in wavelet packets. Various modern GPU optimization strategies for implementing the discrete wavelet transform, such as the use of shared memory, registers, warp shuffling instructions, and parallelism at the level of threads and instructions, are presented in [8]. A mixed memory structure for the Haar transform is proposed in which a multilevel transform can be performed with a single launch of the combined kernel. The paper [9] proposes a new algorithm for the 2D discrete wavelet transform of high-resolution images on low-cost visual sensors and nodes of the Internet of Things. The main advantages of the proposed segmented modified fractional wavelet filter are the reduction in computational complexity and power consumption compared to modern low-memory 2D discrete wavelet transform methods. However, all of these methods are based on pixel-by-pixel image processing. The Winograd method (WM) reduces image processing time through group pixel processing. The processed image is assembled from fragments of a certain size, which reduces the number of multiplications at the cost of increasing the number of additions.

The purpose of this paper is to accelerate wavelet image processing using WM on modern microelectronic devices.

2 Wavelet Image Processing Using the Direct Implementation and the Winograd Method

Wavelet filtering using direct implementation (DI) has the form

$${I}_{2}\left(x\right)={\sum }_{i=0}^{f-1}{I}_{1}\left(x-i\right)K\left(i\right), \tag{1}$$

where ${I}_{1}$ and ${I}_{2}$ are the original and processed 2D images, respectively, and $x$ is the row number of the pixel processed by the $f$-tap wavelet filter $K$. The wavelet transform extracts local information about the signal in both frequency and time. High computational complexity is a significant disadvantage of this transform. The scheme of 1D wavelet filtering of an image fragment using DI is shown in Fig. 1a, where ${S}_{I}$ is the original image fragment, $L$ and $H$ are the low- and high-pass wavelet filters, and ${P}_{A}$ and ${P}_{D}$ are the processed image pixels carrying the approximation and detail image information, respectively.

Image filtering using WM in matrix form [10] can be presented as

$$Z={A}^{T}\left(\left(GK\right)\odot \left({B}^{T}S\right)\right), \tag{2}$$

where $Z$ is the processed image fragment of size $z\times 1$; $K$ is the wavelet filter of size $f\times 1$; $S$ is the original image fragment of size $s\times 1$, where $s=z+f-1$; ${A}^{T}$, $G$, ${B}^{T}$ are the transformation matrices of sizes $z\times s$, $s\times f$, $s\times s$, respectively; and $\odot$ is element-wise multiplication. Algorithms for obtaining the matrices ${A}^{T}$, $G$, ${B}^{T}$ are described in [11]. WM is denoted as $F(z,f)$. During wavelet image processing, digital filtering is performed on two computational channels corresponding to the low- and high-frequency wavelet filters. The products $GL$ and $GH$ are calculated in advance for a specific wavelet. The product of the transformation matrix ${B}^{T}$ and $S$ can be computed before the calculations are split into two channels because it does not depend on the choice of wavelet. Next, the element-wise multiplications of ${B}^{T}S$ by $GL$ and $GH$, and the products of the transformation matrix ${A}^{T}$ with the obtained results, are performed over the two computational channels. The scheme of 1D wavelet filtering of an image fragment using WM is shown in Fig. 1b, where $S$ is the original image fragment, $L$ and $H$ are the low- and high-frequency wavelet filters, ${B}^{T}$, ${A}^{T}$, $G$ are the transformation matrices, and ${S}_{A}$ and ${S}_{D}$ are the processed image fragments carrying the approximation and detail image information, respectively.
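The two-channel scheme of Eq. (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses the commonly published $F(2,3)$ transformation matrices rather than matrices built by the algorithms of [11], a hypothetical 3-tap filter pair instead of a 4-, 6-, or 8-tap wavelet, and the sliding-window indexing $Z(x)=\sum_i S(x+i)K(i)$, which matches Eq. (1) up to reversing the order of the filter taps. The function names are illustrative only.

```python
from fractions import Fraction as Fr

# Commonly published F(2, 3) matrices (assumed here for illustration;
# the paper obtains its matrices with the algorithms of [11]).
BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]                      # s x s, with s = z + f - 1 = 4
G = [[Fr(1), Fr(0), Fr(0)],
     [Fr(1, 2), Fr(1, 2), Fr(1, 2)],
     [Fr(1, 2), Fr(-1, 2), Fr(1, 2)],
     [Fr(0), Fr(0), Fr(1)]]               # s x f
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]                     # z x s


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def winograd_two_channel(S, L, H):
    """Filter fragment S with filters L and H following Eq. (2)."""
    BS = matvec(BT, S)                    # shared by both channels
    GL, GH = matvec(G, L), matvec(G, H)   # can be precomputed per wavelet
    SA = matvec(AT, [g * b for g, b in zip(GL, BS)])  # approximation channel
    SD = matvec(AT, [g * b for g, b in zip(GH, BS)])  # detail channel
    return SA, SD


def direct(S, K, z):
    """Direct sliding-window filtering producing z output pixels."""
    return [sum(S[x + i] * K[i] for i in range(len(K))) for x in range(z)]


S = [3, 1, 4, 1]                          # hypothetical 4-pixel fragment
L, H = [1, 2, 1], [1, -2, 1]              # hypothetical 3-tap filter pair
SA, SD = winograd_two_channel(S, L, H)
print(SA == direct(S, L, 2), SD == direct(S, H, 2))
```

In this sketch each channel uses 4 element-wise multiplications for 2 output pixels, against 6 for the direct computation, and the product ${B}^{T}S$ is computed once and reused by both channels.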

The results of increasing the speed of wavelet image processing using WM are presented below.

3 Acceleration of Wavelet Image Processing Using the Winograd Method

The time complexity of wavelet filtering using WM $F(z,f)$ depends on $z$ and $f$ and on the choice of points ${s}_{0},{s}_{1},\ldots,{s}_{n-2},{s}_{n-1}$. These values determine the form of the transformation matrices ${A}^{T}$, $G$, ${B}^{T}$. The set of Lagrange polynomial points $0, 1, -1, 2, -2, 4, -4, \ldots, {2}^{l}, -{2}^{l}, {2}^{l+1}, -{2}^{l+1}, \ldots, \infty$ was used to construct the Vandermonde matrix $V$ and the matrices ${A}^{T}$, $G$, ${B}^{T}$ [11]. The cases of 4-, 6-, and 8-tap wavelets and processed image fragments of size $z=2, 3, 4, 5, 6, 7$ are considered. Table 1 is based on the transformation matrices and contains the counts of multiplications and additions required for wavelet filtering of images using DI and WM. The table values are obtained as follows.

1. The number of DI multiplications is equal to the number of wavelet filter coefficients.
2. The number of WM multiplications is equal to twice the number of pixels in the processed image fragment.
3. The number of DI additions is 2 less than the number of multiplications.
4. The number of WM basic additions is equal to the number of additions needed to sum the nonzero elements in each row of the matrices ${A}^{T}$ (counted twice) and ${B}^{T}$.
5. The number of WM complementary additions is equal to the number of ones in the binary representation of a matrix element, reduced by 1, summed over all elements of the matrices ${A}^{T}$ (counted twice) and ${B}^{T}$.
6. The total number of additions is equal to the sum of the basic and complementary additions.
7. WM produces several pixel values of the processed image in one iteration. Obtaining a single pixel brightness value requires the entire iteration, just as obtaining the entire fragment does. The pixel specific value (PSV) is therefore introduced for a correct comparison of the computational complexity of the methods. PSV is calculated as the required number of operations (multiplications or additions) divided by the number of pixels in the processed image fragment.
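The counting rules above can be sketched in Python as follows. This is one reading of the rules, not the authors' code: it assumes that DI needs $2f$ multiplications per output pixel (one $f$-tap filter per channel), that WM needs $2s=2(z+f-1)$ multiplications per fragment of $z$ output pixels, and that the addition rules are applied to matrices with integer elements. All function names are hypothetical.

```python
def di_multiplications(f):
    """Rule 1 (assumed reading): f coefficients for each of the two filters."""
    return 2 * f


def wm_multiplications(z, f):
    """Rule 2 (assumed reading): two channels of s = z + f - 1 products."""
    return 2 * (z + f - 1)


def psv(operations, pixels):
    """Rule 7: operations per pixel of the processed fragment."""
    return operations / pixels


def multiplication_reduction(z, f):
    """Relative reduction of the multiplication PSV of WM F(z, f) versus DI."""
    return 1 - psv(wm_multiplications(z, f), z) / psv(di_multiplications(f), 1)


def basic_additions(matrix):
    """Rule 4 (assumed reading): additions to sum the nonzero terms per row."""
    return sum(max(sum(1 for e in row if e != 0) - 1, 0) for row in matrix)


def complementary_additions(matrix):
    """Rule 5: ones in the binary form of each nonzero integer element, minus 1."""
    return sum(bin(abs(e)).count("1") - 1
               for row in matrix for e in row if e != 0)


print(round(100 * multiplication_reduction(6, 8), 1))  # percent for F(6, 8)
```

Under these assumptions $F(6,8)$ gives $1-\frac{26/6}{16}\approx 72.9\%$, which agrees with the asymptotic reduction reported in the text.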

Table 1. The number of additions and multiplications in wavelet filtering of an image fragment using the direct implementation and the Winograd method

Table 1 shows that the greatest reduction in the PSV for multiplications is observed for the 8-tap wavelet using WM $F(6,8)$. The computational complexity decreases asymptotically by 72.9% compared to DI. The asymptotic estimate does not take addition operations into account since their complexity is an order of magnitude lower than that of multiplication. This assessment is predominantly theoretical and may correlate poorly with the results obtained when designing wavelet image processing devices in practice. Therefore, the unit-gate model (UGM) was used to calculate the operating time of a microelectronic device. UGM is a method for the theoretical evaluation of device characteristics based on counting the number of basic logical elements “and”, “or” [12]. The response time of one such element is taken as a conventional unit (CU). We now describe the principles used for the theoretical estimation of the delay of wavelet filtering devices according to the schemes in Fig. 1a and Fig. 1b for DI and WM, respectively. All multiplications are performed in parallel in both methods.

Multiplications by the matrices ${B}^{T}$ and ${A}^{T}$ can be replaced by shift and addition operations. The number of ones in the binary representation of each element of the matrices ${A}^{T}$ and ${B}^{T}$ was counted to determine the number of terms in the rows of these matrices (Table 2). The products $GL$ and $GH$ are computed in advance. The products of ${B}^{T}S$ with $GL$ and $GH$ are realized by element-wise multiplications. Multiplications and additions are implemented using a generalized multiplier (GM) and a multi-operand adder (MOA), respectively [13]. The delays of MOA and GM for $k$-bit numbers on computing devices are $6.8\,{\mathrm{log}}_{2}\,N+2\,{\mathrm{log}}_{2}\,k+4$ and $8.8\,{\mathrm{log}}_{2}\,k+4$, respectively [14], where $N$ is the largest number of terms in the rows of the matrices ${A}^{T}$ and ${B}^{T}$, and $k$ is both the image color depth and the bit-width of the wavelet filter coefficients. The calculations are performed for $k=8$. The results of the device delay evaluation for wavelet image processing using DI and WM are presented in Table 2.
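The two delay expressions can be evaluated with a short Python sketch. It assumes that the expression containing $N$ belongs to the multi-operand adder (the only unit that sums $N$ terms) and that the expression depending on $k$ alone belongs to the generalized multiplier. The function names are hypothetical, and the sketch only evaluates the formulas; it does not attempt to reproduce Table 2.

```python
import math


def gm_delay(k):
    """UGM delay in CU of a k-bit generalized multiplier (assumed mapping)."""
    return 8.8 * math.log2(k) + 4


def moa_delay(n, k):
    """UGM delay in CU of a multi-operand adder summing n k-bit terms."""
    return 6.8 * math.log2(n) + 2 * math.log2(k) + 4


k = 8  # image color depth and coefficient bit-width used in the text
print(f"GM delay for k={k}: {gm_delay(k):.1f} CU")
for n in (2, 4, 8):
    print(f"MOA delay for N={n}, k={k}: {moa_delay(n, k):.1f} CU")
```

Because the adder delay grows only logarithmically with $N$, longer matrix rows add relatively little delay per fragment while producing more output pixels per iteration.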

Table 2. UGM-based evaluation results of the device delay for wavelet processing of 8-bit image using the direct implementation and the Winograd method

The following conclusions are drawn based on the results in Table 2.

1. According to UGM, WM reduced the device delay of wavelet image processing by up to 66.9%, 73.6%, and 68.8% for 4-, 6-, and 8-tap wavelets, respectively, compared with DI.
2. The larger the size $z$ of the processed image fragments, the less time is spent on wavelet filtering, but the larger the transformation matrices, the more difficult it is to construct them and to design WM on modern microelectronic devices.
3. According to UGM, the greatest reduction in device delay as the size of the processed image fragments increases is achieved at $z=2$ and $z=3$. For example, for the 4-tap wavelet the device delay is reduced by $55.0-39.3=15.7$ CU and $39.3-26.9=12.4$ CU at $z=2$ and $z=3$, respectively, but only by 4.3 CU and 4.5 CU at $z=4$ and $z=5$, respectively.

4 Conclusion

A scheme for 1D wavelet image processing using WM has been developed. A comparative analysis of the image filtering time against DI was carried out. WM reduced the computational complexity of wavelet image processing asymptotically by up to 72.9%, depending on the size of the filters used and of the processed image fragments. According to UGM, WM reduced the device delay of wavelet image processing by up to 66.9%, 73.6%, and 68.8% for 4-, 6-, and 8-tap wavelets, respectively. The larger the size $z$ of the processed image fragments, the less time is spent on wavelet filtering, but the larger the transformation matrices, the more difficult it is to construct them and to design WM on modern microelectronic devices. The obtained results can be used to improve the performance of wavelet image processing devices for image compression and denoising. Hardware implementation of WM on FPGAs and ASICs to accelerate wavelet image processing is a promising direction for further research.

References

1. Wu, Y., Gao, G., Cui, C.: Improved wavelet denoising by non-convex sparse regularization under double wavelet domains. IEEE Access 7, 30659–30671 (2019)
2. Soulard, R., Carré, P.: Elliptical monogenic wavelets for the analysis and processing of color images. IEEE Trans. Sig. Process. 64, 1535–1549 (2016)
3. Chen, Y., Li, D., Zhang, J.Q.: Complementary color wavelet: a novel tool for the color image/video analysis and processing. IEEE Trans. Circuits Syst. Video Technol. 29, 12–27 (2019)
4. Alcaín, E., et al.: Hardware architectures for real-time medical imaging. Electronics 10, 3118 (2021)
5. Escande, P., Weiss, P.: Fast wavelet decomposition of linear operators through product-convolution expansions. IMA J. Numer. Anal. 42, 569–596 (2022)
6. Hur, Y., Zheng, F.: Prime coset sum: a systematic method for designing multi-d wavelet filter banks with fast algorithms. IEEE Trans. Inf. Theory, 7565569 (2016)
7. Ouyang, W., Zhao, T., Cham, W.K., Wei, L.: Fast full-search-equivalent pattern matching using asymmetric haar wavelet packets. IEEE Trans. Circuits Syst. Video Technol. 28, 819–833 (2018)
8. Quan, T.M., Jeong, W.K.: A fast discrete wavelet transform using hybrid parallelism on GPUs. IEEE Trans. Parallel Distrib. Syst. 27, 3088–3100 (2016)
9. Tausif, M., Khan, E., Hasan, M., Reisslein, M.: SMFrWF: segmented modified fractional wavelet filter: fast low-memory discrete wavelet transform (DWT). IEEE Access 7, 84448–84467 (2019)
10. Winograd, S.: Arithmetic Complexity of Computations (1980)
11. Lyakhov, P., Abdulsalyamova, A., Semyonova, N., Nagornov, N.: On the computational complexity of 2D filtering by Winograd method. In: 2022 11th Mediterranean Conference on Embedded Computing (MECO), pp. 1–4 (2022)
12. Zimmermann, R.: Binary adder architectures for cell-based VLSI and their synthesis. Hartung-Gorre (1998)
13. Parhami, B.: Computer Arithmetic: Algorithms and Hardware Designs. Oxford University Press, Oxford (2010)
14. Lyakhov, P., Valueva, M., Valuev, G., Nagornov, N.: A method of increasing digital filter performance based on truncated multiply-accumulate units. Appl. Sci. 10 (2020)

Acknowledgments

The work was supported by the Russian Science Foundation, project № 22-71-00009.

Author information

Authors and Affiliations

North-Caucasus Federal University, Stavropol, Russia
N. N. Nagornov & N. F. Semyonova
North-Caucasus Center for Mathematical Research, Stavropol, Russia
A. S. Abdulsalyamova

Corresponding author

Correspondence to N. N. Nagornov.

Editor information

Editors and Affiliations

Information Systems Department, Stavropol State Agrarian University, Stavropol, Russia
Anatoly Alikhanov
Information Systems Department, Stavropol State Agrarian University, Stavropol, Russia
Pavel Lyakhov
Information Systems Department, Stavropol State Agrarian University, Stavropol, Stavropol Territory, Russia
Irina Samoylenko

Cite this paper

Nagornov, N.N., Semyonova, N.F., Abdulsalyamova, A.S. (2023). High-Speed Wavelet Image Processing Using the Winograd Method. In: Alikhanov, A., Lyakhov, P., Samoylenko, I. (eds) Current Problems in Applied Mathematics and Computer Science and Systems. APAMCS 2022. Lecture Notes in Networks and Systems, vol 702. Springer, Cham. https://doi.org/10.1007/978-3-031-34127-4_36

DOI: https://doi.org/10.1007/978-3-031-34127-4_36
Published: 06 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34126-7
Online ISBN: 978-3-031-34127-4
eBook Packages: Engineering, Engineering (R0)
