1.1 Introduction

The widely used methods for hierarchical image decomposition, based on pyramidal representation in the pixel domain, comprise two main approaches: the Non-orthogonal Pyramids and the Orthogonal Pyramids [1, 2]:

  • To the Non-orthogonal Pyramids belong the Gaussian/Laplacian Pyramid (GP/LP), the Reduced/Enhanced Laplacian Pyramid, the Reduced-Sum/Reduced-Difference Pyramid, the Hierarchy-Embedded Differential Pyramid, etc.;

  • The Orthogonal Pyramids are mainly represented by the Sub-Band Decomposition based on filter banks: the Orthonormal Wavelet Pyramid, the Steerable Pyramid (based on directional filter banks), the Curvelet Pyramid, the Ridgelet Pyramid, etc.

The approaches based on the Laplacian/Gaussian decompositions are well known, and multiple applications already exist for image compression, analysis, machine learning, etc. [1, 2, 3]. Specific to these decompositions is that the number of components is largest in the lowest decomposition level and decreases in each consecutive level (Fig. 1.1). The decomposition ends after the last coefficients of the highest level are calculated. Moreover, these decompositions require the execution of multiple decimations and interpolations. Unlike them, the IDP decomposition starts with a relatively small number of components, each level contains a larger number than the preceding one, and the decomposition can stop before the last possible level is reached, depending on the needed quality of the restored image.

Fig. 1.1 Image decomposition based on Gaussian/Laplacian pyramid
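
To make the contrast with the IDP concrete, the following minimal sketch builds the Gaussian/Laplacian pyramid described above; the use of NumPy/SciPy, bilinear zoom as the decimation/interpolation step, and all function names are illustrative assumptions, not the authors' code.

```python
# A minimal Laplacian pyramid sketch: repeated decimation and interpolation,
# with a difference image stored at every level (largest level first).
import numpy as np
from scipy.ndimage import zoom

def laplacian_pyramid(image, levels=3):
    """Return [L0, L1, ..., G_last]: difference images plus the coarsest level."""
    pyramid = []
    g = image.astype(float)
    for _ in range(levels):
        g_down = zoom(g, 0.5, order=1)        # decimation (Gaussian step, simplified)
        g_up = zoom(g_down, 2.0, order=1)     # interpolation back to the current size
        pyramid.append(g - g_up[:g.shape[0], :g.shape[1]])  # difference (Laplacian) level
        g = g_down
    pyramid.append(g)                          # coarsest approximation
    return pyramid

img = np.random.rand(64, 64)
for p in laplacian_pyramid(img):
    print(p.shape)  # (64,64), (32,32), (16,16), (8,8): fewer components per level
```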

The Inverse Difference Pyramid (IDP) [4] is a kind of hierarchical image decomposition based on pyramidal representation in the spectrum domain. The IDP structure is based on orthogonal deterministic or statistical transforms, such as the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), the Walsh-Hadamard Transform (WHT), the Complex Hadamard Transform (CHT), the Karhunen–Loève Transform (KLT), etc. One modification of the IDP is the Adaptive Inverse Difference Pyramid (AIDP), which could be based on the following: Back-Propagation Neural Networks (BPNN), Gaussian Radial Basis Function Networks (GRBFN), or Self-Organizing Feature Mapping Vector Quantization (SOFM-VQ). Another kind of IDP decomposition is the Non-linear Inverse Difference Pyramid (NL-IDP), based on the Discrete Modified Mellin-Fourier Transform (MMFT).

A simplified structure of the basic IDP decomposition is shown in Fig. 1.2.

Fig. 1.2 Basic structure of the inverse difference pyramid

The processing of the input (original) 2D image starts at the lowest level. For this, the image is represented by the matrix [B]. To simplify the calculations, the image is divided into smaller pieces (sub-blocks), each of size \(2^{n} \times 2^{n}\); the figure shows the processing of one of these blocks only. For the decomposition, any 2D orthogonal transform could be used: the Walsh-Hadamard Transform (WHT), the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), the Karhunen–Loève Transform (KLT) or Principal Component Analysis (PCA), etc. In order to reduce the number of calculated transform coefficients, the “truncated” orthogonal transform is used, i.e., part of the coefficients are not calculated, because (after analysis) their values are evaluated as too small to influence the quality of the restored image. The corresponding block in the figure is denoted as Truncated Orthogonal Transform, “TOT”. The values of the calculated coefficients are saved in the block \({[}{\hat{\boldsymbol{S}}}{]}\) (in the figure are used correspondingly: \({[}{\hat{\boldsymbol{S}}}_{{0}} {]}\) for the start level, \({[}{\hat{\boldsymbol{S}}}_{1} {]}\) for level 1, etc.). In this example, only 4 coefficients \(\hat{s}_{{0}} {(}u,v{)}\) are retained from the start level, as shown in the lower right part of the figure. The processing continues with the preparation of the data for the next decomposition level. For this, the Inverse Orthogonal Transform (the block “IOT”) is executed on the coefficients \(\hat{s}_{{0}} {(}u,v{)}\), and the first approximation, i.e., the coarse approximation of the processed image block \({[}{\hat{\boldsymbol{B}}}_{{0}} {]}\), is obtained. This approximation (the restored image sub-block) is subtracted from the original in “∑”, and the difference (error) image is calculated. For this level, the “Error” image matrix is denoted as [E0]. In the second level, it is divided into smaller sub-blocks of size \(2^{n-1} \times 2^{n-1}\) and processed with the TOT again. The values of the calculated coefficients are saved in the corresponding blocks \({[}{\hat{\boldsymbol{S}}}_{1} {]}\); the 16 coefficients calculated for this level are saved as \(\hat{s}_{{1}} {(}u,v{)}\). The coefficients in the next level are calculated in a similar way, and the corresponding 256 coefficients are retained and saved as \(\hat{s}_{{2}} {(}u,v{)}\). As shown, the number of coefficients grows in each consecutive decomposition level, which is why the decomposition is called the “Inverse Difference Pyramid”. The decomposition can stop at any level, depending on the achieved quality of the restored image or on the values calculated for the “difference” image. When the decomposition is stopped, the coefficients retained in all levels are arranged in accordance with their spatial frequencies and are then losslessly coded through Adaptive Run-Length Encoding (ARLE), followed by Modified Huffman (MH) coding. The ARLE and MH coding were specially developed for the IDP decomposition, in accordance with the specifics of the processed data [5]. The restoration of the decomposed image is executed in reverse order.
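
The flow just described can be condensed into a short sketch. It follows the example of Fig. 1.2 (4 retained coefficients in the start level, 16 in level 1) but uses the DCT via SciPy as the TOT; the helper names and the low-frequency truncation rule are assumptions, not the authors' implementation.

```python
# Two-level IDP sketch for one 8x8 block: truncated transform (TOT), inverse
# transform (IOT), error image E0, and level-1 processing of its sub-blocks.
import numpy as np
from scipy.fft import dctn, idctn

def truncated_dct(block, keep):
    s = dctn(block, norm='ortho')
    mask = np.zeros_like(s); mask[:keep, :keep] = 1   # keep the low-frequency corner
    return s * mask

block = np.random.rand(8, 8)

# Level 0: 2x2 = 4 coefficients, coarse approximation [B0], error image [E0]
s0 = truncated_dct(block, 2)
b0 = idctn(s0, norm='ortho')            # IOT: coarse approximation
e0 = block - b0                         # difference (error) image

# Level 1: split E0 into 4x4 sub-blocks, 2x2 coefficients each -> 16 in total
approx1 = np.zeros_like(e0)
for i in range(0, 8, 4):
    for j in range(0, 8, 4):
        s1 = truncated_dct(e0[i:i+4, j:j+4], 2)
        approx1[i:i+4, j:j+4] = idctn(s1, norm='ortho')

restored = b0 + approx1                 # restoration runs in reverse order
print(np.abs(block - restored).max())   # residual left for the next level
```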

The main advantage of the IDP is its efficiency when used for the compression of 2D or 3D visual information. Unlike the Laplacian/Gaussian pyramids, the IDP decomposition does not use decimations and interpolations, because it is based on an orthogonal transform. The computational cost of the process depends on the number of needed operations, analyzed in previous research of the authors [6]. The authors also proved theoretically the low computational complexity of the tensor representation of multidimensional visual information [7, 8]. One important disadvantage of the IDP is that, if all coefficients are retained, the decomposition is over-complete. To avoid this, the properties of the used orthogonal transform are exploited. For example, for the WHT it is not necessary to calculate all neighboring coefficients in a group: for each group of 4 coefficients, only 3 need to be calculated (the fourth is derived from the values of the remaining three). On the basis of this property, the problem with the decomposition over-completeness is solved.

1.2 Branched IDP

To achieve better performance in the processing and analysis of sequences of correlated images, the Branched Inverse Difference Pyramid (BIDP) was developed. This approach is extremely useful for processing sequences of medical images (computer tomography images, magnetic resonance images, etc.) and groups of multispectral or multi-view images. In all these cases, the processed groups of images have high mutual correlation. The BIDP block diagram is shown in Fig. 1.3. In this case, the basic IDP diagram is retained (the red rectangle), but some new relations between the images in the processed group are introduced.

Fig. 1.3 Block diagram of BIDP

In the case shown in Fig. 1.3, one sequence of (2N + 1) images representing the same object or scene and bound together by similarity is processed, and one of the images is used as a reference.

To select the reference image, various approaches could be used, for example, comparison through PSNR, etc. For video sequences, the middle image in the group is usually the most suitable, and this is the easiest solution. Another approach, illustrated by the sketch below, is based on the analysis of the image histograms for the group: the image whose histogram is most similar to those of the remaining images in the group is chosen as the reference (R).
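
A possible realization of the histogram-based selection follows; the L1 histogram distance, the bin count, and the function names are assumptions, and any histogram similarity measure could be substituted.

```python
# Pick as reference the image whose histogram is, on average, closest to the
# histograms of all remaining images in the group.
import numpy as np

def select_reference(images, bins=64):
    hists = [np.histogram(im, bins=bins, range=(0, 255), density=True)[0]
             for im in images]
    # total distance of each histogram to all the others
    scores = [sum(np.abs(h - g).sum() for g in hists) for h in hists]
    return int(np.argmin(scores))       # index of the reference image R

group = [np.random.randint(0, 256, (128, 128)) for _ in range(5)]
print(select_reference(group))
```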

The image decomposition starts with this reference image, which is processed with some kind of orthogonal transform, using a limited (preset) number of transform coefficients. After the inverse transform of the so-calculated coefficients, the coarse approximation of the processed image is obtained. The IDP decomposition then branches out into several decompositions, whose number corresponds to the number of images in the group. The first approximation for all images in the group is the one calculated for the reference image. In case the IDP comprises 2 levels only, each branch is built individually in the next level; for an IDP of more levels, the reference image approximation could also be used in the second level, etc. The similarity between the processed images permits the same coarse approximation to be used for the whole group. Depending on the visual contents, the number of images in one group could differ and is set in relation to their mutual correlation. For example, for video sequences the highest correlation usually exists between each 8–12 sequential frames, while for CT and MRI sequences longer groups could be used. As a result of the BIDP-based processing, high compression and very good visual quality of the restored images are achieved, as confirmed by the experimental results.
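
The following sketch illustrates the shared coarse approximation: it is computed once, from the reference, and reused by every branch, whose residuals then continue to the next level. The truncated DCT and all names stand in for the orthogonal transform of the text and are assumptions.

```python
# BIDP idea from Fig. 1.3: one level-0 approximation for the whole group,
# individual residual branches afterwards.
import numpy as np
from scipy.fft import dctn, idctn

def coarse_approx(image, keep=4):
    s = dctn(image, norm='ortho')
    mask = np.zeros_like(s); mask[:keep, :keep] = 1
    return idctn(s * mask, norm='ortho')

group = [np.random.rand(64, 64) for _ in range(5)]   # (2N+1) correlated images
ref = group[len(group) // 2]                          # e.g., the middle image
shared = coarse_approx(ref)                           # one approximation for all

# Each branch continues individually from the shared approximation:
residuals = [im - shared for im in group]             # inputs to the next level
print([float(np.abs(r).mean()) for r in residuals])
```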

1.3 Multi-layer Tensor Decomposition Through 3D IDP

The common practice in image representation is based on the use of a 2D matrix, where each pixel corresponds to one matrix element. At the same time, many contemporary applications exist where video sequences and groups of correlated images obtained from various sources must be stored, analyzed or searched, and for this the most suitable approach is to treat them as 3D arrays of matrices. Recently, tensors, and especially third-order tensors, have proved most suitable to represent such sequences. The main obstacle to the wide implementation of tensor decomposition in real-time applications is the high computational complexity. Tensor decompositions are usually based on deterministic discrete transforms of the kind: Discrete Wavelet Transform (DWT) or Discrete Cosine Transform (DCT), followed by SVD in the frequency domain [9]. In part of the related publications [10,11,12,13], algorithms are proposed for cubical decomposition based on the 3D separable discrete transforms: Discrete Fourier Transform, Discrete Hartley Transform, Discrete Cosine Transform, etc. To reduce the computational complexity, in many cases “fast” algorithms are used, one of which is the 3D Fast Fourier Transform (FFT). Compared to the SVD/PCA-based algorithms, tensor decomposition based on deterministic orthogonal transforms offers lower energy concentration in the first decomposition components but accelerates the computations. This is why tensor decomposition based on orthogonal transforms is reasonable in cases when real-time processing of various multidimensional data is needed.
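
As a point of reference for such 3D separable transforms, SciPy's dctn applies the DCT along every axis of a cube in one call; this snippet only illustrates the separable-transform idea, not any specific cited algorithm.

```python
# A 3D separable deterministic transform: the DCT applied along all three axes
# moves a third-order tensor into the spectrum domain, and back.
import numpy as np
from scipy.fft import dctn, idctn

cube = np.random.rand(16, 16, 16)          # third-order tensor
spectrum = dctn(cube, norm='ortho')        # separable 3D DCT over all axes
back = idctn(spectrum, norm='ortho')
print(np.allclose(cube, back))             # True: the transform is invertible
```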

The approach presented here is the hierarchical third-order tensor decomposition based on the 3D-IDP. For this, the tensors are transformed into the 3D-WHT spectrum space. The basic concept is to represent each third-order tensor X of size N × N × N through a 3D Reduced IDP (3D-RIDP) [14] of the kind shown in Fig. 1.4. For this, the tensor X is initially divided into Q sub-tensors Xq for q = 1, 2, …, Q, each of size M × M × M, where \({\text{M}}{\kern 1pt} = {\kern 1pt} {\kern 1pt} \left\lfloor {{\text{N}}{/}\sqrt[{3}]{{\text{Q}}}} \right\rfloor\). The value of M is defined in accordance with the condition \(M = 2^{m}\). For each sub-tensor Xq of size M × M × M, an individual m-level 3D-RIDP is built. As a result, the tensor X is transformed into the corresponding spectrum tensor S, which comprises m levels of coefficients. The coefficients in the initial level have the highest energy concentration, while the energy in the next levels decreases quickly. In correspondence with Parseval’s theorem, the total energy of the coefficients of the tensor S is equal to that of the elements of the tensor X, but is redistributed. The main advantage of the method is its low computational complexity: the only mathematical operations needed are additions, and their number is relatively low. Furthermore, the main part of the tensor energy is concentrated in a small number of coefficients from the first pyramid level, which permits significant reduction of the information redundancy after neglecting the low-energy elements.

Fig. 1.4 Block diagram of 3D reduced IDP
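
A minimal sketch of the addition-only 3D-WHT pass underlying the 3D-RIDP is given below; the sub-tensor splitting and the level logic of Fig. 1.4 are omitted, and the unnormalized butterfly is an assumption (the final check reflects Parseval's theorem, up to the scaling of the unnormalized transform).

```python
# Fast Walsh-Hadamard butterflies use only additions/subtractions; applying
# them along each of the three axes gives the 3D-WHT of a sub-tensor.
import numpy as np

def fwht_axis(x, axis):
    x = np.moveaxis(x.copy(), axis, 0)
    n = x.shape[0]
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            a = x[i:i+h].copy(); b = x[i+h:i+2*h].copy()
            x[i:i+h] = a + b                 # butterfly: additions only
            x[i+h:i+2*h] = a - b
        h *= 2
    return np.moveaxis(x, 0, axis)

def wht3d(t):
    for ax in range(3):
        t = fwht_axis(t, ax)
    return t

sub_tensor = np.random.rand(8, 8, 8)        # M x M x M, with M = 2^m
spectrum = wht3d(sub_tensor)
# Parseval: total energy preserved up to the 8^3 scaling of the unnormalized WHT
print(np.allclose((spectrum**2).sum(), (sub_tensor**2).sum() * 8**3))
```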

The properties of the 3D-RIDP open new possibilities for implementation in various application areas related to processing and analysis of 3D data: sequences of correlated images (video, multi-spectral, multi-view, medical images from various sources), multichannel signals, etc.

1.4 Multi-layer Tensor IDP, Based on Hierarchical SVD

To achieve computational complexity reduction, a new non-iterative approach for multi-dimensional tensor representation is proposed, based on the Multi-layer Tensor Spectrum Pyramid (MLTSP) [15] with embedded 3D orthogonal transforms (3D OT) and Hierarchical Tensor SVD (HTSVD) [8]. The multi-layer TSP structure is explained here through the example of a two-layer tensor spectrum pyramid (2LTSP) for the representation of a tensor of size 8 × 8 × 8, with embedded 3D Frequency-Ordered Fast Walsh-Hadamard Transform (3D FO-FWHT) [7] and HTSVD for a tensor of size 2 × 2 × 2 (HTSVD2×2×2) [8]. The 2LTSP comprises a coder and a decoder; the structure of the decoder is mirror-symmetrical to that of the coder. Both block diagrams are shown in Fig. 1.5a, b. The block diagram of the computational graph of the HTSVD2×2×2 algorithm, with two hierarchical levels for the decomposition of the elementary tensor \(\boldsymbol{S}_{2 \times 2 \times 2}\) of size 2 × 2 × 2, is shown in Fig. 1.6. The decomposition is based on the SVD of a matrix [X] of size 2 × 2, denoted as SVD2×2, which is executed through simple relations of low computational complexity [15]. After the decomposition of \(\boldsymbol{S}_{2 \times 2 \times 2}\) is finished, the tensors in the resulting sum are arranged following the decrease of the variances of the sub-matrices obtained after the unfolding.

Fig. 1.5 2LTSP based on 3D OT and HTSVD2×2×2 for input tensor X of size N × N × N, for N = 8

Fig. 1.6 HTSVD2×2×2 algorithm for the tensor \(\boldsymbol{S}_{2 \times 2 \times 2 }\) decomposition

The voxels of higher values in the \({\boldsymbol{S}}_{2 \times 2 \times 2}\) decomposition in Fig. 1.6 are colored in red, and those of lower values in blue. The main advantages of the MLTSP algorithm are the low computational complexity, the high flexibility regarding the choice of its parameters, and the high ability for information redundancy reduction in the input tensor.
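
The HTSVD building block can be illustrated as follows: the SVD of a 2 × 2 matrix written as a sum of two rank-one terms. Here np.linalg.svd stands in for the closed-form “simple relations” mentioned in the text, and the hierarchical application to the 2 × 2 × 2 unfoldings is only indicated in the comments.

```python
# SVD2x2: decompose a 2x2 matrix into two rank-one matrices, ordered by
# decreasing singular value (i.e., by decreasing energy).
import numpy as np

x = np.random.rand(2, 2)
u, s, vt = np.linalg.svd(x)
rank1_terms = [s[k] * np.outer(u[:, k], vt[k]) for k in range(2)]
print(np.allclose(x, rank1_terms[0] + rank1_terms[1]))   # True

# In HTSVD_2x2x2, such SVD2x2 steps are applied hierarchically to the
# unfoldings of the 2x2x2 tensor, and the resulting tensors are arranged by
# the decreasing variance of the unfolded sub-matrices.
```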

1.5 3D Adaptive Inverse Difference Pyramid with Convolutional Auto Encoder/Decoder

This adaptive IDP version is based on the use of a Convolutional Auto Encoder/Decoder (CED) [16]. The two components of the CED neural network are trained by deep learning: the Convolutional Encoder (CE), which is used to compress the input data, and the Convolutional Decoder (CD), which restores the already compressed input data. In Fig. 1.7, the block diagram of one multi-layer CED of the kind \(m^3\)–n–\(m^3\) is shown, i.e., it comprises \(m^3\) input cells, n cells in the hidden layer (the output of the coder and the input of the decoder, respectively), and \(m^3\) output cells. After the end of the iterative CED training, the values obtained for the output cells approximate those from the input cells with minimum root-mean-square error.

Fig. 1.7 Convolutional E/D (CED)
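
A hedged PyTorch sketch of the \(m^3\)–n–\(m^3\) auto-encoder/decoder follows; fully connected layers replace the convolutional ones for brevity, and all sizes, names, and training settings are assumptions.

```python
# An m^3-n-m^3 auto-encoder: a flattened m x m x m tensor is compressed to
# n hidden cells and restored, trained to minimize the mean squared error.
import torch
import torch.nn as nn

m, n = 8, 32                                  # n << m^3 = 512
encoder = nn.Sequential(nn.Linear(m**3, n), nn.ReLU())
decoder = nn.Sequential(nn.Linear(n, m**3))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)

x = torch.rand(1, m**3)                       # one rearranged input tensor
for _ in range(200):                          # iterative training
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(x)), x)
    loss.backward()
    opt.step()
print(loss.item())                            # the output approximates the input
```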

Two new approaches are offered here for the compression of third-order (cubical) tensors.

The first approach is based on the Adaptive Inverse Difference Pyramid (AIDP) structure, combined with a CED. In Fig. 1.8, the corresponding three-level block diagram is shown. Here, a third-order tensor of size m × m × m enters the AIDP input. In the initial (zero) hierarchical AIDP level, the elements of the input tensor are arranged as a sequence of length \(m^3\), following a preselected rule. This sequence defines the \(m^3\)-dimensional vector, which enters the auto-encoder CED-0. In this case, the hidden layer contains n cells (for \(n \ll m^3\)), while the output layer has \(m^3\) cells. From them, after inverse rearrangement, the output third-order tensor of size m × m × m is restored. After the CED-0 self-training is finished, the so-obtained output tensor approximates the input tensor of size m × m × m. In the first summator (Σ1), the approximated tensor is subtracted element-by-element from the input tensor, and as a result the difference tensor, which corresponds to the approximation error, is obtained. In the first hierarchical AIDP level, the difference tensor is divided into 8 sub-tensors, each of size (m/2) × (m/2) × (m/2). The elements of these sub-tensors are transformed into eight \((m^3/8)\)-dimensional vectors, respectively. They enter sequentially the input of CED-1, whose hidden layer contains (n/2) cells and whose output layer has \((m^3/8)\) cells. After the self-training of CED-1 for these sub-tensors is finished, the corresponding 8 sub-tensors of length \((m^3/8)\) are restored, which compose the approximated difference tensor. In the second summator (Σ2), the approximated difference tensor is subtracted element-by-element from the first difference tensor, and the second difference tensor, i.e., the error of the second approximation, is obtained. In the second hierarchical AIDP level, this difference tensor is divided into 64 sub-tensors, each of size (m/4) × (m/4) × (m/4), which are then transformed into the corresponding \((m^3/64)\)-dimensional vectors. They enter sequentially the auto-encoder CED-2, whose hidden layer comprises (n/4) cells and whose output layer, \((m^3/64)\) cells.

Fig. 1.8 Block diagram of the two-level adaptive IDP with convolutional Encoder/Decoder

After the end of the CED-2 self-training, from the so-calculated 64 output vectors of length \((m^3/64)\), the corresponding 64 sub-tensors are restored, which build the approximated second difference tensor. Each of the restored tensors in the AIDP levels has a corresponding feedback to the CED-0, CED-1, and CED-2 coders; these connections are used in the self-training process of the auto-encoders. At the output of the AIDP zero level, an n-dimensional vector is obtained, which comprises the cells of the hidden layer of the trained CED-0. At the output of the AIDP first level, the (4n)-dimensional vector is obtained, which is the concatenation of the elements of the 8 vectors (each of length n/2) built by the cells of the CED-1 hidden layer for the corresponding input vectors. At the output of the AIDP second level, the (16n)-dimensional vector is obtained, which is the concatenation of the elements of the 64 vectors (each of length n/4) built by the cells of the hidden layer of the trained CED-2 for the corresponding input vectors. The output n-dimensional vector of the zero AIDP level is the shortest, but it carries the largest information volume for the input tensor of size m × m × m. The output vectors from the first and second AIDP levels are correspondingly 4 and 16 times longer than that of the zero level, but they carry much less information about the input tensor. This permits significant reduction of the data obtained from the AIDP outputs without noticeable information loss, i.e., the meaningless information is filtered out (neglected). The number of AIDP levels should be increased together with the size of the input third-order tensor.
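
The data flow of the AIDP levels can be condensed into the structural sketch below; ced() is a hypothetical stand-in for a trained auto-encoder, and only the shapes of the hidden vectors (n, 4n, and 16n in total) follow the text.

```python
# Structural sketch of the AIDP data flow: each level rearranges the
# (sub-)tensors into vectors for its CED and passes the residual down.
import numpy as np

def ced(vec, hidden):
    # placeholder: a real CED would return (hidden_vector, reconstruction)
    return np.zeros(hidden), np.zeros_like(vec)

def split(t, k):                          # split a cube into k^3 sub-cubes
    s = t.shape[0] // k
    return [t[i*s:(i+1)*s, j*s:(j+1)*s, l*s:(l+1)*s]
            for i in range(k) for j in range(k) for l in range(k)]

m, n = 16, 64
x = np.random.rand(m, m, m)

h0, rec = ced(x.ravel(), n)               # level 0: one m^3 vector, n cells
e1 = x - rec.reshape(x.shape)             # first difference tensor
h1 = [ced(st.ravel(), n // 2)[0] for st in split(e1, 2)]   # 8 x (n/2) = 4n
# (the second level repeats this with 64 sub-tensors and n/4 hidden cells)
print(len(h0), len(np.concatenate(h1)))   # 64, 256: the n and 4n outputs
```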

The second approach is aimed at the compression of tensor sequences (for example, color RGB images), as illustrated in Fig. 1.9. Here, a tensor decomposition is shown based on the 2-level 3D Branched IDP (3D BIDP), with level numbers p = 0, 1 [17], in correspondence with the block diagram from Fig. 1.3. Each tensor is of size M × N × 3, i.e., it contains 3 matrices of size M × N; the input sequence for k = 1 (a group of 2k + 1 = 3 tensors) is denoted as \({\boldsymbol{X}}_{t - 1}\), \({\boldsymbol{X}}_{t}\), \({\boldsymbol{X}}_{t + 1}\), corresponding to the time moments t − 1, t, t + 1. For the moment t, the tensor \({\boldsymbol{X}}_{t}\) arrives at the input of the trained CED, and at the output the approximated tensor \({\hat{\boldsymbol{X}}}_{t}\) is obtained. The approximation accuracy depends both on the CED training and on the number of neurons (n) in the hidden layer. These neurons are the components of the corresponding output n-dimensional vector, s.

Fig. 1.9 Block diagram of the two-level 3D BIDP based on the convolutional Encoder/Decoder

As a result of the 3D BIDP implementation for the tensor sequence \({\boldsymbol{X}}_{t - 1}\), \({\boldsymbol{X}}_{t}\), \({\boldsymbol{X}}_{t + 1}\), the output vectors \({\boldsymbol{s}}_{0,t}\), \({\boldsymbol{s}}_{1,t}^{i}\), \(\Delta {\boldsymbol{s}}_{1,t - 1}^{i}\), and \(\Delta {\boldsymbol{s}}_{1,t + 1}^{i}\) for i = 1, 2, 3, 4 are obtained, of total length n + 4n + 4n + 4n = 13n. Due to the high correlation between the sequential tensors \({\boldsymbol{X}}_{t - 1}\), \({\boldsymbol{X}}_{t}\), \({\boldsymbol{X}}_{t + 1}\), the values of a significant part of the components in the difference vectors \(\Delta {\boldsymbol{s}}_{1,t - 1}^{i}\) and \(\Delta {\boldsymbol{s}}_{1,t + 1}^{i}\) are close to zero. In this way, the input tensors are transformed into an output vector of small length which contains many zero values, i.e., the feature space is reduced at minimum computational cost. For the reduction of the calculations, the mutual correlation between the tensors is used, which determines the relation \({\hat{\boldsymbol{X}}}_{t} \approx {\hat{\boldsymbol{X}}}_{t - 1} \approx {\hat{\boldsymbol{X}}}_{t + 1}\). As a result, the calculation of the tensors \({\hat{\boldsymbol{X}}}_{t - 1}\) and \({\hat{\boldsymbol{X}}}_{t + 1}\) through the corresponding CEDs is not necessary. In the second level (p = 1) of the 3D BIDP, each difference tensor \({\boldsymbol{E}}_{0,t} = {\boldsymbol{X}}_{t} - {\hat{\boldsymbol{X}}}_{t}\), \({\boldsymbol{E}}_{0,t - 1} = {\boldsymbol{X}}_{t - 1} - {\hat{\boldsymbol{X}}}_{t}\), and \({\boldsymbol{E}}_{0,t + 1} = {\boldsymbol{X}}_{t + 1} - {\hat{\boldsymbol{X}}}_{t}\) is divided into four sub-tensors of size (M/2) × (N/2) × 3, and for each a corresponding CED is used. The neurons in the hidden layers of all CEDs in the level p = 1 are represented by the vectors \({\boldsymbol{s}}_{1,t}\), \({\boldsymbol{s}}_{1,t - 1}\), and \({\boldsymbol{s}}_{1,t + 1}\).
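
The effect of the inter-tensor correlation on the level-1 outputs can be illustrated numerically; the vector lengths and the noise level below are invented for the illustration only.

```python
# Because the neighboring tensors are highly correlated, their hidden-layer
# vectors differ only slightly, so the transmitted differences are near zero.
import numpy as np

n = 16
s1_t  = [np.random.rand(n) for _ in range(4)]               # vectors s_1,t^i
s1_tm = [v + 1e-3 * np.random.randn(n) for v in s1_t]       # s_1,t-1^i (close)
s1_tp = [v + 1e-3 * np.random.randn(n) for v in s1_t]       # s_1,t+1^i (close)

ds_tm = [a - b for a, b in zip(s1_tm, s1_t)]                # delta s_1,t-1^i
ds_tp = [a - b for a, b in zip(s1_tp, s1_t)]                # delta s_1,t+1^i
print(np.abs(np.concatenate(ds_tm)).max())                  # near-zero values
```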

Applications

In this part of our work, the experimental results obtained for some of the most important applications are shown: compression and content protection of images, and efficient object search in large image databases.

1.5.1 Compression of Multidimensional Images

The compression algorithms are developed both for single images and for groups of correlated images or sequences. The approach is based on the IDP decomposition with Back-Propagation Neural Networks (BPNN). Table 1.1 shows some comparison results obtained for several widely used test images, when approximately the same quality of the restored images is achieved. The results for the IDP-BPNN decomposition are given for a two-level IDP with initial sub-blocks of size 8 × 8. As can be noticed, the compression ratio of IDP-BPNN for most of the test images is approximately two times higher, while the quality of the restored images (evaluated by their PSNR) is close to or better than that for JPEG 2000.

Table 1.1 Comparison results for JPEG2000 and IDP-BPNN

The use of the BIDP for sequences or groups of correlated images offers similar results, part of which are given in Table 1.2. For the evaluation, various kinds of medical images were used. The following abbreviations are used: CT—computer tomography; MG—mammography; NM—nuclear magnetic; US—ultrasound; dcm—DICOM; and jp2—JPEG2000. The “idp” format was specially created for the IDP decomposition [18]. The header contains detailed information about the decomposition structure: the number of levels, the transform used for each level (for example, DCT, WHT, etc.), the arrangement of the retained coefficients, and the kind of lossless compression applied on the compressed image data. The lossless compression method is based on adaptive run-length coding, which corresponds to the data statistics. The results show that IDP offers a higher compression ratio than DICOM and is comparable in efficiency with JPEG 2000, but at lower computational cost (the wavelet transform is more complicated than the WHT or DCT). An additional advantage of the IDP is the sub-block structure, which offers high flexibility in observing medical images and permits enlargement (on request) of a selected Region Of Interest (ROI). This is an important feature for remote diagnostics and medical decision support applications.

Table 1.2 Compression results for various kinds of medical images
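
Purely as an illustration of the kind of header described for the “idp” format, the sketch below packs the number of levels, per-level transform identifiers, the retained-coefficient counts, and a lossless-coder tag; all field names and the byte layout are invented, not the actual format [18].

```python
# Hypothetical header layout for an IDP-coded file: level count, then one
# (transform id, retained coefficients) record per level, then the coder tag.
import struct

def pack_idp_header(levels, transforms, retained, lossless='ARLE+MH'):
    blob = struct.pack('<B', levels)
    for t, r in zip(transforms, retained):
        blob += struct.pack('<4sH', t.encode().ljust(4), r)  # transform, coeffs
    return blob + lossless.encode()

header = pack_idp_header(2, ['WHT', 'DCT'], [4, 16])
print(header)
```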

1.5.2 Content Protection of Visual Information

The IDP-based decomposition permits the insertion of an invisible, resistant watermark (WM) [19]. The block diagram of a two-level WM insertion algorithm is shown in Fig. 1.10. In this case, the IDP structure is retained, but the decomposition is based on the Complex Hadamard Transform (CHT). The watermark is prepared in the same way as the image (i.e., decomposed into 2 or 3 levels), and in the watermarking process its coefficients are added to the image decomposition coefficients. To retain the image quality unchanged (i.e., “invisible” watermarking), the WM information is inserted in the phases of selected coefficients, and can be extracted by using special decoding software. The watermark “depth” depends on the phase rotation angle; for example, for rotations in the range 0–20°, the PSNR is always higher than 35 dB, i.e., the visual quality of the watermarked test images is retained. In Fig. 1.10, the following notations are used: w0(r)—the decomposed watermark data for level 0; w1(r)—the decomposed watermark data for level 1; Z0(r)—the calculated coefficients for the initial (zero) level, with the inserted watermark; Z1(r)—the calculated coefficients for level 1, with the inserted watermark. For a decomposition with a higher number of levels, the structure is retained. The so-calculated coefficients are arranged following their spatial frequency and are losslessly coded. With this, the coding procedure is finished. The decoding is executed in reverse order.

Fig. 1.10 Block diagram for 2-layer WM insertion in one IDP sub-block of size 8 × 8
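
The phase-insertion step can be sketched as follows: each watermark bit rotates the phase of one selected complex coefficient by a small angle. The FFT here is only a stand-in for the CHT of [19], and the coefficient selection rule is an assumption.

```python
# Phase-based WM insertion: watermark bits rotate the phases of selected
# complex spectrum coefficients by +/- angle_deg degrees.
import numpy as np

def embed_wm(block, bits, angle_deg=15):
    s = np.fft.fft2(block)
    idx = list(zip(*np.unravel_index(np.arange(1, len(bits) + 1), s.shape)))
    for (u, v), bit in zip(idx, bits):
        s[u, v] *= np.exp(1j * np.deg2rad(angle_deg if bit else -angle_deg))
    return np.real(np.fft.ifft2(s)), idx   # idx is needed for WM extraction

block = np.random.rand(8, 8)
marked, idx = embed_wm(block, [1, 0, 1, 1])
print(np.abs(marked - block).max())        # small change: "invisible" insertion
```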

The main advantages of the algorithm [19] are:

  • The algorithm is highly resistant against attacks based on high-frequency filtration (JPEG compression), which is confirmed by the almost constant MSE value of the extracted watermark.

  • The algorithm permits the insertion of significant amounts of data (the number of inserted bits could be approximately equal to ¾ of the total number of pixels).

  • The algorithm is highly resistant against attacks related to image editing of the kind: crop, rotations, etc.

  • The algorithm permits the insertion of different watermarks in each consecutive decomposition level, which is an additional tool to ensure hierarchical access control.

1.5.3 Object Search in Large Image Databases

Contemporary databases contain huge numbers of images, video sequences, etc., and sometimes the search takes too much time. The layered structure of the IDP-based decomposition offers significant abilities for the enhancement of the search process.

For this, the images in the database and the query image are decomposed in a similar way. The retained decomposition coefficients from all decomposition levels, which represent the query image, constitute the “Cognitive 3D IDP model” (Fig. 1.11). The retained coefficients are used for the layer-by-layer comparison and evaluation in the search process.

Fig. 1.11 Creation of the cognitive 3D IDP model

In case the search is aimed at a specified group of images, the method permits selecting in advance the most suitable group of coefficients (those which ensure the highest similarity), so as to enhance the process significantly.

The model is based on the n-level IDP decomposition under Neural Network (NN) control, shown in Fig. 1.12. The accuracy of the 3D model in the IDP level p is defined by the NN in the preceding level p − 1; as a result, the minimum mean-square approximation error in the restored image for the corresponding level is obtained.

Fig. 1.12 Block diagram of the algorithm for layered IDP-based object search

The comparison starts in the initial (lowest) decomposition level. The similarity of the first approximations of the query image and the images in the database is evaluated, and only the closest images are retained for the search in the next level. Thus, the number of analyzed images in each consecutive level is reduced, which enhances the efficiency of the process.
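
A compact sketch of this coarse-to-fine filtering is given below; the L1 distance on the retained coefficients and the fraction of candidates kept per level are assumptions.

```python
# Layered search: compare images level by level on their retained IDP
# coefficients, keeping only the closest candidates for the next level.
import numpy as np

def layered_search(query_levels, db_levels, keep_ratio=0.25):
    candidates = list(range(len(db_levels)))
    for lvl, q in enumerate(query_levels):
        d = [(np.abs(db_levels[i][lvl] - q).sum(), i) for i in candidates]
        d.sort()
        candidates = [i for _, i in d[:max(1, int(len(d) * keep_ratio))]]
    return candidates                      # indices of the best-matching images

# toy data: 100 database images, 3 decomposition levels of coefficient vectors
db = [[np.random.rand(4), np.random.rand(16), np.random.rand(64)]
      for _ in range(100)]
query = db[42]                             # the query equals one stored image
print(layered_search(query, db))           # [42] survives all levels
```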

1.6 Conclusions

In this work, the basic IDP implementations in various aspects of visual information processing and analysis are discussed, starting with single images and upgrading to the third-order tensor representation of multidimensional visual information. The main advantages of the IDP compared to other well-known decompositions are the lower computational complexity, the feasible hardware implementation, and the application flexibility. Very good results are achieved in the processing and compression of correlated sequences of medical images.

The analysis and the related experiments show that the basic pyramidal structure of the IDP decomposition suits the 2D and 3D (tensor) implementations in various applications: compression of visual information, content protection, efficient object search, etc. The layered architecture permits adaptive processing and a flexible approach, with a changeable structure.

Future work will be aimed mostly at the implementation of new medical devices based on the IDP decomposition, which will support remote diagnostics and medical decision support integrated in contemporary smart communications. Of special interest are some new technologies which need processing of continuously changing information, among which are Digital Twins and the Mip-Map technology. The flexibility of the IDP structure permits the easy development of adaptive decomposition architectures.