
3.1 Introduction

The automation of scientific research based on video recording, machine vision, and video coding applications requires that changes in a sequence of frames be detected without an observer's assistance. Variations in object brightness, color, and size are easily detected by the Mean-Square Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) energy criteria [1]. However, a number of tasks require algorithms that react to structural (texture) variations of images. Generally, an image region with variable structure is formless.

Another problem of vision systems is the variation of observation conditions, as well as interference. Its solution relies on highly robust, interference-immune decision-making algorithms. The detection of variations in the structure of an image segment is based on spectral and correlation analysis in the spatial-temporal domain. At present, quasi-optimal heuristic algorithms exist that exploit variations of correlation features. However, their spectra in various bases are not invariant to segment movement and to changes of texture features.

The structural differences of images can be determined by various techniques. Figure 3.1 shows the criteria and metrics that form a basis for detecting these differences. The authors analyzed the difference estimation methods and algorithms for images represented by the numbered blocks.

Fig. 3.1

Methods and algorithms for structural image difference detection, where 1 are algorithms based on MSSIM, MNSSIM1, and MNSSIM2 structural criteria, 2 are algorithms based on difference of generalized orthogonal basis spectra, 3 are algorithms based on difference of correlation features of moving video sequence fragments, 4 are spectral algorithms for texture anisotropy detection and estimation, 5 is a spectral algorithm for moving object detection, 6 are algorithms based on the Kullback and the Bhattacharyya information metrics

The MSSIM criterion [2-4] and its modification MNSSIM [5] are tri-factor functionals that respond to changes of brightness, contrast, and correlation features of an image. Therefore, the MSSIM, the MNSSIM, and other modifications are energy criteria for detecting image variations. They are sensitive to texture variations as well. The growing popularity of these criteria is explained by their good agreement with the human visual system.

The image as a whole or its separate blocks can be expanded in a generalized Fourier series by using the system of orthogonal functions \({\upvarphi }_{km} (x,y)\). Among the multitude of orthogonal bases, the following can be emphasized: the Discrete Cosine Transform (DCT) [1] and its integer variant called the pseudo-cosine transform [6], the Walsh-Hadamard transform [1], and wavelet transforms [7, 8]. In the researches [9-12], the class of discrete polynomial transforms and its easily computed version, the discrete Chebyshev transform or Generalized DCT (GDCT), were proposed. The GDCT has a number of special properties that allow efficient image processing.

The current research has proved that spectral algorithms can serve as a base for detectors of structural image variations that are robust to changes of observation conditions and to interference. Note that the MNSSIM algorithms and the spectral algorithms are computationally quite simple.

The next Sect. 3.2 covers examples of using the MSSIM and the MNSSIM. Section 3.3 provides the description of spectral criteria of structural image similarity. Spectral field variation detection is discussed in Sect. 3.4. Experimental confirmation of the structural similarity criteria is presented in Sect. 3.5. Conclusions are drawn in Sect. 3.6.

3.2 Pixel Structural Similarity Criteria

The detection and estimation of the structural similarity of two images with uncertain structures is a crucial issue in computer vision. Random images or frames of a video sequence can be analyzed. This task cannot be fully formalized and has not yet been solved unambiguously. Objective structural similarity criteria may be classified as follows: single-factor criteria, multi-factor criteria, and integral criteria combining the two.

One of the simplest single-factor criteria is the mean-square deviation, the MSE. The MSE is determined by Eq. 3.1 for a separate image brightness or color component, where \(X,Y\) are the images under comparison (\(X = \left( {x_{ij} } \right)\), \(Y = \left( {y_{ij} } \right)\), \(i = 1 \ldots n\), \(j = 1 \ldots m\)).

$$MSE(X,Y) = \frac{{\sum\nolimits_{i = 1,j = 1}^{n,m} {(x_{ij} - y_{ij} )^{2} } }}{n \cdot m}$$
(3.1)

This criterion correlates poorly with the human visual system. According to the MSE criterion, two images differ from each other even if the brightness is reduced by only 5 % (the human visual system does not notice this, while the brightness settings of different computer screens vary much more). At the same time, images with pronounced color variation of separate points, mild stripes, or frequency distortion resulting in a loss of sharpness will be recognized as "almost unchanged".

The PSNR, widely used nowadays, is expressed for a separate brightness or color component of an image by Eq. 3.2, where \(MAX_{I}\) is the maximum value assumed by the image component element.

$$PSNR(X,Y) = 10 \cdot \log \left( {\frac{{MAX_{I}^{2} }}{MSE}} \right) = 10 \cdot \log_{10} \frac{{255^{2} \cdot n \cdot m}}{{\sum\nolimits_{i = 1,j = 1}^{n,m} {(x_{ij} - y_{ij} )^{2} } }}$$
(3.2)

For an RGB image, each component R, G, or B occupies 8 bits and, hence, \(MAX_{I} = 2^{8} - 1 = 255\) for this image. This measure is convenient due to its logarithmic scale. It has the same drawbacks as the root-mean-square deviation [1, 13].
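The two criteria above are straightforward to compute. The sketch below (Python with NumPy, added here for illustration only) implements Eqs. 3.1-3.2 and shows that a uniform brightness shift of 5 gives the same MSE regardless of image content:

```python
import numpy as np

def mse(x, y):
    """Mean-square error between two equal-sized images (Eq. 3.1)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    return np.mean((x - y) ** 2)

def psnr(x, y, max_i=255.0):
    """Peak signal-to-noise ratio in dB (Eq. 3.2)."""
    e = mse(x, y)
    if e == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(max_i ** 2 / e)

# A uniform brightness shift of 5 gives MSE = 25 whatever the content,
# illustrating that the MSE measures error energy, not visibility.
img = np.full((8, 8), 100, dtype=np.uint8)
shifted = img + 5
print(mse(img, shifted))                 # 25.0
print(round(psnr(img, shifted), 2))      # 34.15
```

The example confirms the point made above: the MSE depends only on the magnitude of the error, not on how perceptible it is.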

In a number of cases, the criterion should account for the variation of all image colors. The total PSNR for a full-color RGB image, PSNRRGB, is calculated from the summed squared error of the components provided by Eq. 3.3, where the maximum value is \(\hbox{max}\,S_{\varSigma }^{2} = 3 \cdot 255^{2} \cdot n^{2} \cdot \nu\) (\(n \times n\) is the number of pixels in a block; for simplicity, the blocks are assumed to be square), \(\upnu\) is the number of blocks, \(\upeta = 1 \ldots\upnu\) is the number of the current block, and \((x_{i,\,j}^{{\left(\upeta \right)}} ,\;y_{i,\,j}^{{\left(\upeta \right)}} )_{R,G,B}\) are the brightnesses of the R, G, B components of the pixels of the blocks of the images under comparison.

$$S_{\varSigma }^{2} = \sum\limits_{\upeta = 1}^{\upnu} \left\{ \sum\limits_{i,\,j = 1}^{n} \left( x_{i,\,j}^{\left( \upeta \right)} - y_{i,\,j}^{\left( \upeta \right)} \right)_{R}^{2} + \sum\limits_{i,\,j = 1}^{n} \left( x_{i,\,j}^{\left( \upeta \right)} - y_{i,\,j}^{\left( \upeta \right)} \right)_{G}^{2} + \sum\limits_{i,\,j = 1}^{n} \left( x_{i,\,j}^{\left( \upeta \right)} - y_{i,\,j}^{\left( \upeta \right)} \right)_{B}^{2} \right\}$$
(3.3)

Hence, using Eq. 3.3, the PSNRRGB can be written as Eq. 3.4.

$$PSNR_{RGB} = 10 \cdot \lg \left( {{\raise0.7ex\hbox{${\hbox{max} S_{\varSigma }^{2} }$} \!\mathord{\left/ {\vphantom {{\hbox{max} S_{\varSigma }^{2} } {S_{\varSigma }^{2} }}}\right.\kern-0pt} \!\lower0.7ex\hbox{${S_{\varSigma }^{2} }$}}} \right) = 10 \cdot \lg \left( {\frac{{3 \cdot 255^{2} \cdot n^{2} \cdot\upnu}}{{S_{\varSigma }^{2} }}} \right)$$
(3.4)

The PSNR can be calculated for images in the YUV color format and other formats by equations similar to Eqs. 3.3-3.4. The combined criterion PSNRRGB provides an image similarity estimate that is more relevant for human vision. However, Eqs. 3.1-3.4 are only slightly sensitive to changes of image texture.
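A minimal sketch of Eqs. 3.3-3.4 (Python/NumPy, illustrative; the (H, W, 3) array layout is our assumption, and for simplicity the whole image is treated as a single block):

```python
import numpy as np

def psnr_rgb(x, y):
    """Combined PSNR over the R, G, B planes (Eqs. 3.3-3.4).

    x, y: arrays of shape (H, W, 3) with 8-bit components."""
    diff = x.astype(np.float64) - y.astype(np.float64)
    s2 = np.sum(diff ** 2)                       # summed squared error, Eq. 3.3
    s2_max = 3 * 255.0 ** 2 * x.shape[0] * x.shape[1]
    return 10.0 * np.log10(s2_max / s2)

# Distorting only the blue plane still lowers the combined measure.
x = np.zeros((4, 4, 3), dtype=np.uint8)
y = x.copy()
y[..., 2] += 5
print(round(psnr_rgb(x, y), 2))   # 38.92
```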

At the moment, one of the criteria closest to the subjective perception of recovered image quality is the MSSIM [2-4, 14]. It characterizes the similarity of images X and Y by brightness, contrast, and structure, i.e. it is tri-factor. It appears as Eq. 3.5, where \(X_{\upeta} ,Y_{\upeta}\) are the images compared in the block with number \(\upeta = 1 \ldots\upnu\), and \(\upnu\) is the number of blocks.

$$MSSIM = \frac{1}{\upnu} \cdot \sum\limits_{{\upeta = 1}}^{\upnu} {SSIM(X_{\upeta} ,Y_{\upeta} )}$$
(3.5)

The SSIM criterion is a block criterion. This means that it is applied not to the whole image at once but to its separate parts, equal blocks of the image, and the per-block values are then averaged over all computed blocks, producing the resulting MSSIM value for the whole image. In the general case, the SSIM(X, Y) value for each block is calculated by Eq. 3.6, where \(l(x,y)\) is a brightness comparison functional, \(c(x,y)\) is a contrast comparison functional, \(s(x,y)\) is a structure comparison functional, and \(\upalpha\), \(\upbeta\), \(\upgamma\) are the control coefficients.

$$SSIM(X,Y) = l(x,y)^{\upalpha} c(x,y)^{\upbeta} s(x,y)^{\upgamma}$$
(3.6)

With the control coefficients α = β = γ = 1 [4], the comparison functionals in the blocks are calculated by Eq. 3.7.

$$l(x,y) = \frac{{2\upmu_{x}\upmu_{y} + C_{1} }}{{\upmu_{x}^{2} +\upmu_{y}^{2} + C_{1} }}\quad c(x,y) = \frac{{2\upsigma_{x}\upsigma_{y} + C_{2} }}{{\upsigma_{x}^{2} +\upsigma_{y}^{2} + C_{2} }}\quad s(x,y) = \frac{{\upsigma_{xy} + C_{3} }}{{\upsigma_{x}\upsigma_{y} + C_{3} }}$$
(3.7)

Here variables have the following meanings:

– \(\upmu_{X} ,\upmu_{Y}\) are the sample means for image blocks \(X_{\upeta}\) and \(Y_{\upeta}\), respectively,

$$\mu_{X} = \frac{1}{{N^{2} }} \cdot \sum\limits_{i,\,j = 0}^{N - 1} {x_{ij} } \quad \mu_{Y} = \frac{1}{{N^{2} }} \cdot \sum\limits_{i,\,j = 0}^{N - 1} {y_{ij} } .$$

– \(\upsigma_{X}^{2} ,\upsigma_{Y}^{2}\) are the sample variances for image blocks \(X_{\upeta}\) and \(Y_{\upeta}\), respectively,

$$\upsigma_{X}^{2} = \frac{1}{{N^{2} }} \cdot \sum\limits_{i,\,j = 0}^{N - 1} {(x_{ij} - \mu_{X} )^{2} }\;\upsigma_{Y}^{2} = \frac{1}{{N^{2} }} \cdot \sum\limits_{i,\,j = 0}^{N - 1} {(y_{ij} - \mu_{Y} )^{2} }.$$

– \(\upsigma_{XY}\) is the correlation moment between image blocks \(X_{\upeta}\) and \(Y_{\upeta}\),

$$\upsigma_{XY} = \frac{1}{{N^{2} }} \cdot \sum\limits_{i,\,j = 0}^{N - 1} {(x_{ij} - \mu_{X} )(y_{ij} - \mu_{Y} )} .$$

Up to the constant \(C_{3}\), the functional \(s(x,y)\) coincides with Pearson's sample correlation coefficient. \(C_{1}\), \(C_{2}\), \(C_{3}\) are small constants preventing incorrect behavior of the criterion when the moments vanish. In accordance with [2-4], one can assume \(C_{1} = \left( {0.01L} \right)^{2}\), \(C_{2} = \left( {0.03L} \right)^{2}\), \(C_{3} = 0.5C_{2}\), where L is the dynamic range of the pixel values.

The MSSIM criterion assumes values from −1 to 1. The value 1 is obtained only when an image is compared with itself. Thus, the closer a compared image is to the original, the closer the criterion value is to 1.
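The block scheme of Eqs. 3.5-3.7 can be sketched as follows (Python/NumPy, illustrative; the 8 × 8 non-overlapping block split is our choice, not prescribed by the criterion):

```python
import numpy as np

def ssim_block(x, y, L=255.0):
    """SSIM for one block with alpha = beta = gamma = 1 (Eqs. 3.6-3.7)."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    C3 = 0.5 * C2
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()
    l = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)   # brightness
    c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)   # contrast
    s = (sxy + C3) / (sx * sy + C3)                     # structure
    return l * c * s

def mssim(X, Y, n=8):
    """Average SSIM over non-overlapping n-by-n blocks (Eq. 3.5)."""
    vals = [ssim_block(X[i:i + n, j:j + n].astype(np.float64),
                       Y[i:i + n, j:j + n].astype(np.float64))
            for i in range(0, X.shape[0] - n + 1, n)
            for j in range(0, X.shape[1] - n + 1, n)]
    return float(np.mean(vals))

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
print(mssim(img, img))   # close to 1 for identical images
```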

The selection of the functional \(s(x,y)\) in Eq. 3.7 as a measure of structural difference is mostly justified when the compared vectors have values with a multivariate Gaussian distribution. Therefore, the MSSIM criterion perfectly distinguishes textures in the form of Gaussian noise. When the laws of distortion distribution are unknown (non-Gaussian), it is reasonable to apply estimates of the respective structural characteristics based on nonparametric statistics [15]. To estimate the mean brightness, it is efficient to use the sample median, which is more stable than the sample mean, and one of the rank correlation coefficients instead of Pearson's correlation coefficient. Two modifications of the structural similarity criterion based on nonparametric statistics, the MNSSIM1 and the MNSSIM2, were proposed in [5, 7].

For the MNSSIM1, the functionals below (Eq. 3.8) are suggested instead of Eq. 3.7:

$$l(x,y) = \frac{{2m_{x} m_{y} + C_{1} }}{{m_{x}^{2} + m_{y}^{2} + C_{1} }}\quad c(x,y) = \frac{{2{\upsigma}m_{x}\upsigma m_{y} + C_{2} }}{{{\upsigma}m_{x}^{2} + \, {\upsigma}m_{y}^{2} + C_{2} }}\quad s(x,y) = Rs(x,y),$$
(3.8)

where \(m_{x} = {\text{median}}(\vec{x})\) and \(m_{y} = {\text{median}}(\vec{y})\) are the medians of the brightness vectors in image blocks x and y, respectively, \(\sigma m_{x}^{2} = {\text{median}}\left[ {(\vec{x} - m_{x} )^{2} } \right]\) and \(\sigma m_{y}^{2} = {\text{median}}\left[ {(\vec{y} - m_{y} )^{2} } \right]\) are the medians of the squared deviations of the brightness vectors from the medians, and Rs(x, y) is Spearman's rank correlation coefficient [15]. The Rs values range from −1 to 1, with Rs = 0 meaning the absence of correlation.

The functionals (Eq. 3.9) are used in the MNSSIM2.

$$l(x,y) = \frac{{2m_{x} m_{y} + C_{1} }}{{m_{x}^{2} + m_{y}^{2} + C_{1} }}\quad c(x,y) = \frac{{2\upsigma_{x}\upsigma_{y} + C_{2} }}{{\upsigma_{x}^{2} +\upsigma_{y}^{2} + C_{2} }}\quad s(x,y) = Rs(x,y)$$
(3.9)

In other words, the contrast comparison functional \(c(x,y)\) remains the same as in the MSSIM (Eq. 3.7), while the structure comparison functional \(s(x,y)\) and the brightness comparison functional \(l(x,y)\) are those used in the MNSSIM1 (Eq. 3.8). The constants \(C_{1}\) and \(C_{2}\) in Eqs. 3.8-3.9 are identical to those used to calculate the MSSIM [2, 4]. The MNSSIM2 criterion is computationally simpler than the MNSSIM1.

Therefore, the MNSSIM1 and the MNSSIM2 are tri-factor criteria that use nonparametric estimations of random field parameters. The nonparametric MNSSIM1 and MNSSIM2 structural similarity criteria are practically identical to the MSSIM criterion in the presence of Gaussian distortion. However, if point interference, other interference with non-Gaussian statistics, or block distortion takes place, then the MNSSIM1 and MNSSIM2 metrics match subjective human perception better than the MSSIM estimation.
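A sketch of the MNSSIM1 functionals of Eq. 3.8 for one block (Python/NumPy, illustrative; the tie-averaged ranking used for Spearman's Rs is a standard choice, and the helper names are ours):

```python
import numpy as np

def rankdata(v):
    """Average ranks with tie handling (needed for Spearman's Rs)."""
    order = np.argsort(v, kind="stable")
    sv = v[order]
    ranks = np.empty(v.size, dtype=np.float64)
    i = 0
    while i < v.size:
        j = i
        while j + 1 < v.size and sv[j + 1] == sv[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0   # average rank of the tie group
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's Rs = Pearson correlation of the ranks."""
    rx, ry = rankdata(x), rankdata(y)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))

def mnssim1_block(x, y, L=255.0):
    """MNSSIM1 functionals for one block (Eq. 3.8)."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = np.median(x), np.median(y)
    smx = np.sqrt(np.median((x - mx) ** 2))
    smy = np.sqrt(np.median((y - my) ** 2))
    l = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)
    c = (2 * smx * smy + C2) / (smx ** 2 + smy ** 2 + C2)
    s = spearman(x.ravel(), y.ravel())
    return l * c * s

rng = np.random.default_rng(0)
x = rng.random((8, 8)) * 255
print(round(mnssim1_block(x, x), 6))   # 1.0
```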

3.3 Spectral Criteria of Structural Image Similarity

The image in a block can be expanded into a generalized Fourier series by the system of continuous \({\upvarphi }_{km} (x,y)\) or discrete orthogonal functions \({\upvarphi }_{km} (i,\,j)\) [10-12, 17-19].

Let us examine the main types and features of orthogonal transforms for continuous arguments. A sequence of functions \(\left\{ {{\upvarphi }_{k} (z)} \right\}\), \(k = 0,1, \ldots \infty\) is called orthonormal with respect to the weight \(\uprho(z)\) provided that Eq. 3.10 holds, where \(\updelta_{mn}\) is the Kronecker symbol and z is a dimensionless value.

$$({\upvarphi }_{m} ,{\upvarphi }_{n} ) = \int {\upvarphi_{m} (z)\upvarphi_{n} (z)} \;\uprho(z)dz =\updelta_{mn}$$
(3.10)

For the weight function \(\uprho(z)\), the relations \(\uprho(z) \ge 0\) and \(0 \le \int_{a}^{b} {\uprho(z)dz < \infty }\) hold. The system of functions \(\left\{ {{\upvarphi }_{k} (z)} \right\}\), \(\left\{ {\psi_{k} (z)} \right\}\), \(k = 0,1, \ldots ,k^{\prime}\) is called bi-orthogonal with respect to the weight \(\uprho(z)\) if Eq. 3.11 holds.

$$({\upvarphi }_{m} ,\psi_{n} ) = \int {{\upvarphi }_{m} (z)\psi_{n} (z)}\uprho(z)dz =\updelta_{mn}$$
(3.11)

The full and closed set of functions \(\left\{ {{\upvarphi }_{k} (z)} \right\}\) is of great interest.

In the following Sects. 3.3.1 and 3.3.2, the polynomial transform and the discrete transforms are discussed, respectively.

3.3.1 Polynomial Transforms

An important class of functions \(\left\{ {{\upvarphi }_{k} (z)} \right\}\) with the properties of orthogonality, completeness, and closure is the orthogonal polynomials \(p_{k} (z)\) satisfying Eq. 3.12, where \(d_{m}\) is the norm of the polynomial \(p_{m} (z)\).

$$\int\limits_{a}^{b} {\uprho(z)p_{m} (z)p_{k} (z)dz = d_{m}\updelta_{km} }$$
(3.12)

To expand the signals, one can apply the following conventional orthogonal polynomials [11, 20, 21]:

  • The Hermitian polynomials \(H_{m} (z)\), Eq. 3.13.

$$\rho (z) = \exp ( - z^{2} )\quad p_{m} (z) = H_{m} (z)\quad - \infty < z < \infty$$
(3.13)
  • The Laguerre polynomials, Eq. 3.14.

$$\uprho(z) = z^{\upalpha} \exp ( - z)\quad p_{m} = L_{m}^{\upalpha} (z)\quad 0 \le z < \infty$$
(3.14)
  • The Jacobi polynomials \(p_{m} (z) = P_{m}^{{(\upalpha,\upbeta)}} (z){\kern 1pt}\), Eq. 3.15.

$$\uprho(z) = (1 - z)^{\upalpha} (1 + z)^{\upbeta} {\kern 1pt} \quad\upalpha > - 1\quad\upbeta > - 1\quad - 1 \le z \le 1\quad p_{m} (z) = P_{m}^{{(\upalpha,\upbeta)}} (z)$$
(3.15)

The Jacobi polynomials form a wide group of orthogonal polynomials. The important particular cases of Jacobi polynomials are:

  • The Legendre polynomials

$$P_{n} (z)\quad {\kern 1pt} (\upalpha =\upbeta = 0)\quad\uprho(z) = 1.$$
  • The Chebyshev polynomials of the 1st and 2nd types provided by Eq. 3.16.

$$\begin{aligned} & T_{n} (z)\quad (\upalpha =\upbeta = - 1/2)\quad\uprho(z) = (1 - z^{2} )^{ - 1/2} \\ & U_{n} (z)\quad (\upalpha =\upbeta = 1/2)\quad\uprho(z) = (1 - z^{2} )^{1/2} \\ \end{aligned}$$
(3.16)

For practical use, the most convenient Jacobi polynomials are those of Legendre and of Chebyshev of the 1st and 2nd types.

The Chebyshev polynomials of the 1st type have a number of useful properties, so their use for signal expansion is very attractive. The Chebyshev polynomial of the 1st type \(T_{m} (z)\), associated with the weight function \(\uprho(z) = 1/\sqrt {1 - z^{2} } {\kern 1pt}\), can be defined in several ways. One of the most convenient definitions is Eq. 3.17.

$$T_{m} (z) = \cos (m \cdot \arccos (z)){\kern 1pt}$$
(3.17)

For the Chebyshev polynomials, \(T_{m} ( - z) = ( - 1)^{m} \cdot T_{m} (z)\). Using the recurrence \(T_{m + 1} (z) = 2 \cdot z \cdot T_{m} (z) - T_{m - 1} (z)\), one can obtain Eq. 3.18.

$$T_{0} (z) = 1 \quad T_{1} (z) = z\quad T_{2} (z) = 2z^{2} - 1\quad T_{3} (z) = 4z^{3} - 3z \quad T_{4} (z) = 8z^{4} - 8z^{2} + 1$$
(3.18)
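The recurrence is convenient for numerical evaluation; a minimal sketch (Python/NumPy, illustrative) checks it against both the closed form of Eq. 3.17 and the explicit polynomials of Eq. 3.18:

```python
import numpy as np

def chebyshev_T(m, z):
    """T_m(z) via the recurrence T_{m+1} = 2 z T_m - T_{m-1} (Eq. 3.18)."""
    z = np.asarray(z, dtype=np.float64)
    t_prev, t = np.ones_like(z), z.copy()   # T_0 and T_1
    if m == 0:
        return t_prev
    for _ in range(m - 1):
        t_prev, t = t, 2 * z * t - t_prev
    return t

z = np.linspace(-1, 1, 5)
# Matches the closed form cos(m * arccos z) of Eq. 3.17 ...
assert np.allclose(chebyshev_T(3, z), np.cos(3 * np.arccos(z)))
# ... and the explicit polynomial 4 z^3 - 3 z
assert np.allclose(chebyshev_T(3, z), 4 * z**3 - 3 * z)
```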

The orthogonality condition of the functions \(T_{m}(z)\) is provided by Eq. 3.19.

$$\int\limits_{ - 1}^{1} {\frac{{T_{m} (z)T_{k} (z)}}{{\sqrt {1 - z^{2} } }}} \;dz = \left\{ {\begin{array}{*{20}l} 0 \hfill & {k \ne m} \hfill \\\uppi \hfill & {k = m = 0} \hfill \\ {\frac{\uppi }{2}} \hfill & {k = m \ne 0} \hfill \\ \end{array} } \right.$$
(3.19)

The zeros of the Chebyshev polynomials are easily determined from Eq. 3.17, which can be re-written as Eq. 3.20.

$$T_{m} (z) = \cos (m \cdot \arccos (z)){\kern 1pt} = 0$$
(3.20)

Based on the above equations, one can get Eq. 3.21.

$$z_{k} = \cos \frac{2k + 1}{2m}\uppi \quad k = 0,1,2, \ldots ,m - 1$$
(3.21)

Expansion of function \(f(z)\) in the Chebyshev polynomials of the 1st type has a form of Eq. 3.22, where \(d_{m} = \left\{ {\begin{array}{*{20}l} {\uppi/2} \hfill & {m \ne 0} \hfill \\\uppi \hfill & {m = 0} \hfill \\ \end{array} } \right.\) is a norm of the Chebyshev polynomials.

$$\begin{aligned} C_{m} & = \frac{1}{{d_{m} }}\int\limits_{ - 1}^{1} {\frac{{f(z)T_{m} (z)}}{{\sqrt {1 - z^{2} } }}dz} {\kern 1pt} \\ f(z) & = \sum\limits_{m = 0}^{\infty } {C_{m} T_{m} (z)} \\ \end{aligned}$$
(3.22)

The expansion in the Chebyshev polynomials of the 1st type \(T_{m} (z)\) converges fastest among all possible expansions in the degrees \(z^{k}\), \(k = 0,1, \ldots \infty\).

The above relations can be generalized to the 2D case. Let \(\varOmega\) be a region of the 2D Euclidean space and \(z = (z_{1} ,z_{2} )\) a point in this space. The orthonormality of the basis functions can be defined by the scalar product (Eq. 3.23).

$$\int\limits_{\Omega } {\uprho(z){\upvarphi }_{km} (z)\uppsi_{r\,n} (z)dz = \delta_{kr} \delta_{mn} }$$
(3.23)

The system of functions \({\upvarphi }_{km} (z)\), \(\uppsi_{r,n} (z)\) is bi-orthogonal. It depends on the form of the weight function \(\uprho(z)\) and the geometry of the region \(\Omega\). Then, for a function \(f(z)\) with finite norm and weight \(\uprho(z)\), two equivalent representations are possible, Eqs. 3.24-3.25.

$$C_{km} = \int\limits_{\Omega } {\uprho(z)f(z){\upvarphi }_{km} (z)dz} \quad f(z) = \sum\limits_{k,m} {C_{km}\uppsi_{km} (z)}$$
(3.24)
$$B_{{r{\kern 1pt} n}} = \int\limits_{\Omega } {\uprho(z)f(z)\uppsi_{{r{\kern 1pt} n}} (z)dz} {\kern 1pt} \quad f(z) = \sum\limits_{r,n} {B_{{r{\kern 1pt} n}} {\upvarphi }_{{r{\kern 1pt} n}} (z)} {\kern 1pt}$$
(3.25)

It should be noted that bi-orthogonality is a typical feature of multi-dimensional expansions.

The processing implementation is greatly simplified when the functions are factorized: \({\upvarphi }_{km} (x,y) = {\upvarphi }_{k} (x){\upvarphi }_{m} (y)\). For a signal \(f(x,y,\tau )\), a pair of transforms is performed by Eq. 3.26, where (x, y) are the non-normalized coordinates of a field point and \(\uptau\) is a vector parameter of shift, rotation, and other affine transformations.

$$f(x,y,\tau ) = \sum\limits_{m} {\sum\limits_{k} {C_{mk} (\uptau){\upvarphi }_{m} (x} )} {\upvarphi }_{k} (y)\quad C_{mk} (\uptau) = \int {\int\limits_{\Omega } {f(x,y,\tau ){\upvarphi }_{m} (x){\upvarphi }_{k} (y)} } dxdy{\kern 1pt}$$
(3.26)

If \(a_{x} ,a_{y}\) denote the typical size of the sub-region \(\Omega\) and \(z_{1} = x/a_{x} ,\;z_{2} = y/a_{y}\), then Eq. 3.26 can be re-written as Eqs. 3.27-3.28, where \(d_{m}\) is the norm of the orthogonal polynomials \(p_{m} (z)\) with weight \(\uprho(z)\).

$$f(x,y) = \sum\limits_{m,k} {C_{mk} p_{m} (x/a_{x} )p_{k} (y/a_{y} )} {\kern 1pt}$$
(3.27)
$$\begin{aligned} C_{mk} & = (d_{m} d_{k} )^{ - 1} \iint\limits_{\Omega} {f(a_{x} z_{1} ,a_{y} z_{2} )\uprho(z_{1} )p_{m} (z_{1} )\uprho(z_{2} )p_{k} (z_{2} )dz_{1} dz_{2} } \\ & = (d_{m} d_{k} )^{ - 1} \int {\uprho(z_{1} )p_{m} (z_{1} )dz_{1} \int {f(a_{x} z_{1} ,a_{y} z_{2} )\uprho(z_{2} )p_{k} (z_{2} )dz_{2} } } \\ \end{aligned}$$
(3.28)

It is obvious from Eqs. 3.27-3.28 that the spectral coefficients are determined by sequential integration over the x and y coordinates, which greatly simplifies the computation.

3.3.2 Discrete Transforms

Let us analyze the similarities and differences of continuous and discrete transforms (Table 3.1). Here \(\left\{ {\upphi_{k} (n)} \right\},{\kern 1pt} {\kern 1pt} {\kern 1pt} n = 0 \ldots N - 1\) is a complex basis of orthonormal vectors. In the general case, it is necessary to use a bi-orthogonal basis \(\left\{ {{\vec{\upvarphi }}_{k} (n)} \right\},\quad \left\{ {{\vec{\psi }}_{k} (n)} \right\}\) to expand a signal.

Table 3.1 The difference of continuous and discrete transforms

Let us consider the vector-matrix notation of the transform. Let the vector \({\mathbf{F}}\) be a column of signal samples, as shown in Eq. 3.29.

$${\mathbf{F}} = \left( {\begin{array}{*{20}c} {f(0)} \\ : \\ {f(N - 1)} \\ \end{array} } \right)$$
(3.29)

The transform matrix can be written by Eq. 3.30.

$${\varvec{\Phi}} = \left( {\begin{array}{*{20}c} {\upvarphi_{0} (0)} & {\upvarphi_{0} (1)} & \ldots & {\upvarphi_{0} (N - 1)} \\ {\upvarphi_{1} (0)} & {\upvarphi_{1} (1)} & \ldots & {\upvarphi_{1} (N - 1)} \\ \ldots & \ldots & \ldots & \ldots \\ {\upvarphi_{N - 1} (0)} & {\upvarphi_{N - 1} (1)} & \ldots & {\upvarphi_{N - 1} (N - 1)} \\ \end{array} } \right)$$
(3.30)

A pair of signal-to-spectrum transform in the matrix form has a view of Eq. 3.31.

$${\mathbf{C = {\varvec{\Phi}} F}}\quad {\mathbf{F = {\varvec{\Phi}} }}^{ - 1} {\mathbf{C}} \Rightarrow {\varvec{\Phi}}^{T} {\mathbf{C}}$$
(3.31)

If \({\varvec{\Phi}}^{ - 1} = {\varvec{\Phi}}^{T}\) and \(\det \left( {\varvec{\Phi}} \right) = \pm 1\), then \({\varvec{\Phi}}\) is an orthogonal matrix and any two of its rows are orthogonal vectors. For bi-orthogonal matrix transforms, Eq. 3.32 can be written.
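This property is easy to verify numerically. The sketch below (Python/NumPy, added for illustration) uses a 2D rotation matrix, the simplest orthogonal matrix, as \({\varvec{\Phi}}\) in Eq. 3.31:

```python
import numpy as np

# Any orthogonal matrix satisfies Phi^{-1} = Phi^T, so the inverse
# transform of Eq. 3.31 reduces to a multiplication by the transpose.
theta = 0.3
Phi = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
F = np.array([1.0, 2.0])       # signal samples
C = Phi @ F                    # forward transform (spectrum)
F_rec = Phi.T @ C              # inverse via the transpose
assert np.allclose(F_rec, F)
assert np.isclose(np.linalg.det(Phi), 1.0)
```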

$${\mathbf{C ={\varvec{\Phi}} F}}\quad {\mathbf{F = \Psi C}}\quad {\mathbf{\Psi{\varvec{\Phi}} }} = 1$$
(3.32)

Let us consider the discrete variant of the 2D transform. In this case, the function is set up on a 2D discrete point grid, Eq. 3.33.

$$f(x,y) \Rightarrow f(i,\,j)\quad i,\,j = 0 \ldots N - 1$$
(3.33)

The samples form a square matrix \({\mathbf{F}} = \left[ {f(i,\,j)} \right]\). The grid can be uniform, or it can be formed non-uniformly by a special law.

A pair of transforms of a signal matrix to a spectral matrix is determined by Eq. 3.34.

$$C_{km} = \sum\limits_{i,\,j}^{{}} {f(i,\,j){\upvarphi }_{km}^{*} (} i,\,j)\quad f(i,\,j) = \sum\limits_{k,m}^{{}} {C_{km} {\upvarphi }_{km} (i,\,j)}$$
(3.34)

If the discrete basis functions are factorized, then Eq. 3.35 holds.

$${\upvarphi }_{km} (i,\,j) = {\upvarphi }_{k} (i){\upvarphi }_{m} (j)$$
(3.35)

The expansion and the synthesis are reduced to serial operations in i and j. In matrix form, this operation is written as Eq. 3.36.

$${\mathbf{C = {\varvec{\Phi}} F{\varvec{\Phi}} }}^{T} \quad {\mathbf{F = {\varvec{\Phi}} }}^{{{\mathbf{ - 1}}}} {\mathbf{C({\varvec{\Phi}} }}^{{{\mathbf{ - 1}}}} {\mathbf{)}}^{T}$$
(3.36)

Let us consider the Discrete Fourier Transform (DFT). The basis orthogonal functions [1, 18, 19] have the form of Eq. 3.37.

$${\tilde{\upvarphi }}_{k} (n) = \exp \left( {j2\uppi\frac{kn}{N}} \right) = W^{kn} \quad W = \exp \left( {j\frac{{2\uppi}}{N}} \right)\quad n = 0 \ldots N - 1\quad k = 0 \ldots N - 1$$
(3.37)

The orthogonality condition and the functions’ norm are defined by Eq. 3.38.

$$\sum\limits_{n = 0}^{N - 1} {W^{kn} W*^{ln} } = \left\{ {\begin{array}{*{20}l} 0 \hfill & {k \ne l} \hfill \\ N \hfill & {k = l} \hfill \\ \end{array} } \right.$$
(3.38)

Then the pair of transforms with non-symmetric normalizing coefficients is computed by Eq. 3.39.

$$C_{k} = \frac{1}{N}\sum\limits_{n = 0}^{N - 1} {f(n)W^{kn} } \quad f(n) = \sum\limits_{k = 0}^{N - 1} {C_{k} W^{ - kn} }$$
(3.39)

For the DFT, there are other variants of the normalizing coefficients before the sums, for example, the expressions in Eq. 3.40.

$$C_{k} = \sum\limits_{n = 0}^{N - 1} {f(n)W^{kn} } {\kern 1pt} \quad f(n) = \frac{1}{N}\sum\limits_{k = 0}^{N - 1} {C_{k} W^{ - kn} }$$
(3.40)

When the DFT is applied, it is essential that the discrete signal \(f(n)\) is considered to be periodically extended with the period N, while the spectrum \(C_{k}\) is also discrete and periodic with the period N. Therefore, a condition is imposed on the boundary frequency \(\Delta f_{m}\) of the continuous signal \(f(t)\) and the sampling step T: \(\Delta f_{m} T \le 0.5\). The DFT with a reduced number of operations is called the Fast Fourier Transform (FFT) [1, 17, 22]. If the signal \(f(n)\) is real-valued, then Eq. 3.41 holds.

$$\left| {C_{k} } \right| = \left| {C_{N - k} } \right|\quad \arg (C_{k} ) = - \arg {\kern 1pt} (C_{N - k} )$$
(3.41)

Therefore, for such a signal the FFT calculates only half of the spectral coefficients.

Let us consider the Discrete Cosine Transform (DCT) in two cases—one-dimensional and two-dimensional DCT.

One-dimensional DCT.

The non-normalized basis functions [1, 18, 19] are determined by Eq. 3.42, where \(n = 0 \ldots N - 1\).

$${\tilde{\upvarphi }}_{k} (n) = \cos \left( {{\uppi}k\frac{2n + 1}{2N}} \right) = \cos \left( {{\uppi}k\frac{n + 0.5}{N}} \right)$$
(3.42)

The parameter N may be either even or odd. However, the transform with even N [1] is most often used in practice. The orthogonality condition for these functions is provided by Eq. 3.43.

$$\sum\limits_{n = 0}^{N - 1} {{\tilde{\upvarphi }}_{k} (n){\tilde{\upvarphi }}_{l} (n)} = \left\{ {\begin{array}{*{20}c} {N\quad\quad\quad\, k = l = 0} \\ {N/2\quad\quad\quad k = l \ne 0} \\ {\!\!\!\!\!\!\!\!\!\!0\quad\quad\quad\, k \ne l} \\ \end{array} } \right.$$
(3.43)

When calculating a signal spectrum, the orthonormal basis functions are applied as shown in Eq. 3.44.

$${\upvarphi }_{k} (n) = \sqrt \frac{2}{N} \cdot g_{k} \cdot \cos \left( {{\uppi}k\frac{n + 0.5}{N}} \right)\quad g_{k} = \left\{ {\begin{array}{*{20}l} {\sqrt {0.5} } \hfill & {k = 0} \hfill \\ 1 \hfill & {k \ne 0} \hfill \\ \end{array} } \right.$$
(3.44)

In this case, the direct and the inverse DCTs are expressed by Eqs. 3.45-3.46.

$$C_{k} = \sqrt \frac{2}{N} \cdot g_{k} \sum\limits_{n = 0}^{N - 1} {f(n)\cos \left( {{\uppi}k\frac{n + 0.5}{N}} \right)}$$
(3.45)
$$f(n) = \sqrt \frac{2}{N} \sum\limits_{k = 0}^{N - 1} {g_{k} C_{k} \cos \left( {{\uppi}k\frac{n + 0.5}{N}} \right)}$$
(3.46)
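Eqs. 3.45-3.46 can be implemented literally as a sketch (Python/NumPy, illustrative; practical codecs use fast algorithms instead of these direct sums):

```python
import numpy as np

def dct1(f):
    """Direct orthonormal DCT (Eq. 3.45)."""
    N = len(f)
    n = np.arange(N)
    C = np.empty(N)
    for k in range(N):
        g = np.sqrt(0.5) if k == 0 else 1.0
        C[k] = np.sqrt(2.0 / N) * g * np.sum(f * np.cos(np.pi * k * (n + 0.5) / N))
    return C

def idct1(C):
    """Inverse transform (Eq. 3.46)."""
    N = len(C)
    k = np.arange(N)
    g = np.where(k == 0, np.sqrt(0.5), 1.0)
    return np.array([np.sum(np.sqrt(2.0 / N) * g * C *
                            np.cos(np.pi * k * (n + 0.5) / N))
                     for n in range(N)])

f = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
# The basis is orthonormal, so the round trip is exact (up to rounding).
assert np.allclose(idct1(dct1(f)), f)
```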

Two-Dimensional DCT.

The transforms with symmetric normalization of the direct and inverse transform matrices have the form of Eq. 3.47, where \({\mathbf{F}} = \left[ {f(i,\,j)} \right]\) is a source block and \({\mathbf{C}} = \left[ {C_{km} } \right]\) is its spectrum.

$${\mathbf{C}} = {\mathbf{ {\varvec{\Phi}} F {\varvec{\Phi}} }}^{T} \quad {\mathbf{F = {\varvec{\Phi}} }}^{{\mathbf{T}}} {\mathbf{C{\varvec{\Phi}} }}$$
(3.47)

The matrices of the direct \({\varvec{\Phi}}\) and inverse DCT coincide, Eq. 3.48.

$${\varvec{\Phi}} = \left[ {{\upvarphi }_{k} (n)} \right] = \sqrt \frac{2}{N} \left[ {\begin{array}{*{20}c} {\sqrt {0.5} } \\ {\cos ({\uppi}k\frac{(n \,+ \,0.5)}{N}} \\ \end{array} } \right]\quad \left[ {\begin{array}{*{20}c} {k = 0} \\ {k \ne 0} \\ \end{array} } \right]\quad k = 0 \ldots N - 1\quad n = 0 \cdots N - 1$$
(3.48)

The matrix \({\varvec{\Phi}}\) is square and orthogonal: \({\varvec{\Phi}}^{ - 1} = {\varvec{\Phi}}^{T}\).

Integer cosine (pseudo-cosine) transform.

The H.264 standard applies a pseudo-cosine transform for block transforms. For a 4 × 4 block, the DCT matrix is determined by Eq. 3.49, where \(k,n = 0 \ldots 3\).

$$\begin{aligned} {\varvec{\Phi}} & = \sqrt \frac{1}{2} \cdot \left[ {\begin{array}{*{20}c} {\sqrt {{\raise0.7ex\hbox{$1$} \!\mathord{\left/ {\vphantom {1 2}}\right.\kern-0pt} \!\lower0.7ex\hbox{$2$}}} } \\ {\cos (\frac{(2 \cdot n \,+\, 1) \cdot k \cdot \pi }{2 \cdot 4})} \\ \end{array} } \right] \\ & = \frac{1}{2} \cdot \left[ {\begin{array}{*{20}c} 1 & 1 & 1 & 1 \\ {\sqrt {1 + \frac{1}{\sqrt 2 }} } & {\sqrt {1 - \frac{1}{\sqrt 2 }} } & { - \sqrt {1 - \frac{1}{\sqrt 2 }} } & { - \sqrt {1 + \frac{1}{\sqrt 2 }} } \\ 1 & { - 1} & { - 1} & 1 \\ {\sqrt {1 - \frac{1}{\sqrt 2 }} } & { - \sqrt {1 + \frac{1}{\sqrt 2 }} } & {\sqrt {1 + \frac{1}{\sqrt 2 }} } & { - \sqrt {1 - \frac{1}{\sqrt 2 }} } \\ \end{array} } \right] \\ \end{aligned}$$
(3.49)

It was proposed to use the integer matrix H [6] instead of the matrix \({\varvec{\Phi}}\), Eq. 3.50.

$${\mathbf{H}} = \frac{1}{2} \cdot \left[ {\begin{array}{*{20}c} 1 & 1 & 1 & 1 \\ 2 & 1 & { - 1} & { - 2} \\ 1 & { - 1} & { - 1} & 1 \\ 1 & { - 2} & 2 & { - 1} \\ \end{array} } \right]$$
(3.50)

On the one hand, such a replacement speeds up the transform by using integer operations. On the other hand, it breaks the orthogonality of the transform. This break must be compensated by extra scaling in further steps.
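The orthogonality break is easy to observe numerically: the rows of H are mutually orthogonal but have unequal norms. A sketch (Python/NumPy, illustrative; H is taken from Eq. 3.50 without the 1/2 factor):

```python
import numpy as np

# The integer core of Eq. 3.50 (the 1/2 factor omitted).
H = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

G = H @ H.T
# Off-diagonal entries vanish: the rows are mutually orthogonal ...
assert np.count_nonzero(G - np.diag(np.diag(G))) == 0
# ... but the row norms differ (4, 10, 4, 10), so the transform is
# not orthonormal and must be compensated by later scaling.
print(np.diag(G))
```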

The Hadamard system for Walsh functions.

The Walsh-Hadamard discrete basis is convenient for computing. It is based on the Hadamard matrix, which is calculated by the recurrent scheme of Eqs. 3.51-3.52.

$${\text{A}}_{{2^{\text{n}} }} = \left( {\begin{array}{*{20}c} {{\text{A}}_{{2^{{({\text{n}} - 1)}} }} } & {{\text{A}}_{{2^{{({\text{n}} - 1)}} }} } \\ {{\text{A}}_{{2^{{({\text{n}} - 1)}} }} } & { - {\text{A}}_{{2^{{({\text{n}} - 1)}} }} } \\ \end{array} } \right)$$
(3.51)
$${\text{A}}_{1} = 1\quad {\text{A}}_{2} = \left( {\begin{array}{*{20}c} 1 & 1 \\ 1 & { - 1} \\ \end{array} } \right)\quad {\text{A}}_{4} = \left( {\begin{array}{*{20}c} {{\text{A}}_{2} } & {{\text{A}}_{2} } \\ {{\text{A}}_{2} } & { - {\text{A}}_{2} } \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} 1 & 1 & 1 & 1 \\ 1 & { - 1} & 1 & { - 1} \\ 1 & 1 & { - 1} & { - 1} \\ 1 & { - 1} & { - 1} & 1 \\ \end{array} } \right)$$
(3.52)

Any row (column) of the Hadamard matrix is a discrete sample of a Walsh function of some order. Due to the structure of the Hadamard matrix, only summation and subtraction of signal samples are performed during the orthogonal transform. However, the convergence rate of series in the Walsh-Hadamard basis is lower than that provided by the DCT. Besides, the spectra are often non-monotonic when this basis is applied.
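The recurrent scheme of Eqs. 3.51-3.52 can be sketched as follows (Python/NumPy, illustrative):

```python
import numpy as np

def hadamard(n):
    """Build A_{2^n} by the recurrence of Eq. 3.51."""
    A = np.array([[1]])
    for _ in range(n):
        A = np.block([[A, A], [A, -A]])
    return A

A4 = hadamard(2)
# Rows are orthogonal: A4 @ A4.T = 4 I
assert np.array_equal(A4 @ A4.T, 4 * np.eye(4, dtype=int))
# The transform itself needs only additions and subtractions of samples.
f = np.array([1, 2, 3, 4])
print(A4 @ f)   # spectrum in the Walsh-Hadamard basis
```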

Discrete Chebyshev GDCT.

The above-mentioned discrete orthogonal transforms apply a uniform signal sample grid. However, a sample grid with a special non-uniformity allows one to obtain rapidly converging generalized Fourier series. The discrete Chebyshev transform, called the Generalized DCT (GDCT) [9-12], belongs to this type of transforms. It is a particular case of transforms in the orthogonal polynomials.

To calculate the integrals in the orthogonal polynomial transforms, it is proposed to use a Gaussian-type quadrature formula with the highest algebraic accuracy [20], expressed by Eq. 3.53, where \(z_{i}\) are the zeros of the orthogonal polynomial \(p_{N} (z)\) with weight \(\uprho(z)\) and \(\uplambda_{i}\) are the Christoffel numbers.

$$\int {f(z)\uprho(z)dz} = \sum\limits_{i = 1}^{N} {\uplambda_{i} f(z_{i} )}$$
(3.53)

The knots and weights \(\left\{ {z_{i} } \right\},\left\{ {\lambda_{i} } \right\}\) are uniquely defined by the form of the polynomial \(p_{N} (z)\). In general, Eq. 3.53 holds for any orthogonal polynomial system. A particular case of Eq. 3.53, when the Chebyshev polynomials are used, is called the Gauss-Chebyshev (Meller) formula [20] presented in the form of Eq. 3.54, where \(z_{i} = \cos \left( {{\uppi}(i + 0.5)/N} \right)\) are the zeros of the Chebyshev polynomial \(T_{N} (z)\), and \(\uplambda_{i} = {\uppi}/N = {\text{const}}\).

$$\int {\frac{f(z)}{{\sqrt {1 - z^{2} } }}dz} = \frac{{\uppi}}{N}\sum\limits_{i = 0}^{N - 1} {f(z_{i} )}$$
(3.54)
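The quadrature of Eq. 3.54 is easy to check numerically; a minimal sketch (the function name `gauss_chebyshev` is ours):

```python
import math

def gauss_chebyshev(f, N):
    """Gauss-Chebyshev (Meller) quadrature of Eq. 3.54:
    approximates the integral of f(z)/sqrt(1 - z^2) over [-1, 1]
    by (pi/N) * sum of f(z_i) at the nodes z_i = cos(pi*(i+0.5)/N),
    with constant weights pi/N."""
    nodes = [math.cos(math.pi * (i + 0.5) / N) for i in range(N)]
    return (math.pi / N) * sum(f(z) for z in nodes)

# The rule is exact for polynomials of degree up to 2N - 1:
# the weighted integral of z^2 over [-1, 1] equals pi/2.
approx = gauss_chebyshev(lambda z: z * z, N=8)
print(approx, math.pi / 2)
```

Already for small `N` the nodes and constant weights reproduce weighted polynomial integrals exactly, which is what makes the formula attractive for the transforms below.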

For the Chebyshev polynomials, the direct and the reverse transforms (Eq. 3.22) can be computed with the quadrature of Eq. 3.54 (a one-dimensional variant for the normalized interval \(z \in \left[ { - 1,1} \right]\)). Then Eqs. 3.55–3.56 are obtained [9, 11], where \(g_{m} = 1\) for \(m > 0\) and \(g_{m} = \sqrt {0.5}\) for m = 0.

$$C_{m} = g_{m} \cdot \sqrt \frac{2}{N} \sum\limits_{i = 0}^{N - 1} {f(z_{i} ) \cdot \cos ({\uppi}m\frac{i + 0.5}{N})}$$
(3.55)
$$Y_{M} (z) = \sqrt \frac{2}{N} \sum\limits_{m = 0}^{M - 1} {g_{m} \cdot C_{m} \cdot \cos (m \cdot \arccos (z))}$$
(3.56)

In accordance with Eqs. 3.55–3.56, the Chebyshev samples \(z_{i} = \cos \left( {{\uppi}(i + 0.5)/N} \right)\) of signal \(f(z)\) are taken non-uniformly (Fig. 3.2). A synthesis (recovery) of the signal \(Y(z)\) by M spectral components is performed at an arbitrary point \(z \in [ - 1,1]\) rather than only within the discrete set of sample points, as is done in the DCT. During recovery, any sample grid can be used, for example, the uniform grid \(z_{n} = 2n/(L - 1) \, - 1\), \(n = 0 \ldots L - 1\), with \(L \ne N\). The recovered image can thus be geometrically scaled both downwards and upwards in size.
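The 1D analysis/synthesis pair of Eqs. 3.55–3.56 can be sketched as follows (function names are ours; a signal that is a low-degree polynomial is recovered exactly when its degree is below M):

```python
import math

def chebyshev_nodes(N):
    # Non-uniform Chebyshev sample grid z_i = cos(pi*(i + 0.5)/N)
    return [math.cos(math.pi * (i + 0.5) / N) for i in range(N)]

def gdct_forward(samples, M, N):
    """Direct GDCT (Eq. 3.55): M spectral coefficients C_m from
    the N Chebyshev samples f(z_i)."""
    C = []
    for m in range(M):
        g = math.sqrt(0.5) if m == 0 else 1.0
        s = sum(samples[i] * math.cos(math.pi * m * (i + 0.5) / N)
                for i in range(N))
        C.append(g * math.sqrt(2.0 / N) * s)
    return C

def gdct_synthesize(C, N, z):
    """Reverse GDCT (Eq. 3.56): recovery at an arbitrary point
    z in [-1, 1], not only on the original sample grid."""
    y = 0.0
    for m, Cm in enumerate(C):
        g = math.sqrt(0.5) if m == 0 else 1.0
        y += g * Cm * math.cos(m * math.acos(z))
    return math.sqrt(2.0 / N) * y

N, M, L = 8, 4, 11
C = gdct_forward([z * z for z in chebyshev_nodes(N)], M, N)
# Recovery on a uniform grid z_n = 2n/(L - 1) - 1, L != N,
# i.e. a geometrically rescaled version of the signal.
recovered = [gdct_synthesize(C, N, 2 * n / (L - 1) - 1) for n in range(L)]
```

Note that synthesis takes a continuous argument `z`, which is exactly what enables recovery on a grid of a different size.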

Fig. 3.2
figure 2

The Chebyshev block sampling: a \(N1/N = 8/6\), b \(N1/N = 12/8\)

In the case of the 2D GDCT of size N × N, the Chebyshev samples of the image are taken within a sample block of N1 × N1 points (pixels) by Eq. 3.57.

$$\begin{aligned} x_{i} & = 0.5(N1 - 1) \cdot \left( {1 + \cos \left( {{\uppi}(i + 0.5)/N} \right)} \right) \\ y_{j} & = 0.5 \cdot (N1 - 1) \cdot (1 + \cos ({\uppi} \cdot (j + 0.5)/N)) \\ & \quad i,\,j = 0 \ldots N - 1 \\ \end{aligned}$$
(3.57)

This sample matrix is transformed into the spectral coefficient matrix C of M × M size with the use of the rectangular direct transform matrix of M × N size. In the case of the reverse transform, a rectangular matrix of L × M size can be applied. This means that the recovered block \({\mathbf{R}} = \left[ {r_{ij} } \right]\) (i, j = 1 … L) has the size L × L. The Chebyshev samples can be obtained by linear interpolation from the nearest pixels [20, 23].

The direct and the reverse Chebyshev transforms (GDCT) in the matrix form are defined by the operations of Eq. 3.58, i.e. the transform of Eq. 3.36 falls into the bi-orthogonal transform class.

$${\mathbf{C = {\varvec{\Phi}} F {\varvec{\Phi}} }}^{\text{T}} \quad {\mathbf{F = \Psi }}^{{\mathbf{T}}} {\mathbf{C\Psi }}$$
(3.58)

The matrix \({\varvec{\Phi}}\) is a direct transform matrix (Eq. 3.59).

$${\varvec{\Phi}} = \sqrt \frac{2}{N} \cdot \left[ {\begin{array}{*{20}c} {\sqrt {0.5} } \\ {\cos \left( {\uppi \frac{(i \,+ \,0.5) \cdot m}{N}} \right)} \\ \end{array} } \right].\quad m = 0 \ldots M - 1\quad i = 0 \ldots N - 1$$
(3.59)

The matrix \({\varvec{\Psi}}\) is a reverse transform matrix (Eq. 3.60).

$${\varvec{\Psi}} = \sqrt \frac{2}{N} \cdot \left[ {\begin{array}{*{20}c} {\sqrt {0.5} } \\ {\cos (m \cdot \arccos \left( {\frac{2 \cdot n}{L - 1} - 1} \right))} \\ \end{array} } \right].\quad m = 0 \ldots M - 1\quad n = 0 \ldots L - 1$$
(3.60)

It is obvious from Eqs. 3.59–3.60 that in the general case the GDCT matrices \({\varvec{\Phi}}\) and \({\varvec{\Psi}}\) are rectangular. The direct GDCT matrix coincides with the direct DCT matrix (Eq. 3.48) when M = N. The reverse transforms differ from each other. It should be noted that in the DCT the transform matrices are square with size N × N = M × M, as opposed to the GDCT, where M does not equal N in the general case [11, 12]. Among all the analyzed orthogonal transforms, the GDCT is the closest to the ideal decorrelating Karhunen-Loeve transform [1, 19].
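A minimal sketch of the matrix form of Eqs. 3.58–3.60 (the helper `gdct_matrices` is ours; a constant block is recovered exactly, which follows from the structure of the first rows of \({\varvec{\Phi}}\) and \({\varvec{\Psi}}\)):

```python
import numpy as np

def gdct_matrices(M, N, L):
    """Rectangular GDCT matrices of Eqs. 3.59-3.60:
    Phi (M x N) for the direct transform on the Chebyshev grid,
    Psi (M x L) for recovery on a uniform grid of L points."""
    g = np.where(np.arange(M) == 0, np.sqrt(0.5), 1.0)
    m = np.arange(M)
    i = np.arange(N)
    Phi = np.sqrt(2.0 / N) * g[:, None] * np.cos(
        np.pi * np.outer(m, i + 0.5) / N)
    z = 2.0 * np.arange(L) / (L - 1) - 1.0   # uniform synthesis grid
    Psi = np.sqrt(2.0 / N) * g[:, None] * np.cos(
        np.outer(m, np.arccos(z)))
    return Phi, Psi

M, N, L = 4, 8, 8
Phi, Psi = gdct_matrices(M, N, L)
F = np.random.rand(N, N)     # block of Chebyshev image samples
C = Phi @ F @ Phi.T          # direct 2D GDCT (Eq. 3.58), M x M
R = Psi.T @ C @ Psi          # recovered L x L block
```

Because \({\varvec{\Phi}}\) and \({\varvec{\Psi}}\) differ, the pair is bi-orthogonal rather than orthogonal, and M, N, L can all be chosen independently.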

3.4 Spectral Image Variation Detection

Let a field \(f^{(i)} ({\mathbf{r}},t_{i} )\), being a fragment \(u({\mathbf{r}})I_{\Omega } ({\mathbf{r}})\updelta(t - t_{i} )\) of a space-time signal (dynamic image) at discrete time moments \(t_{i}\), i = 0, 1, …, be observed in a sub-region \({\mathbf{r}} = (x,y) \in\Omega\) (a block of a frame). Here, \(I_{\Omega } ({\mathbf{r}})\) is the sub-region indicator function. Additionally, let us assume that white Gaussian noise \(\upeta(x,y) \equiv\upeta({\mathbf{r}})\) is present in the image. After comparing a separate block of the (i–1)th and ith frames, one can formulate the hypotheses of Eq. 3.61.

$$\begin{aligned} H_{0} & :\upxi(\mathbf{r}) = f_{1} (\mathbf{r}) + \eta (\mathbf{r})\quad f_{1} (\mathbf{r}) = f^{(i)} (\mathbf{r}) = f^{(i - 1)} (\mathbf{r}) \\ H_{1} & :\upxi(\mathbf{r}) = f_{2} (\mathbf{r}) + \eta (\mathbf{r})\quad f_{2} (\mathbf{r}) = f^{(i)} (\mathbf{r}) \ne f^{(i - 1)} (\mathbf{r}) \\ \end{aligned}$$
(3.61)

Based on the observation of \(\xi ({\mathbf{r}})\), it is necessary to accept or reject the main hypothesis that the block image is invariant. If two images are compared rather than frames of a video sequence, then \(f_{1} ({\mathbf{r}})\) is the unvaried texture in the frames and \(f_{2} ({\mathbf{r}})\) is a texture different from \(f_{1} ({\mathbf{r}})\).

The hypotheses to be checked are composite: the signals \(f_{1} ({\mathbf{r}}),\;f_{2} ({\mathbf{r}})\) are of unknown form, and a structural uncertainty arises. Generally, the processing of a signal of unknown form cannot be carried out without some additional assumptions. One of the most convenient ways to resolve the a priori uncertainty is signal parameterization. In this case, the signal form uncertainty transforms into a parametric uncertainty, for which resolution methods are well developed [13, 19]. A convenient signal parameterization is the representation in the form of a generalized Fourier series in an orthonormal basis \({\upvarphi }_{km}\) [1, 10, 17, 19].

For different hypotheses, the signals can be presented by Eq. 3.62, where \(\left\langle {\upeta_{k,m} } \right\rangle = 0,\) \(D(\upeta_{k,m} ) = \frac{{N_{0} }}{2} \cdot \left\| {{\upvarphi }_{km} } \right\|^{2}\).

$$\begin{aligned} f_{1} ({\mathbf{r}}) & = \sum\limits_{k,m}^{{}} {C_{km}^{(1)} {\upvarphi }_{km} ({\mathbf{r}})} \quad f_{2} ({\mathbf{r}}) = \sum\limits_{k,m}^{{}} {C_{km}^{(2)} {\upvarphi }_{km} ({\mathbf{r}})} \\ \xi ({\mathbf{r}}) & = \sum\limits_{k,m}^{{}} {X_{km}^{{}} {\upvarphi }_{km} ({\mathbf{r}})} \quad \eta ({\mathbf{r}}) = \sum\limits_{k,m}^{{}} {\upeta_{km} {\upvarphi }_{km} ({\mathbf{r}})} \\ \end{aligned}$$
(3.62)

The spectral expansion coefficients \(C_{k,m}^{(2)}\) are assumed to be unknown, whereas the coefficients \(C_{k,m}^{(1)}\) are the known reference values.

Therefore, the hypotheses in a spectral definition are checked by Eq. 3.63.

$$\begin{array}{*{20}l} {H_{0} :X_{km} = C_{km}^{(1)} + \eta_{km} } \\ {H_{1} :X_{km} = C_{km}^{(2)} + \eta_{km} } \\ \end{array}$$
(3.63)

Section 3.4.1 provides the optimal detection algorithm and quasi-optimal algorithms are discussed in Sect. 3.4.2.

3.4.1 Optimal Detection Algorithm

The maximum likelihood algorithm [17, 24] is an asymptotically optimum rule for hypothesis checking, provided by Eq. 3.64, where L(·) is the log likelihood ratio functional determined by Eq. 3.65.

$$\mathop {\hbox{max} }\limits_{{C^{(2)} }} L(X\left| {C^{(2)} } \right.)\begin{array}{*{20}c} < \\ > \\ \end{array} h_{0}$$
(3.64)
$$L(X\left| {C^{(2)} } \right.) = \ln \left[ {\frac{{W\left[ {X_{k,m} \left| {f_{2} (r|C^{(2)} )} \right.} \right]}}{{W\left[ {X_{k,m} \left| {f_{1} (r|C^{(1)} )} \right.} \right]}}} \right]$$
(3.65)

According to Eq. 3.64, it is required to determine the absolute maximum of the log likelihood ratio functional over the unknown parameter vector \(C^{(2)}\) and make a non-randomized decision in favor of the respective hypothesis.

Taking into account the selected spectral parameterization of the signal form (Eq. 3.62), the hypothesis checking task reduces to Eq. 3.66.

$$L\left( {X\left| {C^{(2)} } \right.} \right) = \left( {1/N_{0} } \right)\left\{ {2\sum\limits_{k,m}^{{}} {X_{km} \left( {C_{km}^{(1)} - C_{km}^{(2)} } \right) - \sum\limits_{k,m}^{{}} {\left( {\left( {C_{km}^{(1)} } \right)^{2} - \left( {C_{km}^{(2)} } \right)^{2} } \right)} } } \right\}$$
(3.66)

Following maximization over \(C_{k,m}^{(2)}\), the decision rule [17] of Eq. 3.67 is obtained.

$$(1/N_{0} )\sum\limits_{k,m}^{{}} {\left( {X_{k,m} - C_{k,m}^{(1)} } \right)^{2} } \begin{array}{*{20}c} < \\ > \\ \end{array} h_{0}$$
(3.67)

3.4.2 Quasi-optimal Algorithms

Since \(N_{0}\) determines the interference power and is typically unknown in practice, normalized statistics can be used. They may be written in the general form [17] of Eq. 3.68.

$$D_{0} = \frac{{\sum\nolimits_{k,m} {(X_{k,m} - C_{k,m}^{(1)} )^{2} } }}{{(C_{0,0}^{(1)} )^{2} }}\;{\text{or}}\;D_{E} = \frac{{\sum\nolimits_{k,m} {(X_{k,m} - C_{k,m}^{(1)} )^{2} } }}{{\sum\nolimits_{k,m} {(C_{k,m}^{(1)} )^{2} } }}{\kern 1pt}$$
(3.68)

As \(C_{00}^{(1)}\) is proportional to the mean image brightness in the frame block and \(\sum {\left( {C_{km}^{(1)} } \right)^{2} }\) is the block energy, the statistics \(D_{0} ,D_{E}\) are robust to block brightness variation, i.e. to observation conditions. Taking into account Eq. 3.68, the decision rule (Eq. 3.64) is written as Eq. 3.69.

$$D_{0} \mathop{\gtrless}\limits_{\gamma_{0}}^{\gamma_{1}} h_{0} \;{\text{or}}\;D_{E} \mathop{\gtrless}\limits_{\gamma_{0}}^{\gamma_{1}} h_{E}$$
(3.69)

The number of summed terms and the order of their selection in the statistics \(D_{0} ,D_{E}\) can be set up according to the peculiarities of a certain task.
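The statistics of Eq. 3.68 and the rule of Eq. 3.69 can be sketched directly; the function names below are ours, and the spectral coefficients are assumed to come from any orthogonal block transform (e.g. DCT or GDCT):

```python
import numpy as np

def block_change_statistics(X, C1):
    """Normalized spectral statistics of Eq. 3.68 for one block.
    X, C1: 2-D arrays of spectral coefficients of the observed and
    reference blocks. C1[0, 0] is proportional to the mean block
    brightness, so both statistics are robust to brightness changes."""
    diff = np.sum((X - C1) ** 2)
    D0 = diff / C1[0, 0] ** 2
    DE = diff / np.sum(C1 ** 2)
    return D0, DE

def block_changed(X, C1, h_E=0.01):
    """Decision rule of Eq. 3.69 with the energy-normalized
    statistic D_E and an illustrative threshold value."""
    _, DE = block_change_statistics(X, C1)
    return DE > h_E
```

The ratio \(D_{0}/D_{E}\) equals \(\sum (C_{km}^{(1)})^{2} /(C_{00}^{(1)})^{2}\), which is exactly the factor connecting the two thresholds later in the chapter.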

3.5 Experimental Research of Structural Similarity Algorithms

Let us consider the applications of structural similarity algorithms for image analysis (Sect. 3.5.1), the experimental research of spectral statistics (Sect. 3.5.2), and the experimental research of MSSIM and MNSSIM1(2) criteria (Sect. 3.5.3).

3.5.1 Practical Using of Pixel and Spectral Algorithms in Image Analysis

The MSSIM, MNSSIM1, and MNSSIM2 structural similarity criteria and the \(D_{0} ,D_{E}\) spectral criteria were experimentally studied. Such characteristics as sensitivity to texture variation, interference tolerance, robustness to observation conditions, and decision making threshold selection were tested [16].

In our research on the detection of substance phase transitions during heating, a 75-frame video sequence was selected. It was taken by an Infinity 1-3C digital camera during an experiment on heating a cesium chloride (CsCl) sample within the temperature range of (250; 710) °C. (Each frame represents the substance at a certain temperature, and the temperature varies from frame to frame.)

Two phase transitions take place in the CsCl sample within this temperature range. The sample surface texture changes sharply (stepwise), and, hence, the texture of the successive frames corresponding to each transition also changes abruptly. Figure 3.3 presents the images with small visual differences in the texture of some regions (blocks), which correspond to the first effect, a polymorphous transformation of the substance (i.e., transition from one solid state to another followed by reconstructive re-arrangement of the crystal mosaic structure or “grains” in the image).

Fig. 3.3
figure 3

Fragment of video sequence corresponding to the phase transition (polymorphous transformation) of the CsCl sample at temperatures: a 462.8 °C; b 468.4 °C

Figure 3.4 presents the images with noticeable visual differences, which correspond to the second observed effect, substance melting (transition from the solid state to the liquid one followed by crystal fracture and disappearance of grains in the image).

Fig. 3.4
figure 4

Fragment of video sequence corresponding to the phase transition (melting) of the CsCl sample at temperatures: a 642.7 °C; b 645.2 °C

At the phase transition moments, the image structure should exhibit a step. The MSSIM index, its modifications MNSSIM1(2), and the spectral metric \(D_{0}\) for detecting changes in a video sequence were used.

Figure 3.5 shows the MSSIM(T), the MNSSIM1(T), and the MNSSIM2(T) plots. Figure 3.6 demonstrates the \(D_{0}\)(T) plot. All these dependences have two clearly discernible peaks (extrema) corresponding to the phase transition temperatures. It is the left jump that has an important physical meaning, since it represents a phase transition (significant changes appeared); the right jump means that the changes ceased. The fluctuations (small peaks) are explained by random changes in frame brightness as well as random changes of the substance structure. The oscillations (see the MSSIM(T), MNSSIM1(T), MNSSIM2(T) plots) after the second high peak at the 645.2 °C temperature can be attributed to the fact that, upon melting, changes still proceed on the surface of the substance under observation, thus reflecting on the image texture. It is obvious from Fig. 3.5 that, for the given video sequence, the MSSIM, the MNSSIM1, and the MNSSIM2 dependences are similar. Moreover, the MNSSIM1 and the MNSSIM2 plots are almost identical. However, if the analyzed video sequence is subjected to some distortion factors (e.g., pulse noise) during video recording and artificial improvement of images, then the MNSSIM1 and the MNSSIM2 criteria seem to be more preferable.

Fig. 3.5
figure 5

The plots of MSSIM(T), MNSSIM1(T), MNSSIM2(T)

Fig. 3.6
figure 6

The plot of D0(T)

Peak values of the MSSIM, the MNSSIM1, the MNSSIM2, and \(D_{0}\), their mean values before the peaks (to estimate the jumps), and the temperatures of the peaks are provided in Tables 3.2 and 3.3.

Table 3.2 Values before the peaks
Table 3.3 Peak values

The temperature values detected by the analysis of the sharp jumps in the MSSIM(T), the MNSSIM1(T), the MNSSIM2(T), and the \(D_{0}\)(T) dependences correlate well with data from literature sources [16]: T1 = 469 °C, T2 = 645 °C.

Figures 3.5 and 3.6 also show that the MSSIM criterion and its MNSSIM1(2) modifications are inferior to the spectral criterion \(D_{0}\) in the magnitude of the jumps. In the peaks, the MSSIM and the MNSSIM1(2) curves change several-fold, whereas the \(D_{0}\) curve changes by several tens of times. This feature of the spectral criterion can be useful in the automatic detection of the moment when a process changes. Even such a low automatically set threshold as (6–8)·min(\(D_{0}\)) ensures that fluctuations of the image brightness and color not followed by texture changes will not be classified as useful effects. Due to the higher fluctuations of the MSSIM, the MNSSIM1, and the MNSSIM2 criteria, their threshold selection requires additional data analysis by the researcher.

3.5.2 Experimental Research of Spectral Statistics D 0 and D E

To estimate the algorithm provided by Eq. 3.68, experiments with real dynamic fields were carried out [17]. During the experiment, the values of the statistics \(D_{0}\) and \(D_{E}\) (Eq. 3.68) were calculated for blocks of the brightness components of two successive video sequence frames. Blocks characterized by such change types as practically unvaried, weakly varied, or strongly varied were selected for analysis. Figure 3.7 shows an example of a frame from the analyzed dynamic sequence. The distinctive blocks, where the statistics values were computed and differences were detected, are marked with boxes. The GDCT discrete transform is used.

Fig. 3.7
figure 7

Image frames with varied blocks obtained by the full spectrum: a h E  = 0.01, b h E  = 0.04

Figure 3.7 shows a frame of the video sequence with varied blocks (marked with boxes) for different thresholds: the results are given for h E = 0.01 and h E = 0.04 (Fig. 3.7a, b, respectively). Four different blocks were analyzed, and average statistics were calculated for each variation type. The obtained statistics \(D_{0}\) and \(D_{E}\) for the brightness component of the image (Fig. 3.7) are presented in Table 3.4.

Table 3.4 Brightness component’s statistics for the image (Fig. 3.7)

The given results prove that for blocks with low changes, the metrics \(D_{0}\) and \(D_{E}\) are quite close. Their difference from the metrics for varied (flexible) blocks is 2–3 orders of magnitude. This makes it easy to select the threshold separating flexible and inactive blocks. It should be noted that the change range of metric \(D_{0}\) is 1.7–2 times larger than that of metric \(D_{E}\). The higher change range of statistic \(D_{0}\) proves its preference. The thresholds \(h_{0}\) and \(h_{E}\) are connected by the following relation:

$$h_{0} = h_{E} \frac{{\sum\nolimits_{k,m} {\left( {C_{k,m}^{\left( 1 \right)} } \right)^{2} } }}{{\left( {C_{0,0}^{\left( 1 \right)} } \right)^{2} }}$$

Table 3.5 presents the h E thresholds averaged by frames, the calculated thresholds \(\bar{h}_{0}\), and the percentage of varied blocks.

Table 3.5 Thresholds averaged by frame and percentage of block varied

It is evident from Table 3.5 that a 10-fold threshold change alters the number of varied blocks by less than a factor of 3. This phenomenon proves the non-criticality of the selection of the threshold separating flexible and inactive blocks in the above described test.

The possibility of using the decision rules (Eq. 3.68) with a truncated spectrum was analyzed on different video sequences. Our analysis demonstrated that detection can be performed with the truncated spectrum and even with one spectral component (the mean block brightness). Figure 3.8 shows an example of video sequence frames with varied blocks defined by metric \(D_{E}\) with the full spectrum (N = 64) (see Fig. 3.8a) and one spectral component (N = 1) (see Fig. 3.8b). The threshold value was h E = 0.01 for both figures.

Fig. 3.8
figure 8

Image frames with varied blocks obtained by full and truncated spectrum: a full spectrum, b truncated spectrum

Table 3.6 provides the percentage of changed blocks detected by the whole spectrum (D(N), %) and by one spectral component (D(1), %), as well as the relative detection error (\({{\Delta D} \mathord{\left/ {\vphantom {{\Delta D} {D_{N} }}} \right. \kern-0pt} {D_{N} }}\)) for one spectral component.

Table 3.6 Percentage of changed blocks

Here, \(\frac{{\Delta D}}{D\left( N \right)} = \frac{D\left( N \right) - D\left( 1 \right)}{D\left( N \right)}\). It is evident from Table 3.6 that the maximum relative error of variation detection by one spectral component does not exceed 0.16. Therefore, block changes can be satisfactorily detected by one spectral component. Tests performed with the test video sequences “Container”, “Foreman”, and “Suzie” demonstrated similar results.

The spectral video sequence block variation detection algorithm has been used to implement the MGDCT video coding concept [25]. Our proposed concept applies the DCT/GDCT orthogonal transforms in a video codec structure.

To decrease the video codec bit-rate, the algorithm (Eqs. 3.68, 3.69) was used to detect the reliable inter-frame video sequence variations. Figures 3.9–3.10 show video frames with marked blocks that have varied because of moving objects. Figure 3.9 uses the “Container” video sequence, and Fig. 3.10 presents a frame of a remote video monitoring system. The detectors apply the GDCT transform.

Fig. 3.9
figure 9

Spectral variation detection in “Container” test sequence

Fig. 3.10
figure 10

Spectral variation detection in the video monitoring system

Combining the Chebyshev sampling and spectral detection allows the amount of information processing to be reduced by a factor of 4–8, depending on the frame variations.

3.5.3 Experimental Research of MSSIM and MNSSIM1(2) Criteria

Notwithstanding that the MSSIM structural similarity criterion is one of the criteria closest to the human vision system, it has certain drawbacks. For example, in the case of image blur, pulse noise, or blocking, the MSSIM criterion provides values that are not quite consistent with the vision system. The MNSSIM1(2) criteria do not exhibit such effects [5]. Let us investigate the MSSIM and MNSSIM1(2) criteria under various artifacts.

The MSSIM, MNSSIM1(2) criteria at pulse interference.

The image called “Lena” was used to test the impact of pulse interference on the MSSIM, the MNSSIM1, and the MNSSIM2 criteria. Figure 3.11 shows images disturbed by pulse noise of the “pepper” and “salt/pepper” types with probability p = 0.05, and the corresponding MSSIM, MNSSIM1, and MNSSIM2 criteria values.

Fig. 3.11
figure 11

“Lena” image: a “pepper” noise with p = 0.05, MSSIM = 0.371, MNSSIM1 = 0.899, MNSSIM2 = 0.902, b “salt/pepper” noise with p = 0.05, MSSIM = 0.315, MNSSIM1 = 0.899, MNSSIM2 = 0.898

A pulse noise is set up by two characteristics: an intensity determined by the noise probability p in a pixel, and a noise type with three variants, namely “salt”, “pepper”, and “salt/pepper”. The mathematical model of pulse noise for the different noise types is determined as follows. Let \(X = \left[ {x_{ij} } \right]\) be a non-distorted image and \(Y = \left[ {y_{ij} } \right]\) be a distorted image. For the “pepper” type noise pulse, the model is set up by Eq. 3.70.

$$y_{ij} = \left\{ {\begin{array}{*{20}c} {0:prob.p} \\ {x_{ij} :prob.1 - p} \\ \end{array} } \right\}$$
(3.70)

For the “salt” type noise pulse, the model is set up by Eq. 3.71.

$$y_{ij} = \left\{ {\begin{array}{*{20}c} {255:prob.p} \\ {x_{ij} :prob.1 - p} \\ \end{array} } \right\}$$
(3.71)

For the “salt/pepper” type noise pulse, the model is set up by Eqs. 3.72–3.73.

$$f_{0} = \left\{ {\begin{array}{*{20}c} {0,prob.0.5:noise {\text{``}}pepper{\text{''}}} \\ {255,prob.0.5:noise {\text{``}}salt{\text{''}}} \\ \end{array} } \right\}$$
(3.72)
$$y_{ij} = \left\{ {\begin{array}{*{20}c} {f_{0} :prob.p} \\ {x_{ij} :prob.1 - p} \\ \end{array} } \right\}$$
(3.73)
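The three pulse noise models of Eqs. 3.70–3.73 can be sketched in one routine (the function name and parameters are ours, for an 8-bit image stored as a list of rows):

```python
import random

def add_pulse_noise(image, p, kind="salt/pepper"):
    """Pulse noise of Eqs. 3.70-3.73 for an 8-bit image.
    Each pixel is replaced with probability p by 0 ("pepper",
    Eq. 3.70), 255 ("salt", Eq. 3.71), or by either value with
    probability 0.5 each ("salt/pepper", Eqs. 3.72-3.73)."""
    noisy = []
    for row in image:
        out = []
        for x in row:
            if random.random() < p:
                if kind == "pepper":
                    out.append(0)
                elif kind == "salt":
                    out.append(255)
                else:  # "salt/pepper"
                    out.append(0 if random.random() < 0.5 else 255)
            else:
                out.append(x)  # pixel kept with probability 1 - p
        noisy.append(out)
    return noisy
```

With `p = 0` the image is untouched, and with `p = 1` every pixel is replaced, matching the limiting cases of the models.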

Figure 3.12 shows the behavior of the criteria depending on the probability of pulse interference \(p\). The plots in Fig. 3.12a, b confirm that the MSSIM criterion indicates a greatly reduced image quality even at low interference intensity. The MNSSIM1 and the MNSSIM2 criteria demonstrate values more consistent with the human vision system and close to each other (human vision filters out low-intensity interference). At higher interference intensity, the MSSIM criterion passes to the “saturation” mode. The MNSSIM1(2) criteria values in the region of high p values decrease only slightly when this parameter increases.

Fig. 3.12
figure 12

The MSSIM, MNSSIM1 and MNSSIM2 values versus pulse noise intensity and noise type: a “salt” type of pulse noise, b “salt/pepper” type of pulse noise

The MSSIM and the MNSSIM1(2) criteria at quasi-Gaussian noise.

Let us clarify the term quasi-Gaussian interference in a digital image. Each of the RGB color components takes on a value from 0 to \(2^{8} - 1 = 255\) in the integer 8-bit signal level presentation. Each RGB color component at quasi-Gaussian interference is formed by Eq. 3.74, where \(\upeta_{ij} \in N(0,1)\) are independent Gaussian values with zero mean and unit variance for each image pixel, \(\upsigma_{0} \le 1\) is a normalizing factor that controls the noise variance, and [z] is the integer part of the number z.

$$y_{\text{ij}} {\text{ = x}}_{\text{ij}} \,{ + }\left[ {\upsigma_{0} \cdot\upeta_{ij} \cdot 2^{5} } \right]{ + }\;2^{7}$$
(3.74)

The coefficient \(2^{5} = 32\) and the offset \(2^{7} = 128\) are selected so that the resulting values lie within the interval \(0 \le y_{ij} \le 255\) with probability p = 0.995. If a resulting value is beyond this interval, i.e. \(y_{ij} < 0\) or \(y_{ij} > 255\), then it is clipped to \(y_{ij} = 0\) or \(y_{ij} = 255\), respectively.
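A literal sketch of Eq. 3.74 with the clipping described above (the function name is ours; the additive offset \(2^{7}\) is taken exactly as the equation states it):

```python
import random

def add_quasi_gaussian_noise(image, sigma0=1.0):
    """Quasi-Gaussian noise of Eq. 3.74 for one 8-bit component:
    y = x + [sigma0 * eta * 2^5] + 2^7 with eta ~ N(0, 1),
    where [.] is the integer part, and the result is clipped
    to the interval [0, 255]."""
    noisy = []
    for row in image:
        out = []
        for x in row:
            y = x + int(sigma0 * random.gauss(0.0, 1.0) * 32) + 128
            out.append(min(255, max(0, y)))  # clip out-of-range values
        noisy.append(out)
    return noisy
```

With `sigma0 = 0` the mapping reduces to the deterministic offset of 128 followed by clipping, which makes the role of the two constants in Eq. 3.74 easy to see.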

The images from the database of the Laboratory for Image and Video Engineering [26] were used as test images. Subjective quality values, the Difference Mean Opinion Score (DMOS), for images from this database were also obtained. For images corrupted by Gaussian noise, the MSSIM, the MNSSIM1, and the MNSSIM2 criteria values, as well as the Spearman rank correlation coefficients between these criteria values and the DMOS values, were calculated.

The Spearman rank correlation coefficients between the MSSIM, the MNSSIM1, and the MNSSIM2 criteria and the DMOS values are computed as follows. The calculated values of the MSSIM, the MNSSIM1, and the MNSSIM2 criteria are assigned ranks, and ranks are also assigned to the respective DMOS values. Then the Spearman correlation coefficient is calculated. The number of elements n in each sequence equals 49 for all quality criteria and their respective DMOS values. For each of the MSSIM, the MNSSIM1, and the MNSSIM2 quality criteria, the obtained correlation coefficients were checked. Test results for the image “Parrots” are provided in Table 3.7.
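The rank-and-correlate procedure just described can be sketched as follows (the function name is ours; for tie-free data the classical closed form of the Spearman coefficient applies):

```python
def spearman(x, y):
    """Spearman rank correlation: the criterion values and the DMOS
    values are replaced by their ranks, and the classical formula
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) is applied.
    Assumes no ties, so the ranks are a permutation of 1..n."""
    n = len(x)
    rx = {v: i + 1 for i, v in enumerate(sorted(x))}  # ranks of x
    ry = {v: i + 1 for i, v in enumerate(sorted(y))}  # ranks of y
    d2 = sum((rx[a] - ry[b]) ** 2 for a, b in zip(x, y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

Perfectly monotone agreement gives +1, perfectly reversed ordering gives −1; in the chapter's experiment n = 49 for each criterion/DMOS pair.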

Table 3.7 Correlation results for image “Parrots”

It is evident from Table 3.7 that the MSSIM criterion values and variation range are higher than those of the MNSSIM1 and the MNSSIM2 criteria at quasi-Gaussian noise. This is explained by the fact that the MSSIM criterion's structure is more suitable for images with Gaussian statistics. All criteria are strongly correlated with DMOS.

With strong image block distortion or blur, the MSSIM criterion produces an overestimate. At the same time, the MNSSIM1 and the MNSSIM2 criteria have low values, thus corresponding to the low visual quality of the distorted images. Figure 3.13 presents an image with block distortion after high-ratio JPEG compression.

Fig. 3.13
figure 13

“Lena” image recovered after JPEG compression. MSSIM = 0.634, MNSSIM1 = 0.073, MNSSIM2 = 0.079

Possibilities of using structural similarity criteria to detect scene changes.

During the research, the ability of the structural similarity criteria to track the contextual similarity of images was investigated. To solve this task, the MSSIM, the MNSSIM1, and the MNSSIM2 criteria were applied to different images. It was discovered that, when comparing absolutely different images with the same spatial sizes, the MSSIM criterion gives unreasonably high values, whereas the MNSSIM1 and the MNSSIM2 criteria values are practically equal to 0. Figure 3.14 shows an example of images compared by the MSSIM, the MNSSIM1, and the MNSSIM2 criteria values. The obtained data prove that the non-parametric modifications better deserve the name of a structural similarity criterion.

Fig. 3.14
figure 14

Comparison of two images with MSSIM = 0.339, MNSSIM1 = −0.004, MNSSIM2 = −0.006

Comparison of pixel and spectral image analysis algorithms.

Multiple tests with real images and video sequences were made by the authors to discover the features and abilities provided by the above described algorithms. The following intermediate conclusions can be drawn:

  1.

    The MSSIM and the MNSSIM1(2) structural similarity criteria are efficient for detecting changes in frames and video sequence fragments, when images are processed without compression.

  2.

    The non-parametric MNSSIM1(2) criteria require more operations than the MSSIM and the spectral algorithm. However, their values are more compatible with human perception.

  3.

    Among all the analyzed algorithms, the MNSSIM1(2) criteria have the highest immunity to pulse and other non-Gaussian interference.

  4.

    The MNSSIM1(2) non-parametric criteria can be applied to determine a scene change in a video sequence and to detect external frames added to a video sequence.

  5.

    The algorithms for spectral structural variation detection obtained by the maximum likelihood method are optimal under Gaussian interference.

  6.

    The quasi-optimal detection algorithms applying the statistics \(D_{0} ,D_{E}\) are similar to the optimal ones by their characteristics. They are insensitive to threshold selection. However, the spectral algorithms are more sensitive to a change of image type than the MSSIM and the MNSSIM1(2).

  7.

    The difference in statistics values with and without texture variations is tens to hundreds of times in the case of the spectral algorithms and several times in the case of the MSSIM and the MNSSIM1(2) algorithms.

  8.

    As for computational expenses, the spectral and pixel algorithms are approximately similar. They can operate in the real-time mode.

  9.

    The spectral algorithms are very efficient in real-time operation, especially when they are embedded in video codecs. It has been found that these algorithms can operate with a truncated spectrum of a few components.

  10.

    A promising spectral basis is the discrete Chebyshev transform (GDCT). The GDCT spectrum decreases the fastest among the considered orthogonal transforms.

  11.

    Combining the Chebyshev sampling and spectral detection allows the amount of information processing to be reduced by a factor of 4–8, depending on the nature of the frame variations.

  12.

    To solve a certain task, the proposed algorithms should be selected according to their resource consumption, the degree of computation automation, and the image distortions involved.

3.6 Conclusion

The chapter covers the use of the MSSIM, the MNSSIM1, and the MNSSIM2 criteria and the spectral algorithms \(D_{0}\) and \(D_{E}\) to detect changes in a video sequence or to compare the textures of various images. The developed MNSSIM1(2) criteria and the spectral algorithms \(D_{0}\) and \(D_{E}\) can be used in artificial vision systems and in many other fields connected with variations of the texture, spectral, and correlation parameters of recorded signals. To meet a certain challenge, the selection among the proposed algorithms should be determined by the required resource intensity, the automation degree, and the presence of image distortions. The proposed algorithms are used to design novel video codecs and intelligent video recording systems. To conclude, one can emphasize that the evolution of the basic PSNR, the MSSIM criteria, the spectral criteria, etc. is proceeding in [27–31].