Stereo visualization is an actively developing area of three-dimensional (3D) computer vision, where the central problem is to estimate the positions of objects in 3D space on the basis of constructed depth maps (DM).

Computer vision problems include the segmentation and detection of objects in 3D, which requires extracting information from multidimensional data in real systems and transforming these data on the basis of efficient descriptors that substantially reduce the dimension of the problem. Image contours and the relations between them are basic elements in computer vision and robotics; they represent generalized information about analyzed scenes in the form of a series of features important for recognition. Among descriptors of this type, DAISY and SID are the most efficient and the most widely used in practice.

Based on the ideas described in [112], an original method for reconstructing depth maps from color stereo pairs in 3D visualization is proposed and justified. The new image descriptor (VPR) relies on visual primitives and the relations between them and characterizes differences in color, plane positions, distances, and angles between the primitives. A fundamental difference between VPR and other well-known descriptors is that it uses structural and semantic information. This improves its robustness, specifically, in the case of nonideal recording, illumination, and reflection in color stereo pairs. Numerical experiments, together with an analysis and a physical interpretation of their results under actual radiometric differences in the exposure and illumination of stereo image pairs, have shown that the new approach is superior to other available descriptors.

FORMULATION OF THE PROBLEM AND SOLUTION METHOD

Our theoretical description of the proposed image descriptor is based on the Riesz transform [9]. For a given two-dimensional signal \(f(x,y)\), two odd signal components are formed:

$$F_{01,02}(w) = \begin{cases} i\,\dfrac{w_{x,y}}{\lVert w \rVert}\,F(w), & w \ne 0, \\ 0, & w = 0. \end{cases}$$
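For illustration, the following minimal NumPy sketch computes the two odd components in the frequency domain exactly as in this definition; the function name and the FFT grid conventions are ours, not part of the method's published code.

```python
import numpy as np

def riesz_components(f):
    """Two odd (Riesz) components of a 2D signal f(x, y), computed in the
    Fourier domain: F_01 = i*w_x/||w||*F(w), F_02 = i*w_y/||w||*F(w),
    with both components set to zero at w = 0."""
    F = np.fft.fft2(f)
    wy, wx = np.meshgrid(np.fft.fftfreq(f.shape[0]),
                         np.fft.fftfreq(f.shape[1]), indexing="ij")
    norm = np.hypot(wx, wy)
    norm[0, 0] = 1.0                      # avoid 0/0 at the DC term
    H1, H2 = 1j * wx / norm, 1j * wy / norm
    H1[0, 0] = H2[0, 0] = 0.0             # F_01(0) = F_02(0) = 0
    f01 = np.real(np.fft.ifft2(H1 * F))
    f02 = np.real(np.fft.ifft2(H2 * F))
    return f01, f02
```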

The Log-Gabor filter \(G_e(w) = \exp\left(-\dfrac{\log^2\left(\lVert w \rVert / w_0\right)}{2(\log \sigma_0)^2}\right)\) is used, for which the two odd components \(G_{01}(w)\) and \(G_{02}(w)\) are found, and the isotropic filter \(G(w) = \sqrt{G_{01}^2(w) + G_{02}^2(w)}\) is applied.
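A sketch of the radial Log-Gabor transfer function on the same FFT grid is given below; the specific values passed for \(w_0\) and \(\sigma_0\) are left to the caller, since the paper does not fix them here.

```python
def log_gabor(shape, w0, sigma0):
    """Radial Log-Gabor transfer function G_e(w) on the FFT frequency grid."""
    wy, wx = np.meshgrid(np.fft.fftfreq(shape[0]),
                         np.fft.fftfreq(shape[1]), indexing="ij")
    r = np.hypot(wx, wy)
    r[0, 0] = 1.0                   # dummy value; the DC term is zeroed below
    G = np.exp(-np.log(r / w0) ** 2 / (2 * np.log(sigma0) ** 2))
    G[0, 0] = 0.0                   # a Log-Gabor filter passes no DC component
    return G
```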

After filtering, we form a vector of three components (one even and two odd), the so-called monogenic signal, \(f_m(x) = [f_e(x), f_{01}(x), f_{02}(x)]\). In spherical coordinates, this vector is characterized by its length \(A(x)\) and two angular coordinates:

$$\phi(x) = \arctan\left(\frac{f_0(x)}{f_e(x)}\right) \quad \text{and} \quad \theta(x) = \arctan\left(\frac{f_{02}(x)}{f_{01}(x)}\right),$$

where \(f_0(x) = \sqrt{f_{01}^2(x) + f_{02}^2(x)}\) is the amplitude of the odd (Riesz) part.

The 2D visual primitive \(\Pi_{2\mathrm{D}}(x)\) is determined by the three parameters \(A(x)\), \(\phi(x)\), and \(\theta(x)\), while the 3D visual primitive is characterized by the vector \(\Pi_{3\mathrm{D}}(x) = (A(x), \theta(x), \phi(x), d(x))\), where \(d(x)\) determines the position of the primitive in three-dimensional space.
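Under these definitions, the per-pixel primitive parameters can be computed as follows (a sketch; arctan2 replaces arctan to keep the full angular range, and \(f_0\) is the odd-part amplitude noted above).

```python
def primitive_2d(fe, f01, f02):
    """Parameters (A, phi, theta) of the 2D visual primitive at each pixel,
    derived from the monogenic signal [fe, f01, f02]."""
    A = np.sqrt(fe**2 + f01**2 + f02**2)      # vector length A(x)
    phi = np.arctan2(np.hypot(f01, f02), fe)  # local phase phi(x)
    theta = np.arctan2(f02, f01)              # local orientation theta(x)
    return A, phi, theta
```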

The relations between two primitives are determined by a number of features. Specifically, the cocolority is computed in the CIELab color space as follows:

$$R^C(\Pi_i, \Pi_j) = \sqrt{(L_i - L_j)^2 + (a_i - a_j)^2 + (b_i - b_j)^2}.$$
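In code, the cocolority is simply the Euclidean distance between two CIELab triples, e.g.:

```python
def cocolority(lab_i, lab_j):
    """R^C: Euclidean distance between two CIELab color triples."""
    (Li, ai, bi), (Lj, aj, bj) = lab_i, lab_j
    return ((Li - Lj) ** 2 + (ai - aj) ** 2 + (bi - bj) ** 2) ** 0.5
```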

Other features characterize the color, position, and orientation of the primitives and the relations between them, namely, the angle between the primitives,

$$R^A(\Pi_i, \Pi_j) = \arccos\left(\frac{\Pi_i \cdot \Pi_j}{\lvert \Pi_i \rvert\,\lvert \Pi_j \rvert}\right);$$

the normalized distance between them,

$$R^{ND}(\Pi_i, \Pi_j) = \frac{1}{2}\left(\frac{\lvert W_i \times \Pi_j \rvert}{\lvert \Pi_j \rvert} + \frac{\lvert W_j \times \Pi_i \rvert}{\lvert \Pi_i \rvert}\right);$$

and the coplanarity

$$R^P(\Pi_i, \Pi_j) = \frac{1}{2}\left[\pi - \arccos\left(\frac{n \cdot A^O(\Pi_i)}{\lvert n \rvert\,\lvert A^O(\Pi_i) \rvert}\right) - \arccos\left(\frac{n \cdot A^O(\Pi_j)}{\lvert n \rvert\,\lvert A^O(\Pi_j) \rvert}\right)\right]$$

(see [13]).
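These three relations can be sketched directly from the formulas; note that the meanings of \(W_i\), \(n\), and \(A^O\) are not spelled out in the text, so the readings below (displacements between primitive positions, the candidate plane normal, and the primitive orientation vectors, respectively) are our assumptions.

```python
def angle_relation(p_i, p_j):
    """R^A: angle between two primitive vectors (the cosine is clipped so
    that rounding noise cannot push arccos outside [-1, 1])."""
    c = np.dot(p_i, p_j) / (np.linalg.norm(p_i) * np.linalg.norm(p_j))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def normalized_distance(w_i, w_j, p_i, p_j):
    """R^ND; w_i, w_j are taken here to be the displacement vectors between
    the primitives' 3D positions (an assumed reading of W_i, W_j)."""
    return 0.5 * (np.linalg.norm(np.cross(w_i, p_j)) / np.linalg.norm(p_j)
                  + np.linalg.norm(np.cross(w_j, p_i)) / np.linalg.norm(p_i))

def coplanarity(n, a_i, a_j):
    """R^P; n is the normal of the candidate plane, and a_i, a_j are the
    orientation vectors A^O of the two primitives (again assumed)."""
    return 0.5 * (np.pi - angle_relation(n, a_i) - angle_relation(n, a_j))
```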

The novel VPR descriptor is based on the relations between the primitives and is implemented as shown in the block diagram of the method in Fig. 1: for a given color image \(I_{RGB}\), the colors are first transformed from RGB to the CIELab space. The cocolority is determined from the monogenic signal, which is formed for \(S\) scales and \(\Sigma\) kernels (\(i = 1, 2, \ldots, S\); \(j = 1, 2, \ldots, \Sigma\)) with the use of \(S \times \Sigma\) Gaussian filters with parameters \(s_i\) and \(\sigma_j\). Each filter performs a convolution with the image \(I_L\) of the channel \(L\) in the CIELab space and forms the \(3 \times S \times \Sigma\) components \(f_m(x) = [f_e(x), f_{01}(x), f_{02}(x)]\) of the monogenic signal.

Fig. 1. Block diagram of the proposed method.
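The filter-bank stage of the diagram can be sketched as follows; how the pair \((s_i, \sigma_j)\) maps to the Log-Gabor parameters \((w_0, \sigma_0)\) is not given in the text, so the mapping below is an assumption.

```python
def monogenic_bank(I_L, scales, sigmas):
    """Monogenic responses of the CIELab L channel for an S x Sigma filter
    bank, using log_gabor and riesz_components defined above."""
    F = np.fft.fft2(I_L)
    bank = {}
    for s in scales:                    # i = 1, ..., S
        for sg in sigmas:               # j = 1, ..., Sigma
            G = log_gabor(I_L.shape, w0=1.0 / s, sigma0=sg)  # assumed mapping
            fe = np.real(np.fft.ifft2(G * F))       # even component
            f01, f02 = riesz_components(fe)         # two odd components
            bank[(s, sg)] = primitive_2d(fe, f01, f02)
    return bank
```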

For each pixel, the new descriptor consists of a vector determined by the relations between the primitive at \((x, y)\) and the primitives at \((u, v)\) in a square window \(W\) centered at \((x, y)\): \(H_A(x, y) = [R^A\{s_i, \sigma_j\}(1, 1), \ldots, R^A\{s_i, \sigma_j\}(u, v)]\), where \(R^A\) is the angle determined by the relation between \((x, y)\) and \((u, v)\). The feature vector of the angles between the primitives for the \(S \times \Sigma\) Gaussian filters is given by the matrix expression

$$D_A(x, y) = \begin{bmatrix} H_A(x, y)_{s_1}^{\sigma_1} & \cdots & H_A(x, y)_{s_S}^{\sigma_1} \\ \vdots & \ddots & \vdots \\ H_A(x, y)_{s_1}^{\sigma_\Sigma} & \cdots & H_A(x, y)_{s_S}^{\sigma_\Sigma} \end{bmatrix}.$$
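A per-pixel row \(H(x, y)\) of such a matrix can be assembled by scanning the window \(W\), e.g. (a sketch; `prims` is assumed to map pixel coordinates to primitive vectors, and `relation` is any of the relation functions above):

```python
def relation_vector(prims, x, y, half, relation):
    """H(x, y): relation values between the primitive at (x, y) and every
    primitive at (u, v) in a square window of side 2*half + 1."""
    center = prims[x, y]
    return np.array([relation(center, prims[u, v])
                     for u in range(x - half, x + half + 1)
                     for v in range(y - half, y + half + 1)])
```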

Expressions for the features characterized by the normalized distances, \(D_{ND}(x, y)\), and the coplanarity parameters, \(D_{CP}(x, y)\), are found in a similar manner:

$$D_{ND}(x, y) = \begin{bmatrix} H_{ND}(x, y)_{s_1}^{\sigma_1} & \cdots & H_{ND}(x, y)_{s_S}^{\sigma_1} \\ \vdots & \ddots & \vdots \\ H_{ND}(x, y)_{s_1}^{\sigma_\Sigma} & \cdots & H_{ND}(x, y)_{s_S}^{\sigma_\Sigma} \end{bmatrix}, \quad D_{CP}(x, y) = \begin{bmatrix} H_{CP}(x, y)_{s_1}^{\sigma_1} & \cdots & H_{CP}(x, y)_{s_S}^{\sigma_1} \\ \vdots & \ddots & \vdots \\ H_{CP}(x, y)_{s_1}^{\sigma_\Sigma} & \cdots & H_{CP}(x, y)_{s_S}^{\sigma_\Sigma} \end{bmatrix}.$$

The final descriptor at \((x, y)\) combines the above features: \(D_{VPR}(x, y) = [D_A(x, y), D_C(x, y), D_{CP}(x, y), D_{ND}(x, y)]\). Next, the depth maps (DM) are reconstructed using a block matching procedure based on comparing the features of the two images of a stereo pair.
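Since the paper does not give the matching code, the sketch below uses plain winner-takes-all matching of the per-pixel descriptors along a scan line as a stand-in for the block matching step.

```python
def disparity_map(desc_left, desc_right, max_disp):
    """Disparity by nearest-descriptor search along the same scan line.

    desc_left, desc_right: arrays of shape (H, W, K) holding a K-dimensional
    descriptor per pixel; returns an integer disparity map of shape (H, W).
    """
    H, W = desc_left.shape[:2]
    dm = np.zeros((H, W), dtype=np.int32)
    for y in range(H):
        for x in range(W):
            costs = [np.linalg.norm(desc_left[y, x] - desc_right[y, x - d])
                     for d in range(min(max_disp, x) + 1)]
            dm[y, x] = int(np.argmin(costs))
    return dm
```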

NUMERICAL RESULTS

We studied synthetic benchmark images taken from the Middlebury Stereo Vision website [14, 15]. The 2005 dataset contains a series of different stereo pairs together with ground-truth (GT) images (true depth maps) in the full format (1390 × 1110 pixels), as well as in 1/2 and 1/3 formats. Additionally, the robustness of the new descriptor was examined using the 2014 dataset, which contains 33 stereo pairs divided into three groups (10 for training, 10 for testing, and 13 additional pairs without GT). Moreover, for each stereo pair, this dataset provides two images obtained under different illumination (L) or exposure (E) conditions.

All data in the experiments were processed together in order to confirm the efficiency and robustness of the VPR descriptor as applied to the reconstruction of depth maps for stereo pairs obtained under different conditions.

The quality of depth map reconstruction was analyzed using the QBP criterion (the fraction of bad matching pixels). For each of the studied DM images, the QBP value was computed by the formula

$$QBP = \frac{1}{N}\sum_{(x,y)} \left( \lvert DM_I(x,y) - DM_{GT}(x,y) \rvert > \delta_d \right), \qquad (3)$$

where \(N\) is the number of pixels in the image or frame; \(DM_I\) and \(DM_{GT}\) are the estimated and true (GT) depth maps, respectively; \(\delta_d = 1\); and the parenthesized inequality is treated as an indicator equal to 1 when it holds and 0 otherwise.
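Equation (3) is a one-liner in NumPy: the comparison produces a Boolean mask whose mean is the fraction of bad pixels.

```python
def qbp(dm_est, dm_gt, delta_d=1.0):
    """QBP of Eq. (3): fraction of pixels whose disparity error
    exceeds delta_d."""
    return float(np.mean(np.abs(dm_est - dm_gt) > delta_d))
```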

In Fig. 2, the depth maps reconstructed with the VPR descriptor are visually compared with those produced by the DAISY and SID descriptors, which are the best descriptors available in the literature; the traditional block matching technique is used to compute the DM in all three methods.

Fig. 2. Results of depth map reconstruction: (a) true (GT) depth map and the maps reconstructed using the following descriptors: (b) DAISY, (c) SID, (d) VPR, (e) DAISY E (nonideal exposure), and (f) VPR E (nonideal exposure) for the Adirondack, Piano, and Playtable images (from top to bottom).

The presented images show that, under different illumination (L) and exposure (E) conditions, the SID descriptor strongly smooths the objects and fails to estimate the depth maps correctly. The depth maps reconstructed by VPR and DAISY are of similar quality both in terms of the quantitative metric and in a subjective analysis. An analysis of the various images (Fig. 2, Table 1) suggests that the VPR descriptor is the most robust to imperfections in stereo image pairs.

Table 1. QBP values for VPR and other descriptors as applied to depth map reconstruction in the case of nonideal exposure (E) and illumination (L) of stereo pairs

Since the traditional block matching technique is used in the depth map reconstruction, the occlusion problem cannot be resolved directly. However, the new VPR descriptor can be used in conjunction with other matching algorithms, such as semiglobal matching or graph cuts.

CONCLUSIONS

An analysis of the numerical results produced by the new method for depth map reconstruction suggests the following important conclusions:

(i) The proposed and substantiated method is based on visual primitives and the relations between them, namely, the cocolority, coplanarity, distance, and angle between the primitives. The method makes it possible to improve the quality of reconstructed depth maps.

(ii) The performance characteristics of the new VPR descriptor are superior to those of other descriptors widely used in the literature, namely, DAISY and SID.

(iii) It has been confirmed experimentally that the VPR descriptor demonstrates the best robustness in depth map formation in the case of radiometric differences in the exposure and illumination of stereo image pairs.