Abstract
Recent advances in computer vision and deep learning have shown that fusing depth information can significantly enhance the performance of RGB-based damage detection and segmentation models. Alongside these advantages, however, depth sensing presents several practical challenges. For instance, depth sensors impose an additional payload on robotic inspection platforms, limiting operation time and increasing inspection cost. Additionally, some LiDAR-based depth sensors perform poorly outdoors owing to sunlight contamination during the daytime. In this context, this study investigates the feasibility of eliminating depth sensing at test time without compromising segmentation performance. An autonomous damage segmentation framework is developed based on recent advances in vision-based multi-modal sensing, namely modality hallucination (MH) and monocular depth estimation (MDE), which require depth data only during model training. At deployment, depth data becomes expendable because it can be synthesized from the corresponding RGB frames, making it possible to reap the benefits of depth fusion without any depth perception per se. Two depth encoding techniques and three fusion strategies are explored in addition to a baseline RGB-based model. The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage. The surrogate techniques are observed to increase segmentation IoU by up to 20.1% with a negligible increase in computation cost. Overall, this study is believed to make a positive contribution toward enhancing the resilience of critical civil infrastructure.
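For intuition only, the following is a minimal PyTorch sketch of the core contract described above: a hallucination branch learns to predict depth from RGB under depth supervision during training, so that at test time depth features can be fused without a depth sensor. This is not the authors' implementation; all module names, layer sizes, and loss weights are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of test-time depth
# hallucination for RGB-D damage segmentation. All names and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Two 3x3 convolutions with ReLU, a generic encoder building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class HallucinatedDepthSegNet(nn.Module):
    """RGB branch plus a depth-hallucination branch, fused for segmentation.

    During training the hallucination branch is supervised against real
    depth maps; at deployment only the RGB frame is required.
    """
    def __init__(self, num_classes=4):
        super().__init__()
        self.rgb_enc = conv_block(3, 32)
        self.depth_halluc = nn.Sequential(conv_block(3, 16),
                                          nn.Conv2d(16, 1, 1))  # pseudo-depth
        self.depth_enc = conv_block(1, 32)
        # Late fusion of the two feature streams, then a pixel classifier.
        self.fuse = conv_block(64, 32)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, rgb):
        pseudo_depth = self.depth_halluc(rgb)  # depth synthesized from RGB
        feats = torch.cat([self.rgb_enc(rgb),
                           self.depth_enc(pseudo_depth)], dim=1)
        return self.head(self.fuse(feats)), pseudo_depth

# Training combines a segmentation loss with a depth-reconstruction loss;
# the depth sensor is needed only to collect `true_depth` for training.
model = HallucinatedDepthSegNet()
rgb = torch.randn(2, 3, 128, 128)          # batch of RGB frames
true_depth = torch.randn(2, 1, 128, 128)   # available only at train time
labels = torch.randint(0, 4, (2, 128, 128))
logits, pseudo_depth = model(rgb)
loss = (nn.functional.cross_entropy(logits, labels)
        + nn.functional.l1_loss(pseudo_depth, true_depth))
loss.backward()
```

In a full system the hallucination branch would be a complete MDE network (e.g., an encoder-decoder), and fusion could occur at multiple feature levels rather than once; the sketch only fixes the train-with-depth, deploy-with-RGB-only pattern.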
Acknowledgement
This study was supported in part by a fund from Bentley Systems, Inc.
Cite this article
Mondal, T.G., Jahanshahi, M.R. Fusion of color and hallucinated depth features for enhanced multimodal deep learning-based damage segmentation. Earthq. Eng. Eng. Vib. 22, 55–68 (2023). https://doi.org/10.1007/s11803-023-2155-2