Abstract
Recent advances in computer vision and deep learning have shown that fusing depth information can significantly enhance the performance of RGB-based damage detection and segmentation models. Alongside these advantages, however, depth sensing presents several practical challenges. For instance, depth sensors impose an additional payload on robotic inspection platforms, limiting operation time and increasing inspection cost. Additionally, some LiDAR-based depth sensors perform poorly outdoors owing to sunlight contamination during the daytime. In this context, this study investigates the feasibility of eliminating depth sensing at test time without compromising segmentation performance. An autonomous damage segmentation framework is developed based on recent advances in vision-based multi-modal sensing, namely modality hallucination (MH) and monocular depth estimation (MDE), which require depth data only during model training. At deployment, depth data becomes expendable because it can be synthesized from the corresponding RGB frames, making it possible to reap the benefits of depth fusion without any depth perception per se. Two depth encoding techniques and three fusion strategies are explored in addition to a baseline RGB-based model. The proposed approach is validated on computer-generated RGB-D data of reinforced concrete buildings subjected to seismic damage. The surrogate techniques are observed to increase segmentation IoU by up to 20.1% with a negligible increase in computation cost. Overall, this study is believed to make a positive contribution toward enhancing the resilience of critical civil infrastructure.
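For intuition only, the following is a minimal PyTorch sketch of the core contract described above: a hallucination branch learns to predict depth from RGB under depth supervision during training, so that at test time depth features can be fused without a depth sensor. This is not the authors' implementation; all module names, layer sizes, and loss weights are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of test-time depth
# hallucination for RGB-D damage segmentation. All names and sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Two 3x3 convolutions with ReLU, a generic encoder building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class HallucinatedDepthSegNet(nn.Module):
    """RGB branch plus a depth-hallucination branch, fused for segmentation.

    During training the hallucination branch is supervised against real
    depth maps; at deployment only the RGB frame is required.
    """
    def __init__(self, num_classes=4):
        super().__init__()
        self.rgb_enc = conv_block(3, 32)
        self.depth_halluc = nn.Sequential(conv_block(3, 16),
                                          nn.Conv2d(16, 1, 1))  # pseudo-depth
        self.depth_enc = conv_block(1, 32)
        # Late fusion of the two feature streams, then a pixel classifier.
        self.fuse = conv_block(64, 32)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, rgb):
        pseudo_depth = self.depth_halluc(rgb)  # depth synthesized from RGB
        feats = torch.cat([self.rgb_enc(rgb),
                           self.depth_enc(pseudo_depth)], dim=1)
        return self.head(self.fuse(feats)), pseudo_depth

# Training combines a segmentation loss with a depth-reconstruction loss;
# the depth sensor is needed only to collect `true_depth` for training.
model = HallucinatedDepthSegNet()
rgb = torch.randn(2, 3, 128, 128)          # batch of RGB frames
true_depth = torch.randn(2, 1, 128, 128)   # available only at train time
labels = torch.randint(0, 4, (2, 128, 128))
logits, pseudo_depth = model(rgb)
loss = (nn.functional.cross_entropy(logits, labels)
        + nn.functional.l1_loss(pseudo_depth, true_depth))
loss.backward()
```

In a full system the hallucination branch would be a complete MDE network (e.g., an encoder-decoder), and fusion could occur at multiple feature levels rather than once; the sketch only fixes the train-with-depth, deploy-with-RGB-only pattern.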
Acknowledgement
This study was supported in part by a fund from Bentley Systems, Inc.
Cite this article
Mondal, T.G., Jahanshahi, M.R. Fusion of color and hallucinated depth features for enhanced multimodal deep learning-based damage segmentation. Earthq. Eng. Eng. Vib. 22, 55–68 (2023). https://doi.org/10.1007/s11803-023-2155-2