A Survey on 3D Scene Graphs: Definition, Generation and Application

Bae, Jaewon; Shin, Dongmin; Ko, Kangbeen; Lee, Juchan; Kim, Ue-Hwan

doi:10.1007/978-3-031-26889-2_13

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 642)

Included in the following conference series:

International Conference on Robot Intelligence Technology and Applications

Abstract

With the advancement of intelligent agents, 3D scene understanding has become one of the key tasks of computer vision. 3D scenes are challenging to represent effectively because objects form various relationships and constantly interact with each other. A scene graph is a powerful tool for concisely representing the properties and relationships of objects in a scene, which enables various multi-modal tasks. Research on 3D scene graphs (3DSGs) is therefore attracting increasing attention. However, 3DSG research is still at an early stage and calls for a systematically organized survey. In this paper, we survey the latest advances in 3DSGs. In addition, we clarify 3DSG concepts that are currently defined in various ways, discuss real-world applicability, and present future research directions.
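As a rough illustration of the idea in the abstract (objects with properties as nodes, relationships between objects as edges), a scene graph can be sketched as a small data structure. This is a minimal toy sketch, not the representation used by the authors or by any specific 3DSG method; the class name `SceneGraph`, its methods, and the example objects and predicates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A relationship is a directed, labelled edge: (subject, predicate, object).
Triple = Tuple[str, str, str]


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical): objects are nodes, relationships are edges."""

    # Object name -> attribute dictionary, e.g. {"color": "white"}.
    nodes: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # Relationships between objects, stored in insertion order.
    edges: List[Triple] = field(default_factory=list)

    def add_object(self, name: str, **attributes: str) -> None:
        """Register an object together with its properties."""
        self.nodes[name] = dict(attributes)

    def add_relation(self, subject: str, predicate: str, obj: str) -> None:
        """Add a relationship; both endpoints must already be objects."""
        if subject not in self.nodes or obj not in self.nodes:
            raise KeyError("both endpoints must be added as objects first")
        self.edges.append((subject, predicate, obj))

    def relations_of(self, name: str) -> List[Triple]:
        """Return every relationship in which the object takes part."""
        return [e for e in self.edges if name in (e[0], e[2])]


# Illustrative scene: the objects and predicates are made up for this example.
graph = SceneGraph()
graph.add_object("cup", color="white", material="ceramic")
graph.add_object("table", color="brown", material="wood")
graph.add_object("chair", color="black")
graph.add_relation("cup", "standing on", "table")
graph.add_relation("chair", "next to", "table")

print(graph.relations_of("table"))
```

Querying `relations_of("table")` returns both triples that involve the table, which is the kind of compact lookup that makes such a graph convenient for downstream tasks.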

J. Bae and D. Shin contributed equally to this paper.

Notes

1. Note that SG generation (SGG), SG prediction, and SG construction are used interchangeably.

Acknowledgement

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00907, Development of AI Bots Collaboration Platform and Self-organizing AI).

Author information

Authors and Affiliations

GIST, 123 Cheomdangwagi-ro, Buk-gu, Gwangju, 61005, Republic of Korea
Jaewon Bae, Dongmin Shin, Kangbeen Ko, Juchan Lee & Ue-Hwan Kim

Corresponding author

Correspondence to Ue-Hwan Kim.

Editor information

Editors and Affiliations

School of Information and Communication Technology, Griffith University, Southport, Australia
Jun Jo
Department of Aerospace Engineering, KAIST, Daejeon, Korea (Republic of)
Han-Lim Choi
School of Information and Communication Technology, Griffith University, Southport, Australia
Marde Helbig
Department of Mechanical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Korea (Republic of)
Hyondong Oh
Department of Mechanical Engineering, KAIST, Daejeon, Korea (Republic of)
Jemin Hwangbo
Department of Mechanical Engineering, KAIST, Daejeon, Korea (Republic of)
Chang-Hun Lee
School of Information and Communication Technology, Griffith University, Southport, Australia
Bela Stantic

Cite this paper

Bae, J., Shin, D., Ko, K., Lee, J., Kim, U.-H. (2023). A Survey on 3D Scene Graphs: Definition, Generation and Application. In: Jo, J., et al. Robot Intelligence Technology and Applications 7. RiTA 2022. Lecture Notes in Networks and Systems, vol 642. Springer, Cham. https://doi.org/10.1007/978-3-031-26889-2_13

DOI: https://doi.org/10.1007/978-3-031-26889-2_13
Published: 01 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26888-5
Online ISBN: 978-3-031-26889-2
eBook Packages: Intelligent Technologies and Robotics (R0)
