Abstract
With the advancement of intelligent agents, 3D scene understanding has become one of the key tasks of computer vision. 3D scenes are challenging to represent effectively because objects form various relationships and constantly interact with each other. A scene graph is a powerful tool for concisely representing the properties and relationships of objects in a scene, which enables various multi-modal tasks. Research on 3D scene graphs (3DSG) is therefore attracting increasing attention. However, 3DSG research is still at an early stage and lacks a systematically organized survey. In this paper, we survey the latest advances in 3DSG research. In addition, we clarify 3DSG concepts that are currently defined in various ways, discuss real-world applications, and present future research directions.
J. Bae and D. Shin—The two authors contributed equally to this paper.
Notes
1. Note that the terms SG generation (SGG), SG prediction, and SG construction are used interchangeably.
Acknowledgement
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00907, Development of AI Bots Collaboration Platform and Self-organizing AI).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bae, J., Shin, D., Ko, K., Lee, J., Kim, UH. (2023). A Survey on 3D Scene Graphs: Definition, Generation and Application. In: Jo, J., et al. Robot Intelligence Technology and Applications 7. RiTA 2022. Lecture Notes in Networks and Systems, vol 642. Springer, Cham. https://doi.org/10.1007/978-3-031-26889-2_13
DOI: https://doi.org/10.1007/978-3-031-26889-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26888-5
Online ISBN: 978-3-031-26889-2
eBook Packages: Intelligent Technologies and Robotics (R0)