Abstract
For humans, it is natural to decompose an image into objects and a background scene, yet modern generative models usually synthesize images at the scene level, which makes it challenging to control the style and quality of individual object instances. We propose an instance-quantized conditional generative model for synthesizing images with high-fidelity instances of multiple classes. Specifically, we train two generators simultaneously: a scene generator \(G_S\) that synthesizes the background environment and an instance generator \(G_I\) that synthesizes each object instance individually. We design a differentiable image-compositing layer that assembles the final image and enables effective error back-propagation. For both generators we develop a new architecture based on modulated convolutional blocks. We evaluate our model and baselines on the ADE20k, MHPv2, and Cityscapes datasets and show that our instance-quantized framework outperforms the baselines in FID and mIoU scores. Moreover, our approach allows us to control the style of each object separately and to learn fine texture details. We demonstrate the effectiveness of our framework in a wide range of image manipulation tasks.
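The abstract does not give the compositing equation, but a differentiable compositing layer of this kind is commonly realized as soft alpha blending of instance crops onto the scene. The sketch below illustrates that idea in NumPy; the function name, argument layout, and blend order are assumptions for illustration, not the authors' implementation. Because the blend is a weighted sum of elementwise products, the same operations in an autograd framework would propagate gradients back to both the scene and instance generators.

```python
import numpy as np

def composite(scene, instances, masks):
    """Alpha-composite instance images onto a background scene.

    scene:     (H, W, 3) array, output of the scene generator
    instances: list of (H, W, 3) arrays, outputs of the instance generator
    masks:     list of (H, W, 1) soft alpha masks with values in [0, 1]

    Instances are blended back-to-front: each soft mask mixes the
    instance with the image composited so far.
    """
    out = scene.astype(np.float32)
    for inst, mask in zip(instances, masks):
        # Weighted sum: mask selects the instance, (1 - mask) keeps
        # the current composite. All ops are differentiable.
        out = mask * inst.astype(np.float32) + (1.0 - mask) * out
    return out
```

With a mask of all ones the instance fully replaces the scene in its region; with all zeros the scene is untouched, so soft masks give a smooth, trainable transition between the two generators' outputs.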
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Kniaz, V., Knyaz, V., Moshkantsev, P. (2023). IQ-GAN: Instance-Quantized Image Synthesis. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research VI. NEUROINFORMATICS 2022. Studies in Computational Intelligence, vol 1064. Springer, Cham. https://doi.org/10.1007/978-3-031-19032-2_30
DOI: https://doi.org/10.1007/978-3-031-19032-2_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19031-5
Online ISBN: 978-3-031-19032-2
eBook Packages: Intelligent Technologies and Robotics (R0)