IQ-GAN: Instance-Quantized Image Synthesis

  • Conference paper

Published in: Advances in Neural Computation, Machine Learning, and Cognitive Research VI (NEUROINFORMATICS 2022)

Part of the book series: Studies in Computational Intelligence (SCI, volume 1064)

Abstract

For humans, it is natural to decompose an image into objects and a background scene, yet modern generative models usually analyze an image at the scene level. Hence, it is challenging to control the style and quality of individual object instances. We propose an instance-quantized conditional generative model for synthesizing images with high-fidelity instances of multiple classes. Specifically, we train two generators simultaneously: a scene generator that synthesizes the background environment and an instance generator that synthesizes each object instance individually. We design a differentiable image compositing layer that assembles the resulting image and allows effective error back-propagation. For our generators \(G_S\) and \(G_I\), we develop a new architecture leveraging modulated convolutional blocks. We evaluate our model and baselines on the ADE20K, MHPv2, and Cityscapes datasets and show that our instance-quantized framework outperforms the baselines in terms of FID and mIoU scores. Moreover, our approach allows us to control the style of each object separately and to learn fine texture details. We demonstrate the effectiveness of our framework on a wide range of image manipulation tasks.
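
The abstract names two mechanisms worth unpacking. First, the differentiable compositing layer: since this page carries no code, the following is a minimal PyTorch sketch under the assumption that the layer is a soft alpha ("over") blend of instance renderings onto the scene canvas, so that a loss on the composite back-propagates to both \(G_S\) and \(G_I\). The function name, argument layout, and mask convention are hypothetical, not the authors' implementation.

```python
import torch

def composite(scene, instances, masks):
    """Hypothetical differentiable compositing of instances onto a scene.

    scene:     (B, 3, H, W) background produced by the scene generator G_S
    instances: list of (B, 3, H, W) renderings from the instance generator G_I,
               already placed in scene coordinates
    masks:     list of (B, 1, H, W) soft alpha masks in [0, 1]
    """
    out = scene
    for img, alpha in zip(instances, masks):
        # Standard "over" blend; a weighted sum, hence fully differentiable.
        out = alpha * img + (1.0 - alpha) * out
    return out

# Gradients from a loss on the composite reach both generators' outputs:
B, H, W = 2, 64, 64
scene = torch.rand(B, 3, H, W, requires_grad=True)
inst = [torch.rand(B, 3, H, W, requires_grad=True)]
mask = [torch.rand(B, 1, H, W)]
composite(scene, inst, mask).mean().backward()
```

Second, the modulated convolutional blocks. The phrase points to StyleGAN2-style modulation [20], in which a per-sample style vector scales the kernel's input channels and the scaled kernel is then renormalized (demodulated). Below is a compact sketch of that standard operation; it is not the authors' exact block.

```python
import torch
import torch.nn.functional as F

class ModulatedConv2d(torch.nn.Module):
    """StyleGAN2-style modulated convolution (in the spirit of [20])."""

    def __init__(self, in_ch, out_ch, style_dim, k=3):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_ch, in_ch, k, k))
        self.affine = torch.nn.Linear(style_dim, in_ch)  # style -> channel scales
        self.pad = k // 2

    def forward(self, x, style):
        b, c, h, w = x.shape
        s = self.affine(style).view(b, 1, c, 1, 1)
        wgt = self.weight.unsqueeze(0) * s                      # modulate: (B, Cout, Cin, k, k)
        demod = torch.rsqrt(wgt.pow(2).sum(dim=(2, 3, 4)) + 1e-8)
        wgt = wgt * demod.view(b, -1, 1, 1, 1)                  # demodulate
        # Grouped convolution gives every sample its own modulated kernel.
        x = x.reshape(1, b * c, h, w)
        wgt = wgt.reshape(-1, c, *wgt.shape[3:])
        out = F.conv2d(x, wgt, padding=self.pad, groups=b)
        return out.reshape(b, -1, h, w)
```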

References

  1. Mejjati, Y.A., Richardt, C., Tompkin, J., Cosker, D., Kim, K.I.: Unsupervised attention-guided image-to-image translation. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems 31, pp. 3693–3703. Curran Associates, Inc. (2018). http://papers.nips.cc/paper/7627-unsupervised-attention-guided-image-to-image-translation.pdf

  2. Azadi, S., Pathak, D., Ebrahimi, S., Darrell, T.: Compositional GAN: learning image-conditional binary composition (2019)

  3. Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018)

  4. Chen, B., Kae, A.: Toward realistic image compositing with adversarial learning. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019, pp. 8415–8424 (2019). https://doi.org/10.1109/CVPR.2019.00861. http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_Toward_Realistic_Image_Compositing_With_Adversarial_Learning_CVPR_2019_paper.html

  5. Cheng, Y.C., Lee, H.Y., Sun, M., Yang, M.H.: Controllable image synthesis via SegVAE (2020)

  6. Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

  7. Goodfellow, I.J., et al.: Generative Adversarial Networks. arXiv preprint arXiv:1406.2661 (2014)

  8. Härkönen, E., Hertzmann, A., Lehtinen, J., Paris, S.: GANSpace: discovering interpretable GAN controls. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, 6–12 Dec 2020, virtual (2020). https://proceedings.neurips.cc/paper/2020/hash/6fe43269967adbb64ec6149852b5cc3e-Abstract.html

  9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016, pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

  10. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4–9 Dec 2017, Long Beach, CA, USA, pp. 6626–6637 (2017). http://papers.nips.cc/paper/7240-gans-trained-by-a-two-time-scale-update-rule-converge-to-a-local-nash-equilibrium

  11. Huang, X., Belongie, S.J.: Arbitrary style transfer in real-time with adaptive instance normalization. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 Oct 2017, pp. 1510–1519 (2017). https://doi.org/10.1109/ICCV.2017.167

  12. Huang, X., Liu, M.-Y., Belongie, S., Kautz, J.: Multimodal unsupervised image-to-image translation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 179–196. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_11

  13. Huh, M., Zhang, R., Zhu, J.Y., Paris, S., Hertzmann, A.: Transforming and projecting images into class-conditional generative networks (2020)

  14. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976 (2017). https://doi.org/10.1109/CVPR.2017.632

  15. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976. IEEE (2017)

  16. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 7–12 Dec 2015, Montreal, Quebec, Canada, pp. 2017–2025 (2015). https://proceedings.neurips.cc/paper/2015/hash/33ceb07bf4eeb3da587e268d663aba1a-Abstract.html

  17. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 Apr – 3 May 2018, Conference Track Proceedings (2018). https://openreview.net/forum?id=Hk99zCeAb

  18. Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., Aila, T.: Training generative adversarial networks with limited data. In: Proceedings NeurIPS (2020)

  19. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, 16–20 June 2019, pp. 4401–4410 (2019). https://doi.org/10.1109/CVPR.2019.00453. http://openaccess.thecvf.com/content_CVPR_2019/html/Karras_A_Style-Based_Generator_Architecture_for_Generative_Adversarial_Networks_CVPR_2019_paper.html

  20. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of StyleGAN. arXiv preprint arXiv:1912.04958 (2019)

  21. Kniaz, V.V., Knyaz, V., Remondino, F.: The point where reality meets fantasy: mixed adversarial generators for image splice detection. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 215–226. Curran Associates, Inc. (2019). http://papers.nips.cc/paper/8315-the-point-where-reality-meets-fantasy-mixed-adversarial-generators-for-image-splice-detection.pdf

  22. Kniaz, V.V., Knyaz, V.A., Mizginov, V., Kozyrev, M., Moshkantsev, P.: StructureFromGAN: single image 3D model reconstruction and photorealistic texturing. In: Bartoli, A., Fusiello, A. (eds.) ECCV 2020. LNCS, vol. 12536, pp. 595–611. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66096-3_40

  23. Kniaz, V.V., Knyaz, V.A., Mizginov, V., Papazyan, A., Fomin, N., Grodzitsky, L.: Adversarial dataset augmentation using reinforcement learning and 3D modeling. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds.) NEUROINFORMATICS 2020. SCI, vol. 925, pp. 316–329. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-60577-3_38

  24. Li, T., Qian, R., Dong, C., Liu, S., Yan, Q., Zhu, W., Lin, L.: BeautyGAN: instance-level facial makeup transfer with deep generative adversarial network. In: 2018 ACM Multimedia Conference, MM 2018, pp. 645–653 (2018). https://doi.org/10.1145/3240508.3240618

  25. Lin, C., Yumer, E., Wang, O., Shechtman, E., Lucey, S.: ST-GAN: spatial transformer generative adversarial networks for image compositing. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, 18–22 June 2018, pp. 9455–9464. IEEE Computer Society (2018). https://doi.org/10.1109/CVPR.2018.00985. http://openaccess.thecvf.com/content_cvpr_2018/html/Lin_ST-GAN_Spatial_Transformer_CVPR_2018_paper.html

  26. Liu, M.Y., Breuel, T., Kautz, J.: Unsupervised image-to-image translation networks. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 700–708. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/6672-unsupervised-image-to-image-translation-networks.pdf

  27. Ma, S., Fu, J., Chen, C.W., Mei, T.: DA-GAN: instance-level image translation by deep attention generative adversarial networks (with supplementary materials) (2018)

  28. Mejjati, Y.A., Richardt, C., Tompkin, J., Cosker, D., Kim, K.I.: Unsupervised attention-guided image-to-image translation (2018)

  29. Mo, S., Cho, M., Shin, J.: InstaGAN: instance-aware image-to-image translation (2019)

  30. Monnier, T., Vincent, E., Ponce, J., Aubry, M.: Unsupervised layered image decomposition into object prototypes. arXiv preprint arXiv:2104.14575 (2021)

  31. Park, T., Liu, M.Y., Wang, T.C., Zhu, J.Y.: Semantic image synthesis with spatially-adaptive normalization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)

  32. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)

  33. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  34. Schönfeld, E., Sushko, V., Zhang, D., Gall, J., Schiele, B., Khoreva, A.: You only need adversarial supervision for semantic image synthesis. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=yvQKLaqNE6M

  35. Schor, N., Katzir, O., Zhang, H., Cohen-Or, D.: CompoNet: learning to generate the unseen by part synthesis and composition (2019)

  36. Shen, Z., Huang, M., Shi, J., Xue, X., Huang, T.: Towards instance-level image-to-image translation (2019)

  37. Shen, Z., Zhou, S.K., Chen, Y., Georgescu, B., Liu, X., Huang, T.S.: One-to-one mapping for unpaired image-to-image translation (2020)

  38. Su, J.W., Chu, H.K., Huang, J.B.: Instance-aware image colorization (2020)

  39. Viazovetskyi, Y., Ivashkin, V., Kashin, E.: StyleGAN2 distillation for feed-forward image manipulation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12367, pp. 170–186. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58542-6_11

  40. Volokitin, A., Susmelj, I., Agustsson, E., Van Gool, L., Timofte, R.: Efficiently detecting plausible locations for object placement using masked convolutions. In: Bartoli, A., Fusiello, A. (eds.) ECCV 2020. LNCS, vol. 12538, pp. 252–266. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-66823-5_15

  41. Wang, T.C., Liu, M.Y., Zhu, J.Y., Tao, A., Kautz, J., Catanzaro, B.: High-resolution image synthesis and semantic manipulation with conditional GANs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)

  42. Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11209, pp. 432–448. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01228-1_26

  43. Zhan, F., Lu, S., Zhang, C., Ma, F., Xie, X.: Adversarial image composition with auxiliary illumination (2021)

  44. Zhang, L., Wen, T., Min, J., Wang, J., Han, D., Shi, J.: Learning object placement by inpainting for compositional data augmentation, pp. 566–581 (2020). https://doi.org/10.1007/978-3-030-58601-0_34

  45. Zhang, P., Zhang, B., Chen, D., Yuan, L., Wen, F.: Cross-domain correspondence learning for exemplar-based image translation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

  46. Zhang, Y., Hassan, M., Neumann, H., Black, M.J., Tang, S.: Generating 3D people in scenes without people (2020)

  47. Zhao, J., Li, J., Cheng, Y., Sim, T., Yan, S., Feng, J.: Understanding humans in crowded scenes: deep nested adversarial learning and a new benchmark for multi-human parsing. In: Boll, S., et al. (eds.) 2018 ACM Multimedia Conference on Multimedia Conference, MM 2018, Seoul, Republic of Korea, 22–26 Oct 2018, pp. 792–800. ACM (2018). https://doi.org/10.1145/3240508.3240509

  48. Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5122–5130 (2017). https://doi.org/10.1109/CVPR.2017.544

  49. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE International Conference on Computer Vision (ICCV) (2017)

  50. Zhu, J.Y., et al.: Toward multimodal image-to-image translation. In: Advances in Neural Information Processing Systems (2017)

Author information

Corresponding author

Correspondence to Vladimir Kniaz.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kniaz, V., Knyaz, V., Moshkantsev, P. (2023). IQ-GAN: Instance-Quantized Image Synthesis. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds) Advances in Neural Computation, Machine Learning, and Cognitive Research VI. NEUROINFORMATICS 2022. Studies in Computational Intelligence, vol 1064. Springer, Cham. https://doi.org/10.1007/978-3-031-19032-2_30
