Conditional Image Generation

Thanos Masouris edited this page Aug 28, 2022 · 11 revisions

Figure: Overview of a Conditional GAN (CGAN)

Image generation is a long-studied task in computer vision. Many approaches build on Generative Adversarial Networks (GANs) [1], which produce synthetic images from nothing more than a random noise vector. Recent models [2] can generate images that are nearly indistinguishable from real ones, even on complex datasets such as ImageNet [3].

A more practical, and also more complex, task is conditional image generation, in which a generative model synthesizes realistic-looking images that satisfy given input conditions. The conditions can be attributes, text descriptions, or class labels, among others. Recent models [4][5] generate high-quality, high-fidelity images, but they do so with millions of parameters and therefore require substantial computational resources.
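To make the idea of an "input condition" concrete, the sketch below shows one simple way a class label can be fed to a generator: the label is one-hot encoded and concatenated with the noise vector. This is only an illustration under stated assumptions. The helper names (`one_hot`, `sample_noise`, `conditional_input`) and the dimensions are hypothetical, and the models cited in [4][5] use more elaborate conditioning mechanisms than plain concatenation.

```python
import random


def one_hot(label, num_classes):
    """Return a one-hot list that encodes `label` among `num_classes` classes."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def sample_noise(dim, rng):
    """Draw a noise vector z with `dim` standard-normal entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def conditional_input(z, label, num_classes):
    """Build the generator input by concatenating noise and the encoded label.

    An unconditional generator would receive only `z`; appending the
    one-hot label is what makes the input class-conditional.
    """
    return z + one_hot(label, num_classes)


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed, for reproducibility only
    z = sample_noise(4, rng)         # hypothetical noise dimension of 4
    g_in = conditional_input(z, label=2, num_classes=3)
    print(len(g_in))                 # 4 noise entries + 3 label entries = 7
    print(g_in[-3:])                 # [0.0, 0.0, 1.0]
```

In a real model this combined vector would be passed through a neural network that outputs an image, and the discriminator would typically receive the same label alongside the image so that it can judge whether the image matches the condition.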

References

[1] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems 27 (2014).

[2] Sauer, Axel, Katja Schwarz, and Andreas Geiger. "StyleGAN-XL: Scaling StyleGAN to large diverse datasets." arXiv preprint arXiv:2202.00273 (2022).

[3] Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009).

[4] Kang, Minguk, et al. "Rebooting ACGAN: Auxiliary classifier GANs with stable training." Advances in Neural Information Processing Systems 34 (2021): 23505-23518.

[5] Brock, Andrew, Jeff Donahue, and Karen Simonyan. "Large scale GAN training for high fidelity natural image synthesis." arXiv preprint arXiv:1809.11096 (2018).