
Knowledge Distillation Framework


Training a GAN from scratch is an intricate procedure [1], especially on complex datasets. In addition, current state-of-the-art models [2][3] show a clear trend of scaling up to achieve better performance. The question, then, is whether it is possible to generate high-quality images with a smaller model. Romero et al. (2014) [4] propose a knowledge distillation framework for network compression, in which the knowledge of a pre-trained teacher network (big model) is used to train a student network (small model); the student achieves results comparable to the teacher's while requiring significantly fewer parameters. Chang and Lu (2020) [5] adopted this framework and proposed a black-box knowledge distillation method designed for GANs. Their model, TinyGAN, successfully distills BigGAN, achieving competitive performance while reducing the number of parameters by a factor of 16.
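To give a rough feel for the black-box setting, the sketch below trains a small student generator to mimic a frozen teacher's outputs for shared latent codes and class labels using a pixel-level L1 loss. This is a minimal PyTorch illustration, not TinyGAN itself: the `ToyGenerator` modules are hypothetical stand-ins for BigGAN and the student architecture, and TinyGAN additionally uses adversarial and feature-level distillation losses on top of the pixel term.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the pre-trained teacher (e.g. BigGAN) and the
# much smaller student; in practice these would be full conditional generators.
z_dim, num_classes, img_pixels = 128, 10, 32 * 32 * 3

class ToyGenerator(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.embed = nn.Embedding(num_classes, z_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, img_pixels), nn.Tanh(),
        )

    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=1))

teacher = ToyGenerator(hidden=1024).eval()   # frozen "big" model
student = ToyGenerator(hidden=64)            # compact student
for p in teacher.parameters():
    p.requires_grad_(False)                  # teacher is never updated

pixel_loss = nn.L1Loss()
opt = torch.optim.Adam(student.parameters(), lr=2e-4)

for step in range(100):
    z = torch.randn(16, z_dim)                        # shared latent codes
    y = torch.randint(0, num_classes, (16,))          # shared class labels
    with torch.no_grad():
        teacher_imgs = teacher(z, y)                  # black-box queries only
    loss = pixel_loss(student(z, y), teacher_imgs)    # mimic teacher outputs
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the teacher is queried only through its inputs and outputs, the student never needs access to the teacher's weights or intermediate activations, which is what makes the approach "black-box".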

References

[1] Salimans, Tim, et al. "Improved techniques for training GANs." Advances in Neural Information Processing Systems 29 (2016).

[2] Brock, Andrew, Jeff Donahue, and Karen Simonyan. "Large scale GAN training for high fidelity natural image synthesis." arXiv preprint arXiv:1809.11096 (2018).

[3] Kang, Minguk, et al. "Rebooting ACGAN: Auxiliary classifier GANs with stable training." Advances in Neural Information Processing Systems 34 (2021): 23505-23518.

[4] Romero, Adriana, et al. "FitNets: Hints for thin deep nets." arXiv preprint arXiv:1412.6550 (2014).

[5] Chang, Ting-Yun, and Chi-Jen Lu. "TinyGAN: Distilling BigGAN for conditional image generation." Proceedings of the Asian Conference on Computer Vision. 2020.