Objectives

Thanos Masouris edited this page Jul 12, 2022 · 6 revisions

_Figure: Illustration of the TinyGAN training objectives, from Chang et al. (2020) [1]._

The main goal of the knowledge distillation framework is for the student generator to approximate the distribution of the teacher generator, i.e., to mimic its functionality. With this goal in mind, we leverage the objectives described in [1]. In particular, the authors propose the following objective for the generator:

$$\mathcal{L}_S = \mathcal{L}_{KD\_feat} + \lambda_1 \mathcal{L}_{KD\_pix} + \lambda_2\mathcal{L}_{KD\_S} + \lambda_3\mathcal{L}_{GAN\_S}$$

Where,

  • The Feature-Level Distillation Loss is computed from feature maps extracted by the discriminator network for both the images generated by the student and those generated by the teacher. Specifically, it is the weighted sum of the L1 distances between the discriminator's feature maps $D_i$ at different levels, computed for each pair of student- and teacher-generated images. This term alleviates the blurriness of the generated images that occurs when only the pixel-level distillation loss described below is used.
$$L_{KD\_feat}=\mathbb{E}_{z, y}\left[\Sigma_{i} \alpha_{i}\left\|D_{i}(T(z, y), y)-D_{i}(S(z, y), y)\right\|_{1}\right]$$
  • The Pixel-Level Distillation Loss is the pixel-wise L1 distance between the images generated by the teacher and the student networks. This loss is particularly important in the early stages of training, as it provides direct supervision while the student learns to mimic the teacher's functionality.
$$L_{KD\_pix}=\mathbb{E}_{z \sim p(z), y \sim q(y)}\left[\|T(z, y)-S(z, y)\|_{1}\right]$$
  • The Adversarial Distillation Loss pushes the student generator to better approximate the teacher's distribution. Specifically, the discriminator provides an adversarial loss for the images generated by the student network from the same inputs $(z, y)$ fed to the teacher network.
$$L_{KD\_S}=-\mathbb{E}_{z, y}[D(S(z, y), y)]$$
  • The Adversarial GAN Loss ensures that the model also learns to produce images similar to those in the selected dataset. It uses the discriminator to produce an adversarial loss for images generated by the student network from freshly sampled inputs. The generator loss $L_{GAN\_S}$ has the same form as $L_{KD\_S}$.
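Putting the four generator terms together, the objective $\mathcal{L}_S$ can be sketched in NumPy as below. Function names, signatures, and the default $\lambda$ weights are illustrative (they are not from the TinyGAN code); in practice these losses operate on framework tensors so that gradients flow back to the student generator.

```python
import numpy as np

def l1(a, b):
    # mean absolute (L1) distance between two arrays
    return np.abs(a - b).mean()

def generator_objective(teacher_img, student_img,
                        teacher_feats, student_feats, alphas,
                        d_student_scores, d_student_gan_scores,
                        lambda1=1.0, lambda2=1.0, lambda3=1.0):
    # L_KD_feat: weighted sum of per-level L1 distances between
    # discriminator feature maps D_i(T(z, y), y) and D_i(S(z, y), y)
    l_feat = sum(a * l1(t, s)
                 for a, t, s in zip(alphas, teacher_feats, student_feats))
    # L_KD_pix: pixel-level L1 distance between teacher and student images
    l_pix = l1(teacher_img, student_img)
    # L_KD_S: hinge generator loss, -E[D(S(z, y), y)], on student images
    # produced from the same inputs as the teacher
    l_kd_s = -np.mean(d_student_scores)
    # L_GAN_S: same form, on student images from freshly sampled inputs
    l_gan_s = -np.mean(d_student_gan_scores)
    return l_feat + lambda1 * l_pix + lambda2 * l_kd_s + lambda3 * l_gan_s
```

Note that, matching the equation above, the feature-level term carries no $\lambda$ weight of its own; the three remaining terms are scaled by $\lambda_1$, $\lambda_2$, and $\lambda_3$.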

As for the discriminator, we use the following objective:

$$\mathcal{L}_{D} = \mathcal{L}_{KD\_D} + \lambda_4 \mathcal{L}_{GAN\_D}$$

Where,

  • The Adversarial Distillation Loss encourages the discriminator to distinguish images generated by the student network from those generated by the teacher network. In this case, the student's images are treated as fake and the teacher's images as real.
$$L_{KD\_D}=\mathbb{E}_{z, y}[\max (0,1-D(T(z, y), y)) + \max (0,1+D(S(z, y), y))]$$
  • The Adversarial GAN Loss encourages the discriminator to distinguish student-generated images from images drawn from the real data distribution.
$$L_{GAN\_D}=\mathbb{E}_{x, y}[\max (0,1-D(x, y))]+\mathbb{E}_{z, y}[\max (0,1+D(S(z, y), y))]$$
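Both discriminator terms share the same hinge form, so they can be sketched with a single NumPy helper. Names and the default $\lambda_4$ are illustrative; the inputs are the scalar discriminator scores for each image in a batch.

```python
import numpy as np

def hinge_d_loss(real_scores, fake_scores):
    # E[max(0, 1 - D(real, y))] + E[max(0, 1 + D(fake, y))]
    return (np.maximum(0.0, 1.0 - real_scores).mean()
            + np.maximum(0.0, 1.0 + fake_scores).mean())

def discriminator_objective(d_teacher, d_student_kd,
                            d_real, d_student_gan, lambda4=1.0):
    # L_KD_D: teacher images treated as real, student images as fake
    l_kd_d = hinge_d_loss(d_teacher, d_student_kd)
    # L_GAN_D: dataset images treated as real, student images as fake
    l_gan_d = hinge_d_loss(d_real, d_student_gan)
    return l_kd_d + lambda4 * l_gan_d
```

When the discriminator confidently separates real from fake (scores beyond the $\pm 1$ margin), the hinge clamps both terms to zero and no further gradient is applied.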

All of the aforementioned adversarial losses use the hinge version [2] of the adversarial loss instead of binary cross-entropy.

References

[1] Chang, Ting-Yun, and Chi-Jen Lu. "TinyGAN: Distilling BigGAN for Conditional Image Generation." Proceedings of the Asian Conference on Computer Vision. 2020.

[2] Lim, Jae Hyun, and Jong Chul Ye. "Geometric GAN." arXiv preprint arXiv:1705.02894 (2017).