Black-Box Distillation

We adopt the knowledge distillation framework proposed by Chang et al. (2020) [1], in which the teacher network is treated as a black box, requiring access only to its input-output pairs. Specifically, the teacher model is queried with random noise vectors and class labels, and its outputs, the generated images, are collected. The resulting dataset contains not only the generated images but also the corresponding noise vectors and class labels, and is then used to train the student network in a supervised manner. Under this approach, no knowledge of the teacher's internal states or intermediate features is required. Moreover, once the dataset has been created, the teacher network can be discarded, since it no longer participates in the training of the student network.
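To make the workflow concrete, below is a minimal PyTorch sketch of the two stages: querying the black-box teacher to collect (noise, label, image) triples, and then training the student on the stored triples in a supervised way. The names teacher, student, latent_dim, and num_classes are placeholders (not the actual implementation), and the plain pixel-level L1 loss is a simplified stand-in for the full set of losses used in Chang et al. (2020) [1].

```python
# Sketch only: assumes a teacher generator callable as teacher(z, y) -> images
# and a student with the same interface. All names here are hypothetical.
import torch
from torch.utils.data import TensorDataset, DataLoader

latent_dim, num_classes = 128, 1000   # assumed dimensions
num_samples, batch = 10_000, 64

@torch.no_grad()
def collect_teacher_outputs(teacher):
    """Query the black-box teacher and store (noise, label, image) triples."""
    zs, ys, imgs = [], [], []
    for _ in range(num_samples // batch):
        z = torch.randn(batch, latent_dim)           # random noise input
        y = torch.randint(0, num_classes, (batch,))  # random class labels
        imgs.append(teacher(z, y).cpu())             # only input-output access is needed
        zs.append(z)
        ys.append(y)
    # After this step the teacher can be discarded entirely.
    return TensorDataset(torch.cat(zs), torch.cat(ys), torch.cat(imgs))

def train_student(student, dataset, epochs=10):
    """Supervised training: the student learns to reproduce the stored teacher outputs."""
    loader = DataLoader(dataset, batch_size=batch, shuffle=True)
    opt = torch.optim.Adam(student.parameters(), lr=2e-4)
    for _ in range(epochs):
        for z, y, target in loader:
            # Pixel-level L1 loss as a simplified stand-in for the paper's objectives.
            loss = torch.nn.functional.l1_loss(student(z, y), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```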

Figure: Black-Box TinyGAN. Illustration of the proposed black-box distillation in Chang et al. (2020) [1].

References

[1] Chang, Ting-Yun, and Chi-Jen Lu. "TinyGAN: Distilling BigGAN for Conditional Image Generation." Proceedings of the Asian Conference on Computer Vision, 2020.