Reconstruct Results / Implementation Details #8
I just noticed that the two "optimal" hyperparameters you mentioned in #3 don't match the KL values from the paper for PTB. For the Standard setting, your hyperparameters suggest lat_dim=50 and kappa=5 or kappa=35, which produce a KLD of 0.2 or 7.6, respectively. To get the 5.7 reported in the paper at the same dimension, kappa must be somewhere between 28 and 29. The same holds for Yelp: the configuration from #3 produces a KLD of 19.6, not the reported 18.6. The KLDs listed above were calculated with your vMF implementation from vmf_batch, using the following code:

>>> from NVLL.distribution.vmf_batch import *
>>> vMF(hid_dim=1, lat_dim=50, kappa=5).kld
tensor([0.2372], device='cuda:0')
>>> vMF(hid_dim=1, lat_dim=50, kappa=35).kld
tensor([7.6284], device='cuda:0')
>>> vMF(hid_dim=1, lat_dim=50, kappa=80).kld
tensor([19.5847], device='cuda:0')
>>> vMF(hid_dim=1, lat_dim=50, kappa=28.6).kld
tensor([5.6961], device='cuda:0')

Could you please provide the configurations that were actually used, or tell me whether I'm doing something wrong?
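For reference, the KL term under discussion has a closed form: KL(vMF(mu, kappa) || Uniform(S^{d-1})) = kappa * I_{d/2}(kappa) / I_{d/2-1}(kappa) + log C_d(kappa) + log(area of S^{d-1}). Below is a minimal pure-Python sketch of that formula — an independent implementation, not the repository's vmf_batch code, so its values may differ slightly from the tensors above if the repo approximates the Bessel ratio differently:

```python
import math

def log_bessel_i(nu, kappa, terms=200):
    """log I_nu(kappa) via the power series, summed stably in log space."""
    # log of the m-th series term: (2m+nu)*log(kappa/2) - log m! - log Gamma(m+nu+1)
    logs = [(2 * m + nu) * math.log(kappa / 2.0)
            - math.lgamma(m + 1) - math.lgamma(m + nu + 1)
            for m in range(terms)]
    mx = max(logs)
    return mx + math.log(sum(math.exp(l - mx) for l in logs))

def vmf_kld(d, kappa):
    """KL( vMF(mu, kappa) || Uniform(S^{d-1}) ), independent of mu."""
    nu = d / 2.0 - 1.0
    # E[kappa * mu^T x] = kappa * I_{d/2}(kappa) / I_{d/2-1}(kappa)
    mean_term = kappa * math.exp(log_bessel_i(nu + 1, kappa)
                                 - log_bessel_i(nu, kappa))
    # log normalizer of the vMF density: kappa^{d/2-1} / ((2 pi)^{d/2} I_{d/2-1}(kappa))
    log_c = nu * math.log(kappa) - (d / 2.0) * math.log(2 * math.pi) \
        - log_bessel_i(nu, kappa)
    # log surface area of the unit sphere S^{d-1}
    log_area = math.log(2) + (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0)
    return mean_term + log_c + log_area
```

With d=50 this reproduces the qualitative picture above: the KLD is near 0 as kappa -> 0, around 0.2 at kappa=5, and grows monotonically through kappa=35 and kappa=80.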
Sorry for the late response. I will take a look at the questions you mentioned and try to give a better answer.
Hi,
I have some questions regarding the implementation, and I can't reproduce the perplexities reported in the paper.

Why is the perplexity computed as exp(recon_loss + kl)? As far as I understand (Wikipedia, here and here), perplexity is a measure of how well the model output matches the given data. It should treat the model as a black box, i.e. exp(entropy) or exp(cross-entropy). In particular, models with a high value for kappa are impacted negatively, due to the constant KL term. E.g., kappa -> infty, which is equivalent to a (non-variational) autoencoder on the hypersphere, always produces a PPL of infinity.

[…] kappa, which would result in an increased PPL with the definition in point 3. The other linked implementation does not do this.

Thanks!
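To make the two competing definitions concrete, here is a minimal sketch (hypothetical function names and numbers, per-token normalization assumed; this is not the repository's evaluation code):

```python
import math

def ppl_black_box(recon_nll, n_tokens):
    # Perplexity from reconstruction cross-entropy alone:
    # exp(total NLL / number of tokens), treating the model as a black box.
    return math.exp(recon_nll / n_tokens)

def ppl_with_kl(recon_nll, kl, n_tokens):
    # Perplexity from the full negative ELBO, i.e. exp((recon_loss + kl) per token).
    return math.exp((recon_nll + kl) / n_tokens)

# With kl = 0 the two definitions agree; any positive KL (e.g. the constant
# vMF KL term) strictly increases the second one, which is why high-kappa
# models are penalized under the ELBO-based definition.
```

This illustrates the point about kappa -> infty: a constant KL term that grows without bound drives ppl_with_kl to infinity even when the reconstruction cross-entropy, and hence ppl_black_box, stays finite.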