Reconstruct Results / Implementation Details #8
I just noticed that the two "optimal" hyperparameters you mentioned in #3 don't match the KL values from the paper for PTB. For the Standard setting, your hyperparameters suggest lat_dim=50 and kappa=5 or kappa=35, which produce a KLD of 0.2 or 7.6, respectively. To get the 5.7 reported in the paper at the same dimension, kappa must be somewhere between 28 and 29. The same holds for Yelp: the configuration from #3 produces a KLD of 19.6, not the reported 18.6. The KLDs listed above were calculated with your vMF implementation from vmf_batch, using the following code:

>>> from NVLL.distribution.vmf_batch import *
>>> vMF(hid_dim=1, lat_dim=50, kappa=5).kld
tensor([0.2372], device='cuda:0')
>>> vMF(hid_dim=1, lat_dim=50, kappa=35).kld
tensor([7.6284], device='cuda:0')
>>> vMF(hid_dim=1, lat_dim=50, kappa=80).kld
tensor([19.5847], device='cuda:0')
>>> vMF(hid_dim=1, lat_dim=50, kappa=28.6).kld
tensor([5.6961], device='cuda:0')

Could you please provide the configurations that were actually used, or tell me whether I'm doing something wrong?
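For reference, the KL term under discussion has a closed form: KL(vMF(mu, kappa) || Uniform(S^{d-1})) = kappa * I_{d/2}(kappa) / I_{d/2-1}(kappa) + log C_d(kappa) + log(area of S^{d-1}). Below is a minimal pure-Python sketch of that formula — an independent implementation, not the repository's vmf_batch code, so its values may differ slightly from the tensors above if the repo approximates the Bessel ratio differently:

```python
import math

def log_bessel_i(nu, kappa, terms=200):
    """log I_nu(kappa) via the power series, summed stably in log space."""
    # log of the m-th series term: (2m+nu)*log(kappa/2) - log m! - log Gamma(m+nu+1)
    logs = [(2 * m + nu) * math.log(kappa / 2.0)
            - math.lgamma(m + 1) - math.lgamma(m + nu + 1)
            for m in range(terms)]
    mx = max(logs)
    return mx + math.log(sum(math.exp(l - mx) for l in logs))

def vmf_kld(d, kappa):
    """KL( vMF(mu, kappa) || Uniform(S^{d-1}) ), independent of mu."""
    nu = d / 2.0 - 1.0
    # E[kappa * mu^T x] = kappa * I_{d/2}(kappa) / I_{d/2-1}(kappa)
    mean_term = kappa * math.exp(log_bessel_i(nu + 1, kappa)
                                 - log_bessel_i(nu, kappa))
    # log normalizer of the vMF density: kappa^{d/2-1} / ((2 pi)^{d/2} I_{d/2-1}(kappa))
    log_c = nu * math.log(kappa) - (d / 2.0) * math.log(2 * math.pi) \
        - log_bessel_i(nu, kappa)
    # log surface area of the unit sphere S^{d-1}
    log_area = math.log(2) + (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0)
    return mean_term + log_c + log_area
```

With d=50 this reproduces the qualitative picture above: the KLD is near 0 as kappa -> 0, around 0.2 at kappa=5, and grows monotonically through kappa=35 and kappa=80.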
Sorry for the late response. I will take a look at the questions you mentioned and try to give a better answer.
Hi,
I have some questions regarding the implementation, and I can't reproduce the perplexities reported in the paper.

Why is the perplexity computed as exp(recon_loss + kl)? As far as I understand (Wikipedia, here and here), perplexity is a measure of how well the model output matches the given data. It should treat the model as a black box, i.e. exp(entropy) or exp(cross-entropy). In particular, models with a high value for kappa are impacted negatively, due to the constant KL term. E.g., kappa -> infty, which is equivalent to a (non-variational) autoencoder on the hypersphere, always produces a PPL of infinity.

[…] kappa, which would result in an increased PPL with the definition in point 3. The other linked implementation does not do this.

Thanks!
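To make the two competing definitions concrete, here is a minimal sketch (hypothetical function names and numbers, per-token normalization assumed; this is not the repository's evaluation code):

```python
import math

def ppl_black_box(recon_nll, n_tokens):
    # Perplexity from reconstruction cross-entropy alone:
    # exp(total NLL / number of tokens), treating the model as a black box.
    return math.exp(recon_nll / n_tokens)

def ppl_with_kl(recon_nll, kl, n_tokens):
    # Perplexity from the full negative ELBO, i.e. exp((recon_loss + kl) per token).
    return math.exp((recon_nll + kl) / n_tokens)

# With kl = 0 the two definitions agree; any positive KL (e.g. the constant
# vMF KL term) strictly increases the second one, which is why high-kappa
# models are penalized under the ELBO-based definition.
```

This illustrates the point about kappa -> infty: a constant KL term that grows without bound drives ppl_with_kl to infinity even when the reconstruction cross-entropy, and hence ppl_black_box, stays finite.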