
The batch size would influence size of snapshot(.pkl)? #18

Open
landian60 opened this issue Apr 5, 2023 · 4 comments

Comments

@landian60

landian60 commented Apr 5, 2023

Hello, thanks for your great work.
I ran the experiment and found that different batch sizes produce checkpoints of different sizes. Does the _fourier_embs_cache item affect the snapshot size? And if so, do training and testing on the same snapshot need to use the same batch size?
Thanks.

@universome
Owner

universome commented Apr 18, 2023

Hi @landian60, could you please provide additional information (e.g., the sizes of the checkpoints)? The batch size could indeed influence the checkpoint size, since we cache the Fourier features, which likely leak into the model's checkpoint due to the persistence_class decorator.
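For illustration, here is a minimal toy sketch of the effect (not the repo's actual layer; the class and default sizes are made up, only the _fourier_embs_cache name comes from the codebase): any tensor stored on a module, whether a registered buffer or a plain attribute picked up by the persistence mechanism, is serialized together with the weights, so a cache shaped [batch, ...] inflates the snapshot in proportion to the training batch size.

```python
import io
import pickle

import torch
import torch.nn as nn


def pickled_size(module: nn.Module) -> int:
    """Size of the module in bytes once pickled (mimicking a snapshot dump)."""
    buf = io.BytesIO()
    pickle.dump(module, buf)
    return buf.getbuffer().nbytes


class ToyLayer(nn.Module):
    """Toy layer that caches per-batch 'Fourier features' on the module itself."""

    def __init__(self, batch_size: int, num_feats: int = 8, resolution: int = 64):
        super().__init__()
        cache = torch.randn(batch_size, num_feats, resolution, resolution)
        # Anything registered on the module is serialized with the weights,
        # so the cache's batch dimension directly scales the file size.
        self.register_buffer('_fourier_embs_cache', cache)


print(pickled_size(ToyLayer(batch_size=16)))  # smaller snapshot
print(pickled_size(ToyLayer(batch_size=24)))  # roughly 1.5x more cache bytes
```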

@landian60
Author

landian60 commented Apr 19, 2023

Thanks for your kind reply, even though the work is almost two years old.
Here are the sizes: if I train with a batch size of 16 per GPU, the checkpoint is 4.86 GB; with a batch size of 24 per GPU, it is 5.91 GB.
I changed the batch size to fully utilize the V100. So it turns out that the cached Fourier features occupy a lot of space, but they wouldn't affect the test process?
Also, could the checkpoint space be saved by caching just one group of Fourier features and repeating it batch-size times along a new dimension?
Thanks!

@landian60
Author

I have another question, about extrapolating outside of the image boundaries.
If I want to change the positional encoding coordinates from [0, 1] to [-0.3, 1.3], should I change the resolution of the logarithmic_basis? But if I do that, its size would no longer match the const_embs.

@universome
Owner

Hi @landian60, you are correct that the checkpoint space could be saved by caching just one group of Fourier features and repeating it along a new batch dimension. I guess my reasoning back then was to cache the Fourier features for the whole batch to avoid additional memory allocation (which I thought could be expensive). To be honest, I do not remember benchmarking this (I only remember benchmarking "caching" vs "no caching"), so you might try it. Also, back then I was not aware of the torch.expand function (which does not allocate new memory); it should be cheaper than torch.repeat in this scenario. That said, since we do a concatenation afterward, there would be new memory allocations/deallocations anyway, so it might not matter much whether you use torch.expand or torch.repeat.
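For reference, a rough sketch of the single-group caching idea (shapes and variable names here are illustrative, not the repo's actual code):

```python
import torch

# Hypothetical cached Fourier features for a single sample: [num_feats, H, W].
single_feats = torch.randn(512, 64, 64)
batch_size = 24

# expand() returns a broadcasted view over the new batch dimension without
# allocating new memory; repeat() physically copies the data batch_size times.
expanded = single_feats.unsqueeze(0).expand(batch_size, -1, -1, -1)
repeated = single_feats.unsqueeze(0).repeat(batch_size, 1, 1, 1)

# Once the features are concatenated with per-sample activations, a new tensor
# is allocated for the result either way, which is why the expand-vs-repeat
# choice may matter little for peak memory at this particular spot.
activations = torch.randn(batch_size, 32, 64, 64)
out = torch.cat([activations, expanded], dim=1)
```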

For extrapolation, you shouldn't change the basis. We didn't use const embeddings when training the generator on bedrooms to perform extrapolation afterwards.
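To illustrate the point, here is a minimal sketch with a generic log-spaced sin/cos basis (not the repo's logarithmic_basis; all names and sizes are assumptions): extrapolation only changes which coordinates the fixed basis is evaluated at, so the feature dimension, and hence compatibility with anything like const_embs, stays the same.

```python
import math

import torch

num_freqs = 8
freqs = 2.0 ** torch.arange(num_freqs)  # log-spaced frequencies, fixed after training


def fourier_encode(coords: torch.Tensor) -> torch.Tensor:
    """coords: [N] positions -> [N, 2 * num_freqs] sin/cos features."""
    angles = 2 * math.pi * coords[:, None] * freqs[None, :]
    return torch.cat([angles.sin(), angles.cos()], dim=1)


inside = fourier_encode(torch.linspace(0.0, 1.0, 256))    # [256, 16]
outside = fourier_encode(torch.linspace(-0.3, 1.3, 256))  # [256, 16], same feature size
```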
