How much time did it take to train on FFHQ? #82

Open
shionhonda opened this issue Apr 20, 2022 · 5 comments

Comments

@shionhonda

Thanks for this great work!

I'm trying to train e4e from scratch on the face domain (mostly the same as FFHQ, but at 512x512 resolution). It has now been trained for 100k steps and the reconstruction results look fine so far.
The problem is that training proceeds very slowly: it is estimated to take more than one week to reach 300k steps on a single Tesla T4 GPU. I keep the validation set size at 1,000, so the time spent on evaluation is negligible.

My questions are as follows:

  • How much time did it take to train e4e on FFHQ? (The paper says it took 3 days on a P40 GPU for the cars domain, so would that be 1-2 weeks for FFHQ at 1024x1024?)
  • Do you happen to have any loss curves from when you trained e4e? I'd like to know whether training could be cut short (say, at 200k steps). I confirmed that 100k steps, at least, were not enough: both reconstruction and editing performed poorly.
  • Do you have any ideas for reducing the training time? I suspect the learning rate could be increased to around 0.001 to make the model converge faster.

I know I should experiment with these myself, but each trial takes a long time, so any suggestions would help.
I'd appreciate your reply.
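
As a rough sanity check on timings like this, here is a minimal back-of-envelope sketch. The 3.5 s/iteration figure is an assumed placeholder chosen to be roughly consistent with the estimate above, not a measured value; substitute your own measurement.

```python
# Back-of-envelope ETA for reaching a target step count.
# sec_per_iter is an assumed placeholder; time a few hundred iterations
# on your own GPU and substitute the measured value.

def eta_days(target_steps: int, current_steps: int, sec_per_iter: float) -> float:
    """Wall-clock days remaining to reach target_steps."""
    remaining = max(target_steps - current_steps, 0)
    return remaining * sec_per_iter / 86_400  # 86,400 seconds per day

# Example: ~3.5 s/iter on a T4 would put 300k steps at about 8 days
# of additional training from the 100k-step checkpoint.
print(f"{eta_days(300_000, 100_000, 3.5):.1f} days")  # -> 8.1 days
```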

@caopulan

caopulan commented Aug 30, 2022

It took 5 days to train on FFHQ 1024x1024 (500k iterations) on a 3090.

@shionhonda
Author

@caopulan Thanks for sharing! I guess you mean 50,000 iterations?

@shionhonda
Author

> Do you have any ideas for reducing the training time? I suspect the learning rate could be increased to around 0.001 to make the model converge faster.

In my experiment, the learning rate could safely be raised to 0.001.
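
For reference, a minimal sketch of what that change amounts to in a generic PyTorch training setup; the module, optimizer class, and variable names here are illustrative assumptions, not the e4e training script's own code (which has its own optimizer setup and configuration options).

```python
import torch

# Generic illustration of the 1e-4 -> 1e-3 learning-rate change discussed
# above; the module and optimizer here are stand-ins, not e4e's own code.
encoder = torch.nn.Linear(512, 512)   # placeholder for the real encoder
faster_lr = 1e-3                      # raised from a typical default of 1e-4

optimizer = torch.optim.Adam(encoder.parameters(), lr=faster_lr)

# To change the rate mid-run instead of restarting training:
for group in optimizer.param_groups:
    group["lr"] = faster_lr
```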

@caopulan

> @caopulan Thanks for sharing! I guess you mean 50,000 iterations?

I'm sorry, I meant 500,000 iterations. I have corrected it.

@caopulan

And I found that distributed training is not very effective: training on 8 GPUs gives only a 1.5~2x speedup.
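
For anyone attempting multi-GPU training anyway, a minimal DistributedDataParallel sketch is below. This is generic PyTorch launched with `torchrun`, not the repository's own training code; sublinear speedups like the one reported above often come down to small per-GPU batch sizes, data-loading bottlenecks, or communication overhead in the gradient all-reduce.

```python
# Minimal DistributedDataParallel sketch (generic PyTorch, not the e4e repo's
# own code). Launch with: torchrun --nproc_per_node=8 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)
    device = f"cuda:{rank}"

    model = torch.nn.Linear(512, 512).to(device)   # stand-in for the encoder
    model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Each rank processes a different shard, so the effective batch size is
    # per_gpu_batch * world_size; the gradient all-reduce inside backward()
    # is where communication overhead eats into the ideal 8x speedup.
    for _ in range(10):
        x = torch.randn(8, 512, device=device)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```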
