Fix GPU memory leak in TextPairRegressor #3490
Conversation
Force-pushed from 27c0638 to a4e7223
Just pushed a type hint change for a file that I didn't really touch, but I think it is checked because I added type hints to it. Not sure if it's the case that this should just be left as-is.
Thanks for adding this! Mostly looks good. Just please correct the return signature of the embedding property, as indicated in the comment above.
I also added a suggestion on checking for the concatenated data point, which may give a minimal speed-up during training (though likely negligible so up to you if you want to include it).
Regarding the lemmatizer, I'm not so sure. Probably best to err on the side of caution and use the old, underspecified signature.
keep reference to the concatenated sentence that is created when not embedding data points separately in a DataPair. Those embeddings can then be cleared in clear_embeddings, freeing the GPU memory.
Force-pushed from 3accd91 to 1a6f110
I believe I have addressed all of the requests; let me know if you can approve, or if you would like to request any further changes.
I just realized that this same issue exists for another class. The simple fix is to just apply the same logic there. Since these inherit from different base classes, it doesn't seem like the best idea to try to do this via inheritance.
same as fix for TextPairRegressor
@MattGPT-ai thanks for fixing this and adding the type hints! I'll merge this now.
I actually pushed an additional commit to fix it there as well. This would give better control over how these embeddings are set and cleared.
Hi @MattGPT-ai, I think it is a good idea, and I agree that saving the concatenated data point may lead to problematic inconsistencies. I'm a bit worried, though, that moving too much logic into the data class will cause the logic to become too distributed.
#3487

This addresses the above memory leak by saving a reference to the TextPair when the concatenated Sentence is created. Its embeddings are explicitly cleared when clear_embeddings is called. This also adds type hints to files relevant to this change.
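To illustrate the pattern the PR describes, here is a minimal self-contained sketch. The class and attribute names (Sentence, DataPair, concatenated_data, concatenate) are hypothetical stand-ins, not flair's actual API; the point is only the mechanism: the pair caches a reference to the concatenated sentence it creates, so clear_embeddings can later free those embeddings (and thus the GPU memory backing them) as well.

```python
class Sentence:
    """Hypothetical stand-in for a data point that carries embeddings."""

    def __init__(self, text: str):
        self.text = text
        self.embeddings: dict = {}  # embedding name -> tensor

    def clear_embeddings(self) -> None:
        # dropping all references lets the backing GPU tensors be freed
        self.embeddings.clear()


class DataPair:
    """Hypothetical pair of data points embedded either separately or jointly."""

    def __init__(self, first: Sentence, second: Sentence):
        self.first = first
        self.second = second
        # set when the pair is embedded jointly; without this cached
        # reference, the concatenated sentence's embeddings leak
        self.concatenated_data: Sentence | None = None

    def concatenate(self) -> Sentence:
        # keep a reference to the concatenated sentence instead of
        # discarding it after embedding
        if self.concatenated_data is None:
            self.concatenated_data = Sentence(
                self.first.text + " " + self.second.text
            )
        return self.concatenated_data

    def clear_embeddings(self) -> None:
        self.first.clear_embeddings()
        self.second.clear_embeddings()
        # the fix: also clear the cached concatenated sentence
        if self.concatenated_data is not None:
            self.concatenated_data.clear_embeddings()
```

With real GPU tensors in the embeddings dict, calling pair.clear_embeddings() after each batch releases the references that previously accumulated, which is what stops the leak during training.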