Implement LRU cache eviction for persistent compilation cache #21394
Can you link to the design doc?
Also, it would be good to have this documented somewhere, for example in https://jax.readthedocs.io/en/latest/persistent_compilation_cache.html.
Note: I have seen some NFS servers configured not to update mtime in order to speed up the server. Maybe document that this can happen and that, in that case, this will revert to creation time?
The first time I saw that behavior without knowing the reason, it took a while to understand what was going on.
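To make the NFS caveat above concrete, here is a minimal sketch of a probe that checks whether a directory's filesystem keeps an explicitly set mtime. The function name `mtime_updates_are_honored` is hypothetical and not part of JAX, and this is only one way such a check might be written. Whether it detects the particular server configuration described above is an assumption.

```python
import os
import tempfile


def mtime_updates_are_honored(directory):
    """Return True if an explicitly set mtime is visible on a file in `directory`.

    Hypothetical helper: it writes a scratch file, sets its mtime to a fixed
    old timestamp, and reads the value back.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        target = 86_400  # one day after the epoch, an arbitrary old timestamp
        os.utime(path, (target, target))  # set (atime, mtime) explicitly
        observed = os.stat(path).st_mtime
        # Allow a small tolerance for filesystems with coarse timestamps.
        return abs(observed - target) < 2
    finally:
        os.remove(path)
```

If a probe like this returned False for the cache directory, mtime-based ordering could not be relied on there.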
Yes, I suspect there's a chance you might see stale |
Overall nice work!
This is a first cut at the LRU eviction implementation, so it isn't expected to work well with network file systems yet (notably GCS, which many Cloud TPU users use for their cache storage). We'll iterate from here. I don't think we should publicly document this until it works well across filesystems, but absolutely agree this should eventually be in https://jax.readthedocs.io/en/latest/persistent_compilation_cache.html.
LGTM overall, but please address the comment in tests.
Test fails because the test utilises
|
Add filelock to build/test-requirements.txt and to the deps in tests/BUILD. Or skip the test for now if filelock is not importable.
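The second option (skipping when filelock is not importable) could look roughly like the sketch below. The test class and method names are hypothetical and are not taken from the PR. The sketch uses only the standard `unittest` skip decorator and the public `filelock.FileLock` API.

```python
import importlib.util
import os
import tempfile
import unittest

# Detect the optional dependency without importing it at module load time.
HAS_FILELOCK = importlib.util.find_spec("filelock") is not None


@unittest.skipUnless(HAS_FILELOCK, "filelock is not installed")
class LRUCacheLockTest(unittest.TestCase):  # hypothetical test class
    def test_lock_can_be_acquired(self):
        import filelock  # imported lazily so the module loads without it

        with tempfile.TemporaryDirectory() as tmpdir:
            lock = filelock.FileLock(os.path.join(tmpdir, "cache.lock"))
            with lock:
                self.assertTrue(lock.is_locked)
```

With this shape, the test is reported as skipped rather than failed on machines where filelock is missing.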
Just realised that JAX had a |
All comments resolved
jax/BUILD
":monitoring",
":path",
"//jax/_src/lib",
"//third_party/py/filelock",
Add that to the lru_cache deps?
You will want to use the py_deps macro for this: py_deps("filelock"). We have a list of deps in jax.bzl.
Where can I find the lru_cache deps?
This should unblock #21394, which uses filelock in the compilation cache. PiperOrigin-RevId: 641310140
Thank you for preparing this. Please squash the long chain of commits, or at least most of them.
Please squash the commits.
Co-authored-by: Sergei Lebedev <[email protected]>
@nouiz I've just added the link to the first comment.
This PR is part of the implementation of LRU cache eviction using the mtime attribute provided by the filesystem. The current PR does not support GCS, but this problem will be solved in a subsequent PR. More details in the design doc: https://docs.google.com/document/d/111YibwGXOFb_hMm-lua1u63QooAzIBEH-xfRPGmibis/edit?usp=sharing
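For readers unfamiliar with the idea, the following is a minimal sketch of mtime-based LRU eviction on a local directory. It is not the code from this PR: the names `read_entry` and `evict_lru` and the `max_bytes` parameter are hypothetical, and file locking and GCS are left out entirely. The assumption is that each cache entry is one file, and that reading an entry bumps its mtime so that the oldest mtime marks the least recently used entry.

```python
import os


def read_entry(path):
    """Read a cache file and mark it as recently used by bumping its mtime."""
    with open(path, "rb") as f:
        data = f.read()
    os.utime(path)  # sets atime and mtime to the current time
    return data


def evict_lru(cache_dir, max_bytes):
    """Delete files with the oldest mtime until the directory fits in max_bytes.

    Returns the list of evicted paths, oldest first.
    """
    entries = []
    with os.scandir(cache_dir) as it:
        for entry in it:
            if entry.is_file():
                st = entry.stat()
                entries.append((st.st_mtime_ns, entry.path, st.st_size))

    total = sum(size for _, _, size in entries)
    evicted = []
    for _, path, size in sorted(entries):  # oldest mtime first
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        evicted.append(path)
    return evicted
```

This sketch depends on the filesystem reporting mtime faithfully, which is why network file systems and object stores such as GCS need separate handling, as the discussion above notes.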