Fsspec default caching not cleaning; filesystem full #62

Open
CodyCBakerPhD opened this issue May 19, 2024 · 1 comment

@CodyCBakerPhD
Collaborator

Eventually, after enough runs of the benchmarks, the fsspec + caching test fills up my temporary directory (2 TB in size) with files

image

and the benchmarks themselves throw errors such as

             For parameters: 'https://dandiarchive.s3.amazonaws.com/blobs/fec/8a6/fec8a690-2ece-4437-8877-8a002ff8bd8a', 'ElectricalSeriesAp', (slice(0, 30000, None), slice(0, 384, None))
             Traceback (most recent call last):
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv\benchmark.py", line 68, in <module>
                 main()
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv\benchmark.py", line 60, in main
                 commands[mode](args)
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\run.py", line 72, in _run
                 result = benchmark.do_run()
                          ^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 661, in do_run
                 return self.run(*self._current_params)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 165, in run
                 samples, number = self.benchmark_timing(
                                   ^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 289, in benchmark_timing
                 timing = timer.timeit(number)
                          ^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\timeit.py", line 180, in timeit
                 timing = self.inner(it, self.timer)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "<timeit-src>", line 3, in inner
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 644, in redo_setup
                 self.do_setup()
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 80, in do_setup
                 result = Benchmark.do_setup(self)
                          ^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 632, in do_setup
                 setup(*self._current_params)
               File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\benchmarks\time_remote_slicing.py", line 118, in setup
                 self.nwbfile, self.io, self.file, self.bytestream, self.tmpdir = read_hdf5_nwbfile_fsspec_with_cache(
                                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\core\_streaming.py", line 74, in read_hdf5_nwbfile_fsspec_with_cache
                 (file, byte_stream, tmpdir) = read_hdf5_fsspec_with_cache(s3_url=s3_url)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\core\_streaming.py", line 55, in read_hdf5_fsspec_with_cache
                 byte_stream = filesystem.open(path=s3_url, mode="rb")
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 449, in <lambda>
                 return lambda *args, **kw: getattr(type(self), item).__get__(self)(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\spec.py", line 1298, in open
                 f = self._open(
                     ^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 449, in <lambda>
                 return lambda *args, **kw: getattr(type(self), item).__get__(self)(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 365, in _open
                 f.cache = MMapCache(f.blocksize, f._fetch_range, f.size, fn, blocks)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\caching.py", line 129, in __init__
                 self.cache = self._makefile()
                              ^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\caching.py", line 147, in _makefile
                 fd.flush()
             OSError: [Errno 28] No space left on device

             For parameters: 'https://dandiarchive.s3.amazonaws.com/blobs/38c/c24/38cc240b-77c5-418a-9040-a7f582ff6541', 'TwoPhotonSeries', (slice(0, 3, None), slice(0, 796, None), slice(0, 512, None))
             Traceback (most recent call last):
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv\benchmark.py", line 68, in <module>
                 main()
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv\benchmark.py", line 60, in main
                 commands[mode](args)
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\run.py", line 72, in _run
                 result = benchmark.do_run()
                          ^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 661, in do_run
                 return self.run(*self._current_params)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 165, in run
                 samples, number = self.benchmark_timing(
                                   ^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 289, in benchmark_timing
                 timing = timer.timeit(number)
                          ^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\timeit.py", line 180, in timeit
                 timing = self.inner(it, self.timer)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "<timeit-src>", line 3, in inner
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 644, in redo_setup
                 self.do_setup()
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\time.py", line 80, in do_setup
                 result = Benchmark.do_setup(self)
                          ^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\asv_runner\benchmarks\_base.py", line 632, in do_setup
                 setup(*self._current_params)
               File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\benchmarks\time_remote_slicing.py", line 118, in setup
                 self.nwbfile, self.io, self.file, self.bytestream, self.tmpdir = read_hdf5_nwbfile_fsspec_with_cache(
                                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\core\_streaming.py", line 74, in read_hdf5_nwbfile_fsspec_with_cache
                 (file, byte_stream, tmpdir) = read_hdf5_fsspec_with_cache(s3_url=s3_url)
                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "D:\GitHub\nwb_benchmarks\src\nwb_benchmarks\core\_streaming.py", line 55, in read_hdf5_fsspec_with_cache
                 byte_stream = filesystem.open(path=s3_url, mode="rb")
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 449, in <lambda>
                 return lambda *args, **kw: getattr(type(self), item).__get__(self)(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\spec.py", line 1298, in open
                 f = self._open(
                     ^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 449, in <lambda>
                 return lambda *args, **kw: getattr(type(self), item).__get__(self)(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\implementations\cached.py", line 365, in _open
                 f.cache = MMapCache(f.blocksize, f._fetch_range, f.size, fn, blocks)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\caching.py", line 129, in __init__
                 self.cache = self._makefile()
                              ^^^^^^^^^^^^^^^^
               File "C:\Users\theac\anaconda3\envs\nwb_benchmarks_created_5_19_2024\Lib\site-packages\fsspec\caching.py", line 147, in _makefile
                 fd.flush()
             OSError: [Errno 28] No space left on device

This seems related to the caching mode fsspec uses, where it 'reserves' space on disk equal to the size of the remote file and fills in the bytes as requests are received. For small files this is intuitive and fine, but for large files it is a pain because of issues like this one.
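To make the 'reserving' behavior concrete, here is a minimal sketch of the seek-and-write-one-byte pattern that the `_makefile` frame in the traceback appears to use. It is an illustration under that assumption, not a copy of the fsspec source; `reserve_file` is a hypothetical helper name. Whether the reserved bytes are physically allocated straight away depends on the filesystem's sparse-file support, which may be why this bites harder on some machines than others.

```python
import os
import tempfile


def reserve_file(path: str, size: int) -> int:
    """Create a file whose apparent size is `size` bytes by writing only its last byte.

    Hypothetical helper for illustration; it is not part of fsspec.
    """
    with open(path, "wb+") as fd:
        if size > 0:
            fd.seek(size - 1)  # jump to the final byte without writing anything before it
            fd.write(b"1")     # one byte is enough to extend the file to `size`
            fd.flush()         # on a full disk, the OSError can surface at this point
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "reserved.bin")
    apparent_size = reserve_file(demo_path, 10 * 1024 * 1024)  # 10 MiB
    print(apparent_size)  # 10485760
```

Only one byte of data was written, yet the file reports the full 10 MiB. Scale that up to multi-gigabyte NWB files and repeated benchmark setups, and the temporary directory can fill quickly if old cache files are not removed.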

A user reported similar pain with this kind of cache when they accidentally set their cache location inside an automatically syncing Google Drive folder. This overloaded both their disk I/O and WiFi bandwidth, slowing their computer to a crawl (and maxing out their drive storage).

Just something to be aware of in general when assessing the default caching for fsspec, but in the meantime...

I think I expressed doubts about the automatic cleaning functionality of tempfile.TemporaryDirectory.cleanup() before. I highly recommend we follow the pytest strategy of keeping a global folder (possibly also in the local temp directory, but under a reserved name) that we can send repeated shutil.rmtree commands to at both the beginning and the end of benchmark runs, giving file locks enough time to release.
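A rough sketch of what that could look like follows. The folder name, the helper names (`purge_cache`, `fresh_cache_dir`), and the retry counts are all assumptions made for illustration; none of them come from the nwb_benchmarks code base.

```python
import shutil
import tempfile
import time
from pathlib import Path

# Hypothetical reserved folder name; not taken from the nwb_benchmarks code base.
CACHE_ROOT = Path(tempfile.gettempdir()) / "nwb_benchmarks_fsspec_cache"


def purge_cache(root: Path = CACHE_ROOT, attempts: int = 5, delay: float = 1.0) -> bool:
    """Try to delete `root` several times; return True once it no longer exists."""
    for attempt in range(attempts):
        # ignore_errors=True so a locked file does not abort the whole removal
        shutil.rmtree(root, ignore_errors=True)
        if not root.exists():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give lingering file locks a chance to release
    return False


def fresh_cache_dir(root: Path = CACHE_ROOT) -> Path:
    """Purge any leftovers from earlier runs, then recreate an empty cache folder."""
    purge_cache(root)
    root.mkdir(parents=True, exist_ok=True)
    return root
```

The idea would be to call `fresh_cache_dir()` before a benchmark run and `purge_cache()` after it, then pass the returned folder as `cache_storage` when building the fsspec caching filesystem. Because the folder has a fixed name, anything a failed cleanup leaves behind gets another removal attempt on the next run instead of accumulating under a new random name.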

@CodyCBakerPhD
Collaborator Author

This problem is actually SO bad on my older computer (which has a similar architecture to the laptops we've seen students use at user days) that I can't even run the benchmarks once without filling up temp space (~250 GB total in the User folder; maybe < 100 GB free).

Also, a lesson to learn here: the location of such a cache really should not be the boot drive. The OS might take up most of that drive, and on remote servers especially it is usually very slim. I have additional mounted volumes that are meant for bulk storage of the kind fsspec is using here.
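One way to act on that, sketched under assumptions: let an environment variable point the cache at a bulk volume, and check free space before starting. The variable name `NWB_BENCHMARKS_CACHE_DIR` and both helper functions are hypothetical and exist only for this example.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Hypothetical environment variable; not an existing nwb_benchmarks or fsspec setting.
CACHE_ENV_VAR = "NWB_BENCHMARKS_CACHE_DIR"


def resolve_cache_root() -> Path:
    """Use the volume named by the environment variable, else the system temp directory."""
    override = os.environ.get(CACHE_ENV_VAR)
    root = Path(override) if override else Path(tempfile.gettempdir())
    root.mkdir(parents=True, exist_ok=True)
    return root


def has_free_space(directory: Path, required_bytes: int) -> bool:
    """Report whether the volume holding `directory` has at least `required_bytes` free."""
    return shutil.disk_usage(directory).free >= required_bytes
```

A benchmark setup could then skip (or fail early with a clear message) when `has_free_space(resolve_cache_root(), expected_file_size)` is false, rather than hitting Errno 28 partway through a run.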
