Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move from ros3 to fsspec #379

Open
wants to merge 18 commits into
base: dev
Choose a base branch
from
Open

move from ros3 to fsspec #379

wants to merge 18 commits into from

Conversation

bendichter
Copy link
Contributor

@bendichter bendichter commented Jun 25, 2023

also created a context manager read_nwb, which will probably migrate to PyNWB eventually

Motivation

fix #293

I'd like to build zarr support into the new read_nwb context manager, so I'd like to get this merged before looking at supporting the zarr backend

create a context manager read_nwb, which will probably migrate to PyNWB eventually
@CodyCBakerPhD
Copy link
Collaborator

Have you been able to run this branch successfully on your local device? I.e.,

nwbinspector 000350 --stream

as well as

nwbinspector 000350 --stream --n-jobs 4

(which should then run about ~4 times faster)

The CI reports certain issues, and when I try this I also encounter a couple of problems when I try it on the Hub (especially the second one - any time I've ever tried parallelizing fsspec streaming something goes weird and it just indefinitely stalls)

If this does run for you, then could you also report a speed comparison between this branch and current dev (which would still use ros3 as the baseline)

Also, is there any significant downside to keeping ROS3 as an option for the environments where it is already configured? Given that we haven't run into any more problems with it in the past ~6 months; fsspec could just be an alternate mode for people that don't / can't do the special h5py installation through conda

@bendichter
Copy link
Contributor Author

I didn't anticipate these issues with parallelizing. I'll keep working on this and see if I can iron this out, with a speed comparison. In my experience fsspec is faster and more reliable than ros3, but I agree IRS worth a test. I'm not aware of any use case that would require ros3 instead of fsspec, so I'd rather drop support for it unless we can come up with a compelling reason to keep it.

@CodyCBakerPhD
Copy link
Collaborator

It might just require a refactor of the parallelization step to use joblib instead of concurrent.futures.ProcessPoolExecutor

See examples: https://gist.github.com/CodyCBakerPhD/b9d7d10ce1572dafcbd4646251781234

The ThreadPoolExecutor is possibly an easier option but last time I ran speed tests using it it bottlenecked pretty quickly and didn't scale as well as multiprocessing

@bendichter
Copy link
Contributor Author

@CodyCBakerPhD What do you think of this? I am adding back ros3 as an available "method" of open_nwb.

@codecov-commenter
Copy link

Codecov Report

Merging #379 (87bb6f3) into dev (afb3685) will decrease coverage by 0.63%.
The diff coverage is 63.63%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #379      +/-   ##
==========================================
- Coverage   90.99%   90.37%   -0.63%     
==========================================
  Files          20       20              
  Lines        1133     1132       -1     
==========================================
- Hits         1031     1023       -8     
- Misses        102      109       +7     
Flag Coverage Δ
unittests 90.37% <63.63%> (-0.63%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/nwbinspector/tools.py 46.66% <ø> (-2.39%) ⬇️
src/nwbinspector/nwbinspector.py 84.18% <63.33%> (-1.98%) ⬇️
src/nwbinspector/testing.py 91.17% <66.66%> (+1.98%) ⬆️

... and 1 file with indirect coverage changes

@CodyCBakerPhD
Copy link
Collaborator

Sequential stream is working now, and is about 33% faster than ros3

The command

nwbinspector 000350 --stream --n-jobs 8 [or anything >=2]

Still stalls out as per my demonstration in #379 (comment)

I'd recommend either switching to use joblib for parallelization (which apparently doesn't have whatever mysterious issue fsspec has with ProcessPoolExecutor)
OR an automatic switch in the CLI logic to use ros3 as default stream mode when n_jobs != 1
OR raise an informative error if someone tries fsspec + n_jobs != 1 telling them that combination only works for ros3

@CodyCBakerPhD
Copy link
Collaborator

Update: as mentioned in meeting, joblib won't work for us either because it breaks the generator behavior of the inspection functions

Martin mentioned a possible fix for fsspec so I'll try that out and see where it leads and then we can decide on whether to keep ros3 around just for parallelization - I do agree that we should try to move away from it however, largely due to the upcoming Zarr ssupport

As I play around more with the context, as well as reflections on issue NeurodataWithoutBorders/pynwb#1711 and PR NeurodataWithoutBorders/nwbwidgets#295, I think we might want to make the introduction of the read_nwbfile context function an entirely separate PR of its own with extensive tests to ensure it behaves as we expect - we have already seen users attempt to define their own such helper functions with mixed success, so we really need to ensure that the version of it we support is done the proper way

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: update from ros3 to fsspec
3 participants