Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pickle memory maps in SliceableDataChunkIterator #536

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

CodyCBakerPhD
Copy link
Member

@CodyCBakerPhD CodyCBakerPhD commented Jul 30, 2023

sister PR to hdmf-dev/hdmf#924 and hdmf-dev/hdmf-zarr#111

This acts as a first test implementation by allowing efficient pickling of a memory-map based iterator. Will add and test more conditions over time

Also, to be clear, a np.memmap (and Zarr arrays for that matter) are pickleable by default; however, they do it by pickling the data they contain, not a packaged set of instructions for re-instantiating that object after reloading (in our case, on a separate process)

So we want to override such methods to more efficiently allow re-initialization of the lazy mappings to the source data

Opening in draft for now

@CodyCBakerPhD CodyCBakerPhD self-assigned this Jul 30, 2023
Comment on lines -82 to +83
" use DataInterface.add_to_nwbfile()."
" use DataInterface.add_to_nwbfile().",
stacklevel=2,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found this useful in a separate conversion project; kept getting the warning but was unsure where it was coming from (had several data interfaces in the conversion)

Comment on lines 5 to 11
from hdmf.data_utils import GenericDataChunkIterator as HDMFGenericDataChunkIterator

# from hdmf.data_utils import GenericDataChunkIterator as HDMFGenericDataChunkIterator
from hdmf.data_utils import GenericDataChunkIterator

class GenericDataChunkIterator(HDMFGenericDataChunkIterator):

# class GenericDataChunkIterator(HDMFGenericDataChunkIterator):
class _GenericDataChunkIterator(GenericDataChunkIterator):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a temporary reversion, I think I can restore it - just need to test it doesn't break the serialization

@@ -90,3 +95,42 @@ def _get_maxshape(self) -> tuple:

def _get_data(self, selection: Tuple[slice]) -> np.ndarray:
return self.data[selection]

def is_pickleable(self) -> bool:
if isinstance(self.data, (np.memmap, np.array)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to remove np.array here; easy support is only offered for memory-map like lazy access to source data

Comment on lines -1146 to +1147
if np.issubdtype(extended_data.dtype, np.object_):
extended_data = extended_data.astype("str", copy=False)
# if np.issubdtype(extended_data.dtype, np.object_):
# extended_data = extended_data.astype("str", copy=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: revert

@codecov
Copy link

codecov bot commented Aug 23, 2023

Codecov Report

Merging #536 (bcb08f0) into main (7b978ca) will decrease coverage by 0.54%.
Report is 39 commits behind head on main.
The diff coverage is 31.03%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #536      +/-   ##
==========================================
- Coverage   91.59%   91.06%   -0.54%     
==========================================
  Files          96       96              
  Lines        4806     4833      +27     
==========================================
- Hits         4402     4401       -1     
- Misses        404      432      +28     
Flag Coverage Δ
unittests 91.06% <31.03%> (-0.54%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
src/neuroconv/basedatainterface.py 87.80% <ø> (ø)
...c/neuroconv/tools/spikeinterface/spikeinterface.py 92.93% <ø> (-0.04%) ⬇️
src/neuroconv/tools/hdmf.py 67.85% <31.03%> (-32.15%) ⬇️

... and 5 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant