Add support for reading slices of data #52

Open
jamesmudd opened this issue Feb 20, 2019 · 7 comments
Labels
enhancement New feature or request

Comments

@jamesmudd
Owner

It would be nice to be able to read subsets of datasets, and probably to offer iterators over datasets that return slices. The way to specify slices still needs to be figured out.

@jamesmudd added the enhancement label on Sep 8, 2019

@slathi18

We are using this library in our project, and we now need to read large datasets. It would be a great help if this slicing feature were available.
Do you have any plans to release the slicing feature in the near future?

@jamesmudd
Owner Author

Thanks for the comment. I would really like to add this feature and I don't think it would be too much work. Unfortunately this is a spare-time project for me, so I can't really commit to a timescale. I will try to take a look in the next week and see how quickly this could be done.

@jamesmudd
Owner Author

I have had a look at this and have a WIP branch: https://github.com/jamesmudd/jhdf/tree/slicing-support. I actually think adding basic slicing support will not be a very long task. You didn't say what type of datasets you want to slice. Currently I am looking at implementing slicing for contiguous datasets; adding chunked slicing support would then be a separate task. I still can't commit to a timescale, but I would like to release this as soon as possible and will update the ticket with progress.

@jamesmudd
Owner Author

There is now a PR, #361, which adds support for slicing contiguous datasets. Here is a jar built from the PR: jhdf-0.6.6-slice-beta.zip (it needs to be renamed from .zip to .jar, to work around GitHub file restrictions).

@slathi18 you didn't mention the type of datasets you want to slice, so I'm not sure whether this support is enough for your use case. If you give the jar a try, it would be great to get feedback.

There is a new method Dataset#getData(long[] offset, int[] sliceDimensions) which allows you to specify a slice you would like to take.
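To illustrate what an offset plus a set of slice dimensions selects, here is a minimal sketch in plain Java that copies a hyper-rectangular slice out of a flat, row-major array. It is not jhdf's implementation: `SliceSketch` and `slice` are hypothetical names used only for illustration, and the sketch assumes the data is stored contiguously in C order and fits in an `int[]`.

```java
final class SliceSketch {

    /**
     * Copies a hyper-rectangular slice out of a row-major (C-order) flat array.
     * Hypothetical helper for illustration; not part of jhdf.
     *
     * @param flat      the full dataset, flattened in row-major order
     * @param dims      the shape of the full dataset
     * @param offset    the start index of the slice in each dimension
     * @param sliceDims the size of the slice in each dimension
     * @return the slice, flattened in row-major order
     */
    static int[] slice(int[] flat, int[] dims, long[] offset, int[] sliceDims) {
        int rank = dims.length;
        if (offset.length != rank || sliceDims.length != rank) {
            throw new IllegalArgumentException("offset and sliceDims must match the dataset rank");
        }

        // Row-major strides: the last dimension varies fastest.
        long[] strides = new long[rank];
        long stride = 1;
        for (int d = rank - 1; d >= 0; d--) {
            strides[d] = stride;
            stride *= dims[d];
        }

        // Validate the slice bounds and count the elements to copy.
        int total = 1;
        for (int d = 0; d < rank; d++) {
            if (offset[d] < 0 || sliceDims[d] < 1 || offset[d] + sliceDims[d] > dims[d]) {
                throw new IllegalArgumentException("slice is out of bounds in dimension " + d);
            }
            total = Math.multiplyExact(total, sliceDims[d]);
        }

        int[] out = new int[total];
        int[] idx = new int[rank]; // current position inside the slice
        for (int i = 0; i < total; i++) {
            // Map the position inside the slice to a position in the full dataset.
            long src = 0;
            for (int d = 0; d < rank; d++) {
                src += (offset[d] + idx[d]) * strides[d];
            }
            out[i] = flat[(int) src];

            // Advance idx like an odometer, last dimension first.
            for (int d = rank - 1; d >= 0; d--) {
                if (++idx[d] < sliceDims[d]) {
                    break;
                }
                idx[d] = 0;
            }
        }
        return out;
    }
}
```

For a 3x4 dataset holding the values 0 to 11, an offset of `{1, 1}` with slice dimensions `{2, 2}` selects rows 1 to 2 and columns 1 to 2.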

The PR still needs more tests and docs; after that I need to look at chunked datasets.

@slathi18

Thanks for this quick change. I would like to slice large HDF files, around 11GB in size, and read the project information that resides within them. I want to read such a file in chunks so that loading it doesn't take a long time.
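One way to read a large dataset piece by piece, given a method that takes an offset and slice dimensions such as the `Dataset#getData(long[] offset, int[] sliceDimensions)` mentioned above, is to plan a sequence of blocks along the first dimension. The sketch below only computes those offsets and sizes. `SlabPlanner`, `Slab`, and `plan` are hypothetical names, not part of jhdf, and splitting along dimension 0 is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class SlabPlanner {

    /** One block to read: where it starts and how large it is in each dimension. */
    static final class Slab {
        final long[] offset;
        final int[] sliceDimensions;

        Slab(long[] offset, int[] sliceDimensions) {
            this.offset = offset;
            this.sliceDimensions = sliceDimensions;
        }
    }

    /**
     * Splits a dataset of shape dims into consecutive slabs that each span at
     * most maxRows along dimension 0 and the full extent of every other dimension.
     * Hypothetical helper for illustration; not part of jhdf.
     */
    static List<Slab> plan(int[] dims, int maxRows) {
        if (dims.length == 0 || maxRows < 1) {
            throw new IllegalArgumentException("need at least one dimension and maxRows >= 1");
        }
        List<Slab> slabs = new ArrayList<>();
        for (long start = 0; start < dims[0]; start += maxRows) {
            long[] offset = new long[dims.length]; // trailing dimensions start at 0
            offset[0] = start;

            int[] sliceDims = dims.clone();
            // The last slab may be shorter than maxRows.
            sliceDims[0] = (int) Math.min(maxRows, dims[0] - start);

            slabs.add(new Slab(offset, sliceDims));
        }
        return slabs;
    }
}
```

Each planned `Slab` could then be passed to the slicing method in turn, so that only one block is held in memory at a time. Per the thread, this would apply to contiguous datasets only.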

@jamesmudd
Owner Author

@slathi18 are the datasets in your files contiguous or chunked? If they are contiguous then this jar might already work for you.

@jamesmudd
Owner Author

#361 adds support for slicing contiguous datasets. This is released in v0.6.6. Support for slicing chunked datasets still needs to be implemented.
