Support for dependent blocks in decompression #190

Open
cnuernber opened this issue Mar 8, 2022 · 3 comments

Comments

@cnuernber

While reading an Apache Arrow file, we got this error:

Dependent block stream is unsupported (BLOCK_INDEPENDENCE must be set).

Is there any interest in supporting this feature? Our system already decompresses columns in parallel, so block-level parallelism during decompression isn't necessary. My thought is to simply concatenate all blocks and decompress them in one shot.
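
For reference, whether a frame uses dependent blocks can be read from the frame header before any decompression is attempted. The sketch below is based on the LZ4 frame format, where the FLG byte follows the 4-byte magic number and bit 5 is the block-independence flag. The class and method names are hypothetical and not part of lz4-java.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of lz4-java: inspects an LZ4 frame header.
final class Lz4FrameHeader {
    // LZ4 frame magic number, stored little-endian at the start of the frame.
    static final int MAGIC = 0x184D2204;
    // Bit 5 of the FLG byte: set when blocks are independent.
    static final int FLG_BLOCK_INDEPENDENCE = 0x20;

    // Returns true if the frame declares independent blocks,
    // false if blocks are dependent (linked).
    static boolean hasIndependentBlocks(byte[] frame) {
        if (frame.length < 5) {
            throw new IllegalArgumentException("too short for an LZ4 frame header");
        }
        int magic = ByteBuffer.wrap(frame, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        if (magic != MAGIC) {
            throw new IllegalArgumentException("not an LZ4 frame");
        }
        int flg = frame[4] & 0xFF;
        // Bits 7-6 carry the format version, which must be 01.
        if ((flg >>> 6) != 1) {
            throw new IllegalArgumentException("unsupported LZ4 frame version");
        }
        return (flg & FLG_BLOCK_INDEPENDENCE) != 0;
    }
}
```

A reader could use a check like this to decide up front whether to fall back to another code path instead of failing partway through the stream.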

@cnuernber
Author

The workaround is to use zstd. Unfortunately, lz4 is the default format for many of these pathways.

@cnuernber
Author

The Go code manually resizes the dictionary: https://github.com/pierrec/lz4/blob/v4/reader.go#L180
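
As a rough illustration of what that dictionary bookkeeping involves: with dependent blocks, each block may reference up to 64 KiB of previously decompressed data, so a reader has to carry a sliding window from block to block. Below is a minimal Java sketch of only the window maintenance. The class `DictWindow` is hypothetical, it is not a port of the Go code, and the call that actually decodes a block against the window is deliberately left out.

```java
import java.util.Arrays;

// Hypothetical helper: keeps the last 64 KiB of decompressed output so it
// can serve as the dictionary for the next dependent block.
final class DictWindow {
    static final int MAX = 64 * 1024; // LZ4 match offsets reach back at most 64 KiB

    private byte[] window = new byte[0];

    // Appends freshly decompressed bytes and trims the window to MAX bytes.
    void append(byte[] block, int off, int len) {
        if (len >= MAX) {
            // The new block alone fills the window; older data is dropped.
            window = Arrays.copyOfRange(block, off + len - MAX, off + len);
            return;
        }
        // Keep only as much of the old tail as still fits next to the new block.
        int keep = Math.min(window.length, MAX - len);
        byte[] next = new byte[keep + len];
        System.arraycopy(window, window.length - keep, next, 0, keep);
        System.arraycopy(block, off, next, keep, len);
        window = next;
    }

    // Current dictionary contents, oldest byte first.
    byte[] bytes() {
        return window;
    }
}
```

This version copies on every append for clarity; a real reader would more likely use a ring buffer or decode directly into one contiguous output buffer.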

The Java code completely hides the dictionary, which I think makes this impossible to do via simple updates to LZ4FrameInputStream.

@jpountz - Would a simple update to the Java bindings be a viable way to support dependent blocks? Another option would be to call the C library directly via FFI bindings.

@cnuernber
Author

I was able to work around this (hopefully temporarily) using FFI bindings to the C library. Unfortunately, this means users need to ensure liblz4 is available on their system.
