feather file has issue with compression #294

Closed
behrica opened this issue Mar 8, 2022 · 7 comments

Comments

@behrica
Contributor

behrica commented Mar 8, 2022

Reading

https://github.com/scicloj/scicloj.ml-tutorials/blob/main/data/tweets_sentiment.feather?raw=true

fails with

Execution error at net.jpountz.lz4.LZ4FrameOutputStream$FLG/validate (LZ4FrameOutputStream.java:362).
Dependent block stream is unsupported (BLOCK_INDEPENDENCE must be set)

I followed the setup instructions for arrow support in TMD.
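For context on the error message: in the LZ4 frame format, the byte after the 4-byte magic number is the FLG byte, and bit 5 of it is the block-independence flag. The following is a minimal sketch of checking that bit in R. The function name `lz4_block_independent` is hypothetical, and the sketch only inspects a frame header that is already in hand as a raw vector; it does not locate LZ4 frames inside a Feather file.

```r
# Hypothetical helper: inspect the FLG byte of an LZ4 frame header.
# `header` is a raw vector starting at the frame's magic number.
lz4_block_independent <- function(header) {
  stopifnot(is.raw(header), length(header) >= 5)
  # LZ4 frame magic number 0x184D2204, stored little-endian
  magic <- as.raw(c(0x04, 0x22, 0x4d, 0x18))
  if (!identical(header[1:4], magic)) {
    stop("not an LZ4 frame header")
  }
  flg <- as.integer(header[5])
  # Bit 5 (0x20) set means blocks are independent;
  # cleared means each block may depend on earlier blocks.
  bitwAnd(flg, 0x20L) != 0L
}

# Two hand-built headers: FLG 0x60 has bit 5 set, FLG 0x40 does not.
independent <- lz4_block_independent(as.raw(c(0x04, 0x22, 0x4d, 0x18, 0x60)))
dependent   <- lz4_block_independent(as.raw(c(0x04, 0x22, 0x4d, 0x18, 0x40)))
```

A frame whose flag is cleared is the "dependent block stream" that the exception above rejects.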

@cnuernber
Collaborator

Thanks, will take a look. What compressed this file?

@cnuernber
Collaborator

Tracking this upstream - lz4/lz4-java#190

@cnuernber
Collaborator

A temporary (hopefully) solution I am going to try is to use FFI bindings to call into the C library directly. This is going to be a tough one, as the only example of dependent frame compression I can find is the Go library.

@behrica - How did you produce this file?

@cnuernber
Collaborator

The point of the question is: is this going to be the standard pathway everyone uses, or did you produce this file with some magic set of options that very few other people are going to use?

@cnuernber
Collaborator

You will now also have to include JNA and ensure that liblz4 is installed on your system, which is system-dependent. My recommendation is to avoid dependent-block compression with LZ4, so if that was a parameter, set it to false.

@behrica
Contributor Author

behrica commented Mar 9, 2022

> The point of the question is: is this going to be the standard pathway everyone uses, or did you produce this file with some magic set of options that very few other people are going to use?

Not that I remember.

I think I created it in the simplest possible way from R:

x=readr::read_csv( ...)
arrow::write_feather(x , ...)

This came up while I was working on the file from #292,
and the above was my attempt to get the data into Clojure (via Feather ...)
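One possible workaround on the R side, sketched under assumptions: `arrow::write_feather` accepts a `compression` argument, and passing `"uncompressed"` avoids LZ4 framing entirely. The small data frame below is a hypothetical stand-in for the real data, and whether the resulting file then loads in TMD has not been verified in this thread.

```r
library(arrow)

# Hypothetical stand-in for the data read with readr::read_csv(...)
x <- data.frame(id = 1:3, text = c("a", "b", "c"))

path <- tempfile(fileext = ".feather")

# Write a Feather V2 file without any buffer compression,
# so no LZ4 frames (dependent or independent) are produced.
arrow::write_feather(x, path, compression = "uncompressed")

# Round-trip check on the R side
y <- arrow::read_feather(path)
stopifnot(nrow(y) == nrow(x), ncol(y) == ncol(x))
```

The trade-off is a larger file on disk, which may or may not matter depending on the dataset.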

@cnuernber
Collaborator

That was my fear - then these things will be all over the place.
