
Add HDF5 writing capability ? #354

Closed
zhuam opened this issue Feb 8, 2022 · 32 comments
Labels
enhancement New feature or request

Comments

@zhuam

zhuam commented Feb 8, 2022

As the title says.

Thanks!

@jamesmudd
Owner

Thanks for raising the issue. I would like to add writing support, and it's currently a work in progress.

I have implemented a few of the prerequisites, e.g. the Jenkins hash, and there is a branch https://github.com/jamesmudd/jhdf/tree/writing which has a test https://github.com/jamesmudd/jhdf/blob/writing/jhdf/src/test/java/io/jhdf/SimpleWritingTest.java that will write an empty file that can be opened; however, there is still lots of work to do.
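For context on the checksum prerequisite: HDF5 uses Bob Jenkins' lookup3 hash for metadata checksums. As a rough illustration of that hash family only (this is not the lookup3 variant jHDF actually implements), here is the much simpler Jenkins one-at-a-time hash in Java:

```java
public class OneAtATimeHash {

    /**
     * Jenkins one-at-a-time hash, shown for illustration only.
     * HDF5 metadata checksums use the more involved lookup3 variant.
     */
    public static int hash(byte[] data) {
        int h = 0;
        for (byte b : data) {
            h += b & 0xff; // add the next byte
            h += h << 10;  // mix
            h ^= h >>> 6;
        }
        h += h << 3;       // final avalanche
        h ^= h >>> 11;
        h += h << 15;
        return h;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(hash("hello".getBytes())));
    }
}
```

The same input always produces the same hash, which is the property the file-format checksums rely on.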

As this is a spare time project (and I don't have much spare time at the moment) I can't commit to a timescale for implementing writing. I will leave the issue open and point to it for people to react to as a way to gauge interest.

@jamesmudd jamesmudd added the enhancement New feature or request label Feb 8, 2022
@jamesmudd jamesmudd changed the title When can I support HDF5 write capability ? Add HDF5 writing capability ? Feb 8, 2022
@zhuam
Author

zhuam commented Feb 10, 2022

Thank you @jamesmudd
Can you add some basic writing functions, such as writing attributes or variables, so that we can participate as well?

@jamesmudd
Owner

Unfortunately it's not really easy to add "basic" writing functions; currently there is only support for writing a few very limited structures. jHDF will need to add writing for many more structures and then also add support for laying out the file on disk (currently this is hard-coded in the branch). The design of the API also needs to be considered so that it makes sense to use.

@jonathanschilling

Hi,
regarding the basic functionality, I would like to point to @uhoefel's and my work on the Nujan library (uhoefel/nujan), where we tried to slightly modernize it.
If writing capabilities were added to this project, we would be happy to retire our Nujan fork in favour of this library :-)

@Reissner

Reissner commented Jul 9, 2022

I also need both reading and writing.
My project is an integration of Octave into Java.
Reading/writing variables happens via save/load to file formats, i.e. streams.
I took over this project from a friend, and he used the textual format.
This is slow and also introduces small errors due to decimal/binary conversion.
I could use MATLAB's internal format, but that would require reengineering.
Read/write access would be very much appreciated.

@vedina

vedina commented Aug 27, 2022

Read/Write access much appreciated as well

@moraru

moraru commented Sep 12, 2022

+1

@jbfaden

jbfaden commented Jan 31, 2023

+1

@jbfaden

jbfaden commented Jan 31, 2023

I've been using a NetCDF reader to read some HDF5 files for years, and I would love to be able to use this library. However, I need to be able to write HDF5 as well.

@marcelluethi

Like many others, I was looking for a solution to write hdf5 files without adding native libraries to my project. After trying and failing with Nujan (which writes the hdf5 files just fine, but for some reason the files cannot be read with jhdf), I found a workaround which solves my problem for the moment.

I write the files in the hdf5-json format and use the converters provided by the HDF Group to create the hdf5 files. It is reasonably straightforward to write files in the hdf5-json format. I also published my code on GitHub (scalismo-hdf5-json) in case somebody wants to go down the same route.

However, I would also love to see writing support in jhdf. And at that point I would also like to thank everybody involved in creating jhdf for the awesome work. The project is great and having a pure java library to read hdf5 is super helpful.

@jamesmudd
Owner

> As many others, I was looking for a solution to write hdf5 files without adding native libraries to my project. After trying and failing with nujan (which writes the hdf5 files just fine, but for some reason the files cannot be read with jhdf) I found a workaround which solves my problem for the moment.

Would you be able to open another issue with an example of a file jHDF cannot open? I might be able to fix that.

> I write the files in the hdfjson and use the converters provided by the hdf-group to create the hdf5 files. It is reasonably straight-forward to write files in the hdf5-json format. I also published my code on github scalismo-hdf5-json in case somebody wants to go down the same route.

> However, I would also love to see writing support in jhdf. And at that point I would also like to thank everybody involved in creating jhdf for the awesome work. The project is great and having a pure java library to read hdf5 is super helpful.

Thanks a lot for the comments. I know there is a lot of interest in writing support, and I hope to get some time to work on it soon!

@marcelluethi

Sorry for the delayed reply. You can find a simple example that showcases the problem in this gist.

It throws the following exception when reading the file:
io.jhdf.exceptions.UnsupportedHdfException: Superblock extension is not supported

It is easy to change Nujan so that it sets the superblock extension flag differently. However, after doing that, another error was thrown, which I had no idea how to solve. My knowledge of hdf5 is, unfortunately, extremely limited.

@thadguidry
Sponsor

Sponsoring you now, specifically to help work on this issue; I sent $300. Go, go, go @jamesmudd!

@jamesmudd
Owner

@thadguidry Thanks very much for the sponsorship. I will prioritise working on this in my free time and hopefully give an update soon.

@jamesmudd
Owner

Some good progress: #530 implements lots of the required logic. It's a large PR, so it still needs quite a lot of tidying up, but it can write the structure of a file (i.e. only groups, but in any nesting). So it's a pretty big step IMO. I want to try and clean this up a bit and merge it, then I will look at writing datasets. I think I will aim for just int[] and double[] initially and then maybe consider a release. I would be happy to hear any feedback.

@jons-pf

jons-pf commented Jan 19, 2024

Thanks a lot, @jamesmudd (and @thadguidry for the sponsorship)!

@thadguidry
Sponsor

@jamesmudd I love how you worked on this; just reading your commits shows how you probably spent the first hour just thinking, researching, and writing down an outline of the work to be done! You saw the big chunk of problems, broke them up into small bite-sized pieces, broke even those into very tiny pieces, and then began to implement them and write tests. You'd be a great mentor to others. Seriously.

@Apollo3zehn

@jamesmudd it might be useful for the unit tests to use h5dump to dump files written by jhdf, which would help ensure compatibility with the HDF5 C library.

The following GitHub Actions example shows how to quickly install h5dump:

- name: Download HDF5 installer
  if: steps.cache-primes.outputs.cache-hit != 'true'
  run: wget -q https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.14/hdf5-1.14.1/bin/unix/hdf5-1.14.1-2-Std-ubuntu2204_64.tar.gz

- name: Install
  run: |
    tar -xzf hdf5-1.14.1-2-Std-ubuntu2204_64.tar.gz
    hdf/HDF5-1.14.1-Linux.sh --prefix=hdf --skip-license
    sudo ln -s $(pwd)/hdf/HDF_Group/HDF5/1.14.1/bin/h5dump /usr/bin/h5dump
    h5dump --version

And then use it in the unit tests to compare against the expected dump output (C# example):

var actual = DumpH5File(filePath);

var expected = File
    .ReadAllText($"DumpFiles/attribute_on_group.dump")
    .Replace("<file-path>", filePath);

Assert.Equal(expected, actual);

// ...

public static string? DumpH5File(string filePath)
{
    var dump = default(string);

    var h5dumpProcess = new Process 
    {
        StartInfo = new ProcessStartInfo
        {
            FileName = "h5dump",
            Arguments = filePath,
            UseShellExecute = false,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            CreateNoWindow = true
        }
    };

    h5dumpProcess.Start();

    while (!h5dumpProcess.StandardOutput.EndOfStream)
    {
        var line = h5dumpProcess.StandardOutput.ReadLine();

        if (dump is null)
            dump = line;

        else
            dump += Environment.NewLine + line;
    }

    while (!h5dumpProcess.StandardError.EndOfStream)
    {
        var line = h5dumpProcess.StandardError.ReadLine();

        if (dump is null)
            dump = line;

        else
            dump += Environment.NewLine + line;
    }

    // Wait for the process to exit so h5dump has fully finished
    h5dumpProcess.WaitForExit();

    return dump;
}

An example h5dump output for a file with one group and two attributes would look like this:

HDF5 "<file-path>" {
GROUP "/" {
   GROUP "group" {
      ATTRIBUTE "attribute 1" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SCALAR
         DATA {
         (0): 99.2
         }
      }
      ATTRIBUTE "attribute 2" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SCALAR
         DATA {
         (0): 99.3
         }
      }
   }
}
}

I think this approach makes it much easier to validate h5 files compared to using the C library directly (or via a wrapper), because that might become quite difficult for more complex features (e.g. compounds).

When the h5dump call succeeds and the output is as expected, the file is valid.
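Since jHDF itself is Java, the same dump helper translates naturally. A minimal sketch, assuming h5dump is on the PATH when used for real; redirecting stderr into stdout sidesteps the buffer-deadlock risk that reading the two streams sequentially can cause:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ProcessDump {

    /** Runs a command and returns its combined stdout/stderr as one string. */
    public static String runAndCapture(String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // interleave stderr into stdout
        Process process = pb.start();
        byte[] output = process.getInputStream().readAllBytes();
        process.waitFor();
        return new String(output, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // For jHDF tests you would call: runAndCapture("h5dump", pathToFile)
        // and compare the result against an expected dump. Demo with echo:
        System.out.println(runAndCapture("echo", "hello from runAndCapture"));
    }
}
```

The captured string can then be compared against an expected dump file, exactly as in the C# example above.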

@jamesmudd
Owner

Thanks @Apollo3zehn, I really like this idea. It would be good to have tests confirming compatibility. Currently I have been doing this manually, but I think this approach could work well. It might be a little trickier for my CI, as I'm currently building on all platforms, but it should be possible. I think h5dump supports JSON output, so I might look at parsing that back to do assertions.

jamesmudd added a commit that referenced this issue Jan 22, 2024
#354 Add Writing Support for file structure
@jamesmudd
Owner

#535 is the next PR. There is still lots to do, but it successfully writes an int[] dataset which can be read back by jHDF and HDFView. So IMO that's another milestone. I'm also thinking most of the unknowns have been tackled, and now it's a matter of building out support from this POC.

@jamesmudd
Owner

I have now merged #535, which adds basic dataset writing support. The next plan is to make an alpha release so people can try it out, then work on compatibility testing and cleaning up the code so that hopefully others can help build out wider support. I'd also be interested in which writing support would be most useful to prioritise.

@jons-pf

jons-pf commented Jan 30, 2024

Amazing, thanks a lot!

There are a bunch of test files in the mainline HDF5 repo: https://github.com/HDFGroup/hdf5/tree/develop/test/testfiles
Perhaps the ultimate goal for jhdf could be to reproduce all of them?

In the shorter term, I would suggest targeting the following functionality:

  • bool, char, int, double data types
  • scalars and rank-(1, 2, 3) arrays
  • char[] attributes for documentation purposes

@uhoefel Wonders still happen!

@jamesmudd
Owner

jamesmudd commented Jan 31, 2024

I have just published v0.7.0-alpha, which includes the initial writing support. It can write group structures, and n-dimensional int and double datasets. It should be on Maven Central shortly.

See WriteHdf5.java for example usage.

If anyone tries this out would be great to hear about the results.
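To give a flavour of what trying it out involves, here is a sketch along the lines of the linked WriteHdf5.java example. The class and method names below (HdfFile.write, putDataset, putGroup) are my reading of the v0.7.0-alpha example and should be treated as assumptions; check the linked example for the authoritative usage, as the alpha API may change:

```java
// Sketch only — names follow my reading of the WriteHdf5.java example
// and may not match the final jHDF API exactly.
import io.jhdf.HdfFile;
import io.jhdf.WritableHdfFile;
import io.jhdf.api.WritableGroup;

import java.nio.file.Paths;

public class WriteExample {
    public static void main(String[] args) {
        try (WritableHdfFile hdfFile = HdfFile.write(Paths.get("example.hdf5"))) {
            // n-dimensional int and double datasets are supported in the alpha
            hdfFile.putDataset("ints", new int[] {1, 2, 3, 4});
            WritableGroup group = hdfFile.putGroup("group");
            group.putDataset("doubles", new double[][] {{1.0, 2.0}, {3.0, 4.0}});
        }
    }
}
```

Running this (with the jhdf alpha on the classpath) should produce a file openable in HDFView or readable back with jHDF itself.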

@cfoushee

cfoushee commented Mar 1, 2024

I could use this for a project right now if you added the ability to write char[]. I'm assuming that would simply be a byte[] in Java. Also, I would need to be able to write at least two attribute types, like long and string, and associate them with a dataset.

I did try out WriteHdf5.java and it works perfectly for me.

@jamesmudd
Owner

Thanks for giving it a try and great to hear it worked well.

Do you actually want to write char[] or a String dataset?

I think attributes should be possible. I have got a bit sidetracked working on interoperability tests. That's proving harder than I thought, so I think I should probably leave it for now, work on adding some more support like attributes, and make a first release with writing.

@cfoushee

cfoushee commented Mar 1, 2024

I need to write Java byte[], which I assume maps to HDF char[].

@jamesmudd
Owner

I have just merged support for writing byte[]. Will probably break attributes out into another issue.

@cfoushee

cfoushee commented Mar 5, 2024

I was successfully able to write a dataset with bytes. Thank you!!

@jln-ho

jln-ho commented Mar 13, 2024

+1

@jamesmudd
Owner

I have just released v0.7.0, which adds writing support. Thanks for all the interest in this; I hope people give it a try. I am aware it's still limited, and I intend attributes and string datasets to be the next features added.

@thadguidry
Sponsor

Cool. One of the things I am planning is for OpenRefine 4.0 to eventually have an HDF5 exporter.

@jamesmudd
Owner

With the v0.8.0 release I'm going to close this issue. Writing HDF5 files is now possible with jHDF; there are still things to add, but I think these are better tracked as new, smaller issues. If you want a writing feature that's not possible at the moment, feel free to open another issue.

Special thanks to @thadguidry for the support.
