Need to store video #1647

Open
GoktugAlkan opened this issue Feb 16, 2023 · 12 comments
Labels
category: question (questions about code or code behavior)

Comments

@GoktugAlkan

Hello,

We are planning to use the nwb format to share all our raw data from the experiments. For one recording session, we have around 300 short videos (each approx. 6 seconds) showing the behavior of the animal. To store these videos in the nwb format, I was trying to use the following approach:

  1. Load the mp4 videos into python.
  2. Extract the RGB frames from the videos.
  3. Concatenate the frames to a tensor (a numpy array).
  4. Save these tensors in a TimeSeries object.
  5. Add the TimeSeries object to the acquisition field of the nwb file.

The above steps work fine, i.e., the nwb file containing all raw videos of the session is held in RAM. When I try to write this nwb file to disk, I get a memory error. I think this is because the tensors that I created from the RGB frames are too large.

This is supported by the following observation: one short video is approximately 0.8 MB, whereas the tensor created by the above procedure is a numpy array of approximately 80 MB.

Based on this, I can conclude the following: since the short videos make up most of the session's raw data, the nwb file will be approximately 100 times bigger than the actual raw data because of the tensors. I was expecting/hoping that the nwb file would be approximately the same size as the raw data of the session.
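
The roughly 100x blow-up follows from storing every pixel uncompressed. A back-of-the-envelope sketch, assuming a hypothetical 480x320 RGB video at 30 fps (the thread states neither the resolution nor the frame rate, so these values are illustrative only):

```python
# Hypothetical video parameters; only the duration and the mp4 size come from the thread.
duration_s = 6             # from the thread: each clip is about 6 seconds
fps = 30                   # assumed frame rate
width, height = 480, 320   # assumed resolution
channels = 3               # RGB, one uint8 byte per channel

n_frames = duration_s * fps
raw_bytes = n_frames * height * width * channels
raw_mb = raw_bytes / 1e6

mp4_mb = 0.8               # from the thread: approximate size of one mp4 clip
print(f"uncompressed: {raw_mb:.1f} MB, ratio vs mp4: {raw_mb / mp4_mb:.0f}x")
```

With these assumed numbers the raw array comes to about 83 MB, which is in line with the 80 MB reported above.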

The lab members do not want to refer to an external file; instead, they require all the data to be stored in the nwb file. In addition, I think that there is no way to reference an mp4 file.

Using compression methods might work, but there is no way to store zipped files (e.g. gzip) in the nwb file.

I am stuck at this point. I would be very grateful if you could help me.

Thanks in advance!

@bendichter
Contributor

@GoktugAlkan thanks for the detailed question. This is indeed a nuanced topic in NWB.

We have ways of helping you deal with the RAM problem by iteratively reading and then writing data. This is commonly used for e.g. long sessions of calcium imaging. However, I don't think that would be the most appropriate approach here because you would still end up with a file that is 100x the size of your original data, which is not ideal. Using lossless compression would help, but you'd probably only be able to compress by 50% at most, so you'd still have a file that's 50x the original size.

The problem is that mp4 uses a lossy compression codec for video that is very efficient and is not available in HDF5 due to strange licensing constraints. Because of this, we recommend videos like these that show behavior to be stored externally and to have a reference to these files within the NWB file. This is considered best practice in this case (though not for videos like calcium imaging where mp4's lossy compression is not appropriate).

Here is a tutorial for how to add a neurodata object that is a reference to an external video file.

@GoktugAlkan
Author

Thanks a lot for the response. The solution to refer to external files is not very satisfying because the lab members request a file containing all raw data.

I will look up the compression methods used in mp4. If I understand the concepts correctly, an mp4 video also contains RGB frames. The question is then how these frames are stored. Storing the frames in the same way would actually solve my problem.

Thanks again for your help

@bendichter
Contributor

@GoktugAlkan unfortunately it's not that simple. H.264 and H.265 are the standard approaches for mp4 and they work by having key frames and then storing the diff between that key and subsequent frames. It's an algorithm that spans multiple frames, so you won't be able to use this algorithm frame-by-frame. Even if you could, that would make the data quite difficult to interpret for anyone coming to this file later.

If storing the data within one file is really important, then I would recommend trying HDF5's built-in compression algorithms to see how much they help. This should be as simple as adding one line to your data writing program. See the tutorial here.

@GoktugAlkan
Author

Thanks a lot for your support Ben! I'll have a look at that.

@rly
Contributor

rly commented Mar 16, 2023

Hi @GoktugAlkan , I was wondering if you found a satisfactory solution to your problem, and how HDF5's built-in compression algorithms compare to MP4 compression.

I also want to mention again that the best practice of storing video files as MP4 outside of the NWB file is accepted by the DANDI Archive. Here is an example NWB file from the International Brain Lab on DANDI: https://dandiarchive.org/dandiset/000409/draft/files?location=sub-CSH-ZAD-001
The NWB file listed there contains an ImageSeries object that references an mp4 file in the adjacent folder with the same name.

@GoktugAlkan
Author

GoktugAlkan commented Mar 16, 2023

Hi @rly,

I first tried gzip compression, but this didn't improve the storage efficiency significantly. That was a while ago, so I don't remember the exact numbers, but I can collect some figures comparing the storage space needed with gzip and without compression.
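
One plausible reason for the small gain is pixel noise, which lossless compressors cannot remove. The sketch below uses Python's standard zlib module, which implements the same DEFLATE algorithm as HDF5's gzip filter, on synthetic data. The noise model (a flat background with up to 8 grey levels of random noise per pixel) is an assumption for illustration, and real footage will behave differently:

```python
import random
import zlib


def deflate_ratio(raw: bytes, level: int = 4) -> float:
    """Compressed size divided by raw size (smaller is better)."""
    return len(zlib.compress(raw, level)) / len(raw)


n = 100_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic "pixels": a perfectly flat background, and the same background
# with random noise of up to +/-8 grey levels on every pixel.
flat = bytes([128] * n)
noisy = bytes(128 + rng.randint(-8, 8) for _ in range(n))

print(f"flat:  {deflate_ratio(flat):.3f}")
print(f"noisy: {deflate_ratio(noisy):.3f}")
```

The flat data shrinks to almost nothing, while the noisy data stays at a large fraction of its raw size because the noise itself has to be stored exactly.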

Based on this, I decided to create a folder with all the mp4 files and to have each ImageSeries object reference the corresponding file in that folder. However, I still think that this is not a very satisfactory solution for our problem.

I am wondering if you could build in the compression methods of mp4 (H.264 and H.265 @bendichter) into nwb, i.e. would it be possible to store the mp4 files compressed with H.264 or H.265 in an ImageSeries object? You could also build the corresponding decompression function into the same object, so that one can retrieve the compressed mp4 file from the ImageSeries object and apply the decompression method to obtain the video.

Let me know if this makes sense to you and if I could contribute to the implementation of such an option. Also let me know if you have further comments or suggestions.

@oruebel
Contributor

oruebel commented Mar 16, 2023

I am wondering if you could build in the compression methods of mp4 (H.264 and H.265 @bendichter) into nwb

While it is possible to implement custom compressors for HDF5, there are several challenges with implementing H.264 and H.265 for HDF5 (or most other chunk-based array stores, for that matter). H.264 and H.265 are covered by a number of patents and would require licensing for integration. Also, these compressors are based on key frames (and differences between frames), while HDF5 compression operates on chunks. Finally, movie players and other common video tools won't know how to load movies from HDF5 files.

@GoktugAlkan
Author

Thanks for the clarification @oruebel. The solution of referencing files in another folder is not satisfactory, but I'll continue with it until a better solution is developed.

@rly
Contributor

rly commented Mar 16, 2023

@GoktugAlkan Just so that I have a better understanding of the problem, why do your lab members request a single file containing all of the video data? Is it for ease of analysis? Ease of sharing data? Ease of maintaining data integrity? Is this a requirement from an institution, funder, or publisher? Thanks in advance.

@GoktugAlkan
Author

GoktugAlkan commented Mar 17, 2023

@rly The reason why we want to have all the data in one single file is that this is the most convenient way of sharing our data with other people. For example, if a potential collaborator wants to get a feeling for the data, we thought that we could simply give them the nwb file containing all the data. It would be a bit more complicated if we gave them an nwb file and also access to a server where they would find the video files. In such a situation we could profit a lot from a single file containing everything.

@dfsp-spirit

dfsp-spirit commented Jun 30, 2023

@GoktugAlkan But how would they watch the videos embedded in the nwb file? As @oruebel mentioned, no standard video player software will be able to open them.

@GoktugAlkan
Author

@dfsp-spirit I switched to the proposed standard way of storing the videos, i.e., now, there is a reference inside the nwb files to the corresponding videos stored on the disk.

Before applying this procedure, I stored the frames of the videos as tensors and then used the cv2 library (specifically VideoWriter) to convert the tensors to video files.
