Splitting NetCDF output with NetCDFOutputWriter #2967
I'd start by implementing the same feature that JLD2 has.

Check out the test for file splitting with `JLD2OutputWriter`. We'll want a practically identical test for `NetCDFOutputWriter`.
Hello, I'm interested in this feature. I've managed to implement the feature @glwagner suggested.
Since I want to fully understand the way Julia works, I also added the feature of splitting files based on time. However, this feature will not work well when computing averages. Do you have any suggestions on how to approach this, or how do you envision splitting files based on time? I will not merge the changes into the other PR, since I think this time splitting needs a bit more thought. The code is currently at:
Can you explain what you are trying to do in more detail? What does it mean to split files based on time? You mean that you want to split files on a `TimeInterval`? I would use the existing `TimeInterval` schedule for that.

Then I guess if you want to have two independent features with interacting schedules, you will have to enforce that the two schedules are compatible / consistent within the constructor for the output writer.

Now that I think of it, it would probably be better for size-based file splitting to also use schedules (eg a new schedule called `FileSizeLargerThan`).

This isn't really a Julia-specific issue, it's more of a code design issue I think...
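The "schedule" pattern can be sketched in a small, self-contained example. Hedged sketch: the `IterationInterval` below only mirrors the kind of schedule Oceananigans ships, and the NamedTuple model is a hypothetical stand-in, not the real model type:

```julia
# A schedule is a callable struct: calling it on the model returns true
# when it is time to act (write output, split a file, ...).
struct IterationInterval
    interval :: Int
end

# Actuate every `interval` iterations.
(schedule::IterationInterval)(model) = model.iteration % schedule.interval == 0

# Hypothetical stand-in for a model; only the fields the schedule reads matter.
model = (iteration = 20, time = 0.0)

schedule = IterationInterval(10)
schedule(model)  # true: iteration 20 is a multiple of 10
```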
I would like output files that consistently span, for example, 30 days; in other words, a new file needs to be created once the time span in the netCDF file equals the specified interval.
So far I'm only using the
Yes, it seems to me that in order to do it properly, it will be required to make it consistent with the schedules, but looking at the code, I'm not sure where to start...
A "schedule" is a function or callable object with a method schedule(model) that returns true or false based on a criterion. The cleanest way to get this feature is to refactor the output writers to have a more generic interface for splitting. If we have a property called NetCDFOutputWriter(model, outputs; file_splitting = TimeInterval(30days), ...) Then the decision about whether to start a new file will change from
to writer.file_splitting(model) && start_next_file(model, writer) Next, you will have to add a new schedule in mutable struct FileSizeLargerThan <: AbstractSchedule
max_filesize :: Float64
path :: String
end
(fslt::FileSizeLargerThan)(model) = filesize(fslt.path) >= fslt.max_filesize Finally, you need to add a user interface for initializing and modifying the schedules to smooth out the user experience (for example, we don't want users to have to specify the file path more than once, and the file path that is checked by the schedule has to be updated). This will have to take two parts. In FileSizeLargerThan(max_filesize) = FileSizeLargerThan(max_filesize, "") Then in output writers, an interface to be used in both the model constructor and update_schedule!(schedule, path) = nothing
update_schedule!(schedule::FileSizeLargerThan, path) = schedule.path = path This function
and also needs to be added in the output writer constructor so that
becomes filepath = joinpath(dir, filename)
update_schedule!(schedule, filepath) Make sense? |
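Put together, the pieces above can be exercised in isolation. This is a hedged, self-contained sketch: `AbstractSchedule` is redefined locally as a stand-in for the Oceananigans type, and no real model is needed:

```julia
# Self-contained sketch of the FileSizeLargerThan schedule described above.
abstract type AbstractSchedule end

mutable struct FileSizeLargerThan <: AbstractSchedule
    max_filesize :: Float64
    path :: String
end

# Convenience constructor: the path is filled in later by the output writer.
FileSizeLargerThan(max_filesize) = FileSizeLargerThan(max_filesize, "")

# Callable: actuates once the file at `path` reaches `max_filesize` bytes.
(fslt::FileSizeLargerThan)(model) = filesize(fslt.path) >= fslt.max_filesize

# Generic fallback: most schedules do not care about the file path.
update_schedule!(schedule, path) = nothing
update_schedule!(schedule::FileSizeLargerThan, path) = schedule.path = path

# Exercise it with a temporary 100-byte file; no model state is needed here.
path, io = mktemp()
write(io, zeros(UInt8, 100))
close(io)

schedule = FileSizeLargerThan(50.0)
update_schedule!(schedule, path)
schedule(nothing)  # true: 100 bytes >= 50
```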
This is a lot more sustainable than adding new properties to every output writer every time we want to support splitting a file by a different criterion. It's a decent change to the user interface. I can help if you like.
Yes, that would be great. I will dive into the schedules over the next few days!
Ok, I'll open a PR that refactors the interface for file splitting.
Opening this request as per the Slack conversation. Sometimes file sizes become very big (and thus are not easy to transfer, etc.), and so quite often it would probably be good to split data files according to time intervals.
So something with API like
Are there any specific things that should be noted if we are to extend the `start_next_file()` functionality to NetCDF as well, instead of just JLD2? Like any flags or things that I should note?
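On the NetCDF side, the main extra work for `start_next_file` is closing the current dataset, choosing the next part's filename, and reopening a fresh dataset with the same dimensions and attributes. A hypothetical filename helper (an assumption for illustration, not the actual Oceananigans implementation, and the `_partN` convention here is made up) might look like:

```julia
# Hypothetical helper for naming split files: "data.nc" -> "data_part2.nc".
function next_part_path(filepath, part)
    base, ext = splitext(filepath)
    return string(base, "_part", part, ext)
end

next_part_path("ocean_data.nc", 2)  # "ocean_data_part2.nc"
```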