Load all timesteps for a given analysis_assim_extend reference time #156

Open
kvanwerkhoven opened this issue Apr 30, 2024 · 4 comments

@kvanwerkhoven
Member

I keep running into missing data because of how the analysis_assim loading function behaves: it loads partial files for the first and last dates of a requested range (i.e., it filters on value times rather than reference times within the requested range). I frequently load part of a period and then add a few days, or reload part of the period for more locations, etc. As a result, I am constantly and inadvertently overwriting complete files with partial files. I believe the default behavior should be to always load all timesteps for a given reference time, so that if a parquet file exists with a given reference timestamp, one can expect it to include all the data for that reference time. If there was a reason for the opposite behavior, let's discuss.

@kvanwerkhoven
Member Author

@samlamont @mgdenno Has this issue been resolved? I need to load an additional day of data for Hurricane Debby and am wondering whether that is possible, or whether I have to reload the entire period to get the complete AnA time series without gaps.

@mgdenno
Contributor

mgdenno commented Aug 8, 2024

Unfortunately, no, it hasn't been addressed yet. I thought it had, but Sam and I reviewed open issues last week and this one has not been resolved. Just off the top of my head, one workaround besides redownloading all the data might be to write to a different location and then give conflicting files unique names when moving the new data into the existing data directory.
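A minimal sketch of that workaround, assuming the new data has already been written to a separate directory as parquet files. The function name `merge_without_overwrite`, the `_1`, `_2` suffix scheme, and the file names in the demo are all hypothetical and are not part of the library:

```python
import shutil
import tempfile
from pathlib import Path


def merge_without_overwrite(new_dir, existing_dir):
    """Move parquet files from new_dir into existing_dir.

    A file whose name already exists in existing_dir is given a numeric
    suffix instead of replacing the existing file.
    """
    new_dir, existing_dir = Path(new_dir), Path(existing_dir)
    moved = []
    for src in sorted(new_dir.glob("*.parquet")):
        dest = existing_dir / src.name
        counter = 1
        # Pick the first unused name: name.parquet, name_1.parquet, ...
        while dest.exists():
            dest = existing_dir / f"{src.stem}_{counter}{src.suffix}"
            counter += 1
        shutil.move(str(src), str(dest))
        moved.append(dest.name)
    return moved


# Demo with placeholder files (hypothetical names, not real parquet data).
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp, "existing")
    new = Path(tmp, "new")
    existing.mkdir()
    new.mkdir()
    (existing / "20240813T16.parquet").write_text("complete")
    (new / "20240813T16.parquet").write_text("partial")
    (new / "20240814T16.parquet").write_text("complete")

    moved = merge_without_overwrite(new, existing)
    kept = (existing / "20240813T16.parquet").read_text()
```

This only protects the existing files from being overwritten. Rows that appear in both the original and the renamed file would still need to be de-duplicated when the directory is read.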

@samlamont
Collaborator

Hi @kvanwerkhoven, apologies for the delay on this. Just to confirm, we want the fetching of NWM analysis data to behave like any other forecast (prioritizing reference_time)? So if one day is requested for a single location for the standard analysis configuration (analysis_assim), we would end up with 24 files, each containing 3 timesteps (looking back). And we would no longer need the t_minus_hours argument. Does that sound right? cc. @mgdenno
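To make the "24 files, each containing 3 timesteps" layout concrete, here is a rough sketch. It assumes one reference time per hour and that each value time is the reference time minus the t_minus hour; `reference_time_groups` is a hypothetical helper, not an existing function:

```python
from datetime import datetime, timedelta


def reference_time_groups(day, t_minus_hours=(0, 1, 2), cycles_per_day=24):
    """Map each reference time in `day` to the value times it would hold."""
    groups = {}
    for hour in range(cycles_per_day):
        ref = day + timedelta(hours=hour)
        # Each timestep looks back t_minus hours from the reference time.
        groups[ref] = sorted(ref - timedelta(hours=tm) for tm in t_minus_hours)
    return groups


groups = reference_time_groups(datetime(2024, 8, 8))
for ref, value_times in list(groups.items())[:2]:
    print(ref, [vt.isoformat() for vt in value_times])
```

Under these assumptions, one requested day gives 24 groups (one per file), and the earliest group reaches back into the previous day.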

@kvanwerkhoven
Copy link
Member Author

Yes, I think we should either eliminate t_minus_hours or ensure that when certain hours are requested, all of those hours are returned for all reference times within the requested range. The issue I am having is that if, for example, I request a start date of 8/8 and n_day=5 with t_minus_hours = [0...28], it does not load all hours [0...28] on the first day (8/8) or the last day (8/13). If I then need to extend the period (e.g., to add 8/14-8/15), the only way I can get complete data on 8/13 and 8/14 is to reprocess the full period from 8/8-8/15. There could be future use cases where the t_minus_hours option is useful, so I lean toward the latter option: change the priority to always load all the requested hours on the requested dates, i.e., treat the date as a reference time rather than a value time.
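The difference between the two selection rules can be sketched as follows. This only illustrates the behavior described in this thread and is not the library's actual code; the function names, the single daily cycle at hour 16, and the 28 hourly steps are assumptions made for the example:

```python
from datetime import datetime, timedelta


def kept_by_value_time(ref_time, t_minus_hours, start, end):
    """t_minus hours whose value time falls inside [start, end)."""
    return [tm for tm in t_minus_hours
            if start <= ref_time - timedelta(hours=tm) < end]


def kept_by_reference_time(ref_time, t_minus_hours, start, end):
    """All requested hours, but only if ref_time is inside [start, end)."""
    return list(t_minus_hours) if start <= ref_time < end else []


start = datetime(2024, 8, 8)
end = start + timedelta(days=5)   # n_day=5, so 8/13 00:00 is excluded
hours = range(28)                 # assumed number of hourly look-back steps
cycle_hour = 16                   # assumed daily cycle hour, for illustration

first_ref = start + timedelta(hours=cycle_hour)   # 8/8 16:00
last_ref = end + timedelta(hours=cycle_hour)      # 8/13 16:00

first_kept = kept_by_value_time(first_ref, hours, start, end)
last_kept = kept_by_value_time(last_ref, hours, start, end)
first_full = kept_by_reference_time(first_ref, hours, start, end)
last_full = kept_by_reference_time(last_ref, hours, start, end)
```

With these assumptions, value-time selection keeps 17 of the 28 hours for the 8/8 reference time and 11 of 28 for 8/13, so both files are partial. Reference-time selection keeps all 28 hours for 8/8 and leaves 8/13 untouched until a later request covers it.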
