-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data type mismatches in model specification #603
Comments
Adding excerpts from the two model specifications. Note the different column data types julia> ADRIA.model_spec(dom)[:, [:component, :fieldname, :val, :bounds, :default_bounds]]
253×5 DataFrame
Row │ component fieldname val bounds default_bounds
│ String Symbol Real Tuple… Tuple…
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ EnvironmentalLayer dhw_scenario 1 (1.0, 51.0) (1.0, 51.0)
2 │ EnvironmentalLayer wave_scenario 1 (1.0, 51.0) (1.0, 51.0)
3 │ Intervention guided 0 (-1.0, 19.0) (-1.0, 19.0)
julia> rs.model_spec[:, [:component, :fieldname, :val, :bounds, :default_bounds]]
253×5 DataFrame
Row │ component fieldname val bounds default_bounds
│ String31 String Float64 String String
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ EnvironmentalLayer dhw_scenario 1.0 (1.0, 51.0) (1.0, 51.0)
2 │ EnvironmentalLayer wave_scenario 1.0 (1.0, 51.0) (1.0, 51.0)
3 │ Intervention guided 0.0 (-1.0, 19.0) (-1.0, 19.0) |
At the line Line 408 in 9422a04
model_spec.csv is loaded, it is possible to pass an argument types with an array of types for each column. The question is: how to get this list of types matching ADRIA.model_spec(dom) - since dom is not available inside this function?
I don't know what the best approach would be, but what do you think about having one row with types for each col inside |
Yes, that's one approach - to list the types in a comment directly above the column headers and make that part of the specification. Alternatively, switch to Zarr format instead of CSV and put the metadata of column types somewhere in Zarr. Or specify it in the ResultSet metadata (inside the datapackage.json file?)... Personally I like the Zarr-based solution, but it would come with additional complications... |
I also like the Zarr-based solution. I don't know what are the complications (can you elaborate?) but one problem I see is that is it harder to implement than just adding a commented row to the csv. |
Yes easier, but doing that would open a whole lot of security implications. Because we'd be evaluating string as code to obtain the equivalent data types. Actually, this is true for all the ideas above (but somewhat less so for Zarr). Maybe I'm overthinking it. Let's sleep on it. |
I was thinking: is it possible to just pass |
Do you remember what the motivation is behind keeping the Domain and ResultSet separate? |
Maybe, if we're willing to drop support for any prior versions, the simplest solution is to take the expected types from within ADRIA itself. This would be simple to do as we can take the types from the model spec, as generated from the default factor values...
This was the first thing you worked on. I said not to delete the function because we still want to be able to merge ResultSets. Later we want to be able to add to an existing ResultSet and other things too. |
Yes, you're right, sorry. Stupid idea. |
I think now is a good time to implement changes that are important like this but have the cost of dropping support to prior versions. Soon we will release 1.0.0 (right?) anyway. That said, what do you mean by "take the types from within ADRIA"? What I like about having the types saved somewhere within result set is that it allows us support other models/domains in the future. So far I'm leaning towards the Zarr-based solution.
Ok, I remember that. |
The model specification is defined by the composition of the various components (Intervention, CriteriaWeight, EnvironmentalLayer, Coral). I think only EnvironmentalLayer requires Domain metadata (e.g., the number of available trajectories), but have to check this. Given we only want the data types, I guess it should be possible to create a model spec with just the default values and infer the types from that. The Zarr-based approach has the same issue where we'd have to infer the correct types from a metadata string - running strings as code should be avoided. But again, maybe I'm overthinking it. |
Model specification as generated by
ADRIA.model_spec(dom)
and loaded throughrs = load_results(path)
do not have the same data types for the same columns.This leads to issues particularly for analyses which may expect Symbols vs Strings and vice versa (see for example: #601 (comment))
The text was updated successfully, but these errors were encountered: