Substitute types by abstractions. #100

ilopezgp · 2022-01-26T03:59:18Z

Relaxes the allowed types of some objects and input args for improved generalization:

Vector -> AbstractVector
Matrix -> AbstractMatrix
Sparse.SparseMatrix -> AbstractMatrix
Union{Vector, Matrix} -> AbstractVecOrMat

For covariance matrices in output space,

Matrix -> Union{AbstractMatrix, UniformScaling}

Tests involving different matrix types are now included for all Ensemble Kalman Processes.
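For reference, a minimal sketch of what the relaxed signatures allow. This is not code from the PR: `noise_variances` is a hypothetical helper, and the covariance values are made up for illustration. The same call works for dense, diagonal, sparse, and `UniformScaling` covariances.

```julia
using LinearAlgebra
using SparseArrays

# Hypothetical helper, not part of the package: returns the per-dimension
# variances of an output-space covariance Γ for output dimension n.
noise_variances(Γ::AbstractMatrix, n::Integer) = collect(diag(Γ))
noise_variances(Γ::UniformScaling, n::Integer) = fill(Γ.λ, n)

n = 3
dense_cov = Matrix(2.0I, n, n)          # Matrix{Float64}
diagonal_cov = Diagonal(fill(2.0, n))   # Diagonal{Float64}
sparse_cov = sparse(2.0I, n, n)         # SparseMatrixCSC{Float64}
scaling_cov = 2.0I                      # UniformScaling{Float64}

# Every representation goes through the same generic entry point.
results = [noise_variances(Γ, n) for Γ in (dense_cov, diagonal_cov, sparse_cov, scaling_cov)]
```

A test over several matrix types can then compare every entry of `results` against the same expected vector.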

charleskawczynski · 2022-01-26T15:22:50Z

src/DataContainers.jl

@@ -16,9 +16,9 @@ Container to store data samples as columns in an array.
 """
 struct DataContainer{FT <: Real}
    #stored data, each piece of data is a column [data dimension × number samples]
-    stored_data::Array{FT, 2}
+    stored_data::AbstractMatrix{FT}


https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-fields-with-abstract-containers

And same goes for elsewhere.

I would say most of the infrastructure here isn't performance-critical, so dynamic dispatch on a field is fine if all the computation happens in the subsequent linear algebra operations.

A concern that can be limiting for the package is unnecessary memory use. In some cases, we want to store covariances that are extremely sparse, or even diagonal. Casting them and storing them as a Matrix{FT} then takes up much more space than, e.g., Diagonal{FT}. Any solution that solves this problem would work. Let me know how to proceed.
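To make the memory concern concrete, here is a small sketch (the size `n` and the variance values are arbitrary assumptions) comparing the storage of a diagonal covariance kept as `Diagonal` against the same covariance cast to a dense `Matrix`:

```julia
using LinearAlgebra

n = 1_000
variances = fill(0.5, n)

as_diagonal = Diagonal(variances)   # stores only the n diagonal entries
as_dense = Matrix(as_diagonal)      # stores all n^2 entries, mostly zeros

# Base.summarysize reports the memory reachable from each object, in bytes.
bytes_diagonal = Base.summarysize(as_diagonal)
bytes_dense = Base.summarysize(as_dense)
```

The dense copy holds n^2 numbers instead of n, so its footprint grows by roughly a factor of n.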

I would write it as generically as possible, and then if there are performance hotspots after a bit of profiling, they should be straightforward to address. You don't need to specialize on every Matrix type here (as you said, there are many), so IMO what is written is fine.

I wasn't suggesting to specialize on every type of matrix, just to make the type (and its properties) concrete. This was @glwagner's suggestion, too. It's not much overhead to have very concrete types and simply not specialize on methods that use it.

Then you cannot mix and match representations (different matrix types might be required for different fields). Until performance is actually an issue, I wouldn't worry about it; you can then go back and parameterize over type constraints for specialization when those constraints are better understood.

For calibration, the costs are going to be dominated by actually running the models and IO. I'm guessing everything else won't matter much (~Python speed is fine), but we'll see as we go.

@ilopezgp the point that @charleskawczynski was making is the difference between

struct DataContainer{FT <: Real}
    # stored data, each piece of data is a column [data dimension × number samples]
    stored_data::AbstractMatrix{FT}
end

and

struct DataContainer{M <: AbstractMatrix}
    # stored data, each piece of data is a column [data dimension × number samples]
    stored_data::M
end

In the first case, stored_data is abstractly typed (versus concretely typed), which means that the compiler cannot infer the type of stored_data given the type DataContainer. This has performance penalties (which @jakebolewski argues are not important). But it's utterly fundamental and so an important point to grasp.

Writing M <: AbstractMatrix will preclude UniformScaling. Omitting <: AbstractMatrix is the conservative choice. You might also write M <: Union{AbstractMatrix, UniformScaling}. However this is risky because you may be failing to anticipate other valid matrix-like types.
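A minimal sketch of the distinction, using hypothetical container names (`AbstractFieldContainer`, `ParametricContainer`) so that both variants can live in one script; neither is the package's actual `DataContainer`. The parametric version leaves `M` unconstrained, which is the conservative choice mentioned above:

```julia
using LinearAlgebra

# Hypothetical container names, for illustration only.
struct AbstractFieldContainer{FT <: Real}
    stored_data::AbstractMatrix{FT}   # abstract field type
end

struct ParametricContainer{M}
    stored_data::M                    # concrete once M is known
end

data = Diagonal([1.0, 2.0])

a = AbstractFieldContainer{Float64}(data)
p = ParametricContainer(data)
s = ParametricContainer(2.0I)   # UniformScaling is accepted because M is unconstrained

# Can the compiler know the field's concrete type from the container type alone?
abstract_field_is_concrete = isconcretetype(fieldtype(typeof(a), :stored_data))
parametric_field_is_concrete = isconcretetype(fieldtype(typeof(p), :stored_data))
```

In the first struct the field type stays `AbstractMatrix{Float64}` no matter what is stored; in the second, the stored type is part of the container's type, so nothing is lost by keeping the `Diagonal` as a `Diagonal`.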

I understand the difference between abstract types and parameterized types, thank you. I think the conversation was about whether it practically matters in this application.

src/ParameterDistributions.jl

src/UnscentedKalmanInversion.jl

odunbar · 2022-01-27T18:53:14Z

Could I ask that you add the specific changes to the PR comment as a reference for the conventions we take? I currently see:

Vector -> AbstractVector
Matrix -> AbstractMatrix
Sparse.SparseMatrix -> AbstractMatrix
Union{Vector, Matrix} -> AbstractVecOrMat
Vector{Vector} -> Iterable{Vector}

And following on from Greg's point, we will currently exclude UniformScaling. (Is this just for aesthetics?)

ilopezgp · 2022-01-27T18:57:49Z

UniformScaling was in fact a specific request in issue #99, so we should include it.
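Since `UniformScaling` is not a subtype of `AbstractMatrix`, it has to be handled by a separate method or a `Union`. A small sketch of one way to do that, assuming a hypothetical helper `materialize_cov` (not part of the package) that gives the covariance a definite size only when an algorithm needs one:

```julia
using LinearAlgebra

# Hypothetical helper: matrices pass through unchanged, while a
# UniformScaling becomes an n×n Diagonal with the same scale factor.
materialize_cov(Γ::AbstractMatrix, n::Integer) = Γ
materialize_cov(Γ::UniformScaling, n::Integer) = Diagonal(fill(Γ.λ, n))

Γ = 0.1I                            # size-agnostic covariance
is_matrix = Γ isa AbstractMatrix    # false: UniformScaling sits outside AbstractMatrix

Γ_sized = materialize_cov(Γ, 4)     # 4×4 Diagonal{Float64}
y = Γ_sized \ ones(4)               # solve against the sized covariance
```

The result stays a `Diagonal`, so no dense n×n matrix is allocated along the way.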

odunbar

This is a good step towards greater flexibility! LGTM and addresses #99

Add tests.

ilopezgp · 2022-01-27T20:01:40Z

Closes #99 (the specific problems initially raised).

ilopezgp · 2022-01-27T20:01:53Z

bors r+

bors · 2022-01-27T20:10:35Z

Build succeeded:

132: [WIP] Substitute types with abstractions r=tsj5 a=tsj5

This PR implements the abstract typing done in CliMA/EnsembleKalmanProcesses.jl#100, e.g. `Array{FT, 2}` → `AbstractMatrix{FT}`, in order to be consistent with that dependency.

See the discussion concerning performance at that PR; use of abstract types is [recommended against](https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-fields-with-abstract-containers) for perf reasons, but the rationale here is that the code is essentially "glue" rather than numerical routines appearing in hot loops, so writing for generality over perf is justified.

Another downside is that existing abstract types aren't "abstract" enough, e.g. `UniformScaling` is not a subtype of `AbstractMatrix` and must be handled separately.

As a benefit, the changes made here result in some method signatures being more strongly typed than they are in `master`, allowing us to replace repeated code with multiple dispatch ("Don't Repeat Yourself").

`MCMC` is changed to a mutable struct, instead of continuing the current code's practice of enabling mutability by making fields of the struct 1x1 Arrays instead of scalars. This change is a moot point, however, since it will be overridden by PR #130.

Co-authored-by: Thomas Jackson <[email protected]>
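The `MCMC` change mentioned in the build message can be illustrated with a small sketch. The types below are hypothetical stand-ins, not the actual `MCMC` struct, and the field name `count` is an assumption made for the example:

```julia
# Hypothetical types for illustration; not the actual MCMC struct.
struct ArrayCellCounter
    count::Vector{Int}      # one-element array so the value can be changed in place
end

mutable struct MutableCounter
    count::Int              # plain scalar; the struct itself is mutable
end

# Old practice: mutate through the array cell of an immutable struct.
old_style = ArrayCellCounter([0])
old_style.count[1] += 1

# New practice: mutate the scalar field of a mutable struct directly.
new_style = MutableCounter(0)
new_style.count += 1
```

Both reach the same state; the mutable struct avoids the extra indexing and makes the intent explicit.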

ilopezgp linked an issue Jan 26, 2022 that may be closed by this pull request

Covariance matrix obs_noise_cov is restricted to Array{FT, 2} #99

Closed

charleskawczynski reviewed Jan 26, 2022

ilopezgp force-pushed the abstract_types_1 branch from 51303d5 to 04af23d Compare January 26, 2022 20:18

ilopezgp changed the title ~~[WIP] Substitute types by abstractions.~~ Substitute types by abstractions. Jan 26, 2022

ilopezgp requested review from charleskawczynski and jakebolewski January 26, 2022 20:20

jakebolewski reviewed Jan 27, 2022

ilopezgp force-pushed the abstract_types_1 branch from 04af23d to e3323c9 Compare January 27, 2022 17:15

odunbar approved these changes Jan 27, 2022

Substitute types by abstractions.

525b500

Add tests.

ilopezgp force-pushed the abstract_types_1 branch from e3323c9 to 525b500 Compare January 27, 2022 19:57

bors bot merged commit db54e9f into main Jan 27, 2022

bors bot deleted the abstract_types_1 branch January 27, 2022 20:10

tsj5 mentioned this pull request Jan 28, 2022

Substitute types with abstractions CliMA/CalibrateEmulateSample.jl#132

Merged

ilopezgp commented Jan 26, 2022
charleskawczynski Jan 26, 2022
jakebolewski Jan 26, 2022
ilopezgp Jan 26, 2022
jakebolewski Jan 26, 2022
charleskawczynski Jan 27, 2022
jakebolewski Jan 27, 2022
jakebolewski Jan 27, 2022
glwagner Jan 27, 2022
ilopezgp Jan 27, 2022
odunbar commented Jan 27, 2022
ilopezgp commented Jan 27, 2022
odunbar left a comment
ilopezgp commented Jan 27, 2022
ilopezgp commented Jan 27, 2022
bors bot commented Jan 27, 2022
